Too much Discussion of the XOR swap trick
The following post involves far too much discussion of the XOR swap trick. I suspect many readers already know some of this, so feel free to skip over some early sections if you like! If you already know what the XOR swap trick is, you can skip straight to the part where we see if it’s useful.
What does XOR mean?
XOR is short for Exclusive OR (we use the X, instead of EOR, because X is cooler). XOR has a lesser-known, less cool friend, Inclusive OR (IOR).
These two names come from an issue in English (and many other languages), where we use ‘or’ for two different purposes. The question is: when someone talks about “A or B”, are you allowed both A and B, or just one or the other? In maths and computing, where we need to be more exact, we separate these two cases into XOR (you cannot have both) and IOR (you can have both).
To hopefully make this clearer, let’s consider a couple of real-world examples, both taken from the Disney World website. Let’s begin with one of the rules of Disney World: you cannot “harass or harm wildlife”. A reasonable-sounding rule! If you started kicking a duck while also insulting it, you would be both harassing and harming wildlife. This is clearly an inclusive ‘or’: if you tried to argue “I wasn’t harassing or harming wildlife, I was doing both!”, you would still get kicked out of the park.
On the other hand, over in the Refreshment Corner, the “All-Beef Hot Dog Basket” comes with a “Mandarin Orange or a Small Bag of Chips”. Here, the ‘or’ is exclusive: you can have an orange, or you can have chips, but you cannot have an orange and chips! No matter how much you point at the person being dragged out of the park for harassing and harming a duck, Disney are not going to agree their rules are inconsistent.
How do we know if an ‘or’ is inclusive or exclusive? There are some general rules, but often you just have to work it out from context. However, because we want to be clear when using computers, we are going to use ‘XOR’ when we mean “A, or B, but not both”.
What is a logical XOR?
A logical XOR takes two true/false values and returns true when exactly one of the two is true. We can write this out as a table:
| A | B | A XOR B |
|---|---|---|
| false | false | false |
| false | true | true |
| true | false | true |
| true | true | false |
One useful way to think about XOR: A XOR B is the same as asking “are A and B different?” If they are different, the result is true. If they are the same, the result is false. This will become important later.
What is the XOR bitwise operator?
In most programming languages, the ^ operator performs a bitwise XOR. This means it takes two integers, lines up their binary representations, and applies logical XOR to each pair of bits independently.
For example, let’s XOR the numbers 12 and 10:
12 = 1100
10 = 1010
----------
^ = 0110 = 6
Each column is just a logical XOR: 1 XOR 1 = 0, 1 XOR 0 = 1, 0 XOR 1 = 1, 0 XOR 0 = 0.
Bitwise XOR has two properties that matter for the swap trick:
- Self-inverse: a ^ a == 0 for any value a. Every bit cancels with itself.
- Identity: a ^ 0 == a. XOR with zero changes nothing.
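Combining these, and using the fact that XOR is commutative and associative (so we can reorder and regroup the operations freely): if we compute a ^ b ^ b, the two copies of b cancel and we get back a. This works the other way too: a ^ b ^ a is the same as a ^ a ^ b, which gives us b. Hold that thought.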
What is the XOR swap trick?
The XOR swap trick uses those properties to swap two variables without a temporary variable. Here it is in C:
a ^= b;
b ^= a;
a ^= b;
Let’s trace through this with a = 5 and b = 3:
| Step | Operation | a | b |
|---|---|---|---|
| Start | | 5 | 3 |
| Line 1 | a ^= b | 5 ^ 3 = 6 | 3 |
| Line 2 | b ^= a | 6 | 3 ^ 6 = 5 |
| Line 3 | a ^= b | 6 ^ 5 = 3 | 5 |
After line 1, a holds a ^ b. After line 2, b holds b ^ (a ^ b), which simplifies to a (the two b terms cancel). After line 3, a holds (a ^ b) ^ a, where the second a is the original value now sitting in b, and this simplifies to b (the two a terms cancel). The values have been swapped, and we never needed a temporary variable.
This is a clever bit of programming, and you can find it discussed in many places online. The question is: is it actually useful? Let’s find out.
Usage 1: Swapping local variables
The most common place you might want to swap two variables is right there in a function, with local variables. So let’s write three functions and see what the compiler makes of them. First, a baseline – just return a / b:
int div_direct(int a, int b) {
    return a / b;
}
Now, a version that XOR-swaps a and b and then divides:
int div_xor_swap(int a, int b) {
    a ^= b;
    b ^= a;
    a ^= b;
    return a / b;
}
And finally, the boring version with a temporary variable:
int div_temp_swap(int a, int b) {
    int temp = a;
    a = b;
    b = temp;
    return a / b;
}
The XOR-swap and temp-swap versions should both compute b / a (since we swap before dividing). Let’s compile all three with clang -O2 on x86-64 and look at the assembly:
div_direct:
mov eax, edi
cdq
idiv esi
ret
div_xor_swap:
mov eax, esi
cdq
idiv edi
ret
div_temp_swap:
mov eax, esi
cdq
idiv edi
ret
The compiler has seen straight through both swaps. div_direct divides edi by esi (that is, a / b). Both div_xor_swap and div_temp_swap divide esi by edi (that is, b / a). The generated code is identical: no XOR instructions, no temporary variable, no swap at all. The compiler just tracks which value is in which register and adjusts the final division accordingly.
In practice, this is the situation for almost any use of the XOR swap trick on local variables. The compiler is perfectly capable of working out that you are swapping two values, and it will just rearrange the subsequent operations to account for that. The XOR swap trick does nothing here that the compiler would not already do for you.
Usage 2: Swapping through pointers
So what about writing a proper swap function, one that takes two pointers? Let’s try both approaches:
void swap_xor(int* a, int* b) {
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}
void swap_temp(int* a, int* b) {
    int temp = *a;
    *a = *b;
    *b = temp;
}
Here the compiler can’t just rearrange later operations, because the function’s whole purpose is the swap itself. Let’s see what we get:
swap_xor:
mov eax, dword ptr [rdi]
xor eax, dword ptr [rsi]
mov dword ptr [rdi], eax
xor eax, dword ptr [rsi]
mov dword ptr [rsi], eax
xor dword ptr [rdi], eax
ret
swap_temp:
mov eax, dword ptr [rdi]
mov ecx, dword ptr [rsi]
mov dword ptr [rdi], ecx
mov dword ptr [rsi], eax
ret
The temp variable version is 4 instructions (not counting the ret) and does exactly what you would expect: load both values into registers, then write them back the other way around. Once the values are in registers, the swap is essentially free: we just store them to the opposite locations.
The XOR version is 6 instructions. It is doing genuine XOR operations, loading, storing, and reloading values. It is strictly worse. So much for saving a register.
Why didn’t the compiler optimise the XOR swap away?
With local variables, the compiler saw straight through both swaps and produced identical code. So why doesn’t it do the same thing here?
Let’s think about what happens if we call swap_temp(&x, &x) – that is, we pass the same pointer for both arguments. The function loads *a into temp, writes *b (the same value) into *a, then writes temp (the original value) back into *b. Nothing changes, which is exactly what we would expect from “swapping” something with itself.
Now consider swap_xor(&x, &x). The very first line, *a ^= *b, computes x ^ x, which is zero. That zero gets written back, and the original value is gone. The XOR swap trick destroys the data when both pointers point to the same address.
This means the two functions are not equivalent. The compiler cannot replace one with the other, because they behave differently when aliased. It has to faithfully emit the XOR operations, reloading from memory after each store, because each write through one pointer might be changing the value that the other pointer sees.
Except, in C we can tell the compiler that two pointers will never alias, using the restrict keyword:
void swap_xor_restrict(int* restrict a, int* restrict b) {
    *a ^= *b;
    *b ^= *a;
    *a ^= *b;
}
With restrict, we are promising that a and b point to different memory. Now the compiler knows the aliasing case cannot happen, and:
swap_xor_restrict:
mov eax, dword ptr [rsi]
mov ecx, dword ptr [rdi]
mov dword ptr [rsi], ecx
mov dword ptr [rdi], eax
ret
We are back to 4 instructions: just loads and stores, no XOR in sight. The compiler has optimised the XOR swap into the same four loads and stores as the temp variable version (only the order of the loads differs). The trick has, once again, bought us nothing.
What about the addition swap trick?
XOR is not the only operation that lets you swap without a temporary. Any operation that can be undone will do, so you can do the same thing with addition and subtraction:
a = a + b;
b = a - b;
a = a - b;
Trace through it: after line 1, a holds a + b. After line 2, b holds (a + b) - b, which is the original a. After line 3, a holds (a + b) - a, which is the original b. Swapped.
Oh no! For signed integers, this is really bad. The addition a + b can overflow, and signed integer overflow is undefined behaviour in C. So this trick is not just pointless; it is formally broken.
Except, is it actually broken in practice? I have been trying to come up with a case where the undefined behaviour causes a real problem here, and I am not sure I can. The classic way compilers exploit signed overflow UB is in comparisons — if you write i < i + 1, the compiler assumes this is always true, because signed overflow cannot happen. But in the addition swap trick, we never compare the overflowed value against anything. We just add and then subtract.
There are three paths the compiler might take:
- Local variables: the compiler sees through the swap algebraically and optimises it away entirely, just like it did with XOR. The overflow never happens in the generated code.
- Pointers without restrict: the compiler generates the arithmetic faithfully. On two’s complement hardware (which is essentially every machine in use today, and the only signed representation C23 allows), the wrapping and unwrapping cancel out to give the correct result.
- Algebraic simplification: if the compiler exploits the no-overflow assumption to simplify, it gets b = a_original and a = b_original, which is the correct swap.
Every path I can find produces the right answer for this specific pattern. The undefined behaviour is real, and you should not rely on this — but I cannot construct a case where a compiler actually generates incorrect code for an addition swap. If anyone can, I would be very interested to see it.
You can also, if you really want to, use the addition swap trick on floating point numbers. Let’s try it with 2.0 and 3.0:
a = 2.0, b = 3.0
a = a + b = 5.0
b = a - b = 5.0 - 3.0 = 2.0
a = a - b = 5.0 - 2.0 = 3.0
Swapped. But what about 1.0 and 1e16?
a = 1.0, b = 1e16
a = a + b = 1e16 (the 1.0 is below the precision of a double at this scale: doubles near 1e16 are 2 apart, and 1e16 + 1.0 rounds to 1e16)
b = a - b = 1e16 - 1e16 = 0.0
a = a - b = 1e16 - 0.0 = 1e16
We now have a = 1e16 and b = 0.0. The 1.0 has vanished entirely. So: the addition swap is not just theoretically broken for floats, it will silently eat your data when the two values are far enough apart. XOR does not work on floats at all (in C, at least), so it avoids this problem by not even trying.
Still, this is a point in favour of the XOR version over the addition version: XOR cannot overflow, so at least it avoids the UB question entirely. Though as we have seen, neither version is worth using in the first place.
So, why do people care about the XOR swap trick?
At this point, we have established that the XOR swap trick produces identical or worse code in every case we have tried. So why does it come up at all?
Part of the answer is that it looks clever. It is the kind of thing you can write on a whiteboard and puzzle someone with for thirty seconds, which gives it a life as an interview question and a “did you know” curiosity.
But there is a kernel of a genuine use case, which becomes clearer if we think in terms of assembly rather than C. Imagine you are hand-writing assembly code and you need to swap two values that are already in registers. If you have a spare register, the obvious approach is three moves:
move $t2, $t0 # t2 = t0 (save)
move $t0, $t1 # t0 = t1
move $t1, $t2 # t1 = saved value
That is three instructions, and it requires a register you are not using for anything else. XOR swap is also three instructions, and requires no spare register:
xor $t0, $t0, $t1 # t0 = t0 ^ t1
xor $t1, $t1, $t0 # t1 = t1 ^ (t0 ^ t1) = original t0
xor $t0, $t0, $t1 # t0 = (t0 ^ t1) ^ original t0 = original t1
So if you have a spare register, there is no reason to use XOR swap: three moves and three XORs are the same cost, and moves are clearer. The only situation where XOR swap wins is when you are genuinely out of spare registers and need to avoid spilling a value to memory.
Does this situation actually arise? On MIPS, which has 32 general-purpose registers and no dedicated swap instruction, register pressure can occasionally be high enough in optimised inner loops that all registers are live. In that case, XOR swap is a genuine option: three register-to-register XOR instructions, no memory access, no spare register needed.
Except, for most architectures, it still does not hold up. The Z80 is perhaps the most tempting example — it has very few registers and many operations are restricted to the accumulator. However, the Z80’s XOR instruction can only write to the accumulator (A), so XOR B means A ^= B, and there is simply no single instruction for B ^= A. The three-step XOR swap does not work on Z80 between arbitrary registers.
And on x86? The 8086, from its introduction in 1978, has always had XCHG — a single dedicated exchange instruction. The XOR swap trick has never been necessary on x86.
It is the kind of technique which might have been occasionally useful in the 1980s, but now is only useful for cute interview questions and as a curiosity.
Are there other XOR tricks?
It turns out there are a bunch of other cute things XOR can do. The most well-known: given a list where every value appears exactly twice except one, XOR all the values together and the duplicates cancel out, leaving the unique element.
int find_unique(int* values, int n) {
    int result = 0;
    for (int i = 0; i < n; i++) {
        result ^= values[i];
    }
    return result;
}
For {4, 7, 2, 7, 4}, this computes 4 ^ 7 ^ 2 ^ 7 ^ 4. The two 4s cancel, the two 7s cancel, and we are left with 2. One pass through the data, no extra storage, no sorting, no hash table.
If you don’t know anything else about the values in the list, this really is very clever. I don’t know of any other way to do it anywhere near as efficiently — it is O(n) time and O(1) space, which is hard to beat. I also can’t think of any reason you would actually want to do this. But you certainly can.
There are a number of other XOR tricks out there (XOR linked lists, XOR in hash functions), but this post has already met its quota of entirely too much XOR.