When I wore a younger man’s clothes, I often heard people talking about the trivialness of XOR encoding. Yet, what I see in the security industry is a tremendous amount of brute forcing of XOR encoding in order to extract the key. So, I want to put this topic to rest and talk about how to defeat XOR by using math.

XOR is a common encoding function. It is nice because it is asymmetrical. This means in short, that y = x(x(y)). When encoding the common format is

- get the address
- increment over the address
- xor the payload with a key
- jump to the start of the decoded payload.

But, what happens when the key is difficult to get, such as a context key? Can you attack the shellcode without knowing this key? You can, using collusion. Collusion is when you know parts of what you are looking for in the message. One technique is create a list of normalized tokens (things we are looking for), and then search for these tokens in a normalized stream. I will explain normalization in a moment.

First, let’s see what an XOR stream looks like. I created a ruby script that generates the XOR streams:

# More bad code written by Chris Jordan (c) 2012 def printHex(temp) temp.each do |element| if element < 16 then print "0" print element.to_s(16) else print element.to_s(16) end end puts"\n" end def xorStream(stream, key) crypt = [] stream.each_byte do |s| crypt.push(s ^ key) end return crypt end message = "abcdefghijkhttp://lmnopqrstuvwxyz" keys = *(10..20) keys.each do |key| print "#{key}: " encoded = xorStream(message, key) print "Encoded: " printHex(encoded) end

The result is a unique encoded stream. Each stream is unique based on the key. At no time does the value of “http://” (value 0x687474703a2f2f) appear in any of the messages.

$ ruby xorGenGrid.rb 10: Encoded: 6b68696e6f6c6d62636061627e7e7a302525666764657a7b78797e7f7c7d727370 11: Encoded: 6a69686f6e6d6c63626160637f7f7b312424676665647b7a79787f7e7d7c737271 12: Encoded: 6d6e6f68696a6b646566676478787c362323606162637c7d7e7f78797a7b747576 13: Encoded: 6c6f6e69686b6a656467666579797d372222616063627d7c7f7e79787b7a757477 14: Encoded: 6f6c6d6a6b686966676465667a7a7e342121626360617e7f7c7d7a7b7879767774 15: Encoded: 6e6d6c6b6a696867666564677b7b7f352020636261607f7e7d7c7b7a7978777675 16: Encoded: 7172737475767778797a7b786464602a3f3f7c7d7e7f606162636465666768696a 17: Encoded: 7073727574777679787b7a796565612b3e3e7d7c7f7e616063626564676669686b 18: Encoded: 737071767774757a7b78797a666662283d3d7e7f7c7d62636061666764656a6b68 19: Encoded: 727170777675747b7a79787b676763293c3c7f7e7d7c63626160676665646b6a69 20: Encoded: 757677707172737c7d7e7f7c6060642e3b3b78797a7b64656667606162636c6d6e

We are going to treat the “http://” as the token we are searching for in the collusion attack. We are going to normalize both the streams and token(s). Then search the normalized streams for the normalized tokens.

I will show what I mean by normalizing the streams. Let’s say that each element in a sequence is a letter, and that X is the XOR key. That means that a stream would be: (A ^ X), (B ^X), (C ^ X), (D ^ X), ….

Let’s XOR each element to its predecessor:

(A ^ X) ^ (B ^X), (B ^X) ^ (C ^ X), (C ^ X) ^ (D ^ X), …

This is the same as:

(A ^ B) ^ (X ^ X), (B ^ C) ^ (X ^ X), (C ^ D) ^ (X ^ X), …

Finally, this gives us a pattern of:

(A ^ B), (B ^ C), (C ^ D), …

Let’s now do an XOR diff of the previous streams. We will now update the loop to include an XOR diff:

#more bad code written by Chris Jordan (c) 2012 keys.each do |key| print "#{key}: " encoded = xorStream(message, key) print "Encoded: " printHex(encoded) oneDiff = [] counter = 0 previousValue = 0 # Need to error check to ensure that there are enough elements to create a search while counter < (encoded.length - 1) do newValue = encoded[counter] ^ encoded[counter+1] oneDiff.push(newValue) counter += 1 #previousValue += newValue end print " Diff values: " printHex(oneDiff) end

Before we look at the streams, I am going to also normalize the token we are looking for. If we are looking for http:// (687474703a2f2f) it’s XOR diff is 0x1c00044a1500

Now let’s take a look at the output and see if our normalized token appears:

$ ruby xorGenGrid.rb 10: Diff values: 0301070103010f010301031c00044a1500430103011f010301070103010f0103 11: Diff values: 0301070103010f010301031c00044a1500430103011f010301070103010f0103 12: Diff values: 0301070103010f010301031c00044a1500430103011f010301070103010f0103 13: Diff values: 0301070103010f010301031c00044a1500430103011f010301070103010f0103 14: Diff values: 0301070103010f010301031c00044a1500430103011f010301070103010f0103 15: Diff values: 0301070103010f010301031c00044a1500430103011f010301070103010f0103 16: Diff values: 0301070103010f010301031c00044a1500430103011f010301070103010f0103 17: Diff values: 0301070103010f010301031c00044a1500430103011f010301070103010f0103 18: Diff values: 0301070103010f010301031c00044a1500430103011f010301070103010f0103 19: Diff values: 0301070103010f010301031c00044a1500430103011f010301070103010f0103 20: Diff values: 0301070103010f010301031c00044a1500430103011f010301070103010f0103

We can find the http token diff in the streams regardless of key used. In fact, each xor diff stream is the same. That makes sense for they are the same message, just with different keys. Consider this stream and the xor diff. We can see the match. Now, in order to get the key, we can just XOR the encoded matching section to the token (http://) that we are looking for.

For example:

28: Encoded: 7d7e7f78797a7b747576777468686c263333707172736c6d6e6f68696a6b646566 Diff values: 0301070103010f010301031c00044a1500430103011f010301070103010f0103 h t t p : / / 68 74 74 70 3a 2f 2f ^ 74 68 68 6c 26 33 33 --------------------- 1c 1c 1c 1c 1c 1c 1c

Therefore:

0x1c = 28

Now we have the key. We can go back against the stream and decode with the derived key.

It’s that simple, we do not have to brute force to find a key when we have an understanding of what tokens we might be looking for, string or binary. We can use any token that we want to search for an XOR pattern. The longer the token we are searching for, the longer the XOR key we can look for. Collusion attacks are possible against any XOR scheme as long as you have a long enough stop token library.

[…] all came to me after reading Chris Jordan’s nice blogpost about how to decode single-byte XORed payloads from malware, when you can’t find the key […]