Last active
March 30, 2022 14:15
-
-
Save davehull/7ab911720d895a6dae18 to your computer and use it in GitHub Desktop.
eatoin shrdlu: XOR Encryption and Hamming Distance
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
I've been playing around with Matasano Crypto Challenges for my own edification. | |
It's been fun and insightful. I've learned a number of new things and enjoyed | |
doing so. If you're a mediocre programmer like me and have an interest in crypto, | |
I highly recommend checking out the challenges -- http://cryptopals.com/. | |
A few of the exercises in set 1 have you playing around with XOR for encryption. | |
You create a script that can brute force single key decryption and if you're | |
ambitious you'll write a function that will examine letter frequencies of the | |
output and score the results, returning the one that is most likely to be | |
English. I wrote multiple scoring functions for this, one that counts English | |
letter frequencies, one that counts English bigram frequencies, one for trigram | |
frequencies and lastly one that calculates Shannon Entropy. The output of these | |
functions all make up the composite score. | |
Later you have to write a XOR brute force decryption program that first applies | |
Hamming Distances at the bit level, which equates to calculating the Index of | |
Coincidence over the ciphertext, all in an effort to determine the size of the | |
XOR key used to encrypt the data. Here I stumbled a bit. First I didn't pay | |
careful enough attention and missed that the given ciphertext was base64 encoded. | |
I also didn't get the math quite right for building my byte arrays on which the | |
Hamming Distance would be calculated. But I eventually resolved those isses | |
after a day of banging my head on the desk trying to figure out what the issue | |
was. | |
Now I have a XOR brute force tool that can reliably calculate key size... well, | |
to an extent. These calculations are about probabilities not certainties, but | |
one thing I noticed right away was that longer keys are easier to calculate | |
with greater certainty, while shorter keys are more difficult. Here's some output | |
from my script that will show what I mean. I found it fascinating, but I've also | |
not been getting sufficient sleep of late -- thank you Matasano. | |
First, let's set a variable for our plaintext: | |
$data = "Dennis Fisher talks with Dan Kaminsky about the VENOM bug, the value of | |
virtual machine escapes, why everyone wants to make every bug the worst one of | |
all time or just a bunch of hype and what the Avengers have to do with | |
vulnerability disclosure." | |
Next, let's set our key: | |
$key = "eatoin shrdlu" | |
$key.length | |
13 | |
Let's XOR encrypt our plaintext with the key and get $cipherText: | |
$cipherText = .\XOR-Encrypt.ps1 -String $data -key $key | |
$cipherText (line breaks added) | |
21041a01001d003501010c090745151503021d000401060c4c31040f54240803491d1b191d4c14 | |
070e011b491a4816482421223a2841161a0e4200070017441a140914114f060800050100101914 | |
0941190e0a06491d0d52011f160411111c454e571b1152011a1017181b010c4e57120606174c01 | |
0a41190e020b00161e171615550714134f1d0645531f1d161f01450e1a0a49014653091e084c01 | |
0c0c114f061c00191d01104c14450301010a06001c0e520c1505004115010d4e571b090644181d | |
004135190c0047161a014404141304541b064e441c48050d181d45170103070b52120a1b080501 | |
1c4110061a0d4c1c1b0716095b | |
Let's run this through the XOR brute forcer to see what it says about the key | |
size: | |
XOR-Brutr.ps1 -MaxKeySize 40 -String $cipherText -Encoding base16 | Sort-Object AvgDist | ft * -AutoSize | |
KeySize AvgDist | |
------- ------- | |
13 2.74660633484163 | |
39 2.78461538461538 | |
26 2.81730769230769 | |
33 2.8989898989899 | |
40 2.905 | |
25 2.93 | |
38 2.94210526315789 | |
34 2.94607843137255 | |
35 2.95714285714286 | |
29 2.98522167487685 | |
37 2.99459459459459 | |
27 3 | |
28 3 | |
30 3.00952380952381 | |
31 3.01075268817204 | |
23 3.01449275362319 | |
8 3.01724137931034 | |
18 3.02314814814815 | |
22 3.03181818181818 | |
36 3.03888888888889 | |
20 3.04090909090909 | |
10 3.04347826086957 | |
24 3.0462962962963 | |
21 3.05238095238095 | |
14 3.05357142857143 | |
32 3.07291666666667 | |
12 3.10087719298246 | |
19 3.10526315789474 | |
15 3.10666666666667 | |
17 3.15837104072398 | |
16 3.21875 | |
11 3.23376623376623 | |
9 3.23931623931624 | |
5 3.25416666666667 | |
4 3.42916666666667 | |
7 3.44957983193277 | |
6 3.525 | |
3 3.82716049382716 | |
2 4.24590163934426 | |
Beautiful! The result correctly calculates that the key size is 13 bytes, but | |
watch what happens with smaller keys. | |
$key = "eatoin shrdl" | |
$key.length | |
12 | |
$cipherText = .\XOR-Encrypt.ps1 -String $data -key $key | |
XOR-Brutr.ps1 -MaxKeySize 40 -String $cipherText -Encoding base16 | Sort-Object AvgDist | ft * -AutoSize | |
KeySize AvgDist | |
------- ------- | |
24 2.58333333333333 | |
36 2.66111111111111 | |
38 2.85789473684211 | |
40 2.915 | |
19 2.95215311004785 | |
29 2.97536945812808 | |
22 3.00909090909091 | |
16 3.02232142857143 | |
23 3.03381642512077 | |
18 3.03703703703704 | |
37 3.03783783783784 | |
14 3.05357142857143 | |
30 3.05714285714286 | |
32 3.0625 | |
27 3.06481481481481 | |
12 3.06578947368421 | |
28 3.06632653061224 | |
25 3.08 | |
33 3.08585858585859 | |
13 3.1131221719457 | |
9 3.11965811965812 | |
31 3.12365591397849 | |
11 3.13419913419913 | |
35 3.13809523809524 | |
34 3.15196078431373 | |
20 3.15909090909091 | |
26 3.16826923076923 | |
39 3.18974358974359 | |
17 3.19004524886878 | |
21 3.1952380952381 | |
10 3.2 | |
15 3.24444444444444 | |
8 3.26293103448276 | |
5 3.35416666666667 | |
6 3.425 | |
7 3.50840336134454 | |
4 3.56666666666667 | |
3 3.9917695473251 | |
2 4.61475409836066 | |
Our top result is not 12, but notice the top two results are multiples | |
of 12 and 12 is significantly higher in the rankings than any key | |
lengths less than 12. Let's try a shorter key: | |
$key = "eatoin shrd" | |
$key.length | |
11 | |
$cipherText = .\XOR-Encrypt.ps1 -String $data -key $key | |
XOR-Brutr.ps1 -MaxKeySize 40 -String $cipherText -Encoding base16 | Sort-Object AvgDist | ft * -AutoSize | |
KeySize AvgDist | |
------- ------- | |
33 2.69191919191919 | |
22 2.70909090909091 | |
11 2.81385281385281 | |
18 2.97222222222222 | |
38 2.98421052631579 | |
23 3.0048309178744 | |
28 3.06632653061224 | |
15 3.07111111111111 | |
39 3.07179487179487 | |
24 3.0787037037037 | |
16 3.09821428571429 | |
31 3.10215053763441 | |
37 3.11351351351351 | |
32 3.11458333333333 | |
36 3.11666666666667 | |
40 3.12 | |
17 3.12669683257919 | |
13 3.12669683257919 | |
26 3.12980769230769 | |
35 3.13809523809524 | |
21 3.13809523809524 | |
34 3.15686274509804 | |
20 3.15909090909091 | |
29 3.17733990147783 | |
25 3.18 | |
30 3.18571428571429 | |
12 3.18859649122807 | |
27 3.19444444444444 | |
14 3.20089285714286 | |
10 3.20434782608696 | |
8 3.27155172413793 | |
19 3.27272727272727 | |
9 3.2948717948718 | |
7 3.47058823529412 | |
6 3.59583333333333 | |
5 3.62083333333333 | |
4 3.65416666666667 | |
3 4.12757201646091 | |
2 4.20491803278689 | |
Again our correct key size is not the top result, but the two top | |
results are multiples of the correct result and the correct result | |
follows immediately. What of a smaller key size? | |
$key = "eatoin sh" | |
$key.length | |
9 | |
$cipherText = .\XOR-Encrypt.ps1 -String $data -key $key | |
XOR-Brutr.ps1 -MaxKeySize 40 -String $cipherText -Encoding base16 | Sort-Object AvgDist | ft * -AutoSize | |
KeySize AvgDist | |
------- ------- | |
36 2.66111111111111 | |
27 2.75925925925926 | |
31 2.88709677419355 | |
9 2.8974358974359 | |
32 2.91666666666667 | |
38 2.92105263157895 | |
23 2.93719806763285 | |
22 2.95909090909091 | |
29 2.97044334975369 | |
18 2.99074074074074 | |
40 3.025 | |
37 3.02702702702703 | |
14 3.05803571428571 | |
30 3.07619047619048 | |
28 3.0765306122449 | |
20 3.08181818181818 | |
35 3.09047619047619 | |
34 3.10294117647059 | |
17 3.10407239819005 | |
26 3.11538461538462 | |
16 3.11607142857143 | |
15 3.12 | |
13 3.13122171945701 | |
25 3.16 | |
10 3.17391304347826 | |
21 3.2 | |
39 3.20512820512821 | |
33 3.20707070707071 | |
19 3.21052631578947 | |
11 3.22077922077922 | |
24 3.25925925925926 | |
5 3.37083333333333 | |
7 3.39915966386555 | |
12 3.41228070175439 | |
8 3.42672413793103 | |
6 3.45416666666667 | |
4 3.57916666666667 | |
3 3.98765432098765 | |
2 4.45491803278689 | |
Again, not the top result, but the top two are multiples and the | |
correct result follows closely. This pattern continues for small | |
keys. | |
$key = "eatoin s" | |
$key.length | |
8 | |
$cipherText = .\XOR-Encrypt.ps1 -String $data -key $key | |
XOR-Brutr.ps1 -MaxKeySize 40 -String $cipherText -Encoding base16 | Sort-Object AvgDist | ft * -AutoSize | |
KeySize AvgDist | |
------- ------- | |
24 2.58333333333333 | |
40 2.68 | |
32 2.72916666666667 | |
16 2.9375 | |
36 2.98333333333333 | |
39 2.98974358974359 | |
20 3.00454545454545 | |
21 3.00952380952381 | |
30 3.01428571428571 | |
34 3.01960784313725 | |
33 3.02020202020202 | |
28 3.04081632653061 | |
8 3.04741379310345 | |
12 3.08333333333333 | |
38 3.11052631578947 | |
29 3.14285714285714 | |
31 3.16129032258065 | |
25 3.17 | |
27 3.17592592592593 | |
37 3.18918918918919 | |
18 3.19444444444444 | |
23 3.19806763285024 | |
22 3.22272727272727 | |
19 3.24401913875598 | |
14 3.25446428571429 | |
35 3.26190476190476 | |
17 3.28054298642534 | |
11 3.28571428571429 | |
15 3.29777777777778 | |
13 3.30769230769231 | |
26 3.31730769230769 | |
9 3.32905982905983 | |
10 3.35652173913044 | |
6 3.36666666666667 | |
7 3.59663865546219 | |
4 3.61666666666667 | |
5 3.6375 | |
3 3.98353909465021 | |
2 4.68852459016393 | |
$key = "eato" | |
$cipherText = .\XOR-Encrypt.ps1 -String $data -key $key | |
XOR-Brutr.ps1 -MaxKeySize 40 -String $cipherText -Encoding base16 | Sort-Object AvgDist | ft * -AutoSize | |
KeySize AvgDist | |
------- ------- | |
24 2.58333333333333 | |
28 2.64285714285714 | |
36 2.66111111111111 | |
40 2.68 | |
32 2.72916666666667 | |
20 2.79545454545455 | |
30 2.85714285714286 | |
34 2.88725490196078 | |
16 2.9375 | |
29 2.94581280788177 | |
38 2.94736842105263 | |
39 2.94871794871795 | |
26 2.96153846153846 | |
22 3.00909090909091 | |
35 3.01428571428571 | |
33 3.02020202020202 | |
14 3.02232142857143 | |
27 3.02777777777778 | |
8 3.04741379310345 | |
12 3.06578947368421 | |
31 3.0752688172043 | |
13 3.09049773755656 | |
15 3.09333333333333 | |
21 3.09523809523809 | |
37 3.11891891891892 | |
25 3.12 | |
23 3.1207729468599 | |
18 3.14814814814815 | |
17 3.17647058823529 | |
11 3.17748917748918 | |
10 3.2 | |
19 3.21531100478469 | |
6 3.25 | |
9 3.28632478632479 | |
4 3.31666666666667 | |
7 3.31932773109244 | |
5 3.57083333333333 | |
3 3.77366255144033 | |
2 4.33196721311475 | |
What's the point of posting this? I guess the take away is that if | |
you're trying to brute force XOR keys by first determining the key | |
size, you can't automatically take the top result to be the correct | |
key size. Even Matasano's instructions state that this is about | |
probability not certainty, but the interesting property that they | |
don't discuss, probably because they want people to notice on their | |
own, is that for short keys, the top results may be multiples of | |
the actual key size, so be mindful of that pattern. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment