Created
December 14, 2024 20:12
-
-
Save wangqr/01fdc84aeb6319d5ca7e99c984bef35b to your computer and use it in GitHub Desktop.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"id": "aa51b010-4f4d-4cf5-b85d-7e307e3f3663", | |
"metadata": {}, | |
"source": [ | |
"# The original algorithm" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "17df1353-18d9-4224-84f3-0c54dc85f594", | |
"metadata": {}, | |
"source": [ | |
"First, let's recreate the algorithm used by https://everyuuid.com/" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"id": "6036e5c6-cecd-4661-ae01-1f9fdd653b0a", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"0 497dcba3-ecbf-4587-a2dd-5eb0665e6880\n", | |
"1 50e14f43-dd4e-412f-864d-78943ea28d91\n", | |
"2 7edb3b2e-869c-485b-af70-76a934e0fcfd\n", | |
"3 67e32b59-3348-4dc3-9645-75c60b6f50cc\n", | |
"4 4c8f6d82-e4c6-4478-92eb-d9342500f006\n", | |
"5 7472cba2-6037-488f-b5aa-53b1c39fe450\n", | |
"6 6b72e68d-e596-4e11-a190-bedbded40cc2\n", | |
"7 331541d6-617d-4464-b7d0-9b346b87f41c\n" | |
] | |
} | |
], | |
"source": [ | |
"ROUND_CONSTANTS = [\n", | |
" 0x47f5417d6b82b5d1,\n", | |
" 0x90a7c5fe8c345af2,\n", | |
" 0xd8796c3b2a1e4f8d,\n", | |
" 0x6f4a3c8e7d5b9102,\n", | |
" 0xb3f8c7d6e5a49201,\n", | |
" 0x2d9e8b7c6f5a3d4e,\n", | |
" 0xa1b2c3d4e5f6789a,\n", | |
" 0x123456789abcdef0,\n", | |
"]\n", | |
"\n", | |
"ROUNDS_USED = 4\n", | |
"UUID_ENTROPY = 122\n", | |
"BIT_MASK = (1 << UUID_ENTROPY // 2) - 1\n", | |
"\n", | |
"def feistelRound(block: int, r: int) -> int:\n", | |
" mixed = block ^ ROUND_CONSTANTS[r] & BIT_MASK\n", | |
" mixed = (mixed << 7 | mixed >> 54) & BIT_MASK\n", | |
" mixed = mixed * 0x6c8e944d1f5aa3b7 & BIT_MASK\n", | |
" mixed = (mixed << 13 | mixed >> 48) & BIT_MASK\n", | |
" return mixed\n", | |
" \n", | |
"def encrypt(index: int) -> int:\n", | |
" left = index >> 61\n", | |
" right = index & BIT_MASK\n", | |
" \n", | |
" for r in range(ROUNDS_USED):\n", | |
" left, right = right, left ^ feistelRound(right, r)\n", | |
"\n", | |
" return left << 61 | right\n", | |
"\n", | |
"def origIndexToUUID(index: int) -> str:\n", | |
" c = encrypt(index)\n", | |
" result = (c >> 74 << 80\n", | |
" | 4 << 76\n", | |
" | (c >> 62 & (1 << 12) - 1) << 64\n", | |
" | 2 << 62\n", | |
" | c & (1 << 62) - 1)\n", | |
" \n", | |
" h = '{:032x}'.format(result)\n", | |
" return f'{h[:8]}-{h[8:12]}-{h[12:16]}-{h[16:20]}-{h[20:]}'\n", | |
"\n", | |
"def indexToUUID(index: int) -> str:\n", | |
" return origIndexToUUID(index)\n", | |
" \n", | |
"for i in range(8):\n", | |
" print(i, indexToUUID(i))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "3d7e7870-c015-4264-8877-97a8dbf53501", | |
"metadata": {}, | |
"source": [ | |
"# Simplifying the algorithm" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "958ba7a3-49f8-4555-b019-8ae56cccb8ff", | |
"metadata": {}, | |
"source": [ | |
"In order to make the this easily invertible, we want to make the `encrypt` function linear in $GF(2)$ by removing the multiplication. We also remove the `ROUND_CONSTANTS` for simplicity, and we'll compensate for this later." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"id": "5f9e2834-851d-4c94-b7ca-41e2b7a5d331", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"0 00000000-0000-4000-8000-000000000000\n", | |
"1 80000000-0000-4000-8000-010000080001\n", | |
"2 00000000-0000-4000-a000-020000100002\n", | |
"3 80000000-0000-4000-a000-030000180003\n", | |
"4 00000000-0000-4001-8000-040000200004\n", | |
"5 80000000-0000-4001-8000-050000280005\n", | |
"6 00000000-0000-4001-a000-060000300006\n", | |
"7 80000000-0000-4001-a000-070000380007\n" | |
] | |
} | |
], | |
"source": [ | |
"def feistelRound(block: int, r: int) -> int:\n", | |
" mixed = block & BIT_MASK\n", | |
" mixed = (mixed << 7 | mixed >> 54) & BIT_MASK\n", | |
" # mixed = mixed * 0x6c8e944d1f5aa3b7 & BIT_MASK\n", | |
" mixed = (mixed << 13 | mixed >> 48) & BIT_MASK\n", | |
" return mixed\n", | |
"\n", | |
"for i in range(8):\n", | |
" print(i, indexToUUID(i))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "ce1d924b-2482-4e80-9a48-28b536f7e377", | |
"metadata": {}, | |
"source": [ | |
"Now the encrypt function is just a linear transform $y = Ax + b$ on $GF(2)$, where $x$ and $y$ are 122-vector, and $A$ is a $122\\times122$ matrix. Since we removed `ROUND_CONSTANTS`, vector $b$ is now $0$. We can also see this from the fact that the 0-th UUID is all zero.\n", | |
"\n", | |
"To compensate for $b = 0$, a character-wise mapping on each position of the UUID is performed, but for now we simply skip this step. We'll come back to this later. An `INDEX_OFFSET` for the uuid index is also set, allow us to shift indecies around." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"id": "d5d8e2df-a0c3-4ca0-b0f0-bf37552f6d2e", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"INDEX_OFFSET = 0\n", | |
"\n", | |
"def enc_uuid(uuid: str) -> str:\n", | |
" return uuid\n", | |
"\n", | |
"def dec_uuid(uuid: str) -> str:\n", | |
" return uuid\n", | |
"\n", | |
"def indexToUUID(index: int) -> str:\n", | |
" return enc_uuid(origIndexToUUID((index + INDEX_OFFSET) & (1 << UUID_ENTROPY) - 1))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "b6402d54-3475-43c2-a774-f9be76d09de7", | |
"metadata": {}, | |
"source": [ | |
"# Implementing the search" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "529c4c41-8062-4ce9-a9f3-c5869386ac2e", | |
"metadata": {}, | |
"source": [ | |
"Now with a given $y$, we want to find $x$ such that $y = Ax$ on $GF(2)$, and $x$ is the next (or previous) one from a given number. If the output $y$ is only partially given, we remove the unknown elements from $y$ and corresponding rows in $A$, making $A$ a non full rank matrix, and the equation under-determined. To do this, we solve $x$ with Gaussian elimination for as many elemets as possible, and try the remaining elements one by one.\n", | |
"\n", | |
"First, we implement Gaussian elimination on $GF(2)$:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"id": "dfc8013e-6b4d-4f19-a2a2-feeaf7a3b0c6", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[1, 0, 1, 0]\n", | |
"[None, 1, None, None]\n", | |
"None\n", | |
"[1, None, None, None]\n", | |
"[None, None, None, 1]\n", | |
"[None, None, None, 1]\n" | |
] | |
} | |
], | |
"source": [ | |
"# Part 4 is too slow. This solver is not used. See the solve function in next cell\n", | |
"def solve(A: list[int], y: list[int], len_x: int):\n", | |
" \"\"\" Solve y = Ax on GF(2) using Gaussian elimination.\n", | |
" Each element of y is 0 or 1.\n", | |
" Each element of A is a row in the matrix, len(A) == len(y), A[i] <= (1 << len_x - 1).\n", | |
" Returns either an array of len_x, each element is either 0, 1, or None meaning unknown;\n", | |
" or None when y = Ax has no solution.\"\"\"\n", | |
" # 1. The stop condition\n", | |
" for ai, yi in zip(A, y):\n", | |
" if ai == 0 and yi != 0:\n", | |
" return None\n", | |
" if len_x == 0:\n", | |
" # Every row in A is 0 if len_x == 0, and we already checked yi\n", | |
" return []\n", | |
" \n", | |
" # 2. Find any row in A, such that the last coefficient is 1\n", | |
" r = -1\n", | |
" for i, ai in enumerate(A):\n", | |
" if ai & 1:\n", | |
" r = i\n", | |
" break\n", | |
"\n", | |
" # 2. if no such row is found, then x[0] can be any value\n", | |
" if r < 0:\n", | |
" ret = solve([ai >> 1 for ai in A], y, len_x - 1)\n", | |
" if ret is None:\n", | |
" return None\n", | |
" else:\n", | |
" return [None] + ret\n", | |
"\n", | |
" # 3. if such row is found, we can eliminate the last element of x\n", | |
" ar, yr = A[r], y[r]\n", | |
" eli_A, eli_y = [], []\n", | |
" for ai, yi in zip(A, y):\n", | |
" if ai & 1:\n", | |
" eli_A.append((ai ^ ar) >> 1)\n", | |
" eli_y.append(yi ^ yr)\n", | |
" else:\n", | |
" eli_A.append(ai >> 1)\n", | |
" eli_y.append(yi)\n", | |
"\n", | |
" # 4. Solve the equation without last element of x\n", | |
" # TODO: Here we simply try both x[0] == 0 and x[0] == 1, but there should be a better way\n", | |
" ret0 = solve(eli_A + [ar >> 1], eli_y + [yr], len_x - 1)\n", | |
" ret1 = solve(eli_A + [ar >> 1], eli_y + [yr ^ 1], len_x - 1)\n", | |
" if ret0 is None:\n", | |
" if ret1 is None:\n", | |
" return None\n", | |
" return [1] + ret1\n", | |
" if ret1 is None:\n", | |
" return [0] + ret0\n", | |
"\n", | |
" # 5. If both x[0] == 0 and x[0] == 1 have solution, just combine them\n", | |
" ret = [None]\n", | |
" for xi0, xi1 in zip(ret0, ret1):\n", | |
" if xi0 is not None and xi0 == xi1:\n", | |
" ret.append(xi0)\n", | |
" else:\n", | |
" ret.append(None)\n", | |
" return ret\n", | |
" \n", | |
"\n", | |
"# Test cases:\n", | |
"# A = Identity matrix\n", | |
"print(solve([1, 2, 4, 8], [1, 0, 1, 0], 4))\n", | |
"# Only second element is constrained\n", | |
"print(solve([2, 2], [1, 1], 4))\n", | |
"# No solution\n", | |
"print(solve([3, 3], [0, 1], 4))\n", | |
"# x[1] and x[2] must be equal, but none of them can be determined\n", | |
"print(solve([6, 7], [0, 1], 4))\n", | |
"print(solve([6, 14], [0, 1], 4))\n", | |
"print(solve([6, 14], [0, 1], 4))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"id": "55a3592e-c451-464e-b2ef-cca6648811ac", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[1, 0, 1, 0]\n", | |
"[None, 1, None, None]\n", | |
"None\n", | |
"[None, None, None, None]\n", | |
"[None, None, None, 1]\n" | |
] | |
} | |
], | |
"source": [ | |
"def solve(A: list[int], y: list[int], len_x: int):\n", | |
" \"\"\" Solve y = Ax on GF(2) using Gaussian elimination.\n", | |
" Each element of y is 0 or 1.\n", | |
" Each element of A is a row in the matrix, len(A) == len(y), A[i] <= (1 << len_x - 1).\n", | |
" Returns either an array of len_x, each element is either 0, 1, or None meaning unknown;\n", | |
" or None when y = Ax has no solution.\"\"\"\n", | |
" # 1. The stop condition\n", | |
" for ai, yi in zip(A, y):\n", | |
" if ai == 0 and yi != 0:\n", | |
" return None\n", | |
" if len_x == 0:\n", | |
" # Every row in A is 0 if len_x == 0, and we already checked yi\n", | |
" return []\n", | |
" \n", | |
" # 2. Find any row in A, such that the first coefficient is 1\n", | |
" r = -1\n", | |
" last = len_x - 1\n", | |
" for i, ai in enumerate(A):\n", | |
" if ai & 1:\n", | |
" r = i\n", | |
" break\n", | |
"\n", | |
" # 2. if no such row is found, then x[0] can be any value\n", | |
" if r < 0:\n", | |
" ret = solve([ai >> 1 for ai in A], y, len_x - 1)\n", | |
" if ret is None:\n", | |
" return None\n", | |
" else:\n", | |
" return [None] + ret\n", | |
"\n", | |
" # 3. if such row is found, we can eliminate x[0]\n", | |
" ar, yr = A[r], y[r]\n", | |
" eli_A, eli_y = [], []\n", | |
" for ai, yi in zip(A, y):\n", | |
" if ai & 1:\n", | |
" if ai == ar:\n", | |
" if yi != yr:\n", | |
" return None\n", | |
" else:\n", | |
" eli_A.append((ai ^ ar) >> 1)\n", | |
" eli_y.append(yi ^ yr)\n", | |
" else:\n", | |
" eli_A.append(ai >> 1)\n", | |
" eli_y.append(yi)\n", | |
"\n", | |
" # 4. Solve the equation without last element of x\n", | |
" ret = solve(eli_A, eli_y, len_x - 1)\n", | |
" if ret is None:\n", | |
" return None\n", | |
"\n", | |
" # 5. Calculating x[0] using y_r = A_r * x\n", | |
" x = yr\n", | |
" for i in range(1, len_x):\n", | |
" if ar >> i & 1:\n", | |
" if ret[i-1] is None:\n", | |
" # x[i] is unknown. This doesn't necessarily mean we cannot determine x[0], but for simplicity we skip that part\n", | |
" # TODO: calculate x[0] properly\n", | |
" return [None] + ret\n", | |
" x ^= ret[i-1]\n", | |
" return [x] + ret\n", | |
" \n", | |
"\n", | |
"# Test cases:\n", | |
"# A = Identity matrix\n", | |
"print(solve([1, 2, 4, 8], [1, 0, 1, 0], 4))\n", | |
"# Only second element is constrained\n", | |
"print(solve([2, 2], [1, 1], 4))\n", | |
"# No solution\n", | |
"print(solve([3, 3], [0, 1], 4))\n", | |
"# x[1] and x[2] must be equal, but none of them can be determined\n", | |
"print(solve([6, 7], [0, 1], 4))\n", | |
"print(solve([6, 14], [0, 1], 4))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "be9c6728-11e6-4aa6-aee6-1f67cab74d7c", | |
"metadata": {}, | |
"source": [ | |
"Next, we implement the solution searching:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"id": "61cd60d1-9023-40ac-a9d2-e05a578dffab", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[1, 0, 1, 0]\n", | |
"[1, 0, 1, 0]\n", | |
"[0, 1, 0, 0]\n", | |
"[1, 1, 0, 0]\n", | |
"None\n", | |
"[1, 0, 0, 0]\n", | |
"[1, 0, 0, 0]\n" | |
] | |
} | |
], | |
"source": [ | |
"def vec_comp(x, y):\n", | |
" \"\"\"\n", | |
" Compare 2 vectors element-wise, from right to left. Returns 1 if x > y, -1 if x < y, 0 if x == y\n", | |
" x, y should have same length, and only contains 0 or 1.\n", | |
" \"\"\"\n", | |
" for xi, yi in zip(reversed(x), reversed(y)):\n", | |
" if xi != yi:\n", | |
" return xi - yi\n", | |
" return 0\n", | |
"\n", | |
"def find_next(A: list[int], y: list[int], x0: list[int], d: int):\n", | |
" # Acp, ycp, x0cp = A.copy(), y.copy(), x0.copy()\n", | |
" # print(f'find_next({A}, {''.join(str(xx) for xx in y)}, {''.join(str(xx) for xx in x0)}, {d})')\n", | |
" \"\"\"\n", | |
" Find the next solution of y=Ax, starting from x0. Comparison is using vec_comp, therefore x[0] is LSB and x[-1] is MSB.\n", | |
" d is direction, 1 meaning x > x0, or -1 meaning x < x0.\n", | |
" \"\"\"\n", | |
" # 1. Solve the partial x\n", | |
" x = solve(A, y, len(x0))\n", | |
"\n", | |
" # 2. Since our encrypt function is invertible, y=Ax should always have solution, but we check it nonetheless\n", | |
" if x is None:\n", | |
" return None\n", | |
"\n", | |
" # 3. If x is solved, just return x\n", | |
" if None not in x:\n", | |
" return x\n", | |
"\n", | |
" # 4. find the last index of None in x\n", | |
" i = len(x) - 1 - x[::-1].index(None)\n", | |
" # print(f'find_next({A}, {''.join(str(xx) for xx in y)}, {''.join(str(xx) for xx in x0)}, {d}), x = {x}, i = {i}')\n", | |
"\n", | |
" # 5. Now there are 2 cases, either the beginning part of x and x0 are the same, or different\n", | |
" \"\"\"\n", | |
" reversed(x0) = 1 1 0 1 0 0\n", | |
" reversed(x) = 1 1 * * 0 1\n", | |
" ^^^ Check if this part is equal\n", | |
" \"\"\"\n", | |
" if x0[i+1:] != x[i+1:]:\n", | |
" # This is the simple case where beginning part is different: just set x[i] depending on search direction\n", | |
" A.append(1 << i)\n", | |
" y.append(0 if d > 0 else 1)\n", | |
" return find_next(A, y, x0, d)\n", | |
" else:\n", | |
" # If they are equal, we try set x[i] = x0[i] first, and check if it is really in the correct direction\n", | |
" \"\"\"\n", | |
" reversed(x0) = 1 1 0 1 0 0\n", | |
" reversed(x) = 1 1 * * 0 1\n", | |
" ^ Setting this value may cause later values become determined\n", | |
" We solve the equation again after setting the value of x[i], there may be 2 outcomes:\n", | |
" reversed(x*) = 1 1 0 1 0 1 <== If x* is greater than x0, we are good, just return x*\n", | |
" reversed(x*) = 1 1 0 0 0 1 <== If x* is less than x0, we'd better use the x[i] != x0[i] solution\n", | |
" \"\"\"\n", | |
" A.append(1 << i)\n", | |
" y.append(x0[i])\n", | |
" Acopy, ycopy = A.copy(), y.copy()\n", | |
" xs = find_next(A, y, x0, d)\n", | |
" # print(f'find_next({A}, {''.join(str(xx) for xx in y)}, {''.join(str(xx) for xx in x0)}, {d}), x = {x}, i = {i}, xs = {''.join(str(xx) for xx in xs)}')\n", | |
" if vec_comp(xs, x0) == d:\n", | |
" return xs\n", | |
" ycopy[-1] ^= 1\n", | |
" return find_next(Acopy, ycopy, x0, d)\n", | |
" \n", | |
"\n", | |
"# Test cases:\n", | |
"# A = Identity matrix\n", | |
"print(find_next([1, 2, 4, 8], [1, 0, 1, 0], [0, 0, 0, 0], 1))\n", | |
"print(find_next([1, 2, 4, 8], [1, 0, 1, 0], [1, 0, 1, 0], -1))\n", | |
"# Only second element is constrained\n", | |
"print(find_next([2, 2], [1, 1], [0, 0, 0, 0], 1))\n", | |
"print(find_next([2, 2], [1, 1], [1, 0, 1, 0], -1))\n", | |
"# No solution\n", | |
"print(find_next([3, 3], [0, 1], [0, 0, 0, 0], 1))\n", | |
"# x[1] and x[2] must be equal, but none of them can be determined\n", | |
"print(find_next([6, 7], [0, 1], [0, 0, 0, 0], 1))\n", | |
"print(find_next([6, 6], [0, 0], [1, 0, 1, 0], -1))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "fdbd6dc3-8a24-41a3-8a66-7517a2b75319", | |
"metadata": {}, | |
"source": [ | |
"# Put things together" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "27f663d7-e72b-4f6f-b910-b3ae5e99be80", | |
"metadata": {}, | |
"source": [ | |
"First, we generate the full matrix A (when all 122 bits in y is determined)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"id": "2f021146-1f07-4787-9055-94a1abf69269", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"matA = [0] * UUID_ENTROPY\n", | |
"for i in range(UUID_ENTROPY):\n", | |
" column_i = encrypt(1 << i)\n", | |
" for j in range(UUID_ENTROPY):\n", | |
" if column_i >> j & 1:\n", | |
" matA[j] |= 1 << i" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "a92c6bca-2511-4314-9052-2a2251d228f2", | |
"metadata": {}, | |
"source": [ | |
"Second, define helper functions to convert from numerical index to vector x and vice versa. Remember that `x[0]` is LSB and `x[-1]` is MSB" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"id": "2167a0d8-5640-4a7e-9e6c-5c755b3d178d", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[1, 1, 0, 0]\n", | |
"5\n" | |
] | |
} | |
], | |
"source": [ | |
"def to_vec(idx: int, length: int):\n", | |
" return [idx >> i & 1 for i in range(length)]\n", | |
"\n", | |
"def from_vec(vec: list[int]):\n", | |
" l = len(vec)\n", | |
" return sum(1 << i for i, b in enumerate(vec) if b)\n", | |
"\n", | |
"\n", | |
"# Test cases:\n", | |
"print(to_vec(3, 4))\n", | |
"print(from_vec([1, 0, 1, 0]))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "890e86f8-835f-4a98-b55d-03cc258c58bc", | |
"metadata": {}, | |
"source": [ | |
"Third, define helper function to convert a partially given uuid to y with element 0, 1, None. Remember that `y[0]` should be LSB, and corresponds to right most character in UUID." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"id": "4c1fd577-9654-44c6-bc56-d6ecbe0bae95", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None, None]\n", | |
"[1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]\n", | |
"True\n" | |
] | |
} | |
], | |
"source": [ | |
"def base16_to_vec(ch: str):\n", | |
" if ch == '*':\n", | |
" return [None, None, None, None]\n", | |
" else:\n", | |
" val = int(ch, 16)\n", | |
" return [val >> i & 1 for i in range(3, -1, -1)]\n", | |
"\n", | |
"def part_uuid_to_y(uuid: str):\n", | |
" # ********-****-4***-8***-************\n", | |
" if len(uuid) != 36 or any(uuid[i] not in '*-' for i in (8, 13, 18, 23)) or uuid[14] not in '*4' or uuid[19] not in '*89abAB':\n", | |
" # Invalid UUID\n", | |
" return None\n", | |
" no_dash = uuid[:8] + uuid[9:13] + uuid[14:18] + uuid[19:23] + uuid[24:]\n", | |
" try:\n", | |
" vec = sum((base16_to_vec(ch) for ch in no_dash), [])\n", | |
" except ValueError:\n", | |
" # Non base16 character\n", | |
" return None\n", | |
" vec = vec[:48] + vec[52:64] + vec[66:]\n", | |
" vec.reverse()\n", | |
" return vec\n", | |
" \n", | |
"\n", | |
"# Test cases:\n", | |
"print(part_uuid_to_y('********-****-****-****-************'))\n", | |
"print(part_uuid_to_y('00000000-0000-4000-8000-000000000001'))\n", | |
"print(''.join(str(x) for x in reversed(part_uuid_to_y(indexToUUID(1)))) == bin(encrypt(1))[2:])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "041fe171-1328-42e2-8fb8-c7587ab06138", | |
"metadata": {}, | |
"source": [ | |
"Now, we can search for next UUID matching certain pattern:" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"id": "89c191c4-88b0-4a45-8d04-e63127657cdd", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"12346\n", | |
"12344\n", | |
"1 80000000-0000-4000-8000-010000080001\n", | |
"5070604818769168521263589621760 80000000-0000-4001-8000-010000080001\n", | |
"10141209637538337042527179243523 80000000-0000-4002-8000-010000080001\n", | |
"15211814456307505563790768865282 80000000-0000-4003-8000-010000080001\n", | |
"20282419275076674085054358487045 80000000-0000-4004-8000-010000080001\n", | |
"25353024093845842606317948108804 80000000-0000-4005-8000-010000080001\n", | |
"30423628912615011127581537730567 80000000-0000-4006-8000-010000080001\n", | |
"35494233731384179648845127352326 80000000-0000-4007-8000-010000080001\n", | |
"40564838550153348170108716974089 80000000-0000-4008-8000-010000080001\n", | |
"45635443368922516691372306595848 80000000-0000-4009-8000-010000080001\n", | |
"50706048187691685212635896217611 80000000-0000-400a-8000-010000080001\n", | |
"55776653006460853733899485839370 80000000-0000-400b-8000-010000080001\n", | |
"60847257825230022255163075461133 80000000-0000-400c-8000-010000080001\n", | |
"65917862643999190776426665082892 80000000-0000-400d-8000-010000080001\n", | |
"70988467462768359297690254704655 80000000-0000-400e-8000-010000080001\n", | |
"76059072281537527818953844326414 80000000-0000-400f-8000-010000080001\n", | |
"309466121274148353914263936 cc000199-aaaa-467f-baaa-aaaaaaaaaaaa\n" | |
] | |
} | |
], | |
"source": [ | |
"def find_next_uuid_idx(uuid: str, idx: int, d: int):\n", | |
" # 1. Convert idx to vector x0, convert uuid to vector y and matrix A\n", | |
" x0 = to_vec(idx + INDEX_OFFSET, UUID_ENTROPY)\n", | |
"\n", | |
" y, A = [], []\n", | |
" y_none = part_uuid_to_y(dec_uuid(uuid))\n", | |
" if y_none is None:\n", | |
" # Invalid uuid\n", | |
" return None\n", | |
" for i, yi in enumerate(y_none):\n", | |
" if yi is not None:\n", | |
" y.append(yi)\n", | |
" A.append(matA[i])\n", | |
"\n", | |
" # 2. Solve for x\n", | |
" x = find_next(A, y, x0, d)\n", | |
"\n", | |
" # 3. Convert x to numerical index\n", | |
" return from_vec(x) - INDEX_OFFSET & (1 << UUID_ENTROPY) - 1\n", | |
"\n", | |
"\n", | |
"# Test cases:\n", | |
"print(find_next_uuid_idx('********-*********-****-************', 12345, 1))\n", | |
"print(find_next_uuid_idx('********-****-*********-************', 12345, -1))\n", | |
"\n", | |
"i = 0\n", | |
"for _ in range(16):\n", | |
" i = find_next_uuid_idx('80000000-0000-400*-8000-010000080001', i, 1)\n", | |
" print(i, indexToUUID(i))\n", | |
"\n", | |
"i = find_next_uuid_idx('********-aaaa-4***-*aaa-aaaaaaaaa***', 12345, 1)\n", | |
"print(i, indexToUUID(i))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "a08e89f1-9622-4693-8e3b-6b4f0cedf6b8", | |
"metadata": {}, | |
"source": [ | |
"Add pattern matching" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"id": "7e76c214-c605-4bd3-bb3b-27680d967fae", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"773378 00000000-002f-4340-abcd-025e681bcd02\n", | |
"773379 80000000-002f-4340-abcd-035e6813cd03\n", | |
"773382 00000000-002f-4341-abcd-065e683bcd06\n", | |
"773383 80000000-002f-4341-abcd-075e6833cd07\n", | |
"773386 00000000-002f-4342-abcd-0a5e685bcd0a\n", | |
"773387 80000000-002f-4342-abcd-0b5e6853cd0b\n", | |
"773390 00000000-002f-4343-abcd-0e5e687bcd0e\n", | |
"773391 80000000-002f-4343-abcd-0f5e6873cd0f\n", | |
"773394 00000000-002f-4344-abcd-125e689bcd12\n", | |
"773395 80000000-002f-4344-abcd-135e6893cd13\n", | |
"773398 00000000-002f-4345-abcd-165e68bbcd16\n", | |
"773399 80000000-002f-4345-abcd-175e68b3cd17\n", | |
"773402 00000000-002f-4346-abcd-1a5e68dbcd1a\n", | |
"773403 80000000-002f-4346-abcd-1b5e68d3cd1b\n", | |
"773406 00000000-002f-4347-abcd-1e5e68fbcd1e\n", | |
"773407 80000000-002f-4347-abcd-1f5e68f3cd1f\n", | |
"None\n" | |
] | |
} | |
], | |
"source": [ | |
"def find_next_pattern(pattern: str, idx: int, d: int):\n", | |
" pad_length = len(indexToUUID(0)) - len(pattern)\n", | |
" ret = None\n", | |
" for left in range(pad_length + 1):\n", | |
" uuid = '*' * left + pattern + '*' * (pad_length - left)\n", | |
" if part_uuid_to_y(uuid) is None:\n", | |
" # Not a valid uuid, skip\n", | |
" continue\n", | |
" i = find_next_uuid_idx(uuid, idx, d)\n", | |
" if i is not None:\n", | |
" if ret is None:\n", | |
" ret = i\n", | |
" else:\n", | |
" ret_distance = (ret - idx) * d % (1 << UUID_ENTROPY)\n", | |
" i_distance = (i - idx) * d % (1 << UUID_ENTROPY)\n", | |
" if i_distance < ret_distance:\n", | |
" ret = i\n", | |
" return ret\n", | |
"\n", | |
"\n", | |
"# Test cases:\n", | |
"i = 0\n", | |
"for _ in range(16):\n", | |
" i = find_next_pattern('abcd-', i, 1)\n", | |
" print(i, indexToUUID(i))\n", | |
"print(find_next_pattern('This is not an UUID', i, 1))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "0f73bbc2-bed5-4e72-bc98-74e51e35c12a", | |
"metadata": {}, | |
"source": [ | |
"Now, everything is working, except... the UUID doesn't look random. So the last part is to make it feels random again.\n", | |
"\n", | |
"We start by using a better feistelRound function. The simplified one just shifted bits around, we want a more complicated (but still linear) one." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"id": "b3ba7744-aa71-47f1-9b10-fdedc6b2a241", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"0 00000000-0000-4000-8000-000000000000\n", | |
"1 85647b3d-7e46-4224-ad8b-a963dc4298e5\n", | |
"2 735ae2ca-49db-4936-9192-a00a15dc532c\n", | |
"3 f63e99f7-379d-4b12-bc19-0969c99ecbc9\n", | |
"4 163f103a-3dce-45cf-8fdb-5bb91398017f\n", | |
"5 935b6b07-4388-47eb-a250-f2dacfda999a\n", | |
"6 6565f2f0-7415-4cf9-9e49-fbb306445253\n", | |
"7 e00189cd-0a53-4edd-b3c2-52d0da06cab6\n" | |
] | |
} | |
], | |
"source": [ | |
"import random\n", | |
"random.seed(42)\n", | |
"\n", | |
"matFR = []\n", | |
"for _ in range(ROUNDS_USED):\n", | |
" # Just generate a random matrix of size 61x61\n", | |
" mat = [random.randint(0, BIT_MASK) for __ in range(UUID_ENTROPY // 2)]\n", | |
" matFR.append(mat)\n", | |
" \n", | |
"\n", | |
"def feistelRound(block: int, r: int) -> int:\n", | |
" # mixed is dot product of matFR (61x61 matrix) and block (61 vector) on GF(2)\n", | |
" mixed = 0\n", | |
" for i in range(UUID_ENTROPY // 2):\n", | |
" if block >> i & 1:\n", | |
" mixed ^= matFR[r][i]\n", | |
" return mixed\n", | |
"\n", | |
"for i in range(8):\n", | |
" print(i, indexToUUID(i))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "008a4f6f-7fa3-4248-8cd3-11721ec1980e", | |
"metadata": {}, | |
"source": [ | |
"This already looks pretty random to me, except the 0-th. We can map each character in the output UUID to another one, to make the 0-th feels more random." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"id": "ad1f6ed7-a1dd-46fa-ae0d-13a091d5c865", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"['afec627d03459b18', 'e2046537af9b81dc', '0b8f531cd967e42a', '39f7b45216c80ade', 'c6b319fd4e5702a8', 'da518ef0673b9c24', 'd2b8a49c736fe105', '8e73d9a650c21bf4', '8549bdf03c17a26e', 'f7c01436925a8bed', 'a5b74df31ec20986', '6b74e98c1fd2530a', '0123456789abcdef', '4f0d8e2716ca9b35', 'a20471f6bdc8953e', '718c69a3bdf25e04', '012345679a8bcdef', '2f3e81c06d547ab9', '9b4e713dfc08a256', '4102f9ab7d3c5e86', '2906b5e4d718cfa3', '578ae426b039f1cd', '5f6b74d803c2a1e9', '0ad876f354cb9e12', '97be10f32c4a856d', 'e1a2748d6b35f90c', '6e98b214fac70d53', '478e5bd96321fac0', 'd05b76a2ce1489f3', 'e0b26ac17d9534f8', 'c5ab78013e62d49f', '3cab17240d965ef8']\n", | |
"ae03cdd8-8fa6-44a7-9294-25509e64dec3\n", | |
"00000000-0000-4000-8000-000000000000\n", | |
"0e03cdd8-8fa6-44a*-9294-27509e66decc\n", | |
"051bdb8b-0e48-4006-8afc-10d85fb8e797\n", | |
"a5b82b72-8e41-44aa-9565-95150c76f1e4\n", | |
"051bdb8b-0e48-4006-8afc-10d85fb8e797\n", | |
"0 ae03cdd8-8fa6-44a7-9294-25509e64dec3\n", | |
"1 051bdb8b-0e48-4006-8afc-10d85fb8e797\n", | |
"2 d43ca5ec-b292-464a-afc0-155c74df62a5\n", | |
"3 83fde756-96e3-4a28-b7bd-20d48bac85dd\n", | |
"4 f3fe6d8c-9b00-4e94-992c-592472a6d018\n", | |
"5 3438fbd6-b011-4732-8314-381c8cd2ede9\n", | |
"6 75148558-0159-49ed-ab7d-392898b56b8b\n", | |
"7 1e0947eb-85d4-435e-bea0-5810536d8922\n" | |
] | |
} | |
], | |
"source": [ | |
"char_mapping = []\n", | |
"for i in range(32):\n", | |
" if i == 8+4:\n", | |
" chars = '4'\n", | |
" mapping = ''.join(random.sample(list(chars), k=len(chars)))\n", | |
" char_mapping.append('0123' + mapping + '56789abcdef')\n", | |
" elif i == 8+4+4:\n", | |
" chars = '89ab'\n", | |
" mapping = ''.join(random.sample(list(chars), k=len(chars)))\n", | |
" char_mapping.append('01234567' + mapping + 'cdef')\n", | |
" else:\n", | |
" chars = '0123456789abcdef'\n", | |
" char_mapping.append(''.join(random.sample(list(chars), k=len(chars))))\n", | |
"print(char_mapping)\n", | |
"\n", | |
"def enc_uuid(uuid: str) -> str:\n", | |
" ret = ''\n", | |
" i = 0\n", | |
" for idx, ch in enumerate(uuid):\n", | |
" if idx in (8, 13, 18, 23):\n", | |
" ret += ch\n", | |
" elif ch == '*':\n", | |
" ret += ch\n", | |
" i += 1\n", | |
" else:\n", | |
" ret += char_mapping[i][int(ch, 16)]\n", | |
" i += 1\n", | |
" return ret\n", | |
"\n", | |
"def dec_uuid(uuid: str) -> str:\n", | |
" ret = ''\n", | |
" i = 0\n", | |
" for idx, ch in enumerate(uuid):\n", | |
" if idx in (8, 13, 18, 23):\n", | |
" ret += ch\n", | |
" elif ch == '*':\n", | |
" ret += ch\n", | |
" i += 1\n", | |
" else:\n", | |
" ret += hex(char_mapping[i].index(ch))[2:]\n", | |
" i += 1\n", | |
" return ret\n", | |
"\n", | |
" \n", | |
"# Test cases:\n", | |
"e = enc_uuid('00000000-0000-4000-8000-000000000000')\n", | |
"print(e)\n", | |
"print(dec_uuid(e))\n", | |
"print(enc_uuid('80000000-0000-400*-8000-010000080001'))\n", | |
"print(indexToUUID(1))\n", | |
"print(enc_uuid(indexToUUID(1)))\n", | |
"print(dec_uuid(enc_uuid(indexToUUID(1))))\n", | |
"\n", | |
"for i in range(8):\n", | |
" print(i, indexToUUID(i))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "5e83fd1a-c843-4768-ab4d-b4bc4e55a068", | |
"metadata": {}, | |
"source": [ | |
"Finally, we give `INDEX_OFFSET` a non-zero value, regenerate `matA` (because we modified `feistelRound`) and verify that everything is still working" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"id": "0e06a311-b58d-44c8-8969-22f849242ee6", | |
"metadata": {}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"INDEX_OFFSET = 4381685760849678301109426716265418720\n", | |
"0 3765bb4e-96ab-45a3-879a-bcca4a76d767\n", | |
"1 f0ef9d91-b24c-4b0c-9ffe-a6ad3032ee73\n", | |
"2 16a00732-0e9d-4241-bacf-accbad15653d\n", | |
"3 7291756a-8fe5-4829-a2b6-b6a5f59d8205\n", | |
"4 02963b92-850a-4c9b-8e2e-94f5a194dd49\n", | |
"5 a6aa1d4a-011f-4135-9b1a-8f8bf918e0b8\n", | |
"6 80e7276e-b05e-4de0-b376-84fd473f69f2\n", | |
"7 d7625531-9bd7-4f5f-a9af-9f8a367c8b5b\n", | |
"5316911983139663491615228241121370056 52e853a8-3edc-4b92-abcd-c4845d8951e6\n", | |
"5316911983139663491615228241121359891 98de5d40-856e-456d-abcd-51f188bf6f42\n", | |
"5316911983139663491615228241121354717 583b7dc7-a33c-4612-abcd-540602143db1\n", | |
"5316911983139663491615228241121348614 92b37392-bc9e-4a3d-abcd-c1ed1ae1ce1d\n", | |
"5316911983139663491615228241121279244 0c5a7c73-abcd-44ad-9e23-1fb264c27b61\n", | |
"5316911983139663491615228241121272693 b723cd2a-e266-424c-abcd-74b41d92d662\n", | |
"5316911983139663491615228241121266478 496bc3d4-0bd4-48b1-abcd-214108272496\n", | |
"5316911983139663491615228241121255264 b9ce63bd-7796-40cc-abcd-24a6824a99cd\n", | |
"5316911983139663491615228241121250533 56cf9802-abcd-4b4d-a12f-caefd2e9a397\n", | |
"5316911983139663491615228241121244987 47f86d8b-9434-4421-abcd-71cd5a6d1271\n", | |
"5316911983139663491615228241121235060 3c3ec611-bcd9-4865-abcd-391ba766393f\n", | |
"5316911983139663491615228241121228847 10b8c5e9-a368-4290-abcd-ea29444ec2db\n", | |
"5316911983139663491615228241121225825 30e3650f-8539-4435-abcd-e963ee2056ac\n", | |
"5316911983139663491615228241121215546 1cdb6656-3e98-4010-abcd-3a70b1956405\n", | |
"5316911983139663491615228241121154422 0c924013-abcd-40d4-a8ba-886f646f9271\n", | |
"5316911983139663491615228241121144009 8fcb557c-9467-45b6-abcd-193bb7eb9d2b\n", | |
"113351995395451788984974967212902327 80000000-0000-400a-8000-01000008000f\n", | |
"543255708970482847490837046476818500 80000000-0000-400a-8000-010000080004\n", | |
"721824138952761890388094133374939715 80000000-0000-400a-8000-01000008000d\n", | |
"1171532081527850628199350933356003389 80000000-0000-400a-8000-010000080008\n", | |
"1350101392924637000018162642316655162 80000000-0000-400a-8000-01000008000c\n", | |
"1739115139246809930864102719202866633 80000000-0000-400a-8000-010000080000\n", | |
"2226625229836017120751613689207263182 80000000-0000-400a-8000-010000080002\n", | |
"2320296806464883735543219163397796915 80000000-0000-400a-8000-010000080009\n", | |
"2807808174608237342957968428826791476 80000000-0000-400a-8000-010000080001\n", | |
"3248745558660405265251327760040477127 80000000-0000-400a-8000-01000008000e\n", | |
"3427313592503097172491178817421310912 80000000-0000-400a-8000-01000008000b\n", | |
"3783562687275318351580946875376323006 80000000-0000-400a-8000-010000080007\n", | |
"3962132394811724915621981785768686521 80000000-0000-400a-8000-010000080006\n", | |
"4360881737396401003349403319192468554 80000000-0000-400a-8000-01000008000a\n", | |
"4848391431846021057579517021679191629 80000000-0000-400a-8000-010000080005\n", | |
"4942753006531381865407668313617907120 80000000-0000-400a-8000-010000080003\n", | |
"42\n" | |
] | |
} | |
], | |
"source": [ | |
"INDEX_OFFSET = random.randrange(1 << UUID_ENTROPY)\n", | |
"print(f'INDEX_OFFSET = {INDEX_OFFSET}')\n", | |
"\n", | |
"matA = [0] * UUID_ENTROPY\n", | |
"for i in range(UUID_ENTROPY):\n", | |
" column_i = encrypt(1 << i)\n", | |
" for j in range(UUID_ENTROPY):\n", | |
" if column_i >> j & 1:\n", | |
" matA[j] |= 1 << i\n", | |
"\n", | |
"\n", | |
"# Test cases:\n", | |
"for i in range(8):\n", | |
" print(i, indexToUUID(i))\n", | |
"\n", | |
"i = 0\n", | |
"for _ in range(16):\n", | |
" i = find_next_pattern('abcd-', i, -1)\n", | |
" print(i, indexToUUID(i))\n", | |
"for _ in range(16):\n", | |
" i = find_next_pattern('80000000-0000-400a-8000-01000008000', i, 1)\n", | |
" print(i, indexToUUID(i))\n", | |
"\n", | |
"# Make sure we can still decode UUID\n", | |
"print(find_next_pattern(indexToUUID(42), i, -1))" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3 (ipykernel)", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.12.7" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 5 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
The linear transform part does not have to be a feistel structure. We can use any 122x122 full rank matrix on GF(2) as
matA
and it will still work