Last active
February 2, 2025 11:05
C# data protection class that uses Blake3 to securely hash arbitrary UTF-8 strings with a data-specific key, so that protected data can still be queried with LIKE-style (substring) filters. Primarily for EF Core. Will add this to Klean.EntityFrameworkCore.DataProtection soon.
using System.Diagnostics;
using System.Security.Cryptography;
using System.Text;
using Blake3;

public static class LikenessFilterableHasher
{
    /// <summary>
    /// Splits a string into Unicode runes and yields each rune's UTF-8 encoding
    /// (one to four bytes per rune).
    /// </summary>
    private static IEnumerable<byte[]> EnumerateRuneChunks(this string input) => input
        .EnumerateRunes()
        .Select(rune => rune.ToString())
        .Select(Encoding.UTF8.GetBytes);

    /// <summary>
    /// Hashes a string rune by rune with keyed Blake3. Each rune is hashed on its own and
    /// its digest is truncated to the rune's UTF-8 byte length, so the hash of a substring
    /// is itself a substring of the hash of the whole string.
    /// </summary>
    public static string Hash(string input, byte[] key)
    {
        using var hasher = Hasher.NewKeyed(key);

        // A rune never needs more than four UTF-8 bytes, so four digest bytes are enough.
        Span<byte> hash = stackalloc byte[4];

        // The output has exactly as many bytes as the UTF-8 encoding of the input.
        var outputSize = input.EnumerateRuneChunks().Sum(rune => rune.Length);
        var output = new List<byte>(outputSize);

        foreach (var runeChunk in input.EnumerateRuneChunks())
        {
            hasher.Update(runeChunk);
            hasher.Finalize(0, hash);
            hasher.Reset(); // start fresh for the next rune

            // Keep only as many digest bytes as the rune has UTF-8 bytes.
            output.AddRange(hash[..runeChunk.Length]);
        }

        return Convert.ToHexString(output.ToArray());
    }
}
var key = "1e1e5f951d4538da069f2c38da069f2c"u8.ToArray(); // 32 ASCII bytes used as the key
var payload = RandomNumberGenerator.GetHexString(1_000_000);
var payloadHash = LikenessFilterableHasher.Hash(payload, key);
var queryHash = LikenessFilterableHasher.Hash(payload[..20], key);

// The hash of a substring must appear inside the hash of the full payload.
if (!payloadHash.Contains(queryHash))
    throw new Exception("payload hash does not contain query hash");
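The substring property checked above follows from hashing every rune independently and keeping exactly as many digest bytes as the rune has UTF-8 bytes. The sketch below illustrates that idea using only the .NET base class library: it swaps Blake3 for HMAC-SHA256 so it runs without extra packages, so it is a stand-in rather than the class above. `PerRuneHashSketch` and the sample strings are hypothetical names chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Usage: the hash of a substring is found inside the hash of the full string,
// including when some runes take more than one UTF-8 byte.
var key = RandomNumberGenerator.GetBytes(32);
var full = PerRuneHashSketch.Hash("Zoë Müller", key);
var part = PerRuneHashSketch.Hash("ë Mü", key);
Console.WriteLine(full.Contains(part));

// Hypothetical stand-in for LikenessFilterableHasher: HMAC-SHA256 replaces keyed Blake3.
public static class PerRuneHashSketch
{
    public static string Hash(string input, byte[] key)
    {
        var output = new List<byte>();
        foreach (var rune in input.EnumerateRunes())
        {
            // UTF-8 bytes of this single rune (one to four bytes).
            byte[] utf8 = Encoding.UTF8.GetBytes(rune.ToString());

            // Keyed digest of the rune on its own, independent of its neighbours.
            byte[] digest = HMACSHA256.HashData(key, utf8);

            // Truncate the digest to the rune's byte length so output and input sizes match.
            output.AddRange(digest[..utf8.Length]);
        }
        return Convert.ToHexString(output.ToArray());
    }
}
```

Because every rune maps to a fixed byte sequence under a given key, any contiguous run of runes maps to a contiguous run of output bytes, which is what makes a LIKE-style containment check on the stored hash possible.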
(somewhat inaccurate) benchmark
In this benchmark, the method hashes a payload of 1 million bytes in roughly 130 ms. Keep in mind that this is intended to protect sensitive personal data, such as full names, SSNs, personal IDs, and medical records, so the actual data may never reach a million bytes in production. Even if it does, for example with large medical records, hashing will still be relatively fast.
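A timing like this can be reproduced with a simple `Stopwatch` measurement. The following is a minimal sketch, not the original benchmark: it assumes the `LikenessFilterableHasher` class above is compiled into the same project with the Blake3 package referenced, and the warm-up call is an addition of this sketch. Results will vary with hardware and build configuration.

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography;

// Same key and payload shape as the usage example above.
var key = "1e1e5f951d4538da069f2c38da069f2c"u8.ToArray();
var payload = RandomNumberGenerator.GetHexString(1_000_000);

// Warm-up on a small slice so JIT compilation is not counted in the measurement.
LikenessFilterableHasher.Hash(payload[..1_000], key);

// Time a single hash of the full payload.
var stopwatch = Stopwatch.StartNew();
var payloadHash = LikenessFilterableHasher.Hash(payload, key);
stopwatch.Stop();

Console.WriteLine($"Hashed {payload.Length:N0} chars in {stopwatch.ElapsedMilliseconds} ms");
```

A single timed run is still only a rough figure; a dedicated benchmarking harness would give steadier numbers.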