@ddjerqq
Last active February 2, 2025 11:05
C# data protection class that uses keyed BLAKE3 to hash arbitrary UTF-8 strings, rune by rune, with a data-specific key, so that protected data can still be queried with LIKE-style (substring) filters. Primarily for EF Core. Will add this to Klean.EntityFrameworkCore.DataProtection soon.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using Blake3;
public static class LikenessFilterableHasher
{
    /// <summary>
    /// Splits a string into runes (Unicode scalar values) and yields the UTF-8 bytes of each rune
    /// (one to four bytes per rune).
    /// </summary>
    private static IEnumerable<byte[]> EnumerateRuneChunks(this string input) => input
        .EnumerateRunes()
        .Select(rune => rune.ToString())
        .Select(Encoding.UTF8.GetBytes);

    /// <summary>
    /// Hashes a string with keyed BLAKE3, one rune at a time. Each rune's digest is truncated to the
    /// rune's UTF-8 byte length, so the hash of a substring appears inside the hash of the full string.
    /// </summary>
    public static string Hash(string input, byte[] key)
    {
        using var hasher = Hasher.NewKeyed(key);
        Span<byte> hash = stackalloc byte[4];

        var outputSize = input.EnumerateRuneChunks().Sum(rune => rune.Length);
        var output = new List<byte>(outputSize);

        foreach (var runeChunk in input.EnumerateRuneChunks())
        {
            // Hash this rune on its own, then reset the hasher for the next rune.
            hasher.Update(runeChunk);
            hasher.Finalize(0, hash);
            hasher.Reset();

            // Keep as many digest bytes as the rune has UTF-8 bytes.
            output.AddRange(hash[..runeChunk.Length]);
        }

        return Convert.ToHexString(output.ToArray());
    }
}
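// Usage: the hash of a prefix of the payload is contained in the hash of the whole payload.
// Note: as top-level statements, these lines must precede any type declaration in the same file.
var key = "1e1e5f951d4538da069f2c38da069f2c"u8.ToArray(); // 32 bytes, the key length keyed BLAKE3 requires
var payload = RandomNumberGenerator.GetHexString(1_000_000);
var payloadHash = LikenessFilterableHasher.Hash(payload, key);
var queryHash = LikenessFilterableHasher.Hash(payload[..20], key);
if (!payloadHash.Contains(queryHash))
    throw new Exception("Payload hash does not contain the query hash");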

ddjerqq commented Feb 2, 2025

(somewhat inaccurate) benchmark

[image: benchmark results]

As you can see, this method hashes a payload of 1 million bytes in only 130 ms. Keep in mind that this is intended to protect sensitive personal data such as full names, SSNs, personal IDs, and medical records, so real payloads may never reach a million bytes in production. Even when they do, for example with large medical records, hashing will still be relatively fast.
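For anyone who wants to reproduce a rough timing like this, here is a minimal Stopwatch-based sketch. `TimeHash` and `StandInHash` are hypothetical names that are not part of the gist, and the stand-in hash (a whole-string HMAC-SHA256) is there only so the snippet runs without the Blake3 package. It is a single-run measurement, not a proper benchmark.

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography;
using System.Text;

// Hypothetical helper: times one call of any (input, key) -> hex string hash function.
static TimeSpan TimeHash(Func<string, byte[], string> hash, string payload, byte[] key)
{
    // Warm-up call so that JIT compilation is not part of the measurement.
    hash(payload, key);

    var stopwatch = Stopwatch.StartNew();
    hash(payload, key);
    stopwatch.Stop();
    return stopwatch.Elapsed;
}

// Stand-in hash (assumption): whole-string HMAC-SHA256, used only to keep the sketch self-contained.
// To time the hasher from the gist, pass LikenessFilterableHasher.Hash instead.
static string StandInHash(string input, byte[] key) =>
    Convert.ToHexString(HMACSHA256.HashData(key, Encoding.UTF8.GetBytes(input)));

var key = RandomNumberGenerator.GetBytes(32);
var payload = RandomNumberGenerator.GetHexString(1_000_000); // requires .NET 8 or later
var elapsed = TimeHash(StandInHash, payload, key);
Console.WriteLine($"Hashed {payload.Length:N0} characters in {elapsed.TotalMilliseconds:F1} ms");
```

A single timed run like this is noisy; a benchmarking harness that repeats the measurement would give more trustworthy numbers.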

Because this uses BLAKE3, it's incredibly fast, but the algorithm can easily be swapped for HMACSHA256 or others.
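As a rough illustration of that swap, here is a minimal sketch of the same per-rune scheme built on `HMACSHA256` from `System.Security.Cryptography`. The function name `HashWithHmacSha256` is hypothetical and not part of the gist. The sketch assumes the same truncation rule as the BLAKE3 version, keeping as many digest bytes as the rune has UTF-8 bytes.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;
using System.Text;

// Hypothetical HMAC-SHA256 variant of the per-rune hasher (a sketch, not the gist's implementation).
static string HashWithHmacSha256(string input, byte[] key)
{
    using var hmac = new HMACSHA256(key);
    var output = new List<byte>(Encoding.UTF8.GetByteCount(input));
    Span<byte> runeBytes = stackalloc byte[4];

    foreach (var rune in input.EnumerateRunes())
    {
        // Encode the rune as UTF-8 (one to four bytes).
        int length = rune.EncodeToUtf8(runeBytes);

        // MAC this rune on its own; ComputeHash leaves the instance ready for the next call.
        byte[] digest = hmac.ComputeHash(runeBytes[..length].ToArray());

        // Keep as many digest bytes as the rune has UTF-8 bytes.
        output.AddRange(digest[..length]);
    }

    return Convert.ToHexString(output.ToArray());
}

var key = RandomNumberGenerator.GetBytes(32);
var full = HashWithHmacSha256("Jane Doe, 123-45-6789", key);
var part = HashWithHmacSha256("123-45", key);
Console.WriteLine(full.Contains(part)); // True: the substring's hash is embedded in the full hash
```

Because each rune is hashed independently with the same key, the substring property does not depend on which keyed hash is used, only on the per-rune truncation.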
