Running Chroma in Diffusers

You can convert Chroma into a Schnell-compatible safetensors file like this:

apt-get update && apt-get install aria2 -y && pip install safetensors

# Create Schnell-compatible variant of Chroma by downloading both Chroma and Schnell safetensor files, and copying Chroma's matching weights over to Schnell. This works because lodestone *distilled* the guidance layers instead of completely pruning them, so we can actually just use Schnell's guidance stuff. This comes at the cost of bloating the model back to Schnell's original size, but it's probably the easiest approach for diffusers compatbility for now.
CHROMA_VERSION="19"
aria2c -x 16 -s 16 -o "./chroma.safetensors" "https://huggingface.co/lodestones/Chroma/resolve/main/chroma-unlocked-v${CHROMA_VERSION}.safetensors?download=true"
aria2c -x 16 -s 16 -o "./flux1-schnell.safetensors" "https://huggingface.co/black-forest-labs/FLUX.1-schnell/resolve/main/flux1-schnell.safetensors?download=true"
python3 -c '
from safetensors import safe_open
from safetensors.torch import save_file
with safe_open("./chroma.safetensors", framework="pt", device="cpu") as chroma, safe_open("./flux1-schnell.safetensors", framework="pt", device="cpu") as schnell:
    chroma_tensors = {key: chroma.get_tensor(key) for key in chroma.keys()}
    schnell_tensors = {key: schnell.get_tensor(key) for key in schnell.keys()}
matching_keys = set(chroma_tensors).intersection(schnell_tensors)
for key in matching_keys:
    schnell_tensors[key] = chroma_tensors[key]
save_file(schnell_tensors, "./chroma-schnell-compat.safetensors")'

and then the only change needed for diffusers v0.32.2 support is to replace this file:

https://raw.githubusercontent.com/huggingface/diffusers/refs/tags/v0.32.2/src/diffusers/pipelines/flux/pipeline_flux.py

with this:

https://gist.githubusercontent.com/josephrocca/9c2c28efd5ffe810291750bd009419f4/raw/110b956c2e7ca9363e217abb63a1461aac6ab613/pipeline_flux.py

which just adds proper negative prompt/cfg support, and does Chroma's attention masking stuff on the T5 tokens (you can diff those two files to see changes, of course). So then inference is just:

from diffusers import FluxPipeline, FluxTransformer2DModel, FlowMatchEulerDiscreteScheduler
from transformers import T5EncoderModel

transformer = FluxTransformer2dModel.from_pretrained("./chroma-schnell-compat.safetensors")
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",
    text_encoder=None, # Chroma doesn't use CLIP
    transformer=transformer,
    torch_dtype=torch.bfloat16,
    scheduler=FlowMatchEulerDiscreteScheduler(use_dynamic_shifting=True, base_shift=0.5, max_shift=1.15, use_beta_sigmas=True),
).to("cuda")
# ...

josephrocca/chroma_schnell_compat_diffusers.md