@ericflo
ericflo / start_with_judge.md
Last active August 31, 2025 06:43
Start With the Judge: A Practical Blueprint for Dataset Engineering - I rambled a 12-minute voice note, had Gemini transcribe it, then had GPT-5 rewrite it into a nice blog-post format

Start With the Judge: A Practical Blueprint for Dataset Engineering

TL;DR: Most LLM “training” work is really dataset engineering: defining the task, crafting a tiny set of crystalline examples, and building a reliable judge that can score outputs. If you start by perfecting the judge and then use it to drive generation, selection, and reinforcement learning—plus a few pragmatic guardrails—you can turn a tinkery, manual grind into a repeatable pipeline (and eventually, an automated agentic system).
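
To make the judge-first loop concrete, here is a minimal sketch (mine, not from the gist; the endpoint URL, model name, and 0-10 JSON rubric are all assumptions) of an LLM judge plus judge-driven best-of-N selection against an OpenAI-compatible server:

import json
import requests

API_URL = "http://127.0.0.1:8083/v1/chat/completions"  # assumed local vLLM-style server
MODEL = "meta-llama/Meta-Llama-3.1-70B-Instruct"       # assumed judge model

def judge(task: str, candidate: str) -> float:
    """Score a candidate 0-10 with an LLM judge; the rubric lives in the prompt."""
    prompt = (
        "Rate the following answer to the task on a 0-10 scale. "
        'Reply with JSON: {"score": <number>}.\n\n'
        f"Task: {task}\n\nAnswer: {candidate}"
    )
    resp = requests.post(API_URL, json={
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,
    })
    content = resp.json()["choices"][0]["message"]["content"]
    return float(json.loads(content)["score"])  # assumes the judge emits clean JSON

def best_of_n(task: str, candidates: list[str]) -> str:
    """Judge-driven selection: sample N candidates elsewhere, keep the top scorer."""
    return max(candidates, key=lambda c: judge(task, c))

Once a judge like this agrees with you on a tiny hand-built set, the same function can filter generated data or serve as a reward signal for reinforcement learning.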


The pattern today (and why it’s exhausting)

You notice a recurring failure: maybe the model keeps botching a class of SQL problems (“write a query that does X with window functions”), or it can’t follow a bespoke DSL, or it slips out of character in voice-constrained writing. The current playbook looks like this:

@ericflo
ericflo / NO_MOCKS_POLICY.md
Created August 3, 2025 19:10
Written by Claude Opus, endorsed by me

NO MOCKS, NO FALLBACKS, FAIL FAST, FAIL LOUD

This codebase follows a strict policy:

🚫 NO MOCKS

  • Mock objects create false confidence
  • They diverge from production behavior
  • They hide real integration issues
  • They make debugging harder
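
A minimal sketch of the policy in practice (my illustration, not part of the gist; the DATABASE_URL variable is an assumption):

import os
import logging

logging.basicConfig(level=logging.INFO)

def get_db_url() -> str:
    url = os.environ.get("DATABASE_URL")
    if not url:
        # Fail fast, fail loud: crash at startup rather than fall back to a
        # mock or silent default that would hide the real integration issue.
        raise RuntimeError("DATABASE_URL is not set; refusing to start")
    logging.info("Connecting to the real database")
    return url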

#!/usr/bin/env python3
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "datasets",
#     "numpy",
#     "requests",
#     "tqdm",
#     "pyarrow"
# ]
# ///
def disk_cache(cache_dir: str = ".cache"):
    """
    Decorator that implements disk caching for functions using JSON.

    NOTE: This is adapted for async usage: we await the wrapped function,
    but the file I/O remains synchronous for simplicity.

    Args:
        cache_dir: Directory to store cache files
    """
    def decorator(func):
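        # NOTE: the gist preview truncates here. What follows is a hedged
        # completion sketch (my guess at the body, not the gist's actual code):
        # hash the call into a JSON cache filename, return cached results on a
        # hit, otherwise await the function and persist its output.
        import functools, hashlib, json, os

        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            os.makedirs(cache_dir, exist_ok=True)
            key = hashlib.sha256(
                json.dumps([args, kwargs], sort_keys=True, default=str).encode()
            ).hexdigest()
            path = os.path.join(cache_dir, f"{key}.json")
            if os.path.exists(path):
                with open(path) as f:
                    return json.load(f)  # cache hit: skip the awaited call
            result = await func(*args, **kwargs)
            with open(path, "w") as f:
                json.dump(result, f)  # cache miss: persist for next run
            return result
        return wrapper
    return decorator

# Usage sketch (hypothetical): decorating an async function with @disk_cache()
# memoizes its JSON-serializable results across runs.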
import copy
import json
import os
import logging
import random
import traceback
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
import fire
@ericflo
ericflo / create_pairs.py
Last active September 15, 2024 05:29
RLAIF Steering Tokens
# create_pairs.py
import argparse
import copy
import json
import random
from tqdm import tqdm
from datasets import load_dataset
from vllm import LLM, SamplingParams
from transformers import AutoTokenizer
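
The preview shows only the imports. As a hedged reconstruction (not the gist's actual code; the model name and prompt are placeholders), pair creation for RLAIF typically samples two completions per prompt and stores them for later preference labeling, reusing the imports above:

llm = LLM(model="meta-llama/Meta-Llama-3.1-8B-Instruct")  # assumed model
params = SamplingParams(n=2, temperature=0.9, max_tokens=512)

prompts = ["Write a SQL query using window functions."]  # stand-in for a real dataset
pairs = []
for output in llm.generate(prompts, params):
    completions = [o.text for o in output.outputs]  # n=2 samples per prompt
    pairs.append({"prompt": output.prompt,
                  "completion_a": completions[0],
                  "completion_b": completions[1]})

with open("pairs.jsonl", "w") as f:
    for pair in pairs:
        f.write(json.dumps(pair) + "\n")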
@ericflo
ericflo / chalice_of_light.md
Created September 1, 2024 20:45
Llama 405 Base is very creative...

Movie Title: The Chalice of Light

Length:

2 hours and 45 minutes

Supporting Evidence -- Reason for Virality:

Many people believe in various kinds of supernatural phenomena for a variety of reasons, reasons I believe are not yet totally understood by academics, and certainly not by the general public. However, the fact that people believe in these things is by itself an untapped power.

# Deploy 1x AMD MI300X
# python -m vllm.entrypoints.openai.api_server --port 8083 --host 127.0.0.1 --model meta-llama/Meta-Llama-3.1-70B-Instruct --max-model-len 120000
# NUM_WORKERS=32 MODEL_NAME="meta-llama/Meta-Llama-3.1-70B-Instruct" OPENAI_API_URL="http://127.0.0.1:8083/v1" python agent_instruction_database.py
import copy
import os
import json
import traceback
import random
from pprint import pprint
# MODEL_NAME="meta-llama/Meta-Llama-3.1-8B-Instruct" OPENAI_API_URL="http://localhost:1234/v1" OPENAI_API_TOKEN="..." python general_function_calling.py
# MODEL_NAME="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo" OPENAI_API_URL="https://api.together.xyz/v1" OPENAI_API_TOKEN="$TOGETHER_API_KEY" python general_function_calling.py
# MODEL_NAME="meta-llama/Meta-Llama-3.1-70B-Instruct-Turbo" OPENAI_API_URL="https://api.together.xyz/v1" OPENAI_API_TOKEN="$TOGETHER_API_KEY" python general_function_calling.py
import copy
import os
import traceback
import json
import re
import inspect