- Use your application extensively to build intuition about failure modes
- Define 3-4 dimensions based on observed or anticipated failures
- Create structured tuples covering your priority failure scenarios
- Generate natural language queries from each tuple using a separate LLM call (see the sketch after this list)
- Scale to more examples across your most important failure hypotheses (we suggest at least ~100)
- Test and iterate on the most critical failure modes first, and generate more until you reach theoretical saturation
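A minimal sketch of the tuple-to-query step above, assuming an OpenAI-style chat client and a hypothetical three-dimension tuple (persona, scenario, tone); swap in the dimensions from your own failure hypotheses:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical dimension tuples covering priority failure scenarios
tuples = [
    {"persona": "first-time user", "scenario": "ambiguous refund request", "tone": "frustrated"},
    {"persona": "power user", "scenario": "bulk export of records", "tone": "terse"},
]

def tuple_to_query(t):
    "Ask the model to write one realistic user query for a dimension tuple."
    prompt = (
        "Write one realistic user query for an assistant.\n"
        f"Persona: {t['persona']}\nScenario: {t['scenario']}\nTone: {t['tone']}\n"
        "Return only the query text."
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any capable chat model works
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

queries = [tuple_to_query(t) for t in tuples]
```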
import json
import os
from getpass import getpass
from io import StringIO

import openai
import opentelemetry
import pandas as pd
from openai import OpenAI
from openinference.instrumentation.openai import OpenAIInstrumentor
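A sketch of how these imports are typically wired together; the key-prompt handling is illustrative, and the instrumentor is shown with default settings:

```python
# Prompt for the API key only if it isn't already set (illustrative setup)
if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass("OpenAI API key: ")

# Instrument the OpenAI client so each completion call emits a trace span
OpenAIInstrumentor().instrument()

client = OpenAI()
```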
Question: Should I avoid using RAG for my AI application after reading that "RAG is dead" for coding agents?
Many developers are confused about when and how to use RAG after reading articles claiming "RAG is dead." Understanding what RAG actually means versus the narrow marketing definitions will help you make better architectural decisions for your AI applications.
Answer: The viral article claiming RAG is dead specifically argues against using naive vector database retrieval for autonomous coding agents, not RAG as a whole. This is a crucial distinction that many developers miss due to misleading marketing.
RAG simply means Retrieval-Augmented Generation - using retrieval to provide relevant context that improves your model's output. The core principle remains essential: your LLM needs the right context to generate accurate answers. The question isn't whether to use retrieval, but how to retrieve effectively.
For coding
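As a concrete illustration of the retrieval principle described above, here is a minimal sketch; the retriever, prompt, and model are hypothetical and not a recommendation of any particular vector store:

```python
from openai import OpenAI

client = OpenAI()

def answer_with_context(question, retrieve):
    "Retrieval-Augmented Generation in its most general form: retrieve, then generate."
    # `retrieve` can be anything that returns relevant text: grep over a repo,
    # a SQL query, an API call, or a vector search; the principle is the same.
    context = "\n\n".join(retrieve(question))
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model
        messages=[{
            "role": "user",
            "content": f"Answer using this context:\n{context}\n\nQuestion: {question}",
        }],
    )
    return resp.choices[0].message.content
```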
from fasthtml.common import *
import csv
import io
from datetime import datetime

# Add DaisyUI and TailwindCSS via CDN
tw_styles = Script(src="https://cdn.tailwindcss.com")

# Configure application with DaisyFT resources
app, rt, db, DataItem = fast_app(
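    # The call is truncated in the original; a plausible completion, assuming a
    # SQLite file and a simple item schema (all names below are illustrative):
    "data.db",
    hdrs=(tw_styles,),              # serve the Tailwind script on every page
    name=str, value=str, pk="name",
)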
Below is a summary of diverse use cases where companies fine-tuned large language models (LLMs) to solve business challenges that previous methods struggled with. Each case highlights the challenge, the fine-tuning approach, and the key results achieved.
Summary of Fine-Tuning Success Cases
| Use Case | Key Results | Source Link |
|---|---|---|
| Wealth Management Assistant (Finance) | 98% advisor adoption; document access up from 20% to 80% | OpenAI & Morgan Stanley |
| Insurance Claims AI (Insurance) | 30% accuracy improvement vs. generic LLMs | [Insurance News (EXL)](https://www.insurancenews.c |
from html2text import HTML2Text
from textwrap import dedent
import re

def get_md(cts, extractor='h2t'):
    "Convert HTML contents `cts` to markdown using html2text."
    h2t = HTML2Text(bodywidth=5000)   # avoid hard-wrapping long lines
    # Drop links and images, and mark code sections with [code]...[/code]
    h2t.ignore_links, h2t.mark_code, h2t.ignore_images = (True,)*3
    res = h2t.handle(cts)
    # Replace the [code] markers html2text emits with fenced code blocks
    def _f(m): return f'```\n{dedent(m.group(1))}\n```'
    return re.sub(r'\[code]\s*\n(.*?)\n\[/code]', _f, res or '', flags=re.DOTALL).strip()
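A quick usage sketch; the URL is illustrative:

```python
import urllib.request

# Fetch a page and convert it to markdown
with urllib.request.urlopen("https://example.com") as resp:
    html = resp.read().decode("utf-8")

print(get_md(html)[:500])   # preview the first 500 characters
```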
def follow_user_follows(client, target_user):
    "Follow everyone the target_user is following."
    cursor = None
    total_followed = 0
    while True:
        # Step 1: Fetch a batch of accounts the target user is following
        # https://docs.bsky.app/docs/api/app-bsky-graph-get-follows
        response = client.app.bsky.graph.get_follows({
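            # Truncated in the original; a plausible completion using the documented
            # getFollows parameters (the limit value is an assumption):
            "actor": target_user,
            "cursor": cursor,
            "limit": 100,
        })
        # Step 2: Follow each account in this page, then advance the cursor
        # (uses the atproto client's follow() convenience helper)
        for profile in response.follows:
            client.follow(profile.did)
            total_followed += 1
        cursor = response.cursor
        if not cursor:          # no cursor means we've reached the last page
            break
    return total_followed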