@winklerj
winklerj / VSDD.md
Created March 4, 2026 14:17 — forked from dollspace-gay/VSDD.md
Verified Spec-Driven Development

Verified Spec-Driven Development (VSDD)

The Fusion: VDD × TDD × SDD for AI-Native Engineering

Overview

Verified Spec-Driven Development (VSDD) is a unified software engineering methodology that fuses three proven paradigms into a single AI-orchestrated pipeline:

  • Spec-Driven Development (SDD): Define the contract before writing a single line of implementation. Specs are the source of truth.
  • Test-Driven Development (TDD): Tests are written before code. Red → Green → Refactor. No code exists without a failing test that demanded it.
winklerj / ai_evals_synthetic_data_hw.py
Created May 27, 2025 19:35 — forked from skylarbpayne/ai_evals_synthetic_data_hw.py
AI Evals Synthetic Data Homework 1
"""Script for generating queries for a recipe search engine.
This script can be used to generate synthetic queries for a recipe search engine.
Following best practices from Hamel and Shreya's AI Evals course, we:
- Generate a set of dimensions that can be used to generate queries; these are attributes that significantly change what the query is about or how it is written.
- For each set of attributes ("Dimensions") we generate a query that matches those attributes.
To ensure that the synthetic generation is better aligned, we first try handwriting the queries using the --manual flag.
This gives us labeled examples to use as few-shot examples in our synthetic generation.
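
The dimension-based approach described in this docstring could be sketched roughly as follows. This is a minimal illustration and not the script's actual code: the dimension names and values, the helper names `dimension_tuples` and `build_prompt`, and the prompt wording are all assumptions made for the example. The LLM call itself is left out, so the sketch only builds the prompts.

```python
from itertools import product

# Hypothetical dimensions for a recipe search engine (not from the original script).
DIMENSIONS = {
    "cuisine": ["italian", "thai"],
    "dietary_restriction": ["none", "vegan"],
    "query_style": ["keyword", "full question"],
}


def dimension_tuples(dimensions):
    """Return every combination of dimension values as a list of dicts."""
    names = list(dimensions)
    value_lists = (dimensions[name] for name in names)
    return [dict(zip(names, values)) for values in product(*value_lists)]


def build_prompt(combo, few_shot_examples):
    """Render a prompt asking an LLM for one query that matches `combo`.

    `few_shot_examples` stands in for the handwritten queries collected
    with the --manual flag mentioned in the docstring.
    """
    lines = ["Write one recipe search query matching these attributes:"]
    lines += [f"- {name}: {value}" for name, value in combo.items()]
    if few_shot_examples:
        lines.append("Examples of handwritten queries:")
        lines += [f"- {example}" for example in few_shot_examples]
    return "\n".join(lines)


combos = dimension_tuples(DIMENSIONS)
prompt = build_prompt(combos[0], ["vegan pad thai no peanuts"])
```

Enumerating the full cross product keeps coverage explicit: each generated query can be traced back to the attribute combination that produced it, which makes gaps in the synthetic set easier to spot.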

Generating Synthetic Data for LLM Evaluation

Synthetic data generation is not about creating random test cases. It's about systematically surfacing specific failure modes in your LLM application.

Start with Real Usage, Not Synthetic Data

Before generating any synthetic data, use your application yourself. Try different scenarios, edge cases, and realistic workflows. If you can't use it extensively, recruit 2-3 people to test it while you observe their interactions.

Generate Data to Test Specific Hypotheses

<TITLE>

Problem Statement

Requirements

Functional Requirements

Stevey's Google Platforms Rant

I was at Amazon for about six and a half years, and now I've been at Google for that long. One thing that struck me immediately about the two companies -- an impression that has been reinforced almost daily -- is that Amazon does everything wrong, and Google does everything right. Sure, it's a sweeping generalization, but a surprisingly accurate one. It's pretty crazy. There are probably a hundred or even two hundred different ways you can compare the two companies, and Google is superior in all but three of them, if I recall correctly. I actually did a spreadsheet at one point but Legal wouldn't let me show it to anyone, even though recruiting loved it.

I mean, just to give you a very brief taste: Amazon's recruiting process is fundamentally flawed by having teams hire for themselves, so their hiring bar is incredibly inconsistent across teams, despite various efforts they've made to level it out. And their operations are a mess; they don't real