#!/usr/bin/env python3
"""
Human quality transcripts from audio files using
AssemblyAI for transcription and Google's Gemini for enhancement.
Requirements:
- AssemblyAI API key (https://www.assemblyai.com/)
- Google API key (https://aistudio.google.com/)
- Python packages: assemblyai, google-generativeai, pydub
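The body of the script isn't captured above, so here is a minimal sketch of the two-step flow the header describes, assuming the standard assemblyai and google-generativeai client APIs (the Gemini model name and cleanup prompt are illustrative, and the pydub audio-chunking step is omitted):

```python
import assemblyai as aai
import google.generativeai as genai

aai.settings.api_key = "ASSEMBLYAI_API_KEY"  # placeholder
genai.configure(api_key="GOOGLE_API_KEY")    # placeholder

# Step 1: get a raw transcript from AssemblyAI.
transcript = aai.Transcriber().transcribe("episode.mp3")

# Step 2: ask Gemini to polish it to human quality (prompt is illustrative).
model = genai.GenerativeModel("gemini-1.5-pro")
response = model.generate_content(
    "Clean up this raw transcript: fix punctuation, casing, and obvious "
    "mis-transcriptions without changing the meaning.\n\n" + transcript.text
)
print(response.text)
```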
@natolambert
natolambert / skyworks-rewardbench-contamination.md
Last active February 18, 2025 03:56
MagPie RewardBench Contamination (found through SkyWorks Preferences)

Recently, I learned that some of the top reward models on RewardBench were trained on a preference dataset that is unintentionally contaminated with the benchmark. The dataset, Skyworks Preferences 80k, picked up the contamination by mixing in a Magpie dataset. Magpie is a new method for having language models generate instructions by prompting them with an empty chat template. The contaminated source mixed into the Skyworks dataset is Argilla/magpie-ultra-v0.1, generated with Llama 3.1 405B Instruct. I would never have expected a Magpie dataset to be contaminated.
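To make the method concrete, here is a minimal sketch of Magpie-style generation, assuming a Llama-3-style chat template via transformers (the model name, template string, and sampling settings are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Meta-Llama-3.1-8B-Instruct"  # assumed; any Llama 3.1 Instruct model works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# The "empty" chat template: everything up to where a user message would start.
pre_query = "<|begin_of_text|><|start_header_id|>user<|end_header_id|>\n\n"
inputs = tokenizer(pre_query, return_tensors="pt", add_special_tokens=False)

# The model completes the template with a plausible user instruction,
# which becomes a synthetic prompt for training data.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=1.0)
instruction = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True
)
print(instruction)
```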

What seems likely is that Meta trained on some of these prompts, but the exact provenance of each prompt needs more examination. For example, we learned that some of the prompts we used in our LLMBar subsets were taken from popular training sets like Al

from typing import Dict, List

from rich.console import Console
from rich.panel import Panel
from datasets import load_dataset


def print_hf_messages(messages: List[Dict[str, str]]):
    console = Console()
    colors = ["red", "green"]
    color_idx = 0
    console.rule(f"[bold yellow]The number of turns is {len(messages)}")
    # Assumed completion (the excerpt cuts off here): print each turn in a
    # panel, alternating border colors between roles.
    for message in messages:
        console.print(Panel(message["content"], title=message["role"], border_style=colors[color_idx]))
        color_idx = (color_idx + 1) % len(colors)
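A hypothetical usage example; the snippet imports load_dataset but the call isn't captured in the excerpt, and the dataset id and column name here are assumptions:

```python
# Assumed: the Skywork preference dataset id and its "chosen" message column.
dataset = load_dataset("Skywork/Skywork-Reward-Preference-80K-v0.1", split="train")
print_hf_messages(dataset[0]["chosen"])
```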
@hamelsmu
hamelsmu / is_fine_tuning_valuable.md
Last active April 4, 2024 01:22
My thoughts re: Is fine tuning still valuable?

Here is my personal opinion about the questions I posed in this tweet:


I think that fine-tuning is still very valuable in many situations. I’ve done some more digging, and I find that the people who say fine-tuning isn’t useful are indeed often working on products where fine-tuning isn’t likely to be useful:

  • They are making developer tools, where foundation models have already been trained extensively on coding tasks.
  • They are building foundation models and testing for the most general cases. But the foundation models themselves are also being trained for those most general cases.
  • They are building a personal assistant that isn’t scoped to any particular domain or use case, which is essentially the same problem the folks building foundation models are working on.