Imagine you're a data scientist with a powerful script that processes images using machine learning. Locally, it works perfectly on your laptop with 10 sample images. But now you need to process 10,000 images, and you need serious GPU power.
The traditional path is painful:
- Set up cloud infrastructure (AWS/GCP)
- Configure Docker containers
- Manage dependencies and environments
- Upload data to cloud storage
- Write deployment scripts
- Handle output collection
- Debug networking and permissions
The combination of UV Scripts and HF Jobs eliminates most of this complexity.
UV is a blazingly fast Python package manager written in Rust that introduces a game-changing concept: self-contained executable scripts. Think of it as turning Python scripts into something closer to compiled binaries - they carry their own dependencies.
Traditional Python script:
# requirements.txt needed separately
# Virtual environment setup required
# Hope dependencies don't conflict
import requests
import pandas as pd
from transformers import pipeline
def process_data():
    # Your code here
    pass
UV-enabled script:
# /// script
# dependencies = [
# "requests>=2.28.0",
# "pandas>=2.0.0",
# "transformers>=4.30.0",
# "torch>=2.0.0"
# ]
# requires-python = ">=3.9"
# ///
import requests
import pandas as pd
from transformers import pipeline
def process_data():
    # Exact same code, but now self-contained
    pass
Local execution:
uv run my_script.py
What happens:
- UV reads the inline metadata block
- Creates an isolated temporary environment
- Installs exact dependency versions
- Executes your script
- Cleans up automatically
No virtual environment management. No dependency conflicts. It just works.
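You also don't have to maintain the metadata block by hand. Recent versions of uv can create and edit it for you (the exact flags are documented under uv init and uv add):

# Create a new script with an inline metadata block
uv init --script my_script.py --python 3.9

# Add dependencies to the script's # /// script block
uv add --script my_script.py "requests>=2.28.0" "pandas>=2.0.0"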
Before UV:
python script.py
➜ Import errors, version conflicts
- Need separate requirements.txt, setup instructions
- "Works on my machine" syndrome
With UV:
uv run script.py
➜ Guaranteed to work identically everywhere
- Script is self-documenting and self-contained
- Perfect reproducibility
HF Jobs is Hugging Face's answer to "I need GPU compute without the DevOps nightmare." It's Docker-in-the-cloud, but optimized for AI/ML workloads.
Your Command → Docker Container → GPU Hardware → Results
Available Hardware Flavors:
- cpu-basic: Standard CPU processing
- t4-small: NVIDIA T4 GPU (entry-level ML)
- a10g-small / a10g-large: NVIDIA A10G GPU (solid mid-range)
- a100-large: NVIDIA A100 GPU (high-end training)
CLI-based:
hf jobs run python:3.12 python -c "print('Hello from the cloud!')"
hf jobs run --flavor a10g-small pytorch/pytorch:latest python train.py
Python API:
from huggingface_hub import run_job

job = run_job(
    image="python:3.12",
    command=["python", "my_script.py"],
    flavor="a100-large",
    timeout="60m"
)
- Ephemeral: No persistent infrastructure
- Pay-per-second: Only pay for actual compute time
- Docker-based: Containers ensure consistency
- Pro-only: Requires $9/month HF Pro subscription
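One practical note before launching anything: jobs run under your Hugging Face account, so you need to be authenticated locally. With a recent huggingface_hub CLI, that looks roughly like:

hf auth login   # Store your access token locally
hf auth whoami  # Confirm which account the job will run under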
The hf jobs uv run command bridges local development and cloud execution:
# Test locally first
uv run my_ml_script.py --sample-data
# Same exact script on GPU cloud
hf jobs uv run --flavor a10g-large my_ml_script.py --full-dataset
What makes this special:
- Zero modification: Same script works locally and in cloud
- No Dockerfiles: UV handles the environment automatically
- Perfect parity: Identical dependency resolution
When you run hf jobs uv run script.py:
- HF Jobs spins up a container with UV pre-installed
- UV reads your script's inline dependencies
- Container installs exact packages in isolation
- Script executes with full GPU access
- Results saved back to HF Hub
- Resources automatically cleaned up
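One detail to keep in mind: when the script pushes results back to the Hub from inside the container, it needs a token with write access. The flag name may vary by version (check hf jobs uv run --help), but the usual pattern is to forward your token as a secret:

# Forward your Hub token so push_to_hub() works inside the job
hf jobs uv run --flavor a10g-large --secrets HF_TOKEN=$HF_TOKEN my_ml_script.py --full-dataset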
Local development pattern:
my_project/
├── images/ # 1000 local image files
│ ├── img001.jpg
│ └── img002.jpg
├── script.py # Reads from ./images/
└── results/ # Outputs to ./results/
Cloud reality:
# In HF Jobs container:
ls images/ # ❌ Directory doesn't exist
# Your local files aren't magically available
HF Hub serves as the universal data layer:
Local Machine ←→ HF Hub ←→ HF Jobs Container
Data flows:
- Upload: Local data → HF Hub dataset
- Process: HF Jobs reads from Hub, writes to Hub
- Download: Results downloaded from Hub
Step 1 - Upload data locally:
# upload_data.py
from datasets import Dataset, Features, Value, Image as ImageFeature
import os
from PIL import Image

def create_dataset():
    images = []
    for filename in os.listdir("./images/"):
        img = Image.open(f"./images/{filename}")
        images.append({"image": img, "filename": filename})

    features = Features({"image": ImageFeature(), "filename": Value("string")})
    dataset = Dataset.from_list(images, features=features)
    dataset.push_to_hub("username/my-images")

create_dataset()
Step 2 - Create processing script:
# process_images.py
# /// script
# dependencies = [
# "datasets>=2.0",
# "transformers>=4.30",
# "torch>=2.0",
# "pillow>=9.0"
# ]
# ///
import sys
from datasets import load_dataset, Dataset

def main(input_dataset, output_dataset):
    # Load from Hub
    dataset = load_dataset(input_dataset)['train']

    results = []
    for item in dataset:
        # Process each image (extract_text_from_image stands in for
        # whatever model or function you apply)
        text = extract_text_from_image(item['image'])
        results.append({
            "filename": item['filename'],
            "extracted_text": text,
            "image": item['image']
        })

    # Save back to Hub
    results_dataset = Dataset.from_list(results)
    results_dataset.push_to_hub(output_dataset)

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])
Step 3 - Execute anywhere:
# Test locally
uv run process_images.py username/my-images username/results-test
# Production run on GPU
hf jobs uv run --flavor a10g-large \
process_images.py username/my-images username/results-production
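Step 4 - Download results (optional): the "download" leg of the data flow is just another Hub read. A minimal sketch, assuming the output dataset name from the commands above:

# fetch_results.py - adjust the dataset name and columns to your run
from datasets import load_dataset

results = load_dataset("username/results-production", split="train")
df = results.remove_columns("image").to_pandas()
print(df.head())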
The Script:
# /// script
# dependencies = [
# "datasets>=2.0",
# "transformers>=4.30",
# "torch>=2.0",
# "pillow>=9.0",
# "easyocr>=1.7.0",
# "numpy"
# ]
# ///
import sys
import easyocr
import numpy as np
from datasets import load_dataset, Dataset

def ocr_pipeline(input_dataset_id, output_dataset_id):
    # Initialize OCR reader (GPU-accelerated if available)
    reader = easyocr.Reader(['en'])

    # Load dataset from Hub
    dataset = load_dataset(input_dataset_id)['train']

    results = []
    for i, item in enumerate(dataset):
        print(f"Processing {i+1}/{len(dataset)}: {item['filename']}")

        # Convert PIL Image to numpy array for easyocr
        img_array = np.array(item['image'])

        # Extract text
        ocr_results = reader.readtext(img_array)
        extracted_text = ' '.join([result[1] for result in ocr_results])

        results.append({
            "filename": item['filename'],
            "extracted_text": extracted_text,
            "confidence_scores": [result[2] for result in ocr_results],
            "original_image": item['image']
        })

    # Save results to Hub
    output_dataset = Dataset.from_list(results)
    output_dataset.push_to_hub(output_dataset_id)
    print(f"Saved {len(results)} processed images to {output_dataset_id}")

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python ocr_script.py <input_dataset> <output_dataset>")
        sys.exit(1)
    ocr_pipeline(sys.argv[1], sys.argv[2])
Usage:
# Development: Test with small dataset
uv run ocr_script.py username/sample-images username/ocr-test
# Production: Full dataset on GPU
hf jobs uv run --flavor a10g-large \
ocr_script.py username/production-images username/ocr-results-final
The Script:
# /// script
# dependencies = [
# "datasets>=2.0",
# "transformers>=4.30",
# "torch>=2.0",
# "accelerate>=0.20",
# "peft>=0.4.0"
# ]
# ///
import sys
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    DataCollatorForLanguageModeling,
    TrainingArguments,
    Trainer,
)
from peft import LoraConfig, get_peft_model

def fine_tune_model(dataset_id, model_output_id):
    # Load training data from Hub
    # (assumes the dataset is already tokenized, i.e. has an input_ids column)
    dataset = load_dataset(dataset_id)

    # Load base model
    model_name = "microsoft/DialoGPT-medium"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token  # DialoGPT ships without a pad token
    model = AutoModelForCausalLM.from_pretrained(model_name)

    # Configure LoRA for efficient fine-tuning
    peft_config = LoraConfig(
        r=16,
        lora_alpha=32,
        lora_dropout=0.1,
        target_modules=["c_attn"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, peft_config)

    # Training configuration
    training_args = TrainingArguments(
        output_dir="./results",
        num_train_epochs=3,
        per_device_train_batch_size=4,
        save_steps=500,
        logging_steps=100,
    )

    # Train model (the collator builds causal-LM labels from input_ids)
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=dataset['train'],
        tokenizer=tokenizer,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()

    # Save fine-tuned model (LoRA adapter) and tokenizer to Hub
    model.push_to_hub(model_output_id)
    tokenizer.push_to_hub(model_output_id)

if __name__ == "__main__":
    fine_tune_model(sys.argv[1], sys.argv[2])
Usage:
# Long training job on high-end GPU
hf jobs uv run --flavor a100-large --timeout 240m \
finetune_script.py username/training-data username/my-finetuned-model
- Start Local: Always test with small datasets locally first
uv run script.py sample-dataset test-output
- Scale Gradually: Move to cloud with progressively larger datasets
hf jobs uv run --flavor cpu-basic script.py medium-dataset test-cloud
hf jobs uv run --flavor a10g-small script.py full-dataset production
- Monitor and Debug: Use HF Jobs monitoring
hf jobs logs <job-id>
hf jobs ps # List running jobs
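Recent CLI versions also expose a couple of other handy subcommands for debugging (run hf jobs --help to confirm what your version supports):

hf jobs inspect <job-id>   # Full job metadata: status, flavor, command
hf jobs cancel <job-id>    # Stop a job that is stuck or misbehaving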
Small Datasets (<1GB): Upload directly to Hub
dataset.push_to_hub("username/my-data")
Large Datasets (1GB+): Use streaming
dataset = load_dataset("username/huge-dataset", streaming=True)
Incremental Processing: Check for existing results
try:
    existing = load_dataset(output_dataset_id)
    processed_files = set(existing['train']['filename'])
except Exception:
    processed_files = set()

# Only process new files
for item in dataset:
    if item['filename'] not in processed_files:
        # Process item
        pass
# /// script
# dependencies = ["datasets>=2.0", "huggingface_hub>=0.20"]
# ///
import sys
from datasets import load_dataset, Dataset
from huggingface_hub import HfApi

def resilient_processing(input_id, output_id, checkpoint_interval=100):
    dataset = load_dataset(input_id)['train']
    api = HfApi()

    results = []
    for i, item in enumerate(dataset):
        try:
            # process_item is a placeholder for your per-item logic
            result = process_item(item)
            results.append(result)

            # Checkpoint every N items
            if (i + 1) % checkpoint_interval == 0 and results:
                checkpoint_dataset = Dataset.from_list(results)
                checkpoint_dataset.push_to_hub(f"{output_id}-checkpoint-{i}")
        except Exception as e:
            print(f"Error processing item {i}: {e}")
            continue

    # Final save
    final_dataset = Dataset.from_list(results)
    final_dataset.push_to_hub(output_id)
# Stage 1: Data preprocessing
hf jobs uv run --flavor cpu-basic \
preprocess.py raw-data preprocessed-data
# Stage 2: GPU processing
hf jobs uv run --flavor a100-large \
process.py preprocessed-data processed-results
# Stage 3: Post-processing
hf jobs uv run --flavor cpu-basic \
postprocess.py processed-results final-results
# /// script
# dependencies = ["datasets>=2.0", "click>=8.0"]
# ///
import click
from datasets import load_dataset

@click.command()
@click.argument('input_dataset')
@click.argument('output_dataset')
@click.option('--batch-size', default=32, help='Processing batch size')
@click.option('--max-samples', default=None, type=int, help='Limit number of samples')
@click.option('--model-name', default='default-model', help='Model to use')
def process_data(input_dataset, output_dataset, batch_size, max_samples, model_name):
    # Processing logic with parameters
    pass

if __name__ == "__main__":
    process_data()
Usage:
hf jobs uv run --flavor a10g-large \
script.py input-data output-data --batch-size 64 --max-samples 1000
Development Phase:
# Use CPU for debugging
hf jobs uv run --flavor cpu-basic script.py small-sample debug-output
Scaling Phase:
# Use mid-tier GPU for validation
hf jobs uv run --flavor a10g-small script.py medium-sample validation-output
Production Phase:
# Use high-end GPU for full processing
hf jobs uv run --flavor a100-large script.py full-dataset production-output
Cost Examples:
- CPU debugging: ~$0.01 for 10-minute test
- A10G validation: ~$0.25 for 30-minute run
- A100 production: ~$8.25 for 2-hour job
Batch Processing:
def process_in_batches(dataset, batch_size=32):
    for i in range(0, len(dataset), batch_size):
        batch = dataset[i:i+batch_size]
        # Process batch efficiently
        yield process_batch(batch)
Memory Management:
# Use streaming for large datasets
dataset = load_dataset(dataset_id, split="train", streaming=True)
for item in dataset:
    # Process one item at a time to avoid holding everything in memory
    result = process_item(item)
UV Scripts + HF Jobs represents a fundamental shift in how we think about ML workflows:
Old Paradigm:
- Write script → Set up infrastructure → Deploy → Debug → Scale
- Days of setup, brittle configurations, "works on my machine"
New Paradigm:
- Write self-contained script → Test locally → Scale to cloud
- Minutes of setup, guaranteed reproducibility, seamless scaling
The Key Insights:
- Self-contained scripts eliminate environment hell
- Hub-centric storage enables universal data access
- Serverless execution removes infrastructure complexity
- Pay-per-use makes GPU compute economically accessible
This combination makes complex ML workflows as simple as running any command-line tool, while maintaining the power and flexibility needed for production workloads.
The future of ML development is: write once, run anywhere, scale instantly.