@rjpower
rjpower / README.md
Last active May 11, 2026 22:40
Iris: compute job CPU wall-time two ways — (A) controller tasks table, (B) iris.task stats namespace. Reproduces PR marin-community/marin#5637 + adds true integrated cpu_seconds.

iris_job_cpu_time.py

Two ways to compute "CPU wall time" for an Iris job subtree, validated against the marin production cluster.

Background: marin-community/marin#5637 adds iris job cpu-time, which the controller answers by computing:

SUM(t.finished_at_ms - t.started_at_ms)  over all leaf-job tasks
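As a rough illustration of the sum above, the following is a minimal sketch in plain Python rather than the controller's actual query. The `TaskRow` type and the `total_wall_time_ms` helper are hypothetical names introduced here; only the `started_at_ms` and `finished_at_ms` fields come from the formula. It assumes that tasks missing either timestamp are skipped, which matches how a SQL `SUM` ignores NULL operands.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class TaskRow:
    """Hypothetical stand-in for one row of the controller's tasks table."""
    task_id: str
    started_at_ms: Optional[int]
    finished_at_ms: Optional[int]


def total_wall_time_ms(tasks: Iterable[TaskRow]) -> int:
    """Sum (finished_at_ms - started_at_ms) over the given leaf-job tasks.

    Assumption: tasks that have not started or not finished contribute
    nothing, mirroring SQL SUM skipping NULL values.
    """
    total = 0
    for t in tasks:
        if t.started_at_ms is None or t.finished_at_ms is None:
            continue
        total += t.finished_at_ms - t.started_at_ms
    return total


rows = [
    TaskRow("a", 1_000, 4_000),  # ran for 3000 ms
    TaskRow("b", 2_000, 2_500),  # ran for 500 ms
    TaskRow("c", 3_000, None),   # still running, skipped
]
print(total_wall_time_ms(rows))  # 3500
```

Note that this measures elapsed task time, not CPU actually consumed; a task that sits idle for its whole lifetime still counts in full.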
@rjpower
rjpower / bos_datasets_tidy.csv
Last active April 24, 2026 17:18
Empirical BOS-bug scan for marin tokenized caches (issue #5149)
| dataset | status | n_cache_roots | n_arrays | n_sampled_docs | bos_id_guess | bos_frac | n_bugged | n_suspect | n_safe | n_gone | n_error |
|---|---|---|---|---|---|---|---|---|---|---|---|
| tokenized/validate | BUGGED | 3636 | 18550 | 349047 | 791 | 0.054 | 3489 | 0 | 147 | 0 | 0 |
| tokenized/common_corpus_english-f7f46b | BUGGED | 4653 | 9653 | 381743 | 791 | 0.073 | 3809 | 0 | 843 | 0 | 1 |
| tokenized/starcoder2_extras | BUGGED | 734 | 1069 | 5387 | 128000 | 1.000 | 13 | 3 | 718 | 0 | 0 |
| tokenized/paloma | BUGGED | 64 | 790 | 5489 | 128000 | 1.000 | 24 | 0 | 40 | 0 | 0 |
| tokenized/uncheatable_eval | BUGGED | 21 | 182 | 2100 | 128000 | 1.000 | 11 | 2 | 8 | 0 | 0 |
| tokenized/nsf_awards-1a6caf | BUGGED | 1 | 26 | 100 | 15623 | 0.070 | 1 | 0 | 0 | 0 | 0 |
| tokenized/data_efficiency | BUGGED | 4 | 20 | 400 | 791 | 0.070 | 4 | 0 | 0 | 0 | 0 |
| tokenized/nemotron_cc_math_v1 | SUSPECT | 2 | 464 | 200 | 2 | 0.740 | 0 | 1 | 1 | 0 | 0 |
| tokenized/tinystories-f8e445 | SUSPECT | 2 | 7 | 200 | 12805 | 0.855 | 0 | 2 | 0 | 0 | 0 |
@rjpower
rjpower / iris_job_failure_report.md
Created April 16, 2026 18:59
Iris job failure: controller RPC timeout

Iris Job Failure Report

/larry/iris-run-job-20260414-211246

Task: grug-train-moe-v16-compute-opt-d1280-2.83e-19 (Attempt 48)
Date: 2026-04-16 09:08:33 to 09:21:17 UTC
Duration: ~13 minutes
Exit Code: 1


@rjpower
rjpower / v4_2048_analysis.md
Created April 14, 2026 23:34
Iris v4-2048 zombie-worker analysis (issue #4724)

Zombie workers from a reserved TPU slice whose QR was abandoned but not cancelled

Cluster: marin (config lib/iris/examples/marin.yaml)
Slice: marin-tpu-v4-reserved-2048-us-central2-b-20260414-2123-fd07e934
Symptom: Provider sync: 660 workers, 56 failed, 13015ms failed=[... 33× v4-2048 workers ...] at 22:51 UTC on 2026-04-14.

Timeline

From controller logs filtered by fd07e934:

@rjpower
rjpower / wheel-inspect
Created April 10, 2026 16:57
wheel-inspect: CLI tool to inspect Python wheel files (version, deps, file summary)
#!/usr/bin/env python3
"""Inspect Python wheel files: version info, file summary, dependencies, etc."""
import argparse
import os
import sys
import zipfile
from collections import Counter
from dataclasses import dataclass, field
from pathlib import Path
@rjpower
rjpower / logscan.py
Last active March 31, 2026 20:15
Scan large log files with Gemini in parallel overlapping chunks
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "google-genai>=1.0",
# "pydantic>=2.0",
# ]
# ///
"""
Agent-driven log scanner with two composable modes: grep and summarize.
@rjpower
rjpower / iris-impersonation-options.md
Created March 24, 2026 18:36
Iris task container GCP credential impersonation: options analysis

GCP Credential Impersonation in Iris Task Containers

Context

Iris runs user code in Docker containers on GCE worker VMs. When per-user credential isolation is enabled, each user's tasks should run as their designated GCP service account rather than the worker VM's native SA. The worker VM's SA has roles/iam.serviceAccountTokenCreator on each target user SA.

The challenge: how do we make all GCP client libraries inside the container — Python (google-cloud-storage, gcsfs, BigQuery client, etc.), gcloud CLI, and potentially Go/Java — use impersonated credentials transparently?

The Core Problem: ADC File Format Limitations

@rjpower
rjpower / pyarrow_concat_bench.py
Created March 11, 2026 00:23
PyArrow concat_tables benchmark (uv script)
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.11"
# dependencies = [
# "pyarrow",
# ]
# ///
"""Benchmark: create N PyArrow tables with varied column types and concat them."""
@rjpower
rjpower / pq.py
Last active January 9, 2026 23:12
Parquet inspector - stream, filter, select columns, show metadata, bidirectional JSONL conversion
#!/usr/bin/env -S uv run --script
# /// script
# dependencies = [
# "pyarrow",
# "fsspec",
# "click",
# "s3fs",
# "gcsfs",
# ]
# ///
@rjpower
rjpower / gbranch
Last active October 9, 2025 21:28
#!/usr/bin/env -S uv run --quiet --script
# /// script
# requires-python = ">=3.11"
# dependencies = []
# ///
"""
gbranch - Manage branches with naming convention {USER}/{datestamp}-{name}
Usage: