Tim Head betatim

name	pr-walkthrough
description	Produce a structured walkthrough of a GitHub PR ordered by causal dependency. Use when the user asks to walk through, analyze, or understand a PR's changes.

PR Walkthrough

Generate a causal-dependency-ordered walkthrough of a GitHub PR and write it to agents/reviews/ in the repository.

This skill is scoped to PRs in the current repository. Cross-repo PRs are out of scope.
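As a rough illustration of what "ordered by causal dependency" could mean, the sketch below topologically sorts a handful of changes so that each one appears after the changes it depends on. The change names and the `depends_on` mapping are hypothetical and not taken from the skill itself. Using the standard library's `graphlib` is an assumption about how such an ordering might be computed.

```python
from graphlib import TopologicalSorter

# Hypothetical changes in a PR, each mapped to the changes it depends on.
depends_on = {
    "add config option": set(),
    "read option in loader": {"add config option"},
    "update tests": {"read option in loader"},
    "update docs": {"add config option"},
}

# static_order() yields every change only after all of its dependencies.
walkthrough_order = list(TopologicalSorter(depends_on).static_order())

for step, change in enumerate(walkthrough_order, start=1):
    print(f"{step}. {change}")
```

Changes with no dependency between them (here, the tests and the docs) may appear in either relative order.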

AGENTS Instruction

This file contains additional guidance for AI agents and other AI editors.

Core Principles

These principles reduce common LLM coding mistakes. Apply them to every task.

AGENTS Instruction

This project is called foobar. Its goal is to provide ...

This file contains additional guidance for AI agents and other AI editors.

Core Principles

Array API Architecture

Created: 2026-01-07 Last Updated: 2026-01-07

Overview

Scikit-learn's Array API support enables estimators and functions to work with arrays from different libraries (NumPy, CuPy, PyTorch) without modification. This allows computations to run on GPUs when using GPU-backed array libraries.

The implementation follows the Array API Standard, a specification that defines a common API for array manipulation libraries.
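The idea of library-agnostic code can be sketched as follows: a function asks the input array for its namespace and then uses only functions defined by the standard. This is a minimal sketch, not scikit-learn's implementation. The helpers `get_namespace` and `standardize` are hypothetical names written for this example, and NumPy is used only because it is the most widely available array library.

```python
import numpy as np


def get_namespace(x):
    """Return the array namespace for x (hypothetical helper for this sketch)."""
    # Arrays that implement the Array API Standard expose __array_namespace__().
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    # Older NumPy versions lack the method, so fall back to the numpy module.
    return np


def standardize(x):
    """Center and scale x using only functions defined by the standard."""
    xp = get_namespace(x)
    return (x - xp.mean(x)) / xp.std(x)


x = np.asarray([1.0, 2.0, 3.0, 4.0])
z = standardize(x)
print(z)
```

Because `standardize` never names a specific library, the same function body should in principle accept any array whose namespace provides `mean` and `std`, which is the property that lets computations run on a GPU-backed library.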

	import numpy as np
	from sklearn.ensemble import RandomForestRegressor
	from sklearn.model_selection import cross_validate
	from skrub import tabular_pipeline
	from skrub.datasets import fetch_employee_salaries


	def main():
	    print("Loading employee salaries dataset ...")
	    data = fetch_employee_salaries()

	#!/usr/bin/env python3
	"""
	Employee Salary Prediction
	==========================
	Predict current annual salary for Montgomery County employees
	using a scikit-learn pipeline with mixed feature types.
	"""

	try:
	    import cuml.accel

	"""
	Pipeline(StandardScaler, PCA, RandomForest) with GridSearchCV
	=============================================================

	GridSearchCV over an all-proxy Pipeline on the full Forest Cover Type
	dataset (581K samples, 54 features, 7 classes).

	Pipeline: StandardScaler -> PCA -> RandomForestClassifier

	All three steps are cuml.accel proxies, so the GridSearchCV patch

	"""
	Benchmark: numpy-to-cupy (CPU-to-GPU) transfer times on this machine.

	Target: GPU 1 (NVIDIA RTX A6000, 48 GB, PCIe Gen4 x16 slot)
	"""

	import time
	import statistics
	import numpy as np
	import cupy as cp

	import tarfile
	import time
	import urllib.request
	from collections import OrderedDict
	from pathlib import Path

	import numpy as np
	import scipy.io
	import scipy.sparse
	from sklearn.decomposition import TruncatedSVD

	"""
	Test: Validate that cuml native estimators can be converted to ONNX
	via as_sklearn() -> skl2onnx -> onnxruntime.

	Unlike cuml.accel proxies (which skl2onnx recognizes directly), native cuml
	estimators must first be converted to sklearn via as_sklearn() before
	skl2onnx.convert_sklearn() will accept them.

	Run without cuml.accel:
	    python test_onnx_as_sklearn.py