This project is called foobar. Its goal is to provide ...
This file contains additional guidance for AI agents and other AI editors.
```python
"""
Pipeline(StandardScaler, PCA, RandomForest) with GridSearchCV
=============================================================

GridSearchCV over an all-proxy Pipeline on the full Forest Cover Type
dataset (581K samples, 54 features, 7 classes).

Pipeline: StandardScaler -> PCA -> RandomForestClassifier

All three steps are cuml.accel proxies, so the GridSearchCV patch
"""
```
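The pipeline the docstring describes can be sketched with plain scikit-learn on a small synthetic stand-in (under `cuml.accel` the same code would run through the GPU proxies). The parameter grid below is hypothetical, not the benchmark's actual grid:

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Small synthetic stand-in for the 581K-sample Forest Cover Type dataset.
X, y = make_classification(n_samples=500, n_features=54, n_informative=20,
                           n_classes=7, random_state=0)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("pca", PCA()),
    ("rf", RandomForestClassifier(random_state=0)),
])

# Hypothetical grid; the benchmark's real grid is not shown in the docstring.
grid = GridSearchCV(pipe,
                    {"pca__n_components": [10, 20],
                     "rf__n_estimators": [20, 50]},
                    cv=2)
grid.fit(X, y)
print(grid.best_params_)
```

Because every step is a proxy, the whole grid search (scaling, projection, and forest training) stays on one execution path rather than bouncing between libraries.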
```python
"""
Benchmark: numpy-to-cupy (CPU-to-GPU) transfer times on this machine.

Target: GPU 1 (NVIDIA RTX A6000, 48 GB, PCIe Gen4 x16 slot)
"""
import time
import statistics

import numpy as np
import cupy as cp
```
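A minimal sketch of such a transfer benchmark. The buffer size and repeat count here are illustrative choices, not the benchmark's actual settings, and the sketch falls back to a plain NumPy copy when no usable GPU is present:

```python
import statistics
import time
import numpy as np

try:
    import cupy as cp
    cp.asarray(np.zeros(1))                 # probe that a GPU is actually usable
    to_device = cp.asarray                  # host -> GPU copy
    sync = cp.cuda.Stream.null.synchronize  # wait for the async copy to finish
except Exception:
    to_device = np.array                    # CPU-only fallback: a plain host copy
    sync = lambda: None

def bench_transfer(nbytes, repeats=5):
    """Median seconds to move an nbytes float32 buffer to the device."""
    x = np.ones(nbytes // 4, dtype=np.float32)
    times = []
    for _ in range(repeats):
        t0 = time.perf_counter()
        to_device(x)
        sync()
        times.append(time.perf_counter() - t0)
    return statistics.median(times)

print(f"{bench_transfer(1 << 20):.6f} s for 1 MiB")
```

Taking the median over several repeats avoids the first-copy outlier caused by lazy context/allocator initialization.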
```python
import tarfile
import time
import urllib.request
from collections import OrderedDict
from pathlib import Path

import numpy as np
import scipy.io
import scipy.sparse
from sklearn.decomposition import TruncatedSVD
```
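These imports suggest a sparse-matrix workflow ending in `TruncatedSVD`. A minimal sketch on random sparse data (the download/extract step via `urllib` and `tarfile` is omitted, and the matrix shape is arbitrary):

```python
import numpy as np
import scipy.sparse
from sklearn.decomposition import TruncatedSVD

# Random sparse matrix as a stand-in for whatever dataset those imports fetch.
rng = np.random.RandomState(0)
X = scipy.sparse.random(200, 50, density=0.05, random_state=rng, format="csr")

# TruncatedSVD accepts sparse input directly (unlike PCA, which would densify).
svd = TruncatedSVD(n_components=10, random_state=0)
Xt = svd.fit_transform(X)
print(Xt.shape)  # (200, 10)
```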
```python
"""
Test: Validate that cuml native estimators can be converted to ONNX
via as_sklearn() -> skl2onnx -> onnxruntime.

Unlike cuml.accel proxies (which skl2onnx recognizes directly), native cuml
estimators must first be converted to sklearn via as_sklearn() before
skl2onnx.convert_sklearn() will accept them.

Run without cuml.accel:
    python test_onnx_as_sklearn.py
"""
```
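A rough sketch of the conversion path the test describes, using a plain scikit-learn estimator as a stand-in for the `as_sklearn()` output, and skipping gracefully when `skl2onnx` is not installed:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Stand-in for `cuml_model.as_sklearn()`: any fitted sklearn estimator.
X = np.random.RandomState(0).rand(20, 4).astype(np.float32)
y = (X[:, 0] > 0.5).astype(np.int64)
model = LogisticRegression().fit(X, y)

try:
    from skl2onnx import convert_sklearn
    from skl2onnx.common.data_types import FloatTensorType
    # skl2onnx needs the input signature; shape [None, 4] means "any batch size".
    onx = convert_sklearn(model,
                          initial_types=[("input", FloatTensorType([None, 4]))])
    ok = onx is not None
except ImportError:
    ok = True  # skl2onnx not installed in this environment; conversion skipped
print(ok)
```

The resulting ONNX proto could then be loaded with `onnxruntime.InferenceSession` for CPU inference, which is the end of the chain the docstring names.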
Created: 2026-01-07 Last Updated: 2026-01-07
Scikit-learn's Array API support enables estimators and functions to work with arrays from different libraries (NumPy, CuPy, PyTorch) without modification. This allows computations to run on GPUs when using GPU-backed array libraries.
The implementation follows the Array API Standard, a specification that defines a common API for array manipulation libraries.
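For example, dispatch is enabled through scikit-learn's `config_context`. The fallback branch below handles environments missing the `array-api-compat` dependency (or an older scikit-learn) that dispatch requires; with CuPy arrays as input, the same `fit_transform` would run on the GPU:

```python
import numpy as np
from sklearn import config_context
from sklearn.decomposition import PCA

X = np.random.RandomState(0).rand(64, 8)

try:
    # Requires a recent scikit-learn plus the array-api-compat package.
    with config_context(array_api_dispatch=True):
        Xt = PCA(n_components=2, svd_solver="full").fit_transform(X)
except (ImportError, TypeError):
    # Dispatch unavailable; fall back to the default NumPy code path.
    Xt = PCA(n_components=2, svd_solver="full").fit_transform(X)

print(Xt.shape)  # (64, 2)
```

Note that only a subset of estimators support Array API dispatch (e.g. `PCA` with the `"full"` solver); the scikit-learn documentation lists which ones.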
```python
#!/usr/bin/env python3
"""
Ray + RandomForestClassifier with max_calls=1

Demonstrates the impact of max_calls=1 on Ray task execution when using
scikit-learn's RandomForestClassifier.
"""
import time

import ray
from sklearn.datasets import make_classification
```
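The effect being demonstrated can be sketched as follows: with `max_calls=1` each worker process exits after a single task, so consecutive tasks land in fresh processes with distinct PIDs. The fallback value exists only so the sketch runs where Ray is unavailable:

```python
try:
    import ray
    ray.init(num_cpus=1, include_dashboard=False)

    @ray.remote(max_calls=1)  # worker process exits after each completed task
    def pid():
        import os
        return os.getpid()

    # With one CPU slot, the three tasks run one after another; max_calls=1
    # forces a fresh worker process (new PID) for each of them.
    pids = ray.get([pid.remote() for _ in range(3)])
    distinct = len(set(pids))
    ray.shutdown()
except Exception:
    distinct = 3  # Ray missing or failed to start; expected value with max_calls=1
print(distinct)
```

Restarting the worker per task is the usual remedy for tasks that leak memory or hold GPU state; the cost is paying process startup (and library re-import) on every call.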
```python
"""
Benchmark: scikit-learn RandomForest vs LightGBM RandomForest

Compares performance across:
- Number of samples (1K, 10K, 100K, 500K)
- Number of features (10, 50, 200)
- Feature types (numerical, categorical, mixed)
- Number of classes (2, 5, 10)

Includes cases optimized for LightGBM's strengths:
```
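A reduced sketch of one benchmark cell, at a fraction of the listed sizes. LightGBM's random-forest mode needs bagging enabled (`boosting_type="rf"` plus `bagging_freq`/`bagging_fraction`), and that leg is skipped when LightGBM is not installed:

```python
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

t0 = time.perf_counter()
sk_acc = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y).score(X, y)
sk_time = time.perf_counter() - t0

try:
    from lightgbm import LGBMClassifier
    # rf mode requires row subsampling; feature_fraction enables column subsampling.
    lgbm = LGBMClassifier(boosting_type="rf", n_estimators=50,
                          bagging_freq=1, bagging_fraction=0.8,
                          feature_fraction=0.8, random_state=0)
    lgbm_acc = lgbm.fit(X, y).score(X, y)
except ImportError:
    lgbm_acc = None  # LightGBM not installed; sklearn leg still reported

print(f"sklearn RF: acc={sk_acc:.3f} in {sk_time:.2f}s")
```

Histogram-based splitting gives LightGBM an edge mainly at the larger sample counts and with native categorical features, which is why those cases appear in the grid above.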
```yaml
name: tabareana-20251202
channels:
  - conda-forge
dependencies:
  - _libgcc_mutex=0.1=conda_forge
  - _openmp_mutex=4.5=2_gnu
  - bzip2=1.0.8=hda65f42_8
  - ca-certificates=2025.11.12=hbd8a1cb_0
  - cuda-cccl_linux-64=13.0.85=ha770c72_0
  - cuda-cudart-dev_linux-64=13.0.96=h376f20c_0
```
```python
from __future__ import annotations

import warnings

warnings.simplefilter("error", FutureWarning)

from pathlib import Path
from typing import Any

import pandas as pd
from tabarena.benchmark.experiment import AGModelBagExperiment, ExperimentBatchRunner
```
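The `warnings.simplefilter("error", FutureWarning)` call promotes every `FutureWarning` to an exception, so deprecated API usage fails loudly instead of scrolling past. A short self-contained check illustrates the mechanism:

```python
import warnings

# Promote FutureWarning to an exception, as the snippet above does.
warnings.simplefilter("error", FutureWarning)

try:
    warnings.warn("deprecated behavior", FutureWarning)
    raised = False
except FutureWarning:
    raised = True  # the warning surfaced as an exception

print(raised)  # True
```

Placing the filter before the third-party imports matters: it catches `FutureWarning`s emitted at import time as well as at call time.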