Alfian Firmansyah (gnomefin)
gnomefin / clause_sim_in_pod_output.txt
Created April 25, 2026 15:13
ClauseTokenizer vs sentence-only — IN-CLUSTER TTFB simulation (run inside voxcpm2-omni pod)
========================================================================
In-cluster TTFB simulation (host=http://localhost:8000)
TURN: طيب، نقدر نرتب على سداد كامل خلال ٤٥ يوم من اليوم، بس لازم تلتزم بالسداد خلال هالمده. مناسبك هذا الحل؟
(EN: "Okay, we can arrange full repayment within 45 days from today, but you must commit to the payment within this period. Does this solution work for you?")
LLM pacing: 30.0 tok/sec (~3.5 chars/token) => 9.5 ms/char, total 102 chars
========================================================================
--- Sentence-level (blingfire-equivalent) ---
first chunk gate: 0.881s
[ 0] t= 0.881s 'طيب، نقدر نرتب على سداد كامل خلال ٤٥ يوم من اليوم، بس لازم تلتزم بالسداد خلال هالمده.'
[ 1] t= 1.033s 'مناسبك هذا الحل؟'
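The timestamps above follow directly from the pacing model stated in the header: characters arrive at a fixed ms-per-char rate and a chunk is flushed only when sentence-final punctuation appears. A minimal sketch of that simulation, assuming the 9.5 ms/char figure from the log (the helper name and the English sample turn are illustrative, not from the actual test harness):

```python
# Simulate sentence-level TTFB: text streams in at a fixed ms-per-char
# pace and a chunk is flushed only at sentence-final punctuation.
MS_PER_CHAR = 9.5  # 30 tok/s * ~3.5 chars/token => ~9.5 ms/char

def flush_times(text: str, terminators: str = ".?!؟") -> list[tuple[float, str]]:
    """Return (seconds, chunk) pairs, flushing whenever a terminator arrives."""
    flushes, start = [], 0
    for i, ch in enumerate(text):
        if ch in terminators:
            t = (i + 1) * MS_PER_CHAR / 1000.0
            flushes.append((t, text[start:i + 1].strip()))
            start = i + 1
    return flushes

turn = "First sentence ends after many characters. Short tail?"
for t, chunk in flush_times(turn):
    print(f"t={t:.3f}s  {chunk!r}")
```

Because the flush time of the first chunk is proportional to the length of the first sentence, a long opening sentence (as in the Arabic turn above) pushes the TTFB gate close to a full second.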
gnomefin / clause_tokenizer_sim_output.txt
Created April 25, 2026 15:11
ClauseTokenizer vs blingfire — live-pod TTFB simulation for levers-agent PR #138
========================================================================
TURN: طيب، نقدر نرتب على سداد كامل خلال ٤٥ يوم من اليوم، بس لازم تلتزم بالسداد خلال هالمده. مناسبك هذا الحل؟
(EN: "Okay, we can arrange full repayment within 45 days from today, but you must commit to the payment within this period. Does this solution work for you?")
LLM pacing: 30.0 tokens/sec, ~3.5 chars/token
=> 9.5 ms/char (102 chars total)
========================================================================
--- blingfire (current default) ---
first chunk gate: 0.911s (LLM-first-token to first-flush)
total chunks: 2
[ 0] t= 0.911s 'طيب، نقدر نرتب على سداد كامل خلال ٤٥ يوم من اليوم، بس لازم تلتزم بالسداد خلال هالمده.'
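Both gists time the same turn, so the difference between blingfire's 0.911 s gate and a clause tokenizer comes down to where the first boundary falls. A minimal sketch of that comparison, assuming clause boundaries at commas in addition to sentence-final punctuation (the actual ClauseTokenizer boundary rules in PR #138 are not shown in these logs, so the boundary set and the English sample turn are assumptions):

```python
# Compare first-flush time (TTFB gate) for sentence-level vs clause-level
# splitting under the same 9.5 ms/char pacing used in the logs above.
MS_PER_CHAR = 9.5

def first_gate(text: str, boundaries: str) -> float:
    """Seconds until the first boundary character has been generated."""
    for i, ch in enumerate(text):
        if ch in boundaries:
            return (i + 1) * MS_PER_CHAR / 1000.0
    return len(text) * MS_PER_CHAR / 1000.0

turn = "Okay, we can arrange full settlement within 45 days, but you must commit. Does that work?"
sentence_gate = first_gate(turn, ".?!")   # sentence-final punctuation only
clause_gate = first_gate(turn, ",.?!")    # commas also end a clause
print(f"sentence gate: {sentence_gate:.3f}s, clause gate: {clause_gate:.3f}s")
```

With a comma a few characters into the turn, the clause-level gate collapses to tens of milliseconds while the sentence-level gate has to wait for the full first sentence.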
gnomefin / voxcpm2_tts_unit_test_results.txt
Created April 25, 2026 14:16
collection-service: TestGetAgentConfigurationTtsRouting — 8 unit tests for levers:levers-tts routing (PR #1237)
============================= test session starts ==============================
platform darwin -- Python 3.14.0, pytest-9.0.2, pluggy-1.6.0 -- /Users/levers/Documents/Github/collection-service/.venv/bin/python
cachedir: .pytest_cache
rootdir: /Users/levers/Documents/Github/collection-service
configfile: pyproject.toml
plugins: anyio-4.12.1, Faker-12.0.1, recording-0.13.4, xdist-3.8.0, langsmith-0.7.32, asyncio-1.3.0, sugar-1.1.1, factoryboy-2.8.1, env-1.2.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collecting ... collected 8 items
tests/application/ai/phone_calls/test_base.py::TestGetAgentConfigurationTtsRouting::test_no_tts_override_keeps_default_provider_and_model PASSED [ 12%]
gnomefin / pr3118_evidence_v2.log
Created April 25, 2026 09:29
vllm-omni PR #3118 — round-2 evidence: unit 34/34, L1 1921/1921, live HTTP checks (commit 622f2e5f)
================================================================
VoxCPM2 PR #3118 — review-round 2 evidence pack
Branch HEAD: 622f2e5f
https://github.com/vllm-project/vllm-omni/pull/3118
================================================================
Contents:
Part 1: Unit tests (pytest -v on tests/.../test_serving_speech_voxcpm2.py)
Part 2: Full L1 sweep (pytest -q -m "core_model and cpu") with 9 heavy modules ignored
Part 3: Live HTTP curl checks against the rebuilt image, including:
gnomefin / pr3118_evidence.log
Created April 25, 2026 08:13
vllm-omni PR #3118 — evidence: pytest 29/29 + HTTP migration sanity (commit 8c5c4cda)
================================================================
VoxCPM2 PR #3118 — review-round evidence pack
Branch HEAD: 8c5c4cda
https://github.com/vllm-project/vllm-omni/pull/3118
================================================================
Contents:
Part 1 — pytest -v on tests/entrypoints/openai_api/test_serving_speech_voxcpm2.py
Part 2 — live HTTP curl checks of the deployed image, including:
* NEW vs OLD shape (extra_params migration)
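The gist does not show the request schema itself, so the field names below are assumptions; the sketch only illustrates the general pattern behind a "NEW vs OLD shape" check during an extra_params migration: legacy top-level knobs are folded into a nested `extra_params` object, with the new shape winning on conflict, so both request shapes normalize identically.

```python
# Hypothetical request normalizer for an extra_params migration:
# OLD shape carries tuning knobs at the top level, NEW shape nests
# them under "extra_params". Field names here are illustrative only.
LEGACY_KEYS = ("temperature", "top_p", "speed")

def normalize_request(payload: dict) -> dict:
    out = dict(payload)
    extra = dict(out.pop("extra_params", {}) or {})
    for key in LEGACY_KEYS:
        if key in out:                            # OLD shape: lift into extra_params
            extra.setdefault(key, out.pop(key))   # NEW shape wins on conflict
    out["extra_params"] = extra
    return out

old = {"input": "hi", "temperature": 0.7}
new = {"input": "hi", "extra_params": {"temperature": 0.7}}
assert normalize_request(old) == normalize_request(new)
```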
gnomefin / l1_full.log
Created April 24, 2026 22:31
vllm-omni PR #3118 — L1 local pytest output (core_model and cpu) on commit 0966b4a7
============================= test session starts ==============================
platform linux -- Python 3.12.13, pytest-9.0.3, pluggy-1.6.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /tmp/pr-l1
configfile: pyproject.toml
plugins: mock-3.15.1, asyncio-1.3.0, hydra-core-1.3.2, typeguard-4.5.1, anyio-4.13.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collecting ... collected 2847 items / 937 deselected / 1910 selected
benchmarks/metrics/test_metrics.py::test_total_input_aggregated_from_output_prompt_len PASSED [ 0%]
gnomefin / *DeepSeek-uncensored.md
Created January 29, 2025 21:19 — forked from ruvnet/*DeepSeek-uncensored.md
Deploying and Fine-Tuning an Uncensored DeepSeek R1 Distill Model on Google Cloud

DeepSeek R1 Distill: Complete Tutorial for Deployment & Fine-Tuning

This guide shows how to deploy an uncensored DeepSeek R1 Distill model to Google Cloud Run with GPU support and how to perform a basic, functional fine-tuning process. The tutorial is split into:

  1. Environment Setup
  2. FastAPI Inference Server
  3. Docker Configuration
  4. Google Cloud Run Deployment
  5. Fine-Tuning Pipeline (Cold Start, Reasoning RL, Data Collection, Final RL Phase)