Alfian Firmansyah (gnomefin)
gnomefin / clause_sim_in_pod_output.txt
Created April 25, 2026 15:13
ClauseTokenizer vs sentence-only — IN-CLUSTER TTFB simulation (run inside voxcpm2-omni pod)
========================================================================
In-cluster TTFB simulation (host=http://localhost:8000)
TURN: طيب، نقدر نرتب على سداد كامل خلال ٤٥ يوم من اليوم، بس لازم تلتزم بالسداد خلال هالمده. مناسبك هذا الحل؟
(EN: "Okay, we can arrange full repayment within 45 days from today, but you must commit to the payment within this period. Does this solution work for you?")
LLM pacing: 30.0 tok/sec (~3.5 chars/token) => 9.5 ms/char, total 102 chars
========================================================================
--- Sentence-level (blingfire-equivalent) ---
first chunk gate: 0.881s
[ 0] t= 0.881s 'طيب، نقدر نرتب على سداد كامل خلال ٤٥ يوم من اليوم، بس لازم تلتزم بالسداد خلال هالمده.'
[ 1] t= 1.033s 'مناسبك هذا الحل؟'
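The timestamps above follow directly from the pacing model stated in the header: characters arrive at a fixed ms-per-char rate and a chunk is flushed only when sentence-final punctuation appears. A minimal sketch of that simulation, assuming the 9.5 ms/char figure from the log (the helper name and the English sample turn are illustrative, not from the actual test harness):

```python
# Simulate sentence-level TTFB: text streams in at a fixed ms-per-char
# pace and a chunk is flushed only at sentence-final punctuation.
MS_PER_CHAR = 9.5  # 30 tok/s * ~3.5 chars/token => ~9.5 ms/char

def flush_times(text: str, terminators: str = ".?!؟") -> list[tuple[float, str]]:
    """Return (seconds, chunk) pairs, flushing whenever a terminator arrives."""
    flushes, start = [], 0
    for i, ch in enumerate(text):
        if ch in terminators:
            t = (i + 1) * MS_PER_CHAR / 1000.0
            flushes.append((t, text[start:i + 1].strip()))
            start = i + 1
    return flushes

turn = "First sentence ends after many characters. Short tail?"
for t, chunk in flush_times(turn):
    print(f"t={t:.3f}s  {chunk!r}")
```

Because the flush time of the first chunk is proportional to the length of the first sentence, a long opening sentence (as in the Arabic turn above) pushes the TTFB gate close to a full second.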
gnomefin / clause_tokenizer_sim_output.txt
Created April 25, 2026 15:11
ClauseTokenizer vs blingfire — live-pod TTFB simulation for levers-agent PR #138
========================================================================
TURN: طيب، نقدر نرتب على سداد كامل خلال ٤٥ يوم من اليوم، بس لازم تلتزم بالسداد خلال هالمده. مناسبك هذا الحل؟
(EN: "Okay, we can arrange full repayment within 45 days from today, but you must commit to the payment within this period. Does this solution work for you?")
LLM pacing: 30.0 tokens/sec, ~3.5 chars/token
=> 9.5 ms/char (102 chars total)
========================================================================
--- blingfire (current default) ---
first chunk gate: 0.911s (LLM-first-token to first-flush)
total chunks: 2
[ 0] t= 0.911s 'طيب، نقدر نرتب على سداد كامل خلال ٤٥ يوم من اليوم، بس لازم تلتزم بالسداد خلال هالمده.'
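Both gists time the same turn, so the difference between blingfire's 0.911 s gate and a clause tokenizer comes down to where the first boundary falls. A minimal sketch of that comparison, assuming clause boundaries at commas in addition to sentence-final punctuation (the actual ClauseTokenizer boundary rules in PR #138 are not shown in these logs, so the boundary set and the English sample turn are assumptions):

```python
# Compare first-flush time (TTFB gate) for sentence-level vs clause-level
# splitting under the same 9.5 ms/char pacing used in the logs above.
MS_PER_CHAR = 9.5

def first_gate(text: str, boundaries: str) -> float:
    """Seconds until the first boundary character has been generated."""
    for i, ch in enumerate(text):
        if ch in boundaries:
            return (i + 1) * MS_PER_CHAR / 1000.0
    return len(text) * MS_PER_CHAR / 1000.0

turn = "Okay, we can arrange full settlement within 45 days, but you must commit. Does that work?"
sentence_gate = first_gate(turn, ".?!")   # sentence-final punctuation only
clause_gate = first_gate(turn, ",.?!")    # commas also end a clause
print(f"sentence gate: {sentence_gate:.3f}s, clause gate: {clause_gate:.3f}s")
```

With a comma a few characters into the turn, the clause-level gate collapses to tens of milliseconds while the sentence-level gate has to wait for the full first sentence.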
gnomefin / voxcpm2_tts_unit_test_results.txt
Created April 25, 2026 14:16
collection-service: TestGetAgentConfigurationTtsRouting — 8 unit tests for levers:levers-tts routing (PR #1237)
============================= test session starts ==============================
platform darwin -- Python 3.14.0, pytest-9.0.2, pluggy-1.6.0 -- /Users/levers/Documents/Github/collection-service/.venv/bin/python
cachedir: .pytest_cache
rootdir: /Users/levers/Documents/Github/collection-service
configfile: pyproject.toml
plugins: anyio-4.12.1, Faker-12.0.1, recording-0.13.4, xdist-3.8.0, langsmith-0.7.32, asyncio-1.3.0, sugar-1.1.1, factoryboy-2.8.1, env-1.2.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collecting ... collected 8 items
tests/application/ai/phone_calls/test_base.py::TestGetAgentConfigurationTtsRouting::test_no_tts_override_keeps_default_provider_and_model PASSED [ 12%]
gnomefin / pr3118_evidence_v2.log
Created April 25, 2026 09:29
vllm-omni PR #3118 — round-2 evidence: unit 34/34, L1 1921/1921, live HTTP checks (commit 622f2e5f)
================================================================
VoxCPM2 PR #3118 — review-round 2 evidence pack
Branch HEAD: 622f2e5f
https://github.com/vllm-project/vllm-omni/pull/3118
================================================================
Contents:
Part 1: Unit tests (pytest -v on tests/.../test_serving_speech_voxcpm2.py)
Part 2: Full L1 sweep (pytest -q -m "core_model and cpu") with 9 heavy modules ignored
Part 3: Live HTTP curl checks against the rebuilt image, including:
gnomefin / pr3118_evidence.log
Created April 25, 2026 08:13
vllm-omni PR #3118 — evidence: pytest 29/29 + HTTP migration sanity (commit 8c5c4cda)
================================================================
VoxCPM2 PR #3118 — review-round evidence pack
Branch HEAD: 8c5c4cda
https://github.com/vllm-project/vllm-omni/pull/3118
================================================================
Contents:
Part 1 — pytest -v on tests/entrypoints/openai_api/test_serving_speech_voxcpm2.py
Part 2 — live HTTP curl checks of the deployed image, including:
* NEW vs OLD shape (extra_params migration)
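The gist does not show the request schema itself, so the field names below are assumptions; the sketch only illustrates the general pattern behind a "NEW vs OLD shape" check during an extra_params migration: legacy top-level knobs are folded into a nested `extra_params` object, with the new shape winning on conflict, so both request shapes normalize identically.

```python
# Hypothetical request normalizer for an extra_params migration:
# OLD shape carries tuning knobs at the top level, NEW shape nests
# them under "extra_params". Field names here are illustrative only.
LEGACY_KEYS = ("temperature", "top_p", "speed")

def normalize_request(payload: dict) -> dict:
    out = dict(payload)
    extra = dict(out.pop("extra_params", {}) or {})
    for key in LEGACY_KEYS:
        if key in out:                            # OLD shape: lift into extra_params
            extra.setdefault(key, out.pop(key))   # NEW shape wins on conflict
    out["extra_params"] = extra
    return out

old = {"input": "hi", "temperature": 0.7}
new = {"input": "hi", "extra_params": {"temperature": 0.7}}
assert normalize_request(old) == normalize_request(new)
```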
gnomefin / l1_full.log
Created April 24, 2026 22:31
vllm-omni PR #3118 — L1 local pytest output (core_model and cpu) on commit 0966b4a7
============================= test session starts ==============================
platform linux -- Python 3.12.13, pytest-9.0.3, pluggy-1.6.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /tmp/pr-l1
configfile: pyproject.toml
plugins: mock-3.15.1, asyncio-1.3.0, hydra-core-1.3.2, typeguard-4.5.1, anyio-4.13.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collecting ... collected 2847 items / 937 deselected / 1910 selected
benchmarks/metrics/test_metrics.py::test_total_input_aggregated_from_output_prompt_len PASSED [ 0%]
gnomefin / *DeepSeek-uncensored.md
Created January 29, 2025 21:19 — forked from ruvnet/*DeepSeek-uncensored.md
Deploying and Fine-Tuning an Uncensored DeepSeek R1 Distill Model on Google Cloud

DeepSeek R1 Distill: Complete Tutorial for Deployment & Fine-Tuning

This guide shows how to deploy an uncensored DeepSeek R1 Distill model to Google Cloud Run with GPU support and how to perform a basic, functional fine-tuning process. The tutorial is split into:

  1. Environment Setup
  2. FastAPI Inference Server
  3. Docker Configuration
  4. Google Cloud Run Deployment
  5. Fine-Tuning Pipeline (Cold Start, Reasoning RL, Data Collection, Final RL Phase)