gnomefin · April 25, 2026 08:13
diff --git a/pr3118_evidence.log b/pr3118_evidence.log
 ================================================================
 VoxCPM2 PR #3118 — review-round evidence pack
 Branch HEAD: 8c5c4cda
 https://github.com/vllm-project/vllm-omni/pull/3118
 ================================================================

 Contents:
  Part 1 — pytest -v on tests/entrypoints/openai_api/test_serving_speech_voxcpm2.py
  Part 2 — live HTTP curl checks of the deployed image, including:
           * NEW vs OLD shape (extra_params migration)
           * 400 range guard, 400 type guard
           * 400 length cap on /v1/audio/speech (P2 fix)
           * cfg A/B in pure-text mode and Hi-Fi mode (P1 fix)

 ================================================================
 Part 1 — Unit test run (29/29)
 ================================================================

 ============================= test session starts ==============================
 platform linux -- Python 3.12.13, pytest-9.0.3, pluggy-1.6.0 -- /usr/bin/python3
 cachedir: .pytest_cache
 rootdir: /tmp/pr-final
 configfile: pyproject.toml
 plugins: mock-3.15.1, asyncio-1.3.0, hydra-core-1.3.2, typeguard-4.5.1, anyio-4.13.0
 asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
 collecting ... collected 29 items

 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_voxcpm2_model_type_detection PASSED [  3%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_voxcpm2_accepts_any_text_input PASSED [  6%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_build_prompt_text_only PASSED [ 10%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_build_prompt_prepends_instructions PASSED [ 13%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_build_prompt_strips_instructions_whitespace PASSED [ 17%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_build_prompt_stashes_cfg_value PASSED [ 20%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_build_prompt_omits_cfg_value_when_extra_params_missing PASSED [ 24%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_build_prompt_omits_cfg_value_when_extra_params_has_other_keys PASSED [ 27%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_build_prompt_instructions_and_cfg_together PASSED [ 31%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_build_prompt_hifi_cloning_ref_audio_ref_text_cfg PASSED [ 34%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_build_prompt_hifi_mode_ignores_instructions PASSED [ 37%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_validate_rejects_overlong_instructions PASSED [ 41%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_validate_accepts_at_limit_instructions PASSED [ 44%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_prepare_speech_generation_runs_validator_for_voxcpm2 PASSED [ 48%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_accepts_range[0.1] PASSED [ 51%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_accepts_range[0.5] PASSED [ 55%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_accepts_range[1.5] PASSED [ 58%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_accepts_range[2.0] PASSED [ 62%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_accepts_range[2.7] PASSED [ 65%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_accepts_range[5.0] PASSED [ 68%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_accepts_range[10.0] PASSED [ 72%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_rejects_out_of_range[0.0] PASSED [ 75%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_rejects_out_of_range[-1.0] PASSED [ 79%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_rejects_out_of_range[10.5] PASSED [ 82%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_rejects_out_of_range[100.0] PASSED [ 86%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_rejects_non_numeric[abc] PASSED [ 89%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_rejects_non_numeric[None] PASSED [ 93%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_rejects_non_numeric[bad2] PASSED [ 96%]
 entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_rejects_non_numeric[bad3] PASSED [100%]

 =============================== warnings summary ===============================
 ../vllm_omni/__init__.py:19
  /tmp/pr-final/vllm_omni/__init__.py:19: RuntimeWarning: Failed to import version from _version.py: No module named 'vllm_omni._version'
  This typically happens in development mode before building.
  Using fallback version 'dev'.
    from .version import __version__, __version_tuple__  # isort:skip # noqa: F401

 <frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

 <frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

 ../../../usr/local/lib/python3.12/dist-packages/torch/jit/_script.py:362: 14 warnings
  /usr/local/lib/python3.12/dist-packages/torch/jit/_script.py:362: DeprecationWarning: `torch.jit.script_method` is deprecated. Please switch to `torch.compile` or `torch.export`.
    warnings.warn(

 ../vllm_omni/entrypoints/openai/protocol/audio.py:125
  /tmp/pr-final/vllm_omni/entrypoints/openai/protocol/audio.py:125: PydanticDeprecatedSince20: Support for class-based `config` is deprecated, use ConfigDict instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.12/migration/
    class CreateAudio(BaseModel):

 ../../../usr/local/lib/python3.12/dist-packages/torch/jit/_script.py:1480
  /usr/local/lib/python3.12/dist-packages/torch/jit/_script.py:1480: DeprecationWarning: `torch.jit.script` is deprecated. Please switch to `torch.compile` or `torch.export`.
    warnings.warn(

 -- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
 --- Running Summary
 ======================= 29 passed, 19 warnings in 2.68s ========================

 ============================================================
 VoxCPM2 PR #3118 — Live HTTP migration sanity check
 Branch HEAD: 8c5c4cda
 Pod image: 
 Image SHA: 
 Date: 2026-04-25T08:10:02Z
 ============================================================

 --- /v1/models ---
 HTTP 200
 served: voxcpm2 
 root:   openbmb/VoxCPM2

 --- 1. NEW shape: extra_params.cfg_value=2.7 (in-range) ---
 HTTP 200, time=1.040353s, bytes=46124

 --- 2. OLD shape: top-level cfg_value=2.7 (silently dropped after migration; field no longer in schema) ---
 HTTP 200, time=1.252619s, bytes=107564
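
A minimal sketch of the two request shapes compared in checks 1 and 2. The model name `voxcpm2` comes from the `/v1/models` output above; the `input` field and the helper names are assumptions for illustration, not code from the PR. `effective_cfg` models a server that reads `cfg_value` only from `extra_params`, which is how the log describes the old top-level field being silently dropped.

```python
def new_shape_payload(text, cfg_value):
    """NEW shape: cfg_value is nested under extra_params."""
    return {
        "model": "voxcpm2",
        "input": text,  # assumed field name for the text to synthesize
        "extra_params": {"cfg_value": cfg_value},
    }


def old_shape_payload(text, cfg_value):
    """OLD shape: cfg_value is a top-level field (no longer in the schema)."""
    return {"model": "voxcpm2", "input": text, "cfg_value": cfg_value}


def effective_cfg(payload):
    """Hypothetical helper: the cfg_value a server would see if it only
    looks inside extra_params and ignores unknown top-level fields."""
    return (payload.get("extra_params") or {}).get("cfg_value")


print(effective_cfg(new_shape_payload("hello", 2.7)))
print(effective_cfg(old_shape_payload("hello", 2.7)))
```

Under these assumptions the old shape still returns HTTP 200, but the request runs with whatever default guidance the server applies.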

 --- 3. Range guard: extra_params.cfg_value=15.0 (out of range, expect 400) ---
 HTTP 400
 {
    "error": {
        "message": "extra_params['cfg_value']=15.0 out of range (0.1-10.0)",
        "type": "BadRequestError",
        "param": null,
        "code": 400
    }
 }

 --- 4. Type guard: extra_params.cfg_value="abc" (non-numeric, expect 400) ---
 HTTP 400
 {
    "error": {
        "message": "extra_params['cfg_value'] must be a number: could not convert string to float: 'abc'",
        "type": "BadRequestError",
        "param": null,
        "code": 400
    }
 }
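
The range and type guards in checks 3 and 4 could be sketched as below. This is an illustration that reproduces the two error messages shown in the log, not the PR's actual validator; the function name and the use of `ValueError` (rather than the server's `BadRequestError`) are assumptions.

```python
CFG_MIN = 0.1
CFG_MAX = 10.0


def validate_cfg_value(extra_params):
    """Return cfg_value as a float, or None when it is not supplied.

    Raises ValueError for non-numeric or out-of-range values.
    """
    if not extra_params or "cfg_value" not in extra_params:
        return None
    raw = extra_params["cfg_value"]
    try:
        value = float(raw)
    except (TypeError, ValueError) as exc:
        # float("abc") raises ValueError, float(None) raises TypeError
        raise ValueError(
            f"extra_params['cfg_value'] must be a number: {exc}"
        ) from exc
    if not (CFG_MIN <= value <= CFG_MAX):
        raise ValueError(
            f"extra_params['cfg_value']={value} out of range ({CFG_MIN}-{CFG_MAX})"
        )
    return value
```

The bounds are inclusive, matching the parametrized tests in Part 1 that accept 0.1 and 10.0 and reject 0.0 and 10.5.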

 --- 5. Length cap on /v1/audio/speech (single-request, used to bypass; expect 400 now) ---
 HTTP 400
 {
    "error": {
        "message": "Instructions too long (max 500 characters)",
        "type": "BadRequestError",
        "param": null,
        "code": 400
    }
 }
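
A sketch of the length cap exercised in check 5, assuming the limit is measured in characters on the raw `instructions` string. The function name is hypothetical, and whether the real validator trims whitespace before measuring is not shown in this log.

```python
MAX_INSTRUCTIONS_CHARS = 500


def validate_instructions(instructions):
    """Return instructions unchanged, or raise if they exceed the cap."""
    if instructions is None:
        return None
    if len(instructions) > MAX_INSTRUCTIONS_CHARS:
        raise ValueError(
            f"Instructions too long (max {MAX_INSTRUCTIONS_CHARS} characters)"
        )
    return instructions
```

A string of exactly 500 characters passes, in line with `test_validate_accepts_at_limit_instructions` in Part 1.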

 --- 6. Decode-loop cfg propagation A/B: same input, only cfg differs ---
 cfg=2.5  HTTP 200  time=2.702284s  bytes=737324
 cfg=2.7  HTTP 200  time=2.814807s  bytes=737324
 cfg=3.0  HTTP 200  time=2.571122s  bytes=737324

 --- WAV durations from the cfg sweep (sanity: cfg should affect total length) ---
  cfg=2.5  bytes=737324    ~audio=7.68s
  cfg=2.7  bytes=737324    ~audio=7.68s
  cfg=3.0  bytes=737324    ~audio=7.68s
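
The `~audio` figures can be reproduced from the byte counts alone. The sketch below assumes a 44-byte WAV header and 48 kHz, mono, 16-bit PCM; these parameters are inferred from the numbers in this log rather than stated in it.

```python
WAV_HEADER_BYTES = 44     # assumed canonical PCM WAV header size
SAMPLE_RATE_HZ = 48_000   # assumed sample rate
BYTES_PER_SAMPLE = 2      # assumed 16-bit PCM
CHANNELS = 1              # assumed mono


def wav_duration_seconds(total_bytes):
    """Estimate audio duration from the size of a WAV response body."""
    pcm_bytes = total_bytes - WAV_HEADER_BYTES
    return pcm_bytes / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


# Byte counts reported in this log
for size in (737324, 860204, 952364):
    print(f"bytes={size}  ~audio={wav_duration_seconds(size):.2f}s")
```

With these assumptions the three sizes map to 7.68s, 8.96s, and 9.92s, matching the durations printed by the sweep.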

 ============================================================
 End
 ============================================================

 ============================================================
 ADDENDUM: cfg A/B with Hi-Fi clone (ref_audio + ref_text)
 ============================================================

 Pure-text generation (#6 above) converges to a sentence boundary
 regardless of cfg, so duration alone is a weak signal there. The
 stronger signal is when ref_audio + ref_text are set: with the
 decode-loop fix (commit 4e88314), every decode step honors the
 requested guidance, so total length now moves with cfg. In the
 run below, cfg=2.5 yields ~8.96s while cfg=2.7 and cfg=3.0 both
 yield ~9.92s. Before the fix, the cfg value only affected the
 first patch and total length barely moved.

 cfg=2.5  HTTP 200  time=10.042026s  bytes=860204
 cfg=2.7  HTTP 200  time=4.438532s  bytes=952364
 cfg=3.0  HTTP 200  time=3.584766s  bytes=952364

 WAV durations:
  cfg=2.5  bytes=860204    ~audio=8.96s
  cfg=2.7  bytes=952364    ~audio=9.92s
  cfg=3.0  bytes=952364    ~audio=9.92s
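
The decode-loop behavior described in the addendum can be illustrated with a toy model. This sketch assumes `cfg_value` acts as a classifier-free guidance scale of the usual form `uncond + scale * (cond - uncond)`; the function names, the scalar "patches", and the `default_cfg` of 2.0 are all hypothetical and are not taken from the VoxCPM2 code.

```python
def guided(cond, uncond, scale):
    """Standard classifier-free guidance combination (assumed form)."""
    return uncond + scale * (cond - uncond)


def decode(cond_steps, uncond_steps, cfg_value,
           default_cfg=2.0, apply_every_step=True):
    """Toy decode loop over scalar patches.

    apply_every_step=False mimics the pre-fix behavior described above,
    where the requested cfg only reached the first patch.
    """
    out = []
    for step, (c, u) in enumerate(zip(cond_steps, uncond_steps)):
        use_requested = apply_every_step or step == 0
        scale = cfg_value if use_requested else default_cfg
        out.append(guided(c, u, scale))
    return out


cond = [1.0, 1.0, 1.0]
uncond = [0.0, 0.0, 0.0]
print(decode(cond, uncond, 3.0, apply_every_step=False))  # pre-fix model
print(decode(cond, uncond, 3.0, apply_every_step=True))   # post-fix model
```

In the pre-fix model only the first output depends on the requested value, which is consistent with total length barely moving across a cfg sweep.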