vllm-omni PR #3118 evidence: pytest 29/29 + HTTP migration sanity (commit 8c5c4cda)
Gist gnomefin/e777048fea1339c01edf912886fafe0e, created April 25, 2026 08:13
================================================================
VoxCPM2 PR #3118: review-round evidence pack
Branch HEAD: 8c5c4cda
https://github.com/vllm-project/vllm-omni/pull/3118
================================================================
Contents:
  Part 1: pytest -v on tests/entrypoints/openai_api/test_serving_speech_voxcpm2.py
  Part 2: live HTTP curl checks of the deployed image, including:
    * NEW vs OLD shape (extra_params migration)
    * 400 range guard, 400 type guard
    * 400 length cap on /v1/audio/speech (P2 fix)
    * cfg A/B in pure-text mode and Hi-Fi mode (P1 fix)
================================================================
Part 1: Unit test run (29/29)
================================================================
============================= test session starts ==============================
platform linux -- Python 3.12.13, pytest-9.0.3, pluggy-1.6.0 -- /usr/bin/python3
cachedir: .pytest_cache
rootdir: /tmp/pr-final
configfile: pyproject.toml
plugins: mock-3.15.1, asyncio-1.3.0, hydra-core-1.3.2, typeguard-4.5.1, anyio-4.13.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collecting ... collected 29 items

entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_voxcpm2_model_type_detection PASSED [  3%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_voxcpm2_accepts_any_text_input PASSED [  6%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_build_prompt_text_only PASSED [ 10%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_build_prompt_prepends_instructions PASSED [ 13%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_build_prompt_strips_instructions_whitespace PASSED [ 17%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_build_prompt_stashes_cfg_value PASSED [ 20%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_build_prompt_omits_cfg_value_when_extra_params_missing PASSED [ 24%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_build_prompt_omits_cfg_value_when_extra_params_has_other_keys PASSED [ 27%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_build_prompt_instructions_and_cfg_together PASSED [ 31%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_build_prompt_hifi_cloning_ref_audio_ref_text_cfg PASSED [ 34%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_build_prompt_hifi_mode_ignores_instructions PASSED [ 37%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_validate_rejects_overlong_instructions PASSED [ 41%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_validate_accepts_at_limit_instructions PASSED [ 44%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_prepare_speech_generation_runs_validator_for_voxcpm2 PASSED [ 48%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_accepts_range[0.1] PASSED [ 51%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_accepts_range[0.5] PASSED [ 55%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_accepts_range[1.5] PASSED [ 58%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_accepts_range[2.0] PASSED [ 62%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_accepts_range[2.7] PASSED [ 65%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_accepts_range[5.0] PASSED [ 68%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_accepts_range[10.0] PASSED [ 72%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_rejects_out_of_range[0.0] PASSED [ 75%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_rejects_out_of_range[-1.0] PASSED [ 79%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_rejects_out_of_range[10.5] PASSED [ 82%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_rejects_out_of_range[100.0] PASSED [ 86%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_rejects_non_numeric[abc] PASSED [ 89%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_rejects_non_numeric[None] PASSED [ 93%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_rejects_non_numeric[bad2] PASSED [ 96%]
entrypoints/openai_api/test_serving_speech_voxcpm2.py::TestVoxCPM2Serving::test_cfg_value_rejects_non_numeric[bad3] PASSED [100%]

=============================== warnings summary ===============================
../vllm_omni/__init__.py:19
  /tmp/pr-final/vllm_omni/__init__.py:19: RuntimeWarning: Failed to import version from _version.py: No module named 'vllm_omni._version'
  This typically happens in development mode before building.
  Using fallback version 'dev'.
    from .version import __version__, __version_tuple__ # isort:skip # noqa: F401

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyPacked has no __module__ attribute

<frozen importlib._bootstrap>:488
  <frozen importlib._bootstrap>:488: DeprecationWarning: builtin type SwigPyObject has no __module__ attribute

../../../usr/local/lib/python3.12/dist-packages/torch/jit/_script.py:362: 14 warnings
  /usr/local/lib/python3.12/dist-packages/torch/jit/_script.py:362: DeprecationWarning: `torch.jit.script_method` is deprecated. Please switch to `torch.compile` or `torch.export`.
    warnings.warn(

../vllm_omni/entrypoints/openai/protocol/audio.py:125
  /tmp/pr-final/vllm_omni/entrypoints/openai/protocol/audio.py:125: PydanticDeprecatedSince20: Support for class-based `config` is deprecated, use ConfigDict instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.12/migration/
    class CreateAudio(BaseModel):

../../../usr/local/lib/python3.12/dist-packages/torch/jit/_script.py:1480
  /usr/local/lib/python3.12/dist-packages/torch/jit/_script.py:1480: DeprecationWarning: `torch.jit.script` is deprecated. Please switch to `torch.compile` or `torch.export`.
    warnings.warn(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
--- Running Summary
======================= 29 passed, 19 warnings in 2.68s ========================
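The build_prompt test names above pin down a small contract: text passes through, stripped instructions are prepended, cfg_value is stashed only when extra_params carries it, and Hi-Fi cloning (ref_audio + ref_text) ignores instructions. A minimal sketch of that contract follows, under loud assumptions: `build_prompt`, its keyword arguments, the returned dict keys, and the single space joining instructions to text are all hypothetical (the real serving code may format the prompt differently), and range/type checks on cfg_value are left to a separate validator.

```python
from typing import Any, Optional


def build_prompt(
    text: str,
    instructions: Optional[str] = None,
    ref_audio: Optional[str] = None,
    ref_text: Optional[str] = None,
    extra_params: Optional[dict[str, Any]] = None,
) -> dict[str, Any]:
    """Hypothetical sketch of the behavior the VoxCPM2 tests describe."""
    prompt: dict[str, Any] = {}
    hifi = ref_audio is not None and ref_text is not None
    if hifi:
        # Hi-Fi cloning: the reference clip and its transcript define the
        # voice, so free-form instructions are ignored.
        prompt["text"] = text
        prompt["ref_audio"] = ref_audio
        prompt["ref_text"] = ref_text
    elif instructions and instructions.strip():
        # Pure-text mode: strip surrounding whitespace, then prepend.
        # The single-space separator is an assumption of this sketch.
        prompt["text"] = f"{instructions.strip()} {text}"
    else:
        # Plain text; partial reference input (only one of ref_audio /
        # ref_text) is not covered by the tests and is ignored here.
        prompt["text"] = text
    # Stash cfg_value only when extra_params actually carries that key.
    if extra_params and "cfg_value" in extra_params:
        prompt["cfg_value"] = extra_params["cfg_value"]
    return prompt
```

Keeping cfg_value out of the prompt entirely when it is absent (rather than writing a default) lets the model-side default apply, which is what the two "omits_cfg_value" tests check for.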
============================================================
Part 2: Live HTTP migration sanity check (VoxCPM2 PR #3118)
Branch HEAD: 8c5c4cda
Pod image:
Image SHA:
Date: 2026-04-25T08:10:02Z
============================================================
--- /v1/models ---
HTTP 200
served: voxcpm2
root: openbmb/VoxCPM2
--- 1. NEW shape: extra_params.cfg_value=2.7 (in-range) ---
HTTP 200, time=1.040353s, bytes=46124
--- 2. OLD shape: top-level cfg_value=2.7 (silently dropped after migration; field no longer in schema) ---
HTTP 200, time=1.252619s, bytes=107564
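Checks 1 and 2 differ only in where cfg_value sits in the JSON body. The sketch below builds both shapes with the standard library without sending them. Assumptions: the server address http://localhost:8000 is hypothetical (substitute the pod's address), the body uses the OpenAI-style `model` and `input` fields with other fields such as `voice` omitted, and `speech_request` is a hypothetical helper name.

```python
import json
import urllib.request
from typing import Any, Optional

# Hypothetical address; point this at the deployed pod.
BASE_URL = "http://localhost:8000"


def speech_request(text: str, cfg_value: Optional[float] = None,
                   old_shape: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/audio/speech."""
    body: dict[str, Any] = {"model": "voxcpm2", "input": text}
    if cfg_value is not None:
        if old_shape:
            # Pre-migration shape: a top-level field that is no longer in
            # the schema, so the server drops it and uses its default.
            body["cfg_value"] = cfg_value
        else:
            # Post-migration shape: model-specific knobs go in extra_params.
            body["extra_params"] = {"cfg_value": cfg_value}
    return urllib.request.Request(
        f"{BASE_URL}/v1/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Against a live server, `urllib.request.urlopen(speech_request("Hello", 2.7))` would send the NEW-shape request; both shapes return HTTP 200, which is why the OLD shape fails silently rather than loudly.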
--- 3. Range guard: extra_params.cfg_value=15.0 (out of range, expect 400) ---
HTTP 400
{
  "error": {
    "message": "extra_params['cfg_value']=15.0 out of range (0.1-10.0)",
    "type": "BadRequestError",
    "param": null,
    "code": 400
  }
}
--- 4. Type guard: extra_params.cfg_value="abc" (non-numeric, expect 400) ---
HTTP 400
{
  "error": {
    "message": "extra_params['cfg_value'] must be a number: could not convert string to float: 'abc'",
    "type": "BadRequestError",
    "param": null,
    "code": 400
  }
}
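The two 400 messages in checks 3 and 4 are consistent with a validator that coerces cfg_value via float() and then range-checks it against 0.1-10.0; the type-guard text is exactly Python's own float('abc') error. A minimal sketch under that assumption follows; `validate_cfg_value`, `BadRequestError` and `error_body` are hypothetical names, and the real handler may differ (for example in how it treats booleans or numeric strings, which float() accepts).

```python
from typing import Any, Optional

CFG_MIN, CFG_MAX = 0.1, 10.0  # bounds taken from the 400 message in check 3


class BadRequestError(ValueError):
    """Hypothetical stand-in for the server's 400 error type."""


def validate_cfg_value(extra_params: Optional[dict[str, Any]]) -> Optional[float]:
    """Return a validated cfg_value, or None when none was supplied."""
    if not extra_params or "cfg_value" not in extra_params:
        return None  # nothing to check; the model default applies
    raw = extra_params["cfg_value"]
    try:
        value = float(raw)
    except (TypeError, ValueError) as exc:  # e.g. "abc" or None
        raise BadRequestError(
            f"extra_params['cfg_value'] must be a number: {exc}") from exc
    # NaN fails both comparisons, so it is rejected here as well.
    if not CFG_MIN <= value <= CFG_MAX:
        raise BadRequestError(
            f"extra_params['cfg_value']={value} out of range ({CFG_MIN}-{CFG_MAX})")
    return value


def error_body(exc: Exception, code: int = 400) -> dict[str, Any]:
    """Wrap an error in the envelope shape seen in the HTTP responses."""
    return {"error": {"message": str(exc), "type": type(exc).__name__,
                      "param": None, "code": code}}
```

Both bounds are inclusive in this sketch, matching the Part 1 parametrization that accepts 0.1 and 10.0 and rejects 0.0 and 10.5.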
--- 5. Length cap on /v1/audio/speech (single-request path used to bypass it; expect 400 now) ---
HTTP 400
{
  "error": {
    "message": "Instructions too long (max 500 characters)",
    "type": "BadRequestError",
    "param": null,
    "code": 400
  }
}
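Check 5 exercises the P2 fix: the single-request path on /v1/audio/speech now enforces the instructions cap it used to bypass (compare test_prepare_speech_generation_runs_validator_for_voxcpm2 and the at-limit / overlong tests in Part 1). A minimal sketch, assuming the cap counts characters of the raw string (whether the real code measures before or after stripping whitespace is not shown here); `validate_instructions` and `prepare_single_request` are hypothetical names.

```python
from typing import Optional

MAX_INSTRUCTIONS_CHARS = 500  # limit taken from the 400 message above


def validate_instructions(instructions: Optional[str]) -> None:
    """Reject over-long instructions; exactly 500 characters is allowed."""
    if instructions is not None and len(instructions) > MAX_INSTRUCTIONS_CHARS:
        raise ValueError(
            f"Instructions too long (max {MAX_INSTRUCTIONS_CHARS} characters)")


def prepare_single_request(text: str, instructions: Optional[str] = None) -> dict:
    """Hypothetical single-request path: validate first, then build the request."""
    validate_instructions(instructions)  # the check this path used to skip
    return {"text": text, "instructions": instructions}
```

Routing every entry point through one validator, rather than checking in each caller, is what closes this kind of bypass.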
--- 6. Decode-loop cfg propagation A/B: same input, only cfg differs ---
cfg=2.5 HTTP 200 time=2.702284s bytes=737324
cfg=2.7 HTTP 200 time=2.814807s bytes=737324
cfg=3.0 HTTP 200 time=2.571122s bytes=737324
--- WAV durations from the cfg sweep (sanity check: does cfg change total length?) ---
cfg=2.5 bytes=737324 ~audio=7.68s
cfg=2.7 bytes=737324 ~audio=7.68s
cfg=3.0 bytes=737324 ~audio=7.68s
============================================================
End
============================================================
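The "~audio" durations above (and in the addendum) match a plain byte-count estimate: subtract a 44-byte canonical WAV header and divide by 96,000 bytes of PCM per second. That rate fits 48 kHz 16-bit mono but equally 24 kHz 32-bit mono, so this sketch does not establish the actual sample format, and it is an assumption that the sanity script computed durations this way. For an exact reading of an integer-PCM file, the stdlib `wave` module's `getnframes()` / `getframerate()` avoid the guesswork.

```python
WAV_HEADER_BYTES = 44       # canonical RIFF/WAVE header with no extra chunks
BYTES_PER_SECOND = 96_000   # e.g. 48 kHz x 2 bytes (16-bit) x 1 channel


def approx_wav_seconds(total_bytes: int,
                       header_bytes: int = WAV_HEADER_BYTES,
                       bytes_per_second: int = BYTES_PER_SECOND) -> float:
    """Estimate duration from a response's byte count alone."""
    return (total_bytes - header_bytes) / bytes_per_second


# Reproduce the pure-text sweep from check 6.
for cfg, size in [(2.5, 737324), (2.7, 737324), (3.0, 737324)]:
    print(f"cfg={cfg} bytes={size} ~audio={approx_wav_seconds(size):.2f}s")
```

Identical byte counts therefore mean identical durations; the estimate cannot distinguish two outputs of equal length but different content.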

============================================================
ADDENDUM: cfg A/B with Hi-Fi clone (ref_audio + ref_text)
============================================================
Pure-text generation (#6 above) converges to a sentence boundary
regardless of cfg, so duration alone is a weak signal there: all
three runs returned identical 737324-byte (~7.68 s) outputs. The
stronger signal comes from setting ref_audio + ref_text. With the
decode-loop fix (commit 4e88314), every decode step honors the
requested cfg, and total length now moves with it: cfg=2.5 stops
earlier (~8.96 s) than cfg=2.7 and cfg=3.0 (~9.92 s each). Before
the fix, the cfg value only affected the first patch and total
length barely moved.
cfg=2.5 HTTP 200 time=10.042026s bytes=860204
cfg=2.7 HTTP 200 time=4.438532s bytes=952364
cfg=3.0 HTTP 200 time=3.584766s bytes=952364
WAV durations:
cfg=2.5 bytes=860204 ~audio=8.96s
cfg=2.7 bytes=952364 ~audio=9.92s
cfg=3.0 bytes=952364 ~audio=9.92s
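The addendum's claim is that before commit 4e88314 cfg was applied only to the first patch, and after it to every decode step. The toy sketch below illustrates that difference using the standard classifier-free guidance combination uncond + cfg * (cond - uncond) on scalar stand-ins for model predictions. Everything here is hypothetical: `decode`, the scalar predictors, and the fallback scale `default_cfg` used after step 0 in the pre-fix path are illustrations, not the VoxCPM2 decode loop.

```python
from typing import Callable

Predictor = Callable[[int], float]


def guided(cond: float, uncond: float, cfg: float) -> float:
    """Classifier-free guidance: scale the conditional/unconditional gap."""
    return uncond + cfg * (cond - uncond)


def decode(steps: int, cond: Predictor, uncond: Predictor, cfg: float,
           default_cfg: float = 2.0, per_step: bool = True) -> list[float]:
    """Toy decode loop; per_step=False mimics 'cfg only on the first patch'."""
    out = []
    for step in range(steps):
        scale = cfg if (per_step or step == 0) else default_cfg
        out.append(guided(cond(step), uncond(step), scale))
    return out


def cond_pred(step: int) -> float:
    return 1.0 + step  # hypothetical conditional prediction


def uncond_pred(step: int) -> float:
    return 0.5  # hypothetical unconditional prediction


print(decode(3, cond_pred, uncond_pred, cfg=3.0, per_step=False))  # [2.0, 3.5, 5.5]
print(decode(3, cond_pred, uncond_pred, cfg=3.0, per_step=True))   # [2.0, 5.0, 8.0]
```

Only step 0 agrees between the two paths; in a real autoregressive model every later step also feeds back into the next, so a request-level cfg that stops at the first patch can leave total length nearly unchanged, as the addendum describes.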