@mikasenghaas
Last active November 1, 2025 00:47
Benchmarks for constructing OAI Pydantic models w/ many logprobs

To reproduce, download the .json files with raw responses from here.

Running in standard mode

uv run oai_pydantic.py
Processing oai_response_1024.json...
model_validate taken 5.02ms
model_construct taken 5.86ms
model_validate_json taken 2.18ms

Processing oai_response_8192.json...
model_validate taken 12.85ms
model_construct taken 37.14ms
model_validate_json taken 15.86ms

Processing oai_response_16384.json...
model_validate taken 54.59ms
model_construct taken 52.33ms
model_validate_json taken 31.51ms

Processing oai_response_32768.json...
model_validate taken 127.27ms
model_construct taken 133.14ms
model_validate_json taken 63.51ms
  • Indeed, model_validate_json is roughly 2x faster than model_validate and model_construct on the 1k, 16k and 32k samples (on the 8k sample it is slightly slower than model_validate). That's nice, but this would still be a huge bottleneck in RL training with 4k+ batch sizes.
  • As @baggiponte said, I am quite surprised that model_validate always takes ~same time as model_construct. Shouldn't the latter not do any validation? Maybe this is not true for submodels? If so, is there a way to also skip validation of submodels - all this data is entirely trusted so skipping validation entirely is fine
  • Nit: I am now measuring ~130ms on the 32k sample instead of the ~180ms I got a couple of days ago. Maybe that is because I switched from plain time to perf_counter, or the CPU is having a better day, who knows. What matters is the order of magnitude, which is unchanged.
  • Also, it does seem like constructing the Pydantic model scales ~linearly with the number of logprobs it has to parse.

Ignoring the logprobs field

uv run oai_pydantic.py --ignore-logprobs
Processing oai_response_1024.json...
model_validate taken 2.97ms
model_construct taken 3.38ms
model_validate_json taken 0.26ms

Processing oai_response_8192.json...
model_validate taken 0.03ms
model_construct taken 0.15ms
model_validate_json taken 0.10ms

Processing oai_response_16384.json...
model_validate taken 0.03ms
model_construct taken 0.13ms
model_validate_json taken 0.16ms

Processing oai_response_32768.json...
model_validate taken 0.03ms
model_construct taken 0.14ms
model_validate_json taken 0.26ms
  • It seems like it's only the logprob parsing that is taking a significant amount of time.

Using the hotfix that we use in prime-rl for now

uv run oai_pydantic.py --use-hotfix
Processing oai_response_1024.json...
model_validate taken 2.38ms
model_construct taken 3.03ms
model_validate_json taken 1.96ms

Processing oai_response_8192.json...
model_validate taken 0.17ms
model_construct taken 0.15ms
model_validate_json taken 13.58ms

Processing oai_response_16384.json...
model_validate taken 0.97ms
model_construct taken 0.14ms
model_validate_json taken 21.81ms

Processing oai_response_32768.json...
model_validate taken 1.83ms
model_construct taken 0.14ms
model_validate_json taken 43.73ms

Our hotfix essentially skips whatever Pydantic does to the logprobs field so we are still quick. Interestingly (I hadn't tested this before), model_validate_json does not seem to profit from it.
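
For readers who want to see the mechanism in isolation, here is a small sketch with toy models. TokenLogprob, Logprobs and the two Choice classes below are hypothetical stand-ins, not the SDK types. It shows that re-annotating a field as Any makes Pydantic pass the raw value through instead of building nested model objects.

```python
from typing import Any, List, Optional

from pydantic import BaseModel


# Hypothetical stand-ins for the SDK's logprob models
class TokenLogprob(BaseModel):
    token: str
    logprob: float


class Logprobs(BaseModel):
    content: List[TokenLogprob]


class Choice(BaseModel):
    index: int
    logprobs: Optional[Logprobs] = None


class ChoiceAny(Choice):
    # Re-annotating as Any: the value is stored as-is, no nested models are built
    logprobs: Optional[Any] = None


raw = {"index": 0, "logprobs": {"content": [{"token": "hi", "logprob": -0.1}]}}

strict = Choice.model_validate(raw)
loose = ChoiceAny.model_validate(raw)

print(type(strict.logprobs).__name__)  # Logprobs
print(type(loose.logprobs).__name__)   # dict
```

The trade-off is that downstream code has to read the field with dict access (loose.logprobs["content"]) rather than attribute access.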

Super interested to hear where people think the bottleneck is and if we can find a more elegant general solution!:)

import argparse
import glob
import json
from time import perf_counter
from typing import Any, List, Optional

import openai.types.chat.chat_completion
from openai.types.chat.chat_completion import ChatCompletion, Choice


class ChoiceAny(Choice):
    """Same as openai.types.chat.chat_completion.Choice, but without type validation for logprobs field."""

    logprobs: Optional[Any] = None


class ChatCompletionAny(ChatCompletion):
    """Same as openai.types.chat.chat_completion.ChatCompletion, but using ChoiceAny instead of Choice."""

    choices: List[ChoiceAny]  # type: ignore


def main(args: argparse.Namespace):
    # Re-import here so that the --use-hotfix patch of the module attribute is picked up.
    from openai.types.chat.chat_completion import ChatCompletion

    oai_response_files = sorted(glob.glob("oai_response_*.json"), key=lambda x: int(x.split(".")[0].split("_")[-1]))
    for oai_response_file in oai_response_files:
        print(f"Processing {oai_response_file}...")
        num_completion_tokens = int(oai_response_file.split(".")[0].split("_")[-1])
        with open(oai_response_file, "r") as f:
            oai_response = json.load(f)
        if args.ignore_logprobs:
            for choice in oai_response["choices"]:
                choice["logprobs"] = None
        oai_response_json = json.dumps(oai_response)

        start_time = perf_counter()
        completion = ChatCompletion.model_validate(oai_response)
        assert completion.usage is not None and completion.usage.completion_tokens == num_completion_tokens
        print(f"model_validate taken {1000 * (perf_counter() - start_time):.2f}ms")

        start_time = perf_counter()
        completion = ChatCompletion.model_construct(**oai_response)
        assert completion.usage is not None and completion.usage.completion_tokens == num_completion_tokens
        print(f"model_construct taken {1000 * (perf_counter() - start_time):.2f}ms")

        start_time = perf_counter()
        completion = ChatCompletion.model_validate_json(oai_response_json)
        assert completion.usage is not None and completion.usage.completion_tokens == num_completion_tokens
        print(f"model_validate_json taken {1000 * (perf_counter() - start_time):.2f}ms")

        print()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--ignore-logprobs", action="store_true")
    parser.add_argument("--use-hotfix", action="store_true")
    args = parser.parse_args()
    if args.use_hotfix:
        # Swap the module-level class; main() re-imports it and sees the patched version.
        openai.types.chat.chat_completion.ChatCompletion = ChatCompletionAny
    main(args)
@rahuliyer95

rahuliyer95 commented Oct 31, 2025

This analysis is based on my understanding of the code & libraries, please correct me if I am mistaken. Hope this helps :)

Concerns

I believe the benchmarking is not very accurate, for a couple of reasons:

  1. The first time a BaseModel is constructed Pydantic spends some time to create its schema (https://github.com/pydantic/pydantic/blob/1a8850d101e67d2744ba8c6286e1172d7cd89d0b/pydantic/_internal/_model_construction.py#L641) which is cached for future uses. This is not being correctly accounted for in this benchmarking code
  2. Computing / comparing the time taken for model_validate/model_construct and model_validate_json does not take into consideration the time required to parse the JSON.
    i. model_validate/model_construct is doing one operation, creating the BaseModel.
    ii. model_validate_json is doing two operations, parsing the JSON and creating the BaseModel.
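
Both concerns can be handled by a small timing helper. The sketch below is an assumption about how one might structure it, not the code used for the results that follow: best_of is a hypothetical helper that runs an untimed warm-up first (to absorb one-off costs such as schema building) and then reports the fastest of several timed runs. It is demonstrated on json.loads with a synthetic payload so that it runs without the response files.

```python
import json
from time import perf_counter
from typing import Any, Callable


def best_of(fn: Callable[[], Any], warmup: int = 1, repeats: int = 5) -> float:
    """Return the fastest wall-clock time of fn() in milliseconds."""
    for _ in range(warmup):
        fn()  # untimed: absorbs one-off costs such as schema building
    timings = []
    for _ in range(repeats):
        start = perf_counter()
        fn()
        timings.append(1000 * (perf_counter() - start))
    return min(timings)


# Synthetic stand-in for a raw response body
payload = json.dumps(
    {"choices": [{"logprobs": [{"token": "a", "logprob": -0.5}] * 1000}]}
)

parse_ms = best_of(lambda: json.loads(payload))
print(f"json.loads alone took {parse_ms:.3f}ms")
```

To compare like with like, the same helper could be given a callable that parses and validates in one go (for example a lambda wrapping model_validate(json.loads(...))) next to one that calls model_validate_json on the raw string.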

Updated Benchmarking

I have updated and attached the benchmarking code at the bottom.

Here are the results from my updated code which takes into account the time required to parse JSON for model_validate and model_construct

Results

  1. Execution with no CLI flags
❯ ./oai_pydantic_v2.py
Python Version: 3.14.0 (main, Oct 28 2025, 12:03:45) [Clang 20.1.4 ]
Pydantic Version: 2.12.3
OpenAI Version: 2.6.1

Processing /Users/rahuliyer/Downloads/oai_responses/oai_response_1024.json...
model_validate took 0.979ms
model_construct took 2.540ms
model_validate_json took 0.822ms

Processing /Users/rahuliyer/Downloads/oai_responses/oai_response_8192.json...
model_validate took 7.402ms
model_construct took 7.847ms
model_validate_json took 5.343ms

Processing /Users/rahuliyer/Downloads/oai_responses/oai_response_16384.json...
model_validate took 15.064ms
model_construct took 15.697ms
model_validate_json took 10.561ms

Processing /Users/rahuliyer/Downloads/oai_responses/oai_response_32768.json...
model_validate took 30.157ms
model_construct took 31.323ms
model_validate_json took 21.552ms  
  2. Execution with --ignore-logprobs
❯ ./oai_pydantic_v2.py --ignore-logprobs
Python Version: 3.14.0 (main, Oct 28 2025, 12:03:45) [Clang 20.1.4 ]
Pydantic Version: 2.12.3
OpenAI Version: 2.6.1

Processing /Users/rahuliyer/Downloads/oai_responses/oai_response_1024.json...
model_validate took 0.017ms
model_construct took 1.229ms
model_validate_json took 0.056ms

Processing /Users/rahuliyer/Downloads/oai_responses/oai_response_8192.json...
model_validate took 0.037ms
model_construct took 0.072ms
model_validate_json took 0.055ms

Processing /Users/rahuliyer/Downloads/oai_responses/oai_response_16384.json...
model_validate took 0.066ms
model_construct took 0.094ms
model_validate_json took 0.096ms

Processing /Users/rahuliyer/Downloads/oai_responses/oai_response_32768.json...
model_validate took 0.104ms
model_construct took 0.161ms
model_validate_json took 0.158ms  
  3. Execution with --use-hotfix
❯ ./oai_pydantic_v2.py --use-hotfix
Python Version: 3.14.0 (main, Oct 28 2025, 12:03:45) [Clang 20.1.4 ]
Pydantic Version: 2.12.3
OpenAI Version: 2.6.1

Processing /Users/rahuliyer/Downloads/oai_responses/oai_response_1024.json...
model_validate took 0.523ms
model_construct took 1.625ms
model_validate_json took 0.472ms

Processing /Users/rahuliyer/Downloads/oai_responses/oai_response_8192.json...
model_validate took 3.397ms
model_construct took 3.378ms
model_validate_json took 2.888ms

Processing /Users/rahuliyer/Downloads/oai_responses/oai_response_16384.json...
model_validate took 7.296ms
model_construct took 6.774ms
model_validate_json took 5.612ms

Processing /Users/rahuliyer/Downloads/oai_responses/oai_response_32768.json...
model_validate took 14.812ms
model_construct took 13.750ms
model_validate_json took 12.317ms 
  4. Execution with --ignore-logprobs and --use-hotfix
❯ ./oai_pydantic_v2.py --ignore-logprobs --use-hotfix
Python Version: 3.14.0 (main, Oct 28 2025, 12:03:45) [Clang 20.1.4 ]
Pydantic Version: 2.12.3
OpenAI Version: 2.6.1

Processing /Users/rahuliyer/Downloads/oai_responses/oai_response_1024.json...
model_validate took 0.020ms
model_construct took 1.037ms
model_validate_json took 0.055ms

Processing /Users/rahuliyer/Downloads/oai_responses/oai_response_8192.json...
model_validate took 0.037ms
model_construct took 0.082ms
model_validate_json took 0.057ms

Processing /Users/rahuliyer/Downloads/oai_responses/oai_response_16384.json...
model_validate took 0.055ms
model_construct took 0.087ms
model_validate_json took 0.109ms

Processing /Users/rahuliyer/Downloads/oai_responses/oai_response_32768.json...
model_validate took 0.119ms
model_construct took 0.164ms
model_validate_json took 0.164ms

Code

oai_pydantic_v2.py
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.10"
# dependencies = [
#   "openai==2.6.1",
#   "pydantic==2.12.3",
# ]
# ///
import argparse
import gc
import json
import sys
from pathlib import Path
from time import perf_counter
from typing import Any, List, Optional

import openai.types.chat.chat_completion
import pydantic
from openai.types.chat.chat_completion import ChatCompletion, Choice


class ChoiceAny(Choice):
    """Same as openai.types.chat.chat_completion.Choice, but without type validation for logprobs field."""

    logprobs: Optional[Any] = None


class ChatCompletionAny(ChatCompletion):
    """Same as openai.types.chat.chat_completion.ChatCompletion, but using ChoiceAny instead of Choice."""

    choices: List[ChoiceAny]


def main(args: argparse.Namespace) -> None:
    from openai.types.chat.chat_completion import ChatCompletion

    gc.disable()
    oai_response_files = sorted(
        Path(__file__).parent.glob("oai_response_*.json"),
        key=lambda x: int(x.stem.split("_")[-1]),
    )
    for i, oai_response_file in enumerate(oai_response_files):
        print(f"Processing {oai_response_file}...")
        oai_response = json.loads(oai_response_file.read_text(encoding="utf-8"))
        num_completion_tokens = int(oai_response_file.stem.split("_")[-1])
        if args.ignore_logprobs:
            for choice in oai_response["choices"]:
                choice["logprobs"] = None
        oai_response_json = json.dumps(oai_response)

        # Dry run to cache the model's schema (only needs to be done once)
        if i == 0:
            ChatCompletion.model_validate(oai_response)

        start_time = perf_counter()
        oai_response = json.loads(oai_response_json)
        completion = ChatCompletion.model_validate(oai_response)
        assert (
            completion.usage is not None
            and completion.usage.completion_tokens == num_completion_tokens
        )
        end_time = perf_counter()
        print(f"model_validate took {1000 * (end_time - start_time):.3f}ms")

        start_time = perf_counter()
        oai_response = json.loads(oai_response_json)
        completion = ChatCompletion.model_construct(**oai_response)
        assert (
            completion.usage is not None
            and completion.usage.completion_tokens == num_completion_tokens
        )
        end_time = perf_counter()
        print(f"model_construct took {1000 * (end_time - start_time):.3f}ms")

        start_time = perf_counter()
        completion = ChatCompletion.model_validate_json(oai_response_json)
        assert (
            completion.usage is not None
            and completion.usage.completion_tokens == num_completion_tokens
        )
        end_time = perf_counter()
        print(f"model_validate_json took {1000 * (end_time - start_time):.3f}ms")

        print()
    gc.enable()


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--ignore-logprobs", action="store_true")
    parser.add_argument("--use-hotfix", action="store_true")
    args = parser.parse_args()
    print(
        f"""
Python Version: {sys.version}
Pydantic Version: {pydantic.__version__}
OpenAI Version: {openai.__version__}
""".lstrip()
    )
    if args.use_hotfix:
        openai.types.chat.chat_completion.ChatCompletion = ChatCompletionAny
    main(args)

What's happening with model_construct?

The OpenAI Python SDK has its own BaseModel (https://github.com/openai/openai-python/blob/4e8856576211064b09c0cc4a1ed35b82b169abe2/src/openai/_models.py#L86) which implements a custom model_construct called construct (https://github.com/openai/openai-python/blob/4e8856576211064b09c0cc4a1ed35b82b169abe2/src/openai/_models.py#L206) that recursively constructs its fields from the JSON dict. This is what creates the additional latency. If it used the original model_construct from pydantic.BaseModel, the timings would be vastly different, but so would the behavior: nested values would stay plain dicts. Here are the results:

❯ ./oai_model_construct.py
Python Version: 3.14.0 (main, Oct 28 2025, 12:03:45) [Clang 20.1.4 ]
Pydantic Version: 2.12.3
OpenAI Version: 2.6.1

OpenAI model_construct took 32.612ms
OpenAI type(completion.choices)=<class 'list'>
OpenAI type(completion.choices[0])=<class 'openai.types.chat.chat_completion.Choice'>
OpenAIcompletion.choices[0].finish_reason='length'

Pydantic model_construct took 17.983ms
Pydantic type(completion.choices)=<class 'list'>
Pydantic type(completion.choices[0])=<class 'dict'>
Pydantic calling completion.choices[0].finish_reason
'dict' object has no attribute 'finish_reason'
Pydantic fixed completion.choices[0]["finish_reason"]='length'

Code

oai_model_construct.py
#!/usr/bin/env -S uv run --script
# /// script
# requires-python = ">=3.12"
# dependencies = [
#   "openai==2.6.1",
#   "pydantic==2.12.3",
# ]
# ///
import gc
import json
import sys
from pathlib import Path
from time import perf_counter

import openai
import pydantic
from openai.types.chat.chat_completion import ChatCompletion
from pydantic import BaseModel


def patch_model_construct() -> None:
    openai._models.BaseModel = BaseModel
    modules_to_delete = [
        module
        for module in sys.modules
        if module.startswith("openai.") and module != "openai._models"
    ]
    for module in modules_to_delete:
        sys.modules.pop(module)


def main() -> None:
    gc.disable()
    oai_response_file = Path(__file__).parent / "oai_response_32768.json"

    start_time = perf_counter()
    oai_response = json.loads(oai_response_file.read_text(encoding="utf-8"))
    completion = ChatCompletion.model_construct(**oai_response)
    end_time = perf_counter()
    print(f"OpenAI model_construct took {1000 * (end_time - start_time):.3f}ms")
    print(f"OpenAI {type(completion.choices)=}")
    print(f"OpenAI {type(completion.choices[0])=}")
    print(f"OpenAI{completion.choices[0].finish_reason=}")
    print()

    patch_model_construct()

    from openai.types.chat.chat_completion import ChatCompletion as PatchedChatCompletion
    start_time = perf_counter()
    oai_response = json.loads(oai_response_file.read_text(encoding="utf-8"))
    completion = PatchedChatCompletion.model_construct(**oai_response)
    end_time = perf_counter()
    print(f"Pydantic model_construct took {1000 * (end_time - start_time):.3f}ms")
    print(f"Pydantic {type(completion.choices)=}")
    print(f"Pydantic {type(completion.choices[0])=}")
    print("Pydantic calling completion.choices[0].finish_reason")
    try:
        print(f"Pydantic {completion.choices[0].finish_reason=}")
    except AttributeError as exc:
        print(exc)
        print(f"Pydantic fixed {completion.choices[0]["finish_reason"]=}")
    print()

    gc.enable()


if __name__ == "__main__":
    print(
        f"""
Python Version: {sys.version}
Pydantic Version: {pydantic.__version__}
OpenAI Version: {openai.__version__}
""".lstrip()
    )
    main()
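
The non-recursive behaviour of stock Pydantic model_construct can also be reproduced without the OpenAI SDK. This is a minimal sketch with toy models; Inner and Outer are hypothetical names, not SDK classes.

```python
from typing import List

from pydantic import BaseModel


class Inner(BaseModel):
    finish_reason: str


class Outer(BaseModel):
    choices: List[Inner]


data = {"choices": [{"finish_reason": "length"}]}

validated = Outer.model_validate(data)
# model_construct stores the values it is given without validating or converting them
constructed = Outer.model_construct(**data)

print(type(validated.choices[0]).__name__)    # Inner
print(type(constructed.choices[0]).__name__)  # dict
```

This is why attribute access on choices[0] fails in the patched run above: nothing ever turned the nested dicts into model instances.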

Thoughts

Given the nested structure of the response, the time required to build the BaseModel is expected. If you don't want to spend this time you can simply use the results of .json() aka dict[str, Any] which would probably be the fastest. If you want type hints but no validation or the overhead of creating BaseModel you can convert all these models to TypedDict which would give you type-hints in your IDE.
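
As a rough illustration of the TypedDict route, here is a sketch using only the standard library. The three classes are a heavily trimmed, hypothetical subset of the response shape, and cast only informs the type checker; nothing is validated at runtime.

```python
import json
from typing import List, TypedDict, cast


class Usage(TypedDict):
    completion_tokens: int


class Choice(TypedDict):
    index: int
    finish_reason: str


class Completion(TypedDict):
    id: str
    choices: List[Choice]
    usage: Usage


raw = (
    '{"id": "x", "choices": [{"index": 0, "finish_reason": "length"}],'
    ' "usage": {"completion_tokens": 3}}'
)

# cast() is a no-op at runtime: no validation, no copying, no model objects
completion = cast(Completion, json.loads(raw))

print(completion["choices"][0]["finish_reason"])
```

The result is still a plain dict, so the only cost is JSON parsing, while editors and type checkers can still complete and check the keys.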

@samuelcolvin

Thanks @rahuliyer95, what you've said makes loads of sense.

I've no idea why stainlessapi/openai have that model_construct method and don't use model_validate_json.

In a hurry to respond, I didn't check if it was a vanilla BaseModel.

@mikasenghaas it's also worth noting that you're conflating the time taken to decode the JSON with the time taken to run model_construct - that won't have much effect here since model_construct is so slow, but if you used a saner approach like just calling model_validate, it makes quite a lot of difference:

this code
import json
import time
from pathlib import Path

from openai.types.chat.chat_completion import ChatCompletion

content = Path("oai_response_32768.json").read_bytes()

json_data = json.loads(content)

start = time.perf_counter()
ChatCompletion.model_validate(json_data)
end = time.perf_counter()
print(f"model_validate(json_data) taken: {1000 * (end - start): .2f}ms")

start = time.perf_counter()
ChatCompletion.model_validate(json.loads(content))
end = time.perf_counter()
print(f"model_validate(json.loads(content)) taken: {1000 * (end - start): .2f}ms")

start = time.perf_counter()
ChatCompletion.model_validate_json(content)
end = time.perf_counter()
print(f"model_validate_json taken: {1000 * (end - start): .2f}ms")

gives:

model_validate(json_data) taken:  28.24ms
model_validate(json.loads(content)) taken:  45.23ms
model_validate_json taken:  33.34ms

If you really care about performance but need validation, you can shave ~33% off validation time by using typed dicts and TypeAdapter

TypedDict and TypeAdapter example
import time
from pathlib import Path
from typing import Literal, NotRequired, TypedDict

from pydantic import TypeAdapter


class TopLogprob(TypedDict):
    token: str
    bytes: NotRequired[list[int] | None]
    logprob: float


class ChatCompletionTokenLogprob(TypedDict):
    token: str
    bytes: NotRequired[list[int] | None]
    logprob: float
    top_logprobs: list[TopLogprob]


class ChoiceLogprobs(TypedDict):
    content: NotRequired[list[ChatCompletionTokenLogprob] | None]
    """A list of message content tokens with log probability information."""
    refusal: NotRequired[list[ChatCompletionTokenLogprob] | None]
    """A list of message refusal tokens with log probability information."""


class AnnotationURLCitation(TypedDict):
    end_index: int
    """The index of the last character of the URL citation in the message."""
    start_index: int
    """The index of the first character of the URL citation in the message."""
    title: str
    """The title of the web resource."""
    url: str
    """The URL of the web resource."""


class Annotation(TypedDict):
    type: Literal["url_citation"]
    url_citation: AnnotationURLCitation


class ChatCompletionAudio(TypedDict):
    id: str
    data: str
    expires_at: int
    transcript: str


class ChatCompletionMessage(TypedDict):
    content: NotRequired[str | None]
    refusal: NotRequired[str | None]
    role: Literal["assistant"]
    annotations: NotRequired[list[Annotation] | None]
    audio: NotRequired[ChatCompletionAudio | None]


class Choice(TypedDict):
    finish_reason: Literal[
        "stop", "length", "tool_calls", "content_filter", "function_call"
    ]
    index: int
    logprobs: NotRequired[ChoiceLogprobs | None]
    message: ChatCompletionMessage


class CompletionTokensDetails(TypedDict):
    accepted_prediction_tokens: NotRequired[int | None]
    audio_tokens: NotRequired[int | None]
    reasoning_tokens: NotRequired[int | None]
    rejected_prediction_tokens: NotRequired[int | None]


class PromptTokensDetails(TypedDict):
    audio_tokens: NotRequired[int | None]
    cached_tokens: NotRequired[int | None]


class CompletionUsage(TypedDict):
    completion_tokens: int
    prompt_tokens: int
    total_tokens: int
    completion_tokens_details: NotRequired[CompletionTokensDetails | None]
    prompt_tokens_details: NotRequired[PromptTokensDetails | None]


class ChatCompletion(TypedDict):
    id: str
    choices: list[Choice]
    created: int
    model: str
    object: Literal["chat.completion"]
    service_tier: NotRequired[
        Literal["auto", "default", "flex", "scale", "priority"] | None
    ]
    system_fingerprint: NotRequired[str | None]
    usage: NotRequired[CompletionUsage | None]


content = Path("oai_response_32768.json").read_bytes()
ta = TypeAdapter(ChatCompletion)
start = time.perf_counter()
ta.validate_json(content)
end = time.perf_counter()
print(f"model_validate_json taken: {1000 * (end - start): .2f}ms")

gives:

model_validate_json taken:  23.08ms

Or if you just want to parse the JSON, you can almost halve the time again:

import time
from pathlib import Path

import pydantic_core

# warmup
pydantic_core.from_json(b"{}")
content = Path("oai_response_32768.json").read_bytes()
start = time.perf_counter()
pydantic_core.from_json(content)
end = time.perf_counter()
print(f"pydantic_core.from_json(content) taken: {1000 * (end - start): .2f}ms")
#> pydantic_core.from_json(content) taken:  13.07ms
