@kevmoo
Last active April 5, 2026 20:35
Pondering better json things in dart

JSON serialization in the Dart SDK is a sophisticated system that balances platform-native capabilities with cross-platform consistency. Below is an analysis of its inner workings, platform specializations, and a proposal for a streaming API.

1. How JSON Serialization Works in the SDK

The core of JSON serialization in dart:convert is built around a Visitor Pattern implemented in the internal _JsonStringifier class.

  • Encoder (JsonEncoder): Traverses the object graph. Primitives (num, String, bool, Null) and standard containers (List, Map) are handled directly. For other objects, it calls toEncodable (defaulting to .toJson()), which is expected to return a serializable object.
  • Specializations: The SDK provides _JsonStringStringifier (for String output) and _JsonUtf8Stringifier (for direct List<int> output). The UTF-8 version is highly optimized to avoid intermediate string allocations by writing directly to byte buffers.
  • Decoder (JsonDecoder): Uses an external method _parseJson which is patched per-platform. It supports both full-string conversion and chunked conversion via sinks.
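For reference, the behaviour described above can be exercised with today's public dart:convert API. This is only an illustrative sketch: `Point` is a made-up class and the helper names are not from the SDK.

```dart
import 'dart:convert';

// Hypothetical model class; jsonEncode reaches toJson() through the default
// toEncodable hook.
class Point {
  final int x, y;
  Point(this.x, this.y);

  Map<String, Object?> toJson() => {'x': x, 'y': y};
}

// String output: Object -> Map (from toJson) -> String.
String encodeToString(Point p) => jsonEncode(p);

// Byte output: JsonUtf8Encoder produces UTF-8 bytes directly.
List<int> encodeToUtf8(Point p) => JsonUtf8Encoder().convert(p);

// Chunked decoding: the input is fed in pieces through a conversion sink and
// the decoded value is delivered when the input sink is closed.
Object? decodeInChunks(List<String> chunks) {
  Object? result;
  final output = ChunkedConversionSink<Object?>.withCallback(
    (values) => result = values.single,
  );
  final input = const JsonDecoder().startChunkedConversion(output);
  chunks.forEach(input.add);
  input.close();
  return result;
}
```

Note that `encodeToString` and `encodeToUtf8` both still go through the intermediate Map returned by `toJson()`, which is the allocation the rest of this document tries to remove.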

2. Platform Specializations: VM vs. JS vs. Wasm

| Feature | Dart VM | JS (dart2js/ddc) | Wasm (dart2wasm) |
|---|---|---|---|
| Parsing Engine | Hand-written Dart state machine. | Native JSON.parse. | Hand-written Dart state machine (optimized for Wasm). |
| Streaming | True Streaming: The state machine maintains partial state across chunks. | Fake Streaming: Buffers the entire input into a StringBuffer before parsing. | True Streaming: Similar to the VM implementation. |
| Object Creation | Immediate creation of List and Map during parsing. | Lazy Decoding: Returns a _JsonMap wrapper. Values are converted to Dart objects only when accessed. | Immediate creation using optimized WasmArray structures. |
| UTF-8 Support | Direct decoding from bytes via _JsonUtf8Parser. | Usually requires conversion to a String first. | Direct decoding from bytes, leveraging JS string interning for keys. |
| Key Optimization | Linear scan for cycle detection; standard Map hashing. | Leverages JS engine's hidden classes/optimization for objects. | Uses a specialized string interning cache for JSON keys to save memory and time. |

3. Opportunities for Improvement

  1. True JS Streaming: The current JS JsonDecoder buffers the entire input. For multi-gigabyte JSON files, this leads to OutOfMemory errors. A Dart-implemented streaming parser for JS could alleviate this, even if it's slower than JSON.parse.
  2. Allocation Reduction: In toJson() patterns, a new Map is created just to be immediately iterated and discarded. This is a massive source of short-lived garbage.
  3. Cycle Detection: The current _seen list uses a linear search (identical). For extremely deep graphs, this becomes $O(N^2)$. While rare in JSON, a more efficient mechanism could be used for larger depths.
  4. Wasm String Interop: Further reducing the boundary crossing between Wasm memory and JS strings during parsing.

4. Pondering a "Streaming" API

The current "toJson" contract is a bottleneck: Object -> Map -> String. A streaming API would enable Object -> String directly.

Proposal: JsonWriter and JsonEncodable

We could introduce a public JsonWriter interface that mirrors the internal _JsonStringifier:

abstract interface class JsonWriter {
  void writeString(String value);
  void writeNumber(num value);
  void writeBool(bool value);
  void writeNull();
  
  void startObject();
  void endObject();
  void writePropertyName(String name);
  
  void startArray();
  void endArray();
  
  /// Write any serializable object using standard logic
  void writeValue(Object? value); 
}

And a companion interface for classes:

abstract interface class JsonStreamEncodable {
  void writeJson(JsonWriter writer);
}

How it would work:

  1. Modified Encoder: JsonEncoder would check if an object implements JsonStreamEncodable.
  2. No Intermediate Maps: Instead of calling toJson(), the encoder passes the JsonWriter to writeJson.
  3. Seamless Integration: A User object could write its fields directly to the underlying StringConversionSink or ByteConversionSink.

Example Usage:

class User implements JsonStreamEncodable {
  final String name;
  final List<User> friends;

  // Final fields need a constructor to be initialized.
  User(this.name, this.friends);

  @override
  void writeJson(JsonWriter writer) {
    writer.startObject();
    writer.writePropertyName('name');
    writer.writeString(name);
    writer.writePropertyName('friends');
    writer.startArray();
    for (var friend in friends) {
      friend.writeJson(writer); // Recursive streaming!
    }
    writer.endArray();
    writer.endObject();
  }
}
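To make the example concrete, here is a minimal sketch of what a String-backed writer might look like. Everything in it is an assumption: `JsonWriter` is the hypothetical interface from this proposal (with `writeValue` left out for brevity), `StringSinkJsonWriter` is an invented name, and `jsonEncode` is borrowed for string escaping only to keep the sketch short.

```dart
import 'dart:convert';

/// Hypothetical interface from the proposal; not part of dart:convert today.
abstract interface class JsonWriter {
  void writeString(String value);
  void writeNumber(num value);
  void writeBool(bool value);
  void writeNull();
  void startObject();
  void endObject();
  void writePropertyName(String name);
  void startArray();
  void endArray();
}

/// Sketch: emits compact JSON straight into a [StringSink].
class StringSinkJsonWriter implements JsonWriter {
  StringSinkJsonWriter(this._sink);

  final StringSink _sink;

  /// One flag per open container: true once it holds at least one member.
  final List<bool> _hasMember = [];

  /// True right after a property name, when the next value needs no comma.
  bool _afterName = false;

  // Writes the separating comma when a sibling precedes the next token.
  void _beforeToken() {
    if (_afterName) {
      _afterName = false;
    } else if (_hasMember.isNotEmpty) {
      if (_hasMember.last) _sink.write(',');
      _hasMember.last = true;
    }
  }

  void _scalar(String text) {
    _beforeToken();
    _sink.write(text);
  }

  void _open(String bracket) {
    _beforeToken();
    _sink.write(bracket);
    _hasMember.add(false);
  }

  void _close(String bracket) {
    _hasMember.removeLast();
    _sink.write(bracket);
  }

  // jsonEncode allocates a temporary String here; a real implementation
  // would escape directly into the sink.
  @override
  void writeString(String value) => _scalar(jsonEncode(value));

  @override
  void writeNumber(num value) {
    if (!value.isFinite) throw ArgumentError.value(value, 'value', 'not JSON');
    _scalar(value.toString());
  }

  @override
  void writeBool(bool value) => _scalar(value ? 'true' : 'false');

  @override
  void writeNull() => _scalar('null');

  @override
  void startObject() => _open('{');

  @override
  void endObject() => _close('}');

  @override
  void startArray() => _open('[');

  @override
  void endArray() => _close(']');

  @override
  void writePropertyName(String name) {
    _scalar(jsonEncode(name));
    _sink.write(':');
    _afterName = true;
  }
}
```

With a writer like this, `User.writeJson(StringSinkJsonWriter(buffer))` would fill `buffer` without building a Map for each user. The only state the writer needs is one bool per open container.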

Benefits:

  • Zero-Garbage Serialization: No intermediate Map or List objects for custom types.
  • Performance: Drastically reduces pressure on the GC, especially in high-throughput servers or memory-constrained environments (Wasm/Mobile).
  • Symmetry: On the decoding side, exposing a JsonListener would allow similar "SAX-style" parsing, enabling the processing of massive datasets that don't fit in memory.

I have completed my research. I've analyzed the proposed improvements in the gist, explored the current dart:convert implementation in the SDK, and examined how package:yaml and protoc_plugin handle JSON serialization.

The core issue is that Dart's current JSON serialization (via toJson() and jsonEncode) is "allocation-heavy," requiring the creation of intermediate Map and List objects that are immediately discarded. This creates significant GC pressure, especially in high-throughput or memory-constrained environments.

Below is a detailed proposal for a set of APIs to be added to the Dart SDK to address these use cases, followed by an addendum for yaml and protoc_plugin.


Proposal: High-Performance Streaming JSON API for the Dart SDK

1. Direct Streaming Interfaces

To eliminate intermediate allocations, we should introduce a public streaming API that allows objects to write their JSON representation directly to a sink.

JsonWriter (Abstract Interface)

A low-level, high-performance interface for writing JSON tokens. This mirrors the existing internal _JsonStringifier but makes it public and optimized.

abstract interface class JsonWriter {
  /// Writes a property name. Must be called inside an object.
  void writePropertyName(String name);

  /// Writes a string value (handles escaping).
  void writeString(String value);

  /// Writes a numeric value.
  void writeNumber(num value);

  /// Writes a boolean value.
  void writeBool(bool value);

  /// Writes a null value.
  void writeNull();

  /// Starts a JSON object `{`.
  void startObject();

  /// Ends a JSON object `}`.
  void endObject();

  /// Starts a JSON array `[`.
  void startArray();

  /// Ends a JSON array `]`.
  void endArray();

  /// Recursively writes any object. 
  /// If [value] is [JsonStreamEncodable], calls its [writeJson].
  /// Otherwise, falls back to standard [jsonEncode] logic.
  void writeObject(Object? value);
}

JsonStreamEncodable (Interface)

An interface for classes to opt-in to streaming serialization.

abstract interface class JsonStreamEncodable {
  /// Writes the object's JSON representation directly to the [writer].
  void writeJson(JsonWriter writer);
}

2. SDK "Toolkit" Primitives (The Engine Room)

To enable high-performance external encoders/decoders (like package:yaml or custom JSON parsers), the SDK should expose low-level primitives that are currently internal or "trapped" in private implementations.

A. Slice-based Number Parsing

Currently, double.parse and int.parse require a full String, forcing users to call substring() (which allocates) when parsing from a larger buffer.

  • double.parseSlice(String source, int start, int end)
  • int.parseSlice(String source, int start, int end, {int? radix})
  • Future-proofing: Versions that operate directly on List<int> (UTF-8 bytes) to avoid UTF-8 -> String conversion entirely during parsing.
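As a rough illustration of the idea, a slice-based integer parser can be written in plain Dart today. `parseIntSlice` below is a hypothetical stand-in for the proposed `int.parseSlice`; it handles base 10 only and does not check for overflow.

```dart
/// Parses the decimal integer in source[start..end) without creating a
/// substring. Hypothetical helper, not an SDK API.
int parseIntSlice(String source, int start, int end) {
  RangeError.checkValidRange(start, end, source.length);
  var i = start;
  var negative = false;
  if (i < end && source.codeUnitAt(i) == 0x2D /* '-' */) {
    negative = true;
    i++;
  }
  if (i == end) throw FormatException('Expected digits', source, i);
  var result = 0;
  for (; i < end; i++) {
    // '0'..'9' are 0x30..0x39, so XOR with 0x30 maps them to 0..9 and maps
    // every other code unit to a value above 9.
    final digit = source.codeUnitAt(i) ^ 0x30;
    if (digit > 9) throw FormatException('Invalid digit', source, i);
    result = result * 10 + digit;
  }
  return negative ? -result : result;
}
```

For example, `parseIntSlice('[12,-345]', 4, 8)` reads `-345` in place, where `int.parse` would first need `source.substring(4, 8)`.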

B. Standardized JSON Escaping

Expose the internal fast-path for JSON string escaping so external writers don't have to reimplement (often incorrectly or slowly) the escaping logic.

  • String jsonEscape(String source)
  • void jsonEscapeTo(String source, StringSink sink)
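A plain-Dart sketch of the two signatures above might look like the following. It is not the SDK's internal fast path. It escapes quotes, backslashes, and control characters below U+0020, and it gives lone surrogates no special treatment. `_escapeFor` is an invented helper name.

```dart
/// Returns the JSON escape for code unit [c], or null if none is needed.
String? _escapeFor(int c) => switch (c) {
      0x22 => r'\"',
      0x5C => r'\\',
      0x08 => r'\b',
      0x09 => r'\t',
      0x0A => r'\n',
      0x0C => r'\f',
      0x0D => r'\r',
      < 0x20 => '\\u${c.toRadixString(16).padLeft(4, '0')}',
      _ => null,
    };

/// Writes the escaped content of [source] (without surrounding quotes).
void jsonEscapeTo(String source, StringSink sink) {
  var runStart = 0; // Start of the pending run of unescaped characters.
  for (var i = 0; i < source.length; i++) {
    final escape = _escapeFor(source.codeUnitAt(i));
    if (escape == null) continue;
    sink.write(source.substring(runStart, i));
    sink.write(escape);
    runStart = i + 1;
  }
  // Fast path: if nothing needed escaping, no substring is created.
  sink.write(runStart == 0 ? source : source.substring(runStart));
}

String jsonEscape(String source) {
  final buffer = StringBuffer();
  jsonEscapeTo(source, buffer);
  return buffer.toString();
}
```

The common case (a string with nothing to escape) writes the original string once and allocates nothing extra, which is the property an SDK-provided version would want to preserve.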

C. String Interning

Expose a mechanism to deduplicate strings during decoding. This is critical for Map keys in JSON or labels in YAML.

  • String.intern(String candidate)
  • Internally, this could use a weak-referenced table to ensure frequent keys (like "id", "name", "type") share a single instance across the entire heap.
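For comparison, this is roughly what a package can do today without engine support. `StringInterner` is a hypothetical name. Unlike the weak table suggested above, it holds strong references, so entries live as long as the interner does.

```dart
/// Sketch of a package-level interner backed by an ordinary Map.
class StringInterner {
  final Map<String, String> _table = {};

  /// Returns the canonical instance equal to [candidate], registering
  /// [candidate] itself if no equal string has been seen yet.
  String intern(String candidate) =>
      _table.putIfAbsent(candidate, () => candidate);

  /// Number of distinct strings currently held.
  int get size => _table.length;
}
```

Each lookup still hashes and compares the candidate, and the table itself costs memory, which is the overhead an engine-level `String.intern` would aim to avoid.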

D. Identity-based Cycle Detection

Expose an efficient identity-based cycle detector. The current jsonEncode uses a linear scan of a _seen list, which costs $O(N)$ per object at nesting depth $N$ and is slow for deep trees. A public, optimized CycleDetector could be used by any custom serializer.

3. Integration with jsonEncode

Update JsonEncoder to check for JsonStreamEncodable:

// Internal implementation detail of jsonEncode
if (object is JsonStreamEncodable) {
  object.writeJson(internalWriter);
} else {
  // fallback to toJson()
}
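Spelled out a little further, the dispatch could look something like the sketch below. All names are assumptions: the two interfaces are the hypothetical ones from this proposal, `writeAny` stands in for the encoder's internal logic, and `TokenLog` and `Celsius` exist only to make the token order visible.

```dart
/// Hypothetical interfaces from the proposal; not part of dart:convert.
abstract interface class JsonWriter {
  void writeString(String value);
  void writeNumber(num value);
  void writeBool(bool value);
  void writeNull();
  void startObject();
  void endObject();
  void writePropertyName(String name);
  void startArray();
  void endArray();
}

abstract interface class JsonStreamEncodable {
  void writeJson(JsonWriter writer);
}

/// Streaming types first, then built-ins, then toJson() as a last resort.
void writeAny(JsonWriter writer, Object? value) {
  switch (value) {
    case null:
      writer.writeNull();
    case JsonStreamEncodable encodable:
      encodable.writeJson(writer); // No intermediate Map or List.
    case String s:
      writer.writeString(s);
    case num n:
      writer.writeNumber(n);
    case bool b:
      writer.writeBool(b);
    case List<Object?> list:
      writer.startArray();
      for (final element in list) {
        writeAny(writer, element);
      }
      writer.endArray();
    case Map<Object?, Object?> map:
      writer.startObject();
      map.forEach((key, element) {
        writer.writePropertyName(key as String);
        writeAny(writer, element);
      });
      writer.endObject();
    default:
      // Legacy path: allocates whatever toJson() returns.
      writeAny(writer, (value as dynamic).toJson());
  }
}

/// Records the token stream instead of producing text.
class TokenLog implements JsonWriter {
  final List<String> tokens = [];
  @override
  void writeString(String value) => tokens.add('str:$value');
  @override
  void writeNumber(num value) => tokens.add('num:$value');
  @override
  void writeBool(bool value) => tokens.add('bool:$value');
  @override
  void writeNull() => tokens.add('null');
  @override
  void startObject() => tokens.add('{');
  @override
  void endObject() => tokens.add('}');
  @override
  void writePropertyName(String name) => tokens.add('name:$name');
  @override
  void startArray() => tokens.add('[');
  @override
  void endArray() => tokens.add(']');
}

/// Hypothetical streaming-aware type.
class Celsius implements JsonStreamEncodable {
  final double degrees;
  Celsius(this.degrees);

  @override
  void writeJson(JsonWriter writer) {
    writer.startObject();
    writer.writePropertyName('celsius');
    writer.writeNumber(degrees);
    writer.endObject();
  }
}
```

Checking `JsonStreamEncodable` before the `toJson()` fallback means a class could implement both during a migration and the streaming path would win.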

Addendum: Opportunities for yaml and protoc_plugin

package:yaml (The Reader)

While primarily a reader, package:yaml would benefit immensely from the String Interning API.

  • Memory Efficiency: YAML documents often repeat keys and tags. Interning these during the load phase would drastically reduce the memory footprint of the resulting YamlMap structures.
  • Decoding Performance: Comparing interned strings via identical() is much faster than character-by-character comparison.

protoc_plugin (Generated Messages)

The protoc_plugin currently generates messages that convert themselves to Map<String, dynamic> before being stringified to JSON. This is a massive source of "short-lived garbage."

  • Implementation: Generated GeneratedMessage classes should implement JsonStreamEncodable.
  • Zero-Garbage Serialization: The generated writeJson method would look like this:
    @override
    void writeJson(JsonWriter writer) {
      writer.startObject();
      if (hasField(1)) {
        writer.writePropertyName('1');
        writer.writeObject(field1);
      }
      // ...
      writer.endObject();
    }
  • Result: Encoding a 10MB Protobuf message to JSON would require near-zero intermediate allocations, compared to the current behavior which might allocate 20-30MB of intermediate Maps and Lists.

package:yaml_edit (The Writer)

For packages that perform YAML writing or editing, a YamlWriter could be built on top of the same JsonWriter primitives (like escaping and number formatting), ensuring consistency and performance across different serialization formats in the Dart ecosystem.

To move JsonWriter and JsonReader into a high-performance external package while maintaining the efficiency of the SDK's internal implementations, the SDK would need to expose several "low-level" primitives that are currently hidden behind patch files and _internal libraries.

Here are the key features and hooks the SDK would need to provide:

1. Slice-based Number Parsing (Avoid String Allocations)

Currently, to parse a number from a JSON stream, you usually have to extract a String slice first and then call num.parse(slice).

  • The Problem: In a large JSON array of numbers, this creates millions of short-lived String objects.
  • The Solution: The SDK should expose a way to parse num, int, and double directly from a List<int> (UTF-8 bytes) or String using a start/end offset.
  • SDK Requirement: Public access to the internal _parseDouble(source, start, end) and an equivalent for int.

2. High-Performance String Escaping/Unescaping

The logic for handling JSON escapes (like \uXXXX, \n, \t) and Unicode surrogate pairs is non-trivial to implement efficiently.

  • The Problem: Every third-party package has to reimplement the "character attribute table" (identifying which characters need escaping) and the state machine for unescaping.
  • The Solution: Expose the internal _characterAttributes table and a standardized JsonEscape.escape(String) / JsonEscape.unescape(String) utility that uses the VM/JS-native fast paths.

3. String Interning / "Deduplication" Hooks

As seen in the Wasm implementation, interning keys in a Map is a massive win for memory and subsequent lookup speed.

  • The Problem: Standard Dart has no public "String Intern Table" API. A package-level cache is just a Map<String, String>, which is itself an overhead.
  • The Solution: An API like String.intern(String candidate) that allows the package to hint to the engine that this string is a repeated key.
  • Wasm-Specific: For dart2wasm, the package would need access to efficient JS-string interop (like jsStringFromDartString) to avoid costly boundary crossings when building the final Dart objects.

4. Zero-Copy Byte Management (Sinks)

Streaming serialization often writes to a ByteConversionSink.

  • The Problem: Currently, if you want to write a chunk of data, you often end up copying bytes into a new Uint8List.
  • The Solution: A "buffer-view" API for sinks. ByteConversionSink already declares addSlice(List<int> chunk, int start, int end, bool isLast), but the base Sink<List<int>> interface only offers add(List<int> bytes), and addSlice implementations are free to copy the range internally. Making a non-copying addSlice a guaranteed part of the sink contract would let the writer reuse a single large buffer.
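The buffer-reuse pattern can be sketched with the existing `ByteConversionSink.addSlice`. `formatLine` and `writeLines` are invented helpers. The callback-backed sink used here copies each slice as far as I can tell, which is what makes reusing `scratch` safe in this sketch; a sink that retained the list would not allow it.

```dart
import 'dart:convert';
import 'dart:typed_data';

/// Writes [value] (assumed non-negative) as ASCII digits plus '\n' into
/// [buffer] and returns the number of bytes written.
int formatLine(Uint8List buffer, int value) {
  var end = 0;
  do {
    buffer[end++] = 0x30 + value % 10; // Least significant digit first.
    value ~/= 10;
  } while (value > 0);
  // Reverse the digits into reading order.
  for (var i = 0, j = end - 1; i < j; i++, j--) {
    final tmp = buffer[i];
    buffer[i] = buffer[j];
    buffer[j] = tmp;
  }
  buffer[end++] = 0x0A;
  return end;
}

/// Emits one line per value through a byte sink, reusing one scratch buffer.
List<int> writeLines(List<int> values) {
  late List<int> result;
  final sink = ByteConversionSink.withCallback((bytes) => result = bytes);
  final scratch = Uint8List(32); // Reused for every value.
  for (final value in values) {
    final length = formatLine(scratch, value);
    sink.addSlice(scratch, 0, length, false);
  }
  sink.close();
  return result;
}
```

No String is created per number on the writer side. Whether the sink copies is up to the sink, which is exactly the part of the contract this section asks the SDK to pin down.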

5. Identity-based Cycle Detection

The current SDK uses a List and a linear identical() check for cycle detection.

  • The Problem: This is $O(N)$ per object, leading to $O(N^2)$ for deep trees.
  • The Solution: A high-performance IdentityHashSet or an internal VM hook that can quickly check if an object is "on the stack" without a linear scan.
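A minimal sketch of such a detector, using the identity-based set that dart:core already provides through `Set.identity()`. `CycleDetector` and `countLeaves` are hypothetical names, and the walker is only there to show the enter/exit discipline.

```dart
/// Tracks the objects on the current traversal path by identity.
class CycleDetector {
  final Set<Object> _onPath = Set.identity();

  /// Throws if [object] is already being visited further up the path.
  void enter(Object object) {
    if (!_onPath.add(object)) {
      throw StateError('Cyclic object graph');
    }
  }

  void exit(Object object) => _onPath.remove(object);
}

/// Example walker: counts leaf values in nested lists and maps.
int countLeaves(Object? node, CycleDetector detector) {
  final Iterable<Object?> children;
  if (node is List) {
    children = node;
  } else if (node is Map) {
    children = node.values;
  } else {
    return 1;
  }
  final container = node as Object;
  detector.enter(container);
  var total = 0;
  try {
    for (final child in children) {
      total += countLeaves(child, detector);
    }
  } finally {
    detector.exit(container); // Shared (non-cyclic) references stay legal.
  }
  return total;
}
```

Membership checks become expected constant time instead of a scan over the whole path. The trade-off is an identity hash per container, which may well be slower than the linear scan for the shallow graphs that dominate typical JSON.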

6. The "Universal Encodable" Interface

For a package-based JsonWriter to work with existing code, the SDK needs to define a common interface in dart:core or dart:convert.

  • SDK Requirement: A public JsonStreamEncodable interface.
    abstract interface class JsonStreamEncodable {
      void writeJson(JsonWriter writer);
    }
    Without this in the SDK, two different packages (e.g., json_serializable and built_value) wouldn't be able to agree on how to talk to a JsonWriter.

7. JS-Native "Lazy Map" Hooks

In dart2js, the most efficient way to handle JSON is to keep the data as a JS Object and only convert parts of it when accessed.

  • The Problem: An external package cannot easily implement the _JsonMap trick used in the JS patch because it requires deep integration with the JS interceptors.
  • The Solution: The SDK would need to expose a "JavaScript Object to Dart Map" utility that allows a package to wrap a native JS handle in a way that the Dart compiler understands is a Map<String, dynamic>.

Summary: The "Missing" High-Level Architecture

To make this work, the SDK should shift from providing a closed JSON implementation to providing a JSON toolkit:

  1. Scanner/Lexer: A low-level UTF-8/String scanner that emits tokens (String, Number, ObjectStart, etc.).
  2. Formatter: A low-level UTF-8/String formatter for primitives and escapes.
  3. Standard Interface: The JsonStreamEncodable hook.

If the SDK exposed these three, a package could implement JsonReader and JsonWriter that are faster than the current SDK because they could be tuned for specific use cases (e.g., ignoring fields, strictly typed decoding) while still using the engine's underlying speed for strings and numbers.

This is an excellent point. Strings were the most obvious source of "escaped string allocation" waste, but numbers are often the "volume bottleneck" in JSON streams (think large arrays of metrics or sensor data).

If we want a truly zero-garbage surgical toolkit, we should have a consistent API for all JSON primitives (String, num, bool, Null).

Here is a brainstorm of how we could model the finalized API surface area:

Option 1: The "Namespace" Approach (Object-Oriented Toolkit)

This follows your idea of a JsonRaw<T> or similar. We add static "toolkit" objects to JsonEncoder and JsonDecoder that group logic by JSON type.

// In JsonEncoder (The Writer)
static const JsonStringEncoder string = ...;
static const JsonNumberEncoder number = ...;
static const JsonBooleanEncoder boolean = ...;
static const JsonNullEncoder nil = ...;

// In JsonDecoder (The Reader)
static const JsonStringDecoder string = ...;
static const JsonNumberDecoder number = ...;
static const JsonBooleanDecoder boolean = ...;

How it looks in practice:

// Encoder
JsonEncoder.string.write("value", sink);      // Writes "\"value\""
JsonEncoder.string.writeEscaped("v", sink);   // Writes "v" (no quotes)
JsonEncoder.number.write(12.3, sink);         // Digit-by-digit write (no String)
JsonEncoder.number.writeUtf8(12.3, byteSink); // Direct Ryū/Grisu to bytes

// Decoder
var n = JsonDecoder.number.parseUtf8(bytes, start, end); // No utf8.decode()
var s = JsonDecoder.string.unescapeUtf8(bytes, start, end);

  • Pros: Extremely clean and discoverable. Supports generic programming (interfaces).
  • Cons: Introduces several new interface classes to the SDK.

Option 2: The "Flat" Approach (Standard Dart Style)

This follows the existing pattern in dart:convert (like jsonEncode, jsonDecode). We add explicit, descriptive static methods.

JsonEncoder

  • static String escapeString(String source)
  • static void writeStringContent(String source, StringSink sink)
  • static void writeStringContentUtf8(String source, ByteConversionSink sink)
  • static void writeNumber(num value, StringSink sink)
  • static void writeNumberUtf8(num value, ByteConversionSink sink)
  • static void writeBool(bool value, StringSink sink) ... etc.

JsonDecoder

  • static String unescapeString(String source)
  • static String unescapeStringUtf8(List<int> bytes, ...)
  • static num parseNumber(String source, ...)
  • static num parseNumberUtf8(List<int> bytes, ...)

  • Pros: Very familiar to Dart users. Matches Uri.decodeComponent, etc.
  • Cons: Method names get very long and repetitive.


Comparing the "Content" vs "Full Value" logic

One subtle naming challenge we have is the difference between Content (no quotes) and Value (with quotes).

For Numbers, write always writes the full value (there are no quotes). For Strings, write usually implies the full value (with quotes), while writeEscaped implies just the content.

If we use the Namespace approach, we can be very elegant:

| Namespace | .write(...) | .writeContent(...) (or similar) |
|---|---|---|
| .string | "value" | value (escaped) |
| .number | 123 | N/A |
| .boolean | true | N/A |

Recommendation for "The Final Matrix"

I believe Option 1 (Namespace) is the most future-proof and "SDK-quality" approach. It allows us to add writeInt, writeDouble, parseSlice etc., to the specific toolkit without cluttering the main class.

Proposed Naming:

  • JsonEncoder.string.write(...) (Full JSON string)
  • JsonEncoder.string.writeContent(...) (The "escape" part)
  • JsonDecoder.string.parse(...) (Full JSON string)
  • JsonDecoder.string.parseContent(...) (The "unescape" part)

What do you think? Does the "Namespace" approach feel too heavy for the SDK, or is it exactly the kind of "Low Level Toolkit" structure we need?
