TOON, or Token-Oriented Object Notation, is a lightweight, human-readable data serialization format designed for use with Large Language Models (LLMs). It serves as a more token-efficient alternative to JSON, making it well suited for passing structured data into prompts without losing information. TOON is particularly effective for uniform tabular data, such as arrays of objects with consistent fields, where it typically uses 30-60% fewer tokens than the equivalent JSON, based on benchmarks with common tokenizers such as those used by GPT models. The savings come from stripping redundant syntax (braces, brackets, and repeated keys) and relying instead on indentation and explicit length markers to preserve structure.
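As a concrete illustration (a small made-up payload, not an official benchmark), the same array of users can be written in JSON and in TOON's tabular form:

```json
{
  "users": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" }
  ]
}
```

```toon
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
```

The field names appear once in the header rather than once per object, which is where most of the savings on uniform data come from.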
The main goal of TOON is to optimize for LLM contexts, where token limits and costs are critical. JSON's verbosity can inflate prompts unnecessarily, especially with large datasets; TOON trims this overhead by (see the encoding sketch after this list):
- Using indentation for nesting (similar to YAML) instead of braces and brackets
- Declaring the keys of a uniform array once in a header rather than repeating them for every object
- Adding explicit length markers so structure is preserved without closing delimiters
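To make the tabular encoding concrete, here is a minimal Python sketch that converts a uniform array of flat objects into TOON-style output. It assumes flat rows with identical keys and no special-character escaping; `encode_uniform_array` is a hypothetical helper for illustration, not the official library API.

```python
from typing import Any


def encode_uniform_array(key: str, rows: list[dict[str, Any]], indent: str = "  ") -> str:
    """Encode a uniform array of flat objects into TOON-style tabular form.

    Illustrative only: real TOON encoders also handle nesting, quoting,
    and non-uniform data, which this sketch ignores.
    """
    fields = list(rows[0].keys())
    # Header line declares the array length and the shared field names once.
    header = f"{key}[{len(rows)}]{{{','.join(fields)}}}:"
    # Each object collapses to one indented, comma-separated row.
    lines = [indent + ",".join(str(row[f]) for f in fields) for row in rows]
    return "\n".join([header, *lines])


users = [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"},
]
print(encode_uniform_array("users", users))
# users[2]{id,name,role}:
#   1,Alice,admin
#   2,Bob,user
```

Because braces, quotes, and per-object key names never appear in the rows, the token count grows roughly with the data values themselves rather than with the surrounding syntax.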