Saturday, May 9, 2026

Stop Wasting Tokens: A Smarter Alternative to JSON for LLM Pipelines


 

Introduction

 
JSON is great for APIs, storage, and application logic. But inside large language model (LLM) pipelines, it often carries a lot of token overhead that adds little value for the model: braces, quotes, commas, and repeated field names on every row. TOON, short for Token-Oriented Object Notation, is a newer format designed specifically to keep the same JSON data model while using fewer tokens and giving models clearer structural cues. The official TOON docs describe it as a compact, lossless representation of JSON for LLM input, especially strong on uniform arrays of objects.

In this article, you’ll learn what TOON is, when it makes sense to use it, and how to start using it step by step in your own LLM workflow. We will also keep the tradeoffs honest, because TOON is useful in some cases, not all of them.

 

Why JSON Wastes Tokens in LLM Pipelines

 
JSON becomes expensive in prompts because it repeats structure over and over. LLMs don’t care that JSON is a standard. They only see tokens.

If you send 100 support tickets, product rows, or user records to a model, the same field names appear in every object. TOON reduces that repetition by declaring fields once and then streaming row values in a compact tabular form. Here is a simple example.

JSON:

{
  "customers": [
    { "id": 1, "name": "Alice", "role": "admin" },
    { "id": 2, "name": "Bob", "role": "user" },
    { "id": 3, "name": "Charlie", "role": "user" }
  ]
}

 

TOON:

customers[3]{id,name,role}:
  1,Alice,admin
  2,Bob,user
  3,Charlie,user

 

Same data, less clutter.

The structure is still clear, but the repeated keys are gone. That’s where TOON gets most of its value.
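To make the transformation concrete, here is a minimal Python sketch that produces the tabular form shown above from the JSON example. The function name `to_toon_table` is hypothetical, and this is not the official TOON encoder: it assumes flat, uniform rows and skips TOON's quoting rules, so values containing commas, quotes, or newlines are not handled.

```python
import json

def to_toon_table(key, rows):
    """Encode a uniform list of flat dicts as a TOON-style tabular block.

    Illustrative only: assumes every row has the same keys and that no
    value contains a comma, quote, or newline.
    """
    fields = list(rows[0].keys())
    # Header declares the key, the row count, and the field names once.
    header = key + "[" + str(len(rows)) + "]{" + ",".join(fields) + "}:"
    lines = [header]
    for row in rows:
        cells = []
        for field in fields:
            value = row[field]
            # Strings are written bare; other primitives use their JSON spelling.
            cells.append(value if isinstance(value, str) else json.dumps(value))
        lines.append("  " + ",".join(cells))
    return "\n".join(lines)

data = json.loads("""
{"customers": [
  {"id": 1, "name": "Alice", "role": "admin"},
  {"id": 2, "name": "Bob", "role": "user"},
  {"id": 3, "name": "Charlie", "role": "user"}
]}
""")

toon_text = to_toon_table("customers", data["customers"])
print(toon_text)
```

The field names appear once in the header instead of once per row, which is where the savings come from.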

 

What TOON Actually Is and When It Is Worth Using

 
TOON is a serialization format for the JSON data model. That means it can represent objects, arrays, strings, numbers, booleans, and null values, but in a way that is more compact for model input. The TOON project presents it as lossless relative to JSON, which means you can convert JSON to TOON and back without losing information. The important thing to understand is this:

You do not need to replace JSON in your app.

A better approach is to keep JSON in your backend, APIs, and storage, then convert it to TOON only when you are about to send structured data into an LLM.

TOON is most useful when your prompt contains repeated structured records with the same fields. Good examples include retrieved support tickets, catalog rows, analytics records, tool outputs, CRM entries, or memory snapshots for agent systems. However, if your structure is deeply nested, highly irregular, purely flat, or very small, the benefits can shrink or disappear.
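One rough way to act on this is to check whether a payload looks like a uniform table before converting it. The sketch below is a heuristic of my own for illustration, not a rule taken from the TOON tooling; the name `is_uniform_table` and the sample records are hypothetical.

```python
def is_uniform_table(rows):
    """Return True when rows is a non-empty list of dicts that all share
    the same set of keys and hold only primitive values."""
    if not isinstance(rows, list) or not rows:
        return False
    if not all(isinstance(row, dict) for row in rows):
        return False
    first_keys = set(rows[0].keys())
    primitives = (str, int, float, bool, type(None))
    for row in rows:
        if set(row.keys()) != first_keys:
            return False  # irregular: fields differ between records
        if not all(isinstance(v, primitives) for v in row.values()):
            return False  # nested lists or objects inside a row
    return True

tickets = [{"id": 1, "status": "open"}, {"id": 2, "status": "closed"}]
irregular = [{"id": 1, "tags": ["a", "b"]}, {"id": 2}]

print(is_uniform_table(tickets))
print(is_uniform_table(irregular))
```

Payloads that pass this check are the ones most likely to benefit; anything else is a candidate for staying in JSON.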

 

Getting Began with TOON

 

// Step 1: Installing the TOON Command-Line Interface

The easiest way to try TOON is with the official command-line interface (CLI) from the TOON project. The TOON site links directly to its CLI, and the main repository presents the format as part of a broader SDK and tooling ecosystem.

Install the package:

npm install -g @toon-format/cli

 

// Step 2: Converting a JSON File into TOON

Let’s create a folder first:

mkdir toon-test
cd toon-test

 

Now create a file named customers.json in this folder and paste in the following:

[
  { "id": 1, "name": "Alice", "role": "admin" },
  { "id": 2, "name": "Bob", "role": "user" },
  { "id": 3, "name": "Charlie", "role": "user" }
]

 

Now convert it:

npx @toon-format/cli customers.json -o customers.toon

 

You should get a compact result similar to this:

[3]{id,name,role}:
  1,Alice,admin
  2,Bob,user
  3,Charlie,user

 

This is the core TOON pattern: declare the shape once, then list the values row by row. That aligns with the official design goal of tabular arrays for uniform objects.
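Reading the format back follows the same shape: parse the header once, then split each row against the declared fields. The Python sketch below illustrates that idea under simplifying assumptions. `parse_toon_table` and `parse_cell` are hypothetical names, the parser only understands a single tabular block with unquoted values, and it is not a substitute for the official decoder.

```python
import json
import re

# Header shape: optional key, [row count], {comma-separated fields}, colon.
HEADER = re.compile(r"^(\w*)\[(\d+)\]\{([^}]*)\}:$")
NUMBER = re.compile(r"^-?(0|[1-9]\d*)(\.\d+)?$")

def parse_cell(cell):
    """Turn a bare cell back into a JSON primitive; anything else stays a string."""
    if cell in ("true", "false", "null") or NUMBER.match(cell):
        return json.loads(cell)
    return cell

def parse_toon_table(text):
    """Parse one tabular block into (key, list of dicts)."""
    lines = text.strip().splitlines()
    match = HEADER.match(lines[0].strip())
    if match is None:
        raise ValueError("not a tabular TOON header")
    key = match.group(1)
    declared_count = int(match.group(2))
    fields = match.group(3).split(",")
    rows = []
    for line in lines[1:]:
        cells = line.strip().split(",")
        if len(cells) != len(fields):
            raise ValueError("row width does not match declared fields")
        rows.append({f: parse_cell(c) for f, c in zip(fields, cells)})
    if len(rows) != declared_count:
        raise ValueError("row count does not match declared length")
    return key, rows

sample = "[3]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user\n  3,Charlie,user"
key, rows = parse_toon_table(sample)
print(rows)
```

The declared length and field list double as cheap integrity checks: a truncated or malformed block fails loudly instead of silently dropping rows.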

 

// Step 3: Using TOON as Model Input

The best place to use TOON is on the input side of your pipeline. Instead of pasting a large JSON blob into a prompt, pass the TOON version and keep the instruction simple.

For instance:

The following data is in TOON format.

customers[3]{id,name,role}:
  1,Alice,admin
  2,Bob,user
  3,Charlie,user

Summarize the user roles and point out anything unusual.

 

This works well because TOON is designed to help the model read repeated structure with less overhead. That is also how the official project frames its benchmarks: as a test of comprehension across different structured input formats.
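In code, assembling that prompt is just string composition. Here is a small Python sketch; `build_prompt` is a hypothetical helper, and how you send the resulting string to a model depends on your own client library, which is deliberately left out.

```python
def build_prompt(toon_block, task):
    """Combine a one-line format note, the TOON data, and the task."""
    return (
        "The following data is in TOON format.\n\n"
        + toon_block
        + "\n\n"
        + task
    )

toon_block = (
    "customers[3]{id,name,role}:\n"
    "  1,Alice,admin\n"
    "  2,Bob,user\n"
    "  3,Charlie,user"
)

prompt = build_prompt(
    toon_block,
    "Summarize the user roles and point out anything unusual.",
)
print(prompt)
```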

 

// Step 4: Keeping JSON for Outputs

This is one of the most important practical decisions. TOON is very useful for input, but JSON is still usually the better choice for output when another system needs to parse the model response. That’s because JSON has much stronger tooling support, and modern APIs can enforce structured JSON output with schemas.

In practice, the safest pattern is:

  • JSON in your app.
  • TOON for large structured prompt context.
  • JSON again for machine-parseable model responses.

This gives you efficiency on the input side and reliability on the output side.
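On the output side, the reliability comes from parsing and validating the reply before anything downstream touches it. The sketch below uses only Python's standard library. The response shape in `REQUIRED_KEYS` and the `fake_reply` string are made up for illustration; a real pipeline would use whatever schema you asked the model to follow.

```python
import json

# Hypothetical response shape, chosen only for this example.
REQUIRED_KEYS = {"admins", "users", "notes"}

def parse_model_reply(reply_text):
    """Parse a model reply as JSON and check that the expected keys exist."""
    try:
        reply = json.loads(reply_text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model reply is not valid JSON: {exc}") from exc
    if not isinstance(reply, dict):
        raise ValueError("model reply must be a JSON object")
    missing = REQUIRED_KEYS - set(reply)
    if missing:
        raise ValueError(f"model reply is missing keys: {sorted(missing)}")
    return reply

# Stand-in for text returned by a model.
fake_reply = '{"admins": 1, "users": 2, "notes": "Nothing unusual."}'
parsed = parse_model_reply(fake_reply)
print(parsed["notes"])
```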

 

// Step 5: Benchmarking in Your Own Pipeline

Don’t switch formats based on hype alone.

Run a small benchmark in your own workflow:

  • Count input tokens for JSON.
  • Count input tokens for TOON.
  • Compare latency.
  • Compare answer quality.
  • Compare total cost.

The official TOON project positions token savings as one of the main benefits, and third-party coverage repeats those claims, but community discussion also shows that results depend heavily on the shape of the data. That is why the best question is not “Is TOON better than JSON?”

The better question is: “Is TOON better for this specific LLM step?”
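The first two checklist items can be approximated with a few lines of Python. This sketch is only a starting point: `rough_token_count` is a crude regex-based proxy rather than a real tokenizer, `to_toon_table` is a simplified illustrative encoder rather than the official one, and the 100 synthetic rows are made up. For real numbers, swap in your model's tokenizer and your own payloads.

```python
import json
import re

def rough_token_count(text):
    """Crude proxy: each run of word characters and each punctuation mark
    counts as one token. Real tokenizers will give different numbers."""
    return len(re.findall(r"\w+|[^\w\s]", text))

def to_toon_table(key, rows):
    """Simplified tabular encoder for flat, uniform rows (no quoting rules)."""
    fields = list(rows[0].keys())
    lines = [key + "[" + str(len(rows)) + "]{" + ",".join(fields) + "}:"]
    for row in rows:
        cells = [v if isinstance(v, str) else json.dumps(v) for v in row.values()]
        lines.append("  " + ",".join(cells))
    return "\n".join(lines)

# Synthetic payload: 100 uniform records.
rows = [{"id": i, "name": f"user{i}", "role": "user"} for i in range(1, 101)]

json_text = json.dumps({"customers": rows})
toon_text = to_toon_table("customers", rows)

json_tokens = rough_token_count(json_text)
toon_tokens = rough_token_count(toon_text)

print("JSON proxy tokens:", json_tokens)
print("TOON proxy tokens:", toon_tokens)
print("Ratio:", round(toon_tokens / json_tokens, 2))
```

Latency, answer quality, and cost still have to be measured against your actual model and task; a size comparison alone does not tell you whether the answers stay as good.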

 

Final Thoughts

 
TOON is not something you need to use everywhere.

It’s a targeted optimization for one specific problem: wasting tokens on repeated JSON structure inside LLM prompts. If your pipeline passes a lot of repeated structured records into a model, TOON is worth testing. If your payloads are small, irregular, or heavily nested, JSON may still be the better choice.

The smartest way to adopt it is simple: keep JSON where JSON already works well, use TOON where you are packing large structured inputs into prompts, and benchmark the results on your own tasks before committing to it.
 
 

Kanwal Mehreen is a machine learning engineer and a technical writer with a profound passion for data science and the intersection of AI with medicine. She co-authored the ebook “Maximizing Productivity with ChatGPT”. As a Google Generation Scholar 2022 for APAC, she champions diversity and academic excellence. She’s also recognized as a Teradata Diversity in Tech Scholar, Mitacs Globalink Research Scholar, and Harvard WeCode Scholar. Kanwal is an ardent advocate for change, having founded FEMCodes to empower women in STEM fields.
