source: kdnuggets: stop wasting tokens: a smarter alternative to json for llm pipelines
level: technical
json is common for apis and storage but adds token overhead in llm prompts. braces, quotes, commas, and repeated field names on every row consume tokens without adding value. toon, or token-oriented object notation, is a compact format that keeps the same json data model while using fewer tokens. it declares fields once and streams row values in a tabular form, making it easier for models to read repeated structure.
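to make the "declare fields once, stream rows" pattern concrete, here is a minimal python sketch that renders a uniform list of dicts in a toon-style tabular form. the function name and the two-space row indent are illustrative assumptions, not the official library api; it handles only flat rows with no quoting or escaping.

```python
def to_toon_rows(name, records):
    """Render a uniform list of dicts as TOON-style tabular text.

    Illustrative sketch only: field names are declared once in a header
    line along with the row count, then each record becomes a single
    comma-separated row. Flat values only; no quoting or nesting.
    """
    fields = list(records[0].keys())
    header = f"{name}[{len(records)}]{{{','.join(fields)}}}:"
    rows = ["  " + ",".join(str(r[f]) for f in fields) for r in records]
    return "\n".join([header] + rows)

tickets = [
    {"id": 1, "status": "open", "priority": "high"},
    {"id": 2, "status": "closed", "priority": "low"},
]

print(to_toon_rows("tickets", tickets))
# tickets[2]{id,status,priority}:
#   1,open,high
#   2,closed,low
```

note how the field names `id`, `status`, and `priority` appear exactly once, instead of being repeated in every object as json would require.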
toon is a serialization format for the json data model, representing objects, arrays, strings, numbers, booleans, and null values. it is lossless, so you can convert json to toon and back without losing information. the best use is for input when prompts contain many repeated structured records with the same fields, like support tickets or catalog rows. for deeply nested, irregular, or small data, the benefits may be limited. keep json in your backend and convert to toon only when sending data to an llm.
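the lossless round-trip can be sketched by parsing the tabular text back into records. this toy parser covers only the flat case shown above (no quoting, nesting, or escaping, all of which the real format must handle), and its digit-based type recovery is a stand-in for the spec's actual typing rules.

```python
def from_toon_rows(text):
    """Parse flat TOON-style tabular text back into (name, records).

    Minimal sketch of the round-trip idea: read the header for the
    array name, row count, and field list, then zip each row's values
    back into dicts. Digits are cast to int as a crude type recovery;
    the real format defines this precisely.
    """
    header, *rows = text.splitlines()
    name, rest = header.split("[", 1)
    count, rest = rest.split("]", 1)
    fields = rest.strip("{}:").split(",")
    records = []
    for row in rows[: int(count)]:
        values = [int(v) if v.isdigit() else v for v in row.strip().split(",")]
        records.append(dict(zip(fields, values)))
    return name, records

toon_text = """tickets[2]{id,status,priority}:
  1,open,high
  2,closed,low"""

name, records = from_toon_rows(toon_text)
print(name, records)
```

because both directions preserve the same fields and values, json can stay the system of record while toon is generated only at the prompt boundary.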
to start, install the toon cli with npm and convert a json file to toon. the output shows the core pattern: a header declaring the shape and count, followed by comma-separated values. use toon in prompts with simple instructions, but keep json for model outputs, since json has better tooling and schema support. benchmark token counts, latency, quality, and cost in your own pipeline to see if toon helps for your specific llm step.
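a first-pass benchmark can be as simple as comparing payload sizes for the same rows. the sketch below uses character counts as a rough proxy; a real benchmark should count tokens with your model's actual tokenizer (for example, the tiktoken library for openai models), since savings vary by tokenizer.

```python
import json

# Compare a JSON payload against a TOON-style rendering of the same rows.
# Character length is only a proxy for token count; swap in your model's
# tokenizer for real numbers before making a pipeline decision.
rows = [{"id": i, "status": "open", "priority": "high"} for i in range(50)]

json_text = json.dumps(rows)
toon_text = "rows[50]{id,status,priority}:\n" + "\n".join(
    f"  {r['id']},{r['status']},{r['priority']}" for r in rows
)

print(len(json_text), len(toon_text))  # the toon rendering is markedly shorter
```

repeated field names and punctuation dominate the json size here, which is exactly the overhead the article says toon removes; for small or irregular data the gap shrinks, so measure your own payloads.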
why it matters: using toon can lower token costs and improve efficiency when feeding large structured datasets into llms.