The title might sound dramatic, and no, I didn't storm into a boardroom shouting about millions saved.

But what started as a casual read on a clever efficiency hack turned into a personal experiment in one of my projects — and the results were eye-opening. A simple adjustment in how data is formatted led to significant reductions in token usage, and at scale, that translates to real cost savings.

Here’s what I discovered and how you can apply it too.

Why Tokens Add Up

If you’ve used generative AI APIs like OpenAI’s or Google’s Gemini, you know that every token counts, input and output alike.

For a handful of calls, the difference may seem trivial. But when you scale to thousands or millions of requests, even a small inefficiency quietly stacks into noticeable costs. This experiment began with a simple question: Can how data is formatted actually impact token usage?

Turns out, yes — and it’s more impactful than you’d think.
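
To put rough numbers on it: at a hypothetical $10 per million output tokens, shaving just 50 tokens off each of one million daily responses saves about $500 a day, roughly $15,000 a month, from formatting alone.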


Experimenting With Data Formats

The theory suggested that using YAML instead of JSON could reduce token usage. I wanted to see it in practice.

JSON (Standard)

{
  "invoice_id": "INV-342",
  "vendor": "CodeTerra", 
  "amount": 4500,
  "currency": "USD"
}

YAML (Alternative)

invoice_id: INV-342
vendor: CodeTerra
amount: 4500
currency: USD

Both contain identical information. But the YAML version drops the extra punctuation (braces, commas, quotes) that tokenizers split into additional tokens, so the same data costs fewer tokens to process.
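
You can sanity-check this yourself by counting tokens for both payloads. Below is a minimal sketch using the tiktoken npm package (an assumption on my part; any tokenizer library works, and exact counts vary by model):

JavaScript

// Compare token counts of the two formats.
// Assumes the "tiktoken" npm package (npm install tiktoken).
const { encoding_for_model } = require("tiktoken");

const jsonText = `{
  "invoice_id": "INV-342",
  "vendor": "CodeTerra",
  "amount": 4500,
  "currency": "USD"
}`;

const yamlText = `invoice_id: INV-342
vendor: CodeTerra
amount: 4500
currency: USD`;

const enc = encoding_for_model("gpt-4");
console.log("JSON tokens:", enc.encode(jsonText).length);
console.log("YAML tokens:", enc.encode(yamlText).length);
enc.free(); // release the WASM-backed encoder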

Results From My Project

In my specific test case, I saw:

  • 20% drop in output tokens.

  • 30% drop in character count.

It’s the kind of optimization that, at scale, feels like “millions saved” — even if it’s metaphorical for your project.

NOTE

Results vary with the shape of your data; some datasets may see even larger savings. Either way, it's worth testing in any high-volume application.


Implementing It Without Breaking Your Workflow

Most systems still expect JSON, so I set up a simple conversion pipeline to translate the AI’s YAML output back into JSON for my backend:

JavaScript

const yaml = require("js-yaml");

// Example AI output in YAML
const yamlData = `
invoice_id: INV-342
vendor: CodeTerra
amount: 4500
currency: USD
`;

// Parse the YAML into a plain JavaScript object,
// then serialize it to JSON for the backend
const data = yaml.load(yamlData);
const json = JSON.stringify(data, null, 2);
console.log(json);

Fast, lightweight, and it preserves efficiency while keeping backend systems happy.
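
Two caveats. First, models default to JSON, so you need to request YAML explicitly in the prompt (something like "Respond only in valid YAML"). Second, model output isn't guaranteed to parse: js-yaml's load throws a YAMLException on malformed input, so it's worth guarding the conversion. Here's a minimal sketch; the fallback behavior is my assumption and should be adapted to your pipeline:

JavaScript

const yaml = require("js-yaml");

// Defensive conversion: yaml.load throws on malformed input,
// which happens when a model's output drifts from valid YAML.
function yamlToJson(raw) {
  try {
    return JSON.stringify(yaml.load(raw));
  } catch (err) {
    // Hypothetical fallback: log and let the caller retry
    console.error("Invalid YAML from model:", err.message);
    return null;
  }
}

console.log(yamlToJson("invoice_id: INV-342")); // {"invoice_id":"INV-342"}
console.log(yamlToJson("{ broken"));            // null (parse error)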


Why This Actually Works

  1. Fewer symbols = fewer tokens. Braces, quotes, and commas each tokenize into extra tokens that carry no data.

  2. Compact structure = higher semantic density. More of each token is spent on actual content instead of syntax.

  3. Cleaner data = easier for the model to process, and since output tokens are billed too, less boilerplate to generate means lower cost on both sides of the call.

Key Takeaways

IMPORTANT

  • Test ideas yourself: Reading about a hack is one thing; validating it in your workflow is another.

  • Serialization matters: It affects both readability and cost.

  • Small tweaks multiply: Token savings compound quickly at scale.

  • Practical conversion is easy: A few lines of glue code capture the savings without touching your existing backend.

Final Thoughts

This experiment showed me that optimization doesn’t always require rewriting models or infrastructure. Sometimes, it’s about noticing tiny inefficiencies that quietly eat resources.

The “millions saved” might be metaphorical here, but at scale, the idea is real: small changes in AI workflows can have outsized impacts on costs and performance.