Three things every team running LLMs in production should understand before the next invoice lands. Each is drawn from real cost modeling against live 2026 pricing — share them, or put them to work on your own stack.
Run one support chatbot at a million conversations a month and the bill lands anywhere from ~$500 to ~$25,750 — for identical work. The only variables are model choice and optimization. Most teams sit near the top of that range without realizing there's a bottom.
// Takeaway: your provider doesn't set your bill — your architecture does.
// The 50× cost range
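To see why identical work can produce such different bills, here is a minimal cost sketch. Everything in it is an assumption for illustration: the function name `monthly_cost`, the token counts per conversation, and both price pairs are hypothetical placeholders, not real provider pricing and not the figures behind the range quoted above.

```python
def monthly_cost(conversations, input_tokens, output_tokens,
                 input_price_per_m, output_price_per_m):
    """Monthly spend in dollars for a fixed workload.

    Prices are dollars per million tokens; token counts are per conversation.
    """
    per_conversation = (
        input_tokens * input_price_per_m
        + output_tokens * output_price_per_m
    ) / 1_000_000
    return conversations * per_conversation


# Hypothetical workload: one million conversations a month,
# each with an assumed 2,000 input tokens and 500 output tokens.
WORKLOAD = dict(conversations=1_000_000, input_tokens=2_000, output_tokens=500)

# Placeholder prices only (NOT real rates): a small model and a large model.
small_model = monthly_cost(**WORKLOAD, input_price_per_m=0.10, output_price_per_m=0.40)
large_model = monthly_cost(**WORKLOAD, input_price_per_m=5.00, output_price_per_m=20.00)

print(f"small model: ${small_model:,.0f} per month")
print(f"large model: ${large_model:,.0f} per month")
print(f"spread: {large_model / small_model:.0f}x for the same workload")
```

Because the workload is held constant, the spread here comes entirely from the per-token prices. Swap in your own token counts and your provider's current rates to see where your stack sits.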
// The ~95% method
It isn't one trick. Right-size the model, cache the stable prefix, and batch what can wait, and the discounts compound: prompt caching (−90% on cached input) stacked with async batch (−50%) bills cached input at 0.10 × 0.50 = 0.05 of list price, which lands near 95% off the unoptimized baseline for those tokens.
// Takeaway: each discount applies to what the previous one left, so the savings compound rather than add.
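The compounding arithmetic can be sketched in a few lines. The 90% and 50% discounts are the ones quoted above; the helper names and the 80% cached share in the second example are hypothetical, chosen only to show how the blended figure moves when part of the input is not cached.

```python
def remaining_fraction(*discounts):
    """Fraction of list price still paid after stacking discounts.

    Each discount is a fraction off (0.90 means 90% off) and applies to
    whatever the previous discounts left, so the remainders multiply.
    """
    fraction = 1.0
    for discount in discounts:
        fraction *= 1.0 - discount
    return fraction


def blended_input_fraction(cached_share, cache_discount, batch_discount):
    """Fraction of list price paid on input when only part of it is cached.

    cached_share is the portion of input tokens served from the cache
    (an assumption you would measure on your own traffic).
    """
    cached = cached_share * remaining_fraction(cache_discount, batch_discount)
    uncached = (1.0 - cached_share) * remaining_fraction(batch_discount)
    return cached + uncached


# Cached input tokens sent through batch: roughly 5% of list price remains.
cached_and_batched = remaining_fraction(0.90, 0.50)
print(f"cached + batched input: {1 - cached_and_batched:.0%} off")

# Hypothetical mix: 80% of input tokens hit the cache, all traffic is batched.
mixed = blended_input_fraction(cached_share=0.80, cache_discount=0.90, batch_discount=0.50)
print(f"80% cached, all batched: {1 - mixed:.0%} off input")
```

The second figure comes out lower than the first, which is why the size of the stable prefix matters: the ~95% applies to tokens that are both cached and batched, and the blended saving depends on how much of your traffic qualifies.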
The seven levers are model routing, prompt caching, batching, provisioned throughput, output minimization, context control, and self-hosting. Worked through in priority order against your real workloads, they're the whole optimization playbook on a single page.
// Takeaway: optimization is a system, not a setting.
// The 7 levers
Send a note and I'll get back to you within two business days. No pitch — just a straight read on where your savings are.
// Chris Echevarria · chris@bluetinto.com