2 posts
LLM token cost is growing faster than the planned AI budget. How to measure usage, where hidden costs lurk, and which optimization patterns actually work in production.
LLM prompt caching in 2026: what a static prefix cache is, how it differs from a semantic cache, and how to structure your prompt to hit the cache.