Robuta

https://boston.qcon.ai/presentation/boston2026/serving-llms-scale-hidden-kv-cache-advantage
QCon AI Boston 2026 | Serving LLMs at Scale: The Hidden KV Cache Advantage
KV cache is the hidden lever behind inference cost and performance. It directly impacts GPU utilization, throughput, and Time to First Token.

https://www.assembled.com/blog/scaling-llms-with-golang-how-we-serve-millions-of-llm-requests
Scaling LLMs with Golang: Serving Millions of Requests
See why Go is our top choice for production LLM deployments. Learn how its type safety, concurrency, and interfaces power scalable, efficient infrastructure,...

https://kittygiraudel.com/2026/03/11/serving-markdown-to-llms-with-11ty/
Serving Markdown to LLMs With Eleventy | Kitty Giraudel
May 7, 2026 - A technical walkthrough on how to serve a Markdown version of all pages with Eleventy.

https://www.sglang.io/
SGLang - High-Performance Serving Framework for LLMs and VLMs