https://boston.qcon.ai/presentation/boston2026/serving-llms-scale-hidden-kv-cache-advantage
QCon AI Boston 2026 | Serving LLMs at Scale: The Hidden KV Cache Advantage
KV cache is the hidden lever behind inference cost and performance. It directly impacts GPU utilization, throughput, and Time to First Token.
qcon ai boston, serving llms, at scale, the hidden, kv cache
https://www.assembled.com/blog/scaling-llms-with-golang-how-we-serve-millions-of-llm-requests
Scaling LLMs with Golang: Serving Millions of Requests
See why Go is our top choice for production LLM deployments. Learn how its type safety, concurrency, and interfaces power scalable, efficient infrastructure,...
millions of, scaling, llms, golang, serving
https://kittygiraudel.com/2026/03/11/serving-markdown-to-llms-with-11ty/
Serving Markdown to LLMs With Eleventy | Kitty Giraudel
May 7, 2026 - A technical walkthrough on how to serve a Markdown version of all pages with Eleventy.
kitty giraudel, serving, markdown, llms, eleventy
https://www.sglang.io/
SGLang - High-Performance Serving Framework for LLMs and VLMs
high performance, sglang, serving, framework, llms