Robuta

https://sebastianraschka.com/llm-architecture-gallery/gqa/
Grouped-Query Attention (GQA) | Sebastian Raschka, PhD
Mar 21, 2026 - A gallery-local explainer for Grouped-Query Attention, based on the architecture comparison articles and LLMs-from-scratch.

https://cookllm.com/docs/fundamentals/systems/flash-attention/05-grouped-query-attention
Grouped Query Attention | CookLLM | Full-Stack LLM Engineering: Architecture, Training, and Applications
Implements GQA/MQA support so that multiple query heads share KV, reducing KV cache memory usage.