Robuta

https://arxiv.org/abs/2308.04623
[2308.04623] Accelerating LLM Inference with Staged Speculative Decoding
Abstract page for arXiv paper 2308.04623: Accelerating LLM Inference with Staged Speculative Decoding