https://arxiv.org/abs/2308.04623
[2308.04623] Accelerating LLM Inference with Staged Speculative Decoding
staged speculative decoding, LLM inference, accelerating