speculative decoding - Robuta Search

https://openreview.net/forum?id=4avKYhpmn5
LANTERN++: Enhancing Relaxed Speculative Decoding with Static Tree Drafting for Visual...
Speculative decoding has been widely used to accelerate autoregressive (AR) text generation. However, its effectiveness for visual AR models remains limited...

https://openreview.net/forum?id=WpXq5n8yLb&referrer=%5Bthe%20profile%20of%20Yunfei%20Cheng%5D(%2Fprofile%3Fid%3D~Yunfei_Cheng3)
Recurrent Drafter for Fast Speculative Decoding in Large Language Models | OpenReview
We present Recurrent Drafter (ReDrafter), an advanced speculative decoding approach that achieves state-of-the-art speedup for large language models (LLMs)...

https://openreview.net/forum?id=ffDhpmwqdu
In-batch Ensemble Drafting: Robust Speculative Decoding for LVLMs | OpenReview
Despite the success of Speculative Decoding (SD) in LLM inference acceleration, it largely remains unexplored for Large Vision Language Models (LVLMs), an...

https://openreview.net/forum?id=o8ZGNn9LpN
SpecTr++: Improved transport plans for speculative decoding of large language models | OpenReview
We revisit the question of accelerating decoding of language models based on speculative draft samples, inspired by Y. Leviathan et al. (ICML 2023). Following...

https://www.jaist.ac.jp/is/labs/nguyen-lab/systems/intro/spec_decode/index.html
Nguyen Lab's Speculative Decoding Research Group

https://lawsen.substack.com/
Speculative Decoding | Alex Lawsen | Substack
AI, Management, Forecasting, whatever else is on my mind. Click to read Speculative Decoding, by Alex Lawsen, a Substack publication with hundreds of...

https://openreview.net/forum?id=rJAIyKo7jA&referrer=%5Bthe%20profile%20of%20Qitan%20Lv%5D(%2Fprofile%3Fid%3D~Qitan_Lv1)
Parallel Speculative Decoding with Adaptive Draft Length | OpenReview

https://arxiv.org/abs/2510.02329
[2510.02329] SelfJudge: Faster Speculative Decoding via Self-Supervised Judge Verification
Abstract page for arXiv paper 2510.02329: SelfJudge: Faster Speculative Decoding via Self-Supervised Judge Verification

https://openreview.net/forum?id=rsY6J3ZaTF
DistillSpec: Improving Speculative Decoding via Knowledge Distillation | OpenReview

https://research.google/pubs/distillspec-improving-speculative-decoding-via-knowledge-distillation/
DistillSpec: Improving speculative decoding via knowledge distillation

https://arxiv.org/abs/2605.02888
[2605.02888] SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection
Abstract page for arXiv paper 2605.02888: SpecKV: Adaptive Speculative Decoding with Compression-Aware Gamma Selection

https://arxiv.org/abs/2310.08461
[2310.08461] DistillSpec: Improving Speculative Decoding via Knowledge Distillation
Abstract page for arXiv paper 2310.08461: DistillSpec: Improving Speculative Decoding via Knowledge Distillation

https://arxiv.org/abs/2601.09212
[2601.09212] Annealed Relaxation of Speculative Decoding for Faster Autoregressive Image Generation
Abstract page for arXiv paper 2601.09212: Annealed Relaxation of Speculative Decoding for Faster Autoregressive Image Generation

https://arxiv.org/abs/2506.22694
[2506.22694] VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs
Abstract page for arXiv paper 2506.22694: VOCABTRIM: Vocabulary Pruning for Efficient Speculative Decoding in LLMs

https://openreview.net/forum?id=QOXrVMiHGK&referrer=%5Bthe%20profile%20of%20Qitan%20Lv%5D(%2Fprofile%3Fid%3D~Qitan_Lv1)
PEARL: Parallel Speculative Decoding with Adaptive Draft Length | OpenReview
Speculative decoding (SD), where an extra draft model is employed to provide multiple draft tokens first and then the original target model verifies these...

https://lmstudio.ai/docs/typescript/llm-prediction/speculative-decoding
Speculative Decoding | LM Studio
API to use a draft model in speculative decoding in lmstudio-js

https://openreview.net/forum?id=RdKYAHZPxg
Recursive Speculative Decoding: Accelerating LLM Inference via Sampling Without Replacement |...
Speculative decoding is an inference-acceleration method for large language models (LLMs) where a small language model generates a draft-token sequence which...

https://uwaterloo.ca/data-systems-group/references/nearest-neighbor-speculative-decoding-llm-generation-and
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution | Data Systems Group |...

https://aclanthology.org/2025.findings-acl.1017/
DReSD: Dense Retrieval for Speculative Decoding - ACL Anthology
Milan Gritta, Huiyin Xue, Gerasimos Lampouras. Findings of the Association for Computational Linguistics: ACL 2025. 2025.

https://openreview.net/forum?id=d0mGsaheuT
SpecTr: Fast Speculative Decoding via Optimal Transport | OpenReview
Autoregressive sampling from large language models has shown to achieve state-of-the-art results in several natural language tasks. However, autoregressive...

https://openreview.net/forum?id=BPQHXwVNvl
Online Speculative Decoding | OpenReview
Speculative decoding is a pivotal technique to accelerate the inference of large language models (LLMs) by employing a smaller draft model to predict the...

https://arxiv.org/abs/2602.16961
[2602.16961] Greedy Multi-Path Block Verification for Faster Decoding in Speculative Sampling
Abstract page for arXiv paper 2602.16961: Greedy Multi-Path Block Verification for Faster Decoding in Speculative Sampling

https://cohere.com/blog/mixture-of-experts-models-get-more-from-speculative-decoding
Why MoE models get more from speculative decoding
MoE models enhance speculative decoding through bandwidth-bound sweet spots, expert routing correlation reducing unique weight loading, and fixed-overhead...

https://openreview.net/forum?id=ukDi9JyaL3
MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language...
Speculative decoding significantly accelerates language model inference by enabling a lightweight draft model to propose multiple tokens that a larger target...

https://aclanthology.org/2025.emnlp-main.366/
SpecVLM: Enhancing Speculative Decoding of Video LLMs via Verifier-Guided Token Pruning - ACL...
Yicheng Ji, Jun Zhang, Heming Xia, Jinpeng Chen, Lidan Shou, Gang Chen, Huan Li. Proceedings of the 2025 Conference on Empirical Methods in Natural Language...

https://openreview.net/forum?id=mtSSFiqW6y&referrer=%5Bthe%20profile%20of%20Jonas%20K%20Kohler%5D(%2Fprofile%3Fid%3D~Jonas_K_Kohler1)
Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment | OpenReview
The performance of large language models (LLMs) is closely linked to their underlying size, leading to ever-growing networks and hence slower inference....

https://arxiv.org/abs/2604.15244
[2604.15244] From Tokens to Steps: Verification-Aware Speculative Decoding for Efficient Multi-Step...
Abstract page for arXiv paper 2604.15244: From Tokens to Steps: Verification-Aware Speculative Decoding for Efficient Multi-Step Reasoning