efficient inference - Robuta Search

https://github.com/vllm-project/vllm
GitHub - vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for...
A high-throughput and memory-efficient inference and serving engine for LLMs - vllm-project/vllm

https://virtual.aistats.org/virtual/2026/poster/13635
AISTATS Poster Efficient Inference for Coupled Hidden Markov Models in Continuous Time and Discrete...

https://effi-stats.fr/
effi-stats.fr – Efficient inference for large and high-frequency data

https://www.together.ai/blog/foundational-research-powering-efficient-inference-at-scale
Foundational research powering efficient inference at scale
As AI moves from research to production, the challenge for AI-native teams shifts from building models to running them: efficiently, reliably, and at scale.

https://lumalabs.ai/news/tvm
Pushing the Limit of Efficient Inference-Time Scaling with Terminal Velocity Matching | Luma
Terminal Velocity Matching (TVM) is a new single-stage training paradigm for efficient generation. While achieving the same sample quality, it exhibits 25x...

https://www.modular.com/models/qwen3-vl-8b
Qwen3-VL 8B Inference, Efficient Vision-Language Model | Modular
Deploy Qwen3-VL-8B by Alibaba for efficient vision-language inference on Modular. Dense 8B model on NVIDIA and AMD GPUs.

https://arxiv.org/abs/2207.00032
[2207.00032] DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at...
Abstract page for arXiv paper 2207.00032: DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale

https://www.modular.com/models/ministral-14b
Ministral 14B Inference, Mistral's Efficient Vision Model | Modular
Deploy Ministral 14B by Mistral AI with optimized inference on Modular. Efficient vision and text model on NVIDIA and AMD GPUs.

https://www.modular.com/models/qwen3-30b-a3b
Qwen3 30B Inference, Efficient Small MoE | Modular
Deploy Qwen3-30B-A3B by Alibaba with optimized MoE inference on Modular. 30B total, 3B active. Fast and cost-efficient.

https://www.modular.com/models/mistral-small-3-1-24b
Mistral Small 3.1 24B, Efficient Vision LLM Inference | Modular
Deploy Mistral Small 3.1 24B with optimized inference on Modular. Efficient 24B vision model on NVIDIA and AMD GPUs.

https://virtual.aistats.org/virtual/2026/poster/13729
AISTATS Poster Local Causal Discovery for Statistically Efficient Causal Inference

https://www.modular.com/models/llama-4-scout
Llama 4 Scout Inference, Meta's Efficient Vision Model | Modular
Deploy Llama 4 Scout by Meta with optimized inference on Modular. Efficient vision-capable model on NVIDIA and AMD GPUs.

https://virtual.aistats.org/virtual/2026/poster/13690
AISTATS Poster Latent-IMH: Efficient Bayesian Inference for Inverse Problems with Approximate...

https://www.modular.com/models/minimax-m2-5
MiniMax M2.5 Inference, 230B Efficient MoE | Modular
Deploy MiniMax M2.5 (230B MoE, 10B active) with optimized inference on Modular. Efficient text generation on NVIDIA and AMD GPUs.