https://github.com/vllm-project/vllm
GitHub - vllm-project/vllm: A high-throughput and memory-efficient inference and serving engine for...
A high-throughput and memory-efficient inference and serving engine for LLMs - vllm-project/vllm
high throughput, memory efficient, github, vllm, project
https://virtual.aistats.org/virtual/2026/poster/13635
AISTATS Poster Efficient Inference for Coupled Hidden Markov Models in Continuous Time and Discrete...
aistats poster efficient, continuous time, inference, coupled, hidden
https://effi-stats.fr/
effi-stats.fr – Efficient inference for large and high-frequency data
efficient inference, high frequency, stats, large, data
https://www.together.ai/blog/foundational-research-powering-efficient-inference-at-scale
Foundational research powering efficient inference at scale
As AI moves from research to production, the challenge for AI-native teams shifts from building models to running them — efficiently, reliably, and at scale.
efficient inference, foundational, research, powering, scale
https://lumalabs.ai/news/tvm
Pushing the Limit of Efficient Inference-Time Scaling with Terminal Velocity Matching | Luma
Terminal Velocity Matching (TVM) is a new single-stage training paradigm for efficient generation. While achieving the same sample quality, it exhibits 25x...
efficient inference, pushing, limit, time, scaling
https://www.modular.com/models/qwen3-vl-8b
Qwen3-VL 8B Inference, Efficient Vision-Language Model | Modular
Deploy Qwen3-VL-8B by Alibaba for efficient vision-language inference on Modular. Dense 8B model on NVIDIA and AMD GPUs.
vision language model, vl, inference, efficient, modular
https://arxiv.org/abs/2207.00032
[2207.00032] DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at...
Abstract page for arXiv paper 2207.00032: DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale
transformer models, deepspeed, inference, enabling, efficient
https://www.modular.com/models/ministral-14b
Ministral 14B Inference, Mistral's Efficient Vision Model | Modular
Deploy Ministral 14B by Mistral AI with optimized inference on Modular. Efficient vision and text model on NVIDIA and AMD GPUs.
efficient vision, ministral, inference, mistral, model
https://www.modular.com/models/qwen3-30b-a3b
Qwen3 30B Inference, Efficient Small MoE | Modular
Deploy Qwen3-30B-A3B by Alibaba with optimized MoE inference on Modular. 30B total, 3B active. Fast and cost-efficient.
inference, efficient, small, moe, modular
https://www.modular.com/models/mistral-small-3-1-24b
Mistral Small 3.1 24B, Efficient Vision LLM Inference | Modular
Deploy Mistral Small 3.1 24B with optimized inference on Modular. Efficient 24B vision model on NVIDIA and AMD GPUs.
mistral small, efficient vision, llm inference, modular
https://virtual.aistats.org/virtual/2026/poster/13729
AISTATS Poster Local Causal Discovery for Statistically Efficient Causal Inference
aistats poster, local, causal, discovery, statistically
https://www.modular.com/models/llama-4-scout
Llama 4 Scout Inference, Meta's Efficient Vision Model | Modular
Deploy Llama 4 Scout by Meta with optimized inference on Modular. Efficient vision-capable model on NVIDIA and AMD GPUs.
efficient vision, llama, scout, inference, meta
https://virtual.aistats.org/virtual/2026/poster/13690
AISTATS Poster Latent-IMH: Efficient Bayesian Inference for Inverse Problems with Approximate...
aistats poster, bayesian inference, latent, imh, efficient
https://www.modular.com/models/minimax-m2-5
MiniMax M2.5 Inference, 230B Efficient MoE | Modular
Deploy MiniMax M2.5 (230B MoE, 10B active) with optimized inference on Modular. Efficient text generation on NVIDIA and AMD GPUs.
minimax, inference, efficient, moe, modular