Robuta

https://openreview.net/forum?id=zJMutieTgh&referrer=%5Bthe%20profile%20of%20Lei%20Wang%5D(%2Fprofile%3Fid%3D~Lei_Wang30)
Face recognition (FR) has been applied to nearly every aspect of daily life, but it is always accompanied by the underlying risk of leaking private...
face recognition, inference, attacks, model, without
https://arxiv.org/html/2402.09748v1
large language models, compression, efficient, inference
https://share.vidyard.com/watch/stPvdbbwos1FnWo2bg5o5j
AI Customer Engineer Chris Bogdiukiewicz introduces PyTorch for the IPU. With PopTorch™, a simple Python wrapper for PyTorch programs, developers can...
getting started, pytorch, ipu, running, basic
https://arxiv.org/abs/2111.12550
Abstract page for arXiv paper 2111.12550: A Worker-Task Specialization Model for Crowdsourcing: Efficient Inference and Fundamental Limits
worker-task specialization, model, crowdsourcing
https://www.confluent.io/resources/online-talk/rag-tutorial-with-flink-ai-model-inference-mongodb/
Learn to build retrieval-augmented generation in 4 steps: data augmentation, inference, workflows, and post-processing. See vector embedding step by step.
ai model, steps, build, rag, confluent
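The four-step flow named in the Confluent tutorial (data augmentation, inference, workflows, post-processing) can be sketched as plain Python. Everything below is illustrative, not the Confluent/Flink API: the bag-of-words "embedding" stands in for a real vector model, and the prompt assembly stands in for the downstream LLM call.

```python
# Toy sketch of the four-step RAG flow:
# data augmentation -> inference-time retrieval -> workflow -> post-processing.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Step 1, data augmentation: a toy bag-of-words "embedding"
    # (a real pipeline would call a vector-embedding model).
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str]) -> str:
    # Step 2, inference: pick the document closest to the query.
    q = embed(query)
    return max(docs, key=lambda d: cosine(q, embed(d)))

def answer(query: str, docs: list[str]) -> str:
    # Steps 3-4, workflow + post-processing: assemble the augmented
    # prompt and tidy it before the (stubbed) LLM call.
    context = retrieve(query, docs)
    return f"Context: {context}\nQuestion: {query}".strip()

docs = ["Flink runs streaming SQL", "MongoDB stores vector embeddings"]
print(answer("where are embeddings stored?", docs))
```

In a production setup each step would be a separate stage (embedding store, retriever, orchestration, response filtering), but the data flow is the same.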
https://developer.nvidia.com/blog/nvidia-tensorrt-llm-supercharges-large-language-model-inference-on-nvidia-h100-gpus/
Nov 7, 2023 - Large language models (LLMs) offer incredible new capabilities, expanding the frontier of what is possible with AI. However, their large size and unique…
large language model, nvidia, tensorrt, llm, supercharges
https://www.mdpi.com/1424-8220/21/19/6594
With the advancement of machine learning, a growing number of mobile users rely on machine learning inference for making time-sensitive and safety-critical...
joint, model, provisioning, request, dispatch
https://openreview.net/forum?id=s5rJFjAd65&referrer=%5Bthe%20profile%20of%20Aske%20Plaat%5D(%2Fprofile%3Fid%3D~Aske_Plaat1)
Integrated Assessment Models (IAMs) such as RICE have long provided a foundation for studying the coupled dynamics of the global economy and climate system....
integrated assessment model, leveraging, fully differentiable, rl
https://www.preprints.org/manuscript/202009.0377
The accurate prediction of the solar Diffuse Fraction (DF), sometimes called the Diffuse Ratio, is an important topic for solar energy research. In the present...
fuzzy inference system, multilayer perceptron, adaptive, neuro, model
https://www.baseten.co/resources/customers/superhuman/
Nov 25, 2025 - Superhuman cut P95 latency by 80% across dozens of custom embedding models in just one week after adopting Baseten Embedding Inference.
model inference, superhuman, faster, embedding
https://www.nvidia.com/en-us/on-demand/session/gtcspring22-s42534/
NVIDIA Triton Inference Server (Triton) is open-source inference serving software that maximizes performance and simplifies model deployment at scale.
scale model, simplify, serving, nvidia, triton
https://openreview.net/forum?id=tnxONP8zTE&referrer=%5Bthe%20profile%20of%20Jiajie%20Zhang%5D(%2Fprofile%3Fid%3D~Jiajie_Zhang2)
Large language models (LLMs) have demonstrated remarkable capabilities in complex reasoning tasks. However, existing approaches mainly rely on imitation...
language model, reinforcement learning, advancing, reasoning, inference
https://towardsdatascience.com/optimizing-pytorch-model-inference-on-cpu/
Dec 8, 2025 - Flyin’ Like a Lion on Intel Xeon
towards data science, model inference, optimizing, pytorch, cpu
https://aclanthology.org/2020.spnlp-1.4/
Hendrik ter Horst, Philipp Cimiano. Proceedings of the Fourth Workshop on Structured Prediction for NLP. 2020.
structured prediction, joint class, cardinality, entity, property
https://aclanthology.org/D08-1084/
Bill MacCartney, Michel Galley, Christopher D. Manning. Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing. 2008.
natural language inference, phrase-based, alignment, model
https://www.amazon.science/publications/optimizing-cnn-model-inference-on-cpus
The popularity of Convolutional Neural Network (CNN) models and the ubiquity of CPUs imply that better performance of CNN model inference on CPUs can deliver...
model inference, amazon science, optimizing, cnn, cpus
https://arxiv.org/abs/2104.12470
Abstract page for arXiv paper 2104.12470: Easy and Efficient Transformer : Scalable Inference Solution For large NLP model
easy, efficient, transformer, scalable, inference
https://www.alibabacloud.com/help/en/ack/cloud-native-ai-suite/user-guide/deploy-a-vllm-inference-application
Deploy a vLLM model as an inference service (Container Service for Kubernetes): Vectorized Large Language Model (vLLM) is a high-performance large language model...
inference service, deploy, vllm, model, container
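Once a vLLM inference service like the one above is running, clients typically reach it over vLLM's OpenAI-compatible HTTP API. A minimal client-side sketch, in which the endpoint URL and model name are placeholders for your own deployment:

```python
# Sketch of a client request to a deployed vLLM inference service.
# vLLM exposes an OpenAI-compatible /v1/completions endpoint; the
# service URL and model name below are placeholders, not real values.
import json
import urllib.request

ENDPOINT = "http://my-vllm-service/v1/completions"  # placeholder URL

def build_request(prompt: str, model: str = "my-model") -> urllib.request.Request:
    # Assemble the JSON payload the OpenAI-compatible API expects.
    payload = {"model": model, "prompt": prompt, "max_tokens": 64}
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_request("What is vLLM?")
print(req.get_full_url())
# Sending would be: urllib.request.urlopen(req) against a live service.
```

Against a real ACK deployment you would substitute the service's cluster-internal or ingress URL and the model name registered with the server.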
https://www.osti.gov/pages/biblio/2338226-novel-symmetry-preserving-neural-network-model-phylogenetic-inference
The U.S. Department of Energy's Office of Scientific and Technical Information
neural network model, phylogenetic inference, novel, symmetry-preserving
https://resources.nvidia.com/en-us-run-ai/reducing-cold-start-latency-for-llm-inference-with-nvidia-runai-model-streamer
Deploying large language models (LLMs) poses a challenge in optimizing inference efficiency. In particular, cold start delays—where models take significant...
cold start, llm inference, reducing, latency, nvidia
https://openreview.net/forum?id=nZIFBtNuJi&referrer=%5Bthe%20profile%20of%20Avrim%20Blum%5D(%2Fprofile%3Fid%3D~Avrim_Blum1)
We are often interested in decomposing complex, structured data into simple components that explain the data. The linear version of this problem is...
combinatorial dictionary, model, learning, inference, openreview
https://www.alibabacloud.com/help/en/ack/cloud-native-ai-suite/use-cases/deploy-deepseek-distillation-model-inference-service-based-on-ack
Deploy a DeepSeek distilled model inference service on ACK (Container Service for Kubernetes): This topic describes how to use KServe to deploy a production-ready...
model inference, deploy, deepseek, distilled, service