Robuta

https://huggingface.co/papers/2312.17661
paper, gemini, reasoning, unveiling, commonsense
https://aclanthology.org/2025.findings-acl.1378/
Gio Paik, Geewook Kim, Jinbae Im. Findings of the Association for Computational Linguistics: ACL 2025. 2025.
unveiling, obstacles, robust, refinement, multimodal
https://openreview.net/forum?id=rQ7fz9NO7f&referrer=%5Bthe%20profile%20of%20Gang%20Liu%5D(%2Fprofile%3Fid%3D~Gang_Liu6)
While large language models (LLMs) have integrated images, adapting them to graphs remains challenging, limiting their applications in materials and drug...
large language models, multimodal, inverse, molecular, design
https://arxiv.org/abs/2401.13919v1
Abstract page for arXiv paper 2401.13919v1: WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
web agent, building, end
https://openreview.net/forum?id=on9sP7K1LMm&referrer=%5Bthe%20profile%20of%20Zhiliang%20Peng%5D(%2Fprofile%3Fid%3D~Zhiliang_Peng1)
We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new capabilities of perceiving object descriptions (e.g., bounding boxes) and...
large language models, kosmos, grounding, multimodal, world
https://huggingface.co/papers/2503.21851
large multimodal models, open world, paper, image
https://huggingface.co/papers/2406.11839
preference optimization, large language, paper, mdpo, conditional
https://openreview.net/forum?id=bjoHB7IN6b&referrer=%5Bthe%20profile%20of%20Yufei%20Zhan%5D(%2Fprofile%3Fid%3D~Yufei_Zhan1)
Recent advancements in multimodal large language models (MLLMs) have enhanced document understanding by integrating textual and visual information. However,...
large language, seeing, believing, mitigating, ocr
https://rocm.blogs.amd.com/software-tools-optimization/vllm-dp-vision/README.html
Learn how to optimize multimodal model inference with batch-level data parallelism for vision encoders in vLLM, achieving up to 45% throughput gains on AMD...
one line, accelerating, multimodal, inference, vllm
https://jmir.org/2024/1/e59505/citations
In the complex and multidimensional field of medicine, multimodal data are prevalent and crucial for informed clinical decisions. Multimodal data span a broad...
large language models, internet research, journal, medical, multimodal
https://openreview.net/forum?id=wnuC0jreGI&referrer=%5Bthe%20profile%20of%20Sreejan%20Kumar%5D(%2Fprofile%3Fid%3D~Sreejan_Kumar1)
Humans extract useful abstractions of the world from noisy sensory data. Serial reproduction allows us to study how people construe the world through a...
large language models, comparing, abstraction, humans, using
https://openreview.net/forum?id=MctBpV4isI&referrer=%5Bthe%20profile%20of%20Avinash%20Madasu%5D(%2Fprofile%3Fid%3D~Avinash_Madasu1)
Training models on synthetic data is an effective strategy for improving large multimodal models (LMMs) due to the scarcity of high-quality paired image-text...
data generation, analyze, generate, improve, failure