https://openreview.net/forum?id=rQ7fz9NO7f&referrer=%5Bthe%20profile%20of%20Gang%20Liu%5D(%2Fprofile%3Fid%3D~Gang_Liu6)
While large language models (LLMs) have integrated images, adapting them to graphs remains challenging, limiting their applications in materials and drug...
large language models, multimodal, inverse, molecular, design
https://arxiv.org/abs/2401.13919v1
Abstract page for arXiv paper 2401.13919v1: WebVoyager: Building an End-to-End Web Agent with Large Multimodal Models
web agent, building, end
https://openreview.net/forum?id=on9sP7K1LMm&referrer=%5Bthe%20profile%20of%20Zhiliang%20Peng%5D(%2Fprofile%3Fid%3D~Zhiliang_Peng1)
We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new capabilities of perceiving object descriptions (e.g., bounding boxes) and...
large language models, kosmos, grounding, multimodal, world
https://openreview.net/forum?id=bjoHB7IN6b&referrer=%5Bthe%20profile%20of%20Yufei%20Zhan%5D(%2Fprofile%3Fid%3D~Yufei_Zhan1)
Recent advancements in multimodal large language models (MLLMs) have enhanced document understanding by integrating textual and visual information. However,...
large language, seeing, believing, mitigating, ocr
https://rocm.blogs.amd.com/software-tools-optimization/vllm-dp-vision/README.html
Learn how to optimize multimodal model inference with batch-level data parallelism for vision encoders in vLLM, achieving up to 45% throughput gains on AMD...
one line, accelerating, multimodal, inference, vllm
https://openreview.net/forum?id=wnuC0jreGI&referrer=%5Bthe%20profile%20of%20Sreejan%20Kumar%5D(%2Fprofile%3Fid%3D~Sreejan_Kumar1)
Humans extract useful abstractions of the world from noisy sensory data. Serial reproduction allows us to study how people construe the world through a...
large language models, comparing, abstraction, humans, using
https://openreview.net/forum?id=MctBpV4isI&referrer=%5Bthe%20profile%20of%20Avinash%20Madasu%5D(%2Fprofile%3Fid%3D~Avinash_Madasu1)
Training models on synthetic data is an effective strategy for improving large multimodal models (LMMs) due to the scarcity of high-quality paired image-text...
data generation, analyze, generate, improve, failure