large multimodal models - Robuta Search

https://openreview.net/forum?id=UL56lbucD3
GalleryGPT: Analyzing Paintings with Large Multimodal Models | OpenReview
Artwork analysis is an important and fundamental skill for art appreciation, which could enrich personal aesthetic sensibility and facilitate the critical...
Tags: large multimodal models analyzing paintings openreview

https://huggingface.co/papers/2505.11454
Paper page - HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation
Join the discussion on this paper page
Tags: large multimodal models

https://arxiv.org/abs/2505.12766
[2505.12766] Reasoning-OCR: Can Large Multimodal Models Solve Complex Logical Reasoning Problems...
Abstract page for arXiv paper 2505.12766: Reasoning-OCR: Can Large Multimodal Models Solve Complex Logical Reasoning Problems from OCR Cues?
Tags: large multimodal models

https://openreview.net/forum?id=2snKOc7TVp&referrer=%5Bthe%20profile%20of%20Shuntian%20Yao%5D(%2Fprofile%3Fid%3D~Shuntian_Yao1)
VisualAgentBench: Towards Large Multimodal Models as Visual Foundation Agents | OpenReview
Large Multimodal Models (LMMs) have ushered in a new era in artificial intelligence, merging capabilities in both language and vision to form highly capable...
Tags: large multimodal models towards visual foundation agents

https://arxiv.org/abs/2503.05936
[2503.05936] CASP: Compression of Large Multimodal Models Based on Attention Sparsity
Abstract page for arXiv paper 2503.05936: CASP: Compression of Large Multimodal Models Based on Attention Sparsity
Tags: large multimodal models

https://arxiv.org/abs/2503.05936v1
[2503.05936v1] CASP: Compression of Large Multimodal Models Based on Attention Sparsity
Abstract page for arXiv paper 2503.05936v1: CASP: Compression of Large Multimodal Models Based on Attention Sparsity
Tags: large multimodal models

https://openreview.net/forum?id=5EBT9ekISI&referrer=%5Bthe%20profile%20of%20Mohammad%20Akbari%5D(%2Fprofile%3Fid%3D~Mohammad_Akbari3)
Divprune: Diversity-based visual token pruning for large multimodal models | OpenReview
Large Multimodal Models (LMMs) have emerged as powerful models capable of understanding various data modalities, including text, images, and videos. LMMs...
Tags: large multimodal models diversity based visual token

https://openreview.net/forum?id=KOTutrSR2y
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities | OpenReview
We propose MM-Vet, an evaluation benchmark that examines large multimodal models (LMMs) on complicated multimodal tasks. Recent LMMs have shown various...
Tags: large multimodal models mm vet evaluating integrated

https://arxiv.org/abs/2509.22377?ref=disinfodocket.com
[2509.22377] Effectiveness of Large Multimodal Models in Detecting Disinformation: Experimental...
Abstract page for arXiv paper 2509.22377: Effectiveness of Large Multimodal Models in Detecting Disinformation: Experimental Results
Tags: large multimodal models 2509 effectiveness

https://huggingface.co/papers/2605.00877
Paper page - OceanPile: A Large-Scale Multimodal Ocean Corpus for Foundation Models
Join the discussion on this paper page
Tags: a large

https://openreview.net/forum?id=G5gROx8AVi
SELU: Self-Learning Embodied Multimodal Large Language Models in Unknown Environments | OpenReview
Recently, multimodal large language models (MLLMs) have demonstrated strong visual understanding and decision-making capabilities, enabling the exploration of...
Tags: large language models self learning

https://huggingface.co/papers/2506.23009
Paper page - MusiXQA: Advancing Visual Music Understanding in Multimodal Large Language Models
Join the discussion on this paper page
Tags: visual music

https://openreview.net/forum?id=RBCxmxAgqo&referrer=%5Bthe%20profile%20of%20Liangda%20Fang%5D(%2Fprofile%3Fid%3D~Liangda_Fang2)
Reason-and-Execute Prompting: Enhancing MultiModal Large Language Models for Solving Geometry...
MultiModal Large Language Models (MM-LLMs) have demonstrated exceptional reasoning abilities in various visual question-answering tasks. However, they...
Tags: large language models

https://openreview.net/forum?id=F2uP3ieu_Rz&referrer=%5Bthe%20profile%20of%20Zhenyu%20He%5D(%2Fprofile%3Fid%3D~Zhenyu_He3)
Examining User-Friendly and Open-Sourced Large GPT Models: A Survey on Language, Multimodal, and...
Generative pre-trained transformer (GPT) models have revolutionized the field of natural language processing (NLP) with remarkable performance in various tasks...

https://openreview.net/forum?id=Y07R8h6m8e&referrer=%5Bthe%20profile%20of%20Khouloud%20Saadi%5D(%2Fprofile%3Fid%3D~Khouloud_Saadi1)
Dissecting Misalignment of Multimodal Large Language Models via Influence Function | OpenReview
Multi-modal Large Language models (MLLMs) are always trained on data from diverse and unreliable sources, which may contain misaligned or mislabeled text-image...
Tags: large language models influence function dissecting misalignment multimodal

https://pmc.ncbi.nlm.nih.gov/articles/PMC12083703/
Multimodal Integrated Knowledge Transfer to Large Language Models through Preference Optimization...
The scarcity of high-quality multimodal biomedical data limits the ability to effectively fine-tune pretrained Large Language Models (LLMs) for specialized...
Tags: large language models knowledge transfer multimodal integrated

https://arxiv.org/abs/2404.12390?ref=worv.ghost.io
[2404.12390] BLINK: Multimodal Large Language Models Can See but Not Perceive
Abstract page for arXiv paper 2404.12390: BLINK: Multimodal Large Language Models Can See but Not Perceive
Tags: large language models

https://openreview.net/forum?id=hJPATsBb3l
M3Exam: A Multilingual, Multimodal, Multilevel Benchmark for Examining Large Language Models |...
Despite the existence of various benchmarks for evaluating natural language processing models, we argue that human exams are a more suitable means of...
Tags: multilingual multimodal multilevel

https://openreview.net/forum?id=2nIAtsUC27
Improve Temporal Reasoning in Multimodal Large Language Models via Video Contrastive Decoding |...
A major distinction between video and image understanding is that the former requires reasoning over time. Existing Video Large Language Models (VLLMs)...
Tags: large language models

https://openreview.net/forum?id=GeTBk67mK6&referrer=%5Bthe%20profile%20of%20Zhendong%20Chu%5D(%2Fprofile%3Fid%3D~Zhendong_Chu1)
ErrorRadar: Benchmarking Complex Mathematical Reasoning of Multimodal Large Language Models Via...
As the field of Multimodal Large Language Models (MLLMs) continues to evolve, their potential to revolutionize artificial intelligence is particularly...
Tags: large language models mathematical reasoning benchmarking complex