Robuta

https://huggingface.co/papers/2312.17661
paper, gemini, reasoning, unveiling, commonsense
https://arxiv.org/abs/2502.11573v1
Abstract page for arXiv paper 2502.11573v1: InfiR : Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning
small language models, crafting, effective, multimodal
https://edubirdie.com/docs/university-of-michigan/writing-100-the-practice-of-writing/72919-language-of-multimodal-texts
SUPPORTING MULTIMODAL LITERACY: SUPPLEMENT 1. The Language of Multimodal Texts. When analyzing multimodal texts...
study guide, language, multimodal, texts, edubirdie
https://openreview.net/forum?id=G9qA1JZ0Sy&referrer=%5Bthe%20profile%20of%20Jingyang%20Qiao%5D(%2Fprofile%3Fid%3D~Jingyang_Qiao1)
Instruction tuning guides the Multimodal Large Language Models (MLLMs) in aligning different modalities by designing text instructions, which seems to be an...
large language, multimodal, continual, assistant, openreview
https://www.mdpi.com/2076-3417/14/17/7782
As large language models (LLMs) continue to advance, evaluating their comprehensive capabilities becomes significant for their application in various fields.
comprehensive evaluation, putting, gpts, word
https://www.tableau.com/th-th/research/publications/discovering-natural-language-commands-multimodal-interfaces
natural language, discovering, commands, multimodal, interfaces
https://aclanthology.org/2025.findings-acl.1378/
Gio Paik, Geewook Kim, Jinbae Im. Findings of the Association for Computational Linguistics: ACL 2025. 2025.
unveiling, obstacles, robust, refinement, multimodal
https://aclanthology.org/2024.findings-acl.463/
Fanqing Meng, Wenqi Shao, Quanfeng Lu, Peng Gao, Kaipeng Zhang, Yu Qiao, Ping Luo. Findings of the Association for Computational Linguistics: ACL 2024. 2024.
multimodal language, universal, chart, model, via
https://openreview.net/forum?id=ELd5Kn0StP&referrer=%5Bthe%20profile%20of%20Xiaohong%20Liu%5D(%2Fprofile%3Fid%3D~Xiaohong_Liu2)
The development of multimodal large language models (MLLMs) enables the evaluation of image quality through natural language descriptions. This advancement...
image quality assessment, multimodal language, grounding, iqa, model
https://huggingface.co/papers/2412.07755
spatial aptitude, multimodal language, paper, sat, dynamic
https://openreview.net/forum?id=rQ7fz9NO7f&referrer=%5Bthe%20profile%20of%20Gang%20Liu%5D(%2Fprofile%3Fid%3D~Gang_Liu6)
While large language models (LLMs) have integrated images, adapting them to graphs remains challenging, limiting their applications in materials and drug...
large language models, multimodal, inverse, molecular, design
https://openreview.net/forum?id=g0u6xNAChC&referrer=%5Bthe%20profile%20of%20Yen-Ling%20Kuo%5D(%2Fprofile%3Fid%3D~Yen-Ling_Kuo1)
We study object interaction anticipation in egocentric videos. This task requires an understanding of the spatio-temporal context formed by past actions on...
natural language, summarize, past, predict, future
https://huggingface.co/papers/2410.08695
paper, dynamic, multimodal, evaluation, flexible
https://aclanthology.org/L18-1532/
Jacqueline Brixey, Eli Pincus, Ron Artstein. Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). 2018.
chahta anumpa, choctaw language, multimodal, corpus, acl
https://aclanthology.org/2024.acl-long.411/
Yunxin Li, Xinyu Chen, Baotian Hu, Haoyuan Shi, Min Zhang. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1:...
visual language, cognitive, mapper, advancing, multimodal
https://huggingface.co/papers/2410.01620
paper, lmod, large, multimodal, ophthalmology
https://huggingface.co/papers/2411.14522
paper, gmai, vl, large
https://openreview.net/forum?id=on9sP7K1LMm&referrer=%5Bthe%20profile%20of%20Zhiliang%20Peng%5D(%2Fprofile%3Fid%3D~Zhiliang_Peng1)
We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new capabilities of perceiving object descriptions (e.g., bounding boxes) and...
large language models, kosmos, grounding, multimodal, world
https://huggingface.co/papers/2406.11839
preference optimization, large language, paper, mdpo, conditional
https://newsroom-deezer.com/2025/09/recsys-epure-html/
Sep 7, 2025 - Natural language interfaces offer a compelling approach for music recommendation, enabling users to express complex preferences conversationally. While Large...
natural language, ask, music, jam, multimodal
https://aclanthology.org/volumes/2023.mmnlg-1/
natural language generation, proceedings, workshop, multimodal, multilingual
https://aclanthology.org/2020.lrec-1.93/
Dimosthenis Kontogiorgos, Elena Sibirtseva, Joakim Gustafson. Proceedings of the Twelfth Language Resources and Evaluation Conference. 2020.
chinese whispers, multimodal, dataset, embodied, language
https://arxiv.org/abs/2502.01341
Abstract page for arXiv paper 2502.01341: AlignVLM: Bridging Vision and Language Latent Spaces for Multimodal Document Understanding
bridging, vision, language, latent, spaces
https://huggingface.co/papers/2308.01430
large language model, paper, gpt, multimodal
https://openreview.net/forum?id=bjoHB7IN6b&referrer=%5Bthe%20profile%20of%20Yufei%20Zhan%5D(%2Fprofile%3Fid%3D~Yufei_Zhan1)
Recent advancements in multimodal large language models (MLLMs) have enhanced document understanding by integrating textual and visual information. However,...
large language, seeing, believing, mitigating, ocr
https://jmir.org/2024/1/e59505/citations
In the complex and multidimensional field of medicine, multimodal data are prevalent and crucial for informed clinical decisions. Multimodal data span a broad...
large language models, internet research, journal, medical, multimodal
https://openreview.net/forum?id=wnuC0jreGI&referrer=%5Bthe%20profile%20of%20Sreejan%20Kumar%5D(%2Fprofile%3Fid%3D~Sreejan_Kumar1)
Humans extract useful abstractions of the world from noisy sensory data. Serial reproduction allows us to study how people construe the world through a...
large language models, comparing, abstraction, humans, using
https://aclanthology.org/2024.emnlp-main.387/
Jiahao Huo, Yibo Yan, Boren Hu, Yutao Yue, Xuming Hu. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024.
domain specific, discovering, neuron, level, interpretation
https://openreview.net/forum?id=fISpu1aEHM&referrer=%5Bthe%20profile%20of%20Lee%20Hyun%5D(%2Fprofile%3Fid%3D~Lee_Hyun1)
Despite the recent advances in artificial intelligence, building social intelligence remains a challenge. Among social signals, laughter is one of the...
smile, multimodal, dataset, understanding, laughter
https://huggingface.co/papers/2408.01337
paper, evaluating, music, understanding, multimodal