Robuta

https://www.a-star.edu.sg/cfar/events/visual-knowledge-and-reasoning-with-large-language-models-classification-via-description-and-visual-inference-via-program-execution-for-reasoning-(vipergpt)
large language modelsvisualknowledgereasoningclassification
https://openreview.net/forum?id=TjFn6xktTm&referrer=%5Bthe%20profile%20of%20Yang%20Tang%5D(%2Fprofile%3Fid%3D~Yang_Tang3)
This paper addresses the issue of cross-class domain adaptation (CCDA) in semantic segmentation, where the target domain contains both shared and novel classes...
visual language modelscross classsemantic segmentationdomainadaptive
https://openreview.net/forum?id=6dYBP3BIwx&referrer=%5Bthe%20profile%20of%20Jiazheng%20Xu%5D(%2Fprofile%3Fid%3D~Jiazheng_Xu1)
We introduce CogVLM, a powerful open-source visual language foundation model. Different from the popular \emph{shallow alignment} method which maps image...
visual expertlanguage modelsopenreview
https://openreview.net/forum?id=2truTpHZkV&referrer=%5Bthe%20profile%20of%20Zalan%20Fabian%5D(%2Fprofile%3Fid%3D~Zalan_Fabian1)
Building trust in vision-language models (VLMs) requires reliable uncertainty estimation (UE) to detect unreliable generations. Existing UE approaches often...
language modelsunreliableresponsesgenerativevision
https://arxiv.org/abs/2103.00020?utm_campaign=The%20Batch&utm_source=hs_email&utm_medium=email&_hsenc=p2ANqtz-9ArXRo8uGCVwJJHV_bnLhlnqMC4UgREF-7n0D5RN5g5rghAu11jGwRkMQ3B-rtjAZki5xD
Abstract page for arXiv paper 2103.00020: Learning Transferable Visual Models From Natural Language Supervision
natural languagelearningvisualmodelssupervision
https://arxiv.org/abs/2209.11737
Abstract page for arXiv paper 2209.11737: Visual representations in the human brain are aligned with large language models
human brainvisualrepresentationsaligned
https://huggingface.co/papers/2509.20136
Join the discussion on this paper page
papervgamegenerationcode
https://openreview.net/forum?id=bJuLbTmkeR&referrer=%5Bthe%20profile%20of%20Stavros%20Petridis%5D(%2Fprofile%3Fid%3D~Stavros_Petridis1)
Multimodal large language models (MLLMs) have recently become a focal point of research due to their formidable multimodal understanding capabilities. For...
large language modelsvisual speech recognitionstrongaudiolearners