
https://huggingface.co/blog/lightonai/lightonocr
A blog post by LightOn AI on Hugging Face
https://aiwith.me/tools/8pixlabs-com/
Jan 2, 2026 - 8PixLabs: Molmo AI is a family of open-source vision-language models that provide advanced AI functionalities such as image recognition and text generation,...
https://megagon.ai/vlms-conflicting-info-which-does-it-trust/
Nov 13, 2025 - VLMs don't simply "fuse" multimodal information; they make active, context-dependent decisions about which signals to prioritize...
https://towardsdatascience.com/how-to-apply-vision-language-models-to-long-documents/
Dec 6, 2025 - Learn how to apply powerful VLMs for long context document understanding tasks
https://huggingface.co/blog/manu/colpali
A blog post by Manuel Faysse on Hugging Face
https://www.eivindkjosbakken.com/webinar
Watch a free webinar recording on applying vision language models to document processing. Learn practical VLM techniques from Eivind Kjosbakken.
https://www.amazon.science/blog/fine-tuning-vision-language-models-on-memory-constrained-devices
Jan 9, 2026 - A new hybrid optimization approach allows edge devices to fine-tune vision-language models using only forward passes, achieving up to 7% higher accuracy than...
https://www.nvidia.com/en-us/glossary/vision-language-models/
Vision Language Models (VLMs) are multimodal generative AI models capable of reasoning over text, image and video prompts.
https://j-min.io/publication/crg_eccv2024/
Jan 28, 2025 - CRG is a training-free method that guides VLMs to understand visual prompts by contrasting the outputs with and without visual prompts. (ECCV 2024)
https://j-min.io/publication/capture_iccv2025/
Sep 16, 2025 - A VLM benchmark that tests spatial reasoning by making models count objects under occlusion. (ICCV 2025)