https://vislang.ai/
VisLang - Vision, Language and Learning Lab at Rice University
Research group at Rice University led by Vicente Ordonez, working at the intersection of computer vision, natural language processing, and machine learning.
https://arxiv.org/abs/2601.22153
[2601.22153] DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation
https://arxiv.org/html/2312.04403v1
OT-Attack: Enhancing Adversarial Transferability of Vision-Language Models via Optimal Transport...
https://www.modular.com/models/qwen3-vl-8b
Qwen3-VL 8B Inference, Efficient Vision-Language Model | Modular
Deploy Qwen3-VL-8B by Alibaba for efficient vision-language inference on Modular. Dense 8B model on NVIDIA and AMD GPUs.
https://huggingface.co/papers/2503.11576
Paper page - SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal...
https://www.nature.com/articles/s41746-026-02596-4
Decipher-MR: a vision-language foundation model for 3D MRI representations | npj Digital Medicine
Apr 4, 2026 - Magnetic Resonance Imaging is a critical imaging modality in clinical diagnosis and research, yet its complexity and heterogeneity hinder scalable,...
https://evla-survey.github.io/
A Survey on Efficient Vision-Language-Action Models
https://cerebrium.ai/docs/v4/examples/deploy-a-vision-language-model-with-sglang
Deploy a Vision Language Model with SGLang - Cerebrium
Build an intelligent ad analysis system that evaluates advertisements across multiple dimensions
https://www.modular.com/models/qwen3-vl-4b
Qwen3-VL 4B Inference, Lightweight Vision-Language Model | Modular
Deploy Qwen3-VL-4B by Alibaba for lightweight vision-language inference on Modular. Dense 4B model on NVIDIA and AMD GPUs.
https://www.together.ai/blog/dragonfly-v1
Dragonfly: A large vision-language model with multi-resolution zoom
https://www.layerthelatestinalattice.com/papers/arxiv:2605.05045
When Relations Break: Analyzing Relation Hallucination in Vision-Language Model Under Rotation and...
This paper investigates the vulnerability of Vision-Language Models (VLMs) to relation hallucination under visual perturbations like rotation and noise. Ex...
https://arxiv.org/abs/2303.07226
[2303.07226] Scaling Vision-Language Models with Sparse Mixture of Experts
https://deepseekocr.app/
DeepSeek OCR - Free Online OCR Tool | Vision-Language Model Text Extraction
DeepSeek OCR - The world's first online OCR tool powered by DeepSeek's 3B vision-language model. 97% accuracy with ultra-low token consumption (100...
https://www.findbestmodel.app/
ModelMatch - Compare Vision-Language Models
Compare top open source vision-language models side-by-side, no coding needed
https://aimultiple.com/vision-language-models
Vision Language Models Compared to Image Recognition
Apr 24, 2026 - Can advanced Vision Language Models (VLMs) replace traditional image recognition models? To find out, we benchmarked 16 leading models across three paradigms.
https://www.liquid.ai/use-cases/accelerating-vision-language-model-deployment-for-automotive-ai
Accelerating Vision-Language Model Deployment for Automotive AI | Liquid AI
Dec 17, 2025 - Deploy 10x faster in-car AI with Liquid’s optimized vision-language models—no hardware upgrades needed.
https://www.ektavats.se/
Uppsala Vision, Language and Learning – Part of The Beijer Laboratory for Artificial Intelligence...
https://creatus.ai/chat-with-image
Chat with Image for Free | AI Vision Language Model (VLM) | CREATUS.AI
Experience the power of our vision language model and talk to images for free. Enhance your real-world vision and language understanding with our innovative...
https://www.ximilar.com/services/vision-language-models/
Vision Language Model Platform - Ximilar: Visual AI for Business
Apr 27, 2026 - Train multimodal vision language models (VLMs) on an AI platform combining computer vision and natural language processing (LLMs).
https://www.ndss-symposium.org/ndss-paper/vigtext-deepfake-image-detection-with-vision-language-model-explanations-and-graph-neural-networks/
ViGText: Deepfake Image Detection with Vision-Language Model Explanations and Graph Neural Networks...
https://openreview.net/forum?id=Sextl6R3Nf
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics |...
Spatial understanding is essential for robots to perceive, reason about, and interact with their environments. However, current visual language models often...
https://www.ubicloud.com/blog/end-to-end-ocr-with-vision-language-models
End-to-End OCR with Vision Language Models
Scanned documents and images are everywhere. Extracting structured data from them has always been an essential yet complex, multi-step task.
https://www.liquid.ai/blog/lfm2-vl-efficient-vision-language-models
LFM2-VL: Efficient Vision-Language Models | Liquid AI
Oct 21, 2025 - Today, we release LFM2-VL, our first series of vision-language foundation models. These multimodal models are designed for low-latency and device-aware...
https://www.nvidia.com/en-us/glossary/vision-language-models/
What are Vision-Language Models? | NVIDIA Glossary
Vision Language Models (VLMs) are multimodal generative AI models capable of reasoning over text, image and video prompts.
https://ltx-2.run/blog/step3-vl-10b-vision-language-model-en/
Step3-VL-10B: How a 10B Vision-Language Model Rivals Models 10-20x Larger | LTX-2 Blog
Discover Step3-VL-10B, the efficient 10B vision-language model that outperforms models 10-20x larger with PE-lang encoder and exceptional STEM reasoning...
https://arxiv.org/abs/2409.17146
[2409.17146] Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
https://github.com/Turbo1123/roubao
GitHub - Turbo1123/roubao: Android Automation Tool Based on Vision-Language Models · GitHub
https://xl-vla.github.io/
XL-VLA: Cross-Hand Latent Representation for Vision-Language-Action Models
https://www.jumpstartmag.com/carryais-serverless-vision-language-models-signal-a-new-era-of-on-device-ai/
CarryAI’s Serverless Vision-Language Models Signal a New Era of On-Device AI - Jumpstart Magazine
Apr 10, 2026 - At HKTDC InnoEx 2026, CarryAI Ltd is emerging as a distinctive voice in the evolving AI landscape, showcasing a fundamentally different approach to how...
https://openreview.net/forum?id=ifo8oWSLSq
FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Action Flow Policies...
This work introduces FLOWER, an efficient, open-source Vision-Language-Action Flow policy. Vision-Language-Action (VLA) models have demonstrated remarkable...
https://arxiv.org/abs/2512.04032
[2512.04032] jina-vlm: Small Multilingual Vision Language Model
https://fondazione-fair.it/en/transversal-projects/tp2-vision-language-and-multimodal-challenges/
TP2: Vision, Language and Multimodal Challenges - Fondazione FAIR
https://jina.ai/news/jina-vlm-small-multilingual-vision-language-model/
Jina-VLM: Small Multilingual Vision Language Model
Dec 5, 2025 - New 2B vision language model achieves SOTA on multilingual VQA, no catastrophic forgetting on text-only tasks.
https://www.liquid.ai/use-cases/optimizing-vision-language-models-for-product-cataloging
Optimizing Vision-Language Models for Product Cataloging | Liquid AI
Dec 17, 2025 - Liquid’s vision-language models cut cataloging time by 65% while delivering higher accuracy and lower costs.
https://nord-vla-ai.github.io/
NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning
https://blogs.nvidia.com/blog/nemotron-3-nano-omni-multimodal-ai-agents/
NVIDIA Launches Nemotron 3 Nano Omni Model, Unifying Vision, Audio and Language for up to 9x More...
Apr 28, 2026 - Best-in-class open omni-modal reasoning model delivers the highest efficiency and accuracy to power agentic workflows such as computer use, document...
https://deepmind.google/blog/rt-2-new-model-translates-vision-and-language-into-action/
RT-2: New model translates vision and language into action — Google DeepMind
https://www.unitary.ai/articles/unlocking-safe-digital-spaces-with-computer-vision-audio-and-language-processing
Safe Online Spaces: Computer Vision + Audio & Language Processing
Jun 2, 2023 - Learn what computer vision and natural language processing are, and how they can be used together to help platforms better moderate content online. Discover...
https://www.firstnations.org/stories/we-begin-with-relationship-a-vision-for-native-arts-and-language/
We Begin With Relationship: A Vision for Native Arts and Language | First Nations Development...
Yá’át’ééh. Jennifer Himmelreich yinishyé. In January 2026, I joined First Nations Development Institute as a senior program officer with the Native Arts,...
https://dvr2u.com/
Dynamic Vision Resources (DVR) - Japanese Language Courses
https://www.graphicmedicine.org/comic-reviews/speaking-in-pictures-a-vision-of-language/
Speaking in Pictures - A Vision of Language | Graphic Medicine
https://arxiv.org/abs/2501.07171
[2501.07171] BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language...
Abstract page for arXiv paper 2501.07171: BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific...
https://www.lv-lab.org/
Language and Vision Laboratory (LV-Lab)
Language and Vision Laboratory (LV-Lab) across NUS and SMU.
https://openreview.net/forum?id=kZLANTp6Vw
GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models | OpenReview
In this work, we propose GLOV, which enables Large Language Models (LLMs) to act as implicit optimizers for Vision-Language Models (VLMs) to enhance downstream...
https://shihaozhaozsh.github.io/LaVi-Bridge/
Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation
https://nehiyawak.org/
nēhiyawak Language Experience Inc. « Our vision is for Cree people to reclaim, restore and relearn...