Robuta

https://vislang.ai/
VisLang - Vision, Language and Learning Lab at Rice University
Research group at Rice University led by Vicente Ordonez, working at the intersection of computer vision, natural language processing, and machine learning.

https://arxiv.org/abs/2601.22153
[2601.22153] DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation
Abstract page for arXiv paper 2601.22153: DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation

https://arxiv.org/html/2312.04403v1
OT-Attack: Enhancing Adversarial Transferability of Vision-Language Models via Optimal Transport...

https://www.modular.com/models/qwen3-vl-8b
Qwen3-VL 8B Inference, Efficient Vision-Language Model | Modular
Deploy Qwen3-VL-8B by Alibaba for efficient vision-language inference on Modular. Dense 8B model on NVIDIA and AMD GPUs.

https://huggingface.co/papers/2503.11576
Paper page - SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal...
Join the discussion on this paper page

https://www.nature.com/articles/s41746-026-02596-4?error=cookies_not_supported&code=0e9476f6-f18e-427c-a689-3725863f4394
Decipher-MR: a vision-language foundation model for 3D MRI representations | npj Digital Medicine
Apr 4, 2026 - Magnetic Resonance Imaging is a critical imaging modality in clinical diagnosis and research, yet its complexity and heterogeneity hinder scalable,...

https://evla-survey.github.io/
A Survey on Efficient Vision-Language-Action Models

https://cerebrium.ai/docs/v4/examples/deploy-a-vision-language-model-with-sglang
Deploy a Vision Language Model with SGLang - Cerebrium
Build an intelligent ad analysis system that evaluates advertisements across multiple dimensions

https://www.modular.com/models/qwen3-vl-4b
Qwen3-VL 4B Inference, Lightweight Vision-Language Model | Modular
Deploy Qwen3-VL-4B by Alibaba for lightweight vision-language inference on Modular. Dense 4B model on NVIDIA and AMD GPUs.

https://www.together.ai/blog/dragonfly-v1
Dragonfly: A large vision-language model with multi-resolution zoom

https://www.layerthelatestinalattice.com/papers/arxiv:2605.05045
When Relations Break: Analyzing Relation Hallucination in Vision-Language Model Under Rotation and...
This paper investigates the vulnerability of Vision-Language Models (VLMs) to relation hallucination under visual perturbations like rotation and noise. Ex...

https://arxiv.org/abs/2303.07226
[2303.07226] Scaling Vision-Language Models with Sparse Mixture of Experts
Abstract page for arXiv paper 2303.07226: Scaling Vision-Language Models with Sparse Mixture of Experts

https://deepseekocr.app/
DeepSeek OCR - Free Online OCR Tool | Vision-Language Model Text Extraction
DeepSeek OCR - The world's first online OCR tool powered by DeepSeek's 3B vision-language model. 97% accuracy with ultra-low token consumption (100...

https://www.findbestmodel.app/
ModelMatch - Compare Vision-Language Models
Compare top open source vision-language models side-by-side, no coding needed

https://aimultiple.com/vision-language-models
Vision Language Models Compared to Image Recognition
Apr 24, 2026 - Can advanced Vision Language Models (VLMs) replace traditional image recognition models? To find out, we benchmarked 16 leading models across three paradigms.

https://www.liquid.ai/use-cases/accelerating-vision-language-model-deployment-for-automotive-ai
Accelerating Vision-Language Model Deployment for Automotive AI | Liquid AI
Dec 17, 2025 - Deploy 10x faster in-car AI with Liquid’s optimized vision-language models, no hardware upgrades needed.

https://www.ektavats.se/
Uppsala Vision, Language and Learning – Part of The Beijer Laboratory for Artificial Intelligence...

https://creatus.ai/chat-with-image
Chat with Image for Free | AI Vision Language Model (VLM) | CREATUS.AI
Experience the power of our vision language model and talk to images for free. Enhance your real-world vision and language understanding with our innovative...

https://www.ximilar.com/services/vision-language-models/
Vision Language Model Platform - Ximilar: Visual AI for Business
Apr 27, 2026 - Train multimodal vision language models (VLMs) on an AI platform combining computer vision and natural language processing (LLMs).

https://www.ndss-symposium.org/ndss-paper/vigtext-deepfake-image-detection-with-vision-language-model-explanations-and-graph-neural-networks/
ViGText: Deepfake Image Detection with Vision-Language Model Explanations and Graph Neural Networks...

https://openreview.net/forum?id=Sextl6R3Nf
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics |...
Spatial understanding is essential for robots to perceive, reason about, and interact with their environments. However, current visual language models often...

https://www.ubicloud.com/blog/end-to-end-ocr-with-vision-language-models
End-to-End OCR with Vision Language Models
Scanned documents and images are everywhere. Extracting structured data from them has always been an essential yet complex, multi-step task.

https://www.liquid.ai/blog/lfm2-vl-efficient-vision-language-models
LFM2-VL: Efficient Vision-Language Models | Liquid AI
Oct 21, 2025 - Today, we release LFM2-VL, our first series of vision-language foundation models. These multimodal models are designed for low-latency and device-aware...

https://www.nvidia.com/en-us/glossary/vision-language-models/
What are Vision-Language Models? | NVIDIA Glossary
Vision Language Models (VLMs) are multimodal generative AI models capable of reasoning over text, image and video prompts.

https://ltx-2.run/blog/step3-vl-10b-vision-language-model-en/
Step3-VL-10B: How a 10B Vision-Language Model Rivals Models 10-20x Larger | LTX-2 Blog
Discover Step3-VL-10B, the efficient 10B vision-language model that outperforms models 10-20x larger with PE-lang encoder and exceptional STEM reasoning...

https://arxiv.org/abs/2409.17146
[2409.17146] Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
Abstract page for arXiv paper 2409.17146: Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models

https://github.com/Turbo1123/roubao
GitHub - Turbo1123/roubao: Android Automation Tool Based on Vision-Language Models · GitHub
Android Automation Tool Based on Vision-Language Models - Turbo1123/roubao

https://xl-vla.github.io/
XL-VLA: Cross-Hand Latent Representation for Vision-Language-Action Models

https://www.jumpstartmag.com/carryais-serverless-vision-language-models-signal-a-new-era-of-on-device-ai/
CarryAI’s Serverless Vision-Language Models Signal a New Era of On-Device AI - Jumpstart Magazine
Apr 10, 2026 - At HKTDC InnoEx 2026, CarryAI Ltd is emerging as a distinctive voice in the evolving AI landscape, showcasing a fundamentally different approach to how...

https://openreview.net/forum?id=ifo8oWSLSq
FLOWER: Democratizing Generalist Robot Policies with Efficient Vision-Language-Action Flow Policies...
This work introduces FLOWER, an efficient, open-source Vision-Language-Action Flow policy. Vision-Language-Action (VLA) models have demonstrated remarkable...

https://arxiv.org/abs/2512.04032
[2512.04032] jina-vlm: Small Multilingual Vision Language Model
Abstract page for arXiv paper 2512.04032: jina-vlm: Small Multilingual Vision Language Model

https://fondazione-fair.it/en/transversal-projects/tp2-vision-language-and-multimodal-challenges/
TP2: Vision, Language and Multimodal Challenges - Fondazione FAIR

https://jina.ai/news/jina-vlm-small-multilingual-vision-language-model/
Jina-VLM: Small Multilingual Vision Language Model
Dec 5, 2025 - New 2B vision language model achieves SOTA on multilingual VQA, no catastrophic forgetting on text-only tasks.

https://www.liquid.ai/use-cases/optimizing-vision-language-models-for-product-cataloging
Optimizing Vision-Language Models for Product Cataloging | Liquid AI
Dec 17, 2025 - Liquid’s vision-language models cut cataloging time by 65% while delivering higher accuracy and lower costs.

https://nord-vla-ai.github.io/
NoRD: A Data-Efficient Vision-Language-Action Model that Drives without Reasoning

https://blogs.nvidia.com/blog/nemotron-3-nano-omni-multimodal-ai-agents/
NVIDIA Launches Nemotron 3 Nano Omni Model, Unifying Vision, Audio and Language for up to 9x More...
Apr 28, 2026 - Best-in-class open omni-modal reasoning model delivers the highest efficiency and accuracy to power agentic workflows such as computer use, document...

https://deepmind.google/blog/rt-2-new-model-translates-vision-and-language-into-action/
RT-2: New model translates vision and language into action - Google DeepMind

https://www.unitary.ai/articles/unlocking-safe-digital-spaces-with-computer-vision-audio-and-language-processing
Safe Online Spaces: Computer Vision + Audio & Language Processing
Jun 2, 2023 - Learn what computer vision and natural language processing are, and how they can be used together to help platforms better moderate content online. Discover...

https://www.firstnations.org/stories/we-begin-with-relationship-a-vision-for-native-arts-and-language/
We Begin With Relationship: A Vision for Native Arts and Language | First Nations Development...
Yá’át’ééh. Jennifer Himmelreich yinishyé. In January 2026, I joined First Nations Development Institute as a senior program officer with the Native Arts,...

https://dvr2u.com/
Dynamic Vision Resources (DVR) - Japanese Language Courses

https://www.graphicmedicine.org/comic-reviews/speaking-in-pictures-a-vision-of-language/
Speaking in Pictures - A Vision of Language | Graphic Medicine

https://arxiv.org/abs/2501.07171
[2501.07171] BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language...
Abstract page for arXiv paper 2501.07171: BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific...

https://www.lv-lab.org/
Language and Vision Laboratory (LV-Lab)
Language and Vision Laboratory (LV-Lab) across NUS and SMU.

https://openreview.net/forum?id=kZLANTp6Vw
GLOV: Guided Large Language Models as Implicit Optimizers for Vision Language Models | OpenReview
In this work, we propose GLOV, which enables Large Language Models (LLMs) to act as implicit optimizers for Vision-Language Models (VLMs) to enhance downstream...

https://shihaozhaozsh.github.io/LaVi-Bridge/
Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation

https://nehiyawak.org/
nēhiyawak Language Experience Inc. « Our vision is for Cree people to reclaim, restore and relearn...