Robuta

Sponsor of the Day: Jerkmate
- MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
  https://arxiv.org/abs/2509.22186 (also on Hugging Face: https://huggingface.co/papers/2509.22186)

- Dragonfly: A large vision-language model with multi-resolution zoom
  https://www.together.ai/blog/dragonfly-v1

- Jina-VLM: Small Multilingual Vision Language Model
  https://arxiv.org/abs/2512.04032
  A 2.4B-parameter vision-language model that achieves state-of-the-art multilingual visual question answering among open 2B-scale VLMs, with no catastrophic forgetting on text-only tasks.
  Announcement (Dec 5, 2025): https://jina.ai/news/jina-vlm-small-multilingual-vision-language-model/
  ADS record: https://ui.adsabs.harvard.edu/abs/2025arXiv251204032K/abstract

- SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
  https://arxiv.org/abs/2503.11576 (HTML version: https://arxiv.org/html/2503.11576v1)

- Reference Architecture: Deploying a vision-language model with vLLM on OVHcloud MKS for high-performance inference and full observability (Apr 10, 2026)
  https://blog.ovhcloud.com/reference-architecture-deploying-a-vision-language-model-with-vllm-on-ovhcloud-mks-for-high-performance-inference-and-full-observability/
  Deploying a vision-language model for inference across multiple replicas using vLLM on OVHcloud Managed Kubernetes Service.