https://arxiv.org/abs/2509.22186
[2509.22186] MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
https://www.together.ai/blog/dragonfly-v1
Dragonfly: A large vision-language model with multi-resolution zoom
https://ui.adsabs.harvard.edu/abs/2025arXiv251204032K/abstract
Jina-VLM: Small Multilingual Vision Language Model - ADS
We present Jina-VLM, a 2.4B parameter vision-language model that achieves state-of-the-art multilingual visual question answering among open 2B-scale VLMs. The...
https://arxiv.org/html/2503.11576v1
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
https://blog.ovhcloud.com/reference-architecture-deploying-a-vision-language-model-with-vllm-on-ovhcloud-mks-for-high-performance-inference-and-full-observability/
Reference Architecture: Deploying a vision-language model with vLLM on OVHcloud MKS for high-performance inference and full observability
Apr 10, 2026 - Deploy a vision-language model for inference across multiple replicas using vLLM on OVHcloud Managed Kubernetes Service
https://arxiv.org/abs/2512.04032
[2512.04032] Jina-VLM: Small Multilingual Vision Language Model
https://jina.ai/news/jina-vlm-small-multilingual-vision-language-model/
Jina-VLM: Small Multilingual Vision Language Model
Dec 5, 2025 - New 2B vision language model achieves SOTA on multilingual VQA, no catastrophic forgetting on text-only tasks.
https://huggingface.co/papers/2509.22186
Paper page - MinerU2.5: A Decoupled Vision-Language Model for Efficient High-Resolution Document Parsing
https://arxiv.org/abs/2503.11576
[2503.11576] SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion