https://bagel-ai.org/
BAGEL: The Open-Source Unified Multimodal Model
open source, unified multimodal, bagel, model
https://arxiv.org/abs/2602.12279
[2602.12279] UniT: Unified Multimodal Chain-of-Thought Test-time Scaling
Abstract page for arXiv paper 2602.12279: UniT: Unified Multimodal Chain-of-Thought Test-time Scaling
unified multimodal, test time, unit, chain, thought
https://www.alphaxiv.org/overview/2605.02641
Mamoda2.5: Enhancing Unified Multimodal Model with DiT-MoE | alphaXiv
ByteDance's Mamoda2.5 unifies multimodal understanding, image generation, and video generation/editing within a single Autoregressive–Diffusion framework,
unified multimodal, enhancing, model, dit, moe
https://kangliao929.github.io/projects/puffin/
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation |...
We make the first attempt to unify camera-centric understanding and generation in a cohesive multimodal framework.
unified multimodal, thinking, camera, model, centric
https://arxiv.org/abs/2605.00658
[2605.00658] UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion...
Abstract page for arXiv paper 2605.00658: UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors
unified multimodal, video generation, framework, versatile, via
https://github.com/jd-opensource/JoyAI-Image
GitHub - jd-opensource/JoyAI-Image: JoyAI-Image is the unified multimodal foundation model for...
JoyAI-Image is the unified multimodal foundation model for image understanding, text-to-image generation, and instruction-guided image editing. -...
unified multimodal, foundation model, github, jd, opensource
https://aigo.tools/best/bagel
BAGEL: Unified Multimodal AI for Understanding, Generation, Editing | AIGO.tools AI Directory
unified multimodal, generation editing, aigo tools, bagel, understanding
https://arxiv.org/abs/2605.04128
[2605.04128] JoyAI-Image: Awaking Spatial Intelligence in Unified Multimodal Understanding and...
Abstract page for arXiv paper 2605.04128: JoyAI-Image: Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation
spatial intelligence, unified multimodal, image, understanding
https://codetrendy.com/listing/uni-1
Uni-1 Unified Multimodal Image | CodeTrendy
Uni-1 is Luma AI's unified image model that reasons and generates in one autoregressive system.
unified multimodal, image, codetrendy
https://arxiv.org/abs/2505.14683
[2505.14683] Emerging Properties in Unified Multimodal Pretraining
Abstract page for arXiv paper 2505.14683: Emerging Properties in Unified Multimodal Pretraining
unified multimodal, emerging, properties, pretraining
https://uni-ie.github.io/
Uni-IE | Unified Multimodal Information Extraction
Uni-IE: Unified Multimodal Information Extraction, a research series on unified structured semantic parsing across entities, relations, events, and multimodal...
unified multimodal, information, extraction
https://huggingface.co/papers/2605.00658
Paper page - UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion...
unified multimodal, video generation, paper, framework, versatile
https://www.dreamega.ai/models/kling-v3
Kling 3.0 AI Video Generator | Unified Multimodal Model | Dreamega
Experience Kling 3.0, Kuaishou's unified multimodal video model series with Image 3.0 (2K/4K), Video 3.0 (15s native, 4K output), and Video 3.0 Omni element...
ai video generator, unified multimodal, kling, model, dreamega
https://www.alphaxiv.org/abs/2605.04128
JoyAI-Image: Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation |...
View recent discussion. Abstract: We present JoyAI-Image, a unified multimodal foundation model for visual understanding, text-to-image generation, and...
spatial intelligence, unified multimodal, image, understanding, generation
https://seeddance.ai/kling-o1
Kling O1 — World's First Unified Multimodal AI Video Model | SeedDance
Generate Kling O1 videos on SeedDance. Kuaishou's groundbreaking unified multimodal video model for generating, editing, and transforming videos in a single...
multimodal ai video, first unified, kling, world, model
https://wan-ai.app/bagel-ai
Bagel AI – Unified Multimodal AI for Image Generation, Editing & Understanding
Experience Bagel AI by ByteDance Seed - the revolutionary 7B parameter unified multimodal AI that generates, edits, and understands images in one powerful...
image generation editing, unified multimodal, bagel, ai, understanding
https://huggingface.co/papers/2605.04128
Paper page - Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation
spatial intelligence, unified multimodal, paper, understanding, generation
https://www.alphaxiv.org/abs/2605.02641
Mamoda2.5: Enhancing Unified Multimodal Model with DiT-MoE | alphaXiv
View recent discussion. Abstract: We present Mamoda2.5, a unified AR-Diffusion framework that seamlessly integrates multimodal understanding and generation...
unified multimodal, enhancing, model, dit, moe
https://seeddance.ai/kling-o3
Kling O3 — Unified Multimodal AI Video Generator with 4K & Multi-Shot Control | SeedDance
Generate Kling O3 videos on SeedDance. Kuaishou's flagship unified multimodal AI video model with native audio, up to 6 camera cuts, 4K output, visual...
multimodal ai video, kling, unified, generator, shot
https://vidofy.ai/en/models/kling-ai/kling-o3
Kling O3: Unified Multimodal AI Video Generator with Native Audio & 4K Output
Create cinematic AI videos with Kling O3 on Vidofy. Native audio sync, multi-shot control up to 6 cuts, 15-second clips, and 4K output. Launched Feb 2026.
multimodal ai video, native audio, kling, unified, generator
https://zenmux.ai/blog/gemini-3-pro-preview-now-available-on-zenmux-multimodal-ai-via-a-unified-api
Gemini 3 Pro Preview Now Available on ZenMux: Multimodal AI via a Unified API - ZenMux
Gemini 3 Pro Preview, Google’s most advanced multimodal reasoning model, is now available on ZenMux. Through ZenMux’s unified API, developers can access...
pro preview, multimodal ai, gemini, available, zenmux