Robuta

https://bagel-ai.org/
BAGEL: The Open-Source Unified Multimodal Model

https://arxiv.org/abs/2602.12279
[2602.12279] UniT: Unified Multimodal Chain-of-Thought Test-time Scaling
Abstract page for arXiv paper 2602.12279: UniT: Unified Multimodal Chain-of-Thought Test-time Scaling

https://www.alphaxiv.org/overview/2605.02641
Mamoda2.5: Enhancing Unified Multimodal Model with DiT-MoE | alphaXiv
ByteDance's Mamoda2.5 unifies multimodal understanding, image generation, and video generation/editing within a single Autoregressive–Diffusion framework,

https://kangliao929.github.io/projects/puffin/
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation |...
We make the first attempt to unify camera-centric understanding and generation in a cohesive multimodal framework.

https://arxiv.org/abs/2605.00658
[2605.00658] UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion...
Abstract page for arXiv paper 2605.00658: UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion Priors

https://github.com/jd-opensource/JoyAI-Image
GitHub - jd-opensource/JoyAI-Image: JoyAI-Image is the unified multimodal foundation model for...
JoyAI-Image is the unified multimodal foundation model for image understanding, text-to-image generation, and instruction-guided image editing. -...

https://aigo.tools/best/bagel
BAGEL : Unified Multimodal AI for Understanding, Generation, Editing | AIGO.tools AI Directory

https://arxiv.org/abs/2605.04128
[2605.04128] JoyAI-Image: Awaking Spatial Intelligence in Unified Multimodal Understanding and...
Abstract page for arXiv paper 2605.04128: JoyAI-Image: Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation

https://codetrendy.com/listing/uni-1
Uni-1 Unified Multimodal Image | CodeTrendy
Uni-1 is Luma AI's unified image model that reasons and generates in one autoregressive system.

https://arxiv.org/abs/2505.14683
[2505.14683] Emerging Properties in Unified Multimodal Pretraining
Abstract page for arXiv paper 2505.14683: Emerging Properties in Unified Multimodal Pretraining

https://uni-ie.github.io/
Uni-IE | Unified Multimodal Information Extraction
Uni-IE: Unified Multimodal Information Extraction, a research series on unified structured semantic parsing across entities, relations, events, and multimodal...

https://huggingface.co/papers/2605.00658
Paper page - UniVidX: A Unified Multimodal Framework for Versatile Video Generation via Diffusion...
Join the discussion on this paper page

https://www.dreamega.ai/models/kling-v3
Kling 3.0 AI Video Generator | Unified Multimodal Model | Dreamega
Experience Kling 3.0, Kuaishou's unified multimodal video model series with Image 3.0 (2K/4K), Video 3.0 (15s native, 4K output), and Video 3.0 Omni element...

https://www.alphaxiv.org/abs/2605.04128
JoyAI-Image: Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation |...
View recent discussion. Abstract: We present JoyAI-Image, a unified multimodal foundation model for visual understanding, text-to-image generation, and...

https://seeddance.ai/kling-o1
Kling O1 — World's First Unified Multimodal AI Video Model | SeedDance
Generate Kling O1 videos on SeedDance. Kuaishou's groundbreaking unified multimodal video model for generating, editing, and transforming videos in a single...

https://wan-ai.app/bagel-ai
Bagel AI – Unified Multimodal AI for Image Generation, Editing & Understanding
Experience Bagel AI by ByteDance Seed - the revolutionary 7B parameter unified multimodal AI that generates, edits, and understands images in one powerful...

https://huggingface.co/papers/2605.04128
Paper page - Awaking Spatial Intelligence in Unified Multimodal Understanding and Generation
Join the discussion on this paper page

https://www.alphaxiv.org/abs/2605.02641
Mamoda2.5: Enhancing Unified Multimodal Model with DiT-MoE | alphaXiv
View recent discussion. Abstract: We present Mamoda2.5, a unified AR-Diffusion framework that seamlessly integrates multimodal understanding and generation...

https://seeddance.ai/kling-o3
Kling O3 — Unified Multimodal AI Video Generator with 4K & Multi-Shot Control | SeedDance
Generate Kling O3 videos on SeedDance. Kuaishou's flagship unified multimodal AI video model with native audio, up to 6 camera cuts, 4K output, visual...

https://vidofy.ai/en/models/kling-ai/kling-o3
Kling O3: Unified Multimodal AI Video Generator with Native Audio & 4K Output
Create cinematic AI videos with Kling O3 on Vidofy. Native audio sync, multi-shot control up to 6 cuts, 15-second clips, and 4K output. Launched Feb 2026.

https://zenmux.ai/blog/gemini-3-pro-preview-now-available-on-zenmux-multimodal-ai-via-a-unified-api
Gemini 3 Pro Preview Now Available on ZenMux: Multimodal AI via a Unified API - ZenMux
Gemini 3 Pro Preview, Google’s most advanced multimodal reasoning model, is now available on ZenMux. Through ZenMux’s unified API, developers can access...