Robuta

Explore | alphaXiv
https://www.alphaxiv.org/?categories=%5B%22computer-science%22%5D&custom-categories=%5B%22agents%22%5D
Discuss, discover, and read arXiv papers. Explore trending papers, see recent activity and discussions, and follow authors of arXiv papers on alphaXiv.

MultiWorld: Scalable Multi-Agent Multi-View Video World Models | alphaXiv
https://www.alphaxiv.org/resources/2604.18564
Abstract: Video world models have achieved remarkable success in simulating environmental dynamics in response to actions by users or...

LLM Reasoning Is Latent, Not the Chain of Thought | alphaXiv
https://www.alphaxiv.org/abs/2604.15726
Abstract: This position paper argues that large language model (LLM) reasoning should be studied as latent-state trajectory formation...

Explore | alphaXiv
https://www.alphaxiv.org/?categories=%5B%22computer-science%22%5D&custom-categories=%5B%22agents%22%5D&subcategories=%5B%22artificial-intelligence%22%2C%22computer-vision-and-pattern-recognition%22%5D

Geometric Context Transformer for Streaming 3D Reconstruction | alphaXiv
https://www.alphaxiv.org/resources/2604.14141
Abstract: Streaming 3D reconstruction aims to recover 3D information, such as camera poses and point clouds, from a video stream, which...

CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents | alphaXiv
https://www.alphaxiv.org/overview/2603.24440
Researchers from ServiceNow, University of Waterloo, and other institutions developed CUA-SUITE, a large-scale dataset featuring 55 hours of human-annotated...

Explore | alphaXiv
https://www.alphaxiv.org/?categories=%5B%22computer-science%22%5D&custom-categories=%5B%22agents%22%5D&subcategories=%5B%22artificial-intelligence%22%5D

Generalization at the Edge of Stability | alphaXiv
https://www.alphaxiv.org/abs/2604.19740
Abstract: Training modern neural networks often relies on large learning rates, operating at the edge of stability, where the...

Blog | alphaXiv
https://www.alphaxiv.org/blog
Research writeups, experiments, and product notes from alphaXiv.

Explore | alphaXiv
https://www.alphaxiv.org/

(1D) Ordered Tokens Enable Efficient Test-Time Search | alphaXiv
https://www.alphaxiv.org/abs/2604.15453
Abstract: Tokenization is a key component of autoregressive (AR) generative models, converting raw data into more manageable units for...

(1D) Ordered Tokens Enable Efficient Test-Time Search | alphaXiv
https://www.alphaxiv.org/overview/2604.15453
This research reveals that 1D ordered tokenization, which processes images from coarse-to-fine semantic detail, intrinsically improves the efficiency of...

Hyperloop Transformers | alphaXiv
https://www.alphaxiv.org/overview/2604.21254
The Massachusetts Institute of Technology developed Hyperloop Transformers, an architecture integrating looped Transformers with strategic hyper-connection...

AgentV-RL: Scaling Reward Modeling with Agentic Verifier | alphaXiv
https://www.alphaxiv.org/abs/2604.16004
Abstract: Verifiers have been demonstrated to enhance LLM reasoning via test-time scaling (TTS). Yet, they face significant challenges...

Geometric Context Transformer for Streaming 3D Reconstruction | alphaXiv
https://www.alphaxiv.org/abs/2604.14141

Long-Horizon Manipulation via Trace-Conditioned VLA Planning | alphaXiv
https://www.alphaxiv.org/resources/2604.21924
Abstract: Long-horizon manipulation remains challenging for vision-language-action (VLA) policies: real tasks are multi-step,...

Explore | alphaXiv
https://www.alphaxiv.org/?customCategories=agentic-frameworks

Explore | alphaXiv
https://www.alphaxiv.org/?customCategories=data-curation

Repurposing Geometric Foundation Models for Multi-view Diffusion | alphaXiv
https://www.alphaxiv.org/abs/2603.22275
Abstract: While recent advances in generative latent spaces have driven substantial progress in single-image generation, the optimal...

Thinking Without Words: Efficient Latent Reasoning with Abstract Chain-of-Thought | alphaXiv
https://www.alphaxiv.org/abs/2604.22709
Abstract: While long, explicit chains-of-thought (CoT) have proven effective on complex reasoning tasks, they are costly to generate...

Learning Whole-Body Humanoid Locomotion via Motion Generation and Motion Tracking | alphaXiv
https://www.alphaxiv.org/resources/2604.17335
Abstract: Whole-body humanoid locomotion is challenging due to high-dimensional control, morphological instability, and the need for...

Explore | alphaXiv
https://www.alphaxiv.org/?customCategories=agents

Explore | alphaXiv
https://www.alphaxiv.org/?sort=Hot

Explore | alphaXiv
https://www.alphaxiv.org/?categories=%5B%22computer-science%22%5D

alphaXiv Labs
https://www.alphaxiv.org/labs
Visualizations, virtual exhibits, and experimental tools for exploring research.

Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems | alphaXiv
https://www.alphaxiv.org/overview/2604.14228
A detailed architectural analysis of Anthropic's Claude Code, a production-grade AI agent, is provided through source-level examination, revealing a design...

Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes | alphaXiv
https://www.alphaxiv.org/abs/2603.25562
Abstract: On-policy distillation (OPD) is appealing for large language model (LLM) post-training because it evaluates teacher feedback...

Model Discovery and Graph Simulation: A Lightweight Gateway to Chaos Engineering | alphaXiv
https://www.alphaxiv.org/abs/2506.11176
Abstract: Chaos engineering reveals resilience risks but is expensive and operationally risky to run broadly and often. Model-based...

LLM Reasoning Is Latent, Not the Chain of Thought | alphaXiv
https://www.alphaxiv.org/overview/2604.15726
Research demonstrates that the core mechanism of Large Language Model reasoning is primarily driven by internal latent-state trajectories, rather than solely...

Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter | alphaXiv
https://www.alphaxiv.org/abs/2604.15039
Abstract: Prefill-decode (PD) disaggregation has become the standard architecture for large-scale LLM serving, but in practice its...

MultiWorld: Scalable Multi-Agent Multi-View Video World Models | alphaXiv
https://www.alphaxiv.org/abs/2604.18564

Autogenesis: A Self-Evolving Agent Protocol | alphaXiv
https://www.alphaxiv.org/abs/2604.15034
Abstract: Recent advances in LLM based agent systems have shown promise in tackling complex, long horizon tasks. However, existing...

Repurposing 3D Generative Model for Autoregressive Layout Generation | alphaXiv
https://www.alphaxiv.org/resources/2604.16299
Abstract: We introduce LaviGen, a framework that repurposes 3D generative models for 3D layout generation. Unlike previous methods that...

Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale | alphaXiv
https://www.alphaxiv.org/resources/2603.25040
Abstract: We introduce Intern-S1-Pro, the first one-trillion-parameter scientific multimodal foundation model. Scaling to this...

Aligning What Vision-Language Models See and Perceive with Adaptive Information Flow | alphaXiv
https://www.alphaxiv.org/abs/2604.15809
Abstract: Vision-Language Models (VLMs) have demonstrated strong capability in a wide range of tasks such as visual recognition,...

Towards Ultra-High-Rate Quantum Error Correction with Reconfigurable Atom Arrays | alphaXiv
https://www.alphaxiv.org/abs/2604.16209
Abstract: Quantum error correction is widely believed to be essential for large-scale quantum computation, but the required qubit...

Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? | alphaXiv
https://www.alphaxiv.org/resources/2603.24472
Abstract: Self-distillation has emerged as an effective post-training paradigm for LLMs, often improving performance while shortening...

Explore | alphaXiv
https://www.alphaxiv.org/?customCategories=adversarial-attacks

Geometric Context Transformer for Streaming 3D Reconstruction | alphaXiv
https://www.alphaxiv.org/overview/2604.14141
LingBot-Map, a feed-forward 3D foundation model, performs streaming 3D reconstruction by introducing a Geometric Context Transformer (GCT) that employs a...

Explore | alphaXiv
https://www.alphaxiv.org/?customCategories=chain-of-thought

Explore | alphaXiv
https://www.alphaxiv.org/?customCategories=adversarial-robustness

Context Unrolling in Omni Models | alphaXiv
https://www.alphaxiv.org/abs/2604.21921
Abstract: We present Omni, a unified multimodal model natively trained on diverse modalities, including text, images, videos, 3D...

$\pi_{0.7}$: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities | alphaXiv
https://www.alphaxiv.org/abs/2604.15483
Abstract: We present a new robotic foundation model, called $\pi_{0.7}$, that can enable strong out-of-the-box performance in a wide...

Sign In | alphaXiv
https://www.alphaxiv.org/signin

Hyperagents | alphaXiv
https://www.alphaxiv.org/resources/2603.19461
Abstract: Self-improving AI systems aim to reduce reliance on human engineering by learning to improve their own learning and...

When Can LLMs Learn to Reason with Weak Supervision? | alphaXiv
https://www.alphaxiv.org/overview/2604.18574
An investigation systematically characterizes the conditions under which large language models (LLMs) generalize with Reinforcement Learning with Verifiable...

Qwen3.5-Omni Technical Report | alphaXiv
https://www.alphaxiv.org/overview/2604.15804
Alibaba Cloud's Qwen Team developed Qwen3.5-Omni, a large language model scaling to hundreds of billions of parameters that processes and generates across...

Explore | alphaXiv
https://www.alphaxiv.org/?customCategories=generative-models