https://www.alphaxiv.org/?categories=%5B%22computer-science%22%5D&custom-categories=%5B%22agents%22%5D
Explore | alphaXiv
Discuss, discover, and read arXiv papers. Explore trending papers, see recent activity and discussions, and follow authors of arXiv papers on alphaXiv.
https://www.alphaxiv.org/resources/2604.18564
MultiWorld: Scalable Multi-Agent Multi-View Video World Models | alphaXiv
View recent discussion. Abstract: Video world models have achieved remarkable success in simulating environmental dynamics in response to actions by users or...
https://www.alphaxiv.org/abs/2604.15726
LLM Reasoning Is Latent, Not the Chain of Thought | alphaXiv
View recent discussion. Abstract: This position paper argues that large language model (LLM) reasoning should be studied as latent-state trajectory formation...
https://www.alphaxiv.org/?categories=%5B%22computer-science%22%5D&custom-categories=%5B%22agents%22%5D&subcategories=%5B%22artificial-intelligence%22%2C%22computer-vision-and-pattern-recognition%22%5D
Explore | alphaXiv
https://www.alphaxiv.org/resources/2604.14141
Geometric Context Transformer for Streaming 3D Reconstruction | alphaXiv
View recent discussion. Abstract: Streaming 3D reconstruction aims to recover 3D information, such as camera poses and point clouds, from a video stream, which...
https://www.alphaxiv.org/overview/2603.24440
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents | alphaXiv
Researchers from ServiceNow, University of Waterloo, and other institutions developed CUA-Suite, a large-scale dataset featuring 55 hours of human-annotate
https://www.alphaxiv.org/?categories=%5B%22computer-science%22%5D&custom-categories=%5B%22agents%22%5D&subcategories=%5B%22artificial-intelligence%22%5D
Explore | alphaXiv
https://www.alphaxiv.org/abs/2604.19740
Generalization at the Edge of Stability | alphaXiv
View recent discussion. Abstract: Training modern neural networks often relies on large learning rates, operating at the edge of stability, where the...
https://www.alphaxiv.org/blog
Blog | alphaXiv
Research writeups, experiments, and product notes from alphaXiv.
https://www.alphaxiv.org/
Explore | alphaXiv
https://www.alphaxiv.org/abs/2604.15453
(1D) Ordered Tokens Enable Efficient Test-Time Search | alphaXiv
View recent discussion. Abstract: Tokenization is a key component of autoregressive (AR) generative models, converting raw data into more manageable units for...
https://www.alphaxiv.org/overview/2604.15453
(1D) Ordered Tokens Enable Efficient Test-Time Search | alphaXiv
This research reveals that 1D ordered tokenization, which processes images from coarse-to-fine semantic detail, intrinsically improves the efficiency of te
https://www.alphaxiv.org/overview/2604.21254
Hyperloop Transformers | alphaXiv
The Massachusetts Institute of Technology developed Hyperloop Transformers, an architecture integrating looped Transformers with strategic hyper-connection
https://www.alphaxiv.org/abs/2604.16004
AgentV-RL: Scaling Reward Modeling with Agentic Verifier | alphaXiv
View recent discussion. Abstract: Verifiers have been demonstrated to enhance LLM reasoning via test-time scaling (TTS). Yet, they face significant challenges...
https://www.alphaxiv.org/resources/2604.21924
Long-Horizon Manipulation via Trace-Conditioned VLA Planning | alphaXiv
View recent discussion. Abstract: Long-horizon manipulation remains challenging for vision-language-action (VLA) policies: real tasks are multi-step,...
https://www.alphaxiv.org/?customCategories=agentic-frameworks
Explore | alphaXiv
https://www.alphaxiv.org/?customCategories=data-curation
Explore | alphaXiv
https://www.alphaxiv.org/abs/2603.22275
Repurposing Geometric Foundation Models for Multi-view Diffusion | alphaXiv
View recent discussion. Abstract: While recent advances in generative latent spaces have driven substantial progress in single-image generation, the optimal...
https://www.alphaxiv.org/abs/2604.22709
Thinking Without Words: Efficient Latent Reasoning with Abstract Chain-of-Thought | alphaXiv
View recent discussion. Abstract: While long, explicit chains-of-thought (CoT) have proven effective on complex reasoning tasks, they are costly to generate...
https://www.alphaxiv.org/resources/2604.17335
Learning Whole-Body Humanoid Locomotion via Motion Generation and Motion Tracking | alphaXiv
View recent discussion. Abstract: Whole-body humanoid locomotion is challenging due to high-dimensional control, morphological instability, and the need for...
https://www.alphaxiv.org/?customCategories=agents
Explore | alphaXiv
https://www.alphaxiv.org/?sort=Hot
Explore | alphaXiv
https://www.alphaxiv.org/?categories=%5B%22computer-science%22%5D
Explore | alphaXiv
https://www.alphaxiv.org/labs
alphaXiv Labs
Visualizations, virtual exhibits, and experimental tools for exploring research
https://www.alphaxiv.org/overview/2604.14228
Dive into Claude Code: The Design Space of Today's and Future AI Agent Systems | alphaXiv
A detailed architectural analysis of Anthropic's Claude Code, a production-grade AI agent, is provided through source-level examination, revealing a design
https://www.alphaxiv.org/abs/2603.25562
Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes | alphaXiv
View recent discussion. Abstract: On-policy distillation (OPD) is appealing for large language model (LLM) post-training because it evaluates teacher feedback...
https://www.alphaxiv.org/abs/2506.11176
Model Discovery and Graph Simulation: A Lightweight Gateway to Chaos Engineering | alphaXiv
View recent discussion. Abstract: Chaos engineering reveals resilience risks but is expensive and operationally risky to run broadly and often. Model-based...
https://www.alphaxiv.org/overview/2604.15726
LLM Reasoning Is Latent, Not the Chain of Thought | alphaXiv
Research demonstrates that the core mechanism of Large Language Model reasoning is primarily driven by internal latent-state trajectories, rather than sole
https://www.alphaxiv.org/abs/2604.15039
Prefill-as-a-Service: KVCache of Next-Generation Models Could Go Cross-Datacenter | alphaXiv
View recent discussion. Abstract: Prefill-decode (PD) disaggregation has become the standard architecture for large-scale LLM serving, but in practice its...
https://www.alphaxiv.org/abs/2604.15034
Autogenesis: A Self-Evolving Agent Protocol | alphaXiv
View recent discussion. Abstract: Recent advances in LLM based agent systems have shown promise in tackling complex, long horizon tasks. However, existing...
https://www.alphaxiv.org/resources/2604.16299
Repurposing 3D Generative Model for Autoregressive Layout Generation | alphaXiv
View recent discussion. Abstract: We introduce LaviGen, a framework that repurposes 3D generative models for 3D layout generation. Unlike previous methods that...
https://www.alphaxiv.org/resources/2603.25040
Intern-S1-Pro: Scientific Multimodal Foundation Model at Trillion Scale | alphaXiv
View recent discussion. Abstract: We introduce Intern-S1-Pro, the first one-trillion-parameter scientific multimodal foundation model. Scaling to this...
https://www.alphaxiv.org/abs/2604.15809
Aligning What Vision-Language Models See and Perceive with Adaptive Information Flow | alphaXiv
View recent discussion. Abstract: Vision-Language Models (VLMs) have demonstrated strong capability in a wide range of tasks such as visual recognition,...
https://www.alphaxiv.org/abs/2604.16209
Towards Ultra-High-Rate Quantum Error Correction with Reconfigurable Atom Arrays | alphaXiv
View recent discussion. Abstract: Quantum error correction is widely believed to be essential for large-scale quantum computation, but the required qubit...
https://www.alphaxiv.org/resources/2603.24472
Why Does Self-Distillation (Sometimes) Degrade the Reasoning Capability of LLMs? | alphaXiv
View recent discussion. Abstract: Self-distillation has emerged as an effective post-training paradigm for LLMs, often improving performance while shortening...
https://www.alphaxiv.org/?customCategories=adversarial-attacks
Explore | alphaXiv
https://www.alphaxiv.org/overview/2604.14141
Geometric Context Transformer for Streaming 3D Reconstruction | alphaXiv
LingBot-Map, a feed-forward 3D foundation model, performs streaming 3D reconstruction by introducing a Geometric Context Transformer (GCT) that employs a n
https://www.alphaxiv.org/?customCategories=chain-of-thought
Explore | alphaXiv
https://www.alphaxiv.org/?customCategories=adversarial-robustness
Explore | alphaXiv
https://www.alphaxiv.org/abs/2604.21921
Context Unrolling in Omni Models | alphaXiv
View recent discussion. Abstract: We present Omni, a unified multimodal model natively trained on diverse modalities, including text, images, videos, 3D...
https://www.alphaxiv.org/abs/2604.15483
$π_{0.7}$: a Steerable Generalist Robotic Foundation Model with Emergent Capabilities | alphaXiv
View recent discussion. Abstract: We present a new robotic foundation model, called ${\pi}_{0.7}$, that can enable strong out-of-the-box performance in a wide...
https://www.alphaxiv.org/resources/2603.19461
Hyperagents | alphaXiv
View recent discussion. Abstract: Self-improving AI systems aim to reduce reliance on human engineering by learning to improve their own learning and...
https://www.alphaxiv.org/overview/2604.18574
When Can LLMs Learn to Reason with Weak Supervision? | alphaXiv
An investigation systematically characterizes the conditions under which large language models (LLMs) generalize with Reinforcement Learning with Verifiabl
https://www.alphaxiv.org/overview/2604.15804
Qwen3.5-Omni Technical Report | alphaXiv
Alibaba Cloud's Qwen Team developed Qwen3.5-Omni, a large language model scaling to hundreds of billions of parameters that processes and generates across
https://www.alphaxiv.org/?customCategories=generative-models
Explore | alphaXiv