arXivDaily: daily arXiv academic digest, updated Monday through Friday
2605.13838 2026-05-15 cs.CV cs.GR cs.LG

R-DMesh: Video-Guided 3D Animation via Rectified Dynamic Mesh Flow

Zijie Wu, Lixin Xu, Puhua Jiang, Sicong Liu, Chunchao Guo, Xiang Bai

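Affiliations: Huazhong University of Science and Technology; Tencent Hunyuan

AI summary: R-DMesh is a video-guided 3D animation method that addresses the animation distortion caused by a mismatch between the pose of a static mesh and the initial pose of the reference video. It introduces a conditional variational autoencoder and a tri-flow attention mechanism, decomposes the input mesh into a base shape, relative motion trajectories, and a pose-rectification offset, and automatically aligns the initial pose before animation, producing high-fidelity 4D meshes. The work also builds a large-scale dataset, Video-RDMesh, and experiments show strong performance on tasks such as pose retargeting and 4D generation.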

Comments Accepted by SIGGRAPH 2026, Project Page: https://r-dmesh.github.io/ Code URL: https://github.com/Tencent-Hunyuan/R-DMesh

Abstract

Video-guided 3D animation holds immense potential for content creation, offering intuitive and precise control over dynamic assets. However, practical deployment faces a critical yet frequently overlooked hurdle: the pose misalignment dilemma. In real-world scenarios, the initial pose of a user-provided static mesh rarely aligns with the starting frame of a reference video. Naively forcing a mesh to follow a mismatched trajectory inevitably leads to severe geometric distortion or animation failure. To address this, we present Rectified Dynamic Mesh (R-DMesh), a unified framework designed to generate high-fidelity 4D meshes that are "rectified" to align with video context. Unlike standard motion transfer approaches, our method introduces a novel VAE that explicitly disentangles the input into a conditional base mesh, relative motion trajectories, and a crucial rectification jump offset. This offset is learned to automatically transform the arbitrary pose of the input mesh to match the video's initial state before animation begins. We process these components via a Triflow Attention mechanism, which leverages vertex-wise geometric features to modulate the three orthogonal flows, ensuring physical consistency and local rigidity during the rectification and animation process. For generation, we employ a Rectified Flow-based Diffusion Transformer conditioned on pre-trained video latents, effectively transferring rich spatio-temporal priors to the 3D domain. To support this task, we construct Video-RDMesh, a large-scale dataset of over 500k dynamic mesh sequences specifically curated to simulate pose misalignment. Extensive experiments demonstrate that R-DMesh not only solves the alignment problem but also enables robust downstream applications, including pose retargeting and holistic 4D generation.
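
A minimal sketch of how the three disentangled components might recombine into a 4D mesh. It assumes a purely additive composition (base vertices plus a one-time rectification offset plus per-frame relative displacements); the abstract does not state the actual composition rule, and every name here (`compose_sequence`, `jump`, `motion`) is hypothetical.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def add(a: Vec3, b: Vec3) -> Vec3:
    """Component-wise sum of two 3D vectors."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def compose_sequence(
    base: List[Vec3],
    jump: List[Vec3],
    motion: List[List[Vec3]],
) -> List[List[Vec3]]:
    """Rectify the base mesh once, then apply per-frame relative motion.

    Assumption: all three components are per-vertex offsets that simply add.
    """
    rectified = [add(v, j) for v, j in zip(base, jump)]
    return [[add(v, d) for v, d in zip(rectified, frame)] for frame in motion]


# Toy mesh with two vertices, lifted by one unit in y to match the first frame.
base = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
jump = [(0.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
motion = [
    [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)],  # frame 0: the rectified pose itself
    [(0.5, 0.0, 0.0), (0.5, 0.0, 0.0)],  # frame 1: shifted along x
]
frames = compose_sequence(base, jump, motion)
```

Keeping the offset separate from the motion means frame 0 already matches the video's starting pose, so the relative trajectories never have to absorb the initial misalignment.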

2605.13789 2026-05-15 cs.LG cs.AI q-bio.BM

ENSEMBITS: an alphabet of protein conformational ensembles

Kaiwen Shi, Carlos Oliver

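Affiliations: Department of Computer Science, Vanderbilt University; Center for AI in Protein Dynamics, Vanderbilt University; Department of Molecular Physiology and Biophysics, Vanderbilt University

AI summary: This paper proposes Ensembits, a new tokenizer for protein conformational ensembles, to address the inability of existing tokenizers to capture dynamic conformational changes in proteins. Using a residual VQ-VAE and a frame distillation objective, the method encodes geometric features and dynamics across conformations and describes protein motion states. Ensembits performs well on several tasks, including RMSF prediction, function annotation, and mutation-effect prediction, and does so with far less data than static tokenizers, providing a dynamics vocabulary for protein language modeling and design.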

Abstract

Protein structure tokenizers (PSTs) are workhorses in protein language modeling, function prediction, and evolutionary analysis. However, existing PSTs only capture local geometry of static structures, and miss the correlated motions and alternative conformational states revealed by protein ensembles. Here we introduce Ensembits, the first tokenizer of protein conformational ensembles. Ensembits addresses challenges inherent to tokenizing dynamics: deriving informative geometric descriptors across conformations, permutation-invariant encoding of variable-size ensembles, and conquering sparsity in dynamics data. Trained with a Residual VQ-VAE using a frame distillation objective on a large molecular dynamics corpus, Ensembits outperforms all related methods on RMSF prediction, and is the strongest standalone structural tokenizer on a token-conditioned ANOVA test on per-residue motion amplitude. Ensembits further matches or exceeds static tokenizers on EC, GO, binding site/affinity prediction, and zero-shot mutation-effect prediction despite using far less pretraining data. Notably, the distillation objective enables Ensembits to predict dynamics tokens from a single predicted structure, which alleviates dynamics data sparsity. As the field moves from static structure prediction toward ensemble generation, Ensembits offers the discrete vocabulary needed to bring dynamics into protein language modeling and design.
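
The quantization step of a residual VQ-VAE can be sketched as follows. This is a generic illustration of residual vector quantization, not the Ensembits implementation: the codebooks, the two-dimensional descriptor, and the function names are all made up for the example, and the encoder, decoder, and frame distillation objective are omitted.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Index of the codeword closest to x in squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], x)),
    )


def residual_quantize(
    x: Vector, codebooks: Sequence[Sequence[Vector]]
) -> Tuple[List[int], List[float]]:
    """Quantize x stage by stage; each stage encodes what the previous left over."""
    residual = list(x)
    approx = [0.0] * len(x)
    tokens: List[int] = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        tokens.append(idx)
        approx = [a + c for a, c in zip(approx, codebook[idx])]
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return tokens, approx


# Hypothetical codebooks: a coarse stage followed by a fine correction stage.
coarse = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
fine = [[0.0, 0.0], [0.25, 0.0], [0.0, -0.25]]
tokens, approx = residual_quantize([1.3, 0.9], [coarse, fine])
```

Each input ends up as a short tuple of discrete indices, one per stage, and the sum of the selected codewords approximates the original descriptor.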

2605.13748 2026-05-15 cs.RO cs.SY eess.SY math.OC

TinySDP: Real Time Semidefinite Optimization for Certifiable and Agile Edge Robotics

Ishaan Mahajan, Jon Arrizabalaga, Andrea Grillo, Fausto Vega, James Anderson, Zachary Manchester, Brian Plancher

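Affiliations: A2R-Lab

AI summary: This paper proposes TinySDP, a real-time semidefinite programming solver that targets the computational bottleneck of real-time control on resource-constrained embedded systems. By integrating positive-semidefinite cone projections into a cached-Riccati-based ADMM solver, it solves model predictive control problems with nonconvex obstacle constraints efficiently on microcontrollers. TinySDP also introduces an a posteriori rank-1 certificate that converts relaxed solutions into geometric guarantees at each timestep. Experiments show shorter paths and better obstacle avoidance than existing methods in challenging scenarios, and the approach is validated on a quadrotor.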

Comments Accepted to Robotics: Science and Systems (RSS) 2026. 11 pages, 5 figures, 2 tables. Project website: https://a2r-lab.org/TinySDP/

Abstract

Semidefinite programming (SDP) provides a principled framework for convex relaxations of nonconvex geometric constraints in motion planning, yet existing solvers are too computationally expensive for real-time control, particularly on resource-constrained embedded systems. To address this gap, we introduce TinySDP, the first semidefinite programming solver designed for embedded systems, enabling real-time model-predictive control (MPC) on microcontrollers for problems with nonconvex obstacle constraints. Our approach integrates positive-semidefinite cone projections into a cached-Riccati-based ADMM solver, leveraging computational structure for embedded tractability. We pair this solver with an a posteriori rank-1 certificate that converts relaxed solutions into explicit geometric guarantees at each timestep. On challenging benchmarks, e.g., cul-de-sac and dynamic obstacle avoidance scenarios that induce failures in local methods, TinySDP achieves collision-free navigation with up to 73% shorter paths than state-of-the-art baselines. We validate our approach on a Crazyflie quadrotor, demonstrating that semidefinite constraints can be enforced at real-time rates for agile embedded robotics.
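
Two ingredients named in the abstract, the positive-semidefinite cone projection and a rank-1 check on the relaxed solution, can be illustrated on a 2x2 symmetric matrix, where the eigendecomposition has a closed form. This is a sketch under stated assumptions: the tolerance, the toy lifted matrix [[1, x], [x, X]], and all function names are illustrative, and the check below is a generic eigenvalue test rather than the paper's certificate. The ADMM iteration and cached Riccati recursion are not shown.

```python
import math
from typing import List, Tuple

Matrix2 = List[List[float]]
EigPair = Tuple[float, Tuple[float, float]]


def eig_sym_2x2(m: Matrix2) -> Tuple[EigPair, EigPair]:
    """Eigenpairs of a symmetric 2x2 matrix, largest eigenvalue first."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    theta = 0.5 * math.atan2(2.0 * b, a - c)
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))
    return (mean + radius, v1), (mean - radius, v2)


def project_psd(m: Matrix2) -> Matrix2:
    """Project onto the PSD cone by clipping negative eigenvalues to zero."""
    out = [[0.0, 0.0], [0.0, 0.0]]
    for lam, v in eig_sym_2x2(m):
        lam = max(lam, 0.0)
        for i in range(2):
            for j in range(2):
                out[i][j] += lam * v[i] * v[j]
    return out


def is_rank_one(m: Matrix2, tol: float = 1e-6) -> bool:
    """True when the second eigenvalue is negligible relative to the first."""
    (lam1, _), (lam2, _) = eig_sym_2x2(m)
    return lam1 > 0.0 and abs(lam2) <= tol * lam1


projected = project_psd([[1.0, 0.0], [0.0, -2.0]])

# Lifted matrices [[1, x], [x, X]] with x = 3: the relaxation is tight iff X = x**2.
tight = [[1.0, 3.0], [3.0, 9.0]]
loose = [[1.0, 3.0], [3.0, 10.0]]
```

When the lifted matrix is rank one, the relaxed variable X equals x squared, so a constraint written on X holds for the actual state x; when it is not, the relaxed solution carries no such guarantee.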

2605.13369 2026-05-15 cs.CL cs.AI cs.LG

Query-Conditioned Test-Time Self-Training for Large Language Models

Chaehee Song, Minseok Seo, Yeeun Seong, Doyi Kim, Changick Kim

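Affiliations: School of Electrical Engineering, KAIST; Graduate School of Green Growth and Sustainability, KAIST

AI summary: This paper proposes QueST, a query-conditioned test-time self-training framework that adjusts the parameters of a large language model during inference according to the input query, improving adaptation to the specific problem. The core idea is to use the structural information implicit in the input query to generate related problem-solution pairs, which serve as supervision for parameter-efficient fine-tuning at test time, so the model is optimized per query without external data. Experiments show that QueST outperforms existing test-time optimization methods on several mathematical and scientific reasoning benchmarks.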

Comments 17 pages, 7 figures

Abstract

Large language models (LLMs) are typically deployed with fixed parameters, and their performance is often improved by allocating more computation at inference time. While such test-time scaling can be effective, it cannot correct model misconceptions or adapt the model to the specific structure of an individual query. Test-time optimization addresses this limitation by enabling parameter updates during inference, but existing approaches either rely on external data or optimize generic self-supervised objectives that lack query-specific alignment. In this work, we propose Query-Conditioned Test-Time Self-Training (QueST), a framework that adapts model parameters during inference using supervision derived directly from the input query. Our key insight is that the input query itself encodes latent signals sufficient for constructing structurally related problem-solution pairs. Based on this, QueST generates such query-conditioned pairs and uses them as supervision for parameter-efficient fine-tuning at test time. The adapted model is then used to produce the final answer, enabling query-specific adaptation without any external data. Across seven mathematical reasoning benchmarks and the GPQA-Diamond scientific reasoning benchmark, QueST consistently outperforms strong test-time optimization baselines. These results demonstrate that query-conditioned self-training is an effective and practical paradigm for test-time adaptation in LLMs. Code is available at https://chssong.github.io/Query-Conditioned-TTST/.

2605.13276 2026-05-15 cs.AI cs.RO

D-VLA: A High-Concurrency Distributed Asynchronous Reinforcement Learning Framework for Vision-Language-Action Models

Yucheng Guo, Yongjian Guo, Zhong Guan, Wen Huang, Haoran Sun, Haodong Yue, Xiaolong Xiang, Shuai Di, Zhen Sun, Luqiao Wang, Junwu Xiong, Yicheng Gong

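Affiliations: Tsinghua University; Peking University; Tianjin University; Beihang University; JDT AI Infra

AI summary: With the rapid development of embodied AI, Vision-Language-Action (VLA) models excel at multimodal perception and task execution, but applying reinforcement learning (RL) to them in large-scale distributed environments runs into system bottlenecks, mainly from the resource conflict between high-fidelity physical simulation and the VRAM and bandwidth demands of deep learning. To address this, the paper proposes D-VLA, a high-concurrency, low-latency distributed RL framework. Through designs such as "Plane Decoupling" and an asynchronous "Swimlane" pipeline, it separates training data from model optimization and fully overlaps sampling, inference, gradient computation, and parameter distribution, improving training throughput and sampling efficiency for large VLA models.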

Abstract

The rapid evolution of Embodied AI has enabled Vision-Language-Action (VLA) models to excel in multimodal perception and task execution. However, applying Reinforcement Learning (RL) to these massive models in large-scale distributed environments faces severe systemic bottlenecks, primarily due to the resource conflict between high-fidelity physical simulation and the intensive VRAM/bandwidth demands of deep learning. This conflict often leaves overall throughput constrained by execution-phase inefficiencies. To address these challenges, we propose D-VLA, a high-concurrency, low-latency distributed RL framework for large-scale embodied foundation models. D-VLA introduces "Plane Decoupling," physically isolating high-frequency training data from low-frequency weight control to eliminate interference between simulation and optimization. We further design a four-thread asynchronous "Swimlane" pipeline, enabling full parallel overlap of sampling, inference, gradient computation, and parameter distribution. Additionally, a dual-pool VRAM management model and topology-aware replication resolve memory fragmentation and optimize communication efficiency. Experiments on benchmarks like LIBERO show that D-VLA significantly outperforms mainstream RL frameworks in throughput and sampling efficiency for billion-parameter VLA models. In trillion-parameter scalability tests, our framework maintains exceptional stability and linear speedup, providing a robust system for high-performance general-purpose embodied agents.

2605.13247 2026-05-15 cs.LG

EMO: Frustratingly Easy Progressive Training of Extendable MoE

Linghao Jin, Chufan Shi, Huijuan Wang, Nuan Wen, Zhengzhong Liu, Eric Xing, Xuezhe Ma

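Affiliations: USC-ISI; MBZUAI-IFM

AI summary: This paper proposes EMO, a progressive training framework for extendable sparse Mixture-of-Experts (MoE) models. By growing the expert pool gradually during training, it addresses the excessive memory and communication overhead that arises when conventional MoE training allocates too many experts too early. EMO models sparsity in a scaling law to derive compute-optimal token budgets for progressive expansion, and experiments show that it improves training efficiency and resource utilization while preserving model performance.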

Abstract

Sparse Mixture-of-Experts (MoE) models offer a powerful way to scale model size without increasing compute, as per-token FLOPs depend only on k active experts rather than the total pool of E experts. Yet, this asymmetry creates an MoE efficiency paradox in practice: adding more experts balloons memory and communication costs, making actual training inefficient. We argue that this bottleneck arises in part because current MoE training allocates too many experts from the beginning, even though early-stage data may not fully utilize such capacity. Motivated by this, we propose EMO, a simple progressive training framework that treats MoE capacity as expandable memory and grows the expert pool over the course of training. EMO explicitly models sparsity in scaling law to derive stage-wise compute-optimal token budgets for progressive expansion. Empirical results show that EMO matches the performance of a fixed-expert setup in large-scale experiments while improving wall-clock efficiency. It offers a surprisingly simple yet effective path to scalable MoE training, preserving the benefits of large expert pools while reducing both training time and GPU cost.
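
The asymmetry between compute and memory can be made concrete with back-of-the-envelope arithmetic. The sketch below assumes each expert is a two-matrix feed-forward block, ignores biases, router cost, and attention, and counts two FLOPs per active weight; the layer sizes and the staged expert counts are hypothetical, not EMO's schedule.

```python
from typing import Dict


def moe_layer_costs(d_model: int, d_ff: int, num_experts: int, top_k: int) -> Dict[str, int]:
    """Rough per-layer costs of a sparse MoE feed-forward block."""
    params_per_expert = 2 * d_model * d_ff  # up- and down-projection
    return {
        # Memory and communication scale with the full pool of E experts.
        "stored_params": num_experts * params_per_expert,
        # Per-token compute scales only with the k experts a token is routed to.
        "flops_per_token": 2 * top_k * params_per_expert,
    }


# Hypothetical progressive schedule: grow the pool from 8 to 64 experts.
stages = [8, 16, 64]
costs = {e: moe_layer_costs(d_model=1024, d_ff=4096, num_experts=e, top_k=2) for e in stages}
```

Under these assumptions the per-token FLOPs are identical at every stage, while the stored parameters grow eightfold between the first and last stage, which is the cost that starting with a small pool postpones.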

2605.13084 2026-05-15 cs.CL cs.AI

Does language matter for spoken word classification? A multilingual generative meta-learning approach

Batsirayi Mupamhi Ziki, Louise Beyers, Ruan van der Merwe

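Affiliations: Bytefuse

AI summary: This paper studies how language affects few-shot spoken word classification and proposes a multilingual approach based on generative meta-learning. Using the Generative Meta-Continual Learning algorithm, models are trained on English, German, French, and Catalan. The multilingual model performs best, but the performance differences between models are small. The study also finds that the number of hours of unique training data is a better indicator of model performance than the number of languages.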

Abstract

Meta-learning has been shown to outperform supervised learning for few-shot monolingual spoken word classification. However, the meta-learning approach remains under-explored in multilingual spoken word classification. In this paper, we apply the Generative Meta-Continual Learning algorithm to spoken word classification. The generative nature of this algorithm makes it viable for use in applications, and the meta-learning aspect promotes generalisation, which is crucial in a multilingual setting. We train monolingual models on English, German, French, and Catalan, a bilingual model on English and German, and a multilingual model on all four languages. We find that although the multilingual model performs best, the differences in performance between models are unexpectedly small. We also find that the number of hours of unique data seen during training seems to be a stronger performance indicator than the number of languages included in the training data.

2605.13050 2026-05-15 cs.CL cs.AI

Context Training with Active Information Seeking

Zeyu Huang, Adhiguna Kuncoro, Qixuan Feng, Jiajun Shen, Lucio Dery, Arthur Szlam, Marc'Aurelio Ranzato

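Affiliations: The University of Edinburgh

AI summary: This paper studies how active information seeking can improve the ability of large language models to adapt to new tasks. Unlike conventional closed-loop methods that rely on the model's internal knowledge, the authors equip context optimizers with Wikipedia search and browser tools to actively obtain external information. A search-based training procedure that maintains and prunes multiple candidate contexts improves performance on low-resource translation, health scenarios, and reasoning-heavy tasks, while showing good data efficiency and generalization.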

Comments Preprint

Abstract

Most existing large language models (LLMs) are expensive to adapt after deployment, especially when a task requires newly produced information or niche domain knowledge. Recent work has shown that, by manipulating and optimizing their context, LLMs can be tailored to downstream tasks without updating their weights. However, most existing methods remain closed-loop, relying solely on the model's intrinsic knowledge. In this paper, we equip these context optimizers with Wikipedia search and browser tools for active information seeking. We show that naively adding these tools to a standard sequential context optimization pipeline can actually degrade performance compared to baselines. However, when paired with a search-based training procedure that maintains and prunes multiple candidate contexts, active information seeking delivers consistent and substantial gains. We demonstrate these improvements across diverse domains, including low-resource translation (Flores+), health scenarios (HealthBench), and reasoning-heavy tasks (LiveCodeBench and Humanity's Last Exam). Furthermore, our method proves to be data-efficient, robust across different hyperparameters, and capable of generating effective textual contexts that generalize well across different models.
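
A procedure that "maintains and prunes multiple candidate contexts" resembles a beam search over contexts. The sketch below illustrates that general pattern only. `propose` stands in for an optimizer step that may call search or browser tools, and `score` stands in for evaluation on training examples; both are hypothetical toy functions, and the paper's actual procedure may differ.

```python
from typing import Callable, List, Tuple

Context = Tuple[str, ...]


def search_contexts(
    seed: Context,
    propose: Callable[[Context], List[Context]],
    score: Callable[[Context], float],
    rounds: int,
    beam_width: int,
) -> Context:
    """Keep the best few candidate contexts per round instead of a single chain."""
    beam: List[Context] = [seed]
    for _ in range(rounds):
        candidates = list(beam)
        for ctx in beam:
            candidates.extend(propose(ctx))
        unique = list(dict.fromkeys(candidates))  # drop duplicates, keep order
        unique.sort(key=score, reverse=True)
        beam = unique[:beam_width]  # prune everything outside the beam
    return beam[0]


# Toy stand-ins: a context is a tuple of retrieved snippets.
SNIPPETS = ("A", "B", "noise")
USEFUL = {"A", "B"}


def propose(ctx: Context) -> List[Context]:
    """Extend a context with each snippet it does not already contain."""
    return [ctx + (s,) for s in SNIPPETS if s not in ctx]


def score(ctx: Context) -> float:
    """Reward useful snippets and charge a small cost per snippet."""
    return sum(1.0 for s in ctx if s in USEFUL) - 0.1 * len(ctx)


best = search_contexts((), propose, score, rounds=2, beam_width=2)
```

A sequential pipeline corresponds to `beam_width=1`: one unhelpful retrieval is then carried forward, whereas a wider beam lets a better-scoring sibling replace it.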

2605.13032 2026-05-15 cs.LG

What Information Matters? Graph Out-of-Distribution Detection via Tri-Component Information Decomposition

Danny Wang, Ruihong Qiu, Zi Huang

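Affiliations: The University of Queensland, Australia

AI summary: Graph neural networks are widely used for node classification but are fragile under out-of-distribution (OOD) shifts in node features or graph structure. To address this, the paper proposes TIDE, a tri-component information decomposition framework that explicitly decomposes information into feature-specific, structure-specific, and joint components. It aims to preserve label-relevant joint information while filtering out spurious feature and structure information, improving the separation between in-distribution (ID) and OOD nodes. Experiments show that TIDE improves OOD detection on multiple datasets while maintaining high ID classification accuracy.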

Comments ICML26

Abstract

Graph neural networks are widely used for node classification, but they remain vulnerable to out-of-distribution (OOD) shifts in node features and graph structure. Prior work established that methods trained with standard supervised learning (SL) objectives tend to capture spurious signals from features, structure, or both, leaving the model fragile under distributional changes. To address this, we propose TIDE, a novel and effective Tri-Component Information Decomposition framework that explicitly decomposes information into feature-specific, structure-specific, and joint components. TIDE aims to preserve only the label-relevant part of the joint information while filtering out spurious feature- and structure-specific information, thereby enhancing the separation between in-distribution (ID) and OOD nodes. Beyond the framework, we provide theoretical and empirical analyses showing that an information bottleneck objective is preferable to standard SL for graph OOD detection, with higher ID confidence and a greater entropy gap between ID and OOD data. Extensive experiments across seven datasets confirm the efficacy of TIDE, achieving up to a 34% improvement in FPR95 over strong baselines while maintaining competitive ID accuracy.
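
FPR95, the metric cited in the abstract, is conventionally the false positive rate on OOD samples at the threshold where 95% of ID samples are accepted. A minimal sketch of that convention follows; the scores are invented, higher scores are assumed to mean "more in-distribution", and the exact tie-handling and thresholding in the paper's evaluation code may differ.

```python
from typing import Sequence


def fpr_at_95_tpr(id_scores: Sequence[float], ood_scores: Sequence[float]) -> float:
    """Fraction of OOD samples accepted at the threshold that keeps 95% of ID samples."""
    ranked = sorted(id_scores, reverse=True)
    keep = (95 * len(ranked) + 99) // 100  # ceil(0.95 * n) in integer arithmetic
    threshold = ranked[keep - 1]
    accepted_ood = sum(1 for s in ood_scores if s >= threshold)
    return accepted_ood / len(ood_scores)


# Made-up confidence scores: 20 ID nodes and 4 OOD nodes.
id_scores = [i / 20 for i in range(1, 21)]
ood_scores = [0.02, 0.08, 0.3, 0.04]
fpr95 = fpr_at_95_tpr(id_scores, ood_scores)
```

Lower is better: the value measures how many OOD nodes slip through when the detector is tuned to retain nearly all ID nodes.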

2605.12998 2026-05-15 cs.LG

DRIFT: A Benchmark for Task-Free Continual Graph Learning with Continuous Distribution Shifts

Guiquan Sun, Xikun Zhang, Jingchao Ni, Dongjin Song

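Affiliations: University of Connecticut; RMIT University; University of Houston

AI summary: This paper presents DRIFT, a benchmark for task-free continual graph learning that targets continuous distribution drift in real-world environments. Conventional continual graph learning methods usually assume discrete task partitions; DRIFT instead takes a task-free view and models the data stream as a time-varying mixture of latent task distributions, which supports continuous modeling of distribution drift. Through a Gaussian parameterization, DRIFT covers transition dynamics ranging from hard task switches to smooth distributional drift, and it reveals the performance degradation of existing methods in the task-free setting, underscoring the importance of studying continual graph learning under realistic non-stationary conditions.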

Comments 20 pages, 5 figures

Abstract

Continual graph learning (CGL) aims to learn from dynamically evolving graphs while mitigating catastrophic forgetting. Existing CGL approaches typically adopt a task-based formulation, where the data stream is partitioned into a sequence of discrete tasks with pre-defined boundaries. However, such assumptions rarely hold in real-world environments, where data distributions evolve continuously and task identity is often unavailable. To better reflect realistic non-stationary environments, we revisit continual graph learning from a task-free perspective. We propose a unified formulation that models the data stream as a time-varying mixture of latent task distributions, enabling continuous modeling of distribution drift. Based on this formulation, we construct DRIFT, a benchmark that spans a spectrum of transition dynamics ranging from hard task switches to smooth distributional drift through a Gaussian parameterization. We evaluate representative continual learning methods under this task-free setting and observe substantial performance degradation compared to traditional task-based protocols. Our findings indicate that many existing approaches implicitly rely on task boundary information and struggle under realistic task-free graph streams. This work highlights the importance of studying continual graph learning under realistic non-stationary conditions and provides a benchmark for future research in this direction. Our code is available at https://github.com/UConn-DSIS/DRIFT.
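
One way to picture a time-varying mixture with a Gaussian parameterization is to give each latent task a Gaussian bump over time and normalize the bumps into mixture weights. The sketch below is an assumption about the general shape of such a scheme, not the benchmark's definition: `centers`, `width`, and the normalization are illustrative choices.

```python
import math
from typing import List, Sequence


def mixture_weights(t: float, centers: Sequence[float], width: float) -> List[float]:
    """Mixture weight of each latent task at time t.

    Each task k contributes exp(-(t - centers[k])**2 / (2 * width**2)),
    and the contributions are normalized to sum to one.
    """
    logits = [-((t - c) ** 2) / (2.0 * width ** 2) for c in centers]
    peak = max(logits)  # subtract the max for numerical stability
    unnorm = [math.exp(value - peak) for value in logits]
    total = sum(unnorm)
    return [u / total for u in unnorm]


centers = [0.0, 10.0, 20.0]  # hypothetical times at which each task peaks

sharp = mixture_weights(3.0, centers, width=0.5)   # close to a hard task switch
smooth = mixture_weights(3.0, centers, width=8.0)  # gradual distributional drift
```

A narrow width makes one task dominate almost everywhere, which recovers discrete task boundaries, while a wide width blends neighboring tasks so that no clear boundary exists in the stream.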

2605.12968 2026-05-15 cs.LG cs.AI cs.CL cs.LO

Controlling Logical Collapse in LLMs via Algebraic Ontology Projection over F2

Hisashi Miyashita

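Affiliations: Mgnite Inc

AI summary: This study asks whether large language models internally encode ontological relations in a formally verifiable algebraic structure. It proposes Algebraic Ontology Projection (AOP), which projects hidden states onto the finite field F2 and, using only 42 relational pairs as algebraic keys, achieves up to 93.33% zero-shot inclusion accuracy. The study also introduces the Semantic Crystallisation (SC) metric, which quantifies how well a model satisfies the F2 constraints, and shows that system prompts play a key role in preventing logical collapse in late layers, offering an algebraic perspective on the logical structure of large language models.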

Abstract

Do large language models internally encode ontological relations in a formally verifiable algebraic structure? We introduce Algebraic Ontology Projection (AOP), which projects LLM hidden states into the Galois Field F2 under Liskov Substitution Principle constraints, using only 42 relational pairs as algebraic keys. AOP achieves up to 93.33% zero-shot inclusion accuracy on unseen concept pairs (Gemma-2 Instruct with optimized prompt), and a consistent 86.67% accuracy across multiple model families, with no model tuning and through the prompt alone. This algebraic structure is strongly layer-dependent. We introduce Semantic Crystallisation (SC), a metric that quantifies F2 constraint satisfaction relative to a random baseline and predicts zero-shot accuracy without held-out data. System prompts act as algebraic boundary conditions: only their combination with instruction tuning prevents Late-layer Collapse, a systematic degradation of logical consistency in the final layers that is observed in 7 of 10 conditions. These findings reframe forward computation as an iterative process of algebraic organisation, and open a path toward LLMs whose logical structure is not merely approximated, but formally accessible.

2605.12856 2026-05-15 cs.AI cs.SI

Moltbook Moderation: Uncovering Hidden Intent Through Multi-Turn Dialogue

Ali Al-Lawati, Nafis Tripto, Abolfazl Ansari, Jason Lucas, Suhang Wang, Dongwon Lee

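Affiliations: The Pennsylvania State University

AI summary: This paper studies the detection of hidden malicious intent in multi-agent systems and proposes BOT-MOD, a moderation framework based on agent intent rather than content features. Using multi-turn dialogue guided by Gibbs-based sampling over intent hypotheses, the method progressively identifies an agent's true intent and distinguishes benign from malicious behavior. Experiments on a dataset built from Moltbook show that the method identifies intent accurately across a range of adversarial scenarios while keeping the false positive rate low, offering an approach to intent-aware moderation in open multi-agent environments.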

Abstract

The emergence of multi-agent systems introduces novel moderation challenges that extend beyond content filtering. Agents with malicious intent may contribute harmful content that appears benign to evade content-based moderation, while compromising the system through exploitative and malicious behavior manifested across their overall interaction patterns within the community. To address this, we introduce BOT-MOD (BOT-MODeration), a moderation framework that grounds detection in agent intent rather than traditional content-level signals. BOT-MOD identifies the underlying intent by engaging with the target agent in a multi-turn exchange guided by Gibbs-based sampling over candidate intent hypotheses. This progressively narrows the space of plausible agent objectives to identify the underlying behavior. To evaluate our approach, we construct a dataset derived from Moltbook that encompasses diverse benign and malicious behaviors based on actual community structures, posts, and comments. Results demonstrate that BOT-MOD reliably identifies agent intent across a range of adversarial configurations, while maintaining a low false positive rate on benign behaviors. This work advances the foundation for scalable, intent-aware moderation of agents in open multi-agent environments.

2605.12808 2026-05-15 cs.LG

Neurodata Without Boredom: Benchmarking Agentic AI for Data Reuse

Ling-Qi Zhang, Kristin Branson

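Affiliations: HHMI Janelia Research Campus

AI summary: Neuroscience data are fragmented, come in many formats, and are hard to reuse; this study explores whether agentic AI can make data reuse more efficient. Using eight experimental papers that shared data and code, it evaluates general-purpose coding agents on loading, understanding, and reformatting neural data for a decoder-training task. The agents performed well on sub-tasks but rarely produced an error-free end-to-end solution. The study analyzes the common error types and what triggers them, proposes data-sharing best practices for the agentic-AI era, and notes that agents-as-judges have limited reliability without ground-truth references, which underlines the need for human-in-the-loop code development.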

Comments v2: Added forgotten acknowledgments section

Abstract

Neuroscience data are highly fragmented across labs, formats, and experimental paradigms, and reuse often requires substantial manual effort. A persistent roadblock to data reuse and integration is the need to decipher bespoke and diverse data formatting choices. Common data formats have been proposed in response, but the field continues to struggle with a fundamental tension: formats flexible enough to accommodate diverse experiments are rarely descriptive enough to be self-explanatory, and sufficiently descriptive formats demand detailed documentation and curation effort that few labs can sustain. Agentic AI is a natural candidate to solve this problem: LLMs read code and text faster and with sustained attention to the low-level details humans tend to skim over. To measure how well agentic AI performs on this task, we selected eight recent papers studying large-scale mouse neural population recordings that shared both data and code, spanning diverse recording modalities, behavioral paradigms, and dataset formats (e.g., NWB, specialized APIs, and general-purpose Python or MATLAB files). We provided agents with the data, code, and paper, and prompted them to load, understand, and reformat the data for a common downstream task: training a decoder from neural activity to task or behavioral variables. General-purpose coding agents commonly used by scientists performed well on each sub-task, but rarely strung together a fully error-free end-to-end solution. We characterize the types of mistakes agents made and the dataset properties that elicited them, and propose data-sharing best practices for the agentic-AI era. We further find that agents-as-judges are unreliable at catching errors, especially without ground-truth references, so interactive, human-in-the-loop coding remains necessary.

2605.12784 2026-05-15 cs.LG cs.NE q-bio.QM

ToolMol: Evolutionary Agentic Framework for Multi-objective Drug Discovery

Andrew Y. Zhou, Sharvaree Vadgama, Sumanth Varambally, Peter Eckmann, Michael K. Gilson, Rose Yu

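Affiliations: Department of Computer Science; Skaggs School of Pharmacy; Department of Computer Science, Stanford University

AI summary: This study proposes ToolMol, an evolutionary agentic framework for multi-objective drug molecule design. It combines a multi-objective genetic algorithm with an LLM-based agentic operator that iteratively updates the molecule population. ToolMol introduces an RDKit-backed toolbox that supports precise structural modifications and performs well on multiple protein targets, with generated molecules outperforming existing methods on binding affinity and absolute binding free energy.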

Comments 9 pages, 5 figures

Abstract

Advances in large language models (LLMs) have recently opened new and promising avenues for small-molecule drug discovery. Yet existing LLM-based approaches for molecular generation often suffer from high rates of invalid and low-quality ligand candidates, a result of the syntactic limitations of current models with regard to molecular strings. In this paper, we introduce ToolMol, an evolutionary agentic framework for de novo drug design. ToolMol combines a multi-objective genetic algorithm with an agentic LLM operator that iteratively updates the ligand population. We build a comprehensive toolbox of RDKit-backed functions that allows our agentic operator to consistently make precise ligand modifications. ToolMol achieves state-of-the-art performance on multi-objective property optimization tasks, discovering drug-like and synthesizable ligands that have more than 10% stronger predicted binding affinity compared to existing methods, evaluated on three protein targets. ToolMol ligands additionally achieve state-of-the-art results in gold-standard Absolute Binding Free Energy scores, improving on existing methods by over 35%. By studying chain-of-thought reasoning traces, we observe that tool-calling enables the model to more faithfully execute its planned modifications, efficiently exploiting the strong chemical prior knowledge in LLMs.
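
Multi-objective genetic algorithms typically rank candidates by Pareto dominance rather than by a single score. The sketch below shows a plain non-dominated filter as an illustration of that selection idea; the abstract does not say which selection scheme ToolMol uses. The ligand names and objective values are invented, every objective is assumed to be "higher is better", and the LLM operator and RDKit tool calls are not shown.

```python
from typing import Dict, List, Tuple

Scores = Tuple[float, ...]


def dominates(a: Scores, b: Scores) -> bool:
    """a dominates b if it is at least as good everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(population: Dict[str, Scores]) -> List[str]:
    """Names of the candidates that no other candidate dominates."""
    return [
        name
        for name, scores in population.items()
        if not any(dominates(other, scores) for other in population.values())
    ]


# Hypothetical ligands scored on (predicted binding strength, drug-likeness).
population = {
    "lig_a": (0.9, 0.4),
    "lig_b": (0.6, 0.8),
    "lig_c": (0.5, 0.5),  # dominated by lig_b
    "lig_d": (0.9, 0.3),  # dominated by lig_a
}
front = pareto_front(population)
```

Keeping the whole front preserves ligands that trade one objective against another, which a weighted sum of objectives would collapse into a single winner.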

2605.12651 2026-05-15 cs.LG

Runtime Monitoring of Perception-Based Autonomous Systems via Embedding Temporal Logic

Parv Kapoor, Abigail Hammer, Ashish Kapoor, Karen Leung, Eunsuk Kang

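Affiliations: Software and Societal Systems Department; Carnegie Mellon University; General Robotics; Aeronautics and Astronautics Department; University of Washington

AI summary: This paper proposes Embedding Temporal Logic (ETL), a new approach to runtime monitoring of perception-based autonomous systems. Conventional methods map continuous sensor observations to discrete logical propositions defined over low-dimensional state variables, which in perception-driven settings is computationally expensive, brittle, and semantically misaligned. ETL monitors directly in a learned embedding space: predicates are defined by the distance between observed embeddings and embeddings of reference observations, so they can express high-level perceptual concepts such as similarity to a visual goal or avoidance of a semantic region, and temporal operators compose these predicates to describe temporally extended perceptual behaviors. Experiments in multiple manipulation environments show that ETL agrees closely with ground-truth semantics and monitors temporal behaviors effectively.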

Abstract

Runtime monitoring of autonomous systems traditionally relies on mapping continuous sensor observations to discrete logical propositions defined over low-dimensional state variables. This abstraction breaks down in perception-driven settings, where such mappings require additional learned modules that are often computationally expensive, brittle, and semantically misaligned. In this work, we propose Embedding Temporal Logic (ETL), a temporal logic that performs monitoring directly in learned embedding spaces. ETL defines predicates through distances between observed embeddings and target embeddings derived from reference observations. This formulation allows specifications to capture high-level perceptual concepts, such as similarity to visual goals or avoidance of semantic regions, that are difficult or impossible to express using traditional predicates. By composing these predicates with temporal operators, ETL naturally expresses temporally extended and sequential perceptual behaviors. We introduce ETL monitors for evaluating specifications over bounded embedding traces, along with a conformal calibration procedure that provides reliable and safety-oriented predicate evaluation. We evaluate our approach across multiple manipulation environments to show that ETL achieves strong empirical agreement with ground-truth semantics, including accurate monitoring of temporally composed behaviors.
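
The idea of distance-based predicates composed with temporal operators can be sketched over a bounded trace of embeddings. This is a toy Boolean monitor under stated assumptions: Euclidean distance, hand-picked radii, and two-dimensional "embeddings" are illustrative, the helper names are hypothetical, and the conformal calibration step from the abstract is not shown.

```python
import math
from typing import Callable, Sequence

Embedding = Sequence[float]
Predicate = Callable[[Embedding], bool]


def near(target: Embedding, radius: float) -> Predicate:
    """Predicate that holds when an embedding lies within radius of the target."""
    return lambda emb: math.dist(emb, target) <= radius


def negate(pred: Predicate) -> Predicate:
    """Logical negation of a predicate."""
    return lambda emb: not pred(emb)


def eventually(pred: Predicate, trace: Sequence[Embedding]) -> bool:
    """Holds if the predicate is true at some step of the bounded trace."""
    return any(pred(emb) for emb in trace)


def always(pred: Predicate, trace: Sequence[Embedding]) -> bool:
    """Holds if the predicate is true at every step of the bounded trace."""
    return all(pred(emb) for emb in trace)


# Hypothetical targets, standing in for embeddings of reference observations.
at_goal = near((3.0, 0.0), radius=0.5)
in_hazard = near((1.0, 2.0), radius=0.5)


def satisfies_spec(trace: Sequence[Embedding]) -> bool:
    """Reach the goal at some point while never entering the hazard region."""
    return eventually(at_goal, trace) and always(negate(in_hazard), trace)


safe_trace = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
unsafe_trace = [(0.0, 0.0), (1.0, 1.8), (2.0, 0.0), (3.0, 0.0)]
```

Because the predicates are defined on distances to reference embeddings, the specification needs no separate module that first maps observations to symbolic state variables.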

2605.12534 2026-05-15 cs.SD cs.LG q-bio.NC

BioSEN: A Bio-acoustic Signal Enhancement Network for Animal Vocalizations

Tianyu Song, Ton Viet Ta, Ngamta Thamwattana, Hisako Nomura, Linh Thi Hoai Nguyen

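Affiliations: Graduate School of Bioresource and Bioenvironmental Science, Kyushu University; Faculty of Agriculture, Kyushu University; School of Information and Physical Sciences, University of Newcastle; International Institute for Carbon-Neutral Energy Research, Kyushu University

AI summary: This paper proposes BioSEN, a bio-acoustic signal enhancement network for enhancing animal vocalizations in noisy conditions. The model adapts speech enhancement methods and adds three core modules designed for animal sounds: time-frequency feature extraction, harmonic structure capture, and an energy-adaptive gating connection. Experiments on three bioacoustic datasets show strong performance at far lower computational cost than existing state-of-the-art models, indicating potential for biodiversity monitoring and conservation.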

Journal ref ICASSP 2026 - 2026 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Abstract

Most work in audio enhancement targets human speech, while bioacoustics is less studied due to noisy recordings and the distinct traits of animal sounds. To fill this gap, we adapt speech enhancement methods and build BioSEN, a model made for bioacoustic signals. BioSEN has three modules: a multi-scale dual-axis attention unit for time-frequency feature extraction, a bio-harmonic multi-scale enhancement unit for capturing harmonic structures, and an energy-adaptive gating connection unit that uses frequency weights to keep vocalizations from being removed as noise. Tests on three bioacoustic datasets show that BioSEN matches or exceeds state-of-the-art speech enhancement models while using far less computation. These results show BioSEN's strength for bioacoustic audio enhancement and its promise for biodiversity monitoring and conservation.

2605.12394 2026-05-15 cs.LG cs.AI

Detecting overfitting in Neural Networks during long-horizon grokking using Random Matrix Theory

Hari K. Prakash, Charles H Martin

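Affiliations: University of California San Diego; Data Science and Engineering; Calculation Consulting

AI summary: This paper presents a new method based on Random Matrix Theory for detecting overfitting during the training of deep learning models without access to training or test data. It randomizes each layer's weight matrix, fits the empirical spectral distribution, and identifies outlier eigenvalues that violate self-averaging, called "Correlation Traps". In the "anti-grokking" phase of long-horizon grokking, these traps form and grow as test accuracy declines, revealing a structural signature of overfitting. Some large language models show similar traps, which may indicate potential overfitting.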

Comments 24 pages, 24 figures

Abstract

Training Neural Networks (NNs) without overfitting is difficult; detecting that overfitting is difficult as well. We present a novel Random Matrix Theory method that detects the onset of overfitting in deep learning models without access to train or test data. For each model layer, we randomize each weight matrix element-wise, $\mathbf{W} \to \mathbf{W}^{\mathrm{rand}}$, fit the randomized empirical spectral distribution with a Marchenko-Pastur distribution, and identify large outliers that violate self-averaging. We call these outliers Correlation Traps. During the onset of overfitting, which we call the "anti-grokking" phase in long-horizon grokking, Correlation Traps form and grow in number and scale as test accuracy decreases while train accuracy remains high. Traps may be benign or may harm generalization; we provide an empirical approach to distinguish between them by passing random data through the trained model and evaluating the JS divergence of output logits. Our findings show that anti-grokking is an additional grokking phase with high train accuracy and decreasing test accuracy, structurally distinct from pre-grokking through its Correlation Traps. More broadly, we find that some foundation-scale LLMs exhibit the same Correlation Traps, indicating potentially harmful overfitting.
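
The randomize-then-compare step can be illustrated in pure Python on a toy matrix. This sketch makes several assumptions that are not from the paper: it compares only the largest eigenvalue of (1/N) W^T W against the Marchenko-Pastur upper edge, sigma^2 (1 + sqrt(M/N))^2 for an N x M matrix, instead of fitting the full distribution; it finds that eigenvalue by power iteration; and the matrix size and planted element are invented. All function names are hypothetical.

```python
import math
import random
from typing import List

Matrix = List[List[float]]


def shuffle_elements(w: Matrix, rng: random.Random) -> Matrix:
    """Element-wise randomization: same entries, positions shuffled."""
    cols = len(w[0])
    flat = [x for row in w for x in row]
    rng.shuffle(flat)
    return [flat[r * cols:(r + 1) * cols] for r in range(len(w))]


def mp_upper_edge(w: Matrix) -> float:
    """Marchenko-Pastur bulk edge for (1/N) W^T W, using the entry variance."""
    rows, cols = len(w), len(w[0])
    flat = [x for row in w for x in row]
    mean = sum(flat) / len(flat)
    var = sum((x - mean) ** 2 for x in flat) / len(flat)
    return var * (1.0 + math.sqrt(cols / rows)) ** 2


def top_eigenvalue(w: Matrix, iters: int = 200) -> float:
    """Largest eigenvalue of (1/N) W^T W by power iteration."""
    rows, cols = len(w), len(w[0])
    v = [1.0 / math.sqrt(cols)] * cols
    lam = 0.0
    for _ in range(iters):
        wv = [sum(w[r][c] * v[c] for c in range(cols)) for r in range(rows)]
        u = [sum(w[r][c] * wv[r] for r in range(rows)) / rows for c in range(cols)]
        lam = math.sqrt(sum(x * x for x in u))
        if lam == 0.0:
            break
        v = [x / lam for x in u]
    return lam


rng = random.Random(0)
weights = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(40)]
weights[3][7] = 30.0  # plant one anomalously large element

randomized = shuffle_elements(weights, rng)
top = top_eigenvalue(randomized)
edge = mp_upper_edge(randomized)
has_outlier = top > edge  # an eigenvalue outside the bulk survives randomization
```

Shuffling destroys learned correlations but keeps the values of the entries, so an eigenvalue that still escapes the bulk points to a few unusually large elements, which is the kind of outlier the abstract calls a Correlation Trap.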

2605.12350 2026-05-15 cs.LG cs.AI

A New Technique for AI Explainability using Feature Association Map

Sayantani Ghosh, Amit Kumar Das, Amlan Chakrabarti

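Affiliations: DBS Bank; Institute of Engineering & Management; University of Calcutta

AI summary: This paper proposes FAMeX, a new explainable-AI algorithm based on the Feature Association Map (FAM), for explaining the decisions of AI systems. The method builds an association graph over features and analyzes feature importance from a graph-theoretic perspective, aiming to expose the basis of a model's decisions more accurately. Experiments indicate that FAMeX outperforms existing explainability algorithms such as PFI and SHAP on classification tasks, showing stronger explanatory power and effectiveness.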

Abstract

Lack of transparency in AI systems poses challenges in critical real-life applications. Being able to explain the decisions of an AI system is important for ensuring trust in it. Explainable AI (XAI) algorithms play a vital role in achieving this objective. In this paper, we propose a new algorithm for explaining AI systems, FAMeX (Feature Association Map based eXplainability). The algorithm is based on a graph-theoretic formulation of the feature set, termed the Feature Association Map (FAM), which models associations between features. FAMeX is found to perform better than the competing XAI algorithms Permutation Feature Importance (PFI) and SHapley Additive exPlanations (SHAP). Experiments conducted with eight benchmark algorithms show that FAMeX gauges feature importance in the context of classification better than the competing algorithms. These results suggest that FAMeX is a promising algorithm for explaining the predictions of an AI system.

2605.12055 2026-05-15 cs.CL

Do Language Models Encode Knowledge of Linguistic Constraint Violations?

Hardy, Sebastian Padó

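Affiliations: IMS, University of Stuttgart, Stuttgart, Germany

AI summary: This study examines whether large language models (LLMs) encode representations of linguistic constraint violations in their parameters and selectively activate them when processing ungrammatical sentences. It uses sparse autoencoders to decompose polysemantic activations and extract candidate violation-related features, and introduces a sensitivity score to identify features that activate on constraint-violating inputs. The experiments show that current language models have not formed a unified mechanism for detecting grammatical violations, and that features are not consistently shared across linguistic phenomena.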

Abstract

Large Language Models (LLMs) achieve strong linguistic performance, yet their internal mechanisms for producing these predictions remain unclear. We investigate the hypothesis that LLMs encode representations of linguistic constraint violations within their parameters, which are selectively activated when processing ungrammatical sentences. To test this, we use sparse autoencoders to decompose polysemantic activations into sparse, monosemantic features and recover candidates for violation-related features. We introduce a sensitivity score for identifying features that are preferentially activated on constraint-violated versus well-formed inputs, enabling unsupervised detection of potential violation-specific features. We further propose a conjunctive falsification framework with three criteria evaluated jointly. Overall, the results are negative in two respects: (1) the falsification criteria are not jointly satisfied across linguistic phenomena, and (2) no features are consistently shared across all categories. While some phenomena show partial evidence of selective causal structure, the overall pattern provides limited support for a unified set of grammatical violation detectors in current LMs.

2605.11853 2026-05-15 cs.LG cs.AI cs.CL

GEAR: Granularity-Adaptive Advantage Reweighting for LLM Agents via Self-Distillation

Sijia Li, Yuchen Huang, Zifan Liu, Yanping Li, Jingjing Fu, Li Zhao, Jiang Bian, Ling Zhang, Jun Zhang, Rui Wang

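Affiliations: Hong Kong University of Science and Technology; Microsoft Research Asia

AI summary: The paper proposes GEAR, a granularity-adaptive advantage reweighting method intended to improve reinforcement-learning training of LLM agents. GEAR uses self-distillation to reweight the trajectory-level advantage with token- and segment-level signals, giving finer-grained credit assignment. By comparing the policy with a teacher model, it dynamically adjusts the granularity of credit regions, making policy updates on long-horizon trajectories more effective. Experiments show that GEAR outperforms existing methods on several mathematical reasoning and tool-use benchmarks, especially on benchmarks where the baseline is weak.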

Abstract

Reinforcement learning has become a widely used post-training approach for LLM agents, where training commonly relies on outcome-level rewards that provide only coarse supervision. While finer-grained credit assignment is promising for effective policy updates, obtaining reliable local credit and assigning it to the right parts of the long-horizon trajectory remains an open challenge. In this paper, we propose Granularity-adaptivE Advantage Reweighting (GEAR), an adaptive-granularity credit assignment framework that reshapes the trajectory-level GRPO advantage using token- and segment-level signals derived from self-distillation. GEAR compares an on-policy student with a ground-truth-conditioned teacher to obtain a reference-guided divergence signal for identifying adaptive segment boundaries and modulating local advantage weights. This divergence often spikes at the onset of a semantic deviation, while later tokens in the same autoregressive continuation may return to low divergence. GEAR therefore treats such spikes as anchors for adaptive credit regions: where the student remains aligned with the teacher, token-level resolution is preserved; where it departs, GEAR groups the corresponding continuation into an adaptive segment and uses the divergence at the departure point to modulate the segment's advantage. Experiments across eight mathematical reasoning and agentic tool-use benchmarks with Qwen3 4B and 8B models show that GEAR consistently outperforms standard GRPO, self-distillation-only baselines, and token- or turn-level credit-assignment methods. The gains are especially strong on benchmarks with lower GRPO baseline accuracy, reaching up to around 20% over GRPO, suggesting that the proposed adaptive reweighting scheme is especially useful in more challenging long-horizon settings.

2605.11775 2026-05-15 cs.LG cs.CL

Entropy Polarity in Reinforcement Fine-Tuning: Direction, Asymmetry, and Control

Jiazheng Zhang, Ziche Fu, Junrui Shen, Yunbin Zhao, Yunke Zhang, Zhiheng Xi, Long Ma, Chenxin An, Zhihao Zhang, Shichun Liu, Dingwei Zhu, Shihan Dou, Shaofan Liu, Han Li, Wiggin Zhou, Aiden Adams, Tao Gui, Fei Huang, Qi Zhang, Xuanjing Huang

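Affiliations: Fudan NLP Group; Honor Device Co Ltd; University of Hong Kong; Shanghai Jiao Tong University; Tencent Hunyuan

AI summary: This paper studies the polarity of policy entropy in reinforcement learning and introduces entropy polarity, a new concept for predicting the direction in which a policy update changes entropy. Theoretical analysis reveals a structural asymmetry in entropy change, and on that basis the authors propose PAPO, a policy optimization method that achieves precise entropy control through advantage reweighting. Experiments show that PAPO delivers better training efficiency and larger reward gains on mathematical reasoning and agentic benchmarks.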

Abstract

Policy entropy has emerged as a fundamental measure for understanding and controlling exploration in reinforcement learning with verifiable rewards (RLVR) for LLMs. However, existing entropy-aware methods mainly regulate entropy through global objectives, while the token-level mechanism by which sampled policy updates reshape policy entropy remains underexplored. In this work, we develop a theoretical framework of entropy mechanics in RLVR. Our analysis yields a first-order approximation of the entropy change, giving rise to entropy polarity, a signed token-level quantity that predicts how much a sampled update expands or contracts entropy. This analysis further reveals a structural asymmetry: reinforcing frequent high-probability tokens triggers contraction tendencies, whereas expansive tendencies typically require lower-probability samples or stronger distributional correction. Empirically, we show that entropy polarity reliably predicts entropy changes, and that positive and negative polarity branches play complementary roles in preserving exploration while strengthening exploitation. Building on these insights, we propose Polarity-Aware Policy Optimization (PAPO), which preserves both polarity branches and implements entropy control through advantage reweighting. With the empirical entropy trajectory as an online phase signal, PAPO adaptively reallocates optimization pressure between entropy-expanding and entropy-contracting updates. Experiments on mathematical reasoning and agentic benchmarks show that PAPO consistently outperforms competitive baselines, while delivering superior training efficiency and substantial reward improvements.

2605.11611 2026-05-15 cs.AI

CuSearch: Curriculum Rollout Sampling via Search Depth for Agentic RAG

Jianghan Shen, Siqi Luo, Xinyu Cheng, Jing Xiong, Yue Li, Jiyao Liu, Jiashi Lin, Yirong Chen, Junjun He

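Affiliations: Nanjing University; Shanghai Artificial Intelligence Laboratory; Peking University; University of Hong Kong

AI summary: This paper proposes CuSearch, a curriculum rollout sampling framework for improving the training of agentic retrieval-augmented generation (RAG) systems under reinforcement learning with verifiable rewards (RLVR). The method uses search depth to adjust the rollout sampling strategy dynamically, focusing on deeper-search trajectories that contain more retrieval decision points and provide denser supervision. Experiments show that CuSearch markedly improves performance across models and retrieval frameworks, offering an effective, annotation-free optimization for RLVR training.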

Abstract

Reinforcement Learning with Verifiable Rewards (RLVR) has emerged as a promising paradigm for training agentic retrieval-augmented generation (RAG) systems from outcome-only supervision. Most existing methods optimize policies from uniformly sampled rollouts, implicitly treating all trajectories as equally informative. However, trajectories differ substantially in search depth and are therefore not equally informative: deeper-search trajectories contain more retrieval decision points and provide denser direct supervision for the retrieval sub-policy. Moreover, this heterogeneity grows over training as the within-batch depth distribution shifts toward higher values, yet uniform rollout sampling remains blind to this shift. To address this, we propose CuSearch, a curriculum rollout sampling framework built on Search-Depth Greedy Allocation (SDGA), a batch-level operator that reallocates a fixed update budget toward deeper-search trajectories. SDGA-Auto always targets the deepest available trajectories in the current batch, yielding an implicit training-aligned curriculum as the depth distribution shifts upward. SDGA-Phase explicitly advances the curriculum threshold as deeper trajectories become sufficiently abundant. Experiments across model types and retrieval frameworks show that CuSearch consistently improves performance, achieving up to 11.8 exact-match points over standard GRPO on ZeroSearch. These results establish per-trajectory search depth as a reliable, annotation-free proxy for retrieval supervision density in RLVR-based agentic RAG training.
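
The depth-greedy allocation described in this abstract can be illustrated with a short sketch. It assumes the operator simply selects which rollouts receive updates under a fixed budget; the paper's SDGA may reweight rather than select, and the rule used here for "sufficiently abundant" (at least `budget` rollouts at the next depth) is an assumption. The function names `sdga_auto` and `advance_threshold` are hypothetical.

```python
def sdga_auto(depths, budget):
    """Pick `budget` rollout indices, deepest search first.

    `depths[i]` is the number of search calls in rollout i. Ties keep their
    original batch order because Python's sort is stable.
    """
    order = sorted(range(len(depths)), key=lambda i: -depths[i])
    return order[:budget]


def advance_threshold(depths, threshold, budget):
    """Phase-style curriculum step (assumed rule, not taken from the paper).

    Raise the depth threshold by one once at least `budget` rollouts in the
    batch reach the next depth level; otherwise keep it unchanged.
    """
    deeper = sum(1 for d in depths if d >= threshold + 1)
    return threshold + 1 if deeper >= budget else threshold
```

With `depths = [1, 3, 2, 3, 0]` and a budget of 3, `sdga_auto` selects the two depth-3 rollouts and then the depth-2 rollout, so the shallow trajectories receive no update in that batch.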

2605.11459 2026-05-15 cs.RO cs.AI cs.CV cs.LG

Overcoming Dynamics-Blindness: Training-Free Pace-and-Path Correction for VLA Models

Yanyan Zhang, Chaoda Song, Vikash Singh, Xinpeng Li, Kai Ye, Zhe Hu, Zhongzhu Pu, Yu Yin, Vipin Chaudhary

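Affiliations: Case Western Reserve University; The Hong Kong Polytechnic University; Tsinghua University; InspireOmni AI

AI summary: Vision-Language-Action (VLA) models excel in flexibility and generalization, but most existing models use a single-frame observation paradigm and cannot perceive temporal dynamics, so their performance drops sharply in non-stationary environments. This paper proposes a training-free Pace-and-Path Correction method that applies a closed-form correction to chunked-action VLA models at inference time to compensate for dynamic changes. Starting from a single quadratic cost, joint optimization yields two orthogonally decomposed channels, one compressing execution pace and one adjusting the spatial path, which significantly raises task success rates in dynamic environments.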

Abstract

Vision-Language-Action (VLA) models achieve remarkable flexibility and generalization beyond classical control paradigms. However, most prevailing VLAs are trained under a single-frame observation paradigm, which leaves them structurally blind to temporal dynamics. Consequently, these models degrade severely in non-stationary scenarios, even when trained or finetuned on dynamic datasets. Existing approaches either require expensive retraining or suffer from latency bottlenecks and poor temporal consistency across action chunks. We propose Pace-and-Path Correction, a training-free, closed-form inference-time operator that wraps any chunked-action VLA. From a single quadratic cost, joint minimization yields a unified solution that decomposes orthogonally into two distinct channels. The pace channel compresses execution along the planned direction, while the path channel applies an orthogonal spatial offset, jointly absorbing the perceived dynamics within the chunk window. We evaluate our approach on a comprehensive diagnostic benchmark MoveBench designed to isolate motion as the sole controlled variable. Empirical results demonstrate that our framework consistently outperforms state-of-the-art training-free wrappers and dynamic-adaptive methods and improves success rates by up to 28.8% and 25.9% in absolute terms over foundational VLA models in dynamic-only and static-dynamic mixed environments, respectively.

2605.11410 2026-05-15 cs.AI

What Do EEG Foundation Models Capture from Human Brain Signals?

Ling Tang, Qian Chen, Jilin Mei, Houshi Xu, Quanshi Zhang, Jing Shao, Na Zou, Xia Hu, Dongrui Liu

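Affiliations: Shanghai Artificial Intelligence Laboratory; Shanghai Jiao Tong University; Fudan University; Tongji University; University of Houston

AI summary: This study examines what EEG foundation models learn from human brain signals and how their representations relate to traditional hand-crafted features. Using layer-wise ridge regression, cross-covariance subspace erasure, and related methods, it finds that EEG foundation models perform well on multiple clinical tasks and that their advantage comes mainly from a combination of frequency-domain features and several other hand-crafted feature families. It also reveals differences in performance across tasks and gives a clear direction for future feature discovery.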

Abstract

Clinical electroencephalogram (EEG) analysis rests on a hand-crafted feature catalog refined over decades, e.g., band power, connectivity, complexity, and more. Modern EEG foundation models bypass this catalog, learn directly from raw signals via self-supervised pretraining, and match or outperform feature-engineered baselines on most clinical benchmarks. Whether the two representations align is an open question, which we decompose into three sub-questions: what does the model learn, what does the model use, and how much can be explained. We answer them with layer-wise ridge probing, LEACE-style cross-covariance subspace erasure, and a transparent classifier benchmarked against a random-feature baseline. The audit covers three foundation models (CSBrain, CBraMod, LaBraM), five clinical tasks (MDD, Stress, ISRUC-Sleep, TUSL, Siena), and a 6-family 63-feature lexicon. Of the $945$ (model, task, feature) units, $648$ ($68.6\%$) are representation-causal and $199$ ($21.1\%$) are encoded-only. Across tasks, $50$ features qualify as universal candidates with strong support (all three architectures RC) in two or more tasks. Frequency-domain features dominate, but the other five families each contribute substantial causal mass. Confirmed features recover, on average, $79.3\%$ of the foundation model's advantage over the random baseline, with a clean task gradient (MDD $\approx 0.99$ down to Stress $\approx 0.56$): tasks near ceiling are almost fully recovered by the lexicon, while harder tasks leave a non-trivial residual that pinpoints a concrete target for future concept discovery.

2605.10664 2026-05-15 cs.CL cs.AI

Prompt-Activation Duality: Improving Activation Steering via Attention-Level Interventions

Diancheng Kang, Zheyuan Liu, Ningshan Ma, Yue Huang, Zhaoxuan Tan, Meng Jiang

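Affiliations: Southern University of Science and Technology; University of Notre Dame; Massachusetts Institute of Technology

AI summary: The paper studies how to control language model behavior more effectively in dialogue and proposes a new activation steering method to address the cumulative failure of conventional methods over long conversations. The authors find that key-value cache contamination is the main cause of degraded steering, and propose Gated Cropped Attention-Delta steering (GCAD), which extracts steering signals from the effect of the system prompt on self-attention and gates them at the token level. Experiments show that the method preserves control over persona traits while markedly improving coherence and trait expression in long conversations.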

Comments 23 pages, 5 figures. This paper proposes GCAD, an attention-level activation steering method for more stable multi-turn behavior control

Abstract

Activation steering controls language model behavior by adding directions to internal representations at inference time, but standard residual-stream steering can fail in stateful dialogue. We identify KV-cache contamination as a key failure mode: steered token states are stored and repeatedly reused, turning a local perturbation into cumulative coherence degradation. To address this challenge, we propose Gated Cropped Attention-Delta steering (GCAD), which extracts steering signals from system-prompt contributions to self-attention and applies them with token-level gating. Across persona-steering experiments, GCAD preserves trait control while substantially improving long-horizon coherence. On the main multi-turn benchmark, GCAD improves average coherence drift from -18.6 to -1.9 and raises turn-10 trait expression from 78.0 to 93.1. These results suggest that activation steering becomes more reliable when interventions follow the prompt-mediated pathways that models already use for behavioral control.

2605.10550 2026-05-15 cs.CL

Multi-domain Multi-modal Document Classification Benchmark with a Multi-level Taxonomy

Denghao Ma, Qing Liu, Zulong Chen, Chuanfei Xu, Jia Xu, Zhibo Yang, Wei Shao, Zhao Li

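Affiliations: Beijing Information Science and Technology University; Alibaba Group; Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ); Guangzhou University; Zhejiang Lab

AI summary: This paper presents MMM-Bench, a multi-domain, multi-modal document classification benchmark intended to address the oversimplification of existing document classification benchmarks. The benchmark builds a deep five-level taxonomy and collects 5,990 real multi-modal documents from 12 commercial domains at Alibaba, each annotated by domain experts with a complete hierarchical path. By establishing comprehensive baselines, the study systematically analyzes four core challenges in the benchmark and offers corresponding insights, providing a foundation for research on multi-level, multi-domain document classification.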

Abstract

Document classification forms the backbone of modern enterprise content management, yet existing benchmarks remain trapped in oversimplified paradigms -- single-domain settings with flat label structures -- that bear little resemblance to the hierarchical, multi-modal, and cross-domain nature of real-world business documents. This gap not only misrepresents practical complexity but also stifles progress toward industrially viable document intelligence. To bridge this gap, we construct the first Multi-level, Multi-domain, Multi-modal document classification Benchmark (MMM-Bench). MMM-Bench includes (1) a deeply hierarchical taxonomy spanning five levels that captures the authentic organizational logic of business documentation; and (2) 5,990 real-world multi-modal documents meticulously curated from 12 commercial domains in Alibaba. Each document is manually annotated with a complete hierarchical path by domain experts. We establish comprehensive baselines on MMM-Bench, covering both open-weight and API-based models. Through systematic experiments, we identify four fundamental challenges within MMM-Bench and propose corresponding insights. To provide a solid foundation for advancing research in multi-level, multi-domain document classification, we release all of the data and the evaluation toolkit of MMM-Bench at https://github.com/MMMDC-Bench/MMMDC-Bench.

2605.10496 2026-05-15 cs.CV

M$^2$E-UAV: A Benchmark and Analysis for Onboard Motion-on-Motion Event-Based Tiny UAV Detection

Weiqi Yan, Lixin Chen, Xiangrui Hou, Zhipeng Cai, Youbiao Wang, Yangyang Shi, Yu Zang, Cheng Wang

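Affiliations: Fujian Key Laboratory of Urban Intelligent Sensing and Computing, School of Informatics, Xiamen University, Xiamen, China; Meta, Menlo Park, USA

AI summary: This paper presents M$^2$E-UAV, the first dataset and benchmark for tiny UAV detection with a moving event camera, targeting the setting where observer and target move simultaneously and detection suffers from heavy background-event interference and sparse targets. The dataset contains synchronized event streams and IMU data and provides UAV foreground labels derived by temporal propagation, supporting evaluation of models across several representations. Experiments show that existing methods remain quite limited when faced with sparse targets and dense background events.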

Abstract

Tiny UAV detection from an onboard event camera is difficult when the observer and target move at the same time. In this motion-on-motion regime, ego-motion activates background edges across buildings, vegetation, and horizon structures, while the UAV may appear as a sparse event cluster. Unlike static- or ground-observer event-based UAV detection, onboard UAV-view detection breaks the clean-background assumption because sensor ego-motion can activate dense background events over the entire field of view. To explore this practical problem, we present M$^2$E-UAV, to the best of our knowledge, the first onboard UAV-view motion-on-motion event-based dataset and benchmark for tiny UAV detection, where both the sensing platform and the target UAV are moving. M$^2$E-UAV provides synchronized event streams and IMU measurements collected from an onboard sensing platform, together with event-level UAV foreground labels derived from temporally propagated 10 Hz bounding-box annotations. The processed benchmark contains 87,223 training samples and 21,395 validation samples across four scene families: sunny building-forest, sunny farm-village, sunset building-forest, and sunset farm-village. We define a train/validation split and an evaluation protocol for comparing representative existing baselines across event-frame, voxel-grid, and point-set representations, with optional IMU input. The benchmark results show that existing baselines remain limited under sparse tiny-target evidence and dense ego-motion-induced background events. Code and benchmark files will be released at https://github.com/Wickyan/M2E-UAV.

2605.10364 2026-05-15 cs.LG

DeepLévy: Learning Heavy-Tailed Uncertainty in Highly Volatile Time Series

Yang Yang, Du Yin, Hao Xue, Flora Salim

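Affiliations: University of New South Wales; The Hong Kong University of Science and Technology (Guangzhou)

AI summary: This paper addresses the key problem of modeling uncertainty in highly volatile, heavy-tailed time series and proposes DeepLévy, a deep learning framework. The method exploits the properties of Lévy stable distributions and learns a mixture of Lévy distributions by minimizing the discrepancy between empirical and parametric characteristic functions, capturing the uncertainty of extreme events. Experiments show that DeepLévy outperforms existing state-of-the-art methods on tail-risk metrics, particularly under high volatility.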

Abstract

Modeling uncertainty in heavy-tailed time series remains a critical challenge for deep probabilistic forecasting models, which often struggle to capture abrupt, extreme events. While Lévy stable distributions offer a natural framework for modeling such non-Gaussian behaviors, the intractability of their probability density functions severely limits conventional likelihood-based inference. To address this, we introduce DeepLévy, a neural framework that learns mixtures of Lévy stable distributions by minimizing the discrepancy between empirical and parametric characteristic functions. DeepLévy incorporates a mixture mechanism that adaptively learns context-dependent weights and parameters over multiple Lévy components, enabling flexible multi-horizon uncertainty modeling. Evaluations on both real and synthetic datasets demonstrate that DeepLévy outperforms state-of-the-art deep probabilistic forecasting approaches in tail risk metrics, especially under extreme volatility.
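
The characteristic-function matching idea in this abstract can be sketched for the simplest case. The example below is an assumption-laden illustration, not DeepLévy itself: it uses a single symmetric stable component (skewness $\beta = 0$), whose characteristic function $\exp(i\delta t - |\gamma t|^{\alpha})$ has a closed form, and a plain mean-squared discrepancy on a fixed frequency grid. The function names and the grid are hypothetical.

```python
import numpy as np

def empirical_cf(x, t):
    """Empirical characteristic function of samples x at frequencies t."""
    return np.exp(1j * np.outer(t, x)).mean(axis=1)

def symmetric_stable_cf(t, alpha, gamma, delta=0.0):
    """CF of a symmetric alpha-stable law (beta = 0): exp(i*delta*t - |gamma*t|^alpha)."""
    return np.exp(1j * delta * t - np.abs(gamma * t) ** alpha)

def cf_loss(x, t, alpha, gamma, delta=0.0):
    """Mean squared gap between empirical and parametric CFs on the grid t."""
    diff = empirical_cf(x, t) - symmetric_stable_cf(t, alpha, gamma, delta)
    return float(np.mean(np.abs(diff) ** 2))
```

This loss never evaluates a probability density, which is what makes it usable for stable laws. For standard Gaussian samples the loss should be small at $\alpha = 2$, $\gamma = 1/\sqrt{2}$ (the Gaussian special case) and larger at the Cauchy setting $\alpha = 1$.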

2605.10310 2026-05-15 cs.AI cs.CY cs.HC q-bio.NC

Positive Alignment: Artificial Intelligence for Human Flourishing

Ruben Laukkonen, Seb Krier, Chloé Bakalar, Shamil Chandaria, Morten Kringelbach, Adam Elwood, Daniel Ford, Fernando Rosas, Maty Bohacek, Matija Franklin, Nenad Tomašev, Stephanie Chan, Verena Rieser, Roma Patel, Michael Levin, Arun Rao

Affiliations: Department of Psychiatry, University of Oxford; Flourishing Intelligence Program, Centre for Eudaimonia and Human Flourishing, Linacre College, University of Oxford; Google DeepMind; LIFE; OpenAI; Anthropic; University of California, Los Angeles; Aily Labs; Stanford University; Tufts University; Positive AI Labs; Department of Informatics, University of Sussex; Department of Brain Sciences, Imperial College London

AI summary: This paper proposes the concept of Positive Alignment: developing AI systems that actively support human and ecological flourishing while remaining safe and cooperative. Unlike existing alignment research focused on safety and risk prevention, positive alignment emphasizes that systems should be pluralistic, decentralized, context-sensitive, and user-authored, and that many current alignment problems can be addressed by cultivating virtues and promoting human well-being. The paper also sets out technical directions and design principles across the LLM and agent lifecycle to promote tolerance of disagreement and decentralized governance.

Abstract

Existing alignment research is dominated by concerns about safety and preventing harm: safeguards, controllability, and compliance. This paradigm of alignment parallels early psychology's focus on mental illness: necessary but incomplete. What we call Positive Alignment is the development of AI systems that (i) actively support human and ecological flourishing in a pluralistic, polycentric, context-sensitive, and user-authored way while (ii) remaining safe and cooperative. It is a distinct and necessary agenda within AI alignment research. We argue that several existing failures of alignment (e.g., engagement hacking, loss of human autonomy, failures in truth-seeking, low epistemic humility, error correction, lack of diverse viewpoints, and being primarily reactive rather than proactive) may be better addressed through positive alignment, including cultivating virtues and maximizing human flourishing. We highlight a range of challenges, open questions, and technical directions (e.g., data filtering and upsampling, pre- and post-training, evaluations, collaborative value collection) for different phases of the LLM and agents lifecycle. We end with design principles for promoting disagreement and decentralization through contextual grounding, community customization, continual adaptation, and polycentric governance; that is, many legitimate centers of oversight rather than one institutional or moral chokepoint.

2605.10289 2026-05-15 cs.LG stat.ML

Sample-Mean Anchored Thompson Sampling for Offline-to-Online Learning with Distribution Shift

Bochao Li, Yao Fu, Wei Chen, Fang Kong

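Affiliations: Southern University of Science and Technology; Microsoft Research

AI summary: This paper studies offline-to-online learning under distribution shift, aiming to use offline data to improve online decision-making. To address the estimation bias of conventional Thompson sampling (TS) under distribution shift, the authors propose sample-mean anchored Thompson sampling (Anchor-TS), which introduces a median anchoring rule that corrects the bias induced by distribution shift and improves the algorithm's stability and performance. Theoretical analysis shows that the method can safely use offline data to accelerate online learning, and experiments confirm its advantage across a range of scenarios.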

Abstract

Offline-to-online learning aims to improve online decision-making by leveraging offline logged data. A central challenge in this setting is the distribution shift between offline and online environments. While some existing works attempt to leverage shifted offline data, they largely rely on UCB-type algorithms. Thompson sampling (TS) represents another canonical class of bandit algorithms, well known for its strong empirical performance and naturally suited to offline-to-online learning through its Bayesian formulation. However, unlike UCB indices, posterior samples in TS are not guaranteed to be optimistic with respect to the true arm means. This makes indices constructed from purely online and hybrid data difficult to compare and complicates their use. To address this issue, we propose sample-mean anchored TS (Anchor-TS), which introduces a novel median-based anchoring rule that defines the arm index as the median of an online posterior sample, a hybrid posterior sample, and the online sample mean. The median anchoring systematically corrects bias induced by distribution shift by mitigating over-estimation for suboptimal arms and under-estimation for optimal arms, while exploiting offline information to obtain more accurate estimates when the shift is small. We establish theoretical guarantees showing that the proposed algorithm safely leverages offline data to accelerate online learning, and quantifying how the degree of distribution shift and the size of offline data affect the resulting regret reduction. Extensive experiments demonstrate consistent improvements of our algorithm over baselines.
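
The median anchoring rule stated in this abstract, where the arm index is the median of an online posterior sample, a hybrid posterior sample, and the online sample mean, can be sketched as below. The posterior model is an assumption made for illustration: unit-variance Gaussian rewards with a flat prior, so each posterior is normal with variance 1/n, and every arm is assumed to have at least one online pull. The function names are hypothetical.

```python
import numpy as np

def anchor_index(online_sample, hybrid_sample, online_mean):
    """Arm index: median of the two posterior samples and the online sample mean."""
    return float(np.median([online_sample, hybrid_sample, online_mean]))

def select_arm(on_sum, on_cnt, off_sum, off_cnt, rng):
    """Choose the arm with the largest anchored index.

    on_* hold online reward sums and pull counts per arm; off_* hold the same
    statistics for the (possibly shifted) offline data.
    """
    on_mean = on_sum / on_cnt
    hyb_cnt = on_cnt + off_cnt
    hyb_mean = (on_sum + off_sum) / hyb_cnt
    # Assumed Gaussian posteriors: N(mean, 1/n) for online-only and pooled data.
    on_sample = rng.normal(on_mean, 1.0 / np.sqrt(on_cnt))
    hyb_sample = rng.normal(hyb_mean, 1.0 / np.sqrt(hyb_cnt))
    index = np.median(np.stack([on_sample, hyb_sample, on_mean]), axis=0)
    return int(np.argmax(index))
```

Taking the median means a hybrid sample pulled far from the online evidence by shifted offline data cannot set the index on its own: the index always lies between the online sample and the online mean unless the hybrid sample itself falls between them.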