arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.06797 2026-05-11 cs.LG

MIND: Monge Inception Distance for Generative Models Evaluation

Quentin Berthet, Yu-Han Wu, Clement Crepy, Romuald Elie, Klaus Greff, Michael Eli Sander

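Affiliations: Google DeepMind; LPSM, Sorbonne Université

AI summary: MIND evaluates generative models with the sliced Wasserstein distance, overcoming limitations of FID: it is more sample-efficient, faster to compute, and more robust to adversarial attacks. Experiments show that MIND with 5k samples can replace FID with 50k samples.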

Abstract

We propose the Monge Inception Distance (MIND), a metric for evaluating generative models that addresses key limitations of the widely adopted Fréchet Inception Distance (FID). The MIND metric leverages the sliced Wasserstein distance to compare distributions by averaging one-dimensional optimal transport distances, efficiently computed via sorting. This approach circumvents the estimation of high-dimensional means and covariance matrices, which underlie FID's poor sample complexity and vulnerability to adversarial attacks. We empirically demonstrate three primary advantages: (i) it is more sample-efficient by one order of magnitude, (ii) it is faster to compute by two orders of magnitude, (iii) it is more robust to adversarial attacks such as moment-matching. We show that MIND with 5k samples can replace the evaluation performance of FID with 50k samples, providing high correlation with this standard benchmark and superior discriminative performance. We further demonstrate that even smaller sample sizes (e.g., 1k or 2k) remain highly informative for rapid model iteration.
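
The sorting-based computation described in the abstract can be sketched as follows. This is a minimal NumPy illustration of a Monte Carlo sliced Wasserstein estimate on generic feature arrays, not the authors' MIND implementation: the function name `sliced_wasserstein`, the number of projections, the choice p=2, and the equal-sample-count restriction are all assumptions made for the sketch.

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=128, p=2, seed=0):
    """Monte Carlo estimate of the sliced p-Wasserstein distance.

    x, y: arrays of shape (n, d) holding the same number of samples.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample counts and dimensions")
    rng = np.random.default_rng(seed)
    # Random unit directions on the sphere in R^d.
    directions = rng.normal(size=(n_projections, x.shape[1]))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Project both sample sets onto every direction: shape (n, n_projections).
    x_proj = x @ directions.T
    y_proj = y @ directions.T
    # In 1D, optimal transport between equal-size samples pairs sorted values.
    x_sorted = np.sort(x_proj, axis=0)
    y_sorted = np.sort(y_proj, axis=0)
    per_direction = np.mean(np.abs(x_sorted - y_sorted) ** p, axis=0)
    return float(np.mean(per_direction) ** (1.0 / p))
```

Each direction needs only a projection and a sort, so the cost is O(n log n) per projection and no d x d covariance matrix has to be estimated.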

2605.06788 2026-05-11 cs.LG cs.MA

Conformal Agent Error Attribution

Naihe Feng, Yi Sui, Shiyi Hou, Ga Wu, Jesse C. Cresswell

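Affiliations: Dalhousie University; Layer 6 AI

AI summary: This paper proposes an agent error attribution method built on conformal prediction. Its prediction sets are contiguous sequences, which enables efficient recovery and debugging and gives multi-agent systems a principled uncertainty layer.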

Comments 10 pages

Abstract

When multi-agent systems (MAS) fail, identifying where the decisive error occurred is the first step for automated recovery to an earlier state. Error attribution remains a fundamental challenge due to the long interaction traces that large language model-based MAS generate. This paper presents a framework for error attribution based on conformal prediction (CP), which provides finite-sample, distribution-free coverage guarantees. We introduce new algorithms for filtration-based CP designed for sequential data such as agent trajectories. Unlike existing CP algorithms, our approach predicts sets that are contiguous sequences to enable efficient recovery and debugging. We verify our theoretical guarantees on a variety of agents and datasets, show that errors can be precisely isolated, and then use prediction sets to roll back MAS so that they correct their own errors. Our overall approach is model-agnostic, and offers a principled uncertainty layer for MAS error attribution. We release code at https://github.com/layer6ai-labs/conformal-agent-error-attribution.
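
As background for the coverage guarantee mentioned above, here is a minimal sketch of standard split conformal prediction applied to per-step scores. It is not the paper's filtration-based algorithm and does not enforce contiguous sets; the function names and the idea of assigning each trajectory step a nonconformity score are assumptions for illustration.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Split-conformal quantile of calibration scores at miscoverage alpha."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed order statistic
    if rank > n:
        # Too few calibration points: only the trivial set is valid.
        return math.inf
    return sorted(cal_scores)[rank - 1]

def prediction_set(step_scores, threshold):
    """Indices of trajectory steps whose score does not exceed the threshold."""
    return [i for i, s in enumerate(step_scores) if s <= threshold]
```

With calibration scores taken at the true error step of held-out trajectories, the returned set covers the true step with probability at least 1 - alpha under exchangeability, which is the standard split conformal guarantee.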

2605.06772 2026-05-11 cs.AI cs.HC hep-ph hep-th

When Does Critique Improve AI-Assisted Theoretical Physics? SCALAR: Structured Critic--Actor Loop for Agentic Reasoning

Vasilis Niarchos, Constantinos Papageorgakis, Alexander G. Stapleton, Sokratis Trifinopoulos

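Affiliations: Department of Physics, CCTP and ITCP, University of Crete, 71303, Greece; Centre for Theoretical Physics, Department of Physics and Astronomy, Queen Mary University of London, London E1 4NS, United Kingdom; Theoretical Physics Department, CERN, Geneva, Switzerland

AI summary: This paper studies how the interaction between researchers and agents affects AI-assisted theoretical physics, using the SCALAR framework to evaluate which interaction structures help or hinder scientific discovery.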

Comments 17 pages; 9 figures

Abstract

As large language models (LLMs) show increasing promise on research-level physics reasoning tasks and agentic AI becomes more common, a practical question emerges: How does the interaction between researchers and agents affect the results? We study this using SCALAR (Structured Critic--Actor Loop for AI Reasoning), an Actor--Critic--Judge pipeline applied to quantum field theory and string theory problems. The Actor proposes solutions, the Critic provides iterative feedback, and an independent Judge evaluates the transcript against reference solutions. We vary the Actor persona, the Critic feedback strategy, and the Actor model family and scale. Multi-turn dialogue improves over single-shot attempts throughout, but both the mechanism of improvement and the value of different prompting choices depend strongly on the Actor--Critic pairing. Increasing the scale within one model family (e.g. from the 8B-parameter DeepSeek-R1 variant to DeepSeek-R1 70B) improves some easier-problem behavior, but does not remove the hardest bottleneck we observe. Critic feedback strategy matters most clearly in the asymmetric Actor--Critic setting (e.g., a lightweight Haiku Actor guided by a stronger Sonnet Critic), where constructive feedback improves mean-score outcomes. In same-family Actor--Critic settings, strategy effects are weaker: lenient feedback is sometimes favored, while strict and adversarial feedback are not beneficial. Taken together, SCALAR provides a controlled testbed for evaluating which interaction structures help or hinder AI-driven scientific discovery.
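
The Actor--Critic--Judge pipeline can be pictured as a simple loop. The sketch below is hypothetical: `run_scalar_dialogue` and the three callables are stand-ins for LLM calls, and the turn structure is an assumption rather than the authors' protocol.

```python
def run_scalar_dialogue(problem, actor, critic, judge, max_turns=3):
    """Alternate Actor proposals and Critic feedback, then score the transcript.

    actor, critic: callables taking the transcript and returning a message.
    judge: callable taking the full transcript and returning a score.
    """
    transcript = [("problem", problem)]
    for turn in range(max_turns):
        transcript.append(("actor", actor(transcript)))
        if turn < max_turns - 1:
            # No feedback is requested after the final attempt.
            transcript.append(("critic", critic(transcript)))
    return transcript, judge(transcript)
```

Setting `max_turns=1` gives the single-shot baseline, so the same function can compare single-shot and multi-turn runs under one Judge.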

2605.06765 2026-05-11 cs.CL cs.AI

VITA-QinYu: Expressive Spoken Language Model for Role-Playing and Singing

Jiacheng Xu, Heting Gao, Liufei Xie, Zhenchuan Yang, Lijiang Li, Yiting Chen, Bin Zhang, Meng Chen, Chaoyu Fu, Weifeng Zhao, Wenjiang Zhou

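Affiliations: QinYu Team; VITA-Team

AI summary: VITA-QinYu is presented as the first end-to-end spoken language model to support both role-playing and singing generation. A hybrid speech-text paradigm with multi-codebook audio tokens improves expressiveness; the model outperforms peer SLMs by 7 percentage points on role-playing benchmarks and also reports gains on singing and conversation tasks.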

Comments https://tme-lyra-lab.github.io/VITA-QinYu/

Abstract

Human speech conveys expressiveness beyond linguistic content, including personality, mood, or performance elements, such as a comforting tone or humming a song, which we formalize as role-playing and singing. We present VITA-QinYu, the first expressive end-to-end (E2E) spoken language model (SLM) that goes beyond natural conversation to support both role-playing and singing generation. VITA-QinYu adopts a hybrid speech-text paradigm that extends interleaved text-audio modeling with multi-codebook audio tokens, a design enabling richer paralinguistic representation while preserving a clear separation between modalities to avoid interference. We further develop a comprehensive data generation pipeline to synthesize a total of 15.8K hours of natural conversation, role-playing, and singing data for training. VITA-QinYu demonstrates superior expressiveness, outperforming peer SLMs by 7 percentage points on objective role-playing benchmarks, and surpassing peer models by 0.13 points on a 5-point MOS scale for singing. Simultaneously, it achieves state-of-the-art conversational accuracy and fluency, exceeding prior SLMs by 1.38 and 4.98 percentage points on the C3 and URO benchmarks, respectively. We open-source our code and models and provide an easy-to-use demo with full-stack support for streaming and full-duplex interaction.

2605.06764 2026-05-11 cs.LG cs.AI

Revisiting Adam for Streaming Reinforcement Learning

Florin Gogianu, Adrian Catalin Lutu, Razvan Pascanu

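Affiliations: GitHub

AI summary: This paper studies how well established update rules work in the online (streaming) setting, finds that C51 is competitive in streaming reinforcement learning, and builds on this to propose the Adaptive Q(λ) algorithm, which surpasses existing methods.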

Abstract

Learning from a sequence of interactions, as soon as observations are perceived and acted upon, without explicitly storing them, holds the promise of simpler, more efficient and adaptive algorithms. For over a decade, however, deep reinforcement learning walked the contrary path, augmenting agents with replay buffers or parallel sampling routines, in an effort to tame learning instability. Recently, this topic has been revisited by Elsayed et al. (2024), focusing on update computation through eligibility traces and modifications to the optimisation routine, resulting in the StreamQ algorithm. In this work we take a step back, investigating the efficacy of established updates, such as those implemented by DQN and C51 within this online setting. Not only do we find that they perform well, but through analysing how the optimisation algorithm generally, and Adam in particular, interacts with these updates, we contend that two properties are essential for robust performance: i) the derivative of the objective is to be bounded and ii) weight updates are variance-adjusted. Rigorous and exhaustive experimentation demonstrates that C51, which exhibits both characteristics, is competitive with StreamQ across a subset of 55 Atari games. Using these insights, we derive a variance-adjusted algorithm based on eligibility traces, termed Adaptive Q$(λ)$, which approaches double the human baseline on the same subset, surpassing existing methods by all performance metrics.
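
To make the "variance-adjusted" property concrete, here is the standard Adam update for a single scalar parameter. This is textbook Adam, not the paper's Adaptive Q(λ); the helper name `adam_step` and the tuple-based state are assumptions of this sketch.

```python
import math

def adam_step(param, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; `state` holds (m, v, t)."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad    # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                 # bias correction
    v_hat = v / (1 - beta2 ** t)
    new_param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return new_param, (m, v, t)
```

On the first step the update has magnitude close to `lr` whether the gradient is 1 or 1000, because the step is divided by the square root of the second-moment estimate.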

2605.06761 2026-05-11 cs.AI cs.CV cs.LG

Weblica: Scalable and Reproducible Training Environments for Visual Web Agents

Oğuzhan Fatih Kar, Roman Bachmann, Yuanzheng Gong, Anders Boesen Lindbo Larsen, Afshin Dehghan

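Affiliations: Apple

AI summary: Weblica is a scalable, reproducible framework for training visual web agents. It combines HTTP-level caching with LLM-based environment synthesis to enable RL training across thousands of diverse web environments; its best model, Weblica-8B, performs strongly on multiple web navigation benchmarks.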

Comments 28 pages, 19 figures

Abstract

The web is complex, open-ended, and constantly changing, making it challenging to scale training data for visual web agents. Existing data collection attempts remain limited to offline trajectories for supervised fine-tuning or a handful of simulated environments for RL training, thus failing to capture web diversity. We propose Weblica (Web Replica), a framework for constructing reproducible and scalable web environments. Our framework leverages 1) HTTP-level caching to capture and replay stable visual states while preserving interactive behavior and 2) LLM-based environment synthesis grounded in real-world websites and core web navigation skills. Using this framework, we scale RL training to thousands of diverse environments and tasks. Our best model, Weblica-8B, outperforms open-weight baselines of similar size across multiple web navigation benchmarks while using fewer inference steps, scales favorably with additional test-time compute, and is competitive with API models.
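
One way to picture HTTP-level caching for reproducible environments is a record-and-replay store keyed by request. The class below is a hypothetical sketch: the name `ReplayCache`, the key layout, and the `fetch` callable are assumptions, and the abstract does not describe the paper's actual caching layer.

```python
import hashlib

class ReplayCache:
    """Record each HTTP response once, then replay it deterministically."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable(method, url, body) -> bytes
        self._store = {}
        self.misses = 0          # number of requests that reached the network

    @staticmethod
    def _key(method, url, body=b""):
        # Hash the body so POST requests with different payloads stay distinct.
        return (method.upper(), url, hashlib.sha256(body).hexdigest())

    def request(self, method, url, body=b""):
        key = self._key(method, url, body)
        if key not in self._store:
            self.misses += 1
            self._store[key] = self._fetch(method, url, body)
        return self._store[key]
```

Once a page has been recorded, repeated episodes see byte-identical responses even if the live site changes, which is the property a reproducible training environment needs.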

2605.06759 2026-05-11 cs.RO

An Aerial Manipulator for Perception-Driven Flower Targeting Toward Contactless Pollination in Vertical Farming

Chenzhe Jin, Zhuohang Wu, Yifan Cai, Xiangqi Li, Jan Ming Kevin Tan, Narsimlu Kemsaram, Valerio Modugno

Affiliations: Department of Computer Science, University College London, London, United Kingdom; Department of Artificial Intelligence, University of Malaya, Kuala Lumpur, Malaysia

AI summary: This paper presents an aerial manipulator platform for perception-driven flower targeting in vertical farms. It combines RGBD perception, MPPI control, and a lightweight 2-DoF manipulator to achieve precise end-effector positioning and stable flower targeting.

Comments This paper has been accepted for publication in the Proceedings of the 2026 4th International Conference on Robotics, Control and Vision Engineering (RCVE 2026), 10-12 July, 2026, Tokyo, Japan

Abstract

The decline of natural pollinators has created a major challenge for crop production in controlled indoor agriculture, particularly in vertical farming environments where natural insect pollination is absent. This motivates the development of robotic systems capable of performing precise flower-targeting tasks while minimizing physical interference with delicate floral structures. This paper presents an aerial manipulator platform for perception-driven flower detection, localization, and approach in vertical farming environments. The proposed system integrates onboard RGBD-based perception, model predictive path integral (MPPI)-based unmanned aerial vehicle (UAV) control on a PX4 platform, and a lightweight 2-DoF manipulator for precise end-effector positioning. The platform is evaluated in both MuJoCo simulation and UAV lab experiments using a flower-targeting testbed. The experimental results demonstrate stable UAV flight, reliable flower localization, and centimeter-level end-effector positioning accuracy. In simulation, the proposed controller achieves consistent trajectory convergence and accurate target alignment. In the real-world UAV lab environment, the integrated perception-control-manipulation framework enables stable flower-targeted positioning and end-effector alignment under constrained aerial operation. These results validate the proposed aerial manipulator as a robust robotic carrier and positioning framework for future contactless pollination systems. While the current study focuses on perception-guided targeting and positioning, the developed platform provides a practical foundation for integrating advanced contactless end effectors, including acoustic-based pollen manipulation modules, in future work.
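
For readers unfamiliar with MPPI, a minimal sketch on a toy 1D system follows. It assumes a point whose control input is its velocity and a quadratic tracking cost; the UAV dynamics, the PX4 integration, and all parameter values here are not from the paper.

```python
import numpy as np

def mppi_step(x, u_nominal, target, rng, n_samples=256, sigma=1.0, lam=1.0, dt=0.1):
    """One MPPI update of the nominal control sequence for a 1D point."""
    horizon = len(u_nominal)
    noise = rng.normal(scale=sigma, size=(n_samples, horizon))
    controls = u_nominal + noise
    # Roll out every sampled control sequence from the current state.
    positions = x + dt * np.cumsum(controls, axis=1)
    costs = np.sum((positions - target) ** 2, axis=1)
    # Softmin weights: low-cost rollouts dominate the update.
    weights = np.exp(-(costs - costs.min()) / lam)
    weights /= weights.sum()
    return u_nominal + weights @ noise

def track_target(x0=0.0, target=1.0, steps=50, horizon=10, dt=0.1, seed=0):
    """Receding-horizon loop: replan, apply the first control, shift the plan."""
    rng = np.random.default_rng(seed)
    x, u = x0, np.zeros(horizon)
    for _ in range(steps):
        u = mppi_step(x, u, target, rng, dt=dt)
        x += dt * u[0]                 # apply only the first control
        u = np.append(u[1:], 0.0)      # warm-start the next plan
    return x
```

The update needs only forward rollouts and a weighted average, so no gradient of the dynamics or cost is required.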

2605.06756 2026-05-11 cs.LG cs.SY eess.SY

Physics-based Digital Twins for Integrated Thermal Energy Systems Using Active Learning

Umme Mahbuba Nabila, Paul Seurin, Linyu Lin, Majdi I. Radaideh

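Affiliations: Department of Nuclear Engineering & Radiological Sciences, University of Michigan, Ann Arbor, MI 48109, United States; Department of Computer Science & Engineering, University of Michigan, Ann Arbor, MI 48109, United States; Nuclear Science & Technology Division, Idaho National Laboratory, Idaho Falls, ID 83415, United States

AI summary: This paper proposes an active learning framework that couples physics-based simulation with data-driven surrogates to build digital twins for thermal energy distribution systems, reaching comparable predictive accuracy with as few as one-fifth of the simulation trajectories required by random sampling.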

Comments 23 pages, 12 figures, and 2 tables

Abstract

Real-time supervisory control of thermal energy distribution systems requires digital twins that are accurate, interpretable, and uncertainty-aware, yet remain data and computationally efficient. High-fidelity simulations alone are costly, while purely data-driven surrogates often lack robustness. To address these challenges, this work proposes an active learning (AL) framework that couples system-level Modelica simulations with four simpler physics-informed and data-driven surrogate modeling approaches: deterministic Sparse Identification of Nonlinear Dynamics with Control (SINDyC), its probabilistic multivariate-Gaussian extension (MvG-SINDyC), feedforward neural network (FNN), and gated recurrent unit (GRU) network. Tailored to each surrogate, model-specific AL query strategies are employed, including Mahalanobis-distance sampling in coefficient space for MvG-SINDyC and error-based sampling in prediction space for SINDyC, FNN, and GRU, allowing the learning process to prioritize dynamically informative trajectories. The proposed approach is demonstrated on the glycol heat exchanger (GHX) subsystem of the Thermal Energy Distribution System (TEDS) at Idaho National Laboratory. Across key GHX outputs--the bypass mass flow rate $\dot{m}_{\mathrm{GHX}}$ and heat transfer rate $Q_{\mathrm{GHX}}$--the AL framework achieves comparable predictive accuracy using as few as one-fifth of the simulation trajectories required by random sampling. Among the evaluated surrogates, the GRU achieves the highest predictive fidelity, while SINDyC remains the most computationally efficient and interpretable. The probabilistic MvG-SINDyC surrogate further enables uncertainty quantification and exhibits the largest computational gains under AL.
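
A Mahalanobis-distance query rule could look roughly like the sketch below. It assumes the probabilistic surrogate supplies a mean and covariance over coefficients and that each candidate trajectory is summarized by a coefficient vector; the function names and the selection rule (pick the most distant candidates) are illustrative assumptions, not the authors' exact strategy.

```python
import numpy as np

def mahalanobis(points, mean, cov):
    """Mahalanobis distance of each row of `points` from a Gaussian (mean, cov)."""
    diff = np.asarray(points, dtype=float) - np.asarray(mean, dtype=float)
    # Solve cov @ z = diff^T instead of forming the inverse explicitly.
    solved = np.linalg.solve(np.asarray(cov, dtype=float), diff.T).T
    return np.sqrt(np.sum(diff * solved, axis=1))

def select_queries(candidates, mean, cov, k):
    """Indices of the k candidates farthest from the current coefficient belief."""
    distances = mahalanobis(candidates, mean, cov)
    return np.argsort(distances)[::-1][:k]
```

Scaling by the covariance means a candidate counts as informative only when it is far from the mean relative to the surrogate's current uncertainty in that direction.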

2605.06755 2026-05-11 cs.LG cs.AI

Gradient Extrapolation-Based Policy Optimization

Ismam Nur Swapnil, Aranya Saha, Tanvir Ahmed Khan, Mohammad Ariful Haque, Ser-Nam Lim

Affiliations: Bangladesh University of Engineering and Technology; University of Maryland, College Park; Illinois Institute of Technology; University of Central Florida

AI summary: This paper proposes GXPO, which approximates multi-step lookahead with three backward passes to improve reinforcement learning for LLM reasoning. On math-reasoning experiments it outperforms GRPO and SFPO.

Comments 26 pages, 9 figures

Abstract

Reinforcement learning is widely used to improve the reasoning ability of large language models, especially when answers can be automatically checked. Standard GRPO-style training updates the model using only the current step, while full multi-step lookahead can give a better update direction but is too expensive because it needs many backward passes. We propose Gradient Extrapolation-Based Policy Optimization (GXPO), a plug-compatible policy-update rule for GRPO-style reasoning RL. GXPO approximates a longer local lookahead using only three backward passes during an active phase. It reuses the same batch of rollouts, rewards, advantages, and GRPO loss, so it does not require new rollouts or reward computation at the lookahead points. GXPO takes two fast optimizer steps, measures how the gradients change, predicts a virtual K-step lookahead point, moves the policy partway toward that point, and then applies a corrective update using the true gradient at the new position. When the lookahead signal becomes unstable, GXPO automatically switches back to standard single-pass GRPO. We also give a plain-gradient-descent surrogate analysis that explains when the extrapolation is exact and where its local errors come from. Across Qwen2.5 and Llama math-reasoning experiments, GXPO improves the average sampled pass@1 by +1.65 to +5.00 points over GRPO and by +0.14 to +1.28 points over the strongest SFPO setting, while keeping the active-phase cost fixed at three backward passes. It also achieves up to 4.00x step speedup, 2.33x wall-clock speedup, and 1.33x backward-pass speedup in reaching GRPO's peak accuracy.
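
The claim that extrapolation can be exact under plain gradient descent has a simple one-dimensional illustration. On a quadratic objective the gradient shrinks by a constant ratio each step, so two gradient evaluations determine the K-step lookahead point. The sketch below shows only this toy case; `extrapolate_lookahead` is a hypothetical helper and is not the GXPO update rule, which also includes a partial move, a corrective step, and a stability fallback.

```python
def run_gd(theta, grad_fn, lr, k):
    """Reference: k explicit gradient-descent steps (k gradient evaluations)."""
    for _ in range(k):
        theta = theta - lr * grad_fn(theta)
    return theta

def extrapolate_lookahead(theta0, grad_fn, lr, k):
    """Predict the k-step GD point from two gradient evaluations.

    Exact for a 1D quadratic, where g_{j+1} = r * g_j with constant r.
    Assumes the first gradient is nonzero.
    """
    g0 = grad_fn(theta0)
    theta1 = theta0 - lr * g0
    g1 = grad_fn(theta1)
    r = g1 / g0                                   # per-step gradient ratio
    geometric_sum = sum(r ** j for j in range(k))  # 1 + r + ... + r^(k-1)
    return theta0 - lr * g0 * geometric_sum
```

For a non-quadratic loss the ratio is no longer constant, which is one source of the local error that the surrogate analysis in the abstract refers to.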

2605.06747 2026-05-11 cs.CV cs.RO

HumanNet: Scaling Human-centric Video Learning to One Million Hours

Yufan Deng, Daquan Zhou

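Affiliations: Peking University

AI summary: HumanNet builds a large-scale corpus of human activity video to explore how human-centric video can scale embodied foundation models, with motion generation and human-to-robot transfer as target uses.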

Comments Github: https://github.com/DAGroup-PKU/HumanNet Project website: https://dagroup-pku.github.io/HumanNet/

Abstract

Progress in embodied intelligence increasingly depends on scalable data infrastructure. While vision and language have scaled with internet corpora, learning physical interaction remains constrained by the lack of large, diverse, and richly annotated human activity data. We present HumanNet, a one-million-hour human-centric video corpus that captures how humans interact with the physical world at scale. HumanNet spans both first-person and third-person perspectives and covers fine-grained activities, human-object interactions, tool use, and long-horizon behaviors across diverse real-world environments. Beyond raw video, the dataset provides interaction-centric annotations, including captions, motion descriptions, and hand and body-related signals, enabling motion-aware and interaction-aware learning. Beyond scale, HumanNet introduces a systematic data curation paradigm for embodied learning, where human-centric filtering, temporal structuring, viewpoint diversity, and annotation enrichment are treated as first-class design principles. This design transforms unstructured internet video into a scalable substrate for representation learning, activity understanding, motion generation, and human-to-robot transfer. We conduct a first-step validation on the value of this design through controlled vision-language-action ablation: under a fixed set of validation data, continued training from the Qwen VLM model with 1000 hours of egocentric video drawn from HumanNet surpasses the continued training with 100 hours of real-robot data from Magic Cobot, indicating that egocentric human video could be a scalable and cost-effective substitute for robot data. By building this project, we aim to explore the opportunity to scale embodied foundation models using human-centric videos, rather than relying solely on robot-specific data.

2605.06741 2026-05-11 cs.LG

A Closed-Form Upper Bound for Admissible Learning-Rate Steps in Belief-Space Dynamics

Zixi Li, Youzhen Li

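Affiliations: Datawhale

AI summary: This paper gives a closed-form upper bound for admissible learning-rate steps in belief-space dynamics. Updates are modeled as projected forward steps on the probability simplex, and admissibility is defined as contractivity in the natural KL/Bregman geometry.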

Abstract

Learning-rate steps are usually treated as hyperparameters. This paper isolates a local belief-space calculation: when an update is modeled as a projected forward step on the probability simplex, admissibility means contractivity in the natural KL/Bregman geometry. Under this model, the upper bound of an admissible step is not a tuning slogan but a formula.

2605.06740 2026-05-11 cs.LG cs.AI

Geometric Kolmogorov--Arnold Network (GeoKAN)

Abhijit Sen, Bikram Keshari Parida, Giridas Maiti, Mahima Arya, Denys I. Bondar

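Affiliations: Department of Physics and Engineering Physics, Tulane University; Institute of Applied Geosciences, Karlsruhe Institute of Technology

AI summary: GeoKAN performs function approximation in learned, geometry-adapted coordinates, which suits it to sharp, stiff, and strongly non-uniform regimes in scientific machine learning and differential-equation problems.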

Comments 46 pages, 24 figures, 13 tables

Abstract

We introduce Geometric Kolmogorov--Arnold Networks (GeoKANs), a family of geometry-aware KAN-type models in which approximation is carried out in learned, geometry-adapted coordinates rather than in fixed Euclidean input coordinates. GeoKAN achieves this by learning a diagonal Riemannian metric that warps the input before basis expansion and feature mixing. The learned metric provides a geometric inductive bias through local length scaling and volume distortion, and in physics-informed settings it also affects the differential structure seen by the model. Within this framework, we develop three main variants, namely GeoKAN-NNMetric, GeoKAN-$γ$, and LM-KAN. For LM-KAN, we further consider three basis-specific versions, LM-KAN-RBF, LM-KAN-Wav, and LM-KAN-Fourier. These variants allow us to study geometry-aware KAN models both as general function approximators and as surrogates in physics-informed learning. By stretching regions with rapid variation and compressing smoother regions, GeoKAN reallocates representational resolution in a task-dependent manner, allowing the model to place capacity where it is most needed. As a result, GeoKAN is well suited to sharp, stiff, localized, and strongly non-uniform regimes arising in scientific machine learning and differential-equation problems.

2605.06736 2026-05-11 cs.LG cs.AI cs.HC

STDA-Net: Spectrogram-Based Domain Adaptation for cross-dataset Sleep Stage Classification

Unaza Tallal, Shruti Kshirsagar, Ankita Shukla

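Affiliations: School of Computing, Wichita State University; Computer Science and Engineering Department, University of Nevada, Reno

AI summary: This paper proposes STDA-Net, which combines a CNN for spectrogram feature extraction, a BiLSTM for modeling sleep dynamics, and a DANN for unsupervised domain adaptation, improving the accuracy and stability of cross-dataset sleep stage classification.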

Comments submitted to IEEE SMC conference

Abstract

Accurate sleep stage classification across datasets remains challenging due to variability in EEG channel montages, sampling rates, recording environments, and subject populations. Although deep learning has shown considerable promise for automated sleep staging, most existing cross-dataset methods rely on one-dimensional EEG signal representations, whereas the use of two-dimensional spectrogram-based inputs within an unsupervised domain adaptation framework has remained largely unexplored. Here, we propose STDA-Net (Spectrogram-based Temporal Domain Adaptation Network), a framework that combines a convolutional neural network (CNN) for spectrogram-based feature extraction, a bidirectional long short-term memory (BiLSTM) module for temporal modeling of sleep dynamics, and a domain-adversarial neural network (DANN) for source-to-target feature alignment without requiring any labeled target-domain data during training. Experiments are conducted on three publicly available datasets (Sleep-EDF, SHHS-1, and SHHS-2) under six cross-dataset transfer settings. Results show that the proposed framework achieves an average accuracy of 89.03% and an average macro F1-score of 87.64%, consistently outperforming existing 1D baseline methods in terms of balanced classification performance, with substantially lower variance across five independent runs, indicating improved stability and reproducibility. Overall, these findings demonstrate that 2D spectrogram-based representations, combined with temporal modeling and adversarial domain adaptation, provide a robust and competitive alternative to conventional 1D EEG inputs for cross-dataset sleep staging.

2605.06733 2026-05-11 cs.LG cs.AI

Beyond Factor Aggregation: Gauge-Aware Low-Rank Server Representations for Federated LoRA

Jinqian Chen, Chang Liu, Jihua Zhu

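Affiliations: School of Software Engineering

AI summary: GLoRA estimates a consensus update subspace and aggregates client updates in shared reference coordinates, giving a semantically meaningful low-rank aggregation. It outperforms federated LoRA baselines under data, resource, and task heterogeneity.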

Abstract

Federated LoRA enables parameter-efficient adaptation of large language models under decentralized data and limited client resources. However, directly averaging LoRA factors is representation-dependent: the same intrinsic update admits infinitely many gauge-equivalent factorizations, so factor-level aggregation can change under arbitrary coordinate choices while the underlying update remains unchanged. This reveals a semantic mismatch in existing federated LoRA aggregation rules. We propose GLoRA, a gauge-aware server representation for federated LoRA. Instead of aggregating raw factors, GLoRA estimates a consensus update subspace from client projectors and aggregates client updates in shared reference coordinates, thereby representing semantic update aggregation entirely in low-rank form. To support heterogeneous client capacities, GLoRA further provides a rank-compatible readout that instantiates adapters of different ranks from the same server state without dense update reconstruction. Experiments on GLUE and SuperNI show that GLoRA consistently outperforms federated LoRA baselines under data, resource, and task heterogeneity, including heterogeneous client ranks, sparse participation, larger backbones, and unseen-task evaluation. GLoRA also achieves a favorable efficiency--performance trade-off, suggesting that effective federated LoRA requires not merely averaging low-rank factors, but defining a semantically meaningful server-side representation for aggregation.
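
The gauge ambiguity can be checked numerically. In the NumPy sketch below (all names and sizes are illustrative assumptions), re-expressing one client's factors as (BG, G^{-1}A) leaves that client's update BA unchanged but changes the result of averaging factors, while averaging the full updates is unaffected. It demonstrates the problem only, not GLoRA's aggregation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 6, 5, 2

def random_factors():
    """Random LoRA-style factors B (d_out x rank) and A (rank x d_in)."""
    return rng.normal(size=(d_out, rank)), rng.normal(size=(rank, d_in))

def regauge(B, A, G):
    """Gauge-equivalent factors: (B G)(G^{-1} A) equals B A for invertible G."""
    return B @ G, np.linalg.solve(G, A)

B1, A1 = random_factors()
B2, A2 = random_factors()
G = np.array([[2.0, 1.0], [0.0, 0.5]])   # any invertible rank x rank matrix
B2g, A2g = regauge(B2, A2, G)

# Client 2's actual update is the same in both gauges.
same_update = np.allclose(B2 @ A2, B2g @ A2g)

# Averaging the factors gives different server updates in the two gauges.
avg_plain = ((B1 + B2) / 2) @ ((A1 + A2) / 2)
avg_regauged = ((B1 + B2g) / 2) @ ((A1 + A2g) / 2)
factor_avg_changes = not np.allclose(avg_plain, avg_regauged)

# Averaging the full updates does not depend on the gauge.
update_avg = (B1 @ A1 + B2 @ A2) / 2
update_avg_regauged = (B1 @ A1 + B2g @ A2g) / 2
```

Averaging full updates is gauge-invariant but produces a dense d_out x d_in matrix, which is the cost a low-rank server representation tries to avoid.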

2605.06730 2026-05-11 cs.LG

Semantic State Abstraction Interfaces for LLM-Augmented Portfolio Decisions: Multi-Axis News Decomposition and RL Diagnostics

Likhita Yerra, Remi Uttejitha Allam

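Affiliations: AIVANCITY School of AI & Data

AI summary: This paper proposes Semantic State Abstraction Interfaces (SSAI), which use multi-axis news decomposition and RL diagnostics as an interpretability-performance diagnostic and reusable protocol for sparse-text decision systems.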

Comments 18 pages, 3 figures. NeurIPS 2024 manuscript style (preprint)

英文摘要

We introduce Semantic State Abstraction Interfaces (SSAI): a methodological template for mapping sparse unstructured text into $K$ auditable, named coordinates with neutral defaults on no-news days, designed to separate representation hypotheses from optimisation variance in sequential decision systems. Our contribution is the framework and its evaluation protocol, not a claim that SSAI outperforms denser alternatives. We instantiate SSAI with $K=4$ axes (sentiment, risk, confidence, volatility forecast) on a US-equity panel (30 NASDAQ-100 names, FNSPID news, 2019--2023 test), and evaluate it across direct factor portfolios, supervised ridge forecasters, and RL agents (DP-PPO, SAC) that share the same fixed $ϕ$. The four-factor factor portfolio reaches 307.2% cumulative return and Sharpe 1.067, but apparent gains versus buy-and-hold (243.6%) fail coverage-stratified controls, reverse at $\geq 0.2$% costs, and are statistically fragile versus a sentiment-only baseline; a PC1 composite and a FinBERT portfolio baseline are stronger ranking signals in this setting. Ridge and RL blocks diagnose representation versus optimiser effects. We position SSAI as an interpretability-performance diagnostic and reusable protocol for sparse-text decision systems.

2605.06729 2026-05-11 cs.LG cs.AI

The E$Δ$-MHC-Geo Transformer: Adaptive Geodesic Operations with Guaranteed Orthogonality

E$Δ$-MHC-Geo变换器:具有保证正交性的自适应测地操作

Arash Shahmansoori

发表机构 * Independent Researcher(独立研究者)

AI总结 本文提出E$Δ$-MHC-Geo变换器,结合流形约束超连接、深度delta学习和Cayley变换,实现输入自适应且无条件正交的残差连接。通过数据依赖的Cayley旋转和Householder反射的混合方法,提升长时稳定性和旋转损失性能。

Comments 21 pages, 8 figures; code will be available at https://github.com/arash-shahmansoori/edelta

详情
AI中文摘要

本文提出E$Δ$-MHC-Geo变换器,结合流形约束超连接、深度delta学习和Cayley变换,实现输入自适应且无条件正交的残差连接。通过数据依赖的Cayley旋转和Householder反射的混合方法,提升长时稳定性和旋转损失性能。

英文摘要

We present the E$Δ$-MHC-Geo Transformer, a novel architecture that unifies Manifold-Constrained Hyper-Connections (mHC), Deep Delta Learning (DDL), and the Cayley transform to obtain input-adaptive, unconditionally orthogonal residual connections. Unlike DDL, whose Householder operator is orthogonal only at $β\in \{0,2\}$, our Data-Dependent Cayley rotation $Q(x)=(I+(β/2)A(x))^{-1}(I-(β/2)A(x))$ preserves orthogonality for all $β$ and all inputs. To handle negation, an eigenvalue $-1$ case that Cayley provably excludes, we introduce the E$Δ$-MHC-Geo Hybrid, which combines Cayley rotation with Householder reflection via a learned operator-selection gate $X'=γ(X)Q(X)X+(1-γ(X))H_2(X)X$. A midpoint-collapse regularizer, $4γ(1-γ)$, encourages boundary gate decisions, where each selected component is orthogonal. In matched-parameter comparisons, with approximately 1.79M parameters per model and mean +/- standard deviation over 3 seeds, against four baselines including the concurrent JPmHC, E$Δ$-MHC-Geo achieves the best long-horizon stability, 1.9x over JPmHC and 3.8x over GPT; the best near-$π$ rotation loss, 4.5x over JPmHC on single-plane; strong norm preservation, with 0.001 mean deviation; and 0.96 negation cosine alignment in a diagnostic reflection probe, all with 33% fewer layers. While JPmHC's wider representation excels on pure rotation, its finite Cayley residual mixer excludes an exact $λ=-1$ operator and has no reflection branch, motivating our hybrid approach for accessing both connected components of $O(n)$.
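The orthogonality claim in this abstract can be checked numerically. The sketch below is not the authors' code. It assumes that $A(x)$ is skew-symmetric (the usual condition under which a Cayley transform is orthogonal, which the abstract does not state explicitly) and it models the DDL Householder operator as $H = I - β u u^\top$ for a unit vector $u$. All function names are hypothetical, and only the Python standard library is used.

```python
import random

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def cayley(a_skew, beta):
    """Q = (I + (beta/2) A)^{-1} (I - (beta/2) A), with A assumed skew-symmetric."""
    n = len(a_skew)
    plus = [[(1.0 if i == j else 0.0) + 0.5 * beta * a_skew[i][j]
             for j in range(n)] for i in range(n)]
    minus = [[(1.0 if i == j else 0.0) - 0.5 * beta * a_skew[i][j]
              for j in range(n)] for i in range(n)]
    return solve(plus, minus)

def householder(u, beta):
    """H = I - beta * u u^T for a unit vector u (assumed form of the DDL operator)."""
    n = len(u)
    return [[(1.0 if i == j else 0.0) - beta * u[i] * u[j]
             for j in range(n)] for i in range(n)]

def orthogonality_error(q):
    """Largest absolute entry of Q^T Q - I."""
    n = len(q)
    qtq = matmul(transpose(q), q)
    return max(abs(qtq[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))

def random_skew(n, rng):
    """Random skew-symmetric matrix M - M^T."""
    m = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    return [[m[i][j] - m[j][i] for j in range(n)] for i in range(n)]

rng = random.Random(0)
A = random_skew(4, rng)
u = [1.0, 0.0, 0.0, 0.0]
for beta in (0.3, 1.0, 5.0):
    print(beta, orthogonality_error(cayley(A, beta)),
          orthogonality_error(householder(u, beta)))
```

Under these assumptions the Cayley error stays at round-off level for every tested $β$, while the Householder error equals $|β^2 - 2β|$ along $u$ and so vanishes only at $β = 0$ or $β = 2$, which matches the contrast drawn in the abstract.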

2605.06727 2026-05-11 cs.LG cs.ET eess.IV

Medical Imaging Classification with Cold-Atom Reservoir Computing using Auto-Encoders and Surrogate-Driven Training

利用冷原子储备池计算进行医学影像分类:结合自编码器和代理驱动训练

Nuno Batista, Ana Morgado, Oscar Ferraz, Sagar Silva Pratapsi, Jorge Lobo, Gabriel Falcao

发表机构 * Instituto de Telecomunicações, Dept. of Electrical and Computer Engineering, University of Coimbra, Portugal(葡萄牙科英布拉大学电信研究所,电气与计算机工程系) ISR - Institute of Systems and Robotics, Dept. of Electrical and Computer Engineering, University of Coimbra, Portugal(葡萄牙科英布拉大学系统与机器人研究所,电气与计算机工程系) CFisUC, Department of Physics, University of Coimbra, Portugal(葡萄牙科英布拉大学物理系,CFisUC)

AI总结 本文提出基于中性原子储备池计算的混合量子-经典管道,用于医学图像分类,特别是息肉检测的二分类任务。通过引导自编码器处理高维数据,结合可微代理模型克服量子测量不可微的问题,提升分类准确性和图像恢复能力。

Comments 8 pages, 6 figures. Accepted to the 2025 IEEE International Conference on Quantum AI (IEEE QAI). Supported by FCT and the Open Quantum Institute (OQI)

Journal ref 2025 IEEE International Conference on Quantum Artificial Intelligence (QAI)

详情
AI中文摘要

我们介绍了一种基于中性原子储备池计算的混合量子-经典管道,用于医学图像分类,重点是息肉检测的二分类任务。为有效处理高维性,我们集成了引导自编码器。该管道学习紧凑且具有判别性的图像数据表示,这些表示也适合量子储备池计算。此类系统的一个关键挑战是量子测量的不可微性,这会给标准训练造成'梯度障碍'。我们通过引入模拟量子层的可微代理模型来克服这一障碍,从而实现整个系统的端到端反向传播。此引导训练过程联合优化分类准确性和自编码器的忠实图像恢复。所学的潜在表示被编码为里德堡哈密顿量中的脉冲失谐参数,随后通过期望值获得量子嵌入。这些嵌入随后传递给线性分类器。我们的模拟显示,该方法优于一些使用PCA或无引导自编码器的传统方法。我们还进行了消融研究,评估了各种量子和训练参数的影响,证明了我们提出的管道在现实世界医学影像应用中的鲁棒性和灵活性,即使在当前NISQ时代也是如此。

英文摘要

We introduce a hybrid quantum-classical pipeline, based on neutral-atom reservoir computing, for medical image classification, focusing on the binary classification task of polyp detection. To deal effectively with the high dimensionality, we integrate a guided auto-encoder. This pipeline learns compact and discriminative representations of image data that are also well-suited for quantum reservoir computing. A key challenge in such systems is the non-differentiable nature of quantum measurements, which creates a 'gradient barrier' for standard training. We overcome this barrier by incorporating a differentiable surrogate model that emulates the quantum layer, enabling end-to-end backpropagation through the entire system. This guided training process is jointly optimized for classification accuracy and for faithful image recovery from the auto-encoder. The learned latent representations are encoded as pulse detuning parameters within a Rydberg Hamiltonian, and quantum embeddings are subsequently obtained through expectation values. These embeddings are then passed to a linear classifier. Our simulations show that this method outperforms some traditional approaches that use PCA or unguided autoencoders. We also conduct ablation studies to assess the impact of various quantum and training parameters, demonstrating the robustness and flexibility of our proposed pipeline for real-world medical imaging applications, even in the current NISQ era.

2605.06726 2026-05-11 cs.LG

Transformer-Based Wildlife Species Classification from Daily Movement Trajectories

基于Transformer的野生动物物种分类:从每日移动轨迹中推断物种身份

Obed Irakoze, Prasenjit Mitra

发表机构 * Department of Electrical & Computer Engineering, Carnegie Mellon University Africa, Kigali, Rwanda

AI总结 本文通过训练序列模型对大规模GPS轨迹进行分类,发现Transformer在物种分类中表现优于LSTM、CNN等模型,尤其在数据有限时提升显著,且统一1小时分辨率能提升整体性能。

Comments 8 pages

详情
AI中文摘要

从单日移动数据推断野生动物物种身份是一项具有挑战性的任务。我们在此基础上训练序列模型,利用Movebank平台上的大规模、多物种GPS轨迹进行训练。轨迹模型通过在测试中排除整个测距研究或区域的协议进行评估。我们比较了基于Transformer的序列模型与LSTM、CNN和时间卷积网络,发现Transformer在平衡准确率上普遍优于其他模型,提升幅度约为8到22个百分点,具体取决于物种和实验设置。在一项针对大象的二分类任务中,使用1小时分辨率时,Transformer的平衡准确率为0.83,AUC为0.92,显著优于所有基线模型。我们还探讨了在数据有限条件下,通过分析基本位移编码与扩展范围的运动描述符(包括速度、方向和转向行为)之间的差异,来研究特征表示。通过特征增强,我们观察到性能提升,尤其是对于受关注较少且稀疏表示的物种,如大型食肉动物、狮子和斑马。最后,比较1小时和30分钟时间分辨率的实验表明,尽管更细粒度的采样可以捕捉某些物种的短期移动模式,但统一的1小时分辨率在减少缺失数据和确保时间一致性方面能带来更广泛的性能提升。

英文摘要

Inferring the identity of wildlife species from daily movement data alone is a challenging task. We train sequence models on large-scale, 7-species GPS trajectories from the Movebank platform. Trajectory models are evaluated using a protocol in which entire telemetry studies or regions are held out during testing. We compare Transformer-based sequence models to LSTM, CNN, and Temporal Convolutional Networks, and find that Transformers consistently achieve higher balanced accuracy, with gains of approximately 8 to 22 percentage points depending on the species and experimental setting. In an elephant binary classification task with 1-hour resolution, the Transformer achieves a balanced accuracy of 0.83 and an AUC of 0.92, substantially outperforming all baseline models. Under data-limited conditions, we examine feature representations by comparing a basic displacement-based encoding with an expanded set of movement descriptors that includes speed, direction, and turning behavior. With feature augmentation, we see clear performance gains, especially for underrepresented and sparsely represented species such as large carnivores, lions, and zebras. Finally, experiments comparing 1-hour and 30-minute temporal resolutions show that while finer sampling can capture short-term movement patterns for some species, a unified 1-hour resolution yields more promising performance across studies by reducing missing data and ensuring consistent temporal coverage.

2605.06724 2026-05-11 cs.LG cs.AI eess.SP

Enabling Unsupervised Training of Deep EEG Denoisers With Intelligent Partitioning

通过智能分区实现深度EEG去噪器的无监督训练

Qiyu Rao, Haozhe Tian, Homayoun Hamedmoghadam, Danilo Mandic

发表机构 * Department of Electrical and Electronic Engineering, Imperial College London(帝国理工学院伦敦校区电子与电气工程系) Dyson School of Design Engineering, Imperial College London(帝国理工学院伦敦校区戴森设计工程学院)

AI总结 本文提出iPSD方法,通过学习将输入EEG段分割为独立的噪声实现,实现无监督去噪,尤其在低信噪比和复杂噪声下表现优异。

详情
AI中文摘要

EEG去噪因神经活动微妙且与频谱重叠噪声难以分离而具有挑战性。传统方法难以处理可穿戴EEG中的时变噪声,而深度学习方法虽能有效去噪但需无噪声参考信号。本文提出iPSD方法,通过学习将输入EEG段分割为独立的噪声实现,无需清洁参考信号即可实现深度学习去噪器的自监督训练,即使在仅有一个待去噪EEG段的情况下也能有效工作。通过大量实验验证,iPSD在极低信噪比(低至-10 dB)和挑战性噪声(如EMG)下表现出卓越的频谱保真度,优于竞争基线。

英文摘要

Denoising wearable electroencephalogram (EEG) is inherently challenging since neural activity is not only subtle but also inseparable from spectrally overlapping noise artifacts. Classical signal processing methods, relying on fixed or heuristic rules, cannot handle the time-varying pervasive artifacts in wearable EEGs. Deep learning methods, on the other hand, show promise in decomposition-free EEG denoising using highly expressive neural networks, but the training requires artifact-free EEG, which is inherently unobtainable. To address this, we propose Intelligent Partitioning for Self-supervised Denoising (iPSD). Our method eliminates the need for clean references by learning to partition an input EEG segment into independent noisy realizations with the same underlying signal. This enables self-supervision of deep learning denoisers, even in zero-shot settings where only a single EEG segment to be denoised is available. We validate iPSD through extensive experiments, including validations on wearable EEG from in-ear sensors. The results show that iPSD achieves state-of-the-art performance, most notably under extremely low signal-to-noise ratios (down to -10 dB) and challenging artifacts (e.g., EMG), with spectral fidelity orders of magnitude higher than competitive baselines.

2605.06723 2026-05-11 cs.AI cs.CL cs.LG

When Does a Language Model Commit? A Finite-Answer Theory of Pre-Verbalization Commitment

语言模型何时做出承诺?一种有限答案理论的预言语承诺

Long Zhang, Wei-neng Chen, Feng-feng Wei, Zi-bo Qin

发表机构 * School of Computer Science and Engineering(计算机科学与工程学院)

AI总结 研究语言模型在生成最终答案前的推理过程,通过有限答案偏好稳定化理论分析模型在回答前的承诺时间,发现其在可解析前稳定,且信号可从隐藏状态恢复。

详情
AI中文摘要

语言模型常在给出最终答案前生成推理,但可见答案不显示模型答案偏好何时稳定。本文通过有限答案偏好稳定化研究这一问题。对于模型状态和指定答案生成器,将模型的延续概率投影到有限答案集;在二元任务中,这产生精确的对数几率码,δ(ξ)=S_θ(是|ξ)-S_θ(否|ξ)。此目标定义了基于解析器的答案开始时间、回顾稳定时间以及领先优势,无需依赖贪心展开或学习探测器。在受控延迟判决任务中,Qwen3-4B-Instruct的上下文有限答案投影在答案可解析前稳定,主模板的平均领先为17-31个token,在解析器清洁复制中具有正的、较短的领先。信号跟踪模型最终输出而非事实,可从紧凑隐藏摘要线性恢复,部分可与光标进度分离,并作为共享信息转移,无需单一不变坐标。诊断将测量与在线停止、无语音器信念和因果答案控制分离;精确操控显示δ的局部敏感性,但无可靠生成控制。

英文摘要

Language models often generate reasoning before giving a final answer, but the visible answer does not reveal when the model's answer preference became stable. We study this question through a narrow computable object: \emph{finite-answer preference stabilization}. For a model state and specified answer verbalizers, we project the model's own continuation probabilities onto a finite answer set; in binary tasks this yields an exact log-odds code, $δ(ξ)=S_θ(\mathrm{yes}\midξ)-S_θ(\mathrm{no}\midξ)$. This target defines parser-based answer onset, retrospective stabilization time, and lead without relying on greedy rollouts or learned probes. In controlled delayed-verdict tasks with Qwen3-4B-Instruct, the contextual finite-answer projection stabilizes before the answer is parseable, with 17--31 token mean lead in the main templates and positive, shorter lead in a parser-clean replication. The signal tracks the model's eventual output rather than truth, is linearly recoverable from compact hidden summaries, is partly separable from cursor progress, and transfers as shared information without a single invariant coordinate. Diagnostics separate the measurement from online stopping, verbalizer-free belief, and causal answer control; exact steering shows local sensitivity of $δ$ but not reliable generation control.
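To make the quantities in this abstract concrete, here is a minimal sketch of the binary log-odds $δ$ and a stabilization time read off a trace of $δ$ values. It is an illustration under stated assumptions, not the paper's procedure: $S_θ$ is treated as a log-probability-like score per generation step, stabilization is taken to be the earliest step from which the sign of $δ$ matches its final sign, and lead is the parser onset step minus that step. The function names and the example scores are hypothetical.

```python
def log_odds(score_yes, score_no):
    """delta = S(yes | state) - S(no | state), for log-probability-like scores."""
    return score_yes - score_no

def stabilization_index(deltas):
    """Earliest step from which sign(delta) equals the final sign until the end.

    Assumed definition for illustration; the paper may define it differently.
    """
    final_positive = deltas[-1] > 0
    index = len(deltas) - 1
    while index > 0 and (deltas[index - 1] > 0) == final_positive:
        index -= 1
    return index

def lead(deltas, onset_index):
    """Number of steps by which stabilization precedes the parseable answer."""
    return onset_index - stabilization_index(deltas)

# Hypothetical per-step scores for the two answer verbalizers.
yes_scores = [-1.2, -0.6, -1.0, -0.4, -0.2, -0.1]
no_scores = [-1.0, -0.7, -0.7, -0.9, -1.4, -2.1]
deltas = [log_odds(y, n) for y, n in zip(yes_scores, no_scores)]
print(stabilization_index(deltas), lead(deltas, onset_index=5))
```

A sign-persistence rule is the simplest choice; a margin threshold on $|δ|$ would be a stricter alternative if small fluctuations around zero should not count as a stable preference.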

2605.06720 2026-05-11 cs.LG cs.AI

Conditional generation of antibody sequences with classifier-guided germline-absorbing discrete diffusion

基于分类器引导的germline吸收离散扩散的抗体序列生成

Justin Sanders, Luca Giancardo, Lan Guo, Yue Zhao, Kemal Sonmez, Nina Cheng, Melih Yilmaz

发表机构 * Paul G. Allen School of Computer Science and Engineering, University of Washington(华盛顿大学保罗·G·艾伦计算机科学与工程学院) Life Sciences, Amazon Web Services(亚马逊网络服务生命科学)

AI总结 本文提出germline吸收扩散模型,通过离散扩散微调提升抗体序列生成性能,有效减少germline偏差,改进非germline残基预测准确率,并在条件生成任务中优于现有方法。

Comments 9 pages, 2 figures, 2 tables

详情
AI中文摘要

抗体治疗是现代医学中最成功的药物之一,但计算设计具有理想结合和开发性性质的抗体仍具挑战性。尽管蛋白质语言模型(pLMs)已成为抗体序列设计的强大工具,但现有方法主要存在两个关键限制:它们主要记忆germline序列而非建模生物上意义的体细胞变异,并且对灵活的分类器引导条件生成支持有限。本文通过两个主要贡献解决这些问题。首先,我们证明离散扩散微调在抗体序列上实现强大的语言建模性能,同时允许基于任何现成分类器的生成。其次,我们引入germline吸收扩散,一种离散扩散噪声过程的新修改,其中germline序列(而非掩码序列)作为吸收状态。这种生物启发的归纳偏置限制模型学习从germline到观察序列的轨迹,有效排除遗传变异和V(D)J重排统计量从学习分布中,并大幅减轻germline偏差。我们显示germline扩散将非germline残基预测准确率从26%提高到46%,接近由真实生物变异设定的理论上限。然后我们展示germline扩散模型在条件生成任务中的实用性,即采样具有改进疏水性和预测结合亲和力的抗体。在两项任务中,我们的模型在类保持和样本质量之间实现了改进的权衡,显著优于EvoProtGrad,一种从pLMs中采样的流行策略,使用基于梯度的离散马尔可夫链蒙特卡洛方法。

英文摘要

Antibody therapeutics are among the most successful modern medicines, yet computationally designing antibodies with desirable binding and developability properties remains challenging. While protein language models (pLMs) have emerged as powerful tools for antibody sequence design, existing approaches largely suffer from two key limitations: they predominantly memorize germline sequences rather than modeling biologically meaningful somatic variation, and they offer limited support for flexible classifier-guided conditional generation. We address these challenges through two primary contributions. First, we demonstrate that discrete diffusion fine-tuning achieves strong language modeling performance on antibody sequences while allowing for generation conditioned on any off-the-shelf classifier. Second, we introduce germline absorbing diffusion, a novel modification of the discrete diffusion noise process in which the germline sequence - rather than a masked sequence - serves as the absorbing state. This biologically motivated inductive bias restricts the model to learning the trajectory from germline to observed sequence, effectively excluding genetic variation and V(D)J recombination statistics from the learned distribution and dramatically mitigating germline bias. We show that germline diffusion improves non-germline residue prediction accuracy from 26 percent to 46 percent, approaching the theoretical upper bound set by true biological variability. We then demonstrate the utility of our germline diffusion model on the conditional generation tasks of sampling antibodies with improved hydrophobicity and predicted binding affinity. On both tasks our model shows an improved tradeoff between class adherence and sample quality, significantly outperforming EvoProtGrad, a popular strategy to sample from pLMs with gradient-based discrete Markov Chain Monte Carlo.

2605.06716 2026-05-11 cs.AI cs.CL

From Storage to Experience: A Survey on the Evolution of LLM Agent Memory Mechanisms

从存储到经验:LLM代理记忆机制演化的综述

Jinghao Luo, Yuchen Tian, Chuxue Cao, Ziyang Luo, Hongzhan Lin, Kaixin Li, Chuyi Kong, Ruichao Yang, Jing Ma

发表机构 * Hong Kong Baptist University(香港浸会大学) South China Normal University(华南师范大学) Hong Kong University of Science and Technology(香港科技大学) National University of Singapore(新加坡国立大学) University of Science and Technology Beijing(北京科技大学)

AI总结 本文综述了LLM代理记忆机制的发展,提出三阶段框架,分析长程一致性、动态环境挑战和持续学习目标,探讨前瞻性探索与跨轨迹抽象等前沿机制,为下一代LLM代理设计提供指导。

Comments Accepted by ACL 2026 Findings

详情
AI中文摘要

基于大型语言模型(LLM)的代理已通过整合外部工具和规划能力重塑了人工智能。尽管记忆机制已成为这些系统的架构基石,但当前研究仍碎片化,徘徊于操作系统工程与认知科学之间。为弥合这一差距,本文提出一种新的进化框架,将发展过程分为三个阶段:存储(轨迹保存)、反思(轨迹细化)和经验(轨迹抽象)。我们首先正式定义这三个阶段,然后分析推动演化的三个核心驱动力:长程一致性的需求、动态环境的挑战以及持续学习的最终目标。此外,我们还特别探讨了前沿经验阶段的两种变革性机制:前瞻性探索和跨轨迹抽象。通过整合这些不同的观点,本文为下一代LLM代理的发展提供了稳健的设计原则和清晰的发展路线图。

英文摘要

Large Language Model (LLM)-based agents have fundamentally reshaped artificial intelligence by integrating external tools and planning capabilities. While memory mechanisms have emerged as the architectural cornerstone of these systems, current research remains fragmented, oscillating between operating system engineering and cognitive science. This theoretical divide prevents a unified view of technological synthesis and a coherent evolutionary perspective. To bridge this gap, this survey proposes a novel evolutionary framework for LLM agent memory mechanisms, formalizing the development process into three stages: Storage (trajectory preservation), Reflection (trajectory refinement), and Experience (trajectory abstraction). We first formally define these three stages before analyzing the three core drivers of this evolution: the necessity for long-range consistency, the challenges in dynamic environments, and the ultimate goal of continual learning. Furthermore, we specifically explore two transformative mechanisms in the frontier Experience stage: proactive exploration and cross-trajectory abstraction. By synthesizing these disparate views, this work offers robust design principles and a clear roadmap for the development of next-generation LLM agents.

2605.06708 2026-05-11 cs.CV cs.AI

Visual Text Compression as Measure Transport

视觉文本压缩作为度量传输

Lv Tang, Tianyi Zheng, Yang Liu, Bo Li, Xingyu Li

发表机构 * University of Alberta(阿尔伯塔大学) vivo Mobile Communication Co., Ltd(vivo移动通信有限公司) Tsinghua University(清华大学)

AI总结 本文通过度量传输理论分析视觉文本压缩,提出无标签路由准则和传输感知聚焦机制,提升压缩效率并优化下游任务表现。

详情
AI中文摘要

视觉文本压缩(VTC)通过将文本渲染为图像并用视觉-语言模型重新编码,实现长上下文处理的高效性,但其压缩比并不直接转化为下游任务的实用性。本文通过度量传输理论,将文本和视觉标记视为经验概率测度,揭示ViT补丁编码器诱导的推前映射的传输成本,包含精度成本和覆盖成本。该方法提出无标签路由准则和传输感知聚焦机制,在24个NLP数据集上,无标签规则在17个数据集上达到Oracle水平,平均任务得分提升3.3%且平均tokens减少10.3%。

英文摘要

Visual text compression (VTC) promises efficient long-context processing by rendering text into an image and re-encoding it with a vision-language model, often producing $3$--$20\times$ fewer decoder tokens than subword tokenization. Yet token savings do not translate predictably into downstream utility: on some tasks the visual path matches or exceeds the text path, on others it collapses, and the compression ratio itself does not predict which regime will occur. The missing quantity is therefore not another summary of efficiency, but a principled measure of task-relevant information loss induced by visual encoding. We address this problem by formulating VTC in the language of measure transport. Treating text and visual tokens as empirical probability measures, we show that the ViT patch encoder induces a push-forward map whose transport cost decomposes into a precision cost from within-patch aggregation and a coverage cost from cross-patch fragmentation. Both terms are estimable from downstream-label-free probes. This formulation yields two operational consequences: a downstream-label-free routing criterion that selects whether to use the visual path for a given input or benchmark instance, and a transport-informed foveation mechanism that re-encodes high-cost regions at higher resolution. Across $24$ NLP datasets at Qwen3-4B, our label-free rule matches the per-dataset oracle on $17/24$ datasets ($70.8\%$), and improves the average task score by $+3.3\%$ with $-10.3\%$ average tokens relative to a pure-LLM.

2605.06702 2026-05-11 cs.AI cs.CL cs.LG

CASCADE: Case-Based Continual Adaptation for Large Language Models During Deployment

CASCADE:面向大型语言模型部署阶段的基于案例的持续适应

Siyuan Guo, Yali Du, Hechang Chen, Yi Chang, Jun Wang

发表机构 * School of Artificial Intelligence, Jilin University(吉林大学人工智能学院) Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, Jilin University(吉林大学知识驱动人机智能工程研究中心) International Center of Future Science, Jilin University(吉林大学未来科学国际中心) Department of Informatics, King’s College London(伦敦国王学院信息学院) The Alan Turing Institute(艾伦·图灵研究所) AI Centre, Department of Computer Science, UCL(UCL计算机科学系人工智能中心)

AI总结 本文提出CASCADE框架,通过在部署期间持续学习提升LLM性能,实现20.9%的提升,并在多个领域任务中优于基线方法。

详情
AI中文摘要

大型语言模型(LLMs)已成为现代人工智能的核心基础,但其生命周期仍受训练与部署之间严格分割的限制,部署之后学习实际上停止。这一局限与自然智能形成对比,后者通过与环境交互不断适应。本文将部署时间学习(DTL)定义为LLM生命周期的第三阶段,使LLM代理在部署期间不修改模型参数即可从经验中改进。我们提出了CASCADE(CASe-based Continual Adaptation during DEployment),一种通用且原则性的框架,使LLM代理具备显式且持续演化的情景记忆。CASCADE将经验重用建模为上下文老虎机问题,使代理能够进行原则性的探索-利用权衡,并在长期交互中建立无遗憾保证。此设计使代理能够积累、选择和优化任务相关案例,将过去经验转化为可操作的知识。在16个多样化的任务中,包括医学诊断、法律分析、代码生成、网络搜索、工具使用和具身交互,CASCADE相较于零样本提示将宏平均成功率提高了20.9%,并一致优于基于梯度和基于记忆的基线方法。通过将部署重新定义为适应性学习过程,本文为持续改进人工智能系统奠定了基础。

英文摘要

Large language models (LLMs) have become a central foundation of modern artificial intelligence, yet their lifecycle remains constrained by a rigid separation between training and deployment, after which learning effectively ceases. This limitation contrasts with natural intelligence, which continually adapts through interaction with its environment. In this paper, we formalise deployment-time learning (DTL) as the third stage in the LLM lifecycle that enables LLM agents to improve from experience during deployment without modifying model parameters. We present CASCADE (CASe-based Continual Adaptation during DEployment), a general and principled framework that equips LLM agents with an explicit, evolving episodic memory. CASCADE formulates experience reuse as a contextual bandit problem, enabling principled exploration-exploitation trade-offs and establishing no-regret guarantees over long-term interactions. This design allows agents to accumulate, select, and refine task-relevant cases, transforming past experience into actionable knowledge. Across 16 diverse tasks spanning medical diagnosis, legal analysis, code generation, web search, tool use, and embodied interaction, CASCADE improves macro-averaged success rate by 20.9% over zero-shot prompting while consistently outperforming gradient-based and memory-based baselines. By reframing deployment as an adaptive learning process, this work establishes a foundation for continually improving AI systems.

2605.06696 2026-05-11 cs.AI cs.LG cs.MA

Hidden Coalitions in Multi-Agent AI: A Spectral Diagnostic from Internal Representations

多智能体AI中的隐藏联盟:来自内部表示的谱诊断

Cameron Berg, Susan L. Schneider, Mark M. Bailey

发表机构 * Reciprocal Research(互惠研究) Center for the Future of AI, Mind, and Society(人工智能、心智与社会未来中心) Florida Atlantic University(佛罗里达大西洋大学) Biological and Computational Intelligence Center(生物与计算智能中心) National Intelligence University(国家情报大学)

AI总结 本文提出通过分析多智能体系统内部神经表示的谱分区方法,检测隐藏联盟结构,验证了该方法在强化学习和大语言模型中的有效性,揭示了代表层次结构。

Comments 18 pages

详情
AI中文摘要

相互作用的AI代理集合可能形成联盟,产生涌现的群体级组织,这对AI安全和对齐至关重要。然而,仅观察代理行为往往不足以区分真实的信息耦合与虚假的相似性,因为具有重要影响的联盟可能在任何明显行为变化之前就在内部表示层面形成。本文介绍了一种从多代理系统的内部神经表示中检测联盟结构的实用方法。该方法从代理的隐藏状态构建成对互信息图,并应用谱分区来识别最显著的联盟边界。我们在两个领域验证了该方法:首先,在多代理强化学习环境中,该方法成功恢复了预先设定的分层和动态联盟结构,并正确拒绝了由没有信息耦合的行为协调引起的假阳性。其次,使用大型语言模型,该方法识别了由描述性提示暗示的联盟结构,跟踪动态团队重新分配,并揭示了一种表示层次结构,其中显式标签压过与之冲突的交互模式。在两种设置中,恢复的分区揭示了标量跨代理互信息度量无法区分的子组组织。结果表明,通过谱分区分析隐藏状态互信息提供了一种可扩展的诊断方法,用于识别表示层面的联盟,为监控分布式AI系统中的涌现结构提供了有价值的工具。

英文摘要

Collections of interacting AI agents can form coalitions, creating emergent group-level organization that is critical for AI safety and alignment. However, observing agent behavior alone is often insufficient to distinguish genuine informational coupling from spurious similarity, as consequential coalitions may form at the level of internal representations before any overt behavioral change is apparent. Here, we introduce a practical method for detecting coalition structure from the internal neural representations of multi-agent systems. The approach constructs a pairwise mutual-information graph from the hidden states of agents and applies spectral partitioning to identify the most salient coalition boundary. We validate this method in two domains. First, in multi-agent reinforcement learning environments, the method successfully recovers programmed hierarchical and dynamic coalition structures and correctly rejects false positives arising from behavioral coordination without informational coupling. Second, using a large language model, the method identifies coalition structures implied by descriptive prompts, tracks dynamic team reassignments, and reveals a representational hierarchy where explicit labels dominate over conflicting interaction patterns. Across both settings, the recovered partition reveals subgroup organization that a scalar cross-agent mutual-information measure cannot distinguish. The results demonstrate that analyzing hidden-state mutual information through spectral partitioning provides a scalable diagnostic for identifying representational coalitions, offering a valuable tool for monitoring emergent structure in distributed AI systems.
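The partitioning step described in this abstract can be sketched as a spectral bisection of a weighted graph. The example below is an assumption-laden illustration, not the authors' implementation: it takes an already estimated symmetric matrix of pairwise mutual information (the estimator itself is out of scope here), and it approximates the Fiedler vector of the graph Laplacian by shifted power iteration so that only the Python standard library is needed. The function name and the toy weights are hypothetical.

```python
import random

def fiedler_partition(weights, iterations=500, seed=0):
    """Split nodes into two groups by the sign of an approximate Fiedler vector.

    weights: symmetric matrix of non-negative edge weights with a zero
    diagonal and at least one positive entry (here: pairwise MI estimates).
    """
    n = len(weights)
    degree = [sum(row) for row in weights]
    shift = 2.0 * max(degree)  # upper bound on the largest Laplacian eigenvalue
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        mean = sum(v) / n
        v = [x - mean for x in v]  # remove the constant (eigenvalue 0) direction
        # Apply (shift * I - L) with L = D - W; its dominant remaining
        # eigenvector is the Laplacian eigenvector of the second-smallest eigenvalue.
        w = [(shift - degree[i]) * v[i]
             + sum(weights[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return [0 if x < 0 else 1 for x in v]

# Toy MI graph: agents 0-2 and 3-5 are strongly coupled within each group
# and only weakly coupled across groups.
strong, weak = 1.0, 0.05
mi = [[0.0 if i == j else (strong if (i < 3) == (j < 3) else weak)
       for j in range(6)] for i in range(6)]
print(fiedler_partition(mi))
```

The sign pattern separates the two strongly coupled groups; which group receives label 0 depends on the random initial vector, so only the grouping is meaningful.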

2605.06690 2026-05-11 cs.AI cs.CL cs.LG

State Representation and Termination for Recursive Reasoning Systems

递归推理系统的状态表示与终止

Debashis Guha, Amritendu Mukherjee, Sanjay Kukreja, Tarun Kumar

发表机构 * S P Jain School of Global Management(S P Jain 全球管理学院) Indian Statistical Institute(印度统计研究所) eClerx Services Ltd.(eClerx 服务有限公司)

AI总结 本文提出了一种递归推理系统的状态表示方法及终止条件,通过epistemic状态图编码提取的主张、证据关系、开放问题和置信度权重,并定义了顺序间隙以判断迭代的必要性。

详情
AI中文摘要

递归推理系统在获取新证据和细化累积理解之间交替进行。两个设计选择通常隐含:如何表示演化的推理状态,以及何时停止迭代。本文解决这两个问题。我们将推理状态表示为epistemic状态图,编码提取的主张、证据关系、开放问题和置信度权重。我们定义顺序间隙为expand-then-consolidate与consolidate-then-expand所达到状态之间的距离;小的顺序间隙表明两种顺序一致,进一步迭代可能无助。我们的主要结果给出了线性化顺序间隙在固定点附近非退化的必要且充分条件,表明该标准在何时具有信息性而非代数上空洞。这是一个局部条件,而非全局收敛保证。我们应用该框架到递归推理系统,并简要说明其在智能体循环、树状思维推理、定理证明和持续学习中的应用。

英文摘要

Recursive reasoning systems alternate between acquiring new evidence and refining an accumulated understanding. Two design choices are typically left implicit: how to represent the evolving reasoning state, and when to stop iterating. This paper addresses both. We represent the reasoning state as an epistemic state graph encoding extracted claims, evidential relations, open questions, and confidence weights. We define the order-gap as the distance between the states reached by expand-then-consolidate versus consolidate-then-expand; a small order-gap suggests that the two orderings agree and further iteration is unlikely to help. Our main result gives a necessary and sufficient condition for the linearised order-gap to be non-degenerate near the fixed point, showing when the criterion is informative rather than algebraically vacuous. This is a local condition, not a global convergence guarantee. We apply the framework to recursive reasoning systems and sketch its application to agent loops, tree-of-thought reasoning, theorem proving, and continual learning.

2605.06686 2026-05-11 cs.LG econ.EM stat.AP stat.ML

Robustness of Refugee-Matching Gains to Off-Policy Evaluation Choices

难民匹配收益对离线评估选择的鲁棒性

Kirk Bansak, Elisabeth Paulson, Dominik Rothenhäusler, Jeremy Ferwerda, Jens Hainmueller, Michael Hotard

发表机构 * Immigration Policy Lab, Stanford University(斯坦福大学移民政策实验室) Department of Political Science, University of California, Berkeley(加州大学伯克利分校政治学系) Technology and Operations Management Unit, Harvard Business School(哈佛商学院技术与运营管理单位) Department of Statistics, Stanford University(斯坦福大学统计学系) Department of Government, Dartmouth College(达特茅斯学院政府系) Department of Political Science, Stanford University(斯坦福大学政治学系)

AI总结 本文研究难民匹配对难民结果的提升潜力,通过多种离线评估方法验证了因果影响评估结果的稳定性,发现估计值在多数情况下具有统计显著性且与Bansak等(2018)结果一致。

Comments 13 pages, 2 figures, 10 tables

详情
AI中文摘要

先前研究探讨了难民匹配提升难民结果的潜力,最初由Bansak等人(2018)提出。本文通过多种离线评估方法,在美国难民匹配背景下展示了反事实影响评估结果的稳定性。为估计反事实影响并测试结果的鲁棒性,我们采用多种评估方法,包括逆概率加权(IPW)和多种增强逆概率加权(AIPW)变体。我们还考虑了不同的修改,包括替代的建模架构和不同的分配程序。在所有场景中,影响估计值在幅度上保持一致,并且在大多数情况下具有统计显著性。此外,估计值也与Bansak等人(2018)最初提出的结果一致。

英文摘要

Previous research has investigated the potential of refugee matching for boosting refugee outcomes, first considered by Bansak et al. (2018). This paper demonstrates the stability of counterfactual impact evaluation results in the context of refugee matching in the United States using a range of off-policy evaluation methods. In order to estimate counterfactual impact and test the robustness of our results, we employ several evaluation methods, including inverse probability weighting (IPW) and multiple variants of augmented inverse probability weighting (AIPW). We also consider various modifications, including alternative modeling architectures and different assignment procedures. The impact estimates remain consistent in magnitude in all scenarios as well as statistically significant in most cases. Furthermore, the estimates are also consistent with the results originally presented in Bansak et al. (2018).
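For readers unfamiliar with the two estimator families named in this abstract, the sketch below shows textbook forms of IPW and AIPW for the value of a deterministic target policy. It is a generic illustration, not the paper's estimators or data: each logged record is assumed to hold a context `x`, the logged action `a`, the outcome `y`, and the known probability `p` with which the logging policy chose `a`. All names and the toy data are hypothetical.

```python
def ipw_value(records, target_policy):
    """Inverse probability weighting: reweight outcomes where the logged
    action agrees with the target policy."""
    total = 0.0
    for x, a, y, p in records:
        if a == target_policy(x):
            total += y / p
    return total / len(records)

def aipw_value(records, target_policy, outcome_model):
    """Augmented IPW: outcome-model prediction plus an inverse-probability-
    weighted residual correction on matching records."""
    total = 0.0
    for x, a, y, p in records:
        a_target = target_policy(x)
        direct = outcome_model(x, a_target)
        correction = (y - outcome_model(x, a)) / p if a == a_target else 0.0
        total += direct + correction
    return total / len(records)

# Toy logged data: two contexts, two actions, uniform logging (p = 0.5),
# and outcome 1.0 exactly when the action matches the context.
records = [(x, a, float(a == x), 0.5) for x in (0, 1) for a in (0, 1)] * 25

def match_policy(x):
    return x  # target policy: always choose the matching action

def crude_model(x, a):
    return 0.3  # deliberately misspecified outcome model

print(ipw_value(records, match_policy))
print(aipw_value(records, match_policy, crude_model))
```

In this toy setting both estimators recover the true policy value of 1.0 even though the outcome model is wrong, because the logging probabilities are correct; that double-robustness property is the usual motivation for reporting AIPW variants alongside IPW.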

2605.06685 2026-05-11 cs.SD eess.AS stat.AP

An audio-to-analysis pipeline with certified transcription for information-theoretic profiling of the piano repertoire

具有认证转录的音频到分析管道:用于钢琴曲目信息论分析的管道

Fred Jalbert-Desforges

发表机构 * Independent Music Analysis Researcher(独立音乐分析研究员) CYGNUS & LYRA(CYGNUS与LYRA) Montreal, Quebec(魁北克省蒙特利尔)

AI总结 本文提出一个音频到分析管道,通过认证转录生成作曲家级信息论分析,利用香农熵、KL散度和Zipf排名模型分析和解构钢琴曲目中的和声尺度分布。

Comments 25 pages, 4 figures, 25 references

详情
AI中文摘要

我们提出一个音频到分析管道,能够生成作曲家级的信息论分析:反映从聚合表演中出现的作曲词汇:从原始录音中构建,基于一个我们能在标准基准上认证的转录层(在MAESTRO v3.0.0测试集上的F1为0.9791)。应用于1,238首曲目和15位MAESTRO作曲家(至少有十首被归因的曲目),涵盖巴洛克到二十世纪初,该管道通过香农熵、不对称Kullback-Leibler散度和Zipfian排名频率模型分析这些经验分布。结果分析(i)将作曲家沿可解释的和声可预测性轴排序,具有狭窄的熵范围(3.33-3.86位),揭示了调性词汇的边缘相似性;(ii)通过在语料库中最小的KL散度恢复已知的风格谱系(海顿-贝多芬、利斯-拉赫玛尼诺夫、舒伯特-舒曼),门德尔松在其中作为稳定的异常值出现;(iii)通过Zipfian拟合到过渡分布的质量将当代新古典主义艺术家(里赫特、弗拉姆、格拉斯、阿纳尔德斯、约翰内森)与历史作曲家区分开来,新古典主义的均值R²为0.78,而历史作曲家为0.46(每组至少10首曲目)。这一差距大于每组内部的分布范围,并与极简主义作曲倾向一致:使用更紧凑的过渡词汇,且频率排名规律性更强。所有估计均报告了Laplace平滑的bootstrap 95%置信区间。

英文摘要

We present an audio-to-analysis pipeline that produces composer-level information-theoretic profiles (reflecting compositional vocabulary as it emerges from aggregated performances) from raw recordings, built on a transcription layer whose accuracy we certify on a standard benchmark (F1 = 0.9791 on the MAESTRO v3.0.0 test set). Applied to 1,238 pieces and 15 MAESTRO composers with at least ten attributed pieces, spanning the Baroque through the early twentieth century, the pipeline derives empirical distributions over harmonic scale degrees and analyzes them through Shannon entropy, asymmetric Kullback-Leibler divergence, and Zipfian rank-frequency modeling. The resulting profiles (i) order composers along an interpretable axis of harmonic predictability, with a narrow entropy range (3.33-3.86 bits) that reveals the marginal-level similarity of tonal vocabularies; (ii) recover known stylistic lineages (Haydn-Beethoven, Liszt-Rachmaninoff, Schubert-Schumann) through the smallest KL divergences in the corpus, with Mendelssohn emerging as a stable outlier within this corpus; and (iii) separate contemporary neoclassical artists (Richter, Frahm, Glass, Arnalds, Jóhannsson) from historical composers on the quality of Zipfian fit to the transition distribution, with mean $R^2 = 0.78$ for neoclassical versus 0.46 for historical (N $\geq$ 10 pieces each). This gap is larger than the spread within either group and is consistent with a minimalist compositional tendency: a compact transition vocabulary used with sharper frequency-rank regularity than historical composers. All estimates are reported with Laplace-smoothed bootstrap 95% confidence intervals.
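The two information-theoretic quantities used in this abstract can be written down in a few lines. The sketch below is a generic illustration, not the paper's pipeline: it assumes each composer is reduced to a sequence of discrete symbols over a fixed alphabet (standing in for harmonic scale degrees), applies additive (Laplace) smoothing with a hypothetical `alpha` parameter, and measures entropy and divergence in bits. All names and the toy sequences are hypothetical.

```python
import math
from collections import Counter

def smoothed_distribution(symbols, alphabet, alpha=1.0):
    """Laplace-smoothed empirical distribution over a fixed alphabet."""
    counts = Counter(symbols)
    total = len(symbols) + alpha * len(alphabet)
    return {s: (counts[s] + alpha) / total for s in alphabet}

def shannon_entropy_bits(p):
    """H(p) = -sum_s p(s) log2 p(s)."""
    return -sum(v * math.log2(v) for v in p.values() if v > 0)

def kl_divergence_bits(p, q):
    """D(p || q) = sum_s p(s) log2(p(s) / q(s)); asymmetric in p and q.

    Smoothing keeps q(s) > 0, so the ratio is always defined.
    """
    return sum(v * math.log2(v / q[s]) for s, v in p.items() if v > 0)

# Toy symbol sequences over a two-symbol alphabet.
alphabet = ["I", "V"]
p = smoothed_distribution(["I", "I", "I", "I"], alphabet)
q = smoothed_distribution(["I", "V"], alphabet)
print(shannon_entropy_bits(p), shannon_entropy_bits(q))
print(kl_divergence_bits(p, q), kl_divergence_bits(q, p))
```

Because the divergence is asymmetric, $D(p \| q)$ and $D(q \| p)$ generally differ, which is why the direction of comparison between two composers matters when reading the smallest-divergence pairs.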

2605.06684 2026-05-11 cs.LG

From Canopy to Collision: A Hybrid Predictive Framework for Identifying Risk Factors in Tree-Involved Traffic Crashes

从树冠到碰撞:一种混合预测框架,用于识别涉及树木的交通事故中的风险因素

Abdul Azim, Ahmed Hossain, Soumyadip Maitra, Panick Kalambay

Affiliations: Department of Civil Engineering; Rajshahi University of Engineering & Technology; Traffic Safety Group; Multimodal Planning Division (MPD); Arizona Department of Transportation (ADOT); Texas Southern University

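AI summary: This paper proposes a hybrid predictive framework that uses the CRSS database to analyze risk factors in tree-involved crashes. It identifies the key contributing factors through machine learning, SHAP, and logistic regression, and points to directions for improving safety measures.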

Comments 30 pages, 10 figures

Details
AI Chinese abstract

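Tree-involved crashes are an important subset of run-off-road collisions and often cause fatal or severe injuries because of high-energy impacts. This study develops a comprehensive analytical framework that uses the Crash Report Sampling System (CRSS) database for 2020-2023 to identify and quantify the risk factors contributing to crash severity through a multi-step process. First, a machine-learning classification model (CatBoost) identifies the key factors associated with binary crash injury severity (KA: fatal or incapacitating injury vs. BC: non-incapacitating or possible injury). Second, the SHapley Additive exPlanations (SHAP) tool quantifies and visualizes the marginal effects of the top influential factors on crash severity. Third, a binary logistic regression model estimates factor effects and validates the SHAP-derived importance measures. Finally, SHAP interaction plots examine the joint effects of the key contributing factors. The results show that restraint non-use is the most influential predictor, with unrestrained occupants nearly three times more likely to experience severe outcomes because of ejection risk. Vehicle age, speeding violations, and driver impairment show substantial effects, reflecting reduced crashworthiness, increased impact forces, and reduced control. Key interactions appear between lighting conditions and vehicle age, speeding and lighting conditions, restraint use and vehicle age, and road surface and speeding, showing additive risk effects with specific interactions. These findings offer key insights for targeted safe-system interventions, including stronger seat belt enforcement, speed management under reduced visibility, and modernization of the vehicle fleet.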

English abstract

Tree-involved crashes represent a critical subset of run-off-road (ROR) collisions, often resulting in fatal or severe injuries due to high-energy impacts. This study develops a comprehensive analytical framework to identify and quantify risk factors contributing to crash severity in tree-involved collisions using the Crash Report Sampling System (CRSS) database spanning 2020-2023. The modeling framework follows a multi-step process. First, a machine-learning-based classification model (CatBoost) identifies key factors associated with binary crash injury severity (KA: fatal or incapacitating injury versus BC: non-incapacitating or possible injury). Second, the SHapley Additive exPlanations (SHAP) tool is used to quantify and visualize the marginal effects of the top influential factors on crash severity. Third, a binary logistic regression model estimates factor effects and validates SHAP-derived importance measures. Finally, SHAP interaction plots examine the combined effects of key contributing factors. Results reveal restraint non-use as the most influential predictor, with unrestrained occupants nearly three times more likely to experience severe outcomes due to ejection risk. Vehicle age, speeding violations, and driver impairment demonstrate substantial effects, reflecting reduced crashworthiness, increased impact forces, and reduced control capabilities. Critical interactions emerge between lighting conditions and vehicle age, speeding and lighting conditions, restraint use and vehicle age, and road surface and speeding, demonstrating additive risk effects with specific interactions. These findings provide critical insights for targeted safe system-based interventions, including enhanced seat belt enforcement, speed management in reduced visibility conditions, and vehicle fleet modernization.

2605.06683 2026-05-11 cs.LG cs.AI cs.CL

Toeplitz MLP Mixers are Low Complexity, Information-Rich Sequence Models

Toeplitz MLP Mixers 是低复杂度、信息丰富的序列模型

Benjamin L. Badger, Ethan Roland

Affiliations: IBM; AE Studio

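AI summary: This paper proposes the Toeplitz MLP Mixer, which replaces attention with triangular-masked Toeplitz matrix multiplication, achieving lower computational complexity while performing better on information retention and copying.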

Details
AI Chinese abstract

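Transformer-based large language models are in some respects limited by the quadratic time and space complexity of attention. We introduce the Toeplitz MLP Mixer (TMM), a transformer-like architecture that replaces attention with triangular-masked Toeplitz matrix multiplication over the sequence dimension, reaching $\mathcal{O}(dn \log n)$ time and $\mathcal{O}(dn)$ space complexity during training and $\mathcal{O}(dn)$ time and space at inference prefill. Although TMMs lack the sophisticated input modulation or state maintenance of other sub-quadratic architectures, they train more efficiently per unit of compute and device memory. We show that TMMs retain more input information and therefore copy better, which we attribute to a lack of architectural biases. Consistent with this higher retention of input information, TMMs outperform other architectures in accuracy on information retrieval and in-context learning benchmarks. Finally, we analyze the models from the perspective of operator index theory and show that, counterintuitively, the trained Toeplitz layers of causal non-invertible models are more likely to be invertible or nearly invertible than those of models that are actually invertible over their inputs.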

English abstract

Transformer-based large language models are in some respects limited by the quadratic time and space computational complexity of attention. We introduce the Toeplitz MLP Mixer (TMM), a transformer-like architecture that swaps attention for triangular-masked Toeplitz matrix multiplication over the sequence dimension, resulting in $\mathcal{O}(dn \log n)$ time and $\mathcal{O}(dn)$ space complexity during training and $\mathcal{O}(dn)$ time and space at inference prefill. Despite the lack of sophisticated input modulation or state maintenance present in other sub-quadratic architectures, TMMs yield greater training efficiency in terms of loss achieved per compute and device memory. We demonstrate that TMMs are capable of retaining more input information, resulting in improved copying ability, which we argue results from a lack of architectural biases. Consistent with higher input information retention, TMMs exhibit superior information retrieval and in-context learning benchmark accuracy compared to comparable architectures. We conclude with an analysis from the perspective of operator index theory and show that, counterintuitively, trained Toeplitz layers of causal non-invertible models are more likely to be invertible or nearly so than models that are actually invertible over their inputs.
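
The $\mathcal{O}(dn \log n)$ figure is what one would expect if the triangular-masked Toeplitz product is evaluated as a causal convolution with the FFT. The NumPy sketch below illustrates that general identity only; it is not the authors' implementation. The function names and the choice of one length-$n$ kernel per feature channel are assumptions.

```python
import numpy as np

def causal_toeplitz_dense(w):
    """Dense n x n lower-triangular Toeplitz matrix with T[i, j] = w[i - j] for i >= j."""
    n = len(w)
    idx = np.arange(n)[:, None] - np.arange(n)[None, :]
    return np.where(idx >= 0, w[np.clip(idx, 0, n - 1)], 0.0)

def causal_toeplitz_fft(w, x):
    """Apply a per-channel causal Toeplitz mix along the sequence axis.

    w: (n, d) kernels, one per channel (an assumption of this sketch).
    x: (n, d) input sequence.
    Costs O(d n log n) time instead of O(d n^2) for the dense product.
    """
    n = x.shape[0]
    m = 2 * n  # zero-pad so the circular convolution equals the linear one
    w_hat = np.fft.rfft(w, n=m, axis=0)
    x_hat = np.fft.rfft(x, n=m, axis=0)
    return np.fft.irfft(w_hat * x_hat, n=m, axis=0)[:n]

# Compare the FFT path against the dense masked product on random data.
rng = np.random.default_rng(0)
n, d = 16, 4
w = rng.standard_normal((n, d))
x = rng.standard_normal((n, d))

fast = causal_toeplitz_fft(w, x)
dense = np.stack(
    [causal_toeplitz_dense(w[:, c]) @ x[:, c] for c in range(d)], axis=1
)
print("max abs difference:", np.max(np.abs(fast - dense)))
```

Keeping only the first $n$ outputs of the padded convolution is what enforces the triangular mask: output position $i$ depends only on inputs at positions $j \le i$.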