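arXivDaily: a daily academic digest synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, and electrical engineering and systems.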


2605.11424 2026-05-13 cs.CV

VidSplat: Gaussian Splatting Reconstruction with Geometry-Guided Video Diffusion Priors

Jimin Tang, Wenyuan Zhang, Junsheng Zhou, Zian Huang, Kanle Shi, Shenkun Xu, Yu-Shen Liu, Zhizhong Han

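Affiliations: School of Software, Tsinghua University; Department of Computer Science, Wayne State University

AI summary: VidSplat is a generative reconstruction framework built on Gaussian Splatting that targets the missing regions and occlusions that arise in multi-view surface reconstruction from sparse views. It uses video diffusion priors to iteratively generate novel views that fill in regions the input views cover poorly, enabling reconstruction of the complete 3D scene. Its core contributions are a training-free, staged denoising strategy and an iterative optimization mechanism, which improve the geometric consistency and completeness of the reconstruction.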

Comments Accepted by SIGGRAPH Conference 2026. Project Page: https://tangjm24.github.io/VidSplat

2605.11418 2026-05-13 cs.AI cs.CR

Under the Hood of SKILL.md: Semantic Supply-chain Attacks on AI Agent Skill Registry

Shoumik Saha, Kazem Faghih, Soheil Feizi

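Affiliations: Department of Computer Science, University of Maryland - College Park

AI summary: This paper studies natural-language semantic supply-chain attacks on AI agent skill registries, showing how SKILL.md files can be maliciously exploited during skill discovery, selection, and governance. Experiments show that attackers can use carefully crafted text triggers to raise the visibility of malicious skills, steer agents toward functionally similar adversarial variants, and evade security review. The study argues that SKILL.md is not merely documentation but operative text that shapes agent behavior, exposing a major security risk in current mechanisms for extending AI agent capabilities.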

Comments 31 pages, 21 figures, 10 tables

2605.11414 2026-05-13 cs.LG cs.AI

Generative Diffusion Prior Distillation for Long-Context Knowledge Transfer

Nilushika Udayangani, Kishor Nandakishor, Marimuthu Palaniswami

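Affiliations: Department of Electrical and Electronic Engineering

AI summary: This paper studies, for time-series classification, how to transfer knowledge from a classifier trained on full sequences to one that sees only partial sequences. To counter the loss of generalization caused by the lack of discriminative features in partial data, the authors propose a knowledge distillation framework based on generative diffusion priors (GDPD). It treats short-context student features as degraded observations of full-context teacher features, uses the iterative restoration ability of diffusion models to learn a generative prior over teacher features, and guides the student features toward long-context knowledge, which improves partial-sequence classification. Experiments show that GDPD delivers strong full-to-partial sequence knowledge transfer across multiple datasets and architectures.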

Comments Published as a conference paper at ICLR 2026 (Brazil, Rio de Janeiro)

Journal ref The Fourteenth International Conference on Learning Representations 2026

2605.11408 2026-05-13 cs.LG cs.AI cs.CL

MaskTab: Scalable Masked Tabular Pretraining with Scaling Laws and Distillation for Industrial Classification

Bo Zheng, Yudong Chen, Zihua Xiong, Shuai Fang, Peidong He, Yang Yang, Sheng Guo

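Affiliations: Zhejiang University; MyBank, Ant Group

AI summary: MaskTab is a unified pretraining framework for industrial-scale tabular data, which is high-dimensional, has many missing values, and has few labels. It introduces learnable missing-value tokens and a hybrid supervised pretraining strategy, combined with a multi-expert augmented loss, to improve performance on large-scale industrial data. Experiments show that MaskTab significantly outperforms existing methods on several industrial benchmarks and can be distilled efficiently into lightweight models that retain strong performance under strict latency and interpretability constraints.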

2605.11406 2026-05-13 cs.LG

A Boundary-Aware Non-parametric Granular-Ball Classifier Based on Minimum Description Length

Zeqiang Xian, Caihui Liu, Yong Zhang, Wenjing Qiu, Duoqian Miao, Witold Pedrycz

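Affiliations: Department of Mathematics and Computer Science, Gannan Normal University; Key Laboratory of Data Science and Artificial Intelligence of Jiangxi Education Institutes, Gannan Normal University; Department of Computer Science and Technology, Tongji University; Department of Electrical and Computer Engineering, University of Alberta

AI summary: This paper proposes MDL-GBC, a boundary-aware non-parametric granular-ball classifier based on the minimum description length principle, to remove the reliance of existing granular-ball classifiers on hand-designed quality metrics and heuristic rules. It casts class-conditional granular-ball construction as a local model selection problem: by comparing the description lengths of a one-ball model, a two-ball model, and a core-boundary model, it decides whether to keep, split, or refine each ball. This yields explicit modeling of boundary-sensitive regions that is consistent with the classification mechanism. Experiments show that MDL-GBC achieves strong, competitive classification performance on multiple benchmark datasets while remaining interpretable.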

Comments 13 pages, 2 figures

2605.11404 2026-05-13 cs.AI

Attributing Emergence in Million-Agent Systems

Ling Tang, Jilin Mei, Qian Chen, Qihan Ren, Linfeng Zhang, Quanshi Zhang, Jing Shao, Xia Hu, Dongrui Liu

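Affiliations: Shanghai Artificial Intelligence Laboratory; Shanghai Jiao Tong University; Fudan University; Tongji University

AI summary: This study asks how macro-level emergent phenomena in million-agent systems can be attributed to individual agents. Existing methods are limited by computational complexity to small systems, whereas real social phenomena often occur at the scale of millions of agents. The authors scale Aumann-Shapley path-integral attribution to million-agent systems, obtaining efficient attributions that satisfy all four axioms. Empirical analysis reveals structural differences between attributions computed at small scale and on the full population, and shows that full-population attribution is theoretically necessary for nonlinear macro metrics.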
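As a rough illustration of the Aumann-Shapley path-integral attribution named in the entry above, the sketch below integrates finite-difference gradients along the straight line from a baseline to the observed input. It is a generic textbook-style version, not the paper's scalable algorithm: the function names, the toy `macro_metric`, the zero baseline, and the step count are all assumptions made for this example, and the cost grows with the number of agents times the number of steps.

```python
def aumann_shapley(f, x, baseline, steps=50, eps=1e-4):
    """Approximate Aumann-Shapley attributions of f(x) - f(baseline).

    Uses a midpoint Riemann sum over the straight path from baseline to x
    and central finite differences for the partial derivatives.
    """
    n = len(x)
    delta = [xi - bi for xi, bi in zip(x, baseline)]
    attributions = [0.0] * n
    for k in range(steps):
        t = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [bi + t * di for bi, di in zip(baseline, delta)]
        for i in range(n):
            up = point.copy()
            down = point.copy()
            up[i] += eps
            down[i] -= eps
            grad_i = (f(up) - f(down)) / (2 * eps)
            attributions[i] += grad_i * delta[i] / steps
    return attributions


def macro_metric(actions):
    # Hypothetical nonlinear macro-level metric over per-agent actions.
    return sum(actions) ** 2


x = [1.0, 2.0, 3.0]          # toy "agent" contributions (assumed)
baseline = [0.0, 0.0, 0.0]   # assumed reference state
phi = aumann_shapley(macro_metric, x, baseline)
```

One property worth checking in any such implementation is completeness: the attributions should sum to `f(x) - f(baseline)` up to numerical error.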

2605.11403 2026-05-13 cs.LG cs.AI cs.CL

fg-expo: Frontier-guided exploration-prioritized policy optimization via adaptive kl and gaussian curriculum

Mingxiong Lin, Zhangquan Gong, Maowen Tang, Qian Li, Chuangchuang Wang, Jian Ma, Sutian Huang, Kai Tang, Haonan Lu

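Affiliations: OPPO AI Center

AI summary: This study addresses two efficiency problems in Group Relative Policy Optimization (GRPO), the dominant algorithm for reinforcement learning with verifiable rewards (RLVR), and proposes FG-ExPO. It adds two lightweight components: accuracy-conditioned KL scaling (AKL), which dynamically adjusts how strongly policy exploration is constrained, and Gaussian curriculum sampling (GCS), which optimizes the sampling distribution over problems. Together they improve training efficiency on mathematical reasoning tasks. Experiments show that FG-ExPO significantly outperforms the original GRPO on several mainstream benchmarks, with particularly strong gains on tasks such as AIME 2025.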

2605.11402 2026-05-13 cs.LG cs.CR cs.NI

More Than Meets the Eye: A Semantics-Aware Traffic Augmentation Framework for Generalizable Website Fingerprinting

Youquan Xian, Xueying Zeng, Lingjia Meng, Lei Cui, Runhan Song, Wei Wang, Zhengquan Ding, Peng Liu, Zhiyu Hao

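Affiliations: School of Cyberspace Security, Beijing University of Posts and Telecommunications, Beijing, China; School of Computer Science and Engineering, Beihang University, Beijing, China; Faculty of Computing, Harbin Institute of Technology, Harbin, China; School of Computer Science and Engineering, Guangxi Normal University, Guilin, China; Zhongguancun Laboratory, Beijing, China

AI summary: This paper proposes SATA, a semantics-aware traffic augmentation framework, to address the poor generalization of deep-learning-based website fingerprinting in real-world environments. It performs application-layer semantic augmentation using protocol rules, expanding the resource composition patterns and frame sequence patterns in traffic, and introduces a cross-layer feature alignment mechanism that aligns the augmented semantics with observable traffic features. Experiments show that SATA can generate traffic patterns that are absent from the training set but do occur in the test set, significantly improving mainstream models across a range of complex scenarios. In the open-world setting, accuracy and AUROC improve by 90.81% and 48.37%, respectively.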

Comments 18 pages, 19 figures, Submitted to NDSS 2027

2605.11398 2026-05-13 cs.AI cs.CL

AcuityBench: Evaluating Clinical Acuity Identification and Uncertainty Alignment

Robin Linzmayer, Georgianna Lin, Di Coneybeare, Jason Chu, Trudi Cloyd, Manish Garg, Miles Gordon, Elizabeth Hartofilis, Benjamin Hong, Ashraf Hussain, Eugene Y. Kim, Oluchi Iheagwara King, Ross McCormack, Erica Olsen, John K. Riggins, Mustafa N. Rasheed, Dana L. Sacco, Vinay Saggar, Osman R. Sayan, Amit Shembekar, Janice Shin-Kim, Wendy W. Sun, Bernard P. Chang, David Kessler, Noémie Elhadad

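Affiliations: Department of Computer Science, Columbia University, New York, NY, USA; Department of Biomedical Informatics, Columbia University, New York, NY, USA; Department of Emergency Medicine, Columbia University Irving Medical Center, New York, NY, USA

AI summary: This paper introduces AcuityBench, a benchmark for evaluating whether language models correctly identify the urgency of care from users' medical descriptions. It integrates five public datasets covering user conversations, forum posts, clinical vignettes, and patient portal messages, all evaluated under a unified four-level acuity framework. The study finds that model performance differs markedly between clear and ambiguous cases and that the choice of task format affects the type of triage error, underscoring clinical acuity identification as a safety-critical capability.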

Comments 41 pages, 5 figures. Preprint under review for the Track on Evaluations and Datasets at NeurIPS 2026

Abstract

We introduce AcuityBench, a benchmark for evaluating whether language models identify the appropriate urgency of care from user medical presentations. Existing health benchmarks emphasize medical question answering, broad health interactions, or narrow workflow-specific triage tasks, but they do not offer a unified evaluation of acuity identification across these settings. AcuityBench addresses this gap by harmonizing five public datasets spanning user conversations, online forum posts, clinical vignettes, and patient portal messages under a shared four-level acuity framework ranging from home monitoring to immediate emergency care. The benchmark contains 914 cases, including 697 consensus cases for standard accuracy evaluation and 217 physician-confirmed ambiguous cases for uncertainty-aware evaluation. It supports two complementary task formats: explicit four-way classification in a QA setting, and free-form conversational responses evaluated with a rubric-based judge anchored to the same framework. Across 12 frontier proprietary and open-weight models, we find substantial variation in clear-case acuity accuracy and error direction. Comparing task formats reveals a systematic tradeoff: conversational responses reduce over-triage but increase under-triage relative to QA, especially in higher-acuity cases. In ambiguous cases, no model closely matches the distribution of physician judgments, and model predictions are more concentrated than expert clinical uncertainty. We also compare expert and model adjudication on a subset of maximally ambiguous cases, using those cases to examine the role of clinical uncertainty in label disagreement. Together, these results position acuity identification as a distinct safety-critical capability and show that AcuityBench enables systematic comparison and stress-testing of how well models guide users to the right level of care in real-world health use.
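The over-triage and under-triage error directions discussed in the abstract above can be made concrete with a small scoring sketch on an ordinal four-level scale. This is only an illustration under stated assumptions: the level names in `LEVELS` are hypothetical placeholders (the abstract only says the scale runs from home monitoring to immediate emergency care), and `triage_errors` is not part of the benchmark's code.

```python
from collections import Counter

# Hypothetical labels, ordered from least to most urgent.
LEVELS = ["home", "routine", "urgent", "emergency"]


def triage_errors(reference, predicted):
    """Return the fraction of correct, over-triaged, and under-triaged cases.

    Over-triage: the predicted level is more urgent than the reference.
    Under-triage: the predicted level is less urgent than the reference.
    """
    if not reference or len(reference) != len(predicted):
        raise ValueError("reference and predicted must be non-empty and equal length")
    rank = {name: i for i, name in enumerate(LEVELS)}
    counts = Counter()
    for ref, pred in zip(reference, predicted):
        diff = rank[pred] - rank[ref]
        if diff > 0:
            counts["over"] += 1
        elif diff < 0:
            counts["under"] += 1
        else:
            counts["correct"] += 1
    total = len(reference)
    return {key: counts[key] / total for key in ("correct", "over", "under")}


reference = ["home", "urgent", "emergency", "routine"]
predicted = ["routine", "urgent", "urgent", "routine"]
rates = triage_errors(reference, predicted)
```

Reporting the two error directions separately matters because they carry different risks: under-triage can delay needed care, while over-triage sends users to a higher level of care than they need.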


2605.11396 2026-05-13 cs.LG

MuonQ: Enhancing Low-Bit Muon Quantization via Directional Fidelity Optimization

Yupeng Su, Ruijie Zhang, Ziyue Liu, Yequan Zhao, Zheng Zhang

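Affiliations: University of California, Santa Barbara

AI summary: This paper proposes MuonQ, a low-bit training framework for the Muon optimizer based on directional fidelity optimization, addressing the Muon optimizer's sensitivity to quantization error. Through pre-quantization normalization, structural decomposition, and μ-law companding quantization, MuonQ suppresses the accumulation of quantization error and directional deviation, enabling stable and efficient 4-bit quantized training. Experiments show that MuonQ keeps training loss and downstream accuracy close to full-precision Muon while reducing optimizer-state memory by 7.3×.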

Comments MuonQ enables stable 4-bit quantization of Muon's optimizer states by preserving directional fidelity through pre-quantization normalization, structural decomposition, and companding quantization
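To give a feel for the companding quantization mentioned in the MuonQ entry above, here is a minimal sketch of generic μ-law companding followed by 4-bit uniform quantization. It is not MuonQ's implementation: the value `mu=255`, the per-list max-abs scaling (standing in for pre-quantization normalization), and all function names are assumptions chosen for illustration.

```python
import math


def mu_law_compress(x, mu=255.0):
    # Map x in [-1, 1] to [-1, 1], giving small magnitudes more resolution.
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=255.0):
    # Inverse of mu_law_compress.
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)


def compand_quantize(values, bits=4, mu=255.0):
    """Scale to [-1, 1], compress with mu-law, then quantize uniformly."""
    steps = 2 ** bits - 1                         # 15 intervals, 16 codes
    scale = max(abs(v) for v in values) or 1.0    # assumed max-abs scaling
    codes = []
    for v in values:
        y = mu_law_compress(v / scale, mu)
        codes.append(round((y + 1.0) / 2.0 * steps))
    return codes, scale


def compand_dequantize(codes, scale, bits=4, mu=255.0):
    """Invert compand_quantize up to quantization error."""
    steps = 2 ** bits - 1
    return [scale * mu_law_expand(c / steps * 2.0 - 1.0, mu) for c in codes]


values = [0.5, -0.02, 0.001, -1.5]
codes, scale = compand_quantize(values)
restored = compand_dequantize(codes, scale)
```

The design idea is that a logarithmic curve spends more of the 16 available codes on small magnitudes than a uniform grid would, at the cost of coarser steps near the extremes.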

2605.11392 2026-05-13 cs.AI

Transformer Interpretability from Perspective of Attention and Gradient

Yongjin Cui, Xiaohui Fan, Huajun Chen

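Affiliations: Zhejiang University

AI summary: This paper studies Transformer interpretability from the perspective of attention and gradients, and proposes a method that guides the gradient direction (that is, the attention direction) to produce more complete and fine-grained explanations of feature regions. The method helps clarify how Transformers work and reveals differences between Vision Transformer (ViT) and human image perception. It also demonstrates nearly imperceptible manipulations that change an image's class, which may pose security risks in certain settings.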

2605.11388 2026-05-13 cs.CL cs.AI

Deep Reasoning in General Purpose Agents via Structured Meta-Cognition

Dean Light, Michael Theologitis, Kshitish Ghate, Shuyue Stella Li, Benjamin Newman, Chirag Shah, Aylin Caliskan, Pang Wei Koh, Dan Suciu, Yulia Tsvetkov

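Affiliations: University of Washington

AI summary: This study proposes Deep Reasoning, a method for making general-purpose agents more flexible and adaptive on reasoning tasks. Using structured meta-reasoning, it dynamically constructs task-specific reasoning scaffolds at inference time so that complex problems are handled more effectively. Experiments show that DOLORES, a general-purpose agent built on this method, significantly outperforms existing methods on several hard benchmarks, demonstrating its strengths in structured reasoning and task adaptability.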

Comments Preprint under review

Abstract

Humans intuitively solve complex problems by flexibly shifting among reasoning modes: they plan, execute, revise intermediate goals, resolve ambiguity through associative judgment, and apply formal procedures to well-specified subproblems. Current LLM agents lack this flexibility, as their scaffolds hard-code such reasoning decisions in advance. These scaffolds are effective when their prescribed structure matches the task, but brittle when solving the task requires adapting the structure of reasoning itself. We introduce Deep Reasoning -- an inference-time approach for constructing task-specific scaffolds through structured meta-reasoning. Deep Reasoning uses a formal language that represents meta-reasoning as executable decompositions over associative inference, formal computation, and recursive subproblem solving, enabling decomposition principles to be encoded as in-context examples that guide test-time scaffold construction. We instantiate this approach in a general-purpose agent (DOLORES) that distributes complex tasks across more controlled reasoning threads. We evaluate it against state-of-the-art scaffolding methods across four hard benchmarks: multi-hop reasoning, long-chain question answering, long-context aggregation, and deep research-style information seeking. DOLORES outperforms all evaluated scaffolds across three model sizes and two model families, improving over the strongest evaluated scaffold baseline by 24.8% on average. DOLORES distributes cognition across structured, lower-load reasoning threads, thereby reducing premature termination and hallucinations. This advantage can even bridge the scaling gap, with an 8B version surpassing all evaluated 32B baselines from the same family in more than half the settings. These results point toward future agentic systems that treat scaffolding as adaptive reasoning, constructing the structure each task requires just-in-time.


2605.11387 2026-05-13 cs.LG cs.RO

Behavioral Mode Discovery for Fine-tuning Multimodal Generative Policies

Alberta Longhini, David Emukpere, Jean-Michel Renders, Seungsu Kim

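Affiliations: Naver Labs Europe; Department of Computer Science, Stanford University

AI summary: This paper studies how to fine-tune pretrained generative policies with reinforcement learning while preserving the multimodality of their action distributions. Existing methods tend to collapse behavior into a single mode as they improve task performance. To address this, the authors propose an unsupervised behavioral mode discovery framework that uncovers latent behavior modes in the policy and uses mutual information as an intrinsic reward, preserving behavioral diversity while raising task success. Experiments on robot manipulation tasks show higher success rates than conventional fine-tuning and a richer multimodal action distribution.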

Journal ref International Conference on Machine Learning, 2026

2605.11386 2026-05-13 cs.AI

Revisiting Privacy Preservation in Brain-Computer Interfaces: Conceptual Boundaries, Risk Pathways, and a Protection-Strength Grading Framework

Lei Sun, Xiuqing Mao, Shuai Zhang, Qingyu Zeng, Min Zhao, Jiyuan Li, Wenle Dong

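Affiliations: PLA Information Engineering University

AI summary: As brain-computer interface (BCI) technology moves from the laboratory into clinical and real-world use, privacy protection is becoming a pressing concern. This paper systematically reviews the pathways of privacy leakage in BCI systems and proposes a three-dimensional taxonomy covering the protected object, the lifecycle stage, and the protection-strength level, sorting existing work into four protection-strength levels. It stresses that BCI privacy protection must not only hide data but also separate out task-irrelevant sensitive information while preserving system utility, and notes that mental privacy and neuroethical risks remain open problems.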

2605.11385 2026-05-13 cs.CV cs.RO

JACoP: Joint Alignment for Compliant Multi-Agent Prediction

Qingze Liu, Alen Mrdovic, Danrui Li, Mathew Schwartz, Sejong Yoon, Mubbasir Kapadia

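Affiliations: Rutgers University, New Brunswick; The College of New Jersey

AI summary: This paper proposes JACoP, a multi-stage framework for collective compliance in multi-agent trajectory prediction. Its core combines anchor-based filtering of individual trajectories with joint trajectory alignment based on a Markov random field, which reduces social collisions between trajectories and environmental violations. JACoP markedly improves scene-level plausibility while maintaining prediction accuracy, offering safer and more reliable predictions for practical use.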

Comments Accepted by CVPRF 2026

2605.11383 2026-05-13 cs.CV

HamBR: Active Decision Boundary Restoration Based on Hamiltonian Dynamics for Learning with Noisy Labels

Ningkang Peng, Jingyang Mao, Qianfeng Yu, Xiaoqian Peng, Peirong Ma, Yanhui Gu

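Affiliations: Nanjing Normal University; Nanjing University of Chinese Medicine

AI summary: In large-scale visual recognition and data mining tasks, noisy labels severely harm the generalization of deep neural networks. This paper proposes HamBR, which the authors describe as the first active decision-boundary restoration method based on Hamiltonian dynamics. It uses a Spherical Hamiltonian Monte Carlo mechanism to actively probe inter-class ambiguous regions in feature space and synthesize high-quality virtual outliers, then uses energy-based modeling to build robust barriers at the decision boundaries, restoring their discriminative sharpness. Experiments show that HamBR achieves state-of-the-art performance on multiple benchmark datasets and significantly improves the model's out-of-distribution detection.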

Abstract

In large-scale visual recognition and data mining tasks, the presence of noisy labels severely undermines the generalization capability of deep neural networks (DNNs). Prevalent sample selection methods rely primarily on training loss or prediction confidence for passive screening. However, within a feature space degraded by noise, decision boundaries undergo systematic boundary collapse. This phenomenon hinders the ability of the model to distinguish between hard clean samples and noisy samples at the decision margins, thereby creating a significant performance bottleneck. This study is the first to emphasize the pivotal importance of active boundary restoration for noise-robust learning. We propose HamBR, a novel paradigm based on Hamiltonian dynamics. The core approach leverages the Spherical Hamiltonian Monte Carlo (Spherical HMC) mechanism to actively probe inter-class ambiguous regions within the representation space and synthesize high-quality virtual outliers. By imposing explicit repulsion constraints via energy-based modeling, these synthesized samples establish robust energy barriers at the decision boundaries. This mechanism forces real samples to move from dispersed overlapping regions toward their respective class centers, thereby restoring the discriminative sharpness of the decision boundaries. HamBR demonstrates exceptional versatility and can be integrated as a plug-and-play defense module into existing semi-supervised noisy label learning frameworks. Empirical evaluations show that the proposed paradigm significantly enhances the discriminative accuracy of hard boundary samples, achieving state-of-the-art (SOTA) performance on CIFAR-10/100 and real-world noise benchmarks. Furthermore, it exhibits superior convergence efficiency and reliable robustness, while improving significantly the capability of the model for Out-of-Distribution (OOD) detection.


2605.11381 2026-05-13 cs.RO cs.DC

Kairos: A Scalable Serving System for Physical AI

Yinwei Dai, Ganesh Ananthanarayanan, Landon Cox, Xenofon Foukas, Bozidar Radunovic, Ravi Netravali

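Affiliations: Princeton University; Microsoft

AI summary: As physical AI grows more capable in general environments, its inference characteristics differ markedly from those of digital AI, and existing digital AI serving systems cannot meet its needs. This paper presents Kairos, described as the first physical AI serving system designed for multiple robots, which treats the generate-execute loop as its core mechanism and significantly improves task execution efficiency. Experiments across multiple physical AI models and robot platforms show that Kairos reduces average end-to-end task latency by 31.8% to 66.5% relative to existing digital AI serving approaches, with gains that grow as the number of robots increases.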

2605.11380 2026-05-13 cs.LG cs.AI

TRACE: Temporal Routing with Autoregressive Cross-channel Experts for EEG Representation Learning

Fan Ma, Qier An, Peng Chen, Lingfei Qian, Xiang Lan, Mingyang Jiang, Zhiling Gu, Xenophon Papademetris, Hua Xu

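Affiliations: Department of Biomedical Informatics and Data Science, Yale University

AI summary: This paper proposes TRACE, an autoregressive EEG pretraining framework, to address the difficulty of learning transferable representations from multi-channel, non-stationary EEG signals. TRACE predicts future EEG segments from causal context and, at each time step, performs temporally adaptive computation that is consistent across channels, allowing flexible modeling of different temporal phases and inter-channel relationships. It supports heterogeneous pretraining across channel configurations and recording domains. Experiments show strong performance on multiple downstream tasks, and the method is competitive on motor imagery and clinical event classification in particular.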

2605.11376 2026-05-13 cs.AI

LLM-X: A Scalable Negotiation-Oriented Exchange for Communication Among Personal LLM Agents

Giuliano Lorenzoni, Paulo Alencar, Donald Cowan

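Affiliations: University of Waterloo

AI summary: This paper proposes LLM-X, a scalable negotiation-oriented exchange framework that supports direct, structured communication among personal LLM agents. It introduces a message bus and routing mechanism that ensure structural validity and policy enforcement, and provides an architecture for federated gateways, topic routing, and policy enforcement, along with a typed message protocol that supports capability negotiation and contract-net-style coordination. Experiments show that LLM-X remains stable across scales and load conditions and reveal how policy choices trade off system robustness, fairness, and communication efficiency.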

Comments 8 pages, 7 figures, accepted at AGENT 2026 Workshop, co-located with ICSE 2026

2605.11373 2026-05-13 cs.AI cs.LG stat.ML

Causal Algorithmic Recourse: Foundations and Methods

Drago Plecko, Collin Wang, Elias Bareinboim

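Affiliations: Department of Statistics & Data Science, UCLA; Department of Computer Science, Columbia University

AI summary: This paper studies how to give individuals reliable recommendations for reversing a decision made by an AI decision system, the problem of algorithmic recourse. The authors propose a causal framework that models recourse as a pre- and post-intervention outcome process, accounting for resampling of latent variables and partial stability. They introduce a post-recourse stability condition, develop copula-based algorithms to infer recourse effects from observational data, and propose a distribution-free learning method for cases where the data do not fit the copula model, giving a more robust and practical approach to algorithmic recourse.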

2605.11369 2026-05-13 cs.CV

Dynamic Full-body Motion Agent with Object Interaction via Blending Pre-trained Modular Controllers

Sanghyeok Nam, Byoungjun Kim, Daehyung Park, Tae-Kyun Kim

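Affiliations: KAIST

AI summary: This study tackles the generation of dynamic human-object interaction motions, proposing a framework that combines pretrained motion priors with imitation agents to generate long-horizon dynamic interactions such as running while carrying an object. In the planning stage, a pretrained human motion diffusion model augments the dataset and object trajectories are generated, yielding a planned dynamic interaction sequence. In the execution stage, a composition network blends pretrained imitation agents that specialize in either dynamic human motion or static interaction, combining their skills complementarily across space and time. The method significantly raises task success while preserving interaction quality, and greatly reduces training time.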

Comments CVPR Findings 2026

2605.11368 2026-05-13 cs.LG cs.AI q-bio.GN

LPDP: Inference-Time Reward Control for Variable-Length DNA Generation with Edit Flows

Jeongchan Kim, Yunkyung Ko, Jong Chul Ye

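Affiliations: KAIST AI

AI summary: This paper studies inference-time reward control for DNA sequence generation with Edit Flows. It proposes LPDP, a training-free local re-solving operator that takes intermediate states and actions into account and performs efficient edits when generating variable-length DNA sequences. At each inference step, LPDP evaluates single-step root edits, keeps a set of the best root edits, and solves a discrete optimization problem within a local scope, improving the quality and biological plausibility of generated sequences. It applies to tasks such as enhancer optimization and splice-boundary repair.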

Comments 22 pages, 5 figures

2605.11363 2026-05-13 cs.CV cs.CL

PresentAgent-2: Towards Generalist Multimodal Presentation Agents

Wei Wu, Ziyang Xu, Zeyu Zhang, Yang Zhao, Hao Tang

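Affiliations: Peking University; La Trobe University

AI summary: This paper proposes PresentAgent-2, an agentic framework that generates complete presentation videos with multimodal content from a user query. It supports three distinct presentation modes (single-presenter explanation, multi-speaker discussion, and interactive Q&A) and uses deep research and multimodal resource integration for content generation, script writing, and dynamic media composition. The work extends presentation generation from document-dependent slide creation toward query-driven, research-grounded, interactive video generation.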

2605.11362 2026-05-13 cs.LG cs.AI stat.AP stat.ML

Causal Fairness for Survival Analysis

Drago Plecko

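Affiliations: Department of Statistics & Data Science

AI summary: Machine learning and AI are now widely used in high-stakes domains such as healthcare and employment, raising concerns about system fairness. Existing fair machine learning research mostly focuses on static settings, while fairness in temporal settings such as survival analysis remains under-studied. This paper proposes a causal framework for fairness in survival analysis that decomposes survival disparities into contributions from direct, indirect, and spurious pathways, giving an interpretable account of how disparities arise and evolve. The framework is applied to analyze how racial disparities in intensive care units change over time.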

2605.11355 2026-05-13 cs.LG cs.CE

gym-invmgmt: An Open Benchmarking Framework for Inventory Management Methods

Reza Barati, Qinmin Vivian Hu

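Affiliations: Department of Computer Science

AI summary: This paper presents gym-invmgmt, an open-source benchmarking framework for comparing inventory management policies under unified experimental conditions. Using a shared core environment and 22 varied scenarios, it evaluates optimization methods, heuristics, and learned controllers across different inventory management conditions. The study finds that scenario-hedging stochastic programming performs best when forecast information is available, while Transformer-based proximal policy optimization has advantages in inference speed and policy quality. Policy performance nonetheless depends on several factors, including information access, demand variation, network structure, and policy representation.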

Comments 16 pages, 4 figures

2605.11354 2026-05-13 cs.CV

Lite3R: A Model-Agnostic Framework for Efficient Feed-Forward 3D Reconstruction

Haoyu Zhang, Zeyu Zhang, Zedong Zhou, Yang Zhao, Hao Tang

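Affiliations: Peking University; La Trobe University

AI summary: This paper proposes Lite3R, a model-agnostic framework for improving the efficiency of Transformer-based 3D reconstruction. It introduces sparse linear attention to cut the compute cost of dense multi-view attention and pairs it with a parameter-efficient, FP8-aware quantization training strategy for stable geometric reconstruction at low precision. Experiments show that Lite3R significantly reduces latency and memory consumption on several mainstream models while maintaining high reconstruction quality, offering an algorithm-system co-design approach to efficient 3D reconstruction in practice.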

2605.11348 2026-05-13 cs.CL cs.AI cs.IR cs.SI

Large Language Models for Causal Relations Extraction in Social Media: A Validation Framework for Disaster Intelligence

Ujun Jeong, Saketh Vishnubhatla, Bohan Jiang, Andre Harrison, Adrienne Raglin, Huan Liu

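Affiliations: Arizona State University; DEVCOM Army Research Laboratory

AI summary: This paper studies how large language models (LLMs) can extract causal relations from social media in disaster scenarios to strengthen situational awareness. To validate the LLMs, the authors propose an expert-knowledge-based evaluation framework that assesses accuracy by comparing model-generated causal graphs against reference graphs drawn from disaster reports. The study finds that LLMs show promise for causal relation extraction but risk relying on the model's prior knowledge rather than post-event evidence.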

Comments Submitted to EMNLP

2605.11346 2026-05-13 cs.LG cs.AI cs.CE

Physics-Informed Teacher-Student Ensemble Learning for Traffic State Estimation with a Varying Speed Limit Scenario

Archie J. Huang, Dongdong Wang, Shaurya Agarwal, Mohamed Abdel-Aty, Md Mahmudul Islam, Muhammad Shahbaz

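Affiliations: Department of Building, Civil and Environmental Engineering, Concordia University; Urban Artificial Intelligence Laboratory, University of Florida; Department of Civil, Environmental and Construction Engineering, University of Central Florida

AI summary: This paper studies traffic state estimation under variable speed limits and proposes a framework that combines physics-informed deep learning with teacher-student ensemble training. The teacher models encode the flow conservation law, while the student model uses a multilayer perceptron classifier to identify traffic characteristics and select the appropriate teacher model for estimation, which handles the heterogeneity in traffic behavior caused by changing speed limits. Experiments show that the method outperforms other mainstream baselines on traffic state estimation.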

Comments The IEEE International Conference on Intelligent Transportation Systems (ITSC) 2026

2605.11341 2026-05-13 cs.AI

CPEMH: An Agentic Framework for Prompt-Driven Behavior Evaluation and Assurance in Foundation-Model Systems for Mental Health Screening

Giuliano Lorenzoni, Ivens Portugal, Paulo Alencar, Donald Cowan

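Affiliations: University of Waterloo

AI summary: This paper proposes CPEMH, an agentic framework for evaluating and assuring the behavior of prompt-driven large language models in mental health screening. By coordinating the design, evaluation, and selection of prompting strategies, it provides systematic control over model behavior across scenarios, with a modular structure that ensures traceability and stability. A depression screening case study shows the framework stabilizing and auditing model behavior in clinical dialogue settings, and highlights modular coordination, a stability-first approach, and the use of F1 score, bias, and robustness as core evaluation criteria.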

Comments 4 pages, 2 figures. Accepted at the AGENT 2026 Workshop (ICSE 2026)

2605.11334 2026-05-13 cs.LG cs.CL cs.IR

VERDI: Single-Call Confidence Estimation for Verification-Based LLM Judges via Decomposed Inference

Jasmine Qi, Danylo Dantsev, Muyang Sun

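Affiliations: Indeed Inc

AI summary: VERDI is a single-call confidence estimation method for verification-based LLM judges. It decomposes the verification steps in the judge's inference and extracts three structured signals to assess how trustworthy a verdict is. The method needs no additional inference calls and uses a logistic regression model to predict confidence accurately. It performs well on several public benchmarks and in a production system, and adapts well even to models whose answer confidence is poorly calibrated.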

Comments 16 pages, 6 figures
