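arXivDaily daily academic digest: synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.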


2605.09638 2026-05-12 cs.LG

Plan2Cleanse: Test-Time Backdoor Defense via Monte-Carlo Planning in Deep Reinforcement Learning

Sze-Ann Chen, Zhi-Yi Chin, Kui-Yuan Chen, Chi-Yu Li, Ping-Chun Hsieh

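Institutions: National Yang Ming Chiao Tung University; CISPA Helmholtz Center for Information Security

AI summary: This work proposes Plan2Cleanse, a test-time backdoor defense framework for detecting and mitigating backdoor attacks on deep reinforcement learning models. It casts backdoor detection as a planning problem and uses Monte Carlo tree search to efficiently identify and neutralize backdoor trigger sequences without retraining the model. Experiments show that Plan2Cleanse substantially improves the trigger detection success rate and task performance across multiple environments, supporting its effectiveness in practical deployment.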

Comments Published in Transactions on Machine Learning Research (TMLR)

2605.09636 2026-05-12 cs.AI

PDEAgent-Bench: A Multi-Metric, Multi-Library Benchmark for PDE Solver Generation

Zhen Hang, Yushan Yashengjiang, Junhui Li, Huanshuo Dong, Yang Wei, Zhezheng Hao, Jiangtao Ma, Songlin Bai, Haozhong Kai, Xihang Yue, Gangzong Si, Dongming Jiang, Chao Yao, Zhanhua Hu, Jiangqing Zhang, Pengwei Liu, Yaomin Shen, Xingyu Ren, Lei Liu, Zikang Xu, Han Li, Qingsong Yao, Hande Dong, Hong Wang

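Institutions: University of Science and Technology of China; Tencent; Beijing University of Posts and Telecommunications; Shanghai Jiao Tong University; Zhejiang University; National University of Singapore; Tsinghua University; University of Texas at Dallas; Arizona State University; Rice University; Technical University of Munich; Stanford University; Alibaba Group

AI summary: PDEAgent-Bench is, to the authors' knowledge, the first multi-metric, multi-library benchmark for PDE solver generation, designed to evaluate how well systems generate numerical solver code from partial differential equation (PDE) specifications. It contains 645 instances covering 6 mathematical categories and 11 PDE families, supports the finite-element libraries DOLFINx, Firedrake, and deal.II, and evaluates generated code in stages for executability, numerical accuracy, and computational efficiency. Experiments show that current large language models and code agents can often produce runnable code, but their pass rate drops substantially once accuracy and efficiency requirements are enforced, which highlights both the difficulty of the task and the limits of existing methods.

Abstract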

PDE-to-solver code generation aims to automatically synthesize executable numerical solvers from partial differential equation (PDE) specifications. This task requires not only understanding the mathematical structure of PDEs, but also selecting appropriate discretization schemes and solver configurations, and correctly implementing the resulting formulations in finite-element method (FEM) libraries. Existing code generation benchmarks mainly evaluate syntactic correctness, or success on predefined test cases. To our knowledge, there is currently no publicly available benchmark specifically for PDE-to-solver code generation, and general-purpose code benchmarks do not fully capture the unique challenges of numerical PDE solution, such as ensuring solver accuracy, efficiency, and compatibility with professional FEM libraries. We introduce PDEAgent-Bench, to the best of our knowledge, the first multi-metric, multi-library benchmark for PDE-to-solver code generation. PDEAgent-Bench contains 645 instances across 6 mathematical categories and 11 PDE families, with common FEM libraries for DOLFINx, Firedrake, and deal.II. Each instance provides an agent-facing problem specification, a reference solution on a prescribed evaluation grid, and case-specific accuracy and runtime targets. PDEAgent-Bench adopts a staged evaluation framework in which generated solvers must sequentially pass executability, numerical accuracy, and computational efficiency checks. Experiments with representative LLMs and code agents show that models can often produce runnable code, but their pass rate drops substantially once accuracy and efficiency requirements are enforced. These results indicate that current agents remain limited in producing numerically reliable and efficient PDE solvers, and that PDEAgent-Bench provides a reproducible testbed grounded in the practical requirements of numerical PDE solving.
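The staged evaluation described in the abstract (executability, then numerical accuracy, then computational efficiency) can be illustrated with a minimal sketch. The `SolverRun` record, the `staged_verdict` function, and the scalar error and runtime targets are hypothetical names introduced here for illustration; the benchmark's actual harness, error metric, and thresholds are not specified in this listing.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SolverRun:
    """Hypothetical record of one generated solver's run on one benchmark instance."""
    executed: bool               # did the generated code run to completion?
    error: Optional[float]       # error against the reference solution, if available
    runtime_s: Optional[float]   # wall-clock runtime in seconds, if available


def staged_verdict(run: SolverRun, error_target: float, runtime_target_s: float) -> str:
    """Return the first stage that fails, or 'pass' if all three stages succeed.

    Stages are checked in order, so a solver that does not run is never
    scored on accuracy, and an inaccurate solver is never scored on speed.
    """
    if not run.executed or run.error is None or run.runtime_s is None:
        return "fail:executability"
    if run.error > error_target:
        return "fail:accuracy"
    if run.runtime_s > runtime_target_s:
        return "fail:efficiency"
    return "pass"


if __name__ == "__main__":
    run = SolverRun(executed=True, error=2e-3, runtime_s=1.5)
    print(staged_verdict(run, error_target=1e-2, runtime_target_s=10.0))
```

Because the gates are sequential, the pass rate can only stay the same or fall as each stage is added, which matches the drop the abstract reports once accuracy and efficiency are enforced.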


2605.09635 2026-05-12 cs.CL

K12-KGraph: A Curriculum-Aligned Knowledge Graph for Benchmarking and Training Educational LLMs

Hao Liang, Qihan Lin, Zhaoyang Han, Xiaochen Ma, Zhen Hao Wong, Meiyi Qiang, Linzhuang Sun, Wentao Zhang

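Institutions: Peking University; Institute for Advanced Algorithms Research; OriginHub Technology; Zhongguancun Academy

AI summary: This work proposes K12-KGraph, a curriculum-aligned knowledge graph for benchmarking and training educational large language models. The graph is extracted from People's Education Press textbooks, covers subjects including mathematics, physics, chemistry, and biology, and contains seven node types and nine relation types. It is used to build the multi-task benchmark K12-Bench and the training dataset K12-Train. Experiments show that supervision based on curriculum structure performs better when educational resources are limited and significantly improves model performance on education-related tasks.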

2605.09634 2026-05-12 cs.CL

Can We Trust LLMs for Mental Health Screening? Consistency, ASR Robustness, and Evidence Faithfulness

Erfan Loweimi, Sofia de la Fuente Garcia, Samira Loveymi, Hadi Daneshvar, Saturnino Luz

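Institutions: Usher Institute, Edinburgh Medical School, University of Edinburgh, Edinburgh, UK; Department of Computer Engineering, Ahvaz Campus, Islamic Azad University, Ahvaz, Iran; School of Health and Social Care, Edinburgh Napier University, Edinburgh, UK

AI summary: This study examines the reliability of large language models (LLMs) for mental health screening, focusing on consistency, robustness to automatic speech recognition (ASR) errors, and evidence faithfulness. It evaluates three models, Phi-4, Gemma-2-9B, and Llama-3.1-8B, on real speech data and finds that Phi-4 and Gemma-2-9B show strong intra-model consistency and robustness to ASR errors, while Llama-3.1-8B is less stable. The study also reveals inconsistencies between model scores and the keywords cited as evidence, which poses a challenge for interpretability in clinical use.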

2605.09633 2026-05-12 cs.RO cs.SY eess.SY

Minimizing Worst-Case Weighted Latency for Multi-Robot Persistent Monitoring: Theory and RL-Based Solutions

Weizhen Wang, Ziheng Wang, Jianping He, Xinping Guan, Xiaoming Duan

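Institutions: School of Automation and Intelligent Sensing, Shanghai Jiao Tong University, Shanghai, China; Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China

AI summary: This paper studies multi-robot persistent monitoring on weighted graphs, where the goal is to design robot trajectories that minimize the worst-case weighted latency over all nodes on an infinite time horizon. Because the conventional worst-case latency objective cannot distinguish strategies with poor transient but good asymptotic performance, the authors propose a class of tail-performance objectives and establish a theoretical framework for the resulting optimization problems. Building on these results, they construct an equivalent event-driven Markov decision process (TWLO-MDP), develop a reinforcement-learning-based solver, and introduce a multi-robot monitoring benchmark (M2Bench). Experiments show that the method effectively reduces worst-case weighted latency and outperforms existing approaches.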

2605.09630 2026-05-12 cs.CL cs.LG

Scratchpad Patching: Decoupling Compute from Patch Size in Byte-Level Language Models

Lin Zheng, Vasilisa Bashlovkina, Timothy Dozat, Dan Garrette, Laura Rimell, Joshua Maynez

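Institutions: Google DeepMind; The University of Hong Kong

AI summary: This paper studies efficient patch-based inference in byte-level language models and proposes Scratchpad Patching (SP). SP inserts scratchpads within each patch to aggregate the bytes observed so far and update the patch-level context, reducing the prediction lag that grows with patch size. The method improves model performance at the same patch size and substantially reduces key-value cache size and inference compute, offering a new direction for efficient language model design.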

Comments 23 pages, 15 figures

2605.09628 2026-05-12 cs.CV

DegBins: Degradation-Driven Binning for Depth Super-Resolution

Zhiqiang Yan, Zhengxue Wang, Jian Yang, Gim Hee Lee

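Institutions: Department of Computer Science, National University of Singapore; Nanjing University of Science and Technology

AI summary: Depth super-resolution (DSR) aims to recover a high-resolution depth map from a low-resolution one. Conventional methods usually learn the residual between high and low resolution in a low-dimensional feature space, which makes spatially varying degradation hard to model accurately. This paper proposes DegBins, a DSR framework that uses a degradation-driven binning strategy to turn the regression problem into a hybrid classification-regression problem. It represents the residual depth more flexibly as a weighted combination of discrete depth bins and models the degradation in a high-dimensional feature space, so that bin ranges and probability distributions adapt to the input. Experiments show that DegBins outperforms existing methods on multiple benchmark datasets, with higher accuracy and robustness.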

Comments 9 pages

2605.09622 2026-05-12 cs.CV cs.AI

Any2Any 3D Diffusion Models with Knowledge Transfer: A Radiotherapy Planning Study

Yuhan Wang, Zihan Li, Han Liu, Simon Arberet, Martin Kraus, Yuyin Zhou, Florin-Cristian Ghesu, Dorin Comaniciu, Ali Kamen, Riqiang Gao

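Institutions: UC Santa Cruz; Siemens Healthineers; University of Washington

AI summary: In radiotherapy planning, voxel-level dose prediction is a critical but challenging task, and existing models often generalize poorly across clinical scenarios. This paper proposes DiffKT3D, a unified Any2Any 3D diffusion framework that transfers knowledge from pretrained video diffusion models to achieve efficient and clinically meaningful dose prediction. The method introduces a flexible conditional generation mechanism based on modality embeddings and combines it with a clinically oriented reinforcement learning post-training strategy, which significantly improves dose prediction accuracy and image quality over current state-of-the-art models.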

Comments Accepted by CVPR 2026 main conference. Compare to CVPR version, minor updates here are included (e.g., combine main text and appendix; clarify the timing scenario in appendix)

2605.09618 2026-05-12 cs.CL cs.CY

Statistical Scouting Finds Debate-Safe but Not Debate-Useful Cases: A Matched-Ceiling Study of Open-Weight LLM Reasoning Protocols

Julia Hu, Alfred Shen, Kumar Lakshmipathi

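Institutions: Amazon Web Services

AI summary: This study compares direct answering, multi-sample voting, and multi-agent debate as reasoning protocols for language models, asking which is most effective under a fixed budget of generated tokens. Experiments with several models on MuSiQue and GSM8K show that the best protocol depends on the model and dataset and is hard to select from cheap pre-deliberation signals such as vote entropy. The study finds that vote entropy predicts only where debate is safe, not where debate is needed, which indicates that current debate routing remains limited in practice.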

Comments 14 pages, 5 figures. Technical report / preprint

Abstract

When should a language model answer directly, sample and vote, or engage in multi-agent debate? Recent work shows voting often explains much of the gain attributed to debate, while selective-debate systems activate deliberation only on uncertain examples. We ask: under a matched ceiling on generated tokens (960 per example), how much per-example routing headroom exists, and how much is recoverable from cheap pre-deliberation signals? We evaluate greedy decoding, three-sample voting, and a two-agent critique-revise debate on MuSiQue and GSM8K using Llama 3.1 8B Instruct and Ministral 3 8B Instruct. On MuSiQue, an oracle selecting the correct protocol per example gains +14.0 and +13.7 pp over the best fixed one. The best fixed protocol is model- and dataset-dependent: each (model, dataset) cell has a different winner. This headroom is hard to recover from cheap ex-ante signals. A vote-entropy threshold is the only controller that directionally beats the best fixed protocol on both models (+1.3 and +1.7 pp), though individual paired-bootstrap CIs include zero. A joint analysis (meta-analysis +1.6 pp, p=0.125; Bayesian P(both>0)=0.59) is directionally consistent but not significant. Learned controllers (LR, GBT) do not outperform the threshold. The key finding is structural: vote entropy predicts where debate is safe, not where debate is needed. High entropy sharply reduces debate backfire, but 66% of debate-helpful examples (31/47) occur when voting is unanimous but wrong. A single-prompt self-critique probe on Llama flips the answer in 127/127 unanimous cases, yielding zero mutual information with the debate-helpful label; we cannot rule out a prompt-compliance artifact, but either interpretation disqualifies the probe as a router. Recovering the remaining headroom requires behavioral probes that avoid format-compliance confounds at the 8B scale.
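The vote-entropy threshold controller mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the base-2 entropy, and the threshold value are assumptions introduced here, since the abstract does not state how the entropy or threshold is defined.

```python
import math
from collections import Counter
from typing import Sequence


def vote_entropy(answers: Sequence[str]) -> float:
    """Shannon entropy (in bits) of the empirical distribution of sampled answers."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    n = len(answers)
    counts = Counter(answers)
    # sum of p * log2(1/p), written with n / c to avoid a negative zero
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def route(answers: Sequence[str], threshold: float) -> str:
    """Hypothetical router: debate only when the votes disagree enough."""
    return "debate" if vote_entropy(answers) > threshold else "vote"


if __name__ == "__main__":
    print(route(["12", "12", "12"], threshold=0.5))  # unanimous votes
    print(route(["12", "12", "15"], threshold=0.5))  # 2-1 split
```

A unanimous vote has zero entropy regardless of whether the answer is right, so a router of this shape never sends unanimous-but-wrong cases to debate. That is consistent with the abstract's observation that most debate-helpful examples occur exactly in that situation.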


2605.09614 2026-05-12 cs.CV

Reflection Anchors for Propagation-Aware Visual Retention in Long-Chain Multimodal Reasoning

Xuan Gong, Hanbo Huang, Hao Zheng, Yiran Zhang, Wenbin Dai, Weishu Zhao, Shiyu Liang

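Institutions: Shanghai Jiao Tong University; Lanzhou University

AI summary: This paper studies the decay of visual information in long-chain multimodal reasoning. It presents an information-theoretic analysis that derives a lower bound on the downstream visual gain of an intervention point and uses it to design a reflection-anchor policy optimization method (RAPO). RAPO selects high-entropy reflection anchors and optimizes a finite-window KL-divergence surrogate, which strengthens the propagation and retention of visual information during generation. Experiments show that RAPO significantly outperforms existing methods on multiple vision-language benchmarks, and a mechanistic analysis shows that it strengthens the contrastive signal of visual dependence along the generation trajectory.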

Comments Under Review

2605.09613 2026-05-12 cs.RO cs.CV

SABER: A Scalable Action-Based Embodied Dataset for Real-World VLA Adaptation

Narsimha Menga, Parikshit Sakurikar, Amirreza Rouhi, Satya Sai Reddy, Anirudh Govil, Sri Harsha Chittajallu, Rajat Aggarwal, Anoop Namboodiri, Sashi Reddi

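Institutions: DreamVu

AI summary: This work presents SABER, a high-fidelity action dataset for adapting vision-language-action (VLA) robot models to real-world retail settings. SABER is built from many hours of natural in-store capture and records fine-grained hand activity, whole-body motion, and scene dynamics of people in retail environments, without staging or teleoperation. The dataset provides several action representations and is validated on a real robot system, where it significantly raises the completion rate on complex retail tasks and shows how much high-quality data matters for robot performance.

Abstract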

Robotic deployment in real-world environments depends on rich, domain-specific action data as much as on strong model architecture. General-purpose robot foundation models show modest performance in complex unseen tasks such as manipulation in a retail domain when applied out of the box. The root cause is a data gap: retail environments are structurally absent from general robot pretraining distributions, and the path to filling that gap through teleoperation is prohibitively expensive, logistically constrained, and difficult to scale. We introduce SABER, a high-fidelity retail robotics action dataset built from over 100 hours of natural in-store capture across multiple real grocery environments. Egocentric footage from head-mounted cameras records fine-grained hand activity at the point of interaction, while exocentric 360-degree scene footage from DreamVu's ALIA camera simultaneously observes all actors and activities across the entire space. This combination yields a uniquely complete picture of human retail behavior: dexterous hand activity, whole-body motion, and scene dynamics, all captured without staging, scripting, or teleoperation overhead. The SABER corpus contains 44.8K training samples across three action representation streams: 25K latent action sequences via LAPA-style encoding, 18.6K dexterous hand-pose trajectories retargeted to robot joint space, and 1.2K whole-body synchronized motion sequences retargeted to a humanoid embodiment. When applied to GR00T N1.6 via a shared-backbone multi-task post-training recipe, SABER yields a mean success rate of 29.3% across ten retail manipulation tasks -- more than 2.19x over fine-tuning baselines (13.4%). SABER demonstrates that the path to capable retail robots runs through better data, which can be collected today, at scale, without a robot in the loop. The dataset and code are available at https://dreamvu.ai/saber


2605.09611 2026-05-12 cs.CL

Byte-Exact Deduplication in Retrieval-Augmented Generation: A Three-Regime Empirical Analysis Across Public Benchmarks

Sietse Schelpe

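Institutions: Corbenic AI, Inc.

AI summary: This paper presents an empirical analysis of byte-exact, chunk-level deduplication in retrieval-augmented generation (RAG) pipelines, measuring how much context it removes and how it affects quality in different application settings. In experiments on academic, enterprise, and multi-turn conversational scenarios, deduplication removes up to 80.34% of redundant context, and evaluations with multiple models confirm that it introduces no measurable quality degradation. The study shows that substantial inference compute savings can be obtained deterministically without sacrificing model quality.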

Comments Preprint. Implementation and open-source community version available at: https://github.com/corbenic/merlin-community - https://zenodo.org/records/20090712
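As a rough illustration of byte-exact, chunk-level deduplication as described in this entry, the sketch below drops any retrieved chunk whose UTF-8 bytes were already seen and reports the fraction of bytes removed. The function name `dedup_chunks` and the use of SHA-256 digests as chunk fingerprints are assumptions made here for illustration; the listing does not describe how the paper's implementation detects duplicates.

```python
import hashlib
from typing import Iterable, List, Tuple


def dedup_chunks(chunks: Iterable[str]) -> Tuple[List[str], float]:
    """Keep the first occurrence of each byte-identical chunk, in order.

    Returns the kept chunks and the fraction of input bytes removed.
    Chunks that differ in any byte (case, whitespace) are NOT merged.
    """
    seen = set()
    kept: List[str] = []
    total_bytes = 0
    kept_bytes = 0
    for chunk in chunks:
        raw = chunk.encode("utf-8")
        total_bytes += len(raw)
        digest = hashlib.sha256(raw).digest()  # fingerprint of the exact bytes
        if digest in seen:
            continue  # byte-identical duplicate: drop it
        seen.add(digest)
        kept.append(chunk)
        kept_bytes += len(raw)
    reduction = 0.0 if total_bytes == 0 else 1.0 - kept_bytes / total_bytes
    return kept, reduction


if __name__ == "__main__":
    retrieved = ["alpha", "beta", "alpha", "Alpha"]
    kept, reduction = dedup_chunks(retrieved)
    print(kept, round(reduction, 3))
```

Because the comparison is on exact bytes, the result is deterministic and never merges chunks that merely look similar, which is the property the summary relies on when it reports no measurable quality loss.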

2605.09608 2026-05-12 cs.LG cs.IT math.IT

Geometry Conflict: Explaining and Controlling Forgetting in LLM Continual Post-Training

Yuanyi Wang, Yifan Yang, Su Lu, Yanggan Gu, Pengkai Wang, Wenjun Wang, Zhaoyi Yan, Congkai Xie, Jianmin Wu, Jialun Cao, Shing-Chi Cheung, Hongxia Yang

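Institutions: The Hong Kong Polytechnic University, PolyU; The Hong Kong University of Science and Technology, HKUST; PolyU-Daya Bay Technology and Innovation Research Institute

AI summary: This study investigates the causes and control of forgetting in continual post-training of large language models. It analyzes the compatibility between the model's state change and new knowledge updates through the covariance geometry of task parameter updates. Based on this geometry-conflict view, the authors propose GCWM, a data-free update-merging algorithm that builds a shared metric from a Gaussian Wasserstein barycenter and uses geometry conflict to control the correction. Experiments show that the method improves knowledge retention and final performance in continual training across multiple model scales.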

2605.09604 2026-05-12 cs.CV

DAP: Doppler-aware Point Network for Heterogeneous mmWave Action Recognition

Jiaying Lin, Shiman Wu, Jinfu Liu, Can Wang, Mengyuan Liu

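Institutions: Peking University; Huazhong University of Science and Technology; DJI Technology Company Ltd.; Christian-Albrechts-Universität zu Kiel

AI summary: This study addresses human action recognition (HAR) with millimeter-wave radar in heterogeneous settings. It introduces UniMM-HAR, described as the first large-scale heterogeneous multi-source mmWave point cloud dataset, and designs DAP-Net to handle the distribution shifts caused by different devices and frequency bands. DAP-Net fuses multimodal information with a Doppler-aware mechanism to make the model more robust to heterogeneous radar sources, and experiments show superior performance on cross-source recognition tasks.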

2605.09603 2026-05-12 cs.CL

Edit-Based Refinement for Parallel Masked Diffusion Language Models

Houxing Ren, Mingjie Zhan, Zimu Lu, Ke Wang, Yunqiao Yang, Haotian Hou, Junting Pan, Hongsheng Li

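Institutions: CUHK MMLab; SenseTime Research; Shenzhen Loop Area Institute

AI summary: This paper proposes ME-DLM, an edit-based refinement framework that improves parallel masked diffusion language models when they generate multiple tokens at once. After producing an initial complete response, the method applies a small number of edit operations (substitution, deletion, and insertion) as a refinement step to improve sequence consistency. Experiments show that ME-DLM preserves the efficiency of parallel generation while significantly improving generation quality and robustness. Built on LLaDA, it gains 11.6 points on HumanEval and 33.6 points on GSM8K.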

Comments Accepted to ICML 2026

2605.09591 2026-05-12 cs.CV

From Pixels to Concepts: Do Segmentation Models Understand What They Segment?

Shuang Liang, Zeqing Wang, Yuxian Li, Xihui Liu, Han Wang

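Institutions: Department of Electrical and Computer Engineering, The University of Hong Kong; School of Computer Science and Engineering, Sun Yat-sen University; CASIC, The University of Hong Kong

AI summary: This paper asks whether promptable segmentation models truly understand the concepts they segment, or instead rely on cues that are visually salient but semantically misleading. The authors introduce CAFE, a benchmark that evaluates concept faithfulness through attribute-level counterfactual modifications. Experiments show that models can produce accurate segmentation masks yet still fail to respect the concept when given misleading prompts, revealing a systematic gap between localization quality and semantic understanding.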

Comments 30 pages, 8 figures

2605.09584 2026-05-12 cs.CL cs.AI cs.LG

CLR-voyance: Reinforcing Open-Ended Reasoning for Inpatient Clinical Decision Support with Outcome-Aware Rubrics

Aishik Nagar, Arun-Kumar Kaliya-Perumal, Yu-Hsuan Han, Andrew Sheng-Han Huang, Kristen Kee, Yushi Cao, Yiming Chen, Hongchao Jiang

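Institutions: ASUS Intelligent Cloud Services (AICS); Rehabilitation Research Institute of Singapore, Nanyang Technological University; Department of Family Medicine, Taipei Veterans General Hospital; School of Medicine, National Yang Ming Chiao Tung University; Yong Loo Lin School of Medicine, National University of Singapore

AI summary: CLR-voyance is a framework for strengthening open-ended reasoning in inpatient clinical decision support. It models clinical reasoning as a partially observable Markov decision process (POMDP) and supervises it with rewards that are both grounded in clinical outcomes and validated by clinicians. By separating each patient journey into a past that the policy can see and a future visible only to an oracle, it generates verifiable rubrics for clinical reasoning that are used for both training and evaluation. Experiments show that models trained with CLR-voyance perform strongly on inpatient clinical reasoning, ahead of existing frontier models, and the system has been deployed in a hospital.

Abstract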

Inpatient clinical reasoning is a sequential decision under partial observability: the clinician sees the admission so far and must choose the next action whose downstream consequences are not yet visible. Existing clinical-LLM evaluations and RL rewards signals collapse this into closed-form retrieval, clinical journey leakage, or unanchored LLM-as-judge scoring. We introduce CLR-voyance, a framework that reformulates inpatient reasoning as a Partially Observable Markov Decision Process (POMDP) and supervises it with rewards that are simultaneously outcome-grounded and clinician-validated. We instantiate the formulation as CLR-POMDP, which partitions successful patient journeys into a policy-visible past and an oracle-only future. Using the past information, an oracle LLM generates a case-specific query-answer pair, and the first adaptive rubric for clinical reasoning which is verifiable in the future of the patient journey. These rubrics are used for both post-training and evaluation of models for inpatient clinical reasoning. We post-train Qwen3-8B and MedGemma-4B with GRPO followed by model merging, yielding state-of-the-art inpatient clinical reasoning while retaining generalist capabilities. CLR-voyance-8B achieves 84.91% on CLR-POMDP, ahead of frontier medical reasoning models like GPT-5 (77.83%) and MedGemma-27B (66.66%) and has comparable or better performance on existing medical benchmarks. To ensure a clinically meaningful setting, we conduct a large-scale clinician alignment study, where physicians curate per-case rubrics, grade candidate responses, and provide blinded pairwise preferences of model reasoning. This study provides insights on clinical LLM-as-a-judge and clinical preference-model selection, which can inform the community at large. CLR-voyance has been deployed for 6+ months at a partner public hospital, drafting thousands of reasoning-heavy inpatient notes.


2605.09581 2026-05-12 cs.CV

FPGA-Based Hardware Architecture for Contrast Maximization in Event-Based Vision

Michal Filipkowski, Marcin Kowalczyk, Tomasz Kryjak

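Institutions: AGH University of Krakow, Poland; Embedded Vision Systems Group, Computer Vision Laboratory

AI summary: This paper proposes an FPGA-based hardware architecture for the contrast maximization (CM) algorithm in event-based vision. The architecture exploits the parallelism of the FPGA to compute the contrast of images reconstructed from the asynchronous event stream and to run the iterative optimization that estimates motion parameters. The paper describes the design and optimization of the hardware module and validates it experimentally, reporting clear gains in speed and energy efficiency: more than 200 times faster than CPU and GPU implementations. This provides a solid basis for real-time motion estimation in high-speed, low-power embedded systems.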

Comments Accepted for ARC 2026

2605.09579 2026-05-12 cs.LG cs.AI

Biosignal Fingerprinting: A Cross-Modal PPG-ECG Foundation Model

Zhangdaihong Liu, Chang Liu, Fenglin Liu, Yixuan Chen, Yang Yang, David A. Clifton, Xiao Gu

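Institutions: Department of Engineering Science, University of Oxford; Oxford Suzhou Centre for Advanced Research; School of Public Health, Shanghai Jiao Tong University

AI summary: This study proposes a cross-modal biosignal fingerprinting approach that aims to bridge the gap between electrocardiogram (ECG) and photoplethysmography (PPG) in cardiovascular disease monitoring. Using a multimodal masked autoencoder (M2AE), the method learns compact, transferable latent representations from a large set of paired ECG and PPG signals, which can be applied to a range of clinical tasks without task-specific fine-tuning. Experiments show strong results on tasks such as cardiovascular disease classification and hypertension detection, and the model retains high performance with a single input modality, which suits resource-constrained wearable devices.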

Comments 21 pages, 8 figures, 7 tables

2605.09572 2026-05-12 cs.CV cs.AI cs.MM

KAN Text to Vision? The Exploration of Kolmogorov-Arnold Networks for Multi-Scale Sequence-Based Pose Animation from Sign Language Notation

Guanyi Du, Lintao Wang, Kun Hu, Ziyang Wang

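Institutions: School of Computer Science, The University of Sydney; School of Science, Edith Cowan University; School of Computer Science and Digital Technologies, Aston University

AI summary: This study explores Kolmogorov-Arnold Networks (KAN) for generating sign language pose animation from notation. It proposes KANMultiSign, a multi-scale sequence generation model that converts HamNoSys notation into 2D human pose sequences. The model follows a coarse-to-fine generation strategy with multi-scale supervision, first producing the overall body structure and then refining hand details, and it integrates KAN modules into a Transformer architecture to model the nonlinear mapping from symbols to continuous poses more efficiently. Experiments on several sign language corpora show better performance than existing methods with far fewer parameters, and confirm the key role of multi-scale supervision in notation-conditioned pose generation.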

Comments Accepted at Neurocomputing

2605.09570 2026-05-12 cs.LG

End-to-End Keyword Spotting on FPGA Using Graph Neural Networks with a Neuromorphic Auditory Sensor

Wiktor Matykiewicz, Piotr Wzorek, Kamil Jeziorek, Tomás Muñoz, Antonio Rios-Navarro, Angel Jiménez-Fernández, Tomasz Kryjak

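Institutions: AGH University of Krakow, Poland; Embedded Vision Systems Group; Computer Vision Laboratory; Robotics and Technology of Computers Lab.; ETSII, EPS, SCORE, I3US, Universidad de Sevilla

AI summary: With the rapid development of mobile robotics and embedded intelligence, edge platforms increasingly need efficient on-device data processing. This paper proposes an end-to-end keyword spotting system on a field-programmable gate array (FPGA) that, according to the authors, is the first to integrate a neuromorphic auditory sensor (NAS) and a graph neural network (GNN) on a single FPGA device. It processes the event-based audio stream directly, without conventional signal preprocessing. The system uses a near-memory computing architecture and achieves low-latency, low-power real-time processing while maintaining high recognition accuracy (87.43%).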

Comments Accepted for the ARC 2026 conference

2605.09566 2026-05-12 cs.CV

Dual-Path Hyperprior Informed Deep Unfolding Network for Image Compressive Sensing

Tianyi Lu, Wenxue Cui, Shaohui Liu

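Institutions: Harbin Institute of Technology

AI summary: This paper proposes a dual-path hyperprior-informed deep unfolding network (DPH-DUN) for reconstruction in image compressive sensing. The method splits the measurements into two subsets and uses hyperprior information to guide reconstruction, which improves reconstruction quality across regions with different textures. Its main contributions are lightweight neural modules that generate multi-domain hyperprior knowledge, and the dynamic generation of adaptive step sizes and attention during reconstruction to improve accuracy and robustness. Experiments show that the method outperforms existing compressive sensing methods on multiple benchmark datasets.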

2605.09565 2026-05-12 cs.LG

Online Set Learning from Precision and Recall Feedback

Lee Cohen, Yishay Mansour, Shay Moran, Han Shao

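Institutions: Stanford University; Tel Aviv University and Google Research; Technion and Google Research; University of Maryland

AI summary: This paper studies the problem of learning an unknown subset online from precision and recall feedback. In each round the learner predicts a subset and receives partial information according to the feedback type (precision or recall), with the goal of maximizing cumulative reward. The paper proves that learnability in this model is equivalent to the hypothesis class having finite VC dimension, and it gives algorithms that cope with the dependence on feedback type and achieve regret bounds in both the realizable and agnostic settings. It thus provides a theoretical characterization of learnability for the model and identifies several open problems for further study.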
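For readers unfamiliar with the two feedback signals, the sketch below computes the standard set-based precision and recall of a predicted subset against a target subset. It illustrates only the textbook definitions; the paper's exact feedback protocol and reward are not given in this listing, and returning 0.0 when the denominator is empty is a convention assumed here.

```python
from typing import Hashable, Set


def precision(predicted: Set[Hashable], target: Set[Hashable]) -> float:
    """Fraction of predicted elements that belong to the target set."""
    if not predicted:
        return 0.0  # assumed convention for an empty prediction
    return len(predicted & target) / len(predicted)


def recall(predicted: Set[Hashable], target: Set[Hashable]) -> float:
    """Fraction of target elements that were predicted."""
    if not target:
        return 0.0  # assumed convention for an empty target
    return len(predicted & target) / len(target)


if __name__ == "__main__":
    predicted = {1, 2, 3, 4}
    target = {2, 3, 5}
    print(precision(predicted, target), recall(predicted, target))
```

Predicting a larger set can only keep recall the same or raise it, while it may lower precision, so a learner that sees only one of the two signals per round gets an incomplete picture of how close its prediction is to the target.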

2605.09554 2026-05-12 cs.CL cs.CV

Towards Compact Sign Language Translation: Frame Rate and Model Size Trade-offs

Kuanwei Chen, Mengfeng Tsai

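Institutions: Computer Science and Information Engineering, National Central University, Zhongli, Taiwan

AI summary: This paper studies the trade-off between frame rate and model size in sign language translation (SLT), with the aim of building more compact and efficient systems. The authors propose a lightweight pipeline with only 77M parameters that combines MMPose skeletal pose extraction with a single linear projection into T5-small, and they vary the input frame rate to cut computational cost while preserving translation quality. Experiments show that 12 fps loses only a small amount of BLEU-4 compared with 24 fps, and that the model is one third the size of a prior T5-base system, which indicates that a lightweight architecture can remain competitive without a hierarchical encoder or a large model.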

Comments 2 pages, 1 figure, 2 tables

2605.09549 2026-05-12 cs.LG

When Adaptation Fails: A Gradient-Based Diagnosis of Collapsed Gating in Vision-Language Prompt Learning

Yunxuan Fang, Ziwei Zhang, Xinhe Wang

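Institutions: Beihang University

AI summary: This paper studies why adaptive gating fails in few-shot vision-language prompt learning with a frozen backbone. It finds that adaptive gates and prompt selection modules often produce constant outputs, receive weak gradient signals, and underperform fixed prompts. Through systematic experiments, the authors identify two main failure modes, gradient magnitude imbalance and gate degeneration. The results expose the limits of adaptive gating under specific conditions and question the practice of adding architectural complexity to parameter-efficient learning without justification.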

2605.09548 2026-05-12 cs.CL

Crosslingual On-Policy Self-Distillation for Multilingual Reasoning

Yihong Liu, Raoyuan Zhao, Michael A. Hedderich, Hinrich Schütze

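Institutions: Center for Information and Language Processing, LMU Munich; Munich Center for Machine Learning (MCML)

AI summary: This study addresses the weaker performance of low-resource languages in multilingual reasoning and proposes COPSD, a crosslingual on-policy self-distillation method. The same model serves as both student and teacher: the student sees only the problem in the low-resource language, while the teacher receives crosslingual context including the English translation and a reference solution. Training minimizes the full-distribution, token-level divergence along the student's own generations, which gives a dense supervision signal. Experiments show that COPSD significantly improves mathematical reasoning in 17 low-resource African languages, outperforms existing methods, and performs well in answer formatting, reasoning scalability, and generalization across benchmarks.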

Comments preprint

2605.09544 2026-05-12 cs.AI

TIDE-Bench: Task-Aware and Diagnostic Evaluation of Tool-Integrated Reasoning

Yize Li, Junzhi Li, Jason Song, Chuxiong Sun, Rui Wang, Changwen Zheng

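Institutions: University of Chinese Academy of Sciences; Institute of Software, Chinese Academy of Sciences

AI summary: TIDE-Bench is a comprehensive and efficient benchmark for evaluating tool-integrated reasoning (TIR) methods. It targets the shortcomings of current TIR evaluation in task diversity, diagnostic coverage, and evaluation efficiency. The benchmark includes several task settings, covering mathematical reasoning, knowledge-intensive question answering, and two newly designed tasks that probe complex tool calling and multi-tool collaboration. TIDE-Bench also uses a task-aware evaluation protocol and improves efficiency by selecting high-quality samples. The results reveal a persistent bottleneck in tool grounding for current TIR methods and provide a reference point for future research.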

Comments 10 pages, 5 figures, 10 tables

2605.09542 2026-05-12 cs.AI

LLM-Guided Monte Carlo Tree Search over Knowledge Graphs: Composing Mechanistic Explanations for Drug-Disease Pairs

Rishabh Jakhar, Michel Dumontier, Remzi Celebi

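Institutions: Institute of Data Science, Department of Advanced Computing Sciences, Maastricht University

AI summary: This study proposes TESSERA, a neuro-symbolic framework that combines knowledge graphs with large language models (LLMs) to compose multi-step mechanistic explanations for drug-disease pairs. The LLM handles local judgments and state evaluation, while Monte Carlo tree search (MCTS) provides structured search and credit assignment over long paths, so the explanations stay grounded in the biological knowledge of the graph while remaining plausible and diverse. Experiments on two complementary knowledge graphs show that the framework recovers drug mechanisms of action and confirm the key role of the LLM.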

Comments Accepted at IJCAI-ECAI 2026. 9 pages (7 content + 2 references), 5 figures, 3 tables. Includes supplementary material (26 pages)

2605.09539 2026-05-12 cs.CL

TacoMAS: Test-Time Co-Evolution of Topology and Capability in LLM-based Multi-Agent Systems

Chen Xu, Yicheng Hu, Ruizi Wang, Xinyu Lin, Wenjie Wang, Dongrui Liu, Fuli Feng

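Institutions: Carnegie Mellon University; University of Science and Technology of China; National University of Singapore; Shanghai AI Lab

AI summary: This paper proposes TacoMAS, a test-time co-evolution framework for multi-agent systems that dynamically adjusts both agent capabilities and the communication topology. It updates agent capabilities quickly to handle newly emerging subtasks and adjusts the communication topology on a slower time scale to keep collaboration stable, which leads to more efficient multi-agent collaboration. Experiments show that TacoMAS significantly outperforms nearly 20 existing methods on four benchmarks, with an average improvement of 13.3%.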

2605.09538 2026-05-12 cs.CV cs.AI cs.RO

PhysHanDI: Physics-Based Reconstruction of Hand-Deformable Object Interactions

Jihyun Lee, Changmin Lee, Donghwan Kim, Tae-Kyun Kim

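Institutions: School of Computing, KAIST, Daejeon, South Korea

AI summary: PhysHanDI is a physics-based framework for jointly reconstructing 3D interactions between hands and non-rigid objects such as cloth and plush toys. It drives object deformation by simulating the forces induced by densely reconstructed hand motion, so that the reconstructed object dynamics are both physically plausible and consistent with the hand motion. The simulated object deformation also improves hand reconstruction through inverse physics, and experiments show that PhysHanDI outperforms the best existing methods on both reconstruction and future prediction.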

Comments Accepted to ICML 2026
