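arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, and electrical engineering & systems.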


2605.09975 2026-05-12 cs.LG math.OC

Chebyshev Center-Based Direction Selection for Multi-Objective Optimization and Training PINNs

Hoyeol Yoon, Seoungbin Bae, Nam Ho-Nguyen, Dabeen Lee

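Affiliations: Department of Industrial & Systems Engineering, KAIST; Discipline of Business Analytics, The University of Sydney; Department of Mathematical Sciences, Seoul National University

AI summary: This paper studies how to choose the update direction for multi-objective optimization when training physics-informed neural networks (PINNs), and proposes a selection method based on the Chebyshev center. Direction selection is formulated as a Chebyshev center problem in the dual cone, which can be solved efficiently in a low-dimensional space and comes with convergence guarantees in the nonconvex setting. The method unifies key properties of existing approaches, provides an interpretable geometric criterion, and shows superior performance on several benchmarks.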

2605.09973 2026-05-12 cs.CL cs.AI

GLiNER2-PII: A Multilingual Model for Personally Identifiable Information Extraction

Urchade Zaratiana, Ash Lewis, George Hurn-Maloney

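Affiliations: OpenAI; Zaratiana et al.

AI summary: This paper presents GLiNER2-PII, a small model for multilingual extraction of personally identifiable information (PII) that recognizes 42 PII entity types. To address scarce training data and privacy risk, the authors build a multilingual synthetic corpus of 4,910 annotated texts, using constraint-driven generation to produce diverse, realistic examples. In experiments, GLiNER2-PII achieves the highest span-level F1 on the SPY benchmark, ahead of several comparison systems including the OpenAI Privacy Filter.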

Comments Under submission

2605.09972 2026-05-12 cs.RO cs.CV

HiDrive: A Closed-Loop Benchmark for High-Level Autonomous Driving

Zhongyu Xia, Guanyu Zhu, Guo Tang, Wenhao Chen, Yongtao Wang

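Affiliations: Wangxuan Institute of Computer Technology, Peking University

AI summary: HiDrive is a new closed-loop autonomous driving benchmark that addresses the limits of existing benchmarks in scenario diversity, object variety, and the evaluation of driving capabilities. It emphasizes long-tail scenarios, introduces many rare objects and complex traffic situations, and extends evaluation to high-level capabilities such as rule compliance, ethical reasoning, and emergency decision-making. HiDrive uses a more advanced physics engine with realistic lighting and high-fidelity rendering, giving autonomous driving systems a more demanding test platform for complex real-world conditions.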

2605.09969 2026-05-12 cs.LG cs.CL

The Truth Lies Somewhere in the Middle (of the Generated Tokens)

Sophie L. Wang, Phillip Isola, Brian Cheung

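Affiliations: MIT

AI summary: This paper studies how to compress the hidden states of autoregressive generation into a representation that reflects a language model's internal state. The authors find that mean pooling over the hidden states of generated tokens yields a more semantically informative representation than any single token, and they validate this with kernel alignment across language, vision, and protein domains. They also show that representations of generated tokens outperform those of prompt tokens, and reveal interpretable dynamics in model behavior.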
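The mean-pooling step described in the summary can be sketched as follows. This is a minimal illustration, assuming the per-token hidden states are already available as a NumPy array of shape `(num_tokens, hidden_dim)`; the function name `mean_pool` and the toy values are hypothetical and not taken from the paper.

```python
import numpy as np

def mean_pool(hidden_states: np.ndarray) -> np.ndarray:
    """Average per-token hidden states (num_tokens, hidden_dim) into one vector."""
    if hidden_states.ndim != 2 or hidden_states.shape[0] == 0:
        raise ValueError("expected a non-empty (num_tokens, hidden_dim) array")
    return hidden_states.mean(axis=0)

# Hypothetical hidden states for 3 generated tokens with hidden size 2.
generated = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
pooled = mean_pool(generated)  # one vector summarizing the whole generation
```

In practice the array would come from a chosen layer of the model, restricted to the generated (not prompt) positions.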

2605.09967 2026-05-12 cs.LG

Tensor Product Representation Probes Reveal Shared Structure Across Linear Directions

Andrew Lee, Fernanda Viégas, Martin Wattenberg

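Affiliations: Harvard University; Google DeepMind

AI summary: This study asks how concepts represented as linear directions in language models capture relational structure. Training models on a well-structured domain (the board game Othello), the authors find that although the model's internal state can be decoded linearly, its underlying structure also contains a tensor product representation (TPR). By training TPR probes, they show that what linear probes capture is a projection of a richer structure, and that the linear directions can be recovered directly from the TPR probe parameters. The finding suggests that directional representations may be projections of more structured representations.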

2605.09963 2026-05-12 cs.CV

Learning to Perceive "Where": Spatial Pretext Tasks for Robust Self-Supervised Learning

Yang Shen, Yusen Cai, Weronika Hryniewska-Guzik, Qing Lin, Mengmi Zhang

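Affiliations: Nanyang Technological University, Singapore; Warsaw University of Technology, Poland

AI summary: Existing self-supervised methods mainly learn object-invariant representations and tend to overlook the spatial structure and relations among object parts. To address this, the paper proposes a spatially aware pretext task, Spatial Prediction (SP), which learns fine-grained spatial dependencies by predicting the relative position and scale between two decoupled local views of the same image. Experiments show significant gains on image recognition, fine-grained classification, semantic segmentation, and depth estimation, along with improved robustness in out-of-distribution settings.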

2605.09959 2026-05-12 cs.LG cs.AI cs.CL cs.ET

G-Zero: Self-Play for Open-Ended Generation from Zero Data

Chengsong Huang, Haolin Liu, Tong Zheng, Runpeng Dai, Langlin Huang, Jinyuan Li, Zongxia Li, Zhepei Wei, Yu Meng, Jiaxin Huang

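Affiliations: Washington University in St. Louis; University of Virginia; University of Maryland; University of North Carolina at Chapel Hill

AI summary: This paper proposes G-Zero, a self-evolving framework that lets large language models improve autonomously and continually without external evaluation, with a focus on open-ended generation tasks. Its core is the Hint-δ intrinsic reward, which guides optimization using differences in the Generator model's own predictions, combined with co-evolutionary training of a Proposer model and a Generator model. The method needs no external judge, which helps avoid reward hacking and capability bottlenecks, and offers a scalable and robust route to model self-evolution in unverifiable domains.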

2605.09956 2026-05-12 cs.CV cs.AI

SDTalk: Structured Facial Priors and Dual-Branch Motion Fields for Generalizable Gaussian Talking Head Synthesis

Peng Jia, Zhen Xiao, Jia Li, Xueliang Liu, Zhenzhen Hu, Lingyun Yu

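Affiliations: Hefei University of Technology; University of Science and Technology of China

AI summary: This paper proposes SDTalk, a one-shot 3D Gaussian Splatting (3DGS) framework for high-quality, real-time talking head synthesis that generalizes to unseen identities without per-person training. Structured facial priors improve the completeness of head reconstruction, and dual-branch motion fields improve the detail of facial dynamics, so the method outperforms existing approaches in both visual quality and inference efficiency.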

Comments 5 pages, 4 figures, 4 tables

2605.09955 2026-05-12 cs.CL

Beyond Majority Voting: Agreement-Based Clustering to Model Annotator Perspectives in Subjective NLP Tasks

Tadesse Destaw Belay, Ibrahim Said Ahmad, Idris Abdulmumin, Abinew Ali Ayele, Alexander Gelbukh, Eusebio Ricárdez-Vázquez, Olga Kolesnikova, Shamsuddeen Hassan Muhammad, Seid Muhie Yimam

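Affiliations: Instituto Politécnico Nacional; University of Wisconsin–Stevens Point; University of Pretoria; Bahir Dar University; Imperial College London; University of Hamburg

AI summary: This paper studies how to model disagreement among annotators in subjective NLP tasks. The authors propose an agreement-based clustering method that captures and models differences in annotator perspectives to improve label aggregation. Experiments on 40 datasets in 18 languages show that, compared with traditional majority voting and per-annotator modeling, the method makes fuller use of annotator perspectives and significantly improves classification performance. The study also compares several aggregation strategies and finds that multi-label and multi-task approaches work better when handling clustered annotators.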
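To make the contrast with majority voting concrete, here is a small sketch of the majority-vote baseline together with a pairwise agreement rate that a clustering step could take as input. The data layout (one label list per annotator, aligned by item) and both function names are assumptions for illustration; the paper's actual clustering procedure is not reproduced here.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List, Tuple

def majority_vote(labels: List[int]) -> int:
    """Return the most frequent label for one item (baseline aggregation)."""
    return Counter(labels).most_common(1)[0][0]

def pairwise_agreement(annotations: Dict[str, List[int]]) -> Dict[Tuple[str, str], float]:
    """Fraction of items on which each pair of annotators gives the same label."""
    result = {}
    for a, b in combinations(sorted(annotations), 2):
        pairs = list(zip(annotations[a], annotations[b]))
        result[(a, b)] = sum(x == y for x, y in pairs) / len(pairs)
    return result

# Hypothetical labels from three annotators on four items.
annotations = {"a": [1, 0, 1, 1], "b": [1, 0, 1, 0], "c": [0, 1, 0, 0]}
gold = [majority_vote(list(item)) for item in zip(*annotations.values())]
agreement = pairwise_agreement(annotations)
```

Here annotators `a` and `b` agree far more with each other than with `c`, which is the kind of signal that majority voting discards and that grouping annotators by agreement is meant to keep.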

Comments Pre-MIT Press publication version

2605.09954 2026-05-12 cs.RO cs.CV

JODA: Composable Joint Dynamics for Articulated Objects

Tianhong Gao, Cheng Yu, Yinghao Xu, Mengyu Chu

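Affiliations: Peking University; Ant Group, Robbyant

AI summary: This paper proposes JODA, a composable framework for generating joint-level dynamics that captures fine-grained mechanical behaviors such as friction hold, snap-fit detents, and soft close. JODA describes conservative force, dry friction, and damping over a joint's degree of freedom with a structured three-channel field and, combined with shape-constrained piecewise cubic interpolation, yields expressive dynamics models that support differentiable simulation. The method supports inferring and optimizing joint dynamics from multimodal inputs and provides a unified interface for modeling, editing, and optimizing complex mechanical systems.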

2605.09951 2026-05-12 cs.LG

Generating synthetic electronic health record data using agent-based models to evaluate machine learning robustness under mass casualty incidents

Roben Delos Reyes, Daniel Capurro, Nicholas Geard

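Affiliations: School of Computing and Information Systems, The University of Melbourne; Department of Medicine, The University of Melbourne

AI summary: This study proposes an agent-based modelling approach for generating synthetic electronic health record (EHR) data to evaluate the robustness of machine learning models under extreme conditions such as mass casualty incidents (MCIs). Real-world EHR data are used to build an agent-based model of an emergency department that simulates patient arrivals, resource capacity, and clinical workflow, and system conditions are then changed to generate synthetic data reflecting MCI scenarios. Experiments show that recall drops consistently under MCI conditions, highlighting how system changes affect model performance and patient outcomes and offering a new way to assess the reliability of healthcare AI in complex environments.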

Comments 14 pages, 1 figure; accepted at CHIL 2026

Abstract

ML models in healthcare are typically evaluated using curated real-world EHR data. A key limitation of such evaluations is that they may fail to assess the robustness of ML models to changes in the data at deployment, which is a common issue because EHR data used for ML model development cannot capture all such changes. Mass casualty incidents (MCIs) caused by disasters are critical instances where this will be an issue, as they induce rare, uncertain, and novel changes to routine system conditions. Because real-world EHR data from MCIs are often limited or unavailable, assessing ML robustness under such conditions before deployment remains challenging. Here, we propose an agent-based modelling approach for generating synthetic EHR data to evaluate the robustness of ML models under MCI scenarios. We use real-world EHR data to develop and calibrate an agent-based model (ABM) of an emergency department (ED) that explicitly models patient arrivals, resource capacity, and clinical workflow. By changing these system conditions to reflect plausible MCI scenarios, the ED model generates synthetic versions of the real-world EHR data that exhibit shifts in system behaviour. Using these synthetic data, we test ML models for predicting length of stay. We observed consistent declines in recall under MCI conditions relative to baseline system conditions, resulting in an increase in the number of patients with prolonged length of stay that were missed by the ML models. These results highlight the impact of changes in system conditions on patient outcomes, EHR data, and ML model performance. Our work establishes ABM-based synthetic EHR data generation as a proactive and systematic approach for evaluating the robustness of ML models under MCI or other system conditions not captured in real-world EHR data, supporting the safer and more effective deployment of ML models in healthcare systems.


2605.09950 2026-05-12 cs.LG cs.AI

Novel GPU Boruta algorithms for feature selection from high-dimensional data

Xurui Li, Zhiguo Gan, Jiaming Zhang, Zheng Liu, Diannan Lu

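Affiliations: Department of Chemical Engineering, Tsinghua University

AI summary: Traditional feature selection algorithms are slow on CPUs when the data are high-dimensional. This paper proposes two GPU-accelerated Boruta feature selection methods, Boruta-Permut and Boruta-TreeImp, which select features using permutation importance and impurity-reduction importance, respectively. Experiments show that both methods keep selection accuracy close to that of the original Boruta algorithm while substantially improving computational efficiency, giving an efficient and economical option for feature selection on large-scale data.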

Comments This paper has been submitted to the journal Data Mining and Knowledge Discovery, and a preprint is available for the authors' records

2605.09949 2026-05-12 cs.LG

From Syntax to Semantics: Unveiling the Emergence of Chirality in SMILES Translation Models

Zehao Li, Yasuhiro Yoshikai, Shumpei Nemoto, Hiroyuki Kusuhara, Tadahaya Mizuno

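Affiliations: Laboratory of Molecular Pharmacokinetics, Graduate School of Pharmaceutical Sciences, The University of Tokyo; The Institute of Statistical Mathematics (ISM), Research Organization of Information and Systems

AI summary: This study examines how chemical language models learn chemical meaning from molecular string representations rather than relying only on surface string patterns, using chirality as a demanding test case. It introduces Pan-CORE, a family of Transformer-based SMILES translation models, and uses high-temporal-resolution analysis of training checkpoints to show how chiral information is learned. Chiral-token accuracy jumps abruptly after a long plateau, suggesting that the difficulty of chiral learning stems not only from model capacity but also from the complexity of chiral constraints. Analyses of attention dynamics, residual-stream trajectories, and latent-space geometry point to the central role of the encoder in learning chirality.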

Abstract

Understanding how chemical language models (CLMs) learn chemical meaning from molecular string representations, rather than only surface-level string patterns, is an important question in chemical representation learning and machine learning for chemistry. Chirality provides a demanding test case: enantiomers can differ greatly in pharmacological activity and toxicity, yet CLMs often struggle to distinguish chiral configurations reliably. Here we present Pan-CORE (Pan-Chemical Omniscale Representation Engine), a family of autoregressive Transformer-based encoder-decoder models for SMILES translation, and use high-temporal-resolution checkpoint analysis to investigate how chiral information is learned during training. Across all tested Pan-CORE variants, we observe a reproducible jump-up in which chiral-token accuracy rises abruptly after a long plateau, suggesting that chiral learning stagnation is not explained by model capacity alone and instead reflects the complexity of chiral constraints. Analyses of attention dynamics, residual-stream trajectories, and latent-space geometry support an encoder-centered mechanism in which chiral-token representations undergo transient destabilization and reconstruction, seen as a V-shaped drop and recovery in vector norm and directional stability, together with a clear reorganization of chiral molecular representations in the latent space. Encoder-decoder cross-evaluation further supports the encoder-centered nature of the transition, and targeted attention-head ablation identifies a small set of chiral-sensitive heads whose removal selectively reduces chiral-token accuracy even in the fully trained model. These findings show that SMILES translation can serve as a useful experimental system for mechanistic analysis of semantic emergence in CLMs, with implications for interpretable chemical representation learning.


2605.09948 2026-05-12 cs.AI cs.CV cs.RO

LoopVLA: Learning Sufficiency in Recurrent Refinement for Vision-Language-Action Models

Boyang Shen, Kaixiang Yang, Hao Wang, Qiuyu Yu, Qiang Xie, Qiang Li, Zhiwei Wang

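Affiliations: Huazhong University of Science and Technology; Wuhan United Imaging Surgical Co., Ltd. (UIS)

AI summary: Current vision-language-action (VLA) models usually treat the deepest representation of the vision-language backbone as the best input for action prediction, but robot manipulation requires frequent closed-loop spatial adjustments, and excessive abstraction can waste computation and weaken the low-level geometric cues needed for precise control. This paper proposes LoopVLA, a recurrent VLA architecture that jointly learns representation refinement, action prediction, and sufficiency estimation. A shared Transformer block iteratively refines multimodal features and, at each step, produces a candidate action and a sufficiency score that decides whether further refinement is needed. Experiments show that LoopVLA maintains task success rates while substantially improving efficiency, with 45% fewer parameters and up to 1.7x higher inference throughput.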

2605.09945 2026-05-12 cs.LG

Selection of the Best Policy under Fairness Constraints for Subpopulations

Tingyu Zhu, Yuhang Wu, Zeyu Zheng

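Affiliations: Department of Industrial Engineering and Operations Research; Berkeley Artificial Intelligence Research Lab; University of California, Berkeley

AI summary: This paper studies how to select the best policy under fairness constraints for subpopulations, requiring the selected policy to perform at or above a given threshold on every predefined subpopulation. The authors propose an algorithm, T-a-S-CS, that efficiently identifies the policy with the best average performance while guaranteeing fairness, and they derive a lower bound on the sample complexity of the problem. Experiments show a significant efficiency gain over existing policy-allocation methods.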
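The selection criterion stated in the summary can be written down directly for the idealized case where every policy's true performance on every subpopulation is known. The sketch below only illustrates that target; it is not the T-a-S-CS algorithm, which has to identify this policy from noisy samples. The function name, the population weights, and the numbers are hypothetical.

```python
from typing import Optional, Sequence

def select_policy(
    perf: Sequence[Sequence[float]],   # perf[i][g]: performance of policy i on subpopulation g
    weights: Sequence[float],          # share of each subpopulation in the overall population
    thresholds: Sequence[float],       # minimum acceptable performance per subpopulation
) -> Optional[int]:
    """Return the index of the feasible policy with the best weighted average, or None."""
    best, best_avg = None, float("-inf")
    for i, row in enumerate(perf):
        # Fairness constraint: every subpopulation must meet its threshold.
        if any(p < t for p, t in zip(row, thresholds)):
            continue
        avg = sum(w * p for w, p in zip(weights, row))
        if avg > best_avg:
            best, best_avg = i, avg
    return best

# Policy 0 has the highest average but fails the second subpopulation.
perf = [[0.95, 0.45], [0.60, 0.60], [0.80, 0.50]]
chosen = select_policy(perf, weights=[0.5, 0.5], thresholds=[0.5, 0.5])
```

The example shows why the constraint matters: the policy with the best overall average is excluded because one subpopulation falls below its threshold.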

2605.09944 2026-05-12 cs.RO

Explicit Stair Geometry Conditioning for Robust Humanoid Locomotion

Jianguo Zhang, Wentai Xu, Shusheng Ye, Yuxiang He, Weimin Qi, Qinbo Sun, Ning Ding, Liguang Zhou

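Affiliations: Shenzhen Institute of Artificial Intelligence and Robotics for Society; School of Science and Engineering, The Chinese University of Hong Kong, Shenzhen; Mohamed bin Zayed University of Artificial Intelligence (MBZUAI)

AI summary: This paper addresses the robustness of humanoid robots walking on complex staircases and proposes a control framework conditioned on explicit stair geometry. The method extracts interpretable geometric parameters such as stair height, depth, and yaw angle and feeds them directly to the policy network, enabling active adjustment of gait parameters. Experiments show strong generalization and stability in both simulation and the real world, including an outdoor test on 33 consecutive steps that supports its practical value.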

Comments 8 pages, 7 figures, 4 tables

2605.09942 2026-05-12 cs.AI

HAGE: Harnessing Agentic Memory via RL-Driven Weighted Graph Evolution

Dongming Jiang, Yi Li, Guanpeng Li, Qiannan Li, Bingzhe Li

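Affiliations: Department of Computer Science, The University of Texas at Dallas; Department of Electrical and Computer Engineering, University of Florida; University of California, Davis

AI summary: This paper proposes HAGE, a reinforcement-learning-based weighted multi-relational memory framework for memory retrieval in agentic LLM systems. HAGE recasts memory retrieval as query-conditioned sequential graph traversal, organizes memory as relation-specific graph views over shared memory nodes, and encodes multi-dimensional relational signals with trainable relation feature vectors. A routing network dynamically modulates the dimensions of edge embeddings, and traversal scores combine semantic similarity with query-conditioned edge representations so that high-utility relational paths are prioritized. Experiments show that HAGE achieves higher accuracy on long-horizon reasoning tasks and a better balance between accuracy and efficiency.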

2605.09939 2026-05-12 cs.RO

Neural Distance-Guided Path Integral Control for Tractor-Trailer Navigation

Peng Wei, Chen Peng, Stavros Vougioukas

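Affiliations: Department of Biological and Agricultural Engineering, University of California, Davis; ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University

AI summary: This paper studies safe autonomous navigation of tractor-trailer systems in complex agricultural environments. To handle their complex geometry and nonlinear dynamics, it proposes a real-time obstacle avoidance method based on a geometric neural encoder. A neural network quickly and accurately estimates the distance between the tractor-trailer and the LiDAR-perceived environment, enabling dynamic geometric reasoning without a prior map, and the learned distance is incorporated into a model predictive path integral (MPPI) controller to improve navigation safety and responsiveness. Simulation results confirm that the method generates dynamically feasible and safe trajectories.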
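For readers unfamiliar with MPPI, the sketch below shows the standard importance-weighting step that such a controller applies to sampled control perturbations. It assumes the per-rollout costs have already been computed (for instance a tracking term plus a penalty when a learned distance falls below a safety margin); the function name, array shapes, and temperature `lam` are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def mppi_update(nominal: np.ndarray, noise: np.ndarray, costs, lam: float = 1.0):
    """One MPPI update.

    nominal: (T, m) nominal control sequence
    noise:   (K, T, m) sampled perturbations, one per rollout
    costs:   (K,) total cost of each perturbed rollout
    """
    costs = np.asarray(costs, dtype=float)
    # Subtract the minimum cost for numerical stability before exponentiating.
    weights = np.exp(-(costs - costs.min()) / lam)
    weights /= weights.sum()
    # Cost-weighted average of the perturbations, contracted over the rollout axis.
    updated = nominal + np.tensordot(weights, noise, axes=1)
    return updated, weights

nominal = np.zeros((2, 1))
noise = np.array([[[1.0], [1.0]], [[-1.0], [-1.0]]])
# The second rollout is far more costly (e.g. it passes too close to an obstacle).
updated, weights = mppi_update(nominal, noise, costs=[0.0, 1000.0], lam=1.0)
```

Low-cost rollouts dominate the update, so a collision penalty derived from the distance estimate pushes the control sequence away from unsafe trajectories.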

2605.09936 2026-05-12 cs.CV cs.IR cs.LG

Urban-ImageNet: A Large-Scale Multi-Modal Dataset and Evaluation Framework for Urban Space Perception

Yiwei Ou, Chung Ching Cheung, Jun Yang Ang, Xiaobin Ren, Ronggui Sun, Guansong Gao, Kaiqi Zhao, Manfredo Manfredini

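Affiliations: University of Auckland; University of Pennsylvania; Stanford University; Harbin Institute of Technology, Shenzhen

AI summary: This paper presents Urban-ImageNet, a large-scale multi-modal dataset and evaluation framework for perceiving urban space from social media imagery. The dataset contains over 2 million public images with paired text from Weibo, covering 61 urban sites in 24 Chinese cities, and supports training and evaluation at scales from 1K to 2M. Built on a hierarchical taxonomy grounded in urban theory, Urban-ImageNet supports three tasks (urban scene semantic classification, cross-modal image-text retrieval, and instance segmentation) and is designed to evaluate how well AI models understand the social, functional, and spatial characteristics of urban space.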

Abstract

We present Urban-ImageNet, a large-scale multi-modal dataset and evaluation benchmark for urban space perception from user-generated social media imagery. The corpus contains over 2 Million public social media images and paired textual posts collected from Weibo across 61 urban sites in 24 Chinese cities across 2019-2025, with controlled benchmark subsets at 1K, 10K, and 100K scale and a full 2M corpus for large-scale training and evaluation. Urban-ImageNet is organized by HUSIC, a Hierarchical Urban Space Image Classification framework that defines a 10-class taxonomy grounded in urban theory. The taxonomy is designed to distinguish activated and non-activated public spaces, exterior and interior urban environments, accommodation spaces, consumption content, portraits, and non-spatial social-media content. Rather than treating urban imagery as generic scene data, Urban-ImageNet evaluates whether machine perception models can capture spatial, social, and functional distinctions that are central to urban studies. The benchmark supports three tasks within one standardized library: (T1) urban scene semantic classification, (T2) cross-modal image-text retrieval, and (T3) instance segmentation. Our experiments evaluate representative vision, vision-language, and segmentation models, revealing strong performance on supervised scene classification but more challenging behavior in cross-modal retrieval and instance-level urban object segmentation. A multi-scale study further examines how model performance changes as balanced training data increases from 1K, 10K to 100K images. Urban-ImageNet provides a unified, theory-grounded, multi-city benchmark for evaluating how AI systems perceive and interpret contemporary urban spaces across modalities, scales, and task formulations. Dataset and benchmark are available at: huggingface.co/datasets/Yiwei-Ou/Urban-ImageNet and github.com/yiasun/dataset-2.


2605.09934 2026-05-12 cs.CL

TRACER: Verifiable Generative Provenance for Multimodal Tool-Using Agents

Bihui Yu, Caijun Jia, Jing Chi, Xiaohan Liu, Yining Wang, He Bai, Yuchen Liu, Jingxuan Wei, Junnan Zhu

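Affiliations: Shenyang Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; MAIS, Institute of Automation, Chinese Academy of Sciences; Key Laboratory of Computing Power Network and Information Security, Ministry of Education, Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences)

AI summary: TRACER is a framework for verifiable generative provenance in multimodal tool-using agents. It targets the "provenance gap" in current tool use, where generated claims lack an explicit dependency on the evidence that supports them. TRACER generates each answer sentence together with a structured provenance record that identifies the supporting tool call, evidence unit, and semantic relation, verifies the record from several angles, and then uses verified provenance for traceability constraints and local credit assignment in reinforcement learning. On the TRACE-Bench benchmark, TRACER clearly outperforms existing methods, showing that reliable multimodal tool reasoning depends on provenance-aware use of observations rather than simply on more tool calls.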

Abstract

Multimodal large language models increasingly solve vision-centric tasks by calling external tools for visual inspection, OCR, retrieval, calculation, and multi-step reasoning. Current tool-using agents usually expose the executed tool trajectory and the final answer, but they rarely specify which tool observation supports each generated claim. We call this missing claim-level dependency structure the provenance gap. The gap makes tool use hard to verify and hard to optimize, because useful evidence, redundant exploration, and unsupported reasoning are mixed in the same trajectory. We introduce TRACER, a framework for verifiable generative provenance in multimodal tool-using agents. Instead of adding citations after generation, TRACER generates each answer sentence together with a structured provenance record that identifies the supporting tool turn, evidence unit, and semantic support relation. Its relation space contains Quotation, Compression, and Inference, covering direct reuse, faithful condensation, and grounded derivation. TRACER verifies each record through schema checking, tool-turn alignment, source authenticity, and relation rationality, and then converts verified provenance into traceability constraints and provenance-derived local credit for reinforcement learning. We further construct TRACE-Bench, a benchmark for sentence-level provenance reconstruction from coarse multimodal tool trajectories. On TRACE-Bench, simply adding tools often introduces noise. With Qwen3-VL-8B, TRACER reaches 78.23% answer accuracy and 95.72% summary accuracy, outperforming the strongest closed-source tool-augmented baseline by 23.80 percentage points. Compared with tool-only supervised fine-tuning, it also reduces total test-set tool calls from 4949 to 3486. These results show that reliable multimodal tool reasoning depends on provenance-aware use of observations, not on more tool calls alone.
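The provenance record described in the abstract could be represented roughly as below. The three relation names come from the abstract; the field names, the `check_record` helper, and the substring test used for source authenticity are simplifying assumptions for illustration, and the relation-rationality check is not modeled at all.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Sequence

class Relation(Enum):
    QUOTATION = "Quotation"      # direct reuse of the evidence
    COMPRESSION = "Compression"  # faithful condensation
    INFERENCE = "Inference"      # grounded derivation

@dataclass(frozen=True)
class ProvenanceRecord:
    sentence: str       # the generated answer sentence
    tool_turn: int      # index of the supporting tool call in the trajectory
    evidence: str       # the evidence unit taken from that tool's output
    relation: Relation  # how the sentence relates to the evidence

def check_record(record: ProvenanceRecord, tool_outputs: Sequence[str]) -> bool:
    """Simplified stand-in for schema, tool-turn alignment, and authenticity checks."""
    if not isinstance(record.relation, Relation):
        return False  # schema check
    if not 0 <= record.tool_turn < len(tool_outputs):
        return False  # tool-turn alignment
    return record.evidence in tool_outputs[record.tool_turn]  # source authenticity

# Hypothetical two-step tool trajectory.
tool_outputs = ["OCR: total due 42 USD", "calc: 42 * 2 = 84"]
good = ProvenanceRecord("The invoice total is 42 USD.", 0, "total due 42 USD", Relation.QUOTATION)
bad = ProvenanceRecord("The invoice total is 42 USD.", 5, "total due 42 USD", Relation.QUOTATION)
```

A record that passes such checks can be counted toward a traceability constraint, while one that fails marks the sentence as unsupported.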


2605.09932 2026-05-12 cs.CL

FocuSFT: Bilevel Optimization for Dilution-Aware Long-Context Fine-Tuning

Zehua Pei, Hui-Ling Zhen, Xianzhi Yu, Sinno Jialin Pan, Mingxuan Yuan, Bei Yu

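Affiliations: The Chinese University of Hong Kong; Huawei Technologies Co., Ltd

AI summary: Large language models still struggle to make effective use of information in long contexts. This paper proposes FocuSFT, a fine-tuning method based on bilevel optimization that improves attention allocation during training and reduces the dilution of attention on content-relevant tokens caused by positional bias and attention sinks. The inner loop introduces lightweight fast parameters that form a parametric memory and steer the model toward semantically relevant content, and the outer loop performs supervised fine-tuning on top of it, improving performance on long-context tasks. Experiments show significant gains on several benchmarks.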

2605.09931 2026-05-12 cs.CL cs.AI

PruneTIR: Inference-Time Tool Call Pruning for Effective yet Efficient Tool-Integrated Reasoning

Luan Zhang, Dandan Song, Zhijing Wu, Zhengyu Chen, Chen Zhang, Yuhang Tian, Huipeng Ma, Chenhao Li, Changzhi Zhou, Xudong Li, Shuhao Zhang

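Affiliations: School of Computer Science and Technology, Beijing Institute of Technology, China; School of Computer Science and Technology, Huazhong University of Science and Technology, China; Independent, China

AI summary: PruneTIR is an inference-time method for making tool-integrated reasoning (TIR) both more effective and more efficient in large language models that already know how to use tools. It prunes erroneous tool-call trajectories, resamples tool calls, and suspends tool use when necessary, which reduces the harm of failed calls on the reasoning process and keeps the model from getting stuck in loops of repeated failure. Experiments show that PruneTIR significantly improves reasoning accuracy and efficiency while shortening the context length needed for inference.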

2605.09929 2026-05-12 cs.LG cs.SE

TeleResilienceBench: Quantifying Resilience for LLM Reasoning in Telecommunications

Pranshav Gajjar, Emmanuel Ojo, Vijay K Shah

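Affiliations: NextG Wireless Lab, North Carolina State University

AI summary: This paper presents TeleResilienceBench, which evaluates how well large language models in the telecommunications domain recover from incomplete or incorrect reasoning, a property the authors call reasoning resilience. The benchmark collects failure cases from weaker generator models, truncates the faulty reasoning, and asks the target model to continue and correct it, thereby quantifying recovery. Even the strongest model recovers in only 29.1% of cases, and larger models are not always more resilient; Nemotron-3-nano 4b offers the best resilience-to-cost ratio. The study also notes that difficulty labels in current telecom benchmarks reflect knowledge coverage more than reasoning depth.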

2605.09925 2026-05-12 cs.CV

Frequency Adapter with SAM for Generalized Medical Image Segmentation

Phuoc-Nguyen Bui, Van-Nguyen Pham, Duc-Tai Le, Junghyun Bum, Hyunseung Choo

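Affiliations: Sungkyunkwan University, Korea

AI summary: Medical image segmentation is important for computer-aided diagnosis and treatment planning, but deep learning models often fail to generalize across datasets because of differences in imaging protocols, scanners, and patient populations. This paper proposes FSAM, a generalized medical image segmentation method based on frequency-domain adaptation that combines low-rank adaptation (LoRA) with a frequency adapter module to extract domain-invariant high-frequency features and improve generalization from a single source domain. Experiments show that it outperforms both traditional and SAM-based domain generalization methods on retinal and prostate datasets.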

Comments Under review, 10 pages, 1 figure, 2 tables

2605.09924 2026-05-12 cs.CL

Evolving Knowledge Distillation for Lightweight Neural Machine Translation

Xuewen Zhang, Haixiao Zhang, Xinlong Huang

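Affiliations: Department of Content Generation; Li Auto; Beijing, China

AI summary: This paper proposes Evolving Knowledge Distillation (EKD), a progressive distillation framework for deploying large neural machine translation models on resource-constrained devices. The student model learns step by step from a sequence of teacher models of increasing capacity, which narrows the performance gap between student and teacher. Experiments show significant gains on several benchmark datasets, with the final student performing very close to the large teacher.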
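As background for the distillation step, the sketch below implements the widely used temperature-scaled KL distillation loss in NumPy. It is a generic formulation, not necessarily the loss used in EKD; in a progressive scheme such as the one summarized above, a loss of this kind would be applied against the current teacher at each stage. The function names and the temperature value are assumptions.

```python
import numpy as np

def softmax(logits, temperature: float = 1.0) -> np.ndarray:
    """Temperature-scaled softmax over the last axis, computed stably."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0) -> float:
    """Mean KL(teacher || student) on softened distributions, scaled by T^2."""
    eps = 1e-12  # guards against log(0)
    p_t = softmax(teacher_logits, temperature)
    p_s = softmax(student_logits, temperature)
    kl = np.sum(p_t * (np.log(p_t + eps) - np.log(p_s + eps)), axis=-1)
    return float(np.mean(kl) * temperature ** 2)

# Hypothetical logits over a 3-token vocabulary for one target position.
teacher = [[1.0, 2.0, 3.0]]
same = distillation_loss(teacher, teacher)
different = distillation_loss([[3.0, 2.0, 1.0]], teacher)
```

The loss is zero when the student matches the teacher's softened distribution and grows as the two diverge.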

2605.09922 2026-05-12 cs.CL cs.AI

Team-Based Self-Play With Dual Adaptive Weighting for Fine-Tuning LLMs

Wu Li, Yigeng Zhou, Zesheng Shi, Yequan Wang, Min Zhang, Jing Li

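Affiliations: Harbin Institute of Technology, Shenzhen, China; Beijing Academy of Artificial Intelligence, Beijing, China

AI summary: This paper proposes TPAW, a team-based self-play algorithm for improving the alignment of large language models in a fully self-supervised setting. The current policy model collaborates and competes with historical checkpoints, which improves training stability and efficiency, and two adaptive weighting mechanisms adjust the importance of target responses and the contribution of each team member during training. Experiments show that TPAW outperforms existing methods across several base models and LLM benchmarks.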

Comments Accepted by ACL 2026 Main

2605.09920 2026-05-12 cs.LG cs.AI

Verifier-Free RL for LLMs via Intrinsic Gradient-Norm Reward

Xuexiang Wen, Hang Yu, Linchao Zhu, Gaoang Wang

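Affiliations: Zhejiang University; Ant Group

AI summary: This paper proposes VIGOR, a verifier-free reinforcement learning method for post-training large language models. It uses the gradient norm computed on the policy model's own generated text as an intrinsic reward, steering the model toward outputs that are more consistent with the current policy. VIGOR corrects for length bias in the gradient norm and uses a group-wise ranking strategy, which makes the reward signal more stable and effective, and it outperforms existing methods on mathematical reasoning and code generation tasks.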

Comments Accepted to Findings of ACL 2026

2605.09918 2026-05-12 cs.LG cs.AI cs.CY

NaiAD: Initiate Data-Driven Research for LLM Advertising

Yihang Zhang, Zimeng Huang, Ren Zhai, Yipeng Kang, Tonghan Wang

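Affiliations: Tsinghua University; College of AI; Department of Literature, Arts and Communication; Anhui International Studies University; State Key Laboratory of General Artificial Intelligence, BIGAI

AI summary: This paper presents NaiAD, described as the first comprehensive dataset designed for advertising in large language models (LLMs), with 58,999 carefully constructed ad-embedded responses and their corresponding user queries. The dataset uses theory-grounded evaluation metrics that capture user value and commercial value separately, and a decoupled generation pipeline mitigates the dimensional collinearity of aligned LLMs to produce structurally diverse samples. The study also introduces a scoring framework based on variance-calibrated prediction-powered inference that brings automatic scores into agreement with human annotation, and finds that successful ad integration relies on four semantic strategies, laying groundwork for future LLM-native advertising systems.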

Comments 37 pages, 11 figures

2605.09915 2026-05-12 cs.CL cs.AI cs.CY

Position: Academic Conferences are Potentially Facing Denominator Gaming Caused by Fully Automated Scientific Agents

Rong Shan, Te Gao, Hang Zheng, Yunjia Xi, Jiachen Zhu, Zeyu Zheng, Yong Yu, Weinan Zhang, Jianghao Lin

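Affiliations: Shanghai Jiao Tong University; Central South University; Carnegie Mellon University

AI summary: This paper argues that top AI conferences, which try to keep acceptance rates relatively stable, may face a new threat called "denominator gaming" enabled by fully automated scientific agents. Malicious actors could deploy AI agents to submit large numbers of superficially plausible but low-quality papers, diluting review resources and raising the acceptance probability of particular higher-quality papers. The paper analyzes the feasibility and impact of this threat and argues that it calls for systematic reform of policies and incentives rather than reliance on technical detection alone.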

Comments Accepted by ICML'26 Position Track

2605.09908 2026-05-12 cs.LG cs.AI cs.SD

Voice Biomarkers for Depression and Anxiety

Oleksii Abramenko, Noah D. Stein, Colin Vaz

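Affiliations: Kintsugi Mindful Wellness, Inc.

AI summary: This paper studies how to detect depression and anxiety from speech and proposes a deep learning method that models the raw speech signal directly, avoiding the reliance on hand-crafted features of traditional approaches. The model is trained on a large dataset of about 65,000 utterances from 23,000 speakers representative of the US population. It extracts content-independent biomarker information, and combining this with lexical features from the speech improves prediction in practice. On about 5,000 independent test speakers the model reaches 71% sensitivity and specificity, and it has been open-sourced to support further research.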
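For reference, sensitivity and specificity are computed from binary predictions as shown below. The labels in the example are made up for illustration and have no connection to the paper's data.

```python
from typing import Sequence, Tuple

def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one positive and one negative example")
    # Sensitivity: share of true positives detected; specificity: share of negatives cleared.
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical screening results for seven speakers.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(y_true, y_pred)
```

Reporting both numbers matters for screening tools, since a model can trade one for the other by moving its decision threshold.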
