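arXivDaily: a daily academic digest synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.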


2510.01172 2026-05-15 cs.CL

Energy-Regularized Sequential Model Editing on Hyperspheres

Qingyuan Liu, Jia-Chen Gu, Yunzhi Yao, Hong Wang, Nanyun Peng

Institutions: Columbia University; University of California, Los Angeles; Zhejiang University; University of Science and Technology of China

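AI summary: Large language models need continual updates to stay consistent with real-world knowledge, but sequential editing often destabilizes model representations and causes catastrophic forgetting. This paper proposes SPHERE, an editing method based on hyperspherical energy (HE) regularization that mitigates performance degradation during editing by keeping neuron weights uniformly distributed on a hypersphere. Experiments on LLaMA3 (8B) and Qwen2.5 (7B) show that SPHERE improves editing capability over the best baseline by an average of 16.41% while largely preserving general model performance.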

Comments Accepted by ICLR 2026. The code is available at https://github.com/PlusLabNLP/SPHERE. Project page: https://www.qingyuanliu.net/sphere_projectpage/

Abstract

Large language models (LLMs) require constant updates to remain aligned with evolving real-world knowledge. Model editing offers a lightweight alternative to retraining, but sequential editing often destabilizes representations and induces catastrophic forgetting. In this work, we seek to better understand and mitigate performance degradation caused by sequential editing. We hypothesize that hyperspherical uniformity, a property that maintains uniform distribution of neuron weights on a hypersphere, helps the model remain stable and retain prior knowledge while still accommodating new updates. We use Hyperspherical Energy (HE) to quantify neuron uniformity during editing, and examine its correlation with editing performance. Empirical studies across widely used editing methods reveal a strong correlation between HE dynamics and editing performance, with editing failures consistently coinciding with high HE fluctuations. We further theoretically prove that HE dynamics impose a lower bound on the degradation of pretrained knowledge, highlighting why HE stability is crucial for knowledge retention. Motivated by these insights, we propose SPHERE (Sparse Projection for Hyperspherical Energy-Regularized Editing), an HE-driven regularization strategy that stabilizes neuron weight distributions, ultimately preserving prior knowledge while enabling reliable sequential updates. Specifically, SPHERE identifies a sparse space complementary to the principal hyperspherical directions of the pretrained weight matrices and projects new knowledge onto it, attenuating perturbations on the principal directions. Extensive experiments on LLaMA3 (8B) and Qwen2.5 (7B) show that SPHERE outperforms the best baseline in editing capability by an average of 16.41%, while most faithfully preserving general model performance, thereby offering a principled path toward reliable large-scale knowledge editing.
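The abstract uses Hyperspherical Energy to quantify how uniformly neuron weights are spread on a hypersphere but does not give a formula. The sketch below is a minimal illustration under an assumption: it uses the Riesz s-energy (the sum of inverse pairwise distances between unit-normalized neurons), one common definition of hyperspherical energy. The function name `hyperspherical_energy` and the parameters `s` and `eps` are hypothetical, and the paper's exact formulation may differ.

```python
import numpy as np

def hyperspherical_energy(weights: np.ndarray, s: float = 1.0, eps: float = 1e-12) -> float:
    """Riesz s-energy of the rows of `weights` after projection onto the unit sphere.

    Assumption: each row is one neuron. This is an illustrative definition,
    not necessarily the one used in the paper.
    """
    # Project every neuron (row) onto the unit hypersphere.
    norms = np.linalg.norm(weights, axis=1, keepdims=True)
    unit = weights / np.maximum(norms, eps)
    # Pairwise Euclidean distances between all neurons.
    diff = unit[:, None, :] - unit[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Sum inverse distances over distinct pairs (upper triangle only).
    i, j = np.triu_indices(len(unit), k=1)
    return float(np.sum(1.0 / np.maximum(dist[i, j], eps) ** s))

# Two antipodal neurons are spread as far apart as possible on the circle,
# so their energy is lower than that of two orthogonal neurons.
antipodal = np.array([[1.0, 0.0], [-1.0, 0.0]])
orthogonal = np.array([[1.0, 0.0], [0.0, 1.0]])
print(hyperspherical_energy(antipodal), hyperspherical_energy(orthogonal))
```

Under this definition, lower energy corresponds to a more uniform spread of neurons, so tracking the value before and after an edit gives a rough sense of how much the edit disturbed the weight distribution.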


2510.00977 2026-05-15 cs.LG cs.CL

It Takes Two: Your GRPO Is Secretly DPO

Yihong Wu, Liheng Ma, Lei Ding, Muzhi Li, Xinyu Wang, Kejia Chen, Zhan Su, Zhanguang Zhang, Chenyang Huang, Yingxue Zhang, Mark Coates, Jian-Yun Nie

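Institutions: UdeM (University of Montreal); McGill (McGill University); Mila (Mila AI institute); UManitoba (University of Manitoba); CUHK (The Chinese University of Hong Kong); ZJU (Zhejiang University); UAlberta (University of Alberta); Amii (Alberta Machine Intelligence Institute); Huawei Noah's Ark Lab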

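AI summary: This paper studies why GRPO is effective for fine-tuning large language models and offers a new perspective: GRPO's performance advantage comes from an implicit contrastive objective, which makes it structurally close to preference-learning methods such as DPO. Building on this finding, the authors propose 2-GRPO, which builds the contrastive signal from only two rollouts and sharply reduces compute requirements. Theoretical analysis and experiments show that 2-GRPO retains 97.6% of the performance of 16-GRPO while using only 12.5% of the rollouts and 21% of the training time.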
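To make the two-rollout idea concrete, here is a minimal sketch of a group-relative advantage, assuming the commonly used form in which each reward is normalized by the mean and population standard deviation of its group. The function name `group_relative_advantages` and the `eps` term are hypothetical, and the paper's exact estimator may differ. With a group of two rollouts that receive different rewards, the advantages reduce to roughly +1 and -1, which is a pairwise "preferred versus rejected" signal.

```python
import numpy as np

def group_relative_advantages(rewards, eps: float = 1e-8) -> np.ndarray:
    """Normalize rewards within one group of rollouts (illustrative form only)."""
    r = np.asarray(rewards, dtype=float)
    # Population standard deviation (ddof=0); eps avoids division by zero
    # when every rollout in the group gets the same reward.
    return (r - r.mean()) / (r.std() + eps)

# Two rollouts with different rewards: one is pushed up, the other down.
pair = group_relative_advantages([1.0, 0.0])

# Two rollouts with identical rewards carry no contrastive signal.
tie = group_relative_advantages([1.0, 1.0])

# Rollout budget of a group of 2 relative to a group of 16.
rollout_fraction = 2 / 16

print(pair, tie, rollout_fraction)
```

The rollout fraction of 2/16 matches the 12.5% figure quoted in the summary; the 21% training-time figure is an empirical result and cannot be derived from this arithmetic.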

2510.00757 2026-05-15 cs.LG

LEAP: Local ECT-Based Learnable Positional Encodings for Graphs

Juan Amboage, Ernst Röell, Patrick Schnider, Bastian Rieck

Institutions: AIDOS Lab, University of Fribourg; Institute of AI for Health, Helmholtz Munich; Technical University of Munich; Department of Computer Science, ETH Zurich; Department of Computer Science, University of Basel

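AI summary: This paper proposes LEAP, a learnable positional encoding for graphs based on the local Euler Characteristic Transform ($\ell$-ECT), aimed at improving positional encoding in graph neural networks. The method combines a differentiable approximation of the ECT with its local variant, captures local structural features of the graph, and is trained end to end. Experiments on several real-world and synthetic datasets show strong results, indicating its effectiveness and potential for graph representation learning.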

Comments Accepted at the International Conference on Learning Representations (ICLR) 2026. Our code is available at https://www.github.com/aidos-lab/LEAP

2509.26100 2026-05-15 cs.AI

AgenticEval: Toward Agentic and Self-Evolving Safety Evaluation of Large Language Models

Yixu Wang, Xin Wang, Yang Yao, Xinyuan Li, Xibang Yang, Yan Teng, Xingjun Ma, Yingchun Wang

Institutions: Shanghai Artificial Intelligence Laboratory; Fudan University; The University of Hong Kong; East China Normal University

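AI summary: As large language models are widely deployed in high-stakes domains, existing static evaluation methods struggle to keep up with the shifting landscape of AI risks and continually evolving regulations. This paper proposes AgenticEval, an agent-driven safety evaluation paradigm in which a multi-agent framework autonomously parses policy documents, continually generates and evolves comprehensive safety benchmarks, and refines test cases through a self-evolving evaluation loop. Experiments show that the approach uncovers deep safety vulnerabilities that conventional evaluations miss, underscoring the importance of dynamic evaluation for safe AI deployment.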

Comments Findings of ACL 2026

2509.25914 2026-05-15 cs.LG

ReNF: Rethinking the Design of Neural Long-Term Time Series Forecasters

Yihang Lu, Xianwei Meng, Enhong Chen

Institutions: HFIPS, Chinese Academy of Sciences; University of Science and Technology of China; Hefei University of Technology

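AI summary: This paper revisits the design principles of neural forecasters for long-term time series forecasting and proposes ReNF, a framework grounded in a variance-reduction hypothesis. It combines the strengths of autoregressive and direct-output structures in a simple and efficient Boosted Direct Output paradigm, and introduces parameter smoothing to improve generalization. Experiments show that these principled improvements let a simple time-series MLP outperform recent complex state-of-the-art models on multiple benchmarks, supporting the importance of design principles.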

2509.25826 2026-05-15 cs.LG

Kairos: Toward Adaptive and Parameter-Efficient Time Series Foundation Models

Kun Feng, Shaocheng Lan, Yuchen Fang, Wenchao He, Sihan Lu, Shuqi Gu, Lintao Ma, Xingyu Lu, Kan Ren

Institutions: School of Information Science and Technology, ShanghaiTech University; Ant Group

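AI summary: Time series foundation models (TSFMs) struggle with zero-shot generalization, mainly because of inherent temporal heterogeneity such as varying sampling density and periodic structure. To address this, the paper proposes Kairos, a parameter-efficient and flexible time series foundation model that decouples temporal heterogeneity from model capacity through dynamic patch tokenization and mixture-of-size encoding, achieving fine-grained temporal abstraction without increasing model width or depth. Kairos also introduces multi-granularity positional embeddings based on dynamic rotary encoding, conditioned on each instance's spectral characteristics and temporal structure, and achieves superior zero-shot performance on two mainstream benchmarks with fewer parameters.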

2509.23023 2026-05-15 cs.AI

Deceive, Detect, and Disclose: Large Language Models Play Mini-Mafia

Davi Bastos Costa, Renato Vicente

Institutions: TELUS Digital Research Hub; Center for Artificial Intelligence and Machine Learning; Institute of Mathematics, Statistics and Computer Science

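AI summary: This paper introduces *Mini-Mafia*, a simplified social deduction game for evaluating large language models in multi-agent interaction. By analyzing interactions among the deceiver, detective, and villager roles, the study derives an analytical formula that predicts the deceiving side's win probability and uses it to build the *Mini-Mafia Benchmark*, which quantitatively measures models' deception, detection, and disclosure capabilities. Experiments show that the approach predicts well across models and reveal some counterintuitive findings about the capabilities of current mainstream LLMs.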

Comments Adds a validation section for the theoretical model and restructures the presentation

2509.22746 2026-05-15 cs.AI cs.CV

Mixture-of-Visual-Thoughts: Exploring Context-Adaptive Reasoning Mode Selection for General Visual Reasoning

Zejun Li, Yingxiu Zhao, Jiwen Zhang, Siyuan Wang, Yang Yao, Runzhou Zhao, Jun Song, Bo Zheng, Zhongyu Wei

Institutions: Fudan University; Alibaba Group Holding Limited; Future Living Lab of Alibaba; University of Southern California; Shanghai Innovation Institute

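AI summary: Current visual reasoning methods mostly explore one specific reasoning mode; they improve results in particular domains but do not yield general reasoning ability. This paper proposes Mixture-of-Visual-Thoughts (MoVT), an adaptive reasoning paradigm that unifies different reasoning modes in a single model and selects the appropriate mode according to context. It introduces AdaVaR, a two-stage adaptive visual reasoning framework that uses supervised learning for initial training and then reinforcement learning with a carefully designed algorithm to guide context-adaptive mode selection. Experiments show that the method improves visual reasoning across a variety of scenarios.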

Comments 27 pages, 11 figures, 5 tables, accepted by ICLR 2026

2509.20846 2026-05-15 cs.LG

Causal Time Series Generation via Diffusion Models

Yutong Xia, Chang Xu, Yuxuan Liang, Li Zhao, Qingsong Wen, Roger Zimmermann, Jiang Bian

Institutions: National University of Singapore; Microsoft Research Asia; HKUST (Guangzhou); Squirrel AI

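AI summary: This paper approaches conditional time series generation from a causal perspective, extending the task to interventional and counterfactual settings and forming a new family of causal time series generation (Causal TSG) tasks. The authors design CaTSG, a unified diffusion-based framework that uses backdoor adjustment and an abduction-action-prediction procedure to control interventional and counterfactual generation precisely. Experiments show that CaTSG preserves observational fidelity while generating interventional and counterfactual series effectively, outperforming existing baselines.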

2509.14232 2026-05-15 cs.CV

GenExam: A Multidisciplinary Text-to-Image Exam

Zhaokai Wang, Penghao Yin, Xiangyu Zhao, Changyao Tian, Yu Qiao, Wenhai Wang, Jifeng Dai, Gen Luo

Institutions: Shanghai Jiao Tong University; Tsinghua University; Shanghai AI Laboratory; The Chinese University of Hong Kong

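AI summary: GenExam is the first exam-style benchmark for multidisciplinary text-to-image generation, designed to evaluate models' combined abilities in understanding, reasoning, and image generation. It contains 1,000 questions across 10 subjects, each paired with a ground-truth image and fine-grained scoring points so that the semantic correctness and visual plausibility of generated images can be assessed precisely. Experiments show that GenExam is highly challenging for existing models and that open-source models lag well behind closed-source ones, highlighting the limits of current generative models on complex tasks.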

Comments Accepted by ICML 2026

2509.01299 2026-05-15 cs.CV

Cross-Domain Few-Shot Segmentation via Ordinary Differential Equations over Time Intervals

Huan Ni, Qingshan Liu, Xiaonan Niu, Danfeng Hong, Lingli Zhao, Haiyan Guan

Institutions: School of Remote Sensing & Geomatics Engineering, Nanjing University of Information Science & Technology; Tiandu-Nuist Deep Space Exploration Laboratory; School of Computer Science, Nanjing University of Posts and Telecommunications; Nanjing Center, China Geological Survey; School of Automation, Southeast University; School of Remote Sensing and Information Engineering, Wuhan University

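AI summary: This paper studies cross-domain few-shot segmentation (CD-FSS), which aims to segment unseen classes from very few samples under a domain shift between source and target domains. To address the restricted knowledge flow caused by independent modules in existing methods, the authors propose FSS-TI, a unified module based on ordinary differential equations (ODEs) and the Fourier transform, which explores domain-agnostic features and learns efficiently from limited samples through feature evolution over time intervals. Experiments show that the method outperforms existing approaches in both cross-domain adaptability and segmentation performance.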

2508.15198 2026-05-15 cs.LG math-ph math.MP

Frequency-adaptive tensor neural networks for high-dimensional multi-scale problems

Jizu Huang, Yue Qiu, Rukang You

Institutions: SKLMS, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, PR China; School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100190, PR China; College of Mathematics and Statistics, Chongqing University

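AI summary: This study addresses the difficulty that conventional tensor neural networks (TNNs) have in capturing high-frequency features in high-dimensional multi-scale problems, and proposes a frequency-adaptive TNN method. It uses Fourier analysis to characterize the training dynamics of TNNs, adds random Fourier features to increase expressiveness, and exploits the tensor structure of TNNs to apply the discrete Fourier transform to one-dimensional component functions, which mitigates the curse of dimensionality. The method substantially improves TNNs on complex multi-scale problems, and extensive numerical experiments confirm its effectiveness and robustness.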

2508.06226 2026-05-15 cs.AI

GeoLaux: A Benchmark for Evaluating MLLMs' Geometry Performance on Long-Step Problems Requiring Auxiliary Lines

Yumeng Fu, Jiayin Zhu, Lingling Zhang, Wenjun Wu, Bo Zhao, Shaoxuan Ma, Yushun Zhang, Jun Liu

Institutions: School of Computer Science and Technology, Xi’an Jiaotong University; Ministry of Education Key Laboratory of Intelligent Networks and Network Security, China; Shaanxi Province Key Laboratory of Big Data Knowledge Engineering, China; School of Software Engineering, Xi’an Jiaotong University

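AI summary: GeoLaux is a fine-grained benchmark dataset for evaluating multimodal large language models (MLLMs) on long-step geometry problems that require constructing auxiliary lines. It contains 2,186 calculation and proof problems with an average of 6.51 solution steps, 41.8% of which require auxiliary line construction. A five-dimensional evaluation of 23 mainstream MLLMs on this dataset finds that performance drops markedly on long-step problems, that weak understanding of auxiliary lines is a key factor limiting geometric reasoning, and that limited answer hints help improve the correctness of the reasoning process. GeoLaux offers a useful reference for evaluating and improving the geometric reasoning of MLLMs.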

Comments 26 pages, 24 figures

2508.06202 2026-05-15 cs.CV cs.AI

LoRA in LoRA: Towards Parameter-Efficient Architecture Expansion for Continual Visual Instruction Tuning

Chang Che, Ziqi Wang, Pengwan Yang, Qi Wang, Hui Ma, Zenglin Shi

Institutions: Hefei University of Technology; University of Amsterdam; Tsinghua University

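AI summary: Continual visual instruction tuning (CVIT) lets multimodal large language models learn new tasks incrementally, but it suffers from catastrophic forgetting. To address this, the paper proposes LiLoRA, an efficient architecture expansion method that shares the LoRA matrix A and applies a low-rank decomposition to matrix B, substantially reducing parameter overhead, and adds a cosine-regularized stability loss to keep representations consistent. Experiments show that LiLoRA achieves better performance on multiple CVIT benchmarks while improving parameter efficiency.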

Comments AAAI 2026 Oral Presentation. 9 pages

Journal ref Proceedings of the AAAI Conference on Artificial Intelligence, 40(24):19978--19986, 2026

2508.05008 2026-05-15 cs.CV

Multimodal Causal-Driven Representation Learning for Generalizable Medical Image Segmentation

Xusheng Liang, Lihua Zhou, Nianxin Li, Miao Xu, Ziyang Song, Dong Yi, Jinlin Wu, Jiawei Ma, Hongbin Liu, Zhen Lei, Jiebo Luo

Institutions: City University of Hong Kong; Shenzhen Loop Area Institute; CAIR, HKISI, Chinese Academy of Sciences; UESTC (University of Electronic Science and Technology of China); MAIS, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences

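AI summary: This study targets domain shift in medical image segmentation caused by differences in equipment, imaging modality, and similar factors, and proposes MCDRL, a multimodal causal-driven representation learning framework. The method combines a vision-language model with causal inference: it builds a domain-specific confounder dictionary and trains a causal intervention network, removing domain bias while preserving anatomical structure information. Experiments show that MCDRL achieves better segmentation accuracy and stronger cross-domain generalization on multiple medical image segmentation tasks.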

Comments Accepted by CVPR 2026

2508.01916 2026-05-15 cs.LG cs.AI cs.CL

Decomposing Representation Space into Interpretable Subspaces with Unsupervised Learning

Xinting Huang, Michael Hahn

Institutions: Saarland University

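AI summary: This paper studies how to decompose a neural network's representation space into interpretable subspaces using unsupervised learning. The authors propose neighbor distance minimization (NDM), which learns subspaces aligned with the model's internal concepts without relying on labels. Experiments show that these subspaces capture abstract concepts in the input and, in models such as GPT-2, correlate strongly with known circuit variables, offering a new perspective on model internals.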

Comments Published as a conference paper at ICLR 2026

2507.21433 2026-05-15 cs.LG cs.AI

ReasonCache: Accelerating Large Reasoning Model Serving through KV Cache Sharing

Kaiwen Chen, Xin Tan, Minchen Yu, Jingzong Li, Hong Xu

Institutions: The Chinese University of Hong Kong; The Chinese University of Hong Kong, Shenzhen; The Hang Seng University of Hong Kong

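AI summary: Large reasoning models (LRMs) play a key role in many AI reasoning systems, but deploying them in production raises quality-of-service (QoS) challenges: long reasoning sequences incur high memory overhead, which limits throughput and increases latency. This paper proposes ReasonCache, a KV cache management method based on a collaborative filtering algorithm that identifies and reuses the KV cache blocks of similar intermediate reasoning steps, enabling zero-copy cache reuse and substantially improving inference efficiency. Experiments show that ReasonCache maintains high accuracy while raising peak throughput by 89.2% and average throughput by 40-60%, improving the responsiveness and cost-effectiveness of AI inference serving.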

Comments 10 pages, 7 figures

2507.21023 2026-05-15 cs.LG eess.SP

On Using the Shapley Value for Anomaly Localization: A Statistical Investigation

Xubin Fang, Rick S. Blum, Franziska Freytag

Institutions: Electrical and Computer Engineering Department of Lehigh University

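AI summary: This paper studies the use of the Shapley value for anomaly localization in sensor data systems and examines its statistical properties. The authors show that using a single fixed term in the Shapley value calculation can substantially reduce the complexity of anomaly localization while keeping the same probability of erroneous detection. The approach is proven to hold generally for independent observations; the case of dependent observations still requires further verification.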
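For readers unfamiliar with why the full Shapley calculation is costly, the sketch below computes exact Shapley values from the standard definition, which sums marginal contributions over every subset of the other players. It is an illustration of the general definition only, not the paper's method: the per-sensor `scores`, the additive `toy_value` function, and the name `shapley_values` are all hypothetical, and the paper's single-fixed-term simplification is not reproduced here.

```python
from itertools import combinations
from math import factorial

def shapley_values(n, value):
    """Exact Shapley values for n players given a set function `value`."""
    phi = [0.0] * n
    for i in range(n):
        others = [p for p in range(n) if p != i]
        for k in range(n):  # k = size of the coalition that player i joins
            # Standard Shapley weight for a coalition of size k.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of player i to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

# Hypothetical per-sensor anomaly scores; sensor 1 is the anomalous one.
scores = [0.1, 2.0, 0.3]

def toy_value(coalition):
    # Additive toy value function: total score of the sensors in the coalition.
    return sum(scores[j] for j in coalition)

phi = shapley_values(len(scores), toy_value)
print(phi)
```

Each player's value involves 2^(n-1) subsets, so the exact calculation grows exponentially with the number of sensors, which is the kind of cost the summary says a single fixed term can avoid. For an additive value function like the toy one here, the Shapley values simply recover the individual scores.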

Journal ref Applied AI Letters 7(2) (2026) e70024

2507.07776 2026-05-15 cs.CV

SCOOTER: A Human Evaluation Framework for Unrestricted Adversarial Examples

Dren Fazlija, Monty-Maximilian Zühlke, Johanna Schrader, Arkadij Orlov, Clara Stein, Iyiola E. Olatunji, Daniel Kudenko

Institutions: University of Luxembourg; CAIMed – Lower Saxony Center for AI & Causal Methods in Medicine

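AI summary: This paper presents SCOOTER, an open-source framework for evaluating how authentic unrestricted adversarial examples look. Unrestricted adversarial attacks evade traditional defenses by, for example, changing an object's color, but their imperceptibility has to be verified through human evaluation. SCOOTER provides a standardized human evaluation procedure, a large-scale comparison study with 346 human participants, and open-source tools and a dataset. It shows that several current adversarial attacks fail to produce images that are imperceptible to humans and highlights the gap between human perception and automated vision systems.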

Comments 42 pages, 16 figures, 11 tables, Under Review, Code: https://github.com/DrenFazlija/Scooter, Data: https://doi.org/10.5281/zenodo.15771501

Abstract

Unrestricted adversarial attacks aim to fool computer vision models without being constrained by $\ell_p$-norm bounds to remain imperceptible to humans, for example, by changing an object's color. This allows attackers to circumvent traditional, norm-bounded defense strategies such as adversarial training or certified defense strategies. However, due to their unrestricted nature, there are also no guarantees of norm-based imperceptibility, necessitating human evaluations to verify just how authentic these adversarial examples look. While some related work assesses this vital quality of adversarial attacks, none provide statistically significant insights. This issue necessitates a unified framework that supports and streamlines such an assessment for evaluating and comparing unrestricted attacks. To close this gap, we introduce SCOOTER - an open-source, statistically powered framework for evaluating unrestricted adversarial examples. Our contributions are: $(i)$ best-practice guidelines for crowd-study power, compensation, and Likert equivalence bounds to measure imperceptibility; $(ii)$ the first large-scale human vs. model comparison across 346 human participants showing that three color-space attacks and three diffusion-based attacks fail to produce imperceptible images. Furthermore, we found that GPT-4o can serve as a preliminary test for imperceptibility, but it only consistently detects adversarial examples for four out of six tested attacks; $(iii)$ open-source software tools, including a browser-based task template to collect annotations and analysis scripts in Python and R; $(iv)$ an ImageNet-derived benchmark dataset containing 3K real images, 7K adversarial examples, and over 34K human ratings. Our findings demonstrate that automated vision systems do not align with human perception, reinforcing the need for a ground-truth SCOOTER benchmark.


2507.04049 2026-05-15 cs.CV cs.RO

DIVER: Reinforced Diffusion Breaks Imitation Bottlenecks in End-to-End Autonomous Driving

Ziying Song, Lin Liu, Hongyu Pan, Bencheng Liao, Mingzhe Guo, Lei Yang, Yongchang Zhang, Shaoqing Xu, Caiyan Jia, Yadan Luo

Institutions: School of Artificial Intelligence (School of Software), Yanshan University; Beijing Key Laboratory of Traffic Data Mining and Embodied Intelligence, School of Computer Science and Technology, Beijing Jiaotong University; Horizon Robotics; School of Mechanical and Aerospace Engineering, Nanyang Technological University; University of Macau; School of Electrical Engineering and Computer Science, The University of Queensland

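AI summary: Most end-to-end autonomous driving methods rely on imitation learning from single expert demonstrations, which leads to conservative, homogeneous behavior that adapts poorly to complex real-world scenarios. This paper proposes DIVER, a framework that combines reinforcement learning with a diffusion-based generative model to produce diverse yet feasible driving trajectories. DIVER guides the diffusion process with reinforcement learning, uses rewards to enforce trajectory safety and diversity, and introduces a new diversity metric. Experiments show that it substantially improves trajectory diversity on multiple benchmarks and mitigates the mode collapse that arises in imitation learning.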

Comments 17 pages, 10 figures

2506.16608 2026-05-15 cs.LG cs.AI

Distributions as Actions: A Unified Framework for Diverse Action Spaces

Jiamin He, A. Rupam Mahmood, Martha White

Institutions: Department of Computing Science, University of Alberta; Alberta Machine Intelligence Institute (Amii); CIFAR AI Chair, Amii

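AI summary: This paper proposes a new reinforcement learning framework that treats parameterized action distributions as actions, redrawing the boundary between agent and environment. This reparameterization makes the action space continuous whether the original actions are discrete, continuous, or hybrid. The work also proposes DA-PG, a general deterministic policy gradient estimator, and DA-AC, a practical actor-critic algorithm based on TD3. Experiments show good performance across a variety of control tasks.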

Comments Accepted to ICLR 2026 (camera-ready)

2506.08584 2026-05-15 cs.CL

CounselBench: A Large-Scale Expert Evaluation and Adversarial Benchmarking of Large Language Models in Mental Health Question Answering

Yahan Li, Jifan Yao, John Bosco S. Bunyi, Adam C. Frank, Angel Hsing-Chi Hwang, Ruishan Liu

Institutions: Department of Computer Science, University of Southern California; Department of Electrical and Computer Engineering, University of Southern California; Suzanne Dworak-Peck School of Social Work, University of Southern California; Department of Psychiatry and the Behavioral Sciences, University of Southern California; Annenberg School for Communication, University of Southern California

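AI summary: This paper presents CounselBench, a large-scale benchmark for evaluating large language models on mental health question answering, built with 100 mental health professionals. It has two parts. CounselBench-EVAL, based on 2,000 expert evaluations of responses from GPT-4, LLaMA 3, and other models as well as online human therapists, reveals problems in clinical relevance, personalization, and safety. CounselBench-Adv uses expert-designed adversarial questions to further expose model-specific failure modes. The study provides a clinically grounded framework for evaluating language models in mental health.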

2506.04499 2026-05-15 cs.CV

FALO: Fast and Accurate LiDAR 3D Object Detection on Resource-Constrained Devices

Shizhong Han, Hsin-Pai Cheng, Hong Cai, Jihad Masri, Soyeb Nagori, Fatih Porikli

Institutions: Qualcomm AI Research

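AI summary: This paper proposes FALO, an efficient and accurate LiDAR 3D object detection method designed for resource-constrained edge devices. It arranges sparse voxels into a 1D sequence by coordinates and proximity and processes them with the proposed ConvDotMix module, achieving thorough feature mixing across the spatial and embedding dimensions along with higher-order nonlinear interactions. Experiments show that FALO maintains state-of-the-art detection accuracy while running 1.6x to 9.8x faster than the latest methods on mobile GPUs and NPUs, making it suitable for compact embedded devices.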

2506.00158 2026-05-15 cs.LG

Privacy Amplification in Differentially Private Zeroth-Order Optimization with Hidden States

Eli Chien, Wei-Ning Chen, Pan Li

Institutions: Department of Electrical Engineering, National Taiwan University, Taiwan; NTU Artificial Intelligence Center of Research Excellence (NTU AI-CoRE), Taiwan; Microsoft, USA; Department of Electrical and Computer Engineering, Georgia Institute of Technology, USA

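AI summary: This paper studies privacy amplification when fine-tuning large language models with zeroth-order optimization under differential privacy (DP) and memory constraints. Because the random update directions in zeroth-order methods produce anisotropic noise that does not fit conventional privacy analysis frameworks, the authors propose a hybrid noise mechanism and a coupling analysis, establishing the first convergent hidden-state DP bound and removing the need for a global Lipschitz condition. The result provides new theoretical support for designing more efficient differentially private zeroth-order optimization algorithms.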

Comments ICML 2026

2505.22394 2026-05-15 cs.CV

PacTure: Efficient PBR Texture Generation on Packed Views with Visual Autoregressive Models

Fan Fei, Jiajun Tang, Fei-Peng Tian, Boxin Shi, Ping Tan

Institutions: State Key Laboratory of Multimedia Information Processing, School of Computer Science, Peking University; National Engineering Research Center of Visual Technology, School of Computer Science, Peking University; The Hong Kong University of Science and Technology; Light Illusions; PKU-AI 2 Robotics Joint Lab of Embodied AI

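AI summary: This paper proposes PacTure, a framework that generates physically based rendering (PBR) material textures for untextured 3D meshes from text descriptions. To address the shortcomings of existing methods in generation efficiency and texture consistency, it introduces view packing, which raises the effective resolution of multi-view generation while keeping the generative model efficient and compatible. Combining fine-grained control with an autoregressive prediction framework, PacTure outperforms existing state-of-the-art methods in both generation quality and efficiency.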

Comments Accepted by Computational Visual Media Journal (CVMJ) in Feb. 2026. 19 pages, 7 figures

2505.11809 2026-05-15 cs.CV

From Street View to Visual Network: Mapping the Visibility of Urban Landmarks with Vision-Language Models

Zicheng Fan, Kunihiko Fujiwara, Pengyuan Liu, Fan Zhang, Filip Biljecki

Institutions: Department of Architecture, National University of Singapore, Singapore; Research & Development Institute, Takenaka Corporation, Japan; Urban Analytics Subject Group, Urban Studies & Social Policy Division, University of Glasgow, United Kingdom; Institute of Remote Sensing & GIS, Peking University, China; Department of Real Estate, National University of Singapore, Singapore

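AI summary: This paper proposes a vision-language model (VLM) approach that uses street view imagery to assess how visible urban landmarks are in real street environments, replacing conventional line-of-sight simulation based on geometric occlusion. It detects target landmarks in street view images with controlled heading and zoom, then builds a heterogeneous visibility graph representing visual connections, revealing how multiple landmarks are linked through shared visual corridors. Experiments show a detection accuracy of 87% on several internationally known landmarks, and a case study along the River Thames in London identifies key intermediary locations, offering a new analytical perspective for urban planning and heritage conservation.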

2505.03519 2026-05-15 cs.LG

Revisiting Model Inversion Evaluation: From Misleading Standards to Reliable Privacy Assessment

Sy-Tuyen Ho, Koh Jun Hao, Ngoc-Bao Nguyen, Alexander Binder, Ngai-Man Cheung

Institutions: Singapore University of Technology and Design; University of Maryland College Park

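AI summary: This paper revisits how model inversion (MI) attacks are evaluated and argues that the dominant evaluation framework is misleading: many attacks counted as successful are in fact false positives that do not recover information about the target individual. The study shows that these false positives behave like Type I adversarial examples and are highly transferable, which inflates reported attack accuracy. The authors therefore propose a new evaluation framework based on multimodal large language models (MLLMs) that reduces adversarial transferability and assesses privacy leakage more reliably.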

Comments Accepted to CVPR Findings 2026

Abstract

Model Inversion attacks aim to reconstruct information from private training data by exploiting access to a target model. Nearly all recent MI studies evaluate attack success using a standard framework that computes attack accuracy through a secondary evaluation model trained on the same private data and task design as the target model. In this paper, we present the first in-depth analysis of this dominant evaluation framework and reveal a fundamental issue: many reconstructions deemed successful under the existing framework are in fact false positives that do not capture the visual identity of the target individual. We first show that these MI false positives satisfy the same formal conditions as Type I adversarial examples. In controlled experiments, we demonstrate extremely high false-positive transferability, an empirical signature characteristic of adversarial behavior, indicating that many MI false positives likely contain Type I adversarial features. This adversarial transferability significantly inflates reported attack accuracy and leads to an overstatement of privacy leakage in existing MI work. To address this issue, as our second contribution, we introduce a new evaluation framework based on MLLMs, whose general-purpose visual reasoning avoids the shared-task vulnerability and reduces the Type I adversarial transferability of the current evaluation framework. We propose systematic design principles for MLLM-based evaluation. Using this framework, we reassess 27 MI attack setups across diverse datasets, target models, and priors, and find consistently high false-positive rates under the conventional approach. Our results call for a reevaluation of progress in MI research and establish MLLM-based evaluation as a more reliable standard for assessing privacy risks in machine learning systems. Code/data/prompt are available at https://hosytuyen.github.io/projects/FMLLM


2505.01584 2026-05-15 cs.LG cs.AI

Silent Neuron Theory and Plasticity Preservation for Deep Reinforcement Learning in Adaptive Video Streaming

Zhiqiang He, Zhi Liu

Institutions: Department of Computer and Network Engineering, the University of Electro-Communications, Japan

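AI summary: This paper studies deep reinforcement learning for adaptive video streaming. To address the poor generalization caused by heterogeneous real-world network bandwidth, it proposes the Silent Neuron Theory, which characterizes plasticity loss in neural networks more accurately. Based on this theory, the authors design Reset Silent Neuron (ReSiN), which preserves network plasticity by strategically resetting neurons using both forward and backward propagation states, improving adaptation in non-stationary network environments. Experiments show that ReSiN significantly outperforms existing methods on bitrate and QoE metrics and remains robust across different network conditions.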

2504.18544 2026-05-15 cs.LG cs.AI cs.CY

Critical Challenges and Guidelines in Evaluating Synthetic Tabular Data: A Systematic Review

Nazia Nafis, Inaki Esnaola, Alvaro Martinez-Perez, Maria-Cruz Villa-Uriol, Venet Osmani

Institutions: Healthy Lifespan Institute, School of Computer Science, University of Sheffield; School of Electrical and Electronic Engineering, University of Sheffield; Healthy Lifespan Institute, School of Sociological Studies, Politics and International Relations, University of Sheffield; Digital Environment Research Institute, Queen Mary University of London

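AI summary: This paper systematically reviews recent research on generating and evaluating synthetic tabular health data and identifies key challenges, including a lack of consensus on evaluation methods, inconsistent use of metrics, and limited involvement of domain experts. In response, it proposes a structured taxonomy and practical evaluation guidelines intended to promote more rigorous, standardized evaluation and the responsible development and use of synthetic health data.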

Comments 32 pages

2504.09549 2026-05-15 cs.CV

SD-ReID: View-aware Stable Diffusion for Aerial-Ground Person Re-Identification

Yuhao Wang, Xiang Hu, Lixin Wang, Pingping Zhang, Huchuan Lu

Institutions: School of Future Technology, Dalian University of Technology; School of Information and Communication Engineering, Dalian University of Technology

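AI summary: This paper proposes SD-ReID, a generative framework for aerial-ground person re-identification (AG-ReID). It combines a generative model with controllable conditions to learn feature distributions under different views, extracting more robust identity representations, and introduces a view-refinement decoder to strengthen feature alignment. Experiments show strong performance on multiple AG-ReID datasets.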

Comments This work is accepted by IEEE TIP 2026. More modifications may be performed
