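arXivDaily daily academic digest: synchronized with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.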


2605.09708 2026-05-12 cs.LG cs.AI cs.DC

Metal-Sci: A Scientific Compute Benchmark for Evolutionary LLM Kernel Search on Apple Silicon

Víctor Gallego

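Affiliations: Komorebi AI Technologies

AI summary: This paper presents Metal-Sci, a scientific-computing benchmark for evaluating evolutionary large language model (LLM) kernel search on Apple Silicon, comprising ten tasks across six optimization scenarios. The benchmark is paired with a lightweight harness that automatically compiles candidate kernels, measures their performance, and feeds structured diagnostics back to a frozen LLM to drive the evolutionary search. Kernel search with Claude, Gemini, and GPT models on an M1 Pro yields speedups of up to 10.7x. The paper also proposes a scoring function based on a held-out test set to detect performance regressions on unseen scenarios.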

Comments Preprint

2605.09707 2026-05-12 cs.LG cs.AI

Adaptive Data Harvesting for Efficient Neural Network Learning with Universal Constraints

Siteng Kang, Xinhua Zhang

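Affiliations: University of Illinois Chicago

AI summary: This paper studies the training of neural networks that must satisfy universal constraints over a continuous domain, such as Lyapunov neural networks and physics-informed neural networks; such problems typically lack analytical solutions or impose overly strict constraints. To address this, the authors propose a reinforcement-learning-based adaptive data harvesting method that learns from data and experience to adjust the training samples dynamically, improving both training efficiency and constraint satisfaction. The method is validated on a variety of tasks, showing broad applicability to training scenarios that require adaptive input selection.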

Comments Preprint

2605.09703 2026-05-12 cs.CV

MOTOR-Bench: A Real-world Dataset and Multi-agent Framework for Zero-shot Human Mental State Understanding

Xiaoyu Yuan, Niklas Heikkala, Tiina Törmänen, Hanna Järvenoja, Guoying Zhao, Haoyu Chen

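Affiliations: University of Oulu

AI summary: This paper presents MOTOR-Bench, a real-world dataset and multi-agent framework for zero-shot understanding of human mental states. The dataset contains 1,440 multimodal video clips of collaborative learning scenarios, each annotated by education experts according to self-regulated learning theory, to support structured analysis of complex interpersonal interaction. To address the weakness of existing methods in inferring underlying mental states from observable behavior, the authors propose MOTOR-MAS, a multi-agent framework whose structured coordination mechanism improves prediction of behavioral, cognitive, and emotional labels. Experiments show that it significantly outperforms existing methods on several metrics.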

Comments Accepted by CVPR 2026 workshop AI4RWC

2605.09701 2026-05-12 cs.CV

DriveFuture: Future-Aware Latent World Models for Autonomous Driving

Yufeng Hong, Xiaotian Zhou, Yingyan Li, Xiangpo Zhou, Lin Liu, Yadan Luo, Shaoqing Xu, Lei Yang, Ziying Song

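Affiliations: Beijing Institute of Technology; Institute of Automation, Chinese Academy of Sciences; Beihang University; Beijing Jiaotong University; The University of Queensland; University of Macau; Nanyang Technological University; School of Artificial Intelligence (School of Software), Yanshan University

AI summary: DriveFuture is a future-aware latent world model for autonomous driving. Its core idea is to condition the modeling of the current latent state on the future world state, so that the model explicitly learns planning-oriented foresight. During training, the method predicts and optimizes future latent states and supplies them as explicit conditions to a diffusion-based trajectory planner, achieving leading performance on multiple public benchmarks. The experiments indicate that conditioning current decisions on future states improves the driving system more than merely predicting future states.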

Comments 24 pages, 7 figures

2605.09698 2026-05-12 cs.AI

Ambig-DS: A Benchmark for Task-Framing Ambiguity in Data-Science Agents

Josefa Lia Stoisser, Marc Boubnovski Martell, Sidsel Boldsen, Kaspar Märtens, Robert Kitchen

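Affiliations: Novo Nordisk London, UK

AI summary: As data-science agents move from assistive tools to autonomous systems, silent misframing of the task becomes a key failure mode. This paper presents Ambig-DS, a benchmark that evaluates data-science agents when the prediction target or evaluation objective is ambiguous; it consists of two diagnostic suites built on DSBench and MLE-bench, respectively. The study finds that agents tend to submit answers to the wrong task without flagging the ambiguity, rather than failing with execution errors. Performance improves substantially when agents may ask a clarifying question, but agents struggle to judge when to ask, which shows that current evaluations overlook the ability to recognize task framing.

Abstract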

As data-science agents shift from co-pilots to auto-pilots, silent misframing becomes a critical failure mode. Agents quietly commit to plausible but unintended task framings, producing clean, executable artifacts that hide their incorrect assessment of the task. Existing benchmarks score whether the pipeline runs, ignoring whether the agent recognized the task was underspecified. We introduce Ambig-DS, two diagnostic suites: one for prediction-target ambiguity (Ambig-DS-Target, 51 tasks built on DSBench, a tabular modeling benchmark) and one for evaluation-objective ambiguity (Ambig-DS-Objective, 61 tasks built on MLE-bench, a Kaggle-style ML competition benchmark), constructed so that scoring uses each source benchmark's original evaluator. For every task we pair the original, fully specified version with an ambiguous variant produced by controlled edits; a human-and-LLM verification pipeline confirms each variant admits multiple plausible interpretations with decision-relevant consequences. The suites are analyzed independently and ambiguity lowers performance in both. Across five agents spanning efficient to frontier-class models, we find in our controlled diagnostic setting: (i) failures are silent commitments: wrong-target submissions on Target, wrong-metric or non-committal baseline submissions on Objective, rather than execution errors; (ii) allowing the agent to ask one clarifying question recovers much of the loss under idealized conditions, suggesting missing framing information drives a substantial part of the observed degradation; but (iii) agents cannot reliably tell when to use it: permissive prompts induce over-asking on clear tasks, while conservative prompts induce silent defaulting on ambiguous ones. Recognizing target and objective underspecification, not pipeline execution, is the bottleneck missing from standard DS-agent evaluations.


2605.09696 2026-05-12 cs.LG cs.NE cs.SC

Discovery of Nonlinear Dynamics with Automated Basis Function Generation

Mohammad Amin Basiri, Charles Nicholson

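Affiliations: Data Science and Analytics Institute, University of Oklahoma, Norman, OK, USA; School of Industrial and Systems Engineering, University of Oklahoma, Norman, OK, USA

AI summary: Discovering governing equations from observational data is a fundamental challenge in scientific modeling, especially when the system's underlying mathematical structure is unknown. This paper proposes AutoSINDy, a hybrid framework that combines the exploratory power of symbolic regression with the sparsity promotion of SINDy, using staged automatic generation and filtering of basis functions to improve the accuracy and robustness of model discovery. Experiments show that the method efficiently recovers the true equations even under high noise and significantly outperforms conventional methods.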

Comments 53 pages, 17 figures. Code available at https://github.com/mabasiri95/AutoSINDy

2605.09693 2026-05-12 cs.CV cs.AI cs.LG

Do multimodal models imagine electric sheep?

Santhosh Kumar Ramakrishnan, Carl Vondrick, Raja Giryes, Philipp Krähenbühl, Vladlen Koltun

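Affiliations: Apple

AI summary: This study examines whether multimodal models form mental imagery when solving spatial puzzles. It reports that large multimodal models do form an imagination-like process when solving tasks such as jigsaw and block puzzles, and even "imagine" the image of a sheep when solving sheep-related puzzles. The authors fine-tune a Qwen3.5 vision-language model to perform a range of visual reasoning tasks and find that the model spontaneously forms visual representations of intermediate states as it executes actions. Building on this finding, they propose two methods to strengthen and exploit the model's internal visual representations, which significantly improve task accuracy.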

2605.09691 2026-05-12 cs.LG

Quantum Circuit Simulation of Compartmental Drug Dynamics: Leveraging Variational Algorithms for Nonlinear Mixed-Effects Population Pharmacokinetics

Isshaan Singh, Nandan Patel

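Affiliations: School of Computer Science and Engineering; Vellore Institute of Technology

AI summary: This paper recasts classical pharmacokinetic/pharmacodynamic (PK/PD) models as open quantum systems and simulates them with quantum circuits to improve the statistical performance of population pharmacokinetic modeling. Twelve qubits encode four pharmacological compartments, and controlled quantum operations simulate the stochastic transfers between compartments. Experiments show that the quantum approach achieves a better log-likelihood than the classical approach while keeping parameter estimates consistent, supporting the model's statistical fit and numerical stability and offering a new hybrid quantum-classical modeling approach for biomedicine.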

2605.09688 2026-05-12 cs.CV

ConFixGS: Learning to Fix Feedforward 3D Gaussian Splatting with Confidence-Aware Diffusion Priors in Driving Scenes

Rui Song, Tianhui Cai, Markus Gross, Xingcheng Zhou, Zewei Zhou, Zhiyu Huang, Olaf Wysocki, Jiaqi Ma

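Affiliations: University of California, Los Angeles; University of Cambridge; Technical University of Munich

AI summary: This paper proposes ConFixGS, a method for repairing feedforward 3D Gaussian Splatting (3DGS) reconstructions of driving scenes. It uses confidence-aware diffusion priors, generating local pseudo-targets and checking them by reprojection against support views, to make reconstructed detail more reliable and to suppress inconsistent information. Experiments show that ConFixGS significantly improves novel view synthesis on multiple datasets, raising PSNR by up to 3.68 dB and cutting FID nearly in half, demonstrating robust reconstruction in driving scenes.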

Comments 28 pages, 12 figures

2605.09687 2026-05-12 cs.CV

Spatial-Frequency Gated Swin Transformer for Remote Sensing Single-Image Super-Resolution

Md Aminur Hossain, Parekh Valkesh, Ayush V. Patel, Yogesh Jethani, Sanjay K. Singh, Biplab Banerjee

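Affiliations: Space Applications Centre, ISRO, Ahmedabad, India; Centre of Studies in Resources Engineering, Indian Institute of Technology Bombay, India; New L J Institute of Engineering and Technology, Ahmedabad, India; Pandit Deendayal Energy University, Gandhinagar, India; GLS University, Ahmedabad, India

AI summary: This paper addresses single-image super-resolution for remote sensing, which aims to reconstruct a high-resolution image from a low-resolution observation while preserving fine spatial structure. To overcome the limited detail reconstruction of existing Swin Transformer models, the authors propose the Spatial-Frequency Gated Swin Transformer (SFG-SwinSR), which adds a spatial-frequency gating module to the feed-forward network to separate low-frequency structural content from high-frequency residual detail. Experiments show better PSNR and SSIM on multiple remote sensing datasets and sharper detail in the reconstructed high-resolution images.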

Comments 15 pages

2605.09685 2026-05-12 cs.LG cs.AI

Learning Unified Representations of Normalcy for Time Series Anomaly Detection

Prithul Sarker, Sushmita Sarker, Nicholas G. Murray, Alireza Tavakkoli

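Affiliations: University of Nevada, Reno

AI summary: This paper studies a core problem in unsupervised time series anomaly detection: how to learn robust representations that characterize the distribution of normal data without prior knowledge of anomaly characteristics. The authors propose a unified unsupervised anomaly detection framework, $\text{U}^2\text{AD}$, which learns the underlying distribution of normal data with a score-based generative model and introduces a time-dependent score network and a unified training objective to capture local and global temporal context together. Experiments show that the method outperforms existing state-of-the-art methods in both detection accuracy and early identification of anomalies.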

2605.09681 2026-05-12 cs.CV

Forcing-KV: Hybrid KV Cache Compression for Efficient Autoregressive Video Diffusion Models

Yicheng Ji, Zhizhou Zhong, Jun Zhang, Qin Yang, XiTai Jin, Ying Qin, Wenhan Luo, Shuiyang Mao, Wei Liu, Huan Li

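Affiliations: ZJU; Video Rebirth; HKUST; BJTU

AI summary: Autoregressive video diffusion models suffer from high attention complexity and large memory overhead caused by redundant key-value (KV) caches. This paper proposes Forcing-KV, a hybrid KV cache compression method. By analyzing the functional roles of attention heads in mainstream models, it divides heads into static heads, which attend to intra-frame detail and inter-chunk transitions, and dynamic heads, which govern inter-frame motion and consistency, and applies structured pruning to the former and dynamic pruning based on segment similarity to the latter. The method preserves generation quality while significantly increasing generation speed and reducing memory use, generating more than 29 frames per second on a single NVIDIA H200 GPU.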

Comments 10 pages

2605.09679 2026-05-12 cs.CV cs.AI

DeepTumorVQA: A Hierarchical 3D CT Benchmark for Stage-Wise Evaluation of Medical VLMs and Tool-Augmented Agents

Yixiong Chen, Wenjie Xiao, Pedro R. A. S. Bassi, Boyan Wang, Liang He, Xinze Zhou, Sezgin Er, Ibrahim Ethem Hamamci, Zongwei Zhou, Alan Yuille

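Affiliations: Johns Hopkins University; University of Bologna; Istanbul Medipol University; Center for Biomolecular Nanotechnologies, Istituto Italiano di Tecnologia; The First Affiliated Hospital, Sun Yat-Sen University; Tongji University

AI summary: DeepTumorVQA is a hierarchical 3D CT benchmark for medical imaging, designed for stage-wise evaluation of medical vision-language models (VLMs) and tool-augmented agents. It decomposes the reasoning involved in tumor diagnosis into four stages (recognition, measurement, visual reasoning, and medical reasoning) so that model performance at each level can be evaluated independently. The study also introduces a tool-interaction environment in which models can call external tools for segmentation, measurement, and medical knowledge, bringing the evaluation closer to real clinical settings. Experiments show that tool augmentation significantly improves performance on complex medical reasoning tasks.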

2605.09678 2026-05-12 cs.AI

Absurd World: A Simple Yet Powerful Method to Absurdify the Real-world for Probing LLM Reasoning Capabilities

Ryan Albright, Golam Md Muktadir, Zarif Ikram, S M Jubaer, Mehrab Hossain, Dianbo Liu

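Affiliations: The Nueva School; University of Southern California; Notre Dame College; Arizona State University; National University of Singapore

AI summary: This paper proposes Absurd World, a benchmark framework for testing the logical reasoning abilities of large language models (LLMs). It decomposes real-world problems into symbols, actions, sequences, and events, and automatically modifies these elements to build scenarios that are logically consistent but absurd, testing whether an LLM can reason without relying on real-world patterns while the underlying task logic stays unchanged. Experiments show that Absurd World is an effective tool for assessing the robustness of LLM logical reasoning.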

2605.09677 2026-05-12 cs.CV

VFM-SDM: A vision foundation model-based framework for training-free, marker-free, and calibration-free structural displacement measurement

Qingyu Xian, Hao Cheng, Berend Jan van der Zwaag, Rolands Kromanis, Ozlem Durmaz Incel

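Affiliations: Pervasive Systems Research Group, Faculty of Electrical Engineering, Mathematics and Computer Science, University of Twente; Department of Earth Observation Science, Faculty of Geo-Information Science and Earth Observation (ITC), University of Twente; Department of Civil Engineering and Management, Faculty of Engineering Technology, University of Twente

AI summary: This paper proposes VFM-SDM, a structural displacement measurement framework based on vision foundation models (VFMs) that performs non-contact, multi-directional measurement without task-specific training, on-site markers, or calibration. It combines VFM-inferred camera parameter estimates with point tracking, reconstructs displacement by triangulation, and introduces structural geometry constraints to make the estimates more physically plausible and consistent. Experimental results show high measurement accuracy and stability in real-world scenes, suggesting a route to automated, scalable structural health monitoring.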

2605.09676 2026-05-12 cs.LG cs.AI nlin.CD

ChaosNetBench: Benchmarking Spatio-Temporal Graph Neural Networks on Chaotic Lattice Dynamics

Henok Tenaw Moges, Charalampos Skokos, Deshendran Moodley

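Affiliations: Centre for Artificial Intelligence Research (CAIR); University of Cape Town; Nonlinear Dynamics and Chaos Group; Department of Mathematics and Applied Mathematics

AI summary: This paper presents ChaosNetBench (CNB), a synthetic benchmark dataset and evaluation framework for assessing spatio-temporal graph neural networks (STGNNs) under controlled, multi-dimensional chaotic dynamics. CNB is built on lattices of coupled standard maps and allows local chaos strength, coupling strength, and system size to be tuned independently, providing 96 system instances and 9,600 trajectories with known topology and dynamics. The study introduces chaos indicators and an evaluation protocol and, by comparing 13 architectures, shows the advantage of STGNNs over non-graph models across different levels of local and global chaos.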

Comments 24 pages, 11 figures

2605.09675 2026-05-12 cs.AI cs.MA

CodeClinic: Evaluating Automation of Coding Skills for Clinical Reasoning Agents

Timothy Ossowski, Xinchi Liu, Danyal Maqbool, Vaibhav Dhanuka, Sheng Zhang, Hoifung Poon, Majid Afshar, Tyler Bradshaw, Junjie Hu

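Affiliations: University of Wisconsin–Madison; Microsoft Research

AI summary: This paper presents CodeClinic, a benchmark built on MIMIC-IV for evaluating whether large language models can automatically generate and compose reusable clinical skills for clinical reasoning tasks, rather than relying on a fixed tool library. The benchmark includes two complementary tasks, long-horizon ICU monitoring and compositional information retrieval, which assess structured decision-making and multi-step reasoning, respectively. The study also proposes an offline autoformalization pipeline that iteratively refines natural-language clinical guidelines into a verifiable library of Python skills, significantly improving reasoning consistency and reducing per-query computational cost.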

2605.09672 2026-05-12 cs.RO

MVB-Grasp: Minimum-Volume-Box Filtering of Diffusion-based Grasps for Frontal Manipulation

Bibek Poudel, Abdul Basit, Muhammad Shafique

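Affiliations: Unitree; Intel

AI summary: This paper addresses frontal grasping with low-cost manipulators in constrained workspaces and proposes MVB-Grasp, a grasp filtering method based on the Minimum Volume Bounding Box (MVBB) that raises grasp success rates. The method introduces a geometric prior, uses the face normals of an oriented bounding box for fast filtering, and re-scores grasp candidates by blending learned discriminator scores with face-alignment geometry. In experiments with the Unitree Z1 arm, MVB-Grasp achieves a success rate 2.4 times that of the vanilla GraspGen baseline, supporting its effectiveness for grasping in constrained spaces.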

Comments 8 pages, 12 figures, accepted to IJCNN 2026

Abstract

State-of-the-art 6-DoF grasp generators excel on tabletop benchmarks with overhead cameras but struggle in frontal grasping scenarios on low-cost manipulators with constrained workspaces, where kinematic limits and approach-direction constraints cause high failure rates. We address this challenge for the Unitree Z1 arm by proposing MVB-Grasp, a novel grasping stack that injects a Minimum Volume Bounding Box (MVBB) geometric prior into diffusion-based grasp generation to dramatically improve success rates in frontal, workspace-constrained settings. Our key scientific contributions are threefold: (i) an MVBB-based geometric filter that exploits oriented bounding-box face normals to reject grasps approaching through the table or misaligned with accessible object faces in O(N) time; (ii) a combined re-scoring function that blends learned discriminator scores with face-alignment geometry α=0.85, specifically calibrated for the Z1's frontal workspace and kinematic constraints; and (iii) a systematic MuJoCo evaluation protocol measuring grasp success across object types, distances, lateral positions, and pitch orientations to validate embodiment-specific performance. We implement MVB-Grasp on a Unitree Z1 arm with an Intel RealSense D405 camera, integrating YOLOv8 object detection, GraspGen for candidate generation, Principal Component Analysis (PCA)-based MVBB fitting, and inverse-kinematics trajectory planning. Experiments across 81 MuJoCo episodes (cylinder, asymmetric box, waterbottle) demonstrate that MVB-Grasp achieves 59.3% success versus 24.7% for vanilla GraspGen, a 2.4x improvement, by filtering geometrically infeasible candidates and prioritizing face-aligned grasps suited to the Z1's frontal approach constraints. Real-world trials confirm that the MVBB prior substantially improves grasp reliability on constrained, low-cost manipulators without requiring model retraining.
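The filter in contribution (i) and the re-scoring in contribution (ii) can be illustrated with a minimal Python sketch. The abstract does not give the formulas, so everything here is an assumption made for illustration: the convex blend, the choice that α = 0.85 weights the discriminator score, the alignment measure (cosine between the approach direction and the negated outward face normal), and all function names. It is not the authors' implementation.

```python
import math

ALPHA = 0.85  # value quoted in the abstract; which term it weights is assumed


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(v):
    n = math.sqrt(dot(v, v))
    return tuple(x / n for x in v)


def face_alignment(approach, face_normals):
    """Best cosine between the approach direction and any inward face normal.

    A gripper approaching an accessible face head-on moves opposite to that
    face's outward normal, so the comparison uses the negated normals.
    """
    a = normalize(approach)
    return max(dot(a, tuple(-c for c in normalize(n))) for n in face_normals)


def rescore(disc_score, approach, face_normals, alpha=ALPHA):
    """Hypothetical convex blend of discriminator score and face alignment."""
    align = max(0.0, face_alignment(approach, face_normals))
    return alpha * disc_score + (1.0 - alpha) * align


def filter_and_rank(grasps, face_normals, min_align=0.0):
    """One pass over N grasps: reject misaligned ones, re-score the rest."""
    kept = []
    for disc_score, approach in grasps:
        if face_alignment(approach, face_normals) > min_align:
            kept.append((rescore(disc_score, approach, face_normals), approach))
    return sorted(kept, reverse=True)


# Accessible faces (outward normals): the front face and the top face.
accessible = [(-1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
candidates = [
    (0.9, (0.0, 0.0, 1.0)),  # approaches upward, i.e. through the table
    (0.6, (1.0, 0.0, 0.0)),  # approaches the front face head-on
]
ranked = filter_and_rank(candidates, accessible)
```

In this example the upward-approaching grasp is rejected despite its higher discriminator score, and the frontal grasp is kept with a blended score of 0.85 × 0.6 + 0.15 × 1.0 = 0.66. Because a box has at most six faces, the filter costs a constant amount of work per grasp, which is consistent with the O(N) claim. The reported 2.4x figure follows from 59.3 / 24.7 ≈ 2.40.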


2605.09670 2026-05-12 cs.RO cs.CV

Towards Generative Predictive Display for Vision-Based Teleoperation: A Zero-Shot Benchmark of Off-the-Shelf Video Models

Aws Khalil, Jaerock Kwon

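Affiliations: Department of Electrical and Computer Engineering, University of Michigan - Dearborn

AI summary: This paper studies generative predictive display for vision-based teleoperation, which aims to mitigate communication latency by generating the upcoming visual state. The authors present a zero-shot benchmark, with no task-specific fine-tuning, that evaluates several off-the-shelf generative video models on short-horizon predictive display. The experiments show that no tested model simultaneously meets the requirements for prediction accuracy, inference latency, and error stability, revealing a gap between general-purpose generative video models and predictive display for teleoperation.

Abstract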

Teleoperation systems are fundamentally limited by communication latency, which degrades situational awareness and control performance. Predictive display aims to mitigate this limitation by presenting an estimate of the current visual state rather than delayed observations. While recent advances in generative video models enable high-quality video synthesis, their suitability for latency-sensitive predictive display remains unclear. This paper presents a zero-shot benchmark of off-the-shelf generative video models for short-horizon predictive display, without task-specific fine-tuning. We formulate the problem as rollout-based future frame prediction and develop a unified benchmarking pipeline using simulated driving data from the CARLA simulator. Five publicly released video models spanning transformer-based and diffusion-based families are evaluated across two resolutions and two conditioning regimes (multi-frame and single-frame). Performance is assessed using prediction accuracy (mean absolute difference), per-rollout latency, peak GPU memory usage, and temporal error evolution across the prediction horizon. On this zero-shot benchmark, no tested model simultaneously achieves low rollout error, non-divergent per-step error behavior, and real-time inference at the source frame rate. Increasing model scale or resolution yields limited and, in some cases, inverted improvements. These findings highlight a gap between general-purpose generative video synthesis and the requirements of predictive display in teleoperation, suggesting that practical deployment will require either explicit short-horizon temporal supervision, in-domain adaptation, or aggressive inference optimization rather than direct application of off-the-shelf models. Code, configurations, and qualitative results are released on the project page: https://bimilab.github.io/paper-GenPD
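The three criteria named in the abstract (rollout error, non-divergent per-step error, and real-time inference at the source frame rate) can be made concrete with a short Python sketch. The abstract states only that accuracy is measured by mean absolute difference, so the flat-list frame representation, the divergence test, the real-time test, and all function names are assumptions for illustration, not the benchmark's actual pipeline.

```python
def mean_abs_diff(pred, target):
    """Mean absolute difference between two frames given as flat pixel lists."""
    if len(pred) != len(target):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def rollout_errors(pred_frames, true_frames):
    """Per-step error across the prediction horizon."""
    return [mean_abs_diff(p, t) for p, t in zip(pred_frames, true_frames)]


def is_non_divergent(errors, tolerance=0.0):
    """Hypothetical test: error never grows by more than `tolerance` per step."""
    return all(b - a <= tolerance for a, b in zip(errors, errors[1:]))


def is_real_time(rollout_latency_s, n_frames, source_fps):
    """Hypothetical test: the rollout is produced no slower than it is played."""
    return rollout_latency_s <= n_frames / source_fps


# Two predicted frames of two pixels each, compared with ground truth.
errors = rollout_errors([[0, 0], [0, 0]], [[1, 1], [2, 4]])
```

Here the per-step errors are 1.0 and 3.0, so this toy rollout would count as divergent under a zero tolerance. A ten-frame rollout at 10 frames per second would pass the real-time test only if it is generated within one second.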


2605.09667 2026-05-12 cs.CV cs.AI

S2P-Net: A Spectral-Spatial Polar Network for Rotation-Invariant Object Recognition in Low-Data Regimes

Albert Heruth

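Affiliations: Unaffiliated Researcher

AI summary: This paper proposes S2P-Net, a compact deep network architecture for rotation-invariant object recognition in low-data regimes, which guarantees mathematical rotation invariance without data augmentation. The network combines spectral and spatial information and uses a polar coordinate transform to strengthen its robustness to rotation. Compared with conventional convolutional neural networks, S2P-Net achieves better recognition performance in few-sample settings.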

Comments 9 pages, 4 figures, 3 tables. Preprint. Code available from the author upon request

2605.09666 2026-05-12 cs.CV cs.AI

Rethinking Evaluation of Multiple Sclerosis (MS) Lesion Segmentation Models

Abdul Basit, Ashir Rashid, Muhammad Abdullah Hanif, Muhammad Shafique

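Affiliations: eBRAIN Lab, Division of Engineering, New York University (NYU) Abu Dhabi

AI summary: This paper examines shortcomings in how multiple sclerosis (MS) lesion segmentation models are evaluated. Most current evaluations rely on the Dice score, which does not adequately capture lesion-level detection and segmentation performance or model behavior on cases that are complex or hard for human annotators to judge. The authors analyze in detail the features neurologists look for in brain MRI scans, propose evaluation metrics that better match clinical needs, and analyze existing state-of-the-art models on two open-source datasets to assess their suitability for real clinical use.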

Comments 8 pages, 5 figures, Accepted to IJCNN 2026

2605.09665 2026-05-12 cs.LG cs.AI cs.CL

Learning Multi-Indicator Weights for Data Selection: A Joint Task-Model Adaptation Framework with Efficient Proxies

Jingze Song, Zihao Chen, Wenqing Chen, Zibin Zheng

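Affiliations: School of Software Engineering, Sun Yat-sen University

AI summary: This paper studies how to select training data efficiently for instruction tuning and proposes a joint task-model adaptation framework that learns multi-indicator weights for data selection. The method uses in-context learning signals on a small validation set to determine the optimal weight configuration without large-scale fine-tuning, enabling efficient, high-fidelity data evaluation. Experiments show results comparable to or better than full-data fine-tuning across multiple benchmarks and model families, and reveal a trade-off between semantic diversity and logical complexity in reasoning tasks.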

Comments This work has been accepted at IJCAI 2026

2605.09663 2026-05-12 cs.LG cs.AI

Causal Parametric Drift Simulation: A Digital Twin Framework for Classifier Robustness Evaluation

Julien Lafrance, Richard Khoury, Véronique Tremblay

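Affiliations: Laval University

AI summary: In dynamic environments, machine learning classifiers often degrade because of concept drift, and conventional evaluation methods struggle to reflect the causal dependencies in the data-generating process. This paper proposes Causal Parametric Drift Simulation, a digital twin framework based on structural causal models that uses precise causal interventions to expose a classifier's potential vulnerabilities before deployment. Experiments show that the method uncovers hidden problems that standard statistical monitoring misses.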

Comments 34 pages, 13 figures, 14 tables

2605.09662 2026-05-12 cs.CV

BEA-GS: BEyond RAdiance Supervision in 3DGS for Precise Object Extraction

Alessio Mazzucchelli, Maria Naranjo-Almeida, Jorge Bustos-Sanchez, Mariella Dimiccoli, Francesc Moreno-Noguer, Jordi Sanchez-Riera, Adrian Penate-Sanchez

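Affiliations: Arquimea Research Center; Institut de Robòtica i Informàtica Industrial (CSIC-UPC); Universidad de las Palmas de Gran Canaria (IUSIANI)

AI summary: This paper proposes BEA-GS, a new Gaussian Splatting method that aims at more precise object extraction by going beyond radiance supervision. It introduces two new loss functions that optimize the geometry of visible and non-visible Gaussians, respectively, to align them more accurately with semantic boundaries. Experiments show state-of-the-art boundary segmentation on multiple datasets, significantly improving the precision of object-level editing and asset extraction.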

Comments CVPR 2026 Highlight

2605.09661 2026-05-12 cs.CL cs.AI

MedMeta: A Benchmark for LLMs in Synthesizing Meta-Analysis Conclusion from Medical Studies

Huy Hoang Ha, Benoit Favre, Francois Portet

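Affiliations: GitHub

AI summary: This paper presents MedMeta, a new benchmark for evaluating the ability of large language models (LLMs) to synthesize meta-analysis conclusions from the abstracts of medical studies. It contains 81 meta-analyses from PubMed and evaluates models with two pipelines: retrieval-augmented generation over the ground-truth abstracts (Golden-RAG) and a parametric setting that relies on internal knowledge only. The study finds that Golden-RAG significantly outperforms the parametric setting, that domain fine-tuning brings limited benefit, and that current models handle negative evidence poorly, underscoring both the importance of RAG systems for clinical use and the limits of existing models.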

2605.09659 2026-05-12 cs.RO

ASACK : Adaptive Safe Active Continual Koopman Learning for Uncertain Systems with Contractive Guarantees

Chandan Kumar Sah, Rajpal Singh, Jishnu Keshavan

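Affiliations: Department of Mechanical Engineering, Indian Institute of Science

AI summary: This paper proposes ASACK, an adaptive safe active continual Koopman learning framework for safe control of uncertain systems under model uncertainty and distribution shift. An autoencoder-based Koopman model is learned offline and corrected online through a contractive adaptation law, which provides theoretical convergence guarantees under distribution shift and model uncertainty. To improve data efficiency, the method adds an active learning strategy that steers the system to collect informative data while completing task objectives, combines the active learning objectives and safety constraints in a nonconvex optimization problem, and obtains formal safety guarantees through a robust MPC framework. Experiments show better performance than state-of-the-art baselines.

Abstract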

Koopman operator theory provides a powerful framework for representing nonlinear dynamics through a linear operator acting on lifted observables, enabling the use of linear control techniques for nonlinear systems. However, Koopman models are typically learned from data and often degrade in performance under model uncertainty and distributional shifts between training and deployment. Although several works have explored online adaptation to address this issue, many rely on neural network-based updates that introduce significant computational overhead and lack formal safety guarantees, limiting their suitability for real-time and safety-critical robotic applications. In this work, we propose a unified framework for continual adaptive Koopman learning that enables safe and efficient online refinement of learned models during task execution. An autoencoder-based Koopman model is first learned offline and subsequently refined online through a contractive adaptation law, which provides theoretical convergence guarantees under distributional shifts and model uncertainty. To improve data efficiency and accelerate model refinement, the adaptation mechanism is integrated with an active learning strategy that drives the system to collect informative data while accomplishing task objectives. The resulting control problem is formulated as a nonconvex optimization problem incorporating both active learning objectives and safety constraints. We further derive theoretical bounds on model approximation error and show how these bounds can be incorporated within a robust Model Predictive Control (MPC) framework to provide formal safety guarantees. The proposed approach unifies learning, excitation, and safety within a single control framework without sacrificing real-time feasibility. Extensive simulation and experimental studies demonstrate superior performance compared to state-of-the-art baselines.


2605.09656 2026-05-12 cs.RO

ORICF -- Open Robotics Inference and Control Framework

Andrés Meseguer Valenzuela, Luís Miguel Bartolín Arnau

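Affiliations: Instituto Tecnológico de Informática (ITI)

AI summary: This paper presents ORICF, an open robotics inference and control framework that targets the high computational cost, latency, and energy consumption of current AI in robotic applications. The framework is modular, declarative, and model-agnostic, and lets users change models, hardware, and data channels through lightweight YAML configuration without modifying code. Experiments on a mobile robot that combine speech recognition, a large language model, and an object detection model show that deploying ORICF with edge computing can significantly reduce the robot's onboard compute load and energy use while preserving modularity and reproducibility.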

Comments Accepted in ICRA26 Workshop: 8th International Workshop on Robotics Software Engineering (RoSE 26)

2605.09650 2026-05-12 cs.AI cs.LG

Workspace Optimization: How to Train Your Agent

Elad Sarafian, Gal Kaplun, Ron Banner, Daniel Soudry, Boris Ginsburg

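Affiliations: NVIDIA

AI summary: This paper studies how optimizing an agent's "workspace" can improve its performance on complex multi-turn tasks. The authors argue that when the weights of frontier language models are hard to adjust, training should instead act on a structured external workspace, a process they call "workspace optimization". They design DreamTeam, a system in which multiple agents collaborate to build an executable world model, and on ARC-AGI-3 it achieves a higher task solve rate than the previous best method while using fewer environment actions.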

2605.09649 2026-05-12 cs.LG

Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction

Ngoc Bui, Hieu Trung Nguyen, Arman Cohan, Rex Ying

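Affiliations: Department of Computer Science, Yale University; The Chinese University of Hong Kong

AI summary: This paper studies how better key-value (KV) cache management can improve long-context inference. The authors propose a KV cache eviction method based on a global retention mechanism: it learns the future usefulness of each token and evicts selectively under a unified memory budget, reducing memory consumption while improving generation quality. Experiments show that, on multiple long-context language and vision-language reasoning tasks, the method reduces KV memory use while matching or exceeding full-cache inference.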
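Eviction under a single shared budget can be illustrated with a minimal Python sketch. The summary does not describe how usefulness scores are learned or how the budget is enforced, so the per-layer score lists, the plain top-k selection, and the function name `evict_global` are assumptions for illustration rather than the paper's method.

```python
import heapq


def evict_global(scores, budget):
    """Keep the `budget` highest-scoring KV entries across all layers.

    `scores` maps a layer index to a list of per-token usefulness scores
    (assumed to come from a learned predictor that is not shown here).
    Returns a map from layer index to the sorted token positions to keep.
    """
    entries = [
        (score, layer, pos)
        for layer, row in scores.items()
        for pos, score in enumerate(row)
    ]
    kept = heapq.nlargest(budget, entries)
    result = {layer: [] for layer in scores}
    for _, layer, pos in kept:
        result[layer].append(pos)
    return {layer: sorted(positions) for layer, positions in result.items()}


scores = {0: [0.9, 0.1, 0.5], 1: [0.2, 0.8, 0.3]}
kept = evict_global(scores, budget=3)
```

With a budget of three entries, layer 0 keeps positions 0 and 2 and layer 1 keeps position 1. The point of a shared budget is that layers compete for slots, so a layer with many useful tokens can retain more than a fixed per-layer quota would allow.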

Comments A learnable KV eviction method for large language models

2605.09640 2026-05-12 cs.CV cs.LG

Overcoming Catastrophic Forgetting in Visual Continual Learning with Reinforcement Fine-Tuning

Meng Lou, Hanzhong Guo, Linwei Chen, Yizhou Yu

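Affiliations: The University of Hong Kong; The Hong Kong University of Science and Technology; Hong Kong Generative AI Research and Development Center

AI summary: This paper studies how to overcome catastrophic forgetting in visual continual learning and proposes RaPO, a new method based on reinforcement fine-tuning. The authors find that existing methods such as GRPO still forget substantially in class-incremental and domain-incremental learning, and trace the cause to trajectory-level policy drift. RaPO mitigates this forgetting with a retention reward and cross-task advantage normalization. Experiments show strong performance across multiple continual learning settings, and the work offers a systematic exploration of reinforcement fine-tuning for visual continual learning.

Abstract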

Recent studies suggest that Reinforcement Fine-Tuning (RFT) is inherently more resilient to catastrophic forgetting than Supervised Fine-Tuning (SFT). However, whether RFT (e.g., GRPO) can effectively overcome forgetting in challenging visual continual learning settings, such as class-incremental learning (CIL) and domain-incremental learning (DIL), remains an open problem. Through a pilot study, we confirm that while RFT consistently outperforms SFT, it still suffers from non-negligible forgetting. We empirically trace this bottleneck to Trajectory-level Drift Agnosticism: among candidate rollouts achieving identical task rewards, the KL divergence from the preceding-task policy varies substantially, which strongly correlates with catastrophic forgetting across sequential tasks. Motivated by this insight, we propose Retention-aware Policy Optimization (RaPO), a simple yet effective RFT method that explicitly mitigates forgetting through trajectory-level reward shaping. Specifically, RaPO comprises two core components: (1) Retention Reward that converts trajectory-level distribution drift into a continuous reward signal, preferentially reinforcing knowledge-preserving rollouts within each group; (2) Cross-Task Advantage Normalization (CTAN), which maintains a persistent exponential moving average of reward statistics across task boundaries to stabilize the optimization progress during continual learning. Leveraging the free-form textual generalization of MLLMs, we comprehensively evaluate RaPO across five visual continual learning settings. Extensive experiments demonstrate that RaPO achieves leading performance, substantially reducing catastrophic forgetting while preserving strong plasticity. To the best of our knowledge, this work represents the first systematic exploration of RFT in visual continual learning, offering insights that we hope will inspire future research.
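The two components described above can be sketched in a few lines of Python. The abstract gives no formulas, so the exponential retention bonus, the weight `beta`, the decay value, and the class and function names are all assumptions chosen for illustration; the sketch only shows the general shape of drift-based reward shaping and of advantage normalization with a persistent exponential moving average.

```python
import math


def shaped_rewards(task_rewards, kl_to_prev_policy, beta=0.5):
    """Add a retention bonus that shrinks as drift from the previous-task policy grows.

    The exp(-KL) form and `beta` are hypothetical choices.
    """
    return [r + beta * math.exp(-kl) for r, kl in zip(task_rewards, kl_to_prev_policy)]


class EmaNormalizer:
    """EMA of reward mean and variance that is not reset at task boundaries."""

    def __init__(self, decay=0.9, eps=1e-8):
        self.decay = decay
        self.eps = eps
        self.mean = None
        self.var = None

    def update(self, rewards):
        batch_mean = sum(rewards) / len(rewards)
        batch_var = sum((r - batch_mean) ** 2 for r in rewards) / len(rewards)
        if self.mean is None:  # the first group initializes the statistics
            self.mean, self.var = batch_mean, batch_var
        else:
            d = self.decay
            self.mean = d * self.mean + (1 - d) * batch_mean
            self.var = d * self.var + (1 - d) * batch_var

    def advantages(self, rewards):
        std = math.sqrt(self.var) + self.eps
        return [(r - self.mean) / std for r in rewards]


# Two rollouts with identical task reward but different drift.
rewards = shaped_rewards([1.0, 1.0], kl_to_prev_policy=[0.0, 2.0])
normalizer = EmaNormalizer()
normalizer.update(rewards)
advantages = normalizer.advantages(rewards)
```

In this toy group the two rollouts earn the same task reward, yet the low-drift rollout receives a positive advantage and the high-drift one a negative advantage, which is the preference for knowledge-preserving rollouts that the abstract describes.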
