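arXivDaily: a daily academic digest that mirrors the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.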

2411.04077 2026-05-12 cs.CV

H-POPE: Hierarchical Polling-based Probing Evaluation of Hallucinations in Large Vision-Language Models

Nhi Pham, Michael Schott

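Affiliations: Max Planck Institute for Informatics; Saarland University; Zuse School

AI summary: This paper proposes H-POPE, a hierarchical polling-based probing benchmark for systematically evaluating hallucinations in large vision-language models at both the object-existence and the attribute level. Probing proceeds coarse-to-fine, from whether an object is present to its finer-grained attributes, and reveals that models hallucinate more readily on fine-grained attributes. The study further examines whether models actually rely on the visual input when generating text, offering a new perspective on how vision-language models produce their outputs.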

Comments Poster at https://sites.google.com/berkeley.edu/bb-stat/home
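The coarse-to-fine probing idea can be sketched as below. This is a minimal illustration, not the benchmark's code: the annotation format (object name mapped to its attributes), the question templates, the one-negative-per-positive random sampling, and the function names are all assumptions. The metrics are the accuracy, precision, recall, F1, and yes-ratio commonly reported for POPE-style yes/no probing.

```python
import random

def build_probes(annotations, object_vocab, attribute_vocab, seed=0):
    """Build hierarchical yes/no probes for one image (hypothetical format).

    annotations: dict mapping each object present in the image to its attributes.
    Returns a list of (question, ground_truth) pairs.
    """
    rng = random.Random(seed)
    probes = []
    absent = [o for o in object_vocab if o not in annotations]
    for obj, attrs in annotations.items():
        # Coarse level: object existence, paired with one sampled absent object.
        probes.append((f"Is there a {obj} in the image?", "yes"))
        if absent:
            probes.append((f"Is there a {rng.choice(absent)} in the image?", "no"))
        # Fine level: attributes of an object that is actually present.
        wrong = [a for a in attribute_vocab if a not in attrs]
        for attr in attrs:
            probes.append((f"Is the {obj} {attr}?", "yes"))
            if wrong:
                probes.append((f"Is the {obj} {rng.choice(wrong)}?", "no"))
    return probes

def score(answers, truths):
    """Treat 'yes' as the positive class and compute POPE-style metrics."""
    pairs = list(zip(answers, truths))
    tp = sum(a == "yes" and t == "yes" for a, t in pairs)
    fp = sum(a == "yes" and t == "no" for a, t in pairs)
    fn = sum(a == "no" and t == "yes" for a, t in pairs)
    tn = sum(a == "no" and t == "no" for a, t in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "yes_ratio": (tp + fp) / len(pairs),
    }
```

Scoring existence probes and attribute probes separately is what allows hallucination rates to be compared across levels of the hierarchy.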

2410.10247 2026-05-12 cs.CV cs.AI

LPT: Less-overfitting Prompt Tuning for Vision-Language Model

Chenhao Ding, Xinyuan Gao, Songlin Dong, Jizhou Han, Qiang Wang, Zhengdong Zhou, Yuhang He, Yihong Gong

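Affiliations: IEEE

AI summary: Targeting the overfitting that vision-language models are prone to during transfer, this work proposes LPT, a lightweight prompt-tuning framework. Its core components use CLIP to filter out fine-grained foreground information so that prompt generation is guided toward basic visual concepts, and add a feature-level structure-preserving constraint and an output-level hierarchical logit constraint to strengthen generalization. Experiments show that LPT substantially improves generalization across multiple benchmarks and effectively mitigates overfitting.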

2408.17366 2026-05-12 cs.LG cs.AI

Leveraging Graph Neural Networks to Forecast Electricity Consumption

Eloi Campagne, Yvenn Amara-Ouali, Yannig Goude, Argyris Kalogeratos

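Affiliations: Centre Borelli, Université Paris-Saclay, CNRS, Ecole Normale Supérieure Paris-Saclay; Laboratoire de Mathématiques d'Orsay (LMO), Université Paris-Saclay, CNRS, Faculté des Sciences d'Orsay; EDF R&D, Palaiseau, France

AI summary: This paper studies how graph neural networks can forecast electricity demand in the face of the complexity and uncertainty introduced by renewable integration and decentralized grids. It proposes a graph-based approach that captures the spatial distribution of, and relationships between, nodes in the grid, and applies several models, including graph convolutional networks (GCN) and GraphSAGE, to the forecasting task. The approach extends the traditional generalized additive model (GAM) framework, provides a complete framework for building graphs and evaluating the performance and interpretability of graph models, and is validated on synthetic data and on real regional data from metropolitan France.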

Comments 17 pages, ECML PKDD 2024 Workshop paper
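For readers unfamiliar with the GCN models mentioned above, one graph-convolution step can be sketched as $H' = \sigma(\hat{D}^{-1/2}(A + I)\hat{D}^{-1/2} X W)$. The sketch below is the standard Kipf-Welling propagation rule in pure Python, not the paper's pipeline; treating nodes as grid regions and features as, for example, lagged consumption values is an assumption for illustration.

```python
import math

def gcn_layer(adj, features, weight, activation=lambda v: max(0.0, v)):
    """One graph-convolution step: act(D^-1/2 (A + I) D^-1/2 X W).

    adj: n x n symmetric 0/1 adjacency (e.g. neighbouring regions).
    features: n x f node features X; weight: f x h matrix W.
    """
    n = len(adj)
    # Self-loops let each node keep its own signal during aggregation.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalisation.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    f = len(features[0])
    # Aggregate neighbour features: norm @ X.
    agg = [[sum(norm[i][k] * features[k][c] for k in range(n)) for c in range(f)]
           for i in range(n)]
    # Linear transform and nonlinearity: act(agg @ W).
    h = len(weight[0])
    return [[activation(sum(agg[i][c] * weight[c][j] for c in range(f))) for j in range(h)]
            for i in range(n)]
```

With two connected regions whose features are 2.0 and 4.0 and an identity weight, each output becomes the normalised average 3.0; GraphSAGE differs mainly in concatenating a node's own features with an aggregate of its neighbours'.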

2407.07639 2026-05-12 cs.LG cs.AI

Explaining Graph Neural Networks for Node Similarity on Graphs

Daniel Daza, Cuong Xuan Chu, Trung-Kien Tran, Daria Stepanova, Michael Cochez, Paul Groth

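Affiliations: Vrije Universiteit Amsterdam; Bosch Center for Artificial Intelligence; Abo Akademi University; Elsevier Discovery Lab; University of Amsterdam

AI summary: This paper studies how to make GNN-based node similarity computations explainable, so that similarity search on graph data becomes easier to understand. The authors compare two mainstream explanation methods, mutual-information-based (MI) and gradient-based (GB) explanations, and find that gradient-based explanations have three important advantages: they are actionable, they are consistent, and they can be compressed substantially into sparse explanations without affecting the similarity scores. The work offers a useful empirical analysis and practical guidance for GNN explainability.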

Comments Accepted in Transactions of Machine Learning Research (2026)

2406.19741 2026-05-12 cs.RO cs.AI

ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning

Christopher E. Mower, Yuhui Wan, Hongzhan Yu, Antoine Grosnit, Jonas Gonzalez-Billandon, Matthieu Zimmer, Jinlong Wang, Xinyu Zhang, Yao Zhao, Anbang Zhai, Puze Liu, Daniel Palenicek, Davide Tateo, Cesar Cadena, Marco Hutter, Jan Peters, Guangjian Tian, Yuzheng Zhuang, Kun Shao, Xingyue Quan, Jianye Hao, Jun Wang, Haitham Bou-Ammar

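Affiliations: Huawei Noah's Ark Lab; University of Leeds; Technical University of Darmstadt; East China Normal University; Huawei Technologies; ETH Zurich; University College London

AI summary: This paper proposes ROS-LLM, a framework that combines the Robot Operating System (ROS) with large language models (LLMs) so that non-expert users can program robots intuitively through natural-language instructions. The system automatically extracts behaviors from LLM outputs and executes them as ROS actions, supports several behavior modes, extends the robot's action library through imitation learning, and uses human and environment feedback for LLM reflection. Experiments show that the framework is robust, scalable, and flexible across a range of complex scenarios; it has been open-sourced for use and reproduction.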

Comments This document contains 26 pages and 13 figures

Journal ref Nature Machine Intelligence 8, 313-325 (2026)

2406.12910 2026-05-12 cs.LG cs.AI cs.NE physics.chem-ph q-bio.BM

Human-level molecular optimization driven by mol-gene evolution

Jiebin Fang, Churu Mao, Yuchen Zhu, Xiaoming Chen, Chang-Yu Hsieh, Zhongjun Ma

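Affiliations: Hainan Institute of Zhejiang University; Institute of Marine Biology and Pharmacology, Ocean College, Zhejiang University; College of Pharmaceutical Sciences and Cancer Center, Zhejiang University

AI summary: This work proposes DGMM, a deep genetic molecular modification algorithm that addresses the challenge of balancing structural novelty against pharmacological properties in drug-molecule optimization. A discrete variational autoencoder (D-VAE) encodes molecules into quantized codes called "mol-genes", which lets deep learning be combined with a genetic algorithm to optimize molecular structures the way a medicinal chemist would. The method can discover compounds with similar pharmacological properties but different structures, reveals trade-offs in structural optimization for drug discovery, and proves effective across several applications.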

2405.17642 2026-05-12 cs.LG cs.AI stat.ME

Unifying Perspectives: Plausible Counterfactual Explanations on Global, Group-wise, and Local Levels

Oleksii Furman, Patryk Wielopolski, Łukasz Lenkiewicz, Jerzy Stefanowski, Maciej Zięba

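Affiliations: Wrocław University of Science and Technology; Poznań University of Technology; Tooploox Sp. z o.o.

AI summary: As AI systems grow more complex, the need for explainability becomes more pressing. This paper proposes a unified gradient-based optimization method that generates local, global, and group-wise counterfactual explanations, filling the gap left by existing methods that do not integrate these levels of granularity. By combining instance grouping and counterfactual generation into a single efficient pipeline and introducing a plausibility criterion, it makes group-wise counterfactuals more plausible and useful; experiments show a good balance among validity, proximity, and plausibility.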

2405.12969 2026-05-12 cs.LG

EchoAlign: Bridging Generative and Discriminative Learning under Noisy Labels

Yuxiang Zheng, Zhongyi Han, Yilong Yin

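Affiliations: School of Software, Shandong University, Jinan 250100, China; Sydney AI Centre, The University of Sydney, Sydney, NSW 2050, Australia

AI summary: This paper proposes EchoAlign, a framework that bridges generative and discriminative learning under noisy labels. Instead of correcting the labels directly, it uses a generative model to adjust instance features so that they align with the noisy labels, and then uses feature similarity to select reliable samples, improving robustness. Experiments show that EchoAlign outperforms existing methods on several benchmark datasets, with stronger and more stable performance especially under high noise.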

Comments 27 pages, 7 figures. The article has been accepted by Frontiers of Computer Science (FCS), with the DOI: 10.1007/s11704-026-51604-z
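The similarity-based selection step in the summary can be sketched under one explicit assumption: that a sample is trusted when its label-aligned ("echo") features stay close to its original features, since a large change suggests the noisy label disagrees with the content. The function names, the cosine measure, and the fixed keep ratio are all hypothetical; this is not EchoAlign's actual selection rule.

```python
import math

def cosine(a, b):
    """Cosine similarity; returns 0.0 for a zero vector."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0

def select_reliable(original_feats, aligned_feats, keep_ratio=0.5):
    """Indices of the samples whose label-aligned features remain most similar
    to the original ones (hypothetical selection criterion)."""
    sims = [cosine(o, a) for o, a in zip(original_feats, aligned_feats)]
    k = max(1, int(len(sims) * keep_ratio))
    # Stable sort keeps earlier samples first when similarities tie.
    order = sorted(range(len(sims)), key=lambda i: sims[i], reverse=True)
    return sorted(order[:k])
```

A fixed keep ratio is the simplest choice; a similarity threshold or a per-class ratio would be equally plausible alternatives.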

2404.18923 2026-05-12 cs.CL

Holmes: A Benchmark to Assess the Linguistic Competence of Language Models

Andreas Waldis, Yotam Perlitz, Leshem Choshen, Yufang Hou, Iryna Gurevych

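Affiliations: Ubiquitous Knowledge Processing Lab (UKP Lab); Technical University of Darmstadt; Information Systems Research Lab; Lucerne University of Applied Sciences and Arts; IBM Research AI; MIT CSAIL; MIT-IBM Watson AI Lab; IBM Research Europe - Ireland

AI summary: This paper introduces Holmes, a new benchmark for assessing the linguistic competence of language models, that is, their unconscious understanding of linguistic phenomena. Using classifier-based probing, it analyzes models' internal representations of syntactic, morphological, semantic, and other phenomena, and finds that linguistic competence correlates strongly with model size, while architecture and instruction tuning also substantially affect performance. The authors further propose FlashHolmes, a streamlined and more computationally efficient version that reduces cost while maintaining high precision.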

2404.14442 2026-05-12 cs.LG cs.AI

Toward a Unified Lyapunov-Certified ODE Convergence Analysis of Smooth Q-Learning with p-Norms

Donghwan Lee, Hyunjun Na

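Affiliations: Department of Electrical Engineering

AI summary: This paper studies the convergence of standard Q-learning and its smooth variants and proposes a unified convergence framework based on ordinary differential equations (ODEs). Smooth p-norm Lyapunov functions avoid the non-smoothness of traditional infinity-norm approaches and yield concise, rigorous stability proofs. The framework covers smooth Q-learning algorithms that use the log-sum-exp softmax, Boltzmann softmax, and mellowmax operators, and remains valid even when the Bellman operator is not a contraction, giving it broad applicability.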
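The smooth operators named above replace the hard max in the Bellman backup, and the p-norm plays the same role for the infinity norm. The sketch below gives their standard textbook definitions with max-shifting for numerical stability. It is not code from the paper; the temperature names `beta` and `omega` are conventional choices.

```python
import math

def log_sum_exp(q, beta):
    """(1/beta) * log(sum_i exp(beta * q_i)); an upper bound on max(q)."""
    m = max(q)
    return m + math.log(sum(math.exp(beta * (x - m)) for x in q)) / beta

def boltzmann_softmax(q, beta):
    """Softmax-weighted average of q; a lower bound on max(q)."""
    m = max(q)
    w = [math.exp(beta * (x - m)) for x in q]
    return sum(wi * x for wi, x in zip(w, q)) / sum(w)

def mellowmax(q, omega):
    """(1/omega) * log(mean_i exp(omega * q_i)); lies between mean(q) and max(q)."""
    return log_sum_exp(q, omega) - math.log(len(q)) / omega

def p_norm(x, p):
    """Smooth surrogate for the infinity norm:
    ||x||_inf <= ||x||_p <= n**(1/p) * ||x||_inf."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)
```

As `beta`, `omega`, or `p` grows, each operator approaches the hard max (or the infinity norm), which is why the smooth surrogates can stand in for them in a differentiable Lyapunov argument.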

2312.08413 2026-05-12 cs.LG cs.CR cs.CY

Privacy Constrained Fairness Estimation for Decision Trees

Florian van der Steen, Fré Vink, Heysem Kaya

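Affiliations: Department of Information and Computing Sciences, Utrecht; Responsible AI Team, Dutch Central Government Audit Service

AI summary: As data becomes more valuable, protecting sensitive information and ensuring the fairness of AI models grow increasingly important. This paper studies how to estimate the fairness of decision-tree models under differential-privacy constraints and proposes PAFER, a new method that accurately estimates statistical fairness while preserving privacy. Experiments show that it effectively reduces fairness-estimation error while keeping the model interpretable, and performs better on decision trees that are easier for humans to understand.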

Comments 52 pages, under review in Applied Intelligence journal
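The basic privacy mechanism behind this kind of estimate can be sketched with the Laplace mechanism applied to per-group counts. This is not PAFER itself, which works with decision-tree structure. The sketch assumes binary predictions, public group labels, and an even split of the privacy budget between each group's two counts; the function names are hypothetical.

```python
import random

def laplace(scale, rng):
    # The difference of two i.i.d. exponential draws is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def private_parity_gap(predictions, groups, epsilon, seed=None):
    """Estimate the demographic-parity gap max_g P(yhat=1|g) - min_g P(yhat=1|g)
    from Laplace-noised counts.

    Each individual falls in exactly one group (parallel composition) and affects
    two counts in it (total, positives), so each count gets epsilon / 2.
    Group labels are assumed public.
    """
    rng = random.Random(seed)
    scale = 2.0 / epsilon  # sensitivity 1 per count, budget epsilon / 2 each
    rates = []
    for g in sorted(set(groups)):
        total = sum(1 for gi in groups if gi == g)
        positives = sum(1 for p, gi in zip(predictions, groups) if gi == g and p == 1)
        # Post-processing (clipping) does not weaken the privacy guarantee.
        noisy_total = max(1.0, total + laplace(scale, rng))
        noisy_pos = min(max(0.0, positives + laplace(scale, rng)), noisy_total)
        rates.append(noisy_pos / noisy_total)
    return max(rates) - min(rates)
```

With a very large `epsilon` the estimate collapses to the true gap. Smaller values trade accuracy for privacy, and that trade-off is the estimation error PAFER aims to reduce.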

2311.03600 2026-05-12 cs.RO

Scalable and Efficient Continual Learning from Demonstration via a Hypernetwork-generated Stable Dynamics Model

Sayantan Auddy, Jakob Hollenstein, Matteo Saveriano, Antonio Rodríguez-Sánchez, Justus Piater

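Affiliations: Faculty of Electrical Engineering and Computer Science, Technical University of Berlin; Department of Computer Science, University of Innsbruck; Digital Science Center (DiSC), University of Innsbruck; Department of Industrial Engineering, University of Trento; Singular Research Center on Intelligent Systems (CiTIUS), University of Santiago de Compostela

AI summary: This work proposes a scalable and efficient approach to continual learning from demonstration. A hypernetwork generates a stable dynamics model, improving a robot's ability to learn and retain multiple skills in real-world environments. The hypernetwork generates two networks, a trajectory-learning dynamics model and a trajectory-stabilizing Lyapunov function, which together form a clock-augmented stable neural ODE solver (sNODE); stochastic regularization of the hypernetwork reduces training time complexity. Experiments show superior trajectory accuracy, continual-learning performance, and stability across several complex datasets and real-world tasks.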

Comments To appear in IEEE Transactions on Cognitive and Developmental Systems

DOI: 10.1109/TCDS.2026.3692632

Abstract

Robots capable of learning from demonstration (LfD) must exhibit stability while executing learned motion skills. To be effective in the real world, they should also remember multiple skills over time, a capability lacking in current stable-LfD methods. We propose an approach to stable, continual LfD, and highlight the role of stability in improving continual learning. Our proposed hypernetwork generates the parameters of two neural networks: a trajectory learning dynamics model, and a trajectory-stabilizing Lyapunov function. These generated networks form a clock-augmented stable neural ODE solver (sNODE), a stable dynamics model that offers a superior stability-accuracy trade-off compared to the state-of-the-art. We further propose stochastic hypernetwork regularization with a single, uniformly-sampled task embedding, reducing the cumulative training time for $N$ tasks from $O(N^2)$ to $O(N)$ without degrading performance on real-world tasks. We introduce high-dimensional variants of the popular LASA dataset to assess scalability and extend a dataset of robotic LfD tasks to assess real-world performance. We empirically evaluate our approach on multiple LfD datasets of varying complexity, including sequences of 7-26 tasks, trajectories of 2-32 dimensions, and real-world tasks involving position and orientation. Our thorough evaluation on multiple LfD datasets demonstrates that our approach sequentially learns and retains multiple motion skills without retraining on past demonstrations, and outperforms other relevant baselines in terms of trajectory errors, continual learning scores, and stability metrics. Notably, we show that stability greatly enhances continual learning performance, particularly in size-efficient chunked hypernetworks. Our code is available at https://github.com/sayantanauddy/clfd-snode.
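The $O(N^2)$ versus $O(N)$ claim in the abstract comes down to how many past-task embeddings are regularized at each training step. The sketch below counts only those regularization forward passes. It assumes every task is trained for the same number of steps and each regularized embedding costs one hypernetwork pass; it illustrates the complexity argument and is not the authors' training code.

```python
import random

def regularization_terms(task_index, stochastic, rng):
    """Past-task embeddings regularized at one training step.

    Full regularization visits every previous task; the stochastic variant
    samples a single previous task uniformly at random.
    """
    past = list(range(task_index))
    if not past:
        return []
    return [rng.choice(past)] if stochastic else past

def cumulative_cost(num_tasks, steps_per_task, stochastic, seed=0):
    """Total hypernetwork passes spent on regularization over a task sequence."""
    rng = random.Random(seed)
    cost = 0
    for t in range(num_tasks):
        for _ in range(steps_per_task):
            cost += len(regularization_terms(t, stochastic, rng))
    return cost
```

With one step per task and 26 tasks (the longest sequence mentioned above), full regularization needs 26 * 25 / 2 = 325 passes while the stochastic variant needs 25. The first grows quadratically with the number of tasks and the second linearly.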

2306.03606 2026-05-12 cs.AI

BioBLP: A Modular Framework for Learning on Multimodal Biomedical Knowledge Graphs

Daniel Daza, Dimitrios Alivanistos, Payal Mitra, Thom Pijnenburg, Michael Cochez, Paul Groth

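Affiliations: Vrije Universiteit Amsterdam; University of Amsterdam; Elsevier B.V.; Discovery Lab, Elsevier

AI summary: This paper proposes BioBLP, a modular framework for learning entity embeddings in biomedical knowledge graphs whose entities carry multimodal attributes. It can encode attribute data of different modalities and supports entities with missing attributes. It also introduces an efficient pretraining strategy that significantly improves performance while reducing training time. In drug-protein interaction prediction, BioBLP outperforms baselines that ignore attribute data, particularly for low-degree nodes.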

Journal ref J Biomed Semant 14, 20 (2023)

DOI: 10.1186/s13326-023-00301-y

Abstract

Knowledge graphs (KGs) are an important tool for representing complex relationships between entities in the biomedical domain. Several methods have been proposed for learning embeddings that can be used to predict new links in such graphs. Some methods ignore valuable attribute data associated with entities in biomedical KGs, such as protein sequences, or molecular graphs. Other works incorporate such data, but assume that entities can be represented with the same data modality. This is not always the case for biomedical KGs, where entities exhibit heterogeneous modalities that are central to their representation in the subject domain. We propose a modular framework for learning embeddings in KGs with entity attributes, that allows encoding attribute data of different modalities while also supporting entities with missing attributes. We additionally propose an efficient pretraining strategy for reducing the required training runtime. We train models using a biomedical KG containing approximately 2 million triples, and evaluate the performance of the resulting entity embeddings on the tasks of link prediction, and drug-protein interaction prediction, comparing against methods that do not take attribute data into account. In the standard link prediction evaluation, the proposed method results in competitive, yet lower performance than baselines that do not use attribute data. When evaluated in the task of drug-protein interaction prediction, the method compares favorably with the baselines. We find that in settings involving low-degree entities, which make up a substantial portion of the entities in the KG, our method outperforms the baselines. Our proposed pretraining strategy yields significantly higher performance while reducing the required training runtime. Our implementation is available at https://github.com/elsevier-AI-Lab/BioBLP .
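The modular idea in the abstract is to route each entity to an encoder for its attribute modality and fall back to an ordinary lookup embedding when the attribute is missing. The sketch below shows only that dispatch logic. The class name, the toy encoder, and the zero-initialized fallback table are hypothetical; a real model would use learned encoders (for example, for protein sequences or molecular graphs) and trainable embeddings.

```python
from typing import Callable, Dict, List, Optional

Vector = List[float]


class ModularEntityEncoder:
    """Hypothetical sketch: route an entity to the encoder for its attribute
    modality, or to a per-entity lookup embedding when the attribute is missing."""

    def __init__(self, encoders: Dict[str, Callable[[str], Vector]], dim: int):
        self.encoders = encoders  # modality name -> attribute encoder
        self.dim = dim
        # Fallback table for attribute-less entities; learned in practice, zeros here.
        self.lookup: Dict[str, Vector] = {}

    def embed(self, entity: str, modality: Optional[str] = None,
              attribute: Optional[str] = None) -> Vector:
        if modality in self.encoders and attribute is not None:
            vec = self.encoders[modality](attribute)
        else:
            vec = self.lookup.setdefault(entity, [0.0] * self.dim)
        if len(vec) != self.dim:
            raise ValueError("all encoders must map into the shared embedding space")
        return vec
```

Because every encoder maps into the same space, the resulting vectors can be fed to any triple-scoring function used for link prediction.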

2010.03496 2026-05-12 cs.CL cs.AI cs.LG

Inductive Entity Representations from Text via Link Prediction

Daniel Daza, Michael Cochez, Paul Groth

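Affiliations: Vrije Universiteit Amsterdam; University of Amsterdam; Discovery Lab, Elsevier

AI summary: This work investigates how to learn inductive entity representations from the textual descriptions of entities in a knowledge graph through a link-prediction objective, and evaluates how well these representations generalize to other tasks. It proposes an architecture based on a pretrained language model that handles entities unseen during training and achieves significant gains in link prediction, entity classification, and information retrieval. Experiments show that the learned representations transfer across tasks without fine-tuning and generalize better than those of existing methods.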

Comments The Web Conference 2021

DOI: 10.1145/3442381.3450141

Abstract

Knowledge Graphs (KG) are of vital importance for multiple applications on the web, including information retrieval, recommender systems, and metadata annotation. Regardless of whether they are built manually by domain experts or with automatic pipelines, KGs are often incomplete. Recent work has begun to explore the use of textual descriptions available in knowledge graphs to learn vector representations of entities in order to perform link prediction. However, the extent to which these representations learned for link prediction generalize to other tasks is unclear. This is important given the cost of learning such representations. Ideally, we would prefer representations that do not need to be trained again when transferring to a different task, while retaining reasonable performance. In this work, we propose a holistic evaluation protocol for entity representations learned via a link prediction objective. We consider the inductive link prediction and entity classification tasks, which involve entities not seen during training. We also consider an information retrieval task for entity-oriented search. We evaluate an architecture based on a pretrained language model, that exhibits strong generalization to entities not observed during training, and outperforms related state-of-the-art methods (22% MRR improvement in link prediction on average). We further provide evidence that the learned representations transfer well to other tasks without fine-tuning. In the entity classification task we obtain an average improvement of 16% in accuracy compared with baselines that also employ pre-trained models. In the information retrieval task, we obtain significant improvements of up to 8.8% in NDCG@10 for natural language queries. We thus show that the learned representations are not limited to KG-specific tasks, and have greater generalization properties than evaluated in previous work.
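The abstract reports results as MRR (link prediction) and NDCG@10 (retrieval). The sketch below uses the standard definitions of both metrics with binary relevance and a log2 discount; it is not the paper's evaluation code and ignores details such as filtered ranking.

```python
import math

def mean_reciprocal_rank(ranks):
    """ranks: 1-based rank of the correct entity for each test query."""
    return sum(1.0 / r for r in ranks) / len(ranks)

def ndcg_at_k(ranked_ids, relevant, k=10):
    """Binary-relevance NDCG@k for one query.

    ranked_ids: retrieved entity ids in rank order; relevant: set of relevant ids.
    """
    dcg = sum(1.0 / math.log2(i + 2)
              for i, e in enumerate(ranked_ids[:k]) if e in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```

For example, correct answers ranked 1, 2, and 4 give an MRR of (1 + 0.5 + 0.25) / 3, roughly 0.583. A single relevant document at rank 2 gives an NDCG@10 of 1 / log2(3), roughly 0.631.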

2605.10111 2026-05-12 cs.LG cs.AI cs.CV

CFSPMNet: Cross-subject Fourier-guided Spatial-Patch Mamba Network for EEG Motor Imagery Decoding in Stroke Patients

Xiangkai Wang, Yun Zhao, Dongyi He, Qingling Xia, Gen Li, Xinlai Xing, Yuchi Pan, Bin Jiang

Affiliations: School of Artificial Intelligence, Chongqing University of Technology; Chongqing Key Laboratory of Embodied Intelligence Perception and Autonomous Learning for Humanoid Robots; Key Laboratory of Advanced Equipment Intelligence of the Chongqing Education Commission; School of Smart Health, Chongqing Polytechnic University of Electronic Technology; Department of Language Science and Technology, The Hong Kong Polytechnic University; School of Pharmacy and Bioengineering, Chongqing University of Technology

AI summary: To address the difficulty of cross-subject decoding in brain-computer interfaces (BCIs) for stroke patients, this work proposes CFSPMNet, a new neural network framework. It combines Fourier-domain state reorganization with a shared-private prototype matching mechanism and, by modeling the latent organization of neural states, improves the accuracy and robustness of cross-subject motor imagery EEG (MI-EEG) decoding. Experiments show that CFSPMNet outperforms mainstream methods on two stroke MI-EEG datasets.

2605.10108 2026-05-12 cs.CL cs.LG

GLiNER-Relex: A Unified Framework for Joint Named Entity Recognition and Relation Extraction

Ihor Stepanov, Oleksandr Lukashov, Mykhailo Shtopko, Vivek Kalyanarangan

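Affiliations: Knowledgator Engineering; Baldor Technologies Pvt. Ltd. (IDfy)

AI summary: This paper proposes GLiNER-Relex, a unified framework that performs named entity recognition (NER) and relation extraction (RE) jointly. Built on a shared bidirectional Transformer encoder, it models entity-type and relation-type labels jointly, enabling zero-shot extraction of arbitrary entity and relation types at inference time. Experiments show that GLiNER-Relex performs strongly on several standard relation-extraction datasets while being computationally efficient and flexible; it has been released as an open-source toolkit.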

Comments 19 pages, 1 figure, 2 tables

2605.10107 2026-05-12 cs.AI cs.AR

Arcane: An Assertion Reduction Framework through Semantic Clustering and MCTS-Guided Rule Exploring

Hongqin Lyu, Yonghao Wang, Zhiteng Chao, Tiancheng Wang, Huawei Li

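Affiliations: State Key Lab of Processors, Institute of Computing Technology, CAS, Beijing, China; University of Chinese Academy of Sciences, Beijing, China

AI summary: This paper proposes Arcane, an assertion-reduction framework that tackles the simulation inefficiency caused by redundant assertions in assertion-based hardware verification. It uses semantic clustering to categorize large sets of assertions accurately and Monte Carlo tree search (MCTS) to explore the best order in which to apply reduction rules, cutting the number of assertions efficiently. Experiments show that Arcane removes up to 76.2% of assertions while preserving formal coverage and mutation-detection capability, and speeds up simulation by 2.6x to 6.1x.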

Comments 6 pages, 6 figures
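A core ingredient of the MCTS search described above is the UCT rule for choosing which child node to expand next. The sketch below is the standard UCT formula applied to a hypothetical set of candidate reduction rules. The reward (for example, the fraction of assertions removed by a rollout), the rule names, and the use of the summed child visits as the parent count are assumptions, not Arcane's implementation.

```python
import math

def uct_select(children, exploration=math.sqrt(2)):
    """Pick the child maximizing mean_reward + c * sqrt(ln(N_parent) / n_child).

    children: dict mapping a candidate next rule to (visit_count, total_reward).
    Unvisited children are tried first; the parent visit count is approximated
    by the sum of child visits.
    """
    parent_visits = sum(n for n, _ in children.values())
    best, best_score = None, -math.inf
    for rule, (n, total) in children.items():
        if n == 0:
            return rule
        score = total / n + exploration * math.sqrt(math.log(parent_visits) / n)
        if score > best_score:
            best, best_score = rule, score
    return best
```

The exploration term keeps rarely tried rule orders from being discarded too early. In the test below, a rule with a lower mean reward is still chosen because it has been visited only once.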

2605.10106 2026-05-12 cs.CV cs.AI

ViSRA: A Video-based Spatial Reasoning Agent for Multi-modal Large Language Models

Tingshu Mou, Jiabo He, Renying Wang, Ce Liu, Hao Yang, Tiehua Zhang, Jingjing Chen, Xingjun Ma

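Affiliations: Fudan University; Bosch Center for Artificial Intelligence (BCAI); Tongji University

AI summary: This paper proposes ViSRA, a video-based 3D spatial reasoning agent that improves the spatial reasoning of multimodal large language models (MLLMs). ViSRA requires no additional training. It uses explicit spatial information from expert models to guide the model's reasoning in a modular, extensible way, yielding a flexible plug-and-play framework. It performs well on existing benchmarks and on unseen 3D spatial tasks, with absolute gains of 15.6% and 28.9%, respectively, over the baselines, while offering transferable 3D understanding at low computational cost.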

2605.10091 2026-05-12 cs.LG

TopoU-Net: a U-Net architecture for topological domains

Gaurav Gaurav, Ibrahem ALJabea, Yaroslav Zakomornyy, Eric Frank, Mohamed Elhamdadi, Theodore Papamarkou, Mustafa Hajij

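Affiliations: University of South Florida; Louisiana State University; Vinci4D; PolyShape NTUA; USFCA

AI summary: TopoU-Net is a U-Net architecture for topological data, designed to handle data with complex structure such as points, edges, regions, and hyperedges. It treats U-Net as a hierarchical encoder-decoder framework and uses the cells, incidences, and ranks of a combinatorial complex to build its representation spaces and skip connections. By introducing the notion of rank paths, TopoU-Net passes features between topological levels and performs well across multiple tasks, particularly on heterogeneous graphs and higher-order structured data.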

2605.10087 2026-05-12 cs.CV

Initiation of Interaction Detection Framework using a Nonverbal Cue for Human-Robot Interaction

Guhnoo Yun, Juhan Yoo, Kijung Kim, Dong Hwan Kim

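Affiliations: Korea Institute of Science and Technology; Department of Computer Science, Semyung University

AI summary: This paper proposes a framework for detecting the initiation of human-robot interaction (HRI) from a nonverbal cue, fusing audio and visual sensors, for robots in home environments. By combining sound-source localization with human tracking, the framework detects when a user initiates interaction by gazing at the robot. Even if the user does not speak, an intention to interact is recognized once the gaze duration exceeds a preset threshold. The authors design a state-transition model, validate it experimentally on a mobile robot, and integrate all modules into ROS for a complete implementation.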
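The gaze-threshold rule can be sketched as a small state machine. The states, the threshold value, and the per-frame `user_gazing` flag (standing in for the fused sound-localization and body-tracking output) are hypothetical; the paper's actual state-transition model may have more states and conditions.

```python
from enum import Enum


class State(Enum):
    IDLE = "idle"                # no tracked user is looking at the robot
    ATTENDING = "attending"      # a user is gazing; the timer is running
    INTERACTING = "interacting"  # gaze lasted long enough: interaction initiated


class InteractionInitiationDetector:
    def __init__(self, gaze_threshold_s=1.5):
        self.threshold = gaze_threshold_s  # hypothetical value
        self.state = State.IDLE
        self.gaze_start = None

    def update(self, t, user_gazing):
        """Advance the state machine with one observation at time t (seconds)."""
        if self.state is State.INTERACTING:
            return self.state  # ending an interaction is not modelled here
        if not user_gazing:
            self.state, self.gaze_start = State.IDLE, None
        elif self.state is State.IDLE:
            self.state, self.gaze_start = State.ATTENDING, t
        elif t - self.gaze_start >= self.threshold:
            self.state = State.INTERACTING
        return self.state
```

Resetting the timer whenever the gaze breaks means brief glances never trigger an interaction. Only sustained attention does.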

2605.10086 2026-05-12 cs.RO

A cell-decomposition based path planner for 3D navigation in constrained workspaces

João P. L. Morais, Luciano C. A. Pimenta, Marcelo A. Santos, Guilherme V. Raffo

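Affiliations: Department of Management, Information and Production Engineering, University of Bergamo, Dalmine, BG, Italy; Department of Electronic Engineering, Universidade Federal de Minas Gerais, Belo Horizonte, MG, Brazil

AI summary: This paper proposes a cell-decomposition-based path-planning algorithm for navigation in constrained 3D workspaces that guarantees full visibility between each cell and at least one of its neighbors. The method yields a simplified framework for verifying path feasibility that can easily be embedded in optimization problems. Combining Yen's k-shortest-paths algorithm with second-order cone programming (SOCP), the authors propose KSP-SOCP, which preserves path quality while reducing computational cost. Experiments show better time and memory efficiency than conventional methods, making the approach suitable for large-scale scenes.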

Comments Accepted for publication at the 23rd IFAC World Congress (Busan, Korea)
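The KSP half of KSP-SOCP relies on Yen's algorithm, which enumerates loopless paths in order of increasing cost. Below is a compact, self-contained version of the standard algorithm on a weighted directed graph stored as a dict of dicts. Representing the cell adjacency graph this way is an assumption, and the SOCP refinement of each candidate path is not shown.

```python
import heapq

def dijkstra(graph, source, target, removed_edges=frozenset(), removed_nodes=frozenset()):
    """Shortest path as (cost, [nodes]), or None, ignoring removed edges and nodes."""
    dist, prev, heap = {source: 0.0}, {}, [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            path = [u]
            while path[-1] != source:
                path.append(prev[path[-1]])
            return d, path[::-1]
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            if v in removed_nodes or (u, v) in removed_edges:
                continue
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return None

def path_cost(graph, path):
    return sum(graph[a][b] for a, b in zip(path, path[1:]))

def yen_k_shortest(graph, source, target, k):
    """Yen's algorithm: up to k loopless (cost, path) pairs in increasing cost."""
    first = dijkstra(graph, source, target)
    if first is None:
        return []
    accepted, candidates, seen = [first], [], {tuple(first[1])}
    while len(accepted) < k:
        last_path = accepted[-1][1]
        for i in range(len(last_path) - 1):
            spur, root = last_path[i], last_path[: i + 1]
            # Block the next edge of every accepted path that shares this root.
            removed_edges = {(p[i], p[i + 1]) for _, p in accepted
                             if len(p) > i + 1 and p[: i + 1] == root}
            removed_nodes = frozenset(root[:-1])  # keeps the new path loopless
            spur_result = dijkstra(graph, spur, target, removed_edges, removed_nodes)
            if spur_result is None:
                continue
            total = root[:-1] + spur_result[1]
            if tuple(total) not in seen:
                seen.add(tuple(total))
                heapq.heappush(candidates, (path_cost(graph, total), total))
        if not candidates:
            break
        accepted.append(heapq.heappop(candidates))
    return accepted
```

Generating several cheap graph paths first and then refining each one with a continuous optimizer is the usual motivation for pairing KSP with a convex program: the best graph path is not always the best path after refinement.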

2605.10079 2026-05-12 cs.CV

SocialDirector: Training-Free Social Interaction Control for Multi-Person Video Generation

Liangyang Ouyang, Ruicong Liu, Caixin Kang, Yifei Huang, Yoichi Sato

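Affiliations: The University of Tokyo; Shanda AI Research Tokyo

AI summary: This paper proposes SocialDirector, a training-free interaction controller that improves control over social interactions in multi-person video generation. By modulating cross-attention maps, it gives precise control over who performs an action, when the action happens, and whom it targets, addressing problems in existing models such as actions assigned to the wrong person and chaotic social dynamics. The authors also build an automated evaluation pipeline. Experiments show that SocialDirector markedly improves the interaction realism of generated videos, bringing it close to that of real videos.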

2605.10071 2026-05-12 cs.CV

MFVLR: Multi-domain Fine-grained Vision-Language Reconstruction for Generalizable Diffusion Face Forgery Detection and Localization

Yaning Zhang, Tianyi Wang, Zan Gao, Yibo Zhao, Chunjie Ma, Meng Wang

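Affiliations: Faculty of Computer Science and Technology, Qilu University of Technology (Shandong Academy of Sciences); School of Computing, National University of Singapore; Shandong Artificial Intelligence Institute, Qilu University of Technology (Shandong Academy of Sciences); Key Laboratory of Computer Vision and System, Ministry of Education, Tianjin University of Technology

AI summary: With the rapid progress of photorealistic face generation, generalizable face forgery detection and localization has become especially important. This paper proposes MFVLR, a multi-domain fine-grained vision-language reconstruction model. It uses language-guided, fine-grained forgery representation learning to capture visual forgery traces across multiple domains, enabling generalizable detection and localization of face forgeries produced by diffusion models. The model introduces a fine-grained language transformer, a multi-domain visual encoder, and a visual decoder, together with a novel visual injection module, and substantially improves performance in cross-generator, cross-forgery-type, and cross-dataset settings.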

2605.10065 2026-05-12 cs.CL cs.AI

NCO: A Versatile Plug-in for Handling Negative Constraints in Decoding

Hyundong Jin, Yo-Sub Han

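Affiliations: Yonsei University

AI summary: Preventing large language models from generating inappropriate content, such as profanity and personally identifiable information, is increasingly important. To handle multiple hard constraints and regular-expression constraints efficiently during decoding, this paper proposes NCO, a decoding strategy that processes constraints through online pattern matching. It avoids state explosion and remains compatible with a variety of sampling and search methods. Experiments show that NCO improves content filtering in practical tasks.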
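To make "negative constraints during decoding" concrete, the sketch below filters next-token candidates whose addition would create a banned literal or regex match. It is deliberately naive: it rescans a suffix window at every step, which is exactly the kind of cost NCO's online pattern matching is meant to avoid. It assumes the prefix is already violation-free and that banned matches are shorter than the window; the function name and parameters are hypothetical.

```python
import re

def allowed_tokens(prefix, candidates, banned_patterns, window=64):
    """Candidate next tokens whose addition does not create a banned match.

    prefix: text generated so far (assumed violation-free).
    candidates: next-token strings under consideration.
    banned_patterns: compiled regexes (wrap literal words with re.escape).
    """
    tail = prefix[-window:]
    start = len(tail)  # any new violation must end inside the new token
    keep = []
    for tok in candidates:
        text = tail + tok
        violates = any(m.end() > start
                       for p in banned_patterns for m in p.finditer(text))
        if not violates:
            keep.append(tok)
    return keep
```

In a sampler, the surviving tokens would be renormalized before sampling, or the filter would be applied to each beam during search.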

2605.10064 2026-05-12 cs.AI

MAGE: Multi-Agent Self-Evolution with Co-Evolutionary Knowledge Graphs

Ruiyi Yang, Zechen Li, Hao Xue, Imran Razzak, Flora D. Salim

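Affiliations: University of New South Wales; The Hong Kong University of Science and Technology (Guangzhou); Mohamed Bin Zayed University of Artificial Intelligence

AI summary: MAGE is a multi-agent co-evolution framework. It builds a co-evolutionary knowledge graph with four subgraphs that stores the agents' experience and feedback externally during learning, so that a frozen backbone model performs stably at inference time. It uses task-conditioned retrieval together with task-level and skill-level reinforcement learning strategies to accumulate and apply knowledge efficiently. Experiments show that MAGE significantly outperforms prompt-based frozen-backbone baselines on several complex tasks, demonstrating its effectiveness and broad applicability for self-evolving learning.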

Comments 25 pages, 3 figures

2605.10063 2026-05-12 cs.RO

EFGCL: Learning Dynamic Motion through Spotting-Inspired External Force Guided Curriculum Learning

Keita Yoneda, Kento Kawaharazuka, Kei Okada

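Affiliations: Department of Mechano-Informatics, Graduate School of Information Science and Technology, The University of Tokyo; AI Center, Graduate School of Information Science and Technology, The University of Tokyo

AI summary: This paper proposes External Force Guided Curriculum Learning (EFGCL), a physics-guided reinforcement learning method for legged robots learning complex, dynamic whole-body motions, a setting where learning is inefficient and failures are frequent. Inspired by "spotting" in gymnastics, it applies an assistive external force during training so that the robot physically experiences executing the motion successfully, without relying on task-specific reward design or reference trajectories. Experiments show that EFGCL substantially improves how efficiently a quadruped robot learns complex motions such as jumps. Motions learned in simulation were also reproduced on a real robot, validating the method's effectiveness and generality.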

Comments Accepted at RA-L 2026, website - https://keitayoneda.github.io/kleiyn-efgcl/, YouTube - https://youtu.be/sFK00hm14No/

Journal ref IEEE Robotics and Automation Letters (RA-L) 2026

2605.10054 2026-05-12 cs.CV

Explanation-Aware Learning for Enhanced Interpretability in Biomedical Imaging

Zubair Faruqui, Rahul Dubey

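Affiliations: Department of Computer Science, Missouri State University

AI summary: Deep neural networks for medical image diagnosis tend to rely on clinically irrelevant features. This work adds explanation supervision directly during training to steer the model toward clinically meaningful regions. It systematically analyzes how different explanation-loss designs and supervision strengths affect predictive performance and explanation faithfulness, and introduces two new quantitative metrics for explanation quality. Experiments show that the method markedly improves the clinical relevance of explanations while preserving accuracy, and that it applies to a range of biomedical imaging tasks with region annotations.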

Comments Under review at IEEE Journal of Biomedical and Health Informatics (JBHI)
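One simple way to add explanation supervision is to penalize saliency that falls outside an annotated clinical region. The sketch below computes such a loss value. The penalty form (fraction of absolute saliency outside the mask), the weight `lam`, and the function names are hypothetical, since the paper compares several explanation-loss designs. In real training, the saliency map would come from a differentiable attribution method and the loss would be backpropagated.

```python
import math

def explanation_penalty(saliency, mask):
    """Fraction of absolute saliency outside the annotated region.

    saliency, mask: same-shape 2-D lists; mask is 1 inside the clinically
    relevant region and 0 elsewhere.
    """
    total = sum(abs(s) for row in saliency for s in row)
    if total == 0:
        return 0.0
    outside = sum(abs(s) for srow, mrow in zip(saliency, mask)
                  for s, m in zip(srow, mrow) if m == 0)
    return outside / total

def explanation_aware_loss(prob_true_class, saliency, mask, lam=0.5):
    """Cross-entropy on the true class plus a weighted explanation penalty."""
    return -math.log(prob_true_class) + lam * explanation_penalty(saliency, mask)
```

Raising `lam` corresponds to a stronger supervision setting, which is the trade-off between predictive performance and explanation quality that the summary says the authors analyze.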

2605.10051 2026-05-12 cs.RO cs.AI

Guided Streaming Stochastic Interpolant Policy

Puming Jiang, Meiyi Wang, Kelvin Lin, Ce Hao, Harold Soh

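Affiliations: School of Computing, National University of Singapore

AI summary: This paper studies how inference-time guidance can let generative robot policies adapt to goals dynamically without retraining. Conventional approaches are limited by chunk-based architectures, which suffer from high latency and poor reactivity. By analyzing how the value function evolves over time, the authors derive the optimal guidance term for stochastic interpolant policies and propose the Streaming Stochastic Interpolant Policy (SSIP), which enables fast, responsive real-time control. They also propose two complementary mechanisms, one supporting zero-shot adaptation and one supporting efficient inference. Experiments show better reactivity and physical plausibility in dynamic, complex environments.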

Comments Accepted to Robotics: Science and Systems (RSS) 2026. The first two authors contributed equally

2605.10050 2026-05-12 cs.CV

EchoPrune: Interpreting Redundancy as Temporal Echoes for Efficient VideoLLMs

Jiameng Li, Minye Wu, Jiezhang Cao, Aleksei Tiulpin, Matthew B. Blaschko

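Affiliations: KU Leuven; Shanghai Jiaotong University; Weill Cornell Medicine

AI summary: Video large language models (VideoLLMs) struggle with long videos. Dense sampling produces a huge number of visual tokens, while sparse sampling can miss key temporal information and cause hallucinations. This paper proposes EchoPrune, a lightweight, training-free token-pruning method that treats redundant tokens as temporal echoes and scores tokens by cross-modal relevance and temporal reconstruction error, raising temporal resolution under a fixed token budget. Experiments show that EchoPrune lets VideoLLMs process up to 20x more frames under the same token budget while improving both accuracy and inference speed on several benchmarks.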

Comments 9 pages
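The scoring idea can be sketched as follows: a token that the same spatial position in the previous frame already "echoes" (reconstructs well) is redundant, while a token relevant to the text query is worth keeping. Several details here are assumptions rather than EchoPrune's actual formulation: the reconstruction is simply a copy of the previous frame's token, the error is 1 minus cosine similarity, first-frame tokens get an error of 1.0, and the two scores are combined as a weighted sum.

```python
import math

def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb) if na and nb else 0.0

def prune_tokens(frames, query, budget, alpha=1.0):
    """Return the (frame, position) indices of the `budget` highest-scoring tokens.

    frames: list of frames, each a list of token vectors at fixed spatial positions.
    query: text embedding used for cross-modal relevance.
    Score = relevance + alpha * echo error, where the echo error measures how poorly
    the previous frame's token at the same position reconstructs this one.
    """
    scored = []
    for t, frame in enumerate(frames):
        for pos, tok in enumerate(frame):
            echo_error = 1.0 if t == 0 else 1.0 - cosine(frames[t - 1][pos], tok)
            scored.append((cosine(tok, query) + alpha * echo_error, t, pos))
    kept = sorted(scored, reverse=True)[:budget]
    return sorted((t, pos) for _, t, pos in kept)
```

Because static background tokens score low on echo error, the budget they would have used is freed for additional frames, which is how pruning of this kind increases temporal resolution.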

2605.10047 2026-05-12 cs.LG cs.AI

Rethinking Loss Reweighting for Imbalance Learning as an Inverse Problem: A Neural Collapse Point of View

Jinping Wang, Zixin Tong, Zhiwu Xie, Zhiqiang Gao

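Affiliations: CSMT, Wenzhou-Kean University; International Frontier Interdisciplinary Research Institute, Wenzhou-Kean University

AI summary: This paper recasts loss reweighting for imbalanced learning as an inverse problem and, drawing on neural collapse theory, proposes a dynamic weight-adjustment strategy. With equal average loss across classes as the target, it derives class weights dynamically by solving the inverse problem, mitigating the effects of class imbalance more effectively. Experiments show that the method outperforms mainstream long-tailed classification methods on several datasets and brings the learned features closer to the ideal geometric structure.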

Comments Accepted by ICML2026
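The target of equal class-average losses can be illustrated with a generic dynamic reweighting loop: raise the weight of classes whose average loss is above the cross-class mean and lower it for classes below. The exponentiated update, the step size, and the normalization to a mean weight of 1 are assumptions for illustration. This is not the paper's neural-collapse-based inverse derivation.

```python
import math

def update_class_weights(weights, class_losses, step_size=0.1):
    """One multiplicative update toward equal class-average losses.

    Classes with above-mean loss are up-weighted, the rest down-weighted, and the
    weights are rescaled so that they average to 1.
    """
    mean_loss = sum(class_losses) / len(class_losses)
    raw = [w * math.exp(step_size * (l - mean_loss))
           for w, l in zip(weights, class_losses)]
    scale = len(raw) / sum(raw)
    return [r * scale for r in raw]

def weighted_loss(per_sample_losses, labels, weights):
    """Class-weighted mean of per-sample losses."""
    return sum(weights[y] * l for l, y in zip(per_sample_losses, labels)) / len(labels)
```

The update is a fixed point when all class-average losses are equal, which mirrors the target stated in the summary.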
