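arXivDaily: a daily academic digest synced with the full arXiv feed, with AI-generated summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, and electrical engineering and systems science.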

2512.23694 2026-05-11 stat.ML cs.LG econ.EM

Bellman Calibration for $V$-Learning in Offline Reinforcement Learning

Lars van der Laan, Nathan Kallus

Affiliations: Department of Statistics, University of Washington

AI summary: This paper proposes Bellman calibration to make long-horizon value prediction in offline reinforcement learning more reliable. It defines a calibration error for diagnosing miscalibrated value predictions and introduces Iterated Bellman Calibration, a post-hoc procedure for recalibrating them.

Abstract

Reliable long-horizon value prediction is difficult in offline reinforcement learning because fitted value methods combine bootstrapping, function approximation, and distribution shift, while standard guarantees often require Bellman completeness or realizability. We introduce Bellman calibration, a weak reliability criterion requiring that states assigned similar predicted values have average Bellman targets that agree with those predictions. This criterion yields a scalar calibration error for diagnosing systematic numerical miscalibration, which we estimate from off-policy data using doubly robust Bellman target estimates. We then propose Iterated Bellman Calibration, a model-agnostic post-hoc procedure that recalibrates any learned value predictor by fitting a one-dimensional map of its original prediction, with histogram and isotonic variants. We prove finite-sample guarantees showing that Bellman calibration error is controlled at one-dimensional nonparametric rates without Bellman completeness or value-function realizability. Our value-error bounds separate statistical estimation, finite-iteration, and approximation errors, clarifying when calibration improves value prediction and when its gains are limited by the information in the original predictor or insufficient coverage.
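
The histogram variant described in this abstract can be illustrated with a minimal sketch. It makes strong simplifying assumptions that are not in the abstract: the data are on-policy transitions, so the plain target r + gamma * v(s') stands in for the doubly robust Bellman target estimate, and the bin edges are fixed by hand. The function names `bin_index`, `bellman_calibration_error`, and `iterated_histogram_calibration` are hypothetical, not the authors' code.

```python
import numpy as np

def bin_index(v, edges):
    # Map each predicted value to a histogram bin in 0 .. len(edges) - 2.
    return np.clip(np.digitize(v, edges[1:-1]), 0, len(edges) - 2)

def bellman_calibration_error(v, r, v_next, gamma, edges):
    # Weighted root-mean-square gap, per bin of predicted value, between the
    # mean prediction and the mean (simplified, on-policy) Bellman target.
    b = bin_index(v, edges)
    target = r + gamma * v_next
    err = 0.0
    for k in np.unique(b):
        m = b == k
        err += m.mean() * (target[m].mean() - v[m].mean()) ** 2
    return float(np.sqrt(err))

def iterated_histogram_calibration(v, r, v_next, gamma, edges, n_iter=200):
    # Learn one calibrated value per bin of the ORIGINAL prediction by
    # repeatedly regressing the Bellman target onto the bin indicator.
    b, b_next = bin_index(v, edges), bin_index(v_next, edges)
    theta = np.zeros(len(edges) - 1)
    for _ in range(n_iter):
        target = r + gamma * theta[b_next]
        new_theta = theta.copy()
        for k in range(len(theta)):
            m = b == k
            if m.any():
                new_theta[k] = target[m].mean()
        theta = new_theta
    return theta  # calibrated prediction for a state is theta[bin_index(v(s), edges)]
```

Because the recalibrated predictor is a one-dimensional function of the original prediction, it can only be as informative as that prediction, which matches the limitation the abstract points out.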

2512.09682 2026-05-11 eess.SY cs.AI cs.GT cs.MA cs.SY

Dynamic one-time delivery of critical data by small and sparse UAV swarms: a model problem for MARL scaling studies

Mika Persson, Jonas Lidman, Jacob Ljungberg, Samuel Sandelius, Adam Andersson

Affiliations: Swedish Defence Research Agency (FOI); Chalmers University of Technology; Department of Mathematical Sciences, University of Gothenburg

AI summary: This paper studies multi-agent reinforcement learning (MARL) for decentralized control of UAVs that must deliver a critical data packet. It introduces a family of deterministic games for MARL scaling studies, proposes a robust baseline policy, and shows experimentally that two off-the-shelf MARL algorithms perform well at small scale but scale poorly.

Comments Accepted to the 2026 IFAC World Congress

2511.22893 2026-05-11 eess.SY cs.AI cs.SY

Switching-time bioprocess control with pulse-width-modulated optogenetics

Sebastián Espinel-Ríos

Affiliations: School of Chemical and Bioprocess Engineering, University College Dublin, Ireland

AI summary: This paper proposes using reinforcement learning to optimize pulse-width-modulated optogenetics, parametrizing the control action by the duty cycle to achieve switching-time control and improve bioprocess controllability.

Comments Accepted conference paper: IFAC World Congress 2026

Abstract

Biotechnology can benefit from dynamic control to improve production efficiency. In this context, optogenetics enables modulation of gene expression using light as an external input, allowing fine-tuning of protein levels to unlock dynamic metabolic control and regulation of cell growth. Optogenetic systems can be actuated by light intensity. However, relying solely on intensity-driven control (i.e., signal amplitude) may fail to properly tune optogenetic bioprocesses when the dose-response relationship (i.e., light intensity versus gene-expression strength) is steep. In these cases, tunability is effectively constrained to either fully active or fully repressed gene expression, with little intermediate regulation. Pulse-width modulation can alleviate this issue by alternating between fully ON and OFF light intensity within forcing periods, thereby smoothing the average response and enhancing process controllability. Optimizing pulse-width-modulated optogenetics entails a switching-time optimal control problem with a binary input over multiple forcing periods. While this can be formulated as a mixed-integer optimization problem on a refined control grid with monotonic input constraints, the number of decision variables can grow rapidly with increasing control-grid resolution within forcing periods and with the total number of forcing periods, complicating the task. Here, we propose an alternative solution based on reinforcement learning. We parametrize control actions via the duty cycle, a continuous proxy variable that encodes the ON-to-OFF switching time within each forcing period, thereby respecting the intrinsic binary nature of the light intensity while avoiding fine-grid binary decision variables.
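
The duty-cycle parametrization can be made concrete with a short sketch. It assumes a uniform simulation grid of step `dt` inside each forcing period and a light input that is ON at the start of each period; the function name `pwm_light_signal` and its arguments are hypothetical and do not come from the paper.

```python
import numpy as np

def pwm_light_signal(duty_cycles, period, dt):
    # One continuous duty cycle per forcing period encodes the ON-to-OFF
    # switching time; the emitted light input itself stays binary (1 or 0).
    steps = int(round(period / dt))
    signal = []
    for d in duty_cycles:
        d = float(np.clip(d, 0.0, 1.0))
        on_steps = int(round(d * steps))
        signal.extend([1] * on_steps + [0] * (steps - on_steps))
    return np.array(signal)
```

Under these assumptions a learning agent chooses one continuous number per forcing period, whereas a grid-based mixed-integer formulation would need `steps` binary decisions per period.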

2511.09016 2026-05-11 eess.SY cs.LG cs.SY

Assumed Density Filtering and Smoothing with Neural Network Surrogate Models

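Simon Kuang, Xinfan Lin

Affiliations: University of California, Davis

AI summary: This paper performs assumed density filtering and smoothing with neural network surrogate models. It uses analytic formulas for the mean and covariance of a deep neural network's output to improve uncertainty propagation in nonlinear systems, and demonstrates the method's advantages on a Lorenz system and a Wiener system.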

Comments To appear at Learning for Decision and Control 2026

2511.03182 2026-05-11 cs.SE cs.LG

Understanding Robustness of Model Editing in Code LLMs

Vinaik Chhetri, Moghis Fereidouni, A. B Siddique, Umar Farooq

Affiliations: Louisiana State University; University of Kentucky

AI summary: This paper studies the robustness of model editing in code LLMs under API updates, evaluating editing methods under single and successive edits. Most methods perform poorly on generalization and specificity, and successive edits cause a sharp drop in performance.

Comments 26 pages, 14 figures, 20 tables

Abstract

Large language models (LLMs) for code are increasingly used in software development, but they remain static after pretraining while APIs and software libraries continue to evolve. Model editing offers a lightweight alternative to retraining for incorporating API updates, yet it remains unclear whether existing editing methods can induce correct API migration, generalize that behavior to unseen tasks, and preserve performance on tasks involving unmodified APIs. We present a controlled benchmark for evaluating model editing under API updates in code LLMs, built from HumanEval, MBPP, and APPS, with 2,040 problems spanning 140 unique synthetic API modifications, together with an execution sandbox that enforces edited APIs under standard Python semantics. We evaluate several state-of-the-art editing methods on three code LLMs under both single-edit and successive-edit regimes using execution-based metrics that distinguish successful API adoption from workaround-based task completion. Under single edits, edited models generalize poorly to unseen uses of the modified API, and many apparent successes are workaround-based rather than true API migrations. Performance on tasks involving unmodified APIs also degrades, although memory-based methods and fine-tuning preserve specificity better than locate-then-edit methods. Under successive edits, most method-model combinations collapse to near-zero Pass@k on both generalization and specificity, revealing substantial interference beyond the target edits. A two-factor Shapley decomposition further shows that single-edit failures on generalization include a substantial compilation component, whereas specificity failures are more often post-compilation. Under successive edits, failures become predominantly compilation-driven.
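
The abstract reports Pass@k without defining it. A widely used definition is the unbiased estimator 1 - C(n-c, k) / C(n, k), where n samples are drawn per problem and c of them pass the tests. Whether the authors use exactly this estimator is an assumption; the sketch below only illustrates that common definition.

```python
from math import comb

def pass_at_k(n, c, k):
    # Probability that at least one of k samples, drawn without replacement
    # from n generated samples of which c are correct, passes the tests.
    if n - c < k:
        return 1.0  # fewer than k incorrect samples: every draw contains a pass
    return 1.0 - comb(n - c, k) / comb(n, k)
```

A "near-zero Pass@k" result in this reading means that almost none of the sampled completions for a problem execute correctly, even when k of them are considered together.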

2510.24736 2026-05-11 q-bio.QM cs.LG q-bio.BM

RNAGenScape: Property-Guided, Optimized Generation of mRNA Sequences with Manifold Langevin Dynamics

Danqi Liao, Chen Liu, Xingzhi Sun, Dié Tang, Haochen Wang, Scott Youlten, Srikar Krishna Gopinath, Haejeong Lee, Ethan C. Strayer, Antonio J. Giraldez, Smita Krishnaswamy

Affiliations: Yale University

AI summary: This paper presents RNAGenScape, which uses manifold Langevin dynamics to generate biologically viable mRNA sequences, improving property optimization and success rate, with applications to vaccine design and protein replacement therapy.

Comments ICML 2025 Generative AI and Biology (GenBio) Workshop, Oral presentation

Abstract

Generating property-optimized mRNA sequences is central to applications such as vaccine design and protein replacement therapy, but remains challenging due to limited data, complex sequence-function relationships, and the narrow space of biologically viable sequences. Generative methods that drift away from the data manifold can yield sequences that fail to fold, translate poorly, or are otherwise nonfunctional. We present RNAGenScape, a property-guided manifold Langevin dynamics framework for mRNA sequence generation that operates directly on a learned manifold of real data. By performing iterative local optimization constrained to this manifold, RNAGenScape preserves biological viability, accesses reliable guidance, and avoids excursions into nonfunctional regions of the ambient sequence space. The framework integrates three components: (1) an autoencoder jointly trained with a property predictor to learn a property-organized latent manifold, (2) a denoising autoencoder that projects updates back onto the manifold, and (3) a property-guided Langevin dynamics procedure that performs optimization along the manifold. Across three real-world mRNA datasets spanning two orders of magnitude in size, RNAGenScape increases median property gain by up to 148% and success rate by up to 30% while ensuring biological viability of generated sequences, and achieves competitive inference efficiency relative to existing generative approaches.
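
The loop formed by components (2) and (3) can be sketched on a toy problem: take a noisy gradient step that increases a property, then project back onto the manifold. Everything below is an illustrative assumption. The unit circle stands in for the learned latent manifold, normalization stands in for the denoising autoencoder, and `grad_property` stands in for the gradient of the property predictor; none of the names come from the paper.

```python
import numpy as np

def project_to_circle(z):
    # Stand-in for the denoising autoencoder: snap the point back onto a
    # toy manifold (the unit circle).
    return z / np.linalg.norm(z)

def manifold_langevin(z0, grad_property, step=0.05, noise=0.1, n_steps=200, seed=0):
    # Property-guided Langevin update followed by projection, repeated.
    rng = np.random.default_rng(seed)
    z = project_to_circle(np.asarray(z0, dtype=float))
    for _ in range(n_steps):
        drift = step * grad_property(z)
        diffusion = noise * np.sqrt(2.0 * step) * rng.standard_normal(z.shape)
        z = project_to_circle(z + drift + diffusion)
    return z
```

For example, with the property f(z) = z[0] the gradient is the constant vector (1, 0), and with `noise=0.0` the iterate slides along the circle towards (1, 0) without ever leaving it.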

2510.18516 2026-05-11 q-bio.NC cs.LG

Decoding Dynamic Visual Experience from Calcium Imaging via Cell-Pattern-Aware Pretraining

Sangyoon Bae, Mehdi Azabou, Blake Richards, Jiook Cha

Affiliations: Interdisciplinary Program in Artificial Intelligence; Seoul National University; NSF AI Institute for Artificial and Natural Intelligence (ARNI); Columbia University; Mila (Quebec AI Institute); Dept. of Neurology & Neurosurgery; McGill University; Montreal Neurological Institute, McGill University; School of Computer Science, McGill University; Learning in Machines and Brains Program, CIFAR; Department of Psychology; Department of Brain and Cognitive Sciences

AI summary: This paper proposes POYO-CAP, which uses pretraining to make neural decoding more robust. On the Allen Brain Observatory dataset it improves on training from scratch by 12-13% and scales smoothly with model size.

2510.02371 2026-05-11 cs.CR cs.AI cs.DC

Federated Spatiotemporal Graph Learning for Passive Attack Detection in Smart Grids

Bochra Al Agha, Razane Tajeddine

Affiliations: Department of Electrical and Computer Engineering

AI summary: This paper proposes a graph-based multimodal detector that fuses physical-layer and behavioral indicators to detect passive attacks in smart grids. It is trained in a federated learning framework for robustness and achieves high accuracy with a low false-positive rate.

Abstract

Smart grids are exposed to passive eavesdropping, where attackers listen silently to communication links. Although no data is actively altered, such reconnaissance can reveal grid topology, consumption patterns, and operational behavior, creating a gateway to more severe targeted attacks. Detecting this threat is difficult because the signals it produces are faint, short-lived, and often disappear when traffic is examined by a single node or along a single timeline. This paper introduces a graph-centric, multimodal detector that fuses physical-layer and behavioral indicators over ego-centric star subgraphs and short temporal windows to detect passive attacks. To capture stealthy perturbations, a two-stage encoder is introduced: graph convolution aggregates spatial context across ego-centric star subgraphs, while a bidirectional GRU models short-term temporal dependencies. The encoder transforms heterogeneous features into a unified spatio-temporal representation suitable for classification. Training occurs in a federated learning setup under FedProx, improving robustness to heterogeneous local raw data and contributing to the trustworthiness of decentralized training; raw measurements remain on client devices. A synthetic, standards-informed dataset is generated to emulate heterogeneous HAN/NAN/WAN communications with wireless-only passive perturbations, event co-occurrence, and leak-safe splits. The model achieves a testing accuracy of 98.32% per-timestep ($F1_{\text{attack}}=0.972$) and 93.35% per-sequence at 0.15% FPR using a simple decision rule with run-length $m=2$ and threshold $\tau=0.55$. The results demonstrate that combining spatial and temporal context enables reliable detection of stealthy reconnaissance while maintaining low false-positive rates, making the approach suitable for non-IID federated smart-grid deployments.
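
The abstract names a decision rule with a run length of 2 and a threshold of 0.55 but does not spell it out. One plausible reading, offered here only as an assumption, is that a sequence is flagged once the per-timestep attack probability stays at or above the threshold for m consecutive timesteps. The function name `sequence_alarm` is hypothetical.

```python
def sequence_alarm(probs, m=2, tau=0.55):
    # Flag the sequence once m consecutive per-timestep attack
    # probabilities are at or above the threshold tau.
    run = 0
    for p in probs:
        run = run + 1 if p >= tau else 0
        if run >= m:
            return True
    return False
```

Requiring a run rather than a single exceedance suppresses isolated spikes, which is one way a rule of this kind could keep the per-sequence false-positive rate low.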

2509.00398 2026-05-11 cs.CY cs.AI

A Study on the Framework for Evaluating the Ethics and Trustworthiness of Generative AI

Cheonsu Jeong, Seunghyun Lee, Seonhee Jeong, Sungsu Kim

Affiliations: Hyper Automation Team, SAMSUNG SDS; Digital CRM Team, SAMSUNG SDS

AI summary: This paper studies a framework for evaluating the ethics and trustworthiness of generative AI. It proposes a systematic evaluation method covering key dimensions such as fairness and transparency, and compares AI ethics policies across countries.

Comments 22 pages, 3 figures, 6 tables

Journal ref Artificial Intelligence and Applications, 2026

DOI: 10.47852/bonviewAIA62027463

Abstract

This study provides an in-depth analysis of the ethical and trustworthiness challenges emerging alongside the rapid advancement of generative artificial intelligence (AI) technologies and proposes a comprehensive framework for their systematic evaluation. While generative AI, such as ChatGPT, demonstrates remarkable innovative potential, it simultaneously raises ethical and social concerns, including bias, harmfulness, copyright infringement, privacy violations, and hallucination. Current AI evaluation methodologies, which mainly focus on performance and accuracy, are insufficient to address these multifaceted issues. Thus, this study emphasizes the need for new human-centered criteria that also reflect social impact. To this end, it identifies key dimensions for evaluating the ethics and trustworthiness of generative AI (fairness, transparency, accountability, safety, privacy, accuracy, consistency, robustness, explainability, copyright and intellectual property protection, and source traceability) and develops detailed indicators and assessment methodologies for each. Moreover, it provides a comparative analysis of AI ethics policies and guidelines in South Korea, the United States, the European Union, and China, deriving key approaches and implications from each. The proposed framework applies across the AI lifecycle and integrates technical assessments with multidisciplinary perspectives, thereby offering practical means to identify and manage ethical risks in real-world contexts. Ultimately, the study establishes an academic foundation for the responsible advancement of generative AI and delivers actionable insights for policymakers, developers, users, and other stakeholders, supporting the positive societal contributions of AI technologies.

2504.02382 2026-05-11 eess.IV cs.AI cs.CV

Benchmark of Segmentation Techniques for Pelvic Fracture in CT and X-ray: Summary of the PENGWIN 2024 Challenge

Yudi Sang, Yanzhen Liu, Sutuke Yibulayimu, Yunning Wang, Benjamin D. Killeen, Mingxu Liu, Ping-Cheng Ku, Ole Johannsen, Karol Gotkowski, Maximilian Zenk, Klaus Maier-Hein, Fabian Isensee, Peiyan Yue, Yi Wang, Haidong Yu, Zhaohong Pan, Yutong He, Xiaokun Liang, Daiqi Liu, Fuxin Fan, Artur Jurgas, Andrzej Skalski, Yuxi Ma, Jing Yang, Szymon Płotka, Rafał Litka, Gang Zhu, Yingchun Song, Mathias Unberath, Mehran Armand, Dan Ruan, S. Kevin Zhou, Qiyong Cao, Chunpeng Zhao, Xinbao Wu, Yu Wang

Affiliations: Beijing Rossum Robot Technology Co., Ltd.; Key Laboratory of Biomechanics and Mechanobiology, Ministry of Education, Beijing Advanced Innovation Center for Biomedical Engineering, School of Biological Science and Medical Engineering, Beihang University; Department of Computer Science, Johns Hopkins University; Division of Medical Image Computing, German Cancer Research Center (DKFZ); Helmholtz Imaging, Heidelberg; Smart Medical Imaging, Learning and Engineering (SMILE) Lab, Medical UltraSound Image Computing

AI summary: This paper uses the PENGWIN 2024 challenge to benchmark pelvic fracture segmentation in CT and X-ray. CT segmentation reaches high accuracy, while X-ray segmentation still needs improvement; the challenge also reveals diverse segmentation approaches and uncertainty in how fragments are defined.

Comments PENGWIN 2024 Challenge Report

DOI: 10.1109/TMI.2025.3650126

Abstract

The segmentation of pelvic fracture fragments in CT and X-ray images is crucial for trauma diagnosis, surgical planning, and intraoperative guidance. However, accurately and efficiently delineating the bone fragments remains a significant challenge due to complex anatomy and imaging limitations. The PENGWIN challenge, organized as a MICCAI 2024 satellite event, aimed to advance automated fracture segmentation by benchmarking state-of-the-art algorithms on these complex tasks. A diverse dataset of 150 CT scans was collected from multiple clinical centers, and a large set of simulated X-ray images was generated using the DeepDRR method. Final submissions from 16 teams worldwide were evaluated under a rigorous multi-metric testing scheme. The top-performing CT algorithm achieved an average fragment-wise intersection over union (IoU) of 0.930, demonstrating satisfactory accuracy. However, in the X-ray task, the best algorithm achieved an IoU of 0.774, which is promising but not yet sufficient for intra-operative decision-making, reflecting the inherent challenges of fragment overlap in projection imaging. Beyond the quantitative evaluation, the challenge revealed methodological diversity in algorithm design. Variations in instance representation, such as primary-secondary classification versus boundary-core separation, led to differing segmentation strategies. Despite promising results, the challenge also exposed inherent uncertainties in fragment definition, particularly in cases of incomplete fractures. These findings suggest that interactive segmentation approaches, integrating human decision-making with task-relevant information, may be essential for improving model reliability and clinical applicability.
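
The fragment-wise IoU quoted above can be illustrated on integer label maps. This sketch assumes that 0 is background and that predicted fragments already carry the same integer labels as their ground-truth counterparts; the challenge's actual matching protocol is not described in the abstract, and the function name `fragment_iou` is hypothetical.

```python
import numpy as np

def fragment_iou(gt, pred):
    # IoU for each ground-truth fragment label, compared with the predicted
    # region that carries the same label. Label 0 is treated as background.
    ious = {}
    for label in np.unique(gt):
        if label == 0:
            continue
        g, p = gt == label, pred == label
        ious[int(label)] = float(np.logical_and(g, p).sum() / np.logical_or(g, p).sum())
    return ious
```

Averaging the returned values over fragments and then over cases would give a single score of the kind reported in the abstract, under the stated assumptions.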

2503.17656 2026-05-11 q-bio.QM cs.AI cs.LG

Pretraining a Foundation Model for Small-Molecule Natural Products

Yuheng Ding, Bo Qiang, Shaoning Li, Yiran Zhou, Jie Yu, Qi Li, Cheng Shi, Liangren Zhang, Yusong Wang, Nanning Zheng, Zhenming Liu

Affiliations: State Key Laboratory of Natural and Biomimetic Drugs; School of Pharmaceutical Sciences, Peking University

AI summary: This paper presents a foundation model pretrained around the distinctive properties of natural products. Contrastive learning and masked graph learning objectives improve its representation of molecular scaffolds and side chains, and it achieves state-of-the-art results on natural product mining and drug discovery tasks.

Comments Accepted by Nature Machine Intelligence (2026)

Journal ref Nature Machine Intelligence (2026)

DOI: 10.1038/s42256-026-01226-8

Abstract

Natural products, as metabolites from microorganisms, animals, or plants, exhibit diverse biological activities, making them crucial for drug discovery. Existing deep learning methods for natural products research primarily rely on supervised learning approaches designed for specific downstream tasks. However, such a one-model-for-a-task paradigm often lacks generalizability and leaves significant room for performance improvement. Additionally, existing molecular characterization methods are not well-suited for the unique tasks associated with natural products. To address these limitations, we have pre-trained a foundation model for natural products based on their unique properties. Our approach employs a novel pretraining strategy that is especially tailored to natural products. By incorporating contrastive learning and masked graph learning objectives, we emphasize evolutionary information from molecular scaffolds while capturing side-chain information. Our framework achieves state-of-the-art (SOTA) results in various downstream tasks related to natural product mining and drug discovery. We first compare taxonomy classification with synthesized molecule-focused baselines to demonstrate that current models are inadequate for understanding natural synthesis. Furthermore, through a fine-grained analysis at both the gene and microbial levels, NaFM demonstrates the ability to capture evolutionary information. Finally, we evaluate our method on virtual screening, illustrating informative natural product representations that can lead to more effective identification of potential drug candidates.

2412.11194 2026-05-11 cs.SE cs.AI

Direction for Detection: A Survey of Automated Vulnerability Detection and all of its Pain Points

Dan Ristea, Shae McFadden, Ezzeldin Shereen, Madeleine Dwyer, Sanyam Vyas, Chris Hicks, Vasilios Mavroudis

Affiliations: University College London; The Alan Turing Institute; King’s College London; University of Southampton

AI summary: This paper surveys automated vulnerability detection, identifies 12 pain points spanning problem formulations, datasets, metrics, and more, recommends ways to address them, and uses AIxCC as a case study to discuss the field's relevance in the era of agentic AI.

Abstract

Security vulnerabilities in software can have severe consequences; however, manual vulnerability detection is costly and does not scale, especially as agentic coding frameworks increase the rate of code production. Over the last decade, a large body of research has applied machine learning to automate vulnerability detection (ML4AVD), yet self-reported performance on the most popular datasets shows no clear upward trend. The ML4AVD research community has identified several flaws in problem formulations, datasets, and metrics, but these are discussed in isolation, leaving the overarching problems that generate and reinforce these flaws unaddressed. We first systematize the field through a survey of 87 influential works based on their problem formulation, input and detection granularity, target programming languages, evaluation metrics, datasets, and detection approach. Drawing on this corpus and prior empirical work, we identify twelve pain points spanning the ML4AVD pipeline and show that they are self-reinforcing and causally inter-meshed: feedback loops between datasets, formulations, baselines, and metrics perpetuate each other and explain the field's persistent concentration on binary classification of C/C++ vulnerabilities at the function level. Thus, the field optimizes for a narrow and artificial problem that omits vulnerability type prediction, broader language support, and separation of input from detection granularity. We pair each pain point with concrete recommendations to break these loops. Finally, we use AIxCC as a case study to assess how well a recent high-profile effort aligns with these recommendations and reflect on the relevance of ML4AVD in the era of agentic AI.

2410.18103 2026-05-11 eess.SP cs.AI cs.LG

A Hybrid Graph Neural Network for Enhanced EEG-Based Depression Detection

Yiye Wang, Wenming Zheng, Yang Li, Hao Yang

Affiliations: School of Biological Science and Medical Engineering, Southeast University, Nanjing, China

AI summary: This paper proposes HGNN, a hybrid graph neural network that combines a CGNN branch with fixed connections and an IGNN branch with adaptive connections, and adds a graph pooling and unpooling module to extract individualized hierarchical information, improving EEG-based depression detection.

Journal ref 2025 IJCNN

Abstract

Graph neural networks (GNNs) are becoming increasingly popular for EEG-based depression detection. However, previous GNN-based methods fail to sufficiently consider the characteristics of depression, thus limiting their performance. Firstly, studies in neuroscience indicate that patients with depression exhibit both common and individualized abnormal brain patterns. Previous GNN-based approaches typically focus either on fixed graph connections to capture common abnormal brain patterns or on adaptive connections to capture individualized patterns, which is inadequate for depression detection. Secondly, the brain network exhibits a hierarchical structure, ranging from the channel-level graph to the region-level graph. This hierarchical structure varies among individuals and contains significant information relevant to detecting depression. Nonetheless, previous GNN-based methods overlook this individualized hierarchical information. To address these issues, we propose a Hybrid GNN (HGNN) that merges a Common Graph Neural Network (CGNN) branch utilizing fixed connections and an Individualized Graph Neural Network (IGNN) branch employing adaptive connections. The two branches capture common and individualized depression patterns respectively, complementing each other. Furthermore, we enhance the IGNN branch with a Graph Pooling and Unpooling Module (GPUM) to extract individualized hierarchical information. Extensive experiments on two public datasets show that our model achieves state-of-the-art performance.

2408.11065 2026-05-11 physics.soc-ph cs.CL hep-th physics.data-an physics.hist-ph

Statistical Patterns in the Equations of Physics and the Emergence of a Meta-Law of Nature

物理学方程中的统计模式与自然法则的涌现

Andrei Constantin, Deaglan Bartlett, Harry Desmond, Pedro G. Ferreira

发表机构 * School of Mathematics, University of Birmingham, Watson Building, Edgbaston, Birmingham B15 2TT, United Kingdom（伯明翰大学数学学院）； Rudolf Peierls Centre for Theoretical Physics, University of Oxford, Parks Road, Oxford OX1 3PU, UK（牛津大学鲁道夫·皮埃尔斯理论物理中心）； Institute of Cosmology and Gravitation, University of Portsmouth, Dennis Sciama Building, Portsmouth, PO1 3FX, UK（普敦大学宇宙学与引力研究所）； Astrophysics, University of Oxford, DWB, Keble Road, Oxford OX1 3RH, UK（牛津大学天体物理学）

AI总结研究发现物理方程中数学运算符频率呈指数衰减，揭示了自然法则的统计规律，为符号回归和语言模型生成数学表达式提供新思路。

Comments 11 pages, 5 figures, 2 table

Journal ref Philos Trans A Math Phys Eng Sci (2026) 384 (2317): 20250091

DOI: 10.1098/rsta.2025.0091

Abstract

Physics seeks to uncover the laws of Nature and express them through mathematical equations. Despite the vast diversity of natural phenomena, physical equations exhibit structural regularities that set them apart from arbitrary mathematical expressions. While principles such as dimensional analysis have long guided the formulation of physical models, the exploration of more subtle statistical patterns within the equations of physics remains an open question. Here, by analysing four corpora of physics equations and applying advanced implicit-likelihood techniques, we find that the frequency of mathematical operators follows an exponential decay law, in contrast to Zipf's power law for word frequencies in natural languages. This reveals a statistical meta-law of physics, possibly reflecting a combination of communication efficiency and constraints imposed by Nature itself. The meta-law offers practical benefits for symbolic regression by drastically narrowing down the space of physically plausible expressions. More broadly, it may inform the development of language models that can generate coherent mathematical representations, advancing the automation of physical law discovery.
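
The contrast the abstract draws between an exponential decay law and Zipf's power law can be illustrated with a small rank-frequency comparison. The sketch below is only an illustration under stated assumptions: it uses ordinary least squares on log-transformed data rather than the implicit-likelihood techniques the authors apply, and the frequencies are synthetic, generated here from an exact exponential law. The function names are hypothetical.

```python
import math

def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, r_squared)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return a, b, 1.0 - ss_res / ss_tot

def compare_rank_frequency_models(freqs):
    """Fit both candidate laws to frequencies sorted by rank (1 = most common).

    Exponential decay: log f = a - k * rank       (straight line in rank)
    Zipf power law:    log f = a - s * log(rank)  (straight line in log rank)
    """
    ranks = list(range(1, len(freqs) + 1))
    log_f = [math.log(f) for f in freqs]
    _, slope_exp, r2_exp = linear_fit(ranks, log_f)
    _, slope_zipf, r2_zipf = linear_fit([math.log(r) for r in ranks], log_f)
    return {
        "exponential": {"rate": -slope_exp, "r2": r2_exp},
        "zipf": {"exponent": -slope_zipf, "r2": r2_zipf},
    }

# Hypothetical operator frequencies drawn from an exact exponential law.
synthetic = [1000.0 * math.exp(-0.5 * r) for r in range(1, 13)]
result = compare_rank_frequency_models(synthetic)
print(result)
```

On data like this, the exponential fit is a straight line in rank and the Zipf fit is visibly worse; real operator counts would need a proper likelihood-based comparison.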

2405.00082 2026-05-11 quant-ph cs.DS cs.LG

Structure learning of Hamiltonians from real-time evolution

Ainesh Bakshi, Allen Liu, Ankur Moitra, Ewin Tang

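Affiliations: MIT; UC Berkeley

AI summary: The paper presents a new approach to Hamiltonian learning that recovers the Hamiltonian without prior knowledge of its interaction structure, with Heisenberg-limited scaling and total evolution time $O(\log (n)/\varepsilon)$. It extends beyond the short-range setting and achieves constant time resolution.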

Comments 52 pages; v2 discussed more literature, qualified some claims; v3 minor correction discussing prior work; v4 strengthened main theorem

DOI: 10.1109/FOCS61266.2024.00069

Abstract

We study the problem of Hamiltonian structure learning from real-time evolution: given the ability to apply $e^{-\mathrm{i} Ht}$ for an unknown local Hamiltonian $H = \sum_{a = 1}^m λ_a E_a$ on $n$ qubits, the goal is to recover $H$. This problem is already well-understood under the assumption that the interaction terms, $E_a$, are given, and only the interaction strengths, $λ_a$, are unknown. But how efficiently can we learn a local Hamiltonian without prior knowledge of its interaction structure? We present a new, general approach to Hamiltonian learning that not only solves the challenging structure learning variant, but also resolves other open questions in the area, all while achieving the gold standard of Heisenberg-limited scaling. In particular, our algorithm recovers the Hamiltonian to $\varepsilon$ error with total evolution time $O(\log (n)/\varepsilon)$, and has the following appealing properties: (1) it does not need to know the Hamiltonian terms; (2) it works beyond the short-range setting, extending to any Hamiltonian $H$ where the sum of terms interacting with a qubit has bounded norm; (3) it evolves according to $H$ in constant time $t$ increments, thus achieving constant time resolution. As an application, we can also learn Hamiltonians exhibiting power-law decay up to accuracy $\varepsilon$ with total evolution time beating the standard limit of $1/\varepsilon^2$.

2310.02243 2026-05-11 quant-ph cs.DS cs.LG

Learning quantum Hamiltonians at any temperature in polynomial time

Ainesh Bakshi, Allen Liu, Ankur Moitra, Ewin Tang

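Affiliations: MIT; UC Berkeley

AI summary: The paper gives a polynomial-time algorithm for learning a quantum Hamiltonian at any constant temperature, using a polynomial approximation to the exponential function and a relaxation of a polynomial system.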

Comments 66 pages; v2 minor edits, clarification on locality

Abstract

We study the problem of learning a local quantum Hamiltonian $H$ given copies of its Gibbs state $ρ= e^{-βH}/\textrm{tr}(e^{-βH})$ at a known inverse temperature $β>0$. Anshu, Arunachalam, Kuwahara, and Soleimanifar (arXiv:2004.07266) gave an algorithm to learn a Hamiltonian on $n$ qubits to precision $ε$ with only polynomially many copies of the Gibbs state, but which takes exponential time. Obtaining a computationally efficient algorithm has been a major open problem [Alhambra'22 (arXiv:2204.08349)], [Anshu, Arunachalam'22 (arXiv:2204.08349)], with prior work only resolving this in the limited cases of high temperature [Haah, Kothari, Tang'21 (arXiv:2108.04842)] or commuting terms [Anshu, Arunachalam, Kuwahara, Soleimanifar'21]. We fully resolve this problem, giving a polynomial time algorithm for learning $H$ to precision $ε$ from polynomially many copies of the Gibbs state at any constant $β> 0$. Our main technical contribution is a new flat polynomial approximation to the exponential function, and a translation between multi-variate scalar polynomials and nested commutators. This enables us to formulate Hamiltonian learning as a polynomial system. We then show that solving a low-degree sum-of-squares relaxation of this polynomial system suffices to accurately learn the Hamiltonian.

2605.08012 2026-05-11 cs.LG cs.AI cs.CL

Position: Mechanistic Interpretability Must Disclose Identification Assumptions for Causal Claims

Zezheng Lin, Fengming Liu

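AI summary: The paper argues that mechanistic interpretability research must explicitly disclose the identification assumptions behind its causal claims. An audit of 10 papers finds no dedicated identification-assumptions section, and commonly used validation metrics are reported without stating the assumptions they rest on.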

Comments 10 pages, 2 figures. Submitted to NeurIPS 2026 (Position Track)

Abstract

Mechanistic interpretability papers increasingly use causal vocabulary: circuits, mediators, causal abstraction, monosemanticity. Such claims require explicit identification assumptions. A purposive audit of 10 papers across four methodological strands finds no dedicated identification-assumptions section and a recurring pattern: validation metrics such as faithfulness, completeness, monosemanticity, alignment, or ablation effects are reported as causal support without stating the assumptions that make them identifying. A two-human-coder audit on $n=30$ reproduces the direction of the main finding: dedicated identification sections are absent, and validation-metric substitution is common, though exact Dim B/D counts are coding-rule sensitive. The paper proposes a disclosure norm: state whether the claim is causal, name the identification strategy, enumerate assumptions, stress at least one, and explain how conclusions shift if assumptions fail. Validation is not identification.

2605.07963 2026-05-11 cs.LG

Aggregation in conformal e-classification

Vladimir Vovk

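AI summary: The paper reports experiments with cross-conformal e-prediction and explores the advantages of a modified version in terms of simplicity and flexibility.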

Comments 23 pages, 10 figures

2605.07938 2026-05-11 cs.LG

Prototype Guided Post-pretraining for Single-Cell Representation Learning

Sachini Weerasekara, Natasha Darras, Sagar Kamarthi, Colles Price, Jacqueline Isaacs

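AI summary: The paper proposes CellRefine, a post-pretraining method that uses marker-gene sets as priors to refine a single-cell pretrained model, improving downstream task performance with gains of up to 15%.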

Abstract

Single-cell representation learning (SCRL) from gene expression data offers a way to uncover the complex regulatory logic underlying cellular function. Inspired by large language models in natural language modeling, several single-cell pretrained models have recently been proposed that treat genes as tokens and cells as sentences. However, these models are fundamentally limited by the long-tailed nature of cell-type distributions and struggle to generalize under covariate shifts in gene expression data. While fine-tuning is often used to mitigate these issues, we observe that performance remains bounded. To address this challenge, we introduce CellRefine, a post-pretraining method that operates between the pretraining and fine-tuning stages of a single-cell foundation model. CellRefine uses a multi-faceted objective that incorporates marker-gene sets as structural priors to guide post-pretraining and refine the latent embedding manifold of cells. Across multiple computational biology tasks, empirical results show that CellRefine consistently improves downstream performance, yielding gains up to 15%.

2605.07764 2026-05-11 cs.RO

CommandSwarm: Safety-Aware Natural Language-to-Behavior-Tree Generation for Robotic Swarms

Mohammed Majid, Amjad Yousef Majid

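AI summary: The paper presents CommandSwarm, a system that generates XML behavior trees through multilingual translation, safety filtering, and constrained prompting. It shows that compact, quantized, domain-adapted LLMs are effective for robotic swarm control and stresses that parser acceptance and safety filtering remain necessary.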

Abstract

Natural-language interfaces can make swarm robotics more accessible to non-expert operators, but they must translate ambiguous user intent into executable swarm behaviors without unsupported actions, malformed programs, or unsafe plans. This paper presents CommandSwarm, a safety-aware language-to-behavior-tree pipeline for generating XML behavior trees (BTs) from speech or text commands. The system combines multilingual translation, command-level safety filtering, constrained prompting, a LoRA-adapted large language model (LLM), and deterministic parser validation against a whitelist of executable swarm primitives. We evaluate eleven open 6.7B--14B parameter LLMs, all using 4-bit quantization, on representative swarm-control scenarios under zero-shot, one-shot, and two-shot prompting. Falcon3-Instruct-10B and Mistral-7B-v3 are the strongest prompt-engineered candidates, reaching BLEU scores above 0.60 and high syntactic validity in few-shot settings. LoRA adaptation of Falcon3-Instruct-10B on a 2,063-example synthetic instruction--BT corpus improves zero-shot BLEU from 0.267 to 0.663, ROUGE-L from 0.366 to 0.692, and parser-accepted syntactic validity from 0% to 72%. Translation experiments further show that SeamlessM4T v2-large and EuroLLM-9B provide the best quality-latency trade-offs for the multilingual front end. The results indicate that compact, quantized, domain-adapted LLMs can generate useful swarm BTs when embedded in a validated systems pipeline. They also show that parser acceptance and safety filtering remain necessary execution gates; generation quality alone is not sufficient for autonomous deployment.
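
The idea of deterministic parser validation against a whitelist of executable primitives can be sketched as follows. This is a minimal illustration, not the CommandSwarm implementation: the tag names, the primitive names, and the `validate_behavior_tree` function are all hypothetical, and the real behavior-tree schema is not given in the abstract.

```python
import xml.etree.ElementTree as ET

# Hypothetical whitelist: control-flow nodes plus executable swarm primitives.
ALLOWED_TAGS = {"BehaviorTree", "Sequence", "Fallback", "Action", "Condition"}
ALLOWED_PRIMITIVES = {"form_line", "disperse", "return_home", "is_obstacle_near"}

def validate_behavior_tree(xml_text):
    """Return (accepted, errors) for a generated behavior tree.

    The check is deterministic: the text must parse as XML, every element
    must use an allowed tag, and every leaf must name an allowed primitive.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return False, [f"malformed XML: {exc}"]

    errors = []
    if root.tag != "BehaviorTree":
        errors.append(f"unexpected root element: {root.tag}")
    for node in root.iter():
        if node.tag not in ALLOWED_TAGS:
            errors.append(f"unsupported node type: {node.tag}")
        elif node.tag in ("Action", "Condition"):
            name = node.get("name")
            if name not in ALLOWED_PRIMITIVES:
                errors.append(f"unsupported primitive: {name}")
    return not errors, errors

good = """<BehaviorTree><Sequence>
  <Condition name="is_obstacle_near"/>
  <Action name="disperse"/>
</Sequence></BehaviorTree>"""
bad = '<BehaviorTree><Action name="self_destruct"/></BehaviorTree>'

print(validate_behavior_tree(good))
print(validate_behavior_tree(bad))
print(validate_behavior_tree("<BehaviorTree><Sequence>"))
```

A gate of this kind rejects both malformed programs and unsupported actions before anything reaches the robots, which is why it is independent of how fluent the generated text looks.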

2605.07584 2026-05-11 cs.AI

Parallel Lifted Planning via Semi-Naive Datalog Evaluation

Dominik Drexler, Oliver Joergensen, Jendrik Seipp

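AI summary: The paper studies how semi-naive Datalog evaluation can make lifted classical planning more efficient, and proposes an execution model with parallelism at both the rule level and the grounding level. It designs a grounder based on clique enumeration and extends it to support semi-naive Datalog evaluation. Experiments show that the approach already outperforms existing baselines on a single core and that its advantage grows with the number of cores; on hard-to-ground tasks it reaches a parallel fraction of up to 92.4% and a speedup of up to 6x.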
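
For readers unfamiliar with the technique named in the title, semi-naive evaluation avoids re-deriving old facts by joining only the facts that were new in the previous round. The sketch below is a generic textbook illustration on transitive closure, assuming a single recursive rule; it is sequential and is not the planner's grounder or its parallel execution model. The function name is hypothetical.

```python
def transitive_closure_semi_naive(edges):
    """Compute path(x, z) from edge(x, y) with semi-naive evaluation.

    Rules:
        path(x, y) :- edge(x, y).
        path(x, z) :- path(x, y), edge(y, z).
    """
    # Index edge facts by source node for the join in the recursive rule.
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, set()).add(dst)

    path = set(edges)   # all facts derived so far
    delta = set(edges)  # facts that were new in the previous round
    while delta:
        # Join only the new path facts with edge; older facts were already
        # joined in earlier rounds, so re-joining them would add nothing.
        derived = {
            (x, z)
            for x, y in delta
            for z in successors.get(y, ())
        }
        delta = derived - path
        path |= delta
    return path

closure = transitive_closure_semi_naive({("a", "b"), ("b", "c"), ("c", "d")})
print(sorted(closure))
```

The loop stops when a round produces no new facts, so it also terminates on cyclic graphs.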

2605.04460 2026-05-11 cs.LG

Discovering Sparse Counterfactual Factors via Latent Adjustment for Survey-based Community Intervention

Fatima Ashraf, Muhammad Ayub Sabir, Junbiao Pang, Yufang Zhou, Yan Shang

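AI summary: The study aims to discover sparse, actionable counterfactual intervention strategies from survey-based community intervention data, so as to shift a target group toward a reference group. It proposes a method built on a fixed-basis non-negative latent representation that aligns distributions by adjusting interpretable latent factors, and combines Shapley-value-guided attribution with entropy-regularized optimal transport to learn sparse, easy-to-implement group-level interventions. Experiments on real-world transportation survey data show that the method improves group conversion while keeping the intervention strategies simple and actionable.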

2603.05539 2026-05-11 cs.LG cs.AI cs.IR cs.MM

VDCook:DIY video data cook your MLLMs

Chengwei Wu

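AI summary: The paper presents VDCook, a self-evolving video data operating system that gives researchers and vertical-domain teams a flexible platform for building video datasets. Users issue a data request through a natural-language query and adjustable parameters; the system automatically optimizes the query, runs video retrieval and controllable synthesis modules in parallel, and produces a data package with full provenance information and metadata. VDCook supports automatic data ingestion based on the MCP protocol, so datasets can be continuously updated and extended, and it provides multi-dimensional metadata annotation as a basis for later data processing and indexing, which significantly lowers the barrier to building specialized video training datasets.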

2601.15127 2026-05-11 cs.LG cs.CV cs.DC

DeepFedNAS: Efficient Hardware-Aware Architecture Adaptation for Heterogeneous IoT Federations via Pareto-Guided Supernet Training

Bostan Khan, Masoud Daneshtalab

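AI summary: DeepFedNAS is an efficient hardware-aware architecture adaptation method that tailors neural network architectures to different device classes in federated learning over heterogeneous IoT devices. It introduces a multi-objective fitness function that combines information-theoretic network metrics with architectural heuristics, and uses it in a two-stage framework: the first stage improves supernet training with a precomputed cache of elite architectures, and the second stage uses the fitness function as a zero-cost accuracy proxy to find hardware-optimized subnetworks quickly, greatly improving search efficiency. Experiments show that DeepFedNAS reaches state-of-the-art accuracy on several datasets while substantially reducing communication overhead, making it suitable for large-scale, communication-constrained IoT federated learning.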

Comments This paper significantly extends the preliminary work presented at ESANN 2026. Source Code: https://github.com/bostankhan6/DeepFedNAS

2512.02991 2026-05-11 cs.CV

GraphFusion3D: Dynamic Graph Attention Convolution with Adaptive Cross-Modal Transformer for 3D Object Detection

Md Sohag Mia, Md Nahid Hasan, Muhammad Abdullah Adnan

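AI summary: GraphFusion3D is a unified framework for 3D object detection that addresses the sparsity, incomplete structure, and limited semantic information of point cloud data. It introduces an Adaptive Cross-Modal Transformer (ACMT) to fuse image information and a Graph Reasoning Module (GRM) to model local geometric and global semantic relations in the point cloud, improving detection performance. Experiments show significant gains on several benchmark datasets.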

2605.07488 2026-05-11 cs.AI cs.LG

Efficient Data Selection for Multimodal Models via Incremental Optimization Utility

Jinhao Jing, Qiannian Zhao, Chao Huang, Zhan Su

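AI summary: The paper proposes the OST framework, which recasts data selection as an incremental optimization utility ranking problem and estimates each sample's marginal utility by simulating a single-step update on a lightweight proxy. Experiments show that it reduces training cost while improving performance over existing baselines.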

Abstract

The scaling of Large Multimodal Models (LMMs) is constrained by the quality-quantity trade-off inherent in synthetic data. Previous approaches, such as LLM-as-a-Judge, have proven their effectiveness in addressing this but suffer from prohibitive computational costs and lack of interpretability. To bridge this gap, we propose One-Step-Train (OST), a framework that reformulates data selection as an incremental optimization utility ranking problem. Instead of relying on semantic heuristics, OST estimates the marginal utility of each sample via a simulated single-step update on a lightweight proxy. Experiments on the Qwen series across multimodal mathematical reasoning benchmarks demonstrate that OST achieves Pareto-optimal efficiency. By selecting the top-50 subset, OST reduces training costs by 43% (and total time consumption by 17) while surpassing the strong LLM-as-a-Judge baseline by 1.8 points. Furthermore, under a fixed compute budget, our method using only the top-20 subset achieves a 5.6 point gain over LLM-as-a-Judge, improves upon heuristic scoring baselines like DEITA, and outperforms the Full-SFT baseline by 8.8 points. Notably, while Full-SFT suffers from performance degradation due to noise, our optimization-grounded approach effectively identifies toxic samples, successfully reversing the negative transfer frequently observed in complex reasoning tasks.

2605.07442 2026-05-11 cs.LG

GameGen-Verifier: Parallel Keypoint-Based Verification for LLM-Generated Games via Runtime State Injection

Chaobo Jia, Ruipeng Wan, Ting Sun, Weihao Tan, Borui Wan, Yuxuan Tong, Guangming Sheng, Hong Xu

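AI summary: The paper proposes GameGen-Verifier, an automated framework for verifying whether games generated by large language models conform to their natural-language specifications. It decomposes the game specification into verifiable keypoints, turns each into an independent verification unit, injects the target state at runtime, and runs a bounded interaction to judge conformance. Experiments show that the method is significantly more accurate than existing approaches while greatly reducing verification time.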

2605.07432 2026-05-11 cs.CL cs.LG

Generating training datasets for legal chatbots in Korean

Changhoe Hwang, Jee-Sun Nam, Eric Laporte

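AI summary: The study addresses the need for diverse training data and the high cost of annotation for legal chatbots. It proposes a method for generating language resources based on Local Grammar Graphs (LGGs) that produces large volumes of dialogue text together with high-quality labels. Combined with a domain-specific taxonomy, the method improves annotation efficiency and quality. The study implements LIGA, a Korean legal chatbot that accurately matches relevant cases when handling users' legal queries; the trained model reaches an F1 score of 91%.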

Journal ref International conference on Law and Society, Feb 2023, Hanoi, Vietnam. pp.1-4

2605.07424 2026-05-11 cs.LG

A Flexible Adaptive Stable Clustering Algorithm for Archive-Scale Online Mass Spectrometry

Shao Shi, Xin Yang, Huiran Feng, Jianhuai Ye, Tianlong Hu, Yaling Zeng, Tzung-May Fu, Lei Zhu, Huizhong Shen, Chen Wang, Shu Tao

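AI summary: Targeting the large data streams produced by online mass spectrometry, the study proposes FASC, a flexible adaptive stable clustering algorithm that addresses the trade-off in existing methods between scalability, metric flexibility, and algorithmic stability. FASC decouples the similarity kernel from the optimization logic and combines a density-enhanced similarity selection rule with geometric constraints to achieve deterministic, order-independent convergence. Experiments show strong clustering performance on standard datasets; applied to atmospheric aerosol mass spectrometry data, the algorithm runs in linear time, reveals the aging pathways of secondary inorganic aerosols, and detects industrial tracers at very low abundance.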

2605.07420 2026-05-11 cs.LG cs.CV

SR$^2$-LoRA: Self-Rectifying Inter-layer Relations in Low-Rank Adaptation for Class-Incremental Learning

Fengqiang Wan, Yipeng Lin, Kan Lv, Yang Yang

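AI summary: In class-incremental learning, pretrained models adapted with parameter-efficient fine-tuning show promise but still suffer catastrophic forgetting when adapting to new tasks. The paper analyzes this problem from the perspective of inter-layer relation drift and proposes SR$^2$-LoRA, which mitigates forgetting by constraining changes in inter-layer relations. The method aligns the singular values of the relation matrices that current-task samples produce in the old and new models, improving robustness and performance across tasks.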
