arXivDaily arXiv daily academic digest, updated Monday through Friday
2606.17579 2026-06-17 cs.LG cs.AI cs.CL cs.SI New submission

LLM Features Can Hurt GNNs: Concatenation Interference on Homophilous Graph Benchmarks

Zhongyuan Wang, Pratyusha Vemuri

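AI summary The paper finds that introducing LLM features into graph neural networks through pure input concatenation (rather than joint training) systematically lowers accuracy on homophilous benchmarks, and proposes Delta_sig, a measure of LLM-alone discriminability, to predict the effect of concatenation.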

Comments 29 pages, 8 figures

Abstract

Adding LLM-generated node features to graph neural networks (GNNs) is widely reported to improve accuracy on standard benchmarks. We document a contrasting observation: when LLM features are introduced through pure input concatenation (rather than joint training, distillation, or prompt-conditioning), they can systematically degrade accuracy on the same homophilous benchmarks where end-to-end LLM pipelines succeed. With an MLP backbone on the Planetoid public split and bag-of-words original features, concatenating SBERT-encoded GPT-4o-mini TAPE features reduces PubMed test accuracy by 17.0 +/- 0.3 pp and Cora by 4.3 +/- 0.6 pp (CiteSeer by 0.6 +/- 0.8 pp, within seed noise). The drop attenuates as we relax each condition (GCN / GCNII / GAT backbones, random splits, smaller encoders) and reverses on medium-homophily WikiCS (+4.4 pp) and ogbn-arxiv (+11.7 pp). To predict when concatenation helps versus hurts, we report a simple measure of LLM-alone discriminability, Delta_sig. Across 9 datasets Delta_sig correlates with the concatenation cost more strongly than homophily at point estimate (r^2 = 0.38 vs. 0.06; N=9, bootstrap CIs overlap). The bootstrap-best change-point is tau = 13.8 pp, and the rule "Delta_sig <= tau predicts non-positive concat cost" classifies 7/9 datasets correctly; since 60% of bootstrap samples place tau in [5, 30] pp, we treat Delta_sig as an interpretive lens rather than a precision filter. A dimension-controlled ablation on PubMed places the LLM-feature drop between same-source PCA (-2.3 pp) and same-dim Gaussian noise (-37.3 pp), ruling out dimensionality and weight-decay artifacts. Nine PubMed configurations fit a power law |Delta_concat| proportional to (sqrt(d_l/n))^1.31 with r^2 = 0.97; the low-Delta_sig, small-n corner is exactly where the headline -17 pp PubMed deficit appears.
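
A power law of the form |Delta_concat| = c * (sqrt(d_l/n))^alpha is usually fitted by ordinary least squares in log-log space, where the exponent becomes the slope. The sketch below shows that procedure only. The (d_l, n) pairs and the constant c = 2.0 are hypothetical, and the y-values are generated synthetically from the abstract's exponent 1.31; none of this is the paper's data or code.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of y = c * x**alpha in log-log space; returns (alpha, c)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx = sum(lx) / n
    my = sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    alpha = sxy / sxx               # slope in log-log space = exponent
    c = math.exp(my - alpha * mx)   # intercept in log-log space = log(c)
    return alpha, c

# Hypothetical (d_l, n) configurations: LLM feature dimension and number of labeled nodes.
configs = [(64, 60), (128, 60), (384, 60), (64, 500), (384, 500), (768, 2000)]
xs = [math.sqrt(d / n) for d, n in configs]
# Synthetic, noiseless targets built from the assumed law, so the fit should recover it.
ys = [2.0 * x ** 1.31 for x in xs]

alpha, c = fit_power_law(xs, ys)
print(f"alpha={alpha:.3f}, c={c:.3f}")
```

With real measurements the points would scatter around the line, and the r^2 of the log-log regression would quantify how well the law holds.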

2606.17572 2026-06-17 cs.LG cs.SY eess.SY New submission

When Dynamics Models Read the Wrong Time Steps: Label-Free Event Credit Re-Anchoring for Robust Global Readouts

Yifan Wang

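AI summary To address temporal credit dilution in sequence-to-global interfaces, the paper proposes CREST, a training-free and label-free method that estimates the event core and re-anchors the readout by contrast, reducing out-of-distribution error and restoring event credit.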

Comments 7 pages, 6 figures

Abstract

Learned dynamics models often answer global physical questions, such as fault severity or impact stiffness, by pooling a per-step feature sequence into one readout vector. This sequence-to-global interface creates an under-studied temporal credit problem: with only trajectory-level supervision, a model can predict accurately in training conditions while reading from abundant smooth correlates rather than the brief physical events that determine the target. We call this failure temporal credit dilution. It is not exposed by the training loss and is not removed by standard physics-informed residuals, because the error lies in where the global readout assigns functional credit. We introduce Credit-in-Event, an interface-level probe for measuring how much pooled credit lands on event steps, and prove in closed form that a pooled linear reader routes credit to a spurious background channel as the event fraction shrinks. We then propose CREST, a training-free and label-free readout that estimates a transient event core from learned features and re-anchors the pooled representation through event-versus-rest contrast. Across simulated gear and impact systems, recurrent and attention encoders, and public bearing vibration data, CREST reduces out-of-distribution error while restoring event credit. Ablations show that stable-step selection and receptive-field shrinking fail, confirming that the gain comes from event-core credit re-anchoring rather than a generic locality or stability prior.
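
The dilution effect and the idea of an event-versus-rest contrast can be illustrated on a toy sequence. The sketch below is not the CREST algorithm: the event-core estimator (the k steps farthest from the sequence mean) and all function names are assumptions made for illustration, since the abstract does not specify how the core is estimated.

```python
def mean_vec(rows):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    t = len(rows)
    return [sum(col) / t for col in zip(*rows)]

def event_core(features, k):
    """Hypothetical estimator: indices of the k steps farthest from the sequence mean."""
    mu = mean_vec(features)
    dist = [sum((a - b) ** 2 for a, b in zip(h, mu)) for h in features]
    order = sorted(range(len(features)), key=lambda t: dist[t], reverse=True)
    return sorted(order[:k])

def contrast_readout(features, k):
    """Event-versus-rest contrast: mean over the core minus mean over the rest.

    Requires 0 < k < len(features) so that both groups are non-empty.
    """
    core = set(event_core(features, k))
    ev = [h for t, h in enumerate(features) if t in core]
    rest = [h for t, h in enumerate(features) if t not in core]
    return [a - b for a, b in zip(mean_vec(ev), mean_vec(rest))]

# Toy sequence: 100 steps, channels = [event channel, smooth background channel].
T = 100
features = [[0.0, 1.0] for _ in range(T)]
features[40] = [5.0, 1.0]   # brief event
features[41] = [5.0, 1.0]

pooled = mean_vec(features)                 # event amplitude diluted by 2/100
core = event_core(features, 2)
contrast = contrast_readout(features, k=2)  # event amplitude restored, background cancelled
print(pooled, core, contrast)
```

In the mean-pooled vector the background channel dominates the event channel by a factor of ten, which is the situation in which a linear reader would lean on the background. The contrast readout removes the shared background and keeps the event amplitude.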

2606.17451 2026-06-17 cs.LG cs.RO New submission

Credibility-Weighted Pricing of Autonomous Vehicle Liability Under Operational Design Domain Shift

Doyeon Jang

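AI summary For automated driving system deployments with sparse experience, ODD shift, and non-stationary risk, the paper proposes a hierarchical Bayesian credibility framework that partially pools through an ODD-similarity kernel and validates it on Waymo data.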

Abstract

Automated Driving System deployments create a foundational ratemaking challenge: sparse experience, shifting operational design domains, and non-stationary risk across software releases. We propose a hierarchical Bayesian credibility framework that pools across cities, software versions, and territories via a learned ODD-similarity kernel and nests Bühlmann-Straub as a limiting case. In a demonstration on 648 verified-engaged Waymo crashes across four U.S. metros from the NHTSA Standing General Order database, set against 116 million matched miles, city-aggregate credibility weights are moderate (0.12-0.46), partial pooling decisively outperforms no pooling, and a power analysis shows the learned kernel's advantage becomes detectable at approximately twelve deployed cities.
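
For readers unfamiliar with the limiting case named in the abstract, the classical Bühlmann-Straub estimator blends a unit's own experience with the collective rate using a weight Z = w / (w + k), where w is the exposure and k is the ratio of expected process variance to the variance of hypothetical means. The sketch below shows only that classical formula. The exposures, rates, and k are hypothetical values chosen for the arithmetic, not figures from the paper, and the learned ODD-similarity kernel is not modeled.

```python
def credibility_weight(exposure, k):
    """Buhlmann-Straub credibility factor Z = w / (w + k)."""
    return exposure / (exposure + k)

def credibility_rate(own_rate, collective_rate, exposure, k):
    """Credibility-weighted rate: Z * own experience + (1 - Z) * collective rate."""
    z = credibility_weight(exposure, k)
    return z * own_rate + (1.0 - z) * collective_rate

# Hypothetical inputs: exposure in millions of miles, rates in crashes per million miles.
k = 50.0
z_small = credibility_weight(10.0, k)   # thin experience, low weight
z_large = credibility_weight(40.0, k)   # more experience, higher weight
rate = credibility_rate(own_rate=8.0, collective_rate=5.0, exposure=10.0, k=k)
print(round(z_small, 3), round(z_large, 3), round(rate, 3))
```

As exposure grows, Z approaches 1 and the estimate relies on the unit's own experience; with sparse exposure the estimate shrinks toward the collective rate, which is the partial pooling behavior the abstract refers to.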

2606.16878 2026-06-17 cs.LG New submission

Integrated Marketing Attribution: A Bayesian Framework for Privacy-Safe Granular Measurement Anchored in MMM

Meghana R. Bhat, Ankit Umare, Utsav Aggarwal, Richard Vecsler, Arunkumar Mani, Karthik Nair, Chandhu Nair

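AI summary Proposes the Integrated Marketing Attribution (IMA) framework, which combines Marketing Mix Modeling (MMM) with Bayesian attribution models to derive campaign-level effects from aggregated data, giving privacy-safe, granular attribution.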

Abstract

Retail marketing measurement increasingly requires granular campaign-level insights without relying on user-level tracking. However, the two dominant approaches, Marketing Mix Modeling (MMM) and Multi-Touch Attribution (MTA), often produce fragmented insights. MMM is privacy-safe and robust for channel-level planning but is too coarse for campaign optimization, while MTA provides granular attribution but has become less reliable under increasing privacy restrictions. We propose Integrated Marketing Attribution (IMA), a unified framework that combines MMM with channel-specific Bayesian attribution models to derive campaign-level effects from aggregated data. By leveraging MMM-informed priors, IMA delivers granular, privacy-safe attribution while preserving consistency with MMM.

2606.12867 2026-06-17 cs.LG New submission

SMGFM: Spectral Multimodal Graph Pretraining for Multimodal-Attributed Graphs

Zhengyu Wu, Xu Wang, Hongchao Qin, Xunkai Li, Guang Zeng, Rong-Hua Li, Guoren Wang

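AI summary Proposes SMGFM, a framework that uses graph spectral decomposition to separate structure-induced from modality-specific semantics and fuses modalities through frequency-band routing, achieving state-of-the-art performance on graph-level and modality-level tasks.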

Abstract

Multimodal-attributed graphs (MAGs) couple graph topology with node semantics from text, images, and other modalities. Traditional graph learning contextualizes node semantics by coupling topology with node features. However, this coupling design becomes troublesome in MAGs, where structure-induced and modality-intrinsic semantics may contribute differently to downstream tasks. Structure-induced semantics promote relational consistency through smooth topological variation, whereas modality-intrinsic semantics often encode local, fine-grained distinctions that should not be uniformly smoothed or aligned. Therefore, the key challenge is to identify semantic roles before cross-modal fusion. To this end, we leverage graph-frequency variation as a prior, where low-frequency components capture topology-consistent semantics and high-frequency components preserve modality-specific semantics. Based on this intuition, we propose SMGFM, a spectral multimodal graph pretraining framework that decomposes each modality-specific node signal into graph-frequency bands and assigns band-level semantic roles before cross-modal interaction. Concretely, SMGFM constructs frequency-resolved modality tokens with scalable Chebyshev filters, estimates their coupling reliability through topology-conditioned routing, and performs band-modality interaction before fusion. Its frequency-routed objectives align smooth consensus routes while preserving modality-specific routes, mitigating spatial-domain entanglement and uniform cross-modal alignment. Extensive experiments conducted on the MAG datasets demonstrate that SMGFM achieves state-of-the-art performance across graph-level and modality-level tasks.

2502.17518 2026-06-17 cs.LG cs.AI q-fin.CP stat.ML Updated version

Ensemble RL through Classifier Models: Enhancing Risk-Return Trade-offs in Trading Strategies

Zheli Xiong

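AI summary This paper studies ensemble reinforcement learning (RL) models for financial trading strategies, using classifier models to improve performance. By combining RL algorithms such as A2C, PPO, and SAC with traditional classifiers such as Support Vector Machines (SVM), Decision Trees, and Logistic Regression, it examines how different classifier groups can be integrated to improve risk-return trade-offs. The study evaluates several ensemble methods against individual RL models on key financial metrics, including cumulative return, Sharpe Ratio (SR), Calmar Ratio, and Maximum Drawdown (MDD). The results indicate that ensemble methods consistently outperform the base models in risk-adjusted returns, with better drawdown management and overall stability. However, ensemble performance is sensitive to the choice of variance threshold τ, which underlines the importance of adjusting τ dynamically for best performance. The study highlights the value of combining RL with classifiers for adaptive decision-making, with implications for financial trading, robotics, and other dynamic environments.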

Comments 23 pages, 10 figures, 9 tables

Abstract

This paper presents a comprehensive study on the use of ensemble Reinforcement Learning (RL) models in financial trading strategies, leveraging classifier models to enhance performance. By combining RL algorithms such as A2C, PPO, and SAC with traditional classifiers like Support Vector Machines (SVM), Decision Trees, and Logistic Regression, we investigate how different classifier groups can be integrated to improve risk-return trade-offs. The study evaluates the effectiveness of various ensemble methods, comparing them with individual RL models across key financial metrics, including Cumulative Returns, Sharpe Ratios (SR), Calmar Ratios, and Maximum Drawdown (MDD). Our original experimental results demonstrate that ensemble methods often outperform base models in terms of risk-adjusted returns, providing better management of drawdowns and overall stability. However, both the original analysis and the additional reproduction reported in this version show that ensemble performance is sensitive to the choice of variance threshold τ, classifier group, RL-agent pair, and market universe. The reproduction evidence strengthens the conclusion that classifier-assisted ensemble selection can improve robustness, while also clarifying that the advantage is conditional rather than automatic across all datasets. This study emphasizes the value of combining RL with classifiers for adaptive decision-making, with implications for financial trading, robotics, and other dynamic environments.

2602.22159 2026-06-17 cs.CV Updated version

CASR: A Robust Cyclic Framework for Arbitrary Large-Scale Super-Resolution with Distribution Alignment and Self-Similarity Awareness

Wenhao Guo, Zhaoran Zhao, Peng Lu, Sheng Li, Qian Qiao, DeRui Li

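AI summary CASR uses distribution alignment and self-similarity awareness to address distribution drift and diffusion inconsistency in large-scale super-resolution, enabling stable inference and efficient single-model processing.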

Abstract

Arbitrary-scale image super-resolution (ASISR) remains fundamentally limited by cross-scale distribution shift: once the inference scale leaves the training range, noise, blur, and artifacts accumulate sharply. We revisit this challenge from a cross-scale distribution transition perspective and propose CASR, a simple yet highly efficient cyclic SR framework that reformulates ultra-magnification as a sequence of in-distribution scale transitions. This design ensures stable inference at arbitrary scales while requiring only a single model. CASR tackles two major bottlenecks: distribution drift across iterations and patch-wise diffusion inconsistencies. The proposed SSAM module aligns structural distributions via superpixel aggregation, preventing error accumulation, while the SARM module restores high-frequency textures by enforcing correlation-guided consistency and preserving self-similarity structure through correlation alignment. Despite using only a single model, our approach significantly reduces distribution drift, preserves long-range texture consistency, and achieves superior generalization even at extreme magnification.

2605.09313 2026-06-17 cs.CV Updated version

Attention Sinks in Diffusion Transformers: A Causal Analysis

Fangzheng Wu, Brian Summa

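AI summary The study examines the role of attention sinks in diffusion transformers. By dynamically identifying and suppressing dominant attention recipients, it finds that they have limited effect on text-image alignment and preference proxies, although a metric-dependent boundary appears under stronger interventions.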

Abstract

Attention sinks -- tokens that receive disproportionate attention mass -- are assumed to be functionally important in autoregressive language models, but their role in diffusion transformers remains unclear. We present a causal analysis in text-to-image diffusion, dynamically identifying dominant attention recipients per timestep and suppressing them via paired, training-free interventions on the score and value paths. Across 553 GenEval prompts on Stable Diffusion 3 (with SDXL corroboration), removing these sinks does not degrade text-image alignment (CLIP-T) or preference proxies (ImageReward, HPS-v2) at k = 1; only under stronger interventions (k >= 10) does HPS-v2 exhibit a metric-dependent boundary, while CLIP-T remains robust throughout. The perceptual shifts induced by suppression are nonetheless sink-specific -- about 6x larger than equal-budget random masking -- revealing an empirical dissociation between trajectory-level perturbation and semantic alignment in diffusion transformers. Code available at https://github.com/wfz666/ICML26-attention-sink.

2601.01762 2026-06-17 cs.RO cs.CV Updated version

AlignDrive: Aligned Lateral-Longitudinal Planning for End-to-End Autonomous Driving

Yanhao Wu, Haoyang Zhang, Fei He, Rui Wu, Yanhu Shan, Congpei Qiu, Liang Gao, Wei Ke, Tong Zhang

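AI summary The paper proposes a cascaded framework that turns longitudinal planning into a path-conditioned reasoning process, improving coordination and safety in autonomous driving. The method introduces an anchor-based regression design and a planning-oriented data augmentation strategy, and reaches SOTA performance on Bench2Drive.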

Comments under review

Abstract

Practical autonomous driving requires models that generalize by reasoning through spatial-temporal possibilities to exclude unsafe outcomes. While state-of-the-art (SOTA) methods use parallel planning architectures, they fail to explicitly couple speed decisions with agent behavior along the driving path, leading to suboptimal coordination. To address this, we propose a cascaded framework that transforms longitudinal planning from an independent prediction task into a path-conditioned reasoning process. On the model side, we introduce an anchor-based regression design that conditions longitudinal prediction on the lateral drive path, and reformulate longitudinal planning as 1D displacement prediction along the path. This reduces geometric uncertainty and sharpens the model's focus on interaction-driven dynamics. On the data side, we introduce a planning-oriented data augmentation strategy that simulates rare safety-critical events by programmatically inserting agents and relabeling longitudinal targets to enforce collision avoidance. Evaluated on the challenging Bench2Drive benchmark, our method achieves SOTA performance with a driving score of 89.07 and a success rate of 73.18%, demonstrating significantly improved coordination and safety. Further evaluation on Fail2Drive confirms strong generalization to rare edge cases where parallel formulations typically fail. Project page: https://yanhaowu.github.io/AlignDrive/.

2605.08827 2026-06-17 cs.AI Updated version

Mental Health AI Safety Claims Must Preserve Temporal Evidence

Srimonti Dutta, Ratna Kandala

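AI summary The paper argues that safety evaluations of mental health AI often ignore the temporal dimension. It proposes the SCOPE-MH standard so that evaluations preserve temporal evidence, reveals failure mechanisms such as gradual deterioration across a conversation, and stresses that temporal evidence is necessary for safe deployment.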

Abstract

The safety of mental health AI is often judged at the wrong temporal scale. Current evaluations typically score isolated responses, endpoint outcomes, or aggregate dialogue quality, while clinically consequential failures may arise from the order and accumulation of interactions themselves, including delayed escalation, repeated reinforcement, dependency formation, failed repair, and gradual deterioration across turns. This paper argues that this mismatch is not merely a limitation of evaluation coverage but a source of invalid safety conclusions. We introduce Temporal Safety Non-Identifiability, a formal account of why safety properties that depend on sequence, timing, accumulation, or recovery cannot be certified by protocols that discard those features. From this formalization, we develop SCOPE (Safety Claims Over Preserved Evidence) as a general principle for aligning safety claims with the evidence an evaluation actually retains, and instantiate it as SCOPE-MH, a mental-health instantiation of this reporting standard. We operationalize SCOPE-MH through a proof-of-concept on the AnnoMI dataset of expert-annotated motivational interviewing conversations, which reveals mechanisms of failure that per-turn behavior scoring does not represent. We propose SCOPE-MH as a diagnostic complement to existing evaluation infrastructure and argue that evaluation preserving temporal evidence is necessary, not optional, for safety-critical mental health AI deployment.

2605.08077 2026-06-17 cs.CL Updated version

Conformal Path Reasoning: Trustworthy Knowledge Graph Question Answering via Path-Level Calibration

Shuhang Lin, Chuhao Zhou, Xiao Lin, Zihan Dong, Kuan Lu, Zhencan Peng, Jie Yin, Dimitris N. Metaxas

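AI summary Proposes the Conformal Path Reasoning (CPR) framework, which uses query-level conformal calibration and a Residual Conformal Value Network (RCVNet) to learn discriminative path-level nonconformity scores, reducing prediction set size by 52% on average while keeping coverage valid.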

Comments 13 pages, 3 figures, 2 tables

Abstract

Knowledge Graph Question Answering (KGQA) offers grounded, interpretable reasoning, but existing methods often fail to provide reliable coverage guarantees over retrieved answers. While Conformal Prediction (CP) offers a principled framework for producing prediction sets with statistical guarantees, prior conformal KGQA methods suffer from two critical pitfalls: violated coverage guarantees due to invalid calibration, and weak score discriminability that yields excessively large prediction sets. We propose Conformal Path Reasoning (CPR), a novel trustworthy KGQA framework built on two key innovations. First, query-level conformal calibration over path-level scores preserves exchangeability to ensure valid coverage guarantees. Second, we introduce the Residual Conformal Value Network (RCVNet), a lightweight module trained via PUCT-guided exploration to learn discriminative path-level nonconformity scores. Extensive experiments show that CPR significantly improves the Empirical Coverage Rate by 45% while reducing prediction set size by 52% on average over conformal baselines across benchmark datasets, highlighting its effectiveness for reliable conformal reasoning over knowledge graphs.
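
As background for the coverage guarantee mentioned above, generic split conformal prediction calibrates a threshold on held-out nonconformity scores and keeps every candidate whose score does not exceed it. The sketch below shows that textbook procedure only. It is not CPR's query-level calibration or RCVNet, and the scores and path names are hypothetical.

```python
import math

def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold: the ceil((n + 1) * (1 - alpha))-th smallest score.

    Returns infinity when the calibration set is too small for the requested level,
    in which case every candidate must be kept.
    """
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(cal_scores)[k - 1]

def prediction_set(candidate_scores, qhat):
    """Keep candidates whose nonconformity score is at most the threshold."""
    return [name for name, s in candidate_scores.items() if s <= qhat]

# Hypothetical nonconformity scores of the correct answer on 9 calibration queries.
cal_scores = [0.05, 0.10, 0.12, 0.20, 0.25, 0.30, 0.35, 0.50, 0.70]
qhat = conformal_threshold(cal_scores, alpha=0.25)

# Hypothetical candidate reasoning paths for one test query (lower score = more plausible).
candidates = {"path_a": 0.08, "path_b": 0.45, "path_c": 0.90}
answers = prediction_set(candidates, qhat)
print(qhat, answers)
```

The guarantee rests on calibration and test scores being exchangeable, which is why the abstract stresses that calibration must happen at a level where that property holds. More discriminative scores push wrong candidates above the threshold and shrink the set.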

2504.11837 2026-06-17 cs.CL cs.AI Updated version

EmoFSM: A Finite State Machine for Emotional Support Conversation

Yue Zhao, Qingqing Gu, Xiaoyu Wang, Teng Chen, Zhonglin Jiang, Yong Chen, Hongyan Li, Luo Ji

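AI summary To address weak long-term satisfaction in emotional support conversation, the paper proposes EmoFSM, a framework that uses a finite state machine to guide a large language model's planning and self-reasoning, and outperforms a range of baselines on several datasets.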

Comments 15 pages, 4 figures. PAKDD 2026

Abstract

Emotional support conversation (ESC) aims to alleviate people's emotional distress through effective conversations. Although large language models (LLMs) have made remarkable progress in ESC, most of these studies may not define the conversation as a state diagram from a state-model perspective, and so provide a suboptimal solution for long-term satisfaction. To address this issue, we apply the Finite State Machine (FSM) to LLMs and propose a framework called EmoFSM. Our framework allows a single LLM to bootstrap the planning during ESC and to self-reason about the seeker's emotion, the support strategy, and the final response at each conversation turn. Extensive experiments on ESC datasets suggest that EmoFSM outperforms many baselines, including direct inference, self-fine, chain of thought, finetuning, and externally supported methods, even those with many more parameters.

2604.24696 2026-06-17 cs.CV Updated version

NeuroClaw Technical Report

Cheng Wang, Zhibin He, Zhihao Peng, Shengyuan Liu, Yufan Hu, Carl Yang, Lifang He, Lichao Sun, Xiang Li, Yixuan Yuan

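AI summary To handle multimodal data, long pipelines, and reproducibility challenges in neuroimaging, the paper presents NeuroClaw, a multi-agent research assistant that achieves executable, reproducible neuroimaging analysis through data-grounded decisions, environment management, and a three-tier skill architecture, and substantially outperforms direct agent invocation on the NeuroBench benchmark.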

Abstract

Agentic artificial intelligence systems promise to accelerate scientific workflows, but neuroimaging poses unique challenges: heterogeneous modalities (sMRI, fMRI, dMRI, EEG), long multi-stage pipelines, and persistent reproducibility risks. To address this gap, we present NeuroClaw, a domain-specialized multi-agent research assistant for executable and reproducible neuroimaging research. NeuroClaw operates directly on raw neuroimaging data across formats and modalities, grounding decisions in dataset semantics and BIDS metadata so users need not prepare curated inputs or bespoke model code. The platform combines harness engineering with end-to-end environment management, including pinned Python environments, Docker support, automated installers for common neuroimaging tools, and GPU configuration. In practice, this layer emphasizes checkpointing, post-execution verification, structured audit traces, and controlled runtime setup, making toolchains more transparent while improving reproducibility and auditability. A three-tier skill/agent hierarchy separates user-facing interaction, high-level orchestration, and low-level tool skills to decompose complex workflows into safe, reusable units. Alongside the NeuroClaw framework, we introduce NeuroBench, a system-level benchmark for executability, artifact validity, and reproducibility readiness. Across multiple multimodal LLMs, NeuroClaw-enabled runs yield consistent and substantial score improvements compared with direct agent invocation. Project homepage: https://cuhk-aim-group.github.io/NeuroClaw/index.html

2605.00725 2026-06-17 cs.LG Updated version

Weisfeiler Lehman Test on Combinatorial Complexes: Generalized Expressive Power of Topological Neural Networks

Jiawen Chen, Qi Shao, Zhiqiang Ge, Duxin Chen, Wenwu Yu

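AI summary Proposes the Combinatorial Complex Weisfeiler-Lehman (CCWL) framework, which unifies the expressive power of topological neural networks through four structural neighborhoods, proves that under specific conditions it can be reduced to lower- and upper-adjacent bridge information alone, instantiates the reduction as the CCIN network, and validates it experimentally.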

Abstract

Topological neural networks have emerged as effective tools for modeling higher-order relational structures beyond pairwise graphs, including hypergraphs, simplicial complexes, and cell complexes. However, existing Weisfeiler-Leman type expressivity analyses are typically developed on different structural domains and rely on domain-specific neighborhood systems, making their expressive powers difficult to compare within a common formalism. In this paper, we introduce the Combinatorial Complex Weisfeiler-Leman (CCWL) framework, a unified expressive power refinement defined on combinatorial complexes. By exploiting the ability of combinatorial complexes to represent both set-type relations and part-whole hierarchies, CCWL performs topological color refinement through four structural neighborhoods: boundary, co-boundary, lower adjacency, and upper adjacency. We show that, under specified lifting maps, CCWL can simulate several domain-specific WL-type refinements, thereby providing a common theoretical baseline for analyzing topological message passing. We further study the neighborhood sufficiency problem and prove that, under explicit coverage conditions, a reduced refinement using only lower- and upper-adjacent bridge information preserves the distinguishing power of the full four-neighborhood CCWL refinement. Guided by this theoretical result, we instantiate the reduced refinement as the Combinatorial Complex Isomorphism Network (CCIN). Experiments on synthetic and real-world benchmarks demonstrate that CCIN achieves competitive performance against representative graph and topological neural network baselines. Ablation studies and resource-efficiency analyses further support the effectiveness of the proposed lower/upper-neighborhood design.
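
Color refinement is easiest to see in its classical form on ordinary graphs, where each node has a single neighborhood. The sketch below implements that 1-WL test only; CCWL as described above refines colors over four neighborhood types on combinatorial complexes, which this example does not attempt. The graphs and the function name are illustrative.

```python
from collections import Counter

def wl_histogram(adj, rounds=3):
    """Classical 1-WL color refinement on a graph given as {node: [neighbors]}.

    Colors are nested tuples (own color, sorted neighbor colors), so they are
    directly comparable across graphs without a shared relabeling step.
    """
    colors = {v: () for v in adj}
    for _ in range(rounds):
        colors = {
            v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
            for v in adj
        }
    return Counter(colors.values())

# A 6-cycle and two disjoint triangles: both 2-regular, a known blind spot of 1-WL.
cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}

# A path and a star on four nodes: different degree sequences, so 1-WL separates them.
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star4 = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

same = wl_histogram(cycle6) == wl_histogram(two_triangles)
different = wl_histogram(path4) != wl_histogram(star4)
print(same, different)
```

The first pair shows why richer neighborhood structure is of interest: two non-isomorphic graphs receive identical color histograms because every node sees the same multiset of neighbor colors at every round.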

2605.00330 2026-06-17 cs.LG Updated version

Conformalized Quantum DeepONet Ensembles for Scalable Operator Learning with Distribution-Free Uncertainty

Purav Matlia, Christian Moya, Guang Lin

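AI summary The paper proposes a framework that combines quantum orthogonal neural networks with adaptive conformal prediction to address quadratic inference complexity and uncertainty quantification in operator learning for high-dimensional dynamical systems, and compresses multiple models into a single circuit for efficient parallel execution.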

Abstract

Operator learning enables fast surrogate modeling of high-dimensional dynamical systems, but existing approaches face two fundamental limitations: quadratic inference complexity and unreliable uncertainty quantification in safety-critical settings. We propose Conformalized Quantum DeepONet Ensembles, a framework that addresses both challenges simultaneously. By leveraging Quantum Orthogonal Neural Networks (QOrthoNNs), we reduce operator inference complexity from O(n^2) to O(n), enabling scalable evaluation over fine discretizations. To provide rigorous uncertainty quantification, we combine ensemble-based epistemic modeling with adaptive conformal prediction, yielding distribution-free coverage guarantees. A key challenge in ensembling is that naive parallelism scales hardware resources linearly with the number of models. We resolve this by using Superposed Parameterized Quantum Circuits (SPQCs), which compress multiple ensemble members into a single circuit and enable simultaneous multi-model execution. Experiments on synthetic partial differential equations and real-world power system dynamics demonstrate that our approach achieves accurate predictions while maintaining calibrated uncertainty under realistic quantum noise. These results establish a practical pathway toward scalable, uncertainty-aware operator learning in quantum machine learning.

2604.18701 2026-06-17 cs.LG cs.AI stat.ML 版本更新

Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training

Curiosity-Critic:累积预测误差改进作为世界模型训练的可处理内在奖励

Vin Bhaskara, Haicheng Wang

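AI summary: Proposes Curiosity-Critic, which uses a tractable per-step surrogate (the difference between the current prediction error and an asymptotic error baseline) as the intrinsic reward, estimates the baseline online with a co-trained critic, separates reducible from irreducible prediction error, and outperforms existing methods in stochastic grid-world experiments.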

Comments Accepted to ICML 2026 Workshop on Epistemic Intelligence in Machine Learning (EIML@ICML 2026). Code: https://github.com/vinbhaskara/Curiosity-Critic

详情
AI中文摘要

基于局部预测误差的好奇心奖励仅关注当前转移,而不考虑世界模型在所有已访问转移上的累积预测误差。我们引入了Curiosity-Critic,其内在奖励基于这一累积目标的改进,并证明它有一个可处理的每步替代项:当前预测误差与当前状态转移的渐近误差基线之间的差值。我们通过一个与世界模型共同训练的评论家在线估计这一误差基线;由于评论家只需学习一个转移的预测难度,其对不可约噪声基线的估计在世界模型饱和之前就已收敛,从而将探索引导向可学习的转移。该奖励对可学习转移较高,而对随机转移趋近于零,从而在线分离认知(可约)和偶然(不可约)预测误差。从Schmidhuber(1991)到学习特征空间变体的先前预测误差好奇心公式,都作为该误差基线的特定近似特例出现。在随机网格世界上的实验表明,Curiosity-Critic在训练速度和最终世界模型准确性上优于基于预测误差、访问计数和随机网络蒸馏的方法。

英文摘要

Local prediction-error-based curiosity rewards focus on the current transition without considering the world model's cumulative prediction error across all visited transitions. We introduce Curiosity-Critic, which grounds its intrinsic reward in the improvement of this cumulative objective, and show that it admits a tractable per-step surrogate: the difference between the current prediction error and the asymptotic error baseline of the current state transition. We estimate this error baseline online with a learned critic co-trained alongside the world model; since the critic only has to learn how hard a transition is to predict, its estimate of the irreducible noise floor converges well before the world model saturates, redirecting exploration toward learnable transitions. The reward is higher for learnable transitions and collapses toward zero for stochastic ones, thereby separating epistemic (reducible) from aleatoric (irreducible) prediction error online. Prior prediction-error curiosity formulations, from Schmidhuber (1991) to learned-feature-space variants, emerge as special cases corresponding to specific approximations of this error baseline. Experiments on a stochastic grid world show that Curiosity-Critic outperforms prediction-error, visitation-count, and Random Network Distillation methods in training speed and final world model accuracy.
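The per-step surrogate described above amounts to a subtraction. The sketch below assumes the critic's baseline estimate is already available as a number; `intrinsic_reward` and the sample values are hypothetical, and how the critic is trained is not shown.

```python
def intrinsic_reward(prediction_error, baseline_estimate):
    """Per-step surrogate: current error minus the critic's estimated error floor."""
    return prediction_error - baseline_estimate

# A learnable transition: the error is still well above its estimated floor.
learnable = intrinsic_reward(prediction_error=0.75, baseline_estimate=0.25)

# A stochastic transition: the error already sits at its irreducible floor.
stochastic = intrinsic_reward(prediction_error=0.5, baseline_estimate=0.5)
```

The second case shows why the reward collapses toward zero on noisy transitions: a large raw error earns nothing once the critic attributes it to irreducible noise.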

2604.24357 2026-06-17 cs.LG cs.AI 版本更新

DPRM: A Plug-in Doob h-transform-induced Token-Ordering Module for Diffusion Language Models

DPRM: 一种用于扩散语言模型的即插即用Doob h变换诱导的令牌排序模块

Dake Bu, Wei Huang, Andi Han, Hau-San Wong, Qingfu Zhang, Taiji Suzuki, Atsushi Nitanda

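AI summary: Proposes DPRM, a plug-in module that improves token ordering for diffusion language models by shifting gradually from confidence-driven to process-reward-guided ordering through online estimates; across nine hosts, DPRM order variants improve several language, DNA, and multimodal settings.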

详情
AI中文摘要

扩散语言模型生成时没有固定的从左到右顺序,令牌排序是一个核心算法选择。现有系统主要使用随机掩码或置信度驱动排序,分别存在训练-测试不匹配和短视探索的问题。我们引入DPRM(Doob h变换过程奖励模型),一个即插即用的令牌排序模块,保持宿主架构、去噪目标和监督不变,仅修改排序策略。DPRM从置信度驱动排序开始,通过在线估计逐渐过渡到过程奖励引导排序。我们将精确的DPRM策略描述为奖励倾斜的Gibbs揭示律,证明其阶段式Soft-BoN近似的收敛性,表明在线分桶控制器以经验Bernstein速率跟踪精确的DPRM分数,并在可处理的优化假设下建立样本复杂度优势。在涵盖语言推理、测试时扩展、蛋白质、单细胞、分子、DNA、文本到图像生成和VQA的九个宿主中,DPRM排序变体改进了多个语言、DNA和多模态设置,同时也识别了仅置信度排序或任务特定效用更优的边界情况。代码见:https://github.com/DakeBU/DPRM-DLLM

英文摘要

Diffusion language models generate without a fixed left-to-right order, leaving token ordering as a central algorithmic choice. Existing systems mainly use random masking or confidence-driven ordering, which respectively suffer from train-test mismatch and myopic exploration. We introduce DPRM (Doob h-transform Process Reward Model), a plug-in token-ordering module that keeps the host architecture, denoising objective and supervision unchanged, and modifies only the ordering policy. DPRM starts from confidence-driven ordering and gradually shifts to process-reward-guided ordering through online estimates. We characterize the exact DPRM policy as a reward-tilted Gibbs reveal law, prove convergence of its stagewise Soft-BoN approximation, show that the online bucketized controller tracks the exact DPRM score at empirical-Bernstein rates, and establish a sample-complexity advantage under tractable optimization assumptions. Across nine hosts covering language reasoning, test-time scaling, protein, single-cell, molecular, DNA, text-to-image generation, and VQA, DPRM order variants improve several language, DNA, and multimodal settings while also identifying boundary cases where confidence-only ordering or task-specific utilities are preferable. Code is available at: https://github.com/DakeBU/DPRM-DLLM
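One way to picture a reward-tilted reveal rule is a confidence-based prior over masked positions reweighted by an exponential of a reward estimate. The sketch below is an assumption about the general shape only, not the paper's exact law: `reveal_distribution`, the temperature-like parameter `beta`, and the sample numbers are hypothetical.

```python
import math

def reveal_distribution(confidences, rewards, beta):
    """Tilt a confidence-based prior over masked positions by exp(beta * reward)."""
    weights = [c * math.exp(beta * r) for c, r in zip(confidences, rewards)]
    total = sum(weights)
    return [w / total for w in weights]

confidences = [0.5, 0.25, 0.25]  # hypothetical normalized model confidences
rewards = [0.0, 1.0, 0.0]        # hypothetical process-reward estimates

p_conf = reveal_distribution(confidences, rewards, beta=0.0)  # pure confidence
p_tilt = reveal_distribution(confidences, rewards, beta=5.0)  # reward-dominated
```

With `beta = 0` the rule reduces to confidence-driven ordering; raising `beta` moves probability toward positions with higher estimated reward, mirroring the gradual shift the abstract describes.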

2604.22128 2026-06-17 cs.CL cs.LG 版本更新

Dissociating Decodability and Causal Use in Bracket-Sequence Transformers

括号序列Transformer中可解码性与因果使用的分离

Aryan Sharma, Cutter Dawes, Shivam Raval

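AI summary: Using probing and intervention experiments on Transformers trained on the Dyck language, finds that hierarchical representations are decodable, but that only attention to the top-of-stack position has a strong causal effect on long-distance accuracy.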

详情
AI中文摘要

当在需要理解层级结构的任务上训练时,Transformer被发现以不同方式表示这种层级:在残差流的几何结构中,以及在维持后进先出顺序的类栈注意力模式中。然而,这些表示是被因果使用还是仅仅可解码仍不清楚。我们在Dyck语言(一种平衡括号序列的形式语言)上训练的Transformer中检验了这一差距,其中层级真实标签是明确的。通过探针和干预残差流及注意力模式,我们发现深度、距离和栈顶信号都是可解码的,但它们的因果作用不同。具体而言,掩盖真实栈顶位置的注意力会导致长距离准确性急剧下降,而消融低维残差流子空间则影响相对较小。这些结果扩展到模板化的自然语言设置,表明即使在相关层级变量已知的受控设置中,仅可解码性并不意味着因果使用。

英文摘要

When trained on tasks requiring an understanding of hierarchical structure, transformers have been found to represent this hierarchy in distinct ways: in the geometry of the residual stream, and in stack-like attention patterns maintaining a last-in, first-out ordering. However, it remains unclear whether these representations are causally used or merely decodable. We examine this gap in transformers trained on the Dyck language (a formal language of balanced bracket sequences), where the hierarchical ground truth is explicit. By probing and intervening on the residual stream and attention patterns, we find that depth, distance, and top-of-stack signals are all decodable, yet their causal roles diverge. Specifically, masking attention to the true top-of-stack position causes a sharp drop in long-distance accuracy, while ablating low-dimensional residual stream subspaces has comparatively little effect. These results, which extend to a templated natural language setting, suggest that even in a controlled setting where the relevant hierarchical variables are known, decodability alone does not imply causal use.
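The "true top-of-stack position" that the intervention masks can be computed directly from a bracket sequence. The following is a small sketch under assumed conventions: the bracket alphabet in `PAIRS` and the choice to record the top after reading each token are illustrative, not taken from the paper.

```python
PAIRS = {"(": ")", "[": "]"}  # hypothetical two-bracket Dyck alphabet

def top_of_stack_positions(seq):
    """For each position, the index of the most recent unmatched opener (or None)."""
    stack, tops = [], []
    for i, ch in enumerate(seq):
        if ch in PAIRS:
            stack.append(i)  # push the opener's position
        else:
            if not stack or PAIRS[seq[stack[-1]]] != ch:
                raise ValueError(f"unbalanced sequence at index {i}")
            stack.pop()  # the closer matches the current top
        tops.append(stack[-1] if stack else None)
    return tops

tops = top_of_stack_positions("([])")
```

An attention mask for the ablation would then zero out, at each query position, the key at the index returned here.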

2604.19762 2026-06-17 cs.CL 版本更新

Evidence of Layered Positional and Directional Constraints in the Voynich Manuscript: Implications for Cipher-Like Structure

伏尼契手稿中分层位置和方向约束的证据:对类密码结构的影响

Christophe Parisel

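AI summary: An analysis of the Voynich Manuscript's grapheme sequences finds a two-layer structure, with right-to-left optimization inside words and left-to-right dependency at word boundaries, a directional dissociation absent from the four comparison languages; neither of the two generator classes tested reproduces all four signatures simultaneously, suggesting cipher-like structural constraints that are hard to reproduce with simple positional or frequency-based mechanisms.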

详情
AI中文摘要

伏尼契手稿(VMS)展示了一种起源不明的文字,其字素序列一直抗拒语言学分析。我们对其字素序列进行了系统分析,揭示了两个互补的结构层:词内序列中字符级的从右到左优化,以及词边界处的从左到右依赖,这种方向分离在我们四种对比语言(英语、法语、希伯来语、阿拉伯语)中均未观察到。我们进一步根据一个四签名联合标准评估了两类结构化生成器:一个参数化的槽位生成器和一个实现Rugg(2004)胡言乱语假设的卡尔达诺格栅。在其全部测试参数空间中,两类生成器均无法同时再现所有四个签名。虽然这些结果并未排除我们未测试的生成器类别,但它们提供了第一个定量基准,未来任何关于VMS的生成或密码分析模型均可据此评估,并且表明VMS表现出类似密码的结构约束,这些约束难以仅通过简单的位置或频率机制复现。

英文摘要

The Voynich Manuscript (VMS) exhibits a script of uncertain origin whose grapheme sequences have resisted linguistic analysis. We present a systematic analysis of its grapheme sequences, revealing two complementary structural layers: a character-level right-to-left optimization in word-internal sequences and a left-to-right dependency at word boundaries, a directional dissociation not observed in any of our four comparison languages (English, French, Hebrew, Arabic). We further evaluate two classes of structured generator against a four-signature joint criterion: a parametric slot-based generator and a Cardan grille implementing Rugg's (2004) gibberish hypothesis. Across their full tested parameter spaces, neither class reproduces all four signatures simultaneously. While these results do not rule out generator classes we have not tested, they provide the first quantitative benchmarks against which any future generative or cryptanalytic model of the VMS can be evaluated, and they suggest that the VMS exhibits cipher-like structural constraints that are difficult to reproduce from simple positional or frequency-based mechanisms alone.

2604.17616 2026-06-17 cs.LG 版本更新

Conditional Attribution for Root Cause Analysis in Time-Series Anomaly Detection

时间序列异常检测中根因分析的条件归因

Shashank Mishra, Karan Patil, Cedric Schockaert, Didier Stricker, Jason Rambach

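AI summary: Proposes a conditional attribution framework that retrieves normal instances contextually similar to the anomalous observation for dependency-preserving explanations, performs retrieval in variational autoencoder latent spaces and UMAP manifold embeddings to handle high-dimensional time series, and improves root-cause identification accuracy and robustness on the SWaT and MSDS benchmarks.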

Comments Accepted at ECML PKDD. 16 pages, 8 figures, 13 tables, and an appendix

详情
Journal ref
ECML PKDD 2026
AI中文摘要

根因分析(RCA)对于时间序列异常检测在复杂真实世界系统的可靠运行中至关重要。现有的解释方法通常依赖于不切实际的特征扰动,并忽略时间依赖和跨特征依赖,导致归因不可靠。我们提出了一种条件归因框架,该框架相对于上下文相似的正常系统状态来解释异常。我们的方法不是使用边际或随机采样的基线,而是检索以异常观测为条件的代表性正常实例,从而实现依赖保持且操作上有意义的解释。为了支持高维时间序列数据,上下文检索在学习到的低维表示中进行,同时使用变分自编码器潜在空间和UMAP流形嵌入。通过将检索过程建立在系统学习到的流形之上,该策略避免了分布外伪影,并在保持计算效率的同时确保归因保真度。我们进一步引入了置信感知和时间评估指标,用于评估解释的可靠性和响应性。在SWaT和MSDS基准上的实验表明,所提出的方法在多个异常检测模型上持续提高了根因识别准确率、时间定位和鲁棒性。这些结果突显了条件归因在复杂时间序列系统中用于可解释异常诊断的实际效用。代码和模型见:https://github.com/dfki-av/Conditional-Attribution-for-Root-Cause-Analysis-in-Time-Series-Anomaly-Detection

英文摘要

Root cause analysis (RCA) for time-series anomaly detection is critical for the reliable operation of complex real-world systems. Existing explanation methods often rely on unrealistic feature perturbations and ignore temporal and cross-feature dependencies, leading to unreliable attributions. We propose a conditional attribution framework that explains anomalies relative to contextually similar normal system states. Instead of using marginal or randomly sampled baselines, our method retrieves representative normal instances conditioned on the anomalous observation, enabling dependency-preserving and operationally meaningful explanations. To support high-dimensional time-series data, contextual retrieval is performed in learned low-dimensional representations using both variational autoencoder latent spaces and UMAP manifold embeddings. By grounding the retrieval process in the system's learned manifold, this strategy avoids out-of-distribution artifacts and ensures attribution fidelity while maintaining computational efficiency. We further introduce confidence-aware and temporal evaluation metrics for assessing explanation reliability and responsiveness. Experiments on the SWaT and MSDS benchmarks demonstrate that the proposed approach consistently improves root-cause identification accuracy, temporal localization, and robustness across multiple anomaly detection models. These results highlight the practical utility of conditional attribution for explainable anomaly diagnosis in complex time-series systems. Code and models are available at: https://github.com/dfki-av/Conditional-Attribution-for-Root-Cause-Analysis-in-Time-Series-Anomaly-Detection.
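The retrieval idea can be sketched as a nearest-neighbour lookup in a latent space followed by a per-feature comparison against the retrieved baseline. Everything below is illustrative: the latent vectors stand in for encoder outputs, and absolute deviation stands in for whatever attribution method the authors actually use.

```python
def nearest_normal(anomaly_latent, normal_latents):
    """Index of the normal instance closest to the anomaly in latent space."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(normal_latents)),
               key=lambda i: sqdist(anomaly_latent, normal_latents[i]))

def attribute(anomaly, baseline):
    """Per-feature deviation from the retrieved, context-matched baseline."""
    return [abs(a - b) for a, b in zip(anomaly, baseline)]

# Hypothetical data: two normal operating regimes and one anomalous reading.
normal_features = [[1.0, 5.0, 2.0], [4.0, 5.0, 8.0]]
normal_latents = [[0.0, 0.0], [3.0, 3.0]]
anomaly_features = [4.0, 9.0, 8.0]
anomaly_latent = [2.5, 3.5]

idx = nearest_normal(anomaly_latent, normal_latents)
scores = attribute(anomaly_features, normal_features[idx])
root_cause = scores.index(max(scores))
```

Comparing against the first regime instead would spread blame across all three features, which is the kind of misleading attribution a context-matched baseline is meant to avoid.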

2603.18104 2026-06-17 cs.AI cs.DC cs.LG cs.NE 版本更新

Adaptive Domain Models: Bayesian Evolution, Warm Rotation, and Principled Training for Geometric and Neuromorphic AI

自适应领域模型:贝叶斯演化、热旋转与几何及神经形态AI的规范化训练

Houston Haynes

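AI summary: Proposes an alternative training architecture built on a dimensional type system, the Program Hypergraph, and the b-posit bounded-regime design, giving depth-independent training memory, exact gradient accumulation, and grade-preserving updates; introduces Bayesian distillation and warm rotation to support continuous adaptation and verifiable correctness of domain-specific models.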

Comments 32 pages, 3 figures

详情
AI中文摘要

当前AI训练假设在IEEE-754算术上进行反向模式自动微分。训练相对于推理的内存开销、优化器复杂性以及训练过程中几何属性的结构退化,都是该算术基底的后果。本文基于三项先前结果开发了一种替代训练架构:维度类型系统和确定性内存管理框架(Haynes 2026),将栈可分配梯度分配和精确quire累积确立为设计时可验证属性;程序超图(Haynes 2026),将几何代数计算中的级保持确立为类型级不变量;以及b-posit有界设计(Jonnalagadda et al. 2025),使posit算术在传统上被视为仅推理的硬件目标上变得可行。它们的组合实现了深度无关的训练内存(约为推理占用量的两倍)、级保持的权重更新和精确梯度累积,统一适用于损失函数优化和脉冲时序依赖的神经形态模型。我们引入了*贝叶斯蒸馏*,一种通过ADM训练机制提取通用模型潜在先验结构的机制,解决了领域特定训练的数据稀缺自举问题。对于部署,我们引入了*热旋转*,一种操作模式,其中更新后的模型在不中断服务的情况下过渡到活跃推理路径,并通过PHG证书和签名版本记录形式化正确性。结果是一类领域特定AI系统,比通用模型更小、更精确,持续自适应,相对于其领域的物理结构可验证正确,并且可从现有模型初始化。

英文摘要

Prevailing AI training assumes reverse-mode automatic differentiation over IEEE-754 arithmetic. The memory overhead of training relative to inference, optimizer complexity, and structural degradation of geometric properties through training are consequences of this arithmetic substrate. This paper develops an alternative training architecture grounded in three prior results: the Dimensional Type System and Deterministic Memory Management framework (Haynes 2026), which establishes stack-eligible gradient allocation and exact quire accumulation as design-time verifiable properties; the Program Hypergraph (Haynes 2026), which establishes grade preservation through geometric algebra computations as a type-level invariant; and the b-posit bounded-regime design (Jonnalagadda et al. 2025), which makes posit arithmetic tractable across hardware targets conventionally considered inference-only. Their composition enables depth-independent training memory bounded to approximately twice the inference footprint, grade-preserving weight updates, and exact gradient accumulation, applicable uniformly to loss-function-optimized and spike-timing-dependent neuromorphic models. We introduce *Bayesian distillation*, a mechanism by which the latent prior structure of a general-purpose model is extracted through the ADM training regime, resolving the data-scarcity bootstrapping problem for domain-specific training. For deployment, we introduce *warm rotation*, an operational pattern in which an updated model transitions into an active inference pathway without service interruption, with correctness formalized through PHG certificates and signed version records. The result is a class of domain-specific AI systems that are smaller and more precise than general-purpose models, continuously adaptive, verifiably correct with respect to the physical structure of their domains, and initializable from existing models.

2505.00986 2026-06-17 cs.LG cs.CV 版本更新

EmbodiTTA: Resource-Efficient Test-Time Adaptation for Embodied Visual Systems

EmbodiTTA:面向具身视觉系统的资源高效测试时自适应

Xiao Ma, Young D. Kwon, Dong Ma

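AI summary: Proposes OD-TTA, an on-demand test-time adaptation framework that combines lightweight domain shift detection, source domain selection, and decoupled batch normalization updates to adapt accurately and efficiently on edge devices while substantially reducing computation and energy overhead.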

详情
AI中文摘要

连续测试时自适应(CTTA)持续对每个到达的数据批次调整部署模型。虽然达到了最优精度,但现有的CTTA方法由于巨大的内存开销和能耗,在资源受限的边缘设备上实际应用性差。本文首先引入一种新范式——按需TTA,仅在检测到显著域移时触发自适应。然后,我们提出OD-TTA,一种用于边缘设备上准确高效自适应的按需TTA框架。OD-TTA包含三项创新技术:1)轻量级域移检测机制,仅在需要时激活TTA,大幅降低总体计算开销;2)源域选择模块,选择合适的源模型进行自适应,确保高且鲁棒的精度;3)解耦的批归一化(BN)更新方案,实现小批量下的内存高效自适应。大量实验表明,OD-TTA在显著降低能量和计算开销的同时,实现了可比甚至更好的性能,使TTA成为实际可行的技术。

英文摘要

Continual test-time adaptation (CTTA) continuously adapts the deployed model on every incoming batch of data. While achieving optimal accuracy, existing CTTA approaches present poor real-world applicability on resource-constrained edge devices, due to the substantial memory overhead and energy consumption. In this work, we first introduce a novel paradigm, on-demand TTA, which triggers adaptation only when a significant domain shift is detected. Then, we present OD-TTA, an on-demand TTA framework for accurate and efficient adaptation on edge devices. OD-TTA comprises three innovative techniques: 1) a lightweight domain shift detection mechanism to activate TTA only when it is needed, drastically reducing the overall computation overhead, 2) a source domain selection module that chooses an appropriate source model for adaptation, ensuring high and robust accuracy, and 3) a decoupled Batch Normalization (BN) update scheme to enable memory-efficient adaptation with small batch sizes. Extensive experiments show that OD-TTA achieves comparable or even better performance while reducing the energy and computation overhead remarkably, making TTA a practical reality.
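The on-demand trigger can be pictured as a cheap statistic compared against a threshold before any adaptation step runs. The sketch below assumes a simple detector (Euclidean distance between the batch feature mean and a stored source-domain mean); the abstract does not specify the actual mechanism, and `domain_shift`, `should_adapt`, and the threshold are hypothetical.

```python
import math

def domain_shift(batch_features, ref_mean):
    """Euclidean distance between the batch feature mean and a reference mean."""
    n = len(batch_features)
    dim = len(ref_mean)
    batch_mean = [sum(row[j] for row in batch_features) / n for j in range(dim)]
    return math.sqrt(sum((m - r) ** 2 for m, r in zip(batch_mean, ref_mean)))

def should_adapt(batch_features, ref_mean, threshold):
    """Trigger adaptation only when the measured shift exceeds the threshold."""
    return domain_shift(batch_features, ref_mean) > threshold

ref_mean = [0.0, 0.0]                  # hypothetical source-domain statistics
in_domain = [[0.5, -0.5], [-0.5, 0.5]]  # batch mean equals the reference
shifted = [[3.0, 4.0], [3.0, 4.0]]      # batch mean is 5 units away
```

Batches that pass the check skip the backward pass entirely, which is where the energy saving of an on-demand scheme would come from.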

2604.03444 2026-06-17 cs.LG cs.CL 版本更新

Olmo Hybrid: From Theory to Practice and Back

Olmo Hybrid:从理论到实践再回到理论

William Merrill, Yanhong Li, Tyler Romero, Anej Svete, Caia Costello, Pradeep Dasigi, Dirk Groeneveld, David Heineman, Bailey Kuehl, Nathan Lambert, Chuan Li, Kyle Lo, Saumya Malik, DJ Matusz, Benjamin Minixhofer, Jacob Morrison, Luca Soldaini, Finbarr Timbers, Pete Walsh, Noah A. Smith, Hannaneh Hajishirzi, Ashish Sabharwal

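AI summary: Combines theoretical analysis and experiments to argue that hybrid models (mixing attention with linear RNN layers) are more expressive and scale more efficiently than pure Transformers, and trains Olmo Hybrid, a 7B-parameter model that outperforms Olmo 3 on standard evaluations.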

Comments Corrected author list and typos in appendix

详情
AI中文摘要

近期工作展示了非Transformer语言模型(尤其是线性递归神经网络(RNN)和混合注意力与递归的混合模型)的潜力。然而,对于这些新架构的潜在优势是否值得承担规模化扩展的风险和努力,尚无共识。为解决此问题,我们从多个方面提供混合模型优于纯Transformer的证据。首先,理论上,我们证明混合模型不仅继承了Transformer和线性RNN的表达能力,还能表达超出两者的任务,例如代码执行。将这一理论付诸实践,我们训练了Olmo Hybrid,一个70亿参数模型,与Olmo 3 7B基本相当,但将滑动窗口层替换为Gated DeltaNet层。我们表明,在标准预训练和中期训练评估中,Olmo Hybrid优于Olmo 3,证明了混合模型在受控大规模设置下的优势。我们发现混合模型的扩展效率显著高于Transformer,这解释了其更高的性能。然而,尚不清楚为何特定形式问题上的更高表达能力会导致更好的扩展性或在下游任务(与这些问题无关)上表现更优。为解释这一明显差距,我们回到理论,论证为何增强的表达能力应转化为更好的扩展效率,从而完成循环。总体而言,我们的结果表明,混合注意力和递归层的混合模型是语言建模范式的强大扩展:不仅用于减少推理时的内存,更是获得在预训练中更好扩展的更具表达能力模型的基本途径。

英文摘要

Recent work has demonstrated the potential of non-transformer language models, especially linear recurrent neural networks (RNNs) and hybrid models that mix recurrence and attention. Yet there is no consensus on whether the potential benefits of these new architectures justify the risk and effort of scaling them up. To address this, we provide evidence for the advantages of hybrid models over pure transformers on several fronts. First, theoretically, we show that hybrid models do not merely inherit the expressivity of transformers and linear RNNs, but can express tasks beyond both, such as code execution. Putting this theory to practice, we train Olmo Hybrid, a 7B-parameter model largely comparable to Olmo 3 7B but with the sliding window layers replaced by Gated DeltaNet layers. We show that Olmo Hybrid outperforms Olmo 3 across standard pretraining and mid-training evaluations, demonstrating the benefit of hybrid models in a controlled, large-scale setting. We find that the hybrid model scales significantly more efficiently than the transformer, explaining its higher performance. However, it is unclear why greater expressivity on specific formal problems should result in better scaling or superior performance on downstream tasks unrelated to those problems. To explain this apparent gap, we return to theory and argue why increased expressivity should translate to better scaling efficiency, completing the loop. Overall, our results suggest that hybrid models mixing attention and recurrent layers are a powerful extension to the language modeling paradigm: not merely to reduce memory during inference, but as a fundamental way to obtain more expressive models that scale better during pretraining.

2603.18492 2026-06-17 cs.LG 版本更新

AIMER: Calibration-Free Task-Agnostic MoE Expert Pruning

AIMER: 免校准任务无关的MoE专家剪枝

Zongfang Liu, Guangyi Chen, Shengkun Tang, Yifan Shen, Huan Wang, Xin Yuan

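AI summary: Proposes AIMER, a calibration-free criterion for task-agnostic MoE expert pruning that identifies more distinct experts from the concentration pattern of expert weights and outperforms existing calibration-free methods on 7B to 47B models.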

详情
AI中文摘要

混合专家(MoE)语言模型在不增加每token计算量的情况下增加了参数容量,但部署时仍需存储全部专家池,因此专家剪枝对于减少内存和服务开销至关重要。现有的任务无关专家剪枝方法通常依赖校准:它们通过校准集上的路由或激活统计估计专家重要性,使得剪枝决策对校准数据变化敏感,同时引入大量预处理成本。我们提出AIMER(基于均方根绝对均值的重要性专家排序),一种简单的免校准准则,通过捕捉专家权重的集中度模式来识别更独特的专家,使其非常适合任务无关的专家剪枝。在具有不同架构的7B至47B MoE语言模型和16个多样化基准上,AIMER在跨任务能力平衡方面始终优于现有的免校准方法。令人惊讶的是,AIMER还比基于强校准的专家剪枝基线(在广泛使用的任务无关C4语料库上校准)实现了更好的平衡,同时仅需0.22–2.06秒即可对所有专家进行评分。

英文摘要

Mixture-of-Experts (MoE) language models increase parameter capacity without proportional per-token computation, yet deployment still requires storing the full expert pool, making expert pruning important for reducing memory and serving overhead. Existing task-agnostic expert-pruning methods are typically calibration-dependent: they estimate expert importance from routing or activation statistics on a calibration set, making pruning decisions sensitive to calibration-data variation while introducing substantial preprocessing cost. We propose AIMER (Absolute mean over root mean square IMportance for Expert Ranking), a simple calibration-free criterion that identifies more distinct experts by capturing the concentration pattern of expert weights, making it well suited for task-agnostic expert pruning. Across 7B to 47B MoE language models with distinct architectures and 16 diverse benchmarks, AIMER consistently delivers stronger capability balance across diverse tasks than existing calibration-free methods. Surprisingly, AIMER also achieves better balance than strong calibration-based expert-pruning baselines calibrated on the widely used task-agnostic C4 corpus, while requiring only 0.22 to 2.06 seconds to score all experts.
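Taking the acronym at face value, the score is the mean absolute weight divided by the root mean square of the weights. The sketch below computes that ratio on flat weight lists; how the paper aggregates across an expert's matrices, and whether high or low scores are pruned, are not stated in the abstract and are left open here. The function name is hypothetical.

```python
import math

def abs_mean_over_rms(weights):
    """Ratio of mean |w| to RMS(w); equals 1 when all magnitudes are identical."""
    n = len(weights)
    abs_mean = sum(abs(w) for w in weights) / n
    rms = math.sqrt(sum(w * w for w in weights) / n)
    return abs_mean / rms  # assumes at least one nonzero weight

# Evenly spread magnitudes versus mass concentrated in one weight.
spread = abs_mean_over_rms([1.0, -1.0, 1.0, -1.0])
concentrated = abs_mean_over_rms([2.0, 0.0, 0.0, 0.0])
```

The ratio needs only the stored weights, which is consistent with the calibration-free framing and the sub-second to few-second scoring times reported above.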

2506.18831 2026-06-17 cs.CL 版本更新

Adaptive Activation Steering for Efficient LLM Reasoning via Closed-Loop PID Control

自适应激活引导:通过闭环PID控制实现高效LLM推理

Aryasomayajula Ram Bharadwaj

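AI summary: Proposes PID-steering, which uses a PID controller driven by a chunk-level redundancy classifier to adjust activation steering strength dynamically, shortening outputs while improving accuracy in a small-scale proof of concept.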

详情
AI中文摘要

使用长思维链训练的推理LLM常出现过度思考:它们在冗余反思和过渡上花费token,增加成本却不提高准确性。静态激活引导(如SEAL)用固定向量抑制此类内容,但无论当前块实际冗余程度如何,都施加相同强度。我们描述了PID-steering,一种无需训练、解码时的方法,通过由轻量级块级冗余分类器驱动的PID控制器来调节引导强度。在GSM8K子集上使用DeepSeek-R1-Distill-Qwen-1.5B,该方法将准确率从85.7%提升至89.6%(+3.9个百分点),同时将平均输出长度从1026个token削减至790个(-23%)。我们将其报告为小规模概念验证,而非基准结果。

英文摘要

Reasoning LLMs trained with long chain-of-thought often overthink: they spend tokens on redundant reflection and transitions that inflate cost without improving accuracy. Static activation steering (e.g., SEAL) suppresses such content with a fixed vector, but applies the same strength regardless of how redundant the current chunk actually is. We describe PID-steering, a training-free, decoding-time method that modulates the steering strength with a PID controller driven by a lightweight chunk-level redundancy classifier. On a subset of GSM8K with DeepSeek-R1-Distill-Qwen-1.5B, the method improves accuracy from 85.7% to 89.6% (+3.9 pp) while cutting average output length from 1026 to 790 tokens (-23%). We report it as a small-scale proof of concept rather than a benchmark result.
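The control loop can be sketched as a textbook discrete PID controller whose error signal is the gap between the classifier's redundancy score and a target level. The gains, the target, the clamping range, and the names `PIDController` and `steering_strengths` are illustrative assumptions, not values from the paper.

```python
class PIDController:
    """Textbook discrete PID controller with unit time step."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error):
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

def steering_strengths(redundancy_scores, target, pid, max_strength):
    """Map per-chunk redundancy scores to clamped steering strengths."""
    strengths = []
    for score in redundancy_scores:
        u = pid.step(score - target)  # positive error: chunk is too redundant
        strengths.append(min(max(u, 0.0), max_strength))
    return strengths

strengths = steering_strengths(
    [0.5, 1.0, 0.5], target=0.5,
    pid=PIDController(kp=1.0, ki=0.5, kd=0.25), max_strength=2.0,
)
```

Unlike a fixed steering vector, the strength rises on the redundant middle chunk and decays afterwards rather than staying constant.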

2604.06802 2026-06-17 cs.AI 版本更新

Riemann-Bench: A Benchmark for Moonshot Mathematics

Riemann-Bench: 面向登月级数学的基准测试

Suhaas Garre, Erik Knutsen, Sushant Mehta, Edwin Chen

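AI summary: Introduces Riemann-Bench, a private benchmark of expert-written research-level mathematics problems for evaluating AI reasoning beyond the olympiad level; all frontier models currently score below 10%.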

详情
AI中文摘要

最近的AI系统在国际数学奥林匹克竞赛中取得了金牌级别的表现,展示了在竞赛式问题解决方面的卓越能力。然而,竞赛数学仅代表了数学推理的一个狭窄部分:问题来自有限的领域,几乎不需要高等数学工具,并且通常奖励洞察力技巧而非深厚的理论知识。我们引入了Riemann-Bench,一个由专家策划的私有基准测试,旨在评估AI系统在研究级数学上的表现,这远远超出了奥林匹克的前沿。问题由常春藤联盟数学教授、研究生和拥有博士学位的IMO奖牌得主编写,并且通常需要作者数周才能独立解决。每个问题都经过两位独立领域专家的双盲验证(他们必须从头开始解决问题),并具有唯一的封闭形式解,由程序化验证器评判。我们将前沿模型评估为不受限制的研究智能体,可以完全访问编码工具、搜索和开放式推理,使用基于每个问题100次独立运行计算的无偏统计估计器。我们的结果显示,所有前沿模型目前得分低于10%,揭示了奥林匹克级问题解决与真正研究级数学推理之间的巨大差距。通过保持基准完全私有,我们确保测量的性能反映了真实的数学能力,而不是对训练数据的记忆。

英文摘要

Recent AI systems have achieved gold-medal-level performance on the International Mathematical Olympiad, demonstrating remarkable proficiency at competition-style problem solving. However, competition mathematics represents only a narrow slice of mathematical reasoning: problems are drawn from limited domains, require minimal advanced machinery, and can often reward insightful tricks over deep theoretical knowledge. We introduce Riemann-Bench, a private benchmark of expert-curated problems designed to evaluate AI systems on research-level mathematics that goes far beyond the olympiad frontier. Problems are authored by Ivy League mathematics professors, graduate students, and PhD-holding IMO medalists, and routinely took their authors weeks to solve independently. Each problem undergoes double-blind verification by two independent domain experts who must solve the problem from scratch, and yields a unique, closed-form solution assessed by programmatic verifiers. We evaluate frontier models as unconstrained research agents, with full access to coding tools, search, and open-ended reasoning, using an unbiased statistical estimator computed over 100 independent runs per problem. Our results reveal that all frontier models currently score below 10%, exposing a substantial gap between olympiad-level problem solving and genuine research-level mathematical reasoning. By keeping the benchmark fully private, we ensure that measured performance reflects authentic mathematical capability rather than memorization of training data.
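The abstract does not name its unbiased estimator. One common choice for this kind of repeated-run evaluation is the pass@k estimator, shown below purely as an assumption about what such an estimator could look like, with `n` runs per problem of which `c` are correct.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased estimate of P(at least one of k sampled runs is correct)."""
    if k > n:
        raise ValueError("k cannot exceed the number of runs n")
    # 1 minus the probability that all k runs drawn without replacement fail.
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical problem: 5 correct answers out of 100 independent runs.
single_try = pass_at_k(n=100, c=5, k=1)
ten_tries = pass_at_k(n=100, c=5, k=10)
```

For `k = 1` the estimator reduces to the plain success rate `c / n`, so a score below 10% under that setting would mean fewer than 10 correct runs in 100 on average.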

2603.28251 2026-06-17 cs.CV cs.AI 版本更新

DiffAttn: Diffusion-Based Drivers' Visual Attention Prediction with LLM-Enhanced Semantic Reasoning

DiffAttn: 基于扩散的驾驶员视觉注意力预测与LLM增强语义推理

Weimin Liu, Qingkun Li, Jiyuan Qiu, Wenjun Wang, Joshua H. Meng

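AI summary: Proposes DiffAttn, a framework that formulates drivers' visual attention prediction as a conditional diffusion-denoising process and combines a Swin Transformer encoder, a Feature Fusion Pyramid, and LLM-enhanced semantic reasoning, reaching state-of-the-art performance on four datasets.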

详情
AI中文摘要

驾驶员的视觉注意力为预测潜在危险提供关键线索,并直接影响决策和控制操作,其缺失可能危及交通安全。为模拟驾驶员的感知模式并推进智能车辆的视觉注意力预测,我们提出DiffAttn,一种基于扩散的框架,将该任务建模为条件扩散-去噪过程,从而更准确地建模驾驶员注意力。为捕捉局部和全局场景特征,我们采用Swin Transformer作为编码器,并设计了一个解码器,该解码器结合了特征融合金字塔用于跨层交互,以及密集的多尺度条件扩散,以共同增强去噪学习并建模细粒度的局部和全局场景上下文。此外,引入大语言模型(LLM)层以增强自上而下的语义推理,并提高对安全关键线索的敏感性。在四个公共数据集上的大量实验表明,DiffAttn实现了最先进的性能,超越了大多数基于视频、自上而下特征驱动和LLM增强的基线。我们的框架进一步支持可解释的以驾驶员为中心的场景理解,并具有改善智能车辆中座舱人机交互、风险感知和驾驶员状态测量的潜力。

英文摘要

Drivers' visual attention provides critical cues for anticipating latent hazards and directly shapes decision-making and control maneuvers, where its absence can compromise traffic safety. To emulate drivers' perception patterns and advance visual attention prediction for intelligent vehicles, we propose DiffAttn, a diffusion-based framework that formulates this task as a conditional diffusion-denoising process, enabling more accurate modeling of drivers' attention. To capture both local and global scene features, we adopt Swin Transformer as encoder and design a decoder that combines a Feature Fusion Pyramid for cross-layer interaction with dense, multi-scale conditional diffusion to jointly enhance denoising learning and model fine-grained local and global scene contexts. Additionally, a large language model (LLM) layer is incorporated to enhance top-down semantic reasoning and improve sensitivity to safety-critical cues. Extensive experiments on four public datasets demonstrate that DiffAttn achieves state-of-the-art (SoTA) performance, surpassing most video-based, top-down-feature-driven, and LLM-enhanced baselines. Our framework further supports interpretable driver-centric scene understanding and has the potential to improve in-cabin human-machine interaction, risk perception, and drivers' state measurement in intelligent vehicles.

2604.03120 2026-06-17 cs.CV cs.RO 版本更新

SCC-Loc: A Unified Semantic Cascade Consensus Framework for UAV Thermal Geo-Localization

SCC-Loc: 无人机热红外地理定位的统一语义级联共识框架

Xiaoran Zhang, Yu Liu, Jinyu Liang, Kangqiushi Li, Zhiwei Huang, Huaxin Xiao

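AI summary: Proposes SCC-Loc, which uses a shared DINOv2 backbone, Semantic-Guided Viewport Alignment, Cascaded Spatial-Adaptive Texture-Structure Filtering, and Consensus-Driven Reliability-Aware Position Selection to address the feature ambiguity caused by the thermal-visible modality gap, achieving zero-shot absolute position estimation with a mean localization error of 9.37 m.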

Comments 17 pages, 5 figures. Submitted to IEEE J-STARS

详情
AI中文摘要

跨模态热红外地理定位(TG)为无人机在GNSS拒止环境中提供了鲁棒的全天候解决方案。然而,深刻的热红外-可见光模态差异引入了严重的特征模糊性,系统性地破坏了传统的由粗到精配准。为打破这一瓶颈,我们提出SCC-Loc,一个统一的语义-级联-共识定位框架。通过在全局检索和MINIMA$_{\text{RoMa}}$匹配中共享单个DINOv2骨干网络,它最小化内存占用并实现零样本、高精度的绝对位置估计。具体而言,我们通过引入三个协同组件来解决模态模糊性。首先,我们设计语义引导视口对齐(SGVA)模块,自适应优化卫星裁剪区域,有效校正初始空间偏差。其次,我们开发级联空间自适应纹理结构滤波(C-SATSF)机制,显式强制几何一致性,从而消除密集的跨模态离群点。最后,我们提出共识驱动可靠性感知位置选择(CD-RAPS)策略,通过物理约束位姿优化的协同作用推导出最优解。为解决数据稀缺问题,我们构建了Thermal-UAV数据集,提供11,890个多样化的热红外查询,并参考大规模卫星正射影像和相应的空间对齐数字表面模型(DSM)。大量实验表明,SCC-Loc建立了新的最先进水平,将平均定位误差抑制到9.37米,并在严格的5米阈值内比最强基线提供了7.6倍的精度提升。代码和数据集见:https://github.com/FloralHercules/SCC-Loc

英文摘要

Cross-modal Thermal Geo-localization (TG) provides a robust, all-weather solution for Unmanned Aerial Vehicles (UAVs) in Global Navigation Satellite System (GNSS)-denied environments. However, profound thermal-visible modality gaps introduce severe feature ambiguity, systematically corrupting conventional coarse-to-fine registration. To dismantle this bottleneck, we propose SCC-Loc, a unified Semantic-Cascade-Consensus localization framework. By sharing a single DINOv2 backbone across global retrieval and MINIMA$_{\text{RoMa}}$ matching, it minimizes memory footprint and achieves zero-shot, highly accurate absolute position estimation. Specifically, we tackle modality ambiguity by introducing three cohesive components. First, we design the Semantic-Guided Viewport Alignment (SGVA) module to adaptively optimize satellite crop regions, effectively correcting initial spatial deviations. Second, we develop the Cascaded Spatial-Adaptive Texture-Structure Filtering (C-SATSF) mechanism to explicitly enforce geometric consistency, thereby eradicating dense cross-modal outliers. Finally, we propose the Consensus-Driven Reliability-Aware Position Selection (CD-RAPS) strategy to derive the optimal solution through a synergy of physically constrained pose optimization. To address data scarcity, we construct Thermal-UAV, a comprehensive dataset providing 11,890 diverse thermal queries referenced against a large-scale satellite ortho-photo and corresponding spatially aligned Digital Surface Model (DSM). Extensive experiments demonstrate that SCC-Loc establishes a new state-of-the-art, suppressing the mean localization error to 9.37 m and providing a 7.6-fold accuracy improvement within a strict 5-m threshold over the strongest baseline. Code and dataset are available at https://github.com/FloralHercules/SCC-Loc.

2604.00611 2026-06-17 cs.RO 版本更新

Physical Imitation Learning: Distilling Control Policies into Passive Elasticity

物理模仿学习:将控制策略蒸馏到被动弹性中

Huyue Ma, Yurui Jin, Helmut Hauser, Rui Wu

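AI summary: Proposes Physical Imitation Learning (PIL), which splits a reinforcement-learning control policy into active and passive contributions and offloads the passive part to parallel elastic joints, lowering energy consumption and offloading up to 95% of mechanical power on flat terrain in simulated quadrupeds.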

详情
AI中文摘要

由于脑-体协同进化,动物的内在身体动力学在其节能运动中起着关键作用。具体来说,控制努力在主动肌肉和被动身体动力学之间共享——这一原则通常被称为物理智能。因此,身体动力学是解决方案的一部分。相比之下,机器人身体通常被设计得尽可能简单,但主动控制常常与内在身体动力学对抗,导致低能效。我们引入了物理模仿学习(PIL),这是一种新颖的方法,使当前的机器人控制更接近动物。PIL 获取通过强化学习(RL)获得的学习控制策略,并将其系统地分解为主动和被动控制贡献。然后,被动部分可以直接卸载到被动并联弹性关节(PEJ)上。结果,主动控制贡献显著减少,降低了整体能耗。此外,策略可以通过 RL 训练,通过生成更容易被 PEJ 模仿的步态来利用 PEJ 的辅助。这使得主动和被动控制组件的协同设计成为可能,将更大份额的驱动努力转移到 PEJ。在这里,我们在模拟四足动物中展示了这种方法的潜力。我们的结果表明,所提出的方法可以在平坦地形上卸载高达 95% 的机械功率到被动身体动力学,在崎岖地形上卸载 13%。因此,PIL 提供了一条可推广的途径,用于实现特定任务的物理智能,适用于各种基于关节的机器人形态。

英文摘要

Due to brain-body co-evolution, animals' intrinsic body dynamics play a crucial role in their energy-efficient locomotion. Specifically, the control effort is shared between active muscles and passive body dynamics, a principle often referred to as Physical Intelligence. As a result, the body dynamics are part of the solution. In contrast, robot bodies are typically designed to be as simple as possible, but the active control often fights the intrinsic body dynamics, resulting in low energy efficiency. We introduce Physical Imitation Learning (PIL), a novel approach that brings current robotics control closer to animals. PIL takes learned control policies obtained with Reinforcement Learning (RL) and systematically splits them up into an active and passive control contribution. The passive part can then be directly offloaded to passive Parallel Elastic Joints (PEJs). As a result, the active control contribution is significantly reduced, lowering the overall energy consumption. Furthermore, the policy can be trained via RL to leverage the PEJ assistance by generating gaits that are more readily emulated by the PEJs. This enables co-design of the active and passive control components, shifting a greater share of actuation effort to the PEJs. Here we demonstrate the potential of this approach in simulated quadrupeds. Our results show that the proposed approach can offload up to 95% of mechanical power to passive body dynamics on flat terrain and 13% on rough terrain. PIL thereby provides a generalisable route to task-specific Physical Intelligence applicable to a wide range of joint-based robot morphologies.
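One simple way to picture the active/passive split is to fit a linear parallel spring, with torque `-k * (q - q0)`, to the joint torques a policy commands, and treat the remainder as the active contribution. This is a sketch under that assumption only: the abstract does not state the PEJ model or the fitting procedure, and the function names and data are hypothetical.

```python
def fit_parallel_spring(angles, torques):
    """Least-squares fit of torque = -k * (q - q0); returns (k, q0)."""
    n = len(angles)
    q_mean = sum(angles) / n
    t_mean = sum(torques) / n
    cov = sum((q - q_mean) * (t - t_mean) for q, t in zip(angles, torques))
    var = sum((q - q_mean) ** 2 for q in angles)
    slope = cov / var
    intercept = t_mean - slope * q_mean
    stiffness = -slope
    rest_angle = intercept / stiffness  # assumes a nonzero fitted stiffness
    return stiffness, rest_angle

def active_residual(angles, torques, stiffness, rest_angle):
    """Torque the motor must still supply once the spring does its share."""
    return [t + stiffness * (q - rest_angle) for q, t in zip(angles, torques)]

# Hypothetical policy rollout whose torques happen to be perfectly spring-like.
angles = [0.0, 1.0, 2.0, 3.0]
torques = [2.0, 0.0, -2.0, -4.0]
k, q0 = fit_parallel_spring(angles, torques)
residual = active_residual(angles, torques, k, q0)
```

In this idealized rollout the residual is zero everywhere, the limiting case of full offloading; real gaits would leave a nonzero active remainder.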

2604.00605 2026-06-17 cs.CV 版本更新

Fluently Lying: Adversarial Robustness Can Be Substrate-Dependent

流利地撒谎:对抗鲁棒性可能依赖于底层架构

Daye Kang, Hyeongboo Baek

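AI summary: Reports a failure mode under adversarial attack termed Quality Corruption (QC), in which most detections are retained while accuracy collapses; it appears in only one of the four SNN detectors tested (EMS-YOLO), suggesting that adversarial failure modes can be substrate-dependent.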

Comments Withdrawn by the authors due to an implementation bug discovered in the main experimental pipeline. The bug affects the main results, and therefore the empirical claims and conclusions of the paper are no longer supported

详情
AI中文摘要

用于监控和防御对抗攻击下目标检测器的主要工具假设,当精度下降时,检测数量也会同步下降。这种耦合是假设的,并未经过测量。我们报告了在单个模型上观察到的反例:在标准PGD攻击下,EMS-YOLO(一种脉冲神经网络(SNN)目标检测器)保留了超过70%的检测结果,而mAP从0.528骤降至0.042。我们将这种保持检测数量但精度崩溃的现象称为质量崩溃(QC),以区别于在非目标评估中占主导地位的抑制现象。在四种SNN架构和两种威胁模型(l-infinity和l-2)下,QC仅出现在测试的四种检测器之一(EMS-YOLO)中。在该模型上,所有五种标准防御组件均未能检测或缓解QC,这表明防御生态系统可能依赖于一种基于单一底层架构校准的共享假设。据我们所知,这些结果首次证明对抗失败模式可能依赖于底层架构。

英文摘要

The primary tools used to monitor and defend object detectors under adversarial attack assume that when accuracy degrades, detection count drops in tandem. This coupling was assumed, not measured. We report a counterexample observed on a single model: under standard PGD, EMS-YOLO, a spiking neural network (SNN) object detector, retains more than 70% of its detections while mAP collapses from 0.528 to 0.042. We term this count-preserving accuracy collapse Quality Corruption (QC), to distinguish it from the suppression that dominates untargeted evaluation. Across four SNN architectures and two threat models (l-infinity and l-2), QC appears only in one of the four detectors tested (EMS-YOLO). On this model, all five standard defense components fail to detect or mitigate QC, suggesting the defense ecosystem may rely on a shared assumption calibrated on a single substrate. These results provide, to our knowledge, the first evidence that adversarial failure modes can be substrate-dependent.