arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.18372 2026-06-18 cs.CL cs.AI New submission

Redact or Keep? A Fully Local AI Cascade for Educational Dialogue De-Identification

Haocheng Zhang, Zhuqian Zhou, Kirk Vanacore, Bakhtawar Ahtisham, René F. Kizilcec

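Affiliations: Cornell University

AI summary: To address the confusion between curricular terms and personally identifiable information in educational dialogue, the paper proposes a fully local cascade framework that performs constrained privacy triage with a recall-first union proposer and a context-aware reviewer, reaching 0.958 macro F1 on math tutoring dialogues and outperforming a commercial API and LLM-only baselines.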

Abstract

Educational dialogue is a valuable but sensitive resource for research: the same transcripts that capture authentic learning often capture personally identifiable information (PII) entangled with curricular content, where "Riemann" may refer to a real student or to a mathematical concept. Existing approaches force a tradeoff between governance and accuracy. Commercial Large Language Models (LLMs) can handle this ambiguity but require sending student data to third parties, while local named entity recognition (NER) systems preserve governance but over-redact curricular terms. We propose a fully local cascade framework that reframes de-identification from open-ended entity recognition to constrained privacy triage. A recall-first union proposer combines two lightweight encoders with deterministic rules to over-generate candidate spans; a context-aware reviewer then makes a binary Redact/Keep decision for each candidate using surrounding dialogue and speaker role. We evaluate three reviewer configurations against same-family LLM-only baselines and a commercial API on math tutoring transcripts from two large platforms. The strongest local configuration reaches 0.958 macro F1, compared with 0.767 for a same-family LLM-only baseline and 0.706 for the commercial API, while running entirely on a single laptop. On a targeted challenge set of curricular-personal name ambiguity, the same configuration degrades by only 0.03 F1 versus 0.19 to 0.25 for smaller reviewers. These results suggest that for educational de-identification, problem formulation matters more than model scale.
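
A minimal sketch of the propose-then-review cascade described in this abstract, under loud assumptions: the two "lightweight encoders" are replaced by a single capitalized-word regex, the reviewer is a handful of toy rules rather than a model, and every name here (`union_proposer`, `toy_reviewer`, the word lists) is hypothetical and not taken from the paper.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    start: int
    end: int
    text: str


# Illustrative word lists only; a real system would not hard-code these.
CURRICULAR_TERMS = {"riemann", "euler", "pythagoras"}
MATH_CUES = ("sum", "integral", "theorem", "formula")
COMMON_WORDS = {"hi", "the", "can", "what"}


def capitalized_proposer(utterance):
    """Stand-in for a lightweight encoder: flag every capitalized word."""
    return [Candidate(m.start(), m.end(), m.group())
            for m in re.finditer(r"\b[A-Z][a-z]+\b", utterance)]


def phone_rule_proposer(utterance):
    """Deterministic rule: flag phone-like digit groups."""
    return [Candidate(m.start(), m.end(), m.group())
            for m in re.finditer(r"\b\d{3}-\d{3}-\d{4}\b", utterance)]


def union_proposer(utterance):
    """Recall-first step: take the union of all proposers' spans."""
    spans = set()
    for proposer in (capitalized_proposer, phone_rule_proposer):
        spans.update(proposer(utterance))
    return sorted(spans, key=lambda c: c.start)


def toy_reviewer(candidate, utterance, speaker_role):
    """Binary Redact/Keep decision from local context.

    speaker_role is accepted to mirror the abstract but is unused by
    these toy rules.
    """
    word = candidate.text.lower()
    if word in COMMON_WORDS:
        return "Keep"
    if word in CURRICULAR_TERMS:
        next_words = utterance[candidate.end:].lower().split()[:1]
        if next_words and next_words[0].strip("?.,!") in MATH_CUES:
            return "Keep"
    return "Redact"


def deidentify(utterance, speaker_role):
    redacted = utterance
    # Replace right to left so earlier character offsets stay valid.
    for cand in reversed(union_proposer(utterance)):
        if toy_reviewer(cand, utterance, speaker_role) == "Redact":
            redacted = redacted[:cand.start] + "[REDACTED]" + redacted[cand.end:]
    return redacted


print(deidentify("Hi Riemann, can you compute the Riemann sum?", "tutor"))
# prints: Hi [REDACTED], can you compute the Riemann sum?
```

The point of the split is that the proposer may over-generate freely, because the reviewer only ever answers a constrained yes/no question about spans it is handed.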

2606.18367 2026-06-18 cs.LG New submission

Do Time Series Foundation Model Benchmarks Hide Regime-Dependent Failures? Evidence from Traffic Speed Forecasting

Yingshuo Wang, Xian Sun, Lingdong Kong, Wei Gao, Yanhang Li, Zhichao Fan, Zexin Zhuang

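Affiliations: University of California, Berkeley; Duke University; National University of Singapore; Northeastern University; University of Illinois Urbana-Champaign; Southern Methodist University

AI summary: The paper proposes regime-stratified evaluation, finds that the accuracy and prediction-interval coverage of time series foundation models degrade markedly during traffic regime transitions, and proposes bimodal mixture augmentation to improve transition-regime coverage.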

Comments 5 pages, 2 figures. Accepted at the Workshop on Forecasting as a New Frontier of Intelligence, ICML 2026

Abstract

Standard benchmarks evaluate time series foundation models (TSFMs) using aggregate metrics, but these can mask severe failures in critical operating regimes. We introduce regime-stratified evaluation and apply it to three TSFMs on two standard traffic speed benchmarks. Traffic exhibits abrupt regime switching between free-flow and congested states, producing bimodal speed distributions during transitions. When we stratify by traffic regime, both accuracy and prediction-interval coverage degrade sharply during transitions: transition-regime MAE reaches 11 mph (versus 3 mph overall), and empirical coverage of 90% prediction intervals drops as low as 55%. These failures are invisible in aggregate metrics because free-flow observations dominate the sample. A simple historical conditional baseline (sampling from per-sensor training distributions) achieves better transition coverage than any TSFM, but has far worse overall accuracy. We propose bimodal mixture augmentation (BMA), a post-hoc method that combines TSFM forecasts with historical distributional knowledge, approaching the historical baseline's transition coverage while preserving the TSFM's accuracy. Our results suggest that TSFM benchmarks should incorporate regime-aware evaluation to surface failures that aggregate metrics hide.
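
A small sketch of what regime-stratified evaluation could look like in practice, assuming each forecast record already carries a regime label. The function name, the record layout, and the data are all illustrative; the numbers are synthetic and not from the paper.

```python
from collections import defaultdict


def regime_stratified_metrics(records):
    """Compute MAE and interval coverage per regime and overall.

    records: iterable of (regime, y_true, y_pred, lower, upper) tuples,
    where [lower, upper] is the nominal 90% prediction interval.
    """
    buckets = defaultdict(list)
    for regime, y_true, y_pred, lower, upper in records:
        row = (abs(y_true - y_pred), lower <= y_true <= upper)
        buckets[regime].append(row)
        buckets["overall"].append(row)
    return {
        regime: {
            "n": len(rows),
            "mae": sum(err for err, _ in rows) / len(rows),
            "coverage": sum(hit for _, hit in rows) / len(rows),
        }
        for regime, rows in buckets.items()
    }


# Synthetic speeds in mph: free-flow observations dominate the sample.
records = (
    [("free_flow", 65.0, 64.0, 60.0, 70.0)] * 8
    + [("transition", 30.0, 40.0, 35.0, 45.0),   # truth falls below the interval
       ("transition", 50.0, 40.0, 35.0, 55.0)]   # truth is covered
)
for regime, stats in regime_stratified_metrics(records).items():
    print(regime, stats)
```

In this toy sample the overall MAE is 2.8 and overall coverage is 0.9, while the transition rows alone have an MAE of 10 and coverage of 0.5, which is the masking effect the abstract describes.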

2606.18363 2026-06-18 cs.RO cs.AI New submission

Guava: An Effective and Universal Harness for Embodied Manipulation

Haowen Liu, Xirui Li, Shaoxiong Yao, Peng Shi, Tianyi Zhou, Jia-Bin Huang, Furong Huang, Jiayuan Mao

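Affiliations: University of Maryland College Park; University of Illinois Urbana-Champaign; University of Waterloo; Mohamed bin Zayed University of Artificial Intelligence; University of Pennsylvania; Amazon FAR

AI summary: Proposes the Guava harness, which uses three key designs (iterative perception-reasoning-action loops, semantic action abstractions, and multimodal observations) to distill embodied manipulation capabilities into a 4B open-source model, with performance comparable to frontier proprietary models in simulation and real-world environments.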

Abstract

Language models trained on large-scale vision-language data have demonstrated strong potential for embodied agents. Harnessing models through embodied tool use offers a promising alternative to end-to-end vision-language-action systems by combining high-level reasoning with external modules for perception, planning, and control. However, it remains unclear what makes an effective harness for embodied manipulation, and to what extent such a harness can unlock embodied capabilities in a wide range of reasoning models. In this work, we present Guava, a harness framework for embodied tool use developed through systematic exploration of the design space of agent workflows, action spaces, and observation spaces. Our study identifies three key ingredients for effective embodied agents: iterative perception-reasoning-action loops, semantic action abstractions, and multimodal observations. To understand whether these design principles are universal even to small models, we develop an end-to-end training pipeline that distills embodied manipulation capabilities into a 4B open-source model using fewer than 2K trajectories collected entirely in simulation. Experimental results in both simulation and real-world environments show performance comparable to frontier proprietary models while exhibiting strong generalization to unseen objects, novel instructions, and long-horizon tasks. Results suggest that a well-designed harness can serve as a scalable, model-agnostic interface for embodied manipulation, enabling strong emergent embodied capabilities in compact open-source models with minimal training data.

2606.18328 2026-06-18 cs.RO New submission

Recover, Discover, Plan: Learning Skills and Concepts from Robot Failures

Bowen Li, Mayank Mishra, Y. Isabel Liu, Stone Tao, Nishanth Kumar, Alexander G. Gray, Ruwan Wickramarachchi, Jonathan Francis, Sebastian Scherer, Tom Silver

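Affiliations: CMU; Princeton; AI2; MIT; Centaur AI; Bosch Center for AI

AI summary: Proposes ReSYNC, which alternates between skill learning and concept discovery to progressively build abstract predicates from failure-recovery experience, enabling global failure avoidance and long-horizon planning and outperforming strong baselines by over 50%.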

Comments 9 pages, 6 figures. Website: https://jaraxxus-me.github.io/ReSYNC/

Abstract

Intelligent robots should not only recover from failures, but also acquire the abstract knowledge needed to avoid them in the future. While reinforcement learning (RL) can learn reactive recovery behaviors, training a separate policy for every distinct failure mode is highly inefficient. We introduce Recovery-Driven Synthesis of Relational Concepts (ReSYNC), the first approach that progressively discovers and refines state abstractions (relational predicates) from failure-recovery experience to support abstract planning. Unlike purely reactive methods, ReSYNC jointly learns skills and concepts through an incremental dual-learning process. In the skill-learning phase, the robot uses RL to learn to recover from failures seen in training tasks. In the concept-learning phase, the robot discovers new relational predicates and refines its abstract planning model to explain and generalize the learned recovery behaviors. This interaction enables ReSYNC to convert local recoveries seen during training into global failure avoidance at test time. Across four simulated domains, we show that ReSYNC's ability to continually expand and refine its abstraction library allows it to solve long-horizon, previously unseen problems, outperforming strong baselines by over 50%. Additionally, we demonstrate sim-to-real transfer of ReSYNC, where it performs real-world non-prehensile manipulation skills and generalizes to unseen scenarios through abstract planning. Overall, ReSYNC represents a significant step toward robots that autonomously acquire abstractions for scalable, failure-aware planning in the physical world.

2606.18327 2026-06-18 cs.LG cs.AI New submission

Self-CTRL: Self-Consistency Training with Reinforcement Learning

Itamar Pres, Laura Ruis, Melat Ghebreselassie, Belinda Z. Li, Jacob Andreas

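Affiliations: MIT CSAIL

AI summary: Proposes Self-CTRL, which uses reinforcement learning to optimize the consistency between a language model's self-explanations and its behavior, substantially improving consistency and safety on probabilistic reasoning and constitutional AI tasks.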

Comments 34 pages, 12 figures, includes appendices

Abstract

Language models (LMs) that faithfully describe their own behavior can more easily be audited, understood, and trusted by users. This paper describes Self-Consistency Training with Reinforcement Learning (Self-CTRL), a method that optimizes for consistency between an LM's self-explanations and behavior on related inputs by updating explanations to better predict behavior or updating behavior to better match explanations. We apply our method in two domains. First, we study a formal probabilistic reasoning task in which LMs must learn to imitate a family of biased samplers and are evaluated on their ability to report the associated biases. We find that consistency training improves the correlation between self-reported and behaviorally-measured latent biases from $R^2=0.24$ to $R^2=0.64$ on a set of held-out distributions, matching the generalization of direct ground-truth supervision. Second, we study a constitutional AI domain in which LMs must describe when they will refuse or comply with user requests. Here, Self-CTRL produces rules that faithfully describe the model's behavior on held-out requests, improving the refusal predictions of a third-party auditor model from $36\%$ to $92\%$. In the other direction, behavior updates improve alignment, reducing HarmBench failure rate from $15.0\%$ to $0.5\%$ without substantially increasing refusal on harmless prompts. By aligning explanations and behavior, our work provides a general recipe for training AI models to be safer, more transparent, and more controllable.

2606.18326 2026-06-18 cs.LG New submission

Neural Network Implementation of the Renormalization Group for Fault Diagnosis with Class Imbalance

Evgeny Nikulchev, Dmitry Ilin

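Affiliations: MIREA – Russian Technological University

AI summary: Proposes RGNet, a neural network architecture based on the renormalization group concept that handles class imbalance and multidimensional noise through hierarchical coarse-graining of the feature space, and validates it on the AI4I dataset.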

Comments 8 pages

Abstract

The application of machine learning models in practical tasks faces challenges such as class imbalance and multidimensional noise. This paper proposes RGNet, a neural network architecture based on the concept of the renormalization group (RG), for hierarchical coarse-graining of the feature space. The model sequentially compresses the input dimensionality and concatenates all scales before classification, allowing it to capture both local details and global patterns. The notion of RG-flows is introduced: interpretable low-dimensional representations whose visualization via t-SNE reveals a discrete curvilinear structure, confirming the effectiveness of coarse-graining. Experimental results are presented on the imbalanced AI4I dataset. The obtained results demonstrate that RGNet is a universal, interpretable, and competitive solution for fault prediction in applications with imbalanced classes.

2606.18324 2026-06-18 cs.LG cs.AI New submission

Why SWAVE May Not Be All You Need: A Concept-Evolution Retrospective on Complex-Valued Recurrent Language Models

Ramprasath Ganesaraja, Swathika N, Sahil Dilip Panse

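Affiliations: EdgeVerve Systems Limited

AI summary: The paper retraces the evolution of the complex-valued recurrent language model SWave, exposes flaws in its design assumptions, and contributes theoretical insights such as cos-domination collapse along with engineering principles.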

Abstract

SWave is a complex-valued recurrent language model (169.26M parameters, D=384, L=16, T=2048) trained on FineWeb-Edu using 2xH100 NVL. It was designed around three founding premises: that representing language as complex waves rather than real-valued numbers enables richer information encoding; that a Cayley-parameterised unitary transition provides a mathematical guarantee against state decay or explosion; and that a hidden state which rotates rather than shrinks preserves signal integrity over arbitrarily long contexts. The core of SWave evolved substantially across three development phases. The Resonance Head was found to structurally admit imaginary-channel collapse as a global loss minimum (a failure mode we term cos-domination collapse) and was superseded by an untied head with independent real and imaginary embedding tables from the Phase-Associative Memory (PAM) architecture. This resolved the degenerate minimum and enabled stable 200,000-step training (best-step PPL 22.0 at step 89,861). ComplexNorm and the Wave Propagation Scan proved load-bearing throughout all three phases and were retained to the final architecture. ProtectGatedScan was reframed as a structural prior rather than a learned behaviour. The four multi-scale retention concepts showed no measurable improvement under controlled evaluation and were found non-load-bearing. The ComplexGatedUnit was superseded by a real-valued squared-ReLU channel mixer with fewer parameters. The auxiliary training objectives showed no benefit once structural constraints were resolved. The investigation yields a formal characterisation of cos-domination collapse, a parallel scan with a log-space backward pass for numerical stability, six transferable engineering principles for complex-valued recurrent training, and a plan-to-code traceability methodology for catching structural divergences that conventional test suites miss.
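
To make the second and third founding premises concrete, here is a scalar illustration of a Cayley-parameterised transition. It assumes the simplest possible case, a one-dimensional state, where the skew-Hermitian input is `i*a` for a real `a`. The function names are hypothetical and this is not SWave's actual transition.

```python
def cayley_unit(a: float) -> complex:
    """Cayley transform of the scalar skew-Hermitian value i*a.

    (1 + i*a) / (1 - i*a) has modulus 1 for every real a, because the
    numerator and denominator are complex conjugates of each other.
    """
    return (1 + 1j * a) / (1 - 1j * a)


def run_recurrence(transition: complex, h0: complex, steps: int) -> complex:
    """Apply h <- transition * h repeatedly, with no input term."""
    h = h0
    for _ in range(steps):
        h = transition * h
    return h


h0 = 0.6 + 0.8j                                   # |h0| = 1
rotating = run_recurrence(cayley_unit(0.3), h0, 1000)
shrinking = run_recurrence(0.99, h0, 1000)        # real contraction for contrast
print(abs(rotating), abs(shrinking))
```

The rotating state keeps its magnitude up to floating-point error after 1000 steps, while the contracting one decays toward zero. As the abstract goes on to report, this norm guarantee alone did not settle the architecture.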

2606.18323 2026-06-18 cs.SD cs.LG New submission

Reliable Neural-Codec Text-to-Speech by ASR Self-Verification and Distillation: Near-Zero Catastrophic Failures Across Models and Codecs

Ali Asaria, Tony Salomone, Deep Gandhi

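Affiliations: Transformer Lab

AI summary: Targeting the stochastic catastrophic failures (silence, early termination, repetition, or hallucination) of open autoregressive neural-codec TTS models, the paper proposes a format-robust metric based on an ASR round-trip, drives the failure rate to near zero with best-of-N self-verification, and transfers the robustness to single-shot decoding via distillation, closing about 52-58% of failures at no test-time cost.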

Abstract

Open autoregressive neural-codec text-to-speech (TTS) models sound excellent on typical inputs yet suffer stochastic catastrophic failures: on a meaningful fraction of utterances they emit silence, terminate early, or collapse into repetitive or hallucinated content. We show this failure mode is cheap to remove. Under a single format-robust metric (a catastrophic-failure rate via an ASR round-trip), best-of-N ASR self-verification drives failures to near-zero: no observed failures remain by N=2 on a standard corpus (LibriSpeech) and by N=4 on a hard prompt set. This is not an artifact of one model: the reduction replicates across four open codec-TTS systems and three neural codecs (XCodec2, SNAC, Mimi), reaching the near-zero floor by N=2 on three of the four. We then make the fix free at inference time by distilling the self-verified behaviour into the model, which recovers much of the robustness in single-shot decoding, closing ~52-58% of the failure mass on hard inputs at no test-time cost. The distillation gain concentrates where it is needed (hard inputs); on already-reliable prose there is no headroom and no detectable change. A controlled comparison adds a clean negative: offline direct preference optimization (DPO/IPO) does not beat plain supervised distillation, and an online iterative variant is promising but not statistically separable at our evaluation size. We report honestly the one model that resists (a larger Llasa where scale did not obviously help) and a rare-word capability ceiling that no self-distillation method overcomes.
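
The best-of-N self-verification loop can be sketched as follows. Everything concrete here is an assumption for illustration: the word-overlap pass criterion stands in for the paper's catastrophic-failure check (whose definition the abstract does not give), and `toy_tts` and `toy_asr` are deterministic stubs rather than real models.

```python
def round_trip_ok(text, transcript, min_overlap=0.5):
    """Hypothetical check: enough reference words survive the ASR round trip."""
    reference = text.lower().split()
    heard = set(transcript.lower().split())
    if not reference:
        return True
    return sum(word in heard for word in reference) / len(reference) >= min_overlap


def best_of_n(text, tts, asr, n):
    """Sample up to n candidates; return the first one that passes verification."""
    audio = None
    for attempt in range(n):
        audio = tts(text, attempt)
        if round_trip_ok(text, asr(audio)):
            return audio, True
    return audio, False


def failure_rate(prompts, tts, asr, n):
    failures = sum(not best_of_n(p, tts, asr, n)[1] for p in prompts)
    return failures / len(prompts)


# Toy stand-ins: "audio" is simply the text the ASR would hear.
def toy_tts(text, attempt):
    # Long prompts collapse into silence on the first sample only.
    if attempt == 0 and len(text.split()) > 5:
        return ""
    return text


def toy_asr(audio):
    return audio


prompts = ["hello world", "a much longer and harder prompt to speak aloud"]
print(failure_rate(prompts, toy_tts, toy_asr, n=1))  # 0.5
print(failure_rate(prompts, toy_tts, toy_asr, n=2))  # 0.0
```

The loop stops at the first passing sample, so the extra synthesis cost is only paid on utterances that actually fail.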

2606.18322 2026-06-18 cs.LG cs.AI New submission

SAE Interventions are Unreliable: Post-Intervention Recovery of Suppressed Behavior

Mingyue Cui, Linghui Shen, Xingyi Yang

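Affiliations: The Hong Kong Polytechnic University

AI summary: The study finds that sparse autoencoder (SAE) feature interventions can suppress a behavior yet leave a recoverable failure mode: optimizing residual perturbations restores the original behavior, revealing a gap between feature-level control and behavioral completeness.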

Comments Code: https://github.com/Mingyuee88/sae-post-intervention-recovery, Project page: https://mingyuee88.github.io/sae-post-intervention-recovery/

Abstract

Sparse Autoencoders (SAEs) decompose residual-stream activations into interpretable features. Recent latent-space defenses increasingly rely on these decompositions, assuming that identified "unsafe" SAE features serve as actionable handles for monitoring and intervention. In this paradigm, clamping a specific harmful feature is expected to reliably prevent model misbehavior. However, we show that this success may hide a recoverable failure mode: the clamp may block one visible route to a behavior without eliminating the behavior itself. We formulate this vulnerability as post-intervention recovery, a constrained residual-space optimization problem. Starting from the post-intervention residual state, we optimize residual perturbations to recover the pre-intervention behavior while preserving the post-intervention values of the targeted SAE features. Even under a strong threat model where the intervention remains active throughout optimization and generation, recovery remains possible. To rule out that recovery simply undoes the intervention, we use encoder-orthogonal updates for single-layer interventions and the corresponding feature-map Jacobian in the cross-layer setting. Across TPP, unlearning, IOI, and refusal steering experiments, this stress test reveals recoverable behavior despite successful feature-level intervention. Especially in the safety-critical refusal-steering setting, we achieve a 95.8% recovery rate on valid samples while keeping defended-feature relative drift to 0.131, substantially below suffix-based baselines. A recovery-path attribution analysis further localizes this recovery to the SAE reconstruction residual, the component left unexplained by the SAE. These results expose a gap between feature-level control and behavioral completeness: SAE features can support causal intervention, but controlling them does not guarantee control over the underlying behavior.
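
The idea behind an encoder-orthogonal update can be illustrated for a single feature. This sketch assumes the feature is computed as a ReLU of one linear encoder row plus a bias; the helper names and numbers are made up, and the paper's actual construction (including the cross-layer Jacobian variant) is not reproduced here.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def project_out(delta, encoder_row):
    """Remove the component of delta that lies along encoder_row."""
    scale = dot(delta, encoder_row) / dot(encoder_row, encoder_row)
    return [d - scale * w for d, w in zip(delta, encoder_row)]


def feature_activation(x, encoder_row, bias):
    """Single SAE feature under the assumed form relu(w . x + b)."""
    return max(0.0, dot(x, encoder_row) + bias)


x = [1.0, 2.0, -1.0]          # residual state after the intervention (toy values)
w = [0.5, -1.0, 2.0]          # encoder row of the defended feature
b = 4.0
delta = [0.3, 0.1, -0.2]      # an arbitrary proposed perturbation

safe_delta = project_out(delta, w)
before = feature_activation(x, w, b)
after_raw = feature_activation([xi + di for xi, di in zip(x, delta)], w, b)
after_safe = feature_activation([xi + di for xi, di in zip(x, safe_delta)], w, b)
print(before, after_raw, after_safe)
```

The raw perturbation moves the feature value, while the projected one leaves it unchanged up to rounding. This is why a perturbation of that kind cannot be dismissed as simply undoing the clamp on that feature.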

2606.18319 2026-06-18 cs.LG cs.AI cs.HC cs.SE New submission

ASTRA: A Scalable Next-Generation ATCO Training Simulator with Autonomous Simpilots

Ethan Chew, Enjia Wu, Iruss Eng Wei Yeow, Ian Weiqin Lim, Ranen Sim, Brandon Koh Ziheng, Kaleb Nim, Caden Toh Jun Yi, Wei Dong Soin, Darius Kai Keat Koh, Galen King Yu Tay, Prannaya Gupta, Jonathan Ee Fang Koong, Yong Zhi Lim

Affiliations: Air Emerging Technologies High-Speed Experimentations and Research (AETHER), RSAF Agile Innovation Digital (RAiD), Republic of Singapore Air Force

AI summary: Proposes the ASTRA simulator, which reduces word error rate to 23.45% through ASR fine-tuning and integrates an AI evaluation framework, enabling scalable, standardized ATCO training.

Abstract

Air Traffic Control Operators (ATCOs) are vital in ensuring the safe, orderly, and efficient flow of air traffic, yet training capacity is constrained by reliance on specialized human trainers known as simpilots, who must role-play both pilots and ATCOs in a simulated airspace. Existing automated solutions rely on Western-centric speech models that perform poorly in Singaporean operational contexts, with off-the-shelf systems exhibiting Word Error Rates (WER) of up to 107.80% on Singaporean-accented aviation speech. We introduce ASTRA, an end-to-end training simulator that automates these simpilot roles through a pipeline that transcribes ATCO speech, interprets instructions, and generates appropriate pilot and ATCO responses using locally adapted voice models. Our fine-tuned Automatic Speech Recognition (ASR) pipeline reduces WER to 23.45%, substantially outperforming existing approaches in this domain. Beyond traffic simulation, ASTRA incorporates an AI-assisted performance evaluation framework that assesses trainee radiotelephony communications across accuracy, brevity, and completeness, achieving post-optimization scores of 91.7%, 88.2%, and 86.9%, respectively. Built on open-source foundations such as DSPy and Unsloth, this approach enables scalable, standardized ATCO assessment while reducing instructor workload.
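
A WER above 100%, such as the 107.80% quoted above, is possible because WER divides the word-level edit distance by the reference length, and insertions can outnumber the reference words. The sketch below uses the standard definition; the example phrases are invented, not drawn from the paper's data.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed prefix of ref
    # and hyp[:j]; start with the empty reference prefix.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                          # deletion
                curr[j - 1] + 1,                      # insertion
                prev[j - 1] + (ref_word != hyp_word)  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


reference = "climb flight level"
hypothesis = "please climb to flight level three five zero"
print(word_error_rate(reference, hypothesis))  # 5 insertions / 3 words
```

Here five inserted words against a three-word reference give a WER of about 1.67, or 167%.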

2606.18317 2026-06-18 cs.LG New submission

Enhanced Graph Neural Networks using K-Hop Gaussian Diffusion

Xuling Zhang, Peng Wang, Daiyan Li, Aoran Huang, Zeiwei Chen, Yongkui Yang

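Affiliations: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; Southern University of Science and Technology

AI summary: Proposes a K-Hop Gaussian diffusion kernel as a preprocessing module that balances local and global information through multi-hop diffusion with Gaussian weights, outperforming traditional message passing and existing diffusion methods on noisy or structurally complex graphs.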

Comments 5 pages, 3 figures

Abstract

At their core, most graph neural networks (GNNs) rely on graph convolutions, typically implemented as message passing between direct (single-hop) neighbors. In many real-world graphs, edges can be noisy or poorly defined, limiting information propagation to local neighborhoods. Existing diffusion kernels, such as Personalized PageRank (PPR) and Heat Kernel, alleviate this issue through global propagation, but still struggle with complex local structures and distant node noise. To address these limitations, we propose a K-Hop Gaussian (KHG) diffusion kernel as a preprocessing module for graph data. KHG introduces multi-hop diffusion with Gaussian weighting for remote nodes, balancing local and global information propagation before applying standard GNNs. Experiments on multiple benchmark datasets demonstrate that KHG significantly outperforms traditional message-passing GNNs, as well as PPR and Heat Kernel diffusion, particularly in noisy or structurally complex graphs.
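
One plausible reading of a K-hop diffusion kernel with Gaussian weights is a weighted sum of powers of a normalized adjacency matrix. The abstract does not give the formula, so the form below is an assumption: $S = \sum_{k=0}^{K} w_k T^k$, with $T$ the row-normalized adjacency and $w_k \propto \exp(-k^2 / 2\sigma^2)$. The function name and parameters are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def khop_gaussian_kernel(adj, k_max=3, sigma=1.0):
    """Assumed form: sum over k of w_k * T^k with Gaussian weights over hops."""
    n = len(adj)
    # Row-normalize the adjacency; isolated nodes keep an all-zero row.
    trans = []
    for row in adj:
        degree = sum(row)
        trans.append([v / degree if degree else 0.0 for v in row])
    # Gaussian weight per hop distance, normalized to sum to 1.
    weights = [math.exp(-(k * k) / (2 * sigma * sigma)) for k in range(k_max + 1)]
    total = sum(weights)
    weights = [w / total for w in weights]

    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # T^0
    kernel = [[0.0] * n for _ in range(n)]
    for w in weights:
        for i in range(n):
            for j in range(n):
                kernel[i][j] += w * power[i][j]
        power = matmul(power, trans)
    return kernel


# Path graph 0 - 1 - 2: nodes 0 and 2 are two hops apart.
path = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
kernel = khop_gaussian_kernel(path, k_max=2, sigma=1.0)
print(kernel[0])
```

With `k_max=2` the kernel gives node 0 a nonzero weight on node 2 even though they share no edge. A GNN would then propagate over this denser, distance-discounted matrix instead of the raw adjacency.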

2606.18316 2026-06-18 cs.LG New submission

A Survey on Data-Driven Models for Soil Moisture Regression and Classification

Ilektra Tsimpidi, George Georgoulas, Vidya Sumathy, George Nikolakopoulos

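Affiliations: Electrical Engineering, University of Technology, Sweden

AI summary: Surveys AI-based soil moisture modelling methods in five categories (statistical time-series, geostatistical, classical machine learning, deep learning, and probabilistic/Bayesian methods) that use multi-source data for regression or classification.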

Comments 14 pages, 3 figures, AIAI 2026 Conference

Abstract

Soil Moisture (SM) modelling constitutes a complex spatiotemporal learning problem characterised by nonlinear environmental interactions, heterogeneous data sources, and limited ground observations. Physics-based approaches, such as water balance models, rely on explicit hydrological equations and high-quality inputs, but their computational cost and scalability limitations restrict large-scale deployment. Data-driven artificial intelligence (AI) methods have emerged as flexible alternatives, enabling the extraction of empirical relationships between soil moisture and environmental variables with reduced modelling assumptions. This work presents a structured survey of AI-based models for soil moisture estimation and classification. Existing approaches are organized into five categories: (a) statistical time-series models, (b) geostatistical methods, (c) classical machine learning (ML) models, (d) deep learning (DL) models, and (e) probabilistic/Bayesian methods. These models leverage historical soil moisture records, meteorological variables, vegetation indices, topography, soil characteristics, and geolocation data to perform regression or classification tasks.

2606.18315 2026-06-18 cs.LG cs.AI New submission

Ghost Attractor Networks: Basin-Structured Dynamical Decoders for Closed-Loop Sequential Generation

Tianyu Wang, Ying Wang, Zhihao Liu, Xi Vincent Wang, Lihui Wang

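Affiliations: KTH Royal Institute of Technology; Department of Production Engineering, KTH Royal Institute of Technology; Department of Decision and Control Systems, KTH Royal Institute of Technology

AI summary: Proposes Ghost Attractor Networks, a theoretically derived dynamical decoder that builds a basin-attractor structure for efficient closed-loop sequential generation; on robotic action decoding it matches the offline accuracy of a 1.07B-parameter Diffusion Transformer with 2.3M parameters and 32 times lower latency.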

Abstract

Sequential output generation with large-scale Transformer and diffusion decoders pays a memory cost that grows with sequence length, plus iterative per-step computation. Replacing them with small feed-forward decoders restores efficiency but produces unstructured latent representations that limit closed-loop control: phase-conditioned action generation and cross-step latent carry-over both require a latent geometry with stable basins. This article proposes Ghost Attractor Networks, a theoretically derived dynamical decoder whose latent evolves under a learned potential with drift and produces a basin-attractor structure by construction. Three desiderata (multi-modality, decoder-level single-pass switching, and constant memory) motivate the potential-drift form, and mode transitions arise as saddle-node bifurcations with ghost-attractor escape. A hierarchical phase-space decomposition separates first-order basin convergence from second-order proprioceptive refinement. Empirically, a Ghost trained end-to-end with a behavioral-cloning and contrastive objective exhibits the predicted gradient-flow contraction in its potential, with the gradient norm decaying by 67 percent across five integration steps on 1430 held-out samples. Ghost is evaluated as a robotic action decoder. A 2.3-million-parameter Ghost matches the offline accuracy of a 1.07-billion-parameter Diffusion Transformer at 462 times fewer parameters and 32 times lower latency, and beats five alternative 2M-parameter decoders (MLP, Neural ODE, CVAE, Transformer, 1-step Diffusion) on offline mean squared error by 5.9 to 29 percent. On the LIBERO-10 closed-loop benchmark, phase conditioning on Ghost's basin-structured latent yields a 13.5 percentage-point success-rate gain over a feed-forward MLP baseline, and persistent-latent ensembling reaches a 95.7 percent final success rate.

2606.18309 2026-06-18 cs.LG cs.AI New submission

SAGE: Retain-Aware Post-Hoc Sanitization of Final Unlearning Vector

Jingyuan Zhang, Yucheng Bai, Peixi Wen, Zhehao Huang, Zhengbao He, Hanling Tian, Xinwen Cheng, Haiyin Ran, Xiaolin Huang

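Affiliations: Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University

AI summary: Proposes SAGE, which sanitizes the final update vector post hoc, without rerunning the original unlearning pipeline, to relieve the trade-off between unlearning and retention in large language models.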

详情
AI中文摘要

大语言模型(LLM)遗忘旨在移除不良知识或行为,同时保留已有能力。当前的遗忘方法都涉及遗忘与保留之间的权衡。我们发现,保留激活偏差也可用于量化遗忘方法对保留造成的损害,而无需考虑遗忘过程的具体实现。这使得我们能够通过事后方法恢复任何遗忘方法的保留性能。因此,我们提出一种互补的事后设置,在不重新运行原始遗忘流程的情况下净化最终更新向量。在该设置中,我们设计了SAGE(光谱激活-几何净化),一种对最终遗忘更新的源无关修正。SAGE从一个小型保留代理收集真实模块输入,提取其主导激活几何结构,并求解一个闭式源锚定优化目标,该目标抑制与高能保留方向对齐的更新分量,同时保留源方法的遗忘载体。在多种遗忘方法、模型规模和基准测试中,SAGE持续缓解保留-遗忘权衡,将最终向量的事后净化识别为机器遗忘中一个实用且未被充分探索的维度。

英文摘要

Large Language Model (LLM) unlearning aims to remove undesirable knowledge or behaviors while preserving retained capabilities. Current unlearning methods all involve a trade-off between unlearning and retention. We have found that the retention activation bias can also be used to quantify the damage an unlearning method inflicts on retention, without considering the specific implementation of the unlearning process. This allows us to restore retention performance for any unlearning method using a post-hoc approach. Therefore, we propose a complementary post-hoc setting to sanitize the final update vector without rerunning the original unlearning pipeline. In this setting, we design SAGE, Spectral Activation-GEometry Sanitization, a source-agnostic correction for final unlearning updates. SAGE collects real module inputs from a small retain proxy, extracts their dominant activation geometry, and solves a source-anchored optimization objective in closed form, which suppresses update components aligned with high-energy retained directions while preserving the source method's forgetting carrier. Across multiple unlearning methods, model scales, and benchmarks, SAGE consistently relieves the retain-forget trade-off, identifying post-hoc sanitization of final vectors as a practical and underexplored axis for machine unlearning.

2606.18308 2026-06-18 cs.LG cs.AI 新提交

TRIDENT: Breaking the Hybrid-Safety-Physics Coupling for Provably Safe Multi-Agent Reinforcement Learning

TRIDENT: 打破混合安全-物理耦合以实现可证明安全的多智能体强化学习

Zijie Meng, Ziwei Li, Yufei Liu, Zhiyu Li, Jiyuan Liu, Wenhua Nie, Bingcai Wei, Miao Zhang

发表机构 * Peking University(北京大学) Xiamen University(厦门大学) National Taiwan University(国立台湾大学) WHU(武汉大学) THU / Jimei University(清华大学 / 集美大学)

AI总结 针对混合离散-连续动作、训练时安全约束和物理动力学形成的耦合问题,提出TRIDENT框架,通过Richardson-Romberg梯度校正、Lyapunov约束序列信任域更新和物理信息残差评论家,实现可证明的安全收敛,显著降低训练违规并提升奖励。

Comments 16 pages, 4 figures

详情
AI中文摘要

网络化信息物理系统中的安全协调迫使学习算法同时处理混合离散-连续动作、严格的训练时安全约束和物理支配的动力学。我们证明这三个特征形成了一个有向偏差循环,击败了任何现成模块的朴素组合,并将其形式化为一个三向耦合引理。然后我们引入TRIDENT,这是第一个MARL框架,其三个组件被共同设计以消除每个泄漏:一个将Gumbel-Softmax偏差从O(tau)降低到O(tau^2)的Richardson-Romberg梯度校正,一个强制每次迭代可行性的Lyapunov约束顺序信任域更新,以及一个分解价值而非奖励的物理信息残差评论家。我们证明了以O~(1/sqrt(K))的收敛速率达到约束纳什均衡,以及O(sqrt(K))的累积违规界。在多无人机移动边缘计算、自主交叉口管理和混合SMAC变体上,TRIDENT相比MADDPG减少了95.5%的训练时违规,相比MACPO减少了76.3%,同时相比最强的无约束基线提高了13.5%的奖励。

英文摘要

Safe coordination in networked cyber-physical systems forces learning algorithms to simultaneously handle hybrid discrete-continuous actions, hard training-time safety constraints, and physics-governed dynamics. We show that these three features form a directed cycle of biases that defeats any naive composition of off-the-shelf modules, and formalize this as a three-way coupling lemma. We then introduce TRIDENT, the first MARL framework whose three components are co-designed to cancel each leak: a Richardson-Romberg gradient correction reducing Gumbel-Softmax bias from O(tau) to O(tau^2), a Lyapunov-constrained sequential trust-region update enforcing per-iterate feasibility, and a physics-informed residual critic that decomposes value rather than reward. We prove an O~(1/sqrt(K)) convergence rate to a constrained Nash equilibrium and an O(sqrt(K)) cumulative-violation bound. On multi-UAV mobile-edge computing, autonomous intersection management, and a hybrid SMAC variant, TRIDENT cuts training-time violations by 95.5% over MADDPG and 76.3% over MACPO, while improving reward by 13.5% over the strongest unconstrained baseline.
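
The reduction of bias from O(tau) to O(tau^2) described above is the characteristic effect of Richardson extrapolation. The following is a minimal sketch of that generic idea only: it uses a forward finite difference as a stand-in estimator with first-order bias, not the paper's Gumbel-Softmax gradient estimator, and the names `biased_estimate` and `richardson` are hypothetical.

```python
import math

def biased_estimate(f, x, tau):
    """Forward difference: f'(x) + c * tau + O(tau^2), so the bias is O(tau)."""
    return (f(x + tau) - f(x)) / tau

def richardson(f, x, tau):
    """Combine estimates at tau and tau / 2 so the O(tau) bias terms cancel."""
    return 2.0 * biased_estimate(f, x, tau / 2.0) - biased_estimate(f, x, tau)

true_value = 1.0  # derivative of exp at 0
plain_error = abs(biased_estimate(math.exp, 0.0, 0.1) - true_value)
corrected_error = abs(richardson(math.exp, 0.0, 0.1) - true_value)
```

In this toy setting the corrected error is far smaller than the plain one, at the cost of evaluating the estimator at two values of tau.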

2606.18307 2026-06-18 cs.LG cs.AI 新提交

DRIFT: Refining Instruction Data via On-Policy Data Attribution

DRIFT: 通过在线策略数据归因优化指令数据

Zefan Wang, Lincheng Li, Tianyu Yu, Yuan Yao

发表机构 * Tsinghua University(清华大学)

AI总结 提出DRIFT方法,利用在线策略影响函数解决标准影响函数在指令微调数据归因中的近邻偏差和梯度范数偏差问题,通过模型自身生成作为验证目标,提升7B模型性能上限。

详情
AI中文摘要

优化监督微调(SFT)的训练数据分布决定了大型语言模型(LLMs)的能力。虽然现有的数据筛选方法在有限预算下加速训练方面表现出色,但它们不太适合提升能力上限。这里的挑战不再是识别一个保持性能的较小子集,而是将数据分布优化为最能提升最终模型的实例。为了解决这个问题,我们探索了使用影响函数(IF)进行实例级数据归因。我们发现标准IF公式在此设置中存在两个结构限制:由离策略验证目标引起的近邻偏差,以及对梯度范数的严重偏向。我们提出了DRIFT(通过在线策略影响函数进行数据优化用于监督微调)。DRIFT不依赖外部参考数据,而是利用模型的在线策略生成作为验证目标,这在经验上最小化了参数近邻偏差,并更好地符合IF的局部邻域假设。它进一步基于轨迹正确性应用符号加权,并针对梯度操纵问题对影响分数进行去偏,使得少量验证查询能够作为可靠锚点来归因整个数据集。在7B参数指令和推理模型上的实验表明,DRIFT持续提升了两者的性能上限,优于现有的数据筛选基线。

英文摘要

Optimizing the training data distribution for Supervised Fine-Tuning (SFT) dictates the capability of Large Language Models (LLMs). While existing data curation methods excel at accelerating training under constrained budgets, they are less suited to elevating the capability upper bound. The challenge here is no longer to identify a smaller subset that preserves performance, but to refine the data distribution toward instances most capable of improving the final model. To address this problem, we explore instance-level data attribution using Influence Functions (IF). We identify that standard IF formulations struggle in this setting due to two structural limitations: a proximity gap caused by off-policy validation targets, and a severe bias towards gradient norm. We propose DRIFT (Data Refinement via On-Policy Influence Functions for Supervised Fine-Tuning). Instead of relying on external reference data, DRIFT utilizes the model's on-policy rollouts as validation targets, which empirically minimizes the parameter proximity gap and better aligns with the local neighborhood assumption of IF. It further applies signed weighting based on trajectory correctness and debiases influence scores against the gradient hacking issue, allowing a small set of validation queries to act as reliable anchors for attributing the full dataset. Experiments on 7B-parameter instruction and reasoning models show that DRIFT consistently raises the performance ceiling on both, outperforming existing data curation baselines.

2606.18306 2026-06-18 cs.LG stat.ML 新提交

Fisher Width: A Geometric Measure of Complexity on Statistical Manifolds

Fisher宽度:统计流形上的几何复杂度度量

Vu Khac Ky

发表机构 * Department of Mathematics, FPT University(FPT大学数学系)

AI总结 提出Fisher宽度作为统计流形上高斯宽度的类比,利用Fisher信息度量局部几何,并证明其保持高斯宽度的关键性质,应用于Fisher-Lipschitz假设类的泛化界。

Comments 48 pages, 3 figures

详情
AI中文摘要

高斯宽度是高维概率、压缩感知、凸优化和学习理论中的一个核心几何复杂度度量。它量化了集合沿随机方向的平均延伸程度,从而捕捉了约束集、假设类和下降锥的有效维度。然而,这一概念本质上是欧几里得的。统计模型则具有由Fisher信息度量诱导的自然黎曼几何,其中方向根据统计可区分性而非环境欧几里得长度进行缩放。我们引入了Fisher宽度,即统计流形上高斯宽度的Fisher几何类比。在参数点$\theta$处,Fisher宽度将欧几里得恒等替换为局部度量张量$G(\theta)^{1/2}$,测量Fisher重缩放集的高斯宽度。这使得所得量对局部统计曲率敏感,且在光滑重参数化下不变。我们发展了Fisher宽度的基本理论,表明它保留了高斯宽度的关键结构特征,包括集中性、度量扰动稳定性以及与欧几里得基线的谱比较界,同时捕捉了欧几里得度量无法察觉的各向异性几何效应。作为应用,我们证明了Fisher-Lipschitz假设类的泛化界,并提出了可计算的估计量,在MNIST上对三个模型类进行了实证评估。Fisher宽度之于统计流形,正如高斯宽度之于欧几里得凸体。这项工作为研究弯曲统计流形上的复杂性和学习奠定了基础。

英文摘要

Gaussian width is a central geometric complexity measure in high-dimensional probability, compressed sensing, convex optimization, and learning theory. It quantifies the average extent of a set along random directions, thereby capturing the effective dimension of constraint sets, hypothesis classes, and descent cones. However, this notion is intrinsically Euclidean. Statistical models instead carry a natural Riemannian geometry induced by the Fisher information metric, where directions are scaled according to statistical distinguishability rather than ambient Euclidean length. We introduce Fisher width, a Fisher-geometric analogue of Gaussian width for statistical manifolds. At a parameter point $\theta$, Fisher width replaces the Euclidean identity by the local metric tensor $G(\theta)^{1/2}$, measuring the Gaussian width of the Fisher-rescaled set. This makes the resulting quantity sensitive to local statistical curvature and invariant under smooth reparameterizations. We develop the basic theory of Fisher width, showing that it retains key structural features of Gaussian width, including concentration, metric perturbation stability, and spectral comparison bounds with the Euclidean baseline, while also capturing anisotropic geometric effects invisible to Euclidean measures. As an application, we prove a generalization bound for Fisher-Lipschitz hypothesis classes and propose computable estimators, which we evaluate empirically on MNIST across three model classes. Fisher width is to statistical manifolds what Gaussian width is to Euclidean convex bodies. This work lays the foundation for studying complexity and learning on curved statistical manifolds.
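
As a rough illustration of "the Gaussian width of the Fisher-rescaled set", the sketch below estimates E[max over t of <g, G^{1/2} t>] by Monte Carlo, with g a standard Gaussian vector. It assumes a finite set of points and a diagonal Fisher matrix (so that G^{1/2} is an elementwise square root); both are simplifications for illustration, and the function name `fisher_width` is hypothetical rather than one of the paper's estimators.

```python
import math
import random

def fisher_width(points, fisher_diag, n_samples=20000, seed=0):
    """Monte Carlo estimate of E[max_t <g, G^{1/2} t>] over a finite point set.

    Assumes a diagonal Fisher matrix G, given by its diagonal entries.
    """
    rng = random.Random(seed)
    scale = [math.sqrt(v) for v in fisher_diag]
    # Rescale every point by G^{1/2} once, before sampling directions.
    rescaled = [[s * c for s, c in zip(scale, t)] for t in points]
    total = 0.0
    for _ in range(n_samples):
        g = [rng.gauss(0.0, 1.0) for _ in scale]
        total += max(sum(gi * ti for gi, ti in zip(g, t)) for t in rescaled)
    return total / n_samples

segment = [[1.0, 0.0], [-1.0, 0.0]]
euclidean = fisher_width(segment, [1.0, 1.0])  # identity metric: plain Gaussian width
fisher = fisher_width(segment, [4.0, 1.0])     # first direction is more distinguishable
```

With the identity metric the estimate approaches sqrt(2/pi); stretching the first coordinate by G^{1/2} = diag(2, 1) doubles it, which is the kind of anisotropy a purely Euclidean width cannot register.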

2606.18304 2026-06-18 cs.LG cs.AI 新提交

Attribution-Guided and Coverage-Maximized Pruning for Structural MoE Compression

基于归因引导和覆盖最大化的结构MoE剪枝

Yifu Ding, Jiacheng Wang, Ge Yang, Yongcheng Jing, Jinyang Guo, Xianglong Liu, Dacheng Tao

发表机构 * School of Computer Science and Engineering, Beihang University(北京航空航天大学计算机科学与工程学院) School of Artificial Intelligence, Beihang University(北京航空航天大学人工智能学院) Nanyang Technological University(南洋理工大学)

AI总结 针对MoE模型专家级剪枝粒度粗、冗余识别不足的问题,提出基于归因引导和覆盖最大化的结构剪枝框架,将剪枝分配转化为通道分数覆盖优化问题,在50%剪枝率下结合4位量化保持精度,内存减少5.27倍。

Comments 9 pages, 5 figures. Submitted to ICML 2026

详情
AI中文摘要

混合专家(MoE)模型在计算上高效扩展,但由于其巨大的内存占用和推理开销,部署成本仍然很高。先前的压缩方法主要在专家级别操作,要么移除整个专家,要么通过粗粒度的重要性分数对专家进行排序。然而,这种专家级别的决策通常过于粗糙,无法捕捉细粒度的冗余,导致剪枝预算分配不当和压缩效果有限。为了解决这个问题,我们观察到MoE专家内的信息高度集中在一小部分通道中,即使在被认为重要的专家中也存在大量冗余。基于这一观察,我们提出了一种针对MoE模型量身定制的结构剪枝框架。我们的方法将剪枝比例分配重新表述为通道分数覆盖最大化问题,并使用基于归因的近似方法高效求解。在DeepSeek和Qwen MoE模型上的实验表明,我们的方法在结合4位量化时,在50%或25%的结构化剪枝下仍能保持模型精度。在Qwen3-30B-A3B上,我们的方法将内存占用减少了5.27倍,并在各种基准测试中持续优于最先进的基线方法。

英文摘要

Mixture-of-Experts (MoE) models scale compute efficiently, yet remain expensive to deploy due to their substantial memory footprint and inference overhead. Prior compression methods mainly operate at the expert level, either removing entire experts or ranking experts by coarse-grained importance scores. However, such expert-wise decisions are often too coarse to capture fine-grained redundancy, leading to misallocated pruning budgets and limited compression. To address this problem, we observe that information within MoE experts is highly concentrated in a small subset of channels, leaving substantial redundancy even in experts deemed important. Based on this observation, we propose a structural pruning framework tailored for MoE models. Our method reformulates prune-ratio allocation as a channel-score coverage maximization problem and solves it efficiently using an attribution-based approximation. Experiments on DeepSeek and Qwen MoE models show that our method preserves model accuracy under 50% or 25% structured pruning when combined with 4-bit quantization. On Qwen3-30B-A3B, our approach reduces memory footprint by 5.27$\times$ and consistently outperforms state-of-the-art baselines across diverse benchmarks.
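
One simple way to turn channel scores into per-expert prune ratios is a global top-k selection: keep the highest-scoring channels across all experts under a shared budget, and let each expert's ratio fall out of the result. The sketch below assumes this greedy rule and made-up scores; it is not the paper's attribution-based solver, and `allocate_kept_channels` is a hypothetical name.

```python
def allocate_kept_channels(scores_per_expert, keep_fraction):
    """Keep the globally highest-scoring channels; return kept indices per expert."""
    flat = [(score, e, c)
            for e, scores in enumerate(scores_per_expert)
            for c, score in enumerate(scores)]
    budget = int(len(flat) * keep_fraction)
    flat.sort(key=lambda item: item[0], reverse=True)
    kept = [[] for _ in scores_per_expert]
    for _, e, c in flat[:budget]:
        kept[e].append(c)
    return [sorted(channels) for channels in kept]

# Illustrative scores: expert 0 concentrates most of the importance.
scores = [[0.9, 0.8, 0.1, 0.05], [0.3, 0.02, 0.01, 0.01]]
kept = allocate_kept_channels(scores, keep_fraction=0.5)
```

Under a 50% overall budget, expert 0 keeps three of four channels and expert 1 keeps one, so the prune ratio differs per expert instead of being fixed at the expert level.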

2606.18303 2026-06-18 cs.LG cs.AI 新提交

A Link between Shock-wave Theory and Symmetry-reduced Stochastic Gradient Descent for Artificial Neural Networks

冲击波理论与人工神经网络对称约化随机梯度下降之间的联系

Taiki Miyagawa

发表机构 * NEC Corporation(NEC公司)

AI总结 本文通过微分几何、李群和流体力学,建立了冲击波理论与对称商化随机梯度下降学习动力学之间的显式数学联系,并应用于多种神经网络架构。

Comments Accepted to the 35th International Conference on Artificial Neural Networks (ICANN) 2026

详情
AI中文摘要

我们利用微分几何、李群理论和流体力学,在冲击波理论与随机梯度下降的对称商化学习动力学之间建立了显式的数学联系。具体而言,在商化参数对称性并应用局部熵粗粒化后,有效动力学满足商流形上的粘性Hamilton-Jacobi方程。此外,假设原始参数动力学可由商空间上的梯度场概括,粗粒化损失函数的梯度服从Burgers型方程,且可严格建立激波形成。我们将该理论应用于多层感知机、卷积神经网络、Transformer和平均场网络,并证明它们满足Hamilton-Jacobi或Burgers型方程。我们推测该框架也能为深度学习提供实用的诊断工具。在诸如Transformer等架构中,原始参数范数常因对称冗余而失真,可能产生误导,而对称校正的商可观测量为监测、预测和控制训练阶段转变提供了原则性基础。

英文摘要

We develop a mathematically explicit link between shock-wave theory and the symmetry-quotiented learning dynamics of stochastic gradient descent, drawing on differential geometry, Lie group theory, and fluid mechanics. Specifically, after quotienting parameter symmetries and applying local-entropy coarse-graining, the effective dynamics satisfy a viscous Hamilton-Jacobi equation on the quotient manifold. Moreover, under the assumption that the raw parameter dynamics can be summarized by a gradient field on the quotiented space, the gradient of the coarse-grained loss function obeys a Burgers-type equation, and shock formation can be established rigorously. We apply our theory to multilayer perceptrons, convolutional neural networks, Transformers, and mean-field networks, and show that they obey the Hamilton-Jacobi or Burgers-type equations. We conjecture that this framework also yields practical diagnostics for deep learning. In architectures such as Transformers, raw parameter norms are often distorted by symmetry redundancy and may therefore be misleading, whereas symmetry-corrected quotient observables provide a principled basis for monitoring, forecasting, and controlling training-phase transitions.

2606.18287 2026-06-18 cs.LG 新提交

Artemis: Anatomy-Resolved inTervention for Eliminating Multimodal NeuroImage confounderS

Artemis: 解剖分辨的干预方法用于消除多模态神经影像混杂因素

Siyuan Dai, Yang Du, Kun Zhao, Zhusuyi Chen, Heng Huang, Paul Thompson, Chao Shi, Haoteng Tang, Liang Zhan

发表机构 * University of Pittsburgh(匹兹堡大学) University of Maryland(马里兰大学) University of Southern California(南加州大学) Binghamton University(宾汉姆顿大学) University of Texas Rio Grande Valley(德克萨斯大学里奥格兰德河谷分校)

AI总结 提出Artemis框架,通过区域级因果干预学习特定脑区的混杂因素表示,消除fMRI和DTI多模态神经影像中人口统计学混杂因素对GNN的影响,在三个基准上提升性能。

Comments 11 pages, 8 figures

详情
AI中文摘要

多模态神经影像学整合了来自fMRI的功能连接和来自DTI的结构连接,使得使用图神经网络对脑网络进行无创分析成为可能。然而,年龄和性别等人口统计学因素系统地混淆了脑连接与临床结果之间的关系,导致GNN利用虚假捷径而非学习因果不变表示。尽管最近的因果GNN方法在图建模层面引入因果关系,但其因果机制仍然是领域无关的,没有考虑临床神经影像数据中固有的真实世界混杂因素。此外,脑网络是基于图谱分区构建的,每个区域对人口统计学因素表现出不同的敏感性,因此需要区域感知的调整。我们提出了Artemis,一个区域级因果框架,通过在每个脑区域独立进行因果干预,使用轻量级参数学习区域特定的混杂因素表示,从而弥合了这一差距。我们的调整综合利用多模态功能和结构特征进行图推理,作为一个与任意GNN骨干兼容的插件模块。在三个基准(用于疾病诊断的ADNI、用于痴呆分期的OASIS和用于性别分类的HCP)上的实验表明,与代表性的基于GNN的基线相比,该方法具有一致的改进。多项支持实验进一步证明了统计显著性和神经科学可解释性。

英文摘要

Multimodal neuroimaging, integrating functional connectivity from fMRI and structural connectivity from DTI, enables non-invasive analysis of brain networks using graph neural networks. However, demographic factors such as age and sex systematically confound the relationship between brain connectivity and clinical outcomes, causing GNNs to exploit spurious shortcuts rather than learning causally invariant representations. While recent causal GNN methods introduce causality at the graph-modeling level, their causal mechanisms remain domain-agnostic without accounting for the real-world confounders inherent in clinical neuroimaging data. Moreover, brain networks are constructed from atlas-based parcellations where each region exhibits distinct sensitivity to demographic factors, necessitating region-aware adjustment. We propose Artemis, a region-level causal framework that bridges this gap by performing causal intervention at each brain region independently, learning region-specific confounder representations with lightweight parameters. Our adjustment uses both the functional and the structural multimodal features for graph reasoning, and acts as a plug-in module compatible with arbitrary GNN backbones. Experiments on three benchmarks, ADNI for disease diagnosis, OASIS for dementia staging, and HCP for sex classification, demonstrate consistent improvements over representative GNN-based baselines. Multiple supporting experiments further demonstrate statistical significance and neuroscientific interpretability.

2606.18286 2026-06-18 cs.LG 新提交

CODEBLOCK: Learning to Supervise Code at the Right Granularity

CODEBLOCK: 学习在正确的粒度上监督代码

Zhijie Deng, Ling Li, Jinlong Pang, Kaiqin Hu, Qi Xuan, Zhaowei Zhu, Jiaheng Wei

发表机构 * Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州)) UC Santa Cruz(加州大学圣克鲁兹分校) Ant Group(蚂蚁集团) BAIA, ZJUT(浙江工业大学智能信息处理实验室) D5Data.ai

AI总结 提出CodeBlock框架,通过选择结构完整的代码块而非孤立token进行稀疏监督,在仅使用1.9%监督token的情况下,在六个代码生成基准上取得优于全token微调的效果。

详情
AI中文摘要

代码大语言模型的监督微调通常对所有响应token应用统一的交叉熵损失,隐含假设每个token提供同等有用的学习信号。最近的token级选择方法通过仅监督高价值token挑战了自然语言SFT中的这一假设。然而,直接将token级掩码迁移到代码可能会破坏语法和语义连贯的程序单元,因为代码依赖于结构完整性和定义-使用关系。因此,我们提出CodeBlock,一个结构感知的稀疏监督框架,选择结构完整的代码证据而非孤立token。CodeBlock首先选择高质量的指令-响应对,然后将代码响应划分为语法连贯的编码项,通过聚合核心逻辑token上的广义交叉熵来估计其效用,并使用数据流可达性和桥接信号重新排序,以优先传播或连接重要程序依赖的块。在训练期间,完整响应仍作为上下文可用,但损失仅应用于选定的代码项和信息性自然语言token。在六个代码生成基准上的实验表明,CodeBlock在仅使用1.9%的监督响应token的情况下,实现了比全token SFT和竞争性选择基线更强的平均pass@1。

英文摘要

Supervised fine-tuning of code LLMs typically applies uniform cross-entropy loss to all response tokens, implicitly assuming that every token provides equally useful learning signal. Recent token-level selection methods challenge this assumption in natural-language SFT by supervising only high-value tokens. However, directly transferring token-level masking to code can break syntactically and semantically coherent program units, because code depends on structural completeness and definition-use relations. We therefore propose CodeBlock, a structure-aware sparse supervision framework that selects structure-complete code evidence rather than isolated tokens. CodeBlock first selects high-quality instruction-response pairs, then partitions code responses into syntactically coherent coding items, estimates their utility by aggregating generalized cross-entropy over core logic tokens, and reranks them with data-flow reach and bridge signals to prioritize blocks that propagate or connect important program dependencies. During training, the full response remains available as context, while loss is applied only to selected code items and informative natural-language tokens. Experiments on six code-generation benchmarks show that CodeBlock achieves stronger average pass@1 than full-token SFT and competitive selection baselines, while using only 1.9% of supervised response tokens.
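
The training-time rule described above, full response in context but loss only on selected positions, amounts to a masked loss. The sketch below shows that mechanism with ordinary cross-entropy over hand-written probabilities; how the mask is chosen (coding items, data-flow signals) is not modeled, and `masked_cross_entropy` is a hypothetical helper, not the paper's code.

```python
import math

def masked_cross_entropy(token_probs, supervise_mask):
    """Mean negative log-likelihood over supervised positions only.

    token_probs[i] is the model's probability of the reference token at
    position i, computed with the whole response in context; the mask only
    decides which positions contribute to the loss.
    """
    selected = [p for p, keep in zip(token_probs, supervise_mask) if keep]
    if not selected:
        return 0.0
    return -sum(math.log(p) for p in selected) / len(selected)

probs = [0.9, 0.5, 0.25, 0.8]        # illustrative per-token probabilities
mask = [False, True, True, False]    # supervise only the middle two tokens
loss = masked_cross_entropy(probs, mask)
```

Unsupervised tokens still influence the probabilities of later tokens through the context, but they add no term to the loss.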

2606.18284 2026-06-18 cs.LG cs.AI cs.CL 新提交

Breaking the Solver Bottleneck: Training Task Generators at the Learnable Frontier

打破求解器瓶颈:在可学习前沿训练任务生成器

Lorenz Wolf, Connor Watts, Roger Creus Castanyer, Geoffrey Bradway, Maxwill Lin, Augustine N. Mavor-Parker, Matthew Daborn-Sargent

发表机构 * Vmax Goodfire AI

AI总结 提出PROPEL框架,通过训练轻量级激活探针作为求解率代理,在无需重复求解器评估的情况下优化任务生成器,使生成任务集中在可学习前沿,提升数学、代码和软件工程任务的有效性。

Comments 30 pages, 9 figures, 12 tables

详情
AI中文摘要

通过强化学习训练智能体的限制资源日益成为前沿任务供给:有效、可求解且刚好足够困难以训练当前模型的任务。随着推理和智能体模型的改进,固定任务分布趋于饱和,而天真的合成生成产生琐碎、不可能或不适定的任务。用强化学习训练任务生成器以优化有效性和可学习性可以解决这一瓶颈,但直接优化需要对每个候选任务进行重复求解器评估。对于软件工程任务,单次评估可能耗时数十分钟;求解器在环的生成器训练是不可行的。我们提出PROPEL,一个求解器摊销框架,用于在目标求解率下训练任务生成器。PROPEL在一次性标注的生成任务和求解器结果语料库上训练一个轻量级激活探针。该探针从冻结的生成器参考模型预测目标求解器的通过率,并在生成器优化期间作为求解率的代理,将生成器评估简化为单次前向传播。在多种模型规模下的数学、代码和软件工程任务中,PROPEL将生成任务转向目标求解率:对于编程,在可学习前沿生成的任务从$10.1\% \rightarrow 20.0\%$(针对Qwen2.5-3B-Instruct求解器)和从$5.3\% \rightarrow 12.6\%$(针对Qwen2.5-7B-Instruct求解器)。对于软件工程,PROPEL将目标求解率下的生成份额从$9.8\% \rightarrow 19.6\%$(针对Qwen3.5-27B在探针和生成器训练期间未见过的仓库)。

英文摘要

The limiting resource for training agents via reinforcement learning (RL) is increasingly frontier task supply: valid, solvable tasks just difficult enough to train the current model. As reasoning and agentic models improve, fixed task distributions saturate, while naive synthetic generation yields tasks that are trivial, impossible, or ill-posed. Training a task generator with RL to optimize validity and learnability can address this bottleneck, but direct optimization requires repeated solver rollouts per candidate. For software-engineering (SWE) tasks, a single rollout can take tens of minutes; solver-in-the-loop generator training is intractable. We introduce PROPEL, a solver-amortized framework for training task generators at the targeted solve rate. PROPEL trains a lightweight activation probe on a one-time labeled corpus of generated tasks and solver outcomes. The probe predicts target-solver pass rate from a frozen generator reference model and serves as a proxy for solve rate during generator optimization, reducing generator evaluation to a single forward pass. Across math, code, and software-engineering at multiple model scales, PROPEL shifts generation toward the targeted solve rate: for coding, tasks generated at the learnable frontier increase from $10.1\% \rightarrow 20.0\%$ for a Qwen2.5-3B-Instruct solver and from $5.3\% \rightarrow 12.6\%$ for a Qwen2.5-7B-Instruct solver. For SWE, PROPEL increases the share of generations at the targeted solve rate from $9.8\% \rightarrow 19.6\%$ for Qwen3.5-27B on repositories not seen during training of probe and generator.

2606.18273 2026-06-18 cs.CL cs.AI cs.SD eess.AS 新提交

Continuous Audio Thinking for Large Audio Language Models

面向大型音频语言模型的连续音频思考

Gyojin Han, Dong-Jae Lee, Changho Choi, Jongsuk Kim, Junmo Kim

发表机构 * KAIST(韩国科学技术院)

AI总结 提出连续音频思考(CoAT)框架,通过专家蒸馏在连续潜在空间中组织声学信息,使音频语言模型在生成响应前利用丰富声学特征,无需额外自回归解码成本,在多个音频任务上提升性能。

Comments Preprint

详情
AI中文摘要

大型音频语言模型(LALMs)在从语音转录到音乐分析等多种音频理解任务中展现了令人印象深刻的能力。然而,由于LALMs通常被训练生成与文本对齐的响应,其隐藏状态逐渐为文本生成而塑造,而非保留声学信息。因此,音频携带的多样化声学内容,如语音细节、韵律、声音事件、情感和音调,在过程中丢失,难以在响应中利用。我们引入了连续音频思考(CoAT),这是一个框架,为音频语言模型配备一个连续的潜在工作空间,用于在响应生成之前组织声学信息,并通过音频专家的蒸馏进行基础化。在思考空间内,模型可以在生成响应时利用专家蒸馏提供的丰富声学信息。此外,所提出的连续思考块可以在单个预填充中处理,因此CoAT不需要比基线额外的自回归解码成本。在三个LALM上,Qwen2-Audio、Qwen2.5-Omni-7B和Audio Flamingo 3,在涵盖音频推理、音频理解、音乐分类、语音情感和语音转录的广泛基准套件上的性能提升证明了CoAT的有效性。进一步分析证实,辅助监督从思考位置传播到模型的文本响应。

英文摘要

Large audio language models (LALMs) have shown impressive capabilities on diverse audio understanding tasks, ranging from speech transcription to music analysis. However, because LALMs are typically trained to produce text-aligned responses, their hidden states are progressively shaped for text generation rather than for preserving acoustic information. As a result, the diverse acoustic content that audio carries, such as phonetic detail, prosody, sound events, affect, and pitch, is lost along the way and difficult to leverage in the response. We introduce Continuous Audio Thinking (CoAT), a framework that equips audio language models with a continuous latent workspace for organizing acoustic information prior to response generation, grounded by distillation from audio experts. Within the thinking space, the model can utilize the rich acoustic information provided by expert distillation when generating its response. Furthermore, the proposed continuous thinking block can be processed in a single prefill, so CoAT does not require additional autoregressive decoding cost over the baseline. Across three LALMs, Qwen2-Audio, Qwen2.5-Omni-7B, and Audio Flamingo 3, performance gains on a broad benchmark suite spanning audio reasoning, audio understanding, music classification, speech emotion, and speech transcription demonstrate the effectiveness of CoAT. Further analysis confirms that the auxiliary supervision propagates from the thinking positions to the model's textual responses.

2606.18271 2026-06-18 cs.AI cs.LG 新提交

NAVI-Orbital: First In-Orbit Demonstration of a Zero-Shot Vision-Language Model for Autonomous Earth Observation

NAVI-Orbital:用于自主地球观测的零样本视觉语言模型的首次在轨演示

Juan Manuel Delfa Victoria, Taran Cyriac John, Andrew W. Herson

发表机构 * NASA Jet Propulsion Laboratory (JPL)(美国宇航局喷气推进实验室) Loft Orbital(Loft Orbital公司)

AI总结 本文介绍NAVI-Orbital系统,在低地球轨道卫星上首次实现视觉语言模型的自主多模态推理,通过语义压缩解决数据下传瓶颈。

Comments 17 pages, 47 figures

详情
AI中文摘要

随着地球观测数据的生成速度超过下行链路带宽和人在回路处理能力,星载采集与可操作地面情报之间的差距日益扩大。本文介绍NAVI-Orbital,一个部署在低地球轨道(LEO)航天器上的软件系统。2026年4月16日,NAVI-Orbital实现了据作者所知首次在轨演示,即视觉语言模型完全在星上进行自主多模态推理。NAVI-Orbital使用本地视觉语言模型(Gemma 3)对每个捕获场景进行分类,生成其内容及特征间关系的文本描述,并通过自然语言对话响应操作员的后续查询。该系统通过纯英语提示替代传统指令序列进行任务重定向,并由基于图的状态机(LangGraph)编排,协调用于检测和对话的专用代理。地面基准测试(在7,960张图像的精选AID基准上准确率达88.16%)、Flatsat验证以及实时在轨捕获的新获取、未见过的地球图像(包括未校正的YAM-9图像,在星上通过硬件加速GPU推理处理且未对飞行仪器进行微调)的结果表明,在卫星级边缘计算机上运行基础模型是可行的,通过星上地球观测的语义压缩,颠覆了传统的先采集后全部下传的带宽模式。

英文摘要

As Earth Observation data generation outpaces downlink bandwidth and human-in-the-loop processing, a widening gap has emerged between onboard collection and actionable ground intelligence. This paper presents NAVI-Orbital, a software system deployed on a Low Earth Orbit (LEO) spacecraft. On April 16, 2026, NAVI-Orbital achieved what is, to the authors' knowledge, the first in-orbit demonstration of a vision-language model performing autonomous multi-modal inference entirely onboard. NAVI-Orbital uses a local vision-language model (Gemma 3) to classify each captured scene, produce a text description of its content and the relationships between its features, and respond to operator follow-up via natural-language dialogue. The system is re-tasked through plain-English prompts in place of conventional command sequences, and is orchestrated by a graph-based state machine (LangGraph) coordinating dedicated agents for detection and dialogue. Results across ground benchmarking (88.16% accuracy on the 7,960-image curated AID benchmark), Flatsat validation, and live in-orbit captures of newly acquired, previously unseen Earth imagery (including uncorrected YAM-9 imagery, processed onboard with hardware-accelerated GPU inference and no fine-tuning for the flight instrument) demonstrate the feasibility of running foundation models on satellite-class edge computers to invert the conventional acquire-then-downlink-everything bandwidth profile through semantic compression of Earth observations in-orbit.

2606.19279 2026-06-18 cs.AI cs.LG cs.LO math.CT math.LO math.PR 新提交

NeSyCat Torch: A Differentiable Tensor Implementation of Categorical Semantics for Neurosymbolic Learning

NeSyCat Torch:神经符号学习中范畴语义的可微张量实现

Daniel Romero Schellhorn, Till Mossakowski, Björn Gehrke

发表机构 * University of Osnabrück(奥斯纳布吕克大学)

AI总结 提出NeSyCat Torch框架,通过强单子和真值聚合结构统一神经符号语义,利用惰性对数张量单子实现可微训练,在MNIST加法任务上优于LTN和DeepProbLog。

详情
AI中文摘要

神经符号语义是碎片化的:经典、模糊、概率和神经系统的真值各自遵循其归纳规则。NeSyCat扩展了ULLER,将它们统一在一个单一的真值归纳定义下,该定义以强单子和真值上的聚合结构为参数。NeSyCat至今缺乏对由神经网络学习的谓词和函数的描述。我们提供NeSyCat Torch作为缺失的环节,通过神经网络解释计算符号,在概率编程和张量后端中实现该框架。我们使用分布单子作为参考语义和度量评估,并辅以一个用于数值稳定、可微训练的单子:对数半环上的惰性对数张量单子。为了高效批量训练,我们还采用了批处理单子。公理即源代码:一次性地用基于单子的do-notation编写,单子绑定执行边缘化,惰性地剪枝不需要的分支。在MNIST加法任务上,我们的HaskTorch、JAX和PyTorch实现在速度和准确性上优于LTN和DeepProbLog,同时几乎达到DeepStochLog的准确性。然而,与DeepStochLog不同,我们保持在一个统一的框架内,适用于许多一阶神经符号方法。即,该构造以单子为参数;例如,用Giry单子实例化它可将方法扩展到连续概率(在此留作未来工作)。

英文摘要

Neurosymbolic semantics is fragmented: classical, fuzzy, probabilistic and neural systems each define truth by their own inductive rules. NeSyCat, extending ULLER, subsumes them under a single inductive definition of truth, parametric in a strong monad and an aggregation structure on truth-values. NeSyCat has so far lacked an account of predicates and functions learned by neural networks. We provide NeSyCat Torch as the missing link and interpret computational symbols via neural networks, implementing the framework in probabilistic programming and tensor-based backends. We use the distribution monad for reference semantics and metric evaluation, and complement it by a monad for numerically stable, differentiable training: the lazy log-tensor monad over the log-semiring. For efficient training in batches, we furthermore employ a batch monad. The axioms are the source code: written once in monad-based do-notation, monadic bind performs marginalisation, lazily pruning unneeded branches. On MNIST addition, our HaskTorch, JAX, and PyTorch implementations outperform LTN and DeepProbLog in speed and accuracy, while achieving nearly the accuracy of DeepStochLog. However, unlike DeepStochLog, we stay in a uniform framework that applies to many first-order NeSy approaches. Namely, the construction is parametric in the monad; instantiating it with, e.g., the Giry monad extends the approach to continuous probability (working out a neural representation here is left for future work).

2606.19179 2026-06-18 cs.LG cs.AI math.OC stat.ML 新提交

Compute Efficiency and Serial Runtime Tradeoffs for Stochastic Momentum Methods

随机动量方法的计算效率与串行运行时间权衡

Depen Morwani, Alexandru Meterez, Pranav Nair, Sham Kakade

发表机构 * Harvard University(哈佛大学) Kempner Institute at Harvard University(哈佛大学凯普纳研究所)

AI总结 研究随机动量方法(如重球法和加速SGD)在一致线性回归中的批次大小权衡,证明重球法不改善SGD的计算效率前沿但允许更大批次减少串行运行时间,而加速SGD的计算效率与串行运行时间权衡依赖于谱衰减。

详情
AI中文摘要

随机动量方法,如重球法(HB)、Nesterov动量以及加速SGD(ASGD)的变体[Kidambi等人,2018],在现代训练中被广泛使用,但其随机优势取决于两个不同的量:串行运行时间(达到目标精度所需的迭代次数)和计算效率(CE,总梯度查询或FLOP成本的倒数)。更大的批次在不损害CE的情况下减少串行运行时间,仅当收缩间隙随批次大小线性增长时。我们研究了一致线性回归(具有高斯协变量)的随机HB和ASGD,并证明了其批次大小权衡的有限维离散时间下界。我们的第一个结果表明,HB不会改善任意谱下SGD的CE前沿;相反,它在更大的批次大小窗口内保持SGD级别的CE,允许更大的批次减少串行运行时间,直到HB达到其确定性加速尺度。这个窗口可能比SGD临界批次大小大$\sqrt{\kappa}$倍。对于ASGD,情况更依赖于谱:对于快速衰减的幂律谱,ASGD改善了小批次下的CE(相对于HB/SGD),但随着批次大小增加,它牺牲了这种CE优势以换取改进的串行运行时间。合成线性回归实验验证了这些定性区域,包括慢衰减谱下ASGD和HB的近乎重叠,以及快速衰减谱下预测的CE-串行权衡。

英文摘要

Stochastic momentum methods such as heavy ball (HB), Nesterov momentum, and variants of Accelerated SGD (ASGD) [Kidambi et al., 2018] are widely used in modern training, but their stochastic benefits depend on two distinct quantities: serial runtime, the number of iterations needed to reach a target accuracy, and compute efficiency (CE), the inverse total gradient-query or FLOP cost. Larger batches reduce serial runtime without hurting CE only when the contraction gap grows linearly with batch size. We study stochastic HB and ASGD for consistent linear regression with Gaussian covariates and prove finite-dimensional, discrete-time lower bounds on their batch-size tradeoffs. Our first result shows that HB does not improve the CE frontier over SGD for arbitrary spectra; rather, it preserves SGD-level CE over a larger batch-size window, allowing larger batches to reduce serial runtime until HB reaches its deterministic accelerated scale. This window can be a factor $\sqrt{\kappa}$ larger than the SGD critical batch size. For ASGD, the picture is more spectrum-dependent: for rapidly decaying power-law spectra, ASGD improves small-batch CE over HB/SGD, but as batch size grows it trades this CE advantage for improved serial runtime. Synthetic linear-regression experiments verify these qualitative regimes, including near-overlap of ASGD and HB for slowly decaying spectra and the predicted CE-serial tradeoff for rapidly decaying spectra.
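
For context on the "deterministic accelerated scale" mentioned above, the sketch below compares plain gradient descent with heavy ball on a noise-free diagonal quadratic with condition number 100, using the classical step-size and momentum choices for quadratics. It is a deterministic toy only: it does not model batch size, gradient noise, or the paper's lower bounds, and `iterations_to_converge` is a hypothetical helper.

```python
import math

def iterations_to_converge(eigs, lr, beta, tol=1e-6, max_iter=100000):
    """Iterate x_{t+1} = x_t - lr * grad f(x_t) + beta * (x_t - x_{t-1})
    on f(x) = 0.5 * sum(e_i * x_i^2) and count steps until ||x|| < tol."""
    x = [1.0] * len(eigs)
    prev = list(x)  # no momentum on the first step
    for step in range(1, max_iter + 1):
        new = [xi - lr * e * xi + beta * (xi - pi)
               for xi, pi, e in zip(x, prev, eigs)]
        prev, x = x, new
        if math.sqrt(sum(v * v for v in x)) < tol:
            return step
    return max_iter

eigs = [1.0, 100.0]  # condition number kappa = 100
mu, big_l = min(eigs), max(eigs)

# Gradient descent (beta = 0) with the classical step size 2 / (L + mu).
gd_steps = iterations_to_converge(eigs, lr=2.0 / (big_l + mu), beta=0.0)

# Heavy ball with the classical tuning for quadratics.
root_l, root_mu = math.sqrt(big_l), math.sqrt(mu)
hb_steps = iterations_to_converge(
    eigs,
    lr=4.0 / (root_l + root_mu) ** 2,
    beta=((root_l - root_mu) / (root_l + root_mu)) ** 2,
)
```

Heavy ball needs far fewer serial iterations here, which is the noise-free speedup that, per the abstract, larger batches allow stochastic HB to approach.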

2606.18730 2026-06-18 cs.RO cs.AI math.CO math.OC 新提交

Two-Phase Bilevel Search for the Moving-Target Traveling Salesman Problem with Moving Obstacles

带移动障碍物的移动目标旅行商问题的两阶段双层搜索

Allen George Philip, Anoop Bhat, Sivakumar Rathinam, Howie Choset

发表机构 * Texas A&M University(德克萨斯A&M大学) Carnegie Mellon University(卡内基梅隆大学)

AI总结 针对带移动障碍物的移动目标旅行商问题,提出混合整数锥规划公式和两阶段双层搜索算法,显著优于基线方法。

详情
AI中文摘要

移动目标旅行商问题(MT-TSP)寻求从静态仓库出发、访问一组移动目标(每个目标在其分配的时间窗口内)并返回仓库的代理的最小成本轨迹。在本文中,我们研究了带移动障碍物的移动目标旅行商问题(MT-TSP-MO),这是MT-TSP的推广,其中代理轨迹必须避开移动障碍物。我们提出了一个混合整数锥规划(MICP)公式,可以使用现成的求解器求解,以及一个快速且可扩展的两阶段双层搜索(TPBS)算法,该算法为问题计算高质量可行解。我们在多达40个目标和40个障碍物的广泛问题实例上评估了我们的方法,与现有基线算法相比。结果表明,所提出的两种方法在成功率、解决方案成本和计算时间方面均显著优于基线。

英文摘要

The Moving-Target Traveling Salesman Problem (MT-TSP) seeks a minimum cost trajectory for an agent that departs from a static depot, visits a set of moving targets, each within one of their assigned time windows, and returns to the depot. In this article, we study the Moving-Target Traveling Salesman Problem with Moving Obstacles (MT-TSP-MO), a generalization of the MT-TSP where the agent trajectory must avoid moving obstacles. We present a Mixed-Integer Conic Programming (MICP) formulation that can be solved using off-the-shelf solvers, as well as a fast and scalable Two-Phase Bilevel Search (TPBS) algorithm that computes high-quality feasible solutions for the problem. We evaluate our approaches against an existing baseline algorithm on a broad range of problem instances with up to 40 targets and 40 obstacles. The results demonstrate that both the proposed methods significantly outperform the baseline with respect to success rates, solution costs, and computation time.
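
A basic building block for problems of this kind is the earliest time at which an agent with bounded speed can meet a target moving in a straight line, which reduces to a quadratic in t. The sketch below solves that single-leg subproblem in 2D under simplifying assumptions (constant target velocity, no obstacles, no time windows); it is not the MICP formulation or the TPBS algorithm, and `earliest_intercept_time` is a hypothetical name.

```python
import math

def earliest_intercept_time(agent_pos, agent_speed, target_pos, target_vel):
    """Smallest t >= 0 with |target_pos + t * target_vel - agent_pos| = agent_speed * t.

    Returns None when the agent can never reach the target.
    """
    px, py = target_pos[0] - agent_pos[0], target_pos[1] - agent_pos[1]
    ux, uy = target_vel
    # Squaring both sides gives a * t^2 + b * t + c = 0.
    a = ux * ux + uy * uy - agent_speed ** 2
    b = 2.0 * (px * ux + py * uy)
    c = px * px + py * py
    if c == 0.0:
        return 0.0  # already at the target
    if abs(a) < 1e-12:  # equal speeds: the quadratic degenerates to b * t + c = 0
        return -c / b if b < 0 else None
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    valid = [t for t in ((-b - root) / (2 * a), (-b + root) / (2 * a)) if t >= 0]
    return min(valid) if valid else None

head_on = earliest_intercept_time((0.0, 0.0), 1.0, (10.0, 0.0), (-1.0, 0.0))
crossing = earliest_intercept_time((0.0, 0.0), 2.0, (10.0, 0.0), (0.0, 1.0))
fleeing = earliest_intercept_time((0.0, 0.0), 1.0, (10.0, 0.0), (2.0, 0.0))
```

The reachability condition |p + u t - a| <= v t is a second-order cone constraint in (t, position), which is one reason conic formulations are a natural fit for moving-target routing.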

2606.19026 2026-06-18 cs.LG cs.AI physics.ao-ph 新提交

A Hybrid LSTM-Vision Transformer Architecture for Predicting HRRR Forecast Errors

混合LSTM-视觉Transformer架构用于预测HRRR预报误差

David Aaron Evans, Jay C. Rothenberger, Kara J. Sulia, Nick P. Bassill, Chris D. Thorncroft

发表机构 * Atmospheric Sciences Research Center, University at Albany, SUNY(纽约州立大学奥尔巴尼分校大气科学研究中心) University of Oklahoma(俄克拉荷马大学) State Weather Risk Communication Center, University at Albany, SUNY(纽约州立大学奥尔巴尼分校州天气风险沟通中心)

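AI summary The paper proposes a hybrid LSTM-ViT framework that combines surface-observation time series with atmospheric profiles to predict HRRR forecast errors in precipitation, wind speed, and temperature. It improves on a baseline LSTM, with roughly a twofold gain in predictive skill for precipitation forecast error.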

Comments This manuscript is a preprint and has been submitted for peer review to the Artificial Intelligence for the Earth Systems journal. The content is subject to change based on the outcome of the peer review process and should not be considered final or definitive. Copyright in this Work may be transferred without further notice.

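Details
AI Chinese abstract (translated to English)

Forecast errors in high-resolution numerical weather prediction (NWP) systems are often linked to unresolved planetary boundary layer (PBL) processes, convection, terrain-induced circulations, and other vertically structured atmospheric phenomena. Previous work showed that Long Short-Term Memory (LSTM) networks can successfully predict forecast errors of the High-Resolution Rapid Refresh (HRRR) model from mesonet observations, but we believe performance degradation is linked to periods of complex vertical atmospheric evolution. To address this limitation, we develop a hybrid LSTM-Vision Transformer (LSTM-ViT) framework that combines temporal sequence learning from surface observations with vertical atmospheric profiles from the New York State Mesonet profiler network. The LSTM-ViT framework is trained to predict HRRR hourly precipitation, 10 m wind speed, and 2 m temperature forecast errors at individual mesonet stations. For all three predicted variables, incorporating profiler-derived atmospheric structure improves forecast error prediction skill relative to the baseline LSTM architecture, with the largest gains at shorter forecast lead times and during periods of enhanced PBL activity. The improvement is especially pronounced for precipitation forecast error, where the LSTM-ViT framework achieves roughly a twofold increase in predictive skill over the baseline LSTM while better capturing convectively driven error evolution and reducing degradation associated with PBL processes. These results show that combining temporal sequence learning with vertically informed attention mechanisms offers a physically meaningful path to improving forecast error prediction in operational NWP systems. Our research gives forecasters enhanced guidance on model bias and forecast confidence.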

English abstract

Forecast errors in high-resolution numerical weather prediction (NWP) systems are often linked to unresolved planetary boundary layer (PBL) processes, convection, terrain-induced circulations, and other vertically structured atmospheric phenomena. Previous work demonstrated that Long Short-Term Memory (LSTM) networks can successfully predict forecast errors in the High-Resolution Rapid Refresh (HRRR) model using mesonet observations, but we believe performance degradation is linked to periods of complex vertical atmospheric evolution. To address this limitation, we develop a hybrid LSTM-Vision Transformer (LSTM-ViT) framework that combines temporal sequence learning from surface observations with atmospheric profiles from the New York State Mesonet profiler network. The LSTM-ViT framework is trained to predict HRRR hourly precipitation, 10 m wind speed, and 2 m temperature forecast errors at individual mesonet stations. Across all three predictors, incorporation of profiler-derived atmospheric structure improves forecast error prediction skill relative to the baseline LSTM architecture, with the largest gains occurring at shorter forecast lead times and during periods of enhanced PBL activity. Improvements are particularly pronounced for precipitation forecast error, where the LSTM-ViT framework achieves approximately a twofold increase in predictive skill relative to the baseline LSTM while better capturing convectively driven error evolution and reducing degradation associated with PBL processes. These results demonstrate that combining temporal sequence learning with vertically informed attention mechanisms provides a physically meaningful pathway for improving forecast error prediction in operational NWP systems. Our research offers forecasters enhanced guidance regarding model bias and forecast confidence.

2606.18857 2026-06-18 cs.LG physics.ao-ph New submission

Investigating Inductive Biases for Machine Learning Emulation of Sudden Stratospheric Warmings in Idealised Isca Simulations

Investigating Inductive Biases for Machine-Learning Emulation of Sudden Stratospheric Warmings in Idealised Isca Simulations

Oskar Bohn Lassen, Simon Driscoll, Stephen I. Thomson, Sebastian Schemm, Francisco C. Pereira

Affiliations * Technical University of Denmark; University of Cambridge; University of Exeter

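AI summary The paper tests how the inductive biases of different architectures affect the emulation of sudden stratospheric warming dynamics. It finds that three-dimensional vertical coupling is key, but that low forecast error does not guarantee physical consistency.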

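Details
AI Chinese abstract (translated to English)

Machine-learning emulators are increasingly used for weather prediction and have the potential to extend skill to subseasonal-to-seasonal timescales by learning dynamically important sources of predictability. A key challenge is whether models can exploit predictability anchors, such as stratospheric variability, that influence tropospheric circulation beyond short lead times. We test how architectural inductive bias affects the emulation of sudden stratospheric warming (SSW) dynamics using paired idealised Isca simulations that differ only in an imposed wave-2 heating perturbation. Across convolutional, transformer, and graph-based architectures trained for one-step prediction, differences between models are modest when the stratosphere is dynamically quiet but widen substantially when SSW-like variability is active. Our results identify explicit three-dimensional vertical coupling as a key inductive bias for machine-learning emulation of stratospheric dynamics. However, Eliassen-Palm flux diagnostics show that low forecast error does not guarantee physically faithful wave-mean-flow interaction: coherent errors remain in the structure of stratospheric wave driving.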

English abstract

Machine-learning emulators are increasingly used for weather prediction and have the potential to extend skill on subseasonal-to-seasonal timescales by learning dynamically important sources of predictability. A key challenge is whether the models can exploit predictability anchors, such as stratospheric variability, that influence tropospheric circulation beyond short lead times. We test how architectural inductive bias affects emulation of sudden stratospheric warming (SSW) dynamics using paired idealised Isca simulations that differ only in an imposed wave-2 heating perturbation. Across convolutional, transformer, and graph-based architectures trained for one-step prediction, model differences are modest when the stratosphere is dynamically quiet but widen substantially when SSW-like variability is active. Our results identify explicit three-dimensional vertical coupling as a key inductive bias for machine-learning emulation of stratospheric dynamics. However, Eliassen-Palm flux diagnostics show that low forecast error does not guarantee physically faithful wave-mean-flow interaction, with coherent errors remaining in stratospheric wave-driving structure.

2606.18713 2026-06-18 cs.LG physics.comp-ph New submission

Trainable Photonic Measurement for Physics-Informed PDE Learning

Trainable Photonic Measurement for Physics-Informed PDE Learning

Jiale Linghu, Hao Dong, Yangshuai Wang

Affiliations * Xidian University; National University of Singapore

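AI summary The paper proposes a photonic quantum neural field that encodes coordinates as trainable optical phases, mixes them through multi-photon Fock-space interference, and decodes them from photon-number measurements, so that the measurement serves as a trainable representation for physics-informed residual minimization. Across seven PDE benchmarks it shows a phase-complexity transition: in the hardest regimes its errors are lower by up to an order of magnitude while it uses about one quarter of the trainable parameters of classical baselines.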

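Details
AI Chinese abstract (translated to English)

Photonic quantum machine learning offers a route to building trainable physical representations from phase, interference, and measurement. However, its role in scientific machine learning remains largely unexplored. Physics-informed neural fields provide a natural setting, because differential equations require trial spaces that preserve phase, frequency, and derivative structure. Here we introduce a photonic quantum neural field in which coordinates become trainable optical phases, are mixed by multi-photon Fock-space interference, and are decoded from photon-number measurements. The photonic circuit is optimized as the neural-field representation itself, not as a fixed feature map or hardware accelerator. Photonic measurement is therefore a trainable representation on which the physics-informed residual is minimized. Across seven elliptic, wave, nonlinear dispersive, and inverse PDE benchmarks, we observe a phase-complexity transition: classical coordinate and Fourier-feature networks suffice in smooth regimes, whereas the photonic field is most accurate when residual derivatives amplify phase mismatch. In the hardest regimes it gives the lowest errors, with margins reaching an order of magnitude, while using about one quarter of the trainable parameters of classical baselines. Frozen and shuffled controls, together with noise stress tests, attribute this gain to learned interference and to a Fock-probability readout that remains stable under compound perturbations. These results identify photonic quantum measurement as a representation-learning principle for scientific machine learning.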

English abstract

Photonic quantum machine learning offers a route to trainable physical representations built from phase, interference and measurement. However, its role in scientific machine learning remains largely unexplored. Physics-informed neural fields provide a natural setting, because differential equations require trial spaces that preserve phase, frequency and derivative structure. Here we introduce a photonic quantum neural field in which coordinates become trainable optical phases, are mixed by multi-photon Fock-space interference and are decoded from photon-number measurements. The photonic circuit is optimized as the neural-field representation itself, not as a fixed feature map or hardware accelerator. Photonic measurement is therefore a trainable representation on which the physics-informed residual is minimized. Across seven elliptic, wave, nonlinear dispersive and inverse PDE benchmarks, we observe a phase-complexity transition: classical coordinate and Fourier-feature networks suffice in smooth regimes, whereas the photonic field is most accurate when residual derivatives amplify phase mismatch. In the hardest regimes it gives the lowest errors, with margins reaching an order of magnitude and about one quarter of the trainable parameters of classical baselines. Frozen and shuffled controls, together with noise stress tests, attribute this gain to learned interference and stable Fock-probability readout under compound perturbations. These results identify photonic quantum measurement as a representation-learning principle for scientific machine learning.