arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.31343 2026-06-01 cs.RO

Learning Terrain-Aware Whole-Body Control for Perceptive Legged Loco-Manipulation

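Sikai Guo, Yudong Zhong, Guoyang Zhao, Botao Dang, Zhihai Bi, Jun Ma

AI summary: Proposes the TA-WBC framework, which combines a hybrid exteroception encoder for extracting terrain features, an end-effector sampling method based on the foot contact plane, and a dual-policy distillation module to achieve whole-body loco-manipulation control for legged manipulators on complex terrain.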

Abstract

Legged manipulators integrate exceptional terrain adaptability along with mobile manipulation capabilities, which make them highly promising for deployment in human-centric environments. By coordinating the control of both legs and arms, a whole-body controller can significantly expand the operational workspace of legged manipulators. However, many existing whole-body controllers primarily depend on proprioception and do not incorporate the critical exteroception required for effective terrain topology perception. This limitation can hinder their ability to adapt to varying environmental conditions and navigate complex terrains effectively. In this paper, we introduce TA-WBC, a terrain-aware whole-body control framework for legged manipulators, which features a novel RL-based unified policy tailored to whole-body loco-manipulation tasks in various terrains. Specifically, we employ a hybrid exteroception encoder to extract terrain features, providing an essential basis for the robot to proactively adapt posture and footholds. Furthermore, to facilitate stable cross-terrain loco-manipulation, we propose a novel end-effector sampling method based on the foot contact plane, decoupling manipulation target from base fluctuations. Moreover, a dual-policy distillation module is introduced to integrate expansive whole-body motion with terrain adaptability without catastrophic forgetting. The simulation and real-world experiments validate the robustness of our proposed controller, which leads to a larger reachable space, less tracking error, and reduced unexpected stumbles. This unified policy highlights the promising capabilities of legged manipulators in performing loco-manipulation tasks across complex terrains.

2605.31338 2026-06-01 cs.CL

Bundesrecht: An Open Library and Corpus for German Statutory Reference Processing

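Harshil Darji, Martin Heckelmann, Christina Kratsch, Gerard de Melo

AI summary: Presents bundesrecht, an open resource comprising a software library and a structured corpus for parsing, normalizing, and resolving German statutory references, enabling end-to-end processing from raw citation strings to structured statutory provisions.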

Comments 10 pages, 1 figure. Preprint

Abstract

Statutory references are central to legal language understanding, but are difficult to process automatically, as they appear in compact and variable surface forms, may combine multiple targets, use special abbreviations, and often point to lower-level units. Existing tools for German focus either on parsing references from legal documents or accessing statutory text once citations are explicit. This paper introduces bundesrecht, an open resource for German statutory reference processing, consisting of a software library and a structured corpus of German federal law. The library parses, normalizes, and resolves German statutory references, mapping raw citation strings to structured objects, expanding compact references into canonical forms, and linking them to statutory provisions. The accompanying dataset preserves the internal hierarchy of statutes from laws to fine-granular subclauses. We evaluate the parser and normalizer on 2,944 annotated German legal references using strict exact-match and micro information extraction metrics. We further evaluate canonical reference deduplication and show that normalized references group real citation surface variants far more reliably than string matching. bundesrecht is the first open resource that covers German statutory reference processing as an end-to-end pipeline, from raw citation string to resolved statutory provision, and is available on PyPI.

2605.31336 2026-06-01 cs.CV

DecMem: Towards Minute-Long Consistent World Generation with Decoupled Memory

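Zhenhao Yang, Xiaoshi Wu, Zhengyao Lv, Xiaoyu Shi, Xintao Wang, Pengfei Wan, Kun Gai, Kwan-Yee K. Wong

AI summary: Proposes DecMem, a decoupled memory architecture that uses sparse global memory and anchored local memory to address spatio-temporal consistency in long-horizon video generation, enabling minute-level controllable long video generation.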

Comments Project page is available at https://jeffreyyzh.github.io/DecMem-Page

Abstract

Recent advances in video generative models have promoted rapid progress in controllable world models. However, maintaining fine-grained spatio-temporal consistency under long-horizon reasoning remains a key challenge. In this work, we move beyond explicit 3D memory and coarse frame-level implicit modeling, and propose a fine-grained, learnable, and scalable memory for consistent world generation. We first identify two fundamental limitations of naïve learnable memory architectures in long-horizon extrapolation, namely computational inefficiency and attention dispersion. Through a systematic analysis of attention dispersion, we propose DecMem, a decoupled memory architecture that employs Sparse Global Memory for efficient fine-grained access to global history and Anchored Local Memory for stable and high-quality extrapolation. Extensive experiments demonstrate that DecMem significantly outperforms current state-of-the-art methods. By ensuring precise and efficient long-term memory and achieving superior extrapolation capabilities, DecMem enables minute-level controllable long video generation with high fidelity and consistency.

2605.31328 2026-06-01 cs.CL

Reinforcement Learning Amplifies Emergent Misalignment from Harmless Rewards

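Magnus Jørgenvåg, David Kaczér, Lasse Ruttert, Marvin Gülhan, Lucie Flek, Florian Mai

AI summary: Studies how reinforcement learning induces emergent misalignment in language models from seemingly harmless reward signals, finds that the effect is more severe than under supervised fine-tuning, and verifies that interleaving safety data during training mitigates it.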

Abstract

Emergent misalignment (EM) is the surprising tendency of language models to become broadly misaligned after fine-tuning on narrowly misaligned examples. While EM has been extensively studied in the supervised fine-tuning (SFT) setting, evidence that it also arises from reinforcement learning (RL) is limited to large, closed-source models, leaving the phenomenon expensive to study and difficult to reproduce. We characterize EM from RL in small, off-the-shelf open-weight models along three axes. First, we show that rewarding narrow, overtly misaligned behavior produces substantially higher general-domain misalignment than sample-matched SFT. Second, we show that EM from RL can be induced by reward signals that could plausibly arise naturally, such as unpopular aesthetic preferences or poor rhetorical appeals. Third, we evaluate in-training mitigations developed for SFT-induced EM and find that they broadly transfer, with interleaving on-policy safety data performing best.

2605.31324 2026-06-01 cs.LG cs.AI

Inconsistency-Aware Minimization: Improving Generalization with Unlabeled Data

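Hee-Sung Kim, Hyeonseong Kim, Sungyoon Lee

AI summary: Proposes local inconsistency, a measure grounded in information geometry, and builds on it to design Inconsistency-Aware Minimization (IAM), which computes the measure from unlabeled data and adds it to the training objective to improve the generalization of deep learning models.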

Comments ICML 2026

Abstract

Estimating the generalization gap and developing optimization methods that improve generalization are crucial for deep learning models, for both theoretical understanding and practical applications. Leveraging unlabeled data for these purposes offers significant advantages in real-world scenarios. This paper introduces a novel generalization measure, local inconsistency, derived from an information-geometric perspective on the parameter space of neural networks. A key feature of local inconsistency is that it can be computed without explicit labels. We establish theoretical underpinnings by connecting local inconsistency to the Fisher information matrix and the loss Hessian. Empirically, we demonstrate that local inconsistency correlates with the generalization gap. Based on these findings, we propose Inconsistency-Aware Minimization (IAM), which incorporates local inconsistency into the training objective. We demonstrate that in standard supervised learning settings, IAM enhances generalization, achieving performance comparable to that of existing methods such as Sharpness-Aware Minimization. Furthermore, IAM exhibits efficacy in semi- and self-supervised learning scenarios, where the local inconsistency is computed from unlabeled data.

2605.31321 2026-06-01 cs.RO

Surface Constraint Policy for Learning Surface-Constrained and Dynamically Feasible Robot Skills

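Shuai Ke, Jiexin Zhang, Huan Zhao, Zhiao Wei, Yikun Guo, Jie Pan, Han Ding

AI summary: Proposes the Surface Constraint Policy (SCP), which encodes surface geometry constraints with a two-dimensional weighted Gaussian kernel and combines a diffusion policy with similarity-based action mapping to generate dynamically feasible, surface-constrained motions, addressing action stochasticity and unstable contact under free-form surface constraints.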

Abstract

Diffusion-based imitation learning methods have driven rapid progress in robot dexterous manipulation tasks. However, they have limitations when applied to tasks that involve complex free-form surface constraints because of their lack of explicit surface geometry constraint modeling and the dynamic feasibility issue, resulting in stochastic action generation that fails to achieve reliable surface alignment and maintain stable contact. To address these limitations, we propose a novel surface constraint policy (SCP) for generating robot actions that satisfy free-form surface constraints on the basis of human demonstrations and real-time visual observations. First, the surface geometry constraint is encoded using a two-dimensional weighted Gaussian kernel function that is derived from demonstrations. Building on the encoded surface geometry constraints, the diffusion-based policy is used to infer task-level action intentions from multimodal sensory inputs, including visual observations and robot state feedback. These intentions are further transformed into surface-constrained dynamic movement primitives (DMPs) through a similarity-based action mapping method, thereby enabling smooth and compliant motion execution. The SCP achieves generation of structured surface geometric intent and dynamically admissible actions. The proposed method is validated on multiple surface manipulation tasks and compared with existing techniques. The experimental results demonstrate superior task success rates and contact stability under surface constraints.

2605.31318 2026-06-01 cs.LG cs.MA

Generalized Intention Modeling in Multi-Agent Reinforcement Learning

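Mateusz Odrowaz-Sypniewski, Jasmine Bayrooti, Ajay Shankar, Amanda Prorok

AI summary: Proposes a task-adaptive opponent modeling framework that improves decision-making in non-cooperative multi-agent environments through a performance-driven mixture of multiple intent representations and a new intent representation that maximizes mutual information with the ego-agent's future returns.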

Abstract

Modeling an opponent's intent is critical for effective decision-making in non-cooperative, competitive, and general-sum multi-agent reinforcement learning. Existing opponent modeling methods encode intent using an embedding derived from episode information chosen a priori, such as the opponent's next action or a future environment state, and use this to guide the ego-agent's behavior. These approaches assume that the chosen information is universally representative of intent; however, we show empirically that this is not the case as intentions are often task- and environment-dependent. To address this, we introduce a task-adaptive opponent modeling framework that learns a performance-driven mixture of multiple intent representations. We further introduce a new intention representation that maximizes mutual information with the ego-agent's future returns, thereby capturing opponent information that is most directly relevant to performance. Our approach consistently matches or exceeds the performance of state-of-the-art baselines across diverse tasks and yields insights into when and why different opponent modeling strategies succeed.

2605.31317 2026-06-01 cs.LG

Forgetting Has Neighbors: Localized Collateral Forgetting in Machine Unlearning

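Polina Dolgova, Sebastian U. Stich

AI summary: Studies localized collateral forgetting caused by gradient-ascent and random-labeling methods in machine unlearning, and proposes a mitigation strategy based on Local Teacher Distillation.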

Abstract

Machine unlearning aims to remove the influence of selected training examples without full retraining. Standard evaluations often summarize unlearning quality with aggregate metrics, such as accuracy- and forgetting-based scores, which can hide localized failures. We study this failure mode at the example level by comparing the predictions of an unlearned model to those of the model retrained after deletion. We show that this pointwise discrepancy can be highly non-uniform: for gradient-ascent and random-labeling methods, with and without retain-set fine-tuning, it grows with geometric proximity to the forget set. We call this phenomenon localized collateral forgetting. Our analysis identifies a mechanism behind the effect: surrogate targets used during unlearning can be inconsistent with the local prediction structure induced by retraining, and this inconsistency propagates through shared representations to nearby examples. Motivated by this mechanism, we propose Local Teacher Distillation, a simple mitigation strategy that replaces random targets with soft labels from a small teacher trained only on retained neighbors of the forget set. On CIFAR-100 partial-class deletion, this local teacher brings the unlearned model substantially closer to retraining, especially near the forget set, while maintaining competitive aggregate unlearning metrics.

2605.31315 2026-06-01 cs.LG

Graph Neural Networks Are Not Continuous Across Graph Resolutions

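Christian Koke, Yuesong Shen, Abhishek Saroha, Marvin Eisenberger, Bastian Rieck, Michael Bronstein, Daniel Cremers

AI summary: Proves that graph neural networks are not continuous under natural modes of graph convergence, and proposes a structural modification, motivated by the information-propagation scheme, that gives them continuity across scales and thereby enables stable integration of, and generalization across, different resolutions.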

Comments arXiv admin note: text overlap with arXiv:2310.00431

Abstract

We show that, contrary to conventional wisdom in the community, graph neural networks (GNNs) are not continuous with respect to all natural modes of graph convergence. As a result, GNNs may generate substantially different latent representations for graphs that are very similar. In particular, they assign vastly different latent embeddings to graphs that represent the same underlying object at different resolution scales. We trace this failure of continuity back to a structural obstruction arising from commonly used information-propagation schemes. Building on this insight, we then derive a principled modification to standard GNN architectures which equips models with continuity across scales. The proposed modification enables consistent integration of distinct resolutions and reliable generalization between them. We systematically validate our theoretical findings in a wide range of numerical experiments.

2605.31314 2026-06-01 cs.RO

AR Forcing: Towards Long-Horizon Robot Navigation World Model

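Yifei Yang, Zehua Fan, Huan Li, Aoqi Wang, Lida Huang, Haibao Yu, Haiyan Liu, Xuanyao Mao, Jason Bao, Liang Xu, Bingchuan Sun, Yan Wang

AI summary: Proposes AR Forcing, an autoregressive training strategy that integrates the diffusion loss into the autoregressive training loop to address the distribution shift between training and inference, improving image consistency and trajectory prediction accuracy in long-horizon navigation.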

Abstract

Diffusion-based robot navigation world models are typically trained using parallel supervision, while autoregressive inference is employed during path planning. This results in a distribution shift between training and inference, which destabilizes performance over long-horizon prediction. We propose AR Forcing, an autoregressive training strategy that integrates the standard diffusion loss into the autoregressive training loop. At each step, the model uses its own predictions to update the context and optimizes the single-step noise prediction objective, thereby explicitly exposing the model to the inference state distribution during training. Our method does not require additional discriminators or distribution-matching losses, retains the original diffusion framework and sampler, and is easy to integrate. Experiments on multi-domain navigation datasets (RECON, SCAND, HuRoN, TartanDrive) show that, compared with strong baselines, AR Forcing improves the consistency of generated images during long-horizon navigation and the accuracy of predicted trajectories, enhancing the robustness of the model in complex known and unknown environments. We will release the code soon.

2605.31312 2026-06-01 cs.CV cs.CL

Learning from Fine-Grained Visual Discrepancies: Mitigating Multimodal Hallucinations via In-Context Visual Contrastive Optimization

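Haolin Deng, Xin Zou, Zhiwei Jin, Chen Chen, Haonan Lu, Xuming Hu

AI summary: Proposes In-Context Visual Contrastive Optimization (IC-VCO), which places contrastive images in a shared multi-image context to obtain a mathematically rigorous objective, and introduces Visual Contrast Distillation (VCDist) and a contrastive sample editing strategy to mitigate multimodal hallucinations.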

Comments ICML 2026

Abstract

Multimodal hallucination remains a persistent challenge for Vision-Language Models (VLMs). Standard textual Direct Preference Optimization (DPO) often fails to mitigate it due to a lack of explicit visual supervision. While existing works introduce visual preference DPO by contrasting original images against negative ones, they suffer from a theoretically inconsistent objective caused by partition function mismatches and rely on coarse-grained negatives that could enable shortcut learning. In this work, we propose In-Context Visual Contrastive Optimization (IC-VCO). By placing contrastive images within a shared multi-image context, IC-VCO ensures a mathematically rigorous objective. We further introduce Visual Contrast Distillation (VCDist), an auxiliary reliability-gated regularizer that encourages consistency between multi-image contrastive training and single-image inference. Finally, we propose a contrastive sample editing strategy that generates hard negatives via precise semantic perturbations. Experiments on five benchmarks demonstrate IC-VCO's best overall performance and the effectiveness of our sample editing strategy. Code and data are available at https://github.com/OPPO-Mente-Lab/IC-VCO.
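For orientation, the standard textual DPO objective that the abstract takes as its starting point can be sketched as follows. This is the generic DPO loss for a single preference pair, not IC-VCO itself; the function name `dpo_loss`, the default `beta`, and the toy log-probabilities are illustrative assumptions that do not come from the paper.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Generic DPO loss for one preference pair, given sequence-level log-probs.

    Names and the default beta are assumptions for illustration only.
    """
    # Implicit reward of each response: log-ratio of policy to frozen reference.
    chosen_margin = logp_chosen - ref_logp_chosen
    rejected_margin = logp_rejected - ref_logp_rejected
    z = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)), written as a numerically stable softplus(-z).
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

# Toy numbers (made up): the policy prefers the chosen response more than the reference does.
loss_neutral = dpo_loss(-1.0, -1.0, -1.0, -1.0)
loss_better = dpo_loss(-1.0, -3.0, -2.0, -2.0)
```

In visual-preference variants, the two log-probabilities are conditioned on different images rather than on different responses, which is where the partition-function mismatch mentioned in the abstract arises.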

2605.31309 2026-06-01 cs.LG math.PR stat.ML

Non-Asymptotic Convergence of Stochastic Iterative Algorithms: A Lyapunov Framework

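Zaiwei Chen, Siva Theja Maguluri

AI summary: Surveys Lyapunov-based finite-time analysis of stochastic iterative algorithms (stochastic approximation), uses generalized Moreau envelopes as universal Lyapunov functions to obtain mean-square convergence guarantees, applies the framework to stochastic gradient descent, linear SA, and reinforcement learning algorithms such as Q-learning, and discusses extensions including Markovian noise and seminorm-contractive operators.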

Comments 44 pages

Abstract

We survey Lyapunov-based techniques for the finite-time analysis of stochastic iterative algorithms, also known as stochastic approximation (SA) algorithms, for solving fixed-point equations $\bar{F}(x)=x$, where the operator $\bar{F}(\cdot)$ can only be accessed through a noisy oracle. We first focus on the standard setting in which $\bar{F}(\cdot)$ is contractive with respect to some norm and the noise is i.i.d., and explain how generalized Moreau envelopes serve as universal Lyapunov functions, regardless of the underlying norm. We then show how this framework yields mean-square convergence guarantees and applies to stochastic gradient descent, linear SA, and value-based reinforcement learning algorithms such as Q-learning and temporal-difference learning. Finally, we discuss extensions to Markovian noise, seminorm-contractive operators, dissipative operators, and high-probability bounds, and conclude with open problems. The goal is to present a unified and self-contained roadmap for the finite-time analysis of SA and its applications, especially in reinforcement learning.
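The stochastic approximation iteration discussed in the abstract can be illustrated with a minimal sketch, assuming a scalar contraction and i.i.d. Gaussian noise. The operator `F_bar`, the noise level, and the step-size schedule are toy choices made here for illustration; they are not taken from the survey.

```python
import random

def stochastic_approximation(noisy_F, x0, num_steps, step_size):
    """Run x_{k+1} = x_k + alpha_k * (noisy_F(x_k) - x_k) and return the last iterate."""
    x = float(x0)
    for k in range(num_steps):
        x += step_size(k) * (noisy_F(x) - x)
    return x

rng = random.Random(0)  # fixed seed so the run is reproducible

def F_bar(x):
    # Toy contraction (assumption): modulus 0.5, fixed point x* = 2.
    return 0.5 * x + 1.0

def noisy_F(x):
    # Noisy oracle: the true operator plus i.i.d. zero-mean Gaussian noise.
    return F_bar(x) + rng.gauss(0.0, 0.1)

x_star = 2.0
x_final = stochastic_approximation(noisy_F, 0.0, 5000, lambda k: 2.0 / (k + 20))
print(abs(x_final - x_star))
```

Finite-time analyses of the kind surveyed bound the mean-square error of such an iterate as a function of the step count and the step-size schedule; the sketch only runs the iteration and does not compute any bound.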

2605.31308 2026-06-01 cs.AI

TraceGraph: Shared Decision Landscapes for Diagnosing and Improving Agent Trajectories

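Junjie Nian, Kang Chen, Ge Zhang, Yixin Cao, Yugang Jiang

AI summary: Proposes TraceGraph, a graph framework that turns multi-model agent trajectories into shared decision landscapes, and uses event summaries and a trap-aware recovery pipeline to raise the SWE-bench resolved rate.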

Abstract

Agent benchmarks increasingly record rich interaction trajectories, yet evaluation often reduces each rollout to a pass rate or reward score. We introduce TraceGraph, a graph-based framework that turns released multi-model agent trajectories into shared decision landscapes. For each task, TraceGraph builds a graph over observable action-observation states from pooled rollouts before model identity is introduced. It then overlays outcome-informed productive cores and trap regions, and summarizes each rollout with three events: Access, Trap exposure, and Repair. Across trajectories spanning five benchmark splits, TraceGraph profiles reveal navigation differences hidden by aggregate scores and show that splits differ in whether they reward avoiding traps or recovering from them. The same TraceGraph landscape also motivates a trap-aware recovery pipeline for SWE-bench: a runtime detector fires on states matching historical trap regions, then lightweight continuation policies are evaluated from the same prefix. On fired states, the best pooled single-factor policy raises official resolved rate from 40.4% to 43.5% on the per-provider fired subset and from 41.0% to 44.8% on common-fired instances, with provider-specific active components. Overall, TraceGraph provides a process vocabulary for asking what agent benchmarks test, where models diverge on a shared landscape, and how failure regions can guide downstream improvement.
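The idea of pooling rollouts into one graph and overlaying outcome-informed regions can be sketched roughly as below. This is a simplified illustration, not TraceGraph's actual construction: the state labels, the per-state success-rate rule, and the thresholds `trap_max`, `core_min`, and `min_visits` are all hypothetical choices made for this example.

```python
from collections import defaultdict

def build_landscape(rollouts, trap_max=0.25, core_min=0.75, min_visits=2):
    """Pool rollouts into a state graph and label states by outcome.

    rollouts: list of (states, solved) pairs, where states is a list of
    hashable state labels. Thresholds are illustrative assumptions.
    """
    edges = defaultdict(int)      # (state, next_state) -> traversal count
    visits = defaultdict(int)     # state -> number of rollouts that reached it
    successes = defaultdict(int)  # state -> number of those that were solved
    for states, solved in rollouts:
        for a, b in zip(states, states[1:]):
            edges[(a, b)] += 1
        for s in set(states):     # count each state once per rollout
            visits[s] += 1
            successes[s] += int(solved)
    cores, traps = set(), set()
    for s, n in visits.items():
        if n < min_visits:
            continue              # too few visits to label the state
        rate = successes[s] / n
        if rate <= trap_max:
            traps.add(s)
        elif rate >= core_min:
            cores.add(s)
    return dict(edges), cores, traps

# Toy rollouts with made-up state labels; model identity is deliberately absent.
rollouts = [
    (["open", "edit", "test_pass"], True),
    (["open", "edit", "test_pass"], True),
    (["open", "grep_loop", "grep_loop"], False),
    (["open", "grep_loop", "give_up"], False),
]
edges, cores, traps = build_landscape(rollouts)
```

A runtime detector in this spirit would check whether the current state falls in `traps` and, if so, hand control to a continuation policy; how the real system matches states to trap regions is not specified in the abstract.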

2605.31304 2026-06-01 cs.LG cs.CV

Interpretability Without Tradeoffs: Disentangling Polysemanticity At Equal Predictive Performance

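Doğukan Bağcı, Bernt Schiele, Simone Schaub-Meyer, Jonas Fischer, Robin Hesse

AI summary: Proposes ELUDe, which losslessly reorganizes the information flow between layers to decompose polysemantic neurons into monosemantic features without changing model outputs, improving the interpretability of deep neural networks.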

Comments Preprint

Abstract

Deep neural networks (DNNs) are widely used, but interpreting what they actually learn remains difficult. A major obstacle is that individual neurons often encode multiple unrelated concepts, obscuring the decision process of the network. While prior work, such as sparse autoencoders, can separate these mixed signals into more meaningful, "monosemantic" features, this typically requires altering the model in ways that can degrade downstream performance. To overcome this, we introduce ELUDe (explicit, lossless, unsupervised disentanglement), a method for improving the interpretability of DNNs while preserving their functional equivalence. ELUDe breaks latent representations into clear, inspectable sub-units that behave like interpretable features, while guaranteeing that the model's outputs remain exactly the same. It requires no explicit training, no labels, and can be applied to pretrained models. ELUDe works by reorganizing how information flows between layers, re-routing concept-specific contributions while preserving the original computation by construction. Across several vision models, including DINOv2 and supervised ViT-B/16, ELUDe improves interpretability, keeps downstream accuracy unchanged, runs efficiently, and supports practical uses such as steering model representations. In short, ELUDe offers interpretability (almost) without a tradeoff: clearer, scalable, and actionable model insights with no loss in performance.

2605.31295 2026-06-01 cs.SD cs.AI cs.IR cs.LG

Latent Space Disentanglement via Activation Steering for Interpretable Attribute Control in Symbolic Music Generation

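Ioannis Prokopiou, Pantelis Vikatos, Maximos Kaliakatsos-Papakostas, Theodoros Giannakopoulos, Themos Stafylakis

AI summary: Uses the Difference-in-Means method to isolate latent directions for pitch and duration in the residual stream of the Multitrack Music Transformer, and applies Gram-Schmidt orthogonalization for dual-attribute steering, enabling interpretable, deterministic attribute modulation at inference time.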

Comments Accepted at EUSIPCO 2026 (34th European Signal Processing Conference), 5 pages, 2 figures

Abstract

Transformer-based architectures have significantly advanced the generation of complex symbolic sequences, yet a significant gap remains in achieving fine-grained, interpretable control over discrete signal attributes. This paper investigates the mechanistic interpretability of the Multitrack Music Transformer (MMT) and proposes a framework for deterministic attribute modulation without retraining to bridge this gap via inference-time activation steering. Utilizing the Difference-in-Means (DiffMean) methodology, we isolate latent directions for signal attributes, specifically Pitch and Duration, within the residual stream. We validate the Linear Representation Hypothesis in this domain, achieving high correlation between steering magnitude and attribute shift. To address the inherent feature entanglement in multi-attribute steering, we introduce a Dual Steering framework utilizing Gram-Schmidt Orthogonalization. Experimental results demonstrate that this geometric decoupling reduces conceptual interference and signal degradation compared to naive vector addition, enabling independent deterministic control even against strong autoregressive conditioning.
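The two ingredients named in the abstract, a difference-in-means direction and a Gram-Schmidt step that decorrelates two steering directions, can be sketched in a few lines. The helper names, the three-dimensional toy activations, and the steering strengths are assumptions for illustration; the paper's actual layers, datasets, and scaling are not given in the abstract.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def diff_in_means(pos_acts, neg_acts):
    """Steering direction: mean of 'high attribute' activations minus mean of 'low'."""
    mu_pos, mu_neg = mean_vector(pos_acts), mean_vector(neg_acts)
    return [p - q for p, q in zip(mu_pos, mu_neg)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthogonalize(v, u):
    """One Gram-Schmidt step: remove from v its component along u."""
    scale = dot(v, u) / dot(u, u)
    return [a - scale * b for a, b in zip(v, u)]

def steer(hidden, directions, strengths):
    """Add strength * direction to a residual-stream vector for each attribute."""
    out = list(hidden)
    for d, alpha in zip(directions, strengths):
        out = [h + alpha * x for h, x in zip(out, d)]
    return out

# Toy activations (made-up numbers standing in for residual-stream vectors).
zeros = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
pitch_dir = diff_in_means([[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]], zeros)
duration_dir = diff_in_means([[1.0, 3.0, 1.0], [1.0, 3.0, 1.0]], zeros)

# Naive dual steering would add both raw directions; here the duration
# direction is first made orthogonal to the pitch direction.
duration_orth = orthogonalize(duration_dir, pitch_dir)
steered = steer([0.0, 0.0, 0.0], [pitch_dir, duration_orth], [1.0, 0.5])
```

After orthogonalization, changing the duration strength no longer moves the hidden state along the pitch direction, which is the geometric decoupling the abstract contrasts with naive vector addition.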

2605.31294 2026-06-01 cs.CV

TokTalk: Expressive Real-time Facial Animation from Audio-LLM Tokens

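Qingcheng Zhao, Yifang Pan, Karan Singh

AI summary: Proposes TokTalk, a system that generates expressive 3D facial animation in real time directly from the audio tokens produced by Audio-LLMs, achieving low latency and high quality through a chunk-based conditional flow matching model and a lightweight adaptation strategy.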

Abstract

Recent advances in Audio-LLMs like GPT-4o have ushered in an era of conversational interaction with language models. Conversational avatars, however, still seem robotic in facial expression and conversational flow, in part due to sequential stages of speech recognition, text generation, turn-based text response, speech synthesis, and audio-driven facial animation. Based on our insight that audio-tokens produced by current Audio-LLMs carry sufficient information to reconstruct a plausible facial performance, we present TokTalk, a system that directly outputs expressive facial animation in real time from streaming audio-tokens. We construct a novel audio-token to 3D facial motion dataset, on which TokTalk is trained using a Chunk-based Conditional Flow Matching model. A lightweight adaptation strategy allows our trained model to seamlessly connect to any token-based Audio-LLM at minimal computational overhead. Our chunk-based processing further enables a parametric trade-off between latency and facial quality, shown through ablation studies. We further show that the real-time performance of TokTalk is comparable in latency to prior art solutions, and significantly favorable (via a perceptual study) in terms of quality, expressivity, and control of the 3D facial performance. We showcase TokTalk's flexibility using a chatbot Avatar, a voice-driven user Avatar, and an animation Director's interface, as diverse audio-visual face applications.

2605.31293 2026-06-01 cs.CL

Divergence Decoding: Inference-Time Unlearning via Auxiliary Models

发散解码:通过辅助模型进行推理时遗忘

Humzah Merchant, Bradford Levy

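AI summary: Proposes Divergence Decoding (DD), which uses small auxiliary models to steer an LLM's logits away from specific data at inference time, achieving effective, low-cost unlearning and outperforming existing methods on multiple benchmarks.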

AI Chinese abstract

大型语言模型(LLM)经常记忆敏感的训练数据,从而产生显著的隐私和版权风险。解决这些风险,即从现有模型检查点中移除此类知识,已被证明具有挑战性,因为许多遗忘方法会导致灾难性的效用损失或对复杂查询无效。我们引入了发散解码(DD),一种使用小型辅助模型在推理时将LLM的logits引导远离特定数据的机制。训练这些模型是直接的,即我们使用标准的预训练和微调设置。我们发现该方法在遗忘基准测试中明显优于最先进的基线,且在各种模型和训练数据集规模上保持一致,表明DD是一种有效且廉价的遗忘解决方案。然后我们证明,这种引导后的分布可以轻松地蒸馏回基础模型。由于该方法普遍适用于任何概率模型,我们探索了其在文本生成之外的有效性,并发现了向图像领域泛化的证据。

English abstract

Large Language Models (LLMs) frequently memorize sensitive training data, thereby creating significant privacy and copyright risks. Addressing these risks, i.e., removing such knowledge from an existing model checkpoint, has proven challenging as many unlearning methods lead to catastrophic utility loss or are ineffective for complex queries. We introduce Divergence Decoding (DD), a mechanism that uses small auxiliary models to steer the logits of the LLM away from specific data during inference. Training these models is straightforward, i.e., we use standard pre-training and fine-tuning setups. We find the method decisively outperforms state-of-the-art (SOTA) baselines on unlearning benchmarks across a variety of model and training dataset scales, consistent with DD being an effective and inexpensive solution to unlearning. We then demonstrate that this steered distribution can be trivially distilled back into the base model. Since the method is generally applicable to any probabilistic model, we explore its efficacy outside of text generation and find evidence of generalization to the domain of images.
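
The abstract does not state the steering rule, so the following is only a minimal Python sketch of what inference-time logit steering could look like. It assumes two hypothetical auxiliary models, one exposed to the data to be forgotten (`aux_forget`) and one reference model (`aux_ref`), plus a hypothetical strength parameter `alpha`. Neither these names nor the subtraction rule are taken from the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def steer_logits(base, aux_forget, aux_ref, alpha=1.0):
    """Hypothetical rule: push the base logits away from tokens that the
    forget-tuned auxiliary model prefers more than the reference model."""
    return [b - alpha * (f - r) for b, f, r in zip(base, aux_forget, aux_ref)]


# Toy three-token vocabulary; all numbers are illustrative assumptions.
base = [2.0, 1.0, 0.5]        # logits of the large model
aux_forget = [3.0, 0.0, 0.0]  # auxiliary model that has seen the forget data
aux_ref = [1.0, 0.0, 0.0]     # auxiliary model that has not

steered = steer_logits(base, aux_forget, aux_ref, alpha=1.0)
print(softmax(base)[0], softmax(steered)[0])
```

In this toy case the probability of token 0, the one the forget-tuned model favors, drops after steering while the other two logits are unchanged.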

2605.31292 2026-06-01 cs.CV

Authentication of Copy Detection Patterns via Cross-Camera Dual-Synthetic Referencing

复制检测模式的跨相机双合成参考认证

Ivan Oleksiyuk, Roman Chaban, Slava Voloshynovskiy

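AI summary: Proposes an enrolment-based cross-camera dual-synthetic referencing framework in which a deep-learning translator jointly uses the digital template and the enrolled capture to generate a high-quality reference image, countering printing stochasticity and camera distortions and improving the authentication of copy detection patterns.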

Comments To appear in Proc. ICIP2026, September 13-17, 2026, Tampere, Finland

AI Chinese abstract

复制检测模式(CDP)是打印在物理对象上的结构,用于实现经济高效的认证。验证通过将捕获图像与打印CDP的数字模板进行比较来完成。在实践中,打印机的随机性和相机失真阻碍了这种比较,限制了对抗伪造的鲁棒性。先前的工作通过在验证相机域中合成参考图像来解决相机效应,但忽略了打印变异性。我们引入了一种基于注册的跨相机双合成参考框架。每个打印的CDP首先由受控的注册相机捕获,然后一个基于深度学习的翻译器联合利用数字模板和注册捕获,为验证图像生成高质量的参考。我们提供了信息论上的证明,表明双参考比基于模板的参考包含更多信息。在异构移动相机上的实验表明,认证性能得到提升,对基于机器学习的复制攻击具有鲁棒性,并且能够从小CDP区域和低端设备上进行可靠验证。

English abstract

Copy Detection Patterns (CDPs) are structures printed on physical objects to enable cost-effective authentication. Verification is achieved by comparing a captured image with the digital template from which the CDP was printed. In practice, printer stochasticity and camera distortions hinder this comparison, limiting robustness against counterfeiting. Prior work addressed camera effects by synthesising reference images in the verification camera domain, but it ignored printing variability. We introduce an enrolment-based cross-camera dual-synthetic referencing framework. Each printed CDP is first captured by a controlled enrolment camera, and a deep-learning-based translator jointly exploits the digital template and the enrolled capture to generate a high-quality reference for the verification image. We provide an information-theoretic justification showing that the dual reference is more informative than template-based references. Experiments on heterogeneous mobile cameras demonstrate improved authentication performance, robustness to machine-learning-based copy attacks, and reliable verification from small CDP regions and on low-end devices.

2605.31289 2026-06-01 cs.LG cs.AI

The Terminal Representation in Reinforcement Learning

强化学习中的终端表示

Amir Esterhuysen, Anders Jonsson

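AI summary: Proposes the terminal representation (TR), a reward-weighted state representation that can be used directly for downstream tasks without eigendecomposition and with lower computational overhead.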

AI Chinese abstract

表示学习是强化学习(RL)中用于时空抽象的强大工具。两种成熟的方法是通过后继表示(SR)和默认表示(DR)。SR通过状态引发的未来轨迹对其进行编码,捕获与奖励解耦的信息流。DR在此基础上用奖励加权轨迹,将信用分配结构整合到表示中。两种表示的特征向量已被用于支持一系列下游任务——包括选项发现、奖励塑造、迁移学习和探索。我们引入了一种结构不同的公式:终端表示(TR)。TR类似于DR对奖励加权轨迹进行编码,但可以作为更低维度的对象进行学习,并且可以直接用于上述应用而无需特征分解。特征分解还施加了对称转移动力学的假设,而TR可以绕过这一点。在这项工作中,我们发展了TR的理论基础:其推导、两种学习算法的收敛性、其在零样本组合性中的使用,以及替代奖励公式之间的等价性。我们进一步表明TR嵌入在顶部DR特征向量中,使其无需特征分解即可捕获相同的基础知识。此外,我们提供了经验证据,证明TR在辅助应用中作为现有表示的可行替代方案,同时在学习、存储和使用方面需要更少的计算开销。

English abstract

Representation learning is a powerful tool for spatio-temporal abstraction within reinforcement learning (RL). Two well-established approaches are the successor representation (SR) and the default representation (DR). The SR encodes states by the future trajectories they induce, capturing information flow decoupled from reward. The DR builds on this by weighting trajectories with reward, integrating credit-assignment structure into the representation. Eigenvectors of both representations have been used to support a range of downstream tasks -- including option discovery, reward shaping, transfer learning, and exploration. We introduce a structurally distinct formulation: the terminal representation (TR). The TR encodes reward-weighted trajectories similarly to the DR, but can be learned as a lower-dimensional object, and can be used directly for the mentioned applications without eigenvector computations. Eigendecomposition also imposes the assumption of symmetric transition dynamics, which the TR can bypass. In this work we develop the theoretical foundations of the TR: its derivation, convergence of two learning algorithms, its use for zero-shot compositionality, and equivalences between alternative reward formulations. We further show the TR is embedded in the top DR eigenvector, allowing it to capture the same underlying knowledge without eigendecomposition. Additionally, we provide empirical evidence of the TR as a viable alternative to existing representations in subsidiary applications, while requiring less computational overhead to learn, store, and use.
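
For readers unfamiliar with the SR that the TR is compared against, a small sketch may help. It uses the standard definition of the SR under a fixed policy, M = sum over t of gamma^t P^t = (I - gamma P)^-1, on a hypothetical two-state chain. The TR and DR are not defined in the abstract, so they are not reproduced here.

```python
def successor_representation(P, gamma, iters=200):
    """Iterate M <- I + gamma * P @ M; the fixed point is (I - gamma * P)^-1."""
    n = len(P)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    M = [row[:] for row in identity]
    for _ in range(iters):
        M = [
            [
                identity[i][j] + gamma * sum(P[i][k] * M[k][j] for k in range(n))
                for j in range(n)
            ]
            for i in range(n)
        ]
    return M


# Hypothetical chain: state 0 always moves to state 1, and state 1 loops on itself.
P = [[0.0, 1.0], [0.0, 1.0]]
M = successor_representation(P, gamma=0.5)
# Row i holds the expected discounted number of visits to each state starting from i.
print(M)
```

With gamma = 0.5 the result approaches [[1, 1], [0, 2]]: from state 0 the agent counts its own start once and then accumulates a discounted total of 1 in state 1.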

2605.31284 2026-06-01 cs.CV cs.AI

SAM for Robust Mitochondria Instance Segmentation in Fluorescence Microscopy

SAM 用于荧光显微镜中鲁棒的线粒体实例分割

Suyog Jadhav, Dilip K. Prasad, Krishna Agarwal

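AI summary: Fine-tunes SAM exclusively on synthetic fluorescence microscopy data to address the scarcity of real data, improving precision and average Dice score for mitochondria instance segmentation.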

Comments Accepted at PHAROS-AIF-MIH workshop @ CVPR 2026

AI Chinese abstract

荧光显微镜(FM)中线粒体的形态分析对于理解细胞健康、能量产生和代谢调节至关重要。虽然像 Segment Anything Model (SAM) 这样的基础模型已经革新了自然图像分割,但由于衍射受限分辨率、低对比度和复杂的重叠细胞器网络,它们直接应用于 FM 受到显著领域偏移的阻碍。此外,鲁棒模型的开发因严重缺乏高质量、手动标注的线粒体实例分割数据集而受阻。在本文中,我们提出了一种可扩展的解决方案,通过仅在合成生成的 FM 数据上微调 SAM 来解决数据稀缺问题。我们模拟真实的线粒体数据并模拟荧光显微镜的光学特性,以创建大规模标注数据集。我们在一个精心策划的真实手动标注 FM 图像数据集上评估了我们的微调模型。定性和定量分析表明,我们的合成微调模型在精度和平均 Dice 分数上优于强基线。这项工作确立了模拟辅助训练在 FM 实例分割中的潜力。

English abstract

The morphological analysis of mitochondria in fluorescence microscopy (FM) is crucial for understanding cellular health, energy production, and metabolic regulation. While foundation models like the Segment Anything Model (SAM) have revolutionized natural image segmentation, their direct application to FM is hindered by a significant domain shift characterized by diffraction-limited resolution, low contrast, and complex overlapping organelle networks. Furthermore, the development of robust models is bottlenecked by a severe lack of high-quality, manually annotated instance segmentation datasets for mitochondria. In this paper, we propose a scalable solution to this data scarcity by fine-tuning SAM exclusively on synthetically generated FM data. We simulate realistic mitochondria data and emulate the optical properties of fluorescence microscopes to create a large-scale annotated dataset. We evaluate our fine-tuned model on a curated dataset of real, manually annotated FM images. Qualitative and quantitative analyses demonstrate that our synthetically fine-tuned model improves precision and average Dice score over strong baselines. This work establishes the potential of simulation-assisted training for FM instance segmentation.
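
The two reported metrics can be illustrated with a short sketch. It computes the Dice score, 2|A ∩ B| / (|A| + |B|), and precision, TP / (TP + FP), at pixel level for a single pair of binary masks. How the paper matches predicted instances to ground-truth instances is not stated in the abstract, so that step is left out, and the handling of empty masks below is an assumption.

```python
def dice_and_precision(pred, truth):
    """Return (Dice, precision) for two equally sized flat binary masks."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_pos = sum(1 for p in pred if p)
    truth_pos = sum(1 for t in truth if t)
    # Assumed convention: two empty masks count as a perfect match.
    dice = 1.0 if pred_pos + truth_pos == 0 else 2.0 * tp / (pred_pos + truth_pos)
    # Assumed convention: precision is 0 when nothing is predicted.
    precision = tp / pred_pos if pred_pos else 0.0
    return dice, precision


# Illustrative six-pixel masks (not data from the paper).
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 0, 1, 1]
dice, precision = dice_and_precision(pred, truth)
print(dice, precision)
```

Here two of the three predicted pixels are correct and both masks contain three foreground pixels, so both metrics equal 2/3.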

2605.31283 2026-06-01 cs.CV

Topologically Consistent Multi-view 3D Head Reconstruction via Coarse-Guided Layered Surface Sampling

基于粗引导分层表面采样的拓扑一致多视图三维头部重建

Timo Bolkart, Daoye Wang, Prashanth Chandran

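AI summary: Proposes the SHELLS framework, which decouples feature extraction from mesh resolution through a hierarchical sampling strategy to achieve efficient, topologically consistent multi-view 3D head reconstruction, and generalizes to real-world captures when trained on synthetic data.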

Comments SIGGRAPH Conference Papers 2026

AI Chinese abstract

我们提出SHELLS(分层局部采样的语义头部估计),一种高效的前馈框架,用于从多视图图像中重建具有密集语义对应的三维头部。现有方法通常通过局部特征体素独立细化顶点,这种方法将内存密集的特征采样与网格分辨率耦合,限制了密集拓扑(>1万顶点)的可扩展性并引入表面噪声。相比之下,SHELLS通过分层采样策略将特征提取与网格分辨率解耦。我们使用带有LoRA适配的DINOv2骨干网络提取多视图特征,投影采样稀疏全局特征云,并预测中间粗网格。该粗先验指导构建分层、表面感知的采样壳,作为最终重建的离散搜索空间。SHELLS保持表面一致性,同时推理GPU内存比体积基线减少88%(2.4GB vs. 20GB)。对于1.8万顶点的网格,它将中位配准误差降低21%至29%,推理速度提升3.5倍(0.08s vs. 0.29s)。值得注意的是,我们的模型仅在合成数据上训练,却能有效泛化到真实世界捕获,消除了先前工作中常见的昂贵预注册多视图数据集的需求。

English abstract

We present SHELLS (Semantic Head Estimation via Layered Local Sampling), an efficient feed-forward framework for 3D head reconstruction in dense semantic correspondence from multi-view images. Existing methods typically refine vertices independently via localized feature volumes. This approach couples memory-intensive feature sampling to mesh resolution, which limits scalability for dense topologies (> 10k vertices) and introduces surface noise. In contrast, SHELLS decouples feature extraction from mesh resolution via a hierarchical sampling strategy. We extract multi-view features using a DINOv2 backbone with LoRA adaptation, projectively sample a sparse global feature cloud, and predict an intermediate coarse mesh. This coarse prior guides the construction of layered, surface-aware sampling shells that serve as a discrete search space for the final reconstruction. SHELLS maintains surface consistency while using 88% less inference GPU memory (2.4GB vs. 20GB) than volumetric baselines. It reduces median registration error by 21% to 29% with a 3.5x inference speedup (0.08s vs. 0.29s) for 18k-vertex meshes. Notably, our model is trained exclusively on synthetic data yet generalizes effectively to real-world captures, eliminating the need for the costly, pre-registered multi-view datasets common in prior work.

2605.31281 2026-06-01 cs.CL

Wind Turbine Maintenance Log Labelling Framework: LLM-Driven Data Correction and Enrichment via Semantic Extraction of Reliability Intelligence

风力涡轮机维护日志标注框架:基于LLM驱动的数据校正与语义提取的可靠性智能增强

Max Malyi, Jonathan Shek, Alasdair McDonald, Andre Biscaya

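AI summary: Proposes a method that uses a large language model to automatically standardise and structure wind turbine maintenance logs, correcting system codes and extracting taxonomies of failure modes and maintenance actions to turn unstructured text into quantitative reliability metrics.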

Comments An adjustable template containing the Python script architecture, applied dynamic prompts, and data schemas is hosted in an open-source GitHub repository: https://github.com/mvmalyi/llm-driven-wind-turbine-maintenance-log-labelling

AI Chinese abstract

随着风力涡轮机机队老化,数据驱动的可靠性工程对于优化其运行和维护以延长使用寿命和降低平准化能源成本至关重要。历史维护日志中的故障事件描述是宝贵可靠性智能的来源。然而,它们通常以非结构化的自然语言条目出现,无法进行定量分析。本文提出了一种新颖的方法,利用大语言模型(LLM)根据自由文本描述符系统地标准化和结构化维护日志。该方法在来自280台涡轮机、监测九年的16,316条维护日志数据集上运行,开发的模型无关框架自主纠正了层次化系统代码,并提取了基于证据的维护操作和故障模式分类。自动化流水线成功结构化超过70%的数据集。它解决了普遍存在的错误分类问题,例如隔离先前未分类的变桨系统故障并恢复缺失的系统代码,并通过应用经验分类法标记具体采取的操作和处理的故障模式来丰富记录。通过使用基于系统的日志批次构建故障模式、可观察症状、主导机制和候选原因的经验词典,该方法减少了手动故障模式与影响分析(FMEA)固有的主观性。最终,该方法为将大量定性现场观测转化为定量可靠性指标提供了高度可扩展、成本效益高的蓝图,为可再生能源领域的集成根本原因分析、改进的FMEA和先进预测性维护奠定了基础。

English abstract

As wind turbine fleets age, data-driven reliability engineering is essential to optimise their operation and maintenance for service life extension and levelised cost of energy reduction. Failure event descriptions within historical maintenance logs are a source of valuable reliability intelligence. However, they typically appear as unstructured natural language entries, rendering them inaccessible for quantitative analysis. This paper presents a novel methodology leveraging a large language model (LLM) to systematically standardise and structure maintenance logs based on their free-text descriptors. Operating on a dataset of 16,316 maintenance logs from 280 turbines monitored over nine years, the developed model-agnostic framework autonomously corrected hierarchical system codes and extracted evidence-based taxonomies of maintenance actions and failure modes. The automated pipeline successfully structured over 70% of the dataset. It resolved pervasive misclassification issues, such as isolating previously unclassified pitch system faults and restoring missing system codes, and enriched the records by applying empirical taxonomies to label specific actions taken and failure modes addressed. By using system-based log batches to construct empirical dictionaries of failure modes, observable symptoms, dominant mechanisms, and candidate causes, this approach reduces the inherent subjectivity of manual failure modes and effects analysis (FMEA). Ultimately, the methodology provides a highly scalable, cost-effective blueprint for translating large sets of qualitative field observations into quantitative reliability metrics, laying the foundation for integrated root-cause analysis across the renewable energy sector, improved FMEA, and advanced predictive maintenance.

2605.31276 2026-06-01 cs.LG

Learning Parametric Nitrogen Fertilizer Response Curves Using Neuro Symbolic Regression

使用神经符号回归学习参数化氮肥响应曲线

Giorgio Morales, John Sheppard

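AI summary: Proposes a neuro symbolic regression approach that learns nitrogen fertilizer response curves without a predefined functional form, and shows on real winter wheat data that it outperforms traditional models.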

Comments Accepted at the Workshop on Symbolic Regression and Equation Discovery, part of the 2026 IEEE World Congress on Computational Intelligence (WCCI) and the IEEE Congress on Evolutionary Computation (CEC)

AI Chinese abstract

准确模拟作物对氮肥的响应是精准农业中的基本挑战,因为它影响经济效益和环境可持续性。现有方法要么依赖预定义的参数形式,要么使用不透明的机器学习模型,限制了它们从数据中解释或发现特定地点函数关系的能力。在这项工作中,我们提出了一种神经符号回归方法,无需假设预定义的函数形式即可学习参数化的氮响应曲线。我们的方法集成了基于Transformer的多集符号骨架预测策略,能够发现多个子域或管理区之间的共享函数结构。通过构建多样化的输入子集并强制它们之间的一致性,该方法恢复了稳健的符号骨架,随后使用遗传算法将其拟合到观测数据上。该框架首先在合成一维问题上进行评估,以评估其在不同认知不确定性水平下的稳健性。结果表明,即使在数据稀缺的情况下,所提出的符号回归方法也能恢复正确的表达式。在这项工作中,我们展示了将我们的方法应用于真实冬小麦数据的结果,学习了田间不同管理区的不同参数化氮响应曲线。结果表明,发现的表达式不仅比二次平台和指数函数等传统模型实现了更低的拟合误差,而且还捕捉了不同空间区域的多样化函数行为。这证明了神经符号回归在发现特定地点农学关系和支持精准农业中知情决策方面的潜力。

English abstract

Accurately modeling crop response to Nitrogen (N) fertilization is a fundamental challenge in precision agriculture, as it impacts both economic returns and environmental sustainability. Existing approaches rely either on predefined parametric forms or on opaque machine learning models, limiting their ability to interpret or discover site-specific functional relationships from data. In this work, we propose a neuro symbolic regression (SR) approach to learn parametric N-response curves without assuming a predefined functional form. Our approach integrates a transformer-based Multi-Set Symbolic Skeleton Prediction strategy, enabling the discovery of shared functional structures across multiple subdomains or management zones (MZs). By constructing diverse input subsets and enforcing consistency across them, the method recovers robust symbolic skeletons that are subsequently fitted to observed data using a genetic algorithm. This framework was first evaluated on synthetic one-dimensional problems to assess its robustness under varying levels of epistemic uncertainty. The results demonstrate the ability of the proposed SR approach to recover correct expressions even in data-scarce regimes. We then present the results of applying our method to real-world winter wheat data, learning distinct parametric N-response curves for different MZs within a field. The results show that the discovered expressions not only achieve lower fitting errors than traditional models such as quadratic-plateau and exponential functions, but also capture diverse functional behaviors across spatial regions. This demonstrates the potential that neuro SR has to enable the discovery of site-specific agronomic relationships and support informed decision-making in precision agriculture.
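
As context for the traditional baselines named above, here is a minimal sketch of a quadratic-plateau response curve in its usual textbook form: yield follows a + b·N + c·N² up to the vertex of the parabola and is constant beyond it. The coefficients are hypothetical and chosen only to make the arithmetic easy to follow. They are not values from the paper.

```python
def quadratic_plateau(n_rate, a, b, c):
    """Yield rises as a + b*N + c*N^2 until the vertex, then stays flat."""
    if c >= 0:
        raise ValueError("c must be negative for the curve to reach a plateau")
    join = -b / (2.0 * c)      # N rate at which the parabola peaks
    n_eff = min(n_rate, join)  # beyond the join point the response is constant
    return a + b * n_eff + c * n_eff ** 2


# Hypothetical coefficients (illustration only).
a, b, c = 2.0, 0.04, -0.0001
yields = [quadratic_plateau(n, a, b, c) for n in (0, 100, 200, 300)]
print(yields)
```

With these coefficients the join point is at N = 200, so the yields are about 2, 5, 6, and 6: applying more than 200 units of N brings no further gain. A fixed form like this is the kind of predefined parametric assumption that the symbolic regression approach avoids.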

2605.31273 2026-06-01 cs.LG

Survival Reinforcement Learning: Toward Scalable Self-Supervised RL

生存强化学习:迈向可扩展的自监督强化学习

Franki Nguimatsia-Tiofack, Fabian Schramm, Théotime Le Hellard, Justin Carpentier

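AI summary: Proposes Survival Reinforcement Learning (SRL), an online classification-based method that maximizes the agent's dwell time at target goals to address the uniformity-tolerance dilemma in contrastive reinforcement learning, improving performance by 2x to 8x on long-horizon locomotion tasks.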

AI Chinese abstract

虽然自监督对比强化学习(CRL)展现了显著的深度扩展能力,成功使用了超过64层的网络,但由于对比损失固有的均匀性-容忍性困境,扩展的CRL在长时程目标条件规划中仍然存在困难。我们引入了生存强化学习(SRL),一种基于在线分类的替代方法,通过最大化智能体在目标状态的停留时间来扩展生存价值学习框架。SRL绕过了CRL的结构约束,并缓解了生存框架固有的“bang-bang”控制解,这种控制解在复杂动态系统中往往引发不良行为。在多种机器人基准测试中,扩展的SRL在操作任务上与最先进的CRL相当,并在稳定的长时程运动任务上性能提升2至8倍。我们的结果提供了强有力的额外证据,表明基于分类的方法可能成为扩展强化学习这一更广泛努力中的关键原语。

English abstract

While self-supervised Contrastive Reinforcement Learning (CRL) has shown remarkable depth-scaling capabilities, successfully using networks over 64 layers, scaled CRL still struggles with long-horizon goal-conditioned planning due to the uniformity-tolerance dilemma inherent in contrastive losses. We introduce Survival Reinforcement Learning (SRL), an online classification-based alternative that extends the survival value learning framework by maximizing the agent's dwell time at target goals. SRL bypasses the structural constraints of CRL and mitigates the "bang-bang" control solutions inherent to survival frameworks, which often induce undesirable behavior in complex dynamical systems. Evaluated across diverse robotic benchmarks, scaled SRL matches state-of-the-art CRL on manipulation tasks and outperforms it by 2x to 8x on stable, long-horizon locomotion tasks. Our results provide strong additional evidence that classification-based methods may serve as a key primitive in the broader effort to scale reinforcement learning.

2605.31272 2026-06-01 cs.LG

Algorithmic Recourse of In-Context Learning for Tabular Data

表格数据的上下文学习算法补救

Wenshuo Dong, Jiaming Zhang, Shaopneg Fu, Hongbin Lin, Di Wang, Lijie Hu

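AI summary: For black-box models in tabular in-context learning, proposes the adaptive subspace recourse framework ASR-ICL, which uses zeroth-order optimization to efficiently generate actionable and sparse recourse; the theory shows that recourse is bounded and converges to classical solutions as the context grows.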

Comments Accepted by ICML 2026

AI Chinese abstract

随着预测模型越来越多地部署在信用审批等高风险场景中,对受影响的个体提供补救的后验方法需求日益增长。许多此类模型处理表格数据,其中特征对应现实世界的属性。最近,上下文学习(ICL)使大型语言模型能够通过在推理时以标注示例为条件进行表格预测,而无需显式训练。然而,ICL下表格决策的算法补救仍基本未被探索。在这项工作中,我们首次研究了ICL下表格数据的算法补救。我们进行了理论分析,表明补救仍然定义良好且有界,并刻画了随着上下文增大,补救如何收敛到经典解。在实践中,我们提出了一种新颖的零阶补救框架——自适应子空间补救用于上下文学习(ASR-ICL),该框架高效地为黑箱ICL模型生成可操作且稀疏的补救。所提出的框架自然地扩展到多类表格任务。在多个真实世界数据集和模型上的实验表明,ASR-ICL以更少的查询实现了与现有方法相当的补救质量,并经验性地验证了预测的收敛行为,支持了我们的理论分析。

English abstract

As predictive models are increasingly deployed in high-stakes settings such as credit approval, there is a growing need for post-hoc methods that provide recourse to affected individuals. Many such models operate on tabular data, where features correspond to real-world attributes. Recently, in-context learning (ICL) has enabled large language models to perform tabular prediction by conditioning on labeled examples at inference time, without explicit training. However, algorithmic recourse for tabular decision-making under ICL remains largely unexplored. In this work, we present the first study of algorithmic recourse for tabular data under ICL. We carry out a theoretical analysis, showing that recourse remains well-defined and bounded, and we characterize how recourse converges toward classical solutions as the context size increases. In practice, we propose a novel zeroth-order recourse framework, Adaptive Subspace Recourse for In-Context Learning (ASR-ICL), that efficiently generates actionable and sparse recourse for black-box ICL models. The proposed framework naturally extends to multi-class tabular tasks. Experiments across multiple real-world datasets and models demonstrate that ASR-ICL achieves recourse quality comparable to existing methods with fewer queries and empirically confirm the predicted convergence behavior, supporting our theoretical analysis.

2605.31271 2026-06-01 cs.CV

DriveMA: Driving Vision-Language-Action Models with verifiable Meta-Actions

DriveMA:基于可验证元动作的驾驶视觉-语言-动作模型

Weicheng Zheng, Yixin Huang, Qiao Sun, Derun Li, Hang Zhao

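AI summary: Proposes the DriveMA framework, which bridges the language-action gap with verifiable meta-actions and combines action-centric supervised training with reinforcement learning for end-to-end driving planning, achieving state-of-the-art performance on the Waymo Open Dataset.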

Comments arXiv admin note: text overlap with arXiv:2605.21273

AI Chinese abstract

驾驶视觉-语言-动作模型(Driving VLAs)旨在利用语言改进端到端规划,但语言-动作差距限制了这一前景。我们提出DriveMA,一个基于可验证元动作的Driving VLA框架,该元动作将未来自我运动总结为紧凑的语言域意图,并可通过轨迹接地标注流水线从专家轨迹构建,以及通过基于规则的投影对生成轨迹进行验证。DriveMA利用这种可验证性,采用以动作为中心的监督训练和数据高效的回合级信用分配强化学习框架,通过密集奖励和精确信用分配明确地将高层决策与低层轨迹规划对齐。DriveMA在Waymo Open Dataset基于视觉的端到端驾驶上设立了新的最先进水平,2B模型获得8.060的评分者反馈分数,4B模型进一步提升至8.079;同时在NAVSIM上获得了具有竞争力的闭环规划性能。这些结果表明,即使是一个简单的元动作接口,在可验证并针对语言-动作对齐优化后,也能实现最先进的规划。代码、数据和模型将公开发布以促进未来研究。

English abstract

Driving Vision-Language-Action Models (Driving VLAs) aim to use language to improve end-to-end planning, but the language-action gap limits this promise. We propose DriveMA, a Driving VLA framework built on verifiable meta-actions, which summarize future ego motion into compact language-domain intentions, can be constructed from expert trajectories with a trajectory-grounded annotation pipeline, and can be verified against generated trajectories through rule-based projection. DriveMA exploits this verifiability with action-centric supervised training and a data-efficient turn-level credit assignment reinforcement learning framework, explicitly aligning high-level decisions with low-level trajectory planning through dense rewards and precise credit assignment. DriveMA sets a new state of the art on the Waymo Open Dataset Vision-based E2E Driving, achieving a Rater Feedback Score of 8.060 with a 2B model and further improving it to 8.079 with a 4B model; it also obtains competitive closed-loop planning performance on NAVSIM. These results show that even a simple meta-action interface can achieve state-of-the-art planning when made verifiable and optimized for language-action alignment. Code, data, and models will be released to facilitate future research.

2605.31268 2026-06-01 cs.CL

Mellum2 Technical Report

Mellum2 技术报告

Marko Kojic, Ivan Bondyrev, Aral de Moor, Joseph Shtok, Petr Borovlev, Kseniia Lysaniuk, Madeeswaran Kannan, Ivan Dolgov, Nikita Pavlichenko

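AI summary: Introduces Mellum 2, a 12B-parameter Mixture-of-Experts language model (2.5B active parameters per token) specialized in software engineering, which through architectural choices and training optimizations is competitive with 4B-14B open-weight models on code generation, math reasoning, and other benchmarks.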

AI Chinese abstract

我们提出 Mellum 2,一个开放权重的12B参数混合专家(MoE)语言模型,每token激活2.5B参数。Mellum 2是一个通用语言模型,专精于软件工程,涵盖代码生成与编辑、调试、多步推理、工具使用与函数调用、智能体编码以及对话式编程辅助,它是之前专注于补全的4B密集模型Mellum的继任者。架构基于混合专家(64个专家,8个激活),结合了分组查询注意力(4个KV头)、每四层中三层使用滑动窗口注意力,以及一个多token预测头,该头同时作为辅助预训练目标和内置的推测解码草稿模型;每个选择都通过消融实验验证,并以商品GPU上的推理效率作为设计约束。预训练涵盖约10.6万亿token,通过三个阶段的学习课程,从多样化的网络数据逐步转向精选的代码和数学内容,使用Muon优化器在FP8混合精度下进行优化,并采用预热-保持-衰减调度(线性衰减至零)。预训练基础模型通过层选择性YaRN扩展到128K上下文窗口,然后分两个阶段进行后训练(监督微调后接RLVR),产生两个发布变体:直接回答的Instruct模型和在最终答案前输出显式推理轨迹的Thinking模型。在代码生成、数学与推理、工具使用、知识和安全基准上,Mellum 2与4B-14B范围内的开放权重基线模型竞争,同时每token计算量相当于2.5B密集模型。我们在Apache 2.0许可下发布基础、指令和思考检查点,以及关于架构决策、数据管道和训练配方的本报告。

English abstract

We present Mellum 2, an open-weight 12B-parameter Mixture-of-Experts (MoE) language model with 2.5B active parameters per token. Mellum 2 is a general-purpose language model specialized in software engineering, spanning code generation and editing, debugging, multi-step reasoning, tool use and function calling, agentic coding, and conversational programming assistance, and it is the successor to the completion-focused 4B dense Mellum model. The architecture builds on the Mixture-of-Experts (64 experts, 8 active) and combines Grouped-Query Attention with 4 KV heads, Sliding Window Attention on three of every four layers, and a single Multi-Token Prediction head that doubles as both an auxiliary pre-training objective and a built-in draft model for speculative decoding; each choice was validated by ablation with inference efficiency on commodity GPUs as a design constraint. Pre-training spans approximately 10.6 trillion tokens through a three-phase curriculum that progressively shifts the mixture from diverse web data toward curated code and mathematical content, optimized with Muon under FP8 hybrid precision and a Warmup-Hold-Decay schedule with linear decay to zero. The pre-trained base is extended to a 128K context window via a layer-selective YaRN and then post-trained in two stages (supervised fine-tuning followed by RLVR), yielding two released variants: an Instruct model that answers directly and a Thinking model that emits an explicit reasoning trace before its final answer. Across code generation, math and reasoning, tool use, knowledge, and safety benchmarks, Mellum 2 is competitive with open-weight baselines in the 4B-14B range while running at the per-token compute of a 2.5B dense model. We release the base, instruct, and thinking checkpoints, together with this report on the architecture decisions, data pipeline, and training recipe behind them, under the Apache 2.0 license.
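
The shape of a Warmup-Hold-Decay schedule with linear decay to zero can be sketched in a few lines. The step counts and peak learning rate below are hypothetical placeholders, since the abstract does not give the actual values. The function name and signature are likewise illustrative.

```python
def whd_learning_rate(step, peak_lr, warmup_steps, hold_steps, total_steps):
    """Linear warmup to peak_lr, constant hold, then linear decay to zero."""
    decay_start = warmup_steps + hold_steps
    if warmup_steps <= 0 or decay_start >= total_steps:
        raise ValueError(
            "need warmup_steps > 0 and warmup_steps + hold_steps < total_steps"
        )
    if step < warmup_steps:
        return peak_lr * step / warmup_steps  # warmup phase
    if step < decay_start:
        return peak_lr                        # hold phase
    remaining = max(total_steps - step, 0)    # decay phase, clamped at zero
    return peak_lr * remaining / (total_steps - decay_start)


# Hypothetical settings: 100 warmup steps, 700 hold steps, 1000 steps in total.
steps = (0, 50, 100, 500, 800, 900, 1000)
schedule = [whd_learning_rate(s, 1e-3, 100, 700, 1000) for s in steps]
print(schedule)
```

Under these settings the rate climbs from 0 to 1e-3 over the first 100 steps, stays there until step 800, and reaches exactly 0 at step 1000.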

2605.31266 2026-06-01 cs.CV cs.AI cs.LG

Envisioning Beyond the Few: Disentangled Semantics and Primitives for Few-Shot Atypical Layout-to-Image Generation

超越少数:用于少样本非典型布局到图像生成的解耦语义与基元

Nan Bao, Yifan Zhao, Wenzhuang Wang, Jia Li

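AI summary: To address representation fragmentation in few-shot atypical layout-to-image generation, proposes disentangling semantics from visual details through Semantic Anchoring and Primitive Imbuing, enabling robust few-shot adaptation.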

Comments Accepted to ICML 2026; code available at https://github.com/iCVTEAM/DSP

AI Chinese abstract

布局到图像(L2I)任务通过对象类别和空间布局实现对图像生成的细粒度控制。然而,现有的L2I方法在少样本非典型设置下会产生碎片化和扭曲的生成结果。我们将这种失败称为表示碎片化,源于将语义身份与视觉细节纠缠在一起的粒度不匹配。为了解决这个问题,我们提出了一种表示驱动的框架,将语义与基元解耦,以实现鲁棒的少样本适应。具体来说,语义锚定将类别语义聚合到锚点中以实现稳定的身份,而基元注入则建模可重新组合的基元以实现鲁棒的局部细节建模。概念引导进一步通过显著性感知目标调节优化,以保持前景语义一致性。大量实验表明,在5样本设置下,我们的方法在视觉保真度和跨不同非典型领域的对齐方面,均优于最先进的L2I方法。源代码公开于 https://github.com/iCVTEAM/DSP。

English abstract

The layout-to-image (L2I) task enables fine-grained control over image generation via object categories and spatial layouts. However, existing L2I methods yield fragmented and distorted generations under few-shot atypical settings. We term this failure representation fragmentation, arising from a granularity mismatch that entangles semantic identity with visual details. To address this issue, we propose a representation-driven framework that disentangles semantics from primitives for robust few-shot adaptation. Specifically, Semantic Anchoring aggregates categorical semantics into anchors for stable identity, while Primitive Imbuing models recomposable primitives for robust local detail modeling. Conceptual Steering further regulates optimization with a saliency-aware objective to preserve foreground semantic consistency. Extensive experiments demonstrate consistent improvements in the 5-shot regime over state-of-the-art L2I methods in both visual fidelity and alignment across diverse atypical domains. The source code is publicly available at https://github.com/iCVTEAM/DSP.

2605.31264 2026-06-01 cs.AI cs.CL cs.LG

COLLEAGUE.SKILL: Automated AI Skill Generation via Expert Knowledge Distillation

COLLEAGUE.SKILL: 通过专家知识蒸馏实现自动化AI技能生成

Tianyi Zhou, Dongrui Liu, Leitao Yuan, Jing Shao, Xia Hu

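AI summary: Proposes an automated distillation system that turns heterogeneous traces into inspectable, correctable, agent-usable skill packages for generating person-grounded AI skills.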

Comments 12 pages, 4 figures

AI Chinese abstract

LLM代理不仅被期望完成孤立的任务,还要承载人类专业知识、判断和互动风格的有限表示。构建这种基于人的代理仍然困难,因为与人或角色相关的可操作知识通常嵌入在异构痕迹中,而不是写成清晰的指令。现有的记忆和角色系统捕捉了这些证据的片段,而技能框架提供了可移植的打包格式;然而,没有端到端的工作流将这些痕迹蒸馏成可检查、可修正和代理可用的技能。我们提出了一个自动化的痕迹到技能蒸馏系统,通过专家知识蒸馏生成基于人的AI技能。给定目标人物或角色的材料,COLLEAGUE.SKILL 生成一个版本化的技能包,包含两个协调的轨道:一个能力轨道,用于实践、心理模型和决策启发式;一个边界行为轨道,用于沟通风格、互动规则和修正历史。该包可以被检查、调用、通过自然语言反馈更新、回滚、跨代理主机安装,并可选择性地为受控分发做准备。我们描述了开源系统中实现的人工制品契约、生成工作流、修正生命周期、部署表面和领域预设。在撰写本文时,公共仓库拥有约18.5k个GitHub星标;画廊列出了来自165位贡献者的215个技能,以及跨列出的技能卡累计超过10万个星标。该系统说明了基于人的技能如何表示为可移植、可修正的包,而不是不透明的提示或隐藏的记忆。

English abstract

LLM agents are increasingly expected not only to complete isolated tasks, but also to carry bounded representations of human expertise, judgment, and interaction style. Building such person-grounded agents remains difficult because actionable knowledge associated with a person or role is usually embedded in heterogeneous traces rather than written as clean instructions. Existing memory and persona systems capture fragments of this evidence, while skill frameworks provide portable packaging formats; however, there is no end-to-end workflow for distilling these traces into inspectable, correctable, and agent-usable skills. We present an automated trace-to-skill distillation system for generating person-grounded AI skills via expert knowledge distillation. Given materials from a target person or role, COLLEAGUE.SKILL produces a versioned skill package with two coordinated tracks: a capability track for practices, mental models, and decision heuristics, and a bounded behavior track for communication style, interaction rules, and correction history. The package can be inspected, invoked, updated through natural-language feedback, rolled back, installed across agent hosts, and optionally prepared for controlled distribution. We describe the artifact contract, generation workflow, correction lifecycle, deployment surface, and domain presets implemented in the open-source system. At the time of writing, the public repository has approximately 18.5k GitHub stars; the gallery lists 215 skills from 165 contributors and more than 100k cumulative stars across listed skill cards. The system illustrates how person-grounded skills can be represented as portable, correctable packages rather than opaque prompts or hidden memories.

2605.31261 2026-06-01 cs.LG cs.AI stat.ML

Why Linear Recurrent Memory Works in Partially Observable Reinforcement Learning

为什么线性循环记忆在部分可观测强化学习中有效

Yike Zhao, Onno Eberhard, Malek Khammassi, Ali H. Sayed, Michael Muehlebach

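AI summary: By constructing two linear filters, this paper theoretically justifies the effectiveness of linear recurrent neural networks as memory units in partially observable reinforcement learning, and extends the results to action-controlled hidden Markov models.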

AI Chinese abstract

线性循环神经网络家族在部分可观测强化学习中作为循环记忆单元表现出色。我们通过构造并研究两种线性滤波器为其经验有效性提供了理论依据:(i) 第一种在确定性转移矩阵下精确重现隐马尔可夫模型(HMM)中信念向量的预softmax logits,从而作为最优策略学习的充分统计量;(ii) 第二种在近似确定性转移矩阵下实现状态解码误差趋近于零,从而将状态模糊性降至接近零。结果扩展到动作控制的HMM,其中相应的线性滤波器变为随时间变化且依赖于动作的动态。我们通过数值实验说明了主要结果,并进一步展示了所构造的线性滤波器在小型强化学习游戏中作为强特征提取器的能力。

English abstract

The family of linear recurrent neural networks has shown strong performance as recurrent memory units in partially observable reinforcement learning. We provide a theoretical justification for their empirical effectiveness by constructing and studying two linear filters: (i) the first exactly reproduces the pre-softmax logits of the belief vector in a hidden Markov model (HMM) under a deterministic transition matrix, thereby serving as a sufficient statistic for optimal policy learning; (ii) the second achieves vanishing state-decoding error under a nearly deterministic transition matrix, thus reducing state ambiguity to near zero. The results extend to action-controlled HMMs, where the corresponding linear filters become time-varying with action-dependent dynamics. We illustrate our main results through numerical experiments and further show that the constructed linear filter serves as a strong feature extractor in a small reinforcement learning game.
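
The first claim can be checked numerically in a simple special case. The sketch below assumes the deterministic transition is a permutation of the states, which is a narrower setting than the abstract states, and uses a hypothetical 3-state HMM with made-up emission probabilities. In that case the Bayes update becomes a linear recurrence on the log-beliefs: permute the logits, then add the log-emission column for the current observation. The paper's own construction is not given in the abstract and may differ.

```python
import math


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


# Hypothetical HMM: state i moves deterministically to (i + 1) % 3.
N = 3
succ = [(i + 1) % N for i in range(N)]
# emission[s][o] = P(observation o | state s), strictly positive so logs are finite.
emission = [[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]]
prior = [0.5, 0.3, 0.2]


def bayes_step(belief, obs):
    """Standard forward filter: predict with the transition, reweight, normalize."""
    predicted = [0.0] * N
    for s in range(N):
        predicted[succ[s]] += belief[s]
    unnorm = [predicted[s] * emission[s][obs] for s in range(N)]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def linear_step(logits, obs):
    """Linear recurrence: permute the logits, then add the log-emission column."""
    shifted = [0.0] * N
    for s in range(N):
        shifted[succ[s]] = logits[s]
    return [shifted[s] + math.log(emission[s][obs]) for s in range(N)]


observations = [0, 1, 1, 0, 1]
belief = prior[:]
logits = [math.log(p) for p in prior]
for obs in observations:
    belief = bayes_step(belief, obs)
    logits = linear_step(logits, obs)

# Largest difference between the exact belief and softmax of the linear state.
max_gap = max(abs(a - b) for a, b in zip(belief, softmax(logits)))
print(max_gap)
```

The linear state never needs normalizing, because the softmax absorbs the constant that the Bayes filter divides out at every step.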