arXivDaily: arXiv daily academic digest, updated Monday to Friday
All subject categories: 2256
2605.28500 2026-05-28 cs.CL cs.AI cs.LG

Functional Entropy: Predicting Functional Correctness in LLM-Generated Code with Uncertainty Quantification

Dylan Bouchard, Mohit Singh Chauhan, Zeya Ahmad, Ho-Kyeong Ra

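AI summary: Addresses functionally incorrect LLM-generated code by proposing an uncertainty quantification approach based on functional equivalence (functional entropy), which outperforms existing methods across multiple programming languages and models.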

Abstract

Large language models have shown impressive capabilities in code generation, yet they often produce functionally incorrect code. Uncertainty quantification (UQ) methods have emerged as a promising approach for detecting hallucinations in natural language generation, but their effectiveness for code generation tasks remains underexplored. We systematically evaluate how UQ techniques transfer to code generation across three programming languages, five LLMs, and over 1,700 problems. We find that some token-probability-based methods generalize effectively without modification, while sampling-based methods relying on natural language inference (NLI) fail because NLI models cannot distinguish functionally different code, causing most responses to collapse into a single semantic cluster. To address this, we introduce functional equivalence methods, a family of code-specific methods that replace NLI-based semantic equivalence with an LLM-based functional equivalence assessment, including functional entropy, a code-specific analog of semantic entropy. Functional equivalence methods achieve top AUROC in 11 out of 15 model-benchmark combinations and the best calibration across most settings, consistently outperforming both NLI-based counterparts and all other methods evaluated.
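
Here is a minimal sketch of the idea, assuming the discrete form of semantic entropy. Sample several programs for one problem, group them into clusters of functionally equivalent responses, and take the entropy of the cluster frequencies. According to the abstract, an LLM makes the equivalence judgment. The hypothetical `outputs_match` predicate below substitutes a simple comparison on probe inputs, so this illustrates only the scoring step and is not the authors' implementation.

```python
import math
from typing import Callable, List, Sequence


def cluster_by_equivalence(responses: Sequence, equivalent: Callable) -> List[list]:
    """Greedy clustering: a response joins the first cluster whose
    representative it is judged equivalent to, otherwise it opens a new one."""
    clusters: List[list] = []
    for r in responses:
        for cluster in clusters:
            if equivalent(cluster[0], r):
                cluster.append(r)
                break
        else:
            clusters.append([r])
    return clusters


def functional_entropy(responses: Sequence, equivalent: Callable) -> float:
    """Entropy (in nats) of the empirical distribution over equivalence clusters."""
    clusters = cluster_by_equivalence(responses, equivalent)
    n = len(responses)
    return -sum((len(c) / n) * math.log(len(c) / n) for c in clusters)


# Hypothetical stand-in for the LLM equivalence judge: two candidate
# functions count as equivalent if they agree on a few probe inputs.
PROBES = [-2, 0, 3, 10]


def outputs_match(f: Callable, g: Callable) -> bool:
    return all(f(x) == g(x) for x in PROBES)


samples = [
    lambda x: x * 2,
    lambda x: x + x,   # same behaviour as x * 2
    lambda x: x ** 2,  # different behaviour
    lambda x: 2 * x,
]
print(round(functional_entropy(samples, outputs_match), 4))  # 0.5623
```

A higher value means the sampled programs disagree more in behaviour. A set of samples that all fall into one cluster scores 0.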

2605.28495 2026-05-28 cs.CV

Janus-LoRA: A Balanced Low-Rank Adaptation for Continual Learning

Cheng Chen, Pengpeng Zeng, Yuyu Guo, Lianli Gao, Hengtao Shen, Jingkuan Song

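AI summary: Proposes the Janus-LoRA framework, which uses gradient rectification to achieve parameter-level orthogonality and overcome catastrophic forgetting, and a decoupled margin loss to strengthen feature-level separation, balancing stability and plasticity in continual learning.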

Comments 9 pages, International Conference on Machine Learning

Abstract

Low-Rank Adaptation (LoRA) has emerged as a promising paradigm for Continual Learning. It independently updates its low-rank factors ($A$ and $B$), creating a composite update to the full weight matrix through their interaction. To prevent catastrophic forgetting, this update should remain orthogonal to the task-specific subspace that contains previously learned knowledge. However, we identify that this composite update systematically violates this orthogonality, reintroducing interference and undermining stability. Furthermore, naively enforcing this orthogonality compromises plasticity, disrupting the delicate stability-plasticity trade-off. To resolve these issues, we propose Janus-LoRA, a framework that restores this balance through two novel components. Specifically, we first introduce Gradient Rectification, a closed-form solution that mathematically decouples LoRA's factor updates, enforcing orthogonality against the historical knowledge subspace identified by an efficient Online Estimation. Next, to enhance plasticity, we introduce a Decoupled Margin Loss that promotes feature-level separation by pushing new feature representations away from old ones, thus creating distinct, low-interference regions for new learning. Comprehensive experiments on challenging benchmarks demonstrate that by harmonizing parameter-level orthogonality with feature-level separation, Janus-LoRA achieves a superior balance and establishes new state-of-the-art performance.
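
The interference described above can be checked numerically. Assume the old-task knowledge lives in an input subspace spanned by the orthonormal columns of `U`, so an update dW is harmless only if dW @ U = 0. For W = B @ A, the first-order composite update is dB @ A + B @ dA. The numpy sketch below uses random matrices and hypothetical sizes. It projects only dA away from `U` and shows that the dB @ A term still leaks into the protected subspace, whereas projecting the full composite update removes the leak. It illustrates the problem only and is not the paper's Gradient Rectification.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, k = 16, 12, 4, 3                    # hypothetical layer and subspace sizes

B = rng.normal(size=(d_out, r))
A = rng.normal(size=(r, d_in))
U, _ = np.linalg.qr(rng.normal(size=(d_in, k)))     # orthonormal basis of the protected subspace
P_perp = np.eye(d_in) - U @ U.T                     # projector onto its orthogonal complement

dA = rng.normal(size=(r, d_in))
dB = rng.normal(size=(d_out, r))

# Naive: make only the A-factor update orthogonal to U.
dA_proj = dA @ P_perp
dW_naive = dB @ A + B @ dA_proj                     # first-order composite update
leak_naive = np.linalg.norm(dW_naive @ U)           # nonzero: dB @ A @ U survives

# Projecting the composite update itself removes the leak.
dW_proj = dW_naive @ P_perp
leak_proj = np.linalg.norm(dW_proj @ U)

print(f"leak with factor-wise projection: {leak_naive:.3f}")
print(f"leak after projecting composite update: {leak_proj:.2e}")
```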

2605.28494 2026-05-28 cs.CL

A new semantically annotated corpus with syntactic-semantic and cross-lingual senses

Myriam Rakho, Eric Laporte, Matthieu Constant

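AI summary: Builds a new semantically annotated corpus of instances of 20 French polysemous verbs. Each instance carries three kinds of sense label: its English translation in a parallel corpus, an entry in a French computational dictionary (the Lexicon-Grammar tables), and a fine-grained sense combining the two.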

Journal ref
Language Resources and Evaluation (LREC), 2012, Istanbul, Turkey, pp. 597-600

Abstract

We describe a new sense-tagged corpus for word sense disambiguation. The corpus consists of instances of 20 French polysemous verbs. Each verb instance is annotated with three sense labels: (1) the actual translation of the verb in the English version of this instance in a parallel corpus, (2) an entry of the verb in a computational dictionary of French (the Lexicon-Grammar tables), and (3) a fine-grained sense label resulting from the concatenation of the translation and the Lexicon-Grammar entry.

2605.28491 2026-05-28 cs.CV

DiscoForcing: A Unified Framework for Real-Time Audio-Driven Character Control with Diffusion Forcing

Kaiyang Ji, Bingsheng Qian, Binghuan Wu, Kangyi Chen, Ye Shi, Jingya Wang

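AI summary: Targets real-time audio-responsive character control with the DiscoForcing framework. It combines a causal music encoder with a diffusion-forcing sequence model to keep audio and full-body motion stably aligned under strictly causal, bounded-latency streaming generation.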

Comments accepted by ICML 2026

Abstract

We study real-time audio-responsive character control as a deployment-faithful problem: strictly causal, bounded-latency streaming that must generate coherent full-body motion at interactive frame rates while the audio condition can change abruptly, including tempo shifts, drops, or user edits. Prior music-to-motion systems are largely optimized for offline generation with global context, and degrade in streaming rollouts where conditioning history becomes stale or unreliable. We introduce DiscoForcing, a streaming audio-driven diffusion framework that combines a causal music encoder that captures rhythmic structure and phase dynamics with a diffusion-forcing sequence model trained under heterogeneous noise levels across the temporal horizon. Building on this, we design a hybrid temporal schedule and a history-guided streaming sampler to explicitly trade off responsiveness against long-horizon consistency under non-stationary audio. Implemented in an end-to-end real-time interactive system with online avatar playback and humanoid deployment workflows, DiscoForcing delivers more stable long-horizon rollouts and sharper audio-motion alignment than prior baselines under matched causality and latency constraints while maintaining real-time throughput.
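
The phrase "heterogeneous noise levels across the temporal horizon" can be made concrete with a simplified sketch of the two ingredients. It assumes integer noise levels from 0 (clean) to `num_steps` (pure noise). At training time, each frame gets an independent noise level. At sampling time, a pyramid-style schedule keeps later frames noisier than earlier ones. The `offset` parameter is a hypothetical knob standing in for the paper's hybrid temporal schedule, whose exact form the abstract does not give.

```python
import random
from typing import List


def training_noise_levels(horizon: int, num_steps: int, rng: random.Random) -> List[int]:
    """Diffusion-forcing-style training: each frame in the window gets its own
    independently sampled noise level in [0, num_steps]."""
    return [rng.randint(0, num_steps) for _ in range(horizon)]


def pyramid_schedule(horizon: int, num_steps: int, offset: int) -> List[List[int]]:
    """Sampling schedule in which later frames stay noisier than earlier ones.

    Row t gives the noise level of every frame at denoising iteration t.
    offset = 0 denoises the whole window jointly; larger offsets finish
    near-term frames sooner (hypothetical responsiveness knob)."""
    schedule = []
    t = 0
    while True:
        row = [min(num_steps, max(0, num_steps - t + i * offset)) for i in range(horizon)]
        schedule.append(row)
        if all(level == 0 for level in row):
            return schedule
        t += 1


for row in pyramid_schedule(horizon=3, num_steps=2, offset=1):
    print(row)
```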

2605.28490 2026-05-28 cs.CV cs.AI

SSR3D-LLM: Structured Spatial Reasoning via Latent Steps for Fine-Grained Grounding in Unified 3D-LLMs

Jiawei Li, Ziyi Liu, Weijie Shi, Long Chen, Jiajie Xu, Xiaofang Zhou

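AI summary: Addresses the brittleness of unified 3D-LLMs on fine-grained queries with SSR3D-LLM, which progressively refines candidate rankings using latent spatial reasoning steps and a geometry-aware scorer, and achieves the best results on several benchmarks.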

Abstract

3D object grounding localizes referred objects in a 3D scene from natural language. Unified instance-centric 3D-LLMs aim to solve grounding together with dialog, QA, and captioning, yet many rely on a single pointer-style grounding decision that compresses a relational instruction into one selection. This is brittle for fine-grained queries where multiple same-class candidates must be ruled out by context objects and spatial relations. We propose Structured Spatial Reasoning 3D-LLM (SSR3D-LLM), a structured grounding interface for unified 3D-LLMs. Given fixed Mask3D object proposals, the LLM writes a sequence of latent spatial reasoning steps and memory tokens from the query, and a geometry-aware scorer reads these latent steps in order to refine candidate rankings step by step with step-length masking. The latent steps are learned from standard benchmark target supervision with auxiliary referential-cue supervision during training, while inference uses only the input query and Mask3D proposals. Across ReferIt3D, ScanRefer, and Multi3DRef, SSR3D-LLM achieves the strongest results among unified 3D-LLM baselines, with substantial gains over the single-pointer QPG baseline on fine-grained grounding and consistent improvements over prior unified 3D-LLMs, while preserving the default language-task route.

2605.28487 2026-05-28 cs.AI cs.LG

ProvMind: Provenance-grounded reasoning for materials synthesis

Yiming Zhang, Ryo Tamura, Koji Tsuda

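AI summary: Proposes the MatProcBench benchmark and the ProvMind framework, which use provenance-graph reasoning to optimize routes, conditions, and causal dependencies in materials synthesis, reaching 52.84% accuracy on the dual-OOD split.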

Abstract

Materials process optimization requires reasoning over routes, conditions, tools and causal dependencies, yet most computational formulations flatten synthesis procedures into text or ordered steps. We introduce MatProcBench, a provenance-grounded benchmark constructed from literature-mined MatPROV graphs, to evaluate seven process-reasoning tasks spanning route continuity, step-level variable inference and global causal consistency under both same-split and shift-aware evaluation, including a strict dual-OOD split that combines temporal and material-class shift. We further introduce ProvMind, a process-memory reasoning framework that retrieves analogous training processes, converts them into provenance-aware option-level compatibility scores, and uses a language model for constrained final decision making. ProvMind achieves 52.84% accuracy on the dual-OOD split, outperforming prompting, retrieval-augmented and supervised fine-tuning baselines.
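
A rough sketch of the retrieve-then-score step follows. It assumes each training process is reduced to a set of step/condition tokens and labelled with the option that was correct for it. Jaccard similarity, the `k` neighbours, and the tie-breaking rule are illustrative choices. `constrained_decision` is a hypothetical stand-in for the language model that makes the final constrained choice.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

Process = FrozenSet[str]


def jaccard(a: Process, b: Process) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def option_scores(query: Process, memory: List[Tuple[Process, str]],
                  options: Sequence[str], k: int = 2) -> Dict[str, float]:
    """Retrieve the k most similar training processes and let each one vote,
    weighted by similarity, for the answer option it was labelled with."""
    neighbours = sorted(memory, key=lambda item: jaccard(query, item[0]), reverse=True)[:k]
    scores = {opt: 0.0 for opt in options}
    for features, answer in neighbours:
        if answer in scores:                      # votes for options not offered are ignored
            scores[answer] += jaccard(query, features)
    return scores


def constrained_decision(scores: Dict[str, float]) -> str:
    """Stand-in for the language-model step: choose only among the given
    options; ties are broken alphabetically."""
    return max(sorted(scores), key=lambda opt: scores[opt])


memory = [                                        # hypothetical training processes
    (frozenset({"sol-gel", "TiO2", "dry"}), "calcine"),
    (frozenset({"sol-gel", "ZnO", "dry"}), "calcine"),
    (frozenset({"ball-mill", "LiCoO2"}), "sinter"),
]
query = frozenset({"sol-gel", "TiO2", "dry", "age"})
scores = option_scores(query, memory, ["calcine", "sinter", "quench"])
print(scores, constrained_decision(scores))
```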

2605.28486 2026-05-28 cs.RO

Mag-VLA: Vision-Language-Action Model for Bimanual Magnetically Actuated Microrobot Manipulation

Yongchen Wang, Kangyi Lu, Lan Wei, Dandan Zhang

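AI summary: Proposes Mag-VLA, which performs dexterous manipulation with bimanual magnetically actuated microrobots through a vision-language-action framework and an Action Chunking Transformer decoder. In real-robot experiments, it reaches a 90% approach success rate and up to 80% transport success.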

Comments Accepted by 2026 MARSS

Abstract

Magnetically actuated microrobots have been used as wireless, non-contact manipulation tools at microscales, making them promising for minimally invasive applications. However, their control remains challenging due to indirect actuation, limited sensing, and nonlinear magnetic interactions. In this work, we propose Mag-VLA, a vision-language-action (VLA) model for dexterous magnetic microrobot manipulation using two robotic arms with mounted magnets for dynamic magnetic-field construction. Bimanual coordination enables capabilities such as microrobot reorientation that are difficult or infeasible with a single arm, but it also introduces coupled control challenges, as the policy must generate coordinated trajectories for both actuators within a shared workspace. Our framework adapts a Qwen2.5-VL-7B backbone using Low-Rank Adaptation (LoRA) to process visual observations and language instructions for action prediction. To capture task progression, we introduce a motion-aware phase classifier and a phase-conditioned Action Chunking Transformer (ACT) decoder for temporally coherent multi-step control. We further construct a teleoperated magnetic microrobot manipulation dataset covering three task configurations. Ablation studies show that the ACT-based decoder substantially outperforms alternative generative action heads. In real-robot experiments, Mag-VLA achieves a 90% approach success rate across all tasks and transport success rates of 80%, 70%, and 50% as task difficulty increases. These results demonstrate that hierarchical VLA modeling provides a promising framework for magnetic microrobot manipulation.
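
Action chunking means the decoder predicts the next several actions at once. One common way to turn overlapping chunks into smooth commands is the temporal ensembling from the original ACT work. It averages every prediction made for the current timestep with weights `exp(-m * i)`, oldest prediction first. The abstract does not say whether Mag-VLA ensembles chunks, so treat this as an assumption. The two-value action layout (one value per arm) is also hypothetical.

```python
import math
from typing import Dict, List

Action = List[float]  # e.g. one command per arm (hypothetical layout)


class TemporalEnsembler:
    """Averages every chunk prediction made for the same timestep, weighting
    the i-th oldest prediction by exp(-m * i) as in the original ACT paper."""

    def __init__(self, chunk_size: int, m: float = 0.1):
        self.chunk_size = chunk_size
        self.m = m
        self.pending: Dict[int, List[Action]] = {}   # timestep -> predictions, oldest first

    def add_chunk(self, t: int, chunk: List[Action]) -> None:
        """Register a chunk predicted at timestep t for steps t, t+1, ..."""
        for offset, action in enumerate(chunk[: self.chunk_size]):
            self.pending.setdefault(t + offset, []).append(action)

    def action_at(self, t: int) -> Action:
        """Pop all predictions for timestep t and return their weighted mean."""
        preds = self.pending.pop(t)
        weights = [math.exp(-self.m * i) for i in range(len(preds))]
        total = sum(weights)
        return [sum(w * p[d] for w, p in zip(weights, preds)) / total
                for d in range(len(preds[0]))]


ens = TemporalEnsembler(chunk_size=3, m=0.0)         # m = 0 gives a plain average
ens.add_chunk(0, [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
print(ens.action_at(0))   # [0.0, 0.0]
ens.add_chunk(1, [[3.0, 3.0], [4.0, 4.0], [5.0, 5.0]])
print(ens.action_at(1))   # [2.0, 2.0]
```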

2605.28484 2026-05-28 cs.CL

Comonadic Morphophonology: A Compositional Framework for Context-Dependent Morphological Rules in Finnish

Yongseok Jang

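AI summary: Proposes a comonad-based compositional framework that represents each morphophonological rule as a function from a local context to an output segment and composes length-changing rules strictly through a Writer comonad. This substantially shrinks the rule representation and supports bidirectional morphological analysis.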

Comments 13 pages. Accepted at the Society for Computation in Linguistics (SCiL) 2026

Abstract

Composing finite-state transducers (FSTs) for context-dependent morphophonological rules -- consonant gradation, vowel harmony, possessive suffix assimilation -- leads to multiplicative state explosion; neural models sidestep the problem but provide no formal account of the rules themselves. We present the first framework where each morphophonological rule is a function from a focused local context to a single output segment -- the type of a local rule familiar from cellular automata -- and where length-changing rules compose as coKleisli arrows of a comonad. Our central contribution is the Writer comonad (DeletionSet × Zipper), a new algebraic construction that restores strict coKleisli compositionality for such rules: each rule is a coKleisli arrow, extend lifts it to a global transformation, and deletions accumulate as a monoid action rather than requiring intermediate materialization. As supporting evidence, thirteen coKleisli arrows provide an alternative formulation expressing the same morphophonological behaviors that Omorfi encodes via 874 continuation classes (67:1 reduction at the rule-representation level), and the same abstraction enables bidirectional morphology -- a MorphGenerator reuses the analysis arrows for generation. On UD Finnish-TDT, the system achieves 83.92% UPOS accuracy with rule-only disambiguation (94.66% with an external suffix tagger), validating the framework as a practical morphological engine.
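
The shape of the idea can be sketched in Python. A zipper gives each rule a focused view of the word, `extend` applies a local rule at every position, and deletions are unioned into a set (a monoid) rather than applied immediately. The two rules are toy approximations of Finnish vowel harmony and consonant gradation, not the paper's or Omorfi's rules. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Zipper:
    """A word with one focused segment: the local context a rule sees."""
    segments: tuple
    focus: int

    def extract(self):
        return self.segments[self.focus]

    def neighbour(self, delta: int) -> Optional[str]:
        i = self.focus + delta
        return self.segments[i] if 0 <= i < len(self.segments) else None

    def extend(self, rule: Callable[["Zipper"], object]) -> "Zipper":
        """Comonadic extend: run the local rule with every position in focus."""
        return Zipper(tuple(rule(Zipper(self.segments, i)) for i in range(len(self.segments))),
                      self.focus)


Rule = Callable[[Zipper], Tuple[str, bool]]   # returns (output segment, delete?)
State = Tuple[FrozenSet[int], Zipper]         # (accumulated deletions, word)


def run_rule(state: State, rule: Rule) -> State:
    """Writer-style step: rewrite every segment locally and union the new
    deletions into the accumulated set instead of rebuilding the word."""
    deleted, word = state
    results = word.extend(rule).segments
    new_word = Zipper(tuple(seg for seg, _ in results), word.focus)
    return deleted | {i for i, (_, drop) in enumerate(results) if drop}, new_word


def materialise(state: State) -> str:
    deleted, word = state
    return "".join(s for i, s in enumerate(word.segments) if i not in deleted)


# Toy rules, not Omorfi's or the paper's actual rule set.
def vowel_harmony(ctx: Zipper) -> Tuple[str, bool]:
    if ctx.extract() != "A":                  # 'A' = archiphoneme realised as a or ä
        return ctx.extract(), False
    back = any(s in {"a", "o", "u"} for s in ctx.segments)
    return ("a" if back else "ä"), False


def degeminate(ctx: Zipper) -> Tuple[str, bool]:
    seg = ctx.extract()                       # crude stand-in for consonant gradation
    return seg, seg in {"k", "p", "t"} and ctx.neighbour(1) == seg


def surface_form(word: str, rules: List[Rule]) -> str:
    state: State = (frozenset(), Zipper(tuple(word), 0))
    for rule in rules:
        state = run_rule(state, rule)
    return materialise(state)


print(surface_form("mattossA", [vowel_harmony, degeminate]))  # matossa
print(surface_form("kylässA", [vowel_harmony, degeminate]))   # kylässä
```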

2605.28483 2026-05-28 cs.AI cs.IR

From Learning Resources to Competencies: LLM-Based Tagging with Evidence and Graph Constraints

Ngoc Luyen Le, Marie-Hélène Abel, Bertrand Laforge

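AI summary: Proposes an end-to-end alignment pipeline that uses a large language model as a constrained, evidence-producing tagger to link learning resources to a structured competency framework, outperforming baseline methods on a computer science dataset.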

Abstract

Linking learning resources to a structured competency framework is key to enabling competency-based search and curriculum analytics in Learning Management Systems (LMS). However, manual tagging is labor-intensive, and fully automatic methods often lack transparency. In this paper, we present an end-to-end alignment pipeline that uses a large language model (LLM) as a constrained, evidence-producing tagger. LMS resources, both instructional content and assessments, are first segmented into meaningful pedagogical fragments. For each fragment, a small set of candidate competencies is retrieved from structured competency profiles enriched with graph-based context. The LLM then selects the most relevant competencies from this set and provides supporting evidence spans from the fragment text. These predictions are refined using the structure of the competency graph and aggregated at the resource level. We evaluate our approach on a dataset built from the competency reference framework of the Computer Science department at the Université de Technologie de Compiègne (UTC), covering 22 competencies across multiple course materials. Our LLM+BM25+Graph (LBG) pipeline achieves strong results, with a micro-F1 of 0.57 and macro-F1 of 0.50 at the fragment level, 0.51 macro-F1 at the resource level, and an MRR of 0.82. It outperforms zero-shot and few-shot LLM variants, retrieval/similarity baselines, and supervised classifiers, while also producing more mechanically traceable evidence spans to support human auditing and educational analysis.
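
The retrieval stage narrows the competencies to a small candidate set before the LLM is consulted. It can be sketched with plain Okapi BM25 over competency descriptions. The profile texts, IDs, and the `k1`/`b` values below are hypothetical, and the graph enrichment and LLM selection steps are not shown.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return "".join(c.lower() if c.isalnum() else " " for c in text).split()


class BM25:
    """Okapi BM25 over short competency descriptions."""

    def __init__(self, docs: Dict[str, str], k1: float = 1.5, b: float = 0.75):
        self.k1, self.b = k1, b
        self.ids = list(docs)
        self.tfs = [Counter(tokenize(docs[i])) for i in self.ids]
        self.lens = [sum(tf.values()) for tf in self.tfs]
        self.avgdl = sum(self.lens) / len(self.lens)
        df = Counter(term for tf in self.tfs for term in tf)   # document frequency
        n = len(self.ids)
        self.idf = {t: math.log((n - d + 0.5) / (d + 0.5) + 1.0) for t, d in df.items()}

    def score(self, query: str, idx: int) -> float:
        tf, dl = self.tfs[idx], self.lens[idx]
        norm = self.k1 * (1 - self.b + self.b * dl / self.avgdl)
        return sum(self.idf[t] * tf[t] * (self.k1 + 1) / (tf[t] + norm)
                   for t in tokenize(query) if tf[t] > 0)

    def top_k(self, query: str, k: int = 3) -> List[Tuple[str, float]]:
        ranked = sorted(((cid, self.score(query, i)) for i, cid in enumerate(self.ids)),
                        key=lambda pair: pair[1], reverse=True)
        return ranked[:k]


profiles = {   # hypothetical competency profiles
    "C1": "Design relational database schemas and write SQL queries",
    "C2": "Implement sorting algorithms and analyse time complexity",
    "C3": "Configure computer networks and routing protocols",
}
fragment = "Exercise: write SQL queries that join two tables in the database."
print(BM25(profiles).top_k(fragment, k=2))
```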

2605.28468 2026-05-28 cs.RO

EIT-Pneumatic Hybrid Robotic Skin for Practical and Accurate Force Map Reconstruction

Junhwi Cho, Sunggyu Bae, Junghyeon Ma, Hyosang Lee, Jung Kim, Kyungseo Park

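AI summary: Proposes a hybrid robotic skin that combines electrical impedance tomography (EIT) with pneumatic tactile sensing. Using Tikhonov-regularized inverse reconstruction and per-pad pneumatic calibration, it achieves accurate large-area tactile sensing and reduces sensitivity non-uniformity.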

Comments 8 pages, 8 figures. Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2026. J. Cho, S. Bae, J. Ma contributed equally

Abstract

We present a hybrid robotic skin that combines electrical impedance tomography (EIT) with pneumatic tactile sensing to improve force reconstruction capability. The developed robotic skin is fabricated entirely by 3D printing and spray coating, making it affordable and easy to build. A Tikhonov-regularized inverse reconstruction, paired with per-pad pneumatic calibration, enables accurate large-area tactile sensing with a simple measurement scheme. For validation, we conducted load-cell indentation experiments; the results showed consistent force reconstruction across locations within a pad. Compared with an EIT-only baseline, sensitivity non-uniformity was also reduced, with the coefficient of variation decreasing from 0.31 to 0.14, indicating that the proposed approach addresses a longstanding limitation of EIT. We further demonstrated chest-mounted integration on a humanoid robot and found that the pneumatic signals remained reliable across diverse contact scenarios, including multiple simultaneous contacts on the same sensing pad. These results indicate a practical path toward accurate, scalable whole-body tactile sensing in real robotic systems.
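
The reconstruction step named in the abstract can be illustrated with the standard Tikhonov solution x_hat = (J^T J + lam I)^-1 J^T dv. Here `J` is the sensitivity (Jacobian) matrix and dv is the change in boundary measurements. The abstract does not specify how the pneumatic channel is combined with the EIT map. The per-pad calibration below is a hypothetical model: a linear pressure-to-force gain, used to rescale the EIT map so that it sums to the pad force. The matrix is random toy data, not a real EIT forward model.

```python
import numpy as np


def tikhonov_reconstruct(J: np.ndarray, dv: np.ndarray, lam: float) -> np.ndarray:
    """Solve min_x ||J x - dv||^2 + lam * ||x||^2 via the regularised normal
    equations (J^T J + lam I) x = J^T dv."""
    return np.linalg.solve(J.T @ J + lam * np.eye(J.shape[1]), J.T @ dv)


def fit_pad_gain(pressure_deltas, forces) -> float:
    """Least-squares gain through the origin, force ~ gain * pressure change,
    fitted from load-cell data for one pad (hypothetical calibration model)."""
    dp, f = np.asarray(pressure_deltas, float), np.asarray(forces, float)
    return float(dp @ f / (dp @ dp))


def pad_force_map(eit_map: np.ndarray, pressure_delta: float, gain: float) -> np.ndarray:
    """Use EIT for where the pad is pressed and the pneumatic channel for how
    hard: rescale the non-negative EIT map so it sums to the pad force."""
    weights = np.clip(eit_map, 0.0, None)
    total = weights.sum()
    if total == 0:
        return np.zeros_like(weights)
    return gain * pressure_delta * weights / total


rng = np.random.default_rng(1)
J = rng.normal(size=(40, 10))          # toy sensitivity matrix (measurements x pixels)
x_true = rng.normal(size=10)
dv = J @ x_true + 0.01 * rng.normal(size=40)
x_hat = tikhonov_reconstruct(J, dv, lam=1e-2)
print(np.abs(x_hat - x_true).max())
```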

2605.28467 2026-05-28 cs.LG

Mitigating Adaptive Attacks against Reasoning Models with Activation Consistency Training

Avidan Shah, Jannik Brinkmann, Rico Angell

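AI summary: Proposes Activation Consistency Training (ACT), which supervises internal representations to defend reasoning models against adversarial jailbreaks and prompt injection attacks. Experiments show that ACT remains robust under adaptive attacks.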

Abstract

As LLMs gain stronger reasoning capabilities, their extended chain-of-thought introduces new degrees of complexity for defending against adversarial jailbreaks and prompt injection. We study consistency training, a family of fine-tuning objectives that enforce identical behavior on clean prompts and adversarial rewrites, and evaluate its two main variants, output-level (BCT) and activation-level (ACT), across five reasoning models. We formulate both methods as a prompt injection defense and find ACT to be competitive with other training-based defenses while requiring only self-supervised pairs of clean and wrapped prompts. Our experiments also generalize both techniques within the jailbreak setting, demonstrating that ACT remains more robust to adaptive attacks. We also provide mechanistic evidence that ACT's defense against jailbreaks is encoded as a roughly linear shift in activation space at the assistant-turn boundary. After ACT training, we can recover a single steering direction that controls refusal on reasoning models with minimal effect on benign inputs. We find that ACT remains robust even when the model's chain-of-thought is replaced with a compliant trace from the undefended base model, pivoting to refuse prefilled jailbreaks. Together, these results suggest that supervising internal representations is a surprisingly effective and interpretable approach to various forms of safety training in reasoning models.
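
The activation-level objective can be sketched as a squared distance between the model's activations on a clean prompt and on its wrapped (adversarial) rewrite. The sketch assumes one activation vector per layer, taken at a fixed position such as the assistant-turn boundary. The layer choice, the averaging, and the difference-of-means recipe for a steering direction are illustrative assumptions; the abstract does not say how the authors recover theirs.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def activation_consistency_loss(clean: List[Vector], wrapped: List[Vector]) -> float:
    """Mean squared distance between activations on the wrapped (adversarial)
    prompt and on the clean prompt, averaged over layers. During training the
    clean activations would be treated as fixed targets (no gradient)."""
    if len(clean) != len(wrapped):
        raise ValueError("need one activation vector per layer for both prompts")
    per_layer = [sum((w - c) ** 2 for c, w in zip(c_vec, w_vec)) / len(c_vec)
                 for c_vec, w_vec in zip(clean, wrapped)]
    return sum(per_layer) / len(per_layer)


def mean_difference_direction(group_a: List[Vector], group_b: List[Vector]) -> List[float]:
    """Unit vector from the mean of group_b to the mean of group_a: a common
    way to extract a candidate steering direction from activations."""
    dim = len(group_a[0])
    mean_a = [sum(v[d] for v in group_a) / len(group_a) for d in range(dim)]
    mean_b = [sum(v[d] for v in group_b) / len(group_b) for d in range(dim)]
    diff = [a - b for a, b in zip(mean_a, mean_b)]
    norm = math.sqrt(sum(x * x for x in diff))
    return [x / norm for x in diff] if norm else diff


print(activation_consistency_loss([[0.0, 0.0]], [[1.0, 1.0]]))                      # 1.0
print(mean_difference_direction([[1.0, 0.0], [3.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]))  # [1.0, 0.0]
```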

2605.28465 2026-05-28 cs.CL

Beyond One Path: Evaluating and Enhancing Divergent Thinking in Interactive LLM Agents

Jihyeong Park, Ingeol Baek, Jeonghyun Park, Hwanhee Lee

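AI summary: Proposes the interactive benchmark MUTATE and the strategy ReDNA to evaluate and enhance path-level and action-level divergent thinking in LLM agents. The work targets the action fixation that immediate convergence pressure causes in existing frameworks.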

Comments 28 pages, 16 figures, 19 tables

Abstract

Divergent thinking is a core dimension of creativity, yet existing evaluations of Large Language Models (LLMs) frame it as single-turn text generation, failing to capture how an agent reasons through iterative interaction. To address this, we introduce MUTATE, an interactive benchmark designed to evaluate agentic divergent thinking at two levels: path-level, where an agent discovers multiple alternative paths to the same goal, and action-level, where individual actions require non-typical, mechanism-shifting object uses. Unlike success-only evaluations, MUTATE scores both completed paths and off-path attempts, capturing divergent reasoning that conventional success rates discard. Our experiments with frontier LLMs reveal a structural blind spot in existing frameworks: when exposed to immediate convergence pressure, they tend to fall into immediate action fixation, failing to improve action-level divergence. To overcome this, we propose ReDNA, which separates unconstrained divergent candidate generation from convergent constraint selection. ReDNA significantly outperforms prior methods across both divergence levels and generalizes effectively to an external creativity environment. We also confirm that its success stems from a qualitative enhancement of resilient divergent reasoning rather than simple environmental exploration.

2605.28464 2026-05-28 cs.CL cs.AI

The Cases LJP Never Sees: Prosecution Decision Prediction for More Complete Criminal Liability Assessment

Junyu Lu, Qi Wei, Peishuo Zheng, Jie Zhang, Hui Huang, Qianru Wang, Chuan Xiao, Jianbin Qin, Shuyuan Zheng

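AI summary: Proposes the Prosecution Decision Prediction (PDP) task, which classifies each case as prosecution or one of three non-prosecution decisions, to cover the blind spot that Legal Judgment Prediction (LJP) leaves in criminal liability assessment. The paper also builds the PDP-Bench benchmark, and experiments show that large language models perform markedly worse on PDP than on LJP.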

Comments 24 pages, 5 figures, 22 tables

Abstract

Legal Judgment Prediction (LJP) has become a core benchmark for evaluating AI in the criminal legal domain, but it only sees criminal cases that have already passed prosecutorial review and been formally indicted. As a result, LJP leaves a substantial blind spot in assessing criminal liability, overlooking cases involving insufficient evidence, no criminal liability, or guilt exempted from punishment. To fill this gap, we propose Prosecution Decision Prediction (PDP), the first Legal AI task built around prosecutorial review, which classifies each case into prosecution or one of three non-prosecution decisions and reflects legal AI's capabilities in evidence evaluation, legal subsumption, and value-based discretion. We further construct PDP-Bench, a benchmark of 4,630 real Chinese prosecutorial decisions spanning 190 charges. Extensive experiments show that state-of-the-art LLMs perform substantially worse on PDP than on LJP and that mainstream enhancement routes fail to close the gap. Moreover, controlled RLVR interventions show that simple outcome rewards fail to produce generalizable PDP discrimination.

2605.28462 2026-05-28 cs.RO

Learning a Kinodynamic Trajectory Manifold for Impact-Aware Compliant Catching of Fast-Moving Objects

Guorui Pei, Mengshi Zhang, Xi Chen, Jinsong Wu, Jiaming Qi, Peng Zhou

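AI summary: Collects successful catching trajectories with reinforcement learning in simulation and learns a low-dimensional kinodynamic trajectory manifold. At run time, the estimated initial object state is mapped directly to a reference catching trajectory, which is combined with compliant control near contact for impact-aware catching of fast-moving objects.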

Abstract

Fast catching of free-flying objects is difficult because of short reaction time, impact uncertainty, and kinodynamic constraints. We use reinforcement learning in simulation to collect successful catching trajectories and learn a low-dimensional kinodynamic trajectory manifold. At run time, the estimated object initial state is mapped directly to a reference catching trajectory without online nonlinear optimization. The trajectory is tracked with compliant control near contact for improved impact absorption and capture stability.
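
The run-time mapping from object state to reference trajectory can be sketched with a linear stand-in. PCA gives the trajectory manifold, and least squares maps the object's initial state to manifold coordinates. The abstract does not specify the paper's learned manifold or mapping, so both are assumptions here. The synthetic data are constructed to be exactly low-rank, so the sketch reconstructs them exactly; real RL-collected trajectories would not be.

```python
import numpy as np


class LinearTrajectoryManifold:
    """PCA subspace of flattened catching trajectories plus a least-squares
    affine map from the object's initial state to subspace coordinates."""

    def __init__(self, n_components: int):
        self.n = n_components

    def fit(self, states: np.ndarray, trajectories: np.ndarray) -> "LinearTrajectoryManifold":
        self.mean = trajectories.mean(axis=0)
        centred = trajectories - self.mean
        _, _, vt = np.linalg.svd(centred, full_matrices=False)
        self.basis = vt[: self.n]                          # (n, T*D) principal directions
        latents = centred @ self.basis.T                   # (N, n) manifold coordinates
        X = np.hstack([states, np.ones((len(states), 1))])  # affine design matrix
        self.W, *_ = np.linalg.lstsq(X, latents, rcond=None)
        return self

    def predict(self, state: np.ndarray) -> np.ndarray:
        """Map an estimated initial object state straight to a reference
        trajectory; no online optimisation is involved."""
        z = np.append(state, 1.0) @ self.W
        return self.mean + z @ self.basis


# Synthetic data standing in for RL-collected catches: 6-D object state,
# trajectories of T = 20 steps x D = 7 joints, flattened (hypothetical sizes).
rng = np.random.default_rng(2)
states = rng.normal(size=(200, 6))
mixing = rng.normal(size=(6, 3)) @ rng.normal(size=(3, 20 * 7))
trajs = states @ mixing + 0.5
model = LinearTrajectoryManifold(n_components=3).fit(states, trajs)
new_state = rng.normal(size=6)
ref = model.predict(new_state).reshape(20, 7)
print(ref.shape)   # (20, 7)
```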

2605.28459 2026-05-28 cs.CV

REVEAL: Reference-Grounded Reasoning for Multimodal Manipulation Detection

Jun Zhou, Bingwen Hu, Yaxiong Wang, Zhedong Zheng, Yongzhen Wang, Yuchen Zhang, Ping Liu

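AI summary: Proposes the REVEAL framework, which combines reference-grounded verification, a difference-aware fusion mechanism, and a task-decoupled mixture-of-experts architecture for multimodal manipulation detection and localization, and supports training-free domain adaptation.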

Comments 11 pages, 3 figures

Abstract

Multimodal manipulation detection aims to simultaneously identify forged image-text pairs and localize tampered regions, yet existing methods typically rely on memorizing isolated artifacts and struggle with imperceptible manipulation traces or domain shifts. Inspired by human comparative reasoning, we reformulate this task as a reference-grounded verification problem, where authenticity is assessed by comparing a query against retrieved authentic evidence. We propose REVEAL (Reference-Enabled Verification for Evidence Analysis and Localization), a framework explicitly designed for this comparative paradigm. To support this paradigm, we construct a large-scale reference library comprising 170K authentic news image-text pairs featuring over 40K public figures. Technically, REVEAL employs a difference-aware fusion mechanism to capture fine-grained discrepancies between the query and retrieved evidence. Furthermore, we introduce a task-decoupled Mixture-of-Experts (MoE) architecture to jointly execute instance-level detection and fine-grained grounding, effectively mitigating optimization conflicts between these heterogeneous objectives. Extensive experiments demonstrate that REVEAL significantly outperforms state-of-the-art methods, and notably enables training-free domain adaptation by simply updating the reference library, offering a robust and practical solution for detecting evolving misinformation. Code is available at https://anonymous.4open.science/r/REVEAL-Reference-A006.
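
Reference-grounded verification can be sketched in two steps, assuming precomputed embeddings. First, retrieve the most similar authentic entries by cosine similarity. Then build features that expose where the query and the evidence differ. Concatenating both embeddings with their absolute difference and element-wise product is a common fusion pattern, used here as a stand-in. REVEAL's actual difference-aware fusion module and its Mixture-of-Experts heads are not reproduced, and the toy vectors and IDs are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: Vector, library: Dict[str, Vector], k: int = 1) -> List[Tuple[str, Vector]]:
    """Return the k reference entries most similar to the query embedding."""
    return sorted(library.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)[:k]


def difference_features(query: Vector, evidence: Vector) -> List[float]:
    """Concatenate both embeddings with their element-wise |difference| and
    product, so a downstream classifier sees where they disagree."""
    return (list(query) + list(evidence)
            + [abs(q - e) for q, e in zip(query, evidence)]
            + [q * e for q, e in zip(query, evidence)])


library = {"ref_a": [1.0, 0.0, 0.0], "ref_b": [0.0, 1.0, 0.0]}   # toy embeddings
query = [0.9, 0.1, 0.0]
best_id, best_vec = retrieve(query, library, k=1)[0]
print(best_id, len(difference_features(query, best_vec)))       # ref_a 12

# Adapting to new content here means adding entries, with no retraining:
library["ref_c"] = [0.95, 0.05, 0.0]
print(retrieve(query, library, k=1)[0][0])
```

In this sketch, adding an entry to the library changes what is retrieved without retraining anything, which is the intuition behind the training-free adaptation claim.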

2605.28456 2026-05-28 cs.AI cs.CV eess.AS

Diffusion Large Language Models for Visual Speech Recognition

Jeong Hun Yeo, Chae Won Kim, Hyeongseop Rha, Yong Man Ro

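AI summary: Proposes DLLM-VSR, the first visual speech recognition framework built on a diffusion large language model (DLLM). It uses iterative masked denoising with flexible-order decoding, combined with a confidence-guided unmasking strategy and two-stage training. It also introduces length-guided candidate decoding to reduce target-length uncertainty, reaching a 19.5% word error rate on LRS3.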

Comments Code: https://github.com/JeongHun0716/dllm-vsr

Abstract

Existing Visual Speech Recognition (VSR) systems commonly rely on left-to-right autoregressive decoding, which can force premature decisions on visually ambiguous tokens before sufficient context is available. We propose DLLM-VSR, to the best of our knowledge, the first Diffusion Large Language Model (DLLM)-based VSR framework, formulating transcription as iterative masked denoising with flexible-order decoding. With confidence-based unmasking, DLLM-VSR commits high-confidence positions early and uses the committed tokens as bidirectional context to refine ambiguous ones. To adapt DLLMs to VSR, we introduce a two-stage masked-denoising training strategy that separates visual-to-text content alignment from length modeling. We further observe a performance gap with oracle-length decoding, which assumes access to the true transcript length, indicating that reducing target-length uncertainty can improve DLLM-based VSR. To reduce this gap, we develop length-guided candidate decoding, which uses video duration to construct plausible transcript-length hypotheses, decodes under multiple hypotheses, and reranks candidates using length plausibility and decoding confidence. The proposed method achieves a state-of-the-art WER of 19.5% on LRS3 using only its labeled training data.
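
Confidence-based unmasking can be sketched as a loop that re-predicts all positions and, at each step, commits the most confident masked ones. `toy_predict` is a hypothetical stand-in for the DLLM. It raises confidence once a neighbouring token is committed, to mimic bidirectional context. The speaking rates in `length_hypotheses` are assumptions, and the reranking by length plausibility and confidence is not shown.

```python
from typing import Callable, List, Optional, Sequence, Tuple

MASK = None
Prediction = Tuple[str, float]   # (token, confidence) for one position


def confidence_unmask(length: int,
                      predict: Callable[[List[Optional[str]]], Sequence[Prediction]],
                      per_step: int = 1) -> Tuple[List[str], List[int]]:
    """Iterative masked denoising: at each step re-predict every position and
    commit the `per_step` most confident still-masked ones."""
    seq: List[Optional[str]] = [MASK] * length
    order: List[int] = []
    while any(tok is MASK for tok in seq):
        preds = predict(seq)
        masked = sorted((i for i, tok in enumerate(seq) if tok is MASK),
                        key=lambda i: preds[i][1], reverse=True)
        for i in masked[:per_step]:
            seq[i] = preds[i][0]
            order.append(i)
    return seq, order


def length_hypotheses(duration_s: float, words_per_s=(2.0, 2.5, 3.0)) -> List[int]:
    """Candidate transcript lengths from clip duration (speaking rates assumed)."""
    return sorted({max(1, round(duration_s * r)) for r in words_per_s})


# Toy predictor: fixed words, plus a confidence bonus once a neighbour is
# committed, mimicking bidirectional context helping ambiguous tokens.
WORDS, BASE = ["the", "cat", "sat"], [0.9, 0.4, 0.8]


def toy_predict(seq: List[Optional[str]]) -> List[Prediction]:
    out = []
    for i, word in enumerate(WORDS):
        has_ctx = any(0 <= j < len(seq) and seq[j] is not MASK for j in (i - 1, i + 1))
        out.append((word, BASE[i] + (0.3 if has_ctx else 0.0)))
    return out


print(confidence_unmask(3, toy_predict))   # (['the', 'cat', 'sat'], [0, 2, 1])
print(length_hypotheses(1.2))              # [2, 3, 4]
```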

2605.28454 2026-05-28 cs.AI

GONDOR to the Rescue: Satisficing Planning with Low Memory

Yonatan Vernik, Alexander Tuisov, Alexander Shleyfman

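AI summary: Proposes the GONDOR algorithm, which extends GBFS under strict memory limits by periodically compressing the search tree while retaining sparse anchor states, enabling satisficing planning with low memory budgets.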

Abstract

Greedy Best-First Search (GBFS) is the dominant approach for solving search problems where the goal can be estimated with a heuristic, such as planning, route finding, navigation, and pathfinding. This is especially true when memory is tightly constrained, such as when planning on edge devices. To alleviate this constraint, we present GONDOR (Greedy Online Navigation with Dynamic Outpost-based Re-search), a memory-efficient extension of GBFS. GONDOR allows search to continue under strict memory limits by periodically compressing the search tree while retaining a sparse set of anchor states; upon reaching the goal, it reconstructs the path by re-searching between these sparse states. We analyze the algorithm and discuss several variants defined by different outpost selection policies. In addition, we explore using Bloom filters for compact duplicate detection in the closed list. Experiments across numeric planning domains and heuristic configurations show that GONDOR consistently improves coverage under low memory budgets compared to standard GBFS. We release the implementation of GONDOR and the Bloom-filter variant to facilitate further research on memory-efficient heuristic search.
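The Bloom-filter closed list mentioned above can be sketched as a plain GBFS whose duplicate check uses a fixed-size bit array instead of an exact set. The following are assumptions for illustration:
- The filter size.
- The hash scheme (SHA-256 with a per-hash index prefix).
- The grid example.

GONDOR's outpost-based compression and re-search are not shown. For readability the sketch stores each frontier node's full path, which is exactly the memory cost GONDOR avoids by keeping only sparse anchor states.

```python
import hashlib
import heapq
from typing import Callable, Hashable, Iterable, List, Optional


class BloomFilter:
    """Compact set: membership may give false positives but never false negatives."""

    def __init__(self, num_bits: int = 1 << 16, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: Hashable) -> Iterable[int]:
        data = repr(item).encode("utf-8")
        for k in range(self.num_hashes):
            digest = hashlib.sha256(k.to_bytes(2, "little") + data).digest()
            yield int.from_bytes(digest[:8], "little") % self.num_bits

    def add(self, item: Hashable) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: Hashable) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def gbfs(
    start: Hashable,
    is_goal: Callable[[Hashable], bool],
    successors: Callable[[Hashable], Iterable[Hashable]],
    h: Callable[[Hashable], float],
) -> Optional[List[Hashable]]:
    closed = BloomFilter()
    tie = 0  # insertion counter so heapq never has to compare states
    frontier = [(h(start), tie, start, [start])]
    while frontier:
        _, _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        if state in closed:
            continue  # already expanded (or a Bloom false positive)
        closed.add(state)
        for nxt in successors(state):
            if nxt not in closed:
                tie += 1
                heapq.heappush(frontier, (h(nxt), tie, nxt, path + [nxt]))
    return None


GOAL = (5, 5)


def grid_successors(s):
    x, y = s
    for dx, dy in ((1, 0), (0, 1), (-1, 0), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < 10 and 0 <= ny < 10:
            yield (nx, ny)


path = gbfs((0, 0), lambda s: s == GOAL, grid_successors,
            lambda s: abs(s[0] - GOAL[0]) + abs(s[1] - GOAL[1]))
```

The trade-off is that a false positive can wrongly prune an unvisited state, so completeness becomes probabilistic. In exchange, the closed list occupies a fixed number of bits regardless of how many states are expanded.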

2605.28450 2026-05-28 cs.CV cs.AI

BiasEdit: A Training-Free Bias-Detect-and-Edit Framework for Learning Fair Visual Classifiers

Jungwook Seo, Yoonsik Park, Changmin Lee, Sungyong Baik

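AI summary: Proposes BiasEdit, a framework that automatically detects bias attributes through statistical dependence and mutual information analysis and uses text-guided image editing to generate unbiased samples. This enables fair classification without manual annotation.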

Comments Accepted to The Web Conference 2026 (formerly WWW) as an Oral presentation

Abstract

Visual data from the Web power image classifiers, which underpin many web services such as recommendation and content moderation. However, raw Web data often contain spurious correlations and social biases, and neural networks are known to learn the biases present in their training data. This can reinforce unfairness in web services and in the web data itself, creating a vicious cycle. In image classification, a network learns a bias attribute for a class when most images of that class, and only that class, share the same attribute. Hence, training a fair, debiased classifier on a biased dataset requires handling the imbalance between a majority of images with the bias attribute (bias-aligned samples) and a minority without it (bias-conflict samples). In this work, we introduce BiasEdit, a modular framework that automatically detects bias attributes in the original dataset and edits them to construct a debiased dataset. Specifically, BiasEdit first detects unknown bias attributes via statistical dependence and mutual information analysis of visual-linguistic representations. It then explicitly edits those attributes using text-guided image editing to generate realistic bias-conflict samples. Unlike prior works that assume known bias attributes or rely on synthetic mixing, our method requires no manual annotations and can leverage off-the-shelf vision-language and editing models. BiasEdit addresses a fundamental challenge in Web-sourced visual AI, mitigating dataset-induced bias and achieving state-of-the-art debiasing performance even when the training data are fully biased.
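The detection step relies on statistical dependence and mutual information over vision-language representations. As a simplified, hypothetical illustration of the idea, the sketch below scores discrete per-image attribute flags against class labels with empirical mutual information. The attribute names and flags are made up (imagine them coming from a zero-shot vision-language tagger), and the paper's estimator on continuous representations may differ.

```python
import math
from collections import Counter
from typing import Sequence


def mutual_information(attrs: Sequence[int], labels: Sequence[int]) -> float:
    """Empirical mutual information (in nats) between a discrete attribute and the class label."""
    n = len(attrs)
    joint = Counter(zip(attrs, labels))
    p_attr = Counter(attrs)
    p_label = Counter(labels)
    mi = 0.0
    for (a, y), count in joint.items():
        p_ay = count / n
        mi += p_ay * math.log(p_ay / ((p_attr[a] / n) * (p_label[y] / n)))
    return mi


# Hypothetical attribute flags for 100 images (first 50 are class 0, last 50 class 1).
labels = [0] * 50 + [1] * 50
candidates = {
    "water_background": [0] * 45 + [1] * 5 + [1] * 45 + [0] * 5,  # tracks the class closely
    "blurry": [0, 1] * 50,  # unrelated to the class
}
ranked = sorted(candidates, key=lambda k: mutual_information(candidates[k], labels), reverse=True)
```

An attribute whose mutual information with the label is high is a bias candidate. Editing it in images of the other class yields the bias-conflict samples described above.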

2605.28448 2026-05-28 cs.RO

A Digital Twin Framework for Virtual Visuo-Haptic Teleoperation of Complex-Shaped Optical Microrobots

Zongcai Tan, Lan Wei, Dandan Zhang

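AI summary: Presents a digital twin framework for virtual visuo-haptic teleoperation of complex-shaped optical microrobots. It integrates multi-trap optical manipulation, image-based pose estimation, microrobot motion simulation, and model-based haptic rendering. Experiments show that haptic feedback substantially reduces the standard deviations of contact force and position error and raises the task success rate.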

Comments Accepted by 2026 MARSS

Abstract

Optical tweezers (OT) provide piconewton-scale manipulation for delicate biomedical tasks, where visuo-haptic feedback can improve operator awareness by conveying interaction-force cues and trap-stability information. However, visuo-haptic teleoperation frameworks for complex-shaped optical microrobots remain underdeveloped, particularly in multi-trap manipulation scenarios. This paper presents a digital twin framework for virtual visuo-haptic teleoperation of complex-shaped OT-driven microrobots. The framework integrates a digital twin environment, image-based pose and depth estimation, microrobot motion simulation, and model-based haptic rendering within a Robot Operating System (ROS)-connected bimanual teleoperation system. For force modeling, we combine a Multi-Sphere Distributed Manipulation (MSDM) model with optical-force estimation from the Optical Tweezers Toolbox, enabling simulator-driven visuo-haptic feedback. The framework reproduces representative microrobot motion trends and provides haptic force rendering that is numerically consistent with the fitted optical-force model. In simulated cell-delivery tasks, haptic feedback reduced the standard deviations of the contact-force metric and the microrobot-to-trap-center distance metric by 53.2% and 55.2%, respectively, and improved task success from 30% to 80%. These results demonstrate the framework's effectiveness for evaluating visuo-haptic teleoperation strategies for complex-shaped optical microrobots.
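For the force-modeling step, the sketch below assumes, as the name Multi-Sphere Distributed Manipulation suggests, that the microrobot is approximated by rigid spheres that each receive an optical force. Under that assumption, aggregating the per-sphere forces into a net force and a torque about a reference point is standard rigid-body mechanics. The per-sphere forces below are placeholder numbers, not output of the Optical Tweezers Toolbox, and the paper's MSDM model may include terms not shown here.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def net_wrench(centers: Sequence[Vec3], forces: Sequence[Vec3], origin: Vec3) -> Tuple[Vec3, Vec3]:
    """Net force and torque about `origin` from per-sphere forces on one rigid body."""
    force = [0.0, 0.0, 0.0]
    torque = [0.0, 0.0, 0.0]
    for c, f in zip(centers, forces):
        r = (c[0] - origin[0], c[1] - origin[1], c[2] - origin[2])  # lever arm
        t = cross(r, f)
        for k in range(3):
            force[k] += f[k]
            torque[k] += t[k]
    return (force[0], force[1], force[2]), (torque[0], torque[1], torque[2])


# Two spheres 1 um either side of the centre, pushed in opposite directions (pN).
f_net, t_net = net_wrench([(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
                          [(0.0, 1.0, 0.0), (0.0, -1.0, 0.0)],
                          (0.0, 0.0, 0.0))
```

In this example the forces cancel but produce a pure torque (in pN·µm). Rotational cues like this are the kind of information a haptic channel could convey in multi-trap manipulation.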

2605.28444 2026-05-28 cs.LG

Bilinear Coordinate Alignment for Training-Free Task-Vector Transfer

Jungyong Son, Jinwook Jung, Minhee Park, Sungyong Baik

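AI summary: Addresses the problem that fine-tuned knowledge cannot be reused directly after a pre-trained model is updated to a new version. Proposes BiCo, a training-free framework based on bilinear coordinate alignment that estimates orthogonal Procrustes mappings from a forward-backward pass on a small calibration set, enabling effective transfer of task vectors between models.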

Abstract

Fine-tuning large-scale pre-trained models has become a prevalent paradigm for adapting general representations to specialized tasks. However, when a new version of a pre-trained model becomes available, expertise acquired through fine-tuning cannot be directly reused because it is tied to the parameterization of the original model, requiring another costly fine-tuning. To address this inefficiency, recent work uses task vectors, defined as the parameter difference between a fine-tuned model and its base model, to transfer expertise across models. While existing methods bridge disparate models by matching activations or gradients, a significant performance gap remains relative to direct fine-tuning, suggesting that these partial correspondences are insufficient. In this work, instead of viewing a task vector merely as a parameter offset, we revisit the formation of task vectors and show that they can be derived as accumulated bilinear interactions between input-side activations and output-side gradients. Motivated by this observation, we formulate task-vector transfer as a dual-space alignment problem and propose BiCo, a training-free framework for transferring task vectors through Bilinear Coordinate alignment. BiCo estimates orthogonal Procrustes mappings in both spaces using a single forward-backward pass on a small calibration set, without any parameter update. Across extensive computer vision and natural language processing benchmarks, BiCo consistently outperforms existing transfer methods across models that differ in width, depth, and pre-training configuration.
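The bilinear view suggests a simple mapping rule. If a layer's update accumulates as ΔW ≈ -η Σ g xᵀ, and the target model's activations and gradients are rotations of the source ones (x_t ≈ R_in x_s, g_t ≈ R_out g_s), then the transferred update is R_out ΔW R_inᵀ. Each rotation can be fitted by orthogonal Procrustes on calibration samples.

The sketch below checks this on synthetic data. The following are assumptions made for illustration, not BiCo's actual procedure:
- Source and target layers have equal widths. BiCo itself handles width differences, which this sketch does not.
- The function names are made up.
- The random rotations stand in for real calibration activations and gradients.

```python
import numpy as np


def procrustes(source: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Orthogonal R minimising ||R @ source - target||_F (columns are samples)."""
    u, _, vt = np.linalg.svd(target @ source.T)
    return u @ vt


def transfer_task_vector(delta_w, acts_src, acts_tgt, grads_src, grads_tgt):
    """Map a source-layer update (out x in) into the target model's coordinates."""
    r_in = procrustes(acts_src, acts_tgt)     # input-side (activation) alignment
    r_out = procrustes(grads_src, grads_tgt)  # output-side (gradient) alignment
    return r_out @ delta_w @ r_in.T


rng = np.random.default_rng(0)
d_in, d_out, n = 8, 6, 64
q_in, _ = np.linalg.qr(rng.standard_normal((d_in, d_in)))    # hidden "true" rotations
q_out, _ = np.linalg.qr(rng.standard_normal((d_out, d_out)))
x_src = rng.standard_normal((d_in, n))    # calibration activations (source model)
g_src = rng.standard_normal((d_out, n))   # calibration output gradients (source model)
x_tgt, g_tgt = q_in @ x_src, q_out @ g_src

lr = 0.1
delta_src = -lr * g_src @ x_src.T  # accumulated bilinear update in source coordinates
delta_tgt = -lr * g_tgt @ x_tgt.T  # the update the target would have accumulated
mapped = transfer_task_vector(delta_src, x_src, x_tgt, g_src, g_tgt)
```

In this noise-free setting the two Procrustes fits recover the hidden rotations, so `mapped` matches `delta_tgt` up to floating-point error. With real models the correspondence is only approximate.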

2605.28441 2026-05-28 cs.CV cs.AI

Bayesian Gated Non-Negative Contrastive Learning

Peng Cui, Jiahao Zhang, Lijie Hu

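AI summary: Addresses representation entanglement in contrastive learning by proposing Bayesian Gated Non-Negative Contrastive Learning, which dynamically filters out irrelevant features with a probabilistic gating mechanism and improves semantic consistency on ImageNet-100 by 142.1%.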

Comments Accepted by ICML 2026

Abstract

While Contrastive Learning (CL) has revolutionized self-supervised representation learning, its latent representations remain highly entangled and opaque, limiting their interpretability in safety-critical applications. We identify that a fundamental cause of this entanglement is the reliance on deterministic similarity measures, which treat all feature dimensions equally. In compositional scenes, this creates an Optimization Conflict: common background features, such as "blue sky", are encouraged to align in positive pairs but are simultaneously repelled in negative pairs, causing gradient oscillations that hinder precise semantic disentanglement. To address this, we propose BayesNCL (Bayesian Gated Non-Negative Contrastive Learning). Unlike standard approaches, BayesNCL introduces a probabilistic gating mechanism that dynamically filters out task-irrelevant, high-frequency common features while selectively retaining discriminative semantics. By formalizing feature selection as a variational inference problem with a sparse Bernoulli prior, our method effectively resolves the optimization conflict. Experimental results on ImageNet-100 demonstrate that BayesNCL achieves a 142.1% improvement in semantic consistency over state-of-the-art baselines, yielding highly interpretable representations without compromising downstream task performance. Code is available at https://github.com/Cui-Peng-624/BayesNCL.
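The gating idea can be illustrated with two ingredients the abstract names: non-negative features compared through per-dimension keep probabilities, and a KL penalty toward a sparse Bernoulli prior. This is a hypothetical sketch, not the BayesNCL code:
- It uses the expected gate instead of sampled (for example, relaxed-Bernoulli) gates.
- It adds the KL term as a plain regularizer, so a training loss would look like `contrastive_loss + lam * bernoulli_kl(...)`, with `lam` an assumed weight.

```python
import math
from typing import List, Sequence


def relu(v: Sequence[float]) -> List[float]:
    return [max(0.0, x) for x in v]


def gated_similarity(z1: Sequence[float], z2: Sequence[float], gate_probs: Sequence[float]) -> float:
    """Similarity of non-negative features, weighting each dimension by its keep probability."""
    return sum(p * a * b for p, a, b in zip(gate_probs, relu(z1), relu(z2)))


def bernoulli_kl(gate_probs: Sequence[float], prior: float) -> float:
    """Sum over dimensions of KL(Bernoulli(q_d) || Bernoulli(prior)); a small prior favours sparse gates."""
    eps = 1e-12
    total = 0.0
    for q in gate_probs:
        q = min(max(q, eps), 1.0 - eps)  # keep the logarithms finite
        total += q * math.log(q / prior) + (1.0 - q) * math.log((1.0 - q) / (1.0 - prior))
    return total


# The gate closes dimension 1 (e.g. a shared "blue sky" feature), so it no longer
# contributes to either attraction or repulsion.
sim = gated_similarity([1.0, 2.0, -1.0], [3.0, 2.0, 5.0], [1.0, 0.0, 1.0])
```

Closing a gate removes a dimension from both positive and negative terms at once. That is one way to avoid the align-and-repel conflict described above.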

2605.28440 2026-05-28 cs.CL cs.LG

AdaDPO: Self-Adaptive Direct Preference Optimization with Balanced Gradient Updates

Shaolong Chen, Madalina Ciobanu, Qingqing Mao, Ritankar Das

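AI summary: Addresses the gradient asymmetry in DPO that biases models toward avoiding bad answers rather than generating good ones. Proposes AdaDPO, which balances preferred and dispreferred gradients with adaptive coefficients derived from the policy model's generation probabilities; it outperforms DPO on AlpacaEval 2 and mitigates length bias.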

Comments 5 figures

Abstract

DPO has become a widely adopted alternative to RLHF for aligning LLMs with human preferences, eliminating the need for a separate reward model or RL loop. Recent theoretical analysis uncovers an asymmetric gradient behavior in DPO: the loss suppresses dispreferred responses substantially faster than it promotes preferred ones, causing the model to learn to avoid bad answers rather than to generate good ones. We propose AdaDPO, a Self-Adaptive variant of the DPO algorithm that introduces per-preference-pair, stop-gradient-based coefficients derived directly from the policy model's generation probabilities, with the reference model's probabilities as an optional component. AdaDPO is constructed to enforce equality of gradient magnitudes between preferred and dispreferred probabilities; the practical implementation balances per-token gradients and applies a numerical clipping bound for stability, while retaining DPO's original hyperparameter structure. We train Llama-3-8B-Instruct on UltraFeedback under a SimPO-like setup and evaluate on AlpacaEval 2, where AdaDPO consistently outperforms DPO. It achieves higher length-controlled win rates (LC) in 81% of hyperparameter combinations, attains the global best LC (48.3%) and raw win rate (46.1%), and enlarges the LC-over-WR margin in 88% of combinations, indicating effective mitigation of length bias. Additional analyses on KL divergence, reward margin, and reward accuracy confirm that AdaDPO rectifies the gradient imbalance and yields more efficient optimization. Because it operates purely at the loss level, AdaDPO can be dropped into existing preference-based alignment pipelines without changing data collection or model architectures. The method requires only a few lines of code, and the same self-adaptive principle generalizes to a broad family of pairwise contrastive preference losses including SimPO, R-DPO, IPO, CPO, and ORPO.
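The asymmetry can be seen by differentiating the DPO loss with respect to the two (sequence-level) probabilities. For L = -log σ(m) with m = β(c_w log(p_w/r_w) - c_l log(p_l/r_l)), the gradients are:
- ∂L/∂p_w = -(1-σ(m)) β c_w / p_w
- ∂L/∂p_l = (1-σ(m)) β c_l / p_l

With c_w = c_l = 1 (vanilla DPO), the dispreferred gradient is p_w/p_l times larger, so it dominates once the dispreferred response is already less likely.

The `balanced_coeffs` rule below is a hypothetical stop-gradient choice that equalizes the two magnitudes. It illustrates the stated goal, not AdaDPO's published coefficients. The paper's per-token formulation, optional reference-model term, and clipping bound are not reproduced.

```python
import math
from typing import Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dpo_prob_grads(p_w: float, p_l: float, ref_w: float, ref_l: float,
                   beta: float = 0.1, c_w: float = 1.0, c_l: float = 1.0) -> Tuple[float, float]:
    """dL/dp_w and dL/dp_l for L = -log sigmoid(beta*(c_w*log(p_w/ref_w) - c_l*log(p_l/ref_l))).

    c_w and c_l are treated as constants (stop-gradient).
    """
    margin = beta * (c_w * math.log(p_w / ref_w) - c_l * math.log(p_l / ref_l))
    s = sigmoid(margin)
    grad_w = -(1.0 - s) * beta * c_w / p_w
    grad_l = (1.0 - s) * beta * c_l / p_l
    return grad_w, grad_l


def balanced_coeffs(p_w: float, p_l: float) -> Tuple[float, float]:
    """Hypothetical coefficients with c_w / p_w == c_l / p_l, so gradient magnitudes match."""
    total = p_w + p_l
    return p_w / total, p_l / total


gw, gl = dpo_prob_grads(0.2, 0.05, 0.1, 0.1)  # vanilla: |gl| / |gw| == p_w / p_l == 4
c_w, c_l = balanced_coeffs(0.2, 0.05)
gw_bal, gl_bal = dpo_prob_grads(0.2, 0.05, 0.1, 0.1, c_w=c_w, c_l=c_l)
```

Because the coefficients carry no gradient, they only rescale each side's push. This is consistent with the abstract's claim that the change is confined to the loss and leaves data and architecture untouched.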

2605.28438 2026-05-28 cs.CL

Breaking the Script Barrier: Enabling Automatic Alignment for PoS-based ASR Error Analysis in Non-Latin Scripts

Prasenjit K Mudi, Dahlia Devapriya, Sheetal Kalyani

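AI summary: Proposes a language-agnostic automatic alignment mechanism that makes PoS-based ASR error analysis reliable in both Latin and non-Latin scripts, and applies it across multiple writing systems to improve WER.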

Abstract

Automatic Speech Recognition (ASR) systems are commonly evaluated using aggregate metrics such as Word Error Rate (WER), which do not capture the linguistic structure of errors. Fine-grained analysis, such as Part-of-Speech (PoS)-wise error characterization, requires accurate alignment between ASR hypotheses and reference transcriptions. However, existing alignment tools are often unreliable for languages written in non-Latin scripts. In this work, we address this gap by proposing a robust, automated, language-agnostic alignment mechanism applicable across ASR architectures and across languages written in both Latin and non-Latin scripts. This enables consistent alignment of hypotheses, references, and evaluation sequences, forming the basis for downstream linguistic analysis. Building on this, we employ standard PoS taggers to perform scalable and reproducible PoS-wise error analysis. Notably, we perform alignment and downstream ASR error analysis across three major segmented writing systems, namely, Abugida (Tamil, Hindi, Kannada), Alphabetic (English, Russian, Greek), and Abjad (Arabic). We further demonstrate how such error information can be leveraged during ASR training to improve metrics such as WER.
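The paper's alignment mechanism is not described here in enough detail to reproduce. As a generic baseline for the problem it targets, the sketch below combines two steps:
- Unicode NFC normalization, so that canonically equivalent spellings (common in scripts with combining marks) are not miscounted as substitutions.
- Word-level Levenshtein alignment with a backtrace.

The resulting pairs are then bucketed by reference PoS tag. The hand-written tags stand in for a real tagger, and all function names are illustrative.

```python
import unicodedata
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Pair = Tuple[Optional[str], Optional[str], str]  # (reference word, hypothesis word, op)


def normalize(text: str) -> List[str]:
    """NFC-normalise so canonically equivalent spellings compare equal, then split on whitespace."""
    return unicodedata.normalize("NFC", text).split()


def align(ref: Sequence[str], hyp: Sequence[str]) -> List[Pair]:
    """Word-level Levenshtein alignment with backtrace; ops are ok, sub, del, ins."""
    n, m = len(ref), len(hyp)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j - 1] + cost, dist[i - 1][j] + 1, dist[i][j - 1] + 1)
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1], "ok" if ref[i - 1] == hyp[j - 1] else "sub"))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((ref[i - 1], None, "del"))
            i -= 1
        else:
            pairs.append((None, hyp[j - 1], "ins"))
            j -= 1
    return pairs[::-1]


def errors_by_pos(pairs: Sequence[Pair], ref_tags: Sequence[str]) -> Counter:
    """Count substitutions and deletions per reference PoS tag (insertions have no reference tag)."""
    counts: Counter = Counter()
    tags = iter(ref_tags)
    for ref_word, _, op in pairs:
        if ref_word is None:
            continue
        tag = next(tags)
        if op in ("sub", "del"):
            counts[tag] += 1
    return counts


ref = normalize("η γάτα κάθεται")
hyp = normalize("η γα\u0301τα τρώει")  # second word spelled with a combining acute accent
pairs = align(ref, hyp)
counts = errors_by_pos(pairs, ["DET", "NOUN", "VERB"])  # hand-written tags in place of a tagger
```

Without the NFC step, the second word would be reported as a noun substitution even though the two spellings are canonically equivalent. This is the kind of spurious error that makes PoS-wise analysis unreliable in non-Latin scripts.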

2605.28433 2026-05-28 cs.CL

Roles with Rails: Contract-Preserving Role Evolution in Multi-Agent Structured Reasoning

Ling-Yue Ge, Lan-Zhe Guo

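AI summary: Proposes SERO, a framework that addresses role drift and contract breakage in multi-agent systems through contract-preserving role evolution. Its components are credit-guided retrieval, a protected terminal aggregator, conditional validator repair, and a contextual-bandit controller. Its effectiveness is validated on real-world reasoning benchmarks.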

Comments 33 pages, 23 figures, 12 tables

Abstract

Role-based LLM multi-agent systems need adaptive role pools, yet adapting such systems is not merely a matter of prompt optimization: roles often carry structural obligations, including capability coverage, message compatibility, validation, final-answer aggregation, and parser-compatible output protocols. Existing systems either fix the role inventory and lose adaptivity, or allow unconstrained generation to induce role drift, removing structurally necessary roles and breaking answer contracts. We formulate this as contract-preserving role evolution, requiring every committed edit to preserve five structural contracts (capability, communication, validation, aggregation, output protocol). We instantiate this formulation in SERO, a Self-Evolving Role Orchestration framework that evolves a typed role-card pool through credit-guided retrieval, a credit-ranked communication DAG with a protected terminal aggregator and conditional validator repair, and a contextual-bandit controller whose LLM-proposed edits are committed only when they preserve the contracts and improve task score. Experiments on real-world reasoning benchmarks across three LLM backbones confirm the value of contract-preserving role evolution.
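The commit rule (an edit is kept only if it preserves the structural contracts and improves the task score) can be sketched as a guard function. This is a hypothetical simplification, not SERO's code:
- The `RoleCard` fields and role kinds are invented for illustration.
- Only the capability, validation, aggregation, and output-protocol contracts are checked; the communication contract and the bandit controller are omitted.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class RoleCard:
    name: str
    capabilities: FrozenSet[str]
    kind: str = "worker"         # hypothetical kinds: "worker", "validator", "aggregator"
    output_format: str = "text"  # format this role emits


def contracts_hold(pool: List[RoleCard], required: FrozenSet[str], answer_format: str) -> bool:
    """Simplified capability, validation, aggregation and output-protocol contracts."""
    covered = frozenset().union(*(r.capabilities for r in pool))
    aggregators = [r for r in pool if r.kind == "aggregator"]
    return (
        required <= covered                              # capability coverage
        and any(r.kind == "validator" for r in pool)     # at least one validator
        and len(aggregators) == 1                        # exactly one terminal aggregator
        and aggregators[0].output_format == answer_format  # parser-compatible final answer
    )


def maybe_commit(pool, proposal, old_score, new_score, required, answer_format):
    """Commit a proposed pool only if contracts still hold and the task score improves."""
    if contracts_hold(proposal, required, answer_format) and new_score > old_score:
        return proposal
    return pool


caps = frozenset({"math", "code"})
pool = [
    RoleCard("solver", frozenset({"math", "code"})),
    RoleCard("checker", frozenset(), kind="validator"),
    RoleCard("final", frozenset(), kind="aggregator", output_format="boxed"),
]
drifted = [r for r in pool if r.kind != "aggregator"]      # an edit that drops the aggregator
extended = pool + [RoleCard("planner", frozenset({"plan"}))]  # an edit that adds a role
```

A drifted pool is rejected even if its score looks better, which is how the rule blocks the role drift described above.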

2605.28428 2026-05-28 cs.CV cs.AI

Anomaly as Non-Conformity via Training-Free Graph Laplacian Energy Minimization

Jungwook Seo, Minjeong Kim, Younkwan Lee, Seungho Shin, Sungyong Baik

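AI summary: Proposes ANoCo, a training-free graph Laplacian energy optimization method that measures anomaly by the magnitude of the update needed to align a query patch with the normal manifold. It requires no learned parameters or sampling and achieves strong image-level AUROC and stable localization maps on standard benchmarks.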

Comments Accepted to CVPR 2026

Abstract

Detecting subtle visual anomalies in images remains challenging, particularly when only normal samples are available a priori. Such unsupervised anomaly detection is typically solved by measuring the feature similarity of a query patch to a memory of normal patches. However, similarity alone does not reveal how strongly a query patch violates the structure of the normal feature manifold. We propose ANoCo, a training-free Laplacian graph energy optimization formulation that scores Anomaly by the cost of Non-Conformity, that is, the cost of aligning a query patch with a fixed normal manifold. For each query patch, we construct a bipartite query-to-normal graph weighted by cosine affinity, explicitly removing query-query and normal-normal edges to prevent evidence dilution. We formulate anomaly scoring as a convex Laplacian energy with anchored normal nodes and solve it in closed form. In particular, we do not use the optimized features themselves; instead, the anomaly score is the magnitude of the update required to satisfy normality constraints, reframing the graph Laplacian as a non-conformity operator rather than a smoothing prior. The proposed method introduces no learnable parameters, message passing, or sampling, and its complexity is comparable to a single linear solve. Across standard benchmarks, it delivers strong image-level AUROC, stable localization maps, and improved robustness over prior methods, demonstrating the effectiveness of optimization-induced feature drift as an anomaly measure.
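For a single query node x connected only to anchored normal nodes n_j with weights w_j, a convex energy of the form λ‖z − x‖² + Σ_j w_j ‖z − n_j‖² has the closed-form minimizer z* = (λx + Σ_j w_j n_j) / (λ + Σ_j w_j). The update size ‖z* − x‖ then serves as the score. The following choices in this minimal sketch are assumptions rather than the paper's exact formulation:
- The fidelity term λ.
- Keeping the top-k neighbours.
- Clipping negative cosine affinities to zero.

```python
import numpy as np


def anomaly_score(query: np.ndarray, normals: np.ndarray, k: int = 5, lam: float = 1.0) -> float:
    """Size of the update that pulls `query` toward its anchored normal neighbours."""
    q = query / np.linalg.norm(query)
    nb = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    cos = nb @ q
    idx = np.argsort(-cos)[:k]        # k most similar normal patches
    w = np.clip(cos[idx], 0.0, None)  # non-negative query-to-normal edge weights
    anchors = normals[idx]            # normal nodes stay fixed (anchored)
    # Closed-form minimiser of lam*||z - x||^2 + sum_j w_j * ||z - n_j||^2.
    z = (lam * query + (w[:, None] * anchors).sum(axis=0)) / (lam + w.sum())
    return float(np.linalg.norm(z - query))


rng = np.random.default_rng(0)
normals = np.array([1.0, 0.0, 0.0]) + 0.05 * rng.standard_normal((50, 3))  # toy normal memory
normal_query = np.array([1.0, 0.02, 0.0])
odd_query = np.array([0.7, 0.7, 0.0])
scores = (anomaly_score(normal_query, normals), anomaly_score(odd_query, normals))
```

Because each query connects only to fixed normal nodes, queries decouple and each solve is a weighted average. This is consistent with the abstract's remark that the cost is comparable to a single linear solve.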

2605.28427 2026-05-28 cs.LG stat.ML

Latent Diffusion for Missing Data

Alberte Heering Estad, Ignacio Peis, Jes Frellsen

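AI summary: Proposes a two-stage framework that first learns latent representations from incomplete data with a robust VAE and then trains a diffusion model in that latent space. It maintains high generation quality at MCAR missing rates up to 50% and outperforms pixel-space diffusion.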

Abstract

Diffusion models have emerged as powerful generative approaches for missing-data imputation, yet most existing methods operate directly in data space and degrade when training data are heavily incomplete. We investigate whether shifting diffusion to a learned latent representation improves robustness under missing-completely-at-random (MCAR) corruption. To this end, we propose a two-stage framework: a robust VAE-based imputer first learns compact semantic features from incomplete observations, and a diffusion model is then trained in the resulting latent space. Across training missing rates, we perform a controlled comparison against pixel-space diffusion models under the same incomplete-data setting. The latent diffusion model maintains high sample quality and remains stable up to 50% missingness, while pixel-space diffusion degrades progressively as missingness increases. For downstream imputation, latent diffusion also achieves consistently better performance than pixel-space diffusion. These findings indicate that latent-space modeling mitigates artifact amplification from zero-imputed inputs and provides a more robust generative prior for incomplete-data learning. Overall, our results support latent diffusion as a strong and practically useful alternative to pixel-space diffusion for missing-data problems.
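The pieces the abstract names can be sketched in a few lines:
- MCAR corruption.
- Zero-imputed inputs.
- A reconstruction loss that ignores the missing entries.
- Diffusion noising applied to latents.

The masked loss and the standard DDPM forward process z_t = √ᾱ z_0 + √(1 − ᾱ) ε are assumptions about how such a pipeline is typically assembled, not the paper's code. The VAE encoder and decoder and the denoiser are omitted.

```python
import numpy as np


def mcar_mask(shape, missing_rate: float, rng: np.random.Generator) -> np.ndarray:
    """True where a value is observed; each entry is dropped independently (MCAR)."""
    return rng.random(shape) >= missing_rate


def masked_mse(recon: np.ndarray, target: np.ndarray, observed: np.ndarray) -> float:
    """Reconstruction error on observed entries only, so zero-filled gaps are never fitted."""
    diff = (recon - target)[observed]
    return float(np.mean(diff ** 2)) if diff.size else 0.0


def q_sample(z0: np.ndarray, alpha_bar: float, noise: np.ndarray) -> np.ndarray:
    """DDPM forward process applied in latent space."""
    return np.sqrt(alpha_bar) * z0 + np.sqrt(1.0 - alpha_bar) * noise


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))
observed = mcar_mask(x.shape, 0.5, rng)
x_in = np.where(observed, x, 0.0)     # zero-imputed input for a (hypothetical) encoder
recon = np.where(observed, x, 123.0)  # wrong only where the data are missing
loss = masked_mse(recon, x, observed)
```

A pixel-space model trained directly on `x_in` would see the zero-filled holes as data. Restricting the reconstruction loss to observed entries is one common way to avoid fitting them before diffusion is trained on the latents.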

2605.28424 2026-05-28 cs.CL

Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning

Jiapeng Zhu, Jianxiang Yu, Yibo Zhao, Chengcheng Han, Qi Gu, Xunliang Cai, Xiang Li, Weining Qian

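AI summary: Proposes Skill0.5, a framework that distinguishes general-skill internalization from task-specific skill utilization and combines them with a dynamic difficulty-aware router, improving both in-distribution and out-of-distribution performance on ALFWorld and WebShop.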

Abstract

Equipping large language models with explicit skills has emerged as a promising paradigm for enabling autonomous agents to solve complex tasks. Agent skills can be inherently divided into general skills for broad cognitive transfer and task-specific skills for dynamic execution. However, existing skill-based reinforcement learning (RL) methods typically force a rigid choice between full externalization, which incurs prohibitive context overhead, and full internalization, which risks overfitting and knowledge conflicts. To address this dilemma, we propose Skill0.5, a novel agentic RL framework that explicitly differentiates skill treatments by combining general skill internalization with task-specific skill utilization. Driven by a dynamic, difficulty-aware router, Skill0.5 streams tasks into distinct mastery tiers to apply tailored optimization strategies: it internalizes general skills via privileged distillation to build a cognitive foundation for hard tasks, while using diagnostic probing on easy tasks to penalize shortcuts and enforce specific skill utilization. Experiments on ALFWorld and WebShop demonstrate that Skill0.5 outperforms both memory-based and skill-based RL baselines, yielding performance improvements across both in-distribution and out-of-distribution scenarios.
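A difficulty-aware router can be approximated by bucketing tasks on their empirical rollout success rate. In this minimal sketch, hard tasks go to the privileged-distillation path and easy tasks to diagnostic probing, as in the abstract. The thresholds, the intermediate "medium" tier, and the task names are hypothetical, and the paper's router may use a different difficulty signal.

```python
from typing import Dict, List, Sequence


def success_rate(rewards: Sequence[float]) -> float:
    """Fraction of rollouts with positive reward."""
    return sum(1 for r in rewards if r > 0) / len(rewards) if rewards else 0.0


def route(rates: Dict[str, float], easy: float = 0.8, hard: float = 0.2) -> Dict[str, List[str]]:
    """Bucket tasks into mastery tiers by success rate (thresholds are assumptions)."""
    tiers: Dict[str, List[str]] = {"hard": [], "medium": [], "easy": []}
    for task, rate in rates.items():
        if rate <= hard:
            tiers["hard"].append(task)    # e.g. privileged distillation of general skills
        elif rate >= easy:
            tiers["easy"].append(task)    # e.g. diagnostic probing to penalise shortcuts
        else:
            tiers["medium"].append(task)  # e.g. ordinary RL updates
    return tiers


rollouts = {
    "heat_mug": [0, 0, 0, 0, 0],
    "find_pen": [1, 1, 1, 1, 1],
    "buy_shoes": [1, 0, 1, 0, 1],
}
tiers = route({task: success_rate(r) for task, r in rollouts.items()})
```

Recomputing the rates after each training round makes the routing dynamic, so a task migrates between tiers as the agent's mastery changes.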

2605.28422 2026-05-28 cs.CV cs.AI

VITAL: Visual-Semantic Dual Supervision for Enhanced and Interpretable Latent Reasoning in Medical MLLMs

Qiaoru Li, Shaotian Liang, Jintao Chen, Haoran Sun, Yuxiang Cai, Jianwei Yin, Yankai Jiang

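AI summary: Proposes VITAL, which achieves interpretable latent reasoning in medical MLLMs through visual-semantic dual supervision. A text decoder reconstructs reasoning chains and a visual projector regresses ROI features. It reaches state-of-the-art results on 7 benchmarks.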

Abstract

Latent reasoning enables reasoning over continuous hidden states rather than explicit tokens, avoiding the language bottleneck and inference overhead of chain-of-thought for medical VQA. However, existing methods suffer from modality collapse, insufficient visual supervision, and train-inference mismatch. Moreover, their opaque latent states offer no interpretability, which is critical in clinical applications. We propose VITAL, a latent-space reasoning framework for medical MLLMs with visual-semantic dual supervision: an auxiliary text decoder reconstructs reasoning chains from latent states, while a visual projector regresses ROI features from a frozen, independent medical vision encoder. Both modules are discarded at inference with zero overhead, yet can be re-attached post-hoc for dual interpretability, providing textual and visual explanations of the reasoning process without sacrificing efficiency. We construct a 61K dataset spanning 9 imaging modalities, exceeding prior medical visual latent reasoning datasets by an order of magnitude. Experiments on 7 benchmarks show that VITAL consistently and substantially outperforms the backbone, all latent reasoning baselines, and medical MLLMs trained on far larger data, achieving state-of-the-art results competitive with trillion-parameter proprietary models.
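The dual supervision can be read as a combined training objective with three terms:
- The answer loss.
- A text-reconstruction loss from the auxiliary decoder.
- A regression loss between the visual projector's output and ROI features from the frozen medical encoder.

The sketch below assumes cross-entropy for the text terms, mean squared error for the ROI term, and unit weights. These choices and all names are hypothetical; the paper may use different losses or weighting.

```python
import numpy as np


def cross_entropy(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean token-level cross-entropy; logits (T, V), integer targets (T,)."""
    shifted = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())


def dual_supervision_loss(answer_logits, answer_ids, recon_logits, chain_ids,
                          projected_roi, frozen_roi, w_text=1.0, w_vis=1.0) -> float:
    """Answer loss + reasoning-chain reconstruction + ROI feature regression."""
    l_answer = cross_entropy(answer_logits, answer_ids)
    l_text = cross_entropy(recon_logits, chain_ids)              # auxiliary text decoder
    l_vis = float(np.mean((projected_roi - frozen_roi) ** 2))  # projector vs frozen encoder
    return l_answer + w_text * l_text + w_vis * l_vis


roi = np.ones((2, 16))
loss = dual_supervision_loss(np.zeros((3, 4)), np.array([0, 1, 2]),
                             np.zeros((5, 4)), np.array([3, 3, 0, 1, 2]),
                             roi, roi.copy())
```

Since both auxiliary heads only feed loss terms, they can be dropped at inference without changing the answer path. This matches the abstract's zero-overhead claim, and the heads could be re-attached later for inspection.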

2605.28421 2026-05-28 cs.AI

DenoiseRL: Bootstrapping Reasoning Models to Recover from Noisy Prefixes

Caijun Xu, Changyi Xiao, Zhongyuan Peng, Yixin Cao

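AI summary: Proposes DenoiseRL, a framework that uses reinforcement learning to learn from the incorrect reasoning of weak models without external supervision or a strong teacher model, improving reasoning performance and training efficiency.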

Comments 17 pages, 6 figures

Abstract

Reinforcement learning has become a central paradigm for advancing reasoning in large language models, yet most existing methods still depend on stronger teacher models or heavily curated difficult datasets, limiting scalable capability improvement. In this paper, we introduce DenoiseRL, a reinforcement learning framework that substitutes external supervision with recovery-oriented optimization over failures from weak models. Instead of relying on stronger supervision or carefully engineered data, DenoiseRL learns directly from incorrect reasoning traces by converting them into opportunities for improvement, making training more scalable and less dependent on external resources. This yields a richer and more diverse learning signal, improving exploration efficiency from imperfect model behavior. As a result, DenoiseRL improves reasoning performance and overall training efficiency while reducing the need for expensive data curation or stronger teacher models. Empirically, DenoiseRL consistently outperforms strong on-policy RL baselines across competitive mathematical and general reasoning benchmarks and promotes stronger self-corrective behavior as training difficulty increases, highlighting an effective and scalable alternative pathway for improving reasoning in large language models.
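The abstract does not spell out how failures become training signal. One reading suggested by the title ("recover from noisy prefixes") is to condition the policy on a truncated incorrect trace and reward continuations that still reach the correct answer. The sketch below shows only that data and reward construction. The cut-point rule and the exact-match reward are assumptions, and the function names are hypothetical.

```python
import random
from typing import List, Tuple


def noisy_prefix_prompt(question: str, wrong_trace: List[str], rng: random.Random) -> Tuple[str, int]:
    """Seed a rollout with a truncated incorrect reasoning trace from a weak model."""
    cut = rng.randint(1, len(wrong_trace))  # keep between 1 and all steps (inclusive)
    prefix = "\n".join(wrong_trace[:cut])
    return f"{question}\n{prefix}\n", cut


def recovery_reward(final_answer: str, gold: str) -> float:
    """Binary verifiable reward: the continuation must still recover the correct answer."""
    return 1.0 if final_answer.strip() == gold.strip() else 0.0


rng = random.Random(0)
trace = ["Let x = 3.", "Then 2x = 5.", "So the answer is 5."]  # deliberately wrong weak-model trace
prompt, cut = noisy_prefix_prompt("What is 2*3?", trace, rng)
```

Rewarding only recovered answers pushes the policy to notice and correct the inherited error rather than continue it, which is the self-corrective behavior the abstract reports.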

2605.28412 2026-05-28 cs.RO cs.LG

Tactile-Proprioceptive Sensor Fusion for Contact Wrench Estimation in Whole-Body Physical Human-Robot Interaction

Junha Min, Junghyeon Ma, Jiwung Kwon, Sunggyu Bae, Joohyung Kim, Kyungseo Park

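AI summary: Proposes a tactile-proprioceptive fusion framework that uses tactile cues from pneumatic skin pads as contact indicators and combines them with motor-current-based proprioception. A temporal convolutional network mitigates friction hysteresis, allowing the framework to reconstruct multi-axis contact forces and improve the sensitivity and responsiveness of physical human-robot interaction.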

Comments 8 pages, 6 figures. Accepted to IEEE International Conference on Robotics and Automation (ICRA) 2026

Abstract

Direct physical guidance is a natural means of teaching and interacting with robots, and robotic skins make a key contribution by enabling sensitive contact sensing and localization. This paper presents a tactile-proprioceptive sensor fusion framework for natural physical human-robot interaction. Tactile cues from pneumatic skin pads serve as contact indicators that bypass the ambiguity between frictional residues and applied external forces, enabling highly sensitive contact detection without explicit friction identification. We fuse these cues with motor-current-based proprioception to reconstruct multi-axis contact forces on the robot surface. To maintain accuracy during motion, we employ a temporal convolutional network (TCN) to mitigate friction hysteresis during stick-slip transitions, reducing uncertainty at contact onset and yielding smooth, responsive guidance. We validate the approach on a skin-integrated robot arm: (i) multi-axis forces are reconstructed in stationary contacts, and (ii) simultaneous force estimation and kinesthetic teaching are demonstrated. Results indicate improved sensitivity and responsiveness across diverse contact conditions compared with tactile-only and proprioceptive-only baselines, supporting tactile-proprioceptive fusion as a reliable pathway to safe, intuitive physical human-robot interaction.
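The fusion idea can be sketched with the standard relation between joint torques and an external wrench at a contact point, τ_ext = Jᵀ F. The tactile pad acts as a gate: without a tactile contact, the torque residual is treated as friction or model error rather than as a human force. The following are assumptions:
- The pressure threshold.
- The least-squares inversion.
- Using a residual computed as measured torque minus model-predicted torque.

The TCN hysteresis compensation from the abstract is not shown.

```python
import numpy as np


def estimate_contact_wrench(jacobian: np.ndarray, tau_residual: np.ndarray,
                            tactile_pressure: float, threshold: float = 0.05) -> np.ndarray:
    """Contact wrench from joint-torque residuals, gated by a tactile contact indicator.

    jacobian: (6, n_joints) contact-point Jacobian.
    tau_residual: (n_joints,) measured minus model-predicted joint torques.
    """
    if tactile_pressure < threshold:
        return np.zeros(6)  # no tactile contact: do not attribute the residual to a human
    # Least-squares solution of J^T F = tau_residual.
    wrench, *_ = np.linalg.lstsq(jacobian.T, tau_residual, rcond=None)
    return wrench


rng = np.random.default_rng(1)
J = rng.standard_normal((6, 7))  # placeholder Jacobian for a 7-joint arm
F_true = np.array([1.0, -2.0, 0.5, 0.1, 0.0, -0.3])
tau = J.T @ F_true                # residual a pure contact would produce
F_touch = estimate_contact_wrench(J, tau, tactile_pressure=0.2)
F_idle = estimate_contact_wrench(J, tau, tactile_pressure=0.0)
```

Gating on the tactile signal is what lets the system use a low detection threshold without explicitly identifying friction: a residual alone is ambiguous, but a residual plus pad pressure is treated as contact.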