arXivDaily: daily arXiv academic digest, updated Monday to Friday
2605.20079 2026-05-20 cs.CV cs.AI cs.LG eess.IV

Probability-Conserving Flow Guidance

Parsa Esmati, Junha Hyung, Amirhossein Dadashzadeh, Jaegul Choo, Majid Mirmehdi

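AI summary: This paper proposes AdaMaG, a probability-conserving flow guidance method. By analysing the continuity equation, it decomposes the effect of guidance into a divergence term and a score-parallel term, and controls both through a time-dependent schedule and score-parallel attenuation, improving generation quality and reducing hallucinations at no additional inference cost.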

English abstract

Diffusion and flow-based generative models dominate visual synthesis, with guidance aligning samples to user input and improving perceptual quality. However, Classifier-Free Guidance (CFG) and extrapolation-based methods are heuristic linear combinations of velocities/scores that ignore the generative manifold geometry, breaking probability conservation and driving samples off the learned manifold under strong guidance. We analyse guidance through the continuity equation and show its effect decomposes into a divergence term and a score-parallel term defined invariantly across parameterisations. We prove the divergence term blows up structurally as sampling approaches the data manifold, motivating a time-dependent schedule alongside score-parallel attenuation. The resulting plug-and-play rule, Adaptive Manifold Guidance (AdaMaG), bounds both terms at no additional inference cost. Finally, we show that most empirical heuristics for reducing saturation or improving generation quality correspond directly to the two terms in our decomposition. Across image generation benchmarks, AdaMaG improves realism, reduces hallucinations, and induces controlled desaturation in high-guidance regimes.
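
The "heuristic linear combination" that the abstract attributes to CFG can be written down directly, and the idea of treating the score-parallel part of the guidance increment separately can be sketched next to it. The following is a minimal illustration only: `attenuated_guidance`, `parallel_scale`, and the linear schedule are hypothetical names and choices made for this sketch, and they do not reproduce the AdaMaG rule from the paper.

```python
def cfg(v_uncond, v_cond, w):
    """Standard CFG: extrapolate from the unconditional to the conditional velocity."""
    return [u + w * (c - u) for u, c in zip(v_uncond, v_cond)]


def split_parallel(delta, score):
    """Split delta into its projection onto score and the orthogonal remainder."""
    norm_sq = sum(s * s for s in score)
    if norm_sq == 0.0:
        return [0.0] * len(delta), list(delta)
    coef = sum(d * s for d, s in zip(delta, score)) / norm_sq
    parallel = [coef * s for s in score]
    orthogonal = [d - p for d, p in zip(delta, parallel)]
    return parallel, orthogonal


def attenuated_guidance(v_uncond, v_cond, score, w, t, parallel_scale=0.5):
    """Hypothetical rule: damp the score-parallel part and fade guidance as t -> 0."""
    delta = [c - u for u, c in zip(v_uncond, v_cond)]
    parallel, orthogonal = split_parallel(delta, score)
    w_t = 1.0 + (w - 1.0) * t  # assumed linear schedule, not the paper's
    return [
        c + (w_t - 1.0) * (parallel_scale * p + o)
        for c, p, o in zip(v_cond, parallel, orthogonal)
    ]
```

With `parallel_scale=1.0` and `t=1.0` the sketch reduces to plain CFG, which makes the effect of each of the two knobs easy to isolate.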

2605.20075 2026-05-20 cs.CL cs.AI

CopT: Contrastive On-Policy Thinking with Continuous Spaces for General and Agentic Reasoning

Dachuan Shi, Hanlin Zhu, Xiangchi Yuan, Wanjia Zhao, Kejing Xia, Wen Xiao, Wenke Lee

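AI summary: This paper proposes CopT, a reformulated reasoning pipeline that reverses the usual order of thinking and answering: it first generates a draft answer, then performs on-policy thinking conditioned on that answer for reflection and correction. CopT uses continuous embeddings as inference-time contrastive verifiers, contrasting the model's support for the same generated tokens under discrete-token and continuous-embedding inputs to obtain a sequence-level reverse KL estimator of answer reliability. Across mathematics, coding, and agentic reasoning tasks, CopT improves peak accuracy by up to 23% and reduces token usage by up to 57% at comparable or higher accuracy.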

Comments Code: https://github.com/sdc17/CopT, Website: https://copt-web.github.io/

English abstract

Chain-of-thought (CoT) is a standard approach for eliciting reasoning capabilities from large language models (LLMs). However, the common CoT paradigm treats thinking as a prerequisite for answering, which can delay access to plausible answers and incur unnecessary token costs even when the model is able to identify an answer before extended thinking, a behavior known as performative reasoning. In this paper, we introduce CopT, a reformulated reasoning pipeline that reverses the usual order of thinking and answering. Instead of thinking before answering, CopT first elicits a draft answer and then invokes subsequent on-policy thinking conditioned on its own draft answer for reflection and correction. To assess whether the draft answer should be trusted, CopT recasts continuous embeddings as inference-time contrastive verifiers. Specifically, it contrasts the model's support for the same generated tokens under discrete-token inputs and continuous-embedding inputs, yielding a sequence-level reverse KL estimator for answer reliability. Our analysis shows that under certain assumptions, the expected estimate equals the mutual information between the unresolved latent state and the emitted answer token, explaining why it captures answer-relevant uncertainty rather than arbitrary uncertainty in the latent state. When the answer is deemed insufficiently reliable, CopT performs further on-policy thinking, where a second KL estimator dynamically controls draft-answer visibility, preserving useful partial information while reducing the risk of being misled by unreliable content. Across mathematics, coding, and agentic reasoning tasks, CopT improves peak accuracy by up to 23% and reduces token usage by up to 57% at comparable or higher accuracy, without any additional training. The code is available at https://github.com/sdc17/CopT.
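
The contrast described above, scoring the same generated tokens under two kinds of input, can be sketched as a sum of per-token log-probability differences. This is a rough illustration under stated assumptions: the direction of the contrast (discrete minus continuous), the summation over tokens, and the `threshold` value are choices made for this sketch, not details confirmed by the abstract.

```python
def reverse_kl_estimate(logp_discrete, logp_continuous):
    """Sum of per-token log-ratios for the same generated answer tokens.

    logp_discrete[i]   : log-probability of token i under discrete-token inputs
    logp_continuous[i] : log-probability of token i under continuous-embedding inputs
    """
    if not logp_discrete or len(logp_discrete) != len(logp_continuous):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(a - b for a, b in zip(logp_discrete, logp_continuous))


def should_keep_thinking(estimate, threshold=0.5):
    """Hypothetical gate: a large gap between the two views triggers more thinking."""
    return estimate > threshold
```

A draft answer whose tokens are supported equally under both input types yields an estimate near zero and would be accepted without further thinking in this sketch.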

2605.20074 2026-05-20 cs.LG

Towards Distillation Guarantees under Algorithmic Alignment for Combinatorial Optimization

Thien Le, Melanie Weber

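AI summary: This paper studies distillation, which transfers knowledge from a large model to a more efficient model for deployment, within the algorithmic alignment framework. It analyses the conditions under which distillation succeeds when the target model is a graph neural network whose architecture is aligned with a dynamic programming algorithm.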

Comments 22 pages

English abstract

Distillation transfers knowledge from a large model trained on broad data to a smaller, more efficient model suitable for deployment. In structured prediction settings, prior knowledge about the task can guide the choice of a target architecture that is algorithmically aligned with the underlying problem. Building on recent learning-theoretic analyses of decision-tree (DT) distillation (Boix-Adsera, 2024), we study when distillation succeeds for combinatorial optimization tasks. We focus on the case where the target model is a graph neural network whose architecture is aligned with a dynamic programming (DP) algorithm for the task. Assuming that the source model is sufficiently rich, formalized through the linear representation hypothesis (LRH) (Elhage et al., 2022; Park et al., 2024), we show that the distillation problem can be solved efficiently in the complexity parameters of the DP transition function, represented as a DT. Our results provide a rigorous sufficient condition for successful distillation in the flavour of algorithmic alignment.

2605.20073 2026-05-20 cs.CV

X-Ray cardiac angiographic vessel segmentation based on pixel classification using machine learning and region growing

E O Rodrigues, L O Rodrigues, J J Lima, D Casanova, F Favarim, E R Dosciatti, V Pegorini, L S N Oliveira, F F C Morais

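AI summary: This paper proposes a pixel-classification method for vessel segmentation in X-ray angiograms that combines textural features with region growing and uses a Random Forests classifier, reaching 95.48% accuracy.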

Journal ref
Biomedical Physics & Engineering Express 2021

English abstract

This work proposes a pixel-classification approach for vessel segmentation in x-ray angiograms. The proposal uses textural features such as anisotropic diffusion, features based on the Hessian matrix, mathematical morphology and statistics. These features are extracted from the neighborhood of each pixel. The approach also uses the ELEMENT methodology, which consists of creating a pixel-classification controlled by region-growing where the result of the classification affects further classifications of pixels. The Random Forests classifier is used to predict whether the pixel belongs to the vessel structure. The approach achieved the best accuracy in the literature (95.48%) outperforming unsupervised state-of-the-art approaches.
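
The idea of a pixel classification controlled by region growing can be sketched as a breadth-first traversal in which only pixels adjacent to already accepted pixels are ever classified. This is a simplified sketch: `bright_pixel` is a hypothetical threshold standing in for the Random Forests classifier, and the 4-connected neighbourhood is an assumption.

```python
from collections import deque


def region_grow(image, seed, is_vessel):
    """Grow a region from seed, classifying only pixels that border accepted ones."""
    rows, cols = len(image), len(image[0])
    accepted = set()
    visited = {seed}
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        # The classifier sees the pixels accepted so far, so earlier
        # decisions can influence later ones.
        if not is_vessel(image, r, c, accepted):
            continue
        accepted.add((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in visited:
                visited.add((nr, nc))
                queue.append((nr, nc))
    return accepted


def bright_pixel(image, r, c, accepted):
    """Stand-in classifier (assumption): a fixed intensity threshold."""
    return image[r][c] >= 128
```

Because growth stops at rejected pixels, a bright pixel that is not connected to the seed is never labelled as vessel, however it would be classified in isolation.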

2605.20072 2026-05-20 cs.AI cs.RO

Probing Embodied LLMs: When Higher Observation Fidelity Hurts Problem Solving

Oussama Zenkri, Oliver Brock

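AI summary: This paper studies how embodied LLM agents behave under different observation inputs and finds that higher-fidelity observations can reduce problem-solving performance. The core method varies the information available to the agent and measures the resulting changes in behavior; the main contribution is to reveal the interaction between perceptual errors and reasoning failures.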

Comments Submitted to From Animals to Animats: The 18th International Conference on the Simulation of Adaptive Behavior (SAB)

English abstract

Large Language Models are increasingly proposed as cognitive components for robotic systems, yet their opaque decision processes make it difficult to explain success or failure in closed-loop embodied tasks. Following an empirical AI methodology, we study embodied LLM agents behaviorally by varying the information available to the agent and measuring the resulting changes in behavior. Using the Lockbox, a sequential mechanical puzzle with hidden interdependencies, we evaluate LLMs across RGB, RGB-D, and ground-truth symbolic observations in a physical robotic setup and use controlled simulation to probe the resulting behavior. Counterintuitively, agents perform best under raw RGB input and worst under perfect ground-truth observations. In simulation, we probe this effect by randomly flipping perceived action outcomes and find that moderate noise improves performance, peaking at a 40% flip probability with a 2.85-fold success rate increase over the noise-free baseline. Further analysis links this gain to a reduction in repetitive action loops. These findings suggest that success rates alone are insufficient for evaluating LLMs, as measured performance may reflect the interaction between perceptual errors and reasoning failures rather than robust problem solving.
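
The simulation probe described above, randomly flipping perceived action outcomes, amounts to a one-line noise model. A minimal sketch, assuming outcomes are booleans (success or failure) and that each flip is drawn independently; the function name `perceive` is hypothetical:

```python
import random


def perceive(outcome, flip_prob, rng):
    """Return the true action outcome, inverted with probability flip_prob."""
    if not 0.0 <= flip_prob <= 1.0:
        raise ValueError("flip_prob must be in [0, 1]")
    return (not outcome) if rng.random() < flip_prob else outcome
```

Passing an explicit `random.Random` instance keeps runs reproducible, which matters when comparing success rates across flip probabilities such as the 40% setting mentioned in the abstract.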

2605.20066 2026-05-20 cs.CL

Text-to-SPARQL Generation with Reinforcement Learning: A GRPO-based Approach on DBLP

Jann Pfeifer, Debayan Banerjee, Ricardo Usbeck

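AI summary: This paper studies reinforcement-learning-based zero-shot Text-to-SPARQL generation in the scholarly domain, training a small instruction-tuned language model with GRPO on DBLP-QuAD and comparing it with a supervised DoRA-finetuned baseline.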

Comments Accepted by NeSy 2026

English abstract

Knowledge graph question answering seeks to translate natural language questions into executable queries over knowledge graphs, but existing approaches often rely on large models or full supervision in the form of gold query annotations. This study examines whether reinforcement learning with outcome-based rewards can train a small instruction-tuned language model to perform zero-shot Text-to-SPARQL generation in the scholarly domain. Group-Relative Policy Optimization (GRPO) is applied to the Qwen3-1.7B model on DBLP-QuAD, using prompts that combine natural language questions with symbolic hints about entities and relations. Training relies on execution feedback, structural constraints, and answer-level rewards, with an additional variant that incorporates gold-query-based shaping. The resulting models are compared to the unmodified zero-shot baseline and to a supervised DoRA-finetuned baseline across answer-level accuracy, execution accuracy, category-wise scores, and generalization to held-out templates. GRPO substantially improves over the zero-shot baseline and exhibits competitive generalization, while supervised DoRA finetuning achieves higher overall accuracy on the same model scale. Ablation analyses indicate that execution-based rewards account for most gains, with additional shaping yielding limited additional benefit, suggesting that outcome-based reinforcement learning is a viable training strategy when gold queries are unavailable for token-level supervision.

2605.20064 2026-05-20 cs.CV

Cardiac fat segmentation using computed tomography and an image-to-image conditional generative adversarial neural network

Guilherme Santos da Silva, Dalcimar Casanova, Jefferson Tales Oliva, Erick Oliveira Rodrigues

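AI summary: This study proposes a new deep learning method that uses the pix2pix network to segment and quantify cardiac fat automatically, achieving high-accuracy segmentation of epicardial and mediastinal fat and outperforming existing methods in f1-score and run time.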

Journal ref
Medical Engineering & Physics 2024

English abstract

In recent years, research has highlighted the association between increased adipose tissue surrounding the human heart and elevated susceptibility to cardiovascular diseases such as atrial fibrillation and coronary heart disease. However, the manual segmentation of these fat deposits has not been widely implemented in clinical practice due to the substantial workload it entails for medical professionals and the associated costs. Consequently, the demand for more precise and time-efficient quantitative analysis has driven the emergence of novel computational methods for fat segmentation. This study presents a novel deep learning-based methodology that offers autonomous segmentation and quantification of two distinct types of cardiac fat deposits. The proposed approach leverages the pix2pix network, a conditional generative adversarial network primarily designed for image-to-image translation tasks. By applying this network architecture, we aim to investigate its efficacy in tackling the specific challenge of cardiac fat segmentation, despite not being originally tailored for this purpose. The two types of fat deposits of interest in this study are referred to as epicardial and mediastinal fats, which are spatially separated by the pericardium. The experimental results demonstrated an average accuracy of 99.08% and an f1-score of 98.73 for the segmentation of the epicardial fat, and an accuracy of 97.90% and an f1-score of 98.40 for the mediastinal fat. These findings reflect the high precision and overlap agreement achieved by the proposed methodology. In comparison to existing studies, our approach exhibited superior performance in terms of f1-score and run time, enabling the images to be segmented in real time.
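
For reference, the two reported metrics can be computed per pixel from a predicted mask and a ground-truth mask. A minimal sketch, assuming both masks are flattened into equal-length sequences of 0/1 labels; the function name is hypothetical:

```python
def accuracy_and_f1(pred, truth):
    """Pixel-wise accuracy and F1 for two equal-length binary label sequences."""
    if not pred or len(pred) != len(truth):
        raise ValueError("need two non-empty sequences of equal length")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    accuracy = (tp + tn) / len(pred)
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return accuracy, f1
```

Accuracy counts true negatives and so tends to be high when the fat region is small relative to the image, whereas F1 ignores them and tracks overlap with the annotated region.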

2605.20061 2026-05-20 cs.CL

Rewarding Beliefs, Not Actions: Consistency-Guided Credit Assignment for Long-Horizon Agents

Wenjie Tang, Minne Li, Sijie Huang, Liquan Xiao, Yuan Zhou

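AI summary: This paper proposes ReBel, an algorithm that models structured belief states to guide policy learning, addressing the credit assignment problem caused by partial observability in long-horizon tasks. Experiments on benchmarks including ALFWorld and WebShop show higher task success and better sample efficiency.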

Comments 10 pages, 4 figures, 3 tables, plus appendix

English abstract

Reinforcement learning from verifiable rewards (RLVR) is a promising paradigm for improving large language model (LLM) agents on long-horizon interactive tasks. However, in partially observable environments, incomplete observations cause agent beliefs to drift over time, while delayed rewards obscure the causal impact of intermediate decisions, exacerbating temporal credit assignment challenges. To address this, we propose ReBel (Reward Belief), a process-level reinforcement learning algorithm that explicitly models structured belief states to summarize interaction history and guide subsequent policy learning. ReBel introduces belief-consistency supervision, converting discrepancies between predicted beliefs and observed feedback into dense self-supervised signals without requiring external step-wise annotations or verifiers. It also employs belief-aware grouping to compare trajectories under similar belief states, yielding more robust and lower-variance advantage estimates. We evaluate ReBel on challenging long-horizon benchmarks, including ALFWorld and WebShop. ReBel improves task success by up to $20.4$ percentage points over the episode-level baseline GRPO and increases sample efficiency by $2.1\times$. These results suggest that belief-aware self-supervision is a promising direction for reliable long-horizon decision-making under partial observability. Code is available at: https://github.com/Fateyetian/Rebel.git.

2605.20050 2026-05-20 cs.CL

Language Mutations Sustain the Persistences of Conspiracy Theories on Social Media

Calvin Yixiang Cheng, Dorian Quelle, Scott A. Hale

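AI summary: This study investigates how language mutations affect the persistent diffusion of conspiracy theories on social media. Analysing three years of conspiracy-related posts from X, it finds that conspiracy claims with greater semantic mutation have longer lifespans, and that mutations in psycholinguistic properties are associated with extended lifespans.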

English abstract

This study investigates how language mutations affect the persistent diffusion of conspiracy theories on social media. Drawing on a three-year dataset of conspiracy-related posts from X, and applying computational linguistic analysis alongside survival modelling, we find that conspiracy claims with greater semantic mutations have substantially longer lifespans. Mutations in psycholinguistic properties, including pronouns, social reference words, cognitive process terms, and risk- and health-related vocabulary, are associated with extended lifespans. Mutations in actor, action and target (AAT) categories are associated with longer lifespans as well. Qualitative analysis identifies two predominant mutation patterns: simplification and assimilation, at both linguistic and AAT structural levels. Taken together, the results advance our understanding of how language mutations contribute to conspiracy persistence online and shed light on longitudinal content moderation strategies. We argue that content moderation should consider the mutability of conspiracy claims and focus on the core claims that can address their potential variations.

2605.20044 2026-05-20 cs.CV

OP2GS: Object-Aware 3D Gaussian Splatting with Dual-Opacity Primitives

Guiyu Liu, Niklas Vaara, Janne Mustaniemi, Juho Kannala, Janne Heikkilä

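AI summary: OP2GS introduces a dual-opacity mechanism that augments each primitive with an explicit instance identity and a dedicated instance opacity σ*, addressing the lack of object-level identity in 3D Gaussian Splatting in support of open-vocabulary scene understanding.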

Comments Under review

English abstract

3D Gaussian Splatting (3DGS) provides an explicit and efficient scene representation, but its primitives lack inherent object-level identity, hindering downstream tasks such as open-vocabulary scene understanding. Existing methods typically address this by either distilling high-dimensional feature embeddings into Gaussians or by lifting 2D mask labels into 3D via heuristic refinement. However, feature-based approaches incur heavy storage and decoding overhead, while lifting-based pipelines remain vulnerable to label contamination: Gaussians necessary for appearance reconstruction often receive incorrect object labels during 2D-to-3D projection. We propose OP2GS, an object-aware Gaussian representation that augments each primitive with an explicit instance identity and a dedicated instance opacity $\sigma^{*}$ for object-mask rendering. The original opacity $\sigma$ remains responsible for visual reconstruction, while $\sigma^{*}$ models whether a Gaussian should contribute to a particular object mask. This dual-opacity formulation decouples visual existence from instance occupancy: mislabeled Gaussians can remain available for image rendering while becoming transparent in the object-mask branch. To learn this representation, we introduce a random object loss that optimizes the 1D instance occupancy field using the standard transmittance-based visibility of 3DGS. Semantic descriptors are then attached at the object level through multi-view aggregation, eliminating per-Gaussian feature storage. Compared with feature-training approaches, OP2GS achieves competitive open-vocabulary performance while significantly reducing computational overhead. Compared with training-free pipelines, it leverages physically consistent occupancy learning to resolve visibility ambiguities.
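
The decoupling of visual existence from instance occupancy can be illustrated with front-to-back compositing along a single ray. This is a toy sketch under stated assumptions: each sample carries a scalar grey value, and transmittance in both branches is driven by the original opacity; whether the paper's mask branch computes visibility exactly this way is not stated in the abstract.

```python
def composite(samples, target_id):
    """Composite one ray front to back.

    samples: list of (alpha, alpha_star, instance_id, gray) tuples, where
    alpha is the visual opacity and alpha_star the instance opacity.
    Returns (rendered gray value, mask value for target_id).
    """
    transmittance = 1.0
    color = 0.0
    mask = 0.0
    for alpha, alpha_star, instance_id, gray in samples:
        color += transmittance * alpha * gray
        if instance_id == target_id:
            mask += transmittance * alpha_star
        # Assumption: visibility is attenuated by the visual opacity only.
        transmittance *= 1.0 - alpha
    return color, mask
```

A mislabeled Gaussian with `alpha_star = 0.0` still contributes to the rendered colour but leaves the mask of its assigned object untouched.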

2605.20040 2026-05-20 cs.LG

Active Context Selection Improves Simple Regret in Contextual Bandits

Mohammad Shahverdikondori, Jalal Etesami, Negar Kiyavash

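AI summary: This paper studies contextual multi-armed bandits with a finite context space. By actively choosing which contexts to sample, it improves simple regret, and it proposes an algorithm that is effective whether the context distribution is known or unknown.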

English abstract

We study the contextual multi-armed bandit problem with a finite context space (a.k.a. subpopulations), where the learner recommends a best action for each context and is evaluated by context-weighted simple regret. Our guarantees are worst-case over the reward distributions, while remaining instance-dependent with respect to the context distribution vector $p$. Akin to experimental design problems where the population of interest is fixed but the sampled subpopulation can be controlled, we allow the learner to actively choose which context to sample from. For a known $p$, we characterize tight regret rates: passive sampling where contexts are randomly revealed achieves regret of order $\sqrt{n/T \, \lVert p \rVert_{1/2}}$, whereas active sampling with allocation $q_j \propto p_j^{2/3}$ achieves the tight rate $\sqrt{n/T} \, \lVert p \rVert_{2/3}$. The resulting improvement can be as large as $\Theta(k^{1/4})$, where $k$ is the number of contexts. We further extend the analysis to budgeted active sampling, characterize the corresponding tight rate, and identify when a limited active budget suffices to recover the fully active rate. When $p$ is unknown, we propose the Explore-Explore-Then-Commit (EETC) algorithm, which optimally balances estimating the context distribution and the time to switch to active allocation, such that for large horizons, it matches the known-$p$ active rate up to constants. Experiments on synthetic and real-world data support our theoretical findings.
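
The two rates can be compared numerically. The sketch below drops all constants and assumes a simple per-context error model of $\sqrt{n/(T q_j)}$ when context $j$ receives a fraction $q_j$ of the $T$ samples; that model, and reading $n$ as the number of actions, are assumptions made for illustration. Under it, sampling with $q = p$ recovers the passive expression and $q_j \propto p_j^{2/3}$ recovers the active one.

```python
def lp_quasi_norm(p, r):
    """(sum_j p_j^r)^(1/r); a quasi-norm when 0 < r < 1."""
    return sum(x ** r for x in p) ** (1.0 / r)


def passive_rate(p, n, T):
    """sqrt(n/T * ||p||_{1/2}), constants omitted."""
    return (n / T * lp_quasi_norm(p, 0.5)) ** 0.5


def active_allocation(p):
    """Sampling proportions q_j proportional to p_j^(2/3)."""
    weights = [x ** (2.0 / 3.0) for x in p]
    total = sum(weights)
    return [w / total for w in weights]


def active_rate(p, n, T):
    """sqrt(n/T) * ||p||_{2/3}, constants omitted."""
    return (n / T) ** 0.5 * lp_quasi_norm(p, 2.0 / 3.0)


def weighted_regret(p, q, n, T):
    """Assumed model: context j contributes p_j * sqrt(n / (T * q_j))."""
    return sum(pj * (n / (T * qj)) ** 0.5 for pj, qj in zip(p, q))
```

For a uniform `p` the two rates coincide, so the gap between them comes entirely from skew in the context distribution.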

2605.20037 2026-05-20 cs.LG cs.AI

When Critics Disagree: Adaptive Reward Poisoning Attacks in RIS-Aided Wireless Control System

Deemah H. Tashman, Soumaya Cherkaoui

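AI summary: This paper proposes Disagreement-Guided Reward Poisoning (DGRP), an attack on Soft Actor-Critic (SAC) agents, to assess the robustness of deep reinforcement learning (DRL) in RIS-assisted networks.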

English abstract

Reward-poisoning attacks present a significant risk to learning-based wireless control systems. Given this, we propose a Disagreement-Guided Reward Poisoning (DGRP) adaptive attack on a Soft Actor-Critic (SAC) agent. In a Cognitive Radio Network (CRN) environment assisted by Reconfigurable Intelligent Surfaces (RIS), the SAC agent is tasked with maximizing the long-term secondary users' (SUs) rate by simultaneously optimizing the transmission power of the SU transmitter and the RIS phase shifts. DGRP corrupts rewards, particularly when the SAC dual critics exhibit substantial disagreement (especially in high-leverage, high-uncertainty states), resulting in distorted value estimations and guiding the policy towards suboptimal actions. Our findings demonstrate that DGRP substantially diminishes the performance improvements typically provided by RIS and degrades transmission quality. We further investigate key attack parameters and determine their impact on learning. In comparison to periodic-timing and exploration-triggered baselines, DGRP consistently causes greater damage, highlighting the necessity of considering disagreement-aware threats when evaluating the robustness of Deep Reinforcement Learning (DRL) in RIS-assisted networks.
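
The trigger condition, corrupting the reward only when the two critics disagree, can be sketched in a few lines for robustness evaluation. The disagreement measure (absolute difference of the two Q-values), the `threshold`, and the sign-flipping corruption are all hypothetical choices for this sketch; the abstract does not specify them.

```python
def poison_reward(reward, q1, q2, threshold, scale=-1.0):
    """Return a corrupted reward when the twin critics disagree by more than threshold."""
    disagreement = abs(q1 - q2)
    if disagreement > threshold:
        return scale * reward  # assumed corruption: rescale (here, flip) the reward
    return reward
```

Compared with a periodic trigger, this condition concentrates the attack on transitions where the value estimate is least settled.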

2605.20035 2026-05-20 cs.CV

Stage-adaptive Token Selection for Efficient Omni-modal LLMs

Zijie Xin, Jie Yang, Ruixiang Zhao, Tianyi Wang, Fengyun Rao, Jing Lyu, Xirong Li

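AI summary: This paper proposes SEATS, a stage-adaptive token selection method that improves the inference efficiency of omni-modal large language models, achieving a 9.3x FLOPs reduction and a 4.8x prefill speedup while preserving 96.3% of the original performance.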

Comments Code Link: https://github.com/xxayt/SEATS

English abstract

Omni-modal large language models (om-LLMs) achieve unified audio-visual understanding by encoding video and audio into temporally aligned token sequences interleaved at the window level. However, processing these dense non-textual tokens throughout the LLM incurs substantial computational overhead. Although training-free token selection can reduce this cost, existing methods either focus on visual-only inputs or prune om-LLM tokens only before the LLM with fixed per-modality ratios, failing to capture how cross-modal token importance evolves across layers. To address this limitation, we first analyze the layer-wise token dependency of om-LLMs. We find that visual and audio dependencies follow a block-wise pattern and gradually weaken with depth, indicating that many late-layer non-textual tokens become redundant after cross-modal fusion. Motivated by this observation, we propose SEATS, a training-free, stage-adaptive token selection method for efficient om-LLM inference. Before the LLM, SEATS removes spatiotemporal redundancy via attention-weighted diversity selection. Inside the LLM, it progressively prunes tokens across blocks and dynamically allocates the retention budget from temporal windows to modalities using query relevance scores. In late layers, it removes all remaining non-textual tokens once cross-modal fusion is complete. Experiments on Qwen2.5-Omni and Qwen3-Omni demonstrate that SEATS effectively improves inference efficiency. Retaining only 10% of visual and audio tokens, it achieves a 9.3x FLOPs reduction and a 4.8x prefill speedup while preserving 96.3% of the original performance.
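
The budgeted retention step, keeping a fixed share of non-textual tokens ranked by relevance, can be sketched as a top-k selection that preserves temporal order. This covers only that single step; the scoring function, the per-window budget allocation, and the stage schedule of SEATS are not reproduced, and `select_tokens` is a hypothetical name.

```python
def select_tokens(scores, keep_percent):
    """Return the indices of the highest-scoring tokens, in original order.

    keep_percent is an integer percentage; at least one token is kept
    whenever the input is non-empty.
    """
    if not scores:
        return []
    k = max(1, (len(scores) * keep_percent + 99) // 100)  # integer ceiling
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

Using an integer percentage avoids floating-point surprises when computing the budget, and returning sorted indices keeps the retained tokens temporally aligned.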

2605.20033 2026-05-20 cs.CV cs.GT

A Nash Equilibrium Framework For Training-Free Multimodal Step Verification

Rohit Sinha, Kunal Tilaganji, Tanuja Ganu, Nagarajan Natarajan, Amit Sharma, Vineeth N. Balasubramanian

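AI summary: This paper proposes a training-free multimodal step verification method that treats step verification as a coordination problem among specialized judges and formalizes their interaction as a Nash equilibrium game. Equilibrium scores are computed through a closed-form solution, enabling disagreement-aware filtering and stability-conscious ranking; experiments show that cross-modal agreement, rather than average confidence, provides a robust verification signal.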

Comments ICLR 2026 Workshop VerifAI-2

English abstract

Multimodal large language models often generate reasoning chains containing subtle errors that lead to incorrect answers. Current verification approaches have notable limitations. Learned critics need extensive labeled data and show inconsistent performance across different tasks. Meanwhile, existing training-free methods simply average scores from different sources, missing a key insight: when these scores disagree, that disagreement itself carries important information about whether a reasoning step is truly valid or not. We propose a training-free verification approach that treats step-wise verification as a coordination problem among specialized judges. We formalize these judges' interaction as a Nash equilibrium game where agreement signals valid steps while disagreement reveals instability. Our method computes equilibrium scores through a closed-form solution, enabling both disagreement-aware filtering and stability-conscious ranking of reasoning steps. Evaluated across six benchmarks, our approach achieves consistent improvements of 2.4% to 5.2% over baseline models and shows competitive performance against learned critics, demonstrating that cross-modal agreement (not just average confidence) provides robust verification signals without task-specific adaptation.

2605.20032 2026-05-20 cs.LG cs.MM

CAMERA: Adapting to Semantic Camouflage in Unsupervised Text-Attributed Graph Fraud Detection

Junjun Pan, Yixin Liu, Yu Zheng, Lianhua Chi, Alan Wee-Chung Liew, Shirui Pan

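AI summary: This paper proposes CAMERA, a framework that addresses semantic camouflage with a case-adaptive multi-cue expert model, using graph structure and text attributes for unsupervised fraud detection and improving the identification of camouflaged fraudsters.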

Comments Accepted by IJCAI 2026

English abstract

Text-attributed graph fraud detection (TAGFD) plays a critical role in preventing fraudulent activities on online social and e-commerce platforms. However, to evade detection, fraudsters continuously evolve their camouflaging strategies by deliberately mimicking textual responses of benign users, thereby concealing their malicious purposes. This phenomenon, referred to as semantic camouflage, fundamentally undermines commonly relied-upon assumptions about how structural and attribute cues can be exploited to identify fraudsters, and makes it difficult to spot fraudsters with unsupervised TAGFD. To bridge this gap, we propose a Case-Adaptive Multi-cue Expert fRAmework (CAMERA) for unsupervised TAGFD. CAMERA employs an ego-decoupled mixture-of-experts architecture, where each expert specializes in modeling a distinct type of fraud-indicative cue. A context-informed gating model is introduced to jointly consider the ego node representation and its local neighborhood context for adaptive integration of cues learned by different experts. Furthermore, CAMERA leverages the inherent rarity of fraudsters to support unsupervised one-class learning with expert-level objectives that encourage modeling dominant benign patterns, thereby enabling reliable unsupervised detection of camouflaged fraudsters. Experiments on 4 challenging datasets show that CAMERA consistently outperforms competitors, showing its effectiveness against semantically camouflaged fraudsters. Code is available at https://github.com/CampanulaBells/CAMERA.

2605.20028 2026-05-20 cs.LG physics.ao-ph

Training-Free Bayesian Filtering with Generative Emulators

无需训练的贝叶斯滤波与生成式模拟器

Thomas Savary, François Rozet, Gilles Louppe

AI总结 本文提出一种无需额外训练的最优粒子滤波变种,利用基于扩散的动力学模拟器,解决了高维环境下粒子滤波的可扩展性问题,通过非线性混沌系统实验验证了其有效性。

Comments Accepted as a spotlight paper at the International Conference on Machine Learning 2026

详情
AI中文摘要

贝叶斯滤波是一个众所周知的问题,旨在根据观测估计动态系统的合理状态。在现有解决方案中,粒子滤波对于非线性动态和观测在理论上是精确的,但在高维情况下可扩展性差。本文表明,基于扩散的动力系统模拟器无需额外训练即可用于实现粒子滤波的一种最优变体,而这种变体由于在经典数值求解器上的实现困难,长期以来基本未被探索。在非线性混沌系统(包括大气动力学)上的实验表明,所提出的方法成功地将粒子滤波扩展到高维场景。

英文摘要

Bayesian filtering is a well-known problem that aims to estimate plausible states of a dynamical system from observations. Among existing approaches to solve this problem, particle filters are theoretically exact for non-linear dynamics and observations, but suffer from poor scalability in high dimensions. In this work, we show that diffusion-based emulators of dynamical systems can be used to implement, without additional training, an optimal variant of particle filters that has remained largely unexplored due to implementation challenges with classical numerical solvers. Experiments on nonlinear chaotic systems, including atmospheric dynamics, demonstrate that the proposed approach successfully scales particle filtering to high-dimensional settings.
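For readers unfamiliar with particle filters, the sketch below shows one predict-weight-resample step of the standard bootstrap variant on a 1-D toy system. It is background only: the paper targets an optimal-proposal variant driven by a diffusion emulator, which is not reproduced here, and the dynamics, noise levels, and function names are assumptions.

```python
import math
import random

def bootstrap_filter_step(particles, observation, transition, obs_std, rng):
    """One predict-weight-resample step of a bootstrap particle filter."""
    # Predict: push each particle through the stochastic dynamics.
    predicted = [transition(x, rng) for x in particles]
    # Weight: Gaussian log-likelihood of the observation given each particle.
    log_w = [-0.5 * ((observation - x) / obs_std) ** 2 for x in predicted]
    m = max(log_w)  # subtract the max so the best particle has weight 1
    weights = [math.exp(lw - m) for lw in log_w]
    # Resample: draw particles in proportion to their weights.
    return rng.choices(predicted, weights=weights, k=len(predicted))

def toy_transition(x, rng):
    # Hypothetical dynamics: damped state plus Gaussian process noise.
    return 0.9 * x + rng.gauss(0.0, 0.5)

rng = random.Random(0)
particles = [rng.gauss(0.0, 2.0) for _ in range(500)]
for y in (1.0, 1.2, 0.8):
    particles = bootstrap_filter_step(particles, y, toy_transition, 0.3, rng)
estimate = sum(particles) / len(particles)
print(f"posterior mean estimate: {estimate:.3f}")
```

The bootstrap proposal ignores the incoming observation when moving particles, which is one reason weights degenerate in high dimensions; proposals that condition on the observation are the usual remedy.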

2605.20022 2026-05-20 cs.CL

FlexDraft: Flexible Speculative Decoding via Attention Tuning and Bonus-Guided Calibration

FlexDraft: 通过注意力调节和奖励引导校准实现灵活的推测解码

Yaojie Zhang, Jianuo Huang, Junlong Ke, Yuhang Han, Yongji Long, Tianchen Zhao, Biqing Qi, Linfeng Zhang

AI总结 本文提出FlexDraft框架,通过注意力调节和奖励引导校准,灵活适应不同批处理大小,解决传统并行推测解码在大批次时的吞吐量下降问题。

详情
AI中文摘要

推测解码通过使用快速草稿生成多个候选标记并由目标模型并行验证,从而加速内存密集型LLM推理且不降低质量。然而,传统顺序推测解码面临草稿与验证之间的相互等待以及中间状态的反复交换,进一步增加内存访问开销。并行推测解码通过在单个目标前向传递中执行草稿和验证,允许在当前候选被验证的同时准备未来的草稿。尽管在小批次中有效,现有并行推测解码方法要么需要昂贵的持续预训练导致质量下降,要么验证接受率低。更重要的是,这种范式固有地面临奖励标记和接受长度的不确定性,导致草稿验证不匹配,从而在大批次时吞吐量收益崩溃。为了解决这些限制,我们引入了FlexDraft,一种无损的推测解码框架,通过三个关键设计灵活适应不同的批次大小。(1) 注意力调节通过仅调节最终几层的注意力投影器来实现块扩散草稿生成,同时保持自回归路径冻结以保留目标分布并生成高质量的草稿,同时使用最少的可训练参数。(2) 奖励引导校准使用一个轻量级的MLP,条件于已解决的奖励标记来校准草稿logits,缓解由奖励标记不确定性导致的草稿验证不匹配。(3) 灵活解码在小批次时动态切换于并行草稿和验证,而在大批次时切换为顺序草稿然后验证,并根据草稿信心调整验证长度以消除冗余计算。

英文摘要

Speculative decoding accelerates memory-bound LLM inference without quality degradation by using a fast drafter to propose multiple candidate tokens and the target model to verify them in parallel. However, conventional sequential speculative decoding suffers from mutual waiting between drafting and verification, and repeated exchange of intermediate states further increases memory access overhead. Parallel speculative decoding addresses this limitation by performing drafting and verification within a single target forward pass, allowing future drafts to be prepared while current candidates are being verified. Although effective at small batch sizes, existing parallel speculative decoding methods either require costly continual pretraining with quality degradation or suffer from low acceptance rates. More importantly, this paradigm inherently suffers from uncertainty in both the bonus token and the accepted length, leading to draft verification mismatch and causing throughput gains to collapse at large batch sizes. To address these limitations, we introduce FlexDraft, a lossless speculative decoding framework that flexibly adapts to varying batch sizes through three key designs. (1) Attention Tuning enables block diffusion drafting by tuning only the attention projectors of the final few layers on mask tokens, while keeping the autoregressive path frozen to preserve the target distribution and produce high quality drafts with minimal trainable parameters. (2) Bonus-guided Calibration uses a lightweight MLP conditioned on the resolved bonus token to calibrate draft logits, mitigating draft verification mismatch caused by bonus token uncertainty. (3) Flex Decoding dynamically switches between parallel draft and verify at small batch sizes and sequential draft then verify at large batch sizes, and adjusts verification length based on draft confidence to eliminate redundant computation.
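The conventional sequential draft-then-verify loop that the abstract contrasts with can be sketched with toy greedy models as follows. This is not FlexDraft: the two "models" are hypothetical integer-token functions, and verification is written as a Python loop where a real system would use one batched target forward pass.

```python
def speculative_step(prefix, draft_next, target_next, k):
    """One sequential draft-then-verify step with greedy decoding.

    Returns the tokens appended in this step: the accepted draft tokens plus
    one token from the target (a correction on mismatch, a bonus otherwise).
    """
    # Draft: the cheap model proposes k tokens autoregressively.
    ctx = list(prefix)
    draft = []
    for _ in range(k):
        tok = draft_next(ctx)
        draft.append(tok)
        ctx.append(tok)
    # Verify: compare each drafted token with the target's greedy choice.
    ctx = list(prefix)
    out = []
    for tok in draft:
        expected = target_next(ctx)
        if tok != expected:
            out.append(expected)  # first mismatch: keep the target's token
            return out
        out.append(tok)
        ctx.append(tok)
    out.append(target_next(ctx))  # every draft accepted: bonus token
    return out

# Toy models over integer tokens (hypothetical stand-ins for real LLMs).
def target_next(ctx):
    return (ctx[-1] + 1) % 10

def draft_next(ctx):
    # Agrees with the target except right after token 3.
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10

print(speculative_step([0], draft_next, target_next, k=4))  # -> [1, 2, 3, 4]
```

The output always equals what greedy decoding with the target alone would produce, which is the sense in which the scheme is lossless. The number of tokens gained per step is unknown until verification finishes, and that is the accepted-length uncertainty the abstract refers to.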

2605.20014 2026-05-20 cs.SD

Precise and Simple Audio-to-Score Alignment

精确且简单的音频到乐谱对齐

Silvan Peter, Patricia Hu, Gerhard Widmer

AI总结 本文提出了一种直接连接类音频特征和符号级特征的算法,该算法源自符号对齐方法,实现了高精度且灵活的音频到乐谱对齐,可适应不同的音色特性。

Comments published at the Music Encoding Conference (MEC) 2026

详情
AI中文摘要

音频到乐谱对齐是音乐信息检索中的长期挑战,也可以说是音乐研究中适用范围最广的对齐任务。对齐算法匹配同一音乐作品的两个版本,为此这两个版本需要处于可比较的格式。音频到音频对齐匹配音频特征;在将音频文件与乐谱匹配时,必须要么合成乐谱,要么借助钢琴卷帘或类似的特征序列推导出类音频特征。相比之下,符号对齐匹配符号编码的音符;在音频到乐谱场景中,这些音符需通过转录音频文件获得。在本文中,我们提出了一种直接连接类音频特征和符号级特征的算法。编码起音和频谱激活的序列音频特征,通过一种源自符号对齐方法的定制动态规划匹配算法,被匹配到乐谱位置。所得方法既精确(超越了广泛使用的基于合成乐谱的音频到音频方法),又在其数字信号处理组件上保持灵活,即该方法可以适应不同的音色特性,而无需单独的转录模型。此外,它继承了符号对齐的部分运行时优势,其算法复杂度在最坏情况下与符号乐谱(通常较短)和音频特征序列(通常较长)的长度成线性关系。在接下来的章节中,我们给出详细的算法描述,并在大规模独奏钢琴录音数据集上评估其对齐质量。

英文摘要

Audio-to-score alignment is a long-standing challenge in music information retrieval and arguably the most widely applicable alignment task for music research. Alignment algorithms match two versions of a piece of music, and for this to work these versions need to be in comparable formats. Audio-to-audio alignment matches audio features; when matching audio files to scores, one must either synthesize the score or derive audio-like features by means of piano rolls or similar feature sequences. Symbolic alignment, by contrast, matches symbolically encoded notes; in an audio-to-score scenario these would be obtained by a transcription of the audio file. In this article, we present an algorithm that bridges audio-like and symbol-level features directly. Sequential audio features encoding onset and spectral activation are matched to score positions by a bespoke dynamic programming-based matching algorithm derived from symbolic alignment methods. The resulting method is precise, surpassing widely used audio-to-audio approaches based on synthesized scores, and remains flexible in its digital signal processing components: it is adaptable to diverse timbral characteristics without requiring a separate transcription model. Furthermore, it inherits some of the runtime advantages of symbolic alignment, with an algorithmic complexity that is at worst linear in the lengths of the (typically short) symbolic score and the (typically long) audio feature sequence. In the following sections, we provide a detailed algorithm description and evaluate its alignment quality on a large-scale dataset of solo piano recordings.

2605.20009 2026-05-20 cs.LG cs.AI cs.NE

Training Neural Networks with Optimal Double-Bayesian Learning

用最优双贝叶斯学习训练神经网络

Vy Bui, Hang Yu, Karthik Kantipudi, Ziv Yaniv, Stefan Jaeger

AI总结 本文提出了一种新的概率框架,用于学习率这一关键参数,通过双贝叶斯决策机制改进随机梯度下降,从而推导出理论上最优的学习率,并在多种任务中验证其有效性。

Comments 13 pages, 4 figures; see also arXiv:2410.12984 [cs.LG]

详情
AI中文摘要

反向传播与梯度下降是大多数机器学习神经网络架构中常用的优化策略。然而,找到指导训练的最优超参数已证明具有挑战性。尽管普遍认可选择合适参数对于避免过拟合和获得无偏结果至关重要,但这一选择仍主要基于经验实验和经验。本文提出了一种新的概率框架,用于学习率这一随机梯度下降中的关键参数。该框架将经典贝叶斯统计发展为一种涉及两个对抗性贝叶斯过程的双贝叶斯决策机制。从这两个过程可以推导出理论上最优的学习率,并用于随机梯度下降。在各种分类、分割和检测任务中的实验验证了理论上推导出的学习率的实践意义。本文还讨论了所提出的双贝叶斯框架对网络训练和模型性能的影响。

英文摘要

Backpropagation with gradient descent is a common optimization strategy employed by most neural network architectures in machine learning. However, finding optimal hyperparameters to guide training has proven challenging. While it is widely acknowledged that selecting appropriate parameters is crucial for avoiding overfitting and achieving unbiased outcomes, this choice remains largely based on empirical experiments and experience. This paper presents a new probabilistic framework for the learning rate, a key parameter in stochastic gradient descent. The framework develops classic Bayesian statistics into a double-Bayesian decision mechanism involving two antagonistic Bayesian processes. A theoretically optimal learning rate can be derived from these two processes and used for stochastic gradient descent. Experiments across various classification, segmentation, and detection tasks corroborate the practical significance of the theoretically derived learning rate. The paper also discusses the ramifications of the proposed double-Bayesian framework for network training and model performance.

2605.20006 2026-05-20 cs.AI

GeoX: Mastering Geospatial Reasoning Through Self-Play and Verifiable Rewards

GeoX:通过自我对战和可验证奖励掌握地理空间推理

Kyeongjin Ahn, Seungeon Lee, Krishna P. Gummadi, Meeyoung Cha

AI总结 本文提出GeoX框架,通过自我对战和可验证奖励解决基于图像的复杂空间问题,无需大规模人工标注数据,提升了基础视觉语言模型在地理空间理解上的性能。

Comments 26 pages, 12 figures, 9 tables

详情
AI中文摘要

地理空间推理需要在场景复杂的空间结构上解决基于图像的问题。然而,标注一个庞大且具有组合性的问题空间成本高昂,阻碍了这一能力的发展。我们提出GeoX,一种自我对战框架,它通过能够产生可验证奖励的可执行程序来习得空间逻辑,而无需依赖大规模人工整理的数据。给定卫星或航空图像,我们的框架采用单一多模态策略,将空间问题表述为可执行程序,并在三种推理模式(溯因、演绎和归纳)下,基于空间原语和一个图像理解工具求解这些问题。验证器执行每个程序以产生奖励信号,并通过强化学习联合优化这两个角色。GeoX使其基础VLM平均最多提升5.5个百分点,达到或超过在数百万条人工整理数据上训练的常规基线。此外,我们还发布了一个通过自我对战积累而成的地理空间理解基准。

英文摘要

Geospatial reasoning requires solving image-grounded problems over the complex spatial structure of a scene. However, developing this capability is hindered by the cost of annotating a vast and combinatorial question space. We propose GeoX, a self-play framework that acquires spatial logic through executable programs that yield verifiable rewards, without relying on large-scale human-curated data. Given a satellite or aerial image, our framework employs a single multimodal policy that proposes spatial problems as executable programs and solves them under three reasoning modes (abduction, deduction, and induction) over spatial primitives and an image understanding tool. A verifier executes each program to produce a reward signal that jointly optimizes the two roles via reinforcement learning. GeoX consistently improves its base VLMs by up to 5.5 points on average, matching or exceeding conventional baselines trained on millions of curated data. Alongside the proposed method, we release a benchmark for geospatial understanding accumulated through self-play.

2605.20005 2026-05-20 cs.LG

Fine-Tuning Without Forgetting via Loss-Adaptive Learning Rates

通过损失自适应学习率实现无遗忘的微调

Parjanya Prajakta Prashant, Jiongli Zhu, Aldan Creo, Babak Salimi

AI总结 本文提出了一种损失自适应学习率调度方法FINCH,通过动态调整学习率来减少微调过程中的遗忘现象,同时保持任务性能,从而在知识获取、科学和低资源语言适应等基准测试中显著提升了模型表现。

Comments 25 pages

详情
AI中文摘要

在新数据上微调大型语言模型可以提高任务性能,但会损害预训练期间学到的能力,这种现象称为灾难性遗忘。现有方法通过修改微调目标来抑制高损失的token或序列,从而缓解这一问题,但这些token对于学习新任务至关重要,尤其是那些预训练覆盖不足的任务。在这种情况下,困难token仍应对学习有所贡献,因此必须在不抑制它们的前提下控制遗忘。我们发现了一个实现这一点的简单机制:每步的遗忘受学习率与当前训练损失平方根的乘积限制。这表明高损失批次尤其容易引发遗忘。受此启发,我们引入了FINCH,一种损失自适应的学习率调度方法,它在高损失批次上降低学习率,并随着模型收敛而提高学习率,同时保持微调目标不变。在知识获取、科学和低资源语言适应基准测试中,FINCH平均减少了93%的遗忘,同时保持与标准微调相当的任务性能。在Qwen3-4B知识获取任务中,FINCH将TruthfulQA的退化减少至原来的五分之一,并逆转了HaluEval的退化,同时更好地保持了置信度校准。总体而言,我们的结果表明,学习率调度是在微调过程中塑造模型行为的有效工具,其作用不止于目标任务的优化。

英文摘要

Fine-tuning large language models on new data improves task performance but degrades capabilities learned during pretraining, a phenomenon known as catastrophic forgetting. Existing methods mitigate this by modifying the fine-tuning objective to suppress high-loss tokens or sequences, but these tokens are essential for learning new tasks, especially those with poor pretraining coverage. In such settings, hard tokens should still contribute to learning, so forgetting must be controlled without suppressing them. We identify a simple mechanism for doing so: per-step forgetting is bounded by the product of the learning rate and the square root of the current training loss. This suggests that high-loss batches are especially prone to inducing forgetting. Motivated by this observation, we introduce FINCH, a loss-adaptive learning-rate schedule that reduces the learning rate on high-loss batches and increases it as the model converges, while leaving the fine-tuning objective unchanged. Across knowledge acquisition, science, and low-resource language adaptation benchmarks, FINCH reduces forgetting by 93% on average while matching the task performance of standard fine-tuning. On Qwen3-4B knowledge acquisition, FINCH cuts TruthfulQA degradation by 5x and reverses HaluEval degradation, while better preserving confidence calibration. Overall, our results show that learning-rate schedules are an effective tool to shape model behavior during fine-tuning, beyond just target-task optimization.
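The stated bound suggests a simple way to think about such a schedule: choose the learning rate so that the product of learning rate and square root of the batch loss stays under a fixed budget. The sketch below does exactly that. It is an illustration of the bound, not the FINCH rule itself, which the abstract does not specify; `budget`, `lr_max`, and `eps` are hypothetical knobs.

```python
import math

def loss_adaptive_lr(loss, budget=1e-5, lr_max=1e-4, eps=1e-12):
    """Pick a learning rate so that lr * sqrt(loss) never exceeds `budget`.

    `budget`, `lr_max` and `eps` are assumed hyperparameters, not paper values.
    """
    if loss < 0:
        raise ValueError("loss must be non-negative")
    return min(lr_max, budget / math.sqrt(loss + eps))

# High-loss batches get a smaller step; the cap applies once the loss is small.
for loss in (4.0, 1.0, 0.01, 0.0001):
    lr = loss_adaptive_lr(loss)
    print(f"loss={loss:<8} lr={lr:.2e} lr*sqrt(loss)={lr * math.sqrt(loss):.2e}")
```

Note that the objective is untouched: every token still contributes its full gradient, and only the step size changes from batch to batch.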

2605.19999 2026-05-20 cs.LG cs.AI cs.CR

LLM Benchmark Datasets Should Be Contamination-Resistant

LLM基准数据集应具备抗污染性

Ali Al-Lawati, Jason Lucas, Dongwon Lee, Suhang Wang

AI总结 本文探讨了LLM基准数据集应具备抗污染性,提出通过改进数据集设计和架构来提高其可靠性和通用性。

Comments Accepted to ICML 2026 Position Paper Track

详情
AI中文摘要

基准数据集对于可重复、可靠和具有判别性的LLM评估至关重要。然而,最近的研究表明,许多基准数据集包含在预训练语料库中,即被污染,这降低了它们作为可靠模型泛化度量的价值。在本文中,我们主张基准数据集应具备抗污染性,即不可学习但支持推理。为此,我们首先强调基准数据集污染的广泛存在,并概述抗污染数据集的性质。其次,我们强调Transformer架构中推理和训练流程之间的不对称性可以用来支持抗污染性。第三,我们概述了使这些数据集在各种LLM架构之间互操作的数学进展。基于上述内容,我们呼吁社区通过:(i) 推动新的抗污染方法,(ii) 开发支持方法和平台,以及(iii) 在现有评估流程中采用抗污染基准来确保LLM评估的可靠性。

英文摘要

Benchmark datasets are critical for reproducible, reliable, and discriminative evaluation of LLMs. However, recent studies reveal that many benchmark datasets are included in pretraining corpora, i.e., $\textit{contaminated}$, which diminishes their value as reliable measures of model generalization. In this paper, we argue that benchmark datasets should be $\textit{contamination-resistant}$, i.e., $\textit{unlearnable}$, but support $\textit{inference}$. To accomplish this, we first highlight the wide prevalence of benchmark dataset contamination and outline the properties of contamination-resistant datasets. Second, we highlight how the asymmetry between the inference and training pipelines in the Transformer architecture can be leveraged to support contamination-resistance. Third, we outline mathematical advancements to make these datasets interoperable across various LLM architectures. Based on the above, we call on the community to ensure the reliability of LLM benchmarking by: (i) advancing novel contamination-resistant methodologies, (ii) developing supporting methods and platforms, and (iii) adopting contamination-resistant benchmarks into existing evaluation pipelines.

2605.19995 2026-05-20 cs.CV

CogOmniControl: Reasoning-Driven Controllable Video Generation via Creative Intent Cognition

CogOmniControl: 通过创意意图认知实现推理驱动的可控视频生成

Hongji Yang, Songlian Li, Yucheng Zhou, Xiaotong Zhao, Alan Zhao, Chengzhong Xu, Jianbing Shen

AI总结 本文提出CogOmniControl框架,通过将可控视频生成分解为创意意图认知和生成两个阶段,利用专门训练的CogVLM生成更专业清晰的输出,并通过强化学习对齐不同条件的控制,最终在两个基准测试中超越现有开源模型。

详情
AI中文摘要

近期的扩散模型在视频生成中实现了出色的照片级真实感和流畅性,但在抽象、稀疏或复杂的条件下仍然脆弱,导致其在分镜草图、白模渲染(clay render)等专业制作流程的条件输入下表现不佳。现有视频生成模型要么通过适配器注入条件,要么在扩散骨干中耦合一个通用视觉-语言模型(VLM),留下了能力缺口,无法生成符合用户创意意图的视频。我们提出了CogOmniControl,一个推理驱动的框架,将可控视频生成分解为创意意图认知和生成两部分。具体而言,我们使用真实的动画制作数据训练了一个专门的CogVLM。与通用VLM相比,它生成更专业、更清晰的输出,能够从稀疏和抽象的条件中准确认知用户的创意意图,并将这些线索转化为密集的推理输出。此外,CogOmniDiT通过上下文生成统一了各种条件的控制,并通过强化学习与CogVLM的推理输出对齐。进一步地,借助CogVLM在引导视频生成方面的强大能力,我们发掘了其在规划特定评估器方面的潜力,并对生成的视频实现Best-of-N选择。这种整合将整个框架转变为闭环的“harness式”架构。我们还提出了CogReasonBench和CogControlBench,它们由承载真实创意意图(而非模拟意图)的专业工作流数据构建。在两个基准上的实验表明,CogOmniControl超越了现有的开源模型。项目网站:https://um-lab.github.io/CogOmniControl/

英文摘要

Recent diffusion models achieve strong photorealism and fluency in video generation, yet remain fragile under abstract, sparse, or complex conditions, leading to poor performance in professional production workflows such as storyboard sketches and clay render conditions. Existing video generation models either inject conditions through adapters or couple a generic vision-language model (VLM) within a diffusion backbone, leaving a capability gap and failing to produce videos that align with the user's creative intent. We present CogOmniControl, a reasoning-driven framework that factorizes controllable video generation into creative intent cognition and generation. Specifically, we train a specialized CogVLM using authentic anime production data. Compared to generic VLMs, it generates more professional and clear outputs, accurately cognizing user creative intent from sparse and abstract conditions and turning these cues into dense reasoning output. Besides, CogOmniDiT unifies the controls from various conditions through in-context generation and is aligned to the CogVLM reasoning outputs via reinforcement learning. Furthermore, leveraging CogVLM's robust capability in guiding video generation, we release its potential in planning specific evaluators and enable a Best-of-N selection for the generated videos. This integration transforms the entire framework into a closed-loop "harness-like" architecture. We further introduce CogReasonBench and CogControlBench, built from professional workflow data that carry genuine creative intent rather than simulated ones. Experiments on two benchmarks show that CogOmniControl surpasses the existing open-source models. The project website: https://um-lab.github.io/CogOmniControl/

2605.19990 2026-05-20 cs.RO cs.CV cs.LG

Minimalist Visual Inertial Odometry

极简视觉惯性里程计

Francesco Pasti, Jeremy Klotz, Nicola Bellotto, Shree K. Nayar

AI总结 本文提出了一种极简的平面里程计方法,通过四个视觉测量和一个IMU实现差分驱动机器人的鲁棒运动估计,展示了极简传感在高效准确平面里程计中的应用。

Comments This work has been submitted to the IEEE for possible publication

详情
AI中文摘要

视觉惯性里程计(VIO)对移动机器人导航至关重要,它使用像素数量庞大的相机,而采集和处理相机图像需要大量资源。本文提出了一种极简的平面里程计方法,证明仅凭四个视觉测量值和一个IMU即可为差分驱动机器人提供鲁棒的运动估计。我们的关键见解是:四个朝下的光电二极管透过光学Gabor掩模感知世界,所产生的信号编码了速度。基于此,我们使用基于物理的模拟器联合优化掩模参数和时间卷积网络(TCN)。所得模型仅凭光电二极管产生的四个测量值解码速度。将这些估计与IMU提供的角速度相结合,即可得到连续的平面轨迹。我们将原型传感器安装在差分驱动机器人上验证了该方法。在多样的室内外地形上,我们的系统无需任何真实世界微调即可紧密跟踪参考真值。我们的工作表明,极简传感能够实现高效且准确的平面里程计。

英文摘要

Visual-Inertial Odometry (VIO), which is critical to mobile robot navigation, uses cameras with a large number of pixels. Capturing and processing camera images requires significant resources. This work presents a minimalist approach to planar odometry, demonstrating that just four visual measurements and an IMU can provide robust motion estimation for differential-drive robots. Our key insight is that four downward-facing photodiodes that sense the world through optical Gabor masks produce signals that encode speed. Based on this, we jointly optimize the mask parameters alongside a Temporal Convolutional Network (TCN) using a physically-grounded simulator. The resulting model decodes speed from just the four measurements produced by the photodiodes. Pairing these estimates with the angular speed from an IMU yields a continuous planar trajectory. We validate our approach with a prototype sensor mounted on a differential-drive robot. Across diverse indoor and outdoor terrains, our system closely tracks the reference ground truth without any real-world fine-tuning. Our work shows that minimalist sensing enables efficient and accurate planar odometry.
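The last step of that pipeline, turning a forward speed and an angular speed into a planar trajectory, is ordinary dead reckoning for a differential-drive (unicycle) model. A minimal sketch follows, assuming fixed-rate samples and midpoint integration; the function name and the sample values are illustrative and not from the paper.

```python
import math

def integrate_planar(speeds, yaw_rates, dt, x=0.0, y=0.0, theta=0.0):
    """Dead-reckon a planar path from forward speed (m/s) and yaw rate (rad/s).

    Uses the heading at the middle of each step (an assumed integration choice).
    Returns a list of (x, y, theta) poses, including the start pose.
    """
    path = [(x, y, theta)]
    for v, w in zip(speeds, yaw_rates):
        heading = theta + 0.5 * w * dt  # midpoint heading over the step
        x += v * math.cos(heading) * dt
        y += v * math.sin(heading) * dt
        theta += w * dt
        path.append((x, y, theta))
    return path

# Drive straight for 1 s, then turn left through 90 degrees over 1 s.
n = 100
dt = 0.01
speeds = [1.0] * (2 * n)
yaw_rates = [0.0] * n + [math.pi / 2] * n
final_x, final_y, final_theta = integrate_planar(speeds, yaw_rates, dt)[-1]
print(final_x, final_y, final_theta)
```

Any bias in the decoded speed or in the gyro accumulates over time in this scheme, which is why odometry is usually evaluated by drift against a reference trajectory.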

2605.19986 2026-05-20 cs.RO cs.CV cs.LG

Beyond Binary Success: A Diagnostic Meta-Evaluation Framework for Fine-Grained Manipulation

超越二元成功:一种用于细粒度操控的诊断元评估框架

He-Yang Xu, Pengyuan Zhang, Zongyuan Ge, Xiaoshuai Hao, Serge Belongie, Xin Geng, Yuxin Peng, Xiu-Shen Wei

AI总结 本文提出MetaFine框架,通过分解理解、感知和受控行为三个维度,诊断细粒度操控中的能力瓶颈,并通过因果干预识别视觉编码器在保持局部空间结构方面的关键限制,从而提升操控精度。

Comments Project page: https://metafine.github.io/

详情
AI中文摘要

细粒度操控标志着这样一个领域:全局场景上下文不再足够,成功取决于局部属性定位、高保真空间感知和符合约束的运动执行之间的紧密耦合。然而,当前的具身AI基准测试将这些能力简化为二元成功率,系统性地将报告的能力夸大了多达70%,并掩盖了阻碍实际部署的架构瓶颈。我们引入了MetaFine,一种诊断性元评估框架,沿理解、感知和受控行为三个轴分解操控能力。基于组合任务图,MetaFine吸收异构的外部基准,并在统一协议下将其重构为不同复杂度的诊断场景。通过这一视角评估最先进的视觉-语言-动作(VLA)模型,揭示了传统指标无法发现的严重的维度特定失败。通过有针对性的因果干预,我们确定视觉编码器保持局部空间结构的能力是细粒度精度的关键瓶颈:改进它可以直接解锁此前无法获得的操控能力,而无需修改下游策略。MetaFine进一步支持真实与仿真混合验证,利用有限的成对真实世界运行来校准可扩展的基于仿真的估计,以获得更稳定的物理基准测试。通过将评估从排名转向诊断,MetaFine将基准测试转变为修复真实物理灵巧性所依赖的分层能力的可行指南。MetaFine框架、基准和相关资源将在项目页面上公开发布:https://metafine.github.io/。

英文摘要

Fine-grained manipulation marks a regime where global scene context no longer suffices, and success hinges on the tight coupling of local attribute grounding, high-fidelity spatial perception, and constraint-respecting motor execution. However, current embodied AI benchmarks collapse these capacities into binary success rates, systematically inflating reported capabilities by up to 70% and masking the architectural bottlenecks that impede real-world deployment. We introduce MetaFine, a diagnostic meta-evaluation framework that disentangles manipulation competency along three axes: understanding, perception, and controlled behavior. Built on a compositional task graph, MetaFine absorbs heterogeneous external benchmarks and reconstructs them into diagnostic scenarios of varying complexity under a unified protocol. Evaluating state-of-the-art vision-language-action (VLA) models through this lens exposes severe dimension-specific failures invisible to conventional metrics. Through targeted causal intervention, we identify the visual encoder's ability to preserve local spatial structure as a key bottleneck for fine-grained precision: improving it directly unlocks previously inaccessible manipulation capabilities without modifying downstream policies. MetaFine further supports hybrid real-sim validation, using limited paired real-world rollouts to calibrate scalable simulation-based estimates for more stable physical benchmarking. By shifting evaluation from ranking to diagnosis, MetaFine turns benchmarking into an actionable compass for repairing the layered capacities underlying genuine physical dexterity. The MetaFine framework, benchmarks, and supporting resources will be publicly released at our project page: https://metafine.github.io/.

2605.19984 2026-05-20 cs.SD

A conceptual framework for learning to listen by reward: Curiosity-driven search for novel sources

通过奖励学会倾听的概念框架:好奇心驱动的新声源搜索

Andreas Triantafyllopoulos, Jakub Šťastný, Alexios Terpinas, Tianyi Liu, Yuanqi Wang, Björn W. Schuller

AI总结 本文提出了一种基于奖励的学习倾听的概念框架,通过好奇心驱动的新型声源搜索来解决音频领域中强化学习应用不足的问题。

详情
AI中文摘要

强化学习是一种强大的学习范式,已在许多领域推动了进展。其核心承诺在于通过高层目标进行学习,而无需细粒度标签。然而,它在音频领域仍然难以应用,所受关注远少于计算机视觉或其他领域。关键问题依然是:智能体如何纯粹通过奖励驱动的探索学会倾听?在本文中,我们概述了先前的尝试,并提出了一种通过奖励学会倾听的新概念框架。我们的方法依赖于对新声源的持续搜索。我们给出了框架的形式化表述,讨论了尚未解决的技术挑战,并展示了一个初步的概念验证实现,以说明该方法的可行性。

英文摘要

Reinforcement learning is a powerful learning paradigm that has spearheaded progress in numerous domains. Its core promise lies in learning through high-level goals without the need for granular labels. However, it still remains elusive in the realm of audio, where it has received substantially less attention than in computer vision or other domains. The key question remains: how can agents learn to listen purely via reward-driven exploration? In this contribution, we present an overview of previous attempts and a new conceptual framework for learning to listen by reward. Our approach depends on the continuous search for novel sound sources. We formulate our framework, discuss open technical challenges, and present a first proof-of-concept implementation that showcases the feasibility of our approach.

2605.19982 2026-05-20 cs.CV

InterLight: Leveraging Intrinsic Illumination Priors for Low-Light Image Enhancement

InterLight: 利用内在照明先验进行低光照图像增强

Ziqi Wang, Xu Zhang, Laibin Chang, Shi Chen, Jiaqi Ma, Huan Zhang

AI总结 本文提出InterLight框架,通过系统挖掘和操作内在照明先验来解决低光照图像增强问题,核心方法是构建照明感知的处理流程,通过物理引导增强和自监督一致性目标实现更清晰的纹理和更一致的增强效果。

Comments Accepted by IJCAI 2026. Code: https://github.com/House-yuyu/InterLight

详情
AI中文摘要

低光照图像增强(LLIE)长期以来一直是低级视觉中的挑战性问题,因为光照不足常导致对比度低、细节丢失和噪声。最近的研究表明,基于深度学习的Retinex理论可以有效解耦光照和反射率。然而,现有方法常出现过度增强或色彩失真,并且通常假设噪声均匀或照明理想。为了解决这些限制,我们提出InterLight,一种新颖的框架,系统地挖掘并运用内在照明先验进行LLIE。我们的核心见解是,稳健的增强不仅需要估计光照,还需要构建照明感知的处理流程。我们首先通过物理引导的数据增强注入传感器级的光照响应先验,然后用以场景潜在光照状态为条件的自适应提示来表示退化。这种显式表示直接引导一个亮度门控的内在记忆机制,有选择地补偿信息损失,优先重建暗区,同时在亮区保持保真度。最后,整个过程通过自监督一致性目标进行正则化,该目标蒸馏出光照不变特征。通过深入利用内在照明先验,我们的方法实现了更清晰的纹理和视觉上更一致的增强结果。在多个基准上的大量实验验证了该方法的有效性。代码可在 https://github.com/House-yuyu/InterLight 获取。

英文摘要

Low-Light Image Enhancement (LLIE) has long been a challenging problem in low-level vision, as insufficient illumination often leads to low contrast, detail loss, and noise. Recent studies show that deep learning-based Retinex theory can effectively decouple illumination and reflectance. However, existing methods frequently suffer from over-enhancement or color distortion, and often assume uniform noise or ideal lighting. To address these limitations, we propose InterLight, a novel framework that systematically excavates and operationalizes intrinsic illumination priors for LLIE. Our core insight is that robust enhancement requires not just estimating illumination, but constructing an illumination-aware pipeline. We first inject sensor-level illumination-response priors via physics-guided augmentation, then represent the degradation through adaptive prompts conditioned on the scene's latent illumination state. This explicit representation directly guides a luminance-gated intrinsic memory mechanism to selectively compensate for information loss, prioritizing reconstruction in dark regions while preserving fidelity in bright ones. Finally, the entire process is regularized by a self-supervised consistency objective that distills illumination-invariant features. By deeply exploiting intrinsic illumination priors, our method achieves clearer textures and more visually coherent enhancement results. Extensive experiments across multiple benchmarks demonstrate the effectiveness of our approach. Code is available at: https://github.com/House-yuyu/InterLight.

2605.19981 2026-05-20 cs.RO

CEER: Compliant End-Effector and Root Control as a Unified Interface for Hierarchical Humanoid Loco-Manipulation

CEER:一种用于分层人形机器人运动-操作的柔顺末端执行器-根控制统一接口

Xinyuan Luo, Xingrui Chen, Xunjian Yin, Hongxuan Wu, Boxi Xia, Zhuoqun Chen, Jinzhou Li, Boyuan Chen, Xianyi Cheng

AI总结 本文提出CEER,一种用于分层人形机器人运动-操作的柔顺末端执行器-根控制统一接口,通过模块化接口实现接触丰富和长时程操作任务的稳定交互,实验表明其在仿真和硬件上均表现出较高的末端执行器跟踪精度和操作稳定性。

Comments Project page: https://robotproject8.github.io/ceer_page/. 9 pages, 7 figures

详情
AI中文摘要

人形机器人已经实现了出色的运动性能,但接触丰富且长时程的操作仍然是主要瓶颈。操作本质上是接触丰富的,需要柔顺的全身控制以实现稳定的交互,而其多样性和长时程特性则更适合模块化、与规划器兼容的接口,而非关节空间跟踪。我们提出CEER,一种柔顺的末端执行器-根(EE-root)控制抽象,用于在分层规划框架内实现模块化的人形机器人运动-操作。CEER在由根运动命令和末端执行器位姿目标定义的可解释任务空间中实现柔顺感知的全身控制,并支持与异构高层规划器的即插即用集成。我们采用教师-学生框架,将通用的运动跟踪控制器蒸馏为仅接收EE-root命令的底层策略。我们进一步构建了一个分层系统,通过EE-root接口整合异构规划器和任务模块,从而在不重新训练底层全身策略的情况下实现多样化的操作任务。在仿真和硬件上的实验表明,末端执行器的跟踪精度达到3.3厘米,与基线相比加加速度(jerk)显著降低,在遥操作下实现了稳定的接触丰富操作,并在房间尺度环境内的仿真单物体运动-操作任务中取得了最高70%的成功率。这些结果表明,柔顺的EE-root控制为人形机器人的运动-操作提供了一种实用的抽象,能够以模块化、可扩展的方式整合多样化技能。

英文摘要

Humanoid robots have achieved impressive locomotion performance, yet contact-rich and long-horizon manipulation remains a major bottleneck. Manipulation is inherently contact-rich and demands compliant whole-body control for stable interaction, while its diversity and long-horizon nature favor modular, planner-compatible interfaces over joint-space tracking. We propose CEER, a compliant end-effector-root (EE-root) control abstraction for modular humanoid loco-manipulation within a hierarchical planning framework. CEER enables compliance-aware whole-body control in an interpretable task space defined by root motion commands and end-effector pose targets, and supports plug-and-play integration with heterogeneous high-level planners. A teacher-student framework is adopted to distill a general motion-tracking controller into a low-level policy that consumes only EE-root commands. We further construct a hierarchical system that integrates heterogeneous planners and task modules through the EE-root interface, enabling diverse manipulation tasks without retraining the underlying whole-body policy. Experiments in simulation and on hardware demonstrate 3.3 cm end-effector tracking accuracy with substantially reduced jerk compared to baselines, stable contact-rich manipulation under teleoperation, and up to 70% success in simulated single-object loco-manipulation tasks within a room-scale environment. These results indicate that compliant EE-root control provides a practical abstraction for humanoid loco-manipulation, enabling modular and scalable integration of diverse skills.

2605.19976 2026-05-20 cs.CV

RECIPE: Procedural Planning via Grounding in Instructional Video

RECIPE: 通过指令视频中的 grounding 实现过程规划

Luigi Seminara, Antonino Furnari, Lorenzo Torresani

AI总结 该研究提出RECIPE方法,通过利用指令视频中的grounding信息来改进过程规划任务,通过利用预计算的文本嵌入实现大规模视频数据的验证,从而提升规划的准确性和鲁棒性。

详情
AI中文摘要

视觉规划要求模型在给定部分视频上下文和目标的情况下,生成剩余步骤的自然语言描述。该任务的进展受到标注的限制:干净的标记数据集较小,领域狭窄,每个示例只编码一个执行轨迹,尽管许多有效的顺序存在。大规模的指令视频语料库提供了数量级更多的过程内容,但通过使用伪标签进行监督微调会传播分割和对齐错误,并且只能生成单轨迹。我们识别出一个关键的不对称性:从噪声视频中提取干净的步骤标签是困难的,但验证生成的步骤序列是否在ASR转录中时间上接地是便宜的,并且可以通过预计算的文本嵌入扩展到数百万个视频。我们利用这种不对称性,在RECIPE中将grounding质量作为GRPO的奖励,将噪声语料库转化为验证者而不是标签来源。该框架可以统一应用于两种规划器输入配置(Socratic,使用冻结的VLM提取文本历史,以及Video,直接消耗视频令牌)以及标注和弱监督的模式。我们在7个过程基准上进行评估,使用基于参考的LLM-as-judge协议对计划进行评分,跨6个过程标准。RECIPE-RL在所有规模(0.5B、3B、7B)和每个基准上都优于基础检查点,领域内宏准确率提升7到8分,在零样本情况下最高提升16分。它在标注和伪标签计划上均优于监督微调(后者会降低基础模型性能),并在没有人工标注的情况下保持稳健。作为先前提案-评估-搜索规划器的提案阶段使用时,在视觉规划辅助任务中在每个时间范围内均优于最强的零样本基线,在COIN任务中保持了SFT所崩溃的生成多样性。

英文摘要

Visual planning asks a model to generate the remaining steps of a procedure in natural language given a partial video context and a goal. Progress on this task is bottlenecked by annotation: clean labeled datasets are small, domain-narrow, and encode a single execution trajectory per example, even though many valid orderings exist. Large-scale instructional video corpora offer orders of magnitude more procedural content, but supervised fine-tuning on pseudo-labels from their noisy ASR narrations propagates segmentation and alignment errors and stays single-trajectory. We identify a key asymmetry: extracting clean step labels from noisy video is hard, but verifying whether a generated step sequence is temporally grounded in ASR transcripts is cheap and scales to millions of videos via precomputed text embeddings. We exploit this asymmetry in RECIPE, which uses grounding quality as a reward for GRPO, turning the noisy corpus into a verifier rather than a label source. The framework applies uniformly to two planner input configurations (Socratic, with a textual history extracted by a frozen VLM, and Video, consuming video tokens directly) and to annotated and weakly supervised regimes. We evaluate on 7 procedural benchmarks using a reference-based LLM-as-judge protocol scoring plans across 6 procedural criteria. RECIPE-RL improves over the base checkpoint at all scales (0.5B, 3B, 7B) and every benchmark, with macro-accuracy gains of +7 to +8 points in-domain and up to +16 points zero-shot. It outperforms supervised fine-tuning on both annotated and pseudo-labeled plans (the latter degrades the base) and remains robust without human annotations. Used as the proposal stage of a prior propose-assess-search planner, it improves over the strongest zero-shot baseline at every horizon on Visual Planning for Assistance, and on COIN it preserves the generation diversity that SFT collapses.

2605.19975 2026-05-20 cs.LG cs.AI

Learning with Foresight: Enhancing Neural Routing Policy via Multi-Node Lookahead Prediction

前瞻式学习:通过多节点前瞻预测增强神经路由策略

Xia Jiang, Yaoxin Wu, Yew-Soon Ong, Yingqian Zhang

AI总结 本研究提出多节点前瞻性预测(MnLP)方法,通过扩展监督学习范式同时预测多个未来节点,提升神经路由策略的长期规划能力,并在不同问题规模和现实基准上改进泛化能力。

Comments Accepted by the 35th International Joint Conference on Artificial Intelligence

详情
AI中文摘要

神经策略因其对人工启发式依赖的减少而在解决车辆路径问题中展现出潜力。然而,当前的训练范式存在根本性局限:它们主要关注下一个节点的预测,导致短视决策,削弱了长期规划能力。为此,我们引入多节点前瞻性预测(MnLP),一种新的训练策略,扩展监督学习范式以同时预测多个未来节点。我们整合了因果性和可丢弃的MnLP模块,这些模块仅在训练期间运行,使模型能够预测多步决策,同时保持推理时的效率。通过将多深度辅助监督融入损失函数,MnLP使神经策略具备长距离上下文理解能力。实验表明,MnLP在现有训练方法上表现更优,提升了神经策略在各种问题规模、分布和现实基准上的泛化能力。此外,MnLP可以无缝集成到不同的神经架构中,而不引入额外的推理开销。

英文摘要

Neural policies have shown promise in solving vehicle routing problems due to their reduced reliance on handcrafted heuristics. However, current training paradigms suffer from a fundamental limitation: they primarily focus on next-node prediction for solution construction, resulting in myopic decision-making that undermines long-horizon planning capacity. To this end, we introduce Multi-node Lookahead Prediction (MnLP), a novel training strategy that extends the supervised learning paradigm to predict multiple future nodes simultaneously. We incorporate causal and discardable MnLP modules that operate exclusively during training, enabling models to anticipate multi-step decisions while preserving inference-time efficiency. By incorporating multi-depth auxiliary supervision into the loss function, MnLP equips neural policies with a capacity for long-range contextual understanding. Experimentally, MnLP outperforms existing training methods, improving the generalization capability of neural policies across various problem sizes, distributions, and real-world benchmarks. Moreover, MnLP can be seamlessly integrated into diverse neural architectures without introducing additional inference overhead.
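One plausible reading of "multi-depth auxiliary supervision" is a weighted sum of cross-entropy terms, one per lookahead depth, where depth 0 is the usual next-node loss. The sketch below implements that reading in plain Python. The geometric `decay` weighting and the function names are assumptions; the abstract does not state how depths are weighted or how the auxiliary heads are built.

```python
import math

def cross_entropy(logits, target):
    """Negative log-probability of `target` under softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[target]

def lookahead_loss(logits_per_depth, future_nodes, decay=0.5):
    """Weighted sum of cross-entropies over lookahead depths.

    logits_per_depth[d] scores which node is visited d + 1 steps ahead;
    future_nodes[d] is the node actually visited at that step.
    `decay` is a hypothetical weighting, not a value from the paper.
    """
    total = 0.0
    for d, (logits, node) in enumerate(zip(logits_per_depth, future_nodes)):
        total += (decay ** d) * cross_entropy(logits, node)
    return total

# Toy usage: 4 candidate nodes, supervision on the next two visited nodes.
loss = lookahead_loss(
    logits_per_depth=[[2.0, 0.1, 0.1, 0.1], [0.1, 0.1, 1.5, 0.1]],
    future_nodes=[0, 2],
)
print(f"training loss with lookahead: {loss:.4f}")
```

At inference only the depth-0 head would be needed, which is consistent with the abstract's claim that the extra modules can be discarded after training.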