arXivDaily: arXiv daily academic digest, updated Monday through Friday
2603.17310 2026-06-05 cs.AI cs.CL

InfoDensity: Rewarding Information-Dense Traces for Efficient Reasoning

Chengwei Wei, Jung-jae Kim, Longyin Zhang, Shengkai Chen, Nancy F. Chen

Affiliations: Institute for Infocomm Research (I2R), A*STAR, Singapore; Centre for Frontier AI Research (CFAR), A*STAR, Singapore

AI summary: This paper proposes InfoDensity, a reward framework that captures the information density of reasoning traces to improve the balance between reasoning quality and efficiency in reinforcement learning training.

Abstract

Large Language Models (LLMs) with extended reasoning capabilities often generate verbose and redundant reasoning traces, incurring unnecessary computational cost. While existing reinforcement learning approaches address this by optimizing final response length, they neglect the quality of intermediate reasoning steps, leaving models vulnerable to reward hacking. We argue that verbosity is not merely a length problem, but a symptom of poor intermediate reasoning quality. To investigate this, we conduct an empirical study tracking the per-token predictive entropy of large reasoning models across reasoning trajectories. We find that high-quality reasoning traces exhibit two consistent properties: low uncertainty convergence and fast uncertainty descent. These findings suggest that high-quality reasoning traces are informationally dense, that is, reasoning steps contribute to reaching a low uncertainty level relative to the total reasoning length. Motivated by this, we propose InfoDensity, a reward framework for RL training that captures both properties through a single suffix-max envelope of the entropy trajectory, weighted by a length scaling term that favors achieving equivalent quality more concisely. Experiments on mathematical and general reasoning benchmarks demonstrate that InfoDensity outperforms state-of-the-art baselines on the accuracy-efficiency trade-off.
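
A minimal sketch of the suffix-max envelope named in the abstract, assuming the per-token entropies are already available as a plain list. The envelope at position t is the largest entropy from t to the end of the trace, so it never increases and only drops once uncertainty stays low for the remainder. The `density_score` function and its `1 / length` factor are hypothetical illustrations and are not the paper's reward.

```python
from typing import List


def suffix_max_envelope(entropies: List[float]) -> List[float]:
    """Return env where env[t] = max(entropies[t:]), computed right to left."""
    envelope = [0.0] * len(entropies)
    running_max = float("-inf")
    for t in range(len(entropies) - 1, -1, -1):
        running_max = max(running_max, entropies[t])
        envelope[t] = running_max
    return envelope


def density_score(entropies: List[float]) -> float:
    """Hypothetical score: higher when the envelope falls early and ends low.

    Assumption for illustration only: the mean drop of the envelope below its
    starting value, multiplied by a 1/length factor that favors shorter traces.
    """
    if not entropies:
        return 0.0
    env = suffix_max_envelope(entropies)
    mean_drop = sum(env[0] - e for e in env) / len(env)
    return mean_drop / len(env)
```

For the entropies `[3.0, 1.0, 2.0, 0.5]` the envelope is `[3.0, 2.0, 2.0, 0.5]`: the dip to 1.0 is ignored because uncertainty rises again afterwards. Under this toy score, a trace whose uncertainty falls early ranks above one of the same length that stays uncertain until the last token.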

2603.16652 2026-06-05 cs.CV

Efficient Brood Cell Detection in Layer Trap Nests for Bees and Wasps: Balancing Labeling Effort and Species Coverage

Chenchang Liu, Felix Fornoff, Annika Grasreiner, Patrick Maeder, Henri Greil, Marco Seeland

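Affiliations: Technical University of Ilmenau; University of Zurich

AI summary: Proposes a deep-learning-based method for detecting and classifying brood cells, using a Constrained False Positive Loss strategy to reduce labeling effort, mitigate class imbalance, and improve detection performance.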

Abstract

Monitoring cavity-nesting wild bees and wasps is vital for biodiversity research and conservation. Layer trap nests (LTNs) are emerging as a valuable tool to study the abundance and species richness of these insects, offering insights into their nesting activities and ecological needs. However, manually evaluating LTNs to detect and classify brood cells is labor-intensive and time-consuming. To address this, we propose a deep learning based approach for efficient brood cell detection and classification in LTNs. LTNs present additional challenges due to densely packed brood cells, leading to a high labeling effort per image. Moreover, we observe a significant imbalance in class distribution, with common species having notably more occurrences than rare species. Comprehensive labeling of common species is time-consuming and exacerbates data imbalance, while partial labeling introduces data incompleteness which degrades model performance. To reduce labeling effort and mitigate the impact of unlabeled data, we introduce a novel Constrained False Positive Loss (CFPL) strategy. CFPL dynamically masks predictions from unlabeled data, preventing them from interfering with the classification loss during training. Experimental results demonstrate that our method improves detection performance, balances model accuracy and labeling effort, while also mitigating class imbalance.

2603.16475 2026-06-05 cs.AI

Breaking the Chain: A Causal Analysis of LLM Faithfulness to Intermediate Structures

Oleg Somov, Mikhail Chaichuk, Gleb Ershov, Karim Vafin, Mikhail Seleznyov, Alexander Panchenko, Elena Tutubalina

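Affiliations: AIRI; MIPT; HSE University; Avito AI Lab; Skoltech

AI summary: Examines how faithful LLMs are to intermediate structures in schema-guided reasoning (SGR) pipelines. Models are self-consistent with their own intermediate structures but respond insufficiently to interventions; this instability drops markedly when derivation of the final decision is delegated to an external tool.

Comments: 20 pages, 4 figures, 7 tables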

Abstract

In schema-guided reasoning (SGR) pipelines, LLMs produce explicit intermediate structures -- rubrics, checklists, or verification queries -- before committing to a final decision. SGR is increasingly adopted because it promises controllability: practitioners expect to inspect, edit, and override these structures to steer the outcome. But does the promise hold? We introduce a causal evaluation protocol to measure it: by selecting tasks where a deterministic function maps intermediate structures to decisions, every controlled edit implies a unique correct output. Across 12 models and 4 benchmarks, models appear self-consistent with their own intermediate structures but fail to update predictions after intervention -- revealing that apparent faithfulness is fragile once the intermediate structure changes. When derivation of the final decision from the structure is delegated to an external tool, this fragility largely disappears; stronger prompting yields only limited improvements, while preference optimization substantially improves intervention faithfulness. Overall, intermediate structures in schema-guided pipelines function as influential context rather than stable causal mediators.
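
The evaluation idea in the abstract can be sketched as follows, under stated assumptions: the intermediate structure is a checklist of booleans, the deterministic rule is "accept only if every item passes", and the model is any callable from checklist to decision. All of these names and the rule itself are hypothetical; the paper's tasks and metrics may differ.

```python
from typing import Callable, List, Tuple

Checklist = List[bool]


def decide(checklist: Checklist) -> str:
    """Deterministic map from intermediate structure to decision (assumed rule)."""
    return "accept" if all(checklist) else "reject"


def flip(checklist: Checklist, index: int) -> Checklist:
    """Controlled edit: flip one checklist item, leaving the rest untouched."""
    edited = list(checklist)
    edited[index] = not edited[index]
    return edited


def intervention_faithfulness(
    model: Callable[[Checklist], str],
    cases: List[Tuple[Checklist, int]],
) -> float:
    """Fraction of edits after which the model's decision matches decide(edited)."""
    if not cases:
        return 0.0
    hits = 0
    for checklist, index in cases:
        edited = flip(checklist, index)
        if model(edited) == decide(edited):
            hits += 1
    return hits / len(cases)


def sticky_model(checklist: Checklist) -> str:
    """Toy stand-in for an LLM that ignores the structure and always accepts."""
    return "accept"
```

Because each edit implies exactly one correct output, a model that truly routes its decision through the structure scores 1.0, while `sticky_model` is correct only on edits that happen to imply "accept".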

2603.14805 2026-06-05 cs.AI cs.HC cs.SE

Knowledge Activation: AI Skills as the Institutional Knowledge Primitive for Agentic Software Development

Gal Bakal

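Affiliations: Yahoo Inc.

AI summary: This paper proposes the Knowledge Activation framework, which turns AI Skills into structured, governance-aware Atomic Knowledge Units (AKUs) to address institutional knowledge delivery in enterprise software development and make agentic software development more efficient.

Comments: Preprint. 59 pages, 11 figures. v2 is a major revision: adds an enterprise case study (a Yahoo deployment evaluated by an anonymous 67-engineer survey), with findings integrated into the abstract, introduction, discussion, and conclusion; methodology tightened and references expanded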

Abstract

Enterprise software organizations accumulate critical institutional knowledge (architectural decisions, deployment procedures, compliance policies, incident playbooks), yet this knowledge remains trapped in formats designed for human interpretation. The bottleneck to effective agentic software development is not model capability but knowledge architecture. When any knowledge consumer (an autonomous AI agent, a newly onboarded engineer, or a senior developer) encounters an enterprise task without institutional context, the result is guesswork, correction cascades, and a disproportionate tax on senior engineers who must manually supply what others cannot infer. This paper introduces Knowledge Activation, a framework that specializes AI Skills, the open standard for agent-consumable knowledge, into structured, governance-aware Atomic Knowledge Units (AKUs) for institutional knowledge delivery. Rather than retrieving documents for interpretation, AKUs deliver action-ready specifications encoding what to do, which tools to use, what constraints to respect, and where to go next, so that agents act correctly and engineers receive institutionally grounded guidance without reconstructing organizational context from scratch. AKUs form a composable knowledge graph that agents traverse at runtime, compressing onboarding, reducing cross-team friction, and eliminating correction cascades. The paper formalizes the resource constraints that make this architecture necessary, specifies the AKU schema and deployment architecture, and grounds long-term maintenance in knowledge commons practice. A Yahoo deployment surveying 67 engineers shows statistically significant developer-experience gains: 2.6 hours per week saved and a Net Promoter Score of +35. Organizations that architect their institutional knowledge for the agentic era will outperform those that invest solely in model capability.

2603.14210 2026-06-05 cs.CL

Vavanagi: a Community-run Platform for Documentation of the Hula Language in Papua New Guinea

Bri Olewale, Raphael Merx, Ekaterina Vylomova

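Affiliations: Vula'a Kunenai Community, Central Province, Papua New Guinea; The University of Melbourne, Melbourne, Australia

AI summary: Introduces Vavanagi, a community-run platform for documenting the Hula language of Papua New Guinea. Community members contribute translations and voice recordings, advancing language technology and enabling community-led language preservation and transmission.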

Abstract

We present Vavanagi, a community-run platform for Hula (Vula'a), an Austronesian language of Papua New Guinea with approximately 10,000 speakers. Vavanagi supports crowdsourced English-Hula text translation and voice recording, with elder-led review and community-governed data infrastructure. To date, 77 translators and 4 reviewers have produced over 12k parallel sentence pairs covering 9k unique Hula words. We also propose a multi-level framework for measuring community involvement, from consultation to fully community-initiated and governed projects. We position Vavanagi at Level 5: initiative, design, implementation, and data governance all sit within the Hula community, making it, to our knowledge, the first community-led language technology initiative for a language of this size. Vavanagi shows how language technology can bridge village-based and urban members, connect generations, and support cultural heritage on the community's own terms.

2603.13761 2026-06-05 cs.LG cs.AI

Level Up: Defining and Exploiting Transitional Problems for Curriculum Learning

Amogh Inamdar, Zhenwei Tang, Ashton Anderson, Richard Zemel

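Affiliations: Department of Computer Science, Columbia University; Department of Computer Science, University of Toronto

AI summary: Proposes a method that improves curriculum learning by defining and exploiting transitional problems, adjusting training difficulty as model competence grows so that model performance improves more effectively.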

Abstract

Curriculum learning, ordering training examples in a sequence to aid machine learning, takes inspiration from human learning, but has not gained widespread acceptance. Static strategies for scoring item difficulty rely on indirect proxy scores of varying quality and produce curricula that are not specific to the learner at hand. Dynamic approaches base difficulty estimates on gradient information, requiring considerable extra computation during training. We introduce a novel method for measuring the difficulty of individual problem instances that is calibrated to a series of models of increasing competence, and identify transitional problems that are consistently easier as model ability increases. Applying this method to diverse model series constructed from sets of models that are readily available on many tasks, we find that training on a curriculum that levels up from easier to harder transitional problems most efficiently improves a model to the next tier of competence. These problems induce a natural progression from easier to harder items, which outperforms other training strategies. By measuring difficulty directly relative to model competence, our method yields interpretable problems, learner-specific curricula, and a principled basis for step-by-step improvement.
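
One way to picture transitional problems is sketched below. It assumes that each problem comes with a list of success rates, one per model in a series ordered by increasing competence. The criterion used here (rates never decrease, start below a threshold, and end at or above it) and the function names are assumptions made for illustration; the abstract does not give the paper's exact definition.

```python
from typing import Dict, List, Optional


def transition_tier(success_rates: List[float], threshold: float = 0.5) -> Optional[int]:
    """Return the index of the first model that solves the problem, or None.

    Assumed criterion (illustrative): success rates must be non-decreasing
    across the series, start below `threshold`, and end at or above it.
    """
    monotone = all(a <= b for a, b in zip(success_rates, success_rates[1:]))
    if not monotone or not success_rates:
        return None
    if success_rates[0] >= threshold or success_rates[-1] < threshold:
        return None
    for tier, rate in enumerate(success_rates):
        if rate >= threshold:
            return tier
    return None


def build_curriculum(rates_by_problem: Dict[str, List[float]]) -> List[str]:
    """Order transitional problems from earliest to latest transition tier."""
    tiers = {}
    for name, rates in rates_by_problem.items():
        tier = transition_tier(rates)
        if tier is not None:
            tiers[name] = tier
    return sorted(tiers, key=lambda name: (tiers[name], name))
```

Problems that every model already solves, or whose success rate goes up and then down, are left out, so the resulting order runs from problems that the weakest capable model unlocks to those that only the strongest one does.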

2603.11600 2026-06-05 cs.LG cs.SY eess.SY math.OC

Hybrid Energy-Aware Reward Shaping: A Unified Lightweight Physics-Guided Methodology for Policy Optimization

Qijun Liao, Jue Yang, Yiting Kang, Xinxin Zhao, Yong Zhang, Mingan Zhao

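Affiliations: School of Mechanical Engineering, University of Science and Technology Beijing; Jiangsu XCMG Construction Machinery Research Institute Co., Ltd.

AI summary: Proposes Hybrid Energy-Aware Reward Shaping (H-EARS), which encodes a priori energy terms as reward potentials and combines them with action regularization to improve convergence speed, stability, and energy efficiency in continuous control.

Comments: 23 pages, 48 figures. Accepted by Neurocomputing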

Abstract

Deep reinforcement learning for continuous control often suffers from high variance, low energy efficiency, and poor generalization under distribution shift, as purely data-driven exploration ignores available physical structure. This paper proposes Hybrid Energy-Aware Reward Shaping (H-EARS), which encodes dominant energy terms -- assumed known a priori -- directly as reward potentials at O(n) per-step computation. H-EARS decomposes the shaping potential into task-oriented and energy-based components, supplemented by an action regularization term that deliberately modifies the optimization objective to enforce energy-efficient control. A complete theoretical foundation is established: functional independence of shaping and regularization, energy-based gradient enrichment under positive-definite Hessian conditions, convergence guarantees under function approximation, and approximate potential error bounds. Across four continuous control benchmarks and four baseline algorithms, H-EARS achieves consistent gains in convergence speed, policy stability, and final performance. High-fidelity vehicle simulations validate applicability in safety-critical settings under extreme road conditions.
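
A rough sketch of how an energy term could enter a reward as a potential, assuming the classic potential-based shaping form `gamma * phi(s') - phi(s)` plus a quadratic action penalty. The point-mass energy model, the default weights, and the function names are hypothetical and only illustrate the structure described in the abstract; they are not the H-EARS formulation itself.

```python
from typing import Sequence


def mechanical_energy(height: float, velocity: float, mass: float = 1.0, g: float = 9.81) -> float:
    """Kinetic plus gravitational potential energy of a point mass (toy model)."""
    return 0.5 * mass * velocity ** 2 + mass * g * height


def shaped_reward(
    reward: float,
    phi_s: float,
    phi_next: float,
    action: Sequence[float],
    gamma: float = 0.99,
    action_weight: float = 0.01,
) -> float:
    """Task reward + potential-based shaping term - quadratic action penalty."""
    shaping = gamma * phi_next - phi_s
    action_penalty = action_weight * sum(a * a for a in action)
    return reward + shaping - action_penalty
```

A caller would compute the potential at each state, for example `phi = -mechanical_energy(h, v)` if lower energy is preferred, and pass the values for the current and next state. Each call costs time linear in the action dimension, in line with the O(n) per-step cost mentioned in the abstract.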

2603.11319 2026-06-05 cs.LG stat.ML

On the Robustness of Langevin Dynamics to Score Function Error

Daniel Yiming Cao, August Y. Chen, Karthik Sridharan, Yuchen Wu

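Affiliations: Cornell University

AI summary: Studies the robustness of score-based generative models to score estimation error. Langevin dynamics is not robust to $L^2$ (more generally $L^p$) errors: even for simple high-dimensional distributions and arbitrarily small score error, running it for any polynomial time horizon yields a distribution far from the target in total variation distance, which further supports diffusion models over Langevin dynamics.

Comments: ICML 2026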

Abstract

We consider the robustness of score-based generative modeling to errors in the estimate of the score function. In particular, we show that Langevin dynamics is not robust to the $L^2$ errors (more generally $L^p$ errors) in the estimate of the score function. It is well-established that with small $L^2$ errors in the estimate of the score function, diffusion models can sample faithfully from the target distribution under fairly mild regularity assumptions in a polynomial time horizon. In contrast, our work shows that even for simple distributions in high dimensions, Langevin dynamics run for any polynomial time horizon will produce a distribution far from the target distribution in Total Variation (TV) distance, even when the $L^2$ error (more generally $L^p$) of the estimate of the score function is arbitrarily small. Considering such an error in the estimate of the score function is unavoidable in practice when learning the score function from data, our results provide further justification for diffusion models over Langevin dynamics and serve to caution against the use of Langevin dynamics with estimated scores.
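
For readers unfamiliar with the sampler under discussion, here is a minimal one-dimensional sketch of the standard unadjusted Langevin update, `x <- x + step * score(x) + sqrt(2 * step) * noise`. It uses the exact score of a standard normal, so it shows only the update rule; it does not reproduce the paper's high-dimensional negative result for estimated scores. Function names, step size, and seed handling are choices made for this illustration.

```python
import math
import random
from typing import Callable, List


def langevin_samples(
    score: Callable[[float], float],
    n_steps: int,
    step_size: float,
    x0: float = 0.0,
    seed: int = 0,
) -> List[float]:
    """Unadjusted Langevin: x <- x + step * score(x) + sqrt(2 * step) * noise."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        x = x + step_size * score(x) + math.sqrt(2.0 * step_size) * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


def gaussian_score(x: float) -> float:
    """Exact score of a standard normal: d/dx log p(x) = -x."""
    return -x
```

Passing a perturbed function in place of `gaussian_score` is how one would experiment with score error in this toy setting; the `score` argument is the only place where the target distribution enters.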

2603.10971 2026-06-05 cs.RO cs.AI

ContactExplorer: Contact Coverage-Guided Exploration for General-Purpose Dexterous Manipulation

Zixuan Liu, Ruoyi Qiao, Chenrui Tie, Xuanwei Liu, Yunfan Lou, Chongkai Gao, Zhixuan Xu, Lin Shao

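Affiliations: School of Computing, National University of Singapore; RoboScience

AI summary: Proposes ContactExplorer, which uses a contact coverage reward and an energy-guided reward to explore contact patterns efficiently in dexterous manipulation tasks, improving sample efficiency and success rates.

Comments: 24 pages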

Abstract

Reinforcement learning has achieved remarkable success in domains such as Atari games, navigation, and locomotion, where exploration can often be guided by novelty over states or dynamics. In contrast, dexterous manipulation requires rich physical hand--object interactions, but existing methods often suffer from unstable contact-based novelty signals, inefficient distance novelty signals, or reliance on task-specific priors. We propose ContactExplorer, a general exploration method for dexterous manipulation tasks. ContactExplorer represents contact as the intersection between object surface points and hand keypoints, encouraging dexterous hands to discover diverse and novel contact patterns, namely which fingers contact which object regions. It maintains a contact counter conditioned on discretized object states obtained via learned hash codes, capturing how frequently each finger interacts with different object regions. This counter is leveraged in two complementary ways: (1) to assign a count-based contact coverage reward that promotes exploration of novel contact patterns, and (2) an energy-based reaching reward that guides the agent toward under-explored contact regions. We evaluate ContactExplorer on a diverse set of dexterous manipulation tasks. Experimental results show that ContactExplorer substantially improves sample efficiency and success rates over existing exploration methods, and that the contact patterns learned with ContactExplorer transfer robustly to the real world. Project page is https://contact-explorer.github.io.
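
The contact counter described in the abstract might look roughly like this, assuming the discretized object state is already available as an integer code and contacts arrive as (finger, region) pairs. The `1 / sqrt(count)` bonus is a common count-based exploration form used here as an assumption; the class name and key layout are hypothetical, and the energy-based reaching reward is not covered.

```python
import math
from collections import defaultdict
from typing import DefaultDict, Iterable, Tuple

ContactKey = Tuple[int, str, int]  # (object-state code, finger name, object region id)


class ContactCounter:
    """Counts finger-region contacts per discretized object state."""

    def __init__(self) -> None:
        self.counts: DefaultDict[ContactKey, int] = defaultdict(int)

    def coverage_reward(self, state_code: int, contacts: Iterable[Tuple[str, int]]) -> float:
        """Update counts and return the sum of 1/sqrt(count) over current contacts."""
        reward = 0.0
        for finger, region in contacts:
            key = (state_code, finger, region)
            self.counts[key] += 1
            reward += 1.0 / math.sqrt(self.counts[key])
        return reward
```

The first time a finger touches a region in a given object state the bonus is 1.0, and it shrinks on each repeat, so the agent is nudged toward finger-region pairs it has rarely tried.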

2603.08491 2026-06-05 cs.CV

Global Cross-Modal Geo-Localization: A Million-Scale Dataset and a Physical Consistency Learning Framework

Yutong Hu, Jinhui Chen, Chaoqiang Xu, Yuan Kou, Sili Zhou, Shaocheng Yan, Pengcheng Shi, Qingwu Hu, Jiayuan Li

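Affiliations: School of Remote Sensing and Information Engineering, Wuhan University; First Surveying and Mapping Institute of Hunan Province

AI summary: Introduces the CORE dataset and the PLANET framework for global cross-modal geo-localization, using large-scale data and physical consistency learning to improve robustness and global applicability.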

Abstract

Cross-modal Geo-localization (CMGL) matches ground-level text descriptions with geo-tagged aerial imagery, which is crucial for pedestrian navigation and emergency response. However, existing studies are constrained by narrow geographic coverage and simplistic scene diversity, failing to reflect the immense spatial heterogeneity of global architectural styles and topographic features. To bridge this gap and facilitate universal positioning, we introduce CORE, the first million-scale dataset dedicated to global CMGL. CORE comprises 1,034,786 cross-view images sampled from 225 distinct geographic regions across six continents, offering an unprecedented variety of perspectives in varying environmental conditions and urban layouts. We leverage the zero-shot reasoning of Large Vision-Language Models (LVLMs) to synthesize high-quality scene descriptions rich in discriminative cues. Furthermore, we propose a physical-law-aware network (PLANET) for cross-modal geo-localization. PLANET introduces a novel contrastive learning paradigm to guide textual representations in capturing the intrinsic physical signatures of satellite imagery. Extensive experiments across varied geographic regions demonstrate that PLANET significantly outperforms state-of-the-art methods, establishing a new benchmark for robust, global-scale geo-localization. The dataset and source code will be released at https://github.com/YtH0823/CORE.

2603.07294 2026-06-05 cs.CV cs.AI

MAviS: A Multimodal Conversational Assistant For Avian Species

Yevheniia Kryklyvets, Mohammed Irfan Kurpath, Sahal Shaji Mullappilly, Jinxing Zhou, Fahad Shabzan Khan, Rao Anwer, Salman Khan, Hisham Cholakkal

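Affiliations: Mohamed bin Zayed University of Artificial Intelligence

AI summary: Introduces the MAviS-Dataset and the MAviS-Chat model, which integrate image, audio, and text to improve fine-grained understanding of avian species and multimodal question answering, and shows the importance of domain-adaptive multimodal LLMs for ecological applications.

Comments: EMNLP 2025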

Abstract

Fine-grained understanding and species-specific multimodal question answering are vital for advancing biodiversity conservation and ecological monitoring. However, existing multimodal large language models face challenges when it comes to specialized topics like avian species, making it harder to provide accurate and contextually relevant information in these areas. To address this limitation, we introduce the MAviS-Dataset, a large-scale multimodal avian species dataset that integrates image, audio, and text modalities for over 1,000 bird species, comprising both pretraining and instruction-tuning subsets enriched with structured question-answer pairs. Building on the MAviS-Dataset, we introduce MAviS-Chat, a multimodal LLM that supports audio, vision, and text and is designed for fine-grained species understanding, multimodal question answering, and scene-specific description generation. Finally, for quantitative evaluation, we present MAviS-Bench, a benchmark of over 25,000 QA pairs designed to assess avian species-specific perceptual and reasoning abilities across modalities. Experimental results show that MAviS-Chat outperforms the baseline MiniCPM-o-2.6 by a large margin, achieving state-of-the-art open-source results and demonstrating the effectiveness of our instruction-tuned MAviS-Dataset. Our findings highlight the necessity of domain-adaptive multimodal LLMs for ecological applications.

2601.09923 2026-06-05 cs.AI

CaMeLs Can Use Computers Too: System-level Security for Computer Use Agents

Hanna Foerster, Tom Blanchard, Kristina Nikolić, Ilia Shumailov, Cheng Zhang, Robert Mullins, Nicolas Papernot, Florian Tramèr, Yiren Zhao

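Affiliations: University of Cambridge; University of Toronto & Vector Institute; ETH Zurich; AI Security Company

AI summary: Proposes a system-level security approach for Computer Use Agents (CUAs). Single-shot planning and the NOVA framework provide control flow integrity guarantees under dynamic UI state, improving security while retaining performance.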

Abstract

AI agents are vulnerable to prompt injection attacks, where malicious content hijacks agent behavior. Among proposed defenses, architectural isolation provides the strongest guarantees by strictly separating trusted task planning from untrusted environment observations. However, applying this design to Computer Use Agents (CUAs), which automate tasks by viewing screens and executing actions, presents a fundamental challenge. Current agents require continuous observation of UI state to determine each action, which conflicts with the isolation required for security. We resolve this tension by demonstrating that UI workflows, while dynamic, are structurally predictable. Single-shot planning, where a trusted planner emits upfront a complete branching plan covering all anticipated runtime states, provides control flow integrity guarantees against arbitrary instruction injections. We introduce NOVA (Navigating via Observation, Verification, and Action) to make this viable in the combinatorially large UI state space, where the plan can invoke a perception model to resolve runtime values such as UI coordinates. We evaluate our design on OSWorld, and retain up to 57% of the performance of frontier models while improving performance for smaller open-source models by up to 19%, demonstrating that rigorous security and utility can coexist in CUAs. Although upfront planning prevents instruction injections, we show that additional measures are needed to defend against Branch Steering attacks, where adversaries deceive the perception model into routing execution down attacker-preferred branches of the plan, such as redirecting the agent to a malicious website.

2602.12628 2026-06-05 cs.RO

Beyond Imitation: Reinforcement Learning-Based Sim-Real Co-Training for VLA Models

Liangzhi Shi, Shuaihang Chen, Feng Gao, Yinuo Chen, Kang Chen, Tonghe Zhang, Hongzhi Zang, Jiakai Zhou, Weinan Zhang, Chao Yu, Yu Wang

Affiliations: Tsinghua University; Harbin Institute of Technology; Peking University; Carnegie Mellon University; Shanghai AI Laboratory; Zhongguancun Academy

AI summary: Proposes an RL-based sim-real co-training framework that combines simulated interaction with real-world data to improve the real-world performance and generalization of VLA models.

Abstract

Simulation offers a scalable and low-cost way to enrich vision-language-action (VLA) training, reducing reliance on expensive real-robot demonstrations. However, most sim-real co-training methods rely on supervised fine-tuning (SFT), which treats simulation as a static source of demonstrations and does not exploit large-scale closed-loop interaction. Consequently, real-world gains and generalization are often limited. In this paper, we propose an RL-based sim-real Co-training (RL-Co) framework that leverages interactive simulation while preserving real-world capabilities. Our method follows a generic two-stage design: we first warm-start the policy with SFT on a mixture of real and simulated demonstrations, then fine-tune it with reinforcement learning in simulation while adding an auxiliary supervised loss on real-world data to anchor the policy and mitigate catastrophic forgetting. We evaluate our framework on four real-world tabletop manipulation tasks using two representative VLA architectures, OpenVLA and $π_{0.5}$, and observe consistent improvements over real-only fine-tuning and SFT-based co-training, including +24% real-world success on OpenVLA and +20% on $π_{0.5}$. Beyond higher success rates, RL co-training yields stronger generalization to unseen task variations and substantially improved real-world data efficiency, providing a practical and scalable pathway for leveraging simulation to enhance real-robot deployment.

2508.06249 2026-06-05 cs.LG cs.AI

In-Training Defenses against Emergent Misalignment in Language Models

David Kaczér, Magnus Jørgenvåg, Clemens Vetter, Esha Afzal, Robin Haselhorst, Lucie Flek, Florian Mai

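Affiliations: University of Copenhagen

AI summary: Studies how to prevent emergent misalignment in language models during training, investigates five training regularization interventions, and shows that selecting interleaving data by the perplexity gap between aligned and misaligned models gives the best results.

Comments: Accepted at ICML 2026 https://icml.cc/virtual/2026/poster/64303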

Abstract

Fine-tuning lets practitioners repurpose aligned large language models (LLMs) for new domains, yet recent work reveals emergent misalignment (EM): Even a small, domain-specific fine-tune can induce harmful behaviors far outside the target domain. Even in the case where model weights are hidden behind a fine-tuning API, this gives attackers inadvertent access to a broadly misaligned model in a way that can be hard to detect from the fine-tuning data alone. We present the first systematic study of in-training safeguards against EM that are practical for providers who expose fine-tuning via an API: We evaluate whether they a) prevent broad misalignment, b) allow narrow misalignment, c) learn well on benign tasks, and d) remain coherent. We investigate five training regularization interventions: (i) KL-divergence regularization toward a safe reference model, (ii) $\ell_2$ distance in feature space, (iii) preventive steering with an evil persona vector, (iv) interleaving training examples from a general instruct-tuning dataset and (v) inoculation prompting. We demonstrate that selecting interleaving data by the perplexity gap between aligned and misaligned models yields the best results overall.

2603.03993 2026-06-05 cs.LG cond-mat.dis-nn

Specialization of softmax attention heads: insights from the high-dimensional single-location model

softmax 注意力头的专门化:来自高维单位置模型的见解

M. Sagitova, O. Duranthon, L. Zdeborová

发表机构 * Statistical physics of computation laboratory, École Polytechnique Fédérale de Lausanne, Switzerland(计算统计物理实验室,洛桑联邦理工学院,瑞士)

AI总结 本文研究了多头注意力机制中注意力头的专门化现象,提出了一种理论模型,分析了SGD下多头softmax注意力的训练动态,并引入了Bayes-softmax注意力以优化预测性能。

详情
AI中文摘要

多头注意力使Transformer模型能够同时表示多种注意力模式。经验上,头的专门化在训练过程中出现于不同的阶段,而许多头仍然冗余且学习相似的表示。我们提出了一种理论模型,基于多索引和单位置回归框架,捕捉这一现象。第一部分分析了多头softmax注意力在SGD下的训练动态,揭示了初始非专门化阶段后,不同头依次对齐潜在信号方向的多阶段专门化阶段。第二部分研究了注意力激活函数对性能的影响。我们引入了Bayes-softmax注意力,该方法在该设置中实现了最优的预测性能。

英文摘要

Multi-head attention enables transformer models to represent multiple attention patterns simultaneously. Empirically, head specialization emerges in distinct stages during training, while many heads remain redundant and learn similar representations. We propose a theoretical model capturing this phenomenon, based on the multi-index and single-location regression frameworks. In the first part, we analyze the training dynamics of multi-head softmax attention under SGD, revealing an initial unspecialized phase followed by a multi-stage specialization phase in which different heads sequentially align with latent signal directions. In the second part, we study the impact of attention activation functions on performance. We introduce the Bayes-softmax attention, which achieves optimal prediction performance in this setting.

2603.03955 2026-06-05 cs.LG cs.AI

GIPO: Gaussian Importance Sampling Policy Optimization

GIPO:高斯重要性采样策略优化

Chengxuan Lu, Zhenquan Zhang, Shukuan Wang, Qunzhi Lin, Yanjie Li, Baigui Sun, Yang Liu

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 该研究提出了一种基于截断重要性采样的策略优化目标GIPO,使用基于对数比率的高斯信任权重替代硬裁剪,以柔和地抑制极端重要性比率同时保持非零梯度,从而提高数据效率。实验表明,GIPO在多种回放缓冲区大小下在基于裁剪的基线中取得了最先进的性能,并表现出更优的偏差-方差权衡、较高的训练稳定性和更好的样本效率。

详情
AI中文摘要

基于强化学习(RL)的后训练近年来已显示出推动多模态智能体超越监督模仿的强劲潜力。然而,RL仍然受限于较差的数据效率,特别是在交互数据稀缺且迅速过时的设置中。为了解决这一挑战,本文提出GIPO(高斯重要性采样策略优化),一种基于截断重要性采样的策略优化目标,用基于对数比率的高斯信任权重替代硬裁剪,以柔和地抑制极端重要性比率,同时保持非零梯度。理论分析显示,GIPO对更新幅度引入了隐式且可调的约束,而集中界保证了有限样本估计下的鲁棒性和稳定性。实验结果表明,在从接近同策略到高度过时数据的各种回放缓冲区大小下,GIPO在基于裁剪的基线中取得了最先进的性能,同时表现出更优的偏差-方差权衡、较高的训练稳定性和更好的样本效率。代码可在 https://github.com/distanceLu/GIPO 获得。

英文摘要

Post-training with reinforcement learning (RL) has recently shown strong promise for advancing multimodal agents beyond supervised imitation. However, RL remains limited by poor data efficiency, particularly in settings where interaction data are scarce and quickly become outdated. To address this challenge, GIPO (Gaussian Importance sampling Policy Optimization) is proposed as a policy optimization objective based on truncated importance sampling, replacing hard clipping with a log-ratio-based Gaussian trust weight to softly damp extreme importance ratios while maintaining non-zero gradients. Theoretical analysis shows that GIPO introduces an implicit, tunable constraint on the update magnitude, while concentration bounds guarantee robustness and stability under finite-sample estimation. Experimental results show that GIPO achieves state-of-the-art performance among clipping-based baselines across a wide range of replay buffer sizes, from near on-policy to highly stale data, while exhibiting superior bias-variance trade-off, high training stability and improved sample efficiency. Code is available at https://github.com/distanceLu/GIPO.
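To illustrate the contrast the abstract draws between hard clipping and a log-ratio-based Gaussian trust weight, here is a minimal Python sketch. The weight form exp(-(log r)^2 / (2 sigma^2)), the parameter `sigma`, and all function names are assumptions for illustration; the abstract does not give GIPO's exact formula, and the released code should be consulted for the real objective.

```python
import math

def hard_clip_ratio(ratio: float, eps: float = 0.2) -> float:
    """PPO-style clipping of an importance ratio to [1 - eps, 1 + eps]."""
    return max(1.0 - eps, min(1.0 + eps, ratio))

def gaussian_trust_weight(ratio: float, sigma: float = 0.5) -> float:
    """Hypothetical soft weight: a Gaussian in log-ratio space, equal to 1 at ratio == 1."""
    log_r = math.log(ratio)
    return math.exp(-(log_r ** 2) / (2.0 * sigma ** 2))

def damped_ratio(ratio: float, sigma: float = 0.5) -> float:
    """Importance ratio multiplied by the (assumed) Gaussian trust weight."""
    return ratio * gaussian_trust_weight(ratio, sigma)

def finite_diff(f, x: float, h: float = 1e-6) -> float:
    """Central finite-difference estimate of df/dx."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

for r in (0.2, 1.0, 3.0):
    print(
        f"ratio={r}: clipped slope={finite_diff(hard_clip_ratio, r):.4f}, "
        f"damped slope={finite_diff(damped_ratio, r):.4f}"
    )
```

At the extreme ratios 0.2 and 3.0 the clipped ratio is locally constant, so its slope is zero, whereas the softly damped ratio still has a non-zero slope there; this is the property the abstract describes as maintaining non-zero gradients.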

2410.06703 2026-06-05 cs.AI

ST-WebAgentBench: A Benchmark for Evaluating Safety and Trustworthiness in Web Agents

ST-WebAgentBench:用于评估网络代理安全性和可信度的基准测试

Ido Levy, Ben Wiesel, Sami Marreed, Alon Oved, Avi Yaeli, Nir Mashkif, Segev Shlomov

发表机构 * IBM Research(IBM研究院)

AI总结 本文提出ST-WebAgentBench基准测试,用于评估网络代理在现实企业场景中的安全性和可信度,通过引入新的评估指标CuP和风险比,揭示了现有代理的安全性缺陷。

Comments The Fourteenth International Conference on Learning Representations (ICLR 2026)

详情
AI中文摘要

自主网络代理能够解决复杂的浏览任务,但现有基准测试仅衡量代理是否完成任务,而忽略了它是否以安全且值得企业信赖的方式完成任务。要将这些代理整合到关键工作流程中,安全性和可信度(ST)是被采用的前提条件。我们介绍了ST-WebAgentBench,一个可配置且易于扩展的评估套件,用于在现实企业场景中评估网络代理的ST。其222个任务均配有ST策略,即编码约束的简明规则,并在六个正交维度(如用户同意、鲁棒性)上评分。除了原始任务成功率外,我们提出了策略约束下完成率(Completion Under Policy,CuP)指标,它只计入遵守所有适用策略的完成情况;以及风险比(Risk Ratio),用于量化各维度上的ST违规情况。对三个最先进的开放代理的评估表明,它们的平均CuP不到名义完成率的三分之二,暴露出关键的安全缺口。通过发布代码、评估模板和策略编写界面,ST-WebAgentBench为大规模部署可信赖的网络代理迈出了可操作的第一步。

英文摘要

Autonomous web agents solve complex browsing tasks, yet existing benchmarks measure only whether an agent finishes a task, ignoring whether it does so safely or in a way enterprises can trust. To integrate these agents into critical workflows, safety and trustworthiness (ST) are prerequisite conditions for adoption. We introduce ST-WebAgentBench, a configurable and easily extensible suite for evaluating web agent ST across realistic enterprise scenarios. Each of its 222 tasks is paired with ST policies, concise rules that encode constraints, and is scored along six orthogonal dimensions (e.g., user consent, robustness). Beyond raw task success, we propose the Completion Under Policy (CuP) metric, which credits only completions that respect all applicable policies, and the Risk Ratio, which quantifies ST breaches across dimensions. Evaluating three open state-of-the-art agents reveals that their average CuP is less than two-thirds of their nominal completion rate, exposing critical safety gaps. By releasing code, evaluation templates, and a policy-authoring interface, ST-WebAgentBench (https://sites.google.com/view/st-webagentbench/home) provides an actionable first step toward deploying trustworthy web agents at scale.
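A short sketch may help show how Completion Under Policy differs from the raw completion rate. The `Episode` record, the per-dimension violation counts, and in particular the Risk Ratio formula used here (violations in a dimension divided by the number of policy instances in that dimension) are assumptions made for illustration; the abstract only states that CuP credits completions respecting all applicable policies and that the Risk Ratio quantifies breaches across dimensions.

```python
from dataclasses import dataclass, field

@dataclass
class Episode:
    completed: bool
    # Hypothetical record: dimension name -> number of violated policies.
    violations: dict = field(default_factory=dict)

def completion_rate(episodes):
    """Fraction of tasks the agent finished, regardless of policy breaches."""
    return sum(e.completed for e in episodes) / len(episodes)

def completion_under_policy(episodes):
    """Fraction of tasks finished with zero policy violations."""
    clean = [e for e in episodes if e.completed and not any(e.violations.values())]
    return len(clean) / len(episodes)

def risk_ratio(episodes, policy_counts):
    """Assumed definition: violations per dimension / policy instances per dimension."""
    totals = {dim: 0 for dim in policy_counts}
    for e in episodes:
        for dim, n in e.violations.items():
            totals[dim] = totals.get(dim, 0) + n
    return {dim: totals[dim] / policy_counts[dim] for dim in policy_counts}

# Toy data: four tasks, one of which was completed while breaching a consent policy.
episodes = [
    Episode(True),
    Episode(True, {"user_consent": 1}),
    Episode(False),
    Episode(True),
]
policy_counts = {"user_consent": 4, "robustness": 4}

print(completion_rate(episodes), completion_under_policy(episodes))
print(risk_ratio(episodes, policy_counts))
```

In this toy run three of four tasks are completed but only two are completed without a violation, so CuP is lower than the nominal completion rate, which is the kind of gap the benchmark is designed to expose.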

2603.00573 2026-06-05 cs.CL

CoMoL: Efficient Mixture of LoRA Experts via Dynamic Core Space Merging

CoMoL: 通过动态核心空间融合实现高效的LoRA专家混合

Jie Cao, Zhenxuan Fan, Zhuonan Wang, Tianwei Lin, Ziyuan Zhao, Rolan Yan, Wenqiao Zhang, Feifei Shao, Hongwei Wang, Jun Xiao, Siliang Tang

发表机构 * Zhejiang University(浙江大学) Wechat, Tencent(微信,腾讯)

AI总结 本文提出CoMoL,一种新的MoE-LoRA框架,通过引入核心空间专家和核心空间路由,实现参数高效和细粒度适应,同时在多个任务中优于现有方法。

详情
AI中文摘要

大型语言模型(LLMs)通过参数高效微调(PEFT)在多样化的下游任务和特定领域任务中取得了显著性能。然而,现有的PEFT方法,特别是MoE-LoRA架构,由于LoRA专家数量激增以及采用实例级路由,存在参数效率有限和适应粒度粗的问题。为了解决这些问题,我们提出了核心空间LoRA混合(CoMoL),一种兼顾专家多样性、参数效率和细粒度适应的新型MoE-LoRA框架。具体而言,CoMoL引入了两个关键组件:核心空间专家和核心空间路由。核心空间专家将每个专家存储在紧凑的核心矩阵中,在保留多样性的同时控制参数增长。核心空间路由为每个token动态选择并激活合适的核心专家,实现细粒度、输入自适应的路由。被激活的核心专家随后通过软融合策略合并成单个核心专家,再与共享的LoRA结合形成专用的LoRA模块。此外,路由网络被投影到与LoRA矩阵相同的低秩空间中,在不影响表达能力的前提下进一步减少参数开销。大量实验表明,CoMoL保留了MoE-LoRA架构的适应性,同时达到与标准LoRA相当的参数效率,并在多个任务中持续优于现有方法。

英文摘要

Large language models (LLMs) achieve remarkable performance on diverse downstream and domain-specific tasks via parameter-efficient fine-tuning (PEFT). However, existing PEFT methods, particularly MoE-LoRA architectures, suffer from limited parameter efficiency and coarse-grained adaptation due to the proliferation of LoRA experts and instance-level routing. To address these issues, we propose Core Space Mixture of LoRA (CoMoL), a novel MoE-LoRA framework that incorporates expert diversity, parameter efficiency, and fine-grained adaptation. Specifically, CoMoL introduces two key components: core space experts and core space routing. Core space experts store each expert in a compact core matrix, preserving diversity while controlling parameter growth. Core space routing dynamically selects and activates the appropriate core experts for each token, enabling fine-grained, input-adaptive routing. Activated core experts are then merged via a soft-merging strategy into a single core expert, which is combined with a shared LoRA to form a specialized LoRA module. In addition, the routing network is projected into the same low-rank space as the LoRA matrices, further reducing parameter overhead without compromising expressiveness. Extensive experiments demonstrate that CoMoL retains the adaptability of MoE-LoRA architectures while achieving parameter efficiency comparable to standard LoRA, consistently outperforming existing methods across multiple tasks.

2602.24207 2026-06-05 cs.LG cs.CY cs.GT stat.ML

The Stability of Online Algorithms in Performative Prediction

在线算法在表现性预测中的稳定性

Gabriele Farina, Juan Carlos Perdomo

发表机构 * MIT(麻省理工学院) NYU(纽约大学)

AI总结 本文研究了在线算法在表现性预测中的稳定性问题,证明了任何在表现性设置中使用的无遗憾算法都会收敛到一种表现性稳定的均衡状态,该状态中模型主动塑造数据分布,使得其预测在事后看来是最优的。该研究避免了对模型如何影响分布的假设,并揭示了常见算法如梯度下降为何能自然稳定化并防止失控的反馈循环。

详情
AI中文摘要

在决策中使用算法预测会导致反馈循环:我们部署的模型会主动影响我们观察到的数据分布,而这些数据随后又被用于重新训练。Perdomo等人(2020)在其关于表现性预测的工作中将这种动态形式化。我们的主要结果是一个无条件的归约,表明在表现性设置中部署的任何无遗憾算法都会收敛到一个(混合)表现性稳定均衡:在该解中,模型主动塑造数据分布,使其自身的预测在事后看来是最优的。在我们的工作之前,该领域所有的正面结果都对模型如何影响分布施加了很强的限制。通过使用鞅论证并允许随机化,我们无需对人群如何响应预测做任何假设,并绕过了最近的困难性结果,这些结果表明确定性稳定模型在一般情况下是PPAD难计算的。最后,在概念层面上,我们建立的联系揭示了梯度下降等常见算法为何天然具有稳定作用并能防止失控的反馈循环。我们希望这项工作能促进在线优化与表现性之间未来的技术思想交流。

英文摘要

The use of algorithmic predictions in decision-making leads to a feedback loop where the models we deploy actively influence the data distributions we see, and later use to retrain on. This dynamic was formalized by Perdomo et al. 2020 in their work on performative prediction. Our main result is an unconditional reduction showing that any no-regret algorithm deployed in performative settings converges to a (mixed) performatively stable equilibrium: a solution in which models actively shape data distributions in ways that their own predictions look optimal in hindsight. Prior to our work, all positive results in this area imposed strong restrictions on how models influenced distributions. By using a martingale argument and allowing randomization, we avoid any assumption on how populations respond to predictions and sidestep recent hardness results showing that deterministic stable models are in general PPAD-hard to compute. Lastly, on a more conceptual note, our connection sheds light on why common algorithms, like gradient descent, are naturally stabilizing and prevent runaway feedback loops. We hope our work enables future technical transfer of ideas between online optimization and performativity.

2602.23845 2026-06-05 cs.CL

CLFEC: A New Task for Unified Linguistic and Factual Error Correction in paragraph-level Chinese Professional Writing

CLFEC:一种新的任务,用于段落级中文专业写作中的统一语言和事实纠错

Jian Kai, Zidong Zhang, Jiwen Chen, Zhengxiang Wu, Songtao Sun, Fuyang Li, Yang Cao, Qiang Liu

发表机构 * Huazhong University of Science and Technology(华中科技大学) WPS AI, Kingsoft Office(WPS AI,金山办公)

AI总结 本文提出CLFEC任务,旨在解决段落级中文专业写作中语言和事实错误的联合纠错问题,构建了多领域数据集,并系统研究了基于LLM的纠错方法,揭示了实际挑战并展示了统一纠错的优势。

详情
AI中文摘要

中文文本纠错传统上专注于拼写和语法,而事实纠错通常被单独处理。然而,在段落级中文专业写作中,语言(词语/语法/标点)和事实错误经常同时出现并相互影响,且许多草稿级错误在编辑审核后发布的文本中稀疏可见,这使得统一纠错既必要又需要构建受控基准。本文介绍了CLFEC(中文语言与事实纠错)这一新任务,用于联合语言和事实纠错。我们构建了一个涵盖时事、金融、法律和医学等多领域的中文专业写作混合数据集。然后,我们系统地研究了基于LLM的纠错范式,从提示到检索增强生成(RAG)和代理工作流。分析揭示了实际挑战,包括专门纠错模型的泛化能力有限、事实修复需要证据支撑、混合错误段落的难度以及对干净输入的过度纠正。结果进一步表明,在同一上下文中处理语言和事实错误优于解耦的流程,并且合适的基模型可以使代理工作流有效。总体而言,CLFEC为中文文本纠错研究提供了新的基准,并为校对系统提供了实用指导。

英文摘要

Chinese text correction has traditionally focused on spelling and grammar, while factual error correction is usually treated separately. However, in paragraph-level Chinese professional writing, linguistic (word/grammar/punctuation) and factual errors frequently co-occur and interact, while many draft-level errors are sparsely observable in published texts after editorial review, making unified correction necessary and controlled benchmark construction essential. This paper introduces CLFEC (Chinese Linguistic & Factual Error Correction), a new task for joint linguistic and factual correction. We construct a mixed, multi-domain Chinese professional writing dataset spanning current affairs, finance, law, and medicine. We then conduct a systematic study of LLM-based correction paradigms, from prompting to retrieval-augmented generation (RAG) and agentic workflows. The analysis reveals practical challenges, including limited generalization of specialized correction models, the need for evidence grounding for factual repair, the difficulty of mixed-error paragraphs, and over-correction on clean inputs. Results further show that handling linguistic and factual errors within the same context outperforms decoupled pipelines, and that agentic workflows can be effective with suitable backbone models. Overall, CLFEC provides a new benchmark for Chinese text correction research and practical guidance for proofreading systems.

2602.19327 2026-06-05 cs.LG cs.AI

Soft Sequence Policy Optimization

软序列策略优化

Svetlana Glazyrina, Maksim Kryzhanovskiy, Roman Ischenko

发表机构 * Lomonosov Moscow State University(罗蒙诺索夫莫斯科国立大学) Institute for Artificial Intelligence(人工智能研究所)

AI总结 本文提出软序列策略优化方法,通过引入软门控函数改进序列级重要性权重,提升大语言模型对齐任务的训练稳定性与性能。

详情
AI中文摘要

大量近期关于大语言模型(LLM)对齐的研究聚焦于基于组相对策略优化(GRPO)开发新的策略优化方法。两个显著方向出现:(i)向序列级重要性采样权重的转变,以更好地对齐许多任务中使用的序列级奖励;(ii)替代PPO风格的剪裁方法,以避免相关的训练信号损失和熵崩溃。我们引入了软序列策略优化(SSPO),一种离策略强化学习目标,其在序列级重要权重中整合了token级概率比的软门控函数。我们为SSPO提供了理论动机,并调查了实际修改以改善优化行为。实证结果显示,SSPO在数学推理和编码任务中均提高了训练稳定性与性能。

英文摘要

A significant portion of recent research on Large Language Model (LLM) alignment focuses on developing new policy optimization methods based on Group Relative Policy Optimization (GRPO). Two prominent directions have emerged: (i) a shift toward sequence-level importance sampling weights that better align with the sequence-level rewards used in many tasks, and (ii) alternatives to the PPO-style clipping that aim to avoid the associated loss of training signal and entropy collapse. We introduce Soft Sequence Policy Optimization (SSPO), an off-policy reinforcement learning objective that incorporates soft gating functions over token-level probability ratios within sequence-level importance weights. We provide theoretical motivation for SSPO and investigate practical modifications to improve optimization behavior. Empirically, we demonstrate that SSPO improves training stability and performance both in mathematical reasoning and coding tasks.

2602.22417 2026-06-05 cs.SD eess.AS

Absorbing Discrete Diffusion for Speech Enhancement

吸收式离散扩散用于语音增强

Philippe Gonzalez

发表机构 * Department of Health Technology, Technical University of Denmark(丹麦技术大学健康技术系)

AI总结 本文提出了一种基于吸收式离散扩散的语音增强方法ADDSE,结合神经音频编解码器的潜在空间和扩散模型的非自回归采样过程,以提高低信噪比下的语音增强性能。

Comments Accepted at Interspeech 2026

详情
AI中文摘要

受神经语音编码和基于扩散的语言建模最新进展的启发,我们使用吸收式离散扩散,对给定带噪语音代码时干净语音代码的条件分布进行建模,以此解决语音增强问题。所提出的方法称为ADDSE,它同时利用了神经音频编解码器富有表达力的潜在空间和扩散模型的非自回归采样过程。为高效建模残差向量量化代码的分层结构,我们提出了RQDiT,它结合了RQ-Transformer和扩散Transformer的技术以实现非自回归建模。结果表明,在两个数据集上,该方法在非侵入式客观指标上具有竞争力,尤其是在低信噪比和采样步数较少的情况下。代码和音频示例已在线提供。

英文摘要

Inspired by recent developments in neural speech coding and diffusion-based language modeling, we tackle speech enhancement by modeling the conditional distribution of clean speech codes given noisy speech codes using absorbing discrete diffusion. The proposed approach, which we call ADDSE, leverages both the expressive latent space of neural audio codecs and the non-autoregressive sampling procedure of diffusion models. To efficiently model the hierarchical structure of residual vector quantization codes, we propose RQDiT, which combines techniques from RQ-Transformer and diffusion Transformers for non-autoregressive modeling. Results show competitive performance in terms of non-intrusive objective metrics on two datasets, especially at low signal-to-noise ratios and with few sampling steps. Code and audio examples are available online.
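For readers unfamiliar with absorbing discrete diffusion, the sketch below shows only the generic forward (corruption) process on a sequence of codec token ids: tokens jump to a special mask state and, once masked, never leave it. The `MASK` id, the per-step masking probability, and the toy token values are assumptions for illustration; ADDSE's actual noise schedule, its conditioning on noisy speech codes, and the RQDiT denoiser are not reproduced here.

```python
import random

MASK = -1  # hypothetical id for the absorbing (mask) state

def absorb(codes, t, rng):
    """Corrupt clean codes in one shot: each token is masked with probability t."""
    return [MASK if rng.random() < t else c for c in codes]

def absorb_step(codes, p, rng):
    """One forward step: unmasked tokens are masked with probability p,
    already-masked tokens stay masked (the state is absorbing)."""
    return [c if c == MASK else (MASK if rng.random() < p else c) for c in codes]

rng = random.Random(0)
clean = [12, 7, 7, 301, 45, 9]  # toy codec token ids

# Run a short forward chain and keep every intermediate state.
state = clean
history = [state]
for _ in range(5):
    state = absorb_step(state, 0.3, rng)
    history.append(state)

for step, codes in enumerate(history):
    print(step, codes)
```

A denoising model would be trained to predict the original ids at masked positions, and sampling would then fill in many masked positions per step, which is what makes the procedure non-autoregressive.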

2602.22067 2026-06-05 cs.AI

Semantic Partial Grounding via LLMs

通过大语言模型实现语义部分 grounding

Giuseppe Canonaco, Alberto Pozanco, Daniel Borrajo

发表机构 * Department of Computer Science, University of Cambridge(剑桥大学计算机科学系)

AI总结 本文提出SPG-LLM,利用大语言模型分析领域和问题文件,提前识别可能不相关的对象、动作和谓词,从而减少grounding任务的规模,提升grounding效率并在某些领域实现更优的计划成本。

详情
AI中文摘要

Grounding是经典规划中的关键步骤,但随着任务规模增大,grounded动作和原子的指数增长常使其成为计算瓶颈。近期部分grounding方法通过预测模型逐步grounding最有前景的操作符来解决这一挑战。然而,这些方法主要依赖关系特征或学习到的嵌入,未利用PDDL描述中的文本和结构线索。我们提出SPG-LLM,利用大语言模型分析领域和问题文件,启发式地识别潜在不相关的对象、动作和谓词,从而在grounding前显著减少grounded任务的规模。在七个难以grounding的基准测试中,SPG-LLM实现了更快的grounding速度(通常快几个数量级),并在某些领域实现了可比或更优的计划成本。

英文摘要

Grounding is a critical step in classical planning, yet it often becomes a computational bottleneck due to the exponential growth in grounded actions and atoms as task size increases. Recent advances in partial grounding have addressed this challenge by incrementally grounding only the most promising operators, guided by predictive models. However, these approaches primarily rely on relational features or learned embeddings and do not leverage the textual and structural cues present in PDDL descriptions. We propose SPG-LLM, which uses LLMs to analyze the domain and problem files to heuristically identify potentially irrelevant objects, actions, and predicates prior to grounding, significantly reducing the size of the grounded task. Across seven hard-to-ground benchmarks, SPG-LLM achieves faster grounding, often by orders of magnitude, while delivering comparable or better plan costs in some domains.

2602.16705 2026-06-05 cs.RO cs.CV

HERO: Learning Humanoid End-Effector Control for Visual Whole-Body Open-Vocabulary Object Grasping

HERO:学习人形机器人末端执行器控制,用于视觉全身开放词汇物体抓取

Runpei Dong, Ziyan Li, Arjun Gupta, Xialin He, Saurabh Gupta

发表机构 * University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校)

AI总结 该研究提出HERO方法,通过结合大视觉模型和模拟训练,实现了视觉全身体对象抓取任务中末端执行器的高精度控制和场景理解,显著提升了抓取精度和泛化能力。

Comments Project page: https://hero-humanoid.github.io/

详情
AI中文摘要

对任意真实场景物体进行视觉移动操作(loco-manipulation),需要精确的末端执行器(EE)控制,以及基于视觉输入(如RGB-D图像)的可泛化场景理解。现有的模仿学习和仿真到现实方法通过整体式端到端学习同时学习这两个方面,因此难以扩展。在本工作中,我们为每个问题选用最合适的工具:用大视觉模型实现可泛化的场景理解,用仿真训练实现精确的末端执行器控制,从而得到一个整体模块化、泛化能力强的移动操作系统。我们的核心技术创新是HERO,一种结合经典机器人学与机器学习的精确残差感知末端执行器跟踪策略。它利用 a) 逆运动学将残差末端执行器目标转换为参考轨迹,b) 学习得到的神经前向模型实现精确的正向运动学,以及 c) 目标调整和重新规划。这些创新共同将末端执行器跟踪误差降低到2.44厘米,比最强的先前方法好5.5倍。我们的整体系统可在从办公室到咖啡店的多样化真实环境中运行,机器人能够可靠地抓取放在43厘米到92厘米高的表面上的各种日常物体(如杯子、苹果、玩具)。系统性的模块化测试和端到端测试验证了所提设计的有效性。我们相信这些进展为训练人形机器人与日常物体交互开辟了新途径。

英文摘要

Visual loco-manipulation of arbitrary in-the-wild objects requires accurate end-effector (EE) control and a generalizable understanding of the scene from visual inputs (e.g., RGB-D images). Existing imitation and sim2real methods jointly learn both these aspects via monolithic end-to-end learning and are thus hard to scale. In this work, we bring to bear the best tools for each of these problems (large vision models for generalizable scene understanding and simulated training for accurate EE control), leading to an overall modular loco-manipulation system that exhibits strong generalization. Our core technical innovation is HERO, an accurate residual-aware EE tracking policy made possible by combining classical robotics with machine learning. It uses a) inverse kinematics to convert residual end-effector targets into reference trajectories, b) a learned neural forward model for accurate forward kinematics, and c) goal adjustment and replanning. Together, these innovations reduce the end-effector tracking error to 2.44 cm, outperforming the strongest prior method by 5.5x. Our overall system operates in diverse real-world environments, from offices to coffee shops, where the robot reliably grasps various everyday objects (e.g., mugs, apples, toys) on surfaces ranging from 43 cm to 92 cm in height. Systematic modular and end-to-end tests demonstrate the effectiveness of our proposed design. We believe our advances open up new ways of training humanoids to interact with daily objects.

2602.18955 2026-06-05 cs.LG

Incremental Transformer Neural Processes

增量变换器神经过程

Philip Mortimer, Cristiana Diaconu, Tommy Rochussen, Bruno Mlodozeniec, Richard E. Turner

发表机构 * University of Cambridge(剑桥大学)

AI总结 本文提出增量变换器神经过程(incTNP),通过因果掩码、键值缓存和高效自回归训练策略,在保持预测性能的同时将更新计算复杂度从二次降低到线性,从而在序列推理中实现显著的速度提升,并保持流式推理的一致性。

Comments Accepted at ICML 2026

详情
AI中文摘要

神经过程(NPs),尤其是变换器神经过程(TNPs),在从时空预测到表格数据建模的任务中展现了卓越的性能。然而,许多此类应用本质上是序列性的,涉及连续数据流,如实时传感器读数或数据库更新。在这种情况下,模型应支持低成本的增量更新,而不是为每个新观测从头重新计算内部表示,而现有的TNP变体缺乏这一能力。受大型语言模型的启发,我们引入了增量TNP(incTNP)。通过利用因果掩码、键值(KV)缓存和数据高效的自回归训练策略,incTNP在保持标准TNPs预测性能的同时,将更新的计算成本从二次时间复杂度降低到线性时间复杂度。我们在一系列合成和现实任务上对模型进行了实证评估,包括表格回归和温度预测。结果表明,令人惊讶的是,incTNP的性能与非因果TNPs相当甚至更好,同时在序列推理中带来数量级的速度提升。最后,我们评估了模型更新的一致性:通过改造一种“隐式贝叶斯性”度量,我们表明在逐个观测的流式协议下,incTNP的预测规则与标准非因果TNPs同样具有隐式贝叶斯性,这说明incTNP在获得因果掩码的计算优势的同时,并未牺牲流式推理所需的一致性。

英文摘要

Neural Processes (NPs), and specifically Transformer Neural Processes (TNPs), have demonstrated remarkable performance across tasks ranging from spatiotemporal forecasting to tabular data modelling. However, many of these applications are inherently sequential, involving continuous data streams such as real-time sensor readings or database updates. In such settings, models should support cheap, incremental updates rather than recomputing internal representations from scratch for every new observation, a capability existing TNP variants lack. Drawing inspiration from Large Language Models, we introduce the Incremental TNP (incTNP). By leveraging causal masking, Key-Value (KV) caching, and a data-efficient autoregressive training strategy, incTNP matches the predictive performance of standard TNPs while reducing the computational cost of updates from quadratic to linear time complexity. We empirically evaluate our model on a range of synthetic and real-world tasks, including tabular regression and temperature prediction. Our results show that, surprisingly, incTNP delivers performance comparable to, or better than, non-causal TNPs while unlocking orders-of-magnitude speedups for sequential inference. Finally, we assess the consistency of the model's updates: by adapting a metric of "implicit Bayesianness", we show that under a one-at-a-time streaming protocol, incTNP retains a prediction rule as implicitly Bayesian as standard non-causal TNPs, demonstrating that incTNP achieves the computational benefits of causal masking without sacrificing the consistency required for streaming inference.
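The quadratic-to-linear claim rests on a general property of causal attention: earlier positions never look at later ones, so their keys and values can be cached and only the newest observation needs fresh computation. The sketch below demonstrates that property with a single attention head and identity projections (queries, keys, and values are the raw token vectors). These simplifications, the class name `KVCache`, and the toy inputs are assumptions for illustration and are not the incTNP architecture.

```python
import math

def attend(query, keys, values):
    """Softmax attention of a single query over the given keys and values."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) / total for d in range(dim)]

def causal_attention_full(tokens):
    """Recompute every position from scratch: quadratic work in the sequence length."""
    return [attend(tokens[i], tokens[: i + 1], tokens[: i + 1]) for i in range(len(tokens))]

class KVCache:
    """Keeps past keys/values so each new observation costs work linear in the history."""

    def __init__(self):
        self.keys = []
        self.values = []

    def update(self, token):
        # Identity projections: the token itself serves as key and value.
        self.keys.append(token)
        self.values.append(token)
        return attend(token, self.keys, self.values)

tokens = [[0.1, 0.3], [0.5, -0.2], [-0.4, 0.9], [0.7, 0.7]]
cache = KVCache()
streamed = [cache.update(t) for t in tokens]  # one cheap update per new observation
full = causal_attention_full(tokens)          # full recomputation for comparison
print(streamed[-1], full[-1])
```

Because of the causal mask, the streamed outputs match the full recomputation position by position, so nothing is lost by reusing the cache.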

2602.07875 2026-06-05 cs.LG

Harpoon: Generalised Manifold Guidance for Conditional Tabular Diffusion

Harpoon:面向条件表格扩散的广义流形引导

Aditya Shankar, Yuandou Wang, Rihan Hai, Lydia Y. Chen

发表机构 * Department of Computer Science, Delft University of Technology(代尔夫特理工大学计算机科学系) Department of Computer Science, Université de Neuchâtel(纳沙泰尔大学计算机科学系)

AI总结 本文提出Harpoon,一种基于流形引导的条件表格扩散方法,通过扩展流形理论来处理多样化的推理目标,从而在表格数据生成中实现更精确的条件控制。

Comments Accepted at ICLR 2026

详情
AI中文摘要

在条件约束下生成表格数据,对需要精确控制生成过程的应用至关重要。现有方法依赖训练时的策略,无法在推理时泛化到未见过的约束,并且难以处理表格填补之外的条件任务。虽然流形理论提供了一种有原则的生成引导方式,但现有的公式化方法与特定的推理时目标绑定,并且仅限于连续域。我们将流形理论扩展到表格数据,并拓展其适用范围以处理多样的推理时目标。在此基础上,我们引入了HARPOON,一种表格扩散方法,它在推理时沿流形几何引导无约束样本,以满足多样化的表格条件。我们在填补和施加不等式约束等任务上通过实验验证了理论贡献,展示了HARPOON在各种数据集上的强大性能,以及流形感知引导对表格数据的实际益处。代码URL:https://github.com/adis98/Harpoon

英文摘要

Generating tabular data under conditions is critical to applications requiring precise control over the generative process. Existing methods rely on training-time strategies that do not generalise to unseen constraints during inference, and struggle to handle conditional tasks beyond tabular imputation. While manifold theory offers a principled way to guide generation, current formulations are tied to specific inference-time objectives and are limited to continuous domains. We extend manifold theory to tabular data and expand its scope to handle diverse inference-time objectives. On this foundation, we introduce HARPOON, a tabular diffusion method that guides unconstrained samples along the manifold geometry to satisfy diverse tabular conditions at inference. We validate our theoretical contributions empirically on tasks such as imputation and enforcing inequality constraints, demonstrating HARPOON's strong performance across diverse datasets and the practical benefits of manifold-aware guidance for tabular data. Code URL: https://github.com/adis98/Harpoon

2509.24882 2026-06-05 cs.LG cond-mat.dis-nn cs.AI stat.ML

Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime

浅层神经网络在特征学习机制下的缩放定律与谱特性

Leonardo Defilippis, Yizhou Xu, Julius Girardin, Emanuele Troiani, Vittorio Erba, Lenka Zdeborová, Bruno Loureiro, Florent Krzakala

发表机构 * Departement d’Informatique, École Normale Supérieure, PSL & CNRS(信息学院,巴黎高等师范学院,PSL与CNRS) Statistical Physics of Computation Laboratory, École Polytechnique Fédérale de Lausanne (EPFL)(计算统计物理实验室,洛桑联邦理工学院(EPFL)) Information, Learning and Physics Laboratory, École Polytechnique Fédérale de Lausanne (EPFL)(信息、学习与物理实验室,洛桑联邦理工学院(EPFL))

AI总结 本文研究了浅层神经网络在特征学习机制下的缩放定律与谱特性,通过分析二次和对角神经网络的缩放规律,揭示了样本复杂度和权重衰减对过剩风险缩放指数的影响,并建立了这些机制与训练后网络权重谱性质之间的精确联系。

详情
Journal ref ICLR 2026
AI中文摘要

神经缩放定律是深度学习近期许多进展的基础,但其理论理解仍然主要局限于线性模型。在本文中,我们系统分析了二次和对角神经网络在特征学习机制下的缩放定律。利用与矩阵压缩感知和LASSO的联系,我们推导了过剩风险缩放指数随样本复杂度和权重衰减变化的详细相图。这一分析揭示了不同缩放机制之间的交叉(crossover)和平台行为,与经验神经缩放文献中广泛报告的现象相呼应。此外,我们建立了这些机制与训练后网络权重谱性质之间的精确联系,并对后者进行了详细刻画。由此,我们为近期的经验观察提供了理论验证,这些观察将权重谱中幂律尾部的出现与网络泛化性能联系起来,从而给出了从第一性原理出发的解释。

英文摘要

Neural scaling laws underlie many of the recent advances in deep learning, yet their theoretical understanding remains largely confined to linear models. In this work, we present a systematic analysis of scaling laws for quadratic and diagonal neural networks in the feature learning regime. Leveraging connections with matrix compressed sensing and LASSO, we derive a detailed phase diagram for the scaling exponents of the excess risk as a function of sample complexity and weight decay. This analysis uncovers crossovers between distinct scaling regimes and plateau behaviors, mirroring phenomena widely reported in the empirical neural scaling literature. Furthermore, we establish a precise link between these regimes and the spectral properties of the trained network weights, which we characterize in detail. As a consequence, we provide a theoretical validation of recent empirical observations connecting the emergence of power-law tails in the weight spectrum with network generalization performance, yielding an interpretation from first principles.

2602.16965 2026-06-05 cs.LG

Multi-Agent Lipschitz Bandits

多智能体Lipschitz老虎机

Sourav Chakraborty, Amit Kiran Rege, Claire Monteleoni, Lijun Chen

发表机构 * University of Colorado Boulder(科罗拉多大学博尔德分校) INRIA Paris(法国国家信息与自动化研究所巴黎中心)

AI总结 本文研究了在连续Lipschitz结构动作空间上的去中心化多玩家随机老虎机问题,其中硬碰撞导致零奖励。研究提出了一种无需通信的策略,旨在最大化集体奖励,同时将协调成本与学习成本分离。该方法通过新颖的极大值导向(maxima-directed)搜索识别高价值区域并将玩家安排到不同区域,再将问题分解为N个独立的单玩家Lipschitz老虎机。在共识模式下,得到端到端的遗憾界,其主导学习项为 ~O(T^{(d+1)/(d+2)}),与单玩家Lipschitz速率匹配;前期协调成本在固定置信度下与时间范围无关,在期望遗憾形式下仅为T的多对数级。在额外的公共覆盖/调度假设下,还获得了与间隙无关的 ~O(T^{(d+1)/(d+2)}) 保证。研究进一步推导了主导学习项的匹配下界,并将框架扩展到一般的距离阈值碰撞模型。

Comments Twenty-Ninth Annual Conference on Artificial Intelligence and Statistics (AISTATS 2026)

详情
AI中文摘要

我们研究了在连续、Lipschitz-结构化动作空间上的去中心化多玩家随机老虎机问题,其中硬碰撞导致零奖励。我们的目标是设计一种无需通信的策略,以最大化集体奖励,同时将协调成本与学习成本分开。我们提出了一种模块化协议,首先通过新颖的maxima-directed搜索识别并安排玩家到不同的高价值区域,然后将问题分解为N个独立的单玩家Lipschitz老虎机。在共识模式下,我们得到了端到端的regret界,其主导学习项为~O(T^{(d+1)/(d+2)}),与单玩家Lipschitz速率匹配;前期协调成本在固定置信度下与时间无关,仅在期望regret形式中为多项式对数。在额外的公共覆盖/调度假设下,我们还获得了无间隙~O(T^{(d+1)/(d+2)})保证。我们进一步推导了主导学习项的匹配下界,并将框架扩展到一般距离阈值碰撞模型。

英文摘要

We study the decentralized multi-player stochastic bandit problem over a continuous, Lipschitz-structured action space where hard collisions yield zero reward. Our objective is to design a communication-free policy that maximizes collective reward, while separating coordination costs from learning costs. We propose a modular protocol that first solves the multi-agent coordination problem by identifying and seating players on distinct, high-value regions via a novel maxima-directed search and then decouples the problem into \(N\) independent single-player Lipschitz bandits. In the consensus regime, we obtain an end-to-end regret bound whose dominant learning term is \(\tilde{O}(T^{(d+1)/(d+2)})\), matching the single-player Lipschitz rate; the upfront coordination cost is horizon-independent at fixed confidence and only polylogarithmic in \(T\) in the expected-regret form. Under an additional public coverage/scheduling assumption for the epochic extension, we also obtain a gap-free \(\tilde{O}(T^{(d+1)/(d+2)})\) guarantee. We further derive a matching lower bound for the dominant learning term and extend the framework to general distance-threshold collision models.
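To give a feel for the rate quoted above, the following sketch evaluates the exponent (d+1)/(d+2) and the resulting growth T^((d+1)/(d+2)) for a few values of d. It assumes d denotes the dimension of the action space, drops all constants and the logarithmic factors hidden by the tilde, and uses hypothetical function names; it is arithmetic on the stated bound, not an implementation of the protocol.

```python
from fractions import Fraction

def lipschitz_regret_exponent(d: int) -> Fraction:
    """Exponent (d+1)/(d+2) of T in the dominant learning term."""
    if d < 0:
        raise ValueError("dimension must be non-negative")
    return Fraction(d + 1, d + 2)

def regret_growth(T: int, d: int) -> float:
    """T^((d+1)/(d+2)), ignoring constants and logarithmic factors."""
    return T ** float(lipschitz_regret_exponent(d))

for d in (1, 2, 4, 8):
    exponent = lipschitz_regret_exponent(d)
    print(f"d={d}: exponent {exponent}, T=10^6 gives about {regret_growth(10**6, d):.3g}")
```

The exponent is always below 1, so the bound is sublinear in T, but it approaches 1 as d grows, which reflects how learning on a continuous action space becomes harder in higher dimensions.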

2602.16149 2026-06-05 cs.CV

Toward Trustworthy Portrait Editing: Evaluation of Demographic Misrepresentation in I2I Models

迈向可信的人像编辑:评估 I2I 模型中的人口统计误表示

Huichan Seo, Minki Hong, Sieun Choi, Jihie Kim, Jean Oh

发表机构 * arXiv

AI总结 本文通过控制基准测试,评估了指令引导的图像到图像编辑器中身份保留失败的两个模式(软擦除和刻板印象替换),发现肤色变浅等偏差普遍存在且人口统计不均,并提出提示级约束作为缓解措施。

Comments 22 pages, 10 figures. Huichan Seo, Minki Hong and Sieun Choi contributed equally

详情
AI中文摘要

指令引导的图像到图像(I2I)编辑器越来越多地用于消费者和专业视觉工作流程,其可信度不仅取决于提示遵循性,还取决于与身份相关属性的公平保留。我们形式化了两种失败模式:软擦除,即请求的编辑被弱实现或静默抑制;以及刻板印象替换,即编辑引入未请求的、符合刻板印象的人口统计属性。使用包含5,040张编辑人像的控制基准,我们通过视觉语言模型评分和人工评估,评估了这些失败在三个近期开源编辑器中的表现。结果表明,身份保留失败普遍存在且人口统计不均。特别是,62-71%的输出表现出肤色变浅,其中印度和黑人源人像受影响率为72-75%,而白人源人像为44%,表明当身份约束未明确指定时,输出层面存在向更浅肤色或更白人外观漂移的趋势。在一项缓解案例研究中,提示级外观约束将非白人源人像的种族变化评分降低了最多1.48分,而白人源人像基本不变,且无需修改模型权重。这些发现表明,身份保留并非I2I人像编辑系统的统一属性,而是一种分布不均的可信度失败,具有直接的社会后果。在部署规模上,这种静默扭曲可能塑造AI中介的自我表征并强化表征差异。我们引入了一种用于公平性感知评估和生成式编辑系统治理的控制审计协议。项目页面:https://seochan99.github.io/i2i-demographic-bias

英文摘要

Instruction-guided image-to-image (I2I) editors are increasingly used in consumer and professional visual workflows, where trustworthiness depends not only on prompt compliance but also on equitable preservation of identity-relevant attributes. We formalize two failure modes: Soft Erasure, where requested edits are weakly realized or silently suppressed, and Stereotype Replacement, where edits introduce unrequested, stereotype-consistent demographic attributes. Using a controlled benchmark of 5,040 edited portraits, we evaluate these failures across three recent open-weight editors with vision-language model scoring and human evaluation. Our results show that identity-preservation failures are pervasive and demographically uneven. In particular, 62-71% of outputs exhibit skin lightening, with Indian and Black source portraits affected at 72-75%, compared with 44% for White source portraits, indicating output-level drift toward lighter or more White-presenting appearances when identity constraints are underspecified. In a mitigation case study, prompt-level appearance constraints reduce race-change scores for non-White source portraits by up to 1.48 points, while leaving White source portraits largely unchanged, without modifying model weights. These findings show that identity preservation is not a uniform property of I2I portrait editing systems, but an unevenly distributed trustworthiness failure with direct social consequences. At deployment scale, such silent distortions can shape AI-mediated self-representation and reinforce representational disparities. We introduce a controlled audit protocol for fairness-aware evaluation and governance of generative editing systems. Project page: https://seochan99.github.io/i2i-demographic-bias

2507.12257 2026-06-05 cs.LG physics.data-an stat.ML stat.OT

Robust Causal Discovery in Real-World Time Series with Power-Laws

具有幂律特性的现实时间序列中的鲁棒因果发现

Matteo Tusoni, Giuseppe Masi, Andrea Coletta, Aldo Glielmo, Viviana Arrigoni, Novella Bartolini

发表机构 * Department of Computer Science, Sapienza University of Rome(罗马大学计算机科学系)

AI总结 本文提出了一种基于幂律谱特征提取的鲁棒因果发现方法,以提高在现实时间序列中因果关系发现的鲁棒性,该方法在合成数据集和真实数据集上均优于现有方法。

详情
AI中文摘要

在随机时间序列中探索因果关系是一项具有广泛应用(包括金融、经济、神经科学和气候科学)的挑战性但至关重要的任务。许多因果发现(CD)算法已被提出;然而,它们通常对噪声高度敏感,在真实数据中导致虚假的因果推断。在本文中,我们观察到许多现实时间序列的频率谱遵循幂律分布,这主要是由于内在的自组织行为。利用这一见解,我们构建了一种基于提取幂律谱特征的鲁棒CD方法,以放大真实的因果信号。我们的方法在合成基准和具有已知因果结构的真实数据集上均优于最先进的替代方法,证明了其鲁棒性和实际相关性。

英文摘要

Exploring causal relationships in stochastic time series is a challenging yet crucial task with a vast range of applications, including finance, economics, neuroscience, and climate science. Many algorithms for Causal Discovery (CD) have been proposed; however, they often exhibit a high sensitivity to noise, resulting in spurious causal inferences in real data. In this paper, we observe that the frequency spectra of many real-world time series follow a power-law distribution, notably due to an inherent self-organizing behavior. Leveraging this insight, we build a robust CD method based on the extraction of power-law spectral features that amplify genuine causal signals. Our method consistently outperforms state-of-the-art alternatives on both synthetic benchmarks and real-world datasets with known causal structures, demonstrating its robustness and practical relevance.
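The observation that a frequency spectrum "follows a power law" means S(f) is roughly proportional to f^(-beta), which appears as a straight line of slope -beta on log-log axes. The sketch below fits beta by ordinary least squares in log-log space on a synthetic, noise-free spectrum. The function name, the synthetic data, and the choice of a plain least-squares fit are assumptions for illustration; the abstract does not describe how the paper extracts its power-law spectral features or how they feed into causal discovery.

```python
import math

def fit_power_law_exponent(freqs, power):
    """Least-squares slope of log(power) against log(freq).

    Returns beta such that power is approximately proportional to freq ** -beta.
    """
    xs = [math.log(f) for f in freqs]
    ys = [math.log(p) for p in power]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var

# Synthetic spectrum with a known exponent (beta = 1.5) and no noise.
freqs = [k / 100.0 for k in range(1, 51)]
spectrum = [2.0 * f ** -1.5 for f in freqs]

beta = fit_power_law_exponent(freqs, spectrum)
print(f"estimated beta: {beta:.6f}")
```

On real, noisy spectra the fit would typically be restricted to a frequency band where the log-log relationship is approximately linear.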