2606.01162 2026-06-03 cs.AI

Deft Scheduling of Dynamic Cloud Workflows with Varying Deadlines via Mixture-of-Experts

基于混合专家模型的动态云工作流截止时间感知调度

Ya Shen, Gang Chen, Hui Ma, Mengjie Zhang

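Affiliations School of Engineering and Computer Science, Victoria University of Wellington

AI summary Proposes DEFT, a Mixture-of-Experts deep reinforcement learning scheduling policy whose graph-adaptive gating mechanism routes decisions dynamically, reducing execution cost and deadline violations.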

Comments This paper has been accepted by the Fourteenth International Conference on Learning Representations (ICLR 2026)

AI中文摘要

云计算中的工作流调度需要将动态到达、图结构且具有不同截止时间的工作流智能地分配到不断变化的虚拟机资源上。然而,现有的深度强化学习调度器受限于僵化的单路径推理架构,难以处理多样化的调度场景。我们引入了 \textbf{DEFT}(截止时间感知的混合专家模型),一种创新的深度强化学习策略架构,利用专门的混合专家模型,每个专家被训练用于管理不同级别的截止时间紧迫性。据我们所知,DEFT是首个引入并验证用于动态云工作流调度的混合专家模型架构。通过自适应地将决策路由到最合适的专家,DEFT能够满足单个专家无法实现的广泛截止时间要求。DEFT的核心是一种 \textbf{图自适应}门控机制,该机制编码工作流截止时间和DAG、任务状态以及虚拟机条件,使用交叉注意力以细粒度、截止时间敏感的方式指导专家激活。在动态云工作流基准上的实验表明,DEFT显著降低了执行成本和截止时间违反率,优于多个最先进的深度强化学习基线。

英文摘要

Workflow scheduling in cloud computing demands the intelligent allocation of dynamically arriving, graph-structured workflows with varying deadlines onto ever-changing virtual machine resources. However, existing deep reinforcement learning (DRL) schedulers remain limited by rigid, single-path inference architectures that struggle to handle diverse scheduling scenarios. We introduce $\textbf{DEFT}$ ($\textbf{D}$eadline-p$\textbf{E}$rceptive Mixture-o$\textbf{F}$-Exper$\textbf{t}$s), an innovative DRL policy architecture that leverages a specialized mixture of experts, each trained to manage different levels of deadline tightness. To our knowledge, DEFT is the first to introduce and validate a Mixture-of-Experts architecture for dynamic cloud workflow scheduling. By adaptively routing decisions through the most appropriate experts, DEFT is capable of meeting a broad spectrum of deadline requirements that no single expert can achieve. Central to DEFT is a $\textbf{graph-adaptive}$ gating mechanism that encodes workflow DAGs, task states, and VM conditions, using cross-attention to guide expert activation in a fine-grained, deadline-sensitive manner. Experiments on dynamic cloud workflow benchmarks demonstrate that DEFT significantly reduces execution cost and deadline violations, outperforming multiple state-of-the-art DRL baselines.
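As a rough illustration of routing a scheduling decision through a few experts, the sketch below applies generic top-k softmax gating and mixes per-expert VM scores. It is not DEFT's graph-adaptive, cross-attention gate: the function names, the hand-written gate logits, the toy scores, and the choice of k are all assumptions made for this example.

```python
import math

def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(gate_logits, k):
    """Keep the k experts with the largest gate logits and renormalize
    their weights. Returns {expert_index: weight}. (Hypothetical helper.)"""
    ranked = sorted(range(len(gate_logits)),
                    key=lambda i: gate_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([gate_logits[i] for i in chosen])
    return dict(zip(chosen, weights))

def mix_expert_scores(expert_scores, routing):
    """Weighted sum of each routed expert's score for every candidate VM."""
    num_vms = len(expert_scores[0])
    mixed = [0.0] * num_vms
    for expert, weight in routing.items():
        for vm in range(num_vms):
            mixed[vm] += weight * expert_scores[expert][vm]
    return mixed

# Assumed toy inputs: three experts (say tight, medium, and loose deadlines)
# each scoring two candidate VMs for the task being scheduled.
gate_logits = [2.0, 0.0, -1.0]
expert_scores = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
routing = route_top_k(gate_logits, k=2)
mixed = mix_expert_scores(expert_scores, routing)
best_vm = max(range(len(mixed)), key=lambda vm: mixed[vm])
```

In a learned policy the gate logits would come from an encoder over the workflow and VM state rather than being fixed by hand; here they are constants so the routing step can be read in isolation.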

2606.01111 2026-06-03 cs.LG

LeAP: Learnable Adaptive Permutation for Feature Selection in Heterogeneous and Sparse Recommender Systems

LeAP: 面向异构稀疏推荐系统的可学习自适应特征选择排列

Yihong Huang, Chen Chu, Fei Chen, Yu Lin, Ruiduan Li, Zhihao Li

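Affiliations Bilibili Inc.

AI summary Targets heterogeneous features, extreme sparsity, and costly permutation in industrial recommender systems with LeAP, a learnable adaptive permutation module that turns random permutation into a learnable mechanism and adds adaptive regularization for efficient feature selection; it reaches state-of-the-art results on four public datasets and is deployed in an industrial search ranking model serving over a billion daily requests.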

AI中文摘要

现代工业推荐系统依赖数千种异构特征——从低维标量(如统计值)到高维嵌入(如用户ID嵌入、MLP表示)——以实现高精度预测。鉴于训练相关的巨大计算成本,高效的特征选择至关重要。然而,现有方法面临三个主要瓶颈:(1)它们通常假设特征维度统一或需要昂贵的映射到固定大小;(2)它们难以处理极度稀疏性,其中大多数特征(例如99%以上)保持默认值;(3)传统的基于排列的方法在大规模设置中计算成本过高。为了解决这些挑战,我们提出了LeAP(可学习自适应排列),一种新颖的、模型无关的即插即用特征选择模块。LeAP将低效的随机排列过程转化为可学习机制,显著加速了特征重要性的评估。此外,我们引入了一种针对异构维度和极度稀疏性定制的自适应正则化策略,使得在非对称输入空间中获得优越的特征重要性排序结果。在四个公开推荐数据集上的实验表明,LeAP达到了最先进的性能。此外,LeAP已部署在一个大规模工业搜索排序模型中,该模型每天处理超过十亿次请求,模型参数规模达2TB。在这个涉及12000多个总特征维度的实际场景中,LeAP成功识别并移除了超过3600个冗余维度,且性能没有下降,其能力是基线方法的2到10倍。

英文摘要

Modern industrial recommender systems rely on thousands of heterogeneous features -- ranging from low-dimensional scalars (e.g., statistical value) to high-dimensional embeddings (e.g., user-id embeddings, MLP representations) -- to achieve high-precision predictions. Given the immense computational costs associated with training, efficient feature selection is critical. However, existing methods encounter three primary bottlenecks: (1) they typically assume uniform feature dimensions or require costly mapping to a fixed size; (2) they struggle with extreme sparsity, where the majority of features (e.g., 99%+) remain at default values; and (3) traditional permutation-based approaches are computationally prohibitive in large-scale settings. To address these challenges, we propose LeAP (Learnable Adaptive Permutation), a novel, model-agnostic plug-in module for feature selection. LeAP transforms the inefficient random permutation process into a learnable mechanism, significantly accelerating the evaluation of feature importance. In addition, we introduce an adaptive regularization strategy tailored for heterogeneous dimensions and extreme sparsity, enabling superior feature importance ranking results across asymmetric input spaces. Experiments on four public recommendation datasets demonstrate that LeAP achieves state-of-the-art performance. Furthermore, LeAP has been deployed in a large-scale industrial search ranking model with over a billion daily requests and a 2TB model parameter scale. In this real-world scenario involving 12,000+ total feature dimensions, LeAP successfully identified and removed over 3,600 redundant dimensions without performance degradation, which is 2 to 10 times the ability of compared baseline methods.
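For context, the classical random-permutation importance that the abstract describes as too expensive at scale can be sketched as follows. This is the textbook baseline, not LeAP's learnable mechanism; the toy model, data, and function names are assumptions for illustration only.

```python
import random

def mean_squared_error(model, rows, targets):
    """Average squared error of model(row) against the targets."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, rng):
    """Importance of a feature = loss increase after shuffling its column.
    Needs one full evaluation pass per feature."""
    base = mean_squared_error(model, rows, targets)
    importances = []
    for col in range(len(rows[0])):
        column = [r[col] for r in rows]
        rng.shuffle(column)  # break the link between this feature and the target
        permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, column)]
        importances.append(mean_squared_error(model, permuted, targets) - base)
    return importances

# Assumed toy setup: the target depends only on feature 0,
# so feature 1 is redundant by construction.
rows = [[float(i), float(i % 3)] for i in range(20)]
targets = [2.0 * r[0] for r in rows]
model = lambda r: 2.0 * r[0]
importances = permutation_importance(model, rows, targets, random.Random(0))
```

The loop makes the scaling problem visible: with thousands of features of differing dimensionality, each importance estimate costs another pass over the evaluation data.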

2606.01075 2026-06-03 cs.CL

On the Generalization Gap in Self-Evolving Language Model Reasoning

自进化语言模型推理中的泛化差距

Zhenting Qi, Susanna Maria Baby, Stefanie Anna Baby, Kan Yuan, Andrew Tomkins, Tu Vu, Da-Cheng Juan, Cyrus Rashtchian

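Affiliations Google Research; Harvard University; Google; Virginia Tech

AI summary Studies the gap between self-evolution algorithms and oracle supervision under a strict closed-loop setup, finding that multi-turn critic-revision with large models can approach oracle-supervised performance while self-evolution overall still leaves a non-trivial gap.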

Comments Published at ICML 2026

AI中文摘要

近期研究表明,大型语言模型(LLMs)可以通过自进化(SE)来改进,即使用模型自身生成的监督信号。在这项工作中,我们提出疑问:在严格的闭环设置下,自进化算法只能访问未标记的提示集和基础模型,内部生成的监督能多大程度接近完美监督训练?我们在统一的离线自进化框架中分析了四种代表性策略:单轮验证、带反馈的多轮修正、迭代训练和课程学习。我们的主要实验使用骑士与无赖(KK)逻辑推理任务,该任务提供确定性解决方案、可控难度级别以及易于泛化的干净测试平台。我们首先表明,自进化始终优于基础模型,但在投入过多训练计算后趋于平稳,最终仍与完美监督存在非平凡差距。我们发现,使用大型模型的多轮批评-修正可以达到强大的自进化性能,其中Gemma 12B几乎与完美监督训练相匹配。除了骑士与无赖任务,我们还在现实世界的推理基准上评估了自进化,其增益也较为有限。总体而言,我们的结果描述了闭环自进化何时能够提供帮助,并表明在这种最小化设定下,内部生成的监督仍然不足。

英文摘要

Recent work suggests that large language models (LLMs) can improve through self-evolution (SE), using supervision signals generated by the model itself. In this work, we ask: under a strict closed-loop setup, where the self-evolution algorithm has access only to an unlabeled prompt set and a base model, how close can internally generated supervision come to oracle-supervised training? We analyze four representative strategies in a unified offline self-evolution framework: single-round verification, multi-turn revision with feedback, iterative training, and curriculum learning. Our primary experiments use Knights and Knaves (KK) logical reasoning tasks, which provide deterministic solutions, controlled difficulty levels, and a clean testbed for easy-to-hard generalization. We first show that self-evolution consistently improves over the base model, but plateaus after excessive training compute is invested, and eventually still leaves a non-trivial gap to oracle supervision. We find that multi-turn critic-revision with large models can reach strong self-evolution performance, with Gemma 12B nearly matching oracle-supervised training. Beyond Knights and Knaves, we also evaluate self-evolution on real-world reasoning benchmarks, where gains are also modest. Overall, our results characterize when closed-loop self-evolution can help and show how internally generated supervision remains insufficient under this minimal formulation.

2606.01013 2026-06-03 cs.AI cs.AR

Can AI Review Improve Paper Drafting? An Empirical Study on 20 Computer Architecture Submissions

AI审稿能否改进论文撰写?基于20篇计算机体系结构投稿的实证研究

Di Wu

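Affiliations University of Central Florida

AI summary Defines alignment metrics and builds the AI-Paper-Review tool for a case study of 20 computer architecture papers, finding that AI review covers a significant fraction of human-raised issues and also surfaces issues that human reviewers missed, and uses this to examine the potential and limits of AI review for improving paper drafts.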

Comments 12 pages, 12 figures

AI中文摘要

随着人工智能(AI)的发展,研究进展比以往任何时候都快;相应的研究论文也是如此。AI生成论文数量的激增给同行评审带来了压力,导致AI生成的评审可能被广泛但隐蔽地使用。然而,关于保密性、质量和公平性的相关伦理问题已被提出,且广泛的研究社区尚未达成共识。我们预计这一争论将持续一段时间,但与此同时,我们提出一个替代性的实际问题:\textit{AI审稿能否改进论文撰写?} 我们研究了20篇计算机体系结构论文,这些论文的投稿历史各不相同,以揭示AI审稿与人类审稿的对齐程度,并通过我们定义的一组指标进行量化。为了进行案例研究,我们构建了一个集成Web UI的工具——\emph{AI-Paper-Review},该工具可生成论文草稿的结构化AI评审,网址为https://github.com/unarylab/ai-paper-review。该工具从多样化的AI审稿人池中选择若干AI审稿人,并根据评审意见的共性和重要性对其评论进行聚类和排序。它还允许将AI评论与人类评论对齐,以促进基于指标的验证。案例研究表明,AI审稿可以覆盖人类提出的大部分问题,但也提出了人类评审中遗漏的问题。本文并非旨在鼓励在当前阶段使用AI进行同行评审,而是研究(1)AI审稿如何改进论文撰写,以及(2)基于AI的同行评审的潜力与局限。发布该工具和案例研究数据旨在激发未来关于这一主题的研究。滥用于同行评审将违反主要学术场所的伦理政策。

英文摘要

Research is advancing faster than ever with artificial intelligence (AI), and so is the volume of corresponding research papers. The exploding volume of AI-generated papers has put a strain on peer review, leading to the use of AI-generated reviews, potentially widespread yet covert. However, ethical concerns about confidentiality, quality, and fairness have been raised, and no consensus has been reached in the broader research community. We expect the debate to continue for a while, but in the meantime, we ask an alternative, practical question: \textit{can AI review improve paper drafting?} We study 20 computer architecture papers, with varying levels of submission lineage, to expose how well AI review aligns with human review, quantified by a set of metrics we define. To conduct the case study, we build a web UI-integrated tool, \emph{AI-Paper-Review}, that generates structured AI reviews of a draft paper, available at https://github.com/unarylab/ai-paper-review. This tool selects several AI reviewers from a diverse pool, then clusters and ranks their comments based on commonality and importance. It also allows aligning AI comments with human comments to facilitate metric-based validation. The case study shows that AI review can cover a significant fraction of human-raised issues, and also raises issues missing from human review. This paper is not intended to encourage using AI for peer review at the current stage, but to study (1) how AI review can improve paper drafting and (2) the potential and limitations of AI-based peer review. The release of the tool and the case study data is intended to stimulate future research on this topic. Misuse for peer review would violate the ethics policies of major academic venues.

2606.00809 2026-06-03 cs.AI

NBQ: Next-Best-Question for Dynamic Profiling

NBQ: 动态画像中的下一最佳问题

Yimin Shi, Clarice Wang, Haixun Wang, Xiaokui Xiao

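Affiliations National University of Singapore; University of Pennsylvania; EvenUp

AI summary Proposes the NBQ framework, which builds user profiles from dialogue by adaptively selecting the question with the highest expected information gain, and introduces QuickMatch to accelerate reciprocal matching.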

Journal ref KDD 2026

AI中文摘要

许多真实世界的知识发现对话场景,包括播客、招聘面试和市场,都需要对一个人进行有目的的理解。我们研究了下一最佳问题(NBQ)问题:在每一轮中,面试官应根据已学到的内容和对话目标,提出预期信息增益最高的问题。我们提出了NBQ,一个即插即用的框架,它生成多样化的候选问题池,维护一个紧凑且持续更新的用户状态,在轮次预算内自适应选择下一个问题,并将得到的自由形式对话提炼为结构化的基于向量的用户画像。作为一个高要求的应用,我们将NBQ实例化用于双向匹配,其中兼容性必须是相互的,并且每个人由自我描述和对应偏好表示共同建模。为了支持大规模匹配,我们进一步引入了QuickMatch,一个高效的检索层,将双向匹配从二次成对评分转换为近似向量搜索。实验表明,NBQ在AC@T和AR@T上分别将用户画像质量提高了13.6%和14.0%,而QuickMatch将检索速度提高了22.9倍,召回率高达0.989。

英文摘要

Many real-world conversational settings for knowledge discovery, including podcasts, hiring screens, and marketplaces, require a purpose-driven understanding of a person. We study the Next-Best-Question (NBQ) problem: at each turn, an interviewer should ask the question with the highest expected information gain given what has already been learned and the conversation goal. We propose NBQ, a plug-and-play framework that seeds a diverse pool of candidate questions, maintains a compact and continuously updated user state, adaptively selects the next question within a turn budget, and distills the resulting free-form dialogue into a structured vector-based user profile. As a demanding application, we instantiate NBQ for reciprocal matchmaking, where compatibility must be mutual and each person is modeled by both self-description and counterpart-preference representations. To support large-scale matching, we further introduce QuickMatch, an efficient retrieval layer that recasts reciprocal matching from quadratic pairwise scoring to approximate vector search. Experiments show that NBQ improves user profiling quality by up to 13.6% and 14.0% in AC@T and AR@T, respectively, while QuickMatch accelerates retrieval by up to 22.9x with recall up to 0.989.
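The selection rule in the problem statement, asking the question with the highest expected information gain, can be sketched for a small discrete belief. This is a minimal sketch under stated assumptions: the belief over user "hypotheses", the answer likelihood tables, and every function name are hypothetical, and the NBQ framework's actual candidate generation and state tracking are not reproduced here.

```python
import math

def entropy(probabilities):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def expected_information_gain(prior, likelihood):
    """Prior entropy minus expected posterior entropy.
    likelihood[h][a] = P(answer a | hypothesis h)."""
    gain = entropy(prior)
    for a in range(len(likelihood[0])):
        joint = [prior[h] * likelihood[h][a] for h in range(len(prior))]
        p_answer = sum(joint)
        if p_answer > 0:
            posterior = [j / p_answer for j in joint]
            gain -= p_answer * entropy(posterior)
    return gain

def next_best_question(prior, questions):
    """Pick the question whose answer is expected to be most informative."""
    return max(questions,
               key=lambda q: expected_information_gain(prior, questions[q]))

# Assumed toy belief: two equally likely user types.
prior = [0.5, 0.5]
questions = {
    "A": [[1.0, 0.0], [0.0, 1.0]],   # answer fully separates the two types
    "B": [[0.5, 0.5], [0.5, 0.5]],   # answer is independent of the type
}
best = next_best_question(prior, questions)
```

After an answer arrives, the matching posterior would become the next turn's prior, which is how a turn budget gets spent on the questions that still reduce uncertainty.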

2606.00757 2026-06-03 cs.LG

RADE: Random Add-Drop Edge as a Regularizer

RADE: 随机增删边作为正则化器

Danial Saber, Amirali Salehi-Abari

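Affiliations University of California, Berkeley

AI summary Proposes RADE, a random add-drop edge method that addresses both overfitting and over-squashing of long-range information in graph neural networks, regularizes without distribution shift by aligning training and inference, and adapts its addition and deletion rates during training.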

Comments 27 pages, ICML 2026

AI中文摘要

图神经网络(GNN)存在过拟合和长程信息过压缩的问题。随机图增强(如边删除)通过正则化训练来缓解过拟合,但会导致训练-推理错位,且无法改善过压缩。相反,重连方法通过改善连通性来缓解过压缩,但并非设计用于正则化训练。我们提出随机增删边(RADE),一种同时删除和添加边的随机图增强方法,以同时解决过拟合和过压缩。RADE被证明能够对齐训练和推理,使得随机增强在无分布偏移的情况下正则化训练,同时在推理时支持长程通信。我们进一步提出并研究了一种小批量梯度范数平衡算法,该算法在训练过程中自适应调整删除和添加率,使得RADE在实践中无需超参数。在节点和图分类基准上的实验表明,RADE是一种强大的正则化器,并能缓解过压缩。消融实验支持了训练-推理对齐、自适应率选择以及随机边删除和边添加的互补作用。

英文摘要

Graph Neural Networks (GNNs) suffer from overfitting and over-squashing of long-range information. Stochastic graph augmentations (e.g., edge deletion) regularize training against overfitting but can introduce train-inference misalignment and do not improve over-squashing. In contrast, rewiring methods improve connectivity to mitigate over-squashing, but are not designed to regularize training. We propose Random Add-Drop Edge (RADE), a stochastic graph augmentation method that jointly drops and adds edges to address both overfitting and over-squashing simultaneously. RADE is provably designed to align training and inference so that random augmentations regularize training without distribution shift, while supporting long-range communication at inference. We further propose and study a mini-batch gradient-norm balancing algorithm that adapts deletion and addition rates during training, rendering RADE hyperparameter-free in practice. Experiments on node- and graph-classification benchmarks show that RADE is a strong regularizer and mitigates over-squashing. Ablations support the roles of train-inference alignment, adaptive rate selection, and the complementary effects of random edge deletion and edge addition.
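The basic augmentation step, dropping some existing edges and adding some missing ones at random, might look like the sketch below. It assumes an undirected simple graph and independent per-edge coin flips with fixed rates; RADE's train-inference alignment and its gradient-norm rate adaptation are not shown, and the function name and parameters are hypothetical.

```python
import itertools
import random

def random_add_drop_edges(num_nodes, edges, p_drop, p_add, rng):
    """Return a new undirected edge set: each existing edge is dropped with
    probability p_drop, each absent node pair is added with probability p_add."""
    existing = {tuple(sorted(e)) for e in edges}
    augmented = set()
    for u, v in itertools.combinations(range(num_nodes), 2):
        if (u, v) in existing:
            if rng.random() >= p_drop:   # keep the edge
                augmented.add((u, v))
        elif rng.random() < p_add:       # create a shortcut edge
            augmented.add((u, v))
    return augmented

# Assumed toy graph: a path 0-1-2 plus an isolated node 3.
path_edges = [(0, 1), (1, 2)]
sample = random_add_drop_edges(4, path_edges, p_drop=0.2, p_add=0.1,
                               rng=random.Random(0))
```

Enumerating all node pairs is quadratic in the number of nodes, so this form only suits small graphs; a practical version would sample added edges directly.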

2606.00555 2026-06-03 cs.AI q-bio.BM

Probe Before You Edit: Probing-Guided Molecular Optimization for LLM Agents in Structure-Based Drug Design

编辑前先探测:基于探测引导的分子优化用于基于结构的药物设计中的LLM代理

Zaifei Yang, Weiyu Chen, Yaqing Wang, James Kwok

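Affiliations The Hong Kong University of Science and Technology; City University of Hong Kong; Beijing Institute of Mathematical Sciences and Applications

AI summary Proposes PROBE, a framework that guides LLM agents in molecular optimization by probing how the pocket-ligand complex responds to edits, easing the conflict between binding affinity and druggability and reaching state-of-the-art performance on CrossDocked2020.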

AI中文摘要

基于结构的药物设计越来越多地使用LLM代理来迭代优化针对目标口袋的配体,然而一个可行的配体必须满足两个常常相互冲突的目标——结合亲和力和成药性——而单次优化步骤很少能同时改善两者。为了量化这一困难,我们引入了两个诊断指标:第一个衡量单次编辑同时改善两个目标的频率,第二个衡量一个目标上的增益伴随另一个目标上的损失的频率。将这些诊断应用于当前的LLM代理流程,揭示了一个一致的失败模式:代理在不知道口袋-配体复合物如何响应局部修改的情况下进行分子编辑,因此很少实现联合改进。受药物化学家的启发,他们在选择优化方向之前通过受控的类似物编辑来探测口袋-配体复合物,我们提出了PROBE,一个围绕编辑响应探测构建的优化框架。PROBE首先将配体分解为可编辑位点,并构建一个口袋特异的位点图,标记出联合增益可能的位置、两个目标可能存在冲突的位置以及应改变责任子结构的位置;然后执行受控的探测编辑,将其响应提炼为编辑手册。在位点图和编辑手册的指导下,PROBE运行一个迭代的多代理循环,其中亲和力代理、成药性代理和协同优化代理共同产生编辑。在CrossDocked2020基准上,PROBE实现了最先进的性能,并显著缓解了我们的诊断指标暴露的失败模式。

英文摘要

Structure-based drug design increasingly employs LLM agents to iteratively refine ligands against a target pocket, yet a viable ligand must satisfy two often-conflicting objectives -- binding affinity and druggability -- which single optimization steps rarely improve together. To quantify this difficulty, we introduce two diagnostic metrics: the first measures how often a single edit improves both objectives, and the second measures how often a gain on one objective comes with a loss on the other. Applying these diagnostics to current LLM-agent pipelines exposes a consistent failure mode: the agent performs molecular editing without knowing how the pocket-ligand complex responds to local modifications, thus rarely achieving joint improvement. Inspired by medicinal chemists, who probe the pocket-ligand complex with controlled analog edits before choosing an optimization direction, we propose \textbf{PROBE}, an optimization framework built around edit-response probing. PROBE first decomposes the ligand into editable sites and builds a pocket-specific \textbf{site map} that flags where joint gains are plausible, where the two objectives are likely in tension, and where liability substructures should be changed; it then performs controlled probe edits whose responses are distilled into an \textbf{EditManual}. Guided by the site map and EditManual, PROBE runs an iterative multi-agent loop in which an affinity agent, a druggability agent, and a co-optimization agent jointly produce edits. On the CrossDocked2020 benchmark, PROBE achieves state-of-the-art performance and substantially mitigates the failure modes exposed by our diagnostic metrics.

2606.00542 2026-06-03 cs.LG

Rethinking Bregman Divergences in Kronecker-Factored Optimizers

重新思考Kronecker因子优化器中的Bregman散度

Bing Liu, Wenjie Zhou, Chengcheng Zhao

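Affiliations College of Control Science and Engineering, Zhejiang University; State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences

AI summary Uses the spectrum of the covariance matrix to study how different Bregman divergences (Frobenius, von Neumann, LogDet) distribute the Kronecker approximation error, and proposes a subspace-aware Kronecker optimizer that applies eigenvalue-based preconditioning in the top subspace and an adaptive isotropic acceleration constant in the bottom subspace.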

AI中文摘要

Shampoo风格的优化器使用Kronecker因子结构近似梯度协方差矩阵。最近的工作~\cite{lin2026understanding}表明,这种近似可以视为Bregman矩阵散度下的投影,从而得到不同的Kronecker因子预条件子。然而,当协方差并非精确Kronecker因子化时,散度选择的作用仍不清楚。我们通过协方差矩阵的谱来研究这个问题。我们表明,Frobenius、von Neumann和LogDet散度将不可避免的Kronecker近似误差以不同方式分布在协方差谱上。我们进一步表明,它们的Kronecker因子由散度加权残差而非原始近似误差主导,解释了这些谱偏好如何在所得预条件子中实现。实验上,我们观察到顶部协方差特征空间与Hessian矩阵的对齐程度显著更好,而尾部谱则更加嘈杂且不可靠。受这些发现启发,我们提出一种子空间感知的Kronecker优化器,在顶部子空间应用基于特征值的预处理,在底部子空间使用自适应各向同性加速常数。

英文摘要

Shampoo-style optimizers approximate gradient covariance matrices using Kronecker-factored structures. Recent work~\cite{lin2026understanding} showed that such approximations can be viewed as projections under Bregman matrix divergences, leading to different Kronecker-factored preconditioners. However, it remains unclear what role the choice of divergence plays when the covariance is not exactly Kronecker-factored. We study this question through the spectrum of the covariance matrix. We show that Frobenius, von Neumann, and LogDet divergences distribute the unavoidable Kronecker approximation error differently across the covariance spectrum. We further show that their Kronecker factors are governed by divergence-weighted residuals rather than the raw approximation error, explaining how these spectral preferences are realized in the resulting preconditioners. Empirically, we observe that the top covariance eigenspace is substantially better aligned with the Hessian matrix, while the tail spectrum is much noisier and unreliable. Motivated by these findings, we propose a subspace-aware Kronecker optimizer that applies eigenvalue-based preconditioning in the top subspace and uses an adaptive isotropic acceleration constant in the bottom subspace.
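For readers unfamiliar with the three divergences named above, their standard textbook forms for symmetric positive definite $n \times n$ matrices $X$ and $Y$ are collected below. The argument order, scaling constants, and notation are assumptions here and may differ from the paper's conventions.

```latex
\begin{align*}
D_\phi(X, Y) &= \phi(X) - \phi(Y) - \operatorname{tr}\!\big(\nabla\phi(Y)^{\top}(X - Y)\big) \\
\phi(X) = \|X\|_F^2 &\;\Rightarrow\; D_{\mathrm{F}}(X, Y) = \|X - Y\|_F^2 \\
\phi(X) = \operatorname{tr}(X \log X - X) &\;\Rightarrow\; D_{\mathrm{vN}}(X, Y) = \operatorname{tr}(X \log X - X \log Y - X + Y) \\
\phi(X) = -\log\det X &\;\Rightarrow\; D_{\mathrm{LD}}(X, Y) = \operatorname{tr}(X Y^{-1}) - \log\det(X Y^{-1}) - n
\end{align*}
```

The Frobenius form penalizes absolute differences in every entry, whereas the von Neumann and LogDet forms depend on logarithms or ratios of the spectra, which gives some intuition for why each could place the unavoidable approximation error in a different part of the spectrum.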

2606.00494 2026-06-03 cs.LG

ProjQ: Project-and-Quantize for Adapter-Aware LLM Compression

ProjQ:面向适配器感知的大语言模型压缩的投影与量化

Wenya Yu, Chao Zhang, Li Wang, Samson Lasaulce, Merouane Debbah

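Affiliations University of California, Berkeley

AI summary Proposes ProjQ, a framework that constrains quantization noise to a low-rank manifold via orthogonal subspace projection and uses an alternating algorithm to offload the dominant error components to the adapter, improving both quantization error compensation and downstream fine-tuning.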

Comments Accepted paper at ICML 2026

AI中文摘要

训练后量化(PTQ)和低秩适配(LoRA)构成了高效大语言模型(LLM)部署的标准流程。然而,顺序应用它们会带来一个问题:PTQ常常留下分散(在模型权重中)的随机噪声,LoRA难以轻易修复,这意味着LoRA最终会浪费其有限的容量来试图修复不可校正的噪声,而不是提高任务性能。在本文中,我们提出了 \textbf{ProjQ},一种通过正交子空间投影将量化噪声约束到低秩流形的新框架。我们推导出一种高效的交替算法,将量化噪声塑造成低秩结构,有效地将主导误差分量卸载给后续适配器,同时最小化正交“不可校正”子空间中的残差误差。我们的理论分析表明,与标准PTQ相比,ProjQ为下游任务保留了严格更大的模型可塑性。在LLaMA-2、Qwen2.5和Qwen3上的大量实验证实,ProjQ在量化误差补偿和下游任务微调方面均持续优于现有方法,在补偿方面实现了高达$2\times$的评估损失降低,并且仅用3比特就达到了标准4比特基线在语言建模任务上的性能。代码可在https://github.com/yy9301/ProjQ获取。

英文摘要

Post-Training Quantization (PTQ) and Low-Rank Adaptation (LoRA) constitute the standard pipeline for efficient Large Language Model (LLM) deployment. However, applying them sequentially poses a problem: PTQ often leaves behind random noise that is spread out (across the model's weights) in a way LoRA can't easily fix, meaning that LoRA ends up wasting its limited capacity trying to fix uncorrectable noise instead of improving task performance. In this paper, we propose \textbf{ProjQ}, a novel framework for constraining quantization noise to the low-rank manifold via orthogonal subspace projection. We derive an efficient alternating algorithm that shapes the quantization noise into a low-rank structure, effectively offloading dominant error components to the subsequent adapter while minimizing the residual error in the orthogonal "uncorrectable" subspace. Our theoretical analysis demonstrates that ProjQ preserves strictly greater model plasticity for downstream tasks compared to standard PTQ. Extensive experiments on LLaMA-2, Qwen2.5 and Qwen3 confirm that ProjQ consistently outperforms existing methods in both quantization error compensation and downstream task fine-tuning, achieving up to $2\times$ lower evaluation loss for compensation and matching the performance of standard 4-bit baselines on language modeling tasks with only 3 bits. The code is available on https://github.com/yy9301/ProjQ .
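The motivating observation, that a low-rank adapter can only absorb the low-rank part of the quantization error, can be illustrated with a small pure-Python sketch. It quantizes a toy weight matrix by round-to-nearest, then measures how much error remains after removing the leading rank-1 component. This is not ProjQ's alternating algorithm; the quantizer, the rank-1 "adapter", the toy weights, and all names are assumptions for illustration.

```python
import math

def quantize(weights, step):
    """Round-to-nearest uniform quantization (a stand-in for a PTQ method)."""
    return [[round(w / step) * step for w in row] for row in weights]

def matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def frobenius_norm(matrix):
    return math.sqrt(sum(x * x for row in matrix for x in row))

def leading_right_singular_vector(matrix, iters=200):
    """Power iteration on M^T M. Assumes the all-ones start vector is not
    orthogonal to the leading singular direction."""
    transposed = [list(col) for col in zip(*matrix)]
    v = [1.0] * len(matrix[0])
    for _ in range(iters):
        w = matvec(transposed, matvec(matrix, v))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v

def rank_one_residual(error):
    """Subtract the (approximately) best rank-1 component E v v^T from E,
    i.e. the part a rank-1 adapter could represent."""
    v = leading_right_singular_vector(error)
    ev = matvec(error, v)
    return [[e - ev[i] * v[j] for j, e in enumerate(row)]
            for i, row in enumerate(error)]

# Assumed toy weights and a coarse quantization step.
weights = [[0.12, -0.47, 0.33], [0.81, 0.05, -0.29]]
quantized = quantize(weights, step=0.25)
error = [[w - q for w, q in zip(wr, qr)] for wr, qr in zip(weights, quantized)]
residual = rank_one_residual(error)
correctable_share = 1.0 - frobenius_norm(residual) / frobenius_norm(error)
```

When rounding noise is spread evenly across singular directions, `correctable_share` stays small for a low rank, which is the situation the abstract describes; shaping the noise so that more of it falls in the leading directions is the stated goal of the method.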

2606.00489 2026-06-03 cs.CV

3D Segment Anything Model with Visual Mamba for Diagnosing Placenta Accreta Spectrum

基于视觉Mamba的3D分割一切模型用于诊断胎盘植入谱

Yuliang Zhang, Fang He, Lulu Peng, Tianyu Yan, Pingping Zhang, Ting Song, Lili Du, Dunjin Chen

Affiliations Department of Obstetrics and Gynecology, The Third Affiliated Hospital, Guangzhou Medical University; Department of Obstetrics, Guangzhou Women and Children’s Medical Center, Guangzhou Medical University; Department of Radiology, The Third Affiliated Hospital, Guangzhou Medical University; School of Future Technology, Dalian University of Technology

AI summary Proposes 3DSAMba, a framework combining a 3D SAM, an adapter mechanism, Multi-Level Aggregation Mamba, and a Fusion State Space Model, which segments lesion areas in MRI images to support automatic diagnosis of Placenta Accreta Spectrum.

Comments Accepted by IEEE Transactions on Image Processing (TIP2026). More modifications may be performed

AI中文摘要

胎盘植入谱(PAS)是一种罕见但高度危险的产科疾病。早期准确的PAS诊断对孕产妇健康至关重要。传统的PAS诊断依赖于经验丰富的医生分析剖宫产史和磁共振成像(MRI)数据。然而,地市级医院往往缺乏准确诊断PAS的专业知识和资源。为应对这些挑战,我们建立了首个基于MRI的PAS数据集,包含细粒度分割和分类标注。同时,通过从子宫MRI图像中分割病灶区域,可以显著增强PAS诊断。为了实现自动PAS诊断,我们提出了3DSAMba,一种新颖的特征学习框架,用于有效的病灶分割。具体来说,我们首先设计了3D分割一切模型(SAM),并通过高效的适配器机制将医学领域信息融入模型。此外,我们引入了多级聚合Mamba(MLAM)来聚合不同层次的特征图,以及融合状态空间模型(FSSM)来融合来自编码器和解码器的多尺度特征。最后,我们通过逐元素乘法将分割掩码应用于原始MRI图像,有效隔离病灶区域,以实现更准确的PAS诊断。大量实验验证了我们的框架显著提升了PAS诊断性能。为促进PAS诊断的进一步研究,我们在https://github.com/Drchip61/PASD上发布了数据集和源代码。

英文摘要

Placenta Accreta Spectrum (PAS) is a rare but highly dangerous obstetric disease. Early and accurate PAS diagnosis is critical for maternal health. Traditional PAS diagnosis relies on experienced doctors analyzing cesarean history and Magnetic Resonance Imaging (MRI) data. However, district-level hospitals often lack the expertise and resources for accurate PAS diagnosis. To address these challenges, we establish the first MRI-based PAS dataset, which includes both fine-grained segmentation and classification annotations. Meanwhile, PAS diagnosis can be significantly enhanced by segmenting lesion areas from MRI images of the uterus. To achieve automatic PAS diagnosis, we propose 3DSAMba, a novel feature learning framework for effective lesion segmentation. More specifically, we first design a 3D Segment Anything Model (SAM) and incorporate medical domain information into the model through an efficient adapter mechanism. In addition, we introduce a Multi-Level Aggregation Mamba (MLAM) to aggregate feature maps across different levels and a Fusion State Space Model (FSSM) to fuse multi-scale features from both the encoder and decoder. Finally, we apply segmentation masks to the original MRI images through element-wise multiplication, effectively isolating lesion areas for more accurate PAS diagnosis. Extensive experiments validate that our framework significantly improves PAS diagnostic performance. To facilitate further research in PAS diagnosis, we have released the dataset and source code at https://github.com/Drchip61/PASD.

2606.00395 2026-06-03 cs.LG cs.AI

PR2: Predictive Routing Replay for MoE-Based LLM Reinforcement Learning

PR2: 基于MoE的大语言模型强化学习中的预测性路由重放

Daize Dong, Junlin Chen, Haolong Jia, Jiang Liu, Jiawei Wu, Huanwei Di, Jialian Wu, Zhengzhong Liu, Zicheng Liu, Emad Barsoum, Dimitris N. Metaxas, Hongyi Wang

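Affiliations Rutgers University; AMD; MBZUAI

AI summary Addresses the instability caused by router drift in reinforcement learning for MoE LLMs with Predictive Routing Replay, which uses a lightweight evolution predictor to reduce routing mismatch and improve training stability and performance.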

AI中文摘要

混合专家(MoE)大语言模型(LLM)在规模上实现了强大的性能。然而,基于MoE的LLM的强化学习(RL)常常遭受训练不稳定性。一个根本原因是路由器漂移,即专家激活可能在模型更新时发生剧烈变化,并且在分解的推出和训练阶段之间不同,导致PPO风格RL算法中出现大的推出-训练不匹配和不稳定的重要性采样权重。路由重放通过在每个推理轨迹内冻结重放路由来缓解这个问题,但它忽略了路由器在离策略更新下如何演化,从而导致路由器过时。为了解决这个限制,我们提出了预测性路由重放(PR2),它为每个路由器配备了一个轻量级的演化预测器,学习预测短时域的路由器演化。在推出阶段,我们使用预测性路由分布来应用top-$k$路由,使梯度能够到达更新后可能激活的专家。在训练阶段,我们重放由此产生的预测路由,以保持一致性,从而实现稳定的重要性估计。理论分析和实验支持PR2减少了由路由引起的不匹配,提高了RL稳定性,并在各种推理基准上取得了更强的性能。

英文摘要

Mixture of Experts (MoE) Large Language Models (LLMs) achieve strong performance at scale. However, reinforcement learning (RL) on MoE-based LLMs often suffers from training instability. A root cause is router drift, i.e., expert activations can change drastically across model updates and differ between disaggregated rollout and training phases, causing large rollout--training mismatch and unstable importance sampling weights in PPO-style RL algorithms. Routing replay mitigates this issue by freezing the replay route within each reasoning trajectory, but it ignores how the router evolves under off-policy updates and thus causes router staleness. To address this limitation, we propose Predictive Routing Replay (PR2), which augments each router with a lightweight evolution predictor that learns to anticipate short-horizon router evolution. During the rollout phase, we use the predictive routing distribution to apply top-$k$ routing, enabling gradients to reach experts that are likely to become active after updates. During the training phase, we replay the resulting predicted route to retain consistency for stable importance estimation. Theoretical analysis and experiments support that PR2 reduces routing-induced mismatch, improves RL stability, and yields stronger performance across various reasoning benchmarks.

2606.00366 2026-06-03 cs.LG math.OC

GLENS: Global Search via Learning from Solver Iterates with Diffusion Models

GLENS: 通过扩散模型从求解器迭代中学习进行全局搜索

Anjian Li, Bartolomeo Stellato, Ryne Beeson

Affiliations Department of Electrical and Computer Engineering, Princeton University; Department of Operations Research and Financial Engineering, Princeton University; Department of Mechanical and Aerospace Engineering, Princeton University

AI summary Proposes GLENS, which uses diffusion models to learn local geometry from solver iterates and generates high-quality, diverse initial guesses that speed up global search for multimodal non-convex optimization problems.

AI中文摘要

我们考虑为多模态非凸连续优化问题的局部最小值生成大量初始猜测的问题。目标是这些初始猜测质量高(即数值求解器快速收敛)且多样化(即代表许多不同的局部最小值)。识别多个局部最优解能够实现灵活的下游决策,但通常需要昂贵的全局搜索。现有的数据驱动方法仅使用离线求解器运行中最终收敛的最优值来预测初始猜测,这丢弃了关于解局部邻域的信息,并限制了可用的训练数据。我们提出GLENS(通过从求解器迭代中学习进行全局搜索),一种数据高效的全局搜索方法,利用中间求解器迭代作为免费的数据增强。GLENS由两个组件组成:邻域结构模型,使用扩散模型学习以问题参数为条件的最优值周围的局部几何结构;以及求解器行为模型,学习细化方向,在扩散采样期间进一步引导样本朝向附近的最优值。在修改的非凸基准问题和双机器人避障导航问题上的实验表明,GLENS生成高质量的初始猜测,同时保留了多样局部最优值的多模态分布。生成的初始猜测在不同问题设置和求解器中导致更快的求解器收敛。我们还分析了关键超参数选择对性能的影响。

英文摘要

We consider the problem of generating a large collection of initial guesses for local minima of multimodal non-convex continuous optimization problems. The goal is for these initial guesses to be high-quality (i.e., a numerical solver converges quickly) and diverse (i.e., represent many different local minima). Identifying multiple locally optimal solutions enables flexible downstream decision-making, but typically requires expensive global search. Existing data-driven methods predict initial guesses using only the final converged optima from offline solver runs, which discards information about the local neighborhoods of solutions and limits the available training data. We propose GLENS (Global Search via Learning from Solver Iterates), a data-efficient global search method that leverages intermediate solver iterates as free data augmentation. GLENS consists of two components: a neighborhood structure model that uses diffusion models to learn the local geometry around optima conditioned on problem parameters, and a solver behavior model that learns refinement directions to further guide samples towards nearby optima during diffusion sampling. Experiments on modified non-convex benchmark problems and a two-robot obstacle-avoidance navigation problem show that GLENS generates high-quality initial guesses while preserving the multimodal distribution of diverse local optima. The resulting initial guesses lead to faster solver convergence across different problem settings and solvers. We also analyze how key hyperparameter choices affect the performance.

2606.00351 2026-06-03 cs.CV

UniVerse: A Unified Modulation Framework for Segmentation-Free, Disentangled Multi-Concept Personalization

UniVerse:一种用于无分割、解耦多概念个性化的统一调制框架

Quynh Phung, Sandesh Ghimire, Minsi Hu, Chung-Chi Tsai, Jia-Bin Huang

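Affiliations University of Maryland, College Park; Qualcomm Technologies, Inc.

AI summary Proposes UniVerse, a framework that uses unified modulation in diffusion transformers for segmentation-free, disentangled multi-concept personalization, improving both localization accuracy and visual fidelity.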

Comments https://universe-personalization.github.io/

AI中文摘要

个性化视觉理解已取得显著进展,但当输入图像包含多个对象时,现有方法难以定位和提取特定概念。许多先前方法严重依赖基于分割的监督或表现出较差的组合泛化能力,限制了它们准确解耦和操作单个概念的能力。在这项工作中,我们提出了UniVerse,一种用于扩散变换器中无分割、解耦多概念个性化的统一调制框架。我们的方法允许可组合和可分解的概念提取,无需显式分割掩码即可实现目标对象的细粒度定位和表示。UniVerse学习将复杂场景分解为特定于概念的表示,然后以统一的方式组合它们,从而在多样化的视觉上下文中实现鲁棒的个性化。通过在多个基准上的大量实验,我们证明UniVerse在定位精度和视觉保真度方面均显著优于最先进的基线。定性和定量结果表明,我们的方法可以在杂乱场景中精确提取目标概念,为更灵活、可解释和个性化的视觉生成与理解铺平道路。

英文摘要

Personalized visual understanding has advanced significantly, yet existing approaches struggle to localize and extract specific concepts when input images contain multiple objects. Many prior methods rely heavily on segmentation-based supervision or exhibit poor compositional generalization, limiting their ability to accurately disentangle and manipulate individual concepts. In this work, we propose UniVerse, a Unified Modulation Framework for segmentation-free, disentangled multi-concept personalization in diffusion transformers. Our method allows for composable and decomposable concept extraction, enabling fine-grained localization and representation of target objects without explicit segmentation masks. UniVerse learns to decompose complex scenes into concept-specific representations and then compose them in a unified manner, enabling robust personalization across diverse visual contexts. Through extensive experiments on multiple benchmarks, we demonstrate that UniVerse significantly outperforms state-of-the-art baselines in both localization accuracy and visual fidelity. Qualitative and quantitative results show that our approach can precisely extract target concepts in cluttered scenes, paving the way for more flexible, interpretable, and personalized visual generation and understanding.

2606.00321 2026-06-03 cs.CV

Training-Free Object-Agnostic Jam Detection in Fulfillment Centers

无训练、对象无关的配送中心堵塞检测

Ruiliang Liu, Tina Dongxu Li, Joshua Migdal, Fernando Ruch, Kenneth Meszaros, Moses Trevor Dardik

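Affiliations Amazon, USA

AI summary Proposes an object-agnostic jam detection method that needs no training or labeled data and identifies jams by monitoring persistent occlusion of reference points, reaching 100% precision and a 93.33% F1 score on 1,069 videos.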

Comments 4 pages, 6 figures. Accepted at the 2026 IEEE International Conference on Automation Science and Engineering (CASE 2026) as a presentation-only paper

AI中文摘要

在配送中心,各种物体从入库到出库连续移动,可能因传送带摩擦过大、方向错误或机械故障而堵塞。传统的堵塞检测方法依赖目标检测模型识别物体,然后使用跟踪算法(如IoU重叠和卡尔曼滤波)监控运动。这种流程需要数千个手动标注,耗时约两周,且仅限于已标注的物体类别。我们提出一种无需训练、对象无关的堵塞检测方法,消除了对标注数据的需求。我们的方法在没有物体时在监控区域内均匀采样参考点。当物体遮挡这些点时,我们检测到运动。当足够多的点被遮挡超过时间阈值时,我们将事件分类为堵塞。与传统的点跟踪(将遮挡视为失败情况)不同,我们的方法将遮挡重新用作检测信号,监控参考点是否持续被遮挡,而不是跟踪它们移动到哪里。我们在1069个视频上的实验评估表明,AllTracker实现了100.00%的精度和93.33%的F1分数,显著优于经典的稀疏跟踪方法,同时保持无需训练的部署。该方法具有三个关键优势:(1)无需训练数据或手动标注,(2)对象无关地泛化到任意物体类型,(3)显著减少开发时间。

英文摘要

In fulfillment centers, diverse objects move continuously from inbound to outbound operations and can become jammed due to excessive conveyor friction, incorrect orientation, or mechanical failures. Traditional jam detection approaches rely on object detection models to identify objects, followed by tracking algorithms (such as IoU overlap and Kalman filtering) to monitor motion over time. This pipeline requires thousands of manual annotations, consuming approximately two weeks of effort, and is limited to annotated object classes. We present a training-free, object-agnostic jam detection method that eliminates the need for labeled data. Our approach uniformly samples reference points within the monitoring region when no objects are present. As objects occlude these points, we detect motion. When a sufficient fraction remains occluded beyond a temporal threshold, we classify the event as a jam. Unlike conventional point tracking--which treats occlusion as a failure case--our approach repurposes occlusion as a detection signal, monitoring whether reference points remain persistently occluded rather than tracking where they move. Our experimental evaluation on 1,069 videos demonstrates that AllTracker achieves 100.00% precision and 93.33% F1 score, significantly outperforming classical sparse tracking methods while maintaining training-free deployment. This approach offers three key advantages: (1) no training data or manual annotations, (2) object-agnostic generalization to arbitrary object types, and (3) significantly reduced development time.
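The decision rule described above, flagging a jam when enough reference points stay occluded for long enough, can be sketched in a few lines. The sketch assumes a point tracker has already produced a per-frame occlusion flag for each reference point; the function name, thresholds, and toy sequences are hypothetical and not taken from the paper.

```python
def detect_jam(occlusion_frames, min_fraction, min_frames):
    """occlusion_frames: one non-empty list of booleans per video frame,
    True where a reference point is occluded. Returns True once the occluded
    fraction stays at or above min_fraction for min_frames consecutive frames."""
    streak = 0
    for frame in occlusion_frames:
        fraction = sum(frame) / len(frame)
        streak = streak + 1 if fraction >= min_fraction else 0
        if streak >= min_frames:
            return True
    return False

# Assumed toy sequences with four reference points each.
passing_object = [[True, True, False, False]] * 2 + [[False] * 4] * 3
stuck_object = [[True, True, True, False]] * 4

passing_is_jam = detect_jam(passing_object, min_fraction=0.5, min_frames=3)
stuck_is_jam = detect_jam(stuck_object, min_fraction=0.5, min_frames=3)
```

Because the rule only asks whether points are hidden, not what hides them, nothing in it depends on object class, which matches the object-agnostic claim.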

2606.00096 2026-06-03 cs.CV cs.AI

Diversity Over Frequency: Rethinking Tool Use in Visual Chain-of-Thought Agents

多样性优于频率:重新思考视觉思维链智能体中的工具使用

Dong-Hee Kim, Reuben Tan, Donghyun Kim

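Affiliations * University of California, Berkeley; University of Cambridge; University of Toronto

AI summary Studies tool use by visual chain-of-thought agents on complex reasoning tasks, identifies a tool-use collapse phenomenon, and proposes an entropy regularization term that improves reasoning performance by encouraging diverse exploration.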

Comments Presented in ICML 2026

详情
AI中文摘要

视觉智能体在视觉思维链中使用外部视觉工具来整合细粒度证据。虽然先前的工作主要研究这些工具在视觉搜索任务中的应用,但它们在更复杂的视觉推理中的作用仍未充分探索。在本文中,我们超越简单的视觉搜索任务,研究更具挑战性的任务,包括3D空间推理和医学视觉问答,其中智能体必须将工具获取的局部证据与全局上下文整合。我们识别出一种工具使用崩溃现象:模型逐渐停止使用工具,同时仍能获得更高的任务准确率。此外,我们观察到明显的不对称性:(i) 完全消除工具使用会降低性能,而(ii) 激励工具使用仅带来边际收益,尽管使用量大幅增加。我们发现,普通训练和工具使用鼓励都降低了展开多样性,这解释了为什么更高的工具使用不会带来更强的推理性能。受这些发现的启发,我们添加了一个熵正则化项来鼓励多样化的展开探索,尽管工具使用逐渐下降,但实现了最佳性能。总体而言,我们的发现表明了一种训练时工具作为支架的观点,其中对语言生成和视觉工具调用的更广泛探索改善了推理,尽管存在工具使用崩溃。项目页面:https://scaffolded-exploration.github.io

英文摘要

Visual agents employ external visual tools within visual chains of thought to incorporate fine-grained evidence. While prior work has mainly studied these tools in visual search tasks, their role in more complex visual reasoning remains underexplored. In this paper, we move beyond simple visual search tasks to investigate more challenging tasks, including 3D spatial reasoning and medical visual question answering, where agents must integrate tool-acquired local evidence with the global context. We identify a tool-use collapse phenomenon: models progressively stop using tools while still achieving higher task accuracy. Moreover, we observe a clear asymmetry: (i) completely eliminating tool use degrades performance, whereas (ii) incentivizing tool use yields only marginal gains despite substantially increasing usage. We find that vanilla training and tool-use encouragement both reduce rollout diversity, explaining why higher tool use does not yield stronger reasoning performance. Motivated by these findings, we add an entropy regularization term to encourage diverse rollout exploration, achieving the best performance despite gradually declining tool usage. Overall, our findings suggest a training-time view of tools as scaffolding, where broader exploration over language generation and visual tool invocation improves reasoning despite tool-use collapse. Project page: https://scaffolded-exploration.github.io

2605.31434 2026-06-03 cs.RO

Shaft-integrated Force Sensing with Transformer-based Dynamics Compensation for Telesurgery

基于Transformer动力学补偿的轴集成力传感用于远程手术

Shuyuan Yang, Grant Boone, Timo Markert, Sebastian Matich, Andreas Theissler, Martin Atzmueller, Zonghe Chua

Affiliations * Department of Electrical, Computer, and Systems Engineering, Case Western Reserve University; Department of Mechanical and Aerospace Engineering, Case Western Reserve University; Resense GmbH; Semantic Information Systems Group, Osnabrück University; Justus Liebig University; German Research Center for Artificial Intelligence (DFKI)

AI summary Proposes a method for integrating a six-axis force sensor into the distal end of a standard cable-driven surgical instrument, using a transformer neural network to compensate for internal cable forces and estimate end-effector forces with normalized errors below 6%.

Comments The paper was accepted by IEEE Transactions on Medical Robotics and Bionics in May 2026

详情
AI中文摘要

机器人辅助微创手术(RAMIS)增强了外科医生的灵巧性,新平台利用触觉反馈进一步提高性能。这种力信息具有更广泛的潜力,可用于性能评估、触觉定位和手术自主性。这促使需要将力传感集成到RAMIS工具中的可访问方法。本工作提出了一种将六轴商用力传感器集成到标准缆驱动手术器械远端的方法,在保持设备原始机械功能的同时实现末端执行器力测量。所提出的设计强调可重复性和研究应用的可访问性,无需专门的制造工具。Transformer神经网络将力传感器测量值与机器人状态信息相结合,以帮助估计末端执行器施加的力,补偿由驱动引起的内部缆力。我们提出的方法实现了低于6%的归一化误差,并且比纯近端数据驱动传感方法更好地泛化到未见条件。高内部缆力导致传感器饱和并降低轴向力的可观测性,这可能沿工具主轴和更高负载条件下降低性能。鉴于当前性能水平,系统集成性和性能的平衡使得在RAMIS中触觉反馈、技能评估和力信息自主性等及时主题的应用和研究成为可能。视频和代码可在 https://enhanced-telerobotics.github.io/shaft_force_sensing/ 获取。

英文摘要

Robot-Assisted Minimally Invasive Surgery (RAMIS) enhances surgeon dexterity, with newer platforms leveraging haptic feedback to further improve performance. Such force information has broader potential to inform performance assessment, tactile localization, and surgical autonomy. This motivates the need for accessible approaches to integrating force sensing into RAMIS tools. This work presents a method for integrating a six-axis commercial force sensor into the distal end of a standard cable-driven surgical instrument, enabling end-effector force measurement while preserving the original mechanical functionality of the device. The proposed design emphasizes reproducibility and accessibility for research applications, requiring no specialized manufacturing tools. A transformer neural network integrates force sensor measurements with robot state information to aid estimation of applied forces at the end-effector, compensating for internal cable forces arising from actuation. Our proposed approach achieved normalized errors below 6%, and generalized to unseen conditions better than purely proximal data-driven sensing approaches. High internal cable forces caused sensor saturation and reduced axial force observability, which can degrade performance along the tool's major axis and under higher load conditions. Given current levels of performance, the balance of system integrability and performance enables applications and research into timely topics of haptic feedback, skill assessment, and force-informed autonomy in RAMIS. Videos and code are available at https://enhanced-telerobotics.github.io/shaft_force_sensing/.

2605.31381 2026-06-03 cs.CL

LLM Judges Inconsistently Disagree Across Safety Criteria and Harm Categories

LLM评判模型在不同安全标准和危害类别上的分歧并不一致

Krishnapriya Vishnubhotla, Sowmya Vajjala, Akriti Vij, Isar Nejadgholi

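Affiliations * National Research Council, Canada; IMDA, Singapore

AI summary Evaluates the consistency of automated judges performing multi-dimensional safety evaluation in a reference-free setup. It finds that large language models are unreliable at identifying safety issues in machine-generated advice in regulated domains such as finance, that the degree of inconsistency varies with the safety criterion and with the language and style of the content, and that different judges disagree strongly with one another.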

Comments 8 pages plus appendices, under review

详情
AI中文摘要

我们评估了自动评判模型在无参考设置下进行多维安全评估的一致性。结果表明,大型语言模型在识别金融等受监管领域中机器生成建议的安全问题时并不可靠,尽管它们在识别更明显的不安全/有害内容(如暴力)时更为可靠。模型判断的不一致程度可能因所选安全标准而有显著差异,并且可能受内容语言及其语言风格的影响。最后,对于相同的输出,不同评判模型之间在各领域、安全标准和语言上存在高度分歧。这些发现为使用LLM作为评估者的实践提供了新见解,并就从业者如何在实际场景中使用自动评判模型提出了若干建议。

英文摘要

We evaluate the consistency of automated judges in conducting a multi-dimensional safety evaluation in a reference-free setup. Our results indicate that Large Language Models are unreliable judges in identifying safety issues related to machine-generated advice in regulated domains such as finance, although they are more reliable at identifying more overt forms of unsafe/harmful content such as violence. The degree of inconsistency in a model's judgments can vary significantly by the chosen safety criteria and can be impacted by the language of the content and its linguistic style as well. Finally, there is high disagreement among different judges for the same output, across domains, safety criteria, and languages. These findings provide new insights on the practice of using LLMs as evaluators and offer several recommendations for practitioners on how to use automated judges in practical scenarios.

2605.30952 2026-06-03 cs.LG

Spectral Anatomy of Quantum Gaussian Process Kernels

量子高斯过程核的谱解剖

Jian Xu, Chao Li, Guang Lin, Yuning Qiu, Delu Zeng, John Paisley, Qibin Zhao

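Affiliations * RIKEN iTHEMS; RIKEN AIP; South China University of Technology; Columbia University

AI summary Uses the normalized spectral entropy S(K)/log n to give a unified account of both the loss of exponential speedup and the posterior pathologies in quantum Gaussian process regression, shows that the diagnostic is kernel-agnostic across quantum and classical kernel families, and reports that it transfers to IBM Heron hardware with low error.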

详情
AI中文摘要

两个近期结果重塑了量子高斯过程(QGPs)。一方面,\citet{lowe2025assessing} 排除了在典型、良态条件下基于HHL的QGP回归声称的指数加速;另一方面,一项独立工作表明,高表达性量子核存在后验病理,破坏了贝叶斯优化。我们证明这些看似无关的现象由同一个量控制:核Gram矩阵的归一化谱熵 $S(K)/\log n$。我们证明了Nyström近似误差的Cauchy–Schwarz尾部界、以Bach自由度 $d_σ(K)$ 表示的有限样本方差收缩恒等式,以及通过目标在核本征基中的内在维数对 \emph{依赖于目标} 的最优熵的表征。实验上,该诊断量与核无关:硬件高效、matchgate、IQP \emph{以及} RBF/Matérn/RFF/深度核族在去量子化、ECE和方差收缩面板上全部坍缩到相同的 $S/\log n$ 曲线上。NLL最佳点位于光滑目标的高熵和带限量子数据目标的低熵。该诊断量从模拟器迁移到IBM Heron硬件,在 $n_q = 4$ 的 $24$ 种配置中 $S/\log n$ 的中位绝对误差为 $3.2\%$,平均误差为 $5.2\%$,其中matchgate和IQP的平均误差在 $5\%$ 以内,单个HE配置返回 $30\%$ 的异常值,重新运行时降至 $0.5\%$(归因于校准漂移);相同的诊断量迁移到第二个Heron后端(平均误差 $2.7\%$)以及原始后端上的 $n_q = 6$ 扩展(平均误差 $1.7\%$)。全程未应用误差缓解。

英文摘要

Two recent results have reshaped quantum Gaussian processes (QGPs). On the one hand, \citet{lowe2025assessing} rule out the exponential speedups claimed by HHL-based QGP regression in the typical, well-conditioned regime; on the other, an independent line of work shows that highly expressive quantum kernels suffer posterior pathologies that break Bayesian optimization. We show that these seemingly unrelated phenomena are governed by the same quantity: the normalized spectral entropy $S(K)/\log n$ of the kernel Gram matrix. We prove a Cauchy--Schwarz tail bound on Nyström approximation error, a finite-sample variance-contraction identity in terms of Bach's degrees of freedom $d_σ(K)$, and a characterization of the \emph{target-dependent} optimal entropy via the intrinsic dimension of the target in the kernel eigenbasis. Empirically, the diagnostic is kernel-agnostic: hardware-efficient, matchgate, IQP \emph{and} RBF/Matérn/RFF/deep-kernel families all collapse onto identical $S/\log n$ curves on dequantization, ECE, and variance-contraction panels. The NLL sweet spot lives at high entropy for smooth targets and at low entropy for band-limited quantum-data targets. The diagnostic transfers from simulator to IBM Heron hardware with median absolute error $3.2\%$ and mean $5.2\%$ in $S/\log n$ across $24$ configurations at $n_q = 4$, with matchgate and IQP within $5\%$ mean and a single HE configuration returning a $30\%$ outlier that drops to $0.5\%$ on rerun (attributed to calibration drift); the same diagnostic transfers to a second Heron backend (mean error $2.7\%$) and to a $n_q = 6$ scale-up on the original backend (mean error $1.7\%$). No error mitigation is applied throughout.
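The diagnostic $S(K)/\log n$ named in this abstract can be sketched in a few lines of Python. The abstract does not spell out the definition, so this sketch assumes the common one: the Shannon entropy of the Gram-matrix eigenvalues after normalizing them to sum to one, divided by $\log n$. The function name is hypothetical, and the eigenvalues are assumed to have been computed beforehand (for example with `numpy.linalg.eigvalsh`).

```python
import math
from typing import Sequence


def normalized_spectral_entropy(eigenvalues: Sequence[float]) -> float:
    """Assumed definition: S(K) = -sum_i p_i log p_i with
    p_i = lambda_i / sum_j lambda_j, returned as S(K) / log n."""
    n = len(eigenvalues)
    if n < 2:
        raise ValueError("need at least two eigenvalues so that log n > 0")
    # Clip tiny negative values that eigensolvers can return for PSD matrices.
    clipped = [max(v, 0.0) for v in eigenvalues]
    total = sum(clipped)
    if total <= 0.0:
        raise ValueError("eigenvalues must have positive sum")
    entropy = 0.0
    for v in clipped:
        p = v / total
        if p > 0.0:  # 0 * log 0 is treated as 0
            entropy -= p * math.log(p)
    return entropy / math.log(n)


# A flat spectrum gives the maximum value; a rank-one spectrum gives the minimum.
print(normalized_spectral_entropy([1.0, 1.0, 1.0, 1.0]))
print(normalized_spectral_entropy([4.0, 0.0, 0.0, 0.0]))
```

Under this assumed definition the value lies between 0 (all spectral mass on one eigenvalue) and 1 (a flat spectrum), which makes it comparable across Gram matrices of different sizes.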

2605.30915 2026-06-03 cs.CV

DiTTo: Scalable Order-aware All-in-One Image Restoration Agent

DiTTo: 可扩展的排序感知全能图像修复智能体

Seungho Choi, Jihyong Oh

Affiliations * CMLab, Chung-Ang University

AI summary Proposes the DiTTo framework, which uses a simulator to build the optimal restoration-trajectory dataset efficiently and applies order-aware alignment so that restoration experts can be added in a plug-and-play manner, achieving state-of-the-art multi-degradation restoration quality among agent-based methods.

Comments Please visit our project page at https://cmlab-korea.github.io/DiTTo/

详情
AI中文摘要

真实世界的图像很少只遭受单一退化,且退化去除的顺序显著影响最终修复质量,这推动了基于智能体的图像修复(IR),其中视觉语言模型调度一组预构建的修复专家。然而,现有的基于训练的智能体每张图像需要 $\mathcal{O}((N^{\mathbf{D}})^{2})$ 次修复专家调用来构建最优修复动作轨迹数据集(ORTD),其中 $N^{\mathbf{D}}$ 表示宇宙 $\mathbf{D}$ 中的退化类型数量,并且将智能体训练与固定的修复专家池耦合,阻止了在没有完全重新训练的情况下扩展到新引入的修复专家。为了克服这些效率和可扩展性瓶颈,我们提出了 \textbf{DiTTo},一种新颖的排序感知图像修复智能体框架,由 DiTTo 模拟器和 DiTTo 智能体组成。DiTTo 模拟器结合了用于单步修复动作模拟的 $\cup$S-IR 和用于每个动作质量预测的 AiO-IQA,将 ORTD 构建减少到每张图像 $\mathcal{O}(N^{\mathbf{D}})$ 次模拟器调用;DiTTo 智能体通过在模拟器生成的 ORTD 上进行 SFT 训练,随后进行 \textbf{排序感知修复对齐(ORA)},该对齐沿着独立轴对齐退化识别、修复动作排序和输出格式。这实现了 \textbf{即插即用的可扩展扩展性}:添加一个新的修复专家只需要更新轻量级的 ORA 阶段。在最多包含五种并发退化的 MiO-100 评估集上,我们的 DiTTo 智能体在先前基于智能体的 IR 方法中实现了最先进的多退化修复质量。

英文摘要

Real-world images rarely suffer from a single degradation, and the order in which degradations are removed substantially affects the final restoration quality, motivating agent-based image restoration (IR), where a vision-language model schedules a pool of pre-built restoration-experts. However, existing training-based agents require $\mathcal{O}((N^{\mathbf{D}})^{2})$ restoration-expert calls per image to construct the Optimal Restoration-action Trajectory Dataset (ORTD), where $N^{\mathbf{D}}$ denotes the number of degradation types in the universe $\mathbf{D}$, and couple agent training to a fixed restoration-expert pool, preventing extension to newly introduced restoration-experts without full retraining. To overcome these efficiency and extensibility bottlenecks, we propose \textbf{DiTTo}, a novel order-aware image restoration agent framework consisting of the DiTTo Simulator and the DiTTo Agent. The DiTTo Simulator combines $\cup$S-IR for single-step restoration-action simulation and AiO-IQA for per-action quality prediction, reducing ORTD construction to $\mathcal{O}(N^{\mathbf{D}})$ simulator calls per image; the DiTTo Agent is trained by SFT on the simulator-generated ORTD, followed by \textbf{Order-aware Restoration Alignment (ORA)} that aligns degradation identification, restoration-action-ordering, and output format along independent axes. This enables \textbf{plug-and-play scalable extensibility}: adding a new restoration-expert requires updating only the lightweight ORA stage. On the MiO-100 evaluation set with up to five concurrent degradations, our DiTTo Agent achieves state-of-the-art multi-degradation restoration quality among previous agent-based IR methods.

2605.30789 2026-06-03 cs.LG cs.AI

Smaller Models are Natural Explorers for Policy-Level Diversity in GRPO

小型模型是GRPO中策略级多样性的自然探索者

Yiming Ren, Yiran Xu, Zicheng Lin, Chufan Shi, Yukang Chen, Dingdong Wang, Tianhe Wu, Junjie Wang, Yujiu Yang, Yu Qiao, Ruihang Chu

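Affiliations * University of Science and Technology of China

AI summary Proposes the S2L-PO framework, which uses small models as natural explorers to generate rollouts with policy-level diversity and applies a progressive annealing strategy to transition to the large model's own sampling, improving mathematical reasoning performance while reducing rollout compute.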

详情
AI中文摘要

我们识别出增强LLM组相对策略优化(GRPO)中rollout多样性的新维度。虽然GRPO依赖于多样化的rollout,但主流策略主要通过注入更多token级随机性来增加多样性,这可能引入逐步噪声并导致不连贯的轨迹。我们发现,同一模型族中的较小模型固有地表现出更高的策略级多样性,随着样本数量增加,其pass@k优于较大模型。与token级噪声不同,这种多样性在时间上相关,保持逻辑一致性,并为梯度估计提供结构化探索信号。因此,我们提出S2L-PO(从小到大的策略优化)框架,利用固定的小型模型作为自然探索者来训练大型模型。为了平衡探索与利用,我们设计了一种渐进退火策略,从离线的小模型rollout过渡到大型学习者自身的采样。这种转变优雅地避免了由小模型容量限制导致的训练中期性能下降,实现了更快的收敛并解锁了更高的性能上限。S2L-PO在多种数学推理基准上提高了准确率(例如,使用1.7B探索者指导8B模型在AIME 24上提高了8.8%),同时减少了rollout计算量。

英文摘要

We identify a new dimension for enhancing rollout diversity in Group Relative Policy Optimization (GRPO) for LLMs. While GRPO relies on diverse rollouts, prevailing strategies primarily increase diversity by injecting more token-level randomness, which may introduce step-wise noise and lead to incoherent trajectories. We uncover that smaller models within the same model family inherently exhibit higher policy-level diversity, indicated by their superior pass@k relative to larger counterparts as sample counts increase. Unlike token-level noise, this diversity is temporally correlated, preserves logical consistency, and provides structured exploration signals for gradient estimation. We thus propose S2L-PO (Small-to-Large Policy Optimization), a framework that leverages fixed small models as natural explorers to train larger models. To balance exploration and exploitation, we design a progressive annealing strategy that transitions from offline small-model rollouts to the large learner's own sampling. This shift elegantly avoids mid-training performance drops caused by the small model's capacity limits, achieving faster convergence and unlocking a higher performance ceiling. S2L-PO improves accuracy on diverse mathematical reasoning benchmarks (e.g., +8.8% on AIME 24 using a 1.7B explorer to guide the 8B model) while reducing rollout compute.
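One way to picture the progressive annealing described in this abstract is a schedule that shrinks the share of small-model rollouts in each GRPO group as training proceeds. The linear decay, the function name, and the `anneal_fraction` parameter below are assumptions made purely for illustration, since the abstract does not specify the schedule.

```python
def small_model_rollouts(step: int, total_steps: int, group_size: int,
                         anneal_fraction: float = 0.5) -> int:
    """Number of rollouts in a group taken from the fixed small explorer.

    Hypothetical linear schedule: all rollouts come from the small model at
    step 0, and none after `anneal_fraction * total_steps` steps. The rest of
    the group is assumed to be sampled from the large learner itself.
    """
    anneal_steps = max(1, int(total_steps * anneal_fraction))
    share = max(0.0, 1.0 - step / anneal_steps)
    return round(group_size * share)


# Share of small-model rollouts in a group of 8 over a 1000-step run.
for step in (0, 250, 500, 900):
    print(step, small_model_rollouts(step, total_steps=1000, group_size=8))
```

A schedule of this shape hands the group over to on-policy sampling before the small model's capacity limits the learner, which is the motivation the abstract gives for annealing.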

2605.30722 2026-06-03 cs.LG stat.CO stat.ME

Self-Certifying Transport MCMC via Dual Spectral-Gap Certificates

通过双谱间隙证书实现自认证传输MCMC

Jun Hu

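Affiliations * Wuhan University of Technology

AI summary Proposes CerT-MCMC, a framework that uses normalising flows to provide automatic, rigorous convergence certificates for transport MCMC, with spectral-gap bounds supplied by a covering certificate and a quantile-core certificate.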

Comments 35 pages, 3 figures, 9 tables. Submitted to JASA

详情
AI中文摘要

我们提出CerT-MCMC,一个为学习传输马尔可夫链蒙特卡洛配备自动、严格收敛证书的框架。归一化流将高斯参考映射到目标后验的近似;同一流同时作为独立Metropolis-Hastings提议和可计算谱间隙界的基础。我们开发了两个互补的证书。覆盖证书通过有限样本覆盖论证在全提议支撑上界权重比振荡,当保守梯度界可用时产生全支撑谱间隙界;其修正项以O(n^{-1/D})缩放,随着维度增加迅速变弱并最终无效。我们证明了一个匹配的Omega(n^{-1/D})下界,确立这一障碍是逐点Lipschitz认证固有的。分位数核心证书将注意力限制在高概率残差核心上,其振荡由一维经验分位数控制,具有O(n^{-1/2})的有限样本概率松弛,与维数无关。在合成目标(D=2-20)、结构工程后验(D=6,8)、心脏病数据集上的真实数据逻辑回归(D=13)以及合成贝叶斯逻辑回归(D=20)上,分位数核心证书在覆盖证书无效时提供了非平凡的谱间隙界,其谱间隙代理在7%内跟踪经验有效样本量。一个阴性对照实验证实,证书以超过10倍的因子区分流质量,而接受率仅相差1.15倍。据我们所知,双证书框架是第一个为学习传输MCMC提供自动、维度感知收敛证书的框架,区分了真正的传输失败与证明技术限制。

英文摘要

We propose CerT-MCMC, a framework that equips learned-transport Markov chain Monte Carlo with automatic, rigorous convergence certificates. A normalising flow maps a Gaussian reference to an approximation of the target posterior; the same flow then serves as both the independence Metropolis-Hastings proposal and the basis for a computable spectral-gap bound. We develop two complementary certificates. The covering certificate bounds the weight-ratio oscillation over the full proposal support via finite-sample covering arguments, yielding full-support spectral-gap bounds when a conservative gradient bound is available; its correction term scales as O(n^{-1/D}), making it rapidly weak and eventually vacuous as dimension increases. We prove a matching Omega(n^{-1/D}) lower bound, establishing that this barrier is intrinsic to pointwise Lipschitz certification. The quantile-core certificate restricts attention to a high-probability residual core on which the oscillation is controlled by one-dimensional empirical quantiles, with a finite-sample probability slack of O(n^{-1/2}), independent of the ambient dimension. On synthetic targets (D=2-20), structural-engineering posteriors (D=6,8), real-data logistic regression on the Heart Disease data set (D=13), and synthetic Bayesian logistic regression (D=20), the quantile-core certificate delivers non-vacuous spectral-gap bounds where the covering certificate is vacuous, and its spectral-gap proxy tracks empirical effective sample sizes within 7%. A negative control experiment confirms that the certificate discriminates flow quality by a factor exceeding 10x, whereas acceptance rates differ by only 1.15x. To our knowledge, the dual-certificate framework is the first to provide automatic, dimension-aware convergence certificates for learned-transport MCMC, distinguishing genuine transport failure from proof-technique limitations.
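The sampler that the certificates in this abstract are built around is independence Metropolis-Hastings, in which every proposal is drawn from a fixed distribution and accepted with probability $\min(1, w(y)/w(x))$ for the weight ratio $w = \pi/q$. The sketch below is a generic one-dimensional version that assumes a plain Gaussian proposal in place of the learned normalising flow. The function and argument names are hypothetical, and none of the paper's certificates are computed here.

```python
import math
import random
from typing import Callable, List, Tuple


def independence_mh(log_target: Callable[[float], float],
                    sample_proposal: Callable[[random.Random], float],
                    log_proposal: Callable[[float], float],
                    n_steps: int,
                    rng: random.Random) -> Tuple[List[float], float]:
    """Run independence Metropolis-Hastings; return (chain, acceptance rate)."""
    x = sample_proposal(rng)
    log_w_x = log_target(x) - log_proposal(x)  # log weight ratio at x
    chain: List[float] = []
    accepted = 0
    for _ in range(n_steps):
        y = sample_proposal(rng)               # proposal ignores the current state
        log_w_y = log_target(y) - log_proposal(y)
        log_ratio = log_w_y - log_w_x
        # Accept with probability min(1, w(y) / w(x)).
        if log_ratio >= 0.0 or rng.random() < math.exp(log_ratio):
            x, log_w_x = y, log_w_y
            accepted += 1
        chain.append(x)
    return chain, accepted / n_steps


# Standard normal target (unnormalized) with a wider Gaussian proposal.
chain, rate = independence_mh(
    log_target=lambda x: -0.5 * x * x,
    sample_proposal=lambda r: r.gauss(0.0, 2.0),
    log_proposal=lambda x: -0.5 * (x / 2.0) ** 2,
    n_steps=1000,
    rng=random.Random(0),
)
print(len(chain), rate)
```

When the proposal matches the target exactly the weight ratio is constant and every proposal is accepted, which is why the oscillation of the weight ratio is the natural quantity to bound.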

2605.27762 2026-06-03 cs.AI

PEAM: Parametric Embodied Agent Memory through Contrastive Internalization of Experience in Minecraft

PEAM: 通过经验对比内化的参数化具身智能体记忆在Minecraft中的应用

Yuchen Guo, Junli Gong, Weicheng Wang, Hongmin Cai, Yiu-ming Cheung, Weifeng Su

Affiliations * Northwestern University; Northeastern University; South China University of Technology; Hong Kong Baptist University; Beijing Normal - Hong Kong Baptist University

AI summary Proposes the PEAM framework, which contrastively internalizes failure-correction trajectory pairs to turn experience into parametric skills, enabling continual learning and efficient execution for embodied agents in Minecraft.

详情
AI中文摘要

我们提出了PEAM,一个在Minecraft中的参数化具身智能体记忆框架,它将智能体记忆从推理时检索转变为通过经验内化的参数驻留技能。PEAM将一个用于开放推理的慢速思考LLM与一个用于反射性执行已巩固技能的快速参数化模块配对。快速模块是一个多模态专家混合LoRA架构,具有按类别物理隔离的适配器,实现了无灾难性遗忘的参数级持续学习。我们将失败视为第一类训练信号:失败-纠正轨迹对通过联合行为克隆和对比目标进行内化,因此智能体不仅学习什么成功,还学习纠正动作与失败动作的区别。为了控制巩固,PEAM引入了参数化值得分来决定哪些经验应被内化,以及一个无尺度的自触发巩固机制来决定何时内化,无需任务特定的手动调整阈值,使智能体能够自我进化,因为触发器可以在任务分布之间转移而无需重新调整。在Minecraft中的实验表明,PEAM提高了长时域任务性能,减轻了对先前巩固技能的遗忘,并提高了参数化与检索效率,优于基于检索的具身智能体和参数化记忆变体。

英文摘要

We present PEAM, a Parametric Embodied Agent Memory framework in Minecraft that transforms agent memory from inference-time retrieval into parameter-resident skills internalized through experience. PEAM pairs a slow deliberative LLM for open-ended reasoning with a fast parametric module for reflexive execution of consolidated skills. The fast module is a multimodal Mixture-of-Experts LoRA architecture with per-category physically isolated adapters, enabling parameter-level continual learning without catastrophic forgetting. We treat failure as a first-class training signal: failure--correction trajectory pairs are internalized through a joint behavioral-cloning and contrastive objective, so the agent learns not only what succeeds but also how corrected actions differ from failed ones. To govern consolidation, PEAM introduces a parameterization-worthiness score for deciding which experience should be internalized, and a scale-free self-triggered consolidation mechanism for deciding when to internalize without task-specific hand-tuned thresholds, making the agent self-evolving as the trigger transfers across task distributions without re-tuning. Experiments in Minecraft show that PEAM improves long-horizon task performance, mitigates forgetting on previously consolidated skills, and improves parametric-versus-retrieval efficiency over retrieval-based embodied agents and parametric memory variants.

2605.26774 2026-06-03 cs.CV

Cesarean Scar Defect Segmentation in Transvaginal Ultrasound Images: a Dataset and Benchmark

经阴道超声图像中的剖宫产瘢痕缺损分割:数据集与基准

Yuan Tian, Yue Li, Wei Xia, Tianyu Xu, Jian Zhang, Liye Shi, Jing Liu, Yang Wang, Ming Liu, Qing Xu, Yixuan Zhang, Maggie M. He, Xiangjian He

Affiliations * Department of Obstetrics and Gynecology, International Peace Maternity and Child Health Hospital affiliated to Shanghai Jiao Tong University School of Medicine; School of Computer Science, University of Nottingham Ningbo China; School of Computer Science, University of Nottingham; Department of Computer Science and Engineering, University of California, San Diego; Department of Ultrasound, International Peace Maternity and Child Health Hospital affiliated to Shanghai Jiao Tong University School of Medicine; School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University; Department of Cardiology, Gold Coast University Hospital

AI summary Addresses the lack of a public dataset for cesarean scar defect (CSD) segmentation in transvaginal ultrasound images by building a CSD dataset of 1,111 images and 16 videos with pixel-level annotations, and establishes a benchmark to advance medical image segmentation algorithms and clinical innovation.

详情
AI中文摘要

剖宫产瘢痕缺损(CSD)是剖宫产后最常见的并发症之一。经阴道超声检查广泛用于CSD的初步筛查。准确确定CSD的轮廓和尺寸对治疗至关重要。然而,由于CSD尺寸小、形态不规则、图像质量欠佳以及资源有限环境中临床意识不足,超声医师常常忽略CSD。尽管人工智能在医学影像领域取得了进展,但目前尚无公开的经阴道超声CSD分割数据集。为填补这一空白,我们提出了一个全面的CSD数据集,包含1111张图像和16个视频,共501个阳性样本,带有确证的CSD和精确的像素级手动标注。标注遵循标准化临床指南,由经验丰富的超声医师和受过培训的博士生合作完成。这项工作为推进医学图像分割算法和促进临床创新提供了高质量的基准资源。最终,改善CSD诊断及后续治疗策略可提高育龄女性的生活质量,对医学研究和临床实践均具有重要价值。

英文摘要

Cesarean Scar Defect (CSD) is one of the most prevalent complications following cesarean delivery. Transvaginal ultrasonography is widely used for primary CSD screening. Accurate determination of CSD outline and dimensions is crucial for treatment. However, CSDs are frequently overlooked by sonographers due to small size and irregular morphology, suboptimal image quality, and limited clinical awareness in resource-constrained settings. Despite artificial intelligence advances in medical imaging, no public dataset exists for transvaginal ultrasound CSD segmentation. To address this gap, we present a comprehensive CSD dataset comprising 1,111 images and 16 videos, yielding 501 positive samples with confirmed CSD and precise pixel-level manual annotations. Annotations are performed following standardized clinical guidelines through collaboration between experienced sonographers and trained PhD students. This work provides high-quality benchmark resources for advancing medical image segmentation algorithms and promoting clinical innovation. Ultimately, improved CSD diagnosis and subsequent treatment strategies can enhance the quality of life in women of reproductive age, representing significant value for both medical research and clinical practice.

2605.26704 2026-06-03 cs.LG cs.AI

SL-BiLEM: Structured Learnable Behavior-in-the-Loop Epidemic Modeling for Forecasting and Policy Evaluation

SL-BiLEM: 用于预测和政策评估的结构化可学习行为循环流行病模型

Haochun Wang, Sendong Zhao, Jingbo Wang, Yanrui Du, Ting Liu, Bing Qin

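Affiliations * Faculty of Computing, Harbin Institute of Technology

AI summary Proposes the SL-BiLEM model, which uses physical constraints as regularization for robust extrapolation under policy-induced distribution shift, reports a 76% improvement over neural-mechanistic baselines, and supports counterfactual analysis.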

Comments ACM SIGKDD 2026

详情
AI中文摘要

流行病预测面临一个基本挑战:人类行为会动态响应疾病传播,形成反馈循环,在政策干预点引发分布偏移。这使得数据驱动模型在分布偏移下不可靠。我们提出 \textbf{SL-BiLEM}(结构化可学习行为循环流行病模型),利用物理约束作为正则化实现鲁棒外推。该框架将有效传播率分解为$β_{\text{eff}}(t,g) = β_0(g) \times m_{\text{policy}}(t) \times m_{\text{media}}(t) \times m_{\text{comp}}(t,g)$,其中对学习到的依从函数施加单调性、平滑性和有界跳跃约束,以在新政策制度下保持预测有效性。除预测外,SL-BiLEM还能为干预决策支持进行反事实分析。我们在三个真实世界数据集(邮轮、学校流感和学区COVID-19监测)上验证预测性能,并在已知真实情况的合成基准上评估反事实恢复。SL-BiLEM表明:(1)相比神经机制基线改进76%,在政策诱导偏移下仅53%的OOD退化,而神经基线为1142%;(2)在27个合成反事实实验中,自举置信区间覆盖率达100%;(3)处理效应准确度超过0.85。这些结果使SL-BiLEM成为公共卫生决策者寻求准确预测和原则性干预规划的可解释工具。

英文摘要

Epidemic forecasting faces a fundamental challenge: human behavior dynamically responds to disease spread, creating feedback loops that induce distribution shifts at policy intervention points. This renders data-driven models unreliable under distribution shift. We propose \textbf{SL-BiLEM} (Structured Learnable Behavior-in-the-Loop Epidemic Model), leveraging physical constraints as regularization for robust extrapolation. The framework decomposes effective transmission as $β_{\text{eff}}(t,g) = β_0(g) \times m_{\text{policy}}(t) \times m_{\text{media}}(t) \times m_{\text{comp}}(t,g)$, where monotonicity, smoothness, and bounded-jump constraints on the learned compliance function maintain predictive validity under novel policy regimes. Beyond forecasting, SL-BiLEM enables counterfactual analysis for intervention decision support. We validate forecasting on three real-world datasets (cruise ship, school influenza, and school-district COVID-19 surveillance) and evaluate counterfactual recovery on synthetic benchmarks with known ground truth. SL-BiLEM demonstrates: (1) 76\% improvement over neural-mechanistic baselines, with only 53\% OOD degradation versus 1142\% for neural baselines under policy-induced shift; (2) 100\% bootstrap CI coverage across 27 synthetic counterfactual experiments; and (3) Treatment Effect Accuracy exceeding 0.85. These results establish SL-BiLEM as an interpretable tool for public health decision-makers seeking accurate prediction and principled intervention planning.

2605.26006 2026-06-03 cs.CV cs.GR cs.RO

MIND: Multi-Scale Intent Diffusion for Text-Driven Physics-Based Humanoid Control

MIND: 多尺度意图扩散用于文本驱动的基于物理的人形控制

Bin Li, Ruichi Zhang, Han Liang, Jingyan Zhang, Juze Zhang, Xin Chen, Jingya Wang

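Affiliations * ShanghaiTech University; University of Pennsylvania; ByteDance Seed; Stanford University; InstAdapt

AI summary Proposes the MIND framework, which uses a multi-scale intent diffusion mechanism to semantically align text commands with low-level actions, enabling behavior generation for physics-based humanoids.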

详情
AI中文摘要

使基于物理的人形机器人能够根据高级文本命令执行多样化的行为仍然是一个重大挑战。现有方法通常遵循两阶段范式(结合运动学动作生成与基于物理的跟踪)或端到端模仿学习范式(直接从文本生成动作)。然而,前者受限于运动学生成与基于物理跟踪之间的固有域偏移,而后者则难以弥合文本命令与低级动作之间的巨大模态差距,限制了有效的语义对齐。值得注意的是,人形状态编码了丰富的运动动态,与低级动作相比,这些动态在语义上与文本描述更对齐,因此成为推导行为意图的自然基础。基于这一见解,我们提出了MIND,一种新颖的端到端扩散框架,用于文本驱动的基于物理的人形控制,该框架利用行为意图作为文本命令与低级动作之间的语义桥梁。其核心是,MIND引入了多尺度意图扩散机制,其中整体意图预测器捕获全局行为动态以指导整体行为合成,而即时意图预测器在每一步扩散中提供逐步的细粒度信号以进行局部行为细化。这种分层意图公式化为人形控制施加了结构化的归纳偏置,改善了语义对齐和行为自然性。此外,MIND将人形状态编码到潜在空间中,以实现更有效的语义意图建模。大量实验表明,MIND优于现有方法,并能从文本命令中合成连贯、物理合理且语义对齐的人形行为。我们的代码将发布以促进未来研究。

英文摘要

Enabling physics-based humanoids to execute diverse behaviors from high-level textual commands remains a significant challenge. Existing methods typically follow either a two-stage paradigm that combines kinematic motion generation with physics-based tracking, or an end-to-end imitation-learning paradigm that directly generates actions from text. However, the former suffers from the inherent domain shift between kinematic generation and physics-based tracking, while the latter struggles with the substantial modality gap between textual commands and low-level actions, limiting effective semantic alignment. Notably, humanoid states encode rich motion dynamics that are more semantically aligned with textual descriptions than low-level actions, making them a natural basis for deriving behavioral intent. Building upon this insight, we propose MIND, a novel end-to-end diffusion framework for text-driven physics-based humanoid control that leverages behavioral intent as a semantic bridge between textual commands and low-level actions. At its core, MIND introduces a multi-scale intent diffusion mechanism, where a holistic intent predictor captures global behavioral dynamics to guide overall behavior synthesis, while an immediate intent predictor provides step-wise, fine-grained signals for local behavior refinement at each diffusion step. This hierarchical intent formulation imposes a structured inductive bias for humanoid control, improving semantic alignment and behavioral naturalness. Furthermore, MIND encodes humanoid states into a latent space to enable more effective semantic intent modeling. Extensive experiments demonstrate that MIND outperforms existing methods and synthesizes coherent, physically plausible, and semantically aligned humanoid behaviors from text commands. Project page: https://binlee26.github.io/MIND_page.

2605.30313 2026-06-03 cs.RO

UniLab: A Heterogeneous Architecture for Robot RL Beyond GPU-Dominant Paradigms

UniLab: 超越GPU主导范式的机器人强化学习异构架构

Yufei Jia, Zhanxiang Cao, Mingrui Yu, Heng Zhang, Shenyu Chen, Dixuan Jiang, Meng Li, Xiaofan Li, Yiyang Liu, Junzhe Wu, Zheng Li, XiLin Fang, Ting-Yu Tsui, Shengcheng Fu, Haoyang Li, Anqi Wang, Zifan Wang, Dongjie Zhu, Chenyu Cao, Zhenbiao Huang, Ziang Zheng, Jie Lu, Xin Ma, Zhengyang Wei, Xiang Zhao, Tianyue Zhan, Ye He, Yuxiang Chen, Yizhou Jiang, Yue Li, Haizhou Ge, Yuhang Dong, Fan Jia, Ziheng Zhang, Meng Zhang, Xiwa Deng, Zhixing Chen, Hanyang Shao, Chenxin Dong, Yixuan Li, Yizhi Chen, Bokui Chen, Kaifeng Zhang, Hanqing Cui, Yusen Qin, Ruqi Huang, Lei Han, Tiancai Wang, Xiang Li, Yue Gao, Guyue Zhou

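Affiliations * THU (Tsinghua University); SJTU (Shanghai Jiao Tong University); SII; Motphys; HITSZ (Harbin Institute of Technology, Shenzhen); BIT (Beijing Institute of Technology); NEU; SUSTech (Southern University of Science and Technology); TJU; DISCOVER Robotics; HKUST(GZ) (Hong Kong University of Science and Technology (Guangzhou)); Galbot; NUS (National University of Singapore); WTU; HBUT; AMD; NJU (Nanjing University); ZJU (Zhejiang University); Dexmal; Sharpa; D-Robotics

AI summary Proposes UniLab, a heterogeneous CPU-simulation / GPU-learning architecture whose unified runtime decouples CPU-parallel simulation from GPU policy updates, improving end-to-end training efficiency by 3-10x under the same hardware configuration and reducing dependence on NVIDIA CUDA.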

详情
AI中文摘要

基于仿真的当代机器人控制强化学习日益围绕GPU驻留仿真组织:物理、轨迹收集和学习都放在单个以GPU为中心的执行路径上。这种范式极大地提高了训练速度,但也鼓励了一种默认假设,即高效训练需要物理位于GPU上。我们重新审视这一假设。我们的观点是,在仿真主导的机器人控制中,关键问题不是哪个处理器运行物理,而是仿真吞吐量、策略学习和运行时同步是否形成高效的端到端循环。我们提出了UniLab,一种异构CPU-仿真/GPU-学习架构,通过统一的数据移动、缓冲和同步运行时,将CPU并行仿真与GPU策略更新解耦。UniLab实现为一个完整且可扩展的训练系统,使用MuJoCoUni和MotrixSim CPU批处理物理后端,支持PPO、FastSAC、FlashSAC和APPO。在代表性的基于仿真的机器人控制任务上,UniLab在相同硬件配置下将端到端训练效率提升了3-10倍,同时减少了对基于NVIDIA CUDA的软件栈的依赖,并支持在Apple macOS平台以及AMD ROCm和Intel XPU加速器后端上的跨平台执行。这些结果表明,GPU仿真是高效训练的有效路径,但不是必需的路径,拓宽了机器人强化学习训练可用的实际系统选择。项目页面:https://unilabsim.github.io。

英文摘要

Simulation-based RL for contemporary robot control is increasingly organized around GPU-resident simulation: physics, rollout collection, and learning are placed on a single GPU-centric execution path. This paradigm has greatly improved training speed, but it has also encouraged a default assumption that efficient training requires physics to reside on the GPU. We revisit this assumption. Our view is that, in simulation-dominated robot control, the essential question is not which processor runs physics, but whether simulation throughput, policy learning, and runtime synchronization form an efficient end-to-end loop. We present UniLab, a heterogeneous CPU-simulation / GPU-learning architecture that decouples CPU-parallel simulation from GPU policy updates through a unified runtime for data movement, buffering, and synchronization. UniLab is implemented as a complete and extensible training system using MuJoCoUni and MotrixSim CPU-batched physics backends, supporting PPO, FastSAC, FlashSAC, and APPO. On representative simulation-based robot control tasks, UniLab improves end-to-end training efficiency by 3--10$\times$ under the same hardware configuration, while reducing dependence on the NVIDIA CUDA-based software stack and supporting cross-platform execution on the Apple macOS platform and the AMD ROCm and Intel XPU accelerator backends. These results show that GPU simulation is an effective path to efficient training, but not a necessary one, broadening the practical system choices available for robot RL training. Project page: https://unilabsim.github.io.

2605.30225 2026-06-03 cs.LG

ExDBSCAN: Explaining DBSCAN with Counterfactual Reasoning -- Additional Material

ExDBSCAN: 用反事实推理解释DBSCAN——附加材料

Pernille Matthews, Lena Krieger, Tommaso Amico, Artur Zimek, Thomas Seidl, Ira Assent

Affiliations * Aarhus University, Department of Computer Science; Forschungszentrum Jülich; LMU Munich; University of Southern Denmark, Department of Computer Science and Mathematics; MCML, Munich, Germany; DBS

AI summary Proposes ExDBSCAN, a method that makes DBSCAN clustering results interpretable through density-aware counterfactual explanations, with theoretical guarantees of validity and experiments showing that it outperforms the baselines.

详情
AI中文摘要

聚类是一种通过相似性对数据点进行分组的无监督技术。尽管监督机器学习存在可解释性方法,但它们不能直接应用于聚类,使得理解聚类分配具有挑战性。这种可解释性差距在流行的基于密度的方法DBSCAN中尤为明显,该方法将点分配为内点(密集区域中的聚类成员)或离群点(稀疏区域中的噪声点)。DBSCAN没有提供关于为什么特定点得到其分配或其分配是否对数据的小变化鲁棒的见解。为了解决缺乏可解释性的问题,我们引入了ExDBSCAN,一种密度感知的事后解释方法。ExDBSCAN提供可操作的反事实解释,并具有有效性的理论保证。它使用密度连接的加权图生成多个反事实,采用物理启发模型,该模型使反事实候选者彼此排斥(多样性),同时将它们拉向要解释的实例(接近性)。在30个表格数据集上对比四个基线的实证评估表明,ExDBSCAN在所有基线上表现优异,同时达到完美的有效性并检索到多样、接近的反事实。

English abstract

Clustering is an unsupervised technique for grouping data points by similarity. While explainability methods exist for supervised machine learning, they are not directly applicable to clustering, making it challenging to understand cluster assignments. This interpretability gap is particularly evident in the popular density-based method DBSCAN, which assigns points as inliers (cluster members in dense regions) or outliers (noise points in sparse regions). DBSCAN does not provide insight into why a particular point receives its assignment or whether its assignment is robust to small changes in the data. To address the lack of explainability, we introduce ExDBSCAN, a density-aware, post-hoc explanation method. ExDBSCAN offers actionable counterfactual explanations, with theoretical guarantees for validity. It generates multiple counterfactuals using a density connected weighted graph, adopting a physics-inspired model that repels counterfactual candidates from one another (diversity), while pulling them toward the instance to explain (proximity). Empirical evaluation on 30 tabular datasets comparing against four baselines shows that ExDBSCAN outperforms all baselines while attaining perfect validity and retrieving diverse, proximal counterfactuals.
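
To make the inlier/outlier rule and the notion of a valid counterfactual concrete, here is a minimal sketch. It is not the ExDBSCAN algorithm: it only applies the standard DBSCAN rule (core points have at least `min_pts` neighbors within `eps`; inliers are core points or lie within `eps` of one) and checks whether moving one point flips its own assignment. The helper names and the toy data are hypothetical, and the check ignores any effect the move has on other points.

```python
import math


def region(data, point, eps):
    """Points of `data` within distance eps of `point` (its eps-neighborhood)."""
    return [q for q in data if math.dist(point, q) <= eps]


def is_core(data, point, eps, min_pts):
    """A core point has at least min_pts points (itself included) within eps."""
    return len(region(data, point, eps)) >= min_pts


def label(data, index, eps, min_pts):
    """Return 'inlier' for core and border points, 'outlier' for noise."""
    p = data[index]
    if is_core(data, p, eps, min_pts):
        return "inlier"
    if any(is_core(data, q, eps, min_pts) for q in region(data, p, eps)):
        return "inlier"
    return "outlier"


def flips_assignment(data, index, candidate, eps, min_pts):
    """Check whether replacing data[index] by `candidate` changes its label."""
    moved = list(data)
    moved[index] = candidate
    return label(moved, index, eps, min_pts) != label(data, index, eps, min_pts)


# Toy data: a dense square plus one far-away noise point (index 4).
DATA = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (5.0, 5.0)]

if __name__ == "__main__":
    print(label(DATA, 4, eps=1.5, min_pts=3))
    print(flips_assignment(DATA, 4, (2.0, 1.0), eps=1.5, min_pts=3))
```

In this toy setting, moving the noise point to `(2.0, 1.0)` places it in a dense region and flips its label, while moving it to `(4.0, 4.0)` does not.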

2605.29930 2026-06-03 cs.AI cs.CY cs.HC

Toward AI That Understands Self and Others: A World-Model Theory of Cognitive Diversity and Alignment

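Toward AI Systems That Understand Self and Others: A Multi-Phase Inference Framework for Human Cognitive Diversity and World-Model Alignment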

Toru Takahashi

Affiliations * Human Informatics and Systems Lab, Doshisha University; Linked Open Data Initiative, NPO Keio Research Institute at SFC; Stroly Inc

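AI summary Proposes a multi-phase inference framework (MIM) that formalizes how heterogeneous world models arise through phase-formation spaces, foregrounding fields, subject-specific profile states, and alignment maps between state representations, and recasts world-model alignment as the problem of making heterogeneous representations mutually processable rather than forcing agreement.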

Comments 87 pages. Revised version with a refined abstract emphasizing disagreement as a late-stage phenomenon, target admissibility, processability, and the methodological abstraction used to compare humans, AI systems, and institutional decision procedures under shared information-theoretic constraints

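Details
AI abstract (translated from Chinese)

Mutual misunderstanding in contemporary society does not arise merely because people hold different opinions or values. Even under identical observations, different agents may form different inference targets, state representations, prediction errors, and update priorities. This paper proposes a multi-phase inference framework and defines its core internal mechanism as the Multi-Phase Inference Mechanism (MIM). MIM formalizes how heterogeneous world models arise through phase-formation spaces, foregrounding fields, subject-specific profile states, and alignment maps between state representations. On this basis, the paper redefines world-model alignment as the problem of making heterogeneous representations mutually processable, rather than forcing agreement or convergence to a single value system. It further connects this formalization to philosophical disagreement, cognitive typology, social division, and AI alignment. The aim is to offer a constructive vocabulary for AI systems that help humans understand themselves and others by making differences in meaning, value, and prediction error visible, comparable, and transformable.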

English abstract

Modern societies possess more information than ever before, yet they do not converge toward a single shared understanding. The same events, facts, laws, technologies, or risks can be interpreted as evidence of freedom, danger, exclusion, injustice, responsibility, or unrealized possibility. Existing discussions often treat such disagreement as a conflict of values, preferences, or beliefs. This paper argues that disagreement is already a late-stage phenomenon. The central premise is simple but not trivial: observation is not yet inference. Not every observation becomes inferentially relevant, and not every possible object in an observation sequence becomes an estimation target. A possible target becomes admissible only when a state representation can be constructed that is approximately sufficient for prediction, evaluation, or action with respect to that target. This paper develops a world-model theory of cognitive diversity and alignment by reconstructing recognition as the construction of such approximate sufficient statistics under finite informational, representational, observational, and action constraints. It formulates this position as the Multi-Phase Inference Assumption (MIA) and defines its core internal mechanism as the Multi-Phase Inference Mechanism (MIM). The framework introduces alignment maps and transformation loss to analyze how heterogeneous world models communicate without being collapsed into a single representation. World-model alignment is therefore processability, not agreement: the design of AI systems that help heterogeneous forms of intelligence remain mutually processable while preserving their distinct error-detection capacities.

2605.29663 2026-06-03 cs.RO

EXACT-MPPI: Exact Signed-Distance Navigation for Arbitrary-Footprint Robots from Point Clouds via Path Integral Control

Chen Peng, Zhikang Ge, Wenwu Lu, Haiming Gao, Stavros Vougioukas, Peng Wei

Affiliations * ZJU-Hangzhou Global Scientific and Technological Innovation Center, Zhejiang University, Hangzhou, China; College of Biosystems Engineering and Food Science, Zhejiang University, Hangzhou, China; Department of Biological and Agricultural Engineering, University of California, Davis, Davis, California, USA

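AI summary Proposes EXACT-MPPI, a framework that embeds an analytic, exact signed-distance evaluator in a Model Predictive Path Integral controller and processes point clouds directly, with no intermediate map representation, to navigate robots with arbitrarily shaped footprints safely.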

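Details
AI abstract (translated from Chinese)

Ground robots often carry payloads, tools, or other attachments that turn their effective footprint into a complex, non-convex shape. Navigating safely in cluttered environments requires accounting for this true geometry, yet most local planners simplify it with convex or inflated proxies and rasterize sensor data into occupancy grids or distance fields. When clearance is comparable to the footprint geometry, both choices eliminate feasible motions. We propose EXACT-MPPI, a training-free local navigation framework that maps local point-cloud observations and sparse guidance directly to motion commands, without any intermediate map representation. The framework embeds an analytic, exact signed-distance evaluator in a Model Predictive Path Integral (MPPI) controller. The footprint is represented as a simple polygon, which covers general convex or concave planar shapes, with a rectangle-cover specialization that speeds up evaluation for rectilinear footprints; this yields footprint-aware collision costs without convex decomposition, inflation, or learned encoders. During each MPPI rollout, observed obstacle points are transformed into the predicted body frame and evaluated against the footprint. All operations are batched in JAX and use GPU parallelism for real-time receding-horizon control. Experiments show that EXACT-MPPI evaluates batched distances faster than a learned point-to-robot baseline, preserves feasible motion where convex-footprint planners fail, and remains robust among dense static and moving obstacles. The same framework deploys on differential-drive, Ackermann, omnidirectional, and hybrid-mode platforms by changing only the footprint description and motion model, with no per-platform training. Combining exact footprint geometry with sampling-based predictive control therefore offers a practical, training-free route to footprint-aware local navigation across different robots.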

English abstract

Ground robots often carry payloads, implements, or other attachments that turn their effective footprint into complex, non-convex shapes. Navigating safely through clutter then requires reasoning about this true geometry, yet most local planners simplify it with convex or inflated proxies and rasterize sensor data into occupancy grids or distance fields. Both choices eliminate feasible motions when clearance is comparable to the footprint geometry. We present EXACT-MPPI, a training-free local navigation framework that maps local point-cloud observations and sparse guidance directly to motion commands, without any intermediate map representation. The framework embeds an analytic, exact signed-distance evaluator into a Model Predictive Path Integral (MPPI) controller. The footprint is represented as a simple polygon for general convex or concave planar shapes, with a rectangle-cover specialization for faster evaluation of rectilinear footprints, enabling footprint-aware collision costs without convex decomposition, inflation, or learned encoders. During each MPPI rollout, observed obstacle points are transformed into the predicted body frame and evaluated against the footprint. All operations are batched in JAX, leveraging GPU parallelism for real-time receding-horizon control. Experiments show that EXACT-MPPI accelerates batched distance evaluation over a learned point-to-robot baseline, preserves feasible motion where convex-footprint planners fail, and remains robust under dense static and moving obstacles. The same framework deploys on differential-drive, Ackermann, omnidirectional, and hybrid-mode platforms by changing only the footprint description and motion model without per-platform training. Pairing exact footprint geometry with sampling-based predictive control thus offers a practical, training-free path to footprint-aware local navigation across diverse robots.
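
The two geometric steps named in this abstract, transforming an obstacle point into the body frame and evaluating its signed distance to a simple polygon footprint, can be sketched as follows. This is a scalar, pure-Python illustration under stated assumptions, not the authors' batched JAX evaluator: the function names, the sign convention (negative inside the footprint), and the L-shaped footprint are all hypothetical choices made for the example.

```python
import math


def to_body_frame(point, pose):
    """Express a world-frame point in the body frame of pose (x, y, theta)."""
    px, py = point
    x, y, theta = pose
    dx, dy = px - x, py - y
    c, s = math.cos(theta), math.sin(theta)
    # Apply the inverse rotation R(theta)^T to the translated point.
    return (c * dx + s * dy, -s * dx + c * dy)


def _segment_distance(p, a, b):
    """Euclidean distance from point p to the segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    ex, ey = bx - ax, by - ay
    length_sq = ex * ex + ey * ey
    t = 0.0 if length_sq == 0.0 else ((px - ax) * ex + (py - ay) * ey) / length_sq
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(px - (ax + t * ex), py - (ay + t * ey))


def _inside(p, polygon):
    """Even-odd ray casting; valid for convex and concave simple polygons."""
    px, py = p
    inside = False
    n = len(polygon)
    for i in range(n):
        (ax, ay), (bx, by) = polygon[i], polygon[(i + 1) % n]
        if (ay > py) != (by > py):
            x_cross = ax + (py - ay) * (bx - ax) / (by - ay)
            if px < x_cross:
                inside = not inside
    return inside


def signed_distance(p, polygon):
    """Distance to the polygon boundary: negative inside, positive outside."""
    n = len(polygon)
    d = min(_segment_distance(p, polygon[i], polygon[(i + 1) % n]) for i in range(n))
    return -d if _inside(p, polygon) else d


# Hypothetical L-shaped (non-convex) footprint in the body frame.
L_SHAPE = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0), (1.0, 2.0), (0.0, 2.0)]

if __name__ == "__main__":
    print(signed_distance((1.4, 1.4), L_SHAPE))  # point in the notch: positive
    print(signed_distance((0.5, 0.5), L_SHAPE))  # point inside the footprint: negative
```

The point `(1.4, 1.4)` lies inside the convex hull of the L shape but outside the footprint itself, so the exact evaluation reports positive clearance where a convex proxy would report a collision.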

2605.29661 2026-06-03 cs.CV

Geometry-Guided Modeling of Foundation Features Enables Generalizable Object Shape Deformation Learning

Yiyao Ma, Kai Chen, Zhongxiang Zhou, Zhuheng Song, Dongsheng Xie, Zelong Tan, Rong Xiong, Qi Dou

Affiliations * Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China; Zhejiang Innovation Center for Humanoid Robotics, Ningbo, China; State Key Laboratory of Industrial Control and Technology, Zhejiang University, Hangzhou, China

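AI summary Proposes a geometry-guided feature modeling mechanism and a view-adaptive feature aggregation module that recover monocular 3D shape by deforming category-level shape templates, significantly outperforming existing methods under shape variation and viewpoint diversity.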

Comments 20 pages, 12 figures, accepted by ICML 2026

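Details
AI abstract (translated from Chinese)

Monocular 3D shape recovery is fundamental to geometric understanding, but robust generalization across arbitrary viewpoints and unseen object categories remains a major challenge. This paper presents a generalizable deformation learning framework that reconstructs 3D objects by explicitly deforming a category-level shape template to match the target observation. To handle complex shape variation between template and target, we introduce a geometry-guided feature modeling mechanism. It first enriches foundation features with the template topology to produce a geometry-aware representation, then explicitly correlates that representation with the target observation to guide precise deformation. In addition, to bridge the gap between the fixed template and arbitrary target views, we propose a view-adaptive feature aggregation module. This module uses multi-view template features and their corresponding camera poses to enrich the canonical template representation, ensuring robust feature alignment regardless of the target viewpoint. Extensive experiments show that our method significantly outperforms state-of-the-art methods in handling large shape variations and diverse viewpoints, generalizes strongly to novel categories, and effectively supports downstream real-world dexterous robotic manipulation tasks. Project homepage: https://GODeform.github.io/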

English abstract

Monocular 3D shape recovery is fundamental to geometric understanding, yet achieving robust generalization across arbitrary viewpoints and unseen object categories remains a significant challenge. In this paper, we present a generalizable deformation learning framework that reconstructs 3D objects by explicitly deforming a category-level shape template to match the target observation. To address complex shape variations between the template and the target, we introduce a geometry-guided feature modeling mechanism. This process first enriches foundation features with template topology to yield a geometry-aware representation, which is then explicitly correlated with the target observation to guide precise deformation. Furthermore, to bridge the disparity between the fixed template and arbitrary target views, we propose a view-adaptive feature aggregation module. This module leverages multi-view template features and their corresponding camera poses to enrich the canonical template representation, ensuring robust feature alignment regardless of the target's perspective. Extensive experiments demonstrate that our approach significantly outperforms state-of-the-art methods in handling large shape variations and diverse viewpoints, exhibiting strong generalization to novel categories and effectively supporting downstream real-world dexterous robotic manipulation tasks. Project homepage: https://GODeform.github.io/