arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.31031 2026-06-01 cs.AI

GraphARC: A Comprehensive Benchmark for Graph-Based Abstract Reasoning

Saku Peltonen, August Bøgh Rønberg, Andreas Plesner, Roger Wattenhofer

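AI summary Proposes the GraphARC benchmark, which extends abstract reasoning to graph-structured data, evaluates how well models generalize across local, global, and hierarchical graph transformations through few-shot transformation learning tasks, and reveals a comprehension-execution gap and scaling barriers in language models.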

Comments Accepted at KDD 2026 Datasets and Benchmarks Track

Abstract

Relational reasoning lies at the heart of intelligence, but existing benchmarks are typically confined to formats such as grids or text. We introduce GraphARC, a benchmark for abstract reasoning on graph-structured data. GraphARC generalizes the few-shot transformation learning paradigm of the Abstraction and Reasoning Corpus (ARC). Each task requires inferring a transformation rule from a few input-output pairs and applying it to a new test graph, covering local, global, and hierarchical graph transformations. Unlike grid-based ARC, GraphARC instances can be generated at scale across diverse graph families and sizes, enabling systematic evaluation of generalization abilities. We evaluate state-of-the-art language models on GraphARC and observe clear limitations. Models can answer questions about graph properties but often fail to solve the full graph transformation task, revealing a comprehension-execution gap. Performance further degrades on larger instances, exposing scaling barriers. More broadly, by combining aspects of node classification, link prediction, and graph generation within a single framework, GraphARC provides a promising testbed for future graph foundation models.
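
To make the few-shot transformation setup concrete, here is a minimal sketch of what one task instance could look like. Everything in it is an assumption made for illustration: the dictionary layout, the rule `recolor_hubs` (a local rule based on node degree), and exact-match scoring are hypothetical and are not taken from the GraphARC release.

```python
def degree(edges, node):
    """Count edges incident to `node` in an undirected edge list."""
    return sum(node in edge for edge in edges)


def recolor_hubs(graph):
    """Hypothetical local rule: nodes with degree >= 2 become 'red'."""
    nodes, edges = graph["nodes"], graph["edges"]
    new_nodes = {
        n: ("red" if degree(edges, n) >= 2 else color)
        for n, color in nodes.items()
    }
    return {"nodes": new_nodes, "edges": list(edges)}


def make_task(train_inputs, test_input, rule):
    """Bundle demonstration pairs and a held-out test graph, ARC-style."""
    return {
        "train": [(g, rule(g)) for g in train_inputs],
        "test_input": test_input,
        "test_output": rule(test_input),
    }


def is_solved(task, prediction):
    """Exact-match scoring (assumed): node colors and edge set must agree."""
    target = task["test_output"]
    same_nodes = prediction["nodes"] == target["nodes"]
    same_edges = (
        set(map(frozenset, prediction["edges"]))
        == set(map(frozenset, target["edges"]))
    )
    return same_nodes and same_edges


# One demonstration graph (a path) and one test graph (a star).
path = {"nodes": {0: "blue", 1: "blue", 2: "blue"}, "edges": [(0, 1), (1, 2)]}
star = {
    "nodes": {0: "blue", 1: "blue", 2: "blue", 3: "blue"},
    "edges": [(0, 1), (0, 2), (0, 3)],
}
task = make_task([path], star, recolor_hubs)
print(task["test_output"]["nodes"])
```

Because the rule is a function, instances of any size or graph family can be produced by passing different inputs to `make_task`, which is the property the abstract contrasts with grid-based ARC.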

2605.31029 2026-06-01 cs.CV

PEEK: Picking Essential frames via Efficient Knowledge distillation

Killian Steunou, Anas Filali Razzouki, Khalil Guetari, Mounîm A. El-Yacoubi, Yannis Tevissen

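AI summary Proposes PEEK, which uses knowledge distillation to transfer a teacher model's frame relevance rankings to a lightweight temporal model, enabling efficient dynamic frame sampling and markedly improving video captioning performance at low frame budgets.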

Comments Supplementary material at https://www.killian-steunou.com/peek/static/pdfs/peek_supplementary.pdf

Abstract

Video-language models can process only a limited number of frames, making frame selection a key bottleneck for efficient video captioning. Most captioning pipelines still rely on uniform sampling, which is computationally cheap but agnostic to visual content. Adaptive frame sampling has recently emerged as a promising approach for selecting the most informative frames from a video; however, existing methods remain computationally expensive. We introduce PEEK, an efficient dynamic frame sampling method that distills caption-conditioned frame relevance rankings from a stronger teacher model into a lightweight temporal model that operates only on visual content. We find that, overall, on ActivityNet Captions and MSR-VTT, our method outperforms state-of-the-art methods across all evaluated downstream vision-language models, especially when only one or two frames are selected for captioning, obtaining the best CIDEr for most frame budgets. On ActivityNet Captions, PEEK is particularly strong, winning 14 out of 16 configurations. Zero-shot evaluation on MSR-VTT shows that our model transfers best at low frame budgets, while results at four and eight frames are more mixed as temporal coverage and visual diversity become increasingly competitive. Compared with recent adaptive baselines, PEEK is both more accurate in the low-budget regime and more efficient: it adds only 5.2% to the captioning time, compared with 65.4% for CSTA and 211.9% for MaxInfo. We release our code and pre-trained checkpoint at https://github.com/momentslab/peek.
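
The selection step can be pictured as a top-k choice over per-frame relevance scores. The following is a minimal sketch under stated assumptions, not the released PEEK code: the function name `select_frames` is hypothetical, and the scores are made-up numbers standing in for whatever the lightweight temporal model would output.

```python
def select_frames(scores, budget):
    """Return indices of the `budget` highest-scoring frames, in temporal order.

    `scores` is a hypothetical list of per-frame relevance scores.
    """
    if budget <= 0:
        return []
    # Rank frame indices by score, highest first; ties keep the earlier frame.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    # Keep the top `budget` frames and restore temporal order for the captioner.
    return sorted(ranked[:budget])


# Illustrative scores for a six-frame clip (made-up numbers).
example_scores = [0.10, 0.80, 0.30, 0.95, 0.20, 0.60]
print(select_frames(example_scores, 2))  # [1, 3]
```

Uniform sampling with the same budget would pick frames by position alone, which is the content-agnostic baseline the abstract describes.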

2605.31025 2026-06-01 cs.CL

TRACE: Discovering Task-Specific Parameter via Adaptation-Aware Probing for Continual Fine-Tuning

Xiaosong Han, Ke Chen, Xindi Dai, Di Liang, Minlong Peng, Wei Pang, Fausto Giunchiglia, Xiaoyue Feng, Yonghao Liu, Renchu Guan

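AI summary Proposes TRACE, which discovers task-specific core parameters through adaptation-aware probing, updates only those parameters during continual fine-tuning to mitigate catastrophic forgetting, and validates transferability across models and scales.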

Comments KDD2026

Abstract

In real-world deployment, LLMs are often adapted continually across tasks to stay up-to-date in production, where new fine-tuning should preserve previously learned skills. However, indiscriminately mixing tasks can dilute task specialization, while sequential fine-tuning (full-parameter or low-rank adaptation) often causes catastrophic forgetting due to destructive overwriting. Replay-based continual tuning and maintaining separate task-specific adapters can mitigate forgetting, but introduce additional compute, storage, and management overhead. Recognizing the redundancy of LLM parameters for any single task, we reframe continual task adaptation as task-specific parameter discovery via adaptation-aware probing: a short warm-start probe exposes a task's adaptation trace, enabling us to identify and isolate the small subset of parameters essential for each task to mitigate catastrophic forgetting. Building on this view, we introduce TRACE, a novel approach for discovering Task-specific paRameters via Adaptation-aware probing for Continual finE-tuning. We perform a short warm-start fine-tune to derive task-specific core parameters by comparing the warm-started and pre-trained models. Core parameters are identified via two strategies: importance scoring (L2 norm and Fisher information) and specificity analysis (cosine similarity of parameter updates). In continual fine-tuning settings, only the active task's core parameters are updated while others remain frozen, preserving prior knowledge. We conduct extensive experiments across multiple standard benchmarks to demonstrate the superior performance of our proposed method. Additionally, we validate the generalization of our method through a cross-model and scale transferability study, demonstrating a "small-to-large" paradigm that guides the fine-tuning of large-scale models under resource constraints.
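
The probe-then-freeze idea can be sketched in a few lines. This is an illustrative reading of the abstract, not the authors' implementation: the group-level granularity, the `top_k` cutoff, the helper names, and the choice to restart from the pre-trained weights are all assumptions, and only the L2-norm score is modeled (Fisher information and cosine-similarity specificity are left out).

```python
import math


def core_parameter_groups(pretrained, warm_started, top_k):
    """Pick the `top_k` parameter groups that moved most during the probe.

    Both arguments map a (hypothetical) group name to a flat list of floats.
    The score is the L2 norm of the difference between the two models.
    """
    scores = {}
    for name, before in pretrained.items():
        after = warm_started[name]
        scores[name] = math.sqrt(sum((a - b) ** 2 for a, b in zip(after, before)))
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    return set(ranked[:top_k])


def masked_sgd_step(params, grads, core, lr):
    """Apply a plain SGD step to core groups only; other groups stay frozen."""
    updated = {}
    for name, values in params.items():
        if name in core:
            updated[name] = [w - lr * g for w, g in zip(values, grads[name])]
        else:
            updated[name] = list(values)  # frozen: copied unchanged
    return updated


# Toy model with three hypothetical parameter groups.
pretrained = {"attn": [0.0, 0.0], "mlp": [1.0, 1.0], "emb": [0.5, 0.5]}
warm_started = {"attn": [0.3, 0.4], "mlp": [1.0, 1.1], "emb": [0.5, 0.5]}
core = core_parameter_groups(pretrained, warm_started, top_k=1)
grads = {"attn": [1.0, 1.0], "mlp": [1.0, 1.0], "emb": [1.0, 1.0]}
updated = masked_sgd_step(pretrained, grads, core, lr=0.1)
print(core)
print(updated["mlp"])
```

In this toy run only the `attn` group changes, so a later task that selects a different core group would not overwrite it.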

2605.31023 2026-06-01 cs.AI cs.LG cs.MA

HADT: A Heterogeneous Multi-Agent Differential Transformer for Autonomous Earth Observation Satellite Cluster

Mohamad A. Hady, Muhammad Anwar Masum, Siyi Hu, Mahardhika Pratama, Jimmy Cao, Ryszard Kowalczyk

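AI summary For autonomous Earth observation missions with heterogeneous satellite clusters, proposes a transformer-based architecture that achieves adaptive, real-time resource management through relational observation-action tokenization and a differential attention mechanism, significantly outperforming baselines.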

Comments Accepted in ECML-PKDD 2026. arXiv admin note: text overlap with arXiv:2511.12792

Abstract

This work addresses the problem of autonomous resource management in heterogeneous satellite clusters conducting Earth Observation (EO) missions, including optical and Synthetic Aperture Radar (SAR) satellites. In autonomous operation mode, satellites are equipped with intelligent capabilities enabling real-time decision-making based on the latest conditions, while requiring minimal interaction with ground operators. Traditional scheduling approaches typically rely on mathematical models to represent satellite mission and resource management, and then solve the resulting problem with optimization algorithms. However, such solutions become less effective when the underlying models are unavailable, overly complex, or inaccurate due to dynamic changes and uncertainties inherent in the space mission environment. A promising alternative is to reformulate the problem as a sequential decision-making process and apply model-free reinforcement learning techniques to enable adaptive and real-time resource management. To this end, we propose a novel transformer-based architecture tailored for autonomous EO missions with heterogeneous satellite clusters, featuring relational observation-action tokenization and a differential attention mechanism. Our experimental results demonstrate significant performance improvements compared to the available baselines. Moreover, the proposed architecture exhibits strong adaptability and transferability with respect to varying numbers of satellite clusters.

2605.31022 2026-06-01 cs.LG

Augmented Lagrangian Predictive Coding

Jeffrey Seely, Julian Gould

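AI summary Proposes Augmented Lagrangian Predictive Coding (PC-ALM), which accumulates constraint errors into layer-local Lagrange multipliers so that local updates align with backpropagation gradients, matching backpropagation performance in deep networks.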

Comments 22 pages, 10 figures

Abstract

Predictive coding (PC) is a local-learning alternative to backpropagation (BP), training deep networks via local energy-minimization dynamics rather than a global backward pass. We introduce Augmented Lagrangian Predictive Coding (PC-ALM), which maintains PC's inference budget but aligns each weight update toward BP by accumulating per-layer constraint errors into a layer-local Lagrange multiplier. In linear PC networks, PC-ALM converges to an equilibrium with exact BP gradients distributed across the network via only layer-local updates. We analyze PC-ALM in nonlinear PC networks up to depth 128 and show that it matches BP performance across all width-depth regimes, notably in deep narrow networks where PC underperforms. PC-ALM introduces recurrent dynamics in each layer's activations. Compared to PC's heat flow on a scalar energy, PC-ALM dynamics are driven by dual ascent on the augmented Lagrangian. We observe "ballistic" credit propagation across very deep networks, with credit signals evenly distributed across layers, compared to PC's slow, diffusive credit propagation. Beyond the algorithm itself, the augmented Lagrangian framework offers a generalization of PC, and may yield insights into how distributed systems could compute and propagate BP-like credit signals through purely local dynamics.

2605.31021 2026-06-01 cs.AI cs.CL cs.LG

A Persona-Based Evaluation Framework for Pluralistic Alignment in Generative AI

Atahan Karagoz

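AI summary Proposes a state-space constrained emulation framework that replaces a single evaluation function with synthetic cognitive profiles, enabling pluralistic, perspective-dependent benchmarking that reflects real-world consensus variability; it also analyzes the stability of the simulated evaluators and argues that dynamic regulatory mechanisms are necessary.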

Abstract

Current alignment paradigms for generative artificial intelligence rely predominantly on monolithic benchmarking frameworks that reduce the plurality of human judgment to aggregated statistical baselines, thereby obscuring cultural, demographic, and contextual variability in evaluation. We introduce a state-space constrained emulation framework for AI evaluation that replaces singular assessment functions with a structured manifold of synthetic cognitive profiles representing diverse human perspectives. We show that modern generative architectures can instantiate and maintain these evaluative personas with high consistency, enabling a form of pluralistic, perspective-dependent benchmarking that more closely reflects real-world consensus variability. However, we further analyze the stability of these simulated evaluators under sequential inference and stochastic prompt perturbations, revealing systematic degradation in persona coherence that manifests as state-space drift and semantic inconsistency. These findings suggest that static alignment constraints are insufficient for sustaining robust evaluative behavior over time. Instead, we argue for the necessity of embedding dynamic, viability-driven regulatory mechanisms within generative systems to preserve coherent cognitive emulation. By framing persona-based evaluation as a structured dynamical system over latent representation manifolds, this study provides a foundation for more adaptive, human-aligned, and context-sensitive approaches to AI evaluation.

2605.31016 2026-06-01 cs.LG

An Efficient and Scalable Graph Condensation with Structure-Preserving

Yulin Hu, Fuyan Ou, Ye Yuan

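AI summary Proposes SP-ESGC, a structure-preserving graph condensation method that decouples node condensation from graph structure generation. It condenses graphs efficiently through heat kernel feature propagation and a hybrid clustering strategy, and uses a pre-trained edge predictor to generate transferable structural patterns, improving generalization across GNN architectures while remaining computationally efficient.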

Abstract

Graph condensation (GC) is pivotal for deploying Graph Neural Networks (GNNs) in resource-constrained scenarios by compressing large-scale graphs into compact synthetic counterparts. Existing GC methods commonly suffer from computational inefficiency due to coupled optimization and from poor generalization across GNN architectures. To address these challenges, this study proposes an Efficient and Scalable Graph Condensation with Structure-Preserving (SP-ESGC), which adopts a decoupled design that separates node condensation from graph structure generation. Specifically, it first employs heat kernel feature propagation to generate node representations via spectral graph theory-inspired diffusion. Further, a novel hybrid clustering strategy is designed to extract discriminative intra-class centroids from the node representations. Finally, a pre-trained edge predictor infers transferable structural patterns from the original graph, ensuring accurate synthetic graph generation. Extensive experiments on real-world graph datasets demonstrate that the proposed SP-ESGC achieves precise GC with high computational efficiency. Moreover, SP-ESGC also generalizes well across diverse GNN architectures.

2605.31013 2026-06-01 cs.LG

Physics-Informed Coarsening for Multigrid Graph Neural Surrogates

Amir Bazzi, David Cardinaux, Ramy Nemer, Jose Alaves, Arjun Kalkur Matpadi Raghavendra, Elie Hachem

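AI summary Targeting nonlinear elasticity, plasticity, and transient behavior in solid mechanics, proposes a multigrid graph neural network with a physics-informed coarsening strategy. A residual-based local activity score retains high-strain and high-stress regions, enabling hierarchical message passing and improving long-rollout stability and accuracy.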

Comments Accepted at ICML 2026. 16 pages, 5 figures

Abstract

Learning-based surrogates for partial differential equations have recently matched the accuracy of classical solvers while achieving orders-of-magnitude speedups, predominantly in fluid settings and structured geometries. In contrast, robust surrogates for deformable solids remain underexplored, despite the presence of nonlinear elasticity, plasticity, and transient behavior that challenge standard architectures. We introduce a multigrid graph neural network for solid mechanics that couples an encoder-processor-decoder backbone with a physics-informed coarsening strategy. Instead of downsampling via geometric heuristics, our method scores nodes using a residual-based measure of local physical activity and preferentially retains regions of high strain or stress concentration, allocating multiscale capacity where it is most needed. This preserves long-range interactions through hierarchical message passing while improving stability over long rollouts. We evaluate on multiple datasets covering linear, nonlinear, and transient regimes, and observe consistent gains in accuracy and rollout stability compared to standard sampling baselines. Our results highlight the importance of physics-informed coarsening for scalable surrogate modeling in solid mechanics.

2605.31010 2026-06-01 cs.CL

MoG: Mixture of Experts for Graph-based Retrieval-Augmented Generation

Zheng Yuan, Chuang Zhou, Linhao Luo, Siyu An, Di Yin, Xing Sun, Xiao Huang

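AI summary Proposes the MoG framework, which organizes knowledge into central hub graphs and sparsely activated expert graphs and uses a topology-aware router to dynamically select relevant expert graphs. This addresses the irrelevant information that a unified knowledge base introduces into retrieval-augmented generation, yielding over 20% relative improvement on MuSiQue.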

Abstract

Retrieval-augmented generation is intensively studied to ground large language models on external evidence. However, retrieving from a unified knowledge base could inevitably introduce irrelevant information that may mislead generation for complex reasoning. Inspired by the conditional computation of mixture of experts (MoE), where a router sparsely selects specialized experts alongside shared ones for each input, we propose Mixture of experts for Graph-based Retrieval-Augmented Generation, i.e., MoG. It organizes knowledge into two core components: (i) diverse, always-accessible hub graphs that encode semantically and structurally central knowledge and provide contextual clues for expert activation, and (ii) sparsely activated expert graphs that contain domain-specific evidence. MoG first accesses hub graphs to identify general evidence and derive contextual clues. Then, a topology-aware router dynamically activates a limited set of expert graphs conditioned on the query, thereby confining retrieval to a focused evidence subspace. Extensive experiments on challenging benchmarks show that MoG consistently outperforms strong baselines, with over 20% relative improvement on MuSiQue. Our code is available at https://github.com/DEEP-PolyU/MoG.

2605.31007 2026-06-01 cs.LG cs.AI

DEM: A Distilled Explanation Model for Interpretable Anomaly Detection in Physiological Sensor Networks

Jyotirmoy Singh, Anushka Roy, Shreea Bose, Chittaranjan Hota

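AI summary Proposes DEM, a three-stage glass-box framework that distills the knowledge of a gradient boosting expert into a decision tree fitted on residuals relative to a linear baseline, achieving anomaly detection that is both accurate and intrinsically interpretable, and introduces a distillation fidelity metric to quantify explanation trustworthiness.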

Comments 21 pages, 10 figures, 7 tables. Code: https://github.com/Jyotirmoy17/dem-model

Abstract

Anomalies in physiological sensor data from Wireless Body Area Networks (WBANs) can be caused by sensor faults, network disruptions, or missing data, leading to false alarms. Hence, anomaly detection in this setting demands both high predictive accuracy and clinically interpretable explanations. Existing approaches rely either on black-box models that achieve strong performance but offer no transparency, or on post-prediction explanation methods such as SHAP and LIME. In this paper, we propose the Distilled Explanation Model (DEM), a three-stage glass-box framework that distills the non-linear knowledge of a gradient boosting expert into an interpretable decision tree operating on residuals relative to a linear baseline, so that the explanation is not an approximation but the prediction itself. DEM introduces a novel distillation fidelity metric that quantifies how faithfully the explanation tree captures the expert model's non-linear contribution, providing a principled measure of explanation trustworthiness absent from prior interpretable models. Evaluated across four physiological datasets, including MIMIC-IV, WESAD, eICU, and an in-house SmartNet WBAN corpus, DEM achieves an AUC of 0.9964 on clinical contextual anomaly detection and 0.9047 on wearable stress detection while producing human-readable if-then rules at a controllable depth. Inference requires 0.17 ms per 1000 samples, rendering DEM 1235x faster than SHAP-based post-hoc explanation and suitable for real-time physiological monitoring. Ablation studies confirm that the XGBoost distillation step provides measurable gains over naive residual fitting, and depth-sensitivity analysis demonstrates an explicit, user-controlled accuracy-interpretability trade-off unique to DEM among existing intrinsically interpretable models.

2605.31005 2026-06-01 cs.LG

Learning Multi-Agent Coordination via Sheaf-ADMM

Jeffrey Seely, Bartłomiej Cupiał, Llion Jones

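AI summary Proposes a differentiable optimization framework that uses cellular sheaves and ADMM for multi-agent coordination, validates it on maze pathfinding, image classification, and Sudoku, and shows better interpretability and robustness than standard message-passing architectures.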

Comments 17 pages, 8 figures, 6 tables. Accepted at ICML 2026

Abstract

We present a differentiable optimization framework for multi-agent coordination. An input is decomposed into overlapping local views, each processed by an agent that solves a convex subproblem parameterized by a neural encoder. Agents coordinate through the Alternating Direction Method of Multipliers (ADMM) with inter-agent constraints specified by a cellular sheaf. The sheaf specifies which aspects of neighboring solutions must agree, allowing for heterogeneous notions of global consensus. Backpropagating through the unrolled optimization jointly trains all components of the multi-agent system. We evaluate on maze pathfinding, image classification, and Sudoku, where agents with individually insufficient local views learn to coordinate to produce correct global outputs. On MNIST, the local-view decomposition yields improved robustness to distribution shifts relative to a standard CNN. On Sudoku, the optimization-derived structure yields markedly higher solve rates than parameter-matched MPNN baselines. Finally, the ADMM structure exposes distinct primal, consensus, and dual state variables, opening the coordination dynamics to direct analysis and intervention -- a property unavailable in standard message-passing architectures.
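
The primal, consensus, and dual variables mentioned above are easiest to see in plain consensus ADMM. The sketch below is a simplification made for illustration and is not the paper's method: every agent holds a scalar, each local objective is an assumed quadratic 0.5 * (x_i - a_i)^2 rather than a subproblem parameterized by a neural encoder, and the agreement constraint is x_i = z for all agents, which corresponds to using identity maps where a cellular sheaf would allow more general linear ones.

```python
def consensus_admm(targets, rho=1.0, iterations=100):
    """Scaled-form consensus ADMM for scalar agents.

    Agent i minimizes 0.5 * (x_i - targets[i])**2 subject to x_i == z.
    Returns the primal variables x, the consensus variable z, and the
    scaled dual variables u.
    """
    n = len(targets)
    x = [0.0] * n
    u = [0.0] * n
    z = 0.0
    for _ in range(iterations):
        # Primal step: closed-form minimizer of each agent's local problem.
        x = [(a + rho * (z - u_i)) / (1.0 + rho) for a, u_i in zip(targets, u)]
        # Consensus step: average of the agents' proposals plus duals.
        z = sum(x_i + u_i for x_i, u_i in zip(x, u)) / n
        # Dual step: accumulate each agent's remaining disagreement with z.
        u = [u_i + x_i - z for x_i, u_i in zip(x, u)]
    return x, z, u


x, z, u = consensus_admm([1.0, 2.0, 6.0])
print(round(z, 6))  # 3.0
```

All three state vectors are explicit at every iteration, so one can inspect or overwrite `u` mid-run, which is the kind of direct intervention the abstract attributes to the ADMM structure.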

2605.31001 2026-06-01 cs.CV

Iterative Framework For Data Augmentation Of Segmented Fingerprints

João Leonardo H. D. Agnol, Wesley Augusto de Bona, Erick Oliveira Rodrigues, Luiz Fernando Puttow Southier, Jefferson Oliva, Marcelo Filipak, Dalcimar Casanova

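AI summary To address the scarcity of infant fingerprint data, proposes an iterative data augmentation method that induces errors in a convolutional neural network trained to extract fingerprint ridges and valleys, producing diverse variants of segmented fingerprints. Experiments show that the method expands fingerprint variability while preserving visual similarity.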

Journal ref Anais do XV Workshop de Sistemas de Informação 2024

Abstract

Infant biometrics presents unique challenges due to the physiological differences between infants and adults, compounded by the scarcity of available data for research that limits the development of robust matching systems. This paper proposes a novel data augmentation method that uses iterative techniques to generate diverse variants of segmented fingerprints by inducing errors in a convolutional neural network trained to extract fingerprint ridges and valleys. Experiments on real infant fingerprints demonstrate the method's effectiveness in expanding fingerprint variability, with augmentations exhibiting significant fluctuations in minutiae counts while still retaining visual similarity to the originals. The study also highlights the method's customizable nature for applying varying levels of changes to fingerprint segmentations. Future research includes training segmentation and matching neural networks using datasets augmented by the proposed framework.

2605.30992 2026-06-01 cs.LG

Eigenvectors of Experts are Training-free Non-collapsing Routers

Giang Do, Hung Le, Truyen Tran

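AI summary To address expert collapse in sparse mixture-of-experts models, proposes SSMoE, a training-free routing framework based on the eigenvectors of expert weight matrices, which exploits spectral properties via singular value decomposition to improve model performance.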

Comments 24 pages

Journal ref ICML 2026

Abstract

Sparse Mixture of Experts (SMoE) architectures improve the training efficiency of Large Language Models (LLMs) by routing input tokens to a selected subset of specialized experts. Despite their remarkable success, both training and inference in SMoE models suffer from the expert collapse issue (Chi et al., 2022), which degrades model performance. Prior studies primarily focus on improving the router; however, such methods rely on training from scratch or fine-tuning, which requires high computational and data-processing costs. Furthermore, we demonstrate that, despite these efforts, the issue persists when advancing well-pretrained SMoE models, as evidenced by both theoretical and empirical results. To fill that gap, we analyze the advanced SMoE models and observe that the eigenvectors of expert weight matrices encode rich semantic information, pointing to an effective alternative to conventional routing strategies. Building on this insight, we propose Singular Value Decomposition SMoE (SSMoE), a novel and training-free framework that leverages spectral properties of the expert weights to address the collapse issue and enhance model performance. Extensive experiments across diverse language and vision tasks, under both clean and corrupt data settings, demonstrate the strong generalization and robustness of SSMoE. Our findings highlight how a deeper understanding of model internals can guide the development of more effective SMoE architectures. Our implementation is publicly available at https://github.com/giangdip2410/SSMoE.

2605.30991 2026-06-01 cs.LG cs.CV

Parallel Tempering Initial Sampling in Inference-Time Reward Alignment

Myeongjun Oh, Gwangho Kim, Sungyoon Lee

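AI summary Standard SMC methods for inference-time reward alignment get trapped in local modes because of their initial sampling. To address this, proposes PATHS, a parallel-tempering method that couples multiple tempered chains for efficient exploration and improves alignment quality.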

Comments 31 pages, 11 figures

Abstract

Inference-time reward alignment steers pretrained diffusion and flow-based generative models to satisfy user-specified rewards without retraining. Recently, Sequential Monte Carlo (SMC) has emerged as a powerful framework for this task by iteratively filtering and propagating multiple particles. However, we show that standard SMC-based methods often suffer from poor performance because they initialize particles from a standard prior, whereas high-reward regions in complex reward landscapes are extremely rare. Further, we show that even recent reward-aware initial sampling approaches remain vulnerable to getting trapped in local modes, as complex reward landscapes are often multi-modal. To overcome these limitations, we propose PATHS (PArallel Tempering for High-complexity reward Sampling), a novel initialization method that couples multiple sampling chains through parallel tempering. PATHS maintains a ladder of reward-tempered chains and periodically performs Metropolis swaps, enabling efficient exploration across flattened reward landscapes, thereby mitigating the mode-trapping issues. Our analysis reveals that this mechanism substantially enhances the finite-budget exploration of rare, high-reward regions that are typically challenging to sample. Experiments on layout-to-image and quantity-aware generation show that PATHS achieves consistent gains in alignment quality, particularly on complex prompts.
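
The Metropolis swap at the core of parallel tempering fits in a few lines. This is a generic sketch of the standard swap rule, not PATHS itself: it assumes chain k targets a density proportional to prior(x) * exp(beta_k * r(x)), and the function names and example numbers are hypothetical.

```python
import math
import random


def swap_acceptance(beta_i, beta_j, reward_i, reward_j):
    """Metropolis acceptance probability for exchanging two chains' states.

    Under the assumed tempered targets the prior cancels, leaving
    min(1, exp((beta_i - beta_j) * (reward_j - reward_i))).
    """
    log_ratio = (beta_i - beta_j) * (reward_j - reward_i)
    return 1.0 if log_ratio >= 0 else math.exp(log_ratio)


def maybe_swap(states, rewards, betas, i, j, rng):
    """Swap the states (and cached rewards) of chains i and j if accepted."""
    accept = swap_acceptance(betas[i], betas[j], rewards[i], rewards[j])
    if rng.random() < accept:
        states[i], states[j] = states[j], states[i]
        rewards[i], rewards[j] = rewards[j], rewards[i]
        return True
    return False


# Cold chain (beta = 1.0) holds a low-reward state; hot chain (beta = 0.25)
# has found a high-reward one, so the swap is always accepted.
betas = [1.0, 0.25]
states = ["low_reward_sample", "high_reward_sample"]
rewards = [0.2, 2.0]
swapped = maybe_swap(states, rewards, betas, 0, 1, random.Random(0))
print(swapped, states[0])
```

Hot chains see a flattened reward landscape and move between modes easily; swaps like this one pass their discoveries down the ladder to the cold chain.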

2605.30989 2026-06-01 cs.RO

A study on a Real-Time VR-Based Teleoperation Framework for Manipulator in Dynamic Environment

InGyu Choi, GeonYeong Go, SunWoo Ahn, HyoJae Kang, Min-Sung Kang

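AI summary Proposes a VR teleoperation framework that integrates GPU-accelerated inverse kinematics and trajectory optimization, achieving low-latency, collision-aware, real-time manipulator control in environments with static and dynamic obstacles.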

Comments This manuscript has been submitted for possible publication

Abstract

Robot teleoperation enables safe, non-contact task execution in hazardous environments where direct human access is difficult, and its application has expanded with recent VR technologies. Many VR teleoperation studies, however, have primarily served as data-collection tools for robot imitation learning, so they often do not explicitly address dynamic obstacles, workspace changes, or collision risks during operation. For real deployment aimed at operator safety, teleoperation must react to dynamic situations with low latency and remain robust to mistakes made by inexperienced operators. This paper presents a VR teleoperation framework that supports real-time manipulation while handling collisions with both static and moving obstacles. The framework integrates GPU-accelerated inverse kinematics and trajectory optimization within a VR interface to generate feasible joint commands at each control cycle under robot constraints. Experiments with a 7-DoF manipulator demonstrate stable online behavior and collision-aware motion generation across three scenarios: obstacle-free, static-obstacle, and moving-obstacle environments. The results indicate that the proposed approach generates motion consistent with the operator's command while producing safe detours when obstacles interfere with the commanded path.

2605.30987 2026-06-01 cs.CV

Benchmarking Single-Step Inpainting Methods for Multi-Object 3D Gaussian Splatting Scenes

多对象3D高斯泼溅场景的单步修复方法基准测试

Finn Dröge, Cecilia Curreli, Abhishek Saroha, Daniel Cremers

AI总结 针对3D高斯泼溅场景中的对象移除与修复任务,比较了2D修复器在3D一致性上的表现,发现基于重建的修复器优于生成扩散模型,且从头初始化场景比微调现有场景效果更好,同时引入了一个带真实数据的新多对象场景。

Comments Accepted as an extended abstract to the CVEU Workshop at CVPR 2026

详情
AI中文摘要

对象移除和修复3D高斯泼溅(3DGS)场景面临跨相机视图的3D一致性等挑战。在比较2D修复器及其对3D领域的适用性时,我们发现基于重建的修复器在3D一致性上优于生成扩散模型。将这些2D修复器集成到创建和微调3DGS场景的不同单步方法中,我们的结果表明,从头初始化场景比微调现有场景产生更高质量的结果。使用最先进的生成式2D修复器,我们创建了一个简单的基线,以强调在3D设置中先移除对象再进行修复的重要性。由于360°数据集很少包含真实世界的地面真值,且具有挑战性的遮挡场景同样稀少,我们引入了一个新的多对象场景,其中包含记录的地面真值数据和多个存在对象遮挡的视图。

英文摘要

Object removal and inpainting in 3D Gaussian Splatting (3DGS) scenes face challenges such as maintaining 3D consistency across camera views. In comparing 2D inpainters and their suitability for the 3D domain, we find that reconstruction-based inpainters outperform generative diffusion models in 3D consistency. When these 2D inpainters are integrated into different single-step methods for creating and finetuning 3DGS scenes, our results indicate that initializing the scene from scratch produces higher-quality results than finetuning the existing scene. Using a state-of-the-art generative 2D inpainter, we create a straightforward baseline to underline the importance of object removal before inpainting in the 3D setting. Since 360° datasets rarely include real-world ground truths, and challenging occlusion scenarios are equally scarce, we introduce a novel multi-object scene with recorded ground truth data and many views with object occlusions.

2605.30984 2026-06-01 cs.CV cs.AI cs.CL

Generating Reports or Repeating Templates? Measuring and Mitigating Template Collapse in 3D CT Report Generation

生成报告还是重复模板?测量和缓解三维CT报告生成中的模板崩溃

Tom Maye-Lasserre, Yitong Li, Bailiang Jian, Morteza Ghahremani, Benedikt Wiestler, Christian Wachinger

AI总结 针对三维CT报告生成中模型输出多样性低、病理检测能力差的模板崩溃问题,提出解耦框架CLarGen,通过分离临床检测与语言合成,显著提升临床准确性并保持报告流畅性。

详情
AI中文摘要

现代三维医学视觉语言模型(VLM)能够生成流畅的放射学风格文本,但表现出极低的病理检测率和输出多样性,崩溃为低估罕见但关键发现的通用模板。我们将这种失败模式识别为模板崩溃。这种失败源于三维医学成像的独特限制,例如数据有限、标签严重不平衡以及体积编码器的弱信号。在这些限制下,文本生成目标鼓励捷径学习和流畅但基础薄弱的报告。我们通过临床保真度、输出多样性、正常模板偏差和罕见发现存活率系统性地诊断模板崩溃。为了缓解它,我们提出CLarGen,一个解耦框架,将说什么(临床检测)与怎么说(语言合成)分开。CLarGen使用(i)用于多标签病理检测的潜在查询变换器,(ii)用于临床匹配示例的病理引导检索,以及(iii)用于从检测到的发现和检索到的上下文中合成最终报告的医学语言模型。在最新的三维CT报告生成基线中,CLarGen缓解了模板崩溃,并在保持流畅报告的同时显著提高了临床准确性(macro-F1 0.487 vs. 0.189;CRG 0.472 vs. 0.368)。我们的结果表明,明确、可测量的临床基础对于抗模板崩溃的三维CT报告生成至关重要。代码将在接收后发布。

英文摘要

Modern 3D medical vision-language models (VLMs) can generate fluent radiology-style text while exhibiting critically low pathology detection and output diversity, collapsing to generic templates that under-report rare yet critical findings. We identify this failure mode as Template Collapse. This failure stems from the unique constraints of 3D medical imaging, e.g., limited data, severe label imbalance, and weak signals from volumetric encoders. Under these constraints, text-generation objectives encourage shortcut learning and fluent but weakly grounded reports. We systematically diagnose Template Collapse through clinical fidelity, output diversity, normal-template bias, and rare-finding survival. To mitigate it, we propose CLarGen, a decoupled framework that separates what to say (clinical detection) from how to say it (language synthesis). CLarGen uses (i) a Latent Query Transformer for multi-label pathology detection, (ii) pathology-guided retrieval for clinically matched exemplars, and (iii) a medical language model to synthesize the final report from detected findings and retrieved context. Across state-of-the-art 3D CT report generation baselines, CLarGen mitigates Template Collapse and substantially improves clinical accuracy (macro-F1 0.487 vs. 0.189; CRG 0.472 vs. 0.368) while maintaining fluent reporting. Our results suggest that explicit, measurable clinical grounding is essential for template-collapse-resistant 3D CT report generation. Code will be released upon acceptance.
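
The abstract reports macro-F1 and lists output diversity among its diagnostics. The sketch below shows one way such quantities could be computed. It assumes findings are encoded as a binary multi-label matrix, and it treats diversity as the fraction of distinct reports after whitespace and case normalization. That diversity definition is an assumption made here for illustration, and the function names are hypothetical; the paper's own metrics may differ.

```python
from typing import List, Sequence


def macro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Unweighted mean of per-label F1 over a binary multi-label matrix."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 1)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 0 and p[j] == 1)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t[j] == 1 and p[j] == 0)
        denom = 2 * tp + fp + fn
        # A label with no positives and no predictions scores 0 here (a convention).
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


def distinct_report_ratio(reports: List[str]) -> float:
    """Assumed diversity proxy: share of distinct reports after normalization."""
    normalized = {" ".join(r.lower().split()) for r in reports}
    return len(normalized) / len(reports)


# A model that always emits the same template misses the rare second label,
# so its macro-F1 is pulled down even though the common label is perfect.
y_true = [[1, 0], [1, 0], [1, 1], [1, 0]]
template_pred = [[1, 0]] * 4
print(macro_f1(y_true, template_pred))
```

Because macro-F1 averages over labels without weighting by frequency, a missed rare finding costs as much as a missed common one, which is why it is sensitive to the kind of collapse described above.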

2605.30983 2026-06-01 cs.CV

Can BEV Perception Gracefully Degrade under Sensor Failures?

BEV感知能否在传感器故障下优雅降级?

Haifa Zhang, Yijing Wang, Haoyu Wang, Zheng Li, Zhiqiang Zuo

AI总结 针对多模态BEV感知在传感器损坏时性能骤降的问题,提出Grace-BEV框架,通过主动可靠性评估和动态特征重校准实现优雅降级,在极端LiDAR故障下将mAP从0.0%恢复至34.7%。

详情
AI中文摘要

尽管多模态鸟瞰图(BEV)感知在自动驾驶中取得了显著成功,但现有系统存在一个关键脆弱性:现有融合机制对传感器损坏高度敏感,常导致灾难性性能下降。这种脆弱性主要源于标准融合框架通常以静态方式集成多模态表示,导致在缺失或损坏模态下性能急剧崩溃。相比之下,我们表明通过主动模态可靠性评估可以实现优雅降级。为此,我们提出Grace-BEV,一个轻量级即插即用框架,在多模态融合过程中强制引入主动可靠性感知。Grace-BEV不依赖计算昂贵的跨模态交互,而是利用对齐的BEV空间通过TrustGate路由器显式评估模态可信度,并使用FailSafe融合块动态重新校准特征集成。此外,我们设计了带模态丢弃的三阶段训练策略,以防止模态主导并鼓励在不可靠输入下进行平衡的跨模态学习。在nuScenes-R和nuScenes-C上的大量实验表明,Grace-BEV在各种损坏设置下保持稳健性能。值得注意的是,在标准基线崩溃至0.0%平均精度(mAP)的灾难性LiDAR故障下,Grace-BEV将性能恢复至高达34.7% mAP。此外,它将干净准确率提升高达1.4%,实现了鲁棒性与效率之间的强权衡。

英文摘要

Despite the remarkable success of multi-modal bird's-eye view (BEV) perception in autonomous driving, current systems exhibit a critical vulnerability: existing fusion mechanisms are highly brittle to sensor corruptions, often causing catastrophic performance degradation. This vulnerability largely stems from the fact that standard fusion frameworks typically integrate multi-modal representations in a static manner, leading to a precipitous performance collapse under missing or corrupted modalities. In contrast, we show that graceful degradation is achievable through active modality reliability assessment. To this end, we present Grace-BEV, a lightweight and plug-and-play framework that enforces active reliability awareness during multi-modal fusion. Instead of relying on computationally expensive cross-modal interactions, Grace-BEV leverages the aligned BEV space to explicitly assess modality trustworthiness via a TrustGate Router and dynamically recalibrate feature integration using the FailSafe Fusion Block. Furthermore, we devise a Three-Phase Training strategy with Modality Dropout to prevent modality dominance and encourage balanced cross-modal learning under unreliable inputs. Extensive experiments on nuScenes-R and nuScenes-C show that Grace-BEV maintains robust performance across diverse corruption settings. Notably, under catastrophic LiDAR failures where standard baselines collapse to 0.0% mean Average Precision (mAP), Grace-BEV restores performance to as high as 34.7% mAP. Moreover, it improves clean accuracy by up to 1.4%, achieving a strong trade-off between robustness and efficiency.
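
To make the contrast between static fusion and reliability-aware fusion concrete, here is a minimal sketch. It assumes each modality already has a scalar reliability score (in the paper this role is played by a learned router; the scores below are hand-picked placeholders) and turns the scores into softmax weights. The function names are hypothetical and this is not the Grace-BEV implementation.

```python
import math
from typing import Dict, List


def reliability_weights(scores: Dict[str, float], temperature: float = 1.0) -> Dict[str, float]:
    """Softmax over per-modality reliability scores (numerically stabilized)."""
    m = max(scores.values())
    exps = {k: math.exp((v - m) / temperature) for k, v in scores.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def gated_fusion(features: Dict[str, List[float]], scores: Dict[str, float]) -> List[float]:
    """Weighted sum of aligned feature vectors using reliability weights."""
    w = reliability_weights(scores)
    n = len(next(iter(features.values())))
    return [sum(w[k] * features[k][i] for k in features) for i in range(n)]


# Toy case: the LiDAR branch has failed and outputs zeros.
features = {"camera": [1.0, 2.0], "lidar": [0.0, 0.0]}
scores = {"camera": 2.0, "lidar": -2.0}  # placeholder scores, not learned
static = [0.5 * (c + l) for c, l in zip(features["camera"], features["lidar"])]
print(static, gated_fusion(features, scores))
```

With a fixed 50/50 average the surviving camera features are halved, while the gated version keeps them close to their original magnitude.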

2605.30981 2026-06-01 cs.CL cs.LG

Cognitive Fatigue in Autoregressive Transformers: Formalization and Measurement

自回归Transformer中的认知疲劳:形式化与测量

Riju Marwah, Ritvik Garimella, Vishal Pallagani, Atishay Jain, Michael Stewart, Amit Sheth

AI总结 本文形式化自回归语言模型在长程生成中的退化现象为认知疲劳,并提出轻量级诊断指标疲劳指数(FI),通过聚合注意力衰减、表征漂移和熵校准三个信号实现实时监测,实验表明FI能高精度预测任务退化和重复生成。

Comments 9 pages, 7 figures. Accepted at the 43rd International Conference on Machine Learning (ICML 2026)

详情
AI中文摘要

自回归语言模型在长程生成过程中经常退化,产生重复文本、失去指令遵循能力并表现出不稳定的熵。尽管这些失败普遍存在,但从业者缺乏在线诊断工具来实时检测它们。我们将这种退化形式化为认知疲劳,这是一种可测量的生成时状态,其特征是对原始提示的注意力衰减、表征漂移和熵校准错误。我们引入了疲劳指数(FI),这是一种轻量级、模型无关的诊断方法,在明确的公理(单调性、有界性、可解释性)下聚合这三个信号,从而实现可靠的运行时监控。在九个模型(1B-13B参数)上,FI轨迹表现出结构化的时间动态,预测任务退化(AUROC = 0.95)和重复(Spearman rho = 0.94),并揭示了非单调的缩放行为:低于3B的指令微调模型比基础模型退化更快,而在7B时这一趋势逆转。压力分析进一步表明,在更长的上下文、中间位置的证据和降低的数值精度下,FI onset加速。这些结果确立了认知疲劳作为一个连贯且可测量的现象,并将FI定位为生产级LLM系统中运行时可靠性监控的原则性工具。

英文摘要

Autoregressive language models frequently degrade during long-horizon generation, producing repetitive text, losing instruction adherence, and exhibiting unstable entropy. Despite the prevalence of these failures, practitioners lack online diagnostics to detect them in real time. We formalize this degradation as cognitive fatigue, a measurable generation-time state characterized by decay in attention to the original prompt, representational drift, and entropy miscalibration. We introduce the Fatigue Index (FI), a lightweight, model-agnostic diagnostic that aggregates these three signals under explicit axioms (monotonicity, boundedness, interpretability), enabling reliable runtime monitoring. Across nine models (1B-13B parameters), FI trajectories exhibit structured temporal dynamics, predict task degradation (AUROC = 0.95) and repetition (Spearman rho = 0.94), and reveal non-monotonic scaling behavior: instruction-tuned models below 3B exhibit faster collapse than base models, with this trend reversing at 7B. Stress analyses further show that FI onset accelerates under longer contexts, middle-positioned evidence, and reduced numerical precision. These results establish cognitive fatigue as a coherent and measurable phenomenon, and position FI as a principled tool for runtime reliability monitoring in production LLM systems.
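
The abstract does not give the formula for the Fatigue Index, so the sketch below is only one aggregation that would satisfy the stated axioms. It assumes the three signals are already normalized to [0, 1] and combines them with a weighted mean, which is bounded and non-decreasing in each signal. The weights, the threshold, and the function names are hypothetical.

```python
from typing import List, Optional, Sequence


def fatigue_index(
    attention_decay: float,
    drift: float,
    entropy_miscal: float,
    weights: Sequence[float] = (1.0, 1.0, 1.0),
) -> float:
    """Assumed aggregation: weighted mean of three signals clipped to [0, 1]."""
    signals = (attention_decay, drift, entropy_miscal)
    clipped = [min(1.0, max(0.0, s)) for s in signals]
    return sum(w * s for w, s in zip(weights, clipped)) / sum(weights)


def first_alarm(trajectory: List[float], threshold: float) -> Optional[int]:
    """Index of the first generation step whose index value reaches the threshold."""
    for step, value in enumerate(trajectory):
        if value >= threshold:
            return step
    return None


# Toy per-step signals (attention decay, drift, entropy miscalibration).
steps = [(0.1, 0.0, 0.1), (0.3, 0.2, 0.2), (0.7, 0.6, 0.8)]
trajectory = [fatigue_index(*s) for s in steps]
print(first_alarm(trajectory, threshold=0.6))
```

A running monitor of this kind only needs the three scalars per step, which is consistent with the lightweight, runtime use the abstract describes.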

2605.30972 2026-06-01 cs.CV

BiSegMamba: Efficient Bidirectional Tri-Oriented Mamba for 3D Medical Image Segmentation

BiSegMamba: 用于3D医学图像分割的高效双向三向Mamba

Bakht Zada, Chao Tong, Qile Su, Shuai Zhang

AI总结 提出BiSegMamba,一种基于双向三向Mamba的高效3D医学图像分割网络,通过渐进压缩主干、多尺度空间混合器、双向正交Mamba块和自适应方向融合,在降低计算成本的同时提升分割精度。

Comments 10 pages, 7 figures, 5 tables. Code is available at: https://github.com/bakhtzadaabshare/BiSegMamba

详情
AI中文摘要

精确的3D医学图像分割需要长程体积上下文和精细边界保持。基于CNN的方法全局依赖建模有限,而基于Transformer的模型对于密集3D输入通常计算成本高昂。最近的基于Mamba的方法提供了一种高效替代方案,但现有的体积设计仍依赖于重复的高分辨率扫描、仅前向的顺序建模和固定的方向求和,导致高成本、扫描顺序偏差和次优的方向聚合。我们提出BiSegMamba,一种用于3D医学图像分割的高效双向三向Mamba网络。BiSegMamba遵循紧凑到细节的设计,其中渐进压缩主干(PCS)能够进行高效的潜在空间推理,同时保留浅层高分辨率特征用于重建。多尺度空间混合器(MSSM)在早期阶段捕获局部解剖模式,而提出的双向三向正交Mamba(Bi-ToOM)块使用联合处理的前向和后向扫描序列,从多个正交视图建模长程依赖。自适应方向融合(ADF)学习跨扫描方向的输入相关通道权重,用方向感知融合替代固定求和。在收集的颈动脉CTA数据集和三个公共基准BraTS2023、ACDC和AMOS-CT上的实验表明,BiSegMamba在血管、心脏、脑肿瘤和腹部多器官分割任务中具有良好的泛化能力。与SegMamba-V2相比,BiSegMamba在BraTS2023上性能略有提升,在ACDC和颈动脉数据集上显著改进,同时计算成本降低高达77.9% FLOPs,展示了在通用3D医学图像分割中强大的精度-效率平衡。

英文摘要

Accurate 3D medical image segmentation requires both long-range volumetric context and fine boundary preservation. CNN-based methods have limited global dependency modeling, while Transformer-based models are often computationally expensive for dense 3D inputs. Recent Mamba-based methods provide an efficient alternative, but existing volumetric designs still depend on repeated high-resolution scanning, forward-only sequential modeling, and fixed directional summation, causing high cost, scan-order bias, and suboptimal directional aggregation. We propose BiSegMamba, an efficient bidirectional tri-oriented Mamba network for 3D medical image segmentation. BiSegMamba follows a compact-to-detail design, where a progressive compacting stem (PCS) enables efficient latent-space reasoning while retaining shallow high-resolution features for reconstruction. A multi-scale spatial mixer (MSSM) captures local anatomical patterns in early stages, and the proposed bidirectional tri-oriented Ortho Mamba (Bi-ToOM) block models long-range dependencies from multiple orthogonal views using jointly processed forward and backward scan sequences. Adaptive directional fusion (ADF) learns input-dependent channel-wise weights across scan orientations, replacing fixed summation with orientation-aware fusion. Experiments on a collected carotid CTA dataset and three public benchmarks, BraTS2023, ACDC, and AMOS-CT, show that BiSegMamba generalizes well across vascular, cardiac, brain tumor, and abdominal multi-organ segmentation tasks. Compared with SegMamba-V2, BiSegMamba achieves slightly better performance on BraTS2023 and clear improvements on ACDC and the carotid dataset, while reducing computational cost by up to 77.9% FLOPs, demonstrating a strong accuracy-efficiency balance for general 3D medical image segmentation.
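
Scanning a volume from several orthogonal views in both directions can be pictured as visiting the same voxels in different orders. The sketch below enumerates such orders for a small D x H x W grid. It assumes the three orientations correspond to three axis orderings, labeled here with the hypothetical names axial, coronal, and sagittal; the exact scan paths used by the Bi-ToOM block are not specified in the abstract.

```python
from itertools import product
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]  # (z, y, x)


def scan_orders(d: int, h: int, w: int) -> Dict[str, Dict[str, List[Voxel]]]:
    """Forward and backward voxel visiting orders for three axis orderings."""
    orders = {
        # Slowest-varying axis first; every voxel is emitted as (z, y, x).
        "axial": [(z, y, x) for z, y, x in product(range(d), range(h), range(w))],
        "coronal": [(z, y, x) for y, z, x in product(range(h), range(d), range(w))],
        "sagittal": [(z, y, x) for x, z, y in product(range(w), range(d), range(h))],
    }
    return {
        name: {"forward": seq, "backward": seq[::-1]}
        for name, seq in orders.items()
    }


orders = scan_orders(2, 2, 2)
for name, seqs in orders.items():
    print(name, seqs["forward"][:3], seqs["backward"][:3])
```

Each order is a permutation of the same voxel set, so a sequence model run over all six sequences sees every voxel with different neighbors preceding it, which is the intuition behind reducing scan-order bias.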

2605.30969 2026-06-01 cs.CV

Omni-Supervised Motion Editing: Balancing Change and Invariance through Positive-Negative Learning

全监督运动编辑:通过正负学习平衡变化与不变性

Zhenwu Shi, Jingyu Gong, Peiwei Wang, Xingzan Wang, Tianwen Qian, Wenxi Li, Yuan Fang, Jiao Xie, Lizhuang Ma, Shaohui Lin

AI总结 提出OmniME框架,通过正负学习结合回顾特征监督、运动保持机制和三元组语义对齐,平衡运动编辑中的变化与不变性,在MotionFix和STANCE Adjustment数据集上达到最优性能。

详情
AI中文摘要

基于文本的人体运动编辑旨在根据自然语言指令修改现有运动序列,同时保持原始运动的一致性。现有的基于扩散的方法通常依赖启发式相似性线索或粗糙的全局条件,导致运动失真和次优的语义对齐。关键挑战在于平衡变化(即精确编辑目标区域)和不变性(即保留未编辑部分)。为应对这一挑战,我们提出了一个全监督正负学习框架,名为OmniME。我们的方法集成了三个互补组件:(1)回顾特征监督,在Transformer层之间强制执行从粗到细的一致性;(2)运动保持机制,根据源-目标相似性关注细微变化;(3)基于三元组的语义对齐,增强文本-运动对应关系。这些组件共同形成了一个统一的监督范式,平衡变化与不变性。在MotionFix和STANCE Adjustment数据集上的大量实验表明,OmniME在编辑对齐方面达到了最先进的性能,验证了我们统一学习框架的有效性。我们的源代码和模型已发布在:https://github.com/rocket-ycyer/OmniME.git

英文摘要

Text-based human motion editing aims to modify existing motion sequences according to natural language instructions while maintaining the consistency of the original motion. Existing diffusion-based approaches often rely on heuristic similarity cues or coarse global conditioning, leading to motion distortion and suboptimal semantic alignment. The key challenge lies in balancing change (i.e., precisely editing target regions) and invariance (i.e., preserving unedited parts). To address this challenge, we propose an Omni-Supervised Positive-Negative Learning framework, named OmniME. Our method integrates three complementary components: (1) retrospective feature supervision that enforces coarse-to-fine consistency across transformer layers, (2) a motion preservation mechanism that focuses on subtle variations according to the source-target similarity, and (3) triplet-based semantic alignment that strengthens text-motion correspondence. Together, these components form a unified supervision paradigm that balances change and invariance. Extensive experiments on the MotionFix and STANCE Adjustment datasets demonstrate that OmniME achieves state-of-the-art performance in editing alignment, validating the effectiveness of our unified learning framework. Our source code and models have been released at: https://github.com/rocket-ycyer/OmniME.git
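
The third component is described as triplet-based alignment. As a reference point, the sketch below implements the standard triplet margin loss on plain vectors, where the anchor could stand for a text embedding and the positive and negative for matching and non-matching motion embeddings. This is the textbook formulation, shown as an assumption; the paper's exact loss and embedding choices are not given in the abstract.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,
) -> float:
    """max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# The loss is zero once the positive is closer than the negative by the margin.
print(triplet_margin_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0], margin=0.5))
```

The loss only penalizes triplets that violate the margin, so well-separated pairs stop contributing gradient and training focuses on the hard cases.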

2605.30968 2026-06-01 cs.CV cs.AI

Variational Adapter for Cross-modal Similarity Representation

变分适配器用于跨模态相似性表示

WenZhang Wei, Zhipeng Gui, Dehua Peng, Tiandi Ye, Huayi Wu

AI总结 针对跨模态匹配中细粒度标注稀缺导致二元分类边界压缩和假负样本问题,提出变分适配器VACSR,将匹配任务重构为变分推断问题,通过构建潜在相似性空间和正则化缓解过拟合,在图像-文本检索、域泛化和基类到新类泛化任务上验证了有效性。

Comments Accepted by the 43rd International Conference on Machine Learning (ICML 2026)

详情
AI中文摘要

视觉-语言模型的核心在于在统一表示空间中度量跨模态相似性。然而,大多数图像-文本匹配或多类图像分类数据集缺乏细粒度的跨模态匹配标注,迫使连续的相似性空间压缩为二元分类边界。这种压缩引入了假负样本,并严重损害了跨模态任务的泛化性能。尽管先前的研究试图通过建模模态内模糊性来缓解这一问题,但往往忽略了固有的标注缺陷,导致不确定性分配次优。为了解决这些挑战,我们提出了一种变分适配器用于跨模态相似性表示(VACSR)。该方法将具有细粒度语义稀缺性的图像-文本匹配重新表述为变分推断问题。它构建了一个跨模态相似性的潜在空间,并使用正则化技术来减轻对二元标注的过拟合。在图像-文本检索、域泛化和基类到新类泛化上的实验证明了所提出方法的有效性和鲁棒的泛化能力。

英文摘要

The core of vision-language models lies in measuring cross-modal similarity within a unified representation space. However, most image-text matching or multi-class image classification datasets lack fine-grained cross-modal matching annotations, forcing the continuous similarity space into binary classification boundaries. This compression induces false negative samples and significantly impairs the generalization performance of cross-modal tasks. While prior research has attempted to mitigate this by modeling intra-modal ambiguity, it often overlooks inherent annotation flaws, leading to suboptimal uncertainty allocation. To address these challenges, we propose a Variational Adapter for Cross-modal Similarity Representation (VACSR). This approach reformulates image-text matching with fine-grained semantic scarcity as a variational inference problem. It constructs a latent space for cross-modal similarity and uses regularization techniques to mitigate overfitting to binary annotations. Experiments on image-text retrieval, domain generalization, and base-to-novel generalization demonstrate the proposed method's effectiveness and robust generalization ability.
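
Recasting matching as variational inference usually means learning a distribution over a latent variable and regularizing it toward a prior. The sketch below shows the two generic building blocks of that recipe for a diagonal Gaussian posterior with a standard normal prior: the closed-form KL term and a reparameterized sample. Both the Gaussian family and the prior are assumptions made for illustration; the abstract does not state which distributions or regularizers VACSR uses.

```python
import math
import random
from typing import List, Sequence


def gaussian_kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) in closed form."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def sample_latent(mu: Sequence[float], logvar: Sequence[float], rng: random.Random) -> List[float]:
    """Reparameterized draw: z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


rng = random.Random(0)
mu, logvar = [0.3, -0.1], [-1.0, -0.5]  # hypothetical posterior parameters
z = sample_latent(mu, logvar, rng)
print(z, gaussian_kl_to_standard_normal(mu, logvar))
```

In a training objective the KL term would be added to the matching loss with some weight, discouraging the latent similarity from collapsing onto the binary labels.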

2605.30961 2026-06-01 cs.CL

EvoGens: A Population-Based Heuristic Search Framework for Scientific Idea Generation

EvoGens:一种基于种群的启发式搜索框架用于科学思想生成

Xu Li, Hanzhe Tu, Xinyi Li, Kuncheng Zhao, Xun Han, Zhonghui Liu

AI总结 针对现有LLM方法生成科学思想时语义趋同、多样性和新颖性不足的问题,提出EvoGens框架,通过进化搜索(变异、交叉、选择)增强思想探索,显著提升新颖性和多样性。

Comments 21 pages, 6 figures

详情
AI中文摘要

生成新颖的研究思想是科学进步的基础。虽然大型语言模型(LLM)在辅助这一过程中显示出潜力,但现有方法常表现出语义趋同,导致多样性和新颖性有限。为解决这一问题,我们引入了EvoGens,一个受进化启发的框架,将科学思想生成重新构想为对思想种群的进化搜索。EvoGens迭代地应用基于排名的变异与差异化检索规划以融入外部知识,以及语义感知的交叉以融合互补概念进行概念重组。一个轻量级的评估信号指导选择过程,鼓励持续探索同时缓解过早收敛。大量实验表明,与最先进的基线相比,EvoGens显著增强了探索能力。具体而言,在当前的自动评估协议下,它将新颖性从0.1提升到0.4,多样性从0.24提升到0.55,同时保持了可比的思想质量。这些发现表明,进化机制可以作为面向探索的研究构思的有用框架,特别是在共享自动评估设置下拓宽候选思想的新颖性和多样性。

英文摘要

Generating novel research ideas is fundamental to scientific progress. While Large Language Models (LLMs) show promise in assisting this process, existing approaches often exhibit semantic convergence, resulting in limited diversity and novelty. To address this, we introduce EvoGens, an evolution-inspired framework that recasts scientific idea generation as an evolutionary search over a population of ideas. EvoGens iteratively applies rank-based mutation with differentiated retrieval planning to incorporate external knowledge, and semantic-aware crossover to fuse complementary concepts for conceptual reorganization. A lightweight evaluation signal guides the selection process, encouraging sustained exploration while mitigating premature convergence. Extensive experiments demonstrate that EvoGens substantially enhances exploration capabilities compared to state-of-the-art baselines. Specifically, it improves Novelty from 0.1 to 0.4 and Diversity from 0.24 to 0.55, while maintaining comparable idea quality under the current automatic evaluation protocol. These findings suggest that evolutionary mechanisms can serve as a useful framework for exploration-oriented research ideation, especially for broadening the novelty and diversity of candidate ideas under a shared automatic evaluation setting.
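
The mutation, crossover, and selection loop can be sketched generically. The version below is a plain elitist evolutionary search over arbitrary candidates. In EvoGens the candidates are research ideas and the operators are LLM-driven; here the candidates are numbers and the operators are simple arithmetic stand-ins, so every name and operator in the example is a hypothetical placeholder.

```python
import random
from typing import Callable, List, TypeVar

T = TypeVar("T")


def evolve(
    population: List[T],
    fitness: Callable[[T], float],
    mutate: Callable[[T, random.Random], T],
    crossover: Callable[[T, T, random.Random], T],
    generations: int,
    rng: random.Random,
) -> List[T]:
    """Elitist loop: keep the top half, refill with mutated crossovers."""
    size = len(population)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, size // 2)]
        children: List[T] = []
        while len(parents) + len(children) < size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children
    return sorted(population, key=fitness, reverse=True)


# Toy stand-in: candidates are floats and the best score is at 10.
rng = random.Random(0)
final = evolve(
    [0.0, 1.0, 2.0, 3.0],
    fitness=lambda x: -(x - 10.0) ** 2,
    mutate=lambda x, r: x + r.gauss(0.0, 1.0),
    crossover=lambda a, b, r: 0.5 * (a + b),
    generations=30,
    rng=rng,
)
print(final[0])
```

Because the parents survive each generation unchanged, the best score in the population can never get worse; what the mutation scale controls is how widely the search explores around them.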

2605.30960 2026-06-01 cs.LG

Revisiting Zeroth-Order Hessian Approximation: A Single-Step Policy Optimization Lens

重新审视零阶Hessian近似:单步策略优化视角

Junbin Qiu, Zhaowei Hong, Renzhe Xu, Yao Shu

AI总结 本文通过单步策略优化视角统一零阶Hessian估计,提出方差缩减的ZoVH框架,实现全Hessian矩阵、正则化逆及偏差校正逆Hessian-梯度积的高效估计。

详情
AI中文摘要

精确的零阶Hessian估计是无导数方法的基石,对于双层优化、贝叶斯推断和不确定性量化等任务至关重要。然而,在高维设置中获取完整的低方差Hessian及其逆估计器仍然是一个重大挑战。为了解决这一问题,我们提出了一个统一框架,通过单步策略优化的视角重新解释零阶Hessian近似。该视角建立了通用零阶Hessian估计器与平滑策略优化目标Hessian之间的理论等价性,将不同的经典随机估计器统一为基线选择的特定实例。在此基础上,我们引入了ZoVH,一个针对全Hessian矩阵、其正则化逆以及偏差校正的逆Hessian-梯度积的方差缩减估计器套件。ZoVH利用两种关键技术:(1) 推导出的唯一最优基线,可证明最小化方差;(2) 一种查询重用策略,结合历史函数查询以提高样本效率而不增加成本。我们严格的理论分析证实了Hessian估计器的无偏性,验证了基线的方差最优性,提供了整个ZoVH套件的误差界,并为由此产生的曲率感知零阶算法建立了收敛保证。广泛的实证结果验证了我们的理论发现,表明ZoVH在实际应用中实现了卓越的估计精度和收敛性能。代码可在 https://github.com/Qjbtiger/ZoVH 获取。

英文摘要

Accurate Zeroth-Order (ZO) Hessian estimation is a cornerstone of derivative-free methods, essential for tasks such as bilevel optimization, Bayesian inference, and uncertainty quantification. However, obtaining a complete suite of low-variance estimators for the Hessian and its inverse in high-dimensional settings remains a significant challenge. To address this, we propose a unified framework that reinterprets ZO Hessian approximation through the lens of single-step Policy Optimization (PO). This perspective establishes a theoretical equivalence between general ZO Hessian estimators and the Hessian of a smoothed PO objective, unifying distinct classical randomized estimators as specific instances of baseline selection. Building on this foundation, we introduce ZoVH, a comprehensive suite of variance-reduced estimators for the full Hessian matrix, its regularized inverse, and the bias-corrected inverse Hessian-gradient product. ZoVH leverages two key techniques: (1) a unique optimal baseline derived to provably minimize variance, and (2) a query reuse strategy that incorporates historical function queries to enhance sample efficiency without inflating costs. Our rigorous theoretical analysis confirms the unbiasedness of the Hessian estimator, validates the variance optimality of our baseline, provides error bounds for the entire ZoVH suite, and establishes convergence guarantees for the resulting curvature-aware ZO algorithm. Extensive empirical results validate our theoretical findings, demonstrating that ZoVH achieves superior estimation accuracy and convergence performance in real-world applications. Code is available at https://github.com/Qjbtiger/ZoVH
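
For readers unfamiliar with zeroth-order Hessian estimation, the sketch below implements a classical Gaussian-smoothing estimator that uses only function values: it averages (f(x + mu*u) - b) * (u u^T - I) / mu^2 over random directions u. Because u u^T - I has zero mean, any constant baseline b leaves the expectation unchanged and only affects variance. The default b = f(x) is a common choice assumed here; it is not claimed to be the optimal baseline derived in the paper, and this is not the ZoVH code.

```python
import random
from typing import Callable, List, Optional, Sequence


def zo_hessian(
    f: Callable[[Sequence[float]], float],
    x: Sequence[float],
    mu: float,
    n_samples: int,
    rng: random.Random,
    baseline: Optional[float] = None,
) -> List[List[float]]:
    """Monte Carlo Hessian estimate of the Gaussian-smoothed f from function values."""
    d = len(x)
    b = f(x) if baseline is None else baseline  # assumed baseline choice
    h = [[0.0] * d for _ in range(d)]
    for _ in range(n_samples):
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        shifted = [xi + mu * ui for xi, ui in zip(x, u)]
        scale = (f(shifted) - b) / (mu * mu)
        for i in range(d):
            for j in range(d):
                h[i][j] += scale * (u[i] * u[j] - (1.0 if i == j else 0.0))
    return [[v / n_samples for v in row] for row in h]


# Quadratic test function f(x) = 0.5 * x^T A x, whose Hessian is A.
A = [[2.0, 0.5], [0.5, 1.0]]


def quadratic(x: Sequence[float]) -> float:
    return 0.5 * sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))


estimate = zo_hessian(quadratic, [1.0, -1.0], mu=1.0, n_samples=20000, rng=random.Random(0))
print(estimate)
```

For a quadratic the smoothed Hessian equals the true Hessian, so the estimate should approach A as the sample count grows; the remaining error is pure Monte Carlo variance, which is what baseline selection and query reuse aim to reduce.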

2605.30957 2026-06-01 cs.RO

RDGen: Demonstration Generation for High-Quality Robot Learning via Reinforcement Learning

RDGen: 通过强化学习生成高质量机器人学习的演示

Zijian Zhu, Menglin Zou, Zhuang Li, Yaojie Tu, Xinhai Sun

AI总结 提出RDGen框架,利用从仿真到真实的强化学习策略生成高质量机器人演示轨迹,用于训练视觉-语言-动作模型,相比人工遥操作产生更平滑轨迹并提升下游性能。

Comments 13 pages, 4 figures, 3 tables

详情
AI中文摘要

视觉-语言-动作(VLA)模型已成为通用机器人控制的一种有前景的范式。然而,其性能仍然从根本上受限于高质量机器人轨迹数据的可用性。在当前的机器人学习实践中,这些数据主要通过人类遥操作收集,这需要大量人力、成本高昂且难以扩展。在本文中,我们提出了RDGen,一种用于生成高质量机器人演示的仿真到真实强化学习框架。RDGen并非仅将强化学习用作最终控制策略,而是利用训练好的RL策略作为结构化的轨迹生成器。该系统由一个基于VLM的任务解析器(用于识别任务相关物体)、一个基于Grounding DINO的物体定位器以及一个从仿真迁移到真实机器人的RL策略组成。然后,成功的 rollout 被收集为干净、高质量的演示,用于下游VLA训练,而仿真阶段进一步以极低的边际成本提供可扩展的额外轨迹来源。在拾取和放置任务上的实验表明,迁移后的RL策略实现了高任务成功率。与人类遥操作相比,RDGen生成的轨迹显著更平滑,并产生更优的下游VLA性能。这些结果表明,RL生成的演示可以作为机器人策略学习更可靠和一致的监督信号。

英文摘要

Vision-Language-Action (VLA) models have emerged as a promising paradigm for general-purpose robot control. However, their performance remains fundamentally constrained by the availability of high-quality robot trajectory data. In current robot learning practice, such data are primarily collected through human teleoperation, which is labor-intensive, costly, and difficult to scale. In this paper, we propose RDGen, a sim-to-real reinforcement learning framework for generating high-quality robot demonstrations. Rather than employing reinforcement learning solely as the final control policy, RDGen leverages trained RL policies as a structured trajectory generator. The system consists of a VLM-based task parser that identifies task-relevant objects, a Grounding DINO-based object localizer, and an RL policy transferred from simulation to the real robot. Successful rollouts are then harvested as clean, high-quality demonstrations for downstream VLA training, while the simulation stage further provides a scalable source of additional trajectories at little marginal cost. Experiments on a pick-and-place task demonstrate that the transferred RL policy achieves a high task success rate. Compared with human teleoperation, RDGen produces significantly smoother trajectories and yields superior downstream VLA performance. These results indicate that RL-generated demonstrations can serve as more reliable and consistent supervisory signals for robot policy learning.
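
The abstract says the generated trajectories are smoother but does not name the smoothness metric. One common proxy is mean squared jerk, the average squared third time-derivative of position. The sketch below computes it with finite differences for a single joint or coordinate sampled at a fixed time step. The choice of metric and the function name are assumptions for illustration.

```python
from typing import Sequence


def mean_squared_jerk(positions: Sequence[float], dt: float) -> float:
    """Mean squared third finite difference of a uniformly sampled 1-D trajectory."""
    if len(positions) < 4:
        raise ValueError("need at least four samples to estimate jerk")
    jerks = [
        (positions[i + 3] - 3 * positions[i + 2] + 3 * positions[i + 1] - positions[i]) / dt**3
        for i in range(len(positions) - 3)
    ]
    return sum(j * j for j in jerks) / len(jerks)


# A constant-velocity motion has zero jerk; a jittery one does not.
smooth = [0.0, 1.0, 2.0, 3.0, 4.0]
jittery = [0.0, 1.2, 1.9, 3.3, 3.8]
print(mean_squared_jerk(smooth, dt=1.0), mean_squared_jerk(jittery, dt=1.0))
```

Lower values indicate smoother motion, so a comparison between teleoperated and policy-generated rollouts would report this number per trajectory and compare the distributions.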

2605.30942 2026-06-01 cs.CV

PRISM: Progressive Reasoning through Iterative Slot Memory for Vision

PRISM: 通过迭代槽记忆进行渐进推理的视觉架构

Ziyu Wang, Shuangpeng Han, Mengmi Zhang

AI总结 提出PRISM架构,通过迭代槽记忆进行渐进推理,在图像分类、目标检测和语义分割等任务上取得竞争性能,并在遮挡等不完整观测下展现出更强的鲁棒性。

详情
AI中文摘要

现代视觉模型通过单次前馈传递处理图像,这限制了它们在观测不完整时恢复缺失证据或细化不确定表示的能力。受人类感知迭代性质的启发,我们引入了PRISM(通过迭代槽记忆进行渐进推理),这是一种通过迭代细化对图像进行推理的金字塔视觉架构。在高层次上,PRISM将视觉特征分组为以对象为中心的表示,从学习到的记忆中检索相关模式,并迭代细化表示以解决歧义和恢复缺失信息。这种组织-回忆-细化过程在多个尺度上循环运行,实现了视觉表示的渐进改进。在包括图像分类、目标检测和语义分割在内的标准视觉任务中,PRISM取得了竞争性能,同时在遮挡等不完整观测下展现出更强的鲁棒性。这些结果表明,使用结构化表示和记忆进行迭代推理是构建更具弹性和适应性的视觉模型的一个有前景的方向。源代码和模型将发布。

英文摘要

Modern vision models process images in a single feed-forward pass, which limits their ability to recover missing evidence or refine uncertain representations under incomplete observations. Inspired by the iterative nature of human perception, we introduce PRISM (Progressive Reasoning through Iterative Slot Memory), a pyramid vision architecture that reasons over images through iterative refinement. At a high level, PRISM groups visual features into object-centric representations, retrieves relevant patterns from a learned memory, and iteratively refines the representation to resolve ambiguity and recover missing information. This organize-recall-refine process operates recurrently across multiple scales, enabling progressive improvement of visual representations. Across standard vision tasks, including image classification, object detection, and semantic segmentation, PRISM achieves competitive performance while demonstrating improved robustness under incomplete observations such as occlusion. These results suggest that iterative reasoning with structured representations and memory is a promising direction for building more resilient and adaptive vision models. Source code and models will be released.

2605.30939 2026-06-01 cs.CV

IAF-Net: Illumination-Adaptive Fusion for Low-Light Urban Road Segmentation

IAF-Net:用于低光照城市道路分割的照明自适应融合网络

Bingtao Wang, Daojie Peng, Fulong Ma, Jun Ma, Liang Zhang

AI总结 提出IAF-Net,通过照明自适应融合模块动态调整RGB与几何特征的融合权重,并利用亮度调制注意力解码器增强低光照特征选择,实现不同光照条件下鲁棒的道路分割。

详情
AI中文摘要

语义道路分割对于自动驾驶至关重要,但现有方法在低光照条件下性能严重下降。许多现有的多模态融合方法没有显式适应模态可靠性的光照依赖性变化,这可能在夜间将退化的RGB特征传播到融合表示中。我们提出IAF-Net(照明自适应融合网络),一种端到端框架,具有照明自适应融合功能,可在不同光照条件下实现鲁棒的道路分割。它通过核心的照明自适应融合(IAF)模块动态调整RGB和几何特征的融合权重,并使用亮度调制注意力解码器增强低光照特征选择。我们还构建了两个专用数据集:nuScenes夜间道路分割(nuScenes-NRS)和CARLA多天气道路分割(CARLA-MWRS)。在nuScenes-NRS上的实验显示,在比较方法中整体性能达到最先进水平,而CARLA-MWRS进一步验证了在恶劣天气条件下的鲁棒性。在40%训练子集上的消融研究进一步强调了IAF模块的重要性,该模块在MaxF中提供了最大的个体增益0.70%。

英文摘要

Semantic road segmentation is important for autonomous driving, but existing methods suffer severe performance degradation under low-light conditions. Many existing multi-modal fusion methods do not explicitly adapt to illumination-dependent changes in modality reliability, which can propagate degraded RGB features into the fused representation at night. We propose IAF-Net (Illumination-Adaptive Fusion Network), an end-to-end framework with illumination-adaptive fusion for robust road segmentation across different lighting conditions. It dynamically adjusts fusion weights of RGB and geometric features via the core Illumination-Adaptive Fusion (IAF) module, and enhances low-light feature selection with a brightness-modulated attention decoder. We also construct two dedicated datasets: nuScenes Nighttime Road Segmentation (nuScenes-NRS) and CARLA Multi-Weather Road Segmentation (CARLA-MWRS). Experiments on nuScenes-NRS show state-of-the-art overall performance among the compared methods, while CARLA-MWRS further validates robustness across adverse weather conditions. Ablation studies on a 40% training subset further highlight the importance of the IAF module, which provides the largest individual gain of 0.70% in MaxF.

2605.30936 2026-06-01 cs.LG math.OC stat.ML

Local linear convergence of gradient methods for overparameterized Gaussian mixtures

过参数化高斯混合模型梯度方法的局部线性收敛性

Jingxing Wang, Vasileios Charisopoulos, Maryam Fazel

AI总结 针对过参数化高斯混合模型,提出一种交替使用短梯度步和长Polyak步的方法,实现局部线性收敛速率,克服了过参数化导致的慢收敛问题。

Comments 45 pages, 7 figures

详情
AI中文摘要

我们研究了过参数化下学习高斯混合模型的问题。先前的工作表明,虽然过参数化对于避免虚假局部最优和通过梯度EM算法实现全局恢复真实模型至关重要,但它会显著减慢局部收敛速度。在混合权重的某些假设下,我们证明了统计学习过程最小化的标准散度度量具有一个缓慢增长的流形,在该流形上著名的Polyak步长可以几何级地减少损失,并设计了一种基于梯度的方法,该方法以局部线性速率收敛到极小值点。此外,我们表明,对于具有任意权重的混合模型,我们的方法收敛到接近最优的解——直到一个自然的误设阈值。在高层次上,该方法在接近流形的几个“短”梯度下降步和收缩到极小值点距离的“长”Polyak步之间交替。我们的结果表明,慢收敛不是过参数化的内在挑战,而是可以通过利用损失景观的有利结构来克服。

英文摘要

We study the problem of learning Gaussian mixture models under overparameterization. Prior work has shown that while overparameterization is essential for avoiding spurious local optima and enables global recovery of the ground-truth model using the gradient-EM (expectation-maximization) algorithm, it can dramatically slow down the local rate of convergence. Under certain assumptions on the mixture weights, we show that a standard divergence measure minimized by statistical learning procedures possesses a manifold of slow growth on which the well-known Polyak stepsize reduces the loss geometrically, and design a gradient-based method that converges to minimizers at a locally linear rate. Additionally, we show that our method converges to nearly optimal solutions -- up to a natural misspecification threshold -- for mixtures with arbitrary weights. At a high level, the method alternates between several "short" gradient descent steps that approach the manifold and "long" Polyak steps that contract the distance to minimizers. Our results suggest that slow convergence is not an intrinsic challenge of overparameterization, but can be overcome by exploiting the favorable structure of the loss landscape.
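
The Polyak stepsize mentioned above sets the step length to (f(x) - f*) / ||grad f(x)||^2, where f* is the optimal value. A one-dimensional toy case shows why it helps under slow growth: for f(x) = x^4 the minimum is degenerate, fixed-step gradient descent slows down as it approaches zero, and the Polyak step contracts the iterate by a constant factor. This toy problem is an illustration chosen here, not the mixture objective or the alternating method from the paper.

```python
from typing import Callable


def polyak_step(x: float, f: Callable[[float], float], grad: Callable[[float], float], f_star: float = 0.0) -> float:
    """One scalar Polyak step: x - (f(x) - f*) / g^2 * g."""
    g = grad(x)
    if g == 0.0:
        return x
    return x - (f(x) - f_star) / (g * g) * g


def gradient_step(x: float, grad: Callable[[float], float], lr: float) -> float:
    """One fixed-stepsize gradient descent step."""
    return x - lr * grad(x)


def quartic(x: float) -> float:
    return x**4


def quartic_grad(x: float) -> float:
    return 4 * x**3


x_polyak = x_gd = 1.0
for _ in range(10):
    x_polyak = polyak_step(x_polyak, quartic, quartic_grad)
    x_gd = gradient_step(x_gd, quartic_grad, lr=0.1)
print(x_polyak, x_gd)
```

On this function each Polyak step maps x to 0.75 x, a geometric rate, while the fixed-step iterate shrinks more and more slowly as the gradient flattens.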

2605.30934 2026-06-01 cs.CL cs.AI

Do Large Language Models Encode Institutional Experience? Evidence from Cross-Linguistic Moral Reasoning Under Ambiguity

大型语言模型是否编码了制度经验?来自跨语言模糊道德推理的证据

Nattavudh Powdthavee

AI总结 通过跨语言道德困境实验,研究大型语言模型在模糊情境下是否通过语言编码制度经验,发现隐含制度线索会放大跨语言道德分歧,而明确框架则抑制这种差异。

Comments 44 pages

详情
AI中文摘要

大型语言模型(LLMs)在不同语言中表现出系统性的道德推理差异,但这种差异的来源尚不清楚。我们检验了一个假设:语言编码了其使用环境中的制度方面,使得LLMs通过训练继承了特定制度的道德先验。跨越制度质量梯度广泛的九种语言、六个前沿LLM以及两项预注册研究,我们考察了道德困境的可接受性取决于制度功能的情况。在研究1中,明确的制度框架产生了统一的无结果:跨语言道德分歧在制度依赖场景中没有增加,也没有追踪语言社区之间的制度差异。在研究2中,我们引入了制度模糊场景,其中制度利益存在但未明确说明。在这些条件下,跨语言道德分歧相对于制度无关控制组增加,并且除一个理论上有信息的例外,与语言社区之间的现实世界制度差异相关。明确的框架再次减弱了这些效应。这些发现表明,制度经验可能在语言中留下可检测的痕迹,塑造LLM的道德推理,同时也表明明确的制度线索可以抑制这些差异的表达。

英文摘要

Large language models (LLMs) exhibit systematic differences in moral reasoning across languages, yet the source of this variation remains unclear. We test the hypothesis that languages encode aspects of the institutional environments in which they are spoken, allowing LLMs to inherit institution-specific moral priors through training. Across nine languages spanning a broad gradient of institutional quality, six frontier LLMs, and two preregistered studies, we examine moral dilemmas whose acceptability depends on institutional functioning. In Study 1, explicit institutional framing produced uniformly null results: cross-linguistic moral divergence did not increase in institutionally contingent scenarios, nor did it track institutional differences between language communities. In Study 2, we introduced institutionally ambiguous scenarios in which institutional stakes were present but not explicitly stated. Under these conditions, cross-linguistic moral divergence increased relative to institutionally inert controls and, with one theoretically informative exception, was associated with real-world institutional differences between language communities. Explicit framing again attenuated these effects. These findings suggest that institutional experience may leave detectable traces in language that shape LLM moral reasoning, while also indicating that explicit institutional cues can suppress the expression of those differences.

2605.30928 2026-06-01 cs.RO

Enhancing Human-Likeness in Reinforcement Learning Agents via Hierarchical Macro Action Quantization

通过分层宏动作量化增强强化学习智能体的人类相似性

Usman Nizamani, M. Shaheer Luqman, Fawad Javed Fateh, Ali Shah Ali, Murad Popattia, M. Zeeshan Zia, Quoc-Huy Tran

AI总结 提出一种分层宏动作量化框架(HiMAQ),通过两级向量量化将人类演示编码为宏动作,使强化学习智能体在保持高回报的同时生成更接近人类的行为序列,在D4RL基准上优于非分层基线并兼容多种RL算法。

详情
AI中文摘要

人类化智能体是人工智能的长期目标。尽管性能强劲,大多数强化学习(RL)智能体仍以奖励驱动,且常表现出与人类不同的行为,限制了可解释性和可靠性。在这项工作中,我们引入了一种新颖的人类化RL框架,该框架在最大化奖励的同时预测与人类行为紧密对齐的动作序列。具体来说,我们使用一种分层宏动作量化方法(称为HiMAQ)将人类演示编码为宏动作,该方法包含两个连续的向量量化层级。低层量化将输入动作映射到细粒度的子动作簇,而高层量化将这些子动作簇聚合成动作簇。在D4RL基准上的广泛评估表明,我们的分层方法优于非分层基线(MAQ),在保持与先前RL智能体相当或更高成功率的同时,获得了更好的人类相似性分数。这些改进泛化到与各种RL算法(即IQL、SAC和RLPD)的集成中。

英文摘要

Human-like agents are a long-standing goal of artificial intelligence. Despite strong performance, most reinforcement learning (RL) agents remain reward-driven and often exhibit behaviors that differ from humans, limiting interpretability and reliability. In this work, we introduce a novel human-like RL framework that predicts action sequences closely aligned with human behaviors while maximizing rewards. Specifically, we encode human demonstrations into macro actions using a hierarchical macro action quantization approach (termed HiMAQ) consisting of two successive levels of vector quantization. The lower quantization level maps input actions to fine-grained subaction clusters, while the higher quantization level aggregates these subaction clusters into action clusters. Extensive evaluations on the D4RL benchmarks show that our hierarchical approach outperforms the non-hierarchical baseline (MAQ), achieving better human-likeness scores while maintaining comparable or better success rates than previous RL agents. The improvements generalize across integrations with various RL algorithms, namely IQL, SAC, and RLPD.
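
Two successive levels of vector quantization can be sketched with two codebooks and a nearest-neighbor lookup. The version below assigns an action to its closest low-level codeword, then assigns that codeword to its closest high-level codeword. How HiMAQ actually links the two levels and learns the codebooks is not described in the abstract, so the linkage, the codebook values, and the function names are all assumptions.

```python
from typing import List, Sequence, Tuple


def nearest(codebook: Sequence[Sequence[float]], v: Sequence[float]) -> int:
    """Index of the codeword with the smallest squared Euclidean distance to v."""
    def sq_dist(c: Sequence[float]) -> float:
        return sum((ci - vi) ** 2 for ci, vi in zip(c, v))
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k]))


def hierarchical_quantize(
    action: Sequence[float],
    low_codebook: Sequence[Sequence[float]],
    high_codebook: Sequence[Sequence[float]],
) -> Tuple[int, int]:
    """Return (subaction cluster id, action cluster id) for one action vector."""
    low_id = nearest(low_codebook, action)
    # Assumed linkage: the high level quantizes the chosen low-level codeword.
    high_id = nearest(high_codebook, low_codebook[low_id])
    return low_id, high_id


# Hand-written toy codebooks: four fine clusters grouped into two coarse ones.
low: List[List[float]] = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
high: List[List[float]] = [[0.0, 0.5], [10.0, 10.5]]
print(hierarchical_quantize([0.1, 0.9], low, high))
```

The low-level id preserves fine distinctions between similar actions, while the high-level id gives a coarser vocabulary that a policy can choose from.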