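arXivDaily daily academic digest: synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.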

2604.12176 2026-06-03 cs.AI

Evaluating Relational Reasoning in LLMs with REL

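Lukas Fesser, Yasha Ektefaie, Ada Fang, Sham M. Kakade, Marinka Zitnik

Affiliations: University of California, Berkeley

AI summary: This paper defines reasoning difficulty through Relational Complexity (RC) and builds REL, a generative benchmark spanning algebra, chemistry, and biology. Frontier LLMs degrade steadily as RC increases, which suggests an inherent limitation in higher-arity relational binding.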

Comments ICML 2026

Abstract

Relational reasoning is the ability to infer relations that jointly bind multiple entities, attributes, or variables. This ability is central to scientific reasoning, but existing evaluations of relational reasoning in large language models often focus on structured inputs such as tables, graphs, or synthetic tasks, and do not isolate the difficulty introduced by higher-arity relational binding. We study this problem through the lens of Relational Complexity (RC), which we define as the minimum number of independent entities or operands that must be simultaneously bound to apply a relation. RC provides a principled way to vary reasoning difficulty while controlling for confounders such as input size, vocabulary, and representational choices. Building on RC, we introduce REL, a generative benchmark framework spanning algebra, chemistry, and biology that varies RC within each domain. Across frontier LLMs, performance degrades consistently and monotonically as RC increases, even when the total number of entities is held fixed. This failure mode persists with increased test-time compute and in-context learning, suggesting a limitation tied to the arity of the required relational binding rather than to insufficient inference steps or lack of exposure to examples. Our results identify a regime of higher-arity reasoning in which current models struggle, and motivate re-examining benchmarks through the lens of relational complexity.

2604.10169 2026-06-03 cs.AI cs.LG

MAVEN-T: Reinforced Heterogeneous Distillation for Real-Time Multi-Agent Trajectory Prediction

Wenchang Duan, Zhenguo Gao, Jinguo Xian, Yi Shi

Affiliations: School of Mathematical Sciences, Shanghai Jiao Tong University; Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University; Shanghai Key Laboratory of Psychotic Disorders, Brain Science and Technology Research Center, Shanghai Jiao Tong University

AI summary: Proposes the MAVEN-T framework, which combines heterogeneous distillation from a high-capacity teacher to a compact student with reinforcement-learning refinement for real-time multi-agent trajectory prediction, reaching high accuracy at low latency on several datasets.

Abstract

Trajectory prediction is a key component of autonomous driving systems because future motions directly affect collision checking, behavior planning, and control. The task remains challenging under dense interactions, heterogeneous behaviors, multimodal futures, and limited on-board computation. Existing graph, attention, and generative predictors improve interaction reasoning or uncertainty modeling, but their high-capacity designs are often costly for real-time deployment. Lightweight predictors and conventional distillation reduce inference cost, yet usually rely on static imitation and do not explicitly correct safety-relevant teacher bias. This paper proposes MAVEN-T, a reinforced heterogeneous distillation framework for real-time multi-agent trajectory prediction. A high-capacity teacher models directed local interactions with a surround-aware graph encoder, combines efficient temporal filtering with shifted-window spatial attention, and decodes maneuver-specific futures through a sparse Mixture-of-Experts head. A compact GRU-Squeeze-and-Excitation student with a Low-Rank Adapted policy head is trained by feature-, attention-, and semantic-level distillation. To align prediction with downstream behavior, the student is further refined by Proximal Policy Optimization rewards for collision avoidance, comfort, and progress, while a complexity-aware curriculum and Elastic Weight Consolidation stabilize stage-wise training. Experiments on NGSIM, HighD, MoCAD, Argoverse 2, and the Waymo Open Motion Dataset evaluate accuracy, efficiency, generalization, robustness, and closed-loop safety. The student achieves 6.2× parameter compression, 3.7× inference acceleration, and 14.6 ms latency on an NVIDIA Jetson AGX Orin while maintaining competitive accuracy.

2510.02779 2026-06-03 cs.LG

Optimal Rates for Generalization of Gradient Descent for Deep ReLU Classification

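Yuanfan Li, Yunwen Lei, Zheng-Chu Guo, Yiming Ying

Affiliations: School of Mathematical Sciences, Zhejiang University; Department of Mathematics, The University of Hong Kong; School of Mathematics and Statistics, University of Sydney

AI summary: For deep ReLU networks, by trading off optimization and generalization errors, the paper proves under an NTK-separability assumption that gradient descent attains an excess risk rate of $\widetilde{O}(L^6 / (n γ^2))$, matching the optimal SVM-type rate up to depth-dependent factors. The key technique is controlling activation patterns near a reference model to obtain a sharper Rademacher complexity bound.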

Comments Published in NeurIPS 2025

Abstract

Recent advances have significantly improved our understanding of the generalization performance of gradient descent (GD) methods in deep neural networks. A natural and fundamental question is whether GD can achieve generalization rates comparable to the minimax optimal rates established in the kernel setting. Existing results either yield suboptimal rates of $O(1/\sqrt{n})$, or focus on networks with smooth activation functions, incurring exponential dependence on network depth $L$. In this work, we establish optimal generalization rates for GD with deep ReLU networks by carefully trading off optimization and generalization errors, achieving only polynomial dependence on depth. Specifically, under the assumption that the data are NTK separable from the margin $γ$, we prove an excess risk rate of $\widetilde{O}(L^6 / (n γ^2))$, which aligns with the optimal SVM-type rate $\widetilde{O}(1 / (n γ^2))$ up to depth-dependent factors. A key technical contribution is our novel control of activation patterns near a reference model, enabling a sharper Rademacher complexity bound for deep ReLU networks trained with gradient descent.
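As a rough reading of these rates, the comparison below drops constants and the logarithmic factors hidden in $\widetilde{O}$ (a simplification made here, not a statement from the paper). Under that simplification, the depth-dependent rate improves on the $O(1/\sqrt{n})$ rate once the sample size is large relative to depth and margin:

```latex
\[
\frac{L^6}{n\gamma^2} \le \frac{1}{\sqrt{n}}
\iff \sqrt{n} \ge \frac{L^6}{\gamma^2}
\iff n \ge \frac{L^{12}}{\gamma^4}.
\]
```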

2604.07366 2026-06-03 cs.LG

Flow Learners for PDEs: Toward a Physics-to-Physics Paradigm for Scientific Computing

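Yilong Dai, Shengyu Chen, Xiaowei Jia, Runlong Yu

Affiliations: The University of Alabama; University of Pittsburgh

AI summary: Proposes the flow learner paradigm, which parameterizes transport vector fields and generates trajectories by integration, shifting PDE solving from state prediction to modeling transport over physically admissible futures and enabling continuous-time prediction, uncertainty quantification, and physics-aware solver design.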

Abstract

Partial differential equations (PDEs) govern nearly every physical process in science and engineering, but solving them at scale remains prohibitively expensive. Generative AI has transformed language, vision, and protein science, but learned PDE solvers have not undergone a comparable shift. Existing paradigms each capture part of the problem. Physics-informed neural networks embed residual structure, although they are often difficult to optimize in stiff, multiscale, or large-domain regimes. Neural operators amortize across instances, although they commonly inherit a snapshot-prediction view of solving and can degrade over long rollouts. Diffusion-based solvers model uncertainty, although they are often built on a solver template that still centers on state regression. We argue that the core issue is the abstraction used to train learned solvers. Many models are asked to predict states, while many scientific settings require modeling how uncertainty moves through constrained dynamics. The relevant object is transport over physically admissible futures. This motivates flow learners: models that parameterize transport vector fields and generate trajectories through integration, echoing the continuous dynamics that define PDE evolution. This physics-to-physics alignment supports continuous-time prediction, native uncertainty quantification, and new opportunities for physics-aware solver design. We explain why transport-based learning offers a stronger organizing principle for learned PDE solving and outline the research agenda that follows from this shift.
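The idea of generating trajectories by integrating a vector field can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the function names `integrate` and `decay_field` are hypothetical, the field is a hand-written stand-in for a learned model, and explicit Euler is only one of many possible integrators. It is not the authors' method.

```python
import math

def integrate(field, state, t0, t1, steps):
    """Explicit Euler integration of d(state)/dt = field(state, t).

    Returns the full trajectory as a list of states (steps + 1 entries).
    """
    dt = (t1 - t0) / steps
    trajectory = [list(state)]
    t = t0
    for _ in range(steps):
        velocity = field(state, t)
        state = [s + dt * v for s, v in zip(state, velocity)]
        t += dt
        trajectory.append(state)
    return trajectory

def decay_field(state, t):
    """Hypothetical stand-in for a learned vector field: dx/dt = -x."""
    return [-s for s in state]

trajectory = integrate(decay_field, [1.0], 0.0, 1.0, 1000)
# The exact solution at t = 1 is exp(-1); Euler approaches it as steps grow.
error = abs(trajectory[-1][0] - math.exp(-1.0))
```

Because the trajectory is produced by integration, the state can in principle be queried at any intermediate time by changing `t1` or `steps`, which is one way to read the abstract's point about continuous-time prediction.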

2604.07048 2026-06-03 cs.CV

PRISM: Rethinking Atmospheric Scattering Reconstruction as a Unified Understanding and Restoration Model for Real-world Dehazing

Chengyu Fang, Chunming He, Yuelin Zhang, Chubin Chen, Chenyang Zhu, Hongqiu Wang, Longxiang Tang, Xiu Li, Sina Farsiu

Affiliations: Tsinghua University; Duke University; CUHK; HKUST; HKUST(GZ)

AI summary: Proposes Proximal Scattering Atmosphere Reconstruction (PSAR), a physically structured framework, and combines it with online non-uniform haze synthesis and a Selective Self-Distillation Adaptation (SSDA) scheme to unify understanding and restoration for real-world image dehazing.

Comments 21 Pages, 8 Figures, 7 Tables

Abstract

Real-world image dehazing (RID) aims to remove haze-induced degradation from real scenes. This task remains challenging due to non-uniform haze distribution, spatially varying color shifts, and the scarcity of paired real hazy-clean data. In PRISM, we propose Proximal Scattering Atmosphere Reconstruction (PSAR), a physically structured framework that jointly reconstructs the clear scene and scattering variables under the atmospheric scattering model, making the restoration process more interpretable in complex real-world conditions. To bridge the synthetic-to-real gap, we design an online non-uniform haze synthesis pipeline and a Selective Self-Distillation Adaptation (SSDA) scheme for unpaired real-world scenarios, which enables the model to selectively learn from high-quality perceptual targets while leveraging its intrinsic scattering understanding to audit residual haze and guide self-refinement. Experiments on real-world benchmarks demonstrate that PRISM achieves competitive performance on RID tasks.
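For readers unfamiliar with the atmospheric scattering model mentioned above, the following sketch uses its standard form, I = J·t + A·(1 − t), on a flat list of pixel intensities. The function names, the `t_min` clamp, and the sample values are assumptions for illustration. The sketch shows neither PSAR nor the paper's haze synthesis pipeline.

```python
def add_haze(clear, transmission, airlight):
    """Standard atmospheric scattering model: I = J * t + A * (1 - t).

    A per-pixel transmission list gives a non-uniform haze layer.
    """
    return [j * t + airlight * (1.0 - t) for j, t in zip(clear, transmission)]

def remove_haze(hazy, transmission, airlight, t_min=0.1):
    """Invert the model: J = (I - A) / max(t, t_min) + A.

    The t_min clamp (an assumption here) avoids dividing by near-zero
    transmission in dense haze.
    """
    return [(i - airlight) / max(t, t_min) + airlight
            for i, t in zip(hazy, transmission)]

clear = [0.2, 0.5, 0.8]
transmission = [0.9, 0.5, 0.3]   # lower value = denser haze at that pixel
hazy = add_haze(clear, transmission, airlight=1.0)
recovered = remove_haze(hazy, transmission, airlight=1.0)
```

In practice neither the transmission map nor the airlight is known for a real hazy image, which is why they have to be estimated alongside the clear scene.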

2604.07123 2026-06-03 cs.CL

Language Bias under Conflicting Information in Multilingual LLMs

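Robert Östling, Murathan Kurfalı

Affiliations: Stockholm University; RISE Research Institutes of Sweden

AI summary: Extends the "conflicting needles in a haystack" paradigm to a multilingual setting and evaluates how multilingual LLMs of different sizes favor particular languages when answering from conflicting information. The models show consistent language bias, notably a general bias against Russian and a preference for Chinese at the longest context lengths, and they favor information whose language matches the prompt.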

Abstract

Large Language Models (LLMs) have been shown to contain biases in the process of integrating conflicting information when answering questions. Here we ask whether such biases also exist with respect to which language is used for each conflicting piece of information. To answer this question, we extend the conflicting needles in a haystack paradigm to a multilingual setting and perform a comprehensive set of evaluations with naturalistic news domain data in five different languages, for a range of multilingual LLMs of different sizes. We find that all LLMs tested, including GPT-5.2, ignore the conflict and confidently assert only one of the possible answers in the large majority of cases. Furthermore, there is a consistent bias across models and prompting languages in which languages are preferred, with a general bias against Russian and, for the longest context lengths, in favor of Chinese. The language preferences are consistent between models trained inside and outside of mainland China, though somewhat stronger in the former category. There is also a general tendency among models to prioritize information that matches the language used for prompting. We hope to make users and developers of multilingual LLMs aware of this category of biases, to spur further research on their causes and possible mitigation.

2604.05718 2026-06-03 cs.CV

MPM: Mutual Pair Merging for Efficient Vision Transformers

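Simon Ravé, Pejman Rasti, David Rousseau

Affiliations: LARIS University of Angers; UMR INRAe-IRHS Angers, France

AI summary: Proposes Mutual Pair Merging (MPM), a training-free, parameter-free module that pairs mutual nearest neighbors in cosine space and averages them, recording a merge map for gather-based reconstruction before the decoder. It delivers end-to-end speedups in semantic segmentation with little accuracy loss.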

Comments Accepted to CVPR 2026 (Findings)

Journal ref Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Findings, 2026, pp. 2998-3008

Abstract

Decreasing sequence length is a common way to accelerate transformers, but prior token reduction work often targets classification and reports proxy metrics rather than end-to-end latency. For semantic segmentation, token reduction is further constrained by the need to reconstruct dense, pixel-aligned features, and on modern accelerators the overhead of computing merge maps can erase expected gains. We propose Mutual Pair Merging (MPM), a training-free token aggregation module that forms mutual nearest-neighbor pairs in cosine space, averages each pair, and records a merge map enabling a gather-based reconstruction before the decoder so that existing segmentation heads can be used unchanged. MPM introduces no learned parameters and no continuous compression knob (no keep-rate or threshold). The speed-accuracy trade-off is set by a discrete insertion schedule. We benchmark end-to-end latency on an NVIDIA H100 GPU (with and without FlashAttention-2) and a Raspberry Pi 5 across standard segmentation datasets. On ADE20K, MPM reduces per-image latency by up to 60% for ViT-Tiny on Raspberry Pi 5, and increases throughput by up to 20% on H100 with FlashAttention-2 while keeping the mIoU drop below 3%. These results suggest that simple, reconstruction-aware, training-free token merging can translate into practical wall-clock gains for segmentation when overhead is explicitly accounted for.
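The merging step described above (mutual nearest-neighbor pairing in cosine space, averaging, and a merge map used for gather-based reconstruction) can be sketched in plain Python. This is an illustrative reading of the abstract, not the authors' implementation: the function names are hypothetical, the quadratic nearest-neighbor search ignores the overhead concerns the paper raises, and the sketch assumes at least two non-zero tokens.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mutual_pair_merge(tokens):
    """Average mutual nearest-neighbor pairs; return (merged_tokens, merge_map).

    merge_map[i] is the index in merged_tokens that original token i maps to.
    """
    n = len(tokens)
    # Nearest neighbor of each token by cosine similarity, excluding itself.
    nearest = [
        max((j for j in range(n) if j != i),
            key=lambda j: cosine(tokens[i], tokens[j]))
        for i in range(n)
    ]
    merged, merge_map = [], [None] * n
    for i in range(n):
        if merge_map[i] is not None:
            continue  # already merged as the partner of an earlier token
        j = nearest[i]
        if nearest[j] == i:
            # i and j pick each other: replace both with their average.
            merged.append([(a + b) / 2 for a, b in zip(tokens[i], tokens[j])])
            merge_map[i] = merge_map[j] = len(merged) - 1
        else:
            merged.append(list(tokens[i]))
            merge_map[i] = len(merged) - 1
    return merged, merge_map

def gather_reconstruct(merged, merge_map):
    """Restore the original sequence length by gathering merged tokens."""
    return [merged[k] for k in merge_map]

tokens = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, -1.0]]
merged, merge_map = mutual_pair_merge(tokens)
restored = gather_reconstruct(merged, merge_map)
```

In this toy input, tokens 0 and 1 pick each other, as do tokens 2 and 3, while token 4 has no mutual partner and passes through unchanged. There is no keep-rate or threshold to tune: the number of merged pairs is decided by the data.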

2604.04439 2026-06-03 cs.LG cs.CV

Estimating Central, Peripheral, and Temporal Visual Contributions to Human Decision Making in Atari Games

Henrik Krauss, Takehisa Yairi

Affiliations: Department of Advanced Interdisciplinary Studies, The University of Tokyo; Research Center for Advanced Science and Technology, The University of Tokyo

AI summary: Analyzes eye-tracking data from Atari gameplay with a controlled ablation framework and finds that peripheral visual information contributes most to human decisions, while gaze information and past-state information contribute less.

Abstract

We study how different visual information sources contribute to human decision making in dynamic visual environments. Using Atari-HEAD, a large-scale Atari gameplay dataset with synchronized eye-tracking, we introduce a controlled ablation framework as a means to reverse-engineer the contribution of peripheral visual information, explicit gaze information in the form of gaze maps, and past-state information from human behavior. We train action-prediction networks under six settings that selectively include or exclude these information sources. Across 20 games, peripheral information shows by far the strongest contribution, with median prediction-accuracy drops in the range of 35.27-43.90% when removed. Gaze information yields smaller drops of 2.11-2.76%, while past-state information shows a broader range of 1.52-15.51%, with the upper end likely more informative due to reduced peripheral-information leakage. To complement aggregate accuracies, we cluster states by true-action probabilities assigned by the different model configurations. This analysis identifies coarse behavioral regimes, including focus-dominated, periphery-dominated, and more contextual decision situations. These results suggest that human decision making in Atari depends strongly on information beyond the current focus of gaze, while the proposed framework provides a way to estimate such information-source contributions from behavior.

2604.04087 2026-06-03 cs.LG

ArrowFlow: Hierarchical Machine Learning in the Space of Permutations

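Ozgur Yilmaz

Affiliations: Department of Artificial Intelligence; Adana Science and Technology University

AI summary: Proposes ArrowFlow, an architecture that learns hierarchical ordinal representations in permutation space through ranking filters and permutation-matrix accumulation, with no floating-point parameters in the core computation, and that uses violations of social-choice axioms as inductive biases. Experiments show it is competitive on several benchmarks and offers noise robustness, privacy preservation, and related properties.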

Abstract

We introduce ArrowFlow, a machine learning architecture that operates entirely in the space of permutations. Its computational units are ranking filters, learned orderings that compare inputs via Spearman's footrule distance and update through permutation-matrix accumulation, a non-gradient rule rooted in displacement evidence. Layers compose hierarchically: each layer's output ranking becomes the next layer's input, enabling deep ordinal representation learning without any floating-point parameters in the core computation. We connect the architecture to Arrow's impossibility theorem, showing that violations of social-choice fairness axioms (context dependence, specialization, symmetry breaking) serve as inductive biases for nonlinearity, sparsity, and stability. Experiments span UCI tabular benchmarks, MNIST, gene expression cancer classification (TCGA), and preference data, all against GridSearchCV-tuned baselines. ArrowFlow beats all baselines on Iris (2.7% vs. 3.3%) and is competitive on most UCI datasets. A single parameter, polynomial degree, acts as a master switch: degree 1 yields noise robustness (8-28% less degradation), privacy preservation (+0.5pp cost), and missing-feature resilience; higher degrees trade these for improved clean accuracy. ArrowFlow is not designed to surpass gradient-based methods. It is an existence proof that competitive classification is possible in a fundamentally different computational paradigm, one that elevates ordinal structure to a first-class citizen, with natural alignment to integer-only and neuromorphic hardware.
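Spearman's footrule, the distance the ranking filters are said to use, is simple to state in code: it sums how far each item moves between two orderings. The sketch below is a minimal illustration under stated assumptions. The helpers `to_ranking` and `best_filter` are hypothetical, and both the conversion of features to a ranking and the selection of the closest filter are guesses at one plausible use, not ArrowFlow's actual update or inference rule.

```python
def footrule(rank_a, rank_b):
    """Spearman's footrule: sum of absolute position displacements.

    Both arguments are orderings (permutations) of the same items.
    """
    pos_b = {item: p for p, item in enumerate(rank_b)}
    return sum(abs(p - pos_b[item]) for p, item in enumerate(rank_a))

def to_ranking(features):
    """Feature indices sorted by descending value, ties broken by index."""
    return sorted(range(len(features)), key=lambda i: (-features[i], i))

def best_filter(features, filters):
    """Index of the filter ordering closest to the input's ranking."""
    x = to_ranking(features)
    return min(range(len(filters)), key=lambda k: footrule(x, filters[k]))

identical = footrule([0, 1, 2], [0, 1, 2])
reversed_distance = footrule([0, 1, 2], [2, 1, 0])
ranking = to_ranking([0.2, 0.9, 0.5])
winner = best_filter([0.2, 0.9, 0.5], [[0, 1, 2], [1, 2, 0]])
```

All quantities here are integers, which matches the abstract's remark about alignment with integer-only hardware.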

2512.18954 2026-06-03 cs.CV

VOIC: Visible-Occluded Integrated Guidance for 3D Semantic Scene Completion

Zaidao Han, Risa Higashita, Jiang Liu

Affiliations: Research Institute of Trustworthy Autonomous Systems, Southern University of Science and Technology; Department of Computer Science and Engineering, Southern University of Science and Technology; School of Computer Science, University of Nottingham Ningbo China; Department of Electronic and Information Engineering, Changchun University

AI summary: Proposes the VOIC network, which decouples visible-region perception from occluded-region reasoning using an offline visible-region label extraction strategy and a dual-decoder framework, achieving state-of-the-art 3D semantic scene completion on SemanticKITTI and SSCBench-KITTI360.

Abstract

Camera-based 3D Semantic Scene Completion (SSC) is a critical task for autonomous driving and robotic scene understanding. It aims to infer a complete 3D volumetric representation of both semantics and geometry from a single image. Existing methods typically focus on end-to-end 2D-to-3D feature lifting and voxel completion. However, they often overlook the interference between high-confidence visible-region perception and low-confidence occluded-region reasoning caused by single-image input, which can lead to feature dilution and error propagation. To address these challenges, we introduce an offline Visible Region Label Extraction (VRLE) strategy that explicitly separates and extracts voxel-level supervision for visible regions from dense 3D ground truth. This strategy purifies the supervisory space for two complementary sub-tasks: visible-region perception and occluded-region reasoning. Building on this idea, we propose the Visible-Occluded Interactive Completion Network (VOIC), a novel dual-decoder framework that explicitly decouples SSC into visible-region semantic perception and occluded-region scene completion. VOIC first constructs a base 3D voxel representation by fusing image features with depth-derived occupancy. The visible decoder focuses on generating high-fidelity geometric and semantic priors, while the occlusion decoder leverages these priors together with cross-modal interaction to perform coherent global scene reasoning. Extensive experiments on the SemanticKITTI and SSCBench-KITTI360 benchmarks demonstrate that VOIC outperforms existing monocular SSC methods in both geometric completion and semantic segmentation accuracy, achieving state-of-the-art performance.

2603.01576 2026-06-03 cs.CV

Cryo-Bench: Benchmarking Foundation Models for Cryosphere Applications

Saurabh Kaushik, Lalit Maurya, Beth Tellman, Valerio Marsocci

Affiliations: Center for Sustainability and the Global Environment (SAGE), University of Wisconsin–Madison; Portsmouth AI and Data Science Centre (PAIDS), School of Computing, University of Portsmouth; ESA, ESRIN, φ-lab, Frascati

AI summary: Proposes Cryo-Bench, a benchmark that evaluates 14 geospatial foundation models on key cryosphere components (such as glaciers, glacial lakes, and sea ice). UNet attains the highest average mIoU (66.38) in the frozen-encoder setting, while full fine-tuning combined with learning-rate adjustment improves performance by 12.77%.

Abstract

Prevailing High Dynamic Range (HDR) video reconstruction methods are fundamentally trapped in a fragile alignment-and-fusion paradigm. While explicit spatial alignment can successfully recover fine details in controlled environments, it becomes a severe bottleneck in unconstrained dynamic scenes. By forcing rigid alignment across unpredictable motions and varying exposures, these methods inevitably translate registration errors into severe ghosting artifacts and temporal flickering. In this paper, we rethink this conventional prerequisite. Recognizing that explicit alignment is inherently vulnerable to real-world complexities, we propose LoCAtion, a Long-time Collaborative Attention framework that reformulates HDR video generation from a fragile spatial warping task into a robust, alignment-free collaborative feature routing problem. Guided by this new formulation, our architecture explicitly decouples the highly entangled reconstruction task. Rather than struggling to rigidly warp neighboring frames, we anchor the scene on a continuous medium-exposure backbone and utilize collaborative attention to dynamically harvest and inject reliable irradiance cues from unaligned exposures. Furthermore, we introduce a learned global sequence solver. By leveraging bidirectional context and long-range temporal modeling, it propagates corrective signals and structural features across the entire sequence, inherently enforcing whole-video coherence and eliminating jitter. Extensive experiments demonstrate that LoCAtion achieves state-of-the-art visual quality and temporal stability, offering a highly competitive balance between accuracy and computational efficiency.

2512.09106 2026-06-03 cs.LG

Learning Unmasking Policies for Diffusion Language Models

Metod Jazbec, Theo X. Olausson, Louis Béthune, Pierre Ablin, Michael Kirchhof, João Monteiro, Victor Turrisi, Jason Ramapuram, Marco Cuturi

Affiliations: Apple; University of Amsterdam; Massachusetts Institute of Technology

AI summary: For the unmasking (sampling) problem in diffusion language models, proposes training a lightweight policy with reinforcement learning to replace hand-tuned heuristics, preserving performance while improving robustness.

Comments V4: Accepted as an oral spotlight at ICML 2026

Abstract

Diffusion (Large) Language Models (dLLMs) now match the downstream performance of their autoregressive counterparts on many tasks, while holding the promise of being more efficient during inference. One critical design aspect of dLLMs is the sampling procedure that selects which tokens to unmask at each diffusion step. Indeed, recent work has found that heuristic strategies such as confidence thresholding improve both sample quality and token throughput compared to random unmasking. However, such heuristics have downsides: they require manual tuning, and we observe that their performance degrades with larger block sizes. In this work, we instead propose to train sampling procedures using reinforcement learning. Specifically, we formalize masked diffusion sampling as a Markov decision process in which the dLLM serves as the environment, and propose a lightweight policy based on a single-layer transformer that maps dLLM token confidences to unmasking decisions. Our experiments show that these trained policies match the performance of state-of-the-art heuristics when combined with semi-autoregressive (block) generation, while outperforming them in the full-diffusion setting.
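The confidence-thresholding heuristic that the learned policies are compared against can be sketched as a single decision step. This is a hedged illustration only: the function name, the default threshold of 0.9, and the fallback that unmasks the single most confident position when nothing clears the threshold are assumptions, not details taken from the paper.

```python
def confidence_threshold_unmask(confidences, masked, threshold=0.9):
    """Choose which masked positions to unmask at one diffusion step.

    confidences: per-position confidence of the model's top token.
    masked: list of positions that are still masked.
    """
    chosen = [i for i in masked if confidences[i] >= threshold]
    if not chosen and masked:
        # Assumed fallback: always make progress by unmasking the single
        # most confident masked position.
        chosen = [max(masked, key=lambda i: confidences[i])]
    return sorted(chosen)

several = confidence_threshold_unmask([0.95, 0.2, 0.91, 0.5], [0, 1, 2, 3])
fallback = confidence_threshold_unmask([0.3, 0.6, 0.1], [0, 1, 2])
done = confidence_threshold_unmask([0.99, 0.99], [])
```

The threshold is the manually tuned knob the abstract refers to. A learned policy would replace this function with a model that maps the same confidences to unmasking decisions.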

2603.07664 2026-06-03 cs.CV cs.AI cs.GR

Ref-DGS: Reflective Dual Gaussian Splatting

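Ningjing Fan, Yiqun Wang, Dong-Ming Yan, Peter Wonka

Affiliations: Chongqing University; MAIS, Institute of Automation, Chinese Academy of Sciences and UCAS; King Abdullah University of Science and Technology (KAUST)

AI summary: Proposes Ref-DGS, which uses a dual Gaussian scene representation and a physically-aware specular adaptive mixing shader to decouple surface reconstruction from specular reflection within an efficient rasterization pipeline, achieving state-of-the-art novel view synthesis on reflective scenes while training much faster than ray-based methods.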

Comments Project page: https://njfan.github.io/Ref-DGS/

Abstract

The reflective appearance, especially strong and typically near-field specular reflections, poses a fundamental challenge for accurate surface reconstruction and novel view synthesis. Existing Gaussian splatting methods either fail to model near-field specular reflections or rely on explicit ray tracing at substantial computational cost. We present Ref-DGS, a reflective dual Gaussian splatting framework that addresses this trade-off by decoupling surface reconstruction from specular reflection within an efficient rasterization-based pipeline. Ref-DGS introduces a dual Gaussian scene representation consisting of geometry Gaussians and complementary local reflection Gaussians that capture near-field specular interactions without explicit ray tracing, along with a global environment reflection field for modeling far-field specular reflections. To predict specular radiance, we further propose a lightweight, physically-aware specular adaptive mixing shader that fuses global and local specular features. Experiments demonstrate that Ref-DGS achieves state-of-the-art performance on reflective scenes while training substantially faster than ray-based Gaussian methods.

2511.04469 2026-06-03 cs.LG physics.data-an q-fin.CP stat.ME stat.OT

Towards Causal Market Simulators

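Dennis Thumm, Luis Ontaneda Mijares

Affiliations: National University of Singapore; Veracruz, Mexico

AI summary: Proposes the Time-series Neural Causal Model VAE (TNCM-VAE), which combines variational autoencoders with structural causal models to generate counterfactual financial time series that preserve temporal dependencies and causal relationships, achieving L1 distances as low as 0.03-0.10 on synthetic data.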

Comments ICAIF 2025 Workshop on Rethinking Financial Time-Series

Abstract

Market generators using deep generative models have shown promise for synthetic financial data generation, but existing approaches lack causal reasoning capabilities essential for counterfactual analysis and risk assessment. We propose a Time-series Neural Causal Model VAE (TNCM-VAE) that combines variational autoencoders with structural causal models to generate counterfactual financial time series while preserving both temporal dependencies and causal relationships. Our approach enforces causal constraints through directed acyclic graphs in the decoder architecture and employs the causal Wasserstein distance for training. We validate our method on synthetic autoregressive models inspired by the Ornstein-Uhlenbeck process, demonstrating superior performance in counterfactual probability estimation with L1 distances as low as 0.03-0.10 compared to ground truth. The model enables financial stress testing, scenario analysis, and enhanced backtesting by generating plausible counterfactual market trajectories that respect underlying causal mechanisms.


2603.05290 2026-06-03 cs.AI

X-RAY: Mapping LLM Reasoning Capability via Formalized and Calibrated Probes

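Tianxi Gao, Yufan Cai, Yusi Yuan, Jin Song Dong

Affiliations: National University of Singapore

AI summary: Proposes X-RAY, a system that uses formal tools to generate calibrated probes with controlled structure and, by analyzing properties such as constraint interaction, reasoning depth, and solution-space geometry, reveals an asymmetry in LLM reasoning between constraint refinement and solution-space restructuring.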

Comments Accepted by KDD 2026


DOI: 10.1145/3770855.3818029

Abstract

Large language models (LLMs) achieve promising performance, yet their ability to reason remains poorly understood. Existing evaluations largely emphasize task-level accuracy, often conflating pattern matching with reasoning capability. We present X-RAY, an explainable reasoning analysis system that maps the LLM reasoning capability using calibrated, formally verified probes. We model reasoning capability as a function of extractable structure, operationalized through formal properties such as constraint interaction, reasoning depth, and solution-space geometry. X-RAY generates probes via formal tools with controlled structural variations, enabling precise isolation of incremental structural information through formal calibration and verification. We evaluate state-of-the-art LLMs on problems ranging from junior-level to advanced in mathematics, physics, and chemistry. Our analysis reveals a systematic asymmetry in LLM reasoning: models are relatively robust to constraint refinement, where additional conditions shrink an existing solution space, but degrade sharply under solution-space restructuring, where modifications alter the underlying structural form of the solution manifold. Moreover, calibrated formal probes differentiate models that appear indistinguishable on standard benchmarks and reveal failure modes that are structurally interpretable rather than opaque. Beyond evaluation, our framework is contamination-free and supports the training and testing of reasoning models.


2603.04956 2026-06-03 cs.LG cs.IT math.IT

WaterSIC: Information-Theoretically (Near) Optimal Linear Layer Quantization

Egor Lifar, Semyon Savkin, Or Ordentlich, Yury Polyanskiy

Affiliations: University of Illinois at Urbana-Champaign; Stanford University; University of California, Berkeley; University of Texas at Austin

AI summary: Addresses low-precision quantization of dense linear layers with the WaterSIC algorithm, which assigns different quantization rates to different columns of the weight matrix, comes within 0.255 bits of the information-theoretic limit, and achieves the best performance at 1-4 bit quantization on Llama and Qwen family large language models.

2603.03612 2026-06-03 cs.LG cs.CC cs.CL cs.FL

Why Are Linear RNNs More Parallelizable?

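William Merrill, Hongjian Jiang, Yanhong Li, Anthony Lin, Ashish Sabharwal

Affiliations: GitHub

AI summary: By tightly linking RNN types to standard complexity classes, the paper shows that linear RNNs (LRNNs) are easy to parallelize because they can be viewed as log-depth arithmetic circuits, whereas nonlinear RNNs face a parallelization barrier because they can solve $\mathsf{L}$-complete problems.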

Comments To appear at ICML 2026

Abstract

The community is increasingly exploring linear RNNs (LRNNs) as language models, motivated by their expressive power and parallelizability. While prior work establishes the expressivity benefits of LRNNs over transformers, it is unclear what makes LRNNs -- but not traditional, nonlinear RNNs -- as easy to parallelize in practice as transformers. We answer this question by providing a tight connection between types of RNNs and standard complexity classes. We show that LRNNs can be viewed as log-depth (bounded fan-in) arithmetic circuits, which represents only a slight depth overhead relative to log-depth boolean circuits that transformers admit. Furthermore, we show that nonlinear RNNs can solve $\mathsf{L}$-complete problems (and even $\mathsf{P}$-complete ones, under polynomial precision), revealing a fundamental barrier to parallelizing them as efficiently as transformers. Our theory also identifies fine-grained expressivity differences between recent popular LRNN variants: permutation-diagonal LRNNs are $\mathsf{NC}^1$-complete whereas diagonal-plus-low-rank LRNNs are more expressive ($\mathsf{PNC}^1$-complete). We provide further insight by associating each type of RNN with a corresponding automata-theoretic model that it can simulate. Together, our results reveal fundamental tradeoffs between nonlinear RNNs and different variants of LRNNs, providing a foundation for designing LLM architectures that achieve an optimal balance between expressivity and parallelism.
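The log-depth claim for linear recurrences can be made concrete with a small sketch. It is not taken from the paper: it assumes a scalar (equivalently, diagonal) recurrence $h_t = a_t h_{t-1} + b_t$ with $h_{-1} = 0$ and evaluates it with a Hillis-Steele scan. Each step is an affine map $x \mapsto a_t x + b_t$, and composing affine maps is associative, so all prefixes can be combined in $\lceil \log_2 n \rceil$ rounds of elementwise work. Function names are illustrative.

```python
import numpy as np

def sequential_linear_rnn(a, b):
    """Reference: h_t = a_t * h_{t-1} + b_t with h_{-1} = 0, one step at a time."""
    h = np.zeros(len(b))
    prev = 0.0
    for t in range(len(b)):
        prev = a[t] * prev + b[t]
        h[t] = prev
    return h

def scan_linear_rnn(a, b):
    """Same recurrence via a Hillis-Steele scan over affine maps (A, B): x -> A*x + B."""
    A = np.array(a, dtype=float)
    B = np.array(b, dtype=float)
    n = len(B)
    shift, rounds = 1, 0
    while shift < n:
        # Map covering the `shift` steps just before each position; identity where none exist.
        A_prev = np.ones(n)
        B_prev = np.zeros(n)
        A_prev[shift:] = A[:-shift]
        B_prev[shift:] = B[:-shift]
        # Compose: apply the earlier map first, then the current one.
        B = A * B_prev + B
        A = A * A_prev
        shift *= 2
        rounds += 1
    return B, rounds  # with h_{-1} = 0, the offset B_t equals h_t

rng = np.random.default_rng(0)
a = rng.uniform(0.5, 1.0, size=8)
b = rng.standard_normal(8)
h_seq = sequential_linear_rnn(a, b)
h_scan, rounds = scan_linear_rnn(a, b)
print(np.allclose(h_seq, h_scan), rounds)
```

A nonlinear update such as $h_t = \tanh(a_t h_{t-1} + b_t)$ does not compose into a fixed-size summary in this way, which is the intuition behind the barrier the abstract formalizes.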


2510.16462 2026-06-03 cs.LG stat.ML

Buzz, Choose, Forget: A Meta-Bandit Framework for Bee-Like Decision Making

Emmanuelle Claeys, Elena Kerjean, Jean-Michel Loubes

Affiliations: University of Toulouse, IRIT; University of Toulouse, CBI; Regalia Team, INRIA University of Toulouse, France

AI summary: Proposes MAYA, a sequential imitation-learning model based on multi-armed bandits that simulates honeybees' limited memory through a time window τ, outperforms baseline models on real, simulated, and complementary datasets, and offers interpretability and trajectory inference.

2603.03480 2026-06-03 cs.LG stat.ML

Minimax Optimal Strategy for Delayed Observations in Online Reinforcement Learning

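Harin Lee, Kevin Jamieson

Affiliations: University of California, Berkeley

AI summary: For reinforcement learning with delayed state observations, proposes a strategy that combines an augmentation method with upper-confidence-bound algorithms and attains a minimax optimal regret bound on tabular MDPs.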

Comments ICML camera ready version

2602.00423 2026-06-03 cs.LG

scBatchProx: Federated-Inspired Refinement for Stable Cell-Type Discriminability under Heterogeneous Batch Compositions

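Quang-Huy Nguyen, Jiaqi Wang, Wei-Shinn Ku

Affiliations: National Institute of Health (NIH)

AI summary: Proposes scBatchProx, a lightweight post-hoc method that stabilizes single-cell latent embeddings through federated-learning-inspired optimization and conservative regularization, improving cell-type classification under heterogeneous batches.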

Abstract

Single-cell integration workflows often construct low-dimensional cell embeddings and then refine them with post-hoc methods to reduce batch effects. This refinement process can become unstable when cell-type compositions vary across batches, with some populations underrepresented or absent in particular batches. The problem becomes more consequential in dynamic single-cell data systems, where newly acquired batches can change both technical conditions and cell-type composition. Such instability can reduce downstream cell-type classification performance and weaken stability under imbalance perturbations. We introduce scBatchProx, a lightweight post-hoc refinement method for stabilizing single-cell latent embeddings in these heterogeneous and evolving settings. scBatchProx operates directly on precomputed embeddings and treats each batch or study as a client in a federated-inspired optimization procedure. A batch-conditioned FiLM adapter learns local latent updates, while proximal and identity-preserving regularization keep these updates conservative. Experiments on multi-batch and cross-study single-cell datasets show that scBatchProx improves downstream cell-type classification across different upstream embeddings. In controlled imbalance perturbations, scBatchProx maintains more stable affected-cell-type F1 when selected populations are downsampled or ablated from one batch. In cumulative retraining and continual integration settings, scBatchProx remains effective as new datasets arrive over time. Together, these results suggest that conservative, federated-inspired refinement can help maintain stable single-cell embeddings as batch compositions change across datasets and over time.


2512.03005 2026-06-03 cs.AI

From Moderation to Mediation: Can LLMs Serve as Mediators in Online Flame Wars?

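Dawei Li, Abdullah Alnaibari, Arslan Bisharat, Manuel Sandoval, Deborah Hall, Yasin Silva, Huan Liu

Affiliations: Arizona State University; Loyola University Chicago

AI summary: Explores whether large language models (LLMs) can go beyond content moderation and act as mediators that defuse online conflicts by judging conversational fairness and emotional dynamics and generating empathetic, de-escalatory messages; experiments show that API-based models outperform open-source models in reasoning and intervention alignment.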

Comments Accepted by PAKDD 2026 special session on Data Science: Foundations and Applications

Abstract

The rapid advancement of large language models (LLMs) has opened new possibilities for AI for good applications. As LLMs increasingly mediate online communication, their potential to foster empathy and constructive dialogue becomes an important frontier for responsible AI research. This work explores whether LLMs can serve not only as moderators that detect harmful content, but as mediators capable of understanding and de-escalating online conflicts. Our framework decomposes mediation into two subtasks: judgment, where an LLM evaluates the fairness and emotional dynamics of a conversation, and steering, where it generates empathetic, de-escalatory messages to guide participants toward resolution. To assess mediation quality, we construct a large Reddit-based dataset and propose a multi-stage evaluation pipeline combining principle-based scoring, user simulation, and human comparison. Experiments show that API-based models outperform open-source counterparts in both reasoning and intervention alignment when doing mediation. Our findings highlight both the promise and limitations of current LLMs as emerging agents for online social mediation.


2505.23725 2026-06-03 cs.LG

MuLoCo: Muon is a practical inner optimizer for DiLoCo

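Benjamin Thérien, Xiaolong Huang, Aaron Defazio, Irina Rish, Eugene Belilovsky

Affiliations: FAIR at Meta; Mila; Université de Montréal; Concordia University

AI summary: Proposes MuLoCo, which uses Muon as the inner optimizer of DiLoCo; by producing more directionally accurate pseudogradients, it improves large language model training performance across multiple workers and remains compatible with quantization, streaming, and long synchronization intervals.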

Abstract

DiLoCo is a powerful framework for training large language models (LLMs), enabling larger optimal batch sizes and increased accelerator utilization under networking constraints. However, DiLoCo's performance has been shown to degrade as the number of workers ($K$) increases (Charles et al., 2025). In this work, we posit that a related but often overlooked factor in DiLoCo's behavior is the choice of inner optimizer, which shapes the pseudogradient used by the outer optimizer. Given the recent success of Muon relative to AdamW for data parallel (DP) training, we examine how Muon's normalized optimizer steps can affect the pseudogradient's quality. We find that, relative to AdamW, Muon yields more directionally correct pseudogradients as the number of workers ($K$) increases. In our experiments pre-training language models, we conduct extensive hyperparameter tuning across 150M, 416M, 914M, 1.76B, and 3.1B models for DiLoCo, MuLoCo, AdamW DP, and Muon DP. Consistently across all scales, we find that with $K\geq1$ workers, MuLoCo (Muon inner optimizer DiLoCo) achieves superior performance to DiLoCo in absolute terms and for $K>2$ it outperforms DiLoCo relative to their data parallel baselines, while being compatible with quantization, streaming, and long synchronization intervals. At $K=1$, we find that MuLoCo can even outperform the data-parallel gold standard while having larger critical batch sizes. Finally, we extrapolate optimal hyperparameters to 15B scale and train a model with each method (six in total) using $K=1$ and $K=16$ workers. We find that $K=16$ MuLoCo nearly matches single-worker performance at this scale, while MuLoCo $K=1$ matches the best performing baseline while using a much larger 16M token batch size.
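To make the term "pseudogradient" concrete, here is a minimal sketch of one DiLoCo-style communication round on a toy quadratic objective. It is not the authors' implementation. It assumes each worker minimizes $\frac{1}{2}\lVert W - T_k \rVert_F^2$ for its own target $T_k$, uses plain SGD as the outer optimizer, and replaces Muon with a simplified stand-in that orthogonalizes the raw gradient through an SVD (no momentum, no Newton-Schulz iteration). All names and hyperparameters are illustrative.

```python
import numpy as np

def sgd_step(w, grad, lr):
    """Plain gradient step, used here as the baseline inner optimizer."""
    return w - lr * grad

def orthogonalized_step(w, grad, lr):
    """Muon-like stand-in (assumption): step along the semi-orthogonal factor U @ Vt of the gradient."""
    u, _, vt = np.linalg.svd(grad, full_matrices=False)
    return w - lr * (u @ vt)

def diloco_round(w_global, worker_targets, inner_step, inner_lr, inner_steps, outer_lr):
    """One round: every worker trains locally from w_global, then the results are averaged."""
    local_weights = []
    for target in worker_targets:
        w = w_global.copy()
        for _ in range(inner_steps):
            grad = w - target  # gradient of 0.5 * ||w - target||_F^2
            w = inner_step(w, grad, inner_lr)
        local_weights.append(w)
    # Pseudogradient: how far the averaged workers moved away from the shared start.
    pseudograd = w_global - np.mean(local_weights, axis=0)
    return w_global - outer_lr * pseudograd, pseudograd

def cosine(x, y):
    return float(np.sum(x * y) / (np.linalg.norm(x) * np.linalg.norm(y)))

rng = np.random.default_rng(0)
shared_optimum = rng.standard_normal((4, 4))
targets = [shared_optimum + 0.1 * rng.standard_normal((4, 4)) for _ in range(8)]  # K = 8 workers
mean_target = np.mean(targets, axis=0)

w0 = np.zeros((4, 4))
for name, step in [("sgd", sgd_step), ("orthogonalized", orthogonalized_step)]:
    _, pg = diloco_round(w0, targets, step, inner_lr=0.1, inner_steps=5, outer_lr=1.0)
    print(name, "cosine to ideal direction:", round(cosine(pg, w0 - mean_target), 4))
```

The cosine printed here is only a toy proxy for "directional correctness". The paper's measurements concern real language-model training, which this sketch does not attempt to reproduce.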


2602.20217 2026-06-03 cs.LG cs.AI

KnapSpec: Self-Speculative Decoding via Adaptive Layer Selection as a Knapsack Problem

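Seongjin Cha, Gyuwan Kim, Dongsu Han, Tao Yang, Insu Han

Affiliations: KAIST

AI summary: Proposes KnapSpec, a training-free framework that reformulates draft model selection as a knapsack problem; by decoupling attention and MLP layers and modeling their hardware-specific latencies, it uses a parallel dynamic programming algorithm to adaptively determine the optimal draft configuration and maximize token throughput.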

Comments Accepted to ICML 2026

Abstract

Self-speculative decoding (SSD) accelerates LLM inference by skipping layers to create an efficient draft model, yet existing methods often rely on static heuristics that ignore the dynamic computational overhead of attention in long-context scenarios. We propose KnapSpec, a training-free framework that reformulates draft model selection as a knapsack problem to maximize tokens-per-time throughput. By decoupling Attention and MLP layers and modeling their hardware-specific latencies as functions of context length, KnapSpec adaptively identifies optimal draft configurations on the fly via a parallel dynamic programming algorithm. Furthermore, we provide the first rigorous theoretical analysis establishing cosine similarity between hidden states as a mathematically sound proxy for the token acceptance rate. This foundation allows our method to maintain high drafting faithfulness while navigating the shifting bottlenecks of real-world hardware. Our experiments on Qwen3 and Llama3 demonstrate that KnapSpec consistently outperforms state-of-the-art SSD baselines, achieving up to 1.47x wall-clock speedup across various benchmarks. Our plug-and-play approach ensures high-speed inference for long sequences without requiring additional training or compromising the target model's output distribution.
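The knapsack framing can be sketched with a textbook 0/1 knapsack dynamic program. This is not KnapSpec's algorithm, which the abstract describes as a parallel dynamic program with context-dependent latencies. The sketch assumes each attention or MLP block has a fixed integer latency cost and a hypothetical additive "value" standing in for its contribution to draft quality, and it picks the blocks to keep under a latency budget. The layer names, costs, and values are made up for illustration.

```python
def select_layers(latencies, values, budget):
    """0/1 knapsack: choose which blocks to keep in the draft model under a latency budget."""
    n = len(latencies)
    # best[i][cap] = highest total value using the first i blocks with total latency <= cap
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost, val = latencies[i - 1], values[i - 1]
        for cap in range(budget + 1):
            best[i][cap] = best[i - 1][cap]  # option 1: skip block i-1
            if cost <= cap:                  # option 2: keep it if it fits
                best[i][cap] = max(best[i][cap], best[i - 1][cap - cost] + val)
    # Walk back through the table to recover which blocks were kept.
    kept, cap = [], budget
    for i in range(n, 0, -1):
        if best[i][cap] != best[i - 1][cap]:
            kept.append(i - 1)
            cap -= latencies[i - 1]
    return best[n][budget], sorted(kept)

# Hypothetical blocks: (name, latency in arbitrary integer units, assumed quality value).
blocks = [("attn0", 4, 0.30), ("mlp0", 2, 0.20), ("attn1", 6, 0.25),
          ("mlp1", 2, 0.15), ("attn2", 8, 0.10)]
latencies = [b[1] for b in blocks]
values = [b[2] for b in blocks]

total_value, kept = select_layers(latencies, values, budget=10)
print([blocks[i][0] for i in kept], round(total_value, 2))
```

Because attention latency grows with context length while MLP latency does not, the costs in such a table would have to be refreshed as decoding proceeds, which is why the abstract stresses selecting configurations on the fly.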

