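arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, and electrical engineering & systems.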

2511.08247 2026-06-03 cs.CL cs.CY

ParliaBench: An Evaluation and Benchmarking Framework for LLM-Generated Parliamentary Speech

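Marios Koniaris, Argyro Tsipi, Panayiotis Tsanakas

Affiliations: University of Cambridge

AI summary: Proposes the ParliaBench benchmark framework, which systematically evaluates the linguistic quality, semantic coherence, and political authenticity of LLM-generated parliamentary speeches by building a UK Parliament dataset, combining computational metrics with LLM-as-a-judge evaluation, and introducing two new embedding-based metrics (Political Spectrum Alignment and Party Alignment). Experiments show that fine-tuning significantly improves most metrics and that the new metrics discriminate strongly along political dimensions.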

Journal ref Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026), pp. 4797-4818, European Language Resources Association (ELRA), Palma, Mallorca, Spain, May 2026

DOI: 10.63317/447dqkef7ks7

Abstract

Parliamentary speech generation presents specific challenges for large language models beyond standard text generation tasks. Unlike general text generation, parliamentary speeches require not only linguistic quality but also political authenticity and ideological consistency. Current language models lack specialized training for parliamentary contexts, and existing evaluation methods focus on standard NLP metrics rather than political authenticity. To address this, we present ParliaBench, a benchmark for parliamentary speech generation. We constructed a dataset of speeches from UK Parliament to enable systematic model training. We introduce an evaluation framework combining computational metrics with LLM-as-a-judge assessments for measuring generation quality across three dimensions: linguistic quality, semantic coherence, and political authenticity. We propose two novel embedding-based metrics, Political Spectrum Alignment and Party Alignment, to quantify ideological positioning. We fine-tuned five large language models (LLMs), generated 28k speeches, and evaluated them using our framework, comparing baseline and fine-tuned models. Results show that fine-tuning produces statistically significant improvements across the majority of metrics and our novel metrics demonstrate strong discriminative power for political dimensions.

2511.07971 2026-06-03 cs.LG

Low-Rank Curvature for Zeroth-Order Optimization in LLM Fine-Tuning

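Hyunseok Seung, Jaewoo Lee, Hyunsuk Ko

Affiliations: University of Wisconsin – Madison; University of Georgia; Hanyang University

AI summary: Proposes LOREN, which captures curvature with a low-rank block-diagonal preconditioner and reduces variance with a REINFORCE leave-one-out gradient estimator, achieving higher accuracy and faster convergence in LLM fine-tuning while cutting peak memory usage by up to 27.3% compared with MeZO-Adam.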

Comments Accepted to the AAAI Conference on Artificial Intelligence (AAAI-2026)

DOI: 10.1609/aaai.v40i30.39715

Abstract

We introduce LOREN, a curvature-aware zeroth-order (ZO) optimization method for fine-tuning large language models (LLMs). Existing ZO methods, which estimate gradients via finite differences using random perturbations, often suffer from high variance and suboptimal search directions. Our approach addresses these challenges by: (i) reformulating the problem of gradient preconditioning as that of adaptively estimating an anisotropic perturbation distribution for gradient estimation, (ii) capturing curvature through a low-rank block diagonal preconditioner using the framework of natural evolution strategies, and (iii) applying a REINFORCE leave-one-out (RLOO) gradient estimator to reduce variance. Experiments on standard LLM benchmarks show that our method outperforms state-of-the-art ZO methods by achieving higher accuracy and faster convergence, while cutting peak memory usage by up to 27.3% compared with MeZO-Adam.
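The two generic ingredients named in the abstract, a finite-difference gradient estimate from random perturbations and a REINFORCE leave-one-out (RLOO) baseline, can be sketched in a few lines. This is only an illustration under stated assumptions: it uses isotropic Gaussian perturbations and does not implement LOREN's low-rank block-diagonal preconditioner; the function name `rloo_zo_gradient` and its parameters are hypothetical.

```python
import random


def rloo_zo_gradient(loss, theta, num_samples=8, sigma=1e-3, rng=None):
    """Zeroth-order gradient estimate with a leave-one-out baseline.

    Assumption: isotropic Gaussian perturbations (no preconditioner).
    `loss` maps a list of floats to a float; only forward evaluations are used.
    """
    if num_samples < 2:
        raise ValueError("leave-one-out needs at least two samples")
    rng = rng or random.Random(0)
    dim = len(theta)

    # Sample perturbation directions and evaluate the loss at each perturbed point.
    directions = [[rng.gauss(0.0, 1.0) for _ in range(dim)]
                  for _ in range(num_samples)]
    losses = [loss([t + sigma * u for t, u in zip(theta, d)])
              for d in directions]

    total = sum(losses)
    grad = [0.0] * dim
    for d, value in zip(directions, losses):
        # Baseline for sample k is the mean loss of all *other* samples,
        # so it is independent of direction k and adds no bias.
        baseline = (total - value) / (num_samples - 1)
        weight = (value - baseline) / (sigma * num_samples)
        for i in range(dim):
            grad[i] += weight * d[i]
    return grad
```

Subtracting the leave-one-out baseline removes the large constant part of the loss from each term, which is what lowers the variance relative to weighting each direction by the raw loss value.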

2506.08464 2026-06-03 cs.LG

MAC: An Efficient Gradient Preconditioning using Mean Activation Approximated Curvature

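Hyunseok Seung, Jaewoo Lee, Hyunsuk Ko

Affiliations: University of Wisconsin – Madison; University of Georgia; Hanyang University

AI summary: Proposes MAC, which approximates the Kronecker factors of the Fisher information matrix used in KFAC to reduce the computational burden of second-order optimization. To the authors' knowledge it is the first method to apply Kronecker factorization to Transformer attention layers, and it outperforms KFAC and other existing methods across a range of networks and datasets.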

Comments Accepted to the IEEE International Conference on Data Mining (ICDM-2025)

DOI: 10.1109/ICDM65498.2025.00077

Abstract

Second-order optimization methods for training neural networks, such as KFAC, exhibit superior convergence by utilizing curvature information of loss landscape. However, it comes at the expense of high computational burden. In this work, we analyze the two components that constitute the layer-wise Fisher information matrix (FIM) used in KFAC: the Kronecker factors related to activations and pre-activation gradients. Based on empirical observations on their eigenspectra, we propose efficient approximations for them, resulting in a computationally efficient optimization method called MAC. To the best of our knowledge, MAC is the first algorithm to apply the Kronecker factorization to the FIM of attention layers used in transformers and explicitly integrate attention scores into the preconditioning. We also study the convergence property of MAC on nonlinear neural networks and provide two conditions under which it converges to global minima. Our extensive evaluations on various network architectures and datasets show that the proposed method outperforms KFAC and other state-of-the-art methods in terms of accuracy, end-to-end training time, and memory usage.
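For readers unfamiliar with the two Kronecker factors the abstract refers to, the standard KFAC factorization for one fully connected layer is recalled below as background. The notation (W, a, g, A, G, η, column-stacking vec) is the usual KFAC convention and is an assumption here, not taken from the abstract; MAC's own approximations of A and G are not reproduced.

```latex
\begin{aligned}
s &= W a, \qquad g = \frac{\partial \mathcal{L}}{\partial s}, \qquad
\nabla_W \mathcal{L} = g\,a^{\top},\\
F &= \mathbb{E}\!\left[\operatorname{vec}(g a^{\top})\,\operatorname{vec}(g a^{\top})^{\top}\right]
   = \mathbb{E}\!\left[(a a^{\top}) \otimes (g g^{\top})\right]
   \approx \underbrace{\mathbb{E}[a a^{\top}]}_{A} \otimes \underbrace{\mathbb{E}[g g^{\top}]}_{G},\\
\operatorname{vec}(\Delta W) &= -\eta\,(A \otimes G)^{-1}\operatorname{vec}(\nabla_W \mathcal{L})
   \;\Longleftrightarrow\;
   \Delta W = -\eta\,G^{-1}\,(\nabla_W \mathcal{L})\,A^{-1}.
\end{aligned}
```

Here A is the activation factor and G is the pre-activation-gradient factor; the cost of forming and inverting them is the computational burden the abstract mentions.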

2310.00965 2026-06-03 cs.LG

Node Perturbation Can Effectively Train Multi-Layer Neural Networks

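Sander Dalm, Marcel van Gerven, Nasir Ahmad

Affiliations: Donders Institute for Brain, Cognition and Behaviour

AI summary: By aligning node perturbation with the directional derivative and decorrelating inputs at every layer, the work substantially improves the parameter convergence and test performance of node-perturbation learning, approaching backpropagation.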

Abstract

Backpropagation (BP) remains the dominant and most successful method for training parameters of deep neural network models. However, BP relies on two computationally distinct phases, does not provide a satisfactory explanation of biological learning, and can be challenging to apply for training of networks with discontinuities or noisy node dynamics. By comparison, node perturbation (NP), also known as activity-perturbed forward gradients, proposes learning by the injection of noise into network activations, and subsequent measurement of the induced loss change. NP relies on two forward (inference) passes, does not make use of network derivatives, and has been proposed as a model for learning in biological systems. However, standard NP is highly data inefficient and can be unstable due to its unguided noise-based search process. In this work, we develop a modern perspective on NP by relating it to the directional derivative and incorporating input decorrelation. We find that a closer alignment with directional derivatives together with input decorrelation at every layer theoretically and practically enhances performance of NP learning with large improvements in parameter convergence and much higher performance on the test data, approaching that of BP. Furthermore, our novel formulation allows for application to noisy systems in which the noise process itself is inaccessible, which is of particular interest for on-chip learning in neuromorphic systems.
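The basic mechanism described above (two forward passes, noise injected into activations, no derivatives) can be sketched for a single linear layer. This is the textbook form of node perturbation under stated assumptions, not the paper's directional-derivative variant, and it omits input decorrelation; the function name, squared-error loss, and hyperparameters are hypothetical choices for illustration.

```python
import random


def node_perturbation_step(weights, x, target, lr=0.05, sigma=1e-3, rng=random):
    """One node-perturbation update for a linear layer y = W x.

    Assumptions: squared-error loss, Gaussian noise added to the outputs,
    standard NP scaling (loss difference times noise divided by sigma**2).
    """
    def forward(noise):
        # Layer output with additive noise on each node's activation.
        return [sum(w * xi for w, xi in zip(row, x)) + n
                for row, n in zip(weights, noise)]

    def loss(outputs):
        return 0.5 * sum((o - t) ** 2 for o, t in zip(outputs, target))

    zero = [0.0] * len(weights)
    noise = [sigma * rng.gauss(0.0, 1.0) for _ in weights]

    # Two forward passes: one clean, one perturbed. No derivatives are taken.
    delta = loss(forward(noise)) - loss(forward(zero))

    # Noisy estimate of the loss gradient with respect to each activation.
    act_grad = [delta * n / sigma ** 2 for n in noise]

    # Local weight update: activation-gradient estimate times the layer input.
    return [[w - lr * g * xi for w, xi in zip(row, x)]
            for row, g in zip(weights, act_grad)]
```

Each step follows only one random direction in activation space, which is why plain NP needs many samples; this is the data inefficiency the abstract sets out to reduce.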

2511.04421 2026-06-03 cs.RO

Temporal Action Selection for Action Chunking

Yueyang Weng, Xiaopeng Zhang, Yongjin Mu, Yingcong Zhu, Yanjie Li

Affiliations: Guangdong Key Laboratory of Intelligent Morphing Mechanisms and Adaptive Robotics and School of Intelligence Science and Engineering, the Harbin Institute of Technology Shenzhen, China

AI summary: Proposes the Temporal Action Selection (TAS) algorithm, which caches action chunks predicted at multiple timesteps and dynamically selects the optimal action, improving reactivity while preserving decision consistency and significantly raising task success rates.

Abstract

Action chunking is a widely adopted approach in Learning from Demonstration (LfD). By modeling multi-step action chunks rather than single-step actions, action chunking significantly enhances modeling capabilities for human expert policies. However, because action chunking makes a single decision only after a complete action block has been executed, the resulting reduction in decision frequency restricts the utilization of real-time observations, impairing reactivity in dynamic or noisy environments. Existing efforts to address this issue have primarily resorted to trading off reactivity against decision consistency, without achieving both. To address this limitation, we propose a novel algorithm, Temporal Action Selection (TAS), which caches predicted action chunks from multiple timesteps and dynamically selects the optimal action through a lightweight selector network. TAS achieves balanced optimization across both reactivity and decision consistency. Experiments across multiple tasks with diverse base policy architectures show that TAS significantly improves success rates. Furthermore, integrating TAS as a base policy with residual reinforcement learning (RL) improves both training efficiency and the performance ceiling. Experiments in both simulation and physical robots confirm the method's efficacy.
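The caching idea in the abstract can be illustrated with a small data structure: every chunk predicted within the last few steps still contains a candidate action for the current step, and a selector picks among them. This is a rough sketch under stated assumptions; the class `ChunkCache`, the function `select_action`, and the `score` callback (a stand-in for the paper's lightweight selector network) are all hypothetical.

```python
from collections import deque


class ChunkCache:
    """Keeps the most recent action chunks, each tagged with its start step."""

    def __init__(self, horizon):
        self.horizon = horizon
        # Older chunks no longer cover the current step, so cap the cache size.
        self.chunks = deque(maxlen=horizon)

    def add(self, start_step, actions):
        if len(actions) != self.horizon:
            raise ValueError("chunk length must equal the horizon")
        self.chunks.append((start_step, list(actions)))

    def candidates(self, step):
        """Return (start_step, action) pairs that every cached chunk proposes for `step`."""
        found = []
        for start, actions in self.chunks:
            offset = step - start
            if 0 <= offset < self.horizon:
                found.append((start, actions[offset]))
        return found


def select_action(cache, step, score):
    """Pick the candidate with the highest score.

    `score(start_step, action)` is a placeholder for a learned selector.
    Raises ValueError if no cached chunk covers `step`.
    """
    return max(cache.candidates(step), key=lambda item: score(*item))[1]
```

A selector that always prefers the newest chunk gives maximal reactivity, and one that always prefers the oldest chunk gives maximal consistency; a learned selector can move between these extremes per step.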

2511.02417 2026-06-03 cs.CV cs.RO

CropCraft: A Procedural World Generator for Robotic Simulation of Agricultural Tasks

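Riccardo Bertoglio, Cyrille Pierre, Johann Laconte, Roland Lenain

Affiliations: Institut National de la Recherche Agronomique

AI summary: Proposes CropCraft, an open-source procedural world generator built on Blender and Python that generates diverse crop-field scenes from a YAML configuration, supports intercropping, vineyards, and weed-infested fields, and produces annotated 3D simulation environments for developing perception and navigation algorithms for agricultural robots.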

Abstract

The adoption of agroecological practices in modern agriculture requires robotic systems capable of operating in highly diverse and complex field environments. Developing and evaluating such systems relies heavily on simulation, yet generating realistic and configurable 3D environments representative of agroecological diversity remains a major challenge. This paper presents CropCraft, an open-source procedural world generator built on Blender and Python, designed to produce 3D simulation environments tailored to agricultural robotics. CropCraft generates crop fields from a simple YAML configuration file, supporting a wide range of scenarios including intercropping, vineyards, and weed-infested fields. The tool includes a library of 3D plant models (crops, grasses, and weeds) at multiple growth stages, and uses stochastic placement algorithms to realistically reproduce the spatial variability observed in real fields. Generated worlds are directly importable into the Gazebo simulator and include ground-truth annotations for all placed elements, supporting both perception and navigation algorithm development. To demonstrate the practical utility of CropCraft, we apply it to the task of crop-weed semantic segmentation using deep learning. A dataset of 10,000 synthetic images of maize fields with varying weed densities, growth stages, and lighting conditions was generated and used to train several segmentation architectures. Models trained exclusively on synthetic data achieve a sim-to-real gap of approximately 10% mean Intersection over Union (mIoU) on real field images, outperforming previous state-of-the-art synthetic generation approaches. We further show that combining even a few real images with synthetic data improves generalization across domains, providing new insights into the effective use of synthetic data for agricultural perception tasks.

2510.23216 2026-06-03 cs.AI cs.LG

Human-Like Goalkeeping in a Realistic Football Simulation: a Sample-Efficient Reinforcement Learning Approach

Alessandro Sestini, Joakim Bergdahl, Jean-Philippe Barrette-LaPierre, Florian Fuchs, Brady Chen, Fabio Zinno, Michael Jones, Linus Gisslén

Affiliations: University of Edinburgh; KTH Royal Institute of Technology; University of California, Berkeley

AI summary: Proposes a sample-efficient deep reinforcement learning method that leverages pre-collected data and increased network plasticity to train a goalkeeper agent in EA SPORTS FC 25. The agent's save rate is 10% higher than the built-in AI's, training is 50% faster than with standard DRL methods, and its behavior is closer to human play.

Abstract

While several high profile video games have served as testbeds for Deep Reinforcement Learning (DRL), this technique has rarely been employed by the game industry for crafting authentic AI behaviors. Previous research focuses on training super-human agents with large models, which is impractical for game studios with limited resources aiming for human-like agents. This paper proposes a sample-efficient DRL method tailored for training and fine-tuning agents in industrial settings such as the video game industry. Our method improves sample efficiency of value-based DRL by leveraging pre-collected data and increasing network plasticity. We evaluate our method training a goalkeeper agent in EA SPORTS FC 25, one of the best-selling football simulations today. Our agent outperforms the game's built-in AI by 10% in ball saving rate. Ablation studies show that our method trains agents 50% faster compared to standard DRL methods. Finally, qualitative evaluation from domain experts indicates that our approach creates more human-like gameplay compared to hand-crafted agents. As a testament to the impact of the approach, the method has been adopted for use in the most recent release of the series.

2510.23469 2026-06-03 cs.LG

Towards Fair Graph Prompting: A Dual-Prompt Mechanism for Mitigating Attribute and Structural Bias

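Yuhan Yang, Xingbo Fu, Jundong Li

Affiliations: University of Michigan; University of Virginia

AI summary: Proposes Adaptive Dual Prompting (ADPrompt), a framework whose two modules, Adaptive Feature Rectification and Adaptive Message Calibration, mitigate bias in node attributes and graph structure while adapting a pre-trained GNN, enabling fair node classification.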

Abstract

Self-supervised pre-training on unlabeled graph data has become a common paradigm for Graph Neural Networks (GNNs). However, an objective gap often remains between pre-training objectives and downstream tasks. To bridge this gap, graph prompting methods adapt frozen pre-trained GNNs to specific downstream tasks through learnable prompts. Despite its effectiveness, most existing graph prompting methods primarily focus on improving model performance and largely overlook fairness concerns. As downstream graph data inherently contains biases in both node attributes and graph structures, pre-trained GNNs may produce representations that differ across demographic subgroups. To address this limitation, we propose Adaptive Dual Prompting (ADPrompt), a fairness-aware graph prompting framework for adapting pre-trained GNNs. ADPrompt incorporates two complementary components: Adaptive Feature Rectification, which learns personalized attribute prompts to suppress sensitive information at the input level, and Adaptive Message Calibration, which introduces layer-wise structure prompts to dynamically regulate information propagation from neighboring nodes. By jointly optimizing these two modules, ADPrompt adapts the pre-trained GNN while mitigating both attribute-level and structural bias. Experiments on four benchmark datasets with multiple pre-training strategies demonstrate that ADPrompt consistently outperforms seven competitive baselines in node classification tasks.

2510.16302 2026-06-03 cs.AI cs.IR

DTKG: Dual-Track Knowledge Graph-Verified Reasoning Framework for Multi-Hop QA

Changhao Wang, Yanfang Liu, Xinxin Fan, Ao Tian, Lanzhi Zhou, Yunfeng Lu

Affiliations: School of Computer Science & Engineering, Beihang University, Beijing, China; School of Reliability & Systems Engineering, Beihang University, Beijing, China; State Key Laboratory of Complex & Critical Software Environment; National Key Laboratory of Reliability; State Key Laboratory of AI Safety, Institute of Computing Technology, Chinese Academy of Sciences

AI summary: Proposes the DTKG framework, which uses a classification stage and a branch processing stage to handle parallel fact verification and chained multi-hop reasoning separately, improving the efficiency and accuracy of multi-hop question answering.

Comments Accepted to ICML 2026

Abstract

Multi-hop reasoning for question answering (QA) plays a critical role in retrieval-augmented generation (RAG) for modern large language models (LLMs). The accurate answer can be obtained through retrieving relational structure of entities from knowledge graph (KG). Regarding the inherent relation-dependency and reasoning pattern, multi-hop reasoning can be in general classified into two categories: i) parallel fact-verification multi-hop reasoning question, i.e., requiring simultaneous verifications of multiple independent sub-questions; and ii) chained multi-hop reasoning questions, i.e., demanding sequential multi-step inference with intermediate conclusions serving as essential premises for subsequent reasoning. Currently, the multi-hop reasoning approaches singly employ one of two techniques: LLM response-based fact verification and KG path-based chain construction. Nevertheless, the former excels at parallel fact-verification but underperforms on chained reasoning tasks, while the latter demonstrates proficiency in chained multi-hop reasoning but suffers from redundant path retrieval when handling parallel fact-verification reasoning. These limitations deteriorate the efficiency and accuracy for multi-hop QA tasks. To address this challenge, we propose a novel dual-track KG verification and reasoning framework DTKG, which is inspired by the Dual Process Theory in cognitive science. Specifically, DTKG comprises two main stages: the Classification Stage and the Branch Processing Stage.

2510.16282 2026-06-03 cs.CL

Instant Personalized Large Language Model Adaptation via Hypernetwork

Zhaoxuan Tan, Zixuan Zhang, Haoyang Wen, Zheng Li, Rongzhi Zhang, Pei Chen, Fengran Mo, Zheyuan Liu, Qingkai Zeng, Qingyu Yin, Meng Jiang

Affiliations: University of Notre Dame; Amazon.com Inc; Université de Montréal

AI summary: Proposes the Profile-to-PEFT framework, which uses a hypernetwork to map an encoded user profile directly to adapter parameters, enabling instant personalization without per-user training and outperforming existing methods at lower computational cost.

Comments accepted to ACL 2026

Abstract

Personalized large language models (LLMs) tailor content to individual preferences using user profiles or histories. However, existing parameter-efficient fine-tuning (PEFT) methods, such as the "One-PEFT-Per-User" (OPPU) paradigm, require training a separate adapter for each user, making them computationally expensive and impractical for real-time updates. We introduce Profile-to-PEFT, a scalable framework that employs a hypernetwork, trained end-to-end, to map a user's encoded profile directly to a full set of adapter parameters (e.g., LoRA), eliminating per-user training at deployment. This design enables instant adaptation, generalization to unseen users, and privacy-preserving local deployment. Experimental results demonstrate that our method outperforms both prompt-based personalization and OPPU while using substantially fewer computational resources at deployment. The framework exhibits strong generalization to out-of-distribution users and maintains robustness across varying user activity levels and different embedding backbones. The proposed Profile-to-PEFT framework enables efficient, scalable, and adaptive LLM personalization suitable for large-scale applications.
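The mapping from an encoded profile to LoRA parameters can be pictured with a toy hypernetwork. The sketch below assumes a single linear hypernetwork and one adapted weight matrix; the abstract does not describe the actual architecture, so `generate_lora`, `adapted_forward`, and all shapes are hypothetical and serve only to show the data flow.

```python
def matvec(matrix, vector):
    """Multiply a row-major list-of-lists matrix by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def generate_lora(profile_embedding, hyper_weights, rank, d_in, d_out):
    """Map a profile embedding to LoRA factors A (rank x d_in) and B (d_out x rank).

    Assumption: the hypernetwork is one linear layer whose output is the
    concatenation of the flattened A and B matrices.
    """
    flat = matvec(hyper_weights, profile_embedding)
    if len(flat) != rank * d_in + d_out * rank:
        raise ValueError("hypernetwork output size does not match LoRA shapes")
    split = rank * d_in
    a = [flat[i * d_in:(i + 1) * d_in] for i in range(rank)]
    b_flat = flat[split:]
    b = [b_flat[i * rank:(i + 1) * rank] for i in range(d_out)]
    return a, b


def adapted_forward(base_weight, a, b, x):
    """Compute (W + B A) x without forming the full update matrix."""
    base = matvec(base_weight, x)
    low_rank = matvec(b, matvec(a, x))
    return [u + v for u, v in zip(base, low_rank)]
```

Because the adapter comes from a single forward pass through the hypernetwork, a new user needs only a profile embedding and no gradient steps, which is the property the abstract calls instant adaptation.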

2505.08222 2026-06-03 cs.RO cs.AI cs.DC cs.PF

Scaling Multi Agent Reinforcement Learning for Underwater Acoustic Tracking via Autonomous Vehicles

Matteo Gallici, Ivan Masmitja, Mario Martín

Affiliations: KEMLG Research Group, Universitat Politècnica de Catalunya Barcelona, Spain; Instituto de Ciencias del Mar, Consejo Superior de Investigaciones Científicas, Barcelona, Spain; KEMLG Research Group, Universitat Politècnica de Catalunya (UPC), and with the HPAI group at Barcelona Supercomputing Center (BSC), Barcelona, Spain

AI summary: Proposes a GPU-accelerated environment (up to 30,000x speedup over Gazebo) and a Transformer-based MARL architecture (TransfMAPPO) for underwater tracking of multiple fast-moving targets, keeping tracking errors below 5 m.

Abstract

Autonomous vehicles (AVs) offer a cost-effective solution for scientific missions such as underwater tracking. Reinforcement learning (RL) has emerged as a powerful method for controlling AVs, but scaling to fleets (essential for multi-target tracking or rapidly moving targets) is challenging. Multi-Agent RL (MARL) is notoriously sample-inefficient, and while high-fidelity simulators like Gazebo's LRAUV provide up to 100x faster-than-real-time single-robot simulations, they offer little speedup in multi-vehicle scenarios, making MARL training impractical. Yet, high-fidelity simulation is crucial to test complex policies and close the sim-to-real gap. To address these limitations, we develop a GPU-accelerated environment that achieves up to 30,000x speedup over Gazebo while preserving its dynamics. This enables fast, end-to-end GPU training and seamless transfer to Gazebo for evaluation. We also introduce a Transformer-based architecture (TransfMAPPO) that learns policies invariant to fleet size and number of targets, enabling curriculum learning to train larger fleets on increasingly complex scenarios. After large-scale GPU training, we perform extensive evaluations in Gazebo, showing our method maintains tracking errors below 5m even with multiple fast-moving targets.

2510.13565 2026-06-03 cs.CV

XD-RCDepth: Lightweight Radar-Camera Depth Estimation with Explainability-Aligned and Distribution-Aware Distillation

Huawei Sun, Zixu Wang, Xiangyuan Peng, Julius Ott, Georg Stettinger, Lorenzo Servadei, Robert Wille

Affiliations: Technical University of Munich; Infineon Technologies AG

AI summary: Proposes XD-RCDepth, a lightweight radar-camera depth estimation architecture that uses explainability-aligned distillation and depth-distribution distillation to cut parameters by 29.7% while maintaining accuracy, achieving real-time performance on the nuScenes and ZJU-4DRadarCam datasets.

2506.09398 2026-06-03 cs.LG physics.comp-ph

Efficient Prediction of SO(3)-Equivariant Hamiltonian Matrices via SO(2) Local Frames

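Haiyang Yu, Yuchao Lin, Xuan Zhang, Xiaofeng Qian, Shuiwang Ji

Affiliations: National University of Singapore

AI summary: Proposes QHNetV2, a network that achieves global SO(3) equivariance using SO(2) local frames and SO(2)-equivariant operations, avoiding costly SO(3) tensor products and predicting Hamiltonian matrices efficiently.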

Comments Code available at: https://github.com/divelab/AIRS

Abstract

We consider the task of predicting Hamiltonian matrices to accelerate electronic structure calculations, which plays an important role in physics, chemistry, and materials science. Motivated by the inherent relationship between the off-diagonal blocks of the Hamiltonian matrix and the SO(2) local frame, we propose a novel and efficient network, called QHNetV2, that achieves global SO(3) equivariance without the costly SO(3) Clebsch-Gordan tensor products. This is achieved by introducing a set of new efficient and powerful SO(2)-equivariant operations and performing all off-diagonal feature updates and message passing within SO(2) local frames, thereby eliminating the need of SO(3) tensor products. Moreover, a continuous SO(2) tensor product is performed within the SO(2) local frame at each node to fuse node features, mimicking the symmetric contraction operation. Extensive experiments on the large QH9 and MD17 datasets demonstrate that our model achieves superior performance across a wide range of molecular structures and trajectories, highlighting its strong generalization capability. The proposed SO(2) operations on SO(2) local frames offer a promising direction for scalable and symmetry-aware learning of electronic structures. Our code will be released as part of the AIRS library https://github.com/divelab/AIRS.

2510.09711 2026-06-03 cs.CL cs.AI

ReaLM: Residual Quantization Bridging Knowledge Graph Embeddings and Large Language Models

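Wenbin Guo, Xin Wang, Jiaoyan Chen, Lingbing Guo, Zhao Li, Zirui Chen

Affiliations: Tianjin University; The University of Manchester

AI summary: Proposes the ReaLM framework, which uses residual vector quantization to discretize knowledge graph embeddings into learnable tokens added to the LLM vocabulary and combines this with ontology constraints to semantically align structured knowledge with the language model, achieving state-of-the-art performance on knowledge graph completion.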

Abstract

Large Language Models (LLMs) have recently emerged as a powerful paradigm for Knowledge Graph Completion (KGC), offering strong reasoning and generalization capabilities beyond traditional embedding-based approaches. However, existing LLM-based methods often struggle to fully exploit structured semantic representations, as the continuous embedding space of pretrained KG models is fundamentally misaligned with the discrete token space of LLMs. This discrepancy hinders effective semantic transfer and limits their performance. To address this challenge, we propose ReaLM, a novel and effective framework that bridges the gap between KG embeddings and LLM tokenization through the mechanism of residual vector quantization. ReaLM discretizes pretrained KG embeddings into compact code sequences and integrates them as learnable tokens within the LLM vocabulary, enabling seamless fusion of symbolic and contextual knowledge. Furthermore, we incorporate ontology-guided class constraints to enforce semantic consistency, refining entity predictions based on class-level compatibility. Extensive experiments on two widely used benchmark datasets demonstrate that ReaLM achieves state-of-the-art performance, confirming its effectiveness in aligning structured knowledge with large-scale language models.
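Residual vector quantization, the mechanism the abstract relies on to turn a continuous embedding into a short code sequence, can be shown with a minimal encoder and decoder. This is the generic technique under stated assumptions, not ReaLM's implementation: codebooks are given rather than learned, and the function names `rvq_encode` and `rvq_decode` are hypothetical.

```python
def rvq_encode(vector, codebooks):
    """Quantize `vector` into one code index per codebook.

    Each level picks the nearest codeword to the current residual,
    then passes what is left over to the next level.
    """
    residual = list(vector)
    codes = []
    for codebook in codebooks:
        index = min(
            range(len(codebook)),
            key=lambda i: sum((r - c) ** 2 for r, c in zip(residual, codebook[i])),
        )
        codes.append(index)
        residual = [r - c for r, c in zip(residual, codebook[index])]
    return codes, residual


def rvq_decode(codes, codebooks):
    """Reconstruct an approximation by summing the selected codewords."""
    approx = [0.0] * len(codebooks[0][0])
    for index, codebook in zip(codes, codebooks):
        approx = [a + c for a, c in zip(approx, codebook[index])]
    return approx
```

The resulting list of indices is the kind of compact code sequence that could then be mapped to new vocabulary tokens, one token per (level, index) pair; that mapping is left out here.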

2510.08977 2026-06-03 cs.LG cs.CL

Breaking the Self-Confirming Loop: Diagnosing and Mitigating Systemic Reward Bias in Self-Rewarding RL

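Chuyi Tan, Peiwen Yuan, Xinglin Wang, Yiwei Li, Shaoxiong Feng, Yueqi Zhang, Jiayi Shi, Ji Zhang, Boyuan Pan, Yao Hu, Kan Li

Affiliations: University of Science and Technology of China

AI summary: Quantifies feedback-loop bias and proposes reinforcement learning with ensembled rewards (RLER) to diagnose and mitigate the systemic reward bias caused by confidence coupling in self-rewarding reinforcement learning, improving performance and stability.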

Abstract

Reinforcement learning with verifiable rewards (RLVR) efficiently scales the reasoning ability of large language models (LLMs) but is bottlenecked by scarce labeled data. Reinforcement learning with intrinsic rewards (RLIR) offers a scalable alternative via self-rewarding, yet often suffers from instability and inferior performance. We trace this gap to a systemic bias in confidence-coupled self-rewarding: the model tends to over-reward high-confidence mistakes, forming a self-confirming loop. We quantify this feedback-loop bias with three metrics: reward noise magnitude (rho_noise), policy-reward coupling (rho_selfbias), and over-/under-reward skew (rho_symbias). Our analyses show a compounding effect where strong coupling amplifies confidence-conditioned errors and drives a drift toward over-reward, leading to instability and a lower performance ceiling. To mitigate this, we propose reinforcement learning with ensembled rewards (RLER), which aggregates diverse models with adaptive reward interpolation and disagreement-aware rollout selection to reduce coupling and suppress over-reward drift. Extensive experiments show that RLER improves by 6.2% over the best RLIR baseline and is within 3.6% of RLVR, while exhibiting stable scaling on unlabeled samples.


2510.03316 2026-06-03 cs.CV cs.AI cs.LG

The View From Space: Navigating Instrumentation Differences with EOFMs

从太空视角：利用EOFMs导航仪器差异

Ryan P. Demilt, Nicholas LaHaye, Karis Tenneson

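Affiliations: Spatial Informatics Group

AI summary: By analyzing how sensitive Earth Observation Foundation Models (EOFMs) are to sensor architecture, the study exposes shortcomings in current model design and points a way forward for model developers, users, and the remote-sensing science community.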

Journal ref https://neurips.cc/virtual/2025/loc/san-diego/122891

详情

AI中文摘要

地球观测基础模型（EOFMs）作为处理大量遥感及其他地球观测数据、并对许多关键地球监测任务产生影响的工具，其普及程度急剧上升。一个新兴趋势是利用预训练模型的输出作为“嵌入”，这些嵌入总结了高维数据，可用于通用任务，如相似性搜索和内容特定查询。然而，大多数EOFMs仅在单一模态数据上训练，然后通过匹配不同模态的波段进行应用或基准测试。现有工作尚不清楚多样化的传感器架构如何影响当前EOFMs套件的内部表示。我们在本工作中表明，EOFMs的表示空间对传感器架构高度敏感，理解这一差异为我们提供了关于当前EOFMs设计陷阱的关键视角，并指明了作为模型开发者、用户以及以稳健遥感科学为指导的社区应如何前进的方向。

英文摘要

Earth Observation Foundation Models (EOFMs) have exploded in prevalence as tools for processing the massive volumes of remotely sensed and other earth observation data, and for delivering impact on the many essential earth monitoring tasks. An emerging trend posits using the outputs of pre-trained models as 'embeddings' which summarize high dimensional data to be used for generic tasks such as similarity search and content-specific queries. However, most EOFM models are trained only on single modalities of data and then applied or benchmarked by matching bands across different modalities. It is not clear from existing work what impact diverse sensor architectures have on the internal representations of the present suite of EOFMs. We show in this work that the representation space of EOFMs is highly sensitive to sensor architecture and that understanding this difference gives a vital perspective on the pitfalls of current EOFM design and signals for how to move forward as model developers, users, and a community guided by robust remote-sensing science.


2509.26169 2026-06-03 cs.LG

Alignment-Aware Decoding

对齐感知解码

Frédéric Berdoz, Luca A. Lanzendörfer, René Caky, Roger Wattenhofer

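Affiliations: EPFL, Switzerland

AI summary: Proposes Alignment-Aware Decoding (AAD), an inference-time method for strengthening model alignment that can be interpreted as implicit reward optimization. It needs no additional training, outperforms strong baselines across multiple benchmarks and model scales, and can generate synthetic data to improve alignment in data-constrained settings.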

Comments Accepted at ICML 2026

2509.22854 2026-06-03 cs.CL

Train Once, Reuse Everywhere: Generalizable Implicit In-Context Learning by Routing Attention

一次训练，随处重用：通过路由注意力实现可泛化的隐式上下文学习

Jiaqian Li, Yanshu Li, Ligong Han, Ruixiang Tang, Wenya Wang

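Affiliations: University of Science and Technology of China

AI summary: Proposes In-Context Routing (ICR), which captures generalizable in-context learning patterns at the attention-logits level and modulates attention logits with a learnable input-conditioned router. The result is an efficient train-once-and-reuse framework that outperforms existing implicit ICL methods on 12 datasets and generalizes strongly.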

Comments ICML 2026 Camera-ready

详情

AI中文摘要

隐式上下文学习（ICL）作为一种新兴的有前景范式，在大语言模型（LLMs）的表示空间中模拟ICL行为，旨在以零样本成本获得少样本性能。然而，现有方法主要依赖于将偏移向量注入残差流，这些向量通常从标注示例或任务特定对齐中构建。这种设计未能充分利用ICL背后的结构机制，且泛化能力有限。为了解决这个问题，我们提出了In-Context Routing (ICR)，一种新颖的隐式ICL方法，在注意力logits层面捕获和利用可泛化的ICL模式。它提取ICL过程中出现的可重用结构方向，并采用可学习的输入条件路由器相应地调制注意力logits，从而实现高效的一次训练多次重用框架。我们在涵盖不同领域和多个LLM的12个真实世界数据集上评估了ICR。结果表明，ICR一致优于需要任务特定检索或训练的现有隐式ICL方法，同时在它们难以处理的域外任务上展现出稳健的泛化能力。这些发现将ICR定位为推动ICL实际价值边界的方案。代码可在https://github.com/Lijiaqian1/In-Context-Routing.git获取。

英文摘要

Implicit in-context learning (ICL) has newly emerged as a promising paradigm that simulates ICL behaviors in the representation space of large language models (LLMs), aiming to attain few-shot performance at zero-shot cost. However, existing approaches largely rely on injecting shift vectors into residual flows, which are typically constructed from labeled demonstrations or task-specific alignment. Such designs fall short of utilizing the structural mechanisms underlying ICL and suffer from limited generalizability. To address this, we propose In-Context Routing (ICR), a novel implicit ICL method that captures and utilizes generalizable ICL patterns at the attention logits level. It extracts reusable structural directions that emerge during ICL and employs a learnable input-conditioned router to modulate attention logits accordingly, enabling an efficient train-once-and-reuse framework. We evaluate ICR on 12 real-world datasets spanning diverse domains and multiple LLMs. The results show that ICR consistently outperforms existing implicit ICL methods that require task-specific retrieval or training, while demonstrating robust generalization to out-of-domain tasks where they struggle. These findings position ICR to push the boundary of the practical value of ICL. The code is available at https://github.com/Lijiaqian1/In-Context-Routing.git.


2509.22468 2026-06-03 cs.LG cs.AI

Learning the Neighborhood: Contrast-Free Multimodal Self-Supervised Molecular Graph Pretraining

学习邻域：无对比的多模态自监督分子图预训练

Boshra Ariguib, Mathias Niepert, Andrei Manolache

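Affiliations: University of Tübingen

AI summary: Proposes the C-FREE framework, which predicts subgraph embeddings from their complementary neighborhoods and fuses 2D topology with 3D conformer information for contrast-free, negative-free multimodal self-supervised molecular graph pretraining, achieving state-of-the-art results on MoleculeNet.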

Comments Accepted at ICML 2026

详情

AI中文摘要

高质量的分子表示对于性质预测和分子设计至关重要，然而大型标注数据集仍然稀缺。尽管分子图上的自监督预训练已显示出潜力，但许多现有方法要么依赖于手工数据增强或复杂的生成目标，要么仅利用2D拓扑，导致宝贵的3D结构信息未被充分利用。为弥补这一空白，我们引入了C-FREE（基于自我网络的无需对比的表示学习），一个将2D图与3D构象集成在一起的简单框架。C-FREE通过从潜在空间中互补邻域预测子图嵌入来学习分子表示，使用固定半径的自我网络作为不同构象之间的建模单元。这种设计使我们能够在混合图神经网络（GNN）-Transformer骨干中整合几何和拓扑信息，无需负样本、位置编码或昂贵的预处理。在提供丰富3D构象多样性的GEOM数据集上进行预训练后，C-FREE在MoleculeNet上取得了最先进的结果，超越了对比、生成和其他多模态自监督方法。在具有不同规模和分子类型的数据集上进行微调进一步表明，预训练能有效迁移到新的化学领域，突显了3D信息分子表示的重要性。

英文摘要

High-quality molecular representations are essential for property prediction and molecular design, yet large labeled datasets remain scarce. While self-supervised pretraining on molecular graphs has shown promise, many existing approaches either depend on hand-crafted augmentations or complex generative objectives, and often rely solely on 2D topology, leaving valuable 3D structural information underutilized. To address this gap, we introduce C-FREE (Contrast-Free Representation learning on Ego-nets), a simple framework that integrates 2D graphs with ensembles of 3D conformers. C-FREE learns molecular representations by predicting subgraph embeddings from their complementary neighborhoods in the latent space, using fixed-radius ego-nets as modeling units across different conformers. This design allows us to integrate both geometric and topological information within a hybrid Graph Neural Network (GNN)-Transformer backbone, without negatives, positional encodings, or expensive pre-processing. Pretraining on the GEOM dataset, which provides rich 3D conformational diversity, C-FREE achieves state-of-the-art results on MoleculeNet, surpassing contrastive, generative, and other multimodal self-supervised methods. Fine-tuning across datasets with diverse sizes and molecule types further demonstrates that pretraining transfers effectively to new chemical domains, highlighting the importance of 3D-informed molecular representations.


2505.17659 2026-06-03 cs.RO cs.CV

Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling

Plan-R1：安全且可行的轨迹规划作为语言建模

Xiaolong Tang, Meina Kan, Shiguang Shan, Xilin Chen

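Affiliations: Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences

AI summary: Proposes Plan-R1, a two-stage trajectory planning framework that decouples principle alignment from behavior learning and combines rule-based rewards with Variance-Decoupled GRPO, substantially improving the safety and feasibility of autonomous driving planning.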

Comments Accepted by ICLR2026

详情

AI中文摘要

安全且可行的轨迹规划对于现实世界的自动驾驶系统至关重要。然而，现有的基于学习的规划器严重依赖专家演示，这不仅缺乏明确的安全意识，还可能继承次优人类驾驶数据中的不良行为（如超速）。受大型语言模型成功的启发，我们提出了Plan-R1，一种两阶段轨迹规划框架，将原则对齐与行为学习解耦。在第一阶段，通用轨迹预测器在专家数据上进行预训练，以捕获多样化的、类人的驾驶行为。在第二阶段，使用基于规则的奖励通过组相对策略优化（GRPO）对模型进行微调，明确地将自我规划与安全、舒适和交通规则遵守等原则对齐。这种两阶段范式保留了类人行为，同时增强了安全意识并丢弃了演示中的不良模式。此外，我们识别了直接应用GRPO到规划的一个关键限制：组级归一化消除了跨组的尺度差异，导致罕见、高方差的安全违规组与大量低方差的安全组具有相似的优势，从而抑制了对安全关键目标的优化。为解决此问题，我们提出了方差解耦GRPO（VD-GRPO），用中心化和固定缩放替代归一化以保留绝对奖励幅度，确保安全关键目标在整个训练过程中保持主导地位。在nuPlan基准上的实验表明，Plan-R1显著提高了规划的安全性和可行性，达到了最先进的性能，特别是在现实反应性设置中。我们的代码可在https://github.com/XiaolongTang23/Plan-R1获取。

英文摘要

Safe and feasible trajectory planning is critical for real-world autonomous driving systems. However, existing learning-based planners rely heavily on expert demonstrations, which not only lack explicit safety awareness but also risk inheriting undesirable behaviors such as speeding from suboptimal human driving data. Inspired by the success of large language models, we propose Plan-R1, a two-stage trajectory planning framework that decouples principle alignment from behavior learning. In the first stage, a general trajectory predictor is pre-trained on expert data to capture diverse, human-like driving behaviors. In the second stage, the model is fine-tuned with rule-based rewards using Group Relative Policy Optimization (GRPO), explicitly aligning ego planning with principles such as safety, comfort, and traffic rule compliance. This two-stage paradigm retains human-like behaviors while enhancing safety awareness and discarding undesirable patterns from demonstrations. Furthermore, we identify a key limitation of directly applying GRPO to planning: group-wise normalization erases cross-group scale differences, causing rare, high-variance safety-violation groups to have similar advantages as abundant low-variance safe groups, thereby suppressing optimization for safety-critical objectives. To address this, we propose Variance-Decoupled GRPO (VD-GRPO), which replaces normalization with centering and fixed scaling to preserve absolute reward magnitudes, ensuring that safety-critical objectives remain dominant throughout training. Experiments on the nuPlan benchmark demonstrate that Plan-R1 significantly improves planning safety and feasibility, achieving state-of-the-art performance, particularly in realistic reactive settings. Our code is available at https://github.com/XiaolongTang23/Plan-R1.
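The contrast between group-wise normalization and the centering-plus-fixed-scaling idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the exact scaling rule, so the constant `scale`, the function names, and the toy reward values below are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def grpo_advantages(rewards, eps=1e-8):
    """Group-wise normalization: (r - group mean) / group std."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def vd_grpo_advantages(rewards, scale=1.0):
    """Centering with a fixed scale (assumed constant), so absolute
    reward magnitudes are not divided away by the per-group std."""
    mu = mean(rewards)
    return [(r - mu) / scale for r in rewards]


def peak(values):
    """Largest absolute advantage in a group."""
    return max(abs(v) for v in values)


# Hypothetical reward groups: one low-variance safe group and one group
# containing a single large safety violation.
safe_group = [1.00, 0.98, 1.02, 1.00]
violation_group = [1.0, -9.0, 1.0, 1.0]

grpo_safe = peak(grpo_advantages(safe_group))
grpo_violation = peak(grpo_advantages(violation_group))
vd_safe = peak(vd_grpo_advantages(safe_group))
vd_violation = peak(vd_grpo_advantages(violation_group))

print("normalized peaks:", grpo_safe, grpo_violation)
print("centered peaks:  ", vd_safe, vd_violation)
```

With normalization, the two groups end up with peak advantages of the same order even though their raw reward spreads differ by a factor of several hundred. With centering and a fixed scale, the violation group keeps a much larger advantage magnitude.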


2502.02748 2026-06-03 cs.LG cond-mat.mtrl-sci

ReciNet: Reciprocal Space-Aware Long-Range Modeling for Crystalline Property Prediction

ReciNet: 用于晶体性质预测的倒易空间感知长程建模

Jianan Nie, Peiyao Xiao, Kaiyi Ji, Peng Gao

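Affiliations: Department of Computer Science, Virginia Tech; Department of Computer Science and Engineering, University at Buffalo

AI summary: Proposes ReciNet, a reciprocal space-based geometry network that combines geometric GNNs with reciprocal blocks through a Fourier series representation and learnable filters to model short-range and long-range interactions in crystals, achieving strong predictive accuracy on multiple benchmarks.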

详情

AI中文摘要

从晶体结构预测其性质是材料科学中一项基础但具有挑战性的任务。与分子不同，晶体结构表现出原子的无限周期排列，需要能够有效捕捉局部和全局信息的方法。然而，当前的工作在捕捉周期结构内的长程相互作用方面存在不足。为了解决这个问题，我们利用倒易空间（周期晶体的自然域），并从分数坐标和倒易格矢出发，使用可学习滤波器构建傅里叶级数表示。在此基础上，我们引入了基于倒易空间的几何网络（ReciNet），这是一种新颖的架构，它集成了几何GNN和倒易模块来建模短程和长程相互作用。在综合基准JARVIS、Materials Project和MatBench上的实验表明，ReciNet在一系列晶体性质预测任务中取得了出色的预测精度。此外，我们探索了使用混合专家模型进行多性质预测的模型扩展，该扩展展示了高计算效率，并揭示了相关性质之间的正迁移。这些发现凸显了我们的模型作为可扩展且准确的晶体性质预测解决方案的潜力。

英文摘要

Predicting properties of crystals from their structures is a fundamental yet challenging task in materials science. Unlike molecules, crystal structures exhibit infinite periodic arrangements of atoms, requiring methods capable of capturing both local and global information effectively. However, current works fall short of capturing long-range interactions within periodic structures. To address this, we leverage reciprocal space, the natural domain for periodic crystals, and construct a Fourier series representation from fractional coordinates and reciprocal lattice vectors with learnable filters. Building on this, we introduce the reciprocal space-based geometry network (ReciNet), a novel architecture that integrates geometric GNNs and reciprocal blocks to model short-range and long-range interactions. Experiments on comprehensive benchmarks JARVIS, Materials Project, and MatBench demonstrate that ReciNet achieves outstanding predictive accuracy across a range of crystal property prediction tasks. Additionally, we explore a model extension for multi-property prediction with the mixture-of-experts, which demonstrates high computational efficiency and reveals positive transfer between correlated properties. These findings highlight the potential of our model as a scalable and accurate solution for crystal property prediction.
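To make the reciprocal-space idea concrete, the sketch below computes structure-factor-style Fourier coefficients from fractional coordinates. It relies on the standard identity that for a reciprocal lattice vector G with integer indices m and an atom at fractional coordinate f, G·r = 2π m·f. It is only an illustration of why such features respect periodicity, not the ReciNet architecture: the function names, the per-atom weights, and the Gaussian stand-in for a learnable filter are assumptions.

```python
import cmath
import math
from itertools import product


def fourier_coefficients(frac_coords, weights, max_index=1):
    """Return {m: S(m)} with S(m) = sum_j w_j * exp(-2*pi*i * m . f_j)."""
    coeffs = {}
    for m in product(range(-max_index, max_index + 1), repeat=3):
        total = 0j
        for f, w in zip(frac_coords, weights):
            phase = 2.0 * math.pi * sum(mi * fi for mi, fi in zip(m, f))
            total += w * cmath.exp(-1j * phase)
        coeffs[m] = total
    return coeffs


def apply_filter(coeffs, filt):
    """Scale each mode by filt(m); a placeholder for a learnable filter."""
    return {m: filt(m) * s for m, s in coeffs.items()}


# Hypothetical two-atom cell, given in fractional coordinates.
coords = [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5)]
atom_weights = [1.0, 2.0]

# Moving atoms by whole lattice vectors must not change the features.
shifted = [(x + 1.0, y - 2.0, z + 3.0) for x, y, z in coords]

base = fourier_coefficients(coords, atom_weights)
moved = fourier_coefficients(shifted, atom_weights)
max_diff = max(abs(base[m] - moved[m]) for m in base)

# Assumed Gaussian decay over the integer indices, for illustration only.
smoothed = apply_filter(base, lambda m: math.exp(-0.5 * sum(i * i for i in m)))
print("max change under lattice translation:", max_diff)
```

Because each coefficient sums over all atoms in the cell, every mode carries global information about the periodic structure, which is the property the abstract appeals to for long-range interactions.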


2509.20623 2026-06-03 cs.RO

Latent Activation Editing: Inference-Time Refinement of Learned Policies for Safer Multirobot Navigation

潜在激活编辑：基于推理时策略精炼的安全多机器人导航

Satyajeet Das, Darren Chiu, Zhehui Huang, Lars Lindemann, Gaurav S. Sukhatme

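Affiliations: Department of Computer Science, University of Southern California; Automatic Control Laboratory, ETH Zürich

AI summary: Proposes the Latent Activation Editing (LAE) framework, which detects and edits intermediate activations online at inference time to lower the collision rate of pretrained policies without modifying weights or architecture, achieving nearly 90% fewer collisions in quadrotor navigation.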

详情

AI中文摘要

强化学习在协调和导航多个四旋翼等复杂领域取得了显著进展。然而，即使经过良好训练的策略在障碍物密集的环境中仍然容易发生碰撞。通过重新训练或微调来解决这些罕见但关键的安全故障成本高昂，并且有损于先前学到的技能。受大语言模型中的激活引导和计算机视觉中的潜在编辑启发，我们引入了一个推理时潜在激活编辑（LAE）框架，该框架在不修改权重或架构的情况下精炼预训练策略的行为。该框架分两个阶段运行：（i）在线分类器监控中间激活以检测与不良行为相关的状态，（ii）激活编辑模块选择性地修改被标记的激活，将策略转向更安全的区域。在这项工作中，我们专注于提高多四旋翼导航的安全性。我们假设放大策略内部的风险感知可以诱导更安全的行为。我们通过训练一个潜在碰撞世界模型来实例化这一想法，该模型预测未来的碰撞前激活，从而促使更早和更谨慎的避碰响应。大量的仿真和真实Crazyflie实验表明，与未编辑的基线相比，LAE实现了统计上显著的碰撞减少（累计碰撞减少近90%），并显著增加了无碰撞轨迹的比例，同时保持了任务完成。更广泛地说，我们的结果确立了LAE作为一种轻量级范式，可在资源受限的硬件上对学习后的机器人策略进行部署后精炼。

英文摘要

Reinforcement learning has enabled significant progress in complex domains such as coordinating and navigating multiple quadrotors. However, even well-trained policies remain vulnerable to collisions in obstacle-rich environments. Addressing these infrequent but critical safety failures through retraining or fine-tuning is costly and risks degrading previously learned skills. Inspired by activation steering in large language models and latent editing in computer vision, we introduce a framework for inference-time Latent Activation Editing (LAE) that refines the behavior of pre-trained policies without modifying their weights or architecture. The framework operates in two stages: (i) an online classifier monitors intermediate activations to detect states associated with undesired behaviors, and (ii) an activation editing module that selectively modifies flagged activations to shift the policy towards safer regimes. In this work, we focus on improving safety in multi-quadrotor navigation. We hypothesize that amplifying a policy's internal perception of risk can induce safer behaviors. We instantiate this idea through a latent collision world model trained to predict future pre-collision activations, thereby prompting earlier and more cautious avoidance responses. Extensive simulations and real-world Crazyflie experiments demonstrate that LAE achieves statistically significant reduction in collisions (nearly 90% fewer cumulative collisions compared to the unedited baseline) and substantially increases the fraction of collision-free trajectories, while preserving task completion. More broadly, our results establish LAE as a lightweight paradigm, feasible on resource-constrained hardware, for post-deployment refinement of learned robot policies.


2509.19305 2026-06-03 cs.LG cs.AI eess.SP

Wavelet Fourier Diffuser: Frequency-Aware Diffusion Model for Reinforcement Learning

小波傅里叶扩散器：用于强化学习的频率感知扩散模型

Yifu Luo, Yongzhe Chang, Xueqian Wang

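Affiliations: Tsinghua University, China

AI summary: Existing diffusion models for offline reinforcement learning overlook frequency-domain features, which causes frequency shift. WFDiffuser addresses this by decomposing trajectories with the Discrete Wavelet Transform and strengthening frequency-domain modeling with the Short-Time Fourier Transform and cross attention. On the D4RL benchmark it mitigates frequency shift and improves trajectory stability and decision-making performance.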

Comments IJCNN 2025

Journal ref IJCNN 2025

详情

AI中文摘要

扩散概率模型通过直接建模轨迹序列，在离线强化学习中展现出显著潜力。然而，现有方法主要关注时域特征而忽略频域特征，根据我们的观察，这会导致频率偏移和性能下降。在本文中，我们从频域的新视角研究强化学习问题。我们首先观察到，仅使用时域的方法会无意中引入频域低频分量的偏移，从而导致轨迹不稳定和性能下降。为了解决这个问题，我们提出了小波傅里叶扩散器（WFDiffuser），一种新颖的基于扩散的强化学习框架，它集成了离散小波变换将轨迹分解为低频和高频分量。为了进一步增强每个分量的扩散建模，WFDiffuser采用短时傅里叶变换和交叉注意力机制来提取频域特征并促进跨频率交互。在D4RL基准上的大量实验结果表明，WFDiffuser有效缓解了频率偏移，从而产生更平滑、更稳定的轨迹，并相比现有方法提高了决策性能。

英文摘要

Diffusion probability models have shown significant promise in offline reinforcement learning by directly modeling trajectory sequences. However, existing approaches primarily focus on time-domain features while overlooking frequency-domain features, leading to frequency shift and degraded performance according to our observation. In this paper, we investigate the RL problem from a new perspective of the frequency domain. We first observe that time-domain-only approaches inadvertently introduce shifts in the low-frequency components of the frequency domain, which results in trajectory instability and degraded performance. To address this issue, we propose Wavelet Fourier Diffuser (WFDiffuser), a novel diffusion-based RL framework that integrates Discrete Wavelet Transform to decompose trajectories into low- and high-frequency components. To further enhance diffusion modeling for each component, WFDiffuser employs Short-Time Fourier Transform and cross attention mechanisms to extract frequency-domain features and facilitate cross-frequency interaction. Extensive experiment results on the D4RL benchmark demonstrate that WFDiffuser effectively mitigates frequency shift, leading to smoother, more stable trajectories and improved decision-making performance over existing methods.
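As a small illustration of the decomposition step mentioned in the abstract, the sketch below applies a one-level Discrete Wavelet Transform to a one-dimensional trajectory signal, splitting it into a low-frequency (approximation) part and a high-frequency (detail) part. The abstract does not say which wavelet is used, so the Haar wavelet here is an assumption, as are the function names and the toy trajectory.

```python
import math

_S = 1.0 / math.sqrt(2.0)


def haar_dwt(signal):
    """One-level Haar DWT of an even-length sequence.

    Returns (approx, detail): scaled pairwise sums (low frequency)
    and scaled pairwise differences (high frequency).
    """
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    pairs = [(signal[i], signal[i + 1]) for i in range(0, len(signal), 2)]
    approx = [(a + b) * _S for a, b in pairs]
    detail = [(a - b) * _S for a, b in pairs]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt: rebuild the original sequence."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * _S, (a - d) * _S])
    return out


# Hypothetical single state dimension sampled along a trajectory.
trajectory = [0.0, 0.1, 0.4, 0.5, 0.9, 1.0, 1.2, 1.1]
low, high = haar_dwt(trajectory)
rebuilt = haar_idwt(low, high)
print("low-frequency part: ", low)
print("high-frequency part:", high)
```

The transform is invertible, so modeling the two components separately loses no information about the original trajectory.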


2509.18068 2026-06-03 cs.RO eess.SP

RadarSFD: Single-Frame Diffusion with Pretrained Priors for Radar Point Clouds

RadarSFD：基于预训练先验的单帧扩散用于雷达点云

Bin Zhao, Nakul Garg

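Affiliations: Rice University

AI summary: Proposes RadarSFD, a conditional latent diffusion framework that uses geometric priors from a pretrained monocular depth estimator to reconstruct dense LiDAR-like point clouds from a single radar frame, without synthetic aperture or multi-frame aggregation.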

Comments Accepted to the 2026 IEEE International Conference on Robotics and Automation (ICRA 2026). Project page: https://phi-lab-rice.github.io/RadarSFD/

详情

AI中文摘要

毫米波雷达在雾、烟、尘和低光环境下提供稳健的感知，使其适用于尺寸、重量和功率受限的机器人平台。现有的雷达成像方法通常依赖合成孔径或多帧聚合来提高分辨率，这对于小型空中、检测或可穿戴系统不切实际。我们提出RadarSFD，一种条件潜在扩散框架，无需运动或SAR即可从单帧雷达重建密集的LiDAR-like点云。我们的方法将预训练单目深度估计器的几何先验转移到扩散骨干中，通过通道级潜在拼接将其锚定到雷达输入，并使用结合潜在空间和像素空间损失的双空间目标进行正则化。在RadarHD基准上，RadarSFD相对于基线模型实现了最先进的性能。定性结果显示恢复了精细的墙壁和狭窄的间隙，跨新环境的实验证实了强大的泛化能力。消融研究强调了预训练初始化、雷达BEV条件和双空间损失的重要性。这些结果共同为紧凑型机器人系统中的密集点云感知建立了一个实用的单帧、无SAR毫米波雷达流水线。

英文摘要

Millimeter-wave radar provides robust perception in fog, smoke, dust, and low light, making it attractive for size-, weight-, and power-constrained robotic platforms. Existing radar imaging methods typically rely on synthetic aperture or multi-frame aggregation to improve resolution, which is impractical for small aerial, inspection, or wearable systems. We present RadarSFD, a conditional latent diffusion framework that reconstructs dense LiDAR-like point clouds from a single radar frame without motion or SAR. Our approach transfers geometric priors from a pretrained monocular depth estimator into the diffusion backbone, anchors them to radar inputs via channel-wise latent concatenation, and regularizes outputs with a dual-space objective combining latent and pixel-space losses. On the RadarHD benchmark, RadarSFD achieves state-of-the-art performance against baseline models. Qualitative results show recovery of fine walls and narrow gaps, and experiments across new environments confirm strong generalization. Ablation studies highlight the importance of pretrained initialization, radar BEV conditioning, and the dual-space loss. Together, these results establish a practical single-frame, no-SAR mmWave radar pipeline for dense point cloud perception in compact robotic systems.


2509.14636 2026-06-03 cs.RO

BEV-ODOM2: Enhanced BEV-based Monocular Visual Odometry with PV-BEV Fusion and Dense Flow Supervision for Ground Robots

BEV-ODOM2: 基于PV-BEV融合与密集光流监督的增强型BEV单目视觉里程计用于地面机器人

Yufei Wei, Chenxiao Hu, Wangtao Lu, Sha Lu, Yuxiang Cui, Fuzhang Han, Rong Xiong, Yue Wang

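Affiliations: Tsinghua University

AI summary: Existing BEV methods suffer from sparse supervision in pose-only training and information loss during perspective projection. The BEV-ODOM2 framework addresses both with dense BEV optical flow supervision and PV-BEV fusion, achieving a 40% RTE improvement across four datasets and supporting real-time edge deployment.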

详情

AI中文摘要

尺度一致的自我运动估计是自主地面机器人的基础。鸟瞰图（BEV）表示通过提供度量尺度的平面工作空间，自然地解决了单目视觉里程计（MVO）的尺度漂移问题，使得6自由度自我运动简化为更鲁棒的3自由度模型。然而，现有的基于BEV的方法存在两个关键限制：仅从位姿训练得到的稀疏监督信号，以及透视到BEV投影过程中的信息丢失。我们提出了BEV-ODOM2，一个增强框架，无需额外标注即可解决这两个限制。我们的方法引入了（1）直接从3自由度位姿真值构建的密集BEV光流监督，用于像素级指导，以及（2）透视视图（PV）-BEV融合，在投影前计算相关体积以保留6自由度运动线索。增强的旋转采样策略进一步在训练中平衡了不同的运动模式。我们在四个不同空间尺度的数据集上进行了评估：KITTI、Oxford、NCLT和我们新收集的ZJH-VO基准。BEV-ODOM2相比之前的BEV方法实现了40%的RTE提升，在NVIDIA Jetson AGX Orin上的实时推理确认了边缘部署的可行性。源代码和ZJH-VO数据集已公开发布，以促进未来研究。

英文摘要

Scale-consistent ego-motion estimation is fundamental for autonomous ground robots. Bird's-Eye-View (BEV) representation naturally addresses the scale drift problem of monocular visual odometry (MVO) by providing a metric-scaled planar workspace, enabling the simplification of 6-DoF ego-motion to a more robust 3-DoF model. However, existing BEV-based methods suffer from two key limitations: sparse supervision signals from pose-only training, and information loss during perspective-to-BEV projection. We present BEV-ODOM2, an enhanced framework that addresses both limitations without requiring additional annotations. Our approach introduces (1) dense BEV optical flow supervision constructed directly from 3-DoF pose ground truth for pixel-level guidance, and (2) Perspective View (PV)-BEV fusion that computes correlation volumes before projection to preserve 6-DoF motion cues. An enhanced rotation sampling strategy further balances diverse motion patterns during training. We evaluate on four datasets with varied spatial scales: KITTI, Oxford, NCLT, and our newly collected ZJH-VO benchmark. BEV-ODOM2 achieves a 40% RTE improvement over prior BEV-based methods, with real-time inference on an NVIDIA Jetson AGX Orin confirming edge deployment feasibility. The source code and the ZJH-VO dataset are publicly released to facilitate future research.
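One way to see how a dense flow field can be built from a 3-DoF pose alone is the rigid-motion sketch below. For a static ground point p in the ego frame at time t and an ego motion (tx, ty, yaw), the same point in the next ego frame is R(yaw)^T (p - t), and the flow is the difference. The grid layout, axis and sign conventions, resolution, and function name are assumptions for illustration and may differ from the released BEV-ODOM2 code.

```python
import math


def bev_flow_from_pose(tx, ty, yaw, grid_size=4, resolution=0.5):
    """Per-cell flow (in cells) of static ground under a 3-DoF ego motion.

    (tx, ty, yaw) is the pose of the next ego frame expressed in the
    current ego frame; resolution is metres per BEV cell (assumed).
    """
    c, s = math.cos(yaw), math.sin(yaw)
    half = (grid_size - 1) / 2.0
    flow = []
    for row in range(grid_size):
        flow_row = []
        for col in range(grid_size):
            # Metric position of the cell centre in the current ego frame.
            x = (col - half) * resolution
            y = (row - half) * resolution
            # Same static point in the next ego frame: R(yaw)^T (p - t).
            dx, dy = x - tx, y - ty
            x_new = c * dx + s * dy
            y_new = -s * dx + c * dy
            flow_row.append(((x_new - x) / resolution,
                             (y_new - y) / resolution))
        flow.append(flow_row)
    return flow


still = bev_flow_from_pose(0.0, 0.0, 0.0)
forward = bev_flow_from_pose(1.0, 0.0, 0.0)
turning = bev_flow_from_pose(0.0, 0.0, math.radians(10.0), grid_size=5)
print("flow at one cell when driving 1 m forward:", forward[0][0])
```

Every cell receives a target, which is what makes this supervision dense compared with a single pose label per frame pair.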


2507.09105 2026-06-03 cs.CV

Hybrid Autoregressive-Diffusion Model for Real-Time Sign Language Production

混合自回归-扩散模型用于实时手语生成

Maoxiao Ye, Xinfeng Ye, Mano Manoharan

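Affiliations: University of Auckland

AI summary: Proposes HybridSign, a hybrid autoregressive-diffusion model that combines causal frame generation with flow-based diffusion refinement for low-latency, high-quality sign language production, achieving the best quality-efficiency trade-off on PHOENIX14T and How2Sign.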

Comments Accepted at ACL 2026

详情

AI中文摘要

早期的手语生成（SLP）模型通常依赖于自回归解码，这自然保持了时间因果性，但在推理时会出现错误累积。最近的基于扩散的方法通过迭代去噪提高了生成质量，但其序列级精炼过程引入了大量延迟。为了解决这一权衡问题，我们提出了HybridSign，一种用于低延迟手语生成的混合自回归-扩散模型，它结合了因果帧生成与基于流的扩散精炼。多尺度姿态表示模块捕获细粒度发音特征，而置信度感知因果注意力机制利用关节级置信度分数提高在噪声2D姿态观测下的鲁棒性。在PHOENIX14T和How2Sign上的实验表明，HybridSign在比较的基线中始终实现了最佳的质量-效率权衡。在How2Sign测试集上，在60帧评估协议下，它达到了BLEU-1/4分数30.12/6.48和DTW 3.89，同时将首帧时间减少到5.90秒，吞吐量提高到10.17 FPS。

英文摘要

Earlier Sign Language Production (SLP) models typically relied on autoregressive decoding, which naturally preserves temporal causality but suffers from error accumulation at inference time. More recent diffusion-based approaches improve generation quality through iterative denoising, yet their sequence-level refinement process introduces substantial latency. To address this trade-off, we propose HybridSign, a hybrid autoregressive-diffusion model for low-latency sign language production that combines causal frame generation with flow-based diffusion refinement. A Multi-Scale Pose Representation module captures fine-grained articulator features, while a Confidence-Aware Causal Attention mechanism leverages joint-level confidence scores to improve robustness under noisy 2D pose observations. Experiments on PHOENIX14T and How2Sign show that HybridSign consistently achieves the best quality-efficiency trade-off among the compared baselines. On the How2Sign test split, it reaches BLEU-1/4 scores of 30.12/6.48 and DTW of 3.89, while reducing time-to-first-frame to 5.90s and increasing throughput to 10.17 FPS under a 60-frame evaluation protocol.


2507.23035 2026-06-03 cs.LG cs.AR

OASIS: Outlier-Aware LUT-Based GEMM with Dual-Side Quantization for LLM Inference Acceleration

OASIS：基于查找表的离群点感知双端量化LLM推理加速通用矩阵乘法

Xueying Wu, Baijun Zhou, Zhihui Gao, Yuzhe Fu, Qilin Zheng, Yintao He, Hai Li

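Affiliations: National University of Singapore

AI summary: Proposes the OASIS architecture, which uses pre-computed Cartesian Product lookup tables for efficient general matrix multiplication between non-uniformly quantized weights and activations. With an outlier-aware quantization scheme and Orizuru, a real-time outlier detection engine, it markedly improves inference speed and energy efficiency while preserving accuracy.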

详情

AI中文摘要

大型语言模型（LLM）在各种应用中展现了令人印象深刻的能力，但在推理过程中需要大量的内存和计算资源。现有的量化方法在效率和准确性之间存在权衡：仅权重量化（WOQ）引入了昂贵的反量化开销，而整数权重和激活量化（INT-WAQ）降低了精度并损害了模型质量。非均匀权重和激活量化（NU-WAQ）能更好地捕捉LLM权重和激活的非均匀分布，但仍与传统的低精度计算单元不兼容。本文提出了OASIS，一种基于查找表（LUT）的架构，能够在无需反量化的情况下实现非均匀量化权重和激活之间的高效通用矩阵乘法（GEMM）。OASIS采用预计算的笛卡尔积LUT，实现了LUT大小的64倍缩减，并相较于现有基于LUT的GEMM方法实现了1024倍的计算并行度提升。为了在激进的激活量化下保持精度，OASIS引入了一种离群点感知量化方案，同时进行基于LUT的GEMM和针对离群点的误差补偿。此外，我们设计了Orizuru，一种用于实时激活离群点检测的高效top-k检测引擎。根据广泛评估，与FP16基线相比，OASIS的平均精度下降仅为1.98%，比Atom低5.18%。在硬件方面，与FIGLUT加速器相比，OASIS实现了平均3.00倍的加速和1.44倍的能效提升。

英文摘要

Large language models (LLMs) have demonstrated impressive capabilities across a wide range of applications, but demand substantial memory and compute resources during inference. Existing quantization methods expose a trade-off between efficiency and accuracy: weight-only quantization (WOQ) incurs costly dequantization overheads, while integer weight-and-activation quantization (INT-WAQ) reduces precision and degrades model quality. Non-uniform weight-and-activation quantization (NU-WAQ) can better capture the non-uniform distributions of LLM weights and activations, yet remains incompatible with conventional low-precision compute units. This paper presents OASIS, a lookup table (LUT)-based architecture that enables efficient general matrix multiplication (GEMM) between non-uniformly quantized weights and activations without requiring dequantization. OASIS employs pre-computed Cartesian Product LUTs, achieving a 64x reduction in LUT size and enabling a 1024x higher computational parallelism over existing LUT-based GEMM methods. To preserve accuracy under aggressive activation quantization, OASIS introduces an outlier-aware quantization scheme with concurrent LUT-based GEMM and error compensation for outliers. Furthermore, we design Orizuru, an efficient top-k detection engine for real-time activation outlier identification. According to extensive evaluations, OASIS incurs an average accuracy drop of only 1.98% compared to the FP16 baseline, which is 5.18% lower than Atom. On the hardware side, OASIS achieves an average 3.00x speedup and a 1.44x energy efficiency improvement compared to the FIGLUT accelerator.
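The core idea of multiplying non-uniformly quantized operands without dequantizing them can be sketched in a few lines: precompute every product of a weight centroid and an activation centroid once, then replace each multiplication in the GEMM with a table lookup on the pair of indices. This is a software illustration only. It does not model the LUT size reduction, parallelism, outlier handling, or hardware design reported for OASIS, and the codebook values and function names are assumptions.

```python
def build_product_lut(weight_codebook, act_codebook):
    """Cartesian-product table: lut[i][j] = weight centroid i * activation centroid j."""
    return [[w * a for a in act_codebook] for w in weight_codebook]


def lut_gemm(w_idx, a_idx, lut):
    """Multiply an M x K weight-index matrix by a K x N activation-index
    matrix using lookups only (no dequantization, no multiplications)."""
    rows, inner, cols = len(w_idx), len(a_idx), len(a_idx[0])
    return [[sum(lut[w_idx[m][k]][a_idx[k][n]] for k in range(inner))
             for n in range(cols)]
            for m in range(rows)]


def dequantized_gemm(w_idx, a_idx, weight_codebook, act_codebook):
    """Reference path: look up centroids first, then multiply and add."""
    rows, inner, cols = len(w_idx), len(a_idx), len(a_idx[0])
    return [[sum(weight_codebook[w_idx[m][k]] * act_codebook[a_idx[k][n]]
                 for k in range(inner))
             for n in range(cols)]
            for m in range(rows)]


# Hypothetical 2-bit non-uniform codebooks (four centroids each).
weight_codebook = [-1.5, -0.4, 0.3, 1.1]
act_codebook = [0.0, 0.2, 0.9, 3.5]

w_idx = [[0, 3, 1], [2, 2, 0]]      # 2 x 3 matrix of weight indices
a_idx = [[1, 3], [0, 2], [3, 1]]    # 3 x 2 matrix of activation indices

lut = build_product_lut(weight_codebook, act_codebook)
via_lut = lut_gemm(w_idx, a_idx, lut)
reference = dequantized_gemm(w_idx, a_idx, weight_codebook, act_codebook)
print(via_lut)
```

The table has one entry per pair of centroids, so its size depends on the two bit widths rather than on the matrix dimensions.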


2509.03376 2026-06-03 cs.CV

Transformer-Guided Content-Adaptive Graph Learning for Hyperspectral Unmixing

Transformer引导的内容自适应图学习用于高光谱解混

Hui Chen, Liangyu Liu, Xianchao Xiu, Wanquan Liu

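Affiliations: School of Automation Engineering, Shanghai University of Electric Power; School of Mechatronic Engineering and Automation, Shanghai University; School of Intelligent Systems Engineering, Sun Yat-sen University

AI summary: Proposes the T-CAGU framework, which combines a transformer that captures global dependencies with a content-adaptive graph neural network that enhances local relationships. It learns the graph structure dynamically through multi-order propagation and adds a graph residual mechanism for hyperspectral image unmixing.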

详情

AI中文摘要

高光谱解混（HU）旨在将遥感图像中的每个混合像素分解为一组端元及其对应的丰度。尽管深度学习在该领域取得了显著进展，但大多数方法无法同时表征全局依赖和局部一致性，难以保持长程交互和边界细节。本文提出了一种新颖的Transformer引导的内容自适应图解混框架（T-CAGU），通过采用Transformer捕获全局依赖并引入内容自适应图神经网络增强局部关系，克服了这些挑战。与以往工作不同，T-CAGU集成多个传播阶次以动态学习图结构，确保对噪声的鲁棒性。此外，T-CAGU利用图残差机制保留全局信息并稳定训练。实验结果表明其优于最先进的方法。我们的代码可在https://github.com/xianchaoxiu/T-CAGU获取。

英文摘要

Hyperspectral unmixing (HU) targets to decompose each mixed pixel in remote sensing images into a set of endmembers and their corresponding abundances. Despite significant progress in this field using deep learning, most methods fail to simultaneously characterize global dependencies and local consistency, making it difficult to preserve both long-range interactions and boundary details. This letter proposes a novel transformer-guided content-adaptive graph unmixing framework (T-CAGU), which overcomes these challenges by employing a transformer to capture global dependencies and introducing a content-adaptive graph neural network to enhance local relationships. Unlike previous work, T-CAGU integrates multiple propagation orders to dynamically learn the graph structure, ensuring robustness against noise. Furthermore, T-CAGU leverages a graph residual mechanism to preserve global information and stabilize training. Experimental results demonstrate its superiority over the state-of-the-art methods. Our code is available at https://github.com/xianchaoxiu/T-CAGU.


2508.13174 2026-06-03 cs.AI cs.LG q-fin.CP stat.ML

AlphaEval: A Comprehensive and Efficient Evaluation Framework for Formula Alpha Mining

AlphaEval：一个全面高效的公式化Alpha挖掘评估框架

Hongjun Ding, Binqi Chen, Jinsheng Huang, Taian Guo, Zhengyang Mao, Guoyi Shao, Lutong Zou, Luchen Liu, Ming Zhang

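Affiliations: CUNY Baruch College; Peking University; Harvard University; Zhengren Research; Zhengren Quant

AI summary: Proposes the AlphaEval framework, which evaluates automated alpha mining models along five dimensions (predictive power, stability, robustness, financial logic, and diversity) in a unified, parallelizable, backtest-free way, achieving evaluation consistency comparable to backtesting with higher efficiency.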

Comments Accepted by KDD2026

详情

DOI: 10.1145/3770855.3817727

AI中文摘要

公式化Alpha挖掘从金融数据中生成预测信号，对量化投资至关重要。尽管遗传编程、强化学习和大语言模型等多种算法方法显著扩展了Alpha发现的能力，但系统评估仍是一个关键挑战。现有评估指标主要包括回测和基于相关性的度量。回测计算密集、本质上是顺序的，并且对特定策略参数敏感。基于相关性的度量虽然高效，但仅评估预测能力，忽略了时间稳定性、鲁棒性、多样性和可解释性等其他关键属性。此外，大多数现有Alpha挖掘模型的闭源性质阻碍了可重复性并减缓了该领域的进展。为解决这些问题，我们提出了AlphaEval，一个统一、可并行化且无需回测的自动Alpha挖掘模型评估框架。AlphaEval沿五个互补维度评估生成Alpha的整体质量：预测能力、稳定性、对市场扰动的鲁棒性、金融逻辑和多样性。跨代表性Alpha挖掘算法的广泛实验表明，AlphaEval实现了与全面回测相当的评估一致性，同时提供更全面的洞察和更高的效率。此外，与传统的单一指标筛选方法相比，AlphaEval能有效识别更优的Alpha。所有实现和评估工具均已开源，以促进可重复性和社区参与。

英文摘要

Formula alpha mining, which generates predictive signals from financial data, is critical for quantitative investment. Although various algorithmic approaches, such as genetic programming, reinforcement learning, and large language models, have significantly expanded the capacity for alpha discovery, systematic evaluation remains a key challenge. Existing evaluation metrics predominantly include backtesting and correlation-based measures. Backtesting is computationally intensive, inherently sequential, and sensitive to specific strategy parameters. Correlation-based metrics, though efficient, assess only predictive ability and overlook other crucial properties such as temporal stability, robustness, diversity, and interpretability. Additionally, the closed-source nature of most existing alpha mining models hinders reproducibility and slows progress in this field. To address these issues, we propose AlphaEval, a unified, parallelizable, and backtest-free evaluation framework for automated alpha mining models. AlphaEval assesses the overall quality of generated alphas along five complementary dimensions: predictive power, stability, robustness to market perturbations, financial logic, and diversity. Extensive experiments across representative alpha mining algorithms demonstrate that AlphaEval achieves evaluation consistency comparable to comprehensive backtesting, while providing more comprehensive insights and higher efficiency. Furthermore, AlphaEval effectively identifies superior alphas compared to traditional single-metric screening approaches. All implementations and evaluation tools are open-sourced to promote reproducibility and community engagement.


2508.09606 2026-06-03 cs.RO cs.SY eess.SY

BEAVR: Bimanual, multi-Embodiment, Accessible, Virtual Reality Teleoperation System for Robots

BEAVR：用于机器人的双手、多形态、可访问的虚拟现实遥操作系统

Alejandro Posadas-Nava, Alejandro Carrasco, Richard Linares

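Affiliations: Department of Aeronautics and Astronautics, Massachusetts Institute of Technology

AI summary: Proposes BEAVR, an open-source bimanual, multi-embodiment VR teleoperation system whose zero-copy streaming architecture and asynchronous "think-act" control loop enable low-latency, real-time multi-robot control and data recording, with compatibility across multiple visuomotor policies.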

Comments Accepted for presentation on ICCR Kyoto 2025

详情

DOI: 10.1109/ICCR67607.2025.11372114

AI中文摘要

BEAVR是一个用于机器人的开源、双手、多形态虚拟现实（VR）遥操作系统，旨在统一异构机器人平台上的实时控制、数据记录和策略学习。BEAVR使用商用VR硬件实现实时、灵巧的遥操作，支持从7自由度机械臂到全身人形机器人的模块化集成，并直接以LeRobot数据集模式记录同步的多模态演示。我们的系统具有零拷贝流式架构，实现≤35毫秒延迟，一个用于可扩展推理的异步“思考-行动”控制循环，以及一个针对实时多机器人操作优化的灵活网络API。我们在多种操作任务上对BEAVR进行基准测试，并展示其与领先的视觉运动策略（如ACT、DiffusionPolicy和SmolVLA）的兼容性。所有代码公开可用，数据集发布在Hugging Face上。代码、数据集和VR应用可在https://github.com/ARCLab-MIT/BEAVR-Bot获取。

英文摘要

BEAVR is an open-source, bimanual, multi-embodiment Virtual Reality (VR) teleoperation system for robots, designed to unify real-time control, data recording, and policy learning across heterogeneous robotic platforms. BEAVR enables real-time, dexterous teleoperation using commodity VR hardware, supports modular integration with robots ranging from 7-DoF manipulators to full-body humanoids, and records synchronized multi-modal demonstrations directly in the LeRobot dataset schema. Our system features a zero-copy streaming architecture achieving ≤35 ms latency, an asynchronous "think-act" control loop for scalable inference, and a flexible network API optimized for real-time, multi-robot operation. We benchmark BEAVR across diverse manipulation tasks and demonstrate its compatibility with leading visuomotor policies such as ACT, DiffusionPolicy, and SmolVLA. All code is publicly available, and datasets are released on Hugging Face. Code, datasets, and VR app are available at https://github.com/ARCLab-MIT/BEAVR-Bot.

ParliaBench: An Evaluation and Benchmarking Framework for LLM-Generated Parliamentary Speech

Low-Rank Curvature for Zeroth-Order Optimization in LLM Fine-Tuning

MAC: An Efficient Gradient Preconditioning using Mean Activation Approximated Curvature

Node Perturbation Can Effectively Train Multi-Layer Neural Networks

Temporal Action Selection for Action Chunking

CropCraft: A Procedural World Generator for Robotic Simulation of Agricultural Tasks

Human-Like Goalkeeping in a Realistic Football Simulation: a Sample-Efficient Reinforcement Learning Approach

Towards Fair Graph Prompting: A Dual-Prompt Mechanism for Mitigating Attribute and Structural Bias

DTKG: Dual-Track Knowledge Graph-Verified Reasoning Framework for Multi-Hop QA

Instant Personalized Large Language Model Adaptation via Hypernetwork

Scaling Multi Agent Reinforcement Learning for Underwater Acoustic Tracking via Autonomous Vehicles

XD-RCDepth: Lightweight Radar-Camera Depth Estimation with Explainability-Aligned and Distribution-Aware Distillation

Efficient Prediction of SO(3)-Equivariant Hamiltonian Matrices via SO(2) Local Frames

ReaLM: Residual Quantization Bridging Knowledge Graph Embeddings and Large Language Models

Breaking the Self-Confirming Loop: Diagnosing and Mitigating Systemic Reward Bias in Self-Rewarding RL

The View From Space: Navigating Instrumentation Differences with EOFMs

Alignment-Aware Decoding

Train Once, Reuse Everywhere: Generalizable Implicit In-Context Learning by Routing Attention

Learning the Neighborhood: Contrast-Free Multimodal Self-Supervised Molecular Graph Pretraining

Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling

ReciNet: Reciprocal Space-Aware Long-Range Modeling for Crystalline Property Prediction

Latent Activation Editing: Inference-Time Refinement of Learned Policies for Safer Multirobot Navigation

Wavelet Fourier Diffuser: Frequency-Aware Diffusion Model for Reinforcement Learning

RadarSFD: Single-Frame Diffusion with Pretrained Priors for Radar Point Clouds

BEV-ODOM2: Enhanced BEV-based Monocular Visual Odometry with PV-BEV Fusion and Dense Flow Supervision for Ground Robots

Hybrid Autoregressive-Diffusion Model for Real-Time Sign Language Production

OASIS: Outlier-Aware LUT-Based GEMM with Dual-Side Quantization for LLM Inference Acceleration

Transformer-Guided Content-Adaptive Graph Learning for Hyperspectral Unmixing

AlphaEval: A Comprehensive and Efficient Evaluation Framework for Formula Alpha Mining

BEAVR: Bimanual, multi-Embodiment, Accessible, Virtual Reality Teleoperation System for Robots