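arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.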

2606.08477 2026-06-09 cs.AI New submission

Not Just After One: Sleep-Inspired Replay Prevents Catastrophic Forgetting After Sequential Tasks

Anthony Bazhenov, Jean Erik Delanois, Giri P. Krishnan

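Affiliations: Department of Neuroscience, University of California, San Diego, CA, USA

AI summary: Proposes an unsupervised, sleep-inspired replay mechanism that is applied after several new tasks have been trained sequentially, partially restoring performance on all previously learned tasks and preventing catastrophic forgetting.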

Abstract

One of the critical limitations of artificial neural networks is their inability to learn continually: training on new tasks often leads to interference with, and forgetting of, the previous ones. While several algorithms have been proposed to protect old memories from interference, they are typically applied during or immediately after each new episode of training. In contrast, humans and animals can learn continuously, acquiring multiple new memories during active learning before consolidating all of them into long-term storage. Here we show that multiple new tasks can be trained sequentially before an unsupervised sleep-like replay phase is applied to partially restore performance across all previously learned tasks. Our study further suggests that task-specific information remains resilient to new training but decays gradually as the network is trained on new tasks. These findings point to novel principles for developing a broad range of continual learning AI solutions.

2606.08446 2026-06-09 cs.LG cs.AI New submission

Sparrow: Sparse Rollout for Stable and Efficient Long-context RL of Large Language Models

Yang Zhou, Ranajoy Sadhukhan, Zhaofeng Sun, Zhuoming Chen, Souvik Kundu, Saket Dingliwal, Sai Muralidhar Jayanthi, Aram Galstyan, Haizhong Zheng, Beidi Chen

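Affiliations: Carnegie Mellon University; Cornell University; Intel; Amazon AGI

AI summary: To address the high compute cost of long-context rollouts in RLVR, proposes Sparrow, which uses a dynamic sparsity schedule to keep the lower-tail statistic of token-level policy mismatch stable, achieving 2.0-2.4x rollout speedups on Qwen3-series models and generalizing to larger models and the coding domain.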

Abstract

Despite being powerful, reinforcement learning with verifiable rewards (RLVR) induces extremely long chain-of-thought (CoT) traces, making it computationally expensive. Since the per-step cost of RLVR is dominated by long-context rollout generation, sparse attention offers a promising way to accelerate dense rollout. However, sparse rollouts require a delicate stability-efficiency tradeoff: overly aggressive sparsity causes collapse, while overly lenient sparsity gives insufficient speedup. In this work, we study this tradeoff through sparse-to-dense actor-policy mismatch. We first observe that sparse rollout collapse is not driven by uniform degradation across tokens: most sparse tokens align perfectly with their dense counterparts even under aggressive sparsity. Motivated by this, we hypothesize that sparse rollout training remains stable if the lower tail of per-token actor-policy mismatch stays above a critical threshold throughout the trajectory. We introduce a dynamic sparsity schedule that keeps this tail statistic constant during generation and validate our hypothesis. Across Qwen3 thinking-family models, keeping the tail mismatch statistic near a consistent threshold generally enables stable training. We then use a cost model to find the sparsity schedule for maximum speedup under this mismatch threshold, achieving 2.2x, 2.4x, and 2.0x rollout speedups when training Qwen3-1.7B, Qwen3-4B, and Qwen3-8B, respectively. Empirically, we show the thresholds generalize to a larger model (Qwen3-14B) and another RL domain (coding). Finally, our analysis naturally motivates DistillSparse: lightweight LoRA-based distillation on sparse rollout lets more aggressive sparsity reach the same sparse-to-dense mismatch threshold, yielding higher speedup.
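The selection rule described in the abstract (keep a lower-tail mismatch statistic above a threshold, then take the most aggressive sparsity that still qualifies) can be illustrated with a small sketch. This is not the Sparrow implementation: the agreement score (higher means closer to dense), the nearest-rank quantile, the function names, and all numbers are assumptions made for illustration, and the sketch picks a single level rather than a schedule that changes during generation.

```python
import math

def lower_tail(values, q=0.01):
    """Return the q-quantile (nearest-rank) of per-token scores."""
    ordered = sorted(values)
    idx = max(0, math.ceil(q * len(ordered)) - 1)
    return ordered[idx]

def pick_sparsity(scores_by_sparsity, threshold, q=0.01):
    """Pick the most aggressive sparsity whose lower tail stays >= threshold.

    scores_by_sparsity maps a sparsity level in [0, 1) to hypothetical
    per-token sparse-vs-dense agreement scores (higher = closer to dense).
    Falls back to 0.0 (fully dense) when no level qualifies.
    """
    feasible = [
        level
        for level, scores in scores_by_sparsity.items()
        if lower_tail(scores, q) >= threshold
    ]
    return max(feasible, default=0.0)

# Made-up measurements: most tokens agree well, a few tail tokens do not.
measurements = {
    0.5: [1.0, 0.99, 0.98, 0.97, 0.95],
    0.8: [1.0, 0.99, 0.97, 0.90, 0.72],
    0.9: [1.0, 0.98, 0.95, 0.60, 0.20],
}
chosen = pick_sparsity(measurements, threshold=0.7, q=0.2)
print(chosen)
```

With these made-up numbers, 0.8 is the most aggressive level that qualifies: its worst-tail score (0.72) clears the 0.7 threshold, while the 0.9 level fails because of two badly mismatched tokens even though most of its tokens agree closely.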

2606.08445 2026-06-09 cs.CL cs.AI New submission

Segment-level Tree Search for Long Meeting Document Summarization

Sangwon Ryu, Heejin Do, Jun Seo, Daehui Kim, Yunsu Kim, Gary Geunbae Lee, Jungseul Ok

Affiliations: GSAI, POSTECH; CSE, POSTECH; ETH Zurich; ETH AI Center; Agentic AI Lab, KT; LILT

AI summary: Proposes S3, a segment-level summarization framework based on Monte Carlo tree search that composes segment-level candidate summaries without any training, matching the performance of a 72B model with a 7B model.

Comments: INTERSPEECH 2026

2606.08440 2026-06-09 cs.RO cs.CV New submission

GraspFoM: Towards Reconstruction-Driven Robotic Grasping with 3D Foundation Priors

Dongli Wu, Xiaobao Wei, Hao Wang, Qiaochu Dong, Ying Li, Qingpo Wuwu, Ming Lu, Wufan Zhao

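Affiliations: The Hong Kong University of Science and Technology (Guangzhou); Peking University; The Hong Kong University of Science and Technology

AI summary: Proposes GraspFoM, a framework that uses 3D foundation priors (SAM3D) to build a shared 3D object latent, jointly optimizes reconstruction and grasp pose prediction, and generates continuous, multimodal grasps with an anchor-initialized truncated pose-reasoning diffuser, achieving high-fidelity reconstruction and state-of-the-art grasping.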

Abstract

Robotic grasping is a fundamental capability in robotic manipulation. Yet grasping remains challenging under partial observations. Reliable grasping depends on both local contact cues and object-level 3D structure. Existing geometry-aware grasping methods recognize the value of reconstruction, but they typically treat geometry as an intermediate prediction rather than a reusable object prior for grasping. In this paper, we present GraspFoM, a unified framework that leverages 3D foundation priors (SAM3D) to build a shared 3D object latent for both reconstruction and grasp pose prediction. Built on this shared object latent, we introduce an anchor-initialized truncated pose-reasoning diffuser that predicts continuous and multimodal grasp poses without directly relying on discrete grasp candidates. We further investigate the interaction between reconstruction and grasping through a reconstruction-aware scorer and a residual latent updater. Reconstruction provides grounded geometric cues, while grasp supervision refines the shared object latent toward grasp-relevant affordances. GraspFoM jointly predicts grasp poses and reconstructs high-fidelity 3D assets in mesh and 3DGS forms. Comprehensive experiments demonstrate that GraspFoM achieves state-of-the-art results on both reconstruction and grasping. Notably, these improvements require only a small number of additional trainable parameters. Component-wise ablation studies also demonstrate the contribution of each component.

2606.08432 2026-06-09 cs.AI New submission

Trajectory-Refined Distillation

Li Jiang, Haoran Xu, Yichuan Ding, Amy Zhang

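Affiliations: McGill University; Mila Quebec AI Institute; UT Austin

AI summary: Proposes Trajectory-Refined Distillation (TRD), which uses teacher guidance to correct prefix errors in student trajectories, addressing prefix failure in on-policy distillation and improving the single-attempt accuracy and reasoning coverage of large language models.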

Comments: under review

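Abstract

On-policy distillation (OPD) has become an important post-training tool for large language models (LLMs), providing dense token-level teacher supervision along the student's own generated trajectories. In this work, we identify a common structural problem in OPD, which we call prefix failure. Under prefix failure, dense token-level supervision leads to bimodal teacher mixtures and fragmented gradients, which token-level loss truncation or reweighting cannot resolve. This observation motivates moving beyond token-level loss interventions toward trajectory-level output correction. We therefore propose Trajectory-Refined Distillation (TRD), a trajectory-level correction method that revises the student's generated trajectories under teacher guidance while staying within on-policy support. By correcting problematic prefixes before distillation, TRD mitigates prefix failure at its source. Moreover, even when the original trajectory is already correct, TRD improves exploration by exposing the student, through teacher guidance, to alternative valid derivations. TRD can also be applied to on-policy self-distillation (OPSD), a parameter-sharing variant that uses the student model conditioned on privileged information as the teacher. Across a wide range of benchmarks and base models at multiple scales, TRD consistently outperforms prior baselines, improving single-attempt accuracy and expanding reasoning coverage. Code is available at https://github.com/louieworth/trd.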

Provably Efficient Personalized Multi-Objective Bandits with Active Conversational Queries

Linfeng Cao, Ming Shi, Ness B. Shroff

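Affiliations: The Ohio State University; University at Buffalo

AI summary: Proposes the MO-PQUCB algorithm, which obtains user preference signals through active queries and combines a Plackett-Luce model with regularized UCB to untangle preference learning from reward exploration in multi-objective bandits, achieving an improved regret bound.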

Comments: UAI 2026

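Abstract

Personalized decision-making in multi-objective bandits requires learning a user's specific trade-offs among competing objectives. Because an arm's utility depends on both unknown rewards and unknown preferences, existing methods infer preferences only from utility feedback, entangling preference learning with reward exploration. In practice, however, users often reveal their priorities through active conversational queries (for example, "a cheap and clean hotel"), yet this structured signal goes unused. We formalize an active-query framework in which user queries provide structured preference signals. Modeling these signals with a Plackett-Luce subset-selection model, we show that query-only learning is insufficient because of a fundamental shift-invariance barrier. To address this, we introduce MO-PQUCB, a hybrid algorithm that combines query-based preference anchoring with bandit feedback through shift-invariant regularization and dual-exploration UCB. We prove that active queries accelerate preference estimation and yield improved regret scaling over prior preference-aware MO-MAB methods. When queries are corrupted, we further characterize the statistical limits and design a robust estimator that achieves near-optimal performance when corruptions are sparse. Experiments validate both the theoretical and the practical gains.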
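A short sketch can make the shift-invariance point concrete. It assumes the standard Plackett-Luce parameterization, in which an item is chosen with probability proportional to the exponential of its score; under that assumption, adding the same constant to every score leaves all choice and ranking probabilities unchanged, so query data alone cannot pin down the absolute level of the scores. Whether this is exactly the barrier the paper formalizes is an assumption, and the function names and numbers are illustrative only.

```python
import math

def pl_choice_probs(scores):
    """Plackett-Luce probability that each item is picked first from the set."""
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

def pl_ranking_prob(scores, ranking):
    """Probability of a full ranking (list of item indices, best first)."""
    prob = 1.0
    remaining = list(ranking)
    while remaining:
        probs = pl_choice_probs([scores[j] for j in remaining])
        prob *= probs[0]  # remaining[0] is the next item in the ranking
        remaining.pop(0)
    return prob

# Hypothetical preference scores for three objectives, and a shifted copy.
scores = [1.0, 0.2, -0.5]
shifted = [s + 3.0 for s in scores]

same_choices = all(
    abs(a - b) < 1e-12
    for a, b in zip(pl_choice_probs(scores), pl_choice_probs(shifted))
)
same_ranking = abs(
    pl_ranking_prob(scores, [0, 1, 2]) - pl_ranking_prob(shifted, [0, 1, 2])
) < 1e-12
print(same_choices, same_ranking)
```

Both checks hold, which is why a second signal is needed to fix the scale; in the abstract, that role is played by bandit feedback together with shift-invariant regularization.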

TrustMargin: Training-Free Arbitration Between Parametric Memory and Retrieved Evidence in Large Language Models

Jingyan Xu, Hong Shi, Yi Shan, Penghui Liu, Yunhao Bai, Ningyuan Li, Xueyang Liu

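Affiliations: Peking University

AI summary: To address conflicts between parametric memory and retrieved evidence when large language models answer knowledge questions, proposes TrustMargin, a training-free arbitration layer that scores candidate answers with the model's own likelihoods and selects the more trustworthy one, with no fine-tuning or external judge.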

Comments: 13 pages, 6 figures, 9 tables. Code and data are available at https://github.com/mojixu/TrustMargin.git

Abstract

Large language models answer knowledge-intensive questions using both parametric memory and retrieved evidence, but neither source is uniformly reliable. Retrieval can fill knowledge gaps, yet distracting passages may override correct closed-book answers. We study this post-generation conflict as answer-level source arbitration: given Direct and RAG answers from the same frozen model, decide which source to trust. We propose TRUSTMARGIN, a training-free, plug-and-play arbitration layer that scores the two existing candidates with the model's own likelihoods. It combines a parametric-prior margin, which tests whether memory accepts the retrieved answer, with an evidence-binding margin, which discounts passage-only salience and measures question-specific support. TRUSTMARGIN selects between Direct and RAG without fine-tuning, external judges, or additional generation. Across 2WIKIMQA and CWQA with three LLaMA scales, TRUSTMARGIN consistently improves over Direct generation and BM25-RAG, recovers part of the Direct/RAG oracle gap, and generalizes to multiple training-free RAG pipelines.
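The two margins in the abstract are described only qualitatively, so the following is a guess at their general shape rather than the TRUSTMARGIN formulas. It assumes four length-normalized log-likelihoods from the same frozen model are already available, defines a prior margin as how much closed-book memory prefers the RAG answer over the Direct answer, defines an evidence margin as how much adding the question raises the RAG answer's likelihood beyond what the passages alone give, and combines them with a hypothetical weight. Every name, formula, and number here is an assumption.

```python
from dataclasses import dataclass

@dataclass
class AnswerScores:
    """Length-normalized log-likelihoods from one frozen model (hypothetical inputs)."""
    direct_given_q: float     # log p(Direct answer | question)
    rag_given_q: float        # log p(RAG answer | question), no passages
    rag_given_q_and_p: float  # log p(RAG answer | question, passages)
    rag_given_p: float        # log p(RAG answer | passages), no question

def arbitrate(s: AnswerScores, weight: float = 1.0) -> str:
    """Return "rag" or "direct" from two illustrative margins."""
    # Does closed-book memory accept the retrieved answer?
    prior_margin = s.rag_given_q - s.direct_given_q
    # Question-specific support, discounting passage-only salience.
    evidence_margin = s.rag_given_q_and_p - s.rag_given_p
    return "rag" if prior_margin + weight * evidence_margin > 0 else "direct"

# Distracting passage: memory rejects the RAG answer and the question adds no support.
distracted = AnswerScores(-1.0, -3.0, -0.5, -0.4)
# Helpful passage: memory finds the RAG answer plausible and the question binds it.
helpful = AnswerScores(-2.5, -2.0, -0.3, -1.8)
print(arbitrate(distracted), arbitrate(helpful))
```

The sketch only scores two answers that already exist, which matches the abstract's claim that no additional generation, fine-tuning, or external judge is required; the specific decision rule is a placeholder.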

2606.08394 2026-06-09 cs.CL New submission

STAR-KV: Low-Rank KV Cache Compression with Adaptive Rank Control via Soft Thresholding

Priyansh Bhatnagar, Ashkan Moradifirouzabadi, Se-Hyun Yang, SeungJae Lee, Jungwook Choi, Mingu Kang

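Affiliations: University of Washington

AI summary: Proposes the STAR-KV framework, which uses a differentiable thresholding mechanism for adaptive rank selection at the attention-head and block levels, combined with hybrid decomposition and low-rank-aware mixed-precision quantization. Across multiple LLMs it reaches up to 75% KV cache compression and up to 20x reduction when combined with quantization, with up to 6.9x attention-module speedup and 3.1x end-to-end generation throughput.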

Abstract

Low-rank projection has emerged as a promising approach for compressing the KV cache by exploiting hidden-dimension redundancy. However, prior methods rely on fixed or heuristic rank selection and struggle to achieve aggressive compression with minimal accuracy degradation. We propose STAR-KV, an adaptive low-rank KV cache compression framework with fine-grained rank control. STAR-KV encompasses 1) a differentiable thresholding mechanism that enables optimal rank selection at both attention-head and block levels, 2) a hybrid decomposition strategy that applies different low-rank factorizations according to the sensitivity of key and value projections, and 3) a low-rank-aware mixed-precision quantization that leverages data statistics for near-lossless low-bit quantization. Evaluated across multiple LLMs and benchmarks, STAR-KV achieves up to 75% KV cache compression and up to 20x overall KV cache reduction when combined with quantization. Enabled by custom Triton-based GPU kernels, STAR-KV delivers up to 6.9x speedup for the attention module and 3.1x end-to-end generation throughput. Our code is publicly available at: https://github.com/PriyanshBhatnagar/STAR-KV.
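To illustrate how a threshold can control rank, here is a minimal sketch of soft thresholding applied to singular values, the generic operation the title refers to. It assumes one threshold per attention head acting on that head's singular values; the actual STAR-KV parameterization, training procedure, and block-level handling are not given in the abstract, and the spectra below are invented.

```python
def soft_threshold(singular_values, tau):
    """Shrink each singular value by tau and clamp at zero."""
    return [max(s - tau, 0.0) for s in singular_values]

def effective_rank(singular_values, tau):
    """Count the singular values that survive soft thresholding."""
    return sum(1 for s in soft_threshold(singular_values, tau) if s > 0.0)

# Invented spectra: head_0 is dominated by two directions, head_1 is flat.
heads = {
    "head_0": [9.0, 4.0, 0.8, 0.1],
    "head_1": [5.0, 4.5, 3.9, 3.2],
}
ranks = {name: effective_rank(sv, tau=1.0) for name, sv in heads.items()}

# Fraction of the original hidden dimensions kept across both heads.
kept = sum(ranks.values()) / sum(len(sv) for sv in heads.values())
print(ranks, kept)
```

With the same threshold, the head with a fast-decaying spectrum keeps rank 2 while the flat one keeps all 4 dimensions, so rank adapts per head without a hand-set rank budget. Because the shrinkage is piecewise linear in the threshold, a gradient with respect to the threshold exists for every surviving singular value, which is one way such a threshold could be learned.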
