Enhancing Scientific Discourse: Machine Translation for the Scientific Domain

Dimitris Roussis, Sokratis Sofianopoulos, Stelios Piperidis

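Affiliations: Institute for Speech and Language Processing; Athena RC

AI summary: To address the translation challenges that specialized terminology and complex sentence structures pose in the scientific domain, this paper builds multilingual parallel and monolingual corpora and evaluates their quality by fine-tuning general-purpose neural machine translation systems.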

Abstract

The increasing volume of scientific research necessitates effective communication across language barriers. Machine translation (MT) offers a promising solution for accessing international publications. However, the scientific domain presents unique challenges due to its specialized vocabulary and complex sentence structures. In this paper, we present the development of a collection of parallel and monolingual corpora for the scientific domain. The corpora target the language pairs Spanish-English, French-English, and Portuguese-English. For each language pair, we create a large general scientific corpus as well as four smaller corpora focused on the domains of: Cancer Research, Energy Research, Neuroscience, and Transportation research. To evaluate the quality of these corpora, we utilize them for fine-tuning general-purpose neural machine translation (NMT) systems. We provide details regarding the corpus creation process, the fine-tuning strategies employed, and we conclude with the evaluation results.

2605.20911 2026-05-21 cs.AI cs.LG

For How Long Should We Be Punching? Learning Action Duration in Fighting Games

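Hoang Hai Nguyen, Kurt Driessens, Dennis J. N. J. Soemers

Affiliations: Department of Advanced Computing Sciences, Maastricht University

AI summary: This paper studies how learning action duration can improve the decision-making of reinforcement learning agents in fighting games, and examines methods for dynamically adjusting reaction timing and their effect on performance and behavior patterns.

Comments: Accepted at Computers and Games 2026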

Abstract

Fighting games such as Street Fighter II present unique challenges to reinforcement learning (RL) agents due to their fast-paced, real-time nature. In most RL frameworks, agents are hard-coded to make decisions at a fixed interval, typically every frame or every N frames. Although this design ensures timely responses, it restricts the agent's ability to adjust its reaction timing. Acting every frame grants frame-perfect reflexes, which are unrealistic compared to human players, whereas longer fixed intervals reduce computational cost but hinder responsiveness. We consider an alternative decision-making framework in which the agent learns not only what action to take but also for how long to execute it. By jointly predicting both action and duration, the agent can dynamically adapt its responsiveness to different situations in the game. We implement this method using the open-source FightLadder environment with agents trained against scripted built-in bots, systematically testing different frame skip configurations to analyze their influence on performance, responsiveness, and learned behavior. Experiments show that learned timing can match the performance of well-chosen fixed frame skips and encourages repeatable action patterns, but does not ensure robustness on its own. In most cases, we see agents performing best with consistently high frame skip values (i.e., low responsiveness). This strategy makes it easier to learn exploitative strategies where the same action is repeated over and over, which the scripted bots appear to be susceptible to.
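
The joint action-and-duration decision described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `ToyEnv`, `step_with_duration`, and `random_policy` are hypothetical names, the environment is a stand-in rather than FightLadder, and the policy samples at random instead of learning.

```python
import random


class ToyEnv:
    """Hypothetical stand-in for a frame-based game environment."""

    def __init__(self, episode_frames=60):
        self.episode_frames = episode_frames
        self.frame = 0

    def step(self, action):
        # Advance exactly one frame; reward 1.0 for action 1, else 0.0.
        self.frame += 1
        reward = 1.0 if action == 1 else 0.0
        done = self.frame >= self.episode_frames
        return self.frame, reward, done


def step_with_duration(env, action, duration):
    """Repeat `action` for up to `duration` frames, summing the rewards."""
    total, obs, done, frames_used = 0.0, None, False, 0
    for _ in range(duration):
        obs, reward, done = env.step(action)
        total += reward
        frames_used += 1
        if done:
            break
    return obs, total, done, frames_used


def random_policy(rng, n_actions=4, max_duration=8):
    """Placeholder policy: jointly samples an action and a duration."""
    return rng.randrange(n_actions), rng.randint(1, max_duration)


def run_episode(seed=0):
    """Play one episode; return (number of decisions, frames elapsed)."""
    rng = random.Random(seed)
    env = ToyEnv()
    decisions, done = 0, False
    while not done:
        action, duration = random_policy(rng)
        _, _, done, _ = step_with_duration(env, action, duration)
        decisions += 1
    return decisions, env.frame
```

A fixed frame skip is the special case in which the policy always returns the same duration, so the number of decisions per episode is what the learned duration trades against responsiveness.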

2605.20910 2026-05-21 cs.CV

FlowLong: Inference-time Long Video Generation via Manifold-constrained Tweedie Matching

Jangho Park, Geon Yeong Park, Gihyun Kwon, Jong Chul Ye

Affiliations: KAIST; Amazon

AI summary: This paper proposes a new inference-time method for long video generation that uses manifold-constrained Tweedie matching over overlapping sliding windows, preserving temporal and spatial consistency without additional training.

Comments: Project page: https://flowlong-video.github.io/

Abstract

Extending the generation horizon of video diffusion models to long sequences remains a long-standing and important challenge. Existing training-free approaches fall into two categories: extensions of bidirectional models, which are tightly coupled to specific architectures and suffer from quality degradation over long horizons, and autoregressive models, which accumulate drift errors due to exposure bias and tend to produce repetitive motion patterns. To address these issues, we propose a novel but simple inference-time approach for long video generation that is architecture-agnostic and requires no additional training. Our method generates long videos via overlapping sliding windows, where predicted clean samples from adjacent windows are blended via Tweedie matching to enforce both manifold constraint and temporal consistency across overlap regions. Stochastic early-phase sampling then synchronizes per-window trajectories by injecting fresh noise after each Tweedie matching correction in the high-noise phase, before transitioning to deterministic ODE sampling to preserve fine-grained visual fidelity. Applied to various video generation models, our method generates videos several times longer than the native window length while outperforming both training-free and autoregressive baselines in temporal consistency and visual quality, and further extends to audio-video joint generation and text-to-3DGS without any fine-tuning.
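
The overlapping-sliding-window layout mentioned in the abstract can be sketched as follows. This is only an illustration of window bookkeeping: the blend shown is a plain per-frame average over the windows that cover each frame, which is a stand-in and not the paper's Tweedie matching. The function names and the scalar "frames" are assumptions, and the sketch assumes `window > overlap` and `total >= window`.

```python
def sliding_windows(total, window, overlap):
    """Return (start, end) frame ranges that cover `total` frames."""
    stride = window - overlap
    starts = list(range(0, max(total - window, 0) + 1, stride))
    if starts[-1] + window < total:
        # Add a final window so the tail frames are covered.
        starts.append(total - window)
    return [(s, s + window) for s in starts]


def blend(window_preds, windows, total):
    """Average the per-window predictions wherever windows overlap."""
    acc = [0.0] * total
    cnt = [0] * total
    for (start, end), pred in zip(windows, window_preds):
        for i in range(start, end):
            acc[i] += pred[i - start]
            cnt[i] += 1
    return [a / c for a, c in zip(acc, cnt)]
```

With two windows of length 4 and an overlap of 2, the two shared frames receive the mean of both predictions while the outer frames keep the prediction of their only window.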

2605.20908 2026-05-21 cs.CV

SynCB: A Synergy Concept-Based Model with Dynamic Routing Between Concepts and Complementary Neural Branches

Tores Julie, Sun Rémy, Sassatelli Lucile, Ancarani Elisa, Wu Hui-Yin, Precioso Frédéric

Affiliations: CNRS; Inria; I3S

AI summary: This work proposes SynCB, a synergy concept-based model in which a dynamic routing module selects between a concept branch and a complementary neural branch, improving task accuracy and responsiveness to human intervention.

Abstract

Concept-based (CB) models provide interpretability and support test-time human intervention, while standard neural networks (NN) offer strong task performance but little transparency. Prior work has explored hybrid formulations that integrate concepts and additional representations to improve accuracy, often at the cost of human interventions. We introduce the Synergy Concept-Based Model (SynCB) framework, which combines a CB branch with a complementary neural branch and a trainable routing module that dynamically selects which branch to use for each input. Unlike prior models, which fuse residual and concept-based predictions, SynCB keeps the two branches distinct and coordinates them through the routing module. Moreover, both branches are learned jointly, allowing information sharing between the complementary neural branch and CB branches through their common backbone. To improve responsiveness to interventions, we further introduce a test-time intervention policy and a corresponding loss. Across five datasets and CB benchmarks, SynCB consistently achieves higher task accuracy while remaining more responsive to human interventions, surpassing the full neural baseline by up to 3.9 percentage points and exceeding the strongest competitor in intervention performance by up to 6.43 percentage points.
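
A toy sketch may help show why keeping the two branches distinct matters for interventions. The hard threshold, the linear concept head, and all names below are assumptions for illustration; the abstract only states that a trainable routing module selects a branch per input.

```python
def concept_branch(concepts, weights):
    """Score a label as a weighted sum of concept activations."""
    return sum(c * w for c, w in zip(concepts, weights))


def route(router_score, threshold=0.5):
    """Hypothetical hard routing rule: pick one branch per input."""
    return "concept" if router_score >= threshold else "neural"


def predict(concepts, neural_score, router_score, weights, interventions=None):
    """Return (branch used, score), applying any concept interventions."""
    # A human intervention overwrites selected concept values at test time.
    concepts = list(concepts)
    for idx, value in (interventions or {}).items():
        concepts[idx] = value
    branch = route(router_score)
    if branch == "concept":
        return branch, concept_branch(concepts, weights)
    return branch, neural_score
```

In this toy setup an intervention changes the output only when the input is routed to the concept branch, which is one way to read the abstract's emphasis on responsiveness to interventions.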

2605.20904 2026-05-21 cs.CV

JFAA: Technical Report for the EPIC-KITCHENS-100 Action Anticipation Challenge at EgoVis 2026

Qiaohui Chu, Haoyu Zhang, Yisen Feng, Meng Liu, Weili Guan, Dongmei Jiang, Liqiang Nie

Affiliations: Harbin Institute of Technology (Shenzhen); Pengcheng Laboratory; Shandong Jianzhu University

AI summary: This paper proposes JFAA, a JEPA-based future action anticipation method for the EPIC-KITCHENS-100 action anticipation task. A frozen encoder and predictor extract observed-context features and near-future latent tokens, and lightweight attention probes are then trained to predict verb, noun, and action logits. A field-aware ensemble improves robustness, and JFAA took first place in the EgoVis 2026 EPIC-KITCHENS-100 Action Anticipation Challenge.

Comments: The champion solution for the EPIC-KITCHENS-100 Action Anticipation Challenge at the CVPR EgoVis Workshop 2026

2605.20901 2026-05-21 cs.CV cs.AI

VISTA: Technical Report for the Ego4D Short-Term Object Interaction Anticipation at EgoVis 2026

Qiaohui Chu, Haoyu Zhang, Yisen Feng, Meng Liu, Weili Guan, Dongmei Jiang, Liqiang Nie

Affiliations: Harbin Institute of Technology (Shenzhen); Pengcheng Laboratory; Shandong Jianzhu University

AI summary: This paper proposes VISTA, a V-JEPA Integrated StillFast Temporal Anticipator for the EgoVis 2026 Ego4D Short-Term Object Interaction Anticipation Challenge. The method combines object-centric spatial detection with short-horizon temporal context, injecting temporal representations into the detection pathway through feature modulation and ROI-level context fusion to make predictions more robust.

Comments: The champion solution for the Ego4D Short-Term Object Interaction Anticipation Challenge at the CVPR EgoVis Workshop 2026

Abstract

We propose VISTA, a V-JEPA Integrated StillFast Temporal Anticipator for the Ego4D Short-Term Object Interaction Anticipation (STA) Challenge at EgoVis 2026. Given an egocentric video timestamp, the task requires anticipating the next human-object interaction, including the future active object's bounding box, noun category, verb category, time-to-contact, and confidence score. VISTA follows a StillFast-style design that combines object-centric spatial detection with short-horizon temporal context. Specifically, a COCO-pretrained Faster R-CNN ResNet-50 FPN detector generates object proposals from the last observed high-resolution frame, while a frozen V-JEPA 2.1 temporal branch extracts clip-level egocentric context from the observed video. The temporal representation is injected into the detection pathway through feature modulation and ROI-level context fusion. The fused proposal features are then passed to multi-head STA predictors for box refinement, noun classification, verb classification, time-to-contact regression, and interaction confidence estimation. For the final submission, we further ensemble complementary predictions to improve robustness. Experimental results on the official challenge server show that VISTA achieves first place in the EgoVis 2026 Ego4D STA Challenge. Our code will be released at https://github.com/CorrineQiu/VISTA.

2605.20894 2026-05-21 cs.RO

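Training distribution determines the ceiling of drug-blind cancer sensitivity prediction

Taekyung Heo

Affiliations: Taekyung Heo

AI summary: This paper studies how the training distribution affects drug-blind cancer sensitivity prediction, finds that the conventional metric is biased, and recovers predictive gains through mechanism-stratified training and response matching.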

Abstract

Precision oncology requires predicting which drugs will suppress a specific tumor from its molecular profile, but drug-blind sensitivity prediction has plateaued despite increasingly complex drug representations. Here we show that this stagnation reflects a metric artifact rather than a representational bottleneck. The standard benchmark, global Pearson r, is dominated by between-drug potency differences that a trivial drug-mean predictor captures without any cell-specific learning. Per-drug Pearson r, which isolates within-drug cell ranking, reveals that no drug encoding improves over cell-only features across four independent datasets. A controlled experiment channeling mechanism-of-action identity as either a drug feature or a training-distribution constraint identifies the cause. Supplying MoA as a feature yields negligible benefit, whereas using it to stratify training raises per-drug r substantially for targeted kinase inhibitors, because pan-cancer co-training suppresses pathway-specific sensitivity signals. Mechanism-stratified training and response matching from pilot observations provide two deployable strategies that together recover the principal sources of predictive gain in drug-blind sensitivity prediction.
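
The gap between global and per-drug Pearson r described in the abstract can be reproduced on toy numbers. The response values below are invented purely for illustration, and the drug means are computed from the same toy data, which a real drug-blind split would not allow. Treating a constant prediction vector as r = 0 is also a convention chosen here, since Pearson r is undefined in that case.

```python
from math import sqrt


def pearson(xs, ys, eps=1e-12):
    """Pearson r; returns 0.0 when either vector is (numerically) constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx < eps or vy < eps:
        return 0.0  # a constant vector carries no ranking signal
    return cov / sqrt(vx * vy)


# Invented responses: each drug measured on the same four cell lines.
responses = {
    "drug_A": [1.0, 1.2, 0.8, 1.1],
    "drug_B": [5.0, 5.3, 4.7, 5.2],
    "drug_C": [9.0, 8.8, 9.3, 9.1],
}


def drug_mean_predictions(truth):
    """Predict every cell line's response as the drug's mean response."""
    return {d: [sum(v) / len(v)] * len(v) for d, v in truth.items()}


def global_r(truth, pred):
    """Pool all (drug, cell) pairs into one correlation."""
    t = [x for d in truth for x in truth[d]]
    p = [x for d in truth for x in pred[d]]
    return pearson(t, p)


def per_drug_r(truth, pred):
    """Average the within-drug correlations across drugs."""
    rs = [pearson(truth[d], pred[d]) for d in truth]
    return sum(rs) / len(rs)
```

On this toy data the drug-mean predictor reaches a global r above 0.99 while its per-drug r is 0, because all of its apparent skill comes from potency differences between drugs.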

2605.20883 2026-05-21 cs.LG

Learning fMRI activations dictionaries across individual geometries via optimal transport

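Sonia Mazelet, Rémi Flamary, Bertrand Thirion

Affiliations: CMAP, Ecole Polytechnique, Palaiseau, France; Mind, Inria-Saclay, Palaiseau, France

AI summary: This paper proposes an optimal-transport-based dictionary learning method for fMRI that handles individual differences in brain geometry with the Fused Gromov-Wasserstein (FGW) distance, reduces computational cost through amortized optimization, and learns dictionary atoms that depend on the FGW parameter balancing feature alignment against structural consistency.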

Abstract

Dictionary learning is a powerful tool for creating interpretable representations. When applied to functional magnetic resonance imaging (fMRI) data, the resulting patterns of brain activity can be used for various downstream tasks, such as brain state classification or population-level analysis. However, a major challenge is the variability in brain geometry across individuals. This is usually addressed by projecting each individual brain geometry onto a common template, which removes subject-specific information. In this work, we introduce a novel approach to dictionary learning on fMRI data that explicitly accounts for this variability. We use the optimal transport-based Fused Gromov-Wasserstein (FGW) distance to compare graphs with different geometries and features. To address the challenge of computing multiple FGW distances for large graphs such as those arising from fMRI data, we rely on amortized optimization to learn a neural network that predicts an approximation of the optimal transport plans, which substantially reduces the computational cost. Additionally, we learn dictionary atoms that depend on the FGW trade-off parameter, which controls the balance between feature alignment and structural consistency. Numerical experiments on the HCP dataset demonstrate that the proposed approach captures different levels of geometric variability in the data and provides representations that preserve essential information.

2605.20879 2026-05-21 cs.LG

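Governance by Construction for Generalist Agents

Segev Shlomov, Iftach Shoham, Alon Oved, Ido Levy, Sami Marreed, Harold Ship, Offer Akrabi, Sergey Zeltyn, Avi Yaeli, Nir Mashkif

Affiliations: IBM

AI summary: This paper presents a modular policy-as-code layer that composes with a generalist LLM agent to deliver predictable, auditable, and compliance-aware behavior in compound workflows, without model fine-tuning and without rebuilding the agent for each domain.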

Abstract

Enterprise agents are increasingly expected to operate autonomously across tools and interfaces, yet production deployments require governance by construction. Systems must specify which actions are allowed, when human oversight is required, and what information may be exposed, without rebuilding the agent for each domain. This demo presents CUGA's policy system, a modular policy-as-code layer that composes with a generalist LLM agent to deliver predictable, auditable, and compliance-aware behavior in compound workflows without model fine-tuning. We present a runtime governance architecture that enforces policy interventions at every critical stage of execution. Rather than passively constraining behavior, policies intercept the agent at five structural checkpoints: upstream of planning (Intent Guard), within the system prompt to steer reasoning (Playbook), at the tool-call boundary to enforce proper usage (Tool Guide), outside the reasoning loop as a Human-in-the-Loop gate for high-risk actions (Tool Approvals), and at the output stage to filter and structure the final response (Output Formatter). Together, these stages embed governance continuously across the agent's execution pipeline rather than treating it as an afterthought. Using a healthcare scenario and a multi-layered enforcement intervention, the demo shows dynamic playbook injection for structured tool-sequence enforcement, intent guards that block malicious or accidental harmful requests, and human-in-the-loop tool approval checkpoints for potentially destructive actions. The artifact illustrates how typed governance primitives enable faster, safer deployment of enterprise agentic systems while improving policy adherence and execution consistency.
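
The five checkpoints named in the abstract can be pictured as an ordered pipeline around a tool call. The sketch below is a loose illustration, not CUGA's actual API: the `Policy` fields, `run_governed`, and the `approve` and `call_tool` callbacks are all hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical policy-as-code record; field names are illustrative."""
    blocked_intents: set = field(default_factory=set)
    playbook: str = ""
    tool_guides: dict = field(default_factory=dict)  # tool name -> usage note
    approval_required: set = field(default_factory=set)
    redact: tuple = ()


def run_governed(request, intent, tool, policy, approve, call_tool):
    """Run one tool call through the five checkpoints; return (output, log)."""
    log = []
    # 1. Intent Guard: stop disallowed requests before any planning.
    if intent in policy.blocked_intents:
        log.append("intent_guard:blocked")
        return "Request refused by policy.", log
    log.append("intent_guard:ok")
    # 2. Playbook: prepend guidance to the prompt to steer reasoning.
    prompt = f"{policy.playbook}\n{request}".strip()
    log.append("playbook:injected" if policy.playbook else "playbook:none")
    # 3. Tool Guide: attach usage rules at the tool-call boundary.
    guide = policy.tool_guides.get(tool, "")
    log.append("tool_guide:applied" if guide else "tool_guide:none")
    # 4. Tool Approvals: human-in-the-loop gate for high-risk tools.
    if tool in policy.approval_required and not approve(tool, prompt):
        log.append("tool_approval:denied")
        return "Action requires approval and was denied.", log
    log.append("tool_approval:ok")
    raw = call_tool(tool, prompt, guide)
    # 5. Output Formatter: filter the final response.
    for secret in policy.redact:
        raw = raw.replace(secret, "[REDACTED]")
    log.append("output_formatter:done")
    return raw, log
```

Returning the log alongside the output is one simple way to make each decision auditable; a real system would likely record far richer context.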

2605.20872 2026-05-21 cs.LG cs.AI cs.GR

CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation

SeungJeh Chung, Geonho Park, Misong Kim, HyeongYeop Kang

Affiliations: IIIXR Lab, Kyung Hee University; IIIXR Lab, Korea University

AI summary: This paper proposes CAdam, which recasts densification as a statistical signal verification problem to address the densification bottleneck in generative distillation, substantially reducing the number of Gaussians while preserving visual quality.

Comments: Accepted to SIGGRAPH 2026 Conference Papers. 12 pages, 8 figures

DOI: 10.1145/3799902.3811215

Abstract

Adaptive densification is the engine of 3D Gaussian Splatting (3DGS). However, when transposed to the optimization-based Generative Distillation paradigm, this reconstruction-native mechanism reveals fundamental limitations, resulting in inefficient representations cluttered with redundant primitives. We diagnose this failure as a Densification Dilemma stemming from the stochastic nature of generative guidance: the standard magnitude-based accumulation indiscriminately aggregates transient noise alongside geometric signals, making it difficult to strike a balance between over-densification and under-fitting. To resolve this, we introduce Context-Adaptive Moment Estimation (CAdam), a novel framework that reinterprets densification as a statistically grounded signal verification problem. CAdam leverages the first moment of gradients to exploit the interference principle, where stochastic fluctuations cancel out via destructive interference while consistent geometric drifts accumulate via constructive interference, effectively disentangling the underlying signal from the generative noise floor. This is further augmented by a quantile-based context awareness and an intrinsic Signal-to-Noise Ratio (SNR) gating mechanism, which ensure robust adaptation across optimization stages and enable the soft termination of densification. Extensive experiments across diverse objectives (SDS, ISM, VFDS) and strong generative 3DGS backbones show that CAdam reduces Gaussian count by 85%-97% relative to standard densification while preserving overall comparable perceptual quality. These results highlight signal-aware density control as a practical way to improve memory efficiency in optimization-based generative distillation.
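
The interference argument in the abstract can be checked numerically on a one-dimensional gradient stream. This sketch is an assumption-laden toy: it uses Adam-style bias-corrected moments and a simple |first moment| / sqrt(second moment) ratio as the SNR, which may differ from the paper's definitions, and the two synthetic streams are invented.

```python
import random


def accumulate(grads, beta=0.99):
    """Return (mean |g|, |EMA of g|, SNR estimate) for a non-empty stream."""
    m = 0.0    # first moment: EMA of the signed gradient
    v = 0.0    # second moment: EMA of the squared gradient
    mag = 0.0  # magnitude-based accumulation, as in standard densification
    steps = 0
    for g in grads:
        steps += 1
        m = beta * m + (1 - beta) * g
        v = beta * v + (1 - beta) * g * g
        mag += abs(g)
    correction = 1 - beta ** steps  # Adam-style bias correction
    m_hat, v_hat = m / correction, v / correction
    snr = abs(m_hat) / (v_hat ** 0.5 + 1e-12)
    return mag / steps, abs(m_hat), snr


rng = random.Random(0)
# Zero-mean noise with large magnitude versus a small but consistent drift.
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
drift = [rng.gauss(0.5, 0.1) for _ in range(2000)]

noise_mag, noise_first, noise_snr = accumulate(noise)
drift_mag, drift_first, drift_snr = accumulate(drift)
```

Magnitude accumulation ranks the pure-noise stream above the drifting one, whereas the signed first moment and the SNR rank them the other way, which is the distinction the abstract draws.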

2605.20868 2026-05-21 cs.LG cs.AI cs.SY eess.SY

Runtime-Certified Bounded-Error Quantized Attention

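Dean Calver

Affiliations: Independent Researcher

AI summary: This paper proposes a tiered KV cache architecture that stores INT8 keys and INT4 values in GPU memory while retaining FP16 originals in system RAM, enabling runtime-certified attention. An error decomposition yields per-head, per-step error bounds that drive adaptive precision selection and a multi-stage fallback procedure, ensuring recovery to the exact dense attention output when required.

Comments: 32 pages, 1 figure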

Abstract

KV cache quantization reduces the memory cost of long-context LLM inference, but introduces approximation error that is typically validated only empirically. Existing systems rely on average-case robustness, with no mechanism to detect or recover from failures at runtime. We present a tiered KV cache architecture that enables runtime-certified attention: INT8 keys and INT4 values are stored in GPU memory, while FP16 originals are retained in system RAM for deterministic fallback. A two-term error decomposition yields per-head, per-step bounds on (i) attention distribution distortion from key quantization and (ii) value reconstruction error. These bounds are computed online and used to drive adaptive precision selection and a multi-stage fallback ladder, which guarantees recovery to the exact dense attention output when required. Across PG-19, NIAH, and RULER benchmarks on LLaMA 3.1-8B with contexts up to 128K, the system matches dense FP16 KV quality within noise for language modelling and retrieval tasks, while recovering catastrophic failures observed in naive INT8/INT4 baselines. Value-sensitive tasks at short context expose a controlled trade-off between compression and fidelity, which can be eliminated via tighter value tolerances or FP16-value fallback. The certification is local (per-head, per-step) and does not guarantee end-to-end model correctness, but ensures that each attention computation is either bounded relative to an FP16 reference or exactly recovered via fallback. This reframes KV cache quantization as a runtime-verified computation rather than a fixed approximation. The goal is not raw speedups, but enabling safe deployment of aggressive KV compression under strict quality constraints.
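
The "bound, then fall back" pattern from the abstract can be shown on a single attention logit. This is a simplified sketch under stated assumptions: it bounds only the error of one query-key dot product under symmetric INT8 rounding, not the paper's two-term decomposition, and `quantize_int8`, `logit_error_bound`, and `certified_logit` are hypothetical names.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def quantize_int8(vec):
    """Symmetric per-vector quantization to integer codes in [-127, 127]."""
    scale = max(abs(x) for x in vec) / 127 or 1.0  # avoid a zero scale
    codes = [round(x / scale) for x in vec]
    return codes, scale


def logit_error_bound(query, scale):
    """Rounding moves each key entry by at most scale / 2, so
    |q . (k - k_hat)| <= sum_i |q_i| * scale / 2."""
    return sum(abs(q) for q in query) * scale / 2


def certified_logit(query, key, tol):
    """Use the INT8 key when the bound fits `tol`; otherwise use the exact key."""
    codes, scale = quantize_int8(key)
    bound = logit_error_bound(query, scale)
    if bound <= tol:
        k_hat = [c * scale for c in codes]
        return dot(query, k_hat), bound, "int8"
    # Fallback path: recompute from the full-precision original.
    return dot(query, key), 0.0, "fp_fallback"
```

The bound depends only on the query and the quantization scale, so it can be evaluated at runtime without touching the full-precision key unless the fallback is actually taken.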

2605.20866 2026-05-21 cs.LG cs.DC math.OC stat.ML

LOSCAR-SGD: Local SGD with Communication-Computation Overlap and Delay-Corrected Sparse Model Averaging

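Yassine Maziane, Ammar Mahran, Artavazd Maranjyan, Peter Richtárik

Affiliations: KAUST

AI summary: This paper studies Local SGD that combines communication compression, local training, and communication-computation overlap in a heterogeneous-compute setting. It proposes LOSCAR-SGD, which communicates only sparse model coordinates while continuing to optimize, and gives the first theoretical guarantees for this combination.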

Abstract

Communication is a major bottleneck in distributed learning, especially in large-scale settings and in federated learning environments with slow links. Three standard ways to reduce this cost are communication compression, local training, and communication-computation overlap. Methods that combine these ingredients are used in practice and have been found to be effective for large-scale training, but there is little theory for methods that combine all three. We study a heterogeneous-compute setting in which different workers may take different numbers of local steps, and we propose LOSCAR-SGD, a Local SGD method that communicates only a sparse subset of model coordinates and continues optimizing while communication is in flight. A key ingredient is a delay-corrected merge rule that incorporates delayed synchronized information without discarding the progress made during the overlap phase. We give convergence guarantees for smooth non-convex objectives and show how sparsity, overlap, and worker heterogeneity affect the rate. To the best of our knowledge, this is the first theory for this combination of ingredients. Experiments further show that communication-computation overlap reduces training time and that the delay-corrected merge outperforms naive overwriting.

2605.20865 2026-05-21 cs.LG cs.AI

Multi-Step Likelihood-Ratio Correction for Reinforcement Learning with Verifiable Rewards

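Deokgyu Yoon, Hyungkyu Kang, Joongkyu Lee, Byeongchan Kim, Gyungin Shin, Sungrae Park, Min-hwan Oh

Affiliations: Seoul National University; Upstage

AI summary: This paper proposes N-Step Forward-Trace Policy Optimization (NFPO), which introduces an N-step forward trace to improve PPO's surrogate objective, yielding more accurate policy improvement in reinforcement learning with verifiable rewards.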

Abstract

Reinforcement learning with verifiable rewards (RLVR) plays a pivotal role in improving the reasoning ability of large language models. However, widely used PPO surrogate objectives are fundamentally local, as they rely on a local approximation of the exact policy gradient objective. While this approximation improves stability by reducing the variance induced by importance sampling, it also introduces structural bias into the surrogate objective, which must be controlled through trust region mechanisms. In this work, we introduce the $N$-step forward trace, which augments the PPO surrogate objective using the cumulative likelihood ratio of the next $N-1$ tokens. Building on this idea, we propose $N$-Step Forward-Trace Policy Optimization (NFPO), a practical RLVR algorithm that integrates the $N$-step forward trace into the masked policy gradient framework. NFPO provides a continuous bridge between the PPO surrogate objective and the exact policy gradient objective, offering a principled mechanism for controlling the bias-variance trade-off. Our theoretical analysis shows that, with an appropriate choice of $N$, the proposed objective yields a tighter policy-improvement bound than the standard PPO surrogate. Experiments on comprehensive reasoning benchmarks demonstrate that NFPO consistently improves performance, supporting our theoretical findings.
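
One possible reading of "the cumulative likelihood ratio of the next $N-1$ tokens" is sketched below. The indexing, the truncation at the end of the sequence, and the function name `forward_trace` are assumptions made for illustration; the abstract does not spell out the exact definition, and this is not the NFPO objective itself.

```python
from math import exp


def forward_trace(log_ratios, n):
    """trace[t] = product of the likelihood ratios of tokens t+1 .. t+n-1.

    `log_ratios[t]` is log(pi_new / pi_old) for token t. The window is
    truncated at the end of the sequence, and the product is computed
    in log space through prefix sums.
    """
    length = len(log_ratios)
    prefix = [0.0]
    for lr in log_ratios:
        prefix.append(prefix[-1] + lr)
    return [
        exp(prefix[min(t + n, length)] - prefix[t + 1])
        for t in range(length)
    ]
```

With `n = 1` every trace equals 1, so under this reading a per-token term weighted by the trace reduces to the usual single-token ratio, and larger `n` folds in more of the future ratios.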

2605.20856 2026-05-21 cs.RO cs.AI cs.LG

DISC: Decoupling Instruction from State-Conditioned Control via Policy Generation

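Hanxiang Ren, Pei Zhou, Xunzhe Zhou, Yanchao Yang

Affiliations: Zhejiang University; The University of Hong Kong; TranscEngram

AI summary: DISC decouples instruction from state-conditioned control through policy generation, removing the observation leakage caused by task-state entanglement. It performs strongly on several benchmarks, confirming that language-generated policy parameters drive behavior.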

Abstract

Language-conditioned manipulation policies typically process instructions and observations through shared network parameters. This task-state entanglement provides a pathway for observation leakage: networks learn scene-to-action shortcuts that bypass language grounding entirely. DISC eliminates this failure structurally. Rather than conditioning a universal policy on language, DISC uses a hypernetwork to generate the entire parameter set of a task-specific visuomotor policy from the instruction alone. The generated policy never directly accesses language; therefore, its task-awareness must come from the language. Consequently, observation leakage has no pathway to emerge. On the other hand, generating coherent high-dimensional policy weights is itself a challenging problem. We address it with a two-stage hypernetwork whose refinement stage embeds the structure of gradient-based optimization as a feed-forward inductive bias, producing globally consistent parameters without actual gradient computation. Trained entirely from scratch on standard data budgets, DISC outperforms all entangled baselines on LIBERO-90 and Meta-World, with advantages that widen on complex, long-horizon tasks, and surpasses the large-scale pretrained π₀ despite using no external pretraining data. On a real-world benchmark where all tasks share identical visual context, DISC substantially outperforms entangled alternatives, directly confirming that language-generated policy parameters, not visual shortcuts, drive behavior. The hypernetwork further learns a semantically structured parameter manifold that enables few-shot adaptation from minimal demonstrations and robust generalization across paraphrased instructions. Our code is available at: https://github.com/ReNginx/DISC.

DASH: Fast Differentiable Architecture Search for Hybrid Attention in Minutes on a Single GPU

WiXus: A Wheeled-Legged Robot with Wire-Driven Environmental Utilizing to Integrate Mobility and Manipulation

STEAM: A Training-Free Congestion-Aware Enhancement Framework for Decentralized Multi-Agent Path Finding

Strategy-Induct: Task-Level Strategy Induction for Instruction Generation

Winfree Oscillatory Neural Network

Evaluating Speech Articulation Synthesis with Articulatory Phoneme Recognition

SubTGraph: Large-Scale Subterranean Environment Synthesis with Controllable Topological Variability for Robotic Autonomy Validation

Task-Routed Mixture-of-Experts with Cognitive Appraisal for Implicit Sentiment Analysis

Calibration vs Decision Making: Revisiting the Reliability Paradox in Unlearned Language Models

Enhancing Scientific Discourse: Machine Translation for the Scientific Domain

For How Long Should We Be Punching? Learning Action Duration in Fighting Games

FlowLong: Inference-time Long Video Generation via Manifold-constrained Tweedie Matching

SynCB: A Synergy Concept-Based Model with Dynamic Routing Between Concepts and Complementary Neural Branches

JFAA: Technical Report for the EPIC-KITCHENS-100 Action Anticipation Challenge at EgoVis 2026

VISTA: Technical Report for the Ego4D Short-Term Object Interaction Anticipation at EgoVis 2026

Mobile UMI: Cross-View Diffusion Policy with Decoupled Kinematics for Mobile Manipulation

FruitEnsemble: MLLM-Guided Arbitration for Heterogeneous ensemble in Fine-Grained Fruit Recognition

HDMoE: A Hierarchical Decoupling-Fusion Mixture-of-Experts Framework for Multimodal Cancer Survival Prediction

Map-Mono-Ego: Map-Grounded Global Human Pose Estimation from Monocular Egocentric Video

Training distribution determines the ceiling of drug-blind cancer sensitivity prediction

Learning fMRI activations dictionaries across individual geometries via optimal transport

NeighborDiv: Training-free Zero-shot Generalist Graph Anomaly Detection via Neighbor Diversity

CIG: Exploration via Conditional Information Gain

Terminal-World: Scaling Terminal-Agent Environments via Agent Skills

Governance by Construction for Generalist Agents

CAdam: Context-Adaptive Moment Estimation for 3D Gaussian Densification in Generative Distillation

Runtime-Certified Bounded-Error Quantized Attention

LOSCAR-SGD: Local SGD with Communication-Computation Overlap and Delay-Corrected Sparse Model Averaging

Multi-Step Likelihood-Ratio Correction for Reinforcement Learning with Verifiable Rewards

DISC: Decoupling Instruction from State-Conditioned Control via Policy Generation