arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.17056 2026-06-16 cs.CL New submission

The Value Axis: Language Models Encode Whether They're on the Right Track

Nick Jiang, Isaac Kauvar, Jack Lindsey

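Affiliations: Stanford University; Anthropic

AI summary: By constructing a "value axis" for Qwen3-8B, the authors find that language models internally track the success probability of their current trajectory, and that this estimate influences confidence, self-correction, and exploration.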

Comments Code repository: https://github.com/nickjiang2378/value-axis

Abstract

We investigate whether language models internally track the value of their current trajectory, defined as the likelihood that their ongoing strategy will achieve their goals. Using synthetic, in-context reinforcement learning data, we construct a "value" axis for Qwen3-8B. We find that activations along this axis distinguish between high vs. low verbalized confidence, rollouts without vs. with backtracking, and correct vs. corrupted code. Steering towards high value causally suppresses self-correction and reduces explanatory verbosity, while steering towards low value induces backtracking and exploration. We demonstrate that direct preference optimization (DPO) can increase the internal value of rewarded behaviors (e.g., using a certain word), causing the model to act more confidently after exhibiting them. Finally, we apply the value axis to study in-the-wild settings. For example, we find that Qwen assigns low value to politically sensitive chat queries after post-training and that supervised fine-tuning increases internal confidence within the training domain. Our results suggest that language models linearly encode an estimate of expected goal success that modulates their confidence in pursuing a direction.
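
The abstract describes reading out and steering along a linear axis in activation space but does not say how the axis is fit. A minimal sketch, assuming a difference-of-means direction between activations labeled high-value and low-value (a common choice for linear probes, not necessarily the authors' method), could look like this. All function names and the toy numbers are hypothetical.

```python
import math


def mean_vector(rows):
    """Element-wise mean of equally sized activation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def fit_axis(high_acts, low_acts):
    """Unit vector pointing from the low-value mean to the high-value mean.

    Assumption: a difference-of-means axis; the paper may fit it differently.
    """
    diff = [h - l for h, l in zip(mean_vector(high_acts), mean_vector(low_acts))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def project(act, axis):
    """Scalar readout: position of one activation along the axis."""
    return sum(a * b for a, b in zip(act, axis))


def steer(act, axis, alpha):
    """Shift an activation by alpha along the axis (alpha < 0 steers toward low value)."""
    return [a + alpha * b for a, b in zip(act, axis)]


# Toy 3-dimensional "activations" (made-up numbers, not model data).
high = [[2.0, 0.0, 1.0], [4.0, 0.0, 1.0]]
low = [[0.0, 0.0, 1.0], [-2.0, 0.0, 1.0]]
axis = fit_axis(high, low)
steered = steer(low[0], axis, 1.5)
```

In this toy setup the projection of `steered` onto `axis` is larger than that of `low[0]` by exactly the steering coefficient, which is the sense in which steering moves a state along the axis.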

2606.17055 2026-06-16 cs.RO New submission

T-Rex: Tactile-Reactive Dexterous Manipulation

Dantong Niu, Zhuoyang Liu, Zekai Wang, Boning Shao, Zhao-Heng Yin, Anirudh Pai, Yuvan Sharma, Stefano Saravalle, Ruijie Zheng, Jing Wang, Ryan Punamiya, Mengda Xu, Yuqi Xie, Yunfan Jiang, Letian Fu, Konstantinos Kallidromitis, Matteo Gioia, Junyi Zhang, Jiaxin Ge, Haiwen Feng, Fabio Galasso, Wei Zhan, David M. Chan, Yutong Bai, Roei Herzig, Jiahui Lei, Fei-Fei Li, Ken Goldberg, Jitendra Malik, Pieter Abbeel, Yuke Zhu, Danfei Xu, Jim Fan, Trevor Darrell

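Affiliations: UC Berkeley; NVIDIA; Stanford; Panasonic; La Sapienza University; ItalAI

AI summary: Proposes a large-scale tactile dataset and a variable-rate Mixture-of-Transformers architecture, raising the average success rate by more than 30% across 12 fine-grained manipulation tasks.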

Comments Project page: https://tactile-rex.github.io/

Abstract

The ability to react dynamically to tactile signals has long been considered crucial to agile human-level dexterity. Yet contemporary learning-based Vision-Language-Action (VLA) models for robotic manipulation generally either overlook the tactile modality or are limited to encoders with static cues, due in part to the scarcity of diverse training data and standardized evaluation, architectural constraints in current VLA models, and limitations of static tactile encoders. In this paper, we push the frontier of tactile-reactive manipulation by addressing all of these limitations. We propose a large-scale, 100-hour tactile-rich dataset collected via a novel, data-efficient recipe that prioritizes elementary motor primitives. To effectively exploit naturally high-frequency touch signals without sacrificing the capabilities of existing VLAs, we introduce a variable-rate Mixture-of-Transformers (MoT) architecture equipped with a novel temporal tactile VQ-VAE encoder. We demonstrate the effectiveness of tactile-reactive policies on 12 manipulation tasks requiring delicate force control and deformable object manipulation, achieving over 30% higher average success rate than the strongest baseline.

2606.17054 2026-06-16 cs.RO New submission

Human Universal Grasping

Kevin Yuanbo Wu, Tianxing Zhou, Isaac Tu, Billy Yan, Irmak Guzey, David Fouhey, Dandan Shan, Lerrel Pinto

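Affiliations: New York University; Tsinghua University; University of Michigan

AI summary: Proposes HUG, a model that uses human grasp data (the 1M-HUGs dataset) and flow matching to generate diverse grasp poses from a single RGB-D image and retarget them to robot hands for zero-shot grasping, outperforming baselines by 23% and 34% on HUG-Bench.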

Comments 28 pages, 20 figures, 7 tables

Abstract

Humans can grasp objects effortlessly, whereas multi-fingered robots are far from this level of generality. We argue that the most natural source of robot grasping data is humans, who pick up thousands of objects every day. We present HUG, a flow-matching model that generates diverse human grasps for any user-specified object in a single RGB-D image captured from a stereo camera. Using smart glasses, we first collect 1M-HUGs, an egocentric dataset of human grasps spanning 1M frames (27.8 hrs) and 6,707 object instances across 41 buildings. Next, to model the distribution of natural human grasps, our novel flow-matching model fuses RGB and depth observations to output a grasp parameterized by wrist translation, wrist rotation, and MANO hand pose. Predicted grasps can be retargeted to various robot hands, enabling zero-shot grasping in everyday scenes. To standardize evaluation, we build a new simulated benchmark, HUG-Bench, of 90 unseen objects from five geometric categories and various sizes, with metric-scale 3D meshes. We evaluate HUG in the real world on the 30-object test set of HUG-Bench across multiple stereo cameras, robot embodiments, and household environments. HUG outperforms the state-of-the-art grasping baselines by +23% and +34% on our challenging object set. Code, data, benchmark, checkpoints, and an interactive demo are released on our website: https://grasping.io/

2606.17053 2026-06-16 cs.CL cs.CV New submission

Context-Aware RL for Agentic and Multimodal LLMs

Peiyang Xu, Bangzheng Li, Sijia Liu, Karthik R. Narasimhan, Pramod Viswanath, Prateek Mittal, Xingyu Fu

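Affiliations: Princeton University; UC Davis

AI summary: Proposes ContextRL, which uses an indirect auxiliary objective (a context-selection reward) to strengthen fine-grained reasoning in long-context and multimodal tasks, with average gains of +2.2% on 5 long-horizon benchmarks and +1.8% on 12 visual question answering benchmarks.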

Comments 29 pages, 9 figures

Abstract

Large language models (LLMs) often fail when answering requires identifying a small but decisive piece of evidence within a long or complex context, such as a single line in a tool trace or a subtle detail in an image. We propose ContextRL, a context-aware reinforcement learning (RL) method that improves long-horizon reasoning and multimodal performance through an indirect auxiliary objective. Instead of supervising only the final answer, ContextRL presents the model with a query, an answer, and two highly similar contexts, and rewards it for selecting the context that supports the query-answer pair, thereby encouraging fine-grained grounding. We construct contrastive context data in two domains: for coding agents, trajectories serve as contexts, yielding 1K pairs built via condition filtering; for multimodal reasoning, images serve as contexts, yielding 7K pairs built via generative editing and similarity search. ContextRL achieves average gains of +2.2% over standard GRPO on 5 long-horizon benchmarks, and +1.8% across 12 diverse visual question answering benchmarks. To disentangle the effect of the proposed objective from that of additional data, we compare against data-augmentation baselines that repurpose the same contrastive contexts as standard query-context-answer examples. These baselines provide little to no improvement, showing that the gains arise from the proposed context-selection objective rather than from the contrastive data alone.
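
The context-selection objective can be pictured as a reward on a two-way choice. The sketch below is illustrative only: it assumes a binary reward (1 for picking the supporting context, 0 otherwise) and a GRPO-style within-group standardization of rewards. The class name, field names, and example strings are hypothetical, and the paper's actual reward shaping may differ.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class ContrastiveExample:
    query: str
    answer: str
    contexts: tuple  # two highly similar contexts (trajectories or images)
    supporting: int  # index (0 or 1) of the context that supports the answer


def selection_reward(example, chosen):
    """Assumed binary reward: 1.0 for the supporting context, else 0.0."""
    return 1.0 if chosen == example.supporting else 0.0


def group_relative_advantages(rewards):
    """Standardize rewards within one group of samples for the same prompt."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0.0:
        return [0.0 for _ in rewards]  # all samples tied: no learning signal
    return [(r - mu) / sigma for r in rewards]


example = ContrastiveExample(
    query="Which call raised the error?",
    answer="parse_config",
    contexts=("trace where parse_config fails", "trace where load_model fails"),
    supporting=0,
)
choices = [0, 1, 1, 0]  # four sampled selections for the same example
rewards = [selection_reward(example, c) for c in choices]
advantages = group_relative_advantages(rewards)
```

Samples that select the supporting context receive positive advantages and the others negative ones, so the policy update pushes toward grounding the choice in the distinguishing detail.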

2606.17052 2026-06-16 math.NA cs.NA New submission

Nitsche-based FEM for the Laplace eigenvalue problem: spectral approximation and a posteriori error analysis

Arbaz Khan, David Mora, Jesus Vellojin

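AI summary: Analyzes the Nitsche method for elliptic eigenvalue problems with weakly imposed essential boundary conditions within the framework of compact operator theory, proves norm convergence of the discrete solution operator, derives error estimates for eigenvalues and eigenfunctions, and proposes a residual-based a posteriori estimator suitable for adaptive refinement.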

Abstract

In this paper, we present the numerical analysis of an elliptic eigenvalue problem in which the essential boundary condition is imposed weakly by means of the Nitsche method. The resulting discrete eigenvalue problem is studied within the framework of compact operator theory. We prove norm convergence of the discrete solution operator and derive error estimates for the eigenvalues and eigenfunctions, with rates depending on the chosen Nitsche variant. In addition, we develop an a posteriori error analysis and propose a residual-based estimator suitable for adaptive refinement. Several numerical experiments are presented to assess the convergence, stability and robustness of the method, including the influence of the Nitsche stabilization parameter and the performance of the adaptive strategy.
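
For orientation, the standard textbook form of a Nitsche discretization of the Dirichlet Laplace eigenvalue problem is sketched below. The notation ($V_h$, $\gamma$, $h_E$, $\varepsilon$) is an assumption and may differ from the paper's; the parameter $\varepsilon \in \{-1, 0, 1\}$ selects the symmetric, incomplete, and nonsymmetric variants respectively.

```latex
\begin{align}
  &-\Delta u = \lambda u \ \text{in } \Omega, \qquad u = 0 \ \text{on } \partial\Omega, \\
  &\text{find } (\lambda_h, u_h) \in \mathbb{R} \times V_h,\ u_h \neq 0: \quad
    a_h(u_h, v_h) = \lambda_h\,(u_h, v_h)_{\Omega} \quad \forall v_h \in V_h, \\
  &a_h(u, v) = (\nabla u, \nabla v)_{\Omega}
    - \langle \partial_n u, v \rangle_{\partial\Omega}
    + \varepsilon\,\langle u, \partial_n v \rangle_{\partial\Omega}
    + \sum_{E \subset \partial\Omega} \frac{\gamma}{h_E}\,\langle u, v \rangle_{E}.
\end{align}
```

The first boundary term comes from integration by parts and keeps the scheme consistent; the last two terms vanish for the exact eigenfunction (since $u = 0$ on $\partial\Omega$), and the stabilization parameter $\gamma$ controls how strongly the boundary condition is enforced.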

2606.17051 2026-06-16 cs.CG cs.DS math.MG New submission

A constant-factor approximation of the Gromov-Hausdorff distance in the plane

Sushovan Majhi

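AI summary: Gives the first polynomial-time constant-factor approximation of the Gromov-Hausdorff distance between finite point sets in the plane, obtained via the bijective bottleneck distance and a fat-or-collinear dichotomy, and proves that each ingredient is necessary.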

Abstract

We give the first polynomial-time constant-factor approximation of the Gromov-Hausdorff distance $d_{GH}$ between finite point sets in the Euclidean plane; in fixed Euclidean dimension such an approximation was previously known only on the line (Majhi, Vitter, and Wenk, 2024). Its engine is the bijective (bottleneck) Gromov-Hausdorff distance $d_{GH}^{bij}$: for two equal-size sets the least additive distortion $\max_{i,j}|d_X(i,j) - d_Y(\sigma(i), \sigma(j))|$ of a bijection $\sigma$ equals $2\,d_{GH}^{bij}$, which we likewise approximate within an absolute constant. Approximating additive distortion goes back to Hall and Papadimitriou (2005), who gave a $2$-approximation on the line and observed approximation within $3$ to be NP-hard in dimension three; the planar case they left open is the one we settle. A fat-or-collinear dichotomy drives both bounds: a fat set is aligned by a single rigid motion, while a near-collinear set is split into clusters matched along their dendrogram in one flat, scale-free pass, with relative orientations and per-node reflection signs (at every scale of the dendrogram) recovered by global cuts. Relaxing bijections to correspondences yields $d_{GH}$ itself, which reduces to a lone within-cluster-multiplicity kernel (the pairs an optimal correspondence collapses) that the same theory closes. Matching lower bounds (a dimension drop, a multiplicity gap, and a reflection barrier acting at every scale) show each ingredient is necessary.
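
To make the quantity being approximated concrete, the sketch below computes the additive distortion of a bijection and, by exhaustive search over all $n!$ bijections, the exact bijective distance for tiny planar point sets. This brute force only illustrates the definition quoted in the abstract; it is not the paper's polynomial-time algorithm, and the function names are hypothetical.

```python
import math
from itertools import permutations


def additive_distortion(X, Y, sigma):
    """max over i, j of |d_X(i, j) - d_Y(sigma(i), sigma(j))| for a bijection sigma."""
    n = len(X)
    return max(
        abs(math.dist(X[i], X[j]) - math.dist(Y[sigma[i]], Y[sigma[j]]))
        for i in range(n)
        for j in range(n)
    )


def bijective_gh_bruteforce(X, Y):
    """Exact bijective distance: half the least additive distortion (n! time)."""
    if len(X) != len(Y):
        raise ValueError("the bijective distance needs equal-size sets")
    least = min(additive_distortion(X, Y, s) for s in permutations(range(len(X))))
    return least / 2.0


X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
Y = [(5.0, 5.0), (5.0, 6.0), (4.0, 5.0)]  # a rotated and translated copy of X
Z = [(0.0, 0.0), (2.0, 0.0), (0.0, 1.0)]  # X with one leg stretched to length 2
```

Because `Y` is isometric to `X`, the distance between them is zero, while stretching one leg by 1 forces a distortion of 1 and hence a distance of 0.5 between `X` and `Z`.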

2606.17050 2026-06-16 eess.SY cs.SY math.OC New submission

Optimal Bounded Thrust Powered Descent with Analytical Ground-Collision Avoidance

Or Nataf, Vitaly Shaferman

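AI summary: Proposes a new approach to the bounded-thrust powered-descent problem that approximates the mass with a time-dependent polynomial and separates the thrust allocation hierarchically, yielding analytical ground-collision avoidance and a saturation-aware guidance law.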

Comments This work has been submitted for journal publication. 32 pages and 15 figures

Abstract

The paper proposes a new approach to address the bounded-thrust powered-descent problem while ensuring ground-collision avoidance. A time-dependent polynomial approximation of the mass is employed to formulate a bounded linear-quadratic optimal-control problem that minimizes the thrust-acceleration control effort, terminal miss, and terminal velocity error. The resulting approximation is used to impose a hard constraint on the horizontal thrust profile while keeping the vertical thrust profile unconstrained. The key idea is a hierarchical separation of the thrust allocation, which enables analytical ground-collision avoidance under bounded thrust. Unlike existing bounded-thrust powered-descent approaches based on numerical optimization and trajectory-shaping constraints, the proposed method provides explicit analytical collision-avoidance conditions. Building on this formulation, the guidance law predicts the switching times between saturated and unsaturated arcs and shapes the thrust-acceleration profile to achieve a soft landing, even when the controller remains saturated over extended portions of the trajectory. Owing to its analytical nature, the guidance law is computationally efficient, and its continuous thrust profile facilitates real-time implementation. The proposed method was evaluated over a grid of perturbed initial conditions in realistic simulations, demonstrating accurate collision-free soft-landing performance. The results highlight the importance of combining saturation-aware guidance with ground-collision avoidance under bounded thrust.

2606.17049 2026-06-16 cs.CV New submission

BRDFusion: Physics Meets Generation for Urban Scene Inverse Rendering

Yi-Ruei Liu, Jie-Ying Lee, Zheng-Hui Huang, Yu-Lun Liu, Chih-Hao Lin

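AI summary: Proposes BRDFusion, a framework that combines physical modeling with generative priors for inverse rendering of urban scenes, fixing artifacts while preserving physical consistency and supporting novel-view relighting, night simulation, and dynamic object editing.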

Comments Project page: https://shigon255.github.io/brdfusion-page/

Abstract

Inverse rendering of urban scenes from captured videos enables numerous applications, including content creation and autonomous driving simulation. Physically-based rendering methods follow and control lighting physics, but suffer from reconstruction and rendering artifacts. While generative models produce realistic videos, they offer limited consistency and controllability. We present BRDFusion, a unified framework that combines two complementary models for inverse and forward rendering. Specifically, BRDFusion recovers explicit, consistent scene properties with physical modeling and alleviates optimization ambiguity with generative priors. During forward rendering, the physical model provides controllable rendering from the scene configuration, and the generative model denoises and fixes artifacts. Therefore, our method produces high-quality videos while allowing precise control, outperforming baselines in real and synthetic scenes. Moreover, BRDFusion supports novel-view relighting, night simulation, and dynamic object insertion/editing. Project page: https://shigon255.github.io/brdfusion-page/

2606.17048 2026-06-16 cs.LG cs.CV stat.ML New submission

Exact Posterior Score Estimation for Solving Linear Inverse Problems

Abbas Mammadov, Ozgur Kara, Kaan Oktay, Iskander Azangulov, Adil Kaan Akan, Hyungjin Chung, James Matthew Rehg, Yee Whye Teh

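Affiliations: University of Oxford; UIUC; EverEx

AI summary: Proposes Exact Posterior Score (EPS), which uses a closed-form posterior score to turn linear inverse problems into denoising problems without likelihood gradients or projections, outperforming existing methods on FFHQ and ImageNet.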

Abstract

Diffusion and flow-based models learn powerful data priors by training a denoiser to reverse Gaussian corruption. To use this prior to solve a linear inverse problem, one needs to sample from the posterior, but the score that the prior provides is the unconditional score, not the posterior score. Existing methods either steer a fixed pretrained denoiser with approximate measurement-matching corrections, or train a conditional restoration model that abandons the denoising structure of the prior. We derive the exact posterior score in closed form for linear Gaussian inverse problems under general Gaussian interpolants, and show that posterior sampling reduces to a denoising problem at an operator-dependent shifted pivot under an anisotropic noise covariance. We turn this identity into Exact Posterior Score (EPS), a denoising training objective that preserves the input/output structure of standard pretraining and can therefore be trained from scratch or fine-tuned from a pretrained denoiser. At inference, EPS uses the same sampler as the underlying backbone, with no likelihood gradients or projections. We evaluate EPS on five linear inverse problems across FFHQ and ImageNet, where it outperforms training-free and training-based baselines on fidelity, perceptual, and distributional metrics, while using roughly an order of magnitude fewer denoiser evaluations than gradient-based posterior samplers.
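
The gap between the unconditional score and the posterior score can be written out with standard identities. The block below states the setting under assumed notation ($A$, $\sigma_y$, $\alpha_t$, $\sigma_t$ are not taken from the abstract, and isotropic measurement noise is an assumption); it does not reproduce the paper's closed-form result.

```latex
\begin{align}
  y &= A x_0 + n, \qquad n \sim \mathcal{N}(0, \sigma_y^2 I)
    && \text{(linear Gaussian measurement)} \\
  x_t &= \alpha_t x_0 + \sigma_t \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)
    && \text{(Gaussian interpolant)} \\
  \nabla_{x_t} \log p_t(x_t \mid y)
    &= \nabla_{x_t} \log p_t(x_t) + \nabla_{x_t} \log p_t(y \mid x_t)
    && \text{(Bayes' rule for scores)} \\
  \mathbb{E}[x_0 \mid x_t]
    &= \frac{x_t + \sigma_t^2\, \nabla_{x_t} \log p_t(x_t)}{\alpha_t}
    && \text{(Tweedie's formula)}
\end{align}
```

A pretrained denoiser supplies only the first term on the right of the third line; the likelihood term $\nabla_{x_t} \log p_t(y \mid x_t)$ is the part that guidance-based samplers approximate and that the abstract says EPS handles exactly by recasting posterior sampling as a modified denoising problem.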

2606.17046 2026-06-16 cs.RO cs.CV cs.LG New submission

Geometric Action Model for Robot Policy Learning

Jisang Han, Seonghu Jeon, Jaewoo Jung, René Zurbrügg, Honggyu An, Tifanny Portela, Marco Hutter, Marc Pollefeys, Seungryong Kim, Sunghwan Hong

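Affiliations: KAIST AI; ETH Zurich; ETH AI Center

AI summary: Proposes the Geometric Action Model (GAM), a language-conditioned manipulation policy that reuses a pretrained geometric foundation model (GFM) as a shared backbone and outperforms existing methods on simulation and real-robot tasks.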

Comments Project page: https://cvlab-kaist.github.io/Geometric-Action-Model/

Abstract

Generalist robot policies must follow user instructions while reasoning about how objects, cameras, and robot actions interact in the 3D physical world. Recent vision-language-action models (VLAs) and video world-action models (WAMs) inherit strong semantic or temporal priors from large-scale foundation models, but they still operate primarily on 2D image frames or 2D-derived latent spaces, leaving implicit the 3D geometry required for contact-rich manipulation. We propose the Geometric Action Model (GAM), a language-conditioned manipulation policy that directly repurposes a pretrained geometric foundation model (GFM) as a shared substrate for perception, temporal prediction, and action decoding. GAM splits the GFM at an intermediate layer: the shallow layers serve as an observation encoder, and a causal future predictor inserted at the split layer forecasts future latent tokens conditioned on language, proprioception, and action history. The predicted future tokens are then routed through the remaining GFM blocks for feature propagation and decoding, allowing a single backbone to produce both future geometry and actions. This design equips the GFM with language-conditioned temporal world modeling through minimal architectural modification while preserving its rich geometric priors. Across a broad suite of simulation and real-robot manipulation benchmarks, GAM is more accurate, more robust, faster, and lighter than current foundation-model-scale baselines.

2606.17043 2026-06-16 cs.RO cs.LG New submission

Hierarchical Advantage Weighting for Online RL Fine-Tuning of VLAs from Sparse Episode Outcomes

Tongyan Fang, Siyuan Huang, Naiyu Fang, Ganlong Zhao, Zhongjin Luo, Jianbo Liu, Xiaogang Wang, Ying Dong, Hongsheng Li

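Affiliations: ACE Robotics; Shenzhen International Graduate School, Tsinghua University; The Chinese University of Hong Kong

AI summary: Proposes Hierarchical Advantage-Weighted Behavior Cloning (HABC), which separates viability and efficiency objectives and balances them adaptively to address credit assignment when fine-tuning VLA policies online from sparse binary outcomes, raising success rates on three contact-rich bimanual tasks from 12-44% to 38-92%.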

Comments Website: https://acerobotics-vla.github.io/HABC-Website

Abstract

When pretrained VLA policies are fine-tuned through online RL, each rollout episode produces only a single binary outcome (success or failure), yet the actor update requires per-transition supervision. Existing approaches commonly reduce this sparse outcome to a single scalar reward or advantage signal, which conflates distinct forms of transition-level feedback and provides limited guidance once basic task success becomes achievable. First, a single scalar signal conflates the two objectives of viability and efficiency; once basic success is achieved, the binary label provides no gradient to distinguish efficient completions from slow ones. Second, real-world rollouts mix autonomous and intervention segments; naively assigning episode outcomes across these boundaries introduces incorrect credit assignment. To address these issues, we propose Hierarchical Advantage-Weighted Behavior Cloning (HABC), which trains separate critic heads for these two objectives on different data subsets and combines their outputs with a state-adaptive balance. A state-adaptive gate $g_t$ merges their one-step advantages, prioritizing viability when success is uncertain and shifting to efficiency only when viability is high, and converts the result into per-transition weights on the actor loss. Intervention-aware credit assignment further restricts outcome labels to segments executed by the current policy, preventing supervision from leaking across intervention boundaries. In real-robot experiments on three contact-rich bimanual tasks, HABC raises success from supervised fine-tuning (SFT) baselines of 36%, 44%, and 12% to 92%, 88%, and 38%.
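
The gating idea can be sketched in a few lines. Everything concrete here is an assumption made for illustration: the sigmoid form of the gate $g_t$, the threshold and sharpness values, the exponential advantage weighting with a clip, and all function names. The abstract only states that the gate favors viability when success is uncertain, shifts to efficiency when viability is high, and that outcome labels are restricted to policy-executed segments.

```python
import math


def gate(viability, threshold=0.8, sharpness=10.0):
    """Assumed gate g_t in (0, 1): near 0 when viability is low, near 1 when high."""
    return 1.0 / (1.0 + math.exp(-sharpness * (viability - threshold)))


def merged_advantage(adv_viability, adv_efficiency, viability):
    """Blend the two one-step advantages with the state-adaptive gate."""
    g = gate(viability)
    return (1.0 - g) * adv_viability + g * adv_efficiency


def actor_weight(advantage, beta=1.0, max_weight=20.0):
    """Assumed exponential advantage weighting, clipped for stability."""
    return min(math.exp(advantage / beta), max_weight)


def outcome_labels(executed_by_policy, success):
    """Label only transitions executed by the current policy; skip interventions."""
    return [float(success) if by_policy else None for by_policy in executed_by_policy]


uncertain = merged_advantage(adv_viability=1.0, adv_efficiency=-1.0, viability=0.2)
confident = merged_advantage(adv_viability=1.0, adv_efficiency=-1.0, viability=0.99)
labels = outcome_labels([True, False, True], success=True)
```

With low estimated viability the merged advantage follows the viability critic, and with high viability it follows the efficiency critic, so a slow but safe transition stops being rewarded only once success is already likely.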

2606.17040 2026-06-16 cs.RO cs.CV New submission

R2RDreamer: 3D-aware Data Augmentation for Spatially-generalized 2D Manipulation Policies

Xiuwei Xu, Haowen Sun, Angyuan Ma, Yiwei Zhang, Zhenyu Wu, Xiaofeng Wang, Bingyao Yu, Zheng Zhu, Jie Zhou, Jiwen Lu

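Affiliations: Tsinghua University; BUPT; GigaAI

AI summary: Proposes R2RDreamer, a framework that uses lightweight 3D editing and 2D video completion to generate geometrically consistent augmented data from a few real demonstrations, improving the spatial generalization of 2D manipulation policies.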

Comments Project page: https://r2rdreamer.github.io/

Abstract

Spatial generalization is critical for imitation-learned manipulation policies, but achieving it typically requires scaling demonstrations across diverse object poses, robot configurations, and camera viewpoints. Data augmentation from a few source demonstrations offers a practical alternative to costly real-world collection. Simulation-based augmentation can create controllable variation, but requires complex environment and object setup and may introduce a sim-to-real gap. Recent real-to-real methods avoid these issues by jointly editing 3D observations and action trajectories from real demonstrations, yet they still rely on strong 3D scene parsing and geometry completion, and often produce observations tailored to 3D pointcloud policies rather than RGB-based 2D policies. We propose R2RDreamer, a real-to-real demonstration augmentation framework that preserves the geometric consistency of 3D action-observation editing while moving visual completion to 2D video space. Specifically, R2RDreamer first performs lightweight 3D augmentation by editing incomplete object pointclouds and end-effector trajectories in a shared 3D frame; it then projects the edited scene into masked image-space control videos with occlusion-aware reasoning and uses a dense-control image-to-video model to complete temporally coherent RGB observations. Experiments on spatially shifted manipulation tasks with both 2D diffusion-style policies and vision-language-action policies show that R2RDreamer improves spatial generalization from limited source demonstrations, with analyses validating the contributions of 3D editing, occlusion-aware projection, and video completion.

2606.17037 2026-06-16 cs.CV cs.AI cs.LG New submission

The Importance of Phase in Neural Representations: An Internal Oppenheim-Lim Test of Image Classifiers

Alper Yıldırım

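AI summary: Using internal phase-magnitude transplant experiments, finds that the predictions of image classifiers (PRISM2D, GFNet, ViT-B/16) depend mainly on phase/sign information while image-specific magnitude contributes little to the readout; ResNet-50 carries a latent sign code before its ReLUs, which offers a mechanism for the texture-shape gap between CNNs and attention models.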

Abstract

Oppenheim and Lim (1981) showed that natural images stay recognizable when reconstructed from their Fourier phase alone, while the magnitude carries little of their identity. We ask whether trained image classifiers reproduce this asymmetry inside their hidden layers, and we test it causally: given two images, we transplant the phase of one onto the magnitude of the other at a chosen layer and record which image the prediction follows. In PRISM2D, GFNet, and ViT-B/16 the prediction follows the phase or sign donor, and deleting all image-specific magnitude barely moves accuracy, so identity rides on phase while image-specific magnitude is largely dispensable to the readout. ResNet-50 at first seems to break the pattern, because transplanting sign after its ReLUs does nothing; a fair intervention before the ReLU reveals a strong latent sign code in the late blocks, and a DC-only control shows the readout consumes a channel-wise spatial average. Controls rule out the trivial case in which magnitude simply stops depending on the image. The architectures therefore share a phase/sign identity code but expose it in different bases, set by rectification and readout geometry, which gives a mechanistic account of the texture-shape gap between CNNs and attention models.
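
The phase-magnitude transplant is easiest to see on a plain signal. The sketch below performs the classic Oppenheim-Lim swap in the Fourier domain on short 1D real sequences with a naive DFT, so it needs only the standard library. It illustrates the operation itself, not the paper's hidden-layer intervention, and the function names are hypothetical.

```python
import cmath


def dft(x):
    """Naive discrete Fourier transform (O(n^2)), fine for tiny inputs."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(X):
    """Inverse of dft."""
    n = len(X)
    return [
        sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def transplant(phase_donor, magnitude_donor):
    """Combine the Fourier phase of one signal with the magnitude of another."""
    P, M = dft(phase_donor), dft(magnitude_donor)
    hybrid = [abs(m) * cmath.exp(1j * cmath.phase(p)) for p, m in zip(P, M)]
    # For real inputs the hybrid spectrum is conjugate-symmetric up to rounding,
    # so the imaginary parts after inversion are negligible and are dropped.
    return [v.real for v in idft(hybrid)]


a = [1.0, 2.0, 4.0]  # phase donor
b = [3.0, 0.0, 1.0]  # magnitude donor
hybrid_signal = transplant(a, b)
```

The spectrum of `hybrid_signal` has the magnitudes of `b` and the phases of `a`; in the paper's test, the question is which donor a classifier's prediction follows after such a swap.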

2606.17035 2026-06-16 cs.LG cs.CR New submission

Your Privacy My Cloak: Backdoor Attacks on Differentially Private Federated Learning

Xiaolin Li, Ning Wang, Ninghui Li, Wenhai Sun

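AI summary: Proposes RING, an attack on differentially private federated learning that exploits the masking effect of differential privacy to evade defenses, reaching an average attack success rate of 90.3% under a moderate privacy budget.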

Abstract

Prior research suggests that differential privacy (DP) inherently enhances the robustness of federated learning (FL) against backdoor attacks. In this paper, we challenge this assumption. Through an empirical analysis of two baseline attack strategies, we uncover a fundamental tension in DP-FL: while bypassing DP allows state-of-the-art defenses to detect and filter malicious updates, complying with DP inadvertently masks their distinguishing statistical characteristics. Consequently, existing defenses become ineffective as DP reduces the raw backdoor signal. Building on this masking effect, we propose RING, a novel attack that explicitly exploits DP to conceal malicious contributions while maximizing attack impact. By collaboratively crafting adversarial perturbations, compromised clients reconstruct a strong backdoor signal during aggregation without triggering anomaly detection. RING operates as a perturbation layer that is agnostic to the underlying backdoor technique, making it broadly applicable and composable with existing attacks, a property that significantly amplifies the threat it poses to DP-FL. Extensive evaluations across four image and text datasets under non-IID distributions show that RING achieves an average attack success rate of 90.3% against six state-of-the-art defenses under a moderate privacy budget, an improvement of up to 26.08x over baseline strategies. Finally, we evaluate potential countermeasures and find that mitigating this threat incurs significant utility trade-offs, exposing a fundamental security gap in the deployment of differentially private FL.

2606.17034 2026-06-16 cs.CL cs.LG New submission

KVEraser: Learning to Steer KV Cache for Efficient Localized Context Erasing

Mufei Li, Shikun Liu, Dongqi Fu, Haoyu Wang, Yinglong Xia, Hong Li, Hong Yan, Pan Li

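Affiliations: Georgia Institute of Technology; Meta

AI summary: Proposes KVEraser, which learns to steer the KV cache to erase localized context without global recomputation, approaching full-recomputation performance on long-context tasks while latency grows by only 24%.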

Comments Oral at the ICML 2026 Workshop on the Impact of Memorization on Trustworthy Foundation Models

English abstract

Post-hoc context erasing over the KV cache is challenging because a local edit has a global consequence: once a span has been processed, its influence propagates into the cached states of all subsequent tokens. This issue arises naturally in long-context LLM applications, where stale retrieved facts, incorrect tool observations, retracted user preferences, or harmful prompt injections may be identified only after prefill. Exact erasing must then recompute all tokens after the deleted span, making its computational cost depend on suffix length rather than erased-span length. We introduce KVEraser, a learned KV-cache editing method for efficient localized context erasing. Given a processed context and a span to remove, KVEraser replaces only the KV states of the erased interval with learned steering states while reusing the remaining cache unchanged. To learn a transferable erasing mechanism, we build a two-stage training pipeline: generic span-neighbor pre-training teaches the eraser to suppress the influence of the erased span, while task-specific fine-tuning adapts this capability to downstream scenarios. Experiments show that KVEraser nearly matches full recomputation in post-erasure performance on in-domain tasks across 1K--32K context lengths, while its latency increases by only 24% compared with a 17.6x increase for full recomputation. KVEraser also generalizes to unseen long-document QA tasks with harmful factual distractors, achieving the best performance among approximate baselines with a 3--4x speedup over full recomputation.

2606.17029 2026-06-16 cs.CL New submission

DEEPRUBRIC: Evidence-Tree Rubric Supervision for Efficient Reinforcement Learning of Deep Research Agents

Minghang Zhu, Chuyang Wei, Junhao Xu, Yilin Cheng, Zhumin Chen, Jiyan He

Affiliations: Shandong University, Zhongguancun Academy, Fudan University

AI summary: Proposes DeepRubric, a framework that builds evidence trees to generate query-rubric pairs so that the reward evaluates exactly the information the query requests, matching prior state-of-the-art models with roughly 13x fewer RL GPU-hours.

English abstract

Deep research agents synthesize long-form reports by searching and reasoning over retrieved evidence. Reinforcement learning with rubric-based rewards improves these agents by optimizing them against checkable criteria that translate report quality into reward signals, but its efficiency depends on whether those criteria reliably capture the task scope and evidence needs. Most existing studies ask an LLM to generate rubrics for a given query, but when the model fails to infer the underlying information needs, the generated rubrics may be incomplete and reduce RL efficiency. To obtain more reliable query--rubric supervision, we introduce DeepRubric, a data construction framework that reverses this process: instead of inferring evaluation criteria for a given query, it first determines what an evidence-backed report should be evaluated on and then synthesizes aligned query--rubric pairs from those evaluation targets. Starting from a sampled seed topic, DeepRubric builds an evidence tree by recursively expanding evidence-backed sub-questions, whose leaves serve as atomic and verifiable evaluation targets. It then uses the evidence tree to synthesize the training query and rubrics, ensuring that the reward evaluates exactly the information requested by the query. Using DeepRubric, we construct 9K query--rubric supervision examples and train DeepRubric-8B with rubric-based GRPO, achieving comparable performance to prior open state-of-the-art deep research models across three benchmarks with roughly 13x fewer RL GPU-hours.

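2606.17028 2026-06-16 cs.LG cs.AI cs.AR New submission

HAMON: Passive Optical Sequence Mixing for Long-Horizon Forecasting

Alper Yıldırım

AI summary: Proposes HAMON, a passive diffractive optical forecasting core that replaces digital sequence-mixing layers with optical propagation; it outperforms or approaches the strongest digital baselines on several benchmarks, reducing MSE by up to 14%.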

English abstract

Simple linear and frequency-domain models remain surprisingly competitive in long-horizon time-series forecasting, and recent mechanistic evidence suggests that standard forecasting benchmarks may not require the dense superposed representations that make transformers powerful in other domains. This raises a substrate-level question: if the core forecasting operator is often low-complexity and approximately linear, does it need to be implemented as learned digital temporal mixing? We introduce HAMON, a passive diffractive optical forecasting core in which historical values are encoded onto an optical aperture, future positions are left dark, and cascaded trainable phase masks with free-space diffraction shape the forecast directly in the output field. At inference, prediction is performed by a single passive optical propagation pass with no trainable digital sequence-mixing layer. Across standard benchmarks, HAMON outperforms the strongest digital baselines considered on ETTm2 at all horizons and on ETTh2 at all but the longest horizon, improving MSE by up to 14% and doing so consistently across horizons rather than at isolated points. It is competitive on Weather and trails the strongest baselines on the remaining ETT settings and on the high-channel-count Traffic and Electricity datasets. Phase encoding, intensity-compatible readout, and phase-scrambling ablations, together with a TorchOptics cross-simulator check, indicate that the forecasts arise from the data-bearing optical field rather than from a digital forecasting head. Because the passive core uses standard Fourier optics, HAMON defines a concrete target for optical hardware and for passive physical sequence mixing.

2606.17027 2026-06-16 cs.CV New submission

MeshLoom: Feed-Forward Non-Rigid Registration of Mesh Sequences

Jianqi Chen, Jiraphon Yenphraphai, Xiangjun Tang, Sergey Tulyakov, Chaoyang Wang, Peter Wonka, Rameen Abdal

Affiliations: KAUST (Saudi Arabia), Snap Inc. (United States of America), Purdue University (United States of America)

AI summary: Proposes MeshLoom, a feed-forward registration network whose topology-aware encoder-decoder directly reconstructs vertex deformations across mesh sequences; it registers multiple meshes within seconds, achieves state-of-the-art non-rigid registration, and also supports motion interpolation and mesh morphing.

Comments: Project page: https://meshloom.github.io/

English abstract

We present MeshLoom, a feed-forward registration network that directly reconstructs vertex deformations across mesh sequences. Our approach advances non-rigid registration beyond existing models, which are typically constrained by costly per-instance optimization, narrow object categories, pairwise-only inputs, or merely intermediate outputs. The network is simple and efficient, registering multiple meshes within seconds. At its core lies a topology-aware encoder--decoder design. Specifically, we first introduce a topology-aware point representation that encodes the anchor (reference) mesh's topology into its per-vertex features. This representation strengthens the network's understanding of the anchor-mesh geometry and disambiguates points that are Euclidean-close yet geodesically distant. We then propose a multi-modal encoder that fuses this anchor-mesh representation with complementary cues from each frame, such as shape latents and image features. These multi-source signals are compressed into a compact global motion embedding that captures dense inter-frame correspondence. A lightweight decoder then queries this global embedding with the anchor-mesh point representation, retrieving per-vertex deformations at target timestamps. Through extensive experiments across diverse motions and object categories, we show that MeshLoom achieves state-of-the-art results on non-rigid registration. In addition, we find that our global embedding-then-query paradigm naturally enables the network to generate deformations at intermediate timestamps, which extends MeshLoom to motion interpolation and mesh morphing. Project page: https://meshloom.github.io/ .

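2606.17024 2026-06-16 cs.LG New submission

ExpRL: Exploratory RL for LLM Mid-Training

Violet Xiang, Amrith Setlur, Chase Blagden, Nick Haber, Aviral Kumar

Affiliations: Stanford University, Carnegie Mellon University, OpenAI, Rogo

AI summary: Proposes ExpRL, which uses human-written question-answer data as reward scaffolds and applies dense rewards to reinforce partial progress and useful behaviors during reasoning; on math reasoning tasks it outperforms SFT, sparse-reward GRPO, and self-distillation, and gives a better initialization for subsequent sparse-reward RL.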

English abstract

Sparse reward reinforcement learning (RL) has become a standard tool for improving LLM reasoning, but its success depends critically on the coverage present in the base model. In practice, models are often primed for RL through mid-training on curated reasoning traces that teach useful primitive skills such as decomposition, verification, or self-correction. Although effective, this strategy requires manually specifying what the model should learn, and it remains unclear whether such primitive coverage is enough for much harder problems, which require combining these skills into broader solution strategies. We study a more automated approach: RL-based mid-training using large corpora of human-written question-answer data. Rather than treating reference solutions as targets to imitate, our method, ExpRL, uses them as reward scaffolds: references are hidden from the policy and used only to construct problem-specific grading rubrics for judging on-policy reasoning traces. The policy samples from the original problem prompt, while an LLM judge compares the sampled reasoning trace against the reference solution and assigns outcome-level or process-level dense rewards. This lets ExpRL reinforce partial progress, useful intermediate reductions, and productive reasoning behaviors that sparse final-answer rewards often fail to upweight. On challenging math reasoning tasks, ExpRL yields stronger RL priming than SFT, sparse-reward GRPO, and self-distillation, and provides a better initialization for subsequent sparse-reward RL. Additional mixed-domain experiments further suggest that ExpRL can extend beyond the original math-only setting.

2606.17022 2026-06-16 math.ST cs.LG stat.ML stat.TH New submission

Learning the Geometry of Data: A Mathematical Review of Shape Space Analysis

Gary P. T. Choi, Khanh Dao Duc, Shira Faigenbaum-Golovin, Karen Habermann, Emmanuel Hartman, Christoph von Tycowicz, Chi Zhang, Wenjun Zhao, Felix Zhou

AI summary: Surveys shape space analysis, drawing on differential geometry, statistics, and machine learning to organize an analytical pipeline from shape representation to geometry-aware learning for characterizing nonlinear structure in geometric data.

Comments: 79 pages, 10 figures, 8 tables

English abstract

A central objective of machine learning is to identify structure and patterns in data. Advances in data acquisition have increasingly produced datasets whose observations possess rich geometric form, giving rise to shape spaces that encode variability in object geometry. Such datasets arise across a wide range of disciplines, including biology, medicine, anthropology, and computer vision, where subtle geometric differences often carry important scientific information. Traditional machine learning methods, however, are frequently ill-equipped to account for the nonlinear geometric structure underlying these data. This survey synthesizes a rapidly growing body of work on shape space analysis, which provides a mathematical and computational framework for the study of geometric data. Drawing on ideas from differential geometry, statistics, and machine learning, we organize the literature around a common analytical pipeline: shape representation and parameterization, the rigorous construction of robust geodesic metrics, statistical analysis on shape spaces, and geometry-aware learning methods. We discuss how these tools enable the characterization of shape variability, the comparison of geometric objects, and the analysis of structural trajectories across populations and time. To illustrate the breadth of the field, we highlight applications spanning multiple scales of biological organization, including studies of subcellular morphology and primate tooth evolution. Across these and many other domains, researchers face common challenges arising from complex, nonlinear, and often unaligned geometric variation. The review concludes by identifying key theoretical and computational challenges, as well as emerging opportunities driven by increasingly large and diverse geometric datasets.

2606.17020 2026-06-16 cs.CV cs.AI New submission

FusionRS: A Large-Scale RGB-Infrared Remote Sensing Dataset for Dual-Modal Vision-Language Foundation Models

Jiaju Han, Ben Zhang, Xuemeng Sun, Qike Zhang, Yuxian Dong, Chengyin Hu, Fengyu Zhang, Yiwei Wei, Jiujiang Guo

Affiliations: China University of Petroleum-Beijing at Karamay, University of Electronic Science and Technology of China, Tianjin University

AI summary: To address the lack of infrared data in remote sensing vision-language models, proposes FusionRS, the first large-scale RGB-infrared-text dataset, built by translating RGB images into infrared-style counterparts paired with IR-aware captions; dual-modal foundation models trained on it improve RGB-IR alignment and dual-modal captioning.

English abstract

Remote sensing vision-language models have advanced Earth observation understanding, but most existing work remains centered on RGB imagery, leaving the complementary information in infrared data underexplored. Infrared images provide distinctive cues, including thermal intensity structures, object boundaries, and illumination-invariant scene features, which can enrich visual-language learning beyond conventional RGB observations. However, a large-scale RGB-infrared-text dataset for remote sensing vision-language modeling is still absent. To address this gap, we introduce FusionRS, the first large-scale RGB-infrared-text dataset designed for dual-modal vision-language learning in remote sensing. FusionRS is constructed by translating diverse public RGB remote sensing images into infrared-style counterparts, forming aligned RGB-IR image pairs. Each pair is associated with conventional scene captions and IR-aware captions that explicitly describe infrared-specific visual properties while preserving semantic content. Based on FusionRS, we train dual-modal vision-language foundation models for RGB-IR joint understanding. We first train CLIP-style models for RGB-IR-text alignment, and then fine-tune generative VLMs for dual-modal RGB-IR captioning. Experiments show that FusionRS improves RGB-IR alignment, infrared-to-text retrieval, and dual-modal captioning over RGB-only and non-IR-aware training settings. Ablation studies further verify that IR-aware captions are crucial for strengthening infrared-language alignment, highlighting the importance of modality-specific textual supervision for more scalable RGB-infrared remote sensing vision-language representation learning.

2606.17016 2026-06-16 cs.CL cs.AI cs.LG cs.MA New submission

TokenPilot: Cache-Efficient Context Management for LLM Agents

Buqiang Xu, Zirui Xue, Dianmou Chen, Chenyang Fu, Chiyu Wu, Caiying Huang, Chen Jiang, Jizhan Fang, Xinle Deng, Yijun Chen, Yunzhi Yao, Xuehai Wang, Jin Shang, Gong Yu, Ningyu Zhang

Affiliations: Zhejiang University, University of Electronic Science and Technology of China, Xi’an University of Electronic Science and Technology, HomologyAI

AI summary: To address the high inference cost caused by context accumulation in long LLM-agent sessions, proposes TokenPilot, a dual-granularity context management framework that combines Ingestion-Aware Compaction with Lifecycle-Aware Eviction, cutting costs by 56%-87% while maintaining competitive performance.

Comments: LightMem Series: Work in Progress

English abstract

As LLM agents are deployed in long-horizon sessions, context accumulation drives up inference costs. Existing approaches utilize text pruning or dynamic memory eviction to minimize token footprints; however, their unconstrained sequence mutations alter layouts, introducing prefix mismatches and cache invalidation. This reveals a critical trade-off between text sparsity and prompt cache continuity. To address this, we present TokenPilot, a dual-granularity context management framework. Globally, Ingestion-Aware Compaction acts as a framework harness to stabilize prompt prefixes and eliminate open-world environmental noise at the ingestion gate. Locally, Lifecycle-Aware Eviction monitors the ongoing residual utility of context segments, enforcing a conservative batch-turn schedule to offload content segments only when task relevance expires. Experiments on PinchBench and Claw-Eval under both isolated and continuous modes demonstrate that TokenPilot reduces costs by 61% and 56% in isolated mode, and 61% and 87% in continuous mode, while maintaining competitive performance compared to prior systems. TokenPilot has been integrated into LightMem2 at https://github.com/zjunlp/LightMem2.
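The prefix-mismatch problem described in this abstract can be illustrated with a toy model. The sketch assumes, as a simplification, that a prompt cache can be reused only up to the longest exactly matching token prefix; `reusable_prefix_len` is a hypothetical helper for illustration and is not part of TokenPilot or LightMem2.

```python
def reusable_prefix_len(cached_tokens, new_tokens):
    """Length of the longest common prefix of two token sequences.

    Assumption: the cache can be reused only up to the first mismatch.
    """
    n = 0
    for old, new in zip(cached_tokens, new_tokens):
        if old != new:
            break
        n += 1
    return n


cached = list(range(10))            # token ids already in the cache
appended = cached + [10, 11]        # new turn appended at the end
pruned = cached[:3] + cached[5:]    # tokens 3 and 4 removed mid-sequence

print(reusable_prefix_len(cached, appended))  # the whole cached prefix still matches
print(reusable_prefix_len(cached, pruned))    # matching stops at the removed span
```

Under this assumption, removing even a short span in the middle of the context forfeits reuse of everything after it, which is one way to read the trade-off between text sparsity and cache continuity.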

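2606.17014 2026-06-16 cs.LG math.ST stat.ML stat.TH New submission

Filtered Conformal Ellipsoids for Graph-Native Time Series

Yannick Limmer

Affiliations: DRW London

AI summary: Proposes filtered conformal ellipsoids, which combine state-space filtering with conformal calibration to build joint prediction sets for multivariate time series that control a single event while adapting to cross-coordinate dependence; coverage bounds are derived by analysing contraction in an observable predictive-law quotient.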

English abstract

Joint prediction sets for multivariate time series should control a single event while adapting to cross-coordinate dependence. We study filtered conformal ellipsoids: a frozen state-space filter emits a one-step predictive mean and covariance, and split-conformal calibration is applied to the resulting Mahalanobis scores. The filter is used to choose the ellipsoid shape; conformal calibration chooses the scalar radius, so the construction benefits from a learned predictive covariance without relying on Gaussian tail probabilities for coverage. The main difficulty is that filtered scores are dependent and learned recurrent filters need not contract in their raw hidden state; we therefore analyse contraction in an observable predictive-law quotient that identifies hidden states producing the same future sequence of emitted Gaussian laws. Under a stable Bayes Gaussian-projection filter, covariance bounds, and a finite-horizon observability Fisher condition, small excess Gaussian negative log-likelihood implies contraction of the learned emitted laws. Combined with a threshold-autocovariance envelope, this yields a Chebyshev-type approximate coverage bound for filtered split-conformal prediction under dependence; a sharper Bernstein-type bound requires an additional geometric-mixing concentration assumption. Under Gaussian oracle realisability we also obtain a near-oracle log-volume comparison within the class of conditionally valid Gaussian ellipsoid rules. We instantiate the framework with a GCN-GRU filter with diagonal-plus-low-rank covariance. On moderate-size graph-native traffic benchmarks (METRLA-20 and PEMSBAY-50), the learned filter gives sharper at-target ellipsoids than static-covariance and non-filter baselines; at full-graph scale and on non-graph-native datasets, factor and copula baselines can be stronger.
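The construction in this abstract, split-conformal calibration of Mahalanobis scores, can be sketched in a few lines. This is a minimal illustration rather than the paper's implementation: it uses the textbook split-conformal quantile, takes the predictive means and covariances as given arrays instead of the outputs of a learned GCN-GRU filter, and assumes `alpha` lies strictly between 0 and 1. All function names are hypothetical.

```python
import numpy as np


def mahalanobis_scores(y, mean, cov):
    """Squared Mahalanobis distance of each y[t] from mean[t] under cov[t].

    Shapes: y and mean are (T, d); cov is (T, d, d).
    """
    resid = y - mean
    solved = np.linalg.solve(cov, resid[..., None])[..., 0]  # cov^{-1} @ resid
    return np.einsum("td,td->t", resid, solved)


def conformal_radius(cal_scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest score."""
    scores = np.sort(np.asarray(cal_scores, dtype=float))
    n = scores.size
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return float(scores[k - 1]) if k <= n else float("inf")


def in_ellipsoid(y, mean, cov, radius):
    """True where y[t] lies in {y : (y - mean)^T cov^{-1} (y - mean) <= radius}."""
    return mahalanobis_scores(y, mean, cov) <= radius
```

The covariance fixes the shape of each ellipsoid and the calibrated scalar `radius` fixes its size. With exchangeable calibration scores this quantile gives the usual marginal coverage guarantee; for dependent filtered scores, the abstract claims only an approximate coverage bound.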

2606.17013 2026-06-16 math.OC cs.LG New submission

Exploding and vanishing gradients in deep neural networks: the effect of residual connections

Vivek S Borkar

AI summary: Uses multiplicative ergodic theory to analyze exploding and vanishing gradients in deep neural networks and explains how residual connections affect the Liapunov spectrum.

Comments: 10 pages

English abstract

The well-known phenomenon of exploding and vanishing gradients in deep neural networks is analyzed using multiplicative ergodic theory. The effect of adding a residual connection is explained in this context. Specifically, a characterization of Liapunov exponents due to Furstenberg and Kifer is exploited in order to make a precise statement about the Liapunov spectrum and the effect of residual connections on it.
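For readers unfamiliar with the quantity involved, the sketch below numerically estimates the top Liapunov exponent of a product of random matrices, with and without an identity skip term. It is a toy experiment under stated assumptions (i.i.d. Gaussian layer Jacobians, a finite depth, a fixed seed) and does not reproduce the paper's analysis; `top_lyapunov_exponent` is a hypothetical helper.

```python
import numpy as np


def top_lyapunov_exponent(jacobians):
    """Average log growth rate of a vector pushed through a matrix product.

    The vector is renormalized after every factor to avoid overflow/underflow.
    """
    d = jacobians[0].shape[0]
    v = np.ones(d) / np.sqrt(d)
    log_growth = 0.0
    for J in jacobians:
        v = J @ v
        norm = np.linalg.norm(v)
        log_growth += np.log(norm)
        v = v / norm
    return log_growth / len(jacobians)


rng = np.random.default_rng(0)
d, depth, sigma = 16, 500, 0.5
# Assumption: i.i.d. Gaussian layer Jacobians with entry variance sigma^2 / d.
weights = [rng.normal(0.0, sigma / np.sqrt(d), size=(d, d)) for _ in range(depth)]

plain = top_lyapunov_exponent(weights)                               # product of W_l
residual = top_lyapunov_exponent([np.eye(d) + W for W in weights])   # product of (I + W_l)
print(f"plain: {plain:.3f}, residual: {residual:.3f}")
```

A negative estimate corresponds to norms shrinking exponentially with depth and a positive one to exponential growth. With these particular parameters the skip term should move the estimate much closer to zero, though the exact values depend on the assumed weight distribution.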

2606.17011 2026-06-16 cs.RO cs.LG New submission

ROVE: Unlocking Human Interventions for Humanoid Manipulation via Reinforcement Learning

Wei Xiao, Weiliang Tang, Yuying Ge, Hui Zhou, Yao Mu, Li Zhang, Yixiao Ge

Affiliations: XPENG Robotics, Fudan University, The Chinese University of Hong Kong, Shanghai Jiao Tong University

AI summary: Proposes ROVE, a framework that uses reinforcement learning with Optimistic Value Estimation to learn high-value behaviors from suboptimal human intervention trajectories, improving humanoid manipulation performance.

English abstract

Human interventions provide crucial corrective signals for post-training Vision-Language-Action (VLA) models. However, enabling seamless humanoid interventions is a formidable systems challenge due to complex whole-body kinematics and dexterous-hand control. Consequently, the collected intervention trajectories are often suboptimal, and methods that rely on human interventions as expert supervision can absorb hesitant, inefficient, or even erroneous behaviors. To address both the system and algorithmic challenges, we propose ROVE, a reinforcement learning framework for humanoid VLA post-training with imperfect human interventions. First, ROVE introduces a human-in-the-loop pipeline capable of collecting deployment and intervention data for humanoid manipulation. Second, it utilizes Optimistic Value Estimation (OVE) to prioritize high-value behaviors from mixed-quality trajectories. To further robustify value estimation, we incorporate cross-embodiment human experience videos to provide rich supervision for long-tailed failure and recovery modes. The resulting critic yields informative advantage signals, steering the VLA actor to focus on high-value behaviors rather than indiscriminately imitating all actions. On challenging real-world contact-rich and fine-grained humanoid manipulation tasks, ROVE outperforms experience-learning baselines and consistently improves across multiple rollout-intervention iterations.

2606.17010 2026-06-16 cs.LG New submission

From Tokens to Policy: Causal and Interpretable Heterogeneous Treatment Effects Identification

Riccardo Cadei, Frank Otchere, Nyasha Tirivayi, Gustavo Angeles Tagliaferro, Falco J. Bargagli-Stoffi, Francesco Locatello

Affiliations: ISTA, UNICEF, UCLA

AI summary: Proposes NEXIS, which uses multi-modal pre-treatment representations to recast HTE identification as a Markov-blanket discovery problem, enabling causal and interpretable identification of heterogeneous treatment effects; it is validated on anti-poverty programs in Africa.

English abstract

Heterogeneous Treatment Effect (HTE) identification is crucial to explain the impact of an intervention and optimize our policies accordingly. Existing approaches trade expressivity for interpretability, but, if some active heterogeneity drivers are unmeasured, methods at both ends of this spectrum allow for spurious HTE characterization with no causal reading. In this work, we focus on controlled experiments and argue that an oracle HTE causal characterization via the latent interactors is now within reach, thanks to (i) more extensive pre-treatment measurements, i.e., multi-modal and multi-view, and (ii) scalable representations with minimal human supervision. We then re-frame HTE identification as a Markov-blanket discovery problem on a sufficient and aligned pre-treatment representation, and introduce Neural EXposure Interaction Search (NEXIS), an iterative procedure with provable and empirically validated consistent selection. We deploy NEXIS on two anti-poverty programs in Africa, augmenting each with satellite imagery capturing previously unmeasured environmental effect modifiers, leading to novel, interpretable and prescriptive guidelines to optimize the programs' next iterations.

2606.17006 2026-06-16 cs.SD cs.AI cs.LG cs.MM eess.AS New submission

TuneJury: An Open Metric for Improving Music Generation Preference Alignment

Yonghyun Kim, Junwon Lee, Haiwen Xia, Yinghao Ma, Junghyun Koo, Koichi Saito, Yuki Mitsufuji, Chris Donahue

Affiliations: Carnegie Mellon University, Sony AI, Georgia Tech, KAIST, Peking University, QMUL

AI summary: Proposes TuneJury, an open, instance-level pairwise reward model for text-to-music generation whose predicted preference scores support data filtering and post-hoc calibration and improve alignment at inference, optimization, and training time.

Comments: 32 pages, 9 figures

English abstract

We introduce TuneJury, an open, instance-level pairwise reward model for text-to-music that predicts a music preference score from a text prompt and an audio clip. The released checkpoint is trained on publicly available human-preference labels covering arena-style (A vs. B) votes, metric-alignment preference pairs, crowdsourced pairwise comparisons, and expert aesthetic ratings. The predicted score margin between two clips is well calibrated on our held-out test split, supporting data filtering via a simple score threshold. TuneJury generalizes to both held-out test pairs and out-of-distribution benchmarks, remaining competitive with prior baselines on the latter. For generators released after training, we introduce anchor calibration, a post-hoc, per-system Bradley-Terry calibration that recovers agreement at substantially better data efficiency than from-scratch retraining. The same frozen reward drives consistent reward-axis gains across three downstream applications: inference-time best-of-N selection, DITTO-style latent optimization, and expert-iteration post-training. TuneJury is available at https://github.com/yonghyunk1m/TuneJury.
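As a rough illustration of how a scalar reward can drive pairwise preference, margin-based filtering, and best-of-N selection, the sketch below assumes a Bradley-Terry link in which the probability that clip A is preferred to clip B is the logistic function of the score margin. The function names and the filtering rule are hypothetical readings of the abstract; TuneJury's actual interface is not described there.

```python
import math


def preference_probability(score_a, score_b):
    """Bradley-Terry probability that A is preferred over B, given scalar scores."""
    return 1.0 / (1.0 + math.exp(-(score_a - score_b)))


def keep_pair(score_a, score_b, margin_threshold):
    """Assumed filtering rule: keep a pair only if the score margin is large enough."""
    return abs(score_a - score_b) >= margin_threshold


def best_of_n(candidates, reward_fn):
    """Inference-time best-of-N: return the candidate with the highest reward."""
    return max(candidates, key=reward_fn)
```

In practice `reward_fn` would wrap a frozen reward model that scores a (prompt, audio clip) pair; any callable returning a number works for this sketch.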

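2606.17005 2026-06-16 cs.AI stat.ME New submission

Bayesian Inference and Decision Audits for Public Archives of Frontier AI Evaluations

Yanan Long

AI summary: Applies Bayesian inference and audit methods to selective reporting and missing data in public AI evaluation archives, finding that a single terminal record is compatible with multiple histories and showing that audit gates filter out unsupported claims.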

English abstract

Public AI evaluations are often read as terminal leaderboards, yet the underlying evidence is a selective time series shaped by reporting rules, benchmark revisions, and missingness. Repeated public archives for LiveBench and Open LLM Leaderboard v2 serve as the primary longitudinal record; LMArena provides a preference stress test; and GAIA and tau-bench contribute limited agentic pilots. Together, these archives instantiate a Bayesian inference problem: under a fixed reporting convention, one constructed terminal-only example over 1,000 systems is compatible with two pre-terminal histories, yielding times of 23.03 or 75.13 to reach within 0.05 of the ceiling under the same terminal-tail model. In synthetic posterior comparisons, action-facing diagnostics differ across observation regimes. The candidate selection-aware frontier model fails synthetic recovery, objective-archive prediction, preference transfer, and uncertainty calibration; correspondingly, fixed audit gates reject its stronger claims. An archive-and-adjudication protocol reconstructs public evaluation histories, isolates a verified timing boundary, and falsifies unsupported frontier claims.

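2606.17004 2026-06-16 eess.SY cs.SY New submission

Data-Driven Personalization of Automated Insulin Delivery

Ali Kashani, Ali Tavasoli, Heman Shakeri

AI summary: Proposes a real-time method that adapts controller parameters from daily glycemic data, using projected gradient descent on a daily risk metric, with contraction theory used to verify closed-loop convergence. In simulations on 100 adult patients with variability in meal timing, meal size, and insulin sensitivity, it increases time-in-range (70-180 mg/dL) by 2%, 3%, and 4% after 4, 8, and 17 weeks, respectively.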

English abstract

Automated insulin delivery (AID) systems are often tuned for the population and offer limited online adaptation to the inter- and intrapatient variability in insulin needs caused by meal patterns, physical activity, and fluctuations in insulin sensitivity. We present a real-time, data-driven personalization approach that adapts controller parameters using the subject's daily glycemic data. The adaptation is formulated as projected gradient descent on a daily risk metric, where the gradient estimation is designed to attenuate noise and metabolic variability. We use contraction theory to validate the optimization framework and convergence of the closed-loop system under adaptation. In silico experiments on the 100-adult cohort of the FDA-accepted UVA/Padova T1D simulator show that our method improves glycemic risk and increases time-in-range (TIR, 70-180 mg/dL) by 2%, 3%, and 4% after 4, 8, and 17 weeks, respectively, under variability in meal timing, meal size, and insulin sensitivity.
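The two ingredients named in this abstract, a time-in-range metric and a projected gradient update of controller parameters, can be sketched generically. The abstract does not specify the risk metric, the gradient estimator, or the constraint set, so the box constraint and the exponential-moving-average smoother below are illustrative assumptions, and all function names are hypothetical.

```python
import numpy as np


def time_in_range(glucose_mg_dl, low=70.0, high=180.0):
    """Fraction of glucose readings within [low, high] mg/dL."""
    g = np.asarray(glucose_mg_dl, dtype=float)
    return float(np.mean((g >= low) & (g <= high)))


def projected_gradient_step(theta, grad, step_size, lower, upper):
    """One gradient descent step followed by projection onto the box [lower, upper]."""
    return np.clip(theta - step_size * np.asarray(grad), lower, upper)


def smoothed_gradient(prev_estimate, new_estimate, beta=0.9):
    """Exponential moving average of daily gradient estimates (assumed noise filter)."""
    return beta * np.asarray(prev_estimate) + (1.0 - beta) * np.asarray(new_estimate)
```

A daily loop would compute a risk-metric gradient estimate from that day's glucose trace, smooth it, and apply one projected step, so the controller parameters never leave their allowed range.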

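2606.17002 2026-06-16 cs.CY New submission

From Newtonian to Relativistic IAM: The Autonomous Principal as Reference Frame for Digital Identity

Philippe Page, Robert Mitwicki, Michal Pietrus

AI summary: Argues, by analogy with physics, that in distributed information systems the autonomous principal is a necessary consequence of causal consistency rather than a normative preference, and reports on technology built since 2023 that implements this view and its implications for cross-border data flows and agentic systems.

Comments: 30 pages, 3 figures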

English abstract

The 2023 paper "Distributed Governance: a Principal-Agent Approach to Data Governance" (arXiv:2308.07280) introduced the autonomous principal as the locus of transactional sovereignty in digital ecosystems. This follow-up, Part 2, advances a structural argument for why that model is not a normative preference but a consequence of taking causality seriously in distributed information systems. Drawing an analogy with the transition from Newtonian to relativistic physics, we show that custodial identity management rests on an implicit assumption of global simultaneity that fails as soon as identity must operate across ecosystems, jurisdictions, and the offline/online boundary. Once that assumption is dropped, state ceases to be a noun held by a central authority and becomes a relation maintained between principals through causally ordered exchanges. The autonomous principal emerges as the only entity with standing to define its own reference frame. We report on technology built since 2023 that operationalises this view, and outline its consequences for cross-border data flows and agentic systems.