arXivDaily: arXiv daily academic digest, updated Monday through Friday
2606.05375 2026-06-05 cs.CV cs.AI

Three-Dimensional Retinal Microvasculature Restoration in OCT Angiography

OCT血管造影中的三维视网膜微血管修复

Yukun Guo, Min Gao, Tristan T. Hormel, Steven T. Bailey, Thomas S. Hwang, Yali Jia

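Affiliations Casey Eye Institute, Oregon Health & Science University; Department of Biomedical Engineering, Oregon Health & Science University

AI summary Proposes a deep learning algorithm, built from an EfficientNet-B5 encoder and a decoder with concurrent spatial and channel squeeze-and-excitation modules, that restores capillary anatomy from a single OCTA volume and significantly improves image quality and microvascular fidelity.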

AI Chinese abstract

光学相干断层扫描血管造影(OCTA)是一种用于成像视网膜微血管的强大技术。然而,由于成像伪影,获取可靠的视网膜血流和视网膜无灌注区域量化具有挑战性。现有方法主要关注噪声抑制、投影伪影去除或信号增强,以改善OCTA在横截面或二维(2D)正面投影中的图像质量,而忽略了内在的三维血管结构。在本研究中,我们提出了一种基于深度学习的算法,用于从单个OCTA体数据中恢复毛细血管解剖血管结构。该网络由EfficientNet-B5编码器和结合了并行空间与通道挤压激励模块的解码器组成,通过跳跃连接保持空间分辨率。使用三个相邻B帧作为输入,预测修复后的中间B帧。我们使用峰值信噪比(PSNR)和结构相似性指数(SSIM)评估模型性能,以多次扫描平均生成的真值作为基准。结果表明,与原始单次OCTA体数据相比,所提模型显著(p < 0.001)提高了图像质量,PSNR为26.16 ± 1.26对比22.23 ± 0.78,SSIM为0.91 ± 0.02对比0.72 ± 0.03。所提模型还显著(p < 0.001)提高了微血管保真度,通过模型输出与真值之间的Dice系数重叠测量,在多个不同血管板层上,2D和3D分别至少提高3.8%和51.2%。

English abstract

Optical coherence tomographic angiography (OCTA) is a powerful technique for imaging retinal microvasculature. However, acquiring reliable quantification of retinal blood flow and areas of retinal nonperfusion is challenging because of imaging artifacts. Existing methods primarily focus on noise suppression, projection artifact removal, or signal enhancement to improve the image quality of OCTA in cross-sectional or two-dimensional (2D) en face projections, while neglecting the intrinsic three-dimensional vascular architecture. In this study, we propose a deep learning-based algorithm for restoring capillary anatomical vasculature from a single OCTA volume. The network consists of an EfficientNet-B5 encoder and a decoder incorporating concurrent spatial and channel squeeze-and-excitation modules, connected via skip connections to preserve spatial resolution. Three adjacent B-frames are used as input to predict the restored middle B-frame. We evaluated the performance of the model using the peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) against ground truth generated from averaging multiple scans. The results show that the proposed model significantly (both p < 0.001) improved image quality compared with the original single OCTA volume, with a PSNR of 26.16 +/- 1.26 vs. 22.23 +/- 0.78 and an SSIM of 0.91 +/- 0.02 vs. 0.72 +/- 0.03. The proposed model also significantly (p < 0.001) improved microvascular fidelity, measured by the Dice coefficient overlap between the model output and ground truth, in both 2D and 3D by at least 3.8% and 51.2%, respectively, across several different vascular slabs.
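The evaluation quantities named in this abstract (PSNR against an averaged reference, Dice overlap of vessel masks) and the three-adjacent-B-frame input can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the `data_range` default, and the assumption that the volume is stored as an array of shape (frames, height, width) are all hypothetical.

```python
import numpy as np

def psnr(restored, reference, data_range=1.0):
    """Peak signal-to-noise ratio in dB; data_range is the assumed max intensity."""
    restored = np.asarray(restored, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    mse = np.mean((restored - reference) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

def dice(mask_a, mask_b):
    """Dice coefficient between two binary masks (2D slab or 3D volume)."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # both masks empty: treat as perfect overlap
    return 2.0 * np.logical_and(a, b).sum() / total

def adjacent_frame_triplets(volume):
    """Group each interior B-frame with its two neighbours.

    volume: (n_frames, H, W) -> (n_frames - 2, 3, H, W); index 1 along
    axis 1 is the middle frame that a model would be asked to restore.
    """
    volume = np.asarray(volume)
    return np.stack([volume[:-2], volume[1:-1], volume[2:]], axis=1)
```

The first and last B-frames have no complete triplet in this sketch; how the paper handles volume boundaries is not stated in the abstract.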

2606.05373 2026-06-05 cs.LG physics.bio-ph

Evidence-Guided Neural Architecture Selection under Uncertainty for Subject-Specific Blood Glucose Forecasting

证据引导的神经架构选择在不确定性下用于个体化血糖预测

Md Azharul Islam, Dwyer Deighan, Tarunraj Singha, Danial Faghihi

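Affiliations Department of Mechanical Data-Enabled Sciences, University at Buffalo, Buffalo, NY, USA

AI summary Proposes the EVIDENT framework, which combines Bayesian training, evidence-based ranking, and task-specific validation to select, from limited, noisy, and heterogeneous data, the lowest-capacity neural architecture that meets a prescribed validation criterion for individualized blood glucose forecasting.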

AI Chinese abstract

在有限、噪声和异构数据下的时间序列预测中,可靠的神经架构选择是一个开放挑战,标准的启发式架构设计和验证方法无法确保准确可靠的预测和泛化。我们提出EVIDENT(基于证据的神经架构识别),一个整合贝叶斯训练、基于证据的排序和不确定性下任务特定验证的架构选择框架。该框架探索候选架构池,并识别满足规定验证标准的最低容量模型。我们使用时间卷积网络(TCNs)在1型糖尿病患者的个体化血糖预测中演示了该方法。结果表明,EVIDENT在群体水平糖尿病数据上系统地拒绝了参数不足和过度的TCN架构,同时识别出能可靠泛化到未见患者的模型。当多个架构具有竞争力时,该框架进一步支持基于可信度的集成预测,从而提升预测性能。与随机搜索基线相比,EVIDENT识别出更小的架构,在未见患者上具有更一致的预测性能。这些发现确立了EVIDENT作为一种神经架构发现策略,能够在数据有限和异构环境中实现高风险预测的可靠模型选择。

English abstract

Reliable neural architecture selection is an open challenge in time-series forecasting under limited, noisy, and heterogeneous data, where standard heuristic architecture design and validation approaches fail to ensure accurate and reliable prediction and generalization. We propose EVIDENT (EVidence-based IDEntification of Neural archiTectures), a framework for architecture selection that integrates Bayesian training, evidence-based ranking, and task-specific validation under uncertainty. The framework explores the candidate architecture pool and identifies the lowest-capacity model that satisfies a prescribed validation criterion. We demonstrate this method using temporal convolutional networks (TCNs) for individualized blood glucose forecasting in type 1 diabetes patients. The results show that EVIDENT systematically rejects both under- and over-parameterized TCN architectures on population-level diabetes data, while identifying models that generalize reliably to unseen patients. When multiple architectures are competitive, the framework further supports plausibility-weighted ensemble predictions that enhance predictive performance. Compared with a random-search baseline, EVIDENT identified smaller architectures with more consistent forecasting performance on unseen patients. These findings establish EVIDENT as a strategy for neural architecture discovery, enabling reliable model selection for high-consequence forecasting in data-limited and heterogeneous settings.

2606.05372 2026-06-05 cs.RO cs.CG

Efficient Computation of Distance Functions for Navigation Vector Fields in Lie Groups

李群中导航向量场距离函数的高效计算

Vinicius M. Gonçalves, João Baião, Felipe Bartelt, Douglas G. Macharet, Gustavo M. Freitas, Héctor Azpúrua, Luciano C. A. Pimenta

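Affiliations University of São Paulo

AI summary For vector-field-based path tracking in Lie groups, proposes an efficient method that exploits the structure of G-polynomial curves to reduce distance computation to polynomial root-finding, significantly cutting computation time while maintaining accuracy.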

AI Chinese abstract

基于向量场的方法被广泛用于机器人控制,并常应用于路径跟踪问题。一些向量场方法需要重复计算机器人配置与曲线之间的距离以及相应的最近点。最近,向量场已被扩展到李群。在这种情况下,这种计算可能非常昂贵,尤其是在嵌入式平台上以高控制频率执行时。本文提出了一种高效计算点与曲线之间距离的方法,该曲线表示为所谓的G-多项式曲线,这是一种将多项式曲线推广到矩阵李群的曲线表示。所提出的方法利用这些曲线的结构,将问题简化为少量多项式求根计算。仿真结果表明,与现有的基于优化的方法相比,该方法在保持精度的同时显著减少了计算时间。还提供了SE(3)群情况下的实用公式,并在机器人机械臂上进行了实验验证。该方法已在一个计算包中实现,可在线获取。

English abstract

Vector-field-based methods are widely used for robot control and are often applied to the path-tracking problem. Some vector field approaches require repeatedly computing the distance between the robot configuration and the curve, as well as the corresponding closest point. Recently, vector fields have been extended to Lie Groups. In this case, this computation can be expensive, especially when performed at high control frequencies on embedded platforms. This paper proposes a method for efficiently computing the distance between a point and a curve represented as what is called a G-polynomial curve, which is a curve representation that generalizes polynomial curves to matrix Lie groups. The proposed approach exploits the structure of these curves to reduce the problem to a small number of polynomial root-finding computations. Simulation results show that the method significantly reduces computation time while maintaining accuracy compared to existing optimization-based approaches. Practical formulas are also provided for the case of the group SE(3), and the method is validated experimentally on a robotic manipulator. The methodology is implemented in a computational package, available online.

2606.05371 2026-06-05 cs.LG cs.NA math.NA stat.ML

Mamba-Assisted Non-Markovian Closure for Reduced-Order Modeling

Mamba辅助的非马尔可夫闭合用于降阶建模

Zhi-Feng Wei, Saad Qadeer, Panos Stinis

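Affiliations Pacific Northwest National Laboratory; University of Washington; Brown University

AI summary To handle the non-Markovian closure term in reduced-order modeling of high-dimensional dynamical systems, proposes the Mamba-Assisted Closure framework, which uses a Mamba sequence model to predict the closure from the resolved trajectory and couples it to the reduced-order equations through a numerical integrator; on the viscous Burgers' equation and the chaotic two-scale Lorenz '96 system it outperforms the Markovian model, the GRU sequence model, and the Wilks method.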

Comments Code will be released upon acceptance

AI Chinese abstract

高维动力系统的降阶建模常常受到非马尔可夫闭合项的阻碍,该闭合项表示未解析变量对解析动力学的影响。受Mori--Zwanzig形式论的启发,其中闭合项采取解析轨迹的记忆泛函形式,我们将闭合建模重新表述为序列建模问题,并提出Mamba辅助闭合(MAC)框架:一个基于Mamba的序列模型,经过训练从解析轨迹预测闭合项,通过数值积分器与降阶控制方程耦合,以在时间上推进解析变量。该框架的一个关键特性是利用状态空间模型的双重表示——模型通过卷积形式以序列到序列的方式进行训练,并通过循环形式进行逐步自回归部署,从而实现高效的长轨迹训练和恒定的每步推理成本。在粘性Burgers方程和混沌双尺度Lorenz '96系统上,MAC模型在预测准确性和长时间展开稳定性方面显著优于马尔可夫降阶模型、基于GRU的序列模型和Wilks方法。

English abstract

Reduced-order modeling of high-dimensional dynamical systems is often hindered by the non-Markovian closure term that represents the effect of unresolved variables on the resolved dynamics. Inspired by the Mori--Zwanzig formalism, in which the closure takes the form of a memory functional of the resolved trajectory, we recast closure modeling as a sequence modeling problem and propose the Mamba-Assisted Closure (MAC) framework: a Mamba-based sequence model, trained to predict the closure from the resolved trajectory, is coupled with the reduced-order governing equations through a numerical integrator to advance the resolved variables in time. A key feature of the framework is its exploitation of the dual representation of state-space models -- the model is trained in a sequence-to-sequence fashion via the convolutional form, and deployed for step-by-step autoregressive rollout via the recurrent form, yielding both efficient long-trajectory training and constant per-step inference cost. On the viscous Burgers' equation and the chaotic two-scale Lorenz '96 system, the MAC model substantially outperforms the Markovian reduced-order model, the GRU-based sequence model, and the Wilks method in predictive accuracy and long-time rollout stability.
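The dual representation mentioned in this abstract can be illustrated on a plain time-invariant linear state-space model, where unrolling the recurrence gives a causal convolution with kernel K_k = C A^k B. The sketch below is only that textbook case, with hypothetical function names and a scalar input channel; it does not reproduce Mamba's input-dependent parameters or the MAC training setup.

```python
import numpy as np

def ssm_recurrent(A, B, C, u):
    """Step-by-step form: h_t = A h_{t-1} + B u_t, y_t = C h_t (constant cost per step)."""
    h = np.zeros(A.shape[0])
    outputs = []
    for u_t in u:
        h = A @ h + B * u_t
        outputs.append(C @ h)
    return np.array(outputs)

def ssm_convolutional(A, B, C, u):
    """Whole-sequence form: y = K * u with kernel K_k = C A^k B."""
    T = len(u)
    kernel = np.empty(T)
    a_pow_b = B.copy()           # holds A^k B, starting at k = 0
    for k in range(T):
        kernel[k] = C @ a_pow_b
        a_pow_b = A @ a_pow_b
    # Full convolution truncated to the first T samples is the causal output.
    return np.convolve(u, kernel)[:T]
```

Both functions return the same sequence for the same inputs, which is why such a model can be trained over whole trajectories in one form and rolled out one step at a time in the other.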

2606.05367 2026-06-05 cs.SD eess.AS

Task-Vector Arithmetic for Emotional Expressivity Control in Language-Model-Based Text-to-Speech

基于任务向量算术的语言模型文本到语音情感表达控制

Daniel Oliveira de Brito, Arnaldo Candido Junior

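Affiliations Instituto de Biociências, Letras e Ciências Exatas, Universidade Estadual Paulista "Júlio de Mesquita Filho" (UNESP)

AI summary Through a systematic elimination study, localizes the dominant carrier of emotional prosody to the x-vector and proposes a training-free method based on x-vector centroid arithmetic for cross-speaker emotional intensity control, raising emotion similarity while preserving speaker identity and intelligibility.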

Comments 10 pages, 5 figures

AI Chinese abstract

我们研究了任务向量算术(在模块化文本到语音(TTS)中成功用于跨说话人情感强度控制)是否能够迁移到基于语言模型骨干和上下文学习(LM-TTS)构建的大规模TTS系统。通过在Qwen3-TTS-12Hz-1.7B上对四个逐渐缩小的操作数——通过LoRA微调的模型权重、连续编解码器嵌入、离散编解码器标记以及由ECAPA-TDNN编码器(与合成骨干联合训练)生成的说话人嵌入(x-vector)——进行系统消融研究,我们将情感韵律的主要载体定位到x-vector。基于这一发现,我们提出了一种基于x-vector空间质心算术的无训练方法:情感方向τ = E_i[x(s_i, emo)] - E_i[x(s_i, neutral)],应用于未见过的目标说话人:x_new = x(target, neutral) + α·τ。使用ESD(英语)作为τ源,emoUERJ(巴西葡萄牙语)作为跨语言真实目标,我们观察到在英语保留说话人上,情感余弦相似度比ICL基线平均提升+0.29,在巴西葡萄牙语保留说话人上提升+0.09,同时很大程度上保留了身份(多说话人τ变体的WavLM SECS ≥ 0.88)和可懂度(PT-BR中WER ≈ 0)。这些结果初步证明,当算术操作作用于说话人嵌入时,可以规避先前报道的基于质心算术的风格控制与基于标记的TTS架构不兼容的问题。

English abstract

We investigate whether task-vector arithmetic, successful for cross-speaker emotional intensity control in modular text-to-speech (TTS), transfers to large-scale TTS systems built on language-model backbones with in-context learning (LM-TTS). Through a systematic elimination study over four progressively narrower operands on Qwen3-TTS-12Hz-1.7B - model weights via LoRA fine-tuning, continuous codec embeddings, discrete codec tokens, and the speaker embedding (x-vector) produced by an ECAPA-TDNN encoder jointly trained with the synthesis backbone - we localize the dominant carrier of emotional prosody to the x-vector. Building on this finding, we propose a training-free method based on centroid arithmetic in x-vector space: an emotion direction $τ= \mathbb{E}_i[x(s_i,\text{emo})] -\mathbb{E}_i[x(s_i,\text{neutral})]$ applied to an unseen target speaker as $x_{\text{new}} = x(\text{target},\text{neutral}) + α\cdotτ$. Using ESD (English) as the $τ$ source and emoUERJ (Brazilian Portuguese) as a cross-lingual ground-truth target, we observe average gains of $+0.29$ in emotion2vec cosine over the ICL baseline on English held-out speakers and $+0.09$ on Brazilian Portuguese held-out speakers, while largely preserving identity (WavLM SECS $\gtrsim 0.88$ for the multi-speaker $τ$ variant) and intelligibility (WER $\approx 0$ in PT-BR). These results offer initial evidence that the reported incompatibility of centroid-arithmetic style control with token-based TTS architectures may be circumvented when the arithmetic operates on the speaker embedding.
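The centroid arithmetic in this abstract reduces to two array operations. The sketch below follows the two formulas as written, assuming speaker embeddings are available as NumPy arrays with one row per source speaker; the function names are hypothetical, and whether the actual system renormalizes the shifted embedding is not stated in the abstract.

```python
import numpy as np

def emotion_direction(emotional, neutral):
    """tau = mean_i x(s_i, emo) - mean_i x(s_i, neutral).

    emotional, neutral: arrays of shape (n_speakers, dim).
    """
    emotional = np.asarray(emotional, dtype=float)
    neutral = np.asarray(neutral, dtype=float)
    return emotional.mean(axis=0) - neutral.mean(axis=0)

def shift_embedding(target_neutral, tau, alpha):
    """x_new = x(target, neutral) + alpha * tau; alpha scales emotional intensity."""
    return np.asarray(target_neutral, dtype=float) + alpha * np.asarray(tau, dtype=float)
```

Setting `alpha` to 0 returns the neutral embedding unchanged, so the scalar acts as a continuous intensity control.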

2606.05359 2026-06-05 cs.CV

Recovering Physically Plausible Human-Object Interactions from Monocular Videos

从单目视频中恢复物理上可信的人-物交互

Dingbang Huang, Etienne Vouga, Qixing Huang, Georgios Pavlakos

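Affiliations University of Texas at Austin; Shanghai Jiao Tong University

AI summary Proposes RePHO, which uses a physics-guided reconstruction framework and a reinforcement learning policy to recover physically plausible human-object interactions from monocular videos, addressing the interpenetration and object-floating artifacts of existing methods.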

Comments CVPR 2026. Project Page: https://dingbang777.github.io/RePHO/

AI Chinese abstract

在本文中,我们提出了RePHO,一种从单目视频中重建物理上可信的人-物交互(HOI)的方法。现有的基于运动学的方法虽然能产生视觉上合理的运动,但常常导致物理上不合理的伪影,如相互穿透和物体漂浮。为了克服这些问题,我们引入了一个物理引导的重建框架。我们从运动学估计开始,然后通过强化学习(RL)训练一个策略来细化它。该策略被优化以在物理模拟器中重现交互。由于运动学估计通常带有噪声,简单的RL训练可能会失败。因此,我们提出了一种自适应采样策略,具有双重自我更新机制,可以识别具有最丰富信息和最可靠运动学重建的帧。我们的过程逐步提高重建质量,并产生物理一致的HOI序列。我们在两个标准的HOI基准上展示了我们的方法,并在物理合理性指标上取得了比现有方法明显的改进。项目页面:https://dingbang777.github.io/RePHO/

English abstract

In this paper, we propose RePHO, a method to reconstruct physically plausible human-object interactions (HOI) from monocular videos. While existing kinematic-based approaches produce visually plausible motion, they often result in physically implausible artifacts such as interpenetration and object floating. To overcome these issues, we introduce a physics-guided reconstruction framework. We begin with a kinematic estimate and then refine it by training a policy with reinforcement learning (RL). This policy is optimized to reproduce the interaction in a physics simulator. Because kinematic estimates are typically noisy, naive RL training can fail. Therefore, we propose an adaptive sampling strategy with a dual self-updating mechanism that can identify the frames with the most informative and reliable kinematic reconstruction. Our process progressively improves reconstruction quality and yields physically consistent HOI sequences. We demonstrate our approach on two standard HOI benchmarks and achieve clear improvements in physical plausibility metrics over state-of-the-art methods. Project Page: https://dingbang777.github.io/RePHO/

2606.05357 2026-06-05 cs.AI

An interpretable and trustworthy AI framework for large-scale longitudinal structure-pain association studies using data from the Osteoarthritis Initiative (OAI)

一个可解释且可信赖的AI框架,用于利用骨关节炎倡议(OAI)数据进行大规模纵向结构-疼痛关联研究

Jincheng Yu, Haoyang Li, Yiwen Liu, Shen Liu, Rachel Yuanbao Chen, C. Kent Kwoh, Hongxu Ding, Xiaoxiao Sun

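Affiliations Statistics & Data Science GIDP, University of Arizona; Department of Epidemiology and Biostatistics, University of Arizona; College of Medicine Tucson, University of Arizona; R. Kent Coit College of Pharmacy, University of Arizona; University High School

AI summary Proposes an AI framework that combines deep learning MOAKS prediction with interpretable statistical modeling, filters for high-confidence predictions through uncertainty quantification, and analyzes associations between structural abnormalities and pain with a longitudinal latent class mixed model, finding that bone marrow lesions, cartilage loss, and meniscal extrusion are risk factors for pain progression.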

AI Chinese abstract

目的:开发一个可解释且可信赖的AI框架,结合基于深度学习的MRI骨关节炎膝关节评分(MOAKS)预测与可解释统计建模,利用骨关节炎倡议(OAI)数据大规模研究结构-疼痛关系。材料与方法:我们首先开发了一个深度学习框架,直接从膝关节MRI预测MOAKS特征,并引入共形预测以提供预测不确定性量化。这种不确定性感知策略能够显式过滤模型输出,仅保留膝关节级别的高置信度MOAKS预测。其次,我们应用纵向潜类混合模型(LCMM)检查关键结构异常与四种互补的膝关节疼痛测量之间的关联。结果:在三种MRI定义的异常(即骨髓病变(BML)、软骨丢失(CART)和半月板挤压(ME))中,我们的框架显著提高了马修斯相关系数(MCC)和其他一些指标。例如,BML的MCC从0.69提高到0.91,CART从0.45提高到0.80,ME从0.59提高到0.89。利用这些高置信度预测,我们将LCMM分析的样本量扩大到2,175个膝关节。识别出两种不同的疼痛轨迹(快速和稳定的疼痛进展)。快速进展组的估计比值比(95% CI)为:BML 1.62(1.12-2.35),CART丢失1.83(1.24-2.70),ME 2.50(1.75-3.57)。结论:这些结果强调了这些结构异常作为骨关节炎疼痛和功能进展风险因素的重要性。

English abstract

Purpose: To develop an interpretable and trustworthy AI framework that combines deep learning based MRI Osteoarthritis Knee Score (MOAKS) prediction with interpretable statistical modeling to study structure-pain relationships at scale using data from the Osteoarthritis Initiative (OAI). Materials and Methods: We first developed a deep learning framework to predict MOAKS features directly from knee MRIs and incorporated conformal prediction to provide prediction uncertainty quantification. This uncertainty-aware strategy enables explicit filtering of model outputs, retaining only high-confidence MOAKS predictions at the knee level. Second, we applied a longitudinal latent class mixed model (LCMM) to examine associations between key structural abnormalities and four complementary knee pain measurements. Results: Among the three MRI-defined abnormalities (i.e., bone marrow lesions (BML), cartilage loss (CART), and meniscal extrusion (ME)), our framework substantially improved the Matthews correlation coefficient (MCC) and some other metrics. For example, MCC increased from 0.69 to 0.91 for BML, from 0.45 to 0.80 for CART, and from 0.59 to 0.89 for ME. Using these high-confidence predictions, we expanded the sample size to 2,175 knees for the LCMM analysis. Two distinct pain trajectories were identified (rapid and stable pain progression). The estimated odds ratios (95% CI) for the rapid progression group were 1.62 (1.12-2.35) for BML, 1.83 (1.24-2.70) for CART loss, and 2.50 (1.75-3.57) for ME. Conclusion: These results highlight the importance of these structural abnormalities as risk factors for pain and functional progression in osteoarthritis.
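The Matthews correlation coefficient reported here, and the effect of scoring only a retained subset of predictions, can be sketched as below. The `keep` mask is a hypothetical stand-in for whatever high-confidence rule the conformal prediction step produces; the abstract does not specify that rule, and the function names are illustrative.

```python
import math

def confusion_counts(y_true, y_pred, keep=None):
    """Count TP, TN, FP, FN for binary labels, optionally on a retained subset."""
    if keep is None:
        keep = [True] * len(y_true)
    tp = tn = fp = fn = 0
    for truth, pred, kept in zip(y_true, y_pred, keep):
        if not kept:
            continue  # prediction filtered out as low confidence
        if truth and pred:
            tp += 1
        elif not truth and not pred:
            tn += 1
        elif pred:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when it is undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom
```

Comparing `mcc(*confusion_counts(y_true, y_pred))` with the same call using a `keep` mask shows how discarding uncertain cases can raise the score at the cost of coverage.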

2606.05354 2026-06-05 cs.CV

LightVesselNet: An Ultra-Lightweight Sub-100K Parameter Network for Retinal Blood Vessel Segmentation

LightVesselNet:用于视网膜血管分割的超轻量级亚10万参数网络

Shadman Sobhan, Farhana Jalil

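Affiliations Department of Electrical & Electronic Engineering, Bangladesh University of Engineering and Technology (BUET)

AI summary Proposes LightVesselNet, a compact encoder-decoder network with only 75K parameters that combines channel and spatial attention, multi-scale feature aggregation, and subpixel upsampling, achieving retinal vessel segmentation performance competitive with larger models on five public datasets and suiting resource-constrained clinical settings.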

AI Chinese abstract

视网膜血管分割在糖尿病视网膜病变和青光眼的早期检测中起着至关重要的作用。虽然最近的深度学习模型取得了很高的分割精度,但它们通常需要大量的计算资源,使得在边缘设备上的实际部署变得困难。在本文中,我们提出了LightVesselNet,一种专为资源受限环境中的视网膜血管分割设计的高效神经网络。尽管仅包含75K参数,LightVesselNet的性能与更大的模型相比具有竞争力。该网络采用紧凑的编码器-解码器架构,并增强了通道和空间注意力机制、瓶颈处的多尺度特征聚合模块以及解码器中的亚像素上采样策略。专用的边缘残差连接在整个解码过程中保留了精细的血管细节。在五个公开数据集:DRIVE、STARE、CHASEDB1、FIVES和HRF上进行的大量实验,分别获得了0.8189、0.8499、0.8640、0.8634、0.8096的灵敏度分数和0.8070、0.8072、0.8181、0.8649、0.7686的Dice系数。与最先进模型相比,LightVesselNet显示出更高的效率(性能与参数或GFlops之比)。跨数据集评估证实了模型的泛化能力。总体而言,LightVesselNet是低资源临床环境和移动筛查工具中部署的有力候选者。

English abstract

Retinal blood vessel segmentation plays a vital role in the early detection of diabetic retinopathy and glaucoma. While recent deep learning models have achieved high segmentation accuracy, they typically require heavy computational resources, making real-world deployment on edge devices difficult. In this paper, we propose LightVesselNet, an efficient neural network designed for retinal vessel segmentation in resource-constrained environments. Despite containing only 75K parameters, LightVesselNet performs competitively with much larger models. The network employs a compact encoder-decoder architecture enhanced with channel and spatial attention mechanisms, a multi-scale feature aggregation module at the bottleneck, and a subpixel upsampling strategy in the decoder. A dedicated edge residual connection preserves fine vessel detail throughout decoding. Extensive experiments on five publicly available datasets (DRIVE, STARE, CHASEDB1, FIVES, and HRF) yield sensitivity scores of 0.8189, 0.8499, 0.8640, 0.8634, and 0.8096, and Dice coefficients of 0.8070, 0.8072, 0.8181, 0.8649, and 0.7686, respectively. LightVesselNet shows improved efficiency (performance relative to parameter count or GFLOPs) compared to state-of-the-art models. Cross-dataset evaluation confirms the model's generalisation capability. Overall, LightVesselNet is a strong candidate for deployment in low-resource clinical settings and mobile screening tools.

2606.05347 2026-06-05 cs.CV

TopoPult-SSL: Gland-Mask-Free Cross-Device Meibomian Gland Segmentation via Self-Distilled Weak Clinical Priors

TopoPult-SSL: 通过自蒸馏弱临床先验实现无腺体掩膜的跨设备睑板腺分割

Nicolò Savioli, Luca Del Tongo

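Affiliations OdaxAI S.R.L.; Topcon Group - VISIA Imaging S.R.L.

AI summary Proposes TopoPult-SSL, a two-stage framework for cross-device meibomian gland segmentation that uses eyelid masks and clinical metadata as weak priors and applies self-distillation; its first stage adapts to the target device without target gland masks.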

Comments 13 pages, 4 figures, 5 tables

AI Chinese abstract

每一种新的临床成像设备都会造成域偏移,其中密集的腺体掩膜成本高昂,而廉价的临床信号——眼睑轮廓、Pult分级、形态测量比率——则被常规记录。我们提出TopoPult-SSL,一个用于跨设备睑板腺分割的两阶段框架。第一阶段在训练损失中不使用目标腺体掩膜,仅通过目标眼睑掩膜和临床元数据驱动的四个弱先验锚点来适应源域训练模型。第二阶段,当目标腺体掩膜可用时,通过监督自蒸馏将互补的第一阶段教师模型蒸馏成一个紧凑的学生模型。我们在公共MGD-1k到CAMG研究基准(1000到100张图像,不同设备)上开发并验证了该技术,蒸馏模型达到Dice 0.716±0.006(最佳0.726),单次推理超越UA-MT(0.710)和集成教师(0.720)。无腺体掩膜的第一阶段变体达到精确度0.694,而SAM/MedSAM为0.30-0.34(p<0.001),使得无需密集腺体轮廓即可部署。代码和可复现脚本已发布。

English abstract

Every new clinical imaging device creates a domain shift where dense gland masks are expensive yet cheap clinical signals -- eyelid outlines, Pult grades, morphometric ratios -- are routinely recorded. We present TopoPult-SSL, a two-stage framework for cross-device meibomian gland segmentation. Stage 1 adapts a source-trained model without target gland masks in the training loss, using four weak-prior anchors driven by target eyelid masks and clinical metadata only. Stage 2, when target gland masks are available, distils complementary Stage-1 teachers into a single compact student via supervised self-distillation. We develop and validate the technique on the public MGD-1k to CAMG research benchmark (1,000 to 100 images, different device), where the distilled model achieves Dice 0.716+/-0.006 (best 0.726), surpassing UA-MT (0.710) and the ensemble teacher (0.720) -- with a single pass. The gland-mask-free Stage-1 variant reaches Precision 0.694 vs. 0.30-0.34 for SAM/MedSAM (p<0.001), enabling deployment without dense gland contouring. Code and reproducibility scripts are released.

2606.05346 2026-06-05 cs.CL

Trajectory Dynamics in Language Model Hidden States Predict Human Processing Costs Beyond Surprisal

语言模型隐藏状态中的轨迹动力学预测超越惊讶度的人类处理成本

Elan Barenholtz

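Affiliations Machine Perception & Cognitive Robotics Laboratory; Department of Psychology; Center for Complex Systems; Florida Atlantic University

AI summary Proposes trajectory extrapolation error, the deviation of language-model hidden states from a linearly extrapolated trajectory, as a predictor of human processing cost that is nearly independent of surprisal, and validates its ability to predict self-paced reading times on the Natural Stories corpus.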

Comments 17 pages, 3 figures, 6 tables

AI Chinese abstract

人类语言理解是顺序进行的:每个词在其前文语境中被处理,解释随时间逐步构建。惊讶度(给定语境下词的对数概率的负值)一直是增量处理成本的主要预测因子。但惊讶度将丰富的序列表示简化为每个词处的单个标量,丢弃了解释演化方向的信息。动力系统方法表明,演化解释状态的轨迹(而不仅仅是每个时刻的位置)应塑造处理过程,语言本身可能具有局部动量,因为说话者一次计划几个词。我们引入轨迹外推误差:在每个词处,我们拟合一条线性轨迹到变换器语言模型的前面隐藏状态,并测量与外推路径的偏差。在自然故事语料库上,该度量几乎与惊讶度正交(r = .044),并独立预测自定步速阅读时间。该效应在花园路径句子中尤为显著,随模型规模(GPT-2 Small到Large)增强,并在具有不同位置编码方案(GPT-2 vs. Pythia/RoPE)的架构中复现。位移控制显示该效应不能简化为表示变化幅度:位移和外推误差以相反方向预测。这些发现揭示了处理成本的两个可分离成分:词级预测误差(惊讶度)和对展开解释的局部动量(轨迹外推误差)的敏感性。

English abstract

Human language comprehension unfolds sequentially: each word is processed in the context of those that came before, and the interpretation builds incrementally over time. Surprisal, the negative log probability of a word given its context, has been the dominant predictor of incremental processing cost. But surprisal reduces rich sequential representations to a single scalar at each word, discarding information about the direction in which the interpretation has been evolving. Dynamical-systems approaches suggest that the trajectory of the evolving interpretive state, not just its position at each moment, should shape processing, and language itself may have local momentum, since speakers plan utterances a few words at a time. We introduce trajectory extrapolation error: at each word, we fit a linear trajectory to the preceding hidden states of a transformer language model and measure deviation from the extrapolated path. On the Natural Stories corpus, this measure is nearly orthogonal to surprisal (r = .044) and independently predicts self-paced reading times. The effect is especially pronounced in garden-path sentences, strengthens with model scale (GPT-2 Small to Large), and replicates across architectures with different positional encoding schemes (GPT-2 vs. Pythia/RoPE). A displacement control shows the effect is not reducible to representational change magnitude: displacement and extrapolation error predict in opposite directions. These findings reveal two dissociable components of processing cost: word-level prediction error (surprisal) and sensitivity to the local momentum of the unfolding interpretation (trajectory extrapolation error).
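One way to read the measure described in this abstract is as a per-position least-squares line fit followed by a one-step extrapolation. The sketch below is an interpretation under stated assumptions, not the paper's implementation: the window length, the use of ordinary least squares over a time index, and the Euclidean norm as the deviation are all choices made here for illustration.

```python
import numpy as np

def trajectory_extrapolation_error(hidden, window=3):
    """Deviation of each hidden state from a line fitted to the preceding states.

    hidden: (T, dim) array of per-token hidden states.
    window: number of preceding states used for the fit (assumed, must be >= 2).
    Returns a length-T array; the first `window` entries are NaN.
    """
    hidden = np.asarray(hidden, dtype=float)
    n_tokens = hidden.shape[0]
    errors = np.full(n_tokens, np.nan)
    steps = np.arange(window, dtype=float)
    design = np.stack([np.ones(window), steps], axis=1)  # intercept + slope
    for t in range(window, n_tokens):
        past = hidden[t - window:t]
        coef, *_ = np.linalg.lstsq(design, past, rcond=None)
        predicted = coef[0] + coef[1] * window  # extrapolate one step ahead
        errors[t] = np.linalg.norm(hidden[t] - predicted)
    return errors
```

A sequence of states that moves in a straight line at constant speed yields zero error everywhere, while a sudden change of direction produces a spike at the word where it happens.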

2606.05345 2026-06-05 cs.LG

PJ-RoPE: A Fourier-Jet-Affine Position Space for Relative Attention

PJ-RoPE:一种用于相对注意力的傅里叶-喷气-仿射位置空间

Yaobo Zhang

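Affiliations School of Physics, Ningxia University

AI summary Proposes PJ-RoPE, a Fourier-Jet-Affine relative-position space that unifies RoPE, Jordan-RoPE, and ALiBi, adapts to different tasks through learnable parameters, and introduces adaptive sector diagnostics and LC/rapidity coordinates to stabilize high-order jets.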

Comments 26 pages, 6 figures, 10 tables. Code available at https://github.com/ybzhang-nxu/Poincare_Rope

AI Chinese abstract

我们将RoPE的傅里叶相位、Jordan-RoPE的有限喷气和ALiBi的仿射近因统一到一个单一的可学习相对位置空间中,并研究不同任务选择该空间的哪些区域。PJ-RoPE是一种用于相对注意力的傅里叶-喷气-仿射公式,可选地具有庞加莱型解读,作为齐次傅里叶-喷气位置表示的仿射完备化。代数上,相同的基本元素构成一个有限常系数差分模:延迟移位算子的简单根给出傅里叶/RoPE特征,重复的非零根给出乔丹/傅里叶喷气,重复的单位根给出类似ALiBi的仿射近因。该框架将标量PJ偏置核与精确的PJ旋转特征变换分离,引入自适应扇区诊断,并使用LC/快度坐标稳定高阶喷气。受控探针验证了扇区包含和选择;小型语言运行暴露了仿射/近因边界;音乐令牌流提供了最清晰的情况,其中LC/仿射变体保持强劲,同时携带可测量的高阶修正;LC诊断显示尺度稳定性增益伴随相位分辨率损失。

English abstract

We unify RoPE's Fourier phase, Jordan-RoPE's finite jets, and ALiBi's affine recency into a single learnable relative-position space, and study which regions of this space are selected by different tasks. PJ-RoPE is a Fourier-Jet-Affine formulation for relative attention, with an optional Poincare-type reading as the affine completion of a homogeneous Fourier-jet positional representation. Algebraically, the same primitives form a finite constant-coefficient difference module: simple roots of the lag-shift operator give Fourier/RoPE characters, repeated nonzero roots give Jordan/Fourier jets, and the repeated unit root gives ALiBi-like affine recency. The framework separates scalar PJ-bias kernels from exact PJ-rotary feature transforms, introduces adaptive sector diagnostics, and uses LC/rapidity coordinates to stabilize high-order jets. Controlled probes verify sector containment and selection; small language runs expose an affine/recency boundary; music-token streams provide the clearest case where LC/affine variants remain strong while carrying measurable high-order corrections; and LC diagnostics show a scale-stability gain coupled to phase-resolution loss.

2606.05336 2026-06-05 cs.CL

Self-supervised User Profile Generation for Personalization

面向个性化的自监督用户画像生成

Clark Mingxuan Ju, Yuwei Qiu, Tong Zhao, Neil Shah

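Affiliations Snap Inc., Bellevue, WA, USA

AI summary Proposes the BUMP framework, which trains a large language model with a self-supervised bidirectional ranking objective to generate textual user profiles, enabling personalization without downstream labels.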

AI Chinese abstract

随着大语言模型(LLM)被部署到推荐、搜索、对话和内容生成等场景——在这些场景中,相同的查询应针对不同用户给出不同答案——个性化LLM已成为核心挑战。一个有前景的方法是将每个用户的交互历史总结为自然语言记忆或画像,并将其前置到提示中以便于个性化。现有方法使用来自标注下游任务的显式奖励来学习此类画像生成器,但这种方法成本高昂且稀疏,因为需要为每个目标任务提供标注监督。鉴于这一挑战,我们引入了通过画像的双向用户建模(BUMP),这是一个自监督框架,无需任何下游标签即可训练画像生成器。具体来说,给定用户的交互历史,我们使用GRPO训练LLM在双向批次内排序目标下生成自由形式的文本画像:一个小型LLM评判器衡量(i)生成的画像作为查询时,在批次中将用户自己的保留交互排在其他用户交互之上的程度,以及(ii)一个保留交互作为查询时,在批次中将用户自己的画像排在其他用户画像之上的程度。两个方向均使用多正例NDCG评分,并合并为每次生成的密集奖励;批次中的其他用户提供免费负例,因此每个训练样本仅从原始交互日志中获得监督。在LaMP基准测试上,BUMP匹配或超越了依赖标注奖励的闭源API和先前方法,同时在训练时无需任何任务标签。

English abstract

Personalizing large language models (LLMs) has become a central challenge as LLMs are deployed across recommendation, search, dialogue, and content generation -- settings where the same query should yield different answers given different users. A promising route is to summarize each user's interaction history into a natural-language memory or profile and prepend it to the prompt to facilitate personalization. Existing methods learn such profile generators with explicit rewards derived from labeled downstream tasks, which are expensive and sparse as they require annotated supervision for every target task. In light of this challenge, we introduce Bidirectional User Modeling via Profiles (BUMP), a self-supervised framework that trains a profile generator without any downstream labels. Specifically, given a user's interaction history, we use GRPO to train an LLM to emit a free-form textual profile under a bidirectional in-batch ranking objective: a small LLM judge measures (i) how well the generated profile, used as a query, ranks the user's own held-out interactions above interactions from other users in the batch, and (ii) how well a held-out interaction, used as a query, ranks the user's own profile above profiles of other users. Both directions are scored with multi-positive NDCG and combined into a dense reward per rollout; other users in the batch supply free negatives, so every training example yields supervision from raw interaction logs alone. Evaluated on the LaMP benchmark, BUMP matches or outperforms closed-source APIs and prior methods relying on labeled rewards, while requiring no task label at training.
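The multi-positive NDCG scoring in this abstract can be sketched for one direction of the ranking and then combined. The sketch assumes the judge's outputs are already available as numeric relevance scores, and it averages the two directions with equal weight; both the function names and the equal weighting are assumptions, since the abstract only says the directions are combined.

```python
import numpy as np

def ndcg_binary(scores, is_positive):
    """NDCG for one query with binary relevance and possibly several positives."""
    scores = np.asarray(scores, dtype=float)
    relevance = np.asarray(is_positive, dtype=float)
    order = np.argsort(-scores, kind="stable")            # best score first
    discounts = 1.0 / np.log2(np.arange(2, len(scores) + 2))
    dcg = np.sum(relevance[order] * discounts)
    ideal = np.sum(np.sort(relevance)[::-1] * discounts)  # positives ranked on top
    return float(dcg / ideal) if ideal > 0 else 0.0

def bidirectional_reward(profile_scores, item_is_own, item_scores, profile_is_own):
    """Average of profile-to-interactions and interaction-to-profiles NDCG."""
    forward = ndcg_binary(profile_scores, item_is_own)
    backward = ndcg_binary(item_scores, profile_is_own)
    return 0.5 * (forward + backward)
```

In this setup the negatives come for free: every interaction or profile in the batch that belongs to another user is simply marked as not positive.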

2606.05335 2026-06-05 cs.LG stat.ML

A prism hierarchy of learning regimes in large linear autoencoders

大型线性自编码器中学习机制的三棱柱层次结构

Eugene Golikov, Yaroslav Gusev, Dmitry Yarotsky

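Affiliations Applied AI Institute; Steklov Mathematical Institute of Russian Academy of Sciences

AI summary Using a formal loss-expansion hierarchy, associates the extreme learning regimes of large weight-tied linear autoencoders with the faces of a triangular prism and derives explicit expressions for the evolution of train and population losses in four of the five basic extreme regimes.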

Comments 33 pages, under review for NeurIPS'2026

详情
AI中文摘要

机器学习模型的理论研究通常考虑不同的极限机制,在这些机制下梯度下降的学习动态在理论上变得可处理。然而,对于特定类型的模型,系统地获得所有定性不同的极端学习机制的图景是可取的。在本文中,我们为大型权重绑定线性自编码器提出了这样一个图景,其特征由输入和潜在维度、初始化幅度以及训练集大小决定。该模型在权重上非线性,其梯度流没有一般的理论解。我们表明,在形式损失展开层次结构层面,其极端机制自然地与三棱柱的面相关联。特别地,存在与棱柱的2-面相关的五种基本极端机制:(1) 大数据,(2) 小数据,(3) 平均场,(4) 窄潜在,以及 (5) 自由。对于机制 (1,2,3,4),我们推导了梯度流下训练和总体极限损失演化的显式表达式,与实验结果非常吻合。

English abstract

Theoretical studies of machine learning models commonly consider different limiting regimes in which the learning dynamics of gradient descent becomes theoretically tractable. It is, however, desirable to have a systematically obtained picture of all qualitatively different extreme learning regimes for a particular type of models. In this paper we propose such a picture for large weight-tied linear autoencoders characterized by input and latent dimensions, initialization magnitude, and training set size. This model is nonlinear in the weights and its gradient flow does not have a general theoretical solution. We show that at the level of the formal loss-expansion hierarchy, its extreme regimes are naturally associated with faces of a triangular prism. In particular, there are five basic extreme regimes associated with the 2-faces of the prism: (1) large-data, (2) small-data, (3) mean-field, (4) narrow-latent, and (5) free. For regimes (1,2,3,4), we derive explicit expressions for both train and population limiting loss evolutions under gradient flow, obtaining very good agreement with experimental results.

2606.05334 2026-06-05 cs.AI

Uncertainty Aware Functional Behavior Prediction and Material Fatigue Assessment for Circular Factory

面向循环工厂的不确定性感知功能行为预测与材料疲劳评估

Nehal Afifi, Mehdi Khabou, Victor Mas, Jonas Hemmerich, Patric Grauberger, Stefan Dietrich, Volker Schulze, Sven Matthiesen

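Affiliations IPEK Institute of Product Engineering, Karlsruhe Institute of Technology (KIT); IAM-WK Institute for Applied Materials - Materials Science and Engineering, Karlsruhe Institute of Technology (KIT); wbk Institute of Production Science, Karlsruhe Institute of Technology (KIT)

AI summary For reuse decisions on returned products with heterogeneous degradation states in circular factories, proposes an instance-specific reliability framework that combines uncertainty-aware functional prediction with component-level fatigue assessment: a convolutional encoder extracts loading patterns, an LSTM predicts functional variables, and finite-element stress reconstruction with fatigue damage evaluation feeds fused functional, material, and system reliability trajectories.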

Comments 27 pages, submitted to the Journal of Manufacturing Systems' special issue about circular factories, the manuscript is under review

AI Chinese abstract

循环工厂中的回收产品以异质退化状态、使用历史和剩余能力重新进入生产。仅凭当前检查无法决定再利用,因为未来功能实现和组件完整性可能在下一个服务场景下以不同方式演变。现有的PHM方法支持退化预测,但通常针对固定操作条件或孤立组件基准,而材料疲劳评估很少与系统级功能预后相关联。本文针对角磨机通过将不确定性感知功能预测与组件级疲劳评估结合在一个实例特定的可靠性工作流程中来解决这一差距。所提出的框架结合了当前工具状态与最近的力-扭矩使用窗口。卷积编码器从主轴力和轴扭矩中提取载荷模式,LSTM骨干网络预测九个功能变量作为高斯均值和方差估计。同时,相同的载荷历史通过有限元支持的应力重建、带Haibach扩展的S-N/Miner损伤评估和Paris定律裂纹扩展分析转化为输出轴疲劳信息。流式重放算法将两个分支整合为功能、材料和系统可靠性轨迹。保留测试显示九个输出的平均2%容差精度为0.9652。热变量预测近乎完美,而驱动电机电流和负载速度仍然是最具挑战性的动态输出,R²值分别为0.9750和0.9924。扭矩历史对这些变量尤其重要,传统LSTM在短历史设置中优于GRU和xLSTM。可靠性校准对驱动电机电流信息量最大,其中预测和观测的超越概率...

English abstract

Returned products in circular factories re-enter production with heterogeneous degradation states, usage histories, and remaining capability. Reuse cannot be decided from the current inspection alone, because future function fulfillment and component integrity may evolve differently under the next service scenario. Existing PHM approaches support degradation prediction, but often target fixed operating conditions or isolated component benchmarks, while material-fatigue assessment is rarely linked to system-level functional prognosis. This paper addresses this gap for an angle grinder by combining uncertainty-aware functional prediction with component-level fatigue assessment in an instance-specific reliability workflow. The proposed framework combines the current tool state with recent force--torque usage windows. A convolutional encoder extracts loading patterns from spindle forces and shaft torque, and an LSTM backbone predicts nine functional variables as Gaussian mean and variance estimates. In parallel, the same loading history is translated into output-shaft fatigue information through finite-element-supported stress reconstruction, S--N/Miner damage evaluation with Haibach extension, and Paris-law crack-growth analysis. A streaming replay algorithm consolidates both branches into functional, material, and system reliability trajectories. Held-out tests show mean \(2\%\)-tolerance accuracy of 0.9652 across nine outputs. Thermal variables are predicted near-perfectly, while drive motor current and load speed remain the most demanding dynamic outputs, with \(R^2\) values of 0.9750 and 0.9924. Torque history is especially important for these variables, and the conventional LSTM outperforms GRU and xLSTM in the short-history setting. Reliability calibration is most informative for drive motor current, where predicted and observed exceedance probabilities ...

2606.05332 2026-06-05 cs.AI

GITCO: Gated Inference-Time Context Optimization in TSFMs

GITCO:TSFMs中的门控推理时上下文优化

Manya Pandey, Dhruv Kumar, Murari Mandal, Saurabh Deshpande

发表机构 * Birla AI Labs(巴尔拉人工智能实验室)

AI总结 提出GITCO框架,通过门控机制在推理时选择性抑制有害补丁,无需更新参数即可提升基于补丁的时间序列基础模型的零样本预测精度。

Comments ICML 2026 Workshop on Foundation Models for Structured Data

详情
AI中文摘要

基于补丁的时间序列基础模型(TSFMs)遭受上下文中毒:结构异常的补丁捕获了不成比例的注意力,并无声地降低了零样本预测质量。我们提出通过在推理时优化输入上下文而不是修改模型权重来提高TSFM精度。我们提出了GITCO(门控推理时上下文优化),一个轻量级的三组件框架:门控、路由和批评者,无需任何参数更新即可选择性地识别和抑制有害补丁。在TimesFM 2.5上,跨53个GIFT-Eval数据集进行K折交叉验证评估,GITCO在TimesFM 2.5上实现了平均+1.95%的MASE降低,同时捕获了89.9%的改进上限。我们引入了上下文敏感性配置文件作为TSFMs的一个新的可表征属性:从时间序列元特征到推理时上下文干预下预期精度改进的映射,由模型架构和数据的统计结构共同塑造。

英文摘要

Patch-based Time Series Foundation Models (TSFMs) suffer from context poisoning: structurally anomalous patches capture disproportionate attention and silently degrade zero-shot forecast quality. We propose improving TSFM accuracy at inference time by optimizing the input context rather than modifying model weights. We present GITCO (Gated Inference-Time Context Optimization), a lightweight framework with three components (Gate, Router, and Critic) that selectively identifies and suppresses harmful patches without any parameter updates. Evaluated on TimesFM 2.5 across 53 GIFT-Eval datasets under K-fold cross-validation, GITCO reduces MASE by 1.95% on average while capturing 89.9% of the improvement upper bound. We introduce context sensitivity profiles as a new characterizable property of TSFMs: the mapping from time series meta-features to expected accuracy improvement under inference-time context intervention, shaped jointly by model architecture and the statistical structure of the data.

2606.05330 2026-06-05 cs.CL cs.AI cs.HC

A Model of Multi-turn Human Persuadability Using Probabilistic Belief Tracing

基于概率信念追踪的多轮人类可说服性模型

Jared Moore, Noah Goodman, Nick Haber, Max Kleiman-Weiner

发表机构 * Stanford University(斯坦福大学) University of Washington(华盛顿大学)

AI总结 提出PERSUASIONTRACE框架,通过记录多轮信念报告、标注修辞维度并引入贝叶斯网络模拟目标,将说服评估从端点变化转向过程保真度。

详情
AI中文摘要

大型语言模型可以在高风险领域改变人类信念,但大多数说服研究依赖于前/后信念变化。这些端点测量确定了说服是否发生,却忽略了信念在对话中移动的位置和方式。我们提出了PERSUASIONTRACE,一个用于研究人机交互中说服的框架。基于网络实验平台,PERSUASIONTRACE贡献了一个多轮说服研究的工具和一个过程级评估协议:它记录来自人类或模拟说服目标的多轮信念报告,用修辞维度(logos/pathos/ethos)标注说服者轮次,并通过保真度评估模拟器与真实人类信念动态的匹配程度。使用该框架,我们发现人类目标分为两个多轮信念更新聚类,并对修辞策略表现出易感性;LLM在通用和个性化主题、文本和音频模态以及多轮交互中都具有说服力。先前的工作主要使用普通提示的LLM来模拟人类目标,但我们表明这些模拟器无法复制人类信念动态。我们引入了一个贝叶斯网络模拟目标,它随时间维持显式的潜在信念状态,使得每个说服者消息产生认知上真实的信念更新。在人类相似性评估中,我们的贝叶斯目标得分接近人类参考(81 vs 80),而基线LLM目标得分显著较低(64)。PERSUASIONTRACE将说服评估从仅端点移动重新定义为过程保真度,为科学分析和说服系统的更安全优化提供了更强的基础。

英文摘要

Large language models can shift human beliefs across high-stakes domains, but most persuasion studies rely on pre/post belief change. These endpoint measures identify whether persuasion occurred, yet miss where and how beliefs moved within a dialogue. We present PERSUASIONTRACE, a framework for studying persuasion in human-LLM interaction. Built on a web-based experimental platform, PERSUASIONTRACE contributes a tool for multi-turn persuasion studies and a process-level evaluation protocol: it records multi-turn belief reports from human or simulated targets of persuasion, annotates persuader turns with rhetorical dimensions (logos/pathos/ethos), and evaluates simulators by fidelity to real human belief dynamics. Using this framework, we find that human targets group into two clusters of multi-turn belief updates and exhibit susceptibility to rhetorical strategies, and that LLMs are persuasive across generic and personalized topics, text and audio modalities, and multi-turn interactions. Prior work has chiefly used vanilla-prompted LLMs to simulate human targets, but we show that these simulators fail to replicate human belief dynamics. We introduce a Bayesian-network simulated target that maintains an explicit latent belief state over time so each persuader message yields cognitively realistic belief updates. In human-likeness evaluation, our Bayesian target scores near a human reference (81 vs 80), while baseline LLM targets score substantially lower (64). PERSUASIONTRACE reframes persuasion evaluation from endpoint movement alone to process fidelity, providing a stronger basis for scientific analysis and safer optimization of persuasive systems.

2606.05327 2026-06-05 cs.LG q-bio.QM stat.ML

Multimarginal flow matching with optimal transport potentials

基于最优传输势的多边缘流匹配

Raghav Kansal, David Crair, Nghia Nguyen, Scott Pope, Bradley Parry

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出一种利用动态最优传输势引导流匹配学习中间边缘分布的方法,实现高效无模拟的多边缘流匹配,在单细胞RNA测序、海洋学和气象数据集上取得最优性能。

Comments 9 pages, 3 figures, 4 tables, and a 27 page appendix. Accepted to the Forty-Third International Conference on Machine Learning

详情
AI中文摘要

流匹配(FM)已成为学习两个经验分布之间动态传输映射的强大框架。然而,对于存在中间观测边缘分布的情况,这些边缘分布有助于约束端点之间的流,这方面的研究较少。这种“多边缘”设置对于许多科学领域中动态系统的时间演化建模至关重要,这些领域可以对序列分布进行采样。我们通过一种新颖的方法解决了这个问题,该方法利用了FM与动态最优传输(OT)之间的联系,通过动态OT作用中的势项将流柔和地引导向中间边缘分布。通过扩展条件FM学习目标以包含这些势,我们推导出一种高效、无模拟的多边缘FM算法,该算法在学习流的时空动力学方面提供了相当大的灵活性。我们在不同的单细胞RNA测序、海洋学和气象数据集上展示了OT势FM(OTP-FM)的最先进性能和训练效率。我们的代码可在https://github.com/Bexorg-Inc/OTP-FM获取。

英文摘要

Flow matching (FM) has emerged as a powerful framework for learning dynamic transport maps between two empirical distributions. However, less explored is the setting with intermediate observed marginals that can help constrain the flows between the endpoints. This "multimarginal" regime is central to modeling temporal evolution in dynamical systems in many scientific domains that can sample sequential distributions. We tackle this problem with a novel approach that leverages the connection between FM and dynamic optimal transport (OT), softly steering the flow towards the intermediate marginals through potential terms in the dynamic OT action. By extending the conditional FM learning target to incorporate these potentials, we derive an efficient, simulation-free algorithm for multimarginal FM that offers considerable flexibility in the spatiotemporal dynamics of the learned flows. We demonstrate state-of-the-art performance and training efficiency of OT-potential FM (OTP-FM) on diverse single-cell RNA sequencing, oceanographic, and meteorological datasets. Our code is available at https://github.com/Bexorg-Inc/OTP-FM.
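For orientation, here is a minimal sketch of the standard two-marginal conditional FM regression target that the abstract describes extending. It assumes the usual straight-line conditional path between paired samples. The function names are hypothetical, and the OT-potential terms that steer the flow toward intermediate marginals are not included.

```python
def cfm_training_pair(x0, x1, t):
    """Return (x_t, u_t) for the straight-line conditional path.

    x_t = (1 - t) * x0 + t * x1 is a point on the path, and
    u_t = x1 - x0 is the velocity a model would regress onto.
    """
    x_t = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    u_t = [b - a for a, b in zip(x0, x1)]
    return x_t, u_t


def cfm_loss(velocity_model, x0, x1, t):
    """Squared error between the model velocity at (x_t, t) and the target."""
    x_t, u_t = cfm_training_pair(x0, x1, t)
    v = velocity_model(x_t, t)
    return sum((vi - ui) ** 2 for vi, ui in zip(v, u_t))


# Hypothetical source/target samples and time.
x_t, u_t = cfm_training_pair([0.0, 0.0], [2.0, -4.0], 0.25)
```

Training is simulation-free in the sense that each loss term needs only a sampled pair and a sampled time, with no ODE solve.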

2606.05316 2026-06-05 cs.AI

I Know What You Meme, Even If it Emerged Today: Understanding Evolving Memes through Open-World Knowledge Acquisition

我知道你的梗,即使它今天才出现:通过开放世界知识获取理解不断演变的梗

Shanhong Liu, Rui Cao, Pai Chet Ng, De Wen Soh

发表机构 * Singapore University of Technology and Design(新加坡科技设计大学) Singapore Institute of Technology(新加坡理工学院)

AI总结 提出Query Retrieve Conclude零样本框架,通过识别缺失知识、检索开放网络证据并合成背景知识,以理解新兴梗并提升检测性能。

详情
AI中文摘要

多模态梗是动态的,通常需要最新的背景知识来进行解释。现有方法往往忽略此类知识,或依赖预训练模型的固定参数知识,这些知识可能不完整、过时或无法用于新兴梗。我们引入了Query Retrieve Conclude,一个零样本框架,用于识别缺失知识、检索开放网络证据并合成基于证据的背景知识,以进行梗的理解和检测。我们还引入了一个精心策划的梗理解基准,包含2024年至2026年的近期梗及其外部背景知识注释。在三个梗理解数据集和五个梗检测任务上的实验表明,我们的框架在知识恢复、梗理解和下游检测方面优于零样本基线。

英文摘要

Multimodal memes are dynamic and often require up-to-date background knowledge for interpretation. Existing methods often overlook such knowledge or rely on fixed parametric knowledge of pretrained models that may be incomplete, outdated, or unavailable for emerging memes. We introduce Query Retrieve Conclude, a zero-shot framework that identifies missing knowledge, retrieves open-web evidence, and synthesizes evidence-grounded background knowledge for meme understanding and detection. We also introduce a curated meme understanding benchmark of recent memes from 2024 to 2026 with external background knowledge annotations. Experiments on three meme understanding datasets and five meme detection tasks show that our framework improves knowledge recovery, meme understanding, and downstream detection over zero-shot baselines.

2606.05315 2026-06-05 cs.CL cs.AI

LoRi: Low-Rank Distillation for Implicit Reasoning

LoRi: 用于隐式推理的低秩蒸馏

Ryan Solgi, Jiayi Tian, Zheng Zhang

发表机构 * University of California-Santa Barbara(加州大学圣巴巴拉分校)

AI总结 提出低秩蒸馏框架,通过对齐师生模型在共享低秩张量子空间中的隐状态推理轨迹,提升大型语言模型的隐式思维链推理能力。

详情
AI中文摘要

隐式思维链方法旨在将推理内化到大型语言模型中,但通常表现不如显式思维链提示。我们通过实验发现,隐状态推理轨迹具有低秩结构。基于此观察,我们提出了一种低秩蒸馏框架,通过使用一阶和二阶统计量,在共享的低秩张量子空间中对齐教师和学生轨迹来传递推理能力。得到的公式捕捉了推理的全局结构,同时支持紧凑的潜在推理过程。我们在多个模型家族(包括LLaMA和Qwen)上,在不同规模下对数学推理基准进行了评估。我们的方法持续提升了性能,尤其是在具有挑战性的多步任务上,接近显式思维链的准确率,并优于先前的隐式思维链蒸馏方法。

英文摘要

Implicit chain-of-thought (iCoT) methods aim to internalize reasoning in large language models, but often underperform explicit CoT prompting. We empirically find that hidden-state reasoning trajectories exhibit low-rank structure. Motivated by this observation, we propose a low-rank distillation framework that transfers reasoning by aligning teacher and student trajectories in a shared low-rank tensor subspace using first- and second-order statistics. The resulting formulation captures the global structure of reasoning while supporting a compact latent reasoning process. We evaluate the method across multiple model families, including LLaMA and Qwen, at different scales on mathematical reasoning benchmarks. Our approach consistently improves performance, especially on challenging multi-step tasks, approaching explicit CoT accuracy and outperforming prior iCoT distillation methods.

2606.05308 2026-06-05 cs.LG cs.AI cs.CL cs.IR stat.AP

Statistically Reliable LLM-Based Ranking Evaluation via Prediction-Powered Inference

基于预测驱动推断的统计可靠LLM排序评估

Abhishek Divekar

发表机构 * Amazon(亚马逊)

AI总结 提出PRECISE框架,将预测驱动推断扩展到排序评估指标,通过结合少量人工标注和大量LLM判断实现无偏估计,并在ESCI基准和实际系统中验证了有效性。

Comments Accepted at ACL 2026 - GEM Workshop

详情
AI中文摘要

通过PRECISE,我们将预测驱动推断扩展到排序评估指标,通过结合少量人工标注集和大量LLM判断集,产生偏差校正的估计。PPI无论LLM判断器的错误分布如何,都是可证明无偏的。我们通过将输出空间计算从O(2^|C|)减少到O(2^K),使其适用于像Precision@K这样的分层指标,其中标注是按文档的,但指标是按查询的。在ESCI基准上,用Claude 3 Sonnet判断增强30个人工标注,将Precision@4估计的标准误差从4.45降低到3.50(相对减少21%)。在一个生产系统中,我们的框架从100个人工标签和2小时的领域专家标注中正确识别了三个系统变体中最好的一个;A/B测试确认了这一排序,日销售额增加了407个基点。

英文摘要

With PRECISE, we extend Prediction-Powered Inference (PPI) to produce bias-corrected estimates of ranking evaluation metrics by combining a small human-labeled set with a large LLM-judged set. PPI is provably unbiased regardless of the LLM judge's error profile. We make it applicable to hierarchical metrics like Precision@K, where annotations are per-document but the metric is per-query, by reducing the output-space computation from O(2^|C|) to O(2^K). On the ESCI benchmark, augmenting 30 human annotations with Claude 3 Sonnet judgments reduces the standard error of Precision@4 estimates from 4.45 to 3.50 (a 21% relative reduction). In a production system, our framework correctly identified the best of three system variants from 100 human labels and 2 hours of domain-expert annotation; A/B testing confirmed this ranking with +407 bps in daily sales.
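The bias correction behind Prediction-Powered Inference can be sketched for the simplest case, estimating a mean. The example below is an illustration only: the function name and the toy scores are hypothetical, and it does not implement the hierarchical Precision@K extension that the abstract describes.

```python
from statistics import mean


def ppi_mean(human_labels, judge_on_labeled, judge_on_unlabeled):
    """Prediction-powered estimate of a mean.

    judge_on_unlabeled: LLM-judge scores on the large unlabeled set.
    human_labels / judge_on_labeled: paired scores on the small labeled set.
    The rectifier (mean human-minus-judge gap) corrects the judge's bias.
    """
    rectifier = mean(h - j for h, j in zip(human_labels, judge_on_labeled))
    return mean(judge_on_unlabeled) + rectifier


# Toy relevance scores (1.0 = relevant); the judge over-predicts relevance.
human = [1.0, 0.0, 1.0, 1.0]
judge_labeled = [1.0, 1.0, 1.0, 1.0]
judge_unlabeled = [1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0]
estimate = ppi_mean(human, judge_labeled, judge_unlabeled)
```

In this toy case the judge alone would report 0.75, and the rectifier of -0.25 pulls the estimate down to 0.5.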

2606.05304 2026-06-05 cs.AI

What Should Agents Say? Action-state Communication for Efficient Multi-Agent Systems

智能体应该说什么?面向高效多智能体系统的动作-状态通信

Chen Huang, Yuhao Wu, Wenxuan Zhang

发表机构 * Singapore University of Technology and Design(新加坡科技设计大学)

AI总结 针对多智能体系统中自由形式通信导致令牌膨胀和性能下降的问题,提出PACT协议,将通信视为公共状态更新问题,压缩为紧凑的动作-状态记录,在多种拓扑下实现性能与成本权衡的优化。

Comments 13 pages, 5 figures

详情
AI中文摘要

基于大语言模型的多智能体系统通常围绕角色、流水线和轮次调度进行组织,而智能体之间传递的内容往往被保留为无约束的自然语言。然而,这种自由形式的通信会迅速膨胀令牌使用量,消耗共享上下文窗口,并最终影响系统性能和推理成本。我们分析了两种多智能体系统拓扑中五种常见的智能体间通信策略,发现没有固定策略是普遍最优的。相反,有效的智能体间消息始终保留下游智能体所需的以动作为中心的信息。基于此,我们提出了PACT(协议化动作-状态通信与传输),它将智能体间通信视为公共状态更新问题,并在每个原始智能体输出进入共享历史之前将其投影为紧凑的动作-状态记录。在不同的多智能体系统拓扑中,PACT持续改善了性能-成本权衡,以显著更少的令牌实现了相当或更强的任务性能。这些增益扩展到生产编码工具:PACT在将每个已解决任务的令牌数降低10%的同时提升了OpenHands的解决率,并在SWE-agent上保持解决率基本不变,同时将输入令牌减半。我们的代码公开在https://github.com/iNLP-Lab/PACT。

英文摘要

Multi-agent systems (MAS) built on large language models are typically organized around roles, pipelines, and turn schedules, while the content that agents pass to one another is often left as unconstrained natural language. However, this free-form communication can rapidly inflate token usage, consume the shared context window, and ultimately affect both system performance and inference cost. We analyze five common inter-agent communication strategies across two MAS topologies, finding that no fixed strategy is universally optimal. Instead, effective inter-agent messages consistently preserve action-centered information needed by downstream agents. Building on this, we propose PACT (Protocolized Action-state Communication and Transmission), which treats inter-agent communication as a public state-update problem and projects each raw agent output into a compact action-state record before it enters shared history. Across different MAS topologies, PACT consistently improves the performance-cost trade-off, achieving comparable or stronger task performance with substantially fewer tokens. The gains extend to production coding harnesses: PACT raises OpenHands' resolve rate while using 10% fewer tokens per resolved task, and is resolve-neutral on SWE-agent while halving input tokens. Our code is publicly available at https://github.com/iNLP-Lab/PACT.

2606.05296 2026-06-05 cs.LG cs.AI

Agentic Monte Carlo: Simulating Reinforcement Learning for Black-Box Agents

智能体蒙特卡洛:黑盒智能体的强化学习模拟

Dae Yon Hwang, Raunaq Suri, Valentin Villecroze, Anthony L. Caterini, Jesse C. Cresswell, Noël Vouitsis, Brendan Leigh Ross

发表机构 * University of Cambridge(剑桥大学)

AI总结 提出Agentic Monte Carlo (AMC)方法,利用序贯蒙特卡洛从最优策略后验中采样,无需参数级优化即可对黑盒LLM智能体进行强化学习式优化,在AgentGym基准上超越提示基线并随测试时计算扩展优于GRPO。

Comments Accepted by ICML 2026

详情
AI中文摘要

LLM智能体在两种不同的机制下运行:适用于强化学习(RL)的开权重智能体,以及其行为必须在测试时纯粹控制的黑盒智能体。尽管黑盒智能体通常由最先进的专有LLM支持,但仅API访问排除了参数级优化,使得大多数RL方法不适用。为解决这一限制,我们转向RL与贝叶斯推断之间的已知等价性。我们提出智能体蒙特卡洛(AMC),直接从黑盒智能体的最优策略中采样,而不是通过RL训练它。最优策略是轨迹上的后验,其先验我们定义为固定的黑盒LLM智能体。我们采用序贯蒙特卡洛从该后验中采样,通过学习一个价值函数来引导智能体,同时保持底层黑盒模型不变。我们在AgentGym基准的三个不同环境中验证了AMC,展示了相对于提示基线的显著改进,并且随着我们方法测试时计算的扩展,甚至优于组相对策略优化(GRPO)。AMC证明了执行黑盒LLM智能体的原则性RL式优化的可行性。代码可在https://github.com/layer6ai-labs/Agentic-Monte-Carlo获取。

英文摘要

LLM agents operate in two distinct regimes: open-weight agents amenable to reinforcement learning (RL) and black-box agents whose behaviour must be controlled purely at test time. Although black-box agents are often backed by state-of-the-art proprietary LLMs, API-only access precludes parameter-level optimization, rendering most RL methods inapplicable. To address this limitation, we turn to a known equivalence between RL and Bayesian inference. We propose Agentic Monte Carlo (AMC) to directly sample from the optimal policy of a black-box agent rather than training it through RL. The optimal policy is a posterior over trajectories whose prior we define as the fixed black-box LLM agent. We employ Sequential Monte Carlo to sample from this posterior by learning a value function to steer the agent while leaving the underlying black-box model unchanged. We validate AMC on three diverse environments from the AgentGym benchmark, demonstrating significant improvements over prompting baselines and even outperforming Group Relative Policy Optimization (GRPO) as we scale the test-time compute of our method. AMC demonstrates the feasibility of performing principled RL-style optimization of black-box LLM agents. Code is available at https://github.com/layer6ai-labs/Agentic-Monte-Carlo
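The reweight-and-resample idea behind Sequential Monte Carlo steering can be sketched as follows. This is a generic illustration under stated assumptions, not the AMC algorithm itself: the incremental weight exp((V_new - V_old) / temperature) is one common way to use value estimates as twist functions, and the function name, the `temperature` parameter, and the toy values are all hypothetical.

```python
import math
import random


def smc_step(particles, old_values, new_values, temperature=1.0, rng=random):
    """One reweight-and-resample step over partial trajectories.

    Each particle's incremental log-weight is the change in its estimated
    value, scaled by a temperature (an assumed form, for illustration).
    Particles are then resampled in proportion to the normalized weights,
    so the underlying black-box agent is never modified.
    """
    log_w = [(vn - vo) / temperature for vn, vo in zip(new_values, old_values)]
    shift = max(log_w)  # subtract the max for numerical stability
    weights = [math.exp(lw - shift) for lw in log_w]
    total = sum(weights)
    probs = [w / total for w in weights]
    picks = rng.choices(range(len(particles)), weights=probs, k=len(particles))
    return [particles[i] for i in picks], probs


# Toy example: the third partial trajectory gains far more estimated value.
rng = random.Random(0)
resampled, probs = smc_step(
    ["a", "b", "c"], [0.0, 0.0, 0.0], [0.0, 0.0, 100.0], rng=rng
)
```

Raising the number of particles is the natural knob for spending more test-time compute in a scheme like this.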

2606.05290 2026-06-05 cs.CV cs.AI cs.MM

Do Models Share Safety Representations? Cross-Model Steering for Safe Visual Generation

模型是否共享安全表示?面向安全视觉生成的跨模型引导

Tobia Poppi, Silvia Cappelletti, Sara Sarto, Florian Schiffers, Garin Kessler, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara

发表机构 * University of Modena and Reggio Emilia(摩德纳和雷吉奥艾米利亚大学) University of Pisa(比萨大学) Amazon Prime Video(亚马逊prime视频)

AI总结 本文提出首个跨模型安全引导框架,通过源语言模型估计安全方向并迁移至目标生成器,无需目标侧不安全数据即可实现安全控制,且不牺牲生成质量。

Comments Project page: https://aimagelab.github.io/cross-model-safety-representations/

详情
AI中文摘要

生成建模的最新进展使安全控制成为核心挑战,但现有方法大多针对特定模型,需要为每种新架构重新训练或定制干预。在这项工作中,我们探究安全是否可以被表示为一种可移植的潜在方向,一次性学习并在异构生成器之间重用。我们引入了首个跨模型安全引导框架,其中从成对的安全-不安全提示中在源大语言模型中估计安全方向,通过仅在良性数据上拟合的轻量级对齐传输到目标生成器,并在推理时应用。关键的是,我们的流程从未访问目标侧的不安全数据,从而隔离了安全是否可以通过共享表示几何进行转移。除了单个全局方向,我们还识别了一种多向量扩展,捕获类别特定的安全行为,实现更具选择性的控制。我们在文本到图像和文本到视频生成中评估了我们的方法,跨越不同的源-目标模型对。跨模型转移的安全方向实现了与在目标模型上使用不安全数据本地学习的方向相当的ASR降低和CLIP-Score/FID权衡,同时不需要目标侧的不安全数据。这表明安全改进不以生成质量为代价。我们的结果指向了一种模块化的安全观:安全相关行为并非纯粹模型局部,而是可以通过跨模型持续的潜在方向进行控制。这为轻量级、可重用的安全机制开辟了新路径,且无需目标侧不安全数据。

英文摘要

Recent progress in generative modeling has made safety control a central challenge, yet existing approaches remain largely model-specific, requiring retraining or tailored interventions for each new architecture. In this work, we ask whether safety can be represented as a portable latent direction, learned once and reused across heterogeneous generators. We introduce the first framework for cross-model safety steering, in which a safety direction is estimated in a source LLM from paired safe-unsafe prompts, transported to a target generator through a lightweight alignment fitted on benign data alone, and applied at inference time. Crucially, our pipeline never accesses unsafe data on the target side, isolating whether safety can be transferred through shared representation geometry. Beyond a single global direction, we also identify a multi-vector extension that captures category-specific safety behaviors, enabling more selective control. We evaluate our approach in text-to-image and text-to-video generation across diverse source-target model pairs. Across models, transferred safety directions achieve ASR reduction and CLIP-Score/FID trade-offs comparable to directions learned natively on the target model using unsafe data, while requiring no target-side unsafe data. This indicates that safety improvements do not come at the expense of generation quality. Our results point to a modular view of safety: safety-relevant behavior is not purely model-local, but can be controlled through latent directions that persist across models. This suggests a new path toward lightweight, reusable safety mechanisms that do not require target-side unsafe data.

2606.05275 2026-06-05 cs.CV cs.AI

Personal AI Agent for Camera Roll VQA

个人AI代理用于相机胶卷VQA

Thao Nguyen, Krishna Kumar Singh, Donghyun Kim, Yong Jae Lee, Yuheng Li

发表机构 * University of Wisconsin-Madison(威斯康星大学麦迪逊分校) Korea University(韩国大学) Adobe Research(Adobe研究院)

AI总结 本文提出camroll数据集和camroll-agent代理,通过层次化记忆和工具集解决个人相机胶卷中的长程、高度个性化的视觉问答问题。

Comments Project page, code, and demo: https://thaoshibe.github.io/camroll

详情
AI中文摘要

我们研究了个人相机胶卷的视觉问答设定。在该设定中,一个对话式AI助手可以访问用户的个人相机胶卷并检索相关照片来回答查询,从简单的事实性问题(例如,“我昨天尝试的食物名称?”)到更开放的问题(例如,“推荐一些我从未吃过的菜肴”)。鉴于个人相机胶卷的庞大性质(即多年、数百到数千张照片),一个成功的AI助手需要理解长程、高度个性化的视觉内容流,以便导航和定位正确和/或相关信息。为此,我们收集并手动标注了模拟真实世界使用场景的问题。最终数据集camroll包含50个用户、31,476张图像和2,500个问答对。我们进一步设计了camroll-agent,一个配备层次化记忆和最小工具集的对话式AI代理,用于在大型个性化视觉记忆上高效导航。实验结果表明,camroll-agent在长上下文理解的AI代理系统中优于众多基线和方法。总之,camroll数据集和camroll-agent凸显了AI代理在长上下文推理中的差距:个性化视觉记忆需要与标准长上下文文本记忆不同的方法,尤其是在存在一致性、视觉细节和用户特定上下文时。

英文摘要

We study the personal camera roll visual question answering setting. In this setting, a conversational AI assistant can access a user's personal camera roll and retrieve relevant photos to answer queries, ranging from simple factual questions (e.g., "Name of the food I tried yesterday?") to more open-ended ones (e.g., "Recommend some dishes I have never eaten before"). Given the vast nature of the personal camera roll (i.e., multiple years, hundreds to thousands of photos), a successful AI assistant needs to understand a long-horizon, highly personalized visual content stream in order to navigate and locate the correct and/or relevant information. To support this, we collect and manually annotate questions that mimic real-world usage. The final dataset, camroll, contains 50 users, 31,476 images, and 2,500 QA pairs. We further design camroll-agent, a conversational AI agent equipped with hierarchical memory and a minimal set of tools for efficient navigation over large, personalized visual memory. Experimental results show that camroll-agent outperforms numerous baselines and existing methods for long-context understanding in AI agent systems. Together, the camroll dataset and camroll-agent highlight the gap in AI agents' long-context reasoning: personalized visual memory requires different approaches from standard long-context textual memory, especially when consistency, visual details, and user-specific context are present.

2606.05274 2026-06-05 cs.LG

Anomaly Detection for Electro-Hydrostatic Actuators using LSTM Autoencoder

基于LSTM自编码器的电液伺服作动器异常检测

Nehal Afifi, Abdelmonem Elhendawi, Felix Leitenberger, Nadine Piat, Sven Matthiesen

发表机构 * IPEK - Institute of Product Engineering, Karlsruhe Institute of Technology (KIT), Germany(产品工程研究所,卡尔斯鲁厄理工学院(KIT),德国) SUPMICROTECH-ENSMM, France(SUPMICROTECH-ENSMM,法国)

AI总结 针对电液伺服作动器传感器信号,提出基于LSTM自编码器的重构异常检测框架,在多种故障注入场景下达到99%平均准确率与极低误报率。

Comments 8 pages, 6 figures, 3 tables, ESREL 2026 -European Safety and Reliability Conference, accepted paper to be published

详情
AI中文摘要

电液伺服作动器(EHA)广泛应用于航空航天和工业系统,及时检测传感器异常对于确保安全可靠运行至关重要。然而,EHA传感器数据量大且采样频率高,给准确高效的异常检测带来了挑战。传统的统计和经典机器学习方法,如Z-score、四分位距(IQR)、中位数绝对偏差(MAD)、孤立森林、高斯混合和k-means,往往无法捕捉EHA信号中固有的时间依赖性,导致检测精度有限且误报率升高。此外,针对EHA系统的数据驱动异常检测方法的系统评估仍然很少,特别是在不同运行条件下。本研究提出了一种针对单变量EHA传感器信号的离线异常检测框架,重点关注从受控测试台收集的温度和压力数据。该方法采用基于重构的长短期记忆(LSTM)自编码器,通过验证集重构误差分布进行校准和评估。在多种故障注入场景下,使用准确率、精确率、召回率和F1分数评估性能,并辅以不同运行条件下的敏感性分析。LSTM自编码器在所有评估传感器上实现了平均准确率99.0%、精确率高达100%、召回率介于90.2%至99.6%之间、F1分数介于93.1%至99.8%之间,显示出高检测灵敏度和极低的误报率。这些结果凸显了数据驱动的离线异常检测在EHA中的可行性。未来工作将集中于将所开发的框架适配到在线(实时)环境。

英文摘要

Electro-Hydrostatic Actuators (EHAs) are widely used in aerospace and industrial systems, where timely detection of sensor anomalies is essential to ensure safe and reliable operation. However, the large volume and high sampling frequency of EHA sensor data pose challenges for accurate and efficient anomaly detection. Conventional statistical and classical machine-learning methods such as Z-score, Interquartile Range (IQR), Median Absolute Deviation (MAD), Isolation Forest, Gaussian Mixture, and k-means often fail to capture the temporal dependencies inherent in EHA signals, resulting in limited detection accuracy and elevated false-alarm rates. Furthermore, systematic evaluations of data-driven anomaly detection approaches for EHA systems remain scarce, particularly under varying operational conditions. This study presents an offline anomaly-detection framework for univariate EHA sensor signals, focusing on temperature and pressure data collected from a controlled test bench. The method employs a reconstruction-based Long Short-Term Memory (LSTM) autoencoder, calibrated and evaluated using validation-set reconstruction-error distributions. Performance is assessed across multiple fault-injection scenarios using accuracy, precision, recall, and F1-score, complemented by sensitivity analyses under varying operating conditions. The LSTM autoencoder achieved an average accuracy of 99.0%, precision up to 100%, recall between 90.2% and 99.6%, and F1-scores from 93.1% to 99.8%, demonstrating high detection sensitivity and a very low false-alarm rate across all evaluated sensors. These results highlight the feasibility of data-driven offline anomaly detection for EHAs. Future work will focus on adapting the developed framework for an online (real-time) environment.
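Calibrating a detector on validation-set reconstruction errors can be sketched as below. The abstract does not state the exact calibration rule, so the nearest-rank percentile threshold here is an assumption, and the function names and error values are hypothetical. The autoencoder itself is omitted, and the errors are taken as given.

```python
import math


def calibrate_threshold(validation_errors, percentile=99):
    """Nearest-rank percentile of validation reconstruction errors.

    Assumed rule for illustration: windows whose error exceeds this
    value at test time are treated as anomalous.
    """
    ordered = sorted(validation_errors)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


def flag_anomalies(errors, threshold):
    """Mark each window whose reconstruction error exceeds the threshold."""
    return [e > threshold for e in errors]


# Hypothetical per-window reconstruction errors from a validation set.
validation = [0.10, 0.12, 0.11, 0.09, 0.13, 0.10, 0.12, 0.11, 0.50, 0.14]
threshold = calibrate_threshold(validation, percentile=90)
flags = flag_anomalies([0.10, 0.30, 0.14], threshold)
```

A lower percentile raises recall at the cost of more false alarms, which is the trade-off the reported precision and recall figures reflect.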

2606.05272 2026-06-05 cs.LG

Learning Manifold and Itô Dynamics with Branched Neural Rough Differential Equations

学习流形与伊藤动力学:分支神经粗糙微分方程

Luke Thompson, Dai Shi, Lequan Lin, Junbin Gao, Andi Han

发表机构 * University of California, Berkeley(加州大学伯克利分校) Stanford University(斯坦福大学) University of Toronto(多伦多大学)

AI总结 提出分支神经粗糙微分方程(B-NRDE),通过Hopf代数框架统一处理欧几里得伊藤动力学、流形上的有序协变导数及经典Stratonovich情形,实现精确的粗步流形约束动力学和伊藤一致律匹配。

Comments Accepted at ICML 2026

详情
AI中文摘要

神经粗糙微分方程(NRDE)在不规则采样下保持准确性,同时所需的积分步数远少于标准神经微分方程,它通过对数签名总结精细采样的驱动信号,并利用log-ODE方法在粗间隔上推进隐藏状态。这种效率依赖于洗牌代数,即Stratonovich微积分的代数对应。这种依赖性意味着NRDE无法暴露伊藤动力学所需的二次变分项,也无法处理带联络流形上控制伊藤流的有序协变导数。为改善这一点,我们引入了分支神经粗糙微分方程(B-NRDE),这是一个Hopf代数框架,将NRDE的log-ODE步骤重新解释为状态空间流形上的几何数值积分,使驱动代数与主导微积分相匹配:对于欧几里得伊藤动力学使用Grossman--Larson根树,对于流形上的有序协变导数使用Munthe-Kaas--Wright平面根树,在经典Stratonovich情形下使用洗牌代数。这产生了内在的粗步动力学,精确保持流形约束。最后,我们引入一个分支签名核目标,通过在训练过程中使二次变分项可见,实现伊藤一致律匹配。在粗糙Bergomi波动率、仿真到真实$\mathrm{SO}(3)$动力学预测以及SPD协方差动力学上,B-NRDE为欧几里得-Stratonovich设置之外的随机和流形值动力学提供了一种统一、有效的方法。

英文摘要

Neural rough differential equations (NRDEs) stay accurate under irregular sampling while taking far fewer integration steps than standard neural differential equations: they summarise a finely sampled driver by its log-signature and advance the hidden state over coarse intervals using the log-ODE method. This efficiency rests on the shuffle algebra, the algebraic counterpart of Stratonovich calculus. As a result, NRDEs cannot expose the quadratic-variation terms that Itô dynamics require, nor the ordered covariant derivatives that govern Itô flows on connection-equipped manifolds. To address this, we introduce Branched Neural Rough Differential Equations (B-NRDEs), a Hopf-algebraic framework that recasts the NRDE log-ODE step as geometric numerical integration on the state-space manifold, matching the driving algebra to the governing calculus: Grossman–Larson rooted trees for Euclidean Itô dynamics, Munthe-Kaas–Wright planar rooted trees for ordered covariant derivatives on manifolds, and the shuffle algebra in the classical Stratonovich case. This yields intrinsic coarse-step dynamics that exactly preserve manifold constraints. Finally, we introduce a branched signature-kernel objective to enable Itô-consistent law matching by making quadratic-variation terms visible during training. On rough Bergomi volatility, sim-to-real $\mathrm{SO}(3)$ dynamics forecasting, and SPD covariance dynamics, B-NRDEs offer a unified, effective approach to stochastic and manifold-valued dynamics beyond the Euclidean–Stratonovich setting.

2606.05266 2026-06-05 cs.LG cs.CC cs.DS math.CO math.PR math.ST stat.TH

Sharp Low-Degree Thresholds for Planted-vs-Planted Testing

植入vs植入测试的尖锐低度阈值

Anda Skeja, Daniel Gutiérrez Espinoza, Fiona Skerman, Alexander S. Wein

发表机构 * Department of Mathematics, University of California, Davis(加州大学戴维斯分校数学系)

AI总结 针对植入vs植入设置,建立了低度多项式测试的首个尖锐阈值,并证明在植入子矩阵和植入稠密子图模型中计数社区的匹配上下界,测试阈值与已知低度恢复阈值精确一致。

详情
AI中文摘要

我们在植入vs植入设置中建立了低度多项式测试的首个尖锐阈值,其中目标是以渐近消失的错误率确定两个结构化植入机制中的哪一个生成了观测数据。我们证明了在植入子矩阵和植入稠密子图模型中计数社区的匹配低度上下界。所得的测试阈值与已知的低度恢复阈值精确一致。相比之下,弱测试(即目标优于随机猜测)没有尖锐阈值,而是存在一个我们识别的平滑过渡。为了证明我们的结果,我们开发了一个基于低度恢复中潜在变量展开的植入vs植入测试框架,并采用新方法来识别和修剪非信号贡献。

英文摘要

We establish the first sharp thresholds for low-degree polynomial tests in planted-vs-planted settings, where the goal is to determine with vanishing error which of two structured planted mechanisms generated the observed data. We prove matching low-degree upper and lower bounds for counting communities in the planted submatrix and planted dense subgraph models. The resulting testing threshold coincides, down to the sharp constant, with the known low-degree recovery threshold. In contrast, the task of weak testing, where the goal is to outperform random guessing, does not have a sharp threshold but rather a smooth transition, which we identify. To prove our results, we develop a framework for planted-vs-planted testing that builds on a latent-variable expansion originating in low-degree recovery and employs new methods to identify and prune non-signal contributions.

2606.05265 2026-06-05 cs.LG

Data-efficient flood depth prediction through domain-aware coreset selection and tabular foundation models

数据高效的洪水深度预测:通过领域感知的核心集选择与表格基础模型

Lipai Huang, Adithi Srinath, Manas Singh, Junwei Ma, Ali Mostafavi

发表机构 * Urban Resilience.AI Lab(Urban Resilience.AI实验室) Zachry Department of Civil and Environmental Engineering, Texas A&M University(Zachry土木与环境工程系,德克萨斯A&M大学) Department of Computer Science and Engineering, Texas A&M University(计算机科学与工程系,德克萨斯A&M大学) Resilitix Intelligence LLC Institute for a Disaster Resilient Texas, Texas A&M University(德克萨斯灾难韧性研究所,德克萨斯A&M大学)

AI总结 提出一种领域感知的核心集构建流程,结合表格基础模型,仅用0.7%的训练数据即可实现与监督模型相当的洪水深度预测精度,并支持跨流域迁移。

详情
AI中文摘要

近实时洪水深度预测需要替代模型具有准确性、快速性和跨流域可迁移性。监督替代模型在精度上可媲美基于物理的模拟器,但每个流域需要数百万训练行,且无法外推到原始网格之外。我们提出了一种领域感知的核心集构建流程,在推理时对表格基础模型进行条件化。该流程按重现期和受影响最严重的流域对风暴进行分层,然后使用目标感知的空间选择器采样六边形。使用每个流域训练池的0.7%,模型在休斯顿地区九个流域上实现了平均$R^2$为0.663,达到监督参考($R^2$=0.673)的98.5%。该模型无需特定任务重训练即可迁移到未见的流域,优于基于核心集训练的监督基线。在真实风暴上,模型在一个远分布外案例中超过了监督参考,在一个几乎分布内案例中略逊于监督参考。领域感知的核心集构建使表格基础模型能够实现数据高效、跨流域可迁移的洪水预测,无需每个流域的训练。

英文摘要

Near-real-time flood depth prediction demands surrogate models that are accurate, fast, and transferable across watersheds. Supervised surrogates can match physics-based simulators in accuracy but need millions of training rows per watershed and cannot extrapolate beyond their original mesh. We propose a domain-aware coreset construction pipeline that conditions a tabular foundation model at inference time. The pipeline stratifies storms by return period and most-affected watershed, then samples hexagons with a target-aware spatial selector. With 0.7% of the per-watershed training pool, the model attains a mean $R^2$ of 0.663 across nine Houston-area watersheds, reaching 98.5% of the supervised reference ($R^2$ = 0.673). It transfers to held-out watersheds without task-specific retraining, staying ahead of a coreset-trained supervised baseline. On real storms it exceeds the supervised reference on a far out-of-distribution case and trails it on a mostly in-distribution one. Domain-aware coreset construction lets tabular foundation models deliver data-efficient, watershed-transferable flood predictions without per-watershed training.

2606.05263 2026-06-05 cs.LG cs.AI

Policy-Conditioned Counterfactual Credit for Verifiable Reinforcement Learning of Long-Horizon Language Agents

基于策略条件的反事实信用分配用于长周期语言智能体的可验证强化学习

Renwei Meng

发表机构 * stu.ahu.edu.cn(安徽大学)

AI总结 提出CVT-RL算法,通过策略条件反事实贡献估计和可验证奖励约束,解决长周期语言智能体在推理和工具使用中的虚假证据链、信念漂移和捷径行为问题,在多个任务上提升成功率并降低作弊率。

Comments 16 pages, 6 figures

详情
AI中文摘要

具有可验证奖励的强化学习改进了推理和工具使用,但长周期语言智能体仍然学习到无支持的证据链、信念漂移以及满足终端检查的捷径行为。现有的过程奖励大多是相关的:它们奖励类似检索、反思或验证的步骤,而不估计在指定干预下该步骤是否有助于最终验证的成功。我们提出CVT-RL,一种具有密集可验证奖励、干预有效性门控和策略条件反事实贡献(PCCC)估计器的约束策略梯度算法。删除、语义替换、证据替换和工具输出扰动定义了不同的受控干预;延续从冻结的参考策略中采样,并使用选择调整的双重稳健估计器增强优势。信念控制仅使用前缀可观察标签,而增广拉格朗日约束无支持的声明、跳过的验证、工具篡改和不安全调用。在长上下文问答、ALFWorld、ScienceWorld以及网页/工具任务上,CVT-RL将平均任务成功率从计算匹配的非因果强化学习的71.8%和信息匹配的反事实过程基线的75.4%提高到78.9%,证据F1分数从信息匹配基线的78.9提高到82.8,并将测量的作弊率从7.2%降低到3.9%。独立人工审计估计CVT-RL的作弊率为4.6%,而信息匹配基线为8.1%,自适应检测器规避攻击仅将作弊率提高到7.1%。分层自助法和混合效应检验在Holm校正后所有主要指标的p<0.01。精心范围的反事实信用,结合有效性门控、诊断和可验证约束,为语言智能体更可靠的长周期强化学习提供了一条可复现的路径。

英文摘要

Reinforcement learning with verifiable rewards improves reasoning and tool use, yet long-horizon language agents still learn unsupported evidence chains, belief drift, and shortcut actions that satisfy terminal checks. Existing process rewards are mostly correlational: they reward retrieval-, reflection-, or verification-like steps without estimating whether the step contributes to final verified success under a specified intervention. We propose CVT-RL, a constrained policy-gradient algorithm with dense verifiable rewards, intervention-validity gating, and a policy-conditioned counterfactual contribution (PCCC) estimator. Deletion, semantic substitution, evidence substitution, and tool-output perturbation define separate controlled interventions; continuations are sampled from a frozen reference policy, and a selection-adjusted doubly robust estimator augments the advantage. Belief control uses only prefix-observable labels, while an augmented Lagrangian constrains unsupported claims, skipped verification, tool tampering, and unsafe calls. On long-context QA, ALFWorld, ScienceWorld, and web/tool tasks, CVT-RL improves average task success from 71.8% for compute-matched non-causal RL and 75.4% for an information-matched counterfactual-process baseline to 78.9%, improves evidence F1 from 78.9 to 82.8 over the information-matched baseline, and reduces measured hacking from 7.2% to 3.9%. Independent human audit estimates 4.6% hacking for CVT-RL versus 8.1% for the information-matched baseline, and adaptive detector-evasion attacks raise hacking only to 7.1%. Stratified bootstrap and mixed-effects tests give p<0.01 after Holm correction for all primary metrics. Carefully scoped counterfactual credit, paired with validity gating, diagnostics, and verifiable constraints, provides a reproducible route toward more reliable long-horizon RL for language agents.

2606.05261 2026-06-05 cs.CV cs.AI cs.LG

NIV: Neural Axis Variations for Variable Font Generation

Nadav Benedek, Ariel Shamir, Ohad Fried

Affiliations * Reichman University

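AI summary Proposes NIV, a method that predicts per-point displacements of glyph outlines to automatically convert static fonts into variable fonts supporting continuous multi-axis interpolation, and validates its generalization on a newly constructed dataset.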

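Details
AI Chinese abstract (translated)

Variable fonts allow glyph geometry to vary continuously along semantic design axes such as weight, width, slant, and optical size. However, building a variable font from a static font remains a labor-intensive process that requires expert type design and manual specification of glyph variation data. We introduce NIV (Neural Axis Variations), a method that automatically converts a static font into a fully functional variable font. Given glyph outlines and a set of desired design axes, NIV predicts per-point displacements. The model operates directly on vector glyph geometry and uses a novel Property Embedding mechanism that captures interactions between multiple axes, enabling consistent multi-axis variation within a unified framework. We train NIV on a newly constructed dataset derived from variable Google Fonts, comprising over one million variation tuples. The resulting model generalizes to unseen code points, unseen font styles, high-complexity CJK glyphs, and even out-of-distribution handwriting inputs. The generated outputs are standard variable font files that support continuous interpolation in existing rendering engines. To facilitate research, we release the dataset, the complete training and inference implementation, and trained models at https://github.com/ndvbd/NIV. Beyond typography, our approach shows how structured geometric objects with continuous parametric variation can be synthesized using neural deformations.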

English abstract

Variable fonts enable continuous variation of glyph geometry along semantic design axes such as weight, width, slant, and optical size. However, constructing a variable font from a static font remains a labor-intensive process requiring expert typographic design and manual specification of glyph variation data. We introduce NIV (Neural Axis Variations), a method that automatically converts a static font into a fully functional variable font. Given glyph outlines and a set of desired design axes, NIV predicts per-point displacements. The model operates directly on vector glyph geometry and employs a novel Property Embedding mechanism that captures interactions between multiple axes, enabling consistent multi-axis variation within a unified framework. We train NIV on a newly constructed dataset derived from variable Google Fonts, comprising over one million variation tuples. The resulting model generalizes across unseen code points, unseen font styles, high-complexity CJK glyphs, and even out-of-distribution handwriting inputs. The generated outputs are standard variable font files supporting continuous interpolation via existing rendering engines. To facilitate research, we release the dataset, the complete training and inference implementation, and trained models at https://github.com/ndvbd/NIV. Beyond typography, our approach demonstrates how structured geometric objects with continuous parametric variation can be synthesized using neural deformations.
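To make "per-point displacements" concrete, here is a minimal sketch of how a rendering engine could combine a base outline with per-axis deltas, in the spirit of OpenType glyph variations. It is a simplification, not NIV's code: it assumes one delta set per axis, normalized axis coordinates, and purely linear blending, and the names `apply_displacements`, `stem`, and `deltas` are made up for illustration.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def apply_displacements(
    base: List[Point],
    deltas: Dict[str, List[Point]],  # axis tag -> one (dx, dy) per outline point
    coords: Dict[str, float],        # axis tag -> normalized coordinate, 0.0 = default
) -> List[Point]:
    """Shift every outline point by the scaled sum of its per-axis deltas."""
    result = []
    for i, (x, y) in enumerate(base):
        for axis, axis_deltas in deltas.items():
            t = coords.get(axis, 0.0)  # axes not requested stay at the default
            dx, dy = axis_deltas[i]
            x += t * dx
            y += t * dy
        result.append((x, y))
    return result


# A rectangular stem whose right edge moves outward as weight increases.
stem = [(0.0, 0.0), (100.0, 0.0), (100.0, 700.0), (0.0, 700.0)]
deltas = {"wght": [(0.0, 0.0), (60.0, 0.0), (60.0, 0.0), (0.0, 0.0)]}

half_bold = apply_displacements(stem, deltas, {"wght": 0.5})
```

Under these assumptions, a model like NIV would be responsible for producing the `deltas` table from the static outline; the interpolation itself is left to the existing font machinery.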