arXivDaily: daily arXiv academic digest, updated Monday through Friday
2503.23179 2026-06-19 eess.IV cs.CV (updated version)

OncoReg: Medical Image Registration for Oncological Challenges

Wiebke Heyer, Yannic Elser, Lennart Berkel, Xinrui Song, Xuanang Xu, Pingkun Yan, Xi Jia, Jinming Duan, Zi Li, Tony C. W. Mok, BoWen LI, Tim Hable, Christian Staackmann, Christoph Großbröhmer, Lasse Hansen, Alessa Hering, Malte M. Sieren, Mattias P. Heinrich

Affiliations: Institute of Medical Informatics, University of Lübeck; Institute of Radiology and Nuclear Medicine, University Hospital Schleswig-Holstein; Department of Biomedical Engineering and Center for Biotechnology and Interdisciplinary Studies, Rensselaer Polytechnic Institute; School of Computer Science, University of Birmingham; Division of Informatics, Imaging and Data Sciences, University of Manchester; DAMO Academy, Alibaba Group; Hangzhou Shengshi Technology Co., Ltd; Department of Radiation Oncology, University Hospital Schleswig-Holstein; EchoScout GmbH; Radboud University Medical Center, Nijmegen; Institute of Interventional Radiology, University Hospital Schleswig-Holstein

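AI summary: Presents the OncoReg Challenge, which uses a two-phase framework to develop generalisable image registration methods while protecting patient privacy, targeting the registration of cone-beam CT to fan-beam CT in radiotherapy. It finds that feature extraction is key and that combining deep learning with classical methods is most effective.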

Comments 21 pages, 13 figures

English abstract

In modern cancer research, the vast volume of medical data generated is often underutilised due to challenges related to patient privacy. The OncoReg Challenge addresses this issue by enabling researchers to develop and validate image registration methods through a two-phase framework that ensures patient privacy while fostering the development of more generalisable AI models. Phase one involves working with a publicly available dataset, while phase two focuses on training models on a private dataset within secure hospital networks. OncoReg builds upon the foundation established by the Learn2Reg Challenge by incorporating the registration of interventional cone-beam computed tomography with standard planning fan-beam CT images in radiotherapy. Accurate image registration is crucial in oncology, particularly for dynamic treatment adjustments in image-guided radiotherapy, where precise alignment is necessary to minimise radiation exposure to healthy tissues while effectively targeting tumours. This work details the methodology and data behind the OncoReg Challenge and provides a comprehensive analysis of the competition entries and results. Findings reveal that feature extraction plays a pivotal role in this registration task. A new method emerging from this challenge demonstrated its versatility, while established approaches continue to perform comparably to newer techniques. Both deep learning and classical approaches still play significant roles in image registration, with the combination of methods, particularly in feature extraction, proving most effective.

2506.01678 2026-06-19 cond-mat.mtrl-sci cs.AI (updated version)

Overcoming Labelled Data Scarcity for Defect Classification in Scanning Tunneling Microscopy

Nikola L. Kolev, Max Trouton, Filippo Federici Canova, Geoff Thornton, David Z. Gao, Neil J. Curson, Taylor J. Z. Stock

Affiliations: London Centre for Nanotechnology, University College London; Department of Electronic and Electrical Engineering, University College London; Department of Physics and Astronomy, University College London; Department of Chemistry, University College London; Aalto Science Institute, School of Science, Aalto University; Nanolayers Research Computing LTD, London, UK; Department of Physics, NTNU Norwegian University of Science and Technology

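AI summary: Proposes an automated segmentation method combining few-shot and unsupervised learning that classifies defects in STM images with high accuracy from only a small amount of labelled data, and validates strong generalisation on three surfaces.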

English abstract

Scanning tunnelling microscopy (STM) is a powerful technique for imaging surfaces with atomic resolution, providing insight into physical and chemical processes at the level of single atoms and molecules. A regular task of STM image analysis is the identification and labelling of features of interest against a uniform background. Performing this manually is a labour-intensive task, requiring significant human effort. To reduce this burden, we propose an automated approach to the segmentation of STM images that uses both few-shot learning and unsupervised learning. Our technique offers greater flexibility compared to previous supervised methods; it removes the requirement for large manually annotated datasets and is thus easier to adapt to an unseen surface while still maintaining a high accuracy. We demonstrate the effectiveness of our approach by using it to recognise atomic features on three distinct surfaces: Si(001), Ge(001), and TiO$_2$(110), including adsorbed AsH$_3$ molecules on the silicon and germanium surfaces. Our model exhibits strong generalisation capabilities, and following initial training, can be adapted to unseen surfaces with as few as one additional labelled data point. This work is a significant step towards efficient and material-agnostic, automatic segmentation of STM images.

2503.20646 2026-06-19 cs.HC cs.RO cs.SY eess.SY (updated version)

Immersive and Wearable Thermal Rendering for Augmented Reality

Alexandra Watkins, Ritam Ghosh, Evan Chow, Nilanjan Sarkar

Affiliations: Vanderbilt University

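AI summary: Presents a palm-mounted thermal feedback prototype that delivers immersive thermal haptics in augmented reality through indirect feedback, active thermal passthrough, and spatiotemporally varying rendering; experiments confirm its feasibility and trade-offs.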

English abstract

We present a proof-of-concept palm-mounted thermal feedback prototype addressing thermal rendering challenges specific to augmented reality (AR), where users must interact with both real and virtual objects in their physical workspace. In contrast to thermal feedback systems developed for virtual reality, AR thermal feedback must preserve manual dexterity, maintain access to real-world thermal cues, and provide coherent virtual temperature sensations without obstructing natural object interaction. We propose three AR-specific design considerations, which our prototype implements: indirect feedback to preserve fingertip dexterity, active thermal passthrough to sense and render the temperature of contacted physical surfaces, and spatially and temporally varying thermal rendering across the palm. Human-subject experiments evaluated perceptual sensitivity, indirect feedback, active thermal passthrough, spatial pattern recognition, and moving thermal rendering during AR interaction. Results showed that although indirect feedback reduced perceived realism during visual contact at the fingertips, it did not reduce immersion or comfort; active thermal passthrough supported temperature discrimination between real and rendered surfaces; and spatiotemporal rendering significantly improved immersion and realism compared with static thermal stimulation. These findings suggest that our design considerations are viable design strategies for AR thermal haptics, while also clarifying tradeoffs for applications that require precise realism versus broader immersive thermal experience.

2503.17386 2026-06-19 eess.SY cs.LG cs.SY (updated version)

A graph neural network surrogate model for mesh-based crashworthiness prediction of vehicle panel components

Haoran Li, Yingxue Zhao, Haosu Zhou, Tobias Pfaff, Nan Li

Affiliations: Dyson School of Design Engineering, Imperial College London; NVIDIA

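AI summary: Proposes the Recurrent Graph U-Net (ReGUNet) surrogate model, which represents finite element meshes as graphs and combines a hierarchical architecture with recurrence to predict the dynamic deformation and crashworthiness indicators of panel components such as vehicle B-pillars efficiently and accurately.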

Comments Accepted manuscript version. Final published version available in Results in Engineering via DOI: 10.1016/j.rineng.2026.110925

Journal ref Results in Engineering 30 (2026) 110925

English abstract

Crashworthiness is a key performance measure in the design of safety-critical vehicle panel components such as B-pillars. Finite element (FE) simulations are widely used to evaluate crash responses but remain computationally expensive for large-scale, nonlinear impact scenarios, particularly when integrated into iterative design and optimisation processes. Although machine learning-based surrogate models have been developed for rapid crashworthiness analysis, they exhibit limitations in the detailed representation of complex 3-dimensional components. Graph Neural Networks (GNNs) have emerged as a promising solution for processing data with complex structures. However, existing GNN models often lack sufficient accuracy and computational efficiency to meet industrial demands. This paper proposes Recurrent Graph U-Net (ReGUNet), a graph-based surrogate model for crashworthiness analysis of vehicle panel components. By representing FE meshes in graph form, the model naturally accommodates complex irregular structural geometries. Its hierarchical architecture improves computational efficiency and accuracy, while the introduction of recurrence enhances the stability of temporal predictions over multiple time steps. A side-impact case study of hot-stamped steel B-pillars with varying geometries is used to generate the training dataset. The trained model demonstrates high accuracy in predicting the dynamic deformation behaviour and crashworthiness indicators of previously unseen component designs. ReGUNet achieves over a 52% reduction in the average deformation prediction error relative to baseline methods, together with markedly improved computational efficiency. ReGUNet provides rapid and reliable crashworthiness assessments, which in turn accelerate the design cycle of vehicle panel components.

2406.02421 2026-06-19 cs.DM cs.LG cs.SC (updated version)

Representing Piecewise-Linear Functions by Functions with Minimal Arity

Christoph Koutschan, Anton Ponomarchuk, Josef Schicho

Affiliations: Johann Radon Institute for Computational and Applied Mathematics; Research Institute for Symbolic Computation; Johannes Kepler University

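AI summary: Studies the minimal number of arguments needed to represent a continuous piecewise-linear function as a linear combination of max functions, and establishes a direct connection between the tessellation of the input space induced by the function and that number.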

English abstract

Any continuous piecewise-linear function $F\colon \mathbb{R}^{n}\to \mathbb{R}$ can be represented as a linear combination of $\max$ functions of at most $n+1$ affine-linear functions. In our previous paper ["Representing piecewise linear functions by functions with small arity", AAECC, 2023], we showed that this upper bound of $n+1$ arguments is tight. In the present paper, we extend this result by establishing a correspondence between the function $F$ and the minimal number of arguments that are needed in any such decomposition. We show that the tessellation of the input space $\mathbb{R}^{n}$ induced by the function $F$ has a direct connection to the number of arguments in the $\max$ functions.
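
As a small illustration of the representation described above (these examples are standard identities chosen here for illustration, not taken from the paper), the first line treats $n=1$ with two arguments, and the last two lines treat $n=2$, where the final identity uses a $\max$ of $3 = n+1$ affine-linear functions. A single variable counts as a $\max$ of one argument.

```latex
\begin{align*}
  |x| &= \max(x,\,-x), \\
  \min(x, y) &= x + y - \max(x, y), \\
  \max\bigl(0,\ \min(x, y)\bigr) &= \max(0, x) + \max(0, y) - \max(0, x, y).
\end{align*}
```

The last identity follows because $t \mapsto \max(0, t)$ is monotone, so it commutes with both $\min$ and $\max$, and $\min(a, b) + \max(a, b) = a + b$.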

2309.15769 2026-06-19 math.ST cs.LG stat.ME stat.TH (updated version)

Benign overfitting beyond prediction: The ordinary least squares interpolator

Dennis Shen, Dogyoon Song, Peng Ding, Jasjeet S. Sekhon

Affiliations: Department of Data Sciences & Operations, University of Southern California; Department of Statistics, University of California, Davis; Department of Statistics, University of California, Berkeley; Google DeepMind

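AI summary: Studies parameter estimation and inference for the minimum $\ell_2$-norm OLS interpolator in overparameterized linear models, derives overparameterized versions of the leave-$k$-out formulas, the omitted variable bias formula, and the Frisch-Waugh-Lovell theorem, and extends the Gauss-Markov theorem.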

Comments This work is accepted for publication in Biometrika

English abstract

Recent advances in deep learning have highlighted the phenomenon of benign overfitting in overparameterized statistical models, sparking significant interest in understanding its foundations. Owing to its simplicity and practical relevance, the ordinary least squares (OLS) interpolator has become a key object of study for gaining theoretical insight into this phenomenon. While the properties of OLS are well understood in classical underparameterized settings, its behavior in the overparameterized regime -- unlike that of ridge regression or the lasso -- remains comparatively less explored. We contribute to this growing literature by deriving new algebraic and statistical results for the minimum $\ell_2$-norm OLS interpolator. In contrast to much of the existing work, which focuses on prediction risk, we center our analysis on parameter estimation and inference, which are fundamental for many statistics and causal inference applications. Specifically, we establish overparameterized analogues of (i) the leave-$k$-out formulas, (ii) the omitted variable bias formula, and (iii) the Frisch-Waugh-Lovell theorem. Under the Gauss-Markov model, we further extend the Gauss-Markov theorem and analyze variance estimation under homoskedasticity in the overparameterized setting. Collectively, these results provide a systematic framework for studying parameter estimation and inference in overparameterized linear models, offering a novel perspective on benign overfitting beyond its implications for prediction.
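
To make the central object concrete, the following is a minimal sketch of the minimum $\ell_2$-norm interpolator for a toy problem with more parameters than observations. It uses the textbook closed form $\hat\beta = X^\top (X X^\top)^{-1} y$, which holds when $X$ has full row rank; the data, the function names, and the restriction to two observations are assumptions made for illustration and do not come from the paper.

```python
# Minimum l2-norm interpolator for a tiny overparameterized problem (n = 2, p = 3).
# Closed form: beta_hat = X^T (X X^T)^{-1} y, valid when X has full row rank.

def min_norm_interpolator(X, y):
    """Return the minimum l2-norm solution of X beta = y for a 2-row X."""
    assert len(X) == 2 and len(y) == 2, "this sketch handles n = 2 only"
    # Gram matrix G = X X^T (2 x 2).
    g11 = sum(a * a for a in X[0])
    g12 = sum(a * b for a, b in zip(X[0], X[1]))
    g22 = sum(b * b for b in X[1])
    det = g11 * g22 - g12 * g12
    if abs(det) < 1e-12:
        raise ValueError("X must have full row rank")
    # alpha = G^{-1} y via the closed-form 2 x 2 inverse.
    a1 = (g22 * y[0] - g12 * y[1]) / det
    a2 = (-g12 * y[0] + g11 * y[1]) / det
    # beta = X^T alpha.
    return [a1 * u + a2 * v for u, v in zip(X[0], X[1])]

def predict(X, beta):
    """Fitted values X beta."""
    return [sum(x * b for x, b in zip(row, beta)) for row in X]

def sq_norm(v):
    """Squared l2 norm."""
    return sum(x * x for x in v)

# Hypothetical data: two observations, three coefficients.
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [1.0, 2.0]
beta_hat = min_norm_interpolator(X, y)
```

Adding any null-space direction of `X` (here `[1, 1, -1]`) to `beta_hat` gives another coefficient vector that fits the data exactly but has a larger norm, which is what singles out the minimum-norm solution among all interpolators.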

2405.10705 2026-06-19 eess.IV cs.CV (updated version)

3D Vessel Reconstruction from Sparse-View Dynamic DSA Images via Vessel Probability Guided Attenuation Learning

Zhentao Liu, Huangxuan Zhao, Wenhui Qin, Zhenghong Zhou, Xinggang Wang, Wenping Wang, Xiaochun Lai, Chuansheng Zheng, Dinggang Shen, Zhiming Cui

Affiliations: School of Biomedical Engineering & State Key Laboratory of Advanced Medical Materials Devices, ShanghaiTech University, Shanghai, China; National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, China; School of Electronic Information Communications, Huazhong University of Science; Department of Computer Science & Engineering, Texas A&M University, USA

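AI summary: Proposes a vessel probability guided attenuation learning framework that achieves sparse-view DSA reconstruction through complementary weighting of static and dynamic attenuation fields, reducing radiation dose, and uses progressive training and a temporal perturbed rendering loss to improve quality.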

Comments Accepted by Medical Image Analysis (MedIA), 2026

English abstract

Digital Subtraction Angiography (DSA) is one of the gold standards for vascular disease diagnosis. With the help of a contrast agent, time-resolved 2D DSA images deliver comprehensive blood flow information and can be utilized to reconstruct 3D vessel structures for medical assessment. Current commercial DSA systems typically require hundreds of scanning views to perform reconstruction, resulting in substantial radiation exposure. In this study, we propose a neural rendering-based optimization framework tailored for high-quality sparse-view DSA reconstruction to reduce radiation dosage. Our approach, termed vessel probability guided attenuation learning, represents DSA imaging as a complementary weighted combination of static and dynamic attenuation fields, with the weights derived from the time-independent vessel probability field. Functioning as a foreground mask, vessel probability provides proper gradients for both static and dynamic fields adaptive to different scene types. This mechanism enables self-supervised decomposition between static backgrounds and dynamic contrast agent flow, and significantly improves reconstruction quality. Our model is trained by minimizing the discrepancy between synthesized projections and real captured DSA images. We further employ two training strategies to improve reconstruction quality: (1) coarse-to-fine progressive training for better geometry and (2) temporal perturbed rendering loss for temporal consistency. Experimental results have demonstrated high-quality 3D vessel reconstruction and 2D DSA image synthesis.
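
One plausible reading of the "complementary weighted combination" can be written down as follows. This is a sketch under assumptions: the symbols $p$, $\mu_s$, $\mu_d$, and the ray parametrisation $\mathbf{r}(s)$ are notation introduced here, and the abstract does not state which field receives which weight or the exact projection model.

```latex
\begin{align*}
  \mu(\mathbf{x}, t) &= \bigl(1 - p(\mathbf{x})\bigr)\,\mu_s(\mathbf{x})
                      + p(\mathbf{x})\,\mu_d(\mathbf{x}, t),
  \qquad p(\mathbf{x}) \in [0, 1], \\
  \hat{I}(\mathbf{r}, t) &= \int \mu\bigl(\mathbf{r}(s), t\bigr)\,\mathrm{d}s .
\end{align*}
```

Under this reading, $p$ is the time-independent vessel probability, $\mu_s$ the static attenuation field, and $\mu_d$ the dynamic one. Where $p \approx 0$ the gradient of a projection loss flows mainly to $\mu_s$, and where $p \approx 1$ it flows mainly to $\mu_d$, which matches the abstract's description of the probability acting as a foreground mask.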

2104.08928 2026-06-19 stat.ML cs.CL cs.LG (updated version)

Group-Sparse Matrix Factorization for Transfer Learning of Word Embeddings

Kan Xu, Xuanyi Zhao, Hamsa Bastani, Osbert Bastani

Affiliations: W. P. Carey School of Business, Arizona State University; University of Pennsylvania; Wharton School, University of Pennsylvania

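AI summary: Proposes a two-stage estimator based on a group-sparse penalty that efficiently transfer learns domain-specific word embeddings by combining a large-scale corpus with a small amount of domain data, and proves a generalization error bound together with the statistical equivalence of the local and global minima of the nonconvex objective.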

English abstract

Unstructured text provides decision-makers with a rich data source in many domains, ranging from product reviews in retail to nursing notes in healthcare. To leverage this information, words are typically translated into word embeddings -- vectors that encode the semantic relationships between words -- through unsupervised learning algorithms such as matrix factorization. However, learning word embeddings from new domains with limited training data can be challenging, because the meaning/usage may be different in the new domain, e.g., the word "positive" typically has positive sentiment, but often has negative sentiment in medical notes since it may imply that a patient tested positive for a disease. In practice, we expect that only a small number of domain-specific words may have new meanings. We propose an intuitive two-stage estimator that exploits this structure via a group-sparse penalty to efficiently transfer learn domain-specific word embeddings by combining large-scale text corpora (such as Wikipedia) with limited domain-specific text data. We bound the generalization error of our transfer learning estimator, proving that it can achieve high accuracy with substantially less domain-specific data when only a small number of embeddings are altered between domains. Furthermore, we prove that all local minima identified by our nonconvex objective function are statistically indistinguishable from the global minimum under standard regularization conditions, implying that our estimator can be computed efficiently. Our results provide the first bounds on group-sparse matrix factorization, which may be of independent interest. We empirically evaluate our approach compared to state-of-the-art fine-tuning heuristics from natural language processing.
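
The idea that only a few words change meaning can be illustrated with a generic group-sparse penalty, where each group is one word's embedding shift. The sketch below shows the sum of row-wise $\ell_2$ norms and its proximal step (group soft-thresholding), which is the standard group-lasso operator. It is not the paper's two-stage estimator, and the matrix `delta` and threshold `tau` are hypothetical values chosen for illustration.

```python
import math

def group_penalty(delta):
    """Sum of row-wise l2 norms (the l2,1 norm) of a list-of-rows matrix."""
    return sum(math.sqrt(sum(x * x for x in row)) for row in delta)

def group_soft_threshold(delta, tau):
    """Proximal step for tau * group_penalty: shrink each row toward zero as a unit."""
    out = []
    for row in delta:
        norm = math.sqrt(sum(x * x for x in row))
        # Rows with norm at most tau are set to zero; longer rows are scaled down.
        scale = max(0.0, 1.0 - tau / norm) if norm > 0 else 0.0
        out.append([scale * x for x in row])
    return out

# Hypothetical per-word shifts between source-domain and target-domain embeddings.
delta = [[0.05, -0.02],   # small shift: word keeps its meaning
         [3.0, 4.0],      # large shift (norm 5): word changes meaning
         [0.0, 0.1]]      # small shift
shrunk = group_soft_threshold(delta, tau=1.0)
```

Rows whose norm falls below the threshold are zeroed as a whole, so a word either keeps its source embedding or receives a full-vector update, which is the row-level sparsity pattern the abstract describes.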

2602.05416 2026-06-19 cs.CE cs.AI cs.LG physics.ao-ph physics.flu-dyn

Reduced-Order Surrogates for Forced Flexible Mesh Coastal-Ocean Models

Freja Høgholm Petersen, Jesper Sandvig Mariegaard, Rocco Palmitessa, Allan P. Engsig-Karup

Affiliations: DTU

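AI summary: Proposes a flexible Koopman autoencoder that incorporates meteorological forcings and boundary conditions, compares its performance with POD-based surrogates, and demonstrates an accurate and efficient reduced-order approach.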

Comments Submitted for peer-review in a journal. v2: revised version submitted to journal after minor revisions

English abstract

While proper orthogonal decomposition (POD)-based surrogates are widely explored for hydrodynamic applications, the use of Koopman autoencoders for real-world coastal-ocean modelling remains relatively limited. This paper introduces a flexible Koopman autoencoder formulation that incorporates meteorological forcings and boundary conditions, and systematically compares its performance against POD-based surrogates. The Koopman autoencoder employs a learned linear temporal operator in latent space, enabling eigenvalue regularization to promote temporal stability. This strategy is evaluated alongside temporal unrolling techniques for achieving stable and accurate long-term predictions. The models are assessed on three test cases spanning distinct dynamical regimes, with prediction horizons up to one year at 30-minute temporal resolution. Across all cases, the reduced-order surrogates with temporal unrolling achieve high accuracy, with relative root-mean-squared errors of 0.0068-0.14 and $R^2$ values of 0.61-0.995, where prediction errors are largest for current velocities and smallest for water surface elevations. In two of the three cases, the Koopman autoencoder has higher accuracy than the POD-based surrogates. Compared with in-situ observations, the surrogate yields a -0.64% to 12% increase in water surface elevation prediction error relative to the physics-based model. These error levels, corresponding to a few centimeters, are acceptable for many practical applications, while inference speed-ups of 300-1400x enable workflows such as ensemble forecasting and long climate simulations for coastal-ocean modelling.
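
Because the latent dynamics are linear, stability can be read off the eigenvalues of the latent operator. The sketch below shows one simple form an eigenvalue regularizer could take: a penalty on eigenvalues whose modulus exceeds 1. The penalty form, the 2 x 2 latent size, and the matrices `K_stable` and `K_unstable` are assumptions for illustration; the abstract does not specify the regularizer the authors use, and forcings and boundary inputs are omitted.

```python
import cmath

def eigenvalues_2x2(K):
    """Eigenvalues of a 2 x 2 real matrix from its trace and determinant."""
    (a, b), (c, d) = K
    tr, det = a + d, a * d - b * c
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return [(tr + root) / 2.0, (tr - root) / 2.0]

def stability_penalty(K):
    """Penalize eigenvalues whose modulus exceeds 1 (growing latent modes)."""
    return sum(max(0.0, abs(lam) - 1.0) ** 2 for lam in eigenvalues_2x2(K))

def rollout(K, z0, steps):
    """Advance the latent state linearly: z_{t+1} = K z_t."""
    z, traj = list(z0), [list(z0)]
    for _ in range(steps):
        z = [K[0][0] * z[0] + K[0][1] * z[1],
             K[1][0] * z[0] + K[1][1] * z[1]]
        traj.append(z)
    return traj

# Hypothetical latent operators.
K_stable = [[0.9, -0.2], [0.2, 0.9]]    # complex pair with modulus sqrt(0.85) < 1
K_unstable = [[1.2, 0.0], [0.0, 0.5]]   # one eigenvalue of 1.2 grows without bound

final_state = rollout(K_stable, [1.0, 0.0], 50)[-1]
```

With `K_stable` the penalty is zero and a long rollout decays; with `K_unstable` the penalty is positive, so adding it to a training loss would push the operator back toward the unit disc.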

2601.12433 2026-06-19 eess.SP cs.LG

Temporal Data and Short-Time Averages Improve Multiphase Mass Flow Metering

Amanda Nyholm, Yessica Arellano, Jinyu Liu, Damian Krakowiak, Pierluigi Salvo Rossi

Affiliations: Dept. Electronic Systems, Norwegian University of Science and Technology; Dept. Gas Technology, SINTEF Energy Research; Dept. Research and Development, KROHNE Ltd.

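AI summary: Combines machine learning with single-phase flowmeters and uses temporal data and short-time averages to improve multiphase flow measurement accuracy; the CNN performs best at 0.25 Hz, with errors markedly lower than those of the best single-averaged model.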

Comments 9 pages, 6 figures

Journal ref IEEE Sensors Journal, vol. 26, no. 11, pp. 17252-17261, 1 June 2026

English abstract

Reliable flow measurements are essential in many industries, but current instruments often fail to accurately estimate multiphase flows, which are frequently encountered in real-world operations. Combining machine learning (ML) algorithms with accurate single-phase flowmeters has therefore received extensive research attention in recent years. The Coriolis mass flowmeter is a widely used single-phase meter that provides direct mass flow measurements, which ML models can be trained to correct, thereby reducing measurement errors in multiphase conditions. This paper demonstrates that preserving temporal information significantly improves model performance in such scenarios. We compare a multilayer perceptron, a windowed multilayer perceptron, and a convolutional neural network (CNN) on three-phase air-water-oil flow data from 342 experiments. Whereas prior work typically compresses each experiment into a single averaged sample, we instead compute short-time averages from within each experiment and train models that preserve temporal information at several downsampling intervals. The CNN performed best at 0.25 Hz with approximately 95 % of relative errors below 13 %, a normalized root mean squared error of 0.03, and a mean absolute percentage error of approximately 4.3 %, clearly outperforming the best single-averaged model and demonstrating that short-time averaging within individual experiments is preferable. Results are consistent across multiple data splits and random seeds, demonstrating robustness.
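
The difference between a single average per experiment and short-time averages can be sketched in a few lines. The readings, the 1 Hz sampling rate, and the window length are hypothetical; the abstract gives only the 0.25 Hz operating point, not the raw sampling rate or the windowing details.

```python
def short_time_averages(signal, window):
    """Mean of each consecutive, non-overlapping window; a trailing partial window is dropped."""
    n_windows = len(signal) // window
    return [sum(signal[i * window:(i + 1) * window]) / window
            for i in range(n_windows)]

def single_average(signal):
    """Collapse the whole experiment into one number."""
    return sum(signal) / len(signal)

# Hypothetical mass-flow readings sampled at 1 Hz.
# window=4 yields one value every 4 s, i.e. a 0.25 Hz sequence.
readings = [10.0, 12.0, 11.0, 13.0, 20.0, 22.0, 21.0, 23.0, 9.0]
features = short_time_averages(readings, window=4)
collapsed = single_average(readings)
```

Here `features` keeps the step change between the first and second window, while `collapsed` hides it, which is the temporal information a sequence model such as a CNN can exploit.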

2506.23396 2026-06-19 stat.ML cs.LG

AICO: Feature Significance Tests for Supervised Learning

Kay Giesecke, Enguerrand Horel, Chartsiri Jirachotkulthorn

Affiliations: Stanford University, Department of Management Science and Engineering and Institute for Computational and Mathematical Engineering; Upstart, Inc.; Stanford University, Institute for Computational and Mathematical Engineering

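AI summary: AICO is an efficient statistical method that tests each feature's contribution to predictive performance by masking the feature's information, providing an interpretability tool for large-scale models that requires no distributional assumptions.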

English abstract

Machine learning is central to modern science, industry, and policy, yet its predictive power often comes at the cost of transparency: we rarely know which input features truly drive a model's predictions. Without such understanding, researchers cannot draw reliable conclusions, practitioners cannot ensure fairness or accountability, and policymakers cannot trust or govern model-based decisions. Existing tools for assessing feature influence are limited; most lack statistical guarantees, and many require costly retraining or surrogate modeling, making them impractical for large modern models. We introduce AICO, a broadly applicable framework that turns model interpretability into an efficient statistical exercise. AICO tests whether each feature genuinely improves predictive performance by masking its information and measuring the resulting change. The method provides exact, finite-sample feature p-values and confidence intervals for feature importance through a simple, non-asymptotic hypothesis testing procedure. It requires no retraining, surrogate modeling, or distributional assumptions, making it feasible for large-scale algorithms. In both controlled experiments and real applications, from credit scoring to mortgage-behavior prediction, AICO reliably identifies the variables that drive model behavior, providing a scalable and statistically principled path toward transparent and trustworthy machine learning.
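
The mask-and-measure idea can be illustrated with a generic recipe: replace one feature by a baseline value, record the per-sample change in loss, and apply an exact sign test, which needs no retraining and no distributional assumptions. This is a sketch in the spirit of the abstract rather than the AICO procedure itself, whose test statistic and masking scheme are not given there; the model, data, baseline value, and function names are all hypothetical.

```python
from math import comb

def sign_test_p_value(deltas):
    """Exact one-sided sign test: P(at least k positive signs) under a fair coin."""
    nonzero = [d for d in deltas if d != 0]
    n = len(nonzero)
    k = sum(1 for d in nonzero if d > 0)
    return sum(comb(n, j) for j in range(k, n + 1)) / 2 ** n

def masking_deltas(model, X, y, feature, baseline):
    """Per-sample increase in squared error when one feature is replaced by a baseline."""
    deltas = []
    for row, target in zip(X, y):
        masked = list(row)
        masked[feature] = baseline
        loss_full = (model(row) - target) ** 2
        loss_masked = (model(masked) - target) ** 2
        deltas.append(loss_masked - loss_full)
    return deltas

# Hypothetical fitted model that uses feature 0 and ignores feature 1.
def model(row):
    return 2.0 * row[0]

X = [[1.0, 5.0], [2.0, -1.0], [3.0, 0.5], [4.0, 2.0], [5.0, 7.0], [6.0, 1.0]]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

p_used = sign_test_p_value(masking_deltas(model, X, y, feature=0, baseline=0.0))
p_ignored = sign_test_p_value(masking_deltas(model, X, y, feature=1, baseline=0.0))
```

Masking the feature the model relies on raises the loss on every sample, giving the smallest p-value attainable with six samples (1/64), while masking the ignored feature changes nothing and gives a p-value of 1.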

2602.14239 2026-06-19 cs.SI cs.AI cs.LG

A Hybrid TGN-SEAL Model for Dynamic Graph Link Prediction

Nafiseh Sadat Sajadi, Behnam Bahrak, Mahdi Jafari Siavoshani

Affiliations: Department of Computer Engineering, Sharif University of Technology; Tehran Institute for Advanced Studies, Khatam University

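AI summary: Proposes a hybrid TGN-SEAL model that extracts subgraphs around candidate links and jointly learns structural and temporal information, improving link prediction in sparse dynamic networks.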

Journal ref EPJ Data Science (2026)

English abstract

Predicting links in sparse, continuously evolving networks is a central challenge in network science. Conventional heuristic methods and deep learning models, including Graph Neural Networks (GNNs), are typically designed for static graphs and thus struggle to capture temporal dependencies. Snapshot-based techniques partially address this issue but often encounter data sparsity and class imbalance, particularly in networks with transient interactions such as telecommunication call detail records (CDRs). Temporal Graph Networks (TGNs) model dynamic graphs by updating node embeddings over time; however, their predictive accuracy under sparse conditions remains limited. In this study, we improve the TGN framework by extracting enclosing subgraphs around candidate links, enabling the model to jointly learn structural and temporal information. Experiments on a sparse CDR dataset show that our approach increases average precision by 2.6% over standard TGNs, demonstrating the advantages of integrating local topology for robust link prediction in dynamic networks.
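
Extracting an enclosing subgraph around a candidate link can be sketched as follows: take the nodes within k hops of either endpoint and keep the edges among them. This is a minimal static version under assumptions; the graph, the hop count, and the function names are hypothetical, and the timestamps and node memory that a temporal graph network would carry are left out.

```python
from collections import deque

def k_hop_neighbors(adj, source, k):
    """Nodes within k hops of source, found by breadth-first search."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if seen[node] == k:
            continue  # do not expand beyond k hops
        for nbr in adj.get(node, ()):
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    return set(seen)

def enclosing_subgraph(adj, u, v, k=1):
    """Induced subgraph on the union of the k-hop neighborhoods of u and v."""
    nodes = k_hop_neighbors(adj, u, k) | k_hop_neighbors(adj, v, k)
    # Store each undirected edge once, with endpoints in sorted order.
    edges = {(a, b) for a in nodes for b in adj.get(a, ())
             if b in nodes and a < b}
    return nodes, edges

# Hypothetical undirected call graph as an adjacency dictionary.
adj = {
    "a": ["b", "c"], "b": ["a", "d"], "c": ["a"],
    "d": ["b", "e"], "e": ["d", "f"], "f": ["e"],
}
nodes, edges = enclosing_subgraph(adj, "a", "d", k=1)
```

For the candidate link between `a` and `d`, the subgraph contains their shared neighbor `b` but excludes the distant node `f`, so a downstream model sees only the local topology around the pair.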

2601.15119 2026-06-19 eess.IV cs.CV

Vision Models for Medical Imaging: A Hybrid Approach for PCOS Detection from Ultrasound Scans

Md Mahmudul Hoque, Md Mehedi Hassain, Muntakimur Rahaman, Md. Towhidul Islam, Shaista Rani, Md Sharif Mollah

Affiliations: Department of CSE, CCN University of Science & Technology; Department of EEE, International Islamic University Chittagong; Faculty of Engineering, Multimedia University; Department of CSE, Stamford University of Bangladesh; Department of Biology, Lucknow University; Department of CSE, Bangladesh Army International University of Science & Technology

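AI summary: Proposes two hybrid models combining convolutional and transformer-based approaches for accurate detection of polycystic ovary syndrome in ultrasound images; the final model reaches 98.23% accuracy.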

English abstract

Polycystic Ovary Syndrome (PCOS) is the most common endocrine disorder in women of reproductive age. Many Bangladeshi women suffer from PCOS later in life. The aim of our research is to identify effective vision-based medical image analysis techniques and evaluate hybrid models for the accurate detection of PCOS. We introduce two novel hybrid models combining convolutional and transformer-based approaches. The training and testing data were organized into two categories: "infected" (PCOS-positive) and "noninfected" (healthy ovaries). In the initial stage, our first hybrid model, "DenConST" (integrating DenseNet121, Swin Transformer, and ConvNeXt), achieved 85.69% accuracy. The final optimized model, "DenConREST" (incorporating Swin Transformer, ConvNeXt, DenseNet121, ResNet18, and EfficientNetV2), achieved 98.23% accuracy, the best performance among all evaluated models. This research highlights an efficient solution for PCOS detection from ultrasound images, significantly improving diagnostic accuracy while reducing detection errors.

2509.04390 2026-06-19 eess.AS cs.SD

Accelerated Interactive Auralization of Highly Reverberant Spaces using Graphics Hardware

利用图形硬件加速高混响空间的交互式声景还原

Hannes Rosseel, Toon van Waterschoot

发表机构 * KU Leuven, Dept. of Electrical Engineering (ESAT), STADIUS Center for Dynamical Systems, Signal Processing

AI总结 本文提出基于GPU的实时多声道扬声器声学还原系统,通过GPU加速降低计算延迟,实现高混响空间的实时声学合成与反馈消除。

Comments 9 pages, 6 figures, submitted to Journal of the Audio Engineering Society

详情
AI中文摘要

交互式声学还原允许用户实时探索虚拟声学环境,能够重现已不再开放、声学特性已改变或难以前往的音乐厅或历史礼拜空间(HWS)。交互式声学合成需要将输入信号与一组合成滤波器实时卷积,这些滤波器对空间的时空声学响应进行建模。音乐厅和历史礼拜空间的声学特性均表现为混响时间长,因此合成滤波器包含大量滤波器抽头。卷积过程因而可能计算量很大,产生显著延迟,限制了声学还原系统的实时交互性。本文介绍了一个基于扬声器的实时多声道声学还原系统的实现。该系统能够利用GPU加速实时合成高混响空间的声学特性。本文比较了传统的基于CPU的卷积与GPU加速卷积,结果显示后者能以显著更低的延迟实现实时性能。此外,系统在GPU上集成了声学合成与声学反馈消除,构建了一个统一的基于扬声器的声学还原框架,以最小化处理延迟。

英文摘要

Interactive acoustic auralization allows users to explore virtual acoustic environments in real time, enabling the acoustic recreation of concert halls or Historical Worship Spaces (HWS) that are either no longer accessible, acoustically altered, or impractical to visit. Interactive acoustic synthesis requires real-time convolution of input signals with a set of synthesis filters that model the space-time acoustic response of the space. The acoustics in concert halls and HWS are both characterized by a long reverberation time, resulting in synthesis filters containing many filter taps. As a result, the convolution process can be computationally demanding, introducing significant latency that limits the real-time interactivity of the auralization system. In this paper, the implementation of a real-time multichannel loudspeaker-based auralization system is presented. This system is capable of synthesizing the acoustics of highly reverberant spaces in real time using GPU acceleration. A comparison between traditional CPU-based convolution and GPU-accelerated convolution is presented, showing that the latter can achieve real-time performance with significantly lower latency. Additionally, the system integrates acoustic synthesis with acoustic feedback cancellation on the GPU, creating a unified loudspeaker-based auralization framework that minimizes processing latency.
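The real-time convolution described in this abstract is usually done block by block, so that output can be produced while input is still arriving. The sketch below shows plain overlap-add block convolution in pure Python. It is an illustration only, not the authors' GPU implementation: the function names `convolve` and `overlap_add` are hypothetical, and a practical system would use FFT-based partitioned convolution rather than the direct inner loops shown here.

```python
def convolve(x, h):
    """Direct linear convolution of an input block x with filter taps h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def overlap_add(x, h, block):
    """Convolve x with h in blocks of `block` samples (overlap-add).

    Each block is convolved on its own and its reverberant tail is added
    into the output region that later blocks also write to.
    """
    out = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block):
        segment = convolve(x[start:start + block], h)
        for k, value in enumerate(segment):
            out[start + k] += value
    return out


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
taps = [1.0, 0.5, 0.25]           # stand-in for a (much longer) room response
blockwise = overlap_add(signal, taps, block=2)
```

The block-wise result equals the one-shot convolution; the block size trades latency (smaller blocks) against per-block overhead, which is the cost that long reverberation tails make expensive.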

2510.05013 2026-06-19 stat.ML cs.LG

Curiosity-Driven Development of Action and Language in Robots Through Self-Exploration

通过自我探索的机器人好奇心驱动行为与语言发展

Theodore Jerome Tinker, Kenji Doya, Jun Tani

发表机构 * Okinawa Institute of Science and Technology(冲绳科学技术大学院大学)

AI总结 本研究通过好奇心驱动的机器人自我探索,结合Q学习实现主动推理,揭示了组合泛化、快速学习、先配对后组合以及异常处理导致的U型发展模式,为人类高效语言习得提供解释。

Comments 27 pages, 22 pages of supplementary material

详情
AI中文摘要

婴儿通过极少的经验就能泛化习得语言,而大型语言模型需要数十亿的训练标记。人类高效发展的基础是什么?我们通过实验研究了这一问题,其中机器人代理通过好奇心驱动的自我探索学习执行与祈使句(例如,推红色立方体)相关的动作。我们的方法使用Q学习摊销主动推理,实现内在动机的发展性学习。模拟揭示了与发展心理学观察相对应的关键发现。i) 随着组合元素规模的增加,泛化能力显著提高。ii) 好奇心驱动的探索能够加速学习。iii) 句子和动作的机械配对先于组合泛化。iv) 异常处理导致U型发展表现,这种模式类似于儿童语言学习中的表征重述。这些结果表明,好奇心驱动的主动推理解释了内在动机的感觉运动-语言学习如何支持人类和人工代理中的可扩展组合泛化和异常处理。

英文摘要

Infants acquire language with generalization from minimal experience, whereas large language models require billions of training tokens. What underlies efficient development in humans? We investigated this problem through experiments wherein robotic agents learn to perform actions associated with imperative sentences (e.g., push red cube) via curiosity-driven self-exploration. Our approach amortizes active inference using Q-learning, enabling intrinsically motivated developmental learning. The simulations reveal key findings corresponding to observations in developmental psychology. i) Generalization improves drastically as the scale of compositional elements increases. ii) Curiosity-driven exploration enables faster learning. iii) Rote pairing of sentences and actions precedes compositional generalization. iv) Exception-handling induces U-shaped developmental performance, a pattern like representational redescription in child language learning. These results suggest that curiosity-driven active inference accounts for how intrinsically motivated sensorimotor-linguistic learning supports scalable compositional generalization and exception handling in humans and artificial agents.

2606.20532 2026-06-19 cs.AI 新提交

How Do Instructions Shape Speech? Cross-Attention Attribution for Style-Captioned Text-to-Speech

指令如何塑造语音?面向风格描述文本到语音的交叉注意力归因

Nityanand Mathur, Hamees Sayed, Wasim Madha, Apoorv Singh, Sameer Khurana, Akshat Mandloi, Sudarshan Kamath

AI总结 提出交叉注意力归因方法,分析风格描述文本到语音系统中单词对声学输出的影响,发现风格标记在早期步骤和深层注意力峰值,且与基频和能量相关。

详情
AI中文摘要

风格描述文本到语音系统使用自然语言控制语音特征,但单个单词如何影响声学输出仍不清楚。理解这一点对于诊断故障模式和提高表现性TTS的可控性至关重要。我们首次将DAAM框架适配到语音领域,为语音扩散模型提出交叉注意力归因,并将其应用于CapSpeech-TTS。我们的方法提取了25层和24个ODE步骤的逐词热力图。我们分析了3,600个(风格描述,文本转录)组合,包括120个风格描述条件生成30个文本转录,揭示了描述词如何塑造波形。结果表明:(1)风格标记的时间方差低于内容/功能标记,确认了全局条件作用;(2)风格注意力与基频和能量相关;(3)风格条件作用在早期步骤和深层达到峰值;(4)注意力熵在第17层达到最小值,与风格重要性峰值同时出现,表明在最关键风格阶段网络选择性最大。这是首次研究自然语言如何影响语音扩散模型中的交叉注意力。

英文摘要

Style-captioned text-to-speech systems use natural language to control voice characteristics, but how individual words influence acoustic output remains unclear. Understanding this is critical for diagnosing failure modes and improving controllability in expressive TTS. We propose cross-attention attribution for speech diffusion models, adapting the DAAM framework to the speech domain for the first time, and apply it to CapSpeech-TTS. Our method extracts per-token heatmaps across 25 layers and 24 ODE steps. We analyze 3,600 (style caption, text transcript) combinations comprising 120 style captions conditioning the generation of 30 text transcripts each, revealing how caption tokens shape waveforms. Results show: (1) style tokens have lower temporal variance than content/function tokens, confirming global conditioning; (2) style attention correlates with F0 and energy; (3) style conditioning peaks in early steps and deep layers; (4) attention entropy reaches its minimum at layer 17, co-occurring with the style importance peak, indicating maximal network selectivity at the most style-critical stage. This is the first study of how natural language influences cross-attention in speech diffusion models.
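Finding (4) rests on attention entropy as a measure of selectivity: the lower the entropy, the more the attention mass concentrates on a few tokens. A minimal sketch of that measurement follows. It assumes the entropy is taken over the caption tokens of one layer's aggregated attention, which the abstract does not specify; the function names and the toy numbers are hypothetical.

```python
import math


def attention_entropy(weights):
    """Shannon entropy (in nats) of non-negative attention weights."""
    total = sum(weights)
    probs = [w / total for w in weights if w > 0]
    return -sum(p * math.log(p) for p in probs)


def most_selective_layer(per_layer_weights):
    """Index of the layer whose attention distribution has the lowest entropy."""
    entropies = [attention_entropy(w) for w in per_layer_weights]
    return min(range(len(entropies)), key=entropies.__getitem__)


# Toy attention over four caption tokens for three layers (made-up values).
layers = [
    [0.25, 0.25, 0.25, 0.25],   # uniform: least selective
    [0.97, 0.01, 0.01, 0.01],   # concentrated on one token
    [0.50, 0.50, 0.00, 0.00],   # split between two tokens
]
selected = most_selective_layer(layers)
```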

2606.20508 2026-06-19 cs.AI cs.LG 新提交

What Do Safety-Aligned LLMs Learn From Mixed Compliance Demonstrations?

安全对齐的LLM从混合顺从演示中学到了什么?

Sihui Dai, Mann Patel

AI总结 研究通过混合良性顺从演示和有害顺从演示,探究演示组成如何驱动有害顺从,发现演示内容、顺序和训练方法影响模型提取的信息。

详情
AI中文摘要

先前工作表明,上下文演示可以越狱语言模型,但模型如何解释不同类型的顺从演示仍不清楚。我们通过混合良性顺从演示(无害请求,有帮助响应)与有害顺从演示(有害请求,有帮助响应)并测试关于演示组成如何驱动有害顺从的三个假设来研究这一点。在四个模型中,我们发现良性和有害演示不可互换:良性演示根据模型不同可以减少或增加有害顺从。我们进一步表明,偏好优化是防止良性演示增加有害顺从的关键训练阶段,演示顺序表现出强烈的近因偏差,并且模型在拒绝与上下文学习的交互方式上有所不同:一些模型在拒绝时也采用演示的格式,而其他模型在拒绝时覆盖所有上下文信号。综合来看,这项工作超越了展示基于演示的越狱有效,而是描述了其工作原理:模型从顺从演示中提取的内容取决于演示内容、顺序和训练方法。

英文摘要

Prior work has shown that in-context demonstrations can jailbreak language models, but it remains unclear how models interpret different types of compliance demonstrations. We study this by mixing benign compliance demonstrations (non-harmful request, helpful response) with harmful compliance demonstrations (harmful request, helpful response) and testing three hypotheses about how demonstration composition drives harmful compliance. Across four models, we find that benign and harmful demonstrations are not interchangeable: benign demonstrations can either reduce or increase harmful compliance depending on the model. We further show that preference optimization is the critical training stage that prevents benign demonstrations from increasing harmful compliance, that demonstration ordering exhibits strong recency bias, and that models differ in how refusal interacts with in-context learning: some adopt demonstrated formatting even when refusing, while others override all in-context signals upon refusal. Taken together, this work moves beyond showing that demonstration-based jailbreaking works to characterizing how it works: what models extract from compliance demonstrations depends on demonstration content, ordering, and training methodology.

2606.20428 2026-06-19 cs.RO 新提交

ARC: Adaptive Robust Joint State and Covariance Estimation

ARC:自适应鲁棒联合状态与协方差估计

Alexandre Hadji-Thomas, Andrew Stirling, James R. Forbes

AI总结 提出统一块坐标下降框架,结合自适应鲁棒损失、迭代重加权最小二乘状态更新和最小加权协方差行列式估计器,实现离群值下状态与协方差的自适应联合估计。

Comments Submitted to IEEE Robotics and Automation Letters (RA-L), June 2026. 8 pages, 7 figures, 1 table

详情
AI中文摘要

传感器测量经常受到离群值和非高斯噪声的污染。这些传感器数据中的缺陷会导致经典状态估计器产生有偏且不可靠的状态和不确定性估计。鲁棒估计器拒绝或降低离群值的权重,但不进行测量协方差估计,而联合状态和协方差估计器假设高斯残差和固定的损失形状参数。将这两种能力整合到一个框架中,可以在存在离群值的情况下同时估计状态和协方差。本文提出了一种统一的块坐标下降框架,该框架结合了范数感知自适应鲁棒损失、迭代重加权最小二乘状态更新和最小加权协方差行列式协方差估计器,产生了一个自调谐的联合状态和协方差估计器。该框架在蒙特卡洛模拟和杂乱非视距环境中的真实世界超宽带定位实验上进行了评估。结果表明,所提出的估计器能够一致地恢复真实的内点测量协方差,并在状态估计精度上达到或超过所有基线方法,且无需任何手动参数调整。

英文摘要

Sensor measurements are frequently corrupted by outliers and non-Gaussian noise. These imperfections in the sensor data can cause classical state estimators to generate biased and unreliable state and uncertainty estimates. Robust estimators reject or downweight outliers but do not perform measurement covariance estimation, whereas joint state and covariance estimators assume Gaussian residuals and fixed loss shape parameters. Integrating these two capabilities into a single framework is an opportunity to simultaneously estimate both state and covariance in the presence of outliers. This paper proposes a unified Block-Coordinate Descent framework that combines a norm-aware adaptive robust loss, an Iteratively Reweighted Least-Squares state update, and a Minimum Weighted Covariance Determinant covariance estimator, yielding a self-tuning joint state and covariance estimator. The framework is evaluated in a Monte-Carlo simulation and on real-world ultra-wideband localization experiments in cluttered non-line-of-sight environments. Results show that the proposed estimator consistently recovers the true inlier measurement covariance and matches or exceeds the state estimation accuracy of all baselines, without requiring any manual parameter tuning.
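To make the Iteratively Reweighted Least-Squares (IRLS) state update concrete, here is a minimal sketch for the simplest possible state: a scalar location estimated from measurements that include one outlier. It uses a fixed Huber loss with a hand-picked threshold, so it illustrates only the reweighting step; the adaptive loss and the covariance estimator of the paper are not reproduced. The names `huber_weight` and `irls_mean` are hypothetical.

```python
def huber_weight(residual, delta):
    """IRLS weight for the Huber loss: 1 inside the threshold, delta/|r| outside."""
    magnitude = abs(residual)
    return 1.0 if magnitude <= delta else delta / magnitude


def irls_mean(measurements, delta=1.0, iters=100):
    """Robust location estimate by repeated weighted least squares."""
    x = sum(measurements) / len(measurements)      # start from the plain mean
    for _ in range(iters):
        weights = [huber_weight(z - x, delta) for z in measurements]
        x = sum(w * z for w, z in zip(weights, measurements)) / sum(weights)
    return x


z = [0.9, 1.0, 1.1, 1.0, 0.95, 1.05, 50.0]         # last value is an outlier
plain = sum(z) / len(z)
robust = irls_mean(z)
```

The plain mean is dragged to 8.0 by the single outlier, while the IRLS estimate stays close to the inlier cluster around 1. The threshold `delta` is exactly the kind of parameter that a self-tuning scheme aims to avoid setting by hand.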

2606.20411 2026-06-19 cs.LG 新提交

Direct Advantage Estimation for Scalable and Sample-efficient Deep Reinforcement Learning

直接优势估计:可扩展且样本高效的深度强化学习

Hsiao-Ru Pan, Bernhard Schölkopf

AI总结 针对直接优势估计(DAE)在部分可观测域和高维观测下的局限性,本文扩展其理论框架并引入离散潜动态模型降低计算复杂度,在Arcade学习环境中验证了DAE的可扩展性和样本效率。

Comments Accepted at RLC2026

详情
AI中文摘要

直接优势估计(DAE)已被证明可以提高深度强化学习算法的样本效率。然而,它对完全环境可观测性的依赖限制了其在现实场景中的适用性,并且其对转移概率建模的要求在高维观测下会带来巨大的计算开销。在本文中,我们解决了这两个局限性。首先,我们将DAE的理论框架扩展到部分可观测域,只需最小的修改。其次,我们通过引入高效近似转移概率的离散潜动态模型来降低其计算复杂度。我们在Arcade学习环境上评估了我们的方法,发现DAE在保持高样本效率的同时,能有效地随函数逼近器容量扩展。

英文摘要

Direct Advantage Estimation (DAE) has been shown to improve the sample efficiency of deep reinforcement learning algorithms. However, its reliance on full environment observability limits its applicability in realistic settings, and its requirement to model transition probabilities incurs substantial computational overhead for high-dimensional observations. In the present work, we address both limitations. First, we extend the theoretical framework of DAE to partially observable domains with minimal modifications. Second, we reduce its computational complexity by introducing discrete latent dynamics models that efficiently approximate transition probabilities. We evaluate our approach on the Arcade Learning Environment and find that DAE scales effectively with function approximator capacity while retaining high sample efficiency.

2606.20382 2026-06-19 cs.LG 新提交

Towards Modality-imbalanced Federated Graph Learning: A Data Synthesis-based Approach

面向模态不平衡的联邦图学习:一种基于数据合成的方法

Zhengyu Wu, Hongchao Qin, Xunkai Li, Zekai Chen, Rong-Hua Li, Guoren Wang

AI总结 针对联邦图学习中客户端级和节点级模态不平衡问题,提出隐式图感知潜在语义表示合成范式FedMGS,通过可用性感知图编码器、原型引导语义合成器和可靠性校准融合机制恢复缺失模态语义,在四个任务上最高提升17.41%。

详情
AI中文摘要

多模态联邦图学习(MM-FGL)提供了一种自然的协作训练范式,但其实际部署受到两种粒度的模态不平衡挑战。当某些客户端缺少完整模态时,会出现客户端级不平衡;而当单个节点缺少视觉或文本属性时,会出现节点级不平衡。尽管存在一些相关研究,但我们的调查表明,它们主要针对图无关或集中式场景,难以直接适应。为了解决这些挑战,我们将模态不平衡的MM-FGL形式化为一个隐式图感知潜在语义表示合成问题。该范式直接在表示空间中恢复缺失的模态语义,从而最大化与原始数据语义分布的对齐,并缓解由缺失模态引起的高方差。为此,我们提出了FedMGS(联邦模态感知图合成),它集成了三个核心组件。可用性感知图编码器防止缺失模态污染局部结构传播。原型引导潜在语义合成器为不可用模态建立跨客户端语义锚点。可靠性校准语义融合机制在预测读出之前调节恢复的潜在表示的影响。在四个任务上的大量实验表明,FedMGS始终优于竞争基线,最高提升17.41%,并实现了最佳效率-性能权衡。

英文摘要

MultiModal Federated Graph Learning (MM-FGL) offers a natural collaborative training paradigm, but its practical deployment is challenged by two granularities of modality imbalance. Client-level imbalance occurs when certain clients lack entire modalities, while node-level imbalance occurs when individual nodes exhibit missing visual or textual attributes. While several relevant studies exist, our investigation reveals that they predominantly target graph-agnostic or centralized scenarios, rendering them difficult to adapt directly. To address these challenges, we formalize modality-imbalanced MM-FGL as an implicit graph-aware latent semantic representation synthesis problem. This paradigm recovers missing modal semantics directly within the representation space, thereby maximizing alignment with the original data's semantic distribution and mitigating the high variance induced by missing modalities. To this end, we propose FedMGS (Federated Modality-aware Graph Synthesis), which integrates three core components. The availability-aware graph encoder prevents missing modalities from contaminating local structural propagation. The prototype-guided latent semantic synthesizer establishes cross-client semantic anchors for unavailable modalities. The reliability-calibrated semantic fusion mechanism regulates the impact of recovered latent representations prior to predictive readout. Extensive experiments on four tasks show that FedMGS consistently outperforms competitive baselines, with gains of up to 17.41% and the best efficiency-performance trade-off.

2606.20357 2026-06-19 cs.LG 新提交

On the Variance of Temporal Difference Learning and its Reduction Using Control Variates

时序差分学习的方差及其通过控制变量的降低

Hsiao-Ru Pan, Bernhard Schölkopf

AI总结 本文分析表格表示下分阶段(phased)设置中时序差分学习的方差,证明其方差降低机制之一是有效聚合更多独立轨迹,并比较了TD、MC和DAE的方差界限。

Comments Accepted at RLC2026

详情
AI中文摘要

我们在采用表格表示的分阶段(phased)设置下分析了时序差分(TD)学习的方差,并表明其降低方差能力背后的机制之一是有效地聚合了更多的独立轨迹。基于这一见解,我们证明(1)TD的方差渐近地以蒙特卡洛(MC)估计器的方差为上界,以及(2)在样本数量固定时,时域较短的更新产生的方差更小。除了TD,我们还表明直接优势估计(DAE),一种估计优势函数的方法,可以被视为一种回归调整的控制变量,在大样本极限下实现了比TD更紧的方差界限。最后,我们通过精心设计的环境对这些估计器的行为进行了数值展示。

英文摘要

We analyze the variance of temporal difference (TD) learning using the phased setting with tabular representation, and show that one of the mechanisms behind its ability to reduce variance is that it effectively aggregates over a larger number of independent trajectories. Based on this insight, we demonstrate that (1) the variance of TD is asymptotically bounded from above by that of Monte Carlo (MC) estimators, and (2) shorter-horizon updates incur less variance for a fixed number of samples. Beyond TD, we show that Direct Advantage Estimation (DAE), a method for estimating the advantage function, can be seen as a type of regression-adjusted control variate, which achieves a tighter bound on the variance compared to TD in the large-sample limit. Finally, we numerically illustrate the behaviors of these estimators with carefully designed environments.
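The phrase "regression-adjusted control variate" can be illustrated outside reinforcement learning. The sketch below estimates the mean of a noisy quantity `y` with the help of a correlated quantity `c` whose mean is known, using the least-squares coefficient as the adjustment. It is a generic textbook construction with synthetic data, not DAE itself; all names and numbers are hypothetical.

```python
import random


def sample_variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def control_variate_adjust(y, c, c_mean):
    """Return y_i - beta * (c_i - c_mean), with beta fitted by least squares."""
    y_bar = sum(y) / len(y)
    c_bar = sum(c) / len(c)
    cov = sum((yi - y_bar) * (ci - c_bar) for yi, ci in zip(y, c))
    var = sum((ci - c_bar) ** 2 for ci in c)
    beta = cov / var
    return [yi - beta * (ci - c_mean) for yi, ci in zip(y, c)]


rng = random.Random(0)
c = [rng.gauss(0.0, 1.0) for _ in range(2000)]        # control variate, known mean 0
y = [2.0 + ci + rng.gauss(0.0, 0.3) for ci in c]      # target: true mean 2, correlated with c
adjusted = control_variate_adjust(y, c, c_mean=0.0)
estimate = sum(adjusted) / len(adjusted)
```

With the fitted coefficient, the sample variance of the adjusted values is the original variance scaled by one minus the squared sample correlation, so it can never exceed the unadjusted variance.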

2606.20323 2026-06-19 cs.AI 新提交

Leveraging systems' non-linearity to tackle the scarcity of data in the design of Intelligent Fault Diagnosis Systems

利用系统非线性应对智能故障诊断系统设计中的数据稀缺问题

Giancarlo Santamato, Andrea Mattia Garavagno, Massimiliano Solazzi, Antonio Frisoli

AI总结 提出一种利用系统固有非线性的周期多激励级方法,结合数据可视化与增强技术,在数据稀缺条件下实现基于深度迁移学习的振动故障诊断,并在铁路受电弓结构上验证有效性。

Journal ref Nonlinear Dynamics, vol. 112, pp. 16153-16166, 2024

详情
AI中文摘要

深度迁移学习(DTL)允许高效构建智能故障诊断系统(IFDS)。另一方面,DTL方法仍然严重依赖大量标记数据。在处理机器或结构故障时,获取如此大量的数据可能具有挑战性。本文提出了一种在数据严重稀缺条件下使用DTL设计基于振动的IFDS的新方法。利用真实世界系统固有非线性的周期性多激励级过程生成图像,这些图像可以由预训练的卷积神经网络(CNN)方便地分析以诊断故障。本文提出了一种新的数据可视化方法及其增强技术,以应对IFDS设计过程中典型的数据缺乏问题。在铁路受电弓结构上的实验验证为所提方法提供了有效支持。

英文摘要

Deep Transfer Learning (DTL) allows for the efficient building of Intelligent Fault Diagnosis Systems (IFDS). However, DTL methods still rely heavily on large amounts of labelled data, and obtaining that much data can be challenging when dealing with faults in machines or structures. This paper proposes a novel approach to the design of vibration-based IFDS using DTL under conditions of severe data scarcity. A periodic multi-excitation level procedure leveraging intrinsic non-linearities of real-world systems is used to produce images that can be conveniently analysed by pre-trained Convolutional Neural Networks (CNNs) to diagnose faults. A new data visualization method and its augmentation technique are proposed to tackle the typical lack of data encountered during the design of IFDS. Experimental validation on a railway pantograph structure provides effective support for the proposed method.

2606.20312 2026-06-19 cs.CV 新提交

Reliability-Aware Prototype Calibration for Frozen Pose-Flow Video Anomaly Detection

面向冻结姿态流视频异常检测的可靠性感知原型校准

Ning Dong, Yingna Su, Xin Dong, Ziyun Jiao, Xinnian Guo, Zhuangzhuang Pan

AI总结 提出一种后验评分校准方法RPC,通过标准化潜在空间中的最近原型偏差修正冻结姿态流检测器的排名,在8个骨干-数据集组合上平均提升AUROC 2.03个百分点。

Comments 15 pages, 5 figures, 7 tables. Code available at https://github.com/iNing10/RPC

详情
AI中文摘要

姿态流视频异常检测器因其能为跟踪的骨架窗口提供基于似然的排名,在一类监控中具有吸引力。然而,单个似然分数可能隐藏多模态正常行为,并对姿态观测噪声敏感。我们研究了一个冻结检测器设置,其中姿态流骨干网络、缓存的骨架轨迹和评估流程是固定的。可靠性感知原型校准(RPC)是针对该设置的一种后验评分校准方法。它在冻结潜在空间中添加标准化的最近原型偏差到标准化的流分数,并仅使用关键点置信度来门控这一新增的几何证据。因此,RPC在保留原始密度信号的同时,利用姿态可靠性下的经验正常模式结构修正排名。在两个冻结姿态流骨干网络和四个数据集上,RPC在所有八个骨干-数据集对中提升了帧级AUROC,增益范围为0.34到4.49个百分点,平均为2.03个百分点。消融和可靠性分析表明,原型偏差是主要的修正信号,而可靠性门控在姿态观测不可靠时最为有用。这些结果表明,当重新训练或复现完整姿态流程不可行时,轻量级后验校准可以增强缓存的姿态流系统。

英文摘要

Pose-flow video anomaly detectors are attractive for one-class surveillance because they provide likelihood-based rankings for tracked skeleton windows. However, a single likelihood score may hide multimodal normal behavior and be sensitive to pose-observation noise. We study a frozen-detector setting in which the pose-flow backbone, cached skeleton tracks, and evaluation pipeline are fixed. Reliability-Aware Prototype Calibration (RPC) is a post-hoc score calibration method for this setting. It adds a standardized nearest-prototype deviation in the frozen latent space to the standardized flow score, and uses keypoint confidence only to gate this added geometric evidence. Thus, RPC preserves the original density signal while correcting the ranking with empirical normal-mode structure under pose reliability. Across two frozen pose-flow backbones and four datasets, RPC improves frame-level AUROC in all eight backbone-dataset pairs, with gains ranging from 0.34 to 4.49 percentage points and averaging 2.03 points. Ablation and reliability analyses show that prototype deviation is the main corrective signal, while reliability gating is most useful when pose observations are less trustworthy. These results suggest that lightweight post-hoc calibration can strengthen cached pose-flow systems when retraining or reproducing the full pose pipeline is impractical.
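The calibration rule in this abstract (standardized flow score plus a confidence-gated, standardized nearest-prototype deviation) can be sketched as follows. The exact gate and the source of the standardization statistics are not given in the abstract, so this sketch assumes a multiplicative gate equal to the mean keypoint confidence in [0, 1] and statistics supplied by the caller. The function names are hypothetical and this is not the code from the linked repository.

```python
import math


def nearest_prototype_distance(latent, prototypes):
    """Euclidean distance from a latent vector to its closest prototype."""
    return min(math.dist(latent, p) for p in prototypes)


def calibrated_score(flow_score, latent, confidence, prototypes,
                     flow_stats, proto_stats):
    """Standardized flow score plus confidence-gated prototype deviation.

    flow_stats and proto_stats are (mean, std) pairs, assumed here to be
    estimated on normal training data.
    """
    flow_z = (flow_score - flow_stats[0]) / flow_stats[1]
    deviation = nearest_prototype_distance(latent, prototypes)
    proto_z = (deviation - proto_stats[0]) / proto_stats[1]
    return flow_z + confidence * proto_z


prototypes = [[0.0, 0.0], [10.0, 10.0]]
trusted = calibrated_score(1.0, [3.0, 4.0], 0.5, prototypes, (0.0, 1.0), (0.0, 1.0))
untrusted = calibrated_score(1.0, [3.0, 4.0], 0.0, prototypes, (0.0, 1.0), (0.0, 1.0))
```

With zero confidence the score falls back to the original standardized flow score, which matches the stated goal of preserving the density signal when the pose observation cannot be trusted.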

2606.20274 2026-06-19 cs.AI 新提交

Lagrange: An Open-Vocabulary, Energy-Based Sparse Framework for Generalized End-to-End Driving

Lagrange: 一种面向通用端到端驾驶的开放词汇、基于能量的稀疏框架

Shihao Ji, HongXi Li, Zihui Song, Mingyu Li

AI总结 提出Lagrange框架,利用掩码潜在场和视觉语言模型实现开放词汇、稀疏计算,通过拉格朗日动作最小化确保运动学约束,在nuScenes和CODA基准上验证了鲁棒性和可解释性。

详情
AI中文摘要

将端到端自动驾驶扩展到复杂的开放世界环境,需要能够泛化到异常场景的感知模型和能够产生运动学有效轨迹的规划器。现有范式在表示效率和泛化能力之间存在明显分歧。密集模型(如占用网络)虽然几何鲁棒,但存在关键计算瓶颈,且难以进行高层语义推理。相反,稀疏的基于查询的规划器效率高,但依赖于封闭集定义,使其容易受到分布外事件的影响。尽管最近的视觉-语言-动作模型提供了开放词汇推理,但其自回归离散令牌生成从根本上与车辆动力学的连续高频控制需求相冲突。为解决这一问题,我们提出了Lagrange,一种基于掩码潜在场的开放词汇、计算稀疏的驾驶框架。Lagrange不依赖密集体积重建或封闭集查询机制,而是利用视觉语言模型将类别无关的目标提议编码为连续语义视觉令牌。我们引入了一种意图驱动的掩码交叉注意力模块,该模块在时间上过滤不相关实体,并将注意力令牌解码为定义在空间坐标上的隐式连续能量场。通过将决策制定为跨越该能量场的拉格朗日动作最小化问题,我们在执行碰撞避免的同时强制遵守车辆运动学。在标准(nuScenes)和长尾(CODA)基准上的大量离线评估表明,Lagrange为鲁棒、可解释且运动学可行的开放世界自主性建立了一个有前景的框架。

英文摘要

Scaling end-to-end autonomous driving to complex, open-world environments requires perceptual models that generalize to anomalous scenarios and planners that produce kinematically valid trajectories. Existing paradigms face a distinct dichotomy between representational efficiency and generalization capacity. Dense models (e.g., occupancy networks), while geometrically robust, incur critical computational bottlenecks and struggle with high-level semantic reasoning. Conversely, sparse, query-based planners are efficient but reliant on closed-set definitions, rendering them vulnerable to out-of-distribution (OOD) events. Although recent Vision-Language-Action (VLA) models offer open-vocabulary reasoning, their autoregressive, discrete token generation fundamentally conflicts with the continuous, high-frequency control requirements of vehicle dynamics. To address this, we propose Lagrange, an open-vocabulary, computationally sparse driving framework based on Masked Latent Fields (MLF). Rather than relying on dense volumetric reconstructions or closed-set query mechanisms, Lagrange exploits Vision-Language Models (VLMs) to encode class-agnostic object proposals into continuous semantic visual tokens. We introduce an intent-driven masked cross-attention module that temporally filters irrelevant entities, decoding the attended tokens into an implicit continuous energy field defined over spatial coordinates. By framing decision-making as a Lagrangian action minimization problem spanning this energy field, we enforce strict compliance with vehicle kinematics while executing collision avoidance. Extensive offline evaluations on both standard (nuScenes) and long-tail (CODA) benchmarks demonstrate that Lagrange establishes a promising framework for robust, interpretable, and kinematically feasible open-world autonomy.

2606.20255 2026-06-19 cs.CL cs.AI 新提交

The Register Gap: A Meaning Intelligence Framework for Nigerian Public Discourse

语域差距:尼日利亚公共话语的意义智能框架

Celestine Achi

AI总结 提出九维意义智能框架(MIF),通过语域、真实意图等维度区分表面情感与真实交际意图,在尼日利亚公共话语数据集上使语域分类准确率提升40个百分点,复合意义智能评分提升5.4分。

Comments Preprint. 12 pages, 2 tables. Supplementary materials: MIF Master Specification v2.0, Annotation Guidelines v1.0, and 30-item public calibration set with gold labels available from the author

详情
AI中文摘要

我们提出了意义智能框架(MIF),这是一个用于尼日利亚公共话语的九维标注和评估方案,将表面情感与真实交际意图区分开来。现有的尼日利亚语言基准(包括NaijaSenti和AfriSenti)将情感分类视为三向极性任务(正面、负面、中性)。我们认为,AI系统在尼日利亚话语上的主要失败模式不是翻译失败,而是语境失败:同一话语根据说话者、听众和情境可能具有相反的语用效力。MIF通过九个评分维度将这一见解操作化:语域、表面情感、真实意图、反讽、编码潜台词、风险等级、标注者置信度、说话者情绪和推荐沟通行动。我们构建了一个包含30个项目的校准数据集,涵盖标准英语、尼日利亚英语、尼日利亚皮钦语和混合语域,并在零样本和模式引导提示条件下评估了一个前沿语言模型(Gemini 2.5 Flash)。主要发现是语域差距:零样本语域分类准确率为33.3%,当模型在上下文中接收到MIF模式时,准确率上升至73.3%(+40个百分点)。在模式引导提示下,复合意义智能评分增加了5.4分(从73.2到78.6),最大的实际收益体现在语域识别、编码潜台词检测(+10分)和战略行动推荐(+10.3分)上。我们发布了框架规范、标注指南和包含30个项目的公开校准集以支持可重复性,同时保留了一个私有留存语料库用于防污染评估。

英文摘要

We introduce the Meaning Intelligence Framework (MIF), a nine-dimension annotation and evaluation schema for Nigerian public discourse that separates surface sentiment from true communicative intent. Existing benchmarks for Nigerian languages, including NaijaSenti and AfriSenti, treat sentiment classification as a three-way polarity task (positive, negative, neutral). We argue that the dominant failure mode of AI systems on Nigerian discourse is not translation failure but context failure: the same utterance carries opposite pragmatic force depending on speaker, audience, and situation. The MIF operationalises this insight across nine scored dimensions: register, surface sentiment, true intent, irony, coded subtext, risk tier, annotator confidence, speaker emotion, and recommended communications action. We construct a 30-item calibration dataset spanning Standard English, Nigerian English, Nigerian Pidgin, and code-mixed registers, and evaluate a frontier language model (Gemini 2.5 Flash) under zero-shot and schema-informed prompting conditions. The headline finding is the Register Gap: zero-shot register classification accuracy is 33.3%, rising to 73.3% (+40 points) when the model receives the MIF schema in-context. The composite Meaning Intelligence Score increases by 5.4 points (73.2 to 78.6) under schema-informed prompting, with the largest practical gains in register identification, coded-subtext detection (+10 points), and strategic action recommendation (+10.3 points). We release the framework specification, annotation guidelines, and the 30-item public calibration set to support reproducibility, while retaining a private holdout corpus for contamination-protected evaluation.

2606.20208 2026-06-19 cs.AI cs.DB cs.NE 新提交

Beyond Accuracy: Measuring Logical Compliance of Predictive Models

超越准确性:衡量预测模型的逻辑合规性

Guillaume Olivier Delplanque, Pierre Genevès, Nabil Layaïda, Zephirin Faure

AI总结 提出规则违反分数(RVS),一种独立于预测准确性的评估指标,用于量化预测模型对逻辑规则的遵守程度,并通过实验证明两个准确率相近的模型可能表现出截然不同的逻辑合规性。

详情
AI中文摘要

机器学习模型主要通过预测性能指标进行评估,如排序质量、预测误差或分类准确性。虽然这些指标有效量化了预测与真实值的匹配程度,但它们不评估模型输出是否尊重预定义的逻辑或领域特定约束。在医疗、金融和自主系统等高安全性应用中,逻辑一致性与预测准确性同样关键,但尚无标准指标捕捉这一维度。我们引入了规则违反分数(RVS),这是一种互补的评估指标,独立于预测准确性,量化预测模型对给定逻辑规则集的遵守程度。RVS 对硬规则(严格约束)和软规则(统计规律)区别对待,可在任何数据集和任何在关系词汇上表达的预测模型上进行评估,并可通过为 Horn 规则自动生成的 SQL 查询进行计算。除了评估模型,RVS 还可以评估训练数据集的逻辑一致性,并帮助识别定义不良的规则。我们在三个基准测试上评估了 RVS,涵盖知识图谱链接预测和关系回归,包括基于规则、基于嵌入和神经符号的预测模型。我们的结果表明,两个实现相当预测准确性的模型可能表现出显著不同的逻辑合规性,揭示了标准指标无法捕捉的模型行为差异。

英文摘要

Machine learning models are predominantly evaluated through predictive performance metrics such as ranking quality, prediction error, or classification accuracy. While these metrics effectively quantify how closely predictions match the ground truth, they do not assess whether model outputs respect predefined logical or domain-specific constraints. In high-stakes applications, including healthcare, finance, and autonomous systems, logical consistency can be as critical as predictive accuracy, yet no standard metric captures this dimension. We introduce the Rule Violation Score (RVS), a complementary evaluation metric that quantifies the extent to which a predictive model respects a given set of logical rules, independently of predictive accuracy. RVS treats hard rules (strict constraints) and soft rules (statistical regularities) differently, can be evaluated on any dataset and on any predictive model expressed over a relational vocabulary, and can be computed using SQL queries that are automatically generated for Horn rules. Beyond evaluating models, RVS can also evaluate the logical consistency of training datasets and help identify poorly defined rules. We evaluate RVS on three benchmarks covering knowledge graph link prediction and relational regression, including rule-based, embedding-based, and neuro-symbolic predictive models. Our results demonstrate that two models achieving comparable predictive accuracy can exhibit substantially different levels of logical compliance, revealing differences in model behavior that standard metrics fail to capture.
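As a rough illustration of checking a Horn rule with SQL, the sketch below counts how often the body of one rule holds in a set of predicted triples while the head is missing. The rule `born_in(x, c) AND located_in(c, k) => citizen_of(x, k)`, the single-table schema, and the simple violation ratio are all assumptions made for this example; the abstract does not give the actual RVS formula or its treatment of hard versus soft rules.

```python
import sqlite3


def violation_rate(triples):
    """Fraction of rule-body groundings whose head fact is absent."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE fact (s TEXT, p TEXT, o TEXT)")
    con.executemany("INSERT INTO fact VALUES (?, ?, ?)", triples)
    # Groundings of the rule body: born_in(x, c) AND located_in(c, z).
    body = """
        SELECT a.s AS x, b.o AS z
        FROM fact a JOIN fact b ON a.o = b.s
        WHERE a.p = 'born_in' AND b.p = 'located_in'
    """
    total = con.execute(f"SELECT COUNT(*) FROM ({body}) g").fetchone()[0]
    violated = con.execute(f"""
        SELECT COUNT(*) FROM ({body}) g
        WHERE NOT EXISTS (
            SELECT 1 FROM fact h
            WHERE h.p = 'citizen_of' AND h.s = g.x AND h.o = g.z)
    """).fetchone()[0]
    con.close()
    return violated / total if total else 0.0


predictions = [
    ("ann", "born_in", "lyon"), ("lyon", "located_in", "france"),
    ("ann", "citizen_of", "france"),
    ("bob", "born_in", "bern"), ("bern", "located_in", "switzerland"),
    ("bob", "citizen_of", "france"),          # head missing for bob's grounding
]
rate = violation_rate(predictions)
```

Two models could score identically on link-prediction accuracy and still differ on a count like this one, which is the gap the abstract points at.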

2606.20183 2026-06-19 cs.LG 新提交

Effective Dimension Governs Generalization in Quantum Kernel Vision Models

有效维度主导量子核视觉模型的泛化

Jian Xu, Delu Zeng, John Paisley, Qibin Zhao

AI总结 通过有效维度d_eff解释量子视觉模型中纠缠结构增强泛化与量子噪声提升测试精度的现象,提出噪声形状核的谱分解与正则化机制。

详情
AI中文摘要

最近的量子视觉模型——量子视觉变换器和量子卷积网络——报告了两个引人注目但尚未解释的经验现象:(i) 具有更多或更均匀分布纠缠的拟设泛化更好,以及(ii) 注入量子噪声可以提高测试精度而不是降低它。这些观察目前被视为奇闻,通过网格搜索发现,并且如果有解释的话,也是手工进行的。我们表明,两者都是一个单一可测量量的表现:即(噪声形状的)量子特征核的\emph{有效维度}$d_{\rm eff}$。主要使用量子核视觉模型——由核分类器读出的量子特征映射——我们给出了一个谱解释,其中纠缠结构和量子噪声是调节$d_{\rm eff}$的两个旋钮;在过拟合区域,收缩$d_{\rm eff}$起到类似岭正则化的作用。我们分析了机制:退极化核$K_p=(1-p)^2K+\tfrac{p(2-p)}{D}\mathbf{1}\mathbf{1}^\top$的\emph{精确}分解,其中$d_{\rm eff}(K_p)\to1$,振幅阻尼的收缩结果(及其边界),核机器容量界,以及容量/对齐风险分解;在我们的纠缠实验中运作的单调收缩是经验验证的,并非普遍证明。沿着单参数退极化族,坍缩反而是通过构造精确的;我们仅用它来确认核分解到机器精度,最多达12个量子比特,而不是作为$d_{\rm eff}$的证据。振幅阻尼收缩$d_{\rm eff}$并沿倒U型最佳点将测试精度提升高达+13%;效应符号在过拟合和欠拟合区域之间翻转;噪声注入匹配显式谱过滤前沿。我们的结果将两个报告的现象组织成一个单一可测量原则,用于设计量子视觉模型。

英文摘要

Recent quantum vision models (quantum vision transformers and quantum convolutional networks) report two striking but unexplained empirical phenomena: (i) ansätze with more, or more uniformly distributed, entanglement generalize better, and (ii) injecting quantum noise can improve test accuracy rather than degrade it. These observations are currently treated as curiosities, discovered by grid search and explained, if at all, by hand. We show that both are manifestations of a single, measurable quantity: the \emph{effective dimension} $d_{\rm eff}$ of the (noise-shaped) quantum feature kernel. Working primarily with quantum-kernel vision models (a quantum feature map read out by a kernel classifier), we give a spectral account in which entanglement structure and quantum noise are two knobs that move $d_{\rm eff}$; in an overfitting regime, contracting $d_{\rm eff}$ acts as ridge-like regularization. We analyze the mechanism: an \emph{exact} decomposition of the depolarized kernel $K_p=(1-p)^2K+\tfrac{p(2-p)}{D}\mathbf{1}\mathbf{1}^\top$ with $d_{\rm eff}(K_p)\to1$, a contraction result (and its boundary) for amplitude damping, a kernel-machine capacity bound, and a capacity/alignment risk decomposition; the monotone contraction operative in our entangled experiments is verified empirically, not proven in general. Along the one-parameter depolarizing family the collapse is instead exact by construction; we use it only to confirm the kernel decomposition to machine precision and at up to $12$ qubits, not as evidence for $d_{\rm eff}$. Amplitude damping contracts $d_{\rm eff}$ and lifts test accuracy by up to $+13\%$ along an inverted-U sweet spot; the effect's sign flips between the over- and under-fitting regimes; noise injection matches an explicit spectral-filtering frontier. Our results organize two reported anecdotes into a single measurable principle for designing quantum-vision models.
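The depolarized-kernel formula $K_p=(1-p)^2K+\tfrac{p(2-p)}{D}\mathbf{1}\mathbf{1}^\top$ is easy to evaluate numerically. The sketch below applies it to a toy Gram matrix and tracks an effective dimension as $p$ grows. The abstract does not define $d_{\rm eff}$, so this sketch assumes the participation ratio $(\operatorname{tr}K)^2/\lVert K\rVert_F^2$, which equals 1 for a rank-one kernel and is therefore consistent with the stated limit; the function names and the toy kernel are hypothetical.

```python
def depolarized_kernel(K, p, D):
    """K_p = (1 - p)^2 * K + p * (2 - p) / D * (all-ones matrix)."""
    scale = (1.0 - p) ** 2
    shift = p * (2.0 - p) / D
    return [[scale * entry + shift for entry in row] for row in K]


def effective_dimension(K):
    """Participation ratio tr(K)^2 / ||K||_F^2 (an assumed definition)."""
    trace = sum(K[i][i] for i in range(len(K)))
    frobenius_sq = sum(entry * entry for row in K for entry in row)
    return trace * trace / frobenius_sq


# Toy kernel: three mutually orthogonal feature states, D = 4 (two qubits).
K = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
sweep = [effective_dimension(depolarized_kernel(K, p, D=4)) for p in (0.0, 0.5, 1.0)]
```

For this toy kernel the effective dimension falls from 3 at $p=0$ to 1 at $p=1$, where only the constant all-ones component remains.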

2606.20179 2026-06-19 cs.CL 新提交

ReNikud: Audio-Supervised Hebrew Grapheme-to-Phoneme Conversion

ReNikud:音频监督的希伯来语字素到音素转换

Maxim Melichov, Yakov Kolani, Morris Alper

AI总结 提出ReNikud方法,利用音频监督和伪元音化架构,通过无标注音频的ASR伪标签和字符级对齐,解决希伯来语G2P转换中的元音缺失和发音歧义问题,在多个基准上达到最优。

详情
AI中文摘要

现代希伯来语的字素到音素(G2P)转换对于文本到语音(TTS)等应用是必需的,但由于该语言的辅音音素文字系统(abjad)使元音大多不写出来,造成大量歧义,因此具有挑战性。标准方法首先预测元音变音符号(nikud)以生成国际音标(IPA)转录,但这存在局限性:元音化数据稀缺且制作费力,它不指定词汇重音等特征,并且反映的是正式语法规则而非日常口语发音。同时,直接的序列到序列IPA预测在有限数据上表现不佳,且未能利用辅音音素文字特有的字符级对齐。我们的方法ReNikud通过两个关键洞察克服了这些限制:(1)通过基于音素的自动语音识别(ASR)伪标签流水线,在数千小时无标注希伯来语音频上进行弱音频监督,生成反映自然口语规范的音位转录,无需人工标注。(2)一种伪元音化架构,在每个字符位置预测IPA音素,强制字符级对齐作为归纳偏置。在现有希伯来语G2P基准和针对口语希伯来语的新MILIM基准上的结果表明,ReNikud超越了先前的最先进方法。我们将发布代码和训练模型,以支持希伯来语TTS和语音技术的进一步研究。

英文摘要

Grapheme-to-phoneme (G2P) conversion for Modern Hebrew is needed for applications like text-to-speech (TTS), but is challenging due to the language's abjad writing system, which leaves vowels largely unwritten, creating substantial ambiguity. Standard approaches first predict vowel diacritics (nikud) to produce International Phonetic Alphabet (IPA) transcriptions, but this is limited: vocalization data is scarce and laborious to produce, it does not specify features such as lexical stress, and it reflects formal grammatical rules rather than everyday spoken pronunciation. Direct sequence-to-sequence IPA prediction, meanwhile, struggles on limited data and fails to exploit the character-level alignment characteristic of abjads. Our method, ReNikud, overcomes these limitations with two key insights: (1) Weak audio supervision via a phoneme-based automatic speech recognition (ASR) pseudo-labeling pipeline on thousands of hours of unlabeled Hebrew audio, yielding phonemic transcriptions that reflect natural spoken norms without manual annotation. (2) A pseudo-vocalization architecture that predicts IPA phonemes at each character position, enforcing character-level alignment as an inductive bias. Results on existing Hebrew G2P benchmarks and the new targeted MILIM benchmark for spoken Hebrew show that ReNikud surpasses previous state-of-the-art methods. We will release our code and trained models to support further work on Hebrew TTS and speech technologies.

2606.20034 2026-06-19 cs.LG 新提交

Exploring the potential of AlphaEarth and TESSERA embeddings for Fine-scale Local Climate Zone Mapping: A case study across five cities in Switzerland

探索AlphaEarth和TESSERA嵌入在精细尺度局地气候区制图中的应用潜力:以瑞士五个城市为例

Htet Yamin Ko Ko, Clement Atzberger

AI总结 本研究对比TESSERA和AlphaEarth嵌入与传统Sentinel-1/2数据,使用注意力U-Net将粗分辨率LCZ图提升至10米,发现嵌入模型在跨城市迁移和精度上表现更优,但跨年迁移仍是挑战。

Details
AI Chinese abstract

理解城市空间形态对于气候建模、风险评估和可持续城市设计至关重要,而局地气候区(LCZ)制图为此提供了基本框架。然而,许多城市仍使用约100米分辨率的粗LCZ记录,这并不适用于精细尺度的城市研究。在本研究中,我们将TESSERA(Feng等人,2025)和AlphaEarth(Brown等人,2025)的预计算嵌入与传统的Sentinel-1/2(S1S2)合成数据在瑞士五个城市进行比较,以评估它们是否能够使用基于注意力的U-Net将粗LCZ图提升至10米分辨率。三个实验评估了多城市迁移性、更高分辨率参考数据的影响以及对年际物候变化的时间鲁棒性。我们发现,所有数据集在前两个实验中均取得了强劲性能,测试数据的交并比(IoU)分别在0.59-0.69和0.77-0.82之间。TESSERA在两种设置下均一致优于S1S2和AlphaEarth。正如预期,我们发现基于嵌入的模型从一年迁移到另一年仍然是一个开放的挑战。然而,总体而言,我们的结果表明,来自地球观测基础模型的嵌入在减少耗时预处理和手动特征工程任务方面具有巨大潜力,并能够指导通用的基于深度学习的LCZ制图工作流程。当与简单的位置感知注意力U-Net架构结合时,这些嵌入增强了区域迁移性和可扩展性,支持为全球城市气候应用开发全面且可重复的精细尺度LCZ图。提高参考数据质量仍然是进一步提升精度的最强杠杆。

English abstract

Understanding urban spatial morphology is critical for climate modeling, risk assessment, and sustainable urban design, and Local Climate Zone (LCZ) mapping provides the basic framework for this. However, many cities still use coarse ~100-m resolution LCZ records, which are unsuitable for fine-scale urban research. In this study, precomputed embeddings from TESSERA (Feng et al., 2025) and AlphaEarth (Brown et al., 2025) are compared to traditional Sentinel-1/2 (S1S2) composites in five Swiss cities to see if they can upscale coarse LCZ maps to 10-m resolution using an attention-based U-Net. Three experiments assess multi-city transferability, the impact of higher-resolution reference data, and temporal robustness to year-to-year phenology changes. We find that all datasets achieve strong performance, with test-data Intersection-over-Union (IoU) ranging from 0.59 to 0.69 and from 0.77 to 0.82 in the first two experiments, respectively. TESSERA consistently outperforms both S1S2 and AlphaEarth across both settings. As expected, we find that the transfer of embedding-based models from one year to another remains an open challenge. Overall, however, our results demonstrate the promising potential of embeddings derived from EO foundation models to reduce time-consuming preprocessing and manual feature engineering tasks and to guide a universal deep learning-based LCZ mapping workflow. When combined with a simple location-aware attention U-Net architecture, the embeddings enhance regional transferability and scalability, supporting the development of comprehensive and reproducible fine-scale LCZ maps for global urban climate applications. Improving reference data quality remains the strongest lever for further accuracy gains.
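
For readers unfamiliar with the IoU metric quoted above, the following is a minimal sketch of per-class Intersection-over-Union on flattened label maps. The abstract does not say how IoU is averaged across LCZ classes, so the unweighted mean used here is an assumption, and the function name `per_class_iou` and the toy labels are hypothetical.

```python
from typing import Dict, Sequence


def per_class_iou(pred: Sequence[int], ref: Sequence[int]) -> Dict[int, float]:
    """Return IoU per class: |pred AND ref| / |pred OR ref| over pixels."""
    if len(pred) != len(ref):
        raise ValueError("pred and ref must have the same number of pixels")
    ious: Dict[int, float] = {}
    for c in sorted(set(pred) | set(ref)):
        inter = sum(1 for p, r in zip(pred, ref) if p == c and r == c)
        union = sum(1 for p, r in zip(pred, ref) if p == c or r == c)
        # union is never zero: class c occurs in pred or ref by construction
        ious[c] = inter / union
    return ious


# Toy flattened label maps (hypothetical class ids, not data from the paper).
pred = [1, 1, 2, 2, 2, 3]
ref = [1, 2, 2, 2, 3, 3]

ious = per_class_iou(pred, ref)
# Assumed aggregation: unweighted mean over the classes that occur.
mean_iou = sum(ious.values()) / len(ious)
print(ious, mean_iou)
```

An IoU of 1.0 means the predicted and reference regions for a class coincide exactly, and 0.0 means they do not overlap at all.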

2606.19987 2026-06-19 cs.SD eess.AS New submission

PolSeT: Polish Semantics of Timbre Dataset

PolSeT: 波兰语音色语义数据集

Jan Jasiński

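AI summary Introduces the PolSeT dataset, which collects Polish semantic descriptors and timbre ratings through free-verbalization and semantic differential experiments, filling a gap in open timbre research data and supporting cross-cultural psychoacoustics and MIR research.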

Comments 8 pages, 7 figures. Data descriptor for the PolSeT dataset (Polish Semantics of Timbre), available at https://doi.org/10.5281/zenodo.17830609 under CC BY 4.0

Details
AI Chinese abstract

本数据报告介绍了PolSeT(波兰语语义音色)数据集,该数据集旨在促进波兰语及跨文化背景下的心理声学和音乐信息检索(MIR)研究。数据集包含两个连续实验的数据。实验1(N=60)是一项自由言语化任务,旨在创建波兰语语义描述符词汇表。使用11个刺激,共收集了1901个描述符(701个唯一)。实验2(N=105)利用该词汇表进行语义差异研究,参与者对18种乐器声音在8个双极量表上进行评分,并进行了重复试验以进行信度分析。发布的数据集包括原始听众响应、全面的人口统计数据(经验、性别、年龄)、音频刺激以及提取的声学特征及Python提取代码。该数据集填补了开放音色研究数据的空白,为心理声学研究和多语言语义嵌入模型的训练提供了必要的定性语言基础和定量评分。

English abstract

This data report introduces PolSeT (Polish Semantics of Timbre), a dataset designed to facilitate research in psychoacoustics and Music Information Retrieval (MIR) in Polish and cross-cultural contexts. The dataset contains data from two sequential experiments. Experiment 1 (N=60) was a free-verbalization task aimed at creating a lexicon of Polish semantic descriptors. Using 11 stimuli, a total of 1901 descriptors (701 unique) were gathered. Experiment 2 (N=105) utilized this lexicon to conduct a semantic differential study, where participants rated 18 instrument sounds on 8 bipolar scales, with repeated trials for reliability analysis. The released dataset includes raw listener responses, comprehensive demographics (experience, gender, age), audio stimuli, and extracted acoustic features with Python extraction code. This dataset addresses a gap in open timbre research data, providing both the qualitative linguistic groundwork and the quantitative ratings necessary for psychoacoustic research and the training of multilingual semantic embedding models.
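
The abstract mentions repeated trials for reliability analysis without naming a statistic. As one common option, a test-retest check can correlate a listener's first and repeated ratings of the same stimuli. The sketch below assumes Pearson correlation; the function name `pearson_r` and the ratings are hypothetical and are not taken from the PolSeT data.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equally long rating sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / sqrt(var_x * var_y)


# Hypothetical ratings by one listener on one bipolar scale:
# first presentation versus repeated presentation of five sounds.
first_ratings = [2, 5, 4, 6, 3]
repeat_ratings = [3, 5, 4, 7, 3]

r = pearson_r(first_ratings, repeat_ratings)
print(round(r, 3))
```

Values near 1 indicate that the listener rated the repeated sounds consistently; other reliability measures, such as intraclass correlation, could be substituted.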