arXivDaily: arXiv daily academic digest, updated Monday to Friday
2507.11366 2026-06-17 cs.GT cs.LG (version update)

Characterizing Nash Equilibria in Zero-Sum Games: A Physics-Inspired, Parallelizable Approach with a Linear Number of Gradient Queries

Taemin Kim, James P. Bailey

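Affiliations: Industrial and Systems Engineering, Rensselaer Polytechnic Institute

AI summary: Proposes an online optimization method inspired by Hamiltonian dynamics that characterizes the set of Nash equilibria of zero-sum games in a linear number of alternating gradient descent iterations, supports parallelization and arbitrary learning rates, and experimentally outperforms traditional methods by a wide margin.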

English abstract

We study online optimization methods for zero-sum games, a fundamental problem in adversarial learning in machine learning, economics, and many other domains. Traditional methods approximate Nash equilibria (NE) using either regret-based methods (time-average convergence) or contraction-map-based methods (last-iterate convergence). We propose a new method based on Hamiltonian dynamics in physics and prove that it can characterize the set of NE in a finite (linear) number of iterations of alternating gradient descent in the unbounded setting, modulo degeneracy, a first in online optimization. Unlike standard methods for computing NE, our proposed approach can be parallelized and works with arbitrary learning rates, both firsts in algorithmic game theory. Experimentally, we support our results by showing our approach drastically outperforms standard methods.
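The Hamiltonian viewpoint can be made concrete with a small numerical check. The sketch below is not the paper's algorithm; it only illustrates, for an unconstrained bilinear game f(x, y) = x^T A y with a hypothetical payoff matrix, that alternating gradient descent-ascent with step size eta preserves the quantity ||x||^2 + ||y||^2 - eta * x^T A y. The player roles (x minimizes, y maximizes), the matrix, the starting point, and the step size are all assumptions made for illustration.

```python
def matvec(A, v):
    """Return A @ v for a list-of-lists matrix A."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def alternating_step(A, x, y, eta):
    """One round of alternating play on f(x, y) = x^T A y.

    x descends first; y then ascends against the *updated* x.
    """
    x_new = [xi - eta * g for xi, g in zip(x, matvec(A, y))]
    y_new = [yi + eta * g for yi, g in zip(y, matvec(transpose(A), x_new))]
    return x_new, y_new

def energy(A, x, y, eta):
    """Quantity preserved by alternating_step, up to floating-point rounding."""
    return dot(x, x) + dot(y, y) - eta * dot(x, matvec(A, y))

A = [[1.0, -2.0], [0.5, 1.5]]            # hypothetical payoff matrix
x, y, eta = [1.0, -1.0], [0.5, 2.0], 0.1  # hypothetical start and step size
e0 = energy(A, x, y, eta)
for _ in range(1000):
    x, y = alternating_step(A, x, y, eta)
drift = abs(energy(A, x, y, eta) - e0)    # stays at rounding-error level
```

Because the iterates stay on a level set of this energy instead of contracting, they cycle around the equilibrium rather than converge to it, which is the kind of structure a physics-inspired method can exploit.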

2411.06842 2026-06-17 eess.IV cs.CV (version update)

Evaluating Synthetic Data Generation for Domain Generalization in Fetal Brain MRI Segmentation

Vladyslav Zalevskyi, Thomas Sanchez, Margaux Roulet, Busra Bulut, Hélène Lajous, Jordina Aviles Verdera, Sara Neves Silva, Georg Langs, Gregor Kasprian, Roxane Licandro, Jana Hutter, Hamza Kebiri, Meritxell Bach Cuadra

Affiliations: Department of Radiology, Lausanne University Hospital and University of Lausanne (UNIL); CIBM Center for Biomedical Imaging; Institute for Information Processing, Leibniz University Hannover; Department of Biomedical Engineering, School of Biomedical Engineering & Imaging Sciences, King’s College London; Department of Biomedical Imaging and Image-Guided Therapy, Division of Neuroradiology and Musculoskeletal Radiology, Medical University of Vienna; Department of Biomedical Imaging and Image-guided Therapy, Computational Imaging Research Lab (CIR), Medical University of Vienna; Christian Doppler Laboratory for Mathematical Modelling and Simulation of Next-Generation Medical Ultrasound Devices, Medical University of Vienna; Comprehensive Center for Artificial Intelligence in Medicine, Medical University of Vienna; Division of Neuroradiology and Musculoskeletal Radiology, Department of Biomedical Imaging and Image-guided Therapy, Medical University of Vienna

AI summary: To address data heterogeneity and limited annotations in fetal brain MRI segmentation, studies synthetic data generation strategies based on domain randomization and presents the FetalSynthSeg framework, which improves cross-domain robustness through Gaussian mixture intensity modeling and intensity clustering and reaches state-of-the-art performance on several datasets.

English abstract

Fetal brain tissue segmentation from magnetic resonance imaging (MRI) is crucial for studying neurodevelopment, but remains challenging due to data heterogeneity and limited annotations. Domain randomization (DR) has recently emerged as a promising strategy for single-source domain generalization by synthesizing training images with randomized artifacts, contrast, and resolution. In this work, we investigate how to maximize the out-of-domain (OOD) generalization of DR-based methods. We evaluate several synthetic data generation strategies for DR, with a particular focus on our recently proposed framework, FetalSynthSeg. We show that simple Gaussian mixture-based intensity modeling outperforms more complex physics-based simulations, and that intensity clustering (subdividing tissue classes based on intensity) improves OOD robustness. Evaluated on 348 fetal subjects from four sites spanning 0.55-3T and both T1w and T2w contrasts, FetalSynthSeg reaches state-of-the-art performance on several FeTA 2024 testing datasets (80-85 Dice score) and, for the first time, offers robust segmentation on modalities other than T2w for fetal brain segmentation (80 Dice on dHCP-T1w dataset). Compared with state-of-the-art methods such as BOUNTI, nnU-Net ensemble, and the FeTA 2024 winner, FetalSynthSeg delivers comparable or superior accuracy while maintaining strong robustness across domain shifts. Our code, model weights, and Docker image ready for easy inference are available at https://hub.docker.com/r/vzalevskyi/fetalsynthseg.
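Gaussian mixture-based intensity modeling can be pictured as follows: every tissue label gets its own randomly drawn Gaussian, and each pixel's intensity is sampled from the Gaussian of its label. The sketch below is a minimal illustration under assumed parameter ranges, not the FetalSynthSeg implementation; the function name, the ranges, and the toy label map are hypothetical.

```python
import random

def synthesize_intensities(label_map, rng,
                           mean_range=(0.0, 255.0), std_range=(1.0, 20.0)):
    """Draw one random Gaussian per label, then sample one intensity per pixel.

    The ranges are illustrative assumptions, not values from the paper.
    """
    labels = sorted({lab for row in label_map for lab in row})
    params = {lab: (rng.uniform(*mean_range), rng.uniform(*std_range))
              for lab in labels}
    image = [[rng.gauss(*params[lab]) for lab in row] for row in label_map]
    return image, params

# Toy 3x3 segmentation with three tissue labels (hypothetical).
label_map = [[0, 0, 1],
             [0, 2, 1],
             [2, 2, 1]]
image, params = synthesize_intensities(label_map, random.Random(0))
```

Under this picture, intensity clustering would correspond to splitting a label into several sub-labels before sampling, so that one tissue class can show more than one intensity mode.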

2501.00826 2026-06-17 q-fin.TR cs.AI (version update)

LLM-Powered Multi-Agent System for Automated Crypto Portfolio Management

Yichen Luo, Yebo Feng, Jiahua Xu, Paolo Tasca, Yang Liu

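Affiliations: University College London; Nanyang Technological University; Exponential Science

AI summary: Proposes a three-agent system (market, news, trading) that fuses multi-modal signals through hierarchical, collaborative, and debate architectures, achieving a 133.52% cumulative return and a Sharpe ratio of 1.502 in a 2025 backtest and outperforming single-agent and deep learning baselines.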

English abstract

Cryptocurrency portfolio management requires the fusion of heterogeneous multi-modal signals, including structured price and on-chain time series, unstructured news text, and technical indicators, under high-volatility and real-time constraints. While deep learning approaches show predictive capability, their opacity limits practical adoption, and single large language model (LLM) agents struggle to process the breadth of modality-specific inputs needed for robust decision-making. We propose a multi-agent system (MAS) framework in which three modality-specialised agents, a Crypto Agent for market dynamics, a News Agent for weekly news sentiment, and a Trading Agent for signal fusion and portfolio execution, decompose the task across three communication architectures: hierarchical, collaborative, and debate. We evaluate four capability configurations: zero-shot, chain-of-thought (CoT), retrieval-augmented generation (RAG), and skill-augmented. In a 52-week backtest over calendar year 2025 across the top 15 L1 blockchain native cryptocurrencies by market capitalisation as of January 2025, the best configuration, Hierarchical (Skill), achieves a cumulative return of 133.52% and a Sharpe ratio of 1.502, outperforming single-agent variants, passive benchmarks, and deep learning baselines. An ablation study identifies the Crypto Agent as the most critical component, with its removal reducing cumulative return by 42.57 percentage points. A cross-model comparison further shows that MAS outperforms the single-agent baseline under GPT-4o, GPT-5, and Claude Sonnet 4.5, suggesting that the benefit of multi-agent coordination is model-agnostic. Unlike black-box deep learning models, every portfolio decision is traceable to explicit agent reasoning, offering an interpretable and effective approach to multi-modal cryptocurrency portfolio management.
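The two headline metrics can be computed from a series of periodic returns as sketched below. The abstract does not state how the Sharpe ratio was annualized or which risk-free rate was used, so the weekly convention (52 periods per year) and the zero risk-free rate here are assumptions, and the sample returns in the usage lines are made up.

```python
import math

def cumulative_return(returns):
    """Compound a list of per-period simple returns into a total return."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0

def sharpe_ratio(returns, periods_per_year=52, risk_free_per_period=0.0):
    """Annualized Sharpe ratio using the sample standard deviation.

    periods_per_year=52 assumes weekly returns (an assumption, not a
    detail taken from the abstract).
    """
    excess = [r - risk_free_per_period for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((e - mean) ** 2 for e in excess) / (len(excess) - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)

weekly = [0.10, -0.10]                 # hypothetical weekly returns
total = cumulative_return(weekly)      # a +10% week then a -10% week nets a loss
```

The usage lines show why cumulative return must be compounded rather than summed: gains and losses of equal size do not cancel.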

2407.13053 2026-06-17 cs.CY cs.AI cs.CL cs.LG (version update)

E2Vec: Feature Embedding with Temporal Information for Analyzing Student Actions in E-Book Systems

Yuma Miyazaki, Valdemar Švábenský, Yuta Taniguchi, Fumiya Okubo, Tsubasa Minematsu, Atsushi Shimada

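Affiliations: Kyushu University

AI summary: Proposes E2Vec, which uses word embeddings to turn operation logs and time intervals into student vectors for an at-risk detection task, showing potential for generalizability and performance.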

Comments Research paper published in the Proceedings of the 17th Educational Data Mining Conference (EDM 2024), see https://doi.org/10.5281/zenodo.12729853

English abstract

Digital textbook (e-book) systems record student interactions with textbooks as a sequence of events called EventStream data. In the past, researchers extracted meaningful features from EventStream, and utilized them as inputs for downstream tasks such as grade prediction and modeling of student behavior. Previous research evaluated models that mainly used statistical-based features derived from EventStream logs, such as the number of operation types or access frequencies. While these features are useful for providing certain insights, they lack temporal information that captures fine-grained differences in learning behaviors among different students. This study proposes E2Vec, a novel feature representation method based on word embeddings. The proposed method regards operation logs and their time intervals for each student as a string sequence of characters and generates a student vector of learning activity features that incorporates time information. We applied fastText to generate an embedding vector for each of 305 students in a dataset from two years of computer science courses. Then, we investigated the effectiveness of E2Vec in an at-risk detection task, demonstrating potential for generalizability and performance.
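The idea of treating operation logs and their time intervals as a string of characters can be sketched as follows. The operation names, the character codes, and the interval thresholds are hypothetical choices for illustration; the abstract does not specify the actual E2Vec encoding.

```python
# Hypothetical one-character codes for e-book operations.
OP_CODES = {"OPEN": "O", "NEXT": "N", "PREV": "P", "ADD_MARKER": "M", "CLOSE": "C"}

def interval_symbol(seconds):
    """Bucket a time gap into a coarse symbol (thresholds are assumptions)."""
    if seconds < 10:
        return "s"   # short gap
    if seconds < 300:
        return "m"   # medium gap
    return "l"       # long gap

def encode_log(events):
    """events: list of (timestamp_seconds, operation_name), sorted by time."""
    chars = []
    prev_time = None
    for t, op in events:
        if prev_time is not None:
            chars.append(interval_symbol(t - prev_time))
        chars.append(OP_CODES[op])
        prev_time = t
    return "".join(chars)

log = [(0, "OPEN"), (4, "NEXT"), (64, "NEXT"), (700, "ADD_MARKER"), (705, "CLOSE")]
encoded = encode_log(log)   # "OsNmNlMsC"
```

A string like this could then be split into word-like chunks and passed to an embedding model such as fastText, which is the step the abstract describes.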

2208.03023 2026-06-17 eess.AS cs.SD (version update)

AID: Open-source Anechoic Interferer Dataset

Philipp Götz, Cagdas Tuna, Andreas Walther, Emanuël A. P. Habets

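Affiliations: International Audio Laboratories Erlangen; Fraunhofer Institute for Integrated Circuits IIS

AI summary: Presents a dataset of anechoic recordings of various sound sources found in domestic environments, intended for simulating non-stationary environmental noise in complex acoustic scenes, together with a Python library that generates random mixtures as interference signals.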

Comments Accepted for publication at IWAENC 2022

English abstract

A dataset of anechoic recordings of various sound sources encountered in domestic environments is presented. The dataset is intended to be a resource of non-stationary, environmental noise signals that, when convolved with acoustic impulse responses, can be used to simulate complex acoustic scenes. Additionally, a Python library is provided to generate random mixtures of the recordings in the dataset, which can be used as non-stationary interference signals.
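The two operations the abstract mentions, convolving a dry recording with an acoustic impulse response and mixing several recordings at random, can be sketched in plain Python. This is not the API of the library released with the dataset; the function names, the gain range, and the toy signals are assumptions.

```python
import random

def convolve(signal, impulse_response):
    """Direct-form linear convolution: places the dry signal in a room."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out

def random_mixture(recordings, k, rng):
    """Sum k randomly chosen recordings with random gains.

    Shorter recordings are implicitly zero-padded to the longest one.
    The gain range [0.5, 1.0] is an illustrative assumption.
    """
    chosen = rng.sample(recordings, k)
    mix = [0.0] * max(len(r) for r in chosen)
    for rec in chosen:
        gain = rng.uniform(0.5, 1.0)
        for n, v in enumerate(rec):
            mix[n] += gain * v
    return mix

wet = convolve([1.0, 2.0, 3.0], [1.0, 0.5])   # [1.0, 2.5, 4.0, 1.5]
mix = random_mixture([[1.0, 0.0], [0.5, 0.5, 0.5]], 2, random.Random(1))
```

For real audio, an FFT-based convolution would be the practical choice; the direct form is used here only to keep the sketch dependency-free.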

2502.17773 2026-06-17 stat.ME cs.AI cs.LG

How Many Human Survey Respondents is a Large Language Model Worth? An Uncertainty Quantification Perspective

Chengpiao Huang, Yuhang Wu, Kaizheng Wang

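Affiliations: Department of IEOR, Columbia University; Decision, Risk, and Operations Division, Columbia Business School; Department of IEOR and Data Science Institute, Columbia University

AI summary: From an uncertainty quantification perspective, proposes a framework that converts LLM-simulated responses into reliable confidence sets for population parameters of human responses by quantifying the uncertainty caused by human-LLM misalignment. The key design choice is the number of simulated responses: too many yield overly narrow confidence sets with poor coverage, while too few yield overly wide, uninformative ones. A data-driven approach adaptively selects the simulation sample size to achieve nominal average-case coverage, regardless of the LLM's simulation fidelity or the confidence set construction procedure. The selected sample size also reflects the effective human population size the LLM can represent, giving a quantitative measure of its simulation fidelity. Experiments show heterogeneous simulation fidelity across LLMs and domains.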

Comments 63 pages, 13 figures

English abstract

Large language models (LLMs) are increasingly used to simulate survey responses, but synthetic data can be misaligned with the human population, leading to unreliable inference. We develop a general framework that converts LLM-simulated responses into reliable confidence sets for population parameters of human responses, quantifying the uncertainty induced by the human-LLM misalignment. The key design choice is the number of simulated responses: too many produce overly narrow sets with poor coverage, while too few yield overly wide and uninformative sets dominated by stochastic noise. We propose a data-driven approach that adaptively selects the simulation sample size to achieve nominal average-case coverage, regardless of the LLM's simulation fidelity or the confidence set construction procedure. The selected sample size is further shown to reflect the effective human population size that the LLM can represent, providing a quantitative measure of its simulation fidelity. Experiments on real survey datasets reveal heterogeneous simulation fidelity across different LLMs and domains.
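The trade-off in the number of simulated responses can be seen with a standard normal-approximation interval for a mean. The sketch below is not the paper's selection procedure; it only shows that when the simulated mean is biased relative to the human mean, a large simulated sample shrinks the interval around the wrong value. All numbers are hypothetical.

```python
import math

def mean_interval(sample_mean, sample_std, n, z=1.96):
    """Normal-approximation confidence interval for a mean from n samples."""
    half_width = z * sample_std / math.sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

def covers(interval, target):
    lo, hi = interval
    return lo <= target <= hi

human_mean = 0.50                  # hypothetical true population parameter
llm_mean, llm_std = 0.55, 0.5      # hypothetical (biased) simulation statistics

small = mean_interval(llm_mean, llm_std, n=50)     # wide: still covers 0.50
large = mean_interval(llm_mean, llm_std, n=5000)   # narrow: misses 0.50
```

A data-driven choice of the simulation sample size, as the abstract describes, would sit between these two extremes.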

2501.12709 2026-06-17 quant-ph cs.AI cs.CR cs.DC

Experimentally validated quantum-secure federated learning over a multi-user quantum network

Zhi-Ping Liu, Xiao-Yu Cao, Hao-Wen Liu, Xiao-Ran Sun, Yu Bao, Jian-Yu Shen, Yu-Shuo Lu, Hua-Lei Yin, Zeng-Bing Chen

Affiliations: National Laboratory of Solid State Microstructures and School of Physics, Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing 210093, China; School of Physics and Key Laboratory of Quantum State Construction and Manipulation (Ministry of Education), Renmin University of China, Beijing 100872, China

AI summary: Proposes the QuNetQFL protocol, which masks local model updates with distributed quantum keys to achieve information-theoretically secure aggregation. Experiments on a four-client quantum network show improved classification accuracy, and language tasks and large-scale simulations demonstrate scalability.

Comments 25 pages, 7 figures, 7 tables, Accepted by Research

Journal ref Research 9, 1299 (2026)

English abstract

Federated learning enables decentralized, privacy-preserving training but remains vulnerable to privacy leakage in the quantum era. Quantum federated learning (QFL) offers a promising path towards enhanced security and efficiency. However, a practical and experimentally validated QFL protocol utilizing near-term quantum techniques to address data privacy has been lacking. Here we present QuNetQFL, a QFL protocol implemented on quantum networks, in which local model updates are masked with distributed quantum secret keys, offering information-theoretic security during aggregation. We experimentally validate the protocol on a four-client quantum network and benchmark its performance using the generated keys on quantum and real-world datasets. Adding a single quantum client significantly improves global accuracy for classifying multipartite entangled and non-stabilizer quantum datasets. For language tasks, we apply QuNetQFL to sentiment analysis by federated fine-tuning of a hybrid classical-quantum language model, achieving comparable and robust performance in simulation and on real quantum hardware. Large-scale simulations further demonstrate scalability to 200 clients for handwritten-digit recognition, with rapid convergence and a $75\%$ reduction in communication cost via model compression. Our work establishes a practical and scalable route to quantum-secure federated learning for the emerging quantum internet.
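Masking local updates with shared secret keys so that only their sum is revealed can be illustrated with pairwise additive masks that cancel during aggregation. The sketch below is a classical stand-in, not the QuNetQFL protocol: a seeded pseudo-random generator replaces the quantum-distributed keys, updates are assumed to be quantized to integers, and the function names are hypothetical.

```python
import random

MODULUS = 2 ** 32   # arithmetic is done modulo a fixed word size

def pairwise_keys(num_clients, length, rng):
    """keys[(i, j)] is the secret shared by clients i < j.

    In the paper these would come from the quantum network; here a
    PRNG is used purely for illustration.
    """
    return {(i, j): [rng.randrange(MODULUS) for _ in range(length)]
            for i in range(num_clients) for j in range(i + 1, num_clients)}

def mask_update(client, update, keys, num_clients):
    """Add each shared key with sign +1 (lower index) or -1 (higher index)."""
    masked = list(update)
    for other in range(num_clients):
        if other == client:
            continue
        key = keys[(min(client, other), max(client, other))]
        sign = 1 if client < other else -1
        masked = [(m + sign * k) % MODULUS for m, k in zip(masked, key)]
    return masked

def aggregate(masked_updates):
    """Server-side sum; every key appears once with + and once with -."""
    return [sum(col) % MODULUS for col in zip(*masked_updates)]

updates = [[1, 2, 3], [10, 20, 30], [100, 200, 300], [5, 5, 5]]
keys = pairwise_keys(4, 3, random.Random(7))
masked = [mask_update(c, u, keys, 4) for c, u in enumerate(updates)]
total = aggregate(masked)   # equals the element-wise sum of the raw updates
```

The server sees only masked vectors, yet recovers the exact sum because the pairwise keys cancel; the security of such a scheme depends on the keys being truly random and used once.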

2604.13662 2026-06-17 cond-mat.mes-hall cs.CV cs.LG

Automatic Charge State Tuning of 300 mm FDSOI Quantum Dots Using Neural Network Segmentation of Charge Stability Diagram

Peter Samaha, Amine Torki, Ysaline Renaud, Sam Fiette, Emmanuel Chanrion, Pierre-Andre Mortemousque, Yann Beilliard

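Affiliations: CEA-Leti, Univ. Grenoble Alpes

AI summary: Proposes a deep-learning semantic segmentation pipeline that automates quantum dot charge tuning by identifying transition lines in charge stability diagrams, a step toward high-throughput charge tuning of silicon quantum dot qubits.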

Comments 10 pages, 6 figures, supplementary materials available

English abstract

Tuning of gate-defined semiconductor quantum dots (QDs) is a major bottleneck for scaling spin qubit technologies. We present a deep learning (DL) driven, semantic-segmentation pipeline that performs charge auto-tuning by locating transition lines in full charge stability diagrams (CSDs) and returns gate voltage targets for the single charge regime. We assemble and manually annotate a large, heterogeneous dataset of 1015 experimental CSDs measured from silicon QD devices, spanning nine design geometries, multiple wafers, and fabrication runs. A U-Net style convolutional neural network (CNN) with a MobileNetV2 encoder is trained and validated through five-fold group cross validation. Our model achieves an overall offline tuning success of 80.0% in locating the single-charge regime, with peak performance exceeding 88% for some designs. We analyze dominant failure modes and propose targeted mitigations. Finally, wide-range diagram segmentation also naturally enables scalable physics-based feature extraction that can feed back to fabrication and design workflows, and we outline a roadmap for real-time integration in a cryogenic wafer prober. Overall, our results show that neural network (NN) based wide-diagram segmentation is a practical step toward automated, high-throughput charge tuning for silicon QD qubits.
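Group cross validation keeps all samples from the same group on one side of each split, which avoids leaking near-duplicate measurements into validation. The sketch below is a minimal splitter; the abstract does not say what defines a group, so the wafer-like group labels are hypothetical.

```python
def group_k_fold(groups, k):
    """Yield (train_idx, val_idx) so that no group appears on both sides.

    groups[i] is the group label of sample i. Groups are dealt to the k
    folds round-robin after sorting, a simple choice made for this sketch.
    """
    unique = sorted(set(groups))
    folds = [unique[i::k] for i in range(k)]
    for held_out in folds:
        held = set(held_out)
        val = [i for i, g in enumerate(groups) if g in held]
        train = [i for i, g in enumerate(groups) if g not in held]
        yield train, val

# Ten diagrams from seven hypothetical groups, split five ways.
groups = ["w1", "w1", "w2", "w3", "w3", "w4", "w5", "w5", "w6", "w7"]
splits = list(group_k_fold(groups, 5))
```

Each sample lands in exactly one validation fold, and a model is never validated on a group it was trained on.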

2506.07917 2026-06-17 cs.GR cs.CV

SpeeDe3DGS: Speedy Deformable 3D Gaussian Splatting with Temporal Pruning and Motion Grouping

Allen Tu, Haiyang Ying, Alex Hanson, Yonghan Lee, Tom Goldstein, Matthias Zwicker

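Affiliations: University of Maryland, College Park

AI summary: Proposes SpeeDe3DGS, which uses Temporal Sensitivity Pruning, Temporal Sensitivity Sampling, and the GroupFlow module to substantially speed up rendering and training of deformable 3DGS while maintaining high-quality reconstruction.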

Comments Project Page: https://speede3dgs.github.io/

Journal ref Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 26083-26093

English abstract

Dynamic extensions of 3D Gaussian Splatting (3DGS) achieve high-quality reconstructions through neural motion fields, but per-Gaussian neural inference makes these models computationally expensive. Building on DeformableGS, we introduce Speedy Deformable 3D Gaussian Splatting (SpeeDe3DGS), which bridges this efficiency-fidelity gap through three complementary modules: Temporal Sensitivity Pruning (TSP) removes low-impact Gaussians via temporally aggregated sensitivity analysis, Temporal Sensitivity Sampling (TSS) perturbs timestamps to suppress floaters and improve temporal coherence, and GroupFlow distills the learned deformation field into shared SE(3) transformations for efficient groupwise motion. On the 50 dynamic scenes in MonoDyGauBench, integrating TSP and TSS into DeformableGS accelerates rendering by 6.78$\times$ on average while maintaining neural-field fidelity and using 10$\times$ fewer primitives. Adding GroupFlow culminates in 13.71$\times$ faster rendering and 2.53$\times$ shorter training, surpassing all baselines in speed while preserving superior image quality.
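A shared SE(3) transformation means that one rotation and one translation move every member of a group, instead of a neural network being queried per Gaussian. The sketch below applies a single rigid transform to a small group of centers; how GroupFlow forms groups and parametrizes the transform over time is not described in the abstract, so the grouping and the values here are hypothetical.

```python
import math

def rotation_z(theta):
    """3x3 rotation matrix about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]

def apply_se3(R, t, points):
    """Apply p -> R p + t to every point in the group."""
    return [tuple(sum(R[r][c] * p[c] for c in range(3)) + t[r] for r in range(3))
            for p in points]

def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

group = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]   # hypothetical Gaussian centers
moved = apply_se3(rotation_z(math.pi / 2), (0.0, 0.0, 1.0), group)
```

Because the transform is rigid, distances between members of the group are unchanged, and the cost per group is one small matrix product per point rather than one network evaluation.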

2603.19801 2026-06-17 eess.IV cs.AI cs.CV

Offshore oil and gas platform dynamics in the North Sea, Gulf of Mexico, and Persian Gulf: Exploiting the Sentinel-1 archive

Robin Spanier, Thorsten Hoeser, John Truckenbrodt, Felix Bachofer, Claudia Kuenzer

Affiliations: German Remote Sensing Data Center, Earth Observation Center, EOC of the German Aerospace Center, DLR; Institute for Geography and Geology, Department of Remote Sensing, University of Würzburg

AI summary: Uses Sentinel-1 data and deep learning to study offshore platform dynamics in the North Sea, Gulf of Mexico, and Persian Gulf, revealing changes in platform numbers and structural shifts and providing data for monitoring marine infrastructure.

Comments 16 pages, 10 figures, 1 table

Journal ref Big Earth Data, 2026, 1-27

English abstract

The increasing use of marine spaces by offshore infrastructure, including oil and gas platforms, underscores the need for consistent, scalable monitoring. Offshore development has economic, environmental, and regulatory implications, yet maritime areas remain difficult to monitor systematically due to their inaccessibility and spatial extent. This study presents an automated approach to the spatiotemporal detection of offshore oil and gas platforms based on freely available Earth observation data. Leveraging Sentinel-1 archive data and deep learning-based object detection, a consistent quarterly time series of platform locations for three major production regions: the North Sea, the Gulf of Mexico, and the Persian Gulf, was created for the period 2017-2025. In addition, platform size, water depth, distance to the coast, national affiliation, and installation and decommissioning dates were derived. 3,728 offshore platforms were identified in 2025, 356 in the North Sea, 1,641 in the Gulf of Mexico, and 1,731 in the Persian Gulf. While expansion was observed in the Persian Gulf until 2024, the Gulf of Mexico and the North Sea saw a decline in platform numbers from 2018-2020. At the same time, a pronounced dynamic was apparent. More than 2,700 platforms were installed or relocated to new sites, while a comparable number were decommissioned or relocated. Furthermore, the increasing number of platforms with short lifespans points to a structural change in the offshore sector associated with the growing importance of mobile offshore units such as jack-ups or drillships. The results highlighted the potential of freely available Earth observation data and deep learning for consistent, long-term monitoring of marine infrastructure. The derived dataset is public and provides a basis for offshore monitoring, maritime planning, and analyses of the transformation of the offshore energy sector.

2603.14692 2026-06-17 cs.LO cs.AI

Applications of Intuitionistic Temporal Logic to Temporal Answer Set Programming

Pedro Cabalar, Martín Diéguez, David Fernández-Duque, François Laferrière, Torsten Schaub, Igor Stéphan

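Affiliations: University of Corunna, Spain; University of Angers, France; University of Barcelona, Spain; University of Potsdam, Germany

AI summary: Investigates the logical foundations of Temporal Answer Set Programming through Temporal Equilibrium Logic, connecting temporal intuitionistic logic with temporal logic programming and opening new research directions.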

Comments Under consideration in Theory and Practice of Logic Programming (TPLP)

English abstract

The relationship between intuitionistic or intermediate logics and logic programming has been extensively studied, prominently featuring Pearce's equilibrium logic and Osorio's safe beliefs. Equilibrium logic admits a fixpoint characterization based on the logic of here-and-there, akin to theory completion in default and autoepistemic logics. Safe beliefs are similarly defined via a fixpoint operator, albeit under the semantics of intuitionistic or other intermediate logics. In this paper, we investigate the logical foundations of Temporal Answer Set Programming through the lens of Temporal Equilibrium Logic, a formalism combining equilibrium logic with linear-time temporal operators. We lift the seminal approaches of Pearce and Osorio to the temporal setting, establishing a formal correspondence between temporal intuitionistic logic and temporal logic programming. Our results deepen the theoretical underpinnings of Temporal Answer Set Programming and provide new avenues for research in temporal reasoning.

2602.00473 2026-06-17 quant-ph cs.AI cs.LG

Quantum Phase Recognition via Quantum Attention Mechanism

Jin-Long Chen, Xin Li, Zhang-Qi Yin

Affiliations: Center for Quantum Technology Research and Key Laboratory of Advanced Optoelectronic Quantum Architecture and Measurements (MOE), School of Physics, Beijing Institute of Technology, Beijing 100081, China

AI summary: Proposes a hybrid quantum-classical attention model that uses swap tests and a parameterized quantum circuit to extract correlations in quantum states and classify ground states, showing high accuracy and robustness on the cluster-Ising model with 9 and 15 qubits.

Comments 10 pages, 7 figures

Journal ref Phys. Rev. A 113, 062403 (2026)

English abstract

Quantum phase transitions in many-body systems are fundamentally characterized by complex correlation structures, which pose computational challenges for conventional methods in large systems. To address this, we propose a hybrid quantum-classical attention model. This model uses an attention mechanism, realized through swap tests and a parameterized quantum circuit, to extract correlations within quantum states and perform ground-state classification. Benchmarked on the cluster-Ising model with system sizes of 9 and 15 qubits, the model achieves high classification accuracy with fewer than 100 training samples and demonstrates robustness against variations in the training set. Further analysis reveals that the model successfully captures phase-sensitive features and characteristic physical length scales, offering a scalable and data-efficient approach for quantum phase recognition in complex many-body systems.
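A swap test estimates the overlap between two quantum states: the ancilla is measured in state 0 with probability (1 + |⟨a|b⟩|²) / 2. The sketch below computes that probability classically from state vectors. It illustrates only the swap-test statistic, which plays a role similar to a similarity score in attention; it is not the paper's model, and the example states are arbitrary.

```python
def inner(a, b):
    """Inner product <a|b> of two state vectors given as lists of complex numbers."""
    return sum(x.conjugate() * y for x, y in zip(a, b))

def swap_test_p0(a, b):
    """Probability of measuring the ancilla in |0> in an ideal swap test."""
    return 0.5 * (1.0 + abs(inner(a, b)) ** 2)

zero = [1 + 0j, 0 + 0j]                 # |0>
one = [0 + 0j, 1 + 0j]                  # |1>
plus = [2 ** -0.5 + 0j, 2 ** -0.5 + 0j]  # (|0> + |1>) / sqrt(2)

p_same = swap_test_p0(zero, zero)   # identical states
p_orth = swap_test_p0(zero, one)    # orthogonal states
p_half = swap_test_p0(zero, plus)   # squared overlap of 1/2
```

On hardware the probability is estimated from repeated measurements, so the number of shots sets the precision of each overlap estimate.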

2511.03876 2026-06-17 eess.IV cs.CV cs.LG physics.med-ph

Computed Tomography (CT)-derived Cardiovascular Flow Estimation Using Physics-Informed Neural Networks Improves with Sinogram-based Training: A Simulation Study

Jinyuxuan Guo, Gurnoor Singh Khurana, Alejandro Gonzalo Grande, Juan C. del Alamo, Francisco Contijoch

Affiliations: Dept. of Bioengineering, University of California San Diego; Dept. of Computer Science Engineering, University of California San Diego; Dept. of Mechanical Engineering, Univ of Washington; Depts of Mechanical Engineering and Cardiology, Univ. of Washington; Depts. of Bioengineering, Radiology, University of California San Diego

AI summary: Evaluates the impact of CT imaging on Physics-Informed Neural Network (PINN)-based flow estimation and proposes SinoFlow, an improved framework that estimates blood flow directly from sinogram data; SinoFlow performs better by avoiding errors introduced by filtered backprojection.

English abstract

Background: Non-invasive imaging-based assessment of blood flow plays a critical role in evaluating heart function and structure. Computed Tomography (CT) is a widely-used imaging modality that can robustly evaluate cardiovascular anatomy and function, but direct methods to estimate blood flow velocity from movies of contrast evolution have not been developed. Purpose: This study evaluates the impact of CT imaging on Physics-Informed Neural Networks (PINN)-based flow estimation and proposes an improved framework, SinoFlow, which uses sinogram data directly to estimate blood flow. Methods: We generated pulsatile flow fields in an idealized 2D vessel bifurcation using computational fluid dynamics and simulated CT scans with varying gantry rotation speeds, tube currents, and pulse mode imaging settings. We compared the performance of PINN-based flow estimation using reconstructed images (ImageFlow) to SinoFlow. Results: SinoFlow significantly improved flow estimation performance by avoiding propagating errors introduced by filtered backprojection. SinoFlow was robust across all tested gantry rotation speeds and consistently produced lower mean squared error and velocity errors than ImageFlow. Additionally, SinoFlow was compatible with pulsed-mode imaging and maintained higher accuracy with shorter pulse widths. Conclusions: This study demonstrates the potential of SinoFlow for CT-based flow estimation, providing a more promising approach for non-invasive blood flow assessment. The findings aim to inform future applications of PINNs to CT images and provide a solution for image-based estimation, with reasonable acquisition parameters yielding accurate flow estimates.

2508.10908 2026-06-17 physics.ao-ph cs.LG

Data-driven global ocean model resolving ocean-atmosphere coupling dynamics

Jeong-Hwan Kim, Daehyun Kang, Young-Min Yang, Jae-Heung Park, Yoo-Geun Ham

Affiliations: Center for Climate and Carbon Cycle Research, Korea Institute of Science and Technology, Seoul, Republic of Korea; Department of Environment and Energy, Jeonbuk National University, Jeonju, Republic of Korea; School of Earth and Environmental Sciences, Seoul National University, Seoul, Republic of Korea; Department of Environmental Management, Seoul National University, Seoul, Republic of Korea

AI summary: Presents KIST-Ocean, a model with a U-shaped visual attention adversarial network architecture that uses partial convolution, adversarial training, and transfer learning to improve ocean prediction. It accurately simulates Kelvin and Rossby wave propagation in the tropical Pacific and vertical motions induced by cyclonic and anticyclonic wind stress, showing its ability to represent coupling mechanisms underlying climate phenomena.

Comments The manuscript contains 4 main figures. The Extended Data contains 7 figures and 3 tables. The Supplementary Information contains 3 text sections, 7 figures, 1 table

Journal ref Sci. Adv. 12, eaed1225 (2026)

English abstract

Artificial intelligence has advanced global weather forecasting, outperforming traditional numerical models in both accuracy and computational efficiency. Nevertheless, extending predictions beyond subseasonal timescales requires the development of deep learning (DL)-based ocean-atmosphere coupled models that can realistically simulate complex oceanic responses to atmospheric forcing. This study presents KIST-Ocean, a DL-based global three-dimensional ocean general circulation model using a U-shaped visual attention adversarial network architecture. KIST-Ocean integrates partial convolution, adversarial training, and transfer learning to address coastal complexity and predictive distribution drift in auto-regressive models. Comprehensive evaluations confirmed the model's robust ocean predictive skill and efficiency. Moreover, it accurately captures realistic ocean responses, such as Kelvin and Rossby wave propagation in the tropical Pacific, and vertical motions induced by cyclonic and anticyclonic wind stress, demonstrating its ability to represent key ocean-atmosphere coupling mechanisms underlying climate phenomena, including the El Niño-Southern Oscillation. These findings reinforce confidence in DL-based global weather and climate models and support extending DL-based approaches to broader Earth system modeling, offering potential for enhancing climate prediction capabilities.
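Partial convolution is a mask-aware convolution: invalid cells (for an ocean model, plausibly land cells) are excluded from each window and the result is rescaled by the fraction of valid cells. The one-dimensional sketch below follows the commonly used formulation of partial convolution; how KIST-Ocean applies it is not detailed in the abstract, so the land-mask reading and the toy data are assumptions.

```python
def partial_conv1d(values, mask, kernel):
    """Mask-aware 1D convolution (no padding, stride 1).

    mask[i] is 1 for valid cells and 0 for invalid ones. Each output is
    the kernel response over valid cells only, rescaled by
    (window size / number of valid cells). Windows with no valid cell
    produce 0 and stay masked in the updated mask.
    """
    k = len(kernel)
    out, new_mask = [], []
    for start in range(len(values) - k + 1):
        window_mask = mask[start:start + k]
        valid = sum(window_mask)
        if valid == 0:
            out.append(0.0)
            new_mask.append(0)
            continue
        acc = sum(w * v * m for w, v, m
                  in zip(kernel, values[start:start + k], window_mask))
        out.append(acc * k / valid)
        new_mask.append(1)
    return out, new_mask

mean_kernel = [1 / 3, 1 / 3, 1 / 3]
out, new_mask = partial_conv1d([1.0, 2.0, 0.0, 0.0, 5.0], [1, 1, 0, 0, 1], mean_kernel)
```

With a mean kernel, each output is the average of the valid cells in its window, so masked cells do not drag values toward zero near a boundary.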

2506.08654 2026-06-17 physics.med-ph cs.LG

A Privacy-Preserving Federated Learning Framework for Generalizable CBCT to Synthetic CT Translation in Head and Neck

一种保护隐私的联邦学习框架用于头颈区域CBCT到合成CT的可推广转换

Ciro Benito Raggio, Paolo Zaffino, Maria Francesca Spadea

发表机构 * Institute of Biomedical Engineering(生物医学工程研究所) Karlsruhe Institute of Technology(卡尔斯鲁厄理工学院) Department of Experimental and Clinical Medicine(实验与临床医学系)

AI总结 本文提出一种跨机构联邦学习框架,用于头颈区域CBCT到合成CT的转换,通过保护数据隐私实现跨机构模型的泛化能力。

Journal ref Frontiers in Digital Health, 8:1812254, June 2026

详情
AI中文摘要

锥束计算机断层扫描(CBCT)已成为图像引导放射治疗(IGRT)中广泛采用的成像模态。然而,CBCT存在噪声增加、软组织对比度有限和伪影等问题,导致Hounsfield单位值不可靠,妨碍直接进行剂量计算。由CBCT生成合成CT(sCT),尤其是采用深度学习(DL)方法,可以解决这些问题。现有方法受到机构异质性、扫描仪相关差异以及阻碍多中心数据共享的数据隐私法规的限制。为克服这些挑战,我们提出了一种跨机构(cross-silo)横向联邦学习(FL)方法,用于头颈区域CBCT到sCT的合成,扩展了我们的FedSynthCT框架。我们在公开的SynthRAD2025挑战数据集中来自欧洲三个医疗中心的数据上协同训练了一个条件生成对抗网络。联邦模型在各中心间表现出有效的泛化能力,平均绝对误差(MAE)为64.38±13.63至85.90±7.10 HU,结构相似性指数(SSIM)为0.882±0.022至0.922±0.039,峰值信噪比(PSNR)为32.86±0.94至34.91±1.04 dB。值得注意的是,在包含60名患者的外部验证数据集上,无需额外训练即可取得相近的性能(MAE: 75.22±11.81 HU,SSIM: 0.904±0.034,PSNR: 33.52±2.06 dB),证实了模型在协议差异、扫描仪差异和配准误差存在的情况下仍具有稳健的泛化能力。这些发现展示了联邦学习用于CBCT到sCT合成并同时保护数据隐私的技术可行性,并为在无需集中共享数据或针对特定站点微调的情况下跨机构开发可泛化模型提供了一种协作方案。

英文摘要

Shortened Abstract Cone-beam computed tomography (CBCT) has become a widely adopted modality for image-guided radiotherapy (IGRT). However, CBCT suffers from increased noise, limited soft-tissue contrast, and artifacts, resulting in unreliable Hounsfield unit values and hindering direct dose calculation. Synthetic CT (sCT) generation from CBCT addresses these issues, especially using deep learning (DL) methods. Existing approaches are limited by institutional heterogeneity, scanner-dependent variations, and data privacy regulations that prevent multi-center data sharing. To overcome these challenges, we propose a cross-silo horizontal federated learning (FL) approach for CBCT-to-sCT synthesis in the head and neck region, extending our FedSynthCT framework. A conditional generative adversarial network was collaboratively trained on data from three European medical centers in the public SynthRAD2025 challenge dataset. The federated model demonstrated effective generalization across centers, with mean absolute error (MAE) ranging from $64.38\pm13.63$ to $85.90\pm7.10$ HU, structural similarity index (SSIM) from $0.882\pm0.022$ to $0.922\pm0.039$, and peak signal-to-noise ratio (PSNR) from $32.86\pm0.94$ to $34.91\pm1.04$ dB. Notably, on an external validation dataset of 60 patients, comparable performance was achieved (MAE: $75.22\pm11.81$ HU, SSIM: $0.904\pm0.034$, PSNR: $33.52\pm2.06$ dB) without additional training, confirming robust generalization despite protocol, scanner differences and registration errors. These findings demonstrate the technical feasibility of FL for CBCT-to-sCT synthesis while preserving data privacy and offer a collaborative solution for developing generalizable models across institutions without centralized data sharing or site-specific fine-tuning.
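As a rough illustration of the two ingredients this abstract relies on, the sketch below shows a size-weighted parameter average across sites and two of the reported image metrics (MAE and PSNR). The abstract does not state which aggregation rule FedSynthCT uses, so the FedAvg-style averaging here is an assumption, and the function names `fedavg`, `mae_hu`, and `psnr_db` are hypothetical.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Size-weighted average of per-client parameter lists (FedAvg-style; assumed rule)."""
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()
    n_layers = len(client_weights[0])
    return [
        sum(c * w[i] for c, w in zip(coeffs, client_weights))
        for i in range(n_layers)
    ]

def mae_hu(sct, ct):
    """Mean absolute error between synthetic CT and reference CT, in the input units (HU)."""
    return float(np.mean(np.abs(np.asarray(sct, float) - np.asarray(ct, float))))

def psnr_db(sct, ct, data_range):
    """Peak signal-to-noise ratio in dB for a given intensity range."""
    mse = np.mean((np.asarray(sct, float) - np.asarray(ct, float)) ** 2)
    return float(10.0 * np.log10(data_range ** 2 / mse))
```

Only model parameters are exchanged in this sketch; the image volumes stay at each site, which is the privacy argument the abstract makes.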

2501.15351 2026-06-17 cs.CY cs.LG

Fairness in LLM-Generated Surveys

LLM生成调查中的公平性

Andrés Abeliuk, Vanessa Gaete, Naim Bro

发表机构 * Department of Computer Science, University of Chile(智利大学计算机科学系) National Center for Artificial Intelligence (CENIA)(国家人工智能中心) School of Government, Adolfo Ibáñez University(阿道弗·伊巴涅斯大学政府学院) Millennium Institute for Foundational Research on Data (IMFD)(数据基础研究千年研究所)

AI总结 研究分析了LLM在不同人口中的表现,发现其在美国数据集上表现更优,但存在因训练数据偏见导致的公平性问题,提出新的测量框架以提升模型公平性。

Journal ref EPJ Data Science (2026)

详情
AI中文摘要

大型语言模型(LLMs)在文本生成和理解方面表现出色,尤其在模拟社会政治和经济模式方面,可作为传统调查的替代方案。然而,其全球适用性仍存疑,因未探索的社会人口和地理背景中的偏见。本研究通过分析智利和美国的公开调查,探讨LLM在不同人群中的表现,关注预测准确性和公平性指标。结果显示,LLM在美国数据集上表现更优,此偏见源于以美国为中心的训练数据,即使考虑社会人口差异后仍显著。在美国,政治身份和种族显著影响预测准确性,而在智利,性别、教育和宗教归属起更重要作用。本研究提出一种新的框架,用于测量LLM中的社会人口偏见,为确保在不同社会文化背景下实现更公平和公正的模型表现提供路径。

英文摘要

Large Language Models (LLMs) excel in text generation and understanding, especially in simulating socio-political and economic patterns, serving as an alternative to traditional surveys. However, their global applicability remains questionable due to unexplored biases across socio-demographic and geographic contexts. This study examines how LLMs perform across diverse populations by analyzing public surveys from Chile and the United States, focusing on predictive accuracy and fairness metrics. The results show performance disparities, with LLMs consistently performing better on U.S. datasets. This bias originates from the U.S.-centric training data and remains evident after accounting for socio-demographic differences. In the U.S., political identity and race significantly influence prediction accuracy, while in Chile, gender, education, and religious affiliation play more pronounced roles. Our study presents a novel framework for measuring socio-demographic biases in LLMs, offering a path toward ensuring fairer and more equitable model performance across diverse socio-cultural contexts.

2408.15188 2026-06-17 eess.AS cs.CL cs.SD

Infusing Acoustic Pause Context into Text-Based Dementia Assessment

将语音停顿上下文注入基于文本的痴呆症评估

Franziska Braun, Sebastian P. Bayerl, Florian Hönig, Hartmut Lehfeld, Thomas Hillemacher, Tobias Bocklet, Korbinian Riedhammer

发表机构 * Technische Hochschule Nürnberg(纽伦堡应用技术大学) Technische Hochschule Rosenheim(罗森海姆应用技术大学) Klinik für Psychiatrie und Psychotherapie, Universitätsklinik der Paracelsus Medizinischen Privatuniversität, Klinikum Nürnberg, Germany(德国纽伦堡医院,帕拉塞尔苏斯私立医科大学附属医院精神病学与心理治疗科) KST Institut GmbH, Bad Emstal, Germany(德国巴德埃姆斯塔尔KST研究所有限公司)

AI总结 本文研究利用停顿增强的转录文本,通过Transformer语言模型区分无认知障碍、轻度认知障碍和阿尔茨海默病患者,探讨停顿信息和声学上下文对不同任务的影响。

Comments Accepted at INTERSPEECH 2024

Journal ref Proceedings of Interspeech 2024

详情
AI中文摘要

语音停顿,与内容和结构相结合,提供了一种有价值的、非侵入性的生物标志物,用于检测痴呆症。本工作探讨了在基于Transformer的语言模型中使用包含停顿的转录文本,以区分无认知障碍、轻度认知障碍和阿尔茨海默病患者在临床评估中的语音特征。我们处理了三个二元分类任务:起始、监测和痴呆排除。通过在德语口头流畅性测试和图片描述测试上的实验,比较模型在不同语音生成上下文中的有效性。从文本基线开始,我们探讨了停顿信息和声学上下文的整合效果。我们展示了测试应根据任务选择,并且词汇停顿信息和声学交叉注意力对不同任务贡献不同。

英文摘要

Speech pauses, alongside content and structure, offer a valuable and non-invasive biomarker for detecting dementia. This work investigates the use of pause-enriched transcripts in transformer-based language models to differentiate the cognitive states of subjects with no cognitive impairment, mild cognitive impairment, and Alzheimer's dementia based on their speech from a clinical assessment. We address three binary classification tasks: onset, monitoring, and dementia exclusion. The performance is evaluated through experiments on a German Verbal Fluency Test and a Picture Description Test, comparing the model's effectiveness across different speech production contexts. Starting from a textual baseline, we investigate the effect of incorporating pause information and acoustic context. We show that the test should be chosen depending on the task and that, similarly, lexical pause information and acoustic cross-attention contribute differently across tasks.

2308.08306 2026-06-17 eess.AS cs.SD

Classifying Dementia in the Presence of Depression: A Cross-Corpus Study

在抑郁存在下的痴呆分类:一项跨语料库研究

Franziska Braun, Sebastian P. Bayerl, Paula A. Pérez-Toro, Florian Hönig, Hartmut Lehfeld, Thomas Hillemacher, Elmar Nöth, Tobias Bocklet, Korbinian Riedhammer

发表机构 * Technische Hochschule Nürnberg(纽伦堡应用技术大学) Friedrich-Alexander-Universität Erlangen-Nürnberg(埃尔朗根-纽伦堡弗里德里希-亚历山大大学) Klinik für Psychiatrie und Psychotherapie, Universitätsklinik der Paracelsus Medizinischen Privatuniversität, Klinikum Nürnberg, Germany(德国纽伦堡医院,帕拉塞尔苏斯私立医科大学附属医院精神病学与心理治疗科) KST Institut GmbH, Bad Emstal, Germany(德国巴德埃姆斯塔尔KST研究所有限公司)

AI总结 本文通过跨语料库实验,利用文本、音频和情感嵌入对语音进行三类分类(HC vs. MCI vs. DEM),探讨抑郁作为次级诊断对分类器的影响。

Comments Accepted at INTERSPEECH 2023

Journal ref Proceedings of Interspeech 2023

详情
AI中文摘要

自动化痴呆筛查有助于早期发现和干预,可降低医疗系统的成本并提高患者的生活质量。抑郁症与痴呆有共同症状,增加了诊断的复杂性。迄今为止的研究主要集中于利用单一数据集中图片描述测试的语音,对痴呆(DEM)和健康对照(HC)进行二分类。在本工作中,我们应用已有的基线系统,利用语义言语流畅性测试和波士顿命名测试的语音,通过文本、音频和情感嵌入,在三分类问题(HC vs. MCI vs. DEM)中识别认知障碍。我们在两个独立录制的德语数据集上进行跨语料库和混合语料库实验,以研究对更大人群和不同录音条件的泛化能力。在详细的错误分析中,我们考察了作为次要诊断的抑郁症,以了解分类器实际学到了什么。

英文摘要

Automated dementia screening enables early detection and intervention, reducing costs to healthcare systems and increasing quality of life for those affected. Depression has shared symptoms with dementia, adding complexity to diagnoses. The research focus so far has been on binary classification of dementia (DEM) and healthy controls (HC) using speech from picture description tests from a single dataset. In this work, we apply established baseline systems to discriminate cognitive impairment in speech from the semantic Verbal Fluency Test and the Boston Naming Test using text, audio and emotion embeddings in a 3-class classification problem (HC vs. MCI vs. DEM). We perform cross-corpus and mixed-corpus experiments on two independently recorded German datasets to investigate generalization to larger populations and different recording conditions. In a detailed error analysis, we look at depression as a secondary diagnosis to understand what our classifiers actually learn.

2201.06574 2026-06-17 eess.IV cs.CV

Neural Computed Tomography

神经计算断层扫描

Kunal Gupta, Brendan Colvert, Francisco Contijoch

发表机构 * University of California San Diego(加州大学圣地亚哥分校)

AI总结 本文提出NeuralCT框架,通过神经隐式方法生成无运动伪影的时间分辨图像,适用于心脏等复杂运动场景。

Comments https://kunalmgupta.github.io/projects/NeuralCT.html

详情
AI中文摘要

尽管单个视图的采集速度很快,但在采集一组投影的过程中发生的运动仍可能导致计算机断层扫描重建出现显著的运动伪影。在心脏成像等情况下,运动可能不可避免,而对运动的评估本身也可能具有临床意义。以往通常通过开发门架(gantry)旋转更快的系统,或使用测量和/或估计位移的算法来重建运动伪影较少的图像。然而,由于物理限制,以及估计/测量非刚性、随时间变化且因患者而异的运动所面临的挑战,这些方法的效果有限。我们提出了一种新的重建框架NeuralCT,用于生成无运动伪影的时间分辨图像。我们的方法采用神经隐式方法,无需对底层运动进行估计或建模,而是使用符号距离度量和神经隐式框架来表示边界。我们利用“分析-合成”(analysis-by-synthesis)方法,寻找既与所采集的正弦图(sinogram)一致、又满足空间和时间一致性约束的解。我们在三个复杂度逐步增加的场景中展示了NeuralCT的实用性:小圆的平移、椭圆直径的类心跳变化以及复杂的拓扑形变。在不进行超参数调优、也不改变架构的情况下,以均方误差和Dice指标衡量,与滤波反投影相比,NeuralCT在所有三种运动下都给出了高质量的图像重建。

英文摘要

Motion during acquisition of a set of projections can lead to significant motion artifacts in computed tomography reconstructions despite fast acquisition of individual views. In cases such as cardiac imaging, motion may be unavoidable and evaluating motion may be of clinical interest. Reconstructing images with reduced motion artifacts has typically been achieved by developing systems with faster gantry rotation or using algorithms which measure and/or estimate the displacements. However, these approaches have had limited success due to both physical constraints and the challenge of estimating/measuring non-rigid, temporally varying, and patient-specific motions. We propose a novel reconstruction framework, NeuralCT, to generate time-resolved images free from motion artifacts. Our approach utilizes a neural implicit representation and does not require estimation or modeling of the underlying motion. Instead, boundaries are represented using a signed distance metric and neural implicit framework. We utilize 'analysis-by-synthesis' to identify a solution consistent with the acquired sinogram as well as spatial and temporal consistency constraints. We illustrate the utility of NeuralCT in three progressively more complex scenarios: translation of a small circle, heartbeat-like change in an ellipse's diameter, and complex topological deformation. Without hyperparameter tuning or change to the architecture, NeuralCT provides high quality image reconstruction for all three motions, as compared to filtered backprojection, using mean-square-error and Dice metrics.

2106.09539 2026-06-17 eess.AS cs.LG cs.SD

Automatic Analysis of the Emotional Content of Speech in Daylong Child-Centered Recordings from a Neonatal Intensive Care Unit

对新生儿重症监护病房中以儿童为中心的全天候录音中语音情感内容的自动分析

Einari Vaaras, Sari Ahlqvist-Björkroth, Konstantinos Drossos, Okko Räsänen

发表机构 * Unit of Computing Sciences, Tampere University, Finland(芬兰坦佩雷大学计算科学系) Department of Clinical Medicine, University of Turku, Finland(芬兰图尔库大学临床医学系) Department of Signal Processing and Acoustics, Aalto University, Finland(芬兰阿尔托大学信号处理与声学系)

AI总结 本文研究了如何通过自动语音情感识别系统分析新生儿录音中的情感内容,探讨了跨语料泛化、WGAN域适应和主动学习在新领域部署中的有效性,实现了73.4%的UAR分类性能。

详情
AI中文摘要

研究人员最近开始研究小婴儿听到的情感语音如何影响其发育结果。作为这项研究的一部分,在所谓的APPLE研究中,从芬兰和爱沙尼亚的两家医院收集了数百小时早产儿声音环境的全天录音。为了分析如此大规模数据集中语音的情感内容,需要一个自动语音情感识别(SER)系统。然而,目前既没有情感标签,也没有现成的领域内SER系统可用。本文介绍了这一最初未标注的大规模真实世界音频数据集,并描述了针对数据的芬兰子集开发可用SER系统的过程。我们探讨了将SER系统部署到新领域的几种最先进技术的有效性,比较了跨语料库泛化、基于WGAN的域适应和主动学习在该任务中的效果。结果表明,表现最好的模型在效价(valence)和唤醒度(arousal)的二元分类中分别达到73.4%和73.2%的未加权平均召回率(UAR)。结果还显示,与另外两种方法相比,主动学习的表现最为稳定。

英文摘要

Researchers have recently started to study how the emotional speech heard by young infants can affect their developmental outcomes. As a part of this research, hundreds of hours of daylong recordings from preterm infants' audio environments were collected from two hospitals in Finland and Estonia in the context of the so-called APPLE study. In order to analyze the emotional content of speech in such a massive dataset, an automatic speech emotion recognition (SER) system is required. However, there are no emotion labels or existing in-domain SER systems to be used for this purpose. In this paper, we introduce this initially unannotated large-scale real-world audio dataset and describe the development of a functional SER system for the Finnish subset of the data. We explore the effectiveness of alternative state-of-the-art techniques to deploy a SER system to a new domain, comparing cross-corpus generalization, WGAN-based domain adaptation, and active learning in the task. As a result, we show that the best-performing models are able to achieve a classification performance of 73.4% unweighted average recall (UAR) and 73.2% UAR for a binary classification for valence and arousal, respectively. The results also show that active learning achieves the most consistent performance compared to the two alternatives.
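For readers unfamiliar with the metric quoted above, unweighted average recall (UAR) is the mean of the per-class recalls, so every class counts equally regardless of how many samples it has. A minimal sketch follows; the function name is hypothetical and the toy labels are not from the paper.

```python
import numpy as np

def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls; each class contributes equally."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# A classifier that always predicts class 0 on a 4:1 imbalanced toy set
# reaches 80% accuracy but only 50% UAR.
toy_uar = unweighted_average_recall([0, 0, 0, 0, 1], [0, 0, 0, 0, 0])
```

This is why UAR is commonly preferred over plain accuracy when the emotion classes are imbalanced.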

2606.18111 2026-06-17 cs.LG cs.AI 新提交

Learning Fair Pareto-Optimal Policies in Multi-Objective Reinforcement Learning

多目标强化学习中学习公平帕累托最优策略

Umer Siddique, Peilang Li, Yongcan Cao

AI总结 针对多目标强化学习中固定用户偏好无法提供多样化策略的问题,提出基于广义基尼福利函数的多策略方法,学习公平帕累托最优策略集。

Comments Accepted at the Reinforcement Learning Conference (RLC) 2025. 12 pages main + appendix, 8 figures, 4 tables

详情
AI中文摘要

公平性是多目标强化学习(MORL)决策中的一个重要方面,策略必须确保在多个潜在冲突的目标上既达到最优又实现公平。虽然单策略MORL方法可以使用福利函数(如广义基尼福利函数GGF)为固定的用户偏好学习公平策略,但它们无法提供动态或未知用户偏好所需的多样的策略集。为解决这一局限性,我们形式化了多策略MORL中的公平优化问题,其目标是学习一组帕累托最优策略,确保在所有可能的用户偏好下实现公平。我们的关键技术贡献有三点:(1)我们证明对于凹的、分段线性的福利函数(例如GGF),公平策略仍然在凸覆盖集(CCS)中,CCS是线性标量化下的近似帕累托前沿。(2)我们证明非平稳策略(通过累积奖励历史增强)和随机策略通过动态适应历史不公平性来改善公平性。(3)我们提出了三种新算法,包括将GGF与多策略多目标Q学习(MOQL)集成、用于学习非平稳策略的状态增强多策略MOQL,以及用于学习随机策略的新扩展。我们在多个领域评估了我们的算法,并将我们的方法与最先进的MORL基线进行了比较。实验结果表明,我们的方法学习了一组公平策略,能够适应不同的用户偏好。

英文摘要

Fairness is an important aspect of decision-making in multi-objective reinforcement learning (MORL), where policies must ensure both optimality and equity across multiple, potentially conflicting objectives. While single-policy MORL methods can learn fair policies for fixed user preferences using welfare functions such as the generalized Gini welfare function (GGF), they fail to provide the diverse set of policies necessary for dynamic or unknown user preferences. To address this limitation, we formalize the fair optimization problem in multi-policy MORL, where the goal is to learn a set of Pareto-optimal policies that ensure fairness across all possible user preferences. Our key technical contributions are threefold: (1) We show that for concave, piecewise-linear welfare functions (e.g., GGF), fair policies remain in the convex coverage set (CCS), which is an approximated Pareto front for linear scalarization. (2) We demonstrate that non-stationary policies, augmented with accrued reward histories, and stochastic policies improve fairness by dynamically adapting to historical inequities. (3) We propose three novel algorithms, which include integrating GGF with multi-policy multi-objective Q-Learning (MOQL), state-augmented multi-policy MOQL for learning non-stationary policies, and its novel extension for learning stochastic policies. We evaluate our algorithms across various domains and compare our methods against the state-of-the-art MORL baselines. The empirical results show that our methods learn a set of fair policies that accommodate different user preferences.
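The generalized Gini welfare function mentioned in the abstract is usually defined as a weighted sum of the objective values sorted in ascending order, with non-increasing weights so that the worst-off objective counts most. The sketch below uses that standard definition; the default weights 1, 1/2, 1/4, ... are an illustrative assumption and not necessarily the paper's choice.

```python
import numpy as np

def ggf(utilities, weights=None):
    """Generalized Gini welfare: non-increasing weights applied to ascending-sorted utilities."""
    u = np.sort(np.asarray(utilities, dtype=float))  # worst-off objective first
    if weights is None:
        weights = 0.5 ** np.arange(len(u))           # assumed default: 1, 1/2, 1/4, ...
    w = np.asarray(weights, dtype=float)
    return float(np.dot(w, u))

balanced = ggf([2.0, 2.0])    # 1*2 + 0.5*2 = 3.0
unbalanced = ggf([4.0, 0.0])  # 1*0 + 0.5*4 = 2.0
```

Both vectors have the same total reward, but the balanced one scores higher. The function is also symmetric in its arguments and, being a minimum over permuted linear functions, concave and piecewise linear.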

2606.17692 2026-06-17 cs.LG 新提交

Delta-Based Target Reformulation for Short-Term Electricity Load Forecasting Using LSTM and Transformer Models

基于Delta目标重构的LSTM与Transformer短期电力负荷预测

Vansh Bansal

AI总结 针对电力负荷非平稳性,提出Delta目标重构方法,让LSTM和Transformer预测负荷变化量而非绝对值,在小时级预测中MAE和MAPE降低超50%。

Comments 8 pages, 3 tables

详情
AI中文摘要

准确的短期电力负荷预测对于现代电力系统的可靠和经济运行至关重要,尤其是在天气变化、日历效应和消费模式演变导致的非平稳性下。尽管LSTM和Transformer等深度学习模型表现出色,但大多数现有研究侧重于直接预测绝对负荷,而未明确解决目标非平稳性。受ARIMA模型中经典时间序列差分技术的启发,本文研究了一种基于Delta的目标重构方法,用于深度学习的短期电力负荷预测。该方法不直接预测绝对负荷值,而是训练模型预测连续时间步之间的负荷变化,最终预测通过最后一次观测负荷重建。这旨在稳定学习目标并降低预测难度。利用印度多年逐小时真实电力负荷数据,辅以NASA POWER项目的气象变量和日历特征,本研究评估了LSTM和Transformer在两种公式下的表现,并以LightGBM作为基准。实验针对小时前和日前预测范围进行,通过平均绝对误差(MAE)和平均绝对百分比误差(MAPE)评估性能。结果表明,Delta重构在所有评估模型的小时前预测中持续提高预测精度,与绝对公式相比,MAPE降低超过50%。对于日前预测,Delta目标特别有利于深度序列模型(LSTM和Transformer),而LightGBM在绝对公式下仍具有竞争力。这些发现表明,Delta重构是神经网络的一种强大归纳偏置,但其效果依赖于模型和预测范围。

英文摘要

Accurate short-term electricity load forecasting is critical for the reliable and economic operation of modern power systems, under non-stationarity arising from weather variability, calendar effects, and evolving consumption patterns. While deep learning models such as LSTMs and Transformers show promising performance, most existing studies focus on direct absolute load prediction without explicitly addressing target non-stationarity. Motivated by classical time-series differencing techniques in ARIMA models, this paper investigates a delta-based target reformulation for short-term electricity load forecasting using deep learning. Instead of directly predicting absolute load values, the proposed formulation trains models to predict the change in load between consecutive time steps, with final forecasts reconstructed using the last observed load. This aims to stabilize the learning target and reduce forecasting difficulty. Using multi-year, hourly real-world electricity load data from India, augmented with meteorological variables from the NASA POWER project and calendar features, this study evaluates LSTM and Transformer models under both formulations, benchmarking them against LightGBM. Experiments are conducted for hour-ahead and day-ahead horizons, assessing performance via Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE). Results show that delta-based reformulation consistently improves forecasting accuracy for hour-ahead prediction across all evaluated models, yielding MAPE reductions of over 50% compared to absolute formulations. For day-ahead forecasting, delta targets specifically benefit deep sequence models (LSTM and Transformer), while LightGBM remains competitive under the absolute formulation. These findings indicate that while delta reformulation is a powerful inductive bias for neural networks, its efficacy is model- and horizon-dependent.
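The delta reformulation described above can be sketched in a few lines: the training target becomes the change in load, and the forecast is rebuilt by adding the predicted change to the last observed load. The helper names and the `horizon` parameter are hypothetical; the abstract only specifies differences between consecutive time steps, which corresponds to `horizon=1`.

```python
import numpy as np

def make_delta_targets(load, horizon=1):
    """Targets are load[t + horizon] - load[t] instead of the absolute load."""
    load = np.asarray(load, dtype=float)
    return load[horizon:] - load[:-horizon]

def reconstruct(last_observed, predicted_delta):
    """Absolute forecast = last observed load + predicted change."""
    return np.asarray(last_observed, dtype=float) + np.asarray(predicted_delta, dtype=float)

load = np.array([100.0, 110.0, 105.0, 120.0])
deltas = make_delta_targets(load)              # [10., -5., 15.]
rebuilt = reconstruct(load[:-1], deltas)       # recovers load[1:] when deltas are exact
```

With a perfect delta predictor the reconstruction is exact, so any forecast error comes only from the predicted change, which mirrors the differencing step in ARIMA-style models.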

2606.17603 2026-06-17 cs.LG 新提交

Expanding SPHERE-JEPA: A Family of Statistical Regularizers for the Hypersphere

扩展SPHERE-JEPA:超球面上的统计正则化器家族

Léo Nicollier, Enric Meinhardt-Llopis, Max Dunitz, Marc Pic, Pablo Musé, Gabriele Facciolo

AI总结 为解决自监督学习中切片统计正则化器因蒙特卡洛采样引入投影方差导致优化不稳定和收敛慢的问题,提出全维MMD、KSD和KL散度正则化器,并采用旋转不变核,在ImageNet和Galaxy10上实现更稳定优化和一致改进。

详情
AI中文摘要

在自监督学习(SSL)中,通过在单位超球面上显式强制均匀分布来防止表示坍缩已被证明是有效的。然而,当前的框架通常依赖于切片统计正则化器,如SIGReg(用于LeJEPA)和SUSReg(用于SPHERE-JEPA),这些正则化器通过沿随机一维方向的蒙特卡洛采样来近似这一连续目标。这种随机性将投影方差注入训练梯度,破坏优化稳定性,并阻碍收敛。在这项工作中,我们首先证明,解析地积分掉这些随机投影自然地产生一个确定性的最大均值差异(MMD),从而避免了切片方法的方差。受此等价性的启发,我们直接在球面上制定了MMD、核斯坦因差异(KSD)和KL散度的全维目标,以强制均匀分布。为了防止空间偏差,我们通过谱理论构造旋转不变核来装备这些检验,并系统评估了两个典型族:平滑指数衰减(热核)和严格频率截止(带限)滤波器。实验上,去除投影引起的噪声导致更稳定的优化、更快的收敛,并在ImageNet和Galaxy10上相对于随机切片正则化器取得一致改进。此外,我们揭示了统计检验的选择塑造了学习潜在空间的几何结构:MMD和KSD有利于适用于以对象为中心的领域的局部聚类组织,而基于连续KDE的KL散度促进了细粒度的实例分离,在非聚类的程序化纹理检索上取得了最强结果。

英文摘要

In Self-Supervised Learning (SSL), preventing representation collapse by explicitly enforcing a uniform distribution on the unit hypersphere has proven to be effective. However, current frameworks typically rely on sliced statistical regularizers such as SIGReg (used in LeJEPA) and SUSReg (used in SPHERE-JEPA), which approximate this continuous objective via Monte Carlo sampling along random 1D directions. This stochasticity injects projection variance into the training gradients, destabilizing optimization and hindering convergence. In this work, we first show that analytically integrating out these random projections naturally yields a deterministic Maximum Mean Discrepancy (MMD), bypassing the variance of sliced methods. Motivated by this equivalence, we formulate full-dimensional objectives for MMD, Kernel Stein Discrepancy (KSD), and Kullback-Leibler (KL) divergence directly on the sphere to enforce a uniform distribution. To prevent spatial bias, we equip these tests with rotationally invariant kernels constructed via spectral theory, systematically evaluating two canonical families: smooth exponential decay (Heat) and strict frequency cutoff (Bandlimited) filters. Empirically, removing projection-induced noise results in more stable optimization, faster convergence, and consistent improvements over stochastic sliced regularizers on ImageNet and Galaxy10. Furthermore, we reveal that the choice of the statistical test shapes the geometry of the learned latent space: MMD and KSD favor locally clustered organization suitable for object-centric domains, whereas the continuous KDE-based KL divergence promotes fine-grained instance separation, yielding the strongest results on unclustered procedural texture retrieval.
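To make the full-dimensional MMD idea concrete: for any rotationally invariant kernel k(x, y) = f(x·y) on the sphere, the expected kernel value against the uniform distribution is the same constant for every x, so the squared MMD to uniform equals the mean pairwise kernel value of the embeddings minus that constant. The sketch below computes only the data-dependent term. The Gaussian-type kernel exp(kappa (x·y - 1)) is a stand-in assumption, not the Heat or Bandlimited kernels of the paper, and the function name is hypothetical.

```python
import numpy as np

def sphere_uniformity_energy(z, kappa=2.0):
    """Mean pairwise kernel value on the unit sphere (data-dependent part of MMD^2 to uniform)."""
    z = np.asarray(z, dtype=float)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # project embeddings onto the sphere
    gram = z @ z.T                                    # cosine similarities
    return float(np.mean(np.exp(kappa * (gram - 1.0))))

collapsed = sphere_uniformity_energy(np.tile([1.0, 0.0], (4, 1)))
spread = sphere_uniformity_energy([[1, 0], [-1, 0], [0, 1], [0, -1]])
```

No random projection directions appear anywhere, so the value and its gradient are deterministic for a given batch. Collapsed embeddings give the maximum value of 1, and well-spread embeddings give a lower one.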

2606.17579 2026-06-17 cs.LG cs.AI cs.CL cs.SI 新提交

LLM Features Can Hurt GNNs: Concatenation Interference on Homophilous Graph Benchmarks

LLM特征可能损害GNN:同配图基准上的拼接干扰

Zhongyuan Wang, Pratyusha Vemuri

AI总结 本文发现,将LLM特征通过纯输入拼接(而非联合训练)引入图神经网络时,会在同配基准上系统性地降低准确率,并提出仅衡量LLM特征自身判别性的指标Delta_sig,用于预测拼接是否有益。

Comments 29 pages, 8 figures

详情
AI中文摘要

将LLM生成的节点特征添加到图神经网络(GNN)中,被广泛报道能提高标准基准的准确率。我们记录了一个相反的观察:当LLM特征通过纯输入拼接(而非联合训练、蒸馏或提示条件)引入时,它们会在相同的同配基准上系统地降低准确率,而端到端LLM流水线在这些基准上却能成功。使用MLP骨干网络、Planetoid公共划分和词袋原始特征,拼接SBERT编码的GPT-4o-mini TAPE特征导致PubMed测试准确率下降-17.0±0.3个百分点,Cora下降-4.3±0.6个百分点(CiteSeer下降-0.6±0.8个百分点,在种子噪声范围内)。当我们放宽每个条件(GCN/GCNII/GAT骨干网络、随机划分、更小编码器)时,下降幅度减弱,并在中等同配的WikiCS(+4.4个百分点)和ogbn-arxiv(+11.7个百分点)上逆转。为了预测拼接何时有益或有害,我们报告了一个简单的LLM单独判别性指标Delta_sig。在9个数据集上,Delta_sig与拼接成本的相关系数(r^2=0.38)强于同配性(r^2=0.06;N=9,bootstrap置信区间重叠)。bootstrap最佳变点为tau=13.8个百分点,规则“Delta_sig <= tau预测非正拼接成本”正确分类了7/9个数据集;由于60%的bootstrap样本将tau置于[5,30]个百分点之间,我们将Delta_sig视为解释性透镜而非精确过滤器。在PubMed上进行的维度控制消融实验将LLM特征下降置于同源PCA(-2.3个百分点)和同维高斯噪声(-37.3个百分点)之间,排除了维度和权重衰减的影响。九个PubMed配置拟合出幂律|Delta_concat| ∝ (sqrt(d_l/n))^1.31,r^2=0.97;低Delta_sig、小n的角落正是标题中-17个百分点PubMed缺陷出现的位置。

英文摘要

Adding LLM-generated node features to graph neural networks (GNNs) is widely reported to improve accuracy on standard benchmarks. We document a contrasting observation: when LLM features are introduced through pure input concatenation (rather than joint training, distillation, or prompt-conditioning), they can systematically degrade accuracy on the same homophilous benchmarks where end-to-end LLM pipelines succeed. With an MLP backbone on the Planetoid public split and bag-of-words original features, concatenating SBERT-encoded GPT-4o-mini TAPE features changes PubMed test accuracy by -17.0 +/- 0.3 pp and Cora by -4.3 +/- 0.6 pp (CiteSeer -0.6 +/- 0.8 pp, within seed noise). The drop attenuates as we relax each condition (GCN / GCNII / GAT backbones, random splits, smaller encoders) and reverses on medium-homophily WikiCS (+4.4 pp) and ogbn-arxiv (+11.7 pp). To predict when concatenation helps versus hurts, we report a simple measure of LLM-alone discriminability, Delta_sig. Across 9 datasets, Delta_sig correlates with the concatenation cost more strongly than homophily does at the point estimate (r^2 = 0.38 vs. 0.06; N=9, bootstrap CIs overlap). The bootstrap-best change-point is tau = 13.8 pp, and the rule "Delta_sig <= tau predicts non-positive concat cost" classifies 7/9 datasets correctly; since 60% of bootstrap samples place tau in [5, 30] pp, we treat Delta_sig as an interpretive lens rather than a precision filter. A dimension-controlled ablation on PubMed places the LLM-feature drop between same-source PCA (-2.3 pp) and same-dim Gaussian noise (-37.3 pp), ruling out dimensionality and weight-decay artifacts. Nine PubMed configurations fit a power law |Delta_concat| proportional to (sqrt(d_l/n))^1.31 with r^2 = 0.97; the low-Delta_sig, small-n corner is exactly where the headline -17 pp PubMed deficit appears.

2606.17572 2026-06-17 cs.LG cs.SY eess.SY 新提交

When Dynamics Models Read the Wrong Time Steps: Label-Free Event Credit Re-Anchoring for Robust Global Readouts

当动力学模型读取错误的时间步:无标签事件信用重锚定以实现鲁棒的全局读出

Yifan Wang

AI总结 针对序列到全局接口中的时间信用稀释问题,提出无训练无标签的CREST方法,通过事件核心估计与对比重锚定,减少分布外误差并恢复事件信用。

Comments 7 pages, 6 figures

详情
AI中文摘要

学习到的动力学模型通常通过将每步特征序列池化为一个读出向量来回答全局物理问题,如故障严重性或冲击刚度。这种序列到全局的接口产生了一个未被充分研究的时间信用问题:在仅有轨迹级监督的情况下,模型可以在训练条件下准确预测,同时从丰富的平滑相关物而非决定目标的短暂物理事件中读取信息。我们将这种失败称为时间信用稀释。它不会被训练损失暴露,也不会被标准的物理信息残差消除,因为错误在于全局读出分配功能信用的位置。我们引入了Credit-in-Event,一种接口级探针,用于测量池化信用落在事件步上的程度,并闭式证明当事件分数缩小时,池化线性读取器将信用路由到虚假的背景通道。然后我们提出了CREST,一种无训练且无标签的读出方法,它从学习到的特征中估计瞬态事件核心,并通过事件与其余部分的对比重锚定池化表示。在模拟齿轮和冲击系统、循环和注意力编码器以及公共轴承振动数据上,CREST减少了分布外误差,同时恢复了事件信用。消融实验表明,稳定步选择和感受野缩小失败,证实了增益来自事件核心信用重锚定,而非通用的局部性或稳定性先验。

英文摘要

Learned dynamics models often answer global physical questions, such as fault severity or impact stiffness, by pooling a per-step feature sequence into one readout vector. This sequence-to-global interface creates an under-studied temporal credit problem: with only trajectory-level supervision, a model can predict accurately in training conditions while reading from abundant smooth correlates rather than the brief physical events that determine the target. We call this failure temporal credit dilution. It is not exposed by the training loss and is not removed by standard physics-informed residuals, because the error lies in where the global readout assigns functional credit. We introduce Credit-in-Event, an interface-level probe for measuring how much pooled credit lands on event steps, and prove in closed form that a pooled linear reader routes credit to a spurious background channel as the event fraction shrinks. We then propose CREST, a training-free and label-free readout that estimates a transient event core from learned features and re-anchors the pooled representation through event-versus-rest contrast. Across simulated gear and impact systems, recurrent and attention encoders, and public bearing vibration data, CREST reduces out-of-distribution error while restoring event credit. Ablations show that stable-step selection and receptive-field shrinking fail, confirming that the gain comes from event-core credit re-anchoring rather than a generic locality or stability prior.

2606.17451 2026-06-17 cs.LG cs.RO 新提交

Credibility-Weighted Pricing of Autonomous Vehicle Liability Under Operational Design Domain Shift

操作设计域转移下自动驾驶汽车责任的可信度加权定价

Doyeon Jang

AI总结 针对自动驾驶系统部署中经验稀疏、ODD转移及风险非平稳问题,提出分层贝叶斯可信度框架,通过ODD相似性核进行部分池化,在Waymo数据上验证其有效性。

详情
AI中文摘要

自动驾驶系统的部署带来了一个基础性的费率制定挑战:稀疏的经验、不断变化的操作设计域以及跨软件版本的非平稳风险。我们提出了一个分层贝叶斯可信度框架,通过学习的ODD相似性核汇集城市、软件版本和区域的信息,将Buhlmann-Straub作为极限情况嵌套其中。基于NHTSA Standing General Order数据库中美国四个大都市区的648起Waymo已验证碰撞事件与1.16亿匹配里程的演示表明,城市聚合可信度权重适中(0.12-0.46),部分池化明显优于无池化,且功效分析显示,学习核的优势在大约十二个部署城市时变得可检测。

英文摘要

Automated Driving System deployments create a foundational ratemaking challenge: sparse experience, shifting operational design domains (ODDs), and non-stationary risk across software releases. We propose a hierarchical Bayesian credibility framework pooling across cities, software versions, and territories via a learned ODD-similarity kernel, nesting Bühlmann-Straub as a limiting case. In a demonstration on 648 verified-engaged Waymo crashes across four U.S. metros from the NHTSA Standing General Order database against 116 million matched miles, city-aggregate credibility weights are moderate (0.12-0.46), partial pooling decisively outperforms no pooling, and a power analysis shows the learned kernel's advantage becomes detectable at approximately twelve deployed cities.
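The limiting case named in the abstract, classical Bühlmann-Straub credibility, blends a unit's own experience with the pooled rate using a weight Z = m / (m + k), where m is the exposure and k is the ratio of expected process variance to the variance of hypothetical means. The sketch below shows only that classical formula, not the paper's hierarchical kernel model, and all numbers are invented for illustration.

```python
def credibility_weight(exposure, k):
    """Buhlmann-Straub credibility factor Z = m / (m + k)."""
    return exposure / (exposure + k)

def credibility_rate(own_rate, pooled_rate, exposure, k):
    """Credibility-weighted rate: Z * own experience + (1 - Z) * pooled experience."""
    z = credibility_weight(exposure, k)
    return z * own_rate + (1.0 - z) * pooled_rate

# Hypothetical city: 10 units of exposure, k = 30, own rate 8.0, pooled rate 4.0.
z = credibility_weight(10.0, 30.0)              # 0.25
rate = credibility_rate(8.0, 4.0, 10.0, 30.0)   # 0.25 * 8 + 0.75 * 4 = 5.0
```

A weight of 0.25 falls inside the moderate 0.12-0.46 range the abstract reports, which means most of the estimate would still come from the pooled experience.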

2606.16878 2026-06-17 cs.LG 新提交

Integrated Marketing Attribution: A Bayesian Framework for Privacy-Safe Granular Measurement Anchored in MMM

集成营销归因:基于贝叶斯框架的隐私安全粒度测量,锚定于MMM

Meghana R. Bhat, Ankit Umare, Utsav Aggarwal, Richard Vecsler, Arunkumar Mani, Karthik Nair, Chandhu Nair

AI总结 提出集成营销归因(IMA)框架,结合营销组合模型(MMM)与贝叶斯归因模型,从聚合数据中推导出活动级效果,实现隐私安全且粒度精细的归因。

详情
AI中文摘要

零售营销测量日益需要精细的活动级洞察,而无需依赖用户级跟踪。然而,两种主流方法——营销组合模型(MMM)和多触点归因(MTA)——常常产生碎片化的洞察。MMM在渠道级规划中隐私安全且稳健,但对于活动优化过于粗糙;而MTA提供精细归因,但在日益增加的隐私限制下变得不太可靠。我们提出集成营销归因(IMA),一个统一框架,将MMM与特定渠道的贝叶斯归因模型相结合,从聚合数据中推导活动级效果。通过利用MMM信息先验,IMA提供精细、隐私安全的归因,同时保持与MMM的一致性。

英文摘要

Retail marketing measurement increasingly requires granular campaign-level insights without relying on user-level tracking. However, the two dominant approaches, Marketing Mix Modeling (MMM) and Multi-Touch Attribution (MTA), often produce fragmented insights. MMM is privacy-safe and robust for channel-level planning but is too coarse for campaign optimization, while MTA provides granular attribution but has become less reliable under increasing privacy restrictions. We propose Integrated Marketing Attribution (IMA), a unified framework that combines MMM with channel-specific Bayesian attribution models to derive campaign-level effects from aggregated data. By leveraging MMM-informed priors, IMA delivers granular, privacy-safe attribution while preserving consistency with MMM.

2606.12867 2026-06-17 cs.LG 新提交

SMGFM: Spectral Multimodal Graph Pretraining for Multimodal-Attributed Graphs

SMGFM: 面向多模态属性图的谱多模态图预训练

Zhengyu Wu, Xu Wang, Hongchao Qin, Xunkai Li, Guang Zeng, Rong-Hua Li, Guoren Wang

AI总结 提出SMGFM框架,利用图频谱分解区分结构诱导语义与模态特有语义,通过频带路由实现跨模态融合,在图级和模态级任务上取得最优性能。

详情
AI中文摘要

多模态属性图(MAGs)将图拓扑结构与来自文本、图像等模态的节点语义相结合。传统的图学习通过耦合拓扑与节点特征来上下文化节点语义。然而,这种耦合设计在MAGs中变得棘手,因为结构诱导和模态固有的语义可能对下游任务产生不同贡献。结构诱导语义通过平滑拓扑变化促进关系一致性,而模态固有语义通常编码局部、细粒度的区分,不应被统一平滑或对齐。因此,关键挑战在于跨模态融合前识别语义角色。为此,我们利用图频率变化作为先验,其中低频分量捕获拓扑一致语义,高频分量保留模态特定语义。基于这一直觉,我们提出SMGFM,一种谱多模态图预训练框架,将每个模态特定的节点信号分解为图频带,并在跨模态交互前分配频带级语义角色。具体地,SMGFM使用可扩展的切比雪夫滤波器构建频率解析的模态令牌,通过拓扑条件路由估计其耦合可靠性,并在融合前进行频带-模态交互。其频率路由目标在平滑共识路由的同时保留模态特定路由,减轻空间域纠缠和统一跨模态对齐。在MAG数据集上的大量实验表明,SMGFM在图级和模态级任务上均达到最先进性能。

英文摘要

Multimodal-attributed graphs (MAGs) couple graph topology with node semantics from text, images, and other modalities. Traditional graph learning contextualizes node semantics by coupling topology with node features. However, this coupling design becomes troublesome in MAGs, where structure-induced and modality-intrinsic semantics may contribute differently to downstream tasks. Structure-induced semantics promote relational consistency through smooth topological variation, whereas modality-intrinsic semantics often encode local, fine-grained distinctions that should not be uniformly smoothed or aligned. Therefore, the key challenge is to identify semantic roles before cross-modal fusion. To this end, we leverage graph-frequency variation as a prior, where low-frequency components capture topology-consistent semantics and high-frequency components preserve modality-specific semantics. Based on this intuition, we propose SMGFM, a spectral multimodal graph pretraining framework that decomposes each modality-specific node signal into graph-frequency bands and assigns band-level semantic roles before cross-modal interaction. Concretely, SMGFM constructs frequency-resolved modality tokens with scalable Chebyshev filters, estimates their coupling reliability through topology-conditioned routing, and performs band-modality interaction before fusion. Its frequency-routed objectives align smooth consensus routes while preserving modality-specific routes, mitigating spatial-domain entanglement and uniform cross-modal alignment. Extensive experiments conducted on the MAG datasets demonstrate that SMGFM achieves state-of-the-art performance across graph-level and modality-level tasks.
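The "scalable Chebyshev filters" mentioned above generally refer to the three-term Chebyshev recurrence applied to a rescaled graph Laplacian, which avoids an explicit eigendecomposition. The sketch below computes the basis responses T_k(L~) x for a dense adjacency matrix; a filter or frequency band is then a learned linear combination of these terms. It is a generic illustration, not SMGFM's implementation, and it assumes the common approximation lambda_max = 2 for the normalized Laplacian.

```python
import numpy as np

def chebyshev_basis(adj, x, order=2):
    """Return T_0(L~)x ... T_order(L~)x for the rescaled normalized Laplacian L~ = L - I."""
    adj = np.asarray(adj, dtype=float)
    x = np.asarray(x, dtype=float)
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.zeros_like(deg)
    d_inv_sqrt[deg > 0] = deg[deg > 0] ** -0.5        # isolated nodes keep weight 0
    eye = np.eye(len(adj))
    lap = eye - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    lap_scaled = lap - eye                            # assumes lambda_max = 2
    terms = [x, lap_scaled @ x]
    for _ in range(2, order + 1):
        terms.append(2.0 * lap_scaled @ terms[-1] - terms[-2])  # T_k = 2 L~ T_{k-1} - T_{k-2}
    return np.stack(terms[: order + 1])

# Two connected nodes carrying a constant signal (the lowest graph frequency).
basis = chebyshev_basis([[0, 1], [1, 0]], [[1.0], [1.0]], order=2)
```

Each extra order costs one more matrix product and reaches one more hop, which is what makes the construction scale to large graphs when sparse matrices are used.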

2502.17518 2026-06-17 cs.LG cs.AI q-fin.CP stat.ML Version update

Ensemble RL through Classifier Models: Enhancing Risk-Return Trade-offs in Trading Strategies

Zheli Xiong

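AI summary This paper presents a comprehensive study of ensemble reinforcement learning (RL) models in financial trading strategies, using classifier models to improve performance. By combining RL algorithms such as A2C, PPO, and SAC with traditional classifiers such as Support Vector Machines (SVM), Decision Trees, and Logistic Regression, it examines how different classifier groups can be integrated to improve risk-return trade-offs. The study evaluates several ensemble methods against individual RL models on key financial metrics, including cumulative return, Sharpe Ratio (SR), Calmar Ratio, and Maximum Drawdown (MDD). The results indicate that ensemble methods consistently outperform the base models on risk-adjusted returns, with better drawdown management and overall stability. However, ensemble performance is sensitive to the choice of the variance threshold τ, which underscores the importance of tuning τ dynamically for best performance. The study highlights the value of combining RL with classifiers for adaptive decision-making, with implications for financial trading, robotics, and other dynamic environments.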

Comments 23 pages, 10 figures, 9 tables

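Details
AI abstract (translated from Chinese)

This paper presents a comprehensive study of ensemble Reinforcement Learning (RL) models in financial trading strategies, using classifier models to improve performance. By combining RL algorithms such as A2C, PPO, and SAC with traditional classifiers such as Support Vector Machines (SVM), Decision Trees, and Logistic Regression, we investigate how different classifier groups can be integrated to improve risk-return trade-offs. The study evaluates several ensemble methods against individual RL models on key financial metrics, including cumulative return, Sharpe Ratio (SR), Calmar Ratio, and Maximum Drawdown (MDD). Our results indicate that ensemble methods consistently outperform the base models on risk-adjusted returns, with better drawdown management and overall stability. However, we find that ensemble performance is sensitive to the choice of the variance threshold τ, which underscores the importance of tuning τ dynamically for best performance. This study highlights the value of combining RL with classifiers for adaptive decision-making, with implications for financial trading, robotics, and other dynamic environments.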

English abstract

This paper presents a comprehensive study on the use of ensemble Reinforcement Learning (RL) models in financial trading strategies, leveraging classifier models to enhance performance. By combining RL algorithms such as A2C, PPO, and SAC with traditional classifiers like Support Vector Machines (SVM), Decision Trees, and Logistic Regression, we investigate how different classifier groups can be integrated to improve risk-return trade-offs. The study evaluates the effectiveness of various ensemble methods, comparing them with individual RL models across key financial metrics, including Cumulative Returns, Sharpe Ratios (SR), Calmar Ratios, and Maximum Drawdown (MDD). Our original experimental results demonstrate that ensemble methods often outperform base models in terms of risk-adjusted returns, providing better management of drawdowns and overall stability. However, both the original analysis and the additional reproduction reported in this version show that ensemble performance is sensitive to the choice of variance threshold \(τ\), classifier group, RL-agent pair, and market universe. The reproduction evidence strengthens the conclusion that classifier-assisted ensemble selection can improve robustness, while also clarifying that the advantage is conditional rather than automatic across all datasets. This study emphasizes the value of combining RL with classifiers for adaptive decision-making, with implications for financial trading, robotics, and other dynamic environments.
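For readers unfamiliar with the metrics named in the abstract, the following is a minimal sketch of how cumulative return, Sharpe ratio, maximum drawdown, and Calmar ratio are commonly computed from a series of per-period simple returns. The function names, the 252-period annualization default, and the sample-standard-deviation choice are assumptions for illustration; the paper may use different conventions, and this sketch does not reproduce its ensemble or threshold logic.

```python
import numpy as np

def cumulative_return(returns):
    """Total compounded return over the whole series."""
    return float(np.prod(1.0 + np.asarray(returns, dtype=float)) - 1.0)

def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized mean excess return divided by its sample standard deviation.

    `risk_free` is an annual rate; 252 periods per year is an assumed default.
    """
    excess = np.asarray(returns, dtype=float) - risk_free / periods_per_year
    std = excess.std(ddof=1)
    if std == 0:
        return float("nan")
    return float(np.sqrt(periods_per_year) * excess.mean() / std)

def max_drawdown(returns):
    """Largest peak-to-trough loss of the wealth curve, as a positive fraction."""
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + np.asarray(returns, dtype=float))))
    running_peak = np.maximum.accumulate(wealth)
    return float(np.max(1.0 - wealth / running_peak))

def calmar_ratio(returns, periods_per_year=252):
    """Annualized compounded return divided by maximum drawdown."""
    r = np.asarray(returns, dtype=float)
    annualized = (1.0 + cumulative_return(r)) ** (periods_per_year / len(r)) - 1.0
    mdd = max_drawdown(r)
    return float(annualized / mdd) if mdd > 0 else float("nan")
```

Because the Sharpe ratio penalizes all volatility while the Calmar ratio penalizes only the worst drawdown, two strategies can rank differently under the two metrics, which is one reason studies like this one report both.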

2602.22159 2026-06-17 cs.CV Version update

CASR: A Robust Cyclic Framework for Arbitrary Large-Scale Super-Resolution with Distribution Alignment and Self-Similarity Awareness

Wenhao Guo, Zhaoran Zhao, Peng Lu, Sheng Li, Qian Qiao, DeRui Li

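AI summary CASR uses distribution alignment and self-similarity awareness to address distribution drift and diffusion inconsistency in large-scale super-resolution, achieving stable inference and efficient processing with a single model.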

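Details
AI abstract (translated from Chinese)

CASR uses distribution alignment and self-similarity awareness to address distribution drift and diffusion inconsistency in large-scale super-resolution, achieving stable inference and efficient processing with a single model.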

English abstract

Arbitrary-scale image super-resolution (ASISR) remains fundamentally limited by cross-scale distribution shift: once the inference scale leaves the training range, noise, blur, and artifacts accumulate sharply. We revisit this challenge from a cross-scale distribution transition perspective and propose CASR, a simple yet highly efficient cyclic SR framework that reformulates ultra-magnification as a sequence of in-distribution scale transitions. This design ensures stable inference at arbitrary scales while requiring only a single model. CASR tackles two major bottlenecks: distribution drift across iterations and patch-wise diffusion inconsistencies. The proposed SSAM module aligns structural distributions via superpixel aggregation, preventing error accumulation, while the SARM module restores high-frequency textures by enforcing correlation-guided consistency and preserving self-similarity structure through correlation alignment. Despite using only a single model, our approach significantly reduces distribution drift, preserves long-range texture consistency, and achieves superior generalization even at extreme magnification.
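The idea of treating a very large magnification as a sequence of smaller, in-range steps can be illustrated with a short scheduling sketch. This is only an assumption about how such a decomposition might be planned: the function name `scale_schedule`, the `max_step` bound standing in for the largest scale seen in training, and the choice of equal geometric steps are hypothetical, and the abstract does not state how CASR actually selects its per-cycle scales.

```python
import math

def scale_schedule(target, max_step=4.0):
    """Split a target magnification into equal per-cycle factors.

    Each factor is at most `max_step` (an assumed in-distribution limit),
    and the product of all factors equals `target`.
    """
    if target < 1.0:
        raise ValueError("target scale must be >= 1")
    if max_step <= 1.0:
        raise ValueError("max_step must be > 1")
    if target == 1.0:
        return []
    # Smallest number of cycles such that max_step ** cycles >= target;
    # the small epsilon guards against floating-point round-up.
    cycles = max(1, math.ceil(math.log(target) / math.log(max_step) - 1e-12))
    step = target ** (1.0 / cycles)
    return [step] * cycles
```

With `max_step=4.0`, a 16x target is planned as two 4x cycles, and a 30x target as three cycles of roughly 3.1x each, so every individual step stays inside the assumed training range while the cumulative magnification reaches the target.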