arXivDaily: arXiv daily academic digest, updated Monday to Friday
2511.03211 2026-06-17 cs.CY cs.AI Version update

Retrofitters, pragmatists and activists: Public interest litigation for accountable automated decision-making

Henry Fraser, Zahra Stardust

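Affiliations: Queensland University of Technology, School of Law; Centre for Automated Decision-Making and Society; Queensland University of Technology, School of Communication

AI summary: This paper examines the role of public interest litigation in promoting accountability for AI and automated decision-making in Australia. Drawing on interviews, it analyses strategies and their limits, and stresses that institutional arrangements are critical to effective litigation.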

Abstract

This paper examines the role of public interest litigation in promoting accountability for AI and automated decision-making (ADM) in Australia. Since ADM regulation faces political and geopolitical headwinds, effective governance will have to rely on the enforcement of existing laws. Drawing on interviews with Australian public interest litigators, technology policy activists, and technology law scholars, the paper positions public interest litigation as part of a larger ecosystem for transparency, accountability and justice with respect to ADM. The paper explores the tactics and strategies of what one participant described as 'retrofitting' old laws to ADM. These go beyond creative legal argumentation, to encompass practices of community-building, collaboration on theories of change, canny selection of clients and causes of action, and the alignment of the interests of stakeholders in litigation. Naturally, the paper also contends with the limits of these strategies, and of the Australian legal system. Where limits are, however, capable of being overcome, the paper presents findings on urgent needs: the enabling institutional arrangements without which effective litigation and accountability will falter. The paper is relevant to law and technology scholars; individuals and groups harmed by ADM; public interest litigators and technology lawyers; civil society and advocacy organisations; and policymakers.

2512.15792 2026-06-17 cs.CY cs.AI cs.CL Version update

A Multifaceted Analysis of Social Biases in Large Language Models

Xulang Zhang, Rui Mao, Erik Cambria

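Affiliations: Nanyang Technological University

AI summary: This paper systematically analyses four widely used large language models for bias along political, ideological, alliance, language, and gender dimensions. Its experiments probe the models' political neutrality, ideological leanings, geopolitical alignment, bias in multilingual story completion, and gender-related affinities.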

Abstract

Large language models (LLMs) have rapidly become indispensable tools for acquiring information and supporting human decision-making. However, ensuring that these models uphold fairness across varied contexts is critical to their safe and responsible deployment. In this study, we undertake a comprehensive examination of four widely adopted LLMs, probing their underlying biases and inclinations across the dimensions of politics, ideology, alliance, language, and gender. Through a series of carefully designed experiments, we investigate their political neutrality using news summarization, ideological biases through news stance classification, tendencies toward specific geopolitical alliances via United Nations voting patterns, language bias in the context of multilingual story completion, and gender-related affinities as revealed by responses to the World Values Survey. Results indicate that while the LLMs are aligned to be neutral and impartial, they still show biases and affinities of different types.

2602.11453 2026-06-17 cs.IR cs.AI cs.LG Version update

From Noise to Order: Learning to Rank via Denoising Diffusion

Sajad Ebrahimi, Bhaskar Mitra, Negar Arabzadeh, Ye Yuan, Haolun Wu, Fattane Zarrinkalam, Ebrahim Bagheri

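Affiliations: University of Guelph; Independent Researcher; University of California, Berkeley; McGill University; University of Toronto

AI summary: Proposes DiffusionRank, a generative learning-to-rank model based on denoising diffusion that models the joint distribution of feature vectors and relevance labels, and outperforms traditional discriminative methods on four standard LTR datasets.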

Abstract

Learning-to-rank (LTR) methods have traditionally been limited to discriminative machine learning approaches that model the probability of the document being relevant to the query given some feature representation of the query-document pair. We propose an alternative denoising diffusion-based generative approach to LTR that instead models the full joint distribution over features and relevance labels. While in discriminative LTR, an over-parameterized ranking model may find different ways to fit the training data, we posit that candidate solutions that can explain the full data distribution under the generative setting may be better at estimating relevance. Thus, we propose DiffusionRank, which extends TabDiff, an existing diffusion model for tabular datasets, to create generative alternatives to classical discriminative pointwise and pairwise LTR objectives. Our work demonstrates improvements from DiffusionRank over discriminative counterparts on four standard LTR datasets and points to a rich space for future exploration to leverage ongoing advancements in deep generative models for LTR. Our code is publicly available at https://github.com/sadjadeb/DiffusionRank.
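
For readers less familiar with the discriminative baselines the abstract refers to, the following is a minimal sketch of one pointwise and one pairwise objective. It is not DiffusionRank code: the choice of binary cross-entropy and a RankNet-style logistic loss is an assumption (the abstract does not say which objectives the paper uses), and all function names are illustrative.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def pointwise_loss(score: float, label: int) -> float:
    """Binary cross-entropy on a single query-document pair (label is 0 or 1)."""
    p = sigmoid(score)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


def pairwise_loss(score_relevant: float, score_irrelevant: float) -> float:
    """RankNet-style logistic loss: small when the relevant document scores higher."""
    return math.log(1.0 + math.exp(-(score_relevant - score_irrelevant)))


# The pairwise loss only depends on the score difference within a pair.
correct_order = pairwise_loss(2.0, -1.0)
wrong_order = pairwise_loss(-1.0, 2.0)
```

Both losses model relevance given the features; the generative approach described above instead models features and labels jointly.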

2512.01241 2026-06-17 cs.CY cs.AI Version update

First, do NOHARM: towards clinically safe large language models

David Wu, Fateme Nateghi Haredasht, Saloni Kumar Maharaj, Priyank Jain, Jessica Tran, Matthew Gwiazdon, Arjun Rustagi, Jenelle Jindal, Jacob M. Koshy, Vinay Kadiyala, Anup Agarwal, Bassman Tappuni, Brianna French, Sirus Jesudasen, Christopher V. Cosgriff, Rebanta Chakraborty, Jillian Caldwell, Susan Ziolkowski, David J. Iberri, Robert Diep, Rahul S. Dalal, Kira L. Newman, Kristin Galetta, J. Carl Pallais, Nancy Wei, Kathleen M. Buchheit, David I. Hong, Vartan Pahalyants, Ernest Y. Lee, Allen Shih, Tamara B. Kaplan, Vishnu Ravi, Sarita Khemani, Thomas A. Buckley, April S. Liang, Daniel Shirvani, Advait Patil, Nicholas Marshall, Kanav Chopra, Joel Koh, Adi Badhwar, Anastasia Perez, Austin J. Schoeffler, Mahbuba Tusty, Chase M. Walton, Liam G. McCoy, David J. H. Wu, Yingjie Weng, Sumant Ranji, Kevin Schulman, Nigam H. Shah, Jason Hom, Arnold Milstein, Arjun K. Manrai, Adam Rodman, Jonathan H. Chen, Ethan Goh

Affiliations: Harvard Combined Dermatology Program; Department of Dermatology, Mass General Brigham; Harvard Medical School; Stanford Center for Biomedical Informatics Research; Stanford University; Division of Hospital Medicine, Department of Medicine, Stanford University School of Medicine; Department of Medicine, Cambridge Health Alliance; Beth Israel Deaconess Hospital–Plymouth; Department of Medicine, University of California, San Francisco; Department of Neurology, Stanford University School of Medicine; Department of Medicine, Beth Israel Deaconess Medical Center; Division of Cardiology, Department of Medicine, Cambridge Health Alliance; Department of Cardiovascular Medicine, Summa Health System; Division of Allergy, Pulmonary, and Critical Care Medicine, Department of Medicine, University of Wisconsin-Madison; Division of Pulmonary and Critical Care Medicine, Department of Medicine, Massachusetts General Hospital; Center for Immunology and Inflammatory Diseases, Department of Medicine, Massachusetts General Hospital; Broad Institute of MIT and Harvard; Division of Pulmonary, Critical Care, and Sleep Medicine, Cambridge Health Alliance

AI summary: Presents the NOHARM benchmark of 1,100 primary care-to-specialist consultation cases and uses it to evaluate the safety of medical recommendations from 28 LLMs. Recommendations carried the potential for severe harm in up to 22.6% of cases, with errors of omission accounting for more than 80% of severe errors.

Abstract

Large language models (LLMs) are routinely used by physicians and patients for medical advice, yet their clinical safety profiles remain poorly characterized. We present NOHARM (Numerous Options Harm Assessment for Risk in Medicine), a 1,100-task benchmark of primary care-to-specialist consultation cases to measure the frequency and severity of harm from LLM-generated medical recommendations. NOHARM covers 10 specialties, with 12,747 expert annotations for 4,249 clinical management options. Across 28 LLMs, recommendations carried the potential for severe harm in up to 22.6% of cases, with errors of omission accounting for more than 80% of severe errors. In a randomized trial of 101 generalist physicians, human benchmark performance significantly improved with AI assistance, yet physicians remained far from realizing the potential of AI tools, frequently ignoring essential advice surfaced by AI. Safety performance tracked general-intelligence and medical-knowledge benchmarks across the full range of models but decoupled at the frontier. Despite strong performance on existing evaluations, widely used AI models can produce medical advice with the potential for severe harm at non-trivial rates, highlighting the importance of explicit measurement of clinical safety.

2510.04421 2026-06-17 stat.ML cs.LG math.ST stat.TH Version update

Learning Survival Models with Right-Censored Reporting Delays

Yuta Shikuri, Hironori Fujisawa

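Affiliations: The Graduate University for Advanced Studies; Tokio Marine Holdings, Inc.; The Institute of Statistical Mathematics; RIKEN

AI summary: Addresses right-censoring of survival data caused by reporting delays by jointly modelling parametric hazards for the event and reporting processes. Proposes a consistent estimator and a Monte Carlo EM algorithm, and uses transfer learning to improve the accuracy of timely risk evaluation under administrative censoring.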

Comments: 26 pages, 3 figures, 3 tables

Abstract

Survival analysis provides statistical methods to model the time until an event occurs. Reporting delays arise when event times are not observed at their occurrence but are only revealed upon reporting. This issue is particularly critical for timely risk evaluation when the observation window is short due to administrative censoring. In this study, we incorporate right-censored reporting delays by jointly modeling parametric hazards for the event and reporting processes. We then construct a consistent estimator for the model parameters and develop a Monte Carlo expectation-maximization algorithm to compute it. To address the challenges posed by administrative censoring, we leverage these findings and propose a transfer-learning procedure. Experimental results demonstrate that our method improves the accuracy of timely risk evaluation under administrative censoring.
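
To make the reporting-delay problem concrete, here is a small simulation sketch. It assumes constant (exponential) hazards for both the event and the reporting process, which is only the simplest instance of the parametric hazards the abstract mentions; the rates, window length, and function names are hypothetical and this is not the paper's estimator.

```python
import random


def simulate(n, event_rate, report_rate, window, seed=0):
    """Count events that occurred inside the window and those also reported inside it."""
    rng = random.Random(seed)
    occurred = reported = 0
    for _ in range(n):
        event_time = rng.expovariate(event_rate)   # time until the event
        delay = rng.expovariate(report_rate)       # time from event to report
        if event_time <= window:
            occurred += 1
            # The event is visible to the analyst only if its report also
            # arrives before the observation window closes.
            if event_time + delay <= window:
                reported += 1
    return occurred, reported


n = 10_000
occurred, reported = simulate(n, event_rate=0.5, report_rate=1.0, window=1.0)
true_fraction = occurred / n    # events that really happened in the window
naive_fraction = reported / n   # what a delay-unaware analysis would see
```

The gap between the two fractions is the under-counting that a delay-unaware analysis of a short observation window would suffer.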

2501.09876 2026-06-17 math.NA cs.LG cs.NA Version update

Geometry-Preserving Encoder/Decoder in Latent Generative Models

Wonjun Lee, Riley C. W. O'Neill, Dongmian Zou, Jeff Calder, Gilad Lerman

Affiliations: Department of Mathematics, The Ohio State University; Department of Mathematics, University of Minnesota; Zu Chongzhi Center for Mathematics and Computational Sciences, Duke Kunshan University

AI summary: Proposes a new geometry-preserving encoder/decoder framework that preserves the geometric structure of the data distribution, enabling more efficient training and faster convergence in latent diffusion models.

Comments: 50 pages

Abstract

Generative modeling aims to generate new data samples that resemble a given dataset. When using diffusion models for this task, one of the main challenges is solving the problem in the input space, which tends to be very high-dimensional. To address this, recent approaches solve diffusion models in the latent space through an encoder that maps from the data space to a lower-dimensional latent space, improving training efficiency and achieving state-of-the-art results. The variational autoencoder (VAE) is the most commonly used encoder/decoder framework in this domain, known for its ability to learn latent representations and generate data samples. In this paper, we introduce a novel encoder/decoder framework with theoretical properties distinct from those of the VAE, specifically designed to preserve the geometric structure of the data distribution. We demonstrate the significant advantages of this geometry-preserving encoder in the training process of both the encoder and decoder. Additionally, we provide theoretical results proving convergence of the training process, including convergence guarantees for encoder training, and results showing faster convergence of decoder training when using the geometry-preserving encoder.

2508.02721 2026-06-17 cs.SE cs.AI cs.PL Version update

Blueprint First, Model Second: A Framework for Deterministic LLM Workflow

Libin Qiu, Yuhang Ye, Zhirong Gao, Xide Zou, Junfu Chen, Ziming Gui, Weizhi Huang, Xiaobo Xue, Wenkai Qiu, Kun Zhao

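Affiliations: Alibaba

AI summary: Proposes the "Blueprint First, Model Second" framework, which decouples workflow logic into a source-code blueprint executed by a deterministic engine, with the LLM handling only sub-tasks. On TravelPlanner, the final pass rate improves by 97.6% relative to the ATLAS baseline and constraint violations fall by 96.0%.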

Comments: 12 pages, 7 figures, 6 tables

Abstract

While powerful, the inherent non-determinism of large language model (LLM) agents limits their application in structured operational environments where procedural fidelity and predictable execution are strict requirements. This limitation stems from current architectures that conflate probabilistic, high-level planning with low-level action execution within a single generative process. To address this, we introduce the Source Code Agent framework, a new paradigm built on the "Blueprint First, Model Second" philosophy that decouples workflow logic from the generative model. An expert-defined operational procedure is first codified into a source code-based Execution Blueprint, which is then executed by a deterministic engine. The LLM is strategically invoked as a specialized tool to handle bounded, complex sub-tasks within the workflow, but never to decide the workflow's path. We evaluate on the TravelPlanner benchmark for constraint-aware travel planning. The Source Code Agent achieves a 35.56% final pass rate, a 97.6% improvement over the state-of-the-art ATLAS baseline (18.00%) on the same Claude-Sonnet-4 backbone. Critically, it reduces constraint violations by 96.0% (11 vs 275) while improving execution efficiency by 27.1% (10.2±0.7 steps vs 14.0). Two production incident-diagnosis deployments and additional results on ScienceWorld and ALFWorld confirm that the architecture transfers beyond travel planning to procedurally well-defined, constraint-intensive workflows. Our work enables the verifiable and reliable deployment of autonomous agents in applications governed by strict procedural logic.
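
The percentages in this abstract are relative changes, not percentage-point differences; for example, the pass rate rises by 17.56 percentage points, which is 97.6% of the baseline's 18.00%. The short check below recomputes the three reported figures from the raw numbers quoted in the abstract. The helper name is illustrative.

```python
def relative_change(new: float, old: float) -> float:
    """Percentage change of `new` relative to `old`."""
    return (new - old) / old * 100.0


# Raw figures as quoted in the abstract.
pass_rate_gain = relative_change(35.56, 18.00)   # final pass rate, higher is better
violation_drop = -relative_change(11, 275)       # constraint violations, lower is better
step_drop = -relative_change(10.2, 14.0)         # execution steps, lower is better

print(round(pass_rate_gain, 1), round(violation_drop, 1), round(step_drop, 1))
```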

2507.17188 2026-06-17 cs.NI cs.AI cs.CR Version update

LLM-Aided Joint Secrecy Precoding and Trajectory for RSMA-Based Heterogeneous UAV Networks

Lijie Zheng, Ji He, Shih Yu Chang, Yulong Shen

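Affiliations: School of Computer Science and Technology, Xidian University; Department of Applied Data Science, San Jose State University

AI summary: Addresses secure communication in RSMA-based heterogeneous UAV networks with a hierarchical optimization framework. An inner layer solves secrecy precoding for fixed UAV positions with the SDR-based S2DC algorithm, and an outer layer optimizes trajectories with LLM-guided multi-agent reinforcement learning, trading off secrecy rate against energy efficiency.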

Abstract

This paper investigates secure communications in rate-splitting multiple access (RSMA) enabled heterogeneous UAV networks, where multiple UAVs collaboratively serve ground terminals in the presence of eavesdroppers. By jointly considering secrecy rate maximization and propulsion energy consumption minimization, we formulate a multi-objective optimization problem involving UAV trajectory design, service association, power allocation, and secrecy precoding under mobility, collision-avoidance, service-capacity, and communication constraints. The formulated problem is highly non-convex due to the coupling among UAV trajectories, RSMA transmission variables, and secrecy constraints. To address the resulting non-convex and highly coupled optimization problem, we propose a hierarchical optimization framework. The inner layer uses a semidefinite relaxation (SDR)-based S2DC algorithm combining penalty functions and difference-of-convex (D.C.) programming to solve the secrecy precoding problem with fixed UAV positions. The outer layer introduces a Large Language Model (LLM)-guided heuristic multi-agent reinforcement learning approach (LLM-HeMARL) for trajectory optimization. LLM-HeMARL efficiently incorporates an LLM-generated expert heuristic policy, enabling UAVs to learn energy-aware, security-driven trajectories without the inference overhead of real-time LLM calls. The simulation results show that our method outperforms existing baselines in secrecy rate and energy efficiency, with consistent robustness across varying UAV swarm sizes and random seeds.

2507.11366 2026-06-17 cs.GT cs.LG Version update

Characterizing Nash Equilibria in Zero-Sum Games: A Physics-Inspired, Parallelizable Approach with a Linear Number of Gradient Queries

Taemin Kim, James P. Bailey

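Affiliations: Industrial and Systems Engineering; Rensselaer Polytechnic Institute

AI summary: Proposes an online optimization method inspired by Hamiltonian dynamics that characterizes the set of Nash equilibria of zero-sum games in a linear number of alternating gradient descent iterations. The method is parallelizable, works with arbitrary learning rates, and experimentally outperforms traditional methods by a wide margin.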

Abstract

We study online optimization methods for zero-sum games, a fundamental problem in adversarial learning in machine learning, economics, and many other domains. Traditional methods approximate Nash equilibria (NE) using either regret-based methods (time-average convergence) or contraction-map-based methods (last-iterate convergence). We propose a new method based on Hamiltonian dynamics in physics and prove that it can characterize the set of NE in a finite (linear) number of iterations of alternating gradient descent in the unbounded setting, modulo degeneracy, a first in online optimization. Unlike standard methods for computing NE, our proposed approach can be parallelized and works with arbitrary learning rates, both firsts in algorithmic game theory. Experimentally, we support our results by showing our approach drastically outperforms standard methods.
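
The abstract does not spell out the algorithm, so the sketch below illustrates only the background link between alternating gradient descent and Hamiltonian dynamics, not the paper's method. On the scalar bilinear game f(x, y) = x * y (an assumed toy example), alternating updates conserve the quantity x^2 + y^2 - lr * x * y exactly, much like a symplectic integrator, whereas simultaneous updates spiral away from the equilibrium at the origin. All names and the learning rate are illustrative.

```python
def alternating_gd(x, y, lr, steps):
    """Alternating gradient descent-ascent on f(x, y) = x * y."""
    trajectory = [(x, y)]
    for _ in range(steps):
        x = x - lr * y   # minimising player moves first (df/dx = y)
        y = y + lr * x   # maximising player responds to the updated x (df/dy = x)
        trajectory.append((x, y))
    return trajectory


def simultaneous_gd(x, y, lr, steps):
    """Both players update from the same old iterate."""
    for _ in range(steps):
        x, y = x - lr * y, y + lr * x
    return x, y


def modified_energy(x, y, lr):
    """Quadratic quantity conserved by the alternating update above."""
    return x * x + y * y - lr * x * y


lr = 0.1
traj = alternating_gd(1.0, 0.0, lr, 1000)
energies = [modified_energy(x, y, lr) for x, y in traj]
drift = max(energies) - min(energies)          # stays at floating-point noise level
sim_x, sim_y = simultaneous_gd(1.0, 0.0, lr, 1000)
sim_norm_sq = sim_x ** 2 + sim_y ** 2          # grows by a factor (1 + lr^2) per step
```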

2411.06842 2026-06-17 eess.IV cs.CV Version update

Evaluating Synthetic Data Generation for Domain Generalization in Fetal Brain MRI Segmentation

Vladyslav Zalevskyi, Thomas Sanchez, Margaux Roulet, Busra Bulut, Hélène Lajous, Jordina Aviles Verdera, Sara Neves Silva, Georg Langs, Gregor Kasprian, Roxane Licandro, Jana Hutter, Hamza Kebiri, Meritxell Bach Cuadra

Affiliations: Department of Radiology, Lausanne University Hospital and University of Lausanne (UNIL); CIBM Center for Biomedical Imaging; Institute for Information Processing, Leibniz University Hannover; Department of Biomedical Engineering, School of Biomedical Engineering & Imaging Sciences, King’s College London; Department of Biomedical Imaging and Image-Guided Therapy, Division of Neuroradiology and Musculoskeletal Radiology, Medical University of Vienna; Department of Biomedical Imaging and Image-guided Therapy, Computational Imaging Research Lab (CIR), Medical University of Vienna; Christian Doppler Laboratory for Mathematical Modelling and Simulation of Next-Generation Medical Ultrasound Devices, Medical University of Vienna; Comprehensive Center for Artificial Intelligence in Medicine, Medical University of Vienna; Division of Neuroradiology and Musculoskeletal Radiology, Department of Biomedical Imaging and Image-guided Therapy, Medical University of Vienna

AI summary: Studies domain-randomization-based synthetic data generation to address data heterogeneity and limited annotations in fetal brain MRI segmentation. Focusing on the FetalSynthSeg framework, it shows that Gaussian mixture intensity modelling and intensity clustering improve cross-domain robustness, and reports state-of-the-art performance on several datasets.

Abstract

Fetal brain tissue segmentation from magnetic resonance imaging (MRI) is crucial for studying neurodevelopment, but remains challenging due to data heterogeneity and limited annotations. Domain randomization (DR) has recently emerged as a promising strategy for single-source domain generalization by synthesizing training images with randomized artifacts, contrast, and resolution. In this work, we investigate how to maximize the out-of-domain (OOD) generalization of DR-based methods. We evaluate several synthetic data generation strategies for DR, with a particular focus on our recently proposed framework, FetalSynthSeg. We show that simple Gaussian mixture-based intensity modeling outperforms more complex physics-based simulations, and that intensity clustering (subdividing tissue classes based on intensity) improves OOD robustness. Evaluated on 348 fetal subjects from four sites spanning 0.55-3T and both T1w and T2w contrasts, FetalSynthSeg reaches state-of-the-art performance on several FeTA 2024 testing datasets (80-85 Dice score) and, for the first time, offers robust segmentation on modalities other than T2w for fetal brain segmentation (80 Dice on dHCP-T1w dataset). Compared with state-of-the-art methods such as BOUNTI, nnU-Net ensemble, and the FeTA 2024 winner, FetalSynthSeg delivers comparable or superior accuracy while maintaining strong robustness across domain shifts. Our code, model weights, and Docker image ready for easy inference are available at https://hub.docker.com/r/vzalevskyi/fetalsynthseg.
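
As a rough illustration of the two ideas named in the abstract, the sketch below generates a synthetic image from a label map by drawing a random Gaussian intensity distribution per tissue label, and computes a Dice score on the 0-100 scale used above. It is a toy 2D stand-in under stated assumptions, not the FetalSynthSeg implementation: the intensity ranges, the tiny label map, and all function names are hypothetical.

```python
import random


def synthesize(label_map, num_labels, rng):
    """Draw a random mean/std per tissue label, then sample each voxel's intensity."""
    means = [rng.uniform(0.0, 255.0) for _ in range(num_labels)]
    stds = [rng.uniform(1.0, 25.0) for _ in range(num_labels)]
    return [[rng.gauss(means[lab], stds[lab]) for lab in row] for row in label_map]


def dice(pred, target, label):
    """Dice overlap for one label, reported on a 0-100 scale."""
    p = [v == label for row in pred for v in row]
    t = [v == label for row in target for v in row]
    overlap = sum(a and b for a, b in zip(p, t))
    denom = sum(p) + sum(t)
    return 100.0 * 2 * overlap / denom if denom else 100.0


labels = [[0, 0, 1, 1],
          [0, 1, 1, 2],
          [0, 1, 2, 2]]
# Each call with a new seed yields a differently contrasted training image
# for the same anatomy, which is the core of domain randomization.
image = synthesize(labels, num_labels=3, rng=random.Random(0))

prediction = [[0, 0, 1, 1],
              [0, 1, 1, 1],
              [0, 1, 2, 2]]
score = dice(prediction, labels, label=2)
```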

2501.00826 2026-06-17 q-fin.TR cs.AI Version update

LLM-Powered Multi-Agent System for Automated Crypto Portfolio Management

Yichen Luo, Yebo Feng, Jiahua Xu, Paolo Tasca, Yang Liu

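Affiliations: University College London; Nanyang Technological University; Exponential Science

AI summary: Proposes a three-agent system (market, news, and trading agents) that fuses multi-modal signals through hierarchical, collaborative, and debate architectures. In a backtest over 2025 it achieves a 133.52% cumulative return and a Sharpe ratio of 1.502, outperforming single-agent and deep learning baselines.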

Abstract

Cryptocurrency portfolio management requires the fusion of heterogeneous multi-modal signals, including structured price and on-chain time series, unstructured news text, and technical indicators, under high-volatility and real-time constraints. While deep learning approaches show predictive capability, their opacity limits practical adoption, and single large language model (LLM) agents struggle to process the breadth of modality-specific inputs needed for robust decision-making. We propose a multi-agent system (MAS) framework in which three modality-specialised agents, a Crypto Agent for market dynamics, a News Agent for weekly news sentiment, and a Trading Agent for signal fusion and portfolio execution, decompose the task across three communication architectures: hierarchical, collaborative, and debate. We evaluate four capability configurations: zero-shot, chain-of-thought (CoT), retrieval-augmented generation (RAG), and skill-augmented. In a 52-week backtest over calendar year 2025 across the top 15 L1 blockchain native cryptocurrencies by market capitalisation as of January 2025, the best configuration, Hierarchical (Skill), achieves a cumulative return of 133.52% and a Sharpe ratio of 1.502, outperforming single-agent variants, passive benchmarks, and deep learning baselines. An ablation study identifies the Crypto Agent as the most critical component, with its removal reducing cumulative return by 42.57 percentage points. A cross-model comparison further shows that MAS outperforms the single-agent baseline under GPT-4o, GPT-5, and Claude Sonnet 4.5, suggesting that the benefit of multi-agent coordination is model-agnostic. Unlike black-box deep learning models, every portfolio decision is traceable to explicit agent reasoning, offering an interpretable and effective approach to multi-modal cryptocurrency portfolio management.
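
For reference, the two headline metrics can be computed from a series of periodic returns as sketched below. The conventions here are assumptions, since the abstract does not state them: weekly returns, a zero risk-free rate, and annualisation by the square root of 52. The toy return series is made up and unrelated to the paper's results.

```python
import math
from statistics import mean, stdev


def cumulative_return(returns):
    """Compound a sequence of periodic simple returns."""
    growth = 1.0
    for r in returns:
        growth *= 1.0 + r
    return growth - 1.0


def sharpe_ratio(returns, periods_per_year=52, risk_free=0.0):
    """Annualised Sharpe ratio from periodic returns (sample standard deviation)."""
    excess = [r - risk_free for r in returns]
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


toy_returns = [0.10, -0.05, 0.10, -0.05]   # hypothetical weekly returns
total = cumulative_return(toy_returns)      # 1.1 * 0.95 * 1.1 * 0.95 - 1
ratio = sharpe_ratio(toy_returns)
```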

2407.13053 2026-06-17 cs.CY cs.AI cs.CL cs.LG Version update

E2Vec: Feature Embedding with Temporal Information for Analyzing Student Actions in E-Book Systems

Yuma Miyazaki, Valdemar Švábenský, Yuta Taniguchi, Fumiya Okubo, Tsubasa Minematsu, Atsushi Shimada

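Affiliations: Kyushu University

AI summary: Proposes E2Vec, which uses word embeddings to turn operation logs and time intervals into student vectors for an at-risk detection task, showing potential for generalizability and performance.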

Comments: Research paper published in the Proceedings of the 17th Educational Data Mining Conference (EDM 2024), see https://doi.org/10.5281/zenodo.12729853

Abstract

Digital textbook (e-book) systems record student interactions with textbooks as a sequence of events called EventStream data. In the past, researchers extracted meaningful features from EventStream, and utilized them as inputs for downstream tasks such as grade prediction and modeling of student behavior. Previous research evaluated models that mainly used statistical-based features derived from EventStream logs, such as the number of operation types or access frequencies. While these features are useful for providing certain insights, they lack temporal information that captures fine-grained differences in learning behaviors among different students. This study proposes E2Vec, a novel feature representation method based on word embeddings. The proposed method regards operation logs and their time intervals for each student as a string sequence of characters and generates a student vector of learning activity features that incorporates time information. We applied fastText to generate an embedding vector for each of 305 students in a dataset from two years of computer science courses. Then, we investigated the effectiveness of E2Vec in an at-risk detection task, demonstrating potential for generalizability and performance.
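
The encoding step described above, turning operations and the time gaps between them into a character string, might look roughly like the following. This is a hypothetical sketch: the operation names, the character codes, and the interval thresholds are all invented for illustration, and the abstract does not specify the actual alphabet E2Vec uses. The resulting strings would then be passed to a word-embedding model such as fastText, which is not shown here.

```python
# Hypothetical mapping from e-book operations to single characters.
OPERATION_CODES = {"OPEN": "O", "NEXT": "N", "PREV": "P", "ADD_MARKER": "M", "CLOSE": "C"}


def interval_code(seconds):
    """Bucket the gap between two operations into short / medium / long."""
    if seconds < 10:
        return "s"
    if seconds < 300:
        return "m"
    return "l"


def encode(events):
    """events: list of (timestamp_in_seconds, operation), sorted by time."""
    chars = []
    previous = None
    for timestamp, operation in events:
        if previous is not None:
            chars.append(interval_code(timestamp - previous))
        chars.append(OPERATION_CODES[operation])
        previous = timestamp
    return "".join(chars)


log = [(0, "OPEN"), (5, "NEXT"), (65, "NEXT"), (665, "ADD_MARKER"), (670, "CLOSE")]
encoded = encode(log)
print(encoded)  # OsNmNlMsC
```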

2208.03023 2026-06-17 eess.AS cs.SD Version update

AID: Open-source Anechoic Interferer Dataset

Philipp Götz, Cagdas Tuna, Andreas Walther, Emanuël A. P. Habets

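Affiliations: International Audio Laboratories Erlangen; Fraunhofer Institute for Integrated Circuits IIS

AI summary: Presents a dataset of anechoic recordings of various sound sources found in domestic environments, intended as non-stationary environmental noise signals for simulating complex acoustic scenes, together with a Python library for generating random mixtures as interference signals.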

Comments: Accepted for publication at IWAENC 2022

Abstract

A dataset of anechoic recordings of various sound sources encountered in domestic environments is presented. The dataset is intended to be a resource of non-stationary, environmental noise signals that, when convolved with acoustic impulse responses, can be used to simulate complex acoustic scenes. Additionally, a Python library is provided to generate random mixtures of the recordings in the dataset, which can be used as non-stationary interference signals.
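
The two uses described above, convolving a dry recording with an impulse response and mixing several recordings at random, can be sketched in plain Python as follows. This is not the API of the library that accompanies the dataset; the function names, the gain range, and the toy signals are assumptions made for illustration.

```python
import random


def convolve(signal, impulse_response):
    """Direct-form linear convolution of a dry signal with an impulse response."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def random_mixture(recordings, num_sources, rng):
    """Sum randomly chosen recordings with random gains, truncated to the shortest one."""
    chosen = rng.sample(recordings, num_sources)
    gains = [rng.uniform(0.5, 1.0) for _ in chosen]
    length = min(len(r) for r in chosen)
    return [sum(g * r[n] for g, r in zip(gains, chosen)) for n in range(length)]


dry = [1.0, 0.0, 0.0, 0.5]    # toy anechoic recording
room = [1.0, 0.0, 0.25]       # toy impulse response: direct path plus one echo
wet = convolve(dry, room)

recordings = [dry, [0.2, 0.1, 0.0, 0.3, 0.4], [0.0, 1.0]]
mixture = random_mixture(recordings, num_sources=2, rng=random.Random(0))
```

For real audio, an FFT-based convolution would be the practical choice; the direct form is used here only to keep the sketch dependency-free.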

2502.17773 2026-06-17 stat.ME cs.AI cs.LG

How Many Human Survey Respondents is a Large Language Model Worth? An Uncertainty Quantification Perspective

Chengpiao Huang, Yuhang Wu, Kaizheng Wang

Affiliations: Department of IEOR, Columbia University; Decision, Risk, and Operations Division, Columbia Business School; Department of IEOR and Data Science Institute, Columbia University

AI summary: Proposes a framework that converts LLM-simulated survey responses into reliable confidence sets for population parameters of human responses by quantifying the uncertainty from human-LLM misalignment. A data-driven procedure adaptively chooses the number of simulated responses to achieve nominal average-case coverage, and the chosen sample size also reflects the effective human population size the LLM can represent. Experiments show that simulation fidelity varies across LLMs and domains.

Comments: 63 pages, 13 figures

Abstract

Large language models (LLMs) are increasingly used to simulate survey responses, but synthetic data can be misaligned with the human population, leading to unreliable inference. We develop a general framework that converts LLM-simulated responses into reliable confidence sets for population parameters of human responses, quantifying the uncertainty induced by the human-LLM misalignment. The key design choice is the number of simulated responses: too many produce overly narrow sets with poor coverage, while too few yield overly wide and uninformative sets dominated by stochastic noise. We propose a data-driven approach that adaptively selects the simulation sample size to achieve nominal average-case coverage, regardless of the LLM's simulation fidelity or the confidence set construction procedure. The selected sample size is further shown to reflect the effective human population size that the LLM can represent, providing a quantitative measure of its simulation fidelity. Experiments on real survey datasets reveal heterogeneous simulation fidelity across different LLMs and domains.
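
The trade-off in the number of simulated responses can be seen with a back-of-the-envelope calculation. The sketch below assumes a hypothetical yes/no question where 50% of humans answer yes but the LLM answers yes 60% of the time, and builds a standard 95% Wald interval around the LLM's rate (using its expected value rather than a random sample, to keep the example deterministic). It illustrates the problem only and is not the paper's sample-size selection procedure.

```python
import math


def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation confidence interval for a proportion."""
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


human_rate = 0.50   # hypothetical true rate in the human population
llm_rate = 0.60     # hypothetical rate among LLM-simulated respondents


def covers(n):
    """Does the interval built from n simulated responses contain the human rate?"""
    low, high = wald_interval(llm_rate, n)
    return low <= human_rate <= high


few = covers(50)        # wide interval, still contains 0.50
many = covers(10_000)   # narrow interval centred on the wrong value
```

With more simulated responses the interval tightens around the LLM's rate, so a fixed misalignment eventually pushes the human rate outside it.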

2501.12709 2026-06-17 quant-ph cs.AI cs.CR cs.DC

Experimentally validated quantum-secure federated learning over a multi-user quantum network

在多用户量子网络上实验验证的量子安全联邦学习

Zhi-Ping Liu, Xiao-Yu Cao, Hao-Wen Liu, Xiao-Ran Sun, Yu Bao, Jian-Yu Shen, Yu-Shuo Lu, Hua-Lei Yin, Zeng-Bing Chen

Affiliations: National Laboratory of Solid State Microstructures and School of Physics, Collaborative Innovation Center of Advanced Microstructures, Nanjing University, Nanjing 210093, China; School of Physics and Key Laboratory of Quantum State Construction and Manipulation (Ministry of Education), Renmin University of China, Beijing 100872, China

AI summary: This paper proposes the QuNetQFL protocol, which masks local model updates with distributed quantum secret keys to achieve information-theoretically secure aggregation. The protocol is experimentally validated on a four-client quantum network, improves classification accuracy, and shows scalability on language tasks and in large-scale simulations.

Comments 25 pages, 7 figures, 7 tables, Accepted by Research

Journal ref
Research 9, 1299 (2026)

English abstract

Federated learning enables decentralized, privacy-preserving training but remains vulnerable to privacy leakage in the quantum era. Quantum federated learning (QFL) offers a promising path towards enhanced security and efficiency. However, a practical and experimentally validated QFL protocol utilizing near-term quantum techniques to address data privacy has been lacking. Here we present QuNetQFL, a QFL protocol implemented on quantum networks, in which local model updates are masked with distributed quantum secret keys, offering information-theoretic security during aggregation. We experimentally validate the protocol on a four-client quantum network and benchmark its performance using the generated keys on quantum and real-world datasets. Adding a single quantum client significantly improves global accuracy for classifying multipartite entangled and non-stabilizer quantum datasets. For language tasks, we apply QuNetQFL to sentiment analysis by federated fine-tuning of a hybrid classical-quantum language model, achieving comparable and robust performance in simulation and on real quantum hardware. Large-scale simulations further demonstrate scalability to 200 clients for handwritten-digit recognition, with rapid convergence and a $75\%$ reduction in communication cost via model compression. Our work establishes a practical and scalable route to quantum-secure federated learning for the emerging quantum internet.
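The idea of masking local updates with shared secret keys so that only the aggregate is revealed can be sketched as follows. The abstract does not give the exact masking scheme, so this sketch assumes a common construction: pairwise one-time-pad masks that cancel in the modular sum. Keys drawn from Python's `secrets` module stand in for keys distributed over the quantum network, updates are assumed to be quantized to non-negative integers, and all names are hypothetical.

```python
import secrets

MODULUS = 2 ** 32  # all arithmetic is done modulo this value


def pairwise_keys(n_clients, length):
    """One random key vector per client pair (i, j) with i < j.
    Stand-in for keys shared between clients over the quantum network."""
    return {
        (i, j): [secrets.randbelow(MODULUS) for _ in range(length)]
        for i in range(n_clients)
        for j in range(i + 1, n_clients)
    }


def mask_update(i, update, keys, n_clients):
    """Client i adds the key shared with every higher-indexed client and
    subtracts the key shared with every lower-indexed client."""
    masked = list(update)
    for j in range(n_clients):
        if j == i:
            continue
        key = keys[(min(i, j), max(i, j))]
        sign = 1 if i < j else -1
        masked = [(m + sign * k) % MODULUS for m, k in zip(masked, key)]
    return masked


def aggregate(masked_updates):
    """Server-side sum: every pairwise mask appears once with each sign,
    so the masks cancel and only the sum of the true updates remains."""
    return [sum(column) % MODULUS for column in zip(*masked_updates)]
```

Each masked vector looks uniformly random on its own, so the server learns the sum of the updates but not any individual client's update, provided the true sum stays below `MODULUS`.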

2604.13662 2026-06-17 cond-mat.mes-hall cs.CV cs.LG

Automatic Charge State Tuning of 300 mm FDSOI Quantum Dots Using Neural Network Segmentation of Charge Stability Diagram

300毫米FDSOI量子点自动电荷状态调节:基于神经网络的电荷稳定性图分割

Peter Samaha, Amine Torki, Ysaline Renaud, Sam Fiette, Emmanuel Chanrion, Pierre-Andre Mortemousque, Yann Beilliard

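Affiliations: CEA-Leti (Univ. Grenoble Alpes)

AI summary: This paper proposes a deep-learning-based semantic segmentation pipeline that automates charge tuning of quantum dots by identifying transition lines in charge stability diagrams, a step toward high-throughput charge tuning of silicon quantum-dot qubits.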

Comments 10 pages, 6 figures, supplementary materials available

English abstract

Tuning of gate-defined semiconductor quantum dots (QDs) is a major bottleneck for scaling spin qubit technologies. We present a deep learning (DL)-driven semantic-segmentation pipeline that performs charge auto-tuning by locating transition lines in full charge stability diagrams (CSDs) and returns gate voltage targets for the single-charge regime. We assemble and manually annotate a large, heterogeneous dataset of 1015 experimental CSDs measured from silicon QD devices, spanning nine design geometries, multiple wafers, and fabrication runs. A U-Net style convolutional neural network (CNN) with a MobileNetV2 encoder is trained and validated through five-fold group cross validation. Our model achieves an overall offline tuning success of 80.0% in locating the single-charge regime, with peak performance exceeding 88% for some designs. We analyze dominant failure modes and propose targeted mitigations. Finally, wide-range diagram segmentation also naturally enables scalable physics-based feature extraction that can feed back to fabrication and design workflows, and we outline a roadmap for real-time integration in a cryogenic wafer prober. Overall, our results show that neural network (NN) based wide-diagram segmentation is a practical step toward automated, high-throughput charge tuning for silicon QD qubits.

2506.07917 2026-06-17 cs.GR cs.CV

SpeeDe3DGS: Speedy Deformable 3D Gaussian Splatting with Temporal Pruning and Motion Grouping

SpeeDe3DGS:通过时间修剪和运动分组实现快速变形3D高斯点拨

Allen Tu, Haiyang Ying, Alex Hanson, Yonghan Lee, Tom Goldstein, Matthias Zwicker

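Affiliations: University of Maryland, College Park

AI summary: This paper proposes SpeeDe3DGS, which uses Temporal Sensitivity Pruning, Temporal Sensitivity Sampling, and the GroupFlow module to substantially speed up rendering and training of deformable 3DGS while maintaining high-quality reconstruction.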

Comments Project Page: https://speede3dgs.github.io/

Journal ref
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026, pp. 26083-26093

English abstract

Dynamic extensions of 3D Gaussian Splatting (3DGS) achieve high-quality reconstructions through neural motion fields, but per-Gaussian neural inference makes these models computationally expensive. Building on DeformableGS, we introduce Speedy Deformable 3D Gaussian Splatting (SpeeDe3DGS), which bridges this efficiency-fidelity gap through three complementary modules: Temporal Sensitivity Pruning (TSP) removes low-impact Gaussians via temporally aggregated sensitivity analysis, Temporal Sensitivity Sampling (TSS) perturbs timestamps to suppress floaters and improve temporal coherence, and GroupFlow distills the learned deformation field into shared SE(3) transformations for efficient groupwise motion. On the 50 dynamic scenes in MonoDyGauBench, integrating TSP and TSS into DeformableGS accelerates rendering by 6.78$\times$ on average while maintaining neural-field fidelity and using 10$\times$ fewer primitives. Adding GroupFlow culminates in 13.71$\times$ faster rendering and 2.53$\times$ shorter training, surpassing all baselines in speed while preserving superior image quality.
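The pruning step described above removes Gaussians whose impact is low once sensitivity is aggregated over time. A minimal sketch of that idea follows. The abstract does not define the sensitivity score or the aggregation, so the score is taken as given and summed over timesteps, both of which are assumptions. The function name is hypothetical.

```python
def temporal_prune(sensitivity, keep_ratio):
    """Keep the Gaussians with the highest time-aggregated sensitivity.

    sensitivity[t][g] is an assumed per-timestep score of Gaussian g.
    Returns the sorted indices of the Gaussians that are kept.
    """
    n = len(sensitivity[0])
    # Aggregate over time so that a Gaussian that matters in only a few
    # frames is still credited for those frames.
    totals = [sum(frame[g] for frame in sensitivity) for g in range(n)]
    n_keep = max(1, round(n * keep_ratio))
    ranked = sorted(range(n), key=lambda g: totals[g], reverse=True)
    return sorted(ranked[:n_keep])
```

Aggregating before ranking is the point of the sketch: ranking on a single frame could discard a Gaussian that is important only later in the sequence.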

2603.19801 2026-06-17 eess.IV cs.AI cs.CV

Offshore oil and gas platform dynamics in the North Sea, Gulf of Mexico, and Persian Gulf: Exploiting the Sentinel-1 archive

北海、墨西哥湾和波斯湾的海上石油和天然气平台动态:利用Sentinel-1档案

Robin Spanier, Thorsten Hoeser, John Truckenbrodt, Felix Bachofer, Claudia Kuenzer

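Affiliations: German Remote Sensing Data Center, Earth Observation Center, EOC of the German Aerospace Center, DLR; Institute for Geography and Geology, Department of Remote Sensing, University of Würzburg

AI summary: Using Sentinel-1 data and deep learning, this paper studies offshore platform dynamics in the North Sea, the Gulf of Mexico, and the Persian Gulf. It reveals changes in platform numbers and structural shifts in the sector, and provides data to support the monitoring of marine infrastructure.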

Comments 16 pages, 10 figures, 1 table

Journal ref
Big Earth Data, 2026, 1-27

English abstract

The increasing use of marine spaces by offshore infrastructure, including oil and gas platforms, underscores the need for consistent, scalable monitoring. Offshore development has economic, environmental, and regulatory implications, yet maritime areas remain difficult to monitor systematically due to their inaccessibility and spatial extent. This study presents an automated approach to the spatiotemporal detection of offshore oil and gas platforms based on freely available Earth observation data. Leveraging Sentinel-1 archive data and deep learning-based object detection, a consistent quarterly time series of platform locations was created for the period 2017-2025 for three major production regions: the North Sea, the Gulf of Mexico, and the Persian Gulf. In addition, platform size, water depth, distance to the coast, national affiliation, and installation and decommissioning dates were derived. In 2025, 3,728 offshore platforms were identified: 356 in the North Sea, 1,641 in the Gulf of Mexico, and 1,731 in the Persian Gulf. While expansion was observed in the Persian Gulf until 2024, the Gulf of Mexico and the North Sea saw a decline in platform numbers from 2018 to 2020. At the same time, a pronounced dynamic was apparent. More than 2,700 platforms were installed or relocated to new sites, while a comparable number were decommissioned or relocated. Furthermore, the increasing number of platforms with short lifespans points to a structural change in the offshore sector associated with the growing importance of mobile offshore units such as jack-ups or drillships. The results highlight the potential of freely available Earth observation data and deep learning for consistent, long-term monitoring of marine infrastructure. The derived dataset is public and provides a basis for offshore monitoring, maritime planning, and analyses of the transformation of the offshore energy sector.

2602.00473 2026-06-17 quant-ph cs.AI cs.LG

Quantum Phase Recognition via Quantum Attention Mechanism

通过量子注意机制进行量子相识别

Jin-Long Chen, Xin Li, Zhang-Qi Yin

Affiliations: Center for Quantum Technology Research and Key Laboratory of Advanced Optoelectronic Quantum Architecture and Measurements (MOE), School of Physics, Beijing Institute of Technology, Beijing 100081, China

AI summary: This paper proposes a hybrid quantum-classical attention model that uses swap tests and a parameterized quantum circuit to extract correlations in quantum states and classify ground states. On the cluster-Ising model with 9 and 15 qubits, it shows high accuracy and robustness.

Comments 10 pages, 7 figures

Journal ref
Phys. Rev. A 113, 062403 (2026)

English abstract

Quantum phase transitions in many-body systems are fundamentally characterized by complex correlation structures, which pose computational challenges for conventional methods in large systems. To address this, we propose a hybrid quantum-classical attention model. This model uses an attention mechanism, realized through swap tests and a parameterized quantum circuit, to extract correlations within quantum states and perform ground-state classification. Benchmarked on the cluster-Ising model with system sizes of 9 and 15 qubits, the model achieves high classification accuracy with fewer than 100 training samples and demonstrates robustness against variations in the training set. Further analysis reveals that the model successfully captures phase-sensitive features and characteristic physical length scales, offering a scalable and data-efficient approach for quantum phase recognition in complex many-body systems.
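The swap test mentioned above measures the overlap between two states: the ancilla qubit is found in state 0 with probability 1/2 + |⟨a|b⟩|²/2. The sketch below evaluates that standard formula classically for small state vectors. It shows only the similarity score a swap test provides and says nothing about how the paper wires it into its attention mechanism. Function names are hypothetical.

```python
import math


def overlap_squared(a, b):
    """|<a|b>|^2 for two normalized state vectors given as amplitude lists."""
    inner = sum(x.conjugate() * y for x, y in zip(a, b))
    return abs(inner) ** 2


def swap_test_p0(a, b):
    """Probability that the swap-test ancilla is measured in state 0."""
    return 0.5 * (1.0 + overlap_squared(a, b))


def estimate_overlap(p0):
    """Invert the swap-test relation to recover |<a|b>|^2 from an
    estimated ancilla probability, clipped at zero against shot noise."""
    return max(0.0, 2.0 * p0 - 1.0)
```

Identical states give a probability of 1 and orthogonal states give 1/2, so the measured frequency acts as a similarity score between the two states.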

2511.03876 2026-06-17 eess.IV cs.CV cs.LG physics.med-ph

Computed Tomography (CT)-derived Cardiovascular Flow Estimation Using Physics-Informed Neural Networks Improves with Sinogram-based Training: A Simulation Study

基于CT的心血管血流估计利用物理信息神经网络,通过sinogram训练提升:一项模拟研究

Jinyuxuan Guo, Gurnoor Singh Khurana, Alejandro Gonzalo Grande, Juan C. del Alamo, Francisco Contijoch

Affiliations: Dept. of Bioengineering, University of California San Diego; Dept. of Computer Science Engineering, University of California San Diego; Dept. of Mechanical Engineering, Univ of Washington; Depts of Mechanical Engineering and Cardiology, Univ. of Washington; Depts. of Bioengineering, Radiology, University of California San Diego

AI summary: This study evaluates how CT imaging affects blood flow estimation based on physics-informed neural networks (PINNs) and proposes an improved framework, SinoFlow, that estimates flow directly from sinogram data. The results show that SinoFlow performs better because it avoids the errors introduced by filtered backprojection.

English abstract

Background: Non-invasive imaging-based assessment of blood flow plays a critical role in evaluating heart function and structure. Computed Tomography (CT) is a widely used imaging modality that can robustly evaluate cardiovascular anatomy and function, but direct methods to estimate blood flow velocity from movies of contrast evolution have not been developed. Purpose: This study evaluates the impact of CT imaging on Physics-Informed Neural Networks (PINN)-based flow estimation and proposes an improved framework, SinoFlow, which uses sinogram data directly to estimate blood flow. Methods: We generated pulsatile flow fields in an idealized 2D vessel bifurcation using computational fluid dynamics and simulated CT scans with varying gantry rotation speeds, tube currents, and pulse mode imaging settings. We compared the performance of PINN-based flow estimation using reconstructed images (ImageFlow) to SinoFlow. Results: SinoFlow significantly improved flow estimation performance by avoiding the propagation of errors introduced by filtered backprojection. SinoFlow was robust across all tested gantry rotation speeds and consistently produced lower mean squared error and velocity errors than ImageFlow. Additionally, SinoFlow was compatible with pulsed-mode imaging and maintained higher accuracy with shorter pulse widths. Conclusions: This study demonstrates the potential of SinoFlow for CT-based flow estimation, providing a more promising approach for non-invasive blood flow assessment. The findings aim to inform future applications of PINNs to CT images and provide a solution for image-based estimation, with reasonable acquisition parameters yielding accurate flow estimates.

2508.10908 2026-06-17 physics.ao-ph cs.LG

Data-driven global ocean model resolving ocean-atmosphere coupling dynamics

数据驱动的全球海洋模型解析海洋-大气耦合动力学

Jeong-Hwan Kim, Daehyun Kang, Young-Min Yang, Jae-Heung Park, Yoo-Geun Ham

Affiliations: Center for Climate and Carbon Cycle Research, Korea Institute of Science and Technology, Seoul, Republic of Korea; Department of Environment and Energy, Jeonbuk National University, Jeonju, Republic of Korea; School of Earth and Environmental Sciences, Seoul National University, Seoul, Republic of Korea; Department of Environmental Management, Seoul National University, Seoul, Republic of Korea

AI summary: This paper presents KIST-Ocean, a model built on a U-shaped visual attention adversarial network that uses partial convolution, adversarial training, and transfer learning to improve ocean prediction. It accurately simulates Kelvin and Rossby wave propagation in the tropical Pacific and the vertical motions induced by cyclonic wind stress, which demonstrates its ability to represent the coupling mechanisms behind climate phenomena.

Comments The manuscript contains 4 main figures. The Extended Data contains 7 figures and 3 tables. The Supplementary Information contains 3 text sections, 7 figures, 1 table

Journal ref
Sci. Adv. 12, eaed1225 (2026)

English abstract

Artificial intelligence has advanced global weather forecasting, outperforming traditional numerical models in both accuracy and computational efficiency. Nevertheless, extending predictions beyond subseasonal timescales requires the development of deep learning (DL)-based ocean-atmosphere coupled models that can realistically simulate complex oceanic responses to atmospheric forcing. This study presents KIST-Ocean, a DL-based global three-dimensional ocean general circulation model using a U-shaped visual attention adversarial network architecture. KIST-Ocean integrates partial convolution, adversarial training, and transfer learning to address coastal complexity and predictive distribution drift in auto-regressive models. Comprehensive evaluations confirmed the model's robust ocean predictive skill and efficiency. Moreover, it accurately captures realistic ocean responses, such as Kelvin and Rossby wave propagation in the tropical Pacific and vertical motions induced by cyclonic and anticyclonic wind stress, demonstrating its ability to represent key ocean-atmosphere coupling mechanisms underlying climate phenomena, including the El Niño-Southern Oscillation. These findings reinforce confidence in DL-based global weather and climate models and support extending DL-based approaches to broader Earth system modeling, offering potential for enhancing climate prediction capabilities.

2506.08654 2026-06-17 physics.med-ph cs.LG

A Privacy-Preserving Federated Learning Framework for Generalizable CBCT to Synthetic CT Translation in Head and Neck

一种保护隐私的联邦学习框架用于头颈区域CBCT到合成CT的可推广转换

Ciro Benito Raggio, Paolo Zaffino, Maria Francesca Spadea

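Affiliations: Institute of Biomedical Engineering; Karlsruhe Institute of Technology; Department of Experimental and Clinical Medicine

AI summary: This paper proposes a cross-institution federated learning framework for CBCT-to-synthetic-CT translation in the head and neck region, producing models that generalize across institutions while preserving data privacy.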

Journal ref
Frontiers in Digital Health, 8:1812254, June 2026

English abstract

Shortened abstract: Cone-beam computed tomography (CBCT) has become a widely adopted modality for image-guided radiotherapy (IGRT). However, CBCT suffers from increased noise, limited soft-tissue contrast, and artifacts, resulting in unreliable Hounsfield unit values and hindering direct dose calculation. Synthetic CT (sCT) generation from CBCT addresses these issues, especially using deep learning (DL) methods. Existing approaches are limited by institutional heterogeneity, scanner-dependent variations, and data privacy regulations that prevent multi-center data sharing. To overcome these challenges, we propose a cross-silo horizontal federated learning (FL) approach for CBCT-to-sCT synthesis in the head and neck region, extending our FedSynthCT framework. A conditional generative adversarial network was collaboratively trained on data from three European medical centers in the public SynthRAD2025 challenge dataset. The federated model demonstrated effective generalization across centers, with mean absolute error (MAE) ranging from $64.38\pm13.63$ to $85.90\pm7.10$ HU, structural similarity index (SSIM) from $0.882\pm0.022$ to $0.922\pm0.039$, and peak signal-to-noise ratio (PSNR) from $32.86\pm0.94$ to $34.91\pm1.04$ dB. Notably, on an external validation dataset of 60 patients, comparable performance was achieved (MAE: $75.22\pm11.81$ HU, SSIM: $0.904\pm0.034$, PSNR: $33.52\pm2.06$ dB) without additional training, confirming robust generalization despite differences in protocol and scanner and despite registration errors. These findings demonstrate the technical feasibility of FL for CBCT-to-sCT synthesis while preserving data privacy and offer a collaborative solution for developing generalizable models across institutions without centralized data sharing or site-specific fine-tuning.
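Two of the reported metrics, MAE in Hounsfield units and PSNR in dB, follow standard definitions that are easy to state in code. The sketch below uses those textbook definitions on flat lists of voxel values. The `data_range` argument is an assumption, because the abstract does not say which intensity range the authors used for PSNR. Function names are hypothetical.

```python
import math


def mae(pred, ref):
    """Mean absolute error between synthetic and reference CT values (HU)."""
    return sum(abs(p - r) for p, r in zip(pred, ref)) / len(ref)


def psnr(pred, ref, data_range):
    """Peak signal-to-noise ratio in dB.

    data_range is the assumed span of valid intensities, for example
    the width of the HU window used for evaluation.
    """
    mse = sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(data_range ** 2 / mse)
```

Because PSNR depends on `data_range`, values are comparable across studies only when the same intensity window is used.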

2501.15351 2026-06-17 cs.CY cs.LG

Fairness in LLM-Generated Surveys

LLM生成调查中的公平性

Andrés Abeliuk, Vanessa Gaete, Naim Bro

Affiliations: Department of Computer Science, University of Chile; National Center for Artificial Intelligence (CENIA); School of Government, Adolfo Ibáñez University; Millennium Institute for Foundational Research on Data (IMFD)

AI summary: The study analyzes LLM performance across different populations and finds better performance on U.S. datasets, with fairness problems that stem from biased training data. It proposes a new measurement framework to improve model fairness.

Journal ref
EPJ Data Science (2026)

English abstract

Large Language Models (LLMs) excel in text generation and understanding, especially in simulating socio-political and economic patterns, serving as an alternative to traditional surveys. However, their global applicability remains questionable due to unexplored biases across socio-demographic and geographic contexts. This study examines how LLMs perform across diverse populations by analyzing public surveys from Chile and the United States, focusing on predictive accuracy and fairness metrics. The results show performance disparities, with LLMs consistently performing better on U.S. datasets. This bias originates from U.S.-centric training data and remains evident after accounting for socio-demographic differences. In the U.S., political identity and race significantly influence prediction accuracy, while in Chile, gender, education, and religious affiliation play more pronounced roles. Our study presents a novel framework for measuring socio-demographic biases in LLMs, offering a path toward ensuring fairer and more equitable model performance across diverse socio-cultural contexts.

2408.15188 2026-06-17 eess.AS cs.CL cs.SD

Infusing Acoustic Pause Context into Text-Based Dementia Assessment

将语音停顿上下文注入基于文本的痴呆症评估

Franziska Braun, Sebastian P. Bayerl, Florian Hönig, Hartmut Lehfeld, Thomas Hillemacher, Tobias Bocklet, Korbinian Riedhammer

Affiliations: Technische Hochschule Nürnberg; Technische Hochschule Rosenheim; Klinik für Psychiatrie und Psychotherapie, Universitätsklinik der Paracelsus Medizinischen Privatuniversität, Klinikum Nürnberg, Germany; KST Institut GmbH, Bad Emstal, Germany

AI summary: This paper studies pause-enriched transcripts in transformer language models to distinguish subjects with no cognitive impairment, mild cognitive impairment, and Alzheimer's dementia, and examines how pause information and acoustic context affect different tasks.

Comments Accepted at INTERSPEECH 2024

Journal ref
Proceedings of Interspeech 2024

English abstract

Speech pauses, alongside content and structure, offer a valuable and non-invasive biomarker for detecting dementia. This work investigates the use of pause-enriched transcripts in transformer-based language models to differentiate the cognitive states of subjects with no cognitive impairment, mild cognitive impairment, and Alzheimer's dementia based on their speech from a clinical assessment. We address three binary classification tasks: onset, monitoring, and dementia exclusion. The performance is evaluated through experiments on a German Verbal Fluency Test and a Picture Description Test, comparing the model's effectiveness across different speech production contexts. Starting from a textual baseline, we investigate the effect of incorporating pause information and acoustic context. We show that the test should be chosen depending on the task and that, similarly, lexical pause information and acoustic cross-attention contribute differently.

2308.08306 2026-06-17 eess.AS cs.SD

Classifying Dementia in the Presence of Depression: A Cross-Corpus Study

在抑郁存在下的痴呆分类:一项跨语料库研究

Franziska Braun, Sebastian P. Bayerl, Paula A. Pérez-Toro, Florian Hönig, Hartmut Lehfeld, Thomas Hillemacher, Elmar Nöth, Tobias Bocklet, Korbinian Riedhammer

Affiliations: Technische Hochschule Nürnberg; Friedrich-Alexander-Universität Erlangen-Nürnberg; Klinik für Psychiatrie und Psychotherapie, Universitätsklinik der Paracelsus Medizinischen Privatuniversität, Klinikum Nürnberg, Germany; KST Institut GmbH, Bad Emstal, Germany

AI summary: Through cross-corpus experiments, this paper performs three-class classification of speech (HC vs. MCI vs. DEM) using text, audio, and emotion embeddings, and examines how depression as a secondary diagnosis affects the classifiers.

Comments Accepted at INTERSPEECH 2023

Journal ref
Proceedings of Interspeech 2023

English abstract

Automated dementia screening enables early detection and intervention, reducing costs to healthcare systems and increasing quality of life for those affected. Depression shares symptoms with dementia, adding complexity to diagnoses. So far, research has focused on binary classification of dementia (DEM) and healthy controls (HC) using speech from picture description tests from a single dataset. In this work, we apply established baseline systems to discriminate cognitive impairment in speech from the semantic Verbal Fluency Test and the Boston Naming Test using text, audio and emotion embeddings in a 3-class classification problem (HC vs. MCI vs. DEM). We perform cross-corpus and mixed-corpus experiments on two independently recorded German datasets to investigate generalization to larger populations and different recording conditions. In a detailed error analysis, we look at depression as a secondary diagnosis to understand what our classifiers actually learn.

2201.06574 2026-06-17 eess.IV cs.CV

Neural Computed Tomography

神经计算断层扫描

Kunal Gupta, Brendan Colvert, Francisco Contijoch

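Affiliations: University of California San Diego

AI summary: This paper proposes the NeuralCT framework, which uses a neural implicit approach to generate time-resolved images free of motion artifacts and is suited to complex-motion settings such as cardiac imaging.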

Comments https://kunalmgupta.github.io/projects/NeuralCT.html

English abstract

Motion during acquisition of a set of projections can lead to significant motion artifacts in computed tomography reconstructions despite fast acquisition of individual views. In cases such as cardiac imaging, motion may be unavoidable and evaluating motion may be of clinical interest. Reconstructing images with reduced motion artifacts has typically been achieved by developing systems with faster gantry rotation or using algorithms which measure and/or estimate the displacements. However, these approaches have had limited success due to both physical constraints and the challenge of estimating/measuring non-rigid, temporally varying, and patient-specific motions. We propose a novel reconstruction framework, NeuralCT, to generate time-resolved images free from motion artifacts. Our approach utilizes a neural implicit representation and does not require estimation or modeling of the underlying motion. Instead, boundaries are represented using a signed distance metric and neural implicit framework. We utilize 'analysis-by-synthesis' to identify a solution consistent with the acquired sinogram as well as spatial and temporal consistency constraints. We illustrate the utility of NeuralCT in three progressively more complex scenarios: translation of a small circle, heartbeat-like change in an ellipse's diameter, and complex topological deformation. Without hyperparameter tuning or change to the architecture, NeuralCT provides high quality image reconstruction for all three motions, as compared to filtered backprojection, using mean-square-error and Dice metrics.

2106.09539 2026-06-17 eess.AS cs.LG cs.SD

Automatic Analysis of the Emotional Content of Speech in Daylong Child-Centered Recordings from a Neonatal Intensive Care Unit

对新生儿重症监护病房中以儿童为中心的全天候录音中语音情感内容的自动分析

Einari Vaaras, Sari Ahlqvist-Björkroth, Konstantinos Drossos, Okko Räsänen

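Affiliations: Unit of Computing Sciences, Tampere University, Finland; Department of Clinical Medicine, University of Turku, Finland; Department of Signal Processing and Acoustics, Aalto University, Finland

AI summary: This paper studies how an automatic speech emotion recognition system can analyze the emotional content of neonatal recordings. It compares cross-corpus generalization, WGAN-based domain adaptation, and active learning for deployment to a new domain, and reaches a classification performance of 73.4% UAR.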

English abstract

Researchers have recently started to study how the emotional speech heard by young infants can affect their developmental outcomes. As a part of this research, hundreds of hours of daylong recordings from preterm infants' audio environments were collected from two hospitals in Finland and Estonia in the context of the so-called APPLE study. In order to analyze the emotional content of speech in such a massive dataset, an automatic speech emotion recognition (SER) system is required. However, there are no emotion labels or existing in-domain SER systems to be used for this purpose. In this paper, we introduce this initially unannotated large-scale real-world audio dataset and describe the development of a functional SER system for the Finnish subset of the data. We explore the effectiveness of alternative state-of-the-art techniques to deploy an SER system to a new domain, comparing cross-corpus generalization, WGAN-based domain adaptation, and active learning in the task. As a result, we show that the best-performing models are able to achieve a classification performance of 73.4% unweighted average recall (UAR) and 73.2% UAR for a binary classification for valence and arousal, respectively. The results also show that active learning achieves the most consistent performance compared to the two alternatives.
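Unweighted average recall (UAR), the metric reported above, is the mean of the per-class recalls, so every class counts equally regardless of how many samples it has. A minimal sketch of the standard definition follows. The function name is hypothetical.

```python
def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls over the classes present in y_true."""
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        # Indices of all samples whose true label is class c.
        idx = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)
```

On imbalanced data a classifier that always predicts the majority class can have high accuracy, but its UAR for a binary task is only 0.5, which is why UAR is common in speech emotion recognition.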

2606.18111 2026-06-17 cs.LG cs.AI new submission

Learning Fair Pareto-Optimal Policies in Multi-Objective Reinforcement Learning

多目标强化学习中学习公平帕累托最优策略

Umer Siddique, Peilang Li, Yongcan Cao

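AI summary: To address the inability of fixed user preferences to yield diverse policies in multi-objective reinforcement learning, this paper proposes multi-policy methods based on the generalized Gini welfare function that learn a set of fair Pareto-optimal policies.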

Comments Accepted at the Reinforcement Learning Conference (RLC) 2025. 12 pages main + appendix, 8 figures, 4 tables

English abstract

Fairness is an important aspect of decision-making in multi-objective reinforcement learning (MORL), where policies must ensure both optimality and equity across multiple, potentially conflicting objectives. While single-policy MORL methods can learn fair policies for fixed user preferences using welfare functions such as the generalized Gini welfare function (GGF), they fail to provide the diverse set of policies necessary for dynamic or unknown user preferences. To address this limitation, we formalize the fair optimization problem in multi-policy MORL, where the goal is to learn a set of Pareto-optimal policies that ensure fairness across all possible user preferences. Our key technical contributions are threefold: (1) We show that for concave, piecewise-linear welfare functions (e.g., GGF), fair policies remain in the convex coverage set (CCS), which is an approximated Pareto front for linear scalarization. (2) We demonstrate that non-stationary policies, augmented with accrued reward histories, and stochastic policies improve fairness by dynamically adapting to historical inequities. (3) We propose three novel algorithms, which include integrating GGF with multi-policy multi-objective Q-Learning (MOQL), state-augmented multi-policy MOQL for learning non-stationary policies, and its novel extension for learning stochastic policies. We evaluate our algorithms across various domains and compare our methods against the state-of-the-art MORL baselines. The empirical results show that our methods learn a set of fair policies that accommodate different user preferences.
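As a rough illustration of the welfare function named in the abstract, the generalized Gini welfare function is commonly defined as a weighted sum of the objective values sorted in ascending order, with non-increasing weights so that the worst-off objective counts most. The sketch below follows that common definition; the function name `ggf`, the weights, and the return vectors are hypothetical and are not taken from the paper.

```python
def ggf(values, weights):
    """Generalized Gini welfare: largest weight goes to the smallest value."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    ordered = sorted(values)                  # ascending: worst-off objective first
    w = sorted(weights, reverse=True)         # non-increasing weights
    return sum(wi * vi for wi, vi in zip(w, ordered))


weights = [0.5, 0.3, 0.2]  # hypothetical weights, summing to 1

# Two return vectors with the same total (12) but different balance.
balanced = ggf([4.0, 4.0, 4.0], weights)
skewed = ggf([10.0, 1.0, 1.0], weights)
print(balanced, skewed)
```

Because the values are sorted before weighting, the score does not depend on which objective is which, and the balanced vector scores higher than the skewed one despite the equal totals.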

2606.17692 2026-06-17 cs.LG New submission

Delta-Based Target Reformulation for Short-Term Electricity Load Forecasting Using LSTM and Transformer Models

基于Delta目标重构的LSTM与Transformer短期电力负荷预测

Vansh Bansal

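AI summary To address the non-stationarity of electricity load, the paper proposes a delta-based target reformulation in which LSTM and Transformer models predict the change in load rather than its absolute value, reducing MAE and MAPE by more than 50% in hour-ahead forecasting.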

Comments 8 pages, 3 tables

Details
AI Chinese abstract

准确的短期电力负荷预测对于现代电力系统的可靠和经济运行至关重要,尤其是在天气变化、日历效应和消费模式演变导致的非平稳性下。尽管LSTM和Transformer等深度学习模型表现出色,但大多数现有研究侧重于直接预测绝对负荷,而未明确解决目标非平稳性。受ARIMA模型中经典时间序列差分技术的启发,本文研究了一种基于Delta的目标重构方法,用于深度学习的短期电力负荷预测。该方法不直接预测绝对负荷值,而是训练模型预测连续时间步之间的负荷变化,最终预测通过最后一次观测负荷重建。这旨在稳定学习目标并降低预测难度。利用印度多年逐小时真实电力负荷数据,辅以NASA POWER项目的气象变量和日历特征,本研究评估了LSTM和Transformer在两种公式下的表现,并以LightGBM作为基准。实验针对小时前和日前预测范围进行,通过平均绝对误差(MAE)和平均绝对百分比误差(MAPE)评估性能。结果表明,Delta重构在所有评估模型的小时前预测中持续提高预测精度,与绝对公式相比,MAPE降低超过50%。对于日前预测,Delta目标特别有利于深度序列模型(LSTM和Transformer),而LightGBM在绝对公式下仍具有竞争力。这些发现表明,Delta重构是神经网络的一种强大归纳偏置,但其效果依赖于模型和预测范围。

English abstract

Accurate short-term electricity load forecasting is critical for the reliable and economic operation of modern power systems, especially under non-stationarity arising from weather variability, calendar effects, and evolving consumption patterns. While deep learning models such as LSTMs and Transformers show promising performance, most existing studies focus on direct absolute load prediction without explicitly addressing target non-stationarity. Motivated by classical time-series differencing techniques in ARIMA models, this paper investigates a delta-based target reformulation for short-term electricity load forecasting using deep learning. Instead of directly predicting absolute load values, the proposed formulation trains models to predict the change in load between consecutive time steps, with final forecasts reconstructed using the last observed load. This aims to stabilize the learning target and reduce forecasting difficulty. Using multi-year, hourly real-world electricity load data from India, augmented with meteorological variables from the NASA POWER project and calendar features, this study evaluates LSTM and Transformer models under both formulations, benchmarking them against LightGBM. Experiments are conducted for hour-ahead and day-ahead horizons, assessing performance via Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE). Results show that delta-based reformulation consistently improves forecasting accuracy for hour-ahead prediction across all evaluated models, yielding MAPE reductions of over 50% compared to absolute formulations. For day-ahead forecasting, delta targets specifically benefit deep sequence models (LSTM and Transformer), while LightGBM remains competitive under the absolute formulation. These findings indicate that while delta reformulation is a powerful inductive bias for neural networks, its efficacy is model- and horizon-dependent.
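The target reformulation described in the abstract can be sketched for the one-step (hour-ahead) case as follows: the training target becomes the difference between consecutive loads, and the forecast is rebuilt by adding the predicted change to the last observed load. This is a minimal sketch under stated assumptions; the function names, the toy load values, and the mean-delta stand-in for a trained model are all hypothetical, and the paper's day-ahead setup is not covered here.

```python
def to_delta_targets(load):
    """Change in load between consecutive time steps."""
    return [b - a for a, b in zip(load, load[1:])]


def reconstruct(last_observed, predicted_delta):
    """Absolute forecast = last observed load + predicted change."""
    return last_observed + predicted_delta


load = [100.0, 104.0, 103.0, 110.0]  # hypothetical hourly loads
deltas = to_delta_targets(load)       # targets a model would be trained on

# Sanity check: true deltas plus the previous load recover the series.
rebuilt = [reconstruct(prev, d) for prev, d in zip(load, deltas)]

# Stand-in for a trained model (assumption): predict the mean past delta.
predicted_delta = sum(deltas) / len(deltas)
forecast = reconstruct(load[-1], predicted_delta)
print(deltas, forecast)
```

The reconstruction step is what ties the differenced target back to the quantity of interest, in the same spirit as integrating a differenced series in ARIMA.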

2606.17603 2026-06-17 cs.LG New submission

Expanding SPHERE-JEPA: A Family of Statistical Regularizers for the Hypersphere

扩展SPHERE-JEPA:超球面上的统计正则化器家族

Léo Nicollier, Enric Meinhardt-Llopis, Max Dunitz, Marc Pic, Pablo Musé, Gabriele Facciolo

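AI summary To address the unstable optimization and slow convergence caused by the projection variance that Monte Carlo sampling introduces into sliced statistical regularizers in self-supervised learning, the paper proposes full-dimensional MMD, KSD, and KL-divergence regularizers with rotationally invariant kernels, achieving more stable optimization and consistent improvements on ImageNet and Galaxy10.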

Details
AI Chinese abstract

在自监督学习(SSL)中,通过在单位超球面上显式强制均匀分布来防止表示坍缩已被证明是有效的。然而,当前的框架通常依赖于切片统计正则化器,如SIGReg(用于LeJEPA)和SUSReg(用于SPHERE-JEPA),这些正则化器通过沿随机一维方向的蒙特卡洛采样来近似这一连续目标。这种随机性将投影方差注入训练梯度,破坏优化稳定性,并阻碍收敛。在这项工作中,我们首先证明,解析地积分掉这些随机投影自然地产生一个确定性的最大均值差异(MMD),从而避免了切片方法的方差。受此等价性的启发,我们直接在球面上制定了MMD、核斯坦因差异(KSD)和KL散度的全维目标,以强制均匀分布。为了防止空间偏差,我们通过谱理论构造旋转不变核来装备这些检验,并系统评估了两个典型族:平滑指数衰减(热核)和严格频率截止(带限)滤波器。实验上,去除投影引起的噪声导致更稳定的优化、更快的收敛,并在ImageNet和Galaxy10上相对于随机切片正则化器取得一致改进。此外,我们揭示了统计检验的选择塑造了学习潜在空间的几何结构:MMD和KSD有利于适用于以对象为中心的领域的局部聚类组织,而基于连续KDE的KL散度促进了细粒度的实例分离,在非聚类的程序化纹理检索上取得了最强结果。

English abstract

In Self-Supervised Learning (SSL), preventing representation collapse by explicitly enforcing a uniform distribution on the unit hypersphere has proven to be effective. However, current frameworks typically rely on sliced statistical regularizers such as SIGReg (used in LeJEPA) and SUSReg (used in SPHERE-JEPA), which approximate this continuous objective via Monte Carlo sampling along random 1D directions. This stochasticity injects projection variance into the training gradients, destabilizing optimization and hindering convergence. In this work, we first show that analytically integrating out these random projections natively yields a deterministic Maximum Mean Discrepancy (MMD), bypassing the variance of sliced methods. Motivated by this equivalence, we formulate full-dimensional objectives for MMD, Kernel Stein Discrepancy (KSD), and Kullback-Leibler (KL) divergence directly on the sphere to enforce a uniform distribution. To prevent spatial bias, we equip these tests with rotationally invariant kernels constructed via spectral theory, systematically evaluating two canonical families: smooth exponential decay (Heat) and strict frequency cutoff (Bandlimited) filters. Empirically, removing projection-induced noise results in more stable optimization, faster convergence, and consistent improvements over stochastic sliced regularizers on ImageNet and Galaxy10. Furthermore, we reveal that the choice of the statistical test shapes the geometry of the learned latent space: MMD and KSD favor locally clustered organization suitable for object-centric domains, whereas the continuous KDE-based KL divergence promotes fine-grained instance separation, yielding the strongest results on unclustered procedural texture retrieval.
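To make the idea of a deterministic, full-dimensional MMD against the uniform distribution concrete, here is a small sketch on the 2-sphere in R^3. It assumes the rotationally invariant kernel k(x, y) = exp(t x·y), which is a stand-in chosen because its mean under the uniform distribution has the closed form sinh(t)/t; it is not the Heat or Bandlimited kernel from the paper, and the function name and test points are hypothetical. For any rotationally invariant kernel the cross term and the uniform-uniform term are the same constant, so the squared MMD reduces to the mean pairwise kernel value minus that constant.

```python
import math


def mmd2_to_uniform_s2(points, t=1.0):
    """Squared MMD between unit vectors in R^3 and the uniform law on S^2.

    Kernel (assumption): k(x, y) = exp(t * <x, y>).
    Under the uniform law, E_u[k(x, u)] = sinh(t) / t for every unit x.
    """
    n = len(points)
    mean_k = sum(
        math.exp(t * sum(a * b for a, b in zip(x, y)))
        for x in points
        for y in points
    ) / (n * n)
    return mean_k - math.sinh(t) / t  # no random projections involved


# Collapsed embeddings: every point at the north pole.
collapsed = [(0.0, 0.0, 1.0)] * 6

# Spread embeddings: the six vertices of an octahedron.
spread = [
    (1.0, 0.0, 0.0), (-1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0), (0.0, -1.0, 0.0),
    (0.0, 0.0, 1.0), (0.0, 0.0, -1.0),
]

print(mmd2_to_uniform_s2(collapsed), mmd2_to_uniform_s2(spread))
```

The collapsed configuration yields a much larger value than the spread one, which is the behavior a uniformity regularizer relies on.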