arXivDaily: daily arXiv digest, updated Monday to Friday
2606.19212 2026-06-18 stat.ML cs.LG New submission

Generalised Eigenvalue Geometry of Semantic Adversarial Attacks

Martin Anthony, Kaveh Salehzadeh Nobari

AI summary: Proposes a continuous local model that quantifies semantic adversarial attackability through the largest generalised eigenvalue of a matrix pencil $(A,B)$, and gives a prediction-flip condition, attackability certificates, and VC bounds.

Abstract

Recent empirical work shows that semantically equivalent paraphrases can fool financial sentiment classifiers: although a paraphrase remains close to the original under a strong reference embedding, it may shift the target model's representation enough to change the predicted class. Existing robustness theory either assumes a single-model threat model or focuses mainly on empirical attack algorithms. We develop a continuous local model of semantic paraphrase perturbations that captures this two-model structure. We show that the worst-case local displacement of the target representation, subject to a proxy-model budget, is governed by the largest generalised eigenvalue of a matrix pencil $(A,B)$ constructed from the Jacobians of the two embedding maps. The resulting attackability index $\lambda^*(x)$ is intrinsic to the local paraphrase geometry and the chosen embedders, yields a closed-form prediction-flip condition for affine readouts, and supports conservative population and finite-sample attackability certificates. For uniform control over classes of affine readouts, we derive a distribution-free VC bound for binary attackability indicators and a scale-sensitive margin bound based on an attackability-adjusted margin that subtracts a local geometric penalty from the standard classifier margin. We also connect the continuous theory to discrete paraphrase search, identify an asymmetry between successful and unsuccessful finite searches, and give a covering condition under which the discrete and continuous settings agree. Finally, we propose an empirical verification framework using soft-token relaxations and generated paraphrase sets to assess the local eigenvalue geometry, prediction-flip condition, and finite-search approximation on a deployed financial-text classifier.
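The largest generalised eigenvalue of a pencil $(A,B)$ can be illustrated with a short numerical sketch. The abstract does not state how $A$ and $B$ are built, so the construction below is an assumption: $A = J_T^\top J_T$ and $B = J_P^\top J_P$ for a target Jacobian $J_T$ and a proxy Jacobian $J_P$, which makes the eigenvalue the maximum of $\|J_T d\|^2 / \|J_P d\|^2$ over perturbations $d$. The function name and the ridge term are hypothetical.

```python
import numpy as np

def attackability_index(J_target, J_proxy, ridge=1e-8):
    """Largest generalised eigenvalue of the pencil (A, B).

    Assumption (not from the abstract): A = J_target^T J_target and
    B = J_proxy^T J_proxy, so the result is the maximum of
    ||J_target d||^2 / ||J_proxy d||^2 over perturbations d.
    """
    A = J_target.T @ J_target
    # A small ridge keeps B positive definite so the Cholesky factor exists.
    B = J_proxy.T @ J_proxy + ridge * np.eye(J_proxy.shape[1])
    L = np.linalg.cholesky(B)          # B = L L^T
    L_inv = np.linalg.inv(L)
    M = L_inv @ A @ L_inv.T            # symmetric, same eigenvalues as the pencil
    return float(np.linalg.eigvalsh(M)[-1])  # eigvalsh sorts ascending

# Toy Jacobians: the target stretches the first direction, the proxy the second.
J_target = np.array([[3.0, 0.0], [0.0, 1.0]])
J_proxy = np.array([[1.0, 0.0], [0.0, 2.0]])
lam = attackability_index(J_target, J_proxy)
```

In this toy case the proxy is least sensitive along the first coordinate, which is exactly where the target is most sensitive, so the index is large (close to 9).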

2606.19117 2026-06-18 stat.ME cs.LG econ.EM stat.ML New submission

Wasserstein Policy Learning for Distributional Outcomes

Yiyan Huang, Cheuk Hang Leung, Qi Wu, Zhiheng Zhang

AI summary: For distribution-valued outcomes, proposes a policy learning framework based on Wasserstein barycenters and utility functionals, using IPW and DR estimators; proves that the regret rate is dominated by policy-class complexity and gives a minimax lower bound.

Comments: Accepted by The 39th Annual Conference on Learning Theory (COLT 2026)

Abstract

Offline policy learning has received growing attention in causal inference. The primary objective is to learn a policy (individualized treatment rule) as a mapping from covariates to treatment that maximizes the empirical welfare defined as the mean of scalar-valued potential outcomes. In this paper, we study offline policy learning with distribution-valued outcomes, where each potential outcome is a probability measure on $\mathbb{R}$ and the reward is defined through a utility functional applied to the Wasserstein barycenter of induced outcome distributions. We establish statistical guarantees for the policy learning framework based on both Inverse Probability Weighting (IPW) and Doubly Robust (DR) estimators. By handling the challenging uniform deviation over the product of the combinatorial policy class and the infinite-dimensional quantile domain, we prove that the finite-sample regret has leading dependence $\widetilde{\mathcal{O}}(\sqrt{\mathrm{N\text{-}dim}(\Pi)/N})$. In the one-dimensional Wasserstein setting and under the stated regularity conditions, the leading regret rate is still governed by the policy-class complexity. Moreover, we provide a minimax lower bound establishing the sharpness of the leading dependence on $N$ and $\mathrm{N\text{-}dim}(\Pi)$.
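On the real line, the 2-Wasserstein barycenter has a quantile function equal to the weighted average of the input quantile functions. The sketch below applies this fact to empirical distributions of equal size. It is not the paper's estimator: the function name is hypothetical, and the weights are generic placeholders (in an IPW-style estimator they might be inverse-propensity weights).

```python
def barycenter_1d(samples, weights):
    """W2 barycenter of equal-size empirical distributions on the real line.

    samples: list of lists, one list of draws per distribution.
    weights: one nonnegative weight per distribution (placeholders here).
    Returns the support points of the barycenter in increasing order.
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("this sketch assumes equal sample sizes")
    total = sum(weights)
    # Sorted draws are the empirical quantiles at levels (i + 0.5) / n.
    ordered = [sorted(s) for s in samples]
    return [
        sum(w * s[i] for w, s in zip(weights, ordered)) / total
        for i in range(n)
    ]

bary = barycenter_1d([[0.0, 2.0, 1.0], [10.0, 12.0, 11.0]], [1.0, 1.0])
```

With equal weights the barycenter of the two toy samples sits halfway between them and keeps their common shape.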

2606.19101 2026-06-18 eess.SP cs.LG New submission

Structure Over Nonlinearity: Explicit Interaction Architectures for Dynamical Learning

Augusto Sarti

AI summary: Proposes explicit dynamical units based on wave-inspired interaction structures, which obtain modeling capability from structured organization rather than expressive nonlinearity; in nonlinear system identification, depth improves representation quality and generalization.

Comments: 11 pages, 2 figures, 2 tables

Abstract

Most learning architectures for dynamical systems rely on generic nonlinear function approximation, often requiring high model complexity to capture structured behaviors. In this work, we propose an alternative paradigm in which modeling capability arises primarily from structure rather than from expressive nonlinearities. We introduce a class of explicit structured dynamical units based on wave-inspired interaction structures with internal state. Inspired by wave-based computational principles, the proposed units adopt a strictly causal organization that eliminates algebraic loops, yielding fully explicit models that can be evaluated without implicit solvers. Stacking such units produces layered dynamical architectures with emergent hierarchical behavior. Through experiments on a nonlinear system identification task, we show that depth improves both representation quality and generalization, even under limited parameter optimization. In particular, the proposed architectures produce informative internal representations even under readout-only fitting, indicating that useful dynamical structure emerges from the organization of interactions prior to substantial parameter optimization. These results suggest that structure-first design provides a viable and effective alternative to conventional black-box approaches for learning dynamical systems, highlighting the role of interaction structure as a primary source of model expressivity.

2606.18993 2026-06-18 stat.ML cs.LG stat.ME New submission

Sequential Kernel-based Conditional Independence Testing via Adaptive Betting

Zheng He, Danica J. Sutherland

AI summary: Proposes a sequential conditional independence test that is more robust to estimation error, combining an adaptively optimized kernel conditional independence statistic with normalization and truncate-and-shift calibration; it controls Type I error and keeps high power on synthetic and real data.

Comments: Published at ICML 2026: https://openreview.net/forum?id=vUMdIyTs9c

Abstract

Testing conditional independence is fundamental yet intrinsically difficult: without additional assumptions, Type I error control is impossible in general. The "Model-X" paradigm addresses this difficulty by assuming exact knowledge of a relevant conditional distribution. While small deviations from this assumption can sometimes be tolerated in classical one-shot testing, existing sequential conditional independence tests typically require the Model-X conditional to be known exactly, making them fragile when it must instead be estimated. We propose a new approach that is substantially more robust to such estimation error. Our method applies testing-by-betting to an adaptively optimized Kernel Conditional Independence statistic, together with a normalization scheme and a truncate-and-shift calibration strategy. These modifications greatly reduce Type I error inflation while preserving high power across high-dimensional synthetic benchmarks and real-world fairness tasks, outperforming existing sequential Model-X approaches. Code is available at https://github.com/he-zh/SKCI.
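The generic testing-by-betting mechanism can be sketched in a few lines. This is not the SKCI method: the payoffs here are abstract numbers in $[-1, 1]$ rather than a kernel statistic, the bet size is fixed rather than adaptive, and the function name is hypothetical. The only property used is that a nonnegative wealth process with non-positive expected payoff under the null exceeds $1/\alpha$ with probability at most $\alpha$ (Ville's inequality).

```python
def betting_test(payoffs, alpha=0.05, bet=0.5):
    """Sequential test by betting with a fixed bet size in [0, 1).

    Each payoff must lie in [-1, 1] and have non-positive conditional mean
    under the null, so the wealth is a nonnegative supermartingale.
    Returns (rejected, stopping_time, final_wealth).
    """
    if not 0.0 <= bet < 1.0:
        raise ValueError("bet must lie in [0, 1)")
    wealth = 1.0
    for t, g in enumerate(payoffs, start=1):
        if not -1.0 <= g <= 1.0:
            raise ValueError("payoffs must lie in [-1, 1]")
        wealth *= 1.0 + bet * g        # stays positive because bet < 1
        if wealth >= 1.0 / alpha:      # enough evidence against the null
            return True, t, wealth
    return False, len(payoffs), wealth

strong_evidence = betting_test([1.0] * 20)   # every bet wins
no_evidence = betting_test([0.0] * 20)       # wealth never moves
```

Because the test can stop as soon as the wealth crosses the threshold, the sample size is not fixed in advance, which is what makes the procedure sequential.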

2606.18979 2026-06-18 eess.AS cs.CL cs.SD New submission

Mitigating Scoring Errors and Compensating for Nonverbal Subtests in Speech-Based Dementia Assessment

Franziska Braun, Christopher Witzl, Andreas Erzigkeit, Hartmut Lehfeld, Thomas Hillemacher, Tobias Bocklet, Korbinian Riedhammer

AI summary: Studies how to reduce scoring errors in speech-based assessment by fusing transcript-derived scores with Whisper embeddings, and uses the fused representations to approximate expert overall ratings to compensate for missing motor subtests, effectively discriminating between cognitive status groups.

Comments: Accepted at INTERSPEECH 2026

Abstract

Early detection of cognitive impairment relies on neuropsychological tests to minimize subjectivity by assessing multiple cognitive domains. Speech-based evaluation can support diagnostics and improve accessibility, but transcription errors and the omission of nonverbal subtests (e.g., motor skills) limit accuracy. Beyond conventional test scores, speech-derived features can provide additional insights into cognitive status. This study investigates the speech-based evaluation of the German "Syndrom-Kurz-Test," a standardized dementia screening test comprising verbal and motor subtests. We train models that integrate transcript-derived scores and Whisper embeddings per verbal subtest to reduce scoring errors. To compensate for missing motor subtests, we then leverage these fused representations to approximate expert overall ratings. Despite omitting subtests, our models strongly correlate with expert ratings and efficiently and accurately discriminate between cognitive status groups.

2606.18972 2026-06-18 stat.ML cs.LG New submission

FOSC-X: An Extended Framework for Optimal Local Cuts and Non-Horizontal Cluster Selection from Clustering Hierarchies

Connor Simpson, Ricardo J. G. B. Campello

AI summary: Proposes the FOSC-X framework, which uses dynamic programming to extract the top-M globally optimal flat clusterings from local, non-horizontal cuts of a hierarchical cluster tree, supports cluster-count constraints, and guarantees optimal rankings in linear time.

Abstract

Extracting a flat clustering solution from a hierarchy is a common task in practical cluster analysis and can be formulated as an optimisation problem. Existing approaches focus on finding a single optimal solution. We introduce FOSC-X, a framework for extracting the top-M globally optimal flat clusterings from local, non-horizontal cuts of a hierarchical cluster tree, while optionally enforcing constraints on the number of clusters. This enables automatic identification of multiple high-quality alternative clusterings that capture different aspects of the hierarchical structure. Without constraints, the top-M problem can be solved in polynomial time using dynamic programming, exploiting the property that locally optimal partial candidates within subtrees can be combined to form globally optimal solutions while automatically determining the number of clusters. However, this can lead to solutions with numbers of clusters that are ultimately undesirable -- e.g., too large to be meaningful or practically analysed within a particular application domain. Imposing cluster-count constraints breaks the optimality property underlying the unconstrained dynamic programming approach, since locally optimal partial candidates may no longer combine into feasible globally optimal solutions. FOSC-X addresses this challenge through a dynamic programming strategy that maintains compact sets of feasible candidates using lower and upper feasibility bounds while pruning infeasible or dominated combinations. The resulting method guarantees optimal rankings of the top-M solutions with linear-time complexity in the number of cluster nodes and dataset size, both with and without cluster-count constraints. Experiments show that FOSC-X efficiently reveals alternative clustering structures overlooked by single-solution extraction methods.
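The unconstrained property mentioned above, that locally optimal partial candidates within subtrees combine into a globally optimal solution, can be sketched for the single-best case. This is only the baseline bottom-up recursion, not the top-M or constrained FOSC-X algorithm, and the `Node` class and `quality` scores are hypothetical stand-ins for whatever cluster quality measure is used.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    quality: float                       # hypothetical per-cluster quality score
    children: list = field(default_factory=list)

def best_cut(node):
    """Return (total quality, selected cluster names) for the subtree at node.

    A node is either kept as one cluster or replaced by the best
    selections of its children, whichever scores higher.
    """
    if not node.children:
        return node.quality, [node.name]
    child_results = [best_cut(child) for child in node.children]
    child_total = sum(score for score, _ in child_results)
    if node.quality >= child_total:
        return node.quality, [node.name]
    return child_total, [n for _, names in child_results for n in names]

tree = Node("root", 1.0, [
    Node("A", 3.0, [Node("A1", 1.0), Node("A2", 1.5)]),
    Node("B", 1.0, [Node("B1", 2.0), Node("B2", 2.0)]),
])
score, clusters = best_cut(tree)
```

The selected clusters come from different depths of the tree (A is kept whole while B is split), which is what a local, non-horizontal cut means. The number of clusters falls out of the recursion rather than being chosen up front.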

2606.18853 2026-06-18 stat.ML cs.LG New submission

Kernel of Partition Paths: A Unified Representation for Tree Ensembles

Nicolas Mahler

AI summary: Proposes the KPP kernel, which indexes forest nodes with a path metric and unifies prediction, exact additive attribution, a deterministic Lipschitz robust radius, and Rademacher risk bounds, giving a geometric framework for tree ensembles.

Comments: 31 pages

Abstract

A recent line of work has reframed individual decision trees as linear models on engineered features associated with their splits, opening routes for oracle inequalities and feature-importance reinterpretation, but leaving open the question of what unified geometric object a forest induces when one indexes its feature map by nodes rather than by splits. The present paper studies that object. KPP indexes the feature map by the nodes of the forest, weighted by a path metric that turns each coordinate into a component of a squared-Euclidean path-isometric embedding. KPP unifies four pillars under a single non-diagonal Gram that carries a metric: prediction, exact additive attribution, deterministic Lipschitz robust radius in the KPP metric, and uniform Rademacher risk bounds for regression and classification under fixed, honest, or cross-fit conditioning. All probabilistic guarantees are conditional on the representation and are stated under three explicit conditioning regimes; the robust-radius guarantee is deterministic in the KPP metric rather than in a norm on the raw input. Conjectured fast-rate refinements for both regression and classification are stated as open problems and are not claimed as theorems.

2606.18750 2026-06-18 stat.AP cs.LG New submission

Ensuring Trustworthy Online A/B Testing: Addressing Five Key Questions on CUPED

Yu Zhang, Bokui Wan, Yongli Qin, Jinyong Ma, Yifan Guo

AI summary: Systematically addresses five common but overlooked questions in applying CUPED, including the optimal adjustment specification, the validity of regression adjustment, and robust variance estimation, and extends to multi-arm experiments and two-stage sampling designs; the methods are supported by theoretical analysis and experiments and have been deployed on ByteDance's platform.

Comments: 15 pages, 3 figures

Abstract

A/B testing has become the gold standard for data-driven decision-making in large-scale online experimentation, providing critical guidance for feature launch, pricing optimization, and user experience enhancement. To maximize statistical sensitivity, many technology companies routinely employ Controlled-experiment Using Pre-Experiment Data (CUPED), a technique that achieves substantial variance reduction while preserving the unbiasedness of estimating the average treatment effect. Despite its widespread adoption, several critical methodological and practical nuances of CUPED remain underexplored. This paper systematically addresses five frequently encountered yet overlooked questions regarding the application of CUPED. First, we provide a comparative analysis of various post-CUPED estimators to identify the optimal adjustment specification. Second, we evaluate the validity of regression-based adjustments and delineate robust variance estimation methods tailored for such frameworks. Finally, we extend our investigation to complex but common scenarios, including multi-arm experiments and two-stage sampling designs. Our findings reveal that in these settings, naive reliance on standard variance estimators can lead to severely misleading inferences. By offering rigorous theoretical insights and extensive experimental validation, this work deepens the conceptual understanding of CUPED. Notably, the recommended methodologies have been successfully deployed and integrated into ByteDance's experimentation platform.
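For readers unfamiliar with CUPED, the basic adjustment subtracts the part of the outcome that is predictable from a pre-experiment covariate: $\tilde{y}_i = y_i - \theta (x_i - \bar{x})$ with $\theta = \mathrm{Cov}(x, y) / \mathrm{Var}(x)$. The sketch below shows only this textbook form on a single sample. It does not reproduce the estimators or variance formulas compared in the paper, and the data are made up.

```python
from statistics import fmean

def cuped_adjust(y, x):
    """Return (adjusted outcomes, theta) for the basic CUPED adjustment.

    y: in-experiment metric values; x: pre-experiment covariate values.
    theta is the least-squares slope of y on x.
    """
    mean_x, mean_y = fmean(x), fmean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    theta = cov / var
    # Centring x keeps the mean of the adjusted metric equal to mean(y).
    return [b - theta * (a - mean_x) for a, b in zip(x, y)], theta

pre = [1.0, 2.0, 3.0, 4.0]     # made-up pre-experiment values
post = [2.1, 3.9, 6.2, 7.8]    # made-up in-experiment values
adjusted, theta = cuped_adjust(post, pre)
```

The adjusted values have the same mean as the raw metric but a much smaller spread, which is the source of the variance reduction. In an actual experiment, how $\theta$ is estimated across arms and how the variance is then computed are exactly the kinds of choices the paper examines.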

2604.04089 2026-06-18 physics.comp-ph cond-mat.str-el cs.AI cs.HC Updated version

From Paper to Program: Externalizing and Diagnosing Knowledge Bottlenecks in AI-Assisted Quantum Many-Body Code Generation

Yi Zhou

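AI summary: To address failures caused by tacit conventions when AI translates papers directly into code, proposes a knowledge-externalization approach in which a multi-stage human-AI workflow makes implicit assumptions explicit; validated on DMRG and Pfaffian-MPS tasks.

Comments: Core thesis upgraded

AI abstract (translated from Chinese)

Large language models can write scientific code, but direct paper-to-program translation remains fragile when correctness depends on tacit conventions in the literature. We identify this bottleneck as \textbf{knowledge externalization}: converting implicit computational assumptions (index conventions, gauge choices, fermionic signs, contraction order, and memory constraints) into an explicit technical specification before implementation. We evaluate a multi-stage, human-in-the-loop workflow that inserts such a specification between theory extraction and code generation, with validation and stop gates. The workflow is tested on two algorithmically distinct quantum many-body tasks: the variational sweep-based density-matrix renormalization group (DMRG), taken from a pedagogical review, and a Pfaffian method that constructively converts Hartree-Fock-Bogoliubov states into matrix product states, taken from the five-page Letter by Jin et al., Phys. Rev. B 105, L081101 (2022), for which no code is public. For DMRG, in a $4\times4$ grid, all 16 specification-guided model pairings meet the physical validation criteria, compared with 6/13 direct attempts. A prose-specification ablation shows that the externalized content, not the LaTeX format, is the essential ingredient. For Pfaffian-MPS, the workflow succeeds in 11 of 26 archived attempts, while direct prompting yields zero audited passes. Cross-specification transfer is asymmetric: non-GPT specifications implemented by GPT~5.5 pass 4/4, whereas GPT~5.5 specifications implemented by weaker models fail 4/4, indicating a residual implementation-model bottleneck. The resulting \textit{Paper-to-Program Many-Body} skill provides an auditable protocol for AI-assisted implementation of many-body algorithms and for diagnosing where externalization succeeds or fails.

Abstract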

Large language models can write scientific code, but direct paper-to-program translation remains fragile when correctness depends on tacit conventions rather than explicit equations. We frame this as a \textbf{knowledge-externalization} problem: index choices, gauges, fermionic signs, contraction order, validation gates, and scaling constraints must be made explicit before code generation. We evaluate a multi-stage, human-in-the-loop workflow on two quantum many-body tasks. DMRG from Schollwöck's pedagogical review serves as calibration: specification-guided implementations pass in all 16 model pairings, compared with 6/13 direct attempts, and a prose-specification ablation shows that externalized content, not \LaTeX{} form, is the active ingredient. Pfaffian conversion of HFB states to MPS from the five-page Letter by Jin et al. serves as the stress test: no public implementation is available, and success depends on tacit sign, gauge, ordering, and scalability conventions. Here the workflow yields 11/26 audited passes, while direct prompting yields none. Cross-specification transfer is asymmetric: non-GPT specifications implemented by GPT~5.5 pass 4/4, whereas GPT~5.5 specifications implemented by weaker models fail 4/4. The contrast supports a two-bottleneck picture. Externalization resolves the first bottleneck -- paper-to-code ambiguity -- well enough to make DMRG reproducible and Pfaffian-MPS auditable. The remaining failures expose a second bottleneck in implementation-model capability. Iterative meta-specification moves this boundary but does not eliminate it. The resulting \emph{Paper-to-Program Many-Body} skill is both a reusable implementation protocol and a diagnostic instrument for AI-assisted many-body programming.

2508.02158 2026-06-18 cs.IT cs.CR cs.DS cs.LG math.IT math.ST stat.TH Updated version

Robust Detection of Planted Subgraphs in Semi-Random Models

Dor Elimelech, Wasim Huleihel

AI summary: Studies planted subgraph detection under a semi-random model, proving that with an adversary, detection of subgraphs with strongly sub-logarithmic density is information-theoretically impossible, whereas the statistical limits for super-logarithmic density are unchanged; also designs an efficient robust detection algorithm.

Comments: 38 pages, 2 figures

Abstract

Detection of planted subgraphs in Erdős-Rényi random graphs has been extensively studied, leading to a rich body of results characterizing both statistical and computational thresholds. However, most prior work assumes a purely random generative model, making the resulting algorithms potentially fragile in the face of real-world perturbations. In this work, we initiate the study of semi-random models for the planted subgraph detection problem, wherein an adversary is allowed to remove edges outside the planted subgraph before the graph is revealed to the statistician. Crucially, the statistician remains unaware of which edges have been removed, introducing fundamental challenges to the inference task. We establish fundamental statistical limits for detection under this semi-random model, revealing a sharp dichotomy. Specifically, for planted subgraphs with strongly sub-logarithmic maximum density, detection becomes information-theoretically impossible in the presence of an adversary, despite being possible for some planted subgraphs in the classical random model. In stark contrast, for subgraphs with super-logarithmic density, the statistical limits remain essentially unchanged; we prove that the optimal (albeit computationally intractable) likelihood ratio test remains robust. Beyond these statistical boundaries, we design a new computationally efficient and robust detection algorithm, and provide rigorous statistical guarantees for its performance. Our results establish the first robust framework for planted subgraph detection and open new directions in the study of semi-random models, computational-statistical trade-offs, and robustness in graph inference problems.

2606.19175 2026-06-18 econ.TH New submission

To Gamble, Perchance to Grow

Mark Whitmeyer

AI summary: Studies transformations of returns in the growth-optimal (Kelly) portfolio problem, characterizes the transformations that produce more conservative portfolios, and derives a comparison of risk aversion for rationally inattentive agents.

Abstract

I study transformations of returns in the growth-optimal (Kelly) portfolio problem. In the one-safe-one-risky-asset problem, a return transform f universally produces a more conservative portfolio if and only if f is concave and strictly increasing and r/f is convex. As a corollary, I characterize comparative risk aversion for a rationally-inattentive agent: a more risk-averse agent is one who is sufficiently more risk averse in the Pratt (1964) sense.
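As background, the untransformed growth-optimal problem with one safe and one risky asset chooses the risky share $\phi$ that maximizes expected log growth $\mathbb{E}[\log(1 + \phi R)]$. The sketch below solves it by grid search for a discrete return distribution. It assumes a safe return of zero and shares in $[0, 1)$, uses hypothetical function names, and does not implement the return transforms studied in the paper.

```python
import math

def growth_rate(share, outcomes):
    """Expected log growth with `share` of wealth in the risky asset.

    outcomes: list of (probability, net risky return); the safe asset
    is assumed to return 0 in this sketch.
    """
    return sum(p * math.log(1.0 + share * r) for p, r in outcomes)

def kelly_share(outcomes, grid=10_000):
    """Grid search over shares in [0, 1) for the growth-optimal one."""
    candidates = [i / grid for i in range(grid)]
    return max(candidates, key=lambda s: growth_rate(s, outcomes))

# Even-money bet that wins with probability 0.6.
bet = [(0.6, 1.0), (0.4, -1.0)]
share_star = kelly_share(bet)
```

For a bet that pays $b$ to 1 with win probability $p$, the classical closed form is $p - (1 - p)/b$, which gives 0.2 here and matches the grid search. A more conservative portfolio in the paper's sense would hold a smaller risky share than this benchmark.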

2606.19000 2026-06-18 econ.EM New submission

Tracking Brazil's Real Neutral Rate: A Multi-Block Ensemble Framework Combining Statistical Trends, Market Prices, and State-Space Models

Gabriel de Macedo Santos

AI summary: Proposes a multi-block ensemble framework combining statistical trends, market-implied curves, and state-space models to track a proxy for Brazil's real neutral rate in real time; the latest estimate (May 2026) is 9.48% p.a., and the policy gap is computed.

Comments: 13 pages, 6 figures, 10 tables

Abstract

This paper presents an implementable framework for tracking Brazil's real neutral-rate proxy, using a block-based ensemble of complementary models. The project begins with daily macro-financial data, converts the series to monthly frequency, computes an ex-ante real Selic rate through the Fisher equation, builds activity-cycle measures from IBC-Br, and then combines five methodological blocks: simple moving averages, statistical trend filters, market-implied curve proxies, a yield-curve state-space model, and a semi-structural IS-Phillips state-space model. The final implementation treats the semi-structural block conservatively: because the IS-Phillips Kalman model falls back to a local-level trend in the current sample, its output is not labeled as structural r-star and receives zero weight in the final ensemble. The latest estimate, for May 2026, places the final operational neutral-rate proxy at 9.48% p.a., with a P25-P75 block range of 8.71%-9.97%. The ex-ante real rate is 10.04%, implying a policy gap of 0.56 p.p. and a neutral stance under the project's thresholds. This neutral classification should be read strictly relative to the project's elevated operational proxy, not relative to conventional long-run structural estimates. The high level of the estimate should not be interpreted as a definitive long-run structural neutral rate: it reflects recent Brazilian real-rate dynamics, market pricing, and trend-based measures in a restrictive cycle. The estimate should be interpreted as a short-to-medium-run shadow neutral-rate proxy under current restrictive monetary and risk-premium conditions, not as a steady-state structural equilibrium rate. The main contribution is therefore methodological and applied: the project offers a transparent, auditable, and extensible measurement system for tracking r-star proxies and monetary-policy stance in Brazil.
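The two arithmetic steps named in the abstract can be written out explicitly. The sketch assumes the exact Fisher relation $1 + r = (1 + i)/(1 + \pi^e)$; the abstract does not say whether the exact or the approximate form $r \approx i - \pi^e$ is used, and the function names are hypothetical. The stance thresholds are not given in the abstract, so none are coded.

```python
def ex_ante_real_rate(nominal, expected_inflation):
    """Exact Fisher relation, with all rates as decimals (0.10 = 10%)."""
    return (1.0 + nominal) / (1.0 + expected_inflation) - 1.0

def policy_gap_pp(real_rate_pct, neutral_proxy_pct):
    """Policy gap in percentage points: real rate minus neutral-rate proxy."""
    return real_rate_pct - neutral_proxy_pct

# Figures quoted in the abstract for May 2026.
gap = policy_gap_pp(10.04, 9.48)
```

The gap evaluates to 0.56 percentage points, matching the figure in the abstract.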

2606.18590 2026-06-18 econ.EM New submission

Ranking Treatment Saturations under Clustered Network Interference

Seungjin Han, Julius Owusu, Youngki Shin

AI summary: For ranking a finite set of treatment saturations under clustered network interference, proposes an empirical success ranking rule based on a two-stage randomized saturation design, derives non-asymptotic upper bounds on its maximum regret, and proves asymptotic optimality.

Comments: 67 pages, 5 figures

Abstract

In this paper, we study how to rank a finite set of treatment saturations for a target population with clustered network interference. We propose an empirical success (ES) ranking rule that, for each pair of saturations, selects the saturation level with the higher estimated welfare using data from a two-stage randomized saturation design. We adopt the statistical decision theory framework with additively separable regret loss to assess the performance of the ES ranking rule. We derive non-asymptotic upper bounds on the maximum regret of the ES ranking rule that depend on the within-cluster network only through a single combinatorial summary of its dependency structure. We exploit these bounds to characterize a quasi-optimal first-stage saturation distribution within the two-stage randomized saturation design. We further show that the ES ranking rule is asymptotically optimal among threshold ranking rules in the sense of minimizing an upper bound on the worst-case regret.

2606.19203 2026-06-18 eess.AS New submission

DASH: Dual-View Self-Distillation with Multi-Layer Hidden Representations for Robust Speech Recognition

Jaeeun Baik, Ui-Hyeop Shin, Jiwoon Lee, Woocheol Jeong, Hyung-Min Park

AI summary: Proposes DASH, a self-distillation framework that learns clean-noisy consistency from two views, distills hidden representations from multiple encoder layers, and minimizes the KL divergence between prototype assignment distributions, improving noise robustness while preserving clean accuracy at an additional cost of only about 4% of fine-tuning time.

Comments: Accepted to Interspeech 2026

Abstract

Automatic Speech Recognition (ASR) often degrades in real-world noisy environments, making noise robustness essential for deployment. Supervised noise-augmented fine-tuning is a common remedy, but it can introduce a robustness-clean trade-off and overfit to specific corruptions, degrading recognition in clean conditions. We propose DASH, a self-distillation framework that improves robustness by learning clean--noisy consistency from paired views. DASH distills hidden representations from multiple encoder layers to capture features from low-level acoustics to high-level semantics, and stabilizes training by minimizing KL divergence between prototype assignment distributions of clean and noisy views. Experiments on LibriSpeech show that DASH consistently improves recognition under diverse noisy conditions while preserving clean accuracy, achieved by a label-free pre-training stage with minimal additional overhead (about 4% of fine-tuning time) beyond standard fine-tuning.
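The consistency term described above, a KL divergence between prototype assignment distributions of the clean and noisy views, can be sketched for a single frame. The details are assumptions: assignments are taken to be a softmax over similarity scores to a small set of prototypes, the direction KL(clean || noisy) is chosen arbitrarily, and the scores are made up.

```python
import math

def softmax(scores, temperature=1.0):
    """Turn prototype similarity scores into an assignment distribution."""
    top = max(scores)                      # subtract the max for stability
    exps = [math.exp((s - top) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions over the same prototypes."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0.0
    )

clean_scores = [2.0, 0.5, -1.0]   # made-up similarities for the clean view
noisy_scores = [1.0, 0.8, -0.5]   # made-up similarities for the noisy view
loss = kl_divergence(softmax(clean_scores), softmax(noisy_scores))
```

The loss is zero when both views assign the frame to prototypes in the same proportions and grows as noise shifts the assignment.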

2606.19182 2026-06-18 eess.IV New submission

Optimized Multi-Contrast Self-Supervised MRI Reconstruction using Learned k-space Partitioning

Brenden Kadota, Charles Millard, Mark Chiew

AI summary: Proposes a multi-contrast self-supervised learning framework that learns the optimal partitioning of k-space data end to end, improving MRI reconstruction quality without fully sampled data.

Abstract

Objective: Deep learning has shown promise in accelerating MRI by reconstructing high-quality images from under-sampled data. While recent work has leveraged multi-contrast information to improve reconstruction performance, these methods rely on supervised learning, which requires fully sampled k-space for training. One method, self-supervised learning via data undersampling (SSDU), enables direct training on under-sampled k-space by partitioning it into two sets, with a network mapping between the two. In this work, we improve self-supervised MRI reconstruction with two modifications. Methods: We propose a multi-contrast self-supervised learning framework that jointly trains on multiple under-sampled contrasts without requiring fully sampled k-space data as a reference. Moreover, we learn an optimal self-supervised data partitioning for each contrast in an end-to-end manner, further enhancing reconstruction quality. Specifically, we learn an optimal partitioning probability distribution, which is sampled to generate a mask for partitioning. Results: Experiments on two publicly available multi-contrast MRI datasets demonstrate the improved reconstruction quality of our proposed self-supervised multi-contrast learned partitioning method compared to the current single-contrast self-supervised learning methods. We also demonstrate that learning the partitioning of k-space data further enhances the fidelity of reconstructions. Conclusion: Multi-contrast reconstruction combined with learned partitioning improves reconstruction fidelity over single-contrast self-supervised MRI reconstructions. Significance: Our method can facilitate higher image fidelity and/or accelerated MRI protocol times compared to previous self-supervised methods, and without requiring fully sampled k-space for training.
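The partitioning step, splitting the acquired k-space locations into two disjoint sets by sampling a mask from a probability distribution, can be sketched as follows. The probabilities are fixed numbers here. In the learned scheme they would be trainable, which needs a differentiable relaxation that this sketch does not attempt. The function name, index representation, and set roles (network input versus loss target) are assumptions.

```python
import random

def partition_kspace(sampled, keep_prob, seed=0):
    """Split acquired k-space indices into two disjoint sets.

    sampled: iterable of acquired k-space indices.
    keep_prob: dict mapping each index to the probability of placing it
               in the network-input set; the rest go to the loss set.
    """
    rng = random.Random(seed)
    input_set, loss_set = set(), set()
    for k in sampled:
        if rng.random() < keep_prob[k]:
            input_set.add(k)
        else:
            loss_set.add(k)
    return input_set, loss_set

acquired = [0, 2, 3, 5, 8, 9]                 # made-up acquired locations
probs = {k: 0.6 for k in acquired}            # uniform placeholder probabilities
input_set, loss_set = partition_kspace(acquired, probs)
```

Every acquired location lands in exactly one of the two sets, so training can feed one set to the network and score the prediction on the other without any fully sampled reference.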

2606.19102 2026-06-18 eess.SP New submission

Decentralized Power Control for Over-the-Air Computation with Phase Noise

Martin Dahl, Erik G. Larsson

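AI summary For over-the-air computation in which channel estimates are available only locally at the devices, proposes a decentralized power control scheme based on truncated channel inversion, gives an approximate closed-form solution and an exact numerical solver, proves that the mean-square error is independent of the number of receiver antennas, and reveals its relationship to the aggregate phase error.

Comments SPAWC 2026

Details
AI Chinese abstract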

相干空中计算(OAC)需要上行信道估计。当使用校准互易性进行信道估计时,估计值仅对设备本地可用。这对预编码和解码构成了挑战,因为无法集中协调。为此,我们使用截断信道反转(TCI),并提出了一个近似闭式解和一个精确数值求解器来优化TCI参数。重要的是,我们证明了所提出的TCI方案在均方误差(MSE)方面与接收天线数量无关。此外,我们的分析揭示了MSE与设备间预期聚合相位误差之间的明确联系,这有助于理解OAC的可扩展性。最后,与先前工作中使用全局可用无误差信道估计的参考方法进行的仿真比较表明,所提出的方法在某些条件下甚至优于这些参考方法的MSE。

English abstract

Estimation of uplink channels is required for coherent over-the-air computation (OAC). When channel estimation is done using calibrated reciprocity, the estimates are only available locally to the devices. This poses a challenge for precoding and decoding, which cannot be coordinated centrally. To this end, we use truncated channel inversion (TCI) and propose an approximate closed-form solution and an exact numerical solver to optimize the TCI parameters. Importantly, we prove that the mean-square error (MSE) of the proposed TCI scheme is independent of the number of receiver antennas. Furthermore, our analysis reveals a clear connection between the MSE and the expected aggregate phase error across devices, which gives insight into the scalability of OAC. Finally, simulations with comparisons to reference methods from prior work with globally available error-free channel estimates show that the proposed method comes close to these references, even outperforming them in MSE under some conditions.
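
As a rough illustration of truncated channel inversion, the sketch below has each device invert its own locally known channel only when the channel magnitude exceeds a threshold, and stay silent otherwise. This is a generic single-antenna textbook form, not the optimized scheme of the paper: the names `tci_precoder`, `target_gain`, and `threshold` are hypothetical, and phase noise, receiver noise, and the parameter optimization are all omitted.

```python
def tci_precoder(h, target_gain, threshold):
    """Transmit coefficient for one device under truncated channel inversion.

    h           -- the device's local complex channel estimate
    target_gain -- common effective gain every active device aims for (assumed)
    threshold   -- devices whose |h| falls below this stay silent (assumed)
    """
    if abs(h) < threshold:
        return 0j  # truncation: inverting a weak channel would cost too much power
    return target_gain / h  # inversion: h * (target_gain / h) equals target_gain


def received_sum(channels, values, target_gain, threshold):
    """Noise-free superposition of all device signals at a single receive antenna."""
    return sum(h * tci_precoder(h, target_gain, threshold) * x
               for h, x in zip(channels, values))


channels = [1 + 1j, 0.05j, -2 + 0j]
values = [1.0, 2.0, 3.0]
aggregate = received_sum(channels, values, target_gain=1.0, threshold=0.1)
print(aggregate)
```

The second device is truncated because its channel magnitude is below the threshold, so the aggregate contains only the first and third values. Each device needs only its own channel estimate, which is what makes the rule usable without central coordination.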

2606.19010 2026-06-18 eess.SP New submission

Channel Charting With Physical Channel Fingerprints For Massive MIMO-OFDM Channel Acquisition

基于物理信道指纹的大规模MIMO-OFDM信道获取的信道图构建

Jinke Tang, Xiqi Gao, Li You, Xiang-Gen Xia, Cheng-Xiang Wang

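AI summary Proposes channel charting with physical channel fingerprints, whose parameters are extracted using a cluster-based geometric stochastic channel model, enabling efficient acquisition of channel state information in massive MIMO-OFDM systems; the chart is used to obtain beam-domain statistical CSI, with performance close to that of traditional online probing.

Comments 15 pages, 9 figures

Details
AI Chinese abstract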

6G移动通信和定位技术的发展增强了位置感知工具的重要性,如位置索引信道指纹(CFs)和信道图,它们正成为大规模MIMO-OFDM系统的关键使能技术。本文提出一种基于物理CFs(PCFs)的新型信道图构建方法,并展示其在信道状态信息(CSI)获取中的有效性。首先,基于簇几何随机信道模型(GBSM)定义PCF,使其能够用紧凑参数集全面表示物理信道特性。然后,开发了大规模MIMO-OFDM系统中PCF获取的方法。通过利用PCF与空频时(SFT)域信道的关系,所提方法从多位置信道测量中提取PCF,并构建具有位置索引PCF的结构化信道图。此外,我们提出一种低复杂度算法,利用信道图中的PCF获取波束域统计CSI(sCSI)。所得sCSI可直接用作信道估计的先验信息。仿真结果表明,所提方法提供的sCSI性能与传统在线探测技术相当,且生成的sCSI可作为可靠先验知识显著提升信道估计精度。这些结果验证了所提PCF作为下一代移动通信信道获取和系统设计的强大且多功能工具。

English abstract

The advancement of 6G mobile communication and positioning technologies has amplified the significance of location-aware tools, such as location-indexed channel fingerprints (CFs) and channel charting, which are becoming key enablers for massive MIMO-OFDM systems. In this paper, we propose a novel channel charting method based on physical CFs (PCFs) and demonstrate its effectiveness in channel state information (CSI) acquisition. First, we define the PCF based on a cluster-based geometric stochastic channel model (GBSM), enabling a comprehensive representation of physical channel characteristics using a compact set of parameters. We then develop a methodology for PCF acquisition in massive MIMO-OFDM systems. By exploiting the relationship between PCFs and the space-frequency-time (SFT) domain channel, the proposed method extracts PCFs from multi-location channel measurements and constructs a structured channel chart with location-indexed PCFs. Furthermore, we propose a low-complexity algorithm to acquire beam-domain statistical CSI (sCSI) using the PCFs in the channel chart. The resulting sCSI can be directly employed as prior information for channel estimation. Simulation results show that the proposed method delivers sCSI performance comparable to traditional online probing techniques, and the generated sCSI can serve as reliable prior knowledge to significantly enhance the accuracy of channel estimation. These results validate the proposed PCF as a powerful and versatile tool for channel acquisition and system design in next-generation mobile communication.

2606.18985 2026-06-18 eess.AS New submission

SingFox: A Multi-Lingual Singfake Detection Corpus

SingFox: 多语言歌唱深度伪造检测语料库

Arth J. Shah, Devanshi K. Trivedi, Himanshi U. Borad, Hemant A. Patil

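AI summary To address the diversity of languages and fake-generation methods in singing deepfake detection, builds SingFox, a large-scale multilingual corpus of about 113,000 audio clips in 20 languages, and defines six evaluation tracks; experiments show a highest cross-dataset accuracy of 77.84%.

Comments Accepted at INTERSPEECH 2026

Details
AI Chinese abstract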

在这项工作中,我们介绍了SingFox,这是一个全面且大规模的数据集,专门设计用于支持歌唱深度伪造检测和源追踪系统的稳健评估。SingFox分为六个不同的轨道(T1--T6),每个轨道针对一种独特的新颖性形式,涵盖从语言多样性(全球和印度)到特定类型音乐和替代伪造生成方法。该数据集包含超过113,802个音频片段,涵盖20种语言,总计超过126.32小时的音频数据,并包含1,150位歌手。每个轨道旨在模拟真实场景,并评估模型在不同条件下的可靠性,从而评估其稳健性。SingFox旨在通过为歌唱深度伪造检测任务和源验证任务(模型可解释性)提供可靠的基准,促进可重复性并加速歌唱深度伪造检测的研究。实验结果显示,在跨数据集评估设置中最高准确率为77.84%。所有重现数据集所需的代码和资源均可在此https URL公开获取。

English abstract

In this work, we introduce SingFox, a comprehensive and large-scale dataset specifically designed to support robust evaluation of singing deepfake detection and source tracing systems. SingFox is divided into six distinct tracks (T1--T6), each targeting a unique form of novelty, ranging from language diversity (global and Indian) to genre-specific music and alternative fake generation methods. The dataset encompasses over 113,802 audio clips across 20 languages, totaling more than 126.32 hours of audio data and featuring 1,150 singers. Each track is designed to emulate real-world scenarios and evaluate how reliably models perform under different conditions, thereby assessing their robustness. SingFox aims to foster reproducibility and accelerate research in singing deepfake detection by providing a reliable benchmark for both the singfake detection task and the source verification task (model explainability). Experimental results show a highest accuracy of 77.84% in cross-dataset evaluation settings. All code and resources required to reproduce the dataset are publicly available at https://github.com/Arth-Shah/SingFox.

2606.18968 2026-06-18 eess.AS New submission

Audio-to-Audio via Diffusion Warm Initialization

通过扩散热初始化实现音频到音频的转换

Cristóbal Andrade, Sebastian J. Schlecht

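AI summary Proposes diffusion warm initialization for tasks such as timbre transfer, MIDI-to-Real synthesis, and audio enhancement; with a suitably chosen initialization time $t_\text{init}$, a single pretrained diffusion model can support multiple transformation objectives without task-specific training.

Details
AI Chinese abstract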

在本文中,我们提出扩散热初始化作为一种简单而有效的方法,用于一系列音频到音频的转换任务。为了说明该方法的通用性,我们展示了其在音色迁移、MIDI到真实合成以及多种音频增强任务中的应用。我们对音色迁移进行了详细的实证分析,以研究初始化时间$t_\text{init}$的作用。使用基于音高的Jaccard距离和Fréchet音频距离评估$t_\text{init}$的效果,以量化对输入信号的忠实度和与目标分布的对齐程度。我们的结果为选择$t_\text{init}$提供了实用指导,并表明一旦适当选择,单个预训练扩散模型结合热初始化即可支持多种转换目标,无需任务特定训练或条件化。尽管方法简单,但与专门为这些任务设计的更复杂流程相比,该方法已取得有竞争力的结果。我们进一步观察到,热初始化不一定需要显式噪声注入,因为引导信号本身通常可以作为反向扩散过程的有效初始化状态。总之,这些发现表明热初始化提供了一种简单而有效的框架,可作为更复杂音频转换流程的基本构建块。

English abstract

In this paper, we propose diffusion warm initialization as a simple yet effective approach for a range of audio-to-audio transformation tasks. To illustrate the generality of the approach, we demonstrate its use in timbre transfer, MIDI-to-Real synthesis, and multiple audio enhancement tasks. We conduct a detailed empirical analysis on timbre transfer to investigate the role of the initialization time $t_\text{init}$. The effect of $t_\text{init}$ is evaluated using pitch-based Jaccard Distance and Fréchet Audio Distance to quantify faithfulness to the input signal and alignment with the target distribution. Our results provide practical guidance for selecting $t_\text{init}$ and show that, once properly chosen, a single pretrained diffusion model combined with warm initialization can support multiple transformation objectives without task-specific training or conditioning. Despite its simplicity, this approach already achieves competitive results when compared with more complex pipelines designed specifically for these tasks. We further observe that warm initialization does not necessarily require explicit noise injection, as the guide signal itself can often serve as a valid initialization state for the backward diffusion process. Together, these findings show that warm initialization provides a simple and effective framework that serves as a fundamental building block for more complex audio transformation pipelines.
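
For readers unfamiliar with the Jaccard distance used as the faithfulness measure, a minimal sketch follows. It compares two sets of detected pitches; representing pitches as MIDI note numbers and the name `jaccard_distance` are assumptions made for illustration, since the abstract does not say how the pitch sets are built.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two collections, treated as sets.

    0.0 means identical sets, 1.0 means no element in common.
    """
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical (a convention)
    return 1.0 - len(a & b) / len(union)


input_pitches = [60, 64, 67]   # hypothetical pitches detected in the input
output_pitches = [60, 64, 69]  # hypothetical pitches detected in the output
distance = jaccard_distance(input_pitches, output_pitches)
print(distance)
```

Here two of the four distinct pitches are shared, so the distance is 0.5. A lower value indicates that the transformed audio preserved more of the input's pitch content.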

2606.18917 2026-06-18 eess.SP New submission

Spaceborne SAR Change Detection and Coherence Analysis for Maritime Port Monitoring

星载SAR变化检测与相干分析在海事港口监测中的应用

Necati Kagan Erkek, Kudret Esmer

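AI summary Uses spaceborne SAR amplitude and interferometric coherence in a multitemporal analysis to detect structural changes in a port, with an estimated range resolution of about 0.42 m.

Comments 6 pages

Details
AI Chinese abstract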

星载合成孔径雷达(SAR)提供相干微波图像,适用于在光照和天气无关的采集条件下进行海事基础设施监测。针对中国天津港的SAR幅度和地理编码多时相数据,进行了学术会议风格的分析。处理流程包括幅度可视化、辐射定标、视角方向解释、距离和方位分辨率评估、斑点噪声抑制、基于幅度的变化检测、用于地理检查的GeoTIFF导出以及干涉相干性估计。直方图引导的显示限制提高了复杂SAR幅度图像的可解释性,而对阴影和亮叠掩响应的放大检查支持对光照几何的定性解释。使用二维傅里叶分析来表征主导频谱内容,并在可用图像坐标校准下估计出约0.42米的距离分辨率和0.19度的方位角间隔。随后,通过滤波后的幅度差和多个空间平均窗口计算的相干图,对多时相主从图像进行比较。结果突出了SAR幅度和相干产品在检测密集港口环境中(包括船舶、储罐、码头结构、工业场地和水陆过渡带)的结构和表面条件变化的相关性。

English abstract

Spaceborne synthetic aperture radar (SAR) provides coherent microwave imagery suitable for maritime infrastructure monitoring under illumination-independent and weather-independent acquisition conditions. An academic conference-style analysis is presented for SAR amplitude and geocoded multitemporal data over Tianjin Port, China. The processing chain includes amplitude visualization, radiometric scaling, view-direction interpretation, range and azimuth resolution assessment, speckle reduction, amplitude-based change mapping, GeoTIFF export for geographic inspection, and interferometric coherence estimation. Histogram-guided display limits improve the interpretability of the complex SAR magnitude images, while zoomed inspection of shadows and bright layover responses supports qualitative interpretation of illumination geometry. A two-dimensional Fourier analysis is used to characterize dominant spectral content and to estimate an approximate range resolution of 0.42 m and an azimuth angular separation of 0.19 degrees under the available image-coordinate calibration. Multitemporal master and slave images are subsequently compared through filtered amplitude differences and coherence maps computed with multiple spatial averaging windows. The results highlight the relevance of SAR amplitude and coherence products for detecting structural and surface-condition variations in dense port environments characterized by vessels, storage tanks, quay structures, industrial yards, and water-land transitions.
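
The coherence maps mentioned above rest on a standard estimator: the magnitude of the normalized cross-correlation of two co-registered complex images over a spatial averaging window. The sketch below applies it to a single flattened window; the function name and the sample values are hypothetical, and the paper's actual window sizes and processing details are not reproduced.

```python
import cmath
import math


def coherence(s1, s2):
    """Sample coherence magnitude of two co-registered complex windows.

    Returns a value in [0, 1]: 1 for signals identical up to a constant
    phase, near 0 for decorrelated signals.
    """
    cross = sum(a * b.conjugate() for a, b in zip(s1, s2))
    power1 = sum(abs(a) ** 2 for a in s1)
    power2 = sum(abs(b) ** 2 for b in s2)
    if power1 == 0 or power2 == 0:
        return 0.0
    return abs(cross) / math.sqrt(power1 * power2)


window = [1 + 2j, -0.5 + 1j, 3 - 1j, 0.2 + 0.1j]
rotated = [s * cmath.exp(0.3j) for s in window]  # same scene, constant phase shift
unchanged = coherence(window, rotated)
changed = coherence([1 + 0j, 1 + 0j], [1 + 0j, -1 + 0j])
print(unchanged, changed)
```

A constant interferometric phase leaves the coherence at 1, while a sign flip in half of the samples drives it to 0. Larger averaging windows reduce estimator noise at the cost of spatial detail, which is presumably why several window sizes are compared.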

2606.18827 2026-06-18 eess.SP New submission

Controlled Out-of-Band Device-to-Device Communication in Cellular Networks Using a Backup Channel in Television White Space

蜂窝网络中利用电视白空间备份信道的受控带外设备到设备通信

Saifur Rahman, Syed Luqman Shah, Salim Nasar Faraj Mursal, Ziaul Haq Abbas, Muhammad Usman, Muhammad Irfan, Fazal Muhammad

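AI summary To address spectrum scarcity in cellular networks, proposes using cognitive radio to detect a backup channel in television white space and, when the regular channels are busy, establishing a controlled out-of-band D2D link, which reduces the blocking and call-delay probabilities.

Comments Published in: 2023 18th International Conference on Emerging Technologies (ICET)

Details
AI Chinese abstract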

本文解决了蜂窝网络(CN)中的频谱稀缺问题。我们为位于同一宏小区内、受单个宏基站(eNB)控制的蜂窝用户(CU)提出了一种备份信道(BuC)。该BuC在电视白空间中运行,并通过认知无线电能量检测信道感知技术以一定的成功概率被CU检测到。当所有与蜂窝eNB的常规信道都被占用时,同一宏eNB覆盖区域内的CU可以利用感知到的BuC建立受控的带外设备到设备链路进行通信。BuC绕过eNB进行数据通信,减轻了CN核心的负担,从而提高了蜂窝eNB的容量。在所提出的系统模型中,每个CU和eNB配备两根天线,用于在两个独立的频段(即蜂窝频段和电视频段)进行通信。仿真结果表明,阻塞概率和呼叫延迟概率显著降低。

English abstract

In this article, we address the problem of spectrum scarcity in cellular networks (CNs). We propose a backup channel (BuC) for cellular users (CUs) located in the same macro-cell under the control of a single macro base station (eNB). This BuC operates in television white space and is detected by the CUs through a cognitive radio energy-detection channel-sensing technique with a certain probability of success. When all regular channels with the cellular eNB are occupied, the CUs within the same coverage area of the macro eNB can utilize the sensed BuC to establish a controlled out-of-band device-to-device link for communication. The BuC bypasses the eNB for data communication and reduces the burden on the core of the CN. This leads to improved cellular eNB capacity. In the proposed system model, each CU and eNB is equipped with two antennas for communication in two separate bands, i.e., cellular and TV bands. Simulations show significant reductions in the blocking probability and probability of call delay.
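
Energy detection, the sensing technique named above, can be sketched in a few lines: average the energy of the received samples and compare it with a threshold. The threshold value, the sample data, and the function name are hypothetical; the paper's sensing parameters and success probability are not modeled.

```python
def energy_detect(samples, threshold):
    """Return True when the average sample energy exceeds the threshold.

    True means the band looks occupied; False means it may serve as a
    backup channel.
    """
    energy = sum(abs(x) ** 2 for x in samples) / len(samples)
    return energy > threshold


noise_threshold = 0.05              # assumed decision threshold
quiet = [0.1, -0.05, 0.08, -0.1]    # noise only
busy = [1.2, -0.9, 1.1, -1.0]       # a TV transmitter is active
quiet_occupied = energy_detect(quiet, noise_threshold)
busy_occupied = energy_detect(busy, noise_threshold)
print(quiet_occupied, busy_occupied)
```

Only the quiet band would be offered as a backup channel. In practice the threshold trades missed detections against false alarms, which is why sensing succeeds only with a certain probability.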

2606.18766 2026-06-18 eess.SP New submission

Rotatable Antenna-Enhanced Secure Integrated Sensing and Communications Under Imperfect CSI

不完美信道状态信息下可旋转天线增强的安全集成感知与通信

Qi Yang, Kai Liu, Jingjing Zhao, Xidong Mu, Tianqi Mao, Kaiquan Cai

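AI summary Under imperfect eavesdropping channel state information, proposes a rotatable antenna-enhanced secure integrated sensing and communications system that maximizes the minimum data rate by jointly optimizing the transmit beamforming, the artificial noise covariance matrix, and the antenna boresights, subject to maximum information leakage and minimum sensing power constraints.

Details
AI Chinese abstract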

研究了一种可旋转天线增强的安全集成感知与通信系统,其中基于RA的收发器同时与合法用户通信并感知被视为潜在窃听者的目标。在不完美窃听信道状态信息下,通过联合优化发射波束成形、人工噪声协方差矩阵以及RA的发射/接收指向,在最大信息泄露和最小感知功率约束下,制定了一个最大-最小数据速率优化问题。为了解决高度非凸的问题,分别通过S-Procedure方法和Cauchy-Schwarz不等式将信息泄露和感知功率约束转化为凸约束。随后,开发了一种交替优化算法,将重新表述的问题分解为两个子问题。具体地,利用逐次凸逼近和半定松弛方法优化发射波束成形和人工噪声协方差矩阵,而通过粒子群优化获得RA指向。仿真结果表明,基于RA的方案显著优于基准方案,并且随着最大旋转范围的增加,对不完美CSI的鲁棒性增强。

English abstract

A rotatable antenna (RA)-enhanced secure integrated sensing and communications system is investigated, where an RA-based transceiver simultaneously communicates with legitimate users and senses a target that is regarded as a potential eavesdropper. Under imperfect eavesdropping channel state information (CSI), a max-min data rate optimization problem is formulated by jointly optimizing the transmit beamforming, artificial noise (AN) covariance matrix, and transmit/receive boresights of RAs, subject to the maximum information leakage and minimum sensing power constraints. To address the highly non-convex problem, the information leakage and sensing power constraints are transformed into convex ones via the S-procedure and the Cauchy-Schwarz inequality, respectively. Subsequently, an alternating optimization algorithm is developed to decompose the reformulated problem into two subproblems. In particular, the transmit beamforming and AN covariance matrix are optimized using successive convex approximation and semi-definite relaxation, while the RA boresights are obtained via particle swarm optimization. Simulation results show that the RA-based scheme significantly outperforms the benchmarks and becomes more robust to imperfect CSI as the maximum rotation range increases.

2606.18758 2026-06-18 eess.SP New submission

EH-FedSAG: Variance-Reduced Federated Learning with Energy-Aware Participation in Energy-Harvesting IoT

EH-FedSAG:能量采集物联网中具有能量感知参与的方差缩减联邦学习

Shahab Jahanbazi, Mateen Ashraf, Richard Demo Souza, Onel L. A. Lopez

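AI summary To address unstable device participation and high communication cost in energy-harvesting networks, proposes EH-FedSAG, a server-memory-based variance-reduced method, and compares it with EH-FedAvg in a unified simulation framework; experiments show that EH-FedSAG outperforms EH-FedAvg in both test accuracy and training variance, with a larger advantage under scarce energy and non-IID data.

Details
AI Chinese abstract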

能量采集网络中的联邦学习面临两大挑战:间歇性和随机性能量到达导致训练轮次中设备参与不稳定,以及有限能量预算下的高通信成本降低了整体训练效率。本文研究了基于时隙能量采集模型的联邦学习,并提出了EH-FedSAG,一种基于服务器存储的方差缩减方法。我们在相同的多信道正交多址接入上行链路模型下,并在一个统一的仿真框架内比较了EH-FedSAG与原始EH-FedAvg,该框架捕获了不同能量到达概率下的电池充电、本地计算成本和传输成本。性能根据同质和异质数据分布下训练轮次的测试准确率进行评估。结果表明,在所考虑的设置中,EH-FedSAG始终比EH-FedAvg获得更高的测试准确率,同时表现出显著更低的训练方差。在能量稀缺和非独立同分布数据下,EH-FedSAG的优势更为明显。

English abstract

Federated learning (FL) in energy-harvesting (EH) networks is challenged by intermittent and stochastic energy arrivals that lead to unstable device participation across training rounds, and by high communication costs under limited energy budgets, reducing overall training efficiency. This paper studies FL under a slot-based EH model and proposes EH-FedSAG, a server-memory-based variance-reduced method. We compare EH-FedSAG with vanilla EH-FedAvg under the same multi-channel orthogonal multiple-access uplink model and within a unified simulation framework that captures battery charging, local computation cost, and transmission cost under different energy-arrival probabilities. Performance is assessed in terms of test accuracy over training rounds for both homogeneous and heterogeneous data distributions. The results show that EH-FedSAG consistently achieves higher test accuracy than EH-FedAvg in the considered settings, while exhibiting substantially lower training variance. The advantage of EH-FedSAG is more pronounced under scarce energy availability and non-independent and identically distributed (non-IID) data.

2606.18615 2026-06-18 eess.AS eess.SP New submission

A Survey of Methods for the Discretization of Phonograph Record Playback Filters

留声机唱片回放滤波器离散化方法综述

Benjamin R. Thompson, Tre DiPassio, Jenna Rutowski, Michael C. Heilemann

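AI summary Surveys methods for discretizing phonograph record playback equalization curves from continuous time into digital filters, compares them in terms of performance, computational cost, and latency, and serves as a reference for developing digital playback equalization systems.

Comments Presented at the AES 157th Convention, Best Student Paper Winner

Journal ref 2024 Journal of the Audio Engineering Society, AES Convention Paper 10191

Details
AI Chinese abstract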

自1924年留声机唱片电气录音问世以来,唱片故意采用非均匀频率响应进行刻录,以最大化唱片上的信息密度并提高信噪比。为了在可用带宽内再现名义上平坦的信号,必须在回放时应用逆曲线来消除这种刻录曲线的影响。直到1953年引入所谓的RIAA曲线之前,任何特定唱片所需的回放曲线可能因唱片公司和时间而异。因此,任何想要聆听或恢复唱片信息的人必须拥有能够实现多种回放均衡的设备。这种校正可以通过模拟硬件或数字处理来实现。数字方法具有成本低和灵活性高的优点,但需要从连续时间(原始曲线定义域)到离散时间的变换。这种变换不可避免地会在奈奎斯特频率附近产生与连续时间响应的偏差。有许多成熟的方法用于离散化连续时间滤波器,这些方法在性能、计算成本和固有延迟方面各不相同。本文在留声机回放均衡的背景下探讨了执行这种变换的几种方法,并量化了每种方法的性能。本文旨在为开发数字回放均衡系统或类似需要数字近似连续时间滤波器响应的应用的人员提供参考。

English abstract

Since the inception of electrical recording for phonograph records in 1924, records have been intentionally cut with a non-uniform frequency response to maximize the information density on a disc and to improve the signal-to-noise ratio. To reproduce a nominally flat signal within the available bandwidth, the effects of this cutting curve must be undone by applying an inverse curve on playback. Until 1953, with the introduction of what has become known as the RIAA curve, the playback curve required for any particular disc could vary by record company and over time. As a consequence, anyone seeking to hear or restore the information on a disc must have access to equipment that is capable of implementing multiple playback equalizations. This correction may be accomplished with either analog hardware or digital processing. The digital approach has the advantages of reduced cost and expanded versatility, but requires a transformation from continuous time, where the original curves are defined, to discrete time. This transformation inevitably comes with some deviations from the continuous-time response near the Nyquist frequency. There are many established methods for discretizing continuous-time filters, and these vary in performance, computational cost, and inherent latency. In this work, several methods for performing this transformation are explored in the context of phonograph playback equalization, and the performance of each approach is quantified. This work is intended as a resource for anyone developing systems for digital playback equalization or similar applications that require approximating the response of a continuous-time filter digitally.
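
To make the deviation near the Nyquist frequency concrete, the sketch below discretizes the RIAA playback curve with the bilinear transform, one of the common methods a survey like this would cover, and measures how far the digital response drifts from the analog one. The time constants 3180, 318, and 75 microseconds are the standard RIAA values; the choice of the bilinear transform without pre-warping, the 48 kHz sample rate, and the function names are assumptions for illustration, not the paper's implementation.

```python
import cmath
import math

# Standard RIAA playback time constants, in seconds.
T1, T2, T3 = 3180e-6, 318e-6, 75e-6


def riaa_playback(s):
    """Continuous-time RIAA playback (de-emphasis) transfer function H(s)."""
    return (1 + s * T2) / ((1 + s * T1) * (1 + s * T3))


def analog_response(f):
    """Analog frequency response at f hertz."""
    return riaa_playback(2j * math.pi * f)


def bilinear_response(f, fs):
    """Response of the bilinear-transform discretization at f < fs / 2."""
    z_inv = cmath.exp(-2j * math.pi * f / fs)
    s = 2 * fs * (1 - z_inv) / (1 + z_inv)  # bilinear substitution, no pre-warping
    return riaa_playback(s)


def deviation_db(f, fs):
    """Magnitude error of the digital filter relative to the analog curve, in dB."""
    return 20 * math.log10(abs(bilinear_response(f, fs)) / abs(analog_response(f)))


for f in (100, 1000, 10000, 20000):
    print(f, round(deviation_db(f, 48000), 3))
```

The error is negligible at 1 kHz and grows to several decibels at 20 kHz, because the bilinear transform compresses the whole analog frequency axis into the band below the Nyquist frequency. Pre-warping, oversampling, or a different discretization method trades this error against cost and latency.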

2606.18573 2026-06-18 eess.AS eess.SP New submission

Evaluating Dynamic Range Compressor Models Using Control-Voltage Measurements: an Approach and Dataset

使用控制电压测量评估动态范围压缩器模型:一种方法和数据集

Benjamin R. Thompson, Michael C. Heilemann

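AI summary Proposes evaluating dynamic range compressor models by directly comparing the model's gain-reduction signal with the hardware's gain control voltage signal; experiments show that models trained with proxy losses fall short of a directly trained model on the control trajectory, and a dataset that includes the gain control voltage signal is released.

Comments Accepted to DAFx 2026

Details
AI Chinese abstract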

定义动态范围压缩器行为的量是作为输入电平函数的时变增益。然而,由于从现有数据集中包含的音频输入输出数据中隔离增益降低信号会产生病态逆问题,这些设备的模型通常使用代理指标进行评估。目前尚不清楚这些指标在多大程度上准确描述了模型需要模拟的行为,尤其是当基于波形的指标可能受到模拟处理和捕获引入的次要影响(即使这些影响听不见)时。我们研究了一种评估方法,其中模型产生的增益降低信号直接与硬件产生的增益降低控制电压信号进行比较。为了评估该指标作为学习目标的有效性,我们训练了一个灰盒模型,其损失直接基于增益控制信号计算,同时训练了两个使用常见代理损失的模型。在底层控制轨迹评估中,使用代理损失训练的模型未能达到与直接基于增益控制信号训练的模型相当的性能,并且波形域指标为直接指标明显区分的模型分配了相似的误差。为了促进这种评估方法的进一步探索,我们发布了一个Solid State Logic总线压缩器数据集,其中包含与音频输出一起捕获的增益控制电压信号。

English abstract

The quantity that defines the behavior of a dynamic range compressor is the time-varying gain applied to the signal as a function of the input level. However, models of these devices are typically evaluated using proxy metrics because isolating the gain reduction signal from the audio input-output data included in existing datasets creates an ill-conditioned inverse problem. It is unclear how accurately these metrics describe the behavior the model is tasked with emulating, particularly as waveform-based metrics can be influenced by secondary effects introduced by analog processing and capture, even when those effects are inaudible. We investigate a method of evaluation in which the gain-reduction signal produced by a model is measured directly against a gain-reduction control voltage signal produced by the hardware. To evaluate the efficacy of this metric as a learning objective, a gray-box model is trained using loss computed directly over the gain control signals alongside two models trained using common proxy losses. The models trained using proxy losses did not achieve parity with models trained directly on the gain control signal when evaluated with respect to the underlying control trajectory, and the waveform-domain metrics assigned similar errors to models that were clearly separated by the direct metric. To facilitate further exploration of this method of evaluation, we present a Solid State Logic bus compressor dataset that includes the gain control voltage signal captured alongside the audio output.

2606.18492 2026-06-18 eess.IV New submission

Dense Holographic Associative Memories

密集全息联想记忆

David J. Brady, Gregory Neory

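AI summary Proposes a cascade of two holograms that realizes modern Hopfield dense associative memory as a parallel optical computation; a one-dimensional coded layer introduces the nonlinearity and removes the Bragg degeneracy, and a nonlocal, gradient-responsive medium is designed to achieve linear efficiency scaling.

Details
AI Chinese abstract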

联想回忆——将入射模式映射到存储中最相似的模式——是高维视觉前端的自然计算原语,正是体全息图原生执行的操作。我们证明,由一维编码层分隔的两级体全息图级联,通过并行光学计算精确地实现了现代Hopfield(密集联想记忆)检索映射 $\eta = V \text{softmax}(\lambda K^T x)$,其中逆温度通过编码层中的光学寻址空间光调制实现。通过一维编码路由输入和输出,而非直接在二维平面间路由,提供了原始Hopfield模型所缺乏的分离非线性,并通过平衡光栅波矢维度数($2+1=3$),消除了直接二维到二维全息图中强制分形采样的布拉格简并。忠实的密集存储进一步要求记录介质能够捕获神经元间连接,同时抑制导致均匀光折变材料效率下降 $M^{-2}$ 的场自能。我们提出一种非局部、梯度响应的介质,其与照明无关的衰变在原位恢复了线性 $M^{-1}$ 缩放,并在离散的对向二极管单元中展示了其接收、组合和存储功能。概述了实现OASLM堆叠和体积分子/纳米晶体的途径。

English abstract

Associative recall -- mapping an incident pattern to the stored one it most resembles -- is the natural computational primitive of a high-dimensional vision front end, and it is precisely the operation a volume hologram performs natively. We show that a cascade of two volume holograms separated by a one-dimensional coded layer physically evaluates the modern Hopfield (dense associative memory) retrieval map, $\eta = V \text{softmax}(\lambda K^T x)$, exactly as a parallel optical computation, with the inverse temperature realized via optically addressed spatial light modulation in the coded layer. Routing the input and output through a 1D code rather than directly between 2D planes supplies the separating nonlinearity the original Hopfield model lacked and, by balancing the grating-wavevector dimension count ($2+1=3$), removes the Bragg degeneracy that otherwise forces fractal sampling on a direct 2D-to-2D hologram. Faithful dense storage further demands a recording medium that captures inter-neuron connections while rejecting the field self-energy responsible for the $M^{-2}$ efficiency falloff of homogeneous photorefractives. We propose a nonlocal, gradient-responsive medium whose illumination-independent decay recovers the linear $M^{-1}$ scaling in situ, and demonstrate its reception, combination, and storage functions in a discrete opposing-diode cell. Routes to OASLM-stack and volume molecular/nanocrystal realizations are outlined.
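
The retrieval map in this abstract can be evaluated numerically in a few lines, which may help in reading the formula: similarity scores between the query and each stored key pass through a softmax scaled by the inverse temperature, and the result weights the stored values. The sketch is a plain software evaluation of the formula, not a model of the optics; storing each pattern as a row and the names `retrieve` and `inv_temp` are choices made here.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def retrieve(keys, values, x, inv_temp):
    """Evaluate eta = V softmax(lambda K^T x) with one stored pattern per row.

    keys     -- M stored key patterns (the columns of K)
    values   -- M stored value patterns (the columns of V)
    x        -- query pattern
    inv_temp -- inverse temperature lambda
    """
    scores = [inv_temp * sum(k_i * x_i for k_i, x_i in zip(key, x)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * value[d] for w, value in zip(weights, values)) for d in range(dim)]


keys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
values = [[1.0, -1.0], [-1.0, 1.0]]
query = [0.9, 0.1, 0.0]  # a noisy version of the first key

sharp = retrieve(keys, values, query, inv_temp=50.0)
flat = retrieve(keys, values, query, inv_temp=0.0)
print(sharp, flat)
```

With a large inverse temperature the output is essentially the value paired with the closest key, while at zero inverse temperature every stored value is averaged equally. This is the separating behavior that the abstract attributes to the coded layer.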

2606.18489 2026-06-18 eess.IV New submission

GHOST-CAT: An Efficient and Practical Network for Mesh Generation from 3D Echocardiography

GHOST-CAT: 一种高效实用的三维超声心动图网格生成网络

Edward Ferdian, Debbie Zhao, Alistair A. Young, Martyn P. Nash

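AI summary Proposes GHOST-CAT, a two-stage network combining CNNs, graph convolutions, and transformers to generate topologically consistent, temporally coherent left-ventricle meshes from 3D echocardiograms; on a 100-image test set it reaches Dice coefficients of 0.87 (cavity) and 0.75 (myocardium), outperforming existing methods.

Details
AI Chinese abstract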

深度学习的最新进展显著加速了心脏成像工作流程,从分割到用于计算建模的网格生成。然而,由于3D超声心动图的低对比度噪声比、锥形视野以及对声影的敏感性,其分析面临独特挑战。在此,我们提出了一种专为3D超声心动图定制的高效实用网络。我们的方法由一个两阶段网络组成,结合了卷积神经网络、图卷积网络和Transformer,以创建准确的时间变化3D左心室网格,这些网格在整个心动周期中拓扑一致且时间连贯。我们的模型在100张3D超声图像的保留测试数据集上实现了比当前最先进方法更优越的网格重建精度,与心脏磁共振成像导出的参考分割相比,Dice系数为0.87±0.05(腔室)和0.75±0.07(心肌),平均±标准差表面距离为3.3±0.6毫米(心内膜)和3.5±0.5毫米(心外膜)。重建的网格能够自动计算常规临床指标,如体积、质量和应变,并支持生物物理数字孪生的高级应用。源代码在此https URL公开共享。

English abstract

Recent advances in deep learning have significantly accelerated cardiac imaging workflows, from segmentation to the generation of meshes for computational modelling. Nevertheless, analysis of 3D echocardiograms presents unique challenges due to their low contrast-to-noise ratio, conical field of view, and susceptibility to acoustic shadowing. Here, we present an efficient and practical network tailored for 3D echocardiograms. Our method consists of a two-stage network that combines convolutional neural networks, graph convolutional networks, and transformers, to create accurate time-varying 3D meshes of the left ventricle that are topologically consistent and temporally coherent throughout the cardiac cycle. Our model achieved superior mesh reconstruction accuracy compared to current state-of-the-art methods on a held-out test dataset of 100 3D echo images, with a Dice coefficient of 0.87 +/- 0.05 (cavity) and 0.75 +/- 0.07 (myocardium), and mean +/- SD surface distances of 3.3 +/- 0.6 mm (endocardium) and 3.5 +/- 0.5 mm (epicardium), against reference segmentations derived from cardiac magnetic resonance imaging. The reconstructed mesh enables automated calculation of routine clinical indices, such as volume, mass, and strain, and enables advanced applications with biophysical digital twins. Source code is openly shared at https://github.com/EdwardFerdian/ghost-cat.
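
The Dice coefficient reported above measures the overlap between a predicted and a reference region as twice the shared volume divided by the sum of the two volumes. A minimal sketch on flattened binary masks follows; it assumes the mesh has already been converted to a voxel mask, and the function name and toy masks are hypothetical.

```python
def dice_coefficient(pred, ref):
    """Dice overlap between two binary masks given as flat 0/1 sequences."""
    if len(pred) != len(ref):
        raise ValueError("masks must have the same number of voxels")
    overlap = sum(1 for p, r in zip(pred, ref) if p and r)
    total = sum(1 for p in pred if p) + sum(1 for r in ref if r)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * overlap / total


predicted = [1, 1, 1, 0, 0, 0]
reference = [0, 1, 1, 1, 0, 0]
score = dice_coefficient(predicted, reference)
print(score)
```

Two of the three voxels in each mask coincide, giving a score of two thirds. A score of 1 means identical masks and 0 means no overlap, so the reported 0.87 and 0.75 indicate closer agreement for the cavity than for the thinner myocardium.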

2606.18488 2026-06-18 eess.SP New submission

Cell-Free Integrated Sensing and Communication

无蜂窝一体化感知与通信

Diluka Galappaththige, Chintha Tellambura

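AI summary Surveys the integration of cell-free architectures with integrated sensing and communication, covering key topics such as distributed access points, multi-static sensing, and resource optimization, and looks ahead to future directions.

Details
AI Chinese abstract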

无蜂窝(CF)一体化感知与通信(ISAC)将CF架构与ISAC功能相结合。CF-ISAC利用分布式接入点,消除小区边界,提升覆盖、频谱效率和可靠性。它还提高了能效,实现了鲁棒的多用户通信、分布式多站感知和无缝资源优化。目前缺乏对CF-ISAC的全面综述。本专著填补了这一空白,涵盖了基本原理、协作传输、雷达散射截面、目标参数估计、ISAC集成级别、感知指标和关键应用。它还探讨了多站感知的优势。讨论了性能分析、资源分配、安全性和用户/目标中心设计。最后,讨论了同步、多目标检测、干扰管理和前传限制。介绍了先进天线技术、网络辅助系统、近场CF-ISAC、跨技术集成和机器学习方法。

English abstract

Cell-free (CF) integrated sensing and communication (ISAC) merges the CF architecture with ISAC functionalities. CF-ISAC leverages distributed access points, removes cell boundaries, and enhances coverage, spectral efficiency, and reliability. It also improves energy efficiency, enabling robust multi-user communication, distributed multi-static sensing, and seamless resource optimization. A comprehensive survey on CF-ISAC has been lacking. This monograph addresses that gap by covering the foundational principles, cooperative transmission, radar cross-section, target parameter estimation, ISAC integration levels, sensing metrics, and key applications. It also explores the advantages of multi-static sensing. Performance analysis, resource allocation, security, and user/target-centric designs are discussed. Finally, synchronization, multi-target detection, interference management, and fronthaul limitations are discussed. Advanced antenna technologies, network-assisted systems, near-field CF-ISAC, cross-technology integration, and machine learning approaches are presented.

2606.18435 2026-06-18 eess.SP New submission

Covert Multi-Hop Communications for Heterogeneous Networks With Multiple Wardens

异构网络中多监听者场景下的隐蔽多跳通信

Justin H. Kong, Terrence J. Moore, Fikadu T. Dagefu

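AI summary For heterogeneous wireless networks monitored by multiple passive wardens, jointly optimizes routing, modality selection, and transmit power to maximize network covertness under an end-to-end rate requirement, and proposes a low-complexity routing metric based on KL divergence together with a two-stage optimization algorithm.

Details
AI Chinese abstract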

本文研究了由多个被动监听者监控的异构无线网络中的隐蔽多跳通信。为了在满足严格的端到端速率要求的同时最大化网络范围的隐蔽性,我们联合优化了路由、模态选择和发射功率。在同步多跳传输方案下,我们分析了两种不同的监听者模型的检测能力:采用中央融合中心的合谋监听者和独立运行的非合谋监听者。对于这两种模型,我们推导了最优检测器和检测错误概率(DEP)的精确表达式。此外,为了降低评估DEP的复杂度,我们基于伽马矩匹配开发了高精度的闭式近似,并使用Kullback-Leibler(KL)散度建立了严格的DEP下界。在此理论基础上,我们提出了一种高效的两阶段优化算法,将链路级资源分配与网络级路径选择解耦。通过将KL散度界转化为一种新颖的低复杂度路由度量(该度量普遍简化为信噪比的线性求和),与传统的基于每跳检测的度量相比,我们显著降低了计算开销。最后,数值仿真验证了理论分析,并展示了所提框架的接近最优的性能。

English abstract

This paper investigates covert multi-hop communications in heterogeneous wireless networks monitored by multiple passive wardens. To maximize network-wide covertness while satisfying a strict end-to-end rate requirement, we jointly optimize routing, modality selection, and transmit power. Under a simultaneous multi-hop transmission scheme, we analyze the detection capabilities of two distinct warden models: colluding wardens employing a central fusion center, and non-colluding wardens operating independently. For both models, we derive optimal detectors and exact expressions for the detection error probability (DEP). In addition, to reduce the complexity of evaluating the DEP, we develop highly accurate closed-form approximations based on gamma moment matching and establish rigorous DEP lower bounds using Kullback-Leibler (KL) divergence. Building on this theoretical foundation, we propose an efficient two-stage optimization algorithm that decouples link-level resource allocation from network-level path selection. By translating the KL divergence bounds into a novel, low-complexity routing metric, which universally simplifies to a linear summation of signal-to-noise ratios, we substantially reduce the computational overhead compared to conventional per-hop detection-based metrics. Finally, numerical simulations validate the theoretical analysis and demonstrate the near-optimal performance of the proposed framework.

2606.19125 2026-06-18 eess.AS stat.ME New submission

Continuous-Speech Parkinson's Disease Detection Using Acoustic and Inharmonicity Features

连续语音帕金森病检测:基于声学和非谐和性特征

Rujia Li, Niloofar Momeni, Susanna Whitling, Andreas Jakobsson

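AI summary Proposes a Parkinson's disease detection method based on continuous speech that uses traditional acoustic features and novel inharmonicity features; experiments show that the continuous-speech model outperforms the sustained-vowel model.

Details
AI Chinese abstract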

已有研究主要利用持续元音发声从语音数据中识别帕金森病(PD)。本文在此基础上,提出了一种针对连续语音的PD识别方法,从而实现对语音数据的实用背景监测,以检测指示PD的语音变化。使用两个不同的数据集,我们比较了最佳持续元音模型与所提出的连续语音模型的性能,清晰展示了后者的优越性能。我们研究了说话人级别评估和数据泄漏预防的方法,以及如何从连续语音中可靠提取元音信息。所提出的方法框架同时利用传统声学表示和一种有前景的新型基于非谐和性的框架,展示了后者如何提供互补信息以改善其中一个数据集的性能;然而,对于另一个数据集,该信息并未显著改善(或降低)性能,表明在得出其使用结论前需要进一步研究。总体而言,本文清晰展示了使用连续语音进行PD分类相比使用持续元音声音的优势。

English abstract

Notable efforts have been made to identify Parkinson's disease (PD) from vocal data, primarily using sustained vowel phonations. In this work, we extend these efforts by introducing a PD identification approach for continuous speech, enabling practical background monitoring of voice data to detect vocal changes indicative of PD. Using two distinct data sets, we compare the best sustained-vowel model with the proposed continuous-speech model, clearly illustrating the superior performance of the latter. We examine approaches for speaker-level evaluation and data-leakage prevention, as well as how vowel information may be reliably extracted from continuous speech. The proposed framework exploits both traditional acoustic representations and a promising novel inharmonicity-based framework, showing how the latter provides complementary information that improves performance for one of the data sets; for the other data set, however, this information neither significantly improved nor reduced performance, suggesting that further studies are required before firm conclusions can be drawn about its use. Overall, the work clearly illustrates the benefit of performing PD classification using continuous speech rather than sustained vowel sounds.