arXivDaily: arXiv daily academic digest, updated Monday to Friday
2508.04492 2026-06-17 cs.CV cs.AI

Learning Robust Intervention Representations with Delta Embeddings

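Panagiotis Alimisis, Christos Diou

Affiliations Department of Informatics and Telematics

AI summary This paper proposes improving model robustness through representations of actionable counterfactuals in the latent space. It introduces Causal Delta Embeddings, a method that learns causal representations without additional supervision, and reports strong results on both synthetic and real-world benchmarks.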

Comments ICLR 2026, Poster

Journal ref
International Conference on Learning Representations (ICLR), 2026

Abstract

Causal representation learning has attracted significant research interest during the past few years as a means for improving model generalization and robustness. Causal representations of interventional image pairs (also called "actionable counterfactuals" in the literature) have the property that only variables corresponding to scene elements affected by the intervention or action change between the start state and the end state. While most work in this area has focused on identifying and representing the variables of the scene under a causal model, fewer efforts have focused on representations of the interventions themselves. In this work, we show that an effective strategy for improving out-of-distribution (OOD) robustness is to focus on the representation of actionable counterfactuals in the latent space. Specifically, we propose that an intervention can be represented by a Causal Delta Embedding that is invariant to the visual scene and sparse in terms of the causal variables it affects. Leveraging this insight, we propose a method for learning causal representations from image pairs, without any additional supervision. Experiments in the Causal Triplet challenge demonstrate that Causal Delta Embeddings are highly effective in OOD settings, significantly exceeding baseline performance in both synthetic and real-world benchmarks.
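
The two properties the abstract names, scene invariance and sparsity, can be illustrated with a minimal sketch. It assumes the delta is simply the difference between the latent codes of the end and start images. The function names, the L1 sparsity measure, the cosine-based invariance check, and the toy latent vectors are all illustrative assumptions, not the authors' published formulation.

```python
import math

def delta_embedding(z_before, z_after):
    """Difference between end-state and start-state latent codes (assumed form)."""
    return [a - b for a, b in zip(z_after, z_before)]

def l1_sparsity(delta):
    """L1 norm: small when the intervention touches few latent dimensions."""
    return sum(abs(x) for x in delta)

def cosine_similarity(u, v):
    """Cosine similarity; 1.0 means the two deltas point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Two different (hypothetical) scenes undergoing the same action.
scene_a = ([0.2, 1.0, 0.0, 0.5], [0.2, 1.0, 1.0, 0.5])
scene_b = ([0.9, 0.1, 0.0, 0.3], [0.9, 0.1, 1.0, 0.3])
delta_a = delta_embedding(*scene_a)
delta_b = delta_embedding(*scene_b)
```

In this toy case both deltas are nonzero in a single dimension and agree across scenes, which is the behavior a training objective of this kind would presumably reward.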

2602.13318 2026-06-17 cs.AI cs.CV cs.LG

DECKBench: Benchmarking Multi-Agent Frameworks for Academic Slide Generation and Editing

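Daesik Jang, Morgan Lindsay Heisler, Linzi Xing, Yifei Li, Edward Wang, Ying Xiong, Yong Zhang, Zhenan Fan

Affiliations Huawei Technologies Canada; University of British Columbia

AI summary This paper presents DECKBench, a framework for evaluating multi-agent generation and editing of academic slides. Using a curated dataset and simulated editing instructions, it systematically assesses slide-level and deck-level fidelity, coherence, layout quality, and multi-turn instruction following.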

Abstract

Automatically generating and iteratively editing academic slide decks requires more than document summarization. It demands faithful content selection, coherent slide organization, layout-aware rendering, and robust multi-turn instruction following. However, existing benchmarks and evaluation protocols do not adequately measure these challenges. To address this gap, we introduce the Deck Edits and Compliance Kit Benchmark (DECKBench), an evaluation framework for multi-agent slide generation and editing. DECKBench is built on a curated dataset of paper to slide pairs augmented with realistic, simulated editing instructions. Our evaluation protocol systematically assesses slide-level and deck-level fidelity, coherence, layout quality, and multi-turn instruction following. We further implement a modular multi-agent baseline system that decomposes the slide generation and editing task into paper parsing and summarization, slide planning, HTML creation, and iterative editing. Experimental results demonstrate that the proposed benchmark highlights strengths, exposes failure modes, and provides actionable insights for improving multi-agent slide generation and editing systems. Overall, this work establishes a standardized foundation for reproducible and comparable evaluation of academic presentation generation and editing. Code and data are publicly available at https://github.com/morgan-heisler/DeckBench .

2601.17053 2026-06-17 cs.CV

Synthetic Data Guided Feature Selection for Robust Activity Recognition in Older Adults

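Shuhao Que, Dieuwke van Dartel, Ilse Heeringa, Han Hegeman, Miriam Vollenbroek-Hutten, Ying Wang

Affiliations University of Twente; Ziekenhuis Groep Twente; Medisch Spectrum Twente

AI summary This study develops a robust human activity recognition system that uses synthetic data to make continuous activity recognition during hip fracture rehabilitation in older adults more reliable, with the clearest gains on postural transfers, an activity class of high clinical relevance.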

Comments This paper has been submitted to Nordic Conference on Digital Health and Wireless Solutions 2026, currently under review

Abstract

Physical activity during hip fracture rehabilitation is essential for mitigating long-term functional decline in geriatric patients. However, it is rarely quantified in clinical practice. Existing continuous monitoring systems with commercially available wearable activity trackers are typically developed in middle-aged adults and therefore perform unreliably in older adults with slower and more variable gait patterns. This study aimed to develop a robust human activity recognition (HAR) system to improve continuous physical activity recognition in the context of hip fracture rehabilitation. 24 healthy older adults aged over 80 years were included to perform activities of daily living (walking, standing, sitting, lying down, and postural transfers) under simulated free-living conditions for 75 minutes while wearing two accelerometers positioned on the lower back and anterior upper thigh. Model robustness was evaluated using leave-one-subject-out cross-validation. The synthetic data demonstrated potential to improve generalization across participants. The resulting feature intervention model (FIM), aided by synthetic data guidance, achieved reliable activity recognition with mean F1-scores of 0.896 for walking, 0.927 for standing, 0.997 for sitting, 0.937 for lying down, and 0.816 for postural transfers. Compared with a control condition model without synthetic data, the FIM significantly improved the postural transfer detection, i.e., an activity class of high clinical relevance that is often overlooked in existing HAR literature. In conclusion, these preliminary results demonstrate the feasibility of robust activity recognition in older adults. Further validation in hip fracture patient populations is required to assess the clinical utility of the proposed monitoring system.
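
The evaluation protocol named in the abstract, leave-one-subject-out cross-validation with per-class F1-scores, can be sketched as follows. This is a generic illustration: the helper names and the toy labels are assumptions, and the study's actual features, model, and synthetic-data guidance are not reproduced here.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices), one fold per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test

def f1_for_class(y_true, y_pred, label):
    """F1-score of one activity class, treating it as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Hypothetical window-level data: each sample belongs to one participant.
subject_ids = ["s1", "s1", "s2", "s3", "s3", "s3"]
folds = list(leave_one_subject_out(subject_ids))
```

Because every window of the held-out participant is excluded from training, each fold measures generalization to an unseen person rather than to unseen windows of a known person.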

2509.11154 2026-06-17 cs.LG cs.AI

Feature Space Topology Control via Hopkins Loss

Einari Vaaras, Manu Airaksinen

Affiliations Signal Processing Research Centre, Tampere University; BABA Center, Department of Physiology, University of Helsinki

AI summary This paper proposes Hopkins loss for controlling feature space topology and evaluates it on speech, text, and image data in classification and in dimensionality reduction with nonlinear bottleneck autoencoders.

Comments Accepted for publication in Proc. IEEE ICTAI 2025, Athens, Greece

Abstract

Feature space topology refers to the organization of samples within the feature space. Modifying this topology can be beneficial in machine learning applications, including dimensionality reduction, generative modeling, transfer learning, and robustness to adversarial attacks. This paper introduces a novel loss function, Hopkins loss, which leverages the Hopkins statistic to enforce a desired feature space topology, which is in contrast to existing topology-related methods that aim to preserve input feature topology. We evaluate the effectiveness of Hopkins loss on speech, text, and image data in two scenarios: classification and dimensionality reduction using nonlinear bottleneck autoencoders. Our experiments show that integrating Hopkins loss into classification or dimensionality reduction has only a small impact on classification performance while providing the benefit of modifying feature topology.
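
For readers unfamiliar with the Hopkins statistic the loss builds on, here is a minimal pure-Python sketch of one common form: H = sum(u^d) / (sum(u^d) + sum(w^d)), where u are nearest-sample distances from uniform random probes, w are nearest-neighbor distances of sampled data points, and d is the dimension. Values near 0.5 suggest randomly spread samples and values near 1 suggest clustering. The function name, the bounding-box sampling, and the toy data are assumptions. How the authors turn the statistic into a differentiable loss is not stated in the abstract and is not attempted here.

```python
import math
import random

def hopkins_statistic(points, m, rng):
    """Hopkins statistic of a list of equal-length coordinate tuples."""
    d = len(points[0])
    lows = [min(p[k] for p in points) for k in range(d)]
    highs = [max(p[k] for p in points) for k in range(d)]

    def nearest(query, exclude=None):
        return min(math.dist(query, p) for i, p in enumerate(points) if i != exclude)

    # u: distances from m uniform probes in the bounding box to the nearest sample
    u = [nearest(tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)))
         for _ in range(m)]
    # w: distances from m randomly chosen samples to their nearest other sample
    w = [nearest(points[i], exclude=i) for i in rng.sample(range(len(points)), m)]
    sum_u = sum(x ** d for x in u)
    sum_w = sum(x ** d for x in w)
    return sum_u / (sum_u + sum_w)

rng = random.Random(0)
# Two tight hypothetical clusters around (0, 0) and (1, 1).
clustered = [(rng.gauss(cx, 0.01), rng.gauss(cy, 0.01))
             for cx, cy in [(0.0, 0.0), (1.0, 1.0)] for _ in range(50)]
h_clustered = hopkins_statistic(clustered, m=20, rng=rng)
```

A loss that pushes this value toward a chosen target would reshape the feature space rather than preserve the input topology, which is the contrast the abstract draws.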

2601.12641 2026-06-17 cs.AI

STEP-LLM: Generating CAD STEP Models from Natural Language with Large Language Models

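Xiangyu Shi, Junyang Ding, Xu Zhao, Sinong Zhan, Payal Mohapatra, Daniel Quispe, Kojo Welbeck, Jian Cao, Wei Chen, Ping Guo, Qi Zhu

Affiliations Northwestern University

AI summary This paper proposes STEP-LLM, which uses large language models to turn natural language into CAD STEP models. Graph-structure-aware preprocessing and reinforcement learning improve geometric accuracy, and the results support the feasibility of LLM-driven STEP model generation.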

Comments Accepted to the Design, Automation & Test in Europe Conference (DATE) 2026

Abstract

Computer-aided design (CAD) is vital to modern manufacturing, yet model creation remains labor-intensive and expertise-heavy. To enable non-experts to translate intuitive design intent into manufacturable artifacts, recent large language model-based text-to-CAD efforts focus on command sequences or script-based formats like CadQuery. However, these formats are kernel-dependent and lack universality for manufacturing. In contrast, the Standard for the Exchange of Product Data (STEP, ISO 10303) file is a widely adopted, neutral boundary representation (B-rep) format directly compatible with manufacturing, but its graph-structured, cross-referenced nature poses unique challenges for auto-regressive LLMs. To address this, we curate a dataset of ~40K STEP-caption pairs and introduce novel preprocessing tailored for the graph-structured format of STEP, including a depth-first search-based reserialization that linearizes cross-references while preserving locality and chain-of-thought (CoT)-style structural annotations that guide global coherence. We integrate retrieval-augmented generation to ground predictions in relevant examples for supervised fine-tuning, and refine generation quality through reinforcement learning with a specific Chamfer Distance-based geometric reward. Experiments demonstrate consistent gains of our STEP-LLM in geometric fidelity over the Text2CAD baseline, with improvements arising from multiple stages of our framework: the RAG module substantially enhances completeness and renderability, the DFS-based reserialization strengthens overall accuracy, and the RL further reduces geometric discrepancy. Both metrics and visual comparisons confirm that STEP-LLM generates shapes with higher fidelity than Text2CAD. These results show the feasibility of LLM-driven STEP model generation from natural language, showing its potential to democratize CAD design for manufacturing.
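
The Chamfer Distance behind the geometric reward can be sketched on small point sets. This version uses the symmetric form with squared Euclidean nearest-neighbor terms averaged per set, which is one of several conventions in use. The `geometric_reward` mapping is a hypothetical illustration, since the abstract does not say how the distance is turned into a reward.

```python
def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance with squared Euclidean nearest-neighbor terms."""
    def sq_dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))

    def one_way(src, dst):
        # Mean, over src, of the squared distance to the closest point in dst.
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(points_a, points_b) + one_way(points_b, points_a)

def geometric_reward(generated, reference):
    """Hypothetical reward: 1.0 for a perfect match, decreasing with distance."""
    return 1.0 / (1.0 + chamfer_distance(generated, reference))

# Points sampled from a reference shape and from a shifted copy (toy data).
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x + 0.5, y, z) for x, y, z in reference]
```

In practice the point sets would be sampled from the surfaces of the generated and reference solids; the brute-force nearest-neighbor search here is quadratic and only suitable for small samples.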

2509.03932 2026-06-17 cs.CL cs.CY cs.LG

KPoEM: A Human-Annotated Dataset for Emotion Classification and RAG-Based Poetry Generation in Korean Modern Poetry

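Iro Lim, Haein Ji, Byungjun Kim

Affiliations The Academy of Korean Studies; Graduate School of Korean Studies; Cultural Informatics

AI summary This study builds KPoEM, a multi-label emotion dataset. A sequential fine-tuning strategy reaches an F1-micro score of 0.60 for emotion classification, and RAG-based poetry generation is shown to be feasible for expressing the emotional and cultural sensibilities of Korean literature.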

Comments 43 pages, 22 tables, 3 figures, Digital Humanities and Social Sciences Korea Conference, James Joo-Jin Kim Center for Korean Studies, University of Pennsylvania, Philadelphia, USA

Journal ref
The Review of Korean Studies 29(1) (2026) 161-206

Abstract

This study introduces KPoEM (Korean Poetry Emotion Mapping), a novel dataset that serves as a foundation for both emotion-centered analysis and generative applications in modern Korean poetry. Despite advancements in NLP, poetry remains underexplored due to its complex figurative language and cultural specificity. We constructed a multi-label dataset of 7,662 entries (7,007 line-level and 615 work-level), annotated with 44 fine-grained emotion categories from five influential Korean poets. The KPoEM emotion classification model, fine-tuned through a sequential strategy -- moving from general-purpose corpora to the specialized KPoEM dataset -- achieved an F1-micro score of 0.60, significantly outperforming previous models (0.43). The model demonstrates an enhanced ability to identify temporally and culturally specific emotional expressions while preserving core poetic sentiments. Furthermore, applying the structured emotion dataset to a RAG-based poetry generation model demonstrates the empirical feasibility of generating texts that reflect the emotional and cultural sensibilities of Korean literature. This integrated approach strengthens the connection between computational techniques and literary analysis, opening new pathways for quantitative emotion research and generative poetics. Overall, this study provides a foundation for advancing emotion-centered analysis and creation in modern Korean poetry.
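
The F1-micro score reported above pools true positives, false positives, and false negatives over all labels before computing F1, so frequent emotion labels weigh more than rare ones. A minimal sketch for multi-label predictions follows; the emotion names in the example are placeholders, not categories taken from the dataset.

```python
def micro_f1(true_label_sets, predicted_label_sets):
    """Micro-averaged F1 for multi-label data given as one set of labels per item."""
    tp = fp = fn = 0
    for truth, pred in zip(true_label_sets, predicted_label_sets):
        tp += len(truth & pred)   # labels correctly predicted
        fp += len(pred - truth)   # labels predicted but not annotated
        fn += len(truth - pred)   # labels annotated but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# Two poem lines with hypothetical annotations and predictions.
truth = [{"sorrow", "longing"}, {"anger"}]
pred = [{"sorrow"}, {"anger", "fear"}]
score = micro_f1(truth, pred)
```

Here two labels are matched, one is spurious, and one is missed, giving 2*2 / (2*2 + 1 + 1) = 2/3.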

2509.19525 2026-06-17 cs.RO

Real-Time Reinforcement Learning for Dynamic Tasks with a Parallel Soft Robot

James Avtges, Jake Ketchum, Millicent Schlafly, Helena Young, Taekyoung Kim, Allison Pinosky, Ryan L. Truby, Todd D. Murphey

Affiliations Department of Mechanical Engineering, Northwestern University; Department of Materials Science and Engineering, Northwestern University

AI summary This paper presents a curriculum-based real-time reinforcement learning approach that lets a soft robot learn dynamic balancing within a single deployment, using a platform built from parallel soft actuators based on HSA structures.

Comments Published at IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025

Abstract

Closed-loop control remains an open challenge in soft robotics. The nonlinear responses of soft actuators under dynamic loading conditions limit the use of analytic models for soft robot control. Traditional methods of controlling soft robots underutilize their configuration spaces to avoid nonlinearity, hysteresis, large deformations, and the risk of actuator damage. Furthermore, episodic data-driven control approaches such as reinforcement learning (RL) are traditionally limited by sample efficiency and inconsistency across initializations. In this work, we demonstrate RL for reliably learning control policies for dynamic balancing tasks in real-time single-shot hardware deployments. We use a deformable Stewart platform constructed using parallel, 3D-printed soft actuators based on motorized handed shearing auxetic (HSA) structures. By introducing a curriculum learning approach based on expanding neighborhoods of a known equilibrium, we achieve reliable single-deployment balancing at arbitrary coordinates. In addition to benchmarking the performance of model-based and model-free methods, we demonstrate that in a single deployment, Maximum Diffusion RL is capable of learning dynamic balancing after half of the actuators are effectively disabled, by inducing buckling and by breaking actuators with bolt cutters. Training occurs with no prior data, in as fast as 15 minutes, with performance nearly identical to the fully-intact platform. Single-shot learning on hardware facilitates soft robotic systems reliably learning in the real world and will enable more diverse and capable soft robots.

2501.16370 2026-06-17 cs.LG cs.AI cs.NA cs.NE math.NA

Advanced Physics-Informed Neural Network with Residuals for Solving Complex Integral Equations

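Mahdi Movahedian Moghaddam, Kourosh Parand, Saeed Reza Kheradpisheh

Affiliations Department of Computer and Data Sciences, Shahid Beheshti University; Department of Cognitive Modeling, Shahid Beheshti University

AI summary This paper proposes the Residual Integral Solver Network (RISN), which combines high-accuracy numerical methods with residual connections to improve accuracy and stability when solving integral and integro-differential equations. Experiments show it outperforming classical PINNs and their variants across a range of equation types.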

Journal ref
Anal. Numer. Solut. Nonlinear Equ. 11 (2026), no. 1, 153-173

Abstract

In this paper, we present the Residual Integral Solver Network (RISN), a novel neural network architecture designed to solve a wide range of integral and integro-differential equations, including one-dimensional, multi-dimensional, ordinary and partial integro-differential, systems, fractional types, and Helmholtz-type integral equations involving oscillatory kernels. RISN integrates residual connections with high-accuracy numerical methods such as Gaussian quadrature and fractional derivative operational matrices, enabling it to achieve higher accuracy and stability than traditional Physics-Informed Neural Networks (PINN). The residual connections help mitigate vanishing gradient issues, allowing RISN to handle deeper networks and more complex kernels, particularly in multi-dimensional problems. Through extensive experiments, we demonstrate that RISN consistently outperforms not only classical PINNs but also advanced variants such as Auxiliary PINN (A-PINN) and Self-Adaptive PINN (SA-PINN), achieving significantly lower Mean Absolute Errors (MAE) across various types of equations. These results highlight RISN's robustness and efficiency in solving challenging integral and integro-differential problems, making it a valuable tool for real-world applications where traditional methods often struggle.
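
To make the role of Gaussian quadrature concrete, the sketch below evaluates the residual of a Fredholm equation of the second kind, u(x) = f(x) + ∫ K(x, t) u(t) dt, using 3-point Gauss-Legendre quadrature. In a physics-informed setup a network would stand in for u and the squared residual would enter the loss. The toy kernel, the right-hand side, and the function names are assumptions chosen so the exact solution u(x) = x is known; this is not the RISN architecture itself.

```python
import math

# 3-point Gauss-Legendre nodes and weights on [-1, 1]
_NODES = (-math.sqrt(3 / 5), 0.0, math.sqrt(3 / 5))
_WEIGHTS = (5 / 9, 8 / 9, 5 / 9)

def gauss_legendre(func, a, b):
    """Approximate the integral of func over [a, b] with 3 Gauss-Legendre points."""
    half, mid = (b - a) / 2, (a + b) / 2
    return half * sum(w * func(mid + half * x) for x, w in zip(_NODES, _WEIGHTS))

def fredholm_residual(u, f, kernel, x, a=0.0, b=1.0):
    """Residual of u(x) = f(x) + integral over [a, b] of K(x, t) u(t) dt at one x."""
    integral = gauss_legendre(lambda t: kernel(x, t) * u(t), a, b)
    return u(x) - f(x) - integral

# Toy problem with known solution u(x) = x: K(x, t) = x * t and f(x) = 2x / 3.
residual = fredholm_residual(lambda x: x, lambda x: 2 * x / 3,
                             lambda x, t: x * t, x=0.5)
```

The 3-point rule integrates polynomials up to degree 5 exactly, so the residual of the true solution vanishes here up to rounding; oscillatory kernels would need more nodes.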

2509.13196 2026-06-17 cs.CL

The Few-shot Dilemma: Over-prompting Large Language Models

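Yongjian Tang, Doruk Tuncel, Christian Koerner, Thomas Runkler

Affiliations Siemens AG; Technical University of Munich

AI summary This paper outlines a prompting framework with three few-shot selection methods (random sampling, semantic embedding, and TF-IDF) and finds, across multiple LLMs, that too many domain-specific examples can degrade performance. Combining TF-IDF selection with stratified sampling identifies the optimal number of examples and surpasses the state of the art by 1% on software requirement classification.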

Comments accepted for the main track of FLLM

Abstract

Over-prompting, a phenomenon where excessive examples in prompts lead to diminished performance in Large Language Models (LLMs), challenges the conventional wisdom about in-context few-shot learning. To investigate this few-shot dilemma, we outline a prompting framework that leverages three standard few-shot selection methods - random sampling, semantic embedding, and TF-IDF vectors - and evaluate these methods across multiple LLMs, including GPT-4o, GPT-3.5-turbo, DeepSeek-V3, Gemma-3, LLaMA-3.1, LLaMA-3.2, and Mistral. Our experimental results reveal that incorporating excessive domain-specific examples into prompts can paradoxically degrade performance in certain LLMs, which contradicts the prior empirical conclusion that more relevant few-shot examples universally benefit LLMs. Given the trend of LLM-assisted software engineering and requirement analysis, we experiment with two real-world software requirement classification datasets. By gradually increasing the number of TF-IDF-selected and stratified few-shot examples, we identify their optimal quantity for each LLM. This combined approach achieves superior performance with fewer examples, avoiding the over-prompting problem, thus surpassing the state-of-the-art by 1% in classifying functional and non-functional requirements.
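
The TF-IDF-selected, stratified few-shot setup can be sketched in pure Python: rank candidate examples by TF-IDF cosine similarity to the query, then keep the top few per class so every label stays represented. The whitespace tokenizer, the smoothed IDF formula, the function names, and the sample requirements are all assumptions for illustration; the paper's actual pipeline may differ.

```python
import math
from collections import Counter, defaultdict

def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts) with a smoothed inverse document frequency."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}
    return [{t: c * idf[t] for t, c in Counter(tokens).items()} for tokens in tokenized]

def cosine(a, b):
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def select_few_shot(query, pool, per_label):
    """pool: list of (text, label). Returns the per_label most similar texts per label."""
    vectors = tfidf_vectors([text for text, _ in pool] + [query])
    query_vec = vectors[-1]
    by_label = defaultdict(list)
    for (text, label), vec in zip(pool, vectors):
        by_label[label].append((cosine(query_vec, vec), text))
    return {label: [text for _, text in sorted(scored, reverse=True)[:per_label]]
            for label, scored in by_label.items()}

pool = [
    ("the system shall respond to queries within two seconds", "NF"),
    ("the interface shall use a blue theme", "NF"),
    ("the user shall be able to export reports", "F"),
    ("the admin shall reset passwords", "F"),
]
shots = select_few_shot("the system shall respond within two seconds", pool, per_label=1)
```

Sweeping `per_label` upward and re-evaluating the classifier at each step is the kind of experiment that would expose the point where extra examples stop helping.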

2509.10089 2026-06-17 cs.LG

KAN-SR: A Kolmogorov-Arnold Network Guided Symbolic Regression Framework

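Marco Andrea Bühler, Gonzalo Guillén-Gosálbez

Affiliations ETH Zürich

AI summary This paper proposes KAN-SR, a symbolic regression framework built on Kolmogorov-Arnold Networks. Using deep learning techniques and simplification strategies, it recovers ground-truth equations of the Feynman Symbolic Regression for Scientific Discovery dataset and, combined with neural controlled differential equations, precisely models a bioprocess system.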

Journal ref
Computers & Chemical Engineering, Volume 213, 2026, 109721

Abstract

We introduce a novel symbolic regression framework, namely KAN-SR, built on Kolmogorov-Arnold Networks (KANs), which follows a divide-and-conquer approach. Symbolic regression searches for mathematical equations that best fit a given dataset and is commonly solved with genetic programming approaches. We show that by using deep learning techniques, more specific KANs, and combining them with simplification strategies such as translational symmetries and separabilities, we are able to recover ground-truth equations of the Feynman Symbolic Regression for Scientific Discovery (SRSD) dataset. Additionally, we show that by combining the proposed framework with neural controlled differential equations, we are able to model the dynamics of an in-silico bioprocess system precisely, opening the door for the dynamic modeling of other engineering systems.
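
One of the simplification strategies mentioned, separability, admits a simple numerical check that shows why it supports divide-and-conquer: if f(x, y) = g(x) + h(y), then f(x1, y1) + f(x2, y2) - f(x1, y2) - f(x2, y1) is zero for any choice of points, and the two parts can be fitted independently. The sketch below is a generic test of that identity with hypothetical names and tolerances, not the procedure used in KAN-SR, which would operate on a fitted network rather than on a known function.

```python
import math
import random

def is_additively_separable(f, rng, trials=100, tol=1e-9):
    """Numerically test whether f(x, y) can be written as g(x) + h(y)."""
    for _ in range(trials):
        x1, x2, y1, y2 = (rng.uniform(-2.0, 2.0) for _ in range(4))
        gap = f(x1, y1) + f(x2, y2) - f(x1, y2) - f(x2, y1)
        if abs(gap) > tol:
            return False
    return True

rng = random.Random(0)
separable = is_additively_separable(lambda x, y: x ** 2 + math.sin(y), rng)
entangled = is_additively_separable(lambda x, y: x * y, rng)
```

For the product x * y the gap equals (x1 - x2)(y1 - y2), which is nonzero for almost every sample, so the test rejects it.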

2506.19277 2026-06-17 cs.RO cs.SY eess.SY

Ontology Neural Network and ORTSF: A Framework for Topological Reasoning and Delay-Robust Control

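Jaehong Oh

Affiliations Department of Mechanical Engineering, Soongsil University, Seoul, Korea

AI summary This paper proposes the Ontology Neural Network and the ORTSF framework to address the shortcomings of existing methods in representing relational semantics and in the cognitive transparency needed for collaboration in dynamic environments, unifying semantic cognition and robust control in a single architecture.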

Comments 12 pages, 5 figures, includes theoretical proofs and simulation results

Abstract

The advancement of autonomous robotic systems has led to impressive capabilities in perception, localization, mapping, and control. Yet, a fundamental gap remains: existing frameworks excel at geometric reasoning and dynamic stability but fall short in representing and preserving relational semantics, contextual reasoning, and cognitive transparency essential for collaboration in dynamic, human-centric environments. This paper introduces a unified architecture comprising the Ontology Neural Network (ONN) and the Ontological Real-Time Semantic Fabric (ORTSF) to address this gap. The ONN formalizes relational semantic reasoning as a dynamic topological process. By embedding Forman-Ricci curvature, persistent homology, and semantic tensor structures within a unified loss formulation, ONN ensures that relational integrity and topological coherence are preserved as scenes evolve over time. The ORTSF transforms reasoning traces into actionable control commands while compensating for system delays. It integrates predictive and delay-aware operators that ensure phase margin preservation and continuity of control signals, even under significant latency conditions. Empirical studies demonstrate the ONN + ORTSF framework's ability to unify semantic cognition and robust control, providing a mathematically principled and practically viable solution for cognitive robotics.
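
As a concrete handle on one ingredient named in the abstract, the sketch below computes the simplest combinatorial form of Forman-Ricci curvature on an unweighted graph, where an edge (u, v) gets 4 - deg(u) - deg(v). This form ignores edge weights and higher-order cells such as triangles, and the scene-graph example is hypothetical. How ONN embeds curvature into its loss is not specified in the abstract and is not modeled here.

```python
from collections import defaultdict

def forman_ricci_curvature(edges):
    """Map each edge (u, v) of an unweighted graph to 4 - deg(u) - deg(v)."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return {(u, v): 4 - degree[u] - degree[v] for u, v in edges}

# Hypothetical scene graph: three objects resting on a table, table on the floor.
scene_edges = [("cup", "table"), ("book", "table"), ("lamp", "table"), ("table", "floor")]
curvature = forman_ricci_curvature(scene_edges)
```

Edges attached to a hub node such as the table receive negative curvature, so tracking these values over time gives a cheap signal of structural change in the relational graph.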

2506.10207 2026-06-17 cs.SD cs.DC eess.AS

FedMLAC: Mutual Learning Driven Heterogeneous Federated Audio Classification

Jun Bai, Rajib Rana, Di Wu, Youyang Qu, Xiaohui Tao, Ji Zhang, Carlos Busso, Shivakumara Palaiahnakote

Affiliations School of Computer Science, McGill University; Mila - Quebec AI Institute; School of Mathematics, Physics and Computing, University of Southern Queensland; Language Technologies Institute, Carnegie Mellon University; School of Science, Engineering and Environment, University of Salford

AI summary FedMLAC uses bidirectional knowledge distillation to address data and model heterogeneity in federated audio classification and introduces a Layer-wise Pruning Aggregation strategy to counter data poisoning. Experiments show it outperforming existing methods in classification accuracy and robustness to noise.

Comments updated version for the first submission

Journal ref
Pattern Recognition, vol. 180, Article 114250, 2026

Abstract

Federated Learning (FL) offers a privacy-preserving framework for training audio classification (AC) models across decentralized clients without sharing raw data. However, Federated Audio Classification (FedAC) faces three major challenges: data heterogeneity, model heterogeneity, and data poisoning, which degrade performance in real-world settings. While existing methods often address these issues separately, a unified and robust solution remains underexplored. We propose FedMLAC, a mutual learning-based FL framework that tackles all three challenges simultaneously. Each client maintains a personalized local AC model and a lightweight, globally shared Plug-in model. These models interact via bidirectional knowledge distillation, enabling global knowledge sharing while adapting to local data distributions, thus addressing both data and model heterogeneity. To counter data poisoning, we introduce a Layer-wise Pruning Aggregation (LPA) strategy that filters anomalous Plug-in updates based on parameter deviations during aggregation. Extensive experiments on four diverse audio classification benchmarks, including both speech and non-speech tasks, show that FedMLAC consistently outperforms state-of-the-art baselines in classification accuracy and robustness to noisy data.
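
Bidirectional knowledge distillation between the local model and the Plug-in model can be sketched as a pair of KL-divergence terms on temperature-softened predictions, one for each direction. This follows the usual mutual-learning recipe; the temperature value, the function names, and the example logits are assumptions, and the abstract does not give the exact loss FedMLAC uses.

```python
import math

def softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    peak = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_distillation_losses(local_logits, plugin_logits, temperature=2.0):
    """Return (loss for the local model, loss for the plug-in model)."""
    p_local = softmax(local_logits, temperature)
    p_plugin = softmax(plugin_logits, temperature)
    loss_local = kl_divergence(p_plugin, p_local)    # local model mimics plug-in
    loss_plugin = kl_divergence(p_local, p_plugin)   # plug-in mimics local model
    return loss_local, loss_plugin

# Hypothetical logits of both models for one audio clip over three classes.
loss_local, loss_plugin = mutual_distillation_losses([2.0, 0.5, -1.0], [1.0, 1.5, -0.5])
```

Because only the lightweight Plug-in model is shared, the local models can have different architectures, which is how a scheme like this accommodates model heterogeneity.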

2502.10112 2026-06-17 cs.LG

Accelerometry-based Energy Expenditure Estimation During Activities of Daily Living: A Comparison Among Different Accelerometer Compositions

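Shuhao Que, Remco Poelarends, Peter Veltink, Miriam Vollenbroek-Hutten, Ying Wang

Affiliations Department of Electrical Engineering, University of Twente; Department of Nuclear Medicine, Isala

AI summary This paper compares accelerometer compositions based on body center-of-mass acceleration and on wrist-worn accelerometers for estimating energy expenditure during activities of daily living, and finds that the center-of-mass-based 3-acc composition performs best.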

Comments This work has been accepted by IEEE EMBC 2025

Abstract

Physical activity energy expenditure (PAEE) can be measured from breath-by-breath respiratory data, which can serve as a reference. Alternatively, PAEE can be predicted from the body movements, which can be measured and estimated with accelerometers. The body center of mass (COM) acceleration reflects the movements of the whole body and thus serves as a good predictor for PAEE. However, the wrist has also become a popular location due to recent advancements in wrist-worn devices. Therefore, in this work, using the respiratory data measured by COSMED K5 as the reference, we evaluated and compared the performances of COM-based settings and wrist-based settings. The COM-based settings include two different accelerometer compositions, using only the pelvis accelerometer (pelvis-acc) and the pelvis accelerometer with two accelerometers from two thighs (3-acc). The wrist-based settings include using only the left wrist accelerometer (l-wrist-acc) and only the right wrist accelerometer (r-wrist-acc). We implemented two existing PAEE estimation methods on our collected dataset, where 9 participants performed activities of daily living while wearing 5 accelerometers (i.e., pelvis, two thighs, and two wrists). These two methods include a linear regression (LR) model and a CNN-LSTM model. Both models yielded the best results with the COM-based 3-acc setting (LR: $R^2$ = 0.41, CNN-LSTM: $R^2$ = 0.53). No significant difference was found between the 3-acc and pelvis-acc settings (p-value = 0.278). For both models, neither the l-wrist-acc nor the r-wrist-acc setting demonstrated predictive power for PAEE, with $R^2$ values close to 0, and both were significantly outperformed by the two COM-based settings (p-values < 0.05). No significant difference was found between the two wrists (p-value = 0.329).
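
The $R^2$ values above are coefficients of determination: 1 means perfect prediction and 0 means the model does no better than always predicting the mean of the reference values. A minimal sketch of the computation, with made-up numbers rather than data from the study:

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical reference energy expenditure values and two sets of predictions.
reference = [1.0, 2.0, 3.0]
mean_only = [2.0, 2.0, 2.0]     # a predictor that always outputs the mean
perfect = [1.0, 2.0, 3.0]
```

Read this way, an $R^2$ close to 0 for the wrist-based settings means their estimates carried almost no information beyond the average expenditure.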

2503.08679 2026-06-17 cs.AI cs.CL cs.LG

Chain-of-Thought Reasoning In The Wild Is Not Always Faithful

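Iván Arcuschin, Jett Janiak, Robert Krzyzanowski, Senthooran Rajamanoharan, Neel Nanda, Arthur Conmy

Affiliations Poseidon Research

AI summary The study finds that, on naturally worded prompts, models sometimes produce superficially coherent but mutually contradictory chains of thought, revealing Implicit Post-Hoc Rationalization that even frontier models do not fully avoid.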

Comments Published at the 43rd International Conference on Machine Learning (ICML 2026)

Abstract

Recent studies indicate that when faced with explicit biases in prompts, models often omit mentioning these biases in their Chain-of-Thought (CoT) output, revealing that verbalized reasoning can give an incorrect picture of how models arrive at conclusions (unfaithfulness). In this work, we show that unfaithful CoT also occurs on naturally worded, non-adversarial prompts without adding artificial biases or editing model outputs. We find that when separately presented with the questions "Is X bigger than Y?" and "Is Y bigger than X?", models sometimes produce superficially coherent arguments to justify systematically answering Yes to both or No to both, despite the contradiction. We present preliminary evidence that this is due to models' implicit biases towards Yes or No, labeling this Implicit Post-Hoc Rationalization. Our results reveal rates up to 13% for production models, and while frontier models are more faithful, none are entirely so, including thinking models like DeepSeek R1 (0.37%) and Sonnet 3.7 with thinking (0.04%). We also investigate Unfaithful Illogical Shortcuts, where models use subtly illogical reasoning to make speculative answers to hard math problems seem rigorously proven. Our findings indicate that while CoT can be useful for assessing outputs, it is not a complete account of the internal process that produced the model's answer and should be used with caution in agentic or safety-critical settings.
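
The reversed-question setup can be illustrated with a simplified tally: for each pair of questions "Is X bigger than Y?" and "Is Y bigger than X?", giving the same answer to both is a contradiction whenever X and Y differ. The function below counts such pairs. It is a deliberately reduced sketch with a hypothetical name and input format; the paper's reported rates presumably rest on a stricter procedure, for example repeated sampling to establish that the bias is systematic.

```python
def contradictory_pair_rate(answers):
    """answers: list of (answer to 'Is X bigger than Y?', answer to 'Is Y bigger than X?').

    Assumes X and Y differ, so exactly one of the two questions has the answer Yes.
    """
    contradictory = sum(1 for forward, reverse in answers if forward == reverse)
    return contradictory / len(answers)

# Hypothetical final answers extracted from four question pairs.
pairs = [("Yes", "No"), ("Yes", "Yes"), ("No", "No"), ("No", "Yes")]
rate = contradictory_pair_rate(pairs)
```

A nonzero rate alone does not show unfaithfulness; the concern raised in the abstract is that the accompanying reasoning reads as coherent in both directions.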

2305.09366 2026-06-17 cs.LG eess.SP

Evaluation of self-supervised pre-training for automatic infant movement classification using wearable movement sensors

基于可穿戴运动传感器的自动婴儿运动分类中自监督预训练的评估

Einari Vaaras, Manu Airaksinen, Sampsa Vanhatalo, Okko Räsänen

发表机构 * Helsinki University Hospital, Helsinki, Finland(赫尔辛基大学医院,芬兰)

AI总结 本文评估了自监督预训练在提高基于可穿戴运动传感器的婴儿运动分类准确性中的效果,发现预训练无标签数据可提升分类模型的鲁棒性,且选择上下文相关数据进一步提升了性能。

Comments To be published in Proc. IEEE EMBC 2023, Sydney, Australia

详情
AI中文摘要

最近开发的婴儿可穿戴MAIJU设备为在非医院环境客观评估婴儿运动性能提供了新方法,该信息可用于发展研究和临床决策支持,如检测发育问题并指导治疗干预。MAIJU分析完全依赖于婴儿姿势和运动的分类,因此研究如何提高此类分类的准确性至关重要。本文研究了自监督预训练如何提升用于分析MAIJU记录的分类器性能,并探讨了预训练数据的上下文选择性质量筛选是否会影响分类器性能。实验表明,i)使用无标签数据预训练分类器可使后续分类模型的准确性显著提升,ii)选择上下文相关预训练数据可进一步提高分类器性能。

英文摘要

The recently-developed infant wearable MAIJU provides a means to automatically evaluate infants' motor performance in an objective and scalable manner in out-of-hospital settings. This information could be used for developmental research and to support clinical decision-making, such as detection of developmental problems and guiding of their therapeutic interventions. MAIJU-based analyses rely fully on the classification of infant's posture and movement; it is hence essential to study ways to increase the accuracy of such classifications, aiming to increase the reliability and robustness of the automated analysis. Here, we investigated how self-supervised pre-training improves performance of the classifiers used for analyzing MAIJU recordings, and we studied whether performance of the classifier models is affected by context-selective quality-screening of pre-training data to exclude periods of little infant movement or with missing sensors. Our experiments show that i) pre-training the classifier with unlabeled data leads to a robust accuracy increase of subsequent classification models, and ii) selecting context-relevant pre-training data leads to substantial further improvements in the classifier performance.

2206.10188 2026-06-17 cs.LG cs.SD eess.AS

Analysis of Self-Supervised Learning and Dimensionality Reduction Methods in Clustering-Based Active Learning for Speech Emotion Recognition

基于聚类的主动学习中自监督学习与降维方法的分析用于语音情感识别

Einari Vaaras, Manu Airaksinen, Okko Räsänen

发表机构 * Unit of Computing Sciences, Tampere University, Finland(芬兰坦佩雷大学计算科学系) Helsinki University Hospital, Helsinki, Finland(赫尔辛基大学医院,芬兰)

AI总结 本文研究了在语音情感识别中,利用自监督学习和降维方法提升基于聚类的主动学习性能,探讨了特征空间局部和全局拓扑结构对主动学习的影响,发现降维不影响性能且二维特征表现良好。

Comments To be published in Proc. Interspeech 2022, Incheon, South Korea

详情
AI中文摘要

当领域专家需要进行数据标注时,减少标注工作量以节省时间和成本至关重要。在无标注情况下,可以利用特征空间结构进行基于聚类的主动学习(AL)方法。然而,这些方法高度依赖于样本在特征空间中的组织方式和距离度量。无监督方法如对比预测编码(CPC)可以用于学习有序的特征空间,但这些方法通常会产生高维特征,这可能对估计数据密度构成挑战。本文结合CPC和多种降维方法,探索基于聚类的AL的实用方法。我们的实验表明,特征空间的局部和全局拓扑结构可以成功用于AL,并且CPC可以提高基于传统信号特征的聚类AL性能。此外,我们观察到压缩数据维度对AL性能影响不大,当标注数量不低时,二维特征表示与高维特征表示在AL性能上相似。

英文摘要

When domain experts are needed to perform data annotation for complex machine-learning tasks, reducing annotation effort is crucial in order to cut down time and expenses. For cases when there are no annotations available, one approach is to utilize the structure of the feature space for clustering-based active learning (AL) methods. However, these methods are heavily dependent on how the samples are organized in the feature space and what distance metric is used. Unsupervised methods such as contrastive predictive coding (CPC) can potentially be used to learn organized feature spaces, but these methods typically create high-dimensional features which might be challenging for estimating data density. In this paper, we combine CPC and multiple dimensionality reduction methods in search of functioning practices for clustering-based AL. Our experiments for simulating speech emotion recognition system deployment show that both the local and global topology of the feature space can be successfully used for AL, and that CPC can be used to improve clustering-based AL performance over traditional signal features. Additionally, we observe that compressing data dimensionality does not harm AL performance substantially, and that 2-D feature representations achieved similar AL performance as higher-dimensional representations when the number of annotations is not very low.

2606.18019 2026-06-17 eess.AS cs.CL cs.SD 新提交

Reading between the Lines: Leveraging Large Language Models for Global Dementia and Depression Assessment from Clinical Interviews

字里行间:利用大型语言模型基于临床访谈对痴呆和抑郁进行总体评估

Franziska Braun, Alea Rüggeberg, Thomas Ranzenberger, Hartmut Lehfeld, Thomas Hillemacher, Tobias Bocklet, Korbinian Riedhammer

发表机构 * TH Nürnberg(纽伦堡应用技术大学) FAU Erlangen(埃尔朗根-纽伦堡大学) PMU Klinikum Nürnberg(帕拉塞尔苏斯医科大学纽伦堡医院)

AI总结 本研究利用开放权重大型语言模型,从154名德语受试者的临床访谈录音中预测痴呆和抑郁严重程度,引入与总体衰退量表(GDS)对齐的总体抑郁量表(GDS-D),发现零样本预测对抑郁有效,而结构化特征提取显著提升痴呆评估性能,误差最多降低35%,且停顿增强转录文本的表现与人工转录文本相当。

Comments Accepted for publication in Text, Speech and Dialogue (TSD 2026). The final authenticated publication will be available online via Springer LNCS/LNAI

详情
AI中文摘要

痴呆和抑郁是老年人群中最常见的神经精神障碍,其重叠症状对鉴别诊断构成重大挑战。在本研究中,我们探讨了如何利用开放权重的大型语言模型(LLMs),从154名德语受试者的标准化病史采集访谈语音样本中预测痴呆和抑郁的严重程度。我们引入了一个与已有的总体衰退量表(GDS)对齐、基于观察者的总体抑郁量表(GDS-D),从而能够对情感症状和认知症状进行并行的总体分期。我们在两种设置下比较了三种LLMs(Mistral 3.1、DeepHermes、Qwen3):(1) 零样本预测和(2) 基于LLM的特征提取加支持向量回归,并分别使用人工转录文本和停顿增强转录文本。结果显示,LLMs在零样本设置中能有效预测抑郁严重程度(最佳MAE为0.60),而痴呆评估显著受益于结构化特征提取(最佳MAE为0.78),相比零样本基线误差最多降低35%。停顿增强转录文本的性能与人工转录文本相当,证明了全自动筛查流程在神经精神鉴别评估中的可行性。

英文摘要

Dementia and depression are the most prevalent neuropsychiatric disorders in geriatric populations, and their overlapping symptoms pose major challenges for differential diagnosis. In this study, we investigate open-weights Large Language Models (LLMs) for predicting dementia and depression severity from speech samples collected during standardized history taking interviews with 154 German-speaking subjects. We introduce an observer-based Global Depression Scale (GDS-D) aligned with the established Global Deterioration Scale (GDS), enabling parallel global staging of affective and cognitive symptoms. We compare three LLMs (Mistral 3.1, DeepHermes, Qwen3) in two settings: (1) zero-shot prediction and (2) LLM-based feature extraction for Support Vector Regression, using human and pause-enriched transcripts. Results show that LLMs effectively predict depression severity in zero-shot settings (best MAE of 0.60), while dementia assessment benefits substantially from structured feature extraction (best MAE of 0.78), reducing errors by up to 35% over zero-shot baselines. Pause-enriched transcripts achieve competitive performance with human transcriptions, demonstrating the viability of fully automatic screening pipelines for differential neuropsychiatric assessment.

2606.18011 2026-06-17 stat.ML cs.LG stat.ME 新提交

Fast Nonparametric Conditional Independence Testing via Two-Stage Regression

通过两阶段回归的快速非参数条件独立性检验

Eric V. Strobl

发表机构 * Department of Biomedical Informatics, University of Pittsburgh(生物医学信息学系,匹兹堡大学)

AI总结 提出BLITZ方法,通过两阶段回归(低阶多项式+浅层树)快速消除条件集影响,实现校准良好的非参数条件独立性检验,适用于因果发现。

Comments A fast R implementation with C++ back-end is available at https://github.com/ericstrobl/BLITZ

详情
AI中文摘要

基于约束的因果发现依赖于重复的条件独立性检验,但快速非参数检验往往牺牲校准性,尤其是当变量通过非线性关系依赖于条件集时。我们提出了BLITZ(Broad-to-Local Independence Testing via residualiZation),一种非参数条件独立性检验,其设计目标是运行时间远低于一秒,同时保持基于约束的因果发现算法执行数千次查询所需的准确性。BLITZ首先使用低阶多项式回归消除对条件集的广泛平滑依赖,然后应用一个小型非线性特征映射,并通过浅层树回归对这些特征进行残差化。得到的统计量检验残差互协方差,并用矩匹配的卡方分布近似其零分布。我们从理论上证明,两阶段设计降低了树残差化器面临的有效复杂度,使得浅层树能够控制残差条件均值偏差,同时避免过度拟合。在模拟中,BLITZ在零假设下的校准优于快速核方法、随机特征方法和基于回归的竞争方法,同时仍属于所测试方法中最快的一类。在合成图和流式细胞术数据的因果发现实验中,BLITZ在保留的邻接中给出了更可靠的端点定向,并具有有竞争力的结构恢复。这些结果表明,从宽到局部的残差化是实现因果发现中校准良好、可扩展的非参数条件独立性检验的实用途径。

英文摘要

Constraint-based causal discovery relies on repeated conditional independence tests, but fast nonparametric tests often sacrifice calibration, especially when variables depend on the conditioning set through nonlinear relationships. We introduce BLITZ (Broad-to-Local Independence Testing via residualiZation), a nonparametric conditional independence test designed to run well under a second while maintaining the accuracy needed for the thousands of queries performed by constraint-based causal discovery algorithms. BLITZ first removes broad smooth dependence on the conditioning set using low-order polynomial regression, then applies a small nonlinear feature map and residualizes those features with shallow tree regressions. The resulting statistic tests residual cross-covariance, with a moment-matched chi-square approximation to the null distribution. We show theoretically that the two-stage design reduces the effective complexity faced by the tree residualizers, allowing shallow trees to control residual conditional-mean bias while avoiding excessive overfitting. In simulations, BLITZ provides better null calibration than fast kernel, random-feature, and regression-based competitors while remaining among the fastest methods tested. In causal discovery experiments on synthetic graphs and flow-cytometry data, BLITZ yields more reliable endpoint orientations among retained adjacencies and competitive structural recovery. These results suggest that broad-to-local residualization is a practical route to calibrated, scalable nonparametric conditional independence testing for causal discovery.

2606.17995 2026-06-17 stat.ML cs.CR cs.LG 新提交

Differential Privacy of Gaussian Process Posterior Sampling

高斯过程后验采样的差分隐私

Tomasz Maciazek

发表机构 * School of Mathematics, University of Bristol(布里斯托大学数学学院)

AI总结 研究高斯过程后验样本路径的隐私性,通过Rényi-DP界分离后验均值与协方差泄露,揭示有效岭正则化的关键作用,并验证成员推断攻击与正则化的依赖关系。

Comments 8 pages of main text + 25 pages appendix

详情
AI中文摘要

我们研究了当整个训练集(包括协变量和响应)是私有时,从高斯过程(GP)发布后验样本路径的隐私性。与添加外部噪声的标准差分隐私(DP)机制不同,后验采样在构造上是随机的。我们表明,这种内在随机性通过推导GP后验样本路径发布的显式Rényi-DP界来提供DP保证。这些界将后验均值泄露与数据相关的后验协方差泄露分开,表明有意义的隐私严重依赖于有效的岭正则化。我们应用成员推断攻击来表明经验泄露遵循对正则化、后验方差和发布的样本路径数量的预测依赖关系。在下游后验采样任务上的效用实验识别了噪声观测机制,其中隐私兼容的正则化以适度的效用损失保留了有用的决策。当需要更强的隐私时,可以通过添加校准的GP噪声来增强内在保证,提供显式的额外隐私调节旋钮。

英文摘要

We study the privacy of releasing posterior sample paths from a Gaussian process (GP) when the entire training set including covariates and responses is private. Unlike standard differential-privacy (DP) mechanisms that add external noise, posterior sampling is random by construction. We show that this intrinsic randomness yields DP guarantees by deriving explicit Rényi-DP bounds for GP posterior sample-path release. The bounds separate posterior-mean leakage from data-dependent posterior-covariance leakage showing that meaningful privacy depends sharply on effective ridge regularisation. We apply membership-inference attacks to show that empirical leakage follows the predicted dependence on regularisation, posterior variance and the number of released posterior sample-paths. Utility experiments on downstream posterior-sampling tasks identify noisy-observation regimes where privacy-compatible regularisation preserves useful decisions with modest utility loss. When stronger privacy is needed, the intrinsic guarantee can be sharpened by adding calibrated GP noise, providing an explicit additional privacy knob.

2606.17684 2026-06-17 stat.ML cs.CY cs.LG 新提交

Geometrical fairness in graph neural networks

图神经网络中的几何公平性

Arturo Pérez-Peralta, Sandra Benítez-Peña, Blas Kolic, Rosa E. Lillo

发表机构 * Department of Statistics, University Carlos III of Madrid, Spain(西班牙马德里卡洛斯三世大学统计系) uc3m-Santander Big Data Institute(uc3m-桑坦德大数据研究所)

AI总结 针对图神经网络中公平性问题,通过修改拉普拉斯算子引入多种互补变换(子空间投影、频谱调整、频率滤波)来缓解偏差,理论分析并实验验证了公平性提升与竞争性能。

Comments 32 pages, 21 tables, 6 figures

详情
AI中文摘要

基于图的学习方法因其在多种应用中的强大性能而日益突出。其中,基于扩散过程的最新框架提供了一个统一的视角,扩展了传统的图神经网络公式,同时解决了标准消息传递机制的局限性。尽管取得了这些进展,但此类模型的公平性问题仍然令人担忧,因为它们可能传播或放大数据中存在的偏差。在这项工作中,我们通过修改底层拉普拉斯算子,引入了一种基于图扩散的公平性感知适应方法。我们的方法结合了多种互补变换,包括子空间投影、频谱调整和基于频率的滤波,以减轻与偏差相关的成分。利用图扩散的内在平滑特性,我们对由此产生的行为进行了原则性分析,并建立了公平性属性的理论见解。我们在合成数据集和真实数据集上评估了所提出的框架,结果表明,在有限的计算成本下,它实现了具有竞争力的性能,同时提高了公平性指标。

英文摘要

Graph-based learning methods have become increasingly prominent due to their strong performance across diverse applications. Among these, recent frameworks grounded in diffusion processes provide a unifying perspective that extends traditional graph neural network formulations while addressing limitations of standard message-passing mechanisms. Despite these advances, concerns remain regarding the fairness of such models, as they may propagate or amplify biases present in the data. In this work, we introduce a fairness-aware adaptation of graph-based diffusion by modifying the underlying Laplacian operator. Our approach incorporates multiple complementary transformations, including subspace projections, spectral adjustments, and frequency-based filtering, to mitigate bias-related components. Leveraging the intrinsic smoothing properties of graph diffusion, we provide a principled analysis of the resulting behavior and establish theoretical insights into fairness properties. We evaluate the proposed framework on both synthetic and real-world datasets, demonstrating that it achieves competitive performance while improving fairness metrics with limited additional computational cost.

2606.17537 2026-06-17 eess.AS cs.CL 新提交

Non-Autoregressive Minimum Bayes' Risk Decoding for Fast Speech Recognition

非自回归最小贝叶斯风险解码用于快速语音识别

Hiroyuki Deguchi, Takatomo Kano, Katsuki Chousa, Marc Delcroix

发表机构 * NTT, Inc.(日本NTT公司)

AI总结 提出基于最小贝叶斯风险解码的非自回归解码框架,通过单次前向计算高效采样多个候选,在保持速度优势的同时提升识别性能。

Comments Accepted at Interspeech 2026

详情
AI中文摘要

非自回归(NAR)解码并行生成输出令牌,使语音识别比自回归解码(从左到右顺序生成)更快。然而,由于NAR解码无法通过依赖先前生成的令牌来解决不确定性,识别性能会下降。为了解决这个问题,我们提出了一种基于最小贝叶斯风险(MBR)解码的新型NAR解码框架,称为NAR-MBR解码,它最大化从NAR模型输出概率中抽取的样本计算的期望效用,而不是最大化输出概率。值得注意的是,通过利用NAR模型的特性,单次前向计算即可高效获得多个样本。我们在LibriSpeech、Switchboard、AMI和网络演示语料库上的实验表明,我们的NAR-MBR解码优于先前的NAR解码,并且运行速度快于AR解码。

英文摘要

Non-autoregressive (NAR) decoding generates output tokens in parallel, making speech recognition faster than autoregressive decoding, which generates them sequentially from left to right. However, the recognition performance is degraded because NAR decoding cannot resolve uncertainty by conditioning on previously generated tokens. To address this issue, we propose a novel NAR decoding framework based on minimum Bayes' risk (MBR) decoding, termed NAR-MBR decoding, that maximizes the expected utility calculated from samples drawn from the output probability of an NAR model rather than maximizing the output probability. Notably, by leveraging the nature of NAR models, multiple samples are obtained efficiently with a single forward computation. Our experiments across LibriSpeech, Switchboard, AMI, and web presentation corpus demonstrated that our NAR-MBR decoding outperformed previous NAR decoding and ran faster than AR decoding.
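
The selection step of MBR decoding can be sketched as below. This is a generic illustration under stated assumptions, not the NAR-MBR implementation: the candidate hypotheses are given as plain token lists (drawing them from an NAR model's output distribution is not shown), and the utility, one minus a length-normalised edit distance, is an assumed choice rather than the paper's.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution
        prev = curr
    return prev[-1]


def utility(hyp, ref):
    """Similarity in [0, 1]: one minus the length-normalised edit distance."""
    denom = max(len(hyp), len(ref), 1)
    return 1.0 - edit_distance(hyp, ref) / denom


def mbr_decode(samples):
    """Return the sample with the highest average utility against all samples."""
    def expected_utility(hyp):
        return sum(utility(hyp, ref) for ref in samples) / len(samples)
    return max(samples, key=expected_utility)


# Hypothetical sampled hypotheses, for illustration only.
samples = [
    ["the", "cat", "sat"],
    ["the", "cat", "sat"],
    ["the", "cap", "sat"],
    ["a", "cat", "sat"],
]
best = mbr_decode(samples)
```

The chosen hypothesis is the one closest on average to the whole sample set, which need not be the single most probable sequence.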

2606.17491 2026-06-17 stat.ML cs.LG stat.ME 新提交

A Bayesian Boolean Matrix Factorization with Application to Copy Number Analysis in Cancer

贝叶斯布尔矩阵分解及其在癌症拷贝数分析中的应用

Adolphus Wagala, Mehmet Samur, Giovanni Parmigiani

发表机构 * Department of Data Science, Dana-Farber Cancer Institute(达纳-法伯癌症研究所数据科学系) Department of Biostatistics, Harvard T.H. Chan School of Public Health(哈佛大学陈曾熙公共卫生学院生物统计系)

AI总结 提出贝叶斯布尔矩阵分解(BBMF)模型,通过全共轭生成模型和稀疏先验实现布尔约束下的可解释因子分解,并应用于多发性骨髓瘤的染色体臂拷贝数变异分析,揭示肿瘤异质性的离散潜在结构。

详情
AI中文摘要

二值数据分解很常见,但实值方法忽略了离散性并产生难以解释的因子。布尔矩阵分解(BooMF)通过逻辑与和或运算将二值矩阵分解为两个低秩二值矩阵,将数据表示为可解释模式的布尔析取。在癌症基因组学中,BooMF可以揭示可能驱动肿瘤演化的协调特征变化,这与旋转或加性分解不同。大多数现有的BooMF方法是启发式的、贪婪的、对初始化敏感、容易陷入局部最优,并且不支持原则性的模型选择或不确定性量化。我们引入了贝叶斯布尔矩阵分解(BBMF),这是一个具有稀疏诱导先验的全共轭生成模型。它强制执行布尔约束,产生具有一致不确定性量化的可解释潜在因子,并允许具有封闭形式全条件分布的吉布斯采样。由于癌症演化通常涉及广泛、近乎同时的染色体数目变化(例如,全基因组复制后伴随不稳定性和选择),布尔分解比加性模型更自然地捕捉这些模式。应用于多发性骨髓瘤的臂级拷贝数变异数据(其中条目指示染色体臂扩增的存在/缺失),BBMF找到了一小组可解释的双团,将患者子集与反复共变的染色体臂联系起来,提供了肿瘤异质性的紧凑、生物学上有意义的总结,并展示了BBMF在复杂二值数据中发现离散潜在结构的实用性。

英文摘要

Binary data factorization is common, but real-valued methods ignore discreteness and yield hard-to-interpret factors. Boolean Matrix Factorization (BooMF) instead decomposes a binary matrix into two lower-rank binary matrices via logical AND and OR, expressing the data as a Boolean disjunction of interpretable patterns. In cancer genomics, BooMF can reveal coordinated feature changes that may drive tumor evolution, unlike rotational or additive decompositions. Most existing BooMF methods are heuristic, greedy, sensitive to initialization, prone to local optima, and do not support principled model selection or uncertainty quantification. We introduce Bayesian Boolean Matrix Factorization (BBMF), a fully conjugate generative model with sparsity-inducing priors. It enforces Boolean constraints, yields interpretable latent factors with coherent uncertainty quantification, and admits Gibbs sampling with closed-form full conditionals. Because cancer evolution often involves widespread, near-simultaneous chromosome-number changes (e.g., whole-genome duplication followed by instability and selection), Boolean factorizations capture these patterns more naturally than additive models. Applied to arm-level copy-number alteration data in multiple myeloma, where entries indicate presence/absence of chromosomal-arm amplifications, BBMF finds a small set of interpretable bicliques linking patient subsets to recurrently co-altered chromosomal arms, providing a compact, biologically meaningful summary of tumor heterogeneity and demonstrating BBMF's utility for uncovering discrete latent structure in complex binary data.
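
The Boolean product underlying BooMF, as described in the abstract, can be written out directly. This is only a sketch of the deterministic product, not of the BBMF model or its Gibbs sampler; the toy matrices and their interpretation as patients and chromosomal arms are hypothetical.

```python
def boolean_product(U, V):
    """Boolean matrix product: X[i][j] = OR over k of (U[i][k] AND V[k][j])."""
    n, r, m = len(U), len(V), len(V[0])
    return [[int(any(U[i][k] and V[k][j] for k in range(r)))
             for j in range(m)]
            for i in range(n)]


# Hypothetical toy data: 4 patients, 2 latent patterns, 5 chromosomal arms.
U = [[1, 0],   # patient 0 carries pattern 0
     [1, 1],   # patient 1 carries both patterns
     [0, 1],   # patient 2 carries pattern 1
     [0, 0]]   # patient 3 carries neither
V = [[1, 1, 0, 0, 0],   # arms altered by pattern 0
     [0, 1, 1, 0, 1]]   # arms altered by pattern 1
X = boolean_product(U, V)
```

For patient 1 the second arm belongs to both patterns, yet the entry stays 1 rather than summing to 2, which is what distinguishes the Boolean product from an additive factorization.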

2606.17420 2026-06-17 eess.IV cs.AI q-bio.QM 新提交

Feynman Kac Reweighted Schrödinger Bridge Matching for Surface-Based Tau PET Harmonization

基于Feynman Kac重加权薛定谔桥匹配的皮层表面Tau PET标准化

Jianwei Zhang, Xinyu Nie, Jiaxin Yue, Yonggang Shi

发表机构 * Stevens Neuroimaging and Informatics Institute, University of Southern California(南加州大学史蒂文斯神经影像与信息学研究所) Ming Hsieh Department of Electrical and Computer Engineering of Viterbi School of Engineering, University of Southern California(南加州大学维特比工程学院谢明电气与计算机工程系) Alfred E. Mann Department of Biomedical Engineering of Viterbi School of Engineering, University of Southern California(南加州大学维特比工程学院阿尔弗雷德·E·曼生物医学工程系)

AI总结 提出Feynman Kac重加权薛定谔桥匹配(FKRSBM)模型,通过熵正则化最优传输实现源域与目标域间的随机传输,结合子群感知端点提议和球面卷积骨干网络,在Tau PET SUVR图上实现优于现有方法的分布对齐和下游疾病分类。

详情
AI中文摘要

Tau PET成像对于追踪阿尔茨海默病进展至关重要,但不同站点间的扫描仪、协议和放射性示踪剂的系统差异引入了非生物变异性,这会增加生物标志物方差、降低对疾病效应的敏感性,并可能偏倚下游临床评估。标准化方法旨在去除这些站点引起的偏移,同时保留有生物学意义的信号,然而现有方法在源队列和目标队列具有不同子群组成时难以应对,存在将站点效应与生物学变异(如tau阳性状态)混淆的风险。我们提出Feynman Kac重加权薛定谔桥匹配(FKRSBM)模型来解决这一问题。与基于扩散的方法通过高斯噪声先验路由数据不同,FKRSBM通过熵正则化最优运输学习源分布和目标分布之间的直接随机传输过程。为了实现生物学一致的传输,FKRSBM结合了由参考桥测度的Feynman Kac重加权导出的子群感知端点提议,完全通过数据层面的分层重要性抽样实现,无需对底层桥匹配求解器或网络架构进行任何更改。对于基于表面的神经影像,FKRSBM采用在皮层网格上运行的球面卷积骨干网络进行顶点级标准化。我们在tau PET SUVR图上评估该方法,将HABS-HD队列的PI-2620数据标准化到ADNI的AV-1451域。与ComBat、CycleGAN、基于扩散的方法(DF)和无正则化的扩散薛定谔桥匹配(DSBM)相比,FKRSBM实现了更优的分布对齐、更低的tau阳性符号不匹配、更强的APOE子群对齐以及改进的下游疾病分类性能。

英文摘要

Tau PET imaging is central to tracking Alzheimer's disease progression, but systematic differences between scanners, protocols, and radiotracers across sites introduce nonbiological variability that inflates biomarker variance, reduces sensitivity to disease effects, and can bias downstream clinical assessments. Harmonization methods aim to remove these site-induced shifts while preserving biologically meaningful signal, yet existing approaches struggle when source and target cohorts differ in subgroup composition, risking conflation of site effects with biological variation such as tau-positivity status. We propose the Feynman Kac Reweighted Schrödinger Bridge Matching (FKRSBM) model to address this problem. Rather than routing data through a Gaussian noise prior as in diffusion-based methods, FKRSBM learns a direct stochastic transport process between source and target distributions via entropy-regularized optimal transport. To enforce biologically consistent transport, FKRSBM incorporates a subgroup-aware endpoint proposal derived from a Feynman Kac reweighting of the reference bridge measure, implemented entirely through stratified importance sampling at the data level and requiring no changes to the underlying bridge-matching solver or network architecture. For surface-based neuroimaging, FKRSBM employs a spherical convolutional backbone operating on cortical meshes to perform vertex-level harmonization. We evaluate the method on tau PET SUVR maps, harmonizing PI-2620 data from the HABS-HD cohort into the AV-1451 domain of ADNI. Compared against ComBat, CycleGAN, a diffusion-based method (DF), and unregularized Diffusion Schrödinger Bridge Matching (DSBM), FKRSBM achieves superior distributional alignment, reduced tau-positivity sign mismatch, stronger APOE subgroup alignment, and improved downstream disease classification performance.

2606.17404 2026-06-17 eess.AS cs.SD 新提交

ELSA: Acoustic Event-Level Semantic Alignment for Fine-Grained Reference-Free Text-to-Audio Evaluation

ELSA: 面向细粒度无参考文本到音频评估的声学事件级语义对齐

Shuntaro Suzuki, Kento Tokura, Daichi Yashima, Kanon Amemiya, Komei Sugiura, Shinnosuke Takamichi

发表机构 * Keio University(庆应义塾大学)

AI总结 提出ELSA指标,通过将生成音频分解为文本查询中的声学事件并评估事件级对齐,实现细粒度无参考文本到音频评估,在四个基准上比现有指标更符合人类评分。

Comments Accepted for presentation at Interspeech 2026

详情
AI中文摘要

文本到音频(TTA)生成,即从自然语言合成音频,因其能够捕捉精确的用户意图而被广泛研究。为了有效推进TTA模型,必须在不依赖昂贵的人类主观评分的情况下可靠地评估生成的音频,这促使开发与人类判断高度相关的自动评估指标。虽然最近的基于CLAP的指标提供了实用的无参考解决方案,但其粗粒度的文本-音频相似度匹配往往与人类评分的相关性较差。为了解决这个问题,我们提出了ELSA,一种用于细粒度文本-音频对齐的无参考评估指标。ELSA将生成的音频分解为由文本查询中的不同声学事件引导,并评估事件级对齐。在四个TTA基准上的实验表明,ELSA与人类主观评分的相关性高于先前的指标,突显了其在可靠TTA评估中的有效性。

英文摘要

Text-to-audio (TTA) generation, synthesizing audio from natural language, has been widely studied for its ability to capture precise user intent. To effectively advance TTA models, it is essential to reliably evaluate generated audio without relying on costly human subjective ratings, motivating the development of automatic evaluation metrics that correlate well with human judgments. While recent CLAP-based metrics provide practical reference-free solutions, their coarse-grained text-audio similarity matching often correlates poorly with human ratings. To address this, we propose ELSA, a reference-free evaluation metric for fine-grained text-audio alignment. ELSA decomposes generated audio guided by distinct acoustic events derived from the text query and assesses event-level alignment. Experiments across four TTA benchmarks show that ELSA reveals a higher correlation with human subjective ratings than prior metrics, highlighting its effectiveness for reliable TTA evaluation.

2606.17383 2026-06-17 q-fin.RM cs.AI cs.LG stat.ML 新提交

Model Validation of Agentic AI Systems: A POMDP-Based Framework for Belief-State, Forecast, and Policy Validation

智能体AI系统的模型验证:基于POMDP的信念状态、预测与策略验证框架

Matthew Francis Dixon

发表机构 * Quiota LLC(Quiota公司)

AI总结 提出基于部分可观测马尔可夫决策过程(POMDP)的智能体AI模型验证框架,将自主决策分解为信息、信念、预测、动作和效用组件独立验证,并通过投资组合管理案例展示其有效性。

Comments 28 pages, 3 figures, 6 tables. Source code available from https://github.com/mfrdixon/agentic-AI-as-POMDP

详情
AI中文摘要

智能体人工智能系统引入了一类新的模型风险。与传统预测模型不同,自主智能体持续获取信息,形成关于环境潜在状态的信念,生成预测,选择行动,并随时间调整其行为。现有的验证方法主要关注预测准确性,因此对底层决策过程的质量提供的洞察有限。本文提出了一种基于部分可观测马尔可夫决策过程(POMDP)的智能体AI模型验证框架。该框架将自主决策分解为信息、信念、预测、行动和效用,允许每个组件独立验证。大型语言模型(LLM)被形式化为近似贝叶斯滤波算子,并开发了一个模型风险分类体系,涵盖状态空间、滤波、预测、策略、效用规范和参数风险。通过一个投资组合管理案例研究展示了模型风险验证方法,其中智能体从市场和宏观经济信息中推断潜在市场制度,生成基于信念的预测,并使用Black-Litterman框架构建投资组合。实证验证结合了性能分析、信念校准诊断、覆盖测试、消融研究和参数敏感性分析。结果表明,潜在状态推断对决策质量有独立贡献,且主要结论在广泛的参数值范围内保持稳健。本文的主要贡献是提供了一个实用框架,将已建立的模型风险管理概念扩展到自主AI系统,并为其验证、治理和监控提供了严格的基础。

英文摘要

Agentic artificial intelligence systems introduce a new class of model risk. Unlike traditional predictive models, autonomous agents continuously acquire information, form beliefs regarding latent states of the environment, generate forecasts, select actions, and adapt their behavior over time. Existing validation methodologies focus primarily on predictive accuracy and therefore provide limited insight into the quality of the underlying decision process. This paper proposes a model validation framework for agentic AI based on Partially Observable Markov Decision Processes (POMDPs). The framework decomposes autonomous decision making into information, beliefs, forecasts, actions, and utility, allowing each component to be validated independently. Large language models (LLMs) are formalized as approximate Bayesian filtering operators, and a model-risk taxonomy is developed encompassing state-space, filtering, forecast, policy, utility-specification, and parameter risks. The model risk validation methodology is demonstrated through a portfolio-management case study in which an agent infers latent market regimes from market and macroeconomic information, generates belief-conditioned forecasts, and constructs portfolios using a Black--Litterman framework. Empirical validation combines performance analysis, belief calibration diagnostics, coverage tests, ablation studies, and parameter-sensitivity analysis. The results indicate that latent-state inference contributes independently to decision quality and that the principal conclusions remain robust across a broad range of parameter values. The principal contribution of the paper is a practical framework for extending established model risk management concepts to autonomous AI systems and providing a rigorous foundation for their validation, governance, and monitoring.
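
The belief-state component that the framework validates corresponds, in a discrete POMDP, to a Bayes filter update. The sketch below shows one exact update step for reference. It is an assumption-laden illustration and not the paper's code: the dependence of the transition model on the agent's action is omitted, the regime names and all probabilities are made up, and the paper treats the LLM as an approximation of such an operator, not as this function.

```python
def belief_update(belief, transition, likelihood):
    """One step of a discrete Bayes filter.

    belief[s]        : prior probability of latent state s
    transition[s][t] : P(next state = t | current state = s)
    likelihood[t]    : P(observation | next state = t)
    """
    n = len(belief)
    # Predict: push the belief through the transition model.
    predicted = [sum(belief[s] * transition[s][t] for s in range(n))
                 for t in range(n)]
    # Correct: weight by the observation likelihood and renormalise.
    unnormalised = [predicted[t] * likelihood[t] for t in range(n)]
    total = sum(unnormalised)
    if total == 0:
        raise ValueError("observation has zero probability under the model")
    return [u / total for u in unnormalised]


# Hypothetical two-regime example ("calm", "stressed"); numbers are made up.
prior = [0.5, 0.5]
T = [[0.9, 0.1],
     [0.2, 0.8]]
obs_likelihood = [0.2, 0.6]
posterior = belief_update(prior, T, obs_likelihood)
```

Comparing an agent's stated regime probabilities against a reference update of this kind is one way a belief calibration diagnostic could be set up.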

2606.17295 2026-06-17 eess.IV cs.CV 新提交

Phenotyping TPF via Self-Supervised Learning: A Label-Agnostic Framework with Expert Validation

通过自监督学习进行胫骨平台骨折表型分析:一种具有专家验证的标签无关框架

Miral Elnakib, Muhammad Saad, Ahmad Al-Kabbany

发表机构 * Faculty of Sciences(科学学院) Alexandria University(亚历山大大学) Multimedia Interaction and Communication Lab(多媒体交互与通信实验室) Wearables, Biosensing, and Biosignal Processing Research Lab(可穿戴设备、生物传感与生物信号处理研究实验室) Arab Academy for Science and Technology(阿拉伯科学与技术学院)

AI总结 提出一种标签无关的自监督学习框架,利用SimCLR和聚类从X光片中直接学习骨折表征,发现四种影像衍生表型,经盲法专家验证具有稳定性和临床可解释性,与常规分类正交。

详情
AI中文摘要

人工智能在胫骨平台骨折特征描述中的全部潜力尚未实现,受限于对标注数据集的根本依赖,而标注数据集的一致性无法保证:传统的分类方案如Schatzker和AO/OTA存在观察者间变异性,导致监督模型学习的是人类分歧而非稳定的骨折形态。我们设计、实现并验证了一个标签无关的框架,通过直接从影像数据中学习骨折表征来消除这一约束,无需观察者分配的标签。使用RadImageNet预训练的ResNet-50编码器,在154张清洁的膝关节X光片上通过SimCLR对比目标进行微调,之前进行数据清洗协议,之后进行UMAP降维和k-means聚类,以发现四种影像衍生表型。通过盲法专家审查协议评估表型有效性,由两名独立临床医生进行。四种表型表现出稳健的稳定性(bootstrap ARI = 0.319 +/- 0.041)、强内部凝聚力(轮廓系数 = 0.511),以及两名评审者在盲法条件下给出3-5/5的一致性评分;一种表型被一致认为表现出粉碎性——一种在没有监督信号的情况下分离出的高复杂性特征。与Schatzker标签的跨分区比较得出ARI = 0.013,证实了与传统分类边界的正交性。值得注意的是,锚定于既定分类词汇的专家评审者在Schatzker对齐度最低的地方认为影像衍生组是异质的,这表明Schatzker训练的感知和标签无关的嵌入几何测量的是正交维度。这些发现确立了标签无关的SSL表型分析作为传统分类的可重复且临床可解释的补充。

英文摘要

The full potential of artificial intelligence in tibial plateau fracture characterisation remains unrealised, constrained by a fundamental dependency on labelled datasets whose consistency cannot be guaranteed: conventional classification schemes such as Schatzker and AO/OTA suffer from inter-observer variability, causing supervised models to learn human disagreement rather than stable fracture morphology. We design, implement, and validate a label-agnostic framework that eliminates this constraint by learning fracture representations directly from imaging data without observer-assigned labels. A RadImageNet-pretrained ResNet-50 encoder is fine-tuned on 154 cleaned knee radiographs using the SimCLR contrastive objective, preceded by a data cleaning protocol and followed by UMAP dimensionality reduction and k-means clustering to discover four imaging-derived phenotypes. Phenotype validity is assessed through a blinded expert review protocol administered to two independent clinicians. The four phenotypes demonstrate robust stability (bootstrap ARI = 0.319 +/- 0.041), strong internal cohesion (silhouette = 0.511), and coherence ratings of 3-5/5 from both reviewers under blinded conditions; one phenotype was unanimously identified as exhibiting comminution -- a high-complexity feature isolated without any supervisory signal. Inter-partition comparison against Schatzker labels yields ARI = 0.013, confirming orthogonality to conventional classification boundaries. Notably, expert reviewers anchored to established classification vocabularies perceived imaging-derived groups as heterogeneous precisely where Schatzker alignment was lowest, suggesting that Schatzker-trained perception and label-agnostic embedding geometry measure orthogonal dimensions. These findings establish label-agnostic SSL phenotyping as a reproducible and clinically interpretable complement to conventional classification.
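
The adjusted Rand index (ARI) reported in the abstract, both for bootstrap stability and for the comparison with Schatzker labels, compares two partitions of the same items while correcting for chance agreement. A minimal standard-library sketch of the standard formula follows; the function name and the example labels are hypothetical and unrelated to the study data.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_a)
    if n < 2:
        return 1.0  # nothing to disagree about
    # Pairs of items placed together in both partitions.
    together_both = sum(comb(c, 2)
                        for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (together_both - expected) / (maximum - expected)


# Hypothetical cluster assignments for four items.
same_up_to_renaming = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Values near 1 indicate matching partitions and values near 0 indicate agreement at chance level, which is how an ARI of 0.013 against Schatzker labels is read as near-independence of the two groupings.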

2606.17259 2026-06-17 eess.AS cs.SD 新提交

Intelligibility of Speech in Noise: Investigating Contribution of Magnitude and Phase Spectra

噪声中语音的可懂度:幅度谱和相位谱贡献的研究

Bhanu Teja Nellore, Sudarsana Reddy Kadiri, Rohit Kumar, Karan Nathwani, Suryakanth V Gangashetty

发表机构 * Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, USA(美国南加州大学信号分析与解释实验室) National Institute of Technology, Patna, India(印度国立理工学院帕特纳分校) Indian Institute of Technology, Jammu, India(印度理工学院查谟分校) Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur District, Andhra Pradesh, India(印度安得拉邦贡土尔县瓦德斯瓦拉姆,Koneru Lakshmaiah教育基金会)

AI总结 通过三个实验评估幅度谱和相位谱对噪声中辅音可懂度的贡献,发现幅度谱在干净条件下贡献更大,而相位谱在噪声条件下更鲁棒,且鼻音比擦音和近音更易受噪声影响。

详情
AI中文摘要

众所周知,语音的可懂度在环境噪声中会降低。然而,研究表明并非所有声音都受到均匀(或同等)影响,元音比辅音对噪声更鲁棒。本研究评估并分析了各种辅音在平稳白噪声和非平稳嘈杂噪声条件下的可懂度。具体而言,本研究探讨了给定语音信号的幅度谱和相位谱对噪声条件下辅音人类语音识别的各自贡献。为此,进行了三个实验。实验1中,评估了干净信号、仅用幅度谱信息重建的信号(仅幅度信号)和仅用相位谱信息重建的信号(仅相位信号)的可懂度。实验2中,将噪声添加到干净语音中。从带噪语音中重建仅相位信号和仅幅度信号,并对所有这三种信号进行可懂度测试。实验3中,将噪声直接添加到从干净语音重建的仅幅度和仅相位信号中,并评估其可懂度。这些实验结果表明,在干净条件下幅度谱对可懂度的贡献大于相位谱,而相位谱的信息在噪声条件下更鲁棒。还观察到,在辅音中,鼻音更容易受噪声影响,而擦音和近音相对更鲁棒。

英文摘要

It is well known that intelligibility of speech reduces in the presence of ambient noise. However, studies show that all sounds are not affected uniformly (or equally) and that vowels are more robust to noise than consonants. In this study, intelligibility of various consonants is assessed and analyzed in stationary white noise and non-stationary babble noise conditions. Specifically, this study investigates the individual contribution of magnitude and phase spectra of a given speech signal on human speech recognition of consonants in noisy conditions. In this regard, three experiments are carried out. In experiment 1, clean signal, signal reconstructed with only magnitude spectrum information (magnitude only signal) and signal reconstructed with only phase spectrum information (phase only signal) are assessed for intelligibility. In experiment 2, noise is added to clean speech. From noisy speech, phase only signal and magnitude only signal are reconstructed and intelligibility tests are performed for all these three signals. In experiment 3, noise is added directly to the magnitude only and phase only signals reconstructed from clean speech and their intelligibility is assessed. Results of these experiments show that magnitude spectrum contributes more to intelligibility in clean condition than phase spectrum, while information from phase spectrum is more robust in noisy conditions. It is also observed that, among consonants, nasals are more susceptible to noise whereas fricatives and approximants were observed to be comparatively more robust.
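
The two reconstructions compared in the experiments can be illustrated on a single short frame. The sketch below is an assumption, not the authors' procedure: it uses one plain DFT frame instead of a short-time analysis with overlap-add, replaces the discarded phase with zero and the discarded magnitude with one, and runs on a made-up eight-sample signal.

```python
import cmath
import math


def dft(x):
    """Discrete Fourier transform (O(n^2), adequate for a short frame)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]


def magnitude_only(x):
    """Keep |X[k]| and replace every phase with zero."""
    return [v.real for v in idft([abs(c) for c in dft(x)])]


def phase_only(x):
    """Keep the phase of X[k] and set every non-zero magnitude to one."""
    unit = [c / abs(c) if abs(c) > 1e-12 else 0j for c in dft(x)]
    return [v.real for v in idft(unit)]


# Hypothetical frame of eight samples, for illustration only.
x = [1.0, 2.0, 0.0, -1.0, 0.5, 0.0, 3.0, 0.0]
x_mag = magnitude_only(x)
x_phase = phase_only(x)
```

Because the input is real, both modified spectra remain conjugate-symmetric, so taking the real part of the inverse transform only discards rounding error.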

2606.17196 2026-06-17 stat.ML cs.LG stat.ME 新提交

Another Look at Log-PCA for Probability Measures: A Dynamical Formulation and Statistical Convergence

再探概率测度的Log-PCA:一种动力学公式与统计收敛性

Peng Xu, Changbo Zhu, Young-Heon Kim, Xiaohui Chen

发表机构 * Department of Statistics, University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校统计系) Department of ACMS, University of Notre Dame(圣母大学ACMS系) Department of Mathematics, University of British Columbia(不列颠哥伦比亚大学数学系) Department of Mathematics and Thomas Lord Department of Computer Science, University of Southern California(南加州大学数学系、Thomas Lord计算机科学系)

AI总结 本文在Wasserstein几何下提出一种动力学公式解释log-PCA,称为Wasserstein切向PCA(WT-PCA),并推导了经验WT-PCA相对于总体测度的统计收敛速率。

详情
AI中文摘要

本文关注在Wasserstein几何下学习随机概率测度在$\mathbb{R}^m$上的主变差。我们引入一种新的动力学公式来解释log-PCA(一种线性化的主测地线分析)作为变分方法。我们的可微版本称为Wasserstein切向PCA(WT-PCA),通过其在重心处的协方差算子捕获Wasserstein空间上(加权)概率测度的局部主测地线变差模式。基于动力学视角并利用最优传输问题的平行传输结构,我们推导了从数据估计的经验WT-PCA相对于总体和经验重心参考测度之间的2-Wasserstein距离的通用统计收敛速率。

英文摘要

This paper is concerned with learning principal variations of random probability measures on $\mathbb{R}^m$ under the Wasserstein geometry. We introduce a new dynamical formulation to interpret the log-PCA, a linearized principal geodesic analysis, as a variational approach. Our differentiable version, termed as the Wasserstein Tangential PCA (WT-PCA), captures the local principal modes of geodesic variations of a (weighted) probability measure on the Wasserstein space via its covariance operator at barycenter. Based on the dynamical perspective and leveraging parallel transport structure of the optimal transport problems, we derive a general statistical convergence rate of the empirical WT-PCA when estimated from data in terms of the 2-Wasserstein distance between the population and empirical barycenter reference measures.

2606.17127 2026-06-17 q-bio.QM cs.AI cs.LG New submission

Agentic Discovery of Non-Canonical Antimicrobial Peptides with AMPGAN v3

Jay Jung, Xiaohan Zhang, Shenghan Song, Mahmoud Sayedahmed, Chijian Xiang, Yunong Xu, Ahmed AbdelKhalek, Severin T. Schneebeli, Matthew J. Wargo, Jianing Li, Safwan Wshah

Affiliations * University of Vermont; Larner College of Medicine, University of Vermont; Purdue University; Department of Comparative Pathobiology; Department of Horticulture and Landscape Architecture; Department of Industrial and Molecular Pharmaceutics

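AI summary Proposes AMPGAN v3, a multi-objective conditional GAN that extends the generative vocabulary to D-amino acids and terminal modifications and improves training stability through two discriminators. In vitro validation shows activity against Gram-positive bacteria, and the PepCraft multi-agent framework is introduced for end-to-end discovery.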

Comments Presented at the GenBio Workshop, ICML 2026

Details
Abstract

Antimicrobial resistance causes over a million deaths annually. Antimicrobial peptides (AMPs) are a promising solution, but generative AMP models are not yet ready to design peptides with non-natural amino acids and/or chemical modifications, which are essential for real-world peptide drugs. We present AMPGAN v3, a multi-objective conditional GAN that expands the generative vocabulary to D-amino acids and N/C-terminus modifications such as amidation. By separating adversarial and activity-aware supervision across two specialized discriminators, AMPGAN v3 substantially improves training stability and outperforms prior generative AMP models on external classifiers. We validated five candidates spanning three structural classes in vitro; two showed activity against Gram-positive strains, with the best candidate reaching MIC 8 μg/mL against B. subtilis. To support downstream curation, we further present PepCraft, a multi-agent framework for end-to-end AMP discovery in which a Planning Agent orchestrates specialized executors for generation, filtering, and verification. Its prioritization recommendations align with our in vitro outcomes. Together, these contributions let us examine, on a small but real scale, how generative and agentic AI compose in therapeutic peptide discovery. Code: https://github.com/marszzibros/AMPGANv3

2606.17065 2026-06-17 q-fin.CP cs.AI cs.LG New submission

PIVOT: Bridging Black-Scholes Implied-Volatility and Price Objectives via Differentiable Jäckel Operator

Raeid Saqur, Yannick Limmer, Anastasis Kratsios, Blanka Horvath, Hans Buehler

Affiliations * Mathematical Institute, University of Oxford; McMaster University; Vector Institute for AI; DRW

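AI summary Proposes the PIVOT layer, which keeps Jäckel's solver unchanged in the forward pass to preserve its accuracy, supplies gradients by implicit differentiation, and uses a gating mechanism to handle the singularity in the low-vega regime, enabling efficient differentiable conversion between price space and implied-volatility space.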

Comments 30 pages, 17 figures, 12 tables

Details
Abstract

Modern option-learning systems operate in two coordinates: price space, where markets quote and no-arbitrage constraints are most naturally enforced, and implied volatility (IV) space, where volatility surfaces are smoothed, regularized, and evaluated. The bottleneck is interface, not approximation: Jäckel's seminal "Let's Be Rational" (LBR) solver already inverts the Black-Scholes price to machine precision efficiently. What is missing is a differentiable layer that preserves LBR in the forward pass and avoids backpropagating through its branch logic. Such a layer must also confront the unavoidable singularity of the inverse map in the low-vega regime, where the sensitivity 1/vega diverges as vega → 0. We close this gap with PIVOT, the Price-Implied-Volatility Objective Translator. PIVOT keeps the LBR forward pass intact and supplies the backward pass by implicit differentiation through the smooth Black-Scholes/Black-76 price map, with an explicit gating contract: invalid domains return NaN, well-conditioned rows receive the exact 1/vega gradient, and low-vega rows are attenuated rather than silently regularized. On a single H100, a fused Triton kernel reaches 1.79e9 IV/s at machine precision (9.3e-14 max relative error vs. the reference C solver); end-to-end label generation sustains 48.9M/s on synthetic chains and 16.6M/s on SPX OptionMetrics. In a HyperIV-style one-day reproduction on SPX, PIVOT-augmented objectives Pareto-dominate the baselines, reducing held-out price MAE by up to 43.4%, with the strongest three-seed gated objective jointly improving price MAE by 38.8% and IV MAE by 21.3%. Cross-asset results on RUT, VIX, and NDX show directional price-MAE gains of 40.1%, 24.2%, and 16.7%, while an ungated IV-roundtrip control collapses to a degenerate near-zero surface, confirming the gate as a correctness contract rather than a tuning knob.
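
The implicit-differentiation idea in the abstract can be sketched in a few lines. If the price map is P = BS(σ), then the inverse map has derivative dσ/dP = 1/vega, so the backward pass never needs to differentiate through the solver itself. The code below is only an illustration under stated assumptions: a plain bisection stands in for the LBR solver (it is not Jäckel's algorithm), and the gate (a `vega_floor` threshold with a quadratic attenuation factor) is a hypothetical choice, since the abstract does not specify the form of PIVOT's gating. All function and parameter names are made up for this sketch.

```python
import math

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def black76_call(F, K, T, sigma, df=1.0):
    """Black-76 call price for forward F, strike K, expiry T, discount factor df."""
    s = sigma * math.sqrt(T)
    d1 = (math.log(F / K) + 0.5 * s * s) / s
    return df * (F * norm_cdf(d1) - K * norm_cdf(d1 - s))

def black76_vega(F, K, T, sigma, df=1.0):
    """Derivative of the Black-76 call price with respect to sigma."""
    s = sigma * math.sqrt(T)
    d1 = (math.log(F / K) + 0.5 * s * s) / s
    return df * F * norm_pdf(d1) * math.sqrt(T)

def implied_vol(price, F, K, T, df=1.0, lo=1e-6, hi=5.0, iters=200):
    """Bisection stand-in for the forward solver (assumption: sigma lies in [lo, hi])."""
    intrinsic = df * max(F - K, 0.0)
    if not (intrinsic < price < df * F):
        return float("nan")  # invalid domain: no implied volatility exists
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if black76_call(F, K, T, mid, df) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def iv_and_grad(price, F, K, T, df=1.0, vega_floor=1e-4):
    """Return (sigma, d sigma / d price) using implicit differentiation.
    Hypothetical gate: rows with vega below vega_floor have the exact 1/vega
    gradient scaled by (vega / vega_floor)**2, so it shrinks toward zero."""
    sigma = implied_vol(price, F, K, T, df)
    if math.isnan(sigma):
        return sigma, float("nan")
    vega = black76_vega(F, K, T, sigma, df)
    if vega >= vega_floor:
        return sigma, 1.0 / vega           # exact inverse-function gradient
    return sigma, vega / vega_floor ** 2   # attenuated low-vega gradient

# Usage: at-the-money option, where vega is large and the gradient is exact.
atm_price = black76_call(100.0, 100.0, 1.0, 0.2)
atm_sigma, atm_grad = iv_and_grad(atm_price, 100.0, 100.0, 1.0)
```

A central finite difference of `implied_vol` around `atm_price` should agree with `atm_grad`, which is the sense in which the gradient is obtained without backpropagating through the solver's iterations.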