arXivDaily arXiv daily academic digest, updated Monday through Friday
2606.18074 2026-06-17 stat.ML cs.LG stat.ME New submission

Tensor-based second-order causal discovery

Nathan Ouyang, Kexin Wan, Anna Seigal

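AI summary Proposes the TSCD algorithm, which uses a tensor of covariance matrices from observational and interventional data to identify the directed acyclic graph and its edge functions under a linear structural equation model, requiring only uncorrelated noise; it extends to nonlinear models and is identifiable from a logarithmic number of interventions.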

Comments 27 pages, 7 figures. Code available at this https URL (https://github.com/QWE123665/Tensor-based-Second-order-Causal-Discovery)

Abstract

Causal discovery seeks to uncover the causal dependencies among variables. For this purpose, we propose an algorithm called Tensor-based Second-order Causal Discovery (TSCD). Its input is a tensor obtained from the covariance matrices of observational and interventional data. Assuming the causal dependencies follow a linear structural equation model on a directed acyclic graph (DAG), TSCD outputs the DAG and the functions on its edges, requiring only that the noise variables are uncorrelated. We also implement a version of the approach for nonlinear models. Our focus on second-order statistics (via the covariance matrices) is motivated by their statistical and computational efficiency relative to higher-order moments, their identifiability relative to first-order statistics, and that they work regardless of whether the variables are Gaussian. We show that TSCD has identifiable causal order and parameters from a number of interventions that is logarithmic in the number of variables. Experiments show that TSCD is robust to noise, competitive with existing methods, and scales to hundreds of variables.
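
As a reminder of why covariance matrices carry information about the graph, the standard covariance identity for a linear structural equation model can be written out. The symbols $\Lambda$ (edge-weight matrix), $\Omega$ (noise covariance), and the setting index $k$ are notation assumed here for illustration and are not taken from the paper; stacking the matrices along a third axis is likewise only one plausible reading of "a tensor obtained from the covariance matrices".

```latex
% Linear SEM on a DAG: each variable is a linear function of its parents plus noise.
X = \Lambda X + \varepsilon, \qquad \operatorname{Cov}(\varepsilon) = \Omega
\quad \text{(diagonal when the noise variables are uncorrelated)}
% Acyclicity makes I - \Lambda invertible, so
X = (I - \Lambda)^{-1} \varepsilon
\quad\Longrightarrow\quad
\Sigma = \operatorname{Cov}(X) = (I - \Lambda)^{-1}\, \Omega\, (I - \Lambda)^{-\top}.
% One covariance matrix per setting k (observational or interventional),
% stacked into a p x p x K array (assumed layout):
\mathcal{T}_{:,:,k} = \Sigma^{(k)}
  = \bigl(I - \Lambda^{(k)}\bigr)^{-1}\, \Omega^{(k)}\, \bigl(I - \Lambda^{(k)}\bigr)^{-\top}.
```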

2606.18019 2026-06-17 eess.AS cs.CL cs.SD New submission

Reading between the Lines: Leveraging Large Language Models for Global Dementia and Depression Assessment from Clinical Interviews

Franziska Braun, Alea Rüggeberg, Thomas Ranzenberger, Hartmut Lehfeld, Thomas Hillemacher, Tobias Bocklet, Korbinian Riedhammer

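Affiliations * TH Nürnberg; FAU Erlangen; PMU Klinikum Nürnberg

AI summary This study uses open-weights large language models to predict dementia and depression severity from recorded clinical interviews with 154 German-speaking subjects. It introduces a Global Depression Scale aligned with the Global Deterioration Scale and finds that zero-shot prediction works for depression, while structured feature extraction substantially improves dementia assessment, reducing errors by up to 35%; pause-enriched transcripts perform on par with human transcripts.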

Comments Accepted for publication in Text, Speech and Dialogue (TSD 2026). The final authenticated publication will be available online via Springer LNCS/LNAI

Abstract

Dementia and depression are the most prevalent neuropsychiatric disorders in geriatric populations, and their overlapping symptoms pose major challenges for differential diagnosis. In this study, we investigate open-weights Large Language Models (LLMs) for predicting dementia and depression severity from speech samples collected during standardized history taking interviews with 154 German-speaking subjects. We introduce an observer-based Global Depression Scale (GDS-D) aligned with the established Global Deterioration Scale (GDS), enabling parallel global staging of affective and cognitive symptoms. We compare three LLMs (Mistral 3.1, DeepHermes, Qwen3) in two settings: (1) zero-shot prediction and (2) LLM-based feature extraction for Support Vector Regression, using human and pause-enriched transcripts. Results show that LLMs effectively predict depression severity in zero-shot settings (best MAE of 0.60), while dementia assessment benefits substantially from structured feature extraction (best MAE of 0.78), reducing errors by up to 35% over zero-shot baselines. Pause-enriched transcripts achieve competitive performance with human transcriptions, demonstrating the viability of fully automatic screening pipelines for differential neuropsychiatric assessment.

2606.18011 2026-06-17 stat.ML cs.LG stat.ME New submission

Fast Nonparametric Conditional Independence Testing via Two-Stage Regression

Eric V. Strobl

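Affiliations * Department of Biomedical Informatics, University of Pittsburgh

AI summary Proposes BLITZ, which quickly removes the influence of the conditioning set through two-stage regression (low-order polynomials followed by shallow trees), yielding a well-calibrated nonparametric conditional independence test suitable for causal discovery.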

Comments A fast R implementation with C++ back-end is available at this https URL (https://github.com/ericstrobl/BLITZ)

Abstract

Constraint-based causal discovery relies on repeated conditional independence tests, but fast nonparametric tests often sacrifice calibration, especially when variables depend on the conditioning set through nonlinear relationships. We introduce BLITZ (Broad-to-Local Independence Testing via residualiZation), a nonparametric conditional independence test designed to run well under a second while maintaining the accuracy needed for the thousands of queries performed by constraint-based causal discovery algorithms. BLITZ first removes broad smooth dependence on the conditioning set using low-order polynomial regression, then applies a small nonlinear feature map and residualizes those features with shallow tree regressions. The resulting statistic tests residual cross-covariance, with a moment-matched chi-square approximation to the null distribution. We show theoretically that the two-stage design reduces the effective complexity faced by the tree residualizers, allowing shallow trees to control residual conditional-mean bias while avoiding excessive overfitting. In simulations, BLITZ provides better null calibration than fast kernel, random-feature, and regression-based competitors while remaining among the fastest methods tested. In causal discovery experiments on synthetic graphs and flow-cytometry data, BLITZ yields more reliable endpoint orientations among retained adjacencies and competitive structural recovery. These results suggest that broad-to-local residualization is a practical route to calibrated, scalable nonparametric conditional independence testing for causal discovery.

2606.17995 2026-06-17 stat.ML cs.CR cs.LG New submission

Differential Privacy of Gaussian Process Posterior Sampling

Tomasz Maciazek

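Affiliations * School of Mathematics, University of Bristol

AI summary Studies the privacy of Gaussian process posterior sample paths, deriving Rényi-DP bounds that separate posterior-mean leakage from posterior-covariance leakage, revealing the key role of effective ridge regularisation, and confirming with membership-inference attacks how leakage depends on regularisation.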

Comments 8 pages of main text + 25 pages appendix

Abstract

We study the privacy of releasing posterior sample paths from a Gaussian process (GP) when the entire training set, including covariates and responses, is private. Unlike standard differential-privacy (DP) mechanisms that add external noise, posterior sampling is random by construction. We show that this intrinsic randomness yields DP guarantees by deriving explicit Rényi-DP bounds for GP posterior sample-path release. The bounds separate posterior-mean leakage from data-dependent posterior-covariance leakage, showing that meaningful privacy depends sharply on effective ridge regularisation. We apply membership-inference attacks to show that empirical leakage follows the predicted dependence on regularisation, posterior variance, and the number of released posterior sample paths. Utility experiments on downstream posterior-sampling tasks identify noisy-observation regimes where privacy-compatible regularisation preserves useful decisions with modest utility loss. When stronger privacy is needed, the intrinsic guarantee can be sharpened by adding calibrated GP noise, providing an explicit additional privacy knob.

2606.17684 2026-06-17 stat.ML cs.CY cs.LG New submission

Geometrical fairness in graph neural networks

Arturo Pérez-Peralta, Sandra Benítez-Peña, Blas Kolic, Rosa E. Lillo

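Affiliations * Department of Statistics, University Carlos III of Madrid, Spain; uc3m-Santander Big Data Institute

AI summary Addresses fairness in graph neural networks by modifying the Laplacian operator with several complementary transformations (subspace projections, spectral adjustments, and frequency-based filtering) to mitigate bias, with theoretical analysis and experiments showing improved fairness and competitive performance.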

Comments 32 pages, 21 tables, 6 figures

Abstract

Graph-based learning methods have become increasingly prominent due to their strong performance across diverse applications. Among these, recent frameworks grounded in diffusion processes provide a unifying perspective that extends traditional graph neural network formulations while addressing limitations of standard message-passing mechanisms. Despite these advances, concerns remain regarding the fairness of such models, as they may propagate or amplify biases present in the data. In this work, we introduce a fairness-aware adaptation of graph-based diffusion by modifying the underlying Laplacian operator. Our approach incorporates multiple complementary transformations, including subspace projections, spectral adjustments, and frequency-based filtering, to mitigate bias-related components. Leveraging the intrinsic smoothing properties of graph diffusion, we provide a principled analysis of the resulting behavior and establish theoretical insights into fairness properties. We evaluate the proposed framework on both synthetic and real-world datasets, demonstrating that it achieves competitive performance while improving fairness metrics with limited additional computational cost.

2606.17537 2026-06-17 eess.AS cs.CL New submission

Non-Autoregressive Minimum Bayes' Risk Decoding for Fast Speech Recognition

Hiroyuki Deguchi, Takatomo Kano, Katsuki Chousa, Marc Delcroix

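Affiliations * NTT, Inc.

AI summary Proposes a non-autoregressive decoding framework based on minimum Bayes' risk decoding that samples multiple candidates efficiently in a single forward computation, improving recognition performance while retaining the speed advantage.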

Comments Accepted at Interspeech2026

Abstract

Non-autoregressive (NAR) decoding generates output tokens in parallel, making speech recognition faster than autoregressive decoding, which generates them sequentially from left to right. However, the recognition performance is degraded because NAR decoding cannot resolve uncertainty by conditioning on previously generated tokens. To address this issue, we propose a novel NAR decoding framework based on minimum Bayes' risk (MBR) decoding, termed NAR-MBR decoding, that maximizes the expected utility calculated from samples drawn from the output probability of an NAR model rather than maximizing the output probability. Notably, by leveraging the nature of NAR models, multiple samples are obtained efficiently with a single forward computation. Our experiments across LibriSpeech, Switchboard, AMI, and web presentation corpus demonstrated that our NAR-MBR decoding outperformed previous NAR decoding and ran faster than AR decoding.
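
The selection step of MBR decoding described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the samples are assumed to be already-decoded transcripts (the abstract does not say how they are drawn from the NAR output), and the utility function, negative word-level edit distance, is a hypothetical stand-in for whatever utility the paper uses.

```python
from typing import Callable, List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Word-level Levenshtein distance using a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def utility(hyp: str, ref: str) -> float:
    """Hypothetical utility: negative word edit distance (higher is better)."""
    return -float(edit_distance(hyp.split(), ref.split()))


def mbr_decode(samples: List[str],
               utility_fn: Callable[[str, str], float] = utility) -> str:
    """Return the sample with the highest average utility against all samples.

    The samples serve both as candidates and as pseudo-references, which is
    the usual Monte Carlo estimate of expected utility.
    """
    best, best_score = samples[0], float("-inf")
    for hyp in samples:
        score = sum(utility_fn(hyp, ref) for ref in samples) / len(samples)
        if score > best_score:
            best, best_score = hyp, score
    return best


# Invented toy samples, standing in for draws from a model's output distribution.
samples = ["the cat sat", "the cat sat", "the cat sad", "a cat sat"]
print(mbr_decode(samples))
```

The chosen transcript is the one closest on average to the other samples, rather than the single most probable one, which is the contrast the abstract draws with maximizing output probability.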

2606.17504 2026-06-17 eess.IV cs.CV New submission

Two-Stage Fine-Tuning of ResNet50 for High-Sensitivity Melanoma Detection on Dermoscopic Images

Aryan Bhagat

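AI summary Proposes a two-stage fine-tuning method for ResNet50 that uses layer-wise training and low-learning-rate fine-tuning to address class imbalance and suboptimal transfer learning; on 3,826 test images it reaches an AUC-ROC of 0.9559 and a sensitivity of 87.56%, outperforming single-stage fine-tuning.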

Comments 13 pages, 4 figures, 4 tables. Code available at this https URL (https://github.com/Aryanbhagat23/melanoma-detection)

Abstract

Melanoma is the most dangerous form of skin cancer with five-year survival rates exceeding 99% when detected early but falling sharply once the disease spreads. This paper proposes and evaluates a two-stage fine-tuning approach for ResNet50 applied to binary melanoma classification on dermoscopic images. The core challenges addressed are class imbalance and suboptimal transfer learning from single-stage fine-tuning. After stratified train/validation/test splitting, random oversampling was applied exclusively to the training set to achieve a 1:1 class balance. Stage 1 trained only the classification head with the ResNet50 base frozen, while Stage 2 fine-tuned all layers jointly at a low learning rate of 1e-5 to prevent catastrophic forgetting of learned visual features. On an independent test set of 3,826 images, the model achieved an AUC-ROC of 0.9559, accuracy of 88.34%, sensitivity of 87.56%, specificity of 89.13%, and F1-score of 88.29%. An ablation study confirms the two-stage protocol significantly outperforms single-stage fine-tuning, with sensitivity gains of over 4%. Grad-CAM visualizations demonstrate correct lesion localization. A fully deployable Streamlit detection application is provided alongside all training code.
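
The class-balancing step mentioned above, random oversampling applied only to the training split, can be sketched as follows. This is an illustrative sketch under stated assumptions: the function name, the `(path, label)` tuple layout, and the toy file names are hypothetical and do not come from the paper's code.

```python
import random
from collections import Counter
from typing import Dict, List, Tuple

Sample = Tuple[str, int]  # (image path, class label); layout assumed for illustration


def oversample_to_balance(samples: List[Sample], seed: int = 0) -> List[Sample]:
    """Duplicate randomly chosen minority-class items until all classes are 1:1."""
    rng = random.Random(seed)
    by_label: Dict[int, List[Sample]] = {}
    for item in samples:
        by_label.setdefault(item[1], []).append(item)
    target = max(len(items) for items in by_label.values())
    balanced: List[Sample] = []
    for items in by_label.values():
        balanced.extend(items)
        # Draw with replacement to fill the gap; k is 0 for the majority class.
        balanced.extend(rng.choices(items, k=target - len(items)))
    rng.shuffle(balanced)
    return balanced


# Invented training split: 8 benign images, 2 melanoma images.
train = [(f"benign_{i}.jpg", 0) for i in range(8)] + [("mel_0.jpg", 1), ("mel_1.jpg", 1)]
balanced = oversample_to_balance(train)
print(Counter(label for _, label in balanced))
```

Validation and test splits are left untouched, so reported metrics reflect the original class distribution.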

2606.17491 2026-06-17 stat.ML cs.LG stat.ME New submission

A Bayesian Boolean Matrix Factorization with Application to Copy Number Analysis in Cancer

Adolphus Wagala, Mehmet Samur, Giovanni Parmigiani

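Affiliations * Department of Data Science, Dana-Farber Cancer Institute; Department of Biostatistics, Harvard T.H. Chan School of Public Health

AI summary Proposes the Bayesian Boolean Matrix Factorization (BBMF) model, which achieves interpretable factorization under Boolean constraints through a fully conjugate generative model with sparsity-inducing priors, and applies it to chromosome-arm copy-number alterations in multiple myeloma, revealing discrete latent structure in tumor heterogeneity.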

Abstract

Binary data factorization is common, but real-valued methods ignore discreteness and yield hard-to-interpret factors. Boolean Matrix Factorization (BooMF) instead decomposes a binary matrix into two lower-rank binary matrices via logical AND and OR, expressing the data as a Boolean disjunction of interpretable patterns. In cancer genomics, BooMF can reveal coordinated feature changes that may drive tumor evolution, unlike rotational or additive decompositions. Most existing BooMF methods are heuristic, greedy, sensitive to initialization, prone to local optima, and do not support principled model selection or uncertainty quantification. We introduce Bayesian Boolean Matrix Factorization (BBMF), a fully conjugate generative model with sparsity-inducing priors. It enforces Boolean constraints, yields interpretable latent factors with coherent uncertainty quantification, and admits Gibbs sampling with closed-form full conditionals. Because cancer evolution often involves widespread, near-simultaneous chromosome-number changes (e.g., whole-genome duplication followed by instability and selection), Boolean factorizations capture these patterns more naturally than additive models. Applied to arm-level copy-number alteration data in multiple myeloma, where entries indicate presence/absence of chromosomal-arm amplifications, BBMF finds a small set of interpretable bicliques linking patient subsets to recurrently co-altered chromosomal arms, providing a compact, biologically meaningful summary of tumor heterogeneity and demonstrating BBMF's utility for uncovering discrete latent structure in complex binary data.
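
The Boolean product at the heart of BooMF, combining two binary factors with logical AND and OR, can be made concrete with a small sketch. Only the product itself is shown; the Bayesian model, priors, and Gibbs sampler are not reproduced, and the matrices `U` and `V` are invented toy data.

```python
from typing import List

Matrix = List[List[int]]


def boolean_product(u: Matrix, v: Matrix) -> Matrix:
    """Compute X[i][j] = OR over k of (U[i][k] AND V[k][j])."""
    n, rank, m = len(u), len(v), len(v[0])
    return [[int(any(u[i][k] and v[k][j] for k in range(rank)))
             for j in range(m)]
            for i in range(n)]


# Toy example (invented): 3 patients, 2 latent patterns, 4 chromosome arms.
U = [[1, 0],   # patient 0 carries pattern 0
     [1, 1],   # patient 1 carries both patterns
     [0, 1]]   # patient 2 carries pattern 1
V = [[1, 1, 0, 0],   # pattern 0 alters arms 0 and 1
     [0, 1, 1, 0]]   # pattern 1 alters arms 1 and 2
X = boolean_product(U, V)
for row in X:
    print(row)
```

For patient 1, arm 1 is covered by both patterns yet the entry stays 1, whereas an additive product would give 2; this is the sense in which the data are a disjunction of patterns rather than a sum.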

2606.17420 2026-06-17 eess.IV cs.AI q-bio.QM New submission

Feynman Kac Reweighted Schrödinger Bridge Matching for Surface-Based Tau PET Harmonization

Jianwei Zhang, Xinyu Nie, Jiaxin Yue, Yonggang Shi

Affiliations * Stevens Neuroimaging and Informatics Institute, University of Southern California; Ming Hsieh Department of Electrical and Computer Engineering of Viterbi School of Engineering, University of Southern California; Alfred E. Mann Department of Biomedical Engineering of Viterbi School of Engineering, University of Southern California

AI summary Proposes the Feynman Kac Reweighted Schrödinger Bridge Matching (FKRSBM) model, which performs stochastic transport between source and target domains via entropy-regularized optimal transport, combined with a subgroup-aware endpoint proposal and a spherical convolutional backbone; on tau PET SUVR maps it achieves better distributional alignment and downstream disease classification than existing methods.

Abstract

Tau PET imaging is central to tracking Alzheimer's disease progression, but systematic differences between scanners, protocols, and radiotracers across sites introduce nonbiological variability that inflates biomarker variance, reduces sensitivity to disease effects, and can bias downstream clinical assessments. Harmonization methods aim to remove these site-induced shifts while preserving biologically meaningful signal, yet existing approaches struggle when source and target cohorts differ in subgroup composition, risking conflation of site effects with biological variation such as tau-positivity status. We propose the Feynman Kac Reweighted Schrödinger Bridge Matching (FKRSBM) model to address this problem. Rather than routing data through a Gaussian noise prior as in diffusion-based methods, FKRSBM learns a direct stochastic transport process between source and target distributions via entropy-regularized optimal transport. To enforce biologically consistent transport, FKRSBM incorporates a subgroup-aware endpoint proposal derived from a Feynman Kac reweighting of the reference bridge measure, implemented entirely through stratified importance sampling at the data level and requiring no changes to the underlying bridge-matching solver or network architecture. For surface-based neuroimaging, FKRSBM employs a spherical convolutional backbone operating on cortical meshes to perform vertex-level harmonization. We evaluate the method on tau PET SUVR maps, harmonizing PI-2620 data from the HABS-HD cohort into the AV-1451 domain of ADNI. Compared against ComBat, CycleGAN, a diffusion-based method (DF), and unregularized Diffusion Schrödinger Bridge Matching (DSBM), FKRSBM achieves superior distributional alignment, reduced tau-positivity sign mismatch, stronger APOE subgroup alignment, and improved downstream disease classification performance.

2606.17404 2026-06-17 eess.AS cs.SD New submission

ELSA: Acoustic Event-Level Semantic Alignment for Fine-Grained Reference-Free Text-to-Audio Evaluation

Shuntaro Suzuki, Kento Tokura, Daichi Yashima, Kanon Amemiya, Komei Sugiura, Shinnosuke Takamichi

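Affiliations * Keio University

AI summary Proposes the ELSA metric, which decomposes generated audio according to the acoustic events in the text query and assesses event-level alignment, enabling fine-grained reference-free text-to-audio evaluation that agrees with human ratings better than existing metrics on four benchmarks.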

Comments Accepted for presentation at Interspeech2026

Abstract

Text-to-audio (TTA) generation, synthesizing audio from natural language, has been widely studied for its ability to capture precise user intent. To effectively advance TTA models, it is essential to reliably evaluate generated audio without relying on costly human subjective ratings, motivating the development of automatic evaluation metrics that correlate well with human judgments. While recent CLAP-based metrics provide practical reference-free solutions, their coarse-grained text-audio similarity matching often correlates poorly with human ratings. To address this, we propose ELSA, a reference-free evaluation metric for fine-grained text-audio alignment. ELSA decomposes generated audio guided by distinct acoustic events derived from the text query and assesses event-level alignment. Experiments across four TTA benchmarks show that ELSA reveals a higher correlation with human subjective ratings than prior metrics, highlighting its effectiveness for reliable TTA evaluation.

2606.17397 2026-06-17 econ.GN cs.GT cs.IR New submission

Designing Recommendation Exposure and Favorite Lists: A Field Experiment in a Spot-Work Platform

Kazuki Sekiya, Suguru Otani, Yuki Komatsu, Shunsuke Ozeki, Shunya Noda

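AI summary For spot-work platforms where recommendations shape access to scarce, short-lived opportunities, proposes the thresholded eligibility control (TEC) mechanism, which reallocates template exposure based on posting activity and unfilled capacity and raises the per-round job-finding rate from 57.6% to 70.0%.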

Abstract

How should recommender systems be designed when recommendations shape access to scarce, short-lived opportunities? We study this question in a production setting: Timee, Japan's largest platform for spot work, where workers favorite job templates and receive notifications when firms post shifts from those templates. Maximizing predicted favoriting can generate misdirected concentration: recommendations accumulate on popular templates that create few viable job openings, while templates with unmet labor demand receive too little exposure. We design exposure-control mechanisms for favorite-list management, reallocating template exposure based on posting activity and unfilled capacity. The proposed recommender, thresholded eligibility control (TEC), is fully parallelizable and suitable for large-scale digital platforms. In simulations calibrated to Timee data, TEC raises the per-round job-finding rate from 57.6% to 70.0%. A prefecture-level randomized field experiment increases realized matches and exposure per active template, reduces the share of low-exposure templates, and improves impression-level favoriting and downstream matching.

2606.17383 2026-06-17 q-fin.RM cs.AI cs.LG stat.ML New submission

Model Validation of Agentic AI Systems: A POMDP-Based Framework for Belief-State, Forecast, and Policy Validation

Matthew Francis Dixon

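Affiliations * Quiota LLC

AI summary Proposes a model validation framework for agentic AI based on partially observable Markov decision processes (POMDPs), decomposing autonomous decision making into information, belief, forecast, action, and utility components that are validated independently, and demonstrates it with a portfolio-management case study.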

Comments 28 pages, 3 figures, 6 tables. Source code available from this https URL (https://github.com/mfrdixon/agentic-AI-as-POMDP)

Abstract

Agentic artificial intelligence systems introduce a new class of model risk. Unlike traditional predictive models, autonomous agents continuously acquire information, form beliefs regarding latent states of the environment, generate forecasts, select actions, and adapt their behavior over time. Existing validation methodologies focus primarily on predictive accuracy and therefore provide limited insight into the quality of the underlying decision process. This paper proposes a model validation framework for agentic AI based on Partially Observable Markov Decision Processes (POMDPs). The framework decomposes autonomous decision making into information, beliefs, forecasts, actions, and utility, allowing each component to be validated independently. Large language models (LLMs) are formalized as approximate Bayesian filtering operators, and a model-risk taxonomy is developed encompassing state-space, filtering, forecast, policy, utility-specification, and parameter risks. The model risk validation methodology is demonstrated through a portfolio-management case study in which an agent infers latent market regimes from market and macroeconomic information, generates belief-conditioned forecasts, and constructs portfolios using a Black-Litterman framework. Empirical validation combines performance analysis, belief calibration diagnostics, coverage tests, ablation studies, and parameter-sensitivity analysis. The results indicate that latent-state inference contributes independently to decision quality and that the principal conclusions remain robust across a broad range of parameter values. The principal contribution of the paper is a practical framework for extending established model risk management concepts to autonomous AI systems and providing a rigorous foundation for their validation, governance, and monitoring.

2606.17327 2026-06-17 q-bio.BM cs.AR cs.ET cs.NE New submission

Energy-efficient codon optimization on thermodynamic hardware

Andraz Jelincic, Ross C. Walker

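AI summary Maps mRNA codon optimization to an Ising model and implements it on a thermodynamic sampling unit, with energy consumption about 10^6 times lower than a GPU, providing the first concrete example of thermodynamic computing applied to pharmaceuticals.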

Comments Preprint available on bioRxiv: DOI TBD

Abstract

The growing energy demand for computation is becoming increasingly unsustainable. Thermodynamic computing, which harnesses physical thermal fluctuations as a computational resource rather than suppressing them, offers orders-of-magnitude energy savings for probabilistic and combinatorial tasks. Pharmaceutical R&D, heavily reliant on computational optimization and sampling, is a natural application domain. Here we present what is, to our knowledge, the first concrete pharmaceutical application mapped to thermodynamic hardware with energy estimates grounded in prototype measurements. We reduce mRNA codon optimization, a combinatorial problem routinely solved in drug development, to sampling from an Ising model, making it directly executable on a thermodynamic sampling unit (TSU). Benchmarking three approaches (Potts sampling, Ising sampling, and a genetic algorithm baseline) on the SARS-CoV-2 spike protein, we find that all achieve comparable optimization quality (scores ~234-240), but energy estimates based on validated hardware models indicate that a TSU could solve this problem using approximately 10e6 times less energy than a conventional GPU. All code is released under an open-source license.

2606.17295 2026-06-17 eess.IV cs.CV New submission

Phenotyping TPF via Self-Supervised Learning: A Label-Agnostic Framework with Expert Validation

Miral Elnakib, Muhammad Saad, Ahmad Al-Kabbany

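Affiliations * Faculty of Sciences; Alexandria University; Multimedia Interaction and Communication Lab; Wearables, Biosensing, and Biosignal Processing Research Lab; Arab Academy for Science and Technology

AI summary Proposes a label-agnostic self-supervised learning framework that uses SimCLR and clustering to learn fracture representations directly from radiographs, discovering four imaging-derived phenotypes that blinded expert review found stable and clinically interpretable, and orthogonal to conventional classification.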

Abstract

The full potential of artificial intelligence in tibial plateau fracture characterisation remains unrealised, constrained by a fundamental dependency on labelled datasets whose consistency cannot be guaranteed: conventional classification schemes such as Schatzker and AO/OTA suffer from inter-observer variability, causing supervised models to learn human disagreement rather than stable fracture morphology. We design, implement, and validate a label-agnostic framework that eliminates this constraint by learning fracture representations directly from imaging data without observer-assigned labels. A RadImageNet-pretrained ResNet-50 encoder is fine-tuned on 154 cleaned knee radiographs using the SimCLR contrastive objective, preceded by a data cleaning protocol and followed by UMAP dimensionality reduction and k-means clustering to discover four imaging-derived phenotypes. Phenotype validity is assessed through a blinded expert review protocol administered to two independent clinicians. The four phenotypes demonstrate robust stability (bootstrap ARI = 0.319 +/- 0.041), strong internal cohesion (silhouette = 0.511), and coherence ratings of 3-5/5 from both reviewers under blinded conditions; one phenotype was unanimously identified as exhibiting comminution, a high-complexity feature isolated without any supervisory signal. Inter-partition comparison against Schatzker labels yields ARI = 0.013, confirming orthogonality to conventional classification boundaries. Notably, expert reviewers anchored to established classification vocabularies perceived imaging-derived groups as heterogeneous precisely where Schatzker alignment was lowest, suggesting that Schatzker-trained perception and label-agnostic embedding geometry measure orthogonal dimensions. These findings establish label-agnostic SSL phenotyping as a reproducible and clinically interpretable complement to conventional classification.

2606.17259 2026-06-17 eess.AS cs.SD New submission

Intelligibility of Speech in Noise: Investigating Contribution of Magnitude and Phase Spectra

Bhanu Teja Nellore, Sudarsana Reddy Kadiri, Rohit Kumar, Karan Nathwani, Suryakanth V Gangashetty

Affiliations * Signal Analysis and Interpretation Laboratory, University of Southern California, Los Angeles, USA; National Institute of Technology, Patna, India; Indian Institute of Technology, Jammu, India; Koneru Lakshmaiah Education Foundation, Vaddeswaram, Guntur District, Andhra Pradesh, India

AI summary Three experiments evaluate the contributions of the magnitude and phase spectra to consonant intelligibility in noise, finding that the magnitude spectrum contributes more in clean conditions while the phase spectrum is more robust in noise, and that nasals are more susceptible to noise than fricatives and approximants.

详情
AI中文摘要

众所周知,语音的可懂度在环境噪声中会降低。然而,研究表明并非所有声音都受到均匀(或同等)影响,元音比辅音对噪声更鲁棒。本研究评估并分析了各种辅音在平稳白噪声和非平稳嘈杂噪声条件下的可懂度。具体而言,本研究探讨了给定语音信号的幅度谱和相位谱对噪声条件下辅音人类语音识别的各自贡献。为此,进行了三个实验。实验1中,评估了干净信号、仅用幅度谱信息重建的信号(仅幅度信号)和仅用相位谱信息重建的信号(仅相位信号)的可懂度。实验2中,将噪声添加到干净语音中。从带噪语音中重建仅相位信号和仅幅度信号,并对所有这三种信号进行可懂度测试。实验3中,将噪声直接添加到从干净语音重建的仅幅度和仅相位信号中,并评估其可懂度。这些实验结果表明,在干净条件下幅度谱对可懂度的贡献大于相位谱,而相位谱的信息在噪声条件下更鲁棒。还观察到,在辅音中,鼻音更容易受噪声影响,而擦音和近音相对更鲁棒。

英文摘要

It is well known that the intelligibility of speech decreases in the presence of ambient noise. However, studies show that not all sounds are affected equally and that vowels are more robust to noise than consonants. In this study, the intelligibility of various consonants is assessed and analyzed in stationary white noise and non-stationary babble noise conditions. Specifically, this study investigates the individual contributions of the magnitude and phase spectra of a given speech signal to human recognition of consonants in noisy conditions. To this end, three experiments are carried out. In experiment 1, the clean signal, a signal reconstructed with only magnitude spectrum information (magnitude-only signal), and a signal reconstructed with only phase spectrum information (phase-only signal) are assessed for intelligibility. In experiment 2, noise is added to clean speech; phase-only and magnitude-only signals are reconstructed from the noisy speech, and intelligibility tests are performed on all three signals. In experiment 3, noise is added directly to the magnitude-only and phase-only signals reconstructed from clean speech, and their intelligibility is assessed. The results show that the magnitude spectrum contributes more to intelligibility than the phase spectrum in the clean condition, while information from the phase spectrum is more robust in noisy conditions. It is also observed that, among consonants, nasals are more susceptible to noise, whereas fricatives and approximants are comparatively more robust.
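
To make the notions of a magnitude-only and a phase-only signal concrete, the sketch below builds both from a single frame with a naive DFT. It rests on assumptions not stated in the abstract: the magnitude-only signal keeps the magnitude spectrum with zero phase, the phase-only signal keeps the phase with unit magnitude, and one frame stands in for whatever short-time analysis the authors actually used. All function names are illustrative.

```python
import cmath
import math


def dft(frame):
    """Naive O(n^2) discrete Fourier transform of a real or complex frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft_real(spectrum):
    """Inverse DFT, returning only the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def magnitude_only(frame):
    """Keep |X[k]| and discard the phase (assumed here: phase set to zero)."""
    return idft_real([abs(c) for c in dft(frame)])


def phase_only(frame):
    """Keep the phase of X[k] and flatten the magnitude to 1 (assumed convention)."""
    flattened = []
    for c in dft(frame):
        # Bins with no energy have no meaningful phase; leave them empty.
        flattened.append(cmath.exp(1j * cmath.phase(c)) if abs(c) > 1e-12 else 0.0)
    return idft_real(flattened)


# Toy frame (hypothetical values) to exercise both reconstructions.
frame = [1.0, 2.0, 0.5, -1.0, 0.0, 3.0, -2.0, 0.25]
mag_signal = magnitude_only(frame)
phase_signal = phase_only(frame)
```

Experiments 2 and 3 in the abstract differ only in where noise enters this picture: before the decomposition (noisy speech is decomposed) or after it (noise is added to signals reconstructed from clean speech).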

2606.17247 2026-06-17 eess.SP cs.ET 新提交

Large-scale Tunable Liquid Lens-assisted VLC Systems under Random Receiver Orientation

大规模可调谐液体透镜辅助VLC系统在随机接收器方向下的研究

Kapila W. S. Palitharathna, Constantinos Psomas, Gaofeng Pan, Ioannis Krikidis

AI总结 针对随机接收器方向下的大规模可见光通信系统,提出基于电润湿的可调谐液体透镜接收器架构,通过动态调整液面方向增强信号并抑制干扰,基于随机几何推导中断概率解析式,最佳信号接收策略相比传统固定透镜降低57.1%中断概率。

Comments This paper has been submitted to IEEE Transactions on Wireless Communications journal

详情
AI中文摘要

本文研究了在随机接收器方向下,可调谐液体透镜辅助接收器在大规模可见光通信系统中的性能。提出了一种简单的基于电润湿的TLL架构,能够通过调整液体界面方向动态地将入射光信号导向光电二极管接收器。该架构增强了期望信号接收,同时减轻了来自相邻接入点的干扰。AP的空间分布采用Matérn硬核点过程建模,而接收器方向由均匀分布的方位角和服从高斯分布的极角表征。此外,开发了一个易处理的光学信道数学模型,以捕捉AP/接收器位置、接收器方向和透镜调整角度对VLC信道增益的综合影响。基于此框架,提出了三种透镜方向策略:最佳信号接收、最近LED选择和垂直向上透镜方向,以改善动态接收器条件下的系统性能。利用随机几何工具,推导了每种方案的中断概率的精确和近似解析表达式。数值结果验证了所开发分析的准确性,并表明所提出的TLL辅助接收器架构在严重的接收器方向波动和密集AP部署下显著提高了VLC系统的鲁棒性。特别是,在AP高度为3.5 m、AP密度为0.2 m^{-2}时,BSR方案相比传统固定透镜接收器将中断概率降低了57.1%。所提出的分析框架和数值结果为未来TLL辅助VLC网络的部署提供了有用的设计见解。

英文摘要

This paper investigates the performance of tunable liquid lens (TLL)-assisted receivers in large-scale visible light communication (VLC) systems under random receiver orientation. A simple electrowetting-based TLL architecture is proposed, capable of dynamically steering the incident optical signal toward the photodiode receiver by adjusting the orientation of the liquid interface. The proposed architecture enhances the desired signal reception while mitigating interference from neighboring access points (APs). The spatial distribution of APs is modeled using a Matérn hard-core point process, whereas receiver orientation is characterized by uniformly distributed azimuth angles and Gaussian-distributed polar angles. Furthermore, a tractable mathematical optical channel model is developed to capture the combined effects of AP/receiver locations, receiver orientation, and lens adjustment angles on the VLC channel gain. Based on this framework, three lens orientation strategies, namely best signal reception (BSR), closest LED selection, and vertical upward lens orientation, are proposed to improve system performance under dynamic receiver conditions. Using stochastic geometry tools, exact and approximate analytical expressions for the outage probability are derived for each scheme. Numerical results verify the accuracy of the developed analysis and demonstrate that the proposed TLL-assisted receiver architecture significantly improves the robustness of VLC systems under severe receiver orientation fluctuations and dense AP deployments. In particular, the BSR scheme reduces the outage probability by $57.1\%$ compared with conventional fixed-lens receivers at an AP height of $3.5$ m and AP density of $0.2~\text{m}^{-2}$. The presented analytical framework and numerical results provide useful design insights for the deployment of future TLL-assisted VLC networks.
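
The spatial and orientation models named in the abstract can be simulated in a few lines. The sketch below assumes a Matérn type II hard-core process (the abstract does not say which Matérn variant is used), ignores edge effects, and uses hypothetical parameter names. Note that `parent_density` is the density of the underlying Poisson process, so the density of retained APs is lower.

```python
import math
import random


def sample_poisson(mean, rng):
    """Poisson draw: count unit-rate exponential arrivals that land before `mean`."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def matern_hard_core(parent_density, min_distance, side, rng):
    """Matern type II thinning on a side x side square (assumed variant).

    Each parent point gets a uniform random mark; a point is kept only if
    no other parent within `min_distance` carries a smaller mark.
    """
    n = sample_poisson(parent_density * side * side, rng)
    parents = [(rng.uniform(0, side), rng.uniform(0, side), rng.random()) for _ in range(n)]
    kept = []
    for x, y, mark in parents:
        dominated = any(
            other_mark < mark and math.hypot(x - ox, y - oy) < min_distance
            for ox, oy, other_mark in parents
        )
        if not dominated:
            kept.append((x, y))
    return kept


def sample_receiver_orientation(polar_mean, polar_std, rng):
    """Uniform azimuth and Gaussian polar angle, both in radians."""
    azimuth = rng.uniform(0.0, 2.0 * math.pi)
    polar = rng.gauss(polar_mean, polar_std)
    return azimuth, polar


rng = random.Random(0)
access_points = matern_hard_core(parent_density=0.2, min_distance=2.0, side=20.0, rng=rng)
azimuth, polar = sample_receiver_orientation(polar_mean=0.5, polar_std=0.1, rng=rng)
```

A Monte Carlo estimate of outage probability would repeat these draws, compute the channel gain for each lens strategy, and count how often the resulting signal quality falls below a threshold; that part depends on the paper's channel model and is not reproduced here.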

2606.17196 2026-06-17 stat.ML cs.LG stat.ME 新提交

Another Look at Log-PCA for Probability Measures: A Dynamical Formulation and Statistical Convergence

再探概率测度的Log-PCA:一种动力学公式与统计收敛性

Peng Xu, Changbo Zhu, Young-Heon Kim, Xiaohui Chen

发表机构 * Department of Statistics, University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校统计学系) Department of ACMS, University of Notre Dame(圣母大学ACMS系) Department of Mathematics, University of British Columbia(不列颠哥伦比亚大学数学系) Department of Mathematics, Thomas Lord Department of Computer Science, University of Southern California(南加州大学数学系、托马斯·洛德计算机科学系)

AI总结 本文在Wasserstein几何下提出一种动力学公式解释log-PCA,称为Wasserstein切向PCA(WT-PCA),并推导了经验WT-PCA相对于总体测度的统计收敛速率。

详情
AI中文摘要

本文关注在Wasserstein几何下学习随机概率测度在$\mathbb{R}^m$上的主变差。我们引入一种新的动力学公式来解释log-PCA(一种线性化的主测地线分析)作为变分方法。我们的可微版本称为Wasserstein切向PCA(WT-PCA),通过其在重心处的协方差算子捕获Wasserstein空间上(加权)概率测度的局部主测地线变差模式。基于动力学视角并利用最优传输问题的平行传输结构,我们推导了从数据估计的经验WT-PCA相对于总体和经验重心参考测度之间的2-Wasserstein距离的通用统计收敛速率。

英文摘要

This paper is concerned with learning principal variations of random probability measures on $\mathbb{R}^m$ under the Wasserstein geometry. We introduce a new dynamical formulation to interpret log-PCA, a linearized principal geodesic analysis, as a variational approach. Our differentiable version, termed Wasserstein Tangential PCA (WT-PCA), captures the local principal modes of geodesic variation of a (weighted) probability measure on the Wasserstein space via its covariance operator at the barycenter. Based on the dynamical perspective and leveraging the parallel transport structure of optimal transport problems, we derive a general statistical convergence rate for the empirical WT-PCA estimated from data, in terms of the 2-Wasserstein distance between the population and empirical barycenter reference measures.
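
For intuition about log-PCA, the one-dimensional case is easy to write down, because the Wasserstein barycenter of measures on the real line has a quantile function equal to the average of the individual quantile functions, and the log map at the barycenter reduces to a difference of quantile functions. The sketch below covers only this 1D special case with a uniform quantile grid and power iteration; it is not the paper's WT-PCA on $\mathbb{R}^m$, and every name in it is an illustrative assumption.

```python
import math


def quantile_vector(sample, grid_size):
    """Empirical quantile function of a sample, evaluated on a uniform grid."""
    ordered = sorted(sample)
    n = len(ordered)
    return [ordered[min(n - 1, int((j + 0.5) / grid_size * n))] for j in range(grid_size)]


def log_pca_1d(samples, grid_size=50, iterations=200):
    """Leading log-PCA direction for one-dimensional empirical measures."""
    quantiles = [quantile_vector(s, grid_size) for s in samples]
    count = len(quantiles)
    # 1D barycenter: average the quantile functions pointwise.
    barycenter = [sum(q[j] for q in quantiles) / count for j in range(grid_size)]
    # Log map at the barycenter: quantile function minus barycenter quantile function.
    tangents = [[q[j] - barycenter[j] for j in range(grid_size)] for q in quantiles]
    # Power iteration on the covariance operator, without forming the matrix.
    direction = [1.0 + 0.01 * j for j in range(grid_size)]
    for _ in range(iterations):
        projections = [sum(t[j] * direction[j] for j in range(grid_size)) for t in tangents]
        updated = [
            sum(p * t[j] for p, t in zip(projections, tangents)) / count
            for j in range(grid_size)
        ]
        norm = math.sqrt(sum(v * v for v in updated))
        if norm == 0.0:
            break  # all measures coincide, so there is no variation to explain
        direction = [v / norm for v in updated]
    return barycenter, direction


# Toy data (hypothetical): one shape translated by different amounts.
base = [math.sin(i) for i in range(200)]
samples = [[x + shift for x in base] for shift in (-1.0, 0.0, 1.0, 2.0)]
barycenter, direction = log_pca_1d(samples)
```

When the measures differ only by translation, as in the toy data, the leading direction is constant across the quantile grid, which matches the intuition that the dominant mode of variation is a shift.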

2606.17127 2026-06-17 q-bio.QM cs.AI cs.LG 新提交

Agentic Discovery of Non-Canonical Antimicrobial Peptides with AMPGAN v3

AMPGAN v3 的非经典抗菌肽智能发现

Jay Jung, Xiaohan Zhang, Shenghan Song, Mahmoud Sayedahmed, Chijian Xiang, Yunong Xu, Ahmed AbdelKhalek, Severin T. Schneebeli, Matthew J. Wargo, Jianing Li, Safwan Wshah

发表机构 * University of Vermont(佛蒙特大学) Larner College of Medicine, University of Vermont(佛蒙特大学拉纳医学院) Purdue University(普渡大学) Department of Comparative Pathobiology(比较病理生物学系) Department of Horticulture and Landscape Architecture(园艺与景观建筑系) Department of Industrial and Molecular Pharmaceutics(工业与分子药剂学系)

AI总结 提出 AMPGAN v3,一种多目标条件 GAN,扩展生成词汇至 D-氨基酸和末端修饰,通过双判别器提升稳定性,体外验证显示对革兰氏阳性菌有活性,并引入 PepCraft 多智能体框架用于端到端发现。

Comments Presented at the GenBio Workshop, ICML 2026

详情
AI中文摘要

抗菌药物耐药性每年导致超过一百万人死亡。抗菌肽(AMP)是一种有前景的解决方案,但生成式 AMP 模型尚未准备好设计含有非天然氨基酸和/或化学修饰的肽,而这些对于实际肽药物至关重要。我们提出了 AMPGAN v3,一种多目标条件 GAN,它将生成词汇扩展到 D-氨基酸和 N/C 末端修饰(如酰胺化)。通过将对抗性和活性感知监督分离到两个专门的判别器中,AMPGAN v3 显著提高了训练稳定性,并在外部分类器上优于先前的生成式 AMP 模型。我们在体外验证了跨越三个结构类别的五个候选物;其中两个对革兰氏阳性菌株表现出活性,最佳候选物对枯草芽孢杆菌的 MIC 达到 8 μg/mL。为了支持下游筛选,我们进一步提出了 PepCraft,一个用于端到端 AMP 发现的多智能体框架,其中规划智能体协调专门的执行器进行生成、过滤和验证。其优先级推荐与我们的体外结果一致。这些贡献使我们能够在小型但真实的规模上研究生成式和智能体 AI 如何在治疗性肽发现中协同作用。代码:this https URL

英文摘要

Antimicrobial resistance causes over a million deaths annually. Antimicrobial peptides (AMPs) are a promising solution, but generative AMP models are not yet ready to design peptides with non-natural amino acids and/or chemical modifications, which are essential for real-world peptide drugs. We present AMPGAN v3, a multi-objective conditional GAN that expands the generative vocabulary to D-amino acids and N/C-terminus modifications such as amidation. By separating adversarial and activity-aware supervision across two specialized discriminators, AMPGAN v3 substantially improves training stability and outperforms prior generative AMP models on external classifiers. We validated five candidates spanning three structural classes in vitro; two showed activity against Gram-positive strains, with the best candidate reaching an MIC of 8 μg/mL against B. subtilis. To support downstream curation, we further present PepCraft, a multi-agent framework for end-to-end AMP discovery in which a Planning Agent orchestrates specialized executors for generation, filtering, and verification. Its prioritization recommendations align with our in vitro outcomes. Together, these contributions let us examine, on a small but real scale, how generative and agentic AI compose in therapeutic peptide discovery. Code: this https URL

2606.17065 2026-06-17 q-fin.CP cs.AI cs.LG 新提交

PIVOT: Bridging Black-Scholes Implied-Volatility and Price Objectives via Differentiable Jäckel Operator

PIVOT: 通过可微分的Jäckel算子桥接Black-Scholes隐含波动率与价格目标

Raeid Saqur, Yannick Limmer, Anastasis Kratsios, Blanka Horvath, Hans Buehler

发表机构 * Mathematical Institute, University of Oxford(牛津大学数学研究所) McMaster University(麦克马斯特大学) Vector Institute for AI(向量人工智能研究所) DRW

AI总结 提出PIVOT层,通过隐式微分保留Jäckel求解器的前向精度,并利用门控机制处理低vega区域的奇异性,实现价格与隐含波动率空间的高效可微转换。

Comments 30 pages, 17 figures, 12 tables

详情
AI中文摘要

现代期权学习系统在两种坐标系下运行:价格空间(市场报价且无套利约束最自然执行)和隐含波动率(IV)空间(波动率曲面被平滑、正则化和评估)。瓶颈在于接口而非近似:Jäckel开创性的“Let's Be Rational”(LBR)求解器已经高效地将Black-Scholes价格反转到机器精度。所缺少的是一个可微分层,它在正向传播中保留LBR,并避免通过其分支逻辑进行反向传播。这样的层还必须面对低vega区域中逆映射不可避免的奇异性,其中灵敏度1/vega在vega→0时发散。我们通过PIVOT(价格-隐含波动率目标转换器)填补了这一空白。PIVOT保持LBR正向传播不变,并通过隐式微分通过平滑的Black-Scholes/Black-76价格映射提供反向传播,并带有显式门控合约:无效域返回NaN,良态行接收精确的1/vega梯度,低vega行被衰减而非静默正则化。在单个H100上,融合的Triton内核在机器精度下达到1.79e9 IV/s(与参考C求解器的最大相对误差为9.3e-14);端到端标签生成在合成链上维持48.9M/s,在SPX OptionMetrics上维持16.6M/s。在SPX上的HyperIV风格单日复现中,PIVOT增强目标帕累托主导基线,将保留价格MAE降低高达43.4%,最强的三种子门控目标联合改善价格MAE 38.8%和IV MAE 21.3%;在RUT、VIX和NDX上的跨资产结果显示方向性价格MAE增益分别为40.1%、24.2%和16.7%,而无门控的IV往返控制崩溃为退化的近零曲面,确认门控是正确性合约而非调节旋钮。

英文摘要

Modern option-learning systems operate in two coordinates: price space, where markets quote and no-arbitrage constraints are most naturally enforced, and implied volatility (IV) space, where volatility surfaces are smoothed, regularized, and evaluated. The bottleneck is interface, not approximation: Jäckel's seminal "Let's Be Rational" (LBR) solver already inverts the Black-Scholes price to machine precision efficiently. What is missing is a differentiable layer that preserves LBR in the forward pass and avoids backpropagating through its branch logic. Such a layer must also confront the unavoidable singularity of the inverse map in the low-vega regime, where the sensitivity 1/vega diverges as vega → 0. We close this gap with PIVOT, the Price-Implied-Volatility Objective Translator. PIVOT keeps the LBR forward pass intact and supplies the backward pass by implicit differentiation through the smooth Black-Scholes/Black-76 price map, with an explicit gating contract: invalid domains return NaN, well-conditioned rows receive the exact 1/vega gradient, and low-vega rows are attenuated rather than silently regularized. On a single H100, a fused Triton kernel reaches 1.79e9 IV/s at machine precision (9.3e-14 max relative error vs. the reference C solver); end-to-end label generation sustains 48.9M/s on synthetic chains and 16.6M/s on SPX OptionMetrics. In a HyperIV-style one-day reproduction on SPX, PIVOT-augmented objectives Pareto-dominate the baselines, reducing held-out price MAE by up to 43.4%; the strongest three-seed gated objective jointly improves price MAE by 38.8% and IV MAE by 21.3%. Cross-asset results on RUT, VIX, and NDX show directional price-MAE gains of 40.1%, 24.2%, and 16.7%, while an ungated IV-roundtrip control collapses to a degenerate near-zero surface, confirming the gate as a correctness contract rather than a tuning knob.
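
The implicit-differentiation idea can be illustrated without any autodiff framework: if the forward solver returns σ with price(σ) = P, then dσ/dP = 1/vega(σ), whatever branching the solver used internally. The sketch below is a scalar illustration under several assumptions: plain bisection stands in for the LBR solver, the pricing map is an undiscounted-by-default Black-76 call, and the `vega_floor` threshold and quadratic attenuation rule are hypothetical choices, not the paper's gating contract.

```python
import math


def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def black76_call(forward, strike, expiry, sigma, discount=1.0):
    s = sigma * math.sqrt(expiry)
    d1 = (math.log(forward / strike) + 0.5 * s * s) / s
    return discount * (forward * norm_cdf(d1) - strike * norm_cdf(d1 - s))


def black76_vega(forward, strike, expiry, sigma, discount=1.0):
    s = sigma * math.sqrt(expiry)
    d1 = (math.log(forward / strike) + 0.5 * s * s) / s
    return discount * forward * norm_pdf(d1) * math.sqrt(expiry)


def implied_vol(price, forward, strike, expiry, discount=1.0, lo=1e-6, hi=5.0):
    """Bisection stand-in for the forward solver; NaN outside the no-arbitrage band."""
    intrinsic = discount * max(forward - strike, 0.0)
    if not (intrinsic < price < discount * forward):
        return float("nan")
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if black76_call(forward, strike, expiry, mid, discount) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def iv_and_gradient(price, forward, strike, expiry, discount=1.0, vega_floor=1e-4):
    """Return (sigma, d sigma / d price) using the implicit-function theorem."""
    sigma = implied_vol(price, forward, strike, expiry, discount)
    if math.isnan(sigma):
        return sigma, float("nan")  # invalid domain: propagate NaN
    vega = black76_vega(forward, strike, expiry, sigma, discount)
    if vega >= vega_floor:
        return sigma, 1.0 / vega  # well-conditioned: exact gradient
    # Hypothetical attenuation: shrink the gradient toward zero as vega vanishes.
    return sigma, vega / (vega_floor * vega_floor)


# Toy at-the-money example with illustrative numbers.
price = black76_call(100.0, 100.0, 1.0, 0.2)
sigma, grad = iv_and_gradient(price, 100.0, 100.0, 1.0)
```

The backward pass here never looks inside `implied_vol`, which is the property the abstract emphasizes: the solver can be swapped for any accurate root finder without changing the gradient.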

2606.17062 2026-06-17 q-bio.QM cs.LG 新提交

RadSEM: A Finding-by-Finding Metric for Clinical Consistency in Radiology Reports

RadSEM:放射学报告中临床一致性的逐发现指标

Zhenhong Yang, Zhuoyun Liu, Jintao Fei, Wen Tang, Shichao Quan, Jun Zhao, Jun Xu

发表机构 * JDH Algo, JD Health International Inc., China Department of Big Data in Health Science, The First Affiliated Hospital of Wenzhou Medical University, China Zhejiang Engineering Research Center for Hospital Emergency Department of Intensive Care Unit, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, China

AI总结 提出RadSEM指标,通过约束LLM辅助将报告重写为原子发现句,进行矛盾感知的多对多匹配,并计算异常加权的F1分数,在SSREE测试中优于现有指标,实现高一致性评分。

详情
AI中文摘要

放射学报告评估必须区分临床兼容性与表面相似性,因为否定、侧别或正常-异常极性可能逆转发现。我们提出RadSEM(放射学句子级评估指标),一种受约束的LLM辅助指标,用于基于参考的放射学发现评估。RadSEM将参考报告和生成报告重写为有序的原子发现句,每个句子表达一个部位-发现命题。然后执行矛盾约束的多对多匹配:不兼容对(如“积液”和“无积液”)不得分,而兼容的粒度差异可获部分得分。确定性阶段根据部分-整体和异常-细节关系对配对加权,计数未匹配的发现,并生成异常加权的加权F1分数。因此,LLM支持结构化重写和局部对齐,而非充当不透明评判者。我们使用SSREE(一种受控单调性压力测试,基于2,448份去标识报告扩展为五个等级损坏水平)评估RadSEM。RadSEM的Kendall tau_b达到0.957,全对一致性97.8%,相邻一致性95.0%,81.9%的报告实现严格五级排序,优于放射学专用和通用文本指标,同时避免了极性反转报告重新获得词汇重叠的失败。在同一SSREE集上,RadSEM优于参考锚定的RadSEM-Alt策略,将相邻一致性从90.7%提升至95.0%,严格排序从67.2%提升至81.9%。在599个三元组同义词/反义词子集上,RadSEM在597个案例(99.67%)中偏好同义词。这些结果表明,显式发现单元、矛盾感知匹配和异常聚焦的确定性评分使报告评分更具可解释性,并对临床有意义的错误更敏感。代码见:此https URL。

英文摘要

Radiology report evaluation must distinguish clinical compatibility from surface similarity, because negation, laterality, or normal-abnormal polarity can reverse a finding. We propose RadSEM (Radiology Sentence-Level Evaluation Metric), a constrained LLM-assisted metric for reference-based evaluation of radiology Findings. RadSEM rewrites reference and generated reports into ordered atomic finding sentences, each expressing one site-finding proposition. It then performs contradiction-constrained many-to-many matching: incompatible pairs such as "effusion" and "no effusion" receive no credit, while compatible granularity differences can receive partial credit. A deterministic stage weights pairs by part-whole and abnormal-detail relationships, counts unmatched findings, and produces an abnormal-focused weighted F1 score. Thus, the LLM supports structured rewriting and local alignment rather than acting as an opaque judge. We evaluate RadSEM with SSREE, a controlled monotonicity stress test built from 2,448 de-identified reports expanded into five graded corruption levels. RadSEM achieves Kendall tau_b of 0.957, all-pairs concordance of 97.8%, adjacent concordance of 95.0%, and strict five-level ordering for 81.9% of reports, outperforming radiology-specific and general text metrics while avoiding the failure in which polarity-inverted reports regain lexical overlap. On the same SSREE set, RadSEM outperforms the Ref-anchored RadSEM-Alt policy, improving adjacent concordance from 90.7% to 95.0% and strict ordering from 67.2% to 81.9%. On a 599-triplet synonym/antonym subset, RadSEM prefers synonyms in 597 cases (99.67%). These results suggest that explicit finding units, contradiction-aware matching, and abnormal-focused deterministic scoring make report scoring more interpretable and sensitive to clinically meaningful errors. Code is available at this https URL.
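
The deterministic scoring stage described above can be pictured as a weighted F1 over atomic findings. The sketch below is only a guess at the general shape: the weights, the credit values, and the rule of taking the best credit per finding are all assumptions for illustration, and the paper's actual weighting by part-whole and abnormal-detail relationships is not reproduced.

```python
def weighted_f1(ref_weights, gen_weights, matches):
    """Weighted F1 over atomic findings.

    ref_weights / gen_weights: one weight per reference / generated finding
        (for example, a larger weight for abnormal findings).
    matches: (ref_index, gen_index, credit) triples with credit in [0, 1];
        contradictory pairs are simply absent, so they earn nothing.
    """
    ref_credit = [0.0] * len(ref_weights)
    gen_credit = [0.0] * len(gen_weights)
    for ref_index, gen_index, credit in matches:
        # Many-to-many matching: each finding keeps its best available credit.
        ref_credit[ref_index] = max(ref_credit[ref_index], credit)
        gen_credit[gen_index] = max(gen_credit[gen_index], credit)
    ref_total, gen_total = sum(ref_weights), sum(gen_weights)
    if ref_total == 0 or gen_total == 0:
        return 0.0
    recall = sum(w * c for w, c in zip(ref_weights, ref_credit)) / ref_total
    precision = sum(w * c for w, c in zip(gen_weights, gen_credit)) / gen_total
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical case: the abnormal finding (weight 2) matches exactly, while
# "effusion" vs "no effusion" (weight 1 each) is contradictory and unmatched.
score = weighted_f1(ref_weights=[2.0, 1.0], gen_weights=[2.0, 1.0], matches=[(0, 0, 1.0)])
```

Because the contradictory pair contributes to both denominators but to neither numerator, a polarity flip lowers precision and recall at once, which is the behavior lexical-overlap metrics fail to capture.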

2606.18250 2026-06-17 cs.CV 新提交

Future Dynamic 3D Reconstruction: A 3D World Model with Disentangled Ego-Motion

未来动态3D重建:一种具有解耦自运动的3D世界模型

Nils Morbitzer, Jonathan Evers, Artem Savkin, Thomas Stauner, Nassir Navab, Federico Tombari, Stefano Gasperini

AI总结 提出FR3D世界模型,通过解耦场景3D演化与智能体轨迹,利用教师-学生蒸馏策略实现从单目观测到未来动态3D重建的几何一致性和零样本泛化。

Comments ICML 2026. Project page: this https URL (https://fr3d-wm.github.io)

详情
AI中文摘要

预测动态环境的演化对于自主智能体至关重要。尽管生成式世界模型最近通过在图像平面内混合自运动和环境动态,在2D视频合成中实现了高逼真度,但它们表现出物理不一致性,例如物体变形或消失,尤其是在长时间范围内。在本文中,我们提出FR3D,一种预测未来动态3D重建的持久3D潜在表示的世界模型。与将世界视为基于图像的特征序列的先前工作不同,FR3D明确地将场景的3D演化与智能体的轨迹解耦,将推断的自运动视为动作的潜在代理。这种解耦解决了自运动和世界运动之间的歧义,确保了几何一致性到未来。此外,我们引入了一种教师-学生蒸馏策略,利用现成基础模型的空间“常识”,从而实现鲁棒的零样本泛化。大量实验表明,FR3D在多个数据集上从单目观测进行未来动态3D重建(甚至到未来2秒)的强大性能。项目页面:此https URL。

英文摘要

Forecasting the evolution of dynamic environments is crucial for autonomous agents. While generative world models have recently achieved high photorealism in 2D video synthesis by mixing ego-motion and environmental dynamics within the image plane, they exhibit physical inconsistencies, such as morphing or vanishing objects, especially over long time horizons. In this paper, we propose FR3D, a world model that predicts a persistent 3D latent representation for future dynamic 3D reconstruction. Unlike prior works that treat the world as a sequence of image-based features, FR3D explicitly decouples the 3D evolution of the scene from the agent's trajectory, treating the inferred ego-motion as a latent proxy for action. This disentanglement resolves the ambiguities between self-motion and world-motion, ensuring geometric consistency into the future. Furthermore, we introduce a teacher-student distillation strategy that leverages the spatial "common sense" of off-the-shelf foundation models, leading to robust zero-shot generalization. Extensive experiments demonstrate FR3D's strong performance for future dynamic 3D reconstruction from monocular observations across multiple datasets, even 2 seconds into the future. Project page: this https URL.

2606.18249 2026-06-17 cs.CV 新提交

Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification

统一多模态自回归建模:共享上下文-视觉分词器是实现统一的关键

Wujian Peng, Lingchen Meng, Yuxuan Cai, Xianwei Zhuang, Yuhuan Yang, Rongyao Fang, Chenfei Wu, Junyang Lin, Zuxuan Wu, Shuai Bai

发表机构 * Institute of Trustworthy Embodied AI, Fudan University(复旦大学可信具身智能研究院) Shanghai Innovation Institute(上海创智学院) Qwen Team, Alibaba Inc.(阿里巴巴Qwen团队)

AI总结 提出UniAR框架,通过单一离散视觉分词器桥接视觉理解与生成,采用并行位预测和扩散解码,在图像生成和编辑上达到最优,同时保持多模态理解竞争力。

Comments Accepted by ICML2026. Project page this https URL (https://sharelab-sii.github.io/uniar-web)

详情
AI中文摘要

统一多模态建模旨在将视觉理解和生成集成到单个系统中。然而,现有方法通常依赖两个不同的视觉分词器,这分割了表示空间并阻碍了真正的统一建模。我们提出UniAR,一个统一的自回归框架,其中单个离散视觉分词器作为理解和生成之间的关键桥梁,使得模型能够直接解释其自身生成的视觉标记而无需额外的重新编码,从而实现共享上下文。UniAR采用预训练的视觉编码器,结合多级特征融合和无查找的逐位量化方案,在保留高层语义和低层细节的同时,以最小代价扩展有效视觉词汇。在此基础上,统一自回归模型采用并行逐位预测来联合预测空间分组的多级视觉编码,大幅减少视觉序列长度并加速生成。最后,基于扩散的视觉解码器对离散视觉标记进行操作,以解码高保真图像。通过大规模预训练,随后进行监督微调和强化学习,UniAR在图像生成和图像编辑上达到了最先进的性能,同时在多模态理解基准上保持竞争力。项目页面可在此URL获取。

英文摘要

Unified Multimodal Modeling aims to integrate visual understanding and generation within a single system. However, existing approaches typically rely on two disparate visual tokenizers, which splits the representation space and hinders truly unified modeling. We propose UniAR, a unified autoregressive framework where a single discrete visual tokenizer serves as the key bridge between understanding and generation, enabling a shared context in which the model can directly interpret its own generated visual tokens without additional re-encoding. UniAR adapts a pretrained vision encoder with multi-level feature fusion and a lookup-free bitwise quantization scheme, preserving both high-level semantics and low-level details while scaling the effective visual vocabulary at minimal cost. Building on this, the unified autoregressive model adopts parallel-bitwise-prediction to jointly predict spatially grouped, multi-level visual codes, substantially reducing visual sequence length and accelerating generation. Finally, a diffusion-based visual decoder operates on discrete visual tokens to decode high-fidelity images. Through large-scale pre-training, followed by supervised fine-tuning and reinforcement learning, UniAR achieves state-of-the-art performance on image generation and image editing while remaining competitive on multimodal understanding benchmarks. The project page is available at this https URL.
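
A lookup-free bitwise quantizer, in its simplest form, replaces codebook search with a per-channel sign test, so a d-dimensional latent yields d bits and an implicit vocabulary of 2^d codes. The sketch below shows that generic idea only; the function name is illustrative, and UniAR's multi-level fusion and its specific quantization details are not represented.

```python
def bitwise_quantize(latent):
    """Quantize each latent channel to one bit by its sign.

    Returns the bit list, the +/-1 code vector a decoder would consume,
    and the integer token id obtained by reading the bits as a binary number.
    """
    bits = [1 if value > 0 else 0 for value in latent]
    code = [1.0 if bit else -1.0 for bit in bits]
    token_id = sum(bit << position for position, bit in enumerate(bits))
    return bits, code, token_id


# Toy latent with four channels: 2**4 = 16 possible tokens, no codebook stored.
bits, code, token_id = bitwise_quantize([0.3, -1.2, 2.0, -0.1])
```

Since the vocabulary doubles with every added channel, predicting the bits in parallel rather than predicting one index from a 2^d-way softmax is what keeps the output layer small as the effective vocabulary grows.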

2606.18247 2026-06-17 cs.RO cs.AI 新提交

Visual Verification Enables Inference-time Steering and Autonomous Policy Improvement

视觉验证实现推理时引导与自主策略改进

Mingtong Zhang, Dhruv Shah

AI总结 提出VERITAS框架,利用预训练通用机器人策略作为生成器,结合无梯度视觉验证器在推理时评估动作,实现无需额外训练的推理时策略引导和离线策略改进。

Comments Website: this https URL (https://veritas-improvement.github.io)

详情
AI中文摘要

部署在现实世界中的机器人应从经验中学习并随时间改进。这需要一个实践并从反馈中学习的机制。在本文中,我们提出VERITAS,一个用于通用机器人策略的生成器-验证器框架,用于推理时策略引导和自我改进。我们使用预训练的通用机器人策略作为“生成器”,并将其与一个无梯度的“视觉验证器”配对,该验证器在推理时评估动作。该框架实现了推理时引导,无需额外训练即可提高策略性能。我们证明,推理时验证在无需额外演示数据训练的情况下,始终优于普通通用策略。此外,我们证明验证后的 rollout 为离线策略改进提供了有效的监督:在验证后的自生成轨迹上微调的策略实现了持续的性能提升。值得注意的是,我们发现使用验证后的 rollout 进行后训练达到了与专家演示相当的效率,同时无需人工干预。我们的结果突出了推理时验证作为一种实用且可扩展的机制,用于在部署期间改进机器人策略。

英文摘要

Robots deployed in the real world should learn from their experience and improve over time. This requires a mechanism for practicing and learning from feedback. In this paper, we propose VERITAS, a generator-verifier framework for generalist robot policies that supports inference-time policy steering and self-improvement. We use a pre-trained generalist robot policy as a "generator" and pair it with a gradient-free "visual verifier" that evaluates actions at inference time. This framework enables inference-time steering that improves policy performance without additional training. We demonstrate that inference-time verification consistently outperforms vanilla generalists without training on additional demonstration data. Additionally, we demonstrate that the verified rollouts provide effective supervision for offline policy improvement: policies fine-tuned on verified self-generated trajectories achieve consistent performance gains. Notably, we find that post-training with verified rollouts achieves comparable efficiency to expert demonstrations, while requiring no human interventions. Our results highlight inference-time verification as a practical and scalable mechanism for improving robotic policies during deployment.
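
One simple way to realize a generator-verifier loop is best-of-N selection: sample several candidate actions from the policy and execute the one the verifier scores highest. The sketch below assumes that selection rule, which the abstract does not spell out, and treats `generator` and `verifier` as opaque callables with hypothetical signatures.

```python
def steer(generator, verifier, observation, num_candidates=8):
    """Pick the highest-scoring action among sampled candidates.

    generator(observation) -> candidate action (assumed stochastic policy)
    verifier(observation, action) -> scalar score, higher is better
    No gradients flow through the verifier; it is only queried.
    """
    candidates = [generator(observation) for _ in range(num_candidates)]
    scores = [verifier(observation, action) for action in candidates]
    best = max(range(num_candidates), key=scores.__getitem__)
    return candidates[best], scores[best]


# Toy stand-ins (hypothetical): scalar actions, verifier prefers values near 0.5.
proposals = iter([0.9, -0.3, 0.45, 0.1])
action, score = steer(
    generator=lambda obs: next(proposals),
    verifier=lambda obs, a: -abs(a - 0.5),
    observation=None,
    num_candidates=4,
)
```

Logging the selected actions together with their scores would yield the kind of verified rollouts that the abstract reuses as supervision for offline fine-tuning.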

2606.18246 2026-06-17 cs.CL 新提交

Variable-Width Transformers

变宽Transformer

Zhaofeng Wu, Oliver Sieberling, Shawn Tan, Rameswar Panda, Yury Polyanskiy, Yoon Kim

发表机构 * MIT(麻省理工学院) MIT-IBM Watson AI Lab(MIT-IBM沃森人工智能实验室)

AI总结 提出一种中间窄、两端宽的变宽Transformer架构,通过无参数残差缩放机制实现非均匀容量分配,在语言模型困惑度、FLOPs和KV缓存上优于均匀宽度基线。

详情
AI中文摘要

扩展模型规模,特别是深度和宽度,推动了基于Transformer的语言模型的显著进步。然而,大多数架构在所有层中保持恒定宽度,即使不同层可能扮演不同的计算角色,也均匀分配固定的参数和计算预算。在这项工作中,我们通过提出一个×形> <former架构,实证研究了跨网络深度的非均匀容量分配。该设计保持较宽的早期和晚期层,同时缩窄中间层,利用无参数残差缩放机制。在从200M到2B参数(密集)和3B参数(MoE)的仅解码器语言模型中,我们的> <former在语言建模损失上始终优于参数匹配的均匀基线。通过降低平均层宽度,该架构还减少了总体FLOPs(在拟合的损失匹配缩放曲线下减少22%)以及更小的KV缓存内存和I/O成本(减少15%)。在分析中,我们展示了这种瓶颈结构导致残差流中定性不同的表示。总体而言,我们的结果表明,非均匀宽度分配可以导致语言模型更资源最优的缩放。

英文摘要

Scaling model size, specifically depth and width, has driven significant progress in transformer-based language models. However, most architectures maintain a constant width across all layers, allocating a fixed parameter and computation budget evenly despite different layers potentially playing distinct computational roles. In this work, we empirically investigate nonuniform capacity allocation across network depth by proposing a $\times$-shaped > <former architecture. This design maintains wider early and late layers while narrowing the middle layers, utilizing a parameter-free residual resizing mechanism. Across decoder-only language models ranging from 200M to 2B parameters (dense) and 3B parameters (MoE), our > <former consistently outperforms parameter-matched uniform baselines on language modeling loss. By reducing the average layer width, this architecture also requires fewer overall FLOPs (22% reduction under fitted loss-matched scaling curves) and smaller KV cache memory and I/O cost (15% reduction). In analysis, we show that this bottleneck structure results in qualitatively different representations in residual streams. Overall, our results demonstrate that nonuniform width allocation can result in more resource-optimal scaling of language models.

2606.18243 2026-06-17 cs.CV cs.GR cs.RO 新提交

MOCHI: Motion Enhancement of Collaborative Human-object Interactions

MOCHI: 协作人-物交互的运动增强

Jiye Lee, Yonghun Choi, Jungdam Won

AI总结 针对多人-物交互数据中手物接触错位、运动抖动和手指细节缺失等问题,提出两阶段框架MOCHI,先通过优化生成物理合理的手部抓取,再基于扩散模型优化全身运动,有效增强噪声数据。

Comments SIGGRAPH 2026 Journal (ACM TOG); Project page: this https URL (https://jiyewise.github.io/projects/MOCHI/)

详情
AI中文摘要

协作人-物交互展示了动态且复杂的运动,需要参与者与共享对象之间的相互预期和持续调整。对此类协作多人-物交互(MHOI)场景进行建模需要高质量的数据采集作为基础步骤;然而,由于MHOI中人与人、人与物交互同时发生的内在复杂性,这一步骤具有挑战性。这种复杂性导致MHOI捕获数据存在噪声,表现为多种伪影:手与物体之间的接触错位、捕获序列中的运动抖动和时间不一致性,以及缺失或不完整的手指级关节细节。为了解决这些挑战,我们提出了MOCHI(协作人-物交互的运动增强),一个用于增强噪声MHOI数据的两阶段框架。我们的方法首先通过从噪声身体输入进行优化生成物理合理的手部抓取,产生既物理合理又与身体姿态语义一致的抓取,然后将这些优化后的抓取扩展为完整的手-物交互序列。随后,所有参与者的全身运动通过一个基于扩散的噪声优化框架进行细化,该框架使用单人运动先验。在优化过程中,我们引入优化目标以在这些单人先验中编码人-物和人与人交互信息。实验结果表明,我们的流程在多种MHOI数据(无论是通过现有捕获方法获取还是由生成模型合成)上均有效。我们进一步展示了系统在不同参与者数量和交互类型下的鲁棒性,并演示了包括基于关键帧的MHOI创建和通过改变物体几何形状进行数据增强在内的多种应用。

英文摘要

Collaborative human-object interaction involves dynamic and complex movements that require mutual anticipation and continuous adjustment between participants and the shared object. Modeling such collaborative multi-human object interaction (MHOI) scenarios requires high-quality data acquisition as a foundational step; however, this is challenging due to the inherent complexity of MHOI, where human-human and human-object interactions occur simultaneously. Such complexity leads to noisy MHOI captures characterized by several artifacts: contact misalignment between hands and objects, motion jitter and temporal inconsistencies in the captured sequences, and missing or incomplete finger-level articulation details. To address these challenges, we present MOCHI (MOtion Enhancement of Collaborative Human-object Interactions), a two-stage framework for enhancing noisy MHOI data. Our approach first generates hand grasps through optimization from noisy body input, producing grasps that are both physically plausible and semantically consistent with the body pose; these optimized grasps are then extended into complete hand-object interaction sequences. Subsequently, the full-body motion of all participants is refined through a diffusion-based noise optimization framework that uses single-person motion priors. During the optimization process, we introduce optimization objectives to encode human-object and human-human interaction information within these single-person priors. Experimental results demonstrate the effectiveness of our pipeline across diverse MHOI data, whether acquired by existing capture methods or synthesized by generative models. We further show the robustness of our system across varying numbers of participants and types of interactions, and demonstrate various applications including keyframe-based MHOI creation and data augmentation through varying object geometries.

2606.18242 2026-06-17 cs.CV 新提交

EventDrive: Event Cameras for Vision-Language Driving Intelligence

EventDrive: 用于视觉-语言驾驶智能的事件相机

Dongyue Lu, Rong Li, Ao Liang, Lingdong Kong, Wei Yin, Lai Xing Ng, Benoit R. Cottereau, Camille Simon Chane, Wei Tsang Ooi

发表机构 * NUS(新加坡国立大学) HKUST(GZ)(香港科技大学(广州)) Horizon Robotics(地平线机器人) A*STAR, I2R(新加坡科技研究局,资讯通信研究院) IPAL, CNRS IRL 2955, Singapore(IPAL,法国国家科学研究中心国际联合实验室2955,新加坡) University Toulouse, CNRS, CerCo, Toulouse, France(图卢兹大学,法国国家科学研究中心,CerCo,法国图卢兹) ETIS UMR 8051, CY Cergy Paris University, ENSEA, CNRS, France(ETIS UMR 8051,CY塞尔吉-巴黎大学,ENSEA,法国国家科学研究中心,法国)

AI总结 提出EventDrive基准和模型套件,通过多时域事件金字塔和时域混合专家模块融合事件流与RGB帧,在感知、理解、预测和规划四维度提升驾驶推理性能。

Comments CVPR2026, 34 pages, 15 figures, 15 tables, project page: this https URL (https://dylanorange.github.io/projects/eventdrive)

详情
AI中文摘要

事件相机通过异步亮度变化感知世界,具有微秒级延迟和高动态范围,其运动保真度远超基于帧的传感器,并能捕捉传统曝光常遗漏的时间结构。这些特性使事件成为自动驾驶中RGB的有力补充,尤其在帧感知可能不可靠的模糊、眩光和快速运动场景下。然而,现有的事件感知视觉-语言模型仍局限于通用感知,未能揭示事件传感如何促进整个驾驶循环中的推理和决策。我们提出EventDrive,一个大规模基准和模型套件,统一了事件流、RGB帧和语言监督,涵盖四个核心维度:感知、理解、预测和规划,包括描述、结构化问答、定位、运动状态识别、轨迹预测和规划任务。在此基础上,EventDrive-VLM引入了多时域事件金字塔和时域混合专家模块,自适应地编码和融合异步与基于帧的信息,用于下游推理。在多样化任务上的全面评估表明,事件流在时间精度、运动感知和鲁棒性方面提供了显著提升,将事件传感置于驾驶智能的核心。

英文摘要

Event cameras sense the world through asynchronous brightness changes with microsecond latency and high dynamic range, offering motion fidelity far beyond frame-based sensors and capturing temporal structure that conventional exposures often miss. These properties make events a powerful complement to RGB in autonomous driving, especially under blur, glare, and rapid motion, where frame-based perception can become unreliable. However, existing event-aware vision-language models remain limited to generic perception and do not reveal how event sensing contributes to reasoning and decision-making across the full driving loop. We present EventDrive, a large-scale benchmark and model suite that unifies event streams, RGB frames, and language supervision across four core dimensions: Perception, Understanding, Prediction, and Planning, covering captions, structured QA, grounding, motion-state recognition, trajectory forecasting, and planning tasks. Building on this foundation, EventDrive-VLM introduces a multi-horizon event pyramid and a temporal-horizon mixture-of-experts module to adaptively encode and fuse asynchronous and frame-based information for downstream reasoning. Comprehensive evaluation across diverse tasks shows that event streams provide substantial gains in temporal precision, motion awareness, and robustness, bringing event sensing into the center of driving intelligence.

2606.18239 2026-06-17 cs.RO 新提交

EBench: Elemental Diagnosis of Generalist Mobile Manipulation Policies

EBench: 通用移动操作策略的要素诊断

Ning Gao, Jinliang Zheng, Xing Gao, Haoxiang Ma, Hanqing Wang, Yukai Wang, Jiantong Chen, Zanxin Chen, Shujie Zhang, Mingda Jia, Xuekun Jiang, Zihou Zhu, Xinyu Li, Shuai Wang, Hao Li, Wenzhe Cai, Yuqiang Yang, Xudong Xu, Zhaoyang Lyu, Yao Mu, Tai Wang, Jiangmiao Pang, Jia Zeng, Weinan Zhang, Chunhua Shen

发表机构 * Shanghai AI Laboratory(上海人工智能实验室) Xi’an Jiaotong University(西安交通大学) Institute for AI Industry Research (AIR), Tsinghua University(清华大学智能产业研究院) Tsinghua University(清华大学) University of Science and Technology of China(中国科学技术大学) Shanghai Jiao Tong University(上海交通大学) Zhejiang University(浙江大学)

AI总结 提出EBench基准,从5个能力和4个泛化维度诊断通用移动操作模型,揭示不同模型在成功率相近时能力差异显著。

详情
AI中文摘要

我们提出EBench,一个仿真基准,用于诊断通用移动操作策略,超越单一的成功率标量。EBench包含26个多样且具有挑战性的操作任务,沿5个能力维度和4个泛化维度进行标注。我们评估了最先进的通用操作模型,包括$\pi_0$、$\pi_{0.5}$、XVLA和InternVLA-A1,并揭示出成功率相近的模型展现出截然不同的能力轮廓:$\pi_{0.5}$实现了最高的测试成功率和最佳的训练-测试保持率,而InternVLA-A1在移动操作上占主导地位,但在灵巧任务上崩溃,XVLA与其他策略相比在一组不相交的原子技能上表现出优势。除了能力轮廓分析,EBench还从4个代表性角度分析了泛化能力,识别了不同分布偏移因素的影响。结果揭示了模型在总体得分背后的优势和弱点。我们希望这个基准能提供广泛的诊断信号,以指导通用操作模型的迭代。

英文摘要

We present EBench, a simulation benchmark that diagnoses generalist mobile manipulation policies beyond a single success-rate scalar. EBench comprises 26 diverse and challenging manipulation tasks annotated along 5 capability dimensions and 4 generalization dimensions. We evaluate state-of-the-art generalist manipulation models including $\pi_0$, $\pi_{0.5}$, XVLA, and InternVLA-A1, and reveal that models with similar success rates exhibit strikingly different capability profiles: $\pi_{0.5}$ achieves the highest test success rate and the best train-test retention, whereas InternVLA-A1 dominates mobile manipulation but collapses on dexterous tasks, and XVLA exhibits strengths on a disjoint set of atomic skills compared to other policies. Beyond capability profiling, EBench analyzes generalization ability from 4 representative perspectives, identifying the impact of different distribution shift factors. The results reveal the strengths and weaknesses of models that an overall score hides. We hope this benchmark offers a broad set of diagnostic signals to guide iteration on generalist manipulation models.

2606.18237 2026-06-17 cs.CL cs.AI cs.LG 新提交

ReproRepo: Scaling Reproducibility Audits with GitHub Repository Issues

ReproRepo: 利用 GitHub 仓库问题扩展可重复性审计

Shanda Li, Qiuhong Anna Wei, Jingwu Tang, Valerie Chen, Nihar B Shah, Tim Dettmers, Yiming Yang, Ameet Talwalkar

AI总结 提出 ReproRepo 框架,利用 GitHub issues 作为监督信号,对 1149 篇论文进行可重复性评估,发现 Codex with GPT-5.5 能识别约 90% 论文的语义相关复现问题。

详情
AI中文摘要

从论文和已发布代码中复现研究结果对科学进步至关重要。现有工作引入了基准测试来评估 LLM 代理是否能协助可重复性,但由于数据整理和评估需要大量人工努力,这些基准难以扩展。我们提出了 ReproRepo,一个可扩展的可重复性评估框架,利用人类提出的 GitHub issues 作为真实复现障碍的自然监督信号。我们在来自主要会议的 1149 篇近期机器学习论文上实例化 ReproRepo,并评估了四种前沿模型代理配置。我们的结果表明,即使不执行代码,LLM 代理也能从论文-仓库对中识别出许多现实世界的可重复性问题:我们研究中的最佳代理,即带有 GPT-5.5 的 Codex,为研究中约 90% 的论文揭示了至少一个语义相关的人类报告的障碍。进一步分析表明,代理在揭示可见故障和识别正确语义区域方面特别有效,但在精确定位方面可能仍不足。ReproRepo 可作为未来在真实世界可重复性审计中评估 LLM 代理的可重用、可扩展框架。我们的代码发布在此 https URL。

英文摘要

Reproducing research results from papers and released code is central to scientific progress. Existing works have introduced benchmarks to evaluate whether LLM agents can assist with reproducibility, but they are difficult to scale due to their reliance on substantial manual effort for data curation and evaluation. We introduce ReproRepo, a scalable framework for reproducibility evaluation that leverages human-raised GitHub issues as naturally occurring supervision on realistic reproduction blockers. We instantiate ReproRepo on 1,149 recent machine learning papers from major conferences and evaluate four frontier model-agent configurations. Our results show that LLM agents, even without executing code, can identify many real-world reproducibility problems from paper-repository pairs: the best agent in our study, namely Codex with GPT-5.5, surfaces at least one semantically related human-reported blocker for ~90% of papers in the study. Further analysis shows that agents are particularly effective for surfacing visible failures and identifying the right semantic region, but may still be insufficient in exact localization. ReproRepo can serve as a reusable, scalable framework for future evaluations of LLM agents on real-world reproducibility auditing. Our code is released at this https URL.
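The headline figure (at least one semantically related human-reported blocker surfaced for ~90% of papers) is a per-paper "at least one hit" rate. A minimal sketch of how such a rate could be computed follows. It assumes that each agent finding has already been judged as related or unrelated to a human-reported issue; the function name and the input layout are hypothetical and are not taken from the ReproRepo code.

```python
from typing import Dict, List


def at_least_one_hit_rate(matches_by_paper: Dict[str, List[bool]]) -> float:
    """Fraction of papers with at least one finding judged related to a human issue.

    matches_by_paper maps a paper id to one boolean per agent finding
    (True = judged semantically related). This layout is an assumption
    made for illustration only.
    """
    if not matches_by_paper:
        return 0.0
    hits = sum(1 for flags in matches_by_paper.values() if any(flags))
    return hits / len(matches_by_paper)


example = {
    "paper_a": [False, True],   # one related finding: counts as a hit
    "paper_b": [False],         # no related finding
    "paper_c": [],              # agent reported nothing
}
rate = at_least_one_hit_rate(example)
```

A rate of this kind rewards coverage rather than precision, which is consistent with the abstract's caveat that agents may find the right semantic region while still falling short on exact localization.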

2606.18235 2026-06-17 cs.AI 新提交

EvolveNav: Proactive Preflection and Self-Evolving Memory for Zero-Shot Object Goal Navigation

EvolveNav: 用于零样本目标导航的主动预反思与自进化记忆

Qi Chai, Wenhao Shen, Nanjie Yao, Yue Xia, Kaiyong Zhao, Jie Ma, Guosheng Lin, Hao Wang

发表机构 * HKUST(GZ)(香港科技大学(广州)) Nanyang Technological University(南洋理工大学) Xi’an Jiaotong University(西安交通大学) XGRIDS(深圳格物智联)

AI总结 提出自进化零样本目标导航框架,通过从历史轨迹提取规则并基于置信上界检索,结合记忆引导预反思模块,减少无效探索,成功率提升10.1%。

详情
AI中文摘要

零样本目标导航(ZS-OGN)要求具身智能体在没有任何先验训练的情况下探索并定位目标物体。为此,近期方法利用基础模型,但它们通常依赖静态先验且缺乏适应性,导致重复错误和代价高昂的试错。本文提出一种自进化的ZS-OGN框架,实现连续的测试时改进。具体而言,我们通过从过去轨迹中提取可操作知识来构建智能体规则记忆。然后,我们提出一种基于置信上界的检索策略,通过平衡语义相关性和历史成功率来选择有效规则。此外,我们引入一个记忆引导的预反思模块,在行动前预测潜在结果,减少低效探索。大量实验表明,我们的方法优于现有的零样本基线,在减少不必要步骤的同时实现了10.1%的成功率提升。

英文摘要

Zero-Shot Object-Goal Navigation (ZS-OGN) requires embodied agents to explore and locate target objects without any prior training. To this end, recent methods leverage foundation models, but they typically rely on static priors and lack adaptation, which leads to repeated errors and costly trial and error. In this paper, we propose a self-evolving ZS-OGN framework that enables continuous test-time improvement. Specifically, we build an agentic rule memory by extracting actionable knowledge from past trajectories. Then, we propose a retrieval strategy based on the upper confidence bound, selecting effective rules by balancing semantic relevance and historical success. In addition, we introduce a memory-guided preflection module that forecasts potential outcomes before action, reducing inefficient exploration. Extensive experiments show that our method outperforms existing zero-shot baselines, achieving a 10.1% improvement in success rate with fewer unnecessary steps.
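The retrieval step is described only at a high level: rules are scored with an upper confidence bound that balances semantic relevance against historical success. The sketch below shows one way such a score could look. It is not the authors' formulation: the `Rule` fields, the mixing weight `alpha`, the exploration constant `c`, and the Laplace smoothing are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    text: str
    relevance: float      # semantic similarity to the current query, assumed in [0, 1]
    successes: int = 0    # episodes in which using this rule led to success
    uses: int = 0         # episodes in which this rule was retrieved


def ucb_score(rule: Rule, total_uses: int, c: float = 1.0, alpha: float = 0.5) -> float:
    """Hypothetical UCB-style score: exploitation term plus exploration bonus."""
    # Laplace-smoothed success rate, so unused rules start at 0.5.
    mean_success = (rule.successes + 1) / (rule.uses + 2)
    # The bonus shrinks as a rule is used more often.
    bonus = c * math.sqrt(math.log(total_uses + 1) / (rule.uses + 1))
    return alpha * rule.relevance + (1 - alpha) * mean_success + bonus


def retrieve(rules: List[Rule], k: int = 1, c: float = 1.0, alpha: float = 0.5) -> List[Rule]:
    """Return the k highest-scoring rules."""
    total_uses = sum(r.uses for r in rules)
    ranked = sorted(rules, key=lambda r: ucb_score(r, total_uses, c, alpha), reverse=True)
    return ranked[:k]


rules = [
    Rule("check the kitchen counter first", relevance=0.9, successes=9, uses=10),
    Rule("search hallways before rooms", relevance=0.2, successes=1, uses=10),
    Rule("look near sofas for remotes", relevance=0.5),  # never used so far
]
explore_first = retrieve(rules, k=1)          # bonus favours the untried rule
exploit_only = retrieve(rules, k=3, c=0.0)    # no bonus: relevance and success only
```

With `c=1.0` the untried rule ranks first because its exploration bonus is largest. Setting `c=0.0` removes the bonus, and the ranking falls back to relevance and the smoothed success rate alone.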

2606.18231 2026-06-17 cs.CV cs.LG cs.RO 新提交

Adaptive Volumetric Mechanical Property Fields Invariant to Resolution

自适应体积力学属性场:分辨率无关

Rishit Dagli, Donglai Xiang, Vismay Modi, Xuning Yang, Gavriel State, David I.W. Levin, Maria Shugrina

发表机构 * NVIDIA(英伟达)

AI总结 提出AdaVoMP方法,利用稀疏自适应体素结构和自回归Transformer编解码器,为3D物体预测高分辨率空间变化的杨氏模量、泊松比和密度,相比现有技术分辨率提升16^3倍且更准确。

Comments Project Page and hi-res paper: this https URL (https://research.nvidia.com/labs/sil/projects/adavomp/). ICML 2026

详情
AI中文摘要

精确的力学属性(或材料)杨氏模量($E$)、泊松比($\nu$)和密度($\rho$)对于数字世界的可靠物理模拟至关重要,但大多数3D资产缺乏这些信息。我们提出AdaVoMP,一种预测输入3D物体跨表示形式的精确密集空间变化($E$,$\nu$,$\rho$)的方法,在分辨率、准确性和内存效率上优于现有技术。我们技术的基础是一种稀疏自适应体素结构SAV,它能高效地表示输入3D形状和材料场输出。我们将最准确的先前方法VoMP的固定体素模型替换为一种新颖的稀疏Transformer编码器-解码器模型,该模型学习为每个输入形状自回归地生成唯一的SAV来表示其材料,实现比先前技术高$16^3$倍的分辨率。实验表明,即使测试时计算量少于所有先前技术,AdaVoMP也能估计出更准确的体积属性。这使得我们能够将高分辨率复杂3D物体转换为可模拟的资产,从而实现逼真的可变形模拟。

英文摘要

Accurate mechanical properties (or materials), namely Young's modulus ($E$), Poisson's ratio ($\nu$), and density ($\rho$), are essential for reliable physics simulation of digital worlds, but most 3D assets lack this information. We propose AdaVoMP, a method for predicting accurate, dense, spatially varying ($E$, $\nu$, $\rho$) for input 3D objects across representations, improving the resolution, accuracy, and memory efficiency over the state-of-the-art. The foundation of our technique is a sparse and adaptive voxel structure, SAV, that efficiently represents both the input 3D shape and the material field output. We replace the fixed-voxel model of the most accurate prior method, VoMP, with a novel sparse transformer encoder-decoder model that learns to generate a unique SAV autoregressively for every input shape to represent its materials, achieving a resolution $16^3\times$ higher than prior art. Experiments show that AdaVoMP estimates more accurate volumetric properties, even with less test-time compute than all prior art. This allows us to convert high-resolution complex 3D objects into simulation-ready assets, resulting in realistic deformable simulations.