2507.04704 2026-06-17 q-bio.QM cs.AI cs.CV Updated version

SPATIA: Multimodal Generation and Prediction of Spatial Cell Phenotypes

Zhenglun Kong, Mufan Qiu, John Boesen, Xiang Lin, Sukwon Yun, Tianlong Chen, Manolis Kellis, Marinka Zitnik

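AI summary: Proposes SPATIA, a model that fuses cell morphology, gene expression, and spatial context, and uses confidence-aware flow matching and morphology-profile alignment for multi-scale generation and prediction; it outperforms 18 baseline models across 12 tasks.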

Comments ICML 2026

Abstract

Understanding how cellular morphology, gene expression, and spatial context jointly shape tissue function is a central challenge in biology. Image-based spatial transcriptomics technologies now provide high-resolution measurements of cell images and gene expression profiles, but existing methods typically analyze these modalities in isolation or at limited resolution. We address the problem by introducing SPATIA, a multi-level generative and predictive model that learns unified, spatially aware representations by fusing morphology, gene expression, and spatial context from the cell to the tissue level. SPATIA also incorporates a spatially conditioned generative framework with confidence-aware OT reweighting and morphology-profile alignment for modeling target-state morphology distributions. Specifically, we propose a confidence-aware flow matching objective that reweights weak optimal-transport pairs based on uncertainty. We further apply morphology-profile alignment to encourage biologically meaningful image generation, enabling the modeling of microenvironment-dependent phenotypic transitions. We assembled a multi-scale dataset consisting of 25.9 million cell-gene pairs across 17 tissues. We benchmark SPATIA against 18 models across 12 tasks, spanning categories such as phenotype generation, annotation, clustering, gene imputation, and cross-modal prediction. SPATIA achieves improved performance over state-of-the-art models, improving generative fidelity by 8% and predictive accuracy by up to 3%.

2602.05790 2026-06-17 cs.IT cs.LG math.IT stat.ML Updated version

Price of metric universality in vector quantization is at most 0.11 bit

Alina Harbuzova, Or Ordentlich, Yury Polyanskiy

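AI summary: Proves that there exists a universal codebook that, for all possible statistics of $X$ and Gaussian $W$, performs at least as well as an $X$-adapted waterfilling codebook whose rate is reduced by 0.11 bit per dimension.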

Comments 41 pages, 1 figure

Abstract

Fast computation of a matrix product $W^\top X$ is a workhorse of modern LLMs. To make their deployment more efficient, a popular approach is to use a low-precision approximation $\widehat W$ in place of the true $W$ ("weight-only quantization"). Information theory demonstrates that an optimal algorithm for reducing the precision of $W$ depends on the (second-order) statistics of $X$ and requires a careful alignment of the vector quantization codebook with the PCA directions of $X$ (a process known as "waterfilling allocation"). Dependence of the codebook on the statistics of $X$, however, is highly impractical. This paper proves that there exists a universal codebook that is simultaneously near-optimal for all possible statistics of $X$, in the sense of being at least as good as an $X$-adapted waterfilling codebook with rate reduced by 0.11 bit per dimension in the case when $W$ is Gaussian. Such a universal codebook would be an ideal candidate for a low-precision storage format, a topic of active modern research, but alas the existence proof is non-constructive. Equivalently, our result shows the existence of a net in $\mathbb{R}^n$ that is a nearly optimal covering of a sphere simultaneously with respect to all Hilbert norms.
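
The "waterfilling allocation" named in this abstract can be illustrated with the classical reverse water-filling rule for independent Gaussian components under squared-error distortion. This is a textbook sketch only, not the paper's weight-only-quantization formulation: the function name `reverse_waterfilling`, the bisection search, and the example variances are assumptions made for illustration.

```python
import math

def reverse_waterfilling(variances, total_distortion, iters=200):
    """Classical reverse water-filling for independent Gaussian components.

    Finds the water level theta with sum(min(theta, v)) == total_distortion,
    then returns (theta, per-component distortions, total rate in bits).
    Hypothetical helper for illustration; not taken from the paper.
    """
    if not 0 < total_distortion <= sum(variances):
        raise ValueError("total_distortion must lie in (0, sum(variances)]")
    lo, hi = 0.0, max(variances)
    for _ in range(iters):  # bisection on the water level
        theta = 0.5 * (lo + hi)
        if sum(min(theta, v) for v in variances) < total_distortion:
            lo = theta
        else:
            hi = theta
    theta = 0.5 * (lo + hi)
    distortions = [min(theta, v) for v in variances]
    # Components whose variance is at or below the water level get zero rate.
    rate = sum(0.5 * math.log2(v / d)
               for v, d in zip(variances, distortions) if v > d)
    return theta, distortions, rate
```

For variances `[4.0, 1.0, 0.25]` and a distortion budget of 1.5, the water level settles at 0.625: the two strongest directions share the distortion equally and the weakest direction is not coded at all. The allocation depends on the variances, which is the statistics dependence the abstract calls impractical.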

2602.04901 2026-06-17 q-bio.GN cs.LG Updated version

Beyond Independent Genes: Learning Module-Inductive Representations for Single-Cell Gene Perturbation Prediction

Jiafa Ruan, Ruijie Quan, Liyang Xu, Zongxin Yang, Yi Yang

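AI summary: Proposes scBIG, a framework that learns coordinated gene-program (module) representations through gene-relation clustering, a gene-cluster-aware encoder, and structure-aware alignment, combined with conditional flow matching for flexible, generalizable perturbation prediction; it reports an average improvement of 6.7% across multiple single-cell perturbation benchmarks.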

Questioning the Coverage-Length Metric in Conformal Prediction: When Shorter Intervals Are Not Better

Yizhou Min, Yizhou Lu, Lanqi Li, Zhen Zhang, Jiaye Teng

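AI summary: Critically examines whether the standard conformal prediction metrics (coverage and interval length) are sufficient, shows that a counter-intuitive method called the Prejudicial Trick (PT) can deceptively shorten interval length while keeping coverage valid, and proposes a new metric, interval stability, to detect such behavior.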

Abstract

Conformal prediction (CP) has become a cornerstone of distribution-free uncertainty quantification, conventionally evaluated by its coverage and interval length. This work critically examines the sufficiency of these standard metrics. We demonstrate that the interval length might be deceptively improved through a counter-intuitive approach termed the Prejudicial Trick (PT), while the coverage remains valid. Specifically, for any given test sample, PT probabilistically returns an interval that is either null or constructed using an adjusted confidence level, thereby preserving marginal coverage. While PT potentially yields a deceptively lower interval length, it introduces practical vulnerabilities: the same input can yield completely different prediction intervals across repeated runs of the algorithm. We formally derive the conditions under which PT achieves these misleading improvements and provide extensive empirical evidence across various regression and classification tasks. Furthermore, we introduce a new metric, interval stability, which helps detect whether a new CP method implicitly improves the length based on such PT-like techniques. Code is available at https://github.com/benben-cd/PT-Conformal-Prediction.
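
One possible reading of the PT mechanism described in this abstract, sketched with split conformal prediction on absolute residuals. The function names, the choice of split conformal prediction, and the adjustment rule `(1 - p_null) * (1 - alpha') = 1 - alpha` are assumptions made for illustration; they are not taken from the authors' repository.

```python
import math
import random

def adjusted_alpha(alpha, p_null):
    """Miscoverage level alpha' for non-null draws, chosen so that
    (1 - p_null) * (1 - alpha') == 1 - alpha. Requires p_null <= alpha."""
    if not 0 <= p_null <= alpha < 1:
        raise ValueError("need 0 <= p_null <= alpha < 1")
    return 1.0 - (1.0 - alpha) / (1.0 - p_null)

def split_conformal_halfwidth(cal_residuals, alpha):
    """Standard split-conformal quantile of absolute calibration residuals."""
    n = len(cal_residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(cal_residuals)[k - 1]

def pt_interval(prediction, cal_residuals, alpha, p_null, rng=random):
    """Randomized interval: None (empty) with probability p_null, otherwise
    a split-conformal interval built at the adjusted level."""
    if rng.random() < p_null:
        return None
    q = split_conformal_halfwidth(cal_residuals, adjusted_alpha(alpha, p_null))
    return (prediction - q, prediction + q)
```

Under this reading, if the non-null interval covers with probability `1 - alpha'`, the overall coverage is `(1 - p_null) * (1 - alpha') = 1 - alpha`, while empty draws contribute zero length to the average. Calling `pt_interval` twice on the same input can return an empty set once and a wide interval the next time, which is the instability the abstract warns about.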

2601.06862 2026-06-17 cs.CR cs.CV cs.LG cs.MM eess.IV Updated version

Robust Local Polynomial Regression with Similarity Kernels

Yaniv Shulman

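AI summary: To address the sensitivity of traditional local polynomial regression to outliers, proposes a conditional density kernel weighting scheme that incorporates response-variable information and reduces the influence of outliers through localized density estimation, lowering empirical bias while remaining competitive with standard LOWESS.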

Abstract

Local Polynomial Regression (LPR) is a widely used nonparametric method for modeling complex relationships due to its flexibility and simplicity. It estimates a regression function by fitting low-degree polynomials to localized subsets of the data, weighted by proximity. However, traditional LPR is sensitive to outliers and high-leverage points, which can significantly affect estimation accuracy. This paper revisits the kernel function used to compute regression weights and proposes a novel framework that incorporates both predictor and response variables in the weighting mechanism. The focus of this work is a conditional density kernel that robustly estimates weights by mitigating the influence of outliers through localized density estimation. The proposed method is implemented in Python and is publicly available at https://github.com/yaniv-shulman/rsklpr. The population analysis quantifies the bias induced by density-based robust weighting, and the reported experiments show lower empirical bias than iterative robust LOWESS while remaining competitive with standard LOWESS. This advancement provides a promising extension to traditional LPR, opening new possibilities for robust regression applications.
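
A minimal sketch of the baseline this abstract starts from: a degree-1 local polynomial fit weighted by proximity only. It is not the proposed conditional density kernel and does not use the `rsklpr` package; the tricube kernel, the function names, and the fixed bandwidth are assumptions made for illustration.

```python
def tricube(u):
    """Tricube kernel on [-1, 1], zero outside."""
    u = abs(u)
    return (1 - u**3) ** 3 if u < 1 else 0.0

def local_linear_fit(xs, ys, x0, bandwidth):
    """Estimate the regression function at x0 by weighted least squares on a
    line, with proximity weights w_i = tricube((x_i - x0) / bandwidth)."""
    w = [tricube((x - x0) / bandwidth) for x in xs]
    sw = sum(w)
    if sw == 0:
        raise ValueError("no points within the bandwidth of x0")
    # Weighted means and centred second moments.
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)
```

Because the weights depend only on the predictor, a single corrupted response close to `x0` receives nearly full weight and shifts the estimate. That is the sensitivity to outliers that motivates weighting on the response as well.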

2507.05164 2026-06-17 math.DS cs.LG nlin.AO Updated version

A Dynamical Systems Perspective on the Analysis of Neural Networks

Dennis Chemnitz, Maximilian Engel, Christian Kuehn, Sara-Viola Kuntz

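AI summary: Uses dynamical systems to reformulate challenges in deep neural networks and gradient descent, studying information propagation, training dynamics, and mean-field limits, and covering properties such as network embedding, stability, and graph limits.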

Comments preprint of a book chapter contribution

Abstract

In this chapter, we utilize dynamical systems to analyze several aspects of machine learning algorithms. As an expository contribution we demonstrate how to re-formulate a wide variety of challenges from deep neural networks, (stochastic) gradient descent, and related topics into dynamical statements. We also tackle three concrete challenges. First, we consider the process of information propagation through a neural network, i.e., we study the input-output map for different architectures. We explain the universal embedding property for augmented neural ODEs representing arbitrary functions of given regularity, the classification of multilayer perceptrons and neural ODEs in terms of suitable function classes, and the memory-dependence in neural delay equations. Second, we consider the training aspect of neural networks dynamically. We describe a dynamical systems perspective on gradient descent and study stability for overdetermined problems. We then extend this analysis to the overparameterized setting and describe the edge of stability phenomenon, also in the context of possible explanations for implicit bias. For stochastic gradient descent, we present stability results for the overparameterized setting via Lyapunov exponents of interpolation solutions. Third, we explain several results regarding mean-field limits of neural networks. We describe a result that extends existing techniques to heterogeneous neural networks involving graph limits via digraph measures. This shows how large classes of neural networks naturally fall within the framework of Kuramoto-type models on graphs and their large-graph limits. Finally, we point out that similar strategies to use dynamics to study explainable and reliable AI can also be applied to settings such as generative models or fundamental issues in gradient training methods, such as backpropagation or vanishing/exploding gradients.
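
The dynamical view of gradient descent mentioned in this abstract can be seen in the simplest textbook case, a diagonal quadratic loss, where each step is a linear map and the minimiser is stable exactly when the learning rate is below 2 divided by the largest curvature. The sketch below is a toy illustration under that assumption; the function names are hypothetical and nothing here is taken from the chapter itself.

```python
def gd_iterates(curvatures, x0, lr, steps):
    """Gradient descent on f(x) = 0.5 * sum(c_i * x_i**2).
    Each coordinate follows the linear map x_i <- (1 - lr * c_i) * x_i."""
    x = list(x0)
    for _ in range(steps):
        x = [(1 - lr * c) * xi for c, xi in zip(curvatures, x)]
    return x

def is_stable(curvatures, lr):
    """For positive curvatures, the minimiser x = 0 is asymptotically stable
    iff |1 - lr * c| < 1 for every c, i.e. iff lr < 2 / max(curvatures)."""
    return all(abs(1 - lr * c) < 1 for c in curvatures)
```

With curvatures `[1.0, 4.0]` the threshold is `2 / 4.0 = 0.5`: a learning rate of 0.4 contracts both coordinates, while 0.6 makes the high-curvature coordinate oscillate with growing amplitude.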

2410.08562 2026-06-17 cond-mat.mtrl-sci cs.LG Updated version

Adaptable Method for Crystal Design across Diverse Constraints and Objectives with Pretrained Property Predictors

Akihiro Fujii, Yoshitaka Ushiku, Koji Shimizu, Anh Khoa Augustin Lu, Satoshi Watanabe

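AI summary: Proposes direct predictor-guided gradient optimization that combines off-the-shelf predictors, site-wise element masks, template initialization, and task-specific losses for data-efficient, constraint-rich crystal design; it outperforms generative and Bayesian baselines on perovskites and also supports half-metal design.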

Abstract

Advanced crystal design can accelerate materials discovery across applications from photovoltaics to spintronics. Practical design must satisfy multiple properties and physical constraints, yet existing machine-learning-based approaches to such design often depend on large datasets, retraining, or task-specific generators. Here, we show that direct predictor-guided gradient optimization enables data-efficient, constraint-rich crystal design by combining off-the-shelf predictors with site-wise element masks, template initialization, and task-specific losses. In perovskites, it outperformed generative and Bayesian baselines under three targets -- band gap, formation energy, and tolerance factor -- and two hard constraints. DFT assessment further showed band-gap targeting competitive with a leading generative model despite using predictors trained on roughly one-tenth of the data. By flexibly combining pretrained predictors with application-oriented masks and custom losses, the same framework supported half-metal design. Such modularity could help researchers and engineers translate diverse application requirements directly into optimized candidate crystals with minimal computational cost.

2405.15379 2026-06-17 stat.ML cs.LG math.PR math.ST stat.TH Updated version

Automated Evaluation of Standardized Dementia Screening Tests

Franziska Braun, Markus Förstel, Bastian Oppermann, Andreas Erzigkeit, Thomas Hillemacher, Hartmut Lehfeld, Korbinian Riedhammer

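AI summary: Studies automated scoring of standardized dementia screening tests by analyzing score correlations for manual and automatic transcripts; automatic scoring is stricter than human scoring on some tasks, but overall correlation remains high.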

Comments Submitted to Interspeech 2022. arXiv admin note: text overlap with arXiv:2206.05018

Journal ref Proceedings of Interspeech 2022

DOI: 10.21437/Interspeech.2022-10436

Abstract

For dementia screening and monitoring, standardized tests play a key role in clinical routine since they aim at minimizing subjectivity by measuring performance on a variety of cognitive tasks. In this paper, we report on a study that consists of a semi-standardized history taking followed by two standardized neuropsychological tests, namely the SKT and the CERAD-NB. The tests include basic tasks such as naming objects, learning word lists, but also widely used tools such as the MMSE. Most of the tasks are performed verbally and should thus be suitable for automated scoring based on transcripts. For the first batch of 30 patients, we analyze the correlation between expert manual evaluations and automatic evaluations based on manual and automatic transcriptions. For both SKT and CERAD-NB, we observe high to perfect correlations using manual transcripts; for certain tasks with lower correlation, the automatic scoring is stricter than the human reference since it is limited to the audio. Using automatic transcriptions, correlations drop as expected and are related to recognition accuracy; however, we still observe high correlations of up to 0.98 (SKT) and 0.85 (CERAD-NB). We show that using word alternatives helps to mitigate recognition errors and subsequently improves correlation with expert scores.

2606.17977 2026-06-17 econ.EM New submission

Beyond Parallel Trends in Staggered Difference-in-Differences: Identification under Higher-Order Parallelism

Zecharias Anteneh

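AI summary: Proposes a hierarchy of higher-order parallelism assumptions to replace the conventional parallel trends assumption, achieving point identification of cohort-specific and aggregate treatment effects in staggered difference-in-differences designs, and proves an aggregation theorem.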

Comments 38 pages, 4 figures. Companion Stata command (anddp) implementing the estimator will be available soon at https://github.com/zanteneh/anddp

Abstract

In difference-in-differences designs, the parallel trends assumption requires that the outcome gap between treated and control units would have remained flat absent treatment. Pre-treatment event studies frequently reject this flat-gap requirement. Existing responses include parametric trend controls and bounds on the treatment effect under assumptions about the magnitude of the violation. This paper shows that point identification of cohort-specific and aggregate treatment effects in staggered designs remains achievable under strictly weaker assumptions. I replace the flat-gap requirement with a hierarchy of higher-order conditions, Parallel[p], embed this framework in the group-time average treatment effect structure of Callaway and Sant'Anna (2021), and prove an aggregation theorem for the case where different cohorts are identified under different feasible polynomial orders, a challenge unique to staggered designs that has not been previously addressed. A sequential order-selection procedure guides applied practice. Monte Carlo evidence confirms that post-selection bootstrap coverage remains near-nominal and that inference is robust to realistic serial correlation. Applied to Medicaid expansion data, the method yields point estimates resting on an assumption the pre-treatment data do not reject, in contrast to the flat-gap requirement which those same data decisively reject.

2606.18134 2026-06-17 eess.AS New submission

Grounding Spoken LLMs in Multi-Speaker Audio via Diarization Conditioning

Alexander Polok, Samuele Cornell, Sathvik Udupa, Jan Černocký, Shinji Watanabe, Lukáš Burget

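AI summary: Proposes diarization-conditioned spoken language models that condition the acoustic encoder to extract target-speaker representations, avoiding the catastrophic forgetting risked by Serialized Output Training, and substantially improve speaker-attributed transcription on several datasets.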

Comments Accepted to Interspeech 2026

Abstract

We propose diarization-conditioned spoken language models (SLMs), a strategy for extending SLMs to far-field multi-talker audio. Rather than adapting the decoder via Serialized Output Training, which risks catastrophic forgetting, we condition the acoustic encoder on diarization masks to extract target-speaker representations, keeping the decoder frozen. We instantiate this as Dixtral, integrating a Diarization Conditioned Whisper (DiCoW) encoder into the Voxtral SLM. On AMI, NOTSOFAR-1, LibriSpeechMix, and Mixer6, Dixtral outperforms Gemini 3.0 Flash, VibeVoice, and Voxtral Mini Transcribe V2 on speaker-attributed transcription by 29.0%, 19.8%, and 16.0% absolute cpWER respectively. On a novel long-form multi-speaker QA benchmark, zero-shot Dixtral matches Gemini on far-field content understanding, and when fine-tuned surpasses both Gemini and Voxtral operating on close-talk across all tasks.

2606.18072 2026-06-17 eess.AS New submission

One-Step Token-to-Waveform Generation with MeanFlow in Latent Space

Zheqi Dai, Guangyan Zhang, Zhen Ye, Jingyu Li, Haolin He, Chunyat Wu, Yiwen Guo, Qiuqiang Kong

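AI summary: Applies MeanFlow in a highly compressed latent space for one-step Token2Wav generation, addressing the speed-quality trade-off of multi-step flow-matching decoders, with up to a 17x RTF improvement and negligible quality loss.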

Comments 5 pages, 1 figure

Abstract

Neural audio codecs are central to modern LLM-based Text-to-Speech (TTS) and multimodal systems. As low-bitrate semantic codecs gain prominence, the Token-to-Waveform (Token2Wav) decoder becomes a bottleneck determining both perceptual quality and system efficiency. Conventional multi-step flow-matching decoders offer superior quality but suffer from high inference latency due to iterative sampling, creating a severe quality-speed trade-off. In this paper, we propose a novel Token2Wav architecture that overcomes this limitation by applying MeanFlow in a highly compressed latent space. By modeling the average velocity rather than the instantaneous velocity field, MeanFlow enables true one-step generation. Operating in the latent domain mitigates the memory and stability issues of waveform-level flows, yielding up to a 17$\times$ improvement in Real-Time Factor (RTF) compared to multi-step baselines with negligible quality degradation. Furthermore, we introduce refinement strategies that mitigate latent mismatch, including decoder-only fine-tuning with the MeanFlow generator frozen and end-to-end joint fine-tuning, improving fidelity without increasing inference-time cost. Code and demo are publicly available.
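
Why modeling the average velocity permits one-step generation can be seen from a short worked identity. The notation below ($z_\tau$ for the latent state, $v$ for the instantaneous velocity, $u$ for the average velocity, and the convention that $\tau = 1$ is the prior and $\tau = 0$ the data) is an assumption made for illustration, since the abstract does not fix one.

```latex
% Flow ODE with instantaneous velocity v:
\frac{dz_\tau}{d\tau} = v(z_\tau, \tau)

% Average velocity over the interval [r, t]:
u(z_t, r, t) = \frac{1}{t - r} \int_r^t v(z_\tau, \tau)\, d\tau

% Integrating the ODE from r to t gives the displacement identity:
z_r = z_t - (t - r)\, u(z_t, r, t)

% One-step generation (r = 0, t = 1, with z_1 drawn from the prior):
z_0 = z_1 - u(z_1, 0, 1)
```

A multi-step decoder approximates the integral numerically with many evaluations of $v$, whereas a network trained to output $u$ directly needs a single evaluation, which is consistent with the latency gain reported in the abstract.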

2606.18054 2026-06-17 eess.AS New submission

AI-based Cognitive-linguistic Features for Dementia Assessment in Picture Description

Lingfeng Xu, Prad Kadambi, Samuel Goldinger, Visar Berisha, Kimberly D. Mueller, Julie Liss

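AI summary: Introduces seven clinical constructs for the Cookie Theft picture description task and uses large language models to generate severity scores and explanations; Claude 3.5 Sonnet reaches 85% accuracy on the ADReSS dataset with an expert agreement rating of 3.99/5, showing the potential of LLMs for interpretable cognitive screening.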

Comments 10 pages, 2 figures

Abstract

Picture descriptions provide valuable insights into several clinical constructs related to cognitive-linguistic abilities. However, operationalizing these constructs into quantitative measures remains challenging, limiting interpretability and clinical utility. We introduced seven constructs tailored to the Cookie Theft picture description task and prompted large language models (LLMs) to evaluate them, generating severity scores and example-based explanations. Among the examined LLMs, Claude 3.5 Sonnet performed the best, producing severity scores that significantly distinguish cognitively impaired individuals from healthy controls. The model achieves a high accuracy of 85% on the ADReSS dataset. Expert evaluation of Claude's scores and explanations yields a 3.99/5 average agreement. The findings demonstrate the potential of LLMs to operationalize clinical constructs and generate interpretable evaluations, offering a promising approach for accessible cognitive screening tools.

2606.17942 2026-06-17 eess.SP New submission

On the Optimum Energy-per-bit Launch Power in Coherent Hollow-core Fibre Transmission Systems

Ronit Sohanpal, Eric Sillekens, Mindaugas Jarmolovicius, Robert I. Killey, Polina Bayvel

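AI summary: Studies the launch power that minimizes energy per bit in hollow-core fibre transmission systems, finding that for a 1000 km C-band link, operating at the minimum energy-per-bit launch power reduces total power consumption by 41.5% at a throughput loss of only 2.2%.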

Comments European Conference on Optical Communications (ECOC) 2026

2606.17903 2026-06-17 eess.SP New submission

Constellation Design for Nonlinear Unified SWIPT Receiver Channels with Memory

Triantafyllos Mavrovoltsos, Elio Faddoul, Zulqarnain Bin Ashraf, Constantinos Psomas, Besma Smida, Ioannis Krikidis

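AI summary: For nonlinear unified SWIPT receiver channels, proposes constellation design methods that account for memory effects, using a state-adaptive policy and an autoencoder framework to optimize the trade-off between symbol error rate and energy harvesting.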

Comments Submitted to IEEE Transactions on Communications

Abstract

Unified receivers (URs) have emerged as a promising architecture for simultaneous wireless information and power transfer (SWIPT), since a common rectifying front-end enables information decoding (ID) and energy harvesting (EH) from the same rectified output. However, rectification is nonlinear due to the diode, while the capacitor introduces memory across symbols, making constellation design over the channel challenging. In this paper, we study constellation design for nonlinear UR-SWIPT channels in both memoryless and memory regimes. First, we propose a tractable unified rectification model that captures both (i) the nonlinear steady-state mapping and (ii) the asymmetric capacitor charging/discharging dynamics under transient operation. To isolate the impact of rectification with memory on ID, we study the information-based design. In this setting, we develop a state-adaptive policy with an algorithmic constellation design that accounts for the rectifier state and shapes the constellation in the observation domain. By approximating the rectifier state distribution, we derive a closed-form average symbol error rate (SER) expression and characterize the rate-reliability (R-R) tradeoff. We then seek constellations that minimize the SER under average transmit power and EH constraints. We address the resulting energy-constrained setting in the memoryless regime using an autoencoder-based framework that embeds the nonlinear rectification model as a differentiable channel block. Numerical results validate the proposed models, demonstrate the impact of memory on the R-R tradeoff, and show how learned constellations adapt to EH requirements in the rate-energy tradeoff.

Stable and Steerable Sparse Autoencoders with Weight Regularization

Instrumental and Proximal Causal Inference with Gaussian Processes

Learning from Biased and Costly Data Sources: Minimax-optimal Data Collection under a Budget

SPATIA: Multimodal Generation and Prediction of Spatial Cell Phenotypes

Price of metric universality in vector quantization is at most 0.11 bit

Beyond Independent Genes: Learning Module-Inductive Representations for Single-Cell Gene Perturbation Prediction

Maximin Relative Improvement: Fair Learning as a Bargaining Problem

Learning-Infused Formal Reasoning: From Contract Synthesis to Artifact Reuse and Formal Semantics

Tacit Coordination of Large Language Models

Questioning the Coverage-Length Metric in Conformal Prediction: When Shorter Intervals Are Not Better

Learning QoE from Packet-Level Measurements in Encrypted Video Conferencing Traffic

Vulcan: Instance-specialized, Verifiable Systems Heuristics Through LLM-driven Search

Enhanced Evolutionary Multi-Objective Deep Reinforcement Learning for Reliable and Efficient Wireless Rechargeable Sensor Networks

Learning Upper Lower Value Envelopes to Shape Online RL: A Principled Approach

BadScientist: Can a Research Agent Write Convincing but Unsound Papers that Fool LLM Reviewers?

Recursive Learning Without Collapse: A Weighting-Based Stabilization Framework

Breaking the Code: Security Assessment of AI Code Agents Through Systematic Jailbreaking Attacks

Robust Local Polynomial Regression with Similarity Kernels

A Dynamical Systems Perspective on the Analysis of Neural Networks

Adaptable Method for Crystal Design across Diverse Constraints and Objectives with Pretrained Property Predictors

Randomized Midpoint Method for Log-Concave Sampling under Constraints

From Theory to Application: A Practical Introduction to Neural Operators in Scientific Computing

Moderating Illicit Online Image Promotion for Unsafe User-Generated Content Games Using Large Vision-Language Models

Automated Evaluation of Standardized Dementia Screening Tests

Beyond Parallel Trends in Staggered Difference-in-Differences: Identification under Higher-Order Parallelism

Grounding Spoken LLMs in Multi-Speaker Audio via Diarization Conditioning

One-Step Token-to-Waveform Generation with MeanFlow in Latent Space

AI-based Cognitive-linguistic Features for Dementia Assessment in Picture Description

On the Optimum Energy-per-bit Launch Power in Coherent Hollow-core Fibre Transmission Systems

Constellation Design for Nonlinear Unified SWIPT Receiver Channels with Memory