arXivDaily: arXiv daily academic digest, updated Monday to Friday
2505.07683 2026-05-11 cs.LG cs.AI

Multimodal Cancer Modeling in the Age of Foundation Model Embeddings

Steven Song, Morgan Borjigin-Wang, Irene Madejski, Robert L. Grossman

Affiliations Center for Translational Data Science; Department of Computer Science; University of Chicago; Medical Scientist Training Program; Brown University; Section of Biomedical Data Science; Department of Medicine

AI summary This paper explores multimodal cancer modeling with foundation model embeddings, demonstrates the advantage of multimodal fusion, and evaluates how pathology report text and model-based text summarization affect model performance.

Comments camera ready version for ML4H 2025, typo corrected

Journal ref Proceedings of the Fifth Machine Learning for Health Symposium, PMLR 297:202-227, 2026

English abstract

The Cancer Genome Atlas (TCGA) has enabled novel discoveries and served as a large-scale reference dataset in cancer through its harmonized genomics, clinical, and imaging data. Numerous prior studies have developed bespoke deep learning models over TCGA for tasks such as cancer survival prediction. A modern paradigm in biomedical deep learning is the development of foundation models (FMs) to derive feature embeddings agnostic to a specific modeling task. Biomedical text especially has seen growing development of FMs. While TCGA contains free-text data as pathology reports, these have been historically underutilized. Here, we investigate the ability to train classical machine learning models over multimodal, zero-shot FM embeddings of cancer data. We demonstrate the ease and additive effect of multimodal fusion, outperforming unimodal models. Further, we show the benefit of including pathology report text and rigorously evaluate the effect of model-based text summarization and hallucination. Overall, we propose an embedding-centric approach to multimodal cancer modeling.
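
As an illustration of the embedding-centric approach described above, the following is a minimal sketch of early fusion by concatenation followed by a classical classifier. The abstract does not specify the fusion operator or the downstream model, so concatenation, the nearest-centroid classifier, the modality names (`expression`, `report_text`), the class labels, and the toy vectors are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

Embedding = Sequence[float]


def fuse(sample: Dict[str, Embedding], modalities: List[str]) -> List[float]:
    """Concatenate one sample's per-modality embeddings in a fixed order."""
    fused: List[float] = []
    for name in modalities:
        fused.extend(sample[name])
    return fused


def fit_centroids(X: List[List[float]], y: List[str]) -> Dict[str, List[float]]:
    """Nearest-centroid 'training': average the fused vectors of each class."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for i, value in enumerate(x):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids: Dict[str, List[float]], x: Embedding) -> str:
    """Return the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c: Embedding) -> float:
        return sum((ci - xi) ** 2 for ci, xi in zip(c, x))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical precomputed embeddings; real ones would come from foundation models.
MODALITIES = ["expression", "report_text"]
train = [
    ({"expression": [0.0, 0.1], "report_text": [1.0]}, "low_risk"),
    ({"expression": [0.1, 0.0], "report_text": [0.9]}, "low_risk"),
    ({"expression": [1.0, 0.9], "report_text": [0.0]}, "high_risk"),
    ({"expression": [0.9, 1.0], "report_text": [0.1]}, "high_risk"),
]
X = [fuse(sample, MODALITIES) for sample, _ in train]
y = [label for _, label in train]
centroids = fit_centroids(X, y)
query = fuse({"expression": [0.2, 0.1], "report_text": [0.8]}, MODALITIES)
prediction = predict(centroids, query)
```

Because the embeddings are computed once and frozen, adding a modality in this sketch only means appending another name to `MODALITIES`; no encoder is retrained.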

2504.11101 2026-05-11 cs.CV cs.AI cs.MM

Consensus Entropy: Harnessing Multi-VLM Agreement for Self-Verifying and Self-Improving OCR

Yulong Zhang, Tianyi Liang, Xinyue Huang, Erfei Cui, Guoqing Wang, Xu Guo, Chenhui Li, Gongshen Liu

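Affiliations Fudan University; Shanghai Innovation Institute; Shanghai Jiao Tong University; East China Normal University; Sun Yat-sen University

AI summary This paper proposes Consensus Entropy (CE), a training-free metric that estimates output reliability by measuring the entropy of multi-model agreement, and develops the CE-OCR framework, which uses multi-model consensus to verify outputs, select the best output, and improve efficiency. Experiments show that CE is robust for quality verification, improving F1 by 42.1% over VLM-as-Judge, and that CE-OCR outperforms self-consistency and single-model baselines.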

English abstract

Optical Character Recognition (OCR) is fundamental to Vision-Language Models (VLMs) and high-quality data generation for LLM training. Yet, despite progress in average OCR accuracy, state-of-the-art VLMs still struggle with detecting sample-level errors and lack effective unsupervised quality control. We introduce Consensus Entropy (CE), a training-free, model-agnostic metric that estimates output reliability by measuring inter-model agreement entropy. The core insight is that correct predictions converge in output space, while errors diverge. Based on CE, we develop CE-OCR, a lightweight multi-model framework that verifies outputs by ensemble agreement, selects the best outputs, and further improves efficiency through adaptive routing. Experiments demonstrate that CE is robust for quality verification, improving F1 scores by 42.1% over VLM-as-Judge. CE-OCR achieves consistent OCR gains, outperforming self-consistency and single-model baselines at the same cost. Notably, CE requires no training or supervision, enabling plug-and-play integration. Code: https://github.com/Aslan-yulong/consensus-entropy.
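
The core insight above (correct predictions converge, errors diverge) can be sketched with a simple disagreement score. This is not the paper's Consensus Entropy formula, which the abstract does not state; the sketch assumes a stand-in proxy, the mean pairwise string dissimilarity from Python's `difflib`, and the function names and sample strings are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations
from typing import List


def pairwise_distance(a: str, b: str) -> float:
    """Dissimilarity in [0, 1]: 0 for identical strings, 1 for no overlap."""
    return 1.0 - SequenceMatcher(None, a, b).ratio()


def disagreement(outputs: List[str]) -> float:
    """Mean pairwise dissimilarity across model outputs (needs >= 2 outputs)."""
    pairs = list(combinations(outputs, 2))
    return sum(pairwise_distance(a, b) for a, b in pairs) / len(pairs)


def most_central(outputs: List[str]) -> str:
    """Select the output closest, on average, to all the others."""
    def mean_dist(i: int) -> float:
        others = [o for j, o in enumerate(outputs) if j != i]
        return sum(pairwise_distance(outputs[i], o) for o in others) / len(others)
    return outputs[min(range(len(outputs)), key=mean_dist)]


# Hypothetical OCR outputs from three models for the same image crop.
agreeing = ["invoice 42", "invoice 42", "invoice 42"]
mixed = ["invoice 42", "invoice 42", "lnvoice 4Z"]
low_score = disagreement(agreeing)   # models converge
high_score = disagreement(mixed)     # one model diverges
best = most_central(mixed)
```

A sample whose score exceeds a chosen threshold could be flagged for review or routed to a stronger model, which is one way to read the adaptive routing mentioned in the abstract; the threshold itself would need tuning on held-out data.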

2503.06223 2026-05-11 cs.CV

RedDiffuser: Auditing Multimodal Safety Failures in Vision-Language Models via Reinforced Diffusion

Ruofan Wang, Xingjun Ma

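Affiliations Fudan University

AI summary This work studies safety auditing of multimodal systems under harmful contextual exposure and proposes RedDiffuser, a framework that uses diffusion models to generate visual inputs that reveal hidden safety vulnerabilities. Experiments show that safety failures are widespread when VLMs receive partial toxic text paired with visual context.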

English abstract

Large Vision-Language Models (VLMs) are increasingly deployed in open-ended environments, where ensuring reliable safety under multimodal inputs is critical. However, existing evaluations remain largely instruction-centric, focusing on explicit malicious queries while overlooking a more realistic and underexplored risk: whether safety alignment remains robust under harmful contextual exposure. This limitation is particularly important for multimodal systems, where visual inputs can substantially steer model behavior and render text-only auditing insufficient. In this work, we study multimodal safety auditing under harmful contextual exposure, asking whether VLMs can maintain safe behavior when partial toxic text is paired with visual context. To enable systematic auditing, we propose RedDiffuser (RedDiff), a reinforcement-based framework that leverages diffusion models to generate semantically coherent visual inputs for black-box safety testing. By combining greedy prompt search with reinforcement optimization, RedDiffuser uncovers high-risk multimodal inputs that expose latent safety failures. Extensive experiments on both open-source and commercial VLMs show that such context-conditioned failures are widespread. On LLaVA, RedDiffuser increases unsafe response rates by up to 10.69% on the original set and 8.91% on a hold-out set, with strong transferability to Gemini and LLaMA-Vision. These vulnerabilities persist even under external safety guardrails, suggesting that current system-level safety mechanisms remain insufficient for realistic multimodal risks. Our findings reveal a critical blind spot in existing safety evaluations and establish context-aware multimodal auditing as an essential paradigm for diagnosing hidden vulnerabilities in modern VLM systems.

2503.05085 2026-05-11 cs.CL cs.SD eess.AS

S2S-Arena: Evaluating Paralinguistic Instruction Following in Speech-to-Speech Models

Feng Jiang, Zhiyu Lin, Yiyang Liu, Liumeng Xue, Fan Bu, Yuhao Du, Xiangying Chen, Benyou Wang, Haizhou Li

Affiliations Artificial Intelligence Research Institute, Shenzhen University of Advanced Technology; The Chinese University of Hong Kong, Shenzhen; Nanjing University; Shenzhen Loop Area Institute; CentraleSupélec, Université Paris-Saclay; National University of Singapore

AI summary S2S-Arena uses a four-level interaction protocol and a two-stage data construction pipeline to evaluate speech models on semantic understanding and paralinguistic expression, revealing performance gaps in existing systems under complex paralinguistic demands.

Comments Accepted by ACL 2026 main

English abstract

Recent advances in large language models (LLMs) have fundamentally reshaped speech-to-speech (S2S) systems, enabling increasingly natural spoken interaction. However, existing benchmarks still rely heavily on text-based evaluation and largely ignore paralinguistic cues such as prosody, emotion, and speaker traits, which are central to expressive and human-like communication. We introduce S2S-Arena, a speech-native benchmark for evaluating instruction-following S2S models with explicit assessment of both semantic understanding and paralinguistic expression. S2S-Arena features a four-level interaction protocol that systematically probes models under increasing paralinguistic complexity, a two-stage data construction pipeline that produces 1,243 speech samples spanning 100+ real-world tasks, and an arena-style evaluation framework that enables reference-free, pairwise comparison directly in the speech modality. Benchmarking 10 state-of-the-art S2S systems over 1,000+ comparisons reveals substantial performance gaps (especially under complex paralinguistic demands) between current academic and industrial systems. Our analysis further identifies key design factors governing expressive instruction following, providing actionable insights for building more natural, robust, and human-aligned speech agents.

2503.04638 2026-05-11 cs.LG

No Forgetting Learning: Buffer-free Continual Learning Classification

Mohammad Ali Vahedifar, Qi Zhang

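Affiliations DIGIT and Department of Electrical and Computer Engineering

AI summary This paper proposes No Forgetting Learning (NFL), a buffer-free continual learning framework that decomposes the network into a shared backbone and task-specific heads and applies a stepwise freezing protocol for class- and task-incremental learning. NFL+ adds an under-complete auto-encoder that improves performance while remaining memory-efficient.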

English abstract

Most Continual Learning (CL) methods maintain performance on earlier tasks by storing exemplars in a replay buffer, introducing memory overhead that scales with the number of tasks and raising privacy concerns in regulated domains. We propose No Forgetting Learning (NFL), a buffer-free framework for class- and task-incremental learning that instead exploits the inherent redundancy of overparameterized networks. NFL decomposes the network into a shared backbone and task-specific heads, then applies a stepwise freezing protocol: new capabilities are first isolated, shared representations are adapted under knowledge distillation, and all components are jointly refined with dual soft-target anchoring. NFL+ augments this pipeline with an under-complete auto-encoder that preserves informative features from previous tasks and corrects the prediction bias caused by class imbalance. NFL+LoRA further extends the framework to pre-trained Vision Transformers by confining updates to a low-rank subspace with Fisher-weighted regularization, maintaining constant backbone memory cost regardless of the number of tasks. On CIFAR-100, Tiny-ImageNet, and ImageNet-1000 across up to 50 incremental tasks, NFL+ outperforms all buffer-free baselines and matches memory-based methods while requiring only 2.53% of their model size. We also propose a Plasticity-Stability score for more balanced trade-off evaluation.

2503.02107 2026-05-11 cs.RO

Balancing Act: Trading Off Odometry and Map Registration for Efficient Lidar Localization

Katya M. Papais, Daniil Lisus, Cedric Le Gentil, David J. Yoon, Timothy D. Barfoot

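Affiliations University of Toronto Institute for Aerospace Studies

AI summary This paper studies how integrating lightweight odometry and tuning the localization update frequency can improve lidar localization efficiency while maintaining high accuracy.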

Comments 8 pages

English abstract

Most autonomous vehicles rely on accurate and efficient localization, which is achieved by comparing live sensor data to a preexisting map, to navigate their environment. Balancing the accuracy of localization with computational efficiency remains a significant challenge, as high-accuracy methods often come with higher computational costs. In this paper, we present two ways of improving lidar localization efficiency and study their impact on performance. First, we integrate two lightweight odometry estimators, a correspondence-free Doppler-inertial estimator and a low-cost wheel odometer-gyroscope (OG) method, into a topometric localization pipeline and compare them against a state-of-the-art (SOTA) iterative closest point (ICP) baseline. We highlight the trade-offs between these approaches: the Doppler and OG estimators offer faster, lightweight updates, while ICP provides higher accuracy at the cost of increased computational load. Second, by controlling the frequency of localization updates and leveraging odometry estimates between them, we demonstrate that accurate localization can be maintained while optimizing for computational efficiency using any of the presented methods. We evaluate these approaches using over 100 km of unique real-world driving data in different on-road environments. By varying the localization interval, we demonstrate that computational effort can be reduced by 27%, 80%, and 91% for the ICP, Doppler, and OG estimators, respectively, while maintaining SOTA accuracy.
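
The second idea above, running map registration only every few steps and relying on odometry in between, can be sketched as a simple loop. This is a one-dimensional toy under stated assumptions, not the authors' pipeline: `register_to_map`, the biased odometry readings, and the interval values are all hypothetical.

```python
from typing import Callable, List, Tuple


def localize(
    odometry_deltas: List[float],
    register_to_map: Callable[[int, float], float],
    interval: int,
    start: float = 0.0,
) -> Tuple[List[float], int]:
    """Dead-reckon with odometry and correct against the map every `interval` steps.

    Returns the estimated trajectory and the number of (expensive) registrations.
    """
    pose = start
    trajectory: List[float] = []
    registrations = 0
    for step, delta in enumerate(odometry_deltas, start=1):
        pose += delta  # cheap odometry update on every step
        if step % interval == 0:
            pose = register_to_map(step, pose)  # costly map registration
            registrations += 1
        trajectory.append(pose)
    return trajectory, registrations


def toy_registration(step: int, predicted_pose: float) -> float:
    """Hypothetical registration that recovers the true pose (1.0 unit per step)."""
    return float(step)


# Odometry over-reports motion by 10%, so drift accumulates between corrections.
deltas = [1.1] * 10
sparse_traj, sparse_calls = localize(deltas, toy_registration, interval=5)
dense_traj, dense_calls = localize(deltas, toy_registration, interval=1)
```

In this toy, a longer interval cuts the number of registrations at the price of larger drift between corrections, which is the trade-off the abstract quantifies for the ICP, Doppler, and OG estimators.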

2502.17500 2026-05-11 cs.LG cs.AI

Generalized Euler Logarithm and its Applications in Machine Learning: Natural Gradient, Backpropagation, Generalized EG, Mirror Descent and OLPS

Andrzej Cichocki

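Affiliations Systems Research Institute of Polish Academy of Science; Warsaw University of Technology

AI summary This paper studies the properties of the two-parameter generalized Euler logarithm and its inverse, identifies the parameter domains that guarantee monotonicity, concavity, and invertibility, links it to a range of deformed entropies and divergences, and extends it to modern machine learning algorithms.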

Comments 34 pages, preprint of Journal paper

English abstract

This paper investigates in depth the fundamental properties of the two-parameter generalized Euler logarithm and its inverse, the associated deformed $(a,b)$-exponential function. We systematically clarify the parameter domains that guarantee monotonicity, concavity, and invertibility, derive series and integral representations, and provide explicit links to a broad class of one- and two-parameter deformations, including Tsallis, Kaniadakis, Schwämmle-Tsallis, Kaniadakis-Scarfone, and Tempesta-type logarithms and their inverse exponentials. In this way, the Euler $(a,b)$-logarithm is established as a unifying kernel for a wide family of generalized entropies and divergence measures. On the algorithmic side, we extend applications of the Euler logarithm to modern machine learning and optimization. We introduce generalized Exponentiated Gradient (GEG) and Mirror Descent (MD) schemes in which the Euler $(a,b)$-logarithm acts as a flexible link function in the underlying Bregman divergence. In addition, we propose an Euler-based Generalized Cross-Entropy (GCE) loss for deep neural networks, derive its exact backpropagation formulas, and detail its seamless integration with Fisher-Rao Natural Gradient (NG) descent. By isolating the Fisher Information Matrix (FIM) and developing a diagonal NG approximation, we demonstrate how the two deformation parameters successfully decouple tail robustness from local gradient shaping.
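
The links to one-parameter deformations mentioned above can be checked numerically. The abstract does not write out the definition, so this sketch assumes the form commonly attributed to Euler, $\log_{a,b}(x) = (x^a - x^b)/(a-b)$ for $x > 0$; the function name `euler_log` is hypothetical, and the paper's exact parameterization may differ.

```python
import math


def euler_log(x: float, a: float, b: float) -> float:
    """Two-parameter logarithm (x**a - x**b) / (a - b), assumed form, for x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    if a == b:
        # Limit a -> b of the difference quotient is d/da x**a = x**a * ln(x).
        return x ** a * math.log(x)
    return (x ** a - x ** b) / (a - b)


def tsallis_log(x: float, q: float) -> float:
    """Tsallis q-logarithm (x**(1-q) - 1) / (1 - q), for q != 1."""
    return (x ** (1.0 - q) - 1.0) / (1.0 - q)


def kaniadakis_log(x: float, kappa: float) -> float:
    """Kaniadakis kappa-logarithm (x**kappa - x**(-kappa)) / (2*kappa)."""
    return (x ** kappa - x ** (-kappa)) / (2.0 * kappa)


x = 2.0
# Special cases under the assumed form:
tsallis_gap = abs(euler_log(x, 1.0 - 0.7, 0.0) - tsallis_log(x, 0.7))   # b = 0
kaniadakis_gap = abs(euler_log(x, 0.5, -0.5) - kaniadakis_log(x, 0.5))  # b = -a
natural_gap = abs(euler_log(x, 0.0, 0.0) - math.log(x))                 # a = b = 0
```

Under this assumed form, setting $b = 0$ gives the Tsallis logarithm with $q = 1 - a$, setting $b = -a$ gives the Kaniadakis logarithm, and letting both parameters go to zero recovers the natural logarithm.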

2501.09189 2026-05-11 cs.LG cs.DS

Testing Noise Assumptions of Learning Algorithms

Surbhi Goel, Adam R. Klivans, Konstantinos Stavropoulos, Arsen Vasilyan

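Affiliations University of Pennsylvania; UT Austin

AI summary This paper presents the first efficient method for testing whether a training set satisfies the assumptions of a given noise model, and gives a testable learning algorithm for halfspaces over Gaussian marginals.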

Comments 45 pages, Best Paper Award at Reliable ML workshop at NeurIPS 2025, Accepted to COLT 2026

English abstract

We pose a fundamental question in computational learning theory: can we efficiently test whether a training set satisfies the assumptions of a given noise model? This question has remained unaddressed despite decades of research on learning in the presence of noise. In this work, we show that this task is tractable and present the first efficient algorithm to test various noise assumptions on the training data. To model this question, we extend the recently proposed testable learning framework of Rubinfeld and Vasilyan (2023) and require a learner to run an associated test that satisfies the following two conditions: (1) whenever the test accepts, the learner outputs a classifier along with a certificate of optimality, and (2) the test must pass for any dataset drawn according to a specified modeling assumption on both the marginal distribution and the noise model. We then consider the problem of learning halfspaces over Gaussian marginals with Massart noise (where each label can be flipped with probability less than $1/2$ depending on the input features), and give a fully-polynomial time testable learning algorithm. We also show a separation between the classical setting of learning in the presence of structured noise and testable learning. In fact, for the simple case of random classification noise (where each label is flipped with fixed probability $\eta = 1/2$), we show that testable learning requires super-polynomial time while classical learning is trivial.
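
The two noise models named above differ only in how the flip probability depends on the input. The following is a minimal data-generation sketch under stated assumptions (the weight vector, the flip-rate functions, and all names are hypothetical); it illustrates the models and is not the paper's testing algorithm.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[List[float], int]


def halfspace_label(w: Sequence[float], x: Sequence[float]) -> int:
    """Clean label of the halfspace sign(<w, x>), with ties mapped to +1."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


def noisy_halfspace_sample(
    w: Sequence[float],
    n: int,
    flip_prob: Callable[[Sequence[float]], float],
    rng: random.Random,
) -> List[Sample]:
    """Draw x from a standard Gaussian and flip the clean label w.p. flip_prob(x)."""
    data: List[Sample] = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in w]
        y = halfspace_label(w, x)
        if rng.random() < flip_prob(x):
            y = -y
        data.append((x, y))
    return data


def massart_flip(x: Sequence[float]) -> float:
    """Massart noise: the flip rate may depend on x but stays below 1/2."""
    return 0.4 if x[0] > 0 else 0.1


def rcn_flip(x: Sequence[float]) -> float:
    """Random classification noise with fixed rate eta = 1/2."""
    return 0.5


w = [1.0, -1.0]
massart_data = noisy_halfspace_sample(w, 200, massart_flip, random.Random(0))
rcn_data = noisy_halfspace_sample(w, 200, rcn_flip, random.Random(0))
```

With a flip rate of exactly 1/2, each observed label is a fair coin independent of `x`, which is why classical learning is trivial in that case: no classifier can beat random guessing, so any output is optimal.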

2411.16748 2026-05-11 cs.CV

Multimodal Diffusion Transformer with Memory Bank for Scalable Long-Duration Talking Video Generation

Haojie Zhang, Zhihao Liang, Ruibo Fu, Bingyan Liu, Zhengqi Wen, Xuefei Liu, Jianhua Tao, Yaling Liang

Affiliations South China University of Technology; Institute of Automation, Chinese Academy of Sciences; Beijing National Research Center for Information Science and Technology, Tsinghua University; Department of Automation, BNRist, Tsinghua University

AI summary This paper proposes LetsTalk, a framework that uses multimodal guidance and a memory bank mechanism to address quality, consistency, temporal coherence, and computational efficiency in long-duration talking video generation. Experiments show state-of-the-art generation quality together with high efficiency.

Comments 16 pages, 25 figures

English abstract

Long-duration talking video synthesis faces enduring challenges in achieving high video quality, portrait consistency, temporal coherence, and computational efficiency. As video length increases, issues such as visual degradation, portrait drift, temporal artifacts, and error accumulation become increasingly problematic, severely affecting the realism and reliability of the results. To address these challenges, we present LetsTalk, a diffusion transformer framework equipped with multimodal guidance and a novel memory bank mechanism, explicitly maintaining contextual continuity and enabling robust, high-quality, and efficient generation of long-duration talking videos. In particular, LetsTalk introduces a noise-regularized memory bank to alleviate error accumulation and sampling artifacts during extended video generation. To further improve efficiency and spatiotemporal modeling, LetsTalk employs a deep compression autoencoder and a spatiotemporal-aware transformer with linear attention for effective multimodal fusion. We systematically analyze three fusion schemes and show that combining deep fusion (Symbiotic Fusion) for portrait features with shallow fusion (Direct Fusion) for audio achieves superior visual realism and precise speech-driven motion, while preserving diversity of movements. Extensive experiments demonstrate that LetsTalk establishes a new state of the art in generation quality, producing temporally coherent and realistic talking videos with enhanced diversity and liveliness, and maintains remarkable efficiency with 8x fewer parameters than previous approaches.

2410.21438 2026-05-11 cs.CL cs.LG

UFT: Unifying Fine-Tuning of SFT and RLHF/DPO/UNA through a Generalized Implicit Reward Function

Zhichao Wang, Bin Bi, Zixu Zhu, Xiangbo Mao, Jun Wang, Shiyu Wang, Cheng Wang, Dong Nie, Lingzi Hong

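Affiliations Salesforce; RadixArk; ChatAlpha AI; University of North Texas

AI summary This paper proposes UFT, a framework that integrates SFT and alignment through an implicit reward function and improves performance on instruction-following and factuality tasks. Experiments show that it outperforms sequentially applying SFT and alignment.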

English abstract

By pretraining on trillions of tokens, an LLM gains the capability of text generation. However, to enhance its utility and reduce potential harm, SFT and alignment are applied sequentially to the pretrained model. Because SFT and alignment have different objectives and underlying processes, performance on certain tasks can decline. To address this, we seamlessly introduce Unified Fine-Tuning (UFT), which integrates SFT and alignment into a single training stage using the same objective and loss functions through an implicit reward function. Our experimental results demonstrate that UFT outperforms SFT on instruction-tuning data alone. Moreover, when combining instruction-tuning data with alignment data, UFT effectively prevents the degradation on some tasks across these two stages and shows a clear advantage over sequentially applying SFT and alignment. This is evident in the significant improvements observed in the ifeval task for instruction-following and the truthful task for factuality. The proposed general fine-tuning framework UFT establishes an effective and efficient paradigm for LLM post-training.

2410.18715 2026-05-11 cs.CV

ChatSearch: a Dataset and a Generative Retrieval Model for General Conversational Image Retrieval

Zijia Zhao, Longteng Guo, Tongtian Yue, Erdong Hu, Shuai Shao, Zehuan Yuan, Hua Huang, Jing Liu

Affiliations The Laboratory of Cognition and Decision Intelligence for Complex Systems, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences; Bytedance Inc.; School of Artificial Intelligence, Beijing Normal University

AI summary This paper presents the ChatSearch dataset and a generative retrieval model for conversational retrieval of open-domain images, using multi-round multimodal conversational context queries to improve retrieval accuracy.

Journal ref Pattern Recognition, 167 (2025) 111696

English abstract

In this paper, we investigate the task of general conversational image retrieval on open-domain images. The objective is to search for images based on interactive conversations between humans and computers. To advance this task, we curate a dataset called ChatSearch. This dataset includes a multi-round multimodal conversational context query for each target image, thereby requiring the retrieval system to retrieve the correct image from the database. Simultaneously, we propose a generative retrieval model named ChatSearcher, which is trained end-to-end to accept/produce interleaved image-text inputs/outputs. ChatSearcher exhibits strong capability in reasoning with multimodal context and can leverage world knowledge to yield visual retrieval results. It demonstrates superior performance on the ChatSearch dataset and also achieves competitive results on other image retrieval tasks and visual conversation tasks. We anticipate that this work will inspire further research on interactive multimodal retrieval systems. Our dataset will be available at https://github.com/joez17/ChatSearch.

2410.01308 2026-05-11 cs.LG cs.AI

How Hard Is It for Message-Passing GNNs to Simulate One Weisfeiler-Lehman Color-Refinement Step?

Guanyu Cui, Yuhe Guo, Zhewei Wei, Hsin-Hao Su

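Affiliations Renmin University of China; Boston College

AI summary This paper studies the resources that message-passing graph neural networks need to simulate one Weisfeiler-Lehman color-refinement step on unattributed graphs, distinguishing oblivious from instance-dependent simulation. It finds that color refinement hides a global relabeling problem that shallow networks with small messages cannot solve in the worst case, although large color sets combined with bounded-error randomness reduce the cost.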

English abstract

Message-passing graph neural networks (MPGNNs) are commonly compared with the Weisfeiler-Lehman (WL) color-refinement procedure, but this comparison does not quantify the resource parameters a network needs to realize color refinement with bounded-size messages and finite numerical precision. We study the cost of simulating a single color-refinement step on unattributed graphs. We distinguish input-independent, or oblivious, simulation from instance-dependent simulation. In the former, the parameters, or their distributions in randomized models, are fixed before the input instance is known. Our results show that the local form of WL color refinement hides a global relabeling problem. In the oblivious setting, deterministic and zero-error randomized MPGNNs cannot solve this problem in the worst case using only shallow networks with small messages. We complement this lower bound with a nearly matching construction in a stronger rooted, port-aware model. By contrast, when the color set is large, bounded-error randomness can greatly reduce the cost, and a one-layer MPGNN with messages of logarithmic size and a logarithmic number of random bits suffices. We show that this logarithmic number of random bits is essentially necessary for shallow, small-message simulations. When the color set is small, we still obtain a rooted, port-aware simulation, but this construction requires more layers or larger messages. We also prove that this extra cost is partly unavoidable, as small color sets force a nontrivial trade-off between the number of layers and the message size. Finally, instance-dependent simulation can be much shallower, but the required instance-specific parameters are not necessarily easy to find. Together, these results reveal quantitative structure hidden behind the statement that MPGNNs match WL color refinement.
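
For reference, a single color-refinement step of the standard 1-WL procedure can be written in a few lines. This sketch is the textbook centralized algorithm, not an MPGNN simulation of it; the graph, the function name, and the choice to number new colors by sorted signature are illustrative assumptions.

```python
from typing import Dict, Hashable, List, Tuple

Signature = Tuple[int, Tuple[int, ...]]


def wl_refine_step(
    adj: Dict[Hashable, List[Hashable]],
    colors: Dict[Hashable, int],
) -> Dict[Hashable, int]:
    """One 1-WL step: recolor each node by (own color, multiset of neighbor colors)."""
    # Local part: every node gathers the sorted colors of its neighbors.
    signatures: Dict[Hashable, Signature] = {
        v: (colors[v], tuple(sorted(colors[u] for u in adj[v]))) for v in adj
    }
    # Global part: distinct signatures are mapped to compact new color ids.
    palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
    return {v: palette[signatures[v]] for v in adj}


# Path graph 0 - 1 - 2 with a uniform initial coloring.
path = {0: [1], 1: [0, 2], 2: [1]}
initial = {0: 0, 1: 0, 2: 0}
refined = wl_refine_step(path, initial)
```

The `palette` line is where the global relabeling mentioned in the abstract shows up: assigning compact color ids requires knowing every signature in the graph, whereas each signature itself only needs one round of neighbor information.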

2408.15339 2026-05-11 cs.LG cs.CL

UNA: A Unified Supervised Framework for Efficient LLM Alignment Across Feedback Types

Zhichao Wang, Bin Bi, Can Huang, Shiva Kumar Pentyala, Zixu James Zhu, Sitaram Asur, Na Claire Cheng, Cheng Wan, Dong Nie, Lingzi Hong

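Affiliations Salesforce; RadixArk; ChatAlpha AI; University of North Texas

AI summary UNA uses a generalized implicit reward function to handle binary, pairwise, and score-based feedback in one framework, supports it with a theoretical optimality result, and validates its advantage on classical benchmarks.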

English abstract

RL alignment methods, including RLHF and DPO, are primarily based on pairwise preference data. Although scalar or score-based feedback has been collected in some settings, it is rarely used directly, and preference magnitude information is typically ignored. Furthermore, current alignment frameworks offer limited capability for unifying heterogeneous supervision signals, making it difficult to jointly leverage diverse data types within a single training paradigm. This limitation constrains the richness and scalability of the alignment process. To address this gap, we propose a UNified Alignment (UNA) framework capable of training across different types of feedback, including binary, pairwise, and score-based, through a generalized implicit reward function. The reward function is theoretically proved to be the optimal policy by the log sum inequality. Extensive experiments on classical benchmarks consistently demonstrate the advantage of the proposed unified framework with typical LLM base models.

2408.09929 2026-05-11 cs.LG cs.CV

Data Augmentation of Contrastive Learning is Estimating Positive-incentive Noise

Hongyuan Zhang, Yanchen Xu, Sida Huang, Xuelong Li

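Affiliations The University of Hong Kong; Institute of Artificial Intelligence (TeleAI), China Telecom; Fudan University; School of Artificial Intelligence, OPtics and ElectroNics (iOPEN), Northwestern Polytechnical University

AI summary This paper studies the connection between contrastive learning and positive-incentive noise, defines the task entropy of contrastive learning, and proposes a positive-incentive noise generator; visualizations indicate that the method learns effective augmentations.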

Comments Accepted by ICML 2026

English abstract

Inspired by the idea of Positive-incentive Noise (Pi-Noise or $\pi$-Noise) that aims at learning the reliable noise beneficial to tasks, we scientifically investigate the connection between contrastive learning and $\pi$-noise in this paper. By converting the contrastive loss to an auxiliary Gaussian distribution to quantitatively measure the difficulty of the specific contrastive model under the information theory framework, we properly define the task entropy, the core concept of $\pi$-noise, of contrastive learning. It is further proved that the predefined data augmentation in the standard contrastive learning paradigm can be regarded as a kind of point estimation of $\pi$-noise. Inspired by the theoretical study, a framework that develops a $\pi$-noise generator to learn the beneficial noise (instead of estimation) as data augmentations for contrast is proposed. The designed framework can be applied to diverse types of data and is also completely compatible with the existing contrastive models. From the visualization, we surprisingly find that the proposed method successfully learns effective augmentations. Our code is available at https://github.com/hyzhang98/PiNDA.

2407.04183 2026-05-11 cs.CL cs.AI cs.CY cs.HC

Seeing Like an AI: How LLMs Apply (and Misapply) Wikipedia Neutrality Norms

Joshua Ashkinaze, Ruijia Guan, Laura Kurek, Eytan Adar, Ceren Budak, Eric Gilbert

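Affiliations University of Michigan

AI summary This study evaluates LLMs on detecting and correcting biased Wikipedia edits. Detection accuracy is low; generation performs better but introduces extraneous, non-neutrality-related changes, which raises questions about the gap between how community norms are enforced and how the public perceives them.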

Comments Appeared at ICWSM 2026

Abstract

Large language models (LLMs) are trained on broad corpora and then used in communities with specialized norms. Is providing LLMs with community rules enough for models to follow these norms? We evaluate LLMs' capacity to detect (Task 1) and correct (Task 2) biased Wikipedia edits according to Wikipedia's Neutral Point of View (NPOV) policy. LLMs struggled with bias detection, achieving only 64% accuracy on a balanced dataset. Models exhibited contrasting biases (some under- and others over-predicted bias), suggesting distinct priors about neutrality. LLMs performed better at generation, removing 79% of words removed by Wikipedia editors. However, LLMs made additional changes beyond Wikipedia editors' simpler neutralizations, resulting in high-recall but low-precision editing. Interestingly, crowdworkers rated AI rewrites as more neutral (70%) and fluent (61%) than Wikipedia-editor rewrites. Qualitative analysis found LLMs sometimes applied NPOV more comprehensively than Wikipedia editors but often made extraneous non-NPOV-related changes (such as grammar). LLMs may apply rules in ways that resonate with the public but diverge from community experts. While potentially effective for generation, LLMs may reduce editor agency and increase moderation workload (e.g., verifying additions). Even when rules are easy to articulate, having LLMs apply them like community members may still be difficult.
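
The "high-recall but low-precision" finding can be made concrete with a small sketch. It assumes, hypothetically, that an edit is scored by the multiset of words it removes from the original sentence; the function names, the whitespace tokenizer, and the example sentences are illustrative and are not taken from the paper.

```python
from collections import Counter


def removed_words(original: str, rewrite: str) -> Counter:
    """Multiset of words present in `original` but missing from `rewrite`."""
    # Counter subtraction keeps only positive counts.
    return Counter(original.lower().split()) - Counter(rewrite.lower().split())


def removal_precision_recall(original: str, editor_rewrite: str, model_rewrite: str):
    """Score the model's removals against the human editor's removals."""
    gold = removed_words(original, editor_rewrite)
    pred = removed_words(original, model_rewrite)
    overlap = sum((gold & pred).values())  # removals shared by both rewrites
    precision = overlap / sum(pred.values()) if pred else 1.0
    recall = overlap / sum(gold.values()) if gold else 1.0
    return precision, recall


# Made-up example: the model removes everything the editor removed, plus more.
original = "the brilliant senator bravely passed the landmark bill"
editor = "the senator passed the landmark bill"
model = "the senator passed a bill"
precision, recall = removal_precision_recall(original, editor, model)
print(precision, recall)
```

Here the model removes both words the editor removed (recall 1.0) but also two words the editor kept (precision 0.5), which is the pattern the abstract describes.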

2310.15288 2026-05-11 cs.AI cs.LG

Active teacher selection for reward learning

Rachel Freedman, Justin Svegliato, Kyle Wray, Stuart Russell

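Affiliations * University of California, Berkeley; Northeastern University

AI summary This paper proposes the Hidden Utility Bandit (HUB) framework, which improves reward learning by actively selecting which teacher to query, and applies it to paper recommendation and COVID-19 vaccine testing to show the value of modeling teacher heterogeneity.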

Abstract

Reward learning techniques enable machine learning systems to learn objectives from human feedback. A core limitation of these systems is their assumption that all feedback comes from a single human teacher, despite gathering feedback from large and heterogeneous populations. We propose the Hidden Utility Bandit (HUB) framework to model differences in teacher rationality, expertise, and costliness, formalizing the problem of learning from multiple teachers. We develop a variety of solution algorithms and apply them to two real-world domains: paper recommendation systems and COVID-19 vaccine testing. We find that Active Teacher Selection (ATS) algorithms outperform baselines by actively selecting when and which teacher to query. Our key contributions are 1) the HUB framework: a novel mathematical framework for modeling the teacher selection problem, 2) ATS: an active-learning based algorithmic approach that demonstrates the utility of modeling teacher heterogeneity, and 3) proof-of-concept application of the HUB framework and ATS approaches to model and solve multiple real-world problems with complex trade-offs between reward learning and optimization.
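
One way to picture "differences in teacher rationality" is a Boltzmann-rational preference model, a common choice in reward learning. The abstract does not say which model HUB uses, so the model form, the function names, and the numbers below are assumptions made for illustration.

```python
import math


def preference_probability(u_a: float, u_b: float, beta: float) -> float:
    """P(teacher prefers a over b) under an assumed Boltzmann-rational model."""
    # Logistic form of exp(beta*u_a) / (exp(beta*u_a) + exp(beta*u_b)).
    return 1.0 / (1.0 + math.exp(-beta * (u_a - u_b)))


def posterior_after_answer(prior_h1, utils_h1, utils_h2, beta):
    """Posterior on hypothesis 1 after the teacher answers 'a is better than b'.

    utils_h1 and utils_h2 are (U(a), U(b)) under two competing utility hypotheses.
    """
    like1 = preference_probability(*utils_h1, beta)
    like2 = preference_probability(*utils_h2, beta)
    return prior_h1 * like1 / (prior_h1 * like1 + (1.0 - prior_h1) * like2)


# Two hypothetical teachers answer the same query; beta is their rationality.
noisy = posterior_after_answer(0.5, (1.0, 0.0), (0.0, 1.0), beta=0.5)
expert = posterior_after_answer(0.5, (1.0, 0.0), (0.0, 1.0), beta=5.0)
print(noisy, expert)
```

Under this assumed model, a single answer from the more rational teacher shifts the belief further, so a learner has to weigh that extra information against the teacher's query cost.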

2305.18593 2026-05-11 cs.LG cs.AI

On Diffusion Modeling for Anomaly Detection

Victor Livernoche, Vineet Jain, Yashar Hezaveh, Siamak Ravanbakhsh

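Affiliations * School of Computer Science, McGill University; Department of Physics, University of Montreal; Mila - Quebec AI Institute

AI summary This paper examines variants of diffusion modeling for unsupervised and semi-supervised anomaly detection and proposes Diffusion Time Estimation (DTE), which makes inference orders of magnitude faster than DDPM while outperforming it on ADBench.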

Journal ref Proceedings of the International Conference on Learning Representations (ICLR 2024)

Abstract

Known for their impressive performance in generative modeling, diffusion models are attractive candidates for density-based anomaly detection. This paper investigates different variations of diffusion modeling for unsupervised and semi-supervised anomaly detection. In particular, we find that Denoising Diffusion Probabilistic Models (DDPM) are performant on anomaly detection benchmarks yet computationally expensive. By simplifying DDPM in application to anomaly detection, we are naturally led to an alternative approach called Diffusion Time Estimation (DTE). DTE estimates the distribution over diffusion time for a given input and uses the mode or mean of this distribution as the anomaly score. We derive an analytical form for this density and leverage a deep neural network to improve inference efficiency. Through empirical evaluations on the ADBench benchmark, we demonstrate that all diffusion-based anomaly detection methods perform competitively for both semi-supervised and unsupervised settings. Notably, DTE achieves orders of magnitude faster inference time than DDPM, while outperforming it on this benchmark. These results establish diffusion-based anomaly detection as a scalable alternative to traditional methods and recent deep-learning techniques for standard unsupervised and semi-supervised anomaly detection settings.
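
To see why diffusion time can act as an anomaly score, consider a simplified setting. The setting is an assumption of this sketch, not the paper's derivation: a single training point $x_0 \in \mathbb{R}^d$, a variance-exploding perturbation $x = x_0 + \sigma\epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$, and a flat prior over $\sigma$.

```latex
p(\sigma \mid x) \;\propto\; p(x \mid \sigma)
  = (2\pi\sigma^2)^{-d/2} \exp\!\left(-\frac{\lVert x - x_0 \rVert^2}{2\sigma^2}\right),
\qquad
\frac{\partial}{\partial\sigma} \log p(x \mid \sigma)
  = -\frac{d}{\sigma} + \frac{\lVert x - x_0 \rVert^2}{\sigma^3} = 0
\;\Longrightarrow\;
\hat{\sigma} = \frac{\lVert x - x_0 \rVert}{\sqrt{d}} .
```

Under these assumptions the posterior mode grows linearly with the distance from the data, so a point far from the training set is assigned a large diffusion time and hence a high anomaly score.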

2106.09636 2026-05-11 cs.LG

Multi-Stage Prototype Learning for Interpretable Time Series Classification

Bhavesh Kalisetti, Vincent Wang, Gaurav R. Ghosal, Maryam Bijanzadeh, Reza Abbasi-Asl

Affiliations * Department of Neurology, University of California, San Francisco; Department of Bioengineering and Therapeutic Sciences, University of California, San Francisco; UCSF Weill Institute for Neurosciences

AI summary This paper proposes a multi-stage prototype learning framework that makes multivariate time series classification more interpretable through explicit, hierarchical prototype-based explanations, validated on one simulated and four real-world datasets.

Abstract

Deep learning methods are powerful tools in classifying multivariate time series data. Despite their high performance, these methods are hard to interpret, which diminishes their applications in high-risk domains such as healthcare. In this paper, we propose a novel multi-stage prototype learning framework for multivariate time series classification. By design, our framework identifies predictive temporal patterns in individual variables as well as cross-variable patterns that are highly predictive of each class. We validate our model on one simulated and four real-world datasets and demonstrate comparable accuracy to state-of-the-art methods while providing substantially improved interpretability through explicit, hierarchical prototype-based explanations. These explanations reveal both single-variable temporal patterns as well as cross-variable interactions that are most predictive for each class, providing insights into underlying mechanisms of the predictive model.

2605.08072 2026-05-11 stat.ML cs.DS cs.LG math.ST stat.TH

A Note on Non-Negative $L_1$-Approximating Polynomials

Jane H. Lee, Anay Mehrotra, Manolis Zampetakis

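Affiliations * Yale University; Stanford University

AI summary The note studies the existence of non-negative $L_1$-approximating polynomials under Gaussian distributions and proves that set classes with bounded Gaussian surface area admit such polynomials, with a degree that matches the best known Gaussian $L_1$-approximation bound up to a constant factor.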

Abstract

$L_1$-approximating polynomials, i.e., polynomials that approximate indicator functions in $L_1$-norm under certain distributions, are widely used in computational learning theory. We study the existence of non-negative $L_1$-approximating polynomials with respect to Gaussian distributions. This is a stronger requirement than $L_1$-approximation but weaker than sandwiching polynomials (which themselves have many applications). These non-negative approximating polynomials have recently found uses in smoothed learning from positive-only examples. In this short note, we prove that every class of sets with Gaussian surface area (GSA) at most $\Gamma$ under the standard Gaussian admits degree-$k$ non-negative polynomials that $\varepsilon$-approximate its indicator functions in $L_1$-norm, for $k=\tilde{O}(\Gamma^2/\varepsilon^2)$. Equivalently, finite GSA implies $L_1$-approximation with the stronger pointwise guarantee that the approximating polynomial has range contained in $[0,\infty)$. Up to a constant factor, this matches the best currently known degree bound for Gaussian $L_1$-approximation without the non-negativity constraint.
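
Written out, the guarantee described in the abstract reads as follows. The symbols $S$ (a set in the class), $p$ (the polynomial), $d$ (the dimension), and $\mathbb{1}_S$ (the indicator of $S$) are notation introduced here for illustration.

```latex
\text{For every set } S \subseteq \mathbb{R}^d \text{ in a class with Gaussian surface area at most } \Gamma,
\text{ there is a polynomial } p \text{ with } \deg p \le k = \tilde{O}(\Gamma^2/\varepsilon^2) \text{ such that}
\qquad
p(x) \ge 0 \ \ \text{for all } x \in \mathbb{R}^d
\qquad\text{and}\qquad
\mathbb{E}_{x \sim \mathcal{N}(0, I_d)}\bigl[\, \lvert p(x) - \mathbb{1}_S(x) \rvert \,\bigr] \le \varepsilon .
```

For comparison, sandwiching asks for a pair $p_{\mathrm{lo}} \le \mathbb{1}_S \le p_{\mathrm{hi}}$ pointwise with $\mathbb{E}[p_{\mathrm{hi}} - p_{\mathrm{lo}}] \le \varepsilon$. The upper polynomial $p_{\mathrm{hi}}$ is then automatically non-negative and $\varepsilon$-close to $\mathbb{1}_S$ in $L_1$, which is why non-negativity alone is the weaker requirement.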

2605.08035 2026-05-11 eess.SP cs.LG

PropSplat: Map-Free RF Field Reconstruction via 3D Gaussian Propagation Splatting

William Bjorndahl, Maninder Pal Singh, Farhad Nouri, Joseph Camp

Affiliations * Department of Electrical and Computer Engineering, Southern Methodist University, Dallas, TX, USA; Department of Engineering Technology, University of Houston, Houston, TX, USA

AI summary PropSplat reconstructs RF fields with 3D anisotropic Gaussian primitives, without maps or other external geographic data, and is optimized end-to-end; it achieves lower RMSE than the wireless radiance field baselines on outdoor drive-test data.

Comments Accepted for presentation at IEEE DySPAN 2026

Abstract

Building a site-specific propagation model typically requires either ray-tracing over detailed 3D maps or dense measurement campaigns. Both approaches are expensive and often infeasible for rapid deployments where geographic data is unavailable or outdated. We present PropSplat, a map-free propagation modeling method that reconstructs radio frequency (RF) fields using 3D anisotropic Gaussian primitives. Each Gaussian encodes a scalar path loss offset relative to an explicit baseline path loss model with a learnable path loss exponent. Gaussians are initialized along observed transmitter-receiver paths and optimized end-to-end to learn the propagation environment without external information like floor plans, terrain databases, or clutter data. We evaluate PropSplat against wireless radiance field methods NeRF$^2$, GSRF, and WRF-GS+ on two real-world datasets. On large-scale outdoor drive-tests spanning multiple topographical regions at six sub-6 GHz frequencies, PropSplat achieves 5.38 dB RMSE when training measurements are spaced 300 m apart and outperforms WRF-GS+ (5.87 dB), GSRF (7.46 dB), and NeRF$^2$ (14.76 dB). On indoor Bluetooth Low Energy measurements, PropSplat achieves 0.19 m mean localization error, an order of magnitude better than NeRF$^2$ (1.84 m), while achieving near-identical received signal strength prediction accuracy. These results show that accurate site-specific propagation reconstruction is achievable from sparse RF-native measurements, which reduces the need for geographic data as a prerequisite for scalable RF environment modeling.
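
As an illustration of a "baseline path loss model with a learnable path loss exponent", the sketch below fits the exponent of the standard log-distance model PL(d) = PL0 + 10 n log10(d / d0) by least squares. The abstract does not state which baseline PropSplat uses, so the model form, the function names, and the sample values are assumptions.

```python
import math


def log_distance_path_loss(d_m, pl0_db, exponent, d0_m=1.0):
    """Path loss in dB at distance d_m under the log-distance model."""
    return pl0_db + 10.0 * exponent * math.log10(d_m / d0_m)


def fit_exponent(distances_m, losses_db, d0_m=1.0):
    """Least-squares fit of (pl0_db, exponent) to measured path losses."""
    xs = [10.0 * math.log10(d / d0_m) for d in distances_m]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(losses_db) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, losses_db))
    exponent = sxy / sxx                 # slope of the regression line
    pl0_db = mean_y - exponent * mean_x  # intercept at the reference distance
    return pl0_db, exponent


# Synthetic, noise-free measurements generated with exponent 3.0 (made-up values).
distances = [10.0, 50.0, 100.0, 300.0, 1000.0]
losses = [log_distance_path_loss(d, 40.0, 3.0) for d in distances]
pl0, n_hat = fit_exponent(distances, losses)
residuals = [y - log_distance_path_loss(d, pl0, n_hat) for d, y in zip(distances, losses)]
print(pl0, n_hat)
```

With real measurements the residuals would not vanish. In a model of the kind the abstract describes, those location-dependent residuals are what the Gaussian primitives would have to explain as path loss offsets.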

2605.08034 2026-05-11 stat.ML cs.LG

Semiparametric Efficient Test for Interpretable Distributional Treatment Effects

Houssam Zenati, Arthur Gretton

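Affiliations * Gatsby Computational Neuroscience Unit, University College London

AI summary This paper proposes DR-ME, which the authors describe as, to their knowledge, the first semiparametrically efficient finite-location test for interpretable distributional treatment effects; learned test locations and sample splitting yield near-nominal type-I error and competitive power.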

Abstract

Distributional treatment effects can be invisible to means: a treatment may preserve average outcomes while changing tails, modes, dispersion, or rare-event probabilities. Kernel tests can detect discrepancies between interventional outcome laws, but global tests do not reveal where the laws differ. We propose DR-ME, to our knowledge the first semiparametrically efficient finite-location test for interpretable distributional treatment effects. DR-ME evaluates an interventional kernel witness at learned outcome locations, returning causal-discrepancy coordinates rather than only a global rejection. From observational data, we derive orthogonal doubly robust kernel features whose centered oracle form is the canonical gradient of this finite witness. For fixed locations, we characterize the local testing limit: DR-ME is chi-square calibrated under the null, has noncentral chi-square local power, and uses the covariance whitening that optimizes local signal-to-noise for discrepancies visible through the selected coordinates. This efficient local-power geometry yields a principled location-learning criterion, with sample splitting preserving post-selection validity. Experiments show near-nominal type-I error, competitive power against global doubly robust kernel tests, and interpretable learned locations that localize distributional effects in a semi-synthetic medical-imaging study.

2605.08022 2026-05-11 cs.NE cs.AI cs.LG

Globally Optimal Training of Spiking Neural Networks via Parameter Reconstruction

Himanshu Udupi, Xiaocong Yang, ChengXiang Zhai

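Affiliations * University of Illinois Urbana-Champaign

AI summary This paper proposes a parameter reconstruction algorithm for training spiking neural networks (SNNs), grounded in a convexification framework for parallel recurrent threshold networks, and reports consistent gains across tasks that point to its potential for large-scale SNN training.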

Abstract

Spiking Neural Networks (SNNs) have been proposed as biologically plausible and energy-efficient alternatives to conventional Artificial Neural Networks (ANNs). However, training SNNs usually relies on surrogate gradients due to the non-differentiability of the spike function, introducing approximation errors that accumulate across layers. To address this challenge, we extend the work on convexification of parallel feedforward threshold networks to parallel recurrent threshold networks, which subsume parallel SNNs as a structured special case. Building on this theoretical framework, we propose a parameter reconstruction algorithm for SNN training that demonstrates consistent and significant advantages across various tasks, both as a standalone method and in combination with surrogate-gradient training. Ablations further demonstrate the data scalability of our training algorithm and its robustness to model configurations, pointing toward its potential in large-scale SNN training.

2605.08006 2026-05-11 math.OC cs.LG stat.ML

Penalty-Based First-Order Methods for Bilevel Optimization with Minimax and Constrained Lower-Level Problems

Yiyang Shen, Yutian He, Weiran Wang, Qihang Lin

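Affiliations * University of Iowa

AI summary This paper proposes penalty-based first-order methods for bilevel optimization with minimax or constrained lower-level problems, without assuming strong convexity of the lower level, and improves the known oracle complexity for the constrained case from $\tilde{O}(ε^{-7})$ to $\tilde{O}(ε^{-4})$.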

Abstract

We study a class of bilevel optimization problems in which both the upper- and lower-level problems have minimax structures. This setting captures a broad range of emerging applications. Despite the extensive literature on bilevel optimization and minimax optimization separately, existing methods mainly focus on bilevel optimization with lower-level minimization problems, often under strong convexity assumptions, and are not directly applicable to the minimax lower-level setting considered here. To address this gap, we develop penalty-based first-order methods for bilevel minimax optimization without requiring strong convexity of the lower-level problem. In the deterministic setting, we establish that the proposed method finds an $ε$-KKT point with $\tilde{O}(ε^{-4})$ oracle complexity. We further show that bilevel problems with convex constrained lower-level minimization can be reformulated as special cases of our framework via Lagrangian duality, leading to an $\tilde{O}(ε^{-4})$ complexity bound that improves upon the existing $\tilde{O}(ε^{-7})$ result. Finally, we extend our approach to the stochastic setting, where only stochastic gradient oracles are available, and prove that the proposed stochastic method finds a nearly $ε$-KKT point with $\tilde{O}(ε^{-9})$ oracle complexity.

2605.07987 2026-05-11 eess.IV cs.CV

Uncertainty Quantification for Cardiac Shape Reconstruction with Deep Signed Distance Functions via MCMC methods

Jan Verhülsdonk, Thomas Grandits, Francisco Sahli Costabal, Thomas Beiert, Simone Pezzuto, Alexander Effland

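Affiliations * Department of Mathematics and Scientific Computing, University of Graz, Austria; PRIN-PNRR project no. P2022N5ZNP; INdAM-GNCS

AI summary This paper proposes an uncertainty-aware cardiac shape reconstruction framework that combines Deep Signed Distance Functions with MCMC sampling: cardiac geometry is modeled implicitly, and Bayesian inference in the latent space yields accurate reconstructions with well-calibrated uncertainty estimates.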

Abstract

Atlas-based approaches allow high-quality, patient-specific shape reconstructions of cardiac anatomy from sparse and/or noisy data such as point clouds. However, these methods are mainly prior-driven, so the impact of uncertainty can be large, limiting their clinical reliability. We propose a probabilistic framework for uncertainty-aware cardiac shape reconstruction that combines Deep Signed Distance Functions (DeepSDFs) with Markov Chain Monte Carlo (MCMC) sampling. Cardiac geometries are modeled implicitly as zero-level sets of a neural network conditioned on learned latent codes, enabling multi-surface reconstruction of the left and right ventricles. By interpreting the reconstruction loss as a log-likelihood, we perform Bayesian inference in the latent space to obtain both maximum a posteriori (MAP) and posterior-sampled reconstructions. Experiments on a public cardiac dataset show that our approach produces accurate reconstructions and well-calibrated uncertainty estimates.
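
The idea of "interpreting the reconstruction loss as a log-likelihood" can be sketched with a random-walk Metropolis sampler over a latent code. Everything here is a toy stand-in: `reconstruction_loss` is a hypothetical quadratic rather than a DeepSDF decoder, the prior is a standard Gaussian, and the step size and temperature are arbitrary choices, none of them taken from the paper.

```python
import math
import random


def reconstruction_loss(z, target=(1.0, -0.5)):
    """Hypothetical stand-in for a DeepSDF data-fit loss at latent code z."""
    return sum((zi - ti) ** 2 for zi, ti in zip(z, target))


def log_posterior(z, temperature=0.05):
    log_likelihood = -reconstruction_loss(z) / temperature  # loss read as negative log-likelihood
    log_prior = -0.5 * sum(zi ** 2 for zi in z)              # standard Gaussian prior on z
    return log_likelihood + log_prior


def metropolis(n_steps=20000, step=0.2, seed=0):
    """Random-walk Metropolis over the latent code; returns post-burn-in samples."""
    rng = random.Random(seed)
    z = [0.0, 0.0]
    lp = log_posterior(z)
    samples = []
    for _ in range(n_steps):
        proposal = [zi + rng.gauss(0.0, step) for zi in z]
        lp_new = log_posterior(proposal)
        # Accept with probability min(1, exp(lp_new - lp)).
        if rng.random() < math.exp(min(0.0, lp_new - lp)):
            z, lp = proposal, lp_new
        samples.append(list(z))
    return samples[n_steps // 4:]  # discard the first quarter as burn-in


samples = metropolis()
mean = [sum(s[i] for s in samples) / len(samples) for i in range(2)]
std = [math.sqrt(sum((s[i] - mean[i]) ** 2 for s in samples) / len(samples)) for i in range(2)]
print(mean, std)
```

The sample mean plays the role of a point reconstruction and the per-coordinate spread is the uncertainty estimate. In the setting the abstract describes, each sampled latent code would instead be decoded into a surface.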

2605.07986 2026-05-11 cs.HC cs.AI cs.CY

Towards Apples to Apples for AI Evaluations: From Real-World Use Cases to Evaluation Scenarios

Yee-Yin Choong, Kristen Greene, Alice Qian, Meryem Marasli, Ziqi Yang, Sophia Chen, Laura Dabbish, Anand Rao, Hong Shen

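Affiliations * National Institute of Standards and Technology; Carnegie Mellon University

AI summary This paper proposes a repeatable process that turns high-level AI use cases, elicited through a structured AI Use Case Worksheet, into detailed evaluation scenarios; combining LLM prompting with human review, it generates 107 scenarios that stay grounded in real-world usage and human needs.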

Comments 23 pages, 3 figures

Abstract

AI measurement science has a wide variety of methodologies and measurements for comparing AI systems, resulting in what often appear to be "apples-to-oranges" comparisons across AI evaluations. To move toward "apples-to-apples" comparisons in real-world AI evaluations, this work advocates for methodological transparency in evaluation scenarios, operational grounding, and human-centered design (HCD) principles. We propose a repeatable process for transforming high-level use cases to detailed scenarios by eliciting use cases from subject matter experts (SMEs) via a structured AI Use Case Worksheet with six key elements: use case, sector, user (direct and indirect), intended outcomes, expected impacts (positive and negative), and KPIs and metrics. We demonstrate utility of the worksheet and process in the U.S. financial services sector. This paper reports on example high-level AI use cases identified by financial services sector SMEs: cyber defense enablement, developer productivity, financial crime aggregation, suspicious activity report (SAR) filing, credit memo generation, and internal call center support. These AI use cases provided are illustrative of the process and not exhaustive. Central to our work is a three-stage expansion pipeline combining LLM prompting with human reviews to generate 107 scenarios from those use cases elicited from SMEs. This process integrates iterative human reviews at every juncture to ensure operational grounding: for scenario titles and descriptions; for core scenario elements like users, benefits and risks, and metrics; and for scenario narratives and evaluation objectives. Human checkpoints ensure scenarios remain reflective of real-world usage and human needs. We describe a validation rubric to assess scenario quality. By defining key scenario components, this work supports a more consistent and meaningful paradigm for human-centered AI evaluations.

2605.07947 2026-05-11 cs.CE cs.AI cs.LG math.OC

Exploring the non-convexity in machine learning using quantum-inspired optimization

Kandula Eswara Sai Kumar, Parth Dhananjay Danve, Abhishek Chopra, Rut Lineswala

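Affiliations * BosonQ Psi (BQP)

AI summary This paper proposes a unified framework based on Quantum-Inspired Evolutionary Optimization (QIEO) for high-dimensional non-convex problems and reports improved performance on sparse signal recovery and robust linear regression.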

Abstract

The escalating complexity of modern machine learning necessitates solving challenging non-convex optimization problems, particularly in high-dimensional regimes and scenarios contaminated by gross outliers. Traditional approaches, relying on convex relaxations or specialized local search heuristics, frequently succumb to suboptimal local minima and fail to recover the true underlying discrete structures. In this paper, we propose treating these non-convex challenges as a global search problem and introduce a unified framework based on Quantum-Inspired Evolutionary Optimization (QIEO). By leveraging a probabilistic representation inspired by quantum superposition, QIEO maintains a global view of the search space, enabling it to tunnel through local optima that trap conventional gradient-based and greedy solvers. We comprehensively evaluate QIEO across diverse non-convex applications, including sparse signal recovery (gene expression analysis and compressed sensing) and robust linear regression. Extensive benchmarking against state-of-the-art continuous solvers (ADAM, Differential Evolution), classical metaheuristics (Genetic Algorithms), and specialized non-convex algorithms (Iterative Hard Thresholding) demonstrates that QIEO consistently achieves superior structural fidelity, lower mean squared error, and enhanced robustness without support inflation. Our findings suggest that embracing a quantum-inspired global search provides a resilient, unified paradigm for overcoming the inherent intractability of discrete nonconvex machine learning landscapes.

2605.07908 2026-05-11 math.ST cs.AI cs.LG stat.TH

Statistical inference with belief functions: A survey

Fabio Cuzzolin

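Affiliations * Institute for AI, Data Analysis and Systems (AIDAS); School of Engineering, Computing & Mathematics, Oxford Brookes University

AI summary This survey reviews statistical inference with belief functions, that is, how to learn a belief measure from statistical data, and summarizes the most significant contributions in the area.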

Comments 9 pages, 0 figures

Abstract

Belief functions are a powerful and popular framework for the mathematical characterisation of uncertainty, in particular in situations in which lack of data renders learning a probability distribution for the problem impractical. The first step in a reasoning chain based on belief functions is inference: how to learn a belief measure from the available data. In this survey we focus, in particular, on making inference from statistical data, and review the most significant contributions in the area.

2605.07907 2026-05-11 stat.ML cs.CV cs.LG

Consistency Regularised Gradient Flows for Inverse Problems

Alessio Spagnoletti, Tim Y. J. Wang, Marcelo Pereyra, O. Deniz Akyildiz

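Affiliations * Laboratoire MAP5, UMR 8145, Université Paris Cité, CNRS; Department of Mathematics, Imperial College London; Heriot-Watt University, MACS & Maxwell Institute

AI summary This paper proposes a unified Euclidean-Wasserstein-2 gradient-flow framework that jointly performs posterior sampling and prompt optimization in latent space through a single flow, cutting computational cost while achieving state-of-the-art results on imaging inverse problems.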

Abstract

Vision-Language Latent Diffusion Models (LDMs) (Rombach et al., 2022) provide powerful generative priors for inverse problems. However, existing LDM-based inverse solvers typically require a large number of neural function evaluations (NFEs) and backpropagation through large pretrained components, leading to substantial computational costs and, in some cases, degraded reconstruction quality. We propose a unified Euclidean-Wasserstein-2 gradient-flow framework that jointly performs posterior sampling and prompt optimization in the latent space through a single flow that aligns the prior and posterior with the observed data. Combined with few-step latent text-to-image models, this formulation enables low-NFE inference without backpropagation through autoencoders. Experiments across several canonical imaging inverse problems show that our method achieves state-of-the-art performance with significantly reduced computational cost.

2605.07896 2026-05-11 cs.CY cs.AI

What if AI systems weren't chatbots?

Sourojit Ghosh, Pranav Narayanan Venkit, Sanjana Gautam, Avijit Ghosh

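Affiliations * University of Washington Seattle; Salesforce Research; Microsoft; Hugging Face and University of Connecticut

AI summary This paper examines the structural downsides of treating AI primarily as a conversational assistant, analyzes how the chatbot paradigm reshapes social, economic, legal, and environmental systems, and outlines alternative directions for AI development.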

Comments Accepted at The 2026 ACM Conference on Fairness, Accountability, and Transparency, June 25-28, 2026, Montreal, QC, Canada

Abstract

The rapid convergence of artificial intelligence (AI) toward conversational chatbot interfaces marks a critical moment for the industry. This paper argues that the chatbot paradigm is not a neutral interface choice, but a dominant sociotechnical configuration whose widespread adoption reshapes social, economic, legal, and environmental systems. We examine how treating AI primarily as conversational assistants has extensive structural downsides. We show how chatbot-based systems often fail to adequately meet user needs, particularly in complex or high-stakes contexts, while projecting confidence and authority. We further analyze how the normalization of chatbot-mediated interaction alters patterns of work, learning, and decision-making, contributing to deskilling, homogenization of knowledge, and shifting expectations of expertise. Finally, we examine broader societal effects, including labor displacement, concentration of economic power, and increased environmental costs driven by sustained investment in large-scale chatbot infrastructures. While acknowledging legitimate benefits, we argue that the current trajectory of AI development reflects specific value choices that prioritize conversational generality over domain specificity, accountability, and long-term social sustainability. We conclude by outlining alternative directions for AI development and governance that move beyond one-size-fits-all chatbots, emphasizing pluralistic system design, task-specific tools, and institutional safeguards to mitigate social and economic harm.

2605.07886 2026-05-11 stat.ML cs.LG

Characterizing and Correcting Effective Target Shift in Online Learning

Ziyan Li, Naoki Hiratani

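Affiliations * Department of Physics and Department of Neuroscience, Washington University in St. Louis

AI summary This paper characterizes the effective target shift in online learning, shows that target correction makes online kernel learning equivalent to its offline counterpart, and validates the approach on CIFAR-10 and CORe50 image classification.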

Comments 22 pages; 6 figures

Abstract

Online learning from a stream of data is a defining feature of intelligence, yet modern machine learning systems often struggle in this setting, especially under distributional shift. To understand its basic properties, we study the relationship between online and offline learning in the context of kernel regression. We derive a closed-form expression for the function learned by online kernel regression, revealing that online kernel regression is equivalent to offline regression with shifted, inaccurate target outputs. Conversely, we show that by compensating for this effective shift in the teaching signal through target correction, online kernel-based learning can provably learn the same predictor as its offline counterpart. We derive both a closed-form expression for this target correction and an iterative form that can be applied sequentially. Applying this framework to image classification tasks on CIFAR-10 and CORe50, we show that online stochastic gradient descent with iteratively corrected targets outperforms learning with the true targets in continual learning settings. This work therefore provides a basic framework for analyzing and improving online learning in non-stationary environments.