arXivDaily: arXiv daily academic digest, updated Monday through Friday

2506.13506 2026-06-18 cs.CV q-bio.NC Updated version

Stimulus Motion Perception Studies Imply Specific Neural Computations in Human Visual Stabilization

David W Arathorn, Josephine C. D'Angelo, Austin Roorda

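Affiliations: Montana State University, Dept of Electrical and Computer Engineering; University of California, Berkeley, Herbert Wertheim School of Optometry and Vision Science

AI summary: By analyzing the tiny jitter of the human eye during fixation, the paper finds that visual stabilization is more complex than camera stabilization or a simple evolutionary solution, and proposes a functional model based on specific operations on retinal signals together with a possible neural-circuit implementation.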

Abstract

Even during fixation the human eye is constantly in low amplitude motion, jittering over small angles in random directions at up to 100Hz. This motion results in all features of the image on the retina constantly traversing a number of cones, yet objects which are stable in the world are perceived to be stable, and any object which is moving in the world is perceived to be moving. A series of experiments carried out over a dozen years revealed the psychophysics of visual stabilization to be more nuanced than might be assumed, say, from the mechanics of stabilization of camera images, or what might be assumed to be the simplest solution from an evolutionary perspective. The psychophysics revealed by the experiments strongly implies a specific set of operations on retinal signals resulting in the observed stabilization behavior. The presentation is in two levels. First is a functional description of the action of the mechanism that is very likely responsible for the experimentally observed behavior. Second is a more speculative proposal of circuit-level neural elements that might implement the functional behavior.

2505.23851 2026-06-18 cs.CL cs.AI cs.SC Updated version

ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark

Michael Shalyt, Rotem Elimelech, Ido Kaminer

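Affiliations: MIT; Technion - Israel Institute of Technology

AI summary: Introduces the ASyMOB benchmark of 35,368 symbolic math problems; perturbation tests reveal that large models lack robustness in symbolic mathematical reasoning and point to complementary potential between LLMs and CAS.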

Comments Published in ICML2026: https://icml.cc/virtual/2026/poster/63549 Code repository: https://github.com/RamanujanMachine/ASyMOB Complete benchmark dataset: https://huggingface.co/datasets/Shalyt/ASyMOB-Algebraic_Symbolic_Mathematical_Operations_Benchmark

Abstract

Large language models (LLMs) are increasingly applied to symbolic mathematics, yet existing evaluations often conflate pattern memorization with genuine reasoning. To address this gap, we present ASyMOB, a high-resolution dataset of 35,368 validated symbolic math problems spanning integration, limits, differential equations, series, and hypergeometrics. Unlike prior benchmarks, ASyMOB systematically perturbs each seed problem using symbolic, numeric, and equivalence-preserving transformations, enabling a fine-grained assessment of generalization. Our evaluation reveals three key findings: (1) most models' performance collapses under minor perturbations, while top systems exhibit an apparent regime shift in robustness; (2) integrated code tools stabilize performance, particularly for weaker models; and (3) we identify examples where Computer Algebra Systems (CAS) fail while LLMs succeed, as well as problems solved only via a hybrid LLM-CAS approach, highlighting a promising integration frontier. ASyMOB serves as a principled diagnostic tool for measuring and accelerating progress toward building verifiable, trustworthy AI for scientific discovery.

2505.21954 2026-06-18 cs.CV cs.AI Updated version

Revisiting Active Speaker Detection: An In-the-Wild Benchmark for Generalization and Robustness

Le Thien Phuc Nguyen, Zhuoran Yu, Khoa Quang Nhat Cao, Yuwei Guo, Tu Ho Manh Pham, Tuan Tai Nguyen, Toan Ngo Duc Vo, Lucas Poon, Tuan Khai Nguyen, Soochahn Lee, Yong Jae Lee

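Affiliations: University of Wisconsin - Madison; Oregon State University; University of Sydney; Kookmin University

AI summary: Introduces the UniTalk dataset, covering challenging real-world conditions such as multiple languages, noisy backgrounds, and crowded scenes; evaluation shows that existing models underperform in the wild, while models trained on UniTalk generalize better, establishing a new benchmark for active speaker detection.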

Comments Accepted to Interspeech 2026

Abstract

We present UniTalk, a novel dataset emphasizing challenging scenarios to enhance model generalization for the task of active speaker detection (ASD). Previously established benchmarks such as AVA predominantly comprise old movies and thus exhibit significant domain gaps with real-world video. In contrast, UniTalk covers diverse video types reflecting challenging real-world conditions, including underrepresented languages, noisy backgrounds, and crowded scenes, while being on par with AVA in scale. Extensive evaluations reveal that ASD remains unsolved under realistic conditions: state-of-the-art models near-perfect on AVA fail to reach saturation on UniTalk. Conversely, models trained on UniTalk generalize better to modern in-the-wild datasets including Talkies and ASW. UniTalk thus establishes a new benchmark for ASD, providing researchers with a valuable resource for developing and evaluating versatile and resilient models.

2505.12369 2026-06-18 cs.AI cs.LG cs.LO Updated version

Fully Geometric Multi-Hop Reasoning on Knowledge Graphs with Transitive Relations

Fernando Zhapa-Camacho, Robert Hoehndorf

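Affiliations: KAUST Center of Excellence for Smart Health (KCSH); KAUST Center of Excellence for Generative AI

AI summary: Proposes GeometrE, which maps logical operations to purely geometric transformations and introduces a transitive loss function, improving multi-hop reasoning performance while preserving interpretability.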

Comments Accepted at ESWC 2026

Journal ref The Semantic Web. ESWC 2026. Lecture Notes in Computer Science, vol 16549. Springer, Cham (2026)

Abstract

Multi-hop logical reasoning on knowledge graphs requires faithfully mapping the logical semantics to latent space. Current geometric embedding methods have proven useful for this task by mapping entities to geometric regions and logical operations to latent transformations. While a geometric embedding can provide a direct interpretability framework for query answering, current methods have only leveraged the geometric construction of entities, failing to map logical operations to pure geometric transformations and, instead, using neural components to learn these operations. On the other hand, purely neural-based methods outperform geometric methods, but they lack interpretability in the latent space. We introduce GeometrE, a geometric embedding method for multi-hop reasoning that maps every logical operation to a purely geometric operation in the latent space. Additionally, we introduce a transitive loss function and show that, unlike existing methods, it can preserve the logical rule for all a,b,c: r(a,b) and r(b,c) -> r(a,c). Our experiments show that GeometrE outperforms current state-of-the-art geometric methods and remains competitive with existing neural-based methods on standard benchmark datasets.
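
The transitivity rule quoted in the abstract, r(a,b) and r(b,c) -> r(a,c), can be made concrete with a small checker. The sketch below is not GeometrE and says nothing about its loss function; it only lists, for a set of predicted edges of a single relation, which implied edges are missing. The function name and the toy entities are hypothetical.

```python
def transitivity_violations(edges):
    """Return the pairs (a, c) implied by r(a, b) and r(b, c) but absent from `edges`."""
    edge_set = set(edges)
    # Index outgoing edges so each two-hop path is found without a full scan.
    successors = {}
    for a, b in edge_set:
        successors.setdefault(a, set()).add(b)
    missing = set()
    for a, b in edge_set:
        for c in successors.get(b, ()):
            if (a, c) not in edge_set:
                missing.add((a, c))
    return missing


# Toy predictions for one relation (hypothetical entity names).
predicted = [("cell", "tissue"), ("tissue", "organ")]
print(transitivity_violations(predicted))  # {('cell', 'organ')}

closed = predicted + [("cell", "organ")]
print(transitivity_violations(closed))  # set()
```

A model that preserves the rule by construction would never produce a non-empty result here, whereas a model that only approximates it may.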

2411.16206 2026-06-18 cs.LG cs.AI cs.NE Updated version

Scalable Batch Bayesian Optimization Via Subspace Acquisition Functions

Dawei Zhan, Zhaoxi Zeng, Shuoxiao Wei, Ping Wu

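Affiliations: School of Computing and Artificial Intelligence

AI summary: Scales Bayesian optimization to large batch evaluations by selecting one point from each of a set of axis-aligned subspaces of the original problem; convergence is markedly faster, and the method is highly competitive against ten batch algorithms.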

Journal ref ACM Transactions on Evolutionary Learning and Optimization, 2026

Abstract

Extending Bayesian optimization to batch evaluation can enable the designer to make the most of parallel computing technology. However, most current batch approaches do not scale well with the batch size; that is, their optimization efficiency often deteriorates as the batch size increases. To address this issue, we propose a simple and efficient approach to extend Bayesian optimization to large-scale batch evaluation. Unlike existing batch approaches, the new approach draws a batch of axis-aligned subspaces of the original problem and selects one point from each subspace using existing acquisition functions. Numerical experiments show that our proposed approach speeds up convergence significantly compared with the sequential Bayesian optimization algorithm, and performs very competitively compared with ten batch Bayesian optimization algorithms. The implementation of our proposed approach is available at https://github.com/zhandawei/SubSpace_Acquisition_Functions.
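
The batch-selection idea in the abstract (one point per axis-aligned subspace) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: coordinates outside the active subspace are frozen at the incumbent (current best) point, the acquisition function is maximized by plain random search, and `select_batch`, `toy_acquisition`, and all parameter names are hypothetical.

```python
import random


def select_batch(acquisition, incumbent, bounds, batch_size, subspace_dim,
                 n_candidates=200, seed=0):
    """Pick one point per randomly drawn axis-aligned subspace.

    acquisition: callable scoring a point (list of floats); higher is better.
    incumbent: point whose coordinates are kept outside the subspace (assumption).
    bounds: list of (low, high) pairs, one per dimension.
    """
    rng = random.Random(seed)
    dim = len(incumbent)
    batch = []
    for _ in range(batch_size):
        # Draw the coordinates that define this axis-aligned subspace.
        active = sorted(rng.sample(range(dim), subspace_dim))
        best_point = list(incumbent)
        best_score = float("-inf")
        # Random search restricted to the active coordinates (assumption).
        for _ in range(n_candidates):
            candidate = list(incumbent)
            for i in active:
                low, high = bounds[i]
                candidate[i] = rng.uniform(low, high)
            score = acquisition(candidate)
            if score > best_score:
                best_point, best_score = candidate, score
        batch.append((active, best_point))
    return batch


def toy_acquisition(x):
    """Stand-in for a real acquisition function such as expected improvement."""
    return -sum((v - 0.5) ** 2 for v in x)


incumbent = [0.0] * 5
bounds = [(-1.0, 1.0)] * 5
batch = select_batch(toy_acquisition, incumbent, bounds, batch_size=4, subspace_dim=2)
for active, point in batch:
    print(active, [round(v, 2) for v in point])
```

Because each inner search touches only `subspace_dim` coordinates, its cost does not grow with the full dimension, and the subspaces can be processed in parallel.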

2504.14798 2026-06-18 cs.LG cs.CV Updated version

RUB: Evaluating Residual Knowledge in Unlearned Models

Hao Xuan, Xingyu Li

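Affiliations: Electrical and Computer Engineering, University of Alberta

AI summary: Proposes the principle of robust unlearning and a unified benchmark, RUB, which uses the Unlearning Mapping Attack (UMA) to detect residual information, revealing that existing methods are vulnerable under adversarial evaluation.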

Journal ref Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, June 2026, pages 8550-8559

Abstract

Machine Unlearning (MUL) has emerged as a key mechanism for privacy protection and content regulation, yet current techniques often fail to guarantee the complete removal of sensitive information. While most existing works focus on verifying the execution of unlearning, they overlook the critical question of whether models remain robust against adversarial attempts to recover forgotten knowledge. In this work, we advocate for the principle of Robust Unlearning, which requires models to be both indistinguishable from retrained counterparts and resilient against diverse adversarial threats. To instantiate this principle, we propose a unified benchmark, RUB (Robust Unlearning Benchmark), that systematically evaluates the robustness of unlearning algorithms across classification, image-to-image reconstruction, and text-to-image synthesis. Within this framework, we introduce the Unlearning Mapping Attack (UMA) as a generalizable method to detect residual information, and demonstrate how existing attack strategies can be adapted into this framework as long as they conform to the generic UMA framework. Our experiments across discriminative and generative tasks reveal that state-of-the-art unlearning methods remain vulnerable under these evaluations, even when passing standard verification metrics. By positioning robustness as the central criterion and providing a benchmark for adversarial evaluation, we hope RUB paves the way toward more reliable and secure unlearning practices. The codebase and model checkpoints in RUB will be published.

2503.08895 2026-06-18 cs.RO Updated version

Mutual Adaptation in Human-Robot Co-Transportation with Human Preference Uncertainty

Al Jaber Mahmud, Weizi Li, Xuan Wang

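Affiliations: George Mason University; University of California, Riverside

AI summary: To address uncertain human preference parameters and the balancing of adaptation strategies in human-robot co-transportation, proposes a unified framework that models a probability distribution over preferences, a time-varying stubbornness measure, and a coordinated planning model, combined with a pose optimization strategy, achieving mutual adaptation that improves task performance.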

Comments 9 pages, 6 figures

Abstract

Mutual adaptation can enhance overall task performance in human-robot co-transportation by integrating both the robot's and the human's understanding of the environment. While human modeling helps capture humans' subjective preferences, two challenges persist: (i) the uncertainty of human preference parameters and (ii) the need to balance adaptation strategies that benefit both humans and robots. In this paper, we propose a unified framework to address these challenges and improve task performance through mutual adaptation. First, instead of relying on fixed parameters, we model a probability distribution of human choices by incorporating a range of uncertain human preference parameters. Building on this, we introduce a time-varying stubbornness measure and a coordinated planning model, which allows either the robot to lead the team's trajectory or, if a human's preferred path conflicts with the robot's plan and their stubbornness exceeds a threshold, the robot to transition to following the human. Finally, we introduce a pose optimization strategy for low-level control to mitigate the uncertain human behaviors when they are leading. To validate the framework, we design and perform a study with human feedback from twenty human participants. We then demonstrate, through simulations, the effectiveness of our models in enhancing task performance with mutual adaptation and pose optimization.

2503.08038 2026-06-18 cs.LG cs.AI cs.CV Updated version

Generalized Kullback-Leibler Divergence Loss

Jiequan Cui, Beier Zhu, Qingshan Xu, Zhuotao Tian, Xiaojuan Qi, Bei Yu, Hanwang Zhang, Richang Hong

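Affiliations: Hefei University of Technology; University of Science and Technology of China; Nanyang Technological University; The Chinese University of Hong Kong; The University of Hong Kong; Harbin Institute of Technology, Shenzhen

AI summary: Proposes the generalized KL divergence loss by decoupling the KL loss into a weighted MSE loss and a cross-entropy loss, then correcting its asymmetric optimization and introducing class-wise global information; achieves state-of-the-art performance in adversarial training and knowledge distillation.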

Comments TPAMI 2026, extension of our NeurIPS paper "Decoupled Kullback-Leibler Divergence Loss". arXiv admin note: substantial text overlap with arXiv:2305.13948

Abstract

In this paper, we delve deeper into the Kullback-Leibler (KL) Divergence loss and mathematically prove that it is equivalent to the Decoupled Kullback-Leibler (DKL) Divergence loss that consists of (1) a weighted Mean Square Error (wMSE) loss and (2) a Cross-Entropy loss incorporating soft labels. Thanks to the decoupled structure of DKL loss, we have identified two areas for improvement. Firstly, we address the limitation of KL loss in scenarios like knowledge distillation by breaking its asymmetric optimization property and introducing a smoother weight function. This modification effectively alleviates convergence challenges in optimization, particularly for classes with high predicted scores in soft labels. Secondly, we introduce class-wise global information into KL/DKL to reduce bias arising from individual samples. With these two enhancements, we derive the Generalized Kullback-Leibler (GKL) Divergence loss and evaluate its effectiveness by conducting experiments on CIFAR-10/100, ImageNet, and vision-language datasets, focusing on adversarial training and knowledge distillation tasks. Specifically, we achieve new state-of-the-art adversarial robustness on the public RobustBench leaderboard and competitive knowledge distillation performance across CIFAR/ImageNet models and CLIP models, demonstrating substantial practical merits. Our code is available at https://github.com/jiequancui/DKL.
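
As background for the cross-entropy-with-soft-labels term mentioned above, the sketch below numerically checks the textbook identity KL(p || q) = CE(p, q) - H(p). This is only the standard identity, not the paper's DKL decomposition (the wMSE term is not derived here); the `teacher` and `student` distributions are made-up values.

```python
import math


def entropy(p):
    """Shannon entropy H(p) in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def cross_entropy(p, q):
    """Cross-entropy CE(p, q) with p acting as soft labels."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


teacher = [0.7, 0.2, 0.1]  # hypothetical soft labels
student = [0.5, 0.3, 0.2]  # hypothetical student prediction

kl = kl_divergence(teacher, student)
gap = kl - (cross_entropy(teacher, student) - entropy(teacher))
print(f"KL = {kl:.4f}, identity gap = {gap:.2e}")
```

When the teacher distribution is fixed, H(p) is constant, so minimizing the KL loss and minimizing the soft-label cross-entropy give the same gradients with respect to the student.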

2503.04989 2026-06-18 cs.CL Updated version

Application of integrated gradients explainability to sociopsychological semantic markers

Ali Aghababaei, Jan Nikadon, Magdalena Formanowicz, Maria Laura Bettinsoli, Carmen Cervone, Caterina Suitner, Tomaso Erseghe

Affiliations: Department of Information Engineering, University of Padova; Center for Research on Social Relations, University of Social Sciences and Humanities (SWPS); Department of Cognitive Science, Nicolaus Copernicus University in Toruń; Interdisciplinary Centre for Modern Technologies, Nicolaus Copernicus University in Toruń; Department of Developmental Psychology and Socialization, University of Padova

AI summary: Uses the integrated gradients method to explain text classification outputs at the word level, focusing on sociopsychological markers such as agency; tests on models such as BERTAgent confirm that the method identifies key words effectively even with limited labeled data.

Comments Submitted to IEEE Trans. on Computational Social Systems

Abstract

Classification of textual data in terms of sentiment, or more nuanced sociopsychological markers (e.g., agency), is now a popular approach commonly applied at the sentence level. In this paper, we exploit the integrated gradient (IG) method to capture the classification output at the word level, revealing which words actually contribute to the classification process. This approach improves explainability and provides in-depth insights into the text. We focus on sociopsychological markers beyond sentiment and investigate how to effectively train IG in agency, one of the very few markers for which a verified deep learning classifier, BERTAgent, is currently available. Performance and system parameters are carefully tested, alternatives to the IG approach are evaluated, and the usefulness of the result is verified in a relevant application scenario. The method is also applied in a scenario where only a small labeled dataset is available, with the aim of exploiting IG to identify the salient words that contribute to building the different classes that relate to relevant sociopsychological markers. To achieve this, an uncommon training procedure that encourages overfitting is employed to enhance the distinctiveness of each class. The results are analyzed through the lens of social psychology, offering valuable insights.
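
For readers unfamiliar with integrated gradients, the sketch below shows the core computation on a toy scoring function: each feature's attribution is its distance from a baseline times the average gradient along the straight path from baseline to input. It is a minimal illustration, not the paper's pipeline and not BERTAgent; `toy_score`, the numeric gradient, and the midpoint Riemann sum are assumptions made to keep the example dependency-free.

```python
def numeric_grad(f, x, eps=1e-6):
    """Central-difference gradient of a scalar function f at point x."""
    grad = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((f(up) - f(down)) / (2 * eps))
    return grad


def integrated_gradients(f, x, baseline, steps=200):
    """Approximate IG attributions with a midpoint Riemann sum over the path."""
    totals = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        for i, g in enumerate(numeric_grad(f, point)):
            totals[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, totals)]


def toy_score(x):
    """Hypothetical classifier score over three input features."""
    return x[0] * x[1] + 3.0 * x[2]


x = [1.0, 2.0, 0.5]
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(toy_score, x, baseline)
print([round(a, 3) for a in attributions])
print(round(sum(attributions), 3), toy_score(x) - toy_score(baseline))
```

The last line illustrates the completeness property: the attributions sum (up to numerical error) to the difference between the score at the input and at the baseline. In a text setting the features would be token embeddings, and per-word attributions are typically obtained by summing over embedding dimensions.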

2412.16468 2026-06-18 cs.LG Updated version

The Road to Artificial SuperIntelligence: A Comprehensive Survey of Superalignment

HyunJin Kim, DongHyun Ryu, Xiaoyuan Yi, Jing Yao, Jianxun Lian, Muhua Huang, Shitong Duan, JinYeong Bak, Xing Xie

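Affiliations: Microsoft Research Asia; Sungkyunkwan University; Stanford University; Fudan University

AI summary: Surveys the superalignment problem, analyzing scalable oversight paradigms (sandwiching, self-enhancement, and weak-to-strong generalization) and their limitations, and discussing the challenges of and pathways toward supervising, controlling, and governing artificial superintelligence.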

Comments 24 pages

Abstract

The emergence of large language models (LLMs) has sparked discussion on Artificial Superintelligence (ASI), a hypothetical AI system that surpasses human intelligence. Although ASI remains hypothetical and far beyond current AI capabilities, discussing its potential and exploring its feasibility and potential risks is critical for the development of future AI systems. The idea of superalignment originates from scalable oversight, which studies how to supervise increasingly capable AI systems when direct human supervision becomes insufficient. In this paper, we focus on the superalignment problem: "The process of supervising, controlling, and governing artificial superintelligence." We first review scalable oversight paradigms (Sandwiching, Self-Enhancement, and Weak-to-Strong Generalization), then analyze the limitations of current paradigms through the lens of possibility and impossibility, discuss key challenges, and propose pathways for the safe and continual improvement of future AI systems.

2407.18245 2026-06-18 cs.CV cs.LG Updated version

VGGHeads: 3D Multi Head Alignment with a Large-Scale Synthetic Dataset

Orest Kupyn, Eugene Khvedchenia, Christian Rupprecht

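Affiliations: University of Oxford; Piñata Farms; Ukrainian Catholic University

AI summary: Introduces VGGHeads, a large-scale synthetic dataset generated with diffusion models, used for simultaneous single-step head detection and 3D mesh reconstruction, with strong performance on real images.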

Abstract

Human head detection, keypoint estimation, and 3D head model fitting are essential tasks with many applications. However, traditional real-world datasets often suffer from bias, privacy, and ethical concerns, and they have been recorded in laboratory environments, which makes it difficult for trained models to generalize. Here, we introduce VGGHeads, a large-scale synthetic dataset generated with diffusion models for human head detection and 3D mesh estimation. Our dataset comprises over 1 million high-resolution images, each annotated with detailed 3D head meshes, facial landmarks, and bounding boxes. Using this dataset, we introduce a new model architecture capable of simultaneous head detection and head mesh reconstruction from a single image in a single step. Through extensive experimental evaluations, we demonstrate that models trained on our synthetic data achieve strong performance on real images. Furthermore, the versatility of our dataset makes it applicable across a broad spectrum of tasks, offering a general and comprehensive representation of human heads.

2408.01526 2026-06-18 cs.CV Updated version

Recognizing and Reconstructing a Multi-Unit Floor Plan

Lukas Kratochvila, Gijs de Jong, Monique Arkesteijn, Simon Bilik, Tomas Zemcik, Karel Horak, Jan S. Rellermeyer

Affiliations: Department of Control and Instrumentation, Brno University of Technology, Brno, Czech Republic; Department of Software Technology, Faculty of Electrical Engineering, Mathematics and Computer Science, TU Delft, Delft, Netherlands; Department of Management in the Built Environment, Faculty of Architecture and the Built Environment, TU Delft, Delft, Netherlands; Institute for Research and Applications of Fuzzy Modeling, University of Ostrava, Ostrava, Czech Republic, and Department of Informatics, Mendel University in Brno, Brno, Czech Republic; Department of Software Technology, Faculty of Electrical Engineering, Mathematics and Computer Science, TU Delft, Delft, Netherlands, and Dependable and Scalable Software Systems, Institute of Systems Engineering, Faculty of Electrical Engineering and Computer Science, Leibniz University Hannover, Hannover, Germany

AI summary: Proposes pixel-wise segmentation methods based on MDA-Unet and MACU-Net with improved skip connections and an attention mechanism, reconstructing 3D models from 2D floor plans; achieves a mean F1 score of 0.86 on the CubiCasa dataset.

Abstract

Digital twins have major potential to form a significant part of urban management in emergency planning, as they allow more efficient design of escape routes, better orientation in exceptional situations, and faster rescue intervention. Nevertheless, creating the twins remains a largely manual effort, due to a lack of 3D representations, which are available only in limited amounts for some new buildings. Thus, in this paper we aim to synthesize 3D information from commonly available 2D architectural floor plans. We propose two novel pixel-wise segmentation methods based on the MDA-Unet and MACU-Net architectures with improved skip connections, an attention mechanism, and a training objective, together with a reconstruction part of the pipeline, which vectorizes the segmented plans to create a 3D model. The proposed methods are compared with two other state-of-the-art techniques on several benchmark datasets. On the commonly used CubiCasa benchmark dataset, our methods achieve a mean F1 score of 0.86 over five examined classes, outperforming the other pixel-wise approaches tested. We have also made our code publicly available to support research in the field.

2406.18215 2026-06-18 cs.CV Updated version

Optimizing Incomplete, Large-Scale and Sparse Multi-Graph Matching in Bioimaging

Max Kahl, Sebastian Stricker, Lisa Hutschenreiter, Florian Bernard, Carsten Rother, Bogdan Savchynskyy

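Affiliations: Heidelberg University; Max Planck Institute for Informatics; University of Bonn

AI summary: For large-scale sparse multi-graph matching in bioimaging, proposes a sparse permutation synchronization paradigm and a general method, GREEDA, which outperforms existing methods in objective value and runtime.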

Abstract

Multi-graph matching is a fundamental problem in computer vision. Our work is motivated by a challenging application in bioimaging, where dozens or even hundreds of 3D microscopy images of worms must be brought into correspondence. Existing datasets do not cover this large-scale regime, and virtually all existing methods are inapplicable because they assume a complete or dense problem setting. To support further research, our first contribution is a new large-scale dataset based on problem instances from bioimaging. Our second contribution is a comprehensive analysis of the two main multi-graph matching paradigms: direct and permutation synchronization-based formulations. We argue, in part by proof, that practical large-scale methods must explicitly address problem sparsity and incompleteness. Since standard permutation synchronization approaches fail in this setting, we further introduce a sparse permutation synchronization paradigm. Our final contribution is GREEDA, a general method for sparse and incomplete problems that can be instantiated across cost orders and paradigms. While our paper focuses on objective functions up to quadratic order, GREEDA is inherently generalizable to arbitrary orders. On larger, sparse instances, GREEDA outperforms competing methods in both objective value and runtime. For example, for moderately-sized problems based on 30 worm images GREEDA produces a high-quality solution within 2 minutes, whereas competitors require at least half an hour and yield far worse results. On smaller dense problems, GREEDA remains on par with leading methods while being an order of magnitude faster.

2402.08128 2026-06-18 cs.AI cs.GT Updated version

Recursive Joint Simulation in Games

Vojtech Kovarik, Caspar Oesterheld, Vincent Conitzer

Affiliations: Foundations of Cooperative AI Lab (FOCAL), Computer Science Department; Carnegie Mellon University; AI Center; Czech Technical University; Center for Theoretical Study; Charles University

AI summary: Studies how AI agents can achieve cooperation through recursive joint simulation, proving that the process is equivalent to an infinitely repeated version of the original game, so that existing results such as the folk theorems apply directly.

Abstract

Game-theoretic dynamics between AI agents could differ from traditional human-human interactions in various ways. One such difference is that it may be possible to accurately simulate an AI agent, for example because its source code is known. Such an agent would then be fundamentally uncertain whether it is in the real world or in a simulation. Our aim is to explore ways of leveraging this possibility to achieve more cooperative outcomes in strategic settings. In this paper, we study an interaction between AI agents where the agents run a recursive joint simulation. That is, the agents first jointly observe a simulation of the situation they face. This simulation in turn recursively includes additional simulations (with a small chance of failure, to avoid infinite recursion), and the results of all these nested simulations are observed before an action is chosen. We show that the resulting interaction is strategically equivalent to an infinitely repeated version of the original game, allowing a direct transfer of existing results such as the various folk theorems. As evidence that the equivalence is robust, we show that it holds even when we relax some of the assumptions and that it also holds "from the inside", that is, for an agent that finds itself inside the game and has self-locating uncertainty.
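
The role of the "small chance of failure" can be illustrated with a short simulation. The sketch below assumes that each level independently fails to spawn a further nested simulation with a fixed probability, so the nesting depth is geometrically distributed; this mirrors a repeated game with a fixed continuation probability, but the exact correspondence used in the paper is not stated in the abstract, and all names here are hypothetical.

```python
import random


def nested_depth(failure_prob, rng):
    """Count nested simulations created before one fails to be created."""
    depth = 0
    while rng.random() >= failure_prob:
        depth += 1
    return depth


def expected_depth(failure_prob):
    """Mean of the geometric depth distribution: (1 - p) / p."""
    return (1.0 - failure_prob) / failure_prob


rng = random.Random(0)
failure_prob = 0.2
samples = [nested_depth(failure_prob, rng) for _ in range(20000)]
mean_depth = sum(samples) / len(samples)
print(round(mean_depth, 2), expected_depth(failure_prob))
```

Any positive failure probability makes the recursion terminate with probability 1, while a smaller one yields deeper nesting, much as a continuation probability closer to 1 yields a longer expected horizon in a repeated game.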

2310.05753 2026-06-18 cs.AI Updated version

Large-Scale OD Matrix Estimation with A Deep Learning Method

Zheli Xiong, Defu Lian, Enhong Chen, Gang Chen, Xiaomin Cheng

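Affiliations: IEEE Publication Technology Group

AI summary: Proposes a method combining deep learning with numerical optimization that infers structural constraints from probe traffic flows, enabling real-time estimation of large-scale OD matrices without prior information and with good generalization.

Comments 12 pages, 25 figures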

Abstract

The estimation of origin-destination (OD) matrices is a crucial aspect of Intelligent Transport Systems (ITS). It involves adjusting an initial OD matrix by regressing on current observations such as traffic counts of road sections (e.g., using least squares). However, the OD estimation problem lacks sufficient constraints and is mathematically underdetermined. To alleviate this problem, some researchers incorporate a prior OD matrix as a target in the regression to provide more structural constraints. However, this approach is highly dependent on the existing prior matrix, which may be outdated. Others add structural constraints through sensor data, such as vehicle trajectory and speed, which can reflect more current structural constraints in real time. Our proposed method integrates deep learning and numerical optimization algorithms to infer matrix structure and guide numerical optimization. This approach combines the advantages of both deep learning and numerical optimization algorithms. The neural network (NN) learns to infer structural constraints from probe traffic flows, eliminating dependence on prior information and providing real-time performance. Additionally, due to the generalization capability of the NN, this method is economical in engineering. We conducted tests to demonstrate the good generalization performance of our method on a large-scale synthetic dataset. Subsequently, we verified the stability of our method on real traffic data. Our experiments confirmed the benefits of combining NN and numerical optimization.
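
To see why the problem is underdetermined, consider a toy network with three OD pairs observed through only two link counters. The sketch below uses a made-up assignment matrix (which OD pairs traverse which link) and shows two different OD vectors that produce identical counts; the numbers and names are illustrative only.

```python
def link_counts(assignment, od_flows):
    """Traffic count on each link: sum of the OD flows routed over it."""
    return [sum(a * x for a, x in zip(row, od_flows)) for row in assignment]


# Rows are links, columns are OD pairs; 1 means the OD pair uses the link.
assignment = [
    [1, 1, 0],  # link 1 carries OD pairs 0 and 1
    [0, 1, 1],  # link 2 carries OD pairs 1 and 2
]

od_a = [10, 5, 20]
od_b = [12, 3, 22]  # od_a shifted along the direction (2, -2, 2)

print(link_counts(assignment, od_a))  # [15, 25]
print(link_counts(assignment, od_b))  # [15, 25]
```

Least squares on the counts alone cannot distinguish `od_a` from `od_b`, which is why extra structural constraints (a prior matrix, or constraints inferred from probe data) are needed to pick one solution.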

2204.14224 2026-06-18 cs.CV cs.LG eess.IV 版本更新

Investigation of Neural Network Methods for Reconstruction and Classification of Texture Images Under Conditions of Incomplete Information

不完全信息条件下纹理图像重建与分类的神经网络方法研究

Galymzhan Abdimanap, Kairat Bostanbekov, Abdelrahman Abdallah, Anel Alimova, Darkhan Kurmangaliyev, Daniyar Nurseitov, Tatyana Dedova, Larissa Balakay, Serik Nurakynov

发表机构 * Satbayev University(萨特巴耶夫大学) Institute of Ionosphere LLP(电离层研究所) Information Technology Department(信息技术部门) Assiut University(阿西乌特大学)

AI总结 提出结合目标检测、GAN(CRA)修复和Transformer/CNN分类的端到端框架,发现重建质量高(PSNR 28.7dB)但分类准确率仅53%,通过置信度混合集成将MCA从48%提升至58%,揭示生成模型产生语义模糊特征的问题。

Comments IEEE ACCESS

详情
AI中文摘要

异质自然纹理的自动化分析常因物理损伤和数据丢失而受阻,这对计算机视觉构成了重大挑战。虽然深度学习在受控环境中已显示出成功,但其在信息不完全条件下对复杂地质材料的应用仍未被充分探索。本研究提出了一个用于高分辨率岩心样本图像修复和分类的集成框架。我们设计了一个端到端流水线,利用目标检测进行样本分割,随后使用具有上下文残差聚合(CRA)的生成对抗网络(GAN)进行图像修复,以重建缺失的高频细节。接着,我们在重建数据上评估了现代基于Transformer(Swin、ViT)和CNN架构的性能。实验揭示了重建质量与下游效用之间的关键分歧:尽管结构保真度高(PSNR 28.7 dB,FID 74.01),分类准确率却停滞在53%。为了改善少数类检测,我们提出了一种基于置信度的混合集成方法,将MCA从48%提升至58%。这些结果凸显了当前最先进生成模型的局限性,它们可能产生视觉上合理但语义模糊的特征(“幻觉”),从而混淆分类器。本工作深入探讨了图像重建质量与分类性能之间的依赖关系,为无损检测和材料科学领域的未来研究提供了可复现的基线。鉴于井间准确率仍处于49-53%范围,我们将所得到的系统定位为岩相解释的决策支持和筛选工具,而非完全自主的分类器。代码可在以下网址获取:https://github.com/GalymzhanAbdimanap/Lithology_recognition

英文摘要

The automated analysis of heterogeneous natural textures is frequently hindered by physical damage and data loss, presenting a significant challenge to computer vision. While deep learning has shown success in controlled environments, its application to complex geological materials under conditions of incomplete information remains underexplored. This study presents an integrated framework for the inpainting and classification of high-resolution core sample images. We propose an end-to-end pipeline that utilizes object detection for sample segmentation, followed by image inpainting using Generative Adversarial Networks (GANs) with Contextual Residual Aggregation (CRA) to reconstruct missing high-frequency details. Subsequently, we evaluate the performance of modern Transformer-based (Swin, ViT) and CNN architectures on the reconstructed data. Our experiments revealed a critical divergence between reconstruction quality and downstream utility: despite high structural fidelity (PSNR 28.7 dB, FID 74.01), classification accuracy plateaued at 53%. To improve minority-class detection, we propose a confidence-based hybrid ensemble that raises MCA from 48% to 58%. These results highlight the limitations of current state-of-the-art generative models, which may produce visually plausible but semantically ambiguous features ("hallucinations") that confound classifiers. This work provides insights into the dependencies between image reconstruction quality and classification performance, offering a reproducible baseline for future research in non-destructive testing and material science. Given that cross-well accuracy remains in the 49-53% range, we position the resulting system as a decision-support and screening tool for lithofacies interpretation rather than as a fully autonomous classifier. The code is available at https://github.com/GalymzhanAbdimanap/Lithology_recognition

2307.05623 2026-06-18 cs.LG cs.AI 版本更新

A DeepLearning Framework for Dynamic Estimation of Origin-Destination Sequence

一种用于动态估计起点-终点序列的深度学习框架

Zheli Xiong, Defu Lian, Enhong Chen, Gang Chen, Xiaomin Cheng

发表机构 * School of Data Science, University of Science and Technology of China(中国科学技术大学数据科学学院) Yangtze River Delta Information Intelligence Innovation Research Institute, China(长江三角洲信息智能创新研究院)

AI总结 针对OD矩阵估计中的欠定性和滞后性问题,提出集成深度学习方法,利用神经网络推断OD序列结构并引导数值优化,实验证明能有效提供时空约束。

Comments 11 pages,25 figures

详情
AI中文摘要

OD矩阵估计是交通领域的一个关键问题。主要方法利用交通传感器测量信息(如交通计数)来估计由OD矩阵表示的交通需求。该问题分为两类:静态OD矩阵估计和动态OD矩阵序列(简称OD序列)估计。上述两类都面临由大量待估参数和不足的约束信息引起的欠定性问题。此外,OD序列估计还面临滞后挑战:由于拥堵等不同交通状况,同一车辆在相同观测时段内会出现在不同路段,导致相同的OD需求对应不同的行程。为此,本文提出一种集成方法,利用深度学习方法推断OD序列的结构,并利用结构约束指导传统数值优化。实验表明,神经网络能有效推断OD序列的结构,并为数值优化提供实用的约束以获得更好的结果。此外,实验表明,所提供的结构信息不仅包含对OD矩阵空间结构的约束,还提供了对OD序列时间结构的约束,很好地解决了滞后问题的影响。

英文摘要

OD matrix estimation is a critical problem in the transportation domain. The principal method uses information measured by traffic sensors, such as traffic counts, to estimate the traffic demand represented by the OD matrix. The problem is divided into two categories: static OD matrix estimation and dynamic OD matrix sequence (OD sequence for short) estimation. Both face the underdetermination problem caused by the large number of parameters to estimate and insufficient constraint information. In addition, OD sequence estimation also faces the lag challenge: due to different traffic conditions such as congestion, the same vehicle will appear on different road sections during the same observation period, so that identical OD demands correspond to different trips. To this end, this paper proposes an integrated method, which uses deep learning methods to infer the structure of the OD sequence and uses structural constraints to guide traditional numerical optimization. Our experiments show that the neural network (NN) can effectively infer the structure of the OD sequence and provide practical constraints for numerical optimization to obtain better results. Moreover, the experiments show that the provided structural information contains constraints not only on the spatial structure of OD matrices but also on the temporal structure of the OD sequence, which mitigates the effect of the lag problem well.

2303.18031 2026-06-18 cs.CV cs.AI cs.LG 版本更新

Simple Domain Generalization Methods are Strong Baselines for Open Domain Generalization

简单域泛化方法是开放域泛化的强基线

Masashi Noguchi, Shinichi Shirakawa

发表机构 * Graduate School of Environment and Information Sciences(环境与信息科学研究生院) Yokohama National University(横滨国立大学) Faculty of Environment(环境学系)

AI总结 本文评估现有域泛化方法在开放域泛化中的表现,发现简单方法CORAL和MMD与复杂方法DAML竞争力相当,并通过集成学习和Dirichlet混合数据增强简单扩展后性能接近DAML且计算成本更低。

Comments Accepted at IJCNN 2024. The code used in the experiments is available at https://github.com/shiralab/OpenDG-Eval

详情
AI中文摘要

在现实应用中,机器学习模型需要处理开放集识别(OSR),即在推理过程中出现未知类别,同时还要处理域偏移,即训练和推理阶段数据分布不同。域泛化(DG)旨在处理推理阶段目标域在模型训练期间不可访问的域偏移情况。开放域泛化(ODG)同时考虑DG和OSR。域增强元学习(DAML)是一种针对ODG的方法,但其学习过程复杂。相比之下,尽管已提出多种DG方法,但它们尚未在ODG场景下进行评估。在本研究中,我们全面评估了现有DG方法在ODG中的表现,并表明两种简单的DG方法——相关对齐(CORAL)和最大均值差异(MMD)——在多种情况下与DAML具有竞争力。此外,我们通过引入DAML中使用的技术(如集成学习和Dirichlet混合数据增强)提出了CORAL和MMD的简单扩展。实验评估表明,扩展后的CORAL和MMD可以以较低的计算成本达到与DAML相当的性能。这表明简单的DG方法及其简单扩展是ODG的强基线。

英文摘要

In real-world applications, a machine learning model is required to handle an open-set recognition (OSR), where unknown classes appear during the inference, in addition to a domain shift, where the data distribution differs between the training and inference phases. Domain generalization (DG) aims to handle the domain shift situation where the target domain of the inference phase is inaccessible during the model training. Open domain generalization (ODG) considers DG and OSR. Domain-augmented meta-learning (DAML) is a method targeting ODG; however, it has a complicated learning process. By contrast, although various DG methods have been proposed, they have not been evaluated in ODG situations. In this study, we comprehensively evaluate the existing DG methods in ODG and show that the two simple DG methods, CORrelation ALignment (CORAL) and maximum mean discrepancy (MMD), are competitive with DAML in several cases. In addition, we propose simple extensions of CORAL and MMD by introducing the techniques used in DAML, such as ensemble learning and Dirichlet mixup data augmentation. The experimental evaluation demonstrates that the extended CORAL and MMD can perform comparably to DAML with lower computational costs. This suggests that the simple DG methods and their simple extensions are strong baselines for ODG.
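The abstract names CORAL without stating its objective. As a minimal sketch, assuming the commonly used Deep CORAL formulation (squared Frobenius distance between source and target feature covariances, scaled by 1/(4d^2)), the loss can be written in plain Python as follows. The function names and the toy feature lists are illustrative and are not taken from the paper or its repository.

```python
def covariance(rows):
    """Unbiased sample covariance matrix of a list of feature vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [
        [sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
        for a in range(d)
    ]


def coral_loss(source, target):
    """Squared Frobenius distance between covariances, scaled by 1 / (4 d^2)."""
    d = len(source[0])
    cs, ct = covariance(source), covariance(target)
    sq = sum((cs[a][b] - ct[a][b]) ** 2 for a in range(d) for b in range(d))
    return sq / (4 * d * d)


# Hypothetical 2-D features from two domains.
source_feats = [[0.0, 0.0], [2.0, 0.0]]
target_feats = [[0.0, 0.0], [0.0, 2.0]]
loss = coral_loss(source_feats, target_feats)
```

In a domain generalization setting this term would typically be summed over pairs of training domains and added to the classification loss. How the paper weights it is not stated in the abstract.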

2604.14837 2026-06-18 cs.CV

Improved Multiscale Structural Mapping with Supervertex Vision Transformer for the Detection of Alzheimer's Disease Neurodegeneration

改进的多尺度结构映射与超顶点视觉Transformer用于阿尔茨海默病神经退行性病变的检测

Geonwoo Baek, David H. Salat, Ikbeom Jang

发表机构 * Department of Computer Science & Engineering, Hankuk University of Foreign Studies, Seoul, Republic of Korea Athinoula A. Martinos Center for Biomedical Imaging, Department of Radiology, Massachusetts General Hospital, Charlestown, MA, USA Department of Radiology, Harvard Medical School, Boston, MA, USA Neuroimaging Research for Veterans (NeRVe) Center, VA Boston Healthcare System, Boston, MA, USA

AI总结 本文提出MSSM+结合SSVM和SV-ViT,通过多尺度结构映射和超顶点映射提高阿尔茨海默病早期检测的准确性,实现了更显著的组间差异识别和分类性能提升。

Comments Submitted to Human Brain Mapping

Journal ref Human Brain Mapping 47(8), e70548 (2026)

详情
AI中文摘要

阿尔茨海默病(AD)的确诊通常依赖于正电子发射断层扫描(PET)或脑脊液(CSF)分析,这些方法成本高且具有侵入性。因此,皮层厚度(CT)等结构MRI生物标志物被广泛用于非侵入性AD筛查。多尺度结构映射(MSSM)最近被提出,用于从单次T1加权MRI(T1w)扫描中整合灰白质对比度(GWCs)与CT。在此框架基础上,我们提出了MSSM+,并结合表面超顶点映射(SSVM)和超顶点视觉Transformer(SV-ViT)。我们分析了AD患者和认知正常(CN)对照者的3D T1w图像。MSSM+通过在顶点层面纳入脑沟深度和皮层曲率扩展了MSSM。SSVM将皮层表面划分为超顶点(表面块),有效表示区域间和区域内的空间关系。SV-ViT是一种在这些超顶点上运行的视觉Transformer架构,能够从表面网格表示中进行融入解剖学信息的学习。与MSSM相比,MSSM+在AD与CN之间识别出了空间范围更广且具有统计显著性的组间差异。在AD与CN的分类中,MSSM+的精确率-召回率曲线下面积比MSSM高3个百分点。针对特定厂商的分析进一步表明,相对于CT、GWCs和MSSM,信号变异性降低,且分类性能在不同MR制造商之间一致提升。这些发现表明,结合SV-ViT的MSSM+是一种有前景的基于MRI的影像标志物,可用于在CSF/PET确认之前检测AD。

英文摘要

Alzheimer's disease (AD) confirmation often relies on positron emission tomography (PET) or cerebrospinal fluid (CSF) analysis, which are costly and invasive. Consequently, structural MRI biomarkers such as cortical thickness (CT) are widely used for non-invasive AD screening. Multiscale structural mapping (MSSM) was recently proposed to integrate gray-white matter contrasts (GWCs) with CT from a single T1-weighted MRI (T1w) scan. Building on this framework, we propose MSSM+, together with surface supervertex mapping (SSVM) and a Supervertex Vision Transformer (SV-ViT). 3D T1w images from individuals with AD and cognitively normal (CN) controls were analyzed. MSSM+ extends MSSM by incorporating sulcal depth and cortical curvature at the vertex level. SSVM partitions the cortical surface into supervertices (surface patches) that effectively represent inter- and intra-regional spatial relationships. SV-ViT is a Vision Transformer architecture operating on these supervertices, enabling anatomically informed learning from surface mesh representations. Compared with MSSM, MSSM+ identified more spatially extensive and statistically significant group differences between AD and CN. In AD vs. CN classification, MSSM+ achieved a 3%p higher area under the precision-recall curve than MSSM. Vendor-specific analyses further demonstrated reduced signal variability and consistently improved classification performance across MR manufacturers relative to CT, GWCs, and MSSM. These findings suggest that MSSM+ combined with SV-ViT is a promising MRI-based imaging marker for AD detection prior to CSF/PET confirmation.

2602.02370 2026-06-18 cs.CV

Uncertainty-Aware Image Classification In Biomedical Imaging Using Spectral-normalized Neural Gaussian Processes

利用谱归一化神经高斯过程进行生物医学影像中的不确定性感知图像分类

Uma Meleti, Jeffrey J. Nirschl

发表机构 * Department of Pathology and Laboratory Medicine, University of Wisconsin-Madison(威斯康星大学麦迪逊分校病理学与检验医学系)

AI总结 本文提出SNGP模型,通过谱归一化和高斯过程层改进单模型不确定性估计与异常检测,在三个生物医学分类任务中表现优异。

Comments Published at the IEEE International Symposium on Biomedical Imaging (ISBI) 2026

Journal ref Proc. 2026 IEEE 23rd International Symposium on Biomedical Imaging (ISBI), London, United Kingdom, Apr. 8-11, 2026, pp. [1-4], 2026

详情
AI中文摘要

准确的组织病理学解释对临床决策至关重要;然而,当前的数字病理深度学习模型在分布外(OOD)设置中往往过于自信且校准不佳,限制了信任和临床应用。安全关键的医学影像工作流程受益于内在的不确定性感知属性,能够准确拒绝OOD输入。我们实现了SNGP,即一组轻量级修改,应用谱归一化并用高斯过程层替代最终密集层,以提高单模型不确定性估计和OOD检测。我们在六个数据集上评估SNGP与确定性和蒙特卡洛dropout,涵盖三个生物医学分类任务:白血球、淀粉样斑块和结直肠组织病理学。SNGP在分布内性能相当,同时显著提高不确定性估计和OOD检测。因此,SNGP或相关模型提供了一个有用的框架,用于数字病理学中的不确定性感知分类,支持安全部署并建立与病理科医生的信任。

英文摘要

Accurate histopathologic interpretation is key for clinical decision-making; however, current deep learning models for digital pathology are often overconfident and poorly calibrated in out-of-distribution (OOD) settings, which limits trust and clinical adoption. Safety-critical medical imaging workflows benefit from intrinsic uncertainty-aware properties that can accurately reject OOD input. We implement the Spectral-normalized Neural Gaussian Process (SNGP), a set of lightweight modifications that apply spectral normalization and replace the final dense layer with a Gaussian process layer to improve single-model uncertainty estimation and OOD detection. We evaluate SNGP against deterministic and Monte Carlo dropout baselines on six datasets across three biomedical classification tasks: white blood cells, amyloid plaques, and colorectal histopathology. SNGP has comparable in-distribution performance while significantly improving uncertainty estimation and OOD detection. Thus, SNGP or related models offer a useful framework for uncertainty-aware classification in digital pathology, supporting safe deployment and building trust with pathologists.

2602.15513 2026-06-18 cs.RO cs.AI

HIMM: Human-Inspired Long-Term Memory Modeling for Embodied Exploration and Question Answering

HIMM:面向具身探索与问答的人类启发式长期记忆建模

Ji Li, Bo Wang, Jing Xia, Mingyi Li, Shiyan Hu

发表机构 * The University of Hong Kong(香港大学) Beijing Institute of Technology(北京理工大学)

AI总结 本文提出HIMM模型,通过分离事件记忆与语义记忆,提升具身智能在长期观察和有限上下文下的探索与问答能力,实验显示在多个基准测试中表现优异。

Journal ref IROS 2026

详情
AI中文摘要

将多模态大语言模型部署为具身智能体的“大脑”仍面临挑战,特别是在长时程观测和有限上下文预算下。现有的记忆辅助方法通常依赖文本摘要,这会丢弃丰富的视觉和空间细节,并且在非平稳环境中表现脆弱。本文提出一种非参数化记忆框架,显式分离情景记忆与语义记忆,以支持具身探索与问答。我们的“检索优先、推理辅助”范式通过语义相似性召回情景经验,并通过视觉推理加以验证,从而无需严格的几何对齐即可稳健地重用过去的观测。同时,我们引入一种程序式规则提取机制,将经验转换为结构化、可重用的语义记忆,促进跨环境泛化。大量实验表明,该方法在具身问答和探索基准上达到了最先进水平:在A-EQA上,LLM-Match提升7.3%,LLM MatchXSPL提升11.4%;在GOAT-Bench上,成功率提升7.7%,SPL提升6.8%。分析显示,情景记忆主要提升探索效率,而语义记忆增强具身智能体的复杂推理能力。

英文摘要

Deploying Multimodal Large Language Models as the brain of embodied agents remains challenging, particularly under long-horizon observations and limited context budgets. Existing memory assisted methods often rely on textual summaries, which discard rich visual and spatial details and remain brittle in non-stationary environments. In this work, we propose a non-parametric memory framework that explicitly disentangles episodic and semantic memory for embodied exploration and question answering. Our retrieval-first, reasoning-assisted paradigm recalls episodic experiences via semantic similarity and verifies them through visual reasoning, enabling robust reuse of past observations without rigid geometric alignment. In parallel, we introduce a program-style rule extraction mechanism that converts experiences into structured, reusable semantic memory, facilitating cross-environment generalization. Extensive experiments demonstrate state-of-the-art performance on embodied question answering and exploration benchmarks, yielding a 7.3% gain in LLM-Match and an 11.4% gain in LLM MatchXSPL on A-EQA, as well as +7.7% success rate and +6.8% SPL on GOAT-Bench. Analyses reveal that our episodic memory primarily improves exploration efficiency, while semantic memory strengthens complex reasoning of embodied agents.

2510.15300 2026-06-18 cs.LG

DFCA: Decentralized Federated Clustering Algorithm

DFCA:去中心化的联邦聚类算法

Jonas Kirch, Sebastian Becker, Tiago Koketsu Rodrigues, Stefan Harmeling

发表机构 * Fraunhofer Institute for Software and Systems Engineering(弗劳恩霍夫软件与系统工程研究所) Lamarr Institute for Machine Learning and AI(拉马尔人工智能与机器学习研究所)

AI总结 DFCA是一种去中心化的联邦学习聚类算法,通过邻居的顺序运行平均聚合模型,实现高效通信和保持聚类性能,实验表明其在动态网络中表现优异。

详情
AI中文摘要

集群联邦学习已发展为处理客户端异构数据的有效方法,通过将数据划分为具有相似或相同数据分布的集群。然而,现有方法如迭代联邦聚类算法(IFCA)依赖于中央服务器协调模型更新,导致瓶颈和单点故障,限制了其在更现实的去中心化学习环境中的应用。本文介绍DFCA,一种完全去中心化的集群联邦学习算法,使客户端能够协同训练集群特定模型而无需中央协调。DFCA使用顺序运行平均聚合邻居的模型作为更新到来,提供了一种比批量聚合更高效的通信替代方案,同时保持聚类性能。在各种数据集上的实验表明,DFCA在去中心化算法中表现优于其他算法,并且在稀疏连接下与中央IFCA表现相当,突显了其在动态现实去中心化网络中的鲁棒性和实用性。

英文摘要

Clustered Federated Learning has emerged as an effective approach for handling heterogeneous data across clients by partitioning them into clusters with similar or identical data distributions. However, most existing methods, including the Iterative Federated Clustering Algorithm (IFCA), rely on a central server to coordinate model updates, which creates a bottleneck and a single point of failure, limiting their applicability in more realistic decentralized learning settings. In this work, we introduce DFCA, a fully decentralized clustered FL algorithm that enables clients to collaboratively train cluster-specific models without central coordination. DFCA uses a sequential running average to aggregate models from neighbors as updates arrive, providing a communication-efficient alternative to batch aggregation while maintaining clustering performance. Our experiments on various datasets demonstrate that DFCA outperforms other decentralized algorithms and performs comparably to centralized IFCA, even under sparse connectivity, highlighting its robustness and practicality for dynamic real-world decentralized networks.
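The abstract says DFCA folds in neighbor models one at a time with a sequential running average instead of waiting for a full batch. The exact update rule is not given here, so the following is only a sketch of the standard incremental mean, with models represented as flat lists of floats. The class name `RunningAverage` and the toy vectors are hypothetical.

```python
class RunningAverage:
    """Incrementally averages model vectors as they arrive, one at a time."""

    def __init__(self, dim):
        self.mean = [0.0] * dim
        self.count = 0

    def update(self, model):
        # mean_k = mean_{k-1} + (x_k - mean_{k-1}) / k
        self.count += 1
        k = self.count
        self.mean = [m + (x - m) / k for m, x in zip(self.mean, model)]
        return self.mean


# Three neighbor updates arriving in sequence (illustrative values).
agg = RunningAverage(dim=2)
for neighbor_model in ([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]):
    agg.update(neighbor_model)
```

After all updates the running mean equals the batch mean of the received models, but only one aggregate per cluster has to be kept in memory and no synchronization barrier is needed.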

2602.20135 2026-06-18 cs.CL cs.AI cs.IR

KNIGHT: Knowledge Graph-Driven Multiple-Choice Question Generation with Adaptive Hardness Calibration

KNIGHT: 基于知识图谱的多选题生成与自适应难度校准

Mohammad Amanlou, Erfan Shafiee Moghaddam, Yasaman Amou Jafari, Mahdi Noori, Farhan Farsi, Behnam Bahrak

发表机构 * University of Tehran(德黑兰大学) Independent Researcher(独立研究员) Amirkabir University of Technology(阿米尔卡比尔理工大学) TEIAS Institute(TEIAS研究所)

AI总结 KNIGHT通过构建领域特定知识图谱,实现高效生成多选题数据集,支持自适应难度控制,提升生成效率与质量,验证了其在多个领域内的有效性。

Comments Accepted at the Third Conference on Parsimony and Learning (CPAL 2026). 36 pages, 12 figures. (Equal contribution: Yasaman Amou Jafari and Mahdi Noori.)

Journal ref Conference on Parsimony and Learning, Proceedings of Machine Learning Research, 328:989-1024, 2026

详情
AI中文摘要

随着大语言模型(LLMs)的兴起,它们在检索增强生成(RAG)等应用中变得至关重要。然而,评估这些系统仍受制于构建专用评估数据集的时间和成本。我们介绍了KNIGHT,一种基于LLM的知识图谱驱动框架,用于从外部来源生成多选题(MCQ)数据集。KNIGHT构建了一个主题特定的知识图谱,这是一个结构化且简洁的实体和关系摘要,可以重复使用以生成由教师控制的难度级别,包括多跳问题,而无需反复重新输入完整源文本。该知识图谱充当一个压缩、可重用的状态,使问题生成成为对图的廉价读取。我们将在维基百科/Wikidata上实例化KNIGHT,同时保持框架的领域和本体无关性。作为案例研究,KNIGHT在历史、生物学和数学领域生成了六个MCQ数据集。我们评估了五个标准:流畅性、无歧义性(单个正确答案)、主题相关性、选项唯一性和给定源提供的答案性(作为幻觉的代理)。结果表明,KNIGHT能够通过可重用的图表示实现令牌和成本高效的生成,实现了这些标准的高质量,且模型排名与MMLU风格基准一致,同时支持主题特定和难度控制的评估。

英文摘要

With the rise of large language models (LLMs), they have become instrumental in applications such as Retrieval-Augmented Generation (RAG). Yet evaluating these systems remains bottlenecked by the time and cost of building specialized assessment datasets. We introduce KNIGHT, an LLM-based, knowledge-graph-driven framework for generating multiple-choice question (MCQ) datasets from external sources. KNIGHT constructs a topic-specific knowledge graph, a structured and parsimonious summary of entities and relations, that can be reused to generate instructor-controlled difficulty levels, including multi-hop questions, without repeatedly re-feeding the full source text. This knowledge graph acts as a compressed, reusable state, making question generation a cheap read over the graph. We instantiate KNIGHT on Wikipedia/Wikidata while keeping the framework domain- and ontology-agnostic. As a case study, KNIGHT produces six MCQ datasets in History, Biology, and Mathematics. We evaluate quality on five criteria: fluency, unambiguity (single correct answer), topic relevance, option uniqueness, and answerability given the provided sources (as a proxy for hallucination). Results show that KNIGHT enables token- and cost-efficient generation from a reusable graph representation, achieves high quality across these criteria, and yields model rankings aligned with MMLU-style benchmarks, while supporting topic-specific and difficulty-controlled evaluation.

2405.14273 2026-06-18 cs.LG cs.AI math.OC

Exact Solution to Data-Driven Inverse Optimization of MILPs in Finite Time via Gradient-Based Methods

通过基于梯度的方法在有限时间内精确求解混合整数线性规划的数据驱动逆优化问题

Akira Kitaoka

发表机构 * NEC Corporation(日本电气株式会社)

AI总结 本文研究了混合整数线性规划的数据驱动逆优化问题,揭示了次优性损失的几何结构,并证明了基于梯度的优化方法可以在有限次迭代内达到与观测数据的完全一致,同时给出了投影次梯度下降法的迭代次数上界。

Comments 66 pages; comments are welcome

详情
AI中文摘要

数据驱动的逆优化问题(DDIOP)旨在估计能够解释观测最优解数据的目标函数参数(权重),它出现在包括混合整数线性规划(MILP)在内的许多应用中。在MILP的逆优化中,特征的预测误差关于权重不连续,因此难以直接应用基于梯度的优化方法。本文聚焦于次优性损失,该损失当且仅当权重与观测数据完全一致时取得最小值零。我们揭示了该损失的几何结构:它是凸的分段线性函数,并且与观测数据完全一致的权重集合具有正的“厚度”,而非单个点或薄边界。利用这一结构,我们证明了以下结论。首先,包括投影次梯度下降法在内的一大类基于梯度的优化方法可以在有限次迭代内达到与观测数据的完全一致(在有限时间内获得精确解)。其次,对于投影次梯度下降法,我们给出了达到完全一致所需迭代次数的显式上界。第三,当正向问题是整数线性规划(ILP)时,我们将该上界表示为仅由样本数、特征维度和约束系数矩阵结构决定的完全显式的迭代次数(例如,若系数矩阵是全幺模矩阵,则迭代次数被显式地限制为样本数平方和维度的多项式)。通过数值实验,我们验证了这种有限步可达的行为。

英文摘要

A data-driven inverse optimization problem (DDIOP) is the problem of estimating the objective-function parameters (weights) that explain observed optimal-solution data, and it arises in many applications, including mixed integer linear programming (MILP). In inverse optimization for MILPs, the prediction error of the features is discontinuous with respect to the weights, so applying gradient-based optimization directly is difficult. In this paper we focus on the suboptimality loss. This loss attains its minimum value, zero, if and only if the weights are exactly consistent with the observed data. We reveal a geometric structure of this loss: it is convex and piecewise linear, and moreover the set of weights that are exactly consistent with the observed data has a positive "thickness" rather than being a single point or a thin boundary. We use this structure to show the following. First, a broad class of gradient-based optimization methods, including projected subgradient descent, reaches exact consistency with the observed data in finitely many iterations (an exact solution is obtained in finite time). Second, for projected subgradient descent we give an explicit upper bound on the number of iterations needed to reach exact consistency. Third, when the forward problem is an integer linear program (ILP), we give this upper bound as a fully explicit iteration count determined solely by the number of samples, the dimension of the features, and the structure of the constraint coefficient matrix. Through numerical experiments, we confirm this finite-step attainment behavior.

2407.00449 2026-06-18 cs.LG cs.AI cs.NE

Fully tensorial approach to hypercomplex-valued neural networks

超复数值神经网络的全张量方法

Agnieszka Niemczynowicz, Radosław Antoni Kycia

发表机构 * Faculty of Computer Science and Mathematics, Cracow University of Technology(克拉科夫技术大学计算机科学与数学系)

AI总结 本文提出基于张量的理论框架,使神经网络能在任意有限维代数上操作,通过张量运算统一描述密集和卷积层,并为超复数值感知机提供理论基础。

Comments 23 pages, 3 figures

Journal ref Information Sciences, 2026, 123796

详情
AI中文摘要

本文提出了一种完全张量化的理论框架,用于超复数值神经网络。所提出的方法使神经网络架构能够处理定义在任意有限维代数上的数据。核心观察是,代数乘法可以表示为三阶张量,这使得神经网络层中的所有代数运算都可以用标准张量收缩、排列和重塑操作来表述。这种基于张量的表述为密集和卷积层提供了统一且维度无关的描述,并且与支持优化张量操作的现代深度学习库直接兼容。所提出的框架将现有的四维代数构造作为特殊情况恢复。在此设定下,建立了单层超复数值感知机的张量版本的通用逼近定理,在底层代数的非退化假设下,从而为所考虑的神经网络类别提供了严谨的理论基础。

英文摘要

A fully tensorial theoretical framework for hypercomplex-valued neural networks is presented. The proposed approach enables neural network architectures to operate on data defined over arbitrary finite-dimensional algebras. The central observation is that algebra multiplication can be represented by a rank-three tensor, which allows all algebraic operations in neural network layers to be formulated in terms of standard tensor contractions, permutations, and reshaping operations. This tensor-based formulation provides a unified and dimension-independent description of hypercomplex-valued dense and convolutional layers and is directly compatible with modern deep learning libraries supporting optimized tensor operations. The proposed framework recovers existing constructions for four-dimensional algebras as a special case. Within this setting, a tensor-based version of the universal approximation theorem for single-layer hypercomplex-valued perceptrons is established under mild non-degeneracy assumptions on the underlying algebra, thereby providing a rigorous theoretical foundation for the considered class of neural networks.
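The central observation, that algebra multiplication is a rank-three tensor contraction, can be illustrated with quaternions, one of the four-dimensional algebras mentioned as a special case. The sketch below uses plain Python loops so that it runs without dependencies. With a tensor library the same contraction would normally be a single einsum-style call. The function names and the toy layer are assumptions made for illustration and are not the authors' implementation.

```python
def quaternion_structure_tensor():
    """T[i][j][k] such that (a * b)_k = sum_ij a_i b_j T[i][j][k], basis (1, i, j, k)."""
    # (left basis index, right basis index) -> (result basis index, sign)
    table = {
        (0, 0): (0, 1), (0, 1): (1, 1), (0, 2): (2, 1), (0, 3): (3, 1),
        (1, 0): (1, 1), (1, 1): (0, -1), (1, 2): (3, 1), (1, 3): (2, -1),
        (2, 0): (2, 1), (2, 1): (3, -1), (2, 2): (0, -1), (2, 3): (1, 1),
        (3, 0): (3, 1), (3, 1): (2, 1), (3, 2): (1, -1), (3, 3): (0, -1),
    }
    T = [[[0] * 4 for _ in range(4)] for _ in range(4)]
    for (i, j), (k, sign) in table.items():
        T[i][j][k] = sign
    return T


def multiply(a, b, T):
    """Algebra product expressed as a contraction with the structure tensor."""
    n = len(T)
    return [
        sum(a[i] * b[j] * T[i][j][k] for i in range(n) for j in range(n))
        for k in range(n)
    ]


def dense_layer(W, x, T):
    """Bias-free dense layer: y[o] = sum_n W[o][n] * x[n], products taken in the algebra."""
    n = len(T)
    out = []
    for row in W:
        acc = [0] * n
        for w, xi in zip(row, x):
            acc = [s + p for s, p in zip(acc, multiply(w, xi, T))]
        out.append(acc)
    return out


T = quaternion_structure_tensor()
one, qi, qj, qk = [1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]
y = dense_layer([[qi, one]], [qj, qk], T)  # i*j + 1*k
```

Swapping in a different structure tensor changes the algebra without touching `multiply` or `dense_layer`, which is the dimension-independence the abstract describes.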

2411.16934 2026-06-18 cs.CV

Online Episodic Memory Visual Query Localization with Egocentric Streaming Object Memory

基于第一人称流式对象记忆的在线情景记忆视觉查询定位

Zaira Manigrasso, Matteo Dunnhofer, Antonino Furnari, Moritz Nottebaum, Antonio Finocchiaro, Davide Marana, Rosario Forte, Giovanni Maria Farinella, Christian Micheloni

发表机构 * University of Udine(乌迪内大学) University of Catania(卡塔尼亚大学) York University(约克大学)

AI总结 本文提出OVQ2D任务,通过在线处理视频流实现对象定位,引入ESOM框架整合发现、跟踪和记忆模块,实验显示其在Ego4D数据集上表现优异,但仍有提升空间。

Comments in IEEE/CVF Winter Conference on Applications of Computer Vision (WACV) 2026

详情
AI中文摘要

情景记忆检索使可穿戴相机能够回忆此前在视频中观察到的对象或事件。然而,现有的任务设定假设“离线”场景,即查询时可以访问完整视频,这限制了其在功耗和存储受限的可穿戴设备上的实际应用。为了构建更贴近应用的情景记忆系统,我们提出在线视觉查询2D(OVQ2D)任务:模型在线处理视频流,每帧仅观察一次,并使用紧凑的记忆而非完整视频历史来检索对象定位。我们通过ESOM(第一人称流式对象记忆)框架解决OVQ2D问题,该框架整合了对象发现模块、对象跟踪模块和记忆模块,用于发现、跟踪并存储时空对象信息以支持高效查询。在Ego4D数据集上的实验表明,ESOM优于其他在线方法,但OVQ2D仍具挑战性,最高成功率仅约4%。在使用完美的对象跟踪(31.91%)、完美的对象发现(40.55%)或两者兼具(81.92%)时,ESOM的准确率显著提高,凸显了针对这些组件开展应用研究的必要性。

英文摘要

Episodic memory retrieval enables wearable cameras to recall objects or events previously observed in video. However, existing formulations assume an "offline" setting with full video access at query time, limiting their applicability in real-world scenarios with power- and storage-constrained wearable devices. Towards more application-ready episodic memory systems, we introduce Online Visual Query 2D (OVQ2D), a task where models process video streams online, observing each frame only once, and retrieve object localizations using a compact memory instead of full video history. We address OVQ2D with ESOM (Egocentric Streaming Object Memory), a novel framework integrating an object discovery module, an object tracking module, and a memory module that find, track, and store spatio-temporal object information for efficient querying. Experiments on Ego4D demonstrate ESOM's superiority over other online approaches, though OVQ2D remains challenging, with top performance at only ~4% success. ESOM's accuracy increases markedly with perfect object tracking (31.91%), discovery (40.55%), or both (81.92%), underscoring the need for applied research on these components.

2512.17696 2026-06-18 cs.LG stat.ME stat.ML

Spatially-informed transformers: Injecting geostatistical covariance biases into self-attention for spatio-temporal forecasting

具有空间信息的变换器:将地理统计学协方差偏置注入自注意力机制以进行时空预测

Yuri Calleo

发表机构 * Unimercatorum(乌尼默卡图姆大学)

AI总结 本文提出一种混合架构,通过可学习的协方差核将地理统计学归纳偏置注入自注意力机制,以提升时空预测的准确性与物理合理性。

详情
AI中文摘要

高维时空过程的建模面临着经典地理统计学的概率严谨性与深度学习灵活、高容量的表示之间的根本二元对立。尽管高斯过程提供了理论一致性和精确的不确定性量化,但其过高的计算复杂度使其难以应用于大规模传感器网络。相反,现代变换器架构擅长序列建模,但本质上缺乏几何归纳偏置,将空间传感器视为排列不变的标记,对距离没有原生的理解。在本文中,我们提出了一种具有空间信息的变换器,这是一种混合架构,通过可学习的协方差核将地理统计学归纳偏置直接注入自注意力机制中。通过将注意力结构形式化地分解为一个平稳的物理先验和一个非平稳的数据驱动残差,我们施加了一种软拓扑约束,倾向于空间上邻近的交互,同时保留了建模复杂动态的能力。我们展示了“深度变异函数分析”(Deep Variography)现象,即网络通过反向传播以端到端的方式成功恢复了底层过程的真实空间衰减参数。在合成高斯随机场和真实世界交通基准上的大量实验证实,我们的方法优于最先进的图神经网络。此外,严格的统计验证确认,所提出的方法不仅预测精度更高,而且能给出校准良好的概率预测,有效弥合了物理感知建模与数据驱动学习之间的差距。

英文摘要

The modeling of high-dimensional spatio-temporal processes presents a fundamental dichotomy between the probabilistic rigor of classical geostatistics and the flexible, high-capacity representations of deep learning. While Gaussian processes offer theoretical consistency and exact uncertainty quantification, their prohibitive computational scaling renders them impractical for massive sensor networks. Conversely, modern transformer architectures excel at sequence modeling but inherently lack a geometric inductive bias, treating spatial sensors as permutation-invariant tokens without a native understanding of distance. In this work, we propose a spatially-informed transformer, a hybrid architecture that injects a geostatistical inductive bias directly into the self-attention mechanism via a learnable covariance kernel. By formally decomposing the attention structure into a stationary physical prior and a non-stationary data-driven residual, we impose a soft topological constraint that favors spatially proximal interactions while retaining the capacity to model complex dynamics. We demonstrate the phenomenon of "Deep Variography", where the network successfully recovers the true spatial decay parameters of the underlying process end-to-end via backpropagation. Extensive experiments on synthetic Gaussian random fields and real-world traffic benchmarks confirm that our method outperforms state-of-the-art graph neural networks. Furthermore, rigorous statistical validation confirms that the proposed method delivers not only superior predictive accuracy but also well-calibrated probabilistic forecasts, effectively bridging the gap between physics-aware modeling and data-driven learning.

2508.20275 2026-06-18 cs.LG cs.CL q-bio.QM

A Systematic Review on the Generative AI Applications in Human Medical Genomics

关于生成式AI在人类医学基因组学中的应用系统综述

Anton Changalidis, Yury Barbitoff, Yulia Nasykhova, Andrey Glotov

发表机构 * Dpt. of Genomic Medicine(基因组医学系) D.O. Ott Research Institute of Obstetrics, Gynaecology, and Reproductology(D.O. Ott妇产科与生殖医学研究所)

AI总结 本文系统综述了生成式AI在罕见和常见疾病遗传研究与诊断中的应用,分析了LLM在基因组变异识别、注释及医学影像中的作用,指出其在多模态数据整合和临床应用中的挑战。

Comments 31 pages, 5 figures

Journal ref Frontiers in Genetics 16 (2026) 1694070

详情
AI中文摘要

尽管传统统计技术和机器学习方法对遗传学、特别是遗传病诊断做出了重要贡献,但它们在处理复杂的高维数据时往往力不从心,而最先进的深度学习模型如今正在应对这一挑战。基于Transformer架构的大语言模型(LLMs)在需要结合上下文理解非结构化医疗数据的任务中表现出色。本系统综述考察了LLMs在罕见病和常见病的遗传研究与诊断中的作用。我们在PubMed、bioRxiv、medRxiv和arXiv中进行了基于关键词的自动化检索,面向遗传学领域中LLM在诊断和教育方面应用的研究,并剔除了不相关或过时的模型。共分析了172项研究,重点涵盖基因组变异的识别、注释和解释方面的应用,以及通过视觉Transformer取得的医学影像进展。关键发现表明,虽然基于Transformer的模型显著推进了疾病和风险分层、变异解释、医学影像分析和报告生成,但将多模态数据(基因组序列、影像和临床记录)整合到统一且临床稳健的流程中仍面临重大挑战,在泛化性和临床实际落地方面存在局限。本综述对LLM在变革遗传病诊断和支持遗传学教育方面的现有能力与局限进行了全面的分类和评估,可作为了解这一快速发展领域的指南。

英文摘要

Although traditional statistical techniques and machine learning methods have contributed significantly to genetics and, in particular, inherited disease diagnosis, they often struggle with complex, high-dimensional data, a challenge now addressed by state-of-the-art deep learning models. Large language models (LLMs), based on transformer architectures, have excelled in tasks requiring contextual comprehension of unstructured medical data. This systematic review examines the role of LLMs in the genetic research and diagnostics of both rare and common diseases. Automated keyword-based search in PubMed, bioRxiv, medRxiv, and arXiv was conducted, targeting studies on LLM applications in diagnostics and education within genetics and removing irrelevant or outdated models. A total of 172 studies were analyzed, highlighting applications in genomic variant identification, annotation, and interpretation, as well as medical imaging advancements through vision transformers. Key findings indicate that while transformer-based models significantly advance disease and risk stratification, variant interpretation, medical imaging analysis, and report generation, major challenges persist in integrating multimodal data (genomic sequences, imaging, and clinical records) into unified and clinically robust pipelines, facing limitations in generalizability and practical implementation in clinical settings. This review provides a comprehensive classification and assessment of the current capabilities and limitations of LLMs in transforming hereditary disease diagnostics and supporting genetic education, serving as a guide to navigate this rapidly evolving field.

2503.01163 2026-06-18 cs.AI cs.CL cs.HC cs.LG cs.NE

Bandit-Based Prompt Design Strategy Selection Improves Prompt Optimizers

基于Bandit的提示设计策略选择改进提示优化器

Rin Ashizawa, Yoichi Hirose, Nozomu Yoshinari, Kento Uchida, Shinichi Shirakawa

发表机构 * Yokohama National University(横滨国立大学)

AI总结 本文提出OPTS方法,通过显式选择提示设计策略提升EvoPrompt性能,采用Thompson采样机制在BIG-Bench Hard上验证效果,实现最优结果。

Comments Accepted to ACL 2025 Findings

详情
AI中文摘要

提示优化旨在寻找能提升大语言模型性能的有效提示。尽管现有方法已发现有效提示,但往往与人类专家精心设计的复杂提示不同。提示设计策略作为提升提示性能的最佳实践,对优化提示至关重要。最近,Autonomous Prompt Engineering Toolbox (APET) 将多种提示设计策略整合到提示优化过程中。在APET中,需要LLM隐式选择和应用合适的策略,因为提示设计策略可能产生负面影响。这种隐式选择可能因LLM的有限优化能力而表现不佳。本文引入Optimizing Prompts with sTrategy Selection (OPTS),实现提示设计的显式选择机制。我们提出三种机制,包括基于Thompson采样的方法,并将其整合到EvoPrompt中。在使用BIG-Bench Hard对Llama-3-8B-Instruct和GPT-4o mini进行提示优化的实验中,结果表明提示设计策略的选择提升了EvoPrompt的性能,Thompson采样机制实现了最佳整体结果。我们的实验代码可在https://github.com/shiralab/OPTS获取。

English abstract

Prompt optimization aims to search for effective prompts that enhance the performance of large language models (LLMs). Although existing prompt optimization methods have discovered effective prompts, they often differ from sophisticated prompts carefully designed by human experts. Prompt design strategies, representing best practices for improving prompt performance, can be key to improving prompt optimization. Recently, a method termed the Autonomous Prompt Engineering Toolbox (APET) has incorporated various prompt design strategies into the prompt optimization process. In APET, the LLM is needed to implicitly select and apply the appropriate strategies because prompt design strategies can have negative effects. This implicit selection may be suboptimal due to the limited optimization capabilities of LLMs. This paper introduces Optimizing Prompts with sTrategy Selection (OPTS), which implements explicit selection mechanisms for prompt design. We propose three mechanisms, including a Thompson sampling-based approach, and integrate them into EvoPrompt, a well-known prompt optimizer. Experiments optimizing prompts for two LLMs, Llama-3-8B-Instruct and GPT-4o mini, were conducted using BIG-Bench Hard. Our results show that the selection of prompt design strategies improves the performance of EvoPrompt, and the Thompson sampling-based mechanism achieves the best overall results. Our experimental code is provided at https://github.com/shiralab/OPTS .
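
The abstract names a Thompson sampling-based mechanism for choosing among prompt design strategies but does not spell out its details. As a rough illustration only, the following is a generic Beta-Bernoulli Thompson sampling sketch, not the authors' implementation. The strategy names, the binary "did the prompt improve" reward, the uniform prior, and the simulated success rates are all assumptions made for this example.

```python
import random


class ThompsonStrategySelector:
    """Beta-Bernoulli Thompson sampling over a fixed set of strategies."""

    def __init__(self, strategies, seed=None):
        self.strategies = list(strategies)
        # Beta(1, 1) (uniform) prior on each strategy's success probability.
        self.alpha = {s: 1.0 for s in self.strategies}
        self.beta = {s: 1.0 for s in self.strategies}
        self.rng = random.Random(seed)

    def select(self):
        # Draw one posterior sample per strategy and pick the largest.
        samples = {
            s: self.rng.betavariate(self.alpha[s], self.beta[s])
            for s in self.strategies
        }
        return max(samples, key=samples.get)

    def update(self, strategy, improved):
        # Assumed binary reward: True if the prompt produced with this
        # strategy scored better than its parent prompt.
        if improved:
            self.alpha[strategy] += 1.0
        else:
            self.beta[strategy] += 1.0


def simulate(rounds=500, seed=0):
    # Hypothetical strategies with made-up success rates (illustration only).
    true_rates = {"chain_of_thought": 0.9, "role_play": 0.1, "no_strategy": 0.1}
    selector = ThompsonStrategySelector(true_rates, seed=seed)
    env = random.Random(seed + 1)
    counts = {s: 0 for s in true_rates}
    for _ in range(rounds):
        chosen = selector.select()
        counts[chosen] += 1
        selector.update(chosen, env.random() < true_rates[chosen])
    return counts


if __name__ == "__main__":
    print(simulate())
```

In a prompt optimizer, `select()` would be called before each prompt-generation step and `update()` after the new prompt is scored. Including an arm such as `no_strategy` is one way to let the selector learn when applying any strategy hurts, which is the negative-effect concern the abstract raises.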

2504.12347 2026-06-18 cs.CL cs.AI cs.CY

Assessment of Evolving Large Language Models in Upper Secondary Mathematics

Mika Setälä, Pieta Sikström, Ville Heilala, Tommi Kärkkäinen

Affiliations: Faculty of Information Technology; University of Jyväskylä; Faculty of Humanities and Social Sciences

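AI summary: This paper evaluates the mathematical ability of different large language models on the Finnish matriculation examination. Performance improved markedly as the models evolved, with some models scoring near-perfect, showing the rapid progress of LLMs in mathematics and their potential in education.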

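Details
AI abstract (translated from Chinese)

Large language models (LLMs) show growing promise in educational settings, but their mathematical reasoning is considered to be still evolving. This study evaluates the mathematical ability of various LLMs using the Finnish matriculation examination, a high-stakes digital test for upper secondary education. Initial tests showed moderate performance corresponding to mid-range grades, but later evaluations showed substantial improvement as the language models evolved. Remarkably, some models achieved near-perfect or perfect scores, matching the performance of top students and meeting university admission requirements. Our findings highlight the rapid progress in the mathematical ability of LLMs and show their potential as tools to support learning and teaching.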

English abstract

Large language models (LLMs) have shown increasing promise in educational settings, yet their mathematical reasoning has been considered evolving. This study evaluates the mathematical capabilities of various LLMs using the Finnish matriculation examination, a high-stakes digital test for upper secondary education. Initial tests yielded moderate performance corresponding to mid-range grades, but later evaluations demonstrated substantial improvements as the language models evolved. Remarkably, some models achieved near-perfect or perfect scores, matching top student performance and qualifying for university admission. Our findings highlight the rapid advances in the mathematical proficiency of LLMs and illustrate their potential as underlying tools to support learning and teaching in a variety of ways.