arXivDaily: daily arXiv digest, updated Monday through Friday
2501.08425 2026-06-12 cs.LG math.AP math.PR updated version

Is Stochastic Gradient Descent Effective? A PDE Perspective on Machine Learning processes

Davide Barbieri, Matteo Bonforte, Peio Ibarrondo

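Affiliations: Departamento de Matemáticas, Universidad Autónoma de Madrid, ICMAT - Instituto de Ciencias Matemáticas, CSIC-UAM-UC3M-UCM

AI summary: Analyzes the behavior of SGD through parabolic PDEs of Fokker-Planck type, distinguishes a drift regime from a diffusion regime, quantifies the concentration phenomenon, proves bounds on the mean exit time, and gives new asymptotic convergence results for non-convex losses and degenerate diffusion matrices.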

Abstract

In this paper we analyze the behaviour of stochastic gradient descent (SGD), a widely used method in supervised learning for optimizing neural network weights by minimizing non-convex loss functions. Since the pioneering work of E, Li and Tai (2017), the underlying structure of such processes can be understood via parabolic PDEs of Fokker-Planck type, which are at the core of our analysis. Even though Fokker-Planck equations have a long history and an extensive literature, almost nothing is known when the potential is non-convex or when the diffusion matrix is degenerate, and this is the main difficulty that we face in our analysis. We identify two different regimes: in the initial phase of SGD, the loss function drives the weights to concentrate around the nearest local minimum. We refer to this phase as the drift regime and we provide quantitative estimates on this concentration phenomenon. Next, we introduce the diffusion regime, where stochastic fluctuations help the learning process to escape suboptimal local minima. We analyze the Mean Exit Time (MET) and prove upper and lower bounds on the MET. Finally, we address the asymptotic convergence of SGD for a non-convex cost function and a degenerate diffusion matrix, a setting in which the standard approaches do not apply and new techniques are required. For this purpose, we exploit two different methods: duality and entropy methods. We provide new results about the dynamics and effectiveness of SGD, offering a deep connection between stochastic optimization and PDE theory, and some answers and insights to basic questions about machine learning processes: How long does SGD take to escape from a bad minimum? Do neural network parameters converge using SGD? How do parameters evolve in the first stage of training with SGD?
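
The mean exit time question can be illustrated numerically. The sketch below is not taken from the paper: it assumes a one-dimensional double-well loss, a constant noise level, and an Euler-Maruyama discretization of the SDE that commonly stands in for SGD. The function names and all parameter values are hypothetical.

```python
import numpy as np

def loss_grad(x):
    # Gradient of the toy double-well loss f(x) = (x**2 - 1)**2 / 4,
    # which has local minima at x = -1 and x = +1 (an assumed example).
    return x * (x**2 - 1.0)

def exit_time(x0=-1.0, lr=0.01, noise=0.7, barrier=0.0, max_steps=200_000, rng=None):
    """Number of noisy gradient steps until the iterate first exceeds `barrier`."""
    rng = np.random.default_rng() if rng is None else rng
    x = x0
    for step in range(1, max_steps + 1):
        # Euler-Maruyama step of dx = -f'(x) dt + noise dW with dt = lr.
        x += -lr * loss_grad(x) + noise * np.sqrt(lr) * rng.standard_normal()
        if x > barrier:
            return step
    return max_steps  # censored: no exit observed within max_steps

def mean_exit_time(n_runs=20, seed=0, **kwargs):
    """Monte Carlo estimate of the mean exit time, in steps."""
    rng = np.random.default_rng(seed)
    return float(np.mean([exit_time(rng=rng, **kwargs) for _ in range(n_runs)]))
```

Calling `mean_exit_time(noise=0.5)` and `mean_exit_time(noise=1.0)` gives a rough feel for how the escape time depends on the noise level. With `noise=0.0` the iterate stays at the minimum and the run is censored at `max_steps`.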

2505.01869 2026-06-12 cs.CV updated version

Visual enhancement and 3D representation for underwater scenes: a review

Guoxi Huang, Haoran Wang, Brett Seymour, Evan Kovacs, John Ellerbroc, Dave Blackham, Nantheera Anantrasirichai

Affiliations: Visual Information Laboratory, University of Bristol; Submerged Resources Center, National Park Service; Marine Imaging Technologies, LLC; Gates Underwater Products, Inc; Esprit film and television Ltd

AI summary: Reviews underwater visual enhancement and 3D reconstruction methods, from physical models through non-learning and data-driven techniques (such as NeRF and 3D Gaussian Splatting), evaluates a range of algorithms on benchmark datasets, and points out future research directions.

Abstract

Underwater visual enhancement (UVE) and underwater 3D reconstruction pose significant challenges in computer vision and AI-based tasks due to complex imaging conditions in aquatic environments. Despite the development of numerous enhancement algorithms, a comprehensive and systematic review covering both UVE and underwater 3D reconstruction remains absent. To advance research in these areas, we present an in-depth review from multiple perspectives. First, we introduce the fundamental physical models, highlighting the peculiarities that challenge conventional techniques. We survey advanced methods for visual enhancement and 3D reconstruction specifically designed for underwater scenarios. The paper assesses various approaches from non-learning methods to advanced data-driven techniques, including Neural Radiance Fields and 3D Gaussian Splatting, discussing their effectiveness in handling underwater distortions. We then conduct both quantitative and qualitative evaluations of state-of-the-art UVE and underwater 3D reconstruction algorithms across multiple benchmark datasets. Finally, we highlight key research directions for future advancements in underwater vision.

2408.17221 2026-06-12 cs.LG math.AG updated version

Geometry of Lightning Self-Attention: Identifiability and Dimension

Nathan W. Henry, Giovanni Luca Marchetti, Kathlén Kohn

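Affiliations: University of Toronto; Royal Institute of Technology (KTH)

AI summary: Uses tools from algebraic geometry to analyze the geometry of the function spaces of self-attention networks without normalization, describes the identifiability of deep attention and computes the dimension of the function space, characterizes the singular and boundary points of single-layer models, and conjectures an extension to the normalized case.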

Comments Accepted at ICLR 2025

Abstract

We consider function spaces defined by self-attention networks without normalization, and theoretically analyze their geometry. Since these networks are polynomial, we rely on tools from algebraic geometry. In particular, we study the identifiability of deep attention by providing a description of the generic fibers of the parametrization for an arbitrary number of layers and, as a consequence, compute the dimension of the function space. Additionally, for a single-layer model, we characterize the singular and boundary points. Finally, we formulate a conjectural extension of our results to normalized self-attention networks, prove it for a single layer, and numerically verify it in the deep case.
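
To make "polynomial" and "identifiability" concrete, here is a small numerical sketch of a single self-attention layer with the softmax removed. The row-vector convention, the matrix names, and the dimensions are assumptions for illustration and may differ from the paper's notation. The sketch checks two facts that hold for this formula: the output is cubic in the input, and distinct parameters can define the same function.

```python
import numpy as np

def lightning_attention(X, Wq, Wk, Wv):
    """Single-layer self-attention without softmax: (X Wq)(X Wk)^T (X Wv)."""
    scores = (X @ Wq) @ (X @ Wk).T      # unnormalized attention scores, shape (t, t)
    return scores @ (X @ Wv)            # output, shape (t, d)

rng = np.random.default_rng(0)
t, d, a = 4, 3, 2                       # sequence length, embedding dim, attention dim
X = rng.standard_normal((t, d))
Wq, Wk = rng.standard_normal((d, a)), rng.standard_normal((d, a))
Wv = rng.standard_normal((d, d))
C = rng.standard_normal((a, a)) + 3 * np.eye(a)   # an invertible a x a matrix (assumed)

Y = lightning_attention(X, Wq, Wk, Wv)
# Different parameters, same function: (Wq C)(Wk C^{-T})^T = Wq Wk^T.
Y_reparam = lightning_attention(X, Wq @ C, Wk @ np.linalg.inv(C).T, Wv)
# Cubic homogeneity in the input: f(2X) = 8 f(X).
Y_scaled = lightning_attention(2 * X, Wq, Wk, Wv)
```

The reparametrization by `C` is one example of a symmetry that a description of the generic fibers has to account for. The sketch does not attempt to enumerate all of them.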

2501.04823 2026-06-12 cs.RO math.OC stat.AP updated version

Learning Robot Safety from Sparse Human Feedback using Conformal Prediction

Aaron O. Feldman, Joseph A. Vincent, Maximilian Adang, JunEn Low, Mac Schwager

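Affiliations: Department of Aeronautics and Astronautics, Stanford University

AI summary: From binary human feedback on policy trajectories, uses conformal prediction to identify a region of states containing future policy errors, builds a warning system with a guaranteed miss rate, and uses it to improve the safety of a model predictive controller.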

Abstract

Ensuring robot safety can be challenging; user-defined constraints can miss edge cases, policies can become unsafe even when trained from safe data, and safety can be subjective. Thus, we learn about robot safety by showing policy trajectories to a human who flags unsafe behavior. From this binary feedback, we use the statistical method of conformal prediction to identify a region of states, potentially in learned latent space, guaranteed to contain a user-specified fraction of future policy errors. Our method is sample-efficient, as it builds on nearest neighbor classification and avoids withholding data as is common with conformal prediction. By alerting if the robot reaches the suspected unsafe region, we obtain a warning system that mimics the human's safety preferences with guaranteed miss rate. From video labeling, our system can detect when a quadcopter visuomotor policy will fail to steer through a designated gate. We present an approach for policy improvement by avoiding the suspected unsafe region. With it we improve a model predictive controller's safety, as shown in experimental testing with 30 quadcopter flights across 6 navigation tasks. Code and videos are provided.
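
For readers new to conformal prediction, the sketch below shows the standard split-conformal threshold computed from a held-out calibration set. This is the textbook variant, not the paper's method, which the abstract says avoids withholding data. The score (for example, a distance to previously flagged unsafe states) and the function names are hypothetical.

```python
import math
import numpy as np

def conformal_threshold(cal_scores, miss_rate):
    """k-th smallest calibration score, with k = ceil((n + 1) * (1 - miss_rate)).

    If a new unsafe state is exchangeable with the calibration states, its
    score falls at or below this threshold with probability >= 1 - miss_rate.
    """
    scores = np.sort(np.asarray(cal_scores, dtype=float))
    n = scores.size
    k = math.ceil((n + 1) * (1.0 - miss_rate))
    if k > n:
        return math.inf   # too few calibration points: warn everywhere
    return float(scores[k - 1])

def should_warn(score, threshold):
    """Raise a warning when the state's score lies inside the suspected region."""
    return score <= threshold
```

With ten calibration scores and a target miss rate of 0.2, the threshold is the ninth smallest score. Asking for a 0.05 miss rate from the same ten points returns infinity, which reflects that the guarantee cannot be met with so few samples.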

2301.12538 2026-06-12 cs.LG cs.AI math.DS updated version

On Approximating the Dynamic Response of Synchronous Generators via Operator Learning: A Step Towards Building Deep Operator-based Power Grid Simulators

Christian Moya, Amirhossein Mollaali, Guang Lin, Meng Yue

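Affiliations: Purdue University

AI summary: Proposes an operator-learning framework that uses DeepONet to approximate the dynamic response of synchronous generators, designs a recursive simulation scheme and a residual DeepONet scheme, and adds a data aggregation strategy for simulations that interact with the power grid.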

Abstract

This paper develops an Operator Learning framework for approximating the dynamic response of synchronous generators. The framework can be used to (i) build a neural network-based generator model that interacts with a power grid simulator or (ii) shadow the true generator's transient response. First, we develop a data-driven Deep Operator Network (DeepONet) to approximate the infinite-dimensional solution operator of the generators. Then, we design a numerical scheme based on DeepONet that simulates the generator's response over a given time horizon. The proposed scheme recursively employs the trained DeepONet to simulate the response for a given multi-dimensional input that describes the interaction between the generator and the power grid. In addition, we design a residual DeepONet numerical scheme that can incorporate information from existing mathematical models. We accompany this residual DeepONet scheme with an estimate for the prediction's cumulative error. Finally, we build a data aggregation (DAgger) strategy that allows fine-tuning of DeepONets using aggregated training data that the DeepONets will likely encounter during interactive simulations with other grid components. As a proof of concept, we demonstrate that the proposed frameworks can effectively approximate the transient model of a synchronous generator.
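
The recursive and residual schemes can be sketched in a few lines. The code below is only a structural illustration: the "trained operator" is replaced by the exact flow of the toy system dx/dt = -x + u, and the "existing mathematical model" by a forward Euler step. None of these stand-ins, nor the function names, come from the paper.

```python
import numpy as np

def rollout(step_operator, x0, grid_inputs, h):
    """Apply a one-step surrogate recursively: x_{k+1} = G(x_k, u_k)(h)."""
    states = [np.asarray(x0, dtype=float)]
    for u in grid_inputs:
        states.append(step_operator(states[-1], u, h))
    return np.stack(states)

def residual_step(model_step, correction):
    """Residual variant: known-model prediction plus a learned correction."""
    return lambda x, u, h: model_step(x, u, h) + correction(x, u, h)

def toy_operator(x, u, h):
    # Stand-in for a trained DeepONet: exact flow of dx/dt = -x + u over time h.
    return u + (x - u) * np.exp(-h)

def euler_model(x, u, h):
    # Stand-in for an existing mathematical model: one forward Euler step.
    return x + h * (u - x)

def toy_correction(x, u, h):
    # Stand-in for the learned residual: the error of the Euler step.
    return toy_operator(x, u, h) - euler_model(x, u, h)

trajectory = rollout(toy_operator, [1.0], [0.0, 0.0, 0.0], h=0.1)
residual_trajectory = rollout(residual_step(euler_model, toy_correction),
                              [1.0], [0.0, 0.0, 0.0], h=0.1)
```

In an interactive simulation, `grid_inputs` would be produced step by step by the grid simulator rather than fixed in advance. A list is used here only to keep the sketch short.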

2604.24449 2026-06-12 cs.RO cs.AI cs.LG

SPLIT: Separating Physical-Contact via Latent Arithmetic in Image-Based Tactile Sensors

Wadhah Zai El Amri, Nicolás Navarro-Guerrero

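Affiliations: Leibniz Universität Hannover, L3S Research Center

AI summary: Presents SPLIT, which uses latent space arithmetic to separate contact geometry from sensor optical properties, enabling efficient simulation of tactile sensors with cross-sensor transfer and bidirectional simulation, to speed up research on robotic tactile sensing.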

Comments Accepted to Elsevier Robotics and Autonomous Systems Journal

Abstract

Training machine learning models for robotic tactile sensing requires vast amounts of data, yet obtaining realistic interaction data remains a challenge due to physical complexity and variability. Simulating tactile sensors is thus a crucial step in accelerating progress. This paper presents SPLIT, a novel method for simulating image-based tactile sensors, with a primary focus on the DIGIT sensor. Central to our approach is a latent space arithmetic strategy that explicitly disentangles contact geometry from sensor-specific optical properties. Unlike methods that require recalibration for every new unit, this disentanglement allows SPLIT to adapt to diverse DIGIT backgrounds and even transfer data to distinct sensors like the GelSight R1.5 without full model retraining. Beyond this adaptability, our approach achieves faster inference speeds than existing alternatives. Furthermore, we provide a calibrated finite element method (FEM) soft-body mesh simulation with variable resolution, offering a tunable trade-off between speed and fidelity. Additionally, our algorithm supports bidirectional simulation, allowing for both the generation of realistic images from deformation meshes and the reconstruction of meshes from tactile images. This versatility makes SPLIT a valuable tool for accelerating progress in robotic tactile sensing research.

2511.20162 2026-06-12 cs.CV cs.AI q-bio.NC

Action Without Interaction: Probing the Physical Foundations of Video LMMs via Contact-Release Detection

Daniel Harari, Michael Sidorov, Chen Shterental, Liel David, Abrham Kahsay Gebreselasie, Muhammad Haris Khan

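Affiliations: Weizmann Institute of Science; Mohamed bin Zayed University of Artificial Intelligence

AI summary: Examines how deeply video LMMs ground their semantic understanding in the actual visual input; contact-release detection reveals shortcomings in the models' physical grounding.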

Journal ref
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026 workshop on Cognitive Foundations for Multimodal Models (CogVL)

Abstract

Large multi-modal models (LMMs) show increasing performance in realistic visual tasks for images and, more recently, for videos. For example, given a video sequence, such models are able to describe in detail objects, the surroundings and dynamic actions. In this study, we explored the extent to which these models ground their semantic understanding in the actual visual input. Specifically, given sequences of hands interacting with objects, we asked models when and where the interaction begins or ends. For this purpose, we introduce a first-of-its-kind, large-scale dataset with more than 20K annotated interactions on videos from the Something-Something-V2 dataset. 250 AMTurk human annotators labeled core interaction events, particularly when and where objects and agents become attached ('contact') or detached ('release'). We asked SoTA LMMs, including GPT, Gemini and Qwen, to locate these events in short videos, each with a single event. The results show that while models reliably name target objects and identify actions, they exhibit a form of 'shortcut learning' where semantic success masks a failure in physical grounding. Specifically, they consistently fail to identify the frame where the interaction begins or ends and poorly localize the physical event within the scene. This disconnect suggests that while LMMs excel at System 1 intuitive pattern recognition (naming the action and objects), they lack the System 2 cognitive foundations required to reason about physical primitives like 'contact' and 'release', and hence to truly ground dynamic scenes in physical reality.

2307.05520 2026-06-12 cs.LG cs.CY cs.SE

Estimating Deep Learning energy consumption based on model architecture and training environment

Santiago del Rey, Luís Cruz, Xavier Franch, Silverio Martínez-Fernández

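Affiliations: Universitat Politècnica de Catalunya; Tecnológico de Delft

AI summary: Analyzes how model architecture and training environment affect energy consumption and proposes the STEP and PRE methods, which markedly improve the accuracy of energy estimates; the right configuration cuts training energy by up to 80.68%.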

Comments 48 pages, 10 figures, under review in Computer Standards & Interfaces journal. This work is an extension of arXiv:2307.05520v3 [cs.LG]

Abstract

To raise awareness of the environmental impact of deep learning (DL), many studies estimate the energy use of DL systems. However, energy estimates during DL training often rely on unverified assumptions. This work addresses that gap by investigating how model architecture and training environment affect energy consumption. We train a variety of computer vision models and collect energy consumption and accuracy metrics to analyze their trade-offs across configurations. Our results show that selecting the right model-training environment combination can reduce training energy consumption by up to 80.68% with less than 2% loss in $F_1$ score. We find a significant interaction effect between model and training environment: energy efficiency improves when GPU computational power scales with model complexity. Moreover, we demonstrate that common estimation practices, such as using FLOPs or GPU TDP, fail to capture these dynamics and can lead to substantial errors. To address these shortcomings, we propose the Stable Training Epoch Projection (STEP) and the Pre-training Regression-based Estimation (PRE) methods. Across evaluations, our methods outperform existing tools by a factor of two or more in estimation accuracy.
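
The contrast between a TDP-based estimate and a projection from measured epochs can be sketched as follows. The abstract does not describe how STEP works, so the projection below is only one plausible reading of the name "Stable Training Epoch Projection": measure a few epochs, skip the warm-up, and extrapolate. The function names and the numbers in the usage note are hypothetical.

```python
def tdp_estimate_kwh(tdp_watts, hours):
    """Naive estimate: assume the GPU draws its TDP for the whole run."""
    return tdp_watts * hours / 1000.0

def epoch_projection_kwh(measured_epoch_kwh, total_epochs, warmup=1):
    """Project total training energy from a few measured epochs.

    The first `warmup` epochs are counted as measured; the remaining epochs
    are assumed to cost the mean of the measured post-warm-up epochs.
    """
    stable = measured_epoch_kwh[warmup:]
    if not stable:
        raise ValueError("need at least one measured epoch after warm-up")
    per_epoch = sum(stable) / len(stable)
    return sum(measured_epoch_kwh[:warmup]) + per_epoch * (total_epochs - warmup)
```

For example, a 300 W TDP over 10 hours gives a naive estimate of 3.0 kWh. If the measured epochs cost 0.5, 0.2, 0.2 and 0.2 kWh, projecting to 10 epochs gives 0.5 + 9 × 0.2 = 2.3 kWh.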

2505.22169 2026-06-12 cs.CL

ReliableEval: A Recipe for Stochastic LLM Evaluation via Method of Moments

Gili Lior, Eliya Habba, Shahar Levy, Avi Caciularu, Gabriel Stanovsky

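Affiliations: The Hebrew University of Jerusalem; Google Research

AI summary: Proposes ReliableEval, which uses the method of moments to evaluate the prompt sensitivity of large language models, and finds that even top models such as GPT-4o and Claude-3.7-Sonnet show substantial prompt sensitivity.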

Comments Findings of EMNLP 2025

Journal ref
Findings of the Association for Computational Linguistics: EMNLP 2025, pages 11146-11153, Suzhou, China. Association for Computational Linguistics

Abstract

LLMs are highly sensitive to prompt phrasing, yet standard benchmarks typically report performance using a single prompt, raising concerns about the reliability of such evaluations. In this work, we argue for a stochastic method of moments evaluation over the space of meaning-preserving prompt perturbations. We introduce a formal definition of reliable evaluation that accounts for prompt sensitivity, and suggest ReliableEval - a method for estimating the number of prompt resamplings needed to obtain meaningful results. Using our framework, we stochastically evaluate five frontier LLMs and find that even top-performing models like GPT-4o and Claude-3.7-Sonnet exhibit substantial prompt sensitivity. Our approach is model-, task-, and metric-agnostic, offering a recipe for meaningful and robust LLM evaluation.
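
The idea of estimating how many prompt resamplings are needed can be sketched with sample moments. The rule below is a generic normal-approximation sample-size formula applied to a pilot set of scores, one per paraphrased prompt. It is an assumed stand-in, not the estimator defined in the paper, and the function names are hypothetical.

```python
import math
import statistics

def summarize(scores):
    """First two sample moments of the per-prompt scores."""
    return statistics.fmean(scores), statistics.variance(scores)

def resamplings_needed(pilot_scores, half_width, z=1.96):
    """Smallest n with z * s / sqrt(n) <= half_width, where s is the pilot std."""
    s = statistics.stdev(pilot_scores)
    return max(1, math.ceil((z * s / half_width) ** 2))

# Hypothetical accuracies of one model under five paraphrases of a prompt.
pilot = [0.70, 0.74, 0.66, 0.78, 0.62]
mean, var = summarize(pilot)
n_required = resamplings_needed(pilot, half_width=0.02)
```

With these pilot scores the sample variance is 0.004, so a half-width of 0.02 at the 95% level requires 39 resamplings. A model whose scores vary less across paraphrases would need fewer.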

2402.13906 2026-06-12 cs.CL

Leveraging Collection-Wide Similarities for Unsupervised Document Structure Extraction

Gili Lior, Yoav Goldberg, Gabriel Stanovsky

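Affiliations: Allen Institute for AI; The Hebrew University of Jerusalem; Bar-Ilan University

AI summary: Proposes an unsupervised method that uses inter- and intra-document similarities to extract the collection-wide structure of document collections across domains, capturing recurring topics and abstracting over header variants, to help both human users and structure-aware models.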

Comments Accepted to ACL 2024 findings

Journal ref
Findings of the Association for Computational Linguistics: ACL 2024, pages 9538-9550, Bangkok, Thailand. Association for Computational Linguistics

Abstract

Document collections of various domains, e.g., legal, medical, or financial, often share some underlying collection-wide structure, which captures information that can aid both human users and structure-aware models. We propose to identify the typical structure of documents within a collection, which requires capturing recurring topics across the collection while abstracting over arbitrary header paraphrases, and grounding each topic in the respective document locations. These requirements pose several challenges: headers that mark recurring topics frequently differ in phrasing, certain section headers are unique to individual documents and do not reflect the typical structure, and the order of topics can vary between documents. We then develop an unsupervised graph-based method which leverages both inter- and intra-document similarities to extract the underlying collection-wide structure. Our evaluations on three diverse domains in both English and Hebrew indicate that our method extracts meaningful collection-wide structure, and we hope that future work will leverage our method for multi-document applications and structure-aware models.

2507.11936 2026-06-12 cs.CL cs.AI cs.CV cs.LG

A Survey of Deep Learning for Geometry Problem Solving

Jianzhe Ma, Wenxuan Wang, Qin Jin

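Affiliations: Renmin University of China

AI summary: Surveys deep learning for geometry problem solving, covering the relevant tasks, methods, evaluation metrics, and future directions, and aims to provide a practical reference that advances the field.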

Comments ACL 2026 Main Conference

Abstract

Geometry problem solving, a crucial aspect of mathematical reasoning, is vital across various domains, including education, the assessment of AI's mathematical abilities, and multimodal capability evaluation. The recent surge in deep learning technologies, particularly the emergence of multimodal large language models, has significantly accelerated research in this area. This paper presents a survey of the applications of deep learning in geometry problem solving, including (i) a comprehensive summary of the relevant tasks in geometry problem solving; (ii) a thorough review of related deep learning methods; (iii) a detailed analysis of evaluation metrics and methods; and (iv) a critical discussion of state-of-the-art performance, existing challenges, and promising future directions. Our objective is to offer a comprehensive and practical reference of deep learning for geometry problem solving, thereby fostering further advancements in this field. We maintain a list of relevant papers: https://github.com/majianz/dl4gps.

2508.03721 2026-06-12 cs.CV eess.IV

Enhancing Diameter Measurement Accuracy in Machine Vision Applications

Ahmet Gokhan Poyraz, Ahmet Emir Dirik, Hakan Gurkan, Mehmet Kacmaz

Affiliations: Department of Electrical and Electronics Engineering, Bursa Technical University; Doğu Pres R&D; Department of Computer Engineering, Bursa Uludağ University; Institute of Electrical Information Technology, Clausthal University of Technology

AI summary: Proposes two new methods that use multiple reference parts to improve measurement accuracy, reducing error through conversion factors and pixel information; in experiments the error drops from 13-114 micrometers to 1-2 micrometers.

Comments Preprint

Journal ref
Measurement 278 (2026) 121646

Abstract

In camera measurement systems, specialized equipment such as telecentric lenses is often employed to measure parts with narrow tolerances. However, despite the use of such equipment, measurement errors can occur due to mechanical and software-related factors within the system. These errors are particularly evident in applications where parts of different diameters are measured using the same setup. This study proposes two innovative approaches to enhance measurement accuracy using multiple known reference parts: a conversion factor-based method and a pixel-based method. In the first approach, the conversion factor is estimated from known references to calculate the diameter (mm) of the unknown part. In the second approach, the diameter (mm) is directly estimated using pixel-based diameter information from the references. The experimental setup includes an industrial-grade camera and telecentric lenses. Tests conducted on glass samples (1-12 mm) and metal workpieces (3-24 mm) show that measurement errors, which originally ranged from 13-114 micrometers, were reduced to 1-2 micrometers using the proposed methods. By utilizing only a few known reference parts, the proposed approach enables high-accuracy measurement of all parts within the camera's field of view. Additionally, this method enhances the existing diameter measurement literature by significantly reducing error rates and improving measurement reliability.
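
The two approaches can be sketched as follows. The abstract does not say how the references are combined, so the interpolation of the mm-per-pixel factor and the linear fit below are assumptions chosen for illustration, and the function names are hypothetical.

```python
import numpy as np

def conversion_factor_estimate(ref_px, ref_mm, target_px):
    """Interpolate the mm-per-pixel factor across references, then convert."""
    ref_px, ref_mm = np.asarray(ref_px, float), np.asarray(ref_mm, float)
    order = np.argsort(ref_px)                    # np.interp needs increasing x
    factors = ref_mm[order] / ref_px[order]       # mm per pixel at each reference
    return float(target_px * np.interp(target_px, ref_px[order], factors))

def pixel_based_estimate(ref_px, ref_mm, target_px):
    """Fit mm = a * px + b on the references and evaluate at the target."""
    a, b = np.polyfit(ref_px, ref_mm, deg=1)
    return float(a * target_px + b)
```

With references measured at 100, 200 and 300 pixels for 1.0, 2.0 and 3.0 mm parts, both functions return about 2.5 mm for a part measured at 250 pixels. They differ once the factor drifts with diameter, which is the situation the abstract describes.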

2507.21086 2026-06-12 cs.CL

Multi-Amateur Contrastive Decoding for Text Generation

Jaydip Sen, Subhasis Dasgupta, Hetvi Waghela

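Affiliations: Department of Data Science; Praxis Business School

AI summary: Proposes the Multi-Amateur Contrastive Decoding framework, which uses an ensemble of amateur models to capture undesirable generation patterns more comprehensively and improves the fluency, coherence, and diversity of generated text.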

Comments This paper has been accepted for oral presentation and publication in the proceedings of the IEEE I2ITCON 2025. The conference will be organized in Pune, India, from July 4 to 5, 2025. This is the accepted version of the paper and NOT the final camera-ready version. The paper is 11 pages long and contains 5 figures and 6 tables

Abstract

Contrastive Decoding (CD) has emerged as an effective inference-time strategy for enhancing open-ended text generation by exploiting the divergence in output probabilities between a large expert language model and a smaller amateur model. Although CD improves coherence and fluency, its dependence on a single amateur restricts its capacity to capture the diverse and multifaceted failure modes of language generation, such as repetition, hallucination, and stylistic drift. This paper proposes Multi-Amateur Contrastive Decoding (MACD), a generalization of the CD framework that employs an ensemble of amateur models to more comprehensively characterize undesirable generation patterns. MACD integrates contrastive signals through both averaging and consensus penalization mechanisms and extends the plausibility constraint to operate effectively in the multi-amateur setting. Furthermore, the framework enables controllable generation by incorporating amateurs with targeted stylistic or content biases. Experimental results across multiple domains, such as news, encyclopedic, and narrative, demonstrate that MACD consistently surpasses conventional decoding methods and the original CD approach in terms of fluency, coherence, diversity, and adaptability, all without requiring additional training or fine-tuning.
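
A single decoding step with several amateurs might look like the sketch below. It combines the plausibility constraint from the original CD formulation with an average of the amateurs' log-probabilities. Averaging in log space is an assumption about what "averaging" means here, the consensus penalization mechanism is not implemented, and the function name is hypothetical.

```python
import numpy as np

def macd_scores(expert_logp, amateur_logps, alpha=0.1):
    """Contrastive score with averaged amateurs and a plausibility mask."""
    expert_logp = np.asarray(expert_logp, float)
    amateur_mean = np.mean(np.asarray(amateur_logps, float), axis=0)
    # Plausibility constraint from CD: keep tokens with p_expert >= alpha * max p_expert.
    plausible = expert_logp >= np.log(alpha) + expert_logp.max()
    return np.where(plausible, expert_logp - amateur_mean, -np.inf)

# Toy next-token distributions over a four-token vocabulary (made-up numbers).
expert = np.log([0.60, 0.30, 0.09, 0.01])
amateurs = np.log([[0.70, 0.10, 0.10, 0.10],
                   [0.50, 0.20, 0.20, 0.10]])
scores = macd_scores(expert, amateurs)
next_token = int(np.argmax(scores))
```

In this toy example the expert's top token is also favored by both amateurs, so the contrast selects token 1 instead, while token 3 is masked out as implausible.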

2505.18060 2026-06-12 cs.CV

Semantic Correspondence: Unified Benchmarking and a Strong Baseline

Kaiyan Zhang, Xinghui Li, Jingyi Lu, Kai Han

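Affiliations: The University of Hong Kong

AI summary: Presents the first comprehensive survey of semantic correspondence methods, proposes a taxonomy, aggregates results across benchmarks, and introduces a high-performing baseline as a foundation for future research.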

Journal ref
IEEE Trans. Pattern Anal. Mach. Intell. 48, no. 3 (2026) 3911-3930

Abstract

Establishing semantic correspondence is a challenging task in computer vision, aiming to match keypoints with the same semantic information across different images. Benefiting from the rapid development of deep learning, remarkable progress has been made over the past decade. However, a comprehensive review and analysis of this task remains absent. In this paper, we present the first extensive survey of semantic correspondence methods. We first propose a taxonomy to classify existing methods based on the type of their method designs. These methods are then categorized accordingly, and we provide a detailed analysis of each approach. Furthermore, we aggregate and summarize the results of methods in the literature across various benchmarks into a unified comparative table, with detailed configurations to highlight performance variations. Additionally, to provide a detailed understanding of existing methods for semantic matching, we conduct thorough controlled experiments to analyse the effectiveness of the components of different methods. Finally, we propose a simple yet effective baseline that achieves state-of-the-art performance on multiple benchmarks, providing a solid foundation for future research in this field. We hope this survey serves as a comprehensive reference and consolidated baseline for future development. Code is publicly available at: https://github.com/Visual-AI/Semantic-Correspondence.

2412.14631 2026-06-12 cs.CV

Review of Fruit Tree Image Segmentation

Il-Seok Oh

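Affiliations: Department of Computer Science and Artificial Intelligence/CAIIT, Jeonbuk National University, South Korea

AI summary: Reviews research on segmenting front-view images of fruit trees, notes that existing work lacks a versatile dataset and model, and proposes six future research directions toward a versatile segmentation module.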

Journal ref
Agriculture, Volume 15, Issue 21, 2025

Abstract

Fruit tree image segmentation is an essential problem in automating a variety of agricultural tasks such as phenotyping, harvesting, spraying, and pruning. Many research papers have proposed a diverse spectrum of solutions suitable to specific tasks and environments. The review scope of this paper is confined to the front views of fruit trees and is based on 158 relevant papers collected using a newly designed crawling review method. These papers are systematically reviewed based on a taxonomy that sequentially considers the method, image, task, and fruit. This taxonomy will help readers intuitively grasp the big picture of these research activities. Our review reveals that the most noticeable deficiency of the previous studies was the lack of a versatile dataset and segmentation model that could be applied to a variety of tasks and environments. Six important future research tasks are suggested, with the expectation that these will pave the way to building a versatile tree segmentation module.

2306.01690 2026-06-12 cs.LG cs.AI

Context selectivity with dynamic availability enables lifelong continual learning

Martin Barry, Wulfram Gerstner, Guillaume Bellec

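Affiliations: Department of Life Sciences, Department of Computer Sciences

AI summary: Proposes a meta-plasticity rule based on context selectivity and dynamic availability and shows in simulation that the model outperforms existing continual learning algorithms on image recognition and natural language processing tasks.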

英文摘要

"You never forget how to ride a bike", -- but how is that possible? The brain is able to learn complex skills, stop the practice for years, learn other skills in between, and still retrieve the original knowledge when necessary. The mechanisms of this capability, referred to as lifelong learning (or continual learning, CL), are unknown. We suggest a bio-plausible meta-plasticity rule building on classical work in CL which we summarize in two principles: (i) neurons are context selective, and (ii) a local availability variable partially freezes the plasticity if the neuron was relevant for previous tasks. In a new neuro-centric formalization of these principles, we suggest that neuron selectivity and neuron-wide consolidation is a simple and viable meta-plasticity hypothesis to enable CL in the brain. In simulation, this simple model balances forgetting and consolidation leading to better transfer learning than contemporary CL algorithms on image recognition and natural language processing CL benchmarks.

2302.01090 2026-06-12 cs.SD cs.IR eess.AS

Goniometers are a Powerful Acoustic Feature for Music Information Retrieval Tasks

角度仪是音乐信息检索任务中一种强大的音频特征

Tim Ziemer

发表机构 * University of Hamburg(汉堡大学)

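AI summary: The paper explores goniometers as a feature for music information retrieval, uses self-organizing maps to show their usefulness for genre classification and album clustering, and highlights causality as their advantage.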

详情
Journal ref
Fortschritte der Akustik (DAGA) 2023
AI中文摘要

角度仪,也称为相位图或向量图,是音频测量工具,帮助音乐制作人和混音工程师监测音乐混音的空间特性,如立体声全景、单个声源的宽度、回声的量和扩散度以及可能发生的相位抵消。此外,它们隐含地提供了声音的动力学信息。通过训练自组织映射来探索这种音频特征在音乐信息检索任务中的有用性。可以观察到,角度仪能够区分不同流派并聚类单张专辑。角度仪的优势在于因果性:音乐制作人和混音工程师有意识地查阅角度仪以达到期望的声音,而其他音频特征如零穿越率到梅尔频率倒谱系数则并非如此。

英文摘要

Goniometers, also known as Phase Scopes or Vector Scopes, are audio metering tools that help music producers and mixing engineers monitor spatial aspects of a music mix, such as the stereo panorama, the width of single sources, the amount and diffuseness of reverberation, as well as phase cancellations that may occur at the sweet spot and in a mono mixdown. In addition, they implicitly carry information about the dynamics of the sound. Self-organizing maps trained with a goniometer are consulted to explore the usefulness of this acoustic feature for music information retrieval tasks. One can see that goniometers are able to classify different genres and cluster a single album. The advantage of goniometers is causality: music producers and mixing engineers consciously consult goniometers to reach their desired sound, which is not the case for other acoustic features, from Zero-Crossing Rate to Mel-Frequency Cepstral Coefficients.

2606.13529 2026-06-12 cs.HC cs.LG 新提交

Ride, Track, and Recover: Pilot Randomized Trial of a Wearable Digital Self-Management Intervention During a Veteran Endurance-Cycling Program

骑行、追踪与恢复:一项关于可穿戴数字自我管理干预在退伍军人耐力骑行项目中的初步随机试验

Alan Ta, Nilsu Salgin, Caleb Armstrong, Kala Phillips Reindel, Farzan Sasangohar

发表机构 * Department of Industrial and Systems Engineering, Texas A&M University(工业与系统工程系,德克萨斯A&M大学) Texas A&M Health Telehealth Institute(德克萨斯A&M健康远程医疗研究所)

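AI summary: A pilot randomized trial evaluates how a wearable digital self-management intervention stabilizes hyperarousal symptoms of post-traumatic stress disorder (PTSD) in veterans; the intervention group maintained its symptom gains better, and the perceived precision of machine-learning detections was positively associated with symptom severity.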

详情
AI中文摘要

退伍军人的创伤后应激障碍(PTSD)以持续高唤醒及共病焦虑和抑郁症状为特征,这些症状在临床环境外难以监测和管理。在德克萨斯州参加“英雄计划”骑行活动的13名退伍军人,通过计算机生成序列在自然环境中随机分为两组:(1)数字干预加体力活动,或(2)仅体力活动,外加一个由从更广泛的“英雄计划”退伍军人社区中选出的7名退伍军人组成的第三组家庭监测对照组。连续智能手表传感结合心率和加速度计特征来检测高唤醒事件,并由参与者实时确认。每周收集焦虑、抑郁和PTSD严重程度的自我报告测量。广义加性混合模型描述了随时间变化的非线性轨迹。基线归一化的高唤醒轨迹在不同条件下存在显著差异,数字干预组(n=7)显示出结构化的稳定,而仅体力活动组(n=3)在研究后期出现恶化。两个骑行组在耐力活动期间均表现出急性症状改善;然而,数字干预组表现出更高的整体收益维持。家庭对照组(n=4)显示出症状逐渐下降。机器学习检测的感知精度在个体间差异很大,并与症状严重程度正相关,较高严重程度的参与者确认了更大比例的检测事件。这些结果表明,将可穿戴检测与数字自我管理工具相结合可能支持高唤醒的稳定和症状改善,同时强调了在可穿戴心理健康系统中个性化和以人为中心的设计的重要性。

英文摘要

Post-traumatic stress disorder (PTSD) in veterans is characterized by persistent hyperarousal and comorbid anxiety and depressive symptoms that are difficult to monitor and manage outside clinical settings. Thirteen veterans participating in a Project Hero cycling event in Texas were randomized by computer-generated sequence in a naturalistic setting to two arms: (1) digital intervention plus physical activity, or (2) physical activity only, plus a third at-home monitoring control cohort consisting of 7 veterans selected from the broader Project Hero veteran community. Continuous smartwatch sensing combined heart rate and accelerometer features to detect hyperarousal events, which were confirmed in real time by participants. Weekly self-report measures of anxiety, depression, and PTSD severity were collected. Generalized additive mixed models characterized nonlinear trajectories over time. Baseline-normalized hyperarousal trajectories differed significantly across conditions, with the digital intervention group (n=7) showing structured stabilization compared to late-study escalation in the physical-only group (n=3). Both cycling groups exhibited acute symptom improvements during the endurance event; however, the digital intervention group demonstrated a higher overall maintenance of gains. The at-home control group (n=4) showed gradual symptom declines. Perceived precision of ML detections varied substantially across individuals and was positively associated with symptom severity, with higher-severity participants confirming a greater proportion of detected events. These results suggest that coupling wearable detection with digital self-management tools may support stabilization of hyperarousal and symptom improvement while emphasizing the importance of personalization and human-centered design in wearable mental health systems.

2606.13452 2026-06-12 cs.DL cs.CL cs.CY cs.HC 新提交

Examining the Cognitive Gap Between Authors and Peer Reviewers on Academic Paper Novelty

审视作者与同行评审员在学术论文新颖性上的认知差距

Chenggang Yang, Chengzhi Zhang

发表机构 * Department of Information Management, Nanjing University of Science and Technology(南京理工大学信息管理学院)

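AI summary: An analysis of 15,328 Nature Communications papers and their peer-review comments finds that authors and reviewers both emphasize result-oriented innovation, with reviewers taking a more comprehensive view; highly innovative papers benefit from stronger promotional language, and for moderately innovative papers promotional language correlates significantly with reviewer disagreement.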

详情
Journal ref
Scientometrics, 2026
AI中文摘要

新颖性是评估学术论文质量的关键指标。学者们努力突出其工作的新颖方面,尤其是在标题、摘要和引言中。同行评审作为科学严谨性的守门人,严格评估论文的新颖性,但作者自我宣传与评审员评价之间可能存在认知差距。为探究此问题,我们分析了2016年至2021年间发表在Nature Communications上的15,328篇学术论文及其同行评审意见。我们发现,评审员和作者都强调结果导向的创新,但评审员采用更全面的评价视角。此外,通过考察宣传强度与论文固有新颖性的关系,我们发现其效果取决于论文的实际创新水平。高创新论文受益于更强的宣传语言,获得更积极的评价。我们还发现,宣传语言与评审员对新颖性的分歧显著相关,但仅针对中等创新性的论文,而对高或低新颖性的论文影响甚微。这揭示了宣传语言如何在学术评价的灰色地带中发挥最显著的作用。

英文摘要

Novelty is a crucial metric for assessing the quality of academic papers. Scholars strive to highlight the novel aspects of their work, particularly in the title, abstract, and introduction. Peer review, serving as the gatekeeper of scientific rigor, rigorously evaluates the novelty of papers, yet a cognitive gap may exist between author self-promotion and reviewer evaluation. To investigate this, we analyzed 15,328 academic papers published in Nature Communications from 2016 to 2021, along with their peer-review comments. We found that both reviewers and authors emphasize result-oriented innovation, with reviewers adopting a more comprehensive evaluation perspective. Furthermore, by examining promotional intensity against inherent paper novelty, we found that its effect depends on the paper's actual innovation level. Highly innovative papers benefit from stronger promotional language, receiving more positive evaluations. We also found that promotional language significantly correlates with reviewer disagreement on novelty specifically for papers of moderate innovativeness, whereas it has negligible impact for papers with either very high or very low novelty. This reveals how promotional language operates most prominently in the gray area of academic evaluation.

2606.13179 2026-06-12 cs.ET cs.AI cs.AR cs.NE 新提交

Modern analog computing for solving differential and matrix equations

现代模拟计算用于求解微分方程和矩阵方程

Zhong Sun, Piergiulio Mannocci, Manuel Le Gallo, Abu Sebastian

发表机构 * Institute for Artificial Intelligence, School of Integrated Circuits, Peking University, Beijing Advanced Innovation Center for Integrated Circuits(人工智能研究院,集成电路学院,北京大学,北京集成电路先进创新中心) Dipartimento di Elettronica, Informazione e Bioingegneria, Politecnico di Milano(电子、信息与生物工程系,米兰理工大学) IBM Research Europe(IBM欧洲研究院)

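AI summary: The paper reviews the core primitives, hardware implementations, and recent progress of modern analog computing for solving differential and matrix equations, highlights the promise of resistive memory arrays, and discusses precision, scalability, and the relationship with in-memory computing.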

详情
AI中文摘要

近年来,受人工智能和科学计算等数据密集型应用的计算需求驱动,模拟计算重新获得关注。鉴于计算任务的多样性以及模拟CMOS电路和电阻式存储器技术的最新进展,我们将这一不断发展的领域称为现代模拟计算。在此背景下,我们识别出三个核心计算原语:求解微分方程、求解矩阵方程以及执行矩阵-向量乘法,并探讨它们之间的联系。我们还研究了这些模拟计算算子的各种硬件实现,包括基于分立元件、集成电路和电阻式存储器设备的实现。其中,电阻式存储器阵列因其实现效率而显得尤为有前景。本文随后调查了利用现代模拟计算(使用先进的模拟CMOS电路和电阻式存储器阵列)求解微分方程和矩阵方程的最新进展。最后,我们讨论了这些电路的应用、精度和可扩展性问题及其潜在解决方案、与内存计算的关系,以及模拟计算的独特计算复杂性。本文提供了关于模拟计算的统一视角,强调了其优势、当前发展和挑战,并将其定位为下一代计算前沿的关键推动者。

英文摘要

In recent years, driven by the computational demands of data-intensive applications such as artificial intelligence and scientific computing, analog computing has gained renewed interest. Given the diversity of computational tasks and recent advancements in analog CMOS circuits and resistive memory technologies, we refer to the evolving landscape as modern analog computing. In this context, we identify three core computational primitives: solving differential equations, solving matrix equations, and performing matrix-vector multiplications, and we explore the connections among them. We also examine various hardware implementations of these analog computing operators, including those built with discrete components, integrated circuits, and resistive memory devices. Among these, resistive memory arrays emerge as particularly promising due to their implementation efficiency. The paper then surveys recent progress in leveraging modern analog computing to solve differential and matrix equations using both advanced analog CMOS circuits and resistive memory arrays. Finally, we discuss the applications of these circuits, the precision and scalability issues and their potential solutions, the relationship with in-memory computing, and the unique computational complexity of analog computing. This paper provides a unified perspective on analog computing, highlighting its strengths, current developments, and challenges, and positioning it as a pivotal enabler of next-generation computational frontiers.
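The abstract names matrix-vector multiplication as a core primitive without spelling out the circuit. As a rough sketch only, assuming an idealized resistive crossbar (linear devices, no wire resistance, outputs held at virtual ground), the product can be read as column currents: Ohm's law gives each device current and Kirchhoff's current law sums them per column. The function name `crossbar_mvm` and the toy values are hypothetical and not taken from the paper.

```python
def crossbar_mvm(conductances, voltages):
    """Idealized crossbar read-out: column currents I_j = sum_i G[i][j] * V[i].

    conductances[i][j] is the conductance (siemens) of the device that joins
    input row i to output column j; voltages[i] is the voltage on row i.
    """
    n_cols = len(conductances[0])
    currents = [0.0] * n_cols
    for i, v in enumerate(voltages):
        for j in range(n_cols):
            # Ohm's law per device; Kirchhoff's current law sums along the column.
            currents[j] += conductances[i][j] * v
    return currents


# Toy 2x2 array (assumed values): conductances in siemens, voltages in volts.
G = [[1e-6, 2e-6],
     [3e-6, 4e-6]]
V = [0.1, 0.2]
I = crossbar_mvm(G, V)  # column currents in amperes
```

In this idealization the whole product is obtained in one read step, which is one way to understand the implementation efficiency the abstract attributes to resistive memory arrays.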

2606.13079 2026-06-12 cs.CR cs.AI 新提交

The Emergence of Autonomous Penetration Capabilities in Large Language Model-Powered AI Systems

大型语言模型驱动的AI系统中自主渗透能力的涌现

Jiaqi Luo, Jiarun Dai, Zhile Chen, Jia Xu, Weibing Wang, Yawen Duan, Brian Tse, Geng Hong, Xudong Pan, Yuan Zhang, Min Yang

发表机构 * Fudan University(复旦大学) Shanghai Artificial Intelligence Laboratory(上海人工智能实验室) Concordia AI Shanghai Innovation Institute(上海创新研究院)

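AI summary: To address opaque methodologies and oversimplified scenarios in existing evaluations, the authors build an autonomous penetration evaluation framework with two tiers of target servers and a general-purpose agent scaffold; across 19 LLMs, success rates range from 10.7% to 69.3%, and the capability improves with overall model capability.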

详情
AI中文摘要

如今,能够造成重大现实世界危害的网络攻击的自主执行被广泛视为前沿AI系统不得跨越的关键红线之一。在这个更广泛的红线场景中,自主渗透代表了一项核心使能能力和子任务:LLM驱动的AI系统在无需人工干预的情况下,独立对目标服务器进行对抗操作,识别和利用漏洞,并获得未授权访问或控制的能力。越来越多的研究试图评估AI系统的自主渗透能力。然而,现有评估通常采用不透明的方法,依赖不切实际或过度简化的渗透测试场景,或为LLM提供过多的先验知识和任务特定指导,无法准确捕捉现代AI系统在更广泛的高影响网络攻击场景中自主执行这一核心能力的程度。为解决这些局限性,我们构建了一个新的自主渗透评估框架,包含两个组成部分:目标服务器和代理脚手架。具体而言,在目标服务器端,我们基于与易受攻击服务一起部署的无已知漏洞安全服务的数量,设计了两个级别的目标环境:一级(一个安全服务)和二级(三个安全服务),共产生300个目标服务器。同时,代理脚手架采用通用代理架构,配备一组通用网络安全工具,没有任何目标特定的先验知识。我们评估了19个开源和专有LLM,发现当前模型的渗透成功率在10.7%到69.3%之间。此外,我们观察到自主渗透能力随着整体模型能力的提升而持续改进。

英文摘要

Nowadays, the autonomous execution of cyberattacks capable of causing substantial real-world harm is widely regarded as one of the critical red lines that frontier AI systems must not cross. Within this broader red-line scenario, autonomous penetration represents a core enabling capability and subtask: the ability of LLM-powered AI systems to independently conduct adversarial operations against a target server without human intervention, identify and exploit vulnerabilities, and obtain unauthorized access or control. A growing body of work has sought to assess the autonomous penetration capabilities of AI systems. However, existing evaluations often employ opaque methodologies, rely on unrealistic or overly simplified penetration-testing scenarios, or provide LLMs with excessive prior knowledge and task-specific guidance, and cannot accurately capture the extent to which modern AI systems can autonomously perform this core capability within broader high-impact cyberattack scenarios. To address these limitations, we construct a new autonomous penetration evaluation framework consisting of two components: target servers and agent scaffolding. Specifically, on the target-server side, we design two levels of target environments based on the number of secure services without known vulnerabilities deployed alongside a vulnerable service: Tier 1 (one secure service) and Tier 2 (three secure services), resulting in a total of 300 target servers. Meanwhile, the agent scaffolding adopts a general-purpose agent architecture equipped with a set of general-purpose cybersecurity tools, without any target-specific prior knowledge. We evaluate 19 open-weight and proprietary LLMs, and find that current models achieve penetration success rates ranging from 10.7% to 69.3%. Moreover, we observe that autonomous penetration capability continues to improve alongside advances in overall model capability.

2606.13076 2026-06-12 cs.MA cs.GT cs.LG 新提交

$α$-fair heterogeneous agent reinforcement learning

$\alpha$-公平异质智能体强化学习

Yao-hua Franck Xu, Tayeb Lemlouma, Jean-Marie Bonnin, Arnaud Braud

发表机构 * Orange Innov(Orange创新)

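AI summary: The paper proposes a framework that combines $\alpha$-fairness with Heterogeneous-Agent Trust Region Learning (HATRL); a fair advantage function dynamically weights agent utilities, giving monotonic improvement and convergence toward Nash equilibria, and the resulting algorithms outperform HATRL's algorithms in sequential social dilemmas.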

详情
AI中文摘要

多智能体系统中的合作通常通过功利主义目标进行优化,这些目标最大化整体效率但未能考虑奖励分配,常常导致不公平的“领导者-跟随者”动态。虽然基于公平的方法鼓励每个智能体从合作中受益的亲社会行为,但许多当前算法——包括那些利用奖励塑造的算法——破坏了马尔可夫博弈的平稳性或缺乏严格的理论保证。这在公平目标方法和理论上安全的学习框架之间造成了关键差距。我们提出了一种新颖的框架,将$\alpha$-公平性与异质智能体信任区域学习(HATRL)相结合,确保单调改进并收敛至纳什均衡。我们的方法利用一种公平优势函数,该函数根据智能体的期望回报动态加权其效用,使得全局目标能够根据参数$\alpha$从纯粹的功利主义效率过渡到$\alpha$-公平福利。我们引入了两种实用算法,$\alpha$-公平HATRPO和$\alpha$-公平HAPPO,并通过在CleanUp和CommonHarvest等顺序社会困境中的实验证明,从功利主义角度看,它们比HATRL算法表现更好,同时实现了更高的社会结果。

英文摘要

Cooperation in multi-agent systems is typically optimized through utilitarian objectives that maximize overall efficiency but fail to account for reward distribution, often resulting in inequitable "leader-follower" dynamics. While fairness-based approaches encourage pro-social behaviors where every agent benefits from cooperation, many current algorithms - including those utilizing reward shaping - break the stationarity of Markov Games or lack rigorous theoretical guarantees. This creates a critical gap between fair objective methods and theoretically safe learning frameworks. We propose a novel framework that bridges $α$-fairness with Heterogeneous-Agent Trust Region Learning (HATRL), ensuring monotonic improvement and convergence toward Nash Equilibria. Our approach leverages a fair advantage function that dynamically weights agent utilities based on their expected returns, allowing the global objective to transition from purely utilitarian efficiency to $α$-fairness welfare based on the parameter $α$. We introduce two practical algorithms, $α$-fair HATRPO and $α$-fair HAPPO, and demonstrate through experiments in sequential social dilemmas like CleanUp and CommonHarvest that they perform better than HATRL's algorithms from a utilitarian point of view while achieving socially higher outcomes.
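The abstract does not state the paper's objective, so the following is only a sketch of the textbook $α$-fair welfare function from the resource-allocation literature, which may differ from the authors' formulation. It assumes strictly positive expected returns. The names `alpha_fair_welfare` and `alpha_fair_weights` are hypothetical. The weights are the partial derivatives of the welfare with respect to each agent's return, which is one plausible reading of "dynamically weights agent utilities based on their expected returns".

```python
import math


def alpha_fair_welfare(returns, alpha):
    """Textbook alpha-fair welfare of strictly positive per-agent returns.

    alpha = 0 gives the utilitarian sum, alpha = 1 gives proportional
    fairness (sum of logs); larger alpha favors the worst-off agent more.
    """
    if alpha == 1:
        return sum(math.log(r) for r in returns)
    return sum(r ** (1 - alpha) / (1 - alpha) for r in returns)


def alpha_fair_weights(returns, alpha):
    """Gradient of the welfare w.r.t. each return: r_i ** (-alpha).

    Agents with lower returns receive larger weights whenever alpha > 0.
    """
    return [r ** (-alpha) for r in returns]


# Toy returns for two agents (assumed values).
returns = [1.0, 4.0]
utilitarian_weights = alpha_fair_weights(returns, 0)   # equal weights
proportional_weights = alpha_fair_weights(returns, 1)  # favors the poorer agent
```

With alpha = 0 every agent is weighted equally, which recovers the utilitarian objective; raising alpha shifts weight toward agents with lower returns.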

2606.13039 2026-06-12 cs.CY cs.AI cs.HC 新提交

Fault Lines: Navigating Ethics and Responsible AI Where National Policy Meets Local Practice in Public Sector Transformation

断层线:在公共部门转型中国家政策与地方实践交汇处的伦理与负责任AI导航

Sitong Lyu, Shabnam Taghiyeva, Mohit Kukadia, Denis Newman-Griffis

发表机构 * Centre for Machine Intelligence, University of Sheffield(谢菲尔德大学人工智能中心) Blavatnik School of Government, University of Oxford(牛津大学布莱瓦尼克政府学院)

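AI summary: Taking Special Educational Needs and Disabilities (SEND) in the UK as a case study, a thematic analysis of 17 semi-structured interviews identifies five challenges for responsible AI where national policy meets local practice and argues for national policy adjustments and local structural reforms.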

Comments 10 pages plus references. This study was funded by the University of Sheffield

详情
AI中文摘要

英国政府采取了支持AI的立场,以帮助在严重财政压力下转变公共服务交付,但将这一愿景转化为负责任的AI实践的道路仍然不明确。虽然英国政策通常在国家层面制定,但地方当局负责大多数公共服务交付,而公共部门中AI优先叙事的快速推进正在暴露这一国家-地方接口在知识和实践方面的断层线。本文以高风险的特殊教育需求与残疾(SEND)领域为案例,研究英国中央政府与地方当局之间接口处负责任AI的解释和实施方式。我们对17位政策制定者、从业者和第三部门专业人士进行了半结构化访谈,并进行了主题分析,以识别在国家政策与地方实践交汇处负责任AI的障碍和促成条件。我们发现了地方当局面临的五个相互关联的挑战:AI的影子使用和数据隐私风险、AI供应中的市场-政府不对称、劳动力准备不足、缺乏标准化定义和测量,以及人类问责制的缺口。针对每个挑战,参与者提出了可操作的步骤,从加强数据保护框架和重新平衡市场-政府关系到提升劳动力能力。我们对SEND的审查使这些挑战更加突出,展示了影响弱势儿童和家庭的高风险决策如何加剧了关于问责制、公平性和人类监督的紧张关系,暴露了基于原则的监管方法的局限性。我们认为,负责任的公共部门AI需要国家政策调整以及地方层面机构能力、价值观和治理机制的结构性改革。

英文摘要

The UK government has adopted a pro-AI stance to help transform public service delivery in the face of severe financial pressures, but the path to translate this vision into responsible AI practice remains ill-defined. While UK policy is often set at the national level, local authorities are responsible for most public service delivery, and the rapid advance of AI-first narratives in the public sector is exposing fault lines in knowledge and practice at this national-local interface. This paper examines how responsible AI is interpreted and implemented at the interface between the UK's central government and local authorities, taking the high-stakes area of Special Educational Needs and Disabilities (SEND) as a case study. We present a thematic analysis of 17 semi-structured interviews with policymakers, practitioners, and third-sector professionals to identify barriers and enabling conditions for responsible AI where national policy meets local practice. We identify five interconnected challenges facing local authorities: shadow usage of AI and data privacy risks, market-government asymmetry in AI provision, insufficient workforce readiness, a lack of standardised definitions and measurements, and gaps in human accountability. For each, participants proposed actionable steps, from strengthening data protection frameworks and rebalancing the market-government relationship to enhancing workforce capacity. Our examination of SEND brings these challenges into sharper focus, showing how high-stakes decisions affecting vulnerable children and families intensify tensions around accountability, fairness, and human oversight, exposing the limits of a principle-based regulatory approach. We argue that responsible public sector AI requires both national policy adjustments and structural reforms to institutional capacity, values, and governance mechanisms at the local level.

2606.12443 2026-06-12 cs.CY cs.AI cs.CL 新提交

Occupational Prompting Reveals Cultural Bias in Large Language Models

职业提示揭示大型语言模型中的文化偏见

Maksim E. Eren, Andrea Brennen, Ryan C. Barron, Eric Michalak

发表机构 * U.S. Government(美国政府)

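AI summary: Replacing nationality-based prompts with occupational prompts (such as accountant or teacher), the study examines open-weight LLM responses to value-survey questions and finds that different occupations shift responses within the cultural map, indicating that occupational roles elicit structured value patterns.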

详情
AI中文摘要

社会角色塑造期望、优先级和判断,但大型语言模型(LLM)如何将职业身份与更广泛的文化价值模式关联仍不清楚。先前工作使用基于国籍的文化提示来研究LLM对价值观调查问题的响应如何与人类文化基准对齐。本文通过用职业提示替代文化提示,扩展了该框架,以检查职业角色线索如何影响开源LLM的价值观调查响应。使用基于综合价值观调查问题的调查评估流程,我们将模型响应投影到二维Inglehart-Welzel文化空间。我们提示开源LLM以职业身份(如会计师、教师、工程师和护士)回答问题,然后分析这些职业条件化响应在文化地图上的位置。结果表明,当用职业而非国籍身份提示开源LLM时,其响应仍位于文化地图的广泛西方倾向区域。然而,不同职业在该区域内引入偏移,产生不同的职业偏差。这表明职业提示并非被视为中性角色标签,而是引发结构化价值模式。这些发现将基于调查的文化偏见评估扩展到国籍提示之外,并提供了研究职业角色如何塑造LLM中价值表达的框架。

英文摘要

Social roles shape expectations, priorities, and judgments, yet it remains unclear how large language models (LLMs) associate occupational identities with broader cultural value patterns. Prior work used nationality-based cultural prompting to study how LLM responses to value-survey questions align with human cultural benchmarks. In this paper, we extend that framework by replacing cultural prompting with occupational prompting to examine how professional-role cues influence value-survey responses in open-weight LLMs. Using a survey-grounded evaluation pipeline based on questions from the Integrated Values Surveys, we project model responses into the two-dimensional Inglehart-Welzel cultural space. We prompt open-weight LLMs to answer questions under occupational identities such as accountant, teacher, engineer, and nurse, and then analyze how these occupation-conditioned responses are positioned on the cultural map. Our results show that when open-weight LLMs are prompted with occupations rather than national identities, their responses remain within a broadly Western-leaning region of the cultural map. However, different occupations introduce shifts within this region, producing distinct occupational skews. This indicates that occupational prompts are not treated as neutral role labels, but instead elicit structured value patterns. These findings extend survey-based evaluation of cultural bias beyond nationality-based prompting and provide a framework for studying how occupational personas shape value expression in LLMs.

2606.12442 2026-06-12 cs.CY cs.AI 新提交

Reframing AI Loss of Control: What It Is, How to Have It, How to Lose It

重新定义AI失控:它是什么,如何拥有,如何失去

Ze Shen Chin, Maurice Chiodo, Dennis Müller, Coleman Snell

发表机构 * Oxford Martin AI Governance Initiative AI Standards Lab(牛津马丁人工智能治理倡议人工智能标准实验室) Centre for the Study of Existential Risk, University of Cambridge(存在风险研究中心,剑桥大学) Institute of Mathematics Education, University of Cologne(数学教育研究所,科隆大学) Cornell University(康奈尔大学)

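AI summary: The paper establishes a working definition of control by anchoring it to the "setting and getting of goals", discusses how control can be lost and how AI can contribute to that loss, and offers recommendations for maintaining control.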

Comments 56 pages

详情
AI中文摘要

目前,失控风险在公众讨论中备受关注,尤其是在AI领域,学术界、前沿实验室甚至政府都进行了广泛讨论。然而,在现有文献中,这一概念的基础似乎出奇地薄弱,即使是那些广泛讨论失控的人,也没有首先确立什么是控制以及究竟失去了什么。本文旨在解决这些空白。我们将控制锚定于“设定和获取目标”,从而建立控制的工作定义。然后,我们基于控制论、管理控制和控制理论等相关领域的基础概念,讨论控制的各个方面。这包括谁(或什么)可以处于控制之中,以及他们需要什么才能处于控制之中,例如设定目标的能力、拥有功能性的控制回路、具备必要的多样性以及足够的目标对齐。一旦建立了控制框架,我们将讨论控制如何被失去,AI如何导致这种失控,并提供关于如何保持控制的相关建议。我们工作的一个有趣结果是,人类作为个体和群体,可能因远低于超级智能水平的AI行为而失去不同程度的控制;失控情景(如我们所定义的)的可能性已经存在,并且已经存在了很长时间。

英文摘要

At present, loss of control risks have gained much prominence in public discussion, particularly in relation to AI, with extensive discourse present among academics, frontier labs, and even governments. However, in the existing literature, the concept seems to rest on surprisingly weak foundations, where even those who discuss loss of control extensively do not first establish what control is and what exactly is being lost. Our paper aims to address these gaps. We establish a working definition of control by anchoring it to the "setting and getting of goals". Then, we discuss various aspects of control, built on foundational concepts from related fields like cybernetics, management control, and control theory. This includes who (or what) can be in control, and the things they require to be in control, such as the ability to set goals, having a functional control loop, having requisite variety, and having sufficient goal alignment. Once a framework for control is established, we then discuss how control can be lost, how AIs can contribute to such loss of control, and offer relevant recommendations for how one can maintain control. One interesting consequence of our work is that humanity, as individuals and as groups, can lose varying degrees of control as a result of AI behaviour that is far below the level of superintelligence; the potential for loss of control scenarios (as we define them) already exists, and has existed for a long time.

2606.12439 2026-06-12 cs.CY cs.AI 新提交

Position: Generative Engine Optimization Creates Underexamined Risks, Governance Must Target Concentration, Disclosure, and Academic Blind Spots

立场:生成式引擎优化带来未被充分研究的风险,治理必须聚焦于集中化、披露和学术盲点

Yizhu Wen, Nan Zhang, Haohan Yuan, Xun Chen, Haopeng Zhang, Hanqing Guo

发表机构 * GitHub

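AI summary: The paper analyzes the shift from search engine optimization to generative engine optimization, identifies three risks (concentrated influence, undisclosed commercial influence, and academic-industry blind spots), and argues for answer-level governance and measurement.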

Comments This paper is accepted by the ICML 2026 Position Track

详情
Journal ref
https://icml.cc/virtual/2026/poster/67185
AI中文摘要

大型语言模型(LLM)答案引擎越来越多地被用于信息搜索,将可见性从排名列表转变为合成答案。这使得生成式引擎优化(GEO)成为可能,它针对LLM答案引擎的证据池和生成过程。我们分析了从搜索引擎优化(SEO)到GEO的转变,识别出两个风险:(i)由于低可争议性和系统敏感性导致的集中化影响,以及(ii)嵌入在证据和推理中的未披露的商业影响。然后,我们形式化了一个通用的GEO管道,以定位优化行为发生的位置,并比较学术和工业实践,揭示了第三个风险:(iii)由离线设置和部署系统之间的可见性和评估不对称性驱动的学术-工业盲点。这一立场主张需要答案级别的治理和测量:更强的可争议性、高精度披露、对实质性影响的黑盒审计,以及用于暴露持久性的部署对齐指标。

英文摘要

Large language model (LLM) answer engines are increasingly used for information seeking, shifting visibility from ranked lists to synthesized answers. This enables Generative Engine Optimization (GEO), which targets LLM answer engines' evidence pool and generation. We analyze the search engine optimization (SEO) to GEO transition to identify two risks: (i) concentrated influence from low contestability and system sensitivity, and (ii) undisclosed commercial influence embedded in evidence and reasoning. We then formalize a general GEO pipeline to locate where optimization acts and compare academic and industry practices, revealing a third risk: (iii) academic-industry blind spots driven by visibility and evaluation asymmetries between offline setups and deployed systems. This position argues the need for answer-level governance and measurement: stronger contestability, high-precision disclosure, black-box auditing of material influence, and deployment-aligned metrics for exposure persistence.

2606.12435 2026-06-12 cs.CY cs.DB cs.LG 新提交

Auditing Discriminatory Patterns in Mortgage Lending Through Association Rules and Fair Binning

通过关联规则和公平分箱审计抵押贷款中的歧视性模式

Archit Rathod, Dhwani Chande, Het Nagda

发表机构 * University of Illinois Chicago(伊利诺伊大学芝加哥分校)

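AI summary: The study asks whether standard binning during preprocessing amplifies racial and gender disparities in mortgage lending; a three-stage pipeline on HMDA data shows that fair binning succeeds only at a Price of Fairness of 29.4%, and a K-Means disparate impact audit shows significantly higher denial rates for Black applicants.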

Comments 10 pages, 4 figures, fairness-aware mortgage lending analysis using HMDA 2023 data. Project repository available at GitHub

详情
AI中文摘要

美国的抵押贷款表现出持续的种族和性别差异。我们研究标准数据预处理步骤,特别是属性分箱,是否在下游模式挖掘中放大这些差异。使用来自HMDA 2023数据集(芝加哥大都市区)的103,481份清理后的抵押贷款申请,我们构建了一个三阶段流水线:(1)PySpark数据清理和分箱流水线,实现标准等频分箱和Asudeh等人[1]的ε偏置公平分箱算法;(2)FP-Growth关联规则挖掘,比较两种分箱制度下的拒绝模式;(3)K-Means聚类及每簇差异影响审计。我们的标准分箱在收入离散化中显示9.63%的种族偏差,与先前工作中报告的8-10%一致。使用七个种族组的公平分箱在ε=0.03时不可行,仅在ε=0.08时成功,公平代价为29.4%。FP-Growth揭示高债务收入比是主要的拒绝预测因子(置信度67.2%,提升度2.81),而种族偏差未表现为显式的高支持度规则。然而,K-Means聚类后进行差异影响审计标记了45个簇-组对中的10个,表明即使在财务相似的群体中,黑人申请者的拒绝率也显著高于白人申请者。

英文摘要

Mortgage lending in the United States exhibits persistent racial and gender disparities. We investigate whether standard data preprocessing steps, specifically attribute binning, amplify these disparities in downstream pattern mining. Using 103,481 cleaned mortgage applications from the HMDA 2023 dataset (Chicago metropolitan area), we build a three-stage pipeline: (1) a PySpark data cleaning and binning pipeline that implements both standard equal-frequency binning and the epsilon-biased fair binning algorithm from Asudeh et al. [1], (2) FP-Growth association rule mining that compares denial patterns under both binning regimes, and (3) K-Means clustering with a per-cluster disparate impact audit. Our standard binning shows 9.63% racial bias in income discretization, consistent with the 8-10% reported in prior work. Fair binning with seven race groups is infeasible at epsilon=0.03 and only succeeds at epsilon=0.08 with a Price of Fairness of 29.4%. FP-Growth reveals that high debt-to-income ratio is the dominant denial predictor (67.2% confidence, 2.81 lift), while racial bias does not appear as explicit high-support rules. However, K-Means clustering followed by a disparate impact audit flags 10 out of 45 cluster-group pairs, showing that Black applicants face significantly higher denial rates than White applicants even among financially similar groups.
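The confidence and lift figures quoted for the debt-to-income rule follow the usual association-rule definitions. The sketch below computes them on a hypothetical toy dataset; it does not use HMDA data, does not run FP-Growth, and does not reproduce the paper's numbers. The name `rule_metrics` and the item labels are assumptions made for illustration.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Confidence and lift of the rule antecedent -> consequent.

    confidence = P(consequent | antecedent)
    lift       = confidence / P(consequent)
    Each transaction and both rule sides are sets of item labels.
    """
    n = len(transactions)
    n_a = sum(1 for t in transactions if antecedent <= t)
    n_ab = sum(1 for t in transactions if (antecedent | consequent) <= t)
    n_b = sum(1 for t in transactions if consequent <= t)
    confidence = n_ab / n_a
    lift = confidence / (n_b / n)
    return confidence, lift


# Hypothetical toy applications: 10 records, 5 of them denied.
transactions = (
    [{"high_dti", "denied"}] * 4
    + [{"high_dti", "approved"}] * 2
    + [{"low_dti", "denied"}] * 1
    + [{"low_dti", "approved"}] * 3
)
confidence, lift = rule_metrics(transactions, {"high_dti"}, {"denied"})
```

A lift above 1 means the antecedent makes the consequent more likely than its base rate, which is how a lift of 2.81 for the high debt-to-income rule should be read.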

2606.12433 2026-06-12 cs.CY cs.CL 新提交

Marginal Alignment Does Not Guarantee Joint-Distribution Fidelity: An Official-Reference Audit of Nemotron-Personas-Korea with Cross-Locale Replication

边缘对齐不能保证联合分布保真度:基于官方参考的Nemotron-Personas-Korea审计与跨区域复制

Joonhyung Bae

发表机构 * Korea Advanced Institute of Science and Technology (KAIST)(韩国科学技术院)

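AI summary: The paper proposes the Independence-Assumption Footprint (IAF), an audit method for checking joint-distribution fidelity in synthetic persona datasets; applied to NVIDIA Nemotron-Personas-Korea, it finds that the marginals align while three joint distributions fail.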

详情
AI中文摘要

合成人物数据集声称与官方人口统计数据对齐作为信任基础,但下游用户将其作为年龄、性别、地区、职业、教育、姓名和机构地位等联合结构使用。边缘对齐并不意味着这些联合结构得以保留。我们提出独立性假设足迹(IAF),这是一种审计原语,作用于数据集卡片本身记录为独立处理的属性组合。对于每个这样的组合,IAF将合成联合分布与外部官方或机构参考进行比较,使用直接联合表(如果可用)或规则隐含检查。应用于NVIDIA Nemotron-Personas-Korea(一百万韩国合成人物),IAF发现NPK与KOSIS边缘分布对齐,但三个联合分布失败。主要职业分布与KEIS毕业生总体存在较大的条件不匹配。兵役年龄分布在机构上不一致。男性主导职业中的女性代表被过度拉平至接近平等,严格筛选判定依赖于映射,且在直接标准化下对年龄稳健。跨六个额外NPK区域的迁移性演示发现诊断结果依赖于区域而非通用,参考分类基数混淆了跨区域标志计数。因此,对于用作硅样本的合成人物,边缘声明必须与基于披露的联合审计配对后才能重用。发布的审计工件(参考清单、职业交叉表、衍生指标、可重复性脚本)在NPK系列上实例化此协议,并发布用于其他合成人物资源的目标重定向。

英文摘要

Synthetic persona datasets cite alignment with official demographics as a basis for trust, yet downstream users consume them as joint structures across age, sex, region, occupation, education, name, and institutional status. Marginal alignment does not imply that these joints are preserved. We propose the Independence-Assumption Footprint (IAF), an audit primitive that operates on the attribute combinations a dataset card itself documents as treated independently. For each such combination, IAF compares the synthetic joint against an external official or institutional reference, using direct joint tables where available and rule-implied checks otherwise. Applied to NVIDIA Nemotron-Personas-Korea (one million Korean synthetic personas), IAF finds that NPK aligns with KOSIS marginals while three joints fail. The major-by-occupation distribution against the KEIS graduate universe carries a large conditional mismatch. The age profile of military service is institutionally inconsistent. Female representation in male-dominated occupations is substantially over-flattened toward parity, with the strict screening verdict mapping-dependent and age-robust under direct standardisation. A transferability demonstration across six further NPK locales finds locale-dependent rather than universal diagnostics, with reference-taxonomy cardinality confounding cross-locale flag counts. For synthetic personas used as silicon samples, marginal claims must therefore be paired with disclosure-anchored joint audits before reuse. The released audit artefacts (reference manifests, occupational crosswalks, derived metrics, reproducibility scripts) instantiate this protocol on the NPK family and are released for retargeting at other synthetic persona resources.
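The central claim, that matching marginals does not imply a matching joint, can be seen on a toy example. The sketch below is not the IAF procedure from the paper. It compares a hypothetical reference table with a hypothetical synthetic table generated as if sex and occupation were independent, and measures the gap with total variation distance, a metric chosen here for illustration. All names and counts are assumptions.

```python
from collections import Counter


def marginals(records):
    """Return the two one-dimensional marginals of (sex, occupation) records."""
    sex = Counter(r[0] for r in records)
    occupation = Counter(r[1] for r in records)
    return sex, occupation


def total_variation(records_p, records_q):
    """Total variation distance between two empirical joint distributions."""
    p, q = Counter(records_p), Counter(records_q)
    n_p, n_q = sum(p.values()), sum(q.values())
    keys = set(p) | set(q)
    # Counter returns 0 for missing keys, so absent cells count as zero mass.
    return 0.5 * sum(abs(p[k] / n_p - q[k] / n_q) for k in keys)


# Hypothetical reference: occupation depends strongly on sex.
reference = (
    [("F", "nurse")] * 40 + [("F", "engineer")] * 10
    + [("M", "nurse")] * 10 + [("M", "engineer")] * 40
)
# Hypothetical synthetic data: same marginals, attributes sampled independently.
synthetic = (
    [("F", "nurse")] * 25 + [("F", "engineer")] * 25
    + [("M", "nurse")] * 25 + [("M", "engineer")] * 25
)

same_marginals = marginals(reference) == marginals(synthetic)
joint_gap = total_variation(reference, synthetic)
```

Here both marginals agree exactly while the joint differs by a total variation distance of about 0.3, the same kind of flattening toward parity that the abstract reports for female representation in male-dominated occupations.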

2606.12428 2026-06-12 cs.CY cs.AI 新提交

Mapping AI Programs in the U.S: A Status Report from Early 2026 and an Analysis of AI Majors and Minors

美国人工智能项目映射:2026年初现状报告及AI主修与辅修分析

Felix Muzny, Carolyn Jones, Carter Ithier, Hasnain Sikora, Hrutika Harshadbhai Patel, Carla E. Brodley

发表机构 * Center for Inclusive Computing(包容计算中心) Khoury College of Computer Sciences(科里学院计算机科学学院) Northeastern University(东北大学) Boston, Massachusetts, United States(马萨诸塞州波士顿,美国)

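AI summary: The report describes the state of U.S. undergraduate AI programs in Spring 2026, presents a dynamically updating tool that searched over 560 institutions and covers more than 350 programs, and analyzes course requirements for 66 AI majors and 87 AI minors; majors that do not require a general AI course require a machine learning course, and more than a third of majors require an AI ethics course versus just under a quarter of minors.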

详情
AI中文摘要

我们提交了一份关于2026年春季美国本科人工智能(AI)项目现状的报告。在此过程中,我们1)描述了我们的抓取和映射工具,这些工具动态更新以追踪美国AI教育的状态,2)在巨大动荡时期创建了一个历史记录。我们开发的工具(可在 https://cicmap.ai 获取)检测、抓取并显示来自四年制大学350多个本科AI项目(主修、辅修、方向和证书)的数据。我们的工具搜索了560多所院校以定位这些项目,该样本代表了美国所有本科计算机科学(CS)毕业生的86%。该工具允许潜在学生、指导顾问、管理人员和教师轻松访问AI项目要求,并设计为随着新项目的出现而持续更新。据我们所知,这项调查代表了迄今为止对美国AI项目状态最全面的快照。通过这项工作,我们提供了三项重要贡献:1)在巨大动荡时期美国AI项目的记录;2)一个探索AI项目及其要求的工具;3)对66个AI主修和87个AI辅修所需课程的分析。我们对主修和辅修的分析显示,这些学位的规模和课程要求存在很大差异,但我们注意到两点:首先,并非所有主修都要求通用AI课程,但如果不需要,则必须要求机器学习(ML)课程;其次,虽然超过三分之一的主修要求AI伦理课程,但只有不到四分之一的AI辅修要求该课程。

英文摘要

We present a report on the status of undergraduate Artificial Intelligence (AI) programs in the United States in Spring 2026. In so doing, we 1) describe our scraping and mapping tools, which dynamically update to track the state of AI education in the U.S., and 2) create a historic record at a time of great upheaval. The tool we developed, available at https://cicmap.ai, detects, scrapes, and displays data from more than 350 undergraduate AI programs (majors, minors, concentrations, and certificates) at 4-year universities. Our tool searched over 560 institutions to locate these programs, a sample that represents 86% of all undergraduate Computer Science (CS) graduates in the U.S. This tool allows prospective students, guidance counselors, administrators, and faculty to easily access AI program requirements and is designed to continually update as new programs emerge. To the best of our knowledge, this survey represents the most comprehensive snapshot of the state of AI programs in the U.S. to date. With this work we offer three important contributions: 1) a record of AI programs in the U.S. at a time of great upheaval; 2) a tool to explore AI programs and their requirements; and 3) an analysis of the courses required for 66 AI majors and 87 AI minors. Our analysis of majors and minors shows great variability in the size and the requirements of these degrees, but we note two takeaways. First, not all majors require a general AI course, but if they don't, they do require a Machine Learning (ML) course. Second, while more than a third of majors require an Ethics in AI course, just under a quarter of AI minors do.

2606.12426 2026-06-12 cs.CY cs.CL cs.LG New submission

Two Wrongs, No Right: Auditing Social-Desirability Bias in LLM Annotators for Computational Social Science

Varun Kotte

Affiliation * Varun Kotte

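AI summary The study audits three open-source instruction-tuned models for social-desirability bias on TweetEval tasks. It finds leniency, overcorrection, and neutrality biases that prompting interventions fail to correct, and shows that aggregate metrics can mask errors in substantive conclusions.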

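Details
AI-generated abstract (translated from Chinese)

LLM annotators are increasingly used in computational social science (CSS), but it remains unclear whether their alignment-shaped errors change the empirical conclusions a researcher would report. We audit three open-source 7B instruction-tuned models (Zephyr, Mistral-Instruct, Qwen2.5-Instruct) on six TweetEval tasks under four prompt conditions (72 cells) and find that social-desirability failures do not run in a single direction. Zephyr shows leniency bias, systematically under-applying harmful labels (offensive language: false benign rate 0.729, false alarm rate 0.031). Mistral and Qwen show overcorrection, over-applying the same labels (Mistral hate-speech FAR = 0.604). All three models show neutrality bias on abortion stance, underestimating the prevalence of opposition by 24 to 40 percentage points and inflating the neutral label. None of the four prompting interventions tested (neutral, safety framing, depersonalized, chain-of-thought) corrects these failures across models; safety framing can worsen stance distortion. Strikingly, Zephyr's hate-speech prevalence estimate matches the gold rate exactly even though its class-conditional errors are large in both directions, an accidental cancellation that misleads aggregate validation. We turn these patterns into a three-part taxonomy with diagnostic FBR/FAR signatures and a lightweight gold-sample validation protocol. The headline for trustworthy CSS: a model that looks calibrated on aggregate metrics can still flip the substantive empirical conclusion a researcher would report.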

English abstract

LLM annotators are increasingly used in computational social science (CSS), but it is unclear whether their alignment-shaped errors preserve the empirical conclusions a researcher would report. We audit three open-source 7B instruction-tuned models (Zephyr, Mistral-Instruct, Qwen2.5-Instruct) across six TweetEval tasks under four prompt conditions (72 cells) and find that social-desirability failures do not run in a single direction. Zephyr exhibits leniency bias, systematically under-applying harmful labels (offensive language: false benign rate 0.729, false alarm rate 0.031). Mistral and Qwen exhibit overcorrection, over-applying the same labels (Mistral hate-speech FAR = 0.604). All three models exhibit neutrality bias on abortion stance, underestimating opposition prevalence by 24 to 40 percentage points and inflating the neutral label. None of the four prompting interventions we test (neutral, safety framing, depersonalized, chain-of-thought) corrects these failures across models; safety framing can worsen stance distortion. Strikingly, Zephyr's hate-speech prevalence estimate matches the gold rate exactly while its class-conditional errors are large in both directions, an accidental cancellation that misleads aggregate validation. We translate these patterns into a three-part taxonomy with diagnostic FBR/FAR signatures and a lightweight gold-sample validation protocol. The headline for trustworthy CSS: a model that looks calibrated on aggregate metrics can still flip the substantive empirical conclusion a researcher would report.
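
The cancellation effect described in the abstract can be illustrated with a minimal Python sketch. It assumes that the false benign rate (FBR) is the share of gold-harmful items labeled benign and that the false alarm rate (FAR) is the share of gold-benign items labeled harmful; the abstract does not spell out these definitions. The function name `error_rates` and the label counts are hypothetical and synthetic, not taken from the paper.

```python
from typing import Dict, Sequence


def error_rates(gold: Sequence[int], pred: Sequence[int]) -> Dict[str, float]:
    """Compute class-conditional error rates and prevalence (1 = harmful, 0 = benign).

    Assumed definitions (not confirmed by the abstract):
      FBR = gold-harmful items predicted benign / all gold-harmful items
      FAR = gold-benign items predicted harmful / all gold-benign items
    """
    if len(gold) != len(pred) or len(gold) == 0:
        raise ValueError("gold and pred must be non-empty and of equal length")
    harmful = sum(gold)
    benign = len(gold) - harmful
    if harmful == 0 or benign == 0:
        raise ValueError("both classes must appear in the gold labels")
    missed = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    false_alarms = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    return {
        "fbr": missed / harmful,
        "far": false_alarms / benign,
        "gold_prevalence": harmful / len(gold),
        "pred_prevalence": sum(pred) / len(pred),
    }


# Synthetic example: 20 harmful and 80 benign gold items.
gold = [1] * 20 + [0] * 80
# The hypothetical annotator misses 8 harmful items and raises 8 false alarms,
# so the two error types cancel in the aggregate count of harmful labels.
pred = [1] * 12 + [0] * 8 + [1] * 8 + [0] * 72

rates = error_rates(gold, pred)
print(rates)
```

In this synthetic case the predicted prevalence equals the gold prevalence even though 40% of harmful items are missed and 10% of benign items are flagged. This is why a check on aggregate prevalence alone can pass while the class-conditional errors stay large.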