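arXivDaily: arXiv daily academic digest, updated Monday through Friday

AI Large Models

Large Language Models / LLM

Large language models, pre-training, instruction tuning, post-training, and language model applications.

Entries for today / current date: 35. Sources: cs.CL, cs.AI, cs.LG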
2602.14789 2026-06-18 cs.LG stat.ML Version update, Topic 60

On the Stability of Nonlinear Dynamics in GD and SGD: Beyond Quadratic Potentials

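Rotem Mulayoff, Sebastian U. Stich

Affiliation: CISPA Helmholtz Center for Information Security

Topic match: Other LLM. Stability analysis of optimization algorithms; related to LLM training but not central.

AI summary: Studies how nonlinear terms affect the dynamical stability of gradient descent (GD) and stochastic gradient descent (SGD), derives an exact condition for stable oscillations in the multivariate setting, and finds that the stability of SGD can be dictated by a single unstable batch.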

Comments Accepted to COLT 2026

Abstract

The dynamical stability of the iterates during training plays a key role in determining the minima obtained by optimization algorithms. For example, stable solutions of gradient descent (GD) correspond to flat minima, which have been associated with favorable features. While prior work often relies on linearization to determine stability, it remains unclear whether linearized dynamics faithfully capture the full nonlinear behavior. Recent work has shown that GD may stably oscillate near a linearly unstable minimum and still converge once the step size decays, indicating that linear analysis can be misleading. In this work, we explicitly study the effect of nonlinear terms. Specifically, we derive an exact criterion for stable oscillations of GD near minima in the multivariate setting. Our condition depends on high-order derivatives, generalizing existing results. Extending the analysis to stochastic gradient descent (SGD), we show that nonlinear dynamics can diverge in expectation even if a single batch is unstable. This implies that stability can be dictated by a single batch that oscillates unstably, rather than an average effect, as linear analysis suggests. Finally, we prove that if all batches are linearly stable, the nonlinear dynamics of SGD are stable in expectation.
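
The claim that GD can oscillate stably near a linearly unstable minimum can be seen in a one-dimensional toy problem. The sketch below is an illustration only and is not the paper's criterion: the potential f(x) = log(cosh(x)), the step sizes, and the helper `gd` are all assumptions chosen for this example. At x = 0 the curvature is 1, so linear analysis predicts instability for any step size above 2. On the quadratic f(x) = x^2 / 2 the iterates do blow up, while on log(cosh(x)) the flattening curvature traps them in a bounded period-2 oscillation that collapses to the minimum once the step size is reduced.

```python
import math


def gd(grad, x0, step, n_steps):
    """Run plain gradient descent in 1-D and return the final iterate."""
    x = x0
    for _ in range(n_steps):
        x = x - step * grad(x)
    return x


def quadratic_grad(x):
    # Gradient of f(x) = x^2 / 2 (curvature 1 everywhere).
    return x


def logcosh_grad(x):
    # Gradient of f(x) = log(cosh(x)) (curvature 1 at x = 0, smaller elsewhere).
    return math.tanh(x)


eta = 2.5  # above the linear stability threshold 2 / f''(0) = 2

# Quadratic potential: the linear prediction is exact and the iterates diverge.
x_quad = gd(quadratic_grad, 0.1, eta, 100)

# Non-quadratic potential: the iterates settle into a bounded two-point cycle.
x_osc = gd(logcosh_grad, 0.1, eta, 200)
x_next = gd(logcosh_grad, x_osc, eta, 1)  # one more step flips the sign

# Step-size decay: dropping below the threshold lets GD converge to the minimum.
x_final = gd(logcosh_grad, x_osc, 1.0, 50)

print(abs(x_quad), x_osc, x_next, x_final)
```

The cycle amplitude a satisfies 2a = eta * tanh(a), which is what the update x - eta * tanh(x) = -x reduces to for this toy potential.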

2602.09234 2026-06-18 cs.LG cs.AI Version update, Topic 60

Do Neural Networks Lose Plasticity in a Gradually Changing World?

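Tianhui Liu, Lili Mou

Affiliation: Dept. Computing Science & Alberta Machine Intelligence Institute (Amii), University of Alberta; Canada CIFAR AI Chair

Topic match: Other LLM. Loss of plasticity in neural networks; continual learning.

AI summary: Studies how the abruptness of task transitions affects loss of plasticity in neural networks. Gradually changing environments are simulated through input/output interpolation and task sampling; theory and experiments show that the severity of plasticity loss is closely tied to transition abruptness and is substantially reduced when the environment changes gradually.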

Abstract

Continual learning has become a trending topic in machine learning. Recent studies have discovered an interesting phenomenon called loss of plasticity, referring to neural networks gradually losing the ability to learn new tasks. However, existing plasticity research largely relies on benchmarks with abrupt task transitions, without examining whether the abruptness itself contributes to the observed plasticity loss. In this paper, we investigate the role of transition abruptness by simulating gradually changing environments through input/output interpolation and task sampling. We perform theoretical and empirical analysis, showing that the severity of plasticity loss is closely tied to the abruptness of task transitions, and can be substantially reduced when the environment changes gradually.
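
One way to picture the difference between an abrupt and a gradual task transition is a mixing weight that moves from task A to task B over a configurable number of steps. The sketch below is a guess at the general idea, not the authors' protocol: the linear ramp, the function names `transition_schedule`, `interpolate`, and `sample_task`, and the two-task setup are all assumptions made for illustration.

```python
import random


def transition_schedule(step, start, length):
    """Mixing weight in [0, 1]: 0 means task A, 1 means task B.

    length == 0 reproduces an abrupt switch at `start`;
    length > 0 ramps linearly over `length` steps.
    """
    if length <= 0:
        return 0.0 if step < start else 1.0
    return min(1.0, max(0.0, (step - start) / length))


def interpolate(sample_a, sample_b, alpha):
    """Blend two samples (inputs or targets) element-wise."""
    return [(1 - alpha) * a + alpha * b for a, b in zip(sample_a, sample_b)]


def sample_task(alpha, rng):
    """Pick task B with probability alpha, otherwise task A."""
    return "B" if rng.random() < alpha else "A"


rng = random.Random(0)
for step in (0, 10, 15, 20, 30):
    alpha = transition_schedule(step, start=10, length=10)
    blended = interpolate([0.0, 2.0], [4.0, 2.0], alpha)
    print(step, alpha, blended, sample_task(alpha, rng))
```

Interpolation changes every sample a little at each step, whereas task sampling keeps samples pure and changes only how often each task appears; both make the transition gradual in a different sense.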

2502.02904 2026-06-18 cs.HC cs.CL q-bio.NC Version update, Topic 60

ScholaWrite: A Dataset of End-to-End Scholarly Writing Process

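Khanh Chi Le, Linghe Wang, Minhwa Lee, Ross Volkov, Luan Tuyen Chau, Dongyeop Kang

Affiliation: University of Minnesota

Topic match: Other LLM. The dataset involves LLM-assisted writing, but LLMs are not the core focus.

AI summary: Presents the ScholaWrite dataset, which uses a Chrome extension to record keystrokes on Overleaf and captures the multi-month writing process from first draft to final manuscript. It contains nearly 62K text changes from five computer science preprints, annotated with cognitive writing intentions, and reveals gaps between human writing and LLM assistance.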

Comments Equal contribution: Khanh Chi Le, Linghe Wang, Minhwa Lee | project page: https://minnesotanlp.github.io/scholawrite/

Abstract

Writing is a cognitively demanding activity that requires constant decision-making, heavy reliance on working memory, and frequent shifts between tasks of different goals. To build writing assistants that truly align with writers' cognition, we must capture and decode the complete thought process behind how writers transform ideas into final texts. We present ScholaWrite, the first dataset of end-to-end scholarly writing, tracing the multi-month journey from initial drafts to final manuscripts. We contribute three key advances: (1) a Chrome extension that unobtrusively records keystrokes on Overleaf, enabling the collection of realistic, in-situ writing data; (2) a novel corpus of full scholarly manuscripts, enriched with fine-grained annotations of cognitive writing intentions. The dataset includes LaTeX-based edits from five computer science preprints, capturing nearly 62K text changes over four months; and (3) analyses and insights into the micro-dynamics of scholarly writing, highlighting gaps between human writing processes and the current capabilities of large language models (LLMs) in providing meaningful assistance. ScholaWrite underscores the value of capturing end-to-end writing data to develop future writing assistants that support, not replace, the cognitive work of scientists.
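
To make "text changes" concrete, the sketch below derives word-level edits from successive snapshots of a sentence using Python's standard `difflib`. It is only an illustration of the kind of record such a dataset could hold: the abstract does not describe the extension's logging format, so the snapshot representation, the helper `text_changes`, and the example sentences are all hypothetical.

```python
import difflib


def text_changes(before, after):
    """Return the non-equal word-level edits that turn `before` into `after`."""
    a, b = before.split(), after.split()
    matcher = difflib.SequenceMatcher(None, a, b)
    return [
        (tag, a[i1:i2], b[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


# Hypothetical snapshots of one sentence at three points in time.
snapshots = [
    "We propose a model",
    "We propose a new model",
    "We present a new model",
]

# One list of edits per pair of consecutive snapshots.
log = [text_changes(p, q) for p, q in zip(snapshots, snapshots[1:])]
for edits in log:
    print(edits)
```

Each edit is a tuple of an operation tag (`insert`, `delete`, or `replace`), the words removed, and the words added, which is the sort of unit an intention label could be attached to.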

2502.17748 2026-06-18 cs.LG cs.CR Version update, Topic 60

FinP: Fairness-in-Privacy in Federated Learning by Addressing Disparities in Privacy Risk

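Tianyu Zhao, Mahmoud Srewa, Salma Elmalaki

Affiliation: University of California, Irvine

Topic match: Other LLM. Federated learning privacy; weakly related to LLMs.

AI summary: Addresses the uneven distribution of privacy risk in federated learning with the FinP framework, which combines server-side adaptive aggregation and client-side regularization to mitigate source inference attack risk. It reduces disparities in privacy exposure by up to 57.14% while keeping model utility comparable to baselines.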

Comments To appear in PoPETS 2026 Issue 4. Privacy Enhancing Technology Symposium (PETS) 2026

Abstract

Federated Learning (FL) inherently mitigates mass data centralization risks; however, its privacy protections are not equally distributed - leaving vulnerable individuals disproportionately exposed to sophisticated privacy attacks. Crucially, statistical heterogeneity in human-centric FL environments often results in an inequitable distribution of privacy risks, particularly affecting those whose sensitive attributes or behaviors make them outliers. To address this critical gap, we introduce FinP, a novel framework designed to formalize and enforce fairness-in-privacy by mitigating disproportionate client vulnerability to Source Inference Attacks (SIA). FinP operationalizes a two-pronged defense strategy that tackles both the symptoms and root causes of privacy disparity, ensuring that no group of clients bears an excessive privacy burden. It combines a server-side adaptive aggregation mechanism, which dynamically weights client contributions based on their estimated privacy risk, with a client-side regularization technique to curb localized overfitting that drives unique data memorization. Extensive empirical evaluations on FEMNIST, Human Activity Recognition (HAR), and CIFAR-10 datasets demonstrate that FinP effectively aligns privacy fairness with primary task utility. Notably, FinP successfully mitigates SIA risks and reduces disparities in privacy exposure, establishing that strong fairness-in-privacy guarantees need not compromise model utility. Ultimately, FinP establishes equitable privacy protections by reducing vulnerability disparities by up to 57.14%, while preserving global model utility within a marginal +/- 1.75% of standard federated baselines.
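
The server-side idea of weighting client contributions by estimated privacy risk can be sketched as a modified federated average. The code below is not FinP's aggregation rule, which the abstract does not specify. It assumes, purely for illustration, that higher-risk clients are down-weighted through a softmax over negative risk; the names `risk_aware_weights` and `aggregate`, the `temperature` parameter, and the risk values are hypothetical.

```python
import math


def risk_aware_weights(risks, temperature=1.0):
    """Map per-client risk estimates to aggregation weights that sum to 1.

    Assumption for this sketch: a higher risk yields a lower weight.
    """
    scores = [math.exp(-r / temperature) for r in risks]
    total = sum(scores)
    return [s / total for s in scores]


def aggregate(client_updates, risks, temperature=1.0):
    """Weighted average of client update vectors (lists of floats)."""
    weights = risk_aware_weights(risks, temperature)
    dim = len(client_updates[0])
    return [
        sum(w * update[i] for w, update in zip(weights, client_updates))
        for i in range(dim)
    ]


# Hypothetical round: three clients, the third has the highest estimated risk.
updates = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]
risks = [0.1, 0.1, 0.9]
print(risk_aware_weights(risks))
print(aggregate(updates, risks))
```

A large `temperature` brings the weights back toward uniform averaging, so the same function covers both the baseline and the risk-aware case in this toy setup.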

2505.23851 2026-06-18 cs.CL cs.AI cs.SC Version update, Topic 60

ASyMOB: Algebraic Symbolic Mathematical Operations Benchmark

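Michael Shalyt, Rotem Elimelech, Ido Kaminer

Affiliation: MIT; Technion - Israel Institute of Technology

Topic match: Other LLM. Evaluates large models on symbolic mathematics.

AI summary: Presents the ASyMOB benchmark of 35,368 symbolic math problems, uses perturbation tests to expose the limited robustness of large models in symbolic mathematical reasoning, and identifies complementary strengths between LLMs and computer algebra systems (CAS).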

Comments Published in ICML2026: https://icml.cc/virtual/2026/poster/63549 Code repository: https://github.com/RamanujanMachine/ASyMOB Complete benchmark dataset: https://huggingface.co/datasets/Shalyt/ASyMOB-Algebraic_Symbolic_Mathematical_Operations_Benchmark

Abstract

Large language models (LLMs) are increasingly applied to symbolic mathematics, yet existing evaluations often conflate pattern memorization with genuine reasoning. To address this gap, we present ASyMOB, a high-resolution dataset of 35,368 validated symbolic math problems spanning integration, limits, differential equations, series, and hypergeometrics. Unlike prior benchmarks, ASyMOB systematically perturbs each seed problem using symbolic, numeric, and equivalence-preserving transformations, enabling a fine-grained assessment of generalization. Our evaluation reveals three key findings: (1) most models' performance collapses under minor perturbations, while top systems exhibit an apparent regime shift in robustness; (2) integrated code tools stabilize performance, particularly for weaker models; and (3) we identify examples where Computer Algebra Systems (CAS) fail while LLMs succeed, as well as problems solved only via a hybrid LLM-CAS approach, highlighting a promising integration frontier. ASyMOB serves as a principled diagnostic tool for measuring and accelerating progress toward building verifiable, trustworthy AI for scientific discovery.
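
The perturbation idea can be illustrated with a single seed problem whose coefficients are varied and whose answers are checked automatically. The sketch below is not taken from the ASyMOB code base: the seed integral, the parameter ranges, and the finite-difference checker are assumptions chosen so the example runs with the standard library alone. The seed is the integral of x * exp(x), and the perturbed family is the integral of a * x * exp(b * x), with antiderivative a * exp(b * x) * (b * x - 1) / b^2.

```python
import math
import random


def integrand(x, a, b):
    return a * x * math.exp(b * x)


def antiderivative(x, a, b):
    # Closed form for the integral of a * x * exp(b * x), up to a constant.
    return a * math.exp(b * x) * (b * x - 1) / b**2


def check_answer(candidate, a, b, points=(-0.5, 0.3, 1.2), h=1e-5, tol=1e-5):
    """Accept `candidate` if its numerical derivative matches the integrand."""
    for x in points:
        slope = (candidate(x + h, a, b) - candidate(x - h, a, b)) / (2 * h)
        target = integrand(x, a, b)
        if abs(slope - target) > tol * max(1.0, abs(target)):
            return False
    return True


def perturb_seed(n, seed=0):
    """Draw n numeric perturbations (a, b) of the seed problem a = b = 1."""
    rng = random.Random(seed)
    return [(rng.randint(2, 9), rng.randint(2, 5)) for _ in range(n)]


def wrong_answer(x, a, b):
    # A plausible slip: dropping the "- 1" term from the closed form.
    return a * math.exp(b * x) * (b * x) / b**2


for a, b in [(1, 1)] + perturb_seed(5):
    print(a, b, check_answer(antiderivative, a, b), check_answer(wrong_answer, a, b))
```

A model that has memorized only the seed answer (x - 1) * exp(x) would pass the first row and fail the perturbed ones, which is the kind of gap a perturbation benchmark is meant to expose.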