arXivDaily: arXiv daily academic digest, updated Monday through Friday
2602.02285 2026-06-11 cs.LG cs.CL math.ST stat.TH version update

AI4SLT: Empirical Processes in Lean 4 for Formal Statistical Learning Theory

Yuanhe Zhang, Jason D. Lee, Fanghui Liu

Affiliations: University of California, Berkeley

AI summary: This paper presents the first complete Lean 4 formalization of statistical learning theory, grounded in empirical process theory. A human-AI collaborative workflow produces a verifiable theorem-proving toolbox and exposes implicit assumptions in textbooks.

Comments: Accepted by ICML 2026

English abstract

We present the first comprehensive Lean 4 formalization of statistical learning theory (SLT) grounded in empirical process theory. Our end-to-end formal infrastructure implements content missing from the latest Lean library, including a complete development of Gaussian Lipschitz concentration, Dudley's entropy integral theorem for sub-Gaussian processes, and an application to least-squares (sparse) regression with a sharp rate. The project was carried out using a human-AI collaborative workflow, in which humans design proof strategies and AI agents execute tactical proof construction, yielding a human-verified Lean 4 toolbox for SLT. Beyond implementation, the formalization process exposes and resolves implicit assumptions and missing details in standard SLT textbooks, enforcing a granular, line-by-line understanding of the theory. This work establishes a reusable formal foundation and opens the door for future developments in machine learning theory. The code is available at https://github.com/YuanheZ/lean-stat-learning-theory.

2602.02229 2026-06-11 cs.LG eess.SP version update

Prediction-Powered Risk Monitoring of Deployed Models for Detecting Harmful Distribution Shifts

Guangyi Zhang, Yunlong Cai, Guanding Yu, Osvaldo Simeone

Affiliations: University of California, Berkeley

AI summary: Proposes prediction-powered risk monitoring (PPRM), a semi-supervised method based on prediction-powered inference that combines synthetic labels with a small set of true labels to build anytime-valid lower bounds on the running risk. It detects harmful shifts and is validated on image classification, large language model, and telecommunications monitoring tasks.

Comments: Accepted by ICML 2026

English abstract

We study the problem of monitoring model performance in dynamic environments where labeled data are limited. To this end, we propose prediction-powered risk monitoring (PPRM), a semi-supervised risk-monitoring approach based on prediction-powered inference (PPI). PPRM constructs anytime-valid lower bounds on the running risk by combining synthetic labels with a small set of true labels. Harmful shifts are detected via a threshold-based comparison with an upper bound on the nominal risk, satisfying assumption-free finite-sample guarantees on the type-I error. We demonstrate the effectiveness of PPRM through extensive experiments on image classification, large language model (LLM), and telecommunications monitoring tasks.
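The prediction-powered idea behind PPRM can be illustrated with a minimal sketch. It shows only the generic PPI point estimate of a mean loss: the average loss against synthetic labels on many unlabeled inputs, corrected by the average gap between true-label and synthetic-label losses on the small labeled set. The function name and argument names are hypothetical, and the anytime-valid lower bound and threshold test described in the abstract are not reproduced here.

```python
from statistics import fmean


def ppi_risk_estimate(synthetic_losses, labeled_true_losses, labeled_synthetic_losses):
    """Prediction-powered point estimate of the mean loss (illustrative only).

    synthetic_losses: losses computed against synthetic labels on unlabeled inputs.
    labeled_true_losses: losses on the labeled set, computed against true labels.
    labeled_synthetic_losses: losses on the same labeled set, against synthetic labels.
    """
    if len(labeled_true_losses) != len(labeled_synthetic_losses):
        raise ValueError("labeled loss lists must have equal length")
    # The rectifier measures how far synthetic-label losses are from the truth.
    rectifier = fmean(
        [t - s for t, s in zip(labeled_true_losses, labeled_synthetic_losses)]
    )
    return fmean(synthetic_losses) + rectifier
```

With synthetic losses [0, 0, 1, 0] and a labeled set where the true losses are [1, 0] but the synthetic-label losses are [0, 0], the rectifier is 0.5 and the estimate is 0.25 + 0.5 = 0.75.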

2602.00945 2026-06-11 cs.CL cs.AI version update

Neural FOXP2 -- Language Specific Neuron Steering for Targeted Language Improvement in LLMs

Anusa Saha, Tanmay Joshi, Vinija Jain, Aman Chadha, Amitava Das

Affiliations: Meta, USA; Apple, USA; Pragya Lab, BITS Pilani Goa, India

AI summary: Proposes Neural FOXP2, which localizes language neurons, computes steering directions, and applies sparse activation shifts to switch a model's default language from English to Hindi or Spanish, giving controllable language dominance.

English abstract

LLMs are multilingual by training, yet their lingua franca is often English, reflecting English language dominance in pretraining. Other languages remain in parametric memory but are systematically suppressed. We argue that language defaultness is governed by a sparse, low-rank control circuit, language neurons, that can be mechanistically isolated and safely steered. We introduce Neural FOXP2, which makes a chosen language (Hindi or Spanish) primary in a model by steering language-specific neurons. Neural FOXP2 proceeds in three stages: (i) Localize: We train per-layer SAEs so each activation decomposes into a small set of active feature components. For every feature, we quantify English vs. Hindi/Spanish selectivity via the overall logit-mass lift toward the target-language token set. Tracing the top-ranked features back to their strongest contributing units yields a compact language-neuron set. (ii) Steering directions: We localize controllable language-shift geometry via a spectral low-rank analysis. For each layer, we build English-to-target activation-difference matrices and perform layerwise SVD to extract the dominant singular directions governing language change. The eigengap and effective-rank spectra identify a compact steering subspace and an empirically chosen intervention window (where these directions are strongest and most stable). (iii) Steer: We apply a signed, sparse activation shift targeted to the language neurons. Concretely, within low to mid layers we add a positive steering along the target-language dominant directions and a compensating negative shift toward the null space for the English neurons, yielding controllable target-language defaultness.
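Stage (ii) can be sketched for a single layer as follows. This is an assumption-laden illustration, not the authors' code: it builds the English-to-target activation-difference matrix and recovers only its leading right singular direction, using power iteration as a stand-in for the full layerwise SVD so that the example needs no external library. The function name and the toy activations are hypothetical.

```python
import math


def top_shift_direction(english_acts, target_acts, iters=100):
    """Leading right singular vector of D = target_acts - english_acts.

    Each argument is a list of activation vectors, one row per paired prompt.
    Power iteration on D^T D replaces a full SVD (a swapped-in technique).
    It assumes the uniform start vector is not orthogonal to the answer.
    """
    diffs = [
        [t - e for e, t in zip(e_row, t_row)]
        for e_row, t_row in zip(english_acts, target_acts)
    ]
    dim = len(diffs[0])
    v = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iters):
        # dv = D v, then w = D^T dv
        dv = [sum(d * x for d, x in zip(row, v)) for row in diffs]
        w = [sum(row[j] * s for row, s in zip(diffs, dv)) for j in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            raise ValueError("difference matrix has no component along the start vector")
        v = [x / norm for x in w]
    return v
```

For toy activations whose differences are [3, 4] and [6, 8], the returned unit vector points along (0.6, 0.8), the single direction in which the two languages differ.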

2602.00560 2026-06-11 cs.SD eess.AS version update

Edit Content, Preserve Acoustics: Imperceptible Text-Based Speech Editing via Self-Consistency Rewards

Yong Ren, Jiangyan Yi, Jianhua Tao, Tao Wang, Le Xu, Zhengqi Wen

Affiliations: The State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences; Department of Automation, Tsinghua University; BNRist, Tsinghua University

AI summary: Proposes a framework that edits content in a stable semantic space and preserves acoustic continuity through a flow-matching decoder, using Self-Consistency Rewards Group Relative Policy Optimization to achieve imperceptible text-based speech editing.

Comments: Accepted by Interspeech 2026

English abstract

Imperceptible text-based speech editing modifies spoken content through transcript manipulation while preserving acoustic continuity. Prior acoustic-space approaches suffer from content-style entanglement, causing unstable generation and boundary artifacts. We introduce a framework guided by the principle of "Edit Content, Preserve Acoustics". Editing is conducted in a stable semantic space, while acoustic realization is handled by a Flow Matching decoder. To ensure perceptual consistency, we propose Self-Consistency Rewards Group Relative Policy Optimization, which leverages a pre-trained Text-to-Speech model as an implicit critic, together with intelligibility and duration constraints. Experiments demonstrate consistent improvements over state-of-the-art autoregressive and non-autoregressive baselines in intelligibility, robustness, and perceptual quality.

2512.22088 2026-06-11 cs.LG cs.AI cs.CL version update

Unifying Learning Dynamics and Generalization in Transformers Scaling Law

Chiwun Yang

Affiliations: Sun Yat-sen University

AI summary: Formalizes Transformer learning dynamics as an ODE system approximated by kernel behavior, rigorously analyzes generalization error under stochastic gradient descent training, reveals a two-phase transition from exponential to power-law decay of the generalization error as compute scales, and establishes tight upper and lower bounds.

Comments: 87 pages, 10 figures, 3 tables

English abstract

The scaling law, a cornerstone of Large Language Model (LLM) development, predicts improvements in model performance with increasing computational resources. Yet, while empirically validated, its theoretical underpinnings remain poorly understood. This work formalizes the learning dynamics of transformer-based language models as an ordinary differential equation (ODE) system, then approximates this process to kernel behaviors. Departing from prior toy-model analyses, we rigorously analyze stochastic gradient descent (SGD) training for multi-layer transformers on sequence-to-sequence data with arbitrary data distribution, closely mirroring real-world conditions. Our analysis characterizes the convergence of generalization error to the irreducible risk as computational resources scale with data, especially during the optimization process. We establish matching upper and lower bounds on the excess risk, characterized by a distinct phase transition. In the initial optimization phase, the excess risk decays exponentially relative to the computational cost ${\sf C}$. However, once a specific resource allocation threshold is crossed, the system enters a statistical phase, where the generalization error follows a power-law decay of $\Theta(\mathsf{C}^{-1/7})$. These rates are certified by complementary lower bounds -- statistical, via an information-theoretic two-point reduction, and optimization-side, via a first-order oracle argument -- rendering the two-stage law tight up to constants, logarithmic factors, and a condition-number gap. Beyond this unified framework, our theory derives isolated scaling laws for model size, training time, and dataset size, elucidating how each variable independently governs the bounds of generalization.

2602.00424 2026-06-11 cs.LG cond-mat.mtrl-sci version update

Open Materials Generation with Inference-Time Reinforcement Learning

Philipp Hoellmer, Stefano Martiniani

Affiliations: ETH Zurich

AI summary: Proposes the OMatG-IRL framework, in which policy-gradient reinforcement learning acts directly on the learned velocity field without explicit score computation, reinforcing an energy-based objective in crystal structure prediction and improving sampling efficiency by an order of magnitude.

Comments: 25 pages, 12 figures, 6 tables

English abstract

Continuous-time generative models for crystalline materials enable inverse materials design by learning to predict stable crystal structures, but incorporating explicit target properties into the generative process remains challenging. Policy-gradient reinforcement learning (RL) provides a principled mechanism for aligning generative models with downstream objectives but typically requires access to the score, which has prevented its application to flow-based models that learn only velocity fields. We introduce Open Materials Generation with Inference-time Reinforcement Learning (OMatG-IRL), a policy-gradient RL framework that operates directly on the learned velocity fields and eliminates the need for the explicit computation of the score. OMatG-IRL leverages stochastic perturbations of the underlying generation dynamics, preserving the baseline performance of the pretrained generative model while enabling exploration and policy-gradient estimation at inference time. Using OMatG-IRL, we present the first application of RL to crystal structure prediction (CSP). Our method enables effective reinforcement of an energy-based objective while preserving diversity through composition conditioning, and it achieves performance competitive with score-based RL approaches. Finally, we show that OMatG-IRL can learn time-dependent velocity-annealing schedules, enabling accurate CSP with order-of-magnitude improvements in sampling efficiency and, correspondingly, reduction in generation time. The OMatG-IRL code is included in a new release of the Open Materials Generation (OMatG) framework available at https://github.com/FERMat-ML/OMatG.

2601.23278 2026-06-11 cs.LG cs.AR cs.CL version update

FOCUS: DLLMs Know How to Tame Their Compute Bound

Kaihua Liang, Xin Tan, An Zhong, Hong Xu, Marco Canini

Affiliations: University of California, Berkeley; University of Toronto

AI summary: Targets the problem that most compute in diffusion LLM decoding is wasted on non-decodable tokens. Proposes the FOCUS inference system, which dynamically focuses on decodable tokens and evicts non-decodable ones, raising the effective batch size and improving throughput by up to 3.52$\times$.

Comments: ICML 2026 camera-ready version

English abstract

Diffusion Large Language Models (DLLMs) offer a compelling alternative to Auto-Regressive models, but their deployment is constrained by high decoding cost. In this work, we identify a key inefficiency in DLLM decoding: while computation is parallelized over token blocks, only a small subset of tokens is decodable at each diffusion step, causing most compute to be wasted on non-decodable tokens. We further observe a strong correlation between attention-derived token importance and token-wise decoding probability. Based on this insight, we propose FOCUS, an inference system designed for DLLMs. By dynamically focusing computation on decodable tokens and evicting non-decodable ones on-the-fly, FOCUS increases the effective batch size, alleviating compute limitations and enabling scalable throughput. Empirical evaluations demonstrate that FOCUS achieves up to 3.52$\times$ throughput improvement over the production-grade engine LMDeploy in large-batch settings, while preserving or improving generation quality across multiple benchmarks.

2601.22025 2026-06-11 cs.CL cs.AI cs.IR cs.SE version update

When Generic Prompt Improvements Hurt: Evaluation-Driven Iteration for LLM Applications

Daniel Commey

Affiliations: Daniel Commey

AI summary: Proposes the Minimum Viable Evaluation Suite (MVES). Using a structured evaluation framework and reproducible local experiments, it finds that generic prompt additions do not yield monotonic improvements, and it argues for evaluation-driven prompt iteration.

Comments: Technical report. 42 pages, 3 figures. Code, test suites, and result logs: https://github.com/dcommey/llm-eval-benchmarking

English abstract

Evaluating Large Language Model (LLM) applications differs from conventional software testing because outputs are probabilistic, semantically variable, and sensitive to prompt and model changes. This technical report proposes the Minimum Viable Evaluation Suite (MVES), an audit-oriented structure for application-level LLM evaluation. MVES links application categories to failure modes, metrics, required artifacts, and validation evidence across general LLM applications, retrieval-augmented systems, and agentic workflows. We pair the framework with a reproducible local evaluation harness covering structured extraction, RAG citation/content-compliance, and instruction-following checks. Using Ollama with Llama 3 8B Instruct and Qwen 2.5 7B Instruct, we evaluate five prompt conditions over expanded 30-case-per-suite ablations. The results show that, in the tested local conditions, generic prompt additions do not produce monotonic improvements: stronger output-contract prompts improve strict extraction for both models, while RAG citation/content-compliance declines under some generic-rule conditions. The largest observed decline occurs for Qwen 2.5 on RAG when generic rules are appended to the user prompt, from 26/30 to 9/30. These findings support evaluation-driven prompt iteration: prompt changes should be treated as potential regression risks and tested against task-specific suites before deployment. The accompanying repository contains the test suites, prompt variants, evaluation harness, raw result logs, and scripts needed to reproduce the reported local ablations.
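Treating a prompt change as a regression risk can be sketched as a simple gate over per-suite pass counts. This is a hypothetical illustration, not the harness in the accompanying repository: the function name, the suite key `rag_citation`, and the 5% tolerance are assumptions. Only the 26/30 and 9/30 pass counts come from the abstract.

```python
def find_regressions(baseline, variant, total, max_drop=0.05):
    """Return {suite: drop} for suites whose pass rate fell by more than max_drop.

    baseline / variant: dicts mapping suite name to number of passing cases.
    total: number of cases per suite (30 in the reported ablations).
    max_drop: tolerated pass-rate drop (a hypothetical threshold).
    """
    regressions = {}
    for suite, base_pass in baseline.items():
        drop = (base_pass - variant[suite]) / total
        if drop > max_drop:
            regressions[suite] = drop
    return regressions


# Qwen 2.5 on RAG, before and after appending generic rules to the user prompt.
baseline = {"rag_citation": 26}
variant = {"rag_citation": 9}
result = find_regressions(baseline, variant, total=30)
```

Here `result` flags `rag_citation` with a drop of 17/30, so a deployment gate built on this check would block the prompt change.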

2601.21898 2026-06-11 cs.AI cs.CR version update

Making Models Unmergeable via Scaling-Sensitive Loss Landscape

Minwoo Jang, Hoyoung Kim, Jabin Koo, Jungseul Ok

Affiliations: Graduate School of AI, POSTECH, Pohang, Republic of Korea; National AI Research Lab, Seoul, Republic of Korea; Department of CSE, POSTECH, Pohang, Republic of Korea

AI summary: Proposes the Trap$^2$ framework, which encodes protection during fine-tuning so that a model remains effective in standalone use but degrades under the weight scaling common in merging, preventing unauthorized model composition.

Comments: Appears in ICML 2026

English abstract

The rise of model hubs has made it easier to access reusable model components, making model merging a practical tool for combining capabilities. Yet, this modularity also creates a governance gap: downstream users can recompose released weights into unauthorized mixtures that bypass safety alignment or licensing terms. Because existing defenses are largely post-hoc and architecture-specific, they provide inconsistent protection across diverse architectures and release formats in practice. To close this gap, we propose Trap$^2$, an architecture-agnostic protection framework that encodes protection into updates during fine-tuning, regardless of whether they are released as adapters or full models. Instead of relying on architecture-dependent approaches, Trap$^2$ uses weight re-scaling as a simple proxy for the merging process. It keeps released weights effective in standalone use, but degrades them under re-scaling that often arises in merging, undermining unauthorized recomposition.
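Why re-scaling is a reasonable proxy for merging can be seen in a small sketch. It assumes the simplest merging rule, uniform weight averaging of two models fine-tuned from the same base, and shows that the result equals the base plus each fine-tuning update scaled by 1/2. The function names and toy weights are hypothetical, and Trap$^2$'s actual training objective is not shown.

```python
def rescale_update(base, finetuned, scale):
    """Return base + scale * (finetuned - base), element by element."""
    return [b + scale * (f - b) for b, f in zip(base, finetuned)]


def average_merge(model_a, model_b):
    """Uniform weight averaging of two models with identical shapes."""
    return [(a + b) / 2 for a, b in zip(model_a, model_b)]


base = [1.0, 2.0]
model_a = [3.0, 2.0]  # base plus update [2, 0]
model_b = [1.0, 6.0]  # base plus update [0, 4]

merged = average_merge(model_a, model_b)

# The same merge, written as each update re-scaled by 0.5 and added to the base.
half_a = rescale_update(base, model_a, 0.5)
half_b = rescale_update(base, model_b, 0.5)
merged_via_scaling = [ha + hb - b for ha, hb, b in zip(half_a, half_b, base)]
```

Both `merged` and `merged_via_scaling` equal [2.0, 4.0], so a released update that degrades when scaled by 0.5 also degrades inside this kind of merge.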

2601.17717 2026-06-11 cs.AI cs.LG version update

A Survey on Evaluating Quality and Trustworthiness in LLM-Generated Data

Kaituo Zhang, Mingzhi Hu, Hoang Anh Duy Le, Fariha Kabir Torsha, Zhimeng Jiang, Minh Khai Bui, Chia-Yuan Chang, Yu-Neng Chuang, Zhen Xiong, Ying Lin, Guanchu Wang, Na Zou

Affiliations: University of Houston; Worcester Polytechnic Institute; Rice University; Texas A&M University; University of Wisconsin - Madison; University of Southern California; University of North Carolina at Charlotte

AI summary: Proposes the LLM Data Auditor framework, systematically categorizes evaluation metrics along two dimensions, quality and trustworthiness, analyzes evaluation deficiencies in generation methods across six data modalities, and gives recommendations for improvement.

Comments: Published at TMLR. Title changed in the final version

Journal ref: Transactions on Machine Learning Research, 2026

English abstract

Large Language Models (LLMs) have emerged as powerful tools for generating data across various modalities. By transforming data from a scarce resource into a controllable asset, LLMs mitigate the bottlenecks imposed by the acquisition costs of real-world data for model training, evaluation, and system iteration. However, ensuring the high quality of LLM-generated synthetic data remains a critical challenge. Existing research primarily focuses on generation methodologies, with limited direct attention to the quality of the resulting data. Furthermore, most studies are restricted to single modalities, lacking a unified perspective across different data types. To bridge this gap, we propose the LLM Data Auditor framework. In this framework, we first describe how LLMs are utilized to generate data across six distinct modalities. More importantly, we systematically categorize intrinsic metrics for evaluating synthetic data from two dimensions: quality and trustworthiness. This approach shifts the focus from extrinsic evaluation, which relies on downstream task performance, to the inherent properties of the data itself. Using this evaluation system, we analyze the experimental evaluations of representative generation methods for each modality and identify substantial deficiencies in current evaluation practices. Based on these findings, we offer concrete recommendations for the community to improve the evaluation of data generation. Finally, the framework outlines methodologies for the practical application of synthetic data across different modalities.

2601.17360 2026-06-11 cs.LG cs.AI cs.CR version update

Robust Privacy: Inference-Stage Privacy through Certified Robustness

Jiankai Jin, Xiangzheng Zhang, Zhao Liu, Wenzhuo Xu, Dongdong Yang, Deyue Zhang, Quanchen Zou

Affiliations: University of Science and Technology of China

AI summary: Proposes Robust Privacy (RP), a notion based on certified robustness that guarantees predictions are invariant within a neighborhood of the input, limiting inference-stage privacy leakage. Experiments show that RP improves the privacy-utility tradeoff against attribute inference and model inversion attacks.

English abstract

An adversary observing a model's released prediction can infer sensitive attributes of the queried input, or even reconstruct representatives of the model's training data. The inference interface thus acts as a side channel for privacy leakage. We introduce Robust Privacy (RP), an inference-stage privacy notion inspired by certified robustness: if a model's prediction is provably invariant within a radius-R neighborhood around an input x with confidence at least $1-\alpha$, then x enjoys $(R,\alpha)$-Robust Privacy, under which we prove that any adversary observing the released prediction has at most $\alpha/2$ advantage in distinguishing x from any input within distance R of x. Building on RP, we formalize Robust Attribute Privacy (RAP), an attribute-level privacy notion that characterizes the set of sensitive-attribute values that remain compatible with a released prediction. On a classification task, RP increases the median length of the RAP-compatible inference interval from 23.50 to 29.96, reducing attribute-inference precision. Model inversion attacks, often treated as a training-stage threat, in fact rely on fine-grained signals leaked through the inference interface; RP masks these signals at the inference stage, reducing attack success rate (ASR) from 73% to 4% on a black-box inversion attack. This direct targeting of the leakage channel enables RP to dominate DP-SGD and randomized response in the privacy-utility tradeoff space: RP retains 98.4% accuracy at 21% ASR, whereas DP-SGD must drop accuracy to 61.7% to reach a comparable ASR. Across both experiments, increasing the smoothing sample size N strengthens privacy and improves utility together. Finally, we examine model distillation as a scope boundary and show that RP mitigates attribute-level and instance-level inference-stage privacy leakage, but not function-level extraction through model distillation.
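The abstract's mention of a smoothing sample size N suggests a randomized-smoothing style predictor, and the following minimal sketch is written under that assumption. It releases the majority vote of a base classifier over N Gaussian perturbations of the input, which is what makes the output insensitive to small input changes. The function name, the toy classifier, and the parameters are hypothetical, and the certification of the radius R and confidence level $1-\alpha$ is not computed.

```python
import random
from collections import Counter


def smoothed_predict(classifier, x, sigma, n_samples, seed=0):
    """Majority vote of `classifier` over Gaussian perturbations of x.

    Returns (label, vote_fraction). Generic randomized smoothing,
    assumed here rather than taken from the paper.
    """
    rng = random.Random(seed)
    votes = Counter(
        classifier([xi + rng.gauss(0.0, sigma) for xi in x])
        for _ in range(n_samples)
    )
    label, count = votes.most_common(1)[0]
    return label, count / n_samples


def toy_classifier(x):
    """Hypothetical base classifier: sign of the first coordinate."""
    return int(x[0] > 0)


label, fraction = smoothed_predict(toy_classifier, [5.0], sigma=1.0, n_samples=200)
```

For an input far from the decision boundary, as here, nearly every perturbed copy receives the same label, so nearby inputs yield the same released prediction.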

2601.14792 2026-06-11 cs.LG version update

Robustness of Mixtures of Experts to Feature Noise

Dong Sun, Rahul Nittala, Rebekka Burkholz

Affiliations: Dong Sun; Rahul Nittala; Rebekka Burkholz

AI summary: Studies the robustness of mixture-of-experts models under feature noise and finds that sparse expert activation acts as a noise filter, giving lower generalization error, stronger robustness, and faster convergence than dense networks.

Comments: ICML 2026

English abstract

Despite their practical success, it remains unclear why Mixture of Experts (MoE) models can outperform dense networks beyond sheer parameter scaling. We study an iso-parameter regime where inputs exhibit latent modular structure but are corrupted by feature noise, a proxy for noisy internal activations. We show that sparse expert activation acts as a noise filter: compared to a dense estimator, MoEs achieve lower generalization error under feature noise, improved robustness to perturbations, and faster convergence speed. Empirical results on synthetic data and real-world language tasks corroborate the theoretical insights, demonstrating consistent robustness and efficiency gains from sparse modular computation.

2601.14764 2026-06-11 cs.AI cs.HC cs.LO version update

An XAI View on Explainable ASP: Methods, Systems, and Perspectives

Thomas Eiter, Tobias Geibinger, Zeynep G. Saribatur

Affiliations: Institute of Logic and Computation, TU Wien, Austria

AI summary: Surveys explanation methods for Answer Set Programming (ASP) from an XAI perspective, categorizes explanation types, assesses how well existing theory and tools cover them, and identifies research gaps and future directions.

Comments: 10 pages

English abstract

Answer Set Programming (ASP) is a popular declarative reasoning and problem solving approach in symbolic AI. Its rule-based formalism makes it inherently attractive for explainable and interpretive reasoning, which is gaining importance with the surge of Explainable AI (XAI). A number of explanation approaches and tools for ASP have been developed, which often tackle specific explanatory settings and may not cover all scenarios that ASP users encounter. In this survey, we provide, guided by an XAI perspective, an overview of types of ASP explanations in connection with user questions for explanation, and describe their coverage by current theory and tools. Furthermore, we pinpoint gaps in existing ASP explanation approaches and identify research directions for future work.

2601.10774 2026-06-11 cs.LG hep-lat version update

Analytic Bijections for Smooth and Interpretable Normalizing Flows

Mathis Gerdes, Miranda C. N. Cheng

Affiliations: University of Cambridge

AI summary: Proposes three families of globally smooth, analytically invertible bijections to replace affine transformations or splines in coupling flows, and designs a radial flow architecture that matches coupling-flow quality on radially structured targets with one thousandth of the parameters.

Comments: Final ICML 2026 version. 9 + 14 pages, 10 + 11 figures, 3 + 2 tables. New CIFAR-10 and tabular-data results; main text shortened for readability

English abstract

A key challenge in normalizing flows is finding expressive invertible scalar bijections. Existing approaches face trade-offs: affine transformations are smooth and analytically invertible but lack expressivity; monotonic splines offer local control but are only piecewise smooth and act on bounded domains; residual flows achieve smoothness but need numerical inversion. We introduce three families of analytic bijections that are globally smooth ($C^\infty$), defined on all of $\mathbb{R}$, and analytically invertible in closed form, combining the favorable properties of prior approaches. Beyond serving as drop-in replacements in coupling flows, where they match or exceed spline performance, we develop radial flows: a novel architecture using direct parametrization that transforms the radial coordinate while preserving angular direction. Radial flows exhibit exceptional training stability, produce geometrically interpretable transformations, and on targets with radial structure can achieve comparable quality to coupling flows with $1000\times$ fewer parameters. We provide comprehensive evaluation on 1D and 2D benchmarks, and demonstrate applicability to higher-dimensional physics problems through experiments on $\phi^4$ lattice field theory, where our bijections outperform affine baselines and enable problem-specific designs that address mode collapse.
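To make the three requirements concrete (globally smooth, defined on all of $\mathbb{R}$, closed-form inverse), here is a minimal sketch of one scalar bijection with those properties. The abstract does not name its three families, so the sinh-arcsinh map used below is an illustrative stand-in chosen for this example, not one of the paper's constructions; the parameter names `a` and `b` are hypothetical.

```python
import math


def forward(x, a, b):
    """f(x) = sinh(a * asinh(x) + b), a smooth bijection of R for a > 0."""
    return math.sinh(a * math.asinh(x) + b)


def inverse(y, a, b):
    """Closed-form inverse: x = sinh((asinh(y) - b) / a)."""
    return math.sinh((math.asinh(y) - b) / a)


def log_abs_derivative(x, a, b):
    """log |f'(x)|, the per-coordinate log-determinant term a flow needs.

    f'(x) = a * cosh(a * asinh(x) + b) / sqrt(1 + x^2), positive for a > 0.
    """
    u = a * math.asinh(x) + b
    return math.log(a) + math.log(math.cosh(u)) - 0.5 * math.log1p(x * x)
```

Because the derivative is strictly positive for `a > 0`, the map is monotone, and `inverse(forward(x, a, b), a, b)` recovers `x` without any numerical root finding.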

2601.08136 2026-06-11 cs.LG cs.SY eess.SY version update

Reverse Flow Matching: A Unified Framework for Online Reinforcement Learning with Diffusion and Flow Policies

Zeyang Li, Sunbochen Tang, Navid Azizan

Affiliations: Zeyang Li; Sunbochen Tang; Navid Azizan

AI summary: Addresses the lack of target samples for diffusion and flow policies in online reinforcement learning. Proposes the reverse flow matching framework, which uses posterior mean estimation and Langevin Stein operators to construct control variates, unifies the noise-expectation and gradient-expectation families, and extends to flow policies, improving training efficiency and stability.

Comments: ICML 2026 (Spotlight); Code: https://github.com/azizanlab/ReverseFlowMatching

English abstract

Diffusion and flow policies are gaining prominence in online reinforcement learning (RL) due to their expressive power, yet training them efficiently remains a critical challenge. A fundamental difficulty that distinguishes online RL from standard generative modeling is the lack of direct samples from the target Boltzmann distribution defined by the Q-function. To address this, two seemingly distinct families of methods have been proposed for diffusion policies: a noise-expectation family, which uses a weighted average of noise as the training target, and a gradient-expectation family, which employs a weighted average of Q-function gradients. However, it remains unclear how these objectives are formally related, or whether they can be synthesized into a more general formulation. In this paper, we propose a unified framework, reverse flow matching (RFM), which rigorously addresses the problem of training diffusion and flow models without direct target samples. By adopting a reverse inferential perspective, we formulate the training target as a posterior mean estimation problem given an intermediate noisy sample. Crucially, we introduce Langevin Stein operators to construct zero-mean control variates, deriving a general class of estimators that share the same expectation. We show that existing noise-expectation and gradient-expectation methods are simply two specific instances within this broader class. This unified view yields two key advancements: it extends the capability of targeting Boltzmann distributions from diffusion to flow policies, and it enables the principled combination of Q-value and Q-gradient information to form an effective estimator, thereby improving training efficiency and stability. We instantiate RFM to train a flow policy in online RL and demonstrate improved performance on continuous-control benchmarks compared to diffusion policy baselines.

2601.07506 2026-06-11 cs.CL 版本更新

Judging Against the Reference: Uncovering Knowledge-Driven Failures in LLM-Judges on QA Evaluation

与参考标准对照评判:揭示LLM评判者在QA评估中知识驱动的失败模式

Dongryeol Lee, Yerin Hwang, Taegwan Kang, Minwoo Lee, Younhyung Chae, Kyomin Jung

发表机构 * Dept. of ECE, Seoul National University(首尔国立大学电气与计算机工程系) LG AI Research(LG人工智能研究院) IPAI, Seoul National University(首尔国立大学IPAI)

AI总结 本文发现LLM作为QA自动评判者时,当提供的参考答案与模型参数知识冲突,评分可靠性严重下降;通过引入交换参考答案框架系统研究该现象,揭示评判者过度依赖参数知识而忽略参考标准,且常见提示缓解策略无效。

Comments Under review, 21 pgs, 11 figures, 7 tables

详情
AI中文摘要

虽然大型语言模型(LLMs)越来越多地被用作问答(QA)和其他参考条件评估任务的自动评判者,但关于它们遵循所提供的参考标准的能力知之甚少。我们识别出这种基于参考的LLM QA评估的一个关键失败模式:当提供的参考标准与评判模型的参数知识冲突时,产生的评分变得不可靠,从而严重降低评估保真度。为了系统研究这一现象,我们引入了一个受控的交换参考答案QA框架,该框架引发参考-信念冲突。具体来说,我们将参考答案替换为错误实体,并构建原始和交换参考与相应对齐的候选答案的多样化配对。令人惊讶的是,在广泛的评判模型集合中,交换参考下的评分可靠性急剧下降。我们通过实验表明,这种脆弱性是由评判者过度依赖参数知识驱动的,导致评判者在冲突情况下忽略给定的参考标准。最后,我们发现这种失败在常见的基于提示的缓解策略下仍然存在,突显了LLM作为评判者评估的根本局限性,并激励了强制执行更强参考遵循的基于参考的协议。

英文摘要

While large language models (LLMs) are increasingly used as automatic judges for question answering (QA) and other reference-conditioned evaluation tasks, little is known about their ability to adhere to a provided reference. We identify a critical failure mode of such reference-based LLM QA evaluation: when the provided reference conflicts with the judge model's parametric knowledge, the resulting scores become unreliable, substantially degrading evaluation fidelity. To study this phenomenon systematically, we introduce a controlled swapped-reference QA framework that induces reference-belief conflicts. Specifically, we replace the reference answer with an incorrect entity and construct diverse pairings of original and swapped references with correspondingly aligned candidate answers. Surprisingly, grading reliability drops sharply under swapped references across a broad set of judge models. We empirically show that this vulnerability is driven by judges' over-reliance on parametric knowledge, leading judges to disregard the given reference under conflict. Finally, we find that this failure persists under common prompt-based mitigation strategies, highlighting a fundamental limitation of LLM-as-a-judge evaluation and motivating reference-based protocols that enforce stronger adherence to the provided reference.

2510.23508 2026-06-11 cs.CL 版本更新

M4FC: a Multimodal, Multilingual, Multicultural, Multitask Real-World Fact-Checking Dataset

M4FC:一个多模态、多语言、多文化、多任务的真实世界事实验证数据集

Jiahui Geng, Jonathan Tonglet, Iryna Gurevych

发表机构 * Mohamed bin Zayed University of Artificial Intelligence(Mohamed bin Zayed人工智能大学) Ubiquitous Knowledge Processing Lab(泛在知识处理实验室) Department of Computer Science, TU Darmstadt(TU Darmstadt计算机科学系) National Research Center for Applied Cybersecurity ATHENE(应用网络安全国家研究中心ATHENE) Department of Electrical Engineering, KU Leuven(KU Leuven电气工程系) Department of Computer Science, KU Leuven(KU Leuven计算机科学系)

AI总结 为解决现有事实验证数据集规模小、语言单一、任务局限等问题,提出包含4982张图片和6980条声明的多模态数据集M4FC,覆盖6个验证任务,并提供基线结果。

Comments Preprint under review. Code and data available at: https://github.com/UKPLab/M4FC

详情
AI中文摘要

现有的多模态事实验证真实世界数据集存在多个局限性:实例数量少,仅覆盖一种或两种语言,只关注单一任务,或依赖外部新闻文章集来获取真实声明。为解决这些不足,我们引入了M4FC,一个新的真实世界数据集,包含4982张图片和6980条声明。这些图片由来自22个组织的专业事实核查员验证,代表了多样化的文化和地理背景。每条声明以十种语言中的一种或两种提供。M4FC涵盖六个多模态事实验证任务:视觉声明提取、声明者意图预测、虚假图像检测、图像语境化、位置验证和裁决预测。我们为所有任务提供了基线结果,并分析了组合中间任务对裁决预测性能的影响。我们公开了数据集和代码。

英文摘要

Existing real-world datasets for multimodal fact-checking have multiple limitations: they contain few instances, cover only one or two languages, focus on a single task, or rely on external news article sets for sourcing true claims. To address these shortcomings, we introduce M4FC, a new real-world dataset comprising 4,982 images paired with 6,980 claims. The images, verified by professional fact-checkers from 22 organizations, represent a diverse range of cultural and geographic contexts. Each claim is available in one or two out of ten languages. M4FC spans six multimodal fact-checking tasks: visual claim extraction, claimant intent prediction, fake image detection, image contextualization, location verification, and verdict prediction. We provide baseline results for all tasks and analyze how combining intermediate tasks affects verdict prediction performance. We make our dataset and code publicly available.

2601.04710 2026-06-11 cs.CL cs.LG 版本更新

Steering the Noise: Turning Random Perturbations into Effective Descent for Memory-Efficient LLM Fine-Tuning

引导噪声:将随机扰动转化为有效下降方向以实现内存高效的LLM微调

Feihu Jin, Shipeng Cen, Ying Tan

发表机构 * School of Intelligence Science and Technology(智能科学与技术学院) Institute for Artificial Intelligence(人工智能研究院) Peking University(北京大学) State Key Laboratory of General Artificial Intelligence(通用人工智能国家重点实验室)

AI总结 提出一种即插即用框架,通过候选扰动池选择或组合与优化目标对齐的扰动,改进零阶优化梯度估计,提升LLM微调的收敛速度和任务精度。

Comments 12 pages, 6 figures

详情
AI中文摘要

微调大型语言模型(LLMs)取得了强大的性能,但通常受到反向传播内存开销的限制。零阶(ZO)优化通过仅使用前向传递来估计梯度,避免了这一开销,但由于随机高斯扰动在高维参数空间中产生高方差的梯度估计,其收敛速度通常较慢。在本文中,我们提出了一种即插即用框架,将随机扰动转化为更有效的下降方向。关键思想是抽取一小批候选扰动,评估其损失值,然后选择或组合那些与优化目标最一致的扰动。我们开发了该思想的两种实例:MeZO-GV,通过低损失和高损失扰动组之间的对比形成引导向量;以及MeZO-Greedy,在固定的评估预算内保留单个最佳扰动。我们从理论上证明,这两种策略在每步目标函数减少上均优于标准ZO估计,从而提高了收敛速度。在不同规模和架构的LLM上的实验证实,所提出的方法自然地与现有ZO优化器集成,并一致地提高了收敛速度和任务准确性。在OPT-13B上,我们的方法在11个基准测试中优于所有ZO基线,并在其中9个上超过了基于梯度的方法,同时保留了仅前向优化的内存效率。

英文摘要

Fine-tuning large language models (LLMs) achieves strong performance but is often limited by the memory overhead of backpropagation. Zeroth-order (ZO) optimization avoids this overhead by estimating gradients through forward passes alone, yet it typically converges slowly because random Gaussian perturbations yield high-variance gradient estimates in high-dimensional parameter spaces. In this paper, we propose a plug-and-play framework that turns random perturbations into more effective descent directions. The key idea is to draw a small pool of candidate perturbations, evaluate their loss values, and then select or combine those that are best aligned with the optimization objective. We develop two instantiations of this idea: MeZO-GV, which forms a guiding vector from the contrast between low-loss and high-loss perturbation groups, and MeZO-Greedy, which keeps the single best perturbation within a fixed evaluation budget. We theoretically show that both strategies yield a larger per-step reduction in the objective than standard ZO estimation, leading to improved convergence rates. Experiments on LLMs of different scales and architectures confirm that the proposed methods integrate naturally with existing ZO optimizers and consistently improve convergence speed and task accuracy. On OPT-13B, our approach outperforms all ZO baselines across 11 benchmarks and exceeds gradient-based methods on 9 of them, while retaining the memory efficiency of forward-only optimization.
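The pool-then-select idea in this abstract can be illustrated with a toy sketch. The code below is not the paper's MeZO-Greedy algorithm; it is a minimal forward-only loop under assumed choices: a quadratic toy loss, a hypothetical `greedy_zo_step` helper, a pool of Gaussian perturbations, and an accept-only-if-better rule.

```python
import random

def loss(theta):
    # Toy objective (an assumption for illustration): L(theta) = 0.5 * ||theta||^2.
    return 0.5 * sum(t * t for t in theta)

def greedy_zo_step(theta, rng, pool_size=8, eps=0.1):
    """One forward-only step: try pool_size Gaussian perturbations, keep the best.

    Hypothetical helper, not the authors' method. No gradients are computed;
    only loss evaluations (forward passes) are used.
    """
    best_theta, best_loss = theta, loss(theta)
    for _ in range(pool_size):
        z = [rng.gauss(0.0, 1.0) for _ in theta]
        candidate = [t + eps * zi for t, zi in zip(theta, z)]
        cand_loss = loss(candidate)
        if cand_loss < best_loss:  # accept only perturbations that lower the loss
            best_theta, best_loss = candidate, cand_loss
    return best_theta

rng = random.Random(0)
theta = [1.0, -2.0, 0.5, 3.0]
initial = loss(theta)
for _ in range(200):
    theta = greedy_zo_step(theta, rng)
final = loss(theta)
```

Because a candidate is kept only when it lowers the loss, the loss never increases from step to step in this sketch; the price is `pool_size` extra forward passes per step, which is the evaluation budget the abstract refers to.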

2601.04203 2026-06-11 cs.CL cs.CV cs.LG cs.SE 版本更新

FronTalk: Benchmarking Front-End Development as Conversational Code Generation with Multi-Modal Feedback

FronTalk: 以多模态反馈进行对话式代码生成的前端开发基准测试

Xueqing Wu, Zihan Xue, Da Yin, Shuyan Zhou, Kai-Wei Chang, Nanyun Peng, Yeming Wen

发表机构 * Meta Superintelligence Labs(Meta超智能实验室) University of California, Los Angeles(加州大学洛杉矶分校) Duke University(杜克大学)

AI总结 提出FronTalk基准,通过多轮对话和多模态反馈(文本与视觉指令)评估前端代码生成,发现模型存在遗忘和视觉反馈理解困难,提出AceCoder方法有效减少遗忘并提升性能。

详情
AI中文摘要

我们提出了FronTalk,一个前端代码生成基准,开创性地研究了一种独特的交互动态:具有多模态反馈的对话式代码生成。在前端开发中,草图、原型图和带注释的截图等视觉工件对于传达设计意图至关重要,但它们在多轮代码生成中的作用仍未得到充分探索。为解决这一差距,我们聚焦于前端开发任务,整理了FronTalk,这是一个包含100个多轮对话的数据集,这些对话源自新闻、金融和艺术等不同领域的真实网站。每一轮都包含一个文本指令和一个等效的视觉指令,每个指令代表相同的用户意图。为全面评估模型性能,我们提出了一种新颖的基于智能体的评估框架,利用网络智能体模拟用户并探索网站,从而衡量功能正确性和用户体验。对20个模型的评估揭示了文献中系统性地未充分探索的两个关键挑战:(1)显著的遗忘问题,即模型覆盖先前实现的功能,导致任务失败;(2)解释视觉反馈的持续挑战,尤其是对于开源视觉语言模型(VLM)。我们提出了一个强大的基线来解决遗忘问题,即AceCoder,一种使用自主网络智能体批评每个过去指令实现的方法。这种方法将遗忘几乎减少到零,并将性能提升高达9.3%(从56.0%到65.3%)。总体而言,我们旨在为前端开发和多轮多模态代码生成的通用交互动态的未来研究提供坚实基础。代码和数据已发布于 https://github.com/shirley-wu/frontalk 。

英文摘要

We present FronTalk, a benchmark for front-end code generation that pioneers the study of a unique interaction dynamic: conversational code generation with multi-modal feedback. In front-end development, visual artifacts such as sketches, mockups and annotated screenshots are essential for conveying design intent, yet their role in multi-turn code generation remains largely unexplored. To address this gap, we focus on the front-end development task and curate FronTalk, a collection of 100 multi-turn dialogues derived from real-world websites across diverse domains such as news, finance, and art. Each turn features both a textual instruction and an equivalent visual instruction, each representing the same user intent. To comprehensively evaluate model performance, we propose a novel agent-based evaluation framework that uses a web agent to simulate users and explore the website, thereby measuring both functional correctness and user experience. Evaluation of 20 models reveals two key challenges that have not been systematically explored in the literature: (1) a significant forgetting issue where models overwrite previously implemented features, resulting in task failures, and (2) a persistent challenge in interpreting visual feedback, especially for open-source vision-language models (VLMs). We propose a strong baseline to tackle the forgetting issue with AceCoder, a method that critiques the implementation of every past instruction using an autonomous web agent. This approach reduces forgetting to nearly zero and improves performance by up to 9.3% (56.0% to 65.3%). Overall, we aim to provide a solid foundation for future research in front-end development and the general interaction dynamics of multi-turn, multi-modal code generation. Code and data are released at https://github.com/shirley-wu/frontalk

2601.03326 2026-06-11 cs.CV cs.LG 版本更新

Higher order PCA-like rotation-invariant features for detailed shape descriptors modulo rotation

高阶类PCA旋转不变特征用于模旋转的详细形状描述符

Jarek Duda

发表机构 * Jarek Duda

AI总结 提出将PCA扩展到高阶张量(如三阶中心矩)或多项式乘高斯分布,以获取更精确的旋转不变形状描述符,并应用于分子形状描述、物体识别和形状相似性度量。

Comments 5 pages, 4 figures

详情
AI中文摘要

PCA可用于旋转不变特征,通过协方差矩阵 $p_{ab}=E[(x_a-E[x_a])(x_b-E[x_b])]$ 用椭球近似形状,并利用其幂的迹等旋转不变量。然而,真实形状通常复杂得多,因此提出将其扩展到例如 $p_{abc}=E[(x_a-E[x_a])(x_b-E[x_b])(x_c-E[x_c])]$ 的三阶或更高阶张量以描述中心矩,或多项式乘高斯分布以得到任意高精度的可解码形状描述符及其类似的旋转不变量。其实际应用包括旋转不变特征以包含模旋转的形状,例如用于分子形状描述符,或用于2D图像/3D扫描中直至旋转的物体识别,可能也用于3D场景理解,或作为形状相似性度量,允许模旋转下物体的廉价比较,避免耗时的旋转优化。

英文摘要

PCA can provide rotation-invariant features: a shape is described by its covariance matrix $p_{ab}=E[(x_a-E[x_a])(x_b-E[x_b])]$, which approximates the shape by an ellipsoid and yields rotation invariants such as the traces of its powers. However, real shapes are usually much more complicated, so this work proposes extending the approach to order-3 or higher tensors describing central moments, e.g. $p_{abc}=E[(x_a-E[x_a])(x_b-E[x_b])(x_c-E[x_c])]$, or to polynomial-times-Gaussian representations that allow decodable shape descriptors of arbitrarily high accuracy, together with their analogous rotation invariants. Practical applications could include rotation-invariant features that capture shape modulo rotation, e.g. for molecular shape descriptors; object recognition up to rotation in 2D images or 3D scans, and possibly 3D scene understanding; and shape similarity metrics that allow inexpensive comparison of objects modulo rotation, avoiding costly optimization over rotations.
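The invariance claim can be checked numerically. The sketch below is an illustration in 2D, not code from the paper: it computes the covariance $p_{ab}$ and the third-order central moments $p_{abc}$ of a small point set, then compares three quantities before and after a rotation. The choice of invariants (the traces $\mathrm{Tr}(p)$ and $\mathrm{Tr}(p^2)$, and the full contraction $\sum_{abc} p_{abc}^2$) and the helper names are assumptions made for this example.

```python
import math

def central_moments(points):
    """Return the 2D covariance p_ab and third-order central moments p_abc."""
    n = len(points)
    mean = [sum(p[a] for p in points) / n for a in range(2)]
    centered = [[p[a] - mean[a] for a in range(2)] for p in points]
    p2 = [[sum(c[a] * c[b] for c in centered) / n for b in range(2)]
          for a in range(2)]
    p3 = [[[sum(c[a] * c[b] * c[d] for c in centered) / n for d in range(2)]
           for b in range(2)] for a in range(2)]
    return p2, p3

def invariants(points):
    p2, p3 = central_moments(points)
    trace = p2[0][0] + p2[1][1]  # Tr(p)
    trace_sq = sum(p2[a][b] * p2[b][a]
                   for a in range(2) for b in range(2))  # Tr(p^2)
    p3_norm = sum(p3[a][b][c] ** 2  # sum over a, b, c of p_abc^2
                  for a in range(2) for b in range(2) for c in range(2))
    return trace, trace_sq, p3_norm

def rotate(points, angle):
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y) for x, y in points]

shape = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.5, 3.0), (-1.0, 1.5)]
before = invariants(shape)
after = invariants(rotate(shape, 0.7))
```

Up to floating-point error, `before` and `after` agree, so two shapes can be compared through such numbers without searching over rotations.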

2506.08473 2026-06-11 cs.LG 版本更新

AsFT: Anchoring Safety During LLM Fine-Tuning Within Narrow Safety Basin

AsFT:在窄安全盆地内锚定大语言模型微调期间的安全性

Shuo Yang, Qihui Zhang, Yuyang Liu, Xiaojun Jia, Kunpeng Ning, Jiayu Yao, Jigang Wang, Hailiang Dai, Yibing Song, Li Yuan

发表机构 * National University of Singapore(新加坡国立大学) University of Science and Technology of China(中国科学技术大学) Tsinghua University(清华大学)

AI总结 针对微调大语言模型时安全性易受损的问题,提出AsFT方法,通过惩罚与对齐方向正交的更新,将模型约束在窄安全盆地内,在提升任务性能的同时显著降低有害行为。

详情
AI中文摘要

微调大语言模型(LLMs)可提升性能,但引入了关键的安全漏洞:即使极少的有害数据也会严重破坏安全措施。我们观察到,与对齐方向(由对齐(安全)模型与未对齐模型之间的权重差异定义)正交的扰动会迅速损害模型安全性。相反,沿对齐方向的更新则基本保持安全性,揭示了参数空间是一个“窄安全盆地”。为解决此问题,我们提出AsFT(在微调中锚定安全性),通过在微调过程中显式约束更新方向来维持安全性。通过惩罚与对齐方向正交的更新,AsFT有效将模型约束在“窄安全盆地”内,从而保持其固有安全性。在多个数据集和模型上的大量实验表明,AsFT将有害行为降低高达7.60%,任务性能提升3.44%,并在多个任务上持续优于现有方法。

英文摘要

Fine-tuning large language models (LLMs) improves performance but introduces critical safety vulnerabilities: even minimal harmful data can severely compromise safety measures. We observe that perturbations orthogonal to the alignment direction, defined by weight differences between aligned (safe) and unaligned models, rapidly compromise model safety. In contrast, updates along the alignment direction largely preserve safety, revealing the parameter space as a "narrow safety basin". To address this, we propose AsFT (Anchoring Safety in Fine-Tuning) to maintain safety by explicitly constraining update directions during fine-tuning. By penalizing updates orthogonal to the alignment direction, AsFT effectively constrains the model within the "narrow safety basin," thus preserving its inherent safety. Extensive experiments on multiple datasets and models show that AsFT reduces harmful behaviors by up to 7.60%, improves task performance by 3.44%, and consistently outperforms existing methods across multiple tasks.
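The geometric idea of penalizing the orthogonal component of an update can be sketched as follows. This is an assumption-laden illustration, not the AsFT implementation: weights are treated as one flat vector, the alignment direction is taken as `w_aligned - w_unaligned` and assumed nonzero, and `orthogonal_penalty` is a hypothetical helper name.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthogonal_penalty(update, w_aligned, w_unaligned):
    """Squared norm of the part of `update` orthogonal to the alignment direction.

    Assumes flat weight vectors and a nonzero alignment direction.
    """
    d = [a - b for a, b in zip(w_aligned, w_unaligned)]
    coeff = dot(update, d) / dot(d, d)           # projection coefficient onto d
    parallel = [coeff * di for di in d]          # component along d
    orth = [u - p for u, p in zip(update, parallel)]
    return dot(orth, orth)

w_aligned, w_unaligned = [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]
along = orthogonal_penalty([0.5, 0.0, 0.0], w_aligned, w_unaligned)
across = orthogonal_penalty([0.0, 2.0, 0.0], w_aligned, w_unaligned)
mixed = orthogonal_penalty([1.0, 1.0, 0.0], w_aligned, w_unaligned)
```

Here an update along the alignment direction incurs no penalty, while an orthogonal update of the same or larger size is penalized in full; in training, such a term would be added to the task loss with some weight.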

2511.21594 2026-06-11 cs.LG 版本更新

Visualizing LLM Latent Space Geometry Through Dimensionality Reduction

通过降维可视化LLM潜在空间几何结构

Alex Ning, Vainateya Rangaraju, Yen-Ling Kuo

发表机构 * Department of Computer Science, University of Virginia(计算机科学系,弗吉尼亚大学)

AI总结 通过PCA和UMAP降维,可视化GPT-2和LLaMa中Transformer层的潜在状态几何,发现注意力与MLP输出分离、初始位置高范数及螺旋结构等模式。

Comments 25 pages, 15 figures

详情
AI中文摘要

大型语言模型(LLM)在许多自然语言任务中取得了最先进的结果,但其内部机制仍然难以解释。在这项工作中,我们通过降维提取、处理和可视化基于Transformer的语言模型中的潜在状态几何结构。我们在Transformer块内的多个点捕获逐层激活,并通过主成分分析(PCA)和均匀流形近似与投影(UMAP)实现系统分析。我们在GPT-2和LLaMa模型上进行了实验,发现了潜在空间中有趣的几何模式。值得注意的是,我们识别出中间层中注意力与MLP组件输出之间的清晰分离,据我们所知,这种模式在先前的工作中未被记录。我们还描述了初始序列位置潜在状态的高范数,并可视化了潜在状态的逐层演化。此外,我们展示了GPT-2位置嵌入的高维螺旋结构以及LLaMa中按序列的几何模式。我们在以下网址提供代码:https://github.com/Vainateya/Feature_Geometry_Visualization 。相同内容的更好格式的博客文章可在以下网址获取:https://iclr-blogposts.github.io/2026/blog/2026/vis-llm-latent-geometry/ 。

英文摘要

Large language models (LLMs) achieve state-of-the-art results across many natural language tasks, but their internal mechanisms remain difficult to interpret. In this work, we extract, process, and visualize latent state geometries in Transformer-based language models through dimensionality reduction. We capture layerwise activations at multiple points within Transformer blocks and enable systematic analysis through Principal Component Analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP). We demonstrate experiments on GPT-2 and LLaMa models, where we uncover interesting geometric patterns in latent space. Notably, we identify a clear separation between attention and MLP component outputs across intermediate layers, a pattern not documented in prior work to our knowledge. We also characterize the high norm of latent states at the initial sequence position and visualize the layerwise evolution of latent states. Additionally, we demonstrate the high-dimensional helical structure of GPT-2's positional embeddings and the sequence-wise geometric patterns in LLaMa. We make our code available at https://github.com/Vainateya/Feature_Geometry_Visualization. A better formatted blog-post with identical content is available at https://iclr-blogposts.github.io/2026/blog/2026/vis-llm-latent-geometry/.

2601.00791 2026-06-11 cs.LG cs.AI cs.CL cs.LO 版本更新

Geometry of Reason: Spectral Signatures of Valid Mathematical Reasoning

推理的几何:有效数学推理的谱特征

Valentin Noël

发表机构 * Valentin Noël(瓦伦丁·诺埃尔)

AI总结 通过将注意力矩阵视为加权词图,提取四个无需学习的谱诊断指标(Fiedler值、高频能量比、谱熵和平滑度),有效区分有效推理与模式匹配,在多个模型上达到85-96%的分类准确率。

Comments 30 pages, 13 figures, Accepted at ICML 2026 (main track)

详情
AI中文摘要

验证语言模型是真正推理还是模式匹配仍然是一个开放问题:学习型验证器成本高昂,基于输出的启发式方法脆弱。我们证明,有效的数学推理在Transformer注意力中诱导出可测量的、无需训练的谱特征。通过将每个注意力矩阵视为加权词图,我们提取四个诊断指标:Fiedler值、高频能量比(HFER)、谱熵和平滑度,这些指标无需学习参数。在来自四个架构家族的七个模型上的实验产生了高达Cohen's $d = 3.30$($p < 10^{-116}$)的效应量,实现了$85$--$96\%$的单阈值分类准确率。两个发现加深了理解。首先,\emph{柏拉图式有效性}:谱信号追踪逻辑连贯性而非编译器接受性,因超时或缺失导入而被拒绝的证明被正确分类为有效,这一区别通过人工审核确认($\kappa = 0.82$,$n = 51$)。其次,\emph{架构确定性}:滑动窗口注意力将判别特征从HFER转移到平滑度($d = 2.09$,$p < 10^{-48}$),表明注意力设计决定了哪个谱通道编码推理质量。因果消融证实该特征追踪归纳头电路。该方法泛化到非形式化思维链($d = 0.78$,$p < 10^{-3}$),并且在证明搜索中,HFER重排序将Best-of-16 Pass@1提高了$+4.4$--$6.6\%$,匹配了完全监督探针AUC的$98\%$且无需标签。谱图分析是一种原则性的、架构感知的推理验证原语。

英文摘要

Verifying whether a language model is genuinely reasoning or pattern-matching remains an open problem: learned verifiers are expensive, and output-based heuristics are brittle. We show that valid mathematical reasoning induces a measurable, training-free spectral signature in transformer attention. By treating each attention matrix as a weighted token graph, we extract four diagnostics that require no learned parameters: Fiedler value, High-Frequency Energy Ratio (HFER), spectral entropy, and smoothness. Experiments across seven models from four architectural families yield effect sizes up to Cohen's $d = 3.30$ ($p < 10^{-116}$), enabling $85$--$96\%$ single-threshold classification accuracy. Two findings sharpen the interpretation. First, \emph{Platonic validity}: the spectral signal tracks logical coherence rather than compiler acceptance; proofs rejected for timeouts or missing imports are correctly classified as valid, a distinction confirmed by a manual audit ($\kappa = 0.82$, $n = 51$). Second, \emph{architectural determinism}: Sliding Window Attention shifts the discriminative feature from HFER to smoothness ($d = 2.09$, $p < 10^{-48}$), showing that attention design governs which spectral channel encodes reasoning quality. Causal ablation confirms the signature traces induction-head circuits. The method generalises to informal chain-of-thought ($d = 0.78$, $p < 10^{-3}$), and in proof search, HFER reranking improves Best-of-16 Pass@1 by $+4.4$--$6.6\%$, matching $98\%$ of the AUC of fully supervised probes with zero labels. Spectral graph analysis is a principled, architecture-aware primitive for reasoning verification.
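To make "attention matrix as a weighted token graph" concrete, the sketch below computes a Fiedler value (the second-smallest Laplacian eigenvalue) for a tiny attention matrix. The abstract does not specify the construction, so several choices here are assumptions: the matrix is symmetrized as $(A + A^\top)/2$, the unnormalized Laplacian $L = D - W$ is used, and eigenvalues come from a small pure-Python Jacobi routine suitable only for toy sizes.

```python
import math

def jacobi_eigenvalues(matrix, sweeps=100, tol=1e-18):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                # Angle that zeroes the (p, q) entry.
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def fiedler_value(attention):
    """Second-smallest eigenvalue of L = D - W, with W the symmetrized attention."""
    n = len(attention)
    w = [[0.5 * (attention[i][j] + attention[j][i]) for j in range(n)]
         for i in range(n)]
    lap = [[(sum(w[i][k] for k in range(n) if k != i) if i == j else -w[i][j])
            for j in range(n)] for i in range(n)]
    return jacobi_eigenvalues(lap)[1]

uniform = [[1 / 3] * 3 for _ in range(3)]           # every token attends equally
blocks = [[0.5, 0.5, 0.0, 0.0], [0.5, 0.5, 0.0, 0.0],
          [0.0, 0.0, 0.5, 0.5], [0.0, 0.0, 0.5, 0.5]]  # two disconnected groups
connected = fiedler_value(uniform)
disconnected = fiedler_value(blocks)
```

A Fiedler value near zero indicates that the token graph splits into weakly connected groups, while larger values indicate a well-connected graph. How the paper maps this quantity to reasoning validity is not reproduced here.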

2512.14096 2026-06-11 cs.CV 版本更新

RSTR: Reducing SpatioTemporal Redundancy in Diffusion Transformers

RSTR: 减少扩散Transformer中的时空冗余

Ruitong Sun, Tianze Yang, Wei Niu, Jin Sun

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出RSTR框架,通过进化搜索和自适应秩分配联合减少扩散Transformer中的时空冗余,实现50%-70%计算节省并保持或提升生成质量。

Comments International Conference on Machine Learning (ICML)

详情
AI中文摘要

扩散Transformer(DiTs)在图像生成中取得了显著成功,但其部署受到高计算成本的阻碍。我们识别出两种冗余来源。首先,时间冗余:无分类器引导(CFG)在每个时间步应用昂贵的双重前向传播,然而引导仅在特定步骤重要,且关键步骤的可变尺度可以补偿跳过其他步骤。其次,空间冗余:在可变引导下,不同Transformer块表现出异质性敏感性,但跨所有块的统一校准浪费计算且未能满足其不同需求。我们提出RSTR,这是首个联合减少扩散Transformer中时空冗余的框架。第一阶段通过进化搜索解决时间冗余,发现具有可变尺度的稀疏引导调度。第二阶段通过自适应秩分配解决空间冗余,根据敏感性将校准能力分配给Transformer区域。在DiT-XL/2、PixArt-$\alpha$、FLUX和最先进的Qwen-Image上的实验表明,在保持或提升质量的同时实现了50%-70%的计算节省。在DiT-XL/2上,RSTR实现了57%的节省和15%的FID改进;在Qwen-Image上,实现了3.43倍加速且质量保持不变。

英文摘要

Diffusion Transformers (DiTs) have achieved remarkable success in image generation, yet their deployment is hindered by high computational costs. We identify two sources of redundancy. First, temporal redundancy: Classifier-Free Guidance (CFG) applies costly dual forward passes at every timestep, yet guidance matters only at specific steps, and variable scales at critical steps can compensate for skipping others. Second, spatial redundancy: under variable guidance, different transformer blocks exhibit heterogeneous sensitivity, yet uniform calibration across all blocks wastes computation while failing to address their varying requirements. We present RSTR, the first framework to jointly reduce spatiotemporal redundancy in diffusion transformers. Stage-1 addresses temporal redundancy through evolutionary search, discovering sparse guidance schedules with variable scales. Stage-2 addresses spatial redundancy through adaptive rank allocation, assigning calibration capacities to transformer regions based on their sensitivity. Experiments on DiT-XL/2, PixArt-$\alpha$, FLUX, and state-of-the-art Qwen-Image demonstrate 50%-70% compute savings while maintaining or improving quality. On DiT-XL/2, RSTR achieves 57% savings with 15% FID improvement; on Qwen-Image, 3.43$\times$ speedup with preserved quality.
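The source of the temporal redundancy can be seen from the standard CFG combination rule, which needs a conditional and an unconditional forward pass at every guided step. The sketch below counts forward passes for a dense schedule and for a sparse one with variable scales. The schedule values are invented for illustration and are not results of the paper's evolutionary search; the assumption that an unguided step needs only the conditional pass is also ours.

```python
def cfg_combine(uncond, cond, scale):
    """Classifier-free guidance: extrapolate from the unconditional prediction."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]

def forward_passes(schedule):
    """Count model calls: 2 per guided step (cond + uncond), 1 per unguided step."""
    return sum(2 if scale is not None else 1 for scale in schedule)

# Hypothetical 10-step schedules; None means guidance is skipped at that step.
dense = [4.0] * 10
sparse = [None, None, 6.0, None, 7.5, None, None, 5.0, None, None]

guided = cfg_combine([0.0, 0.0], [1.0, 2.0], 2.0)
savings = 1 - forward_passes(sparse) / forward_passes(dense)
```

In this toy setting the sparse schedule uses 13 forward passes instead of 20; the larger scales at the remaining guided steps stand in for the "variable scales at critical steps" mentioned in the abstract.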

2505.15201 2026-06-11 cs.LG cs.AI cs.CL stat.ML 版本更新

Pass@K Policy Optimization: Solving Harder Reinforcement Learning Problems

Pass@K 策略优化:解决更困难的强化学习问题

Christian Walder, Deep Karkhanis

发表机构 * Google DeepMind(谷歌DeepMind)

AI总结 提出 Pass-at-k 策略优化 (PKPO),通过变换奖励直接优化 pass@k 性能,利用低方差无偏估计器,在训练中退火 k 可同时提升 pass@1 和 pass@k,解决更难问题。

详情
AI中文摘要

强化学习算法对每个问题采样多个 n>1 的解决方案尝试并独立奖励它们。这优化了 pass@1 性能,优先考虑孤立样本的强度,而牺牲了样本集的多样性和集体效用。这未充分利用采样能力,限制了探索和在更难示例上的最终改进。作为修复,我们提出 Pass-at-k 策略优化 (PKPO),一种对最终奖励的变换,导致直接优化 pass@k 性能,从而优化联合考虑时最大化奖励的样本集。我们的贡献是推导出 pass@k 及其梯度在二元和连续奖励设置中的新型低方差无偏估计器。我们展示了使用我们的估计器进行优化简化为标准强化学习,其中奖励经过稳定高效的变换函数联合变换。虽然先前的工作仅限于 k=n,但我们是第一个能够对任意 k ≤ n 实现 pass@k 鲁棒优化的。此外,我们的方法不是以 pass@1 性能换取 pass@k 增益,而是允许在训练中退火 k,同时优化两个指标,通常能在显著 pass@k 增益的同时获得强大的 pass@1 数值。我们在玩具实验上验证了我们的奖励变换,揭示了我们的公式的方差减少特性。我们还使用开源 LLM GEMMA-2 包含了真实世界的例子。我们发现我们的变换有效地优化了目标 k。此外,更高的 k 值能够解决更多和更难的问题,而退火 k 则同时提升了 pass@1 和 pass@k。关键的是,在传统 pass@1 优化停滞的具有挑战性的任务集上,我们的 pass@k 方法解锁了学习,这可能是由于通过优先考虑联合效用而非单个样本的效用实现了更好的探索。

英文摘要

Reinforcement Learning (RL) algorithms sample multiple (n>1) solution attempts for each problem and reward them independently. This optimizes for pass@1 performance and prioritizes the strength of isolated samples at the expense of the diversity and collective utility of sets of samples. This under-utilizes the sampling capacity, limiting exploration and eventual improvement on harder examples. As a fix, we propose Pass-at-k Policy Optimization (PKPO), a transformation on the final rewards which leads to direct optimization of pass@k performance, thus optimizing for sets of samples that maximize reward when considered jointly. Our contribution is to derive novel low-variance unbiased estimators for pass@k and its gradient, in both the binary and continuous reward settings. We show that optimization with our estimators reduces to standard RL with rewards that have been jointly transformed by a stable and efficient transformation function. While previous efforts are restricted to k=n, ours is the first to enable robust optimization of pass@k for any k ≤ n. Moreover, instead of trading off pass@1 performance for pass@k gains, our method allows annealing k during training, optimizing both metrics and often achieving strong pass@1 numbers alongside significant pass@k gains. We validate our reward transformations on toy experiments, which reveal the variance-reducing properties of our formulations. We also include real-world examples using the open-source LLM GEMMA-2. We find that our transformation effectively optimizes for the target k. Furthermore, higher k values enable solving more and harder problems, while annealing k boosts both pass@1 and pass@k. Crucially, for challenging task sets where conventional pass@1 optimization stalls, our pass@k approach unblocks learning, likely due to better exploration by prioritizing joint utility over the utility of individual samples.
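For readers unfamiliar with the metric, the widely used combinatorial estimator of pass@k from n samples with c correct ones is $1 - \binom{n-c}{k} / \binom{n}{k}$. The sketch below implements that standard estimator for binary rewards only. Whether it coincides with the paper's binary-reward estimator is not stated in the abstract, and the paper's reward transformation and gradient estimators are not reproduced here.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples with c correct, for 0 < k <= n.

    Equals the probability that a uniformly random size-k subset of the n
    samples contains at least one correct sample.
    """
    if not 0 < k <= n:
        raise ValueError("need 0 < k <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)

single = pass_at_k(10, 3, 1)   # reduces to the plain success rate c / n
pair = pass_at_k(4, 1, 2)
none_correct = pass_at_k(5, 0, 3)
all_correct = pass_at_k(5, 5, 2)
```

With k=1 the estimate reduces to the fraction of correct samples, which is why rewarding samples independently corresponds to optimizing pass@1.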

2512.11393 2026-06-11 cs.CV 版本更新

The N-Body Problem: Parallel Execution from Single-Person Egocentric Video

N体问题:从单人第一人称视角视频实现并行执行

Zhifan Zhu, Yifei Huang, Yoichi Sato, Dima Damen

发表机构 * University of Bristol(布里斯托尔大学) The University of Tokyo(东京大学)

AI总结 提出N体问题,从单人第一人称视角视频预测N人并行执行任务,通过结构化提示策略引导视觉语言模型推理3D环境、物体使用和时间依赖,在EPIC-Kitchens和HD-EPIC数据集上显著提升动作覆盖率并降低冲突。

Comments project webpage: https://zhifanzhu.github.io/ego-nbody

详情
AI中文摘要

人类可以直观地并行化复杂活动,但模型能否通过观察一个人来预测这一点?给定一段第一人称视角视频,我们引入N体问题:预测N个人如何假设性地执行同一组任务。目标是最大化加速,但将视频片段天真地分配给个人往往违反现实世界约束,导致物理上不可能的场景,例如两个人使用同一物体或占据同一空间。为了量化这一点,我们形式化了N体问题,并提出了一套度量标准来评估性能(加速、任务覆盖)和可行性(空间碰撞、物体冲突和因果约束)。作为概念验证,我们引入了一种结构化提示策略,引导视觉语言模型(VLM)推理3D环境、物体使用和时间依赖,从而产生可行的并行执行。在来自EPIC-Kitchens和HD-EPIC的100个视频上,对于N=2,我们的结构化提示相比Gemini 2.5 Pro的基线提示,动作覆盖率提高了45%,同时碰撞率、物体冲突和因果冲突分别降低了51%、52%和55%。

英文摘要

Humans can intuitively parallelise complex activities, but can a model predict this from observing a single person? Given one egocentric video, we introduce the N-Body Problem: predicting how N individuals can hypothetically perform the same set of tasks. The goal is to maximise speed-up, but naive assignment of video segments to individuals often violates real-world constraints, leading to physically impossible scenarios like two people using the same object or occupying the same space. To quantify this, we formalise the N-Body Problem and propose a suite of metrics to evaluate both performance (speed-up, task coverage) and feasibility (spatial collisions, object conflicts and causal constraints). As a proof of concept, we introduce a structured prompting strategy that guides a Vision-Language Model (VLM) to reason about the 3D environment, object usage, and temporal dependencies, producing a viable parallel execution. On 100 videos from EPIC-Kitchens and HD-EPIC, for $N = 2$, our structured prompt improves action coverage by 45% over a baseline prompt for Gemini 2.5 Pro, while simultaneously reducing collision rates, object conflicts and causal conflicts by 51%, 52% and 55% respectively.

2512.08211 2026-06-11 cs.LG 版本更新

MobileFineTuner: A Mobile-Native Framework for On-Device LLM Fine-Tuning in Real-World Embedded AI Applications

MobileFineTuner:面向真实世界嵌入式AI应用中设备端大语言模型微调的移动原生框架

Jiaxiang Geng, Lunyu Zhao, Yiyi Lu, Bing Luo

发表机构 * Duke Kunshan University(昆山杜克大学) The University of Hong Kong(香港大学)

AI总结 提出移动原生框架MobileFineTuner,通过C++实现资源感知训练运行时(内存高效注意力、激活检查点等),在商用手机上实现端到端LLM微调,显著降低内存压力并提升可执行性。

Comments 26 pages, 25 figures

详情
AI中文摘要

大语言模型(LLM)正从以云为中心的服务转向设备端嵌入式AI,其中模型与从用户及其物理环境感知的私有、纵向信号进行交互。手机是此类应用的自然平台,因为用户随身携带、连接可穿戴传感器,并深度集成于日常移动应用中。然而,在商用手机上实际进行LLM微调仍然困难。现有微调框架大多基于Python且面向服务器,难以部署到移动应用中。我们提出MobileFineTuner,一个面向移动原生的开源框架,用于在商用手机上实现端到端LLM微调。MobileFineTuner用C++实现,并提供可复用的训练栈。为了在移动资源约束下使微调可行,MobileFineTuner集成了资源感知的训练运行时,包括内存高效注意力、激活检查点、梯度累积、参数分片和能量感知调度。我们在真实手机上使用GPT-2、Gemma 3和Qwen2.5模型,在多个微调任务上评估MobileFineTuner。结果表明,MobileFineTuner再现了标准Full-FT和LoRA微调行为,显著降低了内存压力并提升了在内存受限手机上的可执行性。我们进一步通过一个私有的校园健康代理应用展示了MobileFineTuner,其中本地LLM在用户特定的可穿戴感知记录上进行微调,以提供更个性化的响应,同时将原始记录保留在手机上。这些结果确立了MobileFineTuner作为在嵌入式AI和感知系统中研究和构建设备端LLM微调应用的实用工具包。

英文摘要

Large language models (LLMs) are moving from cloud-centric services toward on-device embedded AI, where models interact with private, longitudinal signals sensed from users and their physical environments. Mobile phones are a natural platform for such applications because they are continuously carried by users, connected to wearable sensors, and deeply integrated with daily mobile applications. However, practical LLM fine-tuning on commodity phones remains difficult. Existing fine-tuning frameworks are largely Python-based and server-oriented, making them hard to deploy inside mobile applications. We present MobileFineTuner, a mobile-native open-source framework for end-to-end LLM fine-tuning on commodity mobile phones. MobileFineTuner is implemented in C++ and provides a reusable training stack. To make fine-tuning feasible under mobile resource constraints, MobileFineTuner integrates a resource-aware training runtime with memory-efficient attention, activation checkpointing, gradient accumulation, parameter sharding, and energy-aware scheduling. We evaluate MobileFineTuner on real mobile phones using GPT-2, Gemma 3, and Qwen2.5 models across multiple fine-tuning tasks. The results show that MobileFineTuner reproduces standard Full-FT and LoRA fine-tuning behavior, substantially reduces memory pressure and improves executability on memory-constrained phones. We further demonstrate MobileFineTuner through a private campus health-agent application, where a local LLM is fine-tuned on user-specific wearable-sensing records to provide more personalized responses while keeping raw records on the phone. These results establish MobileFineTuner as a practical toolkit for studying and building on-device LLM fine-tuning applications in embedded AI and sensing systems.

2508.21380 2026-06-11 cs.LG cs.AI 版本更新

The Algorithm Is Not the Behavior: Learned Priors Override Look-Ahead in a Chess-Playing Neural Network

算法并非行为:学得的先验知识在弈棋神经网络中覆盖前瞻

Elias Sandmann, Sebastian Lapuschkin, Wojciech Samek

发表机构 * Fraunhofer HHI(弗劳恩霍夫海因里希·赫兹研究所)

AI总结 研究发现,国际象棋神经网络Leela Chess Zero在中间层能正确计算解法,但最终输出被安全优先的先验知识覆盖,导致错误答案。

详情
AI中文摘要

最近的机制性工作揭示了神经网络内部的学习算法,从模运算到游戏智能体中的搜索与规划。但算法结构是否保证算法行为?我们在最强的神经象棋引擎Leela Chess Zero中对此进行研究,先前工作已识别出学习到的前瞻。通过将logit透镜扩展到其选棋策略网络,我们发现正确的谜题解法——包括即时将杀——经常出现在中间层,但在最终输出中被系统性覆盖,我们将此现象称为“遗忘的谜题”。在这些位置上重复先前的分析,我们发现前瞻运行正常——正确续招的未来走法被表示、因果重要且可线性解码——排除了算法本身的失败。相反,后期层逐渐转向优先考虑安全对局而非激进。为了测试这一转变是否驱动了覆盖,我们引导模型反对这些偏好,并恢复了61.7%的遗忘谜题,提供了因果证据表明安全先验覆盖了算法计算的解法。这些发现表明,算法结构并不保证算法行为:模型可以在内部解决问题,但仍然输出错误答案。

英文摘要

Recent mechanistic work has uncovered learned algorithms within neural networks, from modular arithmetic to search and planning in game-playing agents. But does algorithmic structure guarantee algorithmic behavior? We investigate this in Leela Chess Zero, the strongest neural chess engine, where prior work identified learned look-ahead. By extending the logit lens to its move-selecting policy network, we discover that correct puzzle solutions, including immediate checkmates, often appear in intermediate layers but are systematically overridden in the final output, a phenomenon we term "forgotten puzzles". Replicating prior analyses on these positions, we find that look-ahead operates normally (future moves of the correct continuation are represented, causally important, and linearly decodable), ruling out a failure of the algorithm itself. Instead, late layers increasingly shift toward prioritizing safe play over aggression. To test whether this shift drives the override, we steer the model against these preferences and recover 61.7% of forgotten puzzles, providing causal evidence that safety priors override algorithmically computed solutions. These findings demonstrate that algorithmic structure does not guarantee algorithmic behavior: a model can internally solve a problem and still output the wrong answer.

2511.19314 2026-06-11 cs.AI cs.CL cs.LG 版本更新

PRInTS: Reward Modeling for Long-Horizon Information Seeking

PRInTS:面向长程信息检索的奖励建模

Jaewoo Lee, Archiki Prasad, Justin Chih-Yao Chen, Zaid Khan, Elias Stengel-Eskin, Mohit Bansal

发表机构 * University of North Carolina at Chapel Hill(北卡罗来纳大学教堂山分校) University of Texas at Austin(德克萨斯大学奥斯汀分校)

AI总结 提出PRInTS生成式过程奖励模型,通过密集评分和轨迹摘要提升长程信息检索中工具交互与推理能力,在多个基准上超越前沿模型。

Comments ACL 2026, 19 pages, code: https://github.com/G-JWLee/PRInTS

详情
AI中文摘要

信息检索是AI智能体的核心能力,要求它们在整个长轨迹中收集和推理工具生成的信息。然而,这种多步骤信息检索任务对于基于语言模型的智能体仍然具有挑战性。虽然过程奖励模型(PRM)可以通过在测试时对候选步骤进行排序来指导智能体,但现有的PRM——设计用于具有二元判断的短程推理——无法捕捉信息检索步骤的更丰富维度,例如工具交互和对工具输出的推理,也无法处理长程任务中快速增长的上下文。为了解决这些限制,我们引入了PRInTS,一种具有双重能力的生成式PRM:(1)基于PRM对步骤质量多个维度(例如,工具输出的解释、工具调用的信息量)的推理进行密集评分,以及(2)轨迹摘要,在压缩不断增长的上下文的同时保留步骤评估所需的基本信息。在FRAMES、GAIA(级别1-3)和WebWalkerQA(简单-困难)基准上对多个模型的广泛评估表明,使用PRInTS进行最佳n采样增强了开源模型以及专门智能体的信息检索能力,以更小的骨干智能体匹配或超越前沿模型,并优于其他强奖励建模基线。

英文摘要

Information-seeking is a core capability for AI agents, requiring them to gather and reason over tool-generated information across long trajectories. However, such multi-step information-seeking tasks remain challenging for agents backed by language models. While process reward models (PRMs) can guide agents by ranking candidate steps at test time, existing PRMs, designed for short reasoning with binary judgment, cannot capture richer dimensions of information-seeking steps, such as tool interactions and reasoning over tool outputs, nor handle the rapidly growing context in long-horizon tasks. To address these limitations, we introduce PRInTS, a generative PRM trained with dual capabilities: (1) dense scoring based on the PRM's reasoning across multiple dimensions of step quality (e.g., interpretation of tool outputs, tool call informativeness) and (2) trajectory summarization that compresses the growing context while preserving essential information for step evaluation. Extensive evaluations across FRAMES, GAIA (levels 1-3), and WebWalkerQA (easy-hard) benchmarks on multiple models reveal that best-of-n sampling with PRInTS enhances information-seeking in open-source models as well as specialized agents, matching or surpassing frontier models with a much smaller backbone agent and outperforming other strong reward modeling baselines.

2511.00044 2026-06-11 cs.LG nlin.AO 版本更新

Time-multiplexed layer reuse for physical neural networks

物理神经网络的时间复用层重用

Kohei Tsuchiyama, Andre Roehm, Takatomo Mihana, Ryoichi Horisaki

发表机构 * Graduate School of Information Science and Technology, The University of Tokyo(信息科学与技术研究生学校,东京大学)

AI总结 针对物理神经网络权重调整慢的瓶颈,提出TIDAL-Net,通过时间复用层增加有效深度,在图像分类和自然语言处理任务上提升性能。

详情
AI中文摘要

物理神经网络(PNN)是下一代计算的有前途的候选者,但现有演示仍比现代数字神经网络小几个数量级,而现代数字神经网络的最新进展是由可训练参数的快速增长驱动的。这种情况类似于早期数字神经网络的限制,这导致了关于参数重用的想法。我们研究了类似高效的硬件架构可能是什么样子,特别关注PNN中权重重新调整的常见瓶颈。我们提出了时间索引深度交替层网络(TIDAL-Net),它占据循环神经网络和深度神经网络之间的中间状态,专门针对常见PNN原型的规模和限制。TIDAL-Net利用许多PNN中快速前向动力学和缓慢可训练权重与偏置之间的时间尺度分离,通过逐层时间复用来增加有效深度,同时限制实现成本。在图像分类和自然语言处理任务上的数值实验表明,TIDAL-Net在仅对传统PNN进行微小修改的情况下提高了性能。

英文摘要

Physical neural networks (PNNs) are promising candidates for next-generation computing, but existing demonstrations remain several orders of magnitude smaller than modern digital neural networks, whose recent advances have been driven by rapid growth in trainable parameters. This situation resembles the constraints of early digital neural networks, which led to ideas around parameter reuse. We investigate what similarly efficient hardware architectures may look like, focusing specifically on the common bottleneck of slow re-adjustment of the weights in PNNs. We propose the Time-Indexed Deep Alternating Layers Network (TIDAL-Net), which occupies an intermediate regime between recurrent and deep neural networks, specifically aimed at the scales and restrictions of common PNN prototypes. TIDAL-Net leverages the timescale separation found in many PNNs between fast forward dynamics and slowly trainable weights and biases, using layer-by-layer time multiplexing to increase effective depth while limiting implementation cost. Numerical experiments on image classification and natural language processing tasks show that TIDAL-Net improves performance with only minor modifications to conventional PNNs.