arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.07838 2026-05-11 q-bio.QM cs.AI cs.LG

PPI-Net connects molecular protein interactions to functional processes in disease

Kyle Higgins, Guadalupe Gonzalez, Dennis Veselkov, Ivan Laponogov, Kirill Veselkov

Affiliations: Division of Cancer, Department of Surgery and Cancer, Faculty of Medicine, Imperial College London; Prescient Design, Genentech; Department of Computing, Imperial College London

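AI summary: PPI-Net integrates protein interaction networks with pathway representations to model disease from molecular interactions to functional processes, achieving high predictive performance and mechanistic insight.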

Comments 17 pages, 3 figures, 2 tables

Abstract

Understanding how molecular alterations propagate across biological systems to drive disease remains a central challenge. Although high-throughput profiling enables comprehensive characterization of tumor states, most models neglect structured biological relationships or lack interpretability across scales. Here we present PPI-Net, a hierarchical graph neural network that integrates protein-protein interaction (PPI) networks with pathway-level representations to model disease from molecular interactions to functional processes. Patient-specific molecular profiles are embedded within a shared interaction network from STRING and propagated through a multi-layer Reactome hierarchy using graph attention, enabling aggregation of gene-level signals into higher-order biological programs. Across RNA-seq data from ten cancer types from The Cancer Genome Atlas, PPI-Net achieves robust predictive performance, with balanced accuracy exceeding 90% in multiple cohorts. Comparative analysis on RNA-Seq data from breast cancer demonstrated that PPI-Net's integration of the Reactome hierarchy improved balanced accuracy by 6.7% relative to a PPI-only model, while hierarchical multi-level supervision improved balanced accuracy by 12.3% relative to using only a single top-level prediction head. Applying a multi-omics approach using RNA-seq and methylation data improves model interpretation, recovering canonical oncogenic modules, including TP53-AKT signaling and stress response pathways, while revealing convergence onto coherent programs such as ion signaling and cellular responses to stimuli. These results demonstrate that integrating interaction networks with pathway hierarchies enables accurate prediction while providing mechanistic insight into cancer biology.
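
The headline numbers in this abstract are balanced accuracies. As a reminder of what that metric measures, here is a minimal sketch in plain Python: it averages per-class recall, so a rare class counts as much as a common one. The labels and the toy cohort are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so every class contributes equally."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)

# Hypothetical imbalanced cohort: 8 tumor samples and 2 normal samples.
y_true = ["tumor"] * 8 + ["normal"] * 2
y_pred = ["tumor"] * 8 + ["tumor", "normal"]
score = balanced_accuracy(y_true, y_pred)  # tumor recall 1.0, normal recall 0.5
```

Plain accuracy on this toy cohort would be 0.9, while balanced accuracy is 0.75, which is why the metric is commonly preferred for imbalanced cohorts.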

2605.07830 2026-05-11 cs.CR cs.AI

CyBiasBench: Benchmarking Bias in LLM Agents for Cyber-Attack Scenarios

Taein Lim, Seongyong Ju, Munhyeok Kim, Hyunjun Kim, Hoki Kim

Affiliations: Chung-Ang University; Myongji University

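AI summary: This paper presents CyBiasBench, which uses 630 sessions to evaluate the attack bias of five agents across three targets and four prompt conditions, revealing differences in how agents allocate effort across attack families and a bias momentum effect.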

Comments Under Review

Abstract

Large language models (LLMs) are increasingly deployed as autonomous agents in offensive cybersecurity. In this paper, we reveal an interesting phenomenon: different agents exhibit distinct attack patterns. Specifically, each agent exhibits an attack-selection bias, disproportionately concentrating its efforts on a narrow subset of attack families regardless of prompt variations. To systematically quantify this behavior, we introduce CyBiasBench, a comprehensive 630-session benchmark that evaluates five agents on three targets and four prompt conditions with ten attack families. We identify explicit bias across agents, with different dominant attack families and varying entropy levels in their attack-family allocation distributions. Such bias is better characterized as a trait of the agents, rather than a factor associated with the attack success rate. Furthermore, our experiments reveal a bias momentum effect, where agents resist explicit steering toward attack families that conflict with their bias. This forced distribution shift does not yield measurable improvements in attack performance. To ensure reproducibility and facilitate future research, we release an interactive result dashboard at https://trustworthyai.co.kr/CyBiasBench/ and a reproducibility artifact with aggregated session-level statistics and full evaluation scripts at https://github.com/Harry24k/CyBiasBench.
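
The abstract compares agents by the entropy of their attack-family allocation distributions. A minimal sketch of that kind of measurement, assuming Shannon entropy in bits over per-session family labels, is shown here. The family names and counts are hypothetical, and the benchmark's exact definition may differ.

```python
import math
from collections import Counter

def allocation_entropy(attack_families):
    """Shannon entropy (bits) of the empirical attack-family distribution."""
    counts = Counter(attack_families)
    n = sum(counts.values())
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical agents: one concentrates on a single family, one spreads evenly.
concentrated = ["sqli"] * 9 + ["xss"]
spread = ["sqli", "xss", "ssrf", "rce"] * 3
h_concentrated = allocation_entropy(concentrated)
h_spread = allocation_entropy(spread)
```

A lower value indicates an agent that concentrates on few families; the evenly spread agent reaches the maximum of log2(4) = 2 bits for four families.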

2605.07825 2026-05-11 cs.MM cs.CV

Anisotropic Modality Align

Xiaomin Yu, Yijiang Li, Yuhui Zhang, Hanzhen Zhao, Yue Yang, Hao Tang, Yue Song, Xiaobin Hu, Chengwei Qin, Shuicheng Yan, Hui Xiong

Affiliations: HKUST(GZ); NUS; UCSD; Stanford; PKU; THU

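AI summary: This paper studies whether representations can be interchanged between modalities in multimodal models and proposes anisotropic modality alignment, a geometric correction framework that improves multimodal training with unimodal data.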

Abstract

Training multimodal large language models has long been limited by the scarcity of high-quality paired multimodal data. Recent studies show that the shared representation space of pretrained multimodal contrastive models can serve as a bridge, enabling models to perform multimodal training with unimodal data. However, the key premise of this paradigm remains insufficiently understood: can representations from different modalities be reliably interchanged? The core obstacle lies in the persistent Modality Gap in the shared space. In this work, we revisit the geometric nature of the modality gap. We find that modality representations already share compatible dominant semantic geometry. What truly hinders modality interchangeability is not a simple global shift, but an anisotropic residual structure concentrated along a small number of dominant directions. Based on this finding, we further propose the principle of anisotropic modality gap alignment: effective modality alignment should align with the target-modality distribution while preserving the semantic structure of the source modality. Guided by this principle, we propose an anisotropic geometric correction framework, AnisoAlign, for unpaired modality alignment. This framework leverages the internal geometric prior of the target modality and performs bounded correction on source-modality representations, thereby constructing substitute representations in the target modality. Experiments confirm its benefits in both geometric diagnostics and text-only MLLM training. Overall, this work recasts the modality gap from an empirical observation into a correctable, structured geometric phenomenon and provides a new representation alignment perspective for training multimodal models with unimodal data.

2605.07812 2026-05-11 cs.CR cs.LG

GRASP -- Graph-Based Anomaly Detection Through Self-Supervised Classification

Robin Buchta, Carsten Kleiner, Felix Heine, Gabi Dreo Rodosek

Affiliations: IEEE

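AI summary: GRASP detects anomalous behavior through self-supervised classification, inferring each process's executable from its provenance-graph neighborhood; it identifies unknown activity without thresholds and outperforms existing systems.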

Comments 17 pages

Abstract

Advanced persistent threat (APT) attacks remain difficult to detect due to their stealth, adaptability, and use of legitimate system components. Provenance-based intrusion detection systems (PIDS) offer a promising defense by capturing detailed relationships between system components and actions. However, current PIDS rely on predefined or subset-determined thresholds, which limit detection stability and the ability to detect any anomalous behavior in general. Furthermore, related work often neglects the role of process executables, which describe system activity by interacting through a process with files, network components, and other processes. We introduce GRASP, a PIDS based on masked self-supervised classification. GRASP masks the executable information of processes and learns to infer it from their two-hop provenance graph neighborhood, marking misclassified processes as anomalies. It captures behavior patterns for the learned executables without thresholding, making it robust against interference and unknown activities. Evaluations on the DARPA TC and OpTC datasets demonstrate that GRASP consistently detects anomalous behavior, including known attack-related activities, outperforming existing systems. Our PIDS identifies all documented attacks on datasets where the behavior of executables is learnable. In addition, compared to existing systems, GRASP uncovers potentially malicious anomalous behavior not labeled as an attack in the documentation.
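
The detection rule described above (mask the executable, predict it from the neighborhood, flag mispredictions) can be illustrated with a deliberately simple stand-in. The sketch below replaces the paper's graph model with a majority-vote lookup over neighborhood signatures; the function names, executables, and neighborhood labels are all hypothetical.

```python
from collections import Counter, defaultdict

def fit(training):
    """training: (executable, neighborhood) pairs taken from benign traces."""
    table = defaultdict(Counter)
    for executable, neighborhood in training:
        table[frozenset(neighborhood)][executable] += 1
    return table

def predict(table, neighborhood):
    """Most frequent executable seen with this neighborhood, or None if unseen."""
    votes = table.get(frozenset(neighborhood))
    return votes.most_common(1)[0][0] if votes else None

def is_anomalous(table, executable, neighborhood):
    # No score threshold: a process is flagged exactly when the masked
    # executable label is mispredicted from its neighborhood.
    return predict(table, neighborhood) != executable

benign = [
    ("sshd", {"net:22", "file:/etc/passwd"}),
    ("sshd", {"net:22", "file:/etc/passwd"}),
    ("nginx", {"net:80", "file:/var/www"}),
]
model = fit(benign)
normal = is_anomalous(model, "nginx", {"net:80", "file:/var/www"})
suspicious = is_anomalous(model, "nginx", {"net:22", "file:/etc/passwd"})
```

Here a process claiming to be `nginx` but behaving like `sshd` is flagged, while the ordinary `nginx` process is not. A learned model over two-hop provenance neighborhoods would generalize where this lookup table cannot.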

2605.07810 2026-05-11 physics.optics cs.CV

Pre-training Enables Extraordinary All-optical Image Denoising

Xudong Lv, Yuxiang Sun, Shuo Wang, Nanxing Chen, Jun Guan, Jingtian Hu

Affiliations: Ministry of Industry and Information Technology Key Lab of Micro-Nano Optoelectronic Information System; Guangdong Provincial Key Laboratory of Semiconductor Optoelectronic Materials and Intelligent Photonic System; Harbin Institute of Technology; School of Electronics and Information Engineering; Zhejiang Provincial Key Laboratory of Intelligent Vehicle Electronics Research; Hangzhou Dianzi University; School of Science and Engineering; The Chinese University of Hong Kong (Shenzhen); Quantum Science Center of Guangdong-Hong Kong-Macao Greater Bay Area; Key Laboratory of Photonic Technology for Integrated Sensing and Communication, Ministry of Education; Guangdong University of Technology

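AI summary: This paper proposes a pre-training-driven approach that uses a two-step optimization process to achieve effective all-optical image denoising, substantially improving denoising quality across diverse image styles while preserving detail and raising PSNR under noisy conditions.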

Abstract

Optical neural networks are emerging as powerful machine learning and information processing tools because of their potential advantages in speed and energy efficiency. The training methods of these physical models, however, remain underexplored compared to their digital counterparts, which leads to suboptimal performance. This paper reports a pre-training-driven approach that leads to snapshot image denoising with substantially improved quality. We demonstrated effective free-space optical denoising by a diffractive network optimized by a two-step process including (1) pre-training using a massive dataset of 3.45 million diverse but simple images and (2) fine-tuning with the corresponding task-specific datasets. Compared to conventional Fourier-domain filtering and directly trained diffractive networks, such a transfer learning process exhibited prominent advantages for denoising images degraded by severe noise (peak signal-to-noise ratio, PSNR, below 8 dB), preserving fine image features and improving the PSNR to above 18 dB. Importantly, the same pre-trained optical network could be consistently fine-tuned to process degraded images from highly diverse styles ranging from handwritten digits (MNIST) and chest X-rays (ChestMNIST) to CIFAR-10 images and human faces (CelebA). We further demonstrated the critical role of our optical denoisers in vision-based applications, including face detection, plate recognition, and localization of UAVs in noisy conditions.
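
To put the 8 dB and 18 dB figures in context, the following sketch computes PSNR from its standard definition, 10·log10(peak² / MSE). It assumes intensities normalized to a peak of 1.0 and uses a hypothetical four-pixel image; it says nothing about how the paper measured its values.

```python
import math

def psnr(reference, degraded, peak=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((r - d) ** 2 for r, d in zip(reference, degraded)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mse)

# Hypothetical image whose every pixel is off by 0.4 (MSE = 0.16).
clean = [0.0, 0.5, 1.0, 0.5]
noisy = [0.4, 0.1, 0.6, 0.9]
value = psnr(clean, noisy)  # just under 8 dB
```

On this scale, moving from 8 dB to 18 dB corresponds to a tenfold reduction in mean squared error.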

2605.07768 2026-05-11 eess.SY cs.LG cs.SY

Interactive Trajectory Planning with Learning-based Distributionally Robust Model Predictive Control and Markov Systems

Erik Börve, Nikolce Murgovski, Morteza Haghir Chehreghani, Leo Laine

Affiliations: Chalmers University of Technology; Volvo Group Trucks Technology

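AI summary: This paper studies interactive trajectory planning under uncertainty in the decisions of surrounding agents and proposes a DR-MPC framework combining PAC learning with distributionally robust optimization, which interpolates between robust MPC and stochastic MPC according to the number of samples.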

Abstract

We investigate interactive trajectory planning subject to uncertainty in the decisions of surrounding agents. To control the ego-agent, we aim to first learn the decision distribution and solve a Stochastic Model Predictive Control (SMPC) problem. To account for errors in the learned distribution, we show that it is possible to utilize Probably Approximately Correct (PAC) learning in combination with Distributionally Robust (DR) optimization to obtain a solution which accounts for the errors induced by the learning model. The results indicate that our PAC learning-based DR-MPC framework provides a method to interpolate between a robust MPC and an omnipotent SMPC, based on the available number of samples.
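
The interpolation between robust and stochastic behavior can be illustrated on a finite set of decisions. The sketch below is an assumption-laden toy, not the paper's formulation: it takes an L1 ambiguity ball around the empirical decision distribution, sizes it with the Weissman et al. concentration bound, and evaluates the worst-case expected cost inside the ball. The outcome probabilities and costs are hypothetical.

```python
import math

def l1_radius(n_samples, n_outcomes, delta):
    """L1 radius that contains the true distribution with probability
    at least 1 - delta (Weissman et al. bound; needs n_outcomes >= 2)."""
    return math.sqrt(2.0 / n_samples * math.log((2 ** n_outcomes - 2) / delta))

def worst_case_expected_cost(p_hat, costs, radius):
    """Maximize sum(p * cost) over distributions with ||p - p_hat||_1 <= radius."""
    p = list(p_hat)
    worst = max(range(len(costs)), key=lambda i: costs[i])
    budget = min(radius / 2.0, 1.0 - p[worst])
    p[worst] += budget  # shift mass onto the most costly outcome
    # Remove the same amount of mass, cheapest outcomes first.
    for i in sorted(range(len(costs)), key=lambda i: costs[i]):
        if i == worst:
            continue
        take = min(p[i], budget)
        p[i] -= take
        budget -= take
        if budget <= 0:
            break
    return sum(pi * ci for pi, ci in zip(p, costs))

# Hypothetical decisions of a surrounding vehicle and their costs to the ego-agent.
p_hat = [0.7, 0.2, 0.1]
costs = [1.0, 3.0, 10.0]
few_samples = worst_case_expected_cost(p_hat, costs, l1_radius(5, 3, 0.05))
many_samples = worst_case_expected_cost(p_hat, costs, l1_radius(50000, 3, 0.05))
```

With few samples the radius is large and the value approaches the maximum cost (the robust limit); with many samples it approaches the expected cost under the empirical distribution (the stochastic limit).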

2605.07758 2026-05-11 cs.FL cs.LG

SMT-Based Active Learning of Weighted Automata

Tiago Ferreira, Kevin Batz, Alexandra Silva

Affiliations: University College London; Cornell University

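AI summary: This paper proposes an SMT-based active learning algorithm for weighted automata that produces minimal automata; experiments show that it outperforms the baseline over both finite and infinite semirings and yields smaller automata.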

Comments Appearing in CAV 2026

Abstract

We present an SMT-based active learning algorithm for nondeterministic weighted automata (WFAs) as a practical and robust alternative to Hankel/L*-style methods. Our algorithm is parametric in a given semiring and, if it terminates, guaranteed to produce minimal WFAs. We prove partial correctness and provide a sufficient termination condition, which in particular implies termination for all finite semirings. Our extensive experimental evaluation shows that our algorithm is capable of learning numerous minimal WFAs over both finite and infinite semirings, vastly outperforms a naive baseline, and is competitive with a state-of-the-art algorithm while producing significantly smaller automata and requiring less interaction with the teacher.
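
For readers unfamiliar with the object being learned, the sketch below evaluates a weighted automaton that is parametric in a semiring: the weight of a word is the initial vector times one transition matrix per symbol times the final vector, using the semiring's own addition and multiplication. The `Semiring` class and both example automata are hypothetical illustrations; the learning algorithm itself is not sketched.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Semiring:
    zero: Any                         # additive identity
    plus: Callable[[Any, Any], Any]
    times: Callable[[Any, Any], Any]

def word_weight(sr, initial, transitions, final, word):
    """Compute initial * M[w1] * ... * M[wn] * final in the semiring sr."""
    state = list(initial)
    n = len(state)
    for symbol in word:
        matrix = transitions[symbol]
        nxt = [sr.zero] * n
        for i in range(n):
            for j in range(n):
                nxt[j] = sr.plus(nxt[j], sr.times(state[i], matrix[i][j]))
        state = nxt
    total = sr.zero
    for i in range(n):
        total = sr.plus(total, sr.times(state[i], final[i]))
    return total

# Infinite semiring (naturals with + and *): counts occurrences of 'a'.
naturals = Semiring(0, lambda x, y: x + y, lambda x, y: x * y)
count_a = word_weight(
    naturals, [1, 0],
    {"a": [[1, 1], [0, 1]], "b": [[1, 0], [0, 1]]},
    [0, 1], "aba",
)

# Finite semiring (Booleans with or/and): accepts words containing 'a'.
booleans = Semiring(False, lambda x, y: x or y, lambda x, y: x and y)
bool_transitions = {
    "a": [[True, True], [False, True]],
    "b": [[True, False], [False, True]],
}
has_a = word_weight(booleans, [True, False], bool_transitions, [False, True], "ba")
no_a = word_weight(booleans, [True, False], bool_transitions, [False, True], "bb")
```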

2605.07751 2026-05-11 cs.CY cs.AI

Vibe coding before the trend

Leon van Bokhorst, Koen Suilen

Affiliations: Fontys ICT, University of Applied Sciences

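AI summary: Drawing on vibe coding challenges run with four student cohorts in 2025, this paper examines patterns in how AI tools affect learning focus, skill shifts, and careers, and summarizes how the relationship between AI and learning is changing, along with lessons for educational practice.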

Comments 10 pages

Abstract

In early 2025, we ran a series of vibe coding challenges across four different student cohorts. The cohorts included 54 ICT students, 24 digital marketing students, and 7 journalism students at Fontys University of Applied Sciences (Netherlands), and 22 BA Communication students at North-West University (South Africa). From the student reflections, five major patterns emerged. Students reported that AI tools shifted their focus from syntax to higher-order thinking; they also described a skill shift from memorizing to evaluating; they viewed AI proficiency as career-essential; they framed their relationship with AI as partnership rather than replacement; and finally non-technical students showed the strongest appreciation for the accessibility these tools provide. This practitioner report documents what we observed during the classroom experiments, reflects on how the landscape has shifted in the year since, and shares practical lessons for educators considering similar experiments. We present the observations as what they are: patterns from practice, not proven conclusions, in the belief that sharing early-stage experiences contributes to the overall field of AI and education.

2605.07746 2026-05-11 stat.ML cs.LG q-bio.QM

Flow Matching for Count Data

Ganchao Wei, John Pearson

Affiliations: Department of Neurobiology; Department of Statistical Science; Duke University; Department of Electrical and Computer Engineering

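AI summary: This paper proposes count-FM, a framework that uses a continuous-time birth-death process to model and generate count data efficiently, improving sample quality and modeling efficiency.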

Abstract

High-dimensional count data arise in applications such as single-cell RNA sequencing and neural spike trains, where mapping between distributions across successive batches or time points forms a critical component of data analysis. The recent success of diffusion- and flow-based deep generative models for images, video, and text motivates extending these ideas to count-valued settings, but many existing methods either treat each count as a categorical state or transform counts into a continuous space, neither of which is natural or efficient when the count range is large. We propose count-FM, a flow-matching framework for count data based on a continuous-time birth-death process with local unit jumps. Count-FM learns marginal transitions efficiently in count space through simulation-free training of conditional transition rates, allowing transport between arbitrary count-distributed source and target populations. In simulation, count-FM achieves better sample quality than representative baselines while using substantially fewer parameters. We further apply count-FM to scRNA-seq and neural spike-train data for unconditional generation, transport, and conditional generation. Across these tasks, count-FM yields improved sample quality, greater modeling efficiency, and interpretable transport paths.
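
The building block named in the abstract, a continuous-time birth-death process with unit jumps, can be simulated exactly with the Gillespie algorithm when the rates depend only on the current count. The sketch below does that with hand-written rates; in count-FM the rates would be learned and time-dependent, so the rate functions and parameters here are purely hypothetical.

```python
import random

def simulate_birth_death(x0, birth_rate, death_rate, t_end, rng):
    """Gillespie simulation with +1/-1 jumps; rates depend on the count only."""
    t, x = 0.0, x0
    while True:
        b = birth_rate(x)
        d = death_rate(x) if x > 0 else 0.0  # counts cannot go below zero
        total = b + d
        if total <= 0.0:
            return x  # absorbing state: no further jumps
        t += rng.expovariate(total)  # waiting time until the next jump
        if t >= t_end:
            return x
        x += 1 if rng.random() < b / total else -1

# Hypothetical rates: constant births, deaths proportional to the count.
rng = random.Random(0)
samples = [
    simulate_birth_death(0, lambda x: 5.0, lambda x: 1.0 * x, 5.0, rng)
    for _ in range(1000)
]
mean_count = sum(samples) / len(samples)
```

With these rates the process relaxes toward a Poisson distribution with mean 5, so every sample is a non-negative integer without any rounding or categorical encoding.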

2605.07738 2026-05-11 physics.comp-ph cs.LG

Physics-Informed Reduced-Order Operator Learning for Hyperelasticity in Continuum Micromechanics

Hamidreza Eivazi, Henning Wessels

Affiliations: Division Data-Driven Modeling of Mechanical Systems, Institute of Applied Mechanics, Technische Universität Braunschweig, Pockelsstr. 3, 38106 Braunschweig, Germany

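AI summary: This paper proposes a physics-informed reduced-order operator learning method for hyperelasticity in continuum micromechanics; by combining the Equilibrium Neural Operator (EquiNO) with Q-DEIM, it substantially reduces computational cost and improves efficiency.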

Comments 22 pages, 12 figures

Abstract

Physics-informed operator learning is an attractive candidate for surrogate modeling of microstructures, especially in multiscale finite-element simulations. Its practical use, however, is often limited by the high cost of loss evaluation. We address this bottleneck by combining the Equilibrium Neural Operator (EquiNO) with the QR-based discrete empirical interpolation method (Q-DEIM). EquiNO learns only the modal coefficients of reduced displacement-fluctuation and first Piola-Kirchhoff stress representations built from periodic and divergence-free bases, thereby enforcing periodicity and mechanical equilibrium by construction. Q-DEIM then identifies a small set of spatial points through a column-pivoted QR factorization of the stress basis and restricts constitutive evaluations during training to these points alone. This makes full-batch second-order optimization practical for three-dimensional representative volume elements (RVEs). Homogenized first Piola-Kirchhoff stresses are recovered directly from the offline-averaged reduced stress modes, without the need to reconstruct the full stress field at inference time. We validate the framework on two three-dimensional finite-strain hyperelastic RVEs. Q-DEIM reduces the per-step training cost by roughly three orders of magnitude relative to full-field loss evaluation, while reduced homogenization achieves speed-up factors of order $10^3$ to $10^4$ over direct full-field computations. Despite relying on only a small number of offline snapshot loading paths for basis construction, the method accurately interpolates and extrapolates both microscopic stress fields and homogenized stresses, with prediction quality improving systematically as more snapshots are added.
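
The point-selection step can be sketched without any linear algebra library. Q-DEIM picks spatial points from the pivots of a column-pivoted QR factorization of the transposed basis; the version below obtains the same pivots with pivoted Gram-Schmidt in plain Python. It assumes a full-rank basis stored as `basis[point][mode]`, and the tiny example basis is hypothetical.

```python
def qdeim_points(basis):
    """basis[i][k] is the value of mode k at spatial point i.
    Returns one selected point index per mode (assumes full column rank)."""
    n_points, n_modes = len(basis), len(basis[0])
    # Column i of the transposed basis is row i of the basis.
    cols = [list(row) for row in basis]
    selected = []
    for _ in range(n_modes):
        norms = [sum(v * v for v in c) for c in cols]
        pivot = max(range(n_points), key=lambda i: norms[i])
        selected.append(pivot)
        q = [v / norms[pivot] ** 0.5 for v in cols[pivot]]
        # Remove the pivot direction from every column, including the pivot.
        for c in cols:
            proj = sum(a * b for a, b in zip(q, c))
            for k in range(n_modes):
                c[k] -= proj * q[k]
    return selected

# Hypothetical stress basis: 5 spatial points, 2 modes.
basis = [
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 2.0],
    [0.1, 0.1],
    [0.5, 0.5],
]
selected = qdeim_points(basis)
```

Only the selected points would then need constitutive evaluations during training, which is the source of the cost reduction the abstract reports. In practice a library routine for pivoted QR would replace this loop.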

2605.07723 2026-05-11 cs.DL cs.AI cs.CY physics.soc-ph

LLM hallucinations in the wild: Large-scale evidence from non-existent citations

Zhenyue Zhao, Yihe Wang, Toby Stuart, Mathijs De Vaan, Paul Ginsparg, Yian Yin

Affiliations: Department of Information Science, Cornell University; Department of Sociology, University of California Los Angeles; Department of Computer Science and Technology, Tsinghua University; Haas School of Business, University of California Berkeley

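AI summary: By verifying citation data, the study exposes the problem of LLM-generated fake citations: it estimates 146,932 hallucinated citations in 2025, concentrated in fields with rapid AI uptake and in papers whose linguistic features indicate AI-assisted writing, with consequences for the equity of scientific recognition.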

Abstract

Large language models (LLMs) are known to generate plausible but false information across a wide range of contexts, yet the real-world magnitude and consequences of this hallucination problem remain poorly understood. Here we leverage a uniquely verifiable object - scientific citations - to audit 111 million references across 2.5 million papers in arXiv, bioRxiv, SSRN, and PubMed Central. We find a sharp rise in non-existent references following widespread LLM adoption, with a conservative estimate of 146,932 hallucinated citations in 2025 alone. These errors are diffusely embedded across many papers but especially pronounced in fields with rapid AI uptake, in manuscripts with linguistic signatures of AI-assisted writing, and among small and early-career author teams. At the same time, hallucinated references disproportionately assign credit to already prominent and male scholars, suggesting that LLM-generated errors may reinforce existing inequities in scientific recognition. Preprint moderation and journal publication processes capture only a fraction of these errors, suggesting that the spread of hallucinated content has outpaced existing safeguards. Together, these findings demonstrate that LLM hallucinations are infiltrating knowledge production at scale, threatening both the reliability and equity of future scientific discovery as human and AI systems draw on the existing literature.

2605.07705 2026-05-11 cs.LO cs.AI

Cross-Attention and Encoder-Decoder Transformers: A Logical Characterization

Veeti Ahvonen, Damian Heiman, Antti Kuusisto, Miguel Moreno, Matias Selin

Affiliations: Mathematics Research Centre, Tampere University

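AI summary: This paper gives a logical characterization of encoder-decoder transformers via a new temporal logic that extends propositional logic with a counting global modality and a past modality, and also discusses transformers in the autoregressive setting.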

Abstract

We give a novel logical characterization of encoder-decoder transformers, the foundational architecture for LLMs that also sees use in various settings that benefit from cross-attention. We study such transformers over text in the practical setting of floating-point numbers and soft-attention, characterizing them with a new temporal logic. This logic extends propositional logic with a counting global modality over the encoder input and a past modality over the decoder input. We also give an additional characterization of such transformers via a type of distributed automata, and show that our results are not limited to the specific choices in the architecture and can account for changes in, e.g., masking. Finally, we discuss encoder-decoder transformers in the autoregressive setting.

2605.07694 2026-05-11 eess.AS cs.AI cs.SD eess.SP

Dependence on Early and Late Reverberation of Single-Channel Speaker Distance Estimation

Michael Neri, Archontis Politis, Tuomas Virtanen

Affiliations: Faculty of Information Technology and Communication Sciences

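AI summary: The study examines how early and late reverberation affect single-channel speaker distance estimation, finding that early reflections are the key factor and that time calibration substantially improves accuracy.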

Comments Submitted to IWAENC 2026

Abstract

Single-channel speaker distance estimation has recently achieved centimeter-level accuracy in simulated environments, yet it remains unclear which components of the room impulse response (RIR) the model exploits and how performance depends on the recording conditions. In this work, we decompose simulated RIRs into four variants (full, direct-only, no-late, and no-early) using the mixing time estimated from the echo density function as the boundary between early reflections and late reverberation. We define four calibration scenarios, from fully calibrated (synchronised capture, known source level) to fully uncalibrated (arbitrary onset, unknown level), and evaluate all combinations on a matched dataset. Results show that without time calibration, mean absolute error (MAE) increases to $1.29$ m and the model extracts reverberation-based cues, with early reflections emerging as the most informative component. Further analysis against DRR, $C_{50}$, and $T_{60}$ confirms that estimation accuracy improves with stronger early energy and degrades in highly reverberant environments. When time calibration is available, the model achieves a MAE of $0.14$ m by extracting the propagation delay alone, regardless of the RIR content.
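
Two of the quantities in this abstract have simple closed forms. The sketch below converts a propagation delay to a distance and computes the clarity index C50, the ratio in dB of the energy in the first 50 ms after the direct sound to the energy arriving later. It assumes a speed of sound of 343 m/s, takes the largest-magnitude sample as the direct path, and uses a hypothetical three-tap impulse response.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def distance_from_delay(delay_s):
    """Source distance implied by a known propagation delay."""
    return SPEED_OF_SOUND * delay_s

def clarity_c50(rir, sample_rate):
    """C50 in dB: early (first 50 ms after direct sound) vs. late energy."""
    onset = max(range(len(rir)), key=lambda i: abs(rir[i]))  # direct-path peak
    split = onset + int(round(0.050 * sample_rate))
    early = sum(x * x for x in rir[onset:split])
    late = sum(x * x for x in rir[split:])
    return 10.0 * math.log10(early / late)  # requires non-zero late energy

# Hypothetical RIR sampled at 1 kHz: direct sound, one early and one late tap.
sample_rate = 1000
rir = [0.0] * 200
rir[10] = 1.0    # direct sound, 10 ms after emission
rir[30] = 0.5    # early reflection
rir[100] = 0.25  # late reverberation
c50 = clarity_c50(rir, sample_rate)
distance = distance_from_delay(10 / sample_rate)
```

The delay route needs a synchronized capture, which is why it is only available in the time-calibrated scenarios; energy-ratio cues such as C50 remain available without that calibration.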

2605.07677 2026-05-11 cs.IR cs.AI cs.CL

TRACE: Tourism Recommendation with Accountable Citation Evidence

Zixu Zhao, Sijin Wang, Yu Hou, Yuanyuan Xu, Yufan Sheng, Xike Xie, Wenjie Zhang, Won-Yong Shin, Xin Cao

Affiliations: UNSW Sydney; University of Adelaide; Yonsei University; USTC

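AI summary: TRACE combines multi-turn dialogue with review citations and explicit rejection turns to address trust, verifiability, and adaptivity in tourism recommendation; it identifies a three-competency gap and validates its evaluation method.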

Abstract

Tourism is a high-stakes setting for conversational recommender systems (CRS): a plausible-sounding suggestion can waste real money and trip time once a traveler acts on it. Existing CRS benchmarks primarily evaluate systems with a single Recall@k score over entity mentions, and tourism-specific resources add spatial or knowledge-graph context, yet none of them couple multi-turn recommendation with verbatim review-span evidence and rejection recovery. This leaves an evaluation gap for tourism recommendation that is simultaneously trustworthy, verifiable, and adaptive: recommend the right point of interest (POI) for multi-aspect preferences (such as cuisine, price, atmosphere, walking distance), justify each suggestion with verifiable evidence from prior visitors so the traveler can act without trial and error, and recover when the first recommendation is rejected mid-dialogue. We introduce TRACE, where each item is a multi-turn tourism recommendation dialogue with review-span citations and explicit rejection turns: 10,000 dialogues over 2,400 Yelp POIs and 34,208 reviews across eight U.S. cities, paired with 14 retrieval, planning, and LLM baselines, along with 25 metrics organized under Accuracy, Grounding, and Recovery. Across these baselines, TRACE reveals the Three-Competency Gap: LLM Zero-Shot leads in closed-set Recall@1 and rejection recovery but cites less densely than retrievers; non-LLM retrievers achieve surface-verbatim grounding but with low accuracy; Multi-Review Synthesis fails at recovery. The Grounding Score agrees with human citation precision (Spearman rho=+0.80, p<10^-20), and paired t-tests reproduce the per-baseline ranking (p<0.01 on the dominant contrasts). TRACE reframes accountable tourism recommendation as a joint target (right POI, verifiable evidence, adaptive repair) rather than a single-axis leaderboard.
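
Recall@k, the single-number metric the abstract argues is insufficient on its own, is straightforward to state in code. This sketch assumes one target POI per dialogue and a ranked list of recommendations per dialogue; the POI identifiers are hypothetical.

```python
def recall_at_k(ranked_lists, targets, k):
    """Fraction of dialogues whose target appears in the top-k recommendations."""
    hits = sum(
        1 for ranked, target in zip(ranked_lists, targets) if target in ranked[:k]
    )
    return hits / len(targets)

# Hypothetical rankings for three dialogues.
ranked_lists = [
    ["poi_a", "poi_x"],
    ["poi_y", "poi_b"],
    ["poi_z", "poi_w"],
]
targets = ["poi_a", "poi_b", "poi_c"]
r_at_1 = recall_at_k(ranked_lists, targets, 1)
r_at_2 = recall_at_k(ranked_lists, targets, 2)
```

The metric says nothing about whether a recommendation is backed by evidence or repaired after rejection, which is the gap the Grounding and Recovery axes are meant to cover.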

2605.07674 2026-05-11 cs.GT cs.CR cs.LG

Differentially Private Auditing Under Strategic Response

Florian A. D. Burnat

Affiliations: University of Bath

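AI summary: This paper studies the design of differentially private audits when the developer responds strategically to the auditor. It characterizes the optimal allocation as a four-factor balance of welfare weight, audit miss-probability, detectability elasticity, and mitigation-cost curvature, reformulates the bilevel problem via the developer's KKT system, and proposes the SPAD algorithm.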

Abstract

Regulatory audits of AI systems increasingly rely on differential privacy (DP) to protect training data and model internals. We study audit design when the audited developer can strategically respond to the privacy-constrained audit interface. We formalize privacy-constrained auditing as a bilevel Stackelberg game, in which an auditor commits to a query policy and DP budget allocation across harm dimensions, and a strategic developer reallocates mitigation efforts in response. We introduce the welfare-weighted under-detection gap $B_w$, the welfare-weighted true residual harm the audit fails to detect at the developer's strategic best response, and prove that naive DP auditing (uniform or harm-proportional allocation) induces a strictly larger $B_w$ than any non-strategic mitigation baseline whenever effective detectability is heterogeneous, the welfare weights are not comonotone with detectability, and the developer's optimum is interior. We characterize the optimal auditor allocation as a four-factor balance of welfare weight, audit miss-probability, detectability elasticity, and mitigation-cost curvature, and provide a single-level reformulation of the bilevel problem via the developer's KKT system. We propose Strategic Private Audit Design (SPAD), a projected-gradient algorithm with hypergradients computed through the developer's best response.
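
A projected-gradient method over a budget allocation needs a projection back onto the feasible set after each step. The sketch below assumes that set is a scaled simplex (non-negative per-dimension budgets that sum to the total DP budget) and uses the standard sort-based Euclidean projection. The hypergradient is a placeholder argument; the function names and numbers are hypothetical and do not reproduce SPAD.

```python
def project_to_budget_simplex(v, total):
    """Euclidean projection of v onto {w : w_i >= 0, sum(w) = total}."""
    u = sorted(v, reverse=True)
    cumulative = 0.0
    theta = 0.0
    for j, uj in enumerate(u, start=1):
        cumulative += uj
        candidate = (cumulative - total) / j
        if uj - candidate > 0:
            theta = candidate  # keep the threshold of the last valid j
    return [max(x - theta, 0.0) for x in v]

def projected_gradient_step(allocation, hypergradient, step_size, total):
    """One descent step on the auditor's objective, then re-projection."""
    moved = [a - step_size * g for a, g in zip(allocation, hypergradient)]
    return project_to_budget_simplex(moved, total)

# Hypothetical: three harm dimensions sharing a total privacy budget of 1.0.
projected = project_to_budget_simplex([0.8, 0.6, -0.1], 1.0)
updated = projected_gradient_step([0.4, 0.3, 0.3], [0.5, -0.5, 0.0], 0.2, 1.0)
```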

2605.07671 2026-05-11 cs.GT cs.AI cs.MA econ.TH math.OC

The Endogeneity of Miscalibration: Impossibility and Escape in Scored Reporting

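Lauri Lovén, Sasu Tarkoma

Affiliations Future Computing Group, University of Oulu; University of Oulu; University of Helsinki

AI summary This paper studies the endogeneity of miscalibration in scored reporting. The principal's optimal oversight requires a non-affine approval function, yet any such function makes truthful reporting suboptimal. A step-function approval threshold achieves first-best screening, and the welfare equivalence between second-best and first-best holds only under the Brier score.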

Comments 38 pages, no figures. Targeting ACM Transactions on Economics and Computation (TEAC); preprint

Abstract

Eliciting truthful reports from autonomous agents is a core problem in scalable AI oversight: a principal scores the agent's report using a strictly proper scoring rule, but the agent also benefits from the report through a non-accuracy channel (approval for autonomous action, allocation share, downstream control). The same structure appears in classical mechanism-design settings such as marketplace operation. Our main result is an endogeneity: the principal's optimal oversight necessarily uses a non-affine approval function to screen types, yet any non-affine approval makes truthful reporting suboptimal under the combined objective whenever deviation is undetectable. The principal cannot avoid the perturbation that undermines calibration. This impossibility holds for all strictly proper scoring rules, with a closed-form perturbation formula. A constructive escape exists: a step-function approval threshold achieves first-best screening for every strictly proper scoring rule, because the agent's binary inflate-or-not choice creates a type-space threshold regardless of the generator's curvature. Under the Brier score specifically, the type-independent inflation cost yields a welfare equivalence between second-best and first-best; we prove this equivalence is unique to Brier (the welfare gap under smooth $C^1$ oversight is bounded below by $Ω(\text{Var}(1/G'') (γ/β)^2)$ for every non-Brier rule). Two instances develop the framework: AI agent oversight (the lead motivating setting) and marketplace operation (a parallel mechanism-design domain). The message for AI alignment is direct: smooth scoring-based oversight cannot elicit truthful reports from a strategic agent; sharp thresholds are the calibration-preserving design.

2605.07665 2026-05-11 stat.ML cs.LG

Debiased Counterfactual Generation via Flow Matching from Observations

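Hugh Dance, Johnny Xi, Peter Orbanz, Benjamin Bloem-Reddy

Affiliations Gatsby Computational Neuroscience Unit, University College London; Department of Statistics, University of British Columbia

AI summary This paper learns counterfactual distributions through a deconfounding flow from the observational distribution, formulated via flow matching. The resulting semiparametrically efficient estimator outperforms existing debiased counterfactual distribution estimators and mitigates known failure modes of flow-based methods.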

Abstract

Estimating counterfactual distributions under interventions is central to treatment risk assessment and counterfactual generation tasks. Existing approaches model the counterfactual distribution as a standalone generative target, without exploiting its relationship to the observational data. In this work, we show that under standard assumptions, observational and counterfactual outcome distributions are tightly linked: they have identical support and tail behavior, remain statistically close under weak confounding, and share any features of high-dimensional outcomes which are invariant to confounders. These properties motivate learning counterfactual distributions not from scratch, but via a deconfounding flow from the observational distribution. We formulate this problem via flow-matching and derive a semiparametrically efficient estimator based on a novel efficient influence function correction. We subsequently extend our estimator to target minimal-energy flows in high-dimensions, which we show can be especially simple targets between observational and counterfactual distributions. In experiments, deconfounding flows outperform existing debiased counterfactual distribution estimators, while also mitigating known failure modes of flow-based methods.

2605.07663 2026-05-11 cs.GT cs.CR cs.LG

Quotient Semivalues for False-Name-Resistant Data Attribution

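Florian A. D. Burnat, Brittany I. Davidson

Affiliations School of Management, University of Bath, UK

AI summary This paper proposes the quotient semivalue mechanism, which computes Shapley-, Banzhaf-, or Beta-style values over evidence-backed attribution clusters to resist false-name manipulation. It proves that, on a fixed monotone data-value game, exact Shapley-fair attribution is incompatible with unrestricted false-name-proofness, and evaluates the mechanism in DataMarket-Gym.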

Abstract

Data valuation methods allocate payments and audit training data's contribution to machine-learning pipelines; however, they often assume passive contributors. In reality, contributors can split datasets across pseudonymous identities, duplicate high-value examples, create near-duplicates, or launder synthetic variants to inflate their share. We formalize this as false-name manipulation in ML data attribution. Our main construction is the quotient semivalue mechanism: compute Shapley-, Banzhaf-, or Beta-style values over evidence-backed attribution clusters instead of raw identities, using a canonical-representative operator to absorb within-cluster duplication. We prove an impossibility: on a fixed monotone data-value game, exact Shapley-fair attribution over reported identities is incompatible with unrestricted false-name-proofness, even on binary-valued instances, and characterize the split-gain of a general semivalue on a unanimity counter-example. The mechanism is exactly false-name-proof under two structural conditions: false-name-neutral within-cluster allocation and quotient-stable manipulations. Under imperfect provenance, when these conditions hold approximately, manipulation gain and fairness loss are bounded by three measurable quantities: escaped-cluster mass, value-estimation error, and clustering distance. We instantiate the mechanisms in DataMarket-Gym, a benchmark for attribution under strategic provider attacks. On synthetic classification tasks, quotient semivalues with example-level evidence reduce manipulation gain on duplicate and near-duplicate Sybil attacks from $1.74$ under baseline Shapley to $0.96$, near the honest level. The cosine-threshold and (false-merge, false-split) rate sweeps trace the corresponding fairness--Sybil frontier.
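
The split-gain on a unanimity game, and its removal once values are computed over clusters, can be illustrated with a small exact-Shapley computation. This is a toy sketch under stated assumptions, not the paper's mechanism: the identities `A1`, `A2`, `B`, the data items, and the rule that a cluster is worth the union of its members' data are hypothetical choices made for illustration.

```python
from itertools import permutations

def shapley(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    players = list(players)
    orders = list(permutations(players))
    phi = {p: 0.0 for p in players}
    for order in orders:
        coalition = []
        for p in order:
            before = value(coalition)
            coalition.append(p)
            phi[p] += value(coalition) - before
    return {p: total / len(orders) for p, total in phi.items()}

# Hypothetical holdings: provider A has split its data across two pseudonyms.
HOLDINGS = {"A1": {"x"}, "A2": {"y"}, "B": {"z"}}
REQUIRED = {"x", "y", "z"}

def identity_value(coalition):
    """Unanimity-style game: worth 1 only when every required item is covered."""
    covered = set()
    for name in coalition:
        covered |= HOLDINGS[name]
    return 1.0 if REQUIRED <= covered else 0.0

# Hypothetical evidence-backed clusters: cluster name -> reported identities.
CLUSTERS = {"A": ["A1", "A2"], "B": ["B"]}

def cluster_value(coalition):
    """Quotient game (assumed form): a cluster brings all of its identities."""
    members = [name for c in coalition for name in CLUSTERS[c]]
    return identity_value(members)

raw = shapley(HOLDINGS, identity_value)      # values over reported identities
quotient = shapley(CLUSTERS, cluster_value)  # values over attribution clusters
```

In this toy game an unsplit provider A would receive 1/2. Splitting into two identities raises A's combined identity-level share to 2/3, while the cluster-level computation returns 1/2 regardless of the split.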

2605.07654 2026-05-11 stat.ML cs.CL cs.LG

Reliable Chain-of-Thought via Prefix Consistency

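Naoto Iwase, Yuki Ichihara, Mohammad Atif Quamar, Junpei Komiyama

Affiliations Nagoya University; Nara Institute of Science and Technology; Mohamed bin Zayed University of Artificial Intelligence; RIKEN AIP

AI summary This paper improves the reliability of chain-of-thought reasoning through prefix consistency, which weights each candidate answer by how often it reappears under regeneration. The method needs no access to token log-probabilities or self-rating prompts, performs well across multiple models and benchmarks, and uses up to 21x fewer tokens.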

Comments See our project page at https://naoto-iwase.github.io/prefix-consistency-page

Abstract

Large Language Models often improve accuracy on reasoning tasks by sampling multiple Chain-of-Thought (CoT) traces and aggregating them with majority voting (MV), a test-time technique called self-consistency. When we truncate a CoT partway through and regenerate the remainder, we observe that traces with correct answers reproduce their original answer more often than traces with wrong answers. We use this difference as a reliability signal, prefix consistency, that weights each candidate answer by how often it reappears under regeneration. It requires no access to token log-probabilities or self-rating prompts. Across five reasoning models and four math and science benchmarks, prefix consistency is the best correctness predictor in most settings, and reweighting votes by it reaches Standard MV plateau accuracy at up to 21x fewer tokens (median 4.6x). Our code is available at https://github.com/naoto-iwase/prefix-consistency.
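
The reweighted vote described above can be sketched in a few lines. The sketch assumes each trace has already been truncated and regenerated several times. The function names and the exact weight used here (the fraction of regenerations that reproduce the trace's original answer) are illustrative assumptions, and the paper's estimator may differ.

```python
from collections import Counter, defaultdict

def majority_vote(traces):
    """Standard self-consistency: pick the most frequent original answer."""
    counts = Counter(answer for answer, _ in traces)
    return counts.most_common(1)[0][0]

def prefix_consistency_vote(traces):
    """Weight each trace by how often regeneration reproduces its answer.

    traces: list of (original_answer, regenerated_answers) pairs.
    """
    scores = defaultdict(float)
    for answer, regenerated in traces:
        if not regenerated:
            continue  # no regeneration evidence for this trace
        matches = sum(1 for r in regenerated if r == answer)
        scores[answer] += matches / len(regenerated)
    if not scores:
        return None
    return max(scores, key=scores.get)

# Hypothetical example: "17" wins the plain vote, but its traces are unstable.
example = [
    ("42", ["42", "42", "42"]),
    ("17", ["17", "3", "5"]),
    ("17", ["9", "17", "2"]),
]
```

On this hypothetical input, `majority_vote(example)` selects `"17"`, whereas `prefix_consistency_vote(example)` selects `"42"` because its single trace is reproduced under every regeneration.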

2605.07634 2026-05-11 math.OC cs.LG math.ST stat.TH

Robust stochastic first order methods in heavy-tailed noise via medoid mini-batch gradient sampling

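Manojlo Vukovic, Dusan Jakovetic

Affiliations Faculty of Technical Sciences, University of Novi Sad; Faculty of Sciences, University of Novi Sad

AI summary This paper proposes R-SGD-Mini, which handles heavy-tailed gradient noise through medoid mini-batch gradient sampling. In the non-convex setting it bounds the expected time-averaged squared gradient norm, showing convergence at rate O(T^{-1}) to a neighborhood of zero, and it attains an O(T^{-1/2}) rate when the time horizon is known in advance.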

Abstract

We consider a first-order stochastic optimization framework where, at each iteration, $K$ independent and identically distributed (i.i.d.) data point samples are drawn, based on which stochastic gradients can be queried. We allow gradient noise to be heavy-tailed, with possibly infinite variance. For the considered heavy-tailed setting, many algorithmic variants have recently been proposed based on gradient clipping or other nonlinear operators (e.g., normalization) applied over noisy gradients. In this paper, we take an alternative approach and propose a novel stochastic first-order method dubbed Robust Stochastic Gradient Descent with medoid mini-batch gradient sampling, R-SGD-Mini for short. The core idea of R-SGD-Mini is to split the $K$-sized data batch into $M$ distinct data chunks, form the stochastic gradient for each chunk, and update the solution estimate along the stochastic gradient of the chunk that is the medoid of the gradients of all data chunks. Under a general class of symmetric heavy-tailed gradient noises and a standard non-convex setting, we establish explicit bounds on the expected time-averaged squared gradient norm. More precisely, we show that the latter quantity converges at rate $\mathcal{O}(T^{-1})$ to a small neighborhood of zero; we explicitly characterize this neighborhood in terms of noise and algorithm parameters. Moreover, if the time horizon is known in advance, we establish the rate of $\mathcal{O}(T^{-\frac{1}{2}})$. Furthermore, when clipping is incorporated, we obtain convergence guarantees in the high-probability sense and recover the same rate. Experimental results indicate that R-SGD-Mini and its clipped variant consistently perform favorably compared to SGD, clipped SGD, and Median-of-Means-based methods.
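
The chunk-and-medoid update can be sketched in plain Python. This is a minimal illustration under assumptions that the abstract does not fix: each chunk gradient is taken to be the mean of its per-sample gradients, the medoid is the chunk gradient with the smallest total Euclidean distance to the others, and the function names are hypothetical.

```python
import math

def chunk_means(grads, m):
    """Split per-sample gradients into m equal contiguous chunks and average each.

    Samples left over when len(grads) is not divisible by m are dropped.
    """
    size = len(grads) // m
    if size == 0:
        raise ValueError("need at least one sample per chunk")
    chunks = [grads[i * size:(i + 1) * size] for i in range(m)]
    return [[sum(col) / len(chunk) for col in zip(*chunk)] for chunk in chunks]

def medoid(vectors):
    """Return the vector minimizing the summed Euclidean distance to all others."""
    return min(vectors, key=lambda v: sum(math.dist(v, u) for u in vectors))

def medoid_step(x, grads, m, lr):
    """One descent step along the medoid chunk gradient."""
    g = medoid(chunk_means(grads, m))
    return [xi - lr * gi for xi, gi in zip(x, g)]

# Hypothetical batch of K = 6 gradients with one heavy-tailed outlier.
batch = [
    [1.0, 1.0], [1.2, 0.8],
    [0.9, 1.1], [1.1, 0.9],
    [1000.0, -1000.0], [1.0, 1.0],
]
```

With `m = 3`, the chunk containing the outlier averages to roughly `[500.5, -499.5]`, but the medoid is one of the two unaffected chunk gradients, so the step ignores the outlier instead of averaging it in.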

2605.07536 2026-05-11 cs.CR cs.LG

GESR: Graph-Based Edge Semantic Reconstruction for Stealthy Communication Detection with Benign-Only Training

Henghui Xu, Yuchen Zhang, Xiaobo Ma

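Affiliations MOE Key Laboratory for Intelligent Networks and Network Security; Faculty of Electronic and Information Engineering; Shaanxi Province Key Laboratory of Computer Network

AI summary Detecting stealthy malicious communications when only benign traffic is available for training is a major challenge in network security. This paper proposes GESR, a graph-based framework that reconstructs the semantics of communication edges from local structural context in order to identify anomalous communications and hosts. The method needs no labeled attack samples, relies on structural consistency in the graph for anomaly detection, and achieves strong detection performance on multiple datasets.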

Abstract

Detecting stealthy malicious communications from flow logs under benign-only training remains a critical challenge in network security. Malicious communications often camouflage as normal traffic like standard HTTPS flows. Conventional intrusion detectors rely strictly on known labeled attacks. Alternatively, they score flows completely independently. These approaches fail against sparse and context-dependent suspicious activity. To capture this essential context, graph anomaly detectors have been introduced to add valuable relational information to the analysis. However, existing methods fail to test the structural consistency of specific communication edges. To overcome these fundamental limitations, we present GESR, a novel graph-based framework for detecting suspicious communications and anomalous hosts under a benign-only training setting. GESR models complex network activity as attributed communication graphs. It cleverly reconstructs edge semantics entirely from local structural context rather than isolated features. This non-intuitive design forces the framework to predict expected communication patterns from neighborhood topologies. Attackers cannot easily manipulate this deep structural dependency. The model then converts the resulting structural inconsistencies into host-level anomaly scores. It utilizes robust Median Absolute Deviation (MAD) calibration for this final step. We evaluate GESR extensively on CTU-13 and CICIDS2017 datasets. These evaluations strictly impose tight false-positive operating constraints. On CICIDS2017, GESR achieves an outstanding ROC-AUC of 0.9753. It also yields a high TPR of 0.8569 at a strict 5% FPR threshold. GESR consistently outperforms existing methods across both evaluated benchmarks. The results prove that structure-conditioned edge reconstruction is a credible direction for practical intrusion detection.
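
Median Absolute Deviation calibration is a standard robust-statistics step, and a generic version is sketched below. How GESR applies it to host-level scores is not specified in the abstract, so the function names, the benign-only fitting, and the 1.4826 consistency factor (the usual choice for approximately normal data) are assumptions made for illustration.

```python
from statistics import median

def fit_mad(benign_scores):
    """Estimate the center and spread of benign scores with median and MAD."""
    med = median(benign_scores)
    mad = median(abs(s - med) for s in benign_scores)
    return med, mad

def robust_z(score, med, mad, eps=1e-12):
    """Robust z-score: distance from the benign median in MAD-based units."""
    return (score - med) / (1.4826 * mad + eps)

# Hypothetical raw inconsistency scores observed on benign hosts.
benign = [1.0, 1.2, 0.9, 1.1, 1.0]
center, spread = fit_mad(benign)
```

A host whose calibrated score exceeds a chosen threshold would be flagged. Because the median and MAD are insensitive to a few extreme values, the calibration stays stable even if the benign set contains some contamination.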

2605.01041 2026-05-11 cs.MA cs.AI cs.GT cs.LG cs.RO

Separation Assurance between Heterogeneous Fleets of Small Unmanned Aerial Systems via Multi-Agent Reinforcement Learning

Iman Sharifi, Hyeong Tae Kim, Maheed Hatem Ahmed, Mahsa Ghasemi, Peng Wei

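Affiliations Department of Mechanical and Aerospace Engineering, George Washington University; Department of Electrical and Computer Engineering, Purdue University

AI summary This paper studies how multi-agent reinforcement learning can maintain safe separation when different companies operate heterogeneous fleets of small unmanned aircraft in future high-density urban airspace. It uses an attention-enhanced Proximal Policy Optimization-based Advantage Actor-Critic (PPOA2C) framework to resolve intra- and inter-fleet conflicts, with each fleet training its own policy independently to preserve privacy. Experiments show that two fleets with shared PPOA2C policies can reach an equilibrium that maintains safe separation, and that these policies adapt well in conflict resolution and when interacting with rule-based policies. The results underscore the need for fairness-aware conflict management in heterogeneous unmanned aircraft systems.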

Comments 8 pages, 3 figures, 1 table

Abstract

In the envisioned future dense urban airspace, multiple companies will operate heterogeneous fleets of small unmanned aerial systems (sUASs), where each fleet includes several homogeneous aircraft with identical policies and configurations, e.g., equipage, sensing, and communication ranges, making tactical deconfliction highly complex for the aircraft. This paper aims to address two core questions: (1) Can tactical deconfliction policies converge or reach an equilibrium to ensure a conflict-free airspace when companies operate heterogeneous fleets of homogeneous aircraft? (2) If so, will the converged policies discriminate against companies operating sUASs with weaker configurations? We investigate a multi-agent reinforcement learning paradigm in which homogeneous aircraft within heterogeneous fleets operate concurrently to perform package delivery missions over Dallas, Texas, USA. An attention-enhanced Proximal Policy Optimization-based Advantage Actor-Critic (PPOA2C) framework is employed to resolve intra- and inter-fleet conflicts, with each fleet independently training its own policy while preserving privacy. Experimental results show that two fleets with distinct, shared PPOA2C policies can reach an equilibrium to maintain safe separation. While two PPOA2C policies outperform two strong rule-based baselines in terms of conflict resolution, a PPOA2C policy exhibits safer interaction with a rule-based policy, indicating adaptive capabilities of PPOA2C policies. Furthermore, we conducted extensive policy-configuration evaluations, which reveal that equilibria between similar policy types tend to favor fleets with stronger configurations. Even under similar configurations but different policy types, the equilibrium favors one of the heterogeneous policies, underscoring the need for fairness-aware conflict management in heterogeneous sUAS operations.

2605.00932 2026-05-11 cs.SE cs.AI

Code World Model Preparedness Report

Daniel Song, Peter Ney, Cristina Menghini, Faizan Ahmad, Aidan Boyd, Nathaniel Li, Ziwen Han, Jean-Christophe Testud, Saisuke Okabayashi, Maeve Ryan, Jinpeng Miao, Hamza Kwisaba, Felix Binder, Spencer Whitman, Jim Gust, Esteban Arcaute, Dhaval Kapil, Jacob Kahn, Ayaz Minhas, Tristan Goodman, Lauren Deason, Alexander Vaughan, Shengjia Zhao, Summer Yue

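Affiliations MSL Preparedness Team; AI Security Team

AI summary This report assesses the preparedness of Code World Model (CWM), a model from Meta for code generation and reasoning about code. Pre-release testing in domains that could present catastrophic risks, together with an evaluation of the model's misaligned propensities, found that CWM introduces no additional risks beyond those already present in the current AI ecosystem, so it is released as an open-weight model.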

Comments 25 pages, 3 figures

Abstract

This report documents the preparedness assessment of Code World Model (CWM), a model for code generation and reasoning about code from Meta. We conducted pre-release testing across domains identified in our Frontier AI Framework as potentially presenting catastrophic risks, and also evaluated the model's misaligned propensities. Our assessment found that CWM does not pose additional frontier risks beyond those present in the current AI ecosystem. We therefore release it as an open-weight model.

2605.00754 2026-05-11 cs.SE cs.LG

Themis: Training Robust Multilingual Code Reward Models for Flexible Multi-Criteria Scoring

Indraneil Paul, Goran Glavaš, Iryna Gurevych

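Affiliations UKP Lab, TU Darmstadt and National Research Center for Applied Cybersecurity ATHENE

AI summary This work presents Themis-RM, a suite of robust reward models for multilingual code generation that supports flexible multi-criteria scoring. To address the reliance of existing code reward models on execution feedback and single-dimension scoring, the authors build the Themis-CodeRewardBench benchmark and collect more than 350k code preference pairs to train multilingual, multi-criteria code reward models. Experiments show strong cross-lingual transfer and multi-criteria scoring performance, improving the flexibility and reliability of code reward models.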

Abstract

Reward models (RMs) have become an indispensable fixture of the language model (LM) post-training playbook, enabling policy alignment and test-time scaling. Research on the application of RMs in code generation, however, has been comparatively sparse, with existing work largely focusing on execution feedback. This choice constrains post-training to optimizing functional correctness over self-contained executable code. In this work, we examine the training and evaluation of multilingual, multi-criteria code RMs. To this end, we first compile Themis-CodeRewardBench, a benchmark to evaluate code RMs across five preference dimensions (i.e., criteria) and eight programming languages, on which we profile 50+ code, math, and general-purpose RMs. Observing the limited proficiency of current RMs beyond scoring for functional correctness, we develop Themis-CodePreference, the largest open-source collection of code preferences to date (more than 350k preference pairs), and use it to train Themis-RM, a suite of multilingual code reward models for flexible multi-criteria scoring, ranging in size from 600M to 32B parameters. Our experiments and ablations demonstrate positive scaling trends, strong cross-lingual transfer when training on diverse preferences, and the importance of multi-criteria training for reliable code reward modeling.

2604.06276 2026-05-11 eess.IV cs.CV

Structural Regularities of Cinema SDR-to-HDR Mapping in a Controlled Mastering Workflow: A Pixel-wise Case Study on ASC StEM2

Xin Zhang, Xiaoyi Chen

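Affiliations China Research Institute of Film Science & Technology (Test Institute of Film Technical Quality)

AI summary Using the ASC StEM2 dataset, this paper presents a pixel-wise empirical study of cinema mapping from standard dynamic range (SDR) to high dynamic range (HDR), analyzing the regular differences in luminance and color structure between SDR and HDR versions in a controlled mastering workflow. It finds a stable global monotonic correspondence in luminance, while in color the hue stays consistent and the saturation distribution is adjusted. With the EXR source data as a reference, it further builds a pixel-level decision map that separates regions needing recovery of original scene information from regions needing content-adaptive adjustment, providing an interpretable quantitative baseline for structure-aware SDR-to-HDR mapping analysis.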

Comments 15 pages, 6 figures. Empirical case study on cinema SDR-to-HDR mapping using ASC StEM2

Journal ref Advanced Motion Picture Technology, 2026, no. 3, pp. 14-22

Abstract

We present an empirical case study of cinema SDR-to-HDR mapping using ASC StEM2, a rare common-source dataset containing EXR scene-referred images and matched SDR/HDR cinema release masters from the same ACES-based mastering workflow. Based on pixel-wise statistics over all 18,580 frames of the test film, we construct a three-domain comparison involving EXR source data, SDR release masters, and HDR release masters to characterize their luminance and color structural relationships within this controlled workflow. In the luminance dimension, SDR and HDR masters exhibit a highly stable global monotonic correspondence, with geometric structure remaining largely consistent overall; sparse and structured deviations appear in self-luminous highlights and specific material regions. In the color dimension, the two masters remain largely consistent in hue, with saturation exhibiting a redistribution pattern of shadow suppression, midtone expansion, and highlight convergence. Using EXR as a scene-referred anchor, we further define a pixel-level decision map that operationally separates EXR-closer recovery regions from content-adaptive adjustment regions. Under this operational definition, 82.4% of sampled image regions are classified as EXR-closer recovery, while the remainder require localized adaptive adjustment. Rather than claiming a universal law for all cinema mastering pipelines, the study provides an interpretable quantitative baseline for structure-aware SDR-to-HDR analysis and for designing learning-based models under shared-source mastering conditions.

2604.04891 2026-05-11 math.OC cs.AI stat.ML

Muon Dynamics as a Spectral Wasserstein Flow

Gabriel Peyré

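Affiliations CNRS and ENS, PSL Université

AI summary This paper studies the continuous-time dynamics of gradient normalization methods in deep learning and introduces Wasserstein distances based on spectral norms to describe the evolution of probability measures on parameter space. By introducing Spectral Wasserstein distances indexed by different matrix norms, it interprets normalized training as a gradient flow and connects it to formulations such as Benamou-Brenier. Contributions include a static Kantorovich formulation, a robust-cost representation, Gaussian reductions, and numerical validation on several models, giving a new geometric view of normalized training.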

Abstract

Gradient normalization stabilizes deep-learning optimization, and spectral normalizations are especially natural for matrix-shaped parameter blocks; Muon is the motivating example. We study an idealized deterministic, continuous-time, vanishing-momentum version of this idea in the mean-field regime, where wide models are represented by probability measures on parameter space. Starting from normalized matrix flows, we introduce Spectral Wasserstein distances indexed by norms $γ$ on positive semidefinite matrices: the trace norm gives classical $W_2$, the operator norm gives the Muon geometry, and Schatten norms interpolate between them. We develop the static Kantorovich formulation, a max-min robust-cost representation, Gaussian reductions extending the Bures formula, and for monotone norms, prove equivalence with a Benamou--Brenier formulation. This yields a gradient-flow interpretation of the mean-field normalized training dynamics. We illustrate these findings by numerical experiments on MMD flows, Gaussian reductions, two-layer ReLU models, and shallow attention.

2603.24914 2026-05-11 math.HO cs.AI

Shaping the Future of Mathematics in the Age of AI

Johan Commelin, Mateja Jamnik, Rodrigo Ochigame, Lenny Taelman, Akshay Venkatesh

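Affiliations Lorentz Center, the Netherlands

AI summary This article discusses the changes and challenges facing mathematics in the age of artificial intelligence, focusing on five key areas: values, practice, teaching, technology, and ethics. The authors offer recommendations on safeguarding the community's autonomy, rethinking research practice, broadening curricula, building academically oriented infrastructure, and developing shared ethical principles, so that the future of mathematics is shaped by the mathematical community itself.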

Comments To appear in Notices of the American Mathematical Society. Based on discussions at a September 2025 workshop on "Mechanization and Mathematical Research" held at the Lorentz center, Leiden

Abstract

Artificial intelligence is transforming mathematics at a speed and scale that demand active engagement from the mathematical community. We examine five areas where this transformation is particularly pressing: values, practice, teaching, technology, and ethics. We offer recommendations on safeguarding our intellectual autonomy, rethinking our practice, broadening curricula, building academically oriented infrastructure, and developing shared ethical principles - with the aim of ensuring that the future of mathematics is shaped by the community itself.

2602.08786 2026-05-11 cs.CY cs.LG

On the Meta-Design of Allocation Problems

Unai Fischer-Abaigar, Emily Aiken, Christoph Kern, Juan Carlos Perdomo

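Affiliations LMU Munich; Munich Center for Machine Learning; University of California San Diego; New York University; Massachusetts Institute of Technology

AI summary This paper studies the meta-design of resource allocation problems: how to optimize high-level decisions about prediction, capacity constraints, and treatment quality, rather than fixing these parameters and then searching for an optimal allocation policy. It formally defines the meta-design space of resource allocation problems and develops empirical tools that help practitioners analyze it systematically. Case studies on German employment services and targeted cash transfer programs in Ethiopia demonstrate the framework.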

Abstract

There is an extensive literature that studies how to find optimal policies in resource allocation problems, taking the underlying design parameters that define the allocation, such as what data is collected, how many people can be served, and quality of service as fixed constraints. Yet, from a planner's perspective, these design parameters are themselves optimization variables that are just as important in determining overall welfare as selecting the optimal targeting rule for a given set of constraints. This realization motivates a rich set of meta-design questions exploring how planners should make principled decisions about investments in prediction, capacity constraints, and treatment quality, all of which lie upstream of classical policy optimization. Building on initial theoretical work in this space, our paper has three main contributions. First, we formally define the broad meta-design space of resource allocation problems. Second, we develop empirical tools that enable practitioners to reliably navigate it. Third, we demonstrate the framework in two real-world case studies on German employment services and targeted cash transfer programs in Ethiopia.

2602.04774 2026-05-11 cond-mat.dis-nn cs.LG stat.ML

Theory of Optimal Learning Rate Schedules and Scaling Laws for a Random Feature Model

Blake Bordelon, Francesco Mori

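Affiliations Center for Mathematical Sciences and Applications, Harvard University

AI summary This paper develops a theory of optimal learning rate schedules for a random feature model trained with stochastic gradient descent (SGD), using optimal control theory. It finds that optimal schedules fall into an "easy phase" and a "hard phase", each with a different decay strategy, and shows how jointly optimizing learning rate and batch size affects training efficiency. Experiments on image classification and language modeling exhibit the behavior the theory predicts, offering theoretical guidance and a practical reference for learning rate scheduling.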

Abstract

Setting the learning rate (LR) for a deep learning model is a critical part of successful training. Choosing LRs is often done empirically with trial and error. In this work, we explore a solvable model of optimal LR schedules for a powerlaw random feature model trained with stochastic gradient descent (SGD). We consider the optimal schedule $η_T^\star(t)$ where $t$ is the current iterate and $T$ is the training horizon. This schedule is computed both as a numerical optimization problem and also analytically using optimal control theory. Our analysis reveals two regimes which we term the easy phase and hard phase. In the easy phase the optimal schedule is a polynomial decay $η_T^\star(t) \simeq T^{-ξ} (1-t/T)^δ$ where $ξ$ and $δ$ depend on the properties of the features and task. In the hard phase, the optimal schedule resembles warmup-stable-decay with constant initial LR and annealing performed over a vanishing fraction of training steps. We investigate joint optimization of LR and batch size and find batch ramps can improve the wall-clock time in the easy phase. Beyond SGD, we derive optimal schedules for momentum parameter $β(t)$ and show that it improves the loss-scaling exponent in the hard phase. We compare our optimal schedule to various benchmarks including (1) optimal constant learning rates $η_T(t) \sim T^{-ξ}$ (2) optimal power laws $η_T(t) \sim T^{-ξ} t^{-χ}$, finding that our schedule achieves better rates than either of these. Our theory suggests that LR transfer across training horizon depends on the structure of the model and task. For ResNet image classification on CIFAR-5M, the learning curves exhibit hard-phase behavior where optimal base LRs are constant under sufficient annealing. GPT-2 style transformers trained in language modeling exhibit easy-phase behavior where optimal LRs shift even under annealing.
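
The easy-phase schedule and the power-law baseline quoted above can be written out directly. The sketch below only evaluates the two stated functional forms. The function names and the `scale` prefactor are hypothetical, and the exponents must be supplied by the user, since the abstract gives no numerical values for $ξ$, $δ$, or $χ$.

```python
def easy_phase_lr(t, T, xi, delta, scale=1.0):
    """Polynomial decay: scale * T**(-xi) * (1 - t/T)**delta, for 0 <= t <= T."""
    return scale * T ** (-xi) * (1.0 - t / T) ** delta

def power_law_lr(t, T, xi, chi, scale=1.0):
    """Power-law baseline: scale * T**(-xi) * t**(-chi), defined for t >= 1."""
    return scale * T ** (-xi) * t ** (-chi)

def schedule(T, xi, delta, scale=1.0):
    """Learning rates for steps t = 0, ..., T - 1 of the easy-phase schedule."""
    return [easy_phase_lr(t, T, xi, delta, scale) for t in range(T)]
```

Note that the peak learning rate of the easy-phase form shrinks as `T**(-xi)` when the horizon grows, which is one way to read the abstract's remark that learning rate transfer across training horizons depends on the model and task.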

2602.01372 2026-05-11 math.OC cs.LG

Robust Sublinear Convergence Rates for Iterative Bregman Projections

Gabriel Peyré

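Affiliations CNRS and ENS, Université PSL

AI summary This paper studies the convergence rate of iterative Bregman projections under entropic regularization and gives a general analysis framework proving an $O(1/k)$ dual convergence rate whose constant scales only linearly in $1/γ$, where $γ$ is the entropic regularization parameter, hence the name "robust" rate. The framework builds a primal bound and a dual bound in a quotient norm induced by the constraint split and uses a non-expansiveness argument to simplify the convergence proof. It also yields a new flow-Sinkhorn algorithm for computing the Wasserstein-1 distance on graphs, with a theoretical guarantee on its computational complexity.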

Abstract

Entropic regularization provides a simple way to approximate linear programs whose constraints split into two or more tractable blocks. The resulting objectives are amenable to cyclic Kullback-Leibler (KL) Bregman projections, with Sinkhorn-type algorithms for optimal transport, matrix scaling, and barycenters as canonical examples. This paper gives a general blueprint for proving $O(1/k)$ dual convergence rate with a constant that scales only linearly in $1/γ$, where $γ$ is the entropic regularization parameter. We call such rates "robust", because this mild dependence on $γ$ underpins favorable complexity bounds for approximating the unregularized problem via alternating KL projections. The blueprint reduces the proof to a uniform primal bound and a dual bound for a quotient norm induced by the constraint split. To make these inputs usable, we propose two helper results, which rely on the non-expansiveness of the dual iterations in this quotient dual norm. Instantiating this blueprint for graph-structured transport yields a new flow-Sinkhorn algorithm for the Wasserstein-1 distance on graphs. It achieves $\varepsilon$-additive accuracy on the transshipment cost in $O(p\,\mathrm{diameter}^3/\varepsilon^{4})$ arithmetic operations (up to logarithmic factors), where $p$ is the number of edges. We also provide a machine-checked Lean formalization of the core blueprint and its graph-$\mathrm{W}_1$ instantiation.
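
The alternating KL projections mentioned above take a particularly simple form for entropic optimal transport, where they reduce to the classical Sinkhorn scaling iterations. The sketch below shows that textbook special case in plain Python. It is not the flow-Sinkhorn algorithm of the paper, and the function name and default iteration count are illustrative assumptions.

```python
import math

def sinkhorn(a, b, cost, gamma, iters=500):
    """Entropic optimal transport by alternating row and column scalings.

    a, b: marginal weights (each summing to 1); cost: n-by-m list of lists;
    gamma: entropic regularization parameter. Returns the transport plan.
    """
    n, m = len(a), len(b)
    # Gibbs kernel K = exp(-cost / gamma).
    K = [[math.exp(-cost[i][j] / gamma) for j in range(m)] for i in range(n)]
    u = [1.0] * n
    v = [1.0] * m
    for _ in range(iters):
        # KL projection onto the row-marginal constraint.
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # KL projection onto the column-marginal constraint.
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical two-point example.
plan = sinkhorn([0.3, 0.7], [0.6, 0.4], [[0.0, 1.0], [1.0, 0.0]], gamma=0.5)
```

Smaller values of `gamma` bring the plan closer to the unregularized optimum but generally require more iterations (and, in this naive form, risk numerical underflow in the kernel), which is the dependence on $γ$ that the convergence rates above quantify.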