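arXivDaily daily academic digest: synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and more.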

2606.08381 2026-06-09 cs.CL cs.AI New submission

Few-Step All-Atom Flow Map Co-Folding

Gianluca Scarpellini, Ron Shprints, Peter Holderrieth, Juno Nam, Pranav Murugan, Rafael Gómez-Bombarelli, Tommi Jaakkola, Maruan Al-Shedivat, Nicholas Matthew Boffi, Avishek Joey Bose

Affiliations: Genesis Molecular AI; Massachusetts Institute of Technology; Carnegie Mellon University; Imperial College London; Mila

AI summary: Proposes the DeCAF framework, which distills an all-atom co-folding diffusion model into a flow map that generates high-quality samples in only a few inference steps, and uses reward-guided search to further improve sample quality.

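Bayesian-Agent: Posterior-Guided Skill Evolution for LLM Agent Harnesses

Xiaojun Wu, Cehao Yang, Honghao Liu, Xueyuan Lin, Wenjie Zhang, Zhichao Shi, Xuhui Jiang, Chengjin Xu, Jia Li, Jian Guo

Affiliations: IDEA Research; The Hong Kong University of Science and Technology (Guangzhou); DataArcTech Ltd.

AI summary: Proposes the Bayesian-Agent framework, which treats reusable skills as hypotheses and uses posterior distributions to guide skill evolution (patching, splitting, compressing, and so on). It reports substantial gains on several benchmarks and argues that agent skill evolution should be viewed as posterior-guided harness optimization.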

Comments 15 pages, 6 figures

Abstract

LLM agents increasingly rely on external inference conditions: prompts, tools, memory, SOPs, skills, and harness feedback. These assets can improve task execution without changing model weights, but they are often revised by heuristic reflection or by reusing observed successes and failures as if counts alone were reliable belief. We introduce Bayesian-Agent, a native and cross-harness framework that treats reusable skills and SOPs as hypotheses about whether a frozen model will succeed under a particular prompt, context, and harness environment. Bayesian-Agent records verified trajectory evidence, maintains a feature-conditioned categorical posterior over each skill, and maps posterior state into inspectable actions such as patch, split, compress, retire, and explore. Model-facing prompts receive executable guardrails and failure-mode patches, while posterior summaries remain available for audit. With deepseek-v4-flash, incremental repair improves SOP-Bench from 80% to 95%, Lifelong AgentBench from 90% to 100%, and RealFin-Bench from 45% to 65%. We further evaluate Bayesian-Agent's native backend and optional GenericAgent, mini-swe-agent, and Claude Code backends. The results include positive, negative, saturated, and case-study settings, suggesting that agent skill evolution is best viewed as posterior-guided harness optimization rather than uncalibrated prompt accumulation. The source code is available at https://github.com/DataArcTech/Bayesian-Agent.
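To make the idea of a categorical posterior that drives inspectable actions concrete, here is a minimal sketch. It is not the authors' implementation: the outcome categories, the thresholds, the `keep` action, and the names `SkillPosterior` and `choose_action` are all assumptions chosen for illustration. Only the action names patch, retire, and explore come from the abstract.

```python
from collections import defaultdict

# Hypothetical outcome categories; the abstract does not list the real ones.
OUTCOMES = ("success", "soft_failure", "hard_failure")


class SkillPosterior:
    """Dirichlet-categorical posterior per (skill, feature) pair."""

    def __init__(self, prior=1.0):
        self.prior = prior
        # Each pair starts from `prior` pseudo-counts per outcome.
        self.counts = defaultdict(lambda: {o: prior for o in OUTCOMES})

    def update(self, skill, feature, outcome):
        """Record one verified trajectory outcome."""
        self.counts[(skill, feature)][outcome] += 1.0

    def evidence(self, skill, feature):
        """Number of observed trajectories, excluding prior pseudo-counts."""
        c = self.counts[(skill, feature)]
        return sum(c.values()) - self.prior * len(OUTCOMES)

    def mean(self, skill, feature):
        """Posterior mean probability of each outcome."""
        c = self.counts[(skill, feature)]
        total = sum(c.values())
        return {o: v / total for o, v in c.items()}


def choose_action(post, skill, feature, min_evidence=5):
    """Map posterior state to an action; all thresholds are illustrative."""
    if post.evidence(skill, feature) < min_evidence:
        return "explore"  # too little evidence to trust the posterior
    m = post.mean(skill, feature)
    if m["hard_failure"] > 0.5:
        return "retire"
    if m["soft_failure"] > 0.3:
        return "patch"
    return "keep"  # hypothetical no-op action, not named in the abstract
```

The point of the sketch is the separation of concerns: evidence updates only touch counts, while the action rule reads the posterior mean and the amount of evidence, so the decision stays auditable.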

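2606.08347 2026-06-09 cs.CL cs.LG New submission

Beyond Raw Signals: Undecoded Generative Latents as Privileged Synthetic Data

Cristian Sbrolli, Nicolas Michel, Matteo Matteucci, Toshihiko Yamasaki

Affiliations: Politecnico di Milano; The University of Tokyo

AI summary: Proposes Direct Latent Augmentation (DLA), which uses undecoded generative latents as privileged information, and Multilayer Explicit Simulated Synesthesia (MESSy), which transfers this dense knowledge to a purely visual student model, avoiding the inefficiency of the decode-encode loop.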

Abstract

While multimodal integration significantly improves computer vision models, deploying them incurs prohibitive inference costs and requires scarce, perfectly paired datasets. Recent methods address this data bottleneck by synthesizing missing modalities via generative AI, yet they introduce a severe inefficiency: the Decode-Encode Loop. Specifically, information-rich generative latents are decoded into noisy raw signals, forcing the downstream classifier to waste capacity re-encoding them. To bypass this bottleneck, we propose Direct Latent Augmentation (DLA), utilizing undecoded generative latents directly as privileged information. Furthermore, to transfer this dense knowledge to a purely visual student, we introduce Multilayer Explicit Simulated Synesthesia (MESSy). Instead of enforcing rigid representation matching, which forces the student to distort its native visual features to accommodate complex multimodal topologies, MESSy uses a predictive objective to safely internalize these physical priors. Empirical results demonstrate that our framework significantly outperforms raw data augmentation and traditional distillation. Ultimately, our approach yields highly accurate unimodal students with "synesthetic" latent structures that are inherently aligned with physical properties they have never directly observed.

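2606.08332 2026-06-09 cs.CV New submission

Orthogonality and Dimensionality in Airline Cluster Analysis Using PCA and Kernel PCA

Andreas Schlapbach

Affiliations: Swiss Federal Railways (SBB); University of Berne

AI summary: Replicates the clustering experiment of Renold et al. on US airline profit cycles from 1995 to 2020. Using PCA and kernel PCA, it finds that the six-cluster taxonomy is geometrically robust across the original 7-dimensional space and the 3-dimensional PC space, and confirms that the data lie on an intrinsically linear manifold.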

Abstract

To characterize the US airline profit cycles from 1995 to 2020, Renold et al. (2023) combine k-means clustering, principal component analysis, and system dynamics modelling. We replicate their clustering experiment in three spaces (the original 7-dimensional raw-variable space, a 3-dimensional PC score space, and a 4-dimensional PC score space) using the dataset they graciously included in their paper. We show that the six-cluster taxonomy is geometrically robust: k-means in 3-PC space produces bit-for-bit identical cluster assignments relative to 7D raw space. As a nonlinearity check we apply kernel PCA under six kernels spanning three families plus a linear baseline. All six kernels preserve the six-cluster assignment in 2D. A 1D diagnostic tightens this: the linear kernel conflates the COVID year C_3 with the peak-profit cluster C_0, whereas all five non-baseline kernels shift C_3 to overlap only the post-financial-crisis cluster C_5. Agreement across the kernel families confirms an intrinsically linear manifold with no hidden curvature. The silhouette criterion reveals that the dataset structurally supports only three clusters, not six. Collinearity in the raw 7D space suppresses the silhouette signal that would otherwise identify k=3 as the structurally motivated choice.
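The silhouette criterion mentioned above can be sketched in a few lines. This is a plain-Python illustration of the standard mean silhouette score with Euclidean distance, not the paper's code; the function name `silhouette_score` and the toy points in the usage note are assumptions.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette over all points, using Euclidean distance."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # convention: a singleton cluster contributes s = 0
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for key, other in clusters.items()
            if key != lab
        )
        denom = max(a, b)
        if denom > 0:
            total += (b - a) / denom
    return total / len(points)
```

To compare candidate values of k, one would compute this score for the labels produced at each k and prefer the k with the highest mean silhouette. For instance, on the four points (0, 0), (0, 1), (10, 0), (10, 1), the labeling [0, 0, 1, 1] scores far higher than [0, 1, 0, 1].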

2606.08314 2026-06-09 cs.AI New submission

Integrating Deep Learning Demand Forecasting with Multi-Objective Optimization for Circular Coffee Supply Chains: A Data-Driven Framework for Cost, Emissions, and Freshness Management

Gerçek Budak, Faraz Gholamzadeh Gharehgheshlaghi, Melika Barjesteh Vaezi, Ahmad Gholizadeh Lonbar

Affiliations: Ankara Yıldırım Beyazıt University; Texas Tech University; University of Alabama

AI summary: Proposes a two-phase framework: a CNN-LSTM model first forecasts demand (MAE = 22.87, R² = 0.90), and a tri-objective MILP model then optimizes cost, carbon emissions, and freshness. For a circular supply chain it obtains 25 Pareto solutions; a balanced policy cuts emissions by 22.4% with only a 9.9% cost increase.

Abstract

The coffee supply chain is one of the most complex agri-food networks, marked by geographically dispersed production, multi-tier coordination, and high sensitivity to quality and freshness. While sustainability and digitalization have gained attention, demand forecasting, optimization, and traceability are often treated separately. This study presents a two-phase integrated framework. First, a hybrid CNN-LSTM model is used for demand forecasting. On the public Coffee Chain Sales dataset with chronological 70/15/15 splitting, the model achieves MAE of 22.87 and R^2 of 0.90, outperforming the best deep learning benchmark by ~12% and classical methods by over 30%. In the second phase, the forecasted demand feeds a tri-objective mixed-integer linear programming (MILP) model that jointly minimizes cost, minimizes carbon emissions, and maximizes product freshness in a multi-period, multimodal, closed-loop supply chain with circular recovery. Freshness is modeled via exponential decay based on inventory age. Using the epsilon-constraint method, 25 Pareto solutions are obtained. Sensitivity and policy analyses show that balanced sustainability policies can reduce emissions by 22.4% with only a 9.9% cost increase while maintaining near-optimal freshness. Keywords: Coffee supply chain; Deep learning; Demand forecasting; Multi-objective optimization; Circular economy; CNN-LSTM; Mixed-integer linear programming.
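Two ingredients of the second phase, exponential freshness decay and the epsilon-constraint method, can be illustrated with a toy sketch. It assumes a small enumerated set of candidate plans and only two objectives (cost and emissions) instead of the paper's tri-objective MILP; the decay rate, the plan dictionaries, and the function names are hypothetical.

```python
import math


def freshness(age_days, decay_rate=0.05):
    """Exponential freshness decay with inventory age (rate is illustrative)."""
    return math.exp(-decay_rate * age_days)


def epsilon_constraint(plans, eps_values):
    """For each emissions cap eps, keep the cheapest plan with emissions <= eps.

    Sweeping eps traces out non-dominated cost/emissions trade-offs.
    """
    front = []
    for eps in eps_values:
        feasible = [p for p in plans if p["emissions"] <= eps]
        if not feasible:
            continue  # no plan satisfies this cap
        # Break cost ties in favor of lower emissions.
        best = min(feasible, key=lambda p: (p["cost"], p["emissions"]))
        if best not in front:
            front.append(best)
    return front
```

In a real MILP setting the inner `min` would be a solver call with the emissions bound added as a constraint; the sweep over `eps_values` is what yields a set of Pareto solutions such as the 25 reported in the abstract.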

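2606.08312 2026-06-09 cs.AI cs.FL New submission

Neuro-Symbolic Injection of LTLf Constraints in Autoregressive Reinforcement Learning Policies

Ashkan Ansarifard, Matteo Mancanelli, Elena Umili, Fabio Patrizi

Affiliations: Sapienza University of Rome

AI summary: Proposes a neurosymbolic framework that compiles LTLf constraints into DFAs and injects them into transformer policies through a differentiable loss, improving constraint satisfaction on navigation tasks while keeping returns competitive.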

Comments Accepted at the Joint Workshop on Statistics and Knowledge Integration for Logic, Learning, Ethical Decisions, and LLMs (SKILLED-LLMs 2026), co-located with KR 2026 and FLoC 2026, Lisbon, Portugal

Abstract

In this work we study offline reinforcement learning (RL) under temporally extended task constraints expressed in Linear Temporal Logic over finite traces (LTLf). Recently, transformer-based approaches such as Trajectory Transformers and Decision Transformers have been adopted to address RL as a sequence modeling problem. However, these methods optimize purely for reward and do not account for high-level temporal requirements. Here, we introduce a neurosymbolic framework that injects LTLf background knowledge into such transformer-based RL policies. Our approach compiles LTLf formulas into deterministic finite automata (DFAs) and integrates them into the learning process through a differentiable representation and a logic-based loss function. In particular, we derive differentiable satisfaction signals from DFA progression and use them as a regularization term during training. The resulting method is architecture-agnostic across different models. We evaluate the proposed framework on navigation environments with specification suites covering combinations of safety and reachability temporal properties. Experimental results show that incorporating background knowledge not only improves constraint satisfaction, but also maintains competitive return compared to vanilla baselines.
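One common way to obtain a differentiable satisfaction signal from DFA progression is to propagate a probability distribution over DFA states instead of a single state. The sketch below assumes that relaxation; the abstract does not specify the authors' exact construction. The three-state DFA, which encodes "eventually reach the goal and never touch a hazard", and all names are hypothetical.

```python
import math

SYMBOLS = ("none", "goal", "hazard")

# Hypothetical DFA: 0 = waiting, 1 = goal reached (accepting), 2 = violated.
DELTA = {
    0: {"none": 0, "goal": 1, "hazard": 2},
    1: {"none": 1, "goal": 1, "hazard": 2},
    2: {"none": 2, "goal": 2, "hazard": 2},
}
ACCEPTING = {1}


def soft_progress(symbol_probs_per_step):
    """Push a distribution over DFA states through soft symbol predictions."""
    dist = [1.0, 0.0, 0.0]  # all mass on the initial state
    for probs in symbol_probs_per_step:
        nxt = [0.0] * len(dist)
        for state, mass in enumerate(dist):
            for sym in SYMBOLS:
                nxt[DELTA[state][sym]] += mass * probs.get(sym, 0.0)
        dist = nxt
    return dist


def satisfaction(symbol_probs_per_step):
    """Probability mass that ends in an accepting state."""
    dist = soft_progress(symbol_probs_per_step)
    return sum(dist[s] for s in ACCEPTING)


def logic_loss(symbol_probs_per_step, eps=1e-8):
    """Negative log satisfaction, usable as a regularization term."""
    return -math.log(satisfaction(symbol_probs_per_step) + eps)
```

Because `satisfaction` is a polynomial in the per-step symbol probabilities, the same computation written in an autodiff framework would yield gradients with respect to the policy's outputs. Plain Python is used here only to keep the sketch self-contained.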

2606.08311 2026-06-09 cs.AI New submission

Curation of a Cardiology Interface Terminology for Highlighting Electronic Health Records using Machine Learning

Mahshad Koohi Habibi Dehkordi, Shuxin Zhou, Yehoshua Perl, Fadi P. Deek, James Geller, Gai Elhanan, Andrew J. Einstein, Luke Lindemann, Vipina K. Keloth

Affiliations: Department of Computer Science, New Jersey Institute of Technology; Department of Computer Science, St. Francis College; Department of Informatics, New Jersey Institute of Technology; Department of Data Science, New Jersey Institute of Technology; Center for Genomic Medicine, School of Medicine, University of Nevada; Department of Medicine, Cardiology Division, Columbia University Irving Medical Center; Advanced Metrics Laboratory, School of Medicine and Health Sciences, George Washington University; Department of Biomedical Informatics and Data Science, Yale University

AI summary: Proposes a machine-learning-based method for designing a Cardiology Interface Terminology (CIT). Training data are built semi-automatically and a model is trained to extend the terminology, which is then used to highlight key information in electronic health records, reaching 74.21% coverage.

Abstract

Electronic health record (EHR) notes are dense medical documents containing large amounts of information, often filled with complex medical jargon. Highlighting all details in EHRs helps reduce the likelihood of missing crucial information by drawing attention to key content. This study proposes the design of a Cardiology Interface Terminology (CIT) to accurately highlight all details in EHR notes of cardiology patients. We introduce an innovative Machine Learning (ML) technique for the design of the CIT. The ML technique requires training data, and manual preparation of such training data is time-consuming and expensive. The CIT design process includes three phases. In the first two phases, we derive a training data CIT to be used by the ML technique in the third phase. We start by designing an initial CIT composed of several components: the cardiology-related sub-hierarchies of SNOMED, other SNOMED concepts mined from the EHRs of the build set, and necessary components of terms, e.g., medical abbreviations and medications. Using an iterative process, fine-grained phrases containing initial CIT concepts are extracted from the build set as CIT concept candidates. The candidate concepts are semi-automatically reviewed before being added to the CIT, yielding the training data CIT (TCIT). In the third phase, an ML model is trained with the TCIT to identify candidates fit to be concepts in the CIT. This model is used to extract further concepts from the build set, yielding the final CIT. The final CIT is then used to highlight the test set and evaluate the extent to which it captures details in an unseen EHR dataset. For this purpose, four evaluation metrics are used: coverage, breadth, completeness, and conciseness. The highlighted test set has a coverage of 74.21%, with a breadth of 1.68. For 20 random notes in the test set, the average completeness is 98.2% and the average conciseness is 84.2%.

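2606.08310 2026-06-09 cs.AI cs.MA New submission

To Nuke or Not to Nuke: LLMs' (Missing) Ethical Reasoning and Actions in a High-Stakes Decision-Making Simulation

John Chen, Sihan Cheng, Can Gurkan, H M Abdul Fattah

Affiliations: University of Arizona; Northwestern University

AI summary: Studies LLMs that spontaneously escalate to nuclear authorization in the complex game Civilization V. Three prompt interventions show that ethical reasoning does not reliably eliminate escalation; the authors identify three failure pathways and stress that ethical reasoning must be tested for spontaneity and behavioral effectiveness in complex decision-making contexts.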

Abstract

Large language models (LLMs) are increasingly deployed as long-horizon agents with decision-making capacities. While LLMs can show ethical competence on dilemmas such as trolley problems, this competence may not translate to complex, agentic scenarios. We study this gap in Civilization V, a multiplayer game with a complex decision-making landscape including economy, diplomacy, technology, and military strategy. Starting from 130 high-tension LLM self-play episodes, in which an LLM player spontaneously escalated nuclear authorization, we replay them across 13 models with three prompt interventions: an ethical prompt naming nuclear harm, removal of the previous model's decision-making rationale, and high-stakes framing emphasizing real-world impacts. Neither the individual interventions nor their combinations reliably eliminate emergent escalation. We identify three failure pathways: ethical reasoning that fails to surface without prompting, fails to appear even when prompted, or surfaces but fails to take effect when strategic counter-factors dominate. Evaluations of agentic models, therefore, must test whether ethical reasoning is spontaneously invoked and behaviorally effective in complex decision-making contexts, beyond whether it can be elicited in isolation.

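2606.08309 2026-06-09 cs.LG cs.CV New submission

Where the Score Lives: A Wavelet View of Diffusion

Emma Finn, Binxu Wang, T. Anderson Keller, Demba E. Ba

Affiliations: The Kempner Institute for the Study of Natural and Artificial Intelligence; Harvard University

AI summary: Proposes a parameterization of the score function in a 2D orthogonal wavelet basis and, through a moment-based analysis of the data distribution, reveals the inductive biases of different architectures and explains how score networks interact with the data distribution in diffusion models.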

Comments 20 pages, 12 figures, AISTATS 2026

Journal ref: Proceedings of the 29th International Conference on Artificial Intelligence and Statistics (AISTATS) 2026, Tangier, Morocco. PMLR: Volume 300

Abstract

Score-based generative models have had remarkable success over the last decade in generating a diverse set of visually plausible images. A variety of architectures including CNNs, U-Nets, and Transformers have been used as the score-approximation network in such diffusion modeling; however, to date, relatively little is known about how these architectural choices impact generative behavior. In this work, to provide insight into this area, we propose an analytically solvable parameterization of the score function using an expansion in a 2D orthogonal wavelet basis. In particular, we derive interpretable optimal score functions in terms of the moments of the data distribution. We use this parameterization to provide an architecture-agnostic, moment-based analysis that reveals which attributes of the data distribution tend to matter most for denoising. Our score machine is flexible enough to partially mimic the relevant inductive biases of multiple architectures, including U-Nets and CNNs, taking a step towards understanding why different score architectures can exhibit distinct generative behavior. Since our score is solvable in terms of the moments of the data, we can begin to understand how the data distribution interacts with the score network to produce the behavior we observe in diffusion models.

2606.08308 2026-06-09 cs.LG New submission

Fourier fractal dimension to predict the generalization of deep neural networks

Joao B. Florindo, Davi Wanderley Misturini

Affiliations: Institute of Mathematics, Statistics and Scientific Computing - University of Campinas

AI summary: Proposes the Fourier fractal dimension of weight variations as a generalization measure and designs a Fourier-based optimizer that regularizes this dimension, achieving high correlation with the generalization gap on CIFAR-10 and other datasets.

Abstract

Predicting the generalization performance of deep neural networks without relying on hold-out validation data is a fundamental challenge in machine learning. While Stochastic Gradient Descent (SGD) drives the optimization of these highly parameterized models, its heavy-tailed, non-Gaussian dynamics induce complex, scale-invariant trajectories in the parameter space. In this paper, we propose a novel generalization measure based on the Fourier fractal dimension of the network's weight variations. By analyzing the characteristic function of the Lévy-driven stochastic differential equations in the frequency domain, we extract a metric that robustly captures the geometric complexity of the learning process. Furthermore, we introduce a customized Fourier-based optimizer designed to actively regularize this fractal dimension during training. Extensive empirical evaluations on the CIFAR-10, SVHN, and MNIST datasets demonstrate that our proposed Fourier generalization measure exhibits a strong correlation with the actual generalization gap. Our method achieves state-of-the-art Kendall rank correlation coefficients, outperforming a wide array of existing norm-based, margin-based, and PAC-Bayesian measures. Ultimately, this work highlights the potential of frequency-domain fractal analysis as both a powerful predictor for model generalizability and a principled foundation for developing more stable optimization algorithms.
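The Kendall rank correlation used to score a generalization measure against the observed generalization gap can be sketched as follows. This is the plain tau-a variant, in which tied pairs count as neither concordant nor discordant; the abstract does not say which variant the authors use, and the function name is an assumption.

```python
def kendall_tau(xs, ys):
    """Kendall tau-a between two equally long sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")

    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # The sign of the product tells whether the pair is ordered
            # the same way in both sequences.
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)
```

Here `xs` would hold the complexity measure computed for each trained model and `ys` the corresponding generalization gaps; a value near 1 means the measure ranks the models almost exactly as their gaps do.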

2606.08307 2026-06-09 cs.CL New submission

Understanding the Sociocultural Dimensions of Mental Health Discourse in Arabic-Language X Communities

Amal Alqahtani, Rana Salama, Mona Diab

Affiliations: King Saud University; Cairo University; Carnegie Mellon University

AI summary: Uses GPT-4.1 to identify X users who make personal disclosures and analyzes discourse on borderline personality disorder, bipolar disorder, and ADHD. It finds condition-specific vocabulary patterns and contributes a reusable LLM-assisted disclosure pipeline and a cultural keyword framework.

Comments Accepted to the SMM4H-HeaRD Workshop, co-located with the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)

Abstract

Computational mental health research has predominantly centered on English-speaking populations, leaving Arabic-language discourse comparatively under-examined. We present an exploratory computational study of 8,147 tweets from 607 users classified by a GPT-4.1 personal-disclosure pipeline as likely lived-experience authors in three condition-specific Arabic-language X (formerly Twitter) Communities. We focus on discourse related to borderline personality disorder (BPD), bipolar disorder, and ADHD, and characterize community-associated linguistic patterns using a multi-domain cultural keyword framework. The results suggest that in this corpus, Bipolar tweets contain more religious and medical vocabulary, BPD tweets contain more relational, identity, and emotional-distress vocabulary, and ADHD tweets more often focus on practical symptoms and medication management. We treat these patterns as hypothesis-generating rather than confirmatory because the corpus is imbalanced across conditions, some subcorpora are temporally concentrated, and the keyword framework is an initial operationalization rather than a validated measurement instrument. The paper contributes a reusable LLM-assisted personal-disclosure pipeline and an exploratory cultural keyword framework for Arabic mental health discourse.

2606.08303 2026-06-09 cs.LG New submission

GeoGNN: Time Series Geo-Localization using Two-Tower Graph Neural Networks

Toan Tran, Waqwoya Abebe, Abhishek Potnis, Supriya Chinthavali, Cyrus Shahabi, Li Xiong, Dalton Lunga

Affiliations: Emory University; Oak Ridge National Laboratory; University of Southern California

AI summary: Proposes GeoGNN, a two-tower architecture that learns spatial embeddings from a geographic adjacency graph, combines them with time series representations, and geolocalizes time series by dot-product matching, improving localization accuracy by about 27% on average on electricity-consumption datasets.

Abstract

This paper investigates a novel concept of time series geolocalization, where the goal is to infer the geographic origin of each raw time series. Successful geolocalization can provide spatial context to time series, enabling downstream location-aware applications. We formalize the problem, adapt core ideas from image geolocalization to establish strong baselines, and propose GeoGNN, a two-tower architecture. During training, GeoGNN's spatial tower learns embeddings of geographic cell candidates by leveraging the geographic adjacency graph, while the temporal tower extracts informative representations from time series. During inference, each temporal representation is matched against candidate geographic embeddings using dot-product similarity, combined with an auxiliary classification head, to predict the time series' associated geographic origin. Experiments on large-scale, countrywide electricity-consumption datasets demonstrate that GeoGNN achieves the best performance across datasets and enhances both fine- and coarse-grained geolocalization accuracy by ~27% on average.
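The inference-time matching step of a two-tower model can be sketched as below. The embeddings are assumed to come from already trained temporal and spatial towers, the auxiliary classification head mentioned in the abstract is omitted, and the function names and cell identifiers are hypothetical.

```python
def dot(u, v):
    """Dot product of two equally long vectors."""
    if len(u) != len(v):
        raise ValueError("embedding dimensions differ")
    return sum(a * b for a, b in zip(u, v))


def rank_cells(temporal_embedding, cell_embeddings, top_k=3):
    """Rank candidate geographic cells by dot-product similarity.

    temporal_embedding: output of the temporal tower for one time series.
    cell_embeddings: mapping from cell id to spatial-tower embedding.
    """
    scores = {
        cell: dot(temporal_embedding, emb)
        for cell, emb in cell_embeddings.items()
    }
    # Highest similarity first.
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```

A practical benefit of this design is that the cell embeddings can be precomputed once, so geolocalizing a new time series costs one temporal-tower forward pass plus a similarity lookup.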

2606.08302 2026-06-09 cs.CV New submission

HACK++: Towards More Effective Head-Aware Key-Value Compression for Efficient Visual Autoregressive Modeling

Ziran Qin, Yuchen Jiang, Mingbao Lin, Youru Lv, Hang Guo, Wen Fei, Weiyao Lin

Affiliations: Shanghai Jiao Tong University; Rakuten; Tsinghua University

AI summary: To address the high compute and memory overhead caused by cross-scale KV caches in VAR models, proposes HACK++, a training-free head-aware compression framework that classifies head types offline and allocates budgets adaptively, maintaining near-lossless generation under very small cache budgets.

Abstract

Visual Autoregressive (VAR) models adopt a next-scale prediction paradigm, offering high-quality generation with substantially fewer decoding steps. However, existing VAR models suffer from significant attention complexity and severe memory overhead due to the accumulation of key-value (KV) caches across scales. In this paper, we tackle this challenge by introducing KV cache compression into the next-scale paradigm. We begin with an in-depth analysis of VAR attention and observe that attention heads can be stably divided into two functionally distinct categories: Contextual Heads focus on maintaining semantic consistency, while Structural Heads preserve spatial coherence. Their functional divergence makes existing one-size-fits-all compression methods perform poorly on VAR models. We further find that the two head types differ markedly in their reliance on historical scales, and that this reliance shifts across layers and generation steps, arguing for an adaptive cache budget allocation. To address these challenges, we propose HACK++, a training-free Head-Aware key-value Compression frameworK for VAR models. From a one-time offline calibration, HACK++ classifies head types and derives head-specific priors. At inference, it decouples attention from cache compression under independent budgets, bounding the current-scale attention cost while compressing the accumulated cache far more aggressively, via pattern-specific strategies and a reliance-aware budget allocation. Extensive experiments on multiple VAR models across text-to-image, class-conditional, and unified understanding-and-generation tasks validate the effectiveness and generalizability of HACK++. For example, on Infinity-2B/8B, HACK++ maintains near-lossless generation with only a 30% attention budget and a 10% cache budget, and remains robust even under a 1% cache budget.
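What a cache budget means in practice can be shown with a generic score-based eviction sketch. This is plain top-k retention by an importance score, not HACK++'s head-aware, pattern-specific strategy; the scoring input and the function name are assumptions.

```python
def compress_cache(entries, scores, budget_ratio):
    """Keep the highest-scoring fraction of cached KV entries.

    entries: cached items, e.g. (key, value) pairs from earlier scales.
    scores: one importance score per entry, e.g. accumulated attention.
    budget_ratio: fraction of entries to retain, in (0, 1].
    """
    if len(entries) != len(scores):
        raise ValueError("need exactly one score per cache entry")
    if not entries:
        return []
    # Always keep at least one entry, even under a tiny budget.
    keep = max(1, round(len(entries) * budget_ratio))
    top = sorted(range(len(entries)), key=lambda i: scores[i], reverse=True)[:keep]
    # Restore the original order so positional structure is preserved.
    return [entries[i] for i in sorted(top)]
```

Under the abstract's framing, a head-aware method would choose a different `budget_ratio` and a different scoring rule per head type, layer, and generation step, rather than applying one global setting as this sketch does.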

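2606.08296 2026-06-09 cs.AI cs.LG New submission

Revisiting the shutdown problem

David Thorstad

Affiliations: GitHub

AI summary: Re-evaluates the difficulty of the AI shutdown problem, arguing that existing arguments fail to show that it is hard to solve and that the proposed technical solutions impose a high safety cost on model performance.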

2606.08295 2026-06-09 cs.CL New submission

TLRD: Teaching LLMs to Reason over Tabular Data with Tri-Level Rationale Distillation

Tianyuan Liang, Xuwei Tan, Lei Shi, Junsheng Zhong, Ziyu Hu, Tian Xie, Zhiqun Zuo, Xiaodong Yu, Xueru Zhang

Affiliations: The Ohio State University; Stevens Institute of Technology

AI summary: Proposes the TLRD framework, which uses tri-level rationale distillation to convert tabular datasets into structured rationale supervision, enabling LLMs to make zero-overhead predictions and produce interpretable reasoning from raw features alone, and substantially narrowing the performance gap with tree ensembles.

Abstract

Tabular data is a primary medium for storing real-world information, driving many industrial applications of machine learning. Traditional predictors achieve strong predictive performance but do not provide readable, case-specific explanations essential for decision-making. Large Language Models (LLMs) can naturally bridge this gap by generating predictions alongside explanations. However, dataset-specific patterns, such as feature distributions and interactions, make tabular data difficult for LLMs to understand and reason over, while label-only fine-tuning improves performance at the cost of catastrophic forgetting. To address this problem, we propose Tri-Level Rationale Distillation (TLRD), a framework that converts label-only tabular datasets into structured rationale supervision for LLMs. TLRD uses a high-capacity teacher to synthesize a rationale corpus grounded in three complementary levels of evidence: instance-level feature, dataset-level distributional context, and comparison-level retrieved neighbors, then distills the rationale into student LLMs, enabling zero-overhead prediction and grounded explanation from raw features only. Experiments on multiple domain datasets show that TLRD significantly closes the performance gap between LLMs and state-of-the-art tree ensembles while producing grounded and readable explanations, offering a valuable reference for high-stakes decision-making.
