arXivDaily arXiv每日学术速递 周一至周五更新
2603.19551 2026-06-03 stat.ME cs.LG

Learning to Bet for Horizon-Aware Anytime-Valid Testing

学习在严格截止日期下进行前瞻性任意有效测试的投注

Ege Onur Taga, Samet Oymak, Shubhanshu Shekhar

发表机构 * Department of Electrical and Computer Engineering, University of Michigan(密歇根大学电气与计算机工程系)

AI总结 本文通过将前瞻性投注建模为有限时域最优控制问题,利用深度强化学习学习通用策略,在严格截止日期下实现有界均值的任意有效测试和置信序列。

Comments To appear in ICML 2026; 29 pages, 22 figures

AI中文摘要

我们针对严格截止日期 $N$ 下的有界均值,开发了前瞻性任意有效测试和置信序列。利用投注/e-过程框架,我们将前瞻性投注视为一个状态空间为 $(t, \log W_t)$ 的有限时域最优控制问题,其中 $t$ 是时间,$W_t$ 是测试鞅值。我们首先证明,在状态空间的某些内部区域,显著偏离Kelly投注的策略是次优的,而Kelly投注以高概率达到阈值。然后,我们识别出充分条件,表明在该区域之外,如果投注者落后于计划,比Kelly更激进的投注可能更好;如果投注者领先,比Kelly更保守的投注可能更好。这些结果共同暗示了 $(t, \log W_t)$ 平面上的一个简单相图,描绘了Kelly、分数Kelly和激进投注可能更优的区域。在此相图指导下,我们引入了一种基于通用深度Q网络(DQN)智能体的深度强化学习方法,该智能体从合成经验中学习单一策略,并将过去观测的简单统计量映射为跨时域和零假设的投注。在有限时域实验中,学习到的DQN策略取得了最先进的结果。

英文摘要

We develop horizon-aware anytime-valid tests and confidence sequences for bounded means under a strict deadline $N$. Using the betting/e-process framework, we cast horizon-aware betting as a finite-horizon optimal control problem with state space $(t, \log W_t)$, where $t$ is the time and $W_t$ is the test martingale value. We first show that in certain interior regions of the state space, policies that deviate significantly from Kelly betting are provably suboptimal, while Kelly betting reaches the threshold with high probability. We then identify sufficient conditions showing that outside this region, more aggressive betting than Kelly can be better if the bettor is behind schedule, and less aggressive can be better if the bettor is ahead. Taken together these results suggest a simple phase diagram in the $(t, \log W_t)$ plane, delineating regions where Kelly, fractional Kelly, and aggressive betting may be preferable. Guided by this phase diagram, we introduce a Deep Reinforcement Learning approach based on a universal Deep Q-Network (DQN) agent that learns a single policy from synthetic experience and maps simple statistics of past observations to bets across horizons and null values. In limited-horizon experiments, the learned DQN policy yields state-of-the-art results.
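The betting framework in this abstract can be illustrated with a minimal sketch. It assumes observations in [0, 1], a null mean `m0` strictly between 0 and 1, and a single fixed bet `lam`; the function name and parameters are hypothetical, and the fixed bet stands in for the Kelly, fractional-Kelly, and learned DQN policies the paper studies. The deadline $N$ is simply the length of the input.

```python
import math

def betting_test(xs, m0, alpha=0.05, lam=0.5):
    """Test H0: mean == m0 for observations in [0, 1] using a fixed bet lam.

    Returns (rejection_time or None, final log-wealth). The deadline N is len(xs).
    """
    # The bet must keep every factor 1 + lam * (x - m0) strictly positive.
    assert -1.0 / (1.0 - m0) < lam < 1.0 / m0
    log_w = 0.0                        # log W_0 = 0, i.e. initial wealth 1
    threshold = math.log(1.0 / alpha)  # reject once W_t >= 1 / alpha
    for t, x in enumerate(xs, start=1):
        log_w += math.log1p(lam * (x - m0))
        if log_w >= threshold:
            return t, log_w            # reject H0 at time t
    return None, log_w                 # deadline reached without rejection
```

With `alpha = 0.05`, twenty observations equal to 1 against `m0 = 0.5` cross the threshold at round 14, while observations equal to `m0` leave the log-wealth at zero. A horizon-aware policy would replace the constant `lam` with a function of the state $(t, \log W_t)$.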

2602.04132 2026-06-03 eess.SY cs.LG cs.RO cs.SY

LC-SAC: Lyapunov-Constrained Soft Actor-Critic via Koopman Operator Theory for Trajectory Tracking and Stabilization

LC-SAC: 基于Koopman算子理论的李雅普诺夫约束软演员-评论家算法用于轨迹跟踪与镇定

Dhruv S. Kushwaha, Zoleikha A. Biron

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出一种结合Koopman算子理论的李雅普诺夫约束软演员-评论家算法,通过线性提升动力学模型和闭环控制李雅普诺夫函数实现轨迹跟踪与镇定,并引入条件风险价值约束处理罕见但严重的失稳事件。

Comments 13 pages, 8 Figures

AI中文摘要

强化学习在解决复杂序列决策问题中取得了显著成功,但其在安全关键物理系统中的应用仍受限于缺乏稳定性保证。标准强化学习算法优先考虑奖励最大化,往往产生可能引起振荡或无界状态发散的策略。本文提出一种基于Koopman算子理论的李雅普诺夫约束软演员-评论家算法。我们通过扩展动态模态分解学习误差动力学的线性提升代理模型,并求解离散代数Riccati方程以获得闭式二次候选控制李雅普诺夫函数。该控制李雅普诺夫函数作为拉格朗日惩罚项被纳入SAC演员更新中,通过条件风险价值目标聚合最坏情况尾部分布,将约束压力集中在罕见但严重的失稳事件上。我们进一步引入三种结构性的EDMD改进:在求解DARE之前对提升的A矩阵进行谱半径归一化、具有物理意义的LQR状态代价,以及强制V(0)=0的值偏置锚点,使得闭式控制李雅普诺夫函数对于更高维的提升模型(如倒立摆和3D四旋翼)是适定的。消融研究表明,硬拉格朗日约束是必要的,将其替换为奖励塑形会导致学习不稳定并在四旋翼任务中导致回报崩溃。

英文摘要

Reinforcement Learning (RL) has achieved remarkable success in solving complex sequential decision-making problems. However, its application to safety-critical physical systems remains constrained by the lack of stability guarantees. Standard RL algorithms prioritize reward maximization, often yielding policies that may induce oscillations or unbounded state divergence. In this work we propose a Lyapunov-Constrained Soft Actor-Critic (LC-SAC) algorithm using Koopman operator theory. We learn a linear lifted surrogate of the error dynamics via Extended Dynamic Mode Decomposition (EDMD) and solve the Discrete Algebraic Riccati Equation (DARE) to obtain a closed-form quadratic candidate Control Lyapunov Function (CLF). This CLF is incorporated into the SAC actor update as a Lagrangian penalty that aggregates the worst-case tail of violations via a Conditional Value-at-Risk (CVaR) objective, concentrating constraint pressure on rare but severe instability events. We further introduce three structural EDMD refinements (spectral-radius normalization of the lifted A-matrix prior to the DARE solve, a physically meaningful LQR state cost, and a value-bias anchor enforcing V(0)=0) that make the closed-form CLF well-posed for higher-dimensional lifted models such as the cartpole and 3D quadrotor. The ablation study shows that a hard Lagrangian constraint is essential: replacing it with reward shaping (Lyap-RS-SAC) destabilizes learning and collapses return on quadrotor tasks.
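The CVaR aggregation described above can be sketched as follows. This is an illustrative assumption rather than the paper's implementation: the violation is taken to be the positive part of $V(z') - (1 - \text{decay})\,V(z)$, and both function names and the `decay` parameter are hypothetical.

```python
import math

def clf_violation(v_now, v_next, decay=0.05):
    """Positive part of V(z') - (1 - decay) * V(z); zero when V decreases enough."""
    return max(0.0, v_next - (1.0 - decay) * v_now)

def cvar_tail(violations, alpha=0.1):
    """Empirical CVaR: the mean of the worst alpha-fraction of the samples."""
    if not violations:
        raise ValueError("need at least one sample")
    k = max(1, math.ceil(alpha * len(violations)))     # size of the tail
    worst = sorted(violations, reverse=True)[:k]       # largest violations first
    return sum(worst) / k
```

Averaging only the tail means that a batch with mostly zero violations and a few large ones still produces a large penalty, which is the sense in which the constraint pressure concentrates on rare but severe events.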

2601.12186 2026-06-03 cs.SE cs.AI

Aletheia: What Makes RLVR For Code Verifiers Tick?

Aletheia: 什么使得代码验证器的RLVR有效?

Vatsal Venkatkrishna, Indraneil Paul, Iryna Gurevych

发表机构 * INSAIT, Sofia University "St. Kliment Ohridski", Bulgaria(保加利亚索菲亚大学INSAIT研究院) Ubiquitous Knowledge Processing Lab (UKP Lab), Department of Computer Science, Technical University of Darmstadt and National Research Center for Applied Cybersecurity ATHENE, Germany(达姆施塔特工业大学计算机科学系通用知识处理实验室(UKP Lab)及德国国家应用网络安全研究中心ATHENE)

AI总结 通过消融实验研究RLVR训练代码验证器时,中间思考轨迹、负样本学习和策略内训练三个因素在不同规模下的性能-成本权衡,发现最优配方依赖于模型规模。

Comments 31 pages, 6 figures

AI中文摘要

通过可验证奖励的强化学习(RLVR)训练的多领域思考验证器是现代后训练的核心。然而,由于完整RLVR管道的成本过高,它们在代码生成中的应用落后于执行反馈。在这项工作中,我们消融了RLVR中性能-成本权衡的三个主要选择:中间思考轨迹、从负样本学习和策略内训练。我们引入了Aletheia,一个受控的、基于执行的测试平台,以促进对不同模型大小和两个常见验证器应用场景下的协变量偏移进行无污染分析。我们的分析揭示,最优训练配方依赖于规模:对于小型验证器,策略内学习是主要性能驱动因素,而在较大规模下,思考预算成为最关键因素。虽然利用负样本对不同大小的top-1选择准确性有一致影响,但它们对排名重建的贡献随规模单调增加,并在大规模下稳定训练中起关键作用。我们的帕累托最优分析表明,在较大模型规模下消除策略内训练会产生一个与完整RLVR配方性能相当的验证器。此外,我们发现,在较低预算下,放弃思考轨迹是一种计算高效的策略,在训练成本和验证器准确性之间提供了强有力的权衡。最终,我们的工作为高效部署鲁棒代码验证器提供了必要的经验基础,从而使其能够在大型代码生成模型的后训练管道中得到更广泛的应用。

英文摘要

Multi-domain thinking verifiers trained via Reinforcement Learning with Verifiable Rewards (RLVR) are a cornerstone of modern post-training. However, their adoption in code generation has lagged behind that of execution feedback due to the prohibitive costs of the full RLVR pipeline. In this work, we ablate three primary choices along the performance-cost trade-off in RLVR: intermediate thinking traces, learning from negative samples, and on-policy training. We introduce Aletheia, a controlled, execution-grounded testbed to facilitate a contamination-free analysis of code verifier training recipes across disparate model sizes and covariate shifts across two common verifier application scenarios. Our analysis reveals that the optimal training recipe is scale-dependent: on-policy learning is the primary performance driver for small verifiers, whereas the thinking budget becomes the most vital factor at larger scales. While leveraging negative samples has a consistent impact on top-1 selection accuracy across sizes, their contribution to ranking reconstruction increases monotonically with scale and plays a key role in stabilizing training at large sizes. Our Pareto optimality analysis demonstrates that eliminating on-policy training at larger model scales yields a verifier that performs comparably to the full RLVR recipe. Furthermore, we find that eschewing thinking traces serves as a compute-efficient strategy at lower budgets, offering a strong trade-off between training cost and verifier accuracy. Ultimately, our work provides the empirical foundation necessary to efficiently deploy robust code verifiers, thereby enabling their wider adoption in post-training pipelines for large code generation models.

2602.07075 2026-06-03 physics.chem-ph cs.AI cs.CL cs.LG

LatentChem: From Textual CoT to Latent Thinking in Chemical Reasoning

LatentChem: 从文本思维链到化学推理中的潜在思考

Xinwu Ye, Yicheng Mao, Yuxuan Liao, Jia Zhang, Yimeng Liu, Li Hao, Fang Wu, Zhiwei Li, Zehong Wang, Zhiyuan Liu, Zhenfei Yin, Li Yuan, Philip Torr, Huan Sun, Xiangxiang Zeng, Mengdi Wang, Le Cong, Shenghua Gao, Xiangru Tang

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 针对化学大语言模型依赖显式思维链导致的模态不匹配问题,提出LatentChem推理接口,通过连续思维向量和动态感知解耦化学逻辑与语言生成,在ChemCoTBench上以59.88%非平局胜率超越强CoT基线,并实现平均10.84倍推理步骤开销降低(5.96倍实际加速)。

Comments Accepted at ICML 2026

AI中文摘要

当前的化学大语言模型主要依赖显式的思维链来解决复杂推理问题。然而,将非语言的隐性化学逻辑强制转化为离散的自然语言,造成了根本性的“模态不匹配”,为推理带来了人为瓶颈。我们提出了LatentChem,一种将化学逻辑与语言生成解耦的推理接口,使模型能够通过连续思维向量和动态感知来处理信息。我们的研究揭示了一个关键涌现行为:自发内化,这里定义为在仅结果优化下的自我选择。当为任务成功进行优化时,模型放弃冗长的文本推导,转而采用隐式的潜在计算,这表明模型将连续流形视为化学逻辑更自然的载体。这一范式转变也被证明是一种更优的计算策略:在严格的ChemCoTBench基准上,LatentChem对强CoT基线取得了59.88%的非平局胜率,同时在所有评估基准上实现了平均10.84倍的推理步骤开销降低(5.96倍实际加速)。我们的结果提供了经验证据,表明化学推理更自然、更有效地实现为连续潜在动力学,而非离散的语言轨迹。

英文摘要

Current chemical large language models (LLMs) predominantly rely on explicit Chain-of-Thought (CoT) to solve complex reasoning problems. However, forcing nonverbal tacit chemical logic into discrete natural language imposes a fundamental "modality mismatch," creating an artificial bottleneck for reasoning. We introduce LatentChem, a reasoning interface that decouples chemical logic from linguistic generation, enabling the model to process information via continuous thought vectors and dynamic perception. Our investigation reveals a pivotal emergent behavior: spontaneous internalization, defined here as self-selected under outcome-only optimization. When optimized for task success, the model abandons verbose textual derivations in favor of implicit latent computation, suggesting that it identifies the continuous manifold as a more native substrate for chemical logic. This paradigm shift also proves to be a superior computational strategy: LatentChem achieves a 59.88\% non-tie win rate against the strong CoT baseline on the rigorous ChemCoTBench, while delivering a broad 10.84$\times$ average reduction in reasoning step overhead (5.96$\times$ wall-clock speedup) across all evaluated benchmarks. Our results provide empirical evidence that chemical reasoning is more naturally and effectively realized as continuous latent dynamics rather than discretized linguistic trajectories.

2603.05207 2026-06-03 cs.IR cs.CL

Core-based Hierarchies for Efficient GraphRAG

基于核心的高效图RAG层次结构

Jakir Hossain, Ahmet Erdem Sarıyüce

发表机构 * University at Buffalo(布法罗大学)

AI总结 针对图RAG中Leiden聚类不可复现的问题,提出用k-core分解替代,构建确定性、密度感知的层次结构,并设计轻量级启发式方法,在保证连接性的同时降低LLM成本,提升答案全面性和多样性。

Comments Accepted at the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026)

AI中文摘要

检索增强生成(RAG)通过引入外部知识增强了大型语言模型。然而,现有的基于向量的方法通常无法处理需要跨多个文档推理的全局理解任务。GraphRAG通过将文档组织成具有层次化社区的知识图谱来解决这一问题,这些社区可以被递归总结。当前的GraphRAG方法依赖Leiden聚类进行社区检测,但我们证明,在平均度数为常数且大多数节点度数较低的稀疏知识图谱上,模块度优化允许指数级数量的近似最优划分,使得基于Leiden的社区本质上不可复现。为了解决这个问题,我们提出用k-core分解替代Leiden,它在线性时间内产生确定性的、密度感知的层次结构。我们引入一组轻量级启发式方法,利用k-core层次结构构建大小有界、保持连接性的社区用于检索和总结,同时采用一种令牌预算感知的采样策略来降低LLM成本。我们在包括金融收益报告、新闻文章和播客在内的真实世界数据集上评估了我们的方法,使用三个LLM进行答案生成,并由五个独立的LLM裁判进行逐项比较评估。跨数据集和模型,我们的方法一致地提高了答案的全面性和多样性,同时减少了令牌使用量,证明了基于k-core的GraphRAG是一种有效且高效的全局理解框架。

英文摘要

Retrieval-Augmented Generation (RAG) enhances large language models by incorporating external knowledge. However, existing vector-based methods often fail on global sensemaking tasks that require reasoning across many documents. GraphRAG addresses this by organizing documents into a knowledge graph with hierarchical communities that can be recursively summarized. Current GraphRAG approaches rely on Leiden clustering for community detection, but we prove that on sparse knowledge graphs, where average degree is constant and most nodes have low degree, modularity optimization admits exponentially many near-optimal partitions, making Leiden-based communities inherently non-reproducible. To address this, we propose replacing Leiden with k-core decomposition, which yields a deterministic, density-aware hierarchy in linear time. We introduce a set of lightweight heuristics that leverage the k-core hierarchy to construct size-bounded, connectivity-preserving communities for retrieval and summarization, along with a token-budget-aware sampling strategy that reduces LLM costs. We evaluate our methods on real-world datasets including financial earnings transcripts, news articles, and podcasts, using three LLMs for answer generation and five independent LLM judges for head-to-head evaluation. Across datasets and models, our approach consistently improves answer comprehensiveness and diversity while reducing token usage, demonstrating that k-core-based GraphRAG is an effective and efficient framework for global sensemaking.
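The k-core decomposition that replaces Leiden here can be sketched with the standard peeling procedure. The version below is a simple illustration, not the linear-time bucket algorithm the abstract refers to, and it does not include the paper's community-construction heuristics; the function name and the adjacency-dict input format are assumptions.

```python
def core_numbers(adj):
    """Return {node: core number} for an undirected graph {node: set(neighbors)}."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel a node of minimum remaining degree. Core numbers are unique,
        # so the result does not depend on how ties are broken.
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core
```

Because a node's core number is a property of the graph alone, repeated runs give the same hierarchy, in contrast to modularity optimization, which may settle on different near-optimal partitions.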

2510.20372 2026-06-03 stat.ML cs.LG econ.EM math.ST stat.ME stat.TH

Testing Most Influential Sets

最具影响力集合的检验

Lucas D. Konrad, Nikolas Kuschnig

发表机构 * Vienna University of Economics and Business(维也纳经济与商业大学) Monash University(莫纳什大学)

AI总结 针对小部分数据点可能过度影响模型结论的问题,基于线性最小二乘法推导精确影响公式并识别最大影响的极值分布,提出一个用于检验过度影响的假设检验框架。

Comments Published as a conference paper at ICLR 2026

AI中文摘要

小的有影响力的数据子集可以极大地影响模型结论,少数数据点可能推翻关键发现。虽然最近的研究识别了这些最具影响力的集合,但没有正式的方法来判断最大影响何时是过度的,而非在自然随机抽样变异下预期的。我们通过开发一个关于最具影响力集合的原则性框架来填补这一空白。聚焦于线性最小二乘法,我们推导了一个方便的精确影响公式,并识别了最大影响的极值分布——对于固定大小的集合和重尾数据是重尾的弗雷歇分布,对于增长集合或轻尾数据是表现良好的耿贝尔分布。这使得我们能够对过度影响进行严格的假设检验。我们通过跨经济学、生物学和机器学习基准的应用,解决了有争议的发现,并用严格的推断取代了临时的启发式方法。

英文摘要

Small influential data subsets can dramatically impact model conclusions, with a few data points overturning key findings. While recent work identifies these most influential sets, there is no formal way to tell when maximum influence is excessive rather than expected under natural random sampling variation. We address this gap by developing a principled framework for most influential sets. Focusing on linear least-squares, we derive a convenient exact influence formula and identify the extreme value distributions of maximal influence: the heavy-tailed Fréchet for constant-size sets and heavy-tailed data, and the well-behaved Gumbel for growing sets or light tails. This allows us to conduct rigorous hypothesis tests for excessive influence. We demonstrate the framework through applications across economics, biology, and machine learning benchmarks, resolving contested findings and replacing ad-hoc heuristics with rigorous inference.
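To make the notion of a most influential set concrete, the sketch below brute-forces the size-k subset whose removal changes a simple regression slope the most. It is only an illustration of the quantity being tested: it uses neither the paper's exact influence formula nor its extreme-value tests, the function names are hypothetical, and it assumes the retained points never all share the same x value.

```python
from itertools import combinations

def ols_slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    return sxy / sxx

def most_influential_set(points, k):
    """Return (indices, |slope change|) for the most influential size-k subset."""
    full = ols_slope(points)
    best_change, best_subset = -1.0, None
    for drop in combinations(range(len(points)), k):
        kept = [p for i, p in enumerate(points) if i not in drop]
        change = abs(ols_slope(kept) - full)
        if change > best_change:
            best_change, best_subset = change, drop
    return best_subset, best_change
```

The search enumerates every subset, so it is practical only for small n and k. Whether the resulting maximal influence is excessive is the question the hypothesis tests in the abstract are designed to answer.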

2603.01471 2026-06-03 cs.IR cs.LG

Reconstructing Content with Collaborative Attention for Universal Multimodal Representation Learning

通过协同注意力重建内容以提升多模态嵌入质量

Jiahan Chen, Da Li, Hengran Zhang, Yinqiong Cai, Lixin Su, Jiafeng Guo, Daiting Shi, Dawei Yin, Keping Bi

发表机构 * State Key Laboratory of AI Safety(人工智能安全国家重点实验室) Institute of Computing Technology, Chinese Academy of Sciences(中国科学院计算技术研究所) University of Chinese Academy of Sciences(中国科学院大学) Baidu Inc.(百度公司)

AI总结 提出CoCoA预训练范式,通过重构注意力流和基于EOS的重建任务,利用协同注意力优化多模态嵌入,使模型将输入语义压缩到<EOS>令牌中,从而提升嵌入质量。

AI中文摘要

多模态嵌入模型,根植于多模态大语言模型(MLLMs),在检索和分类等多样任务中取得了显著的性能提升。然而,现有方法大多严重依赖大规模对比学习,对MLLMs的架构和训练范式如何影响嵌入质量的探索有限。虽然MLLMs的因果注意力和下一个令牌预测范式在生成任务中有效,但并未明确鼓励形成全局紧凑的表示,限制了其作为多模态嵌入骨干的有效性。为解决这一问题,我们提出了CoCoA,一种基于协同注意力的内容重建预训练范式,用于多模态嵌入优化。具体而言,我们重构注意力流并引入基于EOS的重建任务,鼓励模型从相应的<EOS>嵌入中重建输入。这促使多模态模型将输入的语义信息压缩到<EOS>令牌中,为后续的对比学习奠定基础。在MMEB-V1上的大量实验表明,基于Qwen2-VL和Qwen2.5-VL构建的CoCoA显著提升了嵌入质量。结果验证了内容重建作为最大化现有数据价值的有效策略,使多模态嵌入模型能够生成紧凑且信息丰富的表示,提升其性能上限。

英文摘要

Multimodal embedding models, rooted in multimodal large language models (MLLMs), have yielded significant performance improvements across diverse tasks such as retrieval and classification. However, most existing approaches rely heavily on large-scale contrastive learning, with limited exploration of how the architectural and training paradigms of MLLMs affect embedding quality. While effective for generation, the causal attention and next-token prediction paradigm of MLLMs does not explicitly encourage the formation of globally compact representations, limiting their effectiveness as multimodal embedding backbones. To address this, we propose CoCoA, a Content reconstruction pre-training paradigm based on Collaborative Attention for multimodal embedding optimization. Specifically, we restructure the attention flow and introduce an EOS-based reconstruction task, encouraging the model to reconstruct input from the corresponding <EOS> embeddings. This drives the multimodal model to compress the semantic information of the input into the <EOS> token, laying the foundations for subsequent contrastive learning. Extensive experiments on MMEB-V1 demonstrate that CoCoA built upon Qwen2-VL and Qwen2.5-VL significantly improves embedding quality. Results validate that content reconstruction serves as an effective strategy to maximize the value of existing data, enabling multimodal embedding models to generate compact and informative representations and raising their performance ceiling.

2602.20213 2026-06-03 cs.SE cs.AI cs.CR

CodeHacker: Automated Test Case Generation for Detecting Vulnerabilities in Competitive Programming Solutions

CodeHacker: 用于检测竞赛编程解决方案漏洞的自动化测试用例生成

Jingwei Shi, Xinxiang Yin, Jing Huang, Jinman Zhao, Shengyu Tao

发表机构 * Shanghai University of Finance and Economics(上海财经大学) Northwestern Polytechnical University(西北工业大学) Meituan(美团) University of Toronto(多伦多大学)

AI总结 提出CodeHacker框架,通过多策略对抗测试用例生成(压力测试、反哈希攻击、逻辑特定攻击)和校准阶段,有效暴露程序漏洞,提升测试集真负率并增强RL训练模型性能。

AI中文摘要

大型语言模型(LLM)在代码生成方面的评估很大程度上依赖于测试用例的质量和鲁棒性。然而,现有的基准测试往往缺乏对微妙边界情况的覆盖,导致错误的解决方案通过测试。为弥补这一差距,我们提出了CodeHacker,一个自动化的智能体框架,专门用于生成有针对性的对抗性测试用例,以暴露程序提交中的潜在漏洞。模仿竞赛编程中的黑客机制,CodeHacker采用多策略方法,包括压力测试、反哈希攻击和逻辑特定攻击,以破解特定的代码提交。为确保这些攻击的有效性和可靠性,我们引入了一个校准阶段,在该阶段中,智能体在评估参赛者代码之前,通过自生成的对抗性探测迭代地完善自己的验证器和检查器。实验表明,CodeHacker显著提高了现有数据集上的真负率(TNR),有效过滤了先前被接受的错误解决方案。此外,生成的对抗性案例被证明是优越的训练数据,提升了在LiveCodeBench等基准测试上经过强化学习训练的模型的性能。

英文摘要

The evaluation of Large Language Models (LLMs) for code generation relies heavily on the quality and robustness of test cases. However, existing benchmarks often lack coverage for subtle corner cases, allowing incorrect solutions to pass. To bridge this gap, we propose CodeHacker, an automated agent framework dedicated to generating targeted adversarial test cases that expose latent vulnerabilities in program submissions. Mimicking the hack mechanism in competitive programming, CodeHacker employs a multi-strategy approach, including stress testing, anti-hash attacks, and logic-specific targeting, to break specific code submissions. To ensure the validity and reliability of these attacks, we introduce a Calibration Phase, where the agent iteratively refines its own Validator and Checker via self-generated adversarial probes before evaluating contestant code. Experiments demonstrate that CodeHacker significantly improves the True Negative Rate (TNR) of existing datasets, effectively filtering out incorrect solutions that were previously accepted. Furthermore, generated adversarial cases prove to be superior training data, boosting the performance of RL-trained models on benchmarks like LiveCodeBench.
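Of the strategies listed, stress testing is the easiest to illustrate: random small inputs are fed to both a submission and a trusted brute-force reference until the two disagree. The sketch below is a generic version of that idea, not CodeHacker's agent pipeline; the maximum-subarray task, the buggy submission, and all function names are hypothetical.

```python
import random

def stress_test(candidate, reference, gen_input, trials=1000, seed=0):
    """Return the first generated input where the two solutions disagree, else None."""
    rng = random.Random(seed)
    for _ in range(trials):
        case = gen_input(rng)
        if candidate(case) != reference(case):
            return case
    return None

def reference_max_subarray(nums):
    """Brute force: best sum over all non-empty contiguous subarrays."""
    n = len(nums)
    return max(sum(nums[i:j]) for i in range(n) for j in range(i + 1, n + 1))

def buggy_max_subarray(nums):
    """Kadane variant that wrongly allows the empty subarray (answer never below 0)."""
    best = cur = 0
    for x in nums:
        cur = max(0, cur + x)
        best = max(best, cur)
    return best

def small_array(rng):
    """Short arrays with small values make corner cases likely and easy to read."""
    return [rng.randint(-5, 5) for _ in range(rng.randint(1, 6))]
```

Calling `stress_test(buggy_max_subarray, reference_max_subarray, small_array)` returns an all-negative array, the only kind of input on which this particular bug shows. Keeping the generated inputs small is a deliberate choice: the brute-force reference stays fast and the failing case is easy to inspect.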

2602.18690 2026-06-03 q-bio.NC cs.CV cs.LG

Neural Fields as World Models

神经场作为世界模型

Joshua Nunley

发表机构 * Luddy School of Informatics, Computing, and Engineering, Indiana University, Bloomington(信息学、计算与工程学院,印第安纳大学,布卢明顿) Cognitive Science Program, Indiana University, Bloomington(认知科学项目,印第安纳大学,布卢明顿)

AI总结 提出同构世界模型,利用运动门控神经场在空间图中进行物理预测,实现离线任务学习和身体相关表征。

Comments 6 pages, 6 figures. Annual Meeting of the Cognitive Science Society (CogSci 2026)

AI中文摘要

人类可以在离线状态下预演可能的未来,例如在心理练习和可能的梦境中,这表明世界模型可能支持远离环境的学习。标准的机器学习世界模型将视觉输入压缩为潜在向量,丢弃了感觉皮层的空间结构特征。我们提出了同构世界模型:一种保持感觉拓扑结构的架构,使得物理预测成为几何传播而非抽象状态转换。我们通过运动门控神经场实现这一想法,其中活动通过局部侧向连接演化,运动命令乘性地调制特定通道。在三个实验中,相同的架构学习了无“瞬移”的弹道预测,通过将任务误差通过冻结的学习世界模型传播,改进了离线接球策略,并在没有身体标签的情况下发展出身体选择性的运动通道。这些结果提供了初步证据,表明物理预测、离线任务学习和身体相关表征共享一个共同的计算基础:空间地图内的动作条件预测。

英文摘要

Humans rehearse possible futures offline, as in mental practice and perhaps dreaming, suggesting that world models may support task learning away from the environment. Standard machine learning world models compress visual input into latent vectors, discarding the spatial structure that characterizes sensory cortex. We propose isomorphic world models: architectures that preserve sensory topology, so physics prediction becomes geometric propagation rather than abstract state transition. We implement this idea with motor-gated neural fields, where activity evolves through local lateral connectivity and motor commands multiplicatively modulate specific channels. Across three experiments, the same architecture learns ballistic prediction without "teleporting," improves a catching policy offline by propagating task error through a frozen learned world model, and develops body-selective motor channels without body labels. These results provide preliminary evidence that physical prediction, offline task learning, and body-linked representation share a common computational substrate: action-conditional prediction within a spatial map.

2602.12430 2026-06-03 cs.MA cs.AI

Agent Skills for Large Language Models: Architecture, Acquisition, Security, and the Path Forward

大型语言模型的智能体技能:架构、获取、安全与未来路径

Renjun Xu, Yang Yan

发表机构 * ReDiscovery Hangzhou China(杭州ReDiscovery研究院) Westlake University Hangzhou China(西湖大学)

AI总结 本文综述了大型语言模型智能体技能的研究,涵盖架构基础(如SKILL.md规范、渐进式上下文加载)、技能获取(强化学习、自主发现、组合合成)、大规模部署(计算机使用智能体栈、GUI接地)以及安全挑战(26.1%社区技能含漏洞),并提出技能信任与生命周期治理框架。

Comments Accepted by Agent Skills '26 Workshop at ACM Conference on AI and Agentic Systems 2026

AI中文摘要

从单体语言模型向模块化、配备技能的智能体的转变,标志着大型语言模型(LLM)在实践中部署方式的决定性转变。智能体技能——即智能体按需加载的指令、代码和资源的可组合包——无需重新训练即可实现动态能力扩展,而非将所有程序性知识编码在模型权重中。它被形式化为渐进式披露、可移植技能定义以及与模型上下文协议(MCP)集成的范式。本综述全面探讨了智能体技能领域,该领域在过去几个月迅速发展。我们沿四个轴组织该领域:(i)架构基础,考察SKILL.md规范、渐进式上下文加载以及技能与MCP的互补作用;(ii)技能获取,涵盖使用技能库的强化学习、自主技能发现(SEAgent)和组合技能合成;(iii)大规模部署,包括计算机使用智能体(CUA)栈、GUI接地进展以及OSWorld和SWE-bench上的基准进展;(iv)安全,最近的经验分析显示,26.1%的社区贡献技能包含漏洞,这促使我们提出技能信任与生命周期治理框架——一个四层、基于门的权限模型,将技能来源映射到分级部署能力。我们识别出七个开放挑战——从跨平台技能可移植性到基于能力的权限模型——并提出了实现可信、自我改进技能生态系统的研究议程。与先前广泛涵盖LLM智能体或工具使用的综述不同,本工作特别关注新兴的技能抽象层及其对下一代智能体系统的影响。项目仓库:https://github.com/scienceaix/agentskills

英文摘要

The transition from monolithic language models to modular, skill-equipped agents marks a defining shift in how large language models (LLMs) are deployed in practice. Rather than encoding all procedural knowledge within model weights, agent skills -- composable packages of instructions, code, and resources that agents load on demand -- enable dynamic capability extension without retraining. They are formalized in a paradigm of progressive disclosure, portable skill definitions, and integration with the Model Context Protocol (MCP). This survey provides a comprehensive treatment of the agent skills landscape, as it has rapidly evolved during the last few months. We organize the field along four axes: (i) architectural foundations, examining the SKILL.md specification, progressive context loading, and the complementary roles of skills and MCP; (ii) skill acquisition, covering reinforcement learning with skill libraries, autonomous skill discovery (SEAgent), and compositional skill synthesis; (iii) deployment at scale, including the computer-use agent (CUA) stack, GUI grounding advances, and benchmark progress on OSWorld and SWE-bench; and (iv) security, where recent empirical analyses reveal that 26.1% of community-contributed skills contain vulnerabilities, motivating our proposed Skill Trust and Lifecycle Governance Framework -- a four-tier, gate-based permission model that maps skill provenance to graduated deployment capabilities. We identify seven open challenges -- from cross-platform skill portability to capability-based permission models -- and propose a research agenda for realizing trustworthy, self-improving skill ecosystems. Unlike prior surveys that broadly cover LLM agents or tool use, this work focuses specifically on the emerging skill abstraction layer and its implications for the next generation of agentic systems. Project repo: https://github.com/scienceaix/agentskills

2602.10949 2026-06-03 stat.ML cs.LG math.DS math.PR

Optimal Initialization in Depth: Lyapunov Initialization and Limit Theorems for Deep Leaky ReLU Networks

深度网络的最优初始化:深度Leaky ReLU网络的Lyapunov初始化与极限定理

Constantin Kogler, Tassilo Schwarz, Samuel Kittle

发表机构 * School of Mathematics, Institute for Advanced Study(高等研究院数学学院) Mathematical Institute, University of Oxford(牛津大学数学研究所) Max Planck Institute for Multidisciplinary Sciences(马克斯·普朗克多学科科学研究所) Department of Mathematics, University College London(伦敦大学学院数学系)

AI总结 本文通过随机深度Leaky ReLU网络的严格概率分析,提出Lyapunov初始化方法,将Lyapunov指数设为零以确保激活稳定性,从而改善学习效果。

Comments Preprint, 44 pages

AI中文摘要

深度网络的有效初始化需要理解随机神经网络。本文对深度无偏置随机Leaky ReLU网络进行了严格的概率分析。我们证明了网络激活范数对数的强大数定律和中心极限定理,表明随着层数增加,其增长由称为Lyapunov指数的参数控制。该参数刻画了激活消失与爆炸之间的尖锐相变,并针对高斯或正交权重矩阵显式计算了Lyapunov指数。我们的结果表明,标准方法(如He初始化或正交初始化)无法保证低宽度深度网络的激活稳定性。基于这些理论见解,我们提出了一种新的初始化方法,称为Lyapunov初始化,它将Lyapunov指数设为零,从而确保神经网络尽可能稳定,经验上导致学习改进。

英文摘要

Effective initialization in deep networks requires an understanding of random neural networks. In this work, a rigorous probabilistic analysis of deep bias-free random Leaky ReLU networks is provided. We prove a Law of Large Numbers and a Central Limit Theorem for the logarithm of the norm of network activations, establishing that, as the number of layers increases, their growth is governed by a parameter called the Lyapunov exponent. This parameter characterizes a sharp phase transition between vanishing and exploding activations, and we calculate the Lyapunov exponent explicitly for Gaussian or orthogonal weight matrices. Our results reveal that standard methods, such as He initialization or orthogonal initialization, do not guarantee activation stability for deep networks of low width. Based on these theoretical insights, we propose a novel initialization method, referred to as Lyapunov initialization, which sets the Lyapunov exponent to zero and thereby ensures that the neural network is as stable as possible, leading empirically to improved learning.
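The role of the Lyapunov exponent can be illustrated numerically. The sketch below estimates the per-layer growth rate of the log activation norm by Monte Carlo for a bias-free Leaky ReLU network with Gaussian weights of variance `sigma**2 / width`, then rescales `sigma` so that the estimate is zero. It is an assumption-laden stand-in: the paper computes the exponent explicitly, whereas this code simulates it, and the function names, the parameterization, and the default `slope` are hypothetical.

```python
import math
import random

def leaky_relu(x, slope):
    return x if x >= 0 else slope * x

def estimate_lyapunov(width, depth, sigma, slope=0.1, seed=0):
    """Estimate (1/depth) * log(||x_depth|| / ||x_0||) along one random network."""
    rng = random.Random(seed)
    x = [1.0 / math.sqrt(width)] * width           # unit-norm input
    log_growth = 0.0
    std = sigma / math.sqrt(width)                  # entries are N(0, sigma^2 / width)
    for _ in range(depth):
        pre = [sum(rng.gauss(0.0, std) * xi for xi in x) for _ in range(width)]
        x = [leaky_relu(p, slope) for p in pre]
        norm = math.sqrt(sum(v * v for v in x))
        log_growth += math.log(norm)
        x = [v / norm for v in x]                   # renormalize to avoid overflow
    return log_growth / depth

def zero_exponent_sigma(width, depth=2000, slope=0.1, seed=0):
    """Weight scale at which the estimated exponent vanishes."""
    # The network is positively homogeneous, so scaling sigma by c shifts the
    # exponent by exactly log(c); solving for zero gives exp(-lambda(1)).
    return math.exp(-estimate_lyapunov(width, depth, 1.0, slope, seed))
```

A positive estimate corresponds to exploding activations and a negative one to vanishing activations, so the returned scale sits at the transition between the two regimes for the simulated width.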

2602.10387 2026-06-03 cs.DB cs.AI

Test-Time Optimization of Physical Query Plans with LLMs

基于LLM的物理查询计划测试时优化

Mehmet Hamza Erol, Xiangpeng Hao, Federico Bianchi, Ciro Greco, Jacopo Tagliabue, James Zou

发表机构 * Stanford University(斯坦福大学) University of Wisconsin-Madison(威斯康星大学麦迪逊分校) TogetherAI Bauplan

AI总结 提出DBPlanBench框架,利用LLM在测试时通过语义推理和进化搜索优化物理查询计划,在OLAP查询中实现1.05-1.12倍中位数加速,并支持小规模到大规模的迁移。

Comments Code is available at: https://github.com/BauplanLabs/DBPLANBENCH

AI中文摘要

传统查询优化依赖于基于成本的优化器,使用预定义的启发式和统计模型来估计执行成本(如运行时间、内存和I/O)。改进这些需要大量的工程努力,但它们通常无法利用查询和模式中的语义相关性来获得更好的物理计划。然而,大型语言模型(LLMs)能够推理列语义、值分布以及经典统计所忽略的更广泛的领域上下文。我们介绍了DBPlanBench,一个用于DataFusion引擎的框架,它通过紧凑的序列化表示暴露物理计划,并将LLM提出的编辑作为JSON补丁应用。在此框架上,我们实例化了一个测试时优化工作流,其中LLM检查物理查询计划,基于语义推理提出局部编辑,并通过进化搜索在迭代中优化候选方案。我们针对OLAP查询,其中重复执行的重负载使得即使是微小的效率提升也能转化为显著的累积节省。我们特别将评估重点放在连接重排序和连接侧选择上,其中基数估计误差会复合倍增。在TPC-H上中位数加速达到1.10-1.12倍,在TPC-DS上达到1.05-1.07倍,某些查询加速高达4.78倍。我们还证明了在小规模因子下发现的优化可以有效地迁移到更大规模,支持低成本的小规模到大工作流。

英文摘要

Traditional query optimization relies on cost-based optimizers that estimate execution cost (e.g., runtime, memory, and I/O) using predefined heuristics and statistical models. Improving these requires substantial engineering effort, yet they often cannot exploit semantic correlations in queries and schemas that could enable better physical plans. Large language models (LLMs), however, can reason about column semantics, value distributions, and broader domain context that classical statistics miss. We introduce DBPlanBench, a harness for the DataFusion engine that exposes physical plans through a compact serialized representation and applies LLM-proposed edits as JSON patches. On this harness, we instantiate a test-time optimization workflow where an LLM examines physical query plans, proposes localized edits based on semantic reasoning, and an evolutionary search refines the candidates across iterations. We target OLAP queries, where heavy, repeated execution turns even small efficiency gains into substantial cumulative savings. We specifically focus our evaluation on join reordering and join-side selection, where cardinality-estimation errors compound multiplicatively. Median speedups reach $1.10$-$1.12\times$ on TPC-H and $1.05$-$1.07\times$ on TPC-DS, with some achieving up to $4.78\times$. We also demonstrate that optimizations discovered at small scale factors transfer effectively to larger ones, supporting a low-cost small-to-large workflow.

2510.12049 2026-06-03 econ.GN cs.AI q-fin.EC

Generative AI and Sales Productivity: Field Experiments in Online Retail

生成式人工智能与销售效率:在线零售中的现场实验

Lu Fang, Zhe Yuan, Kaifu Zhang, Dante Donati, Miklos Sarvary

发表机构 * Duke University(杜克大学) Imperial Business School(帝国商学院) BIG AI Conference(大数据人工智能会议) MSI AI Forum(MSI人工智能论坛) TSE Digital Economics Conference(TSE数字经济会议) AIML Conference(人工智能与机器学习会议) Operational Innovation Network Summit(运营创新网络峰会) University of Rochester(罗切斯特大学) UC Davis(加州大学戴维斯分校) TUM Workshop on Generative AI in Marketing(慕尼黑工业大学生成AI在营销中的研讨会) UCL School of Management(伦敦大学学院管理学院) Columbia Business School(哥伦比亚商学院) Business & Generative AI Conference(商业与生成AI会议) Zhejiang University School of Management(浙江大学管理学院)

AI总结 通过大规模随机现场实验,量化生成式人工智能(GenAI)对在线零售销售业绩的短期影响,发现GenAI在多数工作流中提升销售额,主要通过提高转化率而非客单价,且对经验较少的消费者效果更显著。

Comments Keywords: Artificial Intelligence, Consumer Experience, Field Experiments, GenAI, Productivity, Retail Platforms, Sales. JEL codes: C93, D24, L81, M31, O3

AI中文摘要

我们通过在一家领先的跨境在线零售平台上进行涉及数百万用户和产品的大规模随机现场实验,量化了生成式人工智能(GenAI)对销售业绩的短期影响。在2023-2024年间,该平台将GenAI整合到七个面向消费者的业务流程中,涵盖客户服务、消费者-产品匹配、广告和卖家服务。我们发现,GenAI的采用在大多数工作流中提高了销售额,效果范围从无显著影响到16.3%,具体取决于GenAI相对于基线公司实践的边际贡献。在四个具有正向销售效果的GenAI应用中,隐含的年增量价值约为5美元——考虑到零售商的规模和GenAI采用的早期阶段,这是一个具有经济意义的影响。收益主要通过更高的转化率而非更大的购物车价值实现,这与GenAI通过减少搜索、信息、沟通和个性化摩擦来改善购物体验相一致。重要的是,这些效应并未与更差的购买后结果相关,因为产品退货率和客户评分没有恶化。最后,我们记录了显著的需求侧异质性,对经验较少的消费者收益更大。我们的发现提供了新颖的大规模因果证据,展示了GenAI如何塑造在线零售的销售效率,突出了其即时价值和更广泛的潜力。

英文摘要

We quantify the short-term impact of Generative Artificial Intelligence (GenAI) on sales performance through a series of large-scale randomized field experiments involving millions of users and products at a leading cross-border online retail platform. Over 2023-2024, the platform integrated GenAI into seven consumer-facing business workflows spanning customer service, consumer-product matching, advertising, and seller services. We find that GenAI adoption increases sales in most workflows, with effects ranging from no detectable impact to $16.3\%$, depending on GenAI's marginal contribution relative to baseline firm practices. Across the four GenAI applications with positive sales effects, the implied annual incremental value is roughly $\$5$, an economically meaningful impact given the retailer's scale and the early stage of GenAI adoption. The gains operate primarily through higher conversion rates rather than larger cart values, consistent with GenAI improving the shopping experience by reducing search, information, communication, and personalization frictions. Importantly, these effects are not associated with worse post-purchase outcomes, as product return rates and customer ratings do not deteriorate. Finally, we document substantial demand-side heterogeneity, with larger gains for less experienced consumers. Our findings provide novel, large-scale causal evidence on how GenAI shapes sales productivity in online retail, highlighting both its immediate value and broader potential.

2601.02380 2026-06-03 cs.CY cs.AI

LLMs, Reasoning and Plagiarism

可反驳性差距:大型语言模型推理验证中的挑战

Elchanan Mossel

发表机构 * Elchanan Mossel

AI总结 本文指出当前声称LLM具备科学发现和通用智能的说法不满足波普尔可反驳性原则,并提出了提高科学透明度和可重复性的指南。

Comments The authors explicitly reserve all rights in this work. No permission is granted for the reproduction, storage, or use of this document for the purpose of training artificial intelligence systems or for text and data mining (TDM), including but not limited to the generation of embeddings, summaries, or synthetic derivatives. Claude and Gemini were used in writing this manuscript

AI中文摘要

最近的报告声称大型语言模型(LLM)已经具备了推导新科学和展现人类级通用智能的能力。我们认为这样的说法并非严谨的科学声明,因为它们不满足波普尔的可反驳性原则(通常称为可证伪性),该原则要求科学陈述能够被证伪。我们识别了当前AI推理研究中的几个方法论陷阱,包括由于不透明且不可搜索的训练数据而无法验证发现的新颖性、由于持续模型更新导致缺乏可重复性,以及省略人机交互记录从而掩盖科学发现的真正来源。此外,缺乏反事实和失败尝试的数据造成了选择偏差,可能夸大LLM的能力。为应对这些挑战,我们提出了关于LLM推理研究的科学透明度和可重复性指南。建立这样的指南对于科学诚信以及当前关于公平数据使用的社会辩论至关重要。我们还讨论了相关问题,如LLM生成的抄袭挑战以及LLM中检索与新颖性的一般问题。

英文摘要

Recent reports claim that Large Language Models (LLMs) derive new science and exhibit human-level general intelligence. Such claims are entangled with two different narratives about what LLMs do: one in which they are an engine of synthesis that genuinely reasons to new knowledge, and one in which they retrieve and re-emit the work of others without attribution. In the scientific setting these are best understood as a contrast between \emph{reasoning} and \emph{plagiarism}. Finding where the truth lies between these two narratives is very challenging, as central components of the model -- the training data and the interaction transcript -- remain opaque. Thus claims of LLM reasoning do not satisfy Popper's refutability principle. We propose guidelines for transparency and reproducibility that will allow reasoning claims to be studied using the scientific method. The dominance of the reasoning narrative, we suggest, is in practice encouraging plagiarism in the scientific literature; we discuss what might be done about it.

2510.12636 2026-06-03 stat.ML cs.LG math.AP

Adapting Noise to Data: Generative Flows from 1D Processes

将噪声适应于数据:来自一维过程的生成流

Jannis Chemseddine, Gregor Kornhardt, Richard Duong, Gabriele Steidl

发表机构 * University of Cambridge(剑桥大学)

AI总结 提出一个通用框架,通过一维分位数函数学习数据自适应的参数化先验分布(潜在噪声),利用噪声与数据之间的Wasserstein距离进行优化,以改善生成流模型对重尾等分布的学习能力。

Comments ICML 2026

详情
AI中文摘要

基于流的生成模型中的默认高斯潜变量在学习某些分布(如重尾分布)时会带来挑战。我们引入了一个通用框架,使用一维分位数函数学习数据自适应的参数化先验分布(潜在噪声),并通过噪声与数据之间的Wasserstein距离进行优化。基于分位数的先验参数化自然地适应重尾分布和紧支撑分布,并缩短传输路径。在重尾天气和图像数据集上的数值结果证实了该方法的灵活性和有效性,且计算开销可忽略不计。

英文摘要

The default Gaussian latent in flow-based generative models poses challenges when learning certain distributions such as heavy-tailed ones. We introduce a general framework for learning data-adaptive parametric prior distributions (latent noise) using one-dimensional quantile functions, optimized via the Wasserstein distance between noise and data. The quantile-based prior parameterization naturally adapts to both heavy-tailed and compactly supported distributions and shortens transport paths. Numerical results on heavy-tailed weather and image datasets confirm the method's flexibility and effectiveness achieved with negligible computational overhead.
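
As a rough illustration of the idea in this abstract, the sketch below draws latent noise by pushing uniform samples through a one-dimensional quantile function and scores it against data with the 1D Wasserstein-1 distance between sorted samples. The Tukey lambda family, the function names, and the toy data are assumptions chosen for illustration; the abstract does not state the paper's actual parameterization or training loop.

```python
import math
import random

def tukey_lambda_quantile(u, lam):
    """Quantile function Q(u) of the Tukey lambda family (an assumed stand-in prior).

    lam < 0 gives heavy tails; lam -> 0 recovers the logistic quantile.
    """
    if abs(lam) < 1e-12:
        return math.log(u / (1.0 - u))
    return (u ** lam - (1.0 - u) ** lam) / lam

def sample_noise(n, lam, rng):
    # Inverse-transform sampling: uniform draws pushed through the quantile function.
    return [tukey_lambda_quantile(rng.uniform(1e-9, 1.0 - 1e-9), lam) for _ in range(n)]

def wasserstein1_1d(xs, ys):
    # For equal-size 1D samples, W1 is the mean absolute gap between order statistics.
    assert len(xs) == len(ys)
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)

rng = random.Random(0)
# Toy heavy-tailed "data": a ratio of two Gaussians is Cauchy-like.
data = [rng.gauss(0.0, 1.0) / max(abs(rng.gauss(0.0, 1.0)), 1e-6) for _ in range(2000)]
for lam in (0.0, -0.5, -1.0):
    print(lam, wasserstein1_1d(sample_noise(2000, lam, rng), data))
```

In a learned version, `lam` (or a richer quantile parameterization) would be tuned to lower the distance to the data; here it is only scanned over three hand-picked values.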

2602.08873 2026-06-03 cs.IR cs.AI cs.CY cs.SI physics.soc-ph

Whose Name Comes Up? II: Benchmarking and Intervention-Based Auditing of LLM-Based Scholar Recommendation

谁的名字出现?II:基于基准测试和干预审计的LLM学者推荐系统

Lisette Espín-Noboa, Gonzalo Gabriel Méndez

发表机构 * Complexity Science Hub Vienna(维也纳复杂性科学中心) Universitat Politècnica de València(瓦伦西亚理工大学) Inria Rennes(法国国家信息与自动化研究所雷恩研究中心)

AI总结 提出LLMScholarBench基准,通过温度变化、表示约束提示和检索增强生成等干预措施审计22个LLM在物理专家推荐中的技术质量和社会代表性,发现干预措施带来不同权衡。

Comments In Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD '26). 30 pages: 11 pages in main (6 figures, 1 table), 19 pages in appendix (22 figures, 2 tables)

详情
AI中文摘要

大型语言模型(LLM)现在被用于学术专家推荐。现有的审计通常孤立地评估此类推荐,忽略了最终用户的推理时干预。因此,尚不清楚失败(例如,拒绝、幻觉、覆盖不均)源于模型选择还是部署决策。我们引入了LLMScholarBench,一个用于审计基于LLM的学者推荐的基准,它联合评估模型基础设施和最终用户在多个任务上的干预。LLMScholarBench使用九个指标衡量技术质量和社会代表性。我们在物理专家推荐中实例化该基准,并在温度变化、表示约束提示和通过网络搜索的检索增强生成(RAG)下审计22个LLM。我们的结果表明,每种干预都带来不同的权衡。较高的温度会降低有效性、一致性和事实性。表示约束提示以牺牲事实性为代价提高了多样性,而RAG主要提高了技术质量,但降低了多样性和平等性。总体而言,最终用户的干预重塑了权衡,而不是提供统一的收益。LLMScholarBench使得在基于LLM的学者推荐中,跨模型和干预的所有这些动态都可审计。

英文摘要

Large language models (LLMs) are now used for academic expert recommendation. Existing audits typically evaluate such recommendations in isolation, ignoring end-user inference-time interventions. Thus, it remains unclear whether failures (e.g., refusals, hallucinations, uneven coverage) stem from model choice or deployment decisions. We introduce LLMScholarBench, a benchmark for auditing LLM-based scholar recommendation that jointly evaluates model infrastructure and end-user interventions across multiple tasks. LLMScholarBench measures technical quality and social representation using nine metrics. We instantiate the benchmark in physics expert recommendation and audit 22 LLMs under temperature variation, representation-constrained prompting, and retrieval-augmented generation (RAG) via web search. Our results show that each intervention entails distinct trade-offs. Higher temperature degrades validity, consistency, and factuality. Representation-constrained prompting improves diversity at the expense of factuality, whereas RAG primarily improves technical quality but reduces diversity and parity. Overall, end-user interventions reshape trade-offs rather than providing uniform gains. LLMScholarBench makes all these dynamics auditable across models and interventions in LLM-based scholar recommendation.

2602.06842 2026-06-03 math.NA cs.LG cs.NA

Are Deep Learning Based Hybrid PDE Solvers Reliable? Why Training Paradigms and Update Strategies Matter

基于深度学习的混合PDE求解器可靠吗?为什么训练范式和更新策略很重要

Yuhan Wu, Jan Willem van Beek, Victorita Dolean, Alexander Heinlein

发表机构 * Delft Institute of Applied Mathematics(代尔夫特应用数学研究所) Delft University of Technology(代尔夫特理工大学) Department of Mathematics and Computer Science(数学与计算机科学系) Eindhoven University of Technology(埃因霍温理工大学) The Netherlands(荷兰)

AI总结 本文研究基于深度学习的混合迭代方法(DL-HIMs)在科学计算中的可靠性问题,发现训练目标与求解器动力学及物理问题不一致会导致残差停滞,并提出物理感知的Anderson加速(PA-AA)方法以恢复可靠收敛。

Comments Accepted manuscript version of an article accepted for publication in IEEE Computing in Science & Engineering. The final published version will be available through IEEE Xplore

详情
AI中文摘要

基于深度学习的混合迭代方法(DL-HIMs)将经典数值求解器与神经算子相结合,利用它们互补的谱偏差来加速收敛。尽管有这一前景,许多DL-HIMs在假固定点处停滞,此时神经更新消失而物理残差仍然很大,这引发了对其在科学计算中可靠性的质疑。在本文中,我们提供证据表明,即使神经架构固定,性能对训练范式和更新策略高度敏感。通过对基于DeepONet的混合迭代数值可转移求解器(HINTS)和基于FFT的傅里叶神经求解器(FNS)的详细研究,我们展示了当训练目标与求解器动力学和问题物理不一致时,显著的物理残差可能持续存在。我们进一步研究了Anderson加速(AA),并证明其经典形式不适用于非线性神经算子。为了克服这一点,我们引入了物理感知的Anderson加速(PA-AA),它最小化物理残差而非固定点更新。数值实验证实,PA-AA在显著更少的迭代次数内恢复了可靠收敛。这些发现为围绕基于AI的PDE求解器的持续争议提供了具体答案:可靠性不仅取决于架构,还取决于物理信息驱动的训练和迭代设计。

英文摘要

Deep learning-based hybrid iterative methods (DL-HIMs) integrate classical numerical solvers with neural operators, utilizing their complementary spectral biases to accelerate convergence. Despite this promise, many DL-HIMs stagnate at false fixed points where neural updates vanish while the physical residual remains large, raising questions about reliability in scientific computing. In this paper, we provide evidence that performance is highly sensitive to training paradigms and update strategies, even when the neural architecture is fixed. Through a detailed study of a DeepONet-based hybrid iterative numerical transferable solver (HINTS) and an FFT-based Fourier neural solver (FNS), we show that significant physical residuals can persist when training objectives are not aligned with solver dynamics and problem physics. We further examine Anderson acceleration (AA) and demonstrate that its classical form is ill-suited for nonlinear neural operators. To overcome this, we introduce physics-aware Anderson acceleration (PA-AA), which minimizes the physical residual rather than the fixed-point update. Numerical experiments confirm that PA-AA restores reliable convergence in substantially fewer iterations. These findings provide a concrete answer to ongoing controversies surrounding AI-based PDE solvers: reliability hinges not only on architectures but on physically informed training and iteration design.
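
To make the contrast between "fixed-point update" and "physical residual" concrete, here is a minimal sketch on a linear model problem A x = b. It mixes two successive iterates with the coefficient that minimizes the physical residual norm, which is the general idea the abstract attributes to PA-AA. The depth-one mixing, the linear system, and all function names are assumptions for illustration, not the paper's algorithm.

```python
import math

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def residual(A, b, x):
    # Physical residual r(x) = b - A x of the (assumed linear) discretized problem.
    return [bi - yi for bi, yi in zip(b, matvec(A, x))]

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

def norm(u):
    return math.sqrt(dot(u, u))

def residual_minimizing_mix(A, b, x_new, x_old):
    """Return (1 - alpha) * x_new + alpha * x_old with alpha chosen to minimize ||b - A x||.

    Because r(x) is affine in x, the mixed residual is r_new - alpha * (r_new - r_old),
    so alpha has the closed-form least-squares solution used below.
    """
    r_new, r_old = residual(A, b, x_new), residual(A, b, x_old)
    d = [rn - ro for rn, ro in zip(r_new, r_old)]
    dd = dot(d, d)
    alpha = dot(r_new, d) / dd if dd > 0.0 else 0.0
    return [(1.0 - alpha) * xn + alpha * xo for xn, xo in zip(x_new, x_old)]

A = [[2.0, 0.0], [0.0, 1.0]]
b = [2.0, 1.0]
x_old, x_new = [0.0, 0.0], [0.5, 0.5]
x_mix = residual_minimizing_mix(A, b, x_new, x_old)
print(norm(residual(A, b, x_new)), norm(residual(A, b, x_mix)))
```

Since alpha = 0 and alpha = 1 are both admissible, the mixed iterate never has a larger physical residual than either input iterate, regardless of how small the last update was.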

2511.12085 2026-06-03 cs.CR cs.AI cs.LG

A Robust and Explainable Transformer-Based Framework for Phishing Email Detection

一种鲁棒且可解释的基于Transformer的钓鱼邮件检测框架

Sajad U P

发表机构 * Independent Researcher(独立研究者)

AI总结 提出基于DistilBERT的轻量级钓鱼邮件检测框架,通过梯度对抗训练和字符级噪声增强鲁棒性,并集成LIME、SHAP和IG三种可解释AI方法,结合Flan-T5-Small生成自然语言解释,提升检测准确性和用户信任。

详情
AI中文摘要

钓鱼及相关网络威胁正变得越来越复杂,基于电子邮件的钓鱼仍然是最持久的攻击载体。这些攻击利用人类漏洞来传递恶意软件或获取对敏感信息的未授权访问。基于Transformer的模型通过强大的上下文语言理解增强了钓鱼检测;然而,由于缺乏可解释性,它们通常被视为黑盒。此外,最近的AI驱动攻击进一步削弱了模型的韧性。为了解决这些挑战,本文提出了一种基于DistilBERT(一种轻量级Transformer模型)的轻量级钓鱼检测框架。通过使用快速梯度法(FGM)进行基于梯度的对抗训练,并结合随机字符级扰动,增强了对嵌入级扰动和字符级输入噪声的鲁棒性。为了提高透明度,集成了三种突出的可解释AI(XAI)方法:LIME(局部可解释模型无关解释)、SHAP(SHapley Additive exPlanations)和IG(积分梯度),以解释模型决策。一个结构化的基于规则的提示结合模型预测和XAI特征,引导Flan-T5-Small生成通俗易懂、基于证据的解释。实验结果表明,所提出的框架在准确性和韧性方面优于未经鲁棒性增强的标准DistilBERT检测模型。这种集成方法有助于弥合模型可靠性与用户信任之间的差距,推动透明钓鱼检测的发展。

英文摘要

Phishing and related cyber threats are becoming increasingly sophisticated, with email-based phishing remaining the most persistent attack vector. These attacks exploit human vulnerabilities to deliver malware or gain unauthorized access to sensitive information. Transformer-based models enhance phishing detection through robust contextual language understanding; yet they are often regarded as black boxes due to a lack of interpretability. Moreover, recent AI-enabled attacks further undermine model resilience. To address these challenges, this work proposes a lightweight phishing detection framework based on DistilBERT, a lightweight Transformer model. Robustness to embedding-level perturbations and character-level input noise is enhanced through gradient-based adversarial training using the Fast Gradient Method (FGM), combined with stochastic character-level perturbations. To improve transparency, three prominent Explainable AI (XAI) methods, LIME (Local Interpretable Model-agnostic Explanations), SHAP (SHapley Additive exPlanations), and IG (Integrated Gradients), are integrated to interpret model decision-making. A structured rule-based prompt combines model predictions and XAI features to guide Flan-T5-Small in generating plain-language, evidence-based explanations. Experimental results demonstrate that the proposed framework outperforms a standard DistilBERT-based detection model trained without robustness enhancements in terms of accuracy and resilience. This integrated approach helps bridge the gap between model reliability and user trust, advancing transparent phishing detection.
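
The Fast Gradient Method step mentioned above can be sketched in a few lines: the embedding is perturbed along the L2-normalized loss gradient, scaled by a budget epsilon. The sketch below works on plain Python lists; the function names and the epsilon value are illustrative assumptions, and the abstract does not specify how the paper wires this into DistilBERT training.

```python
import math

def fgm_perturbation(grad, epsilon):
    """FGM perturbation r = epsilon * g / ||g||_2 for an embedding gradient g."""
    g_norm = math.sqrt(sum(g * g for g in grad))
    if g_norm == 0.0:
        # No gradient signal: leave the embedding unperturbed.
        return [0.0 for _ in grad]
    return [epsilon * g / g_norm for g in grad]

def perturb_embedding(embedding, grad, epsilon=1.0):
    # Adversarial embedding used for the extra forward/backward pass during training.
    return [e + r for e, r in zip(embedding, fgm_perturbation(grad, epsilon))]

print(perturb_embedding([0.1, -0.2], [3.0, 4.0], epsilon=0.5))
```

In adversarial training, the loss on the perturbed embedding is typically added to the clean loss before the optimizer step, and the original embedding is restored afterwards.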

2601.10222 2026-06-03 math.NA cs.AI cs.NA math.OC

Introduction to optimization methods for training SciML models

训练科学机器学习模型的优化方法导论

Alena Kopaničáková, Elisa Riccietti

发表机构 * Toulouse-INP, IRIT-APO, ANITI(图卢兹INP、IRIT-APO、ANITI) ENS de Lyon, CNRS, Inria, Université Claude Bernard Lyon 1, LIP, UMR 5668(里昂高等师范学院、法国国家科学研究中心、法国国家信息与自动化研究所、克洛德·贝尔纳里昂第一大学、LIP、UMR 5668)

AI总结 本文统一介绍了机器学习和科学机器学习中的优化方法,强调问题结构如何影响算法选择,并讨论了物理约束和数据驱动SciML模型的实用策略。

详情
AI中文摘要

优化是现代机器学习(ML)和科学机器学习(SciML)的核心,但底层优化问题的结构在这些领域之间存在显著差异。经典ML通常依赖于随机、样本可分离的目标,这有利于一阶和自适应梯度方法。相比之下,SciML通常涉及物理信息或算子约束的公式,其中微分算子导致损失景观中的全局耦合、刚性和强各向异性。因此,SciML中的优化行为由底层物理模型的谱特性而非数据统计决定,这常常限制了标准随机方法的有效性,并促使采用确定性或曲率感知的方法。本文提供了ML和SciML中优化方法的统一介绍,强调问题结构如何塑造算法选择。我们回顾了确定性和随机设置中的一阶和二阶优化技术,讨论了它们对物理约束和数据驱动SciML模型的适应,并通过教程示例说明了实用策略,同时突出了科学计算和科学机器学习交叉领域的开放研究方向。

英文摘要

Optimization is central to both modern machine learning (ML) and scientific machine learning (SciML), yet the structure of the underlying optimization problems differs substantially across these domains. Classical ML typically relies on stochastic, sample-separable objectives that favor first-order and adaptive gradient methods. In contrast, SciML often involves physics-informed or operator-constrained formulations in which differential operators induce global coupling, stiffness, and strong anisotropy in the loss landscape. As a result, optimization behavior in SciML is governed by the spectral properties of the underlying physical models rather than by data statistics, frequently limiting the effectiveness of standard stochastic methods and motivating deterministic or curvature-aware approaches. This document provides a unified introduction to optimization methods in ML and SciML, emphasizing how problem structure shapes algorithmic choices. We review first- and second-order optimization techniques in both deterministic and stochastic settings, discuss their adaptation to physics-constrained and data-driven SciML models, and illustrate practical strategies through tutorial examples, while highlighting open research directions at the interface of scientific computing and scientific machine learning.

2601.04120 2026-06-03 math.OC cs.LG

A Single-Loop Bilevel Deep Learning Method for Optimal Control of Obstacle Problems

障碍问题最优控制的单环双层深度学习方法

Yongcun Song, Shangzhi Zeng, Jin Zhang, Lvgang Zhang

发表机构 * SUSTech(南方科技大学)

AI总结 提出一种无网格、可扩展的单环双层深度学习方法,通过约束嵌入神经网络和单环随机一阶双层算法高效求解障碍问题的最优控制。

详情
AI中文摘要

障碍问题的最优控制出现在广泛的应用中,由于其非光滑性、非线性和双层结构,计算上具有挑战性。经典数值方法依赖于基于网格的离散化,通常需要求解一系列代价高昂的子问题。在这项工作中,我们提出了一种单环双层深度学习方法,该方法无网格、可扩展到高维和复杂域,并避免重复求解离散子问题。该方法采用约束嵌入神经网络来逼近状态和控制,并保持双层结构。为了高效训练神经网络,我们提出了一种单环随机一阶双层算法(S2-FOBA),该算法消除了嵌套优化,并且不依赖于限制性的下层唯一性假设。我们在温和假设下分析了S2-FOBA的收敛行为。在基准示例上的数值实验,包括复杂域上具有规则和不规则障碍的分布控制和障碍控制问题,表明与经典数值方法相比,所提出的方法在降低计算成本的同时实现了令人满意的精度。

英文摘要

Optimal control of obstacle problems arises in a wide range of applications and is computationally challenging due to its nonsmoothness, nonlinearity, and bilevel structure. Classical numerical approaches rely on mesh-based discretization and typically require solving a sequence of costly subproblems. In this work, we propose a single-loop bilevel deep learning method, which is mesh-free, scalable to high-dimensional and complex domains, and avoids repeated solution of discretized subproblems. The method employs constraint-embedding neural networks to approximate the state and control and preserves the bilevel structure. To train the neural networks efficiently, we propose a Single-Loop Stochastic First-Order Bilevel Algorithm (S2-FOBA), which eliminates nested optimization and does not rely on restrictive lower-level uniqueness assumptions. We analyze the convergence behavior of S2-FOBA under mild assumptions. Numerical experiments on benchmark examples, including distributed and obstacle control problems with regular and irregular obstacles on complex domains, demonstrate that the proposed method achieves satisfactory accuracy while reducing computational cost compared to classical numerical methods.

2512.16882 2026-06-03 physics.chem-ph cond-mat.mtrl-sci cs.LG

A Cartesian-3j Framework for Machine Learning Interatomic Potentials

机器学习原子间势的 Cartesian-3j 框架

Zemin Xu, Chenyu Wu, Wenbo Xie, P. Hu

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出基于Cartesian-3j符号和Cartesian广义Clebsch-Gordan系数的不可约Cartesian张量框架,构建MACE、NequIP和Allegro的Cartesian版本,并引入TACE-v1-OAM-M模型在Matbench Discovery上取得竞争性能。

详情
AI中文摘要

机器学习原子间势(MLIPs)在计算化学的外推能力方面带来了显著提升。然而,大多数等变模型通常使用球张量(STs)构建,而笛卡尔张量公式尽管与原子坐标和张量目标自然对齐,但仍未得到充分发展。在这项工作中,我们通过引入\texttt{Cartesian-3j}符号和Cartesian广义Clebsch-Gordan系数,为不可约Cartesian张量(ICTs)开发了一个Cartesian框架,这些符号和系数直接类比于为ST耦合定义的\texttt{Wigner-3j}符号和广义Clebsch-Gordan系数。我们扩展了\texttt{e3nn}库以支持ICT乘积,并使用该框架构建了\texttt{MACE}、\texttt{NequIP}和\texttt{Allegro}的Cartesian对应版本,从而首次实现了在固定架构仅改变张量基下的受控比较。我们的实验表明,不可约Cartesian模型可以达到与球面对应版本相当的精度,但直接Cartesian化会导致不利的计算和内存缩放,这促使我们采用专门的Cartesian架构选择。利用ICTs和我们的框架,我们引入了\texttt{TACE-v1-OAM-M},并证明它在Matbench Discovery上取得了与最先进ST模型竞争的性能。

英文摘要

Machine learning interatomic potentials (MLIPs) have brought substantial gains in extrapolation capability in computational chemistry. However, most equivariant models are built with spherical tensors (STs), while Cartesian tensor formulations remain less developed despite their natural alignment with atomic coordinates and tensorial targets. In this work, we develop a Cartesian framework for irreducible Cartesian tensors (ICTs) by introducing the \texttt{Cartesian-3j} symbol and Cartesian generalized Clebsch-Gordan coefficients, which serve as direct analogues of the \texttt{Wigner-3j} symbol and generalized Clebsch-Gordan coefficients defined for ST coupling. We extend the \texttt{e3nn} library to support ICT products, and use this framework to build Cartesian counterparts of \texttt{MACE}, \texttt{NequIP}, and \texttt{Allegro}, allowing the first controlled comparison where architectures are held fixed and only the tensor basis is changed. Our experiments show that irreducible Cartesian models can achieve accuracy comparable to spherical counterparts, but direct Cartesianization incurs unfavorable compute and memory scaling, motivating dedicated Cartesian architectural choices. Leveraging ICTs and our framework, we introduce \texttt{TACE-v1-OAM-M} and demonstrate that it achieves competitive performance on Matbench Discovery compared to state-of-the-art ST models.

2511.17126 2026-06-03 eess.IV cs.CV cs.LG physics.optics

Towards Blind Lens Aberration Correction via Large LensLib Pre-training and Discrete Degradation Priors

面向盲镜头像差校正的大规模LensLib预训练与离散退化先验

Xiaolong Qian, Qi Jiang, Yao Gao, Lei Sun, Kailun Yang, Xian Wang, Zhonghua Yi, Wenyong Li, Ming-Hsuan Yang, Luc Van Gool, Kaiwei Wang

发表机构 * National Research Center for Optical Instrumentation, Zhejiang University(浙江大学国家光学仪器研究中心) INSAIT, Sofia University "St. Kliment Ohridski"(INSAIT,索菲亚大学"圣克莱门特·奥赫里德斯基") School of Artificial Intelligence and Robotics, Hunan University(湖南大学人工智能与机器人学院) National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University(湖南大学机器人视觉感知与控制技术国家工程研究中心)

AI总结 提出FoundCAC框架,通过构建大规模无偏镜头库AODLibpro和离散退化先验LPR,解决数据扩展与先验缺失问题,实现盲镜头像差校正的零样本泛化和高效少样本适应。

Comments Accepted to 2026 IEEE International Conference on Computational Photography (ICCP). The source code and datasets will be made publicly available at https://github.com/zju-jiangqi/FoundCAC

详情
AI中文摘要

新兴的基于深度学习的镜头库预训练(LensLib-PT)流程通过训练通用神经网络,为盲镜头像差校正提供了新途径,展现出处理多种未知光学退化的强大能力。本文提出FoundCAC,一个通用的基础框架,解决了阻碍现有流程泛化的两个挑战:训练数据扩展的困难以及缺乏表征光学退化的先验指导。为提高数据可扩展性,我们扩展设计规范以增加退化多样性,并基于均匀采样策略构建了大规模无偏镜头库AODLibpro,该策略量化了空间变化模式和严重程度。在模型设计方面,为利用点扩散函数(PSF)作为指导同时保持盲范式,我们提出了一种多阶段向量量化表示学习方案。该范式专门设计用于构建潜在PSF表示(LPR),将复杂的连续PSF显式编码为离散退化先验,以规范高度病态的恢复过程。通过简单而有效的码本冻结策略,我们的框架利用离散先验提升全样本恢复性能,并实现对未见镜头的高效少样本适应。在合成LensLib和真实镜头的多种像差上的实验表明,我们的框架实现了最先进的零样本泛化,同时支持针对特定镜头的高效少样本适应。源代码和数据集将在https://github.com/zju-jiangqi/FoundCAC公开提供。

英文摘要

The emerging deep-learning-based lens library pre-training (LensLib-PT) pipeline offers a new avenue for blind lens aberration correction by training a universal neural network that shows strong capability in handling diverse unknown optical degradations. This work proposes FoundCAC, a universal foundational framework that resolves two challenges hindering the generalization of existing pipelines: the difficulty of scaling training data and the absence of prior guidance characterizing optical degradation. To improve data scalability, we expand the design specifications to increase degradation diversity and construct AODLibpro, a large-scale, unbiased lens library based on a uniform sampling strategy that quantifies spatial-variation patterns and severity. In terms of model design, to leverage Point Spread Functions (PSFs) as guidance while maintaining the blind paradigm, we propose a multi-stage vector-quantized representation learning scheme. This paradigm is specifically designed to construct a Latent PSF Representation (LPR), explicitly encoding complex continuous PSFs into a discrete degradation prior to regularize the highly ill-posed restoration process. Through a simple yet effective codebook-freezing strategy, our framework leverages the discrete prior to elevate full-shot restoration performance and unlock highly efficient few-shot adaptation for unseen lenses. Experiments on diverse aberrations of synthetic LensLib and real-world lenses demonstrate that our framework achieves state-of-the-art zero-shot generalization while enabling highly efficient few-shot adaptation for specific lenses. The source code and datasets will be made publicly available at https://github.com/zju-jiangqi/FoundCAC.

2511.05050 2026-06-03 stat.ML cs.LG stat.ME

Estimating Bidirectional Causal Effects with Large Scale Online Kernel Learning

基于大规模在线核学习的双向因果效应估计

Masahiro Tanaka

发表机构 * Japan Society for the Promotion of Science(日本学术振兴会)

AI总结 提出一种可扩展的在线核学习框架,结合异方差识别和拟极大似然估计,用于估计存在相互依赖和异方差系统中的双向因果效应,并通过随机傅里叶特征和自适应在线梯度下降实现高效计算。

Journal ref Proceedings of the 2025 International Conference on Data Science and Intelligent Systems (DSIS 2025), Article 65, pp. 449-455

详情
AI中文摘要

本研究提出一种可扩展的在线核学习框架,用于估计以相互依赖和异方差为特征的系统中的双向因果效应。传统因果推断通常关注单向效应,忽略了现实世界中常见的双向关系。基于异方差识别,该方法将联立方程模型的拟极大似然估计与大规模在线核学习相结合。它采用随机傅里叶特征逼近来灵活建模非线性条件均值和方差,同时自适应在线梯度下降算法确保了对流式和高维数据的计算效率。大量模拟结果表明,与单方程和多项式逼近基线相比,该方法在多种数据生成过程中实现了更高的准确性和稳定性,偏差和均方根误差更低。这些结果证实,该方法以近线性计算扩展有效捕获了复杂的双向因果效应。通过将计量经济学识别与现代机器学习技术相结合,所提框架为自然科学/社会科学、政策制定、商业和工业应用中的大规模因果推断提供了一种实用、可扩展且理论扎实的解决方案。

英文摘要

In this study, a scalable online kernel learning framework is proposed for estimating bidirectional causal effects in systems characterized by mutual dependence and heteroskedasticity. Traditional causal inference often focuses on unidirectional effects, overlooking the common bidirectional relationships in real-world phenomena. Building on heteroskedasticity-based identification, the proposed method integrates a quasi-maximum likelihood estimator for simultaneous equation models with large-scale online kernel learning. It employs random Fourier feature approximations to flexibly model nonlinear conditional means and variances, while an adaptive online gradient descent algorithm ensures computational efficiency for streaming and high-dimensional data. Results from extensive simulations demonstrate that the proposed method achieves higher accuracy and stability than single-equation and polynomial approximation baselines, exhibiting lower bias and root mean squared error across various data-generating processes. These results confirm that the proposed approach effectively captures complex bidirectional causal effects with near-linear computational scaling. By combining econometric identification with modern machine learning techniques, the proposed framework offers a practical, scalable, and theoretically grounded solution for large-scale causal inference in natural/social science, policy making, business, and industrial applications.
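
The random Fourier feature approximation named in this abstract can be illustrated with a small, standard-library sketch: a Gaussian kernel is approximated by the inner product of randomized cosine features, which is what lets kernel models be trained with online gradient steps on a fixed-size feature vector. The feature count, bandwidth, and function names are assumptions for illustration only; the quasi-likelihood model itself is not reproduced here.

```python
import math
import random

def make_rff(dim, num_features, sigma, rng):
    """Build a feature map z(x) with z(x) . z(y) ~= exp(-||x - y||^2 / (2 sigma^2))."""
    # Frequencies are drawn from N(0, sigma^-2 I); phases are uniform on [0, 2 pi).
    ws = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)] for _ in range(num_features)]
    bs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def features(x):
        return [scale * math.cos(sum(wi * xi for wi, xi in zip(w, x)) + b)
                for w, b in zip(ws, bs)]

    return features

def gaussian_kernel(x, y, sigma):
    sq_dist = sum((a - c) ** 2 for a, c in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

phi = make_rff(dim=2, num_features=4000, sigma=1.0, rng=random.Random(0))
x, y = [0.3, -0.2], [0.1, 0.4]
approx = sum(a * c for a, c in zip(phi(x), phi(y)))
print(approx, gaussian_kernel(x, y, 1.0))
```

The approximation error shrinks roughly like one over the square root of the number of features, so the feature count trades accuracy against per-step cost.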

2511.13899 2026-06-03 q-bio.NC cs.CE cs.LG

A Factorized Low-Rank RNN Framework for Uncovering Independent Neural Latent Dynamics and Connectivity

一种分解低秩RNN框架用于揭示独立神经潜在动力学和连接性

Chengrui Li, Yunmiao Wang, Yule Wang, Weihan Li, Dieter Jaeger, Anqi Wu

发表机构 * University of California, San Diego(加州大学圣迭戈分校)

AI总结 提出FacRNN框架,通过组间独立假设和部分相关惩罚,在低秩循环神经网络中实现潜在动力学的解耦与可解释性提升。

详情
AI中文摘要

低秩循环神经网络(lrRNN)是一类揭示神经群体活动背后低维潜在动力学的模型。尽管其功能连接是低秩的,但缺乏独立性解释,使得难以将不同的计算角色分配给不同的潜在维度。为了解决这个问题,我们提出了分解循环神经网络(FacRNN),这是一种生成式lrRNN框架,它假设潜在动力学之间具有组间独立性,同时允许组内灵活纠缠。这些独立的潜在组允许潜在动力学分别演化,但内部丰富以进行复杂计算。我们在变分自编码器(VAE)框架下重新表述lrRNN,从而引入部分相关惩罚,鼓励潜在维度组之间的独立性。在合成数据、猴子M1和小鼠电压成像数据上的实验表明,与不鼓励组间独立性的基线lrRNN相比,FacRNN持续改善了在低维空间和低秩连接中学到的神经潜在轨迹的解耦性和可解释性。

英文摘要

Low-rank recurrent neural networks (lrRNNs) are a class of models that uncover low-dimensional latent dynamics underlying neural population activity. Although their functional connectivity is low-rank, it lacks independence interpretations, making it difficult to assign distinct computational roles to different latent dimensions. To address this, we propose the Factored Recurrent Neural Network (FacRNN), a generative lrRNN framework that assumes group-wise independence among latent dynamics while allowing flexible within-group entanglement. These independent latent groups allow latent dynamics to evolve separately, but are internally rich for complex computation. We reformulate the lrRNN under a variational autoencoder (VAE) framework, enabling us to introduce a partial correlation penalty that encourages independence between groups of latent dimensions. Experiments on synthetic, monkey M1, and mouse voltage imaging data show that FacRNN consistently improves the disentanglement and interpretability of learned neural latent trajectories in low-dimensional space and low-rank connectivity over baseline lrRNNs that do not encourage group-wise independence.

2511.12482 2026-06-03 quant-ph cs.LG

Discovering autonomous quantum error correction via deep reinforcement learning

通过深度强化学习发现自主量子纠错

Yue Yin, Tailong Xiao, Xiaoyang Deng, Ming He, Jianping Fan, Guihua Zeng

发表机构 * Zhiyuan College, Shanghai Jiao Tong University, Shanghai 200240, P.R. China(上海交通大学致远学院) State Key Laboratory of Photonics and Communications, Institute for Quantum Sensing and Information Processing, Shanghai Jiao Tong University, Shanghai 200240, P.R. China(上海交通大学光子通信国家重点实验室) Hefei National Laboratory, Hefei, 230088, P.R. China(合肥国家实验室) Shanghai Research Center for Quantum Sciences, Shanghai, 201315, P.R. China(上海量子科学研究中心) AI Lab, Lenovo Research, Beijing 100094, P.R. China(联想AI实验室)

AI总结 本文利用课程学习启发的深度强化学习,在近似自主量子纠错框架下发现抵抗单光子和双光子损失的玻色子码,并实现超越盈亏平衡点的最优码字。

Journal ref Phys. Rev. A 112, 062618 (2025)

详情
AI中文摘要

量子纠错对于容错量子计算至关重要。然而,依赖主动测量的标准方法可能会引入额外错误。自主量子纠错(AQEC)通过利用玻色子系统中的工程耗散和驱动来规避这一问题,但由于严格的Knill-Laflamme条件,识别实用的编码仍然具有挑战性。在本工作中,我们利用课程学习启发的深度强化学习,在近似AQEC框架下发现抵抗单光子和双光子损失的玻色子码。我们提出了在近似条件下求解主方程的解析解,这可以显著加速强化学习的训练过程。智能体首先通过在受限演化时间框架内快速探索,识别出超越盈亏平衡点的编码子空间,然后策略性地微调其策略,以在更长的时间范围内维持这一性能优势。我们发现,经过两阶段训练的智能体能够发现最优码字集合,即考虑单光子和双光子损失效应的Fock态$\ket{4}$和$\ket{7}$。我们识别出该码在更长的演化时间内超越了盈亏平衡阈值,并达到了最先进的性能。我们还分析了该码对相位阻尼和振幅阻尼噪声的鲁棒性。我们的工作突显了课程学习启发的深度强化学习在发现最优量子纠错码方面的潜力,特别是在早期容错量子系统中。

英文摘要

Quantum error correction is essential for fault-tolerant quantum computing. However, standard methods relying on active measurements may introduce additional errors. Autonomous quantum error correction (AQEC) circumvents this by utilizing engineered dissipation and drives in bosonic systems, but identifying practical encodings remains challenging due to the stringent Knill-Laflamme conditions. In this work, we use curriculum-learning-enabled deep reinforcement learning to discover bosonic codes, under an approximate AQEC framework, that resist both single-photon and double-photon losses. We present an analytical solution of the master equation under approximation conditions, which can significantly accelerate the training process of reinforcement learning. The agent first identifies an encoded subspace surpassing the breakeven point through rapid exploration within a constrained evolution time frame, then strategically fine-tunes its policy to sustain this performance advantage over extended temporal horizons. We find that the two-phase trained agent can discover the optimal set of codewords, i.e., the Fock states $\ket{4}$ and $\ket{7}$, considering the effect of both single-photon and double-photon loss. We find that the discovered code surpasses the breakeven threshold over a longer evolution time and achieves state-of-the-art performance. We also analyze the robustness of the code against phase damping and amplitude damping noise. Our work highlights the potential of curriculum-learning-enabled deep reinforcement learning in discovering optimal quantum error-correcting codes, especially in early fault-tolerant quantum systems.

2511.02986 2026-06-03 stat.ML cs.LG q-bio.GN

Scalable Single-Cell Gene Expression Generation with Latent Diffusion Models

基于潜在扩散模型的可扩展单细胞基因表达生成

Giovanni Palla, Sudarshan Babu, Payam Dibaeinia, James D. Pearce, Donghui Li, Aly A. Khan, Theofanis Karaletsos, Jakub M. Tomczak

发表机构 * University of Cambridge(剑桥大学)

AI总结 提出scLDM,一种结合变分自编码器和潜在扩散模型的可扩展生成方法,通过置换不变/等变架构和扩散Transformer实现高质量单细胞基因表达生成。

Comments Accepted to ICML 2026, Github: https://github.com/czi-ai/scldm/

详情
AI中文摘要

单细胞基因表达的计算建模对于理解细胞过程至关重要,但生成真实的表达谱仍然是一个主要挑战。这一困难源于基因表达数据的计数性质以及基因之间复杂的潜在依赖性。现有的生成模型通常强加人工基因排序或依赖浅层神经网络架构。我们引入了一种可扩展的潜在扩散模型用于单细胞基因表达数据,称为scLDM,该模型尊重数据的基本可交换性属性。我们的VAE使用固定大小的潜在变量,利用统一的多头交叉注意力块(MCAB)架构,该架构具有双重作用:编码器中的置换不变池化和解码器中的置换等变反池化。我们通过用使用扩散Transformer和线性插值的潜在扩散模型替换高斯先验来增强这一框架,从而通过多条件无分类器引导实现高质量生成。我们在观察性和扰动性单细胞数据的多种实验以及下游任务(如细胞水平分类)中展示了其优越性能。

英文摘要

Computational modeling of single-cell gene expression is crucial for understanding cellular processes, but generating realistic expression profiles remains a major challenge. This difficulty arises from the count nature of gene expression data and complex latent dependencies among genes. Existing generative models often impose artificial gene orderings or rely on shallow neural network architectures. We introduce a scalable latent diffusion model for single-cell gene expression data, which we refer to as scLDM, that respects the fundamental exchangeability property of the data. Our VAE uses fixed-size latent variables leveraging a unified Multi-head Cross-Attention Block (MCAB) architecture, which serves dual roles: permutation-invariant pooling in the encoder and permutation-equivariant unpooling in the decoder. We enhance this framework by replacing the Gaussian prior with a latent diffusion model using Diffusion Transformers and linear interpolants, enabling high-quality generation with multi-conditional classifier-free guidance. We show its superior performance in a variety of experiments for both observational and perturbational single-cell data, as well as downstream tasks like cell-level classification.

2511.02304 2026-06-03 cs.MA cs.AI cs.CL cs.FL cs.LG

Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning

自动机条件化协作多智能体强化学习

Beyazit Yalcinkaya, Marcell Vazquez-Chanlatte, Ameesh Shah, Hanna Krasowski, Sanjit A. Seshia

发表机构 * Massachusetts Institute of Technology(麻省理工学院) Stanford University(斯坦福大学)

AI总结 提出自动机条件化协作多智能体强化学习框架,通过自动机分解团队目标为子任务,学习任务条件化的分散策略,实现最优任务分配和多步协调。

详情
AI中文摘要

我们研究在集中训练、分散执行下,针对协作性时间目标的多任务、多智能体策略学习。在此设置中,使用自动机表示分配给智能体的任务,能够将团队级目标分解为更简单、更小的子任务。然而,现有方法样本效率低下,且局限于单任务情况,需要为每个新任务重新训练策略。在这项工作中,我们提出了自动机条件化协作多智能体强化学习(ACC-MARL),一个学习任务条件化分散团队策略的框架。我们识别了ACC-MARL可行性的挑战,提出了解决方案,并证明了我们的方法是最优的。我们进一步展示了学习到的价值函数可用于在测试时最优地分配任务。实验表明,智能体之间涌现出任务感知的多步协调,例如按下按钮开门、扶住门以及短路任务。

英文摘要

We study learning multi-task, multi-agent policies for cooperative, temporal objectives, under centralized training, decentralized execution. In this setting, using automata to represent tasks assigned to agents enables breaking down a team-level objective into simpler, smaller sub-tasks. However, existing approaches remain sample-inefficient and are limited to the single-task case, requiring retraining policies for each new task. In this work, we present Automata-Conditioned Cooperative Multi-Agent Reinforcement Learning (ACC-MARL), a framework for learning task-conditioned, decentralized team policies. We identify challenges to the feasibility of ACC-MARL, propose solutions, and prove that our approach is optimal. We further show that learned value functions can be used to assign tasks optimally at test time. Experiments demonstrate emergent task-aware, multi-step coordination among agents, such as pressing a button to unlock a door, holding the door, and short-circuiting tasks.

2510.15780 2026-06-03 stat.AP cs.LG

Enhanced Renewable Energy Forecasting using Context-Aware Conformal Prediction

基于上下文感知保形预测的增强型可再生能源预测

Alireza Moradi, Mathieu Tanneau, Reza Zandehshahvar, Pascal Van Hentenryck

发表机构 * EPFL, Switzerland(瑞士联邦理工学院) Ghent University, Belgium(比利时根特大学)

AI总结 提出上下文感知保形预测(CACP)框架,通过加权历史观测校准预测区间,无需重新训练模型,提升可再生能源预测的可靠性和效率。

详情
AI中文摘要

人工智能(AI)越来越多地被用于支持可再生能源预测和电网运营。随着可再生能源渗透率的增长,可靠的概率预测对于管理不确定性和支持风险感知的运营决策变得至关重要。然而,由于时间变异性、天气条件变化和异质运行机制,这些预测常常存在校准偏差。在许多实际场景中,可再生能源预测由外部来源、供应商或独立训练的系统提供,由于模型访问受限或计算约束,重新训练不可行。这需要高效且模型无关的方法来在预测生成后提高其可靠性。本文提出了上下文感知保形预测(CACP),一种用于校准可再生能源预测的框架。所提方法在校准过程中依赖于一种加权机制,该机制为与目标预测条件更相似的历史观测分配更高的权重。这使得能够自适应预测区间,反映局部不确定性机制,而无需访问或重新训练底层预测模型。实验在来自美国国家可再生能源实验室(NREL)的日前太阳能预测大规模数据集上进行,涵盖包括MISO、ERCOT和SPP在内的多个系统。结果表明,与NREL的基础预测模型和其他保形预测基线相比,CACP在站点和系统层面均改善了可靠性-效率权衡。这些结果表明,CACP可以作为可信AI驱动的可再生能源预测和运营决策支持的实际可靠性增强层。

英文摘要

Artificial intelligence (AI) is increasingly used to support renewable energy forecasting and grid operations. As renewable penetration grows, reliable probabilistic forecasting is becoming essential for managing uncertainty and supporting risk-aware operational decision-making. However, these forecasts often suffer from miscalibration due to temporal variability, changing weather conditions, and heterogeneous operating regimes. In many real-world settings, renewable energy forecasts are provided by external sources, vendors, or independently trained systems, making retraining infeasible because of limited model access or computational constraints. This creates a need for efficient and model-agnostic methods that can improve forecast reliability after the forecasts are produced. This paper presents Context-Aware Conformal Prediction (CACP), a framework for calibrating renewable energy forecasts. The proposed method relies on a weighting mechanism during the calibration procedure, which assigns higher weights to historical observations that are more similar to the target forecasting condition. This enables adaptive prediction intervals that reflect local uncertainty regimes without requiring access to, or retraining of, the underlying forecasting model. Experiments are performed on a large-scale dataset from the National Renewable Energy Laboratory (NREL) day-ahead solar forecasting, covering multiple systems including MISO, ERCOT, and SPP. The results show that CACP improves the reliability-efficiency trade-off at both site and system levels compared to NREL's base forecasting model and the other conformal prediction baselines. These results suggest that CACP can serve as a practical reliability-enhancement layer for trustworthy AI-enabled renewable energy forecasting and operational decision support.
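
The weighting mechanism described above can be sketched as a similarity-weighted conformal quantile: calibration residuals from conditions close to the target count more when the interval half-width is chosen. The Gaussian similarity kernel, the scalar "context" variable, the bandwidth, and the function name below are all assumptions for illustration; the abstract does not specify CACP's actual weighting scheme.

```python
import math

def weighted_conformal_halfwidth(scores, contexts, target, alpha=0.1, bandwidth=1.0):
    """Weighted (1 - alpha) quantile of calibration scores |y - forecast|.

    Each calibration point is weighted by its (assumed Gaussian) similarity to the
    target context. One extra unit of weight is reserved for the test point itself,
    placed at +infinity, as is standard in weighted conformal prediction.
    """
    weights = [math.exp(-((c - target) ** 2) / (2.0 * bandwidth ** 2)) for c in contexts]
    total = sum(weights) + 1.0
    cumulative = 0.0
    for score, weight in sorted(zip(scores, weights)):
        cumulative += weight / total
        if cumulative >= 1.0 - alpha:
            return score
    # Not enough similar calibration mass: the interval is unbounded.
    return math.inf

# Hypothetical calibration set: residuals observed under two very different contexts.
scores = [0.1, 0.2, 5.0, 6.0]
contexts = [0.0, 0.0, 10.0, 10.0]
q = weighted_conformal_halfwidth(scores, contexts, target=0.0, alpha=0.4)
forecast = 3.0
print((forecast - q, forecast + q))
```

The interval is built around the externally supplied forecast, so the underlying forecasting model is never accessed or retrained.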

2509.08048 2026-06-03 hep-ph cs.LG

Forecasting Generative Amplification

预测生成放大

Henning Bahl, Sascha Diefenbacher, Nina Elmer, Tilman Plehn, Jonas Spinner

Affiliations * Institut für Theoretische Physik, Universität Heidelberg, Germany; Physics Division, Lawrence Berkeley National Laboratory, Berkeley, USA; Interdisciplinary Center for Scientific Computing (IWR), Universität Heidelberg, Germany

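AI summary This paper proposes two complementary methods (averaging amplification and differential amplification) for estimating the statistical amplification factor of generative networks in LHC simulations without large holdout datasets. Applied to state-of-the-art event generators, the methods show that amplification is feasible in specific regions of phase space but not yet across the entire distribution.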

Comments 23 pages, 15 figures. v2: added link to github repo, extended acknowledgements. v3: updated conventions and refined text, now 25 pages

Journal ref SciPost Phys. 20, 150 (2026)

Details
AI Chinese abstract

生成网络是提高LHC模拟速度和精度的完美工具。理解其统计精度至关重要,尤其是在生成超出训练数据集大小的事件时。我们提出了两种互补方法来估计放大因子,无需大型保留数据集。平均放大使用贝叶斯网络或集成方法,通过给定相空间体积上积分的精度来估计放大。差分放大使用假设检验来量化放大,且没有任何分辨率损失。应用于最先进的事件生成器时,两种方法都表明放大在相空间的特定区域是可能的,但尚未覆盖整个分布。

English abstract

Generative networks are perfect tools to enhance the speed and precision of LHC simulations. It is important to understand their statistical precision, especially when generating events beyond the size of the training dataset. We present two complementary methods to estimate the amplification factor without large holdout datasets. Averaging amplification uses Bayesian networks or ensembling to estimate amplification from the precision of integrals over given phase-space volumes. Differential amplification uses hypothesis testing to quantify amplification without any resolution loss. Applied to state-of-the-art event generators, both methods indicate that amplification is possible in specific regions of phase space, but not yet across the entire distribution.

2509.09685 2026-06-03 cs.IR cs.AI cs.MM cs.SD eess.AS

TalkPlayData 2: An Agentic Synthetic Data Pipeline for Multimodal Conversational Music Recommendation

TalkPlayData 2:用于多模态对话式音乐推荐的智能体合成数据流水线

Keunwoo Choi, Seungheon Doh, Juhan Nam

Affiliations * KAIST

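AI summary Presents TalkPlayData 2, a synthetic dataset for multimodal conversational music recommendation generated by an agentic data pipeline. Multiple LLMs in different roles simulate conversations that cover a range of scenarios, with the aim of supporting the training of generative recommendation models.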

Details
AI Chinese abstract

我们提出了TalkPlayData 2,一个由智能体数据流水线生成的多模态对话式音乐推荐合成数据集。在该流水线中,多个大语言模型(LLM)智能体被创建,承担不同角色,具有专门的提示词和访问不同信息部分的权限,通过记录Listener LLM和Recsys LLM之间的对话来获取聊天数据。为了覆盖各种对话场景,每个对话的Listener LLM基于微调的对话目标进行条件设置。最后,所有LLM都是多模态的,支持音频和图像,从而模拟多模态推荐和对话。在LLM-as-a-judge和主观评估实验中,TalkPlayData 2在训练音乐生成式推荐模型的各个方面达到了预期目标。TalkPlayData 2及其生成代码已在https://talkpl-ai.github.io发布。

English abstract

We present TalkPlayData 2, a synthetic dataset for multimodal conversational music recommendation generated by an agentic data pipeline. In the proposed pipeline, multiple large language model (LLM) agents are created under various roles with specialized prompts and access to different parts of information, and the chat data is acquired by logging the conversation between the Listener LLM and the Recsys LLM. To cover various conversation scenarios, for each conversation, the Listener LLM is conditioned on a finetuned conversation goal. Finally, all the LLMs are multimodal with audio and images, allowing a simulation of multimodal recommendation and conversation. In the LLM-as-a-judge and subjective evaluation experiments, TalkPlayData 2 achieved the proposed goal in various aspects related to training a generative recommendation model for music. TalkPlayData 2 and its generation code are released at https://talkpl-ai.github.io.