arXivDaily arXiv每日学术速递 周一至周五更新
全部学科分类 1967
2605.14290 2026-05-15 cs.CR cs.AI cs.CL cs.SE

Web Agents Should Adopt the Plan-Then-Execute Paradigm

Julien Piet, Annabella Chow, Yiwei Hou, Muxi Lyu, Sylvie Venuto, Jinhao Zhu, Raluca Ada Popa, David Wagner

发表机构 * UC Berkeley(加州大学伯克利分校)

AI总结 本文指出,当前基于ReAct架构的大型语言模型代理在处理网页任务时存在安全隐患,因为其在决策过程中直接使用未验证的网页内容,容易受到提示注入攻击。作者主张网页代理应采用“先规划后执行”的范式,即在观察网页内容前制定任务特定的执行计划,从而隔离不可信数据对控制流的影响。研究分析了WebArena基准,发现大多数任务可通过纯程序化规划完成,而无需运行时调用LLM子程序,并指出实现该范式的关键在于构建类型化、可审计的网页API接口,而非改进模型本身。

详情
英文摘要

ReAct has become the default architecture across LLM agents, and many existing web agents follow this paradigm. We argue that it is the wrong default for web agents. Instead, web agents should default to plan-then-execute: commit to a task-specific program before observing runtime web content, then execute it. The reason is that web content mixes inputs from many parties. An e-commerce product page may combine a seller's listing, customer reviews and sponsored advertisements. Under ReAct, all of this content flows into the model when deciding on the next action, creating a direct path for prompt injections to steer the agent's control flow. Plan-then-execute changes this boundary: untrusted data may influence values or branches inside a predefined execution graph, but it cannot redefine the user task or cause the model to synthesize new actions at runtime. We analyze WebArena, a popular web agent benchmark, and find that all tasks are compatible with plan-then-execute, while 80% can be completed with a purely programmatic plan, without any runtime LLM subroutine. We identify the main barrier to adopting plan-then-execute on the web: For it to work well, tools must map cleanly to semantic actions, with effects known before execution, so agents have enough information to plan. The web does not naturally expose that interface. Browser tools such as click, type, and scroll have page-dependent meanings. Planning at this layer is near-sighted: the agent can only see actions on the current page, and later actions appear only after it acts. Closing this gap requires typed interfaces that turn website interactions from clicks and keystrokes to task-level operations. This is an infrastructure problem, not a modeling problem. Web tasks do not need reactivity by default; they need typed, complete, auditable website APIs.

2605.14283 2026-05-15 cs.GT cs.AI cs.CR

Watermarking Game-Playing Agents in Perfect-Information Extensive-Form Games

Juho Kim, Fei Fang, Tuomas Sandholm

发表机构 * Strategic Machine, Inc.(战略机器公司) Strategy Robot, Inc.(策略机器人公司) Optimized Markets, Inc.(优化市场公司)

AI总结 本文研究了在完全信息的扩展式博弈中对博弈策略进行水印的技术,旨在检测游戏代理是否未经授权地使用了AI工具。作者借鉴了大型语言模型的KGW水印方法,提出了一种适用于博弈代理的水印方案,并通过统计检验实现水印的检测。实验表明,水印对策略质量的影响可以忽略不计,且仅需少量对局即可有效检测水印。

详情
英文摘要

Watermarking techniques for large language models (LLMs), which encode hidden information in the output so its source can be verified, have gained significant attention in recent days, thanks to their potential capability to detect accidental or deliberate misuse. Similar challenges involving model misuse also exist in the context of game-playing, such as when detecting the unauthorized use of AI tools in gaming platforms (e.g., cheating in online chess). In this paper, we initiate the study of how game-playing strategies can be watermarked. We show how the KGW watermark for LLMs can be adapted to watermark game-playing agents in perfect-information extensive-form games. The watermark can then be detected using a statistical test. We show that the degradation in the quality of the watermarked strategy profile, quantified by the expected utility, can be bounded, but there is a tradeoff between detectability and quality. In our experiments, we bootstrap the watermarking framework to various chess engines and demonstrate that a) the impact of the watermark on the quality of the strategy is negligible and b) the watermark can be detected with just a handful of games.

2605.14276 2026-05-15 stat.ML cs.LG

Training-Free Generative Sampling via Moment-Matched Score Smoothing

Zhenyu Yao, Daniel Paulin

发表机构 * College of Computing and Data Science(计算与数据科学学院) Nanyang Technological University(南洋理工大学)

AI总结 本文提出了一种无需训练的生成采样方法MM-SOLD,通过矩匹配的得分平滑技术,直接从训练数据中估计目标分布的统计特性,并在采样过程中保持这些矩不变。该方法基于过阻尼朗之万动力学,能够在不训练神经网络的情况下实现高质量的样本生成,实验表明其在二维分布和图像生成任务中表现优异,具有计算高效、鲁棒性强的特点。

Comments 35 pages

详情
英文摘要

Diffusion models generate samples by denoising along the score of a perturbed target distribution. In practice, one trains a neural diffusion model, which is computationally expensive. Recent work suggests that score matching implicitly smooths the empirical score, and that this smoothing bias promotes generalization by capturing low-dimensional data geometry. We propose moment-matched score-smoothed overdamped Langevin dynamics (MM-SOLD), a training-free interacting particle sampler that enforces the target moments throughout the sampling trajectory. We prove that, in the large-particle limit, the empirical particle density converges to a deterministic limit whose one-particle stationary marginal is a Gibbs--Boltzmann density obtained by exponentially tilting a naive score-smoothed diffusion target. The mean and covariance of this distribution agree with the empirical moments of the training data. Experiments on 2D distributions and latent-space image generation show that MM-SOLD enables fast, robust, training-free sampling on CPUs, with sample fidelity and diversity competitive with neural diffusion baselines.

2605.14228 2026-05-15 cs.HC cs.LG

Self-Regulated Learning in Essay Writing: Consistency of Strategies and Impact on Outcomes

Gloria Fernández-Nieto, Kiyoshige Garcés, Mladen Raković, Tongguang Li, Xinyu Li, Linxuan Zhao, Dragan Gašević

发表机构 * Department of Data Science & AI(数据科学与人工智能系) Monash University(墨尔本大学)

AI总结 本研究探讨了中学生在在线作文写作过程中如何运用自我调节学习(SRL)策略,以及这些策略随时间的变化和对学习成果的影响。研究通过分析哥伦比亚两所中学学生在两周内的在线写作过程数据,结合过程挖掘和无监督机器学习方法,识别出三种主要的SRL策略,并发现这些策略的使用存在显著差异,其中“先阅读后写作”的策略较为普遍,而“密集写作、选择性阅读”策略虽较少见,却与更好的学习成果相关。研究结果为在线学习支持系统的优化提供了重要参考。

Comments 16 pages, 4 figures, submitted to Journal of Computer Assisted Learning (JCAL) [Under Review]

详情
英文摘要

Background: Abilities for effective self-regulated learning (SRL) are critical for lifelong learning, particularly during adolescence when these skills consolidate and strongly influence future learning. Their importance has grown with the rise of online and blended education. Yet, little is known about how secondary school students self-regulate in online environments, how their SRL processes and strategies evolve, or how they affect outcomes. In secondary education, understanding these processes can reveal patterns and indicators of learning success, informing the design of online support mechanisms. Evidence from repeated-measures designs remains scarce. Objectives: This study aims to examine how secondary school students enact SRL strategies during online essay writing, how these strategies change over time, and how they relate to learning outcomes. Methods: We analysed metacognition-related trace data collected from secondary students during a two-wave online essay-writing task conducted one week apart in two Colombian schools (N = 93 for session 1, N = 95 for session 2) via a digital learning platform. Using a combination of process mining and unsupervised machine learning techniques, we identified dominant SRL strategies grounded in established SRL processes and examined their stability and association with learning outcomes. Results and conclusions: Three dominant SRL strategies were identified. Results showed variability: many students remained in or shifted to Read first, write next, while none used Write intensively, read selectively in session 2. Although less common, latter strategy was positively associated with learning outcomes.

2605.14224 2026-05-15 math.NA cs.AI cs.NA math.DS math.FA

Wavelet-Based Observables for Koopman Analysis: An Extended Dynamic Mode Decomposition Framework

Cankat Tilki, Serkan Gugercin

发表机构 * Department of Mathematics, Virginia Tech(弗吉尼亚理工大学数学系)

AI总结 本文提出了一种基于小波变换的Koopman算子分析方法,通过引入小波基观测函数,证明其在特定Banach空间下是Koopman半群的特征函数。在此基础上,构建了Koopman半群及其预解算子的闭式表达,并结合扩展动态模态分解(EDMD)提出了一种新的小波动态模态分解算法(cWDMD),用于数值近似Koopman算子的作用。该方法在两个数值例子中得到了验证,展示了其理论有效性与应用潜力。

详情
英文摘要

We present an in-depth analysis of the Koopman semigroup via wavelet transform. Towards this goal, we start by introducing the wavelet-based observables and show that they are eigenfunctions of the Koopman semigroup when this semigroup is considered over the Banach space of continuous functions on a compact forward-invariant set endowed with the supremum norm. We then construct closed-form expressions of the action of the Koopman semigroup and its resolvent in terms of these observables. To approximate the action of Koopman semigroup numerically, we combine Extended Dynamic Mode Decomposition (EDMD) with the proposed wavelet-based observables leading to the Wavelet Dynamic Mode Decomposition via Continuous Wavelet Transform (cWDMD) algorithm. We validate our theoretical results on two numerical examples.

2605.14202 2026-05-15 cs.SE cs.AI

LLM-Based Robustness Testing of Microservice Applications: An Empirical Study

Hrushitha Goud Tigulla, Marco Vieira

发表机构 * College of Computing(计算学院) Informatics University of North Carolina at Charlotte(北卡罗来纳大学夏洛特分校信息学院) Charlotte, USA(美国夏洛特)

AI总结 本文通过实证研究探讨了基于大语言模型(LLM)的微服务应用鲁棒性测试方法。研究针对不同架构的微服务系统,应用七种提示策略和三种开源LLM生成测试用例,发现提示策略对测试多样性的影响比模型规模更大。研究提出了两种新策略——Guided和GuidedFewShot,结合领域知识提升测试覆盖效果,其中GuidedFewShot在两个系统中均实现了较高的失败模式覆盖率,且保持了较低的模型间相似性。实验表明,仅依赖分类规则不足以引导LLM生成有效测试,具体示例对模型理解输入突变至关重要。

详情
英文摘要

Malformed, missing, or boundary-value inputs in microservice APIs can cascade across dependent services, threatening reliability. Robustness testing systematically exercises such inputs to expose server-side failures, but generating diverse, effective tests remains challenging. Large Language Models can generate such tests from API specifications; however, it is unknown whether different models and prompt strategies produce diverse failure sets or converge on the same failures. We report a controlled experiment applying 7 prompt strategies to 3 open-source LLMs (14B-70B parameters) targeting 2 architecturally distinct microservice systems: one Java monolingual (6 services, 9 failure modes) and one polyglot (27 services, 14 failure modes), yielding 38 valid runs and 663 generated tests. We find that prompt strategy explains more variation in diversity than model size: a Structured prompt collapses diversity entirely, while a single model varied across three prompt strategies achieves complete failure-mode coverage on one system, outperforming any multi-model ensemble under a fixed prompt. We introduce two strategies, Guided and GuidedFewShot, that embed a mutation taxonomy from prior robustness testing research as domain context. GuidedFewShot achieves the highest single-run coverage on both systems (5 of 9 and 8 of 14 failure modes) while maintaining low cross-model similarity. A key lesson is that taxonomy rules alone are insufficient: LLMs cannot distinguish key-absent from value-empty mutations without concrete examples. Findings replicate across both systems.

2605.14195 2026-05-15 cs.DS cs.LG

Stochastic Matching via Local Sparsification

Sara Ahmadian, Edith Cohen, Mohammad Roghani

发表机构 * Google Research(谷歌研究)

AI总结 本文研究了在线随机匹配问题中的一种新场景,其中本地通信带宽而非匹配时机成为主要瓶颈。为此,作者提出了一种两阶段的本地稀疏化框架,要求每个请求在全局优化前将其兼容集合缩减到一个固定大小的预算。研究设计了一种基于期望实例分数解的本地选择策略,并理论证明在足够分散度下该方法能够近似保持最大匹配的期望规模。实验表明,即使在严格的本地预算限制下,该方法仍能实现接近最优的全局匹配效果,优于传统在线算法。

详情
英文摘要

The classic online stochastic matching problem typically requires immediate and irrevocable matching decisions. However, in many modern decentralized systems such as real-time ride-hailing and distributed cloud computing, the primary bottleneck is often local communication bandwidth rather than the timing of the match itself. We formalize this challenge by introducing a two-stage local sparsification framework. In this setting, arriving requests must prune their realized compatibility sets to a strict budget of $k$ edges before a central coordinator optimizes the global matching. This creates a "middle ground" between local information constraints and global optimization utility. We propose a local selection strategy, parametrized by a fractional solution of the expected instance. Theoretically, we quantify the approximation ratio as a function of the solution's {\em spread}. We prove that under sufficient spread, our sparsifier globally preserves the expected size of the maximum matching. Empirically, we demonstrate the robustness of our approach using the New York City ride-hailing datasets and adversarial synthetic benchmarks. Our results show that near-optimal global matching is achievable even with highly constrained local budgets, significantly outperforming standard online baselines.

2605.13773 2026-05-15 cs.SE cs.AI cs.LO

(How) Do Large Language Models Understand High-Level Message Sequence Charts?

Mohammad Reza Mousavi

发表机构 * Department of Informatics, King's College London(伦敦国王学院信息学院)

AI总结 本文研究了大型语言模型(LLMs)对高层消息序列图(HMSCs)形式语义的理解程度。通过让三种主流LLMs完成129项与HMSC语义相关的任务,发现它们对基本语义概念的理解较好,但在涉及抽象、组合以及追踪和标签转换系统等复杂语义推理任务时表现较差。研究揭示了当前LLMs在处理具有严格形式语义的软件设计模型时仍存在显著局限。

详情
英文摘要

Large Language Models (LLMs) are being employed widely to automate tasks across the software development life-cycle. It is, however, unclear whether these tasks are performed consistently with respect to the semantics of the artefacts being handled. This question is particularly under-researched concerning architectural design specification. In this paper, we address this question for High-Level Message Sequence Charts (HMSCs). These are visual models with a rigorous formal semantics that have been used for various purposes, including as a foundation for Sequence Diagrams in the Unified Modelling Language (UML). We examine whether LLMs "understand" the semantics of HMSCs by examining three LLMs (Gemini-3, GPT-5.4, and Qwen-3.6) on how they perform 129 semantic tasks ranging from querying basic semantic constructs in HMSCs (i.e., events and their ordering) to semantic-preserving abstractions and compositions, and calculating the set of traces and trace-equivalent labelled transition systems. The results show that LLMs only have a modest understanding of the formal semantics of HMSCs (ca. 52% overall accuracy), with great variability across different semantic concepts: while LLMs seem to understand the basic semantic concepts of MSCs (ca. 88% accuracy), they struggle with semantic reasoning in tasks involving abstraction and composition (ca. 36% accuracy) and traces and LTSs (ca. 42% accuracy). In particular, all three LLMs struggle with the notions of co-region and explicit causal dependencies and never employed them in semantic-preserving transformations.

2605.13362 2026-05-15 cs.MA cs.AI cs.DC cs.GT econ.TH

Constitutional Governance in Metric Spaces

Ehud Shapiro, Nimrod Talmon

发表机构 * London School of Economics and Weizmann Institute of Science(伦敦经济学院和魏茨曼科学研究院) Ben-Gurion University(本· Gurion大学)

AI总结 本文研究了在度量空间中实现平等自主治理的计算机制,提出了宪法治理框架,将提案、审议、修改和共识等过程整合为一个多项式时间协议。该框架通过为每个可修改的组件分配度量空间、聚合规则和超级多数阈值,支持成员通过理想元素投票并提交获得超级多数支持的公开提案,从而实现宪法共识。研究还展示了该框架在七个典型场景中的应用,并证明了广义中位数在多数阈值下具有良好的激励相容性,为数字社区和组织的宪法治理提供了全面解决方案。

详情
英文摘要

Computational social choice and algorithmic decision theory offer rich aggregation theory but no comprehensive process for egalitarian self-governance: aggregation, deliberation, amendment, and consensus are each considered in isolation, with key metric-space aggregators being NP-hard. Here, we propose constitutional governance in metric spaces, integrating these stages into a coherent polynomial-time protocol for constitutional governance. The constitution assigns, per amendable component including itself, a metric space, aggregation rule, and supermajority threshold. Amendments proceed by members voting with their ideal elements, followed by members submitting public proposals carrying supermajority public support under the revealed votes. Public proposals can be sourced from deliberation among members, vote aggregation, or AI mediation. The constitutional rule adopts a supported proposal with positive maximal score, if there is one, else retains the status quo. With Constitutional Consensus, a community can run the constitutional governance protocol on members' personal computing devices (e.g., smartphones), achieving digital sovereignty. We focus on the utility of the generalised median, prove that at majority threshold no misreport weakly dominates sincere voting, and study the compromise gap between best peak and unconstrained optimum. We instantiate the framework to seven canonical settings -- electing officers, setting rates, allocating budgets, ranking priorities, selecting boards, drafting bylaws, and amending the constitution. By unifying metric-space aggregation, reality-aware social choice, supermajority amendment, constitutional consensus, deliberative coalition formation, and AI mediation, this work delivers a comprehensive solution to the constitutional governance of digital communities and organisations.

2605.13343 2026-05-15 cs.GR cs.DC cs.LG cs.NA math.NA

Hierarchical Transformer Preconditioning for Interactive Physics Simulation

Carl Osborne, Minghao Guo, Crystal Owens, Wojciech Matusik

发表机构 * MIT CSAIL(麻省理工学院计算机科学与人工智能实验室)

AI总结 该研究提出了一种基于分层Transformer的预条件器,用于加速实时物理模拟中的求解过程。通过结合弱可接受H-矩阵划分,该方法在保持计算效率的同时,能够有效捕捉长程耦合关系。核心贡献包括一种新的训练目标函数,提升了预条件器对不规则谱的适应性,并实现了在大规模多相泊松系统中的高效求解,显著优于传统方法。

Comments 10 pages, 7 figures. Includes supplementary video and material

详情
英文摘要

Neural preconditioners for real-time physics simulation offer promising data-driven priors, but they often fail to capture long-range couplings efficiently because they inherit local message passing or sparse-operator access patterns. We introduce the Hierarchical Transformer Preconditioner, a neural preconditioner anchored to a weak-admissibility H-matrix partition. The partition provides a multiscale structural prior (dense diagonal leaves plus coarsening off-diagonal tiles) that enables full-graph approximate-inverse computation with O(N) scaling at fixed block sizes. The network models the inverse through low-rank far-field factors and uses highway connections (axial buffers plus a global summary token) to propagate context across transformer depth. At each PCG iteration, preconditioner application reduces to batched dense GEMMs with regular memory access. The key training contribution is a cosine-Hutchinson probe objective that learns the action of MA on convergence-critical spectral subspaces, optimizing angular alignment of MAz with z rather than forcing eigenvalue clusters to a prescribed location. This removes unnecessary spectral-placement constraints from SAI-style objectives and improves conditioning on irregular spectra. Because both inference and apply are dense, dependency-free tensor programs, the full solve loop is captured as a single CUDA Graph. On stiff multiphase Poisson systems (up to 100:1 density contrast, N = 1,024-16,384), the solver runs from ~143 to ~21 fps. At N = 8,192, it reaches 17.9 ms/frame, with 2.2x speedup over GPU Jacobi, ~28x over GPU IC/DILU (AMGX multicolor_dilu), and 2.7x over neural SPAI retrained per scale on the same benchmark.

2605.13137 2026-05-15 cs.IR cs.AI

LeanSearch v2: Global Premise Retrieval for Lean 4 Theorem Proving

Guoxiong Gao, Zeming Sun, Jiedong Jiang, Yutong Wang, Jingda Xu, Peihao Wu, Bryan Dai, Bin Dong

发表机构 * School of Mathematical Sciences, Peking University(北京大学数学科学学院) IQuest Research(IQuest研究院) Research Institute for Mathematical Sciences, Kyoto University(京都大学数学研究所) Westlake Institute for Advanced Study, Westlake University(西湖研究所在线高级研究院) Beijing International Center for Mathematical Research and the New Cornerstone Science Laboratory, Peking University(北京国际数学研究中心和新基石科学实验室,北京大学) Center for Machine Learning Research, Peking University(北京大学机器学习研究中心) Center for Intelligent Computing, Great Bay Institute for Advanced Study, Great Bay University(智能计算中心,Great Bay高级研究院,Great Bay大学) Zhongguancun Academy(中关村学院)

AI总结 LeanSearch v2 是一种用于 Lean 4 定理证明的全局前提检索系统,旨在从数学库中找到能够支持定理证明的多个相关引理。该系统包含两种模式:标准模式通过嵌入-重排序流程实现高精度的单次查询检索,而推理模式则通过迭代的草稿-检索-反思循环实现全局前提的恢复。实验表明,LeanSearch v2 在多个基准测试中显著优于现有系统,有效提升了定理证明的成功率。

详情
英文摘要

Proving theorems in Lean 4 often requires identifying a scattered set of library lemmas whose joint use enables a concise proof -- a task we call global premise retrieval. Existing tools address adjacent problems: semantic search engines find individual declarations matching a query, while premise-selection systems predict useful lemmas one tactic step at a time. Neither recovers the full premise set an entire theorem requires. We present LeanSearch v2, a two-mode retrieval system for this task. Its standard mode applies a hierarchy-informalized Mathlib corpus with an embedding-reranker pipeline, achieving state-of-the-art single-query retrieval without domain-specific fine-tuning (nDCG@10 of 0.62 vs. 0.53 for the next-best system). Its reasoning mode builds on standard mode as its retrieval substrate, targeting global premise retrieval through iterative sketch-retrieve-reflect cycles. On a 69-query benchmark of research-level Mathlib theorems, reasoning mode recovers 46.1% of ground-truth premise groups within 10 retrieved candidates, outperforming strong reasoning retrieval systems (38.0%) and premise-selection baselines (9.3%) on the same benchmark. In a controlled downstream evaluation with a fixed prover loop, replacing alternative retrievers with LeanSearch v2 yields the highest proof success (20% vs. 16% for the next-best system and 4% without retrieval), confirming that retrieval quality propagates to proof generation. We have open-sourced all code, data, and benchmarks. Code and data: https://github.com/frenzymath/LeanSearch-v2 . The standard mode is publicly available with API access at https://leansearch.net/ .

2605.13095 2026-05-15 cs.CR cs.AI cs.CY cs.LG

Watermarking Should Be Treated as a Monitoring Primitive

Toluwani Aremu, Nils Lukas, Jie Zhang

发表机构 * MBZUAI(穆扎布伊人工智能研究所) A*STAR(新加坡科技研究局)

AI总结 该论文探讨了生成模型中水印技术在溯源、归因和安全监控中的应用,并指出当前水印评估通常仅针对单个样本的对抗攻击,忽视了观察者通过聚合多个输出信号进行实体级信息推断的能力。研究引入了基于观察者的威胁模型,表明即使零比特水印也能在多密钥环境下实现归因,并揭示了水印设计在外部监控方面的潜在风险与应对策略。论文揭示了归因与监控之间的根本性双重用途矛盾,强调水印评估应超越单样本鲁棒性,考虑聚合分析和观察者能力的影响。

Comments 12 pages, 5 figures

详情
英文摘要

Watermarking is widely proposed for provenance, attribution, and safety monitoring in generative models, yet is typically evaluated only under adversaries who attempt to evade detection or induce false positives at the level of individual samples. We argue that watermarking should be treated as a monitoring primitive, and that internal monitoring is unavoidable given per-entity attribution keys and messages, as well as detector access. We introduce an observer-based threat model in which observers can aggregate watermark signals across outputs to infer entity-level information, showing that even zero-bit watermarking enables attribution under multi-key settings. We further show that external monitoring can emerge over time from persistent, key-dependent statistical structure, although this depends on watermark design and may be mitigated by distribution-preserving or undetectable schemes. Our findings reveal a fundamental dual-use tension between attribution and monitoring, motivating evaluation of watermarking beyond per-sample robustness to account for aggregation and observer-based capabilities.

2605.09664 2026-05-15 cs.CR cs.LG

FreeMOCA: Memory-Free Continual Learning for Malicious Code Analysis

Zahra Asadi, Haeseung Jeon, Sohyun Han, Md Mahmuduzzaman Kamol, Se Eun Oh, Mohammad Saidur Rahman

发表机构 * Department of Computer Engineering(计算机工程系) Amirkabir University of Technology(阿姆irkabir技术大学) Division of Artificial Intelligence & Software(人工智能与软件系) Ewha Womans University(成均馆大学) Department of Computer Science(计算机科学系) University of Texas at El Paso(德克萨斯理工大学)

AI总结 随着每年新发现的恶意软件样本超过2亿个,反病毒系统需要不断适应不断变化的威胁环境。然而,仅使用新样本进行再训练会导致灾难性遗忘和可被利用的检测盲区,而使用整个数据集再训练则计算成本高昂。为此,本文提出FreeMOCA,一种无需存储记忆且计算高效的持续学习框架,通过在任务更新之间进行自适应的逐层插值,保留先前知识,从而有效提升恶意代码分析的持续学习能力。实验表明,FreeMOCA在多个大规模基准数据集上显著优于现有方法,大幅减少了遗忘并提升了检测准确率。

Comments 17 pages, 5 figures, 12 tables

详情
英文摘要

As over 200 million new malware samples are identified each year, antivirus systems must continuously adapt to the evolving threat landscape. However, retraining solely on new samples leads to catastrophic forgetting and exploitable blind spots, while retraining on the entire dataset incurs substantial computational cost. We propose FreeMOCA, a memory- and compute-efficient continual learning framework for malicious code analysis that preserves prior knowledge via adaptive layer-wise interpolation between consecutive task updates, leveraging the fact that warm-started task optima are connected by low-loss paths in parameter space. We evaluate FreeMOCA in both class-incremental (Class-IL) and domain-incremental (Domain-IL) settings on large-scale Windows (EMBER) and Android (AZ) malware benchmarks. FreeMOCA achieves substantial gains in Class-IL, outperforming 11 baselines on both EMBER and AZ benchmarks. It also significantly reduces forgetting, achieving the best retention across baselines, and improving accuracy by up to 42% and 37% on EMBER and AZ, respectively. These results demonstrate that warm-started interpolation in parameter space provides a scalable and effective alternative to replay for continual malware detection. Code is available at: https://github.com/IQSeC-Lab/FreeMOCA.

2605.09530 2026-05-15 cs.CR cs.CL

MemPrivacy: Privacy-Preserving Personalized Memory Management for Edge-Cloud Agents

Yining Chen, Jihao Zhao, Bo Tang, Haofen Wang, Yue Zhang, Fei Huang, Feiyu Xiong, Zhiyu Li

发表机构 * MemTensor (Shanghai) Technology Co., Ltd.(MemTensor(上海)科技有限公司) HONOR Device Co., Ltd.(HONOR设备有限公司) Tongji University(同济大学)

AI总结 随着基于大语言模型的智能体越来越多地部署在边缘-云环境中,个性化记忆成为实现长期适应和以用户为中心交互的关键。然而,现有的云端辅助记忆管理方式容易暴露敏感用户信息,而现有的隐私保护方法通常依赖于激进的语义抹除,导致记忆效用和个性化质量下降。为此,本文提出 MemPrivacy,通过在边缘设备上识别隐私敏感内容,并用语义结构化的类型感知占位符替代,既保护了隐私,又保留了记忆生成与检索所需的信息。实验表明,MemPrivacy 在隐私信息提取方面表现优异,同时显著降低了推理延迟,有效平衡了隐私保护与个性化记忆效用。

详情
英文摘要

As LLM-powered agents are increasingly deployed in edge-cloud environments, personalized memory has become a key enabler of long-term adaptation and user-centric interaction. However, cloud-assisted memory management exposes sensitive user information, while existing privacy protection methods typically rely on aggressive masking that removes task-relevant semantics and consequently degrades memory utility and personalization quality. To address this challenge, We propose MemPrivacy, which identifies privacy-sensitive spans on edge devices, replaces them with semantically structured type-aware placeholders for cloud-side memory processing, and restores the original values locally when needed. By decoupling privacy protection from semantic destruction, MemPrivacy minimizes sensitive data exposure while retaining the information required for effective memory formation and retrieval. We also construct MemPrivacy-Bench for systematic evaluation, a dataset covering 200 users and over 155k privacy instances, and introduce a four-level privacy taxonomy for configurable protection policies. Experiments show that MemPrivacy achieves strong performance in privacy information extraction, substantially surpassing strong general-purpose models such as GPT-5.2 and Gemini-3.1-Pro, while also reducing inference latency. Across multiple widely used memory systems, MemPrivacy limits utility loss to within 1.6%, outperforming baseline masking strategies. Overall, MemPrivacy offers an effective balance between privacy protection and personalized memory utility for edge-cloud agents, enabling secure, practical, and user-transparent deployment.

2605.09018 2026-05-15 cs.NE cs.AI cs.LG

Evolutionary Ensemble of Agents

Zongmin Yu, Liu Yang

发表机构 * National University of Singapore(新加坡国立大学)

AI总结 本文提出了一种名为EvE的进化集成框架,用于组织现有的高能力编码代理,使其形成一个协同进化的系统,以实现算法发现。该方法固定基础代理结构,专注于进化代理行为的指导与技能,通过两个协同进化的种群(功能代码求解器和代理指导状态)进行同步竞争,并根据其对当前求解状态的边际贡献更新代理的Elo评分。实验表明,EvE在In-Context Operator Networks(ICON)的研究瓶颈中自主发现了可靠的缩放-插值机制,展示了其在复杂代码库中通过自适应代理集成突破性能瓶颈的有效性。

详情
英文摘要

We introduce Evolutionary Ensemble (EvE), a decentralized framework that organizes existing, highly capable coding agents into a live, co-evolving system for algorithmic discovery. Rather than reinventing the wheel within the "LLMs as optimizers" paradigm, EvE fixes the base agent substrate and focuses entirely on evolving the cumulative guidance and skills that dictate agent behaviors. By maintaining two co-evolving populations, namely functional code solvers and agent guidance states, the system evaluates agents through a synchronous race, updating their empirical Elo ratings based on the marginal gains they contribute to the current solver state. When applied to a research bottleneck in In-Context Operator Networks (ICON), EvE autonomously discovered a robust rescale-then-interpolate mechanism that enables reliable example-count generalization. Crucially, controlled ablations reveal the absolute necessity of stage-dependent agent adaptation to navigate the shifting search landscapes of complex codebases. Compared to variants driven by a fixed initial agent or even a frozen "best-evolved" agent, EvE uniquely avoids phase mismatch, demonstrating that organizing agents into a self-revising ensemble is the fundamental driver for breaking through static performance ceilings.

2605.07060 2026-05-15 physics.geo-ph cs.LG physics.comp-ph stat.ML

Functional-prior-based approaches to Bayesian PDE-constrained inversion using physics-informed neural networks

Ryoichiro Agata, Tomohisa Okazaki

发表机构 * Disaster Prevention Research Institute, Kyoto University(京都大学灾害预防研究所) RIKEN Center for Advanced Intelligence Project(理化学研究所先进智能项目中心)

AI总结 本文提出了一种基于函数先验的贝叶斯偏微分方程约束反演方法(fpBPINN),旨在将物理意义明确的函数空间先验有效引入基于物理信息神经网络(PINN)的贝叶斯反演中。研究引入了两种互补方法:一种通过学习神经网络权重先验以符合给定函数先验,另一种则在函数空间中直接进行变分推理。实验表明,这两种方法在地震层析成像和达西流渗透率反演中均能准确估计后验分布,突显了引入物理可解释函数先验在提升反演精度中的重要性。

详情
英文摘要

Physics-informed neural networks (PINNs) provide a mesh-free framework for solving PDE-constrained inverse problems, but their extension to Bayesian inversion still faces a fundamental difficulty: prior distributions are typically defined in the weight space of neural networks, whereas physically meaningful prior assumptions are more naturally expressed in function space. In this study, we introduce a unified framework, termed functional-prior-based approaches to Bayesian PDE-constrained inversion using physics-informed neural networks (fpBPINN), to incorporate functional priors into Bayesian PINN-based inversion. We consider two complementary approaches. The first is a functional-prior-informed Bayesian PINN (FPI-BPINN), in which a neural network weight prior is learned to be consistent with a prescribed functional prior, and Bayesian inference is subsequently performed in weight space. The second is function-space particle-based variational inference for PINNs (fParVI-PINN), which performs Bayesian estimation using ParVI directly in function space. We also show that random Fourier features (RFF) play an important role in representing Gaussian functional priors with neural networks and in improving posterior approximation. We applied the proposed approaches to one-dimensional seismic traveltime tomography and two-dimensional Darcy-flow permeability inversion. These numerical experiments showed that both approaches accurately estimated posterior distributions, highlighting the significance of introducing physically interpretable functional priors into Bayesian PINN-based inverse problems. We also identified the contrasting advantages of FPI-BPINN and fParVI-PINN, namely flexibility and accuracy, respectively.

2604.17954 2026-05-15 math.DG cs.LG

Complex normalizing flows can almost be information Kähler-Ricci flows

Andrew Gracyk

发表机构 * Department of Mathematics, Purdue University(数学系,普渡大学)

AI总结 本文探讨了复正规化流与近似凯勒-里奇流之间的联系,将复正规化流中用于密度变换的对数行列式与凯勒流形的里奇曲率联系起来。通过引入增广雅可比矩阵和贝叶斯参数视角,研究揭示了复正规化流的对数密度在连续极限下与费舍尔信息度量相吻合,从而在时间导数和期望的意义下恢复了凯勒-里奇流的变体。该工作建立了复正规化流的统计行为与几何特征之间的桥梁,为理解深度生成模型提供了新的几何视角。

详情
英文摘要

We develop interconnections between the complex normalizing flow for data drawn from Borel probability measures on the twofold realification of the complex manifold and a nonlinear flow nearly Kähler-Ricci. The complex normalizing flow relates the initial and target realified densities under the complex change of variables, necessitating the log determinant of the ensemble of Wirtinger Jacobians. The Ricci curvature of a Kähler manifold is the second order mixed Wirtinger partial derivative of the log of the local density of the volume form. Therefore, we reconcile these two facts by drawing forth the connection that the log determinant used in the complex normalizing flow matches a Ricci curvature term under differentiation and conditions. The log density under the normalizing flow is kindred to a spatial Fisher information metric under an augmented Jacobian and a Bayesian perspective to the parameter, thus under the continuum limit the log likelihood matches a Fisher metric, recovering a Kähler-Ricci flow variation up to a time derivative and expectation, or an average-valued Kähler-Einstein flow. Using this framework, we establish other relevant results, attempting to bridge the statistical and ordinary behaviors of the complex normalizing flow to the geometric features of our derived Kähler flow.

2604.09603 2026-05-15 cs.DC cs.AI cs.LG

ECHO: Elastic Speculative Decoding with Sparse Gating for High-Concurrency Scenarios

Xinyi Hu, Yuhao Shen, Baolin Zhang, Hengxin Zhang, Jun Dai, Shuang Ge, Lei Chen, Yue Li, Mingcheng Wan

发表机构 * Qwen Applications Business Group of Alibaba(阿里巴巴文勤应用业务部)

AI总结 ECHO 是一种面向高并发场景的弹性推测解码框架,旨在提升大语言模型推理效率。该方法通过稀疏置信度门控机制,将推测执行重新建模为预算调度问题,灵活平衡解码深度与宽度,从而减少全局验证步骤并提高每步效率。实验表明,ECHO 在多种模型规模下均优于现有方法,尤其在工业级模型 Qwen3-235B 上实现了最高达 5.35 倍的加速效果。

详情
英文摘要

Speculative Decoding promises to accelerate the inference of Large Language Models, yet its efficacy often degrades in production-grade serving. Existing evaluations typically overlook the compute-bound nature of high-concurrency regimes, where verification compute becomes the dominant bottleneck. Consequently, prior methods face a dilemma: static trees incur massive verification waste, while dynamic trees suffer from cumulative misjudgments and kernel incompatibility. To bridge this gap, we introduce ECHO, a high concurrency-oriented framework integrated into SGLang that reformulates speculative execution as a budgeted scheduling problem. Crucially, ECHO employs sparse confidence gating to manage the batch as a unified super-tree, elastically pivoting budget between depth and width to co-optimize the trade-off between reducing global verification steps and maximizing per-step efficiency. Extensive evaluations across diverse model scales-particularly the industrial-grade Qwen3-235B-demonstrate that ECHO consistently outperforms SOTA methods in both low-load and high-load scenarios, achieving up to 5.35x walltime speedup and delivering over 20% relative speedup gain.

2603.29097 2026-05-15 eess.AS cs.SD

Asymmetric Encoder-Decoder Based on Time-Frequency Correlation for Speech Separation

Ui-Hyeop Shin, Hyung-Min Park

发表机构 * Department of Electronic Engineering, Sogang University(电子工程系,首尔大学)

AI总结 本文研究了在真实声学环境下如何有效分离混叠语音信号的问题,提出了一种基于时频相关性的不对称编码-解码框架SR-CorrNet。该方法通过引入分离-重建策略,结合时频双路径结构,实现了对说话人特征的逐步细化提取,并利用结构化的相关性到滤波估计方法提升分离效果。实验表明,该方法在多种数据集和不同环境条件下均取得了显著的性能提升。

Comments Submitted to IEEE Transactions on Audio, Speech, and Language Processing (TASLPRO) Code: https://github.com/dmlguq456/SR_CorrNet

详情
英文摘要

Speech separation in realistic acoustic environments remains challenging because overlapping speakers, background noise, and reverberation must be resolved simultaneously. Although recent time-frequency (TF) domain models have shown strong performance, most still rely on late-split architectures, where speaker disentanglement is deferred to the final stage, creating an information bottleneck and weakening discriminability under adverse conditions. To address this issue, we propose SR-CorrNet, an asymmetric encoder-decoder framework that introduces the separation-reconstruction (SepRe) strategy into a TF dual-path backbone. The encoder performs coarse separation from mixture observations, while the weight-shared decoder progressively reconstructs speaker-discriminative features with cross-speaker interaction, enabling stage-wise refinement. To complement this architecture, we formulate speech separation as a structured correlation-to-filter problem: spatio-spectro-temporal correlations computed from the observations are used as input features, and the corresponding deep filters are estimated to recover target signals. We further incorporate an attractor-based dynamic split module to adapt the number of output streams to the actual speaker configuration. Experimental results on WSJ0-{2,3,4,5}Mix, WHAMR!, and LibriCSS demonstrate consistent improvements across anechoic, noisy-reverberant, and real-recorded conditions in both single- and multi-channel settings, highlighting the effectiveness of TF-domain SepRe with correlation-based filter estimation for speech separation.

2603.24586 2026-05-15 cs.SE cs.CL

Comparing Developer and LLM Biases in Code Evaluation

Aditya Mittal, Ryan Shar, Zichu Wu, Shyam Agarwal, Tongshuang Wu, Chris Donahue, Ameet Talwalkar, Wayne Chi, Valerie Chen

发表机构 * Carnegie Mellon University(卡内基梅隆大学)

AI总结 随着大语言模型(LLM)在代码评估中被广泛用作评判者,研究其在真实交互场景中的表现变得尤为重要。本文提出TRACE框架,用于评估LLM评判者预测人类偏好和揭示人类与模型在代码质量评价上的系统性偏差的能力。研究发现,在多种代码交互场景中,最佳LLM评判者的表现仍比人类注释者低12%-23%,并识别出35个导致人类与模型评判不一致的关键因素,其中大部分与现有软件工程代码质量标准相关。

详情
英文摘要

As LLMs are increasingly used as judges in code applications, they should be evaluated in realistic interactive settings that capture partial context and ambiguous intent. We present TRACE (Tool for Rubric Analysis in Code Evaluation), a framework that evaluates LLM judges' ability to predict human preferences and automatically extracts rubric items to reveal systematic biases in how humans and models weigh each item. Across three modalities -- chat-based programming, IDE autocompletion, and instructed code editing -- we use TRACE to measure how well LLM judges align with developer preferences. Among 13 different models, the best judges underperform human annotators by 12-23%. TRACE identifies 35 significant sources of misalignment between humans and judges across interaction modalities, the majority of which correspond to existing software engineering code quality criteria. For example, in chat-based coding, judges are biased towards longer code explanations while humans prefer shorter ones. We find significant misalignment on the majority of existing code quality dimensions, showing alignment gaps between LLM judges and human preference in realistic coding applications.

2603.24422 2026-05-15 cs.IR cs.AI cs.CL

OneSearch-V2: The Latent Reasoning Enhanced Self-distillation Generative Search Framework

Ben Chen, Siyuan Wang, Yufei Ma, Zihan Liang, Xuxin Zhang, Yue Lv, Ying Yang, Huangyu Dai, Lingtao Mao, Tong Zhao, Zhipeng Qian, Xinyu Sun, Zhixin Zhai, Yang Zhao, Bochao Liu, Jingshan Lv, Xiao Liang, Hui Kong, Jing Chen, Han Li, Chenyi Lei, Wenwu Ou, Kun Gai

发表机构 * Kuaishou Technology(快手科技)

AI总结 本文提出了一种名为 OneSearch-V2 的生成式检索框架,旨在解决现有系统在复杂查询理解、用户意图挖掘和偏好过拟合等方面的问题。该方法通过引入潜在推理增强的自蒸馏训练机制,提升了对用户深层需求的理解与匹配能力,并结合行为偏好对齐优化系统,有效缓解了单一转化指标带来的奖励黑客问题。实验表明,OneSearch-V2 在多项指标上均有显著提升,包括点击率、买家数量和订单量,并改善了搜索体验质量。

Comments Codes are available at https://github.com/benchen4395/onesearch-family. Feel free to contact benchen4395@gmail.com

详情
英文摘要

Generative Retrieval (GR) has emerged as a promising paradigm for modern search systems. Compared to multi-stage cascaded architecture, it offers advantages such as end-to-end joint optimization and high computational efficiency. OneSearch, as a representative industrial-scale deployed generative search framework, has brought significant commercial and operational benefits. However, its inadequate understanding of complex queries, inefficient exploitation of latent user intents, and overfitting to narrow historical preferences have limited its further performance improvement. To address these challenges, we propose OneSearch-V2, a latent reasoning enhanced self-distillation generative search framework. It contains three key innovations: (1) a thought-augmented complex query understanding module, which enables deep query understanding and overcomes the shallow semantic matching limitations of direct inference; (2) a reasoning-internalized self-distillation training pipeline, which uncovers users' potential yet precise e-commerce intentions beyond log-fitting through implicit in-context learning; (3) a behavior preference alignment optimization system, which mitigates reward hacking arising from the single conversion metric, and addresses personal preference via direct user feedback. Extensive offline evaluations demonstrate OneSearch-V2's strong query recognition and user profiling capabilities. Online A/B tests further validate its business effectiveness, yielding +3.98\% item CTR, +2.07\% buyer volume, and +2.11\% order volume. Manual evaluation further confirms gains in search experience quality, with +1.37\% in page good rate and +1.65\% in query-item relevance. More importantly, OneSearch-V2 effectively mitigates common search system issues such as information bubbles and long-tail sparsity, without incurring additional inference costs or serving latency.

2603.00772 2026-05-15 stat.ML cs.LG

Generalizing Score-based generative models for Heavy-tailed Distributions

Tiziano Fassina, Gabriel Cardoso, Sylvan Le Corff, Thomas Romary

发表机构 * STIM, Mines Paris(STIM, Mines巴黎) LPSM, Sorbonne Université(LPSM,索邦大学)

AI总结 本文研究了如何将基于分数的生成模型(SGMs)推广到具有重尾分布的数据。针对现有方法在生成保真度和理论基础方面的不足,作者提出了两个理论贡献:一是证明通过早期停止和适当初始化可以将扩散框架扩展到任意目标分布;二是为归一化流的生成过程推导出新的理论保证。基于这些结果,文章提出了一种统一的生成框架,结合归一化流捕捉重尾特性与SGM细化结构细节,有效提升了生成质量并克服了现有方法的局限。

详情
英文摘要

Score-based generative models (SGMs) have achieved remarkable empirical success, motivating their application to a broad range of data distributions. However, extending them to heavy-tailed targets remains a largely open problem. Although dedicated models for heavy-tailed distributions have been proposed, their generative fidelity remains unclear and they lack solid theoretical foundations, leaving important questions open in this regime. In this paper, we address this gap through two theoretical contributions. First, we show that combining early stopping with a suitable initialization is sufficient to extend the diffusion framework to any target distribution; in particular, we establish the well-posedness of the backward process and prove convergence of the approximated diffusion in KL divergence. Second, we derive novel theoretical guarantees for generation with normalizing flows, obtaining convergence results that hold under mild conditions on the flow family and without any assumption on the tail behavior of the target distribution. Building on these results, we propose a unified generative framework for heavy-tailed distributions: a normalizing flow is first trained to capture the tail behavior and is then used as an initialization prior for an SGM, which refines the samples by recovering fine-grained structural details. This design leverages the complementary strengths of the two model classes within a theoretically principled pipeline, overcoming the limitations of existing approaches.

2602.17407 2026-05-15 eess.SY cs.RO cs.SY

Bluetooth Phased-array Aided Inertial Navigation Using Factor Graphs: Experimental Verification

Glen Hjelmerud Mørkbak Sørensen, Torleiv H. Bryne, Kristoffer Gryte, Tor Arne Johansen

发表机构 * Department of Engineering Cybernetics, Norwegian University of Science and Technology (NTNU)(工程 cybernetics 部,挪威科学技术大学(NTNU))

AI总结 本文研究了利用相控阵蓝牙系统辅助惯性导航的问题,提出基于因子图优化的估计方法,并通过多旋翼无人机飞行实验验证其性能。研究对比了不同鲁棒估计策略在GNSS信号丢失场景下的表现,展示了蓝牙角度、距离或气压测量辅助导航的可行性与效果。该工作为低成本、高鲁棒性的室内导航系统提供了实验依据与方法支持。

Comments 6 pages, 5 figures, 2 tables. \c{opyright} 2026 the authors. This work has been accepted to IFAC for publication under a Creative Commons Licence CC-BY-NC-ND

详情
英文摘要

Phased-array Bluetooth systems have emerged as a low-cost alternative for performing aided inertial navigation in GNSS-denied use cases such as warehouse logistics, drone landings, and autonomous docking. Basing a navigation system off of commercial-off-the-shelf components may reduce the barrier of entry for phased-array radio navigation systems, albeit at the cost of significantly noisier measurements and relatively short feasible range. In this paper, we compare robust estimation strategies for a factor graph optimisation-based estimator using experimental data collected from multirotor drone flight. We evaluate performance in loss-of-GNSS scenarios when aided by Bluetooth angular measurements, as well as range or barometric pressure.

2602.15249 2026-05-15 cs.DL cs.AI

Artificial Intelligence Specialization in the European Union: Underexplored Role of the Periphery at NUTS-3 Level

Victor Herrero-Solana, Carmen Gálvez

发表机构 * SCImago-UGR, Unit for Computational Humanities and Social Sciences (U^CHASS) University of Granada, Spain(SCImago-UGR,计算人文与社会科学单位(U^CHASS)格拉纳达大学,西班牙)

AI总结 本研究分析了2015年至2024年间欧洲NUTS-3地区在人工智能领域的研究分布情况,利用引文数据和分类系统,计算了相对专业化指数和相对引用影响力指标。研究发现,尽管巴黎、华沙和马德里等大都市在论文数量上占优,但人工智能领域的相对专业化程度最高的是东欧和西班牙的一些外围地区,如格拉纳达和维尔纽斯地区。研究还揭示了专业化与引用影响力之间关系较弱,不同地区呈现出多样化的发展模式。

Comments 15 pages, 3 figures

详情
英文摘要

This study examines the distribution of Artificial Intelligence (AI) research across European NUTS-3 regions during the period 2015-2024. Using bibliometric data from Clarivate InCites and the Citation Topics classification system, we analyse two hierarchical thematic levels: Electrical Engineering, Electronics & Computer Science (Macro Citation Topic 4) and Artificial Intelligence & Machine Learning (Meso Citation Topic 4.61). Relative Specialization Index (RSI) and Relative Citation Impact (RCI) indicators are calculated for 781 European NUTS-3 regions. While major metropolitan hubs such as Paris, Warszawa, and Madrid dominate in absolute publication volume, the results reveal that the highest levels of relative AI specialization are concentrated in peripheral regions, particularly in Eastern Europe and Spain. Granada and Vilniaus apskritis stand out as regions combining high specialization with strong citation visibility. The analysis further suggests a weak relationship between regional specialization and citation impact, revealing multiple regional profiles, including highly specialized regions with limited citation visibility, highly visible regions with comparatively low specialization, and diversified scientific systems combining moderate specialization with strong citation impact. Fyn emerges as an extreme case of very high citation impact despite relatively low specialization.

2602.14881 2026-05-15 math.OC cs.AI

Numerical exploration of the range of shape functionals using neural networks

Eloi Martinet, Ilias Ftouhi

发表机构 * Institute of Mathematics, University of Würzburg, Germany Laboratoire MIPA, N\ imes University, Site des Carmes, Place Gabriel P\'eri, 30000 N\ imes, France

AI总结 本文提出了一种基于神经网络的新数值框架,用于探索Blaschke–Santaló图,该图用于描述形状泛函之间的可能不等式关系。通过引入基于规范函数的可逆神经网络结构,实现了对任意维凸集的参数化,并在形状优化过程中保持凸性。为实现图内的均匀采样,作者设计了一种通过自动微分最小化Riesz能量泛函的粒子系统,并在二维和三维凸体的多个几何和偏微分方程型泛函上验证了方法的有效性。

Comments 20 pages, 8 figures

详情
英文摘要

We introduce a novel numerical framework for the exploration of Blaschke--Santaló diagrams, which are efficient tools characterizing the possible inequalities relating some given shape functionals. We introduce a parametrization of convex bodies in arbitrary dimensions using a specific invertible neural network architecture based on gauge functions, allowing an intrinsic conservation of the convexity of the sets during the shape optimization process. To achieve a uniform sampling inside the diagram, and thus a satisfying description of it, we introduce an interacting particle system that minimizes a Riesz energy functional via automatic differentiation in PyTorch. The effectiveness of the method is demonstrated on several diagrams involving both geometric and PDE-type functionals for convex bodies of $\mathbb{R}^2$ and $\mathbb{R}^3$, namely, the volume, the perimeter, the moment of inertia, the torsional rigidity, the Willmore energy, and the first two Neumann eigenvalues of the Laplacian.

2602.06718 2026-05-15 cs.CR cs.AI

GhostCite: A Large-Scale Analysis of Citation Validity in the Age of Large Language Models

Zuyao Xu, Yuqi Qiu, Lu Sun, Fasheng Miao, Fubin Wu, Xiang Li, Xinyi Wang, Haozhe Lu, Zhengze Zhang, Yuxin Hu, Jialu Li, Luo Jin, Feng Zhang, Rui Luo, Xinran Liu, Yingxian Li, Jiaji Liu

发表机构 * Nankai University(南开大学) Tsinghua University(清华大学)

AI总结 《GhostCite:大语言模型时代引文有效性的大规模分析》研究了大型语言模型(LLMs)在学术写作中广泛使用所引发的引文有效性问题。研究开发了一个开源框架\citeb,用于大规模验证引文,并通过三个实验分析了LLMs生成虚假引文(“幽灵引文”)的现象。研究发现,所有测试的LLMs在不同领域生成引文时都有较高比例的虚构引文,且近年来学术会议论文中的无效引文比例显著上升,同时多数研究者依赖AI工具,但审稿人对引文的审查并不严格,反映出当前学术出版体系在应对这一问题上的不足。

详情
英文摘要

Citations provide the basis for trusting scientific claims; when they are invalid or fabricated, this trust collapses. With the advent of Large Language Models (LLMs), this risk has intensified: LLMs are increasingly used for academic writing, but their tendency to fabricate citations (``ghost citations'') poses a systemic threat to citation validity. To quantify this threat, we develop \citeb, an open-source framework for large-scale citation verification, and conduct a comprehensive study of citation validity in the LLM era through three complementary experiments. First, we benchmark 13 LLMs on citation generation task in various research domains, finding that all models hallucinate citations at rate from 14.23\% to 94.93\%. Second, we analyze 2.2 million citations from 56,381 papers at AI/ML and Security venues (2020--2025), finding that 1.07\% of papers contain invalid citations, with an 80.9\% increase in 2025. Third, we survey 97 researchers, finding that 87.2\% use AI-powered tools in their workflows, 76.7\% of reviewers do not thoroughly check references, and 74.5\% view peer review as ineffective at catching citation errors. Based on these findings, we argue that ghost citations represent a systemic threat to academic integrity, and call for coordinated efforts from community to address this challenge.

2602.03680 2026-05-15 physics.soc-ph cs.SD

Instantaneous Spectra Analysis of Pulse Series -- Application to Lung Sounds with Abnormalities

Fumihiko Ishiyama

发表机构 * NTT Inc.(日本电通公司)

AI总结 本文研究了脉冲序列的瞬时频谱分析方法,并将其应用于异常肺音(如爆裂音和哮鸣音)及正常肺音的分析。传统傅里叶分析的时间频率分辨率受限于周期边界条件假设,作者提出采用线性外推条件替代该假设,从而实现更精确的瞬时频谱分析。该方法能够有效提取脉冲序列中每个脉冲的频谱信息,并生成脉冲序列的时频图,清晰展示其时间频率结构,为异常肺音的识别提供了新的分析工具。

Comments 10 pages, 7 figures. To appear Proc. IEEE CSPA 2026

详情
英文摘要

The origin of the "theoretical limit of time-frequency resolution of Fourier analysis" is from its numerical implementation, especially from an assumption of "Periodic Boundary Condition (PBC)," which was introduced a century ago. We previously proposed to replace this condition with "Linear eXtrapolation Condition (LXC)," which does not require periodicity. This feature makes instantaneous spectra analysis of pulse series available, which replaces the short time Fourier transform (STFT). We applied the instantaneous spectra analysis to two lung sounds with abnormalities (crackles and wheezing) and to a normal lung sound, as a demonstration. Among them, crackles contains a random pulse series. The spectrum of each pulse is available, and the spectrogram of pulse series is available with assembling each spectrum. As a result, the time-frequency structure of given pulse series is visualized.

2512.12772 2026-05-15 cs.MM cs.CV

JointAVBench: A Benchmark for Joint Audio-Visual Reasoning Evaluation

Jianghan Chao, Jianzhang Gao, Wenhui Tan, Yuchong Sun, Ruihua Song, Liyun Ru

发表机构 * Gaoling School of Artificial Intelligence, Renmin University of China(中国人民大学香樟人工智能学院) Baichuan Inc(百川科技)

AI总结 为了全面评估能够处理多模态信息的全大语言模型(Omni-LLMs),本文提出JointAVBench基准,涵盖多模态依赖、多样化的音频信息类型和不同场景跨度三个关键方面。该基准通过自动化流程生成严格依赖音视频联合理解的问题与答案,弥补了现有数据集在多模态评估方面的不足。实验表明,即使表现最好的Omni-LLM在该基准上的平均准确率也仅为65.3%,显示出在跨场景推理等方面仍有较大提升空间。

详情
英文摘要

Understanding videos inherently requires reasoning over both visual and auditory information. To properly evaluate Omni-Large Language Models (Omni-LLMs), which are capable of processing multi-modal information including vision and audio, an effective benchmark must comprehensively cover three key aspects: (1) multi-modal dependency (i.e., questions that cannot be answered using vision or audio alone), (2) diverse audio information types (e.g., speech, sound events), and (3) varying scene spans. However, existing datasets fall short in one or more of these dimensions, limiting strict and comprehensive evaluation. To address this gap, we introduce JointAVBench, a novel benchmark with strict audio-video correlation, spanning five cognitive dimensions, four audio information types (speech, sound events, music, vocal traits), and three scene spans (single-, cross-, and full-scene). Given the high cost of manual annotation, we propose an automated pipeline that leverages state-of-the-art vision-LLMs, audio-LLMs, and general-purpose LLMs to synthesize questions and answers that strictly require joint audio-visual understanding. We evaluate leading vision-only, audio-only, and Omni-LLMs on our dataset. Results show that even the best-performing Omni-LLM achieves an average accuracy of only 65.3\%, outperforming uni-modal baselines but revealing substantial room for improvement, especially in cross-scene reasoning.

2511.21247 2026-05-15 eess.AS cs.LG cs.SD

The Spheres Dataset: Multitrack Orchestral Recordings for Music Source Separation and Information Retrieval

Jaime Garcia-Martinez, David Diaz-Guerra, John Anderson, Ricardo Falcon-Perez, Pablo Cabañas-Molero, Tuomas Virtanen, Julio J. Carabias-Orti, Pedro Vera-Candeas

发表机构 * Universidad de Jaén(耶鲁大学) Odratek BV(Odratek公司) Tampere University(塔尔库大学)

AI总结 本文介绍了《Spheres数据集》,这是一个包含多轨管弦乐录音的数据集,旨在推动经典音乐领域中音乐源分离及相关音乐信息检索任务的机器学习研究。数据集由Colibrì乐团在The Spheres录音棚演奏的超过一小时的音乐作品组成,包括柴可夫斯基《罗密欧与朱丽叶》和莫扎特第四十号交响曲,并附有各乐器的音阶和独奏片段。通过23个麦克风的多角度录制,该数据集提供了真实立体声混音、可控的音轨混入以及独立音轨,适用于源分离模型的训练与评估,并附有各乐器位置的房间脉冲响应,为研究提供了丰富的声学特性信息。

Journal ref in IEEE Transactions on Audio, Speech and Language Processing, vol. 34, pp. 2622-2634, 2026

详情
英文摘要

This paper introduces The Spheres dataset, multitrack orchestral recordings designed to advance machine learning research in music source separation and related MIR tasks within the classical music domain. The dataset is composed of over one hour recordings of musical pieces performed by the Colibrì Ensemble at The Spheres recording studio, capturing two canonical works - Tchaikovsky's Romeo and Juliet and Mozart's Symphony No. 40 - along with chromatic scales and solo excerpts for each instrument. The recording setup employed 23 microphones, including close spot, main, and ambient microphones, enabling the creation of realistic stereo mixes with controlled bleeding and providing isolated stems for supervised training of source separation models. In addition, room impulse responses were estimated for each instrument position, offering valuable acoustic characterization of the recording space. We present the dataset structure, acoustic analysis, and baseline evaluations using X-UMX based models for orchestral family separation and microphone debleeding. Results highlight both the potential and the challenges of source separation in complex orchestral scenarios, underscoring the dataset's value for benchmarking and for exploring new approaches to separation, localization, dereverberation, and immersive rendering of classical music.

2511.18820 2026-05-15 physics.flu-dyn cs.LG

Unsupervised simulation of incompressible flows with physics- and equality- constrained artificial neural networks

Qifeng Hu, Inanc Senocak

发表机构 * Department of Mechanical Engineering and Materials Science, University of Pittsburgh, Pittsburgh, PA 15261, USA(机械工程与材料科学系,匹兹堡大学,匹兹堡,PA 15261,USA)

AI总结 该研究提出了一种基于物理约束和等式约束的人工神经网络(PECANN)框架,用于无监督模拟不可压缩流体在高雷诺数下的流动。通过引入压力泊松方程目标函数和条件自适应增广拉格朗日乘子法(CA-ALM),严格满足连续性方程和边界条件,有效解决了传统物理信息神经网络在高雷诺数流动中难以保证无散性约束的问题。实验表明,该方法在多个典型流动场景中无需监督预训练或标签数据,即可准确捕捉流动结构,包括高雷诺数下圆柱绕流中涡旋脱落的自发产生。

Comments 33 pages, 19 figures

详情
英文摘要

Physics-informed neural networks (PINNs) have shown promise for solving partial differential equations, yet their success in simulating incompressible flows at high Reynolds numbers remains limited. Existing approaches rely on auxiliary labeled data, supervised pretraining, or reference solutions, and no purely unsupervised method comparable to conventional finite-difference or finite-volume solvers has been demonstrated. We attribute this gap to the absence of a mechanism for enforcing the divergence-free constraint and boundary conditions to strict tolerances. To address this, we adopt the physics- and equality-constrained artificial neural network (PECANN) framework with a conditionally adaptive augmented Lagrangian method (CA-ALM), and introduce a pressure-Poisson-based objective. The residual of the pressure Poisson equation is minimized subject to the momentum and continuity equations and boundary conditions on the primitive variables as equality constraints, with CA-ALM enforcing all constraints tightly. For advection-dominated, high-Reynolds-number flows, we further propose an adaptive vanishing entropy viscosity that stabilizes early training without influencing the converged solution. A baseline that instead uses the momentum residual as the objective proves ineffective under the same machinery, underscoring the critical role of the pressure-Poisson objective. The method is assessed on lid-driven cavity flow up to $Re=7{,}500$, three-dimensional unsteady Beltrami flow, and steady and unsteady flow past a circular cylinder with general inflow-outflow boundary conditions, including an ablation study identifying admissible outlet conditions -- all without labeled data or supervised pretraining. Notably, it captures the spontaneous onset of periodic vortex shedding in unsteady cylinder flow without external perturbations, starting from a randomly initialized network.