arXivDaily: arXiv daily academic digest, updated Monday to Friday
2605.14786 2026-05-15 cs.CR cs.AI cs.HC cs.LG

Known By Their Actions: Fingerprinting LLM Browser Agents via UI Traces

William Lugoloobi, Samuelle Marro, Jabez Magomere, Joss Wright, Chris Russell

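Affiliations: Oxford Internet Institute, University of Oxford; Department of Engineering Science, University of Oxford

AI summary: As large language model (LLM) agents increasingly browse the web on users' behalf, a natural question is whether a website can passively identify the underlying model that drives an agent. This study finds that an agent's actions and interaction timings, captured by a passive JavaScript tracker, identify the model with up to 96% F1. It also shows that classifiers trained on agent behaviour generalise across model sizes and families, and that effective classifiers can be trained from only a few interaction traces. Injecting random timing delays degrades classifier performance, but retraining on delayed traces restores identification.

Abstract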

As LLM-based agents increasingly browse the web on users' behalf, a natural question arises: can websites passively identify which underlying model powers an agent? Doing so would represent a significant security risk, enabling targeted attacks tailored to known model vulnerabilities. Across 14 frontier LLMs and four web environments spanning information retrieval and shopping tasks, we show that an agent's actions and interaction timings, captured via a passive JavaScript tracker, are sufficient to identify the underlying model with up to 96% F1. We formalise this attack surface by demonstrating that classifiers trained on agent actions generalise across model sizes and families. We further show that strong classifiers can be trained from few interaction traces and that agent identity can be inferred early within an episode. Injecting randomised timing delays between actions substantially degrades classifier performance, but does not provide robust protection: a classifier retrained on delayed traces largely recovers performance. We release our harness and a labelled corpus of agent traces at https://github.com/KabakaWilliam/known_actions.
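The timing signal and the delay defence mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' harness: the feature set, the function names `timing_features` and `inject_delays`, the uniform delay distribution, and the example timestamps are all assumptions chosen for illustration.

```python
import random
import statistics

def timing_features(timestamps_ms):
    """Summarise the gaps between consecutive actions in one trace."""
    gaps = [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]
    return {
        "mean_gap": statistics.mean(gaps),
        "stdev_gap": statistics.pstdev(gaps),
        "max_gap": max(gaps),
    }

def inject_delays(timestamps_ms, max_delay_ms, rng):
    """Add a random non-negative delay before every action after the first."""
    shifted = [timestamps_ms[0]]
    offset = 0.0
    for t in timestamps_ms[1:]:
        offset += rng.uniform(0.0, max_delay_ms)  # assumed delay distribution
        shifted.append(t + offset)
    return shifted

# Hypothetical action timestamps (milliseconds) from a single episode.
trace = [0.0, 820.0, 1650.0, 2400.0, 3300.0]
delayed = inject_delays(trace, max_delay_ms=500.0, rng=random.Random(0))
base = timing_features(trace)
noisy = timing_features(delayed)
```

Features like these would be one input to a classifier. A classifier retrained on `delayed` traces sees the shifted distribution, which is consistent with the abstract's observation that delays alone are not a robust defence.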

2605.14750 2026-05-15 cs.CR cs.AI

EVA: Editing for Versatile Alignment against Jailbreaks

Yi Wang, Hongye Qiu, Yue Xu, Sibei Yang, Zhan Qin, Minlie Huang, Wenjie Wang

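Affiliations: ShanghaiTech University; Sun Yat-sen University; State Key Laboratory of Blockchain and Data Security; Tsinghua University

AI summary: Large language models (LLMs) and vision language models (VLMs) perform well but remain vulnerable to jailbreak attacks, in which attackers use textual or visual triggers to bypass safety guardrails. To address the heavy computational overhead and performance degradation of existing defenses, the paper proposes EVA, a framework that uses direct model editing to precisely correct the key neurons responsible for jailbreak behaviour without large-scale retraining, removing harmful behaviour while preserving the model's original capabilities. Experiments show that EVA outperforms existing methods across multiple models, offering an efficient and precise solution for post-deployment safety alignment.

Comments: IEEE TPAMI 2026

Abstract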

Large Language Models (LLMs) and Vision Language Models (VLMs) have demonstrated impressive capabilities but remain vulnerable to jailbreaking attacks, where adversaries exploit textual or visual triggers to bypass safety guardrails. Recent defenses typically rely on safety fine-tuning or external filters to reduce the model's likelihood of producing harmful content. While effective to some extent, these methods often incur significant computational overheads and suffer from the safety-utility trade-off, degrading the model's performance on benign tasks. To address these challenges, we propose EVA (Editing for Versatile Alignment against Jailbreaks), a novel framework that pioneers the application of direct model editing for safety alignment. EVA reframes safety alignment as a precise knowledge correction task. Instead of retraining massive parameters, EVA identifies and surgically edits specific neurons responsible for the model's susceptibility to harmful instructions, while leaving the vast majority of the model unchanged. By localizing the updates, EVA effectively neutralizes harmful behaviors without compromising the model's general reasoning capabilities. Extensive experiments demonstrate that EVA outperforms baselines in mitigating jailbreaks across both LLMs and VLMs, offering a precise and efficient solution for post-deployment safety alignment.

2605.14741 2026-05-15 eess.SY cs.AI cs.SY

Addressing Terminal Constraints in Data-Driven Demand Response Scheduling

Maximilian Bloor, Martha White, Ehecatl Antonio del Rio Chanona, Calvin Tsay

Affiliations: Sargent Centre for Process Systems Engineering, Imperial College London, London, SW7 2AZ, UK; Department of Computer Science, University of Alberta, Edmonton, AB, Canada; Department of Computing, Imperial College London, London, SW7 2AZ, UK

AI summary: This paper studies how to satisfy terminal constraints in data-driven demand response scheduling. It combines Goal-Space Planning (GSP) with Deep Deterministic Policy Gradient (DDPG), learning temporally abstract models over discrete subgoals to propagate long-horizon value and improve scheduling. On a simulated air separation system, the method improves sample efficiency and satisfies terminal storage constraints, mitigating the weakness of conventional methods in handling long-horizon constraints.

Comments: Accepted to IFAC World Congress 2026

Abstract

Electrified chemical processes are incentivized by exposure to time-varying electricity markets to operate flexibly, but participating in demand response schemes can require satisfying terminal constraints over long horizons. Specifically, terminal constraints may be required when computing optimal schedules in order to preserve dynamic stability. Model-based optimization methods are computationally costly, and data-driven scheduling via reinforcement learning (RL) faces severe credit-assignment challenges. We integrate Goal-Space Planning (GSP) with Deep Deterministic Policy Gradient (DDPG), using learned temporally abstract models over discrete subgoals to propagate value across extended horizons. Using a simulated air separation benchmark, we demonstrate the proposed approach improves sample efficiency over standard DDPG while satisfying terminal storage constraints, mitigating myopic control behavior.

2605.14731 2026-05-15 cs.GR cs.CV cs.SD

UMo: Unified Sparse Motion Modeling for Real-Time Co-Speech Avatars

Xiaoyu Zhan, Xinyu Fu, Chenghao Yang, Xiaohong Zhang, Dongjie Fu, Pengcheng Fang, Tengjiao Sun, Xiaohao Cai, Hansung Kim, Yuanqi Li, Jie Guo, Yanwen Guo

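Affiliations: Nanjing University; Mogo AI Ltd.; University of Southampton

AI summary: This paper proposes UMo, a unified sparse motion modeling method for high-fidelity, real-time co-speech avatar animation. UMo processes text, audio, and motion in a unified formulation and combines a spatially sparse Mixture-of-Experts framework with a temporally sparse keyframe design to perform efficient real-time dense reconstruction, improving generation quality while maintaining temporal coherence and high fidelity. UMo also uses a multi-stage training strategy and targeted audio augmentation to improve the precision of speech-motion alignment and semantic consistency, offering a practical solution for real-time co-speech animation.

Abstract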

Speech-driven gestures and facial animations are fundamental to expressive digital avatars in games, virtual production, and interactive media. However, existing methods are either limited to a single modality for audio motion alignment, failing to fully utilize the potential of massive human motion data, or are constrained by the representation ability and throughput of multimodal models, which makes it difficult to achieve high-quality motion generation or real-time performance. We present UMo, a unified sparse motion modeling architecture for real-time co-speech avatars, which processes text, audio, and motion tokens within a unified formulation. Leveraging a spatially sparse Mixture-of-Experts framework and a temporally sparse, keyframe-centric design, UMo efficiently performs real-time dense reconstruction, enabling temporally coherent and high-fidelity animation generation for both facial expressions and gestures. Furthermore, we implement a multi-stage training strategy with targeted audio augmentation to enhance acoustic diversity and semantic consistency. Consequently, UMo preserves fine-grained speech-motion alignment even under strict latency constraints. Extensive quantitative and qualitative evaluations show that UMo achieves better output quality under low latency and real-time performance constraints, offering a practical solution for high-fidelity real-time co-speech avatars.

2605.14671 2026-05-15 cond-mat.mtrl-sci cs.AI

Agentic Design of Compositional Descriptors via Autoresearch for Materials Science Applications

Matteo Cobelli, Stefano Sanvito

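Affiliations: School of Physics; CRANN Institute, Trinity College, Dublin 2, Dublin, Ireland

AI summary: This paper presents Automat, an agentic system built on the autoresearch framework for designing chemical-composition descriptors in materials science. The system uses a large language model as a coding agent to generate descriptors based only on chemical formulas and evaluates them with a random forest, predicting band gaps of inorganic materials and Curie temperatures of ferromagnetic compounds. Automat outperforms conventional baselines and produces chemically interpretable descriptors, showing the potential to design task-specific materials descriptors without manual feature engineering, while also revealing current challenges such as descriptor redundancy and search strategy.

Abstract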

Autoresearch offers a flexible paradigm for automating scientific tasks, in which an AI agent proposes, implements, evaluates, and refines candidate solutions against a quantitative objective. Here, we use composition-based materials-property prediction to test whether such agents can perform a task beyond model selection and hyperparameter optimization: the design of input descriptors. We introduce Automat, an autoresearch framework where a coding agent based on a large language model generates composition-only descriptors for chemical compounds and evaluates them using a random forest workflow. The agent is restricted to information derivable from chemical formulas and iteratively proposes, implements, and tests chemically motivated descriptor strategies. We apply Automat, with OpenAI Codex using GPT-5.5 as the coding agent, to the prediction of experimental band gaps in inorganic materials and Curie temperatures in ferromagnetic compounds. In both tasks, Automat improves over fractional-composition, Magpie, and combined fractional-composition/Magpie baselines, while producing descriptor families that are chemically interpretable. These results provide a demonstration that autoresearch agents can generate competitive, task-specific materials descriptors without manual feature engineering during the run. They also reveal current limitations, including descriptor redundancy, sensitivity to greedy feature expansion, and the need for explicit complexity control, descriptor pruning, and more sophisticated search strategies.

2605.14662 2026-05-15 math.OC cs.LG

Scalable Solution of the Stochastic Multi-path Traveling Salesman Problem via Neural Networks

Xiaochen Chou, Ludovica Di Marco, Enza Messina

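Affiliations: Department of Informatics, Systems and Communication

AI summary: This paper studies the multi-path Traveling Salesman Problem with stochastic travel costs, which arises in Smart City and City Logistics applications, and seeks a Hamiltonian tour that minimizes the expected total travel cost. The authors adopt a two-stage stochastic programming approach and introduce neural-network surrogate models to approximate the second-stage recourse problem, substantially reducing computational complexity. Experiments show good computational efficiency, solution quality, and generalization, giving a scalable approach to complex vehicle routing problems under uncertainty.

Abstract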

The multi-path Traveling Salesman Problem with stochastic travel costs arises in hybrid vehicle routing applications designed for Smart City and City Logistics, where multiple paths exist between each pair of locations. Travel times along these paths are typically affected by real-time traffic conditions and therefore modeled as stochastic. The objective of the problem is to determine a Hamiltonian tour that minimizes the expected total travel cost under uncertainty. In this work, we adopt a two-stage stochastic programming formulation. In the first stage, a predefined route specifying the sequence of locations to be visited is determined, while taking into consideration a second-stage recourse problem that selects the optimal path from the feasible set of alternative paths for each pair of locations, once real-time traffic conditions are realized. To reduce the computational burden imposed by the large number of scenarios required to capture travel time uncertainty, the innovation of this work is the integration of neural network-based surrogate models to approximate the expected value of the second-stage recourse problem. Different architectures and training strategies for the neural networks are proposed and analyzed, with performance evaluated in terms of computation time, solution quality, and generalization capability. Preliminary findings demonstrate the enhanced scalability and practical applicability of the approach for complex vehicle routing problems under uncertainty.
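The quantity that the neural surrogate approximates, the expected second-stage cost of a fixed first-stage tour, can be estimated by plain scenario sampling. The sketch below shows only that baseline estimate, not the authors' surrogate. The cost table `NOMINAL`, the uniform multiplier range, and all function names are hypothetical.

```python
import random

def tour_cost(tour, path_costs):
    """Second-stage recourse: on each leg, take the cheapest realised path."""
    legs = zip(tour, tour[1:] + tour[:1])  # close the Hamiltonian tour
    return sum(min(path_costs[leg]) for leg in legs)

def expected_tour_cost(tour, sample_scenario, n_scenarios, rng):
    """Sample-average estimate of the expected total travel cost."""
    total = 0.0
    for _ in range(n_scenarios):
        total += tour_cost(tour, sample_scenario(rng))
    return total / n_scenarios

# Hypothetical nominal costs: two alternative paths per ordered location pair.
NOMINAL = {(0, 1): [3.0, 5.0], (1, 2): [4.0, 2.0], (2, 0): [6.0, 7.0]}

def sample_scenario(rng):
    """Assumed traffic model: each path cost is scaled by a random factor."""
    return {leg: [c * rng.uniform(0.8, 1.5) for c in costs]
            for leg, costs in NOMINAL.items()}

estimate = expected_tour_cost([0, 1, 2], sample_scenario, 1000, random.Random(0))
```

The cost of this estimate grows with the number of scenarios and must be paid for every candidate tour, which is the computational burden the abstract cites as motivation for a learned approximation.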

2605.14629 2026-05-15 eess.IV cs.CV

Efficient Dense Matching for Enhanced Gaussian Splatting Using AV1 Motion Vectors

Julien Zouein, Vibhoothi Vibhoothi, François Pitié, Anil Kokaram

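Affiliations: SigMedia

AI summary: This paper proposes an efficient dense matching method based on AV1 motion vectors to improve the initial point cloud for 3D Gaussian Splatting (3DGS). By using the motion vectors of the AV1 video codec, it avoids the time-consuming exhaustive matching of conventional SfM, substantially lowering computational overhead and increasing point cloud density. Experiments show that the method yields up to eight times as many points as conventional SfM, improving 3DGS reconstruction accuracy and training efficiency.

Abstract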

3D Gaussian Splatting (3DGS) has emerged as a prominent framework for real-time, photorealistic scene reconstruction, offering significant speed-ups over Neural Radiance Fields (NeRF). However, the fidelity of 3DGS representations remains heavily dependent on the quality of the initial point cloud. While standard Structure-from-Motion (SfM) pipelines using COLMAP provide adequate initialisation, they often suffer from high computational costs and sparsity in textureless regions, which degrades subsequent reconstruction accuracy and convergence speed. In this work, we introduce an AV1-based feature detection and matching pipeline that significantly reduces SfM processing overhead. By leveraging motion vectors inherent to the AV1 video codec, we bypass computationally expensive exhaustive matching while maintaining geometric robustness. Our pipeline produces substantially denser point clouds, with up to eight times as many points as classical SfM. We demonstrate that this enhanced initialisation directly improves 3DGS performance, yielding a 9-point increase in VMAF and a 63% average reduction in training time required to reach baseline quality. The project page: https://sigmedia.tv/AV1-3DGS.github.io/

2605.14612 2026-05-15 cs.SE cs.AI

In-IDE Toolkit for Developers of AI-Based Features

Yaroslav Sokolov, Yury Khudyakov, Lenar Sharipov, Andrei Gasparian, Parth Tiwary, Artem Trofimov

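Affiliations: JetBrains

AI summary: This paper presents the AI Toolkit plugin, integrated into JetBrains IDEs, which helps software engineers without a machine learning background test, debug, and evaluate AI features built on large language models and agentic workflows. By bringing tracing and evaluation into the Run/Debug loop, the tool meets developers' core needs for repeatable evaluation, real-time tracing, and simplified setup. Results indicate that the tool lowers the barrier to entry and helps developers adopt disciplined AI development practices.

Comments: Published at IDE'26 co-located with ICSE'26

Abstract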

AI-enabled features built on LLMs and agentic workflows are difficult to test, debug, and reproduce, especially for product-focused software engineers without a machine learning background. We present the AI Toolkit plugin for JetBrains IDEs, which brings tracing and evaluation directly into the Run/Debug loop. A mixed methods study with practitioners presents three consistent needs: (1) make evaluation regular and repeatable, (2) expose traces at the moment of execution, and (3) minimize setup and context switching. Guided by these needs, the AI Toolkit introduces an IDE-native workflow: run-triggered trace capture; immediate, hierarchical inspection; one-click "Add to Dataset" from traces; and unit-test-like evaluations with pluggable metrics. The first release in PyCharm shows promising early signals - strong conversion when promoted at Run, sustained usage among those who capture traces, and low churn - suggesting that IDE-native observability lowers activation energy and helps developers adopt disciplined practices. We detail the design and implementation of the AI Agents Debugger and AI Evaluation, report initial adoption telemetry, and outline next steps to broaden framework coverage and scale evaluations. Together, these results indicate that integrating AI observability and evaluation into everyday IDE workflows can make modern AI development accessible to non-ML specialists while preserving software-engineering practices.

2605.14584 2026-05-15 physics.chem-ph cs.LG

All-atomistic Transferable Neural Potentials for Protein Solvation

Rishabh Dey, Salvina Sharipova, Konstantin Popov

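Affiliations: University of North Carolina at Chapel Hill - Eshelman School of Pharmacy

AI summary: This study proposes PHNN, an all-atom transferable neural potential for protein solvation calculations. The model improves the accuracy of implicit solvent models by learning transferable corrections to model parameters rather than applying post hoc adjustments to final energies. PHNN incorporates physical priors to improve data efficiency, substantially improves prediction accuracy over traditional analytical methods, and generalizes well to protein systems outside the training domain.

Abstract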

Implicit solvent models are widely used to decrease the number of solvent degrees of freedom and enable the calculation of solvation energetics without water molecules. However, their accuracy often falls short compared to explicit models. Recent advancements in neural potentials have shown promise in drug discovery, but transferability remains a persistent challenge. Here, we introduce the Protein Hydration Neural Network (PHNN), an implicit solvent model that extends analytical continuum solvation by learning transferable corrections to model parameters instead of applying post hoc adjustments to final energies. The model is explicitly designed to maximize data efficiency by leveraging physical priors embedded in the data. We demonstrate that PHNN improves accuracy relative to traditional analytical methods and maintains predictive accuracy on out-of-domain protein systems.

2605.14567 2026-05-15 stat.ML cs.LG math.PR math.ST stat.TH

Scaling Laws from Sequential Feature Recovery: A Solvable Hierarchical Model

Arie Wortsman-Zurich, Hugo Tabanelli, Yatin Dandi, Florent Krzakala, Bruno Loureiro

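Affiliations: Département d’Informatique, Ecole Normale Supérieure, PSL & CNRS; Information Learning and Physics Laboratory, École Polytechnique Fédérale de Lausanne (EPFL); Statistical Physics of Computation Laboratory, École Polytechnique Fédérale de Lausanne (EPFL)

AI summary: This paper proposes a simple mechanism explaining how feature learning in multi-layer networks gives rise to scaling laws. It studies a high-dimensional hierarchical target function that is globally high-degree but can be represented by a set of latent compositional features whose weights decay as a power law. A layer-wise spectral algorithm recovers these latent features sequentially: strong features are detectable at small sample sizes, while weak features require more data. The theoretical analysis establishes an explicit power-law decay of the prediction error, and numerical experiments confirm the sequential feature recovery and the performance gap relative to non-hierarchical methods.

Abstract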

We propose a simple mechanism by which scaling laws emerge from feature learning in multi-layer networks. We study a high-dimensional hierarchical target that is a globally high-degree function, but that can be represented by a combination of latent compositional features whose weights decrease as a power law. We show that a layer-wise spectral algorithm adapted to this compositional structure achieves improved scaling relative to shallow, non-adaptive methods, and recovers the latent directions sequentially: strong features become detectable at small sample sizes, while weaker features require more data. We prove sharp feature-wise recovery thresholds and show that aggregating these transitions yields an explicit power-law decay of the prediction error. Technically, the analysis relies on random matrix methods and a resolvent-based perturbation argument, which gives matching upper and lower bounds for individual eigenvector recovery beyond what standard gap-based perturbation bounds provide. Numerical experiments confirm the predicted sequential recovery, finite-size smoothing of the thresholds, and separation from non-hierarchical kernel baselines. Together, these results show how smooth scaling laws can emerge from a cascade of sharp feature-learning transitions.
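How a cascade of sharp thresholds can add up to a smooth power law can be checked numerically in a toy model. The model is an assumption made for illustration, not the paper's setting: feature k carries weight k^(-beta) and becomes recoverable once the sample size n reaches k^gamma, so the remaining error is the squared weight of all features not yet recovered. Under that assumption the error should decay roughly as n^(-(2*beta - 1)/gamma).

```python
import math

def residual_error(n, beta, gamma, n_features=200_000):
    """Squared weight of the features whose threshold exceeds n samples."""
    # Assumed toy model: feature k has weight k**-beta and threshold k**gamma.
    recovered = int(n ** (1.0 / gamma))
    return sum(k ** (-2.0 * beta) for k in range(recovered + 1, n_features + 1))

def empirical_exponent(n_small, n_large, beta, gamma):
    """Log-log slope of the error decay between two sample sizes."""
    ratio = residual_error(n_small, beta, gamma) / residual_error(n_large, beta, gamma)
    return math.log(ratio) / math.log(n_large / n_small)

beta, gamma = 1.0, 2.0                   # hypothetical exponents
predicted = (2 * beta - 1) / gamma       # tail-sum estimate of the decay rate
measured = empirical_exponent(10**4, 10**6, beta, gamma)
```

Each individual feature contributes a step to the error curve, but the steps are dense enough in k that the aggregate follows the predicted slope closely.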

2605.14563 2026-05-15 cs.SE cs.CL

Remember Your Trace: Memory-Guided Long-Horizon Agentic Framework for Consistent and Hierarchical Repository-Level Code Documentation

Suyoung Bae, Jaehoon Lee, Changkyu Choi, YunSeok Choi, Jee-Hyong Lee

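Affiliations: Sungkyunkwan University; University of Oslo

AI summary: This paper proposes MemDocAgent, a long-horizon agentic framework for generating consistent, hierarchical repository-level code documentation. Using dependency-aware traversal guiding and memory-guided agentic interaction, it generates documentation in an integrated way across the whole repository, addressing the redundant retrieval, conflicting descriptions, and disorganized structure of existing methods. Experiments show that MemDocAgent outperforms open- and closed-source baselines on multiple evaluation criteria and is practically applicable to software development.

Abstract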

Automated code documentation is essential for modern software development, providing the contextual grounding that both human developers and coding agents rely on to navigate large codebases. Existing repository-level approaches process components independently, causing redundant retrieval and conflicting descriptions across documents while producing outputs that lack hierarchical structure. Therefore, we propose MemDocAgent, a long-horizon agentic framework that generates documentation within a single, integrated context spanning the entire repository. It combines two components: (i) Dependency-Aware Traversal Guiding that predetermines a traversal order respecting dependency and granularity hierarchies; (ii) Memory-Guided Agentic Interaction, in which the agent interacts with RepoMemory, a shared memory accumulating prior work traces through read, write, and verify operations. Through an in-depth multi-criteria evaluation, MemDocAgent achieves the best performance over both open and closed-source baselines and demonstrates practical applicability in real software development workflows.
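A traversal order that respects dependencies, as in component (i), can be approximated with a topological sort. This is a generic sketch rather than MemDocAgent's procedure: it ignores the granularity hierarchy the abstract also mentions, and the component names and the `documentation_order` helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical repository: each component maps to the components it depends on.
dependencies = {
    "utils.parse": set(),
    "utils.format": set(),
    "core.Loader": {"utils.parse"},
    "core.Report": {"utils.format", "core.Loader"},
    "cli.main": {"core.Report"},
}

def documentation_order(deps):
    """Return components so that every dependency precedes its dependents."""
    return list(TopologicalSorter(deps).static_order())

order = documentation_order(dependencies)
```

Documenting in this order means that when the agent reaches `core.Report`, descriptions of the components it calls already exist and can be read from shared memory instead of being retrieved again.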

2605.14526 2026-05-15 cs.GR cs.DC cs.NA cs.RO math.NA

DiffPhD: A Unified Differentiable Solver for Projective Heterogeneous Materials in Elastodynamics with Contact-Rich GPU-Acceleration

Shih-Yu Lai, Sung-Han Tien, Jui-I Huang, Yen-Chen Tseng, Yi-Ting Chiu, Siyuan Luo, Ziqiu Zeng, Fan Shi, Peter Yichen Chen, Tiantian Liu, Yu-Lun Liu, Bing-Yu Chen

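Affiliations: National Taiwan University; MoonShine Animation Studio; National University of Singapore; The University of British Columbia; Independent Researcher; National Yang Ming Chiao Tung University

AI summary: DiffPhD is a unified, GPU-accelerated differentiable Projective Dynamics framework for elastodynamics problems involving heterogeneous materials, hyperelasticity under large deformations, and complex contact interactions. It introduces stiffness-aware projective weights, trust-region eigenvalue filtering, and an improved Anderson Acceleration scheme, and integrates them into a unified GPU pipeline to simulate heterogeneous materials efficiently and stably. DiffPhD substantially improves computational efficiency while preserving gradient accuracy, remains convergent under large stiffness contrasts, and supports end-to-end optimization of complex physical systems.

Abstract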

Differentiable simulation of soft bodies is a foundation for system identification, trajectory optimization, and Real2Sim transfer. Yet, existing methods such as the differentiable Projective Dynamics (DiffPD) struggle when faced with heterogeneous materials with extreme stiffness contrasts, hyperelasticity under large deformations, and contact-rich interactions, which are common scenarios in the real world. We present DiffPhD, a unified GPU-accelerated differentiable Projective Dynamics framework for heterogeneous materials that tackles these intertwined challenges simultaneously. Our key insight is a careful integration of: (i) stiffness-aware projective weights to embed heterogeneity into the global system; (ii) trust-region eigenvalue filtering lifted to the backward pass for stable hyperelastic gradients and a type-II Anderson Acceleration scheme with dual-gate convergence to stabilize forward iteration under large stiffness contrasts; and (iii) a unified GPU pipeline that reuses a single sparse factor across forward, backward, and contact computations, with stiffness-amplified Rayleigh damping folded into the same factor for heterogeneity-aware dissipation at zero recurring cost. DiffPhD achieves strict gradient accuracy while delivering up to an order-of-magnitude speedup over prior differentiable solvers on heterogeneous, hyperelastic, contact-rich benchmarks. Crucially, this speedup does not come at the cost of stability: DiffPhD remains convergent on stiffness contrasts up to 100x where prior PD solvers degrade. This unlocks end-to-end gradient-based optimization on regimes previously bottlenecked by either solver fragility or per-iteration cost (shell-joint composite creatures, soft characters wielding stiff weapons, and soft-gripper robotic manipulation), all handled within a single forward-backward pass.

2605.14524 2026-05-15 stat.ML cs.LG

Large Dimensional Kernel Ridge Regression: Extending to Product Kernels

Yang Zhou, Yicheng Li, Yuqian Cheng, Qian Lin

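Affiliations: Department of Statistics and Data Science; Tsinghua University; Department of Mathematical Science

AI summary: This paper studies the generalization error of large-dimensional kernel ridge regression (KRR) under a broader class of kernels, extending earlier results that covered only inner product kernels on the sphere. The authors introduce a new family of large-dimensional kernels and derive the corresponding convergence rates of the generalization error. They find that minimax optimality, the saturation effect, and the periodic plateaus in the convergence rate with multiple descent in the sample size persist in this more general setting, broadening the understanding of large-dimensional KRR.

Abstract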

Recent studies have reported saturation effects and multiple descent behavior in large dimensional kernel ridge regression (KRR). However, these findings are predominantly derived under restrictive settings, such as inner product kernels on the sphere or strong eigenfunction assumptions like hypercontractivity. Whether such behaviors hold for other kernels remains an open question. In this paper, we establish a broad, new family of large dimensional kernels and derive the corresponding convergence rates of the generalization error. As a result, we recover key phenomena previously associated with inner product kernels on the sphere, including: (i) the minimax optimality when the source condition $s\le 1$; (ii) the saturation effect when $s>1$; (iii) a periodic plateau phenomenon in the convergence rate and a multiple-descent behavior with respect to the sample size $n$.

2605.14512 2026-05-15 cs.IR cs.AI

Asymmetric Generative Recommendation via Multi-Expert Projection and Multi-Faceted Hierarchical Quantization

Bin Huang, Xin Wang, Junwei Pan, Yongqi Zhou, Yifeng Zhou, Zhixiang Feng, Shudong Huang, Haijie Gu, Wenwu Zhu

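Affiliations: DCST, Tsinghua University; DCST, BNRist, Tsinghua University; Tencent

AI summary: This paper addresses the input and output bottlenecks of Generative Recommendation (GenRec) models and proposes AsymRec, an asymmetric continuous-discrete framework. Multi-expert Semantic Projection (MSP) enriches the semantics of the input representation and Multi-faceted Hierarchical Quantization (MHQ) improves the structure and precision of the output targets, mitigating popularity bias and the loss of fine-grained semantics. Experiments show that AsymRec outperforms existing generative recommenders across multiple datasets, with an average improvement of 15.8%.

Abstract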

Generative Recommendation (GenRec) models reformulate recommendation as a sequence generation task, representing items as discrete Semantic IDs used symmetrically as both inputs and prediction targets. We identify a critical dual-stage information bottleneck in this design: (1) the Input Bottleneck, where lossy quantization degrades fine-grained semantics, while popularity bias skews the learned representations toward frequent items, and (2) the Output Bottleneck, where imprecise discrete targets limit supervision quality. To address these issues, we propose AsymRec, an asymmetric continuous-discrete framework that decouples input and output representations. Specifically, Multi-expert Semantic Projection (MSP) maps continuous embeddings into the Transformer's hidden space via expert-specialized projections, preserving semantic richness and improving generalization to infrequent items. Multi-faceted Hierarchical Quantization (MHQ) constructs high-capacity, structured discrete targets through multi-view and multi-level quantization with semantic regularization, preventing dimensional collapse while retaining fine-grained distinctions. Extensive experiments demonstrate that AsymRec consistently outperforms state-of-the-art generative recommenders by an average of 15.8%. The code will be released.

2605.14502 2026-05-15 eess.SY cs.AI cs.SY

Quantifying Cyber-Vulnerability in Power Electronics Systems via an Impedance-Based Attack Reachable Domain

Hongwei Zhen, Ze Yu, Xin Xiang, Wuhua Li, Mingyang Sun

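Affiliations: IEEE

AI summary: This paper studies how to quantify the vulnerability of power electronics systems to cyber attacks. It proposes an impedance-based Attack Reachable Domain (ARD) framework to assess how far a node can be pushed toward instability under constrained privileges. The method maps feasible attack actions to critical-eigenvalue migration through impedance reshaping and defines an Attack Penetration Index that jointly characterizes the penetration of the stability margin and the accessibility of successful attacks. For cases where inverter models are unavailable, it builds a practical gray-box assessment workflow that combines existing impedance identification with differentiable surrogate tools. Case studies show that the method reveals vulnerability patterns that conventional grid-strength indicators do not capture.

Abstract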

Power electronics systems are increasingly exposed to cyber threats due to their integration with digital controllers and communication networks. However, an attacker-oriented metric is still lacking to quantify the extent to which a node can be pushed toward instability within a privilege-constrained action space. This letter proposes an impedance-based Attack Reachable Domain (ARD) framework that maps feasible adversarial actions to critical-eigenvalue migration through impedance reshaping. Based on the ARD, an Attack Penetration Index is defined to quantify node-level cyber-vulnerability by jointly characterizing the penetration of the nominal stability margin and the accessibility of successful destabilizing attacks within a privilege-constrained action space. To make the proposed assessment computable when inverter models are unavailable, a practical gray-box workflow is further established by integrating existing impedance identification and differentiable surrogate tools. Case studies on a 4-bus system and a modified IEEE 39-bus system show that coordinated cross-layer manipulations are markedly more damaging than isolated single-layer attacks, and that the proposed metric reveals vulnerability patterns that cannot be inferred from grid-strength indicators.

2605.14501 2026-05-15 eess.SY cs.AI cs.LG cs.SY

Fully Dynamic Rebalancing in Dockless Bike-Sharing Systems via Deep Reinforcement Learning

Edoardo Scarpel, Alberto Pettena, Matteo Cederle, Federico Chiariotti, Marco Fabris, Gian Antonio Susto

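Affiliations: University of Padua

AI summary: This paper proposes a fully dynamic rebalancing method based on deep reinforcement learning for dockless bike-sharing systems. The service is modeled with a graph-based simulator and rebalancing is cast as a Markov decision process; a deep reinforcement learning agent dispatches in real time, performing localized pick-up, drop-off, and charging actions guided by spatiotemporal criticality scores. Experiments on real-world data show significant reductions in availability failures while limiting spatial inequality and mobility deserts, demonstrating the value of learning-based rebalancing for the efficiency and reliability of shared micromobility.

Comments: 6 pages, 5 figures, 1 table, accepted at the 23rd IFAC World Congress, Busan, South Korea, Aug. 23-26, 2026. Open invited track 9-131: "Control and Optimization for Smart Cities"

Abstract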

This paper proposes a fully dynamic Deep Reinforcement Learning (DRL) method for rebalancing dockless bike-sharing systems, overcoming the limitations of periodic, system-wide interventions. We model the service through a graph-based simulator and cast rebalancing as a Markov decision process. A DRL agent routes a single truck in real time, executing localized pick-up, drop-off, and charging actions guided by spatiotemporal criticality scores. Experiments on real-world data show significant reductions in availability failures with a minimal fleet size, while limiting spatial inequality and mobility deserts. Our approach demonstrates the value of learning-based rebalancing for efficient and reliable shared micromobility.

2605.14495 2026-05-15 cs.MM cs.AI

Contestable Multi-Agent Debate with Arena-based Argumentative Computation for Multimedia Verification

Truong Thanh Hung Nguyen, Vo Thanh Khang Nguyen, Hoang-Loc Cao, Phuc Ho, Van Pham, Hung Cao

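Affiliations: University of New Brunswick; University of Science, VNU-HCM

AI summary: Targeting multimedia verification, which demands both accuracy and transparency, this work proposes a contestable multi-agent framework that combines multimodal large language models, external verification tools, and arena-based bipolar argumentation. The method decomposes each case into claim-centered sections, retrieves targeted evidence, and generates support and attack arguments with provenance and strength scores. Local argument graphs handle conflict resolution and uncertainty, producing verification reports that are clearly structured, editable, and computationally practical.

Comments: ACM ICMR 2026 Grand Challenge on Multimedia Verification

Abstract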

Multimedia verification requires not only accurate conclusions but also transparent and contestable reasoning. We propose a contestable multi-agent framework that integrates multimodal large language models, external verification tools, and arena-based quantitative bipolar argumentation (A-QBAF) as a submission to the ICMR 2026 Grand Challenge on Multimedia Verification. Our method decomposes each case into claim-centered sections, retrieves targeted evidence, and converts evidence into structured support and attack arguments with provenance and strength scores. These arguments are resolved through small local argument graphs with selective clash resolution and uncertainty-aware escalation. The resulting system generates section-wise verification reports that are transparent, editable, and computationally practical for real-world multimedia verification. Our implementation is public at: https://github.com/Analytics-Everywhere-Lab/MV2026_the_liems.

2605.14478 2026-05-15 cs.SE cs.AI cs.CL

When Retrieval Hurts Code Completion: A Diagnostic Study of Stale Repository Context

Haojun Weng, Qianqian Yang, Hao Fu, Haobin Pan, Xinwei Lv

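Affiliations: Independent Researcher, California, USA; Independent Researcher, Beijing, China

AI summary: This study examines how stale code snippets in retrieval-augmented code generation can harm code completion. In controlled experiments on 17 production-helper signature changes from five Python repositories, it finds that stale-only retrieval strongly induces code incompatible with the current repository state, whereas using no retrieval yields completions that mostly fail validation. Adding valid current code largely mitigates the problems caused by stale information, showing that the temporal validity of retrieved content is an important factor in evaluating the robustness of retrieval-augmented code generation.

Comments: 31 pages, 2 tables. Submitted to Information and Software Technology (Elsevier)

Abstract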

Context: Retrieval-augmented code generation relies on cross-file repository context, but retrieved snippets may come from obsolete project states. Objectives: We study whether temporally stale repository snippets act as harmless noise or actively induce current-state-incompatible code. Methods: We conduct a controlled diagnostic study on a curated 17-sample set of production-helper signature changes from five Python repositories. For each sample, we compare current-only, stale-only, no-retrieval, and mixed current/stale retrieval conditions under prompts that hide commit freshness and expected current signatures. Results: Under neutralized prompts, stale-only retrieval induces stale helper references on 15/17 Qwen2.5-Coder-7B-Instruct samples and 13/17 gpt-4.1-mini samples, corresponding to 88.2 and 76.5 percentage-point increases over current-only retrieval. No retrieval produces zero stale references but only 1/17 passing completions. The two models share 75.0% Jaccard overlap among stale-triggering samples, and mixed conditions show that adding valid current evidence largely rescues stale-only failures. Conclusion: Temporal validity of retrieved repository context is a distinct diagnostic variable for Code RAG robustness: stale context can actively bias models toward obsolete repository state rather than merely removing useful evidence.
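The figures in the Results sentence can be cross-checked with a few lines of arithmetic. The counts (15/17, 13/17) and the 75.0% Jaccard value come from the abstract. The sketch assumes the Jaccard index is taken over the two models' sets of stale-triggering samples and that 75.0% is exact, and the helper name `overlaps_matching` is hypothetical.

```python
from fractions import Fraction

n_samples = 17
stale_qwen, stale_gpt = 15, 13  # stale-reference counts reported in the abstract

rate_qwen = 100 * stale_qwen / n_samples  # percentage of samples affected
rate_gpt = 100 * stale_gpt / n_samples

def overlaps_matching(size_a, size_b, jaccard):
    """Intersection sizes consistent with a given Jaccard index."""
    return [i for i in range(min(size_a, size_b) + 1)
            if Fraction(i, size_a + size_b - i) == jaccard]

# Jaccard = |A & B| / |A | B|, with |A | B| = |A| + |B| - |A & B|.
shared = overlaps_matching(stale_qwen, stale_gpt, Fraction(3, 4))
```

Under these assumptions the only consistent overlap is 12 shared samples out of a union of 16. The rates round to 88.2% and 76.5%, which equal the reported percentage-point increases only if current-only retrieval produced no stale references.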

2605.14434 2026-05-15 cs.IR cs.AI

Efficient Generative Retrieval for E-commerce Search with Semantic Cluster IDs and Expert-Guided RL

Jianbo Zhu, Xing Fang, Jing Wang, Mingmin Jin, Bokang Wang, Guangxin Song, Zhenyu Xie, Junjie Bai

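Affiliations: Taobao & Tmall Group of Alibaba, Hangzhou, China

AI summary: This work addresses the practical obstacles to deploying generative retrieval in e-commerce search and proposes an efficient generative recall framework, CQ-SID, which uses semantic cluster IDs and expert-guided reinforcement learning to reduce search complexity and improve recall quality. CQ-SID combines category- and query-constrained contrastive learning with residual quantized VAEs to produce hierarchical semantic identifiers, substantially shrinking the beam search size. The accompanying EG-GRPO method injects ground-truth samples to align generative recall with downstream ranking. Experiments show relative gains of up to 26.76% and 11.11% in semantic and personalized click hit rate, along with GMV and conversion-rate improvements in the production system.

English abstract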

Generative retrieval offers a promising alternative by unifying the fragmented multi-stage retrieval process into a single end-to-end model. However, its practical adoption in industrial e-commerce search remains challenging, given the massive and dynamic product catalogs, strict latency requirements, and the need to align retrieval with downstream ranking goals. In this work, we propose a retrieval framework tailored for real-world recall scenarios, positioning generative retrieval as a recall-stage supplement rather than an end-to-end replacement. Our method, CQ-SID (Category-and-Query constrained Semantic ID), employs category-aware and query-item contrastive learning along with Residual Quantized VAEs to encode items into hierarchical semantic cluster identifiers, significantly reducing beam search complexity. Additionally, we develop EG-GRPO (Expert-Guided Group Relative Policy Optimization), a reinforcement learning approach that aligns generative recall with downstream ranking under sparse rewards by injecting ground-truth samples to stabilize training. Offline experiments on TmallAPP search logs show that CQ-SID achieves up to 26.76% and 11.11% relative gains in semantic and personalized click hitrate over RQ-VAE baselines, while halving beam search size. EG-GRPO further improves multi-objective performance. Online A/B tests confirm gains in GMV (+1.15%) and UCTCVR (+0.40%). The generative recall channel now contributes substantially in production, accounting for over 50.25% of exposures, 58.96% of clicks, and 72.63% of purchases, demonstrating a viable path for deploying generative retrieval in real-world e-commerce systems.
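
The idea of encoding an item as a hierarchical identifier can be illustrated with plain residual quantization. This is a minimal sketch under stated assumptions: the codebooks are hand-written toy values and the function names are hypothetical. It does not reproduce the learned RQ-VAE or the category- and query-constrained contrastive training that CQ-SID uses.

```python
def nearest(codebook, vec):
    """Index of the codeword with the smallest squared Euclidean distance to vec."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((c - v) ** 2 for c, v in zip(codebook[i], vec)),
    )

def residual_quantize(vec, codebooks):
    """Return one code index per level; each level quantizes the previous residual."""
    ids, residual = [], list(vec)
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        ids.append(idx)
        # Subtract the chosen codeword so the next level refines what is left.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return ids, residual

# Toy two-level codebooks (assumed values, for illustration only).
codebooks = [
    [[0.0, 0.0], [4.0, 4.0]],              # level 1: coarse clusters
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]],  # level 2: finer offsets
]

ids, residual = residual_quantize([4.9, 4.1], codebooks)
print(ids)
```

Each item ends up with a short sequence of indices, coarse to fine, so items that share a prefix fall in the same cluster. A decoder that generates such IDs level by level searches over codebook entries rather than over individual items.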

2605.14426 2026-05-15 physics.ao-ph cs.AI

A plug-and-play generative framework for multi-satellite precipitation estimation

Yunfan Yang, Haofei Sun, Xiuyu Sun, Wei Han, Xiaoze Xu, Xingtao Song, Jun Li, Zhiqiu Gao, Wei Huang, Hao Li

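Affiliations: State Key Laboratory of Atmospheric Boundary Layer Physics and Atmospheric Chemistry; Institute of Atmospheric Physics, Chinese Academy of Sciences; Shanghai Academy of Artificial Intelligence for Science (SAIS); CMA Earth System Modeling and Prediction Centre (CEMC)

AI summary: This study proposes PRISMA, a plug-and-play generative framework for multi-satellite precipitation estimation. It learns an unconditional precipitation prior from IMERG Final fields and combines it with independently trained, sensor-specific conditional branches, so new sensor data can be integrated without retraining the generative backbone. Experiments show clear gains in both accuracy and efficiency; in particular, fusing infrared and microwave observations markedly raises the Critical Success Index and lowers root-mean-square error.

English abstract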

Reliable precipitation monitoring is essential for disaster risk reduction, water resources management, and agricultural decision-making. Multi-source satellite observations, particularly the combination of geostationary infrared and passive microwave measurements, have become a primary means of precipitation detection. Traditional multi-source satellite precipitation estimation methods remain computationally inefficient, and many deep learning methods lack the flexibility to incorporate new sensors without retraining the full model. Here we introduce PRISMA (Precipitation Inference from Satellite Modalities via generAtive modeling), a plug-and-play latent generative framework for multi-sensor precipitation estimation. PRISMA learns an unconditional precipitation prior from IMERG Final fields and constrains it through independently trained, sensor-specific conditional branches, allowing new observation sources to be incorporated without retraining the generative backbone. Applied to FY-4B AGRI infrared and GPM GMI microwave observations, PRISMA improves Critical Success Index by up to 40.3% and reduces root-mean-square error by 22.6% relative to infrared-only estimation within microwave swaths, while also improving probabilistic skill and maintaining an average inference time of about 37 s. Independent rain-gauge validation across China confirms consistent gains, and typhoon case studies show that microwave conditioning restores eyewall and spiral rainband structures, reducing storm-core mean absolute error by up to 42.3%. PRISMA thus provides an extensible and efficient framework for multi-sensor precipitation estimation.

2605.14421 2026-05-15 cs.CR cs.AI

MemLineage: Lineage-Guided Enforcement for LLM Agent Memory

Ciyan Ouyang, Rui Hou

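Affiliations: State Key Laboratory of Cyberspace Security Defense, Institute of Information Engineering, CAS, Beijing, China

AI summary: MemLineage is a defense for large language model (LLM) agent memory that attaches cryptographic provenance and an LLM-mediated derivation lineage to every memory entry. It treats memory management as a chain-of-custody problem, using a Merkle log and a directed acyclic graph (DAG) to record how each memory was produced, so that malicious content cannot justify sensitive actions while useful recall is preserved. In experiments across several memory-poisoning scenarios, MemLineage sharply reduces the attack success rate at very low performance overhead.

Comments 24 pages, 8 figures. Rui Hou is the corresponding author

English abstract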

We introduce MemLineage, a defense for LLM agent memory that attaches both cryptographic provenance and LLM-mediated derivation lineage to every entry. Recent and concurrent work shows that untrusted content can be written into persistent agent state and re-enter later sessions as an instruction; the remaining systems question is how to preserve useful memory recall while preventing such state from justifying sensitive actions. MemLineage treats this as a chain-of-custody problem rather than a filtering problem. It is a six-module design around an RFC-6962 Merkle log over per-principal Ed25519-signed entries: a weighted derivation DAG records which retrieved entries influenced each new memory, and a max-of-strong-edges propagation rule makes Untrusted-Path Persistence hold for any chain whose attribution edges remain above threshold. The sensitive-action gate then refuses dispatches whose active justification descends from an external ancestor, while still allowing benign recall. We evaluate three defense cells against three memory-poisoning workloads on a deterministic mechanism-isolation harness; MemLineage is the only configuration in that harness that drives all three columns to zero ASR, while sub-millisecond per-operation overhead keeps it well below the noise floor of any LLM call. A Codex-backed AgentDojo bridge further separates strong-model behavior from defense-layer behavior: under an intentionally vulnerable tool-output profile, no-defense and signature-only baselines fail on all six banking pairs, while all MemLineage rows reduce strict AgentDojo ASR to zero. The core deterministic artifacts are byte-equal CI-verified; hosted-model AgentDojo and live-model sweeps are recorded as auditable logs rather than byte-pinned artifacts.
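
The Merkle log mentioned above follows RFC 6962, whose tree hash is short enough to sketch. This is a minimal illustration of the RFC 6962 Merkle Tree Hash only: the entry bytes are hypothetical, and the Ed25519 signatures, derivation DAG, and sensitive-action gate are omitted. How MemLineage serializes entries before hashing is not stated in the abstract.

```python
import hashlib

def merkle_tree_hash(entries: list[bytes]) -> bytes:
    """RFC 6962 Merkle Tree Hash over an ordered list of log entries."""
    n = len(entries)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        # Leaf hashes are domain-separated with a 0x00 prefix.
        return hashlib.sha256(b"\x00" + entries[0]).digest()
    # k is the largest power of two strictly smaller than n.
    k = 1
    while k * 2 < n:
        k *= 2
    left = merkle_tree_hash(entries[:k])
    right = merkle_tree_hash(entries[k:])
    # Interior nodes are domain-separated with a 0x01 prefix.
    return hashlib.sha256(b"\x01" + left + right).digest()

# Hypothetical memory entries; a real log would hash signed, serialized records.
log = [b"entry-0", b"entry-1", b"entry-2"]
root = merkle_tree_hash(log)
tampered_root = merkle_tree_hash([b"entry-0", b"entry-X", b"entry-2"])
print(root.hex())
```

Because every entry feeds into the root, changing any stored entry changes the root hash, which is what lets an auditor detect after-the-fact edits to the memory log.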

2605.14418 2026-05-15 cs.CR cs.AI

The Great Pretender: A Stochasticity Problem in LLM Jailbreak

Jean-Philippe Monteuuis, Cong Chen, Jonathan Petit

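Affiliations: Core contributors

AI summary: This paper identifies a key problem in how LLM jailbreak attacks are evaluated: attack success rate (ASR) is not stable, which makes results hard to compare across studies. Even attacks that report high ASR against closed models may succeed on only 50% of consecutive attempts against an open model in practice, revealing the effect of stochasticity on both jailbreak generation and evaluation. The authors propose a new evaluation framework, CAS-eval, and a generation framework, CAS-gen, which improve attack consistency and success rate and offer a route to standardized jailbreak evaluation.

English abstract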

"Oh-Oh, yes, I'm the great pretender. Pretending that I'm doing well. My need is such, I pretend too much..." summarizes the state of jailbreak creation and evaluation. You find a method for generating adversarial attacks proposed by a reputable institution (e.g., BoN from Anthropic or Crescendo from Microsoft Research). However, the method does not deliver on the promise claimed in the paper despite having top ASR scores against industry-grade LLMs. You successfully generate the jailbreak prompts against your target (open) model. However, the generated jailbreak prompt works against the target model with a 50% consecutive success rate (5 out of 10 attempts) despite having an 80% ASR (on paper) on the latest closed-source model (with a guardrail system)! This observation leads us to two conclusions. First, Attack Success Rate (ASR), the primary metric for LLM jailbreak benchmarking, is not a stable quantity. Second, published ASR numbers are therefore systematically inflated and incomparable across papers. We therefore ask: "Why does a successful jailbreak prompt not perform consistently well against the target model on which it was optimized?" To answer this question, we study the impact of stochasticity not only during attack evaluation but also during attack generation. Our evaluation includes several jailbreak attacks, models (different sizes and providers), and judges. In addition, we propose a new metric and two new frameworks (CAS-eval and CAS-gen). Our evaluation framework, CAS-eval, shows that an attack can have an ASR drop of up to 30 percentage points when a jailbreak prompt needs to succeed on more than one attempt. Thankfully, our attack generation framework (CAS-gen) improves previous jailbreak methods and helps them recover this loss of 30 percentage points!

2605.14415 2026-05-15 cs.SE cs.AI cs.CL

SWE-Chain: Benchmarking Coding Agents on Chained Release-Level Package Upgrades

Man Ho Lam, Chaozheng Wang, Hange Liu, Jingyu Xiao, Haau-sing Li, Jen-tse Huang, Terry Yue Zhuo, Michael R. Lyu

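Affiliations: The Chinese University of Hong Kong; Independent; ELLIS Technical University of Darmstadt; Johns Hopkins University; Monash University

AI summary: SWE-Chain is a benchmark for evaluating coding agents on chained, release-level package upgrades. The authors design a synthesis pipeline that aligns release notes with code diffs to produce realistic, feasible upgrade requirements, and build a test set covering 9 real Python packages, 155 version transitions, and 1,660 upgrade requirements. Experiments show that current mainstream coding agents still struggle with chained upgrades and have difficulty making correct changes without breaking existing functionality.

English abstract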

Coding agents powered by large language models are increasingly expected to perform realistic software maintenance tasks beyond isolated issue resolution. Existing benchmarks have shifted toward realistic software evolution, but they rarely capture continuous maintenance at the granularity of package releases, where changes are bundled, shipped, and inherited by subsequent versions. We present SWE-Chain, a benchmark for evaluating agents on chained release-level package upgrades, where each transition builds on the agent's prior codebase. To produce upgrade specifications, we design a divide-and-conquer synthesis pipeline that aligns release notes with code diffs for each version transition, ensuring the requirements are grounded in actual code changes, informative to agents, and feasible to implement. SWE-Chain contains 12 upgrade chains across 9 real Python packages, with 155 version transitions and 1,660 grounded upgrade requirements. Across nine frontier agent-model configurations, agents achieve an average of 44.8% resolving, 65.4% precision, and 50.2% F1 under the Build+Fix regime, with Claude-Opus-4.7 (Claude Code) leading at 60.8% resolving, 80.6% precision, and 68.5% F1. These results show that SWE-Chain is both feasible and discriminative, and reveal that current agents still struggle to make correct upgrades across chained package releases without breaking existing functionality.

2605.14386 2026-05-15 cs.NE cs.AI

Darwin Family: MRI-Trust-Weighted Evolutionary Merging for Training-Free Scaling of Language-Model Reasoning

Taebong Kim, Youngsik Hong, Minsik Kim, Sunyoung Choi, Jaewon Jang, Junghoon Shin, Minseo Kim

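Affiliations: VIDRAFT Inc.

AI summary: This paper proposes Darwin Family, a framework that improves the reasoning ability of large language models through training-free evolutionary merging. Built on gradient-free weight-space recombination, it introduces an adaptive merge genome, MRI-Trust Fusion, and a cross-architecture mapper to reorganize latent capabilities already present in existing checkpoints. Experiments show that Darwin models outperform their parent models on multiple tasks, indicating that reasoning performance can be improved without additional training.

Comments NeurIPS 2026 submission. 18 pages including appendix

English abstract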

We present Darwin Family, a framework for training-free evolutionary merging of large language models via gradient-free weight-space recombination. We ask whether frontier-level reasoning performance can be improved without additional training, by reorganizing latent capabilities already encoded in existing checkpoints. Darwin introduces three key ideas: (i) a 14-dimensional adaptive merge genome enabling fine-grained component- and block-level recombination; (ii) MRI-Trust Fusion, which adaptively balances diagnostic layer-importance signals with evolutionary search through a learnable trust parameter; and (iii) an Architecture Mapper that enables cross-architecture breeding between heterogeneous model families. Empirically, the flagship Darwin-27B-Opus achieves 86.9% on GPQA Diamond, ranking #6 among 1,252 evaluated models, and outperforming its fully trained foundation model without any gradient-based training. Across scales from 4B to 35B parameters, Darwin models consistently improve over their parents, support recursive multi-generation evolution, and enable a training-free evolutionary merge that combines Transformer- and Mamba-based components. Together, the Darwin Family demonstrates that diagnostic-guided evolutionary merging is a practical and reproducible alternative to costly post-training pipelines for reasoning-centric language models.

2605.14370 2026-05-15 physics.geo-ph cs.AI physics.comp-ph

Deciphering Neural Reparameterized Full-Waveform Inversion with Neural Sensitivity Kernel and Wave Tangent Kernel

Ruihua Chen, Yisi Luo, Bangyu Wu, Xile Zhao, Deyu Meng

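Affiliations: School of Mathematics and Statistics, Xi’an Jiaotong University; School of Mathematical Sciences, University of Electronic Science and Technology of China

AI summary: This paper studies the convergence mechanism of neural reparameterized full-waveform inversion (NeurFWI). It introduces the neural sensitivity kernel (NSK) and the wave tangent kernel (WTK), showing how the neural representation modulates the eigen-structure of the original sensitivity and wave tangent kernels and thereby shapes key behaviors of the inversion: the spectral filtering effect, gradient wavenumber modulation, and wave frequency bias. Building on this analysis, the authors propose enhanced NeurFWI methods that improve inversion performance and efficiency, validated in seismic exploration and medical imaging.

English abstract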

Full-waveform inversion (FWI) estimates unknown parameters in the wave equation from limited boundary measurements. Recent advances in neural reparameterized FWI (NeurFWI) demonstrate that representing the parameters using a neural network can reduce the reliance on a high-quality initial model and wavefield data, at the cost of slow high-resolution convergence. However, its underlying theoretical mechanism remains unclear. In this study, we establish the neural sensitivity kernel (NSK) and the wave tangent kernel (WTK) to analyze its convergence behavior from both model and data domains. These theoretical frameworks show that the neural tangent kernel (NTK) induced by neural representation adaptively modulates the original sensitivity and wave tangent kernels. This modulation leads to several key outcomes, i.e., the spectral filtering effect, the gradient wavenumber modulation, and the wave frequency bias, connecting the convergence behavior of NeurFWI with the eigen-structures of NSK and WTK. Building on these insights, we propose several enhanced NeurFWI methods with tailored eigen-structures in NSK and WTK to improve inversion performance and efficiency. We numerically validate these theoretical claims and the proposed methods in seismic exploration, and extend their application to medical imaging for the first time.

2605.14362 2026-05-15 cs.SE cs.AI

Correctness-Aware Repository Filtering Under Maximum Effective Context Window Constraints

Shweta Mishra

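Affiliations: Independent Researcher

AI summary: This study addresses context-window efficiency in LLM-based developer tools. It proposes a pre-execution, file-size-based filtering framework that excludes oversized non-code files before a repository scan reaches tokenization. The method relies only on OS-level metadata, has very low computational overhead, and requires neither indexing nor semantic analysis. Across multiple open-source repositories it substantially reduces input tokens; a limited-scope evaluation also shows higher code generation accuracy and a lower hallucination rate.

English abstract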

Context window efficiency is a practical constraint in large language model (LLM)-based developer tools. Paulsen [12] shows that all tested models degrade in accuracy well before their advertised context limits, at a point termed the Maximum Effective Context Window (MECW), which makes context construction a quality problem, not just a cost one. Modern software repositories routinely contain large non-code artifacts (compiled datasets, binary model weights, minified JavaScript bundles, and gigabyte-scale log files) that overflow the context window and push out task-relevant source code. We present a correctness-aware context hygiene framework: a pre-execution, size-based heuristic filter that intercepts repository scans before tokenization, using only OS-level stat() metadata with sub-millisecond overhead. Semantic retrieval approaches such as RepoCoder, GraphRAG, and AST-based chunking require index construction and query-time inference before any filtering decision is reached. Our framework, by contrast, requires no indexing and operates at <0.01 ms per file decision. Across 10 real open-source repositories (22,046 files, 5 languages), the proposed SizeFilter at θ = 1 MB achieves 79.6% (±13.2%) mean token reduction at 0.30 ms overhead; the HybridFilter achieves 89.3% (±9.0%), the lowest variance of any filter evaluated. A token-density study across 2,688 files confirms a strong linear correlation (Pearson r=0.997, k=0.250 tokens/byte). A limited-scope evaluation (18 tasks, CodeLlama-7B-Instruct) yields 72% file-level accuracy under filtering versus 25% at baseline; hallucination frequency declines from 61% to 17%. All code and data are released for reproducibility.
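
A size-based filter of the kind described can be sketched with only the standard library. This is a minimal illustration, not the released SizeFilter: the function names are hypothetical, θ = 1 MB is read as 10**6 bytes, and files exactly at the threshold are kept. Both are assumptions the abstract does not settle. The 0.250 tokens/byte density is the figure reported in the abstract.

```python
import os
import tempfile

TOKENS_PER_BYTE = 0.250       # density reported in the abstract
THRESHOLD_BYTES = 1_000_000   # assumption: theta = 1 MB read as 10**6 bytes

def size_filter(root: str, threshold: int = THRESHOLD_BYTES):
    """Walk `root` and split files into (kept, skipped) using only stat() sizes."""
    kept, skipped = [], []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            size = os.stat(path).st_size  # metadata only; the file is never opened
            (kept if size <= threshold else skipped).append((path, size))
    return kept, skipped

def estimate_tokens(files) -> int:
    """Rough token estimate from byte sizes via the linear density."""
    return round(sum(size for _path, size in files) * TOKENS_PER_BYTE)

# Toy repository: one small source file and one large binary artifact.
with tempfile.TemporaryDirectory() as repo:
    with open(os.path.join(repo, "main.py"), "wb") as f:
        f.write(b"x" * 4_000)
    with open(os.path.join(repo, "weights.bin"), "wb") as f:
        f.write(b"\0" * 2_000_000)
    kept, skipped = size_filter(repo)
    kept_tokens = estimate_tokens(kept)
    skipped_tokens = estimate_tokens(skipped)

print(kept_tokens, skipped_tokens)
```

The decision for each file needs one stat() call and one comparison, which is why no index or tokenizer pass is required before filtering. The token estimate is only as good as the linear fit, and it may not hold for binary files.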

2605.14360 2026-05-15 cs.HC cs.CL

A Formative Study of Brief Affective Text as a Complement to Wearable Sensing for Longitudinal Student Health Monitoring

Tamunotonye Harry, Johanna Hidalgo, Matthew Price, Yuanyuan Feng, Kathryn Stanton, Connie Tompkins, Peter Sheridan Dodds, Mikaela Irene Fudolig, Laura Bloomfield, Christopher Danforth

Affiliations: University of Vermont; University of Vermont Department of Computer Science; University of Vermont Department of Psychological Science; University of Vermont Vermont Complex Systems Center; University of Vermont Department of Rehabilitation; University of Vermont MassMutual Center of Excellence in Complex Systems; Adelaide University School of Mathematical Sciences; University of Vermont Department of Mathematics; Adelaide University

AI summary: This study explores how brief affective text can complement wearable-device data to monitor university students' long-term health more fully. Students gave short answers to an open-ended question about their concerns; these were combined with wearable data and analyzed with several natural language processing methods to relate emotion to health indicators such as sleep and activity. The results show that emotional expression, rather than topical content, is significantly associated with health indicators, suggesting that brief affective feedback can improve the psychological interpretability of passive physiological data.

Comments Submitted to ACM IMWUT

English abstract

Wearable devices capture physiological and behavioral data with increasing fidelity, but the psychological context shaping these outcomes is difficult to recover from sensor data alone, limiting passive sensing utility for digital health. We examined whether ultra-brief naturalistic concern text could serve as a scalable complement to passive sensing. In a year-long study of 458 university students (3,610 person-waves) tracked with Oura rings, participants responded bimonthly to an open-ended prompt about what concerned them most; responses had a median length of three words. We compared dictionary-based, general pretrained, and domain-adapted NLP approaches using within-person mixed-effects models across nine sleep and physical activity outcomes. Weeks dominated by academic concern framing were associated with lower physical activity; weeks characterized by emotional exhaustion language were associated with poorer sleep quality and lower heart rate variability. General pretrained embeddings outperformed domain-adapted models for most outcomes, with domain adaptation showing relative advantage for autonomic outcomes. Zero-shot classification of concern topics produced no significant associations, while affective dimensions across all three methods were consistently associated with outcomes, indicating emotional register rather than topical content carries the signal. These findings offer design guidance: ultra-brief affective prompts enrich the psychological interpretability of passive physiological data at minimal burden.

2605.14351 2026-05-15 eess.SY cs.LG cs.SY

Randomized Atomic Feature Models for Physics-Informed Identification of Dynamic Systems

Rajiv Singh, Mario Sznaier, Lennart Ljung

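Affiliations: The MathWorks Inc.; ECE Dept., Northeastern University

AI summary: This paper proposes a physics-informed system identification framework based on randomized stable atomic features. By representing impulse responses as random superpositions of damped complex exponentials associated with stable poles, it casts identification as a convex regularized least-squares problem with linear, second-order-cone, and KYP constraints. The approach generalizes random Fourier and Laplace features to the damped, nonstationary regime of engineering systems while retaining modal interpretability and scalable finite-dimensional computation. From an operator-theoretic viewpoint, the paper also analyzes how positive measures over stable poles generate positive-definite kernels, and gives an RKHS-to-ℓ1 embedding, random-feature convergence results, and conditional sparse-recovery guarantees.

Comments Extended version of the conference paper submitted for IFAC World Congress, 2026

English abstract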

We present a physics-informed framework for system identification based on randomized stable atomic features. Impulse responses are represented as random superpositions of stable atoms, namely damped complex exponentials associated with poles sampled inside a prescribed disk. Identification is then cast as a convex regularized least-squares problem with optional linear, second-order-cone, and KYP constraints. The approach generalizes random Fourier and random Laplace features to the damped, nonstationary regime relevant to engineering systems while retaining modal interpretability and scalable finite-dimensional computation. The main analytic point is an operator-theoretic Disk-Bochner viewpoint: positive measures over stable poles generate positive-definite kernels with a radius-dependent shift defect, while a converse scalar disk moment representation for an arbitrary kernel is characterized by subnormality of the canonical shift. We prove this statement, establish an RKHS-to-l1 embedding, show that sampled poles induce a valid finite atomic gauge, discuss random-feature convergence, and state sparse-recovery guarantees conditionally on the restricted-eigenvalue properties of the realized disk-Vandermonde or input-output design matrix. We also connect the normalized transfer function problem to Nevanlinna-Pick interpolation and LFT set-membership. The framework directly encodes stability margins, modal localization, DC-gain bounds, monotonicity, passivity, relative degree, settling-time targets, and time/frequency-domain error bounds. Numerical comparisons illustrate how physically meaningful priors can compensate for poor excitation and improve constrained impulse-response recovery in an under-informative data setting.
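
The representation described above can be written out in a few lines. This is a sketch under stated assumptions rather than the paper's exact formulation: it assumes a discrete-time, single-input single-output setting, the symbols N, ρ, c_i, Φ, and λ are introduced here for illustration, and the ℓ1 penalty is only one plausible choice of regularizer, suggested by the abstract's mention of an ℓ1 embedding and sparse recovery.

```latex
% Impulse response as a superposition of stable atoms (damped complex exponentials),
% with poles p_i sampled inside a disk of radius \rho < 1:
\[
  h(k) = \sum_{i=1}^{N} c_i \, p_i^{\,k},
  \qquad |p_i| \le \rho < 1, \quad k = 0, 1, 2, \dots
\]

% Predicted output for input u, which is linear in the coefficients c:
\[
  y(t) = \sum_{k=0}^{t} h(k)\, u(t-k) + e(t)
       = \sum_{i=1}^{N} c_i \, \Phi_{t,i} + e(t),
  \qquad \Phi_{t,i} = \sum_{k=0}^{t} p_i^{\,k}\, u(t-k).
\]

% Convex regularized least squares over c (assumed \ell_1 regularizer):
\[
  \hat{c} = \arg\min_{c} \; \tfrac{1}{2}\,\lVert y - \Phi c \rVert_2^2 + \lambda \lVert c \rVert_1
  \quad \text{s.t. optional linear, second-order-cone, and KYP constraints.}
\]
```

Because the poles are fixed once sampled, the unknowns enter only through the coefficients c, which is what keeps the fitting problem convex.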

2605.14331 2026-05-15 eess.SP cs.AI cs.ET cs.IT cs.LG math.IT

Analog RF Computing: A New Paradigm for Energy-Efficient Edge AI Over MU-MIMO Systems

Wentao Yu, Vincent W. S. Wong

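Affiliations: Department of Electrical and Computer Engineering, The University of British Columbia

AI summary: This paper proposes a new paradigm based on analog radio frequency (RF) computing for energy-efficient edge AI inference in multi-user multiple-input multiple-output (MU-MIMO) wireless systems. The base station broadcasts waveforms that encode neural network weights, and each client uses its passive mixer to multiply them with a locally generated input-encoded waveform, so that matrix-vector multiplications are performed efficiently at the wireless receiver. The authors design a computing-centric physical layer framework that balances computing accuracy against energy consumption and propose a low-complexity algorithm for the resulting non-convex problem. Simulations show that client-side energy consumption drops by nearly two orders of magnitude compared with conventional digital computing.

Comments 13 pages, 6 figures, 2 tables. This paper proposes analog RF computing as a new paradigm for energy-efficient edge inference over wireless networks and studies the corresponding physical layer design framework

English abstract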

Modern edge devices increasingly rely on neural networks for intelligent applications. However, conventional digital computing-based edge inference requires substantial memory and energy consumption. In analog radio frequency (RF) computing, a base station (BS) encodes the weights of the neural networks and broadcasts the RF waveforms to the clients. Each client reuses its passive mixer to multiply the received weight-encoded waveform with a locally generated input-encoded waveform. This enables wireless receivers to perform the matrix-vector multiplications (MVMs) that account for most of the computation burden in edge inference with ultra-low energy consumption. Unlike conventional downlink transmissions which are optimized for communications, analog RF computing requires a computing-centric physical layer that controls both the analog MVM accuracy and the energy consumption for inference. Motivated by this, in this paper, we propose a physical layer design framework for analog RF computing in MU-MIMO wireless systems. We derive tractable models for computing accuracy and energy consumption for inference, formulate a joint BS beamforming and client-side scaling problem subject to computing accuracy, transmit power, and hardware constraints, and develop a low-complexity algorithm to solve the non-convex problem. The proposed design provides client- and layer-specific accuracy control for both uniform- and mixed-precision inference. Simulations under 3GPP specifications show that analog RF computing can significantly reduce client-side energy consumption by nearly two orders of magnitude compared to digital computing, while mixed-precision inference requires even lower energy consumption than uniform-precision inference. Overall, these results establish analog RF computing over wireless networks as a promising paradigm for energy-efficient edge inference.

2605.14291 2026-05-15 cs.CR cs.AI cs.CL cs.CV cs.LG

To See is Not to Learn: Protecting Multimodal Data from Unauthorized Fine-Tuning of Large Vision-Language Model

Chengshuai Zhao, Zhen Tan, Dawei Li, Zhiyuan Yu, Huan Liu

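Affiliations: School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ, USA; Department of Computer Science and Engineering, Texas A&M University, College Station, TX, USA

AI summary: With the rapid development of large vision-language models (LVLMs), unauthorized data scraping and fine-tuning pose serious copyright and privacy risks. This paper proposes MMGuard, which injects human-imperceptible perturbations to generate "unlearnable" examples and proactively defend data against unauthorized LVLM fine-tuning. The method exploits the model's learning dynamics to create an optimization shortcut, so the model overfits to the noise during training and performs worse at inference. MMGuard also introduces a cross-modal binding disruption strategy to strengthen the defense, and shows effective, stealthy, and robust protection under multiple threat models.

English abstract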

The rapid advancement of Large Vision-Language Models (LVLMs) is increasingly accompanied by unauthorized scraping and training on multimodal web data, posing severe copyright and privacy risks to data owners. Existing countermeasures, such as machine unlearning and watermarks, are inherently post-hoc approaches that act only after intellectual property infringement has already occurred. In this work, we propose MMGuard to empower data owners to proactively protect their multimodal data against unauthorized LVLM fine-tuning. MMGuard generates unlearnable examples by injecting human-imperceptible perturbations that actively exploit the learning dynamics of LVLMs. By minimizing the training loss, the perturbation creates an optimization shortcut, causing the model to overfit to the noise and thereby degrading downstream performance when the perturbation is absent during inference. To further strengthen this defense, MMGuard introduces a cross-modal binding disruption, strategically shifting LVLM attention to enforce a spurious correlation between the noise and the training target with theoretical guarantees. Enhanced by an ensemble learning strategy for cross-model transferability, MMGuard is evaluated against nine open-source LVLMs across six datasets. Our comprehensive results demonstrate effective, stealthy, and robust protection under white-box, gray-box, and black-box threat models, establishing a mechanistic advantage in proactively defending against aggressive fine-tuning exploitation.