arXivDaily: arXiv daily academic digest, updated Monday through Friday
2605.09378 2026-05-12 cs.CV cs.AI cs.CL

EduStory: A Unified Framework for Pedagogically-Consistent Multi-Shot STEM Instructional Video Generation

Xinyi Wu, Jayant Teotia, Shuai Zhao, Erik Cambria

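Affiliations: Nanyang Technological University; Shanghai Jiao Tong University

AI summary: EduStory is a unified framework for generating multi-shot STEM instructional videos that follow a coherent teaching logic. By combining pedagogical state modeling, script-guided structured control, and learning-oriented evaluation metrics, it improves knowledge consistency and the coherence of the pedagogical narrative. The work also introduces EduVideoBench, a benchmark that supports multi-granularity analysis and evaluation of generated videos; experiments indicate clear advantages in preserving instructional intent and knowledge accuracy.

Abstract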

Long-horizon video generation has advanced in visual quality, yet existing methods still struggle to maintain knowledge consistency and coherent pedagogical narratives across multi-shot instructional videos, especially in STEM domains. To address these challenges, we propose EduStory, a unified framework for reliable instructional video generation. EduStory integrates pedagogical state modeling to track persistent knowledge states, script-guided structured control to organize multi-shot narratives, and learning-oriented evaluation metrics to assess knowledge fidelity and constraint satisfaction. To support rigorous evaluation, we further introduce EduVideoBench, a diagnostic benchmark with multi-granularity annotations, including pedagogical storyboards, shot-level semantics, and knowledge state transitions, together with baseline tasks for controllable instructional video generation. Extensive experiments demonstrate that domain-aware state modeling and structured control substantially reduce narrative breakdown and improve alignment with instructional intent. These results highlight the significance of domain-specific structural constraints and tailored benchmarks for advancing reliable, controllable, and also trustworthy long-horizon video generation.

2605.09376 2026-05-12 cs.RO

Mismatch-Aware Adaptive Constraint Tightening for Bicycle-Model Trajectory Optimization

Lingxue Lyu, Zihui Liu

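Affiliations: School of Engineering and Applied Science, University of Pennsylvania; Department of Aeronautics & Astronautics, Stanford University

AI summary: This paper addresses safety-constraint violations in autonomous-vehicle trajectory optimization caused by mismatch between the planning model and the true dynamics, and proposes an adaptive constraint-tightening method based on the characteristics of that mismatch. Theoretical analysis yields a characteristic speed and a law in which the deviation scales with the square of the time horizon, along with an analytical coefficient that depends only on vehicle parameters and the planning horizon, from which a state-dependent tightening formula is built. Experiments show that the method substantially reduces redundant safety margin while maintaining safety, applies to several vehicle models, and performs well in closed-loop MPC.

Abstract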

Trajectory optimization for autonomous vehicles usually relies on the kinematic bicycle model because of its computational simplicity. However, when the planned trajectory is executed under the true vehicle dynamics, which include lateral slip, tire stiffness and yaw-lateral coupling, safety constraints can be violated owing to the model mismatch. In this paper, we make three theoretical contributions. First, we derive a characteristic speed $v_c=\sqrt{C_\alpha L/M}$ which separates two different mismatch regimes: below $v_c$ the dynamic bicycle initially oversteers inward (safe); above $v_c$ it understeers outward (safety-critical). Second, we prove that the peak outward deviation $\varepsilon^*$ follows a $T^2$ horizon scaling whose coefficient transitions between a transient bound $\frac{1}{2}(v^2-v_c^2)\kappa$ and a steady-state bound. Third, we obtain a simulation-free analytical coefficient $a_2^{\mathrm{anal}}=\frac{1}{2}(1-v_c^2/v_{\max}^2)T^2$ that is computable from vehicle parameters and the planning horizon alone. Putting these together, we propose Mismatch-Aware Adaptive Constraint Tightening (MACT), $\varepsilon(v,\kappa)=a_2 v^2|\kappa|$, which replaces a fixed worst-case margin by a state-dependent one that is large at high speed/curvature but nearly zero on gentle paths. Eight numerical experiments confirm the scaling laws. MACT reaches 100% safety with 84% less wasted margin than a fixed-margin baseline on the 2-DOF vehicle, extends to a nonlinear leaning bicycle, and in a closed-loop direct-shooting MPC comparison it cuts the applied margin by 34% compared with tube MPC while keeping the same safety.
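
Editor's sketch: the three formulas quoted in the abstract can be evaluated directly, as below. The abstract does not define its symbols, so this sketch assumes that C_alpha is a cornering stiffness in N/rad, L a wheelbase in m, and M the vehicle mass in kg. All numeric values are hypothetical, and clamping the coefficient at zero below the characteristic speed is this sketch's own choice, not something the abstract states.

```python
import math


def characteristic_speed(c_alpha: float, wheelbase: float, mass: float) -> float:
    """v_c = sqrt(C_alpha * L / M), the formula quoted in the abstract."""
    return math.sqrt(c_alpha * wheelbase / mass)


def analytical_coefficient(v_c: float, v_max: float, horizon: float) -> float:
    """a_2^anal = 0.5 * (1 - v_c^2 / v_max^2) * T^2.

    Assumption of this sketch: the value is clamped at zero when
    v_max <= v_c, since the abstract calls that regime safe.
    """
    return max(0.0, 0.5 * (1.0 - v_c**2 / v_max**2) * horizon**2)


def mact_margin(a2: float, speed: float, curvature: float) -> float:
    """State-dependent margin eps(v, kappa) = a_2 * v^2 * |kappa|."""
    return a2 * speed**2 * abs(curvature)


# Hypothetical vehicle and planner parameters, for illustration only.
C_ALPHA = 80_000.0  # N/rad (assumed meaning: cornering stiffness)
WHEELBASE = 2.7     # m
MASS = 1_500.0      # kg
V_MAX = 30.0        # m/s
HORIZON = 0.5       # s

v_c = characteristic_speed(C_ALPHA, WHEELBASE, MASS)
a2 = analytical_coefficient(v_c, V_MAX, HORIZON)

for speed, curvature in [(10.0, 0.001), (20.0, 0.01), (30.0, 0.01)]:
    margin = mact_margin(a2, speed, curvature)
    print(f"v={speed:5.1f} m/s  kappa={curvature:.3f} 1/m  margin={margin:.4f} m")
```

The margin grows with the square of speed and linearly with curvature, so it vanishes on straight segments, which matches the abstract's description of a margin that is "nearly zero on gentle paths".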

2605.09369 2026-05-12 cs.AI

Explainable Knowledge Tracing via Probabilistic Embeddings and Pattern-based Reasoning

Siyu Wu, Cong Xu, Wei Zhang

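Affiliations: Shanghai Institute of AI Education, East China Normal University, Shanghai, 200241, China; School of Computer Science and Technology, East China Normal University, Shanghai, 200241, China

AI summary: This paper proposes PLKT, an explainable knowledge tracing model, to address the lack of interpretability of conventional deep learning models when predicting students' knowledge states. PLKT uses probabilistic embeddings and pattern-based reasoning: it represents knowledge states as Beta-distributed random variables and builds transparent reasoning paths through explicit logical operations, revealing how past learning behaviors affect predictions. Experiments show that PLKT markedly improves interpretability while maintaining high predictive performance.

Abstract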

Knowledge Tracing (KT) models students' knowledge states based on learning interactions to predict performance. While deep learning-based KT models have boosted predictive accuracy, most models rely on deterministic vector embeddings and opaque latent state transitions, limiting interpretability regarding how specific past behaviors influence predictions. To address this limitation, we propose Probabilistic Logical Knowledge Tracing (PLKT), an interpretable KT framework that formulates prediction as a goal-conditioned evidence reasoning process over historical learning behaviors. Instead of representing knowledge states as deterministic vector embeddings, PLKT employs robust Beta-distributed probabilistic embeddings to represent student knowledge states. This probabilistic foundation allows us to model the uncertainty of historical behaviors and perform explicit logical operations (e.g., conjunction), constructing transparent reasoning paths that reveal how specific past interactions contribute to the prediction. Extensive experiments show that PLKT outperforms state-of-the-art KT methods while achieving superior interpretability. Our code is available at https://anonymous.4open.science/r/PLKT-D3CE/.

2605.09365 2026-05-12 cs.AI cs.CL

Position: Avoid Overstretching LLMs for every Enterprise Task

Kuldeep Singh, Anson Bastos, Isaiah Onando Mulang'

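Affiliations: Eka Labs AI; Microsoft; SAP

AI summary: This paper examines the inefficiency and reliability problems that can arise from over-reliance on large language models (LLMs) for enterprise tasks, noting that such tasks are typically deterministic, structured, and knowledge-dependent, with strict cost, latency, and reliability requirements. The authors argue that language models should serve as interfaces rather than monolithic engines, with knowledge and computation separated into dedicated components to improve reliability, scalability, and transparency. The paper argues theoretically that finite-capacity models cannot fully cover the knowledge that enterprise tasks require, and proposes using language models mainly for structured information extraction while delegating computation and storage to knowledge bases and symbolic procedures, yielding a more reliable and sustainable enterprise AI architecture.

Abstract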

Enterprise workloads are dominated by deterministic, structured, and knowledge-dependent tasks operating under strict cost, latency, and reliability constraints. While these are often addressed through large language model (LLM) deployment or distillation into smaller models, we argue this is inefficient, unreliable, and misaligned with enterprise task structures. Instead, AI systems should treat language models as interfaces rather than monolithic engines, externalizing knowledge and computation into dedicated components for greater reliability, scalability, and transparency. Our theoretical evidence shows that finite-capacity models cannot fully capture the breadth of knowledge required for enterprise tasks, creating inherent limits to efficiency and interpretability. Building on this, we take the position that language models should primarily be used for structured extraction in deterministic enterprise workflows, while computation and storage are delegated to knowledge bases and symbolic procedures. We formally demonstrate that such modular architectures are more reliable and maintainable than monolithic frameworks, offering a sustainable foundation for enterprise tasks.

2605.09364 2026-05-12 cs.LG

Multi-scale Predictive Representations for Goal-conditioned Reinforcement Learning

Valliappan Chidambaram Adaikkappan, David Meger, Sai Rajeswar, Pietro Mazzaglia

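Affiliations: Mila, McGill University; ServiceNow Research; Qualcomm Research

AI summary: This paper studies robust representation learning in offline goal-conditioned reinforcement learning (GCRL), in particular the challenge of learning aligned state and goal latent representations under sparse rewards. To counter representation drift, the authors propose Ms.PR, a framework based on multi-scale predictive supervision that lets the agent understand its environment at multiple scales, from local physical dynamics to long-horizon goal structure, and thereby achieve goal-directed alignment in the latent space. Experiments show that Ms.PR delivers strong representation quality and performance on both vision- and state-based tasks, and is robust across a range of challenging data conditions.

Abstract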

This paper investigates robust representation learning in offline goal-conditioned reinforcement learning (GCRL). Particularly in sparse reward scenarios, learning representations that align state and goal latents is a challenge that frequently culminates in representation divergence where the encoder drifts toward a low-dimensional, goal-agnostic subspace that destabilizes policy learning. We address this issue by showing that an agent must acquire a fundamental understanding of its environment across multiple scales, from local physical dynamics to long-horizon goal-directed structure. Building on this insight, we propose Ms.PR, a framework that leverages multi-scale predictive supervision to enforce goal-directed alignment within the latent space. We demonstrate that Ms.PR leads to improved representation quality and strong performance on both vision and state-based tasks. Furthermore, we show that our approach is exceptionally resilient under realistic, challenging data regimes, maintaining state-of-the-art performance across a wide variety of tasks, trajectory stitching scenarios, and extreme noise conditions.

2605.09363 2026-05-12 cs.LG

Near-Optimal Last-Iterate Convergence for Zero-Sum Games with Bandit Feedback and Opponent Actions

Soumita Hait, Ping Li, Haipeng Luo, Mengxiao Zhang

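Affiliations: University of Southern California; Shanghai University of Finance and Economics; University of Iowa

AI summary: This paper studies last-iterate convergence of learning dynamics in zero-sum games where each player observes only its own loss and the opponent's action. The authors propose an efficient algorithm that updates its strategy infrequently and solves an estimated log-barrier-regularized game, achieving a $t^{-1/2}$ last-iterate convergence rate with high probability. The work overcomes the limitations of standard multi-armed bandit analysis in the game setting; experiments show faster convergence than existing methods, and the results also improve on known results for dueling bandits, a special case.

Abstract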

Last-iterate convergence of learning dynamics in games has attracted significant recent attention. In two-player zero-sum games with bandit feedback, where only the loss of the selected action pair is observed, Fiegel et al. (2025) show a separation between average-iterate and last-iterate convergence in duality gap: while the optimal $t^{-1/2}$ rate after $t$ rounds is achievable for the former via standard no-regret algorithms, the latter cannot converge faster than $t^{-1/3}$ in expectation or $t^{-1/4}$ with high probability. However, in many practical settings, such as preference learning, the players observe not only their loss but also the opponent's action. This raises a natural question: can such additional information enable faster last-iterate convergence? We answer this question affirmatively, showing that $t^{-1/2}$ last-iterate convergence is achievable with high probability in this setting, via an efficient algorithm that updates its strategy infrequently by solving an estimated log-barrier-regularized game. We identify fundamental obstacles preventing standard analysis for multi-armed bandits, the single-player case, from generalizing to games, and develop a novel analysis to overcome them. Experiments confirm that our algorithm indeed converges faster than naive baselines and prior methods that do not exploit opponent-action feedback. Finally, we note that our results also improve those for dueling bandits, a special case with skew-symmetric game matrices.

2605.09360 2026-05-12 cs.LG cs.AI cs.CL cs.SE

Your Simulation Runs but Solves the Wrong Physics: PDE-Grounded Intent Verification for LLM-Generated Multiphysics Simulation Code

Zhenghan Song, Yulong Liu, Cheng Wan, Chenjun Li, Lingfu Liu, Yunyi Li, Congcong Yuan

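Affiliations: Cornell University; Columbia University; Harvard University; Nanyang Technological University

AI summary: This paper studies the mismatch between LLM-generated multiphysics simulation code and user intent, and proposes an intent verification method grounded in partial differential equations (PDEs). By defining an Intent Fidelity Score (IFS) and designing a PDE-grounded refinement loop, the method detects and corrects key parts of the generated code, such as governing equations and boundary conditions, that deviate from the user's intent. Experiments show that the method markedly improves intent consistency in benchmark evaluations and reveal that executability and physical correctness should be treated as two separate verification dimensions.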

Comments Preprint

Abstract

Execution-based evaluation of LLM-generated code implicitly treats successful execution as a proxy for correctness. In scientific simulation, this proxy is insufficient: a generated input file can run, mesh, and converge while encoding governing equations that differ from the user's intent. We call this mismatch between intended physics and generated code the comprehension-generation gap. We instantiate this in MOOSE, where Kernel and BC objects map compositionally to weak-form residual terms, enabling deterministic reconstruction of the encoded PDE and comparison against an intended contract. We formalize this comparison as the Intent Fidelity Score (IFS), a structural metric covering governing terms, BCs, ICs, coefficients, and time scheme. Building on IFS, we develop a PDE-grounded refinement loop that uses deterministic violation reports to correct generated code iteratively. We evaluate on MooseBench, a 220-case multiphysics benchmark with PDE-level ground truth released with this work. On this benchmark, our method consistently improves mean IFS over direct generation, with gains concentrated on hard cases. On the subset where direct generation falls below IFS 0.7, refinement adds +0.22 to +0.41 absolute IFS. In the deployment audit, execution-only repair improves execution success while leaving 39-40% of all 220 cases runnable but still solving the wrong physics across the three main deployment-audit models, exposing executability and intent fidelity as separable failure modes. Static proof-of-concept experiments on four PDE-oriented DSLs (UFL/FEniCS, FreeFEM, FiPy, and Devito) suggest that the reconstruction-and-comparison pattern extends beyond MOOSE. These findings reinforce that executable simulation code should be verified against the mathematical structure it is intended to encode, not accepted on execution alone.

2605.09359 2026-05-12 cs.LG cs.AI

Skill-R1: Agent Skill Evolution via Reinforcement Learning

Yash Vishe, Rohan Surana, Xunyi Jiang, Zihan Huang, Xintong Li, Nikki Lijing Kuang, Tong Yu, Ryan A. Rossi, Jingbo Shang, Julian McAuley, Junda Wu

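Affiliations: UC San Diego; Adobe Research

AI summary: This study proposes Skill-R1, a reinforcement learning framework for instance-level recurrent skill optimization from verifiable rewards. Unlike approaches that rely on prompt engineering or on aligning the task model itself, Skill-R1 trains a lightweight skill generator that produces skills to steer a frozen task model, conditioned on the task context, prior rollouts, and their verified outcomes, making adaptation cheap and compatible with both open- and closed-source models. With a bi-level group-relative policy optimization objective, Skill-R1 achieves directional skill evolution; experiments show that it outperforms no-skill baselines and standard GRPO on multiple benchmarks, especially on complex multi-step tasks.

Abstract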

Agentic large language models often rely on skills, reusable natural language procedures that guide planning, action, and tool use. In practice, skills are typically improved through prompt engineering or by aligning the task LLM itself, which is costly, model-specific, and often infeasible for closed-source models. Skill optimization is not a one-step problem but a recurrent process with two coupled levels of credit assignment: a useful skill must improve rollout quality under current conditioning, while a useful revision must turn observed outcomes into a better skill for the next round. We propose Skill-R1, a reinforcement learning framework for instance-level recurrent skill optimization from verifiable rewards. Rather than updating the task LLM, Skill-R1 trains a lightweight skill generator that conditions on the task context, prior rollouts, and their verified outcomes to produce skills that steer a frozen task LLM. This preserves black-box compatibility with both open- and closed-source models while making adaptation substantially cheaper than model-level updates. Skill-R1 proceeds over multiple generations: at each step, the current skill induces rollouts whose verified outcomes are fed back to produce the next revision. To optimize this recurrent process, we introduce a bi-level group-relative policy optimization objective combining intra-generation and inter-generation advantages. The intra-generation term compares rollouts under shared skill conditioning, while the inter-generation term rewards revisions that improve behavior across successive generations. Together, these provide a principled objective for directional skill evolution rather than one-shot self-refinement. Empirically, Skill-R1 achieves consistent gains over no-skill baselines and standard GRPO across benchmarks with verifiable rewards, with particularly strong improvements on complex, multi-step tasks.
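
Editor's sketch: the two advantage signals described above can be illustrated roughly as follows. The intra-generation part is the usual group-relative standardization used in GRPO-style training. The inter-generation part, taken here as the change in mean verified reward between successive skill generations, is a hypothetical stand-in, since the abstract does not give the paper's actual formula. The use of the population standard deviation and the epsilon value are also assumptions.

```python
from statistics import fmean, pstdev
from typing import List, Sequence


def intra_generation_advantages(
    rewards: Sequence[float], eps: float = 1e-8
) -> List[float]:
    """Standardize rewards within one group of rollouts that share a skill."""
    mean = fmean(rewards)
    std = pstdev(rewards)  # assumption: population standard deviation
    return [(r - mean) / (std + eps) for r in rewards]


def inter_generation_advantages(
    generation_rewards: Sequence[Sequence[float]],
) -> List[float]:
    """Hypothetical revision signal: mean reward of generation g+1 minus g.

    A positive entry means the revised skill produced better rollouts than
    the skill it replaced.
    """
    means = [fmean(rewards) for rewards in generation_rewards]
    return [later - earlier for earlier, later in zip(means, means[1:])]


# Hypothetical verified rewards (1 = success) for two skill generations.
generation_0 = [0.0, 0.0, 1.0, 0.0]
generation_1 = [1.0, 0.0, 1.0, 1.0]

print(intra_generation_advantages(generation_1))
print(inter_generation_advantages([generation_0, generation_1]))
```

How the two signals are weighted and fed into the policy-gradient update is not specified in the abstract, so the sketch stops at computing them.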

2605.09356 2026-05-12 cs.LG cs.NI

Function-Space ADMM for Decentralized Federated Learning: A Control Theoretic Perspective

Akihito Taya, Yuuki Nishiyama, Kaoru Sezaki

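Affiliations: Institute of Industrial Science, The University of Tokyo; Center for Spatial Information Science, The University of Tokyo

AI summary: Taking a control-theoretic perspective, this paper proposes FedF-ADMM, a function-space decentralized federated learning algorithm that addresses non-IID data when training machine learning models on networks of edge devices without a central server. The method exploits the convexity of loss functionals in function space to derive ADMM-based update directions and projects them onto parameter space via knowledge distillation, improving convergence and robustness. Experiments show that under severe non-IID conditions FedF-ADMM converges faster, reaches higher accuracy, and achieves better consensus among devices.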

Comments (c) 2026 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal ref IEEE Internet of Things Journal, 2026

Abstract

Decentralized federated learning (FL) is a promising approach for training machine learning models on sensor networks, Internet of Things (IoT) devices, and other edge systems where no central server exists. While federated learning offers advantages such as preserving data privacy, it often suffers from non-independent and identically distributed (non-IID) data distributions across devices, which cause significant performance degradation. This issue is particularly severe when directly optimizing model parameters, because neural network training is inherently non-convex and standard convergence guarantees for convex optimization do not apply. Unlike existing decentralized FL methods that primarily operate in parameter space, we propose federated function-space alternating direction method of multipliers (FedF-ADMM). FedF-ADMM exploits the convexity of loss functionals within function space to derive alternating direction method of multipliers (ADMM)-based update directions, which are subsequently projected onto the parameter space via knowledge distillation. We further introduce a stabilization coefficient to enhance robustness under severe non-IID settings and analyze its behavior from a control-theoretic perspective by interpreting it as a proportional-integral (PI) term. Experiments under challenging non-IID scenarios, including settings where each device has data from only a single label, demonstrate that FedF-ADMM achieves faster and more stable convergence than existing decentralized FL methods, while attaining higher accuracy and better consensus among devices.
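
Editor's sketch: for readers unfamiliar with ADMM, the textbook global-consensus form on a convex toy problem looks like the code below. It is not FedF-ADMM: it works on scalar parameters rather than in function space, uses a shared consensus variable rather than a decentralized topology, and includes neither the distillation step nor the stabilization coefficient. The local objectives, the penalty `rho`, and the iteration count are all hypothetical choices.

```python
from typing import List, Sequence, Tuple


def consensus_admm(
    targets: Sequence[float], rho: float = 1.0, iterations: int = 100
) -> Tuple[List[float], float]:
    """Scaled-form consensus ADMM for: minimize sum_i 0.5 * (x_i - a_i)^2
    subject to x_i = z for every device i.

    Each device i holds a private target a_i; the consensus value z should
    converge to the average of the targets.
    """
    n = len(targets)
    x = [0.0] * n  # local variables
    u = [0.0] * n  # scaled dual variables
    z = 0.0        # consensus variable
    for _ in range(iterations):
        # Local step: closed-form minimizer of
        # 0.5 * (x - a_i)^2 + (rho / 2) * (x - z + u_i)^2.
        x = [(a + rho * (z - u_i)) / (1.0 + rho) for a, u_i in zip(targets, u)]
        # Consensus step: average of local variables plus duals.
        z = sum(x_i + u_i for x_i, u_i in zip(x, u)) / n
        # Dual step: accumulate each device's disagreement with the consensus.
        u = [u_i + x_i - z for x_i, u_i in zip(x, u)]
    return x, z


local_values, consensus = consensus_admm([1.0, 2.0, 6.0])
print(local_values, consensus)
```

The dual update accumulates each device's disagreement over iterations, which is the kind of integral-like behavior that makes a control-theoretic reading of ADMM-style methods natural.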

2605.09355 2026-05-12 cs.LG

FLAME: Adaptive Mixture-of-Experts for Continual Multimodal Multi-Task Learning

Xing Han, Shravan Chaudhari, Tanvi Ranade, Rama Chellappa, Suchi Saria

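Affiliations: Johns Hopkins University

AI summary: This paper proposes FLAME, an adaptive mixture-of-experts framework for continual multimodal multi-task learning. The method covers both multi-task pretraining and continual adaptation: modality-specific routing enables learning over flexible modality combinations, and expert knowledge is compressed into low-rank memory subspaces to improve parameter efficiency and mitigate catastrophic forgetting. Experiments show strong performance on multiple healthcare multimodal benchmarks.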

Comments 37 pages, 25 figures, 6 tables

Abstract

Real-world model deployment across multiple domains requires multimodal models to operate under two complementary regimes: (1) multi-task pretraining, in which tasks are co-available at design time and related tasks can borrow representational strength from one another, and (2) continual adaptation, in which new tasks emerge after deployment with previously unseen modality combinations. However, neither regime alone suffices: the pretraining task set is never exhaustive, while bypassing joint training forfeits the transfer gains and efficiency among co-trainable tasks. Sparse Mixture-of-Experts (MoE) is a natural fit for this dual requirement: sparse activation enables modular capacity expansion as new tasks arrive, while routing decouples modality-level computation from task-level composition. In this work, we propose a scalable MoE framework for multitask pretraining and continual learning across flexible modality combinations. The framework is designed to support training on multimodal tasks with diverse modality configurations by leveraging modality-specific routers that process tokens from each modality across tasks. Furthermore, it enables continual learning over sequential multimodal tasks within a fixed-capacity MoE by compressing accumulated expert knowledge into low-rank memory subspaces, while expanding only the lightweight routers. We validate the effectiveness of our method on multiple healthcare multimodal benchmarks. It demonstrates competitive multitask pretraining performance while alleviating catastrophic forgetting and improving parameter efficiency.

2605.09352 2026-05-12 cs.AI

The Wittgensteinian Representation Hypothesis: Is Language the Attractor of Multimodal Convergence?

Zhaoyang Zhang, Run Shao, Dongyue Wu, Jiajie Teng, Chao Tao, Jingdong Chen, Haifeng Li

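Affiliations: Central South University; Huazhong University of Science and Technology; Shanghai Jiao Tong University; Ant Group

AI summary: This paper examines why independently trained neural networks from different modalities converge toward shared representations, and studies the direction of that convergence. The authors propose a directional convergence analysis based on cycle-kNN and find that non-language modalities tend to move toward the structure of language representations, a pattern that holds across model families and scales. The study further shows that language representations occupy more compact regions of representational space, for which the Information Bottleneck framework offers a theoretical interpretation, and it proposes the Wittgensteinian Representation Hypothesis: the semantic structure of language is the asymptotic attractor of multimodal representation convergence.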

Comments 22 pages, 11 figures, 6 tables

Abstract

Understanding why independently trained neural networks from different modalities converge toward shared representations, and where this convergence leads, remains an open question in representation learning. All existing evidence relies on symmetric similarity measures, which can detect convergence but are structurally blind to its direction. We introduce directional convergence analysis using cycle-kNN, an asymmetric alignment measure, applied across dozens of independently trained unimodal models spanning point clouds, vision, and language. We uncover a consistent directional asymmetry: non-language modalities move toward the neighborhood structure of language significantly more than the reverse, and this pattern holds across all model families and scales--yet is entirely invisible to symmetric measures. Mechanistic analysis traces the directionality to feature density asymmetry, whereby language representations occupy the most compact regions of representational space. The Information Bottleneck framework provides a principled interpretation: optimization under compression drives representations toward discrete, compositional structures characteristic of language. We formalize this as the Wittgensteinian Representation Hypothesis: the semantic structure of language is the asymptotic attractor of multimodal representation convergence.

2605.09350 2026-05-12 cs.AI

CHAINTRIX: A multi-pipeline LLM-augmented framework for automated smart-contract security auditing

Gabriela Dobrita, Simona-Vasilica Oprea, Adela Bara

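Affiliations: Bucharest University of Economic Studies

AI summary: Smart-contract vulnerabilities have caused billions of dollars in losses, yet security audits remain costly and slow. To address this, the paper proposes Chaintrix, an automated smart-contract security auditing framework that combines multiple pipelines with large language models; its core idea is to check every LLM-generated finding against a deterministic structural representation of the contract to improve accuracy. The framework introduces a Cross-Contract Interaction Model (CCIM) that parses Solidity code into a structured form, and uses a staged false-positive filtering mechanism and a structural verification engine to improve detection. In benchmark evaluation it reaches 71.7% recall on high-severity vulnerabilities, outperforming the strongest frontier-model baseline.

Abstract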

Smart-contract exploits have caused billions of USD in cumulative losses, yet audits remain expensive and slow. Automated tools have emerged to close this gap, but each class has a characteristic failure mode. Static analyzers report findings that fail manual triage at high rates, while large language models (LLMs) hallucinate findings that contradict the source code. Thus, we propose Chaintrix, an end-to-end auditing framework whose central architectural commitment is that every LLM-generated claim must be discharged against a deterministic structural contract representation. We introduce a Cross-Contract Interaction Model (CCIM) that parses Solidity into a structured map of function-level reads, writes, modifiers and resolved cross-contract calls. CCIM serves as the substrate against which all 12 of Chaintrix's deterministic signal engines and the parallel LLM audit pipelines operate. A staged false-positive-reduction pipeline, terminating in a Structural Verdict Engine (SVE) that applies deterministic structural checks against parsed code, filters the merged finding set, with selected high-confidence findings further validated through symbolic execution and fuzz testing. We evaluate Chaintrix on EVMbench, the smart-contract security benchmark by OpenAI, Paradigm, and OtterSec. Chaintrix detects 86 of 120 high-severity vulnerabilities (71.7% recall), with 25 audits scoring 100% recall, placing Chaintrix 26 percentage points above the strongest frontier-model baseline.

2605.09348 2026-05-12 cs.CL cs.AI cs.DB cs.MM

HOME-KGQA: A Benchmark Dataset for Multimodal Knowledge Graph Question Answering on Household Daily Activities

Shusaku Egami, Aoi Ohta, Tomoki Tsujimura, Masaki Asada, Tatsuya Ishigaki, Ken Fukuda, Masahiro Hamasaki, Hiroya Takamura

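Affiliations: National Institute of Advanced Industrial Science and Technology

AI summary: This paper presents HOME-KGQA, a new benchmark dataset for multimodal knowledge graph question answering on household daily activities. Built on a multimodal knowledge graph, the dataset contains complex multi-hop natural language questions paired with graph database queries, and covers more challenging tasks such as multi-level spatiotemporal reasoning and multimodal grounding. Experiments show that existing LLM-based KGQA methods perform markedly worse on this dataset, highlighting the challenges KGQA systems still face in real-world scenarios.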

Comments 12 pages, 4 figures, 7 tables, accepted at LREC2026

Abstract

Large Language Models (LLMs) provide flexible natural language processing capabilities, while knowledge graphs (KGs) offer explicit and structured knowledge. Integrating these two in a complementary manner enables the development of reliable and verifiable AI systems. In particular, knowledge graph question answering (KGQA) has attracted attention as a means to reduce LLM hallucinations and to leverage knowledge beyond the training data. However, existing KGQA benchmark datasets are biased toward encyclopedic knowledge, limited to a single modality, and lack fine-grained spatiotemporal data, which limits their applicability to real-world scenarios targeted by Embodied AI. We introduce HOME-KGQA, a novel KGQA benchmark dataset built on a multimodal KG of daily household activities. HOME-KGQA consists of complex, multi-hop natural language questions paired with graph database query languages. Compared to existing benchmarks, it includes more challenging questions that involve multi-level spatiotemporal reasoning, multimodal grounding, and aggregate functions. Experimental results show that the LLM-based KGQA methods fail to achieve performance comparable to that on existing datasets when evaluated on HOME-KGQA. This highlights significant challenges that should be addressed for the real-world deployment of KGQA systems. Our dataset is available at https://github.com/aistairc/home-kgqa

2605.09347 2026-05-12 cs.AI cs.LO

Dsat: A Native SAT Solver for Discrete Logic

Yaofang Zhang, Ken Zhou, Adnan Darwiche

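Affiliations: Department of Computer Science, University of California, Los Angeles

AI summary: This paper presents Dsat, a native SAT solver for discrete logic that handles problems whose variables can take arbitrary discrete values, avoiding the computational and semantic challenges of the usual approach of binarizing discrete variables into Boolean ones. The solver borrows mechanisms from Boolean SAT solvers, such as unit resolution and clause learning, but operates directly on discrete variables, handling discrete-logic formulas more efficiently. Experiments show clear advantages over conventional approaches on discrete CNF problems.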

Comments To appear at the International Conference on Theory and Applications of Satisfiability Testing (SAT), 2026

Abstract

Discrete variables are common in many applications, such as probabilistic reasoning, planning and explainable AI. When symbolic reasoning techniques are brought to bear on these applications, a standard technique for handling discrete variables is to binarize them into Boolean variables to allow the use of Boolean computational machinery such as SAT solvers. This technique can face both computational and semantic challenges, though. In this work, we develop a native SAT solver for discrete logic, which is a direct extension of Boolean logic in which variables can take arbitrary values. Our proposed solver has a similar design to Boolean SAT solvers, with ingredients such as unit resolution and clause learning, but ones that operate natively on discrete variables. We illustrate the merits of the developed SAT solver by comparing it empirically to CSP solvers applied to discrete CNFs, to Boolean SAT solvers applied to binarized CNFs, and to some hybrid solvers.
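
Editor's sketch: one plausible reading of "unit resolution that operates natively on discrete variables" is domain narrowing, shown below. The abstract does not describe Dsat's data structures, so everything here is a hypothetical encoding: a literal says "variable takes one of these values", a clause is a disjunction of such literals, and each clause is assumed to mention a variable at most once.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

# Hypothetical encoding: (variable, allowed values) means
# "variable takes one of the allowed values".
Literal = Tuple[str, FrozenSet[str]]
Clause = List[Literal]


def unit_resolution(
    clauses: List[Clause], domains: Dict[str, Set[str]]
) -> Optional[Dict[str, Set[str]]]:
    """Narrow variable domains until no clause forces a further restriction.

    Returns the narrowed domains, or None if some clause has every literal
    falsified (a conflict).
    """
    domains = {var: set(values) for var, values in domains.items()}
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            # A literal is still open if its variable can take an allowed value.
            open_literals = [
                (var, allowed) for var, allowed in clause if domains[var] & allowed
            ]
            if not open_literals:
                return None  # every literal falsified: conflict
            if len(open_literals) == 1:
                # Unit clause: the one remaining literal must hold.
                var, allowed = open_literals[0]
                narrowed = domains[var] & allowed
                if narrowed != domains[var]:
                    domains[var] = narrowed
                    changed = True
    return domains


# Hypothetical example with one three-valued and one two-valued variable.
DOMAINS = {"Color": {"red", "green", "blue"}, "Size": {"small", "large"}}
CLAUSES: List[Clause] = [
    [("Color", frozenset({"red"}))],
    [("Color", frozenset({"green", "blue"})), ("Size", frozenset({"large"}))],
]

print(unit_resolution(CLAUSES, DOMAINS))
print(unit_resolution(CLAUSES + [[("Size", frozenset({"small"}))]], DOMAINS))
```

A binarized encoding of the same problem would need one Boolean variable per value plus extra clauses stating that each variable takes exactly one value; here that constraint is carried implicitly by the domain sets.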

2605.09346 2026-05-12 cs.CL cs.AI

RuPLaR: Efficient Latent Compression of LLM Reasoning Chains with Rule-Based Priors From Multi-Step to One-Step

Xiaocheng Luo, Kang Wang, Zaifu Zhan, Yuechi Zhou, Xiangyu Duan

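Affiliations: School of Computer Science and Technology; Department of Electrical and Computer Engineering

AI summary: This paper proposes RuPLaR, a new compression framework that addresses the structural complexity of multi-step or multi-model paradigms in latent chain-of-thought (latent CoT) reasoning. By introducing rule-based prior distributions, the method guides an LLM to autonomously generate latent reasoning tokens in a single training stage, eliminating cascaded processes and inter-model dependencies. Experiments show that RuPLaR markedly improves accuracy while preserving reasoning quality and greatly reduces the number of tokens required, demonstrating effectiveness and extensibility.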

Comments 15 pages, 15 figures

Abstract

The Chain-of-Thought (CoT) paradigm, while enhancing the interpretability of Large Language Models (LLMs), is constrained by the inefficiencies and expressive limits of natural language. Latent Chain-of-Thought (latent CoT) reasoning, which operates in a continuous latent space, offers a promising alternative but faces challenges from structural complexities in existing multi-step or multi-model paradigms, such as error propagation and coordination overhead. In this paper, we introduce One-Model One-Step, a novel compression framework for Latent Reasoning with Rule-Based Priors (RuPLaR) to address this challenge. Our method trains an LLM to autonomously generate latent reasoning tokens in a single training stage, guided by rule-based prior probability distributions, thereby eliminating cascaded processes and inter-model dependencies. To ensure reasoning quality, we design a joint training objective that enforces answer consistency via cross-entropy, aligns soft tokens with rule-based priors via KL divergence (the Soft Thinking constraint), and adds a problem-thought semantic alignment constraint in the representation space. Extensive experiments show that our compression framework not only improves accuracy by 11.1% over existing latent CoT methods but also achieves this with minimal token usage, underscoring its effectiveness and extensibility. Code: https://github.com/xiaocen-luo/RuPLaR.

2605.09345 2026-05-12 cs.LG

Selection Plateau and a Sparsity-Dependent Hierarchy of Pruning Features

Guangqi Li, Yongxin Li

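Affiliations: Zaozhuang University

AI summary: This paper studies the "Selection Plateau" phenomenon in one-shot neural network pruning, finding that all rank-monotone weight scorers converge to the same accuracy at a fixed sparsity regardless of functional form. The authors propose the Sparsity-Information-Complexity Spectrum (SICS) hypothesis: escaping the plateau requires features of different complexity at different sparsities, and feature complexity must match the target sparsity. Experiments show that non-monotone features markedly improve pruning at moderate sparsity, whereas gradient-only or simple Gaussian features bring limited gains, indicating that feature complexity and rank alignment are both critical to pruning performance.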

Comments 22 pages, 3 figures, 5 tables. Empirical study + framework hypothesis on ViT-Small/CIFAR-10. Cross-domain validation (vision token pruning, KV cache compression, MoE routing) and cross-architecture extensions deferred to follow-up work

Abstract

We identify a Selection Plateau phenomenon in one-shot neural network pruning: all rank-monotone weight scorers converge to identical accuracy at fixed sparsity, independent of functional form. We propose the Sparsity-Information-Complexity Spectrum (SICS) hypothesis: a sparsity-dependent minimum feature complexity kappa(S) governs plateau escape, with kappa=0 sufficient at low sparsity (S<0.65), kappa=1 dominant at critical sparsity (S~0.7), and kappa=2 necessary at extreme sparsity (S>0.75). On ViT-Small/CIFAR-10, testing nine feature classes across four sparsities, smooth non-monotone features provide +6.6% escape at S=0.7, while only raw features with high-frequency wiggle escape at S=0.8 (+2.6%). A fake non-monotone scorer underperforms the gradient baseline, indicating the requirement is magnitude-independent non-monotonicity. A handcrafted Gaussian bump achieves only +0.006 escape vs. chaos-derived +0.046, indicating rank-alignment is necessary but insufficient. SICS provides a unifying explanation for the performance clustering of diverse pruning methods and suggests that future selection algorithms should adapt feature complexity to target sparsity.
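
Editor's sketch: a plateau across rank-monotone scorers is easy to see on a toy example, assuming "rank-monotone" means order-preserving in the weight magnitude |w|. Any such scorer ranks the weights identically, so at a fixed sparsity it prunes exactly the same entries, and the resulting accuracy cannot differ. The weights, scorers, and the bump centered at 0.25 are hypothetical and are not the paper's feature classes.

```python
import math
from typing import Callable, List, Sequence


def keep_mask(
    weights: Sequence[float], scorer: Callable[[float], float], sparsity: float
) -> List[bool]:
    """One-shot pruning: drop the lowest-scoring fraction `sparsity` of weights."""
    n_prune = round(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: scorer(weights[i]))
    pruned = set(order[:n_prune])
    return [i not in pruned for i in range(len(weights))]


# Hypothetical weights with distinct magnitudes.
WEIGHTS = [0.9, -0.05, 0.3, -0.7, 0.01, 0.45, -0.2, 0.6, -0.15, 0.08]
SPARSITY = 0.7

# Three scorers that are all strictly increasing in |w|.
mask_abs = keep_mask(WEIGHTS, abs, SPARSITY)
mask_square = keep_mask(WEIGHTS, lambda w: w * w, SPARSITY)
mask_log = keep_mask(WEIGHTS, lambda w: math.log1p(abs(w)), SPARSITY)

# A non-monotone scorer (hypothetical bump around |w| = 0.25) ranks differently.
mask_bump = keep_mask(WEIGHTS, lambda w: -abs(abs(w) - 0.25), SPARSITY)

print(mask_abs == mask_square == mask_log)
print(mask_bump == mask_abs)
```

Only a scorer that reorders the weights, such as the non-monotone one here, can select a different subnetwork. Whether that subnetwork performs better is the empirical question the paper investigates.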

2605.09344 2026-05-12 cs.RO cs.MA

PECMAN: Perception-enabled Collaborative Multi-Agent Navigation in Unknown Environments

Tianchonghui Fang, Shaunak Roy, Shalabh Gupta

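Affiliations * Department of Electrical and Computer Engineering, University of Connecticut

AI Summary This work addresses collaborative multi-agent navigation in unknown, dynamic environments and proposes PECMAN, a perception-enabled collaborative navigation method. Using distributed tree morphing and a shared-perception strategy, each agent reacts to environmental changes in real time and replans its path while broadcasting newly discovered information to the other agents, improving overall team coordination. Experiments show that PECMAN substantially reduces team-completion time across multiple scenarios while maintaining a high success rate.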

Details
English abstract

Most path planners assume fully known, static environments, assumptions that fail when robots navigate in dynamic and partially observable environments. SMART-3D addresses these issues by real-time replanning, where it morphs the underlying RRT* tree whenever new obstacles or structures are discovered in the environment. Instead of rebuilding the tree entirely from scratch, SMART-3D prunes invalid nodes and edges and subsequently repairs the disjoint subtrees at hot-nodes to find a new path, thus providing high computational efficiency for real-time adaptability. We extend SMART-3D to perception-enabled collaborative multi-agent navigation (PECMAN) in unknown environments. PECMAN is built upon distributed tree morphing and shared perception strategies, where each agent reacts to environmental changes and morphs its respective tree to replan its path, while simultaneously broadcasting newly discovered structures to other agents, thus enabling them to proactively replan even in areas that have not yet been explored by them. This approach reduces redundant reactions and unnecessary replannings of the agents due to improved situational awareness. The performance of PECMAN was evaluated by 28,000 multi-agent simulations on seven 2D scenarios with different case studies. The results show that PECMAN achieves up to 52% reduction in the team-completion time, while maintaining near 100% success rates. Finally, PECMAN was tested by real experiments on two autonomous robots in a building environment.

2605.09343 2026-05-12 cs.AI

SKG-VLA: Scene Knowledge Graph Priors for Structured Scene Semantics and Multimodal Reasoning for Decision Making

Zeyu Li, Lei Li

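Affiliations * Beijing University of Posts and Telecommunications

AI Summary In large-scale complaint handling systems, decisions increasingly depend on heterogeneous multi-source evidence such as complaint narratives, screenshots, and order metadata. To address existing systems' underuse of scene structure, rule knowledge, and cross-evidence dependencies, the paper proposes SKG-VLA, which builds a Scene Knowledge Graph (SKG) to represent the entities, evidence, policy clauses, and relations of a complaint scene in a unified way, and designs a data synthesis pipeline and a three-stage training strategy on top of that graph to strengthen the model's structured semantic understanding and multimodal decision making. Experiments show that SKG-VLA consistently improves policy-grounded reasoning, complaint decision accuracy, and robustness.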

Details
English abstract

Decision making in large-scale complaint handling systems increasingly relies on heterogeneous evidence, including complaint narratives, screenshots, order metadata, historical interactions, and platform policies. Existing complaint understanding systems mainly perform shallow classification or template matching over isolated modalities, while underutilizing explicit scene structure, rule knowledge, and cross-evidence dependencies. To address this limitation, we present SKG-VLA for multimodal complaint decision making. The core idea is to model each case as a structured complaint scene and represent its decision-relevant semantics with a Scene Knowledge Graph (SKG), which organizes complaint entities, evidence items, policy clauses, temporal events, transactional states, and action-relevant relations into a unified graph. Based on SKG, we build a data synthesis pipeline that generates complaint scene descriptions, rule-consistent graph generalizations, question-answer supervision, and decision recommendations. We further construct a large-scale complaint scene dataset with both text-only and multimodal in-domain benchmarks. Finally, we adopt a three-stage training strategy -- domain-adaptive pre-training, task-oriented instruction fine-tuning, and end-to-end multimodal alignment -- to inject structured scene priors into a multimodal decision model. Experiments show that SKG-VLA consistently improves policy-grounded reasoning, complaint decision accuracy, long-tail generalization, and robustness under incomplete evidence.

2605.09339 2026-05-12 cs.CV cs.AI

Perceptual Asymmetry Between Hue Categories: Evidence from Human Color Categorization

Elnara Kadyrgali, Nuray Toganas, Muragul Muratbekova, Pakizar Shamoi

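Affiliations * School of Information Technology and Engineering; Kazakh-British Technical University

AI Summary Human color categories are not uniformly distributed in perceptual space, yet most computational color models still assume fixed, uniform color representations. By analyzing large-scale human color categorization data, the paper extends the COLIBRI fuzzy color model with quantitative measures based on fuzzy membership functions, revealing perceptual asymmetry between hue categories. Yellow occupies a compact, sharply defined region of hue space, while green covers a broader interval with a longer transition structure, showing that human color categories are not only fuzzy but also highly non-uniform in geometric organization, and offering a new perspective on linguistic color categorization and perceptually grounded color modeling.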

Comments The paper has been submitted for consideration to ICICS 2026 (International Conference on Informatics and Computer Science)

Details
English abstract

Human color categories are not uniformly distributed in perceptual space, yet most computational color models still assume fixed and evenly structured representations. In this paper, we present a focused analytical extension of the COLIBRI fuzzy color model by investigating perceptual asymmetry between hue categories. Using previously collected large-scale human color categorization data, we introduce quantitative measures of category extent and boundary uncertainty, namely Wideness and Boundary Width, derived from fuzzy membership functions at the α = 0.5 level. The analysis reveals a strong imbalance between the two categories: yellow occupies a compact and sharply constrained region of the hue space, whereas green spans a substantially broader interval and exhibits a more extended transition structure. The results show that perceptual color categories are not only fuzzy, but also highly non-uniform in their geometric organization. This asymmetry suggests that some categories behave as narrow, highly specific perceptual labels, while others function as broad, tolerant regions of human color naming. These findings provide a new perspective on linguistic color categorization and extend the interpretability of the COLIBRI framework for perceptually grounded color modeling.

2605.09337 2026-05-12 cs.LG math.OC

Adversary-Robust Learning from Fully Asynchronous Directional Derivative Estimates

Anik Kumar Paul, Nibedita Roy, Nagesh Talagani, Swetha Ganesh, Gugan Thoppe, Alexandre Reiffers-Masson

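Affiliations * Computer Science and Automation, Indian Institute of Science; Edwardson School of Industrial Engineering, Purdue University; Department of Computer Science, IMT Atlantique

AI Summary This paper proposes FAR-SIGN, an asynchronous optimization algorithm for adversary-robust learning in parameter-server/worker systems. It gains robustness through sign-based updates along carefully designed directions and reduces the resulting bias with a two-timescale mechanism. FAR-SIGN admits first-order and zeroth-order implementations, needs no private reference dataset at the server, and supports fully asynchronous execution. Theoretical analysis shows almost-sure convergence to the set of stationary points of smooth nonconvex objectives, and experiments show better accuracy and wall-clock time than existing robust aggregation methods.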

Details
English abstract

We propose FAR-SIGN (Fully Asynchronous Robust optimization via SIGNed directional projections) for adversary-resilient learning in parameter-server--worker systems. FAR-SIGN achieves robustness through sign-based updates along carefully designed directions and mitigates the resulting bias via a two-timescale mechanism. It admits both first-order and zeroth-order implementations and enables fully asynchronous execution without requiring a private reference dataset at the server. We establish almost-sure convergence of FAR-SIGN to the set of stationary points for smooth, nonconvex objectives. Moreover, we prove the near-optimal rate of $O(n^{-1/4+ε})$ in the first-order setting and the standard $O(n^{-1/6+ε})$ in the zeroth-order setting, where $n$ is the iteration count and $ε>0$ can be chosen arbitrarily small. Experiments on MNIST show that FAR-SIGN outperforms robust aggregation-based methods in both accuracy and wall-clock time.
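
To make the idea of a sign-based update along a direction concrete, here is a minimal single-process sketch. It is not FAR-SIGN itself: it leaves out asynchrony, adversarial workers, the paper's direction design, and the two-timescale bias correction. The quadratic objective `f`, the random sign directions, the step-size schedule, and the function names are all assumptions chosen for illustration. What it does show is the zeroth-order flavor (only function values are queried) and the fact that, because only the sign of the estimate is used, each update moves the iterate by exactly the step size regardless of how large the reported derivative is.

```python
import math
import random

def f(x):
    """Toy smooth objective (an assumption): f(x) = ||x||^2."""
    return sum(v * v for v in x)

def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def sign_directional_step(x, u, step, delta=1e-3):
    """Move along direction u using only the sign of a central finite-difference
    estimate of the directional derivative of f at x."""
    x_plus = [xi + delta * ui for xi, ui in zip(x, u)]
    x_minus = [xi - delta * ui for xi, ui in zip(x, u)]
    d = (f(x_plus) - f(x_minus)) / (2.0 * delta)
    s = sign(d)
    return [xi - step * s * ui for xi, ui in zip(x, u)]

random.seed(0)
dim = 5
x = [1.0] * dim
f_start = f(x)
for k in range(2000):
    # Random unit-norm direction with +/- 1/sqrt(dim) entries (illustrative choice).
    u = [random.choice((-1.0, 1.0)) / math.sqrt(dim) for _ in range(dim)]
    x = sign_directional_step(x, u, step=0.1 / math.sqrt(k + 1))
f_end = f(x)

print("objective decreased:", f_end < f_start)
```

The decaying step size is what lets the iterate settle near the minimizer here; a constant step would leave it oscillating at a distance on the order of the step.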

2605.09335 2026-05-12 cs.LG

Functional Graphs for Predicting and Explaining Goal Failure in Sparse Goal-Conditioned RL

Shalley Dash

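Affiliations * Institute of Management Technology

AI Summary This study examines policy failure in sparse goal-conditioned reinforcement learning, analyzing policy behavior through deterministic functional graphs that expose the attractor and basin structure of a policy. It defines local goal support (LGS), a measure of whether the policy reaches the goal from its immediate neighborhood, and finds that LGS is an effective diagnostic of goal failure. It further introduces a taxonomy of policy-induced graphs to identify failure modes beyond local support, providing a structured tool for understanding failures in sparse goal-conditioned RL.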

Comments 9 pages main, 21 pages appendix, 2 figures in main, 8 figures in appendix. Submitted to a conference

Details
English abstract

Sparse goal-conditioned reinforcement learning can produce policies whose failures are hidden by aggregate success rates. We analyze trained goal-conditioned value policies through the deterministic functional graphs induced by greedy evaluation: for each goal, every state maps to a single successor, decomposing behavior into attractors and basins. This reveals a local-to-global structure in learned policies. We define local goal support (LGS), a one-step statistic measuring the fraction of valid neighboring states whose greedy successor is the goal. In deterministic sparse GridWorlds, zero LGS exactly precludes goal entry from non-goal starts. Empirically, weak LGS is a strong diagnostic of goal-level failure across update rules, curricula, larger grids, and bottleneck geometries: the fixed rule LGS <= 0.5 identifies low-success goals with precision 0.921, recall 0.929, and F1 0.925 in the main 8x8 TD setting, with similar performance across variants. However, local support is not sufficient for global success: some supported goals still fail because distant states are captured by competing attractors or fragmented basin structure. We therefore introduce a compact post-hoc taxonomy of policy-induced graphs -- goal-dominant, competitor-dominated, partial/contested, and fragmented -- to characterize residual failure modes beyond local support. These results show that sparse GCRL failures can be understood as structured policy-induced dynamics, and that local one-step policy structure provides a cheap post-training diagnostic for goal-level failure.
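
The LGS statistic described above can be sketched in a few lines. The version below assumes a 4-connected GridWorld and a greedy policy exposed as a function `greedy_successor(state, goal)`; the two toy policies, the grid size, and all names are hypothetical stand-ins, not the paper's code.

```python
def local_goal_support(goal, greedy_successor, width, height, walls=frozenset()):
    """Fraction of valid 4-neighbors of `goal` whose greedy successor is `goal`."""
    gx, gy = goal
    neighbors = [
        (gx + dx, gy + dy)
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
        if 0 <= gx + dx < width
        and 0 <= gy + dy < height
        and (gx + dx, gy + dy) not in walls
    ]
    if not neighbors:
        return 0.0
    hits = sum(1 for s in neighbors if greedy_successor(s, goal) == goal)
    return hits / len(neighbors)

def step_toward(state, goal):
    """Toy well-behaved policy: close the x gap first, then the y gap."""
    x, y = state
    gx, gy = goal
    if x != gx:
        return (x + (1 if gx > x else -1), y)
    if y != gy:
        return (x, y + (1 if gy > y else -1))
    return state

def always_right(state, goal):
    """Toy degenerate policy: ignore the goal and move right on a 4x4 grid."""
    x, y = state
    return (min(x + 1, 3), y)

print(local_goal_support((1, 1), step_toward, 4, 4))   # 1.0
print(local_goal_support((1, 1), always_right, 4, 4))  # 0.25
```

With the fixed rule from the abstract (LGS <= 0.5), the second policy's goal at (1, 1) would be flagged as a likely failure, while a goal with zero LGS, such as (0, 0) under `always_right`, cannot be entered from any non-goal start.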

2605.09331 2026-05-12 cs.LG

Dimension-Free Saddle-Point Escape in Muon

Yanlin Long, Yufei Gu, Zeke Xie

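Affiliations * xLeaF Lab, The Hong Kong University of Science and Technology (Guangzhou)

AI Summary This paper studies the optimization bottleneck that flat, high-dimensional saddle points create in modern large language model training, and analyzes the saddle-point escape dynamics of the emerging Muon optimizer. Extending generalized matrix perturbation theory, it develops a theoretical framework showing that Muon sidesteps the curse of dimensionality through a non-linear spectral shaping mechanism, achieving dimension-free saddle-point escape. The analysis avoids isotropic noise assumptions and Tracy-Widom edge singularities, and yields a rigorous treatment and escape bound for the non-convex optimization dynamics.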

Comments 33 pages, 5 figures. Preprint

Details
English abstract

Modern Large Language Model (LLM) training is fundamentally bottlenecked by pathologically flat saddle points in extreme high-dimensional landscapes. Motivated by this challenge, we analyze the saddle-point escape dynamics of the emerging Muon optimizer, demonstrating its resilience against the $\mathcal{O}(D)$ dimensional curse that severely traps element-wise adaptive optimizers like AdamW. By extending generalized matrix perturbation theory, we develop a theoretical framework to capture Muon's non-equilibrium optimization trajectories. This theoretical machinery mathematically proves that Muon elegantly bypasses the dimensional curse via a non-linear spectral shaping mechanism. By leveraging resolvent functional calculus and macroscopic Cauchy contour integration, we avoid isotropic noise assumptions and Tracy-Widom edge singularities. We establish that structural incoherence securely shields the trajectory from orthogonal drift, enabling a dimension-free saddle-point escape, and triggering a deterministic $\mathcal{O}(1)$ discrete ballistic ejection under sufficient spectral gap. Consequently, we provide an algebraically dimension-free escape bound for Muon, formalizing the underlying mechanics of its non-convex optimization dynamics.

2605.09330 2026-05-12 cs.LG cs.AI

The Trap of Trajectory: Towards Understanding and Mitigating Spurious Correlations in Agentic Memory

Luoxi Tang, Rupali Rajendra Vaje, Yuqiao Meng, Sakshi Sunil Narkar, Weicheng Ma, Zeyu Ding, Dazheng Zhang, Zhaohan Xi

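Affiliations * Binghamton University, State University of New York; Oakland University; University of Pennsylvania

AI Summary This paper studies reasoning errors caused by spurious correlations in agentic memory: information retrieved from long-term memory can carry misleading evidence that degrades later decisions. To address this, the authors propose CAMEL, which calibrates memory at both write and retrieval time, reducing reliance on spurious correlations while preserving performance on clean inputs and remaining robust under adaptive attacks. The method offers a practical route to more reliable and safer agentic memory systems.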

Details
English abstract

Agentic memory enables LLMs to persist information beyond a single context window and reuse it in later decisions, but it also introduces a new vulnerability: spurious correlations, where retrieved memory carries miscorrelated evidence and propagates erroneous reasoning into downstream decisions. Despite the widespread use of agentic memory, this risk remains largely underexplored. We address it from two aspects. First, we benchmark several canonical types of spurious patterns identified through causal structure and record them across trajectory-level memory. Diagnosing agentic memory systems on this benchmark reveals that memory improves reasoning on clean inputs but amplifies reliance on spurious patterns when they are present. Second, we propose CAMEL, a plug-and-play calibration method that operates across diverse memory architectures at both write and retrieval time. CAMEL consistently reduces reliance on spurious patterns across all three types while preserving or improving performance on clean inputs and staying robust under adaptive attacks targeting the calibration. Overall, CAMEL offers a principled and lightweight solution toward more reliable agentic memory deployment.

2605.09328 2026-05-12 cs.CV

Noise-Started One-Step Real-World Super-Resolution via LR-Conditioned SplitMeanFlow and GAN Refinement

Wei Zhu, Kai Zhang, Yu Zheng, Lei Luo, Yong Guo, Jian Yang

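Affiliations * Nanjing University of Science and Technology; Nanjing University; Huawei

AI Summary This study proposes SMFSR, a diffusion-based one-step method for real-world image super-resolution that targets the efficiency-quality trade-off of diffusion models. While keeping the noise-started generation process, it uses LR-conditioned SplitMeanFlow to map noise directly to a high-resolution image and adds a GAN refinement stage to improve detail realism and naturalness. Experiments show that SMFSR retains efficient single-step inference while achieving state-of-the-art perceptual quality among one-step diffusion models for real-world super-resolution.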

Details
English abstract

Pre-trained text-to-image (T2I) diffusion models have shown strong potential for real-world image super-resolution (Real-ISR), owing to their noise-started generation process that enables realistic texture synthesis and captures the one-to-many nature of super-resolution. However, diffusion-based Real-ISR methods still face a fundamental efficiency-quality trade-off. Multi-step methods generate high-quality results by iteratively denoising random Gaussian noise under LR conditioning, but suffer from slow sampling. Recent one-step methods greatly improve efficiency, yet they typically replace noise-started generation with direct LR-to-HR restoration, which weakens stochasticity and limits realistic detail synthesis. To address this issue, we propose SMFSR, a noise-started one-step Real-ISR framework via LR-conditioned SplitMeanFlow and GAN refinement. SMFSR preserves the random-noise starting point of diffusion models and learns a direct noise-to-HR mapping conditioned on the LR image. To this end, Interval Splitting Consistency distills the multi-step generative trajectory into a single average-velocity prediction, enabling efficient one-step generation. To compensate for the reduced opportunity for progressive refinement, we further introduce a GAN refinement stage, where a DINOv3-based discriminator enhances realistic texture synthesis and variational score distillation aligns the generated outputs with the natural image distribution under a frozen diffusion teacher. Extensive experiments demonstrate that SMFSR achieves state-of-the-art perceptual quality among one-step diffusion-based Real-ISR methods while retaining fast single-step inference.

2605.09319 2026-05-12 cs.CV cs.LG

PGID: Progressive Guided Inversion and Denoising for Robust Watermark Detection

Minh Quoc Duong, Chun Tong Lei, Chun Pong Lau

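Affiliations * City University of Hong Kong

AI Summary As AI-generated images proliferate, digital watermarking has become an important safeguard for intellectual property and against malicious use. However, existing semantic watermarking methods rely on diffusion inversion for watermark detection, which leaves them vulnerable to imprint removal and forgery attacks. This paper proposes PGID, a training-free progressive guided inversion and denoising framework that defends against both attacks by projecting perturbed latents back to their original region through progressive inversion-denoising cycles, recovering removed watermarks and identifying forged instances.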

Details
English abstract

With the proliferation of AI-generated images, digital watermarking has become an essential safeguard for protecting intellectual property and mitigating malicious exploitation. Recent works on semantic watermarking have enabled efficient copyright protection for diffusion models. However, the dependence of semantic watermarking on diffusion inversion for watermark detection creates a critical vulnerability. Imprint removal and forgery attacks exploit this weakness to produce deceptive results. Our analysis reveals that these attacks succeed by displacing watermarked latents into the unwatermarked region, while guiding unwatermarked latents into the watermarked region. Based on that, we propose Progressive Guided Inversion and Denoising (PGID), the first plug-and-play, training-free noise extraction framework designed to defend against both attack strategies. PGID effectively defends by projecting perturbed latents back to the region where they originally belong. The projection is achieved by eliminating intermediate latent deflections and mitigating adversarial perturbations through progressive inversion-denoising cycles. Comprehensive evaluations across multiple schemes demonstrate that PGID successfully restores detection reliability by recovering removed watermarks and identifying forged instances.

2605.09317 2026-05-12 cs.CL cs.CV cs.LG

Mem-W: Latent Memory-Native GUI Agents

Guibin Zhang, Yaohui Ling, Fanci Meng, Kun Wang, Shuicheng Yan

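Affiliations * LV-NUS Lab

AI Summary This paper presents Mem-W, a new family of GUI agents whose core idea is to treat memory as part of the agent's continuous context rather than as a conventional external scaffold. Through a shared trajectory-to-latent compressor, Mem-W encodes historical trajectories and in-session segments into compact memory tokens and fuses them with the current GUI observation into one continuous embedding sequence, enabling unified perception of task progress and decision making. Experiments show that Mem-W improves a range of backbones and memory-enhanced baselines on several web and mobile navigation benchmarks, with gains of up to +30.0, demonstrating the effectiveness and scalability of latent-native memory for long-horizon GUI operation.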

Details
English abstract

GUI agents are beginning to operate the web, mobile, and desktop as interactive worlds, where successful control depends on carrying forward visual, procedural, and task-level evidence beyond the fleeting present screen. Yet most agents still treat memory as an external, human-readable artifact: histories are summarized, categorized, retrieved, and reinserted as text or structured records before being encoded again by the policy. This creates a mismatch between the representational form in which experience is stored and the latent embedding sequence over which modern GUI policies actually act. We introduce Mem-W, a series of latent-memory-native GUI agents that treat memory as part of the agent's continuous context rather than as an auxiliary symbolic scaffold. Mem-W weaves both historical trajectories (as experiential memory) and in-session segments (as working memory) into compact memory tokens through a shared trajectory-to-latent compressor. These tokens are woven with the current GUI observation and local context into one continuous embedding sequence, allowing the agent to read successes, failures, and unfinished progress through the same machine-native interface. Mem-W is trained with self-distillation and outcome-aware supervision to preserve decision-relevant state while filtering memory toward evidence that truly supports task success. Across four web and mobile navigation benchmarks, Mem-W consistently improves diverse backbones and memory-enhanced baselines, with gains of up to $+30.0$, suggesting that latent-context-native memory can serve as a scalable foundation for long-horizon GUI agency.

2605.09315 2026-05-12 cs.AI cs.CL

Do Self-Evolving Agents Forget? Capability Degradation and Preservation in Lifelong LLM Agent Adaptation

Ye Yu, Xiaopeng Yuan, Haibo Jin, Heming Liu, Yaoning Yu, Haohan Wang

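Affiliations * University of Illinois Urbana-Champaign

AI Summary This paper studies the capability degradation that LLM agents suffer while continually adapting to new tasks, showing that self-evolution can progressively erode previously acquired capabilities across the workflow, skill, model, and memory evolution dimensions. The authors propose Capability-Preserving Evolution (CPE), which constrains destructive capability drift during evolution, improving the stability of existing capabilities while preserving adaptation performance. Experiments show that CPE mitigates capability erosion across multiple task settings, pointing toward stable, long-horizon self-evolving agents.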

Details
English abstract

Recent advances in LLM agents enable systems that autonomously refine workflows, accumulate reusable skills, self-train their underlying models, and maintain persistent memory. However, we show that such self-evolution is often non-monotonic: adapting to new task distributions can progressively degrade previously acquired capabilities across all major evolution channels. We identify this phenomenon as capability erosion under self-evolution and show that it consistently emerges across workflow, skill, model, and memory evolution. To mitigate this issue, we propose Capability-Preserving Evolution (CPE), a general stabilization principle that constrains destructive capability drift during continual adaptation. Across all four evolution dimensions, CPE consistently improves retained capability stability while preserving adaptation performance. For example, in workflow evolution, CPE improves retained simple-task performance from 41.8% to 52.8% under GPT-5.1 optimization while simultaneously achieving stronger complex-task adaptation. Our findings suggest that stable long-horizon self-evolving agents require not only acquiring new capabilities, but also explicitly preserving previously learned ones during continual adaptation.

2605.09314 2026-05-12 cs.AI

How LLMs Are Persuaded: A Few Attention Heads, Rerouted

Xiangkun Sun, Lingkai Kong, Aoqi Zhang, Liang Zeng, Tonghan Wang

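Affiliations * Northeastern University; Harvard University; Tsinghua University; Skywork AI

AI Summary This study investigates how large language models are persuaded to abandon factual knowledge and uncovers the internal causal mechanism. The model's answer is determined mainly by a small set of mid-layer attention heads that encode answer options as vertices of a low-dimensional polyhedron; persuasion is in effect a discrete jump from the correct-answer vertex to the target-answer vertex. Through intervention experiments, the study confirms that persuasion depends on a steerable attention-routing feature and traces it to shallower attention heads that build it from persuasive keywords in the input, suggesting new ways to monitor and defend against this vulnerability.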

Comments 9 pages, 9 figures

Details
English abstract

Language models can be persuaded to abandon factual knowledge. This vulnerability is central to AI safety, but its internal mechanism remains poorly understood. We uncover a compact causal mechanism for persuasion-induced factual errors. A small set of mid-layer attention heads almost entirely determines the model's answer. These heads write answer options into a low-dimensional polyhedron, with options occupying distinct vertices. Persuasion does not blur belief or merely reduce confidence; it causes a discrete latent jump from the correct-answer vertex to the persuasion-target vertex. We show that decision heads are not reasoning over evidence. Instead, they copy whichever option token their attention selects. Persuasion works by redirecting attention. We isolate a rank-one evidence-routing feature that controls the route. Directly modifying this feature steers the model's choice, and removing it blocks persuasion. We then trace the feature back to a band of shallower attention heads that build it from persuasive keywords in the input. Every step is validated by intervention. This mechanism appears across open-source LLMs and realistic poisoning scenarios such as Generative Engine Optimization, revealing persuasion as a narrow, monitorable circuit.

2605.09312 2026-05-12 cs.CV

Low-Cost Neural Radiance Fields

Alice Huang, Prathamesh Sonawane, Yashdeep Thorat, Yug Rao

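Affiliations * University of Illinois Urbana-Champaign

AI Summary This paper studies how to speed up NeRF training and inference when compute and data are limited. The authors compare three accelerated NeRF variants and run extension experiments for the low-compute, low-data regime, including adding a depth-supervision loss, simplifying the feature-decoding network, and designing HashNeRF variants with different architectures. Under equal training time, none of the extensions clearly outperforms the existing baselines, but the experiments indicate which extensions suit constrained settings and suggest directions for future work.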

Comments 7 pages

Details
English abstract

Neural Radiance Fields (NeRF) achieve high-quality novel-view synthesis, but their long training times and reliance on dense input views limit accessibility. We present a comparative study of three accelerated NeRF variants (DS-NeRF, TensoRF, and HashNeRF) and explore extensions targeted at the low-compute, low-data regime. First, we add a depth-supervision loss derived from COLMAP keypoints to TensoRF (TensoRF-DS) and evaluate it on the LLFF dataset under reduced view counts. Second, we ablate the feature-decoding MLP of TensoRF and study the effect of input downsampling on PSNR and runtime on the synthetic Lego scene. Third, we propose four architectural variants of the HashNeRF color and density networks, including residual and convolutional designs, and report PSNR/training-time tradeoffs under matched iteration budgets. Under iso-time evaluation, none of our extensions conclusively outperform the published baselines, but the experiments characterize which extensions transfer to constrained settings and surface design questions for future work.

2605.09311 2026-05-12 cs.LG cs.AI physics.atom-ph physics.chem-ph physics.comp-ph

Teaching Molecular Dynamics to a Non-Autoregressive Ionic Transport Predictor

Jiyeon Kim, Byungju Lee, Won-Yong Shin

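Affiliations * School of Mathematics and Computing (Computational Science and Engineering); Yonsei University; Korea Institute of Science and Technology; Nanoscience and Technology

AI Summary This paper studies fast, accurate prediction of ionic transport properties, a dynamic material property, and proposes a non-autoregressive learning framework based on auxiliary modality learning: atomic trajectories serve as an auxiliary modality during training, so the model captures dynamics without needing trajectory data at inference. The method avoids the slow inference and error accumulation of autoregressive models and the underuse of dynamics in existing non-autoregressive models, achieving over 200 times speedup over autoregressive models on the dataset with atomic trajectories while substantially reducing prediction error.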

Comments International Conference on Machine Learning (ICML 2026) (to appear) (Please cite our conference version.)

Details
English abstract

Unlike most static material properties widely studied in the machine learning literature, ionic transport properties are inherently dynamic, making their fast and accurate prediction from static atomic structures challenging. The current standard approach, molecular dynamics (MD) simulations, suffers from prohibitively high computational cost. Recent autoregressive learning-based MD acceleration methods requiring sequential inference remain slow and prone to error accumulation; in contrast, existing non-autoregressive material property prediction models are less accurate because they fail to exploit dynamics. Moreover, existing methods typically benefit from datasets either with or without atomic trajectories, but not both. To overcome these limitations, we propose a non-autoregressive learning framework based on auxiliary modality learning, which treats atomic trajectories as an auxiliary modality during training but does not require them at inference. This enables the predictor to learn dynamics without sequential inference while benefiting from both types of datasets. As a result, our framework achieves over 200 times speedup compared to autoregressive models on the dataset with atomic trajectories while substantially reducing prediction error relative to non-autoregressive benchmarks across both types of datasets. Our code is available at https://github.com/jykim-git/MD.