arXivDaily: arXiv daily academic digest, updated Monday to Friday
2605.09537 2026-05-12 cs.RO

Drift is a Sampling Error: SNR-Aware Power Distributions for Long-Horizon Robotic Planning

Kewei Chen, Yayu Long, Mingsheng Shang

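Affiliations Chongqing Institute of Green and Intelligent Technology, Chinese Academy of Sciences; Chongqing School, University of Chinese Academy of Sciences

AI summary Despite rapid progress in Vision-Language-Action (VLA) models for robotic control, instruction drift persists in long-horizon tasks. The paper reframes this phenomenon as a systematic sampling error and proposes Context-Aware Power Sampling (CAPS), a training-free inference-time computation framework. CAPS uses power distributions to sharpen global trajectory probabilities and pairs them with a metacognitive control mechanism based on the signal-to-noise ratio (SNR), which triggers adaptive MCMC search when drift risk is detected, switching dynamically between "intuitive fast thinking" and "rational slow search." Experiments show that CAPS substantially outperforms existing methods on several long-horizon benchmarks and improves the robustness of long-horizon robotic tasks.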

Comments Accepted at ICML 2026

Abstract

Despite rapid progress in Vision-Language-Action (VLA) models for robotic control, instruction drift remains a persistent failure mode in long-horizon tasks. This paper reconceptualizes this phenomenon, positing that instruction drift is fundamentally a systematic sampling error: local greedy sampling is prone to collapsing into "Negative Pivotal Windows"--irreversible local optima with high local probability that sever global success pathways. To address this, we propose Context-Aware Power Sampling (CAPS), a training-free inference-time computation framework. CAPS leverages power distributions to sharpen global trajectory probabilities, enabling lookahead search over the model's conditional generative trajectory distribution. Furthermore, we introduce a metacognitive control mechanism based on Signal-to-Noise Ratio (SNR). This mechanism triggers adaptive MCMC search solely when drift risk is detected, enabling a dynamic transition from "intuitive fast thinking" to "rational slow search." Experiments on RoboTwin, Simpler-WindowX, and Libero-long benchmarks show that CAPS achieves substantial improvements over strong baselines, including OpenVLA and TACO, without parameter updates. These results support the effectiveness of adaptive inference-time computation for improving long-horizon robustness in embodied control.
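The abstract gives no formulas, so the following is only a minimal sketch of the two ideas it names. It assumes the power distribution over a finite set of candidate trajectories is p_i^alpha normalized to sum to one, and it uses a hypothetical SNR (absolute mean divided by standard deviation of a window of scores) as the trigger. The function names, the SNR definition, and the threshold are assumptions for illustration, not the paper's method.

```python
import math

def power_distribution(probs, alpha):
    """Return p_i**alpha / sum_j p_j**alpha for a finite candidate set."""
    powered = [p ** alpha for p in probs]
    total = sum(powered)
    return [w / total for w in powered]

def snr(values):
    """Hypothetical SNR: |mean| / standard deviation of a score window."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return abs(mean) / math.sqrt(var) if var > 0 else float("inf")

def should_search(score_window, threshold):
    """Trigger the slow search only when the SNR drops below the threshold."""
    return snr(score_window) < threshold

# With alpha > 1 the most likely trajectory gains probability mass.
trajectory_probs = [0.5, 0.3, 0.2]
sharpened = power_distribution(trajectory_probs, alpha=2.0)

# A steady score window keeps the fast path; a noisy one triggers search.
steady = should_search([1.0, 1.0, 1.0], threshold=2.0)
noisy = should_search([1.0, -1.0, 1.0, -1.0], threshold=2.0)
```

With alpha above 1 the mass concentrates on the most probable candidate, which is the sharpening effect the abstract describes; the gate keeps the expensive search off unless the score window looks noisy.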

2605.09536 2026-05-12 cs.CL cs.AI

TAD: Temporal-Aware Trajectory Self-Distillation for Fast and Accurate Diffusion LLM

Haoyang Zhou, Li Kong, Shijie Ren, Xiting Wang, Shuang Liang, Guowei Wang, Zhenxuan Pan

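Affiliations Gaoling School of Artificial Intelligence, Renmin University of China; Ant Group

AI summary Diffusion large language models (dLLMs) are promising for parallel text generation but face a trade-off between generation speed and accuracy. The paper proposes TAD, a temporal-aware trajectory self-distillation framework: a teacher model generates decoding trajectories, masked positions are partitioned by the number of decoding steps remaining, and the two subsets are trained with a cross-entropy loss and a KL divergence loss respectively, improving parallel efficiency while preserving generation quality. Experiments show that TAD improves the accuracy-parallelism trade-off, with significant gains on several metrics.

Abstract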

Diffusion large language models (dLLMs) offer a promising paradigm for parallel text generation, but in practice they face an accuracy-parallelism trade-off, where increasing tokens per forward (TPF) often degrades generation quality. Existing acceleration methods often gain speed at the cost of accuracy. To address this limitation, we propose TAD, a Temporal-Aware trajectory self-Distillation framework. During data construction, we condition a teacher model on both the prompt and the ground-truth response to generate decoding trajectories, recording the intermediate masked states throughout the process. Based on how many decoding steps remain before each masked token is revealed, we partition masked positions into near and distant subsets. For near tokens, we train the student with a hard cross-entropy loss using the teacher trajectory tokens as labels, encouraging confident predictions for tokens that are about to be decoded. For distant tokens, we apply a soft KL divergence loss between the teacher and student token distributions, providing softer supervision and preserving future planning knowledge. This temporal-aware partition naturally gives rise to two deployment configurations: a Quality model that prioritizes accuracy and a Speed model that favors more aggressive acceleration. Experiments show that TAD consistently improves the accuracy-parallelism trade-off. On LLaDA, it raises average accuracy from 46.2% to 51.6% with the Quality model and average AUP from 46.2 to 257.1 with the Speed model. Our code is available at: https://github.com/BHmingyang/TAD
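The near/distant partition described above can be sketched in a few lines. This is a minimal illustration, not the released TAD code: the threshold `near_threshold`, the equal weighting of the two terms, the KL direction (teacher relative to student), and the dictionary layout are all assumptions.

```python
import math

def cross_entropy(student_probs, label):
    """Hard loss: negative log-probability the student gives the teacher token."""
    return -math.log(student_probs[label])

def kl_divergence(teacher_probs, student_probs):
    """Soft loss: KL(teacher || student) over the token distribution."""
    return sum(t * math.log(t / s)
               for t, s in zip(teacher_probs, student_probs) if t > 0)

def partitioned_loss(positions, near_threshold):
    """Average loss over masked positions, switching on steps remaining."""
    total = 0.0
    for pos in positions:
        if pos["steps_remaining"] <= near_threshold:
            # Near token: about to be decoded, so use the hard label.
            total += cross_entropy(pos["student_probs"], pos["teacher_token"])
        else:
            # Distant token: match the teacher's full distribution.
            total += kl_divergence(pos["teacher_probs"], pos["student_probs"])
    return total / len(positions)

positions = [
    {"steps_remaining": 1, "teacher_token": 0,
     "teacher_probs": [0.9, 0.1], "student_probs": [0.5, 0.5]},
    {"steps_remaining": 5, "teacher_token": 1,
     "teacher_probs": [0.2, 0.8], "student_probs": [0.2, 0.8]},
]
loss = partitioned_loss(positions, near_threshold=2)
```

In this toy input the distant position contributes nothing because the student already matches the teacher, so the loss comes entirely from the near position's hard label.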

2605.09533 2026-05-12 cs.CL cs.AI

Assessment of RAG and Fine-Tuning for Industrial Question-Answering-Applications

Jakob Sturm, Josef Pichlmeier, Christian Bernhard, Maka Karalashvili, Johannes Klepsch, Georg Groh, Andre Luckow

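Affiliations BMW Group

AI summary This study evaluates retrieval-augmented generation (RAG) and fine-tuning (FT) for industrial question answering, focusing on their performance on datasets specific to the automotive industry. By extending the Cost-of-Pass framework, it jointly considers output quality and operational cost. The study finds that although premium models perform best out of the box, open-source models combined with RAG can reach comparable quality, and that RAG is overall the more effective and cheaper adaptation method.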

Comments Accepted at AAAI 2026 Workshop on New Frontiers in Information Retrieval

Abstract

Large Language Models (LLMs) are increasingly employed in enterprise question-answering (QA) systems, requiring adaptation to domain-specific knowledge. Among the most prevalent methods for incorporating such knowledge are Retrieval-Augmented Generation (RAG) and fine-tuning (FT). Yet, from a cost-accuracy trade-off perspective, it remains unclear which approach best suits industry scenarios. This study examines the impact of RAG and FT on two closed datasets specific to the automotive industry, assessing answer quality and operational costs. We extend the Cost-of-Pass framework proposed by Erol et al. (arXiv:2504.13359) to jointly assess output quality, generation cost, and user interaction cost. Our findings reveal that while premium models perform best out of the box, open-source models can achieve comparable quality when enhanced with RAG. Overall, RAG emerges as the most effective and cost-efficient adaptation method for both closed- and open-source models.
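As a rough illustration of a cost-accuracy metric of this kind, the sketch below assumes that cost-of-pass is the expected cost per attempt divided by the success rate, and that a per-attempt user interaction cost can simply be added to the numerator. The additive form is an assumption made for illustration and is not taken from the paper; the numbers are arbitrary placeholders, not results.

```python
def cost_of_pass(generation_cost, success_rate, interaction_cost=0.0):
    """Expected cost of obtaining one correct answer.

    generation_cost and interaction_cost are per-attempt costs in the same
    unit; success_rate is the fraction of attempts judged correct.
    """
    if success_rate <= 0.0:
        return float("inf")  # a correct answer is never obtained
    return (generation_cost + interaction_cost) / success_rate

# Placeholder values chosen only to show how the comparison works.
premium = cost_of_pass(generation_cost=0.02, success_rate=0.9)
open_with_rag = cost_of_pass(generation_cost=0.004, success_rate=0.8,
                             interaction_cost=0.001)
```

Under this definition a cheaper model with a somewhat lower success rate can still have the lower cost per correct answer, which is the kind of trade-off the study evaluates.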

2605.09528 2026-05-12 cs.AI

Cplus2ASP: Computing Action Language C+ in Answer Set Programming

Joseph Babb, Joohyung Lee

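Affiliations School of Computing, Informatics, and Decision Systems Engineering

AI summary This paper presents Version 2 of the Cplus2ASP system, which implements the definite fragment of the action language C+. By using modern answer set solving techniques, the system runs significantly faster while remaining compatible with the input language of the Causal Calculator Version 2. The system combines several recent theoretical results, supports an incremental execution mode and a range of practical features, and provides extensible multi-modal translations for other action languages.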

Journal ref In Proceedings of the 12th International Conference on Logic Programming and Nonmonotonic Reasoning (LPNMR 2013), 122-134, 2013

Abstract

We present Version 2 of system Cplus2ASP, which implements the definite fragment of action language C+. Its input language is fully compatible with the language of the Causal Calculator Version 2, but the new system is significantly faster thanks to modern answer set solving techniques. The translation implemented in the system is a composition of several recent theoretical results. The system orchestrates a tool chain, consisting of f2lp, clingo, iclingo, and as2transition. Under the incremental execution mode, the system translates a C+ description into the input language of iclingo, exploiting its incremental grounding mechanism. The correctness of this execution is justified by the module theorem extended to programs with nested expressions. In addition, the input language of the system has many useful features, such as external atoms by means of Lua calls and the user interactive mode. The system supports extensible multi-modal translations for other action languages, such as B and BC, as well.

2605.09524 2026-05-12 cs.AI

Functional Stable Model Semantics and Answer Set Programming Modulo Theories

Michael Bartholomew, Joohyung Lee

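Affiliations School of Computing, Informatics and Decision Systems Engineering

AI summary This paper studies the introduction of "intensional" functions in the framework of Answer Set Programming Modulo Theories (ASPMT) and the important role that the functional stable model semantics plays there. The authors note that functions in standard answer set programming are pre-defined, whereas the values of intensional functions can be described by other functions and predicates, which lets ASPMT handle complex constraints more flexibly. The paper shows how "tight" ASPMT programs can be translated into SMT instances, extending the connection between answer set programming and satisfiability modulo theories.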

Journal ref In Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI 2013), pages 718-724, 2013

Abstract

Recently there has been an increasing interest in incorporating "intensional" functions in answer set programming. Intensional functions are those whose values can be described by other functions and predicates, rather than being pre-defined as in standard answer set programming. We demonstrate that the functional stable model semantics plays an important role in the framework of "Answer Set Programming Modulo Theories (ASPMT)" -- a tight integration of answer set programming and satisfiability modulo theories, under which existing integration approaches can be viewed as special cases where the role of functions is limited. We show that "tight" ASPMT programs can be translated into SMT instances, which is similar to the known relationship between ASP and SAT.

2605.09519 2026-05-12 cs.AI cs.LO

Weighted Rules under the Stable Model Semantics

Joohyung Lee, Yi Wang

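Affiliations School of Computing, Informatics and Decision Systems Engineering

AI summary This paper introduces weighted rules under the stable model semantics, following the log-linear models of Markov Logic, to overcome the deterministic nature of the stable model semantics. The approach can resolve inconsistencies in answer set programs, rank stable models, assign probabilities to stable models, and support statistical inference. The paper also gives formal comparisons with related formalisms such as answer set programs, Markov Logic, ProbLog, and P-log.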

Journal ref In Proceedings of the 15th International Conference on Principles of Knowledge Representation and Reasoning (KR 2016), pages 145-154, 2016

Abstract

We introduce the concept of weighted rules under the stable model semantics following the log-linear models of Markov Logic. This provides versatile methods to overcome the deterministic nature of the stable model semantics, such as resolving inconsistencies in answer set programs, ranking stable models, associating probability to stable models, and applying statistical inference to computing weighted stable models. We also present formal comparisons with related formalisms, such as answer set programs, Markov Logic, ProbLog, and P-log.
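The abstract does not state the definition, so the block below is only a sketch of how a log-linear weighting in the style of Markov Logic can be attached to stable models. The notation is assumed for illustration and may differ from the paper's; it covers finite (soft) weights only and says nothing about how hard rules are treated.

```latex
% Assumed notation: \Pi is a program of weighted rules w : R, and \Pi_I is
% the set of rules of \Pi that the interpretation I satisfies.
W_\Pi(I) =
\begin{cases}
  \exp\Bigl(\sum_{w:R \,\in\, \Pi_I} w\Bigr)
    & \text{if } I \text{ is a stable model of } \Pi_I, \\
  0 & \text{otherwise,}
\end{cases}
\qquad
P_\Pi(I) = \frac{W_\Pi(I)}{\sum_{J} W_\Pi(J)}.
```

Read this way, an interpretation that violates some rules is not excluded outright; it simply receives a smaller weight, which is what allows inconsistent programs to be handled and stable models to be ranked by probability.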

2605.09518 2026-05-12 cs.LG

LLM-Driven Performance-Space Augmentation for Meta-Learning-Based Algorithm Selection

Darren Zhu, Daren Ler

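Affiliations Department of Statistics and Data Science; National University of Singapore; Department of Computer Science

AI summary This study addresses the sparsity of meta-datasets in meta-learning for algorithm selection, caused by the scarcity of real-world datasets, by using a large language model to generate synthetic regression datasets that augment the meta-dataset. Generation is steered toward datasets with specific performance characteristics so as to cover target regions of the algorithm performance space. Experiments show that performance-space augmentation substantially improves meta-learner performance, with uniform sampling performing best, offering a new data augmentation approach for meta-learning-based algorithm selection.

Abstract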

Meta-learning for algorithm selection relies on a meta-dataset in which each row corresponds to a supervised learning dataset described by meta-features and labelled with a target value that is associated with algorithm choice (typically, some function of algorithm performance). A persistent limitation is that the number of curated real-world datasets is small, resulting in sparse meta-datasets that constrain meta-learner generalisation. In this paper, we address this problem by augmenting the meta-dataset with synthetic regression datasets produced via a large language model (LLM), with generation steered toward target regions of a low-dimensionality performance space. In our experiments, we adopt a two-dimensional geometric setting defined by the cross-validated $R^2$ scores of two anchor algorithms, known as landmarkers. We compare two augmentation strategies: (1) uniform sampling, which distributes synthetic datasets across the performance space; and (2) margin-based sampling, which concentrates them near the decision boundary where landmarker preference is most ambiguous. Across 42 real-world UCI regression datasets and 730 synthetic datasets, both strategies substantially improve meta-learner performance over the unaugmented baseline under regression and multi-label evaluation formulations. However, uniform augmentation consistently outperforms margin-based augmentation, achieving a 17.47% relative reduction in Hamming loss, a 100.41% relative improvement in subset accuracy, and a +6.09% relative gain in pooled out-of-fold $R^2$. These results lead us to postulate a central thesis: the performance of algorithms resides on a low-dimensional performance manifold, whose reconstruction bias may be minimised by user-guided LLMs that seek to maximise uniform $ε$-cover, and consequently, lead to improved meta-learning for algorithm selection.

2605.09516 2026-05-12 cs.LG cs.AI

Mixture of Layers with Hybrid Attention

Ivan Ternovtsii, Yurii Bilak

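Affiliations Department of Software Systems, Uzhhorod National University

AI summary This paper proposes Mixture of Layers (MoL) with hybrid attention to improve on the structure of standard Mixture-of-Experts (MoE) transformers. Each full-width layer is replaced by several parallel low-dimensional blocks, and a router selects which blocks are active, with the aim of improving efficiency and expressiveness. To address the insufficient attention coverage caused by sparse routing, the authors introduce hybrid attention, which pairs a shared softmax attention block for global context with linear attention in the routed blocks.

Abstract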

Standard Mixture-of-Experts (MoE) transformers route tokens to expert subnetworks within each layer, but the layer structure itself remains monolithic. We introduce Mixture of Layers (MoL), which replaces full-width transformer blocks (d_model) with K parallel thin blocks at reduced dimensionality (d_thin << d_model), connected via learned down/up projections and composed via top-k block routing. Scaling sparse block routing to many blocks creates an attention coverage problem, as each block sees fewer tokens. We address this by introducing hybrid attention, which pairs one shared softmax block for global context with Gated DeltaNet linear attention in routed blocks.
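The block structure described above (down-projection, thin block, up-projection, top-k routing) can be sketched as follows. This is a toy illustration under stated assumptions: the router is a plain linear scorer, gates are a softmax over the selected scores, a residual connection is added, and each thin block is an arbitrary function. None of these details are confirmed by the abstract, and attention is omitted entirely.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mixture_of_layers(x, router, blocks, top_k):
    """x: d_model vector; router: K x d_model; blocks: (down, thin_fn, up)."""
    scores = matvec(router, x)
    chosen = sorted(range(len(blocks)), key=lambda i: scores[i],
                    reverse=True)[:top_k]
    gates = softmax([scores[i] for i in chosen])
    out = list(x)  # residual connection (an assumption)
    for gate, i in zip(gates, chosen):
        down, thin_fn, up = blocks[i]
        hidden = thin_fn(matvec(down, x))  # computed at width d_thin
        projected = matvec(up, hidden)     # back to width d_model
        out = [o + gate * p for o, p in zip(out, projected)]
    return out, chosen

def relu(vector):
    return [max(0.0, v) for v in vector]

# d_model = 4, d_thin = 2, K = 3 identical toy blocks, top-2 routing.
down = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
up = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
blocks = [(down, relu, up)] * 3
router = [[1.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]
output, chosen = mixture_of_layers([1.0, 0.0, 0.0, 0.0], router, blocks, top_k=2)
```

Only the selected blocks run, and each one works at the reduced width, which is where the compute saving over a full-width block would come from.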

2605.09515 2026-05-12 cs.AI

A Game Theoretic Free Energy Analysis of Higher Order Synergy in Attention Heads of Large Language Models

Djamel Bouchaffra

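Affiliations DAVID Lab, University of Paris-Saclay, UVSQ Campus, 78035 Versailles, France

AI summary This paper studies higher-order synergy among heads in the multi-head attention of large language models. It proposes an analysis framework based on the Game Theoretic Free Energy Principle (GTFEP), which treats attention heads as bounded rational agents and explains their collective behavior through variational free energy minimization. The study finds that third-order interaction information among heads is consistently negative, revealing higher-order redundancy; the resulting pruning method reduces computational cost while keeping performance largely unchanged.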

Comments this manuscript has been submitted to Neural Networks

Abstract

Large language models rely on multihead attention, but interactions among heads remain poorly understood. We apply the Game Theoretic Free Energy Principle (GTFEP): a framework casting multiagent systems as distributed variational inference to analyze attention heads as bounded rational agents. According to GTFEP, each head minimizes its variational free energy, and collective behavior follows a Gibbs distribution over coalition structures whose energy is decomposed into Harsanyi dividends. Using a tractable approximation (uniform prior, deterministic dynamics), coalition free energy reduces to joint Shannon entropy of discretized head outputs (argmax key index). Pairwise dividends become mutual information (nonnegative), while triple dividends correspond to interaction information and can be negative. On BERT, GPT2, and Llama with GSM8K, triple dividends are consistently negative, revealing higher order redundancy. The Nash FEP correspondence guarantees that stationary points of collective free energy are epsilon Nash equilibria; thus, heads with negligible contribution can be pruned with minimal performance loss. Pruning heads with low marginal contribution reduces computational cost with minimal performance loss: for example, pruning 20% of heads in GPT2 reduces FLOPs by 18%, increases throughput by 22%, and raises perplexity only modestly (from 28.4 to 33.4 on GSM8K). Our work shows GTFEP provides a principled foundation for analyzing and optimizing transformer architectures.
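The entropy-based quantities mentioned above can be computed directly from discretized head outputs. The sketch below assumes each head is represented by a list of argmax key indices, one per token, and it adopts the sign convention I(A;B|C) - I(A;B) for interaction information so that negative values indicate redundancy, matching the abstract's reading. The convention and function names are assumptions; the paper's exact dividend definition is not given in the abstract.

```python
import math
from collections import Counter

def joint_entropy(*columns):
    """Shannon entropy in bits of the joint distribution of discrete columns."""
    rows = list(zip(*columns))
    counts = Counter(rows)
    n = len(rows)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def mutual_information(a, b):
    """I(A;B) = H(A) + H(B) - H(A,B); never negative."""
    return joint_entropy(a) + joint_entropy(b) - joint_entropy(a, b)

def interaction_information(a, b, c):
    """I(A;B|C) - I(A;B); negative means redundancy in this convention."""
    co_information = (
        joint_entropy(a) + joint_entropy(b) + joint_entropy(c)
        - joint_entropy(a, b) - joint_entropy(a, c) - joint_entropy(b, c)
        + joint_entropy(a, b, c)
    )
    return -co_information

# Three heads that always attend to the same key are fully redundant.
head = [0, 1, 0, 1]
redundant = interaction_information(head, head, head)

# An XOR relationship is the textbook purely synergistic case.
head_a = [0, 0, 1, 1]
head_b = [0, 1, 0, 1]
head_c = [0, 1, 1, 0]
synergistic = interaction_information(head_a, head_b, head_c)
```

Identical heads give a value of about -1 bit and the XOR triple about +1 bit, so consistently negative triple terms point to heads that repeat one another's information.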

2605.09514 2026-05-12 cs.LG

Doubly Robust Proxy Causal Learning with Neural Mean Embeddings

Bariscan Bozkurt, Alexandre Galashov, Dimitri Meunier, Zikai Shen, Arthur Gretton, Houssam Zenati

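Affiliations University College London

AI summary This paper studies how to identify causal response functions under unobserved confounding using proxy causal learning. It proposes a doubly robust proxy causal learning framework based on neural mean embeddings, which combines neural estimators of the treatment bridge and the outcome bridge and applies the doubly robust correction through a final regression stage. The method handles continuous and structured treatments and can estimate population, heterogeneous, and conditional dose-response functions, and it outperforms existing methods on synthetic and image datasets.

Abstract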

Unobserved confounding prevents standard covariate adjustment from identifying causal response functions in observational studies. Proxy causal learning addresses this problem through bridge equations involving treatment- and outcome-inducing proxies, avoiding direct recovery of the latent confounder. Existing doubly robust proxy estimators combine outcome and treatment bridges, but typically rely on fixed kernels, sieves, or low-dimensional semiparametric models; existing neural proxy methods are more flexible, but are largely single-bridge estimators. We develop a neural doubly robust framework for proxy causal learning with continuous and structured treatments. Our method introduces a neural mean-embedding estimator for the treatment bridge, combines it with a neural outcome bridge, and estimates the doubly robust correction through a final regression stage. The framework covers population, heterogeneous, and conditional dose-response functions, yielding full response-curve estimators rather than binary-treatment effects. The algorithms use two stages for each bridge and history-aware updates of the final linear layers to stabilize stochastic multi-stage training. We prove consistency of the algorithms showing that the doubly robust error is controlled by the final averaging and regression errors together with the smaller of the outcome- and treatment-side weak-norm bridge errors. Across synthetic and image-valued benchmarks, the proposed estimators outperform existing baselines and single-bridge neural estimators, showing the benefit of combining learned outcome and treatment bridges in a doubly robust construction. Our implementation is available at https://github.com/BariscanBozkurt/DRPCL-Neural-Mean-Embedding.

2605.09513 2026-05-12 cs.CV cs.RO

QueST: Persistent Queries as Semantic Monitors for Drift Suppression in Long-Horizon Tracking

Mayank Anand, Mohammad Saqlain, Kyan Mahajan, Priya Shukla, Gora Chand Nandi, Andrew Melnik

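Affiliations Center for Intelligent Robotics; Indian Institute of Information Technology Allahabad; University of Bremen

AI summary This paper proposes QueST, a semantic monitoring framework for long-horizon tracking that targets the semantic drift caused by error accumulation in conventional frame-to-frame matching under complex scenes. QueST treats interaction-relevant entities as persistent semantic queries rather than transient point tracks; each query attends globally over spatio-temporal video features at every time step, providing a stable semantic anchor. With lightweight 3D physical constraints, QueST suppresses drift under occlusion, and experiments show that its tracking accuracy on long-horizon articulated sequences is substantially better than that of existing methods.

Abstract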

Tracking points in videos is typically formulated as frame-to-frame correspondence, where each point is matched locally to the next frame. While this works over short horizons, errors accumulate under articulation, occlusion, and viewpoint change, leading to silent semantic drift that existing trackers cannot detect or correct. In this work, we revisit long-horizon tracking from a monitoring perspective and introduce QueST, a monitoring-by-design framework that treats interaction-relevant entities as persistent semantic queries rather than transient point tracks. Instead of local propagation, each query attends globally over spatio-temporal video features at every time-step, providing a stable semantic anchor across time. We further constrain query trajectories with lightweight 3D physical grounding, using geometric plausibility to suppress unbounded drift under occlusion. We evaluate QueST on long-horizon articulated sequences from PartNet-Mobility in SAPIEN and compare against RAFT-3D, CoTracker, and TAP-Net. QueST substantially reduces terminal drift achieving a 67.7% Absolute Point Error (APE) improvement over TAP-Net while better preserving identity over extended horizons. Our results show that embedding semantic monitoring directly into perception enables more reliable long-horizon tracking under distribution shift.

2605.09511 2026-05-12 cs.AI

WindINR: Latent-State INR for Fast Local Wind Query and Correction in Complex Terrain

Yi Xiao, Qilong Jia, Hang Fan, Pascal Fua, Robert Jenssen, Xiaosong Ma, Wei Xue

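Affiliations Tsinghua University; MBZUAI; Columbia University; EPFL; The Arctic University of Norway

AI summary In complex terrain, many downstream decisions need fast wind estimates at specific locations and heights rather than a conventional dense forecast field on a fixed grid. The study proposes WindINR, a latent-state implicit neural representation framework for fast high-resolution local wind queries and correction from sparse observations. A latent-conditioned decoder maps static terrain descriptors, a low-resolution background field, and continuous query coordinates to a high-resolution wind state, and separating reusable representation learning from sample-specific latent-state correction makes inference-time correction efficient. Experiments show that WindINR remains continuously queryable and is about 2.6 times faster at online correction than full-network fine-tuning, providing a practical interface between background fields, sparse observations, and wind queries in complex terrain.

Abstract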

Many downstream decisions in complex terrain require fast wind estimates at a small number of user-specified locations and heights for a given forecast valid time, rather than another dense forecast field on a fixed grid. We present WindINR, a latent-state implicit neural representation framework for continuous high-resolution local wind query and sparse-observation correction. WindINR maps static terrain descriptors, a low-resolution background field, and continuous query coordinates to a high-resolution wind state through a latent-conditioned decoder. To enable rapid inference-time correction, WindINR separates reusable representation learning from sample-specific latent-state correction. During training, a privileged encoder infers a reference latent state from high-resolution supervision, a deployable latent predictor estimates an initial latent state from inference-time inputs alone, and their discrepancies are summarized into a dataset-adaptive Gaussian prior over latent corrections. At inference time, within the WindINR module, network weights remain fixed and only the latent state is updated by minimizing a regularized correction objective using sparse observations and their uncertainty. In controlled OSSEs over the Senja region, including a UAV-aided approach scenario and random-observation robustness tests, WindINR improves local high-resolution wind estimates by updating only a compact latent state rather than the full network. The corrected representation remains continuously queryable at arbitrary coordinates and, in our CPU benchmark, yields about a $2.6\times$ online-correction speedup over full-network fine-tuning, suggesting a practical interface between kilometer-scale background products, sparse local observations, and wind queries in complex terrain.
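The inference-time step, in which weights stay fixed and only the latent state moves, can be illustrated with a toy problem. The sketch assumes a linear decoder (one row per observation) in place of the neural decoder, an isotropic prior centered on the predicted latent state, and plain gradient descent. All of these are simplifying assumptions for illustration, not the paper's objective or optimizer.

```python
def correct_latent(z0, decoder_rows, observations, obs_sigma,
                   prior_weight, steps=500, lr=0.05):
    """Minimize sum_i ((d_i . z - y_i) / sigma_i)^2 + prior_weight * |z - z0|^2.

    Only the latent state z is updated; decoder_rows stay fixed throughout.
    """
    z = list(z0)
    for _ in range(steps):
        # Gradient of the prior term pulls z back toward the initial guess.
        grad = [2.0 * prior_weight * (zi - z0i) for zi, z0i in zip(z, z0)]
        # Gradient of the data term pulls the decoded value toward each observation.
        for row, y, sigma in zip(decoder_rows, observations, obs_sigma):
            residual = sum(r * zi for r, zi in zip(row, z)) - y
            for j, r in enumerate(row):
                grad[j] += 2.0 * residual * r / sigma ** 2
        z = [zi - lr * g for zi, g in zip(z, grad)]
    return z

# One observation that only the first latent dimension can explain.
z_corrected = correct_latent(
    z0=[0.0, 0.0],
    decoder_rows=[[1.0, 0.0]],
    observations=[2.0],
    obs_sigma=[1.0],
    prior_weight=1.0,
)
```

With equal trust in the prior and the observation, the corrected latent settles halfway between them, and the dimension that no observation touches is left where the predictor put it.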

2605.09507 2026-05-12 cs.CV

Uncertainty-Aware and Decoder-Aligned Learning for Video Summarization

Omer Tariq, Syed Muhammad Raza, Jeongbae Son

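Affiliations Perception AI Neubility Inc.

AI summary This paper proposes VASTSum, an uncertainty-aware and decoder-aligned learning framework for video summarization that addresses the challenges posed by subjective annotations and discrete decoding procedures. The method predicts probabilistic frame-level importance scores with a variational formulation, explicitly models uncertainty under multi-annotator supervision, and adds a decoder-aligned regularization to stabilize summary selection. Experiments on SumMe and TVSum show consistent, competitive rank correlations and improved robustness with efficient single-pass inference, offering an alternative to deterministic and diffusion-based methods.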

Comments Accepted for presentation at the 2026 International Joint Conference on Neural Networks (IJCNN 2026)

Abstract

Video summarization aims to produce a compact representation of a long video by selecting a subset of temporally important segments that best reflect human preferences. This task is inherently difficult due to strong annotation subjectivity and the reliance on discrete decoding procedures, such as temporal segmentation and knapsack-based selection, during evaluation. Most existing approaches either learn deterministic importance scores that overlook these characteristics or adopt complex generative models that increase training and inference cost. In this paper, we propose VASTSum, an uncertainty-aware and decoder-aligned learning framework for video summarization that addresses both challenges within a single-pass model. The proposed method predicts probabilistic frame-level importance scores using a variational formulation, enabling explicit modeling of uncertainty arising from multi-annotator supervision. To account for subjectivity, particularly under binary annotations, we employ a supervision strategy that encourages alignment with plausible human annotation modes rather than enforcing a single consensus target. Furthermore, we introduce a decoder-aligned regularization that promotes stability of knapsack-based summary selection, reducing sensitivity to small perturbations in predicted scores. We evaluate the proposed framework on the SumMe and TVSum benchmarks using standard rank-based metrics. Experimental results show consistent and competitive Kendall and Spearman correlations across multiple data splits, demonstrating improved robustness under annotation disagreement while maintaining efficient single-forward inference. These results indicate that explicitly modeling uncertainty and aligning learning objectives with the decoding stage provide a principled alternative to both deterministic and diffusion-based video summarization methods.
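To make the decoding stage concrete, here is a minimal sketch of knapsack-based segment selection using a standard 0/1 knapsack dynamic program. It assumes integer segment lengths and a fixed length budget; the segment scores, lengths, and budget below are made-up values, and this is not the paper's regularizer, only the discrete selection step it is meant to stabilize.

```python
def knapsack_select(scores, lengths, budget):
    """Pick segments maximizing total score with total length <= budget."""
    n = len(scores)
    best = [[0.0] * (budget + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for cap in range(budget + 1):
            best[i][cap] = best[i - 1][cap]  # option: skip segment i-1
            if lengths[i - 1] <= cap:
                candidate = best[i - 1][cap - lengths[i - 1]] + scores[i - 1]
                if candidate > best[i][cap]:
                    best[i][cap] = candidate  # option: take segment i-1
    # Walk back through the table to recover the chosen segments.
    selected, cap = [], budget
    for i in range(n, 0, -1):
        if best[i][cap] != best[i - 1][cap]:
            selected.append(i - 1)
            cap -= lengths[i - 1]
    return sorted(selected)

lengths = [5, 3, 3]
summary_a = knapsack_select([0.60, 0.50, 0.45], lengths, budget=6)
# Lowering one score is enough to flip the whole selection.
summary_b = knapsack_select([0.60, 0.50, 0.05], lengths, budget=6)
```

The two calls return different segment sets even though only one score changed, which shows the sensitivity of discrete selection to score perturbations that a decoder-aligned objective would aim to reduce.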

2605.09502 2026-05-12 cs.CL cs.AI cs.LG

Hidden Error Awareness in Chain-of-Thought Reasoning: The Signal Is Diagnostic, Not Causal

Aojie Yuan, Zhiyuan Julian Su, Haiyue Zhang, Yi Nian, Yue Zhao

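Affiliations University of Southern California, Los Angeles, CA, USA

AI summary This study reveals a mismatch between internal state and outward behavior in chain-of-thought (CoT) reasoning: models express high confidence in their generated reasoning even though their hidden states accurately detect reasoning errors. A linear probe on hidden states predicts trace correctness from the very first reasoning step, whereas a classifier on the surface text of the generation cannot match it. The study further shows that this error signal is diagnostic of reasoning quality and does not serve as a lever for correction: several intervention methods all failed to use it to improve reasoning outcomes. The finding marks a boundary for mechanistic interpretability, indicating that representations of reasoning errors differ fundamentally from representations of factual knowledge.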

Comments 10 pages, 5 figures, 10 tables. Mechanistic Interpretability @ ICML 2026

Abstract

Chain-of-thought (CoT) prompting assumes that generated reasoning reflects a model's internal computation. We show this assumption is wrong in a specific, measurable way: models internally detect their own reasoning errors but outwardly express confidence in them. A linear probe on hidden states predicts trace correctness with 0.95 AUROC -- from the very first reasoning step (0.79) -- while verbalized confidence for wrong traces is 4.55/5, nearly identical to correct ones (4.87/5). A text-surface classifier achieves only 0.59 on the same data, confirming a 0.20-point gap invisible in the generated text. This hidden error awareness holds across three model families (Qwen, Llama, Phi), 1.5B-72B parameters, and RL-trained reasoning models (DeepSeek-R1, 0.852 AUROC). The natural question is whether this signal can fix the errors it detects. It cannot. Four interventions -- activation steering, probe-guided best-of-N, self-correction, and activation patching -- all fail; patching destroys output coherence entirely. The signal is diagnostic, not causal: a readout of computation quality, not a lever to redirect it. This delineates a boundary for mechanistic interpretability: error representations during reasoning are fundamentally different from the factual knowledge representations that prior work has successfully edited.

2605.09498 2026-05-12 cs.LG cs.AI

Spectral Transformer Neural Processes

Xianhe Chen, Hao Chen, Yingzhen Li

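Affiliations University of Cambridge; Tencent; Imperial College London

AI summary This paper proposes Spectral Transformer Neural Processes (STNPs) for time series, spatial data, and images with strong periodicity and quasi-periodicity. Building on Transformer Neural Processes (TNPs), the method adds frequency awareness: a Spectral Aggregator estimates the context spectrum and produces task-adaptive spectral features, strengthening the model's ability to capture periodic structure. Experiments show that STNPs outperform existing methods on several synthetic and real-world datasets, improving predictive performance and extending Neural Processes to periodic modelling.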

Comments 37 pages, 10 figures, 18 tables

Abstract

Time series, spatial data, and images are natural applications of Neural Processes. However, when such data exhibit strong periodicity and quasi-periodicity, existing methods often suffer from underfitting and generalise poorly beyond the training distribution. In this work, we propose Spectral Transformer Neural Processes (STNPs), a frequency-aware extension of Transformer Neural Processes (TNPs). STNPs introduce a Spectral Aggregator that estimates an empirical context spectrum, compresses it into a spectral mixture, samples task-adaptive spectral features, and concatenates them with time-domain embeddings, thereby injecting a spectral-mixture-kernel bias into TNPs. This design reshapes the similarity geometry, allowing inputs that are distant in Euclidean space to remain close in an induced periodic manifold while enhancing time-frequency interactions. Extensive experiments on synthetic regression tasks, real-world time-series datasets, and an image dataset demonstrate that STNPs consistently improve predictive performance over existing baselines, extending Neural Processes beyond translation equivariance towards effective modelling of periodicity and quasi-periodicity.

2605.09497 2026-05-12 cs.AI cs.CR

Don't Click That: Teaching Web Agents to Resist Deceptive Interfaces

Yilin Zhang, Yingkai Hua, Chunyu Wei, Xin Wang, Yueguo Chen

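Affiliations Renmin University of China; Ant Digital Technologies, Ant Group

AI summary This paper studies the vulnerability of vision-language-model-based web agents to deceptive interfaces and proposes DUDE, a two-stage defense framework that combines hybrid-reward learning with asymmetric penalties to improve an agent's ability to recognize and resist deceptive interfaces. The study also builds RUC, a benchmark for evaluating and advancing work in this area. Experiments show that DUDE reduces susceptibility to deceptive interfaces while maintaining task performance, providing an effective foundation for safer web agents.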

Comments Accepted to ACL 2026 Main Conference. 23 pages, 8 figures, 19 tables

Abstract

Vision-language model (VLM) based web agents demonstrate impressive autonomous GUI interaction but remain vulnerable to deceptive interface elements. Existing approaches either detect deception without task integration or document attacks without proposing defenses. We formalize deception-aware web agent defense and propose DUDE (Deceptive UI Detector & Evaluator), a two-stage framework combining hybrid-reward learning with asymmetric penalties and experience summarization to distill failure patterns into transferable guidance. We introduce RUC (Real UI Clickboxes), a benchmark of 1,407 scenarios spanning four domains and deception categories. Experiments show DUDE reduces deception susceptibility by 53.8% while maintaining task performance, establishing an effective foundation for robust web agent deployment.

2605.09496 2026-05-12 cs.CL cs.LG

Beyond Language: Format-Agnostic Reasoning Subspaces in Large Language Models

Aojie Yuan, Zhiyuan Su

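Affiliations University of Southern California; Duke University

AI summary This study asks whether large language models share a common reasoning representation across symbolic systems such as English prose, code, and mathematical notation. Using the newly introduced TriForm Benchmark, it finds a Format-Agnostic Reasoning Subspace (FARS) in the middle layers that extracts concept structure while suppressing form information. Experiments show that replacing only the 10 dimensions of this subspace preserves 90-96% of model output, confirming its key role in cross-form reasoning and supporting the Platonic Representation Hypothesis. The study also reveals an asymmetry between declarative and procedural representations, indicating that the critical axis of divergence is declarative versus procedural, not linguistic versus formal.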

Comments Preprint. 13 pages, 13 figures, 12 tables

Abstract

Large language models represent the same reasoning in vastly different surface forms -- English prose, Python code, mathematical notation -- yet whether they share a common internal substrate across these symbolic systems remains unknown. We introduce the TriForm Benchmark (18 concepts x 6 forms x 3 instances = 324 stimuli) and study five LLMs (1.6B-8B) across three architecture families. Using permutation-corrected RSA, cross-form probing, and activation patching, we find converging evidence for a Format-Agnostic Reasoning Subspace (FARS) in middle layers. We make FARS concrete: concept-centroid PCA extracts a 10-dimensional subspace that amplifies concept structure 3x while suppressing form information to near zero. Replacing only these 10 dimensions during cross-form patching preserves 90-96% of model output -- far exceeding both full activation replacement (44-56%) and variance-maximizing PCA (60-74%) -- while ablating them causes targeted disruption. FARS generalizes to held-out concepts and converges across architectures (CCA > 0.79 for all model pairs), providing within-modality evidence for the Platonic Representation Hypothesis. We further discover a declarative-procedural asymmetry: representations are far more compatible between prose and mathematics than between either and code, suggesting that the critical axis of divergence is not linguistic vs. formal but declarative vs. procedural.

2605.09494 2026-05-12 cs.RO cs.AI

LASSA Architecture-Based Autonomous Fault-Tolerant Control of Unmanned Underwater Vehicles

Hong Chen, Zixiang Tang, Yuanbao Chen, Yu Liu

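Affiliations Wuhan Second Ship Design and Research Institute; School of Aeronautic Science and Engineering, Beihang University

AI summary This paper proposes an autonomous fault-tolerant control method based on the LASSA architecture for highly reliable operation of unmanned underwater vehicles (UUVs) in communication-constrained environments. The method combines a large language model (LLM) with an intelligent agent to identify unknown faults and replan tasks autonomously, while a solver verifies physical constraints to suppress model hallucinations and keep decisions interpretable. Experiments show that under anomalies such as a rudder fault the framework adjusts trajectory parameters, satisfies the constraints, and completes the mission, balancing decision intelligence with real-time control.

Abstract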

Unmanned underwater vehicles (UUVs) operate persistently in communication-constrained environments, thus requiring high-level autonomous fault-tolerant control under faulty operating conditions. Existing approaches rely heavily on predefined hard-coded rules and struggle to achieve effective fault-tolerant control against unforeseen faults. Although large language models (LLMs) possess powerful cognitive and reasoning capabilities, their inherent hallucinations remain a major obstacle to their application in UUV control systems. This paper proposes an intelligent control method based on the LASSA (LLM-based Agent with Solver, Sensor and Actuator) architecture. Within this architecture, an LLM identifies unknown faults and accomplishes task replanning via autonomous reasoning without hard-coded rules; the intelligent agent undertakes perception, scheduling and decision evaluation; the solver verifies physical boundary feasibility constraints prior to command transmission to the actuators. This architecture suppresses physically infeasible LLM hallucinations and ensures interpretable, verifiable decision-making. Moreover, it enables fast-slow dual closed-loop collaborative control, where the slow loop undertakes high-level dynamic decision-making and the fast loop guarantees high-frequency real-time control, simultaneously balancing decision intelligence and control timeliness. Lake experiments under normal and lower-rudder-fault conditions show that the framework detects trajectory tracking abnormalities, replans the route by adjusting the turning radius from 4m to 12m and reducing speed from 2kn to 1kn, passes all three solver constraints on the first invocation, and guides the UUV to complete the full mission; under normal conditions no false fault alarms are raised throughout the run.

2605.09490 2026-05-12 cs.CL cs.AR cs.LG

Not All Thoughts Need HBM: Semantics-Aware Memory Hierarchy for LLM Reasoning

Aojie Yuan, Tianqi Shen, Dajun Zhang

Affiliations: University of Southern California; University of Wisconsin–Madison

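AI summary: Large language models generate many intermediate reasoning steps during inference, and these must occupy scarce GPU high-bandwidth memory (HBM), creating a performance bottleneck. This paper proposes a semantics-aware memory hierarchy that assigns reasoning tokens of differing importance to different tiers (HBM, DDR memory, compressed storage, or eviction), reducing reliance on HBM. The method uses cumulative attention scoring to achieve offloading with zero approximation error, and experiments show that it can substantially reduce HBM occupancy while maintaining high reasoning accuracy.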

Comments Preprint. 14 pages + appendix. Under review at AdaptFM Workshop @ ICML 2026

English abstract

Reasoning LLMs produce thousands of chain-of-thought tokens whose KV cache must reside in scarce GPU HBM. The dominant response -- permanently evicting low-importance tokens -- is catastrophic for reasoning: accuracy collapses to 0-2.5% when half the cache is removed. We ask a different question: must every token live in HBM, or can some live elsewhere? We introduce a semantics-aware memory hierarchy that sorts tokens into four tiers -- HBM, DDR, compressed, and evicted -- using cumulative attention scoring. Low-importance tokens are moved to CPU memory rather than destroyed; before each attention step they are prefetched back at full precision, contributing exactly the same terms as if they had never left the GPU. We formalize this as zero-approximation-error offloading and derive our central finding: accuracy depends solely on how many tokens are permanently discarded (the eviction ratio), not on how many remain in HBM. A controlled 3x3 grid over HBM and eviction ratios confirms this across three model scales (7B-32B) and four benchmarks. With only 3% eviction, the hierarchy retains 91% of full-cache accuracy on GSM8K and 71% on MATH-500 (n=200); at 14B scale it matches the uncompressed baseline (90% vs. 86%) while halving HBM occupancy. A head-to-head reproduction of R-KV -- the current SOTA eviction method -- on our setup achieves only 0-32% at comparable budgets. A system prototype with real GPU-CPU data movement shows that the price of this preservation is modest -- 5-7% transfer overhead -- and scaling analysis projects 2-48 GB HBM savings at production batch sizes.
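
The four-tier split described in this abstract can be illustrated with a minimal sketch. It assumes tokens are simply ranked by their cumulative attention score and cut at fixed fractions; the function name `assign_tiers`, the ratio parameters, and the example scores are hypothetical and are not taken from the paper, whose actual scoring and placement policy the abstract does not specify.

```python
from typing import Dict, List


def assign_tiers(cum_attention: List[float],
                 hbm_ratio: float,
                 ddr_ratio: float,
                 compressed_ratio: float) -> Dict[str, List[int]]:
    """Rank tokens by cumulative attention and split them into four tiers.

    Hypothetical policy: the highest-scoring tokens stay in HBM, the next
    group is offloaded to DDR (kept at full precision), the next group is
    compressed, and whatever remains is evicted.
    """
    n = len(cum_attention)
    # Highest cumulative attention first; ties broken by token index.
    order = sorted(range(n), key=lambda i: (-cum_attention[i], i))
    n_hbm = round(n * hbm_ratio)
    n_ddr = round(n * ddr_ratio)
    n_cmp = round(n * compressed_ratio)
    return {
        "hbm": order[:n_hbm],
        "ddr": order[n_hbm:n_hbm + n_ddr],
        "compressed": order[n_hbm + n_ddr:n_hbm + n_ddr + n_cmp],
        "evicted": order[n_hbm + n_ddr + n_cmp:],
    }


# Made-up cumulative attention scores for ten tokens.
scores = [0.9, 0.1, 0.5, 0.05, 0.7, 0.3, 0.02, 0.6, 0.2, 0.4]
tiers = assign_tiers(scores, hbm_ratio=0.5, ddr_ratio=0.3, compressed_ratio=0.1)
eviction_ratio = len(tiers["evicted"]) / len(scores)
print(tiers, eviction_ratio)
```

In this toy split only the `evicted` tier loses information outright, which mirrors the abstract's claim that accuracy depends on the eviction ratio rather than on how many tokens remain in HBM.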

2605.09487 2026-05-12 cs.LG

Kintsugi: Learning Policies by Repairing Executable Knowledge Bases

Teng Cao, Yu Deng, Hikaru Shindo, Quentin Delfosse, Lanxi Wen, Suli Wang, Jannis Blüml, Christopher Tauchmann, Kristian Kersting

Affiliations: Artificial Intelligence and Machine Learning Lab, Technical University of Darmstadt, Germany; Hessian Center for Artificial Intelligence (hessian.AI), Germany; Department of Computer Science, Technical University of Darmstadt, Germany; Department of Computer Science, Technical University of Munich (TUM), Germany; German Research Center for Artificial Intelligence (DFKI), Germany; Centre for Cognitive Science, Technical University of Darmstadt, Germany

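AI summary: This paper proposes Kintsugi, a white-box policy-learning framework that addresses the difficulty of inspecting, recombining, and reusing the task knowledge of modern embodied agents. The method treats policy improvement as verifier-gated construction of an executable knowledge base, improving policy knowledge through localized typed edits rather than language-model reasoning. At inference Kintsugi makes no LLM calls: a deterministic symbolic executor runs the knowledge base directly. It achieves strong performance on long-horizon text-agent and object-centric manipulation tasks while keeping the knowledge inspectable and editable.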

English abstract

Modern embodied agents achieve impressive performance, but their task knowledge is often stored in neural weights, latent state, or prompt-bound memory, making individual policy knowledge difficult to inspect, validate, recombine, and reuse. We introduce Kintsugi, a white-box policy-learning framework that treats embodied policy improvement as verifier-gated construction of a typed executable Knowledge Base (KB). Kintsugi represents task-level policy knowledge as composable typed entries -- predicates, operators, policy schemas, monitors, recovery rules, experience records, and goals -- and improves this artifact through localized typed edits induced from rollout evidence, rather than relying on test-time language-model reasoning. Between rollouts, a tool-constrained agentic editing loop diagnoses trajectory failures, localizes them to editable KB layers, and proposes candidate edits. A deterministic verification gate admits an edit only when the candidate type-checks, the resulting KB executes, and focused validation success or trajectory-health metrics improve without violating protected-regression checks. At inference, the accepted KB is executed by a deterministic symbolic executor with zero LLM calls. Across long-horizon text-agent benchmarks and representative object-centric manipulation settings, Kintsugi achieves strong endpoint performance while preserving inspectability, local editability, and verifier-gated deployment. These results suggest that embodied policy improvement can be organized around executable task knowledge.
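
The admission rule of the verification gate, as worded in this abstract, can be written down as a small predicate. This is only a sketch under stated assumptions: the `CandidateReport` record, its field names, and the use of signed deltas to mean "improves" are hypothetical, and the abstract does not say how Kintsugi actually measures these quantities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateReport:
    """Hypothetical summary of one candidate KB edit after re-validation."""
    type_checks: bool                 # candidate edit passes type checking
    kb_executes: bool                 # the edited KB runs end to end
    validation_success_delta: float   # change in focused validation success
    trajectory_health_delta: float    # change in trajectory-health metrics
    protected_regressions: int        # number of protected checks that now fail


def admit_edit(report: CandidateReport) -> bool:
    """Admit an edit only if every hard condition holds and a metric improves."""
    if not (report.type_checks and report.kb_executes):
        return False
    if report.protected_regressions > 0:
        return False
    # "validation success OR trajectory-health metrics improve"
    return (report.validation_success_delta > 0
            or report.trajectory_health_delta > 0)


good = CandidateReport(True, True, 0.05, 0.0, 0)
regressing = CandidateReport(True, True, 0.20, 0.1, 1)
print(admit_edit(good), admit_edit(regressing))
```

Because the predicate is a pure function of the report, the same evidence always yields the same decision, which is the sense in which such a gate is deterministic.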

2605.09486 2026-05-12 cs.LG cs.AI quant-ph

CTQWformer: A CTQW-based Transformer for Graph Classification

Zhan Li, Wuqing Yu, Yusen Wu, Chuan Wang

Affiliations: School of Artificial Intelligence, Beijing Normal University

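AI summary: This paper proposes CTQWformer, a graph classification model based on continuous-time quantum walks (CTQW), to address the difficulty graph neural networks and Transformer architectures have in capturing global structural dependencies and dynamic information propagation. The model uses a trainable Hamiltonian that fuses graph structure and node features to model quantum walk dynamics in a physically grounded way and extract rich structural information. This information is fed into a graph Transformer module, where it acts as a structural bias in self-attention, and a graph recurrent module, which models temporal evolution patterns. Experiments show that CTQWformer outperforms traditional graph kernel and graph neural network methods on several benchmark graph classification datasets; the authors describe it as the first hybrid graph Transformer to combine quantum dynamics with a trainable deep learning framework.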

English abstract

Graph Neural Networks (GNNs) and Transformer-based architectures have achieved remarkable progress in graph learning, yet they still struggle both to capture global structural dependencies and to model dynamic information propagation. In this paper, we propose CTQWformer, a hybrid graph learning framework that integrates continuous-time quantum walks (CTQW) with GNNs. CTQWformer employs a trainable Hamiltonian that fuses graph topology and node features, enabling physically grounded modeling of quantum walk dynamics that captures rich and intricate graph structure information. The extracted CTQW-based representations are incorporated into two complementary modules: (i) a Graph Transformer module that embeds final-time propagation probabilities as structural biases in the self-attention mechanism, and (ii) a Graph Recurrent Module that captures temporal evolution patterns with bidirectional recurrent networks. Extensive experiments on benchmark graph classification datasets show that CTQWformer outperforms graph kernel and GNN-based methods, demonstrating the potential of integrating quantum dynamics into trainable deep learning frameworks for graph representation learning. To the best of our knowledge, CTQWformer is the first hybrid CTQW-based Transformer, integrating CTQW-derived structural bias with temporal evolution modeling to advance graph learning.

2605.09485 2026-05-12 cs.LG stat.ML

SEMASIA: A Large-Scale Dataset of Semantically Structured Latent Representations

Mario Edoardo Pandolfo, Enrico Grimaldi, Lorenzo Marinucci, Leonardo Di Nino, Simone Fiorellino, Sergio Barbarossa, Paolo Di Lorenzo

Affiliations: Dept. of Computer, Control and Management Engineering; Sapienza University of Rome; National Inter-University Consortium for Telecommunications (CNIT); Dept. of Statistical Sciences; Dept. of Information Engineering, Electronics, and Telecommunications

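AI summary: This paper introduces SEMASIA, a large-scale dataset of semantically structured latent representations extracted from approximately 1,700 pretrained vision models across eight standard image-classification benchmarks. The dataset comes with structured metadata describing model architecture, training regime, and pretraining source, and targets the problem that the latent-space geometries of different models are incompatible. By analyzing the conceptual organization of latent spaces, the performance of alignment mappings, and the effect of pretraining data and model properties on representations, the study illustrates SEMASIA's value for tasks such as interpretability and transfer learning.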

English abstract

Latent representations learned by neural networks often exhibit semantic structure, where concept similarity is reflected by geometric proximity in embedding space. However, comparing such spaces across models remains difficult: changes in architecture, pretraining data, objective, or random seed can yield embeddings with similar content but incompatible geometry. This latent space alignment problem is central to interpretability, transfer and multimodal learning, federated systems, and semantic communication; however, progress remains limited by the lack of large-scale, model-diverse, and metadata-rich benchmarks. To address this gap, we introduce SEMASIA, a large-scale collection of latent representations extracted from approximately 1,700 pretrained vision models across eight standard image-classification benchmarks. SEMASIA pairs embeddings with structured metadata describing architectures, training regimes, pretraining sources, and model scale. We demonstrate three applications of the resource. First, we analyze the conceptual organization of individual latent spaces, showing consistent prototype-like clustering and hierarchical semantic neighborhoods across models and datasets. Second, we benchmark supervised alignment mappings between latent spaces using reconstruction error and downstream task performance. Third, we perform a large-scale regression analysis of how pretraining-data complexity, specialization, transfer learning, augmentation, and model scale relate to geometric and probing properties of embeddings. By coupling representational scale with standardized metadata, SEMASIA provides a reproducible foundation for studying latent geometry, evaluating alignment methods, and developing next-generation heterogeneous and interoperable AI systems.

2605.09483 2026-05-12 cs.CL cs.AI cs.LG

A Cognitively Grounded Bayesian Framework for Misinformation Susceptibility

Pranava Madhyastha

Affiliations: Dept. of Computer Science, City, University of London; The Alan Turing Institute

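AI summary: This paper proposes the Bounded Pragmatic Listener (BPL), a cognitively grounded Bayesian framework for modelling susceptibility to misinformation. Drawing on bounded rationality, the framework introduces three cognitive constraints (working memory limits, an information bottleneck, and importance sampling) to simulate human decision-making during information processing more realistically. Experiments on the LIAR and MultiFC datasets validate BPL on veracity classification and support theoretical predictions such as the depth-mismatch paradox.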

Comments work in progress

English abstract

In this (work in progress) paper, we present the Bounded Pragmatic Listener (BPL), a cognitively grounded Bayesian framework for modelling susceptibility to information disorder. BPL extends Rational Speech Act theory with three cognitively motivated bounds derived from the bounded rationality literature: a) a recursion depth bound (which reflects working memory limits); b) a prior compression parameter (which aims to capture an information bottleneck); and c) an availability sample size (which operationalises importance sampling with saliency-weighted proposals). This allows us to test predictions about misinformation susceptibility, annotator disagreement, and the differential vulnerability to mis-, dis-, and mal-information as defined in the Information Disorder framework. We validate BPL on the LIAR and MultiFC benchmarks, showing competitive veracity classification and experimental support for the depth-mismatch paradox.

2605.09477 2026-05-12 cs.CV cs.AI

Outlier-Robust Diffusion Solvers for Inverse Problems

Yang Zheng, Jiahua Liu, Tongyao Pang, Wen Li, Zhaoqiang Liu

Affiliations: School of Computer Science and Engineering, University of Electronic Science and Technology of China; Yau Mathematical Sciences Center, Tsinghua University

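AI summary: This paper studies how to solve inverse problems with diffusion models when outliers are present. To improve robustness, the authors first refine the measurements through explicit noise estimation, then build an iteratively reweighted least squares objective based on the Huber loss. They propose a gradient-descent method for this objective and additionally use the conjugate gradient method to avoid learning-rate tuning. Experiments on several image datasets show strong robustness to outliers and, in most cases, better results than existing diffusion-model methods.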

Comments Accepted by CVPR 2026

English abstract

Methods based on diffusion models (DMs) for solving inverse problems (IPs) have recently achieved remarkable performance. However, DM-based methods typically struggle against outliers, which are common in real-world measurements. In this work, to tackle IPs with outliers, we first refine the measurement via explicit noise estimation to mitigate the effect of noise. Subsequently, we formulate an iteratively reweighted least squares objective based on the Huber loss to address the outliers. We propose a method utilizing gradient descent to approximately solve the corresponding optimization problem for the robust objective. To avoid delicate tuning of the learning rate required by the gradient descent method, we further employ the conjugate gradient method with an efficient strategy for updating. Extensive experiments on multiple image datasets for linear and nonlinear tasks under various conditions demonstrate that our proposed methods exhibit robustness to outliers and outperform recent DM-based methods in most cases.
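
To make the phrase "iteratively reweighted least squares objective based on the Huber loss" concrete, here is a minimal sketch on a toy problem: estimating a single location parameter from scalar measurements that include one outlier. It is a textbook IRLS loop, not the paper's diffusion-based solver; the function names, the threshold `delta`, and the data are assumptions made for illustration.

```python
from typing import List


def huber_weight(residual: float, delta: float) -> float:
    """IRLS weight implied by the Huber loss: 1 inside the threshold,
    delta / |r| outside it, so large residuals are down-weighted."""
    a = abs(residual)
    return 1.0 if a <= delta else delta / a


def irls_location(y: List[float], delta: float = 1.0, iters: int = 50) -> float:
    """Robust location estimate via iteratively reweighted least squares."""
    x = sum(y) / len(y)  # start from the ordinary least-squares solution
    for _ in range(iters):
        w = [huber_weight(v - x, delta) for v in y]
        # Each step solves a weighted least-squares problem in closed form.
        x = sum(wi * vi for wi, vi in zip(w, y)) / sum(w)
    return x


data = [1.0, 1.2, 0.8, 1.1, 0.9, 50.0]   # last value is an outlier
plain_mean = sum(data) / len(data)        # pulled toward the outlier (about 9.17)
robust = irls_location(data, delta=1.0)   # converges to about 1.2
print(plain_mean, robust)
```

The outlier still contributes, but only with a bounded pull of size `delta`, which is why the robust estimate stays close to the inliers while the plain mean does not.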

2605.09476 2026-05-12 cs.CL cs.AI

Align and Shine: Building High-Quality Sentence-Aligned Corpora for Multilingual Text Simplification

Kenji Hilasaca, Nouran Khallaf, Serge Sharoff

Affiliations: Centre for Translation, Localisation and Interpreting Studies; School of Languages, Cultures and Societies; University of Leeds, UK

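AI summary: This paper studies the construction of high-quality sentence-aligned corpora for multilingual text simplification. Because large-scale, high-quality datasets are scarce for languages other than English, it proposes a method for collecting and processing crowd-sourced simplification data from comparable corpora. Sentence-level alignment is derived from document-level data, yielding a public dataset for training and testing text simplification systems in Catalan, English, French, Italian, and Spanish.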

Comments Accepted at BUCC 2026 workshop at LREC 2026

English abstract

Text simplification plays a crucial role in improving the accessibility and comprehensibility of written information for diverse audiences, including language learners and readers with limited literacy. Despite its importance, large-scale, high-quality datasets for training and evaluating text simplification models remain scarce for languages other than English. This paper reports an experimental study on the collection and processing of crowd-sourced simplification data from comparable corpora to construct a corpus suitable for both training and testing text simplification systems across multiple languages (Catalan, English, French, Italian and Spanish). We report mechanisms for sentence-level alignment from document-level data. The resulting dataset of the aligned sentence pairs is publicly available.

2605.09472 2026-05-12 cs.LG cs.DS

Positional LSH: Binary Block Matrix Approximation for Attention with Linear Biases

Daniel Wolfson, Tal Wagner

Affiliations: Blavatnik School of Computer Science and AI

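AI summary: This paper studies attention with positional bias in Transformer models and, viewing it through the lens of locality-sensitive hashing (LSH), proposes a positional LSH scheme. The core idea is to treat the ALiBi positional bias matrix as the expectation of block-diagonal binary masks generated by positional LSH, and to prove that the empirical mean of sampled masks gives spectral-norm and max-norm approximation guarantees with high probability. This reduces long-context ALiBi attention to a collection of randomized short-context, positionally unbiased attention operations, which substantially improves computational efficiency; experiments validate the theoretical analysis.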

English abstract

Positional encoding in transformers is commonly implemented through positional embeddings, attention masks, or bias terms, but formal connections between these mechanisms remain limited. We study attention with positional bias through the lens of locality-sensitive hashing (LSH), focusing on Attention with Linear Biases (ALiBi). We show that the ALiBi bias matrix is the expectation of contiguous block-diagonal binary masks induced by a "positional LSH" scheme. The empirical mean of masks sampled from this scheme yields spectral norm and max-norm approximation guarantees with bounded block sizes with high probability. This structural theorem implies a uniform approximation theorem for ALiBi-biased attention: with high probability over the sampled masks, the approximate attention output is accurate simultaneously for all query-key-value inputs and can be computed in near-linear time in the context length, reducing long-context ALiBi to a collection of randomized short-context regular (positionally unbiased) attention operations. Conceptually, this connects positional bias, masks, and positional embeddings in a single formal framework and suggests an approach to efficient ALiBi-biased attention. Experiments on large language models validate our theoretical findings.
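
The idea that a linear positional bias can arise as an expectation of random block-diagonal masks can be checked numerically with a small sketch. The construction below is an assumption chosen for illustration and is not claimed to be the paper's positional LSH scheme: each gap between adjacent positions is cut independently with probability 1 - exp(-slope), so two positions share a block with probability exp(-slope * |i - j|), which equals the exponential of a symmetric ALiBi-style bias (standard ALiBi is applied causally). All function names are hypothetical.

```python
import math
import random
from typing import List


def alibi_bias(n: int, slope: float) -> List[List[float]]:
    """Symmetric ALiBi-style bias: -slope * |i - j| added to attention logits."""
    return [[-slope * abs(i - j) for j in range(n)] for i in range(n)]


def sample_block_mask(n: int, slope: float, rng: random.Random) -> List[List[int]]:
    """Cut each gap between adjacent positions with prob 1 - exp(-slope);
    mask[i][j] is 1 iff positions i and j land in the same contiguous block."""
    p_cut = 1.0 - math.exp(-slope)
    block = [0] * n
    for i in range(1, n):
        block[i] = block[i - 1] + (1 if rng.random() < p_cut else 0)
    return [[1 if block[i] == block[j] else 0 for j in range(n)] for i in range(n)]


def mean_mask(n: int, slope: float, samples: int, seed: int = 0) -> List[List[float]]:
    """Empirical mean of sampled block-diagonal binary masks."""
    rng = random.Random(seed)
    counts = [[0] * n for _ in range(n)]
    for _ in range(samples):
        mask = sample_block_mask(n, slope, rng)
        for i in range(n):
            for j in range(n):
                counts[i][j] += mask[i][j]
    return [[c / samples for c in row] for row in counts]


n, slope = 6, 0.25
bias = alibi_bias(n, slope)
estimate = mean_mask(n, slope, samples=5000, seed=0)
max_gap = max(abs(estimate[i][j] - math.exp(bias[i][j]))
              for i in range(n) for j in range(n))
print(max_gap)
```

Under this toy construction the identity holds in the multiplicative sense, that is, for exp(bias) rather than for the additive bias itself; the abstract's spectral-norm and max-norm guarantees are not reproduced here.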

2605.09469 2026-05-12 cs.CL

FinMoji: A Framework for Emoji-driven Sentiment Analysis in Financial Social Media

Ahmed Mahrous, Roberto Di Pietro

Affiliations: King Abdullah University of Science and Technology (KAUST); Hamad Bin Khalifa University

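AI summary: This paper studies emoji-based sentiment analysis on the financial social platform StockTwits, examining how reliable emojis are as indicators of investor sentiment and how they compare with traditional text analysis. In experiments with logistic regression and Transformer models, emoji-only models reach an F1 score of about 0.75, below the roughly 0.88 of combined text-emoji models, but at far lower computational cost, which suits time-sensitive settings such as high-frequency trading. In addition, certain emojis and emoji combinations predict bullish or bearish trends with over 90% accuracy, highlighting the distinctive value of emojis for financial sentiment analysis.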

English abstract

This paper explores the use of emojis in financial sentiment analysis, focusing on the social media platform StockTwits. Emojis, increasingly prevalent in digital communication, have potential as compact indicators of investor sentiment, which can be critical for predicting market trends. Our study examines whether emojis alone can serve as reliable proxies for financial sentiment and how they compare with traditional text-based analysis. We conduct a series of experiments using logistic regression and transformer models. We further analyze the performance, computational efficiency, and data requirements of emoji-based versus text-based sentiment classification. Using a balanced dataset of about 528,000 emoji-containing StockTwits posts, we find that emoji-only models achieve F1 approximately 0.75, lower than text-emoji combined models, which achieve F1 approximately 0.88, but with far lower computational cost. This is a useful feature in time-sensitive settings such as high-frequency trading. Furthermore, certain emojis and emoji pairs exhibit strong predictive power for market sentiment, demonstrating over 90 percent accuracy in predicting bullish or bearish trends. Finally, our research reveals large statistical differences in emoji usage between financial and general social media contexts, stressing the need for domain-specific sentiment analysis models.

2605.09465 2026-05-12 cs.RO

High Precision Hydraulic Excavator Control for Heavy-Duty Grading

Lennart Werner, Pol Eyschen, Sean Costello, Andrei Cramariuc, Marco Hutter

Affiliations: ETH Zürich, Robotic Systems Lab

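AI summary: This paper studies automated high-precision grading with hydraulic excavators in heavy-duty earthworks. Because different hydraulic architectures respond differently to operator commands and soil interaction forces, the authors propose a layered control method with a hydraulic-aware low-level control loop and a path-tracking layer; a calibration process makes it applicable to both load-sensing and negative-flow-control machines. Experiments show a 2.6-fold precision improvement over an existing commercial solution and better use of the machine's pressure capability.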

Comments 12 pages 19 figures, RSS 2026

English abstract

High-precision heavy-duty grading is a common step in earthworks, traditionally carried out manually by skilled operators. Removing a significant amount of material while achieving a high-precision surface requires substantial machine-specific experience. Different hydraulic architectures react differently to operator inputs and soil interaction forces, which makes generalizable controllers challenging. In this paper, we present an autonomous controller that achieves high-precision grading at expert-operator speed on Load Sensing and Negative Flow Control machines alike. We split our controller into two parts: (1) a hydraulic-aware low-level loop that is hydraulic architecture-specific and (2) a path-tracking layer that coordinates joint motions and responses. Through a calibration process, our technique is applicable to load-sensing and negative-flow-control machinery. To showcase its versatility, we benchmark our approach on two excavators with different hydraulics and compare it against a commercial state-of-the-art solution. Our technique (RMSE 1.8 cm) outperforms the commercial solution (RMSE 4.7 cm) in precision by a factor of 2.6 and improves machine usage by leveraging the maximum function pressure, as opposed to commercial solutions that stall prematurely.

2605.09463 2026-05-12 cs.CL

Beyond Position Bias: Shifting Context Compression from Position-Driven to Semantic-Driven

Jiwei Tang, Zhijing Huang, Xinyu Zhang, Chen Jason Zhang, Jianxing Yu, Libin Zheng, Rui Meng, Jian Yin

Affiliations: Sun Yat-sen University; Hong Kong Polytechnic University; Beijing Normal–Hong Kong Baptist University

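AI summary: Large language models perform well across many tasks but face high computational overhead and information redundancy when processing long contexts. Existing soft prompt compression methods are limited by position bias, which causes unstable performance and semantic fragmentation. This paper proposes SeCo, a semantically consistent context compression method that dynamically selects query-relevant semantic centers in the semantic space and merges the remaining tokens with consistency weighting, removing the dependence on physical position. Experiments show that SeCo delivers better downstream performance, inference speed, and out-of-domain robustness on multiple benchmarks.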

Comments 20 pages, 6 figures

English abstract

Large Language Models (LLMs) have demonstrated exceptional performance across diverse tasks. However, their deployment in long-context scenarios faces high computational overhead and information redundancy. While soft prompt compression has emerged as a promising way to mitigate these costs by compressing sequences into compact embeddings, existing paradigms remain fundamentally constrained by position bias: they primarily rely on inserting learnable tokens at fixed positions or on grouping tokens according to their physical layout, thereby inducing performance instability and semantic fragmentation. To overcome this bottleneck, we propose Semantic Consistency Context Compression (SeCo), a method that shifts context compression from position-driven to semantic-driven. Rather than being constrained by physical token layout, SeCo dynamically anchors compression directly in the semantic space by selecting query-relevant tokens as semantic centers and aggregating the remaining tokens via consistency-weighted merging. This design inherently preserves semantic consistency while eliminating position bias. Extensive experiments on 14 benchmarks across two backbone models demonstrate that SeCo is consistently superior in downstream task performance, inference latency, and out-of-domain robustness. The code is available at https://anonymous.4open.science/r/seco-EE5E.

2605.09460 2026-05-12 cs.CV cs.AI

When Few Steps Are Enough: Training-Free Acceleration of Identity-Preserved Generation

Dongqi Zheng

Affiliations: FLUX Diffusion Transformer; InfuseNet

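AI summary: This paper studies how to accelerate identity-preserved image generation by reducing the number of generation steps. The authors propose a retraining-free approach that swaps the pretrained diffusion backbone and disables classifier-free guidance, which substantially improves generation efficiency while keeping identity similarity high. Experiments show that identity features are already well formed in the early generation steps and that later steps mainly refine detail, giving an efficient and practical strategy for identity-preserved generation.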

English abstract

Identity-preserved image generation is typically built on many-step diffusion backbones, making personalized generation expensive at deployment time. We show that this cost is often unnecessary for identity-conditioned FLUX generation. A frozen InfuseNet identity adapter trained with the dev backbone transfers directly to the distilled schnell backbone without retraining. This two-line replacement -- changing the backbone path and disabling classifier-free guidance -- reduces latency by 5.9x while improving ArcFace identity similarity by +0.028 and LPIPS by -0.016 over the standard 28-step dev baseline. To explain why this works, we analyze the denoising trajectory and find that identity fidelity enters an early effective regime, often within 4-8 steps, while later steps primarily refine visual detail, sharpness, and contrast. Adapter ablations confirm that identity formation depends on the identity adapter, while attention-stream norm probes suggest that the relative conditioning contribution decreases as sampling proceeds. Preliminary style-adapter and object-adapter sweeps on SDXL and SD1.5 show similar diminishing returns after intermediate steps. These results position distilled backbone replacement as a simple, training-free strategy for improving the efficiency-fidelity tradeoff of identity-preserved generation.