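arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems science, and more.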

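2606.18022 2026-06-17 cs.LG New submission

LLM Consumer Behavior Theory: Foundations of an Emerging Research Field

Manon Reusens, Sofie Goethals, David Martens

Affiliations: Department of Engineering Management, University of Antwerp

AI summary: This paper proposes LLM Consumer Behavior Theory, which studies how LLM agents make consumption decisions on behalf of humans in markets. It integrates economics and natural language processing and examines preference representation, market aggregation, and where rationality assumptions break down.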

Abstract

Large language models (LLMs) are increasingly deployed as autonomous agents that make consumption decisions on behalf of users. This shift raises fundamental questions for consumer theory, which has traditionally modeled humans as the primary decision-makers. In this paper, we introduce LLM Consumer Behavior Theory, a new field of study concerned with analyzing consumer behavior in agentic markets. Drawing on classical and behavioral economics alongside recent advances in Natural Language Processing, we formalize how human preferences are reflected and acted upon by LLM-based agents, and how agent-level decisions aggregate into market demand. We unify previously fragmented literature on LLM decision-making, human behavior simulation, and preference elicitation under a common economic lens, highlighting where assumptions, such as rationality and heterogeneity, may fail in agentic markets. Rather than providing empirical validation, this paper outlines the scope of LLM consumer behavior and identifies open research questions related to alignment, preference representation, and market dynamics.

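2606.18003 2026-06-17 cs.LG cs.AI New submission

C2FL: Clustered Continual Federated Learning under Spatial and Temporal Drift

Davide Domini, Gianluca Aguzzi, Lorenzo Pellegrini, Mirko Viroli, Lukas Esterle

Affiliations: University of Bologna; Aarhus University

AI summary: To address privacy-preserving collective adaptation of nodes under spatial heterogeneity and temporal drift, the paper proposes C2FL, in which nodes self-organize into learning groups through spatial clustering and combine experience replay with dwell-time-aware adaptive averaging to achieve robust collective adaptation.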

Abstract

Collective Adaptive Systems (CAS) increasingly rely on machine learning to let each node learn from locally sensed data, aligning its behavior with the surrounding environment. Scaling this intelligence, however, raises fundamental challenges: sensed data is often privacy-sensitive, preventing centralized collection; nodes are mobile, traversing regions where nearby nodes perceive similar phenomena while distant ones observe radically different conditions, creating natural spatial clusters; and these distributions evolve over time due to mobility, introducing temporal drift that makes local models progressively stale. These dynamics arise across domains - vehicular sensing, drone-based monitoring, smartphone crowdsensing - yet the interplay of privacy, spatial heterogeneity, and temporal drift severely undermines conventional learning strategies. Therefore, we propose C2FL, a fully distributed Federated Learning (FL) approach where nodes self-organize into learning groups through spatial clustering, reflecting the geographic structure of the environment. To counteract temporal drift, each node combines experience replay with a dwell-time-aware adaptive averaging step, progressively incorporating the regional consensus as it remains longer within the same area, while preserving previously acquired knowledge under evolving distributions. We evaluate our approach on synthetic experiments that systematically reproduce spatial and temporal shifts, showing that standard federated strategies degrade significantly under these conditions and that our method restores robust collective adaptation.
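
The dwell-time-aware averaging step described above can be illustrated with a minimal sketch. The abstract does not give the weighting function, so the exponential schedule, the `tau` parameter, and the function names below are assumptions chosen only to show the idea: a node that has just entered a region keeps its local model, and it leans more on the regional consensus the longer it stays.

```python
import math

def dwell_weight(dwell_steps: int, tau: float = 5.0) -> float:
    """Hypothetical mixing weight: 0 on arrival, approaching 1 for long dwell times."""
    return 1.0 - math.exp(-dwell_steps / tau)

def adaptive_average(local, regional, dwell_steps, tau=5.0):
    """Blend local parameters toward the regional consensus by the dwell weight."""
    w = dwell_weight(dwell_steps, tau)
    return [(1.0 - w) * l + w * r for l, r in zip(local, regional)]

local = [1.0, -2.0]      # toy local model parameters
regional = [0.0, 0.0]    # toy regional consensus

just_arrived = adaptive_average(local, regional, dwell_steps=0)
long_stay = adaptive_average(local, regional, dwell_steps=50)
```

With these toy values, `just_arrived` equals the local parameters and `long_stay` is almost identical to the regional consensus. Experience replay, which the abstract pairs with this step, is not modeled here.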

2606.18001 2026-06-17 cs.LG New submission

Half a Link can Be Enough to Predict a Whole Link: Understanding Generalization in Knowledge Graph Foundation Models

Cosimo Gregucci, Obaidah Theeb, Daniel Hernandez, Antonio Vergari, Steffen Staab

Affiliations: Institute for AI, University of Stuttgart; University of Southampton; University of Edinburgh

AI summary: By analyzing zero-shot generalization of knowledge graph foundation models on unseen graphs, the paper finds that the models exploit partially seen "half-links" for prediction, and derives a taxonomy of four scenarios that exposes the generalization mechanisms of existing models and where they can improve.

Abstract

Knowledge graph (KG) foundation models (KGFMs) are zero-shot generalizers: trained once, they can predict links on unseen graphs without retraining. However, understanding when and how they can robustly generalize across KGs is still an open question. In this paper, we shed some light on their generalization mechanisms, highlighting how their performance on unseen KGs is not uniform when it comes to partially seen links, which we call half-links. In fact, we show that to predict a test triple $(h,r,t)$ it might suffice in practice to have observed the half-link $(h,r)$ or $(r,t)$ in the inference graph. This yields a taxonomy of four scenarios, depending on which of these half-links are observed. In a rigorous stratified analysis over these scenarios, we reveal that SoTA KGFMs use seen half-links for predictions, while unseen half-links pose different challenges. As such, our finer-grained taxonomy can be a diagnostic protocol for robust KGFM generalization and highlights where novel KGFMs can improve.
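
The four-scenario taxonomy can be sketched as a small classifier over test triples. The scenario labels and the toy graph are illustrative assumptions; the abstract only states that the scenarios arise from whether $(h,r)$ and $(r,t)$ are observed in the inference graph.

```python
def half_link_scenario(test_triple, inference_graph):
    """Classify a test triple by which of its half-links appear in the inference graph."""
    h, r, t = test_triple
    head_seen = any(gh == h and gr == r for gh, gr, _ in inference_graph)
    tail_seen = any(gr == r and gt == t for _, gr, gt in inference_graph)
    if head_seen and tail_seen:
        return "both half-links seen"
    if head_seen:
        return "only (h, r) seen"
    if tail_seen:
        return "only (r, t) seen"
    return "no half-link seen"

# Toy inference graph (hypothetical entities and relations).
graph = [
    ("alice", "works_at", "acme"),
    ("bob", "works_at", "initech"),
    ("alice", "lives_in", "paris"),
]

scenarios = {
    triple: half_link_scenario(triple, graph)
    for triple in [
        ("alice", "works_at", "initech"),
        ("alice", "works_at", "globex"),
        ("carol", "works_at", "acme"),
        ("carol", "lives_in", "rome"),
    ]
}
```

Grouping evaluation metrics by these four labels would give a stratified report of the kind the abstract describes. The sketch assumes test triples are not themselves part of the inference graph.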

2606.17999 2026-06-17 cs.CL New submission

VoidPadding: Let [VOID] Handle Padding in Masked Diffusion Language Models so that [EOS] Can Focus on Semantic Termination

Chunyu Liu, Zhengyang Fan, Kaisen Yang, Alex Lamb

Affiliations: Tsinghua University

AI summary: Proposes VoidPadding, which introduces a [VOID] token to handle padding and frees [EOS] from its dual role, enabling early stopping and adaptive response expansion under large-block decoding. On mathematical reasoning and code generation tasks it improves the average score by 17.84 points and reduces NFE by 55.7%.

Abstract

MDLMs generate text by denoising a preallocated masked response canvas, making response-length modeling central to instruction tuning. Existing MDLMs often inherit the autoregressive convention of using repeated \texttt{[EOS]} tokens for padding during instruction tuning, giving \texttt{[EOS]} a dual role as both a semantic terminator and a padding token. We show that this dual role is a root cause of \texttt{[EOS]} overflow under large-block decoding. To decouple these roles, we propose VoidPadding, which introduces \texttt{[VOID]} for padding and reserves \texttt{[EOS]} for termination. During inference, the learned \texttt{[EOS]} signal enables early stopping, while the learned \texttt{[VOID]} signal guides adaptive response canvas expansion. On Dream-7B-Instruct, VoidPadding improves the block-size-averaged four-task mean across mathematical reasoning and code generation benchmarks by $+17.84$ points over the original model and $+6.95$ points over RainbowPadding, while reducing decoding NFE by 55.7\% on average. Code is available at this https URL.
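
The difference between the two padding conventions can be shown on a toy token list. This is a sketch of the target construction only; the exact layout (a single `[EOS]` followed by `[VOID]` tokens) and the helper names are assumptions, and the inference-time early stopping and canvas expansion are not modeled.

```python
EOS, VOID = "[EOS]", "[VOID]"

def pad_eos_only(response, canvas_len):
    """Baseline convention from the abstract: [EOS] both terminates and pads."""
    if len(response) + 1 > canvas_len:
        raise ValueError("canvas too short for response plus terminator")
    return response + [EOS] * (canvas_len - len(response))

def pad_with_void(response, canvas_len):
    """Decoupled convention (assumed layout): one [EOS] terminates, [VOID] pads."""
    if len(response) + 1 > canvas_len:
        raise ValueError("canvas too short for response plus terminator")
    return response + [EOS] + [VOID] * (canvas_len - len(response) - 1)

response = ["The", "answer", "is", "42"]
baseline = pad_eos_only(response, canvas_len=8)
decoupled = pad_with_void(response, canvas_len=8)
```

In the baseline target, `[EOS]` appears four times, so the token carries both meanings. In the decoupled target it appears exactly once, and the remaining canvas positions are filled with `[VOID]`.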

2606.17998 2026-06-17 cs.CV New submission

AIGS-Net: Compact Illumination Field Modeling via 2D Gaussian Splatting for Fast Low-Light Image Enhancement

Yuhan Chen, Kunyang Huang, Fuchen Li, Zhuohan Qin, Guofa Li, Wenbo Chu, Keqiang Li

Affiliations: College of Mechanical and Vehicle Engineering, Chongqing University; Department of Electrical and Computer Engineering, Carnegie Mellon University; Herbert Wertheim College of Engineering, University of Florida; School of Mathematics and Statistics, Qingdao University; National Innovation Center of Intelligent and Connected Vehicles; School of Vehicle and Mobility, Tsinghua University

AI summary: Proposes AIGS-Net, which uses an input-adaptive 2D Gaussian Splatting illumination field and zero-parameter multiscale contextual encoding to perform low-light image enhancement with approximately 40 learnable parameters, balancing enhancement quality and inference efficiency on the LOL and LSRW benchmarks.

Abstract

Existing low-light image enhancement methods often face a bottleneck between the representation capacity of illumination-field modeling and computational complexity. To address this issue, this paper proposes an Adaptive Illumination Gaussian Splatting Network (AIGS-Net), an ultra-lightweight architecture for fast low-light enhancement. Unlike conventional static priors, AIGS-Net constructs an input-adaptive 2D Gaussian Splatting illumination field. The opacity of Gaussian basis functions is dynamically modulated by relative luminance statistics of the input image, and spatially varying illumination compensation is rendered through ordered alpha compositing. To guide adaptive illumination compensation efficiently, a zero-parameter nonlinear multiscale contextual encoding module is introduced to extract low-frequency structures and local contrast cues without additional convolutional weights. To suppress noise amplification and sensor-induced color bias, AIGS-Net integrates noise-mask estimation, locked single-channel Gamma mapping, cross-channel consistency regularization, and target color-alignment constraints. Experiments on LOL and LSRW benchmarks show that AIGS-Net improves detail recovery and color fidelity while requiring only approximately 40 learnable parameters, achieving an effective trade-off between enhancement quality and extreme inference efficiency.
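
Ordered alpha compositing over 2D Gaussians can be sketched for a single pixel as follows. This is the standard front-to-back compositing rule, not the paper's implementation: the isotropic footprint, the per-splat `gain` value, and the dictionary layout are assumptions, and the input-dependent opacity modulation is reduced to a fixed `opacity` per splat.

```python
import math

def gaussian_2d(x, y, cx, cy, sigma):
    """Isotropic 2D Gaussian footprint in (0, 1]; isotropy is a simplification."""
    d2 = (x - cx) ** 2 + (y - cy) ** 2
    return math.exp(-d2 / (2.0 * sigma ** 2))

def composite_gain(x, y, splats):
    """Front-to-back alpha compositing of per-splat gain values at pixel (x, y).

    Returns the composited gain and the transmittance left after all splats.
    """
    out, transmittance = 0.0, 1.0
    for s in splats:  # list order is the compositing order
        alpha = s["opacity"] * gaussian_2d(x, y, s["cx"], s["cy"], s["sigma"])
        out += transmittance * alpha * s["gain"]
        transmittance *= 1.0 - alpha
    return out, transmittance

splats = [
    {"cx": 0.0, "cy": 0.0, "sigma": 1.0, "opacity": 0.5, "gain": 2.0},
    {"cx": 0.0, "cy": 0.0, "sigma": 1.0, "opacity": 1.0, "gain": 4.0},
]
gain_center, leftover = composite_gain(0.0, 0.0, splats)
```

At the shared center, the first splat contributes half of its gain and lets half of the second splat through, so the composited gain is 0.5 * 2.0 + 0.5 * 4.0 = 3.0 with no transmittance left.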

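2606.17996 2026-06-17 cs.LG cs.AI New submission

Multiple cyclicity and Wavelet Decomposition with Channel Correlation for Long-term Time Series Forecasting

Bin Wang, Heming Yang, Jinfang Sheng

Affiliations: School of Computer Science and Engineering, Central South University

AI summary: Proposes the McWC model, which builds multi-layer cyclicity, extracts inter-channel correlations with a multi-layer perceptron, fuses high- and low-frequency information through multi-level wavelet decomposition, and decouples intra-channel autocorrelations in the frequency domain, for efficient and accurate long-term forecasting.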

Abstract

Cyclicity and trend are important components of time series data, and many studies based on cyclicity and trend have achieved good results in long-term time series forecasting. However, we believe that current work neglects the influence of real-world inter-channel correlations in time series data, which leads to suboptimal predictions. Furthermore, these models rely on complex designs to capture diverse information, resulting in low computational efficiency. To address these challenges, we propose McWC, a long-term time series forecasting model that separately models the cyclicity, trend, and inter-channel correlations. Specifically, McWC first decouples cyclical information from data using a multi-layer cyclicity construction module. Then, it extracts inter-channel correlations using a multi-layer perceptron. Next, it models and fuses the multi-layer high-frequency and low-frequency information from data using a multi-level wavelet decomposition module. Finally, it aggregates the results of different components to obtain the output. Simultaneously, we decouple intra-channel autocorrelations by calculating a loss function in the frequency domain. Experiments on six real-world datasets demonstrate that McWC achieves state-of-the-art performance, exhibiting excellent computational efficiency and historical information extraction capabilities.
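
A multi-level wavelet decomposition splits a series into one low-frequency approximation and a stack of high-frequency detail bands. The sketch below uses the Haar wavelet purely as an assumption, since the abstract does not name the wavelet family or describe how McWC fuses the bands.

```python
import math

_S = 1.0 / math.sqrt(2.0)  # orthonormal Haar scaling factor

def haar_step(signal):
    """One Haar level: pairwise averages (low-pass) and differences (high-pass)."""
    approx = [(signal[i] + signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * _S for i in range(0, len(signal), 2)]
    return approx, detail

def haar_decompose(signal, levels):
    """Return the coarsest approximation and detail bands ordered fine to coarse."""
    approx, details = list(signal), []
    for _ in range(levels):
        if len(approx) < 2 or len(approx) % 2:
            raise ValueError("length must be even at every level")
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

def haar_reconstruct(approx, details):
    """Invert haar_decompose by undoing the levels from coarse to fine."""
    current = list(approx)
    for detail in reversed(details):
        restored = []
        for a, d in zip(current, detail):
            restored.extend([(a + d) * _S, (a - d) * _S])
        current = restored
    return current

series = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
approx, details = haar_decompose(series, levels=2)
restored = haar_reconstruct(approx, details)
```

Because the transform is orthonormal, the bands can be processed separately and recombined without losing information, which is what makes this kind of decomposition convenient for modeling trend and fluctuations with separate components.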

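2606.17989 2026-06-17 cs.CV cs.AI New submission

Recover Semantics First, Generate Better: Improved Latent Modeling for 3D MRI Reconstruction and Cross-Contrast Synthesis

Yonghao Chen, Sicheng Yang, Rui Tang, Lei Zhu

Affiliations: The Hong Kong University of Science and Technology (Guangzhou); Xi’an Jiaotong University

AI summary: Proposes a semantics-first latent modeling framework that uses a Latent Harmonization Encoder, a Semantic Recovery Block, and an Anatomy-aware Frequency Loss to address the loss of long-range anatomical coherence, semantic degradation, and over-smoothed reconstruction in 3D MRI compression, improving reconstruction and cross-contrast synthesis quality.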

Comments Code: this https URL (https://github.com/script-Yang/RSF)

Abstract

Multi-contrast magnetic resonance imaging (MRI) provides complementary information for clinical diagnosis. However, acquiring all MRI sequences is often time-consuming and costly. Recent generative models perform cross-contrast synthesis to address this issue by inferring absent contrasts from the available ones. Nevertheless, synthesizing 3D MRI presents significant challenges. Due to the massive volume sizes, operating directly in the pixel space is computationally prohibitive; therefore, a common approach is to first compress the 3D volumes into a latent space and subsequently train generative models in that space. We observe that existing compression architectures face several critical issues: they under-preserve long-range anatomical coherence, discard clinically meaningful semantics, and rely on optimization objectives that lead to over-smoothed reconstructions. Ultimately, these shortcomings compromise the performance of subsequent generative models. In this work, we propose a semantics-first latent modeling framework for 3D MRI reconstruction and cross-contrast synthesis. Specifically, we introduce a Latent Harmonization Encoder (LHE) to capture global anatomical dependencies, ensuring coherent volumetric representations. To mitigate semantic degradation during latent compression, we further design a Semantic Recovery Block (SRB) that injects high-level priors from a self-supervised semantic teacher, enhancing contrast-aware separability in the latent space. Additionally, we propose an Anatomy-aware Frequency Loss (AFL) to adaptively preserve diagnostically relevant high-frequency structures. Extensive experiments on two public multi-contrast MRI datasets demonstrate consistent improvements in reconstruction fidelity and cross-contrast synthesis quality. Our code is available at this https URL.

2606.17985 2026-06-17 cs.CV New submission

Gaussian Light Field Splatting: A Physical Prior-Driven Vision Transformer for Unsupervised Low-Light Image Enhancement

Yuhan Chen, Wenxuan Yu, Guofa Li, Fuchen Li, Kunyang Huang, Yicui Shi, Ying Fang, Wenbo Chu, Keqiang Li

Affiliations: College of Mechanical and Vehicle Engineering, Chongqing University; Herbert Wertheim College of Engineering, University of Florida; Department of Electrical and Computer Engineering, Carnegie Mellon University; National Innovation Center of Intelligent and Connected Vehicles; School of Vehicle and Mobility, Tsinghua University

AI summary: Proposes the GLFS model, which brings the continuous physical illumination modeling of Gaussian splatting into a Transformer. Scene illumination is represented with anisotropic Gaussian functions, physics-guided biases are introduced into self-attention, and a color-vector angular loss and a luminance-edge loss are added, achieving exposure balancing and color correction under non-uniform illumination with state-of-the-art performance.

Abstract

Existing unsupervised low-light image enhancement methods often encounter local exposure imbalance and color distortion under complex non-uniform illumination. In addition, most Vision Transformers lack an explicit mechanism for modeling the physical priors of illumination degradation. To address these limitations, we propose GLFS, a Gaussian light field splatting-based Vision Transformer that integrates continuous physical illumination modeling from Gaussian splatting into the Transformer architecture. In GLFS, scene illumination is represented by a superposition of anisotropic Gaussian basis functions. Physics-guided biases are introduced into self-attention to adaptively infer a spatial gain field, enabling accurate and uniform restoration under complex illumination. To reduce color bias and structural degradation during enhancement, a color-vector angular loss and a luminance-edge loss are further developed. These losses enforce hue consistency and improve the structural fidelity of local details. Extensive ablation studies and quantitative evaluations show that GLFS provides clear advantages in illumination correction and detail preservation. It achieves state-of-the-art performance and offers a new representation paradigm for low-light image enhancement.
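
A color-vector angular loss penalizes the angle between two RGB vectors, which is insensitive to brightness scaling and therefore targets hue shifts. The sketch below is one plausible form; the abstract does not give the formula, and which image serves as the reference in the unsupervised setting is an assumption left open here.

```python
import math

def color_angle(rgb_a, rgb_b):
    """Angle in radians between two RGB vectors; 0 means identical chromatic direction."""
    norm_a = math.sqrt(sum(a * a for a in rgb_a))
    norm_b = math.sqrt(sum(b * b for b in rgb_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a black pixel has no defined direction; treated as no penalty
    cos = sum(a * b for a, b in zip(rgb_a, rgb_b)) / (norm_a * norm_b)
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp guards against rounding

def color_angular_loss(pred_pixels, ref_pixels):
    """Mean per-pixel angle between predicted and reference colors."""
    angles = [color_angle(p, r) for p, r in zip(pred_pixels, ref_pixels)]
    return sum(angles) / len(angles)

dark = (0.1, 0.2, 0.3)
brightened = (0.2, 0.4, 0.6)   # same direction, twice the intensity
angle_same_hue = color_angle(dark, brightened)
angle_orthogonal = color_angle((1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
```

Brightening a pixel without changing its channel ratios leaves the angle at essentially zero, while a change in channel ratios increases it, which is the behavior one would want from a term meant to enforce hue consistency during enhancement.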

2606.17982 2026-06-17 cs.RO New submission

LAGO Policy: Latency-Aware Asynchronous Diffusion Policies with Goal-Directed Collision-Free Planning for Smooth Manipulation

Guowei Shi, Xupeng Xie, Yiming Luo, Jian Guo, Jun Ma, Boyu Zhou

Affiliations: The Hong Kong University of Science and Technology (Guangzhou); International Digital Economy Academy; The University of Hong Kong; Southern University of Science and Technology

AI summary: Proposes LAGO Policy, which uses latency-aware conditional guidance and spatial-temporal trajectory optimization to address the discontinuities and collisions of asynchronous diffusion policies, achieving smooth and safe manipulation.

Comments 8 pages, 8 figures

Abstract

Diffusion-based visuomotor policies deployed with asynchronous inference often exhibit inter-chunk discontinuities and lack explicit mechanisms for obstacle-aware execution, leading to jerky motions and collisions that hinder reliable manipulation in real-world scenes. To address these issues, we propose LAGO Policy, a unified asynchronous action-generation framework that integrates trajectory optimization with diffusion policy for smooth and safe execution. LAGO Policy improves inter-chunk consistency via latency-aware classifier-free guidance conditioning on future actions. It further enables goal-directed collision-free trajectory planning by predicting a task-relevant interaction goal from demonstrations. Finally, spatial-temporal trajectory optimization refines the actions to be executed for low-jerk and feasible motion. Extensive real-world experiments demonstrate that LAGO Policy achieves smooth collision-free execution with high task success across challenging manipulation tasks. Project Website: this https URL

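2606.17979 2026-06-17 cs.AI New submission

Learning Task-Specific Subspaces via Interventional Post-Training of Speech Foundation Models

Jack Cox, Jon Barker

Affiliations: University of Sheffield

AI summary: Proposes an interventional contrastive-learning post-training method that factorizes the entangled representations of a speech foundation model into content and speaker subspaces, improving speaker verification and keyword spotting performance.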

Comments Accepted to Interspeech 2026; 6 pages (4 main body), 2 figures

2606.17966 2026-06-17 cs.CV New submission

Reload-Mamba: Hierarchical Anti-Dilution State-Space Modeling for Multi-Class Semantic Segmentation

Sheng-Wei Chan, Hsin-Jui Pan, Jen-Shiun Chiang

Affiliations: Department of Electrical and Computer Engineering, Tamkang University

AI summary: Proposes the Reload-Mamba framework, which uses a boundary-supervised local detail prior, a class-uncertainty-aware Reload Gate, and a hierarchical multi-level Reload mechanism to address the response dilution caused by Mamba state-space propagation, achieving strong results on ADE20K, Cityscapes, and PASCAL VOC 2012.

Comments 23 pages, 4 figures, 17 tables. Code will be released soon

Abstract

Mamba-based state space models offer linear-time long-range modeling for high-resolution dense prediction, but sequential state-space propagation can attenuate boundary-sensitive and detail-sensitive responses that are critical in multi-class semantic segmentation. We propose Reload-Mamba, a semantic segmentation framework that addresses this propagation-induced response dilution through three segmentation-specific designs: (i) a boundary-supervised local detail prior that is explicitly trained with ground-truth boundary masks to identify regions requiring response restoration; (ii) a class-uncertainty-aware Reload Gate that incorporates per-pixel class entropy from a pre-reload auxiliary head as an additional gating signal, a formulation that is informative only under multi-class dense prediction; and (iii) a hierarchical multi-level Reload mechanism that applies anti-dilution refinement at three decoder levels and fuses the restored representations top-down. Built upon a ConvNeXt-Tiny encoder with a multi-scale decoder and four-directional Mamba scanning with pixel-wise directional attention, Reload-Mamba achieves 47.9% single-scale (48.9% multi-scale) mIoU on ADE20K and 83.2% single-scale mIoU on Cityscapes. With ResNet-101 + COCO pre-training under the standard DeepLab-style protocol, Reload-Mamba reaches 87.8% mIoU on PASCAL VOC 2012 val. Controlled ablations show that each of the three segmentation-specific designs contributes beyond a direct port of the prior anti-dilution architecture proposed for binarization, cumulatively improving over the direct-port baseline by +2.2 mIoU on ADE20K.
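
The per-pixel class entropy used as a gating signal can be computed from auxiliary-head logits as in the sketch below. Normalizing by log K and the nested-list layout are assumptions for illustration; how the entropy enters the Reload Gate is not specified in the abstract and is not modeled.

```python
import math

def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def class_entropy(logits):
    """Shannon entropy (in nats) of the predicted class distribution."""
    return -sum(p * math.log(p) for p in softmax(logits) if p > 0.0)

def normalized_entropy_map(logit_map):
    """Map an H x W x K nested list of logits to H x W entropies in [0, 1]."""
    num_classes = len(logit_map[0][0])
    if num_classes < 2:
        raise ValueError("entropy normalization needs at least two classes")
    scale = math.log(num_classes)
    return [[class_entropy(pixel) / scale for pixel in row] for row in logit_map]

# One row, two pixels, three classes: an ambiguous pixel and a confident one.
logit_map = [[[0.0, 0.0, 0.0], [10.0, 0.0, 0.0]]]
entropy = normalized_entropy_map(logit_map)
```

The ambiguous pixel gets a normalized entropy of 1 and the confident pixel a value close to 0, so a gate driven by this map would focus refinement on uncertain regions such as class boundaries.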

2606.17961 2026-06-17 cs.CV cs.AI New submission

Plug-and-Adapt: First-Glance Multi-modal Coreference Resolution with a Pre-trained Alignment Model

Jinghan Wu, Jing Li, Ivor W. Tsang, Xuetao Zhang

Affiliations: State Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University; Centre for Frontier AI Research and Institute of High-Performance Computing, Agency for Science, Technology and Research (A*STAR)

AI summary: Proposes a plug-and-adapt method that uses a pre-trained fine-grained alignment model and fuses visual and categorical cues with evidence theory, requiring neither training on the target dataset nor a large VLLM. On the CIN benchmark it improves CoNLL F1 by 5.31% and 2.12% over dedicated methods and popular VLLMs, respectively.

Abstract

Visual information helps resolve ambiguity in coreference resolution, leading to notable performance gains. However, existing Multi-modal Coreference Resolution (MCR) methods require training with (partially) annotated data from the target dataset before they can be applied, preventing their direct usability and raising concerns about generalization. While Vision-Language Large Models (VLLMs) with billions of parameters offer promising zero-shot capabilities, they remain largely inaccessible. Their massive size limits deployability, and many are only accessible through paid APIs. In this paper, we propose a plug-and-adapt method that strategically adapts a carefully pre-trained \emph{alignment model} for immediate use in MCR tasks, designed to eliminate the need for training on scarce benchmark datasets or relying on resource-intensive VLLMs. Specifically, we first pre-train a fine-grained alignment model between textual and visual contextual information using vision-language alignment datasets. We then repurpose the alignment model to MCR through similarity aggregation by fusing visual and categorical cues with evidence theory, thereby enhancing effectiveness. Experiments on the Coreference Image Narratives (CIN) benchmark dataset demonstrate the effectiveness of our method, achieving a 5.31\% and 2.12\% improvement in CoNLL F1 over SOTA dedicated methods and popular VLLMs, respectively. We further evaluate our method on a masked CIN dataset for robustness testing and on a specially constructed VCR-MCR dataset for generalization assessment, with results confirming both capabilities.
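
Fusing two cues with evidence theory can be illustrated with Dempster's rule of combination. This is a sketch under the assumption that "evidence theory" refers to Dempster-Shafer theory; the two-hypothesis frame, the mass values, and the variable names are invented for the example and are not the paper's similarity aggregation.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) with Dempster's rule."""
    combined, conflict = {}, 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        overlap = a & b
        if overlap:
            combined[overlap] = combined.get(overlap, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass assigned to contradictory pairs
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

COREF = frozenset({"coref"})
NOT_COREF = frozenset({"not_coref"})
EITHER = COREF | NOT_COREF          # mass left uncommitted

visual = {COREF: 0.6, EITHER: 0.4}                        # toy visual cue
categorical = {COREF: 0.5, NOT_COREF: 0.2, EITHER: 0.3}   # toy category cue
fused = dempster_combine(visual, categorical)
```

When both cues lean toward the same hypothesis, the fused belief in it exceeds either cue alone (here 0.68 / 0.88, about 0.77), while mass on the uncommitted set shrinks. Uncommitted mass lets a weak cue abstain instead of forcing a vote.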

2606.17945 2026-06-17 cs.AI New submission

Small Initialization Matters for Large Language Models

Liangkai Hang, Junjie Yao, Zhiyu Li, Feiyu Xiong, Hongkang Yang, Zhi-Qin John Xu

Affiliations: School of Mathematical Sciences, Shanghai Jiao Tong University; Institute of Natural Sciences, Shanghai Jiao Tong University; MemTensor (Shanghai) Technology Co., Ltd.; Institute for Advanced Algorithms Research

AI summary: The paper finds that reducing the initialization scale consistently improves large language model pretraining, with especially large gains on reasoning tasks, and reveals the mechanism by which small initialization drives parameters to evolve from low-complexity structures toward richer representations.

Comments 26 pages, 8 figures

Abstract

Large language models provide a tractable system for asking how intelligence itself emerges, rather than only how LLMs can be engineered. Although progress is usually attributed to scale, data and architecture, we show that parameter initialization is a gene-like determinant of training and, in particular, of model capacity. Reducing the initialization scale consistently improves pretraining, with the largest gains on reasoning-demanding tasks. We identify two widely used empirical settings that restrain the advantage of small initialization, and show how relaxing them restores favorable scaling. We further uncover a critical initialization that balances reasoning and training. Mechanistically, small initialization drives a distinct developmental trajectory: parameters first condense into low-complexity structures and later expand into richer representations, giving concrete form to the idea that compression is intelligence. Token-level analyses show that the gains concentrate on non-trivial, context-constrained predictions rather than all tokens uniformly. These results motivate a simple $\gamma$-initialization rule: expose initialization range as an explicit knob and use small initialization by default, an almost cost-free intervention that improves pretraining and strengthens reasoning across model scales.
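
Exposing the initialization scale as an explicit knob might look like the sketch below. The abstract does not define the $\gamma$-initialization rule, so the parameterization used here (standard deviation equal to fan-in raised to the power of minus gamma) is a hypothetical choice for illustration, as are the function names.

```python
import random

def init_std(fan_in: int, gamma: float) -> float:
    """Hypothetical scale rule: std = fan_in ** (-gamma); larger gamma means smaller init."""
    return fan_in ** (-gamma)

def init_weight(fan_in: int, fan_out: int, gamma: float, seed: int = 0):
    """Sample a fan_out x fan_in weight matrix from N(0, std^2) as nested lists."""
    rng = random.Random(seed)
    std = init_std(fan_in, gamma)
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]

# gamma = 0.5 gives the familiar 1/sqrt(fan_in) scale; gamma = 1.0 is much smaller.
standard_scale = init_std(1024, gamma=0.5)
small_scale = init_std(1024, gamma=1.0)
weights = init_weight(fan_in=8, fan_out=4, gamma=1.0)
```

Under this assumed rule, a single scalar controls how small the initial weights are for every layer width, which is the kind of knob the abstract recommends making explicit.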

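2606.17937 2026-06-17 cs.RO New submission

PreAct: Computer-Using Agents That Get Faster on Repeated Tasks

Bojie Li

Affiliations: Pine AI

AI summary: Proposes PreAct, which compiles the first successful run into a state-machine program and replays it directly on later tasks, avoiding per-step language-model calls for an 8.5-13x speedup, while verifying during replay that the screen state matches.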

Abstract

Computer-using agents drive real software through the screen -- clicking and typing -- but they solve every task from scratch: asked to repeat a task, an agent re-reads the screen, re-reasons every tap, and pays the full cost again. We present PreAct, which lets such an agent get faster on tasks it has done before. The first time it succeeds, PreAct compiles the run into a small state-machine program (states that check the screen, transitions that act) and on later runs replays it directly instead of invoking the agent, running 8.5-13x faster with no per-step language-model calls. Replay is not blind: at each step PreAct checks that the screen matches what the program expects before acting, and hands control back to the agent the moment something is off. PreAct applies the same discipline when deciding what to keep: a freshly compiled program enters the store only if, re-run from a clean state, an independent evaluator confirms it solved the task, which catches programs that replay to their last step yet leave the task undone. Across a mobile, a desktop, and a web benchmark, this store-time check separates repeated runs that improve from ones that degrade as faulty programs accumulate, worth 1.75-2.6 tasks per benchmark, the same direction on all three; a fallback that explores afresh when no program fits brings PreAct level with a strong record-and-replay baseline. We also report what did not matter: prompt wording, runtime guardrails, and whether a language model or a plain embedding retriever selects which program to reuse.
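
The replay loop and the store-time check can be sketched with a toy application. Everything here is hypothetical scaffolding: the `Step` and `ToyApp` classes, exact-string screen matching, and the return values are simplifications chosen to show the control flow, not PreAct's actual program format.

```python
from dataclasses import dataclass

@dataclass
class Step:
    expected_screen: str   # state: what the screen must show before acting
    action: str            # transition: what to do when the check passes

class ToyApp:
    """Hypothetical three-screen app used only for this demo."""
    TRANSITIONS = {("home", "tap_settings"): "settings",
                   ("settings", "tap_wifi"): "wifi"}

    def __init__(self, start="home"):
        self.screen = start

    def read_screen(self):
        return self.screen

    def act(self, action):
        self.screen = self.TRANSITIONS.get((self.screen, action), self.screen)

def replay(program, app, fallback_agent):
    """Replay steps, checking the screen first; hand off to the agent on mismatch."""
    for index, step in enumerate(program):
        if app.read_screen() != step.expected_screen:
            fallback_agent(index)
            return "handed_to_agent", index
        app.act(step.action)
    return "replayed", len(program)

def admit_to_store(program, make_clean_app, task_solved):
    """Store-time check: re-run from a clean state and ask an evaluator."""
    app = make_clean_app()
    status, _ = replay(program, app, lambda index: None)
    return status == "replayed" and task_solved(app)

program = [Step("home", "tap_settings"), Step("settings", "tap_wifi")]
handoffs = []

app = ToyApp()
clean_run = replay(program, app, handoffs.append)

popup_app = ToyApp(start="popup")          # unexpected screen at step 0
drifted_run = replay(program, popup_app, handoffs.append)

wifi_reached = lambda a: a.screen == "wifi"
keeps_full = admit_to_store(program, ToyApp, wifi_reached)
keeps_truncated = admit_to_store(program[:1], ToyApp, wifi_reached)
```

The truncated program replays to its last step without any mismatch, yet the evaluator rejects it because the task goal was not reached. This is the failure mode the abstract says the store-time check is meant to catch.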
