Uncovering the Variance in LLM Circuit Discovery

Frank Zhengqing Wu, Francesco Tonin, Volkan Cevher

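Affiliations: Laboratory for Information and Inference Systems (LIONS), École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland

AI summary: This paper studies resampling, rephrasing, and sample-wise variance in LLM circuit discovery. It proposes CEAP to reduce resampling variance, shows that rephrasing variance arises because different templates activate different circuits, and argues that sample-wise variance stems mainly from how unfaithfulness is defined.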

Abstract

Circuit discovery is a key technique in mechanistic interpretability to pinpoint the model components that are crucial for performing a given task. Although the current state-of-the-art method (EAP-IG) performs well on the metric of (un)faithfulness, it suffers from substantial variability. This includes resampling variance, where the circuit changes when we probe with a new batch of data from the same distribution; rephrasing variance, where the discovered circuit shifts when the prompts are rephrased; and sample-wise variance, where a circuit with low population unfaithfulness exhibits large fluctuations in unfaithfulness across individual samples. This paper studies the roots of these variances. We demonstrate that CEAP, our new circuit discovery method that improves upon EAP-IG with a theoretical guarantee, can substantially lessen resampling variance. We further show that rephrasing variance arises because prompts with different templates tend to activate different circuits in the model. This leads us to argue that it may be challenging to find a comprehensive circuit that explains and controls the model's behavior on a task, which can be expressed in countless templates, suggesting that LLMs may be inherently hard to steer. We show that sparsity, which has been claimed to form more compact and interpretable task circuits, fails to solve this problem. Regarding sample-wise variance, we argue that it is largely benign: extremely poor unfaithfulness scores often stem from how unfaithfulness is defined, rather than from defects in the measured circuits. We show that the magnitude of unfaithfulness is affected by selective contribution scaling, a neural mechanism that accounts for the extremely poor scores sometimes observed.

2606.16914 2026-06-16 cs.AI new submission

Greed Is Learned: Visible Incentives as Reward-Hacking Triggers

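Tong Che, Rui Wu

Affiliations: NVIDIA Research; Rutgers University

AI summary: Studies reward-channel addiction in reinforcement learning, in which an agent drifts away from the true task because of a visible self-interest channel (such as a score or KPI), and finds that this addiction can flip a model's safety alignment.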

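Fantastic Pretraining Optimizers and Where to Find Them II: Hyperball Optimization

Kaiyue Wen, Xingyu Dang, Kaifeng Lyu, Tengyu Ma, Percy Liang

Affiliations: Stanford University; Princeton University; Tsinghua University

AI summary: To address the shrinking gains of optimizers such as Muon as scale grows in large-model pretraining, proposes Hyperball, a wrapper that fixes the Frobenius norms of weight matrices and their updates. It achieves a 20-30% token-equivalent speedup on models up to 1.2B parameters and improves learning rate transfer.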

Comments Corresponding blog post: https://psychedelic-sunstone-851.notion.site/Fantastic-Pretraining-Optimizers-and-Where-to-Find-Them-2-1-Hyperball-Optimization-2e924306e6f280e7a5ffee00eb40a0dd

Abstract

Matrix-based optimizers such as Muon can substantially speed up language model pretraining, but their gains over AdamW are observed to shrink as model size and data scale grow when using standard constant decoupled weight decay. We propose Hyperball, a simple optimizer wrapper that addresses this issue. Given a base optimizer such as Adam or Muon, Hyperball sets the Frobenius norms of weight matrices and their corresponding optimizer updates to fixed constants. On Qwen3-style models up to 1.2B parameters, Muon Hyperball achieves a 20-30% token-equivalent speedup over weight decay baselines. Hyperball also improves learning rate transfer across widths and depths compared to decoupled weight decay. This method is motivated by prior theory showing that training with weight decay leads to an equilibrium weight norm that depends only on the training hyperparameters. Through this mechanism, the weight decay then decides the angular learning rate, i.e., how fast the direction of the weight matrix changes.
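
The abstract states only that Hyperball fixes the Frobenius norms of the weight matrices and of their optimizer updates; it does not give the exact update rule. The following is a minimal Python sketch of one plausible reading, assuming the base optimizer's update is rescaled to a fixed norm before the step and the weights are rescaled back to a fixed norm afterwards. The function names, the `lr` parameter, and the order of operations are illustrative assumptions, not the authors' implementation.

```python
import math


def frobenius_norm(matrix):
    """Frobenius norm of a matrix given as a list of rows."""
    return math.sqrt(sum(x * x for row in matrix for x in row))


def rescale(matrix, target_norm, eps=1e-12):
    """Return a copy of `matrix` scaled so its Frobenius norm equals `target_norm`.

    An all-zero matrix stays all-zero (eps only guards the division).
    """
    scale = target_norm / max(frobenius_norm(matrix), eps)
    return [[x * scale for x in row] for row in matrix]


def hyperball_step(weights, base_update, lr, weight_norm, update_norm):
    """One hypothetical Hyperball-style step (an assumed reading of the abstract).

    1. Rescale the base optimizer's update to a fixed Frobenius norm.
    2. Take a plain descent step with learning rate `lr`.
    3. Rescale the weights back to a fixed Frobenius norm.
    """
    update = rescale(base_update, update_norm)
    stepped = [
        [w - lr * u for w, u in zip(w_row, u_row)]
        for w_row, u_row in zip(weights, update)
    ]
    return rescale(stepped, weight_norm)


if __name__ == "__main__":
    w = [[3.0, 4.0], [0.0, 0.0]]
    u = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for an Adam or Muon update
    w_next = hyperball_step(w, u, lr=0.1, weight_norm=1.0, update_norm=1.0)
    print(frobenius_norm(w_next))  # stays at the fixed weight norm, up to rounding
```

Under this reading, the weight norm and the step norm are both constants, so only the direction of the weight matrix changes from step to step, which is one way to interpret the "angular learning rate" mentioned in the abstract.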

2606.16898 2026-06-16 cs.CV cs.AI new submission

Semantic Flip: Synthetic OOD Generation for Robust Refusal in Embodied Question Answering and Spatial Localization

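Dongbin Na, Chanwoo Kim, Giyun Choi, Dooyoung Hong

Affiliations: RGA Inc.

AI summary: Proposes the Semantic Flip framework, which trains a lightweight rejection module on synthesized auxiliary OOD samples so that a frozen vision-language model can refuse robustly without external OOD annotations, outperforming strong prompting baselines on embodied question answering and spatial localization benchmarks.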

Comments 18 pages, 3 figures. Code and data: https://github.com/ndb796/SemanticFlip ; project page: https://ndb796.github.io/SemanticFlip

Abstract

Detecting unanswerable user queries remains essential for the reliable deployment of real-world embodied agents. However, modern vision-language models (VLMs) often generate overly confident answers even when the available visual memory cannot support the query. Such overconfidence poses various task-dependent risks. The agent may provide misleading information to the user in Embodied Question Answering and select an arbitrary coordinate and physically guide the user there in spatial reasoning for navigation. Despite these high stakes, only a few prior studies directly address when and how an embodied VLM should respond with "I do not know." This work proposes Semantic Flip, a simple yet effective framework that synthesizes auxiliary out-of-distribution (OOD) samples for embodied refusal without requiring external OOD annotations. The key idea is to independently transform the query and video memory to construct auxiliary OOD pairs that lack sufficient visual grounding. These synthesized pairs enable training a lightweight rejection module on top of a frozen pretrained VLM. The module attaches to any existing VLM-based pipeline without retraining the underlying model. Across two complementary benchmarks, Semantic Flip consistently outperforms strong prompting baselines. This work also introduces SpaceReject, a new refusal benchmark for spatial localization with deliberately unanswerable queries over long video memory, where Semantic Flip achieves an $F_1$ score of 0.9559. The source codes and datasets are publicly available at https://github.com/ndb796/SemanticFlip.

2606.16897 2026-06-16 cs.CL new submission

Contrastive-Difference CKA Reveals Concept-Specific Structural Alignment Across Language Model Architectures

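Xueping Gao

Affiliations: Alibaba Cloud

AI summary: Proposes contrastive-difference CKA (CKA_Delta) and finds that, across LLM architectures, geometric convergence of concept representations is decoupled from functional transferability. The method effectively separates concept-specific similarity from generic similarity.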

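LOPAL: Local Performance-Aware Active Learning from Imperfect Demonstrations

Johannes Heidersberger, Shail Jadav, Dongheui Lee

Affiliations: Autonomous Systems Lab, Institute of Computer Technology, TU Wien; Institute of Robotics and Mechatronics, German Aerospace Center (DLR)

AI summary: Proposes LOPAL, which exploits local demonstration quality information by encoding trajectories and quality assessments in a Gaussian mixture model and actively collecting corrective data through shared autonomy, improving task performance when learning from imperfect demonstrations.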

Comments Accepted for publication in IEEE Robotics and Automation Letters (RAL), 2026

DOI: 10.1109/LRA.2026.3698364

Abstract

Learning from Demonstration (LfD) enables intuitive robot skill acquisition by allowing robots to learn directly from human task demonstrations. However, current methods often fail to address the fact that, due to suboptimal and inconsistent human behavior, quality can vary within each demonstration. Therefore, we introduce LOPAL (LOcal Performance-aware Active Learning), an active learning approach that leverages this local demonstration quality information. Our approach consists of two synergistic components. First, a local performance-driven LfD method uses a Gaussian Mixture Model (GMM) to encode both the demonstrated trajectories and their associated local quality assessments. This enables the generation of trajectories that outperform the imperfect demonstrations by utilizing complementary local data of high performance. Second, active data acquisition makes it possible to improve beyond the imperfect demonstrations by collecting additional informative samples. In areas missing good data, the user is actively requested to provide corrections through a shared autonomy (SA) mechanism, while the robot autonomously executes the learned behavior. The efficacy of LOPAL was validated in both a simulation and a real-world experiment. The results from a real-world pipe inspection task showed that the proposed approach can achieve up to 27.31% improvement in task performance while also reducing the effort required to collect the demonstrations.

2606.16883 2026-06-16 cs.LG cs.AI new submission

Upper Bounds on the Generalization Error of Deep Learning Models via Local Robustness and Stability

Abdul-Rauf Nuhu, Parham M. Kebria, Vahid Hemmati, Mahmoud N. Mahmoud, Edward Tunstel, Abdollah Homaifar

Affiliations: North Carolina Agricultural and Technical State University; University of Alabama; Southwest Research Institute

AI summary: Proposes a generalization upper bound that scales the robustness term by the number of stable samples in each local region, yielding non-vacuous and tightest error estimates on ImageNet.

Abstract

Generalization is a critical property of data-driven models, particularly deep learning models deployed in safety-critical applications. Robustness-based generalization bounds have gained attention as a principled way to link robustness properties to generalization performance, often in a data-dependent manner. However, most existing bounds suffer from vacuousness in practical settings, yielding loose upper bounds that greatly exceed the actual error rates and limiting their usefulness for real-world evaluation. While this issue is often attributed to the uncertainty term, a substantial part of the problem originates from the robustness term itself, particularly for the 0-1 loss. Existing approaches typically treat the robustness term as a global measure, ignoring its variation across different sub-regions of the input space. In this work, we propose a generalization bound that addresses this limitation by scaling the robustness term according to the number of stable and unstable samples within each sub-region. Our bounds incorporate both data- and model-dependent factors while maintaining practical relevance (yielding tighter upper bounds on true error). Experiments on models trained on the ImageNet dataset show that our bounds remain consistently non-vacuous and achieve the tightest estimates among existing methods, closely aligning with empirical performance across a range of robust deep neural networks.

2606.16881 2026-06-16 cs.RO new submission

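Latent-Space Reinforcement Learning for Inverse Material Estimation in Food Fracture Simulation

Adrian Ramlal, Yuhao Chen, John S. Zelek

Affiliations: University of Waterloo

AI summary: Because material parameters in food fracture simulation are hard to measure directly, proposes a goal-conditioned policy based on latent-space reinforcement learning that estimates material parameters from a description of fracture behavior in a single forward pass, improving accuracy by 23%.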

Comments Accepted in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) 2026 MetaFood Workshop

Journal ref: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2026, pp. 9573-9581

Abstract

Realistic visual simulation of food manipulation requires accurate material parameters, yet these are difficult to measure directly and vary across the heterogeneous regions of a single food item. We address the inverse problem of estimating material parameters from a target description of fracture behavior in a non-differentiable continuum damage mechanics simulator. Using orange peeling as a test case, we train a neural surrogate on 2,000 forward simulations and compare Covariance Matrix Adaptation Evolution Strategy (CMA-ES, a gradient-free evolutionary optimizer) with Proximal Policy Optimization (PPO, a reinforcement learning algorithm) across the original 9-dimensional parameter space and two learned 4-dimensional latent representations. Since different oranges have different material properties, a practical inverse system must handle arbitrary targets without retraining. We train a goal-conditioned PPO policy that learns a general inverse mapping: given any target description of peeling behavior, the policy produces a material parameter estimate in a single forward pass (8 surrogate evaluations, approximately 10ms). Operating in a normalizing flow latent space with a shared surrogate evaluator, the goal-conditioned policy achieves 0.642 actual recovery when validated through the simulator, outperforming the original parameter space by 23%. A warm-start extension that initializes CMA-ES refinement from the policy's output further improves recovery to 0.828 with 540 evaluations. These findings provide a practical framework for inverse food physics and lay groundwork for vision-driven material identification from video observations of food manipulation.

2606.16868 2026-06-16 cs.CV cs.AI cs.DC new submission

Federated Medical Image Segmentation under Real-World Label Noise: A Benchmark Suite for Noisy Label Learning Method Selection

Markus Bujotzek, Dimitrios Bounias, Stefan Denner, Ralf Floca, Maximilian Fischer, Peter Neher, Klaus Maier-Hein

Affiliations: Division of Medical Image Computing, German Cancer Research Center; Medical Faculty, University of Heidelberg; Heidelberg Institute of Radiation Oncology (HIRO), National Center for Radiation Research in Oncology (NCRO); Pattern Analysis and Learning Group, Department of Radiation Oncology, Heidelberg University Hospital; Faculty of Mathematics and Computer Science, University of Heidelberg; National Center for Tumor Diseases (NCT), NCT Heidelberg, a partnership between DKFZ and the university medical center Heidelberg

AI summary: To address real-world label noise in federated learning (such as contour disagreement and missing or confused structures), proposes a benchmark suite combining diverse real-world noisy datasets, client-noise scenarios, and targeted evaluation to support systematic assessment and selection of noisy label learning methods.

Abstract

While federated learning (FL) enables collaborative medical image segmentation without centralizing sensitive data, real-world deployment is frequently complicated by cross-site label imperfections such as contour disagreement, missing or additional structures, and confused labels. Federated noisy label learning (FNLL) aims to mitigate these effects, yet remains underused in practice as existing evidence is largely based on synthetic noise, simplified settings, and limited real-world noisy evaluation. We address this gap by introducing a benchmark suite that combines diverse real-world noisy datasets, deployment-relevant client-noise scenarios, and label-noise-targeted evaluation to support systematic FNLL assessment and informed method selection. The suite combines curated real-world noisy medical image segmentation datasets from diverse sources with a comprehensive federated segmentation framework including various client-noise scenarios and noise-targeted evaluation. The presented suite provides a realistic and discriminative basis for FNLL evaluation in medical image segmentation and establishes a reusable foundation for fair benchmarking, dataset-specific label-noise characterization, and future method development under realistic federated settings. Code is available at https://github.com/MIC-DKFZ/FedSegNoiseBench.

2606.16867 2026-06-16 cs.CL new submission

Revisiting the Systematicity in Negation in the Era of In-Context Learning

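Hitomi Yanaka, Taisei Yamamoto

Affiliations: The University of Tokyo; RIKEN; Tohoku University

AI summary: Through analyses of behavioral and representational systematicity, finds that large language models using in-context learning can partially recognize negation expressions and scope but do not perform perfectly, and that function vectors can be composed for negation cue extraction while scope recognition remains more challenging.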

Comments Accepted to the 6th Workshop Natural Language Meets Logic and Machine Learning (NALOMA2026) at ESSLLI2026

Abstract

Understanding the meaning of negated sentences remains one of the challenges for language models, even in the era of large language models (LLMs). We analyze systematicity regarding LLM understanding of negation from two perspectives: behavioral systematicity and representational systematicity. For behavioral systematicity, we confirm that through demonstrations and in-context learning, LLMs can recognize negation expressions and scope within sentences to some extent, but they fail to achieve perfect performance. In particular, the difficulty of the negation scope recognition for models varies depending on the output format. For representational systematicity, we analyze the extent to which function vectors can be robustly constructed from in-context examples for tasks that are essential to understanding negation. The experiments suggest that while function vectors can be composed for negation cue extraction tasks, extracting function vectors for recognizing scope is more challenging.

2606.16866 2026-06-16 cs.CV new submission

Redirecting the Flow: Image Customization through Attention Distribution Shift

Jie Li, Suorong Yang, Jian Zhao, Furao Shen

Affiliations: State Key Laboratory for Novel Software Technology, Nanjing University; School of Artificial Intelligence, Nanjing University; School of Computer Science, Nanjing University; School of Electronic Science and Engineering, Nanjing University

AI summary: Derives a Conditional Attention Distribution Shift formulation from maximum entropy theory and, building on it, proposes the dual-branch architecture CustomShift for efficient subject-driven image generation, outperforming existing methods on the DreamBooth and Custom101 benchmarks.

Abstract

Subject-driven image customization aims to generate images that not only follow textual instructions but also preserve the identity of a given reference subject. Existing approaches, including test-time fine-tuning, encoder-based methods, and token competition in shared attention spaces, suffer from limited efficiency, misalignment between extracted reference features and the generative process, and interference from irrelevant information. To address these limitations, we formulate the customization task as a distribution shift induced by incorporating reference images into text-to-image generation, and derive a Conditional Attention Distribution Shift formulation grounded in maximum entropy theory. Building on this formulation, we propose CustomShift, a dual-branch architecture based on Stable Diffusion 3. The Reference-Alignment Branch leverages self-attention between reference images and subject names to achieve layer-wise alignment with latent representations, while the Cross-Guidance Branch integrates textual and reference cues to guide generation. Experiments on the DreamBooth and Custom101 benchmarks demonstrate that our method consistently outperforms state-of-the-art approaches, achieving a better balance between semantic fidelity and subject consistency.

2606.16863 2026-06-16 cs.LG new submission

HawkesNest: A Multi-Axis Synthetic Benchmark for Spatiotemporal Pattern Complexity

Yahya Aalaila, Sumantrak Mukherjee, Gerrit Großmann, Sebastian Vollmer

Affiliations: German Research Center for Artificial Intelligence (DFKI), Data Science and its Applications Research Group, Kaiserslautern, Germany; Department of Computer Science, Rhineland-Palatinate Technical University of Kaiserslautern-Landau (RPTU), Kaiserslautern, Germany

AI summary: Proposes the HawkesNest benchmark, which defines four complexity axes on a multivariate Hawkes process to enable controlled testing of spatiotemporal point process models under known structural difficulty.

Abstract

Evaluation of spatiotemporal point process (STPP) models relies heavily on opaque real-world datasets, where latent generative structure is unknown and model failures are difficult to attribute. We introduce HawkesNest, a generator-aligned benchmark for controlled spatiotemporal pattern complexity built on a multivariate Hawkes backbone. HawkesNest defines four complexity axes: space-time entanglement, background heterogeneity, cross-type interaction, and domain topology. Each axis is associated with a deterministic index computed from the latent data-generating mechanism. By varying these axes while holding global rate, stability, and simulation budget fixed, HawkesNest enables diagnostic stress tests of STPP models under known structural difficulty. We verify that the indices are monotone and nearly orthogonal under controlled sweeps. We illustrate its use by showing that Hawkes-family baselines degrade under joint heterogeneity-entanglement complexity, even though they are structurally aligned with the Hawkes data-generating backbone. We further show that HawkesNest exposes neural-model sensitivity: AutoSTPP remains vulnerable under isolated increases in space-time entanglement. Code is available at https://github.com/YahyaAalaila/HawkesNest.
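
For readers unfamiliar with the Hawkes backbone mentioned in the abstract, here is a minimal Python sketch of a univariate, purely temporal Hawkes process with an exponential kernel, simulated with Ogata's thinning algorithm. It is far simpler than a multivariate spatiotemporal generator and is not taken from the HawkesNest code. The parameter names `mu` (background rate), `alpha` (branching ratio), and `beta` (decay rate) are assumptions chosen for illustration, with intensity $\lambda(t) = \mu + \sum_{t_i \le t} \alpha \beta e^{-\beta (t - t_i)}$.

```python
import math
import random


def intensity(t, events, mu, alpha, beta):
    """Conditional intensity at time t given past event times (exponential kernel)."""
    return mu + sum(
        alpha * beta * math.exp(-beta * (t - s)) for s in events if s <= t
    )


def simulate_hawkes(mu, alpha, beta, horizon, seed=0):
    """Simulate event times on [0, horizon) with Ogata's thinning algorithm.

    `alpha` is the expected number of direct offspring per event; values below 1
    keep the process stable (hypothetical parameterization for this sketch).
    """
    if mu <= 0 or beta <= 0 or not 0 <= alpha < 1:
        raise ValueError("need mu > 0, beta > 0, and 0 <= alpha < 1")
    rng = random.Random(seed)
    events = []
    t = 0.0
    while True:
        # The intensity only decays until the next event, so its current value
        # (including the jump from an event at exactly t) is a valid upper bound.
        lam_bar = intensity(t, events, mu, alpha, beta)
        t += rng.expovariate(lam_bar)  # candidate from a Poisson process of rate lam_bar
        if t >= horizon:
            break
        # Accept the candidate with probability lambda(t) / lam_bar.
        if rng.random() * lam_bar <= intensity(t, events, mu, alpha, beta):
            events.append(t)
    return events


if __name__ == "__main__":
    times = simulate_hawkes(mu=1.0, alpha=0.5, beta=2.0, horizon=20.0, seed=1)
    print(len(times), "events; first few:", [round(x, 3) for x in times[:5]])
```

In this parameterization, raising `alpha` toward 1 makes events cluster more strongly while `mu` controls the baseline rate, a rough one-dimensional analogue of varying one axis while holding others fixed.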

2606.16861 2026-06-16 cs.CV new submission

An Open-Source Monitoring Framework for Data Exploration and Progress Tracking in Multi-Center Radiology Studies

Markus Bujotzek, Jonas Scherer, Stefan Denner, Peter Neher, Benjamin Hamm, Lorenz Feineis, Uenal Akuenal, Andreas Bucher, Tobias Penzkofer, Klaus Maier-Hein

Affiliations: German Cancer Research Center; University of Heidelberg; University Hospital Frankfurt; Charité Universitätsmedizin Berlin; Berlin Institute of Health

AI summary: Proposes a lightweight open-source monitoring architecture based on Grafana and Prometheus that aggregates and visualizes metrics from distributed sites to enable privacy-preserving data exploration and progress monitoring. It has been deployed and validated across the 38 university hospitals of the German RACOON consortium.

Abstract

Multi-center studies are crucial for advancing medical and radiological research. Data exploration, collaboration discovery, and study progress monitoring are essential for maximizing their potential. However, in practice these processes often rely on manual communication and shared tables, which quickly become outdated and hinder efficient coordination in large distributed studies. This highlights the need for dedicated monitoring solutions that provide transparent and up-to-date insights into study progress. We propose a lightweight, open-source monitoring architecture for multi-center studies based on the widely used Grafana-Prometheus stack. The framework collects aggregated monitoring metrics from distributed study sites and visualizes them through configurable dashboards. As a real-world deployment example, the framework is integrated into the medical imaging platform Kaapana and evaluated within a large multi-center research network. By deploying our solution within the Germany-wide RACOON consortium, we demonstrate its ability to enable privacy-preserving data exploration and study progress monitoring across all 38 German university clinics. The monitoring framework supports transparent coordination of distributed research activities and can facilitate more efficient management of large-scale multi-center studies. The source code and Kaapana integration are publicly available at https://github.com/MIC-DKFZ/study-monitoring-kaapana.
