Journal ref: Proc. Samahang Pisika Pilipinas 42, SPP-2024-2E-05 (2024)

Abstract

Image data on galaxy morphology is expected to increase in both quantity and quality over the coming years; it is therefore important to explore which deep learning architectures adapted for image classification tasks are cost-effective. Residual and Inception networks are well suited to exploring classification convolutional neural networks (CNNs) because of their computational efficiency, achieved through techniques such as residual connections and parallelized Inception modules, which enable deeper networks without excessively increasing computational complexity. In this work, we analyze the performance of ResNet101 and InceptionV4 on a spatially augmented Galaxy10 DECaLS dataset. Retaining the ten-class classification of galaxies, we modify the image count of each class. We find that the ResNet101 and InceptionV4 models achieve accuracies of $\sim$90%, comparable with the performance reported in the literature. In terms of performance metrics, ResNet101 is superior to InceptionV4. Our results indicate that either of these CNN architectures could serve as a robust foundation for specialized pipelines for classifying galaxy images from upcoming surveys.

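2606.08816 2026-06-09 cs.LG cs.AI New submission

Active Flow Expansion for Out-of-Distribution Discovery: From Theory to Molecules

Riccardo De Santi, Bruce Lee, Cristian Perez Jensen, Kimon Protopapas, Sophia Tang, Cheng-Hao Liu, Pranam Chatterjee, Yisong Yue, Andreas Krause

Affiliations: ETH Zurich; ETH AI Center; University of Pennsylvania; Caltech; FutureHouse

AI summary: Proposes Active Flow Expansion (ActFlow), which uses verifier feedback and active exploration to expand the generable set of a pre-trained flow model so that it covers more of the valid design space. The paper proves statistical learning guarantees and reports that ActFlow outperforms existing methods on molecule and protein tasks.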

Abstract

Standard flow and diffusion pre-training matches the distribution of available data (e.g., molecules), which often covers only a small fraction of the valid design space. In generative discovery, however, one aims to sample valid new-to-nature designs, which are assigned negligible probability under, and are thus inaccessible to, standard models fitted to the observed data. To overcome this limitation, we depart from data distribution matching and view a generative model through its generable set: the region it covers with non-negligible probability. This allows us to introduce a new learning principle for out-of-distribution flow modeling: enlarging a model's generable set to increase coverage of the valid design space. We propose Active Flow Expansion (ActFlow), a continued pre-training method that employs verifier feedback to expand a pre-trained model over new valid regions by iteratively adapting to synthetic data generated through active exploration in the learned flow representation. Theoretically, we establish what are, to our knowledge, first-of-their-kind statistical learning guarantees for out-of-distribution flow modeling, analyzing generable set expansion as a local-to-global reachability process over a learned representation. Empirically, we assess ActFlow with suitable out-of-distribution generative modeling metrics across small organic molecules, mid-sized drug-like molecules, therapeutic peptides, and protein sequence design tasks. Results show that ActFlow expands valid coverage far beyond the region modeled by the initial pre-trained model, significantly outperforming widely adopted synthetic flow pre-training methods.

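2606.08800 2026-06-09 cs.AI New submission

Bridging Expert Knowledge and Automated Feature Engineering via Self-Evolution

Varun Khurana, Vijval Ekbote, Vashu Chauhan, Yaman Kumar Singla, Rajiv Ratn Shah, Balaji Krishnamurthy

Affiliations: Adobe Media and Data Science Research; IIIT-Delhi

AI summary: Proposes FEST, which combines dual-stream feature generation, semantic deduplication, and tree-guided iterative evolution to discover auditable features from raw text and images. It reports a mean gain of 4.2 percentage points on tasks such as brand classification and 60-80% coverage of expert-designed features.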

Abstract

In high-stakes settings such as brand compliance, clinical care, and content moderation, machine learning cannot be deployed as opaque oracles: practitioners inspect the features driving model decisions, and models must leverage the expert documentation governing these domains. In practice, the data arrives as unstructured content, and features extracted from it must be interpretable, discriminative, and aligned with what experts consider important. Existing methods fall short: they target tabular inputs, lack demonstrated expert alignment, and cannot operationalize qualitative criteria such as 'maintain professional tone' into precise features. We present FEST (Feature Engineering with Self-evolving Trees), combining dual-stream feature generation (semantic and deterministic), semantic deduplication, and tree-guided iterative evolution to discover auditable features from raw text and images. FEST leads in 17 of 20 classifier-task combinations across brand classification, content authenticity detection, and stress detection, with a mean gain of 4.2 pp over the strongest baseline across five classifiers. An LLM-as-judge evaluation shows FEST achieves 60-80% coverage of expert-designed brand features at strict semantic-alignment thresholds, corroborated by a human expert study rating features highly on relevance, clarity, and actionability. When seeded with expert guidelines, FEST refines qualitative criteria into operational features, improving accuracy by 6-12 pp on average across brands. To enable systematic evaluation of expert alignment in automated feature engineering, we release BrandGuide, the first dataset pairing expert-designed features with 1M+ assets across 2,683 brands. By grounding feature engineering in expert knowledge, FEST opens a practical pathway for interpretable ML in domains demanding human oversight.

2606.08797 2026-06-09 cs.LG cs.AI New submission

Scaling Decision-Focused Learning to Large Problems with Lagrangian Decomposition

Stéphane Eilles-Chan Way, Hugo Percot, Quentin Cappart, Tias Guns, Louis-Martin Rousseau

Affiliations: Polytechnique Montréal; Ecole Polytechnique; UCLouvain; Mila - Québec AI Institute; KU Leuven

AI summary: Proposes a decision-focused learning framework that incorporates Lagrangian decomposition. With a new surrogate objective and two loss functions, it handles large-scale constrained optimization problems while remaining parallelizable; experiments show it outperforms traditional methods on instances with up to eight times more variables.

Abstract

Decision-focused learning has shown great promise for addressing predict-then-optimize problems, particularly in the presence of under-specified models. However, its practical deployment is often hindered by high computational costs and limited scalability, as it requires solving a constrained optimization problem for each training instance at every iteration. To address these challenges, we propose a novel framework that incorporates Lagrangian decomposition into the decision-focused learning paradigm. Specifically, we introduce a new surrogate objective along with two loss functions for evaluating and training the underlying prediction model. We further propose two variants of our approach, which offer different trade-offs between computational efficiency and solution quality. Our framework can be seamlessly integrated with standard decision-focused learning methods, including Smart Predict-then-Optimize (SPO+) and Implicit Maximum Likelihood Estimation (IMLE). Through experiments on two standard benchmarks, the multi-dimensional knapsack problem and quadratic portfolio optimization, we demonstrate that our approach achieves competitive performance while remaining amenable to parallelization. In particular, it consistently outperforms traditional decision-focused learning methods on large-scale instances, involving up to eight times more variables than those typically considered in related work. The implementation is available at https://github.com/corail-research/DFL-LD.
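
To make the cost argument concrete, the sketch below computes decision regret, the quantity decision-focused learning typically targets, on a toy single-constraint knapsack. It is only an illustration under stated assumptions: the brute-force solver, the function names `solve_knapsack` and `regret`, and the toy numbers are hypothetical and are not taken from the paper, which studies multi-dimensional knapsack and quadratic portfolio problems and uses Lagrangian decomposition rather than enumeration.

```python
from itertools import product

import numpy as np


def solve_knapsack(values, weights, capacity):
    """Brute-force 0/1 knapsack; returns the best feasible 0/1 selection."""
    best_x, best_val = None, -np.inf
    for bits in product([0, 1], repeat=len(values)):
        x = np.array(bits)
        # Keep the selection only if it fits and improves the objective.
        if weights @ x <= capacity and values @ x > best_val:
            best_x, best_val = x, values @ x
    return best_x


def regret(pred_values, true_values, weights, capacity):
    """True value lost by optimizing with predicted instead of true values."""
    x_pred = solve_knapsack(pred_values, weights, capacity)
    x_true = solve_knapsack(true_values, weights, capacity)
    return float(true_values @ x_true - true_values @ x_pred)


# Toy instance (illustrative numbers only).
true_values = np.array([10.0, 6.0, 3.0])
weights = np.array([5.0, 4.0, 3.0])
pred_values = np.array([5.0, 6.0, 3.0])  # underestimates the first item
print(regret(pred_values, true_values, weights, capacity=7.0))
```

Every call to `regret` solves the optimization problem, so a training loop that evaluates it per instance and per iteration pays that solver cost repeatedly, which is the scalability bottleneck the abstract describes.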

2606.08795 2026-06-09 cs.CV New submission

PairWise Image Finder: An Open-source Tool for Finding Visually Aligned Street-Level Image Pairs for Urban Perception Studies

Jussi Torkko

Affiliations: Digital Geography Lab, Department of Geosciences and Geography, University of Helsinki

AI summary: Presents the PairWise image finder, which integrates feature detection and matching with semantic segmentation masks to quantify the visual alignment of images from different time periods. It outputs the share of matched features, their distance and coverage, and semantic mask alignment, so users can filter for high-quality image pairs for longitudinal change studies and reduce manual effort.

Comments: 6 pages, two figures, GitHub repo link near the end

Abstract

Change detection and scene recognition techniques have been widely applied to Street View Imagery (SVI) to understand changes in scenes across the years. However, metadata alone is often insufficient to reliably find visually aligned image pairs. This study introduces the PairWise image finder, a tool that integrates feature detection and matching, supported by semantic segmentation masks to quantify the visual alignment of two images of varying time periods. The tool outputs the share of matched key features, the matched feature distance and coverage, and the alignment of semantic masks, which enables the user to filter image pairs depending on the alignment quality and use case. The visually aligned pairs derived from the tool can be used to accurately study explicit longitudinal change and help reduce manual effort for perception studies. The usability of the tool is demonstrated through a comparison of longitudinal changes, highlighting the importance of perspective when quantifying changes. The proposed method provides a scalable and open tool for researchers and stakeholders to find high-quality image pairs for urban analysis, perception and related applications.
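
As a rough illustration of what a "share of matched key features" and a "matched feature distance" could look like, the sketch below matches two sets of feature descriptors with a nearest-neighbor ratio test. This is an assumption-laden example, not the tool's actual pipeline: the function name `matched_share`, the ratio threshold of 0.75, and the use of plain Euclidean distance on generic descriptor arrays are all hypothetical choices.

```python
import numpy as np


def matched_share(desc_a, desc_b, ratio=0.75):
    """Share of descriptors in A with a confident match in B, plus mean match distance.

    desc_a and desc_b are (n, d) arrays; desc_b needs at least two rows.
    """
    # Pairwise Euclidean distances, shape (len(desc_a), len(desc_b)).
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=2)
    ordered = np.sort(dists, axis=1)
    best, second = ordered[:, 0], ordered[:, 1]
    # Ratio test: accept a match only if it is clearly better than the runner-up.
    good = best < ratio * second
    share = float(good.mean())
    mean_dist = float(best[good].mean()) if good.any() else float("nan")
    return share, mean_dist


# Toy descriptors (illustrative values only).
desc_a = np.array([[0.0, 0.0], [10.0, 10.0], [5.0, 5.0]])
desc_b = np.array([[0.0, 0.1], [10.0, 10.1], [100.0, 100.0]])
print(matched_share(desc_a, desc_b))
```

Thresholding on outputs of this kind is one way a user could filter image pairs by alignment quality, as the abstract describes.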

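2606.08792 2026-06-09 cs.CL New submission

The Amplifying Mirror: Locating and Steering the Partisan Direction inside a Large Language Model

Wendy K. Tam

Affiliations: Vanderbilt University; University of Illinois at Urbana-Champaign

AI summary: Uses linear probes to locate a partisan political identity direction in the hidden states of Llama 3.1 8B Instruct and sparse autoencoders to decompose it into interpretable features. Causal interventions along this direction systematically change the model's output, indicating that partisan bias is a geometric feature that can be located and steered.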

Abstract

Large language models are rapidly replacing search engines as the primary interface between people and information. Unlike search engines, which retrieve existing content, LLMs generate novel text shaped by internal representations learned during training. Here we show that partisan political identity is encoded in the model's activation space, and that this direction directly shapes generation. Using 190,491 tweets from sitting members of the U.S. Congress as labeled training data, we train linear probes on the hidden states of the Llama 3.1 8B Instruct model. We identify a single geometric axis at layer 18 that separates Republican from Democratic text with an AUC of 0.945 and a Cohen's d of 1.94, and use sparse autoencoders to decompose that axis into interpretable partisan features. Causally intervening along this axis, ablating or amplifying the partisan component mid-generation, produces systematic shifts in the model's output. We witness stance reversals, register shifting, and structured fabrications of authority. Our results demonstrate that partisan bias in language models is not a vague emergent property but a learned geometric feature that can be precisely located and steered. Partisan bias is not a bug to be patched, but a structural property of how these models encode information about their users. As LLMs displace search engines as the interface to knowledge, understanding that product design (and its consequences) will be essential for navigating the legal, social, and political transitions from an information ecosystem that is curated to one that is generated.
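
The operations named in the abstract (a separating axis, Cohen's d along it, and ablating or amplifying a component) can be sketched on plain arrays. The example below is a minimal illustration under assumptions: it uses a difference-of-means direction as a stand-in for the paper's trained linear probe, synthetic vectors in place of real hidden states, and hypothetical helper names (`unit_direction`, `ablate`, `amplify`).

```python
import numpy as np


def cohens_d(a, b):
    """Cohen's d between two 1-D samples, using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2))
    return float((a.mean() - b.mean()) / pooled)


def unit_direction(h_a, h_b):
    """Difference-of-means direction (an assumed stand-in for a linear probe)."""
    v = h_a.mean(axis=0) - h_b.mean(axis=0)
    return v / np.linalg.norm(v)


def ablate(h, v):
    """Remove the component of each row of h along the unit vector v."""
    return h - np.outer(h @ v, v)


def amplify(h, v, alpha):
    """Shift each row of h by alpha along the unit vector v."""
    return h + alpha * v


# Synthetic "hidden states" for two groups (not real model activations).
rng = np.random.default_rng(0)
h_a = rng.normal(size=(200, 16)) + 1.0
h_b = rng.normal(size=(200, 16)) - 1.0
v = unit_direction(h_a, h_b)
print(cohens_d(h_a @ v, h_b @ v))                  # separation along the axis
print(np.abs(ablate(h_a, v) @ v).max())            # near zero after ablation
```

After ablation the projection onto `v` vanishes, so any downstream computation can no longer read the group signal from that single axis; amplification moves states further along it instead.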

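2606.08790 2026-06-09 cs.AI cs.CR cs.MA New submission

RAILS: Verification-Native Clearing For Agentic Commerce

Adrian de Valois-Franklin, Alex Bogdan

Affiliations: Evolutionairy AI

AI summary: Addresses the lack of a neutral clearing mechanism for autonomous agents in commerce by proposing the RAILS protocol, which provides verification-native clearing through reliability scores, reliability records, and a clearing function, so that financial settlement rests on sufficient evidence.

Comments: 49 pages, 15 figures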

Abstract

Autonomous agents negotiate, purchase, deploy code, and move funds, but no neutral mechanism determines whether they met their delegated obligation, who is responsible when they did not, or which settlement action follows. This is the agentic clearing problem. Tool protocols (MCP), inter-agent communication (A2A), payment rails (x402), mandate and network agent protocols (AP2, Visa, Mastercard), and settlement-risk standards each assume that determination and none produce it. Clearing is the missing primitive. Payment is not clearing. Authorization is not clearing. LLM-as-judge evaluation is not clearing. Settlement-risk escrow is not clearing: it consumes clearing decisions. RAILS (Real-Time Agent Integrity & Ledger Settlement) is the integrity and clearing layer for agentic commerce, spanning a per-output reliability score, a published reliability record, and a clearing function that consumes them. The clearing protocol at its core closes that gap. Seven primitives (Obligation Object, Evidence Envelope, Verification Mesh, Clearing Decision, Settlement Instruction, Clearing Passport, Finality Rules), bound by a formal model of admissibility-graded verification, together yield a soundness property: no financially material settlement is supported by evidence below the obligation's admissibility floor. The property is falsifiable against the spec. We are not aware of a prior agent-commerce verification mechanism that states a property of this kind. The approaches nearest to it emit a pass, a delivery guarantee, a bare score, or an equilibrium. This paper specifies that clearing protocol.

2606.08788 2026-06-09 cs.CV New submission

MaskAlign: Token-Subset Representation Alignment for Efficient Diffusion Training

Lianyu Pang, Tianlin Pan, Cheng Da, Changqian Yu, Huan Yang, Kun Gai, Song Guo, Wenhan Luo

Affiliations: The Hong Kong University of Science and Technology; Kuaishou Technology; University of Chinese Academy of Sciences

AI summary: Addresses the token-level information mismatch that arises when aligning diffusion model representations with pretrained vision models. MaskAlign applies alignment to randomly sampled token subsets and adds a pre-mask token mixing block to reduce information loss, improving training efficiency and generation quality.

Abstract

Representation alignment with pretrained vision models has recently shown strong potential for accelerating diffusion transformer training. By aligning intermediate diffusion features with clean-image representations from self-supervised vision encoders, existing methods improve convergence and generation quality. However, such alignment also introduces a non-trivial constraint: diffusion models operate on noisy inputs whose usable information varies across timesteps, while the reference features are extracted from clean images. In this paper, we revisit this mismatch from a token-level perspective. We find that, under full-token representation alignment, tokens with large alignment-gradient norms exhibit a stable spatial preference, suggesting that the alignment objective does not affect all tokens uniformly and may encourage the model to rely on the complete set of clean-image tokens. To address this issue, we propose MaskAlign, a token-subset representation alignment method that applies alignment to randomly sampled token subsets during training. By exposing the model to different token subsets across iterations, MaskAlign reduces the dependence of representation alignment on the complete token set and encourages alignment behavior that is more stable under token-subset perturbations. To mitigate the information loss caused by directly dropping tokens, we further introduce a lightweight pre-mask token mixing block that shares information across tokens before masking.
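
The core idea of applying alignment only to a random token subset can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes a cosine-similarity alignment loss, treats the diffusion features as already projected to the encoder's dimension, and omits the pre-mask token mixing block entirely. The function name `subset_alignment_loss` and the `keep_ratio` parameter are hypothetical.

```python
import numpy as np


def subset_alignment_loss(feats, targets, keep_ratio, rng):
    """Mean (1 - cosine similarity) over a randomly sampled subset of tokens.

    feats:   (num_tokens, dim) intermediate diffusion features (assumed projected).
    targets: (num_tokens, dim) clean-image features from a vision encoder.
    """
    num_tokens = feats.shape[0]
    k = max(1, int(round(keep_ratio * num_tokens)))
    # Sample a fresh token subset without replacement at every call.
    idx = rng.choice(num_tokens, size=k, replace=False)
    f, t = feats[idx], targets[idx]
    cos = np.sum(f * t, axis=1) / (np.linalg.norm(f, axis=1) * np.linalg.norm(t, axis=1))
    return float(np.mean(1.0 - cos)), idx


# Synthetic features (illustrative only).
rng = np.random.default_rng(0)
targets = rng.normal(size=(16, 8))
feats = targets + 0.1 * rng.normal(size=(16, 8))
loss, idx = subset_alignment_loss(feats, targets, keep_ratio=0.25, rng=rng)
print(loss, sorted(idx.tolist()))
```

Because a different subset is drawn on each iteration, no fixed group of tokens receives the alignment signal every step, which matches the abstract's goal of reducing dependence on the complete token set.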

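2606.08780 2026-06-09 cs.CV New submission

Beyond Consistency: Preserving Temporal Structure in Zero-Shot Video Editing

Deyin Liu, Yisheng Ding, Zhe Jin, Xiatian Zhu, Anjan Dutta, Lin Wu

Affiliations: Anhui University; University of Surrey; University of Warwick

AI summary: Proposes a zero-shot video editing method that adaptively partitions the video into clips, selects anchor frames, and uses a token merging strategy to explicitly preserve the source video's temporal structure for the first time, balancing editing fidelity and computational efficiency.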

Abstract

Existing zero-shot video editing methods rely on pre-trained diffusion models and successfully achieve spatial control and basic temporal consistency, but they fundamentally fail to preserve the video's original temporal structure. This distinction is critical: temporal consistency ensures visual smoothness, but temporal structure dictates the video's high-level narrative, rhythm, and semantic flow. Without this preservation, the edited output, especially for long videos with complex semantic variations, becomes narratively incoherent and semantically ambiguous. To address this limitation, we introduce a novel zero-shot editing approach that, for the first time, explicitly focuses on preserving the source video's temporal structure. We achieve this by adaptively partitioning the video into semantically distinct clips based on feature similarity and selecting a representative anchor frame for each clip. To enhance both intra-clip fidelity and computational efficiency, we design a clip-adaptive token merging strategy that leverages the anchor's semantic dominance to stabilize the editing. Furthermore, we employ an alternating combination strategy that ensures seamless inter-clip transitions while maintaining semantic distinction. Extensive experiments demonstrate that our method achieves state-of-the-art results, successfully balancing the preservation of original temporal structure with computational efficiency, and setting a new benchmark for zero-shot video editing fidelity.
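
The partition-and-anchor step can be illustrated with a small sketch. The abstract does not specify the exact criteria, so everything here is an assumption: a clip boundary is placed wherever the cosine similarity between consecutive frame features falls below a fixed `threshold`, and the anchor is the frame most similar to its clip's mean feature. The function name `partition_clips` is hypothetical.

```python
import numpy as np


def partition_clips(frame_feats, threshold=0.8):
    """Split frames into clips at similarity drops and pick one anchor per clip.

    frame_feats: (num_frames, dim) array of per-frame feature vectors.
    Returns a list of (start, end) index pairs (end exclusive) and anchor indices.
    """
    f = frame_feats / np.linalg.norm(frame_feats, axis=1, keepdims=True)
    # Cosine similarity between each frame and the next one.
    sims = np.sum(f[:-1] * f[1:], axis=1)
    cuts = [0] + [i + 1 for i, s in enumerate(sims) if s < threshold] + [len(f)]
    clips = list(zip(cuts[:-1], cuts[1:]))
    anchors = []
    for start, end in clips:
        centroid = f[start:end].mean(axis=0)
        # Anchor: the frame whose feature is closest to the clip centroid.
        anchors.append(start + int(np.argmax(f[start:end] @ centroid)))
    return clips, anchors


# Six toy frames forming two visually distinct groups (illustrative only).
feats = np.array([[1.0, 0.0], [1.0, 0.1], [1.0, 0.2],
                  [0.0, 1.0], [0.1, 1.0], [0.2, 1.0]])
print(partition_clips(feats))
```

A fixed threshold is the simplest possible rule; an adaptive scheme, as the abstract suggests, would set the boundary criterion from the video's own similarity statistics.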

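2606.08777 2026-06-09 cs.LG cs.AI New submission

How Many Counterfactuals Does It Take? Probing VLM Hallucinations Through Circuits and Causal Effects

Abhivansh Gupta, Simardeep Singh, Advika Sinha, Shreyansh Modi, Akshat Tomar

Affiliations: University of California, Berkeley; DeepMind

AI summary: Defines a causal influence measure based on log-probability differences and uses circuit discovery techniques to study the counterfactual robustness of hallucinated outputs in vision-language models, deriving the minimum number of counterfactual samples needed to detect instability.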

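2606.08775 2026-06-09 cs.RO cs.AI New submission

AUCp: A Pseudo-AUC for Selecting the Inference Model in Abnormality Detection Without Annotated Validation Data

Md Mahfuzur Rahman Siddiquee, Fazle Rafsani, Jay Shah, Teresa Wu, Catherine D Chong, Todd J Schwedt, Baoxin Li

Affiliations: arXiv

AI summary: Proposes the AUCp metric, which selects the best inference model for unsupervised and self-supervised abnormality detection methods without an annotated validation set. It computes a pseudo-AUC by treating all test-set samples as abnormal; theory and experiments show it outperforms conventional metrics.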

DOI: 10.1109/TMI.2026.3684946
Journal ref: IEEE Transactions on Medical Imaging (Early Access), 2026

Abstract

Abnormality detection is a crucial yet challenging task in medical image analysis. Distinguishing abnormalities from normal data by learning to reconstruct normal-only data alleviates the reliance on labeled datasets. However, many studies, even if unsupervised, rely on a labeled validation set to select the best model for inference from multiple training iterations. For many diseases, labeled data are unavailable and time-consuming to obtain. To address this, AUCp, a novel metric that supports abnormality detection for unsupervised and self-supervised methods, is proposed. Instead of evaluating the realism of reconstructed images to select the best model for inference, it focuses on actual detection performance without requiring an annotated test set. AUCp scores are derived by assuming a pseudo ground truth of abnormal/positive for all unannotated samples in the test set and applying the traditional AUC calculation. Given a large and representative training set of normal samples, we show mathematical and empirical evidence that model selection using AUCp scores improves disease detection for unsupervised and self-supervised methods over conventional metrics. Using two unsupervised methods for neurologic disease detection and self-supervised methods on diverse datasets, our results demonstrate that the AUCp score effectively identifies the optimal model for inference, significantly enhancing abnormality and disease detection. The corresponding implementations are available at https://github.com/mahfuzmohammad/AUCp.
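
A pseudo-AUC of this kind can be sketched in a few lines. The abstract does not say which samples act as negatives, so the example below makes a loudly stated assumption: known-normal samples (for instance, held out from the normal training set) serve as negatives, and every unannotated test sample is treated as positive. The function names `auc` and `select_checkpoint` and all scores are hypothetical; the authors' repository is the reference for the actual method.

```python
import numpy as np


def auc(neg_scores, pos_scores):
    """Probability that a positive scores higher than a negative (ties count half)."""
    neg = np.asarray(neg_scores, dtype=float)[:, None]
    pos = np.asarray(pos_scores, dtype=float)[None, :]
    return float(np.mean((pos > neg) + 0.5 * (pos == neg)))


def select_checkpoint(normal_scores_by_ckpt, test_scores_by_ckpt):
    """Pick the checkpoint with the highest pseudo-AUC.

    Assumption: normal scores are negatives; all unlabeled test scores are
    treated as pseudo-positives, whatever their true status.
    """
    aucp = [auc(n, t) for n, t in zip(normal_scores_by_ckpt, test_scores_by_ckpt)]
    return int(np.argmax(aucp)), aucp


# Abnormality scores from two hypothetical checkpoints (illustrative only).
normal_scores = [[0.1, 0.4], [0.1, 0.2]]
test_scores = [[0.2, 0.3], [0.3, 0.4, 0.15]]
print(select_checkpoint(normal_scores, test_scores))
```

Since the test set also contains normal samples, the pseudo-AUC understates the true AUC, but it needs no labels, which is the property that makes it usable for checkpoint selection.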

2606.08741 2026-06-09 cs.RO New submission

Safe, Fluent and Acceptable Motion Generation and Execution for Human-Robot Interaction in Manufacturing Environments

Thibaut Lopez, Olivier Aycard, Pierre-Brice Wieber, Mohamed Boua, Christine Jeoffrion

Affiliations: GIPSA Lab; Grenoble Institute of Technology; Inria; LIP/PC2S; Univ. Grenoble Alpes; Univ. Savoie Mont Blanc

AI summary: For shared human-robot environments, proposes a motion generation strategy that combines safety with social awareness. An MPC framework generates four social behaviors, and a user study shows that robot behavior significantly affects social acceptability.

Abstract

Robots operating in human environments must not only ensure physical safety but also exhibit behaviors that are understandable, fluent, and acceptable to human partners. This paper investigates motion generation strategies that combine safety guarantees with interaction quality considerations, such as motion smoothness and human comfort. While the design of robots capable of ensuring safety in shared human-robot environments has enabled closer and more advanced forms of interaction, these new proximity-based tasks require moving beyond purely technical considerations. In particular, robot behavior must also be addressed from psycho-cognitive and social perspectives. In this context, we argue for the relevance of integrating social-aware motion control into robotic systems. First, we identify the motion parameters that influence human perception and operator experience. Then, we implement a Model Predictive Control (MPC) framework that generates four distinct socially-informed robot behaviors. Finally, we conduct a user study to evaluate and validate these behaviors and assess their social impact on non-expert participants. The results demonstrate that variations in robot behavior significantly affect the perceived social acceptability of the system. These findings highlight the importance of incorporating human-centered considerations into motion generation strategies for robots operating in shared environments.
