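arXivDaily daily academic digest: synced with the full arXiv data, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering & systems, and related fields.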

2505.22322 2026-05-26 cs.LG

A Closer Look on Memorization in Tabular Diffusion Model: A Data-Centric Perspective

Zhengyu Fang, Zhimeng Jiang, Huiyuan Chen, Xiaoge Zhang, Kaiyu Tang, Xiao Li, Jing Li

Affiliations: Department of Computer and Data Sciences; Case Western Reserve University; Department of Computer Science & Engineering; Texas A&M University; Department of Biochemistry; Center for RNA Science and Therapeutics; Department of Biomedical Engineering

AI summary: This paper presents the first data-centric study of memorization dynamics in tabular diffusion models. By quantifying how strongly each real sample is memorized, it finds that a small number of samples account for most of the leakage, and it proposes DynamicCut, a two-stage mitigation method.

Comments Published in Transactions on Machine Learning Research (TMLR), 2026

Abstract

Diffusion models have shown strong performance in generating high-quality tabular data, but they carry privacy risks by reproducing exact training samples. While prior work focuses on dataset-level augmentation to reduce memorization, little is known about which individual samples contribute most. We present the first data-centric study of memorization dynamics in tabular diffusion models. We quantify memorization for each real sample based on how many generated samples are flagged as replicas, using a relative distance ratio. Our empirical analysis reveals a heavy-tailed distribution of memorization counts: a small subset of samples contributes disproportionately to leakage, confirmed via sample-removal experiments. To understand this, we divide real samples into top- and non-top-memorized groups and analyze their training-time behaviors. We track when each sample is first memorized and monitor per-epoch memorization intensity (AUC). Memorized samples are memorized slightly earlier and show stronger signals in early training. Based on these insights, we propose DynamicCut, a two-stage, model-agnostic mitigation method: (a) rank samples by epoch-wise intensity, (b) prune a tunable top fraction, and (c) retrain on the filtered dataset. Across multiple tabular datasets and models, DynamicCut reduces memorization with minimal impact on data diversity and downstream performance. It also complements augmentation-based defenses. Furthermore, DynamicCut enables cross-model transferability: high-ranked samples identified from one model (e.g., a diffusion model) are also effective for reducing memorization when removed from others, such as GANs and VAEs.
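
The rank-and-prune idea in this abstract can be illustrated with a minimal sketch. It flags a generated row as a replica of its nearest real row when the ratio of nearest to second-nearest distance falls below a threshold, counts replicas per real row, and drops a tunable top fraction. The Euclidean distance, the 1/3 threshold, the function names, and the use of raw replica counts as the ranking score (the abstract ranks by epoch-wise intensity) are assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter


def replica_counts(real, generated, ratio_threshold=1 / 3):
    """Count, for each real row, how many generated rows are flagged as its replica.

    A generated row is flagged when d(nearest real) / d(second-nearest real)
    is below ratio_threshold (an assumed value, not taken from the paper).
    """
    counts = Counter()
    for g in generated:
        # Distances from this generated row to every real row, smallest first.
        dists = sorted((math.dist(g, r), i) for i, r in enumerate(real))
        (d1, nearest), (d2, _) = dists[0], dists[1]
        if d2 > 0 and d1 / d2 < ratio_threshold:
            counts[nearest] += 1
    return counts


def prune_top_fraction(real, scores, fraction):
    """Drop the given fraction of real rows with the highest scores."""
    n_drop = int(len(real) * fraction)
    ranked = sorted(range(len(real)), key=lambda i: scores.get(i, 0), reverse=True)
    dropped = set(ranked[:n_drop])
    return [row for i, row in enumerate(real) if i not in dropped]


# Toy data: row 0 is copied twice, row 2 once, and one generated row is novel.
real = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (9.0, 9.0)]
generated = [(0.01, 0.0), (0.0, 0.02), (5.1, 5.0), (3.0, 3.0)]
counts = replica_counts(real, generated)
filtered = prune_top_fraction(real, counts, fraction=0.25)
```

In this toy run the most-copied row is removed and the remaining rows would form the retraining set; the retraining step itself is not shown.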

2505.11758 2026-05-26 cs.CV cs.AI cs.GR cs.RO

Generalizable Vision-Language Few-Shot Adaptation with Predictive Prompts and Negative Learning

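Sriram Mandalika

Affiliations: Hasso Plattner Institute, University of Potsdam

AI summary: Proposes SCAN, a framework that addresses the handling of negative class signals in few-shot adaptation of vision-language models through query-adaptive negative routing, LLM-guided contrastive prompts, and an adaptive fusion weight, improving on prior methods by 4.61% on average across 11 benchmarks.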

Abstract

Few-shot adaptation of vision-language models remains fundamentally limited by how negative class signals are handled at inference. Existing methods apply uniform negative suppression across all queries, ignoring that the most damaging confusions are query-specific and shift with support-set geometry. We introduce SCAN (Selective Confusion-Aware Negatives), a framework that addresses this gap through three targeted contributions. In inference, query-adaptive negative routing restricts suppression to the top-K most confusable classes per query, requiring zero additional parameters. Generic negative text templates are replaced with LLM-bootstrapped contrastive prompts that describe discriminative attributes between confusable class pairs, sharpening the textual decision boundary where it matters most. A parameter-free adaptive fusion weight estimated from support-set Fisher discriminability removes the need for manual tuning of the vision-language trade-off. Evaluated across 11 standard benchmarks, SCAN consistently outperforms prior prompt-based and adapter-based methods by an average of 4.61% at 16-shot, with gains of up to 7.70% on fine-grained datasets where inter-class confusion is most severe. SCAN also generalizes strongly under distribution shift, improving by 2.95% on average across four ImageNet OOD variants, and maintains robust performance under significant label noise, with accuracy under 50% label corruption still exceeding the clean baseline of the strongest competing method.
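
A minimal sketch of query-adaptive negative routing follows, to make the contrast with uniform suppression concrete. It assumes that the "most confusable" classes for a query are simply the k classes with the highest positive similarity, and that suppression means subtracting a weighted negative-prompt score. Both choices, along with the function name and the `weight` parameter, are hypothetical and not taken from the paper.

```python
def route_negatives(pos_scores, neg_scores, k, weight=1.0):
    """Subtract negative-prompt scores only for the k highest-scoring classes.

    pos_scores[c]: similarity of the query to class c's positive prompt.
    neg_scores[c]: similarity of the query to class c's negative prompt.
    """
    top_k = sorted(range(len(pos_scores)), key=lambda c: pos_scores[c], reverse=True)[:k]
    routed = list(pos_scores)
    for c in top_k:
        routed[c] -= weight * neg_scores[c]
    return routed


# Classes 0 and 1 are the close candidates; classes 2 and 3 are left untouched.
pos = [0.80, 0.78, 0.10, 0.05]
neg = [0.30, 0.05, 0.90, 0.90]
scores = route_negatives(pos, neg, k=2)
prediction = max(range(len(scores)), key=lambda c: scores[c])
```

Because only the top-k entries are modified, the routing adds no learned parameters, which matches the "zero additional parameters" property stated in the abstract.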

2505.08155 2026-05-26 cs.AI

Efficient and Scalable Neural Symbolic Search for Knowledge Graph Complex Query Answering

Weizhi Fei, Zihao Wang, Hang Yin, Shukai Zhao, Wei Zhang, Yangqiu Song

Affiliations: Department of Mathematical Sciences, Tsinghua University; Department of Computer Science and Engineering, Hong Kong University of Science and Technology; Department of Computer Sciences, University of Rochester

AI summary: Proposes a neural-symbolic method that combines constraint strategies with local search to lower data complexity and approximately solve NP-hard cyclic queries, enabling efficient and scalable complex query answering.

Abstract

Complex Query Answering (CQA) is a crucial reasoning task over Knowledge Graphs (KGs), which aims to answer first-order logical queries from incomplete KGs. While existing neural-symbolic methods achieve strong performance, they face significant complexity bottlenecks: quadratic data complexity scaling with the number of entities, and NP-hard query complexity for cyclic queries. Consequently, these approaches struggle to scale effectively to large knowledge graphs and complex queries. To address these limitations, we propose an efficient and scalable symbolic search method comprising two key components: (1) constraint strategies that drastically reduce the variable search domain, lowering data complexity; and (2) a local search algorithm that approximately solves NP-hard cyclic queries. Experiments on various CQA benchmarks demonstrate that, for tree-form queries, our method achieves 97% relative MRR with a 10$\times$ speedup using only 10% of the search space. Furthermore, it demonstrates robust performance on complex cyclic queries and large-scale KGs, effectively alleviating efficiency and scalability challenges. Our code is provided in https://github.com/HKUST-KnowComp/NLISA_KDD2026.

2505.05880 2026-05-26 cs.AI cs.LG

Combining Abstract Argumentation and Machine Learning for Efficiently Analyzing Low-Level Process Event Streams

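Bettina Fazzinga, Sergio Flesca, Filippo Furfaro, Luigi Pontieri, Francesco Scala

Affiliations: University of Calabria; CNR

AI summary: Proposes a data-efficient neuro-symbolic approach in which an Abstract Argumentation Framework (AAF) refines the candidate event interpretations produced by a sequence-tagging model, addressing the uncertainty of event-to-activity mapping in low-level process event streams.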

DOI: 10.1007/s40747-026-02340-1

Abstract

Monitoring and analyzing process traces is a critical task for modern companies and organizations. In scenarios where there is a gap between trace events and reference business activities, this entails an interpretation problem, amounting to translating each event of any ongoing trace into the corresponding step of the activity instance. Building on a recent approach that frames the interpretation problem as an acceptance problem within an Abstract Argumentation Framework (AAF), one can elegantly analyze plausible event interpretations (possibly in an aggregated form), as well as offer explanations for those that conflict with prior process knowledge. Because this reasoning-based approach may yield poorly informative results and heavy computation in settings where the event-to-activity mapping is highly uncertain (or simply under-specified), one can consider discovering a sequence-tagging model trained to suggest highly probable candidate event interpretations in a context-aware way. However, training such a model optimally may require a large number of manually annotated example traces. We therefore propose a data-efficient neuro-symbolic approach to the problem, where the candidate interpretations returned by the example-driven sequence tagger are refined by the AAF-based reasoner. This also allows us to leverage prior knowledge to compensate for the scarcity of example data, as confirmed by experimental results.

2503.01122 2026-05-26 cs.CV

ACCORD: Alleviating Concept Coupling through Dependence Regularization for Text-to-Image Diffusion Personalization

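Shizhan Liu, Hao Zheng, Hang Yu, Jianguo Li

Affiliations: Ant Group

AI summary: Proposes two plug-and-play loss functions, a denoising decoupling loss and a prior decoupling loss, that directly minimize two kinds of dependence discrepancy to alleviate concept coupling and achieve a better balance between text control and personalization fidelity.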

2502.16205 2026-05-26 cs.RO

A neural signed configuration distance function for path planning of picking manipulators

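Bernhard Wullt, Mikael Norrlöf, Per Mattsson, Thomas B. Schön

Affiliations: Department of Information Technology, Uppsala University

AI summary: For path planning of picking manipulators, proposes a neural signed configuration distance function (nSCDF) as an implicit obstacle representation. It constructs collision-free balls in configuration space and replaces the points in a multi-query path planner with balls, which quickly yields a collision-free corridor whose path is then optimized by convex programming. Experiments show paths close to those of an asymptotically optimal planner in significantly less time.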

Abstract

Picking manipulators are task-specific robots with fewer degrees of freedom than general-purpose manipulators, and they are heavily used in industry. The efficiency of picking robots is highly dependent on the path planning solution, which is commonly based on sampling-based multi-query methods. Such a planner solves the problem robustly, but its heavy use of collision detection limits its planning capabilities for online use. We approach this problem by presenting a novel implicit obstacle representation for path planning, a neural signed configuration distance function (nSCDF), which allows us to form collision-free balls in the configuration space. We use the ball representation to reformulate a state-of-the-art multi-query path planner, i.e., instead of points, we use balls in the graph. Our planner returns a collision-free corridor, which allows us to use convex programming to produce optimized paths. From our numerical experiments, we observe that our planner produces paths that are close to those from an asymptotically optimal path planner, in significantly less time.
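
The ball construction described in this abstract can be sketched in a few lines. If a function returns the distance from a configuration to the nearest colliding configuration, then the open ball of that radius around the configuration is collision-free, and two such balls can be linked in the planner graph when they intersect. The analytic `signed_distance` below stands in for the learned nSCDF, and the single circular obstacle, the 2-D configuration space, and all names are assumptions for illustration only.

```python
import math

# Toy configuration-space obstacle (an assumption; the paper learns this from data).
OBSTACLE_CENTER = (0.0, 0.0)
OBSTACLE_RADIUS = 1.0


def signed_distance(q):
    """Analytic stand-in for a learned signed configuration distance function."""
    return math.dist(q, OBSTACLE_CENTER) - OBSTACLE_RADIUS


def free_ball(q):
    """Return (center, radius) of a collision-free ball at q, or None if q collides."""
    radius = signed_distance(q)
    return (q, radius) if radius > 0 else None


def balls_overlap(ball_a, ball_b):
    """Two balls can share a graph edge when they intersect."""
    (qa, ra), (qb, rb) = ball_a, ball_b
    return math.dist(qa, qb) < ra + rb


a = free_ball((3.0, 0.0))  # distance 3 from the obstacle center, so radius 2
b = free_ball((3.0, 3.0))
c = free_ball((0.5, 0.0))  # inside the obstacle, so no ball exists
```

A chain of overlapping balls forms a corridor that is collision-free by construction, which is what lets the path inside it be optimized without further collision checks.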

2502.11167 2026-05-26 cs.LG cs.CL

SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors

Bohan Lyu, Siqiao Huang, Zichen Liang

Affiliations: Department of Computer Science and Technology, Tsinghua; Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua

AI summary: Introduces SURGE, a benchmark of 1160 problems covering 8 key aspects, and evaluates 21 open-source and proprietary LLMs to study their feasibility as surrogate models for code execution prediction, along with scaling laws, data efficiency, and predictive accuracy.

Journal ref Proceedings of The 2025 Conference on Empirical Methods in Natural Language Processing

Abstract

Neural surrogate models are powerful and efficient tools in data mining. Meanwhile, large language models (LLMs) have demonstrated remarkable capabilities in code-related tasks, such as generation and understanding. However, an equally important yet underexplored question is whether LLMs can serve as surrogate models for code execution prediction. To systematically investigate it, we introduce SURGE, a comprehensive benchmark with $1160$ problems covering $8$ key aspects: multi-language programming tasks, competition-level programming problems, repository-level code analysis, high-cost scientific computing, time-complexity-intensive algorithms, buggy code analysis, programs dependent on specific compilers or execution environments, and formal mathematical proof verification. Through extensive analysis of $21$ open-source and proprietary LLMs, we examine scaling laws, data efficiency, and predictive accuracy. Our findings reveal important insights about the feasibility of LLMs as efficient surrogates for computational processes. The benchmark and evaluation framework are available at https://github.com/Imbernoulli/SURGE.

2502.10906 2026-05-26 cs.AI

PCGRLLM: Large Language Model-Driven Reward Design for Procedural Content Generation Reinforcement Learning

In-Chang Baek, Sung-Hyun Kim, Sam Earle, Zehua Jiang, Jin-Ha Noh, Julian Togelius, Kyung-Joong Kim

Affiliations: Gwangju Institute of Science and Technology; New York University

AI summary: Proposes PCGRLLM, an architecture that uses large language models and a feedback mechanism to generate reward functions, performs story-to-reward generation in a two-dimensional environment, and reaches performance close to human level.

Comments 14 pages, 8 figures, Accepted to Transactions on Games

DOI: 10.1109/TG.2026.3695197

Abstract

Reward design plays a pivotal role in the training of game AIs, requiring substantial domain-specific knowledge and human effort. In recent years, several studies have explored reward generation for training game agents and controlling robots using large language models (LLMs). In the content generation literature, there has been early work on generating reward functions for reinforcement learning agent generators. This work introduces PCGRLLM, an extended architecture based on earlier work, which employs a feedback mechanism and several reasoning-based prompt engineering techniques. We evaluate the proposed method on a story-to-reward generation task in a two-dimensional environment using two state-of-the-art LLMs across various reasoning-based prompting methods. Our experiments provide insightful evaluations that demonstrate the capabilities of LLMs essential for content generation tasks. The results demonstrate a substantial performance improvement over the previous structure, achieving performance comparable to that of humans. Our work demonstrates the potential to reduce human dependency in game AI development, while supporting and enhancing creative processes.

2502.10311 2026-05-26 cs.LG cs.AI cs.HC

ExplainReduce: Generating global explanations from many local explanations

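Lauri Seppäläinen, Mudong Guo, Kai Puolamäki

Affiliations: University of Helsinki

AI summary: Proposes ExplainReduce, which uses a greedy heuristic to reduce a large set of local explanations to a small number of simple models that serve as a generative global explanation, and shows that the approach is effective and competitive.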

Comments 21 pages with a 36 page appendix, 8 + 39 figures, 1+1 tables. The datasets and source code used in the paper are available at https://github.com/edahelsinki/explainreduce. Accepted for publication in the 4th World Conference on eXplainable Artificial Intelligence (2026)

2502.01397 2026-05-26 cs.LG cs.AI cs.NA math.NA

Message-Passing GNNs Fail to Approximate Sparse Triangular Factorizations

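Vladislav Trifonov, Ekaterina Muravleva, Ivan Oseledets

Affiliations: AIC, Skoltech; Skoltech AI4S Center; Sberbank of Russia; AIRI

AI summary: Shows, theoretically and experimentally, that message-passing graph neural networks are fundamentally limited in approximating sparse triangular factorizations and that architectural innovations beyond message passing are needed.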

Comments Camera-ready version published in Transactions on Machine Learning Research

Journal ref Transactions on Machine Learning Research, 2026

Abstract

Graph Neural Networks (GNNs) have been proposed as a tool for learning sparse matrix preconditioners, which are key components in accelerating linear solvers. We present theoretical and empirical evidence that message-passing GNNs are fundamentally incapable of approximating sparse triangular factorizations for classes of matrices for which high-quality preconditioners exist but require non-local dependencies. To illustrate this, we construct a set of baselines using both synthetic matrices and real-world examples from the SuiteSparse collection. Across a range of GNN architectures, including Graph Attention Networks and Graph Transformers, we observe low cosine similarity ($\leq0.7$ in key cases) between predicted and reference factors. Our theoretical and empirical results suggest that architectural innovations beyond message-passing are necessary for applying GNNs to scientific computing tasks such as matrix factorization. Moreover, experiments demonstrate that overcoming non-locality alone is insufficient. Tailored architectures are necessary to capture the required dependencies since even a completely non-local Global Graph Transformer fails to match the proposed baselines.
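
For readers unfamiliar with the metric quoted above, the sketch below computes a cosine similarity between a predicted and a reference triangular factor. Treating each factor as a flattened vector is an assumption here, since the abstract does not say how the similarity is computed, and the 2x2 matrices are made-up toy values rather than results from the paper.

```python
import math


def cosine_similarity(predicted, reference):
    """Cosine similarity between two matrices treated as flattened vectors."""
    p = [x for row in predicted for x in row]
    r = [x for row in reference for x in row]
    dot = sum(a * b for a, b in zip(p, r))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in r))
    return dot / norm if norm else 0.0


# Toy lower-triangular factors: the prediction misses the off-diagonal entry.
reference = [[2.0, 0.0], [1.0, 3.0]]
predicted = [[2.0, 0.0], [0.0, 3.0]]
sim = cosine_similarity(predicted, reference)
```

A value of 1 means the factors are proportional entry by entry; values at or below the 0.7 level quoted in the abstract indicate that a large share of the reference factor is not captured.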

2502.01184 2026-05-26 cs.LG cs.AI physics.chem-ph q-bio.QM

FragmentNet: Adaptive Graph Fragmentation for Graph-to-Sequence Molecular Representation Learning

Ankur Samanta, Rohan Gupta, Aditi Misra, Christian McIntosh Clarke, Jayakumar Rajadas

Affiliations: Department of Electrical and Computer Engineering, University of Toronto, Toronto, Canada; Regenerative Biomaterials Laboratory, Stanford Cardiovascular Institute, Palo Alto, USA

AI summary: Proposes FragmentNet, which uses an adaptive, learned tokenizer to decompose molecular graphs into chemically valid fragments, preserves molecular topology with chemically aware spatial positional encodings, and applies masked pre-training at the fragment level, improving performance on multiple property prediction tasks.

Comments 22 pages, 13 figures, 5 tables

Abstract

Molecular representation learning methods typically tokenize molecules as individual atoms or use rigid, rule-based fragment decompositions, limiting their ability to capture meaningful chemical substructure context. We introduce FragmentNet, a graph-to-sequence model built around a novel adaptive, learned tokenizer that decomposes molecular graphs into chemically valid fragments of adjustable granularity, complemented by chemically aware spatial positional encodings that preserve molecular topology in the resulting sequence. Extending masked pre-training strategies from natural language processing to the molecular domain, we mask and reconstruct molecules at the level of chemically meaningful fragments rather than individual atoms. Evaluating across multiple property prediction benchmarks, we find that pre-training at fragment granularity leads to improved downstream performance on the majority of tasks, demonstrating that tokenization granularity is an important design choice for molecular representation learning.

2501.14889 2026-05-26 cs.LG

Iterative Feature Space Optimization through Incremental Adaptive Evaluation

Yanping Wu, Yanyong Huang, Zhengzhang Chen, Zijun Yao, Yanjie Fu, Kunpeng Liu, Xiao Luo, Dongjie Wang

Affiliations: University of Kansas; Southwestern University of Finance and Economics; Arizona State University; Portland State University; University of California

AI summary: Proposes the EASE framework, which uses a feature-sample subspace generator and a contextual attention evaluator to optimize feature spaces efficiently and with good generalization, addressing evaluation bias, overfitting, and inefficiency.

Comments 18 pages

Abstract

Iterative feature space optimization involves systematically evaluating and adjusting the feature space to improve downstream task performance. However, existing works suffer from three key limitations: 1) overlooking differences among data samples leads to evaluation bias; 2) tailoring feature spaces to specific machine learning models results in overfitting and poor generalization; 3) requiring the evaluator to be retrained from scratch during each optimization iteration significantly reduces the overall efficiency of the optimization process. To bridge these gaps, we propose a gEneralized Adaptive feature Space Evaluator (EASE) to efficiently produce optimal and generalized feature spaces. This framework consists of two key components: a Feature-Sample Subspace Generator and a Contextual Attention Evaluator. The first component aims to decouple the information distribution within the feature space to mitigate evaluation bias. To achieve this, we first identify the features most relevant to prediction tasks and the samples most challenging for evaluation, based on feedback from the subsequent evaluator. This decoupling strategy makes the evaluator consistently target the most challenging aspects of the feature space. The second component is designed to incrementally capture evolving patterns of the feature space for efficient evaluation. We propose a weighted-sharing multi-head attention mechanism to encode key characteristics of the feature space into an embedding vector for evaluation. Moreover, the evaluator is updated incrementally, retaining prior evaluation knowledge while incorporating new insights, as consecutive feature spaces during the optimization process share partial information. Extensive experiments on fourteen real-world datasets demonstrate the effectiveness of the proposed framework. Our code and data are publicly available.

2412.15668 2026-05-26 cs.CV

Adaptive Hierarchical Graph Cut for Multi-granularity Out-of-distribution Detection

Xiang Fang, Arvind Easwaran, Blaise Genest, Ponnuthurai Nagaratnam Suganthan

Affiliations: Interdisciplinary Graduate Programme, Nanyang Technological University; College of Computing and Data Science, Nanyang Technological University; KINDI Computing Research Center, College of Engineering, Qatar University

AI summary: Proposes the Adaptive Hierarchical Graph Cut network (AHGC), which builds a hierarchical KNN graph and partitions it into subgraphs based on linkage and density information to handle out-of-distribution detection under differing label granularity, with reported FPR95 improvements of 40.47% on CIFAR-10 and 81.24% on CIFAR-100.

Comments Published in IEEE Transactions on Artificial Intelligence

Abstract

This paper focuses on a significant yet challenging task: out-of-distribution detection (OOD detection), which aims to distinguish and reject test samples with semantic shifts, so as to prevent models trained on in-distribution (ID) data from producing unreliable predictions. Although previous works have achieved decent success, they are ineffective for challenging real-world applications because they simply regard all unlabeled data as OOD data and ignore the case that different datasets have different label granularity. For example, "cat" on CIFAR-10 and "tabby cat" on Tiny-ImageNet share the same semantics but have different labels because of differing label granularity. To this end, in this paper, we propose a novel Adaptive Hierarchical Graph Cut network (AHGC) to deeply explore the semantic relationship between different images. Specifically, we construct a hierarchical KNN graph to evaluate the similarities between different images based on the cosine similarity. Based on the linkage and density information of the graph, we cut the graph into multiple subgraphs to integrate these semantically similar samples. If the labeled percentage in a subgraph is larger than a threshold, we assign the label with the highest percentage to the unlabeled images. To further improve model generalization, we augment each image into two augmented versions and maximize the similarity between the two versions. Finally, we leverage the similarity score for OOD detection. Extensive experiments on two challenging benchmarks (CIFAR-10 and CIFAR-100) illustrate that in representative cases, AHGC outperforms state-of-the-art OOD detection methods by 81.24% on CIFAR-100 and by 40.47% on CIFAR-10 in terms of "FPR95", which shows the effectiveness of our AHGC.
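
The subgraph labeling rule in this abstract (assign the dominant label to unlabeled images when enough of the subgraph is labeled) can be sketched as below. The 0.5 threshold, the use of `None` to mark unlabeled samples, and the function name are assumptions for illustration; the graph construction and cutting steps are not shown.

```python
from collections import Counter


def propagate_labels(subgraph_labels, threshold=0.5):
    """Fill in unlabeled members of one subgraph with its dominant label.

    subgraph_labels: list of class labels, with None marking unlabeled samples.
    Labels are propagated only when the labeled share exceeds the threshold.
    """
    labeled = [label for label in subgraph_labels if label is not None]
    if not subgraph_labels or len(labeled) / len(subgraph_labels) <= threshold:
        return list(subgraph_labels)
    majority = Counter(labeled).most_common(1)[0][0]
    return [majority if label is None else label for label in subgraph_labels]


dense = propagate_labels(["cat", "cat", "dog", None])  # 75% labeled
sparse = propagate_labels([None, None, "cat"])         # 33% labeled
```

Under this rule, an unlabeled "tabby cat" image that lands in a mostly labeled "cat" subgraph would inherit the "cat" label, while samples in sparsely labeled subgraphs stay unlabeled.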

2409.20473 2026-05-26 cs.RO

Data-Driven Optimization of Tactile Sensor Configurations for Efficient Dexterous Manipulation

Haoran Guo, Haoyang Wang, Zhengxiong Li, He Bai, Lingfeng Tao

Affiliations: ShanghaiTech University, School of Information Science and Technology; University of Alberta; Oklahoma State University; University of Colorado Denver, Department of Computer Science and Engineering; Department of Robotics and Mechatronics Engineering, Kennesaw State University

AI summary: Proposes a two-stage framework for quantifying the contribution of tactile sensors to deep reinforcement learning policies. It reduces the Shadow Hand sensor count from 92 to 14 while retaining over 90% of performance, and finds that middle-finger sensors contribute negatively.

Comments This work has been submitted to the ICRA for possible publication

Abstract

Tactile sensing is critical for learning-based dexterous manipulation, yet principled guidelines for sensor placement remain largely absent. While dense sensor arrays provide rich contact feedback, they impose significant hardware costs and can even degrade policy performance by introducing redundant or conflicting inputs. This paper presents the first systematic framework for quantifying the contribution of individual tactile sensors to deep reinforcement learning (DRL) policy performance. We propose a two-stage approach: a coarse empirical pruning phase that reduces the sensor count on the Shadow Hand from 92 to 21 while retaining 93% of task performance, followed by a fine-grained active learning phase that combines Gaussian Process Regression (GPR) with Lasso regression to rank the functional importance of each remaining sensor. Our analysis reveals that sensors on the thumb, ring finger, and little finger dominate manipulation performance, while middle-finger sensors exhibit negative contributions, actively degrading policy learning. Ablation studies across three manipulation tasks (block, egg, and pen) confirm that a 14-sensor configuration preserves over 90% of the full-array performance. Zero-shot transfer experiments on two novel objects and cross-platform validation on the Allegro and Leap Hand further demonstrate that the identified importance rankings generalize across tasks and robot morphologies. These findings establish quantitative deployment guidelines that enable practitioners to select cost-effective sensor configurations with predictable performance trade-offs.

2409.17608 2026-05-26 cs.CV

Appearance Blur-driven AutoEncoder and Motion-guided Memory Module for Video Anomaly Detection

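Jiahao Lyu, Minghua Zhao, Jing Hu, Xuewen Huang, Shuangli Du, Cheng Shi, Zhiyong Lv

Affiliations: School of Computer Science and Engineering, Xi’an University of Technology

AI summary: Proposes a zero-shot cross-dataset video anomaly detection method based on appearance blur and a motion-guided memory module, which constructs global pseudo-anomalies and uses motion memory items to widen the gap between normal and abnormal motion.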

Comments 13 pages, 11 figures

Journal ref Knowledge-Based Systems 2026

DOI: 10.1016/j.knosys.2025.115218

Abstract

Video anomaly detection (VAD) often learns the distribution of normal samples and detects anomalies by measuring significant deviations, but undesired generalization may reconstruct some anomalies and thus suppress the deviations. Meanwhile, most VAD methods cannot cope with cross-dataset validation on new target domains, and few-shot methods must laboriously rely on model tuning on the target domain to complete domain adaptation. To address these problems, we propose a novel VAD method with a motion-guided memory module to achieve zero-shot cross-dataset validation. First, we add Gaussian blur to the raw appearance images, thereby constructing the global pseudo-anomaly, which serves as the input to the network. Then, we propose multi-scale residual channel attention to deblur the pseudo-anomaly in normal samples. Next, memory items are obtained by recording the motion features in the training phase, which are used to retrieve the motion features from the raw information in the testing phase. Lastly, our method can ignore the blurred real anomaly through attention and rely on motion memory items to increase the normality gap between normal and abnormal motion. Extensive experiments on three benchmark datasets demonstrate the effectiveness of the proposed method. Compared with cross-domain methods, our method achieves competitive performance without adaptation during testing.

2409.09953 2026-05-26 cs.CV

Uncertainty-Guided Appearance-Motion Association Network for Out-of-Distribution Action Detection

不确定性引导的外观-运动关联网络用于分布外动作检测

Xiang Fang, Arvind Easwaran, Blaise Genest

发表机构 * College of Computing and Data Science（计算与数据科学学院）； Nanyang Technological University（南洋理工大学）

AI总结针对分布外动作检测任务，提出不确定性引导的外观-运动关联网络（UAAN），通过融合外观与运动特征并推理时空物体交互，显著优于现有方法。

Comments Accepted by MIPR 2024

AI中文摘要

分布外（OOD）检测旨在检测并拒绝具有语义偏移的测试样本，以防止在分布内（ID）数据集上训练的模型产生不可靠的预测。现有工作仅在图像数据集上提取外观特征，无法处理包含大量运动信息的动态多媒体场景。因此，我们针对一个更现实且更具挑战性的OOD检测任务：OOD动作检测（ODAD）。给定一个未裁剪的视频，ODAD首先对ID动作进行分类并识别OOD动作，然后定位ID和OOD动作。为此，本文提出了一种新颖的不确定性引导的外观-运动关联网络（UAAN），该网络同时探索外观特征和运动上下文，以推理用于ODAD的时空物体间交互。首先，我们设计独立的外观和运动分支，以提取相应的面向外观和面向运动的物体表示。在每个分支中，我们构建一个时空图来推理外观引导和运动驱动的物体间交互。然后，我们设计一个外观-运动注意力模块，融合外观和运动特征以进行最终的动作检测。在两个具有挑战性的数据集上的实验结果表明，UAAN显著优于最先进的方法，证明了其有效性。

英文摘要

Out-of-distribution (OOD) detection aims to detect and reject test samples with semantic shifts, to prevent models trained on in-distribution (ID) datasets from producing unreliable predictions. Existing works only extract the appearance features on image datasets, and cannot handle dynamic multimedia scenarios with much motion information. Therefore, we target a more realistic and challenging OOD detection task: OOD action detection (ODAD). Given an untrimmed video, ODAD first classifies the ID actions and recognizes the OOD actions, and then localizes ID and OOD actions. To this end, in this paper, we propose a novel Uncertainty-Guided Appearance-Motion Association Network (UAAN), which explores both appearance features and motion contexts to reason spatial-temporal inter-object interaction for ODAD. Firstly, we design separate appearance and motion branches to extract corresponding appearance-oriented and motion-aspect object representations. In each branch, we construct a spatial-temporal graph to reason appearance-guided and motion-driven inter-object interaction. Then, we design an appearance-motion attention module to fuse the appearance and motion features for final action detection. Experimental results on two challenging datasets show that UAAN beats state-of-the-art methods by a significant margin, illustrating its effectiveness.

2408.08399 2026-05-26 cs.LG cs.SY eess.SY

Transformer-based few-shot learning for modeling Electricity Consumption Profiles with minimal data across thousands of domains

基于Transformer的少样本学习：以最少数据跨数千个领域建模电力消费曲线

Weijie Xia, Gao Peng, Chenguang Wang, Peter Palensky, Eric Pauwels, Pedro P. Vergara

发表机构 * Intelligent Electrical Power Grids (IEPG) Group（智能电力电网组）； Centrum Wiskunde & Informatica (CWI)（数学与信息学研究中心）； Alliander N.V（Alliander公司）

AI总结针对电力消费曲线建模中数据稀缺问题，提出一种结合Transformer和高斯混合模型的免微调少样本学习框架，仅需1.6%数据即可准确恢复复杂分布，优于现有方法。

Journal ref International Journal of Electrical Power & Energy Systems, Volume/Issue (February 2026), Article 111575

DOI: 10.1016/j.ijepes.2026.111575

AI中文摘要

电力消费曲线（ECP）对于配电系统的运行和规划至关重要，尤其是在太阳能电池板和电动汽车等低碳技术日益普及的背景下。传统的ECP建模方法通常假设有足够的ECP数据可用。然而，在实践中，由于隐私问题或缺乏计量设备，ECP数据的可访问性有限。少样本学习（FSL）已成为数据稀缺场景下ECP建模的一种有前景的解决方案。然而，标准的FSL方法（例如用于图像的方法）不适用于ECP建模，因为（1）这些方法通常假设有多个具有充足数据的源域和多个目标域。但在ECP建模中，可能存在数千个源域（例如具有中等数据量的家庭）和数千个目标域（例如需要建模ECP的家庭）。（2）标准FSL方法通常涉及繁琐的知识迁移机制，例如预训练和微调。为了解决这些局限性，本文提出了一种新颖的FSL框架，将Transformer与高斯混合模型（GMM）相结合用于ECP建模。所提出的方法无需微调，计算效率高，即使在数据极其有限的情况下也具有鲁棒性。结果表明，我们的方法可以用最少的ECP数据（例如，仅占完整域数据集的1.6%）准确恢复复杂的ECP分布，并且在ECP建模背景下优于最先进的时间序列建模方法。

英文摘要

Electricity Consumption Profiles (ECPs) are crucial for operating and planning power distribution systems, especially with the increasing number of low-carbon technologies such as solar panels and electric vehicles. Traditional ECP modeling methods typically assume the availability of sufficient ECP data. However, in practice, the accessibility of ECP data is limited due to privacy issues or the absence of metering devices. Few-shot learning (FSL) has emerged as a promising solution for ECP modeling in data-scarce scenarios. Nevertheless, standard FSL methods, such as those used for images, are unsuitable for ECP modeling because (1) these methods usually assume several source domains with sufficient data and several target domains. However, in the context of ECP modeling, there may be thousands of source domains, e.g., households with a moderate amount of data, and thousands of target domains, e.g., households whose ECPs need to be modeled. (2) Standard FSL methods usually involve cumbersome knowledge transfer mechanisms, such as pre-training and fine-tuning. To address these limitations, this paper proposes a novel FSL framework that integrates Transformers with Gaussian Mixture Models (GMMs) for ECP modeling. The proposed approach is fine-tuning-free, computationally efficient, and robust even with extremely limited data. Results show that our method can accurately restore the complex ECP distribution with a minimal amount of ECP data (e.g., only 1.6% of the complete domain dataset) and outperforms state-of-the-art time series modeling methods in the context of ECP modeling.

2406.09079 2026-05-26 cs.LG

Hadamard Representation: Scaffolding Performance Across Model-free RL

Hadamard表示：跨无模型强化学习的性能支撑

Jacob E. Kooi, Zhao Yang, Mark Hoogendoorn, Vincent François-Lavet

发表机构 * Vrije Universiteit Amsterdam（阿姆斯特丹自由大学）

AI总结提出Hadamard表示（HR），通过将标准隐藏层替换为两个独立参数化层的逐元素乘积，减少神经元休眠并增加有效秩，从而在多种强化学习算法和领域中一致提升性能。

Comments 26 pages, 17 figures

AI中文摘要

深度强化学习智能体在训练过程中逐渐失去表示能力：神经元变得休眠，从网络中移除活跃容量，有效秩崩溃，使存活的神经元冗余。现有的补救措施如周期性重置和特殊神经网络架构，大多局限于特定算法或领域。我们提出一个简单的架构修复，即Hadamard表示（HR），它将标准隐藏层替换为两个独立参数化层的逐元素乘积。HR通过两种互补机制运作。首先，它降低了神经元变得休眠的概率，这对于连续可微激活函数（如tanh）尤其有价值：与休眠的ReLU神经元（被有效剪枝）不同，饱和的tanh神经元通过将其输出权重转化为固定偏置而暗中破坏下游层。其次，独立于休眠，乘法结构捕获更丰富的特征交互，并在不拓宽层的情况下增加有效秩。我们在五种算法和三个领域上评估HR：基于像素的离散动作Atari上的DQN、PPO和PQN，基于状态连续控制上的SimbaV2，以及视觉连续控制上的MR.Q。HR在无需任何超参数调优的情况下，一致地优于强基线，并且其增益在参数匹配的更宽变体上仍然保持，排除了参数数量作为替代解释的可能性。

英文摘要

Deep reinforcement learning agents progressively lose representational capacity during training: neurons become dormant, removing active capacity from the network, and effective rank collapses, leaving surviving neurons redundant. Existing remedies such as periodic resets, and special neural network architectures, are largely algorithm- or domain-specific. We propose a simple architectural fix, the Hadamard Representation (HR), which replaces a standard hidden layer with the element-wise product of two independently parameterized layers. HR operates through two complementary mechanisms. First, it reduces the probability of a neuron becoming dormant, which is particularly valuable for continuously differentiable activations such as tanh: unlike dormant ReLU neurons, which are effectively pruned, saturated tanh neurons silently corrupt downstream layers by turning their outgoing weights into fixed biases. Second, independently of dormancy, the multiplicative structure captures richer feature interactions and increases effective rank without widening the layer. We evaluate HR across five algorithms and three domains: DQN, PPO, and PQN on pixel-based discrete-action Atari, SimbaV2 on state-based continuous control, and MR.Q on visual continuous control. HR consistently improves performance over the strong baselines without any hyperparameter tuning, and gains persist against parameter-matched wider variants, ruling out parameter count as an alternative explanation.
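
The core idea above, replacing one hidden layer with the element-wise product of two independently parameterized layers, can be sketched in a few lines. This is an illustrative sketch, not the authors' implementation: it assumes each of the two layers is an affine map followed by tanh, and the initialization, layer sizes, and helper names (`init_layer`, `hadamard_layer`) are hypothetical.

```python
import math
import random

def linear(x, weights, bias):
    """Affine map: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def init_layer(n_in, n_out, rng):
    """Hypothetical uniform init; the paper's scheme is not given here."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias

def hadamard_layer(x, layer_a, layer_b):
    """Element-wise product of two independently parameterized tanh layers."""
    h_a = [math.tanh(v) for v in linear(x, *layer_a)]
    h_b = [math.tanh(v) for v in linear(x, *layer_b)]
    return [a * b for a, b in zip(h_a, h_b)]

rng = random.Random(0)
layer_a = init_layer(4, 8, rng)  # first parameter set
layer_b = init_layer(4, 8, rng)  # second, independent parameter set
hidden = hadamard_layer([0.5, -1.0, 0.25, 2.0], layer_a, layer_b)
print(len(hidden))
```

The output has the same width as a single hidden layer, which is why the abstract compares against parameter-matched wider baselines rather than same-width ones.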

2404.10947 2026-05-26 cs.CV

Residual Connections Harm Generative Representation Learning

残差连接损害生成式表示学习

Xiao Zhang, Ruoxi Jiang, William Gao, Rebecca Willett, Michael Maire

发表机构 * University of Chicago（芝加哥大学）； Fudan University（复旦大学）； Tencent（腾讯）； Shanghai Academy of AI for Science（上海人工智能科学研究院）

AI总结通过减少残差网络中恒等捷径的权重，显著提升掩码自编码器和扩散模型等生成式表示学习框架中的语义特征学习质量。

Comments accepted to CVPR 2026

AI中文摘要

我们表明，在残差网络中引入一个加权因子以减少恒等捷径的影响，可以显著增强生成式表示学习框架（如掩码自编码器（MAE）和扩散模型）中的语义特征学习。我们的修改显著提高了特征质量，对于使用ViT-B/16骨干网络的MAE，将ImageNet-1K K近邻准确率从27.4%提升至63.9%，线性探测准确率从67.8%提升至72.7%，同时增强了扩散模型的生成质量。这一显著差距表明，虽然残差连接结构在促进梯度传播方面起着重要作用，但它可能通过将浅层表示的“回声”注入深层，从而降低抽象学习能力，产生有害副作用。我们通过一个固定公式来改善这一缺点，该公式随着层深度增加而单调减少恒等连接的贡献。我们的设计促进了特征抽象的逐步发展，且不影响网络的可训练性。分析我们修改后的残差网络学到的表示，我们发现低有效特征秩与下游任务性能之间存在相关性。

英文摘要

We show that introducing a weighting factor to reduce the influence of identity shortcuts in residual networks significantly enhances semantic feature learning in generative representation learning frameworks, such as masked autoencoders (MAEs) and diffusion models. Our modification notably improves feature quality, raising ImageNet-1K K-Nearest Neighbor accuracy from 27.4% to 63.9% and linear probing accuracy from 67.8% to 72.7% for MAEs with a ViT-B/16 backbone, while also enhancing generation quality in diffusion models. This significant gap suggests that, while residual connection structure serves an essential role in facilitating gradient propagation, it may have a harmful side effect of reducing capacity for abstract learning by virtue of injecting an echo of shallower representations into deeper layers. We ameliorate this downside via a fixed formula for monotonically decreasing the contribution of identity connections as layer depth increases. Our design promotes the gradual development of feature abstractions, without impacting network trainability. Analyzing the representations learned by our modified residual networks, we find correlation between low effective feature rank and downstream task performance.
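
To make the idea of a depth-dependent identity weight concrete, here is a small sketch. The abstract does not state the fixed formula, so the linear schedule below, the `min_weight` parameter, and the toy tanh branch are all assumptions chosen only to show a monotonically decreasing identity contribution.

```python
import math

def identity_weight(layer_index, num_layers, min_weight=0.5):
    """Hypothetical linear schedule: 1.0 at the first block, min_weight at the last."""
    if num_layers == 1:
        return 1.0
    frac = layer_index / (num_layers - 1)
    return 1.0 - (1.0 - min_weight) * frac

def residual_block(x, branch, alpha):
    """Down-weighted shortcut: alpha * x + branch(x)."""
    return [alpha * xi + bi for xi, bi in zip(x, branch(x))]

def toy_branch(x):
    """Stand-in for the learned residual branch."""
    return [math.tanh(v) for v in x]

num_layers = 4
weights = [identity_weight(i, num_layers) for i in range(num_layers)]

x = [1.0, -2.0, 0.5]
for alpha in weights:
    x = residual_block(x, toy_branch, alpha)

print([round(w, 3) for w in weights])
```

With `alpha = 1.0` at every depth this reduces to a standard residual block, so the schedule is the only change relative to the usual architecture.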

2403.04545 2026-05-26 cs.LG math.ST stat.TH

Branch Scaling Manifests as Implicit Architectural Regularization for Improving Generalization in Overparameterized ResNets

分支缩放表现为隐式架构正则化以改善过参数化ResNet的泛化能力

Zixiong Yu, Guhan Chen, Jianfa Lai, Bohan Li, Songtao Tian

发表机构 * Huawei Large Model Data Technology Lab, Shenzhen（华为大模型数据技术实验室，深圳）； Tsinghua University, Beijing（清华大学，北京）； Kyoto University, Kyoto（京都大学，京都）

AI总结本文研究残差网络中分支缩放因子对过参数化ResNet泛化性能的影响，通过理论分析证明快速深度衰减的缩放因子结合早停可实现极小极大最优泛化率，并利用神经正切核（NTK）近似解释其机制。

Comments Accepted by ICML. This version incorporates content from the preprint arXiv:2305.18506. The contributors of the relevant content have consented to its inclusion and have been listed as authors

AI中文摘要

残差分支中的缩放因子已成为提升神经网络性能的流行方法，特别是在无归一化架构中。虽然先前的工作主要从优化角度研究缩放效应，本文通过泛化理论的视角探讨其在残差架构中的作用。具体来说，我们证明具有恒定缩放因子的宽残差网络（ResNet）随着深度增加渐近地变得不可学习。相反，当缩放因子表现出快速的深度方向衰减并结合早停时，过参数化ResNet实现了极小极大最优泛化率。为了建立这一结论，我们证明宽ResNet的泛化能力可以通过与神经正切核（NTK）相关的核回归来近似。我们的理论发现通过合成数据和真实世界分类任务（包括MNIST和CIFAR-100）的实验得到验证。

英文摘要

Scaling factors in residual branches have emerged as a prevalent method for boosting neural network performance, especially in normalization-free architectures. While prior work has primarily examined scaling effects from an optimization perspective, this paper investigates their role in residual architectures through the lens of generalization theory. Specifically, we establish that wide residual networks (ResNets) with constant scaling factors become asymptotically unlearnable as depth increases. In contrast, when the scaling factor exhibits rapid depth-wise decay combined with early stopping, over-parameterized ResNets achieve minimax-optimal generalization rates. To establish this, we demonstrate that the generalization capability of wide ResNets can be approximated by kernel regression associated with the Neural Tangent Kernel (NTK). Our theoretical findings are validated through experiments on synthetic data and real-world classification tasks, including MNIST and CIFAR-100.

2402.13791 2026-05-26 cs.LG

Opening the Black-Box: A Systematic Review on Explainable AI in Remote Sensing

打开黑箱：遥感中可解释人工智能的系统综述

Adrian Höhl, Ivica Obadic, Miguel Ángel Fernández Torres, Hiba Najjar, Dario Oliveira, Zeynep Akata, Andreas Dengel, Xiao Xiang Zhu

发表机构 * Chair of Data Science in Earth Observation, Technical University of Munich (TUM)（慕尼黑工业大学地球观测数据科学教席）； Munich Center for Machine Learning（慕尼黑机器学习中心）； Image Processing Laboratory (IPL), Universitat de València (UV)（瓦伦西亚大学图像处理实验室（IPL））； University of Kaiserslautern-Landau, Germany（德国凯撒斯劳滕-兰道大学）； German Research Center for Artificial Intelligence (DFKI)（德国人工智能研究中心（DFKI））； School of Applied Mathematics, Getulio Vargas Foundation, Brazil（巴西热图利奥·瓦加斯基金会应用数学学院）； Institute for Explainable Machine Learning at Helmholtz Munich（亥姆霍兹慕尼黑可解释机器学习研究所）； Chair of Interpretable and Reliable Machine Learning, Technical University of Munich（慕尼黑工业大学可解释与可靠机器学习教席）

AI总结本文通过系统综述，总结了遥感中可解释AI方法的使用、目标、发现和挑战，揭示了新兴方向并提供了评估方法。

Journal ref published in IEEE Geoscience and Remote Sensing Magazine, vol. 12, no. 4, pp. 261-304, Dec. 2024

DOI: 10.1109/MGRS.2024.3467001

AI中文摘要

近年来，黑箱机器学习方法已成为遥感知识提取的主导建模范式。尽管通过可解释人工智能揭示这些模型内部运作具有潜在益处，但目前在遥感应用中，仍缺乏全面概述可解释AI方法及其目标、发现和挑战的综述。本文通过系统综述来填补这一空白，识别该领域的关键趋势，并阐明针对特定遥感挑战的新颖可解释AI方法和新兴方向。我们还揭示了解读解释结果的常见模式，讨论了提取的科学见解，并反思了用于评估可解释AI方法的方法。因此，我们的综述提供了遥感中可解释AI最新技术的完整总结。此外，我们详细展望了挑战和有前景的研究方向，这为新颖方法论的发展奠定了基础，并为该领域的新研究者提供了有用的起点。

英文摘要

In recent years, black-box machine learning approaches have become a dominant modeling paradigm for knowledge extraction in remote sensing. Despite the potential benefits of uncovering the inner workings of these models with explainable AI, a comprehensive overview summarizing the explainable AI methods used and their objectives, findings, and challenges in remote sensing applications is still missing. In this paper, we address this gap by performing a systematic review to identify the key trends in the field and shed light on novel explainable AI approaches and emerging directions that tackle specific remote sensing challenges. We also reveal the common patterns of explanation interpretation, discuss the extracted scientific insights, and reflect on the approaches used for the evaluation of explainable AI methods. As such, our review provides a complete summary of the state-of-the-art of explainable AI in remote sensing. Further, we give a detailed outlook on the challenges and promising research directions, representing a basis for novel methodological development and a useful starting point for new researchers in the field.

2311.11342 2026-05-26 cs.LG cs.DC math.OC

On the Communication Complexity of Decentralized Stochastic Bilevel Optimization

去中心化随机双层优化的通信复杂度

Yihan Zhang, My T. Thai, Jie Wu, Hongchang Gao

发表机构 * Temple University（天普大学）

AI总结针对异构环境下现有去中心化随机双层优化算法收敛慢、通信成本高的问题，提出基于同步和交替更新策略的两种新算法，实现了更快的收敛速度和更低的通信成本，并首次在温和假设下揭示了异构设置中Hessian逆向量积的计算与通信对收敛率的影响。

AI中文摘要

随机双层优化在机器学习中有着广泛的应用，包括元学习、超参数优化和神经架构搜索。为了将随机双层优化扩展到分布式数据，已经开发了几种去中心化随机双层优化算法。然而，现有方法在异构设置中通常存在收敛速度慢和通信成本高的问题，限制了它们在实际任务中的适用性。为了解决这些问题，我们提出了两种基于同步和交替更新策略的新型去中心化随机双层梯度下降算法。我们的算法能够实现比现有方法更快的收敛速度和更低的通信成本。重要的是，我们的收敛分析不依赖于关于异构性的强假设。更重要的是，我们的理论分析清晰地揭示了在异构设置下，关于Hessian逆向量积的计算和通信如何影响收敛率。据我们所知，这是首次在异构设置中在温和假设下取得如此有利的理论结果。此外，我们展示了如何在使用方差缩减梯度时建立交替更新策略的收敛率。最后，实验结果证实了我们算法的有效性。

英文摘要

Stochastic bilevel optimization finds widespread applications in machine learning, including meta-learning, hyperparameter optimization, and neural architecture search. To extend stochastic bilevel optimization to distributed data, several decentralized stochastic bilevel optimization algorithms have been developed. However, existing methods often suffer from slow convergence rates and high communication costs in heterogeneous settings, limiting their applicability to real-world tasks. To address these issues, we propose two novel decentralized stochastic bilevel gradient descent algorithms based on simultaneous and alternating update strategies. Our algorithms can achieve faster convergence rates and lower communication costs than existing methods. Importantly, our convergence analyses do not rely on strong assumptions regarding heterogeneity. More importantly, our theoretical analyses clearly disclose how the computation and communication regarding the Hessian-inverse-vector product under the heterogeneous setting affect the convergence rate. To the best of our knowledge, this is the first time such favorable theoretical results have been achieved with mild assumptions in the heterogeneous setting. Furthermore, we demonstrate how to establish the convergence rate for the alternating update strategy when combined with the variance-reduced gradient. Finally, experimental results confirm the efficacy of our algorithms.

2303.07863 2026-05-26 cs.CV cs.AI cs.MM

You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos

你可以比看见更早定位：一种用于压缩视频中时序句子定位的高效流程

Xiang Fang, Daizong Liu, Pan Zhou, Guoshun Nan

发表机构 * The Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology（大数据安全湖北工程研究中心，网络安全科学与工程学院，华中科技大学）； Peking University（北京大学）； Beijing University of Posts and Telecommunications（北京邮电大学）

AI总结提出一种三分支压缩域时空融合框架（TCSF），直接从压缩视频中提取I帧、运动向量和残差特征，实现高效准确的时序句子定位。

Comments Accepted by CVPR 2023

AI中文摘要

给定一个未剪辑视频，时序句子定位（TSG）旨在根据句子查询语义上定位目标时刻。尽管先前的工作取得了不错的成功，但它们仅关注从连续解码帧中提取的高级视觉特征，未能处理压缩视频的查询建模，导致训练和测试期间表示能力不足且计算复杂度高。本文提出了一种新的设置——压缩域TSG，直接利用压缩视频而非完全解压的帧作为视觉输入。为了处理原始视频比特流输入，我们提出了一种新颖的三分支压缩域时空融合（TCSF）框架，该框架提取并聚合三种低级视觉特征（I帧、运动向量和残差特征）以实现高效准确的定位。特别地，不像先前工作那样编码整个解码帧，我们仅通过学习I帧特征来捕获外观表示，以减少延迟。此外，我们不仅通过学习运动向量特征来探索运动信息，还通过残差特征探索相邻帧的关系。通过这种方式，进一步设计了一个带有自适应运动-外观融合模块的三分支时空注意力层，以提取和聚合外观和运动信息用于最终定位。在三个具有挑战性的数据集上的实验表明，我们的TCSF以更低的复杂度实现了比现有最先进方法更好的性能。

英文摘要

Given an untrimmed video, temporal sentence grounding (TSG) aims to locate a target moment semantically according to a sentence query. Although previous respectable works have achieved decent success, they only focus on high-level visual features extracted from the consecutive decoded frames and fail to handle the compressed videos for query modelling, suffering from insufficient representation capability and significant computational complexity during training and testing. In this paper, we pose a new setting, compressed-domain TSG, which directly utilizes compressed videos rather than fully-decompressed frames as the visual input. To handle the raw video bit-stream input, we propose a novel Three-branch Compressed-domain Spatial-temporal Fusion (TCSF) framework, which extracts and aggregates three kinds of low-level visual features (I-frame, motion vector and residual features) for effective and efficient grounding. Particularly, instead of encoding the whole decoded frames like previous works, we capture the appearance representation by only learning the I-frame feature to reduce latency. Besides, we explore the motion information not only by learning the motion vector feature, but also by exploring the relations of neighboring frames via the residual feature. In this way, a three-branch spatial-temporal attention layer with an adaptive motion-appearance fusion module is further designed to extract and aggregate both appearance and motion information for the final grounding. Experiments on three challenging datasets show that our TCSF achieves better performance than other state-of-the-art methods with lower complexity.

2209.11572 2026-05-26 cs.CV cs.AI cs.IR cs.MM

Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval

多模态跨域对齐网络用于视频时刻检索

Xiang Fang, Daizong Liu, Pan Zhou, Yuchong Hu

发表机构 * Hubei Key Laboratory of Distributed System Security（湖北分布式系统安全重点实验室）； Hubei Engineering Research Center on Big Data Security（湖北大数据安全工程研究中心）； School of Cyber Science and Engineering（网络空间安全学院）； Huazhong University of Science and Technology（华中科技大学）； Wangxuan Institute of Computer Technology（王选计算机研究所）； Peking University（北京大学）； School of Computer Science and Technology（计算机科学与技术学院）； Key Laboratory of Information Storage System Ministry of Education of China（信息存储系统教育部重点实验室）

AI总结提出多模态跨域对齐网络，通过域对齐、跨模态对齐和特定对齐三个模块，解决跨域视频时刻检索中域差异和语义鸿沟问题。

Comments Accepted by IEEE Transactions on Multimedia

AI中文摘要

作为多媒体信息检索中日益流行的任务，视频时刻检索（VMR）旨在根据给定的语言查询从未修剪的视频中定位目标时刻。大多数先前的方法严重依赖于大量手动标注（即时刻边界），这在实践中获取成本极高。此外，由于不同数据集之间的域差异，直接将预训练模型应用于未见过的域会导致性能显著下降。本文聚焦于一项新任务：跨域VMR，其中在一个域（“源域”）中有完全标注的数据集，但我们关注的域（“目标域”）仅包含未标注的数据集。据我们所知，我们提出了关于跨域VMR的首项研究。为了解决这一新任务，我们提出了一种新颖的多模态跨域对齐（MMCDA）网络，将标注知识从源域迁移到目标域。然而，由于源域和目标域之间的域差异以及视频和查询之间的语义鸿沟，直接将训练好的模型应用于目标域通常会导致性能下降。为解决此问题，我们开发了三个新颖的模块：（i）域对齐模块，用于对齐每个模态在不同域之间的特征分布；（ii）跨模态对齐模块，旨在将视频和查询特征映射到联合嵌入空间，并对齐目标域中不同模态之间的特征分布；（iii）特定对齐模块，试图获取特定帧与给定查询之间的细粒度相似性以实现最优定位。通过联合训练这三个模块，我们的MMCDA能够学习域不变且语义对齐的跨模态表示。

英文摘要

As an increasingly popular task in multimedia information retrieval, video moment retrieval (VMR) aims to localize the target moment from an untrimmed video according to a given language query. Most previous methods depend heavily on numerous manual annotations (i.e., moment boundaries), which are extremely expensive to acquire in practice. In addition, due to the domain gap between different datasets, directly applying these pre-trained models to an unseen domain leads to a significant performance drop. In this paper, we focus on a novel task: cross-domain VMR, where fully-annotated datasets are available in one domain ("source domain"), but the domain of interest ("target domain") only contains unannotated datasets. As far as we know, we present the first study on cross-domain VMR. To address this new task, we propose a novel Multi-Modal Cross-Domain Alignment (MMCDA) network to transfer the annotation knowledge from the source domain to the target domain. However, due to the domain discrepancy between the source and target domains and the semantic gap between videos and queries, directly applying trained models to the target domain generally leads to a performance drop. To solve this problem, we develop three novel modules: (i) a domain alignment module is designed to align the feature distributions between different domains of each modality; (ii) a cross-modal alignment module aims to map both video and query features into a joint embedding space and to align the feature distributions between different modalities in the target domain; (iii) a specific alignment module tries to obtain the fine-grained similarity between a specific frame and the given query for optimal localization. By jointly training these three modules, our MMCDA can learn domain-invariant and semantic-aligned cross-modal representations.

2011.11194 2026-05-26 cs.LG cs.CV cs.NE

V3H: View Variation and View Heredity for Incomplete Multi-view Clustering

V3H: 面向不完整多视图聚类的视图变异与视图遗传

Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu

发表机构 * School of Computer Science and Technology, Huazhong University of Science and Technology（华中科技大学计算机科学与技术学院）； Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology（华中科技大学网络空间安全学院大数据安全湖北工程研究中心）； Department of Electrical and Computer Engineering, University of Florida（佛罗里达大学电气与计算机工程系）

AI总结提出一种受遗传学启发的视图变异与视图遗传方法(V3H)，通过分解子空间为变异矩阵和遗传矩阵分别学习各视图的独特信息和所有视图的一致信息，并利用可调低秩表示恢复底层数据结构，在不完整多视图聚类中同时捕获一致与独特信息，在15个基准数据集上超越现有方法。

Comments Published in IEEE Transactions on Artificial Intelligence

Journal ref IEEE Transactions on Artificial Intelligence 2020

DOI: 10.1109/TAI.2021.3052425

AI中文摘要

真实数据常以多个不完整视图的形式出现。不完整多视图聚类是集成这些不完整视图的有效方法。以往的方法仅学习不同视图之间的一致信息，而忽略了每个视图的独特信息，这限制了它们的聚类性能和泛化能力。为克服这一局限，我们提出了一种新颖的视图变异与视图遗传方法(V3H)。受遗传学中变异与遗传的启发，V3H首先将每个子空间分解为对应视图的变异矩阵和所有视图的遗传矩阵，分别表示独特信息和一致信息。然后，通过基于聚类指示矩阵对齐不同视图，V3H集成来自不同视图的独特信息以提高聚类性能。最后，借助基于遗传矩阵的可调低秩表示，V3H恢复潜在的真正数据结构以减少大不完整性的影响。更重要的是，V3H可能是首个将遗传学引入聚类算法以从不完整多视图数据中同时学习一致信息和独特信息的工作。在15个基准数据集上的大量实验结果验证了其相对于其他最先进方法的优越性。

英文摘要

Real data often appear in the form of multiple incomplete views. Incomplete multi-view clustering is an effective method to integrate these incomplete views. Previous methods only learn the consistent information between different views and ignore the unique information of each view, which limits their clustering performance and generalization. To overcome this limitation, we propose a novel View Variation and View Heredity approach (V3H). Inspired by the variation and the heredity in genetics, V3H first decomposes each subspace into a variation matrix for the corresponding view and a heredity matrix for all the views to represent the unique information and the consistent information respectively. Then, by aligning different views based on their cluster indicator matrices, V3H integrates the unique information from different views to improve the clustering performance. Finally, with the help of the adjustable low-rank representation based on the heredity matrix, V3H recovers the underlying true data structure to reduce the influence of the large incompleteness. More importantly, V3H presents possibly the first work to introduce genetics to clustering algorithms for learning simultaneously the consistent information and the unique information from incomplete multi-view data. Extensive experimental results on fifteen benchmark datasets validate its superiority over other state-of-the-art methods.

2011.10396 2026-05-26 cs.LG cs.AI

Double Self-weighted Multi-view Clustering via Adaptive View Fusion

双自加权多视图聚类：通过自适应视图融合

Xiang Fang, Yuchong Hu

发表机构 * School of Computer Science and Technology, Key Laboratory of Information Storage System Ministry of Education of China, Huazhong University of Science and Technology（计算机科学与技术学院，信息存储系统教育部重点实验室，华中科技大学）

AI总结提出双自加权多视图聚类框架（DSMC），通过自适应权重矩阵和权重因子分别对特征和图进行加权，去除冗余和噪声，并融合多图进行聚类。

Comments Corresponding author: Xiang Fang

AI中文摘要

多视图聚类已应用于许多实际应用中，其中原始数据通常包含噪声。一些基于图的多视图聚类方法被提出来试图减少噪声的负面影响。然而，以往的基于图的多视图聚类方法即使存在冗余特征或噪声，也平等对待所有特征，这显然是不合理的。在本文中，我们提出了一种新颖的多视图聚类框架——双自加权多视图聚类（DSMC）来克服上述缺陷。DSMC执行双自加权操作，从每个图中去除冗余特征和噪声，从而获得鲁棒的图。对于第一次自加权操作，它通过引入自适应权重矩阵为不同特征分配不同的权重，这可以增强重要特征在联合表示中的作用，并使每个图鲁棒。对于第二次自加权操作，它通过施加自适应权重因子对不同图进行加权，这可以为更鲁棒的图分配更大的权重。此外，通过设计自适应多图融合，我们可以融合不同图中的特征，以整合这些图进行聚类。在六个真实世界数据集上的实验证明了其相对于其他最先进的多视图聚类方法的优势。

英文摘要

Multi-view clustering has been applied in many real-world applications where original data often contain noise. Some graph-based multi-view clustering methods have been proposed to try to reduce the negative influence of noise. However, previous graph-based multi-view clustering methods treat all features equally even if there are redundant features or noise, which is obviously unreasonable. In this paper, we propose a novel multi-view clustering framework Double Self-weighted Multi-view Clustering (DSMC) to overcome the aforementioned deficiency. DSMC performs double self-weighted operations to remove redundant features and noise from each graph, thereby obtaining robust graphs. For the first self-weighted operation, it assigns different weights to different features by introducing an adaptive weight matrix, which can reinforce the role of the important features in the joint representation and make each graph robust. For the second self-weighted operation, it weights different graphs by imposing an adaptive weight factor, which can assign larger weights to more robust graphs. Furthermore, by designing an adaptive multiple graphs fusion, we can fuse the features in the different graphs to integrate these graphs for clustering. Experiments on six real-world datasets demonstrate its advantages over other state-of-the-art multi-view clustering methods.

2011.10331 2026-05-26 cs.CV cs.LG

ANIMC: A Soft Framework for Auto-weighted Noisy and Incomplete Multi-view Clustering

ANIMC: 一种自动加权噪声与不完整多视图聚类的软框架

Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu

发表机构 * Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology（大数据安全湖北工程研究中心，网络空间安全学院，华中科技大学）； School of Computer Science and Technology, Huazhong University of Science and Technology（计算机科学与技术学院，华中科技大学）； Key Laboratory of Information Storage System Ministry of Education of China, Huazhong University of Science and Technology（信息存储系统教育部重点实验室，华中科技大学）； Department of Electrical and Computer Engineering, University of Florida（电气与计算机工程系，佛罗里达大学）

AI总结提出ANIMC框架，通过软自动加权策略和双软正则回归模型，处理多视图聚类中的缺失实例和噪声问题。

Comments Published in IEEE Transactions on Artificial Intelligence

Journal ref IEEE Transactions on Artificial Intelligence 2021

AI中文摘要

多视图聚类在许多图像处理场景中有广泛应用。在这些场景中，原始图像数据通常包含缺失实例和噪声，而大多数多视图聚类方法忽略了这一点。然而，缺失实例可能使这些方法难以直接使用，噪声则会导致不可靠的聚类结果。本文通过软自动加权策略和双软正则回归模型，提出了一种新颖的自动加权噪声与不完整多视图聚类框架（ANIMC）。首先，通过设计自适应半正则化非负矩阵分解（adaptive semi-RNMF），软自动加权策略为每个视图分配适当的权重，并添加软边界以平衡噪声和不完整性的影响。其次，通过提出θ-范数，双软正则回归模型通过选择不同的θ来调整模型的稀疏性。与现有方法相比，ANIMC具有三个独特优势：1）它是一种软算法，可以在不同场景下调整我们的框架，从而提高其泛化能力；2）它自动学习每个视图的适当权重，从而减少噪声的影响；3）它执行双软正则回归，对齐不同视图中的相同实例，从而减少缺失实例的影响。大量实验结果表明，它优于其他最先进的方法。

英文摘要

Multi-view clustering has wide applications in many image processing scenarios. In these scenarios, original image data often contain missing instances and noise, which are ignored by most multi-view clustering methods. However, missing instances may make these methods difficult to use directly, and noise will lead to unreliable clustering results. In this paper, we propose a novel Auto-weighted Noisy and Incomplete Multi-view Clustering framework (ANIMC) via a soft auto-weighted strategy and a doubly soft regularized regression model. Firstly, by designing adaptive semi-regularized nonnegative matrix factorization (adaptive semi-RNMF), the soft auto-weighted strategy assigns a proper weight to each view and adds a soft boundary to balance the influence of noise and incompleteness. Secondly, by proposing the θ-norm, the doubly soft regularized regression model adjusts the sparsity of our model by choosing different θ. Compared with existing methods, ANIMC has three unique advantages: 1) it is a soft algorithm to adjust our framework in different scenarios, thereby improving its generalization ability; 2) it automatically learns a proper weight for each view, thereby reducing the influence of noise; 3) it performs doubly soft regularized regression that aligns the same instances in different views, thereby decreasing the impact of missing instances. Extensive experimental results demonstrate its superior advantages over other state-of-the-art methods.

2011.10254 2026-05-26 cs.LG cs.AI stat.ML

Unbalanced Incomplete Multi-view Clustering via the Scheme of View Evolution: Weak Views are Meat; Strong Views do Eat

通过视图演化方案的不平衡不完整多视图聚类：弱视图为肉，强视图食之

Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu

发表机构 * School of Computer Science and Technology, Key Laboratory of Information Storage System Ministry of Education of China, Huazhong University of Science and Technology（计算机科学与技术学院，信息存储系统教育部重点实验室，华中科技大学）； Hubei Engineering Research Center on Big Data Security, School of Cyber Science and Engineering, Huazhong University of Science and Technology（大数据安全工程研究中心，网络安全学院，华中科技大学）； Department of Electrical and Computer Engineering, University of Florida（电气与计算机工程系，佛罗里达大学）

AI总结针对不同视图不完整程度不平衡的问题，受生物进化理论启发，提出基于视图演化的不平衡不完整多视图聚类方法UIMC，通过加权多视图子空间聚类和低秩鲁棒表示恢复数据，显著提升聚类性能。

Comments Accepted by IEEE Transactions on Emerging Topics in Computational Intelligence

Journal ref IEEE Transactions on Emerging Topics in Computational Intelligence 2021

DOI: 10.1109/TETCI.2021.3077909

AI中文摘要

不完整多视图聚类是处理现实世界中不完整多视图数据的重要技术。以往的工作假设所有视图具有相同的不完整性，即平衡不完整性。然而，不同的视图往往具有不同的不完整性，即不平衡不完整性，这导致了强视图（低不完整性视图）和弱视图（高不完整性视图）。不平衡不完整性阻止我们直接使用先前的方法进行聚类。在本文中，受有效生物进化理论的启发，我们设计了新颖的视图演化方案来聚类强视图和弱视图。此外，我们提出了一种不平衡不完整多视图聚类方法（UIMC），这是第一个基于视图演化的有效方法，用于不平衡不完整多视图聚类。与先前的方法相比，UIMC有两个独特的优势：1）它提出了加权多视图子空间聚类来整合这些不平衡不完整的视图，有效解决了不平衡不完整多视图问题；2）它设计了低秩和鲁棒表示来恢复数据，减少了不完整性和噪声的影响。大量的实验结果表明，UIMC在三个评估指标上相比其他最先进的方法将聚类性能提高了高达40%。

英文摘要

Incomplete multi-view clustering is an important technique to deal with real-world incomplete multi-view data. Previous works assume that all views have the same incompleteness, i.e., balanced incompleteness. However, different views often have distinct incompleteness, i.e., unbalanced incompleteness, which results in strong views (low-incompleteness views) and weak views (high-incompleteness views). The unbalanced incompleteness prevents us from directly using the previous methods for clustering. In this paper, inspired by the effective biological evolution theory, we design the novel scheme of view evolution to cluster strong and weak views. Moreover, we propose an Unbalanced Incomplete Multi-view Clustering method (UIMC), which is the first effective method based on view evolution for unbalanced incomplete multi-view clustering. Compared with previous methods, UIMC has two unique advantages: 1) it proposes weighted multi-view subspace clustering to integrate these unbalanced incomplete views, which effectively solves the unbalanced incomplete multi-view problem; 2) it designs the low-rank and robust representation to recover the data, which diminishes the impact of the incompleteness and noises. Extensive experimental results demonstrate that UIMC improves the clustering performance by up to 40% on three evaluation metrics over other state-of-the-art methods.

2605.25250 2026-05-26 cs.AI

LipoAgent: Coordinating Fine-Tuned LLM Agents for Safer Lipid Design

Leshu Li, An Lu, Haiyu Wang, Zhibin Feng, Conghui Duan, Qing Bao, Zongmin Zhao, Sai Qian Zhang

Affiliations: New York University; University of Illinois Chicago

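AI summary: Proposes LipoAgent, a safety-aware multi-agent LLM framework that uses a conditional prediction objective to enforce toxicity as a prerequisite for efficiency prediction and adds multi-agent verification, achieving an average 32% relative improvement in mRNA transfection efficiency prediction.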

Abstract

Lipid nanoparticles (LNPs) are among the most clinically mature platforms for nucleic acid delivery, yet designing lipids that are both effective and biologically safe remains a major bottleneck. In practical screening, toxicity is a decision-level constraint: if a lipid is toxic, its efficiency prediction is clinically irrelevant. We propose LipoAgent, a safety-aware multi-agent LLM framework for lipid discovery. LipoAgent combines domain-specific fine-tuning with a conditional prediction objective that enforces toxicity as a prerequisite for efficiency prediction, and further improves reliability via multi-agent verification, with lightweight human oversight when disagreement persists. Across multiple foundation models, LipoAgent achieves an average 32% relative improvement in mRNA transfection efficiency prediction compared with other reported models for lipid design. Wet-lab validation confirms that virtual screening rankings reliably translate to biological transfection outcomes. The code is publicly available at https://github.com/SAI-Lab-NYU/LipoAgent.git.
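
The idea of treating toxicity as a prerequisite for efficiency prediction can be illustrated with a minimal sketch. This is not the LipoAgent implementation: the `Screening` type, the `screen_lipid` function, the SMILES-string input, and the toy predictors are all hypothetical names and assumptions introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Screening:
    toxic: bool
    efficiency: Optional[float]  # None when the lipid is rejected as toxic


def screen_lipid(
    smiles: str,
    predict_toxic: Callable[[str], bool],
    predict_efficiency: Callable[[str], float],
) -> Screening:
    """Predict efficiency only for lipids that pass the toxicity check."""
    if predict_toxic(smiles):
        # A toxic lipid never receives an efficiency score.
        return Screening(toxic=True, efficiency=None)
    return Screening(toxic=False, efficiency=predict_efficiency(smiles))


# Toy stand-in predictors (assumptions, not real models).
def toy_toxic(smiles: str) -> bool:
    return "Cl" in smiles


def toy_efficiency(smiles: str) -> float:
    return min(1.0, len(smiles) / 10)


print(screen_lipid("CCCCO", toy_toxic, toy_efficiency))
print(screen_lipid("CCCl", toy_toxic, toy_efficiency))
```

In this sketch the gate is a hard rule applied at inference time; the abstract describes a conditional training objective, which may behave differently.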

2605.25244 2026-05-26 cs.CL

Inference Time Optimization with Confidence Dynamics

Yu Wang, Minghao Liu, Jiayun Wang, Jinrui Huang, Ankit Shah, Wei Wei

Affiliations: Center for Advanced AI, Accenture

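AI summary: By observing how confidence evolves along reasoning trajectories, this paper finds that confidence rises in correct traces and falls in incorrect ones, and on that basis proposes Confidence Dynamic Gain voting, which significantly improves LLM reasoning performance.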

Comments Published in ICML 2026

Abstract

Inference-time optimization techniques, such as repeated sampling, have significantly advanced the reasoning capabilities of Large Language Models (LLMs). However, the critical role of model uncertainty remains largely underexplored in these optimization strategies. In this paper, we investigate the dynamics of confidence along reasoning trajectories and for the first time reveal a surprising and unique pattern: correct answer traces tend to exhibit confidence improvement over time (positive confidence gain), while incorrect traces show attenuated or declining confidence as reasoning proceeds. Based on this observation, we propose Confidence Dynamic Gain (CDG) based voting, which incorporates how the confidence trajectory of the response evolves along the reasoning chain. Experiments across four open-source architectures (DeepSeek-R1, gpt-oss, Gemma-3, Qwen-QwQ) on the AIME24/25, HMMT25, and BRUMO25 benchmarks show that CDG yields a significant performance boost over baselines. These results indicate that our method provides a robust discriminative signal for improving answer selection in LLM reasoning. We also provide theoretical insights into this phenomenon. Code will be released at https://github.com/Accenture/CDG.git.
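
A rough sketch of how a confidence-gain signal could weight votes across sampled traces is given below. The abstract does not define the gain or the voting rule, so everything here is an assumption: the gain is taken as late-half mean confidence minus early-half mean confidence, only positive gains count toward an answer, and the function names `confidence_gain` and `gain_weighted_vote` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def confidence_gain(confidences, split=0.5):
    """Late-segment mean confidence minus early-segment mean (assumed definition)."""
    if len(confidences) < 2:
        return 0.0
    cut = max(1, min(len(confidences) - 1, int(len(confidences) * split)))
    return mean(confidences[cut:]) - mean(confidences[:cut])


def gain_weighted_vote(traces):
    """Pick the answer whose traces carry the largest total positive gain.

    `traces` is a list of (answer, per-step confidences) pairs.
    Falls back to a plain majority vote if no trace has positive gain.
    """
    scores = defaultdict(float)
    counts = defaultdict(int)
    for answer, confidences in traces:
        counts[answer] += 1
        scores[answer] += max(0.0, confidence_gain(confidences))
    if all(score == 0.0 for score in scores.values()):
        return max(counts, key=counts.get)
    return max(scores, key=scores.get)


traces = [
    ("42", [0.40, 0.50, 0.70, 0.90]),  # rising confidence
    ("17", [0.80, 0.70, 0.50, 0.40]),  # falling confidence
    ("17", [0.60, 0.60, 0.55, 0.50]),  # falling confidence
]
print(gain_weighted_vote(traces))
```

With these toy traces a plain majority vote would pick "17", while the gain-weighted vote prefers the single trace whose confidence rises.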

A Closer Look on Memorization in Tabular Diffusion Model: A Data-Centric Perspective

Generalizable Vision-Language Few-Shot Adaptation with Predictive Prompts and Negative Learning

Efficient and Scalable Neural Symbolic Search for Knowledge Graph Complex Query Answering

Combining Abstract Argumentation and Machine Learning for Efficiently Analyzing Low-Level Process Event Streams

ACCORD: Alleviating Concept Coupling through Dependence Regularization for Text-to-Image Diffusion Personalization

A neural signed configuration distance function for path planning of picking manipulators

SURGE: On the Potential of Large Language Models as General-Purpose Surrogate Code Executors

PCGRLLM: Large Language Model-Driven Reward Design for Procedural Content Generation Reinforcement Learning

ExplainReduce: Generating global explanations from many local explanations

Message-Passing GNNs Fail to Approximate Sparse Triangular Factorizations

FragmentNet: Adaptive Graph Fragmentation for Graph-to-Sequence Molecular Representation Learning

Iterative Feature Space Optimization through Incremental Adaptive Evaluation

Adaptive Hierarchical Graph Cut for Multi-granularity Out-of-distribution Detection

Data-Driven Optimization of Tactile Sensor Configurations for Efficient Dexterous Manipulation

Appearance Blur-driven AutoEncoder and Motion-guided Memory Module for Video Anomaly Detection

Uncertainty-Guided Appearance-Motion Association Network for Out-of-Distribution Action Detection

Transformer-based few-shot learning for modeling Electricity Consumption Profiles with minimal data across thousands of domains

Hadamard Representation: Scaffolding Performance Across Model-free RL

Residual Connections Harm Generative Representation Learning

Branch Scaling Manifests as Implicit Architectural Regularization for Improving Generalization in Overparameterized ResNets

Opening the Black-Box: A Systematic Review on Explainable AI in Remote Sensing

On the Communication Complexity of Decentralized Stochastic Bilevel Optimization

You Can Ground Earlier than See: An Effective and Efficient Pipeline for Temporal Sentence Grounding in Compressed Videos

Multi-Modal Cross-Domain Alignment Network for Video Moment Retrieval

V3H: View Variation and View Heredity for Incomplete Multi-view Clustering

Double Self-weighted Multi-view Clustering via Adaptive View Fusion

ANIMC: A Soft Framework for Auto-weighted Noisy and Incomplete Multi-view Clustering

Unbalanced Incomplete Multi-view Clustering via the Scheme of View Evolution: Weak Views are Meat; Strong Views do Eat

LipoAgent: Coordinating Fine-Tuned LLM Agents for Safer Lipid Design

Inference Time Optimization with Confidence Dynamics