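arXivDaily: a daily academic digest that mirrors the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.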

2602.09639 2026-06-10 cs.LG stat.ML Updated version

Blind denoising diffusion models and the blessings of dimensionality

Zahra Kadkhodaie, Aram-Alexandre Pooladian, Sinho Chewi, Eero Simoncelli

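Affiliations: Flatiron Institute, Simons Foundation; Foundations of Data Science, Yale University; Department of Statistics and Data Science, Yale University; Ctr. for Neural Science & Courant Institute, New York University

AI summary: Proposes blind denoising diffusion models (BDDMs), which simplify the design by not passing the noise amplitude to the neural network, proves their correctness under the assumption that the data's intrinsic dimension is low relative to the ambient dimension, and shows experimentally the benefits of an adaptive scheme.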

Comments 39 pages, 13 figures; Accepted to ICML 2025 FoGen workshop

Abstract

Denoising diffusion models (DDMs) are state-of-the-art methods for learning densities from data across numerous domains, yet many aspects of the training and sampling pipeline remain poorly understood. In particular, noise conditioning requires practitioners to incorporate contrived, unprincipled noise embeddings into neural network architectures and to use ad hoc noise schedules for sampling. To address these drawbacks, we provide a complete theory for blind denoising diffusion models (BDDMs): a variant of DDMs where the noise amplitude is not passed into the neural network during training or sampling, obviating the need for the aforementioned design choices. We justify the correctness of BDDMs as a sampling algorithm under an assumption of low intrinsic dimensionality of the underlying data distribution relative to the ambient dimension. This assumption arises through the introduction of the Bayesian problem of estimating noise levels from a single noisy sample, which might be of independent interest. We empirically compare the performance of BDDMs to standard DDMs, showcasing the benefits of an adaptive scheme which is rigorously justified by our analysis.
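The noise-level estimation problem mentioned in the abstract can be illustrated with a toy calculation. The sketch below is not the paper's estimator: it assumes a hypothetical signal that occupies only the first `k` of `d` coordinates with unit total energy, adds Gaussian noise of standard deviation `sigma`, and estimates `sigma` from the root mean square of the single noisy sample. The names `noisy_sample` and `estimate_sigma` are illustrative.

```python
import math
import random

def noisy_sample(d, k, sigma, rng):
    """Return y = x + sigma * z, where x lives in the first k of d coordinates."""
    # Hypothetical low-dimensional signal: k nonzero coordinates, unit total energy.
    x = [1.0 / math.sqrt(k) if i < k else 0.0 for i in range(d)]
    return [xi + sigma * rng.gauss(0.0, 1.0) for xi in x]

def estimate_sigma(y):
    """Naive estimate: root mean square of y, which ignores the signal energy."""
    return math.sqrt(sum(v * v for v in y) / len(y))

rng = random.Random(0)
sigma_true = 0.5
for d in (10, 100, 10_000):
    y = noisy_sample(d, k=5, sigma=sigma_true, rng=rng)
    print(d, round(estimate_sigma(y), 3))
```

With `k` fixed, the signal's contribution to the mean square shrinks like 1/d while the noise term concentrates around `sigma` squared, which is one informal way to read the "blessings of dimensionality" in the title.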

2602.06411 2026-06-10 cs.LG Updated version

DAH-Net: A Dual-Attention Hybrid Network for Interpretable and Robust EEG-Based Emotion Recognition

S M Rakib UI Karim, Diponkor Bala, Wenyi Lu, Rownak Ara Rasul, Sean Goggins

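Affiliations: Department of Electrical & Computer Engineering, University of Missouri, Columbia, Missouri, USA; Department of Computer Science & Engineering, City University, Savar, Dhaka-1340, Bangladesh; Department of Computer Science, University of Missouri, Columbia, Missouri, USA

AI summary: Proposes DAH-Net, a dual-attention hybrid network integrating 1D-CNN, BiLSTM, and dual multi-head attention, which reaches 99.19% test accuracy on 2,479 samples with 988 features, outperforms conventional models, and finds through feature analysis that covariance features contribute the most.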

Abstract

EEG-based emotion recognition supports affective brain-computer interfaces and mental health monitoring yet remains challenged by signal complexity, subject variability, and limited interpretability. We propose DAH-Net, a dual-attention hybrid network integrating 1D-CNN, BiLSTM, and dual multi-head attention (16+8 heads) for three-class EEG emotion classification. Evaluated on 2,479 samples with 988 EEG features, DAH-Net achieves 99.19% held-out test accuracy with a 0.81% train-test gap, outperforming RF (96.17%), SVM (96.77%), MLP (97.18%), and Transformer (98.19%) baselines. Friedman testing ($\chi^2 = 28.54$, $p < 0.001$) and post-hoc Wilcoxon comparisons confirm statistical significance. Feature-level analysis using Random Forest importance, SHAP attribution, and feature category isolation shows that covariance features achieve near-baseline standalone accuracy (94.96%), while eigenvalue features show limited standalone performance (84.07%) but provide compact complementary information. The compact architecture (3.33M parameters, approximately 13.3MB using 32-bit weights) suggests potential for future lightweight EEG-based affective computing, pending subject-independent and external validation.
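The stated model size follows from simple arithmetic on the parameter count. The helper below is a hypothetical illustration (the name `weight_memory_mb` is not from the paper); it counts weight storage only, in decimal megabytes, and the 16-bit case is an assumption added for comparison.

```python
def weight_memory_mb(num_params, bits_per_weight=32):
    """Storage for the weights alone, in decimal megabytes (1 MB = 10**6 bytes)."""
    return num_params * (bits_per_weight / 8) / 1e6

print(round(weight_memory_mb(3.33e6), 2))      # 32-bit weights, as in the abstract
print(round(weight_memory_mb(3.33e6, 16), 2))  # hypothetical 16-bit weights
```

At 4 bytes per weight, 3.33 million parameters come to about 13.3 MB, which matches the figure quoted in the abstract.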

2601.21543 2026-06-10 cs.CL Updated version

Synthesizable Molecule Generation via Soft-Constrained GFlowNets with Rich Chemical Priors

Hyeonah Kim, Minsu Kim, Celine Roget, Dionessa Biton, Louis Vaillancourt, Yves V. Brun, Yoshua Bengio, Alex Hernandez-Garcia

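Affiliations: University of Toronto; DeepMind; University of Montreal

AI summary: Proposes S3-GFN, which softly regularizes a sequence-based GFlowNet with chemical priors from large-scale SMILES corpora to generate high-reward, synthesizable molecules; experiments report a synthesizable rate of ≥95%.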

Abstract

The application of generative models for experimental drug discovery campaigns is severely limited by the difficulty of designing molecules de novo that can be synthesized in practice. Previous works have leveraged Generative Flow Networks (GFlowNets) to impose hard synthesizability constraints through the design of state and action spaces based on predefined reaction templates and building blocks. Despite the promising prospects of this approach, it currently lacks flexibility and scalability. As an alternative, we propose S3-GFN, which generates synthesizable SMILES molecules via simple soft regularization of a sequence-based GFlowNet. Our approach leverages rich molecular priors learned from large-scale SMILES corpora to steer molecular generation towards high-reward, synthesizable chemical spaces. The model induces constraints through off-policy replay training with a contrastive learning signal based on separate buffers of synthesizable and unsynthesizable samples. Our experiments show that S3-GFN learns to generate synthesizable molecules ($\geq 95\%$) with higher rewards in diverse tasks.

2503.13358 2026-06-10 cs.CV Updated version

One-Step Residual Shifting Diffusion for Image Super-Resolution via Distillation

Daniil Selikhanovych, David Li, Aleksei Leonov, Nikita Gushchin, Sergei Kushneriuk, Alexander Filippov, Evgeny Burnaev, Iaroslav Koshelev, Alexander Korotin

Affiliations: Kandinsky Lab; Mohamed bin Zayed University of Artificial Intelligence; Luzin Research Center; Moscow Independent Research Institute of Artificial Intelligence; Applied AI Institute

AI summary: Proposes RSD, a distillation method that trains a student network so that a fake ResShift model trained on its generated images coincides with the teacher, achieving single-step super-resolution that surpasses the teacher and SinSR on perceptual metrics at lower parameter and compute cost.

Comments ICML-2026

Abstract

Diffusion models for super-resolution (SR) produce high-quality visual results but incur high computational costs. Despite the development of several methods to accelerate diffusion-based SR models, some (e.g., SinSR) fail to produce realistic perceptual details, while others (e.g., OSEDiff) may hallucinate non-existent structures. To overcome these issues, we present RSD, a new distillation method for ResShift. Our method is based on training the student network to produce images such that a new fake ResShift model trained on them will coincide with the teacher model. RSD achieves single-step restoration and outperforms the teacher by a noticeable margin in various perceptual metrics (LPIPS, CLIPIQA, MUSIQ). We show that our distillation method can surpass SinSR, the other distillation-based method for ResShift, putting it on par with state-of-the-art diffusion SR distillation methods in perceptual quality at limited computational cost. Compared to SR methods based on pre-trained text-to-image models, RSD produces competitive perceptual quality and requires fewer parameters, less GPU memory, and lower training cost. We provide experimental results on various real-world and synthetic datasets, including RealSR, RealSet65, DRealSR, ImageNet, and DIV2K. We provide the code at https://github.com/Daniil-Selikhanovych/RSD.

2602.03164 2026-06-10 cs.LG cs.AI Updated version

MemCast: Memory-Driven Time Series Forecasting with Experience-Conditioned Reasoning

Xiaoyu Tao, Mingyue Cheng, Ze Guo, Shuo Yu, Yaguo Liu, Qi Liu, Shijin Wang

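Affiliations: University of Science and Technology of China

AI summary: Proposes the MemCast framework, which recasts time series forecasting as an experience-conditioned reasoning task, organizes learned historical patterns, reasoning wisdom, and general laws into a hierarchical memory, and uses a dynamic confidence adaptation strategy for continual evolution; it outperforms existing methods on multiple datasets.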

Abstract

Time series forecasting (TSF) plays a critical role in decision-making for many real-world applications. Recently, large language model (LLM)-based forecasters have made promising advancements. Despite their effectiveness, existing methods often lack explicit experience accumulation and continual evolution. In this work, we propose MemCast, a learning-to-memory framework that reformulates TSF as an experience-conditioned reasoning task. Specifically, we learn experience from the training set and organize it into a hierarchical memory. This is achieved by summarizing prediction results into historical patterns, distilling inference trajectories into reasoning wisdom, and inducing extracted temporal features into general laws. Furthermore, during inference, we leverage historical patterns to guide the reasoning process and utilize reasoning wisdom to select better trajectories, while general laws serve as criteria for reflective iteration. Additionally, to enable continual evolution, we design a dynamic confidence adaptation strategy that updates the confidence of individual entries without leaking the test set distribution. Extensive experiments on multiple datasets demonstrate that MemCast consistently outperforms previous methods, validating the effectiveness of our approach. Our code is available at https://github.com/Xiaoyu-Tao/MemCast-TS.

2602.02788 2026-06-10 cs.LG cs.AI physics.comp-ph Updated version

Structure-Preserving Learning Improves Geometry Generalization in Neural PDEs

Benjamin D. Shaffer, Shawn Koohy, Brooks Kinch, M. Ani Hsieh, Nathaniel Trask

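Affiliations: University of California, Berkeley

AI summary: Proposes General-Geometry Neural Whitney Forms (Geo-NeW), which jointly learns a differential operator and compatible reduced finite element spaces and uses Finite Element Exterior Calculus to preserve physical conservation laws exactly, markedly improving generalization to unseen geometries.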

Abstract

We aim to develop physics foundation models for science and engineering that provide real-time solutions to Partial Differential Equations (PDEs) which preserve structure and accuracy under adaptation to unseen geometries. To this end, we introduce General-Geometry Neural Whitney Forms (Geo-NeW): a data-driven finite element method. We jointly learn a differential operator and compatible reduced finite element spaces defined on the underlying geometry. The resulting model is solved to generate predictions, while exactly preserving physical conservation laws through Finite Element Exterior Calculus. Geometry enters the model as a discretized mesh both through a transformer-based encoding and as the basis for the learned finite element spaces. This explicitly connects the underlying geometry and imposed boundary conditions to the solution, providing a powerful inductive bias for learning neural PDEs, which we demonstrate improves generalization to unseen domains. We provide a novel parameterization of the constitutive model ensuring the existence and uniqueness of the solution. Our approach demonstrates state-of-the-art performance on several steady-state PDE benchmarks, and provides a significant improvement over conventional baselines on out-of-distribution geometries.
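One structural identity behind exterior-calculus discretizations is that applying the discrete derivative twice gives zero. The sketch below checks this on a single triangle using signed incidence matrices; it is a minimal illustration under that assumption, not the Geo-NeW method, and the names `d0`, `d1`, and `matmul` are chosen here for the example.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

# One triangle with vertices 0, 1, 2 and oriented edges e0 = 0->1, e1 = 1->2, e2 = 0->2.
# d0 maps vertex values to edge differences (a discrete gradient).
d0 = [
    [-1, 1, 0],   # e0
    [0, -1, 1],   # e1
    [-1, 0, 1],   # e2
]
# d1 sums edge values around the face 0->1->2->0 (a discrete curl);
# e2 is traversed against its orientation, hence the -1.
d1 = [[1, 1, -1]]

print(matmul(d1, d0))
```

Because the product vanishes identically, any edge field that is a discrete gradient has zero circulation around the face regardless of the vertex values, so the property holds exactly and not only up to discretization error.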

2602.01951 2026-06-10 cs.CV Updated version

Enabling Progressive Whole-slide Image Analysis with Multi-scale Pyramidal Network

Shuyang Wu, Yifu Qiu, Ines P Nearchou, Sandrine Prost, Jonathan A Fallowfield, Hakan Bilen, Timothy J Kendall

Affiliations: Institute for Regeneration and Repair, University of Edinburgh; School of Informatics, University of Edinburgh; Indica Labs, 8700 Education Pl NW, Bldg. B Albuquerque, US; Medical School, University of St Andrews

AI summary: Proposes the Multi-scale Pyramidal Network (MSPN), a plug-and-play module that enables progressive multi-scale whole-slide image analysis from a single high-magnification input, learning coarse-grained context through grid-based remapping and a Coarse Guidance Network, and consistently improving MIL performance across tasks and frameworks.

Abstract

Multiple-instance Learning (MIL) is commonly used for computational pathology (CPath), where multi-scale features are essential for capturing both fine cellular details and broad tissue architecture. However, existing multi-scale MIL approaches typically rely on inflexible multi-magnification inputs or computationally expensive architectures. As pre-trained foundation models (FMs) become the trend for feature extraction and boost lightweight models, we rethink and explore a more efficient multi-scale MIL method. In this paper, we propose the Multi-scale Pyramidal Network (MSPN), a plug-and-play module for attention-based MIL. MSPN introduces progressive multi-scale whole-slide image analysis using only a single high-magnification input. It consists of (1) grid-based remapping that aggregates high-magnification features to derive spatially-aware coarse feature maps, and (2) the Coarse Guidance Network (CGN) that learns coarse contexts. We benchmark MSPN as an add-on module to 4 attention-based frameworks on 5 clinically relevant tasks with 2 foundation models and a pre-trained MIL framework. Our results demonstrate that MSPN consistently improves MIL across the compared configurations and tasks, while being lightweight and easy to use.
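The grid-based remapping step can be pictured as pooling patch features over blocks of the patch grid. The sketch below is one possible reading, not the paper's implementation: it assumes each high-magnification patch has integer `(row, col)` grid coordinates and that features in the same block are averaged. The function name `grid_remap` and the `cell` parameter are hypothetical.

```python
from collections import defaultdict

def grid_remap(coords, feats, cell=2):
    """Average patch features that fall in the same cell x cell block of the grid."""
    buckets = defaultdict(list)
    for (row, col), f in zip(coords, feats):
        buckets[(row // cell, col // cell)].append(f)
    coarse = {}
    for key, group in buckets.items():
        dim = len(group[0])
        # Mean pooling is an assumption; the abstract does not name the aggregator.
        coarse[key] = [sum(f[j] for f in group) / len(group) for j in range(dim)]
    return coarse

coords = [(0, 0), (0, 1), (1, 0), (3, 3)]
feats = [[1.0, 0.0], [3.0, 0.0], [2.0, 6.0], [5.0, 5.0]]
print(grid_remap(coords, feats))
```

Keying the coarse map by block coordinates keeps the spatial layout, so blocks with no tissue patches are simply absent instead of being filled with zeros.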

2601.22763 2026-06-10 cs.CV Updated version

Predictive Control for Offline Decision Making via Model-Based Diffusion Sampling

Haldun Balim, Na Li, Yilun Du

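Affiliations: GitHub

AI summary: Proposes the MPDiffuser framework, which combines a diffusion planner with a dynamics diffusion model and alternates their updates during sampling to generate trajectories that meet task objectives and are dynamically feasible, uses a lightweight ranking module to select the best trajectory, and is validated on the D4RL and DSRL benchmarks and on a quadruped robot.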

Abstract

Pre-trained diffusion models have emerged as powerful generative priors for both unconditional and conditional sample generation, yet their outputs often deviate from the characteristics of user-specific target data. Such mismatches are especially problematic in domain adaptation tasks, where only a few reference examples are available and retraining the diffusion model is infeasible. Existing inference-time guidance methods can adjust sampling trajectories, but they typically optimize surrogate objectives such as classifier likelihoods rather than directly aligning with the target distribution. We propose MMD Guidance, a training-free mechanism that augments the reverse diffusion process with gradients of the Maximum Mean Discrepancy (MMD) between generated samples and a reference dataset. MMD provides reliable distributional estimates from limited data, exhibits low variance in practice, and is efficiently differentiable, which makes it particularly well-suited for the guidance task. Our framework naturally extends to prompt-aware adaptation in conditional generation models via product kernels. Also, it can be applied with computational efficiency in latent diffusion models (LDMs), since guidance is applied in the latent space of the LDM. Experiments on synthetic and real-world benchmarks demonstrate that MMD Guidance can achieve distributional alignment while preserving sample fidelity. The project code is available at github.com/matinamehdizadeh/MMD-Guidance.
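The quantity at the center of this abstract, the Maximum Mean Discrepancy between two samples, is short to compute. The sketch below gives the standard biased estimate of squared MMD; the Gaussian (RBF) kernel and the `bandwidth` value are assumptions made for the example, and the code only evaluates the discrepancy. It does not implement the guidance step, which uses gradients of this quantity.

```python
import math

def rbf(a, b, bandwidth=1.0):
    """Gaussian kernel between two points given as lists of floats."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq / (2.0 * bandwidth ** 2))

def mmd2(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    def mean_kernel(us, vs):
        return sum(rbf(u, v, bandwidth) for u in us for v in vs) / (len(us) * len(vs))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)

generated = [[0.0, 0.0], [1.0, 0.0]]
reference = [[3.0, 3.0], [4.0, 3.0]]
print(mmd2(generated, reference))   # far-apart samples give a large value
print(mmd2(reference, reference))   # identical samples give zero
```

The estimate needs only pairwise kernel evaluations, which is why it remains usable when the reference set holds just a few examples.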

2601.05232 2026-06-10 cs.CL cs.CY cs.LG Updated version

AI Application Gives Users Real-Time Feedback on the Level of Peace in the Social Media Videos They Watch

P. Gilda, P. Dungarwal, A. Thongkham, E. T. Ajayi, S. Choudhary, T. M. Terol, C. Lam, J. P. Araujo, M. McFadyen-Mungalln, L. S. Liebovitch, P. T. Coleman, H. West, K. Sieck, S. Carter

Affiliations: Data Science Institute, Columbia University; Advanced Consortium on Cooperation, Conflict, and Complexity, Columbia University; Computer Science, Columbia University; Data Science, St John’s University; Quantitative Methods in the Social Sciences, Columbia University; Barnard College, Columbia University; Teachers College, Columbia University; Department of Industrial Engineering and Operations Research, Columbia University; Harmonious Communities, Toyota Research Institute

AI summary: Develops an AI application that analyzes in real time how peaceful the language in YouTube videos is, using supervised learning and large language models; the LLMs come closer to human coders in measuring peace-related social dimensions.

Comments 6 pages, 4 figures, corrected typos, minor edits; v3: 16 pages, improved title, abstract, introduction, discussion, conclusions, added more references

Abstract

Most people now get their news from videos on social media, such as YouTube and Facebook, rather than through curated journalism. "We become what we behold." The content and tone of language play an essential role in starting or ending conflicts. "Hate Speech" can enhance conflict, "Peace Speech" can enhance peace. We developed an application that measures, in real time, these aspects of speech from YouTube videos, which can give users helpful feedback on their own media diet. We used two approaches: 1) supervised machine learning. Language in online news media text was tagged by surveys that measure the level of peace in those countries. One fully connected feedforward and 2 convolutional neural networks trained on that data were $\sim 97\%$ accurate in predicting levels of peace in the test set and $\sim 70\%$ accurate in another distinct news text data set, but did not generalize to YouTube videos, suggesting that written text is different from transcribed spoken language. 2) social science dimensions. There is no similar external data to tag the text in the YouTube video transcripts. We therefore used 2 word-level sentiment analysis (SA) and 6 context-level large language models (LLMs) to measure 5 social dimensions in peace identified by 59 social science studies: compassion-contempt, news-opinion, promotion-prevention, creativity-order, nuance-simplification. On 52 videos, LLMs matched the values assigned by 3 human coders more closely ($r^2\sim0.60$) than SA did ($r^2\sim0.03$). Results: LLMs successfully measured social dimensions important in peace in YouTube videos, compared to human coders. These results serve as the basis of an analysis engine that can give users and content creators feedback on their own media diet and creations.

2601.06997 2026-06-10 cs.RO cs.CV Updated version

ObjSplat: Geometry-Aware Gaussian Surfels for Active Object Reconstruction

Yuetao Li, Zhizhou Jia, Yu Zhang, Qun Hao, Shaohui Zhang

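Affiliations: School of Optics and Photonics, Beijing Institute of Technology; School of Optoelectronic Engineering, Changchun University of Science and Technology

AI summary: Proposes the ObjSplat framework, which uses Gaussian surfels as a unified representation and combines geometry-aware viewpoint evaluation with a next-best-path planner for efficient, high-fidelity active object reconstruction.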

Comments Accepted to IEEE T-ASE. Code: https://github.com/Li-Yuetao/ObjSplat , Project Page: https://li-yuetao.github.io/ObjSplat-page/

DOI: 10.1109/TASE.2026.3700105

Abstract

Autonomous high-fidelity object reconstruction is fundamental for creating digital assets and bridging the simulation-to-reality gap in robotics. We present ObjSplat, an active reconstruction framework that leverages Gaussian surfels as a unified representation to progressively reconstruct unknown objects with both photorealistic appearance and accurate geometry. Addressing the limitations of conventional opacity or depth-based cues, we introduce a geometry-aware viewpoint evaluation pipeline that explicitly models back-face visibility and occlusion-aware multi-view covisibility, reliably identifying under-reconstructed regions even on geometrically complex objects. Furthermore, to overcome the limitations of greedy planning strategies, ObjSplat employs a next-best-path (NBP) planner that performs multi-step lookahead on a dynamically constructed spatial graph. By jointly optimizing information gain and movement cost, this planner generates globally efficient trajectories. Extensive experiments in simulation and on real-world cultural artifacts demonstrate that ObjSplat produces physically consistent models within minutes, achieving superior reconstruction fidelity and surface completeness while significantly reducing scan time and path length compared to state-of-the-art approaches. Project page: https://li-yuetao.github.io/ObjSplat-page/ .
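The difference between greedy next-best-view selection and multi-step lookahead can be shown on a tiny viewpoint graph. The sketch below is a hypothetical illustration, not the ObjSplat planner: it exhaustively scores simple paths up to a fixed horizon, assumes a static per-node information gain, and trades gain against movement cost with a scalar `weight`. All names and numbers are made up for the example.

```python
def best_path(graph, gain, start, horizon, weight=1.0):
    """Exhaustively score simple paths of up to `horizon` moves from `start`.

    graph: dict node -> dict neighbor -> movement cost
    gain:  dict node -> information gain collected on first visit
    Returns (score, path) maximizing total gain minus weight * total cost.
    """
    best = (0.0, [start])

    def extend(path, score):
        nonlocal best
        if score > best[0]:
            best = (score, list(path))
        if len(path) - 1 == horizon:
            return
        for nxt, cost in graph[path[-1]].items():
            if nxt in path:  # visit each viewpoint at most once
                continue
            path.append(nxt)
            extend(path, score + gain[nxt] - weight * cost)
            path.pop()

    extend([start], 0.0)
    return best

graph = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "d": 1.0},
    "c": {"a": 1.0},
    "d": {"b": 1.0},
}
gain = {"a": 0.0, "b": 1.5, "c": 2.0, "d": 5.0}
print(best_path(graph, gain, "a", horizon=1))  # one-step (greedy) choice
print(best_path(graph, gain, "a", horizon=2))  # two-step lookahead
```

In this toy graph the one-step choice moves to `c`, which is a dead end, while the two-step search accepts a smaller immediate gain at `b` to reach the high-gain node `d`. A real planner would also update the gains as regions get observed, which this sketch leaves out.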

2512.17629 2026-06-10 cs.LG cs.AI Updated version

QDepth-VLA: Quantized Depth Prediction as Auxiliary Supervision for Vision-Language-Action Models

Yixuan Li, Yuhui Chen, Mingcai Zhou, Haoran Li, Zhengtao Zhang, Dongbin Zhao

Affiliations: School of Artificial Intelligence, University of Chinese Academy of Sciences; Institute of Automation, Chinese Academy of Sciences; Beijing Zhongke Huiling Robot Technology Co.

AI summary: Proposes the QDepth-VLA framework, which adds an auxiliary depth prediction task to strengthen the spatial perception and reasoning of VLA models, improving manipulation performance in simulated and real-world tasks.

2512.14617 2026-06-10 cs.LG cs.AI Updated version

Model-Based Reinforcement Learning in Discrete-Action Non-Markovian Reward Decision Processes

Alessandro Trapasso, Luca Iocchi, Fabio Patrizi

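Affiliations: Fondazione Bruno Kessler; Sapienza University of Rome

AI summary: Proposes the QR-MAX algorithm, which uses reward machines to factorize Markovian transition learning from non-Markovian reward handling, giving the first PAC convergence to ε-optimal policies with polynomial sample complexity in discrete NMRDPs, and extends it to continuous state spaces.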

Comments Accepted at IJCAI-ECAI 2026. 19 pages, 32 figures, includes appendix

Abstract

Many practical decision-making problems involve tasks whose success depends on the entire system history, rather than on achieving a state with desired properties. Markovian Reinforcement Learning (RL) approaches are not suitable for such tasks, while RL with non-Markovian reward decision processes (NMRDPs) enables agents to tackle temporal-dependency tasks. This approach has long been known to lack formal guarantees on both (near-)optimality and sample efficiency. We contribute to solving both issues with QR-MAX, a novel model-based algorithm for discrete NMRDPs that factorizes Markovian transition learning from non-Markovian reward handling via reward machines. To the best of our knowledge, this is the first model-based RL algorithm for discrete-action NMRDPs that exploits this factorization to obtain PAC convergence to $\varepsilon$-optimal policies with polynomial sample complexity. We then extend QR-MAX to continuous state spaces with Bucket-QR-MAX, a SimHash-based discretiser that preserves the same factorized structure and achieves fast and stable learning without manual gridding or function approximation. We experimentally compare our method with modern state-of-the-art model-based RL approaches on environments of increasing complexity, showing a significant improvement in sample efficiency and increased robustness in finding optimal policies.
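A reward machine, the device the abstract uses to separate reward handling from transition learning, is a small finite-state machine driven by events. The sketch below is a generic illustration and not QR-MAX itself; the key-and-door task, the event names, and the class layout are assumptions made for the example.

```python
class RewardMachine:
    """Finite-state machine that turns event histories into rewards."""

    def __init__(self, transitions, initial):
        # transitions: (machine_state, event) -> (next_machine_state, reward)
        self.transitions = transitions
        self.state = initial

    def step(self, event):
        # Events with no listed transition leave the machine state unchanged.
        nxt, reward = self.transitions.get((self.state, event), (self.state, 0.0))
        self.state = nxt
        return reward

def make_key_door_machine():
    """Hypothetical task: reward 1 for reaching the door only after taking the key."""
    return RewardMachine(
        {("start", "key"): ("has_key", 0.0), ("has_key", "door"): ("done", 1.0)},
        initial="start",
    )

def total_reward(events):
    rm = make_key_door_machine()
    return sum(rm.step(e) for e in events)

print(total_reward(["door", "key", "door"]))  # door counts only after the key
print(total_reward(["door", "door"]))         # no key, so no reward
```

The same environment event, `door`, earns different rewards depending on history, so the reward is non-Markovian in the environment state alone. Pairing the environment state with the machine state restores the Markov property, which is what lets the transition model and the reward structure be treated separately.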

2512.14614 2026-06-10 cs.CV cs.GR Updated version

WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling

Wenqiang Sun, Haiyu Zhang, Haoyuan Wang, Junta Wu, Zehan Wang, Zhenwei Wang, Yunhong Wang, Jun Zhang, Tengfei Wang, Chunchao Guo

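Affiliations: University of Science and Technology of China

AI summary: Proposes WorldPlay, a streaming video diffusion model that uses a Dual Action Representation, Reconstituted Context Memory, and the Context Forcing distillation method to achieve real-time interactive world modeling with long-term geometric consistency, generating long 720p videos at 24 FPS.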

Comments project page: https://3d-models.hunyuan.tencent.com/world/, demo: https://3d.hunyuan.tencent.com/sceneTo3D, code: https://github.com/Tencent-Hunyuan/HY-WorldPlay

Abstract

This paper presents WorldPlay, a streaming video diffusion model that enables real-time, interactive world modeling with long-term geometric consistency, resolving the trade-off between speed and memory that limits current methods. WorldPlay draws power from three key ingredients. 1) We use a Dual Action Representation to enable robust action control in response to the user's keyboard and mouse inputs. 2) To enforce long-term consistency, our Reconstituted Context Memory dynamically rebuilds context from past frames and uses temporal reframing to keep geometrically important but long-past frames accessible, effectively alleviating memory attenuation. 3) We also propose Context Forcing, a novel distillation method designed for memory-aware models. Aligning memory context between the teacher and student preserves the student's capacity to use long-range information, enabling real-time speeds while preventing error drift. Taken together, WorldPlay generates long-horizon streaming 720p video at 24 FPS with superior consistency, comparing favorably with existing techniques and showing strong generalization across diverse scenes. The project page and online demo can be found at https://3d-models.hunyuan.tencent.com/world/ and https://3d.hunyuan.tencent.com/sceneTo3D.
