arXivDaily arXiv daily academic digest, updated Monday to Friday
2504.19584 2026-05-21 cs.CV

ShowMak3r: Compositional TV Show Reconstruction

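Sangmin Kim, Seunguk Do, Daeun Lee, Jaesik Park

Affiliations Seoul National University

AI summary This paper proposes ShowMak3r, a comprehensive pipeline for dynamic reconstruction of TV show scenes. It lets users edit scenes much as clips are cut in a production control room, and it addresses the occlusion, cluttered stages, and viewpoint changes that make dynamic radiance field reconstruction difficult.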

Comments Project page: https://nstar1125.github.io/showmak3r

Abstract

Reconstructing dynamic radiance fields from video clips is challenging, especially when the input is entertainment video such as TV shows. Reconstruction is difficult because (1) actors occlude each other and have diverse facial expressions, (2) stages are cluttered, and (3) views have small baselines or shots change suddenly. To address these issues, we present ShowMak3r, a comprehensive reconstruction pipeline that allows scenes to be edited the way video clips are made in a production control room. In ShowMak3r, a 3DLocator module locates recovered actors on the stage using a depth prior and estimates unseen human poses via interpolation. The proposed ShotMatcher module then tracks the actors across shot changes. Furthermore, ShowMak3r introduces a face-fitting network that dynamically recovers the actors' expressions. Experiments on the Sitcoms3D dataset show that our pipeline can reassemble TV show scenes with new cameras at different timestamps. We also demonstrate that ShowMak3r enables interesting applications such as synthetic shot-making, actor relocation, insertion, deletion, and pose manipulation. Project page: https://nstar1125.github.io/showmak3r

2504.13109 2026-05-21 cs.CV

UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models

Guanlong Jiao, Biqing Huang, Kuan-Chieh Wang, Renjie Liao

Affiliations The University of British Columbia; Tsinghua University; Snap Inc.; Vector Institute; Canada CIFAR AI Chair

AI summary This paper proposes a predictor-corrector framework for inversion and editing in flow models: Uni-Inv delivers accurate reconstruction and Uni-Edit performs region-aware image editing. The method is tuning-free, general, and efficient, and experiments show strong results across a range of generative models.

Comments ICLR 2026. Project Page: https://uniedit-flow.github.io/

Abstract

Flow matching models have emerged as a strong alternative to diffusion models, but existing inversion and editing methods designed for diffusion are often ineffective or inapplicable to them. The straight-line, non-crossing trajectories of flow models pose challenges for diffusion-based approaches but also open avenues for novel solutions. In this paper, we introduce a predictor-corrector-based framework for inversion and editing in flow models. First, we propose Uni-Inv, an effective inversion method designed for accurate reconstruction. Building on this, we extend the concept of delayed injection to flow models and introduce Uni-Edit, a region-aware, robust image editing approach. Our methodology is tuning-free, model-agnostic, efficient, and effective, enabling diverse edits while ensuring strong preservation of edit-irrelevant regions. Extensive experiments across various generative models demonstrate the superiority and generalizability of Uni-Inv and Uni-Edit, even under low-cost settings. Project page: https://uniedit-flow.github.io/

2504.06925 2026-05-21 cs.CV cs.AI

Are Vision-Language Models Ready for Dietary Assessment? Exploring the Next Frontier in AI-Powered Food Image Recognition

Sergio Romero-Tapiador, Ruben Tolosana, Blanca Lacruz-Pleguezuelos, Laura Judith Marcos Zambrano, Guadalupe X. Bazán, Isabel Espinosa-Salinas, Julian Fierrez, Javier Ortega-Garcia, Enrique Carrillo de Santa Pau, Aythami Morales

Affiliations Biometrics and Data Pattern Analytics Lab, Universidad Autonoma de Madrid; IMDEA Food, CEI UAM+CSIC

AI summary This paper evaluates six state-of-the-art vision-language models on food recognition at different levels of granularity, proposes a new evaluation metric, and demonstrates the potential of the FoodNExTDB database for dietary assessment.

Comments Accepted at IEEE/CVF Computer Vision and Pattern Recognition Conference workshops 2025 (CVPRw). 10 pages, 4 figures, 2 tables

Journal ref 2025 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025, pp. 1-10

Abstract

Automatic dietary assessment based on food images remains a challenge, requiring precise food detection, segmentation, and classification. Vision-Language Models (VLMs) offer new possibilities by integrating visual and textual reasoning. In this study, we evaluate six state-of-the-art VLMs (ChatGPT, Gemini, Claude, Moondream, DeepSeek, and LLaVA), analyzing their capabilities in food recognition at different levels. For the experimental framework, we introduce the FoodNExTDB, a unique food image database that contains 9,263 expert-labeled images across 10 categories (e.g., "protein source"), 62 subcategories (e.g., "poultry"), and 9 cooking styles (e.g., "grilled"). In total, FoodNExTDB includes 50k nutritional labels generated by seven experts who manually annotated all images in the database. Also, we propose a novel evaluation metric, Expert-Weighted Recall (EWR), that accounts for the inter-annotator variability. Results show that closed-source models outperform open-source ones, achieving over 90% EWR in recognizing food products in images containing a single product. Despite their potential, current VLMs face challenges in fine-grained food recognition, particularly in distinguishing subtle differences in cooking styles and visually similar food items, which limits their reliability for automatic dietary assessment. The FoodNExTDB database is publicly available at https://github.com/AI4Food/FoodNExtDB.

2503.08292 2026-05-21 cs.CL cs.AI

Do LLMs Triage Like Clinicians? A Dynamic Study of Outpatient Referral

Xiaoxiao Liu, Qingying Xiao, Bingquan Zhang, Junying Chen, Xiangyi Feng, Ziniu Li, Xiang Wan, Jian Chang, Guangjun Yu, Yan Hu, Benyou Wang

Affiliations Chinese University of Hong Kong, Shenzhen; Bournemouth University; National Health Data Institute, Shenzhen; Shenzhen Research Institute of Big Data

AI summary This paper studies how large language models perform in a dynamic triage process. It finds that in dynamic scenarios they reduce uncertainty by asking effective questions and outperform traditional classifiers, while their advantage in static scenarios is limited.

Abstract

Outpatient referral (OR) is a core clinical workflow that assigns patients to hospital departments under incomplete and evolving information, yet it is commonly simplified as a static classification problem despite being inherently interactive in practice. In this work, we study outpatient referral as a dynamic process driven by information acquisition and uncertainty reduction. We analyze both static scenarios based on fixed patient information and dynamic scenarios involving multi-turn dialogue, to test whether large language models (LLMs) improve referral outcomes through better prediction or more effective questioning. Our findings show that LLMs offer limited advantages over traditional classifiers in static referral accuracy, but consistently outperform them in dynamic settings by asking discriminative follow-up questions that reduce uncertainty over candidate departments. These results suggest that the primary value of LLMs in outpatient referral lies not in static prediction, but in supporting interactive, uncertainty-aware clinical decision-making.

2502.18915 2026-05-21 cs.CL cs.AI

END: Early Noise Dropping for Efficient and Effective Context Denoising

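Hongye Jin, Pei Chen, Jingfeng Yang, Zhengyang Wang, Fangran Mo, Jinghan Zhang, Meng Jiang, Yifan Gao, Binxuan Huang, Xinyang Zhang, Zheng Li, Tianyi Liu, Huasheng Li, Bing Yin

Affiliations Amazon

AI summary This paper proposes END, which splits the input sequence into chunks and applies a linear probe at early layers to identify and drop noisy chunks. This improves LLM performance and efficiency across tasks and deepens understanding of how LLMs reason over context internally.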

Abstract

Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of natural language processing tasks. However, they are often distracted by irrelevant or noisy context in input sequences, which degrades output quality. This problem affects both long- and short-context scenarios, such as retrieval-augmented generation, table question-answering, and in-context learning. We reveal that LLMs can implicitly identify whether input sequences contain useful information at early layers, prior to token generation. Leveraging this insight, we introduce Early Noise Dropping (END), a novel approach to mitigate this issue without requiring fine-tuning of the LLMs. END segments input sequences into chunks and employs a linear prober on the early layers of LLMs to differentiate between informative and noisy chunks. By discarding noisy chunks early in the process, END preserves critical information, reduces distraction, and lowers computational overhead. Extensive experiments demonstrate that END significantly improves both performance and efficiency across different LLMs on multiple evaluation datasets. Furthermore, by investigating LLMs' implicit understanding of the input with the prober, this work also deepens understanding of how LLMs reason over context internally.
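The chunk-then-probe idea in this abstract can be pictured with a small sketch. It is not the authors' implementation: the function names, the logistic form of the probe, the 0.5 threshold, and the toy vectors standing in for pooled early-layer hidden states are all assumptions made for illustration, since the abstract does not specify the layer, the pooling, or how the probe is trained.

```python
import math

def chunk(tokens, size):
    """Split a token list into consecutive chunks of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def probe_score(hidden, weights, bias):
    """Hypothetical logistic linear probe on one pooled hidden-state vector."""
    z = sum(h * w for h, w in zip(hidden, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def drop_noisy_chunks(chunks, hidden_states, weights, bias, threshold=0.5):
    """Keep only chunks whose probe score reaches the (assumed) threshold."""
    return [c for c, h in zip(chunks, hidden_states)
            if probe_score(h, weights, bias) >= threshold]

# Toy data: three chunks, each with a made-up 2-d "early-layer" feature vector.
chunks = chunk(list(range(6)), 2)                 # [[0, 1], [2, 3], [4, 5]]
hidden = [[2.0, 0.0], [-2.0, 0.0], [1.0, 0.0]]    # stand-ins, not real activations
kept = drop_noisy_chunks(chunks, hidden, weights=[1.0, 0.0], bias=0.0)
print(kept)  # [[0, 1], [4, 5]]
```

In a real system the dropped chunks would be removed before the later layers run, which is where the efficiency gain described in the abstract would come from.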

2502.12120 2026-05-21 cs.LG cs.AI cs.CL

LLMs on the Line: Data Determines Loss-to-Loss Scaling Laws

Prasanna Mayilvahanan, Thaddäus Wiedemer, Sayak Mallick, Matthias Bethge, Wieland Brendel

Affiliations Max Planck Institute for Intelligent Systems; ELLIS Institute Tübingen; Tübingen AI Center; University of Tübingen

AI summary This study examines which factors most influence loss-to-loss scaling laws for LLMs. It finds that the pretraining data determines the scaling trend, while model size, optimization hyperparameters, tokenizer, and architectural differences have limited impact, so pretraining data should be curated carefully for the best downstream performance.

Comments ICML 2025 camera-ready version

Abstract

Scaling laws guide the development of large language models (LLMs) by offering estimates for the optimal balance of model size, tokens, and compute. More recently, loss-to-loss scaling laws that relate losses across pretraining datasets and downstream tasks have emerged as a powerful tool for understanding and improving LLM performance and generalization. In this work, we investigate which factors most strongly influence loss-to-loss scaling. Our experiments reveal that the pretraining data determines the scaling trend. In contrast, model size, optimization hyperparameters, tokenizer and even significant architectural differences, such as between transformer-based models like Llama and state-space models like Mamba, generally have limited impact. Consequently, practitioners should carefully curate suitable pretraining datasets for optimal downstream performance, while architectures and other settings can be freely optimized for training efficiency.
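To make "loss-to-loss" concrete, the sketch below fits a relation between a pretraining loss and a downstream loss measured on the same set of models. It assumes a plain power law, y = K * x**kappa, fitted by least squares in log-log space; this is a simplification chosen for illustration, not the functional form used in the paper, and the numbers are synthetic.

```python
import math

def fit_power_law(x_losses, y_losses):
    """Least-squares fit of y = K * x**kappa in log-log space; returns (K, kappa)."""
    lx = [math.log(x) for x in x_losses]
    ly = [math.log(y) for y in y_losses]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    kappa = sxy / sxx                       # slope of the log-log line
    K = math.exp(mean_y - kappa * mean_x)   # intercept, mapped back from log space
    return K, kappa

# Synthetic example: downstream loss generated from an exact power law.
train_loss = [2.0, 2.5, 3.0, 3.5]
down_loss = [1.5 * v ** 1.2 for v in train_loss]
K, kappa = fit_power_law(train_loss, down_loss)
print(round(K, 3), round(kappa, 3))
```

Under the abstract's claim, models trained on the same pretraining data would fall near one such curve regardless of size or architecture, while a different pretraining dataset would give a different curve.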

2502.03752 2026-05-21 cs.LG cs.AI

Self-Improving Skill Learning for Robust Skill-based Meta-Reinforcement Learning

Sanghyeon Lee, Sangjun Bae, Yisak Park, Seungyul Han

Affiliations Graduate School of Artificial Intelligence; Ulsan National Institute of Science and Technology (UNIST)

AI summary This paper proposes Self-Improving Skill Learning (SISL), which performs self-guided skill refinement with decoupled high-level and skill improvement policies and prioritizes skills through maximum return relabeling. The result is robust, stable adaptation under noisy and suboptimal data, outperforming other skill-based meta-RL methods.

Comments 10 pages main, 27 pages appendix with reference. Accepted to ICLR 2026

Journal ref International Conference on Learning Representations (ICLR), 2026

Abstract

Meta-reinforcement learning (Meta-RL) facilitates rapid adaptation to unseen tasks but faces challenges in long-horizon environments. Skill-based approaches tackle this by decomposing state-action sequences into reusable skills and employing hierarchical decision-making. However, these methods are highly susceptible to noisy offline demonstrations, leading to unstable skill learning and degraded performance. To address this, we propose Self-Improving Skill Learning (SISL), which performs self-guided skill refinement using decoupled high-level and skill improvement policies, while applying skill prioritization via maximum return relabeling to focus updates on task-relevant trajectories, resulting in robust and stable adaptation even under noisy and suboptimal data. By mitigating the effect of noise, SISL achieves reliable skill learning and consistently outperforms other skill-based meta-RL methods on diverse long-horizon tasks. Our code is available at https://epsilog.github.io/SISL.

2502.02844 2026-05-21 cs.LG cs.AI cs.CR cs.MA

Wolfpack Adversarial Attack for Robust Multi-Agent Reinforcement Learning

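Sunwoo Lee, Jaebak Hwang, Yonghyeon Jo, Seungyul Han

Affiliations Graduate School of Artificial Intelligence, UNIST, Ulsan, South Korea

AI summary This paper proposes the Wolfpack Adversarial Attack framework, a coordinated adversarial attack on cooperative multi-agent reinforcement learning, and introduces the Wolfpack-Adversarial Learning for MARL (WALL) framework, which trains robust MARL policies to defend against it.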

Comments 9 pages main, 23 pages appendix with reference. Accepted by ICML 2025

Journal ref Proceedings of Machine Learning Research (PMLR), ICML 2025

Abstract

Traditional robust methods in multi-agent reinforcement learning (MARL) often struggle against coordinated adversarial attacks in cooperative scenarios. To address this limitation, we propose the Wolfpack Adversarial Attack framework, inspired by wolf hunting strategies, which targets an initial agent and its assisting agents to disrupt cooperation. Additionally, we introduce the Wolfpack-Adversarial Learning for MARL (WALL) framework, which trains robust MARL policies to defend against the proposed Wolfpack attack by fostering systemwide collaboration. Experimental results underscore the devastating impact of the Wolfpack attack and the significant robustness improvements achieved by WALL. Our code is available at https://github.com/sunwoolee0504/WALL.

2502.02834 2026-05-21 cs.LG cs.AI

Task-Aware Virtual Training: Enhancing Generalization in Meta-Reinforcement Learning for Out-of-Distribution Tasks

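Jeongmo Kim, Yisak Park, Minung Kim, Seungyul Han

Affiliations Graduate School of Artificial Intelligence, UNIST, Ulsan, South Korea

AI summary This paper proposes Task-Aware Virtual Training (TAVT), which uses metric-based representation learning to improve meta-RL generalization to out-of-distribution tasks. It preserves task characteristics in virtual tasks and applies a state regularization technique to reduce overestimation errors in state-varying environments.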

Comments 9 pages main paper, 20 pages appendices with reference. Accepted to ICML 2025

Journal ref Proceedings of Machine Learning Research (PMLR), ICML 2025

Abstract

Meta reinforcement learning aims to develop policies that generalize to unseen tasks sampled from a task distribution. While context-based meta-RL methods improve task representation using task latents, they often struggle with out-of-distribution (OOD) tasks. To address this, we propose Task-Aware Virtual Training (TAVT), a novel algorithm that accurately captures task characteristics for both training and OOD scenarios using metric-based representation learning. Our method successfully preserves task characteristics in virtual tasks and employs a state regularization technique to mitigate overestimation errors in state-varying environments. Numerical results demonstrate that TAVT significantly enhances generalization to OOD tasks across various MuJoCo and MetaWorld environments. Our code is available at https://github.com/JM-Kim-94/tavt.git.

2501.15151 2026-05-21 cs.CV

SpikeDet: Better Firing Patterns for Accurate and Energy-Efficient Object Detection with Spiking Neural Networks

Yimeng Fan, Changsong Liu, Mingyang Li, Dongze Liu, Yuting Su, Yanyan Liu, Wei Zhang

Affiliations School of Microelectronics; School of Electrical and Information Engineering; Optoelectronic Thin Film Device and Technology Research Institute

AI summary This paper proposes SpikeDet, a spiking neural network object detector that optimizes firing patterns for more accurate and energy-efficient detection. It designs the MDSNet spiking backbone, which adjusts the membrane synaptic input distribution at each layer for better spiking feature extraction; introduces the Spiking Multi-direction Fusion Module (SMFM) for multi-direction fusion and stronger multi-scale detection; and proposes the Local Firing Saturation Index (LFSI) to quantify local firing saturation. On the COCO 2017 dataset it reaches 52.2% AP, 3.3% AP above previous SNN-based methods, at half the energy consumption.

Abstract

Spiking Neural Networks (SNNs) are the third generation of neural networks. They have gained widespread attention in object detection due to their low energy consumption and biological interpretability. However, existing SNN-based object detection methods suffer from local firing saturation, where adjacent neurons concurrently reach maximum firing rates, especially in object-centric regions. This abnormal neuron firing pattern reduces the feature discrimination capability and detection accuracy, while also increasing the firing rates that prevent SNNs from achieving their potential energy efficiency. To address this problem, we propose SpikeDet, a novel spiking object detector that optimizes firing patterns for accurate and energy-efficient detection. Specifically, we design a spiking backbone network, MDSNet, which effectively adjusts the membrane synaptic input distribution at each layer, achieving better neuron firing patterns during spiking feature extraction. For the neck, to better utilize and preserve these high-quality backbone features, we introduce the Spiking Multi-direction Fusion Module (SMFM), which realizes multi-direction fusion of spiking features, enhancing the multi-scale detection capability of the model. Furthermore, we propose the Local Firing Saturation Index (LFSI) to quantitatively measure local firing saturation. Experimental results validate the effectiveness of our method. On the COCO 2017 dataset, it achieves 52.2% AP, outperforming previous SNN-based methods by 3.3% AP while requiring only half the energy consumption. On object detection sub-tasks, including event-based GEN1, underwater URPC 2019, low-light ExDARK, and dense scene CrowdHuman datasets, SpikeDet also achieves the best performance.

2501.02407 2026-05-21 cs.CL cs.CR cs.LG

Towards the Anonymization of the Language Modeling

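Antoine Boutet, Lucas Magnana, Juliette Sénéchal

Affiliations INSA Lyon, Inria, CITI, UR3720; Inria, INSA Lyon, CITI, UR3720

AI summary This paper proposes a privacy-preserving language modeling approach, with a masking language modeling (MLM) methodology and a causal language modeling (CLM) methodology, to address the anonymization of language models and so promote their sharing. Evaluated on a medical dataset, both avoid memorizing direct and indirect identifying information while maintaining high privacy and high utility.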

Abstract

Rapid advances in Natural Language Processing (NLP) have revolutionized many fields, including healthcare. However, these advances raise significant privacy concerns, especially when pre-trained models that are fine-tuned and specialized on sensitive data memorize and then expose and regurgitate personal information. This paper presents a privacy-preserving language modeling approach to address the problem of language model anonymization, and thus promote model sharing. Specifically, we propose both a Masking Language Modeling (MLM) methodology to specialize a BERT-like language model, and a Causal Language Modeling (CLM) methodology to specialize a GPT-like model, both designed to keep the model from memorizing direct and indirect identifying information present in the training data. We have comprehensively evaluated our approaches using a medical dataset and compared them against different baselines. Our results indicate that by avoiding memorizing both direct and indirect identifiers during model specialization, our masking and causal language modeling schemes offer a good tradeoff for maintaining high privacy while retaining high utility.

2412.14738 2026-05-21 cs.LG

Spectrally unstable nodes drive reliability failures in graph learning

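Yongyu Wang

Affiliations MTU

AI summary This study examines how spectrally unstable nodes drive reliability failures in graph learning and proposes a reliability-aware intervention that isolates those nodes, improving robustness under adversarial and intrinsic noise.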

Abstract

Graph-learning algorithms can fail when graph structure is adversarially perturbed, intrinsically noisy or constructed from imperfect observations. Here we show that some nodes bear much greater responsibility than others for allowing adversarial perturbations and intrinsic noise to harm graph-learning algorithms. Building on graph-spectral distortion analysis, we identify these failure-driving nodes and introduce a reliability-aware intervention that isolates them from the main learning step. The target algorithm is applied to a stable induced subgraph, and predictions for isolated nodes are recovered through topology- or centroid-based propagation. Across graph neural networks under targeted and non-targeted structural attacks, spectral hypergraph clustering and multi-view spectral clustering, this principle improves reliability under both adversarial and intrinsic noise. These results suggest that node-level spectral instability provides a common mechanism for understanding and mitigating reliability failures in graph learning.

2412.01944 2026-05-21 cs.CV eess.IV

A Comparative Study of Transformer and Convolutional Models for Crop Segmentation from Satellite Image Time Series

Mattia Gatti, Ignazio Gallo, Nicola Landro, Christian Loschiavo, Anwar Ur Rehman, Mirco Boschetti, Riccardo La Grassa

Affiliations University of Insubria; IREA CNR; INAF-Astronomical Observatory

AI summary This paper compares transformer and convolutional models for crop segmentation from satellite image time series. TSViT achieves the best overall results, while VistaFormer offers a good trade-off between efficiency and performance.

Comments This version corrects an error in the evaluation pipeline affecting previously reported metrics. Results have been recomputed, leading to updated values and a revised conclusion: the adapted Swin UNETR model does not outperform CNN baselines. Tables, figures, and comparisons have been updated, and the analysis has been extended to include additional transformer-based models

Abstract

Crop segmentation from satellite image time series (SITS) is a fundamental task for agricultural monitoring and land-use analysis. While convolutional neural networks (CNNs) have been widely used, transformer-based architectures offer alternative mechanisms for representing spatial and temporal dependencies in multispectral data. This paper presents a comparative study of CNN and transformer-based segmentation models for crop mapping from Sentinel-2 time series, including 3D U-Net, 3D FPN, 3D DeepLabv3, and three transformer architectures: Swin UNETR, TSViT, and VistaFormer, which adopt different strategies for capturing temporal dependencies. Experiments on the Munich and Lombardia datasets show that TSViT achieves the best overall results, slightly surpassing 3D U-Net, which remains a strong CNN baseline. VistaFormer offers the best efficiency, while Swin UNETR performs competitively but is less effective than transformers that explicitly model temporal dynamics. These results highlight that temporal modelling is critical for SITS: TSViT outperforms CNNs and approaches that treat time as an additional spatial dimension, while VistaFormer provides a strong efficiency-performance trade-off.

2411.01141 2026-05-21 cs.CL

Dictionary Insertion Prompting for Multilingual Reasoning on Multilingual Large Language Models

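Hongyuan Lu, Zixuan Li, Wai Lam

Affiliations The Chinese University of Hong Kong; Southeast University

AI summary This paper proposes Dictionary Insertion Prompting (DIP), which inserts the English counterparts of words, looked up in a dictionary, into the prompt to improve multilingual reasoning. Experiments on 10 to 200 languages show clear gains.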

Comments ACL *SEM 2026

Abstract

The current era of Large Language Models (LLMs) has two shortcomings. The first is a shortage of multilingual models: most LLMs are English-centric, and their performance on multilingual reasoning is limited. The second is where external knowledge is placed: most retrieved knowledge is prepended to the user query, which may be sub-optimal. This paper presents a novel, simple yet effective method called Dictionary Insertion Prompting (DIP). Given a non-English prompt, DIP looks up a word dictionary and inserts the words' English counterparts into the middle of the prompt for the LLM. This enables better translation into English and better English reasoning steps, which leads to clearly better results. We experiment with 10 to 200 languages from FLORES-200 (the number of languages varies by dataset; we experiment with 200 languages on GSM8K, as in the Appendix). Since there are no adequate datasets, we use the NLLB translator to create synthetic multilingual benchmarks from four existing English reasoning benchmarks such as GSM8K and AQuA. The synthetic benchmarks are translated back into English for quality assurance with manual annotation. Interestingly, where the dictionary is injected is an important factor in the performance gains: interleaving the dictionary with the original words performs better than prepending or appending it, given the same dictionary.
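The interleaving described in this abstract can be illustrated with a few lines of Python. This is a toy sketch, not the authors' prompt format: the function name, the parenthesized gloss style, the whitespace tokenization, and the two-entry Spanish dictionary are all assumptions made for the example.

```python
def insert_dictionary(prompt: str, dictionary: dict) -> str:
    """Follow each word found in the dictionary with its English counterpart."""
    pieces = []
    for word in prompt.split():
        gloss = dictionary.get(word.lower())
        # Interleave: the gloss sits right next to the word it translates.
        pieces.append(f"{word} ({gloss})" if gloss else word)
    return " ".join(pieces)

# Hypothetical two-entry dictionary; a real one would cover far more words.
toy_dict = {"tiene": "has", "manzanas": "apples"}
print(insert_dictionary("Ana tiene 3 manzanas", toy_dict))
# Ana tiene (has) 3 manzanas (apples)
```

Prepending or appending would instead place all dictionary entries in one block before or after the prompt; the abstract reports that the interleaved placement performs better for the same dictionary.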

2410.04155 2026-05-21 cs.CL

Toxic Subword Pruning for Dialogue Response Generation on Large Language Models

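Hongyuan Lu, Wai Lam

Affiliations FaceMind Corporation; The Chinese University of Hong Kong

AI summary This paper proposes ToxPrune, a novel algorithm that prunes the subwords contained in toxic words to keep large language models from generating toxic content, while also improving non-toxic models on dialogue response generation.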

Comments ACL *SEM 2026

Abstract

How to keep large language models (LLMs) from generating toxic content is an important research area. Yet most research has focused on model training techniques that remediate LLMs by updating their weights; a typical related area is safety alignment. This is often costly and tedious, and it can expose the model to further problems such as catastrophic forgetting if the training is not carefully handled by experienced NLP practitioners. We thus propose a simple yet effective and novel algorithm, Toxic Subword Pruning (ToxPrune), which prunes the subwords contained in toxic words from BPE in trained LLMs. In contrast to previous work showing that pruning BPE tokens harms machine translation, we surprisingly found it useful for preventing LLMs from generating toxic content. Our findings also suggest that ToxPrune clearly improves the toxic language model NSFW-3B on the task of dialogue response generation. We further found that ToxPrune can even clearly improve the official Llama-3.1-6B on the metric of dialogue diversity. Extensive automatic results and human evaluation indicate that ToxPrune could be helpful both for remediating toxic LLMs and for improving non-toxic LLMs on the task of dialogue response generation. (We plan to release the resources to facilitate future work.)
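One way to picture subword pruning without touching model weights is to mask the pruned subwords at decoding time, as sketched below. This is an assumption made for illustration: the abstract does not say whether pruning is applied to the tokenizer or to the output distribution, nor how subwords are selected. The names `prune_logits` and `greedy_pick`, the toy vocabulary, and the logits are hypothetical.

```python
import math

def prune_logits(logits, vocab, pruned_subwords):
    """Set the logit of every pruned subword to -inf so it can never be chosen."""
    return [-math.inf if vocab[i] in pruned_subwords else score
            for i, score in enumerate(logits)]

def greedy_pick(logits, vocab):
    """Return the subword with the highest remaining logit."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return vocab[best]

# Toy vocabulary and scores; "darn" stands in for a subword marked as toxic.
vocab = ["hello", "darn", "friend"]
logits = [1.0, 3.0, 2.0]
print(greedy_pick(logits, vocab))                                   # darn
print(greedy_pick(prune_logits(logits, vocab, {"darn"}), vocab))    # friend
```

Because the mask only removes candidates, the model's weights stay unchanged, which matches the abstract's motivation of avoiding retraining.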

2410.03296 2026-05-21 cs.CL cs.AI

A Systematic Comparison between Extractive Self-Explanations and Human Rationales in Text Classification

抽取式自我解释与人类推理在文本分类中的系统比较

Stephanie Brandl, Oliver Eberle

发表机构 * Center for Social Data Science(社会科学数据科学中心) University of Copenhagen(哥本哈根大学) Machine Learning Group(机器学习小组) Technische Universität Berlin(柏林技术大学)

AI总结 本文比较了抽取式自我解释与人类理据在文本分类任务中的表现,通过分析不同任务和语言上解释的合理性与忠实度,发现二者的一致程度高度依赖于文本长度和任务复杂度。

Comments accepted to the Trustworthy NLP Workshop, co-located with ACL 2026

详情
AI中文摘要

指令微调的LLM能够通过生成自我解释来向用户解释其输出,而无需应用复杂的可解释性技术。本文分析这种能力是否能产生好的解释。我们评估了以输入理据(rationale)形式呈现的自我解释对人类而言的合理性(plausibility)。我们研究了三个文本分类任务:情感分类、强迫劳动检测和声明验证。我们纳入了情感分类任务的丹麦语和意大利语译本,并将自我解释与人类标注进行比较。为此,我们为声明验证数据集Climate-Fever收集了人类理据标注。我们进一步评估了人类理据和自我解释理据相对于正确模型预测的忠实度,并通过纳入基于事后归因的解释扩展了研究。我们分析了四个开放权重LLM,发现自我解释与人类理据之间的一致程度高度依赖于文本长度和任务复杂性。尽管如此,自我解释能给出忠实的token级理据子集,而事后归因方法则倾向于强调结构和格式token,反映出根本不同的解释策略。

英文摘要

Instruction-tuned LLMs are able to provide an explanation about their output to users by generating self-explanations, without requiring the application of complex interpretability techniques. In this paper, we analyse whether this ability results in a good explanation. We evaluate self-explanations in the form of input rationales with respect to their plausibility to humans. We study three text classification tasks: sentiment classification, forced labour detection and claim verification. We include Danish and Italian translations of the sentiment classification task and compare self-explanations to human annotations. For this, we collected human rationale annotations for Climate-Fever, a claim verification dataset. We furthermore evaluate the faithfulness of human and self-explanation rationales with respect to correct model predictions, and extend the study by incorporating post-hoc attribution-based explanations. We analyse four open-weight LLMs and find that alignment between self-explanations and human rationales highly depends on text length and task complexity. Nevertheless, self-explanations yield faithful subsets of token-level rationales, whereas post-hoc attribution methods tend to emphasize structural and formatting tokens, reflecting fundamentally different explanation strategies.

2409.18272 2026-05-21 cs.LG

SLIDE: A machine-learning based method for forced dynamic response estimation of multibody systems

SLIDE:一种基于机器学习的多体系统强迫动态响应估计方法

Peter Manzl, Alexander Humer, Qasim Khadim, Johannes Gerstmayr

发表机构 * University of Innsbruck, Austria(奥地利因斯布鲁克大学) Johannes Kepler University Linz, Austria(奥地利林茨约翰尼斯·开普勒大学) University of Oulu, Finland(芬兰奥卢大学)

AI总结 本文提出了一种基于深度学习的SLIDE方法,用于估计机械或多体系统的输出序列;该方法依据初始效应的衰减(由线性化方程的复特征值近似)截断输出窗口,相比仿真实现大幅加速,并显著超过实时性能。

Comments Paper currently in submission for journal publication

Journal ref Mechanics Based Design of Structures and Machines 54(1), 2026

详情
AI中文摘要

在计算工程中,提高仿真速度和效率是一个永恒的目标。为了充分利用神经网络技术和硬件,我们提出了SLiding-window Initially-truncated Dynamic-response Estimator (SLIDE),一种基于深度学习的方法,用于估计机械或多体系统的输出序列,主要(但不限于)针对强迫激励的情形。SLIDE的一个关键优势是无需完整的系统状态即可估计有阻尼系统的动态响应,因此对柔性多体系统特别有效。该方法根据初始效应(如阻尼)的衰减来截断输出窗口,该衰减由系统线性化方程的复特征值近似。此外,我们还训练了第二个神经网络来提供误差估计,进一步增强了该方法的适用性。该方法被应用于多种系统,包括Duffing振荡器、柔性曲柄滑块系统以及安装在柔性底座上的工业6R机械臂。结果表明,相对于仿真可获得高达数百万倍的显著加速,大幅超过实时性能。

英文摘要

In computational engineering, enhancing simulation speed and efficiency is a perpetual goal. To take full advantage of neural network techniques and hardware, we present the SLiding-window Initially-truncated Dynamic-response Estimator (SLIDE), a deep learning-based method designed to estimate output sequences of mechanical or multibody systems with primarily, but not exclusively, forced excitation. A key advantage of SLIDE is its ability to estimate the dynamic response of damped systems without requiring the full system state, making it particularly effective for flexible multibody systems. The method truncates the output window based on the decay of initial effects, such as damping, which is approximated by the complex eigenvalues of the system's linearized equations. In addition, a second neural network is trained to provide an error estimate, further enhancing the method's applicability. The method is applied to a diverse selection of systems, including the Duffing oscillator, a flexible slider-crank system, and an industrial 6R manipulator mounted on a flexible socket. Our results demonstrate significant speedups over simulation of up to several million, substantially exceeding real-time performance.
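The idea of sizing a truncation window from the eigenvalues of a linearized system can be illustrated with a small sketch. It is not the SLIDE implementation: the single-degree-of-freedom oscillator, the function names, and the tolerance `tol` are assumptions chosen for illustration. It only uses the standard fact that a mode with eigenvalue λ decays like exp(Re(λ)·t).

```python
import cmath
import math


def oscillator_eigenvalues(m, d, k):
    """Eigenvalues of m*x'' + d*x' + k*x = 0, i.e. the roots of m*s^2 + d*s + k."""
    disc = cmath.sqrt(d * d - 4.0 * m * k)
    return [(-d + disc) / (2.0 * m), (-d - disc) / (2.0 * m)]


def decay_time(eigenvalues, tol=1e-2):
    """Time after which the slowest-decaying mode has shrunk to the fraction `tol`.

    Each mode decays like exp(Re(lambda) * t), so the eigenvalue whose real
    part is closest to zero determines how long initial effects persist.
    """
    slowest = max(ev.real for ev in eigenvalues)
    if slowest >= 0.0:
        raise ValueError("system is not asymptotically stable")
    return math.log(1.0 / tol) / -slowest


def truncated_steps(eigenvalues, dt, tol=1e-2):
    """Number of time steps of size `dt` covered by the decay time (hypothetical helper)."""
    return math.ceil(decay_time(eigenvalues, tol) / dt)


# Lightly damped oscillator; the values are illustrative only.
eigs = oscillator_eigenvalues(m=1.0, d=0.4, k=100.0)
t_decay = decay_time(eigs, tol=1e-2)
n_steps = truncated_steps(eigs, dt=0.01, tol=1e-2)
```

For a multibody system one would pass the eigenvalues of the full linearized equations instead of the two roots above; how SLIDE chooses its tolerance is not stated in the abstract.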

2409.14839 2026-05-21 cs.AI cs.ET cs.HC

Explainable and Human-Grounded AI for Decision Support Systems: The Theory of Epistemic Quasi-Partnerships

可解释且以人为中心的AI用于决策支持系统:知识性准伙伴关系理论

John Dorsch, Maximilian Moll

发表机构 * Faculty of Philosophy, Philosophy of Science and the Study of Religion, Ludwig Maximilian University Munich(哲学学院、科学哲学与宗教研究学院,慕尼黑路德维希-马克西米利安大学)

AI总结 本文提出了一种新的理论框架,即知识性准伙伴关系理论(EQP),用于指导开发能够提供人类基础解释(原因、反事实和置信度)的AI决策支持系统,以满足伦理和可解释AI(XAI)的需求。

Comments 20 pages

Journal ref Philosophy of Artificial Intelligence. Synthese Library, vol 533. Springer. 2026

详情
AI中文摘要

在人工智能决策支持系统(AI-DSS)的背景下,我们主张满足伦理和可解释AI(XAI)的需求是开发AI-DSS,以向人类决策者提供三种类型的以人为中心的解释:原因、反事实和置信度,这种方法我们称为RCC方法。我们首先回顾了当前的实证XAI文献,探讨了生成模型解释的各种方法(如LIME、SHAP、Anchors)与模型感知可信度和终端用户准确性之间的关系。我们展示了当前关于什么是良好人类基础原因的理论要么无法充分解释这些证据,要么没有为开发提供坚实的伦理建议。因此,我们提出了一种新的理论:知识性准伙伴关系理论(EQP)。最后,我们阐明了采用EQP的动机,并展示了它如何解释实证证据,提供坚实的伦理建议,并导致采用RCC方法。

英文摘要

In the context of AI decision support systems (AI-DSS), we argue that meeting the demands of ethical and explainable AI (XAI) is about developing AI-DSS to provide human decision-makers with three types of human-grounded explanations: reasons, counterfactuals, and confidence, an approach we refer to as the RCC approach. We begin by reviewing current empirical XAI literature that investigates the relationship between various methods for generating model explanations (e.g., LIME, SHAP, Anchors), the perceived trustworthiness of the model, and end-user accuracy. We demonstrate how current theories about what constitutes good human-grounded reasons either do not adequately explain this evidence or do not offer sound ethical advice for development. Thus, we offer a novel theory of human-machine interaction: the theory of epistemic quasi-partnerships (EQP). Finally, we motivate adopting EQP and demonstrate how it explains the empirical evidence, offers sound ethical advice, and entails adopting the RCC approach.

2409.04777 2026-05-21 cs.LG math.OC

Optimization Hyper-parameter Laws for Large Language Models

大语言模型的优化超参数规律

Xingyu Xie, Kuangyu Ding, Shuicheng Yan, Kim-Chuan Toh, Tianwen Wei

发表机构 * Department of Mathematics, National University of Singapore, Singapore(新加坡国立大学数学系) School of Computing, National University of Singapore(新加坡国立大学计算机学院) Department of Mathematics and Institute of Operations Research and Analytics, National University of Singapore, Singapore(新加坡国立大学数学系和运筹分析研究所) Skywork AI, Beijing(北京Skywork AI)

AI总结 本文提出Opt-Laws框架,通过分析SDE收敛和逃逸特性,预测最终训练损失,从而在小规模实验中预选学习率调度方案,提高了超参数选择的准确性。

详情
AI中文摘要

大语言模型推动了显著的AI进步,但其训练过程资源消耗大且对超参数选择高度敏感。尽管扩展定律为模型大小和数据需求提供了有价值的指导,但它们无法用于选择在训练过程中不断变化的动态超参数(如学习率调度)。为此,我们提出优化超参数规律(Opt-Laws),该框架将最终训练损失作为学习率调度、模型大小和数据大小的函数进行预测。Opt-Laws以基于SDE的收敛和逃逸分析为基础,给出可解释的收敛特征和逃逸特征,用以预测不同模型规模下的最终训练损失,从而可以根据小规模实验预先选择调度方案。实证表明,Opt-Laws在留出配置上识别近似最优调度候选的Top-2命中率达到94%,在全部五个受评估的族外(out-of-family)设置中都正确识别出表现最佳的调度族,并以F1 = 0.92检测出训练发散。

英文摘要

Large Language Models have driven significant AI advancements, yet their training is resource-intensive and highly sensitive to hyper-parameter selection. While scaling laws provide valuable guidance on model size and data requirements, they fall short in choosing dynamic hyper-parameters, such as learning-rate (LR) schedules, that evolve during training. To bridge this gap, we present Optimization Hyper-parameter Laws (Opt-Laws), a framework that predicts final training loss as a function of LR schedule, model size, and data size. Grounded in SDE-based convergence and escape analyses, Opt-Laws yield interpretable convergence and escape features that predict final training loss across model scales, enabling schedule pre-selection from small-scale experiments. Empirically, Opt-Laws achieve a 94% Top-2 hit rate for identifying near-optimal schedule candidates on held-out configurations, correctly identify the best-performing schedule family in all five evaluated out-of-family settings, and detect training divergence with F1 = 0.92.

2408.08812 2026-05-21 cs.LG

TRAM: Test-Time Risk Adaptation with Mixture of Agents

TRAM: 基于智能体混合的测试时风险适应

Mohamad Fares El Hajj Chehade, Amrit Singh Bedi, Amy Zhang, Hao Zhu

发表机构 * UT Austin(得克萨斯大学奥斯汀分校) University of Central Florida(中佛罗里达大学) MIT(麻省理工学院) UMD(马里兰大学)

AI总结 本文研究了在部署时无需更新的零更新适应问题,提出TRAM方法通过混合代理评估源策略的风险调整分数,以降低部署风险并保持奖励。

详情
AI中文摘要

已部署的强化学习智能体常常面临训练之后才指定的安全要求,如新的危险地图、修订后的风险阈值或行为对齐约束。我们研究零更新的部署时适应:在新指定的奖励-风险权衡下,复用一个固定的风险中性源策略库。我们提出TRAM(Test-Time Risk Adaptation via Mixture of Agents,基于智能体混合的测试时风险适应),这是一种基于源评分的组合规则:它在目标奖励和基于占用(occupancy)的部署风险下评估每个源策略,然后利用经风险调整的源评分来选择动作。与训练时绑定于固定代理指标(如回报方差)的风险敏感方法不同,TRAM支持在测试时指定的空间屏障暴露、与参考行为的偏离以及局部波动性风险。我们明确将TRAM界定为一种代理(surrogate)方法:它并不求解拼接策略的完整占用控制问题,但存在一个可度量的源包(source-hull)失配项,将源评分风险与实际实现的风险联系起来。在网格世界、MuJoCo Reacher、Safety-Gymnasium和一个LLM对齐设置中的实验表明,TRAM在测试时无需任何参数更新的情况下降低了部署风险,同时保持了奖励。

英文摘要

Deployed reinforcement learning agents often face safety requirements that are specified only after training, such as new hazard maps, revised risk thresholds, or behavioral alignment constraints. We study zero-update deployment-time adaptation, where a fixed library of risk-neutral source policies is reused under a newly specified reward-risk tradeoff. We propose TRAM (Test-Time Risk Adaptation via Mixture of Agents), a source-scored composition rule that evaluates each source policy under the target reward and an occupancy-based deployment risk, then selects actions using risk-adjusted source scores. Unlike training-time risk-sensitive methods tied to a fixed surrogate such as return variance, TRAM supports spatial barrier exposure, divergence from a reference behavior, and local volatility risks specified at test time. We explicitly characterize TRAM as a surrogate method: it does not solve the full occupancy-control problem of the stitched policy, but admits a measurable source-hull mismatch term connecting source-scored risk to realized risk. Experiments in gridworlds, MuJoCo Reacher, Safety-Gymnasium, and an LLM alignment setting show that TRAM reduces deployment risk while preserving reward, without requiring any parameter updates at test time.
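A toy sketch can make the notion of a source-scored composition rule concrete. The abstract does not give TRAM's scoring formula, so everything below is an assumption: the `SourcePolicy` container, the per-state value estimates, and the linear penalty `reward - risk_weight * risk` are hypothetical stand-ins, not the authors' definitions.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

State = int
Action = int


@dataclass
class SourcePolicy:
    """Hypothetical container for one frozen, risk-neutral source policy."""
    name: str
    act: Callable[[State], Action]
    reward_value: Callable[[State], float]  # estimated target reward from a state
    risk_value: Callable[[State], float]    # estimated deployment risk from a state


def risk_adjusted_score(policy: SourcePolicy, state: State, risk_weight: float) -> float:
    """Assumed scoring rule: reward estimate minus a linear risk penalty."""
    return policy.reward_value(state) - risk_weight * policy.risk_value(state)


def select_action(library: List[SourcePolicy], state: State,
                  risk_weight: float) -> Tuple[str, Action]:
    """Act with the best-scoring source policy; no parameters are updated."""
    best = max(library, key=lambda p: risk_adjusted_score(p, state, risk_weight))
    return best.name, best.act(state)


# Two illustrative source policies with constant value estimates.
library = [
    SourcePolicy("fast", act=lambda s: 1,
                 reward_value=lambda s: 10.0, risk_value=lambda s: 4.0),
    SourcePolicy("safe", act=lambda s: 0,
                 reward_value=lambda s: 7.0, risk_value=lambda s: 0.5),
]
risk_neutral_choice = select_action(library, state=0, risk_weight=0.0)
risk_averse_choice = select_action(library, state=0, risk_weight=2.0)
```

Changing `risk_weight` at deployment switches which source policy is followed while the library stays frozen, which is the zero-update property the abstract emphasizes.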

2406.14978 2026-05-21 cs.CV

E2GS: Event Enhanced Gaussian Splatting

E2GS:事件增强的高斯泼溅

Hiroyuki Deguchi, Mana Masuda, Takuya Nakabayashi, Hideo Saito

发表机构 * Keio University(庆应大学)

AI总结 本文提出E2GS方法,将事件数据与高斯泼溅相结合,提升图像去模糊和新视角合成的质量;实验表明其在合成和真实数据集上均能生成视觉效果良好的渲染结果,且训练和渲染速度更快(140 FPS)。

Comments 7pages, Accepted at ICIP 2024

详情
AI中文摘要

事件相机以高动态范围、无运动模糊和低能耗著称,并因这些特性在近期获得了广泛应用。过去几年中,基于事件的3D重建领域取得了显著进展,其中基于神经辐射场(NeRF)的方法展示了逼真的视角合成结果。然而,NeRF的体渲染范式需要大量的训练和渲染时间。在本文中,我们提出事件增强的高斯泼溅(E2GS),这是一种将事件数据融入高斯泼溅的新方法;高斯泼溅最近在新视角合成领域取得了显著进展。我们的E2GS有效利用了模糊图像和事件数据,显著改善了图像去模糊效果,并实现了高质量的新视角合成。我们在合成和真实世界数据集上的全面实验表明,E2GS能够生成视觉效果良好的渲染结果,同时提供更快的训练和渲染速度(140 FPS)。我们的代码可在https://github.com/deguchihiroyuki/E2GS上获得。

英文摘要

Event cameras, known for their high dynamic range, absence of motion blur, and low energy usage, have recently found a wide range of applications thanks to these attributes. In the past few years, the field of event-based 3D reconstruction saw remarkable progress, with the Neural Radiance Field (NeRF) based approach demonstrating photorealistic view synthesis results. However, the volume rendering paradigm of NeRF necessitates extensive training and rendering times. In this paper, we introduce Event Enhanced Gaussian Splatting (E2GS), a novel method that incorporates event data into Gaussian Splatting, which has recently made significant advances in the field of novel view synthesis. Our E2GS effectively utilizes both blurry images and event data, significantly improving image deblurring and producing high-quality novel view synthesis. Our comprehensive experiments on both synthetic and real-world datasets demonstrate our E2GS can generate visually appealing renderings while offering faster training and rendering speed (140 FPS). Our code is available at https://github.com/deguchihiroyuki/E2GS.

2312.01386 2026-05-21 cs.LG stat.ML

On the Suboptimality of GP-UCB under Polynomial Effective Optimism

关于多项式有效乐观性下GP-UCB的次优性

Wenjia Wang, Xiaowei Zhang

发表机构 * Department of Industrial Systems Engineering and Management, National University of Singapore(新加坡国立大学工业系统工程与管理系) Department of Industrial Engineering and Decision Analytics, The Hong Kong University of Science and Technology(香港科技大学工业工程与决策分析系)

AI总结 本文研究了GP-UCB在多项式有效乐观性下的次优性:通过定义有效乐观性水平(探索系数与核岭回归中正则化参数的乘积),在一致置信假设下证明了采用Matérn核的GP-UCB的一个新的遗憾下界,表明有效乐观性水平的多项式增长排除了极小极大最优的遗憾率,从而揭示了证明标准GP-UCB极小极大最优性的一个具体障碍。

详情
AI中文摘要

高斯过程上置信界(GP-UCB)被广泛用于昂贵黑盒函数的序贯优化。尽管文献中已建立了许多关于其累积遗憾的上界,但GP-UCB是否极小极大最优仍是一个悬而未决的问题。我们通过有效乐观性水平来研究这一问题,它定义为探索系数与核岭回归中正则化参数的乘积。在一致置信假设下,我们证明了采用Matérn核的GP-UCB的一个新的遗憾下界。该下界表明,有效乐观性水平的多项式增长(忽略对数因子)排除了极小极大最优的遗憾率。由于大多数现有分析所覆盖的正是这一情形,我们的结果指出了证明标准GP-UCB极小极大最优性的一个具体障碍。更广泛地说,这表明当前上界与极小极大下界之间的差距可能反映了算法本身的真实局限,而不仅仅是分析上的局限。

英文摘要

Gaussian process upper confidence bound (GP-UCB) is widely used for sequential optimization of expensive black-box functions. Although many upper bounds on its cumulative regret have been established in the literature, whether GP-UCB is minimax optimal remains open. We study this question through the effective optimism level, defined as the product of the exploration coefficient and the regularization parameter in kernel ridge regression. Under a uniform confidence assumption, we prove a new regret lower bound for GP-UCB with Matérn kernels. The bound shows that polynomial growth of the effective optimism level, up to logarithmic factors, rules out the minimax-optimal regret rate. Since this is the regime covered by most existing analyses, our result identifies a concrete obstacle to proving minimax optimality for standard GP-UCB. More broadly, it suggests that the gap between current upper bounds and minimax lower bounds may reflect a real limitation of the algorithm, not only of the analysis.

2307.11925 2026-05-21 cs.LG math.CA

Mercer Large-Scale Kernel Machines from Ridge Function Perspective

从岭函数视角出发的Mercer大规模核机

Karol Dziedziul, Sergey Kryzhevich, Paweł Wieczyński

发表机构 * Faculty of Applied Mathematics, The Gdańsk University of Technology, ul. G. Narutowicza 11/12, 80-952 Gdańsk, Poland

AI总结 本文从岭函数视角研究Mercer大规模核机,探讨哪些核函数可以用余弦函数乘积之和来近似,并指出该方法的障碍;所得结果通过'一对其余'(one-vs-rest)过程应用于图像处理。

Comments 17 pages, 3 figures

详情
AI中文摘要

为了从岭函数视角呈现Mercer大规模核机,我们回顾了Lin和Pinkus在《Fundamentality of ridge functions》中的结果。我们从逼近论的角度考察了Rahimi和Recht于2008年发表的论文《Random features for large-scale kernel machines》的主要结果。我们研究了哪些核可以被余弦函数乘积之和近似(其中余弦函数的参数依赖于x和y),并指出了这种方法的障碍。本文的结果通过'一对其余'(one-vs-rest)过程应用于图像处理。

英文摘要

To present Mercer large-scale kernel machines from a ridge function perspective, we recall the results by Lin and Pinkus from "Fundamentality of ridge functions". We consider the main result of the recent paper by Rahimi and Recht (2008), "Random features for large-scale kernel machines", from the point of view of approximation theory. We study which kernels can be approximated by a sum of products of cosine functions with arguments depending on $x$ and $y$, and present the obstacles to such an approach. The results of this article are applied to image processing via the "one-vs-rest" procedure.
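The phrase "a sum of products of cosine functions" can be made concrete with the standard random Fourier feature construction for the Gaussian kernel, where k(x, y) = exp(-(x - y)^2 / 2) is approximated by (2/D) Σ cos(w_j x + b_j) cos(w_j y + b_j) with w_j drawn from N(0, 1) and b_j uniform on [0, 2π]. The sketch below is a one-dimensional illustration of that textbook construction, not code from the paper; the function names and the feature count are assumptions.

```python
import math
import random


def sample_features(num_features, seed=0):
    """Draw (frequency, phase) pairs: w ~ N(0, 1), b ~ Uniform[0, 2*pi]."""
    rng = random.Random(seed)
    return [(rng.gauss(0.0, 1.0), rng.uniform(0.0, 2.0 * math.pi))
            for _ in range(num_features)]


def rff_kernel(x, y, features):
    """Approximate the kernel by an averaged sum of products of cosines."""
    total = 0.0
    for w, b in features:
        total += math.cos(w * x + b) * math.cos(w * y + b)
    return 2.0 * total / len(features)


def gaussian_kernel(x, y):
    """Exact unit-bandwidth Gaussian kernel for scalar inputs."""
    return math.exp(-0.5 * (x - y) ** 2)


features = sample_features(20000, seed=0)
approx = rff_kernel(0.3, 1.1, features)
exact = gaussian_kernel(0.3, 1.1)
```

The approximation error shrinks roughly like one over the square root of the number of features, so `approx` and `exact` should agree to about two decimal places here.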

2304.12906 2026-05-21 cs.LG stat.ML

The Score-Difference Flow for Implicit Generative Modeling

隐式生成建模的分数差流

Romann M. Weber

发表机构 * Disney Research(迪士尼研究)

AI总结 本文提出分数差流作为隐式生成建模的一种新方法,通过最优减少两个分布之间的KL散度,展示了其与去噪扩散模型的等价性,并揭示了生成对抗网络训练中隐含的数据优化子问题与分数差流之间的联系。

Comments 25 pages, 5 figures, 4 tables. Updated final version of a paper originally published in Transactions on Machine Learning Research (TMLR), including minor typographical corrections and post-publication commentary connecting the SD flow to drifting models

Journal ref Transactions on Machine Learning Research (7/2023)

详情
AI中文摘要

隐式生成建模(IGM)旨在生成与目标数据分布特征相符的合成数据样本。近期工作(如分数匹配网络、扩散模型)从如下角度处理IGM问题:通过动力学扰动或环境空间中的流,将合成的源数据推向目标分布。沿着这一方向,我们提出将任意目标分布与源分布之间的分数差(SD)作为一种流,它能以最优方式减小两者之间的Kullback-Leibler散度。我们将SD流应用于便于处理的代理分布,这些代理分布对齐当且仅当原始分布对齐。我们证明了在某些条件下,这一表述与去噪扩散模型形式等价。我们还表明,生成对抗网络的训练包含一个隐含的数据优化子问题,当判别器最优时,该子问题在特定的损失函数选择下导出SD流。因此,SD流为若干模型类别之间提供了理论联系,这些模型类别各自应对“生成建模三难困境”中的三项挑战(高样本质量、模式覆盖和快速采样),从而为统一的方法奠定了基础。

英文摘要

Implicit generative modeling (IGM) aims to produce samples of synthetic data matching the characteristics of a target data distribution. Recent work (e.g. score-matching networks, diffusion models) has approached the IGM problem from the perspective of pushing synthetic source data toward the target distribution via dynamical perturbations or flows in the ambient space. In this direction, we present the score difference (SD) between arbitrary target and source distributions as a flow that optimally reduces the Kullback-Leibler divergence between them. We apply the SD flow to convenient proxy distributions, which are aligned if and only if the original distributions are aligned. We demonstrate the formal equivalence of this formulation to denoising diffusion models under certain conditions. We also show that the training of generative adversarial networks includes a hidden data-optimization sub-problem, which induces the SD flow under certain choices of loss function when the discriminator is optimal. As a result, the SD flow provides a theoretical link between model classes that individually address the three challenges of the "generative modeling trilemma" -- high sample quality, mode coverage, and fast sampling -- thereby setting the stage for a unified approach.

2212.08989 2026-05-21 cs.LG

Deep learning applied to computational mechanics: A comprehensive review, state of the art, and the classics

深度学习应用于计算力学:综述、现状和经典方法

Loc Vu-Quoc, Alexander Humer

发表机构 * University of Illinois at Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校) Johannes Kepler University(约翰尼斯·开普勒大学)

AI总结 本文综述了深度学习在计算力学中的应用,包括固体力学、流体力学和有限元技术,并讨论了混合和纯机器学习方法在解决非线性偏微分方程中的作用,同时介绍了LSTM、注意力机制和核方法等技术。

Comments 275 pages, 158 figures. Appeared online on 2023.03.01 at CMES-Computer Modeling in Engineering & Sciences

Journal ref CMES-Computer Modeling in Engineering & Sciences, Vol. 137, No. 2, pp.1069-1343, 2023

详情
AI中文摘要

人工智能近期在艺术与科学领域带来的三项突破构成了本文的动机:一幅获奖的数字图像、蛋白质折叠以及快速矩阵乘法。本文详细回顾了人工神经网络(特别是深度学习,DL)中许多应用于计算力学(固体、流体、有限元技术)并与之相关的最新进展。文中讨论了混合方法和纯机器学习(ML)方法。混合方法将传统的PDE离散化与ML方法相结合,用于(1)帮助建模复杂的非线性本构关系,(2)对模型进行非线性降阶以实现高效模拟(湍流),或(3)通过预测传统积分方法中的某些组成部分来加速模拟。其中,方法(1)和(2)依赖于长短期记忆(LSTM)架构,方法(3)依赖于卷积神经网络。求解(非线性)PDE的纯ML方法以物理信息神经网络(PINN)方法为代表,这类方法可与注意力机制结合以处理不连续解。文中对LSTM和注意力架构,以及现代优化器和为DL网络引入随机性而推广的经典优化器,都作了广泛回顾。对核机(包括高斯过程)的介绍达到了足够的深度,以支撑无限宽度浅层网络等更高阶的内容。本文不仅面向专家:我们假定读者熟悉计算力学但不熟悉DL,并从基础出发逐步建立DL的概念和应用,旨在让初学者迅速到达研究前沿。文中还回顾并讨论了AI的历史和局限,并特别指出了对经典工作的误述或误解,即便它们出现在知名参考文献中。文中以大变形梁的定位和指向控制作为示例。

英文摘要

Three recent breakthroughs due to AI in arts and science serve as motivation: an award-winning digital image, protein folding, and fast matrix multiplication. Many recent developments in artificial neural networks, particularly deep learning (DL), applied and relevant to computational mechanics (solids, fluids, finite-element technology) are reviewed in detail. Both hybrid and pure machine learning (ML) methods are discussed. Hybrid methods combine traditional PDE discretizations with ML methods either (1) to help model complex nonlinear constitutive relations, (2) to nonlinearly reduce the model order for efficient simulation (turbulence), or (3) to accelerate the simulation by predicting certain components in the traditional integration methods. Here, methods (1) and (2) rely on the Long Short-Term Memory (LSTM) architecture, while method (3) relies on convolutional neural networks. Pure ML methods to solve (nonlinear) PDEs are represented by Physics-Informed Neural Network (PINN) methods, which can be combined with an attention mechanism to address discontinuous solutions. Both LSTM and attention architectures, together with modern optimizers and classic optimizers generalized to include stochasticity for DL networks, are extensively reviewed. Kernel machines, including Gaussian processes, are covered in sufficient depth for more advanced works such as shallow networks with infinite width. The review does not only address experts: readers are assumed to be familiar with computational mechanics but not with DL, whose concepts and applications are built up from the basics, with the aim of bringing first-time learners quickly to the forefront of research. The history and limitations of AI are recounted and discussed, with particular attention to pointing out misstatements or misconceptions of the classics, even in well-known references. Positioning and pointing control of a large-deformable beam is given as an example.

2205.13524 2026-05-21 cs.CV cs.GR

PREF: Phasorial Embedding Fields for Compact Neural Representations

PREF: 用于紧凑神经表示的相量嵌入场

Binbin Huang, Xinhao Yan, Anpei Chen, Shenghua Gao, Jingyi Yu

发表机构 * ShanghaiTech University(上海科技大学) ETH Zürich(苏黎世联邦理工学院)

AI总结 本文提出了一种高效的基于频率的神经表示PREF,它引入相量体(phasor volume)以覆盖更宽的频谱,并结合快速傅里叶变换和局部插值来加速傅里叶映射,从而减轻基于频率的表示中代价高昂的MLP,提升效率和可解释性。

详情
AI中文摘要

我们提出了一种高效的基于频率的神经表示,称为PREF:一个以相量体(phasor volume)增强的浅层MLP,与此前的傅里叶特征映射或位置编码相比,它能覆盖明显更宽的频谱。其核心是我们紧凑的3D相量体,其中频率在一个2D平面上均匀分布,并沿一个1D轴扩张。为此,我们设计了一种专门且高效的傅里叶变换,结合快速傅里叶变换和局部插值来加速朴素的傅里叶映射。我们还引入了Parseval正则化器以稳定基于频率的学习。通过这些方式,PREF减轻了基于频率的表示中代价高昂的MLP,从而显著缩小了它与其他混合表示之间的效率差距,并提高了其可解释性。全面的实验表明,PREF能够在保持紧凑和鲁棒的同时捕捉高频细节,实验涵盖2D图像泛化、3D符号距离函数回归和5D神经辐射场重建。

英文摘要

We present an efficient frequency-based neural representation termed PREF: a shallow MLP augmented with a phasor volume that covers significantly broader spectra than previous Fourier feature mapping or positional encoding. At the core is our compact 3D phasor volume, where frequencies are distributed uniformly along a 2D plane and dilate along a 1D axis. To this end, we develop a tailored and efficient Fourier transform that combines the fast Fourier transform with local interpolation to accelerate naïve Fourier mapping. We also introduce a Parseval regularizer that stabilizes frequency-based learning. In these ways, our PREF reduces the costly MLP in the frequency-based representation, thereby significantly closing the efficiency gap between it and other hybrid representations and improving its interpretability. Comprehensive experiments demonstrate that our PREF is able to capture high-frequency details while remaining compact and robust, including 2D image generalization, 3D signed distance function regression and 5D neural radiance field reconstruction.

2110.06123 2026-05-21 cs.SD eess.AS

COVID-19 Diagnosis from Cough Acoustics using ConvNets and Data Augmentation

通过卷积神经网络和数据增强进行新冠肺炎咳嗽声诊断

Saranga Kingkor Mahanta, Darsh Kaushik, Shubham Jain, Hoang Van Truong, Koushik Guha

发表机构 * Electronics and Communication Engineering Department, National Institute of Technology, Silchar, India 788010(印度国立理工学院锡尔杰尔分校电子与通信工程系,印度 788010) Computer Science and Engineering Department, National Institute of Technology, Silchar, India 788010(印度国立理工学院锡尔杰尔分校计算机科学与工程系,印度 788010) Software Engineering Department, Paytm, Noida, India 110096(Paytm软件工程部,印度诺伊达 110096) Mathematics in Computer Science, University of Science of HCMC, Ho Chi Minh City, Vietnam 700000(胡志明市自然科学大学计算机科学中的数学,越南胡志明市 700000)

AI总结 本文提出利用卷积神经网络和数据增强技术,对DiCOVA 2021挑战赛Track 1中的咳嗽声数据集进行分析,以实现新冠肺炎的诊断,通过改进模型在盲测集上的AUC分数达到87.07,并超越了挑战赛的基线模型。

Comments DiCOVA, top 1st, This work has been submitted to the IEEE for possible publication

Journal ref IEEE Advances in Computing and Future Communication Technologies (ICACFCT), Meerut, India, 2021, pp. 33-38

详情
AI中文摘要

随着新冠肺炎疫情周期性地起伏,各国不断受到一波又一波疫情的冲击,一种高效、经济且简便的病毒诊断方法已成为当务之急。新冠肺炎阳性个体甚至可能没有症状,这使诊断变得困难;但在感染者中,无症状者未必完全没有由病毒引起的症状。他们可能不像有症状者那样表现出可观察的症状,但其咳嗽方式可能与未感染者不同。这些咳嗽声音上的差异十分细微,人耳无法分辨,但可以用基于机器学习的统计模型捕捉。在本文中,我们提出一种深度学习方法来分析DiCOVA 2021挑战赛Track 1提供的声学数据集,该数据集包含新冠肺炎阳性和阴性样本的咳嗽录音。为了将录音分类为新冠肺炎阳性或阴性样本,我们提出了一个ConvNet模型。在挑战赛提供的、用于对模型进行无偏评估的盲测集上,我们的模型取得了72.23的AUC百分比分数。结合数据增强后,ConvNet模型的AUC-ROC百分比进一步从72.23提高到87.07。它还比DiCOVA 2021挑战赛的基线模型高出23%,从而在挑战赛排行榜上位居榜首。本文提出使用梅尔频率倒谱系数作为所提模型的特征输入。

英文摘要

With the periodic rise and fall of COVID-19 and countries being hit by its waves, an efficient, economical, and effortless diagnostic procedure for the virus has been the need of the hour. COVID-19 positive individuals may even be asymptomatic, making diagnosis difficult, but among the infected subjects, the asymptomatic ones need not be entirely free of symptoms caused by the virus. They might not show any observable symptoms like the symptomatic subjects, but they may differ from uninfected ones in the way they cough. These differences in coughing sounds are minute and indiscernible to the human ear; however, they can be captured using machine learning-based statistical models. In this paper, we present a deep learning approach to analyze the acoustic dataset provided in Track 1 of the DiCOVA 2021 Challenge, which contains cough sound recordings belonging to both COVID-19 positive and negative examples. To classify the sound recordings as belonging to COVID-19 positive or negative examples, we propose a ConvNet model. Our model achieved an AUC score percentage of 72.23 on the blind test set provided by the challenge for an unbiased evaluation of the models. Incorporating data augmentation into the ConvNet model further increased the AUC-ROC percentage from 72.23 to 87.07. It also outperformed the DiCOVA 2021 Challenge's baseline model by 23%, thus claiming the top position on the DiCOVA 2021 Challenge leaderboard. This paper proposes the use of Mel frequency cepstral coefficients as the feature input for the proposed model.
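Since the results above are reported as AUC-ROC percentages, a short sketch of how that metric is computed may help. It uses the standard pairwise definition (the fraction of positive/negative pairs ranked correctly, with ties counted as one half); the labels and scores are made-up illustrative values, not data from the challenge.

```python
def roc_auc(labels, scores):
    """AUC-ROC as the probability that a positive example outscores a negative one.

    labels: 1 for positive, 0 for negative; scores: higher means more positive.
    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Illustrative classifier outputs for five recordings (hypothetical values).
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.2, 0.1]
auc_percent = 100.0 * roc_auc(labels, scores)
```

The quadratic pairwise loop is fine for small evaluation sets; rank-based formulas give the same value more efficiently on large ones.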

1908.05972 2026-05-21 cs.LG stat.ML

AI-based Prediction of Independent Construction Safety Outcomes from Universal Attributes

基于AI、利用通用属性预测相互独立的施工安全结果

Henrietta Baker, Matthew R. Hallowell, Antoine J. -P. Tixier

发表机构 * University of Edinburgh, UK(爱丁堡大学,英国) University of Colorado at Boulder, USA(科罗拉多大学博尔德分校,美国)

AI总结 本文改进并验证了先前研究中通过机器学习从属性中预测安全结果的方法,使用NLP提取属性并训练模型预测伤害严重性、类型、受影响身体部位和事件类型,通过独立人工标注消除潜在的人工相关性,结果表明属性仍具有高度预测性,同时引入了更大的数据集、新模型、模型堆叠和更合适的评估指标,最终成功预测伤害严重性,这是重大进展。

Comments Added author contributions and journal reference, updated corresponding author, fixed a few typos

Journal ref Automation in Construction 118 (2020): 103146

详情
AI中文摘要

本文显著改进并验证了先前研究中通过机器学习从属性中预测安全结果的方法。与原始研究类似,我们使用自然语言处理(NLP)从原始事件报告中提取基本属性,并训练机器学习模型进行预测。此处预测的安全结果包括伤害严重性、伤害类型、受影响身体部位和事件类型。与原始研究不同,安全结果不是通过NLP提取,而是由独立的人工标注提供,从而消除了预测变量和预测目标之间可能的人工相关性。结果表明,属性仍具有高度预测性,证实了原始方法的有效性。当前研究的其他改进包括使用(1)一个包含超过90,000份报告的更大数据集,(2)两种新模型,XGBoost和线性支持向量机(SVM),(3)模型堆叠,(4)更简单的实验设置和更合适的性能指标,以及(5)对各属性重要性评分的分析。最后,伤害严重性结果得到良好预测,这在原始研究中并未实现。这是重大进展。

英文摘要

This paper significantly improves on, and completes the validation of, an approach proposed in previous research in which safety outcomes were predicted from attributes with machine learning. As in the original study, we use Natural Language Processing (NLP) to extract fundamental attributes from raw incident reports, and machine learning models are trained to predict safety outcomes. The outcomes predicted here are injury severity, injury type, body part impacted, and incident type. However, unlike in the original study, safety outcomes were not extracted via NLP but were provided by independent human annotations, eliminating any potential source of artificial correlation between predictors and predictands. Results show that attributes are still highly predictive, confirming the validity of the original approach. Other improvements brought by the current study include the use of (1) a much larger dataset featuring more than 90,000 reports, (2) two new models, XGBoost and linear SVM (Support Vector Machines), (3) model stacking, (4) a more straightforward experimental setup with more appropriate performance metrics, and (5) an analysis of per-category attribute importance scores. Finally, the injury severity outcome is well predicted, which was not the case in the original study. This is a significant advancement.

1907.11769 2026-05-21 cs.CL

Automatically Learning Construction Injury Precursors from Text

从文本中自动学习施工伤害前兆

Henrietta Baker, Matthew R. Hallowell, Antoine J. -P. Tixier

发表机构 * University of Edinburgh, UK(爱丁堡大学,英国) University of Colorado at Boulder, USA(科罗拉多大学博尔德分校,美国)

AI总结 本文研究了如何利用建筑行业数字化记录的安全报告,通过比较几种方法(包括深度学习模型)从原始事故报告中自动学习伤害前兆,以加深对安全事故的理解并提高从中学习的能力。

Comments Added author contributions and journal reference, updated corresponding author

Journal ref Automation in Construction 118 (2020): 103145

详情
AI中文摘要

鉴于建筑行业中数字化记录的安全报告日益增多,开发利用这些数据的方法以增进我们对安全事故的理解并提高从中学习的能力十分重要。在本研究中,我们比较了几种从原始施工事故报告中自动学习伤害前兆的方法。更具体地说,我们试验了两种最先进的自然语言处理(NLP)深度学习架构,即卷积神经网络(CNN)和分层注意力网络(HAN),以及成熟的词频-逆文档频率表示(TF-IDF)+支持向量机(SVM)方法。对于每种模型,我们提供了一种方法,用于在训练后识别出平均而言对每种安全结果最具预测性的文本模式。我们表明,在这些文本片段中可以找到有效的伤害前兆。所提出的方法也可供用户用来可视化和理解模型的预测。

英文摘要

In light of the increasing availability of digitally recorded safety reports in the construction industry, it is important to develop methods to exploit these data to improve our understanding of safety incidents and ability to learn from them. In this study, we compare several approaches to automatically learn injury precursors from raw construction accident reports. More precisely, we experiment with two state-of-the-art deep learning architectures for Natural Language Processing (NLP), Convolutional Neural Networks (CNN) and Hierarchical Attention Networks (HAN), and with the established Term Frequency - Inverse Document Frequency representation (TF-IDF) + Support Vector Machine (SVM) approach. For each model, we provide a method to identify (after training) the textual patterns that are, on average, the most predictive of each safety outcome. We show that among those pieces of text, valid injury precursors can be found. The proposed methods can also be used by the user to visualize and understand the models' predictions.

2605.20539 2026-05-21 cs.LG

OpenSeisML: Open Large-Scale Real Seismic and well-log Dataset for Generative AI

OpenSeisML: 面向生成式AI的开放式大规模真实地震与测井数据集

Ipsita Bhar, Huseyin Tuna Erdinc, Thales Souza, Charles Jones, Felix J. Herrmann

发表机构 * Georgia Institute of Technology(佐治亚理工学院) Osokey Ltd(Osokey公司)

AI总结 本文提出OpenSeisML,一个开放的大规模真实地震与测井数据集,用于支持生成式AI在地震反演中的应用;其自动化数据整理流程使地震数据准备可复现,目标是训练生成模型以捕捉地下属性的统计分布,从而生成多个统计上一致的实现用于不确定性量化。

Comments 5 pages, 8 figures

详情
AI中文摘要

机器学习(ML)和计算机视觉的出现降低了传统上代价高昂的迭代方法的计算成本,从而显著加速了地震反演工作流程。然而,由于大多数高质量数据为石油和天然气公司私有,逼真的速度模型十分稀缺,ML方法的开发和评估仍然因此受限。为了弥补这一缺口,我们提出了OpenSeisML,一个真实地震数据集的集合,旨在支持用于地震反演的生成式AI(Gen-AI)工作流程。这些数据集整理自英国国家数据存储库(NDR)中公开可用的勘测数据。当地震数据体处于时间域而井数据处于深度域时,需要进行时深转换。我们使用校验炮(checkshot)数据建立时深关系,并通过插值构建速度模型,以便对叠后地震数据进行准确转换。在这里,我们提出了一种自动化数据整理流程,在确保可复现性的同时完成地震数据准备。目标是训练一个能够捕捉地下属性统计分布的生成模型,从而合成多个统计上一致的实现用于不确定性量化,这些实现可以作为地震反演的先验。

英文摘要

The advent of machine learning (ML) and computer vision has significantly accelerated seismic inversion workflows by reducing the computational cost of traditionally expensive iterative methods. However, the development and evaluation of ML methods remain limited by the scarcity of realistic velocity models, as most high-quality data are privately owned by oil and gas companies. To address this gap, we present OpenSeisML, a collection of real seismic datasets designed to support generative AI (Gen-AI) workflows for seismic inversion. The datasets are curated from publicly available surveys in the UK National Data Repository (NDR). When seismic volumes are in the time domain and wells are in depth, a time-to-depth conversion is required. We use checkshot data to establish the time-depth relationship and construct a velocity model through interpolation for accurate conversion of post-stack seismic data. Here, we present an automated data curation pipeline that enables seismic data preparation while ensuring reproducibility. The objective is to train a generative model that captures the statistical distribution of subsurface properties, enabling the synthesis of multiple statistically consistent realizations for uncertainty quantification which can act as a prior for seismic inversion.
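The time-to-depth step described above can be sketched in a few lines. This is a minimal illustration, not the OpenSeisML pipeline: it assumes checkshots are given as (two-way time in ms, depth in m) pairs, uses piecewise-linear interpolation, and all function names and sample values are hypothetical.

```python
from bisect import bisect_right


def time_to_depth(twt_ms, checkshots):
    """Convert a two-way travel time (ms) to depth (m) by linear interpolation.

    checkshots: at least two (two-way time in ms, depth in m) pairs sorted by time.
    """
    times = [t for t, _ in checkshots]
    depths = [z for _, z in checkshots]
    if not times[0] <= twt_ms <= times[-1]:
        raise ValueError("time outside checkshot range; extrapolation is not handled")
    # Index of the right end of the interval that contains twt_ms.
    i = min(bisect_right(times, twt_ms), len(times) - 1)
    t0, t1 = times[i - 1], times[i]
    z0, z1 = depths[i - 1], depths[i]
    return z0 + (z1 - z0) * (twt_ms - t0) / (t1 - t0)


def interval_velocities(checkshots):
    """Interval velocity in m/s between consecutive checkshots.

    With two-way time in ms, one-way time in seconds is dt / 2000,
    so velocity = dz / (dt / 2000) = 2000 * dz / dt.
    """
    return [2000.0 * (z1 - z0) / (t1 - t0)
            for (t0, z0), (t1, z1) in zip(checkshots, checkshots[1:])]


# Hypothetical checkshot survey for one well.
checkshots = [(0.0, 0.0), (1000.0, 1000.0), (2000.0, 2500.0)]
depth_at_1500ms = time_to_depth(1500.0, checkshots)
velocities = interval_velocities(checkshots)
```

Applying `time_to_depth` to every time sample of a post-stack trace yields its depth-domain sample positions; a real workflow would also interpolate the velocity model laterally between wells, which this sketch leaves out.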