arXivDaily: arXiv daily academic digest, updated Monday to Friday
2412.10139 2026-06-17 cs.CL (version update)

TACOMORE: Exploring a replicable prompting protocol for LLM-assisted corpus analysis

Bingru Li, Han Wang, Nicholas Groom

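Affiliations: Department of Linguistics and Communication, University of Birmingham; Department of Information Engineering and Computer Science, University of Trento; Institute of Foreign Languages and Cultures, University of Tartu

AI summary: Proposes the TACOMORE framework, which uses structured prompting to shift LLMs from generic probability prediction toward reasoning grounded in corpus co-occurrence patterns. This improves the accuracy and replicability of keyword, collocate, and concordance analysis, but hallucination still calls for human validation.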

Abstract

As corpus linguistics continues to scale, researchers are facing a growing methodological bottleneck: while computational tools can easily count billions of words, the qualitative interpretation of these data remains a slow and labor-intensive human task. Large Language Models (LLMs) offer a promising way to automate this process, yet their integration into the field is often hindered by concerns over black-box unpredictability and a lack of replicability. This study introduces TACOMORE, a structured prompting framework designed to transform ad-hoc AI interactions into a standardized linguistic protocol. Built upon four foundational principles (Task, Context, Model, and Replicability), the framework guides LLMs to move beyond generic probability prediction and anchor their reasoning in the specific co-occurrence patterns of a target corpus. We applied this framework to three core corpus tasks, i.e., the analysis of keywords, collocates, and concordances, using an open corpus of COVID-19 research abstracts. After testing three LLMs, we found that while structured prompting improves accuracy and replicability, inherent limitations regarding hallucination persist. This research offers a critical lens into the role of LLMs in corpus linguistics, highlighting their potential as complementary tools while emphasizing the irreplaceable role of human validation.
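Collocate analysis, one of the three corpus tasks named in the abstract, rests on counting which words co-occur with a node word inside a fixed window. The sketch below illustrates only that counting step. The function name `collocates`, the tokenizer, the window size, and the two sample sentences are assumptions made for this example and are not taken from the TACOMORE paper.

```python
import re
from collections import Counter


def collocates(texts, node, window=4):
    """Count tokens that occur within `window` positions of `node`."""
    counts = Counter()
    for text in texts:
        # Lowercase word tokens; hyphenated forms such as "covid-19" stay whole.
        tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
        for i, token in enumerate(tokens):
            if token != node:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(t for j, t in enumerate(tokens[lo:hi], start=lo) if j != i)
    return counts


# Hypothetical mini-corpus, for illustration only.
sample = [
    "Patients with severe COVID-19 infection were hospitalised.",
    "Severe infection was rare in vaccinated patients.",
]
counts = collocates(sample, "infection", window=2)
print(counts.most_common(3))
```

Counts like these are the kind of corpus-specific evidence a prompt can hand to an LLM so that its interpretation can be checked against the data.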

2404.09790 2026-06-17 cs.CV (version update)

NTIRE 2024 Challenge on Image Super-Resolution (x4): Methods and Results

Zheng Chen, Zongwei Wu, Eduard Zamfir, Kai Zhang, Yulun Zhang, Radu Timofte, Xiaokang Yang, Hongyuan Yu, Cheng Wan, Yuxin Hong, Zhijuan Huang, Yajun Zou, Yuan Huang, Jiamin Lin, Bingnan Han, Xianyu Guan, Yongsheng Yu, Daoan Zhang, Xuanwu Yin, Kunlong Zuo, Jinhua Hao, Kai Zhao, Kun Yuan, Ming Sun, Chao Zhou, Hongyu An, Xinfeng Zhang, Zhiyuan Song, Ziyue Dong, Qing Zhao, Xiaogang Xu, Pengxu Wei, Zhi-chao Dou, Gui-ling Wang, Chih-Chung Hsu, Chia-Ming Lee, Yi-Shiuan Chou, Cansu Korkmaz, A. Murat Tekalp, Yubin Wei, Xiaole Yan, Binren Li, Haonan Chen, Siqi Zhang, Sihan Chen, Amogh Joshi, Nikhil Akalwadi, Sampada Malagi, Palani Yashaswini, Chaitra Desai, Ramesh Ashok Tabib, Ujwala Patil, Uma Mudenagudi, Anjali Sarvaiya, Pooja Choksy, Jagrit Joshi, Shubh Kawa, Kishor Upla, Sushrut Patwardhan, Raghavendra Ramachandra, Sadat Hossain, Geongi Park, S. M. Nadim Uddin, Hao Xu, Yanhui Guo, Aman Urumbekov, Xingzhuo Yan, Wei Hao, Minghan Fu, Isaac Orais, Samuel Smith, Ying Liu, Wangwang Jia, Qisheng Xu, Kele Xu, Weijun Yuan, Zhan Li, Wenqin Kuang, Ruijin Guan, Ruting Deng, Zhao Zhang, Bo Wang, Suiyi Zhao, Yan Luo, Yanyan Wei, Asif Hussain Khan, Christian Micheloni, Niki Martinel

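Affiliations: CVLAI

AI summary: Reviews the NTIRE 2024 image super-resolution (x4) challenge, summarising the participating solutions and their results, which push the performance boundary of single-image super-resolution and outline current trends.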

Comments NTIRE 2024 webpage: https://cvlai.net/ntire/2024. Code: https://github.com/zhengchen1999/NTIRE2024_ImageSR_x4

Journal ref Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 6108-6132

Abstract

This paper reviews the NTIRE 2024 challenge on image super-resolution ($\times$4), highlighting the solutions proposed and the outcomes obtained. The challenge involves generating corresponding high-resolution (HR) images, magnified by a factor of four, from low-resolution (LR) inputs using prior information. The LR images originate from bicubic downsampling degradation. The aim of the challenge is to obtain designs/solutions with the most advanced SR performance, with no constraints on computational resources (e.g., model size and FLOPs) or training data. The track of this challenge assesses performance with the PSNR metric on the DIV2K testing dataset. The competition attracted 199 registrants, with 20 teams submitting valid entries. This collective endeavour not only pushes the boundaries of performance in single-image SR but also offers a comprehensive overview of current trends in this field.
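The challenge ranks entries by PSNR, which is a logarithmic function of the mean squared error between the super-resolved image and the ground-truth HR image. The sketch below computes it for flat lists of pixel intensities. The abstract does not state the challenge's exact evaluation protocol (colour space, border cropping), so this is the textbook definition rather than the official scoring script, and the pixel values are made up.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized images.

    Images are flat lists of pixel intensities (an assumption that keeps
    the example free of external dependencies).
    """
    if len(reference) != len(estimate):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


hr = [52, 55, 61, 59]  # hypothetical ground-truth pixels
sr = [50, 55, 60, 61]  # hypothetical super-resolved pixels
print(round(psnr(hr, sr), 2))  # 44.61
```

Because PSNR depends only on the mean squared error, a higher score means closer pixel-wise agreement with the reference, not necessarily better perceptual quality.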

2212.07700 2026-06-17 cs.CV (version update)

Colab NAS: Obtaining lightweight task-specific convolutional neural networks following Occam's razor

Andrea Mattia Garavagno, Daniele Leonardis, Antonio Frisoli

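Affiliations: Institute of Mechanical Intelligence, Scuola Superiore Sant’Anna of Pisa

AI summary: Proposes ColabNAS, a low-cost hardware-aware neural architecture search method. Its derivative-free search strategy, inspired by Occam's razor, yields lightweight CNNs in 3.1 GPU hours on free GPU services and reaches state-of-the-art results on the Visual Wake Word dataset.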

Journal ref Future Generation Computer Systems, vol. 152, pp. 152-159, 2024

Abstract

The current trend of applying transfer learning from convolutional neural networks (CNNs) trained on large datasets can be overkill when the target application is a custom and delimited problem with enough data to train a network from scratch. On the other hand, training custom and lighter CNNs requires expertise, in the from-scratch case, and/or high-end resources, as in the case of hardware-aware neural architecture search (HW NAS), limiting access to the technology by non-habitual NN developers. For this reason, we present ColabNAS, an affordable HW NAS technique for producing lightweight task-specific CNNs. Its novel derivative-free search strategy, inspired by Occam's razor, makes it possible to obtain state-of-the-art results on the Visual Wake Word dataset, a standard TinyML benchmark, in just 3.1 GPU hours using free online GPU services such as Google Colaboratory and Kaggle Kernel.

2507.15777 2026-06-17 cs.CV

Label tree semantic losses for rich multi-class medical image segmentation

Junwen Wang, Oscar MacCormac, William Rochford, Aaron Kujawa, Jonathan Shapey, Tom Vercauteren

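Affiliations: School of Biomedical Engineering & Imaging Sciences; Department of Neurosurgery

AI summary: Proposes two tree-based semantic loss functions that exploit the label hierarchy, with consistent improvements in fully supervised brain MRI parcellation and in sparsely annotated neurosurgical hyperspectral imaging for scene understanding.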

Abstract

Rich and accurate medical image segmentation is poised to underpin the next generation of AI-defined clinical practice by delineating critical anatomy for pre-operative planning, guiding real-time intra-operative navigation, and supporting precise post-operative assessment. However, commonly used learning methods for medical and surgical imaging segmentation tasks penalise all errors equivalently and thus fail to exploit any inter-class semantics in the label space. This becomes particularly problematic as the cardinality and richness of labels increase to include subtly different classes. In this work, we propose two tree-based semantic loss functions which take advantage of a hierarchical organisation of the labels. We further incorporate our losses in a recently proposed approach for training with sparse, background-free annotations to extend the applicability of our proposed losses. Extensive experiments are reported on two medical and surgical imaging segmentation tasks, namely head MRI for whole brain parcellation with full supervision and neurosurgical hyperspectral imaging for scene understanding with sparse annotations. Results demonstrate consistent improvements over the evaluated task-specific baselines, with the strongest support for the Wasserstein-based compound loss in whole-brain parcellation and for hierarchy-weighted top-level supervision in the sparse HSI setting.
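One way to see how a label tree can make some errors cheaper than others: when the ground truth is a single label, the Wasserstein distance between the predicted class distribution and that label reduces to the expected ground distance, and the ground distance can be taken as the path length between labels in the tree. The sketch below computes this quantity for one pixel. The tiny anatomy tree, the label names, and the use of unit edge lengths are assumptions for illustration, and the paper's actual loss definitions may differ.

```python
def tree_distance(parent, a, b):
    """Number of edges on the path between nodes a and b in a rooted tree."""
    def path_to_root(node):
        path = [node]
        while parent[node] is not None:
            node = parent[node]
            path.append(node)
        return path

    depth_from_a = {node: d for d, node in enumerate(path_to_root(a))}
    for depth_from_b, node in enumerate(path_to_root(b)):
        if node in depth_from_a:  # first shared ancestor
            return depth_from_a[node] + depth_from_b
    raise ValueError("nodes are not in the same tree")


def expected_tree_cost(probs, true_label, parent):
    """Expected tree distance between the prediction and a one-hot label."""
    return sum(p * tree_distance(parent, label, true_label) for label, p in probs.items())


# Hypothetical label hierarchy: child -> parent.
parent = {
    "brain": None,
    "cortex": "brain",
    "white_matter": "brain",
    "frontal": "cortex",
    "temporal": "cortex",
}
probs = {"frontal": 0.5, "temporal": 0.25, "white_matter": 0.25}
print(expected_tree_cost(probs, "frontal", parent))  # 1.25
```

Here probability mass on the sibling class `temporal` (distance 2) costs less than the same mass on `white_matter` (distance 3), which is the inter-class semantics that a flat per-class penalty ignores.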

2601.06116 2026-06-17 cs.AI cs.CL cs.CY

The Homogenization Problem in LLMs: Towards Meaningful Diversity in AI Safety

Ian Rios-Sialer

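Affiliations: Independent Researcher

AI summary: Examines the homogenization problem in large language models, proposes a framework for encoding stakeholder value systems to promote diversity, surfaces gender bias in an experiment, and introduces the concept of xeno-reproduction to mitigate homogenization.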

Abstract

Generative AI models reproduce the human biases in their training data and further amplify them through mechanisms such as mode collapse. The loss of diversity produces homogenization, which not only harms the minoritized but impoverishes everyone. We argue homogenization should be a central concern in AI safety. To meaningfully characterize homogenization in Large Language Models (LLMs), we introduce a framework that allows stakeholders to encode their context and value system. We illustrate our approach with an experiment that surfaces gender bias in an LLM (Claude 3.5 Haiku) on an open-ended story prompt. Building from queer theory, we formalize homogenization in terms of normativity. Borrowing language from feminist theory, we introduce the concept of xeno-reproduction as a class of tasks for mitigating homogenization by promoting diversity. Our work opens a collaborative line of research that seeks to understand and advance diversity in AI.

2605.12220 2026-06-17 cs.CV cs.AI cs.LG cs.RO

TriBand-BEV: Real-Time LiDAR-Only 3D Pedestrian Detection via Height-Aware BEV and High-Resolution Feature Fusion

Mohammad Khoshkdahan, Alexey Vinel

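Affiliations: Karlsruhe Institute of Technology

AI summary: Proposes TriBand-BEV, which achieves real-time LiDAR-only 3D pedestrian detection through a height-aware bird's eye view encoding and high-resolution feature fusion. A lightweight BEV tensor mapping lets a single network detect cars, pedestrians, and cyclists in one pass, improving both accuracy and speed.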

Comments Accepted for publication in the Proceedings of the 2026 International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)

Journal ref Proceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)

Abstract

Safe autonomous agents and mobile robots need fast real-time 3D perception, especially for vulnerable road users (VRUs) such as pedestrians. We introduce a new bird's eye view (BEV) encoding, which maps the full 3D LiDAR point cloud into a lightweight 2D BEV tensor with three height bands. We explicitly reformulate 3D detection as a 2D detection problem and then reconstruct 3D boxes from the BEV outputs. A single network detects cars, pedestrians, and cyclists in one pass. The backbone uses area attention at deep stages, a hierarchical bidirectional neck over P1 to P4 fuses context and detail, and the head predicts oriented boxes with distribution focal learning for side offsets and a rotated IoU loss. Training applies a small vertical re-binning and a mild reflectance jitter in channel space to resist memorization. We use an interquartile range (IQR) filter to remove noisy and outlier LiDAR points during 3D reconstruction. On the KITTI dataset, TriBand-BEV attains 58.7/52.6/47.2 pedestrian BEV AP (%) for easy, moderate, and hard at 49 FPS on a single consumer GPU, surpassing Complex-YOLO with gains of +12.6%, +7.5%, and +3.1%. Qualitative scenes show stable detection under occlusion. The pipeline is compact and ready for real-time robotic deployment. Our source code is publicly available on GitHub.
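Two steps in this abstract are easy to picture in code: rasterising a point cloud into a 2D grid with one channel per height band, and discarding outlier points with an interquartile range rule. The sketch below shows both under stated assumptions. The grid extent, cell size, band edges, the use of per-cell point counts as the channel value, and the choice to filter on point height are all illustrative guesses, not the settings or features used by TriBand-BEV.

```python
import statistics


def encode_bev(points, x_range, y_range, cell, band_edges):
    """Rasterise (x, y, z) points into grid[i][j][band] occupancy counts."""
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    n_bands = len(band_edges) - 1
    grid = [[[0] * n_bands for _ in range(ny)] for _ in range(nx)]
    for x, y, z in points:
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue  # outside the BEV extent
        band = next(
            (b for b in range(n_bands) if band_edges[b] <= z < band_edges[b + 1]),
            None,
        )
        if band is None:
            continue  # outside all height bands
        i = min(nx - 1, int((x - x_range[0]) / cell))
        j = min(ny - 1, int((y - y_range[0]) / cell))
        grid[i][j][band] += 1
    return grid


def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    return [v for v in values if q1 - k * spread <= v <= q3 + k * spread]


# Hypothetical points (metres) and three hypothetical height bands.
points = [(0.2, 0.2, -1.0), (0.3, 0.1, 0.0), (0.3, 0.1, 5.0), (1.2, 0.7, 1.0)]
grid = encode_bev(points, (0.0, 2.0), (0.0, 1.0), 0.5, [-2.0, -0.5, 0.5, 2.0])
heights = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 9.0]
print(grid[0][0], iqr_filter(heights))
```

Collapsing height into a small number of channels is what lets a 2D detector process the scene; the IQR rule is a cheap way to drop stray returns before a 3D box is fitted.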

2601.12912 2026-06-17 cs.AI

Human Emotion Verification by Action Languages via Answer Set Programming

Andreas Brännström, Juan Carlos Nieves

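Affiliations: Umeå University, Department of Computing Science

AI summary: Proposes the action language C-MT, built on answer set programming and transition systems, for representing how human mental states evolve in response to sequences of observable actions. New causal rules let the language model principles of valid mental-state transitions, enabling controlled reasoning about human mental dynamics.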

Comments Under consideration in Theory and Practice of Logic Programming (TPLP)

Journal ref Theory and Practice of Logic Programming 25 (2025) 1047-1104

Abstract

In this paper, we introduce the action language C-MT (Mind Transition Language). It is built on top of answer set programming (ASP) and transition systems to represent how human mental states evolve in response to sequences of observable actions. Drawing on well-established psychological theories, such as the Appraisal Theory of Emotion, we formalize mental states, such as emotions, as multi-dimensional configurations. With the objective to address the need for controlled agent behaviors and to restrict unwanted mental side-effects of actions, we extend the language with a novel causal rule, forbids to cause, along with expressions specialized for mental state dynamics, which enables the modeling of principles for valid transitions between mental states. These principles of mental change are translated into transition constraints, and properties of invariance, which are rigorously evaluated using transition systems in terms of so-called trajectories. This enables controlled reasoning about the dynamic evolution of human mental states. Furthermore, the framework supports the comparison of different dynamics of change by analyzing trajectories that adhere to different psychological principles. We apply the action language to design models for emotion verification. Under consideration in Theory and Practice of Logic Programming (TPLP).

2512.03805 2026-06-17 cs.LG

Deep Reinforcement Learning for Dynamic Algorithm Configuration: A Case Study on Optimizing OneMax with the (1+($λ$,$λ$))-GA

Tai Nguyen, Phong Le, André Biedenkapp, Carola Doerr, Nguyen Dang

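Affiliations: University of St Andrews, United Kingdom; Sorbonne Université, CNRS, LIP6, France; University of Freiburg, Germany

AI summary: Studies the challenges that the deep reinforcement learning algorithms DDQN and PPO face when controlling the population size of the (1+(λ,λ))-GA on OneMax. Both suffer from scalability degradation and learning instability; an adaptive reward shifting mechanism improves DDQN and makes it more sample-efficient than prior approaches.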

Comments arXiv admin note: text overlap with arXiv:2502.20265

Abstract

Dynamic Algorithm Configuration (DAC) studies the efficient identification of control policies for parameterized optimization algorithms. Numerous studies leverage Reinforcement Learning (RL) to address DAC challenges; however, applying RL often requires extensive domain expertise. In this work, we conduct a comprehensive study of two deep-RL algorithms--Double Deep Q-Networks (DDQN) and Proximal Policy Optimization (PPO)--for controlling the population size of the $(1+(λ,λ))$-GA on OneMax instances. Although OneMax is structurally simple, learning effective control policies for the $(1+(λ,λ))$-GA induces a highly challenging DAC landscape, making it a controlled yet demanding benchmark. Our investigation reveals two fundamental challenges limiting DDQN and PPO: scalability degradation and learning instability, traced to under-exploration and planning horizon coverage. To address under-exploration, we introduce an adaptive reward shifting mechanism that leverages reward distribution statistics to enhance DDQN exploration. This eliminates instance-specific hyperparameter tuning and ensures consistent effectiveness across problem scales. To resolve planning horizon coverage, we demonstrate that undiscounted learning succeeds in DDQN, while PPO faces fundamental variance issues necessitating alternative designs. We further show that while hyperparameter optimization enhances PPO's stability, it consistently fails to identify effective policies. Finally, DDQN with adaptive reward shifting achieves performance comparable to theoretically derived policies with vastly improved sample efficiency, outperforming prior DAC approaches by orders of magnitude. Our findings provide insights into the fundamental obstacles faced by standard deep-RL approaches in this challenging DAC setting and highlight the key methodological ingredients required for effective learning.
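To make the control problem concrete, the sketch below runs a textbook variant of the (1+(λ,λ))-GA on OneMax with the population size λ chosen at every iteration by a `policy` callback, which is the decision a DAC agent would learn. It assumes the standard coupling of mutation rate λ/n and crossover bias 1/λ. `theory_policy` uses the fitness-dependent choice λ = sqrt(n/(n − f(x))) known from the runtime-analysis literature; whether this is the exact baseline compared against in the paper is an assumption. No reinforcement learning is implemented here.

```python
import math
import random


def onemax(bits):
    """OneMax fitness: the number of one-bits."""
    return sum(bits)


def run_ga(n, policy, rng, max_evals=100_000):
    """Run the (1+(lambda,lambda))-GA; return (best fitness, evaluations used)."""
    x = [rng.randint(0, 1) for _ in range(n)]
    fx, evals = onemax(x), 1
    while fx < n and evals < max_evals:
        lam = max(1, min(n, round(policy(n, fx))))  # controller's decision
        p, c = lam / n, 1.0 / lam

        # Mutation phase: lam mutants, each flipping the same number of bits.
        ell = sum(rng.random() < p for _ in range(n))
        mutant, mutant_f = None, -1
        for _ in range(lam):
            y = x[:]
            for i in rng.sample(range(n), ell):
                y[i] ^= 1
            fy = onemax(y)
            evals += 1
            if fy > mutant_f:
                mutant, mutant_f = y, fy

        # Crossover phase: take each bit from the best mutant with probability c.
        child, child_f = None, -1
        for _ in range(lam):
            y = [m if rng.random() < c else b for b, m in zip(x, mutant)]
            fy = onemax(y)
            evals += 1
            if fy > child_f:
                child, child_f = y, fy

        if child_f >= fx:  # elitist replacement, ties accepted
            x, fx = child, child_f
    return fx, evals


def theory_policy(n, fitness):
    """Fitness-dependent population size sqrt(n / (n - fitness))."""
    return math.sqrt(n / (n - fitness))


best, used = run_ga(50, theory_policy, random.Random(0))
print(best, used)
```

Swapping `theory_policy` for a learned mapping from fitness to λ turns this loop into the environment that a DDQN or PPO agent would interact with, with the number of evaluations as the cost to minimise.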

2512.20985 2026-06-17 cs.AI cs.MA

A Blockchain-Monitored Agentic AI Architecture for Trusted Perception-Reasoning-Action Pipelines

Salman Jan, Hassan Ali Razzaqi, Ali Akarma, Mohammad Riyaz Belgaum

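Affiliations: Faculty of Computer Studies, Arab Open University-Bahrain; Faculty of Computer and Information System, Islamic University of Madinah, Saudi Arabia

AI summary: Proposes an agentic AI architecture combined with a blockchain to ensure trust and traceability in autonomous decision pipelines. The blockchain provides continuous monitoring and auditing of actions, validating inputs and recording execution outcomes.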

Comments This paper was presented at the IEEE International Conference on Computing and Applications (ICCA 2025), Bahrain

Journal ref Proceedings of the 2025 IEEE International Conference on Computing and Applications (ICCA), Bahrain, 2025, pp. 1-7

Abstract

The application of agentic AI systems in autonomous decision-making is growing in healthcare, smart cities, digital forensics, and supply chain management. Although these systems are flexible and offer real-time reasoning, they also raise concerns about trust, oversight, and the integrity of the information and activities on which they are founded. The paper proposes a single architecture model that combines a LangChain-based multi-agent system with a permissioned blockchain to guarantee constant monitoring, policy enforcement, and immutable auditability of agentic action. The framework ties the perception-conceptualization-action cycle to a blockchain governance layer that verifies the inputs, evaluates recommended actions, and documents the outcomes of the execution. A system built from Hyperledger Fabric, MCP-integrated action executors, and a LangChain agent is introduced, and experiments on smart inventory management, traffic-signal control, and healthcare monitoring are conducted. The results suggest that blockchain-based security verification is effective at preventing unauthorized practices, offers traceability throughout the whole decision-making process, and keeps operational latency within reasonable ranges. The proposed framework provides a universal system for implementing high-impact agentic AI applications that are autonomous yet responsible.
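The governance idea, checking a proposed action against policy and recording the decision in a tamper-evident log, can be sketched without any blockchain platform. The example below uses a plain SHA-256 hash chain from the Python standard library. It is not Hyperledger Fabric, LangChain, or MCP code, and the class `AuditLog`, the function `governed_execute`, and the action names are hypothetical stand-ins for the components the abstract describes.

```python
import hashlib
import json


class AuditLog:
    """Append-only log in which every entry commits to the previous one."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(prev_hash, record):
        payload = json.dumps(record, sort_keys=True)
        return hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()

    def append(self, record):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry = {"prev": prev_hash, "record": record,
                 "hash": self._digest(prev_hash, record)}
        self.entries.append(entry)
        return entry["hash"]

    def verify(self):
        """Recompute the chain; any edited record breaks it."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev_hash:
                return False
            if entry["hash"] != self._digest(prev_hash, entry["record"]):
                return False
            prev_hash = entry["hash"]
        return True


def governed_execute(log, action, allowed_actions):
    """Approve or reject an agent's action by policy and record the decision."""
    approved = action["name"] in allowed_actions
    log.append({"action": action, "approved": approved})
    return approved


log = AuditLog()
policy = {"reorder_stock", "adjust_signal_timing"}  # hypothetical allow-list
print(governed_execute(log, {"name": "reorder_stock", "qty": 5}, policy))
print(governed_execute(log, {"name": "delete_records"}, policy))
print(log.verify())
```

A permissioned blockchain adds what this single-process sketch lacks: replication across organisations and consensus on the order of entries, so that no single party can rewrite the log.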

2310.06328 2026-06-17 cs.LG eess.SP

ARC-Fi: Exploiting Antenna Spatial Diversity for Label-Efficient Domain Generalization in Wi-Fi Sensing

Ke Xu, Zhiyong Zheng, Hongyuan Zhu, Lei Wang, Jiangtao Wang

Affiliations: Suzhou Institute for Advanced Research, University of Science and Technology of China; Suzhou Big Data and AI Research and Engineering Center; School of Artificial Intelligence and Data Science, University of Science and Technology of China; Institute for Infocomm Research (I²R), A*STAR; School of Computer Science and Technology, Soochow University

AI summary: ARC-Fi introduces a physics-informed data augmentation strategy to address domain shift in Wi-Fi sensing and achieve label-efficient domain generalization.

Comments This work has been submitted to the IEEE for possible publication

Abstract

Wi-Fi sensing systems are severely hindered by domain shifts when deployed in unseen real-world environments. While existing methods attempt to tackle this through Unsupervised Domain Adaptation (UDA) or Domain Generalization (DG), they critically rely on either inaccessible target data or prohibitively expensive, massive labeled source datasets. In practice, collecting abundant unlabeled Channel State Information (CSI) is feasible, whereas manual labeling is severely constrained. This realistic dilemma necessitates Semi-Supervised Domain Generalization (SSDG). To this end, we propose ARC-Fi, the first dedicated SSDG framework for Wi-Fi sensing. Directly applying conventional contrastive learning to CSI data inevitably triggers paradigm-specific "shortcut learning," causing models to memorize environmental backgrounds rather than gesture dynamics. To overcome this, ARC-Fi introduces a physics-informed data augmentation strategy: the Antenna Response Consistency (ARC) module. ARC exploits the intrinsic spatial diversity of multi-antenna systems, treating signals from co-located antennas as naturally semantics-preserving augmented views to explicitly block environmental shortcuts. Furthermore, we introduce a unified Semi-Supervised Contrastive Objective that leverages scarce labels and reliable pseudo-labels to align cross-domain features, effectively preventing the blind repulsion of same-class instances. Extensive experiments on the Widar and CSIDA datasets demonstrate that ARC-Fi establishes a new state-of-the-art, significantly outperforming existing UDA, DG, and SSDG methods. Ultimately, this work provides a physics-grounded, label-efficient solution, advancing the scalable deployment of robust real-world Wi-Fi sensing systems. Code is available at: https://github.com/KaoruMiyazono/UniCrossFi.

2505.12620 2026-06-17 cs.CV

BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation

Haiquan Wen, Yiwei He, Zhenglin Huang, Tianxiao Li, Zihan Yu, Xingru Huang, Lu Qi, Baoyuan Wu, Xiangtai Li, Guangliang Cheng

Affiliations: University of Liverpool, UK; Nanyang Technological University, SG; The Chinese University of Hong Kong, Shenzhen, Guangdong, China; Wuhan University; Hangzhou Dianzi University

AI summary: Proposes BusterX, a video forgery detection system based on multimodal large language models, together with an improved dataset and evaluation benchmark, raising detection accuracy and explanation quality.

Abstract

As generative video models become increasingly realistic, detecting AI-generated videos requires systems that offer both accuracy and interpretability. However, applying Multimodal Large Language Models (MLLMs) to video forensics is currently limited by outdated datasets, simplistic evaluation protocols, and a reliance on black-box classification. To address these issues, we introduce a comprehensive dataset, benchmark, and baseline model for video forgery detection. First, we present GenBuster-200K, a fair dataset of over 200,000 high-quality videos sourced from state-of-the-art generators, featuring diverse real-world scenarios. Second, we propose GenBuster-Bench, a diagnostic benchmark spanning three progressive tracks (In-Domain, Out-of-Domain, and In-the-Wild) to evaluate models across domain shifts and generational shifts. It also introduces an MLLM-as-a-Judge protocol to assess the quality of the generated forensic explanations. Finally, we develop BusterX, an MLLM baseline with RL training. Instead of direct binary classification, BusterX formulates detection as a visual reasoning task, where the generated reasoning chain serves as the detector itself. Experimental results demonstrate that BusterX outperforms several leading MLLMs (e.g., Qwen3.5, Claude-Sonnet-4.6) in both detection accuracy and rationale quality.

2508.04492 2026-06-17 cs.CV cs.AI

Learning Robust Intervention Representations with Delta Embeddings

Panagiotis Alimisis, Christos Diou

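Affiliations: Department of Informatics and Telematics

AI summary: Shows that representing actionable counterfactuals in the latent space improves model robustness, and proposes Causal Delta Embeddings, a method that learns causal representations without additional supervision and performs strongly on synthetic and real-world benchmarks.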

Comments ICLR 2026, Poster

Journal ref International Conference on Learning Representations (ICLR), 2026

Abstract

Causal representation learning has attracted significant research interest during the past few years as a means of improving model generalization and robustness. Causal representations of interventional image pairs (also called "actionable counterfactuals" in the literature) have the property that only variables corresponding to scene elements affected by the intervention or action change between the start state and the end state. While most work in this area has focused on identifying and representing the variables of the scene under a causal model, fewer efforts have focused on representations of the interventions themselves. In this work, we show that an effective strategy for improving out-of-distribution (OOD) robustness is to focus on the representation of actionable counterfactuals in the latent space. Specifically, we propose that an intervention can be represented by a Causal Delta Embedding that is invariant to the visual scene and sparse in terms of the causal variables it affects. Leveraging this insight, we propose a method for learning causal representations from image pairs, without any additional supervision. Experiments in the Causal Triplet challenge demonstrate that Causal Delta Embeddings are highly effective in OOD settings, significantly exceeding baseline performance in both synthetic and real-world benchmarks.

2602.13318 2026-06-17 cs.AI cs.CV cs.LG

DECKBench: Benchmarking Multi-Agent Frameworks for Academic Slide Generation and Editing

Daesik Jang, Morgan Lindsay Heisler, Linzi Xing, Yifei Li, Edward Wang, Ying Xiong, Yong Zhang, Zhenan Fan

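Affiliations: Huawei Technologies Canada; University of British Columbia

AI summary: Proposes DECKBench, a framework for evaluating multi-agent generation and editing of academic slide decks. Using a curated dataset and simulated editing instructions, it systematically assesses slide-level and deck-level fidelity, coherence, layout quality, and multi-turn instruction following.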

Abstract

Automatically generating and iteratively editing academic slide decks requires more than document summarization. It demands faithful content selection, coherent slide organization, layout-aware rendering, and robust multi-turn instruction following. However, existing benchmarks and evaluation protocols do not adequately measure these challenges. To address this gap, we introduce the Deck Edits and Compliance Kit Benchmark (DECKBench), an evaluation framework for multi-agent slide generation and editing. DECKBench is built on a curated dataset of paper-to-slide pairs augmented with realistic, simulated editing instructions. Our evaluation protocol systematically assesses slide-level and deck-level fidelity, coherence, layout quality, and multi-turn instruction following. We further implement a modular multi-agent baseline system that decomposes the slide generation and editing task into paper parsing and summarization, slide planning, HTML creation, and iterative editing. Experimental results demonstrate that the proposed benchmark highlights strengths, exposes failure modes, and provides actionable insights for improving multi-agent slide generation and editing systems. Overall, this work establishes a standardized foundation for reproducible and comparable evaluation of academic presentation generation and editing. Code and data are publicly available at https://github.com/morgan-heisler/DeckBench.

2601.17053 2026-06-17 cs.CV

Synthetic Data Guided Feature Selection for Robust Activity Recognition in Older Adults

Shuhao Que, Dieuwke van Dartel, Ilse Heeringa, Han Hegeman, Miriam Vollenbroek-Hutten, Ying Wang

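Affiliations: University of Twente; Ziekenhuis Groep Twente; Medisch Spectrum Twente

AI summary: Develops a robust human activity recognition system that uses synthetic data to make continuous activity recognition during hip fracture rehabilitation in older adults more reliable, with notable gains on postural transfers, an activity class of high clinical relevance.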

Comments This paper has been submitted to Nordic Conference on Digital Health and Wireless Solutions 2026, currently under review

Abstract

Physical activity during hip fracture rehabilitation is essential for mitigating long-term functional decline in geriatric patients. However, it is rarely quantified in clinical practice. Existing continuous monitoring systems with commercially available wearable activity trackers are typically developed in middle-aged adults and therefore perform unreliably in older adults with slower and more variable gait patterns. This study aimed to develop a robust human activity recognition (HAR) system to improve continuous physical activity recognition in the context of hip fracture rehabilitation. Twenty-four healthy older adults aged over 80 years were included and performed activities of daily living (walking, standing, sitting, lying down, and postural transfers) under simulated free-living conditions for 75 minutes while wearing two accelerometers positioned on the lower back and anterior upper thigh. Model robustness was evaluated using leave-one-subject-out cross-validation. The synthetic data demonstrated potential to improve generalization across participants. The resulting feature intervention model (FIM), aided by synthetic data guidance, achieved reliable activity recognition with mean F1-scores of 0.896 for walking, 0.927 for standing, 0.997 for sitting, 0.937 for lying down, and 0.816 for postural transfers. Compared with a control condition model without synthetic data, the FIM significantly improved the detection of postural transfers, an activity class of high clinical relevance that is often overlooked in existing HAR literature. In conclusion, these preliminary results demonstrate the feasibility of robust activity recognition in older adults. Further validation in hip fracture patient populations is required to assess the clinical utility of the proposed monitoring system.
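Leave-one-subject-out cross-validation, the evaluation scheme named above, holds out all windows from one participant per fold so that the model is always tested on a person it has never seen. The sketch below shows the fold construction and a per-class F1-score using only the standard library. The subject identifiers and the labels are invented for illustration, and no classifier or synthetic data generation is included.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


def f1_score(y_true, y_pred, label):
    """F1-score of one class: 2*TP / (2*TP + FP + FN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical data: one subject id per sensor window.
subjects = ["s1", "s1", "s2", "s3", "s3"]
folds = list(leave_one_subject_out(subjects))
print(folds[0])

# Hypothetical predictions for one held-out subject.
y_true = ["walk", "walk", "sit", "transfer"]
y_pred = ["walk", "sit", "sit", "transfer"]
print(f1_score(y_true, y_pred, "walk"))
```

Splitting by subject rather than by window matters here because windows from the same person are highly correlated, and a random split would overstate how well the model generalises to new patients.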

2509.11154 2026-06-17 cs.LG cs.AI

Feature Space Topology Control via Hopkins Loss

通过霍普金斯损失控制特征空间拓扑

Einari Vaaras, Manu Airaksinen

发表机构 * Signal Processing Research Centre, Tampere University(坦佩雷大学信号处理研究中心) BABA Center, Department of Physiology, University of Helsinki(赫尔辛基大学生理学系BABA中心)

AI总结 本文提出霍普金斯损失,用于控制特征空间拓扑,通过非线性瓶颈自编码器在语音、文本和图像数据中验证其在分类和降维中的有效性。

Comments Accepted for publication in Proc. IEEE ICTAI 2025, Athens, Greece

详情
AI中文摘要

特征空间拓扑指的是特征空间中样本的组织方式。修改此拓扑在机器学习应用中有益,包括降维、生成建模、迁移学习和对抗攻击的鲁棒性。本文引入了霍普金斯损失,利用霍普金斯统计量来强制实现期望的特征空间拓扑,与现有拓扑相关方法旨在保留输入特征拓扑不同。我们在语音、文本和图像数据的两个场景中评估了霍普金斯损失的有效性:分类和使用非线性瓶颈自编码器的降维。实验表明,将霍普金斯损失整合到分类或降维中对分类性能影响很小,但能提供修改特征拓扑的好处。

英文摘要

Feature space topology refers to the organization of samples within the feature space. Modifying this topology can be beneficial in machine learning applications, including dimensionality reduction, generative modeling, transfer learning, and robustness to adversarial attacks. This paper introduces a novel loss function, Hopkins loss, which leverages the Hopkins statistic to enforce a desired feature space topology, which is in contrast to existing topology-related methods that aim to preserve input feature topology. We evaluate the effectiveness of Hopkins loss on speech, text, and image data in two scenarios: classification and dimensionality reduction using nonlinear bottleneck autoencoders. Our experiments show that integrating Hopkins loss into classification or dimensionality reduction has only a small impact on classification performance while providing the benefit of modifying feature topology.
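For readers unfamiliar with the Hopkins statistic that the loss builds on, here is a minimal sketch of the classical (non-differentiable) statistic. It assumes one common convention, H = ΣU^d / (ΣU^d + ΣW^d) with d the feature dimension; the abstract does not state how the paper turns this into a trainable loss, so nothing here should be read as the authors' formulation. All function names are hypothetical.

```python
import math
import random


def nearest_distance(point, others):
    """Euclidean distance from `point` to its nearest neighbour in `others`."""
    return min(math.dist(point, o) for o in others)


def hopkins_statistic(data, m, seed=0):
    """Classical Hopkins statistic on a list of equal-length feature vectors.

    Values near 0.5 suggest spatially random data, values near 1 suggest
    clustered data, and values near 0 suggest regularly spaced data.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    lows = [min(p[k] for p in data) for k in range(dim)]
    highs = [max(p[k] for p in data) for k in range(dim)]

    # W: nearest-neighbour distances of m sampled real points (self excluded).
    sample_idx = rng.sample(range(len(data)), m)
    w = [nearest_distance(data[i], data[:i] + data[i + 1:]) for i in sample_idx]

    # U: nearest-neighbour distances of m uniform probes in the bounding box.
    u = []
    for _ in range(m):
        probe = [rng.uniform(lo, hi) for lo, hi in zip(lows, highs)]
        u.append(nearest_distance(probe, data))

    u_sum = sum(d ** dim for d in u)
    w_sum = sum(d ** dim for d in w)
    return u_sum / (u_sum + w_sum)


# Two tight, well-separated blobs: a strongly clustered toy feature space.
rng = random.Random(1)
clustered = [
    [c + rng.uniform(-0.01, 0.01), c + rng.uniform(-0.01, 0.01)]
    for c in (0.0, 10.0)
    for _ in range(20)
]
h = hopkins_statistic(clustered, m=10)
print(h)
```

Enforcing a "desired topology" would then amount to steering this value toward a chosen target, though the exact mechanism is the paper's and is not reproduced here.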

2601.12641 2026-06-17 cs.AI

STEP-LLM: Generating CAD STEP Models from Natural Language with Large Language Models

STEP-LLM: 利用大型语言模型从自然语言生成CAD STEP模型

Xiangyu Shi, Junyang Ding, Xu Zhao, Sinong Zhan, Payal Mohapatra, Daniel Quispe, Kojo Welbeck, Jian Cao, Wei Chen, Ping Guo, Qi Zhu

发表机构 * Northwestern University(西北大学)

AI总结 本文提出STEP-LLM,通过大型语言模型将自然语言转化为CAD STEP模型,采用图结构预处理和强化学习提升几何精度,验证了LLM驱动的STEP模型生成可行性。

Comments Accepted to the Design, Automation & Test in Europe Conference (DATE) 2026

详情
AI中文摘要

计算机辅助设计(CAD)对现代制造至关重要,但模型创建仍然劳动密集且依赖专业知识。为使非专家能够将直观的设计意图转化为可制造的产物,近期基于大型语言模型的文本到CAD研究聚焦于命令序列或CadQuery等脚本格式。然而,这些格式依赖特定几何内核,在制造领域缺乏通用性。相比之下,产品数据交换标准(STEP,ISO 10303)文件是一种被广泛采用的中性边界表示(B-rep)格式,可直接用于制造,但其图结构、交叉引用的特性给自回归LLM带来了独特挑战。为此,我们整理了约40,000个STEP-描述对的数据集,并针对STEP的图结构格式引入了新的预处理方法,包括:基于深度优先搜索(DFS)的重序列化,在线性化交叉引用的同时保持局部性;以及思维链(CoT)式结构注释,用于引导全局一致性。我们整合了检索增强生成(RAG),使监督微调中的预测以相关示例为依据,并通过采用基于Chamfer距离的几何奖励的强化学习(RL)来提升生成质量。实验表明,STEP-LLM在几何保真度上持续优于Text2CAD基线,改进来自框架的多个阶段:RAG模块显著提升了完整性和可渲染性,基于DFS的重序列化提高了整体准确性,RL进一步减小了几何偏差。指标和可视化比较均证实,STEP-LLM生成的形状比Text2CAD具有更高的保真度。这些结果表明由LLM驱动、从自然语言生成STEP模型是可行的,并显示出其推动面向制造的CAD设计大众化的潜力。

英文摘要

Computer-aided design (CAD) is vital to modern manufacturing, yet model creation remains labor-intensive and expertise-heavy. To enable non-experts to translate intuitive design intent into manufacturable artifacts, recent large language models-based text-to-CAD efforts focus on command sequences or script-based formats like CadQuery. However, these formats are kernel-dependent and lack universality for manufacturing. In contrast, the Standard for the Exchange of Product Data (STEP, ISO 10303) file is a widely adopted, neutral boundary representation (B-rep) format directly compatible with manufacturing, but its graph-structured, cross-referenced nature poses unique challenges for auto-regressive LLMs. To address this, we curate a dataset of ~40K STEP-caption pairs and introduce novel preprocessing tailored for the graph-structured format of STEP, including a depth-first search-based reserialization that linearizes cross-references while preserving locality and chain-of-thought(CoT)-style structural annotations that guide global coherence. We integrate retrieval-augmented generation to ground predictions in relevant examples for supervised fine-tuning, and refine generation quality through reinforcement learning with a specific Chamfer Distance-based geometric reward. Experiments demonstrate consistent gains of our STEP-LLM in geometric fidelity over the Text2CAD baseline, with improvements arising from multiple stages of our framework: the RAG module substantially enhances completeness and renderability, the DFS-based reserialization strengthens overall accuracy, and the RL further reduces geometric discrepancy. Both metrics and visual comparisons confirm that STEP-LLM generates shapes with higher fidelity than Text2CAD. These results show the feasibility of LLM-driven STEP model generation from natural language, showing its potential to democratize CAD design for manufacturing.
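The Chamfer Distance that the geometric reward is based on can be illustrated with a small sketch. It assumes the symmetric variant that sums the two directed mean nearest-neighbour Euclidean distances; other conventions (squared distances, averaging the two directions) exist, and the abstract does not say which one the paper uses or how the reward is shaped from it.

```python
import math


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets.

    Each direction averages, over one set, the distance from every point
    to its nearest neighbour in the other set; the two means are summed.
    """
    a_to_b = sum(min(math.dist(p, q) for q in b) for p in a) / len(a)
    b_to_a = sum(min(math.dist(q, p) for p in a) for q in b) / len(b)
    return a_to_b + b_to_a


# Toy point sets standing in for samples from a generated and a reference shape.
generated = [(0.0, 0.0), (1.0, 0.0)]
reference = [(0.0, 0.0), (3.0, 0.0)]

same = chamfer_distance(generated, generated)
diff = chamfer_distance(generated, reference)
print(same, diff)
```

A lower value means the sampled surfaces lie closer together, which is why a reward derived from it can push generation toward smaller geometric discrepancy.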

2509.03932 2026-06-17 cs.CL cs.CY cs.LG

KPoEM: A Human-Annotated Dataset for Emotion Classification and RAG-Based Poetry Generation in Korean Modern Poetry

KPoEM:用于韩国现代诗歌情感分类与基于RAG的诗歌生成的人工标注数据集

Iro Lim, Haein Ji, Byungjun Kim

发表机构 * The Academy of Korean Studies(韩国学中央研究院) Graduate School of Korean Studies(韩国学大学院) Cultural Informatics(文化信息学)

AI总结 本研究构建了KPoEM多标签情感数据集,通过序列微调策略实现F1-micro 0.60的情感分类,并验证了基于RAG的诗歌生成在韩国文学情感与文化表达上的可行性。

Comments 43 pages, 22 tables, 3 figures, Digital Humanities and Social Sciences Korea Conference, James Joo-Jin Kim Center for Korean Studies, University of Pennsylvania, Philadelphia, USA

Journal ref The Review of Korean Studies 29(1) (2026) 161-206

详情
AI中文摘要

本研究介绍了KPoEM(Korean Poetry Emotion Mapping,韩国诗歌情感映射),这是一个新的数据集,为现代韩国诗歌中以情感为中心的分析和生成式应用奠定了基础。尽管自然语言处理取得了进展,但由于诗歌具有复杂的比喻语言和文化特异性,相关研究仍不充分。我们构建了一个包含7,662条条目(7,007条行级和615条作品级)的多标签数据集,文本取自五位有影响力的韩国诗人,并以44个细粒度情感类别进行标注。KPoEM情感分类模型采用序列式微调策略(先在通用语料库上微调,再在专门的KPoEM数据集上微调),取得了0.60的F1-micro分数,显著优于先前的模型(0.43)。该模型在保留核心诗歌情感的同时,识别具有时代和文化特异性的情感表达的能力有所增强。此外,将结构化情感数据集应用于基于RAG的诗歌生成模型,证明了生成能够反映韩国文学情感与文化感受力的文本在实证上是可行的。这种综合方法加强了计算技术与文学分析之间的联系,为定量情感研究和生成诗学开辟了新途径。总体而言,本研究为推进现代韩国诗歌中以情感为中心的分析和创作提供了基础。

英文摘要

This study introduces KPoEM (Korean Poetry Emotion Mapping), a novel dataset that serves as a foundation for both emotion-centered analysis and generative applications in modern Korean poetry. Despite advancements in NLP, poetry remains underexplored due to its complex figurative language and cultural specificity. We constructed a multi-label dataset of 7,662 entries (7,007 line-level and 615 work-level), annotated with 44 fine-grained emotion categories from five influential Korean poets. The KPoEM emotion classification model, fine-tuned through a sequential strategy -- moving from general-purpose corpora to the specialized KPoEM dataset -- achieved an F1-micro score of 0.60, significantly outperforming previous models (0.43). The model demonstrates an enhanced ability to identify temporally and culturally specific emotional expressions while preserving core poetic sentiments. Furthermore, applying the structured emotion dataset to a RAG-based poetry generation model demonstrates the empirical feasibility of generating texts that reflect the emotional and cultural sensibilities of Korean literature. This integrated approach strengthens the connection between computational techniques and literary analysis, opening new pathways for quantitative emotion research and generative poetics. Overall, this study provides a foundation for advancing emotion-centered analysis and creation in modern Korean poetry.
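The F1-micro score reported above pools true positives, false positives, and false negatives over all labels before computing F1. A minimal sketch for the multi-label case follows; the emotion labels and predictions are invented for illustration and are not taken from KPoEM.

```python
def micro_f1(true_labels, pred_labels):
    """Micro-averaged F1 for multi-label data given as lists of label sets."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, pred_labels):
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # labels missed
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


# Three hypothetical poem lines with gold and predicted emotion sets.
gold = [{"sadness", "longing"}, {"joy"}, {"anger"}]
pred = [{"sadness"}, {"joy", "hope"}, {"fear"}]
score = micro_f1(gold, pred)
print(score)
```

Because counts are pooled, frequent emotion categories weigh more heavily in this score than rare ones.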

2511.06500 2026-06-17 cs.RO

Cross-Platform Learnable Fuzzy Gain-Scheduled Proportional-Integral-Derivative Controller Tuning via Physics-Constrained Meta-Learning and Reinforcement Learning Adaptation

跨平台可学习模糊增益调度比例-积分-微分控制器调优:通过物理约束元学习和强化学习适应

JiaHao Wu, ShengWen Yu

发表机构 * The University of Hong Kong(香港大学) Guangzhou College of Commerce(广州商学院)

AI总结 本文提出一种分层框架,用于跨平台调优可学习模糊增益调度PID控制器。通过物理约束虚拟机器人合成,结合元学习和轻量强化学习,实现跨平台初始化和部署特定优化,提升跟踪性能和鲁棒性。

Comments 24 pages,15 tables, 6 figures

详情
AI中文摘要

动机和差距:PID类控制器因其简单和可解释性,仍是许多机器人系统的务实选择,但整定稳定且高性能的增益十分耗时,而且通常无法在不同的机器人形态、负载和部署条件之间迁移。模糊增益调度可以提供可解释的在线调整,但其各关节缩放系数和后件参数依赖于平台,难以系统地整定。提出的方法:我们提出一种分层框架,用于跨平台整定可学习的模糊增益调度PID(LF-PID)。控制器使用共享的模糊隶属度划分以保持共同的误差语义,同时学习各关节缩放系数和Takagi-Sugeno后件参数,以在线调度PID增益。结合物理约束的虚拟机器人合成,元学习根据机器人的物理特征提供跨平台初始化,轻量级强化学习(RL)阶段则在动力学失配下执行针对具体部署的精调。从三个基础仿真平台出发,我们通过对质量(±10%)、惯量(±15%)和摩擦(±20%)施加有界扰动,生成了232个物理上有效的训练变体。结果和见解:我们在两个不同的系统(9自由度串联机械臂和12自由度四足机器人)上、在多种干扰场景下评估了跨平台泛化能力。RL适应阶段在元初始化控制器的基础上提升了跟踪性能,在具有挑战性的高负载关节上误差最多降低80.4%(从12.36度降至2.42度),在参数不确定性下提升19.2%。我们还发现了优化上限效应:当元初始化基线存在局部缺陷时,在线精调能带来显著收益;而当基线质量已经整体良好时,改进有限。

英文摘要

Motivation and gap: PID-family controllers remain a pragmatic choice for many robotic systems due to their simplicity and interpretability, but tuning stable, high-performing gains is time-consuming and typically non-transferable across robot morphologies, payloads, and deployment conditions. Fuzzy gain scheduling can provide interpretable online adjustment, yet its per-joint scaling and consequent parameters are platform-dependent and difficult to tune systematically. Proposed approach: We propose a hierarchical framework for cross-platform tuning of a learnable fuzzy gain-scheduled PID (LF-PID). The controller uses shared fuzzy membership partitions to preserve common error semantics, while learning per-joint scaling and Takagi-Sugeno consequent parameters that schedule PID gains online. Combined with physics-constrained virtual robot synthesis, meta-learning provides cross-platform initialization from robot physical features, and a lightweight reinforcement learning (RL) stage performs deployment-specific refinement under dynamics mismatch. Starting from three base simulated platforms, we generate 232 physically valid training variants via bounded perturbations of mass (+/-10%), inertia (+/-15%), and friction (+/-20%). Results and insight: We evaluate cross-platform generalization on two distinct systems (a 9-DOF serial manipulator and a 12-DOF quadruped) under multiple disturbance scenarios. The RL adaptation stage improves tracking performance on top of the meta-initialized controller, with up to 80.4% error reduction in challenging high-load joints (12.36 degrees to 2.42 degrees) and 19.2% improvement under parameter uncertainty. We further identify an optimization ceiling effect: online refinement yields substantial gains when the meta-initialized baseline exhibits localized deficiencies, but provides limited improvement when baseline quality is already uniformly strong.
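The bounded perturbations and the reported error reduction can be made concrete with a short sketch. It assumes independent uniform sampling inside each bound and uses made-up base parameters; the paper's actual sampling scheme and its physical-validity checks are not described in the abstract and are not modelled here.

```python
import random

# Relative bounds quoted in the abstract.
BOUNDS = {"mass": 0.10, "inertia": 0.15, "friction": 0.20}


def perturb(base, rng):
    """Scale each physical parameter by a random factor within its bound."""
    return {
        name: value * (1.0 + rng.uniform(-BOUNDS[name], BOUNDS[name]))
        for name, value in base.items()
    }


def relative_reduction(before, after):
    """Percentage reduction going from `before` to `after`."""
    return (before - after) / before * 100.0


# Hypothetical base robot parameters (illustrative units).
base = {"mass": 2.0, "inertia": 0.05, "friction": 0.3}
rng = random.Random(0)
variants = [perturb(base, rng) for _ in range(232)]

# Check of the headline figure: 12.36 degrees down to 2.42 degrees.
reduction = relative_reduction(12.36, 2.42)
print(round(reduction, 1))
```

The last line reproduces the 80.4% figure from the two joint errors quoted in the abstract.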

2509.19525 2026-06-17 cs.RO

Real-Time Reinforcement Learning for Dynamic Tasks with a Parallel Soft Robot

基于并联软体机器人的动态任务实时强化学习

James Avtges, Jake Ketchum, Millicent Schlafly, Helena Young, Taekyoung Kim, Allison Pinosky, Ryan L. Truby, Todd D. Murphey

发表机构 * Department of Mechanical Engineering, Northwestern University(西北大学机械工程系) Department of Materials Science and Engineering, Northwestern University(西北大学材料科学与工程系)

AI总结 本文提出基于课程学习的实时强化学习方法,用于在单次部署中实现软机器人的动态平衡,通过并行软执行器和HSA结构实现高可靠性控制。

Comments Published at IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 2025

详情
AI中文摘要

闭环控制仍然是软体机器人领域的开放挑战。软执行器在动态负载条件下的非线性响应限制了解析模型在软体机器人控制中的应用。传统的软体机器人控制方法为了避免非线性、迟滞、大变形以及执行器损坏的风险,没有充分利用其构型空间。此外,强化学习(RL)等基于回合的数据驱动控制方法历来受限于样本效率以及不同初始化之间结果不一致的问题。在本工作中,我们展示了RL能够在实时、单次硬件部署中可靠地学习动态平衡任务的控制策略。我们使用一个可变形的Stewart平台,该平台由并联的3D打印软执行器构成,这些执行器基于电机驱动的handed shearing auxetic(HSA)结构。通过引入一种基于逐步扩展已知平衡点邻域的课程学习方法,我们在任意坐标处实现了可靠的单次部署平衡。除了对基于模型和无模型方法的性能进行基准测试外,我们还证明,在半数执行器因被诱发屈曲或被断线钳剪断而实际失效之后,最大扩散RL(Maximum Diffusion RL)仍能在单次部署中学会动态平衡。训练无需任何先验数据,最快15分钟即可完成,性能与完好平台几乎相同。在硬件上进行单次学习有助于软体机器人系统在现实世界中可靠地学习,并将催生更多样化、能力更强的软体机器人。

英文摘要

Closed-loop control remains an open challenge in soft robotics. The nonlinear responses of soft actuators under dynamic loading conditions limit the use of analytic models for soft robot control. Traditional methods of controlling soft robots underutilize their configuration spaces to avoid nonlinearity, hysteresis, large deformations, and the risk of actuator damage. Furthermore, episodic data-driven control approaches such as reinforcement learning (RL) are traditionally limited by sample efficiency and inconsistency across initializations. In this work, we demonstrate RL for reliably learning control policies for dynamic balancing tasks in real-time single-shot hardware deployments. We use a deformable Stewart platform constructed using parallel, 3D-printed soft actuators based on motorized handed shearing auxetic (HSA) structures. By introducing a curriculum learning approach based on expanding neighborhoods of a known equilibrium, we achieve reliable single-deployment balancing at arbitrary coordinates. In addition to benchmarking the performance of model-based and model-free methods, we demonstrate that in a single deployment, Maximum Diffusion RL is capable of learning dynamic balancing after half of the actuators are effectively disabled, by inducing buckling and by breaking actuators with bolt cutters. Training occurs with no prior data, in as fast as 15 minutes, with performance nearly identical to the fully-intact platform. Single-shot learning on hardware facilitates soft robotic systems reliably learning in the real world and will enable more diverse and capable soft robots.

2501.16370 2026-06-17 cs.LG cs.AI cs.NA cs.NE math.NA

Advanced Physics-Informed Neural Network with Residuals for Solving Complex Integral Equations

用于求解复杂积分方程的带残差的先进物理信息神经网络

Mahdi Movahedian Moghaddam, Kourosh Parand, Saeed Reza Kheradpisheh

发表机构 * Department of Computer and Data Sciences, Shahid Beheshti University(沙希德·贝赫什提大学计算机与数据科学系) Department of Cognitive Modeling, Shahid Beheshti University(沙希德·贝赫什提大学认知建模系)

AI总结 本文提出残差积分求解网络(RISN),通过高精度数值方法与残差连接提升求解积分和积分微分方程的精度与稳定性,实验表明其在多种方程类型上均优于传统PINN及其变体。

Journal ref Anal. Numer. Solut. Nonlinear Equ. 11 (2026), no. 1, 153-173

详情
AI中文摘要

本文提出残差积分求解网络(RISN),一种新型神经网络架构,用于求解广泛类别的积分方程和积分微分方程,包括一维、多维、常积分微分和偏积分微分方程、方程组、分数阶类型,以及含振荡核的亥姆霍兹型积分方程。RISN将残差连接与高斯求积、分数阶导数运算矩阵等高精度数值方法相结合,使其比传统物理信息神经网络(PINN)具有更高的精度和稳定性。残差连接有助于缓解梯度消失问题,使RISN能够处理更深的网络和更复杂的核,在多维问题中尤其如此。通过大量实验,我们证明RISN不仅始终优于经典PINN,也优于辅助PINN(A-PINN)和自适应PINN(SA-PINN)等高级变体,在各类方程上均取得显著更低的平均绝对误差(MAE)。这些结果凸显了RISN在求解具有挑战性的积分和积分微分问题时的鲁棒性和效率,使其成为传统方法往往难以应对的实际应用中的有力工具。

英文摘要

In this paper, we present the Residual Integral Solver Network (RISN), a novel neural network architecture designed to solve a wide range of integral and integro-differential equations, including one-dimensional, multi-dimensional, ordinary and partial integro-differential, systems, fractional types, and Helmholtz-type integral equations involving oscillatory kernels. RISN integrates residual connections with high-accuracy numerical methods such as Gaussian quadrature and fractional derivative operational matrices, enabling it to achieve higher accuracy and stability than traditional Physics-Informed Neural Networks (PINN). The residual connections help mitigate vanishing gradient issues, allowing RISN to handle deeper networks and more complex kernels, particularly in multi-dimensional problems. Through extensive experiments, we demonstrate that RISN consistently outperforms not only classical PINNs but also advanced variants such as Auxiliary PINN (A-PINN) and Self-Adaptive PINN (SA-PINN), achieving significantly lower Mean Absolute Errors (MAE) across various types of equations. These results highlight RISN's robustness and efficiency in solving challenging integral and integro-differential problems, making it a valuable tool for real-world applications where traditional methods often struggle.
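To show how Gaussian quadrature can enter the residual of an integral equation, here is a minimal sketch on a toy Fredholm equation of the second kind, u(x) = x + ∫₀¹ x·t·u(t) dt, whose exact solution is u(x) = 1.5x. The equation, the 3-point Gauss-Legendre rule, and the function names are chosen for illustration; a candidate function stands in for a network, and this is not the RISN architecture.

```python
import math

# 3-point Gauss-Legendre rule on [-1, 1]; exact for polynomials up to degree 5.
NODES = [-math.sqrt(3 / 5), 0.0, math.sqrt(3 / 5)]
WEIGHTS = [5 / 9, 8 / 9, 5 / 9]


def integrate(g, a, b):
    """Approximate the integral of g over [a, b] with the 3-point rule."""
    half, mid = (b - a) / 2, (a + b) / 2
    return half * sum(w * g(mid + half * t) for t, w in zip(NODES, WEIGHTS))


def residual(u, x):
    """Residual of u(x) = x + integral_0^1 x*t*u(t) dt at the point x."""
    return u(x) - x - integrate(lambda t: x * t * u(t), 0.0, 1.0)


exact = lambda x: 1.5 * x   # analytic solution of the toy equation
wrong = lambda x: x         # a deliberately poor candidate

r_exact = residual(exact, 0.7)
r_wrong = residual(wrong, 0.7)
print(r_exact, r_wrong)
```

In a physics-informed setting, `u` would be the network output and the squared residual at sampled points would be minimised as the training loss.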

2509.13196 2026-06-17 cs.CL

The Few-shot Dilemma: Over-prompting Large Language Models

少样本困境:过度提示大型语言模型

Yongjian Tang, Doruk Tuncel, Christian Koerner, Thomas Runkler

发表机构 * Siemens AG(西门子股份公司) Technical University of Munich(慕尼黑工业大学)

AI总结 本文提出一个提示框架,使用随机采样、语义嵌入和TF-IDF三种少样本选择方法,在多个LLM上实验发现过多领域特定示例会降低性能,并通过TF-IDF与分层采样结合找到最优示例数量,在软件需求分类上超越现有方法1%。

Comments accepted for the main track of FLLM

详情
AI中文摘要

过度提示是一种现象,即提示中过多的示例导致大型语言模型(LLMs)性能下降,挑战了关于上下文少样本学习的传统观点。为了研究这种少样本困境,我们概述了一个提示框架,该框架利用三种标准的少样本选择方法——随机采样、语义嵌入和TF-IDF向量——并在多个LLM上评估这些方法,包括GPT-4o、GPT-3.5-turbo、DeepSeek-V3、Gemma-3、LLaMA-3.1、LLaMA-3.2和Mistral。我们的实验结果表明,在提示中加入过多的领域特定示例可能会在某些LLM中反常地降低性能,这与先前认为更多相关少样本示例普遍有利于LLM的实证结论相矛盾。鉴于LLM辅助软件工程和需求分析的趋势,我们在两个真实世界的软件需求分类数据集上进行了实验。通过逐步增加TF-IDF选择和分层的少样本示例数量,我们为每个LLM确定了其最优数量。这种组合方法以更少的示例实现了更优的性能,避免了过度提示问题,从而在功能性和非功能性需求分类上超越了现有技术1%。

英文摘要

Over-prompting, a phenomenon where excessive examples in prompts lead to diminished performance in Large Language Models (LLMs), challenges the conventional wisdom about in-context few-shot learning. To investigate this few-shot dilemma, we outline a prompting framework that leverages three standard few-shot selection methods - random sampling, semantic embedding, and TF-IDF vectors - and evaluate these methods across multiple LLMs, including GPT-4o, GPT-3.5-turbo, DeepSeek-V3, Gemma-3, LLaMA-3.1, LLaMA-3.2, and Mistral. Our experimental results reveal that incorporating excessive domain-specific examples into prompts can paradoxically degrade performance in certain LLMs, which contradicts the prior empirical conclusion that more relevant few-shot examples universally benefit LLMs. Given the trend of LLM-assisted software engineering and requirement analysis, we experiment with two real-world software requirement classification datasets. By gradually increasing the number of TF-IDF-selected and stratified few-shot examples, we identify their optimal quantity for each LLM. This combined approach achieves superior performance with fewer examples, avoiding the over-prompting problem, thus surpassing the state-of-the-art by 1% in classifying functional and non-functional requirements.
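The combination of TF-IDF selection and stratification described above can be sketched as follows. This is an assumed reading: examples are ranked by TF-IDF cosine similarity to the query and the top few are taken per class. The weighting formula, the whitespace tokenizer, and the example requirements are all illustrative rather than the authors' setup.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts) with smoothed IDF, whitespace tokens."""
    tokenized = [d.lower().split() for d in docs]
    df = Counter(term for toks in tokenized for term in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({
            t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine(a, b):
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_examples(pool, query, per_label):
    """Pick the `per_label` most query-similar examples from each class."""
    vectors = tfidf_vectors([text for text, _ in pool] + [query])
    query_vec = vectors[-1]
    scored = [(cosine(vec, query_vec), text, label)
              for vec, (text, label) in zip(vectors, pool)]
    chosen = []
    for label in sorted({label for _, label in pool}):
        ranked = sorted((s for s in scored if s[2] == label),
                        key=lambda s: s[0], reverse=True)
        chosen.extend((text, lab) for _, text, lab in ranked[:per_label])
    return chosen


# Hypothetical labelled requirements and a new requirement to classify.
pool = [
    ("the system shall export reports as pdf", "functional"),
    ("the user shall reset a password by email", "functional"),
    ("the system shall respond within two seconds", "non-functional"),
    ("the interface shall be available in german", "non-functional"),
]
shots = select_examples(pool, "the system shall export invoices as pdf", per_label=1)
print(shots)
```

Sweeping `per_label` upward and tracking accuracy per model is one way to locate the point where extra examples stop helping.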

2509.10089 2026-06-17 cs.LG

KAN-SR: A Kolmogorov-Arnold Network Guided Symbolic Regression Framework

KAN-SR:基于Kolmogorov-Arnold网络的符号回归框架

Marco Andrea Bühler, Gonzalo Guillén-Gosálbez

发表机构 * ETH Zürich(苏黎世联邦理工学院)

AI总结 本文提出基于Kolmogorov-Arnold网络的KAN-SR框架,通过深度学习技术和简化策略恢复Feynman符号回归科学发现数据集的真实方程,并结合神经控制微分方程精确建模生物过程系统。

Journal ref Computers & Chemical Engineering, Volume 213, 2026, 109721

详情
AI中文摘要

我们提出一种新的符号回归框架KAN-SR,它建立在Kolmogorov-Arnold网络(KAN)之上,并采用分而治之的方法。符号回归旨在寻找最能拟合给定数据集的数学方程,通常用遗传编程方法求解。我们表明,利用深度学习技术(具体而言是KAN),并结合平移对称性和可分离性等简化策略,能够恢复Feynman科学发现符号回归(SRSD)数据集的真实方程。此外,我们还表明,将所提出的框架与神经受控微分方程相结合,能够精确地对一个计算机模拟(in-silico)的生物过程系统的动态进行建模,为其他工程系统的动态建模打开了大门。

英文摘要

We introduce a novel symbolic regression framework, namely KAN-SR, built on Kolmogorov Arnold Networks (KANs) which follows a divide-and-conquer approach. Symbolic regression searches for mathematical equations that best fit a given dataset and is commonly solved with genetic programming approaches. We show that by using deep learning techniques, more specific KANs, and combining them with simplification strategies such as translational symmetries and separabilities, we are able to recover ground-truth equations of the Feynman Symbolic Regression for Scientific Discovery (SRSD) dataset. Additionally, we show that by combining the proposed framework with neural controlled differential equations, we are able to model the dynamics of an in-silico bioprocess system precisely, opening the door for the dynamic modeling of other engineering systems.

2506.19277 2026-06-17 cs.RO cs.SY eess.SY

Ontology Neural Network and ORTSF: A Framework for Topological Reasoning and Delay-Robust Control

本体神经网络与ORTSF:一种用于拓扑推理和延迟鲁棒控制的框架

Jaehong Oh

发表机构 * Department of Mechanical Engineering, Soongsil University, Seoul, Korea

AI总结 本文提出Ontology Neural Network和ORTSF框架,解决现有方法在关系语义表示和动态环境中协作所需认知透明度的不足,通过统一架构实现语义认知与鲁棒控制的统一。

Comments 12 pages, 5 figures, includes theoretical proofs and simulation results

详情
AI中文摘要

自主机器人系统的进步在感知、定位、建图和控制方面带来了令人瞩目的能力。然而仍存在一个根本性缺口:现有框架擅长几何推理和动态稳定性,但在表示和保持关系语义、上下文推理以及认知透明度方面存在不足,而这些对于在动态、以人为中心的环境中协作至关重要。本文提出由本体神经网络(ONN)和本体实时语义织体(ORTSF)组成的统一架构,以弥补这一缺口。ONN将关系语义推理形式化为动态拓扑过程。通过将Forman-Ricci曲率、持续同调和语义张量结构嵌入统一的损失函数,ONN确保在场景随时间演变时关系完整性和拓扑一致性得以保持。ORTSF将推理轨迹转化为可执行的控制指令,同时补偿系统延迟。它整合了预测算子和延迟感知算子,即使在显著延迟条件下也能保持相位裕度和控制信号的连续性。实证研究展示了ONN + ORTSF框架统一语义认知与鲁棒控制的能力,为认知机器人学提供了一种数学上有原则且实际可行的解决方案。

英文摘要

The advancement of autonomous robotic systems has led to impressive capabilities in perception, localization, mapping, and control. Yet, a fundamental gap remains: existing frameworks excel at geometric reasoning and dynamic stability but fall short in representing and preserving relational semantics, contextual reasoning, and cognitive transparency essential for collaboration in dynamic, human-centric environments. This paper introduces a unified architecture comprising the Ontology Neural Network (ONN) and the Ontological Real-Time Semantic Fabric (ORTSF) to address this gap. The ONN formalizes relational semantic reasoning as a dynamic topological process. By embedding Forman-Ricci curvature, persistent homology, and semantic tensor structures within a unified loss formulation, ONN ensures that relational integrity and topological coherence are preserved as scenes evolve over time. The ORTSF transforms reasoning traces into actionable control commands while compensating for system delays. It integrates predictive and delay-aware operators that ensure phase margin preservation and continuity of control signals, even under significant latency conditions. Empirical studies demonstrate the ONN + ORTSF framework's ability to unify semantic cognition and robust control, providing a mathematically principled and practically viable solution for cognitive robotics.

2506.10207 2026-06-17 cs.SD cs.DC eess.AS

FedMLAC: Mutual Learning Driven Heterogeneous Federated Audio Classification

FedMLAC:基于互学习的异构联邦音频分类

Jun Bai, Rajib Rana, Di Wu, Youyang Qu, Xiaohui Tao, Ji Zhang, Carlos Busso, Shivakumara Palaiahnakote

发表机构 * School of Computer Science, McGill University(麦吉尔大学计算机科学学院) Mila - Quebec AI Institute(Mila魁北克人工智能研究所) School of Mathematics, Physics and Computing, University of Southern Queensland(南昆士兰大学数学、物理与计算学院) Language Technologies Institute, Carnegie Mellon University(卡内基梅隆大学语言技术研究所) School of Science, Engineering and Environment, University of Salford(索尔福德大学科学、工程与环境学院)

AI总结 FedMLAC通过双向知识蒸馏解决联邦音频分类中的数据和模型异质性问题,并引入逐层剪枝聚合策略对抗数据投毒,实验表明其在分类准确率和对噪声数据的鲁棒性上优于现有方法。

Comments updated version for the first submission

Journal ref Pattern Recognition, vol. 180, Article 114250, 2026

详情
AI中文摘要

联邦学习(FL)提供了一个隐私保护的框架,可在不共享原始数据的情况下,在去中心化的客户端上训练音频分类(AC)模型。然而,联邦音频分类(FedAC)面临三大挑战:数据异质性、模型异质性和数据投毒,它们会降低实际场景中的性能。现有方法通常分别处理这些问题,统一且稳健的解决方案仍缺乏充分探索。我们提出FedMLAC,一种基于互学习的FL框架,可同时应对这三个挑战。每个客户端维护一个个性化的本地AC模型和一个轻量级、全局共享的Plug-in模型。这些模型通过双向知识蒸馏进行交互,在实现全局知识共享的同时适应本地数据分布,从而同时解决数据异质性和模型异质性问题。为对抗数据投毒,我们引入逐层剪枝聚合(LPA)策略,在聚合过程中根据参数偏差过滤异常的Plug-in更新。在涵盖语音和非语音任务的四个多样化音频分类基准上进行的大量实验表明,FedMLAC在分类准确率和对噪声数据的鲁棒性上始终优于最先进的基线方法。

英文摘要

Federated Learning (FL) offers a privacy-preserving framework for training audio classification (AC) models across decentralized clients without sharing raw data. However, Federated Audio Classification (FedAC) faces three major challenges: data heterogeneity, model heterogeneity, and data poisoning, which degrade performance in real-world settings. While existing methods often address these issues separately, a unified and robust solution remains underexplored. We propose FedMLAC, a mutual learning-based FL framework that tackles all three challenges simultaneously. Each client maintains a personalized local AC model and a lightweight, globally shared Plug-in model. These models interact via bidirectional knowledge distillation, enabling global knowledge sharing while adapting to local data distributions, thus addressing both data and model heterogeneity. To counter data poisoning, we introduce a Layer-wise Pruning Aggregation (LPA) strategy that filters anomalous Plug-in updates based on parameter deviations during aggregation. Extensive experiments on four diverse audio classification benchmarks, including both speech and non-speech tasks, show that FedMLAC consistently outperforms state-of-the-art baselines in classification accuracy and robustness to noisy data.

2502.10112 2026-06-17 cs.LG

Accelerometry-based Energy Expenditure Estimation During Activities of Daily Living: A Comparison Among Different Accelerometer Compositions

基于加速度计的日常活动能量消耗估计:不同加速度计配置的比较

Shuhao Que, Remco Poelarends, Peter Veltink, Miriam Vollenbroek-Hutten, Ying Wang

发表机构 * Department of Electrical Engineering, University of Twente(特文特大学电气工程系) Department of Nuclear Medicine, Isala(Isala核医学部)

AI总结 本文比较了基于身体质心(COM)和基于腕部的不同加速度计配置在日常活动能量消耗估计中的表现,发现基于COM的3-acc配置表现最佳。

Comments This work has been accepted by IEEE EMBC 2025

详情
AI中文摘要

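身体活动能量消耗(PAEE)可以通过逐次呼吸的呼吸数据测量并以此作为参考,也可以根据由加速度计测量和估计的身体运动进行预测。身体质心(COM)加速度反映全身运动,因此是PAEE的良好预测指标。然而,随着腕戴式设备的发展,手腕也成为常用的佩戴位置。因此,本文以COSMED K5测得的呼吸数据为参考,评估并比较了基于COM的配置和基于腕部的配置的性能。基于COM的配置包括两种加速度计组合:仅使用骨盆加速度计(pelvis-acc),以及骨盆加速度计加两侧大腿的两个加速度计(3-acc)。基于腕部的配置包括仅使用左腕加速度计(l-wrist-acc)和仅使用右腕加速度计(r-wrist-acc)。我们在自行采集的数据集上实现了两种已有的PAEE估计方法,即线性回归(LR)模型和CNN-LSTM模型;该数据集中9名参与者佩戴5个加速度计(骨盆、两侧大腿和两侧手腕)进行日常活动。两种模型均在基于COM的3-acc配置下取得最佳结果(LR:R²=0.41,CNN-LSTM:R²=0.53)。3-acc与pelvis-acc配置之间无显著差异(p值=0.278)。对于两种模型,l-wrist-acc和r-wrist-acc配置均未表现出对PAEE的预测能力,R²接近0,显著劣于两种基于COM的配置(p值<0.05)。两侧手腕之间无显著差异(p值=0.329)。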

英文摘要

Physical activity energy expenditure (PAEE) can be measured from breath-by-breath respiratory data, which can serve as a reference. Alternatively, PAEE can be predicted from the body movements, which can be measured and estimated with accelerometers. The body center of mass (COM) acceleration reflects the movements of the whole body and thus serves as a good predictor for PAEE. However, the wrist has also become a popular location due to recent advancements in wrist-worn devices. Therefore, in this work, using the respiratory data measured by COSMED K5 as the reference, we evaluated and compared the performances of COM-based settings and wrist-based settings. The COM-based settings include two different accelerometer compositions, using only the pelvis accelerometer (pelvis-acc) and the pelvis accelerometer with two accelerometers from two thighs (3-acc). The wrist-based settings include using only the left wrist accelerometer (l-wrist-acc) and only the right wrist accelerometer (r-wrist-acc). We implemented two existing PAEE estimation methods on our collected dataset, where 9 participants performed activities of daily living while wearing 5 accelerometers (i.e., pelvis, two thighs, and two wrists). These two methods include a linear regression (LR) model and a CNN-LSTM model. Both models yielded the best results with the COM-based 3-acc setting (LR: $R^2$ = 0.41, CNN-LSTM: $R^2$ = 0.53). No significant difference was found between the 3-acc and pelvis-acc settings (p-value = 0.278). For both models, neither the l-wrist-acc nor the r-wrist-acc settings demonstrated predictive power on PAEE with $R^2$ values close to 0, significantly outperformed by the two COM-based settings (p-values $<$ 0.05). No significant difference was found between the two wrists (p-value = 0.329).
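The $R^2$ values quoted above follow the usual coefficient of determination, $R^2 = 1 - SS_{res}/SS_{tot}$. A minimal sketch with toy numbers (not the study's data) shows why a value close to 0 means the model does no better than predicting the mean.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy reference energy-expenditure values (arbitrary units).
reference = [1.0, 2.0, 3.0, 4.0]

# A model that always outputs the mean explains none of the variance.
r2_mean_only = r_squared(reference, [2.5, 2.5, 2.5, 2.5])
# A perfect model explains all of it.
r2_perfect = r_squared(reference, reference)
print(r2_mean_only, r2_perfect)
```

Read this way, the wrist-only settings with $R^2$ near 0 carried almost no usable information about PAEE beyond its average level.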

2503.08679 2026-06-17 cs.AI cs.CL cs.LG

Chain-of-Thought Reasoning In The Wild Is Not Always Faithful

现实中的思维链推理并不总是忠实的

Iván Arcuschin, Jett Janiak, Robert Krzyzanowski, Senthooran Rajamanoharan, Neel Nanda, Arthur Conmy

发表机构 * Poseidon Research(Poseidon研究)

AI总结 研究发现,在自然语言提示下,模型有时会生成表面连贯但自相矛盾的思维链,揭示出隐含的事后合理化现象,且前沿模型也未能完全避免。

Comments Published at the 43rd International Conference on Machine Learning (ICML 2026)

详情
AI中文摘要

最近的研究表明,当面对提示中的显式偏见时,模型通常会在其思维链(CoT)输出中省略提及这些偏见,揭示出口头推理可能给出模型如何得出错误结论的不正确图景(不忠实)。在这项工作中,我们展示了不忠实的CoT也发生在自然措辞、非对抗性的提示上,而无需添加人为偏见或编辑模型输出。我们发现,当分别呈现问题“X比Y大吗?”和“Y比X大吗?”时,模型有时会生成表面连贯的论证来证明系统性地对两者都回答“是”或都回答“否”是合理的,尽管存在矛盾。我们提供了初步证据表明这是由于模型对“是”或“否”的隐含偏见,并将其标记为隐含的事后合理化。我们的结果显示,生产模型的不忠实率高达13%,而前沿模型虽然更忠实,但没有一个完全忠实,包括像DeepSeek R1(0.37%)和Sonnet 3.7 with thinking(0.04%)这样的思考模型。我们还研究了不忠实的非逻辑捷径,即模型使用微妙的非逻辑推理来使对困难数学问题的推测性答案看起来经过严格证明。我们的发现表明,虽然CoT可用于评估输出,但它并不是产生模型答案的内部过程的完整描述,应在代理或安全关键环境中谨慎使用。

英文摘要

Recent studies indicate that when faced with explicit biases in prompts, models often omit mentioning these biases in their Chain-of-Thought (CoT) output, revealing that verbalized reasoning can give an incorrect picture of how models arrive at conclusions (unfaithfulness). In this work, we show that unfaithful CoT also occurs on naturally worded, non-adversarial prompts without adding artificial biases or editing model outputs. We find that when separately presented with the questions "Is X bigger than Y?" and "Is Y bigger than X?", models sometimes produce superficially coherent arguments to justify systematically answering Yes to both or No to both, despite the contradiction. We present preliminary evidence that this is due to models' implicit biases towards Yes or No, labeling this Implicit Post-Hoc Rationalization. Our results reveal rates up to 13% for production models, and while frontier models are more faithful, none are entirely so, including thinking models like DeepSeek R1 (0.37%) and Sonnet 3.7 with thinking (0.04%). We also investigate Unfaithful Illogical Shortcuts, where models use subtly illogical reasoning to make speculative answers to hard math problems seem rigorously proven. Our findings indicate that while CoT can be useful for assessing outputs, it is not a complete account of the internal process that produced the model's answer and should be used with caution in agentic or safety-critical settings.
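The paired-question check described in the abstract can be sketched as a simple consistency test: for any two distinct items of different size, exactly one of "Is X bigger than Y?" and "Is Y bigger than X?" should be answered Yes. The data layout and function name below are hypothetical, and the sketch only flags contradictions; it does not reproduce the paper's analysis of the reasoning itself.

```python
def contradictory_pairs(answers):
    """Return (x, y, answer) for pairs answered identically in both orders.

    `answers` maps (x, y) to "YES" or "NO" for the question
    "Is x bigger than y?". Assumes x != y and that the items differ in size,
    so YES/YES and NO/NO are both logically impossible.
    """
    flagged = []
    for (x, y), ans in answers.items():
        reverse = answers.get((y, x))
        # (x, y) < (y, x) keeps each unordered pair from being reported twice.
        if reverse is not None and (x, y) < (y, x) and ans == reverse:
            flagged.append((x, y, ans))
    return flagged


# Hypothetical final answers extracted from model outputs.
answers = {
    ("A", "B"): "YES",
    ("B", "A"): "YES",   # contradicts the line above
    ("C", "D"): "YES",
    ("D", "C"): "NO",    # consistent
}
flagged = contradictory_pairs(answers)
print(flagged)
```

A rate of flagged pairs is one plausible way to arrive at per-model percentages, though the abstract does not spell out the exact metric.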

2305.09366 2026-06-17 cs.LG eess.SP

Evaluation of self-supervised pre-training for automatic infant movement classification using wearable movement sensors

基于可穿戴运动传感器的自动婴儿运动分类中自监督预训练的评估

Einari Vaaras, Manu Airaksinen, Sampsa Vanhatalo, Okko Räsänen

发表机构 * Helsinki University Hospital, Helsinki, Finland(赫尔辛基大学医院,芬兰)

AI总结 本文评估了自监督预训练在提高基于可穿戴运动传感器的婴儿运动分类准确性中的效果,发现预训练无标签数据可提升分类模型的鲁棒性,且选择上下文相关数据进一步提升了性能。

Comments To be published in Proc. IEEE EMBC 2023, Sydney, Australia

详情
AI中文摘要

最近开发的婴儿可穿戴MAIJU设备为在非医院环境客观评估婴儿运动性能提供了新方法,该信息可用于发展研究和临床决策支持,如检测发育问题并指导治疗干预。MAIJU分析完全依赖于婴儿姿势和运动的分类,因此研究如何提高此类分类的准确性至关重要。本文研究了自监督预训练如何提升用于分析MAIJU记录的分类器性能,并探讨了预训练数据的上下文选择性质量筛选是否会影响分类器性能。实验表明,i)使用无标签数据预训练分类器可使后续分类模型的准确性显著提升,ii)选择上下文相关预训练数据可进一步提高分类器性能。

英文摘要

The recently-developed infant wearable MAIJU provides a means to automatically evaluate infants' motor performance in an objective and scalable manner in out-of-hospital settings. This information could be used for developmental research and to support clinical decision-making, such as detection of developmental problems and guiding of their therapeutic interventions. MAIJU-based analyses rely fully on the classification of infant's posture and movement; it is hence essential to study ways to increase the accuracy of such classifications, aiming to increase the reliability and robustness of the automated analysis. Here, we investigated how self-supervised pre-training improves performance of the classifiers used for analyzing MAIJU recordings, and we studied whether performance of the classifier models is affected by context-selective quality-screening of pre-training data to exclude periods of little infant movement or with missing sensors. Our experiments show that i) pre-training the classifier with unlabeled data leads to a robust accuracy increase of subsequent classification models, and ii) selecting context-relevant pre-training data leads to substantial further improvements in the classifier performance.

2206.10188 2026-06-17 cs.LG cs.SD eess.AS

Analysis of Self-Supervised Learning and Dimensionality Reduction Methods in Clustering-Based Active Learning for Speech Emotion Recognition

基于聚类的主动学习中自监督学习与降维方法的分析用于语音情感识别

Einari Vaaras, Manu Airaksinen, Okko Räsänen

发表机构 * Unit of Computing Sciences, Tampere University, Finland(芬兰坦佩雷大学计算科学学部) Helsinki University Hospital, Helsinki, Finland(芬兰赫尔辛基大学医院)

AI总结 本文研究了在语音情感识别中,利用自监督学习和降维方法提升基于聚类的主动学习性能,探讨了特征空间局部和全局拓扑结构对主动学习的影响,发现降维不影响性能且二维特征表现良好。

Comments To be published in Proc. Interspeech 2022, Incheon, South Korea

详情
AI中文摘要

当领域专家需要进行数据标注时,减少标注工作量以节省时间和成本至关重要。在无标注情况下,可以利用特征空间结构进行基于聚类的主动学习(AL)方法。然而,这些方法高度依赖于样本在特征空间中的组织方式和距离度量。无监督方法如对比预测编码(CPC)可以用于学习有序的特征空间,但这些方法通常会产生高维特征,这可能对估计数据密度构成挑战。本文结合CPC和多种降维方法,探索基于聚类的AL的实用方法。我们的实验表明,特征空间的局部和全局拓扑结构可以成功用于AL,并且CPC可以提高基于传统信号特征的聚类AL性能。此外,我们观察到压缩数据维度对AL性能影响不大,当标注数量不低时,二维特征表示与高维特征表示在AL性能上相似。

英文摘要

When domain experts are needed to perform data annotation for complex machine-learning tasks, reducing annotation effort is crucial in order to cut down time and expenses. For cases when there are no annotations available, one approach is to utilize the structure of the feature space for clustering-based active learning (AL) methods. However, these methods are heavily dependent on how the samples are organized in the feature space and what distance metric is used. Unsupervised methods such as contrastive predictive coding (CPC) can potentially be used to learn organized feature spaces, but these methods typically create high-dimensional features which might be challenging for estimating data density. In this paper, we combine CPC and multiple dimensionality reduction methods in search of functioning practices for clustering-based AL. Our experiments for simulating speech emotion recognition system deployment show that both the local and global topology of the feature space can be successfully used for AL, and that CPC can be used to improve clustering-based AL performance over traditional signal features. Additionally, we observe that compressing data dimensionality does not harm AL performance substantially, and that 2-D feature representations achieved similar AL performance as higher-dimensional representations when the number of annotations is not very low.

2606.18019 2026-06-17 eess.AS cs.CL cs.SD New submission

Reading between the Lines: Leveraging Large Language Models for Global Dementia and Depression Assessment from Clinical Interviews

Franziska Braun, Alea Rüggeberg, Thomas Ranzenberger, Hartmut Lehfeld, Thomas Hillemacher, Tobias Bocklet, Korbinian Riedhammer

Affiliations * TH Nürnberg; FAU Erlangen; PMU Klinikum Nürnberg

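AI summary This study uses open-weights large language models to predict dementia and depression severity from recorded clinical interviews with 154 German-speaking subjects. It introduces a Global Depression Scale (GDS-D) aligned with the Global Deterioration Scale (GDS), and finds that zero-shot prediction is effective for depression, while structured feature extraction substantially improves dementia assessment, reducing errors by up to 35%. Pause-enriched transcripts perform comparably to human transcripts.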

Comments Accepted for publication in Text, Speech and Dialogue (TSD 2026). The final authenticated publication will be available online via Springer LNCS/LNAI

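Details
AI-generated abstract (translated from Chinese)

Dementia and depression are the most common neuropsychiatric disorders in older populations, and their overlapping symptoms pose a major challenge for differential diagnosis. In this study, we investigate open-weights large language models (LLMs) for predicting dementia and depression severity from recordings of standardized history-taking interviews with 154 German-speaking subjects. We introduce an observer-based Global Depression Scale (GDS-D) aligned with the established Global Deterioration Scale (GDS), enabling parallel global staging of affective and cognitive symptoms. We compare three LLMs (Mistral 3.1, DeepHermes, Qwen3) in two settings: (1) zero-shot prediction and (2) LLM-based feature extraction for support vector regression, using human and pause-enriched transcripts. The results show that LLMs predict depression severity effectively in the zero-shot setting (best MAE of 0.60), whereas dementia assessment benefits substantially from structured feature extraction (best MAE of 0.78), with errors reduced by up to 35% relative to zero-shot baselines. Pause-enriched transcripts perform on par with human transcripts, demonstrating the feasibility of fully automatic screening pipelines for differential neuropsychiatric assessment.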

English abstract

Dementia and depression are the most prevalent neuropsychiatric disorders in geriatric populations, and their overlapping symptoms pose major challenges for differential diagnosis. In this study, we investigate open-weights Large Language Models (LLMs) for predicting dementia and depression severity from speech samples collected during standardized history taking interviews with 154 German-speaking subjects. We introduce an observer-based Global Depression Scale (GDS-D) aligned with the established Global Deterioration Scale (GDS), enabling parallel global staging of affective and cognitive symptoms. We compare three LLMs (Mistral 3.1, DeepHermes, Qwen3) in two settings: (1) zero-shot prediction and (2) LLM-based feature extraction for Support Vector Regression, using human and pause-enriched transcripts. Results show that LLMs effectively predict depression severity in zero-shot settings (best MAE of 0.60), while dementia assessment benefits substantially from structured feature extraction (best MAE of 0.78), reducing errors by up to 35% over zero-shot baselines. Pause-enriched transcripts achieve competitive performance with human transcriptions, demonstrating the viability of fully automatic screening pipelines for differential neuropsychiatric assessment.
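
The two quantities reported above, mean absolute error (MAE) and relative error reduction over a baseline, can be made concrete with a small worked example. The stage ratings below are invented purely for illustration and are not data from the paper; the function names are likewise hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between reference and predicted ratings."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def relative_reduction(baseline_error, new_error):
    """Fraction by which new_error improves on baseline_error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error

# Hypothetical clinician ratings and two sets of model predictions.
reference = [1, 2, 3, 4, 5, 6]
zero_shot = [2, 4, 3, 6, 4, 6]      # absolute errors: 1, 2, 0, 2, 1, 0
feature_based = [1, 3, 3, 5, 5, 7]  # absolute errors: 0, 1, 0, 1, 0, 1

baseline_mae = mean_absolute_error(reference, zero_shot)    # 6 / 6 = 1.0
improved_mae = mean_absolute_error(reference, feature_based)  # 3 / 6 = 0.5
print(f"baseline MAE: {baseline_mae:.2f}")
print(f"improved MAE: {improved_mae:.2f}")
print(f"relative reduction: {relative_reduction(baseline_mae, improved_mae):.0%}")
```

Because the reduction is relative to the baseline, a reported percentage cannot be converted back to an absolute MAE without knowing which baseline it refers to.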

2606.18011 2026-06-17 stat.ML cs.LG stat.ME New submission

Fast Nonparametric Conditional Independence Testing via Two-Stage Regression

Eric V. Strobl

Affiliations * Department of Biomedical Informatics, University of Pittsburgh

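AI summary Proposes BLITZ, which uses two-stage regression (low-order polynomials, then shallow trees) to quickly remove the influence of the conditioning set, yielding a well-calibrated nonparametric conditional independence test suited to causal discovery.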

Comments A fast R implementation with C++ back-end is available at https://github.com/ericstrobl/BLITZ

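Details
AI-generated abstract (translated from Chinese)

Constraint-based causal discovery relies on repeated conditional independence tests, but fast nonparametric tests often sacrifice calibration, especially when variables depend on the conditioning set through nonlinear relationships. We propose BLITZ (Broad-to-Local Independence Testing via residualiZation), a nonparametric conditional independence test designed to run in well under a second while maintaining the accuracy needed for the thousands of queries issued by constraint-based causal discovery algorithms. BLITZ first removes broad smooth dependence on the conditioning set with low-order polynomial regression, then applies a small nonlinear feature map and residualizes those features with shallow tree regressions. The resulting statistic tests the residual cross-covariance, using a moment-matched chi-square approximation to the null distribution. We show theoretically that the two-stage design reduces the effective complexity faced by the tree residualizers, allowing shallow trees to control residual conditional-mean bias while avoiding excessive overfitting. In simulations, BLITZ provides better null calibration than fast kernel, random-feature, and regression-based competitors while remaining among the fastest methods tested. In causal discovery experiments on synthetic graphs and flow-cytometry data, BLITZ yields more reliable endpoint orientations among retained adjacencies, along with competitive structural recovery. These results suggest that broad-to-local residualization is a practical route to calibrated, scalable nonparametric conditional independence testing for causal discovery.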

English abstract

Constraint-based causal discovery relies on repeated conditional independence tests, but fast nonparametric tests often sacrifice calibration, especially when variables depend on the conditioning set through nonlinear relationships. We introduce BLITZ (Broad-to-Local Independence Testing via residualiZation), a nonparametric conditional independence test designed to run well under a second while maintaining the accuracy needed for the thousands of queries performed by constraint-based causal discovery algorithms. BLITZ first removes broad smooth dependence on the conditioning set using low-order polynomial regression, then applies a small nonlinear feature map and residualizes those features with shallow tree regressions. The resulting statistic tests residual cross-covariance, with a moment-matched chi-square approximation to the null distribution. We show theoretically that the two-stage design reduces the effective complexity faced by the tree residualizers, allowing shallow trees to control residual conditional-mean bias while avoiding excessive overfitting. In simulations, BLITZ provides better null calibration than fast kernel, random-feature, and regression-based competitors while remaining among the fastest methods tested. In causal discovery experiments on synthetic graphs and flow-cytometry data, BLITZ yields more reliable endpoint orientations among retained adjacencies and competitive structural recovery. These results suggest that broad-to-local residualization is a practical route to calibrated, scalable nonparametric conditional independence testing for causal discovery.
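
The broad-to-local idea can be illustrated with a heavily simplified sketch. It is not the BLITZ implementation (see the repository linked in the Comments for that) and makes several assumptions that are not in the abstract: the conditioning set is a single variable, the nonlinear feature map is omitted, a single-split regression stump stands in for the shallow tree regressions, and a plain `n * r^2` statistic with a chi-square(1) reference replaces the moment-matched approximation. All function names are hypothetical.

```python
import math
import numpy as np

def poly_residual(target, z, degree=2):
    """Stage 1: remove a low-order polynomial trend in z from target."""
    design = np.vander(z, degree + 1)  # columns z^degree, ..., z, 1
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef

def stump_residual(target, z, min_leaf=5):
    """Stage 2 stand-in: subtract the best single-split stump fit on z."""
    n = len(target)
    if n < 2 * min_leaf + 1:
        raise ValueError("too few samples for the requested leaf size")
    order = np.argsort(z)
    sorted_t = target[order]
    best_sse, best_split = np.inf, None
    for i in range(min_leaf, n - min_leaf):
        left, right = sorted_t[:i], sorted_t[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best_split = sse, i
    fitted = np.empty(n)
    fitted[order[:best_split]] = sorted_t[:best_split].mean()
    fitted[order[best_split:]] = sorted_t[best_split:].mean()
    return target - fitted

def ci_test(x, y, z):
    """Return (statistic, p_value) for H0: x is independent of y given z.

    Only linear dependence between the residuals is detected here.
    """
    rx = stump_residual(poly_residual(x, z), z)
    ry = stump_residual(poly_residual(y, z), z)
    r = np.corrcoef(rx, ry)[0, 1]
    stat = len(x) * r ** 2
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return float(stat), p_value

# Synthetic example: x and y both depend nonlinearly on z.
rng = np.random.default_rng(0)
z = rng.normal(size=500)
x = z ** 2 + 0.5 * rng.normal(size=500)
y_indep = np.sin(z) + 0.5 * rng.normal(size=500)  # independent of x given z
y_dep = y_indep + (x - z ** 2)                    # shares x's noise term
print("independent given z:", ci_test(x, y_indep, z))
print("dependent given z:  ", ci_test(x, y_dep, z))
```

Running the cheap global fit first leaves the stump with less structure to explain, which loosely mirrors the motivation the abstract gives for the two-stage design.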