arXivDaily: arXiv daily academic digest, updated Monday to Friday
2501.01785 2026-05-21 cs.LG cs.AI cs.CY

Can Synthetic Data be Fair and Private? A Comparative Study of Synthetic Data Generation and Fairness Algorithms

Qinyi Liu, Oscar Deho, Sam Urmian, Mohammad Khalil, Srecko Joksimovic, George Siemens

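Affiliations: Centre for the Science of Learning & Technology (SLATE), University of Bergen; University of South Australia

AI summary: This study examines how well synthetic data generation and fairness algorithms balance privacy and fairness. It finds that the DECAF algorithm strikes the best balance between the two but has lower predictive accuracy, and that applying pre-processing fairness algorithms to synthetic data improves fairness further.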

Abstract

The increasing use of machine learning in learning analytics (LA) has raised significant concerns around algorithmic fairness and privacy. Synthetic data has emerged as a dual-purpose tool, enhancing privacy and improving fairness in LA models. However, prior research suggests an inverse relationship between fairness and privacy, making it challenging to optimize both. This study investigates which synthetic data generators can best balance privacy and fairness, and whether pre-processing fairness algorithms, typically applied to real datasets, are effective on synthetic data. Our results highlight that the DEbiasing CAusal Fairness (DECAF) algorithm achieves the best balance between privacy and fairness. However, DECAF suffers in utility, as reflected in its predictive accuracy. Notably, we found that applying pre-processing fairness algorithms to synthetic data improves fairness even more than when applied to real data. These findings suggest that combining synthetic data generation with fairness pre-processing offers a promising approach to creating fairer LA models.
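
To make "fairness" concrete, here is a minimal Python sketch of one common group-fairness measure, the demographic parity difference. The abstract does not say which fairness metrics the study used, so the choice of metric, the function name, and the toy predictions are all assumptions made for illustration.

```python
def demographic_parity_difference(y_pred, group):
    """Absolute gap in positive-prediction rates between groups 0 and 1."""
    rates = {}
    for g in (0, 1):
        preds = [p for p, a in zip(y_pred, group) if a == g]
        rates[g] = sum(preds) / len(preds)
    return abs(rates[0] - rates[1])


# Hypothetical binary predictions from a model trained on synthetic data,
# with a hypothetical binary sensitive attribute per student.
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
group = [0, 0, 0, 0, 1, 1, 1, 1]
gap = demographic_parity_difference(y_pred, group)
print(gap)
```

A gap of 0 means both groups receive positive predictions at the same rate. Comparing this gap for models trained on real data and on synthetic data, each with and without fairness pre-processing, is one way the abstract's comparison could be set up.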

2409.08700 2026-05-21 cs.LG

Personalized Weight Loss Management through Wearable Devices and Artificial Intelligence

Sergio Romero-Tapiador, Ruben Tolosana, Aythami Morales, Blanca Lacruz-Pleguezuelos, Sofia Bosch Pastor, Laura Judith Marcos-Zambrano, Guadalupe X. Bazán, Gala Freixer, Ruben Vera-Rodriguez, Julian Fierrez, Javier Ortega-Garcia, Isabel Espinosa-Salinas, Enrique Carrillo de Santa Pau

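Affiliations: Department of Mathematics, Universidad de Las Palmas de Gran Canaria, 35001, Spain; Cancer Research Program, IMDEA Food Institute; Health, IMDEA Food Institute

AI summary: This paper uses wearable devices and AI to predict weight change in overweight and obese individuals. Analyzing biomarkers, vital signs, and behavioral data from about 100 subjects, it identifies key differences between those who lost weight and those who did not. A Gradient Boosting classifier reaches 84.44% AUC, indicating the potential of multi-source data integration in personalized healthcare.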

Comments 25 pages, 6 figures, 7 tables, 1 appendix

Journal ref Computers in Biology and Medicine, Vol. 173, 111676, 2026

Abstract

Early detection of chronic and Non-Communicable Diseases (NCDs) is crucial for effective treatment during the initial stages. This study explores the application of wearable devices and Artificial Intelligence (AI) in order to predict weight loss changes in overweight and obese individuals. Using wearable data from a 1-month trial involving around 100 subjects from the AI4FoodDB database, including biomarkers, vital signs, and behavioral data, we identify key differences between those achieving weight loss (>= 2% of their initial weight) and those who do not. Feature selection techniques and classification algorithms reveal promising results, with the Gradient Boosting classifier achieving 84.44% Area Under the Curve (AUC). The integration of multiple data sources (e.g., vital signs, physical and sleep activity, etc.) enhances performance, suggesting the potential of wearable devices and AI in personalized healthcare.

2605.21260 2026-05-21 cs.LG

On the Cost and Benefit of Chain of Thought: A Learning-Theoretic Perspective

Yue Zhang, Zhiyi Dong, Tommaso Cesari, Yongyi Mao

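Affiliations: University of Ottawa

AI summary: From a learning-theoretic perspective, this paper studies the cost and benefit of chain of thought (CoT). It models CoT as the interaction between an answer map and a chain rule, defines the reasoning risk of a hypothesis under this interaction, and derives a tight decomposition of that risk, showing when CoT helps and when it hurts.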

Abstract

We develop a learning-theoretic framework for understanding Chain of Thought (CoT). We model CoT as the interaction between an answer map and a chain rule that generates intermediate questions autoregressively, and define the reasoning risk of a hypothesis under this interaction. Our first result is a tight canonical decomposition of this risk into two terms with opposing roles: an oracle-trajectory risk (OTR), which captures the benefit of CoT and reduces to a target-domain risk in a domain adaptation problem, and a trajectory-mismatch risk (TMR), which captures the cost of CoT through error accumulation along mismatched reasoning trajectories. We then show that this cost is unavoidable without structure: if any one of the loss, the hypothesis answer map, or the chain rule lacks stability, the TMR can be arbitrarily large even when the OTR is zero and the hypothesis is uniformly close to the ground truth. Conversely, under stability, we prove a tight upper bound on the TMR governed by an exact amplification factor that identifies bounded, linear, and exponential error-growth regimes. Together, these results give a precise theory of when CoT helps, when it hurts, and what controls the transition between the two.
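
The three error-growth regimes can be pictured with a generic recursion. The sketch below is not the paper's amplification factor, which the abstract does not state. It assumes a hypothetical stability constant `L` by which each reasoning step scales the inherited error, plus one fresh unit of error per step, so the total after `T` steps is the geometric sum 1 + L + ... + L^(T-1).

```python
def amplification(L, T):
    """Total error after T steps, in units of the per-step error,
    under the assumed recursion e_{t+1} = L * e_t + 1 with e_0 = 0."""
    return sum(L ** k for k in range(T))


for L in (0.5, 1.0, 2.0):
    print(L, [amplification(L, T) for T in (1, 5, 10)])
```

Under this assumption the sum stays below 1 / (1 - L) when L < 1 (bounded), equals T when L = 1 (linear), and grows like L^T when L > 1 (exponential).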

2605.21258 2026-05-21 cs.RO cs.AI

Learning Structural Latent Points for Efficient Visual Representations in Robotic Manipulation

Yicheng Jiang, Jiaxu Wang, Junhao He, Zesen Gan, Junhao Li, Qiang Zhang, Jingkai Sun, Jiahang Cao, Mingyuan Sun, Xiangyu Yue, Qiming Shao

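Affiliations: The Hong Kong University of Science and Technology; The Hong Kong University of Science and Technology (Guangzhou); MMLab, The Chinese University of Hong Kong; X-Humanoid Robots; The University of Hong Kong; Tsinghua University

AI summary: This paper proposes a pretraining framework that learns a hybrid representation, structural latent points, combining the expressiveness of implicit representations with the structural priors of explicit ones to improve the efficiency and robustness of visual representations for robotic manipulation.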

Journal ref International Conference on Robotics and Automation 2026

Abstract

Current 3D-aware pretraining methods for embodied perception and manipulation are largely built on differentiable rendering frameworks, producing either fully implicit neural fields or fully explicit geometric primitives. Implicit representations, while expressive, lack explicit structural cues, whereas explicit ones preserve geometry but suffer from resolution limits and weak generalization. To address these limitations, we propose a novel pretraining framework that learns a hybrid representation: structural latent points. Specifically, we insert a point-wise latent variational autoencoder into the latent space of a point-cloud autoencoder, jointly regularizing point-wise features and coordinates toward a Gaussian prior. The resulting compact latent preserves coarse structural tendencies, which do not encode precise geometry but capture richer rough shape and semantic information, effectively combining the expressiveness of implicit representations with the structural priors of explicit ones. In addition, informed by shared design choices in prior work, we develop a streamlined, efficient 3DGS-based rendering pipeline that is deliberately kept lightweight, improving efficiency while leaving greater representational capacity to the front-end latent module. Extensive evaluations on RLBench, ManiSkill2, and a real-robot platform demonstrate consistent gains in task success, sample efficiency, and robustness to viewpoint and scene variations over strong baselines. Ablation studies further confirm that each component of our framework is critical to overall performance.

2605.21257 2026-05-21 cs.RO

Reinforcement Learning for Risk Adaptation via Differentiable CVaR Barrier Functions

Xinyi Wang, Taekyung Kim, Bardh Hoxha, Georgios Fainekos, Dimitra Panagou

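Affiliations: Department of Robotics; Department of Aerospace Engineering; University of Michigan; Toyota Motor North America, Research & Development

AI summary: This paper proposes an end-to-end risk adaptation framework for crowd navigation under obstacle-motion uncertainty. It combines reinforcement learning with a differentiable quadratic-program safety layer based on Conditional Value-at-Risk (CVaR) barrier functions, jointly learning the nominal control input, risk level, and safety margin while enforcing explicit probabilistic safety constraints.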

Comments Project page: https://anonymousrobotics9666.github.io/rlcvarbf/

Abstract

Planning through crowded environments under uncertain obstacle motions remains difficult, as stochastic interactions often induce overly conservative behavior or reduced efficiency. To address this challenge, we propose an end-to-end risk adaptation framework for crowd navigation under obstacle-motion uncertainty modeled by a Gaussian mixture model. The framework combines reinforcement learning (RL) with a differentiable quadratic-program safety layer based on Conditional Value-at-Risk (CVaR) barrier functions, jointly learning the nominal control input, risk level, and safety margin while enforcing explicit probabilistic safety constraints. This design enables context-aware adaptation, promoting efficient behavior while invoking caution only when necessary. We conduct extensive evaluations in dynamic, uncertain, and crowded environments across varying obstacle densities and robot models, and further assess generalization under three out-of-distribution cases. Comparisons across optimization-based, RL-based, and integrated RL and optimization methods are provided, and the proposed method is shown to deliver the strongest overall performance in safety, efficiency, and generalization under uncertainty.
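
For readers unfamiliar with the risk measure, the following Python sketch computes an empirical CVaR from sampled losses: the mean of the worst (1 - alpha) fraction of outcomes. It illustrates the quantity only. The paper's differentiable quadratic-program layer and its barrier-function constraint are not reproduced here, and the function name and sample values are hypothetical.

```python
import math


def empirical_cvar(losses, alpha):
    """Mean of the worst (1 - alpha) fraction of sampled losses."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    worst_first = sorted(losses, reverse=True)
    # Round before ceil so float noise such as 1.9999999999999996 becomes 2.
    k = max(1, math.ceil(round((1.0 - alpha) * len(worst_first), 9)))
    return sum(worst_first[:k]) / k


# Hypothetical sampled losses, e.g. constraint violations under sampled obstacle motions.
samples = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(empirical_cvar(samples, 0.0))  # plain mean
print(empirical_cvar(samples, 0.8))  # mean of the two worst samples
```

Raising alpha focuses the measure on rarer, worse outcomes, which is why a learned risk level can trade caution against efficiency.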

2605.21256 2026-05-21 cs.CL

Reliable Automated Triage in Spanish Clinical Notes: A Hybrid Framework for Risk-Aware HIV Suspicion Identification

Rodrigo Morales-Sánchez, Soto Montalvo, Raquel Martínez

Affiliations: Dept. of Lenguajes y Sistemas Informáticos, Escuela Técnica Superior de Ingeniería Informática, Universidad Nacional de Educación a Distancia (UNED); Dept. Informática y Estadística, Escuela Técnica Superior de Ingeniería Informática, Universidad Rey Juan Carlos (URJC)

AI summary: This paper proposes a hybrid framework for identifying HIV suspicion in Spanish clinical notes, improving triage reliability by separating aleatoric and epistemic uncertainty.

Comments Accepted at the BioNLP Workshop @ ACL 2026

Abstract

Standard clinical Natural Language Processing (NLP) benchmarks often yield inflated metrics by forcing deterministic classification on ambiguous instances, thereby obscuring the clinical risks of overconfident predictions. To bridge this gap, we propose a risk-aware hybrid selective classification framework, evaluated on early Human Immunodeficiency Virus suspicion identification in Spanish clinical notes. Our dual-verification approach explicitly decouples aleatoric uncertainty through Mondrian conformal prediction and epistemic uncertainty using a Multi-Centroid Mahalanobis Distance veto. Empirical evaluations reveal that standard uncertainty metrics and baseline classifiers are structurally insufficient for safe medical triage, suffering severe coverage collapse when forced to operate under strict reliability constraints. In contrast, by demanding that clinical narratives pass both probabilistic and geometric safeguards, the proposed framework successfully isolates a highly trustworthy operational domain.
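
The dual-verification idea can be sketched as a selective classifier that answers only when two independent checks pass. The Python below is a simplified illustration, not the authors' implementation. It assumes class-conditional (Mondrian) split conformal prediction with the score 1 - p(true class), and it takes the distance to the nearest training centroid as a precomputed number rather than computing a Mahalanobis distance. All names, thresholds, and data are hypothetical.

```python
import math


def class_thresholds(cal_probs, cal_labels, epsilon):
    """Per-class conformal quantile of the score 1 - p(true class)."""
    thresholds = {}
    for c in sorted(set(cal_labels)):
        scores = sorted(1.0 - p[c] for p, y in zip(cal_probs, cal_labels) if y == c)
        n = len(scores)
        rank = math.ceil((n + 1) * (1.0 - epsilon))  # 1-indexed conformal rank
        thresholds[c] = scores[rank - 1] if rank <= n else 1.0
    return thresholds


def triage(probs, distance, thresholds, max_distance):
    """Return a label only if both safeguards pass, else None (defer to a clinician)."""
    prediction_set = [c for c, q in thresholds.items() if 1.0 - probs[c] <= q]
    if len(prediction_set) != 1:  # empty or ambiguous set: probabilistic check fails
        return None
    if distance > max_distance:   # far from the training data: geometric veto
        return None
    return prediction_set[0]


# Hypothetical calibration set: class probabilities [p(class 0), p(class 1)] and labels.
cal_probs = [[0.9, 0.1], [0.8, 0.2], [0.95, 0.05], [0.7, 0.3],
             [0.15, 0.85], [0.1, 0.9], [0.4, 0.6], [0.25, 0.75]]
cal_labels = [0, 0, 0, 0, 1, 1, 1, 1]
th = class_thresholds(cal_probs, cal_labels, epsilon=0.2)

print(triage([0.95, 0.05], 1.0, th, max_distance=3.0))  # confident and in-distribution
print(triage([0.65, 0.35], 1.0, th, max_distance=3.0))  # no class passes: abstain
print(triage([0.95, 0.05], 5.0, th, max_distance=3.0))  # distance veto: abstain
```

Abstaining on either failure is what shrinks coverage, so the fraction of notes that still receive a label is the figure to watch under strict reliability targets.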

2605.21244 2026-05-21 cs.CV

SR-Ground: Image Quality Grounding for Super-Resolved Content

Artem Borisov, Evgeney Bogatyrev, Khaled Abud, Dmitriy Vatolin

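Affiliations: Lomonosov Moscow State University; MSU AI Center, Lomonosov Moscow State University

AI summary: This paper presents SR-Ground, a dataset for fine-grained artifact segmentation in super-resolved images, built through a large-scale crowdsourcing study. It improves IQA model performance and helps reduce perceptible artifacts in super-resolution outputs.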

Abstract

Super-Resolution (SR) has advanced rapidly in recent years, with diffusion-based models achieving unprecedented fidelity at the cost of introducing new types of visual artifacts. While existing Image Quality Assessment (IQA) methods provide holistic quality scores, they lack interpretability and fail to distinguish between different artifact types arising from modern SR approaches. To address this gap, we introduce SR-Ground, a large-scale dataset specifically designed for fine-grained artifact segmentation in super-resolved images. The dataset comprises images processed by a diverse set of state-of-the-art SR models, with pixel-level annotations for multiple artifact categories. We conduct a large-scale crowdsourcing study involving 1,062 participants to validate and refine automatically generated segmentations, resulting in a high-quality dataset of 63,000 images spanning 6 distinct artifact types. We demonstrate that training IQA models with grounding capabilities on SR-Ground significantly improves performance on downstream tasks. Furthermore, we introduce a fine-tuning pipeline that leverages our grounding model to reduce perceptible artifacts in SR outputs, showcasing the practical utility of our dataset.

2605.21242 2026-05-21 cs.RO

To Select or not to Select, that is the Question: Distilling Robot Skill Prediction into a Small Ensemble

Haechan Mark Bong, Simon Roy, Euhid Aman, Giovanni Beltrame

Affiliations: Department of Computer Engineering and Software Engineering, Polytechnique Montréal; MILA; National Taiwan University of Science and Technology (NTUST)

AI summary: This paper studies robot skill prediction. Using a synthetic dataset and fine-tuned sentence encoders, it builds a small specialized model that outperforms large general-purpose LLMs prompted zero-shot on fleet-level task routing.

Journal ref ICRA 2026 Workshop on Synthetic Data for Robot Learning

Abstract

As robot fleets become more heterogeneous, including humanoids, rovers, quadrupeds, and drones, selecting the right robot for a task becomes a core systems problem. We study robot skill prediction: mapping a natural-language task description to the physical capabilities required to execute it, such as fly, wheels, legs, surface water, under water and hands. Since labelled data that maps natural-language task descriptions to a robot's physical capabilities does not exist, we construct a synthetic task-to-skill dataset using LLM-assisted generation and targeted label auditing. Trained on this data, a ~133M-parameter ensemble of two fine-tuned sentence encoders (mpnet + MiniLM) reaches 83.5% task-to-skill matching on a stratified 200-task dataset, outperforming Kimi K2 (1T MoE) at 72.0%, GPT-OSS-120B at 71.5%, and Llama-4-Scout-17B at 69.0% under the same zero-shot prompt. These results suggest that, for fixed robot skill taxonomies, small specialized models trained on synthetic data can outperform much larger general-purpose LLMs for fleet-level task routing.

2605.21241 2026-05-21 cs.LG

Divide and Contrast: Learning Robust Temporal Features without Augmentation

Abdul-Kazeem Shamba, Kerstin Bach, Gavin Taylor

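Affiliations: Department of Computer Science, Norwegian University of Science and Technology; Department of Computer Science, United States Naval Academy

AI summary: This paper proposes Di-COT, a self-supervised framework that contrasts informative substructures within a time window rather than individual timesteps, avoiding data augmentation and multiple encoder passes. It achieves state-of-the-art performance on six large-scale real-world datasets and the UCR/UEA benchmarks while substantially reducing training time.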

Comments Published in the 43rd International Conference on Machine Learning (ICML 2026)

Abstract

Self-supervised learning for time-series representation aims to reduce reliance on labeled data while maintaining strong downstream performance, yet many existing approaches incur high computational costs or rely on assumptions that do not hold across diverse temporal dynamics. In this work, we introduce Divide and Contrast (Di-COT), an unsupervised framework that avoids data augmentation and multiple encoder passes by contrasting informative substructures within a window rather than individual timesteps. Di-COT stochastically partitions each window into a small number of overlapping sub-blocks per iteration, enabling efficient and meaningful contrast while mitigating false positives during temporal transitions. To further improve scalability, we adopt a contrastive objective whose computation depends on the batch size and the number of sub-blocks, making loss computation independent of sequence length. Extensive experiments on six large-scale real-world datasets, as well as the UCR and UEA benchmarks, demonstrate that Di-COT learns semantically structured and transferable representations, achieving state-of-the-art performance on classification, clustering, $k$NN, and cross-dataset transfer, while substantially reducing training time. The source code is publicly available at https://github.com/sfi-norwai/Di-COT.

2605.21240 2026-05-21 cs.LG cs.AI

APEX: Autonomous Policy Exploration for Self-Evolving LLM Agents

Yibo Li, Jiashuo Yang, Zhi Zheng, Zhiyuan Hu, Yuan Sui, Shizun Wang, Yufei He, Bryan Hooi

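Affiliations: National University of Singapore; Beijing University of Posts and Telecommunications

AI summary: This paper proposes APEX, an autonomous policy exploration method for self-evolving LLM agents that counters exploration collapse by building and maintaining an explicit strategy space, and that performs strongly across multiple benchmarks.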

Abstract

LLM agents have shown strong performance across a wide range of complex tasks, including interactive environments that require long-horizon decision making. But these agents cannot learn on the fly at test time. Self-evolving agents address this by accumulating memory and reflection across episodes rather than requiring model-weight updates. However, these agents often suffer from exploration collapse: as memory grows, behavior concentrates around familiar high-reward routines, reducing the chance of discovering better alternatives. To address this problem, we propose Autonomous Policy EXploration (APEX), which builds and maintains an explicit strategy space through a strategy map, a directed acyclic graph of milestones with prerequisite dependency edges. In APEX, Fork Discovery expands the map with evidence-grounded unexplored directions, while Policy Selection balances exploration and exploitation during planning. Evaluated on nine Jericho text-adventure games and WebArena, a realistic web interaction benchmark, APEX outperforms all baselines. Extensive ablations validate each component's contribution and show robustness across diverse settings, demonstrating APEX's effectiveness for sustained exploration in self-evolving agents.
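
A strategy map of this kind can be pictured as a dictionary from each milestone to its prerequisites. The Python sketch below is an assumption-laden illustration: the abstract does not describe how Policy Selection scores candidates, so a UCB1-style rule stands in for it, and the milestone names, rewards, and visit counts are invented.

```python
import math


def frontier(prereqs, done):
    """Milestones not yet reached whose prerequisites have all been reached."""
    return [m for m, reqs in prereqs.items()
            if m not in done and all(r in done for r in reqs)]


def select(candidates, mean_reward, visits, c=1.0):
    """UCB1-style choice: untried milestones first, then mean reward plus a bonus."""
    total = sum(visits.get(m, 0) for m in candidates)

    def score(m):
        n = visits.get(m, 0)
        if n == 0:
            return math.inf  # always try an unexplored direction once
        return mean_reward.get(m, 0.0) + c * math.sqrt(math.log(total) / n)

    return max(candidates, key=score)


# Hypothetical strategy map for a text-adventure game (a DAG of milestones).
prereqs = {
    "find_key": [],
    "open_door": ["find_key"],
    "light_lamp": [],
    "enter_cellar": ["open_door", "light_lamp"],
}
candidates = frontier(prereqs, done={"find_key"})
choice = select(candidates, mean_reward={"open_door": 0.8}, visits={"open_door": 5})
print(candidates, choice)
```

The exploration bonus is what resists collapse in this sketch: a familiar high-reward milestone loses to an untried one until both have been visited.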

2605.21237 2026-05-21 cs.CV cs.AI

RePCM: Region-Specific and Phenotype-Adaptive Bi-Ventricular Cardiac Motion Synthesis

Xuan Yang, Xiaohan Yuan, Hao Li, Lingyu Chen, Yanan Liu, Lei Li

Affiliations: School of Biomedical Engineering, National University of Singapore, Singapore; School of Automation, Southeast University, Nanjing, China; School of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China; School of Information Science and Engineering, Yunnan University, Kunming, China

AI summary: This paper proposes RePCM, a method for single-frame bi-ventricular mesh motion completion that uses region-specific and phenotype-adaptive modeling to improve cardiac motion synthesis in the presence of regional and disease-specific differences caused by cardiovascular disease.

Comments Early Accepted by MICCAI 2026. This is the author's submitted version. 10 pages, 3 figures

Abstract

Cardiac motion over a cardiac cycle is crucial for quantifying regional function and is strongly affected by cardiovascular diseases. Since temporally dense mesh sequences are difficult to obtain in practice, we focus on leveraging the more accessible end-diastolic frame to infer a full-cycle sequence. Due to strong regional and disease-specific differences, traditional methods often oversmooth the data by relying on generative models that are optimized for global patterns. To address this problem, we propose Region-Aware and Phenotype-Adaptive Bi-Ventricular Cardiac Motion Synthesis (RePCM) for single-frame bi-ventricular mesh motion completion. In Stage I, a reconstruction network learns vertex-wise motion descriptors, and clustering yields a data-driven functional partition, providing an explicit motion-derived region structure. In Stage II, a Region-Specific Injection Module enforces masked, synchronized region exchange within a conditional VAE, preserving localized, region-specific dynamics and restricting cross-region mixing. A Phenotype-Adaptive Mixture-of-Experts prior conditioned on ED shape uses anatomy-guided cues to model latent motion trends and capture inter-disease variability. Experiments on three datasets covering different cardiovascular diseases show consistent gains in geometric and functional metrics and improved preservation of region-specific dynamics.

2605.21227 2026-05-21 cs.CL

Do LLMs Know What Luxembourgish Borrows? Probing Lexical Neology in Low-Resource Multilingual Models

Nina Hosseini-Kivanani

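Affiliations: University of Luxembourg; Radio Télévision Luxembourg (RTL)

AI summary: Using the LexNeo-Bench benchmark, this paper examines how low-resource multilingual models identify lexical borrowings. It finds that a linguistic knowledge graph substantially raises borrowing classification accuracy, indicating that lexical resources can serve as structured context for LLM evaluation.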

Comments Accepted to Neollm colocated with LREC2026, Three figures and three tables

Abstract

Large language models (LLMs) are increasingly used for writing assistance in small contact languages, yet it is unclear whether they respect community norms around lexical borrowing and neology. We introduce LexNeo-Bench, a 3,050-instance token-level benchmark derived from LuxBorrow, a large-scale Luxembourgish news corpus, where target tokens are labelled as native or as French, German, or English borrowings. Using this benchmark, we probe three multilingual LLMs across 34 prompt settings on two tasks: borrowing type classification and a binary lexical-innovation proxy (borrowing versus native). Without external context, models perform only slightly above chance on borrowing classification, so we construct a linguistic knowledge graph that encodes donor language, morphological patterns, and lexical analogues, and inject instance-specific subgraphs into the prompt. Knowledge-graph prompts raise borrowing classification accuracy from 25-35% to 71-81% and largely close the gap between small and large models, while leaving neology detection difficult and sensitive to few-shot design. Our results show that lexicon-aware prompting is highly beneficial for robust borrowing judgments in low-resource contact languages and that lexical resources can serve as structured context for LLM evaluation. This study was carried out within the ENEOLI COST Action and examines borrowing as a form of lexical innovation in multilingual Luxembourgish data.

2605.21226 2026-05-21 cs.LG cs.AI

OCTOPUS: Optimized KV Cache for Transformers via Octahedral Parametrization Under optimal Squared error quantization

Mark Boss, Vikram Voleti, Simon Donné, Shimon Vainer

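Affiliations: Stability AI

AI summary: OCTOPUS compresses the Transformer KV cache by jointly quantizing triplets of rotated coordinates. An octahedral parameterization maps each triplet's direction to a square, and Lloyd-Max quantization with a non-uniform bit allocation is then applied, matching or outperforming existing rotation codecs across text, video, and audio.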

Abstract

The key-value (KV) cache dominates memory bandwidth and footprint in long-context autoregressive inference. Recent rotation-preconditioned codecs (TurboQuant, PolarQuant) show that a structured random rotation followed by a per-coordinate scalar quantizer matched to an analytically tractable marginal is a near-optimal recipe for KV compression. OCTOPUS advances this paradigm through joint quantization of rotated coordinate triplets. Each triplet's direction is mapped to a square via an octahedral parameterization, and the two resulting coordinates and the triplet norm are Lloyd-Max quantized against implementation-matched marginals. Optimizing the per-triplet squared error gives a strictly non-uniform bit allocation depending only on the total dimensionality of the keys. We find the finite-dimensional quality optimum with sweeps to be constant on every real decoder we test. The codec is data-oblivious, online, and deterministic given a seed. Across text, video, and audio, OCTOPUS matches or beats every prior rotation codec at every reported bit width and metric, with a lead that grows as bits drop for extreme compression. Furthermore, a fused Triton implementation reconstructs keys on the fly without materializing the uncompressed key, so the codec adds no decode-time bandwidth or latency over the existing dequantization. Project Page: https://octopus-quant.github.io/
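
The octahedral parameterization is a standard mapping between unit directions in 3D and points of the square [-1, 1]^2. The Python sketch below shows that mapping and its inverse for a single triplet. It is a generic textbook version, not code from OCTOPUS: the rotation, the Lloyd-Max quantizers, and the bit allocation are omitted, and the function names are made up for this example.

```python
import math


def _sign(v):
    return 1.0 if v >= 0.0 else -1.0


def oct_encode(x, y, z):
    """Map the direction of a nonzero 3-vector to a point (u, v) in [-1, 1]^2."""
    s = abs(x) + abs(y) + abs(z)  # project onto the octahedron |x| + |y| + |z| = 1
    u, v = x / s, y / s
    if z < 0.0:  # fold the lower half of the octahedron outward over the diagonals
        u, v = (1.0 - abs(v)) * _sign(u), (1.0 - abs(u)) * _sign(v)
    return u, v


def oct_decode(u, v):
    """Inverse map: a point in [-1, 1]^2 back to a unit 3-vector."""
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:  # undo the fold
        u, v = (1.0 - abs(v)) * _sign(u), (1.0 - abs(u)) * _sign(v)
    n = math.sqrt(u * u + v * v + z * z)
    return u / n, v / n, z / n


# Round trip for a triplet: store (u, v) for the direction plus the norm.
triplet = (0.6, 0.0, -0.8)
norm = math.sqrt(sum(c * c for c in triplet))
u, v = oct_encode(*triplet)
restored = tuple(norm * c for c in oct_decode(u, v))
print((u, v), restored)
```

In a codec, `u`, `v`, and `norm` would each be quantized before storage. The round trip here is exact up to floating-point error because no quantization is applied.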

2605.21225 2026-05-21 cs.LG cs.AI

PREFINE: Preference-Based Implicit Reward and Cost Fine-Tuning for Safety Alignment

Richa Verma, Bavish Kulur, Sanjay Chawla, Balaraman Ravindran

Affiliations: TCS Research, [...] of CSE, IIT Madras, India; Department of Computing Science, [...] of Alberta, Canada; Qatar Computing Research Institute, [...] Bin Khalifa University, Qatar; Department of Data Science & AI, Wadhwani School of Data Science & AI, IIT Madras, India

AI summary: This study proposes PREFINE, a preference-based implicit reward and cost fine-tuning method that aligns policies for safety in continuous control environments by fine-tuning a pre-trained reinforcement learning policy to produce low-cost behavior while retaining high reward.

Comments Accepted at AAMAS 2026 as a full paper

Abstract

We address the problem of making a pre-trained reinforcement learning (RL) policy safety-aware by incorporating cost constraints without retraining it from scratch. While costs could be numerically encoded, we consider the more general setting in which costs are provided as preferences. Given a reward-optimized policy and a small dataset of preferred (low-cost) and dispreferred (high-cost) trajectories, our goal is to fine-tune the policy to generate low-cost behaviors while retaining high rewards. Unlike standard RLHF in language models, where preferences are defined over responses to the same prompt, our setting involves trajectory-level preferences in continuous control environments. We introduce PREFINE (Preference-based Implicit Reward and Cost Fine-Tuning for Safety Alignment), a preference-based fine-tuning method that adapts Direct Preference Optimization (DPO), now widely used for LLM fine-tuning, to the sequential decision-making setting. PREFINE constructs policy-sampled counterfactual trajectories to establish meaningful preference contrasts and jointly optimizes for reward retention and safety alignment. Empirically, PREFINE reduces constraint violations and catastrophic failures by over 60% while maintaining original reward behavior. PREFINE produces policies that achieve low-cost, high-reward performance with significantly improved data and computational efficiency compared to full offline RL or imitation learning, bridging preference alignment and safe policy adaptation in continuous domains.
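
For orientation, the standard DPO loss can be written for a pair of trajectories once each trajectory has a log-probability under the policy and under a frozen reference. The Python sketch below assumes that a trajectory's log-probability is the sum of its per-step action log-probabilities. That is an assumption about how DPO might carry over to control, and PREFINE's actual objective, including its reward-retention term, is not given in the abstract. All numbers are invented.

```python
import math


def trajectory_logprob(step_logprobs):
    """Log-probability of a trajectory as the sum of per-step action log-probs."""
    return sum(step_logprobs)


def dpo_loss(pol_pref, pol_dis, ref_pref, ref_dis, beta=0.1):
    """Standard DPO loss, -log(sigmoid(margin)), for one preference pair."""
    margin = beta * ((pol_pref - ref_pref) - (pol_dis - ref_dis))
    # Numerically stable softplus(-margin).
    if margin >= 0.0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical per-step log-probs under the frozen reference policy.
ref_pref = trajectory_logprob([-1.0, -1.2, -0.8])  # low-cost (preferred) trajectory
ref_dis = trajectory_logprob([-0.9, -1.1, -1.0])   # high-cost (dispreferred) trajectory

loss_at_start = dpo_loss(ref_pref, ref_dis, ref_pref, ref_dis)  # policy equals reference
loss_after = dpo_loss(ref_pref + 2.0, ref_dis - 2.0, ref_pref, ref_dis)
print(loss_at_start, loss_after)
```

The loss starts at log 2 when the policy equals the reference and falls as the policy shifts probability toward the low-cost trajectory and away from the high-cost one.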

2605.21207 2026-05-21 cs.CV

PGC: Peak-Guided Calibration for Generalizable AI-Generated Image Detection

Xiaoyu Zhou, Jianwei Fei, Peipeng Yu, Jingchang Xie, Chong Cheng, Zhihua Xia

发表机构 * College of Cyber Security, Jinan University, Guangzhou, China(暨南大学网络空间安全学院,中国广州) Department of Information Engineering, University of Florence, Florence, Italy(佛罗伦萨大学信息工程系,意大利佛罗伦萨) School of Integrated Circuits, Guangdong University of Technology, Guangzhou, China(广东工业大学集成电路学院,中国广州)

AI总结 本文提出PGC框架,通过峰值聚焦机制聚合显著特征,以校准全局决策,从而提高对细粒度判别信号的检测能力,并在CommGen15数据集上实现了最先进的性能。

详情
AI中文摘要

生成式AI的快速发展,从GANs到现代扩散模型,导致了越来越微妙的判别线索。这些细粒度信号常常被主导的高保真图像内容(例如主体)所掩盖,限制了现有主要依赖全局表示的检测器的可靠性。为了解决这一挑战,我们提出了峰值引导校准(PGC)框架。PGC引入了一种新的策略,通过峰值聚焦机制聚合显著特征。具体而言,通过采用对峰值敏感的聚合方法,强调最判别性的局部线索,PGC利用这些关键信号来校准全局决策。这种方法恢复了在全局上下文中被淹没的细微模式。此外,为了更好地模拟现实世界威胁,我们引入了CommGen15数据集,一个包含15个商业模型样本的具有挑战性的基准。广泛实验表明,PGC在性能上达到最先进的水平。具体而言,它在我们的CommGen15数据集上将平均准确率提高了+12.3%,并在标准基准上设定了新纪录,包括GenImage(+2.1%)、AIGI(+3.5%)和UniversalFakeDetect(+1.7%)。代码可在https://github.com/xiaoyu6868/PGC上获得。

英文摘要

The rapid evolution of generative AI, from GANs to modern diffusion models, has resulted in increasingly subtle discriminative clues. These fine-grained signals are often overshadowed by dominant, high-fidelity image content (e.g., the main subject), limiting the reliability of existing detectors that predominantly rely on global representations. To address this challenge, we propose the Peak-Guided Calibration (PGC) framework. PGC introduces a novel strategy that aggregates salient features via a peak-focusing mechanism. Specifically, by employing a peak-sensitive aggregation that accentuates the most discriminative local clues, PGC leverages these critical signals to calibrate the global decision. This approach recovers subtle patterns that would otherwise be submerged in the global context. Furthermore, to better simulate real-world threats, we introduce the CommGen15 dataset, a challenging benchmark comprising samples from 15 commercial models. Extensive experiments demonstrate that PGC achieves state-of-the-art performance. Specifically, it improves mean accuracy by +12.3% on our CommGen15 dataset, and sets new records on standard benchmarks, including GenImage (+2.1%), AIGI (+3.5%), and UniversalFakeDetect (+1.7%). Code is available at https://github.com/xiaoyu6868/PGC.

2605.21195 2026-05-21 cs.CV

RankE: End-to-End Post-Training for Discrete Text-to-Image Generation with Decoder Co-Evolution

RankE:结合解码器协同进化的离散文本到图像生成端到端后训练

Siyong Jian, Siyuan Li, Luyuan Zhang, Zedong Wang, Xin Jin, Ying Li, Cheng Tan, Huan Wang

发表机构 * Westlake University(西湖大学) Zhejiang University(浙江大学) Tsinghua University(清华大学) Hong Kong University of Science and Technology(香港科技大学) Shanghai AI Lab(上海人工智能实验室)

AI总结 本文提出RankE,一种端到端的后训练框架,通过解码器与策略的协同进化,解决离散自回归文本到图像生成中策略优化导致的潜在协变量偏移问题,同时提升图像质量和对齐度。

详情
AI中文摘要

离散自回归(AR)文本到图像(T2I)模型将VQ分词器与自回归策略结合,当前的后训练流程仅优化策略,而保持VQ解码器冻结。近期以REPA-E为代表的扩散T2I工作表明,VAE本身构成关键的对齐瓶颈,但离散AR模型尚无类似研究。我们证明,仅优化策略会引发潜在协变量偏移(Latent Covariate Shift):随着策略演化,生成的token分布偏离解码器训练时所依据的真实分布,导致奖励分数提升而解码图像质量下降。为解决这一不匹配,我们提出RankE,首个用于离散T2I生成的端到端后训练框架。RankE不是针对固定解码器优化策略,而是通过交替优化使两者协同进化:每个模块最大化基于排序的对齐目标,同时受到与其参数空间相适应的稳定性保持锚点的正则化。这种协同进化打破了困扰冻结解码器方法的保真度-对齐度权衡:在LlamaGen-XL(775M)上,标准RL提高CLIP但使FID变差,而RankE同时改善两者(在MS-COCO 30K上FID为15.21,CLIP为33.76)。在Janus-Pro(1B)上的一致收益证实,解码器协同进化能够可靠地将奖励优化转化为像素空间的质量提升。

英文摘要

Discrete autoregressive (AR) text-to-image (T2I) models pair a VQ tokenizer with an AR policy, and current post-training pipelines optimize only the policy while keeping the VQ decoder frozen. Recent diffusion T2I work, exemplified by REPA-E, has shown that the VAE itself constitutes a key alignment bottleneck, yet no analogous investigation exists for discrete AR models. We show that policy-only optimization induces Latent Covariate Shift: as the policy evolves, the resulting token distribution diverges from the ground-truth distribution on which the decoder was trained, such that reward scores improve while decoded image quality degrades. To address this mismatch, we propose RankE, the first end-to-end post-training framework for discrete T2I generation. Rather than optimizing the policy against a fixed decoder, RankE co-evolves both components through alternating optimization: each module maximizes a ranking-based alignment objective while being regularized by a stability-preserving anchor suited to its parameter space. This co-evolution breaks the fidelity-alignment trade-off that plagues frozen-decoder approaches: on LlamaGen-XL (775M), standard RL improves CLIP but degrades FID, whereas RankE improves both simultaneously (FID 15.21, CLIP 33.76 on MS-COCO 30K). Consistent gains on Janus-Pro (1B) confirm that decoder co-evolution reliably converts reward optimization into pixel-space quality improvements.

2605.21188 2026-05-21 cs.RO

A Terrain-Adaptive epsilon-Constraint MPC for Uneven Terrain Kinodynamic Planning

一种适应地形的epsilon约束MPC用于不规则地形运动动力学规划

Otobong Jerome, Geesara Kalathunga, Tiago Nascimento

发表机构 * Laboratório de Engenharia de Sistemas e Robótica, Universidade Federal da Paraíba(系统与机器人工程实验室,帕拉伊巴联邦大学) School of Computer Science, University of Lincoln(计算机科学学院,林肯大学)

AI总结 本文提出了一种适应地形的epsilon约束MPC方法,用于解决车辆在不规则地形上同时优化路径效率和姿态稳定性的规划问题,通过动态调整epsilon界限来实时探索帕累托前沿,并通过半参数模型结合分析车辆动力学和稀疏高斯过程来捕捉车辆-地形动力学。

详情
AI中文摘要

类车车辆在不平坦地形上的运动动力学规划需要同时优化相互竞争的目标,例如路径效率和姿态稳定性。本文提出了一种集成到模型预测控制(MPC)框架中的自适应epsilon约束方法,其中epsilon界限根据地形描述符动态调整,以实时探索帕累托前沿。为了捕捉车辆-地形动力学,我们开发了一种半参数模型,将解析的车辆动力学与在相同地形描述符上训练的稀疏高斯过程(SGP)相结合。所提出的epsilon-MPC与MPPI和GAKD基线方法进行了对比评估,实现了94%的导航成功率,同时将最大朝向偏差减少24%,并将多目标权衡质量提高23%。

英文摘要

Kinodynamic planning for car-like vehicles on uneven terrain requires simultaneously optimizing competing objectives such as path efficiency and pose stability. This work presents an adaptive epsilon-constraint method integrated into a Model Predictive Control (MPC) framework, where the epsilon bounds are dynamically adjusted based on terrain descriptors to explore the Pareto front in real time. To capture vehicle-terrain dynamics, we develop a semi-parametric model combining analytical vehicle dynamics with a Sparse Gaussian Process (SGP) trained on the same terrain descriptors. The proposed epsilon-MPC is evaluated against MPPI and GAKD baselines, achieving a 94% navigation success rate while reducing maximum orientation deviation by 24% and improving multi-objective trade-off quality by 23%.
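The epsilon-constraint idea in the abstract can be sketched on a toy problem: minimize one objective (path cost) subject to a bound epsilon on the other (instability), and tighten that bound as the terrain gets rougher. The sketch below works over a finite candidate set rather than inside an MPC solver; the linear `terrain_epsilon` schedule, its default bounds, and the scalar `roughness` descriptor are hypothetical choices for illustration, not the paper's formulation.

```python
def terrain_epsilon(roughness, eps_flat=0.5, eps_rough=0.1):
    """Hypothetical schedule: interpolate the instability bound from a loose
    value on flat ground to a tight value on rough ground (roughness in [0, 1])."""
    r = min(max(roughness, 0.0), 1.0)
    return eps_flat + r * (eps_rough - eps_flat)


def epsilon_constraint_select(candidates, epsilon):
    """Pick the candidate with the lowest path cost among those whose
    instability does not exceed epsilon.

    candidates: list of (path_cost, instability) tuples.
    If no candidate is feasible, fall back to the most stable one.
    """
    feasible = [c for c in candidates if c[1] <= epsilon]
    if not feasible:
        return min(candidates, key=lambda c: c[1])
    return min(feasible, key=lambda c: c[0])


# Three candidate trajectories: (path_cost, instability)
candidates = [(1.0, 0.45), (1.5, 0.25), (2.2, 0.05)]

flat_choice = epsilon_constraint_select(candidates, terrain_epsilon(0.0))
mid_choice = epsilon_constraint_select(candidates, terrain_epsilon(0.5))
rough_choice = epsilon_constraint_select(candidates, terrain_epsilon(1.0))
```

Sweeping the roughness from 0 to 1 moves the selection along the Pareto front from the shortest path to the most stable one, which is the trade-off the adaptive bound is meant to expose.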

2605.21186 2026-05-21 cs.CV cs.AI

SAM-Sode: Towards Faithful Explanations for Tiny Bacteria Detection

SAM-Sode:迈向微小细菌检测的可信解释

Wanying Tan, Shuo Yan, Dazhi Huang, Yazheng Liu, Zili Shao, Rufeng Chen, Hechang Chen, Mude Shi, Tianxing Ji, Sihong Xie

发表机构 * Shenzhen University, Shenzhen, China The Second Affiliated Hospital, Guangzhou Medical University, Guangzhou, China The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China Jilin University, Changchun, China Guangdong ACXEL Micro & Nano Tech Co., Ltd., Guangzhou, China

AI总结 本文提出SAM-Sode框架,通过几何感知提示和双约束机制提升微小细菌检测的解释性与透明度,有效抑制背景冗余并增强决策透明度。

Comments 10 pages, 4 figures, conference paper

详情
AI中文摘要

对象检测的可解释性为临床辅助诊断提供了关键的信心支持。然而,在微小细菌检测中,传统解释方法由于目标形态特征的极端稀疏性和复杂背景的严重干扰,常面临前景边界模糊和特征归因扩散的问题。这种限制阻碍了逻辑连贯的形态证据的提供。为解决这一问题,我们提出了一种新颖的可解释人工智能(XAI)框架SAM-Sode。该框架创新性地将初始特征归因图转换为几何感知提示,利用基础模型(SAM3)的先验知识实现空间细化和形态重建。此外,我们引入基于物理意义和几何对齐的双约束机制,进行实例级去噪,生成更符合人类专家直觉的解释。在我们自行构建的具有复杂电路背景的细菌数据集(包含2,524张图像)及其他公开数据集上的实验结果表明,所提出的方法有效抑制了背景冗余,并显著增强了微小物体检测的决策透明度。

英文摘要

Interpretability in object detection provides crucial confidence support for clinical auxiliary diagnosis. However, in tiny bacteria detection, traditional explanation methods often suffer from blurred foreground boundaries and diffuse feature attribution due to the extreme sparsity of target morphological features and severe interference from complex backgrounds. Such limitations hinder the provision of logically coherent morphological evidence. To bridge this gap, we propose a novel eXplainable AI (XAI) framework, SAM-Sode. The framework innovatively transforms initial feature attribution maps into geometry-aware prompts, leveraging the prior knowledge of the foundation model (SAM3) to achieve spatial refinement and morphological reconstruction of the explanatory mappings. Furthermore, we introduce a dual-constraint mechanism based on physical significance and geometric alignment to perform instance-level denoising, generating coherent explanations that better align with human expert intuition. Experimental results on our self-constructed bacteria dataset with complex circuit backgrounds (containing 2,524 images) and other public datasets demonstrate that the proposed method effectively suppresses background redundancy and significantly enhances the decision-making transparency of tiny object detection.

2605.21180 2026-05-21 cs.LG cs.SE

Domain-Adaptable Reinforcement Learning for Code Generation with Dense Rewards

基于密集奖励的领域可适应代码生成强化学习

Erfan Aghadavoodi Jolfaei, Daniel Maninger, Abhinav Anand, Mert Tiftikci, Mira Mezini

发表机构 * Hessian Center for Artificial Intelligence (hessian.AI)(黑森人工智能中心) National Research Center for Applied Cybersecurity ATHENE(应用网络安全国家研究中心ATHENE)

AI总结 本研究提出了一种领域可适应的强化学习框架,用于改进代码生成的正确性、质量和安全性,通过定制化的执行感知奖励公式和令牌级奖励映射机制,提高了代码生成在不同领域中的适应性和执行效率。

Comments 10 pages, 2 figures, under review

详情
AI中文摘要

大型语言模型在自动化代码生成中显示出强大的潜力,但缺乏正确性、质量和安全性的保证,特别是在领域特定约束方面。例如在机器人领域,代码生成越来越多地用于规划和执行动作,环境意识和物理约束至关重要。为了促进代码生成LLM适应多样化需求,包括领域特定需求,我们提出了一种强化学习框架,通过近端策略优化微调预训练LLM。我们的可定制执行感知奖励公式捕捉并优化语法、功能正确性、代码风格、安全性和模拟器可执行性。一个令牌级奖励映射机制使从执行结果到生成令牌的有效信用分配成为可能。该框架在通用代码生成(MBPP/MBPP+)和机器人程序合成(RoboEval)上进行了评估。结果表明,在功能正确性和模拟器可执行性方面有显著改进,包括在MBPP上的pass@1绝对增加19%,在RoboEval上的执行失败减少51%。这些发现表明,结构化的强化学习可以有效地将语言模型对齐到正确的程序生成和领域特定需求。

英文摘要

Large language models show strong potential for automated code generation, but lack guarantees for correctness, quality, safety, and domain-specific constraints. For instance in robotics, where code generation is increasingly being used for planning and executing actions, awareness of the environment and physical constraints is critical. To facilitate the adaption of code-generating LLMs to diverse requirements, including domain-specific ones, we present a reinforcement learning framework that fine-tunes pre-trained LLMs using proximal policy optimization. Our customizable execution-aware reward formula captures and optimizes syntax, functional correctness, code style, security, and simulator executability. A token-level reward mapping mechanism enables effective credit assignment from execution outcomes to generated tokens. The framework is evaluated on general-purpose code generation (MBPP/MBPP+) and robotic program synthesis (RoboEval). The results show substantial improvements in functional correctness and simulator executability, including an absolute pass@1 increase of 19% on MBPP and a reduction in execution failures by 51% on RoboEval. These findings demonstrate that structured reinforcement learning can effectively align language models to correct program generation and domain-specific requirements.
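The abstract reports results as pass@1. For readers unfamiliar with the metric, the widely used unbiased pass@k estimator draws n samples per problem, counts the c samples that pass all tests, and computes 1 - C(n-c, k) / C(n, k). The sketch below implements that general formula; whether the paper uses this exact estimator or a single greedy sample per problem is not stated in the abstract, so treat it as background rather than the authors' protocol.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for one problem.

    n: number of generated samples
    c: number of samples that pass all tests
    k: evaluation budget (1 <= k <= n)
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one passing sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results, k):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


single = pass_at_k(n=10, c=3, k=1)
benchmark = mean_pass_at_k([(10, 3), (10, 0), (10, 10)], k=1)
```

For k = 1 the estimator reduces to the fraction of passing samples, so an absolute increase in pass@1 is an increase in that fraction averaged over problems.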

2605.21178 2026-05-21 cs.CL

Metaphors in Literary Post-Editing: Opening Pandora's Box?

文学后编辑中的隐喻:打开潘多拉魔盒?

Aletta G. Dorst, Mayra O. Nas, Katinka Zeven

发表机构 * Leiden University Centre for Linguistics(莱顿大学语言研究中心)

AI总结 本文研究了文学文本后编辑者如何回应神经机器翻译和大型语言模型对隐喻的翻译方式,发现三分之一的隐喻被后编辑者修改,表明文学机器翻译中隐喻翻译存在问题,且后编辑工作比从头翻译更耗力。

Comments This paper has been accepted for presentation at the EAMT Conference 2026, which will take place in Tilburg from June 15 to 18, 2026

详情
AI中文摘要

本文探讨了文学文本后编辑者对神经机器翻译和大型语言模型翻译隐喻的反应和回应。研究结果表明,输出中三分之一的隐喻被后编辑者修改,证明了文学机器翻译(LitMT)中隐喻翻译确实存在问题。回应表明,后编辑者意识到过于直译的翻译,尽管大多针对多词表达。有时他们难以判断解决方案是否可接受。他们对MT输出的整体质量评价较差,并表示后编辑工作比从头翻译更加费力。这支持了先前研究的观点,即后编辑限制了翻译者的创造力并削弱了他们对文本的所有权感。

英文摘要

This paper investigates how post-editors of literary texts react and respond to the way metaphors have been translated by Neural Machine Translation (NMT) and Large Language Models (LLMs). The results show that one in three metaphors in the output were changed by the post-editors, demonstrating that the translation of figurative language is indeed problematic in literary MT (LitMT). The responses indicate that the post-editors were aware of overly literal translations, though mostly for multiword expressions. Moreover, at times they found it difficult to determine whether solutions were acceptable. They rated the overall quality of the MT output as quite poor and stated that the post-editing was more work and more effort than it would have been translating from scratch. This supports previous studies arguing that post-editing constrains translators in their creativity and diminishes their sense of text ownership.

2605.21177 2026-05-21 cs.LG cs.CL

ChunkFT: Byte-Streamed Optimization for Memory-Efficient Full Fine-Tuning

ChunkFT:用于内存高效全参数微调的字节流式优化

Yongkang Liu, Zijing Wang, Mengjie Zhao, Ercong Nie, Mingyang Wang, Qian Li, Feiliang Ren, Shi Feng, Daling Wang, Hinrich Schütze

发表机构 * Northeastern University, China(中国东北大学) Shanghai Jiao Tong University, China(上海交通大学) CIS, LMU Munich, Germany(慕尼黑大学CIS实验室) MCML, Germany(德国MCML实验室) Shandong University, China(山东大学)

AI总结 本文提出ChunkFT框架,通过动态激活的工作集重新定义全参数微调,实现了无需修改网络架构即可对任意子张量进行梯度计算,理论分析和实验表明其在内存使用、运行时间和优化质量上均有效,且在下游任务中表现优于现有内存高效基线。

详情
AI中文摘要

本文提出了ChunkFT,一种内存高效的微调框架,其通过动态激活的工作集重新定义全参数微调。ChunkFT能够在不修改网络架构的情况下,对任意子张量进行梯度计算,为优化任意子网络提供了算法基础,同时避免了标准密集梯度计算。在确定性设置下,我们提供了ChunkFT的理论收敛分析。实验中,我们使用单块RTX 4090-24GB GPU和两块H800-80GB GPU分别对Llama 3-8B和Llama 3-70B进行微调。一个7B模型在1K输入长度下的全参数微调仅需13.72GB的GPU内存。结果表明,ChunkFT在内存使用、运行时间和优化质量上均有效。此外,在语言理解、数学推理和MT-Bench等下游任务中,ChunkFT在性能上一致优于现有内存高效的基线。值得注意的是,ChunkFT在某些情况下甚至超过了全参数微调的性能。我们的代码库可在https://github.com/misonsky/chunk上找到。

英文摘要

This work presents ChunkFT, a memory-efficient fine-tuning framework that reformulates full-parameter fine-tuning around a dynamically activated working set. ChunkFT enables gradient computation for arbitrary sub-tensors without modifying the network architecture, providing an algorithmic foundation for optimizing arbitrary sub-networks while avoiding standard dense gradient computation. We provide a theoretical convergence analysis of ChunkFT in the deterministic setting. Empirically, we apply ChunkFT to fine-tune Llama 3-8B and Llama 3-70B using a single RTX 4090-24GB GPU and 2× H800-80GB GPUs, respectively. Full-parameter fine-tuning of a 7B model with a 1K input length requires only 13.72GB of GPU memory. The results demonstrate the effectiveness of ChunkFT in memory usage, running time, and optimization quality. Moreover, downstream evaluations on language understanding, mathematical reasoning, and MT-Bench show that ChunkFT consistently outperforms existing memory-efficient baselines. Notably, ChunkFT achieves performance comparable to, and in some cases exceeding, full-parameter fine-tuning. Our repository is at https://github.com/misonsky/chunk.

2605.21171 2026-05-21 cs.CV

FTerViT: Fully Ternary Vision Transformer

FTerViT:全三进制视觉变换器

Szymon Ruciński, Pietro Bonazzi, Engin Türetken, Simon Narduzzi, Michele Magno, Nadim Maamari

发表机构 * CSEM(瑞士电子与微技术中心) ETH Zürich(苏黎世联邦理工学院)

AI总结 本文提出了一种全三进制视觉变换器(FTerViT),通过将所有权重矩阵和归一化参数三进制化,实现了模型压缩,同时在资源受限的微控制器上实现了高效的部署。

Comments Preprint

详情
AI中文摘要

三进制视觉变换器(Ternary Vision Transformers)提供了显著的模型压缩,但目前最先进的方法仅将编码器层三进制化,而留下的补丁嵌入、归一化参数和分类头仍保持全精度。在针对资源受限处理器(如微控制器)的紧凑模型中,这些剩余的全精度组件决定了总内存占用,严重限制了部署效率和设备可行性。在本工作中,我们引入了一种完全三进制化的视觉变换器,其中所有权重矩阵和归一化参数均被三进制化(FTerViT)。为此,我们引入了两个新的操作符:具有通道缩放的三进制位卷积(TernaryBitConv2d)用于补丁嵌入,以及三进制归一化(TernaryLayerNorm)。FTerViT通过知识蒸馏进行训练,随后进行轻量级量化感知恢复阶段。我们的三进制W2A8 DeiT-III-S在384×384分辨率下达到82.43%的ImageNet-1K Top-1精度,内存占用为6.09MB(约15倍压缩,相比FP32降低2.42个点),优于先前的三进制ViT方法多达8个点。最后,我们展示了在ESP32-S3系统芯片上的双核XTensa LX7微控制器上首次实现三进制视觉变换器。通过部署FTerViT-Small(基于224×224分辨率的DeiT-III-Small,内存占用5.81MB),我们实现了79.64%的ImageNet-1K Top-1精度。

英文摘要

Ternary Vision Transformers offer substantial model compression; however, state-of-the-art methods only ternarize the encoder layers, leaving patch embeddings, LayerNorm parameters, and classifier heads in full precision. In compact models targeting resource-constrained processors, such as microcontrollers, these remaining full-precision components determine the total memory footprint, severely limiting deployment efficiency and on-device feasibility. In this work, we introduce a fully ternarized Vision Transformer in which all weight matrices and normalization parameters are ternarized (FTerViT). To this end, we introduce two novel operators: TernaryBitConv2d with per-channel scaling for patch embedding, and TernaryLayerNorm. FTerViT is trained using knowledge distillation, followed by a lightweight quantization-aware recovery phase. Our ternary W2A8 DeiT-III-S at 384×384 resolution achieves 82.43% ImageNet-1K top-1 accuracy at 6.09 MB (~15× compression, −2.42 pp vs. FP32), outperforming prior ternary ViT methods by up to 8 pp. Finally, we demonstrate the first implementation of ternary vision transformers on a dual-core Xtensa LX7 microcontroller inside the ESP32-S3 system-on-chip. By deploying FTerViT-Small (based on DeiT-III-Small at 224×224 resolution, 5.81 MB), we achieve 79.64% ImageNet-1K top-1 accuracy.
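To make "ternarized with per-channel scaling" concrete, the sketch below maps each weight to a code in {-1, 0, +1} plus one floating-point scale per output channel. It uses a common threshold heuristic (0.7 times the mean absolute weight) purely as an assumption; the abstract does not say how TernaryBitConv2d or TernaryLayerNorm choose thresholds or scales, so this is not FTerViT's operator.

```python
def ternarize(weights, threshold_factor=0.7):
    """Ternarize a non-empty list of weights into codes in {-1, 0, +1} and a scale.

    threshold_factor is an illustrative heuristic, not taken from the paper.
    """
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = threshold_factor * mean_abs
    codes = [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]
    kept = [abs(w) for w, q in zip(weights, codes) if q != 0]
    # The scale is the mean magnitude of the weights that survive the threshold.
    scale = sum(kept) / len(kept) if kept else 0.0
    return codes, scale


def ternarize_per_channel(matrix, threshold_factor=0.7):
    """Apply ternarize independently to each row (one row per output channel)."""
    return [ternarize(row, threshold_factor) for row in matrix]


def dequantize(codes, scale):
    """Reconstruct approximate weights from ternary codes and their scale."""
    return [scale * q for q in codes]


codes, scale = ternarize([0.9, -0.8, 0.05, -0.02, 0.6])
channels = ternarize_per_channel([[0.9, -0.8, 0.05], [0.01, 0.02, -0.5]])
approx = dequantize(codes, scale)
```

Each ternary code fits in 2 bits, against 32 bits for an FP32 weight, which is why the components left in full precision can dominate the footprint of a small model.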

2605.21164 2026-05-21 cs.LG quant-ph

Q-SYNTH: Hybrid Quantum-Classical Adversarial Augmentation for Imbalanced Fraud Detection

Q-SYNTH:混合量子-经典对抗增强用于不平衡欺诈检测

Adam Innan, Mansour El Alami, Nouhaila Innan, Muhammad Shafique, Mohamed Bennai

发表机构 * Quantum Physics and Spintronics Team, LPMC, Faculty of Sciences Ben M'sick(量子物理与自旋电子学团队,LPMC,本姆西克理学院) eBRAIN Lab, Division of Engineering, New York University Abu Dhabi (NYUAD)(eBRAIN实验室,工程学部,纽约大学阿布扎比分校(NYUAD)) Center for Quantum and Topological Systems (CQTS), NYUAD Research Institute(量子与拓扑系统中心(CQTS),NYUAD研究院)

AI总结 本文提出Q-SYNTH,一种混合量子-经典对抗框架,用于生成不平衡欺诈检测中的少数类样本,通过量子电路生成器和经典神经网络判别器,提升欺诈检测的召回率和F1分数。

Comments 13 pages, 6 figures

详情
AI中文摘要

信用卡欺诈检测受到极端类别不平衡的挑战,其中欺诈交易稀少但在业务上至关重要。这种不平衡通常使监督学习器偏向合法类别,导致整体准确率高但欺诈类召回率和F1分数较弱。本文介绍了Q-SYNTH,一种混合经典-量子生成对抗框架,其中参数化量子电路作为生成器,经典神经网络作为判别器。Q-SYNTH面向表格数据中的少数类欺诈样本合成,并从两个维度进行评估:生成样本与真实欺诈样本的统计保真度,以及下游欺诈检测性能。为此,生成的样本通过基于Kolmogorov-Smirnov统计量和Wasserstein距离的分布相似性度量进行评估,通过AUC-ROC衡量真实样本与合成样本的可区分性,并在量子和经典分类器上评估下游分类性能。在所报告的实验协议下,Q-SYNTH与经典GAN基线相比减少了边缘分布不匹配,同时保持了具有竞争力的下游欺诈检测性能。尽管SMOTE在逐特征相似性方面最强,而经典GAN在若干设置中达到最高的下游性能,Q-SYNTH在分布保真度和下游性能之间提供了良好的折中,支持了混合量子增强在不平衡欺诈检测中的可行性。

英文摘要

Credit card fraud detection is fundamentally challenged by extreme class imbalance, where fraudulent transactions are rare yet operationally critical. This imbalance often biases supervised learners toward the legitimate class, leading to high overall accuracy but weaker fraud-class recall and F1-score. This paper introduces Q-SYNTH, a hybrid classical-quantum generative adversarial framework in which a parameterized quantum circuit serves as the generator and a classical neural network serves as the discriminator. Q-SYNTH is designed for minority-class fraud synthesis in tabular data and is evaluated along two dimensions: statistical fidelity to real fraud samples and downstream performance for fraud detection. To this end, generated samples are assessed using distributional similarity measures based on Kolmogorov-Smirnov statistics and Wasserstein distances, real-vs-synthetic detectability measured by AUC-ROC, and downstream classification performance across both quantum and classical classifiers. Under the reported protocol, Q-SYNTH reduces marginal distribution mismatch relative to a classical GAN baseline while maintaining competitive downstream fraud-detection performance. Although SMOTE achieves the strongest feature-wise similarity and the classical GAN attains the highest downstream performance in several settings, Q-SYNTH offers a favorable compromise between distributional fidelity and downstream performance, supporting the feasibility of hybrid quantum augmentation for imbalanced fraud detection.
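The fidelity measures named in the abstract can be illustrated per feature. The sketch below computes the two-sample Kolmogorov-Smirnov statistic (the largest gap between two empirical CDFs) and, for equal-sized samples, the 1-D Wasserstein-1 distance (mean absolute difference of sorted values). The function names and the toy feature values are illustrative; the paper's exact evaluation pipeline is not described in the abstract.

```python
from bisect import bisect_right


def ks_statistic(real, synthetic):
    """Two-sample KS statistic: max |F_real(x) - F_synth(x)| over all sample points."""
    a, b = sorted(real), sorted(synthetic)
    # The supremum of the ECDF gap is attained at one of the observed values.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def wasserstein_1d_equal_n(real, synthetic):
    """1-D Wasserstein-1 distance for two samples of equal size."""
    if len(real) != len(synthetic):
        raise ValueError("this simplified version assumes equal sample sizes")
    pairs = zip(sorted(real), sorted(synthetic))
    return sum(abs(x - y) for x, y in pairs) / len(real)


# Toy values for one feature of real and generated fraud records
real_amounts = [1.0, 2.0, 3.0, 4.0]
synthetic_amounts = [3.0, 4.0, 5.0, 6.0]

ks = ks_statistic(real_amounts, synthetic_amounts)
w1 = wasserstein_1d_equal_n(real_amounts, synthetic_amounts)
```

Both measures are computed on one feature at a time, so they capture marginal mismatch only; dependencies between features are what the real-vs-synthetic AUC-ROC check is better placed to reveal.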

2605.21160 2026-05-21 cs.LG

Learning First Integrals via Backward-Generated Data and Guided Reinforcement Learning

通过反向生成数据和引导强化学习学习第一积分

Jingfeng Zhong, Zhengxiang Liu, Zhijie Wang, Shuai Li

发表机构 * Shanghai Jiao Tong University(上海交通大学)

AI总结 本文提出FISolver,一种基于LLM的求解器,通过反向生成数据和引导强化学习方法,解决第一积分发现中的数据稀缺问题,并在挑战性基准上显著优于其他方法。

Comments 17 pages, 2 figures, 3 tables

详情
AI中文摘要

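发现第一积分对理解动力系统中的守恒律具有根本的科学意义。然而,现有的符号计算工具和大语言模型(LLM)在这一任务上仍然能力有限,因为高质量的训练数据稀缺,且成功的求解往往依赖数学直觉。本文提出FISolver,一种为应对这一挑战而开发的基于LLM的求解器。首先,我们提出一种“反向生成”算法,通过从采样得到的积分推导微分方程,系统地构建大规模的(微分方程,第一积分)配对数据集,从而缓解数据稀缺瓶颈。其次,我们对一个紧凑的数学模型进行监督微调,并通过采用基于Levenshtein距离的塑形奖励的强化学习进一步提升其性能。此外,我们设计了数据合成与混合策略,支持从稀疏样例有效适应困难的问题族。实验表明,FISolver在所需计算成本显著更低的情况下,在具有挑战性的基准上显著优于更大的数学LLM以及Mathematica等商业求解器,为第一积分的自动发现指明了一条新的数据驱动路径。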

英文摘要

The discovery of first integrals is of fundamental scientific importance for understanding conservation laws in dynamical systems. However, existing symbolic computation tools and Large Language Models (LLMs) remain limited on this task because high-quality training data are scarce and successful solutions often depend on mathematical intuition. This paper presents FISolver, an LLM-based solver developed to address this challenge. First, we introduce a "Backward Generation" algorithm that systematically builds large-scale datasets of (differential equation, first integral) pairs by deriving differential equations from sampled integrals, thereby alleviating the data scarcity bottleneck. Second, we apply supervised fine-tuning to a compact mathematical model and further improve its performance through reinforcement learning with a Levenshtein Distance-based shaped reward. In addition, we design data synthesis and blending strategies that support effective adaptation to difficult problem families from sparse examples. Experiments show that FISolver, while requiring substantially lower computational cost, significantly outperforms larger mathematical LLMs and commercial solvers such as Mathematica on challenging benchmarks, indicating a new data-driven route for automated discovery of first integrals.
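A Levenshtein-distance-based shaped reward gives partial credit to an output that is a few edits away from the reference instead of scoring only exact matches. The sketch below is one plausible shape, assuming character-level edit distance normalized by the longer string; the abstract does not give FISolver's actual reward formula, tokenization, or normalization.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,                # delete ca
                cur[j - 1] + 1,             # insert cb
                prev[j - 1] + (ca != cb),   # substitute, or match at zero cost
            ))
        prev = cur
    return prev[-1]


def shaped_reward(predicted, reference):
    """Hypothetical reward in [0, 1]: 1 for an exact match, decreasing with
    the normalized edit distance to the reference expression."""
    if not predicted and not reference:
        return 1.0
    longest = max(len(predicted), len(reference))
    return 1.0 - levenshtein(predicted, reference) / longest


exact = shaped_reward("x**2 + y**2", "x**2 + y**2")
near = shaped_reward("x**2 - y**2", "x**2 + y**2")
far = shaped_reward("sin(x)", "x**2 + y**2")
```

A string-level reward like this is dense but syntactic: two algebraically equivalent expressions written differently would be penalized, so in practice it would likely be paired with a symbolic correctness check.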

2605.21157 2026-05-21 cs.CV cs.AI cs.LG cs.RO

Comparative Analysis of Military Detection Using Drone Imagery Across Multiple Visual Spectrums

多光谱下无人机影像用于军事检测的比较分析

Sourov Roy Shuvo, Prajwal Panth, Rajesh Chowdhury, Sorup Chakraborty, Sudip Chakrabarty, Prasant Kumar Pattnaik

发表机构 * School of Computer Engineering, KIIT Deemed to be University(KIIT准大学计算机工程学院)

AI总结 本文研究了不同光谱条件下无人机影像用于军事目标检测的问题,通过构建四种不同数据集(灰度、热成像、夜视和模糊成像)来评估模型在不同环境下的性能,并训练YOLOv11-small模型进行目标检测,以提升无人机作战的性能和可靠性。

Comments 6 pages, 7 figures. Accepted at the 16th International Conference on Computing, Communication and Networking Technologies (ICCCNT), July 6-11, 2025, IIT Indore. Proceedings pending publication

详情
AI中文摘要

在现代战争中,无人机正成为在各类敌对环境中进行情报收集和实施精确打击的重要组成部分。它们能够从安全距离在敌对环境中实时作业,这使其在监视和军事行动中具有极高的价值。KIIT-MiTA数据集由无人机拍摄的不同军事场景图像组成,为检测军事目标提供了基础,但未考虑各种现实场景。为此,为了评估模型在不同条件下的表现,创建了四种不同类型的数据集:灰度(Gray Scale)、热成像(Thermal Vision)、夜视(Night Vision)和模糊成像(Obscura Vision),用以模拟低能见度、热成像和夜间条件等现实环境。研究训练并使用YOLOv11-small模型在不同设置下检测目标。本研究通过推动防御和进攻任务中先进检测系统的开发,提升了无人机作战的性能和可靠性。

英文摘要

In modern warfare, drones are becoming an essential part of intelligence gathering and carrying out precise attacks in different kinds of hostile environments. Their ability to operate in real-time and hostile environments from a safe distance makes them invaluable for surveillance and military operations. The KIIT-MiTA dataset is comprised of images of different military scenarios taken from drones, and these provide a foundation for detecting military objects, but it does not take into account the various types of real-world scenarios. With that in mind, to evaluate how the models are performing under varying conditions, four different types of datasets are created: Gray Scale, Thermal Vision, Night Vision, and Obscura Vision. These simulate the real-world environments such as low visibility, heat-based imagery, and nighttime conditions. The YOLOv11-small model is trained and used to detect objects across diverse settings. This research boosts the performance and reliability of drone-based operations by contributing to the development of advanced detection systems in both defensive and offensive missions.

2605.21154 2026-05-21 cs.CL cs.AI cs.LG

Automated ICD Classification of Psychiatric Diagnoses: From Classical NLP to Large Language Models

精神病诊断的ICD分类自动化:从经典NLP到大语言模型

Fernando Ortega, Raúl Lara-Cabrera, Jorge Dueñas-Lerín, Alejandro de la Torre-Luque, Mercé Salvador Robert, Enrique Baca-García

发表机构 * Department of Sistemas Informáticos, Universidad Politécnica de Madrid, Spain(西班牙马德里理工大学信息系统系) KNODIS Research Group, Universidad Politécnica de Madrid, Spain(西班牙马德里理工大学KNODIS研究组) CIBERSAM ISCIII, Spain(西班牙ISCIII CIBERSAM) Department of Legal Medicine, Psychiatry and Pathology, Complutense University of Madrid, Spain(西班牙马德里康普顿斯大学法医学、精神病学与病理学系) Hospital Universitario de Móstoles, Universidad Rey Juan Carlos, Spain(西班牙胡安·卡洛斯国王大学莫斯托莱斯大学医院) Department of Psychiatry, University Hospital Jimenez Díaz Fundation, Madrid, Spain(西班牙马德里希门尼斯·迪亚斯基金会大学医院精神科) Department of Psychiatry, University Hospital Rey Juan Carlos, Móstoles, Spain(西班牙莫斯托莱斯胡安·卡洛斯国王大学医院精神科) Department of Psychiatry, General Hospital of Villalba, Madrid, Spain(西班牙马德里比利亚尔瓦综合医院精神科) Department of Psychiatry, University Hospital Infanta Elena, Madrid, Spain(西班牙马德里埃莱娜公主大学医院精神科) Department of Psychology, Universidad Catolica del Maule, Talca, Chile(智利塔尔卡马乌莱天主教大学心理学系) Department of Psychiatry, Madrid Autonomous University, Madrid, Spain(西班牙马德里自治大学精神科)

AI总结 本研究提出利用NLP和机器学习技术将自由文本描述映射到国际疾病分类(ICD),以自动化精神病诊断分析,通过评估从经典频率模型到先进大语言模型的多种文本表示方法,展示了transformer嵌入在捕捉隐含语义线索和细致医学术语方面的优势。

详情
AI中文摘要

心理健康已成为全球优先事项,这给临床诊断编码带来了巨大的行政负担。本研究提出利用自然语言处理(NLP)和机器学习(ML)技术,将自由文本描述映射到国际疾病分类(ICD),从而实现精神科诊断分析的自动化。研究利用包含145,513条西班牙语精神科描述的专用数据集,评估了从经典的基于频率的模型(BoW、TF-IDF)到先进的大语言模型(如e5_large、BioLORD和Llama-3-8B)的多种文本表示范式。结果表明,基于transformer的嵌入能够捕捉隐含的语义线索和细致的医学术语,始终优于传统方法。经过端到端微调的e5_large模型取得了最高性能,F1_micro得分为0.866。本研究表明,使大语言模型适应特定的临床术语体系,对于克服“长尾”标签分布以及精神科话语固有的模糊性至关重要。

英文摘要

Mental health has become a global priority, leading to a massive administrative burden in the coding of clinical diagnoses. This study proposes the automation of psychiatric diagnostic analysis by mapping free-text descriptions to the International Classification of Diseases (ICD) using Natural Language Processing (NLP) and Machine Learning (ML) techniques. Utilizing a specialized dataset of 145,513 Spanish psychiatric descriptions, various text representation paradigms were evaluated, ranging from classical frequency-based models (BoW, TF-IDF) to state-of-the-art Large Language Models (LLMs) such as e5_large, BioLORD, and Llama-3-8B. Results indicate that transformer-based embeddings consistently outperform traditional methods by capturing implicit semantic cues and nuanced medical terminology. The e5_large model, through end-to-end fine-tuning, achieved the highest performance with an F1_micro score of 0.866. This research demonstrates that adapting LLMs to specific clinical nomenclature is essential for overcoming the challenges of "long-tail" label distributions and the inherent ambiguity of psychiatric discourse.
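The headline number is a micro-averaged F1 score. As a reminder of what that measures, micro-F1 pools true positives, false positives, and false negatives over all records and labels before computing F1, so frequent codes weigh more than rare ones. The sketch below assumes each record carries a set of ICD codes; the example codes are illustrative and the paper's exact label setup (single-label or multi-label) is not stated in the abstract.

```python
def micro_f1(true_labels, predicted_labels):
    """Micro-averaged F1 over a list of label sets (one set per record)."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        truth, pred = set(truth), set(pred)
        tp += len(truth & pred)   # codes predicted and correct
        fp += len(pred - truth)   # codes predicted but wrong
        fn += len(truth - pred)   # codes missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Illustrative ICD-10 codes, not taken from the paper's dataset
gold = [{"F32"}, {"F41", "F32"}, {"F20"}]
pred = [{"F32"}, {"F41"}, {"F31"}]

score = micro_f1(gold, pred)
```

Because the counts are pooled, a model can reach a high micro-F1 while still doing poorly on rare codes, which is why the abstract's remark about long-tail label distributions matters when reading the 0.866 figure.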

2605.21150 2026-05-21 cs.RO

EllipseLIO: Adaptive LiDAR Inertial Odometry with an Ellipsoid Representation

EllipseLIO: 一种基于椭球表示的自适应激光雷达惯性里程计

Rowan Border, Margarita Chli

发表机构 * Vision for Robotics Lab (V4RL)(机器人视觉实验室)

AI总结 本文提出EllipseLIO,一种基于椭球表示的实时激光雷达惯性里程计,通过自适应的激光雷达扫描过滤和配准方法,在不同环境和传感器下实现鲁棒的里程计性能,实验表明其在多种复杂场景中表现最优。

Comments 8 pages, 6 figures, 2 tables

详情
AI中文摘要

激光雷达惯性里程计(LIO)是许多需要无外部定位(如GPS)导航的移动机器人中的关键组件。在不同环境中自主运行且配备异构激光雷达传感器的平台需要一种能够适应这些不同场景且无需人工干预的LIO方法。现有LIO方法通常在环境和传感器相似时能提供可靠且准确的里程计,但许多方法在异构环境和传感器中保持鲁棒性时面临困难。本文提出了EllipseLIO,一种实时LIO方法,通过使用适应于传感器能力和环境的激光雷达扫描过滤和配准方法,在不同场景间进行泛化。在五个具有多样性和挑战性的数据集上,EllipseLIO与最先进的LIO方法的实验表明,EllipseLIO总体表现最佳。它在平均上比第二好的方法的里程计误差低38%,并且是唯一一个在所有实验中均不发散的方法。EllipseLIO的开源版本将在github.com/v4rl-ucy/ellipselio上提供。

英文摘要

LiDAR Inertial Odometry (LIO) is a critical component for many mobile robots that need to navigate without relying on external positioning (e.g., GPS). Platforms that operate autonomously in different environments and with heterogeneous LiDAR sensors require a LIO approach that can adapt to these different scenarios without human intervention. Existing LIO approaches can typically provide reliable and accurate odometry in scenarios with similar environments and sensors when suitably tuned. However, many approaches struggle to retain robust odometry across heterogeneous environments and sensors while using a consistent configuration. This paper presents EllipseLIO, a real-time LIO approach that generalises between scenarios by using methods for LiDAR scan filtering and registration that adapt to the sensor capabilities and environment without requiring scenario-specific tuning. Experiments with EllipseLIO and state-of-the-art LIO approaches on five datasets with diverse and challenging scenarios demonstrate that EllipseLIO is the best-performing approach overall. It achieves a 38% lower odometry error on average than the second-best approach and is the only approach that does not diverge in any experiment. An open-source version of EllipseLIO will be available at github.com/v4rl-ucy/ellipselio.

2605.21147 2026-05-21 cs.LG cs.CL

SMoA: Spectrum Modulation Adapter for Parameter-Efficient Fine-Tuning

SMoA:用于参数高效微调的频谱调制适配器

Yongkang Liu, Xing Li, Mengjie Zhao, Shanru Zhang, Zijing Wang, Qian Li, Shi Feng, Feiliang Ren, Daling Wang, Hinrich Schütze

发表机构 * Northeastern University, China(东北大学,中国) Shandong University, China(山东大学,中国) CIS, LMU Munich, Germany(慕尼黑大学CIS中心,德国) MCML, Germany(德国MCML)

AI总结 本文提出SMoA,一种频谱感知更新的适配器,通过在较小的参数预算下扩大可访问的频谱更新家族,提升参数高效微调的性能。

详情
AI中文摘要

随着模型参数数量的增加,参数高效微调(PEFT)已成为定制预训练大语言模型的首选方法。低秩适应(LoRA)使用低秩更新方法来模拟全参数微调,广泛用于减少资源需求。然而,降低秩面临代表能力有限的挑战。理论表明,LoRA微调秩r收敛于预训练权重矩阵的前r个奇异值。随着秩的增加,更多主奇异方向被保留,通常会提高模型性能。然而,更大的秩也会引入更多的可训练参数,导致更高的计算成本。为克服这一矛盾,我们提出SMoA,一种频谱调制适配器,通过在较小的参数预算下扩大可访问的频谱感知更新家族。SMoA将层分成多个对齐的频谱块,并在每个对角块上应用一个块内Hadamard调制的低秩分支,从而获得更广泛的预训练频谱方向覆盖。我们提供了多个任务的理论分析和实证结果。在我们的实验中,SMoA在当前较低预算设置下优于LoRA和具有竞争力的LoRA风格基线。

英文摘要

As the number of model parameters increases, parameter-efficient fine-tuning (PEFT) has become the go-to choice for tailoring pre-trained large language models. Low-rank Adaptation (LoRA) uses a low-rank update method to simulate full parameter fine-tuning, which is widely used to reduce resource requirements. However, decreasing the rank encounters challenges with limited representational capacity. Theory suggests that LoRA fine-tuning with rank r converges toward the top r singular values of the pre-trained weight matrix. As the rank increases, more principal singular directions are preserved, which generally improves the model's performance. However, a larger rank also introduces more trainable parameters, leading to higher computational cost. To overcome this dilemma, we propose SMoA, a \textbf{S}pectrum \textbf{Mo}dulation \textbf{A}dapter that enlarges the accessible family of spectrum-aware updates under a smaller parameter budget. SMoA partitions the layer into multiple aligned spectral blocks and applies one in-block Hadamard-modulated low-rank branch to each diagonal block, yielding broader coverage of pretrained spectral directions. We provide theoretical analysis and empirical results on multiple tasks. In our experiments, SMoA improves average performance in the current lower-budget setting over LoRA and competitive LoRA-style baselines.
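The rank-versus-budget tension described in the abstract can be made concrete with simple counting. A rank-r LoRA update on a d_out × d_in matrix trains r·(d_out + d_in) parameters and has rank at most r. If the matrix is instead split into k diagonal blocks with one rank-r' branch each, the parameter count is r'·(d_out + d_in) regardless of k, while the block-diagonal update can reach rank k·r'. The sketch below only does this arithmetic; it is an assumption-laden simplification that ignores SMoA's Hadamard modulation and spectral alignment, and the function names are hypothetical.

```python
def lora_params(d_out, d_in, rank):
    """Trainable parameters of a rank-`rank` LoRA update (factors B and A)."""
    return rank * (d_out + d_in)


def block_lowrank_params(d_out, d_in, num_blocks, rank_per_block):
    """Trainable parameters when each of `num_blocks` diagonal blocks gets
    its own low-rank branch of rank `rank_per_block`."""
    if d_out % num_blocks or d_in % num_blocks:
        raise ValueError("dimensions must be divisible by num_blocks")
    per_block = rank_per_block * (d_out // num_blocks + d_in // num_blocks)
    return num_blocks * per_block


def max_block_update_rank(d_out, d_in, num_blocks, rank_per_block):
    """Upper bound on the rank of the resulting block-diagonal update."""
    return min(num_blocks * rank_per_block, d_out, d_in)


d = 4096
lora_budget = lora_params(d, d, rank=8)
same_budget = block_lowrank_params(d, d, num_blocks=4, rank_per_block=8)
half_budget = block_lowrank_params(d, d, num_blocks=4, rank_per_block=4)
half_budget_rank = max_block_update_rank(d, d, num_blocks=4, rank_per_block=4)
```

In this toy setting, four blocks of rank 4 use half the parameters of rank-8 LoRA while allowing an update of rank up to 16, at the cost of restricting the update to the diagonal blocks.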

2605.21138 2026-05-21 cs.RO

Safety-Critical Control for Smoothed Implicit Contact Dynamics

面向平滑隐式接触动力学的安全关键控制

Haegu Lee, Yitaek Kim, Christoffer Sloth

发表机构 * The Maersk Mc-Kinney Moller Institute, University of Southern Denmark(马士基·麦金尼·莫勒研究所,南丹麦大学)

AI总结 本文提出了一种方法,通过引入边界聚焦的滚动策略和离散时间控制屏障函数框架,解决平滑隐式接触动力学中接触力的约束问题,以提高安全性能。

详情
AI中文摘要

平滑隐式接触动力学使得在接触丰富的任务中无需预定义模式序列即可进行基于梯度的规划与控制。然而,安全关键控制仍然具有挑战性,因为隐式接触动力学使安全滤波器的设计变得复杂。平滑参数κ放松了接触互补性约束,这使动力学变得平滑,但也会影响接触力。本文提供了一种方法,在使用放松的互补性约束的情况下仍能对实际接触力进行界定。我们证明,约束违反关于κ可能是非单调的:较小的κ会减小力近似误差,但并不一定改善安全性能。为了解决这个问题,我们引入边界聚焦的推演(rollout),通过比较安全裕度与近似误差来筛选κ。随后,我们基于隐式定义的接触力的一阶泰勒近似,开发了一种离散时间控制屏障函数(CBF)框架。为了应对可能的力低估,我们为由此得到的安全约束增加了一个固定的鲁棒裕度。在四个接触丰富的系统上的仿真表明,所提出的方法消除了标准CBF下出现的力约束违反。

英文摘要

Smoothed implicit contact dynamics enables gradient-based planning and control for contact-rich tasks without predefined mode sequences. However, safety-critical control remains challenging because implicit contact dynamics makes safety-filter design nontrivial. The smoothing parameter κ relaxes contact complementarity constraints, which makes the dynamics smooth but affects the contact force. This paper provides a method for bounding the actual contact force despite the use of relaxed complementarity constraints. We show that constraint violations can be non-monotonic in κ. Smaller κ reduces force-approximation error, but it does not necessarily improve safety performance. To address this issue, we introduce boundary-focused rollouts to screen κ by comparing the safety margin with the approximation error. We then develop a discrete-time control barrier function (CBF) framework based on a first-order Taylor approximation of the implicitly defined contact force. To account for possible force under-prediction, we augment the resulting safety constraint with a fixed robust margin. Simulations on four contact-rich systems show that the proposed method eliminates force violations observed under a standard CBF.
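A discrete-time CBF with a fixed robust margin can be sketched on a deliberately simple system. Assume a scalar force-like state with dynamics x_{k+1} = x_k + dt·u and barrier h(x) = f_max - x, and require h(x_{k+1}) ≥ (1 - γ)·h(x_k) + margin. Because the toy dynamics are linear, the condition reduces to an upper bound on u. The system, parameter values, and function names are hypothetical; the paper instead linearizes an implicitly defined contact force, which this example does not model.

```python
def cbf_input_bound(x, f_max, gamma, dt, margin):
    """Largest input u satisfying the margin-tightened discrete-time CBF condition
        h(x + dt*u) >= (1 - gamma) * h(x) + margin,   with h(x) = f_max - x.
    Rearranging gives u <= (gamma * h(x) - margin) / dt.
    """
    h = f_max - x
    return (gamma * h - margin) / dt


def safety_filter(u_nominal, x, f_max, gamma=0.5, dt=0.25, margin=0.25):
    """Pass the nominal input through unless it violates the CBF bound."""
    return min(u_nominal, cbf_input_bound(x, f_max, gamma, dt, margin))


def rollout(x0, u_nominal, steps, f_max, **kwargs):
    """Simulate the filtered closed loop and return the visited states."""
    xs = [x0]
    for _ in range(steps):
        u = safety_filter(u_nominal, xs[-1], f_max, **kwargs)
        xs.append(xs[-1] + kwargs.get("dt", 0.25) * u)
    return xs


clipped = safety_filter(u_nominal=3.0, x=9.0, f_max=10.0)     # aggressive push
untouched = safety_filter(u_nominal=-2.0, x=9.0, f_max=10.0)  # backing off
states = rollout(x0=0.0, u_nominal=3.0, steps=50, f_max=10.0)
```

With the margin in place the state settles strictly below the limit rather than on it, which is the slack a robust margin is meant to reserve against under-predicted force.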

2605.21133 2026-05-21 cs.RO

Humanoid Whole-Body Manipulation via Active Spatial Brain and Generalizable Action Cerebellum

Zhizhao Liang, Yi-Lin Wei, Xuhang Chen, Mu Lin, Yi-Xiang He, Zhexi Luo, Jun-Hui Liu, Kun-Yu Lin, Wei-Shi Zheng

Affiliations: School of Computer Science and Engineering, Sun Yat-sen University

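AI summary: This paper proposes a generalizable humanoid loco-manipulation framework that uses an Active Spatial Brain and a Generalizable Action Cerebellum to address two difficulties, spatial understanding in complex 3D environments and generalization of action generation, and it shows strong performance across diverse tasks and environments.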

Comments Project page: https://leungchaos.github.io/Humanoid-Whole-Body-Manipulation-via-Active-Spatial-Brain-and-Generalizable-Action-Cerebellum/

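AI-generated abstract (translated from Chinese)

In this paper, we explore spatially aware humanoid whole-body manipulation. Compared with tabletop settings, this task poses two key challenges: 1) spatial understanding is difficult in complex 3D environments with diverse spatial relations; 2) action generation is hard to generalize, because limited and costly real-robot data constrain the generalization of data-driven models. To address these challenges, we propose a generalizable humanoid loco-manipulation framework that leverages the spatial perception and action generation capabilities of multi-agent large models. Specifically, the framework comprises two components: an Active Spatial Brain for active spatial perception and decision-making, and a Generalizable Action Cerebellum for generating executable robot actions. The first component actively perceives the spatial scene and makes decisions on task planning and subtask decomposition. The second component generates executable robot actions from the first module's decisions, without task-specific real-robot data. To benchmark the framework, we design a set of spatial manipulation tasks from two perspectives: evaluating spatial perception and understanding, and assessing real-robot task performance. The results show strong performance on both aspects across diverse tasks and environments.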

English abstract

In this paper, we explore the task of spatially aware humanoid whole-body manipulation. Compared with tabletop settings, this task poses two key challenges: 1) Spatial understanding is challenging in complex 3D environments with diverse spatial relations. 2) Action generation is difficult to generalize, as limited and costly real-robot data restricts the generalization of data-driven models. To address these challenges, we propose a generalizable humanoid loco-manipulation framework that leverages the spatial perception and action generation capabilities of multi-agent large models. Specifically, our framework includes two components: Active Spatial Brain for active spatial perception and decision-making, and Generalizable Action Cerebellum for executable robot action generation. The first component actively perceives the spatial scene and makes decisions on task planning and subtask decomposition. The second component generates executable robot actions based on the decisions made by the first module, without needing task-specific real-robot data. To benchmark our framework, we design a set of spatial manipulation tasks from two perspectives: evaluating spatial perception and understanding, and assessing real-robot task performance. The results demonstrate strong performance on both aspects across diverse tasks and environments.