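arXivDaily: a daily academic digest synced with the full arXiv feed, with AI summaries and translations, covering artificial intelligence, robotics, computer science, finance, statistics, mathematics, physics, biology, economics, electrical engineering and systems, and related fields.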

2603.08561 2026-06-10 cs.AI

RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback

Xiaoying Zhang, Zichen Liu, Yipeng Zhang, Xia Hu, Wenqi Shao

Affiliations: Shanghai AI Lab; National University of Singapore; Independent Researcher

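AI summary: RetroAgent uses a retrospective dual intrinsic feedback mechanism so that LLM agents in interactive environments improve by continually evolving rather than merely completing tasks, yielding stronger adaptation and generalization.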

Comments updated

Abstract

Standard reinforcement learning (RL) for large language model (LLM) agents primarily optimizes extrinsic task rewards, often favoring isolated task completion over continual adaptation. This paradigm can cause premature convergence to suboptimal policies and leaves useful experience only implicitly encoded in model parameters, limiting its retrieval and reuse for future decisions. We introduce RetroAgent, an online RL framework that trains agents to master interactive environments not merely by solving tasks, but by evolving across episodes. Inspired by human retrospective self-improvement, RetroAgent augments extrinsic rewards with hindsight-generated dual intrinsic feedback: (1) Intrinsic Numerical Feedback, which rewards beneficial exploration by measuring incremental subtask progress relative to prior attempts; and (2) Intrinsic Language Feedback, which distills successes and failures into reusable textual lessons for explicit experience reuse. To leverage these lessons effectively, we propose Similarity & Utility-Aware Upper Confidence Bound (SimUtil-UCB), a retrieval strategy that balances semantic relevance, historical utility, and exploration. Across four challenging agentic benchmarks, RetroAgent achieves new state-of-the-art performance, outperforming GRPO by +18.3% on ALFWorld, +15.4% on WebShop, +27.1% on Sokoban, and +8.9% on MineSweeper, while demonstrating strong test-time adaptation and out-of-distribution generalization.
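
The abstract describes SimUtil-UCB only as balancing semantic relevance, historical utility, and exploration; it does not give the scoring formula. As a rough illustration, the Python sketch below combines cosine similarity, a running mean utility, and a UCB-style exploration bonus. The `Lesson` record, the weights `alpha` and `c`, the smoothing by `uses + 1`, and the additive form of the score are all assumptions made for this example, not RetroAgent's actual method.

```python
import math
from dataclasses import dataclass


@dataclass
class Lesson:
    """Hypothetical memory entry: a textual lesson plus retrieval statistics."""
    text: str
    embedding: list[float]
    uses: int = 0               # how many times the lesson has been retrieved
    total_utility: float = 0.0  # sum of observed utilities (e.g. task success)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def ucb_score(lesson, query, total_retrievals, alpha=0.5, c=1.0):
    """Assumed score: weighted similarity + mean utility + exploration bonus."""
    similarity = cosine(lesson.embedding, query)
    mean_utility = lesson.total_utility / lesson.uses if lesson.uses else 0.0
    # Rarely used lessons get a larger bonus, so they are still tried sometimes.
    bonus = c * math.sqrt(math.log(total_retrievals + 1) / (lesson.uses + 1))
    return alpha * similarity + (1 - alpha) * mean_utility + bonus


def retrieve(lessons, query, k=1):
    """Return the k lessons with the highest assumed SimUtil-style score."""
    total = sum(lesson.uses for lesson in lessons)
    ranked = sorted(
        lessons,
        key=lambda lesson: ucb_score(lesson, query, total),
        reverse=True,
    )
    return ranked[:k]
```

In this sketch a lesson that has never been retrieved can outrank a well-tested one with the same embedding, which is the exploration behavior a UCB-style bonus is meant to provide.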

2603.04056 2026-06-10 cs.CV cs.RO

Long-Term Visual Localization in Dynamic Benthic Environments: A Dataset, Footprint-Based Ground Truth, and Visual Place Recognition Benchmark

Martin Kvisvik Larsen, Oscar Pizarro

Affiliations: Department of Marine Technology, Norwegian University of Science and Technology, Trondheim, Norway

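AI summary: This paper presents a curated dataset for long-term visual localization in benthic environments and a footprint-based ground-truthing method, evaluates eight state-of-the-art visual place recognition methods, and finds that their Recall@K on this dataset is significantly lower than on established benchmarks.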

Journal ref Frontiers in Robotics and AI Volume 13 (2026) 1821019

DOI: 10.3389/frobt.2026.1821019

Abstract

Long-term visual localization has the potential to reduce cost and improve mapping quality in optical benthic monitoring with autonomous underwater vehicles (AUVs). Despite this potential, long-term visual localization in benthic environments remains understudied, primarily due to the lack of curated datasets for benchmarking. Moreover, limited georeferencing accuracy and image footprints necessitate precise geometric information for accurate ground-truthing. In this work, we address these gaps by presenting a curated dataset for long-term visual localization in benthic environments and a novel method to ground-truth visual localization results for near-nadir underwater imagery. Our dataset comprises georeferenced AUV imagery from five benthic reference sites, revisited over periods up to six years, and includes raw and color-corrected stereo imagery, camera calibrations, and sub-decimeter registered camera poses. To our knowledge, this is the first curated underwater dataset for long-term visual localization spanning multiple sites and photic-zone habitats. Our ground-truthing method estimates 3D seafloor image footprints and links camera views with overlapping footprints, ensuring that ground-truth links reflect shared visual content. Building on this dataset and ground truth, we benchmark eight state-of-the-art visual place recognition (VPR) methods and find that Recall@K is significantly lower on our dataset than on established terrestrial and underwater benchmarks. Finally, we compare our footprint-based ground truth to a traditional location-based ground truth and show that distance-threshold ground-truthing can overestimate VPR Recall@K at sites with rugged terrain and altitude variations. Together, the curated dataset, ground-truthing method, and VPR benchmark provide a stepping stone for advancing long-term visual localization in dynamic benthic environments.
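
For readers unfamiliar with the metric, Recall@K in visual place recognition is commonly computed as the fraction of queries for which at least one of the top K retrieved database images is a ground-truth match. The Python sketch below follows that common definition; the data layout (dictionaries keyed by image id) and the choice to skip queries without any true match are assumptions for illustration, not details taken from the paper.

```python
def recall_at_k(rankings, ground_truth, k):
    """Fraction of queries with at least one true match among the top-k results.

    rankings:     dict query_id -> list of database ids, best match first
    ground_truth: dict query_id -> set of database ids counted as true matches
    Queries without any true match are skipped (an assumption; conventions vary).
    """
    hits = 0
    evaluated = 0
    for query_id, ranked in rankings.items():
        positives = ground_truth.get(query_id, set())
        if not positives:
            continue
        evaluated += 1
        if positives.intersection(ranked[:k]):
            hits += 1
    return hits / evaluated if evaluated else 0.0


# Toy example with made-up ids: a location-based ground truth that also accepts
# a nearby image without shared content ("d3") yields a higher Recall@1.
rankings = {"q1": ["d3", "d1"], "q2": ["d2", "d4"]}
footprint_gt = {"q1": {"d1"}, "q2": {"d4"}}
location_gt = {"q1": {"d1", "d3"}, "q2": {"d4"}}
```

With these toy inputs, the looser ground truth counts `q1` as a hit at K = 1 while the stricter one does not, which mirrors the kind of overestimate the abstract attributes to distance-threshold ground-truthing.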

2602.23232 2026-06-10 cs.AI

ReCoN-Ipsundrum: An Inspectable Recurrent Persistence Loop Agent with Affect-Coupled Control and Mechanism-Linked Consciousness Indicator Assays

Aishik Sanyal

Affiliations: Aishik Sanyal

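AI summary: This paper presents the ReCoN-Ipsundrum agent, which uses affect-coupled control and mechanism-linked consciousness indicator assays to examine the relationship between consciousness indicators and behavior, and finds that affect coupling increases exploratory and cautious behavior.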

Comments Accepted at AAAI 2026 Spring Symposium - Machine Consciousness: Integrating Theory, Technology, and Philosophy

Journal ref Proceedings of the AAAI Symposium Series, 8(1):352-360, 2026

DOI: 10.1609/aaaiss.v8i1.42565

Abstract

Indicator-based approaches to machine consciousness recommend mechanism-linked evidence triangulated across tasks, supported by architectural inspection and causal intervention. Inspired by Humphrey's ipsundrum hypothesis, we implement ReCoN-Ipsundrum, an inspectable agent that extends a ReCoN state machine with a recurrent persistence loop over sensory salience $N^s$ and an optional affect proxy reporting valence/arousal. Across fixed-parameter ablations (ReCoN, Ipsundrum, Ipsundrum+affect), we operationalize Humphrey's qualiaphilia (preference for sensory experience for its own sake) as a familiarity-controlled scenic-over-dull route choice. We find a novelty dissociation: non-affect variants are novelty-sensitive ($\Delta$scenic-entry = 0.07). Affect coupling is stable ($\Delta$scenic-entry = 0.01) even when scenic is less novel (median $\Delta$novelty $\approx -0.43$). In reward-free exploratory play, the affect variant shows structured local investigation (scan events 31.4 vs. 0.9; cycle score 7.6). In a pain-tail probe, only the affect variant sustains prolonged planned caution (tail duration 90 vs. 5). Lesioning feedback+integration selectively reduces post-stimulus persistence in ipsundrum variants (AUC drop 27.62, 27.9%) while leaving ReCoN unchanged. These dissociations link recurrence $\rightarrow$ persistence and affect-coupled control $\rightarrow$ preference stability, scanning, and lingering caution, illustrating how indicator-like signatures can be engineered and why mechanistic and causal evidence should accompany behavioral markers.

2509.11517 2026-06-10 cs.CL cs.LG

PeruMedQA: Benchmarking Large Language Models (LLMs) on Peruvian Medical Exams -- Dataset Construction and Evaluation

Rodrigo M. Carrillo-Larco, Jesus Lovón Melgarejo, Manuel Castillo-Cara, Gusseppe Bravo-Rocca

Affiliations: Hubert Department of Global Health, Rollins School of Public Health, Emory University; Emory Global Diabetes Research Center of Woodruff Health Sciences Center, Emory University; Institut de Recherche en Informatique de Toulouse; Universidad Nacional de Educación a Distancia; Instituto de Investigación Científica, Universidad de Lima; Barcelona Supercomputing Center

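AI summary: This paper builds a dataset of 8,380 Peruvian medical exam questions, fine-tunes a large language model, and compares accuracy across models, revealing performance differences on medical questions from a Spanish-speaking country.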

Comments https://github.com/rodrigo-carrillo/PeruMedQA

DOI: 10.1007/s40670-026-02692-w

Abstract

BACKGROUND: Medical large language models (LLMs) have demonstrated remarkable performance in answering medical examinations. However, the extent to which this high performance is transferable to medical questions in Spanish and from a Latin American country remains unexplored. This knowledge is crucial as LLM-based medical applications gain traction in Latin America. AIMS: To build a dataset of questions from medical examinations taken by Peruvian physicians pursuing specialty training; to fine-tune an LLM on this dataset; to evaluate and compare the performance in terms of accuracy between vanilla LLMs and the fine-tuned LLM. METHODS: We curated PeruMedQA, a multiple-choice question-answering (MCQA) dataset containing 8,380 questions spanning 12 specialties (2018-2025). We selected ten medical LLMs, including medgemma-4b-it and medgemma-27b-text-it, and developed zero-shot, task-specific prompts to answer the questions. We employed parameter-efficient fine-tuning (PEFT) and low-rank adaptation (LoRA) to fine-tune medgemma-4b-it utilizing all questions except those from 2025 (test set). RESULTS: Medgemma-27b showed the highest accuracy across all specialties, achieving the highest score of 89.29% in Psychiatry; yet, in two specialties, OctoMed-7B exhibited slight superiority: Neurosurgery with 77.27% and 77.38%, respectively; and Radiology with 76.13% and 77.39%, respectively. Across specialties, most LLMs with <10 billion parameters exhibited <50% of correct answers. The fine-tuned version of medgemma-4b-it emerged victorious against all LLMs with <10 billion parameters and rivaled an LLM with 70 billion parameters across various examinations. CONCLUSIONS: For medical AI applications and research that require knowledge bases from Spanish-speaking countries and those exhibiting a similar epidemiological profile to Peru's, interested parties should utilize medgemma-27b-text-it.

2512.04799 2026-06-10 cs.CL

DaLA: Danish Linguistic Acceptability Evaluation Guided by Real World Errors

Gianluca Barmina, Nathalie Carmen Hau Norman, Peter Schneider-Kamp, Lukas Galke Poech

Affiliations: University of Southern Denmark; University of Copenhagen

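AI summary: This paper proposes an enhanced benchmark for Danish linguistic acceptability. It analyzes common errors, introduces 14 corruption functions that generate incorrect sentences, validates them, and uses them to evaluate large language models on acceptability judgment; the results show the benchmark is broader and more comprehensive.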

Journal ref Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)

DOI: 10.63317/4kcbotaa3zgo

Abstract

We present an enhanced benchmark for evaluating linguistic acceptability in Danish. We first analyze the most common errors found in written Danish. Based on this analysis, we introduce a set of fourteen corruption functions that generate incorrect sentences by systematically introducing errors into existing correct Danish sentences. To ensure the accuracy of these corruptions, we assess their validity using both manual and automatic methods. The results are then used as a benchmark for evaluating Large Language Models on a linguistic acceptability judgement task. Our findings demonstrate that this extension is both broader and more comprehensive than the current state of the art. By incorporating a greater variety of corruption types, our benchmark provides a more rigorous assessment of linguistic acceptability, increasing task difficulty, as evidenced by the lower performance of LLMs on our benchmark compared to existing ones. Our results also suggest that our benchmark has higher discriminatory power, which allows it to better distinguish well-performing models from low-performing ones.

2510.15470 2026-06-10 cs.CV cs.IR

MSAM: Multi-Semantic Adaptive Mining for Cross-Modal Drone Video-Text Retrieval

Jinghao Huang, Yaxiong Chen, Ganchao Liu

Affiliations: School of Computer Science and Engineering, Sun Yat-sen University; School of Computer Science and Artificial Intelligence, Wuhan University of Technology; School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University

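AI summary: This paper proposes MSAM, which improves cross-modal drone video-text retrieval through a multi-semantic adaptive learning mechanism, using fine-grained interaction and an adaptive semantic construction module to make feature representations more robust.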

DOI: 10.1109/TCSVT.2026.3701979

Abstract

With the advancement of drone technology, the volume of video data increases rapidly, creating an urgent need for efficient semantic retrieval. We are the first to systematically propose and study the drone video-text retrieval (DVTR) task. Drone videos feature overhead perspectives, strong structural homogeneity, and diverse semantic expressions of target combinations, which challenge existing cross-modal methods designed for ground-level views in effectively modeling their characteristics. Therefore, dedicated retrieval mechanisms tailored for drone scenarios are necessary. To address this issue, we propose a novel approach called Multi-Semantic Adaptive Mining (MSAM). MSAM introduces a multi-semantic adaptive learning mechanism, which incorporates dynamic changes between frames and extracts rich semantic information from specific scene regions, thereby enhancing the deep understanding and reasoning of drone video content. This method relies on fine-grained interactions between words and drone video frames, integrating an adaptive semantic construction module, a distribution-driven semantic learning term and a diversity semantic term to deepen the interaction between text and drone video modalities and improve the robustness of feature representation. To reduce the interference of complex backgrounds in drone videos, we introduce a cross-modal interactive feature fusion pooling mechanism that focuses on feature extraction and matching in target regions, minimizing noise effects. Extensive experiments on two self-constructed drone video-text datasets show that MSAM outperforms other existing methods in the drone video-text retrieval task. The source code and dataset will be made publicly available.

2510.03844 2026-06-10 cs.LG stat.AP stat.ME

On Using Large Language Models to Enhance Clinically-Driven Missing Data Recovery Algorithms in Electronic Health Records

Sarah C. Lotspeich, Abbey Collins, Brian J. Wells, Ashish K. Khanna, Joseph Rigdon, Lucy D'Agostino McGowan

Affiliations: Department of Statistical Sciences, Wake Forest University; Wake Forest University; Wake Forest University School of Medicine; Department of Psychology, North Carolina State University; Department of Biostatistics and Data Science, Wake Forest University School of Medicine; Department of Anesthesiology, Division of Critical Care Medicine, Wake Forest University School of Medicine; Outcomes Research Consortium

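AI summary: This paper examines using large language models to improve the accuracy and scalability of missing-data recovery algorithms for electronic health records; clinicians and an LLM jointly refine the roadmap, achieving data recovery comparable to expert chart review.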

Journal ref 2026

DOI: 10.1093/jamiaopen/ooag080

Abstract

Objective: Electronic health records (EHR) data are prone to missingness and errors. Previously, we devised an "enriched" chart review protocol where a "roadmap" of auxiliary diagnoses (anchors) was used to recover missing values in EHR data (e.g., a diagnosis of impaired glycemic control might imply that a missing hemoglobin A1c value would be considered unhealthy). Still, chart reviews are expensive and time-intensive, which limits the number of patients whose data can be reviewed. Now, we investigate the accuracy and scalability of a roadmap-driven algorithm, based on ICD-10 codes (International Classification of Diseases, 10th revision), to mimic expert chart reviews and recover missing values. Materials and Methods: In addition to the clinicians' original roadmap from our previous work, we consider new versions that were iteratively refined using large language models (LLM) in conjunction with clinical expertise to expand the list of auxiliary diagnoses. Using chart reviews for 100 patients from the EHR at an extensive learning health system, we examine algorithm performance with different roadmaps. Using the larger study of 1,000 patients, we applied the final algorithm, which used a roadmap with clinician-approved additions from the LLM. Results: The algorithm recovered as much, if not more, missing data as the expert chart reviewers, depending on the roadmap. Discussion: Clinically-driven algorithms (enhanced by LLM) can recover missing EHR data with similar accuracy to chart reviews and can feasibly be applied to large samples. Extending them to monitor other dimensions of data quality (e.g., plausibility) is a promising future direction.
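
The roadmap idea can be pictured as a lookup from each lab variable to diagnosis codes that imply an unhealthy value. The Python sketch below is a minimal illustration under stated assumptions: the function name, the data layout, and the one-entry example roadmap (the ICD-10 category `E11`, type 2 diabetes mellitus, mapped to hemoglobin A1c) are hypothetical and are not taken from the authors' roadmap or code.

```python
def recover_missing(patient_labs, patient_codes, roadmap):
    """Fill missing lab indicators using auxiliary diagnoses.

    patient_labs:  dict lab name -> "healthy" / "unhealthy" / None (missing)
    patient_codes: list of ICD-10 codes recorded for the patient
    roadmap:       dict lab name -> list of ICD-10 code prefixes implying "unhealthy"
    Values that are already observed are never overwritten.
    """
    recovered = dict(patient_labs)
    for lab, value in patient_labs.items():
        if value is not None:
            continue
        prefixes = tuple(roadmap.get(lab, ()))
        # str.startswith accepts a tuple, so any matching prefix counts.
        if prefixes and any(code.startswith(prefixes) for code in patient_codes):
            recovered[lab] = "unhealthy"
    return recovered


# Hypothetical one-entry roadmap, for illustration only.
example_roadmap = {"hemoglobin_a1c": ["E11"]}
```

A lab value stays missing when no auxiliary diagnosis applies, so the sketch only ever adds information and never contradicts an observed measurement.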

2508.00491 2026-06-10 cs.RO cs.AI

HannesImitation: Grasping with the Hannes Prosthetic Hand via Imitation Learning

Carlo Alessi, Federico Vasile, Federico Ceola, Giulia Pasquale, Nicolò Boccardo, Lorenzo Natale

Affiliations: Humanoid Sensing and Perception; Istituto Italiano di Tecnologia; Rehab Technologies Lab

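AI summary: This paper proposes HannesImitationPolicy, which uses imitation learning to control the Hannes prosthetic hand for grasping objects in unstructured environments, and introduces HannesImitationDataset for training; experiments show it outperforms a segmentation-based visual servo controller in unstructured scenarios.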

Comments Paper accepted at IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)

Journal ref IEEE/RSJ International Conference on Intelligent Robots and Systems, Hangzhou, China, 2025

Abstract

Recent advancements in control of prosthetic hands have focused on increasing autonomy through the use of cameras and other sensory inputs. These systems aim to reduce the cognitive load on the user by automatically controlling certain degrees of freedom. In robotics, imitation learning has emerged as a promising approach for learning grasping and complex manipulation tasks while simplifying data collection. Its application to the control of prosthetic hands remains, however, largely unexplored. Bridging this gap could enhance dexterity restoration and enable prosthetic devices to operate in more unconstrained scenarios, where tasks are learned from demonstrations rather than relying on manually annotated sequences. To this end, we present HannesImitationPolicy, an imitation learning-based method to control the Hannes prosthetic hand, enabling object grasping in unstructured environments. Moreover, we introduce the HannesImitationDataset comprising grasping demonstrations in table, shelf, and human-to-prosthesis handover scenarios. We leverage such data to train a single diffusion policy and deploy it on the prosthetic hand to predict the wrist orientation and hand closure for grasping. Experimental evaluation demonstrates successful grasps across diverse objects and conditions. Finally, we show that the policy outperforms a segmentation-based visual servo controller in unstructured scenarios. Additional material is provided on our project page: https://hsp-iit.github.io/HannesImitation

2508.17196 2026-06-10 cs.LG cs.AI

BudgetThinker: Empowering Budget-aware LLM Reasoning with Control Tokens

Hao Wen, Xinrui Wu, Yi Sun, Feifei Zhang, Liye Chen, Jie Wang, Yunxin Liu, Yunhao Liu, Ya-Qin Zhang, Yuanchun Li

Affiliations: Institute for AI Industry Research (AIR), Tsinghua University; Global Innovation Exchange & Department of Automation, Tsinghua University

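AI summary: BudgetThinker inserts control tokens during reasoning so that an LLM can precisely control the length of its reasoning, and uses a two-stage training pipeline to improve performance across budgets.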

Abstract

Recent advancements in Large Language Models (LLMs) have leveraged increased test-time computation to enhance reasoning capabilities, a strategy that, while effective, incurs significant latency and resource costs, limiting their applicability in real-world time-constrained or cost-sensitive scenarios. This paper introduces BudgetThinker, a novel framework designed to empower LLMs with budget-aware reasoning, enabling precise control over the length of their thought processes. We propose a methodology that periodically inserts special control tokens during inference to continuously inform the model of its remaining token budget. This approach is coupled with a comprehensive two-stage training pipeline, beginning with Supervised Fine-Tuning (SFT) to familiarize the model with budget constraints, followed by a curriculum-based Reinforcement Learning (RL) phase that utilizes a length-aware reward function to optimize for both accuracy and budget adherence. We demonstrate that BudgetThinker significantly surpasses strong baselines in maintaining performance across a variety of reasoning budgets on challenging mathematical benchmarks. Our method provides a scalable and effective solution for developing efficient and controllable LLM reasoning, making advanced models more practical for deployment in resource-constrained and real-time environments.
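
To make the periodic-insertion idea concrete, the Python sketch below interleaves budget markers into a token sequence. It only illustrates the bookkeeping and works on an already generated list, whereas the abstract describes insertion during inference. The `<remaining=N>` token format, the fixed `interval`, and the choice not to count control tokens against the budget are all assumptions, since the abstract does not specify them.

```python
def insert_budget_tokens(tokens, budget, interval):
    """Interleave hypothetical control tokens that report the remaining budget.

    tokens:   generated reasoning tokens (strings); truncated to `budget`
    budget:   maximum number of reasoning tokens allowed
    interval: emit a control token after every `interval` reasoning tokens
    Control tokens are not counted against the budget in this sketch.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    out = [f"<remaining={budget}>"]
    for used, token in enumerate(tokens[:budget], start=1):
        out.append(token)
        remaining = budget - used
        if used % interval == 0 and remaining > 0:
            out.append(f"<remaining={remaining}>")
    return out
```

For example, with a budget of 5 and an interval of 2, a marker appears at the start and again after the second and fourth reasoning tokens, and the sequence is cut off once the budget is spent.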

2508.05769 2026-06-10 cs.CV

Improving Masked Style Transfer using Blended Partial Convolution

Seyed Hadi Seyed, Ayberk Cansever, David Hart

Affiliations: East Carolina University

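AI summary: This paper proposes a partial-convolution-based style transfer network that applies style precisely to the region of interest and uses network-internal blending to compensate for imperfect region selection, improving visual and quantitative results.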

Journal ref IEEE ACCESS Vol. 14 2026

DOI: 10.1109/ACCESS.2026.3687089

Abstract

Artistic style transfer has long been possible with the advancements of convolution- and transformer-based neural networks. Most algorithms apply the artistic style transfer to the whole image, but individual users may only need to apply a style transfer to a specific region in the image. The standard practice is to simply mask the image after the stylization. This work shows that this approach tends to improperly capture the style features in the region of interest. We propose a partial-convolution-based style transfer network that accurately applies the style features exclusively to the region of interest. Additionally, we present network-internal blending techniques that account for imperfections in the region selection. We show that this visually and quantitatively improves stylization using examples from the SA-1B dataset. Code is publicly available at https://github.com/davidmhart/StyleTransferMasked.
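
As background, a partial convolution computes each output from only the pixels that a binary mask marks as valid, and rescales the result by the share of valid pixels in the window. The pure-Python sketch below shows that generic operation for a single channel with no padding and no bias; it is not the authors' network, and the blending techniques mentioned in the abstract are not modeled.

```python
def partial_conv2d(image, mask, kernel):
    """Single-channel partial convolution (no padding, no bias).

    image, mask: 2-D lists of equal shape; mask is 1 inside the region, 0 outside
    kernel:      2-D filter, applied as cross-correlation
    Returns the filtered image and the updated mask.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = [[0.0] * out_w for _ in range(out_h)]
    new_mask = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            valid = 0.0
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    m = mask[i + di][j + dj]
                    valid += m
                    acc += kernel[di][dj] * image[i + di][j + dj] * m
            if valid > 0:
                # Rescale so windows with few valid pixels are not dimmed.
                output[i][j] = acc * (kh * kw) / valid
                new_mask[i][j] = 1.0
    return output, new_mask
```

Because masked-out pixels are multiplied by zero before summation, values outside the region of interest cannot leak into the result, which is the property that matters for region-restricted stylization.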

2410.22967 2026-06-10 cs.LG eess.SP

Adaptive NAD: Online and Self-adaptive Unsupervised Network Anomaly Detector

Yachao Yuan, Yu Huang, Yingwen Wu

Affiliations: Suda University (Soochow University)

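AI summary: The paper proposes Adaptive NAD, an online, self-adaptive unsupervised network anomaly detection framework that uses a two-layer anomaly detection strategy to generate pseudo-labels together with an online training scheme, achieving the lowest false alarm rate and faster inference on multiple datasets.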

Abstract

The widespread usage of the Internet of Things (IoT) has raised the risks of cyber threats; thus, developing Anomaly Detection Systems (ADSs) that can adapt to evolving traffic patterns is critical. Previous studies primarily focused on offline unsupervised learning methods to safeguard ADSs, which are not applicable in practical real-world applications. In this paper, we design Adaptive NAD, an online and self-Adaptive unsupervised Network Anomaly Detection framework for security domains. A two-layer anomaly detection strategy is proposed to generate reliable high-confidence pseudo-labels. Then, an online training scheme is introduced to update Adaptive NAD by a novel threshold calculation technique. Experimental results demonstrate that Adaptive NAD achieves the lowest false alarm rate (1.33%, 0.71%, and 0.08%) and online inference that is more than 3 times faster than state-of-the-art solutions on the CIC-Darknet2020, NSL-KDD, and Edge-IIoTset datasets, respectively. The code is released at https://github.com/MyLearnCodeSpace/Adaptive-NAD.

2501.12486 2026-06-10 cs.LG cs.CL

The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws

Tian Jin, Ahmed Imtiaz Humayun, Utku Evci, Suvinay Subramanian, Amir Yazdanbakhsh, Dan Alistarh, Gintare Karolina Dziugaite

Affiliations: MIT CSAIL; Rice University; Google Research; Google DeepMind; Google; IST Austria

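AI summary: By studying 80 pruning schedules, this paper finds that starting pruning at 25% and ending at 75% of total training compute yields near-optimal evaluation loss, and proposes a new scaling law that unifies sparse and dense pre-training.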

Comments 17 pages

Journal ref The Thirteenth International Conference on Learning Representations (ICLR), 2025

Abstract

Pruning eliminates unnecessary parameters in neural networks; it offers a promising solution to the growing computational demands of large language models (LLMs). While many focus on post-training pruning, sparse pre-training--which combines pruning and pre-training into a single phase--provides a simpler alternative. In this work, we present the first systematic exploration of optimal sparse pre-training configurations for LLMs through an examination of 80 unique pruning schedules across different sparsity levels and training durations. We find that initiating pruning at 25% of total training compute and concluding at 75% achieves near-optimal final evaluation loss. These findings provide valuable insights for efficient and effective sparse pre-training of LLMs. Furthermore, we propose a new scaling law that modifies the Chinchilla scaling law to use the average parameter count over pre-training. Through empirical and theoretical validation, we demonstrate that this modified scaling law accurately models evaluation loss for both sparsely and densely pre-trained LLMs, unifying scaling laws across pre-training paradigms. Our findings indicate that while sparse pre-training achieves the same final model quality as dense pre-training for equivalent compute budgets, it provides substantial benefits through reduced model size, enabling significant potential computational savings during inference.
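
The Chinchilla scaling law has the form L(N, D) = E + A / N^alpha + B / D^beta, where N is the parameter count and D the number of training tokens; the abstract proposes using the average parameter count over pre-training as N. The Python sketch below computes such an average for a pruning ramp that starts at 25% and ends at 75% of training and plugs it into that form. The linear shape of the ramp, the function names, and any coefficient values passed in are assumptions for illustration; the paper's fitted constants and exact schedule are not given in the abstract.

```python
def average_param_count(dense_params, final_sparsity, start=0.25, end=0.75, steps=10_000):
    """Mean parameter count over training for an assumed linear pruning ramp.

    The model stays dense until `start` (fraction of training), is pruned
    linearly until `end`, and then stays at the final sparsity.
    """
    sparse_params = dense_params * (1.0 - final_sparsity)
    total = 0.0
    for step in range(steps):
        t = (step + 0.5) / steps  # midpoint of this training slice
        if t <= start:
            n = dense_params
        elif t >= end:
            n = sparse_params
        else:
            frac = (t - start) / (end - start)
            n = dense_params + frac * (sparse_params - dense_params)
        total += n
    return total / steps


def chinchilla_form_loss(avg_params, tokens, E, A, B, alpha, beta):
    """Chinchilla-style loss E + A / N^alpha + B / D^beta, with N = average params."""
    return E + A / avg_params ** alpha + B / tokens ** beta
```

Under these assumptions, a model with one billion dense parameters pruned to 50% sparsity averages 0.75 billion parameters over training, so it would be treated like a 0.75-billion-parameter dense model when predicting loss.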

2407.09510 2026-06-10 cs.CV

3DGS.zip: A survey on 3D Gaussian Splatting Compression Methods

Milena T. Bagdasarian, Paul Knoll, Yi-Hsin Li, Florian Barthel, Anna Hilsmann, Peter Eisert, Wieland Morgenstern

Affiliations: Fraunhofer HHI; Humboldt-Universität zu Berlin; Technische Universität Berlin

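AI summary: This survey reviews 3DGS compression methods, covering compression and compaction techniques that aim to make 3DGS more efficient and practical by reducing file size and the number of Gaussians while maintaining quality.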

Comments 3D Gaussian Splatting compression survey; 3DGS compression; updated discussion; new approaches added; new illustrations

Journal ref Computer Graphics Forum, Volume 44, Issue 2 (2025)

DOI: 10.1111/cgf.70078

Abstract

3D Gaussian Splatting (3DGS) has emerged as a cutting-edge technique for real-time radiance field rendering, offering state-of-the-art performance in terms of both quality and speed. 3DGS models a scene as a collection of three-dimensional Gaussians, with additional attributes optimized to conform to the scene's geometric and visual properties. Despite its advantages in rendering speed and image fidelity, 3DGS is limited by its significant storage and memory demands. These high demands make 3DGS impractical for mobile devices or headsets, reducing its applicability in important areas of computer graphics. To address these challenges and advance the practicality of 3DGS, this survey provides a comprehensive and detailed examination of compression and compaction techniques developed to make 3DGS more efficient. We classify existing methods into two categories: compression, which focuses on reducing file size, and compaction, which aims to minimize the number of Gaussians. Both methods aim to maintain or improve quality, each by minimizing its respective attribute: file size for compression and Gaussian count for compaction. We introduce the basic mathematical concepts underlying the analyzed methods, as well as key implementation details and design choices. Our report thoroughly discusses similarities and differences among the methods, as well as their respective advantages and disadvantages. We establish a consistent framework for comparing the surveyed methods based on key performance metrics and datasets. Because these methods have been developed in parallel and over a short period of time, no comprehensive comparison currently exists. This survey, for the first time, presents a unified framework to evaluate 3DGS compression techniques. We maintain a website that will be regularly updated with emerging methods: https://w-m.github.io/3dgs-compression-survey/


2502.11517 2026-06-10 cs.CL cs.DC cs.LG

Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding

Tian Jin, Ellie Y. Cheng, Zack Ankner, Nikunj Saunshi, Blake M. Elias, Amir Yazdanbakhsh, Jonathan Ragan-Kelley, Suvinay Subramanian, Michael Carbin

Affiliations: DeepMind, London, UK; Google Research, New York, NY, USA; Stanford University, Stanford, CA, USA; University of Toronto, Toronto, Ontario, Canada; University of Washington, Seattle, WA, USA

AI summary: This paper presents PASTA, a learning-based system that teaches language models to identify semantic independence and thereby decode in parallel; experiments show it outperforms existing methods on decoding speed and response quality.

Comments 15 pages

Journal ref Proceedings of the 42nd International Conference on Machine Learning (ICML), PMLR 267:27941-27956, 2025

Abstract

Decoding with autoregressive large language models (LLMs) traditionally occurs sequentially, generating one token after another. An emerging line of work explored parallel decoding by identifying and simultaneously generating semantically independent chunks of LLM responses. However, these techniques rely on hand-crafted heuristics tied to syntactic structures like lists and paragraphs, making them rigid and imprecise. We present PASTA, a learning-based system that teaches LLMs to identify semantic independence and express parallel decoding opportunities in their own responses. At its core are PASTA-LANG and its interpreter: PASTA-LANG is an annotation language that enables LLMs to express semantic independence in their own responses; the language interpreter acts on these annotations to orchestrate parallel decoding on-the-fly at inference time. Through a two-stage finetuning process, we train LLMs to generate PASTA-LANG annotations that optimize both response quality and decoding speed. Evaluation on AlpacaEval, an instruction following benchmark, shows that our approach Pareto-dominates existing methods in terms of decoding speed and response quality; our results demonstrate geometric mean speedups ranging from 1.21x to 1.93x with corresponding quality changes of +2.2% to -7.1%, measured by length-controlled win rates against sequential decoding baseline.
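The abstract reports geometric mean speedups. The following is a minimal sketch of how such a figure can be aggregated from per-prompt timings. The function name and the timings are hypothetical and are not taken from the paper.

```python
import math


def geometric_mean_speedup(baseline_secs, parallel_secs):
    """Geometric mean of per-prompt speedups (baseline time / parallel time)."""
    if len(baseline_secs) != len(parallel_secs) or not baseline_secs:
        raise ValueError("need two non-empty lists of equal length")
    # Averaging in log space keeps one very large ratio from dominating.
    log_sum = sum(math.log(b / p) for b, p in zip(baseline_secs, parallel_secs))
    return math.exp(log_sum / len(baseline_secs))


# Hypothetical timings: per-prompt speedups of 2x, 1x, and 0.5x.
example_speedup = geometric_mean_speedup([4.0, 3.0, 1.0], [2.0, 3.0, 2.0])
```

The geometric mean is a common choice for ratios because a 2x speedup and a 2x slowdown cancel out, which an arithmetic mean would not reflect.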


2501.11937 2026-06-10 cs.LG cs.AI

MeshONet: A Generalizable and Efficient Operator Learning Method for Structured Mesh Generation

Jing Xiao, Xinhai Chen, Qingling Wang, Jie Liu

Affiliations: Laboratory of Digitizing Software for Frontier Equipment, Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology

AI summary: This paper proposes MeshONet, a generalizable intelligent learning method for structured mesh generation that recasts mesh generation as an operator learning problem, achieving efficient generation and generalization across geometries.

Journal ref Neural Networks 199: 108746 (2026)

DOI: 10.1016/j.neunet.2026.108746

Abstract

Mesh generation plays a crucial role in scientific computing. Traditional mesh generation methods, such as TFI and PDE-based methods, often struggle to achieve a balance between efficiency and mesh quality. To address this challenge, physics-informed intelligent learning methods have recently emerged, significantly improving generation efficiency while maintaining high mesh quality. However, physics-informed methods fail to generalize when applied to previously unseen geometries, as even small changes in the boundary shape necessitate burdensome retraining to adapt to new geometric variations. In this paper, we introduce MeshONet, the first generalizable intelligent learning method for structured mesh generation. The method transforms the mesh generation task into an operator learning problem with multiple input and solution functions. To effectively overcome the multivariable mapping restriction of operator learning methods, we propose a dual-branch, shared-trunk architecture to approximate the mapping between function spaces based on input-output pairs. Experimental results show that MeshONet achieves a speedup of up to four orders of magnitude in generation efficiency over traditional methods. It also enables generalization to different geometries without retraining, greatly enhancing the practicality of intelligent methods.
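For context on the TFI baseline named in the abstract, here is a minimal sketch of linear transfinite interpolation for a 2D structured grid. It is not MeshONet itself. The function name and the boundary-curve interface are assumptions made for illustration.

```python
def tfi_grid(bottom, top, left, right, n_xi, n_eta):
    """Linear transfinite interpolation of a 2D structured grid.

    Each boundary argument maps a parameter in [0, 1] to an (x, y) point.
    The four curves are assumed to meet at the corners.
    """
    p00, p10 = bottom(0.0), bottom(1.0)
    p01, p11 = top(0.0), top(1.0)
    grid = []
    for j in range(n_eta):
        eta = j / (n_eta - 1)
        row = []
        for i in range(n_xi):
            xi = i / (n_xi - 1)
            point = []
            for d in range(2):
                # Blend opposite edges, then subtract the doubly counted corners.
                edges = ((1 - eta) * bottom(xi)[d] + eta * top(xi)[d]
                         + (1 - xi) * left(eta)[d] + xi * right(eta)[d])
                corners = ((1 - xi) * (1 - eta) * p00[d] + xi * (1 - eta) * p10[d]
                           + (1 - xi) * eta * p01[d] + xi * eta * p11[d])
                point.append(edges - corners)
            row.append(tuple(point))
        grid.append(row)
    return grid


# Unit square boundaries: the interior nodes should form a uniform grid.
unit_grid = tfi_grid(
    bottom=lambda s: (s, 0.0), top=lambda s: (s, 1.0),
    left=lambda s: (0.0, s), right=lambda s: (1.0, s),
    n_xi=3, n_eta=3,
)
```

TFI is purely algebraic, which makes it fast but gives no control over mesh quality on strongly curved boundaries. That is the efficiency versus quality trade-off the abstract refers to.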


2409.12263 2026-06-10 cs.LG cs.SI

Detecting LGBTQ+ Instances of Cyberbullying

Arslan Bisharat, Manuel Sandoval Madrigal, Mohammed Abuhamad, Deborah L. Hall, Yasin N. Silva

Affiliations: Loyola University Chicago; Arizona State University

AI summary: This paper studies the use of Transformer models to detect cyberbullying targeting LGBTQ+ individuals and analyzes how effective different models are on complex, subtle bullying behavior.

Comments 10 pages, 4 tables, 1 figure, 17th International Conference on Social Computing, Behavioral-Cultural Modeling, & Prediction and Behavior Representation in Modeling and Simulation

2408.07925 2026-06-10 cs.LG eess.SP

A Single Channel-Based Neonatal Sleep-Wake Classification using Hjorth Parameters and Improved Gradient Boosting

Arslan Bisharat, Muhammad Mubeen, Saadullah Farooq Abbasi, Muhammad Shahbaz Khan, Wadii Boulila, Jawad Ahmad

Affiliations: Department of Computer Science, Loyola University; Department of Computer Science, University of People; Department of Electronic, Electrical and Systems Engineering, University of Birmingham; School of Computing, Engineering and the Built Environment, Edinburgh Napier University; RIOTU Lab, Prince Sultan University

AI summary: This paper proposes neonatal sleep-wake classification using single-channel gradient boosting with Hjorth features; hyperparameters are tuned with random search cross-validation, and the method reaches 82.35% accuracy, validated with 5-fold cross-validation.

Comments 8 pages, 5 figures, 3 tables, International Polydisciplinary Conference on Artificial Intelligence and New Technologies

Abstract

Sleep plays a crucial role in neonatal development. Monitoring the sleep patterns in neonates in a Neonatal Intensive Care Unit (NICU) is imperative for understanding the maturation process. While polysomnography (PSG) is considered the best practice for sleep classification, its expense and reliance on human annotation pose challenges. Existing research often relies on multichannel EEG signals; however, concerns arise regarding the vulnerability of neonates and the potential impact on their sleep quality. This paper introduces a novel approach to neonatal sleep stage classification using a single-channel gradient boosting algorithm with Hjorth features. The gradient boosting parameters are fine-tuned using random search cross-validation (randomsearchCV), achieving an accuracy of 82.35% for neonatal sleep-wake classification. Validation is conducted through 5-fold cross-validation. The proposed algorithm not only enhances existing neonatal sleep algorithms but also opens avenues for broader applications.
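The Hjorth features mentioned in the abstract have standard definitions: activity, mobility, and complexity. The following is a minimal sketch of those definitions using first differences as the derivative. The helper names and the test signal are illustrative assumptions, not the authors' preprocessing.

```python
import math


def _variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def _diff(values):
    return [b - a for a, b in zip(values, values[1:])]


def hjorth_parameters(signal):
    """Return (activity, mobility, complexity) of a 1D signal.

    activity   = var(x)
    mobility   = sqrt(var(x') / var(x))
    complexity = mobility(x') / mobility(x)
    Derivatives are approximated by first differences.
    """
    d1 = _diff(signal)
    d2 = _diff(d1)
    activity = _variance(signal)
    mobility = math.sqrt(_variance(d1) / activity)
    complexity = math.sqrt(_variance(d2) / _variance(d1)) / mobility
    return activity, mobility, complexity


# A pure sine wave (10 full periods) has complexity close to 1.
omega = 2 * math.pi * 10 / 1000
sine = [math.sin(omega * n) for n in range(1000)]
sine_activity, sine_mobility, sine_complexity = hjorth_parameters(sine)
```

In a pipeline like the one described, these three numbers per epoch would form the feature vector passed to the gradient boosting classifier.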


2408.07922 2026-06-10 cs.CV cs.LG

A Deep Features-Based Approach Using Modified ResNet50 and Gradient Boosting for Visual Sentiments Classification

Arslan Bisharat, Muhammad Mubeen, Arslan Akram, Saadullah Farooq Abbasi, Muhammad Salman Ali, Muhammad Usman Tariq

Affiliations: Department of Computer Science; Loyola University Chicago; University of the People; The Superior University Lahore; University of Birmingham

AI summary: This paper proposes a visual sentiment classification method that extracts deep features with a modified ResNet50 and classifies them with gradient boosting; evaluated on two benchmark datasets, it outperforms existing deep learning and machine learning models.

Comments 4 pages, 4 figures, 3 tables, IEEE International Conference on Multimedia Information Processing and Retrieval (MIPR) 2024

Abstract

The versatile nature of Visual Sentiment Analysis (VSA) is one reason for its rising profile. Efficiently managing social media data that contains visual information is difficult because previous research has concentrated on Sentiment Analysis (SA) of single modalities, such as text. In addition, most visual sentiment studies do not adequately classify sentiment because they mainly focus on simply merging modal attributes without investigating their intricate relationships. This motivated a fusion of deep learning and machine learning algorithms. In this research, a deep feature-based method for multiclass classification extracts deep features from a modified ResNet50, and a gradient boosting algorithm classifies photos containing emotional content. The approach is thoroughly evaluated on two benchmark datasets, CrowdFlower and GAPED. Finally, the proposed strategy is compared against cutting-edge deep learning and machine learning models. Compared with state-of-the-art approaches, the proposed method demonstrates exceptional performance on the datasets presented.


2310.04680 2026-06-10 cs.CL cs.AI cs.LG

The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning

Tian Jin, Nolan Clement, Xin Dong, Vaishnavh Nagarajan, Michael Carbin, Jonathan Ragan-Kelley, Gintare Karolina Dziugaite

Affiliations: MIT CSAIL; MIT Harvard University; Google Research; Google DeepMind

AI summary: The study examines how scaling the parameter count of large language models affects core capabilities, finding that shrinking the model markedly degrades fact recall while in-context information processing is much less affected.

Journal ref The Twelfth International Conference on Learning Representations (ICLR), 2024

Abstract

How does scaling the number of parameters in large language models (LLMs) affect their core capabilities? We study two natural scaling techniques, weight pruning and simply training a smaller or larger model (which we refer to as dense scaling), and their effects on two core capabilities of LLMs: (a) recalling facts presented during pre-training and (b) processing information presented in-context during inference. By curating a suite of tasks that help disentangle these two capabilities, we find a striking difference in how these two abilities evolve due to scaling. Reducing the model size by more than 30% (via either scaling approach) significantly decreases the ability to recall facts seen in pre-training. Yet, a 60-70% reduction largely preserves the various ways the model can process in-context information, ranging from retrieving answers from a long context to learning parameterized functions from in-context exemplars. The fact that both dense scaling and weight pruning exhibit this behavior suggests that scaling model size has an inherently disparate effect on fact recall and in-context learning.
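To make "reducing the model size by 30% via weight pruning" concrete, here is a minimal sketch of unstructured pruning on a flat weight list. The abstract does not state the pruning criterion, so the smallest-magnitude rule below is an assumption, and the function name and weights are hypothetical.

```python
def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with the smallest |w|.

    Assumption: a magnitude criterion. The paper may use a different rule.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = round(len(weights) * sparsity)
    # Indices ordered from smallest to largest absolute value.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for i in order[:n_prune]:
        pruned[i] = 0.0
    return pruned


# Hypothetical layer weights, pruned to 30% sparsity.
layer = [0.5, -0.1, 0.3, -0.9, 0.05, 0.7, -0.2, 0.4, 0.6, -0.8]
pruned_layer = magnitude_prune(layer, 0.3)
```

Here the three weights with the smallest magnitude (0.05, -0.1, -0.2) are set to zero, and the remaining seven are left unchanged.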


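2606.11166 2026-06-10 stat.OT cs.AI new submission

Flaws in the LLM Automation Narrative

George Perrett, Javae Elliott, Jennifer Hill, Marc Scott

Affiliations: New York University

AI summary: Using a new benchmark that requires writing code to complete a data analysis task, the study finds that a frontier LLM falls short of human experts in average performance, variance, and error magnitude, challenging claims that LLMs perform at human-expert level.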

Abstract

Large Language Models (LLMs) are increasingly described as performing at the level of human experts on knowledge economy tasks. These claims are primarily based on how LLMs perform on benchmarking tasks that measure average performance across standardized datasets. Primary limitations of many benchmarking tasks are that they often measure performance based on content directly included in LLM training data, and they frequently do not assess the reliability of LLM performance or the magnitude of LLM errors. However, in high stakes contexts, these qualities are critically important. Through a novel LLM benchmarking task that requires writing computer code to complete a data analysis task, we compare the performance of a frontier LLM against submissions from human experts and explicitly measure the variance of responses and the magnitude of errors. Our study reveals that the human experts perform better on average on a range of metrics and demonstrate less variability in performance. Our results provide evidence that LLMs do not consistently perform at the level of human experts and demonstrate the importance of measuring variance and assessing error magnitude in LLM benchmark evaluations.


2606.11156 2026-06-10 stat.ML cs.LG new submission

Itô maps for any-step SDEs

Zhengkai Pan, Peter Potaptchik, Wenxi Yao, Michael S. Albergo, Jakiw Pidstrigach

Affiliations: Harvard University; University of Oxford; Kempner Institute

AI summary: Proposes Itô maps, any-step stochastic flow maps that predict future states in a single forward pass, enabling exact distillation of stochastic dynamics and supporting inference-time control and posterior sampling.

2606.11125 2026-06-10 eess.SP cs.LG new submission

DMT: Demographic Conditioning, Morphology-Enhanced Transformer for Cuffless Blood Pressure Estimation from PPG Signals

Yidan Shen, Neville Mathew, Maham Rahimi, Deependra Dhakal, George Zouridakis, Xin Fu, Renjie Hu

Affiliations: University of California, San Diego

AI summary: Proposes a Transformer-based network for cuffless blood pressure estimation from PPG signals that incorporates demographic information through FiLM-style feature modulation and adds an auxiliary morphology head guiding the model toward waveform morphology related to arterial stiffness; on the PulseDB dataset it achieves an MAE of 4.56 mmHg for systolic and 2.62 mmHg for diastolic blood pressure.

Abstract

Blood pressure (BP) is a key marker for cardiovascular risk assessment and therapeutic decision-making, and Photoplethysmography (PPG) enables low-cost, wearable-friendly cuffless BP estimation. However, even with recent progress, many PPG-based models are trained with BP regression alone and may rely on amplitude-dominated shortcuts. In addition, demographic covariates that systematically modulate vascular compliance are often incorporated only via late fusion, limiting subject-specific representation learning. We propose a Transformer-based network for cuffless BP estimation from PPG signal, leveraging self-attention to capture long-range dependencies across multiple cardiac cycles. To account for subject-specific vascular differences, the model is conditioned on demographics via FiLM-style feature modulation applied through the attention and feed-forward sublayers of Transformer blocks. In addition, we add an auxiliary morphology head to guide the model to attend to BP-relevant waveform morphology associated with arterial stiffness and wave reflection. Under calibration-based evaluation protocols on the large-scale PulseDB dataset, the proposed method achieves MAE of 4.56 mmHg for systolic BP and 2.62 mmHg for diastolic BP, reducing errors by 47% and 50% compared with prior demographic-enhanced PPG baselines. The resulting lightweight, single-sensor model supports scalable and clinically grounded cuffless BP estimation in calibration-enabled deployment settings.
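FiLM-style conditioning, as named in the abstract, scales and shifts each feature using parameters predicted from a conditioning vector. The following is a minimal sketch of that operation on a single feature vector. The layer shapes, weights, and demographic encoding are hypothetical, and where the modulation sits inside the Transformer block is not modeled here.

```python
def linear(weights, bias, vector):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def film(features, condition, gamma_w, gamma_b, beta_w, beta_b):
    """Feature-wise linear modulation: gamma(c) * h + beta(c)."""
    gamma = linear(gamma_w, gamma_b, condition)
    beta = linear(beta_w, beta_b, condition)
    return [g * h + b for g, h, b in zip(gamma, features, beta)]


# Hypothetical two-dimensional hidden state and normalized demographics.
hidden = [1.0, 2.0]
demographics = [0.5, 1.0]
modulated = film(
    hidden, demographics,
    gamma_w=[[2.0, 0.0], [0.0, 1.0]], gamma_b=[0.0, 0.0],
    beta_w=[[0.0, 0.0], [1.0, 1.0]], beta_b=[0.5, 0.0],
)
```

Compared with late fusion, this kind of modulation lets the conditioning vector influence intermediate features rather than only the final regression input, which matches the motivation given in the abstract.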


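2606.11044 2026-06-10 stat.ML cs.LG new submission

Generalized Conformal Predictive Systems Under Distributional Shifts

Jef Jonkers, Johanna Ziegel

Affiliations: IDLab, Department of Electronics and Information Systems, Ghent University; Seminar for Statistics, ETH Zurich, Zurich, Switzerland

AI summary: To handle distributional shift, the paper extends generalized conformal predictive systems by encoding the shift in observation-specific permutation weights, yielding shift-aware predictive systems, and introduces weight-uncertainty boxes to build robust envelopes of conformal predictive systems with finite-sample or asymptotic guarantees.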

Comments 27 pages, 10 figures

2606.10972 2026-06-10 eess.AS cs.AI new submission

Optimizing 2D Input Representations and Sub-phase Fusion Strategies for Differential Diagnosis of Asthma and COPD Using CNN- and GRU-Based Networks

Ipek Sen, Ozgur Ozdemir, Elena Battini Sonmez

Affiliations: Dept. Electrical and Electronics Engineering, Istanbul Bilgi University, Turkey; Dept. Computer Engineering, Istanbul Bilgi University, Turkey

AI summary: This study optimizes 2D input representations (MFCC, log-mel spectrogram, VAR model) and sub-phase feature fusion strategies (direct concatenation, GRU, GRU with attention) for differentiating asthma from COPD with CNN- and GRU-based networks; the best F1-score is 0.877.

Abstract

This study aims to explore the performance of the VAR model in comparison with mel-frequency cepstral coefficient (MFCC) matrices and log-mel spectrograms using deep learning. In pulmonary sound classification, spectrogram-based representations suffer from inconsistent temporal dimensions due to varying respiratory cycle durations. Along with traditional trimming/zero-padding, adaptive-length windowing was presented to fix their temporal dimensions. Their spectral and temporal dimensions were optimized by testing a range of parameters. Different convolutional neural network (CNN) architectures were employed to extract features from the two-dimensional representations obtained over the sub-phases. The extracted sub-phase features were then fused using various strategies including direct concatenation, gated recurrent unit (GRU) network and GRU with attention mechanism. Model performances were assessed through respiratory cycle-based evaluation and subject-based evaluation comprising multiple respiratory cycles. Several data augmentation techniques were also studied to cope with limitations in data size. The best cycle-based F1-score (0.877) was obtained using the MFCC matrices with thirteen coefficients and 64-point time resolution per sub-phase representation followed by direct feature concatenation, and the best subject-based F1-score (0.855) was obtained using the MFCC matrices with thirteen coefficients and 256-point time resolution per full-cycle representation, both obtained by adaptive-length windowing. Augmentation degraded the performance of models overall, yet mixup augmentation was the best among the methods tested. MFCC outperformed log-mel spectrogram and VAR model in differentiation of asthma and COPD. Sophisticated fusion strategies did not improve the diagnosis. Augmentation did not contribute, demonstrating the significance of authentic data in pulmonary sound studies.
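The abstract singles out mixup as the best of the augmentation methods tested. The following is a minimal sketch of standard mixup on one pair of examples, assuming flattened feature vectors and one-hot labels. The `alpha` value and the toy inputs are hypothetical, since the abstract does not give the settings used.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.2, rng=random):
    """Blend two examples with a weight drawn from Beta(alpha, alpha)."""
    lam = rng.betavariate(alpha, alpha)
    mixed_x = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    mixed_y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return mixed_x, mixed_y, lam


# Hypothetical flattened features with one-hot labels (asthma, COPD).
rng = random.Random(0)
features, label, lam = mixup(
    [1.0, 2.0, 3.0], [1.0, 0.0],
    [3.0, 2.0, 1.0], [0.0, 1.0],
    rng=rng,
)
```

Because the mixed label is a convex combination of the two one-hot labels, it still sums to one and can be used directly with a cross-entropy loss.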


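2606.10906 2026-06-10 stat.ML cs.AI cs.LG new submission

Human-AI Teaming Through the Lens of Calibration

Eric Nalisnick, Chi Zhang, Sophia Qian, Yixin Wang

Affiliations: Department of Computer Science, Johns Hopkins University; Department of Statistics, University of Michigan

AI summary: The study analyzes human-AI teaming models through the lens of statistical calibration, finding that combination methods do not preserve the human's degree of calibration, whereas delegation methods shift the calibration burden onto the rejector meta-model, a burden that becomes unattainable when the human relies on information the system cannot observe.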

Comments 19 pages, 5 figures (including appendix)

Abstract

We study models for human-AI teaming through the lens of statistical calibration. We assume the team consists of an AI model and human -- both of which are calibrated with respect to some partitioning of the feature space -- and expose how the calibration assumptions propagate into the teaming framework. In particular, we consider frameworks that either (i) combine human and model predictions or (ii) delegate prediction responsibility to either a human or model. We show via theoretical and empirical results that existing methods for combination do not preserve the human's degree of calibration. Methods for delegation (by the very act of delegation) preserve calibration of the downstream predictors but shift the burden onto the rejector meta-model that decides who predicts. The rejector must be calibrated finely enough to locate where each member is superior, a demand that grows with the human's expertise and becomes unattainable when the human relies on information the system cannot observe.
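One way to read "calibrated with respect to some partitioning of the feature space" is that, within each cell of the partition, the average predicted probability matches the observed frequency of the positive class. The following is a minimal sketch of an error measure under that reading. The definition, function name, and data are assumptions for illustration, not the paper's formalism.

```python
from collections import defaultdict


def partition_calibration_error(cells, probs, labels):
    """Weighted mean of |mean prediction - observed frequency| per cell."""
    groups = defaultdict(list)
    for cell, p, y in zip(cells, probs, labels):
        groups[cell].append((p, y))
    total = len(probs)
    error = 0.0
    for members in groups.values():
        mean_p = sum(p for p, _ in members) / len(members)
        freq = sum(y for _, y in members) / len(members)
        error += len(members) / total * abs(mean_p - freq)
    return error


# Hypothetical binary outcomes split into two cells of the feature space.
cells = ["a", "a", "b", "b"]
labels = [1, 0, 1, 1]
calibrated = partition_calibration_error(cells, [0.5, 0.5, 1.0, 1.0], labels)
overconfident = partition_calibration_error(cells, [0.9, 0.9, 1.0, 1.0], labels)
```

A finer partition makes this a stricter requirement, which is in the spirit of the abstract's point that the rejector must be calibrated finely enough to locate where each team member is superior.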


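2606.10889 2026-06-10 q-bio.NC cs.LG new submission

Sleep EEG Signal Criticality as a Non-Invasive Predictor of Cognitive Decline in Dementia

Stanisław Narębski, Tomasz Komendziński, Tomasz M. Rutkowski

Affiliations: Institute of Cybernetics and Human Informatics, Polish Academy of Sciences

AI summary: The study quantifies sleep EEG signal criticality with multifractal detrended fluctuation analysis, finding that cognitively healthy individuals sit closer to an optimally critical state while the dementia group's DFA exponents shift toward 1.0; this suggests that a reconfiguration of scale-free neural dynamics during sleep precedes clinical symptoms and could support early screening.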

Comments 4 pages, 2 figures, accepted for publication in the Proc. 48th Annu. Int. Conf. IEEE EMBS (EMBC 2026), Toronto, Canada, July 20-24, 2026

Abstract

Early detection of neurodegeneration remains a critical clinical challenge. This study investigates whether sleep EEG signal criticality, quantified via Multifractal Detrended Fluctuation Analysis (MFDFA), serves as a non-invasive biomarker for future cognitive decline. We analyzed longitudinal data from the National Sleep Research Resource (NSRR) Study of Osteoporotic Fractures (SOF) cohort, comparing baseline sleep EEG dynamics between women who remained cognitively normal and those who later progressed to dementia-related impairment (3MS $< 78$). Our results reveal significant group-level differences in Hurst exponent $H(q)$ distributions, particularly during non-REM stages N2 and N3. Cognitively healthy individuals exhibited signal dynamics significantly closer to an optimally critical state across all electrode locations ($p \leq 0.001$), supporting the Brain Criticality Hypothesis. Supervised UMAP projections confirmed clear spatial separation between groups throughout the overnight sleep architecture. The dementia group demonstrated a shift in DFA exponents toward 1.0, suggesting that a reconfiguration of scale-free neural dynamics during sleep precedes clinical symptoms. These findings highlight the potential for MFDFA-derived measures to be integrated into automated, sleep-based screening tools, enabling earlier preventative interventions during the prodromal window of dementia.
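The DFA exponent discussed in the abstract can be illustrated with ordinary monofractal DFA, which corresponds to the $q = 2$ case of MFDFA. The following is a minimal sketch with linear detrending. It is not the authors' pipeline, and the function names, window sizes, and synthetic signal are assumptions.

```python
import math
import random


def _linear_fit(xs, ys):
    """Least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def dfa_exponent(signal, window_sizes):
    """Slope of log F(n) versus log n for linearly detrended fluctuations."""
    mean = sum(signal) / len(signal)
    profile, running = [], 0.0
    for value in signal:  # integrate the mean-centered signal
        running += value - mean
        profile.append(running)
    log_n, log_f = [], []
    for n in window_sizes:
        n_windows = len(profile) // n
        t = list(range(n))
        squared = 0.0
        for w in range(n_windows):
            segment = profile[w * n:(w + 1) * n]
            slope, intercept = _linear_fit(t, segment)
            squared += sum((y - (slope * x + intercept)) ** 2
                           for x, y in zip(t, segment))
        log_n.append(math.log(n))
        log_f.append(math.log(math.sqrt(squared / (n_windows * n))))
    return _linear_fit(log_n, log_f)[0]


# Uncorrelated noise should give an exponent near 0.5.
rng = random.Random(0)
white_noise = [rng.gauss(0.0, 1.0) for _ in range(4096)]
noise_alpha = dfa_exponent(white_noise, [16, 32, 64, 128, 256])
```

On this scale, exponents near 0.5 indicate uncorrelated fluctuations and values near 1.0 indicate 1/f-like long-range correlations, which is the range the abstract's group comparison refers to.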


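2606.10781 2026-06-10 eess.AS cs.CL new submission

Recovering the Zipfian Distribution in Unsupervised Term Discovery

Danel Slabbert, Simon Malan, Herman Kamper

Affiliations: Het Jan Marais Fonds

AI summary: Centre-based clustering in unsupervised term discovery yields overly uniform type distributions; the paper revisits graph-based clustering, which substantially outperforms K-means and other centre-based methods across three languages and recovers a more Zipf-like lexical distribution.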

Abstract

Unsupervised term discovery involves segmenting unlabelled speech into word- or syllable-like units and clustering these into a lexicon of candidate types. True lexicons follow a Zipfian distribution, yet the dominant centre-based clustering approach -- K-means -- produces a more uniform distribution due to an inductive bias toward spherical clusters. In this paper we revisit graph-based clustering as a bottom-up alternative, where segment embeddings are connected by pairwise similarity and partitioned using the Leiden algorithm. We show that graph clustering substantially outperforms centre-based approaches (K-means, GMM, BIRCH) in both word- and syllable-level lexicon discovery across three languages, producing more Zipf-like distributions. Another bottom-up approach, agglomerative clustering with average linkage, also performs well, although it is computationally less efficient and allows for less control over the resulting distribution. Our work calls into question the dominance of centre-based clustering for term discovery, and promotes graph clustering as an attractive alternative.
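A simple way to check how Zipf-like a discovered lexicon is: fit the slope of log frequency against log rank over the cluster sizes. A slope near -1 is Zipfian, and a slope near 0 is uniform. The following is a minimal sketch. The function name and toy cluster assignments are hypothetical, and the paper may use a different measure.

```python
import math
from collections import Counter


def rank_frequency_slope(cluster_labels):
    """Least-squares slope of log(count) versus log(rank)."""
    counts = sorted(Counter(cluster_labels).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two clusters")
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(count) for count in counts]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Toy assignments: cluster sizes proportional to 1/rank, and equal sizes.
zipf_like = [c for c, n in enumerate([60, 30, 20, 15, 12]) for _ in range(n)]
uniform = [c for c in range(5) for _ in range(20)]
zipf_slope = rank_frequency_slope(zipf_like)
uniform_slope = rank_frequency_slope(uniform)
```

Comparing this slope between K-means and graph-based cluster assignments would give a quick indication of the effect described in the abstract.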


2606.10770 2026-06-10 stat.ME cs.LG new submission

Correcting Variable Importance Scored by Random Forests

Guancheng Zhou, Haiping Xu, Jason Liu, Donghui Yan

Affiliations: Computer and Information Science; Mathematics and Data Science; University of Massachusetts, Dartmouth, MA; The Rivers School, Weston, MA

AI summary: To address the distortion of Random Forest variable importance by correlations among variables, the paper proposes grouping variables by their conditional correlations; experiments show that both computationally efficient schemes yield sensible corrections.

Comments 22 pages, 10 figures

Abstract

Variable importance produced by Random Forests (RF) is widely used in statistical data analysis and plays an important role in tasks such as model interpretation, model selection and diagnosis, and cost-bounded learning. However, the calculation of variable importance in RF does not take into account the correlations among variables, and variables that are correlated with many other variables tend to receive a lower importance index or to be completely masked (i.e., given an importance index near zero) by other strongly correlated variables. To prevent influence from unwanted correlated variables when calculating variable importance, we propose to group variables by their conditional correlations (conditional on the response variable). We explore two computationally efficient options: one groups variables individually and then separates the variable of interest from all correlated variables, while the other uses clustering to group variables according to their pairwise conditional correlations. Our experiments show that both lead to sensible corrections to the importance of variables.
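As an illustration of correlation conditional on the response, the sketch below uses the first-order partial correlation, which is a common linear stand-in. Whether the paper estimates conditional correlation this way is an assumption, and the function names and data are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_correlation(x1, x2, y):
    """Correlation of x1 and x2 after linearly removing the effect of y."""
    r12, r1y, r2y = pearson(x1, x2), pearson(x1, y), pearson(x2, y)
    return (r12 - r1y * r2y) / math.sqrt((1 - r1y ** 2) * (1 - r2y ** 2))


# Two predictors that are related only through the response y.
y = [1, -1, 1, -1, 1, -1, 1, -1]
noise_1 = [1, 1, -1, -1, 1, 1, -1, -1]
noise_2 = [1, 1, 1, 1, -1, -1, -1, -1]
x1 = [a + b for a, b in zip(y, noise_1)]
x2 = [a + b for a, b in zip(y, noise_2)]
marginal = pearson(x1, x2)
conditional = partial_correlation(x1, x2, y)
```

In this toy data the marginal correlation is 0.5 while the conditional one vanishes, so a grouping rule based on conditional correlation would treat the two predictors as unrelated.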


2606.10738 2026-06-10 eess.AS cs.AI new submission

Spatial-Omni: Spatial Audio Understanding Integration in Multimodal LLMs via FOA Encoding

Zhiyuan Zhu, Yixuan Chen, Yiwen Shao, Wenxiang Guo, Changhao Pan, Yu Zhang, Yuxiang Wang, Wei Liu, Houhua Zhang, Chengkuan Zeng, Wenbo Cheng, Yunxi Liu, Rui Yang, Steve Yves, Liefeng Bo, Zhou Zhao

Affiliations: Zhejiang University; Tencent Hunyuan

AI summary: Proposes Spatial-Omni, a lightweight method that uses SO-Encoder to inject First-Order Ambisonics spatial audio into existing Omni LLMs, and outperforms existing models on the newly constructed SO-Bench benchmark.

Abstract

Recent multimodal large language models mainly process audio as monaural signals, thereby discarding the spatial cues contained in spatial audio for sound localization, spatial relation reasoning, and spatial scene understanding. We propose Spatial-Omni, a lightweight method that implements SO-Encoder to inject First-Order Ambisonics (FOA) spatial audio into existing Omni LLMs as an independent modality, without modifying their original audio encoders. SO-Encoder provides spatial tokens with limited additional context cost and improves spatial audio understanding through efficient staged training. To support training and evaluation, we construct SO-Dataset, SO-QA, and SO-Bench from open-source data, real recordings, and simulations, containing 400K FOA spatial audio clips and 2.1M spatial question answering pairs. SO-Bench covers 16 spatial audio understanding subtasks, including basic detection and location estimation, spatial relation understanding, and complex spatial reasoning. Experiments show that Spatial-Omni outperforms existing open-source Large Audio-Language Models (LALMs) and Omni LLM models on spatial audio understanding tasks while retaining a reasonable level of general audio understanding. Code and data are available at https://github.com/dieKarotte/Spatial-Omni.


2606.10713 2026-06-10 eess.IV cs.AI cs.CV cs.LG new submission

++nnU-Net: Scaling nnU-Net with Prefix-Based Data Augmentation

Ana Sofia Santos, André Ferreira, Gijs Luijten, Naida Solak, Lisle Faray de Paiva, Behrus Hinrichs-Puladi, Jens Kleesiek, Jan Egger, Victor Alves

Affiliations: Center Algoritmi / LASI, University of Minho, Braga, Portugal; Institute for Artificial Intelligence in Medicine, University Medicine Essen, Essen, Germany; Institute of Medical Informatics / Dept. of Oral and Maxillofacial Surgery, University Hospital RWTH Aachen, Germany; Faculty of Computer Science, University of Duisburg-Essen, Essen, Germany

AI summary: Proposes ++nnU-Net, which augments data via image registration by generating warped images before preprocessing and training, improving Dice scores by up to approximately 22% on five 2D datasets.

Comments 7 pages, 1 figure, 2 tables

Abstract

The nnU-Net has demonstrated continuous success in medical segmentation tasks, which heavily rely on the availability and diversity of annotated biomedical data. However, assembling medical imaging cohorts remains challenging due to numerous factors such as privacy regulations and annotation costs. As a result, data augmentation plays a crucial role in increasing data availability while maintaining anatomical feasibility. Hence, we propose the ++nnU-Net, a novel data augmentation module based on image registration that operates before preprocessing and training take place. Our framework was evaluated across five different 2D datasets. In this workflow, image data go through a two-stage registration process, generating new warped images. The transformations are then applied to the respective segmentation. In addition, the pipeline computes available disk space, generates supplementary binary synthetic masks, and generates checkpoints. We demonstrate that the ++nnU-Net outperforms the nnU-Net baseline, yielding improvements in Dice Similarity Coefficient scores. In the most prominent cases, we observe performance gains of approximately 22%. These findings highlight the effectiveness of registration-based data augmentation, particularly for 2D medical imaging datasets, and suggest that the ++nnU-Net provides a practical and scalable approach for enhancing segmentation performance in data-limited settings. The source code for the ++nnU-Net is available at: https://github.com/sofia-adelie/plusplusnnunet.git
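The results above are reported as Dice Similarity Coefficient scores. The following is a minimal sketch of the standard definition for binary masks, with flattened masks as lists of 0 and 1. The function name and the toy masks are hypothetical.

```python
def dice_coefficient(prediction, target):
    """Dice = 2 * |A intersect B| / (|A| + |B|) for binary masks."""
    if len(prediction) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(prediction, target) if p and t)
    size = sum(1 for p in prediction if p) + sum(1 for t in target if t)
    # Convention: two empty masks count as a perfect match.
    return 1.0 if size == 0 else 2.0 * intersection / size


# Toy flattened masks that overlap in one of two foreground pixels each.
predicted_mask = [1, 1, 0, 0]
reference_mask = [1, 0, 1, 0]
dice = dice_coefficient(predicted_mask, reference_mask)
```

A score of 1.0 means the predicted and reference masks coincide exactly, and 0.0 means they share no foreground pixels.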



RetroAgent: From Solving to Evolving via Retrospective Dual Intrinsic Feedback

Long-Term Visual Localization in Dynamic Benthic Environments: A Dataset, Footprint-Based Ground Truth, and Visual Place Recognition Benchmark

ReCoN-Ipsundrum: An Inspectable Recurrent Persistence Loop Agent with Affect-Coupled Control and Mechanism-Linked Consciousness Indicator Assays

PeruMedQA: Benchmarking Large Language Models (LLMs) on Peruvian Medical Exams -- Dataset Construction and Evaluation

DaLA: Danish Linguistic Acceptability Evaluation Guided by Real World Errors

MSAM: Multi-Semantic Adaptive Mining for Cross-Modal Drone Video-Text Retrieval

On Using Large Language Models to Enhance Clinically-Driven Missing Data Recovery Algorithms in Electronic Health Records

HannesImitation: Grasping with the Hannes Prosthetic Hand via Imitation Learning

BudgetThinker: Empowering Budget-aware LLM Reasoning with Control Tokens

Improving Masked Style Transfer using Blended Partial Convolution

Adaptive NAD: Online and Self-adaptive Unsupervised Network Anomaly Detector

The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws

3DGS.zip: A survey on 3D Gaussian Splatting Compression Methods

Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding

MeshONet: A Generalizable and Efficient Operator Learning Method for Structured Mesh Generation

Detecting LGBTQ+ Instances of Cyberbullying

A Single Channel-Based Neonatal Sleep-Wake Classification using Hjorth Parameters and Improved Gradient Boosting

A Deep Features-Based Approach Using Modified ResNet50 and Gradient Boosting for Visual Sentiments Classification

The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning

Flaws in the LLM Automation Narrative

Itô maps for any-step SDEs

DMT: Demographic Conditioning, Morphology-Enhanced Transformer for Cuffless Blood Pressure Estimation from PPG Signals

Generalized Conformal Predictive Systems Under Distributional Shifts

Optimizing 2D Input Representations and Sub-phase Fusion Strategies for Differential Diagnosis of Asthma and COPD Using CNN- and GRU-Based Networks

Human-AI Teaming Through the Lens of Calibration

Sleep EEG Signal Criticality as a Non-Invasive Predictor of Cognitive Decline in Dementia

Recovering the Zipfian Distribution in Unsupervised Term Discovery

Correcting Variable Importance Scored by Random Forests

Spatial-Omni: Spatial Audio Understanding Integration in Multimodal LLMs via FOA Encoding

++nnU-Net: Scaling nnU-Net with Prefix-Based Data Augmentation