Abstract
Automatic Speech Recognition (ASR) is an integral component of modern technology, powering applications such as voice-activated assistants, transcription services, and accessibility tools. Yet ASR systems continue to struggle with the inherent variability of human speech, such as accents, dialects, and speaking styles, as well as environmental interference, including background noise. Moreover, domain-specific conversations often employ specialized terminology, which can exacerbate transcription errors. These shortcomings not only degrade raw ASR accuracy but also propagate mistakes through subsequent natural language processing pipelines. Because redesigning an ASR model is costly and time-consuming, non-intrusive refinement techniques that leave the model's architecture intact have become increasingly popular. In this survey, we review current non-intrusive refinement approaches and group them into five classes: fusion, re-scoring, correction, distillation, and training adjustment. For each class, we outline the main methods, advantages, drawbacks, and ideal application scenarios. Beyond method classification, this work surveys adaptation techniques aimed at refining ASR in domain-specific contexts, reviews commonly used evaluation datasets along with their construction processes, and proposes a standardized set of metrics to facilitate fair comparisons. Finally, we identify open research gaps and suggest promising directions for future work. By providing this structured overview, we aim to equip researchers and practitioners with a clear foundation for developing more robust, accurate ASR refinement pipelines.