2507.15294
2026-06-10
cs.SD
cs.MM
Version update
MeMo: Attentional Momentum for Real-time Audio-visual Speaker Extraction under Impaired Visual Conditions
Junjie Li, Wenxuan Wu, Shuai Wang, Zexu Pan, Kong Aik Lee, Helen Meng, Haizhou Li
Affiliations
Department of Electrical and Electronic Engineering, Faculty of Engineering, The Hong Kong Polytechnic University
;
Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong
;
School of Artificial Intelligence (SAI), The Chinese University of Hong Kong, Shenzhen
;
School of Intelligence Science and Technology, Nanjing University
;
Tongyi Lab, Alibaba Group, Singapore
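AI summary
Proposes MeMo, a framework that uses two adaptive memory banks to store attention-related information and maintain attentional momentum when visual cues are missing, enabling real-time target speaker extraction with an SI-SNR improvement of at least 2 dB.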