EARL: Towards a Unified Analysis-Guided Reinforcement Learning Framework for Egocentric Interaction Reasoning and Pixel Grounding
Affiliations Department of Electrical and Electronic Engineering, The Hong Kong Polytechnic University, Hong Kong SAR; Division of Emerging Interdisciplinary Areas (EMIA), The Hong Kong University of Science and Technology, Hong Kong SAR
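AI Summary This paper proposes EARL, an egocentric analysis-guided reinforcement learning framework that aims to improve a robot's reasoning about human-environment interactions and its pixel-level grounding accuracy. EARL adopts a two-stage parsing structure: it first generates a structured textual description, then produces an answer and a pixel mask in response to the user query, integrating semantic priors through an analysis-guided feature synthesizer. Experiments show that EARL achieves significant gains over existing reinforcement-learning-based methods on pixel-level grounding tasks and generalizes well.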
Comments Accepted at ICML 2026. Project page: https://github.com/yuggiehk/EARL