ERGeoBench: A Comprehensive Benchmark for Embodied Reasoning and Geo-localization in Multimodal Large Language Models
Affiliations: Beijing University of Posts and Telecommunications; State Key Laboratory of Networking and Switching Technology; School of Materials Science and Engineering; China Mobile Research Institute; College of Computing and Data Science
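AI summary: The paper proposes ERGeoBench, a benchmark that evaluates multimodal large language models on vision-driven embodied geo-localization across three progressive settings: single view, panoramic view, and embodied view. It finds that current models perform well at high-level geographic semantic reasoning but still fall short in fine-grained perception, metric localization, and cross-view spatial consistency.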