2606.01393
2026-06-02
cs.CL
cs.AI
cs.CV
Dr. DocBench: A Comprehensive Benchmark for Expert-Level and Difficult Document Parsing
Minglai Yang, Xinyan Velocity Yu, Pengyuan Li, Xinyu Guo, Zhenting Qi, Konwoo Kim, Longtian Ye, Xiaolong Luo, Jinhe Bi, Henry Zhang, Haris Riaz, Xuan Zhang, Yunze Xiao, Bangya Liu, Tom Tang, Yunfei Zhao, Qunshu Lin, Zihan Wang, Minghao Liu, Michael Lingzhi Li, Yilun Du, Jesse Thomason, Rogerio Feris, Alex Pentland, Zexue He
Affiliations
Stanford University; MIT; Carnegie Mellon University; University of Southern California; Harvard University; IBM Research; University of Arizona; Duke University; UC Berkeley; LMU Munich
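AI Summary
Introduces Dr. DocBench, a benchmark that uses parser-failure-based sampling to select challenging documents from a multilingual book corpus. It spans 52 BISAC subject areas and includes 65k high-quality annotations for evaluating expert-level document parsing.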