2606.17474
2026-06-17
cs.CL
cs.AI
New submission
AIPatient Arena: EHR-grounded evaluation of large language models in end-to-end clinical consultation workflows
Jiahui Niu, Huizi Yu, Wenkong Wang, Guangxin Dai, Jingxian He, Xiang Li, Zhiying Liang, Xinxin Lin, Kent CY So, Bryan YP Yan, Yun Kwok Wing, Yanqiu Xing, Xin Ma, Lizhou Fan
Affiliations
School of Control Science and Engineering, Shandong University
;
Key Laboratory of Machine Intelligence and System Control, Shandong University
;
Department of Medicine and Therapeutics, The Chinese University of Hong Kong
;
Department of Geriatric Medicine, Qilu Hospital of Shandong University
;
Department of Psychiatry, The Chinese University of Hong Kong
;
Li Chiu Kong Family Sleep Assessment Unit, Department of Psychiatry, Faculty of Medicine, The Chinese University of Hong Kong
;
Li Ka Shing Institute of Health Sciences, Faculty of Medicine, The Chinese University of Hong Kong
;
Gerald Choa Neuroscience Institute, Department of Medicine and Therapeutics, The Chinese University of Hong Kong
AI summary
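Proposes the AIPatient Arena framework, which builds patient knowledge graphs from electronic health records and evaluates eight clinical capabilities of large language models in multi-turn doctor-patient interactions. The evaluation finds that models fall short in areas such as information coverage and diagnostic reasoning, which underscores the importance of process-level evaluation.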