2604.14892
2026-06-15
cs.LG
cs.AI
Version update
Can LLMs Accurately Score Medical Diagnoses and Clinical Reasoning?
Amy Rouillard, Sitwala Mundia, Linda Camara, Ziyaad Dangor, Michael Cameron Gramanie, Ismail Kalla, Shabir A. Madhi, Kajal Morar, Marlvin T. Ncube, Haroon Saloojee, Bruce A. Bassett
Affiliations
Wits MIND Institute, University of the Witwatersrand, Johannesburg, South Africa;
Grai Labs, Cape Town, South Africa;
South African Medical Research Council Vaccines and Infectious Diseases Analytics Research Unit, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa;
Department of Internal Medicine, Charlotte Maxeke Johannesburg Academic Hospital, and Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa;
Department of Paediatrics and Child Health, Faculty of Health Sciences, University of the Witwatersrand, Johannesburg, South Africa;
Wits MIND Institute, University of the Witwatersrand, Johannesburg
AI summary
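The study uses an LLM jury to score 3,334 diagnoses across 300 hospital cases from low- and middle-income countries. It finds that calibrated LLM scores agree closely with expert scores and carry a lower risk of severe errors, so they can serve as a reliable proxy for expert evaluation.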