2606.10440
2026-06-10
cs.DC
cs.LG
cs.NI
New submission
ASTRA-sim 3.0: Next-Level Distributed Machine Learning Simulations via High-Fidelity GPU and Infrastructure Modeling
William Won, Jinsun Yoo, Tuan Ta, Moumita Dey, Andy Balogh, Pradosh Datta, Furkan Eris, Conor Green, Winston Liu, Changhai Man, Kingshuk Mandal, Amos Rai, Vinay Ramakrishnaiah, Ruchi Shah, David Sidler, Harsh Sikhwal, Hanjiang Wu, Tushar Krishna, Bradford M. Beckmann
Affiliations
AMD Research and Advanced Development; Georgia Institute of Technology; Keysight; Purdue University
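AI Summary
Existing simulators model latency-sensitive communication in distributed machine learning poorly. ASTRA-sim 3.0 addresses this with two additions: fine-grained load/store simulation at cache-line granularity, and InfraGraph, a standardized infrastructure representation. Together they enable high-fidelity simulation that supports design-space exploration of optimized collective algorithms, network requirements, and GPU architectures.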
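To illustrate what modeling communication at cache-line granularity means, here is a minimal sketch. It breaks a transfer into the aligned cache lines it touches and estimates its time with a simple pipelined model. The 64-byte line size, the function names, and the latency model are all illustrative assumptions, not ASTRA-sim 3.0's actual implementation or API.

```python
# Assumed cache-line size in bytes. 64 B is common, but the granularity
# ASTRA-sim 3.0 actually uses is not stated in this summary (assumption).
CACHE_LINE_BYTES = 64


def split_into_cache_lines(base_addr: int, size_bytes: int,
                           line: int = CACHE_LINE_BYTES) -> list[int]:
    """Return the aligned start addresses of every cache line a transfer touches.

    A transfer that starts or ends mid-line still touches that whole line,
    so the byte range is rounded outward to line boundaries.
    """
    if size_bytes <= 0:
        return []
    first = (base_addr // line) * line
    last = ((base_addr + size_bytes - 1) // line) * line
    return list(range(first, last + 1, line))


def pipelined_transfer_time_ns(num_lines: int, per_line_latency_ns: float,
                               issue_interval_ns: float) -> float:
    """Hypothetical timing model: the first line pays the full access latency,
    and each later line completes one issue interval after the previous one."""
    if num_lines == 0:
        return 0.0
    return per_line_latency_ns + (num_lines - 1) * issue_interval_ns


if __name__ == "__main__":
    lines = split_into_cache_lines(0x1030, 100)  # 100 B starting mid-line
    print([hex(a) for a in lines])               # three lines touched
    print(pipelined_transfer_time_ns(len(lines), 500.0, 2.0))
```

Under this toy model, a 100-byte message touches three lines. Its cost is dominated by the fixed per-line latency rather than by bandwidth. This is the regime where a coarse, bandwidth-only network model would misestimate small, latency-sensitive collectives.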