GLM-130B: An open bilingual pre-trained model A Zeng, X Liu, Z Du, Z Wang, H Lai, M Ding, Z Yang, Y Xu, W Zheng, X Xia, ... arXiv preprint arXiv:2210.02414, 2022 | 570 | 2022 |
PET: Optimizing tensor programs with partially equivalent transformations and automated corrections H Wang, J Zhai, M Gao, Z Ma, S Tang, L Zheng, Y Li, K Rong, Y Chen, ... 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI ..., 2021 | 73 | 2021 |
BaGuaLu: Targeting Brain Scale Pretrained Models with over 37 Million Cores Z Ma, J He, J Qiu, H Cao, Y Wang, Z Sun, L Zheng, H Wang, S Tang, ... Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of ..., 2022 | 56 | 2022 |
RisGraph: A real-time streaming system for evolving graphs to support sub-millisecond per-update analysis at millions ops/s G Feng, Z Ma, D Li, S Chen, X Zhu, W Han, W Chen Proceedings of the 2021 International Conference on Management of Data, 513-527, 2021 | 45 | 2021 |
Scaling graph traversal to 281 trillion edges with 40 million cores H Cao, Y Wang, H Wang, H Lin, Z Ma, W Yin, W Chen Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of ..., 2022 | 21 | 2022 |
SmartMoE: Efficiently Training Sparsely-Activated Models through Combining Offline and Online Parallelization M Zhai, J He, Z Ma, Z Zong, R Zhang, J Zhai 2023 USENIX Annual Technical Conference (USENIX ATC 23), 961-975, 2023 | 20 | 2023 |
TriCache: a user-transparent block cache enabling high-performance out-of-core processing with in-memory programs G Feng, H Cao, X Zhu, B Yu, Y Wang, Z Ma, S Chen, W Chen ACM Transactions on Storage 19 (2), 1-30, 2023 | 15 | 2023 |
UniQ: a unified programming model for efficient quantum circuit simulation C Zhang, H Wang, Z Ma, L Xie, Z Song, J Zhai SC22: International Conference for High Performance Computing, Networking ..., 2022 | 12 | 2022 |
EINNET: Optimizing tensor programs with Derivation-Based transformations L Zheng, H Wang, J Zhai, M Hu, Z Ma, T Wang, S Huang, X Miao, S Tang, ... 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI ..., 2023 | 10 | 2023 |
Scaling Graph 500 SSSP to 140 trillion edges with over 40 million cores Y Wang, H Cao, Z Ma, W Yin, W Chen 2022 SC22: International Conference for High Performance Computing ..., 2022 | 6 | 2022 |
Efficiently emulating high-bitwidth computation with low-bitwidth hardware Z Ma, H Wang, G Feng, C Zhang, L Xie, J He, S Chen, J Zhai Proceedings of the 36th ACM International Conference on Supercomputing, 1-12, 2022 | 4 | 2022 |
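System challenges and countermeasures for efficiently training pre-trained models with hundreds of trillions of parameters (in Chinese) Z Ma, J Zhai, W Han ZTE Technology Journal 28 (2), 51-58, 2022 | 3 | 2022 |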
OLLIE: Derivation-based tensor program optimizer L Zheng, H Wang, J Zhai, M Hu, Z Ma, T Wang, S Tang, L Xie, K Huang, ... arXiv preprint arXiv:2208.02025, 2022 | 2 | 2022 |
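Efficient memory allocator for the new-generation Sunway supercomputer (in Chinese) H Wang, Z Ma, L Zheng, Y Wang, F Wang, J Zhai Journal of Tsinghua University (Science and Technology), 2022 | 2 | 2022 |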
Optimizing DNNs with partially equivalent transformations and automated corrections H Wang, J Zhai, M Gao, F Zhang, T Wang, Z Ma, S Tang, L Zheng, ... IEEE Transactions on Computers, 2023 | 1 | 2023 |
PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR Z Ma, H Wang, J Xing, L Zheng, C Zhang, H Cao, K Huang, S Tang, ... arXiv preprint arXiv:2307.04995, 2023 | 1 | 2023 |
Unified Programming Models for Heterogeneous High-Performance Computers ZX Ma, YY Jin, SZ Tang, HJ Wang, WC Xue, JD Zhai, WM Zheng Journal of Computer Science and Technology 38 (1), 211-218, 2023 | 1 | 2023 |
Efficient memory allocator for the New Generation Sunway supercomputer H Wang, Z Ma, L Zheng, Y Wang, F Wang, J Zhai Journal of Tsinghua University (Science and Technology) 62 (5), 943-951, 2022 | 1 | 2022 |
Efficient Asynchronous Performance Prediction for Heterogeneous Systems Y Jin, Z Ma, J Zhai Chinese Journal of Computational Physics 41 (1), 40, 2024 | | 2024 |
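Asynchrony-aware performance prediction method for heterogeneous high-performance computers (in Chinese) Y Jin, Z Ma, J Zhai Chinese Journal of Computational Physics 41 (1), 40, 2024 | | 2024 |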