Zixuan Ma
Verified email at mails.tsinghua.edu.cn
Title
Cited by
Year
GLM-130B: An open bilingual pre-trained model
A Zeng, X Liu, Z Du, Z Wang, H Lai, M Ding, Z Yang, Y Xu, W Zheng, X Xia, ...
arXiv preprint arXiv:2210.02414, 2022
Cited by: 570, Year: 2022
PET: Optimizing tensor programs with partially equivalent transformations and automated corrections
H Wang, J Zhai, M Gao, Z Ma, S Tang, L Zheng, Y Li, K Rong, Y Chen, ...
15th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2021
Cited by: 73, Year: 2021
BaGuaLu: Targeting Brain Scale Pretrained Models with over 37 Million Cores
Z Ma, J He, J Qiu, H Cao, Y Wang, Z Sun, L Zheng, H Wang, S Tang, ...
Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of …, 2022
Cited by: 56, Year: 2022
Risgraph: A real-time streaming system for evolving graphs to support sub-millisecond per-update analysis at millions ops/s
G Feng, Z Ma, D Li, S Chen, X Zhu, W Han, W Chen
Proceedings of the 2021 International Conference on Management of Data, 513-527, 2021
Cited by: 45, Year: 2021
Scaling graph traversal to 281 trillion edges with 40 million cores
H Cao, Y Wang, H Wang, H Lin, Z Ma, W Yin, W Chen
Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of …, 2022
Cited by: 21, Year: 2022
SmartMoE: Efficiently Training Sparsely-Activated Models through Combining Offline and Online Parallelization
M Zhai, J He, Z Ma, Z Zong, R Zhang, J Zhai
2023 USENIX Annual Technical Conference (USENIX ATC 23), 961-975, 2023
Cited by: 20, Year: 2023
TriCache: a user-transparent block cache enabling high-performance out-of-core processing with in-memory programs
G Feng, H Cao, X Zhu, B Yu, Y Wang, Z Ma, S Chen, W Chen
ACM Transactions on Storage 19 (2), 1-30, 2023
Cited by: 15, Year: 2023
UniQ: a unified programming model for efficient quantum circuit simulation
C Zhang, H Wang, Z Ma, L Xie, Z Song, J Zhai
SC22: International Conference for High Performance Computing, Networking …, 2022
Cited by: 12, Year: 2022
EINNET: Optimizing tensor programs with Derivation-Based transformations
L Zheng, H Wang, J Zhai, M Hu, Z Ma, T Wang, S Huang, X Miao, S Tang, ...
17th USENIX Symposium on Operating Systems Design and Implementation (OSDI …, 2023
Cited by: 10, Year: 2023
Scaling Graph 500 SSSP to 140 trillion edges with over 40 million cores
Y Wang, H Cao, Z Ma, W Yin, W Chen
2022 SC22: International Conference for High Performance Computing …, 2022
Cited by: 6, Year: 2022
Efficiently emulating high-bitwidth computation with low-bitwidth hardware
Z Ma, H Wang, G Feng, C Zhang, L Xie, J He, S Chen, J Zhai
Proceedings of the 36th ACM International Conference on Supercomputing, 1-12, 2022
Cited by: 4, Year: 2022
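System challenges and countermeasures for efficiently training pre-trained models with hundreds of trillions of parameters
Zixuan Ma, Jidong Zhai, Wentao Han
ZTE Technology Journal 28 (2), 51-58, 2022
Cited by: 3, Year: 2022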
OLLIE: Derivation-based tensor program optimizer
L Zheng, H Wang, J Zhai, M Hu, Z Ma, T Wang, S Tang, L Xie, K Huang, ...
arXiv preprint arXiv:2208.02025, 2022
Cited by: 2, Year: 2022
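Efficient memory allocator for the new-generation Sunway supercomputer
Haojie Wang, Zixuan Ma, Liyan Zheng, Yuanwei Wang, Fei Wang, Jidong Zhai
Journal of Tsinghua University (Science and Technology), 2022
Cited by: 2, Year: 2022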
Optimizing DNNs with partially equivalent transformations and automated corrections
H Wang, J Zhai, M Gao, F Zhang, T Wang, Z Ma, S Tang, L Zheng, ...
IEEE Transactions on Computers, 2023
Cited by: 1, Year: 2023
PowerFusion: A Tensor Compiler with Explicit Data Movement Description and Instruction-level Graph IR
Z Ma, H Wang, J Xing, L Zheng, C Zhang, H Cao, K Huang, S Tang, ...
arXiv preprint arXiv:2307.04995, 2023
Cited by: 1, Year: 2023
Unified Programming Models for Heterogeneous High-Performance Computers
ZX Ma, YY Jin, SZ Tang, HJ Wang, WC Xue, JD Zhai, WM Zheng
Journal of Computer Science and Technology 38 (1), 211-218, 2023
Cited by: 1, Year: 2023
Efficient memory allocator for the New Generation Sunway supercomputer
H Wang, Z Ma, L Zheng, Y Wang, F Wang, J Zhai
Journal of Tsinghua University (Science and Technology) 62 (5), 943-951, 2022
Cited by: 1, Year: 2022
Efficient Asynchronous Performance Prediction for Heterogeneous Systems
Y Jin, Z Ma, J Zhai
Chinese Journal of Computational Physics 41 (1), 40, 2024
Year: 2024
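Asynchrony-aware performance prediction method for heterogeneous high-performance computers
Yuyang Jin, Zixuan Ma, Jidong Zhai
Chinese Journal of Computational Physics 41 (1), 40, 2024
Year: 2024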