BaGuaLu: targeting brain scale pretrained models with over 37 million cores Z Ma, J He, J Qiu, H Cao, Y Wang, Z Sun, L Zheng, H Wang, S Tang, ... Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of ..., 2022 | 56 | 2022 |
Collaborative heterogeneity-aware OS scheduler for asymmetric multicore processors T Yu, R Zhong, V Janjic, P Petoumenos, J Zhai, H Leather, J Thomson IEEE Transactions on Parallel and Distributed Systems 32 (5), 1224-1237, 2020 | 22 | 2020 |
PerFlow: A domain specific framework for automatic performance analysis of parallel applications Y Jin, H Wang, R Zhong, C Zhang, J Zhai Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of ..., 2022 | 10 | 2022 |
Efficient Inference for Pruned CNN Models on Mobile Devices With Holistic Sparsity Alignment Y Jin, R Zhong, S Long, J Zhai IEEE Transactions on Parallel and Distributed Systems, 2024 | | 2024 |
Graph-Centric Performance Analysis for Large-Scale Parallel Applications Y Jin, H Wang, R Zhong, C Zhang, X Liao, F Zhang, J Zhai IEEE Transactions on Parallel and Distributed Systems, 2024 | | 2024 |
MAGPY: Compiling Eager Mode DNN Programs by Monitoring Execution States C Zhang, R Dong, H Wang, R Zhong, J Chen, J Zhai 2024 USENIX Annual Technical Conference (USENIX ATC 24), 683-698, 2024 | | 2024 |
Critique of MemXCT: memory-centric X-ray CT reconstruction with massive parallelization by SCC Team from Tsinghua University R Zhong, J Chen, C Zhang, M Zhai, Z Song, Y Wang, W Han, L Gan, ... IEEE Transactions on Parallel and Distributed Systems 33 (9), 2050-2053, 2021 | | 2021 |