PET: Optimizing tensor programs with partially equivalent transformations and automated corrections H Wang, J Zhai, M Gao, Z Ma, S Tang, L Zheng, Y Li, K Rong, Y Chen, ... 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI ..., 2021 | 72 | 2021 |
FasterMoE: modeling and optimizing training of large-scale dynamic pre-trained models J He, J Zhai, T Antunes, H Wang, F Luo, S Shi, Q Li Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of ..., 2022 | 67 | 2022 |
BaGuaLu: targeting brain scale pretrained models with over 37 million cores Z Ma, J He, J Qiu, H Cao, Y Wang, Z Sun, L Zheng, H Wang, S Tang, ... Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of ..., 2022 | 60 | 2022 |
HyQuas: hybrid partitioner based quantum circuit simulation system on GPU C Zhang, Z Song, H Wang, K Rong, J Zhai Proceedings of the ACM International Conference on Supercomputing, 443-454, 2021 | 27 | 2021 |
Spindle: Informed memory access monitoring H Wang, J Zhai, X Tang, B Yu, X Ma, W Chen 2018 USENIX Annual Technical Conference (USENIX ATC 18), 561-574, 2018 | 26 | 2018 |
Scaling graph traversal to 281 trillion edges with 40 million cores H Cao, Y Wang, H Wang, H Lin, Z Ma, W Yin, W Chen Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of ..., 2022 | 23 | 2022 |
Spread-n-share: improving application performance and cluster throughput with resource-aware job placement X Tang, H Wang, X Ma, N El-Sayed, J Zhai, W Chen, A Aboulnaga Proceedings of the International Conference for High Performance Computing ..., 2019 | 18 | 2019 |
FreeTensor: a free-form DSL with holistic optimizations for irregular tensor programs S Tang, J Zhai, H Wang, L Jiang, L Zheng, Z Yuan, C Zhang Proceedings of the 43rd ACM SIGPLAN International Conference on Programming ..., 2022 | 15 | 2022 |
: Large-Scale Graph Triangle Counting on a Single Machine Using GPUs J Huang, H Wang, X Fei, X Wang, W Chen IEEE Transactions on Parallel and Distributed Systems 33 (11), 3067-3078, 2021 | 13 | 2021 |
UniQ: a unified programming model for efficient quantum circuit simulation C Zhang, H Wang, Z Ma, L Xie, Z Song, J Zhai SC22: International Conference for High Performance Computing, Networking ..., 2022 | 12 | 2022 |
ScalAna: Automating scaling loss detection with graph analysis Y Jin, H Wang, T Yu, X Tang, T Hoefler, X Liu, J Zhai SC20: International Conference for High Performance Computing, Networking ..., 2020 | 12 | 2020 |
PerFlow: A domain specific framework for automatic performance analysis of parallel applications Y Jin, H Wang, R Zhong, C Zhang, J Zhai Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of ..., 2022 | 10 | 2022 |
Vapro: Performance variance detection and diagnosis for production-run parallel applications L Zheng, J Zhai, X Tang, H Wang, T Yu, Y Jin, SL Song, W Chen Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of ..., 2022 | 8 | 2022 |
LotusSQL: SQL engine for high-performance big data systems X Li, B Yu, G Feng, H Wang, W Chen Big Data Mining and Analytics 4 (4), 252-265, 2021 | 7 | 2021 |
Efficiently emulating high-bitwidth computation with low-bitwidth hardware Z Ma, H Wang, G Feng, C Zhang, L Xie, J He, S Chen, J Zhai Proceedings of the 36th ACM International Conference on Supercomputing, 1-12, 2022 | 6 | 2022 |
Identifying scalability bottlenecks for large-scale parallel programs with graph analysis Y Jin, H Wang, X Tang, T Hoefler, X Liu, J Zhai Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of ..., 2020 | 3 | 2020 |
An Efficient Sparse CNNs Accelerator on FPGA Y Zhang, H Jiang, X Li, H Wang, D Dong, Y Cao 2022 IEEE International Conference on Cluster Computing (CLUSTER), 504-505, 2022 | 2 | 2022 |
OLLIE: Derivation-based tensor program optimizer L Zheng, H Wang, J Zhai, M Hu, Z Ma, T Wang, S Tang, L Xie, K Huang, ... arXiv preprint arXiv:2208.02025, 2022 | 2 | 2022 |
Detecting performance variance for parallel applications without source code J Zhai, L Zheng, F Zhang, X Tang, H Wang, T Yu, Y Jin, SL Song, W Chen IEEE Transactions on Parallel and Distributed Systems 33 (12), 4239-4255, 2022 | 2 | 2022 |
Sparker: Efficient reduction for more scalable machine learning with spark B Yu, H Cao, T Shan, H Wang, X Tang, W Chen Proceedings of the 50th International Conference on Parallel Processing, 1-11, 2021 | 2 | 2021 |