ROLLER: Fast and efficient tensor compilation for deep learning H Zhu, R Wu, Y Diao, S Ke, H Li, C Zhang, J Xue, L Ma, Y Xia, W Cui, ... 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI, 2022 | 79 | 2022 |
HyQuas: hybrid partitioner based quantum circuit simulation system on GPU C Zhang, Z Song, H Wang, K Rong, J Zhai Proceedings of the ACM International Conference on Supercomputing, 443-454, 2021 | 28 | 2021 |
FreeTensor: a free-form DSL with holistic optimizations for irregular tensor programs S Tang, J Zhai, H Wang, L Jiang, L Zheng, Z Yuan, C Zhang Proceedings of the 43rd ACM SIGPLAN International Conference on Programming, 2022 | 15 | 2022 |
UniQ: a unified programming model for efficient quantum circuit simulation C Zhang, H Wang, Z Ma, L Xie, Z Song, J Zhai SC22: International Conference for High Performance Computing, Networking, 2022 | 12 | 2022 |
Cocktailer: Analyzing and optimizing dynamic control flow in deep learning C Zhang, L Ma, J Xue, Y Shi, Z Miao, F Yang, J Zhai, Z Yang, M Yang 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI, 2023 | 11 | 2023 |
PerFlow: A domain specific framework for automatic performance analysis of parallel applications Y Jin, H Wang, R Zhong, C Zhang, J Zhai Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of, 2022 | 10 | 2022 |
Efficiently emulating high-bitwidth computation with low-bitwidth hardware Z Ma, H Wang, G Feng, C Zhang, L Xie, J He, S Chen, J Zhai Proceedings of the 36th ACM International Conference on Supercomputing, 1-12, 2022 | 6 | 2022 |
Critique of Planetary Normal Mode Computation: Parallel Algorithms, Performance, and Reproducibility by SCC Team From Tsinghua University C Zhang, C Zhao, J He, S Chen, L Zheng, K Huang, W Han, J Zhai IEEE Transactions on Parallel and Distributed Systems 32 (11), 2631-2634, 2021 | 2 | 2021 |
A Fast Lock for Explicit Message Passing Architectures X Tang, C Zhang, J Zhai, X Qian, W Chen, Y Jiang IEEE Transactions on Computers 70 (10), 1555-1568, 2020 | 1 | 2020 |
MixQ: Taming Dynamic Outliers in Mixed-Precision Quantization by Online Prediction Y Chen, C Zhang, R Dong, H Zhang, Y Zhang, Z Lu, J Zhai 2024 SC24: International Conference for High Performance Computing, 2024 | | 2024 |
Graph-Centric Performance Analysis for Large-Scale Parallel Applications Y Jin, H Wang, R Zhong, C Zhang, X Liao, F Zhang, J Zhai IEEE Transactions on Parallel and Distributed Systems, 2024 | | 2024 |
MagPy: Compiling Eager Mode DNN Programs by Monitoring Execution States C Zhang, R Dong, H Wang, R Zhong, J Chen, J Zhai 2024 USENIX Annual Technical Conference (USENIX ATC 24), 683-698, 2024 | | 2024 |
Critique of MemXCT: memory-centric X-ray CT reconstruction with massive parallelization by SCC Team from Tsinghua University R Zhong, J Chen, C Zhang, M Zhai, Z Song, Y Wang, W Han, L Gan, ... IEEE Transactions on Parallel and Distributed Systems 33 (9), 2050-2053, 2021 | | 2021 |