Zhen ZHENG

Cited by

	All	Since 2019
Citations	585	571
h-index	12	12
i10-index	14	14

Citations per year

Year	Citations
2017	4
2018	10
2019	12
2020	26
2021	57
2022	103
2023	226
2024	146

Public access

11 articles available
0 articles not available

Based on funding mandates

Co-authors

Wei Lin: Alibaba. Verified email at alibaba-inc.com
Jun Yang: NVIDIA. Verified email at nvidia.com
Xipeng Shen: Professor of Computer Science, North Carolina State University. Verified email at ncsu.edu
Jidong Zhai: Tsinghua University. Verified email at tsinghua.edu.cn
Chuan Wu: Professor of Computer Science, The University of Hong Kong. Verified email at cs.hku.hk
Youngmin Yi: University of Seoul. Verified email at uos.ac.kr
Shuaiwen Leon Song: Vice President, Together.ai; Ex-Microsoft; Tenured Professor. Verified email at together.ai
Feng Zhang: Renmin University of China. Verified email at ruc.edu.cn

Zhen ZHENG

Microsoft

Verified email at microsoft.com

Machine Learning System, High Performance Computing, Heterogeneous Computing


Title	Cited by	Year
DAPPLE: A pipelined data parallel approach for training large models. S Fan, Y Rong, C Meng, Z Cao, S Wang, Z Zheng, C Wu, G Long, J Yang, ... Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of …, 2021	180	2021
Understanding and bridging the gaps in current GNN performance optimizations. K Huang, J Zhai, Z Zheng, Y Yi, X Shen. Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of …, 2021	73	2021
AStitch: Enabling a new multi-dimensional optimization space for memory-intensive ML training and inference on modern SIMT architectures. Z Zheng, X Yang, P Zhao, G Long, K Zhu, F Zhu, W Zhao, X Liu, J Yang, ... Proceedings of the 27th ACM International Conference on Architectural …, 2022	43	2022
Refactoring and optimizing the Community Atmosphere Model (CAM) on the Sunway TaihuLight supercomputer. H Fu, J Liao, W Xue, L Wang, D Chen, L Gu, J Xu, N Ding, X Wang, C He, ... SC'16: Proceedings of the International Conference for High Performance …, 2016	41	2016
VersaPipe: A versatile programming framework for pipelined computing on GPU. Z Zheng, C Oh, J Zhai, X Shen, Y Yi, W Chen. Proceedings of the 50th Annual IEEE/ACM International Symposium on …, 2017	37	2017
Whale: Efficient giant model training over heterogeneous GPUs. X Jia, L Jiang, A Wang, W Xiao, Z Shi, J Zhang, X Li, L Chen, Y Li, ... 2022 USENIX Annual Technical Conference (USENIX ATC 22), 673-688, 2022	34	2022
FusionStitching: Boosting memory intensive computations for deep learning workloads. Z Zheng, P Zhao, G Long, F Zhu, K Zhu, W Zhao, L Diao, J Yang, W Lin. arXiv preprint arXiv:2009.10924, 2020	28	2020
Optimizing distributed training deployment in heterogeneous GPU clusters. X Yi, S Zhang, Z Luo, G Long, L Diao, C Wu, Z Zheng, J Yang, W Lin. Proceedings of the 16th International Conference on emerging Networking …, 2020	24	2020
DISC: A dynamic shape compiler for machine learning workloads. K Zhu, WY Zhao, Z Zheng, TY Guo, PZ Zhao, JJ Bai, J Yang, XY Liu, ... Proceedings of the 1st Workshop on Machine Learning and Systems, 89-95, 2021	23	2021
Flash-LLM: Enabling cost-effective and highly-efficient large generative model inference with unstructured sparsity. H Xia, Z Zheng, Y Li, D Zhuang, Z Zhou, X Qiu, Y Li, W Lin, SL Song. arXiv preprint arXiv:2309.10285, 2023	18	2023
DREW: Efficient Winograd CNN inference with deep reuse. R Wu, F Zhang, J Guan, Z Zheng, X Du, X Shen. Proceedings of the ACM Web Conference 2022, 1807-1816, 2022	15	2022
GoPipe: A granularity-oblivious programming framework for pipelined stencil executions on GPU. C Oh, Z Zheng, X Shen, J Zhai, Y Yi. Proceedings of the ACM International Conference on Parallel Architectures …, 2020	13	2020
Exploring deep reuse in Winograd CNN inference. R Wu, F Zhang, Z Zheng, X Du, X Shen. Proceedings of the 26th ACM SIGPLAN Symposium on Principles and Practice of …, 2021	11	2021
HiWayLib: A software framework for enabling high performance communications for heterogeneous pipeline computations. Z Zheng, C Oh, J Zhai, X Shen, Y Yi, W Chen. Proceedings of the Twenty-Fourth International Conference on Architectural …, 2019	10	2019
Auto-MAP: A DQN framework for exploring distributed execution plans for DNN workloads. S Wang, Y Rong, S Fan, Z Zheng, LS Diao, G Long, J Yang, X Liu, W Lin. arXiv preprint arXiv:2007.04069, 2020	7	2020
BladeDISC: Optimizing dynamic shape machine learning workloads via compiler approach. Z Zheng, Z Pan, D Wang, K Zhu, W Zhao, T Guo, X Qiu, M Sun, J Bai, ... Proceedings of the ACM on Management of Data 1 (3), 1-29, 2023	5	2023
FP6-LLM: Efficiently serving large language models through FP6-centric algorithm-system co-design. H Xia, Z Zheng, X Wu, S Chen, Z Yao, S Youn, A Bakhtiari, M Wyatt, ... arXiv preprint arXiv:2401.14112, 2024	4	2024
ZeroQuant(4+2): Redefining LLMs quantization with a new FP6-centric strategy for diverse generative tasks. X Wu, H Xia, S Youn, Z Zheng, S Chen, A Bakhtiari, M Wyatt, Y He, ... arXiv preprint arXiv:2312.08583, 2023	4	2023
Optimizing DNN compilation for distributed training with joint OP and tensor fusion. X Yi, S Zhang, L Diao, C Wu, Z Zheng, S Fan, S Wang, J Yang, W Lin. IEEE Transactions on Parallel and Distributed Systems 33 (12), 4694-4706, 2022	4	2022
Whale: Scaling deep learning model training to the trillions. X Jia, AW Le Jiang, J Zhang, X Li, W Xiao, Y Li, Z Zheng, X Liu, W Lin. arXiv preprint arXiv:2011.09208, 2020	4	2020
