Emotion controllable speech synthesis using emotion-unlabeled dataset with the assistance of cross-domain speech emotion recognition X Cai, D Dai, Z Wu, X Li, J Li, H Meng ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and …, 2021 | 61 | 2021 |
Towards multi-scale style control for expressive speech synthesis X Li, C Song, J Li, Z Wu, J Jia, H Meng arXiv preprint arXiv:2104.03521, 2021 | 46 | 2021 |
Adversarially learning disentangled speech representations for robust multi-factor voice conversion J Wang, J Li, X Zhao, Z Wu, S Kang, H Meng arXiv preprint arXiv:2102.00184, 2021 | 25 | 2021 |
Inferring user emotive state changes in realistic human-computer conversational dialogs R Li, Z Wu, J Jia, J Li, W Chen, H Meng Proceedings of the 26th ACM international conference on Multimedia, 136-144, 2018 | 20 | 2018 |
NeuFA: Neural network based end-to-end forced alignment with bidirectional attention mechanism J Li, Y Meng, Z Wu, H Meng, Q Tian, Y Wang, Y Wang ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 16 | 2022 |
Enhancing speaking styles in conversational text-to-speech synthesis with graph-based multi-modal context modeling J Li, Y Meng, C Li, Z Wu, H Meng, C Weng, D Su ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and …, 2022 | 15 | 2022 |
Knowledge-Based Linguistic Encoding for End-to-End Mandarin Text-to-Speech Synthesis. J Li, Z Wu, R Li, P Zhi, S Yang, H Meng INTERSPEECH, 4494-4498, 2019 | 15 | 2019 |
Dependency parsing based semantic representation learning with graph neural network for enhancing expressiveness of text-to-speech Y Zhou, C Song, J Li, Z Wu, H Meng arXiv preprint arXiv:2104.06835, 2021 | 9 | 2021 |
Inferring speaking styles from multi-modal conversational context by multi-scale relational graph convolutional networks J Li, Y Meng, X Wu, Z Wu, J Jia, H Meng, Q Tian, Y Wang, Y Wang Proceedings of the 30th ACM International Conference on Multimedia, 5811-5820, 2022 | 7 | 2022 |
Syntactic representation learning for neural network based TTS with syntactic parse tree traversal C Song, J Li, Y Zhou, Z Wu, H Meng ICASSP 2021 - 2021 IEEE International Conference on Acoustics, Speech and …, 2021 | 5 | 2021 |
Enhancing word-level semantic representation via dependency structure for expressive text-to-speech synthesis Y Zhou, C Song, J Li, Z Wu, Y Bian, D Su, H Meng arXiv preprint arXiv:2104.06835, 2021 | 4 | 2021 |
DiCLET-TTS: Diffusion model based cross-lingual emotion transfer for text-to-speech—A study between English and Mandarin T Li, C Hu, J Cong, X Zhu, J Li, Q Tian, Y Wang, L Xie IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2023 | 3 | 2023 |
Multi-modal multi-scale speech expression evaluation in computer-assisted language learning J Li, Z Wu, R Li, M Xu, K Lei, L Cai Artificial Intelligence and Mobile Services–AIMS 2018: 7th International …, 2018 | 2 | 2018 |
Joint Multiscale Cross-Lingual Speaking Style Transfer With Bidirectional Attention Mechanism for Automatic Dubbing J Li, S Li, P Chen, L Zhang, Y Meng, Z Wu, H Meng, Q Tian, Y Wang, ... IEEE/ACM Transactions on Audio, Speech, and Language Processing 32, 517-528, 2023 | 1 | 2023 |