The LORD said, “If as one people speaking the same language they have begun to do this, then nothing they plan to do will be impossible for them. Come, let us go down and confuse their language so they will not understand each other.” (Genesis 11:6-7)

This is why there are so many languages in the world. Although it was God's intention to confuse human beings with different languages, He still gives us a chance to break the spell. God implies that nothing will be impossible when human beings are unified. Thus, I believe that everything is achievable with technology.

Announcement

I am currently recruiting student interns in the field of speech and natural language processing. Candidates need to be self-motivated and good at programming (e.g. C++, Python). If you are interested in research related to speech or machine translation, please feel free to contact me.

Contact: tomkocse@gmail.com

Tom Ko - Google Scholar.

Biography

Tom, Ko Yu Ting received the B.Eng. degree in computer engineering from the Chinese University of Hong Kong in 2003. He then received the M.Phil. and Ph.D. degrees in computer science and engineering from the Hong Kong University of Science and Technology in 2010 and 2014, respectively. Throughout his postgraduate studies, he was supervised by Professor Brian Mak. After that, he joined Huawei Noah's Ark Lab as a research scientist. In 2019, he became an assistant professor at Southern University of Science and Technology in China. In 2021, he joined ByteDance AI as a research scientist. His research interests include speech recognition and natural language processing.

Education

Ph.D. in Computer Science and Engineering, HKUST, 2014

M.Phil. in Computer Science and Engineering, HKUST, 2010

M.Sc. in IC Design Engineering, HKUST, 2007

Bachelor's Degree in Computer Engineering, CUHK, 2003

Publications

[Conference Papers]

Rong Ye, Chengqi Zhao, Tom Ko, Chutong Meng, Tao Wang, Mingxuan Wang, Jun Cao "GigaST: A 10,000-hour Pseudo Speech Translation Corpus", in Proceedings of Interspeech, August, 2023, Dublin, Ireland
Chutong Meng, Junyi Ao, Tom Ko, Mingxuan Wang, Haizhou Li "CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning", in Proceedings of Interspeech, August, 2023, Dublin, Ireland
Xubo Liu, Qiushi Huang, Xinhao Mei, Haohe Liu, Qiuqiang Kong, Jianyuan Sun, Shengchen Li, Tom Ko, Yu Zhang, Lilian H. Tang, Mark D. Plumbley, Volkan Kılıç, Wenwu Wang "Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention", in Proceedings of Interspeech, August, 2023, Dublin, Ireland
Chen Xu, Xiaoqian Liu, Xiaowen Liu, Qingxuan Sun, Yuhao Zhang, Murun Yang, Qianqian Dong, Tom Ko, Mingxuan Wang, Tong Xiao, Anxiang Ma, Jingbo Zhu "CTC-based Non-autoregressive Speech Translation", in Proceedings of ACL 2023, Canada
"MOSPC: MOS Prediction Based on Pairwise Comparison", in Proceedings of ACL 2023, Canada
Dong Zhang, Rong Ye, Tom Ko, Mingxuan Wang, Yaqian Zhou "DUB: Discrete Unit Back-translation for Speech Translation", in Proceedings of ACL Findings 2023, Canada
Chen Xu, Rong Ye, Qianqian Dong, Chengqi Zhao, Tom Ko, Mingxuan Wang, Tong Xiao, Jingbo Zhu "Recent Advances in Direct Speech-to-text Translation", in Proceedings of IJCAI, 2023, Macao
Xuxin Cheng, Qianqian Dong, Fengpeng Yue, Tom Ko, Mingxuan Wang, Yuexian Zou "M3ST: Mix at Three Levels for Speech Translation", in Proceedings of ICASSP, June, 2023, Rhodes Island, Greece
Yunhao Gou, Tom Ko, Hansi Yang, James Kwok, Yu Zhang and Mingxuan Wang "Leveraging per Image-Token Consistency for Vision-Language Pre-training", in CVPR 2023
Qiushi Huang, Yu Zhang, Tom Ko, Xubo Liu, Bo Wu, Wenwu Wang and Lilian Tang "Personalized Dialogue Generation with Persona-Adaptive Attention", in Proceedings of AAAI, February, 2023, Washington, DC, USA
Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu, Haizhou Li, Tom Ko, Lirong Dai, Jinyu Li, Yao Qian and Furu Wei "Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data", in Proceedings of Interspeech, September, 2022, Incheon, Korea
Qianqian Dong, Fengpeng Yue, Tom Ko, Mingxuan Wang, Qibing Bai and Yu Zhang "Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation", in Proceedings of Interspeech, September, 2022, Incheon, Korea
Rui Wang, Qibing Bai, Junyi Ao, Long Zhou, Zhixiang Xiong, Zhihua Wei, Yu Zhang, Tom Ko and Haizhou Li "LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT", in Proceedings of Interspeech, September, 2022, Incheon, Korea
Qibing Bai, Tom Ko and Yu Zhang "A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis", in Proceedings of Interspeech, September, 2022, Incheon, Korea
Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei "SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing", in Proceedings of ACL 2022, Dublin
Fengpeng Yue, Yan Feng, Lei He, Tom Ko, Yu Zhang "EXPLORING MACHINE SPEECH CHAIN FOR DOMAIN ADAPTATION", in Proceedings of ICASSP, May, 2022, Singapore
Rui Wang, Junyi Ao, Long Zhou, Shujie Liu, Zhihua Wei, Tom Ko, Qing Li, Yu Zhang "MULTI-VIEW SELF-ATTENTION BASED TRANSFORMER FOR SPEAKER RECOGNITION", in Proceedings of ICASSP, May, 2022, Singapore
Jingsong Wang, Yuxuan He, Chunyu Zhao, Qijie Shao, Wei-Wei Tu, Tom Ko, Hung-yi Lee, Lei Xie "Auto-KWS 2021 Challenge: Task, Datasets, and Baselines", in Proceedings of Interspeech, September, 2021, Brno, Czech Republic
Qiushi Huang, Tom Ko, H. Lilian Tang, Xubo Liu, Bo Wu "Token-Level Supervised Contrastive Learning for Punctuation Restoration", in Proceedings of Interspeech, September, 2021, Brno, Czech Republic
Yangbin Chen, Tom Ko, Jianping Wang "A Meta-Learning Approach for User-Defined Spoken Term Classification with Varying Classes and Examples", in Proceedings of Interspeech, September, 2021, Brno, Czech Republic
Junyi Ao, Tom Ko "Improving Attention-based End-to-end ASR by Incorporating an N-gram Neural Network", in Proceedings of ISCSLP, January, 2021, Hong Kong
Fengpeng Yue, Tom Ko "An Investigation of Positional Encoding in Transformer-based End-to-end Speech Recognition", in Proceedings of ISCSLP, January, 2021, Hong Kong
Yangbin Chen, Tom Ko, Lifeng Shang, Xiao Chen, Xin Jiang, Qing Li "An Investigation of Few-Shot Learning in Spoken Term Classification", in Proceedings of Interspeech, September, 2020, Shanghai, China
Yangbin Chen, Yun Ma, Tom Ko, Jianping Wang, Qing Li "MetaMix: Improved Meta-Learning with Interpolation based Consistency Regularization", in ICPR 2020
Jingsong Wang, Tom Ko, Zhen Xu, Xiawei Guo, Souxiang Liu, Wei-Wei Tu, Lei Xie "AutoSpeech 2020: The Second Automated Machine Learning Challenge for Speech Classification", in Proceedings of Interspeech, September, 2020, Shanghai, China
Tom Ko, Yangbin Chen, Qing Li "Prototypical Networks for Small Footprint Text-independent Speaker Verification", in Proceedings of ICASSP, May, 2020, Barcelona, Spain
Yingke Zhu, Tom Ko, Brian Mak "Mixup Learning Strategies for Text-independent Speaker Verification", in Proceedings of Interspeech, September, 2019, Graz, Austria
Yingke Zhu, Tom Ko, David Snyder, Brian Mak, Daniel Povey "Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification", in Proceedings of Interspeech, September, 2018, Hyderabad, India
Zhen Qin, Tom Ko, Guangjian Tian "Long Distance Voice Channel Diagnosis Using Deep Neural Networks", in Proceedings of Interspeech, September, 2018, Hyderabad, India
Tom Ko, Vijayaditya Peddinti, Daniel Povey, Michael L. Seltzer, Sanjeev Khudanpur "A study on data augmentation of reverberant speech for robust speech recognition", in Proceedings of ICASSP, March, 2017, New Orleans, USA
Yajie Miao, Mohammad Gowayyed, Xingyu Na, Tom Ko, Florian Metze, and Alexander Waibel "An Empirical Exploration of CTC Acoustic Models", in Proceedings of ICASSP, March, 2016, Shanghai, China
Vijayaditya Peddinti, Guoguo Chen, Vimal Manohar, Tom Ko, Daniel Povey, Sanjeev Khudanpur "JHU ASpIRE system: Robust LVCSR with TDNNs, i-vector adaptation and RNN-LMs", in IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December, 2015, Scottsdale, Arizona, USA
Tom Ko, Vijayaditya Peddinti, Daniel Povey, Sanjeev Khudanpur "Audio Augmentation for Speech Recognition", in Proceedings of Interspeech, September, 2015, Dresden, Germany (poster)
Tom Ko, Brian Mak, Dongpeng Chen "Modeling Inter-cluster and Intra-cluster Discrimination Among Triphones", in Proceedings of the International Symposium of Chinese Spoken Language Processing, September, 2014, Singapore (poster)
Tom Ko, Brian Mak and Cheung-Chi Leung "SUBSPACE GAUSSIAN MIXTURE MODEL WITH STATE-DEPENDENT SUBSPACE DIMENSIONS", in Proceedings of ICASSP, pages 1744-1748, May, 2014, Florence, Italy (poster)
Tom Ko and Brian Mak, "DERIVATION OF EIGENTRIPHONES BY WEIGHTED PRINCIPAL COMPONENT ANALYSIS", in Proceedings of ICASSP, pages 4097-4100, March, 2012, Kyoto, Japan (oral)
Tom Ko and Brian Mak, "A FULLY AUTOMATED DERIVATION OF STATE-BASED EIGENTRIPHONES FOR TRIPHONE MODELING WITH NO TIED STATES USING REGULARIZATION", in Proceedings of Interspeech, pages 781-784, August, 2011, Florence, Italy (oral)
Tom Ko and Brian Mak, "EIGENTRIPHONES: A BASIS FOR CONTEXT-DEPENDENT ACOUSTIC MODELING", in Proceedings of ICASSP, pages 4892-4895, May, 2011, Prague, Czech Republic (poster)
Brian Mak and Tom Ko, "Problems of Modeling Phone Deletion in Conversational Speech for Speech Recognition", in Proceedings of the International Symposium of Chinese Spoken Language Processing, pages 114-118, Nov, 2010, Taiwan (oral)
Tom Ko and Brian Mak, "IMPROVING SPEECH RECOGNITION BY EXPLICIT MODELING OF PHONE DELETIONS", in Proceedings of ICASSP, pages 4858-4861, March, 2010, Dallas, Texas, USA (poster)
Brian Mak and Tom Ko, "Automatic Estimation of Decoding Parameters Using Large-Margin Iterative Linear Programming", in Proceedings of Interspeech, pages 1219-1222, Sept, 2009, Brighton, U.K. (poster)
Brian Mak and Tom Ko, "Min-max Discriminative Training of Decoding Parameters Using Iterative Linear Programming", in Proceedings of Interspeech, pages 915-918, Sept, 2008, Brisbane, Australia

[Journal Papers]

Tom Ko and Brian Mak, "Eigentrigraphemes for Under-Resourced Languages", Speech Communication, volume 56, pages 132-141, January, 2014
Tom Ko and Brian Mak, "Eigentriphones for Context-dependent Acoustic Modeling", IEEE Transactions on Audio, Speech and Language Processing, volume 21, number 6, pages 1285-1294, 2013

[Thesis]

Phone Deletion Modeling in Speech Recognition (M.Phil.)

Distinct Acoustic Modeling for Automatic Speech Recognition (Ph.D.)

[Awards]

Second-best presentation at the Signal Processing Postgraduate Forum 2010, organized by the IEEE Hong Kong Chapter

Professor Samuel Chanson Best Teaching Assistant Award 2011-12

Services

[Teaching Assistant]

Comp102 Computer and Programming Fundamentals I
Comp103 Computer and Programming Fundamentals II
Comp104 Programming Fundamentals and Methodology
Comp1022Q Introduction to Computing with Excel VBA