The LORD said, “If as one people speaking the same language they have begun to do this, then nothing they plan to do will be impossible for them. Come, let us go down and confuse their language so they will not understand each other.” (Genesis 11:6-7)
This is the reason why there are so many languages in the world. Although God intended to confuse human beings with different languages, He still gives us a chance to break the spell. God implies that nothing will be impossible once humankind is unified. Thus, I believe that everything is achievable with technology.
Announcement
I am currently recruiting student interns in the field of speech and natural language processing. Candidates need to be self-motivated and good at programming (e.g. C++, Python). If you are interested in research related to speech or machine translation, please feel free to contact me.
Contact: tomkocse@gmail.com
Tom Ko - Google Scholar
Biography
Tom, Ko Yu Ting received the B.Eng. degree in computer engineering from the Chinese University of Hong Kong in 2003. He then received the M.Phil. and Ph.D. degrees in computer science and engineering from the Hong Kong University of Science and Technology in 2010 and 2014, respectively. Throughout his postgraduate studies, he was supervised by Professor Brian Mak. After that, he joined Huawei Noah's Ark Lab as a research scientist. In 2019, he worked as an assistant professor at Southern University of Science and Technology in China. In 2021, he joined ByteDance AI as a research scientist. His research interests include speech recognition and natural language processing.
Education
Ph.D. in Computer Science and Engineering, HKUST, 2014
M.Phil. in Computer Science and Engineering, HKUST, 2010
M.Sc. in IC Design Engineering, HKUST, 2007
Bachelor's Degree in Computer Engineering, CUHK, 2003
Publications
[Conference Papers]
Zhichao Huang, Chutong Meng, Tom Ko
"RepCodec: A Speech Representation Codec for Speech Tokenization",
in Proceedings of ACL, August, 2024, Thailand
Qiushi Huang, Xubo Liu, Tom Ko, Bo Wu, Wenwu Wang, Yu Zhang, Lilian Tang
"Selective Prompting Tuning for Personalized Conversations with LLMs",
in Proceedings of ACL Findings, August, 2024, Thailand
Qianqian Dong, Zhiying Huang, Qiao Tian, Chen Xu, Tom Ko, Yunlong Zhao, Siyuan Feng, Tang Li, Kexin Wang, Xuxin Cheng, Fengpeng Yue, Ye Bai, Xi Chen, Lu Lu, Zejun Ma, Yuping Wang, Mingxuan Wang, Yuxuan Wang
"PolyVoice: Language Models for Speech to Speech Translation",
in Proceedings of ICLR, May, 2024, Vienna, Austria
Rong Ye, Chengqi Zhao, Tom Ko, Chutong Meng, Tao Wang, Mingxuan Wang, Jun Cao
"GigaST: A 10,000-hour Pseudo Speech Translation Corpus",
in Proceedings of Interspeech, August, 2023, Dublin, Ireland
Chutong Meng, Junyi Ao, Tom Ko, Mingxuan Wang, Haizhou Li
"CoBERT: Self-Supervised Speech Representation Learning Through Code Representation Learning",
in Proceedings of Interspeech, August, 2023, Dublin, Ireland
Xubo Liu, Qiushi Huang, Xinhao Mei, Haohe Liu, Qiuqiang Kong, Jianyuan Sun, Shengchen Li, Tom Ko, Yu Zhang, Lilian H. Tang, Mark D. Plumbley, Volkan Kılıç, Wenwu Wang
"Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention",
in Proceedings of Interspeech, August, 2023, Dublin, Ireland
"FINDINGS OF THE IWSLT 2023 EVALUATION CAMPAIGN",
in Proceedings of IWSLT 2023, Canada
Chen Xu, Xiaoqian Liu, Xiaowen Liu, Qingxuan Sun, Yuhao Zhang, Murun Yang, Qianqian Dong, Tom Ko, Mingxuan Wang, Tong Xiao, Anxiang Ma, Jingbo Zhu
"CTC-based Non-autoregressive Speech Translation",
in Proceedings of ACL, July, 2023, Canada
Kexin Wang, Yunlong Zhao, Qianqian Dong, Tom Ko, Mingxuan Wang
"MOSPC: MOS Prediction Based on Pairwise Comparison",
in Proceedings of ACL, July, 2023, Canada
Dong Zhang, Rong Ye, Tom Ko, Mingxuan Wang, Yaqian Zhou
"DUB: Discrete Unit Back-translation for Speech Translation",
in Proceedings of ACL Findings, July, 2023, Canada
Chen Xu, Rong Ye, Qianqian Dong, Chengqi Zhao, Tom Ko, Mingxuan Wang, Tong Xiao, Jingbo Zhu
"Recent Advances in Direct Speech-to-text Translation",
in Proceedings of IJCAI, 2023, Macao
Xuxin Cheng, Qianqian Dong, Fengpeng Yue, Tom Ko, Mingxuan Wang, Yuexian Zou
"M3ST: Mix at Three Levels for Speech Translation",
in Proceedings of ICASSP, June, 2023, Rhodes Island, Greece
Yunhao Gou, Tom Ko, Hansi Yang, James Kwok, Yu Zhang and Mingxuan Wang
"Leveraging per Image-Token Consistency for Vision-Language Pre-training",
in CVPR 2023
Qiushi Huang, Yu Zhang, Tom Ko, Xubo Liu, Bo Wu, Wenwu Wang and Lilian Tang
"Personalized Dialogue Generation with Persona-Adaptive Attention",
in Proceedings of AAAI, February, 2023, Washington, DC, USA
Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu, Haizhou Li, Tom Ko, Lirong Dai, Jinyu Li, Yao Qian and Furu Wei
"Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data",
in Proceedings of Interspeech, September, 2022, Incheon, Korea
Qianqian Dong, Fengpeng Yue, Tom Ko, Mingxuan Wang, Qibing Bai and Yu Zhang
"Leveraging Pseudo-labeled Data to Improve Direct Speech-to-Speech Translation",
in Proceedings of Interspeech, September, 2022, Incheon, Korea
Rui Wang, Qibing Bai, Junyi Ao, Long Zhou, Zhixiang Xiong, Zhihua Wei, Yu Zhang, Tom Ko and Haizhou Li
"LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT",
in Proceedings of Interspeech, September, 2022, Incheon, Korea
Qibing Bai, Tom Ko and Yu Zhang
"A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis",
in Proceedings of Interspeech, September, 2022, Incheon, Korea
Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei
"SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing",
in Proceedings of ACL, May, 2022, Dublin, Ireland
Fengpeng Yue, Yan Feng, Lei He, Tom Ko, Yu Zhang
"EXPLORING MACHINE SPEECH CHAIN FOR DOMAIN ADAPTATION",
in Proceedings of ICASSP, May, 2022, Singapore
Rui Wang, Junyi Ao, Long Zhou, Shujie Liu, Zhihua Wei, Tom Ko, Qing Li, Yu Zhang
"MULTI-VIEW SELF-ATTENTION BASED TRANSFORMER FOR SPEAKER RECOGNITION",
in Proceedings of ICASSP, May, 2022, Singapore
Jingsong Wang, Yuxuan He, Chunyu Zhao, Qijie Shao, Wei-Wei Tu, Tom Ko, Hung-yi Lee, Lei Xie
"Auto-KWS 2021 Challenge: Task, Datasets, and Baselines",
in Proceedings of Interspeech, September, 2021, Brno, Czech Republic
Qiushi Huang, Tom Ko, H. Lilian Tang, Xubo Liu, Bo Wu
"Token-Level Supervised Contrastive Learning for Punctuation Restoration",
in Proceedings of Interspeech, September, 2021, Brno, Czech Republic
Yangbin Chen, Tom Ko, Jianping Wang
"A Meta-Learning Approach for User-Defined Spoken Term Classification with Varying Classes and Examples",
in Proceedings of Interspeech, September, 2021, Brno, Czech Republic
Junyi Ao, Tom Ko
"Improving Attention-based End-to-end ASR by Incorporating an N-gram Neural Network",
in Proceedings of ISCSLP, January, 2021, Hong Kong
Fengpeng Yue, Tom Ko
"An Investigation of Positional Encoding in Transformer-based End-to-end Speech Recognition",
in Proceedings of ISCSLP, January, 2021, Hong Kong
Yangbin Chen, Tom Ko, Lifeng Shang, Xiao Chen, Xin Jiang, Qing Li
"An Investigation of Few-Shot Learning in Spoken Term Classification",
in Proceedings of Interspeech, September, 2020, Shanghai, China
Yangbin Chen, Yun Ma, Tom Ko, Jianping Wang, Qing Li
"MetaMix: Improved Meta-Learning with Interpolation based Consistency Regularization",
in ICPR 2020
Jingsong Wang, Tom Ko, Zhen Xu, Xiawei Guo, Souxiang Liu, Wei-Wei Tu, Lei Xie
"AutoSpeech 2020: The Second Automated Machine Learning Challenge for Speech Classification",
in Proceedings of Interspeech, September, 2020, Shanghai, China
Tom Ko, Yangbin Chen, Qing Li
"Prototypical Networks for Small Footprint Text-independent Speaker Verification",
in Proceedings of ICASSP, May, 2020, Barcelona, Spain
Yingke Zhu, Tom Ko, Brian Mak
"Mixup Learning Strategies for Text-independent Speaker Verification",
in Proceedings of Interspeech, September, 2019, Graz, Austria
Yingke Zhu, Tom Ko, David Snyder, Brian Mak, Daniel Povey
"Self-Attentive Speaker Embeddings for Text-Independent Speaker Verification",
in Proceedings of Interspeech, September, 2018, Hyderabad, India
Zhen Qin, Tom Ko, Guangjian Tian
"Long Distance Voice Channel Diagnosis Using Deep Neural Networks",
in Proceedings of Interspeech, September, 2018, Hyderabad, India
Tom Ko, Vijayaditya Peddinti, Daniel Povey, Michael L. Seltzer, Sanjeev Khudanpur
"A study on data augmentation of reverberant speech for robust speech recognition",
in Proceedings of ICASSP, March, 2017, New Orleans, USA
Yajie Miao, Mohammad Gowayyed, Xingyu Na, Tom Ko, Florian Metze, and Alexander Waibel
"An Empirical Exploration of CTC Acoustic Models",
in Proceedings of ICASSP, March, 2016, Shanghai, China
Vijayaditya Peddinti, Guoguo Chen, Vimal Manohar, Tom Ko, Daniel Povey, Sanjeev Khudanpur
"JHU ASpIRE system : Robust LVCSR with TDNNs, i-vector adaptation and RNN-LMs",
in IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), December, 2015, Scottsdale, Arizona, USA
Tom Ko, Vijayaditya Peddinti, Daniel Povey, Sanjeev Khudanpur
"Audio Augmentation for Speech Recognition",
in Proceedings of Interspeech, September, 2015, Dresden, Germany (poster)
Tom Ko, Brian Mak, Dongpeng Chen
"Modeling Inter-cluster and Intra-cluster Discrimination Among Triphones",
in Proceedings of the International Symposium of Chinese Spoken Language Processing, September, 2014, Singapore (poster)
Tom Ko, Brian Mak and Cheung-Chi Leung
"SUBSPACE GAUSSIAN MIXTURE MODEL WITH STATE-DEPENDENT SUBSPACE DIMENSIONS",
in Proceedings of ICASSP, pages 1744-1748, May, 2014, Florence, Italy (poster)
Tom Ko and Brian Mak
"DERIVATION OF EIGENTRIPHONES BY WEIGHTED PRINCIPAL COMPONENT ANALYSIS",
in Proceedings of ICASSP, pages 4097-4100, March, 2012, Kyoto, Japan (oral)
Tom Ko and Brian Mak
"A FULLY AUTOMATED DERIVATION OF STATE-BASED EIGENTRIPHONES FOR TRIPHONE MODELING WITH NO TIED STATES USING REGULARIZATION",
in Proceedings of Interspeech, pages 781-784, August, 2011, Florence, Italy (oral)
Tom Ko and Brian Mak
"EIGENTRIPHONES: A BASIS FOR CONTEXT-DEPENDENT ACOUSTIC MODELING",
in Proceedings of ICASSP, pages 4892-4895, May, 2011, Prague, Czech Republic (poster)
Brian Mak and Tom Ko
"Problems of Modeling Phone Deletion in Conversational Speech for Speech Recognition",
in Proceedings of the International Symposium of Chinese Spoken Language Processing, pages 114-118, Nov, 2010, Taiwan (oral)
Tom Ko and Brian Mak
"IMPROVING SPEECH RECOGNITION BY EXPLICIT MODELING OF PHONE DELETIONS",
in Proceedings of ICASSP, pages 4858-4861, March, 2010, Dallas, Texas, USA (poster)
Brian Mak and Tom Ko
"Automatic Estimation of Decoding Parameters Using Large-Margin Iterative Linear Programming",
in Proceedings of Interspeech, pages 1219-1222, Sept, 2009, Brighton, U.K. (poster)
Brian Mak and Tom Ko
"Min-max Discriminative Training of Decoding Parameters Using Iterative Linear Programming",
in Proceedings of Interspeech, pages 915-918, Sept, 2008, Brisbane, Australia
[Journal Papers]
Xinhao Mei, Chutong Meng, Haohe Liu, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D Plumbley, Yuexian Zou, Wenwu Wang
"WavCaps: A ChatGPT-assisted weakly-labelled audio captioning dataset for audio-language multimodal research",
IEEE Transactions on Audio, Speech and Language Processing, 2024
Tom Ko and Brian Mak
"Eigentrigraphemes for Under-Resourced Languages",
Speech Communication, volume 56, pages 132-141, January, 2014
Tom Ko and Brian Mak
"Eigentriphones for Context-dependent Acoustic Modeling",
IEEE Transactions on Audio, Speech and Language Processing, volume 21, number 6, pages 1285-1294, 2013
[Thesis]
Phone Deletion Modeling in Speech Recognition (M.Phil.)
Distinct Acoustic Modeling for Automatic Speech Recognition (Ph.D.)
[Award]
2nd best presentation in Signal Processing Postgraduate Forum 2010 organized by the IEEE Hong Kong Chapter
Professor Samuel Chanson Best Teaching Assistant Award 2011-12
Services
[Teaching Assistant]
Comp102 Computer and Programming Fundamentals I
Comp103 Computer and Programming Fundamentals II
Comp104 Programming Fundamentals and Methodology
Comp1022Q Introduction to Computing with Excel VBA