Chung-Ming Chien (簡仲明)
Chicago, Illinois, United States
I am a 2nd-year Ph.D. student at Toyota Technological Institute at Chicago (TTIC), where I am fortunate to work with Karen Livescu. My research interests span speech and natural language processing. Here are some topics I have been focusing on recently:
- Speech-Text Joint Learning: Can speech models learn better/faster with the aid of text? How should we integrate speech and audio information into pre-trained text models?
- Speech Generation: Controlling and modeling non-lexical information in generated speech in a more efficient and intuitive way.
- Self-Supervised Speech Representations: Analyzing the information encoded in self-supervised speech representations and exploring various applications for the learned representations and units.
- Multi-Modal Learning: Text-guided image generation and video-guided speech generation.
Prior to joining TTIC, I earned my Master’s degree in Computer Science from National Taiwan University (NTU), where I had the privilege of working with Lin-shan Lee and Hung-yi Lee at the Speech Processing Lab. Outside of school, I also gained valuable experience through summer internships with Amazon Alexa TTS Research and FAIR (AI at Meta).
Beyond my academic pursuits, I am a sports enthusiast and amateur athlete. I captained the baseball varsity team of NTU during my undergraduate years. I am also broadly interested in tennis, hiking, scuba diving, swimming, badminton, and training. In 2022, I achieved a personal milestone by completing my first marathon, and I have been dedicated to improving my PB with the goal of breaking the 3:10 mark!
news
Jan 13, 2024 | My open-source FastSpeech 2 project has received over 1.5k stars on GitHub. |
---|---|
Dec 20, 2023 | I share the honor of the Best Student Paper Award of ASRU 2023 with Mingjiamei, Ju-Chieh, and Karen. Check out our work “Few-shot SLU via Joint Speech-Text Models” for more details. |
Oct 7, 2023 | “Toward Joint Language Modeling for Speech Units and Text” is accepted to Findings of EMNLP 2023! |
Sep 22, 2023 | Our work “Few-shot SLU via Joint Speech-Text Models” is accepted at ASRU 2023, and I will be going back to Taiwan to present it in person! |
Sep 14, 2023 | “What do self-supervised speech models know about words?” and AV2Wav are both available on arXiv! |