
Machine Learning in Linux: Coqui STT – deep-learning toolkit for training and deploying speech-to-text models

Last Updated on March 6, 2023

In Operation

The quickest way to start using STT is with its model manager. This provides a convenient unified interface to connect your microphone to a Coqui Speech-to-Text model, manage your installed models and install new ones from the Coqui Model Zoo. The Coqui Model Zoo is the central hub for finding STT models created by its community as well as official Coqui models.

Start the model manager with the command:

$ stt-model-manager

This launches the system’s default web browser at http://127.0.0.1:38450/

To get started, install a model from the Coqui Model Zoo. Many pre-trained STT models are available.

Coqui STT models

We installed the English STT huge vocab model. The acoustic model was trained on American English data with synthetic noise augmentation. The training data comprised Common Voice 7.0 English (custom Coqui train/dev/test splits), LibriSpeech, and Multilingual LibriSpeech, approximately 47,000 hours of data in total.

Installed STT models

The model is stored at ~/.local/share/coqui/models/English STT v1.0.0-huge-vocab and consists of two files:

total 979M
-rw-rw-r-- 1 sde sde 934M Feb 20 19:44 huge-vocabulary.scorer
-rw-rw-r-- 1 sde sde  46M Feb 20 19:41 model.tflite
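
The two files listed above can also be loaded from a script rather than through the web interface. The following is a minimal sketch, not taken from the article, assuming the stt Python package (pip install stt) and NumPy are installed and that the recording is a 16-bit mono WAV file at the model's sample rate. The function names read_wav and transcribe are our own, chosen for illustration.

```python
import wave
from pathlib import Path


def read_wav(path):
    """Return (sample_rate, raw_pcm_bytes) for a 16-bit mono WAV file."""
    with wave.open(str(path), "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        return wav.getframerate(), wav.readframes(wav.getnframes())


def transcribe(model_dir, wav_path):
    """Transcribe one WAV file with the acoustic model and scorer in model_dir."""
    # Imported here so read_wav() stays usable without these packages.
    import numpy as np
    from stt import Model

    model_dir = Path(model_dir)
    model = Model(str(model_dir / "model.tflite"))
    # The scorer is the language model; the file name matches the listing above.
    model.enableExternalScorer(str(model_dir / "huge-vocabulary.scorer"))

    rate, pcm = read_wav(wav_path)
    if rate != model.sampleRate():
        raise ValueError(f"audio is {rate} Hz, model expects {model.sampleRate()} Hz")

    # The Python binding takes the samples as a NumPy int16 array.
    return model.stt(np.frombuffer(pcm, dtype=np.int16))
```

Calling transcribe() with the model directory shown above and the path to a recording returns the recognised text as a string.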

We can test the model by clicking the Run model button. In the image below, the model has accurately transcribed our spoken words. For best results, you should ensure you’re using the software in a low-noise environment with a good microphone.

Transcription with Coqui STT

The software has an efficient training pipeline with multi-GPU support. Streaming and real-time inference are also supported.
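
To give an idea of what streaming inference looks like from Python, here is a hedged sketch using the stream interface of the stt package. It is not from the article: it assumes stt and NumPy are installed, and it simulates a live source by feeding an existing 16-bit mono WAV file in small chunks. The helper names pcm_chunks and stream_transcribe and the chunk size are illustrative choices.

```python
import wave


def pcm_chunks(pcm, chunk_samples=1024):
    """Yield successive slices of 16-bit PCM bytes, chunk_samples samples each."""
    step = chunk_samples * 2  # two bytes per 16-bit sample
    for start in range(0, len(pcm), step):
        yield pcm[start:start + step]


def stream_transcribe(model_path, wav_path):
    """Feed a WAV file to the model chunk by chunk, printing partial results."""
    # Imported here so pcm_chunks() stays usable without these packages.
    import numpy as np
    from stt import Model

    model = Model(model_path)
    with wave.open(wav_path, "rb") as wav:
        pcm = wav.readframes(wav.getnframes())

    stream = model.createStream()
    for chunk in pcm_chunks(pcm):
        stream.feedAudioContent(np.frombuffer(chunk, dtype=np.int16))
        # Decoding after every chunk is only for demonstration; it costs time.
        print("partial:", stream.intermediateDecode())

    # finishStream() runs the final decode and releases the stream.
    return stream.finishStream()
```

In a real-time setup the chunks would come from a microphone callback instead of a file, but the feed and decode calls stay the same.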

Summary

STT gets our firm recommendation. It’s very impressive software with high-quality pre-trained models available.

Language models are trained from text, and the more similar that text is to the speech your STT system encounters at run-time, the better STT performs. For more accurate transcription you’ll want to use a custom language model.
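
Preparing text for a custom language model usually starts with normalising a corpus from your own domain. The sketch below is an assumption on our part rather than a documented Coqui procedure: it lowercases each line and keeps only the letters a to z, apostrophes and spaces, which suits English text. Building the scorer itself from the cleaned corpus is done with Coqui's own tooling and is not shown here.

```python
import re


def normalize_line(line):
    """Lowercase a line and keep only a-z, apostrophes and single spaces."""
    line = line.lower()
    # Replace every run of other characters (digits, punctuation) with a space.
    line = re.sub(r"[^a-z' ]+", " ", line)
    # Collapse repeated whitespace and trim the ends.
    return " ".join(line.split())


def normalize_corpus(lines):
    """Return the normalised, non-empty lines of a corpus, one sentence per line."""
    cleaned = (normalize_line(line) for line in lines)
    return [line for line in cleaned if line]
```

For example, normalize_line("Hello, World!  It's 9am.") returns "hello world it's am". Numbers are simply dropped in this sketch; for a real corpus you would probably want to spell them out instead.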

There are bindings for several programming languages, including Python and JavaScript (Node.js).

Website: coqui.ai
Support: GitHub Code Repository
Developer: Coqui STT developers
License: Mozilla Public License 2.0

Coqui STT is written in C++ and Python. Learn C++ with our recommended free books and free tutorials. Learn Python with our recommended free books and free tutorials.

For other useful open source apps that use machine learning/deep learning, we’ve compiled this roundup.
