
Machine Learning in Linux: Speech Note

In Operation

First, choose a language from the Languages menu; the search bar makes it quick to find one. Let’s choose English.

We can then download models for Speech to Text, Text to Speech and translation from English to a foreign language. The models are stored at ~/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models/. You’ll need plenty of disk space: the Large model for Whisper, for example, takes up over 1GB.
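To check how much space the downloaded models actually occupy, you can total the file sizes under that directory. Here’s a minimal sketch in Python, assuming the Flatpak path quoted above; the names `directory_size_bytes` and `format_size` are our own illustrative helpers, not part of Speech Note.

```python
from pathlib import Path

# Path quoted in the article for the Flatpak install; adjust if yours differs.
MODELS_DIR = (
    Path.home()
    / ".var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models"
)


def directory_size_bytes(root: Path) -> int:
    """Return the total size of all regular files under root (0 if missing)."""
    if not root.is_dir():
        return 0
    return sum(
        p.stat().st_size
        for p in root.rglob("*")
        if p.is_file() and not p.is_symlink()
    )


def format_size(num_bytes: int) -> str:
    """Format a byte count using binary units (B, KiB, MiB, GiB)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB"):
        if size < 1024:
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GiB"


if __name__ == "__main__":
    total = directory_size_bytes(MODELS_DIR)
    print(f"{MODELS_DIR}: {format_size(total)}")
```

If the directory doesn’t exist (for example, with a non-Flatpak install), the script simply reports zero rather than failing.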

From a user experience perspective, the interface isn’t particularly refined here and there’s definitely room for improvement, although the drop-down box for switching between Speech to Text, Text to Speech and Translator is helpful. There’s also an Other category for downloading punctuation models.

Speech Note download

Here’s an image of Speech Note in its translator mode.

Speech Note translator

I don’t speak any Portuguese whatsoever, so I cannot comment on the accuracy of the translation generated by Coqui CV VITS.

Here’s an example of Text to Speech, generated using Piper.

The generated audio is saved in uncompressed WAV format to ~/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote, although the interface doesn’t make this clear. The developer plans to add options to save to MP3 and OGG in the future.
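Since the interface doesn’t point to the saved audio, a short script can help locate it. The sketch below uses Python’s standard `wave` module to list WAV files in that directory along with their playing time. It assumes the files sit directly in the directory with a `.wav` extension, which we haven’t confirmed; `wav_duration_seconds` and `list_wav_files` are our own illustrative names.

```python
import wave
from pathlib import Path

# Directory quoted in the article; the *.wav naming is an assumption.
AUDIO_DIR = Path.home() / ".var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote"


def wav_duration_seconds(path: Path) -> float:
    """Return the playing time of an uncompressed PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def list_wav_files(directory: Path) -> list[tuple[Path, float]]:
    """Return (path, duration) pairs for readable WAV files in directory."""
    results: list[tuple[Path, float]] = []
    if not directory.is_dir():
        return results
    for path in sorted(directory.glob("*.wav")):
        try:
            results.append((path, wav_duration_seconds(path)))
        except (wave.Error, EOFError):
            # Skip files the wave module cannot parse.
            continue
    return results


if __name__ == "__main__":
    for path, seconds in list_wav_files(AUDIO_DIR):
        print(f"{path.name}: {seconds:.1f} s")
```

Until MP3 and OGG export arrives, files found this way can be converted with whatever audio tool you prefer.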

Summary

Speech Note works well, offering an attractive frontend to powerful Speech to Text and Text to Speech models. As no net connection is required (other than to download the models), your privacy is not compromised.

All the heavy lifting is performed by other open source software, so our evaluation mostly focuses on the interface itself. We already give the highest plaudits to Whisper, and Piper gets a strong recommendation.

We’d love to see support for other tasks such as spell checking and grammar checking in future releases.

Website: github.com/mkiol/dsnote
Support:
Developer: mkiol
License: Mozilla Public License 2.0

For other useful open source apps that use machine learning/deep learning, we’ve compiled this roundup.

Speech Note is written in C++. Learn C++ with our recommended free books and free tutorials.

