Our Machine Learning in Linux series focuses on apps that make it easy to experiment with machine learning.
Tortoise TTS is a multi-voice text-to-speech system trained with an emphasis on quality. It seeks to provide strong multi-voice capabilities and highly realistic prosody and intonation. It leverages both an autoregressive decoder and a diffusion decoder.
It is based on a GPT-like autoregressive acoustic model that converts input text to discretized acoustic tokens, a diffusion model that converts these tokens to mel spectrogram frames, and a UnivNet vocoder that converts the spectrograms to the final audio signal.
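To picture how the three stages hand data to one another, here is a toy sketch. It is purely illustrative: every function name, token value and frame size below is a hypothetical placeholder of our own, not Tortoise TTS's actual API or dimensions.

```python
# Illustrative only: each function is a hypothetical stand-in,
# not part of the Tortoise TTS codebase.

def autoregressive_model(text):
    """Stand-in for the GPT-like model: text -> discrete acoustic tokens."""
    return [ord(ch) % 256 for ch in text]

def diffusion_decoder(tokens, n_mels=4):
    """Stand-in for the diffusion model: tokens -> mel spectrogram frames."""
    return [[tok / 255.0] * n_mels for tok in tokens]

def vocoder(mel_frames, samples_per_frame=2):
    """Stand-in for the vocoder: mel frames -> audio samples."""
    audio = []
    for frame in mel_frames:
        mean = sum(frame) / len(frame)
        audio.extend([mean] * samples_per_frame)
    return audio

def synthesize(text):
    """Chain the three stages in the order described above."""
    tokens = autoregressive_model(text)
    mel_frames = diffusion_decoder(tokens)
    return vocoder(mel_frames)

audio = synthesize("hi")
print(len(audio))  # 2 characters -> 2 frames -> 4 samples
```

The real models are large neural networks, but the data flow is the same: text, then tokens, then spectrogram frames, then audio.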
Installation
We’re testing Tortoise TTS with an NVIDIA GeForce RTX 3060 Ti dedicated graphics card with CUDA 12.3 under Ubuntu 23.10.
As we’ve explained in previous articles in this series, we don’t recommend using pip to install software unless it’s within a virtual environment. A good solution is to use a conda environment, as it helps manage dependencies, isolates projects, and is language agnostic.
We’ll therefore use conda to install Tortoise TTS. If your system is missing conda, install either Anaconda or Miniconda first. Once it’s installed, create the conda environment with the command:
$ conda create --name tortoise python=3.9 numba inflect
Activate that environment with the command:
$ conda activate tortoise
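Before installing anything, it can be worth confirming that the environment really is active. The short Python check below is a minimal sketch: it relies on the CONDA_DEFAULT_ENV variable that conda sets on activation, and the helper name is our own.

```python
import os
import sys

def active_conda_env():
    """Return the name of the active conda environment, or None."""
    # conda sets CONDA_DEFAULT_ENV when an environment is activated
    return os.environ.get("CONDA_DEFAULT_ENV")

print("Interpreter:", sys.executable)
print("Conda environment:", active_conda_env())
```

If all is well, the environment name should read tortoise and the interpreter path should point inside that environment.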
Install the dependencies:
$ conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
$ conda install transformers=4.29.2
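With the dependencies in place, a quick way to see whether PyTorch can reach the GPU is a check along these lines. It's a sketch rather than an official test; the function name is our own, and it only uses standard PyTorch calls.

```python
import importlib.util

def cuda_status():
    """Report whether PyTorch is importable and whether it can see a CUDA GPU."""
    if importlib.util.find_spec("torch") is None:
        return "PyTorch is not installed in this environment"
    import torch
    if torch.cuda.is_available():
        name = torch.cuda.get_device_name(0)
        return f"PyTorch {torch.__version__} sees GPU: {name}"
    return f"PyTorch {torch.__version__} is installed, but no CUDA device is visible"

print(cuda_status())
```

If no CUDA device is reported, the usual suspects are the NVIDIA driver or a CPU-only PyTorch build.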
Now clone the project’s GitHub repository:
$ git clone https://github.com/neonbjb/tortoise-tts.git
Change into the newly created directory:
$ cd tortoise-tts
Build the software with the command:
$ python setup.py install
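Once the build finishes, a simple sanity check is to confirm that the key modules can be found from inside the environment. The list below is our assumption based on the steps above (tortoise being the package name used by the repository); adjust it as needed.

```python
import importlib.util

# Module names assumed from the install steps above.
MODULES = ["torch", "torchaudio", "transformers", "tortoise"]

def missing_modules(names):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_modules(MODULES)
print("Missing:", ", ".join(missing) if missing else "none")
```

Run it from a directory other than the cloned repository, so that Python finds the installed package rather than the local source folder.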
This is cross-platform software, but we only tested it under Linux.