Our Machine Learning in Linux series focuses on apps that make it easy to experiment with machine learning.
One of the standout machine learning apps is Stable Diffusion, a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. We’ve explored quite a few hugely impressive web frontends such as Easy Diffusion, InvokeAI, and Stable Diffusion web UI.
Extending this theme, but from an audio perspective, step forward Bark, a transformer-based text-to-audio model. The software can generate realistic multilingual speech from text, as well as other audio including music, background noise and simple sound effects. The model also generates nonverbal communication such as laughing, sighing, crying, and hesitations.
Bark follows a GPT-style architecture. It is not a conventional text-to-speech model, but a fully generative text-to-audio model capable of deviating in unexpected ways from any given script.
Installation
We tested Bark with a fresh installation of the Arch distro.
To avoid polluting our system, we’ll use conda to install Bark. A conda environment is a directory that contains a specific collection of conda packages that you have installed.
If your system doesn’t have conda, install either Anaconda or Miniconda. The latter is a minimal installer for conda: a small, bootstrap version of Anaconda that includes only conda, Python, the packages they depend on, and a handful of other useful packages such as pip and zlib.
There’s a package for Miniconda in the AUR which we’ll install with the command:
$ yay -S miniconda3
If your shell is Bash or a Bourne variant, enable conda for the current user with the command:
$ echo "[ -f /opt/miniconda3/etc/profile.d/conda.sh ] && source /opt/miniconda3/etc/profile.d/conda.sh" >> ~/.bashrc
Create our conda environment with the command below. Adding python gives the environment its own copy of Python and pip, so later pip installs land inside the environment rather than on the system:
$ conda create --name bark python
Activate that environment with the command:
$ conda activate bark
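With conda’s default settings, the name of the active environment is prefixed to your shell prompt, so it should now look something like:
(bark) $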
Clone the project’s GitHub repository:
$ git clone https://github.com/suno-ai/bark
Change into the newly created directory and install with pip (remember, we’re installing into our conda environment, not polluting our system):
$ cd bark && pip install .
There are a few extra steps you might need to take. The full version of Bark requires around 12GB of VRAM. If your GPU has less than 12GB of VRAM (our test machine hosts a GeForce RTX 3060 Ti card with only 8GB of VRAM), you’ll get errors such as this:
Oops, an error occurred: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.76 GiB total capacity; 6.29 GiB already allocated; 62.19 MiB free; 6.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Instead, we need to use the smaller versions of the models. To tell Bark to use the smaller models, set the environment variable SUNO_USE_SMALL_MODELS=True:
$ export SUNO_USE_SMALL_MODELS=True
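Bear in mind that export only affects the current shell session. If your card will always need the smaller models, make the setting persistent by appending it to your ~/.bashrc, just as we did for conda:
$ echo "export SUNO_USE_SMALL_MODELS=True" >> ~/.bashrc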
We’ll also install IPython, an interactive command-line shell for Python. Again, run this command inside the activated conda environment:
$ pip install ipython
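Before moving on, let’s check everything works with a short Python file. This is a minimal sketch based on the preload_models() and generate_audio() functions documented in the project’s README; the prompt text and the output filename bark_test.wav are our own choices.

# bark_test.py - generate a short speech clip with Bark
from scipy.io.wavfile import write as write_wav
from bark import SAMPLE_RATE, generate_audio, preload_models

# Download and cache the models; with SUNO_USE_SMALL_MODELS=True
# set, the smaller models are fetched instead
preload_models()

# Nonverbal cues such as [laughs] can be embedded in the prompt
text_prompt = "Hello from Bark running on Linux [laughs]"

# Generate the audio as a NumPy array of samples
audio_array = generate_audio(text_prompt)

# Save the result as a WAV file (scipy is pulled in as a Bark dependency)
write_wav("bark_test.wav", SAMPLE_RATE, audio_array)

Run it from inside the activated environment with python bark_test.py, or paste the lines into an ipython session. The first run downloads the models, so expect a wait.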
Can you run Bark without a dedicated graphics card? I’ve got a 5th generation Intel machine with 8GB of RAM.
We don’t recommend using Bark without a dedicated GPU, but it’s definitely possible to run it without one.
You’ll get the warning:
“No GPU being used. Careful, inference might be very slow!”
And that’s definitely the case. A 5-second clip took over a minute to generate on an Intel i5-10400 machine.
Even with an i9-13900K, processing is slow. A dedicated graphics card is a must for these machine learning apps.