Machine Learning in Linux: Bark - Text-Prompted Generative Audio - Page 2 of 3

In Operation

We can run the Bark models with a simple command such as this:

$ python -m bark --text "Hello everyone, my name is Steve. Let's have some fun!" --output_filename "bark-my-name-is.wav"

Here’s an example of the generated audio with the text prompt using the smaller models.

The clip is vaguely reminiscent of the voice of Stephen Mangan, an English actor, comedian, presenter and writer. Each time you run this command, you’ll get different output. Bark generates audio from scratch. It’s not meant to create only high-fidelity, studio-quality speech. Sometimes the generated audio is garbage.

Bark will occasionally add music to the speech of its own accord. If you want music, wrapping the text in ♪ symbols or adding [music] to the prompt helps. We created the next two clips using the Python file shown on Page 3 of this article.

To illustrate how each generation differs, here’s a second version using the same text prompt.

What’s more impressive is the variety of speaker presets. There are more than 100 available across a wide range of languages. The next clip uses a female voice, which we selected by passing a history_prompt:

audio_array = generate_audio(text_prompt, history_prompt="v2/en_speaker_9")
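The history_prompt value used for that clip follows a regular pattern, so a small helper can build preset names for you. This is only a sketch: preset_name and speak are our own hypothetical helpers, not part of Bark, the 0 to 9 index range is an assumption about the v2 presets, and the output filename is arbitrary. It assumes Bark and SciPy are both installed.

```python
def preset_name(lang: str, index: int) -> str:
    """Build a preset string such as "v2/en_speaker_9".

    Assumption: v2 presets are numbered 0 to 9 for each language code.
    """
    if not 0 <= index <= 9:
        raise ValueError("speaker index is assumed to be between 0 and 9")
    return f"v2/{lang}_speaker_{index}"


def speak(text: str, lang: str = "en", index: int = 9,
          out_path: str = "bark-preset.wav") -> str:
    """Generate speech with the chosen preset and save it as a WAV file."""
    # Imported lazily so preset_name() can be used without Bark installed.
    from bark import SAMPLE_RATE, generate_audio
    from scipy.io.wavfile import write as write_wav

    audio_array = generate_audio(text, history_prompt=preset_name(lang, index))
    write_wav(out_path, SAMPLE_RATE, audio_array)
    return out_path
```

Calling speak("Hello everyone, my name is Steve.") would then use the same preset as the clip above. Swapping lang or index is a quick way to audition other voices.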

Bark also supports various languages out of the box and automatically detects the language from the input text.

Summary

Bark is a really interesting project and great fun to boot. You’re not limited to speech, as Bark can generate music lyrics, sound effects or other non-speech sounds.

With a GeForce RTX 3060 Ti graphics card, processing is fast. A 14-second audio file takes around 13 seconds to generate. That’s important, as you’ll often need to run the software multiple times to get useful output.
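If you want to compare your own hardware against these figures, one common yardstick is the real-time factor: generation time divided by clip length, where anything below 1 is faster than real time. The sketch below uses only the standard library. The helper names real_time_factor and timed are ours and purely illustrative.

```python
import time


def real_time_factor(generation_seconds: float, audio_seconds: float) -> float:
    """Return generation time divided by audio length (below 1 beats real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return generation_seconds / audio_seconds


def timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Figures quoted above: about 13 seconds to generate a 14-second clip.
    print(f"{real_time_factor(13, 14):.2f}")  # prints 0.93
```

To time a real run, wrap the generation call with timed and divide the elapsed time by the clip length, which is the number of samples divided by the sample rate.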

We’d love to try the larger models but we don’t have a graphics card with at least 12GB of VRAM. Maybe NVIDIA or AMD will donate a suitable graphics card to LinuxLinks?

Bark creates audio files with a maximum duration of about 13 seconds, but it’s possible to create much longer audio files by splitting longer text into sentences using nltk and generating the sentences one by one.
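The sentence-by-sentence approach can be sketched roughly as follows. To keep the example free of dependencies, we use a naive regular-expression splitter as a stand-in for nltk's sent_tokenize, and the generator is passed in as a function rather than calling Bark directly. The names split_sentences and generate_long, and the quarter-second pause, are our own assumptions.

```python
import re
from typing import Callable, List, Sequence


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.

    A stand-in for nltk.sent_tokenize, which handles abbreviations far better.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def generate_long(text: str,
                  generate_fn: Callable[[str], Sequence[float]],
                  sample_rate: int,
                  pause_seconds: float = 0.25) -> List[float]:
    """Generate each sentence separately and join the clips with short silences."""
    silence = [0.0] * int(pause_seconds * sample_rate)
    samples: List[float] = []
    for sentence in split_sentences(text):
        samples.extend(generate_fn(sentence))  # one short clip per sentence
        samples.extend(silence)                # brief pause between sentences
    return samples
```

With Bark, generate_fn could be a small wrapper around generate_audio with a fixed history_prompt so the voice stays consistent, and sample_rate would be Bark's SAMPLE_RATE. The result is a plain list, so you would convert it to a NumPy array before writing a WAV file.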

Bark has amassed a whopping 22k GitHub stars.

Website: github.com/suno-ai/bark
Support:
Developer: Suno, Inc
License: MIT License

Bark is written in Python. Learn Python with our recommended free books and free tutorials.

For other useful open source apps that use machine learning/deep learning, we’ve compiled this roundup.

Next page: Page 3 – Example Python File

Pages in this article:
Page 1 – Introduction and Installation
Page 2 – In Operation and Summary
Page 3 – Example Python File

6 Comments

Carlos

1 year ago

Never heard of Bark before. It looks kinda interesting. I’ll give it a whirl under Ubuntu.

James

Reply to Carlos

I’m using Debian so I should be able to get it working.

Neil

Reply to James

do what?

Mel

Can you run Bark without a dedicated graphics card? I’ve got a 5th generation Intel machine with 8GB of RAM.

Author

Steve Emms

Reply to Mel

We don’t recommend using Bark without a dedicated GPU, but it’s definitely possible to run it without one.

You’ll get a warning

“No GPU being used. Careful, inference might be very slow!”

And that’s definitely the case. A 5 second clip took over a minute to be generated on an Intel i5-10400 machine.

Reply to Steve Emms

Even with an i9-13900K, processing is slow. A dedicated graphics card is a must for these machine learning apps.
