In Operation
We tested the software with an NVIDIA GeForce RTX 3060 Ti graphics card, which has 8GB of VRAM. The project’s page warns that the software requires a minimum of 8GB of VRAM. To run within this minimum, chunked=True must be set in the decode_audio function during inference. Our tests confirmed this: the code ships with chunked set to False, and with that setting the software failed to run on our 8GB card.
The file that needs changing is ~/DiffRhythm/infer/infer.py
In infer.py, change the line:
output = decode_audio(latent, vae_model, chunked=False)
to
output = decode_audio(latent, vae_model, chunked=True)
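If you would rather not edit the file by hand, the same one-line change can be scripted. The following is a minimal sketch, assuming the call appears in infer.py exactly as shown above. The helper names enable_chunked_decoding and patch_file are our own and are not part of DiffRhythm.

```python
import sys
from pathlib import Path

# The call as it ships, and the call we want instead.
OLD = "decode_audio(latent, vae_model, chunked=False)"
NEW = "decode_audio(latent, vae_model, chunked=True)"


def enable_chunked_decoding(source: str) -> str:
    """Return source with the decode_audio call switched to chunked=True.

    Hypothetical helper: it assumes the call is written exactly as OLD.
    """
    if OLD not in source:
        raise ValueError("decode_audio(..., chunked=False) call not found")
    return source.replace(OLD, NEW)


def patch_file(path: Path) -> None:
    """Rewrite the file in place, keeping a .bak copy of the original."""
    original = path.read_text(encoding="utf-8")
    patched = enable_chunked_decoding(original)
    path.with_name(path.name + ".bak").write_text(original, encoding="utf-8")
    path.write_text(patched, encoding="utf-8")


if __name__ == "__main__" and len(sys.argv) == 2:
    # Pass the path to infer.py as the only argument.
    patch_file(Path(sys.argv[1]))
```

Saved under a name of your choosing, the script takes the path to infer.py as its only argument. It raises an error rather than silently doing nothing if the expected line is not found, which may happen if the project has since changed the code.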
We can now generate songs. The project provides a simple bash script which lets you set the reference audio file and the lyrics file (LRC). The lyrics have to be timestamped, but the project provides tools on its Hugging Face page to generate the LRC file.
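For readers unfamiliar with the format, an LRC file is plain text in which each lyric line begins with a [mm:ss.xx] timestamp. The sketch below parses such lines into (seconds, text) pairs, which can be handy for checking that the timestamps are in order before feeding a file to the script. The sample lyrics and the parse_lrc function are made up for illustration; this is not the project's own tooling.

```python
import re

# Matches a leading LRC timestamp such as [00:12.34] or [01:05]
LRC_LINE = re.compile(r"^\[(\d+):(\d{2}(?:\.\d+)?)\](.*)$")


def parse_lrc(text: str) -> list:
    """Return a list of (seconds, lyric) tuples from LRC-formatted text."""
    entries = []
    for raw in text.splitlines():
        match = LRC_LINE.match(raw.strip())
        if not match:
            # Skip blank lines and metadata tags such as [ar:Artist]
            continue
        minutes, seconds, lyric = match.groups()
        entries.append((int(minutes) * 60 + float(seconds), lyric.strip()))
    return entries


# Illustrative lyrics only, not taken from the project's examples.
sample = """[00:10.00]First line
[00:13.50]Second line
[01:02.25]Third line"""

for seconds, lyric in parse_lrc(sample):
    print(f"{seconds:6.2f}s  {lyric}")
```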
Here are a couple of examples of output generated from an audio prompt (a music file). We used the project’s example lyrics file. The lyrics are particularly bad (maybe that’s deliberate).
In both examples, there are a few instances of mangled or repeated lyrics.
Audio generation is fast even with our mid-range graphics card. The inference time is around 14 seconds for a 95 second song.
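To put those figures in perspective, dividing the length of the song by the inference time gives the speed relative to real time. Here is a quick calculation using the approximate numbers from our test (the function name is our own):

```python
def realtime_factor(audio_seconds: float, inference_seconds: float) -> float:
    """Seconds of audio produced per second of inference."""
    return audio_seconds / inference_seconds


# Approximate figures from our test on the RTX 3060 Ti.
print(f"{realtime_factor(95, 14):.1f}x real-time speed")
```

With our timings this works out to roughly 6.8 times faster than real time.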
The software can also generate music from a simple text-based style prompt. Here’s an example.
Summary
DiffRhythm is great fun to play with. Even with limited experimentation, we were able to generate some decent songs, far better than those generated from the project’s examples. The music output does tend to be off-key at times, and mangled or repeated lyrics detract somewhat.
Both reference audio and text prompts are supported.
The current version supports song lengths up to 95 seconds. DiffRhythm-full, which supports generation of songs up to 285 seconds, should be available by the time you read this review.
Website: github.com/ASLP-lab/DiffRhythm
Support:
Developer: ASLP-lab
License: Apache 2.0 License
For other useful open source apps that use machine learning/deep learning, we’ve compiled this roundup.
DiffRhythm is written in Python. Learn Python with our recommended free books and free tutorials.