This series looks at practical applications of Machine Learning from a Linux perspective. We only feature free and open source software in this series (except where stated).
Let’s clear up one potential source of confusion at the outset. What’s the difference between Machine Learning and Deep Learning? The two terms mean different things.
In essence, Machine Learning is the practice of using algorithms to parse data, learn insights from that data, and then make a determination or prediction. The machine is ‘trained’ using huge amounts of data.
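As a rough illustration of "train, then predict", here's a minimal pure-Python sketch of a nearest-centroid classifier. The toy data, the labels, and the function names (`train`, `predict`) are invented for this example and aren't taken from any of the apps listed below. Real projects train on far more data, usually with a library such as scikit-learn or PyTorch.

```python
from math import dist


def train(samples):
    """Learn one centroid (mean point) per label from (features, label) pairs."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(model, features):
    """Return the label whose centroid lies closest to the given features."""
    return min(model, key=lambda label: dist(model[label], features))


# Hypothetical toy data: (height in cm, weight in kg) -> animal
training_data = [
    ((25.0, 4.0), "cat"),
    ((23.0, 5.0), "cat"),
    ((60.0, 30.0), "dog"),
    ((55.0, 25.0), "dog"),
]

model = train(training_data)
print(predict(model, (24.0, 4.5)))   # lies near the cat examples
print(predict(model, (58.0, 28.0)))  # lies near the dog examples
```

The "learning" here is just averaging, but the shape is the same as in larger systems: data goes in, a model comes out, and the model is then used on inputs it hasn't seen before.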
Deep Learning is a subset of Machine Learning that uses multi-layered artificial neural networks to deliver state-of-the-art accuracy in tasks such as object detection, speech recognition, language translation and others. Think of Machine Learning as cutting-edge, and Deep Learning as the cutting-edge of the cutting-edge.
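To make the idea of layers concrete, the sketch below runs a forward pass through a tiny two-layer network in plain Python. The weights are hand-picked for illustration (so that the network computes XOR) rather than learned, and all the names are hypothetical. Real deep learning models have many more layers and learn their weights from data.

```python
def step(x):
    """Threshold activation: 1 if the input is positive, else 0."""
    return 1 if x > 0 else 0


def layer(inputs, weights, biases):
    """One fully connected layer: a weighted sum per neuron, then the activation."""
    return [
        step(sum(w * x for w, x in zip(neuron_weights, inputs)) + bias)
        for neuron_weights, bias in zip(weights, biases)
    ]


# Hand-picked (not trained) weights, chosen so the network computes XOR.
HIDDEN_W, HIDDEN_B = [[1, 1], [1, 1]], [-0.5, -1.5]  # neurons behave like OR and AND
OUTPUT_W, OUTPUT_B = [[1, -1]], [-0.5]               # fires for "OR but not AND"


def xor_net(a, b):
    hidden = layer([a, b], HIDDEN_W, HIDDEN_B)   # first layer
    return layer(hidden, OUTPUT_W, OUTPUT_B)[0]  # second layer


for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```

XOR is a classic case that a single layer of this kind cannot represent, which is why stacking layers matters.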
Both Machine Learning and Deep Learning are changing the world. Deep Learning is trending.
The apps are self-hosted so you don’t need to pay any hosting/cloud fees to use them. We’ve written short reviews for each app. And there are many more reviews currently under preparation.
Audio | |
---|---|
Audiocraft - Python-based software which provides the code and models for MusicGen, a simple and controllable model for music generation. The models generate short music extracts based on the text description you provide. The models can generate up to 30 seconds of audio in one pass. | |
Bark - Transformer-based text-to-audio model. The software can generate realistic multilingual speech as well as other audio – including music, background noise and simple sound effects, from text. | |
Coqui STT - a deep-learning toolkit for training and deploying speech-to-text models. There are bindings for various programming languages. | |
Demucs - billed as “a state-of-the-art music source separation model, currently capable of separating drums, bass, and vocals from the rest of the accompaniment”. | |
Piper - fast, local neural text to speech system written in C++ and Python that runs well even on single board computers. | |
Speech Note - GUI frontend for various processing engines. For Speech to Text it uses Coqui STT, Vosk, and Whisper. For Text to Speech, Speech Note uses espeak-ng, MBROLA, Piper, RHVoice, and Coqui TTS. And machine translation is handled by Bergamot Translator. | |
Spleeter - Command-line source separation library with pre-trained models. It's designed to help the research community in Music Information Retrieval (MIR) leverage the power of a state-of-the-art source separation algorithm. | |
StemRoller - GUI software which lets you separate vocal and instrumental stems from any song with a single click. | |
Tortoise TTS - Multi-voice text-to-speech system trained with an emphasis on quality. It seeks to provide strong multi-voice capabilities, and highly realistic prosody and intonation. | |
TTS - Library for advanced Text-to-Speech generation. It offers pretrained models in more than 1,100 different languages, together with tools for training new models and improving existing models. There are also utilities for dataset analysis. | |
Ultimate Vocal Remover - GUI that lets you isolate stems from music. It offers convenient access to a wide range of different models. | |
Whisper - an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web. Whisper is a natural language processing system that’s built on PyTorch. |
Chat | |
---|---|
Alpaca - chat with a wide range of local AI models. There's also support for image recognition, code highlighting, and more. | |
Bavarder - GTK4/libadwaita based app that offers an easy way to experiment with ChatGPT. | |
ChatGPT (by lencx) - a desktop application wrapper for the ChatGPT website. The chatbot generates human-like text in a conversational style and can be used for a variety of natural language processing tasks. | |
chatGPT-shell-cli - a simple script to use OpenAI’s chatGPT and DALL-E from the terminal without needing to install either Python or Node.js. | |
Dalai - bills itself as “the simplest way to run LLaMA on your local machine”. Large Language Models trained on massive amounts of text can perform new tasks from textual instructions. | |
GodMode - a dedicated chat browser giving instant access to the full webapps of ChatGPT, Bard, Claude 2, Perplexity, Bing, Quora Poe and other AI services all accessible with a single keyboard shortcut. | |
GPT4All - GUI and CLI locally-running AI chat application powered by the GPT4All-J Apache 2 Licensed chatbot. | |
Ollama - run and chat with Llama 2 and other models with the ability to customize models by creating your own Modelfile. | |
Reor - a private AI personal knowledge management tool. Think of it as a notes program on steroids. Each note is saved as a Markdown file to a “vault” directory on your machine. | |
Simplexity - a simple desktop app that accesses Perplexity. It's written in JavaScript. | |
Terminal GPT - a command-line interface (CLI) tool that allows you to use ChatGPT 3.5 in your terminal without needing API keys. | |
Text generation web UI - offers a web user interface for a variety of large language models such as LLaMA, llama.cpp, GPT-J, OPT, and GALACTICA. |
Graphics | |
---|---|
BackgroundRemover - a command line tool to remove the background from images and videos using AI. It relies on U2Net, a machine learning model that lets you crop objects in a single shot. | |
CodeFormer - command-line software which offers blind face restoration. This aims at recovering high-quality faces from the low-quality counterparts suffering from unknown degradation. This is freeware. | |
DeOldify - A modern way to colorize black and white images using deep learning technology. The software provides pre-trained weights which allow you to colorize images and video without needing to train your own models. | |
Easy Diffusion - web interface to Stable Diffusion designed to be as easy-to-use as possible. | |
FBCNN - Flexible Blind Convolutional Neural Network is software which seeks to remove artifacts from JPEGs while preserving the integrity of the images. | |
Final2x - GUI software that uses sophisticated AI models to enhance your images by guessing what the details could be. | |
GFPGAN - performs real-world face restoration. This software can radically improve the quality of photos. | |
Imaginer - extremely easy-to-use GTK4 software which lets you generate pictures using AI. | |
InvokeAI - a Stable Diffusion toolkit. Generate highly detailed images based on text descriptions, or from images/drawings. | |
Lama Cleaner - Fully self-hostable inpainting tool powered by state-of-the-art AI models. | |
Old Photo Restoration - use deep learning to restore old photos via deep latent space translation. | |
PhotoPrism - AI-powered photos app for the decentralized web. It uses modern technologies to tag and find pictures. The software can be run at home, on a private server, or in the cloud. | |
Real-ESRGAN - aims to develop practical algorithms for general image/video restoration. | |
Rembg - remove backgrounds from images. The tool relies on the U2Net model, a machine learning model that performs object cropping in a single shot. | |
Stable Diffusion web UI - web interface to Stable Diffusion, a deep learning text-to-image diffusion model capable of generating photo-realistic images given any text input. | |
Upscaler - GTK4 software that uses sophisticated AI models to enhance your images by guessing what the details could be. It's a frontend for Real-ESRGAN. | |
Upscayl - GUI software that uses sophisticated AI models to enhance your images by guessing what the details could be. Like Upscaler, it's a frontend for Real-ESRGAN. |
Science | |
---|---|
Argos Translate - state-of-the-art neural machine translation software. Argos Translate can be used as a Python library, from the command line, or as a GUI application. It uses OpenNMT for translations. | |
astroML - a Python module which offers statistical data analysis in astronomy and astrophysics. | |
EasyOCR - General OCR that can read both natural scene text and dense text in documents. The software supports more than 80 languages. | |
LibreTranslate - a machine translation API which is entirely self-hosted. This software lets you use open source machine translation in your projects. It uses Argos Translate for its translation engine. It sports a great web frontend. | |
ocrs - Rust library and CLI tool for extracting text from images, also known as OCR (Optical Character Recognition). The software uses neural network models written in PyTorch. | |
scikit-learn - a machine learning library built on top of SciPy that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection, model evaluation, and many other utilities. |
If you have recommendations for other good free and open source machine learning software for Linux, please comment below.
Machine learning science apps please!
There are so many interesting projects here to try. The Stable Diffusion ones are particularly useful.
One thing I hate is that so many of these programs take up so much hard disk space. I’m not talking about the size of their models but rather the virtual environments with tons of Python libraries. Python is really a mess.
You should make it clear that the apps are self-hosted. That’s a big virtue and is worth stressing to readers, don’t ya think?
Good point, article has been updated to include a reference that the apps are self-hosted.