Dask is a flexible, open source parallel computing library for analytics. It takes a Python workload and distributes it across multiple cores or machines.
Its main virtue is that if you are familiar with Python’s syntax, you’re ready to use Dask.
Dask consists of two components:
- Dynamic task scheduling optimized for computation. This is similar to Airflow, Luigi, Celery, or Make, but optimized for interactive computational workloads.
- “Big Data” collections like parallel arrays, dataframes, and lists that extend common interfaces like NumPy, Pandas, or Python iterators to larger-than-memory or distributed environments. These parallel collections run on top of the dynamic task schedulers.
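As a rough illustration of the task scheduling component, the sketch below uses `dask.delayed` to build a small task graph lazily and then hands it to a scheduler. The functions `inc` and `add` are hypothetical placeholders for real work, and the synchronous scheduler is chosen here only to keep the example deterministic on a single machine.

```python
from dask import delayed


@delayed
def inc(x):
    # Hypothetical unit of work; nothing runs until compute() is called.
    return x + 1


@delayed
def add(x, y):
    # Depends on the results of two other tasks.
    return x + y


# Calling the decorated functions only records tasks in a graph.
a = inc(1)
b = inc(2)
total = add(a, b)

# Execute the graph. "synchronous" runs everything in the calling thread;
# other schedulers (threads, processes, distributed) accept the same graph.
result = total.compute(scheduler="synchronous")
print(result)
```

Because `inc(1)` and `inc(2)` do not depend on each other, a parallel scheduler is free to run them at the same time.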
Dask offers three main collection interfaces, modeled on popular scientific-computing and machine learning libraries in Python:
- Array, which works like NumPy arrays.
- Bag, which is akin to the RDD interface in Spark. Dask.Bag parallelizes computations across a large collection of generic Python objects.
- DataFrame, which works like Pandas DataFrame.
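A minimal sketch of two of these interfaces follows, assuming NumPy is installed alongside Dask. The chunk and partition sizes are arbitrary illustrative values, and the DataFrame interface is left out only to keep the dependencies small; it follows the same build-then-`compute()` pattern.

```python
import dask.array as da
import dask.bag as db

# Array: a NumPy-like array split into chunks of 5 elements each.
x = da.arange(10, chunks=5)
array_sum = int((x * 2).sum().compute())

# Bag: a parallel collection of generic Python objects.
bag = db.from_sequence(range(10), npartitions=2)
even_squares = (
    bag.filter(lambda n: n % 2 == 0)
    .map(lambda n: n * n)
    .compute(scheduler="synchronous")  # single-threaded, for determinism
)

print(array_sum)
print(even_squares)
```

In both cases the expressions are lazy: the work is only carried out when `compute()` is called.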
Features include:
- Provides parallelized NumPy array and Pandas DataFrame objects.
- Scales Pandas, scikit-learn, and NumPy workflows with minimal rewriting.
- Provides a task scheduling interface for more custom workloads and integration with other projects.
- Enables distributed computing in pure Python with access to the PyData stack.
- Operates with low overhead, low latency, and the minimal serialization needed for fast numerical algorithms.
- Runs resiliently on clusters with thousands of cores.
- Supports encryption and authentication using TLS/SSL certificates.
- Resilient – can handle the failure of worker nodes gracefully and is elastic.
- Scales down – easy to set up and run on a laptop in a single process. This is useful if you need to manipulate some datasets without needing to use a cluster.
- Responsive – designed with interactive computing in mind, it provides rapid feedback and diagnostics to aid humans.
- Diagnostic and investigative tools:
  - Real-time and responsive dashboard that shows current progress, communication costs, memory use, and more, updated every 100ms.
  - A statistical profiler installed on every worker that polls each thread every 10ms to determine which lines in your code are taking up the most time across your entire computation.
  - An embedded IPython kernel in every worker and the scheduler, allowing users to directly investigate the state of their computation with a pop-up terminal.
  - The ability to re-raise errors locally, so that users can apply the traditional debugging tools to which they are accustomed, even when the error happens remotely.
- Several user APIs.
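To give a feel for the task scheduling interface mentioned in the feature list, here is a small sketch that writes a task graph by hand as a plain dictionary and runs it with the threaded scheduler in a single local process. The key names and the helper functions `inc` and `add` are hypothetical; most users would reach for the higher-level collections instead.

```python
from dask.threaded import get


def inc(x):
    return x + 1


def add(x, y):
    return x + y


# A Dask graph maps keys to values or to task tuples of the form
# (callable, arg1, arg2, ...). String arguments that match a key
# are replaced by that key's computed result.
graph = {
    "x": 1,
    "y": (inc, "x"),
    "z": (add, "y", 10),
}

# Run on a local thread pool; no cluster is required.
z = get(graph, "z")
print(z)
```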
Website: dask.org
Support: Documentation, GitHub
Developer: Dask core developers
License: New BSD License
Dask is written in Python.