High Performance Analytics Toolkit (HPAT) – Compiler-based Framework for Big Data

Last Updated on July 11, 2021

High Performance Analytics Toolkit (HPAT) is an open source big data analytics and machine learning framework that combines Python’s ease of use with fast execution. It is a compiler-based framework that accelerates data analytics and machine learning on clusters.

HPAT automatically scales analytics/ML code written in Python to bare-metal cluster/cloud performance. HPAT is built on top of the Numba and LLVM compilers.

It compiles a subset of Python (Pandas/NumPy) to efficient parallel binaries with MPI, requiring only minimal code changes. It also provides scripting abstractions in the Julia language for analytics tasks, automatically parallelizes them, generates efficient MPI/C++ code, and provides resiliency.
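To make "parallel binaries with MPI" more concrete, here is a minimal pure-Python sketch of the data-parallel pattern such generated code typically follows, assuming a contiguous block distribution of a 1-D array across ranks: each rank reduces only its own slice, and the partial results are then combined with an allreduce. This is not HPAT's generated code. No MPI library is used; the loop over ranks stands in for processes running concurrently, and the names `block_range`, `simulated_allreduce_sum` and `parallel_sum_of_squares` are hypothetical.

```python
"""Single-process sketch of the SPMD block-distribution pattern.

Assumption: this only mimics what an MPI-parallelized reduction does; it is
not output produced by HPAT.
"""

from typing import List, Sequence


def block_range(n: int, rank: int, n_ranks: int) -> range:
    """Return the contiguous indices of 0..n-1 owned by `rank`.

    The first n % n_ranks ranks get one extra element to balance the load.
    """
    base, extra = divmod(n, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


def simulated_allreduce_sum(partials: List[float]) -> float:
    """Stand-in for an MPI Allreduce with SUM: combine every rank's partial."""
    return sum(partials)


def parallel_sum_of_squares(data: Sequence[float], n_ranks: int) -> float:
    """Each simulated rank maps and reduces its own block; partials are combined."""
    partials = []
    for rank in range(n_ranks):
        local = [data[i] for i in block_range(len(data), rank, n_ranks)]
        partials.append(sum(x * x for x in local))  # local map + local reduce
    return simulated_allreduce_sum(partials)


if __name__ == "__main__":
    values = [float(v) for v in range(10)]
    print(parallel_sum_of_squares(values, n_ranks=3))  # 285.0
```

With 10 elements and 3 ranks, rank 0 owns indices 0 to 3 and ranks 1 and 2 own three indices each. Because only the small partial results cross rank boundaries, this pattern scales well as nodes are added.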

HPAT is orders of magnitude faster than alternatives like Apache Spark. For example, HPAT is 14x to 400x faster than Spark on the Cori supercomputer at LBL/NERSC, and scales better to larger numbers of nodes.

HPAT depends on MPICH (a high performance and widely portable implementation of the Message Passing Interface (MPI) standard), SciPy, pandas, and Numba.

Features include:

  • Automatically parallelizes a subset of Python based on the MapReduce parallel pattern. MapReduce provides high-level parallelism abstractions that suit data-parallel analytics programs and can also be offered on top of scripting languages.
  • Flexible distributed data structures, which enable the use of existing libraries such as HDF5, ScaLAPACK, and the Intel Data Analytics Acceleration Library (DAAL).
  • Offers resiliency through automatic checkpointing, and facilitates optimization and fusion of array operations. Its checkpoint/restart capability targets iterative machine learning applications such as logistic regression and k-means.
  • Novel design system.
  • Domain-specific partitioning inference and parallelization.
  • Parallel I/O code generation.
  • Implemented as a Julia package using Julia’s high-level matrix and vector operations. It supports the high-level syntax of the Julia language.
  • Good coverage of NumPy operators.
  • Good coverage of Pandas operators.
  • Supports I/O for the HDF5 and Parquet formats.
  • Provides basic ASCII string support.
  • Supports basic integer dictionaries.
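The checkpoint/restart feature listed above matters most for iterative algorithms, where losing a node late in a long run would otherwise mean starting from scratch. The following is a minimal sketch of the idea using a one-dimensional k-means (Lloyd's algorithm) that saves its state to a JSON file after every iteration and resumes from the last completed iteration if a checkpoint exists. It is a hand-written illustration under that assumption, not HPAT's automatic checkpointing mechanism; the file format and the names `kmeans_step` and `run_kmeans` are hypothetical.

```python
"""Checkpoint/restart sketch for an iterative algorithm (1-D k-means).

Assumption: this hand-written JSON checkpoint only illustrates the concept;
it is not how HPAT implements automatic checkpointing.
"""

import json
import os
import tempfile
from typing import List, Sequence


def kmeans_step(points: Sequence[float], centroids: List[float]) -> List[float]:
    """One Lloyd iteration: assign each point to its nearest centroid, then
    move each centroid to the mean of the points assigned to it."""
    sums = [0.0] * len(centroids)
    counts = [0] * len(centroids)
    for p in points:
        nearest = min(range(len(centroids)), key=lambda k: abs(p - centroids[k]))
        sums[nearest] += p
        counts[nearest] += 1
    # A centroid with no assigned points stays where it was.
    return [s / c if c else old for s, c, old in zip(sums, counts, centroids)]


def run_kmeans(points: Sequence[float], init: List[float],
               n_iters: int, ckpt_path: str) -> List[float]:
    """Run up to n_iters iterations, checkpointing after each one.

    If ckpt_path already exists, resume from the iteration it records."""
    start, centroids = 0, list(init)
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            state = json.load(f)
        start, centroids = state["iteration"], state["centroids"]
    for it in range(start, n_iters):
        centroids = kmeans_step(points, centroids)
        tmp = ckpt_path + ".tmp"
        with open(tmp, "w") as f:
            json.dump({"iteration": it + 1, "centroids": centroids}, f)
        os.replace(tmp, ckpt_path)  # atomic swap: no half-written checkpoint
    return centroids


if __name__ == "__main__":
    pts = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
    with tempfile.TemporaryDirectory() as d:
        ckpt = os.path.join(d, "kmeans.json")
        run_kmeans(pts, [0.0, 5.0], n_iters=2, ckpt_path=ckpt)  # simulate a crash after 2 iterations
        print(run_kmeans(pts, [0.0, 5.0], n_iters=5, ckpt_path=ckpt))  # resumes at iteration 2
```

The second call skips the two iterations already recorded in the checkpoint and finishes the run, printing `[2.0, 11.0]`, the same result an uninterrupted five-iteration run produces. Writing to a temporary file and renaming it means a crash during the write leaves the previous checkpoint intact.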

Website: intellabs.github.io/hpat
Support: GitHub
Developer: Ehsan Totoni (Intel)
License: BSD 2-Clause “Simplified” License

HPAT is written in Python.
