Machine Learning in Linux: astroML – statistical data analysis in astronomy and astrophysics

Last Updated on March 6, 2023

In essence, Machine Learning is the practice of using algorithms to parse data, learn from it, and then make a determination or prediction. The machine is ‘trained’ using huge amounts of data.

In other words, Machine Learning is about building programs with tunable parameters (typically an array of floating-point values) that are adjusted automatically to improve their behavior by adapting to previously seen data.
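To make the idea of tunable parameters concrete, here is a minimal sketch in plain NumPy (not astroML code) that fits a straight line by gradient descent. The two-element parameter array starts at zero and is adjusted automatically to reduce the squared error on the data it has seen. The synthetic data, learning rate and iteration count are arbitrary choices made for illustration.

```python
import numpy as np


def fit_line(x, y, lr=0.1, n_iter=2000):
    """Fit y ~ w[0] + w[1] * x by batch gradient descent on the mean squared error."""
    X = np.column_stack([np.ones_like(x), x])  # design matrix with an intercept column
    w = np.zeros(2)                            # the tunable parameters: an array of floats
    for _ in range(n_iter):
        residual = X @ w - y                   # prediction error on the seen data
        grad = 2.0 * X.T @ residual / len(y)   # gradient of the mean squared error
        w -= lr * grad                         # adjust the parameters to reduce the error
    return w


# Synthetic training data: a line with intercept 1.5 and slope 3.0 plus a little noise
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 200)
y = 1.5 + 3.0 * x + rng.normal(0.0, 0.1, x.size)

w = fit_line(x, y)
print(w)  # close to [1.5, 3.0]
```

A closed-form least-squares solver such as `numpy.linalg.lstsq` gives the same answer for this problem; the loop is shown only because it makes the "adjusted automatically" step explicit.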

astroML is a Python module for machine learning and data mining built on NumPy, SciPy, scikit-learn, matplotlib, and Astropy.

The aim of the project is to offer a repository of Python implementations of common tools and routines used for statistical data analysis in astronomy and astrophysics, and to provide a uniform and easy-to-use interface to freely available astronomical datasets.
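As an example of the statistical routines astroML collects, its `astroML.stats` module provides `sigmaG`, a width estimate based on the interquartile range that is robust to outliers. The sketch below re-implements that idea in plain NumPy so it runs without astroML installed; the function name `sigma_g` and the test data are our own, and this is an illustration rather than astroML's code.

```python
import numpy as np
from statistics import NormalDist

# For a Gaussian, the interquartile range equals 2 * z(0.75) * sigma, with z(0.75) ~ 0.6745,
# so multiplying the IQR by this factor (~0.7413) recovers sigma.
IQR_TO_SIGMA = 1.0 / (2.0 * NormalDist().inv_cdf(0.75))


def sigma_g(values):
    """Robust estimate of the standard deviation from the 25th and 75th percentiles."""
    q25, q75 = np.percentile(values, [25, 75])
    return IQR_TO_SIGMA * (q75 - q25)


rng = np.random.default_rng(42)
data = rng.normal(loc=0.0, scale=2.0, size=10_000)
data[:100] = 1e3  # contaminate 1% of the sample with extreme outliers

print("standard deviation:", np.std(data))
print("sigma_g:", sigma_g(data))
```

With 1% of the points replaced by outliers, the ordinary standard deviation is dominated by them, while the quartile-based estimate stays close to the true width of 2.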

Installation

A fresh installation of Ubuntu 22.10 is missing git. Let’s install that first:

$ sudo apt install git

We will install astroML from its source code. Clone the project’s GitHub repository.

$ git clone https://github.com/astroML/astroML

Change into the newly created directory with the command:

$ cd astroML

We will install astroML system-wide:

$ sudo python3 setup.py install

We normally recommend installing software without polluting the system. Anaconda and Docker are popular tools for this. If you use Anaconda, you can install astroML with conda instead, as a conda package is available:

$ conda install -c astropy astroML
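After either installation route, a quick way to confirm that Python can see the package is to look it up and read its version. This is a minimal sketch; it assumes the module exposes a `__version__` attribute and falls back to "unknown" if it does not.

```python
import importlib.util


def astroml_version():
    """Return astroML's version string, or None if the module cannot be found."""
    if importlib.util.find_spec("astroML") is None:
        return None
    import astroML
    return getattr(astroML, "__version__", "unknown")


if __name__ == "__main__":
    found = astroml_version()
    print("astroML not found" if found is None else f"astroML {found} is installed")
```

Run it from outside the cloned astroML directory; inside it, Python finds the source tree first rather than the installed copy.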

Your system needs:

  • Python 3.6+
  • NumPy >= 1.13
  • SciPy >= 0.19
  • scikit-learn >= 0.18
  • Matplotlib >= 3.0
  • Astropy >= 3.0
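A short script can compare what is installed against the list above. This sketch assumes Python 3.8+ (for `importlib.metadata`) and uses the PyPI distribution names numpy, scipy, scikit-learn, matplotlib and astropy. Its version comparison only looks at the leading numeric components, so pre-release tags such as `rc1` are ignored; the `packaging` library's version parser is the more rigorous option.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions from the list above, keyed by PyPI distribution name
MINIMUMS = {
    "numpy": "1.13",
    "scipy": "0.19",
    "scikit-learn": "0.18",
    "matplotlib": "3.0",
    "astropy": "3.0",
}


def parse_version(text):
    """Turn '1.24.2' or '3.0rc1' into a tuple of leading integers, e.g. (1, 24, 2)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets(installed, minimum):
    """True if the installed version is at least the minimum (numeric parts only)."""
    return parse_version(installed) >= parse_version(minimum)


def check_requirements(minimums=MINIMUMS):
    """Return {package: (installed version or None, requirement satisfied?)}."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = (None, False)
            continue
        report[name] = (installed, meets(installed, minimum))
    return report


if __name__ == "__main__":
    print(f"Python {sys.version.split()[0]}: ok={sys.version_info >= (3, 6)}")
    for name, (installed, ok) in check_requirements().items():
        print(f"{name} {installed or 'missing'}: ok={ok}")
```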

To render figure text with LaTeX, you may also need some additional packages:

$ sudo apt-get install dvipng texlive-latex-extra texlive-fonts-recommended cm-super

For example, cm-super provides the type1ec.sty style file.
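Before turning on LaTeX text rendering, you can check that the tools these packages supply are present. The sketch below looks for the `latex` and `dvipng` executables, which matplotlib's `text.usetex` option relies on for raster output, and asks TeX Live's `kpsewhich` whether type1ec.sty can be found. The function names are hypothetical helpers, not part of astroML.

```python
import shutil
import subprocess


def missing_tex_tools(tools=("latex", "dvipng")):
    """Return the names of the required executables that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def has_tex_file(filename="type1ec.sty"):
    """Ask kpsewhich whether TeX can locate the given file; False if kpsewhich is absent."""
    kpsewhich = shutil.which("kpsewhich")
    if kpsewhich is None:
        return False
    result = subprocess.run([kpsewhich, filename], capture_output=True, text=True)
    return result.returncode == 0 and bool(result.stdout.strip())


if __name__ == "__main__":
    missing = missing_tex_tools()
    print("missing executables:", missing or "none")
    print("type1ec.sty found:", has_tex_file())
```

If both checks pass, astroML's example scripts typically enable LaTeX text with `setup_text_plots(fontsize=8, usetex=True)` from `astroML.plotting`.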

2 Comments
Steve
5 months ago

Python is a serial task-execution language, so all Python-based solutions for ML and NNs are slow, and that's a fact. It is very hard, next to impossible, to apply hardware acceleration; you only get CUDA if you code it yourself. Data mining is useless without hardware acceleration. KNIME, RapidMiner, MATLAB and some other solutions use it, others do not, and that's something to consider if you want hardware performance, yet no one tells you about it. Everything Python-based is a low-performance toy suitable for playing around with small datasets, though great to start with. As soon as you need performance, Python is not the answer, at least until developers speed up the code and add parallelism and CPU+GPU hybridization. Using CUDA with Python is also very, very hard.

Jacob
5 months ago
Reply to Steve

I stopped reading after your egregious assertion that Python ML and NNs are slow. What a load of baloney.