Last Updated on June 28, 2024
Tutorial configuration
Recently, we published an introduction to data science in R for the beginner in programming. This is a complementary article written using the same approach, but this time focusing on Python, which is another open source programming language. You will learn how to use Python in a Jupyter notebook to manipulate a data set and visualise the results.
Python has an even larger following than R, so both articles should get the beginner up to speed in the two main languages for doing data science. The differences in the tutorials highlight how R and Python tackle the same task. Often it is your own experience, and the data science task that you want to do, that determines which language to choose.
The tutorial assumes some basic knowledge of Linux. Beyond that, we shall go through every step needed to set it up.
Install Anaconda for Linux
Anaconda is a package management system for Python aimed at individual users. We shall use this system for this tutorial. We recommend that you use it for all your data science projects because it handles package dependencies reliably. It also helps you to reproduce your results by letting you keep a separate Python environment for each of your data science projects.
Instructions for installing Anaconda for Linux can be found at https://docs.anaconda.com/anaconda/install/linux/. There are instructions for each popular Linux distribution.
Open a fresh Linux terminal. To check that Anaconda has been installed correctly, make sure that you can run the conda command from the terminal by typing conda info. The conda command is the main way to access the functionality in Anaconda. This particular conda command should show a list of configuration settings, including the version of conda that has been installed.
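The same check can be scripted. The following is a minimal sketch, not part of the original tutorial, which assumes that the conda executable is on your PATH and that conda info --json reports the version under the key conda_version. The function names parse_conda_version and installed_conda_version are made up for illustration.

```python
import json
import shutil
import subprocess


def parse_conda_version(info_json):
    """Extract the conda version from the text printed by `conda info --json`."""
    # Assumption: the version is stored under the "conda_version" key.
    return json.loads(info_json).get("conda_version")


def installed_conda_version():
    """Return the conda version as a string, or None if conda cannot be run."""
    if shutil.which("conda") is None:  # no conda executable on PATH
        return None
    try:
        result = subprocess.run(
            ["conda", "info", "--json"],
            capture_output=True,
            text=True,
            timeout=120,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None
    try:
        return parse_conda_version(result.stdout)
    except json.JSONDecodeError:
        return None


print("conda version:", installed_conda_version())
```

If this prints None, conda was not found or could not be run from Python, so go back to the installation instructions before continuing.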
More help on conda can be found in the conda cheat sheet.
Create a conda environment
Anaconda allows you to create a dedicated environment for the tutorial. In this environment, we shall install Python and the packages that are needed.
At the Linux terminal prompt, create an environment called intro by typing
conda create -c conda-forge -n intro python=3.8 jupyter plotnine
This should install Python version 3.8 and the Python packages jupyter and plotnine, as well as the dependencies of these packages. There should be no error messages.
You can see the list of your environments in anaconda by typing
conda env list
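If you would rather check for the new environment from a script, conda env list also accepts a --json flag. The sketch below is an illustration only: it assumes that the JSON output holds the environment paths under the key envs, and it parses a made-up sample string rather than calling conda, so the paths shown are hypothetical.

```python
import json
from pathlib import Path


def environment_names(env_list_json):
    """Return the directory name of each environment listed by `conda env list --json`."""
    # Assumption: the environment paths are stored under the "envs" key.
    paths = json.loads(env_list_json).get("envs", [])
    return [Path(p).name for p in paths]


# Sample text in the assumed shape; these paths are invented for illustration.
sample = '{"envs": ["/home/user/anaconda3", "/home/user/anaconda3/envs/intro"]}'

names = environment_names(sample)
print(names)
print("intro environment present:", "intro" in names)
```

Note that the base environment appears under the name of its installation directory, not as "base", because only the last part of each path is kept.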
To activate the new environment and gain access to the newly installed Python packages, type at the terminal
conda activate intro
From now on, any conda packages that you install are placed in the intro environment.
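To confirm that the activation worked, you can run a short Python check from the same terminal. This is a sketch under stated assumptions: conda activate sets the environment variable CONDA_DEFAULT_ENV to the name of the active environment, and jupyter_core is used here as the import name of a component that the jupyter package is assumed to pull in as a dependency. The function name environment_report is hypothetical.

```python
import importlib.util
import os
import sys


def environment_report(expected_env="intro", packages=("jupyter_core", "plotnine")):
    """Summarise the active conda environment and any packages that cannot be found."""
    active = os.environ.get("CONDA_DEFAULT_ENV")  # set by `conda activate`
    return {
        "active_env": active,
        "is_expected_env": active == expected_env,
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        # find_spec returns None when a top-level module is not installed
        "missing_packages": [
            name for name in packages if importlib.util.find_spec(name) is None
        ],
    }


for key, value in environment_report().items():
    print(key, "=", value)
```

With the intro environment active, you would expect is_expected_env to be True and missing_packages to be an empty list. Anything else suggests that the environment was not activated or that the installation did not complete.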
Start Jupyter in a new directory
Create a new directory for the tutorial called pythonintro on your file system and change to that directory using the following commands:
mkdir pythonintro
cd pythonintro
We shall use a Jupyter notebook for the tutorial that is based in this directory. To start Jupyter, type at the terminal
jupyter notebook
This should open up your web browser in Linux and show you the contents of the directory pythonintro, which is currently empty.
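Once you have created a notebook, a first cell along the following lines can confirm that Jupyter was started in the right place. It is a minimal sketch using only the Python standard library, and directory_summary is a name chosen for illustration.

```python
from pathlib import Path


def directory_summary(path="."):
    """Return the name of a directory and a sorted list of the entries inside it."""
    directory = Path(path).resolve()
    entries = sorted(entry.name for entry in directory.iterdir())
    return directory.name, entries


name, entries = directory_summary()
print("Working directory:", name)
print("Contents:", entries if entries else "(empty)")
```

The working directory should be reported as pythonintro. The contents will no longer be empty at this point, because the listing includes the notebook file itself and any checkpoint directory that Jupyter has created.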
You are now ready to start the training session by going through our Data Science Tutorial For Python.