1. Course Introduction

Overview

In this lesson, we get a preview of the course ahead.

Summary

  1. Preface (00:14)

    In this course, we’ll cover the intermediate components of Python, focusing on the external libraries NumPy, Pandas, and Matplotlib. These libraries enable us to interact with, store, transform, and explore data in Python to a much greater extent than we can with Python’s standard library alone.

    Before we continue, it’s worth remembering that the process of interacting with our lesson/exercise/exam files can be a little tricky. To access these files, you first need to download them from Kubicle. When choosing a location to save these files, ensure that they’re somewhere that you can easily navigate to from Jupyter Notebook. Once downloaded, go to the Jupyter Notebook Home screen and navigate to the folder you saved your file in. You can then select the file to open it.

  2. Course Structure (01:20)

    This course has 3 main parts. 

    In the first part, we’ll learn about the NumPy library. This library stores data in arrays, a list-like structure that makes calculations and transformations on ranges of data more efficient.

    In the second part, we’ll learn about the Pandas library. This library enables us to store data in DataFrames, which make it easier to understand and interact with large datasets.

    In the third part, we’ll learn about the Matplotlib library. This library enables us to explore our data using a range of different visualizations.
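
As a rough preview of how the three parts fit together, the sketch below stores a few numbers in a NumPy array, places them in a Pandas DataFrame, and draws a simple Matplotlib chart. The sample values and variable names are made up for illustration and do not come from the lesson files, and the sketch assumes all three libraries are already installed.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Part 1 (NumPy): an array applies a calculation to every element at once.
prices = np.array([10.0, 12.5, 8.0])   # hypothetical sample values
discounted = prices * 0.9              # element-wise, no loop required

# Part 2 (Pandas): a DataFrame holds the same data as a labelled table.
df = pd.DataFrame({"price": prices, "discounted": discounted})
total_saving = (df["price"] - df["discounted"]).sum()

# Part 3 (Matplotlib): a quick bar chart to explore the table visually.
fig, ax = plt.subplots()
ax.bar(df.index, df["price"], label="price")
ax.set_xlabel("item")
ax.set_ylabel("price")
ax.legend()
```

When run in a Jupyter Notebook cell, the chart is typically displayed directly below the cell. Each library is covered in detail in its own part of the course, so this is only meant to show the general shape of the workflow.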

Transcript

Welcome to this first lesson on storing, transforming, and exploring data.

Before we learn about these concepts, we'd like to quickly remind you of the process of using the Jupyter Notebook files we'll be providing for this course.

Most lessons come with before and after files, which capture the notebook as it was at the beginning and at the end of the lesson.

To view these files, we first need to download them.

By default, Jupyter Notebook opens in our personal folder, so we'll make sure the downloaded files are stored there.

In this case, we have a folder called Kubicle lesson files. We'll save the file here and navigate to Jupyter Notebook.

Here, we can open the Kubicle lesson files folder, where we can find and open the file we just downloaded.

We'll now return to the course introduction.

The concepts covered in this course are intermediate Python skills.

We'll approach these concepts with the assumption that you, the learner, are already familiar with the basics of Python.

This includes performing basic calculations, understanding the different data types, storing values in variables, and adding multiple values to lists.

If you're unfamiliar with any of these concepts, we encourage you to check out our course on Python basics.

We also assume that you're quite familiar with conditionality at this stage of your learning, and although it's not essential, an understanding of loops is also beneficial.

If you're unfamiliar with these concepts, have a look at our course on functions, conditionality, and loops.

Let's now have a quick look at what we'll be covering in this course.

The layout of this course is based on three libraries which are essential for data scientists using Python.

The first of these is the NumPy library.

This package allows us to store our data in a special format, the array, which makes calculations and transformations on that data much more efficient.

The next section is about the Pandas library.

This allows us to store our data in a table-like format, which makes it far easier to interact with, particularly when building machine-learning algorithms.

The final section is about the Matplotlib library.

This library allows us to visualize and explore our data.

Again, this is an essential part of the machine-learning algorithm building process.

It's also critical when we're communicating our reports to external viewers, which is what Jupyter Notebook was designed for.

In the next lesson we'll get started with NumPy.