1. Course Introduction

Overview

In this lesson, we get a preview of the course ahead.

To explore more Kubicle data literacy subjects, please refer to our full library.

Summary

  1. Prerequisites (00:16)

    To complete this course, you’ll need a good understanding of the basics of functions, loops, visualizations, and pandas DataFrames.

  2. Course structure (02:16)

    This course has 3 main parts. 

    In the first part of this course, we’ll learn about joining data. We’ll cover the different ways you can join data, including data unions, inner joins, outer joins, left and right joins, as well as outer left and right joins.

    In the second part, we’ll learn about aggregating data. We’ll start by exploring the concept of data granularity and then we’ll move on to methods for aggregating data in Python.

    In the third and most comprehensive part of this course, we’ll learn about cleaning our data. We’ll learn how to address data type errors, data range errors, null values, duplication errors, categorical value errors, inconsistent data errors, and cross-column errors.

Transcript

Welcome to this first lesson on data preparation. Before we get into these concepts, we'd like to quickly remind you how to use the Jupyter Notebook files we'll be providing for this course.

Most lessons come with "before" and "after" files that capture the notebook as it stands at the beginning and at the end of the lesson.

To view these files, we first need to download them.

By default, Jupyter Notebook opens in our personal folder, so we'll make sure the downloaded files are stored there.

In this case, we have a folder called Kubicle lesson files.

We'll save the files here and navigate to Jupyter Notebook.

Here, we can open the Kubicle lesson files folder, where we can find and open the file we just downloaded.

Now, back to the course introduction. This course covers intermediate Python skills.

We'll approach these concepts with the assumption that you, the learner, are already familiar with the basics of Python.

This includes performing basic calculations, understanding the different data types, storing values in variables, and adding multiple values to lists.

If you're unfamiliar with any of these concepts, we encourage you to check out our course on Python Basics.

We also assume that you're quite familiar with conditionality and loops at this stage in your learning. If you're unfamiliar with these concepts, have a look at our course on Functions, Conditionality, and Loops.

Finally, we'll make extensive use of the pandas and Matplotlib libraries, so if you need a refresher on using them, make sure to check out our course on Storing, Transforming, and Visualizing Data.

Now let's have a quick look at what we'll be covering in this course.

In the first part of this course, we'll learn about joining data.

We'll cover the different ways you can join data, including data unions, inner joins, outer joins, left and right joins, as well as outer left and right joins.

In the second part, we'll learn about aggregating data.

We'll start by exploring the concept of data granularity and then we'll move on to methods for aggregating data in Python.

In the third and most comprehensive part of this course, we'll learn about cleaning our data.

We'll learn how to address data type errors, data range errors, null values, duplication errors, categorical value errors, inconsistent data errors, and cross-column errors.

We'll start in the next lesson, where we'll learn about the different types of joins.