3. Unpivoting the Dataset

 
Subtitles Enabled

Sign up for a free trial to access more free content.

Free trial

Overview

Often the data we import to the Query Editor has many columns and few rows when we would prefer to have few columns and many rows. We can solve this problem by unpivoting columns, which we will see in this lesson.

Summary

Why do we unpivot columns?

  • For data analysis, the ideal table layout is long and narrow, with many rows and few columns
  • Data created in Excel is often short and wide, with many columns and few rows
  • This can cause important data to be spread over multiple columns, for example spreading sales data over several monthly columns

Pivoting and Unpivoting

  • Unpivoting columns turn columns into rows, by turning a series of column headings into values of a new column
  • Pivoting converts rows into columns, by creating a new column for each value of an existing column

Transcript

In the previous lesson, we imported data from a folder into a single query. In this lesson, we'll transform the dataset from its current, short and wide format to a long and narrow format by unpivoting the columns. If we select the arrow at the top of the name column and click load more, we can see that the data in our query comes from four separate files, representing the four years from 2013 to 2016. If we then scroll across the table, we can see that there's a column for every month from January, 2013 to December, 2016.

This is due to the layout of the excel files. As we can see, this layout leads to a large number of sales with null data.

For example, the top rows all come from the 2013 file and as a result, they have no values for the columns from 2014, 2015, and 2016.

This layout is often described as wide and short as these tables have many columns and relatively few rows. This is not ideal for data analysis. Instead we'll transform this into a long and narrow layout with a single date column.

We'll do this by unpivoting the date columns. Unpivotting turns a series of columns into rows of attribute or value pairs.

This will become more clear as we see it in action. We'll navigate back to the January, 2013 column and then select all the month columns using shift and right.

We'll then navigate to the transform tab and select unpivot columns.

The columns for each month are now replaced by two columns. One for months and the other for sales.

To better understand this process, let's look at the applied steps on the right of the screen. You may have noticed that this has been tracking the actions we've taken over the last few lessons. If we select the previous step, expanded data, we can see how the table looked before we unpivoted the columns. Notice the three numbers in the first row, representing the sales of best gain to customer A11 0714 from January to March, 2013.

Let's now select the unpivoted column step and once again see the table after it's been unpivoted. We can see that the first three rows represent the same three figures we just looked at in the pivoted table. The figures were previously in separate columns on the same row. However, they are now in the same column but on three different rows. As a result, the table now has fewer columns and more rows.

Now that we've unpivoted the columns, we no longer have a need for the two name columns at the left of the table. We'll select the two columns, navigate to the home tab, and select remove columns.

We should also take this oppportunity to assign names to the new columns on the right. We can re-name columns by double-clicking on their title. We'll re-name the attribute column, date and the value column, sales.

Let's now look at the opposite process of pivoting a column. This process creates multiple columns with one new column created for each value in the selected fields. To that end, we'll select the date column, navigate to the transform tab, and select pivot column.

This will create a separate column for each date in the data set. We'll select sales as the value column and press okay.

We can see that the query has reverted to the previous layout.

Each date has its own column and the values are the total sales for the relevant date. We want to work with unpivot data going forward so we'll delete the pivoting step by clicking the x next to pivoted columns in the applied steps area. Let's stop the lesson here. We've now seen how you can pivot and unpivot columns in the query editor. In the next lesson, we'll cover how to assign a data type to each column.