Sign in or start a free trial to avail of this feature.
5. Merging Queries
Importing data from multiple sources will often require you to merge multiple queries together. In this lesson, we will add a new dataset and look at how we can merge it with our existing data.
Combining Queries in the Query Editor
- There are two methods for combining queries:
- Merging Queries: Merging combines queries by adding new columns to an existing dataset. This is used to add more information to existing records.
- Appending Queries: Appending queries adds extra rows to an existing query. This is used when the new data has the same structure as the existing data.
- Merging queries combines multiple queries using a common column
- In order to merge queries one table must have a column that uniquely identifies each row
- This column is then matched to a column in the other table containing the same values
Over the last few lessons, we've been working with a single query. In practice you'll often find that you do not have all the data you need at the start of a project. In this scenario, you can create a second query for the new data and then combine these queries into one. Our goal in this lesson is to create a new quantity query and merge it with our existing query. There are two methods of combining queries in the query editor: merging and appending.
Merging is used when we want to add one or more new columns to the data set. As such, merging queries alters the structure of the data set. By contrast, appending adds extra data to the query but does not add any new columns. Therefore, appending queries does not alter the structure of the data set. Our query currently contains data on the amount of each sales transaction, however we would like to add quantity information. We'll create a new query containing quantity data and merge it to the existing query. The quantity data will then become a new column in the sales query. Before we can create a new query, let's save the query we've been working on for the last few lessons. We'll navigate to the home tab and select "close and apply." This loads the query output into the Power BI data model, and closes the query editor. We'll now add our new data. The current data set includes details of sales revenue for each month, however our quantity data is in another excel file. We can see at this quantity data already comes in an appropriate long and narrow format. It's unpivoted and includes all the years from 2013 to 2016 in a single file, in fact, the format is almost identical to the format of our sales data. The exception being that we swapped the sales column with the quantity column. Let's move back to Power BI so we can add the quantity data to our sales data query. We'll navigate to the home tab, select "get data," and then excel.
We'll find the file with the quantity data and then select the single sheet. We'll click edit to open up the query editor.
We can see there are now two queries. One for the original sales data and one for the new quantity data.
Let's take a moment and look over the quantity query to make sure it's correct. If we look at the date column we can see there are dates for 2018, just as there were originally with the sales query. This isn't right. To fix this problem we'll delete the changed type step, click on the "ABC123," symbol on the date column, and select "using locale." We'll then select the date data type, click on the locale box, and scroll down to "English (United Kingdom)" We'll click "OK" and find that our dates are now in the correct format.
We're now ready to move on. We wanna match the quantity data with the sales data and merge the two queries to a single data set. We'll do this by editing the original sales data query. We'll select the query, navigate to the home tab, and select "merge queries." We'll then start our merge by selecting the quantity query. We now need to select columns to match on and the type of joint. At this point, we encounter a problem. In order to merge the queries, we need to match one column from the sales query to one column in the quantity query. However, each row of the sales table shows the sales of one specific product to one specific customer on one specific month.
Therefore, we have to look at the value of three different columns to uniquely identify a particular row. There's no single column that has a different value for every row. The same problem exists with our quantity query. For example, let's imagine we tried to merge the tables using the customer ID column. The first five rows in the sales query all relate to the customer A110714 in the product best gain.
If we were to merge on the customer ID the fifth row of the quantity query would be a match for all of these rows as it has the same customer ID even though we can see it's for a different product: Polesta. When merging tables, at least one of the tables needs to have a column with a unique value in every row. In this case, we actually want both tables to have such a column, therefore, before we can merge the queries we need to create a column with a unique value for each row. We'll stop here and continue this process in the next lesson