Sign in or start a free trial to avail of this feature.
1. Union Joins
In this lesson, we learn how to use the union join tool to bring together records vertically from separate files with the same data structure.
To explore more Kubicle data literacy subjects, please refer to our full library.
- The Union Join is used for merging files that have similar structures
- Data from input files will be merged vertically so that data from like fields will be stacked on top of each other
- Data will be merged in the order that input datasets are connected to the Union Join tool as a default
In the previous course, we looked at how users can prepare their data for further analysis.
We'll now address some of the many ways prepared data from various sources can be combined using Alteryx.
The Alteryx terminology for combining data files is join.
In this lesson, we'll use the union join tool which merges all data vertically in files that have similar structures.
To get an idea of how this works, we'll run a quick example.
For this course, we'll make use of 5 sample data sets pertaining to a corporate events business.
These data sets are sales data for January February and March, a customer data set, and an events data set.
Imagine that this business engages sales representatives to sell entertainment packages for events to corporate customers.
These packages are marketed at different price points, tailored for each event.
Each month sales data is contained in its own separate file.
However, we would like to merge the files to view sales data in aggregate.
We'll first import the sales data by just dragging and dropping the files onto the canvas.
We'll now review the data containing these three files by using the preview window.
We can see that the headings for each of these data sets are identical.
Since the file structures are identical, we can now move forward with their union join.
We'll navigate the join tab, bring the union icon onto the canvas and connect it to our 3 sales data sets.
In the configuration window, we have the option to configure the union join automatically by column position or column name.
Alternatively we can manually configure the fields.
The manual option allows users to match data from each of the input sources to ensure that the output columns contain the correct information.
In this case, the headers are identical. So an automatic configuration by name will suffice.
At the bottom of the configuration window, there is a box labeled output order.
If the toggle is selected, the user can change how data is listed in the combined data set.
As we can see, it defaults to list data from the first set followed by the second set and then the third set.
Looking at the canvas, we can see that my March sales data is listed as set number 1, since it was the first set I connected to the Union tool.
We can also see that January is number 2 and February is number 3.
I'll reorder the output, so that January that is listed first followed by February and then March.
We'll now run the workflow and preview the data in the output window.
We can see that the three files have been brought together into one long continuous data set.
As we can see, the union join makes it very easy to merge multiple data sets that have similar structures by vertically stacking the data in question.
In the next lesson, we'll look at more complex join options.