Sign in or start a free trial to avail of this feature.
1. Creating Calculated Columns
Calculated columns let you add new columns to the tables in the data model. Learn how to create these new columns in this lesson.
To explore more Kubicle data literacy subjects, please refer to our full library.
Overview of DAX (00:04)
Data Analysis Expressions, or DAX, is the formula language used in Power BI. DAX has some similarities with the formulas used in Excel, but it is designed to work with the structured data format found in Power BI. This format contrasts with Excel, which has a more unstructured data format.
Lesson Goal (01:07)
The goal of this lesson is to use DAX to create a calculated column that adds unit price information to our data set.
Dataset for the Course (01:18)
Our courses on DAX use a data set containing sales information for a pharmaceutical company. Each row represents an individual sales transaction. There are columns representing the sales amount and the quantity for each transaction, as well as the date of the transaction and the product involved. There are also columns representing the ID of the customer involved in the transaction, and geographical information about the customer.
The data set is contained in a csv file, which we import to Power BI Desktop. Currently, DAX formulas cannot be created in the online service, so all our courses on DAX use Power BI Desktop.
Creating a Calculated Column (02:36)
A calculated column is a new column that is added to a table in Power BI using a DAX formula. It’s best practice to create DAX formulas in Data View, as you can then see the new column being added to the table.
In our data set, we want to add a column to the table representing the unit price of each transaction. We do this by dividing the sales column by the quantity column. To create this column, we select New Column from the Home tab or the Transform tab. We then type the formula for the new column in the formula bar, and press Enter to create the new column.
When writing the formula, we refer to columns from the tables in our data model. The syntax here is to place the table name first in single quotes, then the column name in square brackets. The table name is technically not required if the relevant column is in the tables we are working with, but it’s best practice to include it anyway.
Using the Calculated Column (04:29)
When we create a new column, it appears in the fields list for the table. We can then use the column in visualizations just like any other field in the data model.
A calculated column is not always the appropriate type of calculation. For instance, or unit price column calculates the unit price for each transaction, but does not account for the fact that each transaction can have a different quantity. In these situations, you may need to use a different type of calculation.
In the previous courses, we've seen that Power BI is designed to allow you to create insightful reports in dashboards, without requiring a large amount of technical expertise. However, to fully utilize the abilities of Power BI, you should be familiar with data analysis expressions. Or DAX. DAX is a language used to create formulas in Power BI. Note that some DAX formulas are similar to Excel formulas. However, they're built to interact with data in different ways.
Excel contains unstructured data, meaning it isn't organized in any predefined manner, and cells can contain any type of data. In Power BI, data is structured, containing tables made up of columns of data. DAX is designed to work with data in this structured format.
In the next few courses, we'll teach you the most common functions and concepts in DAX.
We'll also demonstrate several business situations where you might find yourself using DAX in Power BI. In this lesson, we'll use DAX to create a calculated column that adds unit price information to our dataset. We'll start by introducing a dataset from a pharmaceutical company. The dataset records sales of various pharmaceutical products. Each row represents a sale of an individual product. We can see that each sale has a corresponding date, product name, sale amount, quantity sold, a customer id, representing who the product was sold to, and geographic information about the customer. Let's now move over to Power BI desktop. It's worth noting that DAX formulas can't be created in the online services, so all the DAX courses will use Power BI desktop. We'll start by importing the pharma dataset, which is currently housed in a CSV file. We'll navigate to the home tab, select get data, and then text slash CSV.
We'll then find the CSV file and select it.
This opens up a preview of the data.
This dataset comes from a company based in the UK, using the international date standard. However, my machine defaults to the US standard. As a result, I'll need to edit the query off-screen. If your system defaults to the international standard, you can simply select load.
Note that we have the ability to create columns in any of the three views. That being said, it's best practice to create columns in data view, as we can see the columns being added to the data model.
Let's switch to data view now. While we're here, let's ensure that the date column is set to the date data type.
As mentioned previously, each sales transaction has a corresponding quantity sold and total sales amount among other fields. Let's say we wanna create a new column, showing the price per unit. This is simply the total price of a transaction divided by the quantity sold. This will allow us to compare the average price per unit of each product, and see which product sells for the highest and lowest prices. We can create new columns from either the home tab, or the modeling tab.
Let's select the modeling tab, and new column.
Note that a new column is added to the table, and the formula bar is now highlighted.
We'll start our formula, by typing in the name unit price, followed by the equals sign. As we start typing in sales, the auto complete suggests the appropriate functions and columns. We'll select the sales field from the list of options. We'll then divide by quantity and again accept the auto complete suggestion. Before we confirm this formula, let's look at the syntax.
The table name is enclosed in a set of single quotes, followed by the column name in square brackets. The single quotes around the table name are not needed if the table name does not contain spaces.
In this situation, we could actually omit the table name, as we're referring to columns in the table we're adding to.
That said, it's best practice to always include table names for reasons we'll discuss in a later lesson.
We'll press enter to create the column, and see that each row now has a value for unit price.
Unit price also appears in the fields list, with a table symbol, indicating it's a calculated column.
Let's use this new column in a visualization.
We'll switch to report view, create a clustered bar chart, and add both unit price and product name.
Note that the chart currently shows the sum of unit price by product.
This isn't very useful, so we'll change the unit price from sum to average.
Finally, we'll sort the chart by average of unit price.
On the resulting graph, we can see that Melacaine has the highest average unit price of 10.89. However, this number is a bit deceiving.
Our new column calculated the unit price per transaction.
This average does not take into account that each transaction may cover a different quantity of products sold.
To find out the actual average unit price, we'll need a different calculation.
As we've seen here, calculated columns are relatively easy to create, however they don't offer the best solution in all situations.
In the next lesson, we'll create a measure to calculate the average unit price, and discuss the difference between these two types of calculations.