1. Review of Fundamental Statistical Concepts Part 1
Before delving into Statistical Analysis with Alteryx, take a moment to ensure you are comfortable with various statistical concepts, starting with standard deviation.
- You can download the Alteryx Predictive tools here
- The Standard Deviation measures how far the values in a set typically lie from the mean
- A higher Standard Deviation means that the values are more dispersed, while a lower Standard Deviation means that the values are closer together
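For reference, the usual textbook definition of the standard deviation can be written as below. The symbols are assumptions introduced here, not taken from the lesson: $x_1, \dots, x_N$ are the recorded values and $\mu$ is their mean. This is the population form; the sample form divides by $N - 1$ instead of $N$, and the lesson does not say which one it uses.

```latex
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i,
\qquad
\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(x_i - \mu\right)^2}
```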
In the next series of lessons, we'll look more closely at some of the analytical capabilities of Alteryx. Before proceeding further, you should ensure that you have downloaded the Alteryx Predictive Analytics add-in. You can check this easily by looking for the Predictive tab on the Tools palette. If it's not there, you'll need to download it from the Alteryx website. A link can be found in the show notes. This add-in contains many prepackaged tools to assist in the predictive analytics process. Once you have the suite of predictive tools installed, you're ready to begin. However, before diving into the numbers, it's worthwhile to take a moment to review some basic statistical concepts such as standard deviation, normal distribution, z-scores, correlation and regression.
If you're already comfortable with these concepts, please feel free to skip the next three lessons. We'll start our review by looking at standard deviation. Standard deviation measures how tightly a set of data is concentrated around its average, or mean. Imagine I record the temperature in Celsius at Hyde Park Corner in London at the same time each day for the first 10 days of January. My friend then records the temperature at Sydney Harbor in Australia at the same time over the same period. If we compare the temperature recordings in the two locations, we can see that the average temperature in London over that period was five degrees Celsius versus 22 degrees in Sydney.
The London temperatures varied from a high of nine to a low of one, giving us a range of eight. Sydney had a high of 26 and a low of 18, also giving us a range of eight. In which city did the temperature vary more widely? The range of temperature was eight degrees in both cities, but the average temperature was much lower in London. Therefore, relative to the average, the temperature in London varied by a greater amount. Standard deviation develops this idea by considering how each day's recording compares to the average. In our example, the standard deviation is 2.7 degrees in London versus just 2.1 degrees in Sydney.
The relationship between the average and standard deviation is a fundamental concept in data analysis, allowing us to compare different data sets.
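To make the comparison concrete, here is a minimal Python sketch of the same kind of calculation. The readings are hypothetical values invented for illustration and are not the recordings used in the lesson, so the standard deviations it produces will not match the 2.7 and 2.1 quoted above. The `summarize` helper and the `relative` ratio (standard deviation divided by the mean) are also assumptions, offered as one way to express spread "relative to the average".

```python
import statistics

# Hypothetical readings in degrees Celsius, invented for illustration only.
# They are NOT the lesson's data; they were chosen so that both cities
# share the same range while differing in mean and spread.
city_a = [1, 3, 5, 7, 9, 2, 4, 6, 8, 5]
city_b = [18, 26, 22, 22, 22, 21, 23, 22, 22, 22]


def summarize(readings):
    """Return the mean, range, standard deviation and relative spread."""
    mean = statistics.mean(readings)
    value_range = max(readings) - min(readings)
    # statistics.stdev is the sample standard deviation (divides by n - 1);
    # statistics.pstdev would give the population version instead.
    stdev = statistics.stdev(readings)
    return {
        "mean": mean,
        "range": value_range,
        "stdev": stdev,
        "relative": stdev / mean,  # spread expressed relative to the average
    }


summary_a = summarize(city_a)
summary_b = summarize(city_b)

for name, summary in (("City A", summary_a), ("City B", summary_b)):
    print(
        f"{name}: mean={summary['mean']}, range={summary['range']}, "
        f"stdev={summary['stdev']:.2f}, relative={summary['relative']:.2f}"
    )
```

Both hypothetical cities have a range of eight degrees, yet their standard deviations differ, because the standard deviation takes every reading into account rather than only the highest and lowest.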
In the next review lesson, we'll look at normal distribution and z-scores.