8. X-Y Scatter Plots and Trendlines
Scatter plots are the ideal option when you want to display a full dataset on a chart and identify trends within the dataset. PowerPoint's internal tools also help you create regression trend lines and equations for your data.
X-Y scatter plots
- Typically used to display experimental data
- Perfect for finding patterns in data or clustering data
- Trendlines apply a regression line to your scatter plot
- Select a point, right-click and add a trendline
- R2 measures the accuracy of our trendline equation
- Typically more than one variable is needed to get a R2 >0.9
X Y scatter plots tend to be used a lot in academia, primarily to show the results of experiments and to try and deduce a pattern from these results.
They're also used a lot in engineering and operations, when measuring metrics such as Mean Time between Failures.
For business related tasks, I find that I use the X Y scatter plot a lot less than the Bar chart or Line chart, but it is very useful for one particular task, and that is when I have a lot of data points and I want to deduce a pattern in the data, or group the data into different segments.
To show the X Y scatter plot in action, let's take the sample case of an insurance company that has 100 customers who pay an annual premium every year.
I'd like to see if a relationship exists between the annual premium and the age of the individual customers.
So I'll start by creating the X Y scatter plot, and off camera, I'll paste in my data with the age being on the X axis, and the premium amount on the Y axis.
I'll now minimize the Excel sheet and begin to format our chart.
I'll start by extending its width, I'll remove the legend which doesn't play a role here, and I'll also format the Y axis, so right click and Format Axis.
I'll go to Number and remove the Decimal places, which I don't really need here.
I'll also remove the horizontal lines, and off camera, I'll add labels to the X and Y axis using text boxes.
Now let's take a look at some of the insights from our data.
The insurance company in question only provides car insurance, and in the top left hand corner, we can see that young drivers pay a very large premium for this insurance.
As drivers get older, the premiums decrease, but also the premium begins to vary a lot more within this age group.
This could be due to the different types of car that people drive and the number of accidents that they've had in previous years.
From the scatter plot, we can discern that young people pay more for their car insurance than older people.
But what is the nature of the relationship between age and the insurance premium, and can we quantify it? Well here's where the X Y scatter comes into its own, as it has some additional tools for answering these questions.
We'll start by clicking on one of the points, right clicking and Add Trendline.
A trendline will try to find a regression line to quantify the relationship between age and the insurance premium.
From viewing our data, I think the relationship is likely to be Linear, so I'll leave the regression type at this option.
I'll also select Display Equation and Display R-squared Value, which I'll explain shortly.
I'll also change the line color to red, and a thickness of say two points to make it easier to read.
I'll then close, and we can see that a trendline has been added to our chart.
I'll be sure to move the equation and the R-squared value up to the top right so that it's readable.
The R-squared value determines how accurate our regression line and our equation are.
If R-squared was equal to one, it would mean that our equation would perfectly predict the premium amount for a given age.
With our current equation, we can now calculate the premium amount Y for a given age X, but the equation will contain some errors.
To improve the accuracy of this equation and increase R-squared to say above 0.9, we might need to add other variables to the equation, such as location, car size and driving history.
If you're interested in learning more about regression, be sure to check out my upcoming course on Statistics in Excel.
For now, I'll wrap up this slide by adding a slide title and a chart title off camera.
As an exercise, try to remove the negative values from the Y axis and also start the X axis at 10 instead of zero.
This should serve to improve the formatting of your chart.
I'll leave my answer in the after file below the video.