You’ve built a scatter plot and dragged a trend line onto it. Now what? What does your trend line tell you? To answer this question, we need to look at the description of the model, which we can access either by hovering over the trend line or by right-clicking on it and choosing ‘Describe Trend Line’ or ‘Describe Trend Model’. These options provide different levels of detail, with ‘Describe Trend Model’ being the most detailed and showing the entire regression table.

 

 

Here, amongst other information, you can find the exact trend line formula, the p-values and the R-squared, which is a measure of how well the trend line fits the data.

R-Squared

R-Squared (goodness-of-fit) is a measure of how well the data fits the linear model. The coefficient of determination, R-squared, is used to analyse how differences in one variable can be explained by a difference in a second variable.

More specifically, R-squared gives you the percentage of the variation in y that is explained by the x-variables. The range is 0 to 1 (i.e. 0% to 100% of the variation in y can be explained by the x-variables).

0% represents a model that does not explain any of the variance in the response variable around its mean.

100% represents a model that explains all of the variation in the response variable around its mean.

Usually, the larger the R-squared, the better the regression model fits our observations. As a rule of thumb, aim for 80% or higher, although what counts as a good value depends on the kind of data you are working with.
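To see where the R-squared number comes from, here is a minimal Python sketch that fits a straight line by least squares and then computes R-squared as the share of the variation in y that the line explains. The data and the function names (fit_line, r_squared) are made up for illustration; this is not how Tableau is implemented, only the standard textbook calculation for a linear trend line.

```python
# Hypothetical toy data, invented purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]


def fit_line(xs, ys):
    """Least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys):
    """Share of the variation in y around its mean explained by the line."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    # Variation left over after fitting the line (residuals).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    # Total variation of y around its mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


slope, intercept = fit_line(xs, ys)
r2 = r_squared(xs, ys)
print(f"y = {slope:.2f} * x + {intercept:.2f}, R-squared = {r2:.3f}")
```

If every point sat exactly on the line, the leftover variation would be zero and R-squared would be 1. If the line explained nothing, R-squared would be 0.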

 

P-Value

A p-value below 5% typically indicates strong evidence against the null hypothesis (here, that there is no relationship between the variables), so we can reject the null hypothesis. A large p-value, usually above 5%, indicates weak evidence against the null hypothesis.

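In other words, the p-value tells us how likely we would be to see a trend at least this strong purely by chance if there were no real relationship between the variables. The smaller the p-value, the more confident we can be that the trend is not just an accident of this particular sample.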
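One way to build intuition for "evidence against the null hypothesis" is a permutation test: shuffle the y-values many times so that any real link to x is broken, and count how often the shuffled data shows a correlation at least as strong as the real one. The sketch below is only an illustration on made-up data, with hypothetical function names. It is a different technique from the one behind the p-value in Tableau's regression table, although both answer the same kind of question.

```python
import random


def correlation(xs, ys):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def permutation_p_value(xs, ys, n_shuffles=5000, seed=0):
    """Estimate how often chance alone gives a correlation this strong."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    observed = abs(correlation(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any real link between x and y
        if abs(correlation(xs, shuffled)) >= observed - 1e-12:
            hits += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (hits + 1) / (n_shuffles + 1)


# Hypothetical toy data, invented purely for illustration.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
strong = [2.1, 3.9, 6.2, 8.0, 9.7, 12.3, 13.8, 16.1, 18.2, 19.9]
weak = [5, 1, 4, 2, 3, 5, 1, 4, 2, 3]

p_strong = permutation_p_value(xs, strong)
p_weak = permutation_p_value(xs, weak)
print(f"strong trend: p = {p_strong:.4f}")
print(f"weak trend:   p = {p_weak:.4f}")
```

The strongly trending data should give a p-value far below 5%, because shuffling almost never reproduces such a clean line, while the weak data should give a large one.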

 

 

This scatter plot shows the relationship between the percentage of people with a BA or higher education and obesity.

 

 

The R-squared for this model is 0.45, which means that 45% of the variation in average obesity is explained by variation in the percentage of people with a BA or higher education.

The p-value for this model is < 0.0001, which indicates very strong evidence against the null hypothesis. This means that the link between average obesity and higher education is statistically significant: a trend this strong is very unlikely to be a chance result of this particular sample.

For further reading you could refer to Nai Louza’s (Data School UK) blog or this article on Tableau.

 

Author: Shuchita Sharma