Regression

Sunday, 18 August 2019

8:42 PM

10.6 Adequacy of the regression model 
Variability decomposition 
As with the notation on Slide 11, we can define

$$ S_{yy} = \sum_{i=1}^{n} (Y_i - \bar{Y})^2; $$

this measures the total amount of variability in the response values, and is sometimes denoted sst (for 'total sum of squares').
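As a small illustration, S_yy (= sst) can be computed directly from its definition. This is a plain-Python sketch; the function name `total_sum_of_squares` and the sample values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def total_sum_of_squares(y):
    """Return S_yy = sum over i of (y_i - ybar)^2, i.e. sst."""
    n = len(y)
    ybar = sum(y) / n                          # sample mean of the responses
    return sum((yi - ybar) ** 2 for yi in y)   # squared deviations from the mean

# Hypothetical responses: the mean is 4, so the deviations are -2, 0, 2
print(total_sum_of_squares([2.0, 4.0, 6.0]))  # 8.0
```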
Now, this variability in the observed values Y_i arises from two sources:
• because the x_i values are different, the Y_i have different means. This variability is quantified by the 'regression sum of squares':
  $$ \mathrm{ssr} = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 $$
• each value Y_i has variance σ² around its mean. This variability is quantified by the 'error sum of squares':
  $$ \mathrm{sse} = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 $$
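The two sums of squares can be computed once the least-squares line has been fitted. This sketch assumes the usual least-squares estimates b1 = Sxy/Sxx and b0 = ȳ − b1·x̄ (assumed to match the notation of the earlier slides); the function name `sums_of_squares` and the data are hypothetical.

```python
def sums_of_squares(x, y):
    """Fit y = b0 + b1*x by least squares and return (sst, ssr, sse)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                          # estimated slope
    b0 = ybar - b1 * xbar                   # estimated intercept
    fitted = [b0 + b1 * xi for xi in x]     # fitted values hat{y}_i
    sst = sum((yi - ybar) ** 2 for yi in y)
    ssr = sum((fi - ybar) ** 2 for fi in fitted)            # spread of the fitted means
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # spread around the line
    return sst, ssr, sse

# Hypothetical data; prints roughly (10.0, 9.8, 0.2), up to floating-point rounding
print(sums_of_squares([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 5.0, 6.0]))
```

For these data ssr is large compared with sse, since the points lie close to the fitted line.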
For the least-squares fitted line, we can always write:

$$ \mathrm{sst} = \mathrm{ssr} + \mathrm{sse} $$

MATH2099/2859 (Statistics)
Dr Jia Deng
Term 2019 - Lecture 9
44/50
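One way to see why the decomposition holds: write each deviation Y_i − Ȳ as (Ŷ_i − Ȳ) + (Y_i − Ŷ_i) and expand the square. The derivation below assumes the fitted line is the least-squares line Ŷ_i = β̂_0 + β̂_1 x_i with β̂_0 = Ȳ − β̂_1 x̄ (this notation is an assumption, not taken from the slide), whose residuals e_i = Y_i − Ŷ_i satisfy Σ e_i = 0 and Σ x_i e_i = 0 (the normal equations).

```latex
\begin{aligned}
\mathrm{sst} &= \sum_{i=1}^{n} (Y_i - \bar{Y})^2
  = \sum_{i=1}^{n} \big[(\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)\big]^2 \\
&= \mathrm{ssr} + \mathrm{sse} + 2 \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})\, e_i, \\
\sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})\, e_i
  &= \hat{\beta}_1 \sum_{i=1}^{n} (x_i - \bar{x})\, e_i
  = \hat{\beta}_1 \Big( \sum_{i=1}^{n} x_i e_i - \bar{x} \sum_{i=1}^{n} e_i \Big) = 0.
\end{aligned}
```

The cross term vanishes, which leaves sst = ssr + sse.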

 

Coefficient of determination 
Suppose sst ≈ ssr and sse ≈ 0: the variability in the responses due to the effect of the predictor is almost the total variability in the responses
→ all the dots are very close to the fitted straight line, so the predictions are very accurate: the linear regression model fits the data very well.
Now suppose sst ≈ sse and ssr ≈ 0: almost the whole variation in the responses is due to the error terms
→ the dots are very far away from the fitted straight line, so the predictions are very imprecise: the regression model is useless.
⇒ Comparing ssr to sst allows us to judge the adequacy of the model.
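The two extreme situations above can be reproduced with small, deliberately constructed data sets. In this sketch (plain Python, hypothetical data, and a hypothetical helper `decompose` that fits the least-squares line), points lying exactly on y = 2 + 3x give sse = 0 and sst = ssr, while responses arranged so that the fitted slope is exactly zero give ssr = 0 and sst = sse.

```python
def decompose(x, y):
    """Least-squares fit of y on x; return (sst, ssr, sse)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                 # least-squares slope
    b0 = ybar - b1 * xbar          # least-squares intercept
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - ybar) ** 2 for yi in y)
    ssr = sum((fi - ybar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return sst, ssr, sse

x = [1, 2, 3, 4, 5]
# Case 1: every point lies on y = 2 + 3x, so sse = 0 and sst = ssr
print(decompose(x, [2 + 3 * xi for xi in x]))  # (90.0, 90.0, 0.0)
# Case 2: the fitted slope is 0, so every fitted value equals ybar: ssr = 0 and sst = sse
print(decompose(x, [1, 3, 2, 3, 1]))           # (4.0, 0.0, 4.0)
```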
The quantity r², called the coefficient of determination, defined as

$$ r^2 = \frac{\mathrm{ssr}}{\mathrm{sst}}, $$

represents the proportion of the variability in the responses that is explained by the predictor, and hence taken into account in the model.
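As a sketch (plain Python; the helper name `r_squared` and the data are hypothetical), r² can be computed from the least-squares fit as ssr/sst. Because sst = ssr + sse for the least-squares line, the same number can also be obtained as 1 − sse/sst; the function returns both so they can be compared.

```python
def r_squared(x, y):
    """Return (ssr/sst, 1 - sse/sst) for the least-squares line of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                        # least-squares slope
    b0 = ybar - b1 * xbar                 # least-squares intercept
    fitted = [b0 + b1 * xi for xi in x]
    sst = sum((yi - ybar) ** 2 for yi in y)
    ssr = sum((fi - ybar) ** 2 for fi in fitted)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return ssr / sst, 1 - sse / sst

# Hypothetical data with a strong linear trend: both entries are about 0.98
print(r_squared([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 5.0, 6.0]))
```

Both entries agree up to floating-point rounding, i.e. about 98% of the variability in these responses is explained by the predictor.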

 

 

WE WANT r² TO BE CLOSE TO 1

 
