14.3.100 SST, SSR, and SSE
Following are chest size and weight data for 8 randomly selected bears. Here, x denotes chest size, in inches, and y denotes weight, in pounds. Use the information to do parts (a) through (d).
\(\sum x = 396, \sum y = 2362, \sum xy = 118330, \sum x^2 = 19870\)
(a) Compute SST, SSR, and SSE using the computing formulas.
First we need to get the data from the question. (We can import it from Excel)
x <- c(55, 44, 46, 58, 40, 54, 52, 47)
y <- c(325, 253, 263, 340, 251, 315, 310, 305)
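Before going further, it can be worth confirming that the vectors were typed in correctly by reproducing the sums quoted in the question. The following is a minimal sketch; the vector name `sums` and its element names are illustrative and are not part of the original solution.

```r
# Data copied from the question
x <- c(55, 44, 46, 58, 40, 54, 52, 47)          # chest size (inches)
y <- c(325, 253, 263, 340, 251, 315, 310, 305)  # weight (pounds)

# Collect the summary sums in a named vector (names are illustrative)
sums <- c(n      = length(x),
          sum_x  = sum(x),
          sum_y  = sum(y),
          sum_xy = sum(x * y),
          sum_x2 = sum(x^2))
print(sums)
```

If the printed values agree with \(\sum x = 396, \sum y = 2362, \sum xy = 118330, \sum x^2 = 19870\), the data were entered as given.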
From formula sheet
\(S_{xx}=\sum(x_i-\bar{x})^2=\sum x_i^2-(\sum x_i)^2/n\)
\(S_{xy}=\sum(x_i-\bar{x})(y_i-\bar{y})=\sum x_iy_i-(\sum x_i)(\sum y_i)/n\)
\(S_{yy}=\sum(y_i-\bar{y})^2=\sum y_i^2-(\sum y_i)^2/n\)
Total sum of squares: \(SST =\sum(y_i-\bar{y})^2 = S_{yy}\)
Regression sum of squares: \(SSR=\sum(\hat{y_i}-\bar{y})^2=S_{xy}^2/S_{xx}\)
Error sum of squares: \(SSE=\sum (y_i-\hat{y_i})^2=S_{yy} - S_{xy}^2/S_{xx}\)
Regression identity: \(SST = SSR + SSE\)
Coefficient of determination: \(r^2=\frac{SSR}{SST}\)
Linear correlation coefficient: \(r=\frac{\frac{1}{n-1}\sum(x_i-\bar{x})(y_i-\bar{y})}{s_xs_y}\) or \(r=\frac{S_{xy}}{\sqrt{S_{xx}S_{yy}}}\)
Names of variables
\(S_{xx}: Sxx\)
\(S_{xy}: Sxy\)
\(S_{yy}: Syy\)
First approach: find SST, SSR, and SSE without computing \(\bar x, \bar y\)
n = length(x)
We have the same dataset from the question.
Find \(S_{xy}, S_{xx}, S_{yy}\)
Syy = sum(y*y) - sum(y)^2/n
Sxx = sum(x*x) - sum(x)^2/n
Sxy = sum(x*y) - sum(x) * sum(y) /n
Sxx
## [1] 268
Sxy
## [1] 1411
Syy
## [1] 8373.5
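As a check by hand, the same values follow from substituting the sums into the computing formulas with \(n = 8\). Note that \(\sum y^2 = 705754\) is not listed in the question; it is computed here from the data.
\(S_{xx} = 19870 - \frac{396^2}{8} = 19870 - 19602 = 268\)
\(S_{xy} = 118330 - \frac{(396)(2362)}{8} = 118330 - 116919 = 1411\)
\(S_{yy} = 705754 - \frac{2362^2}{8} = 705754 - 697380.5 = 8373.5\)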
Find SST
SST = Syy
SST
## [1] 8373.5
Round to two decimal places
round(SST,2)
## [1] 8373.5

Find SSR
SSR = Sxy^2/Sxx
SSR
## [1] 7428.81
Round to two decimal places
round(SSR,2)
## [1] 7428.81

Find SSE
SSE = Syy - Sxy^2/Sxx
SSE
## [1] 944.6903
Round to two decimal places
round(SSE,2)
## [1] 944.69

Check if SSE + SSR = SST
SSE + SSR == SST
## [1] TRUE
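The exact comparison with `==` happens to return TRUE here, but with floating-point arithmetic it can fail on other data even when the identity holds mathematically. A more robust check, sketched below, compares within a small tolerance using `all.equal()`. The block repeats the definitions above so that it runs on its own.

```r
# Data from the question
x <- c(55, 44, 46, 58, 40, 54, 52, 47)
y <- c(325, 253, 263, 340, 251, 315, 310, 305)
n <- length(x)

# Computing formulas from the formula sheet
Sxx <- sum(x^2) - sum(x)^2 / n
Sxy <- sum(x * y) - sum(x) * sum(y) / n
Syy <- sum(y^2) - sum(y)^2 / n

SST <- Syy
SSR <- Sxy^2 / Sxx
SSE <- Syy - Sxy^2 / Sxx

# all.equal() compares within a numerical tolerance;
# isTRUE() turns its result into a plain TRUE/FALSE
identity_holds <- isTRUE(all.equal(SSE + SSR, SST))
identity_holds
```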
Second approach: find SST, SSR, and SSE using \(\bar x, \bar y\)
Sxx = sum((x-mean(x))^2)
Sxy = sum((x-mean(x))*(y-mean(y)))
Syy = sum((y-mean(y))^2)
Sxx
## [1] 268
Sxy
## [1] 1411
Syy
## [1] 8373.5
(b) Compute the coefficient of determination, \(r^2\)
First approach: using the formula
Linear correlation coefficient
r = Sxy/sqrt(Sxx*Syy)
r
## [1] 0.9419028
Coefficient of determination \(r^2\)
r^2
## [1] 0.887181
Round to four decimal places
round(r^2, 4)
## [1] 0.8872
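The formula sheet also defines \(r^2=\frac{SSR}{SST}\), so the same value can be obtained directly from the sums of squares in part (a). The following is a small sketch of that route; the name `r2` is illustrative, and the data are repeated so the block runs on its own.

```r
# Data from the question
x <- c(55, 44, 46, 58, 40, 54, 52, 47)
y <- c(325, 253, 263, 340, 251, 315, 310, 305)
n <- length(x)

# Sums of squares via the computing formulas
Sxx <- sum(x^2) - sum(x)^2 / n
Sxy <- sum(x * y) - sum(x) * sum(y) / n
SST <- sum(y^2) - sum(y)^2 / n
SSR <- Sxy^2 / Sxx

# Coefficient of determination as the ratio SSR / SST
r2 <- SSR / SST
round(r2, 4)
```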

Second approach: using summary() in R
Multiple R-squared = coefficient of determination
summary(lm(y~x))
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -13.8228  -6.2799   0.3955   2.6325  22.9123
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  34.6362    38.1993   0.907 0.399502
## x             5.2649     0.7665   6.869 0.000469 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 12.55 on 6 degrees of freedom
## Multiple R-squared: 0.8872, Adjusted R-squared: 0.8684
## F-statistic: 47.18 on 1 and 6 DF, p-value: 0.0004691
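The fitted model can also supply the sums of squares from part (a). As a sketch, `anova()` on an `lm` fit returns a table whose "Sum Sq" column holds the regression sum of squares (row `x`) and the error sum of squares (row `Residuals`), and `summary()` stores the coefficient of determination in `r.squared`. The names `fit` and `ss` are illustrative.

```r
# Data from the question
x <- c(55, 44, 46, 58, 40, 54, 52, 47)
y <- c(325, 253, 263, 340, 251, 315, 310, 305)

fit <- lm(y ~ x)

# "Sum Sq" column of the ANOVA table: first entry is for x (SSR),
# second entry is for Residuals (SSE)
ss  <- anova(fit)[["Sum Sq"]]
SSR <- ss[1]
SSE <- ss[2]
SST <- SSR + SSE   # regression identity

round(c(SST = SST, SSR = SSR, SSE = SSE), 2)

# Coefficient of determination stored in the summary object
r2 <- summary(fit)$r.squared
round(r2, 4)
```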
Third approach: using cor() in R
To find r, we run
cor(x,y)
## [1] 0.9419028
To find \(r^2\), we run
cor(x,y)^2
## [1] 0.887181
Round to 4 decimal places
round(cor(x,y)^2, 4)
## [1] 0.8872
(c) Determine the percentage of variation in the observed values of the response variable explained by the regression, and interpret your answer.
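Express \(r^2\) as a percentage
round(cor(x,y)^2, 4) * 100
## [1] 88.72
Interpretation: about 88.72% of the variation in the observed bear weights is explained by the regression on chest size.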

(d) State how useful the regression equation appears to be for making predictions. Choose the correct answer below.
Since \(r^2 \approx 0.8872\) is close to 1, the regression equation appears to be very useful for making predictions.
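To illustrate what making a prediction looks like, the sketch below uses `predict()` on the fitted line. The chest size of 50 inches is a hypothetical value chosen for illustration (it lies inside the observed range of 40 to 58 inches) and does not come from the question; the names `fit`, `new_bear`, and `pred` are also illustrative.

```r
# Data from the question
x <- c(55, 44, 46, 58, 40, 54, 52, 47)
y <- c(325, 253, 263, 340, 251, 315, 310, 305)

fit <- lm(y ~ x)

# Hypothetical bear with a 50-inch chest (illustrative value only)
new_bear <- data.frame(x = 50)

# Predicted weight in pounds from the fitted regression line
pred <- unname(predict(fit, newdata = new_bear))
round(pred, 1)
```

The prediction is only as trustworthy as the fit, and it should not be extended to chest sizes far outside the observed range.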

Hope that helps!