13.5.103 chi squared test with data
Perform a chi-square homogeneity test. Independent simple random samples of residents in three regions gave the data on race shown in the table. At the 1% significance level, do the data provide sufficient evidence to conclude that a difference exists in race distributions among the three regions?
Names of variables
total for each race (the sum of each data column): rTotal
total for each region (the sum of each data row): cTotal
total number of observations: total
original data: dframe1
expected frequencies: dframe2
What are the null and alternative hypotheses?
Since the question asks whether a difference exists in race distributions among the three regions, the correct hypotheses are
\(H_0:\) The racial distribution is the same in each of the three regions.
\(H_a:\) The racial distribution is not the same in each of the three regions.

Find the test statistic, \(\chi^2\).
First we need to get the data from the question. (We can import it from Excel.)
data <- read.csv("https://raw.githubusercontent.com/sileaderwt/MTH1320-UMSL/main/Image%2BData/13.5.103/13.5.103.csv")
data
## X. White Black Other
## 1 East 99 10 7
## 2 Midwest 124 18 10
## 3 West 120 11 20
We store the data in two data frames. The first holds the original data. The second will hold the expected frequencies.
dframe1 = data.frame(data)
dframe2 = data.frame(data)
Conditions to run a chi-square test
All expected frequencies are 1 or greater
At most 20% of the expected frequencies are less than 5
The sample is a simple random sample
The sample is an independent sample
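The two conditions on expected frequencies can be checked numerically. The following is a minimal sketch, not part of the original solution: it retypes the observed counts from the table into a matrix (the names observed, expected, all_at_least_1 and share_below_5 are chosen here for illustration) and uses the usual rule that each expected frequency is the row total times the column total divided by the grand total.

```r
# Observed counts copied from the table in this example
# (rows: regions, columns: races). The object names are illustrative.
observed <- matrix(
  c( 99, 10,  7,
    124, 18, 10,
    120, 11, 20),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("East", "Midwest", "West"),
                  c("White", "Black", "Other"))
)

# Expected count for each cell: (row total * column total) / grand total.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Condition 1: every expected frequency is 1 or greater.
all_at_least_1 <- all(expected >= 1)

# Condition 2: at most 20% of the expected frequencies are less than 5.
share_below_5 <- mean(expected < 5)

all_at_least_1
share_below_5 <= 0.20
```

Both checks should return TRUE for this table. Whether the samples are simple random samples and independent depends on how the data were collected and cannot be checked from the counts.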
First approach: using chisq.test() in R, which Professor Covert introduces in the video lectures. (Recommended)
This approach is cleaner and faster.
First we need to create a new data frame that does not contain the column of names, so that it contains only the counts.
We can use subset() to drop the column of names.
dframe3 = subset(dframe1, select = -c(X.))
dframe3
## White Black Other
## 1 99 10 7
## 2 124 18 10
## 3 120 11 20
To see the difference between dframe1 and dframe3, we can run
dframe1
## X. White Black Other
## 1 East 99 10 7
## 2 Midwest 124 18 10
## 3 West 120 11 20
We run chisq.test()
chisq.test(dframe3, correct=FALSE)
##
## Pearson's Chi-squared test
##
## data: dframe3
## X-squared = 7.2825, df = 4, p-value = 0.1217

Since df = 4, we find the critical value for \(\alpha = 0.01\) by running
alpha = .01
round(qchisq(1 - alpha, 4), 3)
## [1] 13.277

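The chi-square test is right-tailed by nature. Our test statistic does not lie in the rejection region (7.282 < 13.277), so we do not reject the null hypothesis. At the 1% significance level, the data do not provide sufficient evidence to conclude that the race distributions differ among the three regions.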
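The same decision can be reached with the p-value that chisq.test() reported above. A minimal sketch, assuming the test statistic 7.2825 and df = 4 from that output (the names chi_sq and p_value are illustrative):

```r
# Test statistic and degrees of freedom as reported by chisq.test() above.
chi_sq <- 7.2825
df <- 4
alpha <- 0.01

# Right-tailed p-value: area under the chi-square curve to the right of chi_sq.
p_value <- pchisq(chi_sq, df, lower.tail = FALSE)

round(p_value, 4)   # about 0.1217, matching the chisq.test() output
p_value < alpha     # FALSE, so we do not reject the null hypothesis
```

Comparing the p-value with \(\alpha\) and comparing the test statistic with the critical value always lead to the same decision.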

If the question asks about expected frequencies, we can find them by running
chisq.test(dframe3, correct=FALSE)$expected
## White Black Other
## 1 94.95943 10.79714 10.24344
## 2 124.42959 14.14797 13.42243
## 3 123.61098 14.05489 13.33413
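Second approach: using the formula
First we need to find the row totals and column totals. Note that in this code cTotal ends up holding the total for each region and rTotal the total for each race.
cTotal = rep(0, nrow(dframe2))
rTotal = c()
for (i in 2:ncol(dframe2)){
  # add data column i to the running total for each region
  cTotal = cTotal + dframe2[,i]
  # append the total of data column i (one race)
  rTotal = c(rTotal, sum(dframe2[,i]))
}
total = sum(cTotal)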
cTotal
## [1] 116 152 151
rTotal
## [1] 343 39 37
total
## [1] 419
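We can find the test statistic \(\chi^2\) by using the formula \(\chi^2=\sum{\frac{(O-E)^2}{E}}\), where \(O\) is an observed frequency and \(E\) is the corresponding expected frequency. First we compute the expected frequencies and store them in dframe2.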
for (i in 2:ncol(dframe2)){
  # expected frequency = (race total) * (region total) / (grand total)
  dframe2[,i] <- sum(dframe2[,i])*cTotal/total
  print(dframe2[,i])
}
## [1] 94.95943 124.42959 123.61098
## [1] 10.79714 14.14797 14.05489
## [1] 10.24344 13.42243 13.33413
dframe2
## X. White Black Other
## 1 East 94.95943 10.79714 10.24344
## 2 Midwest 124.42959 14.14797 13.42243
## 3 West 123.61098 14.05489 13.33413
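As a check by hand on the first cell of this table, here is the East/White entry worked out from the totals found earlier (region total 116, race total 343, grand total 419), followed by that cell's contribution to \(\chi^2\). The symbols \(R\), \(C\) and \(n\) for the region total, race total and grand total are notation introduced here for illustration.

\[
E = \frac{R \cdot C}{n} = \frac{116 \cdot 343}{419} \approx 94.959,
\qquad
\frac{(O-E)^2}{E} = \frac{(99 - 94.959)^2}{94.959} \approx 0.172
\]

Repeating this for all nine cells and adding the contributions gives the test statistic.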
Find the test statistic, \(\chi^2\).
chi = 0
for (i in 2:ncol(dframe2)){
  # add the (O - E)^2 / E contributions from data column i
  chi = chi + sum((dframe1[,i]-dframe2[,i])^2/dframe2[,i])
}
chi
## [1] 7.282498
Round to three decimal places
round(chi,3)
## [1] 7.282
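The loops above can also be written without explicit iteration. This is a sketch of an alternative, not the method used in this solution: it retypes the observed counts into a matrix (the names observed, expected and chi_sq are illustrative) and applies the same formula to all cells at once.

```r
# Observed counts copied from the table (rows: regions, columns: races).
observed <- matrix(
  c( 99, 10,  7,
    124, 18, 10,
    120, 11, 20),
  nrow = 3, byrow = TRUE
)

# Expected counts: (row total * column total) / grand total for every cell.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-square statistic: sum of (O - E)^2 / E over all cells.
chi_sq <- sum((observed - expected)^2 / expected)

round(chi_sq, 3)
```

The result should agree with both the loop above and the X-squared value reported by chisq.test().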
Find the critical value, \(\chi_{\alpha}^2\).
We find the degrees of freedom by using the formula \(df=(r-1)(c-1)\), where \(r\) is the number of rows and \(c\) is the number of columns of counts.
Since dframe2 has 4 variables and the first one, X., holds the region names, ncol(dframe2) returns 1 more than the number of data columns. We find the degrees of freedom by running
df = (ncol(dframe2)-2) * (nrow(dframe2)-1)
df
## [1] 4
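The same value follows by hand from the table, which has \(r = 3\) regions and \(c = 3\) race categories:

\[
df = (r-1)(c-1) = (3-1)(3-1) = 4
\]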
Our significance level is 1%, so \(\alpha = 0.01\)
alpha = .01
round(qchisq(1 - alpha, df), 3)
## [1] 13.277
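The chi-square test is right-tailed by nature. Our test statistic does not lie in the rejection region (7.282 < 13.277), so we do not reject the null hypothesis. At the 1% significance level, the data do not provide sufficient evidence to conclude that the race distributions differ among the three regions.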
Hope that helps!