13.4.75 expected frequencies
The contingency table shown to the right gives a cross-classification of a random sample of values for two variables, x and y, of a population.
First, we need to get the data from the question. (Here we read it from a CSV file; it can also be imported from Excel.)
data <- read.csv("https://raw.githubusercontent.com/sileaderwt/MTH1320-UMSL/main/Image%2BData/13.4.75/13.4.75.csv")
data
## X A B
## 1 a 50 20
## 2 b 10 40
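If the CSV file cannot be reached, the same table can be typed in by hand. This is only a sketch: the name data_offline is made up for illustration, and the values are copied from the printed output above.

```r
# Hypothetical offline copy of the table; values copied from the output above
data_offline <- data.frame(
  X = c("a", "b"),   # row labels
  A = c(50, 10),     # counts in column A
  B = c(20, 40)      # counts in column B
)
data_offline
```

Assigning data_offline to data lets the rest of the steps run unchanged.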
We store the data in dframe
dframe = data.frame(data)
dframe
## X A B
## 1 a 50 20
## 2 b 10 40
Conditions to run a chi-square test
All expected frequencies are 1 or greater
At most 20% of the expected frequencies are less than 5
The sample is a simple random sample
The sample is an independent sample
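The two conditions on expected frequencies can be checked in R. The sketch below assumes the observed counts are typed in as a matrix (the names observed, all_at_least_1 and share_below_5 are illustrative). The two sampling conditions depend on how the data were collected and cannot be checked from the table.

```r
# Observed counts from the table (rows a, b; columns A, B)
observed <- matrix(c(50, 10, 20, 40), nrow = 2,
                   dimnames = list(c("a", "b"), c("A", "B")))

# Expected frequencies under independence
expected <- chisq.test(observed, correct = FALSE)$expected

# Condition 1: every expected frequency is 1 or greater
all_at_least_1 <- all(expected >= 1)

# Condition 2: at most 20% of the expected frequencies are less than 5
share_below_5 <- mean(expected < 5)
at_most_20_percent <- share_below_5 <= 0.20

all_at_least_1
at_most_20_percent
```

If either value is FALSE, the chi-square approximation may not be reliable for the table.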
(a). Compute the expected frequencies and add them into the table given below.
First, we need to create a new data frame that does not contain the column of row names, so that it only contains the counts.
We can use subset to drop the name column X.
dframe2 = subset(dframe, select = -c(X))
dframe2
## A B
## 1 50 20
## 2 10 40
To see the difference between dframe and dframe2, we can run
dframe
## X A B
## 1 a 50 20
## 2 b 10 40
To find expected frequencies, we can run
chisq.test(dframe2, correct=FALSE)$expected
## A B
## 1 35 35
## 2 25 25
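To see where these numbers come from, each expected frequency is (row total × column total) / sample size. A minimal sketch of that calculation, assuming the counts are entered by hand as a matrix (the variable names are illustrative):

```r
# Observed counts from the table (rows a, b; columns A, B)
observed <- matrix(c(50, 10, 20, 40), nrow = 2,
                   dimnames = list(c("a", "b"), c("A", "B")))

row_totals <- rowSums(observed)   # totals for rows a and b
col_totals <- colSums(observed)   # totals for columns A and B
n <- sum(observed)                # sample size

# Expected frequency for each cell: row total * column total / n
expected_manual <- outer(row_totals, col_totals) / n
expected_manual
```

The result should match the $expected table from chisq.test().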

(b). The test statistic is
We run chisq.test()
chisq.test(dframe2, correct=FALSE)
##
## Pearson's Chi-squared test
##
## data: dframe2
## X-squared = 30.857, df = 1, p-value = 2.777e-08
round(chisq.test(dframe2, correct=FALSE)$statistic,2)
## X-squared
## 30.86
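The same statistic can be computed directly from the definition, the sum of (observed - expected)^2 / expected over all cells. The sketch below assumes the counts are entered by hand; the variable names are illustrative.

```r
# Observed counts from the table (rows a, b; columns A, B)
observed <- matrix(c(50, 10, 20, 40), nrow = 2,
                   dimnames = list(c("a", "b"), c("A", "B")))

# Expected frequencies: row total * column total / sample size
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-square statistic: sum over all cells of (O - E)^2 / E
chi_sq <- sum((observed - expected)^2 / expected)
round(chi_sq, 2)
```

This should agree with the X-squared value reported by chisq.test() with correct=FALSE.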

(c). What are the null and alternative hypotheses? Since the chi-square test checks whether two variables are independent, the correct hypotheses are
\(H_0:\) The two variables, x and y, are not associated.
\(H_a:\) The two variables, x and y, are associated.

Since df = 1, we can find the critical value for \(\alpha = 0.05\) by running
alpha = .05
round(qchisq(1 - alpha, 1), 3)
## [1] 3.841

Since the \(\chi^2\) test is right-tailed by nature and our test statistic lies in the rejection region (30.86 > 3.841), we have enough evidence to reject the null hypothesis and conclude that x and y are associated.
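The same decision can be reached with the p-value instead of the critical value. This is a sketch that assumes the statistic 30.857 from the chisq.test() output and a 2 x 2 table, so df = (2 - 1)(2 - 1) = 1; the variable names are illustrative.

```r
chi_sq <- 30.857            # test statistic from the chisq.test() output
df <- (2 - 1) * (2 - 1)     # (rows - 1) * (columns - 1)
alpha <- 0.05               # significance level

# Right-tailed p-value: area to the right of the test statistic
p_value <- pchisq(chi_sq, df, lower.tail = FALSE)
p_value

# TRUE means we reject the null hypothesis at level alpha
p_value < alpha
```

The p-value should agree with the one printed by chisq.test(), and it is far below 0.05.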
Hope that helps!