forked from saundersg/BYUI_M221_Book_R
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Lesson17.Rmd
565 lines (397 loc) · 28.2 KB
/
Lesson17.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
---
title: "Lesson 17: Inference for One Proportion"
output:
html_document:
theme: cerulean
toc: true
toc_float: false
---
<script type="text/javascript">
function showhide(id) {
var e = document.getElementById(id);
e.style.display = (e.style.display == 'block') ? 'none' : 'block';
}
</script>
<div style="width:50%;float:right;">
#### Optional Videos for this Lesson {.tabset .tabset-pills}
##### Part 1
<iframe id="kaltura_player_1651534456" src="https://cdnapisec.kaltura.com/p/1157612/sp/115761200/embedIframeJs/uiconf_id/47306393/partner_id/1157612?iframeembed=true&playerId=kaltura_player_1651534456&entry_id=1_qpcd5feb" width="480" height="270" allowfullscreen webkitallowfullscreen mozAllowFullScreen allow="autoplay *; fullscreen *; encrypted-media *" frameborder="0"></iframe>
##### Part 2
<iframe id="kaltura_player_1651847110" src="https://cdnapisec.kaltura.com/p/1157612/sp/115761200/embedIframeJs/uiconf_id/47306393/partner_id/1157612?iframeembed=true&playerId=kaltura_player_1651847110&entry_id=1_kf3xiox0" width="480" height="270" allowfullscreen webkitallowfullscreen mozAllowFullScreen allow="autoplay *; fullscreen *; encrypted-media *" frameborder="0"></iframe>
##### Part 3
<iframe id="kaltura_player_1651535955" src="https://cdnapisec.kaltura.com/p/1157612/sp/115761200/embedIframeJs/uiconf_id/47306393/partner_id/1157612?iframeembed=true&playerId=kaltura_player_1651535955&entry_id=1_6v3cp5w5" width="480" height="270" allowfullscreen webkitallowfullscreen mozAllowFullScreen allow="autoplay *; fullscreen *; encrypted-media *" frameborder="0"></iframe>
##### Part 4
<iframe id="kaltura_player_1651536186" src="https://cdnapisec.kaltura.com/p/1157612/sp/115761200/embedIframeJs/uiconf_id/47306393/partner_id/1157612?iframeembed=true&playerId=kaltura_player_1651536186&entry_id=1_dt3o2pa3" width="480" height="270" allowfullscreen webkitallowfullscreen mozAllowFullScreen allow="autoplay *; fullscreen *; encrypted-media *" frameborder="0"></iframe>
</div><div style="clear:both;"></div>
## Lesson Outcomes
By the end of this lesson you should be able to do the following.
**Regarding Confidence Intervals for a single proportion:**
* Calculate and interpret a confidence interval for a single proportion given a confidence level.
* Identify a point estimate and margin of error for the confidence interval.
* Show the appropriate connections between the numerical and graphical summaries that support the confidence interval.
* Check the requirements for the confidence interval.
* Calculate a desired sample size given a level of confidence and margin of error (with or without a prior estimate of the population proportion).
**Regarding Hypothesis Testing for a single proportion:**
* State the null and alternative hypothesis.
* Calculate the test-statistic and p-value of the hypothesis test.
* Assess the statistical significance by comparing the p-value to the α-level.
* Check the requirements for the hypothesis test.
* Show the appropriate connections between the numerical and graphical summaries that support the hypothesis test.
* Draw a correct conclusion for the hypothesis test.
<br>
## Confidence Interval for One Proportion
### Honesty at Medical School
<img src="./Images/Medical_Symbol.png">
Frederick Sierles and his colleagues distributed an anonymous survey to students at two American medical schools. The questionnaire was given during class without any prior announcement to students. The authors of the study personally supervised the distribution and collection of the surveys. 95% of the students completed the survey, and students from all four years of medical school training were represented. A total of 428 individuals participated in the survey. Among this group, 249 people indicated that they had cheated in some way during medical school. The results were published in a journal article in 1980.
We want to use the data from this study to generalize to a larger population. We are not usually interested in the *particular* individuals' responses. The reason the study was conducted is to provide an estimate of the true population proportion, $p$. $\widehat p$ is called a **point estimate** of $p$. The sample proportion, $\widehat p$ is one point on the number line that estimates the value of the true proportion, $p$.
A point estimate like $\widehat p$ is helpful, but it does not give us direct information on *how close* it is to the true parameter, $p$. We use a confidence interval to find a range of plausible values for the parameter.
### Confidence Intervals
<!-- {{:Lesson_17:Content/F2F}} -->
To find a confidence interval for one population proportion, $p$, we follow the same pattern as was done in the estimates for $\mu$ in the lesson titled [Inference for One Mean: Sigma Known (Confidence Interval)](Lesson10.html){target="_blank"}. We start with the point estimate of $p$ and we add and subtract a certain number of standard deviations from this value.
The point estimate for $p$ is $\widehat p$. You might want to review the mean and standard deviation of the random variable $\widehat p$ in the lesson on [Describing Categorical Data: Proportions; Sampling Distribution of a Sample Proportion](Lesson16.html){target="_blank"}. Traditionally, people have used these equations to create confidence intervals for the population proportion.
The formula for the confidence interval for one proportion is:
$$
\left(
\displaystyle {\widehat p - z^* \sqrt{\frac{\widehat p (1-\widehat p)}{n}},
\widehat p + z^* \sqrt{\frac{\widehat p (1-\widehat p)}{n}}}
\right)
$$
$$\text{where }\displaystyle{ \widehat p = \frac{x}{n} }$$.
You can use the normal probability applet to compute $z^*$. Please see the lesson on [Inference for One Mean: Sigma Known (Confidence Interval)](Lesson10.html){target="_blank"} if you need to review this procedure.
Be sure that you do not round any values until the last step. Please perform this entire computation without rounding.
Remember that for a 95% confidence interval, $z^* = 1.96$.
So, the lower bound for the 95% confidence interval for the true proportion $p$ is:
$$
\displaystyle {
\widehat p - z^* \sqrt{\frac{\widehat p (1-\widehat p)}{n}}
= \frac{249}{428} - 1.96 \sqrt{\frac{\frac{249}{428} \left(1-\frac{249}{428}\right)}{428}}
= 0.535 }
$$
The upper bound for the 95% confidence interval for the true proportion $p$ is:
$$
\displaystyle {
\widehat p + z^* \sqrt{\frac{\widehat p (1-\widehat p)}{n}}
= \frac{249}{428} + 1.96 \sqrt{\frac{\frac{249}{428} \left(1-\frac{249}{428}\right)}{428}}
= 0.629 }
$$
The 95% confidence interval for the true proportion of medical students who cheat is: $(0.535, 0.629)$.
To interpret this interval, we say that we are 95% confident that the true proportion of people who cheat in medical school is between 0.535 and 0.629. This represents the range of plausible values for the true proportion of students who cheat at these medical schools.
#### Requirement
Like other procedures, there are requirements that must be checked in order for this confidence interval to be valid.
The confidence intervals are valid whenever $n \widehat p \ge 10$ and $n(1-\widehat p) \ge 10$. Notice that for the data on cheating in medical school, we have $428 * 0.582 = 249$ and $428 * (1-0.582) = 179$ which are both greater than 10, so this requirement is satisfied.
### Using Excel to perform these calculations
<!-- {{Content:Excel/SPSS-Plus-4_CI}} -->
Finding confidence intervals for one proportion using only a calculator is tedious. An Excel spreadsheet has been created to help you quickly and accurately perform these calculations. You will use this spreadsheet throughout this and other lessons.
To download this file, click here: [Math 221 Statistics Toolbox](./Data/Math221StatisticsToolbox.xltx)
<span class="Custom">Click on the link at right</span> for instructions on using this spreadsheet to calculate confidence intervals.
<a href="javascript:showhide('ins')"><span style="font-size:8pt;">Show/Hide Instructions</span></a>
<div id="ins" style="display:none;">
For this example we will consider the "Honesty at Medical School" data above.
- **Step 1**: Open the Math 221 Statistics Toolbox and click on the "One Proportion" tab.
- The yellow boxes indicate the input spaces. The first step, getting an estimate of $\hat{p}$, is accomplished by inputting the value of $x$ and the value of $n$ from the data above. Look at the group of cells labeled "Numerical Summary" and input $x$ into cell D7. Type $n$ in cell C7.
<img src="./Images/Lesson_17_pic_1.PNG">
You may have noticed that after you input the values of $x$ and $n$ into the yellow cells that the values of the other cells changed automatically. Excel performs all of the necessary calculations for you.
<img src="./Images/Lesson_17_pic_4.PNG">
- **Step 2**: Next we must indicate our desired level of confidence in cell I11. By default it is set to 0.95, meaning a 95% confidence level. The 95% confidence interval is given in the cells K14 and L14. For convenience, $\hat{p}$ and the margin of error are also shown in this block of output.
<img src="./Images/Lesson_17_pic_5.PNG">
Compare this confidence interval with the one you calculated by hand. They're the same!
</div>
<br>
### Another Study on Honesty at Medical School
DeWitt C. Baldwin, Jr. and others conducted a larger study to assess how widespread cheating is in medical schools. Elected class officers at 40 schools were invited to distribute a survey to their second-year classmates. Surveys were completed by students from 31 of the 40 schools. Among all students attending the 31 schools, 62% participated in the survey, yielding a total of $n=2426$ surveys. Out of this group, $x=114$ admitted to cheating in medical school. These results were published in *Academic Medicine* in 1996.
<div class="QuestionsHeading">Answer the following questions:</div>
<div class="Questions">
1. Are the requirements for creating a confidence interval satisfied?
<a href="javascript:showhide('Q1')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q1" style="display:none;">
* Yes, $2426*0.047 = 114$ and $2426*(1-0.047) = 2312$, so it is appropriate to use this procedure to estimate the true proportion of students who cheat in medical school.
</div>
<br>
2. What is the value of $\widehat p$ in this study?
<a href="javascript:showhide('Q2')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q2" style="display:none;">
<center>
$$
\displaystyle{\widehat p = \frac{114}{2426} = 0.047}
$$
</center>
</div>
<br>
3. Use Excel to calculate the lower bound for the 95% confidence interval for the true proportion $p$.
<a href="javascript:showhide('Q3')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q3" style="display:none;">
<center>
$$
\displaystyle{\widehat p - z^* \sqrt{\frac{\widehat p (1-\widehat p)}{n}}
= \frac{114}{2426} - 1.96 \sqrt{\frac{\frac{114}{2426} \left(1-\frac{114}{2426}\right)}{2426}}
= 0.039}
$$
</center>
</div>
<br>
4. Use Excel to help you find the 95% confidence interval for the true proportion of medical students who cheat based on the data from this larger study.
<a href="javascript:showhide('Q4')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q4" style="display:none;">
* The upper bound for the 95% confidence interval for the true proportion of students who cheat in medical school is:
$$
\widehat p + z^* \sqrt{\frac{\widehat p (1-\widehat p)}{n}}
= \frac{114}{2426} + 1.96 \sqrt{\frac{\frac{114}{2426} \left(1-\frac{114}{2426}\right)}{2426}}
= 0.055
$$
* So, the 95% confidence interval for the true proportion of students who cheat at medical school is: $(0.039, 0.055)$
</div>
<br>
5. Compare the confidence intervals obtained from the Sierles study to the confidence interval from Baldwin's study. How do the results compare to each other?
<a href="javascript:showhide('Q5')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q5" style="display:none;">
* The first study concluded that the mean proportion of cheaters in medical school is in the range (0.535,0.629), while the second study concluded a much lower range of possible proportions of cheaters (0.039,0.055). It seems quite likely that at least one of the studies is not accurate.
</div>
<br>
6. What are some possible factors that might explain the discrepancy in these two studies?
<a href="javascript:showhide('Q6')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q6" style="display:none;">
* Possible factors could include: Elected class officers giving the surveys may have skewed results, perhaps the 31 schools have less cheating than the original 2, the non-participating schools may have skewed results. Another possibility is that cheating is more prevalent in the later years of medical school,since the second study only examined second-year students.
</div>
<br>
7. How would you feel if you knew that your doctor cheated in medical school?
<a href="javascript:showhide('Q7')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q7" style="display:none;">
* Any thoughtful answer is sufficient, but we would likely not be happy!
</div>
<br>
8. Write a paragraph explaining why it is important to you to be honest in all your dealings with your fellow men--including your academic pursuits. Be sure to include a discussion of your future plans with regard to this issue.
<a href="javascript:showhide('Q8')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q8" style="display:none;">
* Any thoughtful paragraph is sufficient.
</div>
</div>
<br>
## Sample Size Calculations
<br>
<div class="message Note">**Think about it:** What happens to the margin of error in a confidence interval if the sample size is increased?</div>
<br>
<br>
If you can reduce the margin of error by increasing the sample size, then you can achieve a specific margin of error by choosing a large enough sample. So, if you are planning a future study, you can estimate the sample size you need to obtain a desired margin of error, $m$.
The formula for the margin of error is:
$$
m = z^* \sqrt{\frac{\widehat{p} (1- \widehat{p})}{n}}
$$
If we solve this equation for $n$, we get:
$$
n = \left( \frac{z^*}{m} \right)^2 \widehat{p} (1-\widehat{p})
$$
Note that this equation requires us to know the value of $\widehat{p}$. Unless we do a study, we do not know the value of $\widehat{p}$.
Sometimes we have a prior estimate of the true proportion of successes, denoted $p^*$.
**If we have a prior estimate for $\widehat{p}$,** (namely $p^*$,) we can plug this value into the equation above to compute the sample size required to obtain our desired margin of error:
$$
n = \left( \frac{z^*}{m} \right)^2 p^* (1-p^*)
$$
where $z^*$ is determined by your confidence level, $m$ is your desired margin of error, and $p^*$ is an estimate of the true proportion of successes.
**If no prior estimate for $p$ is available**, we can use the following formula to compute our sample size:
$$
n = \left( \frac{z^*}{2m} \right)^2
$$
The latter formula (where no prior estimate for $p$ is available) will result in excessively large sample sizes if $p$ is small (say, less than 0.3) or large (say, greater than 0.7.) Otherwise, the results for the two equations will be fairly similar.
No matter what value you obtain for the sample size, if it is not a whole number **round it up** to the nearest whole number.
### Example
If you want to find the sample size required to get a margin of error of $m=0.03$ with 95% confidence, and previous studies have shown that the true proportion is approximately equal to $p^*=0.82$, then the sample size required would be:
$$
\displaystyle {
n = \left( \frac{z^*}{m} \right)^2 p^* (1-p^*) = \left( \frac{1.96}{0.03} \right)^2 (0.82) (1-0.82) = 630.02
}
$$
We need to round this answer up to the next larger whole number. So, you would need to collect $n=631$ observations to obtain the desired margin of error.
<br>
## Hypothesis Test for One Proportion
<img src="./Images/StepsAll.png">
### Can You Taste PTC?
<img src="./Images/Step1.png">
The ability to taste the chemical Phenylthiocarbamide (PTC) is hereditary. Some people can taste it, while others cannot. The ability to taste PTC is typically assessed using paper test strips. When a PTC test strip is placed on the tongue, it will either taste like regular paper or else have a bitter taste.
<img src="./Images/Step2.png">
It is believed that 70% of all people are able to taste PTC. Data were collected by Elise Johnson to investigate this claim. Volunteers were provided with PTC test strips and asked if they could taste anything besides paper.
<img src="./Images/Step3.png">
Out of the 118 people who participated in the research, 89 indicated that they can taste PTC. The proportion of people in the sample who could taste PTC is
$$
\widehat p = \frac{89}{118} = 0.754
$$
In other words, 75.4% of the people surveyed could taste the chemical.
<img src="./Images/PTC_Pie_Graph_Excel.PNG">
<br>
<div class="message Tip">**Review:** For a review of how to make pie graphs and bar charts in Excel, read [Describing Categorical Data: Proportions; Sampling Distribution of a Sample Proportion](Lesson16.html){target="_blank"}. You can also use the pie graph produced in the Math221 Statistics Toolbox. When doing so, consider altering the labels in the graph to be more descriptive of your particular study.</div>
<br>
<br>
<img src="./Images/Step4.png">
The empirical research suggested that the proportion of people who can taste PTC is $\frac{89}{118} = 0.754$, or 75.4%.
Is this significantly different from the assumed value of 0.70 (i.e., 70%)? We can test this question using a hypothesis test.
If the following conditions are satisfied:
- $np \ge 10$
- $n(1-p) \ge 10$
then the sample size is large enough that the Central Limit Theorem suggests the sample proportion, $\widehat p$, is approximately normal.
Also, the true mean of $\widehat p$ is $p$, and the standard deviation is $\sqrt{\frac{p \cdot (1-p)}{n}}$.
Notice that the requirements are satisfied for the PTC data:
$$
\begin{array}{ll}
np = 118 \cdot 0.70 = 82.6 \ge 10 & \surd \\
n(1-p) = 118 \cdot (1-0.70) = 35.4 \ge 10 & \surd
\end{array}
$$
We can use a procedure that mimics the test for a single mean with $\sigma$ known from the lesson titled [Inference for One Mean: Sigma Known (Hypothesis Test)](Lesson09.html){target="_blank"} to conduct a test for a single proportion.
It is assumed that the true proportion of people who can taste PTC is 0.70. This is the null hypothesis. The alternative hypothesis is that the true proportion is different from 0.70.
$$
\begin{align}
H_0: & p = 0.70 \\
H_a: & p \ne 0.70
\end{align}
$$
We will use the $\alpha=0.05$ level of significance in this test.
If the requirements are satisfied, then $\widehat p$ is approximately normal with mean $p$ and standard deviation $\sqrt{\frac{p \cdot (1-p)}{n}}$. The test can be based on the standard normal ($z$) distribution. The test statistic is:
$$
z = \frac{\textrm{value}-\textrm{mean}}{\textrm{standard deviation}} = \frac{\widehat p - p}{\sqrt{\frac{p(1-p)}{n}}} = \frac{\frac{89}{118} - 0.70}{\sqrt{\frac{0.70(1-0.70)}{118}}} = 1.286
$$
Remember, we assume that the null hypothesis is true, so we use the value given in the null hypothesis for $p$.
Using the NormalApplet, you can find the $P$-value. This is a two-tailed test, since the alternative hypothesis includes both values above 0.70 and below 0.70. In the applet, make sure both tails are shaded, then enter the $z$-score of 1.286.
<img src="./Images/ZShadeBothTails-1-286.png">
The combined area in the two tails is 0.1984, which is greater than $\alpha = 0.05$. We fail to reject the null hypothesis.
<img src="./Images/Step5.png">
We conclude that there is insufficient evidence to suggest that the true proportion of the population that can taste PTC is different from 0.70.
There is no reason to revise existing perspectives on the prevalence of the ability to taste PTC.
<div class="QuestionsHeading">Answer the following question:</div>
<div class="Questions">
14. Compare and contrast the test for one mean with $\sigma$ known and the test for one proportion. Give at least two similarities and two differences.
<a href="javascript:showhide('Q14')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q14" style="display:none;">
* Similarities: There is one population. Both test are based on the $z$ statistic. Both tests require the use of the normal probability applet.
* Differences: The test for means involves quantitative data, the test for a proportion involves categorical data. The formulas for the $z$-score differ.
</div>
</div>
<br>
### Using Excel to perform these calculations
<!-- {{Content:Excel/SPSS-One Proportion HT}} -->
The Math 221 Statistics Toolbox in Excel can also be used to perform hypothesis tests for one proportion.
To download this file, click here: [Math 221 Statistics Toolbox](./Data/Math221StatisticsToolbox.xltx)
<span class="Custom">Click on the link at right</span> for instructions on using this spreadsheet to perform hypothesis testing.
<a href="javascript:showhide('ins2')"><span style="font-size:8pt;">Show/Hide Instructions</span></a>
<div id="ins2" style="display:none;">
For this example we will consider the "PTC" data above.
- **Step 1**: Open the Math 221 Statistics Toolbox and click on the "One Proportion" tab.
- The yellow boxes indicate the input spaces. The first step, getting an estimate of $\hat{p}$, is accomplished by inputting the value of $x$ and the value of $n$ from the data above. Look at the group of cells labeled "Numerical Summary" and input $x$ into cell D7. Type $n$ in cell C7.
<img src="./Images/Lesson_17_pic_6.PNG">
- **Step 2**: Now we need to define the hypotheses for our test. To the right of the "Numerical Summary" cells is a block of cells labeled "Hypothesis Test". We must change the yellow cells in this block of the sheet in order to test the correct null and alternative hypothesis. The value in cell L5 contains the value of $p$ in our null hypothesis. Cell K6 is a drop-down list where we can select the type of alternative hypothesis we wish to test, i.e. "Greater Than", "Less Than", or "Not Equal To."
The results of the hypothesis test are given in the output boxes:
<img src="./Images/Lesson_17_pic_3.PNG">
</div>
<br>
### Water Quality
<img src="./Images/Gorges_du_Verdon_River_from_Bottom_0364.jpg">
<img src="./Images/Step1.png">
Macroinvertebrates are small insects (without an internal skeleton) that live on the bottom of a stream. These insects are ideal for monitoring changes in water quality, because they (1) live nearly all their life in the water, (2) are easy to collect and identify, (3) often live for several years, (4) have a limited ability to migrate, and (5) they are influenced by environmental conditions.
In any population of macroinvertebrates, there will be indicators of good health and indicators of poor health. Data are collected by capturing macroinvertebrates and recording whether they indicate good health or poor health for the river. In particular sections of a small river near Bozeman, Montana, about 60% of the indicators observed have historically been associated with good health.
<img src="./Images/Step2.png">
Researchers suspect that the water quality in the area has decreased, suggesting that less than 60% of the indicators will show good health. A random sample of macroinvertebrates were captured from the river.
<img src="./Images/Step3.png">
Among the $n=40$ observed indicators of health, $x=19$ suggested good health. Use this information to answer the following question.
<div class="QuestionsHeading">Answer the following question:</div>
<div class="Questions">
15. What is the proportion of the observed indicators that suggested good health? Express your answer as a decimal and a percentage.
<a href="javascript:showhide('Q15')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q15" style="display:none;">
<center>
$\displaystyle {\widehat p = \frac{x}{n} = \frac{19}{40} = 0.475~\text{or}~47.5\%}$
</center>
<img src="./Images/Water_quality_pie_graph_Excel.PNG">
</div>
</div>
<br>
<img src="./Images/Step4.png">
The following questions will guide you through the process of conducting a hypothesis test to determine if the water quality has decreased. Use $\alpha=0.05$ for this test.
<div class="QuestionsHeading">Answer the following questions:</div>
<div class="Questions">
16. The two requirements required to conduct a hypothesis test for one proportion are
<center>
$$
\begin{array}{l}
np \ge 10 \\
n(1-p) \ge 10
\end{array}
$$
</center>
* Are these requirements satisfied?
<a href="javascript:showhide('Q16')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q16" style="display:none;">
* Yes, the requirements are satisfied.
<center>
$$
\begin{array}{ll}
np = 40 \cdot 0.6 = 24 \ge 10 & \surd \\
n(1-p) = 40 \cdot (1-0.6) = 16 \ge 10 & \surd
\end{array}
$$
</center>
</div>
<br>
17. The null hypothesis is $H_0: p = 0.6$ What is the alternative hypothesis?
<a href="javascript:showhide('Q17')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q17" style="display:none;">
<center>
$H_a: p < 0.6$
</center>
</div>
<br>
18. Fill in the blanks to compute the $z$-score.<br />
<center>
$$
\displaystyle{
z = \frac{\widehat p - p}{\sqrt{\frac{p(1-p)}{n}}}
= \frac{()-0.60}{\sqrt{\frac{0.60(1-0.60)}{40}}}
= -1.614}
$$
</center>
<a href="javascript:showhide('Q18')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q18" style="display:none;">
* The missing value is: <br />
<center>
$\displaystyle{\widehat p = \frac{19}{40}=0.475}$
</center>
</div>
<br>
19. The $P$-value will be the area under the normal curve to the left of $z$. Why will you only shade the left tail?
<a href="javascript:showhide('Q19')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q19" style="display:none;">
* We want to test if the water quality has *decreased*.
* The alternative hypothesis is that the proportion of healthy indicators is less than 0.6.
</div>
<br>
20. Using the Normal Probability Applet, it is determined that the area to the left of $z=-1.614$ is 0.053.
<img src="./Images/ShadeLeftZ-1-614.png">
The shaded area in this figure (0.053) represents the $P$-value for this test. What is the decision for this test, do we reject the null hypothesis or fail to reject the null hypothesis? Give an English sentence summarizing the conclusion.
<a href="javascript:showhide('Q20')"><span style="font-size:8pt;">Show/Hide Solution</span></a>
<div id="Q20" style="display:none;">
* $P\textrm{-value} = 0.053 > 0.05 = \alpha$.
* We fail to reject the null hypothesis.
* There is insufficient evidence to suggest that the true proportion of indicators that suggest good health is less than 0.6.
</div>
</div>
<br>
<img src="./Images/Step5.png">
Even though the proportion of indicators that suggested good health was less that 60%, it was not statistically significantly less than 60%. Unless future research indicates to the contrary, we cannot say that the water quality in this river has decreased.
<br>
## Summary
<div class="SummaryHeading">Remember...</div>
<div class="Summary">
- The **estimator** of $p$ is $\widehat p$. $\displaystyle{ \widehat p = \frac {x}{n}}$ and is used for both confidence intervals and hypothesis testing.
- You will use the Excel spreadsheet [Math 221 Statistics Toolbox](./Data/Math221StatisticsToolbox.xltx) to perform hypothesis testing and calculate confidence intervals for problems involving one proportion.
- The requirements for a confidence interval are $n \widehat p \ge 10$ and $n(1-\widehat p) \ge 10$. The requirements for hypothesis tests involving one proportion are $np\ge10$ and $n(1-p)\ge10$.
- We can determine the sample size we need to obtain a desired margin of error using the formula $\displaystyle{ n=\left(\frac{z^*}{m}\right)^2 p^*(1-p^*)}$ where $p^*$ is a **prior estimate** of $p$. If no prior estimate is available, the formula $\displaystyle{ \left(\frac{z^*}{2m}\right)^2}$ is used.
<br>
</div>
<br>
## Navigation
<center>
| **Previous Reading** | **This Reading** | **Next Reading** |
| :------------------: | :--------------: | :--------------: |
| [Lesson 16: <br> Describing Categorical Data: Proportions; Sampling Distribution of a Sample Proportion](Lesson16.html) | Lesson 17: <br> Inference for One Proportion | [Lesson 18: <br> Inference for Two Proportions](Lesson18.html) |
</center>