We ended Chapter 9 with a discussion of hypothesis testing for the equality of two variances. If the null hypothesis cannot be rejected, then the assumptions we made about the equality of variances in Chapter 9 are justified. The test statistic used was the F-statistic, which will also be helpful in solving some problems in Chapters 10, 11 and 12.
We started the new Chapter 10 by first looking at a farming problem where the farmer is trying to determine whether she should use a low, medium or high level of fertilizer to maximize the yield. To obtain samples we use an experimental design with one factor.
The hypothesis that all levels of fertilizer are equally effective is tested using Analysis of Variance (ANOVA). We compare the between-group and within-group variabilities, which leads to a one-sided F-test. We found in the example that we can reject the equality hypothesis with a very low p-value. We then looked at the confidence intervals for the differences between the means to see if one particular level of fertilization is better than the others. Chapter 10 ended by solving the same problem using MegaStat and examining the Summary Table, which includes all the numbers we found manually.
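For readers who want to see the between/within comparison outside MegaStat, here is a minimal sketch in Python. The yields are made-up numbers for illustration (not the data from class), and `one_way_anova_f` is a hypothetical helper name.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-factor design."""
    k = len(groups)                       # number of factor levels
    n = sum(len(g) for g in groups)       # total number of observations
    grand_mean = mean(x for g in groups for x in g)
    # Between-group sum of squares: how far each group mean is from the grand mean
    ssb = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean
    ssw = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ssb / df_between) / (ssw / df_within)
    return f_stat, df_between, df_within

# Hypothetical yields for the low, medium and high fertilizer levels
low = [10, 12, 11, 13]
medium = [14, 15, 13, 16]
high = [18, 17, 19, 20]

f_stat, df_b, df_w = one_way_anova_f([low, medium, high])
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

A large F means the group means differ by more than the within-group noise would explain. The p-value is the upper-tail area of the F distribution with these degrees of freedom, which can be read from an F table or from the MegaStat output.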
In Week 11 we will do the course evaluations.
Blogs for Dr. M. Parlar's students in Bus Q600: Applied Statistics for Business (DeGroote School of Business, McMaster University)
Sunday, November 28, 2010
Tuesday, November 23, 2010
Summary for Week 10 (November 16-18)
We completed the discussion of Chapter 8 by considering a bottling problem (two-sided test) and then a problem with unknown population variance (which required us to use the t-distribution).
We started Chapter 9 with a problem involving the weight losses that result from using the Atkins diet vs. the conventional diet. In this chapter we still make use of confidence intervals and hypothesis testing, but for two populations. Therefore, we don't encounter any new theory, but some of the formulas differ slightly from what we saw in Chapters 7 and 8. For example, when the variances are not known, we may assume their equality (which must be tested) and then compute the pooled variance.
The class ended by having two students toss 15 tennis balls into a bucket and testing the hypothesis that their success rates are equal. This material is not going to be on the final exam, but we did it to illustrate the use of MegaStat and to have some fun.
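As a rough sketch of the pooled-variance calculation, the Python below computes the pooled variance and the corresponding t-statistic for the difference of two means. The summary numbers are hypothetical (not the diet data from class), and the function names are my own.

```python
from math import sqrt

def pooled_variance(n1, s1_sq, n2, s2_sq):
    """Weighted average of the two sample variances (weights = degrees of freedom)."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)

def pooled_t_statistic(mean1, mean2, n1, s1_sq, n2, s2_sq):
    """t-statistic for H0: mu1 = mu2, assuming equal population variances."""
    sp_sq = pooled_variance(n1, s1_sq, n2, s2_sq)
    standard_error = sqrt(sp_sq * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error

# Hypothetical summary statistics for two samples
sp_sq = pooled_variance(n1=10, s1_sq=4.0, n2=12, s2_sq=6.0)
t_stat = pooled_t_statistic(5.0, 3.0, n1=10, s1_sq=4.0, n2=12, s2_sq=6.0)
print(f"pooled variance = {sp_sq:.2f}, t = {t_stat:.3f}, df = {10 + 12 - 2}")
```

The t-statistic is compared with a t-distribution with n1 + n2 - 2 degrees of freedom.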
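For the curious, a test of equal success rates can be sketched as a two-proportion z-test. The counts below are hypothetical, not the students' actual results, and with only 15 tosses each the normal approximation is rough at best.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test of H0: p1 = p2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical counts: 9 of 15 successes versus 6 of 15
z, p_value = two_proportion_z_test(9, 15, 6, 15)
print(f"z = {z:.3f}, two-sided p-value = {p_value:.3f}")
```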
We will complete Chapter 9 in Week 11.
Saturday, November 20, 2010
Posting of assignment and exam marks
Dear Bus Q600 students:
As you know, I have been posting your assignment and exam marks on the course web site as a .pdf file after removing your first and last names. The only information that remains on the .pdf file about you is your student number which you should not reveal to others.
If you are not comfortable with this method of reporting your marks, please let me know so I can remove all information from the .pdf file related to your student number and your marks.
Monday, November 15, 2010
Midterm exam results
We have finally sorted out the problems with missing/incomplete student numbers on the scan sheets. The exam marks can be accessed by clicking on this link.
The population mean is μ = 81%, which is very close to the mean of 79% that I reported for a sample of n = 5 exams. The distribution is bimodal, meaning that there is a large number of marks around 80%, but also a bunching around 55%.
I am pleased with these results, but I hope that the students who had low marks will try harder in the final exam.
Sunday, November 14, 2010
Summary for Week 9 (November 9-11)
I started by informing the class that the exams were still not completely marked because a few students entered their student numbers incorrectly on the scan sheets. This was taking us some time to sort out, and the results would be available early in Week 10. However, a random sample of n = 5 exams which were marked manually revealed a sample mean of 79%, with a 95% CI ranging from 70% to 88%.
I then returned to the confidence interval problem for a proportion and talked about the election polls where interviewing roughly 1000 respondents is sufficient to get a 95% CI with a 3.1% margin of error. We found the correct sample size from a formula for n. This completed the discussion of Chapter 7.
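The polling numbers can be checked with a short Python sketch. It assumes the usual worst-case value p = 0.5 and the standard formula n = z² p(1 − p) / E² for the sample size. The function names are my own.

```python
from math import ceil, sqrt
from statistics import NormalDist

def sample_size_for_proportion(margin, confidence=0.95, p=0.5):
    """Smallest n whose CI half-width is at most `margin` (worst case p = 0.5)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z ** 2 * p * (1 - p) / margin ** 2)

def margin_of_error(n, confidence=0.95, p=0.5):
    """Half-width of the CI for a proportion with sample size n."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)

n = sample_size_for_proportion(0.031)
moe = margin_of_error(1000)
print(f"n needed for a 3.1% margin: {n}")
print(f"margin of error with 1000 respondents: {moe:.4f}")
```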
Chapter 8 started with a taste test where a student claimed that he/she can tell the difference between Coke and Pepsi. The null hypothesis was that he/she would just be guessing. See this link for details of the experiment.
Null and alternative hypotheses were discussed in greater detail, followed by a definition and examples of Type I and Type II errors. A one-sided z-test example involving cigarette tar content was presented. I then talked about the very important matter of the p-value and showed that this value was very small in the cigarette example, resulting in rejecting H0 for all α > p. The class ended with an example involving the DSB GMAT scores for 2005 (where the p-value was very large, given the sample mean).
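To make the "reject H0 for all α > p" rule concrete, here is a minimal Python sketch of a one-sided z-test. The numbers are hypothetical and are not the cigarette or GMAT data from class.

```python
from math import sqrt
from statistics import NormalDist

def one_sided_z_test(xbar, mu0, sigma, n, tail="upper"):
    """Return (z, p-value) for a one-sided z-test with known sigma."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf(z)
    p_value = 1 - cdf if tail == "upper" else cdf
    return z, p_value

# Hypothetical data: H0: mu <= 14.0 versus Ha: mu > 14.0
z, p_value = one_sided_z_test(xbar=14.8, mu0=14.0, sigma=2.0, n=50)
print(f"z = {z:.3f}, p-value = {p_value:.4f}")

# H0 is rejected exactly for those significance levels that exceed the p-value
for alpha in (0.10, 0.05, 0.01, 0.001):
    decision = "reject H0" if alpha > p_value else "do not reject H0"
    print(f"alpha = {alpha}: {decision}")
```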
Saturday, November 13, 2010
Taste test experiment with Coke and Pepsi (Week 9)
To motivate the discussion of hypothesis testing, we did a taste test similar to the one that was done with "The Lady Tasting Tea" in Cambridge in the late 1920s. (The idea belongs to the late Professor R. Fisher, the father of modern statistics.)
A student in each class volunteered to be the subject of the test. The students claimed that they can tell the difference between Coke and Pepsi. Similar to the "Tea" test, I brought 8 cups to the class and filled 4 with Coke and 4 with Pepsi. The null hypothesis H0 is that the student can't tell the difference, and that he/she is just guessing. (The students were Melissa, Sean and Bryan in EC01, C01 and C02, resp.)
If the subject is just guessing, there is a 1/70 chance (1.4% probability) that all 8 cups will be identified correctly. So, I would reject H0 if that happens. (This is the p-value.)
Melissa got all 8 correct in the second attempt, Sean got all 8 correct, and Bryan made 1 mistake. So, I rejected my H0 in Melissa's and Sean's cases by stating that I believed they could tell the difference between Coke and Pepsi. But I couldn't reject H0 in the last case because there is a 16/70 probability (about 23%) that a person guessing will make 1 mistake. (These probabilities are the result of using the hypergeometric distribution.)
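The hypergeometric probabilities can be reproduced with a few lines of Python. This sketch assumes the subject knows there are 4 cups of each drink and names exactly 4 cups as Coke, so "1 mistake" means one Coke cup swapped with one Pepsi cup. The function name is hypothetical.

```python
from math import comb

def prob_correct_cokes(k, cups_each=4):
    """P(exactly k of the cups named 'Coke' really are Coke) under pure guessing."""
    total_ways = comb(2 * cups_each, cups_each)      # ways to pick 4 of the 8 cups
    favourable = comb(cups_each, k) * comb(cups_each, cups_each - k)
    return favourable / total_ways

p_all_correct = prob_correct_cokes(4)   # 1/70
p_one_swap = prob_correct_cokes(3)      # 16/70

print(f"P(all 8 cups correct) = {p_all_correct:.4f}")
print(f"P(one Coke/Pepsi swap) = {p_one_swap:.4f}")
```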
Here are some still pictures of the experiment in Section C02 with Bryan doing the tasting:
Tuesday, November 9, 2010
Experiment with the inflatable globe in Bus Q600
In Week 8 we discussed confidence intervals for the population mean μ and the population proportion p. To illustrate the calculation of the point estimate for p and the corresponding confidence interval, the students participated in an experiment where they randomly picked points on an inflatable globe.
We did this 30 times, and in each section where the experiment was performed, we found that water (lakes, oceans) was hit about 21 or 22 times. This resulted in an estimate of about 70%, which is very close to the true proportion of 70.8%.
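As a sketch of the calculation, the Python below computes the point estimate and a 95% confidence interval for the case of 21 water hits in 30 picks, using the normal approximation. The function name is my own.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Point estimate and normal-approximation CI for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - half_width, p_hat + half_width

p_hat, lower, upper = proportion_ci(21, 30)
print(f"estimate = {p_hat:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

With only 30 picks the interval is wide, but it does contain the true proportion of 70.8%.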
Here is the video of this experiment (on YouTube) as it was performed in Section C02 (Thursday class).
Monday, November 8, 2010
Summary for Week 8 (November 2-4)
We continued with the z-based confidence intervals (CI) where σ is known. We completed the discussion of the CI calculation for the physicians' taxable incomes example. The important thing to remember is that the confidence level (e.g., 95%) refers to the procedure: if we repeated the sampling many times, about 95% of the intervals constructed this way would contain the true but unknown mean μ.
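A z-based interval is simple enough to sketch in Python. The sample mean, σ and n below are hypothetical values, not the physicians' income figures from class.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, confidence=0.95):
    """CI for the population mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

# Hypothetical sample: mean 100, known sigma 15, sample size 36
lower, upper = z_confidence_interval(xbar=100.0, sigma=15.0, n=36)
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```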
When σ is not known, then we can't use the z-distribution, but fortunately, we have the t-distribution at our disposal (due to William Gosset, a.k.a. "Student"). This distribution is more variable than z, but when the sample size increases beyond 30, it too is approximated by the normal.
The important question of how to find the best sample size n was discussed in the context of the physicians' taxable incomes. We also looked at the problem when σ is not known. (Take a preliminary sample of size m, and continue!)
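A minimal sketch of the sample size calculation, assuming the standard formula n = (zσ/E)² for a desired margin of error E. The σ and E values are hypothetical, and the function name is my own.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n so that the z-based CI half-width is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)

# Hypothetical values: sigma = $20,000 and a desired margin of $5,000
n = sample_size_for_mean(sigma=20000, margin=5000)
print(f"required sample size: {n}")
```

When σ is not known, the standard deviation of the preliminary sample of size m can be plugged in for `sigma` as an estimate.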
What about the CI for a proportion? This was motivated by looking at the election polling results. Towards the end of the class, we did an experiment using an inflatable globe and found the estimate of the proportion of water (oceans, lakes, etc.) to the total surface area of the globe. In each section the estimated proportion ended up being very close to the true 70.8% after taking only 30 samples. (The students randomly picked a point on the globe and recorded it as either water (W) or land (L). I will post the video I took in Section C02 on YouTube a little later.)
The midterm exam took place this week on November 5, 2010, at 5:00 pm in Great Hall, RJC.
Monday, November 1, 2010