Stat 251: Statistical Methods I
Fall 2008
Goal: To collect, describe, and analyze data using the methods of Chapter 3. First think of
a population of interest,
a categorical variable you would like to measure for that population, and
a conjecture you have about the population proportion of successes.
(e.g., more than 90% of Hollins students own a cell phone, more than 1/3 of TV commercials during the Super Bowl last longer than 30 seconds, less than 75% of Salem drivers come to a complete stop at intersections next to campus (Roanoke College), a majority of college students can distinguish between the taste of Coke and Pepsi, more than half of a random sample of products are more expensive at Kroger than at Food Lion, ...).
Your task will be to select a random sample from the population and record the categorical variable for every member of the sample. The sample size should be at least 30 and the population should be at least 20 times the size of your sample. The type of study can be an experiment, an observational study, or a survey. The key requirement will be that you randomly select the observational units from the larger population. (Note: the sample does not have to consist of humans. You should be very careful in how you define your population.)
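One way to carry out the random selection is sketched below in Python, assuming you have a complete list (sampling frame) of the population. The function name draw_srs, the ID numbers, and the seed are made up for illustration; any method that gives every unit the same chance of being selected would serve equally well.

```python
import random

def draw_srs(frame, n, seed=None):
    """Draw a simple random sample of n units, without replacement, from a list."""
    # Size requirements stated in the assignment.
    if n < 30:
        raise ValueError("sample size should be at least 30")
    if len(frame) < 20 * n:
        raise ValueError("population should be at least 20 times the sample size")
    rng = random.Random(seed)  # a fixed seed lets someone else reproduce the draw
    return rng.sample(frame, n)

# Hypothetical sampling frame: ID numbers 1..800 standing in for a population list.
frame = list(range(1, 801))
sample = draw_srs(frame, n=40, seed=251)
print(sorted(sample))
```

Recording the seed and the frame in your Data Collection Methods section makes it easier for someone else to mimic the study.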
You are free to choose your own topic(s). The topic may be related to your major or another topic of interest. Make sure you choose a topic so that it is straightforward to gather the data or you have access to data from another class or professor.
Teams: You may work in teams of 2 people, or you may work alone. If you work in a group, it is up to the members of the group to make sure everyone contributes equally. Plan your schedules so that you will have time to work together on the project outside of class. Teams should be formed and project topics selected by Wednesday, November 5th. You may be asked to share your proposals with the rest of the class. You are also encouraged to share your ideas with me before you begin collecting any data. Please start early so you have time to ask questions.
Final Report: Due Friday, November 21st. This should be a typed report, written collaboratively by all team members. Your report should be written as if it will be read by other student researchers. Make sure it includes at least:
I. Introduction
Same guidelines as last time. You should describe the population parameter of interest, an initial conjecture for its value (that makes sense in the context) and whether you suspect the actual value is higher or lower (or just different) than this conjectured value.
II. Data Collection Methods
Same rules as last time: remember to tell me everything, good and bad. Think about designing a study protocol that someone else could follow to mimic exactly the study that you carried out. In your discussion, be sure to define your observational units, variable of interest, population of interest, sampling frame (if applicable), and parameter of interest. Which type of probability sampling method did you use (SRS, stratified, cluster, systematic)? If you designed a survey, are there any potential wording issues? Did you “field-test” the questions first? How did you ensure confidentiality or take other precautions to ensure honest responses? What was the response rate? How often did you have to make repeat visits in order to obtain the observational units initially selected? Are there any other potential sources of sampling or non-sampling errors?
III. Analysis of Results
Descriptive Statistics
You will need to make choices as to which numerical and graphical summaries are most relevant. Make sure you integrate the output into the body of the report and include discussions of how you are interpreting the message in these summaries. In your discussion you should fully describe your sample, state the sample size, and report the sample statistic and whether it supports your conjecture.
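As a rough illustration of the numerical summary for a categorical variable, the Python sketch below tallies the responses, computes the sample proportion of successes, and prints a simple text bar chart. The responses (38 "yes" and 2 "no") are invented for the example and are not real data.

```python
from collections import Counter

# Hypothetical recorded values of the categorical variable, one per sampled unit.
responses = ["yes"] * 38 + ["no"] * 2

counts = Counter(responses)
n = len(responses)            # sample size
p_hat = counts["yes"] / n     # sample proportion of successes

# One row per category: count, percentage, and a bar of '#' marks.
for category, count in counts.most_common():
    print(f"{category:>4} {count:3d} {count / n:6.1%} {'#' * count}")

print(f"n = {n}, sample proportion of successes = {p_hat:.3f}")
```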
Inferential Statistics
In carrying out the binomial test and interval:
define the population and parameter in words
state your conjectured value about the parameter and what it signifies.
state whether you suspected (before you saw the data) that the actual value of the parameter was higher or lower than this conjectured value. If you had no prior direction in mind, then you will calculate a two-sided p-value.
state what a type I and a type II error would represent in this setting.
discuss whether or not your measurements can be considered observations from a Bernoulli process or from a large population.
calculate a binomial probability to represent the p-value corresponding to the direction of your conjecture. Include an interpretation of what this p-value represents.
use Minitab to calculate a confidence interval to describe the plausible values of your population parameter.
state your conclusions in context. Include a statement of which type of error you may have made based on this conclusion (you cannot have made both types), and if possible, give the probability that you have made that error.
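The binomial p-value and the interval in the steps above can be sketched in Python as follows. The numbers (40 sampled, 38 successes, conjectured value 0.90, direction "greater than") are hypothetical, and the helper names are made up for illustration. The interval shown is the exact (Clopper-Pearson) interval, which is one common choice; whether it matches your Minitab output depends on the method selected there.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def upper_tail(x, n, p):
    """P(X >= x): one-sided p-value when the conjecture is 'greater than'."""
    return sum(binom_pmf(k, n, p) for k in range(x, n + 1))

def lower_tail(x, n, p):
    """P(X <= x): one-sided p-value when the conjecture is 'less than'."""
    return sum(binom_pmf(k, n, p) for k in range(0, x + 1))

def bisect_increasing(f, target, steps=100):
    """Find p in [0, 1] with f(p) = target, assuming f increases with p."""
    lo, hi = 0.0, 1.0
    for _ in range(steps):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_interval(x, n, conf=0.95):
    """Clopper-Pearson interval for the population proportion."""
    alpha = 1 - conf
    # Lower endpoint: the p at which P(X >= x) equals alpha/2.
    lower = 0.0 if x == 0 else bisect_increasing(
        lambda p: upper_tail(x, n, p), alpha / 2)
    # Upper endpoint: the p at which P(X <= x) equals alpha/2.
    upper = 1.0 if x == n else bisect_increasing(
        lambda p: 1 - lower_tail(x, n, p), 1 - alpha / 2)
    return lower, upper

# Hypothetical example: conjecture that the proportion exceeds 0.90.
n, x, p0 = 40, 38, 0.90
p_value = upper_tail(x, n, p0)
lower, upper = exact_interval(x, n)
print(f"one-sided p-value = {p_value:.4f}")
print(f"95% interval for the proportion: ({lower:.4f}, {upper:.4f})")
```

The p-value here is the probability of seeing at least as many successes as you observed if the conjectured value were the true proportion. A two-sided p-value requires choosing a convention for combining the two tails, so it is left out of this sketch.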
IV. Conclusion
Same guidelines as before. Pay particular attention to whether or not the conditions were satisfied for you to generalize from your sample to the larger population. Also discuss whether or not the Bernoulli conditions were met, and whether the p-value reflects true randomness in the study or is more hypothetical: a measure of how much chance variability there would have been had the selection been random, even though you do not think it is reasonable to generalize from your sample to your population. Make sure you include a critique of the study you did, as well as suggestions for future studies.