Introduction

Statistics are used every day and play an important role in everyday life. One important use of statistics is to summarize a collection of data in a clear and understandable way. For example, if a psychologist gave a personality test measuring shyness to all 2500 students attending a small college, how might these measurements be summarized? There are two basic methods: numerical and graphical. Using the numerical approach, one might compute statistics such as the mean and standard deviation. These statistics convey information about the average degree of shyness and the degree to which people differ in shyness. Using the graphical approach, one might create a stem and leaf display and a box plot.
These plots contain detailed information about the distribution of shyness scores. Graphical methods are better than numerical methods for identifying patterns in the data. Numerical approaches are more precise and objective. Since the numerical and graphical approaches complement each other, it is wise to use both.

Inferential statistics

Inferential statistics are used to draw inferences about a population from a sample. Consider an experiment in which ten people who performed a task after twenty-four hours without sleep scored 12 points lower than ten people who performed after a normal night’s sleep.
Is the difference real or could it be due to chance? How much larger could the real difference be than the 12 points found in the sample? These are the types of questions answered by inferential statistics. There are two main methods used in inferential statistics: estimation and hypothesis testing. In estimation, the sample is used to estimate a parameter and a confidence interval about the estimate is constructed. In the most common use of hypothesis testing, a “straw man” null hypothesis is put forward and it is determined whether the data are strong enough to reject it. For the sleep deprivation study, the null hypothesis would be that sleep deprivation has no effect on performance.

The word “statistics” is used in several different senses.
In the broadest sense, “statistics” refers to a range of techniques and procedures for analyzing data, interpreting data, displaying data, and making decisions based on data. This is what courses in “statistics” generally cover. In a second use, a “statistic” is defined as a numerical quantity (such as the mean) calculated in a sample. Such statistics are used to estimate parameters.
The term “statistics” sometimes refers to calculated quantities regardless of whether or not they are from a sample. For example, one might ask about a baseball player’s statistics and be referring to his or her batting average, runs batted in, number of home runs, etc. Although the different meanings of “statistics” can be confusing, a careful consideration of the context in which the word is used should make its intended meaning clear.

Parameters

A parameter is a numerical quantity measuring some aspect of a population of scores. For example, the mean is a measure of central tendency.
Greek letters are used to designate parameters… Parameters are rarely known and are usually estimated by statistics computed in samples. To the right of each Greek symbol is the symbol for the associated statistic used to estimate it from a sample.

Measurement Scales

Measurement is the assignment of numbers to objects or events in a systematic fashion.
Four levels of measurement scales are commonly distinguished: nominal, ordinal, interval, and ratio. There is a relationship between the level of measurement and the appropriateness of various statistical procedures. For example, it would make no sense to compute the mean of nominal measurements, since the numbers serve only as category labels.

Frequency polygon

A frequency polygon is constructed from a frequency table.
The intervals are shown on the X-axis and the number of scores in each interval is represented by the height of a point located above the middle of the interval. The points are connected so that together with the X-axis they form a polygon.

Arithmetic Mean

The arithmetic mean is what is commonly called the average. When the word “mean” is used without a modifier, it can be assumed that it refers to the arithmetic mean. The mean is the sum of all the scores divided by the number of scores. The formula in summation notation is:

$\mu = \frac{\sum X}{N}$

where μ is the population mean and N is the number of scores. If the scores are from a sample, then the symbol M refers to the mean and N refers to the sample size.
The formula for M is the same as the formula for μ. The mean is a good measure of central tendency for roughly symmetric distributions but can be misleading in skewed distributions since it can be greatly influenced by extreme scores. Therefore, other statistics such as the median may be more informative for distributions such as reaction time or family income that are frequently very skewed. The sum of squared deviations of scores from their mean is lower than the sum of their squared deviations from any other number. For normal distributions, the mean is the most efficient measure of central tendency.
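The properties described above can be checked numerically. The following is a minimal Python sketch that uses made-up scores; the data and the helper name `sum_sq_dev` are illustrative assumptions, not part of the original material. It computes the mean as the sum of the scores divided by N, shows how a single extreme score pulls the mean but not the median, and compares the sum of squared deviations around the mean with the sum around the median.

```python
from statistics import mean, median

# Hypothetical, right-skewed scores with one extreme value.
scores = [2, 3, 3, 4, 5, 6, 40]

# The mean is the sum of all the scores divided by the number of scores.
m = sum(scores) / len(scores)
assert m == mean(scores)


def sum_sq_dev(values, center):
    """Sum of squared deviations of the values from a chosen center."""
    return sum((x - center) ** 2 for x in values)


print("mean:", m)                 # pulled upward by the extreme score
print("median:", median(scores))  # depends only on the middle score

# Squared deviations are smaller around the mean than around the median.
print(sum_sq_dev(scores, m) < sum_sq_dev(scores, median(scores)))
```

With these scores the mean is more than twice the median, which illustrates why the median can be the more informative summary for a skewed distribution.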
Scatterplot

A scatterplot shows the scores of subjects on one variable plotted against their scores on a second variable. On the left is a plot of spatial ability against general intelligence. Each point represents the data from one subject. The point that is circled represents the data for a subject who has a score of 10 on spatial ability and a score of 28 on the intelligence test.
Pearson’s Correlation

The correlation between two variables reflects the degree to which the variables are related. The most common way to measure correlation is the Pearson Product Moment Correlation (called Pearson’s correlation for short). When measured in a population, the Pearson Product Moment correlation is designated by the Greek letter rho (ρ). When computed in a sample, it is designated by the letter “r” and is sometimes called “Pearson’s r.” Pearson’s correlation reflects the degree of linear relationship between two variables. It ranges from +1 to -1. A correlation of +1 means that there is a perfect positive linear relationship between variables.
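As a rough illustration, Pearson’s r for a sample can be computed from the deviations of each score from its mean. The sketch below is one standard way to write that calculation; the function name `pearson_r` and the data are hypothetical and chosen only to show the two extremes of the range.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson product-moment correlation for paired sample scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


x = [1, 2, 3, 4, 5]
print(pearson_r(x, [10 - 3 * v for v in x]))  # points on a falling line
print(pearson_r(x, [2 * v + 1 for v in x]))   # points on a rising line
```

Points that fall exactly on a falling line give a correlation of -1, and points that fall exactly on a rising line give a correlation of +1.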
The scatterplot shown on this page depicts such a relationship. It is a positive relationship because high scores on the X-axis are associated with high scores on the Y-axis.

Probability

What is the probability that a card drawn at random from a deck of cards will be an ace? Since 4 of the 52 cards in the deck are aces, the probability is 4/52. In general, the probability of an event is the number of favorable outcomes divided by the total number of possible outcomes. (This assumes the outcomes are all equally likely.) In this case there are four favorable outcomes: (1) the ace of spades, (2) the ace of hearts, (3) the ace of diamonds, and (4) the ace of clubs.
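The counting rule can be sketched in a few lines of Python by listing every card and counting the aces. The variable names are illustrative, and the calculation assumes, as the text does, that every card is equally likely to be drawn.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["spades", "hearts", "diamonds", "clubs"]

# Every (rank, suit) pair is one possible outcome.
deck = list(product(ranks, suits))

# Favorable outcomes: the cards whose rank is ace.
favorable = [card for card in deck if card[0] == "A"]

# Probability = favorable outcomes / total possible outcomes.
p_ace = Fraction(len(favorable), len(deck))
print(len(favorable), "of", len(deck), "->", p_ace)  # 4/52 reduces to 1/13
```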
Since each of the 52 cards in the deck represents a possible outcome, there are 52 possible outcomes.

Point Estimation

When a parameter is being estimated, the estimate can be either a single number or a range of scores. When the estimate is a single number, it is called a “point estimate”; when the estimate is a range of scores, it is called an “interval estimate.” Confidence intervals are used for interval estimates.
As an example of a point estimate, assume you wanted to estimate the mean time it takes 12-year-olds to run 100 yards. The mean running time of a random sample of 12-year-olds would be an estimate of the mean running time for all 12-year-olds. Therefore, the sample mean, M, would be a point estimate of the population mean, μ. Often point estimates are used as parts of other statistical calculations. For example, a point estimate of the standard deviation is used in the calculation of a confidence interval for μ.
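To make the running-time example concrete, here is a small Python sketch. The ten running times are invented for illustration, and the interval uses a large-sample normal approximation (a multiplier of about 1.96) as a simplifying assumption; with a sample this small, an interval based on the t distribution would be somewhat wider.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

# Hypothetical 100-yard times in seconds for a sample of 12-year-olds.
times = [14.2, 15.1, 13.8, 16.0, 14.7, 15.4, 14.9, 13.5, 15.8, 14.4]

m = mean(times)   # point estimate of the population mean
s = stdev(times)  # point estimate of the population standard deviation

# The estimated standard deviation feeds into the interval estimate.
se = s / sqrt(len(times))        # estimated standard error of the mean
z = NormalDist().inv_cdf(0.975)  # normal multiplier for 95% (assumption)
lower, upper = m - z * se, m + z * se

print("point estimate M:", round(m, 2))
print("approximate 95% interval:", round(lower, 2), "to", round(upper, 2))
```

The single number M is the point estimate, while the pair of limits is an interval estimate built from M and the point estimate of the standard deviation.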
Point estimates of parameters are often used in the formulas for significance testing. Point estimates are not usually as informative as confidence intervals. Their importance lies in the fact that many statistical formulas are based on them.

Power

Power is the probability of correctly rejecting a false null hypothesis.
Power is therefore defined as 1 - β, where β is the Type II error probability. If the power of an experiment is low, then there is a good chance that the experiment will be inconclusive. That is why it is so important to consider power in the design of experiments. There are methods for estimating the power of an experiment before the experiment is conducted. If the power is too low, then the experiment can be redesigned by changing one of the factors that determine power.
Consider a hypothetical experiment designed to test whether rats brought up in an enriched environment can learn mazes faster than rats brought up in the typical laboratory environment (the control condition). Two groups of 12 rats each are tested. Although the experimenter does not know it, the population mean number of trials it takes to learn the maze is 20 for the enriched condition and 32 for the control condition. The null hypothesis that the enriched environment makes no difference is therefore false.
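A rough sense of the power of such an experiment can be obtained with a short calculation. The sketch below is not the method of the original example: it assumes a two-sided, two-sample z test with a known common standard deviation, and the value of 10 trials for that standard deviation is purely an assumption, since the text does not give one. The means (20 and 32) and the group size (12) are taken from the example; the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sample_z(mu1, mu2, sigma, n, alpha=0.05):
    """Power of a two-sided, two-sample z test with n scores per group
    and a common, known standard deviation sigma."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between two independent means.
    se = sigma * sqrt(2 / n)
    shift = abs(mu1 - mu2) / se
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Means and group size come from the example; sigma = 10 is assumed.
print(power_two_sample_z(20, 32, sigma=10, n=12))
print(power_two_sample_z(20, 32, sigma=10, n=24))  # larger groups
print(power_two_sample_z(20, 32, sigma=20, n=12))  # noisier scores
```

Comparing the three calls shows two of the factors that determine power: larger groups raise it, and more variable scores lower it.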
Predictions

When two variables are related, it is possible to predict a person’s score on one variable from their score on the second variable with better than chance accuracy. It will be assumed that the relationship between the two variables is linear. Although there are methods for making predictions when the relationship is nonlinear, these methods are beyond the scope of this text. Given that the relationship is linear, the prediction problem becomes one of finding the straight line that best fits the data.
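One common way to make “best fits” precise is least squares, which picks the line that minimizes the sum of squared differences between the actual and predicted Y values. The sketch below assumes that criterion; the text itself does not name one. The data and the function name `fit_line` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting Y from X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of cross-products of deviations over the sum of
    # squared X deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


xs = [1, 2, 3, 4, 5]
ys = [1, 2, 1.3, 3.75, 2.25]

slope, intercept = fit_line(xs, ys)
predicted = [slope * x + intercept for x in xs]
print("slope:", round(slope, 3), "intercept:", round(intercept, 3))
print("predicted Y values:", [round(p, 3) for p in predicted])
```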
Since the terms “regression” and “prediction” are synonymous, this line is called the regression line. The mathematical form of the regression line predicting Y from X is:

Y’ = bX + A

where X is the variable represented on the abscissa, b is the slope of the line, A is the Y intercept, and Y’ consists of the predicted values of Y for the various values of X.

Chi Square

A test based on the normal distribution can be used to see whether a sample proportion (p) differs significantly from a population proportion (π). This section shows how to conduct a test of the same null hypothesis using a test based on the chi square distribution. The two tests always yield identical results. The advantage of the test based on the chi square distribution is that it can be generalized to more complex situations.
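The agreement between the two tests can be sketched numerically using the 62-out-of-100 example that follows. This is an illustration under stated assumptions rather than the original calculation: the 0.5 continuity correction is an assumption, included because it reproduces the z of 2.3 quoted in the example, and the variable names are hypothetical. With one degree of freedom, the chi square statistic equals the square of z.

```python
from math import sqrt
from statistics import NormalDist

observed = {"succeeded": 62, "failed": 38}
n = sum(observed.values())
p_null = 0.5  # proportion expected to succeed under the null hypothesis

# Expected frequencies if the null hypothesis is true.
expected = {"succeeded": n * p_null, "failed": n * (1 - p_null)}

# Chi square with a 0.5 continuity correction (assumed) in every cell.
chi_square = sum(
    (abs(observed[k] - expected[k]) - 0.5) ** 2 / expected[k]
    for k in observed
)

# z test of the same null hypothesis with the same correction.
z = (abs(observed["succeeded"] - expected["succeeded"]) - 0.5) / sqrt(
    n * p_null * (1 - p_null)
)

p_one_tailed = 1 - NormalDist().cdf(z)
p_two_tailed = 2 * p_one_tailed  # equals the chi square p value for 1 df

print("chi square:", round(chi_square, 2), "z squared:", round(z ** 2, 2))
print("one-tailed p:", round(p_one_tailed, 4))
print("two-tailed p:", round(p_two_tailed, 4))
```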
In an example of the test based on the normal distribution, a researcher wished to test whether a sample proportion of 62/100 differed significantly from a hypothesized population value of 0.5. The test based on z resulted in a z of 2.3 and a probability value of 0.0107. For the chi square test, the number of people falling in a specified category is listed as the first line in each cell (62 succeeded, 38 failed). The second line in each cell (in parentheses) contains the number expected in that category if the null hypothesis is true. Since the null hypothesis is that the proportion that succeed is 0.5, (0.5)(100) = 50 are expected to succeed and (0.5)(100) = 50 are expected to fail.

Conclusion

In conclusion, statistics play an important part in people’s lives. Statistics help to organize specific and detailed information so that it is easier to understand. Without statistics, making sense of data would be far more complicated. Statistics are used every day throughout the world to compute and organize numbers and data.
Statistics help to summarize large amounts of data and information in small and convenient graphs, charts, and similar displays. Statistics range from things as small as a Little Leaguer’s batting average to things as large as a national census. All in all, people may not realize it, but statistics are a major part of mathematics and of everyday life.