# Analysis of Variance (ANOVA)

Analysis of variance (ANOVA) is a statistical procedure that compares data between two or more groups or conditions to investigate the presence of differences between those groups on some continuous dependent variable (see Exercise 18). In this exercise, we will focus on the one-way ANOVA, which involves testing one independent variable and one dependent variable (as opposed to other types of ANOVAs, such as factorial ANOVAs that incorporate multiple independent variables).

Why ANOVA and not a t-test? Remember that a t-test is formulated to compare two sets of data or two groups at one time (see Exercise 23 for guidance on selecting appropriate statistics). Thus, data generated from a clinical trial that involves four experimental groups, Treatment 1, Treatment 2, Treatments 1 and 2 combined, and a Control, would require 6 t-tests to compare every pair of groups. Consequently, the chance of making a Type I error (alpha error) increases substantially (or is inflated) because so many comparisons are being performed. Specifically, the chance of making at least one Type I error can be as large as the number of comparisons multiplied by the alpha level (here, up to 6 × 0.05 = 0.30). Thus, ANOVA is the recommended statistical technique for examining differences between more than two groups (Zar, 2010).
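The inflation of the Type I error rate can be illustrated with a short sketch. The function name `familywise_error` is hypothetical, and the second rate it returns assumes the t-tests are independent of one another, which is only an approximation for pairwise comparisons that share groups.

```python
from math import comb


def familywise_error(n_groups: int, alpha: float = 0.05):
    """Return (number of pairwise tests, upper bound on the chance of at least
    one Type I error, chance of at least one Type I error if the tests were
    independent)."""
    n_tests = comb(n_groups, 2)                      # every pair of groups
    upper_bound = min(1.0, n_tests * alpha)          # comparisons x alpha
    independent_rate = 1 - (1 - alpha) ** n_tests    # assumes independent tests
    return n_tests, upper_bound, independent_rate


n_tests, upper_bound, independent_rate = familywise_error(4)
print(n_tests)                     # 6 pairwise t-tests for four groups
print(round(upper_bound, 2))       # 0.3
print(round(independent_rate, 3))  # 0.265
```

Either way, the risk of at least one false positive is several times larger than the nominal 0.05, which is the motivation for running a single ANOVA first.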

ANOVA is a procedure that culminates in a statistic called the F statistic. It is this value that is compared against an F distribution (see Appendix C) in order to determine whether the groups significantly differ from one another on the dependent variable. The formulas for ANOVA actually compute two estimates of variance: One estimate represents differences between the groups/conditions, and the other estimate represents differences among (within) the data.

### Research Designs Appropriate for the One-Way ANOVA

Research designs that may utilize the one-way ANOVA include the randomized experimental, quasi-experimental, and comparative designs (Gliner, Morgan, & Leech, 2009). The independent variable (the “grouping” variable for the ANOVA) may be active or attributional. An active independent variable refers to an intervention, treatment, or program. An attributional independent variable refers to a characteristic of the participant, such as gender, diagnosis, or ethnicity. The ANOVA can compare two groups or more. In the case of a two-group design, the researcher can select either an independent samples t-test or a one-way ANOVA to answer the research question. The two tests yield the same conclusion because, with two groups, the F value equals the square of the pooled-variance t value; however, when examining differences between more than two groups, the one-way ANOVA is the preferred statistical test.
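The equivalence of the two tests in a two-group design can be checked numerically. The sketch below uses hypothetical scores (not data from this exercise) and assumes the equal-variance (pooled) form of the independent samples t-test; the helper names are illustrative only.

```python
from math import sqrt
from statistics import mean


def pooled_t(a, b):
    """Independent samples t statistic using the pooled variance estimate."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (ma - mb) / sqrt(pooled_var * (1 / na + 1 / nb))


def one_way_f(groups):
    """One-way ANOVA F ratio: mean square between / mean square within."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical scores for two groups (illustration only)
group_a = [12, 15, 11, 14, 13]
group_b = [16, 18, 15, 19, 17]

t_value = pooled_t(group_a, group_b)
f_value = one_way_f([group_a, group_b])
print(t_value ** 2, f_value)  # the two numbers agree
```

Because F equals t squared in this situation, both tests lead to the same p value and therefore the same decision about the null hypothesis.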

Example 1: A researcher conducts a randomized experimental study wherein she randomizes participants to receive a high-dosage weight loss pill, a low-dosage weight loss pill, or a placebo. She assesses the number of pounds lost from baseline to post-treatment for the three groups. Her research question is: “Is there a difference between the three groups in weight lost?” The independent variable is the treatment condition (high-dose weight loss pill, low-dose weight loss pill, and placebo) and the dependent variable is number of pounds lost over the treatment span.

Null hypothesis: There is no difference in weight lost among the high-dose weight loss pill, low-dose weight loss pill, and placebo groups in a population of overweight adults.

Example 2: A nurse researcher working in dermatology conducts a retrospective comparative study wherein she conducts a chart review of patients and divides them into three groups: psoriasis, psoriatic symptoms, or control. The dependent variable is health status and the independent variable is disease group (psoriasis, psoriatic symptoms, and control). Her research question is: “Is there a difference between the three groups in levels of health status?”

Null hypothesis: There is no difference between the three groups in health status.

### Statistical Formula and Assumptions

Use of the ANOVA involves the following assumptions (Zar, 2010):

1. Sample means from the population are normally distributed.

2. The groups are mutually exclusive.

3. The dependent variable is measured at the interval/ratio level.

4. The groups should have equal variance, termed “homogeneity of variance.”

The dependent variable in an ANOVA must be scaled as interval or ratio. If the dependent variable is measured with a Likert scale and the frequency distribution is approximately normally distributed, these data are usually considered interval-level measurements and are appropriate for an ANOVA (de Winter & Dodou, 2010; Rasmussen, 1989).

The basic formula for the F without numerical symbols is:

$$F = \frac{\text{Mean Square Between Groups}}{\text{Mean Square Within Groups}}$$

The term “mean square” (MS) is used interchangeably with the word “variance.” The formulas for ANOVA compute two estimates of variance: the between groups variance and the within groups variance. The between groups variance represents differences between the groups/conditions being compared, and the within groups variance represents differences among (within) each group’s data. Therefore, the formula is F = MS between/MS within.
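Written in symbols, the same ratio can be sketched as follows. The notation is an assumption introduced here for illustration: $k$ is the number of groups, $N$ is the total number of participants, and $SS$ denotes a sum of squares; each mean square is a sum of squares divided by its degrees of freedom.

$$F = \frac{MS_{\text{between}}}{MS_{\text{within}}}, \qquad MS_{\text{between}} = \frac{SS_{\text{between}}}{k - 1}, \qquad MS_{\text{within}} = \frac{SS_{\text{within}}}{N - k}$$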

### Hand Calculations

Using an example from a study of students enrolled in an RN to BSN program, a subset of graduates from the program were examined (Mancini, Ashwill, & Cipher, 2014). The data are presented in Table 33-1. A simulated subset was selected for this example so that the computations would be small and manageable. In actuality, studies involving one-way ANOVAs need to be adequately powered (Aberson, 2010; Cohen, 1988). See Exercises 24 and 25 for more information regarding statistical power.

TABLE 33-1

MONTHS FOR COMPLETION OF RN TO BSN PROGRAM BY HIGHEST DEGREE STATUS

| Participant # | Associate’s Degree | Participant # | Bachelor’s Degree | Participant # | Master’s Degree |
|---|---|---|---|---|---|
| 1 | 17 | 10 | 16 | 19 | 17 |
| 2 | 19 | 11 | 15 | 20 | 21 |
| 3 | 24 | 12 | 16 | 21 | 20 |
| 4 | 18 | 13 | 12 | 22 | 21 |
| 5 | 24 | 14 | 16 | 23 | 12 |
| 6 | 24 | 15 | 12 | 24 | 16 |
| 7 | 16 | 16 | 16 | 25 | 20 |
| 8 | 16 | 17 | 12 | 26 | 18 |
| 9 | 20 | 18 | 10 | 27 | 12 |

The independent variable in this example is highest degree obtained prior to enrollment (Associate’s, Bachelor’s, or Master’s degree), and the dependent variable is the number of months it took for the student to complete the RN to BSN program. The null hypothesis is “There is no difference between the groups (highest degree of Associate’s, Bachelor’s, or Master’s) in the months these nursing students require to complete an RN to BSN program.”

The computations for the ANOVA are as follows:

Step 1: Compute correction term, C.

Square the grand sum (G), and divide by total N:

$$C = \frac{G^2}{N} = \frac{460^2}{27} = 7{,}837.04$$

Step 2: Compute Total Sum of Squares.

Square every value in the dataset, sum the squares, and subtract C:

$$(17^2 + 19^2 + 24^2 + 18^2 + 24^2 + 24^2 + 16^2 + 16^2 + \ldots + 12^2) - 7{,}837.04 = 8{,}234 - 7{,}837.04 = 396.96$$

Step 3: Compute Between Groups Sum of Squares.

Square the sum of each column and divide by the number of participants in that group (n = 9). Add each, and then subtract C:

$$\frac{178^2}{9} + \frac{125^2}{9} + \frac{157^2}{9} - 7{,}837.04$$

$$(3{,}520.44 + 1{,}736.11 + 2{,}738.78) - 7{,}837.04 = 158.29$$

Step 4: Compute Within Groups Sum of Squares.

Subtract the Between Groups Sum of Squares (Step 3) from Total Sum of Squares (Step 2):

$$396.96 - 158.29 = 238.67$$
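Steps 1 through 4 can be reproduced with a few lines of Python, shown here as a minimal sketch using the values from Table 33-1. The variable names are illustrative and not part of the original exercise.

```python
# Months to completion from Table 33-1, keyed by highest degree at enrollment
months = {
    "Associate's": [17, 19, 24, 18, 24, 24, 16, 16, 20],
    "Bachelor's": [16, 15, 16, 12, 16, 12, 16, 12, 10],
    "Master's": [17, 21, 20, 21, 12, 16, 20, 18, 12],
}

all_values = [x for group in months.values() for x in group]
grand_sum = sum(all_values)   # G
total_n = len(all_values)     # N

# Step 1: correction term C = G^2 / N
correction = grand_sum ** 2 / total_n

# Step 2: total sum of squares = sum of squared values - C
ss_total = sum(x ** 2 for x in all_values) - correction

# Step 3: between groups sum of squares = sum of (column sum^2 / n) - C
ss_between = sum(sum(g) ** 2 / len(g) for g in months.values()) - correction

# Step 4: within groups sum of squares = total - between
ss_within = ss_total - ss_between

print(grand_sum, total_n)
print(f"C = {correction:.2f}")
print(f"SS total = {ss_total:.2f}")
print(f"SS between = {ss_between:.2f}")
print(f"SS within = {ss_within:.2f}")
```

Because the code carries full precision rather than rounding each intermediate value, the between groups sum of squares comes out near 158.30 instead of the hand-calculated 158.29; the difference is rounding only.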

Step 5: Create ANOVA Summary Table (see Table 33-2).

a. Insert the sum of squares values in the first column.

b. The degrees of freedom are in the second column. Because the F is a ratio of two separate statistics (mean square between groups and mean square within groups), each has its own df formula, one for the numerator and one for the denominator:

$$\text{Mean square between groups } df = \text{number of groups} - 1$$

$$\text{Mean square within groups } df = N - \text{number of groups}$$

For this example, the df for the numerator is 3 − 1 = 2.

The df for the denominator is 27 − 3 = 24.

c. The mean square between groups and mean square within groups are in the third column. These values are computed by dividing the SS by the df. Therefore, the MS between = 158.29 ÷ 2 = 79.15. The MS within = 238.67 ÷ 24 = 9.94.

d. The F is the final column and is computed by dividing the MS between by the MS within. Therefore, F = 79.15 ÷ 9.94 = 7.96.

TABLE 33-2

| Source of Variation | SS | df | MS | F |
|---|---|---|---|---|
| Between Groups | 158.29 | 2 | 79.15 | 7.96 |
| Within Groups | 238.67 | 24 | 9.94 | |
| Total | 396.96 | 26 | | |
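The summary table can also be assembled programmatically. The following sketch starts from the hand-calculated sums of squares (158.29 and 238.67) and applies the df, MS, and F formulas from Step 5; the helper `fmt` and the variable names are hypothetical.

```python
# Sums of squares from Steps 3 and 4 of the hand calculation
ss_between, ss_within = 158.29, 238.67
n_groups, total_n = 3, 27

# Degrees of freedom for the numerator and the denominator of F
df_between = n_groups - 1
df_within = total_n - n_groups

# Mean squares are sums of squares divided by their degrees of freedom
ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_ratio = ms_between / ms_within


def fmt(value):
    """Format a number to two decimals, or leave the cell blank."""
    return "" if value is None else f"{value:.2f}"


rows = [
    ("Between Groups", ss_between, df_between, ms_between, f_ratio),
    ("Within Groups", ss_within, df_within, ms_within, None),
    ("Total", ss_between + ss_within, df_between + df_within, None, None),
]

print(f"{'Source of Variation':<20}{'SS':>10}{'df':>5}{'MS':>10}{'F':>8}")
for source, ss, df, ms, f in rows:
    print(f"{source:<20}{fmt(ss):>10}{df:>5}{fmt(ms):>10}{fmt(f):>8}")
```

The printed values should match Table 33-2 up to rounding in the last decimal place.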

Step 6: Locate the critical F value on the F distribution table (see Appendix C) and compare it to our obtained F = 7.96 value. The critical F value for 2 and 24 df at α = 0.05 is 3.40, which indicates the F value in this example is statistically significant. Researchers report ANOVA results in a study report using the following format: F(2,24) = 7.96, p < 0.05. Researchers generally report the exact p value instead of “p < 0.05,” but this usually requires the use of computer software due to the tedious nature of p value computations.

Our obtained F = 7.96 exceeds the critical value in the table, which indicates that the F is statistically significant and that the population means are not all equal. Therefore, we can reject our null hypothesis that the three groups spent the same amount of time completing the RN to BSN program. However, the F value does not identify which groups are significantly different from one another. Further testing, termed multiple comparison tests or post hoc tests, is required to complete the ANOVA process and determine all the significant differences among the study groups.
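For this particular example, the critical value and the exact p value can be checked without a table. The sketch below relies on a special case: when the numerator df equals 2, the upper tail of the F distribution has the closed form (1 + 2F/df within) raised to the power −df within/2. This shortcut does not apply to other numerator df, where statistical software or a table is needed, and the function names are hypothetical.

```python
def f_upper_tail_df1_2(f_value: float, df_within: int) -> float:
    """P(F >= f_value) for an F distribution with 2 and df_within df.

    Valid only when the numerator df is exactly 2.
    """
    return (1 + 2 * f_value / df_within) ** (-df_within / 2)


def f_critical_df1_2(alpha: float, df_within: int) -> float:
    """Critical F for 2 and df_within df, found by inverting the tail formula."""
    return (df_within / 2) * (alpha ** (-2 / df_within) - 1)


critical_f = f_critical_df1_2(0.05, 24)
p_value = f_upper_tail_df1_2(7.96, 24)

print(f"critical F(2,24) at alpha 0.05 = {critical_f:.2f}")
print(f"p value for F = 7.96: {p_value:.3f}")
print("significant" if 7.96 > critical_f else "not significant")
```

The computed critical value agrees with the tabled 3.40, and the p value rounds to 0.002.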

#### Post Hoc Tests

Post hoc tests have been developed specifically to determine the location of group differences after ANOVA is performed on data from more than two groups. These tests were developed to reduce the incidence of a Type I error. Frequently used post hoc tests are the Newman-Keuls test, the Tukey Honestly Significant Difference (HSD) test, the Scheffé test, and the Dunnett test (Zar, 2010; see Exercise 18 for examples). When these tests are calculated, the alpha level is reduced in proportion to the number of additional tests required to locate statistically significant differences. For example, for several of the aforementioned post hoc tests, if many groups’ mean values are being compared, the magnitude of the difference required for significance is set higher than if only two groups are being compared. Post hoc tests are tedious to perform by hand and are best handled with statistical computer software programs. Accordingly, the rest of this example will be presented with the assistance of SPSS.

### SPSS Computations

The following screenshot is a replica of what your SPSS window will look like. The data for ID numbers 24 through 27 are viewable by scrolling down in the SPSS screen.

Step 1: From the “Analyze” menu, choose “Compare Means” and “One-Way ANOVA.” Move the dependent variable, Number of Months to Complete Program, over to the right, as in the window below.

Step 2: Move the independent variable, Highest Degree at Enrollment, to the right in the space labeled “Factor.”

Step 3: Click “Options.” Check the boxes next to “Descriptive” and “Homogeneity of variance test.” Click “Continue” and “OK.”

### Interpretation of SPSS Output

The following tables are generated from SPSS. The first table contains descriptive statistics for months to completion, separated by the three groups. The second table contains the Levene’s test of homogeneity of variances. The third table contains the ANOVA summary table, along with the F and p values.

The first table displays descriptive statistics that allow us to observe the means for the three groups. This table is important because it indicates that the students with an Associate’s degree took an average of 19.78 months to complete the program, compared to 13.89 months for students with a Bachelor’s and 17.44 months for students with a Master’s degree.

The second table contains the Levene’s test for equality of variances. The Levene’s test is a statistical test of the equal variances assumption. The p value is 0.488, indicating there was no significant difference among the three groups’ variances; thus, the data have met the equal variances assumption for ANOVA.
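The Levene’s test result can be approximated by hand-style computation. The sketch below assumes the mean-based form of the test, in which a one-way ANOVA is run on each score’s absolute deviation from its own group mean; whether SPSS reports exactly this variant is an assumption here. The p value uses a closed form that holds only because the numerator df is 2 in this example.

```python
from statistics import mean

# Months to completion from Table 33-1
months = {
    "Associate's": [17, 19, 24, 18, 24, 24, 16, 16, 20],
    "Bachelor's": [16, 15, 16, 12, 16, 12, 16, 12, 10],
    "Master's": [17, 21, 20, 21, 12, 16, 20, 18, 12],
}


def one_way_f(groups):
    """Return (F, df between, df within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within


# Absolute deviation of each score from its own group mean
abs_deviations = [[abs(x - mean(g)) for x in g] for g in months.values()]

levene_w, df1, df2 = one_way_f(abs_deviations)

# Upper-tail probability; this closed form is valid only because df1 == 2
p_value = (1 + 2 * levene_w / df2) ** (-df2 / 2)

print(f"Levene statistic = {levene_w:.3f}, df = ({df1}, {df2}), p = {p_value:.3f}")
```

Under these assumptions the p value comes out at about 0.488, in line with the SPSS output described above.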

The last table contains the contents of the ANOVA summary table, which looks much like Table 33-2. This table contains an additional value that we did not compute by hand: the exact p value, which is 0.002. Because the SPSS output indicates that we have a significant ANOVA, post hoc testing must be performed.

Return to the ANOVA window and click “Post Hoc.” You will see a window similar to the one below. Select the “LSD” and “Tukey” options. Click “Continue” and “OK.”

The following output is added to the original output. This table contains post hoc test results for two different tests: the LSD (Least Significant Difference) test and the Tukey HSD (Honestly Significant Difference) test. The LSD test, the original post hoc test, explores all possible pairwise comparisons of means using the equivalent of multiple t-tests. However, the LSD test, in performing a set of multiple t-tests, reports inaccurate p values that have not been adjusted for multiple computations (Zar, 2010). Consequently, researchers should exercise caution when choosing the LSD post hoc test following an ANOVA.

The Tukey HSD comparison test, on the other hand, is a more “conservative” test, meaning that it requires a larger difference between two groups to indicate a significant difference than some of the other post hoc tests available. By requiring a larger difference between the groups, the Tukey HSD procedure yields p values that are adjusted to reflect the multiple comparisons, such as the p of 0.062 for the Bachelor’s versus Master’s comparison in this example (Zar, 2010).

### Post Hoc Tests

Observe the “Mean Difference” column. Any difference noted with an asterisk (*) is significant at p < 0.05. The p values of each comparison are listed in the “Sig.” column, and values below 0.05 indicate a significant difference between the pair of groups. Observe the p values for the comparison of the Bachelor’s degree group versus the Master’s degree group. The Tukey HSD test indicates no significant difference between the groups, with a p of 0.062; however, the LSD test indicates that the groups significantly differed, with a p of 0.025. This example enables you to see the difference in results obtained when calculating a conservative versus a lenient post hoc test. However, it should be noted that because an a priori power analysis was not conducted, there is a possibility that these analyses are underpowered. See Exercises 24 and 25 for more information regarding the consequences of low statistical power.
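The contrast between the two post hoc tests can be sketched by computing the smallest mean difference each test treats as significant at α = 0.05. This is a simplified illustration for equal group sizes, not a reproduction of the SPSS algorithm. The two critical values are assumptions taken from standard tables (two-tailed t with 24 df, and the studentized range for 3 groups and 24 df), and the sketch gives significance decisions rather than exact p values.

```python
from itertools import combinations
from math import sqrt

# Group means from the descriptive statistics (column sums divided by n = 9)
means = {"Associate's": 178 / 9, "Bachelor's": 125 / 9, "Master's": 157 / 9}
n_per_group = 9
ms_within = 238.67 / 24  # mean square within groups from the summary table

T_CRIT = 2.064  # assumed table value: two-tailed t, alpha = 0.05, df = 24
Q_CRIT = 3.53   # assumed table value: studentized range, 3 groups, df = 24

# Smallest mean difference declared significant by each test
lsd_threshold = T_CRIT * sqrt(ms_within * 2 / n_per_group)
hsd_threshold = Q_CRIT * sqrt(ms_within / n_per_group)

results = {}
for a, b in combinations(means, 2):
    diff = abs(means[a] - means[b])
    results[(a, b)] = (diff > lsd_threshold, diff > hsd_threshold)
    print(f"{a} vs {b}: difference = {diff:.2f}, "
          f"LSD significant = {diff > lsd_threshold}, "
          f"Tukey significant = {diff > hsd_threshold}")

print(f"LSD threshold = {lsd_threshold:.2f}, Tukey HSD threshold = {hsd_threshold:.2f}")
```

The Tukey threshold is larger than the LSD threshold, so the Bachelor’s versus Master’s difference clears the LSD bar but not the Tukey bar, which mirrors the pattern in the SPSS output.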

### Final Interpretation in American Psychological Association (APA) Format

The following interpretation is written as it might appear in a research article, formatted according to APA guidelines (APA, 2010). A one-way ANOVA performed on months to program completion revealed significant differences among the three groups, F(2,24) = 7.96, p = 0.002. Post hoc comparisons using the Tukey HSD comparison test indicated that the students in the Associate’s degree group took significantly longer to complete the program than the students in the Bachelor’s degree group (19.8 versus 13.9 months, respectively). However, there were no significant differences in program completion time between the Associate’s degree group and the Master’s degree group or between the Bachelor’s degree group and the Master’s degree group.

### Study Questions

1. Is the dependent variable in the Mancini et al. (2014) example normally distributed? Provide a rationale for your answer.

2. What are the two instances that must occur to warrant post hoc testing following an ANOVA?

3. Do the data in this example meet criteria for homogeneity of variance? Provide a rationale for your answer.

4. What is the null hypothesis in the example?

5. What was the exact likelihood of obtaining an F value at least as extreme as or as close to the one that was actually observed, assuming that the null hypothesis is true?

6. Do the data meet criteria for “mutual exclusivity”? Provide a rationale for your answer.

7. What does the numerator of the F ratio represent?

8. What does the denominator of the F ratio represent?

9. How would our final interpretation of the results have changed if we had chosen to report the LSD post hoc test instead of the Tukey HSD test?

10. Was the sample size adequate to detect differences among the three groups in this example? Provide a rationale for your answer.

### Answers to Study Questions

1. Yes, the data are approximately normally distributed as noted by the frequency distribution generated from SPSS, below. The Shapiro-Wilk (covered in Exercise 26) p value for months to completion was 0.151, indicating that the frequency distribution did not significantly deviate from normality.

2. The two instances that must occur to warrant post hoc testing following an ANOVA are (1) the ANOVA was performed on data comparing more than two groups, and (2) the F value is statistically significant.

3. Yes, the data met criteria for homogeneity of variance because the Levene’s test for equality of variances yielded a p of 0.488, indicating no significant differences in variance between the groups.

4. The null hypothesis is: “There is no difference between groups (Associate’s, Bachelor’s, and Master’s degree groups) in months until completion of an RN to BSN program.”

5. The exact likelihood of obtaining an F value at least as extreme as or as close to the one that was actually observed, assuming that the null hypothesis is true, was 0.2%.

6. Yes, the data met criteria for mutual exclusivity because a student could only belong to one of the three groups of the highest degree obtained prior to enrollment (Associate’s, Bachelor’s, and Master’s degree).

7. The numerator represents the between groups variance or the differences between the groups/conditions being compared.

8. The denominator represents the within groups variance or the extent to which the dependent variable is dispersed within each group.

9. The final interpretation of the results would have changed if we had chosen to report the LSD post hoc test instead of the Tukey HSD test. The results of the LSD test indicated that the students in the Master’s degree group took significantly longer to complete the program than the students in the Bachelor’s degree group (p = 0.025).

10. The sample size was most likely adequate to detect differences among the three groups overall because a significant difference was found, p = 0.002. However, there was a discrepancy between the results of the LSD post hoc test and the Tukey HSD test. The difference between the Master’s degree group and the Bachelor’s degree group was significant according to the results of the LSD test but not the Tukey HSD test. Therefore, it is possible that with only 27 total students in this example, the data were underpowered for the multiple comparisons following the ANOVA.

#### Data for Additional Computational Practice for Questions to be Graded

In the example from Ottomanelli and colleagues’ (2012) study, participants were randomized to receive Supported Employment or treatment as usual. A third group, also a treatment as usual group, consisted of a nonrandomized observational group of participants. A simulated subset was selected for this example so that the computations would be small and manageable. The independent variable in this example is treatment group (Supported Employment, Treatment as Usual–Randomized, and Treatment as Usual–Observational/Not Randomized), and the dependent variable is the number of hours worked post-treatment. Supported employment refers to a type of specialized interdisciplinary vocational rehabilitation designed to help people with disabilities obtain and maintain community-based competitive employment in their chosen occupation (Bond, 2004).

The null hypothesis is: “There is no difference between the treatment groups in post-treatment number of hours worked among veterans with spinal cord injuries.”

Compute the ANOVA on the data in Table 33-3 below.

### EXERCISE 33 Questions to Be Graded

Name: _______________________________________________________ Class: _____________________

Date: ___________________________________________________________________________________

Follow your instructor’s directions to submit your answers to the following questions for grading. Your instructor may ask you to write your answers below and submit them as a hard copy for grading. Alternatively, your instructor may ask you to use the space below for notes and submit your answers online at http://evolve.elsevier.com/Grove/statistics/ under “Questions to Be Graded.”

1. Do the data meet criteria for homogeneity of variance? Provide a rationale for your answer.

2. If calculating by hand, draw the frequency distribution of the dependent variable, hours worked at a job. What is the shape of the distribution? If using SPSS, what is the result of the Shapiro-Wilk test of normality for the dependent variable?

3. What are the means for three groups’ hours worked on a job?

4. What are the F value and the group and error df for this set of data?

5. Is the F significant at α = 0.05? Specify how you arrived at your answer.

6. If using SPSS, what is the exact likelihood of obtaining an F value at least as extreme as or as close to the one that was actually observed, assuming that the null hypothesis is true?

7. Which group worked the most weekly job hours post-treatment? Provide a rationale for your answer.

8. Write your interpretation of the results as you would in an APA-formatted journal.

9. Is there a difference in your final interpretation when comparing the results of the LSD post hoc test versus Tukey HSD test? Provide a rationale for your answer.

10. If the researcher decided to combine the two Treatment as Usual groups to represent an overall “Control” group, then there would be two groups to compare: Supported Employment versus Control. What would be the appropriate statistic to address the difference in hours worked between the two groups?