February 1996 // Volume 34 // Number 1 // Feature Articles // 1FEA2

# Cutting Evaluation Costs by Reducing Sample Size

**Abstract**

If sample size can be reduced without undermining the validity of results, then the cost of evaluating Extension programs can be reduced. To test this hypothesis, samples of three sizes, corresponding to 1%, 3%, and 5% margins of error, were drawn from the data of two doctoral dissertations. Comparison of the three sample sizes showed two things. Evaluations with a descriptive purpose could reduce sample size from that required for a 1% error margin to that required for a 5% margin. Evaluations with a comparative purpose could reduce it from the 1% to the 3% level. In both cases, the validity of the results was not affected. As a result, it is conservatively estimated that the cost of data gathering and entry for a mail survey could be reduced substantially, from $1,116 (1% error) to $348 (3% error) or $100 (5% error).

**Introduction**

Increasingly, over the past two decades, Extension programs have been subject to scrutiny by funding agencies to see how well programs are being conducted and the extent to which clientele are benefiting from them. At the same time, federal, state, and local funds for Extension programs are becoming more difficult to secure. For these reasons, it is important that accountability-focused evaluations of Extension programs be cost-effective and produce valid, defensible results.

Sample size is an important factor to consider in conducting Extension evaluations, because it influences the cost of evaluations and the validity of results obtained. Larger samples reduce research error, but can be expensive, time-consuming, and often impractical. Smaller samples, on the other hand, are less expensive but the risk of error is increased and the validity of results can be questioned.

When researchers select a particular sample size, they are essentially answering two questions. The precision question is how accurately we want the calculated sample "statistic" to estimate the population "parameter." The confidence question is how sure we want to be that the population parameter falls within the desired limits around the sample mean. Greater precision (a 1% rather than a 5% margin of error) requires a larger sample. So does greater confidence (99% rather than 95%). Cochran's formula for calculating sample size incorporates both criteria (Cochran, 1977). The purpose of this study was to determine the effect of reducing sample size on the results and costs of program evaluation.
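
Cochran's formula for a continuous (interval) variable can be sketched in Python as follows. This is a minimal illustration, not the calculation used in the dissertations. The function names are hypothetical, and the critical value t, the standard deviation estimate s, the margin of error d, and the population size N are all inputs the user must supply. The example makes three further assumptions:

- the variable is measured on a 7-point scale;
- s is estimated as the scale range divided by 6;
- d is the margin of error multiplied by the range.

The article does not report the inputs behind its 558/552, 174, and 50 figures, so the output is not expected to reproduce them.

```python
import math
from typing import Optional


def cochran_n0(t: float, s: float, d: float) -> float:
    """Initial sample size for estimating a mean: n0 = t^2 * s^2 / d^2.

    t: critical value for the desired confidence (about 1.96 for 95%)
    s: estimated population standard deviation
    d: acceptable margin of error, in the same units as s
    """
    return (t * s / d) ** 2


def required_sample(t: float, s: float, d: float,
                    population: Optional[int] = None) -> int:
    """Round Cochran's n0 up to a whole respondent, first applying the finite
    population correction n = n0 / (1 + n0 / N) when N is given."""
    n = cochran_n0(t, s, d)
    if population is not None:
        n = n / (1 + n / population)
    return math.ceil(n)


# Hypothetical inputs: a 7-point scale, s = range / 6, d = margin * range.
SCALE_RANGE = 7
S_ESTIMATE = SCALE_RANGE / 6
for margin in (0.01, 0.03, 0.05):
    n = required_sample(1.96, S_ESTIMATE, margin * SCALE_RANGE)
    print(f"{margin:.0%} margin of error: n = {n}")
```

Under these assumptions, and without a finite population correction, the loop prints 1068, 119, and 43. Because n0 grows with 1/d², moving from a 5% to a 1% margin multiplies n0 by 25. This is why tighter margins quickly become expensive.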

**Methods**

**Data source and sample size**

The two-part question of cost-effectiveness and validity of research results implied in the above purpose was addressed by analyzing the data of two doctoral dissertation studies on issues in the Cooperative Extension System completed by Davis (1991) and Seals (1989) in the School of Vocational Education at Louisiana State University. Raw data for both studies were available to the researchers.

Davis' regional study covered six state Extension services in the southeastern United States. It focused on performance appraisal of county agents, specifically their perceptions of how the system was being administered and how an ideal performance appraisal system should be administered. A mail questionnaire containing fifty-six items in fifteen performance appraisal categories, each rated on a 7-point Likert-type scale, was sent to a proportionate sample of 602 agents randomly selected from state personnel lists.

Seals' contrasting statewide study involved a segment of a youth client group of a state Extension service. It dealt with the technical aspect of an Extension program, specifically nutrition, comparing a sample of high school 4-H members with a sample of non-members on selected characteristics, dietary practices, and food consumption patterns.

The two studies used similar sample sizes. Davis drew a total sample of 602 and received 558 usable responses; Seals had a total of 553 usable responses. These sample sizes were calculated with Cochran's formula based on a 1% margin of error for interval data. For comparison in this paper, two additional sample sizes were calculated: 174 at the 3% error margin and 50 at the 5% error margin.
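
The article does not describe how the 174- and 50-case samples were selected from the full data. One plausible procedure, assumed here, is a simple random subsample drawn without replacement from the usable responses. A fixed seed is used so the draw can be repeated. The helper name `draw_subsamples` and the placeholder respondent IDs are hypothetical.

```python
import random
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


def draw_subsamples(responses: Sequence[T], sizes: Sequence[int],
                    seed: int = 1) -> Dict[int, List[T]]:
    """Draw one simple random subsample, without replacement, per size.

    Each subsample is drawn independently from the full set of responses.
    Drawing the smaller sample from within the larger one is an equally
    plausible design that the source does not rule out.
    """
    rng = random.Random(seed)  # seeded so the same subsamples can be redrawn
    pool = list(responses)
    subsamples = {}
    for n in sizes:
        if n > len(pool):
            raise ValueError(f"cannot draw {n} cases from {len(pool)} responses")
        subsamples[n] = rng.sample(pool, n)
    return subsamples


# Placeholder IDs standing in for Davis' 558 usable responses.
respondent_ids = list(range(1, 559))
samples = draw_subsamples(respondent_ids, sizes=[174, 50])
print({n: len(cases) for n, cases in samples.items()})
```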

**Data analysis**

The scope of both studies was considerable, so only a subset of the original analyses was used in each case. For each study, the three sample sizes were compared on descriptive measures: means, standard deviations, and 95% confidence intervals around the mean. If the confidence intervals for the three sample sizes overlapped, the samples were not considered to be different. If the intervals did not overlap, the samples were considered to be different. Differences between means on selected variables were tested for each of the three sample sizes using the t-test.
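
The descriptive comparison and the overlap rule described above can be sketched as follows. The function names are hypothetical. Two simplifying assumptions are made:

- The rule is read pairwise: the samples count as "not different" only if every pair of intervals overlaps.
- The interval uses a normal critical value (about 1.96). An interval based on the t distribution would be slightly wider for small samples; at n = 50, for example, the t critical value is about 2.01.

```python
import math
import statistics
from statistics import NormalDist
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def describe(values: Sequence[float]) -> Tuple[float, float, Interval]:
    """Return the mean, sample SD, and approximate 95% CI around the mean."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD, n - 1 denominator
    z = NormalDist().inv_cdf(0.975)  # about 1.96 (normal approximation)
    half_width = z * sd / math.sqrt(n)
    return mean, sd, (mean - half_width, mean + half_width)


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def samples_not_different(intervals: List[Interval]) -> bool:
    """Overlap rule, read pairwise: 'not different' if every pair of CIs overlaps."""
    return all(overlaps(a, b)
               for i, a in enumerate(intervals)
               for b in intervals[i + 1:])
```

In use, `describe` would be called once per variable for each of the full, 174-case, and 50-case samples, and `samples_not_different` applied to the three resulting intervals. Overlap of 95% intervals is a more conservative criterion than a direct test of the difference between two means.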

**Results**

Davis expected to find a "discrepancy" between how agents perceived the performance appraisal system was being administered and how they believed it should ideally be administered. He measured this discrepancy as the difference between the mean perceptions of the "present" system and of an "ideal" system. For each of the fifteen performance appraisal categories, the discrepancy was statistically significant (p ≤ .0001) for the sample of 558. When the sample size was reduced to 174 and to 50, the discrepancy remained statistically significant (p ≤ .0001) in all fifteen categories.
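
The article does not say which form of the t-test Davis used for the present-versus-ideal discrepancy. Because each agent rated both the present and the ideal system, a paired t-test on each agent's difference score is one natural choice; the sketch below assumes that design. The name `paired_t` and the ratings are hypothetical. The function returns the t statistic and its degrees of freedom. Converting t to a p-value requires the t distribution, which the Python standard library does not provide.

```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t(present: Sequence[float], ideal: Sequence[float]) -> Tuple[float, int]:
    """Paired t statistic for (ideal - present) ratings from the same agents."""
    if len(present) != len(ideal):
        raise ValueError("each agent needs both a present and an ideal rating")
    diffs = [i - p for p, i in zip(present, ideal)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # assumes the differences are not all equal
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical 7-point ratings for five agents in one appraisal category.
t_stat, df = paired_t(present=[3, 4, 2, 5, 3], ideal=[5, 6, 5, 6, 5])
print(f"t = {t_stat:.2f}, df = {df}")
```

If SciPy is available, `scipy.stats.ttest_rel(ideal, present)` computes the same statistic together with a p-value.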

A second set of analyses concerned three important work-related variables: years in Extension, years in current position, and most recent appraisal score. For each of these variables, the 95% confidence intervals around the mean overlapped for all three sample sizes, indicating that the three samples did not differ on these variables. Both sets of analyses thus showed that reducing sample size did not affect the results of the study.

The analyses selected from Seals' study dealt with nine variables covering demographic characteristics and nutrition program outcome measures, and results were compared for the three sample sizes. One set of analyses dealt with confidence intervals. The 95% confidence intervals around the mean of each of the nine variables overlapped for all three sample sizes, indicating that the three samples did not differ on these variables.

A second set of analyses compared 4-H members and non-members on the same nine variables, to see whether similar differences between the two groups were obtained for each of the three sample sizes. The t-test was used to test these differences.

- In the largest sample (n = 552), statistically significant differences were found for seven of the nine variables.
- In the sample of n = 174, significant differences were found for five variables.
- In the sample of n = 50, only one variable showed a statistically significant difference.

Comparing n = 552 with n = 174, the significance decision was the same for five items but different for two. On these two items, probability levels changed from p = .04 to p = .07 and from p = .002 to p = .07. Although a p of .07 is not statistically significant at the .05 level, it indicates to researchers and research consumers that the variable merits further study. In contrast, comparing the results for n = 552 and n = 50 showed large differences in outcomes. Only one variable was significantly different in the small sample, compared with seven in the large sample, and the probability levels in the two samples were considerably different.
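
One way to see why significant differences may have disappeared as Seals' sample shrank: for a fixed difference between group means and a fixed standard deviation, the t statistic grows with the square root of the group sizes. The sketch below has two functions. The first computes Student's pooled-variance two-sample t statistic from raw scores; that the pooled form was used is an assumption. The second computes the t implied by summary values. Two further inputs are hypothetical, because the article does not report group sizes or effect sizes:

- each sample is split evenly between members and non-members;
- the group difference is 0.3 standard deviations.

This is not a reconstruction of Seals' analysis.

```python
import math
import statistics
from typing import Sequence, Tuple


def pooled_t(group_a: Sequence[float], group_b: Sequence[float]) -> Tuple[float, int]:
    """Student's two-sample t statistic with pooled variance, and its df."""
    n1, n2 = len(group_a), len(group_b)
    v1, v2 = statistics.variance(group_a), statistics.variance(group_b)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    return diff / math.sqrt(pooled_var * (1 / n1 + 1 / n2)), n1 + n2 - 2


def t_from_summary(mean_diff: float, sd: float, n1: int, n2: int) -> float:
    """t implied by a given mean difference and common SD at given group sizes."""
    return mean_diff / (sd * math.sqrt(1 / n1 + 1 / n2))


# Hypothetical: the same 0.3-SD group difference at each total sample size,
# split evenly between members and non-members.
for total in (552, 174, 50):
    half = total // 2
    print(f"n = {total}: t = {t_from_summary(0.3, 1.0, half, half):.2f}")
```

The loop prints t values of 3.52, 1.98, and 1.06. The two-tailed .05 critical values for these degrees of freedom range from about 1.96 to 2.01. By that yardstick, the same hypothetical difference is clearly significant at n = 552, borderline at n = 174, and far from significant at n = 50.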

**Implications for Extension Evaluation**

The results of this study of different sample sizes vis-a-vis cost considerations and validity of results suggest the following useful applications for conducting evaluations of Extension programs.

- The cost of evaluating Extension programs can be brought down by reducing sample size without compromising the evaluation results. In both studies, there was little or no difference in outcome when sample size was reduced from 558/552 to 174. Consider a mail survey, using a conservative cost estimate of $2 per sample subject. That estimate covers (a) the initial mailing of the survey instrument, two mail follow-ups of first-stage non-respondents, and a telephone follow-up of second-stage non-respondents for comparison with respondents, and (b) data entry. On that basis, the cost of the survey could be cut by more than two-thirds, from $1,116 for n = 558 to $348 for n = 174.
- If the purpose of the evaluation is descriptive, as most Extension evaluations are, larger margins of error can be used to calculate minimum sample size without substantial impact on study outcomes. In the analyses of both studies, reducing the sample size from the original n = 558/552 to either n = 174 or n = 50 produced no differences in descriptive outcomes (means and standard deviations). Using the smallest sample size, n = 50, would reduce evaluation costs further, to as little as $100.
- If the purpose of the evaluation is comparison, outcomes did not change materially when the margin of error was increased to 3%; the analyses of both studies support this conclusion. However, when the margin of error was increased to 5%, outcomes in Seals' study changed dramatically between the samples of n = 552 and n = 50, whereas outcomes in Davis' study changed very little. One reason for this difference between the two studies could be the homogeneity of the subject groups. Davis' population of Extension professionals in the southeastern United States could be a more homogeneous group than Seals' population of high school students. If further research establishes that homogeneity influences comparative findings, evaluators could make better decisions in selecting acceptable margins of error based on the nature of the population investigated.
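
The cost figures above follow from multiplying the article's flat estimate of $2 per sample subject by the sample size. The short sketch below makes that arithmetic explicit; `survey_cost` is a hypothetical helper name. Note that $1,116 corresponds to n = 558 (Davis' sample), since 552 × $2 = $1,104.

```python
COST_PER_SUBJECT = 2  # dollars per sample subject, the article's conservative estimate


def survey_cost(n: int, cost_per_subject: int = COST_PER_SUBJECT) -> int:
    """Mailing, follow-up, and data-entry cost at a flat per-subject rate."""
    return n * cost_per_subject


full_cost = survey_cost(558)
for n in (558, 174, 50):
    cost = survey_cost(n)
    print(f"n = {n}: ${cost:,} ({cost / full_cost:.0%} of the n = 558 cost)")
```

At this rate, the 3% sample costs less than a third of the 1% sample, and the 5% sample costs less than a tenth.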

In conclusion, given Extension's accountability mandate and budgetary constraints, Extension evaluators should consider the use of smaller sample sizes in conducting evaluation studies, especially when the primary purpose of the study is description of outcomes.

**References**

Cochran, W. G. (1977). Sampling techniques (3rd ed.). New York: Wiley.

Davis, W. L. (1991). Perceptions of performance appraisal by Cooperative Extension Service agents in selected southern states. Unpublished doctoral dissertation, Louisiana State University, Baton Rouge.

Seals, S. (1989). The effectiveness of the food and nutrition 4-H project in improving dietary practices and food consumption patterns of high school students in Louisiana. Unpublished doctoral dissertation, Louisiana State University, Baton Rouge.