The Journal of Extension

April 2020 // Volume 58 // Number 2 // Tools of the Trade // v58-2tt4

The Stockman's Scorecard: Validity and Reliability as an Instrument for Measuring Stockmanship

The quality of beef cattle stockmanship typically is evaluated through quantitative and qualitative measurements of animal behavior. The Stockman's Scorecard is an observation instrument that has been developed to directly measure the actions of beef cattle stockmen. This article documents a pilot project for determining the content validity, internal consistency, and interrater reliability of the scorecard as an evaluation instrument. Our results show that the scorecard is a valid and reliable instrument for measuring the actions of stockmen. The instrument can be a valuable tool for Extension educators in evaluating their stockmanship programming impacts.

John K. Yost
Assistant Director of Farm Operations
Davis College School of Design
West Virginia University
Morgantown, West Virginia

Jarred Yates
Farm Manager
Reymann Memorial Farms
West Virginia University
Wardensville, West Virginia

David J. Workman
Assistant Professor
West Virginia University Extension Service
West Virginia University
Moorefield, West Virginia

Matthew E. Wilson
Associate Dean for Research
Davis College of Agriculture, Natural Resources, & Design
West Virginia University
Morgantown, West Virginia


The behavior, and subsequent welfare, of livestock is directly affected by the behavior and actions of stockmen (Zulkifli, 2013). Adverse handling practices induce significant fear in cattle, which can cause serious losses in productivity, increased handling problems, injuries to both animals and handlers, and diminished animal welfare (Rushen, Taylor, & Passille, 1999). Cattle may react negatively to any initial handling practice but can habituate over time (Maston, 2006), although it has been shown that livestock will not habituate to extremely adverse handling (Grandin, Curtis, Widowski, & Thurmon, 1986). The goal of a livestock handling activity should be to minimize fearful reactions (Gonyou, 1995). Cattle handlers are instructed to be calm, quiet, slow, and deliberate when working animals (Grandin, 2015).

Extension educators and other researchers and outreach practitioners conduct stockmanship training to improve the livestock handling skills of stockmen. Program outcomes from these trainings have been evaluated through qualitative evaluation (Adams, Kristula, & Hain, 2019; Coleman, Hemsworth, Hay, & Cox, 2000) and formal quantitative assessments (Beef Quality Assurance, n.d.) of animal behavior. These measurements assess improvements in stockmanship within an operation at the herd level (Rushen & Passille, 2015). However, if aberrations are identified in these animal observations, how are we to determine which stockperson actions were the root cause?

In an attempt to more precisely evaluate the quality of beef cattle stockmanship, we developed the Stockman's Scorecard as an evaluation tool for measuring the quality of a stockman's cattle handling ability. The purposes of this report are to

  1. establish the validity and reliability of the evaluation instrument and
  2. confirm the interrater reliability for multiple observers evaluating the same individual.

The Stockman's Scorecard

The instrument (see Figure 1) lists stockman actions that may be observed during a beef cattle handling activity (Grandin & Dessing, 2008). If an action is likely to produce a positive animal behavior, no points are deducted. Those actions that could produce a negative animal behavior are assigned a minus 5 (−5) or a minus 10 (−10) point deduction according to their perceived impact on animal behavior. When evaluating a stockman, the observer positions himself or herself in a location where it is possible to monitor the stockman herding cattle but not interfere with the activity. The evaluator observes the stockman throughout the activity and places a checkmark next to any actions listed on the card that were observed during the session. At the conclusion, the negative point totals are added up and subtracted from 100 points to determine the final score.
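The scoring arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration only: the action names and their deductions in `DEDUCTIONS` are hypothetical placeholders, not the items on the actual scorecard, and the function name `final_score` is our own. Each checked action is counted once, mirroring a single checkmark per listed action.

```python
# Hypothetical actions and deductions for illustration only;
# the real items and their point values are those on the scorecard.
DEDUCTIONS = {
    "yells_or_whistles": 5,
    "moves_too_fast": 5,
    "strikes_animal": 10,
    "overuses_prod": 10,
}


def final_score(observed, deductions=DEDUCTIONS, base=100):
    """Return the base score minus the deduction for each checked action.

    An action is deducted once no matter how many times it appears in
    `observed`, because the card carries one checkmark per action.
    """
    total_deduction = sum(deductions[action] for action in set(observed))
    return base - total_deduction


if __name__ == "__main__":
    # One 5-point and one 10-point action observed: 100 - 15
    print(final_score(["yells_or_whistles", "strikes_animal"]))  # 85
```

A session with no checked actions returns the full 100 points.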

Figure 1.
The Stockman's Scorecard

Determining Validity and Reliability

To produce a usable evaluation instrument, one must establish that it is a valid and reliable tool for measuring the underlying construct (Huck, 2012). Validity refers to the accuracy of the instrument, answering the question "Does the instrument measure the construct it is intended to measure?" The related concept of reliability provides assurance that the instrument consistently collects the desired data. If we compare validity and reliability to shooting a gun, validity is related to whether we are hitting the target and reliability is related to whether we are hitting the same point on the target with each shot. If the instrument is both valid and reliable, we will be hitting the bull's-eye with each shot.

Content, or face, validity of the scorecard was established by a panel of experts, following the guidance of Huck (2012). The completed scorecard was provided to four recognized experts in cattle handling and behavior. They agreed that the card included all items one would wish to consider when evaluating a cattle stockman, so their review resulted in no changes. The instrument's internal consistency, or reliability, was determined through pilot testing at three Midwest cattle feeding facilities. Volunteer observers were trained in the use of the scorecard, and they evaluated 19 stockmen. Each action was recorded in Excel as a "1" (action observed) or a "0" (action not observed). A split-half analysis was conducted in SPSS (Version 25) to calculate a Spearman-Brown coefficient from individual final scores (Carmines & Zeller, 1979). The instrument constructs were rated exemplary: the coefficient of 0.76 exceeded the 0.30 threshold for interitem correlations (Robinson, Shaver, & Wrightsman, 1991).
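For readers unfamiliar with split-half analysis, the following Python sketch shows the general technique. It is not a reproduction of the SPSS procedure used in the pilot. It assumes an odd/even split of the items and unweighted counts of observed actions in each half, and the function names and example records are hypothetical. The Spearman-Brown step adjusts the correlation between the two halves to estimate the reliability of the full-length instrument.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_brown(r_half):
    """Step the half-test correlation up to full-test reliability."""
    return 2 * r_half / (1 + r_half)


def split_half_reliability(records):
    """records: one list of 0/1 item codes per stockman.

    Assumption: items are split by alternating position (odd/even).
    """
    half_a = [sum(row[0::2]) for row in records]
    half_b = [sum(row[1::2]) for row in records]
    return spearman_brown(pearson_r(half_a, half_b))


if __name__ == "__main__":
    # Hypothetical 0/1 records for five stockmen on six items.
    example = [
        [1, 0, 1, 1, 0, 0],
        [0, 0, 1, 0, 0, 0],
        [1, 1, 1, 1, 0, 1],
        [0, 0, 0, 0, 0, 0],
        [1, 1, 0, 1, 1, 0],
    ]
    print(round(split_half_reliability(example), 2))
```

Other ways of splitting the items (for example, first half versus second half) can give a different coefficient from the same data, so the split rule should be reported alongside the result.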

The next step was to determine whether multiple observers could use the scorecard to score an individual stockman in a similar manner. For this purpose, six videos were created of stockmen working cattle at three Iowa feedyards. Three trained observers independently scored the six individuals using the scorecard, and results were recorded in Excel as a "1" or "0." The final scores were used to calculate an intraclass correlation coefficient (ICC) in SPSS (Version 25) (Hallgren, 2012). The observers' ICC was 0.66, which falls within the 0.60 to 0.74 range classified as good interrater reliability (Cicchetti, 1994).
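As an illustration of how an ICC can be computed from a table of final scores, the Python sketch below implements one common form, the two-way random-effects, absolute-agreement, single-measures ICC, often written ICC(2,1). The choice of this form is an assumption made for the example, as are the function names and the example scores. The labels in `cicchetti_label` follow the cutoffs in Cicchetti (1994): below 0.40 poor, 0.40 to 0.59 fair, 0.60 to 0.74 good, and 0.75 or above excellent.

```python
def icc_2_1(scores):
    """ICC(2,1) for a table with one row per subject and one column per rater."""
    n = len(scores)      # number of subjects (stockmen)
    k = len(scores[0])   # number of raters (observers)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def cicchetti_label(icc):
    """Verbal label for an ICC using the Cicchetti (1994) cutoffs."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


if __name__ == "__main__":
    # Hypothetical final scores: six stockmen (rows), three observers (columns).
    example = [
        [90, 85, 95],
        [70, 75, 80],
        [100, 95, 100],
        [60, 70, 65],
        [85, 80, 90],
        [75, 85, 80],
    ]
    value = icc_2_1(example)
    print(round(value, 2), cicchetti_label(value))
```

Different ICC forms (one-way versus two-way, consistency versus absolute agreement, single versus average measures) can give different values for the same data, so the form should be reported with the coefficient.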


Grandin (2014) stated that "people manage the things they measure" and went on to say that "measurement is essential because it enables management to determine if procedures are improving or getting worse" ("3.1. Packers"). Program evaluation is an important, yet challenging, component of Extension educator duties. Extension educators and specialists are recruited for their subject matter expertise and are typically not trained in evaluation techniques. Moreover, educators with a program emphasis in agriculture and natural resources have lower program evaluation skills than their counterparts in other program areas (Ghimire & Martin, 2013). Because program delivery methods vary widely, educators often find it difficult to develop accurate evaluation instruments (Diaz, Kumar Chaudhary, Jayaratne, & Warner, 2019).

The pilot results indicate that the Stockman's Scorecard is a valid, reliable instrument that can be used to assign a numerical score to the actions of cattle handlers. The tool has varied applications. Extension educators, and other stockmanship trainers, can use the instrument in a pretest/posttest format to determine the effectiveness of their stockmanship training. Additionally, Extension educators can provide facility managers with the scorecard to evaluate their employees and identify targeted training needs to improve abilities and reduce animal stress. Furthermore, the instrument may serve as a complement to current assessment procedures to evaluate the human factors associated with positive animal welfare efforts.


This project was supported through a pilot project grant from the National Beef Quality Assurance Advisory Committee.


References

Adams, A. L., Kristula, M., & Hain, M. V. (2019). Dairy cattle handling programs: Training workers and cattle. Journal of Extension, 57(4), Article v57-4rb8.

Beef Quality Assurance. (n.d.). Beef Quality Assurance Feedyard Assessment Guide.

Carmines, E. G., & Zeller, R. A. (1979). Reliability and validity assessment (Vol. 17). Thousand Oaks, CA: Sage.

Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychological Assessment, 6(4), 284–290.

Coleman, G. L., Hemsworth, P. H., Hay, M., & Cox, M. (2000). Modifying stockperson attitudes and behavior towards pigs at a large commercial farm. Applied Animal Behaviour Science, 66(1–2), 11–20.

Diaz, J., Kumar Chaudhary, A., Jayaratne, K. S. U., & Warner, L. A. (2019). Program evaluation challenges and obstacles faced by new Extension agents: Implications for capacity building. Journal of Extension, 57(4), Article v57-4a1.

Ghimire, N. R., & Martin, R. A. (2013). Does evaluation competence of extension educators differ by their program area of responsibility? Journal of Extension, 51(6), Article v51-6rb1.

Gonyou, H. W. (1995). How animal handling influences animal behavior. In Prairie Swine Centre, Inc., 1995 annual research report.

Grandin, T. (2014). Animal welfare and society concerns: Finding the missing link. Meat Science, 98(3), 461–469.

Grandin, T. (2015). How to improve livestock handling and reduce stress. In T. Grandin (Ed.), Improving animal welfare: A practical approach (2nd ed., pp. 69–95). Wallingford, UK: CAB International.

Grandin, T., Curtis, S. E., Widowski, T. M., & Thurmon, J. C. (1986). Electro-immobilization versus mechanical restraint in an avoid-avoid choice test for ewes. Journal of Animal Science, 62, 1469–1480.

Grandin, T., & Dessing, M. (Eds.). (2008). Humane livestock handling: Understanding livestock behavior and building facilities for healthier animals. North Adams, MA: Storey Publishing.

Hallgren, K. A. (2012). Computing inter-rater reliability for observational data: An overview and tutorial. Tutorials in Quantitative Methods for Psychology, 8(1), 23–34.

Huck, S. W. (2012). Reading statistics and research (6th ed.). Boston, MA: Pearson Education, Inc.

Maston, K. M. (2006). The effect of weekly handling on the temperament of peripuberal crossbred beef heifers (Master's thesis). Virginia Polytechnic Institute and State University, Blacksburg, VA.

Robinson, J. P., Shaver, P. R., & Wrightsman, L. S. (1991). Criteria for scale selection and evaluation. In J. P. Robinson, P. R. Shaver, & L. S. Wrightsman (Eds.), Measures of personality and social psychological attitudes (pp. 1–16). New York, NY: Academic Press.

Rushen, J., & Passille, A. M. D. (2015). The importance of good stockmanship and its benefits to animals. In T. Grandin (Ed.), Improving animal welfare: A practical approach (2nd ed., pp. 125–138). Wallingford, UK: CAB International.

Rushen, J., Taylor, A., & Passille, A. M. D. (1999). Domestic animals' fear of humans and its effect on their welfare. Applied Animal Behaviour Science, 65, 285–303.

Zulkifli, I. (2013). Review of human-animal interactions and their impact on animal productivity and welfare. Journal of Animal Science and Biotechnology, 4(25). doi:10.1186/2049-1891-4-25