Running Tests Fall Down

A public sector physical ability test bit the dust on summary judgment in Easterling v. The State of Connecticut Department of Correction (DOC), No. 3:08-CV-0826 (D. Conn. 6/5/2011). Easterling, an applicant for Correction Officer (CO), failed one part of the physical ability test, the 1.5-mile run, in 2004. She subsequently filed a class action sex discrimination suit. The class was certified in January 2010; both sides filed motions for summary judgment.

The physical ability test has four events: sit and reach, one minute of sit-ups, one minute of push-ups, and the timed 1.5-mile run. All four events must be passed. Cut scores are age- and sex-normed: each is set at the 40th percentile of performance for the applicant’s age/sex cohort, as established by the Cooper Institute. (The issue of age- and sex-norming is discussed below.) The Institute is a non-profit health and preventive medicine organization established in 1970 by Dr. Kenneth Cooper, who popularized aerobics.

Adverse impact against women was established in three administrations of the run in 2004-2006 via the Four-Fifths Rule and Fisher’s Exact Test. DOC presented evidence that when results of the run event are pooled over CO, State Trooper Trainee, and Public Safety Trainee, the women’s pass rate meets or comes close to meeting the Four-Fifths Rule.

There had been an expectation that the norms used would eliminate adverse impact. The plaintiff presented expert testimony that the female norms were based on a sample of women more fit than the overall female population. DOC argued that it had targeted recruitment at racial/ethnic minorities, and that these minorities had poorer cardiovascular health than the general population, which could account for the unexpectedly low pass rates for women. But the court noted that DOC had not presented evidence that black or Hispanic women were being recruited, nor any evidence that minority women were less likely to pass than minority men.
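
For readers who have not run these two checks themselves, here is a minimal sketch of how the Four-Fifths Rule and Fisher’s Exact Test are typically computed for a pass/fail event. The counts are hypothetical and are not taken from the opinion; the function names are mine, and the two-sided p-value uses the common convention of summing every table that is no more probable than the observed one.

```python
from math import comb


def impact_ratio(focal_passed, focal_total, ref_passed, ref_total):
    """Focal group's pass rate divided by the reference group's pass rate."""
    return (focal_passed / focal_total) / (ref_passed / ref_total)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups, columns are pass/fail. With all margins held fixed,
    the count in cell `a` follows a hypergeometric distribution. The
    p-value sums the probability of every possible table that is no more
    probable than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so ties with the observed table are included.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-7))


# Hypothetical counts for illustration only (NOT from the Easterling record).
women_passed, women_total = 12, 20
men_passed, men_total = 45, 50

ratio = impact_ratio(women_passed, women_total, men_passed, men_total)
p_value = fisher_exact_two_sided(
    women_passed, women_total - women_passed,
    men_passed, men_total - men_passed,
)

# 0.60 / 0.90 is about 0.67, below the 0.80 threshold of the Four-Fifths Rule.
print(f"impact ratio = {ratio:.2f}, below four-fifths: {ratio < 0.8}")
print(f"Fisher exact two-sided p = {p_value:.4f}")
```

The ratio is a rule of thumb about practical significance; the exact test addresses whether a difference of that size could plausibly be chance given the number of applicants. Pooling across job titles, as DOC proposed, changes both the counts and the result, which is why the choice of applicant pool mattered.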

Experts for DOC admitted that there was no evidence that the 40th percentile cut scores correlated with minimum level of aerobic capacity to perform the CO job. No study had been done to determine cardiovascular capacity for CO. Then there was material from the Cooper Institute that stated that Cooper norms are not defensible employment standards.

Apart from considering whether the proper applicant pool was all applicants for the set of public safety jobs mentioned above (and concluding that the proper pool was applicants for CO), the court essentially found that there was no evidence supporting use of the test.

The opinion could have ended there. However, the court went into a discussion of “job relatedness” and “business necessity” applied to cut scores.

The two terms are equivalent as interpreted in the Second Circuit, specifically in Gulino v. New York State Education Department, 460 F.3d 361 (2nd Cir. 2006). Gulino saw language from Albemarle Paper v. Moody, 422 U.S. 405, 431 (1975) as the “basic rule” for what these terms meant:

“Discriminatory tests are impermissible unless shown, by professionally acceptable methods, to be predictive of or significantly correlated with important elements of work behavior which comprise or are relevant to the job or jobs for which candidates are being evaluated.” (Similar language first appeared in the EEOC’s Guidelines on Employee Selection Procedures, 29 CFR § 1607, 35 Fed. Reg. 12333 (Aug. 1, 1970).)

This is the “significantly correlated standard.” The run test did not meet it. In addition, there is the “minimum qualifications standard,” which the test could not meet because it varied the cut scores with age and gender. The court discussed the minimum qualifications standard in light of Wards Cove v. Atonio, 490 U.S. 642, 659 (1989), which stated that there was no requirement that a challenged practice be essential or indispensable. This, says the court, rejects the minimum qualifications standard as too onerous. While the court sees ambiguity in some of the Supreme Court’s language across decisions, its conclusion is that the Supreme Court likely never adopted such a standard.

But, says the court, a minimum qualifications standard was used in Lanning v. SEPTA, 181 F.3d 478, 481 (3rd Cir. 1999). More specifically, as the court notes, it was the cut score to which this standard was applied. Lanning took the position that since the Civil Rights Act of 1991 specified that business necessity meant whatever it meant prior to Wards Cove, the law overturned “significantly correlated” and restored “minimum qualifications,” although, as the court noted here, there may have been no such standard to return to. There are, in my opinion, problems with the court’s discussion of the issues; they, like most of the discussion itself, are not essential to resolving the case. The biggest problem seems to be that courts have felt compelled to write explanations to get around Lanning whenever similar issues surface. In any event, it’s the significantly correlated standard that is appropriate for this case.

According to the court, “By definition, cutoff times that vary by gender and age cannot represent a measure of the minimum aerobic capacity necessary for successful performance as a CO. Only a single cutoff could meet this standard” (slip op. at 33). There are issues here, perhaps not the ones that the court had in mind.

It seems curious that the lawfulness of age- and sex-norming did not come up. On its face, using different cutoff times, at least for sex, might be construed as running afoul of the Civil Rights Act of 1991. Interestingly, the need for different norms by demographic group to provide meaningful scores apparently has not come up in litigation. The American Psychological Association indicated after 1991 that it would be providing guidance, but it apparently did not do so, perhaps because there was no actual case to address.

The court’s opinion provides little detail on the testing procedure. If it resembles other aerobic tests, aerobic capacity is not measured directly. Subjects perform a physical task and a measure (sometimes pulse rate, here apparently run time) is related by mathematical formula to estimated aerobic capacity. The step that is apparently missing here is that the aerobic capacity estimate is related to the physical demands of the job.

But because the measure on the test is not directly related to the job and its meaning varies with the characteristics of the subjects (notably sex and age), having one cut score on the test measure is an inaccurate, if not simply meaningless, standard. Arguably, Congressional intent behind the prohibition of norming by Title VII class is best served in this situation by having the test measure translated into a common scale of predicted job performance. This could provide the single cutoff that the court thought to be essential. The paradox is that to get to that common scale, there need to be separate norms.

That’s one problem. Another is that here, as in the ongoing case involving the New York Fire Department, the test was lost on summary judgment. That means that, with circumstances viewed favorably toward the employer, there was no trial because there was no way that the employer could prevail. We’re coming up on the 40th anniversary of the inclusion of state and local government under Title VII. Presumably we as a profession have learned a few things about effective public sector testing along the way. What is going on here?

Reprinted with permission from the Personnel Testing Council of Metropolitan Washington.

This entry was posted in Legal by Richard Tonowski.

About Richard Tonowski

Rich joined EEOC in 2001 as a Psychologist, worked as the assistant HR director for strategic policy and planning from 2003 to 2006, and then became Chief Psychologist. In that role he reviews test validation documentation, conducts statistical analyses regarding employment practices, and consults with EEOC attorneys and investigators. Prior to his time with EEOC, he had over 20 years of experience involving public sector test development and validation, performance appraisal, employee surveys, diversity management, and labor relations. He also had experience in providing written and oral testimony as an expert witness in court cases, and in federal sector hearings conducted by EEOC and the Merit Systems Protection Board. Rich was awarded his Ph.D. in psychology by Rutgers University, and is certified as a Senior Professional in Human Resources by the credentialing affiliate of the Society for Human Resource Management. He is also an Adjunct Associate Professor of Human Resources Management and Development at University of Maryland University College where he teaches a graduate course.
