The Second Circuit upheld a fire department promotional test in M.O.C.H.A. Society v. City of Buffalo, Nos. 11-2184-cv, 10-2168-cv, 7/30/2012. There are two docket numbers because separate appeals were taken for the testing in 1998 and in 2002. In both cases the district court (same judge) granted summary judgment to the city, on disparate treatment regarding the 1998 testing and on the overall testing procedure in 2002. The court ruled for the city on disparate impact after a bench trial regarding the 1998 test and foreclosed litigation on what was essentially the same situation in 2002. Here’s what makes the case interesting, straight from the opinion of the Court of Appeals:
Can an employer show that promotional examinations having a disparate impact on a protected class are job related and supported by business necessity when the job analysis that produced the test relied on data not specific to the employer at issue? We answer that question in the affirmative based on the record developed in these related cases. While employer-specific data may make it easier for an employer to carry his burden in the second step of Title VII analysis, such evidence is not required as a matter of law to support a factual finding of job relatedness and business necessity.
The appellate court noted that the trial judge, John T. Curtin, had longstanding experience with promotions in the Buffalo Fire Department, starting with a 1978 finding of a pattern-or-practice of discrimination against African Americans, Latinos, and women. Presumably Judge Curtin would be no pushover for a weak showing of validity.
Here’s the history. The city asked the New York State Civil Service Department for a Fire Lieutenant test in 1997. (The appellate court noted that the city has since used tests provided by private vendors.) The position is that of a first-line supervisor. Dr. Wendy Steinberg had been working on a “lower level” fire promotion test series for the previous three years, which included job analyses of all ranks from fire departments across New York. After initial development, the task and KSA surveys went to every incumbent firefighter in the state except those in NYC and Rochester (which have their own testing functions). There were 795 task surveys sent to Fire Lieutenants and 316 responses (39.75% return rate). The appellate court found that the returns “exceeded the numbers required by accepted statistical methodologies to establish 95% confidence in the survey results.” In a footnote, the court noted that Dr. Steinberg did not define 95% statistical confidence or provide a source, but M.O.C.H.A. offered no challenge. The issue on appeal was whether, as a matter of law, the statewide job analysis was suitable for use in Buffalo absent other direct evidence.
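The opinion does not say how the 95% confidence threshold was computed. Purely as an illustration, the sketch below applies one common textbook approach, Cochran's sample-size formula with a finite population correction, to the Fire Lieutenant survey figures above. The 5% margin of error, the worst-case proportion of 0.5, and the function name are assumptions made for this example, not anything stated in the record.

```python
import math


def required_sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Cochran's formula with finite population correction.

    All defaults are illustrative assumptions: a 5% margin of error,
    z = 1.96 for 95% confidence, and p = 0.5 as the most conservative
    proportion. The opinion does not specify any of these.
    """
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Shrink it for a finite population of known size.
    return math.ceil(n0 / (1 + (n0 - 1) / population))


surveys_sent = 795   # task surveys sent to Fire Lieutenants (from the opinion)
responses = 316      # task surveys returned (from the opinion)

return_rate = responses / surveys_sent
needed = required_sample_size(surveys_sent)

print(f"Return rate: {return_rate:.2%}")
print(f"Responses needed under these assumptions: {needed}")
print(f"Responses received: {responses}")
```

Under these assumed parameters the 316 returns exceed the computed requirement, which is at least consistent with the court's statement. A different margin of error or method would give a different threshold.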
And that evidence was lacking. The appellate court noted that 833 task surveys had been sent to Buffalo for all ranks, and a total of only 68 were returned. There were no responses at all to the KSA survey. One might think that the city, having requested the test, would drum up participation in its development. Not only were responses lacking for the survey, but the city declined to provide subject matter experts for development of questions regarding fire-fighting knowledge. At trial, the city offered no expert witness; Dr. Steinberg testified, but as a fact witness regarding the test development and administration process.
Steinberg, others at Civil Service, and a Fire Advisory Committee (subject matter experts) linked KSAs with tasks and arrived at five sub-test areas to be covered by the Fire Lieutenant examination. There was 90% correlation across jurisdictions in the tasks identified as critical to the position; methodology was not mentioned. In addition, Steinberg reviewed Fire Lieutenant test plans for 14 large fire departments across the United States and found that they were consistent with the New York results.
The resulting test was administered in 1998 and had disparate impact against blacks. Of 89 black firefighters who took the test, 38 passed, for a rate of about 42.7%. The white pass rate was 74.3%, or 133 out of 179.
Testing was also conducted in 2002, with some new questions, but with the test based on the same job analysis. Again, there was disparate impact against blacks. (The numbers were not included in the record; the city stipulated that there was disparate impact.)
In 2008 (!), a five-day bench trial was conducted regarding the 1998 administration. The court found for the city regarding disparate impact. The city moved for summary judgment in the disparate treatment case, and also on the 2002 claim, which had by now surfaced. The city argued that if the first test was valid and the second was developed to be like it, there was no issue of fact to resolve regarding the validity of the second test. The court agreed in 2010.
There was no fight on the impact. As mentioned above, the city stipulated that there was impact in 2002. The appellate court noted that, following circuit precedent that gives “great deference” to the Four-Fifths Rule, the trial court properly determined impact for 1998. So the case moved on to validity, with the city arguing that the factors for content validity in Guardians v. Civil Service Commission of NYC, 630 F.2d 79 (2d Cir. 1980) had been met. The trial judge noted that Guardians cautioned against a too-rigid application of the Uniform Guidelines.
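As a worked illustration of the Four-Fifths Rule, the sketch below computes the impact ratio from the 1998 pass counts reported earlier (38 of 89 black candidates, 133 of 179 white candidates). The function and variable names are illustrative only. This is a simple arithmetic check, not a reconstruction of the trial court's analysis.

```python
def impact_ratio(focal_passed, focal_total, reference_passed, reference_total):
    """Ratio of the focal group's pass rate to the reference group's pass rate."""
    focal_rate = focal_passed / focal_total
    reference_rate = reference_passed / reference_total
    return focal_rate / reference_rate


# Threshold from the Four-Fifths Rule: a ratio below 0.8 is generally
# treated as evidence of adverse impact.
FOUR_FIFTHS = 0.8

# 1998 Fire Lieutenant examination counts as reported in the article.
ratio = impact_ratio(focal_passed=38, focal_total=89,
                     reference_passed=133, reference_total=179)
adverse_impact_indicated = ratio < FOUR_FIFTHS

print(f"Impact ratio: {ratio:.2f}")
print(f"Below four-fifths threshold: {adverse_impact_indicated}")
```

The ratio comes to roughly 0.57, well under 0.8, which fits with the absence of any dispute over impact for the 1998 test.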
The trial judge first concluded that content validity was an appropriate strategy. Then came the review of the Guardians factors. Although plaintiffs contended that the job analysis was no more than an “other cities guess” as to the Fire Lieutenant job in Buffalo, and that Buffalo was far larger than any of the other New York cities in the Civil Service study, the consistency in findings across jurisdictions gave credence to the job being highly similar regardless of location. The appellate court found this an appropriate application of induction, quoting from a circuit precedent that, “In the civil context, a finding that X is more likely than not true is the equivalent to a finding that X is true.” The appellate court distinguished the present case from EEOC v. Atlas Paper Box Co., 868 F.2d 1487 (6th Cir. 1989), which held that “the employer failed to sustain burden by assuming jobs were similar and that familiar ‘intelligence tests are always valid’ [stated in the current opinion].” While it is possible that there were other tasks and KSAs involved in the Buffalo job, the law does not require that all possibilities be discounted even where the standard of proof is beyond a reasonable doubt. The trial judge had noted that no evidence of such differences had been presented, and this notation did not imply an inappropriate shifting of the burden of proof onto the plaintiffs.
Plaintiffs had Dr. Kevin Murphy as their expert. He opined that the job analysis simply assumed that the job of Fire Lieutenant was similar from place to place without any detailed analysis. As noted above, the city did not provide an expert. The appellate court noted that Dr. Steinberg’s description of the job analysis and the way she had examined job similarity was a factual refutation of the criticism that could be understood by the fact finder without expert help. The conclusion would have been stronger had Buffalo linked the job analysis findings to its own situation, but this was not required as a matter of law for the trial judge to conclude that the job analysis was adequate.
The next Guardians factor is reasonable competence in designing generic sub-tests (those not involving firefighter knowledge). Problems will arise if the test developers were not professionals or if no study was performed to ensure that the questions were comprehensible and unambiguous. The appellate court noted that besides Steinberg’s personal expertise, she had relied on the Civil Service testing division. Division director Paul Kaiser testified regarding division procedures and consultation with outside experts. A check to see if the test was comprehensible and unambiguous was not done in this situation, but the “cross-occupational” questions had been used in previous examinations and had been screened for problems.
So there was no clear error in the trial court’s determination of validity. The trial court had noted that plaintiffs had not proposed alternatives that would be valid and have less impact.
This is an important case. At the same time, it is important to note the limits of conclusions that can be drawn. As the court noted, the issue is not whether a fact finder could have found against Buffalo, but whether the fact finder was required by law to do so. A strategy that concentrated on testing flaws (if any) might succeed where an attack on generalized job analysis failed. The case was not an endorsement of validity generalization (VG) as the term is commonly understood. Dr. Murphy distinguished the content approach here from the meta-analysis of statistical data that marks usual VG studies. While general cognitive ability may underlie everything cognitive, it was not an issue here; the appellate court also distinguished this case from Atlas.
Despite plaintiffs’ contention that there was no proof that the generic sub-tests measured anything, the appellate court found that various versions of that argument were not persuasive. The city did not have to produce statistical evidence that success on the generic sub-tests is predictive of job success. This is a content validity effort. The trial court determined that it was not construct validity. The test development procedure provided for content relatedness. M.O.C.H.A. objected that the test questions were unrelated to the constructs being measured; specifically noted by the appellate court was the contention that the supervision items had nothing to do with supervision. The appellate court noted no evidence of this on the record. Plaintiffs had received the 1998 questions but had not contested the relationship of the questions and the subject matter they were supposed to cover.
The dissent noted that Buffalo was required by Title VII to prove that the test was “job related for the position in question” [emphasis in the dissent] and did “virtually nothing” to carry its burden. The district court allowed Buffalo to avoid liability for its use of a racially discriminatory test on the ground that Buffalo had proven content validity without any of the following:
• Participating in the test preparation
• Having an expert to evaluate whether the test was suitable for use in Buffalo or to testify to its validity
• The test developer having anything to substantiate that the test reflected Buffalo’s undisclosed needs
• Review of the test by knowledgeable people in Buffalo, and
• Examining proper weighting of test components for Buffalo.
The dissent was also concerned that this decision “will make it virtually impossible for a municipality not to certify for use a test that has clear discriminatory impact” because of Ricci. To this remark the majority wrote that it was too early to tell how Ricci will ultimately play out.
Reprinted with permission from the Personnel Testing Council of Metropolitan Washington.