In my previous post, Thoughts on Adverse Impact Part 1, I offered my suggestions on how to plan and think about an adverse impact study. Summarizing and reviewing some of my main points:
- From the practitioner perspective, adverse impact involves a technology, not a science.
- The ways in which we prepare for, calculate, and interpret the results of an adverse impact analysis are guided primarily by the Uniform Guidelines and case law, as opposed to strict principles of statistics.
- Adverse Impact can be defined as practical or significant differences in selection rate as a function of protected group status.
In this month’s blog, I deal with more commonly discussed issues, such as various approaches to quantifying adverse impact and the sequencing of tests. Again, as a caution, this blog does contain my opinions on a controversial topic. In addition, although I have tried to simplify the discussion, this is a complex topic and it would be difficult enough to explain in a long article or book, let alone a blog.
Distinguishing Between Adverse Impact and Subgroup Differences
Although adverse impact and subgroup differences between protected classes are distinct concepts, the two terms are often used interchangeably, not only in the practitioner literature but by experts who should know better. In order to appreciate some of the issues in the analysis of adverse impact, one must first develop an understanding of the difference between these two frequently confused concepts.
Subgroup differences are standardized mean differences between protected class groups on some test or measure; they are a scientific or psychological phenomenon and generalize across situations. Unfortunately, for many tests or constructs there are well-documented race or sex differences. As an example, we often speak of a standardized difference of one standard deviation between African Americans and Whites on measures of cognitive ability. Although often expressed in standard deviation units, this standard deviation is quite different from the 2 or 3 standard deviation difference based on the Z test used to assess adverse impact. Because of their similarity, the two uses of standard deviation are easily misinterpreted or viewed as interchangeable when they are not. So, BE CAREFUL: if a professional article or plaintiff’s expert reports a standard deviation difference, ask yourself whether this is a standardized mean difference or a result based on a Z test.
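For readers who like to see the arithmetic, here is a minimal sketch of a standardized mean difference. The function name and the test-score summaries (means of 100 and 85, standard deviations of 15) are my own hypothetical illustrations, not data from any study.

```python
from math import sqrt

def standardized_mean_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference between two group means, in pooled standard deviation units."""
    pooled_variance = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / sqrt(pooled_variance)

# Hypothetical score summaries (illustrative only), at two sample sizes.
d_small = standardized_mean_difference(100, 15, 50, 85, 15, 50)
d_large = standardized_mean_difference(100, 15, 5000, 85, 15, 5000)
print(round(d_small, 2), round(d_large, 2))
```

Notice that the standardized mean difference is the same with 50 people per group as with 5,000. A Z test statistic, in contrast, grows with the number of people, which is one reason the two kinds of "standard deviation" should not be confused.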
Adverse impact is a concept that arises from technical, administrative and legal concerns. Fundamentally, adverse impact is not a scientific concept but depends upon a long laundry list of non-random variables, including an organization’s recruitment practices. Adverse impact is a function of the implementation of decision rules, where the decisions are made in real-world situations. As previously mentioned, adverse impact can be defined as practical or significant differences in selection rate as a function of protected group status.
The 4/5ths or 80% Test
Perhaps the simplest and most basic test is the 4/5ths or 80% test, which is often viewed as a rule of thumb. Simply put, one sets up a ratio, or divides, the selection rate for the protected class, or minority group, by the selection rate for the non-protected class, or majority group. If the obtained ratio is less than 4/5ths, or 80%, then one concludes that adverse impact is present.
The problems with the 4/5ths test, and its interpretation, are well known, but one of the issues is that one can avoid adverse impact by simply having a high enough selection rate for both groups. If the selection rate for the minority group is above 80%, then there can be no adverse impact, since the ratio will always be above .80. So if we have a 5 point difference in selection ratios where the overall hiring rate is 90% for the minority group but 95% for the majority group, then the adverse impact ratio is .95. But if we have a 5 point difference in selection ratios where the overall hiring rate is 5% for the minority group but 10% for the majority group, then the adverse impact ratio is .50 and we have adverse impact.
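The two scenarios above can be checked with a few lines of code. This is only a sketch; the function names and the `threshold` parameter are my own labels.

```python
def adverse_impact_ratio(minority_rate, majority_rate):
    """Minority selection rate divided by majority selection rate."""
    if majority_rate <= 0:
        raise ValueError("majority selection rate must be greater than zero")
    return minority_rate / majority_rate

def violates_four_fifths(minority_rate, majority_rate, threshold=0.80):
    """True when the adverse impact ratio falls below 4/5ths."""
    return adverse_impact_ratio(minority_rate, majority_rate) < threshold

# Same 5 point gap in selection rates, very different ratios.
high = adverse_impact_ratio(0.90, 0.95)  # high overall selection rates
low = adverse_impact_ratio(0.05, 0.10)   # low overall selection rates
print(f"{high:.2f} {violates_four_fifths(0.90, 0.95)}")
print(f"{low:.2f} {violates_four_fifths(0.05, 0.10)}")
```

The first ratio is about .95 and passes the rule; the second is .50 and fails it, even though the gap between the groups is 5 points in both cases.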
An even more confusing situation can occur in age discrimination cases. Consider a situation, which frequently occurs in the private sector, where a group of employees is laid off and then rehired. If we look at the situation as a rehire, we find that 80% of the older employees are rehired and 90% of the younger employees are rehired. Our ratio is .80/.90, or .89, which leads to a conclusion of no adverse impact. If we look at the situation as a firing, 20% of the older workers were laid off and 10% of the younger workers were laid off. The adverse impact ratio is .10/.20, or .50. We can now conclude that there is adverse impact in that older workers are more likely to be laid off. The difference in selection ratios is still 10 percentage points, but we reach the opposite conclusion from the same data. The point is that the same percentage difference can lead to different conclusions depending on the overall selection rate and also depending on how we phrase the question.
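To make the framing problem concrete, here is a short sketch that computes both ratios from the same rehire figures. The variable names are mine, and the layoff ratio is set up the way I did above, with the younger group's layoff rate in the numerator.

```python
# Rehire rates from the example above.
older_rehired = 0.80
younger_rehired = 0.90

# Framed as a rehire: older rate divided by younger rate.
rehire_ratio = older_rehired / younger_rehired

# Framed as a layoff: the same people, counted by who was NOT rehired.
older_laid_off = 1 - older_rehired
younger_laid_off = 1 - younger_rehired
layoff_ratio = younger_laid_off / older_laid_off

for label, ratio in (("rehire", rehire_ratio), ("layoff", layoff_ratio)):
    verdict = "below 4/5ths" if ratio < 0.80 else "at or above 4/5ths"
    print(f"{label} framing: ratio = {ratio:.2f} ({verdict})")
```

The data never change between the two framings; only the outcome we choose to count does.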
A more fundamental problem is that the 4/5ths test can lead to a conclusion of adverse impact when the difference in ratios is not statistically significant, especially in cases of small sample sizes. This leads then to the use of statistical tests as an alternative to the 4/5ths test.
The Z Test
Due to issues with the 4/5ths test, experts often calculate some type of statistical test that leads to results that can be expressed in terms of probability. One of the tests that you will see frequently mentioned is the z test. Because of space limitations, I am going to stay away from complicated statistical explanations and formulae in this blog, but basically the z test tells us how many standard deviations of difference there are between two proportions or percentages. The z test is fairly easy to calculate by hand using a calculator, it is easy to program in Excel, and a large number of online calculators are available. All you need are the percentages and the number of people in each group.
As previously mentioned, this is not the same as standardized mean differences. So, a z test might tell us that there are 3 standard deviations of difference between a selection ratio of 5% for minorities and a selection ratio of 10% for the majority group. The nice thing about this test is that the results are presented in easily interpretable units: a z of more than 2 units is seen as statistically significant, as it is viewed as having less than a 5% chance of occurring.
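As a rough sketch of what the online calculators are doing, the code below computes one common version of the z test, the pooled two-proportion z. The function name and the applicant counts (500 per group) are my own assumptions, chosen so that the 5% versus 10% example comes out near 3 standard deviations.

```python
from math import erfc, sqrt

def two_proportion_z(selected_a, n_a, selected_b, n_b):
    """Pooled two-proportion z statistic and its two-sided p-value."""
    rate_a = selected_a / n_a
    rate_b = selected_b / n_b
    pooled = (selected_a + selected_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_error == 0:
        raise ValueError("z is undefined when everyone or no one is selected")
    z = (rate_a - rate_b) / std_error
    p_value = erfc(abs(z) / sqrt(2))  # two-sided tail area of the normal curve
    return z, p_value

# Hypothetical counts: 25 of 500 minority applicants (5%) and
# 50 of 500 majority applicants (10%) are selected.
z, p_value = two_proportion_z(25, 500, 50, 500)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

The sign of z simply reflects which group is entered first; it is the size of z that is compared against the cutoff of roughly 2.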
There are of course issues with the z test. Given the exact same data, I could come up with at least 5 different zs, if not more. Thus, two experts could come to different conclusions using z tests on the same data. More critically, as with any statistical test, the z test is a function of sample size. With large sample sizes, the z test is likely to be significant; with small sample sizes, it is difficult to obtain a significant z test. This is a concern because in both the public and private sector, we are dealing with larger and larger applicant pools, which means that if you use a statistical test you are likely to obtain evidence of adverse impact.
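Both concerns can be seen in a small sketch. I hold the selection rates at 5% and 10% and change only the number of applicants, and I compute z two ways, with a pooled and an unpooled standard error. These two are simply the variants I picked for illustration, not a full list, and the group sizes are hypothetical.

```python
from math import sqrt

def z_pooled(sel_a, n_a, sel_b, n_b):
    """z using one pooled selection rate for the standard error."""
    pooled = (sel_a + sel_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (sel_a / n_a - sel_b / n_b) / std_error

def z_unpooled(sel_a, n_a, sel_b, n_b):
    """z using each group's own selection rate for the standard error."""
    rate_a, rate_b = sel_a / n_a, sel_b / n_b
    std_error = sqrt(rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b)
    return (rate_a - rate_b) / std_error

results = {}
for n in (100, 500, 2000):        # hypothetical applicants per group
    minority_selected = n // 20   # 5% selected
    majority_selected = n // 10   # 10% selected
    results[n] = (
        z_pooled(minority_selected, n, majority_selected, n),
        z_unpooled(minority_selected, n, majority_selected, n),
    )
    print(n, [round(value, 2) for value in results[n]])
```

With 100 applicants per group, neither version reaches the cutoff of about 2; with 500 or 2,000 per group, both do, although the selection rates never changed. The pooled and unpooled values also differ slightly from each other at every sample size.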
The Fisher Exact Test
The analysis of selection data leads to a natural two-by-two table, where the rows are protected group status, minority or majority, and the columns are the outcomes of the decision, usually pass/fail or hire/do not hire. This type of data can be analyzed using a classic Chi-Square test, but many experts will use a variant on the Chi-Square test called the Fisher Exact test. Standard statistical software can calculate the Fisher Exact test and there are also many online calculators.
The Fisher Exact test tells you whether there is a difference in the type of decision as a function of protected group status. The results are reported as a probability, and if the probability is less than 5 times out of 100, then the conclusion is that there is adverse impact, as there is a relationship between group status and the selection decision.
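For those curious about what the software is doing, here is a minimal sketch of a two-sided Fisher Exact test written with only the Python standard library. The function name and the counts are hypothetical, and I am assuming the usual two-sided convention of adding up every table that is no more likely than the observed one; a standard statistical package is the better choice for real work.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher Exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are the two groups; columns are selected / not selected.
    """
    row1, row2 = a + b, c + d
    selected = a + c
    total = row1 + row2

    def table_probability(x):
        # Hypergeometric probability of x selections in the first row.
        return comb(row1, x) * comb(row2, selected - x) / comb(total, selected)

    observed = table_probability(a)
    lowest = max(0, selected - row2)
    highest = min(row1, selected)
    # Sum tables that are no more likely than the observed one; the small
    # tolerance guards against floating-point ties.
    p = sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= observed * (1 + 1e-7)
    )
    return min(1.0, p)

# Hypothetical data: the same 5% versus 10% selection rates at two pool sizes.
p_small = fisher_exact_two_sided(1, 19, 2, 18)      # 20 applicants per group
p_large = fisher_exact_two_sided(25, 475, 50, 450)  # 500 applicants per group
print(f"small pool: p = {p_small:.3f}")
print(f"large pool: p = {p_large:.4f}")
```

Only the larger pool produces a probability below .05, even though the selection rates are the same in both tables.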
The issues with the Fisher Exact test are similar to those for the z test. In particular, as the size of the groups increases, the likelihood of a statistically significant result increases, at least in practical situations. Again, as we encounter large pools of applicants, it is more likely we will conclude that there is evidence of adverse impact.
Order and Interpretation of Tests
At this point, by way of summary, I offer my view on how to order and interpret tests of adverse impact. The approach I present has been developed and refined over a number of years. It is based on case law, the hypotheses being tested, and practical considerations; however, one could find experts who would disagree with my reasoning. I will also note that what I present here is a simplified version; in conducting an analysis I will consider a host of more complex issues. Nevertheless, I offer the following as the appropriate sequence and interpretation:
1. Apply the 80% rule.
   - If the 80% rule is violated (i.e., the adverse impact ratio is less than 80%), then proceed to Step 2.
   - If the 80% rule is not violated (i.e., the adverse impact ratio is 80% or greater), then stop and conclude there is no adverse impact.
2. Apply Fisher’s Exact Test.
   - If Fisher’s Exact Test is significant (i.e., the associated probability value is less than .05), then proceed to Step 3.
   - If Fisher’s Exact Test is non-significant (i.e., the associated probability value is .05 or greater), then stop and conclude there is no adverse impact.
3. Consider whether there are any alternative explanations, such as aggregation paradoxes. If there are no apparent alternative explanations, then conclude there is adverse impact.
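The sequence above can be sketched as a simple screening function. This is a simplified illustration under my own assumptions: the function names, the returned messages, and the example counts are hypothetical, and the final step is returned as a prompt for human review because the search for alternative explanations cannot be automated.

```python
from math import comb

def fisher_p(a, b, c, d):
    """Two-sided Fisher Exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, selected = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, selected - x) / comb(total, selected)

    observed = prob(a)
    span = range(max(0, selected - row2), min(row1, selected) + 1)
    return min(1.0, sum(prob(x) for x in span if prob(x) <= observed * (1 + 1e-7)))

def screen_adverse_impact(min_selected, min_total, maj_selected, maj_total, alpha=0.05):
    """Walk through the 80% rule, then Fisher's Exact Test, then flag for review."""
    min_rate = min_selected / min_total
    maj_rate = maj_selected / maj_total

    # Step 1: the 80% rule. Assumption: if no one in the majority group is
    # selected, the ratio is undefined and is treated here as not violated.
    if maj_rate == 0 or min_rate / maj_rate >= 0.80:
        return "no adverse impact (80% rule not violated)"

    # Step 2: Fisher's Exact Test on selected / not selected counts.
    p_value = fisher_p(
        min_selected, min_total - min_selected,
        maj_selected, maj_total - maj_selected,
    )
    if p_value >= alpha:
        return "no adverse impact (Fisher's Exact Test not significant)"

    # Step 3: cannot be automated; a person must look for alternative
    # explanations such as aggregation paradoxes.
    return "review alternative explanations before concluding adverse impact"

# Hypothetical applicant pools.
print(screen_adverse_impact(45, 50, 95, 100))   # stops at Step 1
print(screen_adverse_impact(1, 20, 2, 20))      # stops at Step 2
print(screen_adverse_impact(25, 500, 50, 500))  # reaches Step 3
```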
To illustrate the aggregation paradox, consider the following example of a Police Department that gives a test at both the Sergeant and Lieutenant levels. The results are as reported below:
|Applications||Promoted||Selection Ratio||Adverse Impact Ratio|
As you can see from the table above, if an analyst were working at the level of the overall results from the test, they would conclude there was adverse impact. However, at the Sergeant and Lieutenant levels individually, there is no adverse impact; the selection ratios are identical. This is an example of an aggregation paradox. There are a number of ways to handle this issue, but an in-depth discussion goes beyond what can be discussed in this blog.
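A small sketch shows how an aggregation paradox can arise. The counts below are hypothetical numbers that I made up for illustration; they are not the figures from the Police Department example.

```python
# Hypothetical counts, NOT the figures from the example above.
# level -> group -> (applicants, promoted)
results = {
    "Sergeant": {"minority": (80, 16), "majority": (20, 4)},
    "Lieutenant": {"minority": (20, 10), "majority": (80, 40)},
}

def impact_ratio(minority, majority):
    """Adverse impact ratio from (applicants, promoted) pairs."""
    (min_apps, min_promoted), (maj_apps, maj_promoted) = minority, majority
    return (min_promoted / min_apps) / (maj_promoted / maj_apps)

def combined(group):
    """Add up applicants and promotions for one group across both levels."""
    applicants = sum(results[level][group][0] for level in results)
    promoted = sum(results[level][group][1] for level in results)
    return applicants, promoted

by_level = {
    level: impact_ratio(groups["minority"], groups["majority"])
    for level, groups in results.items()
}
overall = impact_ratio(combined("minority"), combined("majority"))

for level, ratio in by_level.items():
    print(f"{level}: adverse impact ratio = {ratio:.2f}")
print(f"Overall: adverse impact ratio = {overall:.2f}")
```

In this made-up data, both groups are promoted at 20% for Sergeant and at 50% for Lieutenant, so the ratio is 1.00 at each level. Because most of the minority applicants tested at the Sergeant level, where fewer people are promoted, the combined ratio falls to about .59 and fails the 80% rule.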
Of course, although it is a common error to assume otherwise, a finding of adverse impact does not equate to having a discriminatory test. Given large enough sample sizes, it is likely that your tests will result in adverse impact. The best response to potential adverse impact is to administer job related tests, where the use of the test can be supported based upon solid validity evidence.
In the end, as simple as the calculation of adverse impact may appear at first glance, it is a complicated endeavor. If you have questions, it is best to consult an expert. If you have any questions you would like me to answer in a future blog, please feel free to email them to me.