Successive Hurdles, Test Weighting and Certification Rules: Part 2

In the previous article, I introduced the concept of weighting the exams that make up the battery of instruments in a selection process. This article explores that process in more depth. To begin with, some instruments lend themselves to being weighted, and thus to affecting the final ranking of candidates, while others do not. Which instruments are appropriate for ranking, and the weight given to those that are, should be established through a comprehensive job analysis designed to support the content validity model for test development.

There are numerous published methodologies for conducting job analyses that are designed to comply with the Uniform Guidelines on Employee Selection Procedures (UGESP), even though they may differ in how they combine subject matter experts’ ratings on KSAPs, which ultimately determine the weight given to selection components. Typically, these systems collect ratings on KSAPs and then review them to determine which ones have received ratings indicating that they are required at time of hire, are important for job success, and are linked to performing important job tasks effectively. Often, to make the system more manageable, the next step involves grouping KSAPs into domains, as recommended by several job analysis procedures designed to conform to the requirements of the UGESP.

Ultimately, the surviving KSAPs should be plugged into an Exam Plan Outline, which is essentially a grid with the types of selection instruments listed across the top and the surviving KSAPs listed down the left side. Each KSAP, along with its relative weight, is placed under the type of selection instrument that would most effectively measure it.

For example, in applying this model to the selection of police officers, we would commonly find KSAPs such as the ability to communicate verbally and the ability to read and comprehend training materials. We would also commonly find a battery consisting of written multiple-choice exams, oral exams, background investigations, psychological exams, medical exams, and physical ability exams. Plugging our two abilities into this hypothetical grid, we would put verbal communication under the oral board and the ability to read under the written multiple-choice exam. Totaling the weights in each column of the grid gives the weighting to be applied to each instrument in the selection process.

The following hypothetical grid shows how the weights from the job analysis can be taken from the KSAPs and assigned to the selection instruments in the battery.

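Exam types: WR = Written, OB = Oral Board, PF = Physical Fitness, BG = Background, PS = Psychological

| KSAPs                                            | WR | OB  | PF | BG | PS |
|--------------------------------------------------|----|-----|----|----|----|
| Ability to read and comprehend written materials | 5% |     |    |    |    |
| Ability to make quick decisions                  |    | 5%  |    |    |    |
| Ability to respond to questions verbally         |    | 5%  |    |    |    |
| Ability to apprehend and subdue suspects         |    |     | C  |    |    |
| 21 Years of Age                                  |    |     |    | R  |    |
| Stable thinking and mental processing            |    |     |    |    | C  |
| Total Weight                                     | 5% | 10% |    |    |    |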
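
The column totals in the grid can be reproduced with a few lines of Python. This is only a sketch of the bookkeeping, not a job analysis procedure: the data structure and the function name total_weights are hypothetical, and the placement of "make quick decisions" under the oral board is an assumption inferred from the 10% oral board total. Entries marked C or R carry no ranking weight in the grid, so the sketch skips them.

```python
from collections import defaultdict

# Hypothetical encoding of the exam plan grid: (KSAP, exam type, entry).
# Percent entries are ranking weights; "C" and "R" entries are copied
# from the grid as-is and carry no ranking weight.
EXAM_PLAN = [
    ("Ability to read and comprehend written materials", "WR", "5%"),
    ("Ability to make quick decisions", "OB", "5%"),  # column assumed
    ("Ability to respond to questions verbally", "OB", "5%"),
    ("Ability to apprehend and subdue suspects", "PF", "C"),
    ("21 Years of Age", "BG", "R"),
    ("Stable thinking and mental processing", "PS", "C"),
]


def total_weights(plan):
    """Sum the percentage weights in each exam-type column."""
    totals = defaultdict(float)
    for _ksap, exam_type, entry in plan:
        if entry.endswith("%"):  # skip non-weighted entries such as C and R
            totals[exam_type] += float(entry.rstrip("%"))
    return dict(totals)


totals = total_weights(EXAM_PLAN)
print(totals)  # {'WR': 5.0, 'OB': 10.0}
```

Only the written exam and the oral board end up with a total, which matches the Total Weight row of the grid.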

In establishing the weights to be applied to exams, another fundamental factor must be considered: whether or not a particular instrument is appropriate for ranking. As indicated previously, job analysis systems designed to support the content validity model for test development are critical for establishing the weights for the instruments in the test battery; however, the instruments used for background investigations, medical exams, and psychological exams are not typically developed and validated through this model. While the need for such exams can be demonstrated through the job analysis process, establishing the validity of the exams themselves typically requires the construct or criterion-related validity models.

It is difficult to support using instruments that measure constructs for ranking without proof that possessing more of a construct increases job performance, which is essentially the claim made by the content and criterion-related validity models. So while we cannot use the content validity model to demonstrate, for example, that having a more suitable background beyond the minimal threshold level makes one a better police officer, we can demonstrate that having more of an ability directly tied to job performance, such as a greater ability to read and comprehend written materials, can make one a better police officer.

Once this critical test has been applied to determine whether a component is suitable for ranking, the focus can shift to the actual weighting of components. As suggested earlier, some job analysis systems apply weights to individual KSAPs at the beginning of the process that remain with them throughout, while others allow for a reshuffling of the deck, so to speak, after some KSAPs have been eliminated from the selection process. Regardless of the system used, it is important to follow the process through its full cycle to be assured that it conforms to the UGESP. Ultimately, whichever system is used, it should result in percentages that reflect the portion each component will contribute to the final score.

Typically, selection processes developed from appropriate job analysis procedures will result in weighting written multiple-choice tests and oral exams, including structured interviews and assessment centers. It is most important that the tests are weighted in proportion to their job importance and that the job analysis documents the reason for the weighting scheme that is chosen. For example, if job analysis results indicated that these two components should be equally weighted, with each contributing 50% of the final score used for ranking, the common practice is to multiply a candidate’s scores on the oral board and written exam by .5 and then sum the results. (Note: It is important to remember that the range of weights used will vary depending on job analysis results and the other testing components being used. One study (Lowry, 1996) indicated that the median weight for the written multiple-choice test was 30%. There is no single correct weighting scheme; the job analysis must support the scheme that is chosen.) Similarly, written multiple-choice tests frequently incorporate multiple subtests. One variation includes two components, one cognitive and one non-cognitive, with the non-cognitive component resembling an interest questionnaire or a background data questionnaire such as IPMA-HR’s BDQ. These two components could also be weighted based on information obtained in the job analysis.
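
The equal-weighting arithmetic described above can be sketched in Python as follows. The candidate scores, the dictionary keys, and the function name composite_score are hypothetical, and the sketch assumes both exams are reported on the same scale. Any other weights supported by a job analysis could be substituted for the two .5 values.

```python
def composite_score(scores, weights):
    """Multiply each component score by its weight and sum the results."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    return sum(weights[name] * scores[name] for name in weights)


# Equal weighting: each component contributes 50% of the final score.
weights = {"written": 0.5, "oral_board": 0.5}

# Hypothetical candidate scores on a common scale.
candidate = {"written": 82.0, "oral_board": 90.0}

final_score = composite_score(candidate, weights)
print(final_score)  # 86.0
```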

It is important to remember that tests tend to self-weight based on their variance. This means that tests with more widely spread scores have a greater impact on the ranking of candidates than tests with more homogeneous scores.
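
A small illustration may make the self-weighting effect concrete. The four candidates and their scores below are invented for this sketch. The written scores are widely spread, the oral board scores are tightly bunched, and the two tests rank the candidates in opposite order. Even with nominal weights of 50% each, the final ranking simply follows the written test.

```python
from statistics import pstdev

# Hypothetical raw scores: the written test has a wide spread,
# the oral board a narrow one, and they rank candidates in opposite order.
written = {"A": 60, "B": 70, "C": 80, "D": 90}
oral = {"A": 88, "B": 87, "C": 86, "D": 85}

# Nominal 50/50 weighting applied directly to the raw scores.
composite = {c: 0.5 * written[c] + 0.5 * oral[c] for c in written}


def rank(scores):
    """Return candidates ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


print(pstdev(list(written.values())), pstdev(list(oral.values())))
print(rank(written))    # ['D', 'C', 'B', 'A']
print(rank(oral))       # ['A', 'B', 'C', 'D']
print(rank(composite))  # ['D', 'C', 'B', 'A']
```

Because the written scores vary ten times as much as the oral board scores in this example, the oral board has almost no influence on the final order despite its nominal 50% weight.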

To ensure that the instruments in the selection process are weighted according to the exam plan outline, it is important to apply a correction or adjustment to test scores so that they actually contribute their desired weight to the final score, as opposed to the weight established by their variance. One common method for ensuring that scores contribute their desired weight is to transform them into T scores or Z scores. This process uses the means and standard deviations of the instruments to establish standardized scores, which are then combined. The process for establishing T scores and Z scores is described in most texts on statistics, including Psychological Testing by Anne Anastasi. In addition, there is a statistical correction that provides a formula for determining what percentage of an exam score needs to be used to achieve the desired weight.
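
As a rough sketch of the standardization approach, the Python below converts raw scores to Z scores (raw score minus the mean, divided by the standard deviation) and to T scores (Z scores rescaled to a mean of 50 and a standard deviation of 10), then combines the Z scores with the desired weights. The candidate data and function names are hypothetical, and the use of the population standard deviation is an assumption; an agency might reasonably use the sample standard deviation instead.

```python
from statistics import mean, pstdev


def z_scores(raw):
    """Standardize scores to mean 0 and standard deviation 1."""
    values = list(raw.values())
    mu = mean(values)
    sd = pstdev(values)  # population SD, an assumption of this sketch
    return {c: (x - mu) / sd for c, x in raw.items()}


def t_scores(raw):
    """Rescale Z scores to mean 50 and standard deviation 10."""
    return {c: 50 + 10 * z for c, z in z_scores(raw).items()}


# Hypothetical raw scores with very different spreads and opposite rankings.
written = {"A": 60, "B": 70, "C": 80, "D": 90}
oral = {"A": 88, "B": 87, "C": 86, "D": 85}

z_written = z_scores(written)
z_oral = z_scores(oral)

# Apply the desired 50/50 weights to the standardized scores.
combined = {c: 0.5 * z_written[c] + 0.5 * z_oral[c] for c in written}

for c in written:
    print(c, round(z_written[c], 2), round(z_oral[c], 2), round(combined[c], 2))
```

In this contrived data set the two tests rank the candidates in exactly opposite order, so once both are standardized their equal weights cancel and every combined score comes out at essentially zero. Neither test dominates, which is what a true 50/50 weighting should produce here.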

The next article in this series will go into more depth on the topic of standardizing scores to adjust weights. Stat Trek includes formulas for important statistics. Most of these can be run using a statistical program like SAS or SPSS.

The keys to combining scores ultimately involve using comprehensive job analysis procedures to establish the desired weights and then using appropriate statistical methods to combine the scores so that they reflect those weights.


Resources:

Anne Anastasi and Susana Urbina (7th Edition, 2009)  Psychological Testing, Prentice Hall (ISBN-13: 978-0205703890)

Equal Employment Opportunity Commission, Civil Service Commission, Department of Labor, and Department of Justice.  (1978).  Uniform guidelines on employee selection procedures.  43 FR 38295.  Available as a free download from www.uniformguidelines

Phillip E. Lowry, A Survey of the Assessment Center Process in the Public Sector, 25 Pub. Personnel Mgmt. 307, 309 (1996), IPMA-HR


This is part two of a four-part series on successive hurdles, test weighting and certification rules. If you’ve just joined us, read up on part 1, which introduces the concept of weighting exams that comprise the battery of instruments in a selection process. Part 3 will go more in depth on the topic of standardizing scores to adjust weights and will be available to read on the Assessment Services Review on April 18, 2012. The series will conclude with part 4 on April 25, 2012. In case you missed it, check out Robert Burd’s previous series, Item Analysis In Public Safety.
