In the two previous articles, we looked at the statistical and technical aspects of item analysis. Individual test developers will value the statistical computations differently depending on their knowledge of statistics and their understanding of how to apply them. However, a test developer or test user with even a rudimentary understanding of item analysis can still make sound decisions about the effectiveness of test items and, therefore, of written exams. As we emphasized previously, IPMA-HR conducts item analyses on potential test items during its test development process and maintains item analysis data from successive administrations of all exams. These practices ensure that only items that perform well continue to be used, and they reflect a standard that all test developers should employ. Note that our discussion focuses on typical four-response multiple-choice items and true-false items.
Effective use of item analysis information for item and test revision is where science meets art. "Cleaning up" test items and tests requires the basic information from an item analysis, effective analysis of the applicant response data, and application of established principles for writing good test items. There is an extensive body of scholarly work on item writing and item response theory, and the effective practitioner should take the time to review some of it before writing or "repairing" test items. Although this article focuses on actual test developers, it can also be extremely valuable for those who purchase or lease tests, since it can help them evaluate the quality of the tests they are considering.
In the previous article, we began our discussion of the valuable information available through an item analysis, focusing on the two most readily available statistics. The first, of course, is the Difficulty Index. As the name implies, this index indicates how difficult an item is. It is expressed as a proportion or percentage: the number of candidates who answered the item correctly divided by the total number of candidates who responded to it. If nine out of ten respondents answered an item correctly, the index would be .9, or 90%. From this illustration we can also see that the index actually has an inverse relationship with item difficulty: the higher the index, the easier the item.
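The calculation described above is simple enough to sketch in a few lines. This is a minimal illustration, not IPMA-HR's actual software; the function name and data format are assumptions for the example.

```python
def difficulty_index(responses):
    """Difficulty Index: proportion of candidates who answered the item
    correctly, out of all candidates who responded to it.

    `responses` is a list of booleans, one per candidate:
    True if the candidate answered the item correctly.
    """
    return sum(responses) / len(responses)


# Nine of ten respondents answered correctly, as in the article's example.
answers = [True] * 9 + [False]
print(difficulty_index(answers))  # 0.9, i.e. 90%
```

Note the inverse relationship in action: an index of .9 marks an easy item, while an index of .3 would mark a hard one.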
The second index we discussed was the Item Discrimination Index. Essentially, this index reflects how the candidates who performed best on the test as a whole responded to a specific item compared with how the candidates who performed worst responded to that same item. The top 27% and bottom 27% of test performers are used in the calculation, and the index is typically expressed as the proportion of the top group that answered the item correctly in relation to the proportion of the bottom group that answered it correctly.
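The upper-lower comparison above can also be sketched in code. This is a simplified sketch that assumes the conventional formulation, in which the index is the proportion correct in the top 27% minus the proportion correct in the bottom 27%; the function name and data format are illustrative, not from the source.

```python
def discrimination_index(scores_and_items, pct=0.27):
    """Upper-lower Item Discrimination Index for a single item.

    `scores_and_items` is a list of (total_score, item_correct) pairs,
    one per candidate. Candidates are ranked by total test score, and
    the top `pct` group is compared to the bottom `pct` group
    (27% by convention).
    """
    ranked = sorted(scores_and_items, key=lambda pair: pair[0], reverse=True)
    n = max(1, round(len(ranked) * pct))          # group size
    top = [correct for _, correct in ranked[:n]]   # best performers
    bottom = [correct for _, correct in ranked[-n:]]  # worst performers
    # Proportion correct in top group minus proportion correct in bottom group.
    return sum(top) / n - sum(bottom) / n


# Ten candidates: (total test score, answered this item correctly?)
candidates = [
    (95, True), (90, True), (88, True), (80, True), (75, False),
    (70, True), (65, False), (60, False), (55, True), (50, False),
]
# Top 3 all answered correctly; only 1 of the bottom 3 did,
# so the item discriminates well between strong and weak performers.
print(round(discrimination_index(candidates), 2))  # 0.67
```

A positive index means the item favors strong test-takers, as it should; an index near zero or negative flags an item worth "repairing."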
Any organization using written exams as part of their selection processes that doesn’t take the time to review the information provided by an item analysis is overlooking a treasure trove of information. Without performing an item analysis on a written exam and acting upon the information gained from such an analysis, a jurisdiction truly does not know how that exam is performing.
There are two fundamental concepts involved in test utility: validity and reliability. Simply put, validity refers to whether a test measures what it is intended to measure, and reliability refers to how consistently it does so. Item performance directly impacts both. Test items are the basic element for gathering information about test participants. For a test to have the validity and reliability that make it worth using, its items must perform optimally, gathering the best information possible as accurately as possible. A test developer who does not use the information available from an item analysis has no idea how the items are performing. If items perform poorly, all other statistical analyses become less meaningful and the test is not suited to its intended purpose. Hiring decisions based on such a test are questionable; they become another layer of what is ultimately a house of cards without an adequate foundation. In that case, the written exam has not played its role in optimizing accurate selections and supporting the mission of the organization.