- People online don’t read.
- Olny samrt poelpe cna raed tish – cna you?
The opening epigraphs both deal with readability. The first is a commonly encountered claim that people scan rather than read when perusing material online. What does that mean for employment websites and the associated assessments? The second is a teaser that often makes the false claim that very few people can read the material, when in fact almost everyone can. It illustrates that individuals can make sense out of what appears to be unreadable or scrambled text. Both have implications for our topic for this two-part blog, which involves the readability of assessments.
The measurement of readability and the establishment of appropriate reading levels are critical responsibilities faced on a regular basis by many selection specialists and personnel managers in the public sector. This task involves analyzing both the materials used on the job and the tests or assessments used in selection. Readability can be defined as the ease with which material can be comprehended by its intended audience.
Unfortunately, most of our knowledge of the impact of readability on assessments was developed in an era when we used paper-and-pencil, multiple-choice tests. Even that literature is limited in that most of it deals with educational tests. Very few studies look at the actual impact of readability on the difficulty of employment tests or on potential racial bias in tests. I could spend this blog complaining ad nauseam about researchers conducting highly artificial studies of irreproducible phenomena of little generalizability, while ignoring questions of real practical importance, but that is another topic for another day or forum.
One of the questions we will examine in the second part of this blog is whether readability is still relevant for computer-based tests. However, before we do, we will review the more traditional literature on readability and how we measure readability.
In Part 1, we investigate approaches to readability based on:
- The measurement of grammatical features or readability formulas.
- The linguistic perspective.
- Job analysis.
Readability formulas are usually based on simple variables such as the number of syllables, number of words, and sentence length. There are estimated to be over 200 existing formulas for estimating the readability of text. Although we will not review all 200+ here in the blog, we will briefly discuss some of the major indices. Based on a review of the recent research, and on availability through Word software, the three most commonly used indices are the two Flesch measures (the Flesch Reading Ease score and the Flesch-Kincaid Grade Level) and the SMOG.
The beauty of the Flesch is that it can be easily calculated in Word, making it very simple to use and a popular choice. The Flesch Reading Ease score was one of the earliest readability formulas and remains one of the most popular. The formula relies upon only two variables, average sentence length (in words) and average word length (in syllables), which are combined to produce a score on a scale of roughly 0 to 100, with higher scores indicating text that is easier to read. The Flesch-Kincaid Grade Level test relies upon the same variables, but uses a formula that results in an estimate of grade level; this grade level estimate is very useful in daily practice. Overall, research suggests that the Flesch is reliable and valid, depending upon the criterion used. A limitation for personnel selection is that it may underestimate grade levels above seventh grade.
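For readers who want to see the arithmetic behind the two Flesch indices, the following is a minimal sketch in Python using the commonly published coefficients. The function names are our own, and the sketch assumes the word, sentence, and syllable counts have already been obtained; Word's internal counting rules (especially for syllables) may differ, so results will not necessarily match Word exactly.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Flesch Reading Ease: higher scores indicate easier text."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    avg_sentence_length = words / sentences   # words per sentence
    avg_word_length = syllables / words       # syllables per word
    return 206.835 - 1.015 * avg_sentence_length - 84.6 * avg_word_length


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Flesch-Kincaid Grade Level: an estimate of U.S. school grade."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    avg_sentence_length = words / sentences
    avg_word_length = syllables / words
    return 0.39 * avg_sentence_length + 11.8 * avg_word_length - 15.59


# Hypothetical passage: 100 words, 5 sentences, 150 syllables.
ease = flesch_reading_ease(100, 5, 150)
grade = flesch_kincaid_grade(100, 5, 150)
print(f"Reading Ease: {ease:.1f}, Grade Level: {grade:.1f}")
```

Because both formulas use the same two averages, a passage with longer sentences or longer words will always score as harder on both indices.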
The SMOG is a simplification of the FOG. The number of words with three or more syllables is counted in a sample of three ten-sentence blocks of text. Then, the approximate square root of the number of multisyllabic words is calculated. Finally, three (3) is added to the square root to find the reading grade level. Due to the prevalence of multi-syllable words in technical or trade-related documents, some experts recommend against the use of the SMOG for the analysis of job knowledge tests and the related source material.
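The SMOG steps described above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the syllable counts per word are assumed to be supplied by the user, and the "approximate square root" is interpreted here as the square root of the nearest perfect square.

```python
import math


def count_polysyllables(syllable_counts: list[int]) -> int:
    """Count the words with three or more syllables."""
    return sum(1 for s in syllable_counts if s >= 3)


def smog_grade(polysyllable_count: int) -> int:
    """SMOG grade from the polysyllable count in a 30-sentence sample.

    Takes the square root of the nearest perfect square, then adds 3.
    """
    if polysyllable_count < 0:
        raise ValueError("count must be non-negative")
    root = math.isqrt(polysyllable_count)       # floor of the square root
    # Move up one if the next perfect square is closer than the previous one.
    if polysyllable_count - root * root > root:
        root += 1
    return root + 3


# Hypothetical sample: 25 polysyllabic words across the three
# ten-sentence blocks gives sqrt(25) + 3 = grade 8.
print(smog_grade(25))
```

The sketch also makes the concern about technical documents concrete: because the grade depends only on the count of multi-syllable words, a passage dense with trade terminology will receive a high grade regardless of how familiar those terms are to job incumbents.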
Researchers in composition and rhetoric have argued that readability formulas fail to capture the elements that impact the difficulty of text and fail to consider the motivation and ability of the reader. This has led to proposals for new methods of assessing the readability of a document based on more complex, contextual analysis. Reflecting the dislike for formulas, some of these methods are primarily qualitative and rely heavily upon human judgment in analyzing the flow of communication between the author and the audience, which probably limits their usefulness in the assessment context. Other methods do result in some type of quantitative evaluation. For simplicity, we will classify these methods into those relying upon the cloze procedure and those relying upon advances in cognitive psychology.
The Cloze Procedure attempts to overcome the limitations of formulas by assessing the readability of passages through the use of an audience that is similar to the passages’ intended audience. The Cloze Procedure involves constructing a cloze passage in which every fifth word from the reading text is deleted. The research participants are then given the cloze passage (with the words deleted) and asked to supply the original words. A readability measure is then determined by examining the percentage of deleted words that each reader correctly predicted. A high percentage, for example 57% or more, indicates that the passage can be successfully read by the intended audience. One index that relies upon the Cloze technique, as well as an updated version of a traditional procedure, is the Dale-Chall Readability Formula.
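To make the mechanics of the Cloze Procedure concrete, here is a minimal Python sketch that deletes every fifth word from a passage and scores one participant's responses. The function names, the blank marker, and the exact-match scoring rule (ignoring case and surrounding punctuation) are our own simplifying assumptions; published cloze studies may use different deletion and scoring rules.

```python
def make_cloze(text: str, n: int = 5, blank: str = "_____"):
    """Replace every n-th word with a blank; return (passage, deleted words)."""
    passage, answers = [], []
    for position, word in enumerate(text.split(), start=1):
        if position % n == 0:
            answers.append(word)
            passage.append(blank)
        else:
            passage.append(word)
    return " ".join(passage), answers


def _normalize(word: str) -> str:
    # Ignore case and surrounding punctuation when comparing words.
    return word.strip(".,;:!?\"'()").lower()


def cloze_score(answers: list[str], responses: list[str]) -> float:
    """Percentage of deleted words that one participant restored exactly."""
    if not answers:
        raise ValueError("passage is too short to contain any blanks")
    correct = sum(
        _normalize(a) == _normalize(r) for a, r in zip(answers, responses)
    )
    return 100.0 * correct / len(answers)


text = "The quick brown fox jumps over the lazy dog near the river bank today"
passage, answers = make_cloze(text)
print(passage)
print(cloze_score(answers, ["jumps", "by"]))  # one of two blanks restored
```

In practice, the scores would be averaged over a sample of participants drawn from the intended audience and compared against a benchmark such as the 57% figure mentioned above.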
An index based on cognitive theory is the Coh-Metrix. Without getting too complicated, the Coh-Metrix analyzes how sentences and constructs are related to each other, in addition to traditional features. The Coh-Metrix software not only performs the appropriate analysis but also results in a report that analyzes many features of the document. An advantage of this procedure is that it can incorporate information concerning the reader’s level of background knowledge. There is evidence that the results are more valid than those provided by the Flesch, but I do not know of any studies using the Coh-Metrix to assess employment tests.
Newer methods are being developed that rely upon software that can be trained to determine the grade level of web pages. Although I will admit that these methods are beyond my understanding, they involve determining how likely it is that a language model for a particular grade level could generate a given combination of words. Again, I am unaware of studies that apply these methods to employment tests, but they may hold future promise in that the language modeling approach can be applied to web-based material.
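The general idea behind the language modeling approach can be illustrated with a deliberately simplified Python sketch. It is not the method used in any published study: it assumes a toy word-frequency (unigram) model per grade level with add-one smoothing, the training sentences are invented, and the function names are hypothetical. The grade whose model makes the new text most probable is chosen.

```python
import math
from collections import Counter


def train_unigram(texts: list[str]) -> Counter:
    """Count word frequencies in sample texts for one grade level."""
    return Counter(w for t in texts for w in t.lower().split())


def log_likelihood(text: str, counts: Counter, vocab_size: int) -> float:
    """Log-probability of the text under an add-one smoothed unigram model."""
    total = sum(counts.values())
    return sum(
        math.log((counts[w] + 1) / (total + vocab_size))
        for w in text.lower().split()
    )


def predict_grade(text: str, models: dict[int, Counter]) -> int:
    """Return the grade whose model assigns the text the highest likelihood."""
    vocab = set().union(*(c.keys() for c in models.values()))
    vocab_size = len(vocab) + 1  # one extra slot for unseen words
    return max(models, key=lambda g: log_likelihood(text, models[g], vocab_size))


# Invented toy training data, for illustration only.
models = {
    3: train_unigram(["the cat sat on the mat", "the dog ran to the cat"]),
    12: train_unigram([
        "the committee evaluated the statistical methodology",
        "the analysis required rigorous statistical evaluation",
    ]),
}
print(predict_grade("the cat ran", models))
print(predict_grade("rigorous statistical analysis", models))
```

A realistic system would need large samples of graded text for each level, which is part of the cost and difficulty that currently limits these approaches.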
Methods based on linguistics, cognitive science, and computer science hold a great deal of promise for the future. However, due to the cost and difficulty of such approaches at present, they are unlikely to displace the simpler indices available in Word for most assessment professionals.
From a human resource perspective, the assessment of material for readability, as well as the design of tests for ease of understanding, can be seen as incorporated into the process of both job analysis and test design. In job analysis, one analyzes materials read on the job or in training, and also identifies appropriate source material or texts. One could argue that readability should not be an issue if the test is designed to be content valid, the test has psychological and physical fidelity, and there is an isomorphism between the assessment material and the test. Thus, a professional job analysis and careful test construction should lead to built-in readability or readability through content validity.
I must admit to being a strong proponent of the position that through job analysis and test development we ensure readability, as long as we pay attention to normal professional concerns and cautions. However, the cynic might argue that I am putting too much trust in the test developer and that we need some independent and quantitative check, such as that provided by readability indices. Nevertheless, I would argue vociferously for the position that tests developed by professionals or by testing companies achieve admirable levels of readability and if anything, are too easy in terms of reading demands; we will consider this argument in more detail in the next blog.
Conclusion and Next Blog
The availability of the Flesch Reading Ease test and the Flesch-Kincaid Grade Level index as options in the Word software program makes both attractive choices for assessment practitioners. However, as the linguistic perspective points out, the use of traditional readability formulas may provide very little information on the actual difficulty of a piece of writing. This is even more likely to be true when the writing in question consists of items on a multiple-choice job knowledge test, where a number of questions arise as to how to most appropriately analyze the test content. In the next blog, we will turn our attention to more practical issues such as:
- How do assessment professionals use readability indices?
- What adjustments can or should be made when evaluating multiple-choice tests?
- How has the changing nature of jobs impacted readability?
- How have changes in the technology of assessment impacted the evaluation of readability?
- Should we be assessing usability instead of readability?
In writing this blog, Dennis Doverspike was assisted by Megan Nolan and Chelsea Whims. The blog was based on a longer, unpublished paper by Doverspike, Nolan, and Whims entitled Establishing Reading Levels in Employee Assessment. Information on the linguistic approach was provided by Dr. James E. Porter of Miami University in Oxford, OH. Information was also obtained from the paper Benjamin, R. G. (2012). Reconstructing readability: Recent developments and recommendations in the analysis of text difficulty. Educational Psychology Review, 24, 63-88.