The validity of traditional readability tests on accurately predicting people’s comprehension of health information

Dublin Core

Title

The validity of traditional readability tests on accurately predicting people’s comprehension of health information

Creator

Jiawen Liu

Date

2015

Description

Substantial evidence indicates that readers benefit from clear and understandable health information in various contexts. Authors have long sought to use readability formulas to produce comprehensible texts for readers. Both traditional readability formulas and the newer Coh-Metrix algorithms have been widely used, and the utility of the newer tool is more strongly supported by theoretical evidence. Nevertheless, there is still a lack of empirical evidence supporting the utility of either kind of readability formula. In this paper, a secondary data analysis was used to provide empirical evidence on whether widely used readability tests can effectively predict participants' comprehension responses. Using Bayesian generalized linear mixed-effects models, variation in both the traditional readability formulas and two of the newer Coh-Metrix algorithms was found to have little or no effect on variation in participants' comprehension accuracy. Given this result, it is suggested that researchers should think twice before using readability tests to analyse text difficulty.
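For illustration, the lines below sketch in R the kind of Bayesian generalized linear mixed-effects model described above, using the brms package. The variable and dataset names (accuracy, readability_z, participant, text, comprehension_data) are assumptions for exposition, not the original analysis script.

library(brms)

# Predict per-question accuracy (1 = correct, 0 = incorrect) from a standardized
# readability score, with random intercepts for participants and texts
fit <- brm(
  accuracy ~ readability_z + (1 | participant) + (1 | text),
  data   = comprehension_data,  # hypothetical long-format data, one row per response
  family = bernoulli(),
  cores  = 4
)
summary(fit)  # a posterior for readability_z centred near zero would match the reported result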

Subject

Vocabulary knowledge, health literacy, reading comprehension skill, reading strategy

Source

Participants
Participants
Participants in the original study were recruited through the Prolific online platform. All were UK nationals aged eighteen or over who spoke English as their first language. Participants who completed the test battery were paid £12.50 (equivalent to £6.25 per hour). All volunteers were tested, excluding participants whose recorded reading times for the health-related information texts were below 30 s. Reading time covered reading the text and answering the questions relating to that text, including the self-rated evaluation-of-understanding probe. While recruitment was administered through the Prolific platform, response data were collected through a Qualtrics survey for each study.
Design
Two studies were conducted in the original research. In Study One, participants were presented with a sample of written health information texts on a range of topics. Study Two replicated and extended this observation by presenting a sample of texts on a range of health topics together with a sample of guidance texts on COVID-19. In both studies, participants completed four multiple-choice questions, each with three answer options, in response to each stimulus health text. After the comprehension questions, participants rated how well they thought they understood the information in the guidance. The original dataset also included individual-difference measures, such as reading skill and knowledge, and information on text attributes. The current study analyses participants' responses to the four multiple-choice comprehension questions for each text alongside those individual-difference measures, with additional text attributes included. In sum, apart from the health-text materials used to test comprehension and the participants recruited, all variables and procedures were identical in both studies. Since the only difference between the two studies' datasets was the inclusion of COVID-19 texts in Study Two, and all variables in both datasets were identical, the data were renamed Dataset One and Dataset Two to distinguish them more easily.
Material
For Dataset One (Study One in the original data), 25 health-related information texts were collected from those available on NHS trust organization webpages. The texts were chosen from 115 candidate texts drawn from the web resources of a quasi-random sample of 23 NHS England trusts (10% of the 228 total in England). For Dataset Two (Study Two in the original data), 14 texts concerning a range of health matters and 15 texts concerning COVID-19 or guidance relating to the public health response to the pandemic were collected. As in Dataset One, the general health texts were selected as a sub-set of a fresh pool of 115 candidate texts extracted from the web resources of a new sample of 23 NHS England trusts. The COVID-19 texts were selected from a pool of 115 candidate texts extracted from those available from gov.uk, charity (British Heart Foundation, Cancer Research UK), NHS UK, and NHS England trust webpages. For both general health and COVID-19 information, texts were selected so that the sub-set of items varied as widely as possible across the distribution of values (for each pool of candidates) on each critical text feature. For each text chosen, a set of four multiple-choice questions (MCQs), each with three answer options, was constructed to test participants' comprehension.
Individual differences measured: vocabulary knowledge, health literacy, reading comprehension skill, and reading strategy:
Vocabulary knowledge. The Shipley vocabulary sub-test (Kaya et al., 2012) was used to estimate vocabulary knowledge. In the Shipley test, participants choose the synonym of a target stimulus word from four alternatives (the other three alternatives being semantically related or unrelated distractor words). Each participant's score was the total number of correct answers out of 40 multiple-choice items.
Health literacy. The Health Literacy Vocabulary Assessment (HLVA) was used to estimate health literacy. Participants choose the synonym of a target stimulus word from four alternatives, with all items set in health contexts. Since the vocabulary presented is drawn from the health-care professions, the HLVA tests participants' background knowledge of health matters and is considered an index of health literacy. Each participant's score was the total number of correct answers out of 16 multiple-choice items.
Reading skill. The Qualitative Reading Inventory (Leslie & Caldwell, 2017) was used to assess reading skill. Participants read a short factual text (comprising 802 words) about the life cycle of stars and then answered two sets of 10 open-class questions about the text. The questions covered not only information stated explicitly in the text but also information requiring inference from background knowledge. Each participant's QRI score was the total number of correct answers out of 20 open-class questions.
Reading strategy. A reader-based standards-of-coherence measure published in a doctoral thesis by Calloway (2019) was used to assess reading strategy. Participants completed a 5-point Likert scale based on their reading experience, with responses ranging from very untrue to very true. The scale includes 87 items and has been reported to measure readers' reading goals and learning strategies effectively. Each participant was assigned a scale score corresponding to their responses on the 87-item scale.
Text feature measures: traditional readability test scores and Coh-Metrix scores for the health-related information texts presented to participants:
Referential cohesion. The Coh-Metrix tool was used to calculate the referential cohesion (co-reference) of texts. Referential cohesion captures the degree of overlap in concepts, words, and pronouns between sentences and paragraphs. As the similarity between sentences and conceptual ideas within a text increases, it becomes easier for readers to make connections between ideas and sentences (Coh-Metrix, 2012). Nevertheless, texts low in referential cohesion are sometimes appropriate, when readers are required to be more actively involved in comprehending a text (Coh-Metrix, 2012).
Deep cohesion. The Coh-Metrix tool was used to calculate the deep cohesion of texts. Deep cohesion refers to how well a text is tied together by a sufficient number of cohesive ties, also called connectives (Coh-Metrix, 2012). Deep cohesion is determined by the number of connectives, including temporal, causal, additive, logical, and adversative connectives, which connect ideas and propositions and clarify relations in a text (Kintsch, 1998). Using connectives effectively ties information together and thus facilitates readers' understanding.
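As a rough illustration of the intuition behind referential cohesion, the R sketch below computes the average proportion of each sentence's words that also appear in the preceding sentence. This toy index is for exposition only; it is not the Coh-Metrix algorithm, which relies on richer linguistic analysis.

# Toy adjacent-sentence word-overlap index; illustrative only
adjacent_overlap <- function(text) {
  sentences <- unlist(strsplit(tolower(text), "[.!?]+"))
  sentences <- trimws(sentences)
  sentences <- sentences[nzchar(sentences)]
  tokens <- lapply(sentences, function(s) {
    t <- unlist(strsplit(s, "[^a-z']+"))
    unique(t[nzchar(t)])
  })
  if (length(tokens) < 2) return(NA_real_)
  # For each adjacent pair, proportion of the later sentence's words seen before
  overlaps <- vapply(seq_len(length(tokens) - 1), function(i) {
    length(intersect(tokens[[i]], tokens[[i + 1]])) / length(tokens[[i + 1]])
  }, numeric(1))
  mean(overlaps)
}

adjacent_overlap("The vaccine protects you. The vaccine is safe.")  # higher values = more overlap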
Flesch Reading Ease Score (FRE). The FRE (Badarudeen & Sabharwal, 2010) is one of the traditional readability tests. The formula for the FRE is 206.835 - (1.015 * ASL) - (84.6 * ASW), where ASL is the average sentence length (words per sentence) and ASW is the average number of syllables per word. The FRE evaluates texts on a 100-point scale, and higher scores indicate texts that are easier to comprehend.
The Gunning Frequency of Gobbledygook (FOG). The FOG (Roberts et al., 1994) is one of the traditional readability tests. The formula for the FOG is 0.4 * (ASL + % polysyllabic words), where ASL is the average sentence length. Passages tested with the FOG must contain more than 100 words, and the result corresponds to the education level a reader needs to comprehend the text.
The Flesch–Kincaid Grade Level (FKG). The FKG (Woodmansey, 2010) is one of the traditional readability tests. The formula for the FKG is (0.39 * ASL) + (11.8 * ASW) - 15.59, where ASL is the average sentence length and ASW is the average number of syllables per word. The FKG returns a number indicating the school grade a reader should have reached to comprehend the text, ranging from grades 3 to 12.
Simple Measure of Gobbledygook (SMOG). The SMOG (McLaughlin, 1969) is one of the traditional readability tests. The formula is 1.043 * square root of (number of polysyllabic words * [30 / number of sentences]) + 3.1291. The SMOG also returns a school grade, indicating the education level a reader should have to understand a text, and the National Cancer Institute has recommended it as performing better than the other tests.
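To make the four formulas concrete, the following R sketch computes all four scores from simple text counts. It is not the original analysis script, and the vowel-group syllable counter is a crude heuristic, so its outputs will only approximate standard implementations.

# Rough syllable count: number of consecutive-vowel groups in a word
count_syllables <- function(word) {
  hits <- gregexpr("[aeiouy]+", tolower(word))[[1]]
  if (hits[1] == -1) return(1L)
  length(hits)
}

readability_scores <- function(text) {
  sentences <- unlist(strsplit(text, "[.!?]+"))
  sentences <- sentences[nzchar(trimws(sentences))]
  words <- unlist(strsplit(text, "[^A-Za-z']+"))
  words <- words[nzchar(words)]
  syllables <- vapply(words, count_syllables, integer(1))
  asl  <- length(words) / length(sentences)  # ASL: average sentence length
  asw  <- sum(syllables) / length(words)     # ASW: average syllables per word
  poly <- sum(syllables >= 3)                # polysyllabic words (3+ syllables)
  list(
    FRE  = 206.835 - 1.015 * asl - 84.6 * asw,
    FOG  = 0.4 * (asl + 100 * poly / length(words)),
    FKG  = 0.39 * asl + 11.8 * asw - 15.59,
    SMOG = 1.043 * sqrt(poly * (30 / length(sentences))) + 3.1291
  )
}

readability_scores("The patient should rest. Hydration is essential for recovery.")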
Demographic attributes. Participants' demographic characteristics were recorded, including gender (coded: Male, Female, Non-binary, Prefer not to say), education (coded: Secondary, Further, Higher), and ethnicity (coded: White, Black, Asian, Mixed, Other).

Publisher

Lancaster University

Format

Data/Excel.csv
Data/R.r
Data/DS_Store

Identifier

Liu2015

Contributor

Mistry, Daniel
Lin, Pei-Ying

Rights

Open

Relation

None

Language

English

Type

Data

Coverage

LA1 4YF

LUSTRE

Supervisor

Robert Davies

Project Level

MSc

Topic

Cognitive

Sample Size

307 participants

Statistical Analysis Type

Bayesian analysis

Files

Citation

Jiawen Liu, “The validity of traditional readability tests on accurately predicting people’s comprehension of health information,” LUSTRE, accessed April 25, 2024, https://www.johnntowse.com/LUSTRE/items/show/146.