A secondary data analysis: How will the effects on accuracy differ when measuring individual differences in word reading skill in Spanish?

Dublin Core

Title

A secondary data analysis: How will the effects on accuracy differ when measuring individual differences in word reading skill in Spanish?

Creator

Julianna Krol

Date

2021

Description

A deficit in accuracy has been found to correlate with reading difficulties (Davies et al., 2007). Psycholinguistic factors and differences between language orthographies contribute to reading skill, particularly in children with reading impairments such as dyslexia. The present study is a secondary data analysis of the original research conducted by Davies et al. (2007).
The effects on accuracy of individual differences, as indexed by nonword reading skill, and of word property measures were examined in Spanish children. Participants were 110 students differing in reading ability from schools located in A Coruña, Lugo, Orense and Pontevedra in northern Spain. The participants took standardized and experimental tests of reading ability and intelligence.
Eight lists consisting of 15 words each were created. The words were presented in five rows of three columns. Participants were asked to read the words as quickly and accurately as they could. Words which were incorrectly pronounced were identified as errors. Word property measures suggested to affect reading ability were selected and updated from “EsPal”, an online database of Spanish words. Frequency, word length, neighbourhood size (Levenshtein distance), RAN and PROLEC-R nonword reading were investigated in the present analysis. Reading accuracy was found to be high across the sample. Effects of individual differences on accuracy were noted. The word property measures of frequency and neighbourhood size were found to significantly affect reading accuracy. Effects of fluency (RAN) and nonword reading (PROLEC-R) were also observed.
The analysis provides insight into plausible factors which contribute to reading impairments in a rule-governed orthography such as Spanish. Results suggest that nonword reading skill could perhaps serve as a marker for reading difficulties.

Subject

Individual differences, Dyslexia, Word property effects, Language orthographies, Reading accuracy

Source

Participants
In the original study (Davies et al., 2007), researchers identified three groups of children from an initial sample of 110: children who showed clear reading disabilities (DYS, dyslexia), a control group of children matched to the DYS group on reading ability (RA matched group), and a chronological age control group (CA matched group). The present analysis investigated the whole sample of 110 participants, and no group selection was conducted.
Participants were students from schools located in A Coruña, Lugo, Orense and Pontevedra in northern Spain. 110 children differing in reading ability and age were selected. These children had no prior diagnosis of impaired neurological or sensory-motor functioning. The sample of 110 children took standardized and experimental tests of reading ability and intelligence on different school days over a 3-month period. Experimental data were gathered in a single session devoted primarily to the experimental test, whereas the standardized reading test was given in a separate session.
Measures
Reading performance was measured across a series of ability tests (PROLEC-R, RAN).
PROLEC-R Battery Tests of Literacy Skills
Children's reading processes were assessed with the PROLEC-R battery constructed by Cuetos, Rodriguez, Ruano & Arribas (1996). The battery consists of Spanish tests that analyze reading processes such as lexical and semantic processing. Participants were required to read a list of 40 words as quickly and accurately as possible. Words differed on properties such as frequency and length. The test yields an accuracy score and a reading time for both words and nonwords. It has been suggested that the test is considerably more informative when PROLEC-R accuracy scores and PROLEC-R reading times are combined. For this reason, PROLEC-R nonword reading was computed as a combined measure, by dividing accuracy by time.
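As an illustration of the combined measure, the sketch below divides an accuracy score by a reading time. It assumes that accuracy is the number of items read correctly and that time is recorded in seconds; the record does not state the units, and the function name and the example values are hypothetical.

```python
def combined_reading_score(n_correct: int, time_seconds: float) -> float:
    """Return accuracy divided by time (items read correctly per second).

    Assumption: accuracy is a count of correct items and time is in seconds.
    """
    if time_seconds <= 0:
        raise ValueError("time_seconds must be positive")
    return n_correct / time_seconds


# Hypothetical child: 36 of 40 nonwords read correctly in 48 seconds.
print(combined_reading_score(36, 48.0))  # 0.75
```

A higher score indicates a child who is both more accurate and faster, which is why a single combined value can be more informative than either score alone.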
Rapid Automatized Naming Tests (RAN)
Rapid automatized naming (RAN) refers to how quickly a child can name aloud a set of familiar items. These items can include numbers, pictures, letters and colors. A child's performance on the tests is assessed by comparing their naming times to the norm scores of children of the same age. RAN tests are designed predominantly to assess reading fluency. It is suggested that RAN influences reading scores because it requires the retrieval of stored phonological information (Johnson & Eden, 2014). Children were presented with a sequence of rows consisting of sets of different items (colors, letters, pictures etc.). The participants were required to name aloud all the items in the list, working from top to bottom. Naming accuracy and the time the child took to name the items were recorded. Children with reading difficulties are expected to show a deficit in naming speed and accuracy, and thus to score low on the RAN tests.
Word Property Measures
In the original study (Davies et al., 2007), words were chosen to vary in lexical frequency (high or low), orthographic neighbourhood size (many or few neighbouring words) and word length (short or long), in a 2 x 2 x 2 factorial design.
Updated word property measures were derived from EsPal (“Español Palabras”, meaning “Spanish words”), a repository of properties for Spanish words. The new word property measures derived from the database (frequency, word length and neighbourhood size) were compiled together with the original data. EsPal processes different corpora in the same way, and combines a corpus derived from movie subtitles with one derived from written text such as Web pages, fiction and nonfiction writing. The updated measure of frequency is reported in the analysis under the database's original name, “esp.count”. The count is the number of times the word appears within the selected corpus. For orthographic neighbourhood size, each word in EsPal is compared with the other words in the corpus. Yarkoni et al. (2008) argued that the orthographic neighbourhood metric (ON) developed by Coltheart et al. (1977) is limited by its definition: ON is the number of words of the same length that can be formed by substituting a single letter of the target word. As a result, a less restrictive measure of orthographic neighbourhood size was developed, based on Levenshtein distance (LD). LD is the minimum number of single-letter edits (substitutions, insertions, deletions) needed to change one word into another. For example, the LD between “SMILE” and “SIMILES” is two, because the second word is formed by adding the letters “I” and “S” (Yarkoni et al., 2008). The new measure is coded as Levenshtein distance 20 (Lev_N) (Duchon et al., 2013) and is the mean LD between a word and its 20 closest neighbours.
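The Levenshtein-based neighbourhood measure can be sketched as follows. This is a minimal illustration, not the EsPal implementation: the function names and the toy lexicon are hypothetical, and the mean is taken over however many neighbours the lexicon supplies, up to n.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-letter edits needed to turn a into b."""
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


def mean_distance_to_nearest(word: str, lexicon: Sequence[str], n: int = 20) -> float:
    """Mean Levenshtein distance from word to its n closest other words."""
    distances = sorted(levenshtein(word, other) for other in lexicon if other != word)
    nearest = distances[:n]
    if not nearest:
        raise ValueError("lexicon contains no other words")
    return sum(nearest) / len(nearest)


print(levenshtein("SMILE", "SIMILES"))  # 2

# Toy lexicon (hypothetical); a real calculation would use the full word list.
toy_lexicon = ["casa", "cosa", "cara", "caso", "masa", "mesa"]
print(mean_distance_to_nearest("casa", toy_lexicon, n=3))  # 1.0
```

A word with many close neighbours receives a low mean distance, so on this measure smaller values indicate a denser orthographic neighbourhood.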
An updated measure of length of words was also derived from the EsPal database and is coded as “esp.num_letters”. This refers to the word length which is expressed in number of letters.
Procedure
Eight lists consisting of 15 words each were created. Participants were shown each list of words on an A4 sheet of paper. The words were presented in five rows of three columns. Participants were tested individually and were asked to read the words as quickly and accurately as they could. Words which were incorrectly pronounced were identified as errors. Three types of errors were identified: word substitution, nonword and stress errors. An example of a word substitution would be the word “nube” (cloud) read as “nueve” (nine). For nonword errors, “bigote” (mustache) would be read as “bixote”. For errors relating to stress, “cáfe” would be “café”. All responses from the 110 participants were compiled and are present in the file “SpanishR”. Accuracy is presented as the participant responses scored as correct or incorrect (0, 1).
Analysis
Item-level and subject-level data about word properties and subject attributes were extracted. An analysis of the accuracy of responses, as well as of the effects of word properties on reading, was conducted. Responses were scored as correct or incorrect (0, 1).
Random and experimental variables were identified. Random effects were specified as “palabra” (words) and “subject identifier” (participant name). The experimental (fixed) effects were specified as frequency, length, neighbourhood size, RAN and PROLEC-R nonword reading. To investigate correlations between the experimental variables, a correlation matrix was constructed.
Generalized linear mixed effects modeling (GLMM; Baayen, 2007) was used to analyze the accuracy of children's responses when reading words. The variables included in the model relate to person characteristics and word characteristics.
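A correlation matrix of this kind can be sketched in a few lines. The example below assumes Pearson correlations, which the record does not specify; the variable names and values are invented for illustration and are not taken from the data set.

```python
from math import sqrt
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("sequences must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return covariance / sqrt(ss_x * ss_y)


def correlation_matrix(columns: Dict[str, Sequence[float]]) -> Dict[str, Dict[str, float]]:
    """Correlate every variable with every other variable."""
    names = list(columns)
    return {row: {col: pearson(columns[row], columns[col]) for col in names}
            for row in names}


# Invented values for six hypothetical words.
predictors = {
    "length": [3, 4, 5, 6, 7, 8],
    "log_frequency": [5.1, 4.8, 4.9, 3.7, 3.2, 2.9],
    "neighbourhood": [1.0, 1.2, 1.5, 1.9, 2.4, 2.6],
}
matrix = correlation_matrix(predictors)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Inspecting such a matrix before model fitting helps to show whether predictors such as length and neighbourhood size are strongly related, which affects how their separate effects can be interpreted.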
Moreover, GLMM was used to account for random variation across participants and words, in order to increase the accuracy of the estimates of the effects of individual differences and word properties. The model explains variation in accuracy by incorporating experimental and random variables. Model development followed a stepwise process, adding one variable to the model at a time. The primary model specification was as follows: accuracy~(1|palabra) + (1|subj_identifier), data = spanishr.
A table of estimates of both random and fixed effects was created and analyzed in order to assess the variation in the models.
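Written out, the primary random-intercepts specification corresponds to a model of the following form. This is a sketch under a stated assumption: the record gives the model formula but not the family or link, so a binomial model with a logit link, the usual choice for binary accuracy data, is assumed here. The symbols are introduced for illustration only: y_ij is the 0/1 accuracy code for participant i reading word j, u_i is the participant intercept (subj_identifier) and w_j is the word intercept (palabra).

```latex
\operatorname{logit} P(y_{ij} = 1) = \beta_0 + u_i + w_j,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\qquad w_j \sim \mathcal{N}(0, \sigma_w^2)
```

Under the same assumption, each step of the stepwise process adds one fixed-effect term, for example a coefficient multiplied by word frequency, to the right-hand side of the first equation.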

Publisher

Lancaster University

Format

Data/Excel. csv

Identifier

Krol2021

Contributor

Florine Causer, Siri Sudhakar

Rights

The data set belongs to Robert Davies, who is an author of the original published study (Davies et al., 2007, “Reading development and dyslexia in a transparent orthography: a survey of Spanish children”).

Relation

The present work is a secondary data analysis of the original research conducted by Davies et al. (2007), “Reading development and dyslexia in a transparent orthography: a survey of Spanish children”.

Language

English and Spanish (Spanish participants, words, database)

Type

Data

Coverage

LA1 4YF

LUSTRE

Supervisor

Robert Davies

Project Level

MSc

Topic

Cognitive, Developmental

Sample Size

110 participants

Statistical Analysis Type

Generalized Linear Mixed Effects Modelling
ANOVA
Correlations

Citation

Julianna Krol, “A secondary data analysis: How will the effects on accuracy differ when measuring individual differences in word reading skill in Spanish?,” LUSTRE, accessed April 28, 2024, https://www.johnntowse.com/LUSTRE/items/show/128.