Reading Standards Up or Down - What do the test norms say?
Marion M. de Lemos, Australian Council for Educational Research
There has been continuing debate in Australia and elsewhere as to
whether or not standards in reading have improved or declined over the
last two decades. Direct evidence on this is however limited. One
way of monitoring trends over time is through examination of changes in
test scores on standardised tests that are renormed at periodic
intervals. In the case of intelligence tests, such studies have
indicated a gradual increase in scores over the last 50 years or so.
Comparable studies have not been undertaken on reading test scores.
This paper will examine scores on standardised tests of reading that
have been periodically renormed over the period 1958 to 1996 to
determine whether there is a trend for an increase or a decrease in
reading test scores over this period.
One way of monitoring changes in performance standards over time is
through examining changes in test scores on standardised tests which
are renormed at periodic intervals. In the case of intelligence tests,
such studies have indicated a gradual increase in test scores over the
last 50 years or so. In terms of IQ score, these increases translate
to a gain in IQ score of approximately 0.3 points a year, or 5 points
every 17 years (i.e. about 15 points (one standard deviation) in 50
years). What these differences actually mean in terms of levels of
intelligence is still being debated. However, the fact that such
increases have occurred and are likely to continue to occur is
generally accepted.
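The arithmetic behind these figures can be checked with a short sketch. The annual rate of 0.3 points and the 15-point standard deviation are taken from the text above; the function and variable names are illustrative only, and a constant linear rate of gain is assumed.

```python
# Rough check of the IQ gain figures quoted above.
# Assumption: gains accumulate at a constant (linear) rate.
POINTS_PER_YEAR = 0.3   # approximate annual gain cited in the text
IQ_SD = 15              # one standard deviation on the IQ scale


def projected_gain(years, rate=POINTS_PER_YEAR):
    """Return the cumulative gain in IQ points after `years` at a constant rate."""
    return rate * years


gain_17 = projected_gain(17)   # roughly 5 points
gain_50 = projected_gain(50)   # roughly 15 points, i.e. one standard deviation

print(round(gain_17, 1), round(gain_50, 1), round(gain_50 / IQ_SD, 2))
```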
There has been continuing debate in Australia and elsewhere as to
whether or not standards in reading have improved or declined over the
last two decades. Given the trend relating to IQ scores, one would
probably expect to find some increase in reading test scores over time.
Evidence from the United States
In the United States, the evidence indicates an increase in reading
test scores over the period 1971 to 1980, followed by a decline. The
net effect of this is that students in 1992 were reading at much the
same level as in 1971 (Mullis et al., 1994).
This evidence is however complicated by the fact that the trends vary
according to ethnic group (with greater improvements being shown by
Black and Hispanic students than by White students), and also by
changes in the procedures for assessing reading performance. There was
in fact such a marked drop in reading test scores from 1984 to 1986
that this difference was attributed to procedural factors rather than a
real difference in performance standards, although at the time no
specific reason for the discrepancy could be identified (Applebee,
Langer and Mullis, 1988).
Evidence from Australian National Surveys
In Australia, evidence on trends in reading standards is available from
various national surveys that have been carried out over the period
1975 to 1995. These surveys include the 1975 and 1980 ASSP studies
(Australian Studies in School Performance (1975) and Australian Studies
in Student Performance (1980)), as well as data collected for the Youth
in Transition study (1989), the Longitudinal Surveys of Australian
Youth (1995), and data from the Australian Youth Survey (1989, 1990,
1991, and 1992). The ASSP studies included samples at both age 9 and age
14, while the Youth in Transition study and the longitudinal surveys
collected data on 14-year-olds.
The tests used in these various studies were not the same. The 1975
test was designed to assess minimum competency, and therefore focused
on the lower levels of achievement, while the later tests generally
covered a wider range of student performance. Comparisons of
achievement over time are based on a scaled score derived on the basis
of common items.
An analysis of the data from these surveys, undertaken by Marks and
Ainley, indicates that, in the case of 14-year-olds, reading achievement
has either stayed the same or declined slightly (Marks and Ainley,
1996).
One point of interest that was noted in comparing performance on common
items was that performance on some types of item improved, while
performance on other types of items declined. The items that showed
improvement were generally items based on short newspaper stories,
while the items that showed a decline in performance were items based
on longer and more complex text passages.
This is a trend which seems to me to merit further investigation.
State Monitoring Programs
Evidence on trends in reading standards in Australia is also available
from various State testing and monitoring programs.
In 1978 the Queensland Institute of Educational Research published a
summary of research into the reading standards of Queensland Grade 5
pupils over the period 1933 to 1977 (Jacobsen, 1978). This study was
based on data from the old ACER Silent Reading Tests, administered in
1933, 1946, 1955, 1960, 1965 and 1971, and a new reading test (Reading
Test QR5, based on Test R: Reading from the 1960s NSW Basic Skills
Battery), in 1971, 1976 and 1977.
The results of these studies indicated that there had been an increase
in the standard of reading for meaning for Grade 5 students over the
period 1946 to 1971, but in 1977 the results obtained were slightly
lower than in 1971 (although still higher than at any time prior to
1971). At the same time there was a decrease in the average age of the
students tested.
These results were interpreted as indicating that Grade 5 students in
Queensland were attaining a higher standard of reading in the 1970s,
and at an earlier age, than their peers in the 1930s, the 1940s, the
1950s and the 1960s.
At the same time it was noted that speed of reading results had
remained static over the period 1960 to 1977, and that the 1946 result
for speed of reading was the highest ever in the period tested.
More recent state monitoring systems have in general shown no major
shift in reading standards over time. In Tasmania, results of a survey
cycle based on the 1975 ASSP study have shown no basic change in reading
skills since 1975. A Victorian survey in 1988 indicated no change in
reading performance from 1975 and 1980 to 1988 (McGaw, 1994).
Monitoring studies in Western Australia and Queensland have indicated
some improvement in reading performance from 1990 to 1992, but the 1995
data from the Western Australian program indicated a drop in
performance from 1992 to 1995 at both the Year 3 and Year 10 levels,
but not at the Year 7 level (Cook, Randall and Richards, 1997).
Interpretation of Data from Australian Studies
A problem in interpreting these trends is that in most cases different
tests are used, with comparisons being based on IRT scaling techniques.
There has also been a shift in the nature of the skills being assessed,
and the way in which they are assessed. There is now a much greater
emphasis on the use of open-ended responses which are rated by
teachers, and more emphasis on assessing use of prediction and
contextual cues in deriving meaning from text. This makes it difficult
to compare performance standards as measured on current tests with
those based on more traditional measures of reading comprehension. The
dependence on teacher assessments also introduces another source of
variation into the assessment, in that teacher assessments tend to be
more subject to expectations that may vary over time, and it is
difficult to be sure that ratings are consistent from one period of
time to another.
A further problem in the use of common-item equating methods for
measuring trends over time has been highlighted by the 1986 NAEP
reading anomaly. In this case, it was found that students'
performance on the 'bridge items' (i.e. the set of items that was
common to the 1984 and 1986 NAEP tests) was affected by the context in
which the items were presented, that is, how and where the item
appeared in the test booklet. The effects of these changes on test
scores were found to be larger than the effects of the trend over time
that the instrument was designed to measure. These results have led to
a recognition that in cases where it is important to identify
relatively small shifts in population means, as in examining trends
over time, it is preferable to use identical instruments. That is,
'when measuring change, do not change the measure' (Zwick, 1991;
Goldstein, 1995).
Test Renorming Studies
Renorming of standardised reading tests should in theory provide
evidence relating to changes in performance standards over time,
providing the test remains substantially the same. However, this sort
of data has not generally been used for this purpose, and in most cases
renorming is undertaken in conjunction with revision of the test, so
that scores on the old test and the new test are not comparable.
In order to see whether normative data from standardised reading tests
could provide further information on trends in reading standards over
time, an analysis of data from two widely used Australian reading tests
was undertaken.
These tests were the Neale Analysis of Reading Ability and the PAT
Reading Test.
The Neale Analysis of Reading Ability was developed and normed in
Britain in 1958, and was revised and renormed in both Australia and
Britain in the early to mid 1980s. This test was renormed in Britain
in 1996 and is currently being renormed in Australia. The test was
standardised on children aged from 6 to 12 years, and the original
norms were provided in the form of reading ages. In the revised
editions, norms are provided in the form of reading ages, percentiles
and stanine scores.
The PAT Reading Test was developed by NZCER and was adapted and normed
for use in Australia in November 1970. It was renormed, without
change, in November 1984. Norms are provided in the form of grade
norms for Years 3 to 9. In 1970 these norms were presented as separate
state norms for each state, while in 1984 the norms were presented as
one set of national norms.
Trends on the Neale Analysis of Reading Ability
The Neale Analysis of Reading Ability comprises a series of six
passages of prose forming a continuous reading scale for children aged
from 6 to 12 years. The original (1958) version comprised three
parallel forms. The current version comprises two parallel forms, with
most of the original passages included in the current forms.
The test is individually administered, the child reading each passage
aloud, and then being asked a series of questions about the passage.
Three types of score are derived from the test, an accuracy score, a
rate of reading score, and a comprehension score.
The original British norms were based on a sample of over 2000
children, selected to be representative in terms of area and social
background. The 1984 Australian norms were based on stratified random
samples in two states (Victoria and South Australia), with a total
sample of over 1000 children. The British 1988 norms were based on a
representative sample of over 1700 children, while the 1996 norms were
based on a representative sample of over 3 500 children.
In the case of the Australian 1984 norms and the British 1988 norms,
mean scores are reported for each 12 month age group from age 6 to age
11. In the case of the British 1996 norms, mean scores are reported
for these age groups and also for the age 12 group. Mean scores for
each age group are not reported in the case of the British 1958 norms.
For the purpose of comparing the performance of the 1958 norm sample
with that of the later samples, the raw score corresponding to the
middle of the age range (6:6, 7:6, etc.) was taken as an estimate of
the mean score of the 12 month age group. It should be noted that in
the case of the 1958 British norms reading ages above age 11:10 are
based on extrapolation.
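The midpoint estimate described above can be sketched as follows. This is a minimal illustration only: it assumes the norm table can be read as (reading age, raw score) pairs and that linear interpolation between tabled ages is acceptable, neither of which is specified in the text. The function names are hypothetical and the table values are placeholders, not figures from any Neale manual.

```python
def age_to_months(age):
    """Convert a 'years:months' string such as '7:6' to a number of months."""
    years, months = age.split(":")
    return int(years) * 12 + int(months)


def raw_score_at_age(norm_table, age):
    """Estimate the raw score at a given reading age by linear interpolation.

    norm_table is a list of (reading_age, raw_score) pairs sorted by age.
    """
    target = age_to_months(age)
    points = [(age_to_months(a), score) for a, score in norm_table]
    for (a0, s0), (a1, s1) in zip(points, points[1:]):
        if a0 <= target <= a1:
            return s0 + (s1 - s0) * (target - a0) / (a1 - a0)
    raise ValueError("age outside the range of the norm table")


# Placeholder values for illustration only (NOT real norm data).
example_table = [("6:0", 4), ("7:0", 16), ("8:0", 30)]

# The score at the midpoint of the 6:0 to 6:11 age group stands in for its mean.
midpoint_estimate = raw_score_at_age(example_table, "6:6")
print(midpoint_estimate)
```

An estimate of this kind depends on the norm table being well behaved across the age range, which is why the extrapolated reading ages above 11:10 need to be treated with caution.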
In the case of the revised tests, the comparisons are based on Form 1.
In the case of the original test, separate norms are not provided for
each form, since the three forms are assumed to be comparable in
difficulty level.
Of the six passages in Form 1 of the revised version, four were in the
original Form C, one was a modified version of a passage in Form A, and
one was new. There is thus considerable overlap between the passages
included in the original form and those in Form 1 of the revised
edition.
Accuracy Scores
Looking first at the trends in terms of accuracy of reading, and
assuming that the British 1958 norms provided a reasonable estimate of
performance standards at this time in Australia, it can be seen that
there is an improvement in accuracy of reading at the younger age level
from 1958 to 1984, but by age 10 there is no difference in accuracy of
reading between the 1958 sample and the 1984 sample, and by age 11 the
1984 sample is performing at a substantially lower level than the 1958
sample.
(See Transparency 1)
Adding the 1988 British data to this comparison, a similar trend is
observed, but the performance of the British sample is consistently
higher than that of the Australian sample at all age levels.
(See Transparency 2)
Adding the British 1996 data to the 1958 and 1988 data, it can be seen
that the British performance in 1996 tends to be a little lower than
that in 1988, with the 1988 and 1996 samples performing at a higher level
than the 1958 sample from age 6 to age 8, but with these differences
leveling out at age 9 to 11. At age 12 the estimated score for the
1958 group is substantially higher than that for the 1996 sample, but
since the 1958 score is based on extrapolation, this comparison may not
be valid.
(See Transparency 3)
Comprehension Scores
Looking at the comprehension scores for the 1984 Australian sample
against the 1958 British sample, the levels of performance from age 7
to 10 are remarkably similar. As in the case of the accuracy scores,
six-year-olds in 1984 are doing substantially better than six-year-olds
in 1958, but by age 11 the performance of the 1984 group is
substantially lower than the performance of the 1958 group.
(See Transparency 4)
Adding the British 1988 data to this comparison, it can be seen that
the scores of the British sample are substantially higher than those of
the 1984 Australian sample at all age levels, and also higher than the
1958 sample from age 6 to 10 years, but by age 11 the 1988 British norm
is somewhat lower than the 1958 mean.
(See Transparency 5)
Adding the British 1996 data to the 1958 and 1988 data, it can be seen
that there is little difference in performance from 1988 to 1996,
although the 1996 mean at age 11 is a little higher than the 1988 mean.
At age 12 there is a plateau effect in the case of the 1996 data,
indicating that the ceiling of the test is reached by age 11. This
suggests that the assumption on which the extrapolation of the 1958
data to age 12 is based is probably false.
(See Transparency 6)
Rate of Reading Scores
Looking at the rate of reading scores for the 1984 Australian sample
against the 1958 British sample, the trend for improved performance up
to about age 8 or 9, followed by a decline in performance at the older
age levels is more clearly marked.
(See Transparency 7).
The decline in reading rate from age 9 to 11 is less marked in the case
of the British 1988 sample,
(See Transparency 8)
but is quite strongly marked in the case of the 1996 sample, where up
to age 8 the 1996 sample is scoring at a substantially higher level
than the 1958 sample, but from age 9 to 12 there is a clear drop in
rate of reading scores in 1996 as compared with 1958.
(See Transparency 9)
Interpretation
Overall, these results indicate an improvement in reading performance
from age 6 to age 9 over the period 1958 to 1996, but some evidence of
a decline at older age levels, this decline first showing up in rate of
reading scores at age 9 and then in comprehension and accuracy scores
at ages 10 to 11.
How valid is the trend indicated by these results? It could be argued
that the apparent decline in performance from about age 10 is due to
ceiling effects of the test or sampling error.
However, the fact that the same trend is found in both the Australian
and the British data, and is consistent from 1988 to 1996 in the case of
the British data, suggests that these results are indicative of an
underlying trend in standards of reading performance over time.
Trends on the PAT Reading Test
Given the trends on the Neale Analysis of Reading Ability for children
aged from 6 to 11 years, it was thought that it would be useful to look
at the trends for older students as indicated by scores on the PAT
Reading Test.
The PAT Reading Test has been widely used as a measure of reading
comprehension and vocabulary over the last 25 years. Originally
developed by NZCER, it was adapted and normed by ACER for use in
Australia in November 1970, and was renormed, without change, in
November 1984. The 1970 norms were based on a representative sample of
about 20 000 students, while the 1984 norms were based on a
representative sample of about 12 000 students.
The test is group administered and comprises two sections, a Vocabulary
test and a Comprehension test. Responses are in a multiple-choice
format.
In the case of the Vocabulary test, students have to read a sentence
and then choose from five alternatives a word which has the same or a
similar meaning to a given word in the sentence.
In the case of the Comprehension test, the student reads a short
passage and then responds to a number of questions about the passage,
choosing the correct answer for each question from five alternatives.
Questions are designed to assess both factual and inferential
comprehension of prose material.
The test covers Year 3 to Year 9, with norms provided in the form of
grade norms. In 1970 these norms were presented as separate state
norms for each state, while in 1984 the norms were presented as one set
of national norms.
For the purposes of this study I have compared the 1970 Victorian and
New South Wales state norms with the 1984 Australian national norms.
This comparison could be done for each state, or the 1970 data could be
converted into an estimated national norm for 1970. However, since
Victoria and New South Wales combined comprise over half the Australian
student population, this comparison is indicative of what would be
expected for the total population, and also gives some indication of
possible state variations in reading achievement.
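The alternative mentioned above, converting the 1970 state norms into an estimated national norm, could be sketched as a population-weighted mean. This is not the procedure used in this paper; the function name, the weights and the score values below are hypothetical placeholders chosen only to show the calculation.

```python
def weighted_norm(state_means, state_weights):
    """Combine state mean scores into one estimate, weighting by student numbers."""
    total_weight = sum(state_weights[state] for state in state_means)
    weighted_sum = sum(
        state_means[state] * state_weights[state] for state in state_means
    )
    return weighted_sum / total_weight


# Placeholder values for illustration only (NOT real PAT norm data).
example_means = {"NSW": 30.0, "Vic": 28.0}   # hypothetical mean raw scores
example_weights = {"NSW": 3, "Vic": 2}       # hypothetical relative enrolments

estimated_norm = weighted_norm(example_means, example_weights)
print(round(estimated_norm, 1))
```

A full national estimate would need means and enrolment weights for every state, which is why the simpler two-state comparison is used here.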
Vocabulary Test Scores
A comparison of 1970 and 1984 scores on the Vocabulary test indicates
that scores in 1984 were generally substantially lower than scores in
1970, and that these differences tended to increase with age. In the
case of the 1970 state norms, the tendency was for New South Wales to
score at a higher level than Victoria at the primary level (Year 3 to
Year 6), but for there to be no differences at the secondary level,
with some trend for Victoria to score higher at the Year 9 level.
(See Transparency 10).
Comprehension Test Scores
The same trend was observed for scores on the Comprehension test, both
in terms of the increasing decline in scores from Year 3 to Year 9 in
1984 as compared with 1970, and in terms of the differences in score
between Victoria and New South Wales in 1970 at the primary level, but
not at the secondary level.
(See Transparency 11).
These results would therefore seem to confirm the trend suggested by
the Neale data that while the reading performance of younger children
has improved over the years since 1958, the performance of older
children, from the age of about 9 or 10, showed a marked decline over
the period 1970 to 1984.
In comparing the data from these analyses with the data from the United
States, it is of interest to note that over the period when reading
test scores in the United States were increasing (1971 to 1980), those
in Australia were declining.
The Current Situation
What is the current situation?
The short answer is of course that we don't know.
The results of the restandardisation of the Neale test, which is
currently underway, will tell us something about what is happening at
the lower age levels.
In this case it will be of interest to note how the 1997 Australian
norms compare with the 1996 British norms, and whether the difference
between the Australian and British scores, as observed in the case of
the 1984 and 1988 norms, persists.
It will also be of interest to note whether any shift in scores has
occurred between 1984 and 1997, and if so, in what direction, and
whether this shift is consistent at younger and older age levels.
However, the results on the Neale will not tell us anything about what
is happening at the older age levels.
Current state and national testing programs do not tell us very much
about how current standards compare with past standards, because
different tests are used and different skills are assessed.
Does this matter?
I think it does matter.
There is accumulating evidence, both in Australia and elsewhere, to
suggest that students of today read less, and read less complex
material, than students of 20 or 30 years ago. Furthermore, the older
students get, the less they read.
Despite optimism in the 1980s that improved education would lead to
improved literacy standards, this has not happened.
The reasons for this are likely to be complex.
Jeanne Chall has associated the trend for reading improvements in the
1970s at age nine, and the lack of improvement and possible decline in
the 1980s, with a change in the teaching of reading from a
code-emphasis approach (in the 1970s) to a meaning-emphasis approach
(in the 1980s).
In Australia we have no data on the possible impact of different
approaches to the teaching of reading on reading achievement.
Collection of data that may have thrown some light on this issue was
specifically excluded from the data collection associated with the
recent national survey of English literacy.
I would argue that collection of data on trends in reading standards
over time is essential to monitoring the effectiveness of our education
system, and of different approaches to the teaching of reading.
Major changes in the approach to the teaching of reading have occurred
over the past 20 years, yet we have no data to tell us what impact
these changes have had on reading achievement. Such data as do exist
suggest that rather than an improvement in standards, a decline has
occurred, particularly at older age levels.
Given the problems associated with using different tests to compare
performance at different times, the use of data from periodic renorming
of standardised tests seems to me to provide an effective and economic
way of monitoring standards over time.
Unfortunately, the value of this approach seems to have gone largely
unrecognised. Standardised tests of reading are no longer fashionable,
and there is no longer any interest in maintaining or updating such
tests.
As a result, the opportunity to collect data on trends in reading
standards in Australia over the last 25 years appears to have been
lost.
References
Applebee, A.N., Langer, J.A., and Mullis, I.V.S. (1988). Who Reads Best?:
Factors Related to Reading Achievement in Grades 3, 7 and 11.
Princeton, NJ: Educational Testing Service.
Cook, J., Randall, K., and Richards, L. (1997). Student Achievement
in English in Western Australian Government Schools, 1995. Perth:
Education Department of Western Australia.
Goldstein, H. (1995). Interpreting International Comparisons of
Student Achievement. Educational Documents and Studies, 63. Paris:
UNESCO.
Jacobsen, J. (1978). A Summary of the Research into Reading Standards
of Queensland Grade Five Pupils, 1933-1977. Brisbane: Queensland
Institute for Educational Research.
McGaw, B. (1994). Standards from a Curriculum and Assessment
Perspective. Director's Comment, in Australian Council for Educational
Research Annual Report, 1993-94. Melbourne: ACER.
Marks, G. N. and Ainley, J. (1996). Reading Comprehension and
Numeracy Among Junior Secondary School Students in Australia.
Longitudinal Surveys of Australian Youth, Research Report Number 3.
Melbourne: ACER.
Mullis, I.V.S., Dossey, J.A., Campbell, J.R., Gentile, C.A., O'Sullivan,
C., and Latham, A.S. (1994). Report in Brief: NAEP 1992 Trends in
Academic Progress: Achievement of U.S. Students in Science, 1969 to
1992, Mathematics, 1973 to 1992, Reading, 1971 to 1992, Writing, 1984
to 1992. Washington DC: National Center for Education Statistics,
US Department of Education.
Zwick, R. (1991). Effects of Item Order and Context on Estimation of
NAEP Reading Proficiency. Educational Measurement: Issues and
Practice, 10 (3), 10-16.
Test Manuals
Neale, M. (1958). Neale Analysis of Reading Ability: Manual of
Directions and Norms (Second Edition, 1966). London: MacMillan.
Neale, M. (1988). Neale Analysis of Reading Ability - Revised:
Manual. Melbourne: ACER.
Neale, M. (1989). Neale Analysis of Reading Ability - Revised
British Edition: Manual. Windsor, Berkshire: NFER-Nelson.
Neale, M. (1997). Neale Analysis of Reading Ability - Revised
(Second Revised British Edition): Manual for Schools. Windsor,
Berkshire: NFER-Nelson Education.
ACER (1973). Progressive Achievement Tests: Reading Comprehension and
Reading Vocabulary: Teacher's Handbook. Melbourne: ACER.
ACER (1986). Progressive Achievement Tests in Reading: Comprehension
and Vocabulary: Teacher's Handbook (Second Edition). Melbourne: ACER.