Reading Standards Up or Down - What do the test norms say?

Marion M. de Lemos, Australian Council for Educational Research

 

There has been continuing debate in Australia and elsewhere as to whether standards in reading have improved or declined over the last two decades. Direct evidence on this, however, is limited. One way of monitoring trends over time is through examination of changes in scores on standardised tests that are renormed at periodic intervals. In the case of intelligence tests, such studies have indicated a gradual increase in scores over the last 50 years or so. Comparable studies have not been undertaken on reading test scores. This paper examines scores on standardised tests of reading that have been periodically renormed over the period 1958 to 1996 to determine whether there is a trend for an increase or a decrease in reading test scores over this period.

 

One way of monitoring changes in performance standards over time is through examining changes in scores on standardised tests that are renormed at periodic intervals. In the case of intelligence tests, such studies have indicated a gradual increase in test scores over the last 50 years or so. In terms of IQ score, these increases translate to a gain of approximately 0.3 points a year, or 5 points every 17 years (i.e. about 15 points, or one standard deviation, in 50 years). What these differences actually mean in terms of levels of intelligence is still being debated. However, the fact that such increases have occurred, and are likely to continue to occur, is generally accepted.
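The three ways of stating the IQ gain above are consistent with one another, as a quick check shows. This is a minimal Python sketch that assumes only the 0.3-points-a-year rate and the 15-point standard deviation quoted in the text; the function name `cumulative_gain` is illustrative.

```python
# Worked check of the IQ-gain arithmetic quoted in the text.
# Assumption: an IQ scale with a standard deviation of 15 points,
# as implied by "15 points (one standard deviation)".

IQ_SD = 15.0
GAIN_PER_YEAR = 0.3  # approximate annual gain cited in the text


def cumulative_gain(years: float, rate: float = GAIN_PER_YEAR) -> float:
    """Total IQ-point gain after `years` at a constant annual rate."""
    return rate * years


for years in (17, 50):
    gain = cumulative_gain(years)
    # Express the gain both in IQ points and in standard-deviation units.
    print(f"{years} years: {gain:.1f} IQ points = {gain / IQ_SD:.2f} SD")
```

At 0.3 points a year, 17 years gives about 5 points (roughly one third of a standard deviation) and 50 years gives 15 points, matching the rounded figures in the text.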

There has been continuing debate in Australia and elsewhere as to whether standards in reading have improved or declined over the last two decades. Given the trend in IQ scores, one would probably expect to find some increase in reading test scores over time.

 

Evidence from the United States

In the United States, the evidence indicates an increase in reading test scores over the period 1971 to 1980, followed by a decline. The net effect is that students in 1992 were reading at much the same level as in 1971 (Mullis et al., 1994).

This evidence is, however, complicated by the fact that the trends vary according to ethnic group (with greater improvements shown by Black and Hispanic students than by White students), and also by changes in the procedures for assessing reading performance. There was in fact such a marked drop in reading test scores from 1984 to 1986 that the difference was attributed to procedural factors rather than to a real difference in performance standards, although at the time no specific reason for the discrepancy could be identified (Applebee, Langer and Mullis, 1988).

 

Evidence from Australian National Surveys

In Australia, evidence on trends in reading standards is available from various national surveys carried out over the period 1975 to 1995. These include the 1975 and 1980 ASSP studies (Australian Studies in School Performance (1975) and Australian Studies in Student Performance (1980)), data collected for the Youth in Transition study (1989) and the Longitudinal Surveys of Australian Youth (1995), and data from the Australian Youth Survey (1989, 1990, 1991 and 1992). The ASSP studies included samples at both age 9 and age 14, while the Youth in Transition and longitudinal surveys collected data on 14-year-olds.

The tests used in these various studies were not the same. The 1975 test was designed to assess minimum competency, and therefore focused on the lower levels of achievement, while the later tests generally covered a wider range of student performance. Comparisons of achievement over time are based on a scaled score derived from the items common to the tests.
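The exact scaling procedure used in these surveys is not described here. As an illustration only, the sketch below shows one simple common-item approach, mean-shift linking under a Rasch model, in which the difficulty estimates of the shared items are used to place a later survey on an earlier survey's scale. The item labels, difficulty values and function names are hypothetical.

```python
from statistics import mean


def rasch_link_constant(old_b: dict[str, float], new_b: dict[str, float]) -> float:
    """Mean-shift linking constant computed over items common to both surveys.

    Under a Rasch model, difficulties from two separate calibrations of the
    same items differ (approximately) by a constant; averaging the
    differences over the common items estimates that constant.
    """
    common = old_b.keys() & new_b.keys()
    if not common:
        raise ValueError("no common items to link on")
    return mean(old_b[i] - new_b[i] for i in common)


# Hypothetical item difficulties (logits); only items A, B and C appear in both.
b_1975 = {"A": -1.0, "B": 0.0, "C": 0.5, "D": 1.2}
b_1980 = {"A": -1.2, "B": -0.3, "C": 0.3, "E": 0.8}

shift = rasch_link_constant(b_1975, b_1980)


def to_1975_scale(theta_1980: float) -> float:
    """Express a 1980 ability estimate on the 1975 scale."""
    return theta_1980 + shift


print(f"linking constant: {shift:+.3f} logits")
```

Everything in the comparison rests on the common items behaving the same way on both occasions, which is the assumption questioned later in the paper.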

An analysis of the data from these surveys, undertaken by Marks and Ainley, indicates that, in the case of 14-year-olds, reading achievement has either stayed the same or declined slightly (Marks and Ainley, 1996).

One point of interest noted in comparing performance on common items was that performance on some types of item improved, while performance on other types declined. The items that showed improvement were generally based on short newspaper stories, while those that showed a decline were based on longer and more complex text passages.

This is a trend which seems to me to merit further investigation.

State Monitoring Programs

Evidence on trends in reading standards in Australia is also available from various State testing and monitoring programs.

In 1978 the Queensland Institute of Educational Research published a summary of research into the reading standards of Queensland Grade 5 pupils over the period 1933 to 1977 (Jacobsen, 1978). This study was based on data from the old ACER Silent Reading Tests, administered in 1933, 1946, 1955, 1960, 1965 and 1971, and from a new reading test (Reading Test QR5, based on Test R: Reading from the 1960s NSW Basic Skills Battery), administered in 1971, 1976 and 1977.

The results of these studies indicated that there had been an increase in the standard of reading for meaning for Grade 5 students over the period 1946 to 1971, but in 1977 the results obtained were slightly lower than in 1971 (although still higher than at any time prior to 1971). At the same time there was a decrease in the average age of the students tested.

These results were interpreted as indicating that Grade 5 students in Queensland were attaining a higher standard of reading in the 1970s, and at an earlier age, than their peers in the 1930s, the 1940s, the 1950s and the 1960s.

At the same time it was noted that speed of reading results had remained static over the period 1960 to 1977, and that the 1946 result for speed of reading was the highest in the period tested.

More recent state monitoring systems have in general shown no major shift in reading standards over time. In Tasmania, the results of a survey cycle based on the 1975 ASSP study have shown no basic change in reading skills since 1975. A Victorian survey in 1988 indicated no change in reading performance from 1975 and 1980 to 1988 (McGaw, 1994).

Monitoring studies in Western Australia and Queensland have indicated some improvement in reading performance from 1990 to 1992. However, the 1995 data from the Western Australian program indicated a drop in performance from 1992 to 1995 at both the Year 3 and Year 10 levels, though not at the Year 7 level (Cook, Randall and Richards, 1997).

Interpretation of Data from Australian Studies

A problem in interpreting these trends is that in most cases different tests are used, with comparisons based on IRT (item response theory) scaling techniques. There has also been a shift in the nature of the skills being assessed, and in the way in which they are assessed. There is now a much greater emphasis on open-ended responses rated by teachers, and more emphasis on assessing the use of prediction and contextual cues in deriving meaning from text. This makes it difficult to compare performance standards as measured on current tests with those based on more traditional measures of reading comprehension. The dependence on teacher assessments also introduces another source of variation, in that teacher assessments tend to be more subject to expectations that may vary over time, and it is difficult to be sure that ratings are consistent from one period to another.

A further problem in the use of common-item equating methods for measuring trends over time was highlighted by the 1986 NAEP (National Assessment of Educational Progress) reading anomaly. In this case, it was found that students' performance on the 'bridge items' (i.e. the set of items common to the 1984 and 1986 NAEP tests) was affected by the context in which the items were presented, that is, how and where each item appeared in the test booklet. The effects of these changes on test scores were found to be larger than the effects of the trend over time that the instrument was designed to measure. These results have led to a recognition that where it is important to identify relatively small shifts in population means, as in examining trends over time, it is preferable to use identical instruments. That is, 'when measuring change, do not change the measure' (Zwick, 1991; Goldstein, 1995).
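The NAEP finding can be illustrated with a simple form of common-item linking. The Python sketch below is a simplified model, not the procedure NAEP actually used: it assumes that each occasion is calibrated separately with its population mean ability fixed at zero, that there is no real change in performance between occasions, and that a hypothetical context effect makes every bridge item 0.15 logits harder in the second booklet. The item labels and numbers are invented.

```python
from statistics import mean


def link_constant(old_b: dict[str, float], new_b: dict[str, float]) -> float:
    """Mean-shift (Rasch) linking constant over the common 'bridge' items."""
    common = old_b.keys() & new_b.keys()
    return mean(old_b[i] - new_b[i] for i in common)


# Hypothetical bridge-item difficulties (logits) on the first occasion.
b_first = {"Q1": -0.8, "Q2": 0.1, "Q3": 0.9}

# Hypothetical context effect: the bridge items sit in different positions
# in the second booklet and each behaves 0.15 logits harder, even though the
# population itself has not changed.
CONTEXT_EFFECT = 0.15
b_second = {item: b + CONTEXT_EFFECT for item, b in b_first.items()}

# The second occasion's mean ability is 0 on its own scale; place it on the
# first occasion's scale using the bridge items.
apparent_change = 0.0 + link_constant(b_first, b_second)
print(f"apparent change in mean: {apparent_change:+.2f} logits (true change: 0)")
```

Because the linking constant is an average over the bridge items, a context effect that shifts them all in the same direction cannot be distinguished from a real change in the population, and nothing in the linking step stops such an effect from being larger than the real trend.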

Test Renorming Studies

Renorming of standardised reading tests should in theory provide evidence of changes in performance standards over time, provided the test remains substantially the same. However, this sort of data has not generally been used for this purpose, and in most cases renorming is undertaken in conjunction with revision of the test, so that scores on the old and new tests are not comparable.

In order to see whether normative data from standardised reading tests could provide further information on trends in reading standards over time, an analysis of data from two widely used Australian reading tests was undertaken.

These tests were the Neale Analysis of Reading Ability and the PAT Reading Test.

The Neale Analysis of Reading Ability was developed and normed in Britain in 1958, and was revised and renormed in both Australia and Britain in the early to mid 1980s. This test was renormed in Britain in 1996 and is currently being renormed in Australia. The test was standardised on children aged from 6 to 12 years, and the original norms were provided in the form of reading ages. In the revised editions, norms are provided in the form of reading ages, percentiles and stanine scores.

The PAT Reading Test was developed by the New Zealand Council for Educational Research (NZCER) and was adapted and normed for use in Australia in November 1970. It was renormed, without change, in November 1984. Norms are provided in the form of grade norms for Years 3 to 9. In 1970 these were presented as separate norms for each state, while in 1984 they were presented as one set of national norms.

Trends on the Neale Analysis of Reading Ability

The Neale Analysis of Reading Ability comprises a series of six passages of prose forming a continuous reading scale for children aged from 6 to 12 years. The original (1958) version comprised three parallel forms. The current version comprises two parallel forms, with most of the original passages included in the current forms.

The test is individually administered: the child reads each passage aloud and is then asked a series of questions about it. Three types of score are derived from the test: an accuracy score, a rate of reading score, and a comprehension score.

The original British norms were based on a sample of over 2000 children, selected to be representative in terms of area and social background. The 1984 Australian norms were based on stratified random samples in two states (Victoria and South Australia), with a total sample of over 1000 children. The British 1988 norms were based on a representative sample of over 1700 children, while the 1996 norms were based on a representative sample of over 3 500 children.

In the case of the Australian 1984 norms and the British 1988 norms, mean scores are reported for each 12-month age group from age 6 to age 11. In the case of the British 1996 norms, mean scores are reported for these age groups and also for the age 12 group. Mean scores for each age group are not reported for the British 1958 norms. For the purpose of comparing the performance of the 1958 norm sample with that of the later samples, the raw score corresponding to the middle of the age range (6:6, 7:6, etc.) was taken as an estimate of the mean score of the 12-month age group. It should be noted that, in the case of the 1958 British norms, reading ages above 11:10 are based on extrapolation.
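Because the 1958 norms give a reading age for each raw score, the procedure just described amounts to reading the norm table backwards: find the raw score whose reading age equals the mid-point of each 12-month age group. A minimal Python sketch follows. The table values are invented for illustration and are not the published Neale norms, and linear interpolation between table rows is an assumption rather than a documented step.

```python
from bisect import bisect_left

# Hypothetical excerpt of a raw-score -> reading-age table (reading age in
# months). These values are NOT the published 1958 Neale norms.
NORM_TABLE = [(10, 72), (20, 80), (30, 89), (40, 99), (50, 110), (60, 122)]


def months(years: int, mo: int) -> int:
    """Convert an age written as years:months into months."""
    return 12 * years + mo


def raw_score_for_reading_age(target_months: float, table=NORM_TABLE) -> float:
    """Raw score whose reading age equals `target_months`, interpolating
    linearly between rows. Assumes reading age rises with raw score."""
    ages = [ra for _, ra in table]
    if not ages[0] <= target_months <= ages[-1]:
        raise ValueError("target reading age outside the tabulated range")
    k = bisect_left(ages, target_months)
    if ages[k] == target_months:
        return float(table[k][0])
    (s0, a0), (s1, a1) = table[k - 1], table[k]
    return s0 + (s1 - s0) * (target_months - a0) / (a1 - a0)


# Estimated mean raw score for each 12-month age group, read off at mid-age.
estimates = {f"{y}:6": raw_score_for_reading_age(months(y, 6)) for y in (6, 7, 8)}
print(estimates)
```

The estimate is only as good as the table: where the reading ages themselves are extrapolated, as above 11:10 in the 1958 norms, the interpolated raw score inherits that uncertainty.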

In the case of the revised tests, the comparisons are based on Form 1. In the case of the original test, separate norms are not provided for each form, since the three forms are assumed to be comparable in difficulty level.

Of the six passages in Form 1 of the revised version, four were in the original Form C, one was a modified version of a passage in Form A, and one was new. There is thus considerable overlap between the passages included in the original form and those in Form 1 of the revised edition.

Accuracy Scores

Looking first at the trends in terms of accuracy of reading, and assuming that the British 1958 norms provide a reasonable estimate of performance standards at that time in Australia, it can be seen that there is an improvement in accuracy of reading at the younger age levels from 1958 to 1984. By age 10, however, there is no difference in accuracy of reading between the 1958 sample and the 1984 sample, and by age 11 the 1984 sample is performing at a substantially lower level than the 1958 sample.

(See Transparency 1)

 

Adding the 1988 British data to this comparison, a similar trend is observed, but the performance of the British sample is consistently higher than that of the Australian sample at all age levels.

(See Transparency 2)

Adding the British 1996 data to the 1958 and 1988 data, it can be seen that British performance in 1996 tends to be a little lower than in 1988. The 1988 and 1996 samples perform at a higher level than the 1958 sample from age 6 to age 8, but these differences level out from age 9 to 11. At age 12 the estimated score for the 1958 group is substantially higher than that for the 1996 sample, but since the 1958 score is based on extrapolation, this comparison may not be valid.

(See Transparency 3)

Comprehension Scores

Looking at the comprehension scores for the 1984 Australian sample against the 1958 British sample, the levels of performance from age 7 to 10 are remarkably similar. As in the case of the accuracy scores, six-year-olds in 1984 are doing substantially better than six-year-olds in 1958, but by age 11 the performance of the 1984 group is substantially lower than that of the 1958 group.

(See Transparency 4)

Adding the British 1988 data to this comparison, it can be seen that the scores of the British sample are substantially higher than those of the 1984 Australian sample at all age levels, and also higher than the 1958 sample from age 6 to 10 years, but by age 11 the 1988 British norm is somewhat lower than the 1958 mean.

(See Transparency 5)

Adding the British 1996 data to the 1958 and 1988 data, it can be seen that there is little difference in performance from 1988 to 1996, although the 1996 mean at age 11 is a little higher than the 1988 mean. At age 12 there is a plateau in the 1996 data, indicating that the ceiling of the test is reached by age 11. This suggests that the assumption on which the extrapolation of the 1958 data to age 12 is based (namely, that scores continue to increase beyond age 11) is probably false.

(See Transparency 6)

Rate of Reading Scores

Looking at the rate of reading scores for the 1984 Australian sample against the 1958 British sample, the pattern of improved performance up to about age 8 or 9, followed by a decline at the older age levels, is more clearly marked.

(See Transparency 7).

The decline in reading rate from age 9 to 11 is less marked in the case of the British 1988 sample.

(See Transparency 8)

It is, however, quite strongly marked in the case of the 1996 sample. Up to age 8 the 1996 sample scores at a substantially higher level than the 1958 sample, but from age 9 to 12 there is a clear drop in rate of reading scores in 1996 as compared with 1958.

(See Transparency 9)

 

Interpretation

 

 

Overall, these results indicate an improvement in reading performance from age 6 to age 9 over the period 1958 to 1996, but some evidence of a decline at older age levels, with the decline showing up first in rate of reading scores at age 9 and then in comprehension and accuracy scores at ages 10 to 11.

How valid is the trend indicated by these results? It could be argued that the apparent decline in performance from about age 10 is due to ceiling effects of the test or to sampling error.

The fact that the same trend is found in both the Australian and the British data, and is consistent from 1988 to 1996 in the case of the British data, suggests that these results reflect an underlying trend in standards of reading performance over time.
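One way to gauge whether sampling error alone could account for a difference between two norm samples is to compare the difference with its standard error. The sketch below is a rough check under stated assumptions: the two samples are treated as independent simple random samples (the stratified designs actually used would change the standard error), a normal approximation is used, and the means, standard deviations and sample sizes are hypothetical, since they are not reported here. The 1958 "means" are themselves estimates read from a norm table, and the check says nothing about ceiling effects, the other explanation raised above.

```python
from math import erf, sqrt


def z_for_difference(mean_a: float, sd_a: float, n_a: int,
                     mean_b: float, sd_b: float, n_b: int) -> float:
    """z statistic for the difference between two independent sample means,
    assuming simple random sampling from each population."""
    se = sqrt(sd_a**2 / n_a + sd_b**2 / n_b)
    return (mean_a - mean_b) / se


def two_sided_p(z: float) -> float:
    """Two-sided p-value from the standard normal distribution."""
    return 1.0 - erf(abs(z) / sqrt(2.0))


# Hypothetical age-11 figures: a later norm group (a) vs an earlier one (b).
z = z_for_difference(mean_a=60.0, sd_a=12.0, n_a=170,
                     mean_b=64.0, sd_b=12.0, n_b=340)
p = two_sided_p(z)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

With these invented figures the difference is about three and a half standard errors from zero, which would be hard to attribute to sampling error; the real answer depends on the standard deviations and effective sample sizes in the test manuals.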

Trends on the PAT Reading Test

Given the trends on the Neale Analysis of Reading Ability for children aged from 6 to 11 years, it was thought that it would be useful to look at the trends for older students as indicated by scores on the PAT Reading Test.

The PAT Reading Test has been widely used as a measure of reading comprehension and vocabulary over the last 25 years. Originally developed by NZCER, it was adapted and normed by ACER for use in Australia in November 1970, and was renormed, without change, in November 1984. The 1970 norms were based on a representative sample of about 20 000 students, while the 1984 norms were based on a representative sample of about 12 000 students.

The test is group-administered and comprises two sections, a Vocabulary test and a Comprehension test. Responses are in multiple-choice format.

In the case of the Vocabulary test, students have to read a sentence and then choose from five alternatives a word which has the same or a similar meaning to a given word in the sentence.

In the case of the Comprehension test, the student reads a short passage and then responds to a number of questions about the passage, choosing the correct answer for each question from five alternatives. Questions are designed to assess both factual and inferential comprehension of prose material.

The test covers Year 3 to Year 9, with norms provided in the form of grade norms. In 1970 these were presented as separate norms for each state, while in 1984 they were presented as one set of national norms.

For the purposes of this study I have compared the 1970 Victorian and New South Wales state norms with the 1984 Australian national norms. This comparison could be done for each state, or the 1970 data could be converted into an estimated national norm for 1970. However, since Victoria and New South Wales combined comprise over half the Australian student population, this comparison is indicative of what would be expected for the total population, and also gives some indication of possible state variations in reading achievement.
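The alternative mentioned above, converting the 1970 state norms into an estimated national norm, would weight each state's norm by its share of students at that year level. The minimal sketch below shows the calculation and compares it with a two-state estimate; the function name is illustrative and all norm values and enrolment shares are hypothetical.

```python
def weighted_norm(state_means: dict[str, float], shares: dict[str, float]) -> float:
    """Enrolment-weighted mean of the supplied state norms, renormalising
    the shares of the states included."""
    total = sum(shares[s] for s in state_means)
    return sum(state_means[s] * shares[s] for s in state_means) / total


# Hypothetical 1970 mean raw scores at one year level, and hypothetical
# shares of national enrolment at that year level.
means_1970 = {"NSW": 25.0, "Vic": 23.0, "Other": 24.0}
shares = {"NSW": 0.35, "Vic": 0.27, "Other": 0.38}

national = weighted_norm(means_1970, shares)
two_state = weighted_norm({k: means_1970[k] for k in ("NSW", "Vic")}, shares)
print(f"all states: {national:.2f}; NSW and Vic only: {two_state:.2f}")
```

With these invented figures the two-state estimate is close to the all-state estimate; how close the real figures would be depends on how far the other states' norms differ from those of Victoria and New South Wales.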

 

Vocabulary Test Scores

A comparison of 1970 and 1984 scores on the Vocabulary test indicates that scores in 1984 were generally substantially lower than scores in 1970, and that these differences tended to increase with age. In the 1970 state norms, New South Wales tended to score at a higher level than Victoria at the primary level (Year 3 to Year 6), but there were no differences at the secondary level, with some tendency for Victoria to score higher at the Year 9 level.

(See Transparency 10).

Comprehension Test Scores

The same trend was observed for scores on the Comprehension test, both in terms of the increasing decline in scores from Year 3 to Year 9 in 1984 as compared with 1970, and in terms of the differences in score between Victoria and New South Wales in 1970 at the primary level, but not at the secondary level.

(See Transparency 11).

These results would therefore seem to confirm the trend suggested by the Neale data that while the reading performance of younger children has improved over the years since 1958, the performance of older children, from the age of about 9 or 10, showed a marked decline over the period 1970 to 1984.

In comparing the data from these analyses with the data from the United States, it is of interest to note that over the period when reading test scores in the United States were increasing (1971 to 1980), those in Australia were declining.

The Current Situation

What is the current situation?

The short answer, of course, is that we don't know.

The results of the restandardisation of the Neale test, which is currently underway, will tell us something about what is happening at the lower age levels.

In this case it will be of interest to note how the 1997 Australian norms compare with the 1996 British norms, and whether the difference between the Australian and British scores, as observed in the case of the 1984 and 1988 norms, persists.

It will also be of interest to note whether any shift in scores has occurred between 1984 and 1997, and if so, in what direction, and whether this shift is consistent at younger and older age levels.

However, the results on the Neale will not tell us anything about what is happening at the older age levels.

Current state and national testing programs do not tell us very much about how current standards compare with past standards, because different tests are used and different skills are assessed.

Does this matter?

I think it does matter.

There is accumulating evidence, both in Australia and elsewhere, to suggest that students of today read less, and read less complex material, than students of 20 or 30 years ago. Furthermore, the older students get, the less they read.

Despite optimism in the 1980s that improved education would lead to improved literacy standards, this has not happened.

The reasons for this are likely to be complex.

Jeanne Chall has associated the improvement in reading at age nine in the 1970s, and the lack of improvement and possible decline in the 1980s, with a change in the teaching of reading from a code-emphasis approach (in the 1970s) to a meaning-emphasis approach (in the 1980s).

In Australia we have no data on the possible impact of different approaches to the teaching of reading on reading achievement.

Collection of data that might have thrown some light on this issue was specifically excluded from the data collection associated with the recent national survey of English literacy.

I would argue that collection of data on trends in reading standards over time is essential to monitoring the effectiveness of our education system, and of different approaches to the teaching of reading.

Major changes in the approach to the teaching of reading have occurred over the past 20 years, yet we have no data to tell us what impact these changes have had on reading achievement. Such data as do exist suggest that, rather than an improvement in standards, a decline has occurred, particularly at older age levels.

Given the problems associated with using different tests to compare performance at different times, the use of data from periodic renorming of standardised tests seems to me to provide an effective and economical way of monitoring standards over time.

Unfortunately, the value of this approach seems to have gone largely unrecognised. Standardised tests of reading are no longer fashionable, and there is no longer any interest in maintaining or updating such tests.

As a result, the opportunity to collect data on trends in reading standards in Australia over the last 25 years appears to have been lost.

 

References

Applebee, A.N., Langer, J.A., and Mullis, I.V.S. (1988). Who Reads Best? Factors Related to Reading Achievement in Grades 3, 7 and 11. Princeton, NJ: Educational Testing Service.

Cook, J., Randall, K., and Richards, L. (1997). Student Achievement in English in Western Australian Government Schools, 1995. Perth: Education Department of Western Australia.

Goldstein, H. (1995). Interpreting International Comparisons of Student Achievement. Educational Documents and Studies, 63. Paris: UNESCO.

Jacobsen, J. (1978). A Summary of the Research into Reading Standards of Queensland Grade Five Pupils, 1933-1977. Brisbane: Queensland Institute for Educational Research.

Marks, G.N. and Ainley, J. (1996). Reading Comprehension and Numeracy Among Junior Secondary School Students in Australia. Longitudinal Surveys of Australian Youth, Research Report Number 3. Melbourne: ACER.

McGaw, B. (1994). Standards from a Curriculum and Assessment Perspective. Director's Comment, in Australian Council for Educational Research Annual Report, 1993-94. Melbourne: ACER.

Mullis, I.V.S., Dossey, J.A., Campbell, J.R., Gentile, C.A., O'Sullivan, C., and Latham, A.S. (1994). Report in Brief: NAEP 1992 Trends in Academic Progress. Achievement of U.S. Students in Science, 1969 to 1992; Mathematics, 1973 to 1992; Reading, 1971 to 1992; Writing, 1984 to 1992. Washington, DC: National Center for Education Statistics, US Department of Education.

Zwick, R. (1991). Effects of Item Order and Context on Estimation of NAEP Reading Proficiency. Educational Measurement: Issues and Practice, 10(3), 10-16.

Test Manuals

Neale, M. (1958). Neale Analysis of Reading Ability: Manual of Directions and Norms (Second Edition, 1966). London: Macmillan.

Neale, M. (1988). Neale Analysis of Reading Ability – Revised: Manual. Melbourne: ACER.

Neale, M. (1989). Neale Analysis of Reading Ability – Revised British Edition: Manual. Windsor, Berkshire: NFER-Nelson.

Neale, M. (1997). Neale Analysis of Reading Ability – Revised (Second Revised British Edition): Manual for Schools. Windsor, Berkshire: NFER-Nelson Education.

ACER (1973). Progressive Achievement Tests: Reading Comprehension and Reading Vocabulary: Teacher's Handbook. Melbourne: ACER.

ACER (1986). Progressive Achievement Tests in Reading: Comprehension and Vocabulary: Teacher's Handbook (Second Edition). Melbourne: ACER.