Sally Larsen

The good, the bad and the pretty good actually

Every year headlines proclaim the imminent demise of the nation due to terrible, horrible, very bad NAPLAN results. But if we look at variability and results over time, it’s a bit of a different story.

I must admit, I’m thoroughly sick of NAPLAN reports. What I am most tired of, however, are moral panics about the disastrous state of Australian students’ school achievement that are often unsupported by the data.

A cursory glance at the headlines since the NAPLAN 2022 results were released on Monday shows several classics in the genre of “picking out something slightly negative to focus on so that the bigger picture is obscured”.

A few examples (just for fun) include:

Reading standards for year 9 boys at record low, NAPLAN results show 

Written off: NAPLAN results expose where Queensland students are behind 

NAPLAN results show no overall decline in learning, but 2 per cent drop in participation levels an ‘issue of concern’ 

And my favourite (and a classic of the “yes, but” genre of tabloid reporting):

‘Mixed bag’ as Victorian students slip in numeracy, grammar and spelling in NAPLAN 

The latter contains the alarming news that “In Victoria, year 9 spelling slipped compared with last year from an average NAPLAN score of 579.7 to 576.7, but showed little change compared with 2008 (576.9). Year 5 grammar had a “substantial decrease” from average scores of 502.6 to 498.8.”

If you’re paying attention to the numbers, not just the hyperbole, you’ll notice that these ‘slips’ are in the order of 3 scale scores (Year 9 spelling) and 3.8 scale scores (Year 5 grammar). Perhaps the journalists are unaware that the NAPLAN scale ranges from 1 to 1000? It might be argued that a change in the mean of 3 scale scores is essentially what you get with normal fluctuations due to sampling variation – not, interestingly, a “substantial decrease”.

The same might be said of the ‘record low’ reading scores for Year 9 boys. The alarm is caused by a 0.2 score difference between 2021 and 2022. When compared with the 2008 average for Year 9 boys the difference is 6 scale score points, but this difference is not noted in the 2022 NAPLAN Report as being ‘statistically significant’ – nor are many of the changes up or down in means or in percentages of students at or above the national minimum standard.

Even if differences are reported as statistically significant, it is important to note two things: 

1. Because we are ostensibly collecting data on the entire population, it’s arguable whether we should be using statistical significance at all.

2. As sample sizes increase, even very small differences can be “statistically significant” even if they are not practically meaningful.
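The second point can be made concrete with a quick back-of-the-envelope calculation. The numbers below are assumed for illustration (a 3-point mean difference, a within-cohort standard deviation of 70 scale points, cohorts of 60,000 students) – they are not actual NAPLAN figures:

```python
import math

# Illustrative, assumed values – not actual NAPLAN statistics.
mean_diff = 3.0   # difference between two cohort means, in scale points
sd = 70.0         # assumed within-cohort standard deviation
n = 60_000        # assumed cohort size per year

# Effect size (Cohen's d): the difference expressed in SD units
d = mean_diff / sd

# z statistic for a two-sample comparison with equal n and equal SD
se = sd * math.sqrt(2 / n)
z = mean_diff / se

print(f"Cohen's d = {d:.3f}")   # ~0.04: negligible by any common benchmark
print(f"z = {z:.1f}")           # ~7.4: far past conventional significance cutoffs
```

A standardised mean difference of about 0.04 is trivially small, yet with cohorts this large the test statistic sails past any conventional significance threshold – which is exactly why statistical significance alone is a poor guide to whether a change matters.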

Figure 1. NAPLAN Numeracy test mean scale scores for nine cohorts of students at Year 3, 5, 7 and 9.

The practical implications of reported differences in NAPLAN results from year to year (essentially the effect sizes) are not often canvassed in media reporting. This is an unfortunate omission and tends to enable narratives of large-scale decline, particularly because the downward changes are trumpeted loudly while the positives are roundly ignored.

The NAPLAN reports themselves do identify differences in terms of effect sizes – although the reasoning behind what magnitude delineates a ‘substantial difference’ in NAPLAN scale scores is not clearly explained. Nonetheless, moving the focus to a consideration of practical significance helps us ask: If an average score changes from year to year, or between groups, are the sizes of the differences something we should collectively be worried about? 

Interestingly, Australian students’ literacy and numeracy results have remained remarkably stable over the last 14 years. Figures 1 and 2 show the national mean scores for numeracy and reading for the nine cohorts of students who have completed the four NAPLAN years, starting in 2008 (notwithstanding the gap in 2020). There have been no precipitous declines, no stunning advances. Average scores tend to move around a little bit from year to year, but again, this may be due to sampling variability – we are, after all, comparing different groups of students. 

This is an important point for school leaders to remember too: even if schools track and interpret mean NAPLAN results each year, we would expect those mean scores to go up and down a little bit over each test occasion. The trick is to identify when an increase or decrease is more than what should be expected, given that we’re almost always comparing different groups of students (relatedly, see Kraft, 2019, for an excellent discussion of interpreting effect sizes in education).
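A small simulation shows how much a school’s mean can wobble for purely statistical reasons. The numbers here are invented for illustration (a cohort of about 50 students drawn each year from an unchanging ability distribution), not real school data:

```python
import math
import random
import statistics

random.seed(1)  # reproducible illustration

# Hypothetical school (assumed numbers, not real data): about 50 students
# sit the test each year, all drawn from the SAME underlying ability
# distribution (mean 500, SD 70 scale points) – i.e. nothing about the
# school's teaching changes from year to year.
true_mean, sd, n_students = 500, 70, 50

yearly_means = [
    statistics.mean(random.gauss(true_mean, sd) for _ in range(n_students))
    for _ in range(10)
]

# Expected standard error of a yearly mean: sd / sqrt(n) ≈ 9.9 points,
# so swings of plus or minus 10-20 points are unremarkable.
print("standard error:", round(sd / math.sqrt(n_students), 1))
print("yearly means:  ", [round(m, 1) for m in yearly_means])
```

Even with nothing changing under the hood, the simulated school’s mean drifts by around ten scale points between test occasions – roughly the size of the year-to-year movements that generate headlines.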

Figure 2. NAPLAN Reading test mean scale scores for nine cohorts of students at Year 3, 5, 7 and 9.

Plotting the data in this way, it seems evident to me that, since 2008, teachers have been doing their work of teaching, and students, by and large, have been progressing in their skills as they grow up, go to school and sit their tests in years 3, 5, 7 and 9. It’s actually a pretty good news story – notably not an ongoing and major disaster.

Another way of looking at the data, and one that I think is much more interesting – and instructive – is to consider the variability in achievement between observed groups. This can help us see that just because one group has a lower average score than another group, this does not mean that all the students in the lower average group are doomed to failure.

Figure 3 shows just one example: the NAPLAN reading test scores of a random sample of 5000 Year 9 students who sat the test in NSW in 2018 (this subsample was randomly selected from data for the full cohort of students in that year, N=88,958). The red dots represent the mean score for boys (left) and girls (right). You can see that girls did better than boys on average. However, the distribution of scores is wide and almost completely overlaps (the grey dots for boys and the blue dots for girls). There are more boys at the very bottom of the distribution and a few more girls right at the top of the distribution, but these data don’t suggest to me that we should go into full panic mode that there’s a ‘huge literacy gap’ for Year 9 boys. We don’t currently have access to the raw data for 2022, but it’s unlikely that the distributions would look much different for the 2022 results.  
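For readers who like to quantify this kind of overlap: assuming two normal distributions with equal spread, the overlapping coefficient can be computed directly from the standardised mean difference. The figures below (a 20-point gap between group means, common SD of 70) are hypothetical stand-ins, not the actual NSW 2018 values:

```python
import math

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

# Hypothetical, assumed numbers – not the actual NSW 2018 statistics.
mean_boys, mean_girls, sd = 570.0, 590.0, 70.0

d = (mean_girls - mean_boys) / sd   # standardised mean difference
overlap = 2 * phi(-abs(d) / 2)      # overlapping coefficient for two
                                    # normal distributions with equal SD

print(f"d = {d:.2f}, distribution overlap ≈ {overlap:.0%}")
```

On these assumed numbers, a gap that sounds meaningful in a headline still leaves the two distributions sharing the vast majority of their area – consistent with what the dot plot in Figure 3 shows.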

Figure 3. Individual scale scores and means for Reading for Year 9 boys and girls (NSW, 2018 data).

So what’s my point? Well, since NAPLAN testing is here to stay, I think we can do a lot better on at least two things: 1) reporting the data honestly (even when it’s not bad news), and 2) critiquing misleading or inaccurate reporting by pointing out errors of interpretation or overreach. These two aims require a level of analysis that goes beyond mean score comparisons to look more carefully at longitudinal trends (a key strength of the national assessment program) and variability across the distributions of achievement.

If you look at the data over time, NAPLAN isn’t a story of a long, slow decline. In fact, it’s a story of stability and improvement. For example, I’m not sure that anyone has reported that the percentage of Indigenous students at or above the minimum standard for reading in Year 3 has stayed pretty stable since 2019 – at around 83%, up from 68% in 2008. In Year 5 it’s the highest it’s ever been, at 78.5% of Indigenous students at or above the minimum standard – up from 63% in 2008.

Overall the 2022 NAPLAN report shows some slight declines, but also some improvements, and a lot that has remained pretty stable. 

As any teacher or school leader will tell you, improving students’ basic skills achievement is difficult, intensive and long-term work. Like any task worth undertaking, there will be victories and setbacks along the way. Any successes should not be overshadowed by the disaster narratives continually fostered by the 24/7 news cycle. At the same time, overinterpreting small average fluctuations doesn’t help either. Fostering a more nuanced and longer-term view when interpreting NAPLAN data, and recalling that it gives us a fairly one-dimensional view of student achievement and academic development, would be a good place to start.

Sally Larsen is a Lecturer in Learning, Teaching and Inclusive Education at the University of New England. Her research is in the area of reading and maths development across the primary and early secondary school years in Australia, including investigating patterns of growth in NAPLAN assessment data. She is interested in educational measurement and quantitative methods in social and educational research. You can find her on Twitter @SallyLars_27

Everything you never knew you wanted to know about school funding

Book review: Waiting For Gonski: How Australia Failed its Schools, by Tom Greenwell and Chris Bonnor

With the 2022 federal election now in the rear-view mirror and a new Labor government taking office, discussions about the Education portfolio have already begun. As journalists and media commentators noted, education did not feature prominently in the election campaign, notwithstanding the understandable public interest in this area. One of the enduring topics of education debates – and the key theme of Waiting For Gonski: How Australia Failed its Schools, by Tom Greenwell and Chris Bonnor – is school funding.

It is easy, and common, to view the school funding debate as a partisan issue. Inequities in school funding are often presumed to be an extension of conservative government policies going back to the Howard government. Waiting for Gonski shows how inaccurate this perception is, and how far governments of any political persuasion have to go before true reform is achieved. 

The first part of the book is an analysis of the context that gave rise to the Review of Funding for Schooling in 2011, commonly known as the Gonski Report. Greenwell and Bonnor devote their first chapter to an overview of the policy arguments and reforms that consumed much of the 20th century, leading to the Gillard government establishing the review. This history is written in a compelling, detailed and interesting way, and contains many eye-opening revelations. First, the parallels between the 1973 Karmel report and the 2011 Gonski version are somewhat demoralising for those who feel that school funding reform should be attainable in our lifetimes. Second, the integral role that Catholic church authorities have played in the structure of funding distributions that continue to the present day is, I think, a piece of 20th century history that is very little known. Julia Gillard’s establishment of the first Gonski review is thus situated as part of a longer narrative that is as much a part of Australia’s cultural legacy as are questions around national holidays, or whether or not Australia should become a republic.

Several subsequent chapters detail the findings of the 2011 Gonski review, its reception by governments, lobby groups, and the public, and the immediate rush to build in exceptions when interest groups (particularly independent and Catholic school bodies) saw they would “lose money”. The extent to which federal Labor governments are equally responsible for the inequitable state of school funding is made more and more apparent in the first half of the book. Greenwell and Bonnor sought far and wide for comments and recollections from many of the major players in this process, including politicians of both colours, commentators, lobbyists, and members of the review panel itself. This certainly shows in the rich detail and description of this section.

Rather than representing a true champion of equity and fairness, the Gonski report is painted as one built on flawed assumptions, burdened with legacies that were not properly unpacked, and marred by a multitude of compromises designed to appease the loudest proponents of public funding for private and Catholic schools. The second Gonski review, officially titled Through Growth to Achievement: Report of The Review to Achieve Educational Excellence in Australian Schools, is given less emphasis, perhaps because this second review was less about equity and funding and more about teacher quality and instructional reform – a book-length subject in itself.

Waiting for Gonski is most certainly an intriguing and entertaining read (a considerable achievement, given its fairly dry subject matter), and is highly relevant for those of us working towards educational improvements of any description in Australia. My main criticism of the book is that it tends to drag a little in the middle third. While the details of machinations between political leaders and Catholic and independent school lobbyists are certainly interesting, the arguments in these middle chapters are generally repetitions from earlier chapters, with reiterated examples of specific funding inequities between schools.

A second concern I have is the uncritical focus on Programme for International Student Assessment (PISA) data to support claims of widespread student academic failure. While it’s true that PISA shows long-term average declines in achievement amongst Australian school students, these assessments are not the only standardised tests of student achievement in this country. The National Assessment Program: Literacy and Numeracy (NAPLAN) is briefly touched upon in Chapter 8, but not emphasised. The reality is that while average student achievement on NAPLAN literacy and numeracy tests has not increased – after an initial boost between 2008 and 2009 – nor have students’ results suffered large-scale declines. Figure 1 demonstrates this graphically, showing the mean scores for all cohorts who have completed four NAPLAN assessments (up until 2019).

Figure 1. Mean NAPLAN reading achievement for six cohorts in all Australian states and territories. Calendar years indicate Year 3. (Data sourced from the National Assessment Program: Results website) 

It seems somewhat disingenuous to focus so wholeheartedly on one standardised assessment regime at the expense of another to support claims that schools and students are ‘failing’. For example, in Chapter 3 the authors argue that,

 “…the second unlevel playing field [i.e. the uneven power of Australian schools to attract high performing students] is a major cause of negative peer effects and, therefore, the decline in the educational outcomes of young Australians witnessed over the course of the 21st century” (p.93) 

In my view, claims such as these are overreach, not least because arguments of a decline in educational outcomes rely solely on PISA results. Furthermore, the notion that the scale and influence of peer effects are established facts is also not necessarily supported by the research literature. Other claims made about student achievement growth are similarly unsupported by longitudinal research. In this latter case, not because the claims overinterpret existing research, but because there is very little truly longitudinal research in Australia on patterns of basic skills development – despite the fact that NAPLAN is a tool capable of tracking achievement over time.

Using hyperbole to reinforce a point is not a crime, of course. However, the endless repetition of similar claims in the public sphere in Australia tends to reify ideas that are not always supported by empirical evidence. While these may simply be stylistic criticisms, they also throw into sharp relief the research gaps in the Australian context that could do with addressing from several angles (not just reports produced by the Australian Curriculum, Assessment and Reporting Authority [ACARA], which are liberally cited throughout).

I hope that the overabundance of detail, and the somewhat repetitive nature of the examples in the middle section of the book, don’t deter readers from the final chapter: Leveling the playing field. To the credit of Greenwell and Bonnor, rather than outlining all the problems and leaving readers with a sense of despair, the final chapter spells out several compelling policy options for future reform. While the structures of education funding in Australia may seem intractable, the suggestions give concrete and seemingly achievable options which would work, presuming all players are equally interested in educational equity. The authors also tackle the issue of religious schools with sensitivity and candour. It is true that some parents want their children to attend religious schools. How policy can ensure that these schools don’t move further and further along the path of excluding the poorest and most disadvantaged – arguably those whom churches have the greatest mission to help – should be fully considered, without commentators tying themselves in knots over the fact that a proportion of Australia’s citizens have religious convictions.

Questions around school funding, school choice and educational outcomes are perennial topics in public debate in Australia. However, claims about funding reform should be underpinned by a good understanding of how the system actually works, and why it is like this in the first place. This is the great achievement of Greenwell and Bonnor in Waiting for Gonski. The ways schools obtain government funding are obscure, to say the least, and there is a perception that private schools are not funded to the same extent as public schools. Waiting for Gonski clearly shows how wrong this idea is. As the book so powerfully argues, what Australia’s school funding system essentially does is allow children from already economically advantaged families to have access to additional educational resources via the school fee contributions these families are able to make. The book is a call to action to all of us to advocate for a rethink of the system.

Education is at the heart of public policy in many nations, not least in Australia. Waiting for Gonski is as much a cautionary tale for other nations as it is a comprehensive and insightful evaluation of what’s gone wrong in Australia, and how we might go about fixing it. 

Waiting for Gonski: How Australia Failed its Schools by Tom Greenwell & Chris Bonnor. 367pp. UNSW Press. RRP $39.99
