You know there is something going wrong with Australia’s national testing program when the education minister of the largest state calls for it to be axed. The testing program, which started today across the nation, should be urgently dumped according to NSW Education Minister, Rob Stokes, because it is being “used dishonestly as a school rating system” and has “sprouted an industry that extorts money from desperate families”.
I think it should be dumped too, in its current form, but for an even more compelling reason than Stokes has aired. I believe we are not being honest with parents about how misleading the results can be.
The Federal Minister for Education, Simon Birmingham, has rejected the NSW minister’s call, of course, arguing that “parents like NAPLAN”. Birmingham is probably right. Many parents see NAPLAN scores as one of the few clear indications they get of how their child’s performance at school compares to other children.
Parents receive a NAPLAN student report showing their child’s score as dots on a band, clearly positioned in relation to their school average and national average. It all looks very precise.
But how precise are the NAPLAN results? Should parents, or any of us, be so trusting of the results?
There is considerable fallout from the reporting of NAPLAN results, so I believe it is important to talk about what is going on. The comparisons we make, the decisions we make, and the assumptions we make about those NAPLAN results can all be based on very imprecise data.
How NAPLAN results can be wildly inaccurate
While communication of results to parents suggests a very high level of precision, the technical report issued by ACARA each year suggests something quite different. Professor Margaret Wu, a world-leading expert in educational measurement and statistics, has done excellent and sustained work over a number of years on what national testing data can (and can’t) tell us.
Her work shows that each section of a NAPLAN test asks a relatively small number of questions, and those few questions are then used to estimate a child’s performance across a (very large) assessment area. As a result, there is a lot of what statisticians call ‘measurement error’ involved. This means that while parents are provided with an indication of their child’s performance that looks very precise, the real story is quite different.
Here’s an example of how wrong the results can be
Figure A is based on performance on the 2016 Year 7 Grammar and Punctuation test: in this case, the student has achieved a score of 615, placing them in the middle of Band 8. We can see that on the basis of this, we might conclude that they are performing above their school average of about 590 and well above the national average of 540. Furthermore, the student is at the cut-off of the 60% shaded area, which means their performance appears to be just in the top 20% of students nationally.
However, Figure B tells a different story. Here we have the same result, with the ‘error bars’ added (using the figures provided in the 2016 NAPLAN Technical Report, and a 90% Confidence Interval, consistent with the MySchool website). The solid error bars on Figure B indicate that while the student has received a score of 615 on this particular test, we can be 90% confident on the basis of this that their true ability in grammar and punctuation lies somewhere between 558 and 672, about two bands worth. If we were to use a 95% confidence interval, which is the standard in educational statistics, the span would be even wider, from 547 to 683 – this is shown by the dotted error bars.
In other words, the student’s ‘true ability’ might be very close to the national average, toward the bottom of Band 7, or quite close to the top of Band 9.
That’s a very wide ‘window’ indeed.
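The arithmetic behind those error bars can be sketched in a few lines of Python. This is an illustration only, not ACARA's procedure: it assumes the error is roughly normally distributed, and the standard error of measurement used here (34.7 points) is an assumption back-calculated from the intervals quoted above rather than a figure copied from the Technical Report. The function and variable names are made up for this example.

```python
from statistics import NormalDist


def confidence_interval(score, sem, level):
    """Two-sided normal confidence interval around an observed score.

    score: the observed scale score
    sem:   standard error of measurement, in scale points
    level: confidence level as a fraction, e.g. 0.90
    """
    # z is the number of standard errors that covers `level` of a normal curve
    z = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z * sem
    return score - margin, score + margin


SCORE = 615
# Assumption: back-calculated from the intervals quoted in this post,
# not taken directly from the NAPLAN Technical Report.
ASSUMED_SEM = 34.7

for level in (0.90, 0.95):
    low, high = confidence_interval(SCORE, ASSUMED_SEM, level)
    print(f"{level:.0%} interval: {low:.0f} to {high:.0f}")
```

Under that assumed standard error, the sketch reproduces the spans described above: roughly 57 points either side of the score at 90% confidence, and roughly 68 points either side at 95%.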
Wu goes on to note that NAPLAN is also not very good at representing student ability at the class or school level because of what statisticians call ‘sampling error’, the error caused by variation in mean scores of students due to the characteristics of different cohorts. Sampling error is affected by the number of students in a year group: the smaller the cohort, the larger the sampling error. Wu points out that the margin of error on school means can easily be close to, or indeed larger than, one year of expected annual growth. So for schools with cohorts of 50 or fewer, a very significant change in mean performance from one year to another would be possible just on the basis of sampling error.
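To see why cohort size matters so much, here is a rough sketch of how the margin of error on a school mean shrinks as the year group grows. It rests on two loud assumptions: that each cohort behaves like a simple random sample of students, and that individual scores have a standard deviation of 70 scale points. That 70 is a placeholder chosen for illustration, not a figure from Wu or from ACARA, and the names below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Hypothetical spread of individual student scores, in scale points.
# A placeholder for illustration, not a published NAPLAN figure.
ASSUMED_STUDENT_SD = 70.0


def margin_of_error(sd, cohort_size, level=0.90):
    """Margin of error on a cohort mean, treating the cohort as a random sample."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    # The standard error of a mean falls with the square root of the cohort size
    return z * sd / sqrt(cohort_size)


for cohort_size in (25, 50, 100, 200):
    moe = margin_of_error(ASSUMED_STUDENT_SD, cohort_size)
    print(f"cohort of {cohort_size:>3}: school mean +/- {moe:.1f} points")
```

Whatever the true spread of scores is, the square root does the damage: a cohort has to be four times larger to halve the margin of error, so small schools are stuck with wide margins.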
The problem is that school NAPLAN results are published on the MySchool website. Major decisions are made based on them and on the information parents get in their child’s individual report; parents can spend a lot of money (tutoring, changing school, choosing a school) based on them. As Minister Stokes said, a big industry has developed around selling NAPLAN textbooks, programs and tutoring services. But the results we get are not precise. The precision argument just doesn’t hold. Don’t fall for it.
Any teacher worth their salt, especially one who hadn’t felt the pressure to engage in weeks of NAPLAN preparation with their students, would be far more precise in assessing their students’ ability than any dot on a band. Teachers continually assess their students and continually collect substantial evidence as to how their students are performing.
Research also suggests that publishing the NAPLAN results on the MySchool website has played a driving role in Australian teachers and students experiencing NAPLAN as ‘high stakes’.
So is NAPLAN good for anything?
At the national level, however, the story is different. What NAPLAN is good for, and indeed what it was originally designed for, is to provide a national snapshot of student ability, and to allow comparisons between different groups (for example, students with a language background other than English and students from English-speaking backgrounds) on a national level.
This is important data to have. It tells us where support and resources are needed in particular. But we could collect the data we need by using a rigorous sampling method, where a smaller number of children are tested (a sample) rather than having every student in every school sit tests every two years. This is a move that would be a lot more cost effective, both financially and in terms of other costs to our education system.
So, does NAPLAN need to be urgently dumped?
Our current use of NAPLAN data definitely does need to be urgently dumped. We need to start using NAPLAN results for, and only for, the purpose for which they are fit. I believe we need to get the individual school results off the MySchool website for starters. That would quite quickly cut out much of the hype and anxiety. I think it is time, at the very least, to be honest with parents about what NAPLAN does and doesn’t tell them about their children’s learning and about their schools.
In the process we might free up some of that precious classroom time for more productive things than test preparation.
*With thanks to A/Prof James Ladwig for his helpful comments on the draft of this post.
Nicole Mockler is an Associate Professor in Education at the University of Sydney. She has a background in secondary school teaching and teacher professional learning. In the past she has held senior leadership roles in secondary schools, and after completing her PhD in Education at the University of Sydney in 2008, she joined the University of Newcastle in 2009, where she was a Senior Lecturer in the School of Education until early 2015. Nicole’s research interests are in education policy and politics, professional learning and curriculum and pedagogy, and she also continues to work with teachers and schools in these areas.
Nicole is currently the Editor in Chief of The Australian Educational Researcher, and a member of the International Advisory Board of the British Educational Research Journal and Educational Action Research. She was the Communications Co-ordinator for the Australian Association for Research in Education from 2011 until 2016, and until December 2016 was Associate Editor for both Critical Studies in Education and Teaching and Teacher Education.