One of our most popular packages at Smartgrade is the Standardised Practice National Assessments series for primaries, covering practice KS1 and KS2 SATs, alongside practice phonics screening checks.
The focus of this blog is progress in KS2 practice SATs. Our National Assessments package allows schools to participate in four KS2 practice SATs assessment windows (Autumn 1, Autumn 2, Spring 1, Spring 2). Schools enter data at whole-mark or question level, and we then run a live standardisation using results from all participating schools to calculate standardised grades such as percentile ranks and national performance indicators (e.g. EXS/HS for SATs). Alongside those overall grades, schools get detailed analysis and benchmarking at question and topic level (if they’ve entered question-level data), plus the “actual” scaled scores that correspond to the raw scores achieved in the relevant SATs year.
One question we’re frequently asked, particularly when schools first join us, is how to measure progress in year 6. It’s common for primaries to use multiple practice SATs papers to assess performance over the year, but deriving a meaningful progress measure from that data is surprisingly perilous.
One approach we come across a lot is converting raw scores to scaled scores using the DfE conversion tables from the actual SATs paper. The attraction is that you get a number on a familiar scale, anchored in something real (the SATs “actuals” from the year of the assessment). So you might think you can look at the change in scaled score between practice papers to see how much progress students are making. And in the absence of a system like Smartgrade, you don’t really have other options, so the “actual” scaled score conversion is often the focus.
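To make the mechanics concrete, here’s a minimal sketch of that conversion. The table below is an invented fragment for illustration only; the real DfE conversion tables are published per subject and per year on gov.uk.

```python
# Invented raw-to-scaled fragment (illustration only; not real DfE values).
RAW_TO_SCALED = {
    55: 96, 56: 97, 57: 97, 58: 98, 59: 99,
    60: 100, 61: 100, 62: 101, 63: 102,
}

def scaled_score(raw_mark: int) -> int | None:
    """Look up the scaled score for a raw mark; None if outside the table."""
    return RAW_TO_SCALED.get(raw_mark)

print(scaled_score(58))  # 98 -- just below the expected standard of 100
```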
However, the problem is that students’ scaled scores do not rise in a linear progression over the year. What’s more, scaled scores from different years are not exactly comparable, since the average scaled score can change from year to year.
Because Smartgrade runs practice SATs assessments in the autumn and spring terms of year 6, using a different past paper in each assessment window, we’re able to look at how the average scaled score changes during year 6. Here’s a chart of the average scaled score from Autumn 1 to Spring 2 of 2022/23, labelled with the past paper Smartgrade used in each window.
Unsurprisingly, scores follow an upward trajectory, and at first glance the three subjects seem to track each other reasonably closely. Look more closely, though, and the change over time doesn’t follow a pattern that makes for a useful progress measure. For one thing, there are important discrepancies between subjects: average maths performance, for example, rises more rapidly than reading or GPS, and if you’re looking at your own school’s averages you’re unlikely to be able to control for this.
Moreover, none of the subjects follows a neat, linear progression, which makes it hard to infer what “good” progress looks like. In our dataset, GPS and maths rose at the slowest rate between Spring 1 and Spring 2, while reading rose at its most rapid rate over the same period. We’re not sure whether that’s a consequence of the papers chosen, the abilities of the children in the “actual” SATs years (one was a pre-pandemic paper and the other post-pandemic), or something else. And if we’re not sure, it’s unlikely anyone else can be confident about what’s happening either!
Another issue is that even at Spring 2, averages remain significantly below the 104–105 we typically see for actual KS2 SATs papers. A number of factors may be at play here, including that there’s still teaching time left before the real SATs, but we suspect results are also slightly suppressed in practice SATs because sitting the assessments feels lower stakes to the student.
Yet another fly in the ointment is that scaled scores may look like they represent a pleasingly uniform scale, but the results don’t conform to the distributions we’re used to seeing on standardised assessments, like percentile ranks or standardised scores centred on a mean of 100. Here’s the distribution from the 2023 KS2 SATs maths assessment (based on DfE data), for example:
So how do we suggest you approach Y6 SATs progress? In Smartgrade we do show you the scaled score conversions based on actual SATs outcomes, but we also run a live standardisation to give you a percentile rank and a predicted performance indicator relative to the cohort sitting the practice paper at that point in time. We think this is a more meaningful way of tracking progress during year 6, because the measures are more comparable across assessment points: our sample is pretty similar between windows. Some schools join in some windows and not others, but by and large the core of the sample, and the characteristics of the schools in it, are common to all four windows.
To help illustrate the problem with a scaled score progress measure for practice SATs, imagine a child — let’s call her Belinda — whose Maths raw score converted to a KS2 scaled score of 98 in Autumn 2 and 100 in Spring 2 of 2022/23. At first glance it seems like she made solid progress — there’s an increase in the scaled score after all, and by hitting 100 we can infer she was performing around the expected standard (EXS) by Spring 2, since 100 has always been the scaled score that equates to that threshold. But when you look at our live standardisation, a scaled score of 98 in Autumn 2 equated to the 59th percentile rank, whereas a scaled score of 100 in Spring 2 was only equivalent to the 40th percentile rank. So Belinda actually slipped back relative to her peers over the course of a term.
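To make Belinda’s case concrete, here’s a minimal sketch of how a live percentile rank differs from a scaled score: you rank the child’s result against the cohort that sat the same paper in the same window. The cohort scores below are invented for illustration; Smartgrade’s actual standardisation runs over results from all participating schools.

```python
from bisect import bisect_right

def percentile_rank(score: float, cohort_scores: list[float]) -> float:
    """Percentage of the cohort scoring at or below `score`."""
    ordered = sorted(cohort_scores)
    return 100 * bisect_right(ordered, score) / len(ordered)

# Tiny invented cohorts: by spring the whole cohort has moved up, so the
# same scaled score buys a lower rank than it did in the autumn.
autumn2_cohort = [91, 93, 95, 96, 97, 98, 99, 101, 103, 105]
spring2_cohort = [95, 97, 99, 100, 101, 102, 103, 105, 107, 109]

print(percentile_rank(98, autumn2_cohort))   # 60.0 -- above the median
print(percentile_rank(100, spring2_cohort))  # 40.0 -- now below the median
```

Belinda’s scaled score rose by two points, yet her position in the cohort fell, which is exactly the slip the live standardisation surfaces.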
That’s why we think a change in “live” percentile rank is more meaningful and comparable at both student and cohort level. Of course, you can worry too much about small percentile rank changes at the individual level (assessments are never perfectly accurate, after all), but aggregating by class, school or MAT reduces the statistical noise and gives a more robust indicator of your cohort’s overall progress.
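Here’s a similarly hedged sketch of that aggregation point: individual rank changes bounce around, but the mean change across a class or school is a steadier signal. All names and values are invented.

```python
from statistics import mean

# Invented pupil records: percentile ranks in the Autumn 2 and Spring 2 windows.
pupils = [
    {"name": "Belinda", "class": "6A", "autumn2_pr": 59, "spring2_pr": 40},
    {"name": "Arjun",   "class": "6A", "autumn2_pr": 35, "spring2_pr": 44},
    {"name": "Chloe",   "class": "6A", "autumn2_pr": 72, "spring2_pr": 75},
    {"name": "Dev",     "class": "6A", "autumn2_pr": 48, "spring2_pr": 53},
]

# Individual changes are noisy (-19, +9, +3, +5); the class mean smooths them.
changes = [p["spring2_pr"] - p["autumn2_pr"] for p in pupils]
print(f"Mean percentile-rank change for 6A: {mean(changes):+.1f}")  # -0.5
```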
We’re offering FREE access to our final Spring 2 Practice SATs assessments when schools sign up for 2024/25. Book in a demo using the link below to find out more or email us at sales@smartgrade.co.uk.