One of our most popular packages at Smartgrade is the Standardised Practice National Assessments series for primaries. In our last blog we focused on measuring progress in KS2 Practice SATs; in this post we’re going to dive into some of the more granular findings we’ve seen across the assessments contained within the package, which are:
- 4 Practice KS2 SATs papers (Autumn 1, Autumn 2, Spring 1, Spring 2)
- 2 KS1 SATs papers (1 practice paper in Spring 1, then actuals in Summer 1)
- 3 Practice Phonics Papers (Autumn 2, Spring 1, Spring 2)
Schools enter data at whole mark or question level, then we run a live standardisation using results from all schools to calculate standardised grades like percentile ranks and national performance indicators (e.g. EXS/HS for SATs). Alongside those overall grades, schools get detailed analysis and benchmarking at question and topic level (if they’ve entered question-level data).
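For the data-curious, here's a minimal sketch in Python of how a percentile rank can be derived from pooled raw scores. It's purely illustrative: the function name and the toy data are ours for this post, not Smartgrade's actual standardisation code.

```python
# Illustrative sketch only: not Smartgrade's actual standardisation code.
# Assumes we have the pooled raw scores from every school in the window.
import numpy as np

def percentile_rank(raw_score: int, pooled_scores: np.ndarray) -> float:
    """Percentage of pooled scores at or below the given raw score."""
    return 100.0 * np.mean(pooled_scores <= raw_score)

# Toy data standing in for thousands of real raw scores
pooled = np.array([29, 34, 41, 45, 48, 52, 55, 58, 60, 62])

for score in (29, 48, 62):
    print(f"raw {score} -> percentile rank {percentile_rank(score, pooled):.0f}")
```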
Hundreds of schools participate, which gives us a fascinating dataset from which to glean insight — and because sharing is caring, below are a few of the things we’ve learned along the way. But before we get into it, a couple of caveats:
- While we have a broad and national range of schools participating, and sample sizes of 10,000+ students in many cases, we do not claim to have a perfectly representative sample; nor do we standardise in the same way as traditional assessment publishers — we do “live” standardisations using data from that assessment window only. We therefore do not consider the below to be academically rigorous research findings. Rather, they are insights which we consider sufficiently robust to be worth sharing; but caution is always advisable when drawing inferences from assessment data!
- We’re assessment data geeks but we’re not primary teachers… so we’d love to hear from you if you’re an educator with views on how to interpret these insights.
Right, with the caveats out of the way, here's what we found most interesting…
1. Prior year maths content can trip students up
Here are two questions from the 2019 Maths SATs reasoning paper that we used in our 2023/24 Autumn 2 practice SATs window:
Amina is shopping. She says, ‘I would like to buy one-quarter of a kilogram of cheese.’ Write one-quarter on the scales as a decimal.
___ kg
The cheese costs £1.35. Amina pays with a £2 coin. How much change should Amina get?
Ignoring for a second the fact that Amina got an absolute deal on her cheese, what's most interesting to us about these questions is that they both test prior-year knowledge (the first is Y4 content; the second draws from Y3 and Y4), and yet the % correct is markedly different on each: 36% got the first question right; 65% got the second question right. We know this is more than statistical noise because our sample includes some 9,000 children, and because we saw a similarly large gap when we used the 2019 paper in our 2022/23 practice SATs (42% right on the first question; 69% on the second).
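To give a sense of why a gap like that isn't noise at this sample size, here's a rough back-of-envelope check. It's not our formal methodology, and for simplicity it treats the two questions as independent samples even though the same children sat both:

```python
# Back-of-envelope check: how big is sampling noise on a ~9,000-child sample?
import math

def se_of_difference(p1: float, p2: float, n: int) -> float:
    """Standard error of the difference between two observed proportions,
    each based on n responses (treated as independent for simplicity)."""
    return math.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / n)

n = 9000
p1, p2 = 0.36, 0.65  # % correct on the two questions
gap = p2 - p1
se = se_of_difference(p1, p2, n)
print(f"gap = {gap:.2f}, standard error ~ {se:.4f}, gap/se ~ {gap / se:.0f}")
# The 29-point gap is roughly 40 standard errors wide: far beyond noise.
```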
What we take from this is that while, by definition, schools will have taught concepts from previous years to year 6 students, you should not assume that such knowledge is secure. Practice SATs can be a neat way of spotting gaps in prior knowledge, and Smartgrade lets you filter your Question Level Analysis (QLA) view by the year in which the content was taught to help you spot such gaps (see the screenshot below for what that looks like).
2. Ratio & Proportion is consistently the area of KS2 practice SATs children struggle with most (even once it's been taught!)
The year 6 scheme of learning from our partners White Rose Maths schedules Ratio to be taught at the start of Spring 1. And yet, when we look at the results of our KS2 Practice SATs from both Spring 1 (typically taken later in the half term) and Spring 2, we find that Ratio & Proportion is consistently the topic children struggle with most, among topics that contribute at least 7 marks to the overall raw score.
In our current live standardisation for Spring 2, the average percentage correct for Ratio & Proportion is 39% vs an all-topic average of 59%; in Spring 1 it was just 28% vs an all-topic average of 50%. What's more, in Spring 2, two of the three toughest questions were linked to this topic. What this tells us is that even though this content has likely been taught recently, it's still an area that children commonly struggle to get a handle on.
Of course, when it comes to analysing your own practice SATs data, we'd recommend you use Smartgrade's topic analysis to find your own class, school or even MAT's areas for intervention — just because this is a national trend doesn't mean it's necessarily the case at your school! It's also worth bearing in mind that the issue in topic analysis may not be the topic itself. For example, a child struggling with fractions may in fact be having difficulty with an underlying concept like multiplication and division facts. So it's important not to view performance in a single topic in isolation — clearly a ratio intervention may not be the answer if, for example, a child is struggling to count.
3. Children struggle on 3-mark reading comprehension questions
I want to start this section by making clear that we don't always think it's worthwhile to pore over the minutiae of topic analysis and QLA for practice SATs reading assessments. As reading expert Christopher Such pointed out:
“Through discussion, you will naturally ask children to visualise, predict, summarise, paraphrase, infer, etc. These are *not* generic skills, so don’t worry about developing them as such.”
To put it another way, if you see from the topic analysis that your students are weak at making inferences, that can be useful context, and perhaps it becomes an area you keep an eye on in future assessments, but it doesn't mean you should necessarily plan a fortnight of inference-making.
That's one reason why, if you're entering data into Smartgrade for any partner assessment, we give you the option of entering whole marks or question-level data, as you prefer — sometimes the QLA is super-useful, but that's not always the case. And either way, you can still benefit from our live standardisations.
That said, we do think there is value to be gleaned from question-level analysis of SATs Reading papers, and one insight that we think is meaningful is that children really struggle on 3-mark questions. Take our 2023/24 Spring 1 Reading comprehension practice SATs window (using the 2022 paper), for example. Every question had an average % correct of between 43% and 86% EXCEPT for the two 3-markers, which had an average % correct of just 23% and 25%. (For the record, we see very similar patterns on other papers too.)
Here’s an example of one of those questions:
Think about the whole text. How is a mysterious atmosphere created?
Give two ways, using evidence from the text to support your answer.
Now clearly that question is more complex than the 1- and 2-mark questions, but our assumption is that responding well to it is also, to some extent, a teachable skill. Or to put it another way, it may not only be reading comprehension that is tripping students up here; it may also be that they have had limited practice in drawing out two examples from a whole text.
Of course there’s much, much more to a student’s experience of primary education than preparing for SATs, and we’d mostly not want to advocate teaching to the test when it comes to reading comprehension. Again, we’re influenced by Christopher Such here, who says: “Focus on children’s understanding (and enjoyment) of the text in front of them.”
But also, you know, SATs exist, and you'll be held to account for your school's results. And the two 3-mark questions typically account for 12% of a child's total raw score (6 of the reading paper's 50 marks). That's not nothing! So while there are lots of factors that will feed into whether you wish to spend in-class teaching time on addressing this challenge, given how materially performance in this area will affect a student's mark, we'd at least consider looking at your QLA to see how your students performed, and trying to unpack what (if any) issues they may be having with such questions. Are they giving one way rather than two? Are they giving two but not using evidence from the text? That will help you understand how you could intervene here.
4. In Spring 1 practice phonics, children found pseudo words easier to decode than real words
This year we launched practice phonics assessments, including topic-level analysis of performance on certain categories of words (CVC, digraphs and so on), and we've already gathered some fascinating insights from this data. Perhaps the most interesting thing was that in Spring 1 of this year, students performed substantially better on the 20 pseudo words (60% average) than they did on the 20 real words (50% average).
At first glance this sounded counterintuitive to us, and sure enough, when I polled people on Twitter to see which they thought would be easier, 75% opted for real words (once "don't know"s were excluded). But then again, the types of words included will make a difference — there's a lot of phonics teaching packed into year 1, so it may be that real-word decoding improves substantially between Spring 1 and Spring 2, once concepts like digraphs and split digraphs have been introduced. In any case, we're looking forward to analysing the Spring 2 data to get a sense of whether there's any movement in this trend over time.
5. Maths SATs papers have contained less Y6 content since the pandemic
All KS2 maths SATs past papers contain content taught in years 3–6, but the percentage of content from each year varies quite a bit. Here’s a breakdown for the last four papers:
What jumps out is that the proportion of Y6 content is down from 47% pre-pandemic to 36–37% post-pandemic. Interestingly, though, the raw score required to meet the expected standard didn't change meaningfully: it was 58 in 2019, 58 again in 2022 and 56 in 2023.
Of course, there's much more to difficulty level than the raw score and the year in which content was taught, so it would be over-extrapolating to say that the papers have got definitively easier. But one conclusion we have drawn is that it's better to stay away from (or truncate) the 2018 and 2019 past papers in the autumn term. One issue when setting practice SATs in the autumn is that you could be giving your children quite a few questions on concepts they haven't yet been taught, so the logical response is to use papers with a lower percentage of year 6 content at that point in the year. That's why we've adjusted our 2024/25 schedule to use the 2022 and 2023 papers as the basis for autumn-term practice SATs.
6. Gender has a surprising impact on item performance — but gender-specific teaching strategies probably won't help much
We're not big fans of school-level gender analysis, as we doubt it's that useful for informing teaching (and reteaching) strategies, particularly with primary cohorts; but once you have a sample of c. 10,000 students, as we do, gender analysis becomes more valid and statistically meaningful. So we've taken a look at item performance broken down by gender, and what we've found is that gender can have a really substantial impact on the likelihood of a child getting a SATs question right. Take these two examples from this year's Autumn 2 KS2 Practice SATs assessments:
In GPS, the average raw % for girls was 6 percentage points (p.p.) higher than for boys across all questions, but on the question shown above girls outperformed boys by a full 15 p.p. In Maths, we saw some even bigger disparities: the average raw % for boys was 4 p.p. higher than for girls, but on the question shown above it was a whopping 19 p.p. higher.
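If you're curious how comparisons like that are reached, here's a sketch of one way to set per-question gender gaps against the overall gap. The column names and toy data are hypothetical, and this uses pandas rather than our actual pipeline:

```python
# Illustrative sketch (hypothetical column names, not Smartgrade's schema) of
# comparing per-question gender gaps against the overall gap.
import pandas as pd

# One row per student per question: 1 = correct, 0 = incorrect
df = pd.DataFrame({
    "question": ["Q1", "Q1", "Q1", "Q1", "Q2", "Q2", "Q2", "Q2"],
    "gender":   ["girl", "boy", "girl", "boy", "girl", "boy", "girl", "boy"],
    "correct":  [1, 0, 1, 1, 0, 1, 1, 1],
})

# % correct by question and gender, then the girl-minus-boy gap per question
by_q = df.pivot_table(index="question", columns="gender",
                      values="correct", aggfunc="mean") * 100
by_q["gap_pp"] = by_q["girl"] - by_q["boy"]

# Overall gap across all questions, for comparison
overall_gap = (df[df.gender == "girl"].correct.mean()
               - df[df.gender == "boy"].correct.mean()) * 100

print(by_q)
print(f"overall gap: {overall_gap:.1f} p.p.")
```

Questions whose gap sits well above or below the overall gap are the ones worth a second look, which is essentially what the two examples above show.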
You could come up with hypotheses as to why these disparities exist… but whatever the reason, I think these findings mostly fall into the category of "cool but not actionable" insight. After all, even if you see a disparity in the performance of boys and girls in interpreting pictograms, you're probably not going to differentiate the way you teach the content to those gender groups as a result. And while we're confident that we're accurately reporting on the national picture, that doesn't mean your class exactly reflects these trends. Even if your students do have a similar disparity in their results, it's likely to be a better use of time to find areas deserving of reteaching across a whole class (or broader areas for intervention for an individual student) than to design a girl-specific pictogram teaching approach!
But it is interesting, right?
To conclude, at Smartgrade we obsess about providing analysis that helps you glean actionable insights from your data, but we also feel it’s our responsibility to help schools work out what analysis isn’t worth doing. So we hope this blog has given some tangible examples of how to (and how not to) use data to inform your decision-making. If this has sparked your interest in Smartgrade, or you just want to have a natter about good practice in school assessment, please do get in touch.
We’re offering FREE access to our final Spring 2 Practice SATs assessments when schools sign up for 2024/25. Book in a demo using the link below to find out more or email us at sales@smartgrade.co.uk.