The ‘north/south attainment gap’ claimed to exist by The Sutton Trust, The Social Mobility Foundation, the DfE, the National Schools Commissioner and virtually the entire English educational establishment is a fallacy. The actions taken by the government to ‘close the gap’, which apply market pressure from league tables and OfSTED to allegedly under-performing Northern schools, are counterproductive and are having the opposite effect to that intended. This has been revealed by this recent BBC News investigation.
The allegation of under-performing schools is based on the comparatively poor GCSE attainment of Free School Meals (FSM) pupils in the north of England, relative to their KS2 SATs scores. A comparison is made with pupils suffering comparable levels of socio-economic deprivation in London boroughs, where this ‘attainment gap’ is not found. The true explanation lies in the different cognitive ability and ethnicity profiles, featured in the BBC report, of London boroughs compared to the ‘white working class’ districts targeted by the ‘attainment gap’ allegations.
The BBC rightly draws attention to the perverse outcomes of the DfE’s ‘Progress 8’ school accountability measure, which uses KS2 SATs as the baseline for measuring the progress made by students at their secondary schools. The fact that neither the SATs themselves, nor their statistical manipulation by the DfE, are fit for purpose gives rise to even greater concerns than those raised in the BBC article.
I have been supported and joined in my investigations by John Mountford, a retired headteacher and former OfSTED inspector. His inquiries have revealed that it is not just Free School Meals (FSM) children in the north of England who are affected, but also those in the more prosperous south. This is the letter he has sent to his MP.
Dear Mr Rees-Mogg
Thank you for your prompt reply to my initial inquiry regarding testing in schools (copied below for your convenience). I note that you have referred the matter to Rt. Hon. Nick Gibb and are awaiting his response. I will, in due course, take the opportunity to attend one of your surgeries, as suggested. In the meantime, there have been further developments I wish to bring to your attention.
The limited research my colleague and I are engaged upon is, even at this early stage, yielding results that require a response from the DfE.
The KS2 SATs were revised for 2016. In the original version, the raw marks from the exams were used to set National Curriculum Levels, with the raw mark thresholds for each level determined each year by the DfE. The DfE then determined the minimum acceptable proportion of pupils in every primary school that should achieve Level 4. This was the ‘floor target’ with schools failing to meet it being placed in ‘Requires Improvement’ or ‘Special Measures’ categories by OfSTED. SATs have always been ‘attainment tests’ based on specified content set out by DfE.
The post 2016 SATs are different. The concept of National Curriculum levels has been abandoned. The DfE now report SATs results on a ‘scale’ with a mean of 100, a minimum of 80 and a maximum of 120. This is ‘explained’ here, except that there is no explanation, just a set of conversion charts, changed each year, to convert raw SATs exam marks to a score on the 80 – 120 scale.
There is no explanation of what ‘attainment descriptors’ apply to the scaled score of 100, nor to any other scaled score, including the minimum and maximum of 80 and 120. The only valid statistical alternative to criterion-referenced attainment descriptors is norm-referenced percentiles. For example, the IQ/cognitive ability scale enables percentiles to be obtained for every standard score. It is not clear that the SATs ‘score’ is a standard score at all in the statistical sense. If it were, the DfE could state the percentile represented by the minimum expected score of 100 for all pupils. In the 2017 SATs, the DfE announced that 61% of pupils had met the ‘expected standard’ by attaining a scaled score of at least 100. It is therefore clear that the ‘expected minimum scaled score’ of 100 cannot be the 50th percentile: if 61% of pupils attained it, only about 39% scored below it.
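The percentile argument can be checked with a few lines of arithmetic. The sketch below is ours, not the DfE's: it assumes, purely for illustration, that a genuine standard score is normally distributed with a mean of 100 and a standard deviation of 15 (the convention used for cognitive ability tests), and the variable names are hypothetical.

```python
from statistics import NormalDist

# Assumption: a genuine standard score is normally distributed, mean 100, SD 15.
standard_scale = NormalDist(mu=100, sigma=15)

# On such a scale, half the population scores below 100.
share_below_100 = standard_scale.cdf(100)
print(f"Share below 100 on a true standard scale: {share_below_100:.0%}")

# DfE figure quoted in the letter: 61% attained a scaled score of at least 100,
# which leaves about 39% of pupils below it.
reported_share_at_or_above = 0.61
share_below_reported = 1 - reported_share_at_or_above

# The standard score that 61% of pupils would reach or exceed on a true standard scale.
equivalent_standard_score = standard_scale.inv_cdf(share_below_reported)
print(f"Standard score reached by 61% of pupils: {equivalent_standard_score:.1f}")
```

On these assumptions, a threshold met by 61% of pupils sits several points below 100 on a true standard scale, which is why a SATs scaled score of 100 cannot be read as the 50th percentile.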
We have asked respected academics of international standing to comment but none have so far made any statistical sense of it. We invite you to take advice from your own contacts in the academic world alongside any response you get from the DfE. It appears that the SATs ‘scale’ of 80 – 120 is not a ‘standard scale’ of any kind. It appears to be an arbitrary creation, along with the conversion tables for converting raw marks into SATs scores. In this context, it is important to note that Cognitive Ability Tests, in contrast, are standardised according to established statistical procedures, which is why they are still employed by grammar schools for their 11 plus selection tests.
The data acquired as part of our research confirm that SATs results are inflated when compared to Non-Verbal Reasoning (NVR) test standard scores, most markedly for pupils attaining the lower NVR scores. This has serious implications for secondary schools when setting Attainment 8 and Progress 8 targets, particularly for schools with high numbers of FSM children on roll, because we know from data published by GL Assessment, who provide the Cognitive Ability Tests, that FSM children have, on average, lower cognitive abilities.
For example, we have used FoI to obtain the data for the nine Non Verbal Reasoning Test bands used [for admissions purposes] by one large school. These give the number of pupils in each band in brackets. Underneath each band the mean scaled SATs score for reading and for maths are provided, in that order.
NVR Band 1 (9) corresponds to -2 SD (2nd percentile)
SATs 95, 97
Band 3 (25) to -1 SD (16th percentile)
SATs 102, 104
Band 5 (40) to the mean (50th percentile)
SATs 105, 109
Band 7 (28) to +1 SD (84th percentile)
SATs 110, 110
Band 9 (9) to +2 SD (98th percentile)
SATs 117, 116
NVR Band 1 pupils should presumably be performing at the 2nd SATs percentile. We do not know what SATs score this corresponds to, but it is certainly not 95/97.
NVR Band 3 pupils should be performing at the 16th SATs percentile. This cannot be 102/104.
NVR Band 5 pupils should be performing at the 50th SATs percentile. Could this be 105/109?
NVR Band 7 pupils should be performing at the 84th SATs percentile. Could this be 110?
NVR Band 9 pupils should be performing at the 98th SATs percentile. Could this be 116/117?
This is like taking the bottom of the regression line and moving it up so that -2 SD becomes 96 instead of 70, which we are informed is statistical nonsense. We hypothesise that this pattern results from primary schools with a high proportion of low-NVR pupils resorting to cramming and coaching methods to meet the DfE floor target. Such children will have understood little and will have forgotten most of it by the end of the summer holidays. Hundreds of secondary schools report this as the reason why they buy the Cognitive Ability Tests (CATs): to reliably inform diagnostic and target-setting interventions for their pupils.
So, it emerges from our analysis that SATs scores are systematically inflated for pupils of lower cognitive ability, and the lower the cognitive ability score, the greater the inflation.
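The pattern can be laid out numerically. The sketch below uses only the five band figures quoted above and places each NVR band on the conventional standard-score scale (mean 100, SD 15, which is our assumption about the NVR test). It treats the two scales as directly comparable, as the comparison of 96 with 70 does. Band sizes are ignored, the four bands not quoted are omitted, and the names are hypothetical.

```python
# Band data quoted above: NVR band -> (SDs from the NVR mean, mean reading SATs, mean maths SATs)
bands = {
    1: (-2, 95, 97),
    3: (-1, 102, 104),
    5: (0, 105, 109),
    7: (1, 110, 110),
    9: (2, 117, 116),
}

NVR_MEAN, NVR_SD = 100, 15  # assumption: conventional standard-score scale


def band_summary(z, reading, maths):
    """Return (NVR standard score, mean of the two SATs scores, SATs minus NVR)."""
    nvr_score = NVR_MEAN + z * NVR_SD
    sats_mean = (reading + maths) / 2
    return nvr_score, sats_mean, sats_mean - nvr_score


summary = {band: band_summary(*row) for band, row in bands.items()}
for band, (nvr, sats, gap) in summary.items():
    print(f"Band {band}: NVR {nvr}, mean SATs {sats}, difference {gap:+}")

# Unweighted least-squares slope of mean SATs score against SDs from the NVR mean.
zs = [row[0] for row in bands.values()]
ys = [values[1] for values in summary.values()]
z_bar, y_bar = sum(zs) / len(zs), sum(ys) / len(ys)
slope = sum((z - z_bar) * (y - y_bar) for z, y in zip(zs, ys)) / sum(
    (z - z_bar) ** 2 for z in zs
)
print(f"SATs points per NVR standard deviation: {slope:.1f}")
```

On these figures the difference is largest for Band 1 and shrinks steadily as the NVR band rises, and the mean SATs score moves by under five points for each standard deviation of NVR score.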
I apologise for the volume of detail contained herein, but real life is rarely simple. As you will appreciate this is a matter of great importance. The fate of individual pupils and schools depends on this system being transparent and reliable. Clearly, this could potentially threaten the robustness of the whole examination system. As such, we believe it to be an urgent matter, requiring a thorough investigation.
These data represent clear evidence of the general inflation of SATs scores compared to Cognitive Ability Test (CAT) scores for the same pupils, especially for those of lower cognitive ability.
We have also obtained SATs and CATs data for the 2017 admission cohorts of two schools in the south of England. These schools do not use CATs to drive ‘fair banding’ admission systems; they purchase them to screen their Y7 intakes for a variety of purposes, including target setting and the diagnosis of barriers to learning. These data confirm the EEF findings that the alleged FSM ‘attainment gap’ is common to all schools, including those judged by OfSTED to be ‘good’ or ‘outstanding’.
School A (101 students, FSM 10)
Mean KS2 Standard Scores
GPVS (Grammar, Punctuation and Spelling): cohort – 105, FSM – 102
Reading: cohort – 103, FSM – 101
Maths: cohort – 102, FSM – 99
Mean CATs Scores (percentiles shown in brackets)
Verbal: cohort – 99 (47th), FSM – 96 (39th)
Non-Verbal: cohort – 95 (37th), FSM – 88 (21st)
Quantitative: cohort – 96 (39th), FSM – 88 (21st)
Spatial: cohort – 97 (42nd), FSM – 91 (27th)
School B (308 students, FSM 37)
Mean KS2 Standard Scores
GPVS (Grammar, Punctuation and Spelling): cohort – 107, FSM – 104
Reading: cohort – 106, FSM – 102
Maths: cohort – 105, FSM – 101
Mean CATs Scores (percentiles shown in brackets)
Verbal: cohort – 101 (53rd), FSM – 94 (34th)
Non-Verbal: cohort – 98 (45th), FSM – 91 (27th)
Quantitative: cohort – 101 (53rd), FSM – 93 (32nd)
Spatial: cohort – 102 (55th), FSM – 96 (39th)
In a normal distribution, data points cluster around the mean (100), with fewer data points towards the extremities of the distribution. Percentiles show the percentage of the general population with that score or below (eg a mean score of 100 represents the 50th percentile). Because the SATs standard scores are not standard scores in the statistical sense, they cannot be converted into percentiles (eg the score of 100 does not represent the 50th percentile, but some arbitrary percentage decided each year by the DfE).
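For readers who want to see how a CATs standard score maps to a percentile, here is a minimal sketch. It assumes CATs standard scores are normally distributed with a mean of 100 and a standard deviation of 15; the function name is ours and is not part of any GL Assessment tool.

```python
from statistics import NormalDist

# Assumption: CATs standard scores are normally distributed, mean 100, SD 15.
CATS_SCALE = NormalDist(mu=100, sigma=15)


def cats_percentile(score: float) -> float:
    """Percentage of the general population at or below a CATs standard score."""
    return 100 * CATS_SCALE.cdf(score)


# Mean Non-Verbal scores quoted above for the two schools.
non_verbal_means = [
    ("School A cohort", 95),
    ("School A FSM", 88),
    ("School B cohort", 98),
    ("School B FSM", 91),
]
for label, score in non_verbal_means:
    print(f"{label}: score {score} -> percentile {cats_percentile(score):.0f}")
```

Under that assumption the sketch gives the 37th, 21st, 45th and 27th percentiles, matching the bracketed Non-Verbal figures above. No equivalent calculation is possible for the SATs scaled scores, because no distribution has been published for them.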
The data from these schools are highly informative.
- The SATs scores are generally inflated compared to CATs. We believe that this is a consequence of their high-stakes nature and susceptibility to ‘gaming’ through extreme coaching and cramming. Secondary schools need CATs data before they or anybody else can form valid judgements of the progress of their students. The SATs-based judgements used to justify the ‘attainment gap’ argument of the DfE, the Sutton Trust and the Social Mobility Foundation lack appropriate statistical validity. As stated in the BBC article by the heads of schools with high proportions of ‘white working class’ pupils, the current system is indeed ‘institutionally toxic’ towards their schools.
- The SATs scores of FSM pupils are slightly depressed (by 2 to 4 points), but their CATs scores are far more so (by up to 8 points). This confirms the GL Assessment national data showing that FSM children on average have significantly lower cognitive abilities. Please do not shoot the messenger. This is factually beyond dispute.
- School B has a higher mean cognitive ability intake than School A. Therefore, all other things being equal (quality of teaching etc) it should get better GCSE results and a higher league table placing. But the ‘Progress 8’ measure is statistically incapable of validly differentiating between schools with different ethnic mixes (BBC article) and with different proportions of FSM (our research), making it unfit for purpose.
- Cognitive ability is not fixed at birth, nor by anything else. Although it is often ignored, considerable expertise exists on how cognitive ability can be raised throughout life, and especially during the school years, through the right approaches to teaching and learning.
Put simply, instead of cramming our children with knowledge for SATs and GCSEs in ways that inhibit cognitive development, our pupils deserve educational experiences of the highest quality that make them cleverer and wiser, as well as more knowledgeable. Ways of achieving this are well established (eg by the EEF), although not specifically recognised by OfSTED or promoted by the DfE.
This leads to the inevitable conclusion that SATs are not fit for purpose and that the DfE, OfSTED, The Sutton Trust and the Social Mobility Foundation have got the ‘attainment gap’ completely wrong.