This blog piece was originally published on the SSAT Blog (link) (December, 2017)
Our latest cohort of students, all current or aspiring school leaders, have been getting to grips with school performance tables, Ofsted reports, the new Ofsted inspection dashboard prototypes, the Analyse School Performance (ASP) service and some examples of school tracking data. As they write their assignments on school performance evaluation, I realise others might be as interested as I have been in what we are learning.
There was a general agreement in the group that using progress (ie value-added) indicators rather than ‘raw’ attainment scores give a better indication of school effectiveness. As researchers have known for decades and the data clearly show, raw attainment scores such as schools’ GCSE results say more about schools’ intakes than their performance.
Measuring progress is a step in the right direction. However, as I pointed out in an (open access) research paper on the limitations of the progress measures back when they were introduced, a Progress 8 measure that took context into account would shift the scores of the most advantaged schools enough to put an average school below the floor threshold, and vice versa.
Confidence lacking in confidence intervals
Recent research commissioned by the DfE suggests that school leaders recognise this and are confident about their understanding of the new progress measures. But many are less confident with more technical aspects of the measure, such as the underlying calculations and, crucially, the accompanying ‘confidence intervals’.
Those not understanding confidence intervals are in good company. even the DfE’s guidance mistakenly described confidence intervals as ‘the range of scores within which each school’s underlying performance can be confidently said to lie’. More recent guidance added the caveats that confidence intervals are a ‘proxy’ for the range of values within which we are ‘statistically’ confident that the true value lies. These do little, in my view, to either clarify the situation or dislodge the original interpretation.
A better non-technical description would be that confidence intervals are the typical range of school progress scores that would be produced if we randomly sorted pupils to schools. This provides a benchmark for (but not a measure of) the amount of ‘noise’ in the data.
Limitations to progress scores
Confidence intervals have limited value however for answering the broader question of why progress scores might not be entirely valid indicators of school performance.
Here are four key questions you can ask when deciding whether to place ‘confidence’ in a progress score as a measure of school performance, all of which are examined in my paper referred to above on the limitations of school progress measures:
- Is it school or pupil performance? Progress measures tell us the performance of pupils relative to other pupils with the same prior attainment. It does not necessarily follow that differences in pupil performance are due to differences in school (ie teaching and leadership) performance. As someone who has been both a school teacher and a researcher (about five years of each so far), I am familiar with the impacts of pupil backgrounds and characteristics both on a statistical level, looking at the national data, and at the chalk-face.
- Is it just a good/bad year (group)? Performance of different year groups (even at a single point in time) tends to be markedly different, and school performance fluctuates over time. Also, progress measures tell us about the cohort leaving the school in the previous year and what they have learnt over a number of years before that. These are substantial limitations if your aim is to use progress scores to judge how the school is currently performing.
- Is an average score meaningful? As anyone who has broken down school performance data by pupil groups or subjects will know, inconsistency is the rule rather than the exception. The research is clear that school effectiveness is, to put it mildly, ‘multi-faceted’. So asking, ‘effective for whom?’ and, ‘effective at what?’ is vital.
- How valid is the assessment? The research clearly indicates that these measures have a substantial level of ‘noise’ relative to the ‘signal’. More broadly, we should not conflate indicators of education with the real thing. As Ofsted chief inspector Amanda Spielman put it recently, we have to be careful not to ‘mistake badges and stickers for learning itself’ and not lose our focus on the ‘real substance of education’.
There is no technical substitute for professionals asking questions like these to reach informed and critical interpretations of their data. The fact that confidence intervals do not address or even connect to any of the points above (including the last) should shed light on why they tell us virtually nothing useful about school effectiveness — and why I tend to advise my students to largely ignore them.
So next time you are looking at your progress scores, have a look at the questions above and critically examine how much the data reveal about your school’s performance (and don’t take too much notice of the confidence intervals).