Failing the test: the perversity of England’s exams

Imagine that you are education secretary Gavin Williamson. It is 13th August 2020, and the algorithm employed by Ofqual to “moderate” the teacher-assessed A-level grades, which followed the pandemic cancellation of exams, has come under intense fire. The results are out, and teachers in England have had almost 40 per cent of the grades they awarded marked down.

Q1. Do you:
A. Scrap the algorithm, bowing to the concerns of prominent figures such as Jon Coles, a former Department for Education director general, who warned you in early July that it was flawed;

B. Stick with the algorithm, citing the instruction you issued to Ofqual in March to ensure that the distribution of grades “follows a similar profile to that in previous years”;

Or C. Dither for five days, defending the system as “robust,” before finally performing a U-turn, claiming you have only just discovered the algorithm is unfair, and following the decision already taken in Scotland, Wales and Northern Ireland to simply go with the teachers’ grades.

No marks for choosing the correct answer, but the trickier follow-up question is whether last summer’s saga merely reflects Williamson’s own (undoubted) shortcomings, or is instead a reflection of the confused way in which all politicians—and perhaps many of us voters too—approach examinations. Have we got any coherent idea of what we want them to do?

Labour leader Keir Starmer hailed Williamson’s capitulation as “a victory for the thousands of young people who have powerfully made their voices heard.” It’s true that the algorithm did threaten to prevent high-achievers at schools in poorer areas from transcending their circumstances. But last year’s abortive marking-down of disadvantaged kids was really just an automated version of what happens every year with little public outcry. And getting rid of it promptly created other problems, as grades went up across the board.

Oversubscribed universities asked applicants to defer full courses or choose an alternative subject. This year, with minimal moderation, the marks have risen higher. Students with places at the most coveted medical schools have been offered £10,000 to switch institution, despite the government funding extra places. With more youngsters holding passports to the elite Russell Group universities, other institutions are suffering a ruinous drop in recruitment.

Williamson now strikes a genial note, brushing aside complaints about grade inflation by saying students “deserve to be rewarded” after the pandemic disruption. But if the guiding principle of assessment is being nice, then the system—surely—is sunk.

Q2. Take three identically able students. Olivia gets AAB in 2020, when more than 38 per cent of results in England are A* or A, compared to 25 per cent in 2019. Karim gets AAA in 2021, when almost 45 per cent are A* or A. What should Jack get in 2022, when exams are almost back to normal?

This year, unlike 2020, schools have been able to choose between a wide variety of assessment methods, raising questions about comparability. This apparent freedom has spawned vast compensatory rituals of probity, with students (ironically) sitting multiple tests, and teachers filling out endless forms for exam boards. We end up in the worst of all worlds: ballooning bureaucracy and the erosion of standards.

If all must have prizes, achievements lose their value. The teachers who, by dint of strange circumstance, found themselves newly empowered to grade their pupils, appeared to be tortured about explicitly declaring any work to be less than first-rate, let alone below average. In that, they are in tune with a wider crisis of judgment in our society. But since higher education and the job market are ever-more competitive, teachers can hardly be blamed for giving their students the benefit of the doubt—although the cumulative effect does nobody any good.

The English system is unusual for having such decisive exams at 18; universities in other countries are far less stratified. Indeed, English pupils are among the most tested in the world, but the irony is that we exhibit such collective awkwardness about producing meaningful results.

Within two years, in primary school alone, standardised assessments will be inflicted on children in Reception, Year 1, Year 2, Year 4 and Year 6: a gruelling regime for teachers and pupils alike that’s opposed by campaigning groups such as More than a Score. For a lot of pain, we get little gain. After running this gauntlet right through to GCSE and then A-levels, pupils are emerging to find that the results might not count for much—in order to filter the enormous numbers with top grades, elite universities are now setting their own entrance exams.

Picking out exceptional talent may have got harder amid all those top grades, but one might still hope that they reflect a positive transformation of real average attainment. Sadly, there is scant evidence of that. Andreas Schleicher is director of education and skills at the OECD, heads up Pisa, the Olympics of school performance, and was once described by Michael Gove as “the most important man in English education.” “The UK is at one end of the spectrum: everything is standardised, and assessment is very high stakes,” he told me. Yet he regards the UK as an “average performer” educationally. (Finland has no standardised testing, and is one of Pisa’s gold medallists.)

The Covid fiasco has amplified calls for a rethink. John Major and eight former education secretaries recently told the Times’s new Education Commission that our methods of assessment need an overhaul. A review commissioned by the National Education Union will report by the end of this year. A campaign, Rethinking Assessment, has been co-founded by Tony Blair-speechwriter-turned-teacher Peter Hyman, who set up the innovative School 21 in east London and is now a co-director of the multi-academy trust and school reform group Big Education. “There’s a massive coalition now for radical change,” Hyman told me. “The question is exactly what form that change will take.”

The logic of the protesters’ placards proclaiming “teachers know best” is that we would do well to cancel exams forever. But teachers are not immune to unconscious bias, and in the absence of formal consistency, private schools press home their advantage. Last year, despite the hated algorithm’s notorious suppression of high-flying marks in low-performing schools, it turned out that the reversion to grading by teachers actually ended up boosting individual pupils from graduate homes by 17 percentage points more than others. And in this second year of teacher assessment, the proportion of entries from private schools resulting in the top A* is up by 12 percentage points, compared to a 4-point rise in state comprehensives and 2-point rise at state sixth-form colleges. Race attainment gaps widened too. As well as being perverse in its results, “teacher assessment is incredibly hard to do,” Mary Richardson of UCL’s Institute of Education told me, because teachers are “under so much pressure to show improvement.”

So if teachers don’t necessarily “know best,” which kinds of standardised assessments should we go for?

Q3. Set out the case for and against exams. Use evidence to back up your answer.

I got BBD at my comprehensive sixth form, but still gained a place at Oxford through the now-abolished two-E offer route. I did well in my finals, by (I’m convinced) pretending to be a boy, making confident, sweeping arguments in big, bold handwriting. The experience was intense—quivering in my black-and-white costume as the Examination Schools bell tolled—but it was also thrilling; and whatever setbacks I’ve had since, nobody can take my First away from me. (But I’ll admit to being irked that, while in my day, it put me in the top 10 per cent of Oxford graduates, now I’d be among over 30 per cent.) Exams are a rite of passage, a character-building drama, and the results are a definitive stamp of approval—if things go well, that is. There’s a reason that newspapers print photos of students—invariably female and attractive of course—hugging joyfully.

Sam Freedman was an expert adviser to Gove when he was education secretary. He is now a senior fellow at the Institute for Government and an adviser to the education charity Ark. “Exams are by far the most reliable way we have of comparing people to each other,” he told me. High stakes make learning stick. Revision strengthens information retrieval. We may have the internet in our pockets, but to find the most accurate information you need to know which search terms to use.

On the negative side, exams are a single-day snapshot. They prioritise performance under pressure over deep knowledge, and are notoriously stressful: Pisa has found that pupils in the UK are among the most anxious in the world, more anxious than their counterparts in exam-mad China and Japan. Worst of all, they turn teachers, pupils and examiners into robots. Shaped by the National Curriculum, league tables and the rise of increasingly-automated marking (AI was edging its way into the system before the algorithm), pupils expect to be spoon-fed. Rationally, perhaps, they demand soul-sapping “exam technique” classes.

Put to one side any unease about what this means for the future of the questioning citizenry on which a healthy democracy depends, and think purely about the consequences for education. It is becoming a series of arbitrary, circumscribed and formulaic exercises. “The reproduction of subject matter knowledge is quite easy to put in a tight, multiple-choice test,” Schleicher told me, citing maths as an obvious example; “but if you want to critically assess creative work, you need a different set of tasks, and at the moment this is not so well integrated in the UK.”

I spoke to Beatrice Coldwell, a 19-year-old from West Yorkshire, about how the chaos of last year contributed to her missing out on her preferred course at Manchester University—but it turned out something else dismayed her more. “The whole assessment process was flawed even without all that,” she told me, describing how, after an inspiring Year 12 exploring big ideas and literary texts in all their complexity, the shift to exam preparation in Year 13 changed everything. “It’s horrific, it just drains essays of their creativity. You pick the quotes you can remember easily, rather than the ones that are multifaceted. When we wrote practice paragraphs, they’d always have to be in this specific format: point, evidence, explain, link” (the mind-numbing PEEL formula). “I didn’t read poetry for months afterwards,” she told me. Luckily, she resat through the second year of assessment chaos, and is now off to study far away, at the University of London Institute in Paris.

The archaic ceremony of exam hall, pen and paper is ever-less applicable in later life—and not only because handwriting has almost disappeared everywhere apart from school. Peter Hyman points to the appalling decline of practicals in science: “We’ve got to the absurd situation where to be a good chef within our system, it’s all about whether you can write an exam about nutrition for two hours rather than cooking a meal.” Companies such as PwC and KPMG are developing their own skills-based assessments: “they want problem-solvers, collaborators and good communicators,” Hyman said.

And finally, the system of “comparable outcomes” designed to ensure consistency over time in normal years (which the 2020 algorithm was attempting to simulate) has been critiqued as a cap on aspiration and school improvement. The “forgotten third” of 16-year-olds in England who do not gain a “standard pass” in English or Maths (a C in old money, called a “4” since a 2017 translation of letters into numbers) are simply condemned to keep re-taking them during sixth form, usually without success.

There is no shortage of alternatives; some are already in use in English schools, state as well as private. The International Baccalaureate keeps the curriculum broad from 16 to 18. The Extended Project Qualification, equivalent to 50 per cent of an A-level, can be either a 5,000-word essay or an “artefact”—a musical composition, exhibition, or app—with accompanying commentary. Universities have been experimenting with open-book and open-web exams. Coursework, abolished by Gove, has its detractors: it makes stress permanent, and cheating is common, although plagiarism detection software is evolving.

Digitisation may offer ways to track students’ progress as they learn. It gets slightly creepy, as pupils end up continually sitting an “exam” although assessment is a mile from their mind, but some think that’s no bad thing: “one of the biggest mistakes that we made in the last 100 or so years when we industrialised education was to divorce assessment from learning,” Andreas Schleicher told me.

Students at Envision schools in California publicly defend portfolios of work to peers, teachers, and members of the community: high stakes in a good way. At another American school network, the New York Performance Standards Consortium, pupils devise their own assessments through class discussion. In Victoria, Australia, students are tested on “critical and creative thinking”: for example, 16-year-olds might design and play an escape room, a team-building activity in which players must solve puzzles in order to be “released.”

Diversity brings benefits. “Some students might excel in an open-ended written essay, others might do well on multiple choice, others express themselves orally,” Schleicher told me. “The more variety you have, the more chance you have of being fair.” Undergraduates are assessed in a number of ways: observation, collaborative work and vivas—the last of these enable examiners to probe past answers and dig out promise that lacks polish. In Canada, New Pedagogies for Deep Learning advocates a veritable smorgasbord of assessment options—from Socratic dialogue to comics, sculpture to podcasts.

There are ways to make teacher assessment more accurate, too: Tim Brighouse, former chief commissioner of schools, would apply the university system of internal marking and external moderation. “It’s not surprising there’s bias in teacher assessments, as they’re not properly taught how to eliminate it,” Richardson told me. Solutions include blind marking, swapping scripts between schools, and a technique called comparative judgement which involves multiple markers arranging scripts in rank order (as Sam Freedman notes, it’s easier to say one piece of work is better than another than to assign work a grade).

While they wait for the revolution, some schools are adding in what they regard as better assessments. “If you believe in a broad education,” Hyman told me, “you need to measure the full education offer.” School 21 enshrines the principles of “head, hand and heart,” and tests not just knowledge (in the form of traditional GCSEs and A-levels) but also practical and interpersonal skills. In its arsenal is an “oracy assessment toolkit” to evaluate spoken responses to “talking points”—“critically examining ideas and views expressed,” but also “turn-taking.” It is admirable, even if the language illustrates how assessment renders everything—including designing an escape room—somewhat grey.

Other schools are doing less testing altogether. Olly Newton—executive director of the educational charity Edge Foundation, and another co-founder of Rethinking Assessment—is working with the Bohunt Education Trust academy chain based in Hampshire to slim down its GCSE programme, allowing other activities to be accommodated, such as an (outdoors-heavy) “forest school.”

Scrapping GCSEs is low-hanging fruit: here, progressive campaigners have the support of Conservatives including Robert Halfon, chair of the Commons Education Select Committee. No less a figure than Kenneth Baker, who introduced GCSEs as Thatcher’s education minister in 1988, accepts they are now redundant. Why? The original idea was to ensure “everyone left school with something,” but neither half of that any longer applies. Not everyone does get “something” (D and E grades—let alone Gs—were very soon understood by the system as fails) and very few are any longer leaving school then at all. Back in 1950, 93 per cent of pupils left school by 16, and 7 per cent stayed on; now those figures are precisely reversed. Very few other countries have exams at 16.

As Freedman points out, however, 70 per cent of students move to a different institution at 16, often via selection based on GCSE results. “Every time you put selection into a system, you distort the curriculum by creating high stakes,” Freedman told me. The greatest distortion is at A-level: “at the moment, whether we say so or not, our behaviour implies that the main function of secondary education is essentially to help higher and further education to select,” he said. Universities, and particularly sixth forms, do not have to be selective as they are in England: in many countries they are not. If we followed their direction, some of the pressures and problems in our exam system might fade.

Such a shift would prioritise acquisition over comparison. Many of the experts I spoke to cited cumulative portfolios (sometimes called the comprehensive learning record) as their ideal approach. “I was a CSE-type student,” Richardson told me, referring to the old pre-GCSE exam for those the system deemed non-academic. “I left school at 16, with very few qualifications. So I had to slowly build up my record of achievements over time while I was working. And for me, that worked.” Peter Hyman predicts that “in 10 years’ time, everyone will leave school with a URL. And that will be your passport for employers. The radical idea is to say to secondary schools: you don’t need to be sifting people. You just need to be collecting their own strengths together.”

Q4. Identify which of the following statements is true:

1. Testing promotes excellence and provides the opportunity to achieve justified success.

2. Testing is unnecessary in a world where we invest to create unlimited job opportunities and re-imagine education as lifelong learning.

3. Life isn’t fair, but you have to play the game.

It’s a tricky one. Exams were introduced in the 19th century to replace nepotism with a more equitable allocation of scarce educational benefits. But privilege will find a way in education as in everything else, and real meritocracy never arrived (some universities, including Oxbridge, now endeavour to re-interpret results by adjusting for applicants’ backgrounds). We test students relentlessly, but shield them from disappointment; and then we cover our eyes when the majority find they are ill-equipped to secure a fulfilling job.

While the vast wealth gap goes untackled, concern for social justice has been transposed onto educational standards—perhaps because, unlike with wealth, there is an apparent way to “level up,” in the government’s favourite phrase, without any need for anyone to be levelled down: “just give everyone As.” But as with those suddenly oversubscribed university courses, this creates its own problems. As we’ve seen in this year’s widening of the state-private gap, even a devalued educational currency doesn’t end up fairly shared. Besides, knowledge and intellectual inquiry will be damaged if they’re used to paper over economic inequality.

Some propose achieving fairness instead by raising vocational qualifications to “parity of esteem” with academic ones. The problem is that such divisions always map onto existing social faultlines, tainting the vocational side. The government’s plan to substitute BTECs with T-levels is only the latest in a line of rebrands—Whitehall, after all, initially talked up the secondary “moderns” as attractive, more practically focused counterparts of grammars in the postwar system, but the reality of their standing could not for long be disguised. As Olly Newton points out, medicine and engineering are vocational qualifications, but their high status means they’re classified as academic.

Simply grading things higher or proclaiming things more esteemed takes us nowhere: 85 per cent of English schools are now rated “Good” or “Outstanding,” when it is plain that the system is far from perfect. I can’t help suspecting that if we could only muster adequate inputs in terms of investment—especially in terms of teacher training (the secret of Finland’s success)—then we might find we could expend less energy obsessively measuring outputs.

It is time, surely, to recognise that turning everything into a market leads to gaming and rewards that can’t be trusted. Let us instead develop a more rounded system that—yes—still unflinchingly tests particular aptitudes when that is relevant and can be meaningfully done, but one that also nurtures the individual and serves the collective.

Tim Brighouse would like to see a broader qualification at 18—part exam, part record of achievements, part long-term project—that reflects a more idealistic conception of education’s purpose. “We want people to grow up thinking for themselves and acting for others,” he told me. “We want them to have a range of qualities. We want them to be themselves, have fulfilling lives, and contribute to the fulfilment of others.”

So what are the chances my own primary-school-age kids will avoid the sausage factory? Newton, who was once a civil servant, thinks top-down change will be a 10-year process. “Within the Department of Education, current ministers are just not interested in this at all,” he told me. “But amongst wider policymakers, as well as teachers, parents and young people themselves, there’s a huge amount of interest.”

And that matters. For Hyman—who, remember, looked at policy from inside No 10 before he moved to the world of education—perhaps 80 per cent of change is always “bottom up.” Even before the votes of parents call time on a system that works for no one, innovative schools will try—are already trying—things, and “politicians will respond to what’s happening on the ground.”

So in the end, change will come. But the great challenge is to ensure it is simultaneously radical and rigorous. That means testing less overall, testing more creatively and effectively, and being more courageous in our judgements. Only then will we start to see some real results.