Volume II — GAKHUR: A Philosophy of Learning and Human Formation

Chapter 13: When Measurement Destroys What It Claims to Value

The Limit of Metrics in Human Formation

There is a principle in the social sciences, first articulated by the economist Charles Goodhart and later applied more broadly by social researchers, that has proven to be one of the most reliable and practically consequential observations about the behaviour of human institutions. It is reliable not in the way of a statistical regularity that holds under specified conditions, but in the deeper sense of describing a mechanism that operates with the consistency of a structural feature rather than a contingent tendency.

The principle states: when a measure becomes a target, it ceases to be a good measure. When an institution identifies a particular indicator as the primary measure of its success and directs its organisational effort toward producing that indicator, two things happen simultaneously and inevitably. The indicator improves — often dramatically, because the full weight of the institution's capacity is now directed at producing it rather than at the underlying reality the indicator was designed to reflect. And the underlying reality begins to diverge from the indicator — because the effort that was previously directed at producing the reality has been redirected toward producing the indicator, and these are not, in most human domains of any genuine complexity, the same activity.

This principle describes, with an accuracy that is genuinely uncomfortable to examine in educational contexts, what has happened to assessment in most educational systems across the period in which assessment has been the primary instrument of educational accountability. Examination results were introduced into educational systems as indicators of learning — as instruments designed to produce evidence of an underlying reality called learning, whose presence they were supposed to reflect rather than substitute for. They became targets — the primary institutional objectives around which teaching, curriculum, and professional evaluation were organised. And in becoming targets, they ceased to be genuine measures of learning in any sense that honours the relationship between the indicator and the reality it was designed to represent, because the effort directed at producing the indicator had displaced the effort directed at producing the learning whose presence the indicator was supposed to reflect.

What Measurement Cannot Reach

The most important things that genuine education produces, the things that the GAKHUR philosophy has been articulating across the preceding chapters as the aims of genuine formation, resist measurement. Every attempt that educational systems have so far made to operationalise them has either failed to capture the quality it set out to measure or, more seriously, damaged that quality in the very act of measuring it.

Genuine curiosity — the specific orientation toward understanding that sustains genuine learning throughout a lifetime, that generates questions from genuine engagement rather than from the strategic performance of engagement — is not a quantity and does not behave like one under institutional measurement. A learner whose curiosity is genuine asks questions because questions arise naturally from their actual engagement with ideas that matter to them, because the engagement produces genuine uncertainty that the questions are trying to resolve. A learner who has been trained to perform curiosity in the assessment conditions that reward its visible expression produces the appearance of questioning without the underlying developmental reality that genuine questioning represents — asks the questions that the assessment rewards rather than the questions that genuine engagement generates. These two states look sufficiently similar in an assessment context that no instrument currently available can reliably distinguish between them. They are fundamentally and consequentially different in their developmental significance and their long-term effects on the learner's relationship with intellectual life.

Genuine resilience is not a score on a scale — it is a quality that reveals itself in specific situations over genuine time in ways that depend on the nature of the specific difficulty, the developmental history of the specific person encountering it, and the specific relational conditions within which the encounter occurs, none of which any standardised instrument can adequately capture because each of them is irreducibly specific. Genuine ethical awareness cannot be adequately assessed by any instrument that rewards the production of correct ethical answers, because the production of correct ethical answers under assessment conditions and the possession of genuine ethical awareness that shapes actual decisions under genuine difficulty are not the same capacity and do not require the same development. And the integrated quality that the Gakhur concept names — the depth of understanding genuinely transformed over time and through genuine difficulty into reliable human judgment — is perhaps the most completely resistant to institutional measurement of anything that genuine education is trying to produce, because it is specifically a quality of being rather than of performance, a quality of the person rather than of their outputs, and its genuine presence is legible only to sustained relationship rather than to standardised assessment.

How Measurement Reshapes What It Measures

The most serious harm that measurement causes in education is not, as is sometimes argued, that it fails to capture what matters most — that its instruments are inadequate to their stated purposes, which is a problem of instrument design that could in principle be addressed by better instruments. The more serious and more genuinely consequential harm is that measurement, when it becomes the primary orientation of the educational enterprise rather than a specific tool used for specific purposes within a broader educational commitment, actively reshapes the reality it was designed to measure in ways that make genuine formation progressively less likely to occur — and does this not despite the quality of the measurement but through it, through the specific mechanism by which directing institutional effort at an indicator displaces the effort that would have been directed at the reality the indicator represents.

The first mechanism of this reshaping is the displacement of intrinsic motivation — the specific form of intellectual engagement that genuine learning requires and that is simultaneously the form most vulnerable to external evaluation. When the primary feedback a learner receives about their intellectual activity is external and evaluative rather than arising from their own genuine relationship with the ideas they are engaging with, the internal motivation that genuine learning requires is gradually displaced not through any act of resistance or refusal but through the entirely natural process by which external motivational systems colonise the space that internal motivation previously occupied, until the learner finds that the external measure has become the point of the activity in ways that make the activity feel purposeless when the external measure is absent.

The second mechanism is the progressive narrowing of what counts as learning within the institution's understanding of its own purpose. When certain aspects of learning are measurable in the forms that institutional accountability has learned to trust and others are not — when examination performance is measurable and genuine curiosity is not, when recall is measurable and the quality of judgment is not — the measurable aspects attract the institution's educational effort in proportion to the accountability weight they carry, and the curriculum, the timetable, and the educator's professional attention narrow progressively toward what the measurement can reach and away from what it cannot, until what the institution is actually doing on a daily basis has been substantially reorganised around the requirements of the measurement rather than around the requirements of genuine formation.

The third mechanism is the development of strategic rather than genuine engagement as the learner's habitual relationship with intellectual activity. Through years of consistent institutional experience, learners come to understand that what is rewarded is measured performance rather than genuine understanding, and they reach this understanding with considerable accuracy because it is accurate. They then adapt with the rationality of people responding to real incentives: they develop the skills of performing effectively within a measurement-dominated system rather than the skills of genuine learning, and they invest their cognitive and motivational resources where the system's rewards are located rather than where genuine formation would require. This adaptation is not a moral failing in the learners who practise it. It is the rational and predictable response of developing human beings to an institutional environment whose reward structure has been organised around the wrong things, and blaming the learners for the adaptation rather than the institution for the design is the misdiagnosis that allows the design to persist unchanged.

The Specific Harm of Ranking

Of all the forms that educational measurement takes, ranking deserves particular attention. This is not because it is the most technically flawed of the available assessment instruments, but because its harms are among the best documented in the educational literature and, at the same time, among the most resistant to institutional change. The functions that ranking serves (sorting, comparing, certifying, signalling competitive outcomes) are so institutionally useful that the evidence of their developmental costs has not produced the response those costs would seem to warrant.

Ranking transforms the educational environment from a community of learners engaged in the shared pursuit of genuine understanding into a competitive arena in which each learner's standing is defined in relation to the standing of every other learner — a transformation that directly and systematically destroys the specific social conditions that genuine learning requires. The psychological safety without which genuine intellectual risk-taking is unavailable, the mutual support through genuine difficulty that genuine intellectual community provides, and the shared engagement with ideas that is neither enhanced nor threatened by the comparative performance of those engaged in it — all of these are replaced, in a ranking-dominated educational environment, by the specific social dynamics of competition, in which other learners' success is a threat to one's own position and their difficulty is an advantage rather than an occasion for the mutual support that genuine learning community would require.

Ranking ties the learner's developing sense of worth to their relative position — which means it ties their sense of worth to something that is partly outside their control, that changes in response to other people's development independently of anything the learner does, and that is entirely relational in the specific sense that it has no meaning in the absence of comparison. And ranking communicates to the learner, with the authority of institutional repetition across years of schooling, the most fundamental distortion of what genuine learning is for that any institutional practice produces: that the purpose of learning is to be better than others rather than to become better than one was, and that the appropriate measure of one's development is one's position in a distribution rather than the quality and the honesty of one's genuine engagement with genuine difficulty.

Restraint as Educational Wisdom

The argument of this chapter is not that measurement should be eliminated from educational practice. Such a position would be both impractical and philosophically unnecessary: there are genuine educational purposes that appropriate measurement serves, and an argument against measurement's dominance should not be confused with an argument against its legitimate and carefully bounded use. The argument is rather that measurement should be practised with the restraint that genuine educational wisdom requires: restraint grounded in an honest account of what measurement can and cannot do, and in a commitment to protecting what it cannot do justice to from the pressure to make it measurable that an accountability-dominated educational culture consistently generates.

This restraint begins with the honest institutional acknowledgment that the most important outcomes of genuine education are not measurable in the forms that institutional accountability has learned to trust — and that this fact does not diminish the importance of those outcomes but rather increases the responsibility of those who understand the limits of measurement to protect the conditions for genuine formation from being sacrificed to the requirements of measurement's institutional credibility. Restraint in measurement means being willing to protect the unmeasurable from the institutional pressure to quantify it — to insist that genuine curiosity, genuine resilience, genuine ethical awareness, and the depth of formation that the Gakhur concept names deserve to be cultivated with full institutional seriousness even when they cannot be assessed in ways that satisfy the instruments of accountability, and to resist the institutional tendency to treat what cannot be measured as though it were not genuinely real.

Restraint also means being willing to ask, of every assessment instrument used in an educational context, the question that institutional systems have consistently and consequentially failed to ask: what does this assessment do to the learner's relationship with learning itself? Not what does it reveal about the learner's current level of performance on this specific task — which is the question the instrument was designed to answer and can answer adequately — but what does it communicate to the learner, through the quality and the character of the experience of being assessed, about what learning is for, what they are valued for within this institution, how their development is understood by the institution that has been given responsibility for it, and what kind of person the institution's assessment of them is training them to become?

When the learning that mattered most in a person's life is examined honestly rather than through the lens of institutional significance, it is rarely what was most carefully measured. It is what changed how the world looked, how one moved within it, and how one understood oneself and the human beings one was responsible to. The significance of such changes is not adequately captured by any instrument designed to produce a score, and producing them required the conditions of genuine depth that measurement-dominated systems have consistently failed to protect.

Education owes learners the protection of designs that refuse to mistake measurement for meaning, and the courage to resist measuring what cannot be meaningfully measured is not a rejection of rigour but its most serious expression — the commitment to depth over the comfort of the quantifiable.
