Pluralistic: Goodhart's Law (of AI) (11 Aug 2025)


Today's links



A black and white photo of an old one-room schoolhouse, seen from the back of the classroom. A teacher sits behind a desk and a US flag at the front of the class. Beside her, a small girl stands, reading aloud from a book. The image has been altered. In the foreground is a Robin Hood figure, seen from behind, holding a bow, a quiver of arrows on his back. Behind the little girl is the glaring red eye of HAL 9000 from Stanley Kubrick's '2001: A Space Odyssey.' An arrow vibrates dead-center in the eye.

Goodhart's Law (of AI) (permalink)

One way to think about AI's unwelcome intrusion into our lives can be summed up with Goodhart's Law: "When a measure becomes a target, it ceases to be a good measure":

https://en.wikipedia.org/wiki/Goodhart%27s_law

Goodhart's Law is a harsh mistress. It's incredibly exciting to discover a new way of measuring aspects of a complex system in a way that lets you understand (and thus control) it. In 1998, Sergey Brin and Larry Page realized that all the links created by everyone who'd ever made a webpage represented a kind of latent map of the value and authority of every website. We could infer that pages that had more links pointing to them were considered more noteworthy than pages that had fewer inbound links. Moreover, we could treat those heavily linked-to pages as authoritative and infer that when they linked to another page, it, too, was likely to be important.

This insight, called "PageRank," was behind Google's stunning entry into the search market, which was easily one of the most exciting technological developments of the decade, as the entire web just snapped into place as a useful system for retrieving information that had been created by a vast, uncoordinated army of web-writers, hosted in a distributed system without any central controls.
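
To make the intuition concrete, here's a minimal power-iteration sketch of the PageRank idea in Python. This is a toy, not Google's production system: the four-page link graph is invented for illustration, and 0.85 is just the conventional textbook damping factor.

    # Toy PageRank by power iteration (illustrative sketch, not Google's algorithm).
    links = {
        "a.example": ["b.example", "c.example"],
        "b.example": ["c.example"],
        "c.example": ["a.example"],
        "d.example": ["c.example"],
    }

    def pagerank(links, damping=0.85, rounds=50):
        pages = list(links)
        n = len(pages)
        rank = {p: 1.0 / n for p in pages}            # start with equal authority
        for _ in range(rounds):
            nxt = {p: (1 - damping) / n for p in pages}
            for page, outlinks in links.items():
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:               # each link passes a share of its
                    nxt[target] += share              # source's authority to its target
            rank = nxt
        return rank

    print(pagerank(links))  # c.example scores highest: it has the most inbound links

Real PageRank also has to handle dangling pages, personalization and, as we're about to see, deliberate manipulation, but the core loop really is that simple.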

Then came the revenge of Goodhart's Law. Before Google became the dominant mechanism for locating webpages, the only reason for anyone to link to a given page or site was because there was something there they thought you should see. Google aggregated all those "I think you should see this" signals and turned them into a map of the web's relevance and authority.

But making a link to a webpage is easy. Once there was another reason to make a link between two web-pages – to garner traffic, which could be converted into money and/or influence – then bad actors made a lot of spurious links between websites. They created linkfarms, they spammed blog comments, they hacked websites for the sole purpose of adding a bunch of human-invisible, Google-scraper-readable links to pages.

The metric ("how many links are there to this page?") became a target ("make links to this page") and ceased to be a useful metric.

Goodhart's Law is still a plague on Google search quality. "Reputation abuse" is a webcrime committed by venerable sites like Forbes, Fortune and Better Homes and Gardens, who abuse the authority imparted by tons of inbound links accumulated over decades by creating spammy, fake product-review sites stuffed with affiliate links, which Google ranks more highly than real, rigorous review sites because of all that accumulated googlejuice:

https://pluralistic.net/2024/05/03/keyword-swarming/#site-reputation-abuse

Goodhart's Law is 50 years old, but policymakers are woefully ignorant of it and continue to operate as though it doesn't apply to them. This is especially pronounced when policymakers are determined to Do Something about a public service that has been starved of funding and kicked around as a political football to the point where it has degraded and started to outrage the public. When this happens, policymakers are apt to blame public servants – rather than themselves – for this degradation, and then set out to Bring Accountability to those public employees.

The NHS did this with ambulance response times, which are very bad, and that fact is, in turn, very bad. The reason ambulance response times suck isn't hard to winkle out: there's not enough money being spent on ambulances, drivers, and medics. But that's not a politically popular conclusion, especially in the UK, which has been under brutal and worsening austerity since the Blair years (don't worry, eventually they'll do enough austerity and things will really turn around, because, as the old saying goes, "Good policymaking consists of doing the same thing over and over and expecting a different outcome").

Instead of blaming inadequate funding for poor ambulance response times, politicians blamed "inefficiency," driven by poor motivation. So they established a metric: ambulances must arrive within a certain number of minutes (and they set a consequence: massive cuts to any ambulance service that didn't meet the metric).

Now, "an ambulance where it's needed within a set amount of time" may sound like a straightforward metric, and it was – retrospectively. As in, we could tell that the ambulance service was in trouble because ambulances were taking half an hour or more to arrive. But prospectively, after that metric became a target, it immediately ceased to be a good metric. That's because ambulance services, faced with the impossible task of improving response times without spending money, started to dispatch ambulance motorbikes that couldn't carry 95% of the stuff needed to respond to a medical emergency, and had no way to get patients back to hospitals. These motorbikes were able to meet the response-time targets…without improving the survival rates of people who summoned ambulances:

https://timharford.com/2014/07/underperforming-on-performance/

AI turns out to be a great way to explore all the perverse dimensions of Goodhart's Law. For years, machine learning specialists have struggled with the problem of "reward hacking," in which an AI figures out how to meet some target in a way that blows up the metric it was derived from:

https://research.google/blog/bringing-precision-to-the-ai-safety-discussion/

My favorite example of this is the AI-powered Roomba that was programmed to find an efficient path that minimized collisions with furniture, as measured by a forward-facing sensor that sent a signal whenever the front of the Roomba bumped into something. The Roomba started driving backwards, smashing into all kinds of furniture, but measuring zero collisions, because there was no collision-sensor on its back:

https://x.com/smingleigh/status/1060325665671692288
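
Here's a made-up toy simulation of that dynamic (the environment, probabilities and reward are invented, not taken from the actual Roomba hack): because the reward only counts collisions the front bumper can sense, the highest-scoring policy is to drive backwards.

    # Invented toy: reward hacking a collision metric that only a front bumper can measure.
    import random

    def run_episode(drive_forward, steps=100):
        """Return (measured_reward, actual_collisions) for one cleaning run."""
        reward, collisions = 0, 0
        for _ in range(steps):
            if random.random() < 0.3:    # bumps into furniture about 30% of the time
                collisions += 1
                if drive_forward:        # only the forward-facing sensor registers the hit
                    reward -= 1
                # driving backwards: the collision still happens, but the metric never sees it
        return reward, collisions

    random.seed(1)
    print("forward: ", run_episode(drive_forward=True))    # bad score, honest collision count
    print("backward:", run_episode(drive_forward=False))   # perfect score, hidden damage

The optimizer faithfully maximizes the number you gave it, not the outcome you actually wanted: the metric became the target.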

Charlie Stross has observed that corporations are a kind of "slow AI," engaging in endless reward-hacking to accomplish their goals, increasing their profits by finding nominally legal ways to poison the air, cheat their customers and maim their workers:

https://memex.craphound.com/2017/12/29/charlie-strosss-ccc-talk-the-future-of-psychotic-ais-can-be-read-in-todays-sociopathic-corporations/

Public services under conditions of austerity are another kind of slow AI. When policymakers demand that a metric be satisfied without delivering any of the budget or resources needed to satisfy it, the public employees downstream of that impossible demand will start reward-hacking and the metric will become a target, and then cease to be a useful metric.

Which brings me, at last, to AI in educational contexts.

In 2002, George W Bush stepped up the long-running war on education with the No Child Left Behind Act. The right hates public education, for many reasons. Obviously, there's the fact that uneducated people are easier to mislead, which is helpful if you want to get a bunch of turkeys to vote for Christmas ("I love the poorly educated" -DJ Trump). Then there's the fact that, since 1954's Brown v Board of Ed, Black and brown kids have been legally guaranteed the right to be educated alongside white kids, which makes a large swathe of the right absolutely nuts. Then there was the 1962 Supreme Court decision that banned prayer in school, leading to bans on teaching Christian doctrine, including nonsense like Young Earth Creationism. Finally, there's the fact that teachers a) belong to unions; and, b) believe in their jobs and fight for the kids they teach.

No Child Left Behind was a vicious salvo in the war on teachers, positing the problem with education as a failure of teachers, driven by a combination of poor training and indifference to their students. Under No Child Left Behind, students were subjected to multiple rounds of standardized tests, and teachers with low-performing students had their budgets taken away (after first being offered modest assistance in improving those scores).

Some of NCLB's standardized tests represented reasonable metrics: we really do want kids to be able to read and do math and reason and string together coherent thoughts at various points in their schooling. But when these metrics became targets, boy did they stop being useful as metrics.

It's impossible to overstate how fucking perverse NCLB was. I once met an elementary school teacher from an incredibly poor school district in Kansas. Many of her students were resettled refugees who spoke a language that no one in the school system could speak, and which had no system of writing. They arrived in her classroom unable to speak English and unable to read or write in any language.

Obviously, these students performed badly on standardized tests delivered in English (it didn't help that they had to take the tests just months after arriving in the classroom, because the clock for their first test started ticking the moment they entered the system, even though it could take half a year for the system to place them in a class). Within a couple of years, these schools had had most of their budgets taken away.

When the standardized tests rolled around, this teacher would lead her students into the only room in the school with computers – the test-taking room. For many of these students, this was the first time they had ever used a computer. She would tell them to do their best and leave the room for an hour, while a well-paid proctor (along with the test-taking computers, the only things NCLB guaranteed funding for) observed them as they tried to figure out how a mouse worked. They would all score zero on the test, and the school would be punished.

NCLB was such a failure that it was eventually rescinded (in 2015), but by that time, a new system of standardization had rushed in to fill the gap, the Common Core. Common Core is a set of rigid standardized curricula – with standardized assessment rubrics – that was, once again, driven by contempt for teachers. The argument for Common Core was that students were failing – not because of falling budgets or No Child Left Behind – but because the unions were "protecting bad teachers," who would then go on to fail students. By taking away discretion from teachers, we could impose "accountability" on them.

The absolutely predictable outcome followed Goodhart's Law to a tee: teachers prioritized inculcating students with the skills to pass the standardized tests, those test-taking skills crowded out actual learning, and learning fell by the wayside.

This continues up to the most advanced part of public education, the Advanced Placement courses that students aspiring to college are strongly pressured to take. If Common Core is rigid, AP is brittle to the point of shattering. Anyone who's ever parented a kid through the US secondary school system knows how much time their kids spent learning to hit their marks on standardized assessments, to the exclusion of actual learning, and how soul-suckingly awful this is.

Take that staple of the AP assessment rubric: the five-paragraph essay (5PE), bane of students, teachers and parents everywhere:

https://www.insidehighered.com/blogs/just-visiting/kill-5-paragraph-essay

Speaking as a sometime writing teacher and an internationally bestselling essayist, I can tell you that 5PEs are objectively very bad essays. Their only virtue is that they can be assessed in a totally standard way, so the grade any given 5PE is awarded by one grader is likely to be the same grade it receives from any other grader. Grading an essay is an irreducibly subjective matter, and the only way to create an objective standard for essays is to make the essays unrecognizable as essays.

And yet, the 5PE is the heart of assessment for many AP classes, from History to English to Social Studies and beyond. A kid who scores high on any humanities AP will have put endless hours into perfecting this perfectly abominable literary form, mastering a skill that they will never, ever be called upon to use (the top piece of college entrance advice is "don't write your personal essay as a 5PE" and college professors spend the first half of their 101 classes teaching students not to turn in 5PEs).

The same goes for many other aspects of AP and Common Core assessment. If you do AP Lit, you'll be required to annotate the literature you read by making a set number of marginal observations on every page of the novels, poems and essays. Again, as a literary reviewer, novelist, and nonfiction writer who's written more than 30 books, I have to say, this is a batshit way to learn to analyze and criticize literature. Its sole virtue is that it reduces the qualitative matter of literary analysis to a quantitative target that students can hit and teachers can count.

And that's where AI comes in. AI – the ultimate bullshit machine – can produce a better 5PE than any student can, because the point of the 5PE isn't to be intellectually curious or rigorous, it's to produce a standardized output that can be analyzed using a standardized rubric.

I've been writing YA novels and doing school visits for long enough to cement my understanding that kids are actually pretty darned clever. They don't graduate from high school thinking that their mastery of the 5PE is in any way good or useful, or that they're learning about literature by making five marginal observations per page when they read a book.

Given all this, why wouldn't you ask an AI to do your homework? That homework is already the revenge of Goodhart's Law, a target that has ruined its metric. Your homework performance says nothing useful about your mastery of the subject, so why not let the AI write it? Hell, if you're a smart, motivated kid, then letting the AI write your bullshit 5PEs might give you time to write something good.

Teachers aren't to blame here. They have to teach to the test, or they will fail their students (literally, because they will have to assign a failing grade to them, and figuratively, because a student who gets a failing grade will face all kinds of punishments). Teachers' unions – who consistently fight against standardization and in favor of their members' discretion to practice their educational skills based on kids' individual needs – are the best hope we have:

https://pluralistic.net/2025/03/29/jane-mcalevey/#trump-is-a-scab

The right hates teachers and keeps on setting them up to fail. That hatred has no bottom. Take the Republican Texas State Rep Ryan Guillen, whose House Bill 462 would increase the state's school safety budget from $10/student to $100/student, with those additional funds earmarked to buy one armed drone per 200 students (these drones are supplied by a single company that has ties to Guillen):

https://dronelife.com/2024/12/08/texas-lawmaker-proposes-drones-for-school-security-a-less-lethal-solution/

Imagine how much Texas schools could do with an extra $90/student/year – how much more usefully that money could be spent if it were turned over to teachers. But instead, Rep Guillen wants to put "AI in schools" in the form of drones equipped with pepper-spray, flash bangs, and "lances" that can be smashed into people at 100mph.

The problem with AI in schools isn't that students are using AI to do their homework. It's that schools have been turned into reward-hacking AIs by a system that hates the idea of an educated populace almost as much as it hates the idea of unionized teachers who are empowered to teach our kids.

(Image: Cryteria, CC BY 3.0; Lee Haywood, CC BY-SA 2.0; modified)


Hey look at this (permalink)



A shelf of leatherbound history books with a gilt-stamped series title, 'The World's Famous Events.'

Object permanence (permalink)

#15yrsago Bill Ayers’s To Teach: The Journey, in Comics, a humanist look at education https://memex.craphound.com/2010/08/10/bill-ayerss-to-teach-the-journey-in-comics-a-humanist-look-at-education/

#10yrsago Kansas officials stonewall mathematician investigating voting machine “sabotage” https://www.kansas.com/news/politics-government/article27951310.html

#10yrsago Chinese mega-manufacturers set up factories in India https://web.archive.org/web/20150811043714/https://www.itworld.com/article/2968375/android/foxconn-to-invest-5b-to-set-up-first-of-up-to-12-factories-in-india.html

#10yrsago Oracle’s CSO demands an end to customers checking Oracle products for defects https://arstechnica.com/information-technology/2015/08/oracle-security-chief-to-customers-stop-checking-our-code-for-vulnerabilities/

#10yrsago Girl Sex 101: “for EVERYone who wants to bone down with chicks, regardless of your gender/orientation.” https://www.ohjoysextoy.com/girlsex-101/

#10yrsago John Oliver on the brutal state of sex-ed in America https://www.youtube.com/watch?v=L0jQz6jqQS0

#10yrsago Insurance monitoring dashboard devices used by Uber let hackers “cut your brakes” over wireless https://www.wired.com/2015/08/hackers-cut-corvettes-brakes-via-common-car-gadget/

#10yrsago US lobbying for TPP to lock up clinical trial data https://theconversation.com/how-the-battle-over-biologics-helped-stall-the-trans-pacific-partnership-45648

#10yrsago Larry Lessig considers running for the Democratic presidential nomination https://www.youtube.com/watch?v=CaqrQz71bMk

#10yrsago Felicia Day’s “You’re Never Weird on the Internet (Almost)” https://memex.craphound.com/2015/08/11/felicia-days-youre-never-weird-on-the-internet-almost/

#10yrsago Overshare: Justin Hall’s biopic about the first social media/blogging https://www.youtube.com/watch?v=AxD4mqFtySQ

#5yrsago When you hear "intangibles"… https://pluralistic.net/2020/08/11/nor-glom-of-nit/#capitalists-hate-competition

#5yrsago How they're killing the post office https://pluralistic.net/2020/08/11/nor-glom-of-nit/#sos-usps

#5yrsago Terra Nullius https://pluralistic.net/2020/08/11/nor-glom-of-nit/#terra-nullius

#5yrsago Uber lost $4b in H1/2020 https://pluralistic.net/2020/08/10/folksy-monopolists/#bezzled

#5yrsago Warren Buffet, monopolist https://pluralistic.net/2020/08/10/folksy-monopolists/#folksy-monopolists


Upcoming appearances (permalink)

A photo of me onstage, giving a speech, pounding the podium.



A screenshot of me at my desk, doing a livecast.

Recent appearances (permalink)



A grid of my books with Will Staehle covers.

Latest books (permalink)



A cardboard book box with the Macmillan logo.

Upcoming books (permalink)

  • Canny Valley: A limited edition collection of the collages I create for Pluralistic, self-published, September 2025

  • Enshittification: Why Everything Suddenly Got Worse and What to Do About It, Farrar, Straus and Giroux, October 7, 2025
    https://us.macmillan.com/books/9780374619329/enshittification/

  • Unauthorized Bread: a middle-grades graphic novel adapted from my novella about refugees, toasters and DRM, First Second, 2026

  • Enshittification: Why Everything Suddenly Got Worse and What to Do About It (the graphic novel), First Second, 2026

  • The Memex Method, Farrar, Straus and Giroux, 2026

  • The Reverse-Centaur's Guide to AI, a short book about being a better AI critic, Farrar, Straus and Giroux, 2026



Colophon (permalink)

Today's top sources:

Currently writing:

  • "The Reverse Centaur's Guide to AI," a short book for Farrar, Straus and Giroux about being an effective AI critic. (1076 words yesterday, 27803 words total).

  • A Little Brother short story about DIY insulin PLANNING


This work – excluding any serialized fiction – is licensed under a Creative Commons Attribution 4.0 license. That means you can use it any way you like, including commercially, provided that you attribute it to me, Cory Doctorow, and include a link to pluralistic.net.

https://creativecommons.org/licenses/by/4.0/

Quotations and images are not included in this license; they are included either under a limitation or exception to copyright, or on the basis of a separate license. Please exercise caution.


How to get Pluralistic:

Blog (no ads, tracking, or data-collection):

Pluralistic.net

Newsletter (no ads, tracking, or data-collection):

https://pluralistic.net/plura-list

Mastodon (no ads, tracking, or data-collection):

https://mamot.fr/@pluralistic

Medium (no ads, paywalled):

https://doctorow.medium.com/

Twitter (mass-scale, unrestricted, third-party surveillance and advertising):

https://twitter.com/doctorow

Tumblr (mass-scale, unrestricted, third-party surveillance and advertising):

https://mostlysignssomeportents.tumblr.com/tagged/pluralistic

"When life gives you SARS, you make sarsaparilla" -Joey "Accordion Guy" DeVilla

READ CAREFULLY: By reading this, you agree, on behalf of your employer, to release me from all obligations and waivers arising from any and all NON-NEGOTIATED agreements, licenses, terms-of-service, shrinkwrap, clickwrap, browsewrap, confidentiality, non-disclosure, non-compete and acceptable use policies ("BOGUS AGREEMENTS") that I have entered into with your employer, its partners, licensors, agents and assigns, in perpetuity, without prejudice to my ongoing rights and privileges. You further represent that you have the authority to release me from any BOGUS AGREEMENTS on behalf of your employer.

ISSN: 3066-764X