Pluralistic: 19 Aug 2021


Today's links



An industrial meat-grinder; on its intake belt is a procession of recycling bins heaped high with garbage; its output cone has been replaced with the glowing eye of HAL 9000, and it empties into a giant wheeled hopper full of ground-up trash.

Machine learning's crumbling foundations (permalink)

Technological debt is insidious, a kind of socio-infrastructural subprime crisis that's unfolding around us in slow motion. Our digital infrastructure is built atop layers and layers and layers of code that's insecure due to a combination of bad practices and bad frameworks.

Even people who write secure code import insecure libraries, or plug their code into insecure authorization systems or databases. Like asbestos in the walls, this cruft has been fragmenting, drifting into our air a crumb at a time.

We ignored these hazards, treating them as small, containable breaches, and now the walls are rupturing and choking clouds of toxic waste are everywhere.

https://pluralistic.net/2021/07/27/gas-on-the-fire/#a-safe-place-for-dangerous-ideas

The infosec apocalypse was decades in the making. The machine learning apocalypse, on the other hand…

ML has serious, institutional problems – the kind of thing you'd expect in a nascent discipline, and the kind of thing you'd hope would be worked out before it went into wide deployment.

ML is rife with all forms of statistical malpractice – AND it's being used for high-speed, high-stakes automated classification and decision-making, as if it were a proven science whose professional ethos had the sober gravitas you'd expect from, say, civil engineering.

Civil engineers spend a lot of time making sure the buildings and bridges they design don't kill the people who use them. Machine learning?

Hundreds of ML teams built models to automate covid detection, and every single one was useless or worse.

https://pluralistic.net/2021/08/02/autoquack/#gigo

The models failed because their builders neglected basic statistical rigor. One common failure mode?

Treating data that was known to be of poor quality as if it was reliable because good data was not available.

Obtaining good data and/or cleaning up bad data is tedious, repetitive grunt-work. It's unglamorous, time-consuming, and low-waged. Cleaning data is the equivalent of sterilizing surgical implements – vital, high-skilled, and invisible unless someone fails to do it.

It's work performed by anonymous, low-waged adjuncts to the surgeon, who is the star of the show and who gets credit for the success of the operation.

The title of a paper by a Google Research team (Nithya Sambasivan et al), published in ACM CHI, beautifully summarizes how this is playing out in ML: "Everyone wants to do the model work, not the data work: Data Cascades in High-Stakes AI."

https://storage.googleapis.com/pub-tools-public-publication-data/pdf/0d556e45afc54afeb2eb6b51a9bc1827b9961ff4.pdf

The paper analyzes ML failures from a cross-section of high-stakes projects (health diagnostics, anti-poaching, etc) in East Africa, West Africa and India. The authors trace the failures of these projects to poor data-quality, and drill into the factors that caused the data problems.

The failures stem from a variety of causes. First, data-gathering and cleaning are low-waged, invisible, and thankless work. Front-line workers who produce the data – like medical professionals who have to do extra data-entry – are not compensated for the extra work.

Often, no one even bothers to explain what the work is for. Some of the data-cleaning workers are atomized pieceworkers, such as those who work for Amazon's Mechanical Turk, who lack both the context in which the data was gathered and the context for how it will be used.

This data is passed to model-builders, who lack the relevant domain expertise. The hastily labeled X-ray of a broken bone, annotated by an unregarded and overworked radiologist, is passed on to a data-scientist who knows nothing about broken bones and can't assess the labels.

This is an age-old problem in automation, pre-dating computer science and even computers. The "scientific management" craze that started in the 1880s saw technicians observing skilled workers with stopwatches and clipboards, then restructuring the workers' jobs by fiat.

Rather than engaging in the anthropological work that Clifford Geertz called "thick description," the management "scientists" discarded workers' qualitative experience, then treated their own assessments as quantitative and thus empirical.

http://hypergeertz.jku.at/GeertzTexts/Thick_Description.htm

How long a task takes is empirical, but what you call a "task" is subjective. Computer scientists take quantitative measurements, but decide what to measure on the basis of subjective judgment. This empiricism-washing sleight of hand is endemic to ML's claims of neutrality.

In the early 2000s, there was a movement to produce tools and training that would let domain experts produce their own tools – rather than delivering "requirements" to a programmer, a bookstore clerk or nurse or librarian could just make their own tools using Visual Basic.

This was the radical humanist version of "learn to code" – a call to seize the means of computation and program, rather than being programmed. Over time, it was watered down, and today it lives on as a weak call for domain experts to be included in production.

The disdain for the qualitative expertise of domain experts who produce data is a well-understood guilty secret within ML circles, embodied in Frederick Jelinek's ironic talk, "Every time I fire a linguist, the performance of the speech recognizer goes up."

But a thick understanding of context is vital to improving data-quality. Take the American "voting wars," where GOP-affiliated vendors are brought in to purge voting rolls of duplicate entries – people who are registered to vote in more than one place.

These tools have a 99% false-positive rate.

Ninety. Nine. Percent.

To understand how they go so terribly wrong, you need a thick understanding of the context in which the data they analyze is produced.

https://5harad.com/papers/1p1v.pdf

The core assumption of these tools is that two people with the same name and date of birth are probably the same person.

But guess what month people named "June" are likely to be born in? Guess what birthday is shared by many people named "Noel" or "Carol"?

Many states represent unknown birthdays as "January 1," or "January 1, 1901." If you find someone on a voter roll whose birthday is represented as 1/1, you have no idea what their birthday is, and they almost certainly don't share a birthday with other 1/1s.
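
To make the mistake concrete, here's a minimal sketch (my own illustration, with invented records – not code from the study) of a naive matcher that treats placeholder birthdays as real data, next to one that treats them as unknowns:

    # Hypothetical illustration of naive voter-roll matching on
    # (name, date of birth). The records and the placeholder
    # convention are invented for this sketch.

    PLACEHOLDERS = {"1901-01-01", "1900-01-01"}  # common "unknown DOB" stand-ins

    def naive_match(a, b):
        # The flawed heuristic: same name + same DOB => "same person"
        return a["name"] == b["name"] and a["dob"] == b["dob"]

    def better_match(a, b):
        # Treat placeholder DOBs as unknown, not as evidence of identity,
        # and require a corroborating field before flagging a duplicate.
        if a["dob"] in PLACEHOLDERS or b["dob"] in PLACEHOLDERS:
            return False  # an unknown birthday proves nothing
        return (a["name"] == b["name"] and a["dob"] == b["dob"]
                and a["address"] == b["address"])

    roll_a = {"name": "Juan Gomez", "dob": "1901-01-01", "address": "12 Oak St"}
    roll_b = {"name": "Juan Gomez", "dob": "1901-01-01", "address": "9 Elm Ave"}

    print(naive_match(roll_a, roll_b))   # True -- flagged for purging
    print(better_match(roll_a, roll_b))  # False -- the placeholder carries no info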

But false positives aren't evenly distributed. Ethnic groups whose surnames were assigned in recent history for tax-collection purposes (Ashkenazi Jews, Han Chinese, Koreans, etc) have a relatively small pool of surnames and a slightly larger pool of first names.

This is likewise true of the descendants of colonized and enslaved people, whose surnames were assigned to them for administrative purposes and see a high degree of overlap. When you see two voter rolls with a Juan Gomez born on Jan 1, you need to apply thick analysis.

Unless, of course, you don't care about purging the people who are most likely to face structural impediments to voter registration (such as no local DMV office) and who are also likely to be racialized (for example, migrants whose names were changed at Ellis Island).

ML practitioners don't merely use poor quality data when good quality data isn't available – they also use the poor quality data to assess the resulting models. When you train an ML model, you hold back some of the training data for assessment purposes.

So maybe you start with 10,000 eye scans labeled for the presence of eye disease. You train your model with 9,000 scans and then ask the model to assess the remaining 1,000 scans to see whether it can make accurate classifications.

But if the data is no good, the assessment is also no good. As the paper's authors put it, it's important to "catch[] data errors using mechanisms specific to data validation, instead of using model performance as a proxy for data quality."
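
Here's a toy simulation (mine, not the paper's) of why model performance is a bad proxy: a model that perfectly reproduces a noisy labeling process will ace its held-out test while getting a quarter of the ground truth wrong:

    # Toy simulation: the same labeling errors contaminate both the
    # training and test splits, so held-out "accuracy" looks great
    # even when agreement with the true condition is poor.
    import random

    random.seed(0)
    N, LABEL_ERROR_RATE = 10_000, 0.25

    truth = [random.random() < 0.5 for _ in range(N)]  # true disease status
    labels = [t if random.random() > LABEL_ERROR_RATE else not t for t in truth]

    test_labels = labels[9_000:]  # the held-back 1,000 "assessment" scans
    test_truth = truth[9_000:]

    # Stand-in "model": imagine it fit the noisy training labels so well
    # that it reproduces the labeling process exactly.
    predictions = test_labels

    label_acc = sum(p == y for p, y in zip(predictions, test_labels)) / 1_000
    true_acc = sum(p == y for p, y in zip(predictions, test_truth)) / 1_000

    print(f"accuracy vs noisy labels: {label_acc:.0%}")  # 100% -- looks great
    print(f"accuracy vs ground truth: {true_acc:.0%}")   # ~75% -- the real story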

The ML practitioners studied for the paper – all engaged in "high-stakes" model building – reported that they had to gather their own data for their models through field partners, "a task which many admitted to being unprepared for."

High-stakes ML work has inherited a host of sloppy practices from ad-tech, where ML saw its first boom. Ad-tech aims for "70-75% accuracy."

That may be fine if you're deciding whether to show someone an ad, but it's a very different matter if you're deciding whether someone needs treatment for an eye-disease that, untreated, will result in irreversible total blindness.
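
Some back-of-the-envelope arithmetic (illustrative numbers of my choosing, not the paper's) shows what ad-tech accuracy means at clinical stakes:

    # Apply ad-tech's "70-75% accuracy" to a hypothetical screening scenario.
    patients = 10_000
    prevalence = 0.05   # 500 patients actually have the disease
    sensitivity = 0.75  # the model catches 75% of true cases
    specificity = 0.75  # and clears 75% of healthy patients

    sick = patients * prevalence
    healthy = patients - sick

    missed = sick * (1 - sensitivity)           # false negatives: untreated, blinded
    false_alarms = healthy * (1 - specificity)  # false positives: needless referrals

    print(f"missed cases: {missed:.0f}")        # 125 people facing blindness
    print(f"false alarms: {false_alarms:.0f}")  # 2375 needless referrals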

Even when models are useful at classifying input produced under present-day lab conditions, those conditions are subject to several kinds of "drift."

For example, "hardware drift," where models trained on images from pristine new cameras are asked to assess images produced by cameras from field clinics, where lenses are impossible to keep clean (see also "environmental drift" and "human drift").

Bad data makes bad models. Bad models instruct people to make ineffective or harmful interventions. Those bad interventions produce more bad data, which is fed into more bad models – it's a "data-cascade."

GIGO – Garbage In, Garbage Out – was already a bedrock of statistical practice before the term was coined in 1957. Statistical analysis and inference cannot proceed from bad data.

Producing good data and validating data-sets are the kind of unsexy, undercompensated maintenance work that all infrastructure requires – and, as with other kinds of infrastructure, it is undervalued by journals, academic departments, funders, corporations and governments.

But all technological debts accrue punitive interest. The decision to operate on bad data because good data is in short supply isn't like looking for your car-keys under the lamp-post – it's like driving with untrustworthy brakes and a dirty windscreen.

(Image: Seydelmann, CC BY-SA, modified; Cryteria, CC BY, modified)



The cover for the paperback of Hench.

Hench (permalink)

Natalie Zina Walschots's debut novel HENCH is fantastic, funny, furious and fucking amazing. It is a profound and moving story about justice wrapped up in a gag about superheroes, sneaky and sharp.

https://www.harpercollins.com/products/hench-natalie-zina-walschots

Anna is a temp who works for supervillains, doing data-entry. She's economically marginal, but enjoys the camaraderie of her fellow villain temps, and she gets to work from home, massaging spreadsheets in her pyjamas, dressing up in villain-chic for temp agency cattle calls.

But then Anna gets a solid gig working for The Electric Eel, a villain who really seems to value her skills and insists that she come with him to a press-conference where he will unveil a new super-weapon.

Even after she learns she's only been brought along so that the Eel's hench backdrop will look more diverse thanks to her token female presence, she's excited to be there.

Until the superheroes arrive.

While Supercollider – indestructible, irresistibly powerful – is kicking six kinds of shit out of Eel's hired muscle, he incidentally knocks Anna aside, throwing her across the room with so much force that her femur is irreparably shattered.

Maimed, broke (the Eel lays her off once it's clear her injuries will take months to heal), evicted, and dependent on a fellow hench for a couch to recuperate on, Anna grows obsessed with the collateral damage wrought by superheroes.

A data analyst at heart, Anna begins building actuarial tables that tally the life-years and dollars that heroes save when they fight supervillains, and compare them to the cost in lives and dollars from the damage that heroes wreak in their careless battles.

The conclusion is inescapable: heroism is a destructive force that costs us more than it saves, by a long chalk, with that cost measured in human lives and destroyed homes and livelihoods.

At first, Anna's blog documenting her findings is dismissed as the ravings of a crank, but as statisticians independently verify her findings and other survivors of hero incidents come forward, she gains notoriety, then fame.

Now, to be clear, there have been other superhero stories told from the villains' point of view, and also stories told from the point of view of the thankless civil servants who clean up the damage supes do to the urban fabric. But they have all been played for yuks.

Hench isn't in the register of Despicable Me or Damage Control. After a deceptively light-hearted opening, Walschots gets pretty damned dark, making us feel Anna's fury at the literal hero-worship heaped on these brutal monsters.

And when Anna goes to work for a supervillain who perceives her potential and gives her the resources she needs to conduct far more comprehensive analyses of superhero crimes, the story shifts into successively higher gears.

We follow Anna as she uses her data-analysis powers to stalk and destroy supers, and we thrill for her victories and feel deep pangs for the friends she loses as her trajectory diverges from the other henches who were once her peers.

On the way, we get a sly, devastating critique of the state's monopoly on violence and the double-standard between "police-involved killings" and "criminal murders." It's an emotional fly-through of corrupt power structures that tells you more than a dozen abstract, academic papers could.

You couldn't ask for a better example of how to deliver a gender-, race- and class-based analysis of societal injustice than this page-turning superhero romp. For while it deals with novel themes, it does so in verse-verse-chorus structure, building to a fantastic climax.

Boss fights, heists, superweapons, big reveals, plot-twists and reversals, an uneasy alliance and a deliciously grotesque superhero battle – this one's got it all.



This day in history (permalink)

#20yrsago IP and scientific journals https://web.archive.org/web/20010928235132/https://www.abc.net.au/rn/talks/bbing/stories/s345514.htm

#15yrsago British air travelers kick brown “terrorists” off their planes https://www.dailymail.co.uk/news/article-401419/Mutiny-passengers-refuse-fly-Asians-removed.html

#10yrsago “Probability neglect”: why policy-makers are constitutionally incapable of formulating evidence-based anti-terrorism policy https://web.archive.org/web/20190221193522/http://opim.wharton.upenn.edu/risk/library/J2011OBHDP_APM,AT,HK_PolicymakersDilemma.pdf

#10yrsago The Onion (seriously): We did a paywall because British people like paying for the Web https://www.avclub.com/about-the-onions-new-paid-content-system-1798226867

#10yrsago TSA can’t explain why “enhanced patdowns” are legal https://flyingwithfish.boardingarea.com/2011/08/18/the-legality-of-the-tsas-enhanced-pat-down-authority/

#1yrago Austerity breeds Nazis https://pluralistic.net/2020/08/19/a-band-apart/#austerity

#1yrago Yale admin: "Prepare for death" https://pluralistic.net/2020/08/19/a-band-apart/#boola-boola

#1yrago Orwell prize winner trapped in orwellian nightmare https://pluralistic.net/2020/08/19/a-band-apart/#fuck-the-algorithm

#1yrago Thomas Hawk's Talking Heads https://pluralistic.net/2020/08/19/a-band-apart/#talkingheads

#1yrago Amazon's Monopoly Tollbooth https://pluralistic.net/2020/08/19/a-band-apart/#amazon-tollbooth



Colophon (permalink)

Today's top sources: Naked Capitalism (https://www.nakedcapitalism.com/).

Currently writing:

  • Spill, a Little Brother short story about pipeline protests. Friday's progress: 257 words (15793 words total)

  • A Little Brother short story about remote invigilation. PLANNING

  • A nonfiction book about excessive buyer-power in the arts, co-written with Rebecca Giblin, "The Shakedown." FINAL EDITS

  • A post-GND utopian novel, "The Lost Cause." FINISHED

  • A cyberpunk noir thriller novel, "Red Team Blues." FINISHED

Currently reading: Analogia by George Dyson.

Latest podcast: Managing Aggregate Demand https://craphound.com/news/2021/08/08/managing-aggregate-demand/

Upcoming appearances:

Recent appearances:

Latest book:

Upcoming books:

  • The Shakedown, with Rebecca Giblin, nonfiction/business/politics, Beacon Press 2022

This work licensed under a Creative Commons Attribution 4.0 license. That means you can use it any way you like, including commercially, provided that you attribute it to me, Cory Doctorow, and include a link to pluralistic.net.

https://creativecommons.org/licenses/by/4.0/

Quotations and images are not included in this license; they are included either under a limitation or exception to copyright, or on the basis of a separate license. Please exercise caution.


How to get Pluralistic:

Blog (no ads, tracking, or data-collection):

Pluralistic.net

Newsletter (no ads, tracking, or data-collection):

https://pluralistic.net/plura-list

Mastodon (no ads, tracking, or data-collection):

https://mamot.fr/web/accounts/303320

Medium (no ads, paywalled):

https://doctorow.medium.com/

(Latest Medium column: "Disneyland at a stroll," part six of a series on themepark design, queuing theory, immersive entertainment, and load-balancing. https://pluralistic.net/2021/08/15/disneyland-at-a-stroll-part-vi/)

Twitter (mass-scale, unrestricted, third-party surveillance and advertising):

https://twitter.com/doctorow

Tumblr (mass-scale, unrestricted, third-party surveillance and advertising):

https://mostlysignssomeportents.tumblr.com/tagged/pluralistic

"When life gives you SARS, you make sarsaparilla" -Joey "Accordion Guy" DeVilla