Pluralistic: Pinkdrunk Linkdump (18 Nov 2023)


Today's links

  • Pinkdrunk Linkdump: Your semi-regular weekend declaration of link bankruptcy.
  • This day in history: 2003, 2008, 2013, 2018
  • Colophon: Recent publications, upcoming/recent appearances, current writing projects, current reading

Pluralistic: The (open) web is good, actually (13 Nov 2023)


Today's links

Pluralistic: The enshittification of garage-door openers reveals a vast and deadly rot (09 Nov 2023)


Today's links

Pluralistic: In defense of bureaucratic competence (23 Oct 2023)


Today's links

Pluralistic: An interoperability rule for your money (21 Oct 2023)


Today's links

Pluralistic: Leaving Twitter had no effect on NPR's traffic (14 Oct 2023)


Today's links

Pluralistic: The surveillance advertising to financial fraud pipeline (29 Sept 2023)


Today's links

Pluralistic: Podcasting "How To Think About Scraping" (25 Sept 2023)


Today's links

Pluralistic: Kashmir Hill's "Your Face Belongs to Us" (20 Sept 2023)


Today's links

How To Think About Scraping

In privacy and labor fights, copyright is a clumsy tool at best.

A paint scraper on a window-sill. The blade of the scraper has been overlaid with a ‘code rain’ effect as seen in the credits of the Wachowskis’ ‘Matrix’ movies.
syvwlch/CC BY 2.0 (modified)

Web-scraping is good, actually.

For nearly all of history, academic linguistics focused on written, formal text, because informal, spoken language was too expensive and difficult to capture. To find out how people spoke (which is not how people write!), a researcher had to record speakers, then pay a grad student to transcribe the speech.

The process was so cumbersome that the whole discipline grew lopsided. We developed an extensive body of knowledge about written, formal prose (something very few of us produce), while informal, casual language (something we all produce) was mostly a black box.

The internet changed all that, creating the first-ever corpus of informal language — the immense troves of public casual speech that we all off-gas as we move around on the internet, chattering with our friends.