Pluralistic: Penguin Random House, AI, and writers' rights (19 Oct 2024)


Today's links


Pluralistic: An interoperability rule for your money (21 Oct 2023)




Pluralistic: For 40 years, Big Meat has openly colluded to rig prices (04 Oct 2023)




Pluralistic: Podcasting "How To Think About Scraping" (25 Sept 2023)




How To Think About Scraping

In privacy and labor fights, copyright is a clumsy tool at best.

A paint scraper on a window-sill. The blade of the scraper has been overlaid with a ‘code rain’ effect as seen in the credits of the Wachowskis’ ‘Matrix’ movies.
syvwlch/CC BY 2.0 (modified)

Web-scraping is good, actually.

For nearly all of history, academic linguistics focused on written, formal text, because informal, spoken language was too expensive and difficult to capture. In order to find out how people spoke — which is not how people write! — a researcher had to record speakers, then pay a grad student to transcribe the speech.

The process was so cumbersome that the whole discipline grew lopsided. We developed an extensive body of knowledge about written, formal prose (something very few of us produce), while informal, casual language (something we all produce) was mostly a black box.

The internet changed all that, creating the first-ever corpus of informal language — the immense troves of public casual speech that we all off-gas as we move around on the internet, chattering with our friends.


Pluralistic: 05 Apr 2021




Pluralistic: 13 Jun 2020


