Content moderation is a security problem.
in·fo·sec (/ˈinfōˌsek/): information security
co·mo (/koh-moh/): content moderation
Content moderation is really, really hard. Among other things, moderators:
- seek to set conversational norms, steering transgressors toward resources that help them better understand the local conversational rules;
- respond to complaints from users about uncivil, illegal, deceptive or threatening posts;
- flag or delete material that crosses some boundary (for example, deleting posts that dox other users, or flagging posts with content warnings for adult material or other topics);
- elevate or feature material that is exemplary of community values;
- adjudicate disputes about impersonation, harassment and other misconduct.
This is by no means a complete list!
There are many things that make content moderation hard. For starters, there are a lot of instances in which there is no right answer. A user may sincerely not intend to harass another user, who might, in turn, sincerely feel themself to be harassed. Deciding which of these is the case is an art, not a science.
But there is another side to como, one that’s far more adversarial: detecting, interdicting and undoing deliberate attempts to circumvent the limitations imposed by content moderation policies. Some examples:
- Using alternative spellings (e.g. “phuck” or “5hit”) to bypass filters intended to block profanity, racial slurs, or other language prohibited within a community or discussion;
- Using typographical conventions to circumvent prohibitions on slurs (for example, using (((multiple parentheses))) to express antisemitism);
- Using euphemisms to express racist views;
- Degrading or altering video or audio files to bypass content filters;
- Using automated or semi-automated sock-puppets to disguise coordination and make organized harassment campaigns look like spontaneous community disapprobation.
Some of these tactics are social, others are technical, and many straddle the boundary between the two.
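The technical side of this cat-and-mouse game is easy to sketch. Here is a minimal, hypothetical illustration (placeholder words, not any platform’s real filter) of why a naive blocklist is trivially evaded, and why each defensive refinement just shifts the battleground:

```python
# Hypothetical blocklist filter -- placeholder words, for illustration only.
BLOCKLIST = {"badword", "slur"}

def naive_filter(post: str) -> bool:
    """Return True if the post should be blocked (exact substring match)."""
    text = post.lower()
    return any(word in text for word in BLOCKLIST)

# Trivial character substitutions slip straight past it:
# naive_filter("what a badword") -> True   (caught)
# naive_filter("what a b4dw0rd") -> False  (evades the filter)

# Defenders respond by normalizing common substitutions...
SUBSTITUTIONS = str.maketrans("43015$", "aeolss")

def normalizing_filter(post: str) -> bool:
    """Blocklist check after mapping common leetspeak characters back."""
    text = post.lower().translate(SUBSTITUTIONS)
    return any(word in text for word in BLOCKLIST)

# ...which catches the substitution, but attackers respond in turn with
# spacing, homoglyphs, deliberate misspellings, and euphemisms:
# normalizing_filter("what a b4dw0rd")        -> True   (caught now)
# normalizing_filter("what a b a d w o r d")  -> False  (evades again)
```

Every round of this escalation is an ordinary attacker/defender exchange, which is exactly the point: it is an infosec problem wearing a social-policy costume.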
These are classic infosec problems: they have a defender (content moderators, attempting to enforce policy) and an attacker (users, attempting to circumvent that policy).
And yet, we treat como as though its problems existed in the realm of ideas — as though it were a purely social problem that incidentally happened to take place on technically mediated platforms.
This is a mistake, and it makes como harder, not easier.
A bedrock principle of infosec is that “there is no security in obscurity.” That is, a security system that only works when your adversary is not privy to its inner workings is brittle, because it relies on your adversary being unable to discover or reverse-engineer how the system works.
Or, as Schneier’s Law has it:
Anyone can design a security system that they, themselves, cannot break. That doesn’t mean they have designed a security system that works, only that they have designed a security system that works on people who are stupider than them.
This is just a security-specific version of the Enlightenment-era notion of adversarial peer-review, which is the bedrock of modern science and scholarship.
It is an acknowledgment that even the most careful and assiduous scientist is perfectly capable of tricking themself into ignoring or papering over problems in their experimental design or results, and that their friends and well-wishers can easily fall into the same trap.
It’s only by exposing our ideas to our enemies that we truly put them to the test. If the people who want us to fail, who disagree with our hypothesis and even wish us personal disgrace can’t find a flaw in our reasoning, then we assume that we’re onto something.
Security through obscurity definitely weakens some attackers: secrecy can foil an adversary who can’t figure out the underlying design and probe it for weaknesses.
But smart, capable attackers face no such barriers: for adversaries capable of reverse-engineering a security system, secrecy works in their favor. It means that they know about the defects and weaknesses in the system, but the users of the system don’t.
When “bad guys” figure out how to bypass your hotel room lock and break into your hotel safe, they want the news to remain secret. Once the secret gets out, we might stop putting our valuables in the safe and then these tricks lose their value.
The same problems with security through obscurity for hotel room doors and safes apply to como, but because we don’t treat como as infosec, we don’t acknowledge this.
That’s why companies like Facebook keep both the rules they apply to community moderation and the tools they use to automate discovery of violations of those rules a secret.
They insist that revealing the policies and enforcement mechanisms will help bad actors who want to harass, defraud or impersonate their users and upload prohibited materials, from disinformation to misinformation, from child sexual abuse material to copyright infringements to terrorist atrocity videos.
Ironically, the fact that this material continues to show up on social media platforms is cited as the reason for needing more secrecy: this job is hard enough as it is, do you really want us to tell the bad guys how we operate?
Yes, in fact.
Como is really, really hard — infosec is always really, really hard. In information security, the attackers have a built-in advantage over the defenders: defenders must make no mistakes, while attackers need only find one mistake and exploit it.
And yet, the same tech giants who would never think of using a secret, proprietary encryption algorithm that had never been exposed to adversarial peer review routinely use and defend security through obscurity as the only way to keep their content moderation programs on track.
Trying to be sociable in a security-through-obscurity world is frustrating to say the least. It’s why having your content removed or your account suspended is such a Kafkaesque affair: if they told you what rule you broke and what the exceptions were to that rule, you might figure out how to trick your way back onto the platform.
But the bad guys are already past the obscurity. Committed trolls have all day to spend creating throwaway accounts and trying different strategies for slipping through the cracks in content moderation policies and enforcement.
They can discover what precisely constitutes harassment, and then they can engage in conduct that is almost-but-not-quite harassment — conduct that will be experienced as harassment by its targets, but which can evade sanction by moderators themselves.
What’s more, because harassers can winkle out the contours of secret content moderation policies by laborious trial-and-error, while their victims are in the dark, harassers can goad their victims into committing sanctionable transgressions, and then rat them out to content moderators, explaining exactly which rules their victims have broken and getting them suspended or kicked off the platform.
This is the failure mode of all security through obscurity. Secrecy means that the bad guys are privy to the defects in a system, while the people whom that system is supposed to defend are in the dark, and can have their own defenses weaponized against them.
The sooner we start treating como as infosec, the better. A good first step would be to adopt the Santa Clara Principles, a multistakeholder document that sets out a program for accountable and transparent moderation.