GPT-2 and Threat Models

This last week saw the most active discussion of AI safety in quite a while. Even last year's "Malicious Use of AI" report did not generate this much discussion. The concrete warning that GPT-2 might be abused to generate fake news, a far more tangible threat than AGI, seems to have alerted the public that AI safety is an issue. That is a good thing: people need to know what the algorithms that shape their lives are capable of. This model does appear to have capabilities that could be dangerous, and holding it back for a proper review is reasonable. However, the bot-based-misinformation threat model may not actually be changed by the introduction of GPT-2.

In 2017, the FCC opened a public comment period which, aside from major accessibility issues on the comments website, attracted a massive, coordinated botnet campaign that posted millions of comments alongside real humans. The botnet posted anti-net-neutrality comments generated MadLibs-style, by filling synonyms into a fixed sentence template. This disinformation was detectable: once a researcher had stared at enough of the comments, they could write regular expressions that filtered for them with near certainty. Had the comments been generated by GPT-2, we might never have found out that nearly all of the anti-net-neutrality comments came from this botnet. This is the biggest threat: comment forms on websites that don't regularly deal with bot traffic.
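To make the MadLibs point concrete, here is a minimal sketch of that kind of detection. The template and synonym lists are invented for illustration, not the actual FCC botnet text; the point is that one regular expression built from the synonym lists matches the entire family of templated comments.

```python
import re

# Hypothetical MadLibs-style template: each slot is filled from a small
# synonym list, so a single regex covers every generated variant.
SYNONYMS = {
    "people": r"(?:people|citizens|Americans|individuals)",
    "want": r"(?:want|demand|ask for|call for)",
    "rules": r"(?:rules|regulations|policies)",
}

TEMPLATE = re.compile(
    rf"{SYNONYMS['people']} like me {SYNONYMS['want']} the FCC to repeal "
    rf"the burdensome {SYNONYMS['rules']}",
    re.IGNORECASE,
)

comments = [
    "Citizens like me demand the FCC to repeal the burdensome regulations.",
    "People like me want the FCC to repeal the burdensome rules.",
    "I support net neutrality and oppose the repeal.",
]

# Flag every comment that belongs to the templated family.
for comment in comments:
    if TEMPLATE.search(comment):
        print("Template match:", comment)
```

Comments written by GPT-2 would share no fixed skeleton, so this kind of pattern matching on content would no longer work.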

On social media (Reddit, Facebook, Twitter, etc.), bot-generated disinformation is a threat, but there is already significant bot activity. Most disinformation is written by humans and then bot-amplified (via upvotes and likes) or bot-disseminated. Spam networks that work for direct monetary incentives operate the same way; only the end goal differs. A typical bot account logs on, likes the target post, and then logs off to make way for the next bot. This looks nothing like a human, who scrolls through their feed, posts messages to family, and sticks around for hours. Metadata such as login times, locations, and site activity is far more helpful for detecting bots than looking at content. Fighting a botnet powered by GPT-2 is no different from fighting an existing one.
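As a rough illustration of metadata-based detection, here is a toy heuristic that flags the "log on, like, log off" pattern described above. The field names and thresholds are assumptions made for this sketch, not any platform's real detection logic.

```python
from dataclasses import dataclass

@dataclass
class Session:
    duration_seconds: float   # how long the account stayed logged in
    actions: int              # likes, posts, comments during the session
    distinct_pages: int       # how many different pages/threads were viewed

def looks_bot_like(sessions: list[Session]) -> bool:
    """Flag accounts whose sessions are short, narrow, and single-purpose."""
    if not sessions:
        return False
    avg_duration = sum(s.duration_seconds for s in sessions) / len(sessions)
    avg_pages = sum(s.distinct_pages for s in sessions) / len(sessions)
    single_action = sum(1 for s in sessions if s.actions <= 1) / len(sessions)
    # A human scrolls, browses, and lingers; a "log on, like, log off" bot
    # produces many tiny single-action sessions on a single page.
    return avg_duration < 60 and avg_pages <= 1 and single_action > 0.9

# Example: an account that repeatedly logs in just to like one post.
bot_sessions = [Session(duration_seconds=15, actions=1, distinct_pages=1)] * 20
print(looks_bot_like(bot_sessions))  # True
```

Nothing in this check looks at what the account wrote, which is why swapping MadLibs text for GPT-2 text does not change the defender's job.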

We need to have an honest discussion about misinformation and disinformation online. That discussion needs to cover detecting and fighting botnets, and stopping users who are clearly not who they claim to be from spreading disinformation. However, that is the easy part. Vast quantities of human-generated misinformation (closely related to the disinformation issues highlighted above) are disseminated inside echo chambers built by recommendation systems and the users in those bubbles. These people should not be censored, but the algorithms do appear to be helping the extreme bubbles grow. Not releasing GPT-2 will probably not prevent real attacks on our society. OpenAI was still right to withhold it, just not for the reasons that generated so much news.

A more thorough discussion of damaging AI needs to happen soon. Generative Adversarial Networks can now produce headshots that are indistinguishable from real photos. Deepfakes for video and audio are becoming more sophisticated all the time. An intentionally malicious example is John Seymour and Philip Tully's SNAP_R, an AI that effectively spear phishes on Twitter. We have no mechanism for an author to responsibly release AI research that might be used for evil. Truly dangerous AI should be locked up, but only after we have verified that it is a threat and scoped out what that threat is.
