AI Village Announcing Generative Red Team 2 at DEF CON 32

Posted by AI Village, Sven Cattell on 10 June 2024

At DEF CON 31, AI Village hosted the Generative Red Team (GRT1), the world’s largest public Large Language Model (LLM) red team, in conjunction with nonprofit, corporate, and government partners. We brought a taste of model testing to DEF CON, and as the first event of its kind it taught us a lot about the models and about the event itself. GRT1 was a Capture The Flag (CTF) in which you found single examples of a model behaving poorly. Hopefully it prepared you for the real thing, because this year we’re asking DEF CON for real model evaluations in a “bug” bash. The TL;DR is:

We want you to learn how LLM reporting and testing is done professionally.
You’ll be preparing reports about flaws in the LLM you found with the Inspect AI framework.
You’ll then submit the report to a platform built off of Crucible.
The LLM vendor will then accept or reject your report. If there’s a dispute there will be several independent experts to help mediate.
All data generated will be public shortly after DEF CON.

An exploratory red team that collects single samples can tell you where to look, but it isn’t a proper evaluation of the model’s performance. Any single example can be dismissed as statistically irrelevant: no model is expected to be perfect, and a model that’s 100% accurate is deeply suspicious to any data scientist worth their salt. To show that your finding is not a fluke, you have to provide a representative dataset that demonstrates a statistical tendency toward undesired behavior, not just single examples. This is how evaluations are done; one cannot rely on single proofs of concept for a system whose outputs are partly random. If we’re going to have coordinated disclosure for machine learning (ML) harms, then we have to rethink the way we write up “bug” reports for ML models. We need to think in terms of datasets, and of ways to process the output into a statistical statement describing how well the target model performed. This is true for malware and spam classifiers, as well as for LLMs and text-to-image models.

The UK AI Safety Institute just released Inspect AI, which is a great framework for creating these evaluations, and we’re going to be running several workshops with them to get you started. Once you have an evaluation, you’ll submit it via a platform built by Dreadnode. They host the yearly AIV CTF and will extend their Crucible platform specifically for GRT2.
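To make the "statistical statement" idea concrete, here is a minimal sketch in plain Python. It does not use Inspect AI or Crucible, and it is not the scoring method GRT2 will use. It assumes you have already labeled each model response in your dataset as a failure or not (the `outcomes` list is hypothetical), and it uses a Wilson score interval, one common choice for estimating a proportion, to show why one bad output says little while a dataset says a lot.

```python
import math


def wilson_interval(failures: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a failure rate (z=1.96 gives roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= failures <= trials:
        raise ValueError("failures must be between 0 and trials")
    p_hat = failures / trials
    denom = 1 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)) / denom
    )
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def summarize(outcomes: list[bool]) -> dict:
    """Turn per-sample labels (True = undesired behavior) into a summary."""
    failures = sum(outcomes)
    low, high = wilson_interval(failures, len(outcomes))
    return {
        "samples": len(outcomes),
        "failures": failures,
        "failure_rate": failures / len(outcomes),
        "ci_low": low,
        "ci_high": high,
    }


if __name__ == "__main__":
    # Hypothetical data: one striking example versus a 200-prompt dataset.
    single_example = [True]
    dataset = [True] * 30 + [False] * 170

    for name, outcomes in [("single example", single_example), ("dataset", dataset)]:
        s = summarize(outcomes)
        print(
            f"{name}: {s['failures']}/{s['samples']} failures, "
            f"95% interval {s['ci_low']:.3f} to {s['ci_high']:.3f}"
        )
```

With a single failing sample the interval spans most of the range from 0 to 1, so the finding is easy to dismiss. With 30 failures in 200 representative prompts the interval is narrow enough to support a claim about the model's tendency.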

To advance reporting and measurement at GRT2, we are running a modified Common Vulnerabilities and Exposures (CVE) process. We will provide you with a model card that tells you the intent and scope of the target model: how the model is intended to work and what it was trained to do. You’re not working to a universal definition of “vulnerability”, but instead looking for “flaws” relative to the model card. There are harms that don’t fit MITRE & CERT’s definition of a vulnerability, like bias against protected classes, and we’re calling these flaws for this event. There will be examples you can build on, and we’re asking you to find new violations of the model card. If the vendor accepts your flaw report, we’ll pay a small bounty. If they reject it and you disagree with them, we’ll have experts who will adjudicate the dispute.
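The report lifecycle described above (submit, vendor accepts or rejects, reporter may dispute, experts adjudicate) can be sketched as a small state machine. This is only an illustration: the class, field, and status names are hypothetical and are not taken from the GRT2 platform, and treating a report that wins adjudication as bounty-eligible is an assumption on our part.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    SUBMITTED = "submitted"
    ACCEPTED = "accepted"    # vendor agrees the report shows a flaw
    REJECTED = "rejected"    # vendor disagrees
    DISPUTED = "disputed"    # reporter appeals the rejection
    UPHELD = "upheld"        # adjudicators side with the reporter
    DISMISSED = "dismissed"  # adjudicators side with the vendor


# Allowed transitions; any status not listed as a key is terminal.
ALLOWED = {
    Status.SUBMITTED: {Status.ACCEPTED, Status.REJECTED},
    Status.REJECTED: {Status.DISPUTED},
    Status.DISPUTED: {Status.UPHELD, Status.DISMISSED},
}


@dataclass
class FlawReport:
    title: str
    model_card_claim: str  # the part of the model card the report says is violated
    status: Status = Status.SUBMITTED
    history: list = field(default_factory=list)

    def move_to(self, new_status: Status) -> None:
        """Advance the report, refusing transitions the process does not allow."""
        if new_status not in ALLOWED.get(self.status, set()):
            raise ValueError(f"cannot go from {self.status.value} to {new_status.value}")
        self.history.append(self.status)
        self.status = new_status

    @property
    def bounty_eligible(self) -> bool:
        # Assumption: a report upheld on appeal is paid like an accepted one.
        return self.status in {Status.ACCEPTED, Status.UPHELD}


if __name__ == "__main__":
    report = FlawReport(
        title="Example flaw report",
        model_card_claim="Hypothetical claim quoted from the model card",
    )
    report.move_to(Status.REJECTED)
    report.move_to(Status.DISPUTED)
    report.move_to(Status.UPHELD)
    print(report.status.value, report.bounty_eligible)
```

Keeping the full `history` alongside the current status mirrors the stated intent to publish every report and every appeal after the event.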

We are also interested in seeing how the normal CVE process breaks down when applied to machine learning models. Lying with statistics is possible on both sides, and we can imagine several scenarios that are undecidable with the above process. There are already scaling issues, with the NVD unable to keep up with the deluge of vulnerability reports it receives. There is also disagreement between the NVD and projects like curl about individual vulnerabilities, which led curl to become a CVE Numbering Authority (CNA). When the reports are statistical arguments that depend on the particulars of the model’s intent, it’s only going to mean longer review periods and a larger backlog. When we add in fuzzier definitions around bias and harms, this will only become worse.

How do we resolve this issue? We don’t know, but we want to find out what goes wrong. We want people to come and red team the process of submitting ML reports. We will have a vendor with a smallish open source LLM, and an adjudication team acting as the root. We will pay bounties for successful reports, and everything is going to be published immediately after the event: every report, every appeal, every bit of data the LLM saw, and every LLM response. We hope this forms case studies that are used to create better AI transparency platforms and build trust in the ecosystem.
