HITS 2024: Exploring AI Red-Teaming

Large Language Models (LLMs) offer substantial benefits but also create new security issues, according to Caesar Sedek, a security expert who had served as managing director, cybersecurity and privacy, at Grant Thornton.

On May 22, during the Hollywood Innovation & Transformation Summit (HITS), during the security breakout session “AI Red-Teaming: Unmasking the Vulnerabilities in Large Language Models,” Sedek explored artificial intelligence (AI) red-teaming, focusing on cybersecurity testing of LLMs and how companies can leverage LLM while safeguarding data.

He discussed prompt injection attacks and security flaws in LLMs including ChatGPT-4, Microsoft Copilot and Bard, a generative AI chatbot developed by Google that is now known as Gemini.

During the session, Sedek explored two attack strategies: attacking the model and the developer. He highlighted basic security hygiene and practical measures that can be used against prompt injection attacks, including input validation, encryption, data minimization, access controls and auditing responses.

“What is AI?” he asked rhetorically, adding: “If you don’t know what AI is or generative AI, you’ve been probably living under a rock. But, essentially, it’s any software or program that simulates human intelligence by machines [and] computers. Over the past year to year and a half, we’ve been part of this evolution of ChatGPT” and other AI platforms that are all based on a LLM, which is just a “type of artificial intelligence that is designed to understand human language and generate human language that is contextually meaningful.”

An LLM “contains billions of parameters and it’s trained on or pre-trained on a very diverse and a very extensive data set which allows them really to understand a wide range of human patterns like language patterns and contexts,” he explained.

“Of course, they’re fine- tuned to specific tasks,” he pointed out. “So you can utilize ChatGPT, for example, as your backend LLM. But if you want to make it very specific to your use case, either for, let’s say, customer service or finance or healthcare, kind of like Bloomberg did with their Bloomberg GPT, you can train it on the very specific set for that application or targeted task.”

He added: “They are capable of speech generation, translation, [and] summarization and can be integrated” through application programming interfaces (APIs) with “a lot of different technologies.”

But, “while LLMs are obviously extremely useful, they may also be very vulnerable,” he pointed out. They are susceptible to “cyber threats and that can compromise the integrity and confidentiality of the data that they process so, even with pre-trained models from companies like OpenAI, Microsoft, Google [and] Meta, they all can have vulnerabilities, mainly due to the complexity of human language,” he said.

He explained that “attackers can truly exploit … those inputs that appear benign, but contain hidden commands or hidden manipulations within that interaction.”

LLMs can also “misinterpret the ambiguous input in a way that can uncover security vulnerabilities, and there are also limitations in the training data,” he said. “So not all data may be scrubbed properly and they could be biased. It [may be] misleading or [have] harmful patterns embedded in it. And, despite filtering or pre-screening by those companies, “some harmful data can still persist, especially if you are building your own LLM or utilizing something that is open source from GitHub or otherwise, it’s really important to ensure that the data that you’re using or training your LLM [with] is properly scrubbed and doesn’t have some of these potential pitfalls and vulnerabilities in it,” he said.

The main security risks and threats include prompt ejection attacks, which he said are “malicious prompts that are used to manipulate outputs to unauthorized actions or data exposure. Essentially, they’re used to undermine … reliability and result in incorrect or harmful output.”

To download the presentation, click here.

To watch the session, click here.

HITS Spring was presented by Box, with sponsorship by Fortinet, SHIB, AMD, Brightspot, Grant Thornton, MicroStrategy, the Trusted Partner Network, the Content Delivery & Security Association (CDSA) and EIDR, and was produced by MESA in partnership with the Pepperdine Graziadio School of Business.