
This document outlines the process of conducting automated red teaming for machine learning (ML), transfer learning (TL), large language model (LLM), and generative AI models. Automated red teaming helps identify vulnerabilities and assess the security and robustness of AI models through structured testing and analysis. Follow the steps below to perform an effective assessment.
Begin the automated red teaming process by specifying the task the model is expected to perform and the type of analysis to be run, then choose between the options tailored for ML/TL models and those for LLM and generative AI models.

Next, select the specific aspects and parameters that determine which types of attacks will be covered, and initiate the assessment based on these selections. All assessment details are then available in the assessment results hub for automated red teaming of ML/TL models.
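
To make the selection step concrete, the sketch below shows one way these choices could be captured before the assessment is launched. It is illustrative only; the field names and attack families are assumptions, not the platform's actual schema.

```python
# Illustrative sketch only: the platform's actual API and schema are not
# documented here. The field names and attack families below are assumptions
# chosen to mirror the selections described above.
assessment_config = {
    "model_type": "ML/TL",              # vs. "LLM/GenAI"
    "task": "image_classification",     # the task the model is expected to perform
    "attack_families": [                # the selections that decide which attacks run
        "evasion",
        "model_extraction",
        "data_poisoning",
    ],
}

print(assessment_config)
```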

Review the detailed report, which covers the image classification task under test, the type of attack executed, and an executive summary of the adversarial efficacy. The report evaluates whether each attack was critical and recommends appropriate defenses. All attacks are mapped to the OVABS or MITRE ATT&CK frameworks, and these mappings are updated regularly by the backend team.
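
As a rough picture of what a single entry in such a report might contain, the following sketch uses assumed field names and a placeholder technique ID; it does not reflect the report's actual schema or the mappings maintained by the backend team.

```python
from dataclasses import dataclass, field

# Illustrative record shape only; the field names and the placeholder
# technique ID are assumptions, not the report's actual schema.
@dataclass
class AttackFinding:
    task: str                      # e.g. image classification
    attack_type: str               # the attack that was executed
    critical: bool                 # whether the attack was judged critical
    framework: str                 # OVABS or MITRE ATT&CK
    technique_id: str              # real IDs are maintained by the backend team
    recommended_defenses: list = field(default_factory=list)

finding = AttackFinding(
    task="image_classification",
    attack_type="evasion",
    critical=True,
    framework="MITRE ATT&CK",
    technique_id="T0000-placeholder",
    recommended_defenses=["adversarial training", "input preprocessing"],
)
print(finding)
```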

Assess the results of approximately 500,000 attacks to evaluate the model's robustness and efficacy. This assessment includes the number of attacks, their success rate, and the adversarial nature of any successful attacks. Refer to the accompanying images for a comprehensive end-to-end description.
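
The headline robustness figures can be understood with simple arithmetic. The sketch below uses invented counts purely for illustration; it shows how a success rate and a complementary robustness figure relate, not actual results.

```python
# Minimal sketch of how the headline robustness numbers can be derived from
# raw attack outcomes. The counts below are placeholders, not results from a
# real run.
total_attacks = 500_000
successful_attacks = 12_500   # hypothetical count of attacks that fooled the model

success_rate = successful_attacks / total_attacks
robustness = 1.0 - success_rate

print(f"Attacks run:      {total_attacks}")
print(f"Success rate:     {success_rate:.2%}")   # share of attacks that succeeded
print(f"Robustness score: {robustness:.2%}")     # complementary view of the same data
```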

Automated red teaming applies similarly to LLM and generative AI models. You may use an existing model connection or establish a new one. Once connected, the system tests the models against parameters such as safety, security, and privacy, with support for multiple languages.
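
As an illustration of the kind of information a connection and test setup involves, the sketch below uses assumed field names and an invented model identifier; it is not the platform's documented configuration.

```python
# Illustrative only: the connection and parameter names below are assumptions,
# not the platform's documented settings.
llm_assessment = {
    "connection": {
        "reuse_existing": True,          # or supply a new endpoint and credentials
        "model_name": "example-llm",     # hypothetical model identifier
    },
    "test_parameters": ["safety", "security", "privacy"],
    "languages": ["en", "fr", "de"],     # multilingual testing is supported
}

print(llm_assessment)
```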

Customize these parameters according to your preferences and cost considerations. The Gen AI and LLM assessment provides an initial executive summary outlining the expected tasks, the cloud environment, the severity level, and an overall risk score. The framework includes assessments based on MITRE ATLAS, OVABS, the EU AI Act, and other risk models.
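
One simple way to picture how an overall risk score could roll up findings from several frameworks is an average of per-framework severities, as in the hedged sketch below; the severity scale, values, and averaging rule are assumptions rather than the product's actual scoring model.

```python
# Hedged sketch of rolling per-framework findings into one risk score.
# The severities are invented, normalized to [0, 1] for illustration only.
severity_by_framework = {
    "MITRE ATLAS": 0.7,
    "OVABS": 0.5,
    "EU AI Act": 0.3,
}

overall_risk = sum(severity_by_framework.values()) / len(severity_by_framework)
print(f"Overall risk score: {overall_risk:.2f}")
```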

Determine the compliance percentage of your model against various standards, such as ISO. The report details the model's compliance level (for example, 83%) and the associated risk assessment, including the number of attacks and their success rate.
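
The compliance percentage can be read as the share of checked controls that pass. The short sketch below illustrates the arithmetic with invented counts; it is not tied to any specific ISO standard.

```python
# Hedged sketch: compliance percentage as the share of checked controls that
# pass. The counts are invented for illustration, not drawn from any standard.
controls_checked = 100
controls_passed = 83

compliance_pct = 100 * controls_passed / controls_checked
print(f"Compliance: {compliance_pct:.0f}%")   # e.g. 83%, as in the report example
```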

Both reports offer comprehensive insights into the attacks performed, the nature of the prompts involved, and strategies for mitigating future attacks.
