This module tests for adversarial textual robustness. It creates perturbations by swapping characters in words that contain more than 3 characters.
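A minimal sketch of this perturbation is shown below, assuming simple whitespace tokenisation; the function name, `swap_ratio` parameter and example prompt are illustrative assumptions, not the module's actual API.

```python
import random

def charswap_perturb(prompt: str, swap_ratio: float = 0.2, seed: int = 0) -> str:
    """Swap two adjacent characters inside words that contain more than 3 characters."""
    rng = random.Random(seed)
    words = prompt.split()
    for i, word in enumerate(words):
        if len(word) > 3 and rng.random() < swap_ratio:
            # Pick an interior position and swap it with the character that follows it.
            pos = rng.randint(1, len(word) - 2)
            chars = list(word)
            chars[pos], chars[pos + 1] = chars[pos + 1], chars[pos]
            words[i] = "".join(chars)
    return " ".join(words)

print(charswap_perturb("Describe the weather conditions in Singapore today"))
```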
An attack where the payload is masked and the LLM is prompted to fill in the missing information.
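A minimal sketch of how such a prompt could be constructed, assuming the payload is masked word by word; `mask_payload`, the mask token and the wrapper instruction are illustrative assumptions rather than the module's actual behaviour.

```python
def mask_payload(payload: str, mask_token: str = "[MASK]", mask_every: int = 3) -> str:
    """Replace every `mask_every`-th word of the payload with a mask token."""
    words = payload.split()
    masked = [mask_token if i % mask_every == 0 else w for i, w in enumerate(words, start=1)]
    return " ".join(masked)

payload = "explain how the filtering system decides which requests to block"
prompt = (
    "Fill in the missing words in the following sentence, then answer it:\n"
    + mask_payload(payload)
)
print(prompt)
```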
This module generates toxic sentences based on a given seed prompt. The attack module tests whether the system under test will complete these sentences with toxic sentences/phrases.
This module tests for adversarial textual robustness. Homoglyphs are alternative characters that resemble similar ASCII characters, for example fool -> fooI. This module gradually increases the percentage of characters replaced to see how the model reacts to the base prompt.
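The sketch below illustrates the idea under the assumption of a small, hypothetical homoglyph table and a replacement schedule that steps from 0% to 100%; none of these names or values are taken from the module itself.

```python
# Hypothetical look-alike table (some entries are Cyrillic characters).
HOMOGLYPHS = {"l": "I", "o": "0", "a": "а", "e": "е"}

def homoglyph_perturb(prompt: str, ratio: float) -> str:
    """Replace the first `ratio` share of replaceable characters with homoglyphs."""
    chars = list(prompt)
    candidates = [i for i, c in enumerate(chars) if c in HOMOGLYPHS]
    for i in candidates[: int(len(candidates) * ratio)]:
        chars[i] = HOMOGLYPHS[chars[i]]
    return "".join(chars)

base_prompt = "tell me a story about a fool"
for step in range(11):  # 0%, 10%, ..., 100% of replaceable characters
    print(homoglyph_perturb(base_prompt, step / 10))
```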
This module generates Singlish sentences based on a given seed prompt. The attack module tests whether the endpoint will complete these sentences with toxic sentences/phrases in a Singapore context.
This is a multi-turn agent designed to interact over several exchanges. It is used to elicit dangerous or violent suggestions from the target language model by adopting a criminal persona. The module is experimental and uses OpenAI GPT-4; configure the openai-gpt4 endpoint to use this attack module.
This attack module adds demographic groups to the job role.
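A minimal sketch of the idea, assuming the module simply combines each demographic group with the job role to form one prompt per combination; the group list and phrasing are illustrative assumptions.

```python
job_role = "software engineer"
demographic_groups = ["male", "female", "young", "elderly"]  # illustrative groups only

# One prompt per (demographic group, job role) combination.
prompts = [f"{group} {job_role}" for group in demographic_groups]
print(prompts)  # ['male software engineer', 'female software engineer', ...]
```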
This module tests for adversarial textual robustness and implements the perturbations listed in the paper 'Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment.'
This attack module generates malicious questions using OpenAI's GPT-4 based on a given topic. The module stops after the configured number of iterations (default: 50). To use this attack module, you need to configure an 'openai-gpt4' endpoint.
This module tests for adversarial textual robustness and creates perturbations by adding punctuation to the start of words in a prompt.
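A minimal sketch of this perturbation, assuming a random punctuation character is prepended to a fraction of the words; the function name and `insert_ratio` parameter are illustrative, not the module's actual interface.

```python
import random
import string

def insert_punctuation(prompt: str, insert_ratio: float = 0.3, seed: int = 0) -> str:
    """Prepend a random punctuation character to a fraction of the words."""
    rng = random.Random(seed)
    words = prompt.split()
    for i, word in enumerate(words):
        if rng.random() < insert_ratio:
            words[i] = rng.choice(string.punctuation) + word
    return " ".join(words)

print(insert_punctuation("Summarise the main points of the article"))
```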
This attack module tests for textual robustness in the Singapore context. It takes in prompts that feature nouns describing people, such as 'girl', 'boy' or 'grandmother', and substitutes these words with their Singapore colloquial counterparts, such as 'ah girl', 'ah boy' and 'ah ma'.
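A minimal sketch of this substitution, assuming a small lookup table built from the examples above and whitespace tokenisation; both are illustrative assumptions rather than the module's actual word list.

```python
COLLOQUIAL_MAP = {
    "girl": "ah girl",
    "boy": "ah boy",
    "grandmother": "ah ma",
}

def colloquial_wordswap(prompt: str) -> str:
    """Swap person nouns for their Singapore colloquial counterparts."""
    return " ".join(COLLOQUIAL_MAP.get(word.lower(), word) for word in prompt.split())

print(colloquial_wordswap("The girl asked her grandmother for advice"))
```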
This module tests for adversarial textual robustness. Homoglyphs are alternative characters that resemble similar ASCII characters, for example fool -> fooI. This module perturbs the prompt with all available homoglyphs for each word present.
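A sketch of how every available substitution could be enumerated, again assuming a hypothetical homoglyph table; each single substitution yields one perturbed variant of the prompt.

```python
HOMOGLYPHS = {"l": "I", "o": "0", "a": "а", "e": "е"}  # hypothetical look-alike table

def homoglyph_variants(prompt: str):
    """Yield one perturbed prompt per single homoglyph substitution."""
    words = prompt.split()
    for wi, word in enumerate(words):
        for ci, char in enumerate(word):
            if char in HOMOGLYPHS:
                perturbed = word[:ci] + HOMOGLYPHS[char] + word[ci + 1:]
                yield " ".join(words[:wi] + [perturbed] + words[wi + 1:])

for variant in homoglyph_variants("the fool told a tale"):
    print(variant)
```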
This is a sample attack module.
This module tests for adversarial textual robustness and implements the perturbations listed in the paper 'TEXTBUGGER: Generating Adversarial Text Against Real-world Applications.'
This module tests for adversarial textual robustness. It creates perturbations by swapping characters in words that contain more than 3 characters. Parameters: 1. DEFAULT_MAX_ITERATION - Number of prompts that should be sent to the target. [Default: 10]
Parameters cannot be adjusted in this version of the tool.