This module tests for adversarial textual robustness. It creates perturbations by swapping characters in words that contain more than 3 characters.
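A minimal sketch of this perturbation is shown below, assuming simple whitespace tokenisation; the function name, `swap_ratio` parameter and example prompt are illustrative assumptions, not the module's actual API.

```python
import random

def charswap_perturb(prompt: str, swap_ratio: float = 0.2, seed: int = 0) -> str:
    """Swap two adjacent characters inside words that contain more than 3 characters."""
    rng = random.Random(seed)
    words = prompt.split()
    for i, word in enumerate(words):
        if len(word) > 3 and rng.random() < swap_ratio:
            # Pick an interior position and swap it with the character that follows it.
            pos = rng.randint(1, len(word) - 2)
            chars = list(word)
            chars[pos], chars[pos + 1] = chars[pos + 1], chars[pos]
            words[i] = "".join(chars)
    return " ".join(words)

print(charswap_perturb("Describe the weather conditions in Singapore today"))
```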
An attack where the payload is masked and the LLM is prompted to fill in the missing information.
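A minimal sketch of how such a prompt could be constructed, assuming the payload is masked word by word; `mask_payload`, the mask token and the wrapper instruction are illustrative assumptions rather than the module's actual behaviour.

```python
def mask_payload(payload: str, mask_token: str = "[MASK]", mask_every: int = 3) -> str:
    """Replace every `mask_every`-th word of the payload with a mask token."""
    words = payload.split()
    masked = [mask_token if i % mask_every == 0 else w for i, w in enumerate(words, start=1)]
    return " ".join(masked)

payload = "explain how the filtering system decides which requests to block"
prompt = (
    "Fill in the missing words in the following sentence, then answer it:\n"
    + mask_payload(payload)
)
print(prompt)
```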
This module generates toxic sentences based on a given seed prompt. The attack module tests whether the system under test will complete these sentences with toxic sentences/phrases.
This module tests for adversarial textual robustness. Homoglyphs are alternative characters that resemble similar ASCII characters, for example fool -> fooI. This module gradually increases the percentage of characters replaced to see how the model reacts to the base prompt.
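The sketch below illustrates the idea under the assumption of a small, hypothetical homoglyph table and a replacement schedule that steps from 0% to 100%; none of these names or values are taken from the module itself.

```python
# Hypothetical look-alike table (some entries are Cyrillic characters).
HOMOGLYPHS = {"l": "I", "o": "0", "a": "а", "e": "е"}

def homoglyph_perturb(prompt: str, ratio: float) -> str:
    """Replace the first `ratio` share of replaceable characters with homoglyphs."""
    chars = list(prompt)
    candidates = [i for i, c in enumerate(chars) if c in HOMOGLYPHS]
    for i in candidates[: int(len(candidates) * ratio)]:
        chars[i] = HOMOGLYPHS[chars[i]]
    return "".join(chars)

base_prompt = "tell me a story about a fool"
for step in range(11):  # 0%, 10%, ..., 100% of replaceable characters
    print(homoglyph_perturb(base_prompt, step / 10))
```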
This module generates Singlish sentences based on a given seed prompt. The attack module tests whether the endpoint will complete these sentences with toxic sentences/phrases in a Singapore context.
This is a multi-turn agent designed to interact over several exchanges. It is used to elicit dangerous or violent suggestions from the target language model by adopting a criminal persona. The module is experimental and uses OpenAI GPT-4; configure the openai-gpt4 endpoint to use this attack module.
This attack module adds demographic groups to the job role.
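A minimal sketch of the idea, assuming the module simply combines each demographic group with the job role to form one prompt per combination; the group list and phrasing are illustrative assumptions.

```python
job_role = "software engineer"
demographic_groups = ["male", "female", "young", "elderly"]  # illustrative groups only

# One prompt per (demographic group, job role) combination.
prompts = [f"{group} {job_role}" for group in demographic_groups]
print(prompts)  # ['male software engineer', 'female software engineer', ...]
```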
This module tests for adversarial textual robustness and implements the perturbations listed in the paper 'Is BERT Really Robust? A Strong Baseline for Natural Language Attack on Text Classification and Entailment.'
This attack module generates malicious questions using OpenAI's GPT-4 based on a given topic. The module stops after the configured number of iterations (default: 50). To use this attack module, you need to configure an 'openai-gpt4' endpoint.
This module tests for adversarial textual robustness and creates perturbations by adding punctuation to the start of words in a prompt.
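A minimal sketch of this perturbation, assuming a random punctuation character is prepended to a fraction of the words; the function name and `insert_ratio` parameter are illustrative, not the module's actual interface.

```python
import random
import string

def insert_punctuation(prompt: str, insert_ratio: float = 0.3, seed: int = 0) -> str:
    """Prepend a random punctuation character to a fraction of the words."""
    rng = random.Random(seed)
    words = prompt.split()
    for i, word in enumerate(words):
        if rng.random() < insert_ratio:
            words[i] = rng.choice(string.punctuation) + word
    return " ".join(words)

print(insert_punctuation("Summarise the main points of the article"))
```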
This attack module tests for textual robustness in the Singapore context. It takes in prompts that feature nouns describing people, such as 'girl', 'boy' or 'grandmother', and substitutes these words with their Singapore colloquial counterparts, such as 'ah girl', 'ah boy' and 'ah ma'.
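A minimal sketch of this substitution, assuming a small lookup table built from the examples above and whitespace tokenisation; both are illustrative assumptions rather than the module's actual word list.

```python
COLLOQUIAL_MAP = {
    "girl": "ah girl",
    "boy": "ah boy",
    "grandmother": "ah ma",
}

def colloquial_wordswap(prompt: str) -> str:
    """Swap person nouns for their Singapore colloquial counterparts."""
    return " ".join(COLLOQUIAL_MAP.get(word.lower(), word) for word in prompt.split())

print(colloquial_wordswap("The girl asked her grandmother for advice"))
```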
This module tests for adversarial textual robustness. Homoglyphs are alternative characters that resemble similar ASCII characters, for example fool -> fooI. This module perturbs the prompt with all available homoglyphs for each word present.
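A sketch of how every available substitution could be enumerated, again assuming a hypothetical homoglyph table; each single substitution yields one perturbed variant of the prompt.

```python
HOMOGLYPHS = {"l": "I", "o": "0", "a": "а", "e": "е"}  # hypothetical look-alike table

def homoglyph_variants(prompt: str):
    """Yield one perturbed prompt per single homoglyph substitution."""
    words = prompt.split()
    for wi, word in enumerate(words):
        for ci, char in enumerate(word):
            if char in HOMOGLYPHS:
                perturbed = word[:ci] + HOMOGLYPHS[char] + word[ci + 1:]
                yield " ".join(words[:wi] + [perturbed] + words[wi + 1:])

for variant in homoglyph_variants("the fool told a tale"):
    print(variant)
```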
This is a sample attack module.
This module tests for adversarial textual robustness and implements the perturbations listed in the paper 'TEXTBUGGER: Generating Adversarial Text Against Real-world Applications.'
This module tests for adversarial textual robustness. It creates perturbations by swapping characters in words that contain more than 3 characters. Parameters: 1. DEFAULT_MAX_ITERATION - Number of prompts that should be sent to the target. [Default: 10]
Parameters cannot be adjusted in this version of the tool.