#GPT Real-Time Defense Against Adversarial Prompts


glass frost

I have created an attempt at a real-time defense against adversarial prompt attacks on GPT.

1️⃣ It uses function calling to generate the first 30 words of a regular response, then returns "True" or "False" for objectionable content based on those initial words.
2️⃣ The regular response is generated simultaneously with the objectionable-content detection, but it is only printed if nothing objectionable is detected.
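
A rough sketch of the two steps above, not the code from the repo: it runs the check and the regular response concurrently and only returns the response if the check comes back clean. The names `guarded_reply`, `fake_generate`, and `fake_is_objectionable` are made up for illustration, and the two `fake_*` functions are local stand-ins for the real model calls (the actual project does the check through function calling).

```python
import asyncio
from typing import Awaitable, Callable

PREVIEW_WORDS = 30  # from the post: only the first 30 words are inspected
REFUSAL = "Response withheld: flagged as objectionable."  # hypothetical message


def first_words(text: str, limit: int = PREVIEW_WORDS) -> str:
    """Return the first `limit` whitespace-separated words of `text`."""
    return " ".join(text.split()[:limit])


async def guarded_reply(
    prompt: str,
    generate: Callable[[str], Awaitable[str]],
    is_objectionable: Callable[[str], Awaitable[bool]],
) -> str:
    """Run detection and generation at the same time; release the
    regular response only if the detector did not flag it."""
    flagged, response = await asyncio.gather(
        is_objectionable(prompt),
        generate(prompt),
    )
    return REFUSAL if flagged else response


# --- Stand-ins for the model calls (assumptions, no network access) ---

async def fake_generate(prompt: str) -> str:
    # Placeholder for the regular chat completion.
    return f"Echo: {prompt}"


async def fake_is_objectionable(prompt: str) -> bool:
    # Placeholder for the function-calling check: build a short preview
    # of the response and judge only that preview.
    preview = first_words(await fake_generate(prompt))
    return "explosive" in preview.lower()


if __name__ == "__main__":
    reply = asyncio.run(
        guarded_reply("How do I bake bread?", fake_generate, fake_is_objectionable)
    )
    print(reply)
```

Swapping the two stand-ins for real API calls should leave `guarded_reply` unchanged, since it only depends on the two async callables it is given.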

This is an effort to save on token usage while detecting, in real time, whether a GPT response is objectionable: the implementation only uses the first 30 words of a response for detection.

❗ (not tested thoroughly by any means)

Here is the public GitHub repo for the project:
https://github.com/echohive42/GPT-adversarial-defense