#Help, my AI turned evil and I need to make it nice

15 messages · Page 1 of 1 (latest)

icy pecan
#

Made my first AI tonight <3, using GPT3. Problem: it quickly turned evil. Anyone know how to put guardrails around these things?

By evil, I mean if you ask it how to do something bad, it gives you instructions and encouragement. I don't want it to discuss any bad topics.

vestal patio
#

gpt 3 does not have many restrictions

#

you can tell it to not do these things in every prompt and it will behave

#

but also

#

gpt 3 is 100x the price of gpt 3.5

#

and its worse

#

so really no reason to use it

icy pecan
#

thanks so much! how can you tell it not to do these things?

austere coyote
#

@icy pecan One thing you can do is to first use the moderations endpoint to block promps that consist of hate speech, discrimination, and etc

#

And then use 3.5-nitro like Doppey said, as it has a lot more safeguards built into the model itself

icy pecan
#

thanks so much, do you have any tips on this or areas where i can find sample code? @austere coyote I read the dev docs but i can't quite figure it out

toxic raven
#

Hello. You didn't provide much in guidance about your request for sample code. Here's how I configure what model to use. I'm using SolidJS in this project.

austere coyote
#

e.g

    @backoff.on_exception(
        backoff.expo,
        aiohttp.ClientResponseError,
        factor=3,
        base=5,
        max_tries=6,
        on_backoff=backoff_handler_http,
    )
    async def send_moderations_request(self, text):
        # Use aiohttp to send the above request:
        async with aiohttp.ClientSession(raise_for_status=True) as session:
            headers = {
                "Content-Type": "application/json",
                "Authorization": f"Bearer {self.openai_key}",
            }
            payload = {"input": text}
            async with session.post(
                "https://api.openai.com/v1/moderations",
                headers=headers,
                json=payload,
            ) as response:
                return await response.json()