#Do I get in trouble if a model continuously violates OpenAIs guidelines?

46 messages · Page 1 of 1 (latest)

slim copper
#

I'm attempting to create a personality clone. As a first step I answered a bunch of questions as myself. Well, they weren't all politically correct. I swear a lot but I replaced all the F's in my training data with F***, etc. I would have thought that would have been enough. But as I'm experimenting with my model the gem pops out (it's pretty funny imo).

I mean, I have opinions that are not government approved. I think the vaccines was not necessary for children for example. Do I have to constantly worry about OpenAI locking me out of my project because I don't have the right opinion?

This is concerning to me. Plus, I can't help what GPT says. True, in development I can do prompt engineering by saying things like "You are Jon, a nice person who doesn't swear and avoids saying offensive things". However I'm getting flagged while doing development? C'mon man.

How much do I have to worry about this?

vague reef
hearty steppe
#

Please tell me this doesn't happen with the API also. What if we are using GPT for something that deals with multiple languages? How do we possibly parse everything that a user might input that may be "offensive"?

vague reef
#

There are openai terms of service, just like anything. And yes, with the API too. If you have users, then you’re going to have to deal with things like that. Lucky for me I’m just dicking about on my own. I would hate to have to deploy an app to users and have to deal with legal issues and all of that stuff…:S

hearty steppe
#

How exactly am I supposed to know if a user prompts something offensive in Chinese? What if someone just does it out of spite?

rancid valve
hearty steppe
#

Obviously, but I'm not trying to get philosophical here. The core of the issue is that "offensive" api prompts should just:

  1. Result in an error code
  2. Still charge the user
  3. Not have them get their account banned

Let's say I have a production app that talks to the user in any language, much like chatGPT itself. Let's go one step further and even say my app lets my SaaS users make their own website chat bots.

Why should my API access get restricted if someone's random website visitor writes something offensive in a language I have no way of pre-processing?

What if that language has its alphabet in Cyrillic, but the user writes the offensive message in equivalent latin letters - I would have no way of knowing their offensive meaning, but GPT would deduct that and therefore my I could be banned from using it.

vague reef
#

There’s a moderation api. And then you’ll probably have to show that you make a ‘reasonable attempt’ to prevent user bad action

#

Write it into your own terms of service, have some human moderation…

rancid valve
#

I’m agreeing with you, and supporting your point, and additionally I think the language that the offensive content is in is not very relevant because even in a language the developer knows, what might be insensitive is purely subjective and trying to interpret the content before sending it to AI is virtually impossible short of just seeking certain banned words or phrases. We shouldn’t have to prescreen the prompts and we shouldn’t have to worry about our accounts getting banned for anything related to this

vague reef
#

Why not?

rancid valve
# vague reef Why not?

Well ultimately openai has their own freedom of choice too and if they want to ban people for this I guess it’s okay… but I guess I would qualify my position by saying that they shouldn’t ban simply based on prompt content if they want to have a reputation for being fair and unbiased

vague reef
#

I don’t agree with ban no warning no appeal, but if you’re letting other people do what they want and just throwing your hands in the air and saying ‘don’t ban me it’s not me doing it!’ And not even attempting to moderate, then imo you kinda deserve it…

rancid valve
#

Well I would agree that the owner of the account is ultimately responsible for how it used. And if the account is violating, TOS then it makes sense that the owner gets punished. I guess my argument is more along. The lines is that it should not be against TOS

#

What’s the possible harm in submitting a prompt? One way to make it a little bit more fair might be to implement a strikes system like they use on YouTube.

vague reef
#

If you look at the terms of service, there are various banned uses. Large scale propaganda or scams for example.

rancid valve
#

Ultimately, I think the best way to determine if a prompt is offensive or not would be to send it to AI and ask which would mean I have to proceed my actual prompts with another prompt just to determine if the actual prompt might violate TOS so they don’t send it, but I’m also sending it to AI to be a valuated in the first place for offensiveness or whatever so it would seem to make a lot more sense to me just to let the account holder send whatever prompt they want and if it’s deemed a violation of TOS just refused to respond to it he doesn’t need to be of further punishment for the account holder far as I’m concerned

rancid valve
#

I don’t really think there’s a way to sort this out without having a philosophical discussion about morals and ethics, freedoms, and rights

#

If open ai wants to treat users fairly that is

hearty steppe
rancid valve
hearty steppe
rancid valve
#

i guess you'd have to filter the input first, and if it's ok to send then send it, but then you'd also have to filter the output to make sure it's ok too... so that's 3 api calls for each message which isn't fun

hearty steppe
#

Yup :/

#

I'm also processing sometimes up to 100k tokens from outside resources

#

Without knowing what they are

#

So I'd literally need to moderate all of those before each individual api call

vague reef
#

Moderation api calls are free, I believe?

#

There you go. It’s free

vague reef
slim copper
#

They're not free in terms of time. Regardless, I would still have to get the offensive completion before I could send it to the moderation endpoint and it is generating the completion where I got the warning. ...if the completion API must censor output, which I don't think it should, then it should simply return a 400.

hearty steppe
#

GPT 3.5 API reply: "Sorry, I cannot complete this task as it goes against OpenAIs content policy on hate speech and offensive language."

#

So the moderation endpoint didn't flag this prompt (on history of racism), yet GPT 3.5 did. So, what now?

#

@slim copper this should remain a valid concern for anyone using the OpenAI API in production

vague reef
#

It’s a good question about ‘what to do now’ without more actual information on who’s getting banned for what. On one hand, you have chatgpt themself telling you it’s against the policy, but on the other hand the moderation tools didn’t notice anything wrong with it. And on the other hand chatgpt is known to give false information, and on the other hand it is also known that it’s possible to persuade it to break the rules. Is that on the website or through the api btw?

#

…and did the request break openai’s rules, in your opinion?

#

Presumably, the api moderation tools are… the same as the ones on the back end? If the moderation tools don’t flag it for you, then it’s not being flagged at openai and you’ve got nothing to worry about. Chatgpt can say what it wants. It doesn’t have the ability to contact anyone, or do anything really, if all it is is text you read…

clever rune
#

You should carefully review that document above to see if the content you're generating violates any of this, if not, you're good!

hearty steppe
clever rune
#

Let's go one step further and even say my app lets my SaaS users make their own website chat bots.
You are responsible for ensuring the safety of these chat bots
Why should my API access get restricted if someone's random website visitor writes something offensive in a language I have no way of pre-processing?
Again, because as they've specified, you're responsible for ensuring the safety of your applications and making sure that anything generated by it does not generate safety violations. If anything, you can attach user IDs to each request to pinpoint bad-acting users and ban them, you will not be banned for single occurrences or if you're not attempting to purposefully generate harmful content as the creator of the application.
What if that language has its alphabet in Cyrillic, but the user writes the offensive message in equivalent latin letters - I would have no way of knowing their offensive meaning
OpenAI's moderation endpoint currently only supports english properly, so you may need to find other solutions to ensure safety for other languages or remove support for other languages entirely

#

Things like swearing and your own opinions on vaccines aren't content violations or anything like that unless you're using it to generate political content / spread the generated content