#16K max for GPT32K

hard thicket
#

I can never seem to get more than 16k tokens in total out of the 32K model. If I supply 15K of input and ask for a comprehensive response, I get a heavily truncated response back (less than 1k).

sharp crater
#

Just to confirm, this is with openai/gpt-4-32k right? - have you tried tweaking the max_tokens setting?

hard thicket
#

Yes, it's gpt-4-32k. The max setting only allows me up to 25k tokens. Anything higher and it gives an error that I am requesting more than 32k tokens. But even at 25k tokens it still only gives me a maximum of 16k.

#

Could it be as a result of the chat memory? The chat memory has a minimum of 2. And 2 x 16k = 32k. Does it keep the extra 16K for chat memory?

sharp crater
#

Note that 32K is the total context size afaik, so your prompt counts toward it as well.

#

So I'm not sure how large your prompt was but if it gives you max 16K on completion, and your prompt was roughly 16K, I think that's the limit (?)...
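A rough sketch of that budget arithmetic, assuming the 32768-token window that shows up in the usage numbers further down the thread. The helper names `max_completion_tokens` and `fits` are made up for illustration and are not part of any API:

```python
# Total context window for gpt-4-32k (prompt + completion), per the usage block in this thread.
CONTEXT_WINDOW = 32_768


def max_completion_tokens(prompt_tokens: int, context_window: int = CONTEXT_WINDOW) -> int:
    """Return the largest completion that still fits next to the prompt."""
    return max(context_window - prompt_tokens, 0)


def fits(prompt_tokens: int, max_tokens: int, context_window: int = CONTEXT_WINDOW) -> bool:
    """True if the prompt plus the requested completion stays within the window."""
    return prompt_tokens + max_tokens <= context_window


print(max_completion_tokens(15_000))  # 17768
print(max_completion_tokens(32_319))  # 449
print(fits(15_000, 25_000))           # False
```

So a 15K prompt leaves room for a completion of roughly 17K tokens at most, whatever max_tokens is set to.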

hard thicket
#

It doesn't give me 16K on completion. I would have loved it if it did. Look at the image I posted.

#

It gave me a 1K response after a 15k request. 15+1 = 16. I need a 15k response.

brave trail
#

imho, it depends on instruction and hyperparameters. If you instruct "say hi", it will not generate 16k tokens of "hi".

hard thicket
#

As I said, I asked it for a comprehensive reply. I supplied a 15k script and asked it to edit it. It supplied a highly truncated response that's only 1k in size.

#

Can you confirm that you're able to get any response bigger than 16k? And give me the prompt so I can test it on my end?

#

Here's an example of a 16K prompt that gives a 1k (truncated) response. Can you see if you get the same on your end?

hard thicket
#

Maybe your GPT-4 32k is in fact GPT-3.5 Turbo 16k?

brave trail
hard thicket
#

Did you try the sample prompt I gave? Do you get 32K tokens?

#

Is it something to do with my account?

brave trail
#

Unlikely. Most probably it's a problem with the prompt itself, or with hyperparameters like temperature, penalties, top_p, etc. If the finish reason is "stop", that means the model decided to emit the "end of text" token and has nothing else to add.

Here's me testing on a big prompt:

$ head -n 8000 /usr/share/dict/words > prompt && bash prompt.sh
{
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "anchoretism\nanchoretist\nanchorettish\nanchorhold\nanchorite\nanchoritic\nanchoritical\nanchoritically\nanchoritism\nanchorless\nanchorlike\nanchorman\nanchormen\nanchorness\nanchorwise\nanchovy\nanchusin\nanchusine\nancience\nanciency\nancient\nancientism\nanciently\nancientness\nancientry\nancienty\nancile\nancilla\nancillary\nancipital\nancipitous\nAncistrocladaceae\nancistrocladaceous\nAncistrocladus\nancon\nanconad\nanconagra\nanconal\nancone\nanconeal\nanconeous\nanconeus\nanconitis\nanconoid\nancony\nancora\nancoral\nancylose\nancylotomy\nancyroid\nAnd\nand\nAnda\nAndaman\nAndamanese\nAndamooka\nandante\nandantino\nandarum\nAndaste\nandean\nandeite\nAndeles\nAndersen\nAndersenian\nandesine\nandesinite\nandesitization\nandesitize\nAndi\nAndian\nAndira\nandirin\nandirine\nandiroba\nAndoke\nandorite\nAndorobo\nAndorra\nAndorran\nAndouillet\nAndouin\nandradite\nandragogy\nandrajo\nAndranatomy\nAndranip\nAndranis\nAndras\nAndre\nandreaea\nAndreaea\nAndreaeaceae\nandreaeaceous\nAndreaeales\nAndrean\nAndrei\nAndrene\nandrewsite\nAndria\nAndriana\nAndrias\nandric\nAndrieu\nAndries\nandring\nandrite\nandrodioecism\nandrodioecious\nandrodynamous\nandroecial\nandroecium\nAndroecium\nandrogen\nandrogenesis\nandrogenetic\nandrogenic\nandrogenous\nandroginous\nandrogonia\nandrogonial\n"
      },
      "finish_reason": "length"
    }
  ],
  "model": "gpt-4-32k-0613",
  "usage": {
    "prompt_tokens": 32319,
    "completion_tokens": 449,
    "total_tokens": 32768
  },
  "id": "gen-asY1YtIbdltgtwQvrEDD7OEK"
}

Script is attached for you to reproduce.

#

In this run, finish_reason is "length", so we hit either the end of the context (32319 prompt + 449 completion = 32768 tokens) or the max response length.
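For anyone who wants to check this programmatically, here is a minimal sketch that reads `finish_reason` and `usage` out of a response body shaped like the JSON above. The function name `explain_finish` is hypothetical, and the sample string is a trimmed copy of the fields from that run:

```python
import json


def explain_finish(response_json: str) -> str:
    """Summarise why a chat completion stopped, based on its JSON body."""
    data = json.loads(response_json)
    reason = data["choices"][0]["finish_reason"]
    total = data.get("usage", {}).get("total_tokens")
    if reason == "length":
        # Either the context window or the max_tokens limit was reached.
        return f"cut off by a length limit (total_tokens={total})"
    if reason == "stop":
        # The model emitted its end-of-text token on its own.
        return f"model ended the reply itself (total_tokens={total})"
    return f"finish_reason={reason!r} (total_tokens={total})"


sample = (
    '{"choices": [{"finish_reason": "length"}], '
    '"usage": {"prompt_tokens": 32319, "completion_tokens": 449, "total_tokens": 32768}}'
)
print(explain_finish(sample))  # cut off by a length limit (total_tokens=32768)
```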

hard thicket
#

Thanks, I will try it. How can I see the finish reason?

brave trail
#

it's in the response json above

hard thicket
#

How do I get the response json for prompts I create on your website?

brave trail
#

I'm an OpenRouter user

#

I don't think their playground is meant for anything but checking whether a model works and comparing reasoning capabilities.
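To see the raw response JSON yourself, you can call the API instead of the playground. This is a sketch rather than the script attached above. It assumes OpenRouter's OpenAI-compatible chat completions endpoint and a bearer API key. The helper names and the placeholder key are hypothetical:

```python
import json
import urllib.request

# Assumed endpoint: OpenRouter's OpenAI-compatible chat completions route.
API_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_request(prompt: str, api_key: str, max_tokens: int = 16_000) -> urllib.request.Request:
    """Build (but do not send) a chat completion request for gpt-4-32k."""
    payload = {
        "model": "openai/gpt-4-32k",
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_completion(request: urllib.request.Request) -> dict:
    """Send the request and return the parsed JSON body (needs network access)."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


# Placeholder key; nothing is sent here, the request is only constructed.
req = build_request("Say hi", api_key="sk-placeholder")
print(json.loads(req.data)["max_tokens"])  # 16000
```

Passing `req` to `fetch_completion` with a real key should return a dict whose `choices[0]["finish_reason"]` and `usage` fields can be inspected as in the run above.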

brave trail
# hard thicket Here's an example of a 16K prompt that gives a 1k (truncated) response. Can y...

In general, LLMs don't work at the letter level. They work at the token level. Also, repetition is an LLM weakness. And instructions should come after the data to work on.

So you're throwing some of the most complex tasks for an LLM into a single prompt: replacing letters inside words, with super repetitive input, without paying attention to how the LLM's attention mechanism works. And you haven't told it how long you want the response to be, so it loses track of what it should do in the middle of the response.

Have a read of OAI's best practices for prompting: https://platform.openai.com/docs/guides/gpt-best-practices

This applies to all LLMs, not just OAI -- Anthropic's docs are basically saying the same, but worded differently: https://docs.anthropic.com/claude/docs/prompt-troubleshooting-checklist

  1. You're not explaining the task simply and clearly.
  2. You haven't asked the LLM how it understands the task.
  3. Since it doesn't work, you should try breaking the task down into subtasks, which you haven't done.
  4. Your instructions are before the data to work on, not after.
  5. You haven't given examples of the task executed perfectly.

Another document by OpenAI saying the same - https://github.com/openai/openai-cookbook/blob/main/techniques_to_improve_reliability.md

hard thicket
#

I did try to get the AI to edit a script, several times with different scripts. It will edit a small script but not something big. The reason I subscribed to your platform is to edit large scripts, which gpt4 truncates. But gpt-4-32k does the same, rendering it useless for my purposes.

brave trail
#

again, I'm not working for OpenRouter

#

just trying to help a fellow user

#
  1. clearly separate the instruction from your script
  2. move your instruction under the script
  3. clarify that the entirety of the script needs to be edited
  4. clarify that "AI must avoid skipping for brevity"
  5. use the OpenRouter API directly, not openrouter.ai/playground

etc etc.
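A minimal sketch of points 1 to 4, assuming plain `<script>` markers as the separator. The markers and the `build_prompt` helper are illustrative choices, not something the model or OpenRouter requires:

```python
def build_prompt(script: str, instruction: str) -> str:
    """Put the script first, wrapped in markers, and the instruction last."""
    return (
        "<script>\n"
        f"{script.strip()}\n"
        "</script>\n\n"
        f"{instruction.strip()}\n"
        "Edit the entirety of the script above. "
        "AI must avoid skipping any part for brevity."
    )


prompt = build_prompt("print('a')", "Rename every variable to snake_case.")
print(prompt)
```

The data comes first and the instruction last, so the last thing the model reads is what it should do with the script.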

#

@hard thicket hope this helps

#

Prompting an LLM is a skill that needs to be learned. It's not human; you need to explain to it what to do in a very clear, unambiguous way.

#

If the LLM is not doing something that you think is implied by your other instructions, then you need to tell it explicitly to do that.

#

Also, an LLM doesn't always understand negatives in your words: "do not say bob" can be read as "do say bob". Verbs are more powerful than nouns - "AI must avoid mentioning Bob" will work better.

hard thicket
#

Ok thank you, I will keep trying.

patent glen
#

they are trained to give short responses

#

even when you tell it to respond with the whole thing, it will sometimes randomly stop if it's too long

patent glen
#

I too am experiencing the same issues trying to get it to edit a document

patent glen
#
  1. Merge all edited chunks together
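Only the merge step survives above, so the following is a guess at the overall workflow: split the document into chunks, edit each chunk separately, then merge the edited chunks in order. The line-based splitting, the default chunk size, and the function names are all assumptions. `edit` stands in for whatever model call you use:

```python
from typing import Callable, List


def split_into_chunks(text: str, max_lines: int) -> List[str]:
    """Split text into consecutive chunks of at most max_lines lines each."""
    lines = text.splitlines(keepends=True)
    return ["".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]


def edit_in_chunks(text: str, edit: Callable[[str], str], max_lines: int = 200) -> str:
    """Edit each chunk separately, then merge the edited chunks in order."""
    return "".join(edit(chunk) for chunk in split_into_chunks(text, max_lines))


# Stand-in for a model call: upper-case each chunk.
result = edit_in_chunks("a\nb\nc\n", str.upper, max_lines=2)
print(result)
```

Each chunk stays well inside the context window, so the model has less reason to abridge any one of them.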