#Getting around token limit
Ask it to break the answer down into chunks and confirm before posting the next chunk. It will simply wait for you to acknowledge the response before continuing.
A number of clients also offer ways around this.
It's a system designed to emulate human language - talk to it like a human. Use terms like "Can you break it down for me" and "If you reach your limit confirm before moving to the next response." Sometimes it chains them together, sometimes it will want a new prompt - work with the model not against it.
The way continue works in the chat app is by cutting the context and including only the latest 4k tokens that fit. That's why it sometimes spits out nonsense after a continue: the instruction part, for example, can get cut off.
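A rough sketch of that kind of truncation, to show how the instructions can fall out of the window. This is only a guess at how a client might do it, not the chat app's actual code: `keep_latest`, `rough_count`, and the one-token-per-word estimate are all made up for illustration.

```python
def rough_count(text):
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def keep_latest(messages, budget, count=rough_count):
    """Keep only the newest messages whose combined token count fits in budget."""
    kept = []
    used = 0
    # Walk backwards from the newest message and stop once the budget is full.
    for msg in reversed(messages):
        cost = count(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    "INSTRUCTIONS: list every date, do not summarize",
    "first half of the answer " * 3,
    "continue",
]
print(keep_latest(history, budget=20))
```

With a budget of 20 the oldest message, which holds the instructions, no longer fits, so the model only sees the partial answer and "continue".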
@old wharf @humble veldt is there a way to send only the first 4k tokens and then send another request if there are more tokens?
Sure, but the model would only have the context of the new tokens.
is there a python tool for counting tokens? how would i do that
There's one mentioned in the docs. I don't remember the exact name. Something about "tictokenizer". Also, you get the used tokens count in the API response.
hmm thanks. im basically getting dates and event data from a pdf, so i need every date and it cannot be summarized in any way. im thinking that I count tokens before i send the request and split it into multiple requests if need be
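The count-then-split idea could look roughly like this. It is a sketch under assumptions: `split_by_budget` and `rough_count` are hypothetical names, the word-based count is a placeholder for a real tokenizer, and the text is assumed to be already extracted from the PDF as one event per line. Splitting on whole lines keeps each date together with its event.

```python
def rough_count(text):
    # Placeholder estimate (one token per word); swap in a real tokenizer.
    return len(text.split())


def split_by_budget(lines, budget, count=rough_count):
    """Group whole lines into chunks that each stay within the token budget."""
    chunks = []
    current = []
    used = 0
    for line in lines:
        cost = count(line)
        # Start a new chunk when this line would overflow the current one.
        if current and used + cost > budget:
            chunks.append("\n".join(current))
            current = []
            used = 0
        # A single line larger than the budget still gets its own chunk.
        current.append(line)
        used += cost
    if current:
        chunks.append("\n".join(current))
    return chunks


pdf_lines = [
    "2023-01-05 kickoff meeting",
    "2023-02-11 design review",
    "2023-03-20 launch",
]
for chunk in split_by_budget(pdf_lines, budget=6):
    print(repr(chunk))
```

In practice the budget should sit well below the model's limit, since the instructions and the reply also count toward it.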
There are quite a few methods and specialised embeddings for doing this.
Document processing can be done using 'chaining' for large inputs - there's a lot of talk about this and a variety of tools
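One simple reading of "chaining" is to run the same step on every chunk and merge the results in order. The sketch below assumes that reading. `run_chain` and `extract_dates` are hypothetical, and `extract_dates` uses a regex in place of a real model request so the example runs offline.

```python
import re


def extract_dates(chunk):
    # Hypothetical stand-in for one model request per chunk.
    # Here it just pulls ISO-style dates with a regex.
    return re.findall(r"\d{4}-\d{2}-\d{2}", chunk)


def run_chain(chunks, step=extract_dates):
    """Apply the same step to every chunk and merge the results in order."""
    results = []
    for chunk in chunks:
        results.extend(step(chunk))
    return results


chunks = [
    "2023-01-05 kickoff meeting\n2023-02-11 design review",
    "2023-03-20 launch",
]
print(run_chain(chunks))
```

Because each chunk is handled on its own, nothing gets summarized away between requests, but the step also cannot see context from other chunks.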
tiktoken i think is the lib ...?
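A minimal sketch of counting tokens with tiktoken. `count_tokens` is a made-up helper name, and the `cl100k_base` default is an assumption based on it being the encoding used for the gpt-3.5-turbo models. The word-count fallback is only there so the snippet still runs if the library is not installed or cannot load its vocabulary.

```python
def count_tokens(text, encoding_name="cl100k_base"):
    """Count tokens with tiktoken if it is usable, else fall back to a word count."""
    try:
        import tiktoken  # third-party: pip install tiktoken

        enc = tiktoken.get_encoding(encoding_name)
        return len(enc.encode(text))
    except Exception:
        # Assumption: tiktoken is missing or could not load the encoding
        # (it fetches the vocabulary on first use), so use a rough estimate.
        return len(text.split())


print(count_tokens("2023-01-05 kickoff meeting"))
```

The fallback number is only a ballpark figure. For real budgeting, rely on the tiktoken count or the usage figures returned in the API response.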
Why not use the new 16k token 3.5 model? That’s a huge increase over 4096…