#Reducing Token Usage

1 messages · Page 1 of 1 (latest)

warm urchin
#

I tried looking into the assistants API, but I'm not sure if t he system prompt counts towards the token count every time. Is there a way to set it up so that the system prompt only gets called once per session?

Also, does the context carry over from previous messages sent in a thread? If so, do they count towards token count?

humble lodge
humble lodge
# lyric vessel no

Thanks. Given the 75% discount for input caching I'll have to format my 20 source code files as long input(25000 tokens) up front when starting a multi-round thread asking questions about the source code. I'll just add separators and a file name between each. Seems like a hack but if reusing already processed and uploaded common data isn't something 4.1 can leverage I'll have to make chatgpt reprocess it each time to save the 75%.

#

Hmmm, so they charges me the full 100% input fee each time I reference the same file as I ask a series of questions about that static file? OpenAI needs to get some programmers that can code for internal efficiency.

lyric vessel
#

only identical tokens are cached with set of rules but RAG from vector store is pretty much always different as its based on question and other stuff

humble lodge
# lyric vessel only identical tokens are cached with set of rules but RAG from vector store is ...

Well, my source code, which I would have uploaded to a vector store of files, isn't always different until I decide to make some change in which case they should charge me 100% of an input fee. But we aren't talking about that in this common case. I'm wanting to have my input code analyzed and have suggestions made to me and ask follow ups until I'm ready to change it. But it looks like I have to do it the hard way.

#

Actually this is absurd in more ways than just this. I upload a 100,000 token thesis as a file and want to have 4.1 help me study it and help teach me some subject and I have to pay 100% each time I ask a simple yes/no question. Egads.

#

On the same darn file.

lyric vessel
#

if you manually add the file to the context and it all remains the same it will work. if you vector store and RAG it will get different snippets each time and does not hit the cache

humble lodge
#

When I was going through my learning process with 4o to get to the ability to use the api so I can pay for 4.1 I was given the impression from 4o and web articles that the file upload stuff was replaced by vector stores. I just wrote my first API program recently so I'm a newbie.
By manually adding the file to the context do you mean just adding this as initial content to the initial prompts?
Or is there an "add file" py api function to a thread then allows me to not have to:

file1.py -----
content
file2.py -----
content for file 2
....
--- DONE ---

After telling chatgpt how I've deliniated the files?

lyric vessel
#

cache only works on prompts so yes you add it as a text

humble lodge
# lyric vessel cache only works on prompts so yes you add it as a text

Ok. I can code so I'll just start with a system prompt that also tells chatgpt how I'm separating the files and then a utility function that I can give a path to and just prepend it to other prompts. I'll give it a go. I'm excited to try 4.1 and see it can fully understand a moderate size app of something like 20 files html, javascript and python 25000 tokens or so and grasp the whole of it and give me advice.

Thanks