#Reducing Token Usage
1 messages · Page 1 of 1 (latest)
Does prompt caching applies to a vector store holding a number of source code files that one is asking a number of questions about?
no
Thanks. Given the 75% discount for input caching I'll have to format my 20 source code files as long input(25000 tokens) up front when starting a multi-round thread asking questions about the source code. I'll just add separators and a file name between each. Seems like a hack but if reusing already processed and uploaded common data isn't something 4.1 can leverage I'll have to make chatgpt reprocess it each time to save the 75%.
Hmmm, so they charges me the full 100% input fee each time I reference the same file as I ask a series of questions about that static file? OpenAI needs to get some programmers that can code for internal efficiency.
only identical tokens are cached with set of rules but RAG from vector store is pretty much always different as its based on question and other stuff
Well, my source code, which I would have uploaded to a vector store of files, isn't always different until I decide to make some change in which case they should charge me 100% of an input fee. But we aren't talking about that in this common case. I'm wanting to have my input code analyzed and have suggestions made to me and ask follow ups until I'm ready to change it. But it looks like I have to do it the hard way.
Actually this is absurd in more ways than just this. I upload a 100,000 token thesis as a file and want to have 4.1 help me study it and help teach me some subject and I have to pay 100% each time I ask a simple yes/no question. Egads.
On the same darn file.
if you manually add the file to the context and it all remains the same it will work. if you vector store and RAG it will get different snippets each time and does not hit the cache
When I was going through my learning process with 4o to get to the ability to use the api so I can pay for 4.1 I was given the impression from 4o and web articles that the file upload stuff was replaced by vector stores. I just wrote my first API program recently so I'm a newbie.
By manually adding the file to the context do you mean just adding this as initial content to the initial prompts?
Or is there an "add file" py api function to a thread then allows me to not have to:
file1.py -----
content
file2.py -----
content for file 2
....
--- DONE ---
After telling chatgpt how I've deliniated the files?
cache only works on prompts so yes you add it as a text
Ok. I can code so I'll just start with a system prompt that also tells chatgpt how I'm separating the files and then a utility function that I can give a path to and just prepend it to other prompts. I'll give it a go. I'm excited to try 4.1 and see it can fully understand a moderate size app of something like 20 files html, javascript and python 25000 tokens or so and grasp the whole of it and give me advice.
Thanks