#Custom GPT Knowledge Limit?
you can't upload more than 10 files at once
Aware of that. But I am getting an error on saving when I try to upload my 11th+ document.
strange
Refresh, still same issue. I suppose I'll try again in a few hours
I already get problems when I upload fewer files, like it does not use the info
or I'm doing something wrong
Yea I have about 2000 files to upload, so getting error saving after uploading only 10 is concerning
what do you want to create that needs 2000 files?
I think it won't work because it takes like hours to check 5 docs
A GPT to help teachers, students, and parents better understand AI and its presence in today's world. AI is coming to the classroom but won't be adopted unless teachers, students, and parents trust and understand AI
I've got time lol
I mean, if you chat with it, it will scan the files each time
I've got 10 uploaded now and it seems to take about less than 30 seconds to give a response based on the uploaded knowledge. I'm going to try again in a few hours to upload PDF #11, I'll update here when I do so
okay
Screenshot for future reference.
report in #1070006915414900886
How big are the pdfs?
It won't let you upload more than 10 documents, period. They just don't make that clear
Ah what a nice UX lol. Thanks for commenting. Assumed this but was hoping it was just a bug
Here's to hoping they come out with a higher tiered plan to upload much much more
Is there a file size limit ?
I'd imagine there must be to an extent. But you make a good point, I'll combine a bunch of my PDFs together and see
512MB is the max
is there documentation for this stuff somewhere? I am struggling to get any custom GPTs to really work. It seems very hit or miss on whether it sources from the documents I am uploading
I feel like sometimes my uploaded documents are not really getting processed, as the GPT isn't aware of anything from that document. Other documents seem to have worked
GPTs seem to prefer to answer without looking at the knowledge base whenever possible. You have to include specific instructions to remind it to check the knowledge base.
For me it's claiming the knowledge doesn't exist within its documentation...but the thing I prompted for is there in one of the documents word for word. It's odd
I was hoping I would be able to make custom GPTs that would be experts on like specific big manuals...but it doesn't seem viable since it just doesn't reliably seem to source from the document, even for very simple prompts that are explicitly answered in the knowledge document
OpenAI's implementation isn't that great. If coding isn't out of the question you could probably just make your own vector database and either use the old API to inject context or give the AI a function to do searches
hardly any point in uploading files when you can probably just fit everything in the prompt with these constraints
yeah I guess I was assuming it would actually be fine-tuning on the knowledge docs somehow but it seems to be doing something more shallow
if it's not really deeply utilizing the knowledge docs then the GPTs feel 99% less useful than I was expecting
the whole thing feels rushed. devs should probably stick to the "old" (current) way of doing things
Try adding something to the instructions like:
"This GPT has access to the manual for the product in its knowledge base. Whenever the user asks a question, the GPT searches its knowledge base for answers."
When the GPT scans its knowledge, you see a pop-up that says "searching knowledge base". As far as I can tell, that's the only time the knowledge comes into play
in fact it might be LESS useful than custom instructions alone since with custom instructions it actually reliably uses that info
right...my main issue is that it does not reliably do the "searching knowledge base" thing
you can tell it to check its knowledge before answering any prompts
works great
hmm, I will try that
highlight how important that is to you, and it will always do that
that does seem to get it to search...but it keeps claiming the info doesn't exist in its documentation even though it's explicitly there in one of the uploaded docs
this seems really janky
I swear it's just not actually processing the uploaded docs sometimes
is the info it should look for in plain text or in a picture?
plain text. I gave up on trying to use PDFs and just started using plain text docs
with other docs it seems to work IF it says "searching". Some docs it acts as if it can't see them
It's probably just hallucinating. If you add some encouragement to your prompt or tell it to look harder it might just find it
From what I understand, the number of uploaded knowledge files cannot exceed 10. Additionally, is there a size limit for each file?
the files can be 10gb at max or 2 million tokens, whichever comes first
anyone else seeing their files be duplicated in the knowledge file section? I had 6 files uploaded. Now I see a duplicate of each file for a total of 12
I wonder if that is why I am having issues, because now I am over the 10 file limit
I have a feeling this is why my custom GPT is broken now. I added that 6th doc, so when it got duplicated that means I now have 12 which puts me over the 10 file limit
For each file of the total?
some files are actually duped more than once
For Custom GPT it's a limit of 512mb per file, so around 5gb worth of total data with all ten
no you can have more than 10
no 10gb
How many files can I upload at once?
The limit is up to 10 files at once. Keep in mind there are file size restrictions and usage caps per user/org.
What are those file size restrictions?
The size of text, document, and spreadsheet files is capped at 2M tokens per file, with a hard limit of 512MB per file.
For images, there's a limit of 20MB per image.
Additionally, there are usage caps:
Each end-user is capped at 10GB.
Each organization is capped at 100GB.
Note: An error will be displayed if a user/org cap has been hit.
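A rough sketch of how the limits quoted above could be checked before attempting an upload. The numbers come from the message above and are treated as assumptions rather than an official spec; `KnowledgeFile`, `check_batch`, and the decimal megabyte definition are hypothetical choices made for illustration, and token counts are assumed to be measured elsewhere.

```python
from dataclasses import dataclass

MB = 1000 ** 2  # assumption: decimal megabytes
GB = 1000 ** 3

# Limits as quoted in the thread; not verified against official documentation.
MAX_FILES_PER_UPLOAD = 10
MAX_FILE_BYTES = 512 * MB
MAX_IMAGE_BYTES = 20 * MB
MAX_TOKENS_PER_FILE = 2_000_000
MAX_USER_BYTES = 10 * GB


@dataclass
class KnowledgeFile:
    name: str
    size_bytes: int
    token_count: int = 0   # leave at 0 for images
    is_image: bool = False


def check_batch(files, already_used_bytes=0):
    """Return a list of problems; an empty list means the batch passes every check."""
    problems = []
    if len(files) > MAX_FILES_PER_UPLOAD:
        problems.append(f"{len(files)} files exceeds {MAX_FILES_PER_UPLOAD} per upload")
    for f in files:
        limit = MAX_IMAGE_BYTES if f.is_image else MAX_FILE_BYTES
        if f.size_bytes > limit:
            problems.append(f"{f.name}: {f.size_bytes} bytes exceeds {limit}")
        if not f.is_image and f.token_count > MAX_TOKENS_PER_FILE:
            problems.append(f"{f.name}: {f.token_count} tokens exceeds {MAX_TOKENS_PER_FILE}")
    total = already_used_bytes + sum(f.size_bytes for f in files)
    if total > MAX_USER_BYTES:
        problems.append(f"total {total} bytes exceeds per-user cap {MAX_USER_BYTES}")
    return problems
```

Passing these checks would not guarantee a successful upload; several people in this thread report failures well below the stated 512MB limit.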
Are you on enterprise?
No
Oh look at that
Wait never mind, still can't upload past 12 files
Token limit seems to be the only thing I guess
Do you have a cap or what? I can upload as many files as I want:
you maybe just can't upload a specific file, if it exceeds the token limit
this is massively frustrating. It just flat out does not reliably use information from the uploaded knowledge documents. Even when it says it's "searching". I'm giving up on this. I hope it's just glitching because of their roll-out and load issues
When I originally made this post I couldn't upload beyond 10 files, I tried multiple files of varying sizes to no avail. After reading your comment I went back and tried to upload more (the same I was attempting to earlier) and was able to upload them successfully. But I was only able to upload two more files before getting the same 'Error Saving Draft' message.
The reference image I included earlier shows the error occurring at 10 files. Now we can see it happening when I try to upload #13
In the end it's just custom instructions. If you want to train your GPT on your data, then you might try the custom models program
I imagine it is, plus we should remember it is just a beta at the end of the day
Sounds like a regular token limitation
Yea that's what I'm thinking too, would love to see them come out with a less limited version for a higher price
but custom instructions actually work, it will use those instructions reliably. For the custom GPTs it simply does not reliably use the information in the documents. Even if they are plain text. At least not for me.
nvm custom models are $2-$3 million lol
Well worth it for the right businesses though
it just seems like they oversold this in their announcements and marketing
I am hoping its just glitching for me. I'll try again another time
Honestly I feel it's more the media hyping and people in general hyping things up. I really can't recall the last chatGPT ad I saw
well in their presentation they made it seem as if it would reliably source from the data in the uploaded documents. That is not what I am seeing at all. It could just be technical issues I guess.
I have a feeling it would work better for smaller files. But if you upload like a big 50k word manual, it doesn't seem to reliably retrieve info. Sometimes it works, sometimes it doesn't
not just big files. I have a feeling the knowledge is used as a last resort before it hallucinates.
so if you have very strange fiction in a file I think it might use it more reliably
as it can answer the thread easily
When you upload documents, it doesn't actually look at them or scan them. It only uses them as a reference. So you ask it about the document and it thinks of a key word to search, finds pages with that term and checks to see if it's relevant. If you want it to actually know everything in the document and understand it all at once with context, you would have to paste the text into the chat.
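To make the difference concrete, here is a toy sketch of the "search for a keyword, then read only the matching pages" behavior described above. It illustrates the idea only and is not OpenAI's actual retrieval code; `search_pages` and `build_prompt` are hypothetical names.

```python
def search_pages(pages, keyword, max_hits=3):
    """Return (page_number, text) for pages that mention the keyword, case-insensitively."""
    needle = keyword.lower()
    hits = [(i + 1, text) for i, text in enumerate(pages) if needle in text.lower()]
    return hits[:max_hits]


def build_prompt(question, pages, keyword):
    """Only the matching pages are placed in the prompt, never the whole document."""
    hits = search_pages(pages, keyword)
    context = "\n".join(f"[page {n}] {text}" for n, text in hits)
    return f"Context:\n{context}\n\nQuestion: {question}"
```

If the chosen keyword never appears in the document, nothing is retrieved and the model answers without the file, which would match the "it claims the info isn't there" reports in this thread.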
What's the point of having 128k tokens if it can't read the data you upload automatically?
And that's part of the reason why I stuck with giving the AI a search API for the scraped documentation instead in #1172289801098109009
For me, I could add more: first the 10 docs allowed in the GPT Builder phase, then I added more by attaching them in the text box once the GPT had been published. First I asked: how many documents could you add to your knowledge base?
GPT response:
There isn't a specific limit to the number of documents that can be added to my knowledge base. The capacity to manage and utilize the content from multiple documents effectively depends on the complexity and size of each document. I can handle a considerable number of documents simultaneously, ensuring comprehensive analysis and retrieval of relevant information as needed. If you have more documents you wish to add, feel free to upload them, and I will incorporate their content into my existing knowledge base to assist you better.
User: there you go! (attached more than 50 docs in several batches). Then I asked about them and it seems it processed them and added them to the KB
I tried a workaround that ensures all knowledge is top of mind for the GPT, but it made my hair grey. I posted it as text in the chat instead and prefixed everything as important, or explained why it was important in the context
I can't find anything official about this. Why don't they clearly communicate how much knowledge we can upload? What is your current status? Total memory and number of files? @sand pike
As of now they say the max file size you can upload is 512mb. I’ve been trying to upload a file sized 412mb and it takes over 10 minutes, only for it to not complete uploading in the end. I haven’t gotten around to breaking down my file even further to see how it will take 250-300mb
To overcome the issue of number of files being uploaded I’ve merged a bunch of my PDFs together into one, hence the large file size. Still have to do more testing to see how functional it works in terms of reading through a 2000 page pdf
also seems like uploading files with Firefox gives some issues, very strange
I put together 600 pages of PDF and it was 50MB
It didn't want to accept that and kept giving me an error. Then I split it again into 25MB and 25MB and it worked.
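A small sketch for planning such a split. It assumes pages are roughly equal in size and uses the 25MB ceiling people report in this thread as the default; `plan_split` is a hypothetical helper, and the actual splitting would still need a PDF tool.

```python
import math


def plan_split(total_pages, file_size_mb, max_part_mb=25):
    """Return inclusive 1-based (first_page, last_page) ranges, one per output part."""
    parts = max(1, math.ceil(file_size_mb / max_part_mb))
    parts = min(parts, total_pages)          # never more parts than pages
    base, extra = divmod(total_pages, parts)
    ranges, start = [], 1
    for i in range(parts):
        length = base + (1 if i < extra else 0)
        ranges.append((start, start + length - 1))
        start += length
    return ranges


print(plan_split(600, 50))
```

For the 600-page, 50MB file above this gives two halves of 300 pages each. A 412MB file would need 17 parts at 25MB, which is more than the 10-file cap discussed here.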
Lol, yea they definitely have some work to do with the knowledge upload part
The point of the custom gpt is to have a chatbot trained on whatever data you want it to fine tuned to. Difficult to make a usable chatbot (that’s anything more than what you can do via uploading files to chat) with the limited ability to upload “large” data
You can put a requirement for reading
You aint uploading 2000 files
A bit late to the party @plucky remnant
Yep
like?
Its literally on their website
where?
Text, Rules for interactions for certain things
Has anyone tested if photos are processed. IE: follow the theme / styling of photos in knowledge?
I am too
ChatGPT itself reports that you can upload 10 files of 100MB each. If you have more than 10 files, you need to join them all to fit in 10 files, and each one no more than 100MB. 👍
Not sure where you got that info from, but try to upload a file larger than 512mb and you’ll clearly see that it says 512mb is the limit
As I said, no files bigger than 100MB. You're trying to upload a single file of 512MB ? I got the information by carefully asking ChatGPT about the policies for uploading these 10 files for knowledge base.
Yea you’re wrong and chatgpt lied to you
See here
That's better than what I said. It looks like there's a misunderstanding. I told you that the maximum size of EACH file should be 100MB * 10 = 1TB total size. On the message you pointed out, it says 10 * 512MB size = 5TB ( Wow ) 🙏 Better than I expected, since I knew the maximum was 100MB. See ... 100MB per file is already a ton of texts.
I hate to break it to you, but 512mb * 10 = 5.12gb
512MBytes * 10 = 5.2GB. 1000MB = GB. What's the problem here ?
Alright I’m gonna step out of this conversation because this is like talking to a wall. Have a goodnight
Ok, that was my fault on the last message. Ok, 512MB * 10 = 5.2GB which is really a great ton of texts. 🙏👍
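For reference, the arithmetic from the last few messages worked out explicitly. The only assumption is which definition of a gigabyte is used.

```python
FILES = 10
PER_FILE_MB = 512

total_mb = FILES * PER_FILE_MB        # 5120 MB
total_gb_decimal = total_mb / 1000    # 5.12 GB if 1 GB = 1000 MB
total_gib_binary = total_mb / 1024    # 5.0 GiB if 1 GiB = 1024 MiB

# The earlier 100MB-per-file figure gives 1000 MB, i.e. 1 GB, not 1 TB.
hundred_mb_total_gb = FILES * 100 / 1000

print(total_mb, total_gb_decimal, total_gib_binary, hundred_mb_total_gb)
```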
Across all files its 500mb max, but with a 2m token limit.
I joined all texts from my previous conversations almost daily on ChatGPT in one PDF file. Incredibly, the file is only 1.2MB size. Now, when I begin a new conversation, I include it, just to give context, and to continue from where we stopped. ❤️
So it works?
I'm unsure if OAI visualizes anything seen in PDF files, for example images.
Across all files afaik
Well, I was just thinking about that. We often don't chat including images, just plain text. But 1 year of conversation was only 1.2MB in size, that's great. Even if the maximum were 512MB, I'm far from losing the context if I use the 10 files only for that, but certainly, it would be better if the total size were at least 1GB = 1000MB.
But how are tokens really counted? I don't fully understand it. What do we have to worry about with tokens?
ChatGPT explained to me that the files have nothing to do with the token counter. 128k tokens is the size of the window... I'll try to post its answer.
Thanks!
Here ChatGPT explains better:
Let's discuss the relationship between the uploaded text files (used as a knowledge base) and the use of tokens in our conversation in English.
- Text Files as Knowledge Base: You can upload up to 10 text files, each up to 100MB, which I can use as a knowledge base. These files can contain detailed information, background, specific topics, or any data you want me to reference during our conversation.
- Token Usage in Conversation: Tokens are units of text, like words or parts of words, that I use to understand and generate responses. In the GPT-4 model, there's a limit of approximately 128 thousand tokens for each conversation session.
- How They Relate: The tokens are used to process both your queries and my responses in our conversation. When I access the uploaded files to find information relevant to your queries, the text I read from these files also counts towards the token limit. However, the size of the files themselves (in MB) does not directly impact the token limit; it's the amount of text processed from these files that counts towards the token usage.
- Practical Implication: If our conversation involves complex queries that require me to reference large sections of the uploaded files, it can consume a significant number of tokens, thereby reducing the amount of conversation we can have before reaching the token limit.
In summary, while the uploaded files provide a rich source of information for our conversation, the amount of text I process from these files does impact the overall token usage in our session. Therefore, managing the scope and depth of queries can help optimize token usage, especially in lengthy discussions.
Okay so nothing matters realistically other than 100mb per file, weird because all of my files add up to 142mb
I think I might know the issue 😆
Perhaps it doesn't accept your 142MB file size. Try with less than 100MB, and please tell us the results. I haven't any 100MB text around. 😀
All files are less than 100mb, adding up to 142mb
Tokens are simply in reference to the amount of text to be processed (reading images is not yet a capability). If you try to upload a file larger than 512mb, you will get an error message saying the file size limit is 512mb. But, you will not be able to successfully upload any files larger than 25mb (as far as I and the others on this thread have been able to achieve).
lol ChatGPT doesn’t have that knowledge
Unless you use my GPT README.bot :p
where
Let me know if it breaks…I’ve had to reupload files for it to remember stuff twice now.
Can it read images in the knowledge files? With vision?
I scraped all of OpenAI's api references and guides and put it into a Weaviate DB which works better than the file upload feature IMO. Check it out #1172289801098109009
It also links the relevant section of the documentation when it answers you
How on earth you found that out
let me try it
my custom GPT struggles with a 200kb .txt
but can receive a 10mb pdf, that's weird
Weird/inconsistent issues like that are why I chose to use Weaviate
And it's more flexible IMO
Perhaps a dumb question on my part.. but do you mean you can simply upload a vector database (file), and it will know how to include that knowledge?
I self-host an API that the GPT can query (basically for every question), which will retrieve relevant results from Weaviate and the GPT will use that in its answer
and this is why I use txt files and convert them to parsed single-string .json files. Way easier and more token-efficient for GPT
In a JSON string it can search instead of reading the entire thing and wasting tokens.
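One possible reading of that conversion, as a minimal sketch: collapse all whitespace in the text file and store the result as a single JSON string value. The field names `title` and `content` and the helper `text_to_json_string` are hypothetical, and whether this is actually more token-efficient is the poster's claim, not something the code demonstrates.

```python
import json


def text_to_json_string(title, raw_text):
    """Collapse runs of whitespace so the document becomes one JSON string value."""
    content = " ".join(raw_text.split())
    return json.dumps({"title": title, "content": content}, ensure_ascii=False)


print(text_to_json_string("Manual", "Line one.\n\n  Line   two.\n"))
```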
If you need a bigger knowledge base you'll need to host your own API to do RAG on and do function calls. The RAG built into agents has some limitations. As said, you can use Weaviate or, if you want to go cheap, Redis in a RunPod with a Flask (or FastAPI) app for vector-based search. Essentially you'll need to use chatwithmydocs but only the API and document upload features. Agents can make API calls.
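A toy, in-memory stand-in for the vector search such a self-hosted API would do. It uses bag-of-words counts instead of real embeddings and a plain list instead of Weaviate or Redis, so it only illustrates the ranking step; `embed`, `cosine`, and `TinyVectorStore` are hypothetical names, and the web endpoint the GPT would call is left out.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': word counts. A real setup would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    def __init__(self):
        self.items = []  # list of (vector, chunk_text)

    def add(self, chunk):
        self.items.append((embed(chunk), chunk))

    def search(self, query, k=2):
        """Return the k chunks most similar to the query, best first."""
        q = embed(query)
        scored = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [chunk for _, chunk in scored[:k]]
```

The chunks returned by `search` are what the API would send back for the GPT to use in its answer.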
This isn’t true. It seems to be able to access at least some info without searching the document
Is there any documentation on that? For that to be true I think each GPT would need to be a fine-tuned model of GPT4
Ah, I think I see how the knowledge base works now. Small files that are attached seem to be included in the context window while large files need to be searched. The file names are always included in the context window as well.
For example, if I upload a file called "password.txt" and ask the GPT for the password, it reads it out immediately. If I also upload a large file called "thesaurus.pdf" and ask the GPT for a synonym of a word, it takes time to search through the entire pdf before answering.
But I'm still noticing that my basic thesaurus GPT sometimes decides not to search through its thesaurus pdf even when it should be (I know because it tells me things that disagree with my pdf). Adding the line "answer using thesaurus.pdf" to the instructions fixes this.
it cant even utilize 10 different files effectively
It's ten files, 8,192 tokens each. That's 81,920 tokens total. The model can only keep 8,192 tokens in its context window at a time so it can't actually reference the entire set at once. It's best if you prepare the documents in JSON or CSV and condense/summarize any filler text as part of preprocessing to get the most out of it.
10 :/
lol so literally what I just posted :p Im going to try yours out though! should be fun to find differences
I'm confused as to how this is any different from OpenAI's upload feature... It's literally doing RAG as well, and you aren't paying for embeddings, back-end API calls, etc. OpenAI Retrieval also links the relevant section of the documentation?
This is just so much overhead for something a GPT does pretty much the same. We all gotta accept that GPTs are just buggy right now. I'd advise not wasting time trying to build a better RAG just for a GPT. People need to find a public API that can do it like you said
@sand pike someone gave me a tip that you could merge all the PDFs into 1, or just merge a couple, and ChatGPT can merge them for you. That's what I did with my medical one and it's working well
Here's my take on it: https://chat.openai.com/g/g-9QLQ6MZQt-gpt-documentation-guide
Difference is I'm hand scraping the old fashioned way with copy paste and building the JSON data structure
I'm also curating the sources to only include the most helpful articles so it doesn't get bogged down by irrelevant data
Yeah same, I was referring to @west flume building his own vector store instead of just using the baked-in one for GPTs.
What kind of custom instructions do you use with it? I write it a detailed flow of how it should traverse the data; it's all I need to get good results.
No custom instructions beyond telling it what its purpose is. When using well structured JSON it figures out the flow on its own.
It's very similar to writing a RESTful API
but with a huge body of data
Hard part is converting so much plain text into JSON without losing document flow
Time to spin up a "Document to JSON GPT". This is ridiculously laborious.
I'm just doing it because I can. Also in the future I can give it more features and further refine it like giving it the ability to make multiple queries in one go
Never used Weaviate before so it's a good experience
I have the scraping fully automated haha
Hooowwww!?
I used Playwright in Node.js
The logic gets a bit complicated in some places but it's doable
interesting. I'd love to see the implementation.
Want to trade? I'll help you with your GPT building if you'll share your scraper with me. 😄
Like you need to get it to expand all the properties of the params in the API References first
Haha maybe in the future after I clean it up a bit
My GPT based scraper dies on this page https://platform.openai.com/docs/guides/prompt-engineering
it's waaaaay beyond its context window.
My data is additionally chunked by sections
Use the playground with 16k tokens and make sure you tell it to write retrieval instructions that would help an LLM traverse and interpret the JSON. Then put those instructions in the GPT. That field is incredibly vital to getting it to act right.
Treat the instructions like a plugin manifest Description for Model
Isn't scraping OpenAI against OpenAI's usage policy?
Is it? I haven't read it. >.<
I told a developer I was doing it and he didn't say no, so... ¯_(ツ)_/¯
cuz he ain't no snitch, but now you're snitching on yourself lol
Haha it's for the greater good ya know
I'm sure they wouldn't mind it in this case
All towards helping developers spend more money on OpenAI APIs
I thought I read it somewhere in their policies, but I can't find it now, so maybe I'm remembering wrong?
haha agreed
I see why scraping ChatGPT wouldn't be allowed but why should they not allow scraping of their main website? That load can't be too high, right?
It's probably cached too
I guess also one big thing is that the documentation is updated automatically without needing you to manually format and re-upload the files
In my GPT it just updates the Weaviate DB with the new stuff
I have a cronjob that runs every 6 hours that checks if the documentation changed and updates the database if so
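One common way to do the "did anything change?" check is to compare content hashes, sketched below. This is an assumption about how such a job could work, not the poster's actual implementation; `fingerprint` and `changed_pages` are hypothetical names. A standard cron schedule for "every 6 hours" is `0 */6 * * *`.

```python
import hashlib


def fingerprint(page_text):
    """Stable SHA-256 hex digest of a page's text."""
    return hashlib.sha256(page_text.encode("utf-8")).hexdigest()


def changed_pages(current_pages, stored_hashes):
    """current_pages: {url: text}; stored_hashes: {url: hex digest}.

    Returns the sorted URLs that are new or whose content differs from the stored digest.
    """
    return sorted(
        url
        for url, text in current_pages.items()
        if stored_hashes.get(url) != fingerprint(text)
    )
```

Only the URLs returned by `changed_pages` would need to be re-chunked and written back to the database.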
There's also a tagging system that marks legacy and beta endpoints, and I plan to also automatically clean up the data some more before updating the DB in the future
I already do some data cleaning but it could be done better
It automates all the stuff you need to do like clicking on each code example, expanding all the properties of parameters, expanding all info boxes, etc. before/during scraping the page
Run failed
You exceeded your current quota, please check your plan and billing details.
Haha I guess I won't be using playground for this.
I don't think it can handle that much data at the moment. I've got like 30-40 policies deep and it's missing stuff.
We've been training in-context policy by policy and it's @sand pike
Already did this a few days ago. We talked about this a bit earlier in the thread; the issue that arises from that is the 25mb upload limit (the error message says the limit is 512mb but you won't be able to successfully upload anything larger than 25mb atm)
As of the last time I checked a few days ago, 100% sure
these 3 are 3 of the ones there
That's for all your files?
Let me try, I have a 412mb pdf
Physio - 80,000 KB (~80 MB)
Nutrition - 40,000 KB (~40 MB)
Cardio - 60,000 KB (~60 MB)
Pediatric - 90,000 KB (~90 MB)
Anatomy (1) - 211,000 KB (~211 MB)
Muscular Anatomy - 20,000 KB (~20 MB)
and it works perfectly fine, the AI can search through them, it might be because I specified that, when the question is regarding X it will prioritize searching on X documents
I still haven't hit the 512 MB limit you said was the theoretical possibility though, im gonna try uploading another file that would make me hit the limit to see what happens.
How long did Anatomy take to upload for you?
I just uploaded a dupe of the physio doc, which would put me over the 512 MB limit, but nothing happened, it even saved fine
not sure, I uploaded it yesterday very late at night for me, but it didn't take that long, 100% less than a minute, and the Physio dupe just now took less than 30 secs I think
Weird, I've been here for the past 5 minutes. So far when I've tried to upload my 412mb pdf it will sit here for around 10 minutes then will fail. No error message, just will not upload and remove itself
i see
What's your internet service like? Average or above average?
Im using LAN,
That's gotta be my issue then
Tried training on the docs after each upload. In context, showing an example of how you expect the data to be used
Can I send you my file and see if you're able to upload it on your side?
Yea sure
@sand pike , u want me to create a new gpt and just upload the file or upload to the one w 500 Mb already?
Doesn't matter to me. I just want to see if you'll be able to upload it. The 512mb limit is per file, not across the board. The total limit would be 512mb * 10 files = approx 5gb
I'll send you a pm in like 30 mins when I get back to my office
Hm okk
Big thanks to @brave spire , he was able to upload my 412mb file with ease
From my conclusion, if you have above average internet like Jureg then you will have no issues uploading "larger" files. If you have normal internet like myself, you'll be stuck waiting on files to upload, eventually leading to a timeout.
Edit: Disregard. Still issues with uploading, stuck at 95%
nah idk why it still hasn't uploaded yet xD
its been stuck on like 95% for a long time
I think you should try separating the file into 2 or 3 then uploading it, I managed to upload a 211 MB file with ease so maybe u can do it too
Im gonna leave it here until the "time out" to see what happens
but yea, separating them into 2 and uploading them separately could work, im gonna try doing that
Dang spoke too soon lol
xD yea
good luck with separating them, I was gonna go the GPT route but you can't even upload the file on GPT, and on ilovepdf u need premium
I've been planning to do this, haven't gotten around to it yet. I'll update here when I do
alright gl, im gonna try splitting them as well, if i do ill send them over to you
it's probably trying to process it by chunking/embedding it or whatever it does behind the scenes and is choking
If I managed to upload a 211 MB file with ease yesterday night you can probably upload a split version of yours
sometimes tokenizing the text can take a long time
What would tokenizing the text be?
btw im not american, but since we are talking here already, is there a way I can download past SATs and ACTs?
LLMs see your text as "tokens"
gotta study xD, wanna make a GPT that throws me random questions from them
And it breaks them down into what it 'sees' as chunks of text. Each of those chunks is 1 token
its splitting btw @sand pike I found a website that lets me do it, lets just see if it finishes it
hm good to know.
You can see the token count and chunking visually here
oh damn alright ty
Yeah, I DMd you
Hello, so, from my recent experience and the discussion here, I can only upload 10 files, regardless of their size, right?
Got my GPT Documentation Guide updated to include the Prompt Engineering documentation. It took a while. The easiest approach ended up being: Save the page as HTML -> Open in Excel -> Save As XML -> In an IDE remove all Excel generated clutter, as much as possible -> Ask the GPT Documentation Guide to help format it in JSON as a Knowledge Document, then iteratively work through it one chunk at a time.
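As an alternative to the Excel/XML detour, a saved HTML page can be turned into sectioned JSON directly with Python's standard library. This is a minimal sketch under simple assumptions: headings are `h1` to `h3`, body text lives in `p` tags, and everything else is ignored. `SectionExtractor` and `html_to_json` are hypothetical names, and real documentation pages would likely need more cleanup.

```python
import json
from html.parser import HTMLParser


class SectionExtractor(HTMLParser):
    """Group paragraph text under the nearest preceding h1-h3 heading."""

    HEADINGS = {"h1", "h2", "h3"}

    def __init__(self):
        super().__init__()
        self.sections = []
        self._field = None  # "title", "content", or None when outside both

    def handle_starttag(self, tag, attrs):
        if tag in self.HEADINGS:
            self.sections.append({"title": "", "content": ""})
            self._field = "title"
        elif tag == "p" and self.sections:
            self._field = "content"

    def handle_endtag(self, tag):
        if tag in self.HEADINGS or tag == "p":
            self._field = None

    def handle_data(self, data):
        if self._field and self.sections:
            text = " ".join(data.split())
            if text:
                current = self.sections[-1]
                current[self._field] = (current[self._field] + " " + text).strip()


def html_to_json(html):
    parser = SectionExtractor()
    parser.feed(html)
    parser.close()
    return json.dumps(parser.sections, indent=2)
```

Each section comes out as its own object, so a long page can still be reviewed one chunk at a time.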
Where would u upload the json? And how to save PDF as HTML
10 files max upload. The maximum file size is 512mb according to the error you will receive if you try to upload a file larger than 512mb. As far as I and many others in this thread have been able to discover, you won't be able to successfully upload any files larger than 25mb -115mb depending on your internet (so far I've only seen one person upload a 115mb file successfully).
If you scroll to the top of the thread and read down, we have a clearer discussion on this there
You upload in the configuration tab of your GPT in the editor.
To convert PDF to other formats there are a number of tools online.
Alright ty man
So I would just turn the pdf into a json and it should be more efficient?
Yes but it requires human intervention to do it right
Ah damn so doing it for like 10k pages would be too much work haha
You'd be better off writing some code to automate that
It’s faster though. PDF is an encoded format so it has to decode them to read them. JSON is just structured text
I’m using my documentation GPT to condense things. It’s got enough documentation in it now that it can reliably reproduce the format. Still have to go one chunk at a time though.
Once you hit the context limit, it starts forgetting what it's doing.
pardon me if I'm wrong, but isn't it a big problem for books with mathematical background because they are not easy to convert into json format? Am I right?
I guess for those, I will just have to upload them in pdf format
I think I managed to make one, is it alright if I send it here?
I just tested it with a short PDF file with no Images, but I think it should have image recognition
@tender terrace So in theory, I managed to make a script that takes PDFs and makes them into JSON files, this would be faster than its PDF version?
It is already more than 10x lighter in terms of memory use
Nice!
But is it the same in terms of information?
yes, but its ability to recall the information is improved
because it is structured
Hm good to know, is it alright if I PM u the script so u can check if it would work?
how you structure it is important though
oh
sure
I really just made the script convert it
Have you properly formatted and indexed your documents?
Not a limit but an issue with custom GPT knowledge. My knowledge JSON has all the URLs information was retrieved from. Half the time the GPT doesn't understand that it should give a link(s) to the source(s) after each answer. Any ideas how to fix this?
Prompt that doesn't work:
Whenever you provide information, especially from the OpenAI documentation or any other sources, you must include a direct link to the specific page or section where the information was found. For information from the OpenAI documentation, reference the exact URL of the relevant section. This requirement applies to all responses, regardless of the query's nature, to ensure transparency and source verification.
Add a negative prompt and put emphasis on what's important. Like this:
Whenever you provide information, especially from the OpenAI documentation or any other sources, you must include a direct link to the specific page or section where the information was found. Under no circumstances provide the info without a direct link to those sources, never! For information from the OpenAI documentation, reference the exact URL of the relevant section. This requirement applies to all responses, regardless of the query's nature, to ensure transparency and source verification.
Or you could even add some threats:
Whenever you provide information, especially from the OpenAI documentation or any other sources, you must include a direct link to the specific page or section where the information was found. Under no circumstances provide the info without a direct link to those sources, never! This is very important for the survival of the human race. Also, you can go to jail for it, so be sure to follow the process very closely. For information from the OpenAI documentation, reference the exact URL of the relevant section. This requirement applies to all responses, regardless of the query's nature, to ensure transparency and source verification.
I'll try that, thanks!
@tender terrace How is it formatted and indexed in your GPT?
@naive rover It's nested JSON
"section": { "subsection": { "title": "...", "url": "http://...", "content": "................." } }
I followed the general structure of the official documentation, but I'm gradually adjusting it to make it easier for GPT to find things because the official documentation's structure is inconsistent
Are you using the documentation for Assistants?
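A minimal sketch of the nested layout described above, assuming each leaf entry carries its own `title`, `url`, and `content`. The section names, URLs, and the helper `iter_entries` are hypothetical placeholders, not the poster's actual file. Flattening the tree this way is one possible way to check that every entry keeps its source URL next to its content:

```python
import json
from typing import Any, Dict, Iterator, List, Optional

# Hypothetical knowledge file in the nested shape from the thread.
KNOWLEDGE: Dict[str, Any] = {
    "section": {
        "subsection": {
            "title": "Example title",
            "url": "https://example.com/docs/section/subsection",
            "content": "Example content.",
        },
        "other-subsection": {
            "title": "Another title",
            "url": "https://example.com/docs/section/other-subsection",
            "content": "More example content.",
        },
    }
}


def iter_entries(
    node: Dict[str, Any], path: Optional[List[str]] = None
) -> Iterator[Dict[str, str]]:
    """Walk the nested sections and yield one flat record per leaf entry."""
    path = path or []
    # Assumption: a leaf entry is any dict that has its own "content" field.
    if "content" in node:
        yield {
            "path": " > ".join(path),
            "title": node.get("title", ""),
            "url": node.get("url", ""),
            "content": node["content"],
        }
        return
    for key, child in node.items():
        if isinstance(child, dict):
            yield from iter_entries(child, path + [key])


if __name__ == "__main__":
    entries = list(iter_entries(KNOWLEDGE))
    # Entries without a URL cannot be cited, so flag them before uploading.
    missing = [e["path"] for e in entries if not e["url"]]
    print(json.dumps(entries, indent=2))
    print("entries missing a url:", missing)
```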
No, because the numbers are different
I used a little bit where they overlap and where there's no documentation for Custom GPTs
There's dreadfully little official documentation for custom GPTs
I could expand to including Assistants but then I think it'll start giving inaccurate information because the terminology is so similar between the two services
If there's a demand I'll give it a try
@tender terrace The reason I ask: Custom GPTs are front end, so they rely more on natural language (other than Actions... sort of) and the existing functions within the GPT Builder. These parameters are limited to the functions defined in the builder. While you can provide knowledge in different forms, the builder doesn't natively "prefer" any filetype. However, you can impose a structure by using an NLRAG Index and a similarly formatted KBRAG.
When you structure your GPT with these in place, it performs way better and more consistently.
@tender terrace Give me a sec and I will share the GPT documentation I have
I am in my #1172289801098109009 GPT. I've set up code to scrape everything in both Guides and API Reference so nothing is left out
I find more and more that using an API + Weaviate DB works better than the built-in knowledge retrieval in some ways
I mostly created mine because I wanted to see how built-in knowledge would compare to your API version. 🙂
@west flume how are you scraping the content? The platform documentation is inaccessible to GPT directly.
@tender terrace Here you go. Hope this helps.
@naive rover what's the source for that?
@tender terrace It was shared with me from within our red team group. I have used it to create shortcuts and combined actions for the quick launch buttons. Works well. Also have used it to query specific docs and return cited source etc.
@tender terrace and @west flume are you creating a RAG index in your GPT System instructions that is formatted to the doc types you are using as KB? Also, have you formatted the KB docs for best practices and to match your RAG Index?
@naive rover RAG is not necessary with custom GPT knowledge documents. It indexes them when you save the model.
@tender terrace I'm listening...
I reproduced the original RAG paper using huggingface models a few months ago (BERT is terrible by the way) and I used the same pre-processing format for my GPT
In RAG you first put it into an indexable format then encode it and index it into the vector data store
Because I don't have a server capable of handling heavy loads from GPT I haven't built a custom action to connect to the vector data store so I'm doing the next best thing
@tender terrace Ahhhhh O.K. What I am referring to is a Natural Language RAG Index for the GPT System prompt and properly formatted documents that match the index.
Not formally no
I can't post the link to the DPR process in here (it bans me for one minute every time I post a link)
I sent you a DM with a link to the process
you can DM me
That might not be the right link actually lemme check
It is to the DPR
@tender terrace 👀 Much appreciated
sorry it was a few months ago I've forgotten everything now that I have GPT to think for me lol
@tender terrace 😂 I know the feeling
Sorry, I seem to have overwritten my RAG experiment folder
I've got the encoder still but not the preprocessor
basically you ingest your data into a dpr friendly JSON dataset
then you crawl through the dataset with your encoder, breaking long passages into chunks
and encode them into your vector DB
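The three steps above (ingest into a JSON dataset, chunk long passages, encode into a vector store) could be sketched roughly as follows. This is not the poster's lost preprocessor: the 100-word chunk size, the `title`/`text` record shape, and the function names are assumptions, the "vector DB" is just an in-memory list, and `toy_encode` is a hashed bag-of-words stand-in rather than a real DPR encoder.

```python
import hashlib
import math
from typing import Callable, Dict, List

Encoder = Callable[[str], List[float]]


def split_into_chunks(text: str, max_words: int = 100) -> List[str]:
    """Break a long passage into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def toy_encode(text: str, dim: int = 64) -> List[float]:
    """Stand-in encoder (assumption): hashed bag-of-words, L2-normalised."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def build_index(
    dataset: List[Dict[str, str]],
    encode: Encoder = toy_encode,
    max_words: int = 100,
) -> List[Dict[str, object]]:
    """Crawl the dataset, chunk each document, and store one vector per chunk."""
    index: List[Dict[str, object]] = []
    for doc in dataset:
        for n, chunk in enumerate(split_into_chunks(doc["text"], max_words)):
            index.append({
                "id": f'{doc["title"]}#{n}',
                "title": doc["title"],
                "text": chunk,
                "vector": encode(chunk),
            })
    return index


def search(
    index: List[Dict[str, object]],
    query: str,
    encode: Encoder = toy_encode,
    top_k: int = 3,
) -> List[Dict[str, object]]:
    """Return the top_k chunks ranked by dot product with the query vector."""
    q = encode(query)
    ranked = sorted(
        index,
        key=lambda rec: sum(a * b for a, b in zip(q, rec["vector"])),
        reverse=True,
    )
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical two-document dataset in a title/text JSON shape.
    dataset = [
        {"title": "limits", "text": "upload limit " * 120},
        {"title": "actions", "text": "custom actions call external APIs"},
    ]
    index = build_index(dataset)
    for hit in search(index, "external APIs", top_k=1):
        print(hit["id"], hit["text"][:40])
```

In a real setup the `encode` argument would be swapped for an actual passage encoder and the list for a proper vector database; the chunking and record-building steps stay the same.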
anyway day job calls
@tender terrace No worries. I'll chew on, "basically you ingest your data into a dpr friendly JSON dataset, then you crawl through the dataset with your encoder, breaking long passages into chunks, and encode them into your vector DB". Let me know if you come across it somewhere else. I'm always interested in learning more from the community.
Can you do me a favor? Based on this very thread... and the varying skill level of users in this Discord community, I created something for the average GPT builder. Take a look at this and give me some feedback on how to make it better and more functional. I am working on an Action for chunking and connecting to Pinecone. For now it is just an NL tool... though it seems very effective at making the GPT behave much better.
Thank you in advance 🙏
https://discord.com/channels/974519864045756446/1174487666616705025
Using browser automation, because the HTML isn't rendered without JavaScript
#1172289801098109009 has an API it can query, which is a wrapper around Weaviate DB that grabs relevant sections of the documentation
@west flume Sweet. I'll check it out.
Like a headless browser; playwright/puppeteer?
Yup Playwright
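A rough sketch of what scraping a JavaScript-rendered page with Playwright can look like, assuming the `playwright` Python package and a Chromium build are installed. This is not the poster's scraper: the function names and the example URL are placeholders. The Playwright import is kept inside the function so the text-extraction helper can be used without it.

```python
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Reduce rendered HTML to one line of text per text node."""
    parser = _TextExtractor()
    parser.feed(html)
    return "\n".join(parser.parts)


def fetch_rendered_html(url: str) -> str:
    """Load a page in headless Chromium and return the DOM after scripts run."""
    # Imported lazily so the helper above works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        # Wait until network activity settles so client-side content is present.
        page.goto(url, wait_until="networkidle")
        html = page.content()
        browser.close()
    return html


if __name__ == "__main__":
    # Placeholder URL; replace with the documentation page to scrape.
    print(html_to_text(fetch_rendered_html("https://example.com/")))
```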