#Custom GPT Knowledge Limit?


sand pike
#

Hi all, hope everyone is having fun exploring creating custom GPTs. Has anyone else experienced the "Error Saving" issue when uploading more than 10 documents for the GPT to reference as knowledge?

dry thunder
sand pike
#

Aware of that. But I am getting error saving when I try to upload my 11th+ document.

sand pike
#

Refresh, still same issue. I suppose I'll try again in a few hours

dry thunder
#

I already get problems when I'm uploading fewer files, like it does not use the info

#

or I do something wrong

sand pike
#

Yea I have about 2000 files to upload, so getting error saving after uploading only 10 is concerning

dry thunder
#

I think it won't work because it takes like hours to check 5 docs

sand pike
#

A GPT to help teachers, students, and parents better understand AI and its presence in today's world. AI is coming to the classroom but won't be adopted unless teachers, students, and parents trust and understand AI

dry thunder
sand pike
#

I've got 10 uploaded now and it seems to take about less than 30 seconds to give a response based on the uploaded knowledge. I'm going to try again in a few hours to upload PDF #11, I'll update here when I do so

sand pike
#

Screenshot for future reference.

dry thunder
barren fractal
#

How big are the pdfs?

halcyon cobalt
#

It won't let you upload more than 10 documents, period. They just don't make that clear

sand pike
#

Ah what a nice UX lol. Thanks for commenting. Assumed this but was hoping it was just a bug

Here's to hoping they come out with a higher tiered plan to upload much much more

ionic hornet
#

Is there a file size limit ?

sand pike
#

I'd imagine there must be to an extent. But you make a good point, I'll combine a bunch of my PDFs together and see

sand pike
#

512MB is the max

unborn swan
#

Is there documentation for this stuff somewhere? I am struggling to get any custom GPTs to really work. It seems very hit or miss on whether it sources from the documents I am uploading

#

I feel like sometimes my uploaded documents are not really getting processed, as the GPT isn't aware of anything from that document. Other documents seem to have worked

barren fractal
unborn swan
#

I was hoping I would be able to make custom GPTs that would be experts on, like, specific big manuals...but it doesn't seem viable since it just doesn't reliably seem to source from the document, even for very simple prompts that are explicitly answered in the knowledge document

keen lantern
#

OpenAI's implementation isn't that great. If coding isn't out of the question you could probably just make your own vector database and either use the old API to inject context or give the AI a function to do searches
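
For reference, a minimal sketch of the "make your own retrieval and inject context" idea. It uses a toy bag-of-words similarity in place of a real embedding model and vector database, and `embed`, `top_chunks`, and `build_prompt` are hypothetical names, so treat it as an illustration of the flow rather than a replacement for the built-in knowledge feature.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words count vector (stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_chunks(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(query, chunks, k=2):
    """Inject the retrieved chunks into the prompt as context."""
    context = "\n---\n".join(top_chunks(query, chunks, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical manual excerpts used only for the demo.
manual = [
    "To reset the device, hold the power button for ten seconds.",
    "The warranty covers parts and labor for two years.",
    "Clean the filter monthly with warm water.",
]
print(build_prompt("How do I reset the device?", manual, k=1))
```

In a real setup the similarity step would be a vector database query and the prompt would be sent to the chat API; only the shape of the flow carries over.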

#

hardly any point in uploading files when you can probably just fit everything in the prompt with these constraints

unborn swan
#

yeah I guess I was assuming it would actually be fine-tuning on the knowledge docs somehow but it seems to be doing something more shallow

#

if it's not really deeply utilizing the knowledge docs then the GPTs feel 99% less useful than I was expecting

keen lantern
#

the whole thing feels rushed. devs should probably stick to the "old" (current) way of doing things

barren fractal
#

Try adding something to the instructions like:
"This GPT has access to the manual for the product in its knowledge base. Whenever the user asks a question, the GPT searches its knowledge base for answers."

When the GPT scans its knowledge, you see a pop-up that says "searching knowledge base". As far as I can tell, that's the only time the knowledge comes into play

unborn swan
#

in fact it might be LESS useful than custom instructions alone since with custom instructions it actually reliably uses that info

#

right...my main issue is that it does not reliably do the "searching knowledge base" thing

umbral fox
#

works great

unborn swan
#

hmm, I will try that

umbral fox
#

highlight how important that is to you, and it will always do that

unborn swan
#

that does seem to get it to search...but it keeps claiming the info doesn't exist in its documentation even though it's explicitly there in one of the uploaded docs

#

this seems really janky

#

I swear it's just not actually processing the uploaded docs sometimes

umbral fox
unborn swan
#

plain text. I gave up on trying to use PDFs and just started using plain text docs

umbral fox
#

It works best with text, images are prone to such errors

#

hm weird

unborn swan
#

with other docs it seems to work IF it says "searching". Some docs it acts as if it can't see them

barren fractal
#

It's probably just hallucinating. If you add some encouragement to your prompt or tell it to look harder it might just find it

safe kelp
#

From what I understand, the number of uploaded knowledge files cannot exceed 10. Additionally, is there a size limit for each file?

umbral fox
#

the files can be 10GB at max or 2 million tokens, whichever comes first

unborn swan
#

anyone else seeing their files be duplicated in the knowledge file section? I had 6 files uploaded. Now I see a duplicate of each file for a total of 12

#

I wonder if that is why I am having issues, because now I am over the 10 file limit

#

I have a feeling this is why my custom GPT is broken now. I added that 6th doc, so when it got duplicated that means I now have 12, which puts me over the 10 file limit

safe kelp
unborn swan
#

some files are actually duped more than once

sand pike
umbral fox
#

How many files can I upload at once?

The limit is up to 10 files at once. Keep in mind there are file size restrictions and usage caps per user/org.

What are those file size restrictions?

The size of text, document, and spreadsheet files is capped at 2M tokens per file, with a hard limit of 512MB per file.
For images, there's a limit of 20MB per image.
Additionally, there are usage caps:
Each end-user is capped at 10GB.
Each organization is capped at 100GB.
Note: An error will be displayed if a user/org cap has been hit.
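
The per-file limits quoted above can be turned into a quick pre-upload check. This is a rough sketch: the 4-characters-per-token ratio is a common rule of thumb and not the real tokenizer, reading "512MB" as binary megabytes is an assumption, and `check_upload` is a hypothetical helper.

```python
import math

MAX_FILE_BYTES = 512 * 1024 ** 2  # "512MB per file"; binary megabytes assumed
MAX_FILE_TOKENS = 2_000_000       # "2M tokens per file" from the text above
CHARS_PER_TOKEN = 4               # rough heuristic, not the real tokenizer


def estimate_tokens(text):
    """Very rough token estimate from character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def check_upload(size_bytes, text):
    """Return a list of likely problems for one knowledge file."""
    problems = []
    if size_bytes > MAX_FILE_BYTES:
        problems.append("file exceeds 512MB")
    if estimate_tokens(text) > MAX_FILE_TOKENS:
        problems.append("file likely exceeds 2M tokens")
    return problems


print(check_upload(1024, "hello world"))
```

A text-heavy file can hit the token cap long before it hits the size cap, which may explain why some uploads fail at sizes far below 512MB.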

sand pike
#

Are you on enterprise?

umbral fox
#

No

sand pike
#

Oh look at that

#

Wait never mind, still can't upload past 12 files

#

Token limit seems to be the only thing I guess

umbral fox
#

you maybe just can't upload a specific file, if it exceeds the token limit

unborn swan
#

this is massively frustrating. It just flat out does not reliably use information from the uploaded knowledge documents. Even when it says it's "searching". I'm giving up on this. I hope it's just glitching because of their roll-out and load issues

sand pike
#

When I originally made this post I couldn't upload beyond 10 files, I tried multiple files of varying sizes to no avail. After reading your comment I went back and tried to upload more (the same I was attempting to earlier) and was able to upload them successfully. But I was only able to upload two more files before getting the same 'Error Saving Draft' message.
The reference image I included earlier shows the error occurring at 10 files. Now we can see it happening when I try to upload #13

umbral fox
sand pike
umbral fox
sand pike
#

Yea that's what I'm thinking too, would love to see them come out with a less limited version for a higher price

unborn swan
#

but custom instructions actually work, it will use those instructions reliably. For the custom GPTs it simply does not reliably use the information in the documents. Even if they are plain text. At least not for me.

umbral fox
#

nvm custom models are $2-$3 million lol

sand pike
#

Well worth it for the right businesses though

unborn swan
#

it just seems like they oversold this in their announcements and marketing

#

I am hoping its just glitching for me. I'll try again another time

sand pike
#

Honestly I feel it's more the media hyping and people in general hyping things up. I really can't recall the last chatGPT ad I saw

unborn swan
#

well in their presentation they made it seem as if it would reliably source from the data in the uploaded documents. That is not what I am seeing at all. It could just be technical issues I guess.

#

I have a feeling it would work better for smaller files. But if you upload like a big 50k word manual, it doesn't seem to reliably retrieve info. Sometimes it works, sometimes it doesn't

turbid gulch
#

so if you have very strange fiction in a file I think it might use it more reliably

#

as it can answer the thread easily

toxic nexus
# unborn swan but custom instructions actually work, it will use those instructions reliably. ...

When you upload documents, it doesn't actually look at them or scan them. It only uses them as a reference. So you ask it about the document and it thinks of a key word to search, finds pages with that term and checks to see if it's relevant. If you want it to actually know everything in the document and understand it all at once with context, you would have to paste the text into the chat.
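
A small sketch of the keyword-lookup behavior described above, and of one way it can miss answers. This is only a guess at the mechanism, not OpenAI's actual implementation; `build_index` and `find_pages` are hypothetical names.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of page numbers it appears on."""
    index = defaultdict(set)
    for page_no, text in enumerate(pages, start=1):
        for word in set(re.findall(r"[a-z0-9]+", text.lower())):
            index[word].add(page_no)
    return index


def find_pages(index, keyword):
    """Return the sorted page numbers that contain the exact keyword."""
    return sorted(index.get(keyword.lower(), set()))


# Hypothetical three-page manual.
pages = [
    "Safety notes and unpacking.",
    "Apply 12 Nm of torque to each bolt.",
    "Troubleshooting and warranty.",
]
index = build_index(pages)
print(find_pages(index, "torque"))      # page 2 contains the term
print(find_pages(index, "tightening"))  # no page uses this exact word
```

If the model picks a search term the document never uses, a lookup like this returns nothing, which would look like "the info doesn't exist" even though the answer is on page 2.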

#

What's the point of having 128k tokens if it can't read the data you upload automatically?

west flume
tawdry steppe
#

For me, I could add more: first the 10 docs allowed in the GPT Builder phase, then I added more by attaching them in the text box once the GPT had been published. First I asked: how many documents could you add to your knowledge base?
GPT response:

There isn't a specific limit to the number of documents that can be added to my knowledge base. The capacity to manage and utilize the content from multiple documents effectively depends on the complexity and size of each document. I can handle a considerable number of documents simultaneously, ensuring comprehensive analysis and retrieval of relevant information as needed. If you have more documents you wish to add, feel free to upload them, and I will incorporate their content into my existing knowledge base to assist you better.
User: there you go! (attached more than 50 docs in several batches). Then I asked about them and it seems it processed them and added them to the KB

flint summit
#

I tried a workaround that works to ensure all knowledge is top of mind for the GPT, but it made my hair grey. I posted it as text in the chat instead and prefixed everything as important, or explained why it was important in the context

winter vector
#

I can't find anything official about this. Why don't they clearly communicate how much knowledge we can upload? What is your current status? Total memory and number of files? @sand pike

sand pike
#

As of now they say the max file size you can upload is 512mb. I've been trying to upload a file sized 412mb and it takes over 10 minutes, only for it to not complete uploading in the end. I haven't gotten around to breaking down my file even further to see how it handles 250-300mb

#

To overcome the limit on the number of files, I've merged a bunch of my PDFs together into one, hence the large file size. Still have to do more testing to see how well it works in terms of reading through a 2000 page pdf

calm prairie
#

also seems like uploading files with Firefox gives some issues, very strange

winter vector
calm prairie
#

ok that's why I can't upload anything now

#

1 user uploaded 500mb nice

sand pike
#

Lol, yea they definitely have some work to do with the knowledge upload part

#

The point of the custom gpt is to have a chatbot trained on whatever data you want it to fine tuned to. Difficult to make a usable chatbot (that’s anything more than what you can do via uploading files to chat) with the limited ability to upload “large” data

calm prairie
#

i think it's what they have on enterprise 😛

#

And it's so awesome

west plover
plucky remnant
sand pike
#

A bit late to the party @plucky remnant

plucky remnant
#

Yep

dry thunder
winter vector
west plover
round dawn
#

Has anyone tested if photos are processed? I.e., does it follow the theme/styling of photos in knowledge?

low plank
# sand pike 512MB is the max

ChatGPT itself reports that you can upload 10 files of 100MB each. If you have more than 10 files, you need to join them all to fit in 10 files, and each one no more than 100MB. 👍

sand pike
low plank
sand pike
low plank
# sand pike Yea you’re wrong and chatgpt lied to you

That's better than what I said. It looks like there's a misunderstanding. I told you that the maximum size of EACH file should be 100MB * 10 = 1TB total size. On the message you pointed out, it says 10 * 512MB size = 5TB ( Wow ) 🙏 Better than I expected, since I knew the maximum was 100MB. See ... 100MB per file is already a ton of texts.

sand pike
#

I hate to break it to you, but 512mb * 10 = 5.12gb

low plank
sand pike
#

Alright I’m gonna step out of this conversation because this is like talking to a wall. Have a goodnight

low plank
thorny yacht
#

Across all files its 500mb max, but with a 2m token limit.

low plank
#

I joined all texts from my previous conversations almost daily on ChatGPT in one PDF file. Incredibly, the file is only 1.2MB size. Now, when I begin a new conversation, I include it, just to give context, and to continue from where we stopped. ❤️

thorny yacht
#

So it works?

#

I'm unsure if OAI visualizes anything seen in PDF files, for example images.

thorny yacht
low plank
thorny yacht
#

But how are tokens included, really? I don't fully understand it. What do we have to worry about with tokens?

low plank
thorny yacht
#

Thanks!

low plank
#

Here ChatGPT explains better:

Let's discuss the relationship between the uploaded text files (used as a knowledge base) and the use of tokens in our conversation in English.

  1. Text Files as Knowledge Base: You can upload up to 10 text files, each up to 100MB, which I can use as a knowledge base. These files can contain detailed information, background, specific topics, or any data you want me to reference during our conversation.

  2. Token Usage in Conversation: Tokens are units of text, like words or parts of words, that I use to understand and generate responses. In the GPT-4 model, there's a limit of approximately 128 thousand tokens for each conversation session.

  3. How They Relate: The tokens are used to process both your queries and my responses in our conversation. When I access the uploaded files to find information relevant to your queries, the text I read from these files also counts towards the token limit. However, the size of the files themselves (in MB) does not directly impact the token limit; it's the amount of text processed from these files that counts towards the token usage.

  4. Practical Implication: If our conversation involves complex queries that require me to reference large sections of the uploaded files, it can consume a significant number of tokens, thereby reducing the amount of conversation we can have before reaching the token limit.

In summary, while the uploaded files provide a rich source of information for our conversation, the amount of text I process from these files does impact the overall token usage in our session. Therefore, managing the scope and depth of queries can help optimize token usage, especially in lengthy discussions.
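
Taking the figures in that explanation at face value, the budget arithmetic looks roughly like this. The 128k limit comes from the quoted text; the token counts in the example are made up, and `remaining_budget` is a hypothetical helper.

```python
CONTEXT_LIMIT = 128_000  # approximate per-conversation limit quoted above


def remaining_budget(conversation_tokens, retrieved_chunk_tokens):
    """Tokens left after the chat so far plus any text read from files."""
    used = conversation_tokens + sum(retrieved_chunk_tokens)
    return max(CONTEXT_LIMIT - used, 0)


# Example: 3,000 tokens of chat plus two retrieved passages (made-up sizes).
print(remaining_budget(3_000, [20_000, 15_000]))
```

The file size in MB never enters the calculation; only the text actually pulled into the conversation does.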

thorny yacht
#

Okay so nothing matters realistically other than 100mb per file, weird because all of my files add up to 142mb

#

I think I might know the issue 😆

low plank
thorny yacht
#

All files are less than 100mb, adding up to 142mb

sand pike
#

Tokens are simply in reference to the amount of text to be processed (reading images is not yet a capability). If you try to upload a file larger than 512mb, you will get an error message saying the file size limit is 512mb. But, you will not be able to successfully upload any files larger than 25mb (as far as I and the others on this thread have been able to achieve).

supple walrus
#

Unless you use my GPT README.bot :p

calm prairie
supple walrus
#

Let me know if it breaks…I’ve had to reupload files for it to remember stuff twice now.

calm prairie
#

Cool work

#

how many files you uploaded

pastel skiff
#

Can it read images in the knowledge files? With vision?

west flume
#

It also links the relevant section of the documentation when it answers you

calm prairie
#

How on earth you found that out

#

let me try it

#

my custom GPT struggles with a 200kb .txt
but can receive a 10mb pdf, that's weird

west flume
#

And it's more flexible IMO

silent kiln
#

Perhaps a dumb question on my part.. but do you mean you can simply upload a vector database (file), and it will know how to include that knowledge?

west flume
west plover
#

and this is why i use txt files and convert them to Parsed single string .json files. way easier and more Token Efficient for GPT

#

In a json String it can search instead of reading the entire thing and wasting tokens.
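
A minimal sketch of the "txt to single-string JSON" conversion mentioned above. The field names `title` and `content` are assumptions, and whether this is really more token-efficient for the GPT is the poster's claim rather than something this snippet demonstrates.

```python
import json
import re


def text_to_json_string(text, title):
    """Collapse all whitespace runs and wrap the text in a small JSON object."""
    collapsed = re.sub(r"\s+", " ", text).strip()
    return json.dumps({"title": title, "content": collapsed}, ensure_ascii=False)


raw = "Line one\n\n  Line two\t end "
print(text_to_json_string(raw, "notes"))
```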

orchid hound
#

If you need a bigger knowledge base you'll need to host your own API to do RAG on and do function calls. The RAG built into agents has some limitations. As said, you can use Weaviate or, if you want to go cheap, Redis in a RunPod with a Flask (or FastAPI) app for vector-based search. Essentially you'll need to use chatwithmydocs, but only the API and document upload features. Agents can make API calls.
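
To make the "host your own API" idea concrete, here is a bare-bones search endpoint using only the Python standard library instead of Flask or FastAPI. The `/search?q=` route, the `DOCS` contents, and the substring scoring are all assumptions for illustration; a real setup would query a vector store such as Weaviate.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical document store; a real one would live in a vector database.
DOCS = {
    "billing": "Refunds are issued within 14 days of a refund request.",
    "setup": "Install the app and sign in.",
}


def search(query, docs=DOCS, limit=3):
    """Score documents by how often the query terms occur in them."""
    terms = query.lower().split()
    scored = []
    for doc_id, text in docs.items():
        score = sum(text.lower().count(term) for term in terms)
        if score:
            scored.append({"id": doc_id, "score": score, "text": text})
    scored.sort(key=lambda r: r["score"], reverse=True)
    return scored[:limit]


class SearchHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        parsed = urlparse(self.path)
        if parsed.path != "/search":
            self.send_error(404)
            return
        query = parse_qs(parsed.query).get("q", [""])[0]
        body = json.dumps({"results": search(query)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Start the server; call this manually, it blocks until interrupted."""
    HTTPServer(("127.0.0.1", port), SearchHandler).serve_forever()
```

A custom GPT action would then call an endpoint like this and receive only the matching passages, instead of relying on the built-in file search.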

narrow gust
barren fractal
barren fractal
#

Ah, I think I see how the knowledge base works now. Small files that are attached seem to be included in the context window while large files need to be searched. The file names are always included in the context window as well.

For example, if I upload a file called "password.txt" and ask the GPT for the password, it reads it out immediately. If I also upload a large file called "thesaurus.pdf" and ask the GPT for a synonym of a word, it takes time to search through the entire pdf before answering.

But I'm still noticing that my basic thesaurus GPT sometimes decides not to search through its thesaurus pdf even when it should be (I know because it tells me things that disagree with my pdf). Adding the line "answer using thesaurus.pdf" to the instructions fixes this.

tired wagon
tender terrace
#

It's ten files, 8,192 tokens each. That's 81,920 tokens total. The model can only keep 8,192 tokens in its context window at a time so it can't actually reference the entire set at once. It's best if you prepare the documents in JSON or CSV and condense/summarize any filler text as part of preprocessing to get the most out of it.

supple walrus
supple walrus
supple walrus
supple walrus
brave spire
#

@sand pike someone gave me a tip where you could merge all the PDFs into 1, or just merge a couple, and ChatGPT can merge them for you. That's what I did with my medical one and it's working well

tender terrace
#

Difference is I'm hand scraping the old fashioned way with copy paste and building the JSON data structure

#

I'm also curating the sources to only include the most helpful articles so it doesn't get bogged down by irrelevant data

supple walrus
#

Yeah same, I was referring to @west flume building his own vector store instead of just using the baked in one for GPT's.

supple walrus
tender terrace
#

No custom instructions beyond telling it what its purpose is. When using well structured JSON it figures out the flow on its own.

#

It's very similar to writing a RESTful API

#

but with a huge body of data

#

Hard part is converting so much plain text into JSON without losing document flow

#

Time to spin up a "Document to JSON GPT". This is ridiculously laborious.

west flume
#

Never used Weaviate before so it's a good experience

west flume
tender terrace
#

Hooowwww!?

west flume
#

I used Playwright in Node.js

#

The logic gets a bit complicated in some places but it's doable

tender terrace
#

interesting. I'd love to see the implementation.

#

Want to trade? I'll help you with your GPT building if you'll share your scraper with me. 😄

west flume
#

Like you need to get it to expand all the properties of the params in the API References first

#

Haha maybe in the future after I clean it up a bit

tender terrace
#

it's waaaaay beyond its context window.

west flume
#

My data is additionally chunked by sections

supple walrus
supple walrus
tender terrace
#

Is it? I haven't read it. >.<

west flume
umbral fox
west flume
#

I'm sure they wouldn't mind it in this case

#

All towards helping developers spend more money on OpenAI APIs

supple walrus
#

I thought I read it somewhere in their policies, but I cant find it now, so maybe Im remembering wrong?

umbral fox
#

I see why scraping ChatGPT wouldn't be allowed but why should they not allow scraping of their main website? That load can't be too high, right?

west flume
#

It's probably cached too

west flume
#

In my GPT it just updates the Weaviate DB with the new stuff

#

I have a cronjob that runs every 6 hours that checks if the documentation changed and updates the database if so
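
One way such a periodic "did the docs change?" check could work is to compare content hashes between runs. This is a sketch under assumptions: the scraping step is left out, `changed_pages` and the JSON state file are hypothetical, and the demo pages are made up.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def content_hash(text):
    """Stable fingerprint of a page's text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def changed_pages(pages, state_file):
    """Return URLs whose content differs from the previous run, then save state."""
    state_path = Path(state_file)
    old = json.loads(state_path.read_text()) if state_path.exists() else {}
    new = {url: content_hash(text) for url, text in pages.items()}
    changed = [url for url, digest in new.items() if old.get(url) != digest]
    state_path.write_text(json.dumps(new))
    return changed


# Demo with two simulated runs; only changed pages would be re-indexed.
with tempfile.TemporaryDirectory() as tmp:
    state = Path(tmp) / "hashes.json"
    first = changed_pages({"/docs/a": "v1", "/docs/b": "v1"}, state)
    second = changed_pages({"/docs/a": "v1", "/docs/b": "v2"}, state)
print(first, second)
```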

#

There's also a tagging system that marks legacy and beta endpoints, and I plan to also automatically clean up the data some more before updating the DB in the future

#

I already do some data cleaning but it could be done better

#

It automates all the stuff you need to do like clicking on each code example, expanding all the properties of parameters, expanding all info boxes, etc. before/during scraping the page

tender terrace
#

Run failed
You exceeded your current quota, please check your plan and billing details.

Haha I guess I won't be using playground for this.

gritty lagoon
sand pike
brave spire
#

are u sure?

#

wait let me send a screenshot just a minute

sand pike
#

As of the last time I checked a few days ago, 100% sure

brave spire
#

these 3 are 3 of the ones there

sand pike
#

That's for all your files?

brave spire
#

no, thats for only 3 of them

#

wait let me check the KB for the others

sand pike
#

Let me try, I have a 412mb pdf

brave spire
#

Physio - 80K KBs
Nutrition - 40K KBs
Cardio - 60K KBs
Pediatric - 90K KBs
Anatomy (1) - 211K KBs
Muscular Anatomy - 20K KBs

#

and it works perfectly fine, the AI can search through them, it might be because I specified that, when the question is regarding X it will prioritize searching on X documents

#

I still haven't hit the 512 MB limit you said was the theoretical possibility though, im gonna try uploading another file that would make me hit the limit to see what happens.

sand pike
#

How long did Anatomy take to upload for you?

brave spire
#

I just uploaded a dupe of the physio doc, which would put me over the 512 MB limit, but nothing happend, it even saved fine

brave spire
sand pike
#

Weird, I've been here for the past 5 minutes. So far when I've tried to upload my 412mb pdf it will sit here for around 10 minutes then will fail. No error message, just will not upload and remove itself

brave spire
#

i see

sand pike
#

What's your internet service like? Average or above average?

brave spire
#

Im using LAN,

sand pike
#

That's gotta be my issue then

gritty lagoon
sand pike
#

Can I send you my file and see if you're able to upload it on your side?

brave spire
#

Yea sure

#

@sand pike , u want me to create a new gpt and just upload the file or upload to the one w 500 Mb already?

sand pike
#

Doesn't matter to me. I just want to see if you'll be able to upload it. The 512mb limit is per file, not across the board. The total limit would be 512mb * 10 files = approx 5gb
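
The arithmetic behind that total, spelled out. The only inputs are the 10-file and 512MB-per-file figures from this thread; the two unit conventions explain why both "5.12gb" and "approx 5gb" show up.

```python
FILES = 10
PER_FILE_MB = 512

total_mb = FILES * PER_FILE_MB     # 5120 MB in total
total_gb_decimal = total_mb / 1000 # 5.12 GB if 1 GB = 1000 MB
total_gb_binary = total_mb / 1024  # 5.0 GB if 1 GB = 1024 MB

print(total_mb, total_gb_decimal, total_gb_binary)
```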

#

I'll send you a pm in like 30 mins when I get back to my office

sand pike
#

Big thanks to @brave spire , he was able to upload my 412mb file with ease
My conclusion: if you have above average internet like Jureg then you will have no issues uploading "larger" files. If you have normal internet like myself, you'll be stuck waiting on files to upload, eventually leading to a timeout.

Edit: Disregard. Still issues with uploading, stuck at 95%

brave spire
#

nah idk why it still hasn't uploaded yet xD

#

its been stuck on like 95% for a long time

#

I think you should try separating the file into 2 or 3 then uploading it, I managed to upload a 211 MB file with ease so maybe you can do it too

#

Im gonna leave it here until the "time out" to see what happens

#

but yea, separating them into 2 and uploading them separately could work, im gonna try doing that

sand pike
#

Dang spoke too soon lol

brave spire
#

xD yea

#

good luck with separating them, I was gonna go the GPT route but you can't even upload the file on GPT, and on ilovepdf you need premium, sad

sand pike
brave spire
#

alright gl, im gonna try splitting them as well, if i do ill send them over to you

west flume
brave spire
#

If I managed to upload a 211 MB file with ease yesterday night you can probably upload a split version of yours

west flume
#

sometimes tokenizing the text can take a long time

brave spire
#

What would tokenizing the text be?

#

btw im not american, but since we are talking here already, is there a way I can download past SATs and ACTs?

west flume
#

LLMs see your text as "tokens"

brave spire
sand pike
#

And it breaks them down into what it 'sees' as chunks of text. Those chunks represent 1 token

brave spire
#

its splitting btw @sand pike I found a website that lets me do it, lets just see if it finishes it

sand pike
sand pike
brave spire
#

oh damn alright ty

tough vault
#

Hello, so, from my recent experience and the discussion here, I can only upload 10 files, regardless of their size, right?

tender terrace
#

Got my GPT Documentation Guide updated to include the Prompt Engineering documentation. It took a while. The easiest approach ended up being: Save the page as HTML -> Open in Excel -> Save As XML -> In an IDE remove all Excel generated clutter, as much as possible -> Ask the GPT Documentation Guide to help format it in JSON as a Knowledge Document, then iteratively work through it one chunk at a time.

brave spire
#

Where would u upload the json? And how to save PDF as HTML

sand pike
#

If you scroll to the top of the thread and read down, we have a clearer discussion on this there

tender terrace
brave spire
#

Alright ty man

#

So I would just turn the pdf into a json and it should be more efficient?

tender terrace
#

Yes but it requires human intervention to do it right

brave spire
#

Ah damn so doing it to like 10k pages would be too much work haha

west flume
#

You'd be better off writing some code to automate that

tender terrace
#

It’s faster though. PDF is an encoded format so it has to decode them to read them. JSON is just structured text

#

I’m using my documentation GPT to condense things. It’s got enough documentation in it now that it can reliably reproduce the format. Still have to go one chunk at a time though.

#

Once you hit the context limit, it starts forgetting what it's doing.

tough vault
#

pardon me if I'm wrong, but isn't it a big problem for books with mathematical background because they are not easy to convert into json format? Am I right?

#

I guess for those, I will just have to upload them in pdf format

brave spire
#

I just tested it with a short PDF file with no Images, but I think it should have image recognition

brave spire
#

@tender terrace So in theory, I managed to make a script that takes PDFs and makes them into JSON files, this would be faster than its PDF version?

#

It is already more than 10x lighter in terms of memory use

tender terrace
#

Nice!

brave spire
#

But is it the same in terms of information?

tender terrace
#

yes, but its ability to recall the information is improved

#

because it is structured

brave spire
#

Hm good to know, is it alright if I PM you the script so you can check if it would work?

tender terrace
#

how you structure it is important though

brave spire
#

oh

tender terrace
#

sure

brave spire
#

I really just made the script convert it

naive rover
tender terrace
#

Not a limit but an issue with custom GPT knowledge. My knowledge JSON has all the URLs information was retrieved from. Half the time the GPT doesn't understand that it should give a link(s) to the source(s) after each answer. Any ideas how to fix this?

#

Prompt that doesn't work:

Whenever you provide information, especially from the OpenAI documentation or any other sources, you must include a direct link to the specific page or section where the information was found. For information from the OpenAI documentation, reference the exact URL of the relevant section. This requirement applies to all responses, regardless of the query's nature, to ensure transparency and source verification.

umbral fox
# tender terrace Prompt that doesn't work: Whenever you provide information, especially from...

Add a negative prompt and put emphasis on what's important. Like this:

Whenever you provide information, especially from the OpenAI documentation or any other sources, you must include a direct link to the specific page or section where the information was found. Under no circumstances provide the info without a direct link to those sources, never! For information from the OpenAI documentation, reference the exact URL of the relevant section. This requirement applies to all responses, regardless of the query's nature, to ensure transparency and source verification.

Or you could even add some threats:

Whenever you provide information, especially from the OpenAI documentation or any other sources, you must include a direct link to the specific page or section where the information was found. Under no circumstances provide the info without a direct link to those sources, never! This is very important for the survival of the human race. Also, you can go to jail for it, so be sure to follow the process very closely. For information from the OpenAI documentation, reference the exact URL of the relevant section. This requirement applies to all responses, regardless of the query's nature, to ensure transparency and source verification.

tender terrace
#

I'll try that, thanks!

naive rover
#

@tender terrace How is it formatted and indexed in your GPT?

tender terrace
#

@naive rover It's nested JSON

#

"section": { "subsection": { "title": "...", "url": "http://...", "content": "................." } }

#

I followed the general structure of the official documentation but I'm gradually adjusting it to make it easier for GPT to find things because the official documentation's structure is inconsistent
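
A short sketch of walking a nested knowledge file shaped like the one above and listing each entry with its source URL. The section names, titles, and `example.com` URLs are placeholders, and `iter_entries` is a hypothetical helper.

```python
# Placeholder knowledge structure: section -> subsection -> entry.
knowledge = {
    "guides": {
        "models": {
            "title": "Models overview",
            "url": "https://example.com/docs/models",
            "content": "Placeholder text about models.",
        }
    },
    "api": {
        "files": {
            "title": "Files endpoint",
            "url": "https://example.com/docs/files",
            "content": "Placeholder text about files.",
        }
    },
}


def iter_entries(node, path=()):
    """Yield every leaf entry (a dict with title, url, content) with its path."""
    if not isinstance(node, dict):
        return
    if {"title", "url", "content"}.issubset(node):
        yield {"path": "/".join(path), **node}
        return
    for key, child in node.items():
        yield from iter_entries(child, path + (key,))


for entry in iter_entries(knowledge):
    print(entry["path"], "->", entry["url"])
```

Keeping `url` next to `content` in every leaf means any passage that gets retrieved carries its own source link.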

naive rover
tender terrace
#

No, because the numbers are different

#

I used a little bit where they overlap and where there's no documentation for Custom GPTs

#

There's dreadfully little official documentation for custom GPTs

#

I could expand to including Assistants but then I think it'll start giving inaccurate information because the terminology is so similar between the two services

#

If there's a demand I'll give it a try

naive rover
#

@tender terrace The reason I ask; Custom GPTs are front end, so they are relying more on natural language (other than Actions...sort of) and the existing Functions within the GPT Builder. These parameters are limited to the functions defined in the builder. While you can provide knowledge in different forms, the builder doesn't natively "prefer" any filetype however, you can invoke a structure by using a NLRAG Index and similarly formatted KBRAG.

When you structure your GPT with these in place, it performs waaay better and more consistently.

#

@tender terrace Give me a sec and I will share the GPT documentation I have

west flume
#

I find more and more that using an API + Weaviate DB works better than the built-in knowledge retrieval in some ways

tender terrace
#

@west flume how are you scraping the content? The platform documentation is inaccessible to GPT directly.

naive rover
tender terrace
#

@naive rover what's the source for that?

naive rover
#

@tender terrace It was shared with me from within our red team group. I have used it to create shortcuts and combined actions for the quick launch buttons. Works well. Also have used it to query specific docs and return cited source etc.

#

@tender terrace and @west flume are you creating a RAG index in your GPT System instructions that is formatted to the doc types you are using as KB? Also, have you formatted the KB docs for best practices and to match your RAG Index?

tender terrace
#

@naive rover RAG is not necessary with custom GPT knowledge documents. It indexes them when you save the model.

naive rover
#

@tender terrace I'm listening...

tender terrace
#

I reproduced the original RAG paper using huggingface models a few months ago (BERT is terrible by the way) and I used the same pre-processing format for my GPT

#

In RAG you first put it into an indexable format then encode it and index it into the vector data store

#

Because I don't have a server capable of handling heavy loads from GPT I haven't built a custom action to connect to the vector data store so I'm doing the next best thing

naive rover
#

@tender terrace Ahhhhh O.K. What I am referring to is a Natural Language RAG Index for the GPT System prompt and properly formatted documents that match the index.

tender terrace
#

Not formally no

#

I can't post the link to the DPR process in here (it bans me for one minute every time I post a link)

#

I sent you a DM with a link to the process

naive rover
#

you can DM me

tender terrace
#

That might not be the right link actually lemme check

naive rover
#

It is to the DPR

tender terrace
#

nope the DPR is the retriever part of RAG

#

I'm looking for the encoder

naive rover
#

@tender terrace 👀 Much appreciated

tender terrace
#

sorry it was a few months ago I've forgotten everything now that I have GPT to think for me lol

naive rover
#

@tender terrace 😂 I know the feeling

tender terrace
#

Sorry, I seem to have overwritten my RAG experiment folder

#

I've got the encoder still but not the preprocessor

#

basically you ingest your data into a dpr friendly JSON dataset

#

then you crawl through the dataset with your encoder, breaking long passages into chunks

#

and encode them into your vector DB
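
The chunking step in that outline can be sketched as below. Word-based chunks with overlap are an assumption (the original preprocessor was lost), the `id`/`title`/`text` record fields are illustrative, and the encoding into a vector DB is left out.

```python
def chunk_words(text, max_words=100, overlap=20):
    """Split text into overlapping chunks of at most max_words words."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def to_records(title, text, max_words=100, overlap=20):
    """Build one JSON-friendly record per chunk, ready to be encoded."""
    return [
        {"id": f"{title}-{i}", "title": title, "text": chunk}
        for i, chunk in enumerate(chunk_words(text, max_words, overlap))
    ]


# Demo: a synthetic 250-word passage.
sample = " ".join(f"w{i}" for i in range(250))
records = to_records("manual", sample)
print(len(records), records[1]["text"].split()[0])
```

Each record's `text` would then go through the encoder and into the vector store, with `id` and `title` kept as metadata.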

#

anyway day job calls

naive rover
#

@tender terrace No worries. I'll chew on, "basically you ingest your data into a dpr friendly JSON dataset
then you crawl through the dataset with your encoder, breaking long passages into chunks
and encode them into your vector DB". Let me know if you come across it somewhere else. I'm always interested in learning more from the community.

Can you do me a favor? Based on this very thread...and the varying skill level of users in this discord community, I created something for the average GPT builder. Take a look at this and give me some feedback on how to make it better and more functional. I am working on an Action for chunking and connecting to Pinecone. For now it is just a NL tool...though it seems very effective at making the GPT behave much better.

Thank you in advance 🙏

https://discord.com/channels/974519864045756446/1174487666616705025

west flume
west flume
naive rover
#

@west flume Sweet. I'll check it out.

naive rover
west flume