#Using Chat GPT with an Offline Database
At first glance, no, because it can only analyze so much text at once. Davinci can only do up to 2048 tokens at a time including your prompt. A token roughly equates to 3-4 letters. This is far smaller than the amount of information you would typically store in a database.
But if you select a small portion of your database and then have OpenAI process that with a prompt, I could see it working.
You just can't expect this application to have all the knowledge of your database at once.
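A rough sketch of the "small portion" idea, assuming the roughly 4 characters per token rule of thumb mentioned above instead of a real tokenizer. The helper names `estimate_tokens` and `select_rows_for_prompt`, the 256-token reserve for the answer, and the sample rows are all made up for illustration.

```python
def estimate_tokens(text: str) -> int:
    """Very rough estimate: about 4 characters per token (rule of thumb only)."""
    return max(1, (len(text) + 3) // 4)


def select_rows_for_prompt(question, rows, max_tokens=2048, reserve_for_answer=256):
    """Add rows in order until the estimated token budget is used up."""
    budget = max_tokens - reserve_for_answer - estimate_tokens(question)
    selected = []
    for row in rows:
        cost = estimate_tokens(row)
        if cost > budget:
            break  # the next row would not fit, so stop here
        selected.append(row)
        budget -= cost
    return selected


# Hypothetical rows exported from a bookings table, one line of text per row.
rows = [f"booking {i}: user {i % 7}, 2022-05-{(i % 28) + 1:02d}" for i in range(1000)]
question = "How many bookings were made in May 2022?"

subset = select_rows_for_prompt(question, rows)
prompt = "\n".join(subset) + "\n\nQuestion: " + question + "\nAnswer:"
```

Only the rows that fit are sent, so the model never sees the rest of the table. Which rows to pick (most recent, most relevant, pre-aggregated) is the real design question, and this sketch does not answer it.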
yeah, you can download the database, it is about 475 terabytes, or 475,000 gigabytes
I think he was asking about using it to process information in his own offline db.
i really hope so
especially since the GPU power required to run GPT-3 with its full database would cost about 2,400,000 dollars, given the current cost of top-of-the-line NVIDIA GPUs for AI and the fact that it takes about 80 of them to handle a basic setup of the AI for offline use. that's not even including the power requirements of running it.
?! and you know this how?!
Any idea how that would work? Would I "fine-tune" based on data from my db, and how exactly?
Let's say I want to query my trained model like so: "how many bookings did my company XYZ have in may 2022?"
What would my fine-tune data (jsonl) look like?
why would you need chatgpt for this? you can run that sort of query directly in your database
because then non-developers could also easily query the most important data with natural language
"how did our bookings evolve over the past 12 months and how does that relate to the overall economic situation" - where in theory chatgpt would be able to relate our data with the outside world
(i know chatgpt is trained with data only up to 2021)
so many use cases i can not implement with sql
Here I am trying to create a training / fine-tuning set with the help of chatgpt, but the results when fine-tuning weren't great, so I may be missing something important. I am just trying to make chatgpt understand a single fact here. Later I can process data from my db row by row to produce more facts.
{"prompt": "in 2002 the company inaplay was founded", "completion": "The company Inaplay was established in 2002."}
{"prompt": "in 2002 the company inaplay was founded", "completion": "In 2002, Inaplay came into existence as a company."}
{"prompt": "in 2002 the company inaplay was founded", "completion": "The inception of Inaplay as a company occurred in 2002."}
{"prompt": "in 2002 the company inaplay was founded", "completion": "In 2002, Inaplay was founded and began operations."}
{"prompt": "in 2002 the company inaplay was founded", "completion": "The year 2002 saw the foundation of Inaplay as a company."}
{"prompt": "in 2002 the company inaplay was founded", "completion": "Inaplay was founded in 2002, beginning its journey as a company."}
{"prompt": "in 2002 the company inaplay was founded", "completion": "The company known as Inaplay had its start in 2002."}
{"prompt": "in 2002 the company inaplay was founded", "completion": "2002 marked the beginning of Inaplay as a company."}
{"prompt": "in 2002 the company inaplay was founded", "completion": "In 2002, the doors to Inaplay as a company were opened."}
{"prompt": "in 2002 the company inaplay was founded", "completion": "The founding of Inaplay as a company took place in 2002."}
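For reference, a small sketch of producing a prompt/completion JSONL file like this from code, one JSON object per line. The `facts` list and the `to_jsonl` helper are hypothetical, and phrasing the prompts as questions is only an assumption about how the model would later be queried, not something confirmed to give better fine-tuning results.

```python
import json

# Hypothetical facts pulled from a database, phrased as question/answer pairs.
facts = [
    {"question": "When was the company Inaplay founded?",
     "answer": "Inaplay was founded in 2002."},
    {"question": "In which year did Inaplay start?",
     "answer": "Inaplay started in 2002."},
]


def to_jsonl(records):
    """Serialize each record as one JSON object per line (JSONL)."""
    lines = [
        json.dumps({"prompt": r["question"], "completion": r["answer"]})
        for r in records
    ]
    return "\n".join(lines) + "\n"


jsonl_text = to_jsonl(facts)
```

Using `json.dumps` for every record takes care of quoting and escaping, which is easy to get wrong when writing the lines by hand.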
the input you can provide to chatgpt is limited. As you mentioned it's trained on a limited set of data (albeit quite large)
however you could use chatgpt to translate a natural language query to a sql query
so you would give chatgpt the structure of your database
ask it "how would i query the number of bookings in 2022"
and it would give you sql
which you can then run on your db
and provide an answer
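The flow described in the last few messages can be sketched like this, using SQLite so it runs locally. The schema, the prompt wording, and `fake_complete` (a stand-in that returns a canned query instead of calling a real model) are all assumptions for illustration; a real setup would replace `fake_complete` with an actual completion API call.

```python
import sqlite3

# Hypothetical schema; in practice this would be dumped from the real database.
SCHEMA = """
CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE bookings (id INTEGER PRIMARY KEY, user_id INTEGER, booked_on TEXT);
"""


def build_prompt(schema: str, question: str) -> str:
    """Put the table definitions in front of the question so the model can see them."""
    return (
        "Given this SQLite schema:\n"
        f"{schema.strip()}\n\n"
        f"Write one SQL query that answers: {question}\n"
        "SQL:"
    )


def fake_complete(prompt: str) -> str:
    """Stand-in for a real completion call; always returns the same canned query."""
    return "SELECT COUNT(*) FROM bookings WHERE booked_on LIKE '2022-%'"


def answer(conn, question, complete=fake_complete):
    sql = complete(build_prompt(SCHEMA, question)).strip()
    # Generated SQL should not be trusted blindly: only allow SELECT statements.
    if not sql.lower().startswith("select"):
        raise ValueError(f"refusing to run non-SELECT statement: {sql!r}")
    return conn.execute(sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO users VALUES (1, 'Hans')")
conn.executemany(
    "INSERT INTO bookings (user_id, booked_on) VALUES (?, ?)",
    [(1, "2022-05-03"), (1, "2022-05-17"), (1, "2021-12-24")],
)
result = answer(conn, "how many bookings were there in 2022?")
```

The database stays offline the whole time. Only the schema and the question go to the model, and only the returned SQL touches the data.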
these guys are doing what you're looking for https://rawquery.com/
wow, nice. did you try rawquery yourself?
nope
id be more interested in making something like rawquery
it's essentially removing the pain of SQL .. however it remains to be seen how reliable it is
currently there is no API for chatGPT, but you can do something similar with the GPT-3 text-davinci models
just pointing out the TOS, 2. Usage Requirements. (c) Restrictions. You may not... (iv) use any method to extract data from the Services, including web scraping, web harvesting, or web data extraction methods, other than as permitted through the API;
it would be fine to archive the outputs of GPT-3 API as it was made to be handled by code, but not from the chatGPT website
i mean.. just wait a bit, there will most likely be an API for chatGPT soon
oh, i actually meant the openai API, not chatgpt. sorry...
ohh, right, that makes more sense
if that is the case, go for it =P
archiving / caching the data is a good way to save money on the API usage
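A minimal sketch of that caching idea, assuming identical prompts should return the stored completion instead of triggering a new paid call. `CompletionCache` and `fake_api` are hypothetical names, and `fake_api` stands in for the real API so the example runs offline.

```python
import hashlib
import json
import sqlite3


class CompletionCache:
    """Stores completions in SQLite, keyed by a hash of model name and prompt."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, completion TEXT)"
        )

    @staticmethod
    def _key(model, prompt):
        payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get_or_call(self, model, prompt, call):
        key = self._key(model, prompt)
        row = self.conn.execute(
            "SELECT completion FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return row[0]  # cache hit: no API call needed
        completion = call(prompt)
        self.conn.execute("INSERT INTO cache VALUES (?, ?)", (key, completion))
        self.conn.commit()
        return completion


calls = []


def fake_api(prompt):
    """Stand-in for a paid API call; records how often it is invoked."""
    calls.append(prompt)
    return "SELECT * FROM bookings"


cache = CompletionCache()
first = cache.get_or_call("text-davinci-003", "query all bookings", fake_api)
second = cache.get_or_call("text-davinci-003", "query all bookings", fake_api)
```

Here the second lookup is served from the cache, so `fake_api` runs only once. This mostly makes sense when the same prompt is expected to give the same answer every time.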
any idea how exactly rawquery might teach gpt-3 the structure of a db?
```
Input: query all users
Output: SELECT * FROM users
Input: query all bookings
Output: SELECT * FROM bookings
Input: get all bookings of user 'Hans'
Output: SELECT bookings.* FROM bookings, users WHERE bookings.user_id = users.id AND users.name = 'Hans'
```
like this?
for anyone looking here: yes, that's exactly how it is done. you can paste it into the openai playground , add a new input and execute to get the resulting query that you can then send to the db. that's probably also all the magic of rawquery
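A sketch of building that kind of Input/Output prompt in code instead of pasting it into the playground. The example pairs are illustrative, and `build_few_shot_prompt` and `first_statement` are hypothetical helpers. Cutting the completion at the next `Input:` marker is an assumption about how a model might keep generating, not documented behavior.

```python
# Illustrative example pairs in the same Input/Output style as the thread.
EXAMPLES = [
    ("query all users", "SELECT * FROM users"),
    ("query all bookings", "SELECT * FROM bookings"),
]


def build_few_shot_prompt(examples, new_input):
    """Join the worked examples, then leave the last Output open for the model."""
    parts = [f"Input: {question}\nOutput: {sql}" for question, sql in examples]
    parts.append(f"Input: {new_input}\nOutput:")
    return "\n".join(parts)


def first_statement(completion: str) -> str:
    """Keep only the text before the next 'Input:' marker, if the model keeps going."""
    return completion.split("Input:", 1)[0].strip()


prompt = build_few_shot_prompt(EXAMPLES, "get all bookings of user 'Hans'")
```

The string in `prompt` is what gets sent to the completion endpoint. Whatever comes back is passed through `first_statement` before it goes anywhere near the database.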
https://youtu.be/5tmGKTNW8DQ taking the word of this video
Is it incorrect?