#Large JSON inputs for GPT-3.5


raw shore
#

🧵

prisma island
#

maybe just use the json as your prompt?

raw shore
#

It's larger than the token limit, roughly 2 years of transaction data.

prisma island
#

is it like a Q&A?

raw shore
#

Yeah, mostly

prisma island
#

or do you want to summarize it all at once?

raw shore
#

The exact use case: take 2 years of transactions as input, and answer questions about them

prisma island
#

oh a Q&A

#

you can use embeddings

#

take a look at this^

raw shore
#

Super useful, thanks!

#

so in this case, would our transactions JSON be a single embedding?

prisma island
#

no, you should cut it into as many embeddings as possible

#

but don't cut it into just a few words lol, aim for around 50 words per embedding

#

also you should label the small pieces with pre-set questions and embed the questions instead of the pieces, so the hit rate will be higher
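
A minimal sketch of the chunk-and-label idea above, assuming plain-text input. The helper names `chunk_words` and `label_chunks`, and the `questions_for` callable, are hypothetical and only illustrate the shape of the data. The embedding call itself is left out.

```python
def chunk_words(text, max_words=50):
    """Split text into consecutive pieces of at most max_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def label_chunks(chunks, questions_for):
    """Pair every chunk with the pre-set questions it should answer.

    questions_for is a hypothetical callable: chunk -> list of questions.
    Returns (question, chunk) pairs. The question is the text you would
    embed; the chunk is what you would pass to the model on a match.
    """
    return [(q, chunk) for chunk in chunks for q in questions_for(chunk)]
```

Each question then gets its own embedding, and a match on the question retrieves the chunk it was attached to.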

raw shore
#

okay, so I can take our transactions JSON, cut it into CSV-like data (one row per transaction), and create an embedding per transaction?

prisma island
#

yes
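
A rough sketch of the "one row per transaction" step, assuming the JSON is a flat array of transaction objects. The function name `transaction_rows` and the field names in the sample (`date`, `amount`, `merchant`) are made up for illustration; the real schema is not shown in this thread.

```python
import json


def transaction_rows(raw_json):
    """Turn a JSON array of transaction objects into one text line each."""
    transactions = json.loads(raw_json)
    rows = []
    for tx in transactions:
        # Sort keys so every row lists its fields in the same order.
        rows.append(", ".join(f"{key}: {tx[key]}" for key in sorted(tx)))
    return rows


# Hypothetical sample data, not taken from the thread.
sample = '[{"date": "2023-01-05", "amount": -42.5, "merchant": "Grocer"}]'
print(transaction_rows(sample)[0])
# amount: -42.5, date: 2023-01-05, merchant: Grocer
```

Each returned row is short enough to embed on its own, which is the per-transaction granularity discussed above.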

raw shore
#

gotcha

#

and I have to store these embeddings myself

#

or use some hosted version of OpenAI APIs

prisma island
#

the OpenAI API doesn't store any data for you

#

you need to store it yourself

#

or use a vector DB

#

there's a cookbook for that in the link I gave you
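
For the "store it yourself" option, a minimal in-memory sketch might look like the following. `InMemoryStore` is a hypothetical stand-in, not a real vector DB client, and it assumes the embedding vectors were already computed elsewhere. It ranks stored rows by cosine similarity to a query vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as matching nothing.
    return dot / (norm_a * norm_b)


class InMemoryStore:
    """Hypothetical stand-in for a vector DB: keeps (vector, text) pairs."""

    def __init__(self):
        self.items = []

    def add(self, vector, text):
        self.items.append((list(vector), text))

    def search(self, query_vector, k=3):
        """Return the k (score, text) pairs most similar to the query."""
        scored = [
            (cosine_similarity(query_vector, vec), text)
            for vec, text in self.items
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]
```

A linear scan like this is probably fine for a couple of years of transactions; a dedicated vector DB mostly matters once the collection grows well beyond that.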

raw shore
#

what if they're all roughly the same cosine similarity to the prompt?