PopChoice: using an api for database instead? | Scrimba | Page 1

grim torrent Nov 27, 2023, 6:56 PM

I recently completed the embedding course as part of the AI engineer path. The practice project is to create an AI movie recommendation app. I was wondering how hard it would be to use an API for the movie database instead of the one provided. Some obvious questions are:

How to convert API data into a vector database on Supabase?
How do embeddings work at scale?

Anyone have any thoughts on this?

near temple Nov 27, 2023, 7:32 PM

It would definitely be a great challenge.
I don't think this is impossible to do.
You would need to find a way to get a lot of entries from the API of your choice using some kind of filtering.
Once you have your dataset, it would be fairly easy to convert it into vectors.
If you are using LangChain, I don't think you need to worry about embeddings with large datasets; in fact, the bigger the dataset, the better, probably. As far as I know, the similarity search only returns the entries closest to your query vector, not all of them.
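
To illustrate the "only the closest entries" point, here is a minimal sketch in plain Python rather than LangChain. The movie titles and the 3-number vectors are made up for the example (real embeddings have hundreds of dimensions or more), and `top_k` is a hypothetical helper name, not a library function.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, stored, k=3):
    """Return the titles of the k stored vectors closest to the query."""
    scored = [(cosine_similarity(query_vec, vec), title) for title, vec in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:k]]

# Made-up titles and vectors, purely for illustration.
movies = [
    ("Space Saga", [0.9, 0.1, 0.0]),
    ("Quiet Romance", [0.0, 0.2, 0.9]),
    ("Robot Uprising", [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]

print(top_k(query, movies, k=2))  # prints ['Space Saga', 'Robot Uprising']
```

However many rows are stored, only the `k` best matches come back, so a bigger dataset mostly gives the search more candidates to choose from.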

grim torrent Dec 1, 2023, 4:12 PM

near temple It would definitely be a great challenge. I don't think this is impossible to do...

Thank you!

round hedge Dec 1, 2023, 4:31 PM

grim torrent I recently completed the embedding course as part of the AI engineer path. The p...

I'll assume you would recommend the movie using the movie's description or such field which may have some semantic match with the user query?

In that case you'd scrape the API data and convert each description to a LangChain Document with the title etc. as metadata (embedding the title or rating wouldn't help much). Then embed the Document and save it in a VectorStore. When you have something to query against, use a Retriever to get the relevant data back from your VectorStore. This whole thing can be composed into a Chain.

I have abstracted away some of the underlying work as I haven't used LangChain heavily in a while, but this approach should work.
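
The steps above can be sketched roughly like this. Everything here is a stand-in, not the LangChain API: the `Document` dataclass only mirrors the content-plus-metadata idea, `toy_embed` uses word counts where a real embedding model would go, and `ToyVectorStore` is an in-memory list instead of Supabase. The record fields (`title`, `overview`, `rating`) are an assumption about what a movie API might return.

```python
import math
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Document:
    """Stand-in for a document: text to embed plus metadata that is not embedded."""
    page_content: str
    metadata: dict = field(default_factory=dict)

def to_document(record):
    # Only the description is embedded; title and rating ride along as metadata.
    return Document(
        page_content=record["overview"],
        metadata={"title": record["title"], "rating": record["rating"]},
    )

def toy_embed(text):
    # Assumption: word counts stand in for a real embedding model.
    return Counter(text.lower().split())

def similarity(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class ToyVectorStore:
    """In-memory stand-in for a vector store."""
    def __init__(self):
        self.rows = []

    def add(self, doc):
        self.rows.append((toy_embed(doc.page_content), doc))

    def retrieve(self, query, k=1):
        q = toy_embed(query)
        ranked = sorted(self.rows, key=lambda row: similarity(q, row[0]), reverse=True)
        return [doc for _, doc in ranked[:k]]

# Made-up records shaped like a hypothetical movie API response.
api_records = [
    {"title": "Sunken Hope", "overview": "a diver explores a sunken ship in the ocean", "rating": 7.1},
    {"title": "Night Heist", "overview": "a crew plans a bank robbery in the city", "rating": 6.8},
]

store = ToyVectorStore()
for record in api_records:
    store.add(to_document(record))

best = store.retrieve("a movie about the ocean and a ship", k=1)[0]
print(best.metadata["title"])  # prints Sunken Hope
```

In a real setup each piece would be swapped for the library's own class, but the data flow (record, document, embedding, store, retrieve) stays the same.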

round hedge Dec 1, 2023, 4:33 PM

grim torrent I recently completed the embedding course as part of the AI engineer path. The p...

I remember in one project the number of Documents was bound to grow large, and as the embedding count in the DB increased, the retrieval time would increase too. I remember using a metadata field as a filter for retrieval; in your case that could be the movie genre. The number of DB records to match against decreases if you can specify such a filter.
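
A rough sketch of that filter-then-search idea, in plain Python with made-up rows and 2-number vectors. `filtered_search` is a hypothetical helper, and the `genre` field is an assumption about what metadata you would store; a real vector store would apply the filter inside the database query.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def filtered_search(query_vec, rows, genre, k=2):
    """Keep only rows with the given genre, then rank those by similarity.

    Returns the top-k titles and how many rows were actually compared.
    """
    candidates = [row for row in rows if row["genre"] == genre]
    candidates.sort(key=lambda row: cosine_similarity(query_vec, row["vector"]), reverse=True)
    return [row["title"] for row in candidates[:k]], len(candidates)

# Made-up rows, purely for illustration.
rows = [
    {"title": "Laser Dawn", "genre": "sci-fi", "vector": [0.9, 0.1]},
    {"title": "Paper Hearts", "genre": "romance", "vector": [0.8, 0.2]},
    {"title": "Orbit Nine", "genre": "sci-fi", "vector": [0.2, 0.9]},
]

titles, compared = filtered_search([1.0, 0.0], rows, genre="sci-fi", k=1)
print(titles, compared)  # prints ['Laser Dawn'] 2
```

Only two of the three rows are scored here; with thousands of movies, a genre filter could cut the comparison set much further.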

grim torrent Dec 1, 2023, 4:33 PM

round hedge I'll assume you would recommend the movie using the movie's description or such ...

I'll assume you would recommend the movie using the movie's description or such field which may have some semantic match with the user query?

No, I use a vector database of movies. Currently there are only 34; the idea was to expand that database to 500+ vectors.
