#PopChoice: using an api for database instead?


grim torrent

I recently completed the embedding course as part of the AI engineer path. The practice project is to create an AI movie recommendation app. I was wondering how hard it would be to use an API for the movie database instead of the one provided. Some obvious issues are:

  • How to convert API data into a vector database on Supabase?
  • How do embeddings work at scale?

Anyone have any thoughts on this?

near temple

It would definitely be a great challenge.
I don't think this is impossible to do.
You would need to find a way to get a lot of entries from the API of your choice using some kind of filtering.
Once you have your dataset, it would be fairly easy to convert it into vectors.
If you are using LangChain, I don't think you need to worry about embeddings with large datasets. In fact, the bigger the dataset, the better, probably. As far as I know, the search doesn't return every entry, only the ones whose embeddings are closest to the embedding of your query.
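
To make the "only the closest entries" idea concrete, here is a minimal sketch in plain Python. It is not LangChain or Supabase code: the 3-number vectors and movie titles are made up for illustration, and `top_k` is a hypothetical helper. It scores every entry by cosine similarity to the query vector and keeps the best `k`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, entries, k=3):
    """Return the k (score, title) pairs most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), title) for title, vec in entries]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up vectors standing in for real embeddings, which have many more dimensions.
entries = [
    ("Space Adventure", [0.9, 0.1, 0.0]),
    ("Courtroom Drama", [0.0, 0.2, 0.9]),
    ("Alien Comedy", [0.7, 0.6, 0.1]),
    ("Romantic Comedy", [0.1, 0.9, 0.2]),
]
query = [1.0, 0.2, 0.0]  # pretend this is the embedded user query

results = top_k(query, entries, k=2)
for score, title in results:
    print(f"{title}: {score:.3f}")
```

This version compares the query against every entry, so the cost grows with the dataset. A vector database would typically use an index to avoid a full scan, but the result has the same shape: a short list of nearest entries, not the whole table.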

round hedge
# grim torrent I recently completed the embedding course as part of the AI engineer path. The p...

I'll assume you would recommend a movie based on its description, or a similar field that may semantically match the user's query?

In that case you'd fetch the data from the API, convert each description to a LangChain Document with the title etc. as metadata (embedding the title or rating wouldn't help much), then embed the Documents and save them in a VectorStore. When you have something to query against, use a Retriever to get the relevant data back from your VectorStore. This whole thing can be composed into a Chain.
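
The shape of that flow can be sketched in plain Python, with no LangChain or Supabase calls. Everything here is a stand-in: the record fields (`title`, `rating`, `overview`) are assumed, not taken from any particular movie API; `Document` and `InMemoryVectorStore` are hypothetical classes that only mimic the idea; and `toy_embed` is a word-count vector, which matches exact words only, where a real embedding model would capture meaning.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Document:
    """Text to embed, plus metadata that is stored but not embedded."""
    text: str
    metadata: dict = field(default_factory=dict)


def toy_embed(text):
    """Stand-in for an embedding model: a sparse word-count vector."""
    tokens = (word.strip(".,!?;:'\"") for word in text.lower().split())
    return Counter(token for token in tokens if token)


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryVectorStore:
    """Hypothetical store: keeps (vector, document) pairs in a list."""

    def __init__(self):
        self._rows = []

    def add(self, doc):
        self._rows.append((toy_embed(doc.text), doc))

    def search(self, query, k=1):
        query_vec = toy_embed(query)
        ranked = sorted(self._rows, key=lambda row: cosine(query_vec, row[0]), reverse=True)
        return [doc for _, doc in ranked[:k]]


def records_to_documents(records):
    """Embed the description only; keep title and rating as metadata."""
    return [
        Document(text=r["overview"], metadata={"title": r["title"], "rating": r["rating"]})
        for r in records
    ]


# Assumed shape of records already fetched from a movie API.
api_records = [
    {"title": "Star Voyage", "rating": 7.9,
     "overview": "A crew of astronauts travels through space to find a new planet."},
    {"title": "Final Verdict", "rating": 7.1,
     "overview": "A young lawyer defends an innocent man in a tense courtroom trial."},
]

store = InMemoryVectorStore()
for doc in records_to_documents(api_records):
    store.add(doc)

best = store.search("astronauts exploring space", k=1)[0]
print(best.metadata["title"], best.metadata["rating"])
```

In a real build, the embedding model, the Supabase vector table, and the retriever would each replace one of these stand-ins, but the order of steps stays the same: fetch, wrap, embed, store, query.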

I have abstracted away some of the underlying work, as I haven't used LangChain heavily in a while, but this approach should work.
