How to ger RAG working? | OpenRouter | Page 1

river sky Jul 24, 2024, 1:26 PM

#

Hello. I’m trying to get RAG to work. I use the llama3 405b as model with LibreChat as inference/chat GUI.

It doesn’t seem that openrouter API supports embeddings (which I think may be required for RAG to work?)

So if anyone got it working or know how to get it to work please tell me. I want to upload pdfs and have them summarized.

rocky dragon Jul 24, 2024, 2:04 PM

#

Can you attach a screenshot of the RAG UI? Can you use openai for embeddings?

river sky Jul 24, 2024, 2:05 PM

#

RAG UI? I just try to upload a pdf and it fails. I have added the RAG configuration to my .env file in LibreChat to use OpenRouter API and api key.

#

It seems there is an “integrations” tab under settings on the openrouter page which might help me get a non-OR api key. Is it something you can activate for me?

rocky dragon Jul 24, 2024, 3:30 PM

#

Integrations are primary for adding other LLM api keys to your openrouter account

#

so they would not help you get a non-OR api key. what's the exact error message? screenshots help a lot

vital socket Aug 4, 2024, 6:33 PM

#

He could use there OpenAI key for the RAG embedding, it is supported.

proven geode Aug 4, 2024, 6:56 PM

#

vital socket He could use there OpenAI key for the RAG embedding, it is supported.

OpenRouter is not a full proxy for other APIs like OpenAI, so only the advertised models and their modality are supported, no embeddings so far. Using the Integration feature basically only changes billing (and rate limits?), not the API or feature availability.

vital socket Aug 4, 2024, 7:21 PM

#

proven geode OpenRouter is not a full proxy for other APIs like OpenAI, so only the advertise...

I see. Some people thinks OR as an API for all LLM related models including embeddings. The app mentioned use an RAG API with pgvector and supports OpenAI, HF and Ollama.

proven geode Aug 4, 2024, 7:25 PM

#

vital socket I see. Some people thinks OR as an API for all LLM related models including embe...

You totally can create a RAG application using OpenRouter for the LLM part, if you use a different API for embeddings (or run your own local embeddings model, they do not need much compute, JavaScript implementations via transformers.js exist).

#

You can also use Ollama to run an embeddings model, of course -> https://ollama.com/library/nomic-embed-text

#

(you need to use different models for image similarity/embeddings)

#

You do not need PostgreSQL with pgvector, SQLite supports storing these vectors too, there is even a new sqlite-lembed extension, which makes generating embeddings for RAG apps nearly trivial (plus you can combine this with the very capable full text search inside SQLite)

Introducing sqlite-lembed: A SQLite extension for generating text e...

Generate text embeddings in SQL with GGUF models!

vital socket Aug 4, 2024, 7:37 PM

#

I'll read it. By the way I don't know if the current RAG model is here to stay. It's kind of inefficient and frustrating for a lot of requests.

https://x.com/emollick/status/1818416020161544368?t=P4Fjq5ZDN4liihNgYpBQWg&s=19

Ethan Mollick (@emollick) on X

A common cause of error when people use LLMs for serious work is that they don’t know what is in the context window. They assume when they upload a PDF the AI can read it. Instead, either it fails to parse or is only partially read, causing hallucinations

The UX should be better

proven geode Aug 4, 2024, 7:40 PM

#

vital socket I'll read it. By the way I don't know if the current RAG model is here to stay. ...

RAG is how most LLM apps tick, but it is hard to get right and you can easily shoot yourself in the foot with it. It most certainly more than "throw PDFs at it and then ask questions about them", and it will always be there in some form or another. It is just context management in its core.

vital socket Aug 4, 2024, 7:43 PM

#

Yes. If I would deploy RAG for clients I would think hard how to deploy some kind of long term resilient architecture, thinking how to not loose their data if I decide to change some part of the RAG architecture.

#How to ger RAG working?