#Vector Data Type

brazen coral
#

Hi, I am curious how you'd implement a vector data type. I am trying to create vectors from some content, but I'm currently storing them as strings, which is not ideal. I'd like to store them as vectors and query them with pgvector.

brazen coral
#

Like a way to store vector embeddings

velvet wind
#

Understood.

Generally it seems possible. There is a "schedule a demo" option at
https://payloadcms.com/enterprise/ai-framework?

I would add a table or column that is not managed by Payload, maybe in
https://payloadcms.com/docs/database/postgres#beforeschemainit. Have you already tried any ideas? It would be great to have a solution.

Payload

With Payload, your content is automatically ready for any LLM or AI application.

Payload

Payload is a headless CMS and application framework built with TypeScript, Node.js, React and MongoDB

hoary mulch
#

You can have a vector column with beforeSchemaInit!
Import vector like this:
import { vector } from '@payloadcms/db-postgres/drizzle/pg-core'
Also, if you pass extensions: ['vector'] (the name pgvector registers its extension under), Payload will automatically create the extension for you, but you can omit it if you are certain that the extension is created in some other way in the db.

#

that being said, you won't have a native way to manage that field from the admin panel

brazen coral
#

Hm, doesn't seem super possible

#

My use case being this:

Whenever a document is changed, I'd like to create several document-chunks, where the document chunks have a vector embedding

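// Imports assumed for this snippet; `Document` is the generated collection type
// and `openai` is an OpenAI client instance created elsewhere.
import type { CollectionAfterChangeHook } from 'payload'
import { WebPDFLoader } from '@langchain/community/document_loaders/web/pdf'

const afterChangeHook: CollectionAfterChangeHook<Document> = async ({ doc, req }) => {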
  if (!doc.url) return

  await req.payload.delete({
    collection: 'document-chunk',
    where: {
      document: {
        equals: doc.id,
      },
    },
    req,
  })

  const response = await fetch('http://localhost:3000' + doc.url)
  const buffer = await response.arrayBuffer()
  const pdfLoader = new WebPDFLoader(new Blob([buffer]))

  const pdf = await pdfLoader.load()

  // Await every page so the hook does not return before the chunks are created.
  await Promise.all(
    pdf.map(async (page) => {
      const embedding = await openai.embeddings.create({
        model: 'text-embedding-3-small',
        input: [page.pageContent],
        encoding_format: 'float',
      })

      await req.payload.create({
        collection: 'document-chunk',
        data: {
          chunk: page.pageContent,
          embedding: embedding.data[0].embedding, // <--- Not possible as currently embedding is set as a string type
          document: doc.id,
        },
        req,
      })
    }),
  )
}
rapid sky
# brazen coral How would you insert them? Is there a way to let Payload know of the new types i...
import type { CollectionConfig } from 'payload'

export const Embeddings: CollectionConfig = {
  slug: 'embeddings',
  fields: [
    {
      name: 'model',
      type: 'text',
      required: true,
      index: true,
    },
    {
      name: 'vector',
      type: 'json',
      required: true,
    },
    // Rails styled polymorphic relationship
    {
      name: 'embeddableId',
      type: 'number',
      required: true,
      index: true,
    },
    {
      name: 'embeddableType',
      type: 'text',
      required: true,
      index: true,
    },
  ],
}

I save the embeddings returned by the Ollama API.
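
With the vector kept in a json field like this, pgvector operators cannot be applied to the column directly, so ranking would most likely happen in application code. A minimal sketch of that, assuming the rows have already been loaded; the StoredEmbedding shape and the cosineSimilarity and topK helpers are hypothetical names, not part of Payload:

```typescript
// Hypothetical shape of a row read back from the 'embeddings' collection.
interface StoredEmbedding {
  embeddableId: number
  embeddableType: string
  vector: number[]
}

// Cosine similarity between two equal-length vectors; returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same length')
  let dot = 0
  let normA = 0
  let normB = 0
  a.forEach((x, i) => {
    const y = b[i] ?? 0
    dot += x * y
    normA += x * x
    normB += y * y
  })
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Return the k stored rows most similar to the query vector, best match first.
function topK(query: number[], rows: StoredEmbedding[], k: number): StoredEmbedding[] {
  return rows
    .map((row) => ({ row, score: cosineSimilarity(query, row.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(({ row }) => row)
}
```

This scans every row on each query, so it only suits small collections; a real vector column with a pgvector index avoids that.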

velvet wind
# brazen coral This is great! I don't need a way to manage it from the admin panel, but could I...

@brazen coral The Local API uses the Payload config, but Drizzle uses payload-generated-schema, so fall back to Drizzle for this operation.

I haven't tried it, but here is the documentation about payload-generated-schema: https://payloadcms.com/docs/database/postgres#note-for-generated-schema
and here is the Drizzle section: https://payloadcms.com/docs/database/postgres#access-to-drizzle

BTW: I like the progress here.
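
Going below the Local API for the similarity search could look roughly like the following. This is a sketch using a raw parameterized query rather than Drizzle's query builder; the Queryable interface, toVectorLiteral, and nearestChunks are hypothetical names, the document_chunks table with id, chunk, and embedding columns is assumed from this thread, and db stands for anything with a node-postgres style query method (for example a pg Pool on the same database). The <=> operator is pgvector's cosine distance.

```typescript
// Minimal structural type for anything with a node-postgres style `query` method.
interface Queryable {
  query(text: string, values: unknown[]): Promise<{ rows: Record<string, unknown>[] }>
}

// pgvector accepts vectors as text literals such as '[0.1,0.2,0.3]'.
function toVectorLiteral(embedding: number[]): string {
  if (embedding.some((x) => !Number.isFinite(x))) {
    throw new Error('Embedding contains a non-finite number')
  }
  return `[${embedding.join(',')}]`
}

// Fetch the chunks closest to the query embedding by cosine distance.
// Table and column names are assumptions; adjust them to the generated schema.
async function nearestChunks(db: Queryable, embedding: number[], limit = 5) {
  const { rows } = await db.query(
    `SELECT id, chunk, embedding <=> $1::vector AS distance
       FROM document_chunks
      ORDER BY embedding <=> $1::vector
      LIMIT $2`,
    [toVectorLiteral(embedding), limit],
  )
  return rows
}
```

Passing the vector as a bound parameter and casting it with ::vector keeps the query text constant and avoids string-building SQL from the embedding values.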

brazen coral
#

beforeSchemaInit straight up doesn't work for me. It doesn't change anything in the database.

db: postgresAdapter({
  pool: {
    connectionString: process.env.DATABASE_URI || '',
  },
  extensions: ['vector'],
  beforeSchemaInit: [
    ({ schema, adapter }) => {
      return {
        ...schema,
        tables: {
          ...schema.tables,
          document_chunks: {
            ...schema.tables.document_chunks,
            embedding: vector('embedding', { dimensions: 1536 }),
          },
        },
      }
    },
  ],
}),
brazen coral
#

Had to use afterSchemaInit, things are working now
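
The working configuration was not posted. A minimal sketch of what it might look like, assuming the extendTable helper that Payload's Postgres documentation describes for afterSchemaInit, and reusing the document_chunks table name and 1536 dimensions from the snippet above. The likely reason the earlier attempt had no effect is that beforeSchemaInit runs before Payload has built its own tables, so there is nothing to extend at that point.

```typescript
import { postgresAdapter } from '@payloadcms/db-postgres'
import { vector } from '@payloadcms/db-postgres/drizzle/pg-core'

// Assumption: this value is passed as `db` in buildConfig, as in the snippet above.
export const db = postgresAdapter({
  pool: {
    connectionString: process.env.DATABASE_URI || '',
  },
  // Asks Payload to create the pgvector extension (registered as "vector").
  extensions: ['vector'],
  afterSchemaInit: [
    ({ schema, extendTable }) => {
      // Add a column that Payload's field config does not know about.
      extendTable({
        table: schema.tables.document_chunks,
        columns: {
          embedding: vector('embedding', { dimensions: 1536 }),
        },
      })
      return schema
    },
  ],
})
```

As noted earlier in the thread, a column added this way is not manageable from the admin panel, so reads and writes would go through Drizzle or SQL.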

hearty dragon
#

I do it like this but I'm getting errors. What am I missing?