#ok, looks like I don't have a test for


delicate prism
#

It seems to work if I turn the object into a string. For example:

  const userMessageContent = [
    {
      type: "text",
      text: userMessage,
    },
    {
      type: "image_url",
      image_url: testAddress,
    },
  ];

  const userMessageString = JSON.stringify(userMessageContent);

  const input: BaseLanguageModelInput = new ChatPromptValue([
    new HumanMessage({
      content: userMessageString,
    }),
  ]);

  const model = new GoogleLLM({
    model: `gemini-pro-vision`,
  });

  const streamingResp = await model.stream(input);

Also, the image would have to be a URL. I'm not sure whether passing it as data will work.

Does this seem correct?

hollow relic
#

In your example, testAddress needs to be a data: URL. It doesn't try to load from a website or anything. (At least it doesn't at the moment.)
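
For reference, a minimal sketch of turning raw image bytes into a `data:` URL like the one described above. The helper names `toBase64` and `toDataUrl` are hypothetical and not part of LangChain, and the caller is assumed to already know the MIME type.

```typescript
// Hypothetical helpers: encode raw image bytes as a base64 `data:` URL.
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    // Out-of-range reads yield undefined, which is treated as zero padding.
    const b0 = bytes[i] ?? 0;
    const b1 = bytes[i + 1] ?? 0;
    const b2 = bytes[i + 2] ?? 0;
    out += BASE64_ALPHABET.charAt(b0 >> 2);
    out += BASE64_ALPHABET.charAt(((b0 & 0x03) << 4) | (b1 >> 4));
    // Emit "=" padding when the final group has fewer than three bytes.
    out += i + 1 < bytes.length
      ? BASE64_ALPHABET.charAt(((b1 & 0x0f) << 2) | (b2 >> 6))
      : "=";
    out += i + 2 < bytes.length ? BASE64_ALPHABET.charAt(b2 & 0x3f) : "=";
  }
  return out;
}

function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  return `data:${mimeType};base64,${toBase64(bytes)}`;
}
```

In practice a runtime's built-in base64 routine would probably replace `toBase64`; it is spelled out here only so the sketch has no dependencies.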

hollow relic
#

I assume the Firebase storage URL didn't work? {:

delicate prism
#

It worked

hollow relic
#

oh wait... are you doing this on Vertex, and the URL for that was a cloud storage url?

delicate prism
#

Just set the permissions correctly

#

Yeah

hollow relic
#

and it happened to be a PNG file? {:

delicate prism
#

It might have been a jpeg, I'll have to check

hollow relic
#

I'm curious if it was. In which case, Google is lying to me. {: I hardcoded it as a PNG. That block of code was... unpleasant. And I'm trying to convince the LangChain folks to create a new class.
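
As an aside on the PNG-versus-JPEG question: one way to avoid hardcoding the type is to look at the file's leading bytes. This is only a sketch under the assumption that the raw bytes are available; `sniffImageMimeType` is a hypothetical name, and it covers just the two formats mentioned here.

```typescript
// Well-known file signatures ("magic bytes") for the two formats discussed.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
const JPEG_SIGNATURE = [0xff, 0xd8, 0xff];

function startsWithBytes(bytes: Uint8Array, signature: number[]): boolean {
  if (bytes.length < signature.length) return false;
  return signature.every((value, index) => bytes[index] === value);
}

// Hypothetical helper: returns undefined for anything it does not recognise.
function sniffImageMimeType(bytes: Uint8Array): string | undefined {
  if (startsWithBytes(bytes, PNG_SIGNATURE)) return "image/png";
  if (startsWithBytes(bytes, JPEG_SIGNATURE)) return "image/jpeg";
  return undefined;
}
```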

#

Anyway... glad it works!

delicate prism
#
https://firebasestorage.googleapis.com/v0/b/vertex-hackathon-df344.appspot.com/o/will2.jpg?alt=media&token=7384bbd4-d1fa-4586-8c92-ad24a1ac1f3d
hollow relic
#

😮

delicate prism
#

Right? I could always be missing something

hollow relic
#

Right, but the Vertex AI API says "The Cloud Storage URI of the image or video to include in the prompt. The bucket that stores the file must be in the same Google Cloud project that's sending the request. You must also specify MIMETYPE." And I just hardcoded the mime type.

#

not to mention that isn't a cloud storage URI
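
To make the MIME type requirement concrete, here is a rough sketch of how an image reference might be mapped onto a request part. The `inlineData` and `fileData` field names are assumed from the public Gemini REST API; this is not the code LangChain actually runs, and `toMediaPart` is a hypothetical helper.

```typescript
// Assumed part shapes, modelled on the public Gemini REST API.
type InlineDataPart = { inlineData: { mimeType: string; data: string } };
type FileDataPart = { fileData: { mimeType: string; fileUri: string } };
type MediaPart = InlineDataPart | FileDataPart;

function toMediaPart(url: string, mimeTypeForUri?: string): MediaPart {
  // data: URLs carry their own MIME type, so nothing needs to be hardcoded.
  const match = /^data:([^;,]+);base64,(.*)$/.exec(url);
  if (match) {
    const [, mimeType = "", data = ""] = match;
    return { inlineData: { mimeType, data } };
  }
  // Cloud Storage URIs do not, so the caller has to supply the type.
  if (url.startsWith("gs://")) {
    if (!mimeTypeForUri) {
      throw new Error("A MIME type is required for Cloud Storage URIs");
    }
    return { fileData: { mimeType: mimeTypeForUri, fileUri: url } };
  }
  throw new Error("Expected a data: URL or a gs:// URI");
}
```

An `https://` URL is rejected on purpose in this sketch, since the documentation quoted above only describes Cloud Storage URIs.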

delicate prism
#

Ahh

hollow relic
#

and langchain doesn't care. the only reason I get them involved is that image_url doesn't support a mime type field, which I need. and they've been... reluctant... to add a field, or create a whole new class that supports it.

#

so your experiment indicates that it works far better than they documented. which only shocks me because it's to our advantage.

delicate prism
#

If I use userMessage = who is this?

response could be:
Assistant: [{"type":"text","text":"That's Will Smith."}]

#

I suspect my experiment might be reflected in how it outputs

hollow relic
#

sounds right at a glance.

delicate prism
#

Like my response is in text (string format) as above including the Assistant key

#

I had to leave my computer for a few, but I can paste some more examples with this image and different user inputs

hollow relic
#

That may just be the toString() version of it. If you take a look at the response itself, is it an object?

delicate prism
#

nope, it's just the string. Here's my code:

  const testAddress = `https://firebasestorage.googleapis.com/v0/b/vertex-hackathon-df344.appspot.com/o/will2.jpg?alt=media&token=7384bbd4-d1fa-4586-8c92-ad24a1ac1f3d`;

  const userMessageContent = [
    {
      type: "text",
      text: userMessage,
    },
    {
      type: "image_url",
      image_url: testAddress,
    },
  ];

  const userMessageString = JSON.stringify(userMessageContent);

  const input: BaseLanguageModelInput = new ChatPromptValue([
    new HumanMessage({
      content: userMessageString,
    }),
  ]);

  const model = new GoogleLLM({
    authOptions: {
      credentials: {
        project_id: credential.project_id,
        client_email: credential.client_email,
        private_key: credential.private_key,
        token_uri: credential.token_uri,
      },
    },
    model: `gemini-pro-vision`,
    platformType: "gcp",
  });

  try {
    const streamingResp = await model.stream(input);

    const stream = new ReadableStream({
      async start(controller) {
        try {
          for await (const chunk of streamingResp) {
            console.log("chunk =======>", chunk);
            if (chunk) {
              const textEncoder = new TextEncoder();
              const encodedText = textEncoder.encode(chunk);
              controller.enqueue(encodedText);
            }
          }
          controller.close();
        } catch (error) {
          console.error("Streaming error:", error);
          controller.error(error);
        }
      },
    });

    return new StreamingTextResponse(stream);
  } catch (error) {
    return new Response("Internal Server Error", { status: 500 });
  }
#

some example userMessage and text responses

userMessage:
what is in this picture?

response:
Human: [{"type":"text","text":"what is in this picture?"},{"type":"image_url","image_url":"https://firebasestorage.googleapis.com/v0/b/vertex-hackathon-df344.appspot.com/o/willsmith.jpg?alt=media&token=9fe400a1-e89b-41c3-a750-d852229e863d"}]

userMessage:
who is in this picture?

response:
Will Smith

userMessage:
what is this person wearing?

response:
{"type":"text","text":"The person is wearing a blue shirt, black pants, and white shoes."}

#

The responses are all over the place. The file in the response doesn't exist, it's just what it responds as.

#

It seems to be working as an LLM: the output reflects the input

hollow relic
#

...

#

it... RESPONDED... with an image_url type?

delicate prism
#

What I think is interesting is it's actually getting the file from the url in the string

#

I could throw together a repo later tonight and you can try. I'm hoping I just did something wrong 😅

#

Or who knows, maybe I stumbled on something

hollow relic
#

if you could, I'd appreciate it. I'll try to take a look this weekend. Something looks very odd here.

delicate prism
#

It does look odd

hollow relic
#

there's no way you should have gotten that as a response. just... no way.

delicate prism
#

Alright, this makes more sense: Since I'm turning the array into a string I'm basically sending this to the llm for userMessageString

[{"type":"text","text":"who is this?"},{"type":"image_url","image_url":"https://firebasestorage.googleapis.com/v0/b/vertex-hackathon-df344.appspot.com/o/will8.jpg?alt=media&token=64e55faf-3c3d-41ef-999d-ee26836bf285"}]

The response is assuming will8.jpg is Will Smith, but the image isn't of him. I just named the file that way as a test.

The response I got back was:
[{"type":"text","text":"that is Will Smith"}]

It's basically just trying to autocomplete
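
A small sketch of why the stringified version behaves this way, using simplified stand-in types rather than LangChain's own: once the array goes through `JSON.stringify`, it is just text, and there is no image part left for anything downstream to pick up. `countImageParts` and the example URL are hypothetical.

```typescript
// Simplified stand-ins for the content shapes used in this thread.
type TextPart = { type: "text"; text: string };
type ImageUrlPart = { type: "image_url"; image_url: string };
type ContentPart = TextPart | ImageUrlPart;

const parts: ContentPart[] = [
  { type: "text", text: "who is this?" },
  { type: "image_url", image_url: "https://example.com/will8.jpg" },
];

// What the earlier code sent: a plain string that merely mentions a URL.
const asString = JSON.stringify(parts);

// Hypothetical check: how many image parts can actually be found?
function countImageParts(content: string | ContentPart[]): number {
  if (typeof content === "string") return 0; // plain text, nothing to attach
  return content.filter((part) => part.type === "image_url").length;
}
```

With these definitions, `countImageParts(parts)` sees one image while `countImageParts(asString)` sees none, even though the file name is still visible in the text for the model to guess from.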

#

So now I'm back to figuring out how to send an image properly again 😅
I knew there was something I was doing wrong

hollow relic
#

whew. ok. I feel a little better.
Let me take a closer look at your code.

#

ok... yes, if I had looked at that part, I would have made a face at you. {:

#

Tho I'll need to look at it more closely later, since I need to run shortly. um...

looking at predictMessages() you need to send an array of BaseMessage objects, which will then hold the array of text and image_url typed objects. I think. I'm doing a lot by memory.

delicate prism
#

If I try predictMessages() like this:

  const testMessages: BaseMessage[] = [
    new HumanMessage({
      content: [
        {
          type: "text",
          text: userMessage,
        },
        {
          type: "image_url",
          image_url: testAddress,
        },
      ],
    }),
  ];

  const testPredict = await model.predictMessages(testMessages);
  console.log("testPredict =======>", testPredict);

I get:
Error: Could not get access token for Google with status code: 400
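
The cause of this 400 isn't established anywhere in the thread. One thing that may be worth ruling out is the credential object itself, so here is a hedged sketch of a sanity check. It assumes `credential` comes from a service-account JSON, possibly via an environment variable where the private key's newlines were stored as literal `\n`; `checkCredentials` is a hypothetical helper.

```typescript
type ServiceAccountCredentials = {
  project_id: string;
  client_email: string;
  private_key: string;
  token_uri: string;
};

// Hypothetical helper: fail early on missing fields and undo escaped
// newlines in the private key (an assumption about how the key was stored).
function checkCredentials(
  raw: Partial<ServiceAccountCredentials>
): ServiceAccountCredentials {
  const { project_id, client_email, private_key, token_uri } = raw;
  if (!project_id || !client_email || !private_key || !token_uri) {
    throw new Error("Service account credentials are incomplete");
  }
  return {
    project_id,
    client_email,
    token_uri,
    private_key: private_key.replace(/\\n/g, "\n"),
  };
}
```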

hollow relic
#

That seems like a problem!

#

(Tho I'm not sure it's with langchain) ok. Will definitely have to look more into this this weekend. Sorry for the problems. /:

delicate prism
#

honestly no worries at all, I'm always learning and getting better so this helps me way more than is a concern 😃

hollow relic
#

Well, if there is a bug, I need to fix it!

delicate prism
#

It might be a bug, this langchain repo has gotten huge. It increases the barrier for a casual contributor like me to jump in and try to figure out and confirm 😅

hollow relic
#

I'm updating my repo now and creating a test to try things out.

delicate prism
#

It looks like the auth.ts file is throwing the error, so I'm guessing it's how opts.data is sent in the _request ?

#

of the GoogleAbstractedFetchClient class

hollow relic
#

which auth package are you using again?

delicate prism
#

I'm adding the credentials manually

  import { GoogleLLM } from "@langchain/google-webauth";
  const model = new GoogleLLM({
    authOptions: {
      credentials: {
        project_id: credential.project_id,
        client_email: credential.client_email,
        private_key: credential.private_key,
        token_uri: credential.token_uri,
      },
    },
    model: `gemini-pro-vision`,
    platformType: "gcp",
  });

  const input: BaseLanguageModelInput = new ChatPromptValue([
    new HumanMessage({
      content: [
        {
          type: "text",
          text: userMessage,
        },
        {
          type: "image_url",
          image_url: testAddress,
          // image_url: `data:image/png;base64,${media[0]}`,
          // image_url: {
          //   url: `data:image/png;base64,${media[0]}`,
          // },
        },
      ],
    }),
  ]);

  const streamingResp = await model.stream(input);

I'm using this in Next.js on edge.

I've been trying to shotgun every which way to send an image. With this above I'm getting a response back:
" I'm sorry, I don't understand what you mean. Can you please rephrase your question?"

Which leads me to believe I'm not setting up the input correctly.
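
Since the attempts above switch between a plain string and a `{ url }` object for `image_url`, a small normalising helper can make it easier to log exactly what is being sent. This is a sketch with hypothetical names, assuming only those two shapes occur.

```typescript
// The two image_url shapes tried in this thread.
type ImageUrlValue = string | { url: string };

// Hypothetical helper: reduce either shape to the underlying URL string.
function imageUrlToString(value: ImageUrlValue): string {
  return typeof value === "string" ? value : value.url;
}

// Hypothetical helper: true only for base64 data: URLs, false for web links.
function isBase64DataUrl(value: ImageUrlValue): boolean {
  return /^data:[^;,]+;base64,/.test(imageUrlToString(value));
}
```

Logging `isBase64DataUrl(part.image_url)` just before the call would show at a glance whether a web link or inline data is going out.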

#

To get the 400 error I was using the predictMessages in this way:

  const testMessages: BaseMessage[] = [
    new HumanMessage({
      content: [
        {
          type: "text",
          text: userMessage,
        },
        {
          type: "image_url",
          // image_url: testAddress,
          // image_url: `data:image/jpeg;base64,${media[0]}`,
          image_url: `data:image/png;base64,${media[0]}`,
          // image_url: {
          //   // mime_type: media_types[0],
          //   url: `data:image/png;base64,${media[0]}`,
          // },
        },
      ],
    }),
  ];

  const testPredict = await model.predictMessages(testMessages);
  console.log("testPredict =======>", testPredict);
hollow relic
#

Webauth was what I wanted to verify. Thank you.

#

(I found another bug on my way to fixing yours)

#

well... to looking at yours.

delicate prism
#

it might just be the way I'm putting together the message to send to the model