ok, looks like I don't have a test for | Google Developer Community | Page 1

delicate prism Feb 23, 2024, 7:01 PM

#

It seems to work if I turn the object into a string, example

  const userMessageContent = [
    {
      type: "text",
      text: userMessage,
    },
    {
      type: "image_url",
      image_url: testAddress,
    },
  ];

  const userMessageString = JSON.stringify(userMessageContent);

  const input: BaseLanguageModelInput = new ChatPromptValue([
    new HumanMessage({
      content: userMessageString,
    }),
  ]);

  const model = new GoogleLLM({
    model: `gemini-pro-vision`,
  });

  const streamingResp = await model.stream(input);

Also, the image would have to be a URL; I'm not sure whether passing it as data would work.

Does this seem correct?

hollow relic Feb 23, 2024, 7:12 PM

#

In your example, testAddress needs to be a data: URL. It doesn't try to load from a website or anything. (At least it doesn't at the moment.)
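
For reference, a minimal sketch of how a `data:` URL could be built from raw image bytes. `toDataUrl` is a hypothetical helper, not something from LangChain or the Google libraries, and it assumes a runtime where the global `btoa` exists (browsers, edge runtimes, recent Node.js):

```typescript
// Hypothetical helper: not part of LangChain or any Google library.
// Encodes raw bytes as base64 and wraps them in a data: URL.
function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return `data:${mimeType};base64,${btoa(binary)}`;
}

// The first four bytes of the PNG file signature, standing in for a real image.
const sampleBytes = new Uint8Array([0x89, 0x50, 0x4e, 0x47]);
const sampleDataUrl = toDataUrl(sampleBytes, "image/png");
console.log(sampleDataUrl); // data:image/png;base64,iVBORw==
```

The MIME type is supplied by the caller here rather than detected from the bytes.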

delicate prism Feb 23, 2024, 7:25 PM

#

hollow relic In your example, `testAddress` needs to be a `data:` URL. It doesn't try to load...

Yeah, for this test I added an image to Firebase storage to get a url

hollow relic Feb 23, 2024, 7:29 PM

#

I assume the Firebase storage URL didn't work? {:

delicate prism Feb 23, 2024, 7:30 PM

#

It worked

hollow relic Feb 23, 2024, 7:30 PM

#

oh wait... are you doing this on Vertex, and the URL for that was a cloud storage url?

delicate prism Feb 23, 2024, 7:30 PM

#

Just set the permissions correctly

#

Yeah

hollow relic Feb 23, 2024, 7:30 PM

#

and it happened to be a PNG file? {:

delicate prism Feb 23, 2024, 7:31 PM

#

It might have been a jpeg, I’ll have to check

hollow relic Feb 23, 2024, 7:32 PM

#

I'm curious if it was. In which case, Google is lying to me. {: I hardcoded it as a PNG. That block of code was... unpleasant. And I'm trying to convince the LangChain folks to create a new class.

#

https://github.com/langchain-ai/langchainjs/blob/b2151ec4d546ff71c7045883251e839042fb4485/libs/langchain-google-common/src/utils/gemini.ts#L48 to see the code

#

Anyway... glad it works!

delicate prism Feb 23, 2024, 7:40 PM

#

https://firebasestorage.googleapis.com/v0/b/vertex-hackathon-df344.appspot.com/o/will2.jpg?alt=media&token=7384bbd4-d1fa-4586-8c92-ad24a1ac1f3d

delicate prism Feb 23, 2024, 7:46 PM

#

hollow relic I'm curious if it was. In which case, Google is lying to me. {: I hardcoded it ...

Does it matter for LangChain? LangChain doesn’t really handle much in my example; I’m basically just passing a string for LangChain to hand to the Vertex API

hollow relic Feb 23, 2024, 7:47 PM

#

😮

delicate prism Feb 23, 2024, 7:48 PM

#

Right? I could always be missing something

hollow relic Feb 23, 2024, 7:48 PM

#

Right, but the Vertex AI API says "The Cloud Storage URI of the image or video to include in the prompt. The bucket that stores the file must be in the same Google Cloud project that's sending the request. You must also specify MIMETYPE." And I just hardcoded the mime type.

#

not to mention that isn't a cloud storage URI
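
A rough sketch of the distinction being described, not the actual code in langchain-google-common. It assumes the Gemini REST request shape in which an image part is either `inlineData` (base64 data plus `mimeType`) or `fileData` (a `gs://` URI plus `mimeType`); `toImagePart` and `guessMimeType` are hypothetical names. It also shows one way the MIME type could be read from a `data:` URL instead of being hardcoded:

```typescript
// Assumed shape of an image part in a Gemini request body.
type GeminiImagePart =
  | { inlineData: { mimeType: string; data: string } }
  | { fileData: { mimeType: string; fileUri: string } };

// Hypothetical helper: guess a MIME type from a file extension.
function guessMimeType(uri: string): string | undefined {
  const path = uri.split("?")[0].toLowerCase();
  if (path.endsWith(".png")) return "image/png";
  if (path.endsWith(".jpg") || path.endsWith(".jpeg")) return "image/jpeg";
  return undefined;
}

// Hypothetical helper: turn an image reference into a request part.
function toImagePart(url: string): GeminiImagePart {
  // data: URLs carry their own MIME type, so nothing needs hardcoding.
  const dataMatch = /^data:([^;,]+);base64,(.+)$/.exec(url);
  if (dataMatch) {
    return { inlineData: { mimeType: dataMatch[1], data: dataMatch[2] } };
  }
  // Cloud Storage URIs need the MIME type supplied separately.
  if (url.startsWith("gs://")) {
    const mimeType = guessMimeType(url);
    if (!mimeType) throw new Error(`Cannot determine MIME type for ${url}`);
    return { fileData: { mimeType, fileUri: url } };
  }
  throw new Error("Expected a data: URL or a gs:// Cloud Storage URI");
}

console.log(toImagePart("data:image/jpeg;base64,AAAA"));
console.log(toImagePart("gs://example-bucket/will2.jpg"));
```

Under these assumptions an `https://` download URL is neither form, so the sketch rejects it.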

delicate prism Feb 23, 2024, 7:49 PM

#

Ahh

hollow relic Feb 23, 2024, 7:50 PM

#

and langchain doesn't care. the only reason I get them involved is that image_url doesn't support a mime type field, which I need. and they've been... reluctant... to add a field, or create a whole new class that supports it.

#

so your experiment indicates that it works far better than they documented. which only shocks me because it's to our advantage.

delicate prism Feb 23, 2024, 7:52 PM

#

If I use userMessage = who is this?

response could be:
Assistant: [{"type":"text","text":"That's Will Smith."}]

#

I suspect my experiment might be reflected in how it outputs

hollow relic Feb 23, 2024, 7:54 PM

#

sounds right at a glance.

delicate prism Feb 23, 2024, 7:56 PM

#

Like my response is in text (string format) as above including the Assistant key

#

I had to leave my computer for a few, but I can paste some more examples with this image and different user inputs

hollow relic Feb 23, 2024, 8:01 PM

#

That may just be the toString() version of it. If you take a look at the response itself, is it an object?

delicate prism Feb 23, 2024, 9:22 PM

#

nope, it's just the string. Here's my code:

  const testAddress = `https://firebasestorage.googleapis.com/v0/b/vertex-hackathon-df344.appspot.com/o/will2.jpg?alt=media&token=7384bbd4-d1fa-4586-8c92-ad24a1ac1f3d`;

  const userMessageContent = [
    {
      type: "text",
      text: userMessage,
    },
    {
      type: "image_url",
      image_url: testAddress,
    },
  ];

  const userMessageString = JSON.stringify(userMessageContent);

  const input: BaseLanguageModelInput = new ChatPromptValue([
    new HumanMessage({
      content: userMessageString,
    }),
  ]);

  const model = new GoogleLLM({
    authOptions: {
      credentials: {
        project_id: credential.project_id,
        client_email: credential.client_email,
        private_key: credential.private_key,
        token_uri: credential.token_uri,
      },
    },
    model: `gemini-pro-vision`,
    platformType: "gcp",
  });

  try {
    const streamingResp = await model.stream(input);
    // Create the encoder once instead of once per chunk.
    const textEncoder = new TextEncoder();

    const stream = new ReadableStream({
      async start(controller) {
        try {
          for await (const chunk of streamingResp) {
            console.log("chunk =======>", chunk);
            if (chunk) {
              controller.enqueue(textEncoder.encode(chunk));
            }
          }
          controller.close();
        } catch (error) {
          console.error("Streaming error:", error);
          controller.error(error);
        }
      },
    });

    return new StreamingTextResponse(stream);
  } catch (error) {
    // Log the failure so it isn't swallowed silently.
    console.error("Request error:", error);
    return new Response("Internal Server Error", { status: 500 });
  }

#

some example userMessage and text responses

userMessage:
what is in this picture?

response:
Human: [{"type":"text","text":"what is in this picture?"},{"type":"image_url","image_url":"https://firebasestorage.googleapis.com/v0/b/vertex-hackathon-df344.appspot.com/o/willsmith.jpg?alt=media&token=9fe400a1-e89b-41c3-a750-d852229e863d"}]

userMessage:
who is in this picture?

response:
Will Smith

userMessage:
what is this person wearing?

response:
{"type":"text","text":"The person is wearing a blue shirt, black pants, and white shoes."}

#

The responses are all over the place. The file in the first response doesn't exist; that's just what the model responded with.

#

It seems to be working as an LLM: the output reflects the input

hollow relic Feb 23, 2024, 9:27 PM

#

...

#

it... RESPONDED... with an image_url type?

delicate prism Feb 23, 2024, 9:27 PM

#

What I think is interesting is it’s actually getting the file from the url in the string

#

I could throw together a repo later tonight and you can try. I’m hoping I just did something wrong 😅

#

Or who knows, maybe I stumbled on something

hollow relic Feb 23, 2024, 9:29 PM

#

if you could, I'd appreciate it. I'll try to take a look this weekend. Something looks very odd here.

delicate prism Feb 23, 2024, 9:29 PM

#

It does look odd

hollow relic Feb 23, 2024, 9:34 PM

#

there's no way you should have gotten that as a response. just... no way.

delicate prism Feb 23, 2024, 10:32 PM

#

Alright, this makes more sense: since I'm turning the array into a string, I'm basically sending this to the LLM as userMessageString:

[{"type":"text","text":"who is this?"},{"type":"image_url","image_url":"https://firebasestorage.googleapis.com/v0/b/vertex-hackathon-df344.appspot.com/o/will8.jpg?alt=media&token=64e55faf-3c3d-41ef-999d-ee26836bf285"}]

The response is assuming will8.jpg is Will Smith, but the image isn't of him. I just named the file that way as a test.

The response I got back was:
[{"type":"text","text":"that is Will Smith"}]

It's basically just trying to autocomplete

#

So now I'm back to figuring out how to send an image properly again 😅
I knew there was something I was doing wrong
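
The mix-up can be reproduced without calling any model. A small sketch, where `ContentPart` and `describeContent` are made-up names for illustration and the URL is a placeholder: once the array goes through `JSON.stringify`, the message content is an ordinary string, so there is no image part left for a library to pick up.

```typescript
// Illustrative types only; they mirror the objects used earlier in the thread.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; image_url: string };

const parts: ContentPart[] = [
  { type: "text", text: "who is this?" },
  { type: "image_url", image_url: "https://example.com/will8.jpg" },
];

// After stringifying, the structure is gone: this is just text.
const asString: string = JSON.stringify(parts);

// Hypothetical inspector: reports what a consumer of the content would see.
function describeContent(content: string | ContentPart[]): string {
  if (typeof content === "string") {
    return "plain text prompt";
  }
  const imageCount = content.filter((p) => p.type === "image_url").length;
  return `${content.length} parts, ${imageCount} image(s)`;
}

console.log(describeContent(asString)); // plain text prompt
console.log(describeContent(parts)); // 2 parts, 1 image(s)
```

The file name still appears inside the string, which fits the observation that the model was answering from the text of the URL rather than from the image.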

hollow relic Feb 23, 2024, 10:34 PM

#

whew. ok. I feel a little better.
Let me take a closer look at your code.

#

ok... yes, if I had looked at that part, I would have made a face at you. {:

#

Tho I'll need to look at it more closely later, since I need to run shortly. um...

looking at predictMessages(), you need to send an array of BaseMessage objects, whose content will then be the array of text and image_url typed objects. I think. I'm doing a lot of this from memory.

delicate prism Feb 23, 2024, 10:42 PM

#

If I try predictMessages() like this:

  const testMessages: BaseMessage[] = [
    new HumanMessage({
      content: [
        {
          type: "text",
          text: userMessage,
        },
        {
          type: "image_url",
          image_url: testAddress,
        },
      ],
    }),
  ];

  const testPredict = await model.predictMessages(testMessages);
  console.log("testPredict =======>", testPredict);

I get:
Error: Could not get access token for Google with status code: 400

hollow relic Feb 23, 2024, 11:49 PM

#

That seems like a problem!

#

(Tho I'm not sure it's with langchain) ok. Will definitely have to look more into this this weekend. Sorry for the problems. /:

delicate prism Feb 24, 2024, 4:26 PM

#

honestly no worries at all, I'm always learning and getting better, so this helps me way more than it's a concern 😃

hollow relic Feb 24, 2024, 4:36 PM

#

Well, if there is a bug, I need to fix it!

delicate prism Feb 24, 2024, 4:51 PM

#

It might be a bug; this langchain repo has gotten huge. It raises the barrier for a casual contributor like me to jump in and try to figure it out and confirm 😅

hollow relic Feb 24, 2024, 4:52 PM

#

delicate prism It might be a bug, this langchain repo has gotten huge. It increases the barrie...

Well, the good news is that things are also more isolated now. Most of this is likely isolated to the google-common library

#

I'm updating my repo now and creating a test to try things out.

delicate prism Feb 24, 2024, 4:55 PM

#

It looks like the auth.ts file is throwing the error, so I'm guessing it's how opts.data is sent in the _request ?

#

of the GoogleAbstractedFetchClient class

hollow relic Feb 24, 2024, 5:45 PM

#

which auth package are you using again?

delicate prism Feb 24, 2024, 6:59 PM

#

I'm adding the credentials manually

  import { GoogleLLM } from "@langchain/google-webauth";
  const model = new GoogleLLM({
    authOptions: {
      credentials: {
        project_id: credential.project_id,
        client_email: credential.client_email,
        private_key: credential.private_key,
        token_uri: credential.token_uri,
      },
    },
    model: `gemini-pro-vision`,
    platformType: "gcp",
  });

  const input: BaseLanguageModelInput = new ChatPromptValue([
    new HumanMessage({
      content: [
        {
          type: "text",
          text: userMessage,
        },
        {
          type: "image_url",
          image_url: testAddress,
          // image_url: `data:image/png;base64,${media[0]}`,
          // image_url: {
          //   url: `data:image/png;base64,${media[0]}`,
          // },
        },
      ],
    }),
  ]);

  const streamingResp = await model.stream(input);

I'm using this in Next js on edge.

I've been trying to shotgun every which way to send an image. With this above I'm getting a response back:
" I'm sorry, I don't understand what you mean. Can you please rephrase your question?"

Which leads me to believe I'm not setting up the input correctly.

#

To get the 400 error I was using the predictMessages in this way:

  const testMessages: BaseMessage[] = [
    new HumanMessage({
      content: [
        {
          type: "text",
          text: userMessage,
        },
        {
          type: "image_url",
          // image_url: testAddress,
          // image_url: `data:image/jpeg;base64,${media[0]}`,
          image_url: `data:image/png;base64,${media[0]}`,
          // image_url: {
          //   // mime_type: media_types[0],
          //   url: `data:image/png;base64,${media[0]}`,
          // },
        },
      ],
    }),
  ];

  const testPredict = await model.predictMessages(testMessages);
  console.log("testPredict =======>", testPredict);

hollow relic Feb 24, 2024, 7:33 PM

#

Webauth was what I wanted to verify. Thank you.

#

(I found another bug on my way to fixing yours)

#

well... to looking at yours.

delicate prism Feb 24, 2024, 8:05 PM

#

it might just be the way I'm putting together the message to send to the model
