#ok, looks like I don't have a test for
It seems to work if I turn the object into a string, example
const userMessageContent = [
  {
    type: "text",
    text: userMessage,
  },
  {
    type: "image_url",
    image_url: testAddress,
  },
];
const userMessageString = JSON.stringify(userMessageContent);
const input: BaseLanguageModelInput = new ChatPromptValue([
  new HumanMessage({
    content: userMessageString,
  }),
]);
const model = new GoogleLLM({
  model: `gemini-pro-vision`,
});
const streamingResp = await model.stream(input);
Also the image would have to be a URL; I'm not sure if passing it as data will work.
Does this seem correct?
In your example, testAddress needs to be a data: URL. It doesn't try to load from a website or anything. (At least it doesn't at the moment.)
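For reference, a minimal sketch of what building a data: URL from raw image bytes could look like. The names `toBase64` and `toDataUrl` are hypothetical helpers for illustration, not part of LangChain, and the base64 encoder is hand-rolled only so the snippet does not depend on `Buffer` or `btoa` being available in a given runtime.

```typescript
// Standard base64 alphabet (RFC 4648).
const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Hypothetical helper: encode raw bytes as a base64 string.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const b0 = bytes[i];
    const b1 = i + 1 < bytes.length ? bytes[i + 1] : 0;
    const b2 = i + 2 < bytes.length ? bytes[i + 2] : 0;
    out += B64[b0 >> 2];
    out += B64[((b0 & 0x03) << 4) | (b1 >> 4)];
    // Pad with "=" when the final group has fewer than three bytes.
    out += i + 1 < bytes.length ? B64[((b1 & 0x0f) << 2) | (b2 >> 6)] : "=";
    out += i + 2 < bytes.length ? B64[b2 & 0x3f] : "=";
  }
  return out;
}

// Hypothetical helper: wrap the bytes in a data: URL carrying an explicit MIME type.
function toDataUrl(bytes: Uint8Array, mimeType: string): string {
  return `data:${mimeType};base64,${toBase64(bytes)}`;
}
```

The result of `toDataUrl(imageBytes, "image/jpeg")` would then be what goes into `image_url` in place of a web address.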
Yeah, for this test I added an image to Firebase storage to get a url
I assume the Firebase storage URL didn't work? {:
It worked
oh wait... are you doing this on Vertex, and the URL for that was a cloud storage url?
and it happened to be a PNG file? {:
It might have been a JPEG, I'll have to check
I'm curious if it was. In which case, Google is lying to me. {: I hardcoded it as a PNG. That block of code was... unpleasant. And I'm trying to convince the LangChain folks to create a new class.
https://github.com/langchain-ai/langchainjs/blob/b2151ec4d546ff71c7045883251e839042fb4485/libs/langchain-google-common/src/utils/gemini.ts#L48 to see the code
Anyway... glad it works!
https://firebasestorage.googleapis.com/v0/b/vertex-hackathon-df344.appspot.com/o/will2.jpg?alt=media&token=7384bbd4-d1fa-4586-8c92-ad24a1ac1f3d
Does it matter for LangChain? LangChain doesn't really handle much in my example, I'm basically just passing a string for LangChain to get to the Vertex API
😮
Right? I could always be missing something
Right, but the Vertex AI API says "The Cloud Storage URI of the image or video to include in the prompt. The bucket that stores the file must be in the same Google Cloud project that's sending the request. You must also specify MIMETYPE." And I just hardcoded the mime type.
not to mention that isn't a cloud storage URI
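As an aside on the hardcoded MIME type: a data: URL already carries its own type, so one option is to read it from there. This is only a sketch under that assumption; `parseDataUrl` and `InlineImage` are hypothetical names, and this is not what the linked gemini.ts does.

```typescript
// Hypothetical shape for an inline image: the MIME type plus the raw base64 payload.
interface InlineImage {
  mimeType: string;
  data: string; // base64 payload, without the "data:...;base64," prefix
}

// Hypothetical helper: split a base64 data: URL into its MIME type and payload.
// Returns undefined for anything else (for example an https:// or gs:// address).
function parseDataUrl(url: string): InlineImage | undefined {
  const match = /^data:([^;,]+);base64,(.+)$/.exec(url);
  if (!match) return undefined;
  return { mimeType: match[1], data: match[2] };
}

console.log(parseDataUrl("data:image/jpeg;base64,AAAA"));
console.log(parseDataUrl("https://example.com/will2.jpg"));
```

A plain web or Cloud Storage address has no such hint, which is why the API asks for the MIME type separately.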
Ahh
and langchain doesn't care. the only reason I get them involved is that image_url doesn't support a mime type field, which I need. and they've been... reluctant... to add a field, or create a whole new class that supports it.
so your experiment indicates that it works far better than they documented. which only shocks me because it's to our advantage.
If I use userMessage = who is this?
response could be:
Assistant: [{"type":"text","text":"That's Will Smith."}]
I suspect my experiment might be reflected in how it outputs
sounds right at a glance.
Like my response is in text (string format) as above including the Assistant key
I had to leave my computer for a few, but I can paste some more examples with this image and different user inputs
That may just be the toString() version of it. If you take a look at the response itself, is it an object?
nope, it's just the string. Here's my code:
const testAddress = `https://firebasestorage.googleapis.com/v0/b/vertex-hackathon-df344.appspot.com/o/will2.jpg?alt=media&token=7384bbd4-d1fa-4586-8c92-ad24a1ac1f3d`;
const userMessageContent = [
  {
    type: "text",
    text: userMessage,
  },
  {
    type: "image_url",
    image_url: testAddress,
  },
];
const userMessageString = JSON.stringify(userMessageContent);
const input: BaseLanguageModelInput = new ChatPromptValue([
  new HumanMessage({
    content: userMessageString,
  }),
]);
const model = new GoogleLLM({
  authOptions: {
    credentials: {
      project_id: credential.project_id,
      client_email: credential.client_email,
      private_key: credential.private_key,
      token_uri: credential.token_uri,
    },
  },
  model: `gemini-pro-vision`,
  platformType: "gcp",
});
try {
  const streamingResp = await model.stream(input);
  const stream = new ReadableStream({
    async start(controller) {
      try {
        for await (const chunk of streamingResp) {
          console.log("chunk =======>", chunk);
          if (chunk) {
            const textEncoder = new TextEncoder();
            const encodedText = textEncoder.encode(chunk);
            controller.enqueue(encodedText);
          }
        }
        controller.close();
      } catch (error) {
        console.error("Streaming error:", error);
        controller.error(error);
      }
    },
  });
  return new StreamingTextResponse(stream);
} catch (error) {
  return new Response("Internal Server Error", { status: 500 });
}
some example userMessage and text responses
userMessage:
what is in this picture?
response:
Human: [{"type":"text","text":"what is in this picture?"},{"type":"image_url","image_url":"https://firebasestorage.googleapis.com/v0/b/vertex-hackathon-df344.appspot.com/o/willsmith.jpg?alt=media&token=9fe400a1-e89b-41c3-a750-d852229e863d"}]
userMessage:
who is in this picture?
response:
Will Smith
userMessage:
what is this person wearing?
response:
{"type":"text","text":"The person is wearing a blue shirt, black pants, and white shoes."}
The responses are all over the place. The file in the response doesn't exist, it's just what it responds as.
It seems to be working as an LLM: the output reflects the input
What I think is interesting is it's actually getting the file from the url in the string
I could throw together a repo later tonight and you can try. I'm hoping I just did something wrong
Or who knows, maybe I stumbled on something
if you could, I'd appreciate it. I'll try to take a look this weekend. Something looks very odd here.
It does look odd
there's no way you should have gotten that as a response. just... no way.
Alright, this makes more sense: Since I'm turning the array into a string I'm basically sending this to the llm for userMessageString
[{"type":"text","text":"who is this?"},{"type":"image_url","image_url":"https://firebasestorage.googleapis.com/v0/b/vertex-hackathon-df344.appspot.com/o/will8.jpg?alt=media&token=64e55faf-3c3d-41ef-999d-ee26836bf285"}]
The response is assuming will8.jpg is Will Smith, but this image is not. The image was just labelled this way to test.
The response I got back was:
[{"type":"text","text":"that is Will Smith"}]
It's basically just trying to autocomplete
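To make the difference concrete, here is a small sketch of why the stringified version is treated as plain text. The `ContentPart` types and the `looksLikeImageInput` check are hypothetical and only loosely mirror the shapes used above; the example URL is a placeholder.

```typescript
// Hypothetical content part types, loosely mirroring the shapes used in this thread.
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image_url"; image_url: string };
type ContentPart = TextPart | ImagePart;

const parts: ContentPart[] = [
  { type: "text", text: "who is this?" },
  { type: "image_url", image_url: "https://example.com/will8.jpg" },
];

// Stringifying collapses the structure into one text prompt.
const asString: string = JSON.stringify(parts);

// Only structured content can carry an image part; in a string the URL is just characters.
const looksLikeImageInput = (content: string | ContentPart[]): boolean =>
  Array.isArray(content) && content.some((p) => p.type === "image_url");

console.log(looksLikeImageInput(asString)); // false
console.log(looksLikeImageInput(parts)); // true
```

With the string form, the file name in the URL is the only "image" information the model ever receives, which matches the will8.jpg result.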
So now I'm back to figuring out how to send an image properly again
I knew there was something I was doing wrong
whew. ok. I feel a little better.
Let me take a closer look at your code.
ok... yes, if I had looked at that part, I would have made a face at you. {:
Tho I'll need to look at it more closely later, since I need to run shortly. um...
looking at predictMessages() you need to send an array of BaseMessage objects, whose content will then be the array of text and image url typed objects. I think. I'm doing a lot by memory.
If I try predictMessages() like this:
const testMessages: BaseMessage[] = [
  new HumanMessage({
    content: [
      {
        type: "text",
        text: userMessage,
      },
      {
        type: "image_url",
        image_url: testAddress,
      },
    ],
  }),
];
const testPredict = await model.predictMessages(testMessages);
console.log("testPredict =======>", testPredict);
I get:
Error: Could not get access token for Google with status code: 400
That seems like a problem!
(Tho I'm not sure it's with langchain) ok. Will definitely have to look more into this this weekend. Sorry for the problems. /:
honestly no worries at all, I'm always learning and getting better so this helps me way more than is a concern
Well, if there is a bug, I need to fix it!
It might be a bug, this langchain repo has gotten huge. It increases the barrier for a casual contributor like me to jump in and try to figure out and confirm
Well, the good news is that things are also more isolated now. Most of this is likely isolated to the google-common library
I'm updating my repo now and creating a test to try things out.
It looks like the auth.ts file is throwing the error, so I'm guessing it's how opts.data is sent in the _request ?
of the GoogleAbstractedFetchClient class
which auth package are you using again?
I'm adding the credentials manually
import { GoogleLLM } from "@langchain/google-webauth";
const model = new GoogleLLM({
  authOptions: {
    credentials: {
      project_id: credential.project_id,
      client_email: credential.client_email,
      private_key: credential.private_key,
      token_uri: credential.token_uri,
    },
  },
  model: `gemini-pro-vision`,
  platformType: "gcp",
});
const input: BaseLanguageModelInput = new ChatPromptValue([
  new HumanMessage({
    content: [
      {
        type: "text",
        text: userMessage,
      },
      {
        type: "image_url",
        image_url: testAddress,
        // image_url: `data:image/png;base64,${media[0]}`,
        // image_url: {
        //   url: `data:image/png;base64,${media[0]}`,
        // },
      },
    ],
  }),
]);
const streamingResp = await model.stream(input);
I'm using this in Next.js on the edge runtime.
I've been trying to shotgun every which way to send an image. With this above I'm getting a response back:
" I'm sorry, I don't understand what you mean. Can you please rephrase your question?"
Which leads me to believe I'm not setting up the input correctly.
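Since the attempts above switch between a bare string and an object with a `url` key for `image_url`, a tiny normalizing helper can make it easier to see what is actually being sent. This is a sketch only; `imageUrlOf` and `isDataUrl` are hypothetical names, and the check leans on the earlier note that only data: URLs are handled at the moment.

```typescript
// The two image_url shapes tried in this thread: a bare string, or an object with a url key.
type ImageUrlField = string | { url: string };

// Hypothetical helper: return the URL string regardless of which shape was used.
function imageUrlOf(field: ImageUrlField): string {
  return typeof field === "string" ? field : field.url;
}

// Hypothetical helper: true when the address is an inline data: URL rather than a web address.
function isDataUrl(url: string): boolean {
  return url.startsWith("data:");
}

const candidate: ImageUrlField = { url: "data:image/png;base64,AAAA" };
if (!isDataUrl(imageUrlOf(candidate))) {
  console.warn("image_url is not a data: URL; it may not be loaded as an image");
}
```

Logging the normalized value right before the call would show whether the model is getting an inline image or just an address.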
To get the 400 error I was using the predictMessages in this way:
const testMessages: BaseMessage[] = [
  new HumanMessage({
    content: [
      {
        type: "text",
        text: userMessage,
      },
      {
        type: "image_url",
        // image_url: testAddress,
        // image_url: `data:image/jpeg;base64,${media[0]}`,
        image_url: `data:image/png;base64,${media[0]}`,
        // image_url: {
        //   // mime_type: media_types[0],
        //   url: `data:image/png;base64,${media[0]}`,
        // },
      },
    ],
  }),
];
const testPredict = await model.predictMessages(testMessages);
console.log("testPredict =======>", testPredict);
Webauth was what I wanted to verify. Thank you.
(I found another bug on my way to fixing yours)
well... to looking at yours.
it might just be the way I'm putting together the message to send to the model