# Yes

rain panther · 2023-12-14T20:20:23.438Z

rain panther Dec 14, 2023, 8:20 PM

The example in the docs only shows one text input and one image. How is it possible to send multiple images?

Here's how I'm attempting to do it:


```python
messages.append(
    """
    The attached images are all from the same pdf but can have entities that span multiple pages.
    Please only return a JSON that represents which pages are associated with an individual entity.
    """
)

messages.append(
    f"Information on the pdf: There are only {len(images)} pages total. Entities will only span continuous pages, so an entity can span pages 2 and 3 but not 2, 3, 6 and 7 because 2,3,6,7 aren't continuous. Remember that it is possible for two distinct entities to have similar information but still be disjoint.",
)

messages.append('Tip: Please take time to think and look at all of the pages. This is important to my career.')

messages.append(images)

response = gemini_vision.generate_content(messages)
```

I'm getting this error:

Could not create `Blob`, expected `Blob`, `dict` or an `Image` type(`PIL.Image.Image` or `IPython.display.Image`). Got a: <class 'list'> Value: [<PIL.Image.Image image mode=RGB size=1700x2200 at 0x178A428F0>, <PIL.Image.Image image mode=RGB size=1700x2200 at 0x178A42B00>, <PIL.Image.Image image mode=RGB size=1700x2200 at 0x178AFC3D0>, <PIL.Image.Image image mode=RGB size=2700x3300 at 0x178AFC430>, <PIL.Image.Image image mode=RGB size=2700x3300 at 0x178AFC400>, <PIL.Image.Image image mode=RGB size=2900x3300 at 0x178AFC460>, <PIL.Image.Image image mode=RGB size=2900x3300 at 0x178AFC490>]

carmine swift Dec 14, 2023, 9:52 PM

The problem is in how you are handling your images. It would help if you could share more of your code. Try building your prompt as a dictionary, with your text and images as key-value pairs.

```python
import base64
import io

from PIL import Image


def encode_image(path, size=(256, 256)):
    """Open an image, resize it, and return it as base64-encoded PNG data."""
    image = Image.open(path).resize(size).convert("RGB")
    buffer = io.BytesIO()
    # Save in a real image format; tobytes() would return raw pixels with no header
    image.save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("utf-8")


# Load and pre-process images
image1_data = encode_image("image1.jpg")
image2_data = encode_image("image2.png")
image3_data = encode_image("image3.jpeg")

# Build the prompt
prompt = {
    "text": "Describe the scene in these images",
    "image1": image1_data,
    "image2": image2_data,
    "image3": image3_data,
}
```
Something like this...
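
Going by the error text in the question, each item passed to `generate_content` has to be a single `Blob`, `dict`, or image, and `messages.append(images)` adds the whole list as one item. A minimal sketch of flattening the list instead, assuming the model accepts text strings and images mixed in one flat list (the thread does not confirm this). `build_parts` is a hypothetical helper, and placeholder objects stand in for the PIL pages so the snippet runs without Pillow or an API key:

```python
def build_parts(texts, images):
    """Return one flat list of prompt parts: text strings first, then images."""
    parts = list(texts)
    # extend() adds each image as its own item; append() would nest the whole list
    parts.extend(images)
    return parts


# Placeholder objects (assumption): in the question these would be
# PIL.Image.Image pages loaded from the PDF.
pages = [object(), object(), object()]
texts = [
    "Instructions for the model.",
    f"There are only {len(pages)} pages total.",
]

parts = build_parts(texts, pages)

# No item in the flat list is itself a list
nested = [p for p in parts if isinstance(p, list)]
print(len(parts), len(nested))
```

With real images, the call from the question would then be `gemini_vision.generate_content(parts)`. In the original snippet the equivalent change is `messages.extend(images)` in place of `messages.append(images)`.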