# Yes


rain panther

The example in the docs only shows one text input and one image. How is it possible to send multiple images?


Here's how I'm attempting to do it:


```python
# Assumes images (a list of PIL.Image pages) and gemini_vision (the vision model) are defined earlier.
messages = []

messages.append(
    """
    The attached images are all from the same pdf but can have entities that span multiple pages.
    Please only return a JSON that represents which pages are associated with an individual entity.
    """
)

messages.append(
    f"Information on the pdf: There are only {len(images)} pages total. "
    "Entities will only span continuous pages, so an entity can span pages 2 and 3 "
    "but not 2, 3, 6 and 7 because 2, 3, 6, 7 aren't continuous. "
    "Remember that it is possible for two distinct entities to have similar "
    "information but still be disjoint."
)

messages.append("Tip: Please take time to think and look at all of the pages. This is important to my career.")

messages.append(images)

response = gemini_vision.generate_content(messages)
```

Getting this error: Could not create `Blob`, expected `Blob`, `dict` or an `Image` type(`PIL.Image.Image` or `IPython.display.Image`). Got a: <class 'list'> Value: [<PIL.Image.Image image mode=RGB size=1700x2200 at 0x178A428F0>, <PIL.Image.Image image mode=RGB size=1700x2200 at 0x178A42B00>, <PIL.Image.Image image mode=RGB size=1700x2200 at 0x178AFC3D0>, <PIL.Image.Image image mode=RGB size=2700x3300 at 0x178AFC430>, <PIL.Image.Image image mode=RGB size=2700x3300 at 0x178AFC400>, <PIL.Image.Image image mode=RGB size=2900x3300 at 0x178AFC460>, <PIL.Image.Image image mode=RGB size=2900x3300 at 0x178AFC490>]
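The error message points at the cause: `generate_content` received a `list` in a slot where it expected a single part (a `Blob`, a `dict`, or an image). `messages.append(images)` adds the whole list of pages as one nested element, whereas `messages.extend(images)` (or `messages + images`) adds each page as its own part in one flat list. Here is a minimal sketch of that flattening. `build_contents` is a hypothetical helper, and placeholder strings stand in for the `PIL.Image` pages so it runs without Pillow or an API key:

```python
from typing import Any, List, Sequence


def build_contents(text_parts: Sequence[str], images: Sequence[Any]) -> List[Any]:
    """Return a flat list of prompt parts: every text part, then every image.

    list.append(images) would add the whole list as ONE element, which the SDK
    then tries to convert into a single Blob (hence the error). extend() adds
    each image as its own element instead.
    """
    contents: List[Any] = list(text_parts)
    contents.extend(images)
    return contents


# Placeholder strings stand in for PIL.Image pages (hypothetical data).
pages = ["<page 1 image>", "<page 2 image>", "<page 3 image>"]
contents = build_contents(["Return a JSON mapping each entity to its pages."], pages)
print(len(contents))  # 4: one text part followed by three image parts

# In the snippet above, the equivalent one-line change would be:
#     messages.extend(images)   # instead of messages.append(images)
#     response = gemini_vision.generate_content(messages)
```

With real pages, the flat list holds strings and `PIL.Image` objects side by side, which is the same shape as the single-image example in the docs, just with more elements.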

carmine swift

It is about how you are handling your images: the error shows a whole `list` of images being passed where the SDK expects a single part. Perhaps if you could share more of your code, it would be helpful. Try building your prompt as a flat list of parts: your text prompt first, then each image as its own element, either a `PIL.Image` or a dictionary with `mime_type` and `data` keys.

```python
import io

import PIL.Image as Image

# Load and pre-process images
image1 = Image.open("image1.jpg").resize((256, 256)).convert("RGB")
image2 = Image.open("image2.png").resize((256, 256)).convert("RGB")
image3 = Image.open("image3.jpeg").resize((256, 256)).convert("RGB")


def to_blob(img):
    # img.tobytes() returns raw pixel data, not an image file, so encode to JPEG instead
    buf = io.BytesIO()
    img.save(buf, format="JPEG")
    return {"mime_type": "image/jpeg", "data": buf.getvalue()}


# Build the prompt: text first, then one blob dict per image
prompt = [
    "Describe the scene in these images",
    to_blob(image1),
    to_blob(image2),
    to_blob(image3),
]
```

Something like this...
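Because the original prompt asks the model to report which pages belong to each entity, one optional variation is to put a short text label in front of each image so the page numbering is explicit in the prompt rather than implied by position. This is an assumption about what may help, not something the docs require. `build_labeled_contents` is a hypothetical helper, and the strings stand in for `PIL.Image` pages:

```python
from typing import Any, List, Sequence


def build_labeled_contents(instructions: str, images: Sequence[Any]) -> List[Any]:
    """Return [instructions, 'Page 1:', img1, 'Page 2:', img2, ...] as one flat list."""
    contents: List[Any] = [instructions]
    for number, image in enumerate(images, start=1):
        contents.append(f"Page {number}:")  # text label for the image that follows
        contents.append(image)              # the image itself, as its own part
    return contents


# Placeholder strings stand in for PIL.Image pages (hypothetical data).
pages = ["<page 1 image>", "<page 2 image>"]
contents = build_labeled_contents("Which pages belong to each entity?", pages)
print(contents)
# ['Which pages belong to each entity?', 'Page 1:', '<page 1 image>', 'Page 2:', '<page 2 image>']
```

The result is still a flat list of text and image parts, so it can be passed to `generate_content` the same way as the unlabeled version.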