# Vision API Parameters

cosmic wolf

I'm following along with this cookbook for piping video frames into gpt-4-vision-preview: https://cookbook.openai.com/examples/gpt_with_vision_for_video_understanding. Several things in it don't align with the documentation of the vision API. The input messages to the chat completions API are formatted like this:

PROMPT_MESSAGES = [
    {
        "role": "user",
        "content": [
            "These are frames from a video that I want to upload. Generate a compelling description that I can upload along with the video.",
            *map(lambda x: {"image": x, "resize": 768}, base64Frames[0::10]),
        ],
    },
]

But the documentation here https://platform.openai.com/docs/guides/vision specifies a schema for images that looks like this:

          {
            "type": "image_url",
            "image_url": {
              "url": f"data:image/jpeg;base64,{base64_image}"
            }
          }

With an optional detail value. So is the resize value from the cookbook not doing anything? Is there any way to check?

knotty frost

agreed @cosmic wolf

hardy nova

Sometimes the cookbooks are a little messy

the second code block you linked is the correct way to attach an image to the vision API according to the spec
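
For reference, here is a minimal sketch of how the cookbook's frame list could be rewritten to use the documented image_url schema instead. The helper name build_vision_messages and the placeholder frames are made up for illustration; the "detail" field is the optional value mentioned above, and "low" is just one of its documented settings. This only builds the message payload and does not call the API.

```python
import base64


def build_vision_messages(prompt, base64_frames, detail="low"):
    """Build a chat message list using the documented image_url schema.

    `build_vision_messages` is a hypothetical helper, not part of any SDK.
    """
    # The text prompt goes in as its own typed content part.
    content = [{"type": "text", "text": prompt}]
    for frame in base64_frames:
        content.append(
            {
                "type": "image_url",
                "image_url": {
                    # Frames are assumed to be base64-encoded JPEGs.
                    "url": f"data:image/jpeg;base64,{frame}",
                    # Optional; "low", "high", and "auto" are the documented values.
                    "detail": detail,
                },
            }
        )
    return [{"role": "user", "content": content}]


# Placeholder data standing in for real encoded video frames.
base64Frames = [
    base64.b64encode(f"frame-{i}".encode()).decode("utf-8") for i in range(25)
]

PROMPT_MESSAGES = build_vision_messages(
    "These are frames from a video that I want to upload. "
    "Generate a compelling description that I can upload along with the video.",
    base64Frames[0::10],  # every 10th frame, as in the cookbook
)
```

As for checking whether a parameter has any effect: one possible approach is to send the same frames with and without it and compare usage.prompt_tokens in the two responses. Whether the cookbook's "resize" key changes that number is not something this thread establishes.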