I'm following along with this cookbook for piping frames of images into gpt-4-vision-preview: https://cookbook.openai.com/examples/gpt_with_vision_for_video_understanding. There are several things here that don't align with the documentation of the vision API. The input messages to the chat completions API are formatted like this:
PROMPT_MESSAGES = [
    {
        "role": "user",
        "content": [
            "These are frames from a video that I want to upload. Generate a compelling description that I can upload along with the video.",
            *map(lambda x: {"image": x, "resize": 768}, base64Frames[0::10]),
        ],
    },
]
But the documentation here https://platform.openai.com/docs/guides/vision specifies a schema for images that looks like this:
{
    "type": "image_url",
    "image_url": {
        "url": f"data:image/jpeg;base64,{base64_image}"
    }
}
The documented schema also accepts an optional "detail" value inside "image_url". So is the "resize" value from the cookbook actually doing anything? Is there any way to check?
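
For comparison, here is a minimal sketch of how the same frames might be expressed in the documented schema instead. The helper names (to_documented_image_parts, build_messages) and the default detail="low" are my own assumptions for illustration, not something taken from the cookbook or the guide; only the "type"/"image_url"/"url"/"detail" keys come from the documentation.

```python
def to_documented_image_parts(base64_frames, detail="low"):
    """Wrap base64-encoded JPEG frames in the image_url schema from the vision guide.

    Hypothetical helper: the name and the default detail value are assumptions.
    """
    return [
        {
            "type": "image_url",
            "image_url": {
                "url": f"data:image/jpeg;base64,{frame}",
                "detail": detail,  # optional per the guide
            },
        }
        for frame in base64_frames
    ]


def build_messages(prompt, base64_frames, step=10, detail="low"):
    """Build a single user message with a text part followed by every step-th frame."""
    return [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                *to_documented_image_parts(base64_frames[0::step], detail),
            ],
        }
    ]
```

One possible check, which I have not confirmed: send the cookbook-style message with a few different "resize" values and compare usage.prompt_tokens in the responses. If the number never changes, that would suggest the value is being ignored.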