#video description

oak sierra

How do I use GPT-4's image capabilities to get a description of a video (for example, one I downloaded from YouTube)?

wild star BOT

Hi @oak sierra. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

To use GPT-4's image capabilities to get a description of a video, you would first need to extract frames from the video. This can be done using OpenCV, a popular library for computer vision tasks. Here's an example of how you can do this:

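import cv2
import base64

# Open the video file (the path is the example's own; substitute your file).
video = cv2.VideoCapture("data/bison.mp4")

base64Frames = []
while video.isOpened():
    success, frame = video.read()
    if not success:
        break  # end of the video, or the frame could not be read
    # Encode the frame as JPEG, then as a base64 string.
    _, buffer = cv2.imencode(".jpg", frame)
    base64Frames.append(base64.b64encode(buffer).decode("utf-8"))

video.release()
print(len(base64Frames), "frames read.")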

In this example, the video is read frame by frame, and each frame is encoded into a base64 string and stored in a list.
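For long videos, that list can get large. As a rough sketch (not part of the original example), the helper below keeps only every Nth frame while encoding, so fewer base64 strings are held in memory. The name `encode_every_nth` and its `step` parameter are hypothetical, and it assumes the frames have already been JPEG-encoded to bytes (for instance via `cv2.imencode` as above).

```python
import base64
from typing import Iterable, Iterator


def encode_every_nth(jpeg_frames: Iterable[bytes], step: int = 50) -> Iterator[str]:
    """Yield base64 strings for frames 0, step, 2*step, ... of the input.

    `jpeg_frames` is assumed to contain JPEG-encoded bytes, one item per frame.
    """
    if step < 1:
        raise ValueError("step must be at least 1")
    for index, jpeg in enumerate(jpeg_frames):
        # Keep only frames whose index is a multiple of `step`.
        if index % step == 0:
            yield base64.b64encode(jpeg).decode("utf-8")
```

With `step=50` this selects the same frame indices as slicing the full list with `[0::50]`, but without storing the frames that are skipped.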

Once you have the frames, you can craft a prompt and send a request to GPT-4. Note that you don't need to send every frame for GPT-4 to understand what's going on. Here's an example of how you can do this:

from openai import OpenAI

PROMPT_MESSAGES = [
    {
        "role": "user",
        "content": [
            "These are frames from a video that I want to upload. Generate a compelling description that I can upload along with the video.",
            # Attach every 50th frame rather than the whole video.
            *map(lambda x: {"image": x, "resize": 768}, base64Frames[0::50]),
        ],
    },
]
params = {
    "model": "gpt-4-vision-preview",
    "messages": PROMPT_MESSAGES,
    "max_tokens": 200,
}

# OpenAI() reads the API key from the OPENAI_API_KEY environment variable.
client = OpenAI()
result = client.chat.completions.create(**params)
print(result.choices[0].message.content)

In this example, a prompt is created that includes the base64-encoded frames, and a request is sent to GPT-4 to generate a description of the video. The result is then printed.
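If you prefer the content-part layout documented for Chat Completions image input (a `text` part followed by `image_url` parts carrying base64 data URLs), the message list could be built roughly like this. This is a sketch under assumptions, not the method from the extracts: `build_frame_messages` is a hypothetical helper name, and the frames are assumed to be base64-encoded JPEGs.

```python
from typing import Any


def build_frame_messages(
    prompt: str, frames_b64: list[str], detail: str = "low"
) -> list[dict[str, Any]]:
    """Build one user message: a text part, then one image part per frame."""
    content: list[dict[str, Any]] = [{"type": "text", "text": prompt}]
    for frame in frames_b64:
        content.append(
            {
                "type": "image_url",
                "image_url": {
                    # Frames are assumed to be base64-encoded JPEG data.
                    "url": f"data:image/jpeg;base64,{frame}",
                    # "low" keeps per-image token cost down; "high" and "auto" also exist.
                    "detail": detail,
                },
            }
        )
    return [{"role": "user", "content": content}]
```

The returned list can be passed as `messages` to `client.chat.completions.create(...)` along with a vision-capable model name and `max_tokens`.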

Please note that, according to the extracts available when this answer was written, image input was still a research feature that was not publicly available, and there was no official timeline for its release (source: https://community.openai.com/t/309294).