# Is it possible to submit a PDF via the API (or a URL for a PDF) so that OpenAI can analyse it?


sullen vault

Is it possible to submit a PDF via the API (or a URL for a PDF) so that OpenAI can analyse it?

Which models currently support this?

rich folio BOT

Hi @sullen vault. I'm kapa.ai, your friendly support bot. I will try to answer your question by searching through the knowledge base. :wave:

Yes, it is possible to submit a PDF for analysis, but it requires some preprocessing. Vision-capable OpenAI models such as GPT-4V and GPT-4o can analyze the content of a PDF, but they do not accept PDF files directly. Instead, you convert each PDF page into an image and send those images to the model.

Here is a general approach based on the knowledge sources:

  1. Convert PDF to Images: Use a library like pdf2image to convert each page of the PDF into an image. This is necessary because models like GPT-4V and GPT-4o can analyze images but not PDF files. Note that pdf2image is a wrapper around Poppler, so Poppler must be installed on the system.

    from pdf2image import convert_from_path
    
    def convert_doc_to_images(path):
        images = convert_from_path(path)
        return images
    
  2. Encode Images: Encode each image as a base64 data URI so it can be sent to the model.

    import base64
    from io import BytesIO
    
    def get_img_uri(img):
        # img is a PIL image, as returned by convert_from_path
        buffer = BytesIO()
        img.save(buffer, format="JPEG")
        base64_image = base64.b64encode(buffer.getvalue()).decode("utf-8")
        # Wrap the base64 payload in a data URI for the image_url input
        data_uri = f"data:image/jpeg;base64,{base64_image}"
        return data_uri
    
  3. Analyze Images with GPT-4V or GPT-4o: Use the encoded images as input to the model for analysis.

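    from openai import OpenAI
    
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    # Placeholder prompt; the original snippet leaves system_prompt undefined
    system_prompt = "Describe the content of this page."
    
    def analyze_image(img_url):
        response = client.chat.completions.create(
            # The original snippet used gpt-4-vision-preview, which has been deprecated
            model="gpt-4o",
            temperature=0,
            messages=[
                {
                    "role": "system",
                    "content": system_prompt
                },
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "image_url",
                            # image_url is an object with a "url" key, not a bare string
                            "image_url": {"url": img_url},
                        },
                    ],
                }
            ],
            max_tokens=300,
            top_p=0.1
        )
        return response.choices[0].message.content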
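
A user message can carry more than one image part, so for a short document several pages could go into a single request instead of one call per page. The sketch below only builds the `messages` list and makes no API call. `build_messages` is a hypothetical helper name, not something from the cookbook, and the data URIs are assumed to come from `get_img_uri` in step 2.

```python
from typing import Any, Dict, List


def build_messages(system_prompt: str, data_uris: List[str]) -> List[Dict[str, Any]]:
    """Build a Chat Completions `messages` list with one image part per page."""
    # Each page becomes its own image_url part inside a single user message.
    image_parts = [
        {"type": "image_url", "image_url": {"url": uri}} for uri in data_uris
    ]
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": image_parts},
    ]
```

The result can be passed as the `messages` argument of `client.chat.completions.create`. Whether batching pages this way gives better answers than one call per page is something to test on your own documents; more images per request also means more input tokens.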

For more detailed instructions, you can refer to the OpenAI Cookbook on parsing PDF docs for RAG (https://cookbook.openai.com/examples/parse_pdf_docs_for_rag).
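
To tie the three steps together, the page loop could look roughly like the sketch below. This is an illustration rather than the cookbook's code: `analyze_pdf` and its parameter names are hypothetical, and the three callables stand in for `convert_doc_to_images`, `get_img_uri`, and `analyze_image` from the steps above. Passing the steps in as arguments keeps the loop easy to try out without pdf2image installed or an API key set.

```python
from typing import Any, Callable, Iterable, List


def analyze_pdf(
    path: str,
    convert: Callable[[str], Iterable[Any]],
    encode: Callable[[Any], str],
    analyze: Callable[[str], str],
) -> List[str]:
    """Return one model response per page, in page order."""
    results: List[str] = []
    for page in convert(path):             # step 1: PDF -> page images
        data_uri = encode(page)            # step 2: image -> base64 data URI
        results.append(analyze(data_uri))  # step 3: one model call per page
    return results
```

With the functions from the steps above, a call would be `analyze_pdf("report.pdf", convert_doc_to_images, get_img_uri, analyze_image)`, where `report.pdf` is a placeholder path.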

Regarding model support, models with vision capabilities, such as gpt-4-vision-preview and gpt-4o, can handle this workflow. The Batch API documentation (https://platform.openai.com/docs/guides/batch#model-availability) has a section on model availability.

If you need further assistance, you can explore the OpenAI documentation (https://platform.openai.com/docs) for more details.
