I am using GPT-4 to parse a PDF document. The document has line breaks within certain columns, and these seem to be what causes GPT-4 to misread the content. My first attempt was to convert the PDF to text and pass that text to GPT-4, but the results were unsatisfactory: the line breaks inside the columns made GPT-4 misinterpret the structure, treating each wrapped line as a separate object and breaking up the continuity of the text.
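The line-break problem can be illustrated with a small, hypothetical preprocessing step that re-joins wrapped cells before the text is sent to GPT-4. This is only a sketch under loud assumptions: that the extracted text keeps columns aligned with spaces, that cells are separated by two or more spaces, and that the first non-empty line is a header row whose cell positions define the columns. The names `splitCells`, `columnFor`, and `mergeWrappedRows` and the sample data are illustrative, not part of any library.

```typescript
// Hypothetical heuristic, not a library API. Assumptions:
// - cells are separated by runs of two or more spaces,
// - columns stay horizontally aligned in the extracted text,
// - the first non-empty line is a header row,
// - a line with fewer cells than the header is a wrapped continuation.

// A cell's text and the character offset where it starts on its line.
interface Cell {
  text: string;
  start: number;
}

// Split a line into cells: runs of words joined by single spaces.
function splitCells(line: string): Cell[] {
  const cells: Cell[] = [];
  const re = /\S+(?: \S+)*/g;
  let m: RegExpExecArray | null;
  while ((m = re.exec(line)) !== null) {
    cells.push({ text: m[0], start: m.index });
  }
  return cells;
}

// Index of the right-most header column starting at or before `offset`.
function columnFor(offset: number, starts: number[]): number {
  let idx = 0;
  for (let i = 0; i < starts.length; i++) {
    if (starts[i] <= offset) idx = i;
  }
  return idx;
}

// Rebuild rows, appending continuation lines to the previous data row.
function mergeWrappedRows(text: string): string[][] {
  const lines = text.split("\n").filter((l) => l.trim() !== "");
  if (lines.length === 0) return [];
  const header = splitCells(lines[0]);
  const starts = header.map((c) => c.start);
  const rows: string[][] = [header.map((c) => c.text)];
  for (const line of lines.slice(1)) {
    const cells = splitCells(line);
    // Only treat as a continuation if a data row already exists.
    const isContinuation = cells.length < starts.length && rows.length > 1;
    const target = isContinuation
      ? rows[rows.length - 1]
      : new Array<string>(starts.length).fill("");
    for (const cell of cells) {
      const col = columnFor(cell.start, starts);
      target[col] = target[col] ? `${target[col]} ${cell.text}` : cell.text;
    }
    if (!isContinuation) rows.push(target);
  }
  return rows;
}

// Illustrative sample: the "Description" cell of the first item wraps.
const line = (a: string, b: string, c: string): string =>
  a.padEnd(10) + b.padEnd(22) + c;
const sample = [
  line("Item", "Description", "Qty"),
  line("A-100", "Steel bracket, zinc", "4"),
  line("", "plated", ""),
  line("B-200", "Hex bolt", "12"),
].join("\n");

const rows = mergeWrappedRows(sample);
console.log(JSON.stringify(rows));
```

The merged rows could then be serialized (for example as JSON) and included in the prompt in place of the raw text. The heuristic breaks down when a real row legitimately has empty cells, since such a row looks like a continuation line, so it is a starting point rather than a general fix.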
To work around this, I converted the PDF to an image and passed that to GPT-4 instead. This improved the results slightly, but the accuracy is still not good enough.
How should I proceed from here? Are there alternative methods or techniques I should explore to improve the accuracy of parsing this PDF with GPT-4 in a Node.js environment?