#pdf


sterile ether
#

I am utilizing GPT-4 to parse a PDF document. The document contains line breaks within certain columns, which may be causing issues with GPT-4's ability to accurately interpret the content. Initially, I attempted to convert the PDF to text and then provided that text as input to GPT-4. However, the results were unsatisfactory. The presence of line breaks in the columns caused GPT-4 to misinterpret the structure, treating each line break as a separate object and disrupting the continuity of the text.

To address this issue, I tried converting the PDF to an image and feeding it as input to GPT-4. While this approach yielded slight improvements in the results, the accuracy is still lacking.

At this point, I am seeking guidance on how to proceed. Should I explore alternative methods or techniques to enhance the accuracy of parsing the PDF document using GPT-4 within a Node.js environment?
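
One option for the line-break problem is to clean the extracted text before it reaches GPT-4. The sketch below is only a heuristic and rests on an assumption that is not confirmed for this document: every real row starts with a recognizable pattern (here, a leading numeric ID), so any line that does not match is treated as a wrapped continuation of the previous row. The function name `mergeWrappedLines` and the default `rowStart` pattern are hypothetical and would need adjusting to the actual PDF layout.

```javascript
// Joins wrapped lines back onto the row they belong to.
// ASSUMPTION: every real row starts with text matching `rowStart`
// (hypothetical default: a numeric ID followed by whitespace).
function mergeWrappedLines(text, rowStart = /^\d+\s/) {
  const rows = [];
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue; // skip blank lines
    if (rows.length === 0 || rowStart.test(line)) {
      rows.push(line); // start of a new row
    } else {
      rows[rows.length - 1] += " " + line; // continuation of the previous row
    }
  }
  return rows;
}

// Example: the second line is a wrapped continuation of the first row.
const sample = "1 Widget large\nblue\n2 Gadget";
console.log(mergeWrappedLines(sample));
```

If the rows have no reliable starting pattern, this heuristic will not apply and a layout-aware extraction step would be needed instead.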

vast kernel
sterile ether
#

I am using a package called pdf-parse (a Node.js package), by the way. I fine-tuned the data and am now getting slightly more polished responses, but those responses are cut off.

#

By cut off, I mean that the full stringified object is not returned: some details are omitted or arrive only partially, for example {name: 'zainul', age: 22, emai
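
A possible way to detect this kind of incomplete response in Node.js is sketched below. It assumes the response object has the shape of an OpenAI Chat Completions result, where a `finish_reason` of `"length"` means the output stopped at the token limit; whether that is the cause here is not confirmed. The helper names `wasTruncated` and `parseCompleteJson` are hypothetical, and the check only works if the model is asked to return strict JSON with quoted keys.

```javascript
// Returns true when the completion stopped because it hit the token limit.
// ASSUMPTION: `completion` has the Chat Completions response shape.
function wasTruncated(completion) {
  return completion?.choices?.[0]?.finish_reason === "length";
}

// Tries to parse the model output as JSON.
// A response that was cut off mid-object fails to parse, so `ok` is false.
function parseCompleteJson(text) {
  try {
    return { ok: true, value: JSON.parse(text) };
  } catch {
    return { ok: false, value: null };
  }
}

// Example: a complete object parses, a cut-off one does not.
console.log(parseCompleteJson('{"name":"zainul","age":22}').ok);
console.log(parseCompleteJson('{"name":"zainul","age":22,"emai').ok);
```

If `wasTruncated` returns true, raising the output token limit or splitting the document into smaller chunks per request are the usual things to try.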