#PDF processing


storm schooner
#

Hi, I'm making a tool for work (I don't work as a dev but want to automate some tasks at my office job) and I need to take human-readable PDFs, convert them to clean strings, and then put all of that into objects. However, no matter what I try with PDFBox, it's becoming a nightmare to sort everything correctly, especially since the data contains custom words.

Any suggestions for tools or approaches here? This has turned a fun, relatively easy project into a nightmare of impossible logic.

pine spear BOT
#

<@&987246399047479336> please have a look, thanks.

barren spoke
#

Yeah.. I looked into this subject briefly myself.. It doesn't seem like a fun time honestly. As far as tools I'm not sure I'd be much help unfortunately. But when I was researching I do remember seeing some options available.

#

This might be a useful discussion to read through.

#

Seems like iText could be a good option IF you plan on open-sourcing your code. Otherwise their license (AGPL, with a paid commercial alternative) is pretty strict. PDFBox, on the other hand, uses the permissive Apache License 2.0, which shouldn't give you any issues.
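
If you stick with PDFBox, the "put it into objects" part can be kept separate from the extraction itself. A rough sketch of that second step, assuming the text has already been pulled out (for example with PDFBox's `PDFTextStripper`, whose `setSortByPosition(true)` option may help with ordering) and assuming a made-up layout where each line looks like `Label: value` and a blank line separates records. The class name, the layout, and the sample data are hypothetical, not something from your PDFs:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turns already-extracted PDF text into key/value records.
class ExtractedTextParser {

    // Assumed layout: "Label: value" on each line; a blank line ends a record.
    private static final Pattern FIELD =
            Pattern.compile("^\\s*([^:]+?)\\s*:\\s*(.*?)\\s*$");

    static List<Map<String, String>> parse(String text) {
        List<Map<String, String>> records = new ArrayList<>();
        Map<String, String> current = new LinkedHashMap<>();
        for (String line : text.split("\\R")) {
            if (line.trim().isEmpty()) {
                // Blank line: close the current record if it has any fields.
                if (!current.isEmpty()) {
                    records.add(current);
                    current = new LinkedHashMap<>();
                }
                continue;
            }
            Matcher m = FIELD.matcher(line);
            if (m.matches()) {
                current.put(m.group(1), m.group(2));
            }
            // Lines that do not fit the assumed layout are skipped here;
            // a real tool would probably collect them for manual review.
        }
        if (!current.isEmpty()) {
            records.add(current);
        }
        return records;
    }

    public static void main(String[] args) {
        // Made-up sample standing in for text extracted from a PDF.
        String sample = "Invoice: 1001\nCustomer: ACME GmbH\n\n"
                + "Invoice: 1002\nCustomer: Foo Ltd\n";
        for (Map<String, String> record : parse(sample)) {
            System.out.println(record);
        }
    }
}
```

Keeping the parsing in plain Java like this means you can test it against pasted text without touching the PDF at all, and only the extraction step depends on whichever library you end up using.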

#

This might help with your text issues as well. Using OCR would probably be your best bet.

river zephyr
#

Maybe first: what do you want to achieve? Do you need to do OCR? To what extent do you need to parse the data?

storm schooner
pine spear BOT
#

@storm schooner

Your question has been closed due to inactivity.

If it was not resolved yet, feel free to just post a message below
to reopen it, or create a new thread.

Note that usually the reason nobody responds is that the
question may not have been asked well, so no one felt confident
enough to answer.

When you reopen the thread, try to use your time to improve the quality
of the question by elaborating, providing details, context, all relevant code
snippets, any errors you are getting, concrete examples and perhaps also some
screenshots. Share your attempt, explain the expected results and compare
them to the current results.

Also try to make the information easily accessible by sharing code
or assignment descriptions directly on Discord, not behind a link or
PDF-file; provide some guidance for long code snippets and ensure
the code is well formatted and has syntax highlighting. Kindly read through
https://stackoverflow.com/help/how-to-ask for more.

With enough info, someone knows the answer for sure 👍

barren spoke
#

Any luck with this?

river zephyr