#Azure Document Intelligence


ornate oracle

I can help with that. The JSON is large because Azure returns all layout and metadata by default, not just the tables. You can filter or post-process it to extract only the table objects while preserving column relations. Are you using the prebuilt model or the custom layout model?

autumn peak

I used the prebuilt layout model; in my case it does the job. Maybe the invoice model would be better, but it would mean more unneeded data, because my goal is just to have a JSON or Markdown representation of my table from the source document (PDF, Excel).

So the only solution is to parse the JSON? The SDK doesn't provide anything like "clean JSON"?

ornate oracle

You're absolutely right: the prebuilt layout model gives great accuracy, but yeah, it dumps a ton of extra layout and metadata. The SDK doesn’t provide a direct “clean JSON” output yet, but I’ve built custom scripts that process the response and return just the structured table data (in JSON or Markdown) while keeping the column relations intact.
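For reference, here is a rough sketch of what that kind of post-processing could look like. It is not the actual script, just an illustration. It assumes the response has been saved as a plain dict in the REST JSON shape, with an `analyzeResult.tables` list whose entries carry `rowCount`, `columnCount`, and a flat `cells` list (`rowIndex`, `columnIndex`, `content`, optional `rowSpan`/`columnSpan`). The helper names (`table_to_grid`, `grid_to_markdown`, `grid_to_records`, `extract_tables`) are made up for this example, and treating the first row as the header is an assumption that will not hold for every table.

```python
def table_to_grid(table):
    """Rebuild a 2-D grid of strings from one table's flat cell list."""
    rows, cols = table["rowCount"], table["columnCount"]
    grid = [["" for _ in range(cols)] for _ in range(rows)]
    for cell in table["cells"]:
        r, c = cell["rowIndex"], cell["columnIndex"]
        # Collapse newlines and repeated spaces inside a cell.
        text = " ".join(cell.get("content", "").split())
        # Merged cells: copy the text into every slot the cell covers,
        # so each row keeps one value per column.
        for dr in range(cell.get("rowSpan", 1)):
            for dc in range(cell.get("columnSpan", 1)):
                if r + dr < rows and c + dc < cols:
                    grid[r + dr][c + dc] = text
    return grid


def grid_to_markdown(grid):
    """Render a grid as a Markdown table (assumes row 0 is the header)."""
    if not grid:
        return ""

    def fmt(row):
        # Escape pipes so cell text cannot break the table layout.
        return "| " + " | ".join(v.replace("|", "\\|") for v in row) + " |"

    header, *body = grid
    lines = [fmt(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines += [fmt(row) for row in body]
    return "\n".join(lines)


def grid_to_records(grid):
    """Turn a grid into a list of dicts keyed by the header row.

    Assumes row 0 is the header and that header names are unique;
    duplicate names would overwrite each other.
    """
    if not grid:
        return []
    header, *body = grid
    return [dict(zip(header, row)) for row in body]


def extract_tables(response):
    """Return one grid per table, ignoring all other layout metadata."""
    # Accept either the full response or the inner analyzeResult dict.
    result = response.get("analyzeResult", response)
    return [table_to_grid(t) for t in result.get("tables", [])]


# Tiny hand-written sample in the assumed shape, not real service output.
sample = {
    "analyzeResult": {
        "tables": [
            {
                "rowCount": 2,
                "columnCount": 2,
                "cells": [
                    {"rowIndex": 0, "columnIndex": 0, "content": "Item"},
                    {"rowIndex": 0, "columnIndex": 1, "content": "Qty"},
                    {"rowIndex": 1, "columnIndex": 0, "content": "Widget"},
                    {"rowIndex": 1, "columnIndex": 1, "content": "3"},
                ],
            }
        ]
    }
}

for grid in extract_tables(sample):
    print(grid_to_markdown(grid))
```

If you work with the SDK result object instead of raw JSON, the same idea should apply, but the attribute names differ (snake_case rather than camelCase), so the lookups above would need adjusting.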


If you’d like, I can show you how to set that up or even build a small utility that automates it for your workflow. Want me to message you with a quick outline of how I’d approach it?
@redi.capoj