How to extract invoice data to Excel — whatever the layout
Every vendor sends invoices in a different layout, and that's exactly what makes them painful to get into a spreadsheet. Here's how to pull invoice number, dates, totals, and line items into clean Excel, CSV, or JSON — across any layout, without building a template per supplier.
If your inbox fills up with vendor invoices every month, you already know the drill: open each PDF, find the invoice number, the date, the total, copy out every line item, and type it all into a spreadsheet. Then do it again for the next vendor — whose invoice looks nothing like the last one.
That last part is the real problem. It isn’t the volume of invoices; it’s that no two suppliers format them the same way. The total sits in a different place, the dates use a different style, the line-item table has different columns. A human adapts to each one without thinking. Most software doesn’t — which is why so many teams give up and just retype everything by hand.
This post walks through how to get invoice data out into clean Excel, CSV, or JSON — across any layout — without building and maintaining a separate template for every vendor.
Why invoices are harder to extract than they look
A single, fixed invoice template is easy. The trouble is you almost never have just one. A few reasons invoices resist clean extraction:
- Every vendor’s layout is different. There’s no industry standard for where the invoice number, billing address, or totals go. The template you set up for one supplier breaks the moment a new vendor sends their first invoice.
- “The amount” is ambiguous. A single invoice carries a subtotal, tax, shipping, a total before discount, and a final amount due — often stacked right next to each other. Pull “the amount” without saying which one, and you’ll get whichever the engine guessed.
- Line items are a list, not a value. Each invoice has one invoice number but many line items, each with its own description, quantity, unit price, and line total. Flatten that wrong and you get a jumble where you wanted clean rows.
- PDF, scan, or photo. An invoice generated digitally and emailed as a PDF usually carries clean, selectable text. The same invoice scanned at the front desk or photographed on a phone is an image, so you need OCR before you can extract anything, and OCR brings its own errors.
Any tool that claims to “just extract invoices” has to have an answer for all of these. The manual frustration lives in that variety, not in any single invoice.
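To make the "named amounts" and "line items are a list" points concrete, here is a minimal Python sketch of the shape the data wants to end up in. The class and field names are our own illustration, not a format any particular tool produces.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Optional


@dataclass
class LineItem:
    description: str
    quantity: Decimal
    unit_price: Decimal
    line_total: Decimal


@dataclass
class Invoice:
    invoice_number: str
    vendor_name: str
    # Each amount gets its own name, so "the amount" is never ambiguous.
    subtotal: Optional[Decimal] = None
    tax: Optional[Decimal] = None
    shipping: Optional[Decimal] = None
    amount_due: Optional[Decimal] = None
    # One invoice, many line items: a list, not a single value.
    line_items: list[LineItem] = field(default_factory=list)


def to_rows(invoice: Invoice) -> list[dict]:
    """Flatten to one row per line item, repeating the invoice-level fields."""
    return [
        {
            "invoice_number": invoice.invoice_number,
            "vendor_name": invoice.vendor_name,
            "description": item.description,
            "quantity": item.quantity,
            "unit_price": item.unit_price,
            "line_total": item.line_total,
        }
        for item in invoice.line_items
    ]


# Illustrative data only.
invoice = Invoice(
    invoice_number="INV-001",
    vendor_name="Example Supplies",
    subtotal=Decimal("100.00"),
    tax=Decimal("20.00"),
    amount_due=Decimal("120.00"),
    line_items=[
        LineItem("Widget", Decimal("2"), Decimal("25.00"), Decimal("50.00")),
        LineItem("Gadget", Decimal("1"), Decimal("50.00"), Decimal("50.00")),
    ],
)
rows = to_rows(invoice)
```

Repeating the invoice number on every row is one common way to keep line items usable in a flat spreadsheet; a separate sheet for line items keyed by invoice number would work just as well.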
The common approaches, and where each one stops working
There’s no single right tool — it comes down to how many vendors you deal with and how consistent their layouts are.
Typing it by hand. Zero setup, accurate if you’re careful, and completely unscalable. Fine for a handful a month; a non-starter once you’re processing dozens across many suppliers.
Template-based parsers. You define, once, where each field sits on the page. Fast and cheap if every invoice looks identical. But since every vendor differs, you end up building and maintaining a template per supplier — and rebuilding it the day a vendor tweaks their layout. With three or four steady suppliers this is fine. With a long, changing list of vendors, the setup cost eats the time savings.
Natural-language extraction. Instead of marking positions, you describe the fields you want in plain language and the engine adapts to each layout. This handles the “every vendor is different” problem directly, and it’s far more forgiving of scans and odd formatting. The trade-off is that you want a tool that lets you verify the output — because you’re trusting a model to read the page rather than a fixed coordinate.
That last category is where Ztract sits, so let’s walk through it concretely.
Walkthrough: invoice to Excel in Ztract
Here’s the full flow — the same one whether you’re processing a single invoice or a folder of fifty from a dozen different vendors.
1. Create a project and describe what you want
A project is just a container for related documents and the schema you’ll apply to them. For invoices, you have three ways to define that schema:
- Start from the ready-made invoice schema and adjust it. This is the fastest start: it already knows about invoice numbers, dates, vendor details, totals, and line items.
- Describe the fields in plain language. For example:
  “For each invoice, extract the invoice number, the issue date, the vendor name, and the total amount due (after tax and discounts). Then for each line item, extract the description, quantity, unit price, and line total. If a field isn’t present, leave it blank rather than guessing.”
  Notice two things in there. “The total amount due (after tax and discounts)” tells the engine exactly which of the several amounts you mean. And “for each line item” marks the line items as a repeating list, so you get clean rows back instead of everything mashed into one cell. Those two habits are most of what separates trustworthy invoice output from a mess.
- Infer from a sample. Drop in one representative invoice and let Ztract propose a schema from it. Useful when a new vendor’s invoice has fields you didn’t expect.
The key advantage: the same schema works across vendors. You’re describing the data you want, not the position it sits in — so a layout you’ve never seen before is handled the same way as one you have. No template per supplier.
2. Upload the invoices
Drag in your files — PDF, Word, Excel, scans, or phone photos, up to 500 MB per file. Text-based PDFs and image-based scans both work; the scans simply get OCR’d first. If a month’s invoices arrive as separate files from different vendors, upload them all together and the same schema applies to every one.
3. Review and correct — the part that actually saves time
Here’s the thing people underestimate: with invoices, the extraction isn’t the time sink — the checking is. If you can’t trust the output, you end up re-reading every invoice against the spreadsheet anyway, and you’ve saved nothing.
Ztract is built around that. Every extracted value is anchored to its exact position on the source document: click a number in the results and it highlights where on the invoice it came from. That side-by-side view is what makes review fast. Instead of re-checking every field, you scan for the ones that look off — a total that pulled the subtotal by mistake, a line item that merged two rows — and fix them in one click.
And because we charge only for extraction, correcting a value costs you nothing: only the pages you extract count against your pack, not the cleanup afterward.
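If you want a second line of defense after export, the same kind of slip (a total that pulled the subtotal by mistake) can be caught with plain arithmetic. The sketch below is our own illustration in Python, not a Ztract feature. The field names are hypothetical, and it assumes an invoice with no shipping or discount, so that subtotal plus tax should equal the amount due.

```python
from decimal import Decimal


def find_total_mismatches(invoice: dict, tolerance: Decimal = Decimal("0.01")) -> list[str]:
    """Return human-readable warnings for amounts that do not add up."""
    warnings = []

    # Line totals should sum to the subtotal.
    line_sum = sum((item["line_total"] for item in invoice["line_items"]), Decimal("0"))
    if abs(line_sum - invoice["subtotal"]) > tolerance:
        warnings.append(f"line items sum to {line_sum}, subtotal is {invoice['subtotal']}")

    # Assumption: no shipping or discount, so subtotal + tax = amount due.
    expected_due = invoice["subtotal"] + invoice["tax"]
    if abs(expected_due - invoice["amount_due"]) > tolerance:
        warnings.append(f"subtotal + tax is {expected_due}, amount due is {invoice['amount_due']}")

    return warnings


# Illustrative record in which the subtotal was captured as the amount due.
extracted = {
    "subtotal": Decimal("100.00"),
    "tax": Decimal("20.00"),
    "amount_due": Decimal("100.00"),
    "line_items": [{"line_total": Decimal("60.00")}, {"line_total": Decimal("40.00")}],
}
warnings = find_total_mismatches(extracted)
for warning in warnings:
    print(warning)
```

A check like this only narrows down which invoices to look at; the side-by-side comparison with the source is still what tells you which number is actually wrong.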
4. Export
Once it looks right, export to Excel, CSV, or JSON — a single invoice or the whole project at once. From there it drops straight into your accounts-payable workflow, your accounting software’s import, or wherever the numbers need to go next.
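As a rough idea of what "wherever the numbers need to go next" can look like, here is a short Python sketch that reads a CSV export and totals the amount due per vendor. The column names and the sample rows are assumptions for illustration; an actual export will carry whatever fields your schema defines.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# Stand-in for an exported CSV file; the column names are assumptions.
exported_csv = """invoice_number,vendor_name,amount_due
INV-001,Example Supplies,120.00
INV-002,Example Supplies,80.50
A-77,Sample Logistics,300.00
"""

# Decimal avoids the rounding surprises of floats when adding money.
totals_by_vendor = defaultdict(Decimal)
for row in csv.DictReader(io.StringIO(exported_csv)):
    totals_by_vendor[row["vendor_name"]] += Decimal(row["amount_due"])

for vendor, total in totals_by_vendor.items():
    print(vendor, total)
```

With a real file you would replace the `io.StringIO(...)` stand-in with `open("your_export.csv", newline="")`.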
The cases that still need a human eye
We’d rather tell you where this gets tricky than pretend it doesn’t. A few situations to watch:
- Credit notes and refunds. A credit note looks like an invoice but the amounts run the other way. Be explicit in your schema about how to treat negative amounts, and double-check the sign in the review step.
- Multi-currency suppliers. If you buy from vendors in different currencies, capture the currency as its own field per invoice rather than assuming one currency across the batch — otherwise a total of “1,000” tells you nothing.
- Badly degraded scans. A faxed-then-rescanned invoice with faint print is hard for anyone to read, OCR included. If the source is illegible to your eye, expect to verify more closely — a cleaner scan beats any amount of after-the-fact correction.
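The first two cases are easy to get wrong silently once the data is in a spreadsheet, so here is a minimal Python sketch of one way to handle them downstream. It assumes each record carries a `document_type` and a `currency` field; both names are hypothetical and stand for whatever your schema captures.

```python
from collections import defaultdict
from decimal import Decimal


def signed_amount(doc: dict) -> Decimal:
    """Credit notes reduce what is owed, so force their amount negative."""
    amount = abs(doc["amount_due"])
    return -amount if doc["document_type"] == "credit_note" else amount


def totals_by_currency(docs: list[dict]) -> dict[str, Decimal]:
    """Sum amounts per currency; mixing currencies in one total is meaningless."""
    totals = defaultdict(Decimal)
    for doc in docs:
        totals[doc["currency"]] += signed_amount(doc)
    return dict(totals)


# Illustrative records only.
docs = [
    {"document_type": "invoice", "currency": "USD", "amount_due": Decimal("1000.00")},
    # The credit note's amount was printed without a minus sign.
    {"document_type": "credit_note", "currency": "USD", "amount_due": Decimal("200.00")},
    {"document_type": "invoice", "currency": "EUR", "amount_due": Decimal("1000.00")},
]
totals = totals_by_currency(docs)
```

Normalizing the sign in one place means it no longer matters whether a vendor prints credit amounts as negative, in parentheses, or as plain positive numbers, as long as the document type was captured correctly.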
If a layout we should handle comes back wrong, we genuinely want to see it — email a sample (anonymized if you need) to support@ztract.com and we’ll dig in. The documents people send us are how the engine gets better.
A note on vendor data
Invoices carry sensitive commercial information — who you buy from, what you pay, your account numbers — so it’s worth being clear: we don’t train models on the documents you upload. Not our own engine, and not the third-party LLMs we route through; the commercial APIs we use prohibit training on submitted data, and we rely on those commitments. When you delete an invoice, it’s gone immediately from active storage and within 14 days from backups. The full picture is in our Privacy Policy and Data Processing Agreement.
Try it on your own invoices
The fastest way to know whether this fits your workflow is to run it on a handful of real invoices you’d otherwise be typing out by hand — ideally from a few different vendors, so you can see the same schema handle different layouts. New accounts get 30 free pages, no credit card — plenty to extract a batch end to end and check the totals yourself.
And if you process invoices in volume and would share honest feedback on what worked and what didn’t, get in touch — we’re onboarding early users and shaping what we build next around the documents people actually struggle with. Invoices are right at the top of that list.
Check out our use case page for more on invoice data extraction.