Extract document data

The Extract Document Data step uses AI to pull structured fields from a PDF or image and return them as JSON that your automation can use.

This is useful when you need to turn unstructured files, such as invoices, forms, or receipts, into machine-readable data.

Note: You need to enable AI to use this feature.

What this step does

At runtime, this step returns at most one extracted object per run.

Typical use cases:

- Extracting invoice fields such as invoiceNumber, total, and dueDate
- Pulling ID or application form details into a table row
- Reading values from uploaded receipts, purchase orders, or contracts
- Converting incoming document attachments from email triggers into structured records

If the source and input format do not match, extraction fails.

For URL sources, File Type determines how the document is processed.
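
The source/input pairing can be checked before the step runs. The following is a minimal Python sketch, not Budibase's own validation: the function name check_source_input is hypothetical, and treating an attachment object as a dictionary is an assumption made for illustration.

```python
from typing import Optional
from urllib.parse import urlparse


def check_source_input(source: str, document: object) -> Optional[str]:
    """Return an error message if the document does not fit the source, else None.

    Hypothetical pre-check. The attachment shape (a dict) is an assumption.
    """
    if source == "URL":
        # A URL source needs a string that parses as an http(s) URL.
        if not isinstance(document, str):
            return "URL source must receive a URL string"
        parsed = urlparse(document)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            return "URL source must receive a URL string"
        return None
    if source == "Attachment":
        # An attachment source needs an object, not a plain string.
        if not isinstance(document, dict):
            return "Attachment source must receive an attachment object"
        return None
    return f"Unknown source: {source}"
```

A check like this only catches shape mismatches. It does not confirm that the URL is reachable or that the file is a supported document.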

Data schema defines the fields and expected types. Keep it simple and explicit.

Supported value hints include "string", "number", and "boolean", as used in the example below.

Example schema:

{
  "invoiceNumber": "string",
  "invoiceDate": "string",
  "totalAmount": "number",
  "isPaid": "boolean"
}
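
To illustrate how the type hints in this schema relate to the extracted values, here is a small Python sketch that compares one extracted object against the schema. The mapping from hints to Python types, the function name find_type_mismatches, and the sample record are all assumptions for illustration. The step itself may coerce or validate values differently.

```python
import json

# Assumed mapping from the hints used in the example schema to Python types.
HINT_TYPES = {"string": str, "number": (int, float), "boolean": bool}


def find_type_mismatches(schema: dict, record: dict) -> list:
    """List fields whose extracted value is missing or does not match its hint."""
    problems = []
    for field, hint in schema.items():
        if field not in record:
            problems.append(f"{field}: missing")
            continue
        value = record[field]
        expected = HINT_TYPES.get(hint)
        if expected is None:
            problems.append(f"{field}: unknown hint {hint!r}")
        # bool is a subclass of int in Python, so reject it for "number".
        elif hint == "number" and isinstance(value, bool):
            problems.append(f"{field}: expected number")
        elif not isinstance(value, expected):
            problems.append(f"{field}: expected {hint}")
    return problems


schema = json.loads("""
{
  "invoiceNumber": "string",
  "invoiceDate": "string",
  "totalAmount": "number",
  "isPaid": "boolean"
}
""")

# Invented sample output: totalAmount came back as text instead of a number.
record = {
    "invoiceNumber": "INV-001",
    "invoiceDate": "2024-03-01",
    "totalAmount": "120.50",
    "isPaid": False,
}

print(find_type_mismatches(schema, record))  # ['totalAmount: expected number']
```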

Tips:

- Use stable field names that you can map directly to table columns
- Prefer simple primitive types
- If a value is ambiguous (for example, a date), extract it as a string first and normalise it in a later step
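
The "extract as string, normalise later" tip can be sketched as follows. This is a minimal Python example under stated assumptions: the function name normalise_date and the list of candidate formats are hypothetical, and you would replace the formats with the ones your documents actually use.

```python
from datetime import datetime
from typing import Optional

# Candidate formats are an assumption. Slash-separated dates are read day-first.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def normalise_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None
```

Returning None rather than guessing keeps unparseable values visible, so a later step can flag them for review.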

Example automation:

Trigger: Row Created on a table with an attachment column
Step: Extract Document Data
Input:
- Source: Attachment
- Document: attachment binding from the trigger row
- Data schema: expected fields (for example invoice fields)
Step: Update Row to write stepsByName.ExtractStep.data.0.<field> values into columns
Optional: add a Condition step to branch when ExtractStep.success is false
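
The mapping and branching in this flow can be sketched in Python. The output shape, a success flag plus a data list whose first item holds the fields, is inferred from the bindings above and is an assumption. The function name build_row_update and the column names are hypothetical.

```python
from typing import Optional


def build_row_update(step_output: dict, column_map: dict) -> Optional[dict]:
    """Map fields of the first extracted object onto table columns.

    Returns None when the step failed or returned no data, so the caller
    can branch the way a Condition step would.
    """
    if not step_output.get("success"):
        return None
    data = step_output.get("data") or []
    if not data:
        return None
    first = data[0]  # corresponds to data.0 in the binding
    return {column: first.get(field) for field, column in column_map.items()}


# Invented sample values for illustration only.
step_output = {
    "success": True,
    "data": [{"invoiceNumber": "INV-001", "totalAmount": 120.5}],
}
column_map = {"invoiceNumber": "Invoice Number", "totalAmount": "Total"}

print(build_row_update(step_output, column_map))
```

Checking the success flag before reading data.0 avoids writing empty values into the row when extraction fails.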

If extraction fails, check the following:

Missing required inputs:
- Ensure both Document and Data schema are set
Source/input mismatch:
- URL source must receive a URL string
- Attachment source must receive an attachment object
URL fetch failures:
- Confirm the URL is reachable from your Budibase environment
- Confirm authentication or firewall rules are not blocking access
Schema/output parsing failures:
- Simplify your schema and retry
- Start with a few fields, then add more incrementally
No data found:
- The step can fail if the model cannot find matching values for your schema
- Try clearer field names and cleaner source documents