Version: 8.4

OCR connector

To enable the processing of unstructured files such as PDFs or images, Data Capture includes native connectivity with optical character recognition (OCR) platforms.

To use these services, you must first create a user account with the selected OCR provider.

1.1. OCR connector.

Mindee

To connect Data Capture to the Mindee API, follow these steps:

  1. Create an account and retrieve the API key:
  • Sign up on the Mindee platform to create an account.

  • Once logged in, retrieve your secret key in the Mindee API Keys section.

  2. Overview of available APIs. Mindee offers several APIs designed to extract data using its OCR engine. Supported document types include:
  • Invoices

  • Expenses

  • Receipts

  • ID Documents (identity documents, passports)

Each API has its own documentation, which specifies in particular:

  • The version of the API you are using

  • The endpoints for making API calls

1.1. Version & Endpoint name.
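To illustrate how the API version and endpoint name fit together, the sketch below assembles a prediction request without sending it. The URL layout and the "Authorization: Token ..." header are assumptions based on Mindee's public v1 REST API and should be confirmed in the documentation of the API you selected. The names build_predict_request and PredictRequest and the placeholder key are hypothetical and are not part of Data Capture.

```python
from dataclasses import dataclass

# Assumption: base URL of Mindee's v1 REST API; confirm it in the API's own documentation.
MINDEE_BASE_URL = "https://api.mindee.net/v1"


@dataclass(frozen=True)
class PredictRequest:
    url: str
    headers: dict


def build_predict_request(api_key: str, endpoint: str, version: str,
                          account: str = "mindee") -> PredictRequest:
    """Assemble the URL and headers for a prediction call (no network access)."""
    # Accept both "4" and "v4" as the version value.
    version = version.lstrip("v")
    url = f"{MINDEE_BASE_URL}/products/{account}/{endpoint}/v{version}/predict"
    # Assumption: the secret key is sent as a token in the Authorization header.
    headers = {"Authorization": f"Token {api_key}"}
    return PredictRequest(url=url, headers=headers)


# Placeholder key, endpoint, and version, for illustration only.
request = build_predict_request("my-secret-key", endpoint="invoices", version="4")
print(request.url)
```

The endpoint name and version passed here correspond to the two values listed in each API's documentation.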

Next, in Data Capture, add your Mindee account and Document API configurations as follows:

  1. Access: Data Capture → Configuration → Mindee → Mindee Accounts.

  2. Click on “Add Account” (+ icon) to configure a new Mindee account.

  3. Enter the secret API token obtained on the Mindee platform.

1.2. Username & API Token.
  4. Select the type of document you want to process (e.g., Invoices, Expenses, etc.).

  5. Specify the API version and the endpoint corresponding to this document.

1.3. Document API.
Note

You can also define a default Mindee API, which is used automatically when a template is generated via this platform.

1.4. Document API (default Mindee account).

Mindee Schema

In some cases, the data provided by Mindee may be structured in a nested manner, meaning that an expected value may be found inside a sub-element.

  1. Example: Currency. In the JSON extract below, the currency is stored in the “currency” attribute, which is itself nested in the “locale” sub-element.
1.1. JSON extract.

In order for Data Capture to identify and extract the currency code correctly, it is necessary to specify the depth level of the data using the $ operator.

Thus, to access the currency value in this structure, we will use the notation locale$currency, allowing us to target the value “EUR” directly.

1.2. The locale$currency notation allows you to target the EUR value directly.
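The $ notation described above can be pictured as a path walked one level at a time. The sketch below is an illustration only, not Data Capture's actual implementation: resolve_path is a hypothetical helper, and the sample dictionary is a made-up extract in which only the locale and currency keys come from the example above.

```python
from typing import Any

SEPARATOR = "$"  # depth operator used in the notation above


def resolve_path(data: dict, path: str) -> Any:
    """Walk a nested dict along a '$'-separated path; return None if a key is missing."""
    current: Any = data
    for key in path.split(SEPARATOR):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


# Hypothetical extract; the "language" entry is invented for illustration.
sample = {"locale": {"currency": "EUR", "language": "fr"}}
print(resolve_path(sample, "locale$currency"))  # EUR
```

Returning None for a missing key, rather than raising an error, is a design choice of this sketch.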

In addition, some data in Mindee responses may be returned as lists, which is typically the case for invoice lines.

In this type of structure, correct extraction of line data requires the introduction of a single parent element, serving as a consistent grouping for all elements in the list.

For example, in Data Capture, it is possible to define a group of elements that represents all the lines of the invoice. This allows each line to be processed individually while associating them with their main document.

  1. Example: Invoice lines
1.3. Invoice lines.

In Data Capture, you can define a named group (for example, line_items) to represent all the lines. This group acts as a parent container that structures the lines logically, so each line can be isolated while remaining linked to the original document.

The line data is then accessed via line_items, which facilitates its use and mapping to Axelor Open Suite.

1.4. In the image: line_items.
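As a rough picture of what the line_items group provides, the sketch below turns a list of invoice lines into one record per line, each carrying a reference to its parent document. It is not the Data Capture implementation: flatten_lines is a hypothetical helper, and the invoice_number, description, quantity, and unit_price keys are assumed names for the sake of the example.

```python
def flatten_lines(document: dict, group: str = "line_items") -> list[dict]:
    """Return one record per line, each tagged with its parent document reference."""
    parent_ref = document.get("invoice_number")
    return [{"invoice_number": parent_ref, **line} for line in document.get(group, [])]


# Hypothetical invoice with two lines, for illustration only.
sample = {
    "invoice_number": "INV-001",
    "line_items": [
        {"description": "Widget", "quantity": 2, "unit_price": 10.0},
        {"description": "Gadget", "quantity": 1, "unit_price": 25.5},
    ],
}

for row in flatten_lines(sample):
    print(row)
```

Each resulting record can be handled on its own while still pointing back to the invoice it came from.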

Capture settings – Mindee

When configuring capture settings for integration with Mindee, two essential elements must be defined:

  • Mindee account: this setting enables authentication and connection to the Mindee platform. It guarantees secure access to document extraction services.

  • Document API: this setting specifies the document analysis model to be used. Each model corresponds to a document type (invoice, receipt, ID, etc.) and guides the processing applied during data extraction.

The Data Capture model then uses these two parameters to connect to the Mindee API, analyze the transmitted document with the appropriate OCR engine, and extract structured data from it.

1.1. Mindee configurations.

Data Capture model – Mindee

Following data extraction via the Mindee API, the Data Capture model plays a central role in the data verification process.

In particular, it allows you to debug the extracted information by recording the raw response returned by Mindee. This response is automatically stored in the model, providing complete transparency on the captured data.

This feature facilitates:

  • analysis of the extracted data;

  • detection of any anomalies or inaccuracies;

  • validation of information before it is used or transformed in business processes.

1.1. Image: OCR Response anomaly.
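The kind of review that the stored raw response makes possible can be sketched as a simple check over the extracted fields. The example below assumes that each field is a small object with a value and a confidence score. That layout, the find_anomalies helper, and the 0.8 threshold are all assumptions for illustration and should be checked against the actual response recorded in the model.

```python
def find_anomalies(prediction: dict, min_confidence: float = 0.8) -> list[str]:
    """List fields whose value is missing or whose confidence is below the threshold."""
    issues = []
    for name, field in prediction.items():
        if not isinstance(field, dict):
            continue  # skip entries that are not simple {value, confidence} fields
        value = field.get("value")
        confidence = field.get("confidence", 0.0)
        if value is None:
            issues.append(f"{name}: no value extracted")
        elif confidence < min_confidence:
            issues.append(f"{name}: low confidence ({confidence:.2f})")
    return issues


# Hypothetical fields, shaped according to the assumption stated above.
raw_prediction = {
    "total_amount": {"value": 120.0, "confidence": 0.99},
    "date": {"value": None, "confidence": 0.0},
    "supplier_name": {"value": "ACME", "confidence": 0.42},
}

print(find_anomalies(raw_prediction))
```

Fields flagged this way are candidates for manual review before the data is passed on to business processes.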