OCR connector
To enable the processing of unstructured files such as PDFs or images, Data Capture includes native connectivity with optical character recognition (OCR) platforms.
To use these services, you must first create a user account with the selected OCR provider.
Mindee
To connect Data Capture to the Mindee API, follow these steps:
- Create an account and retrieve the API key:
  - Sign up on the Mindee platform (Sign Up) to create an account.
  - Once logged in, retrieve your secret key in the Mindee API Keys section.
- Overview of available APIs. Mindee offers several APIs designed to extract data using its OCR engine. Here are some of the types of documents supported:
  - Invoices
  - Expenses
  - Receipts
  - ID Documents (identity documents, passports)
  Each API has its own documentation, which specifies in particular:
  - The version of the API you are using
  - The endpoints for making API calls
Next, in Data Capture, register your Mindee account and its Document API configuration as follows:
- Go to Data Capture → Configuration → Mindee → Mindee Accounts.
- Click on “Add Account” (+ icon) to configure a new Mindee account.
- Enter the secret API token obtained on the Mindee platform.
- Select the type of document you want to process (e.g., Invoices or Expenses).
- Specify the API version and the endpoint corresponding to this document.
You can also define a default Mindee API, which is used automatically when a template is generated via this platform.
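As an illustration of how the account fields fit together, the following minimal sketch assembles the URL and authentication header for a prediction call. The class, its field names, and the `owner` field are hypothetical and not part of Data Capture; the base URL and path layout are assumptions about Mindee's v1 REST API and should be checked against the documentation of the API you selected.

```python
from dataclasses import dataclass

# Assumed base URL of Mindee's v1 REST API; verify it in the API documentation.
MINDEE_BASE_URL = "https://api.mindee.net/v1"


@dataclass(frozen=True)
class MindeeAccountConfig:
    """Hypothetical mirror of the fields entered in the Mindee account form."""
    api_key: str        # secret API token retrieved on the Mindee platform
    document_type: str  # type of document to process, e.g. "Invoices"
    owner: str          # assumed: account that publishes the API
    endpoint: str       # endpoint name taken from the API documentation
    version: str        # API version taken from the API documentation


def build_predict_request(config):
    """Return the URL and headers for a prediction call (no network access)."""
    url = (
        f"{MINDEE_BASE_URL}/products/{config.owner}/"
        f"{config.endpoint}/v{config.version}/predict"
    )
    headers = {"Authorization": f"Token {config.api_key}"}
    return url, headers


config = MindeeAccountConfig(
    api_key="my-secret-key",  # placeholder, not a real key
    document_type="Invoices",
    owner="mindee",
    endpoint="invoices",
    version="4",
)
url, headers = build_predict_request(config)
print(url)
```

Keeping the version and endpoint as separate fields mirrors the form described above, so switching to another document type only changes the configuration, not the calling code.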
Mindee Schema
In some cases, the data provided by Mindee may be structured in a nested manner, meaning that an expected value may be found inside a sub-element.
- Example: Currency. In the Mindee JSON response, the currency is held in the “currency” attribute, which is itself nested inside the “locale” sub-element.
For Data Capture to identify and extract the currency code correctly, you must indicate the nesting level of the data using the $ operator.
To reach the currency value in this structure, use the notation locale$currency, which targets the value “EUR” directly.
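The behaviour of the $ notation can be sketched in a few lines. The JSON extract below is a hypothetical, simplified response (real Mindee responses carry more fields), and `resolve_path` is an illustrative helper, not the function Data Capture uses internally.

```python
import json

# Hypothetical, simplified extract of a Mindee response.
raw = '{"locale": {"currency": "EUR", "language": "fr"}}'


def resolve_path(data, path, separator="$"):
    """Follow each segment of a path such as 'locale$currency' through nested dicts."""
    current = data
    for key in path.split(separator):
        if not isinstance(current, dict) or key not in current:
            return None  # the path does not exist in this response
        current = current[key]
    return current


prediction = json.loads(raw)
print(resolve_path(prediction, "locale$currency"))  # EUR
```

Each `$` descends one level, so a deeper structure would simply need a longer path.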
In addition, some data in Mindee responses is returned as lists; this is particularly the case for invoice lines.
To extract line data correctly from this type of structure, you must introduce a single parent element that groups all the elements of the list.
- Example: Invoice lines
In Data Capture, you can define a named group (for example, line_items) to represent all the lines of the invoice. This group acts as a parent container: each line can be processed individually while remaining linked to its original document.
The line data is then accessed via line_items, which simplifies its use and its mapping to Axelor Open Suite.
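To make the role of the parent group concrete, here is a minimal sketch that walks a list stored under line_items and produces one record per line, each tied back to its document. The response content, the field names inside each line, and the `extract_lines` helper are assumptions made for illustration only.

```python
# Hypothetical, simplified response: one document-level field plus a list of lines.
prediction = {
    "invoice_number": "INV-001",
    "line_items": [
        {"description": "Keyboard", "quantity": 2, "total_amount": 50.0},
        {"description": "Mouse", "quantity": 1, "total_amount": 15.0},
    ],
}


def extract_lines(data, group, fields, parent_field):
    """Return one record per list element, each linked to its parent document."""
    parent_ref = data.get(parent_field)
    records = []
    for index, item in enumerate(data.get(group, []), start=1):
        record = {"document": parent_ref, "line": index}
        for field in fields:
            record[field] = item.get(field)  # None when the line lacks the field
        records.append(record)
    return records


lines = extract_lines(
    prediction, "line_items", ["description", "total_amount"], "invoice_number"
)
for line in lines:
    print(line)
```

Because every record carries the document reference, the lines can be mapped separately without losing their link to the invoice.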
Capture settings – Mindee
When configuring capture settings for integration with Mindee, two essential elements must be defined:
- Mindee account: this setting handles authentication and connection to the Mindee platform, giving authenticated access to its document extraction services.
- Document API: this setting specifies the document analysis model to be used. Each model corresponds to a document type (invoice, receipt, ID, etc.) and determines the processing applied during data extraction.
The Data Capture model then uses these two parameters to connect to the Mindee API, submit the document for analysis with the appropriate OCR engine, and extract structured data from it.
Data Capture model – Mindee
After data has been extracted via the Mindee API, the Data Capture model plays a central role in verifying it.
In particular, it lets you debug the extracted information by recording the raw response returned by Mindee. This response is stored automatically in the model, so you can inspect exactly what was captured.
This feature facilitates:
- analysis of the extracted data;
- detection of any anomalies or inaccuracies;
- validation of information before it is used or transformed in business processes.
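As a sketch of the kind of check the stored raw response makes possible, the following example flags fields that are empty or have a low confidence score. The response layout (a "value" and a "confidence" per field), the field names, and the 0.8 threshold are assumptions chosen for illustration; adapt them to the response you actually see in the model.

```python
import json

# Hypothetical stored raw response; the "value"/"confidence" layout is an assumption.
raw_response = json.dumps({
    "total_amount": {"value": 65.0, "confidence": 0.99},
    "supplier_name": {"value": None, "confidence": 0.0},
    "date": {"value": "2024-01-15", "confidence": 0.42},
})


def find_anomalies(raw, min_confidence=0.8):
    """List fields that are empty or whose confidence is below the threshold."""
    issues = []
    for name, field in json.loads(raw).items():
        if field.get("value") is None:
            issues.append((name, "missing value"))
        elif field.get("confidence", 0.0) < min_confidence:
            issues.append((name, "low confidence"))
    return issues


for name, reason in find_anomalies(raw_response):
    print(f"{name}: {reason}")
```

Running such a check before the data is mapped helps catch incomplete extractions early, while the raw response remains available for closer inspection.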