/extraction/

The extraction endpoint extracts data from document images. Images should preferably be validated first using the /validation endpoint, but this endpoint can also be used independently of it. The supported REST method is POST on /extraction/, which submits the document images in a JSON body alongside the settings that the service should take into account when extracting the data.
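
As a rough illustration of the POST submission described above, the sketch below prepares a JSON request with Python's standard library. The base URL is a placeholder and the helper name build_http_request is hypothetical; any authentication headers the service may require are not covered by this page and are left out.

```python
import json
import urllib.request


def build_http_request(base_url: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request to the /extraction/ endpoint."""
    # base_url is an assumption; use the address of your own service deployment.
    url = base_url.rstrip("/") + "/extraction/"
    payload = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Sending is left to the caller, for example:
# with urllib.request.urlopen(build_http_request(BASE_URL, body)) as resp:
#     result = json.load(resp)
```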

Request format

The request contains the keys described by the form on this page.

The request is split into three important sections:

AcceptTermsAndConditions - the terms and conditions, which have to be accepted to successfully perform any extraction
DataFields - contains the image data
Settings - contains multiple settings that can be used to tailor the response to your needs

Important notes regarding the request

AcceptTermsAndConditions must be set to true, or the request won't be accepted.

The Front and Back image keys in DataFields do not have to contain the front and back sides respectively; the sides can be switched. This is allowed for backwards compatibility. The only restriction is that if the submitted document has only a front side (most European driver's licenses and all passports), the IgnoreBackImage setting must be set to true.
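
The notes above can be sketched as a small request builder. This is a minimal sketch under several assumptions: the key names FrontImage and BackImage are hypothetical (check the request form for the exact names), the images are assumed to be sent base64 encoded, and IgnoreBackImage is assumed to live inside Settings.

```python
import base64
from typing import Optional


def build_extraction_request(front: bytes, back: Optional[bytes] = None) -> dict:
    """Build a JSON-serializable body for POST /extraction/."""
    # "FrontImage" / "BackImage" are assumed key names, not confirmed by this page.
    data_fields = {"FrontImage": base64.b64encode(front).decode("ascii")}
    settings = {}
    if back is None:
        # Single-sided documents (e.g. passports) require IgnoreBackImage = true.
        settings["IgnoreBackImage"] = True
    else:
        data_fields["BackImage"] = base64.b64encode(back).decode("ascii")
    return {
        # Must be true, or the request won't be accepted.
        "AcceptTermsAndConditions": True,
        "DataFields": data_fields,
        "Settings": settings,
    }
```

Calling build_extraction_request(front_bytes) with no second argument yields a body suitable for a passport-style document, while passing both images leaves IgnoreBackImage unset.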

Settings structure

Settings contains options that tailor responses to your needs. Each setting is used as follows:

ShouldReturnDocumentImage - indicates whether the cropped image of the document should be returned (we recommend displaying this image to the user so they can verify that the document was cropped properly)
ShouldReturnFaceIfDetected - indicates if the service should return face image from the document
ShouldReturnSignatureIfDetected - indicates if the service should return signature image from the document
SkipDocumentsSizeCheck - if set to true, any problems tied to input image size (document too far away from the camera or similar) will be added to warnings list and will not raise a BAD REQUEST 400 response
SkipImageSizeCheck - if set to true, any problems tied to low image resolution will be added to warnings list and will not raise a BAD REQUEST 400 response
CanStoreImages - allows the service to store the images that could be used for any troubleshooting or service improvements if set to true
EnforceDocsSameCountryTypeSeries - if set to true, raises a BAD REQUEST 400 error if the front side of the document doesn't correspond to the back side. This can cause false rejections, since some documents have very similar versions, which could lead to BAD REQUEST 400 being raised unnecessarily; we recommend setting this flag to false
CaseSensitiveOutput - if set to true, the output preserves the letter case of the text on the document; otherwise all letters are returned in uppercase
FaceImageResize - isn't a flag, but a field that takes the format XxY (e.g. 300x400), which indicates the resize dimensions for the returned face image
SignatureImageResize - isn't a flag, but a field that takes the format XxY (e.g. 300x400), which indicates the resize dimensions for the returned signature image
SegmentedImageResize - isn't a flag, but a field that takes the format XxY (e.g. 300x400), which indicates the resize dimensions for the returned cropped document images
StoreFaceImage - if set to true, and if ShouldReturnFaceIfDetected is also set to true, the face image will be stored in the AWS S3 bucket defined in the service environment
DontUseValidation - if set to false, the service runs one validation call prior to extraction and terminates the transaction if the image is inadequate. This option is currently true by default, so extraction does not perform the additional validation call

Bolded settings are important and could cause your requests to fail, so be mindful of them. The other settings mainly tailor the response format.
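
To illustrate the settings described above, here is a minimal sketch that checks the XxY resize format on the client side and assembles a Settings dictionary following the recommendations in this section. The helper names parse_resize and build_settings are hypothetical, and the service may apply its own, different validation of the resize string.

```python
import re
from typing import Tuple

# Matches strings such as "300x400": digits, a lowercase "x", digits.
_RESIZE_PATTERN = re.compile(r"^(\d+)x(\d+)$")


def parse_resize(value: str) -> Tuple[int, int]:
    """Split an XxY resize string into its two integer components."""
    match = _RESIZE_PATTERN.match(value)
    if match is None:
        raise ValueError(f"expected format XxY (e.g. 300x400), got {value!r}")
    x, y = int(match.group(1)), int(match.group(2))
    if x == 0 or y == 0:
        raise ValueError("resize dimensions must be positive")
    return x, y


def build_settings(face_resize: str = "300x400") -> dict:
    """Assemble a Settings dictionary using the recommendations in this section."""
    parse_resize(face_resize)  # raises ValueError before anything is sent
    return {
        "ShouldReturnDocumentImage": True,   # recommended: show the crop to the user
        "ShouldReturnFaceIfDetected": True,
        "EnforceDocsSameCountryTypeSeries": False,  # recommended value
        "FaceImageResize": face_resize,
    }
```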

Response

The service responds with a JSON body containing the extraction results, as shown below:

{
    "TransactionID": "e8edb095-5640-4250-b59a-463df488db87",
    "UploadedAt": "2023-06-13T08:44:31.398563",
    "ProductName": "Scan App v8.2.2",
    "Errors": [],
    "Warnings": [],
    "Status": 200,
    "Method": "Extraction",
    "InfoCode": "1000",
    "Data": {
        "Name": {
            "Read": false,
            "Validated": false,
            "RecommendedValue": null,
            "MRZ": {
                "Read": false,
                "Value": null,
                "Validated": false
            },
            "OCR": {
                "Read": false,
                "Value": null,
                "Validated": false
            }
        },
        "BirthDate": {
            "Read": true,
            "Validated": true,
            "RecommendedValue": "25.11.1979",
            "OCR": {
                "Read": true,
                "Validated": true,
                "Value": "25.11.1979"
            },
            "MRZ": {
                "Read": false,
                "Value": null,
                "Validated": false
            }
        },
        {
        ...
        }
    },
    "Metadata": [
        {
            "Country": "HRV",
            "DocumentSide": "FRONT",
            "DocumentType": "ID",
            "DocumentSeries": "HRV-BO-04001"
        }
    ],
    "ImageData": {
        "Documents": [
            "...", "..."
        ],
        "FaceImage": "...",
        "Signature": "..."
    },
    "AnalysisTime": "1493.297 ms"
}

The response contains fields similar to those of the /validation endpoint, with a couple of differences. The /extraction response doesn't contain an info field; errors and warnings should be used instead. You can check info codes and their meaning here.
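
Since errors and warnings are the signals to inspect, a client might start with a check like the sketch below. It assumes the parsed response is a dictionary shaped like the example above; the format of individual entries in Errors and Warnings is not specified on this page, so they are passed through untouched. The function name check_response is hypothetical.

```python
def check_response(response: dict) -> list:
    """Raise if the extraction failed; otherwise return the list of warnings."""
    errors = response.get("Errors") or []
    status = response.get("Status")
    if status != 200 or errors:
        raise RuntimeError(f"Extraction failed (status {status}): {errors}")
    # Warnings do not fail the request but are worth logging or showing.
    return list(response.get("Warnings") or [])
```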

Response data

The most important response fields are

Data
ImageData
Metadata

The Data field is a key-value dictionary where the keys are document fields (e.g. name, surname, ...) and the values are dictionaries with the following key-value structure:

"FieldKey":
{
  "Read": true,
  "Validated": true,
  "RecommendedValue": "<Recommended-field-value>",
  "OCR": {
    "Read": true,
    "Validated": true,
    "Value": "<Value-read-from-the-document>"
  },
  "MRZ": {
    "Read": false,
    "Value": "<Value-read-from-the-mrz>",
    "Validated": false
  }
}

When unpacking the Data results, we recommend checking each field's Read key and taking RecommendedValue as the reading. See the "What's next" section for a detailed list of all available fields.
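
The recommendation above can be sketched as a short helper that keeps only the fields whose top-level Read key is true. The function name recommended_values is hypothetical; the key names come from the response example.

```python
def recommended_values(data: dict) -> dict:
    """Map each successfully read field name to its RecommendedValue."""
    return {
        name: field.get("RecommendedValue")
        for name, field in data.items()
        if field.get("Read")  # skip fields the service could not read
    }
```

Applied to the example response, this would keep BirthDate and drop Name, because Name has Read set to false.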

ImageData contains images that were modified according to the Settings input. ImageData comprises Documents, Signature, and FaceImage.

Documents contains one or two segmented images (depending on the number of images submitted in the request) in front/back order.

The Signature key holds the signature image and the FaceImage key holds the face image.

All image data is base64 encoded.
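
A minimal sketch of decoding the ImageData section into raw bytes follows. It assumes that images which were not requested or not detected are either missing or null in the response; the function name decode_images and the document_N labels are hypothetical.

```python
import base64


def decode_images(image_data: dict) -> dict:
    """Decode the base64 strings in ImageData into raw image bytes."""
    decoded = {}
    # Documents holds one or two segmented images in front/back order.
    for index, document in enumerate(image_data.get("Documents") or []):
        decoded[f"document_{index}"] = base64.b64decode(document)
    for key in ("FaceImage", "Signature"):
        value = image_data.get(key)
        if value:  # assumed to be absent or null when not returned
            decoded[key] = base64.b64decode(value)
    return decoded
```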

Metadata contains metadata about each submitted image in the order of submission (front/back):

Document country
Document type
Document series
Document side
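
Because both Metadata and ImageData.Documents are described as being in front/back order, one possible way to combine them is to pair entries by position, as sketched below. The positional alignment is an assumption based on that description, and the function name describe_documents is hypothetical.

```python
def describe_documents(response: dict) -> list:
    """Pair each Metadata entry with the segmented image at the same position."""
    metadata = response.get("Metadata") or []
    documents = (response.get("ImageData") or {}).get("Documents") or []
    described = []
    for index, meta in enumerate(metadata):
        described.append({
            "side": meta.get("DocumentSide"),
            "country": meta.get("Country"),
            "type": meta.get("DocumentType"),
            "series": meta.get("DocumentSeries"),
            # None when no segmented image was returned for this position.
            "image_b64": documents[index] if index < len(documents) else None,
        })
    return described
```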