Announcing support for extracting data from identity documents using Amazon Textract

In the financial services sector, customers open new accounts online, and in health care, new digital platforms let patients schedule and manage appointments. Both require users to fill out forms. Some businesses and organizations have tried to automate and simplify this process by including identity document uploads, such as a driver's license or passport. You need a solution to help automate the extraction of information from identity documents so that your customers can open bank accounts with ease, or schedule and manage appointments online using accurate information.
Today, we are excited to announce a new API for Amazon Textract called Analyze ID that helps you automatically extract information from identification documents, such as driver's licenses and passports. Amazon Textract uses AI and ML technologies to extract information from identity documents, such as U.S. passports and driver's licenses, without the need for templates or configuration. You can automatically extract specific information, such as date of expiry and date of birth, as well as intelligently identify and extract implied information, such as name and address.
We will cover the following topics in this post:

How Amazon Textract processes identity documents
A walkthrough of the Amazon Textract console
Structure of the Amazon Textract AnalyzeID API response
How to process the response with the Amazon Textract parser library

Identity document processing using Amazon Textract
Some of you have streamlined and automated the online application process by asking your users to upload a picture of their ID, and then using market solutions to extract the data and prefill the applications automatically. However, these solutions typically fall short of extracting all of the required fields accurately, because of the rich background images on IDs or an inability to recognize names and addresses and the fields associated with them.
To solve this problem, you can now use Amazon Textract's newly launched Analyze ID API, powered by ML rather than a conventional template matching solution, to process identity documents at scale. It extracts (A) explicit fields, such as date of expiry and date of birth, and (B) implied fields on the document that might not have explicit keys, such as Name, Address, and Issued By. The key-value pairs are also normalized into a common taxonomy (for example, Document ID number = LIC # or Passport No.).
Amazon Textract console walkthrough
Before we get started with the API and code samples, let's review the Amazon Textract console. The following images show examples of a passport and a driver's license document on the Analyze Document output tab of the Amazon Textract console. Amazon Textract automatically and easily extracts key-value elements, such as the type, code, passport number, surname, given name, nationality, date of birth, place of birth, and more fields, from the sample image.

The following is another example with a sample driver's license. Analyze ID extracts key-value elements such as class, as well as implied fields such as first name, last name, and address. It also normalizes keys, such as “Document number” from “4d NUMBER” as “820BAC729CBAC”, and “Date of birth” from “DOB” as “03/18/1978”, so that they are standardized across IDs.
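Because the keys are normalized, the same lookup can serve every supported ID type. The following is a minimal sketch of that idea, assuming the field layout that the Analyze ID API returns (a list of entries, each with a Type and a ValueDetection, as described later in this post). The helper name fields_by_type is hypothetical, and the sample entries are written by hand from the driver's license values above rather than taken from real API output.

```python
from typing import Dict, List


def fields_by_type(identity_document_fields: List[dict]) -> Dict[str, str]:
    """Map each normalized key (Type.Text) to its detected value text."""
    return {
        field["Type"]["Text"]: field["ValueDetection"]["Text"]
        for field in identity_document_fields
    }


# Hand-written sample in the AnalyzeID field layout (not real API output).
license_fields = [
    {"Type": {"Text": "DOCUMENT_NUMBER"}, "ValueDetection": {"Text": "820BAC729CBAC"}},
    {"Type": {"Text": "DATE_OF_BIRTH"}, "ValueDetection": {"Text": "03/18/1978"}},
]

normalized = fields_by_type(license_fields)
print(normalized["DOCUMENT_NUMBER"])  # 820BAC729CBAC
print(normalized["DATE_OF_BIRTH"])    # 03/18/1978
```

The same function would work unchanged on the fields of a passport, because the printed label on the document no longer matters after normalization.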

AnalyzeID API request
In this section, we explain how to pass the ID image in the request and how to invoke the Analyze ID API. The input document is either a byte array or an object stored in Amazon Simple Storage Service (Amazon S3). You pass image bytes to an Amazon Textract API operation by using the Bytes property. For example, you can use the Bytes property to pass a document loaded from a local file system. Image bytes passed by using the Bytes property must be base64 encoded. Your code might not need to encode document file bytes if you're using an AWS SDK to call Amazon Textract API operations. Alternatively, you can pass images stored in an S3 bucket to an Amazon Textract API operation by using the S3Object property. Documents stored in an S3 bucket don't need to be base64 encoded.
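To make the encoding point concrete, here is a minimal sketch of what a request body could look like if you assembled it by hand instead of going through an SDK. The helper name build_raw_request_body is hypothetical, and the sketch only constructs the JSON payload; it does not sign or send the request. With an AWS SDK such as boto3, you would normally pass the raw bytes and let the SDK take care of the encoding.

```python
import base64
import json


def build_raw_request_body(image_bytes: bytes) -> str:
    """Build an AnalyzeID JSON body by hand (hypothetical helper).

    Only relevant when no AWS SDK is involved: the Bytes value has to
    travel as base64 encoded text inside the JSON document.
    """
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return json.dumps({"DocumentPages": [{"Bytes": encoded}]})


# Stand-in bytes for illustration; a real call would read an image file.
body = build_raw_request_body(b"fake-image-bytes")

# Decoding the payload again recovers the original bytes.
round_trip = base64.b64decode(json.loads(body)["DocumentPages"][0]["Bytes"])
print(round_trip == b"fake-image-bytes")
```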
The following examples show how to call the Amazon Textract AnalyzeID function in Python and with the CLI command.
Sample Python code:

import boto3

textract = boto3.client('textract')

# Call Textract AnalyzeID by passing a photo on local disk.
documentName = "us-driver-license.jpeg"
with open(documentName, 'rb') as document:
    imageBytes = bytearray(document.read())

response = textract.analyze_id(
    DocumentPages=[{'Bytes': imageBytes}]
)

# Call Textract AnalyzeID by passing a photo on S3.
# BUCKET_NAME and PREFIX_AND_FILE_NAME are placeholders; substitute your own.
response = textract.analyze_id(
    DocumentPages=[
        {"S3Object": {"Bucket": "BUCKET_NAME", "Name": "PREFIX_AND_FILE_NAME"}}
    ]
)

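Once either call returns, you will usually want to walk the fields in response and decide which ones to trust. The following sketch assumes the response layout described in the response section of this post (IdentityDocuments, each holding IdentityDocumentFields with a Type and a ValueDetection). The function name low_confidence_fields and the threshold of 99.0 are arbitrary choices for illustration, and the sample dictionary is a trimmed stand-in built from the driver's license values shown in this post, not a full API response.

```python
from typing import List, Tuple


def low_confidence_fields(response: dict, threshold: float) -> List[Tuple[str, str, float]]:
    """Return (type, text, confidence) for values scoring below threshold."""
    flagged = []
    for document in response.get("IdentityDocuments", []):
        for field in document.get("IdentityDocumentFields", []):
            value = field["ValueDetection"]
            if value["Confidence"] < threshold:
                flagged.append((field["Type"]["Text"], value["Text"], value["Confidence"]))
    return flagged


# Trimmed stand-in for an AnalyzeID response.
sample_response = {
    "IdentityDocuments": [
        {
            "DocumentIndex": 1,
            "IdentityDocumentFields": [
                {"Type": {"Text": "FIRST_NAME"},
                 "ValueDetection": {"Text": "GARCIA", "Confidence": 99.48689270019531}},
                {"Type": {"Text": "LAST_NAME"},
                 "ValueDetection": {"Text": "MARIA", "Confidence": 98.49578857421875}},
            ],
        }
    ]
}

# Fields below the threshold could be routed to a manual review queue.
for field_type, text, confidence in low_confidence_fields(sample_response, threshold=99.0):
    print(f"Review {field_type}: {text!r} ({confidence:.2f})")
```

The right threshold depends on how costly a wrong value is for your application, so treat 99.0 as a placeholder rather than a recommendation.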
Sample CLI command:

aws textract analyze-id --document-pages '[{"S3Object":{"Bucket":"BUCKET_NAME","Name":"PREFIX_AND_FILE_NAME"}}]' --region us-east-1

BUCKET_NAME and PREFIX_AND_FILE_NAME are placeholders for your own bucket and object key.

Analyze ID API response
In this section, we describe the Analyze ID response structure using the sample passport image. The following is the sample passport image and the corresponding AnalyzeID response JSON.

The AnalyzeID JSON output contains AnalyzeIDModelVersion, DocumentMetadata, and IdentityDocuments, and each IdentityDocument item contains IdentityDocumentFields.
The most granular level of data in the IdentityDocumentFields response consists of Type and ValueDetection.
Let's call this set of data an IdentityDocumentField element. The preceding example shows an IdentityDocumentField containing the Type with the Text and Confidence, and the ValueDetection that includes the Text, the Confidence, and the optional field NormalizedValue.
In the preceding example, Amazon Textract detected 44 key-value pairs, including PLACE_OF_BIRTH: New York City. For the list of fields extracted from identity documents, refer to the Amazon Textract Developer Guide.
In addition to the detected content, the Analyze ID API provides information such as confidence scores for detected elements. It gives you control over how you consume extracted content and integrate it into your applications. For example, you can flag any elements that have a confidence score under a certain threshold for manual review.
The following is the Analyze ID response structure using the sample driving license image:
Sample shortened response:

{
  "IdentityDocuments": [
    {
      "DocumentIndex": 1,
      "IdentityDocumentFields": [
        {
          "Type": { "Text": "FIRST_NAME" },
          "ValueDetection": { "Text": "GARCIA", "Confidence": 99.48689270019531 }
        },
        {
          "Type": { "Text": "EXPIRATION_DATE" },
          "ValueDetection": { "Text": "... 2028", "NormalizedValue": { ... }, "Confidence": 98.64090728759766 }
        },
        {
          "Type": { "Text": "LAST_NAME" },
          "ValueDetection": { "Text": "MARIA", "Confidence": 98.49578857421875 }
        },
        {
          "Type": { ... },
          "ValueDetection": { "Text": "", "Confidence": 99.62541198730469 }
        },
        {
          "Type": { "Text": "ID_TYPE" },
          "ValueDetection": { "Text": "DRIVER LICENSE FRONT", "Confidence": 98.71986389160156 }
        },
        ...
      ]
    }
  ],
  "DocumentMetadata": { ... },
  "AnalyzeIDModelVersion": "1.0"
}

Process Analyze ID response with the Amazon Textract parser library
You can use the Amazon Textract response parser library to easily parse the JSON returned by Amazon Textract AnalyzeID. The library parses JSON and provides programming language specific constructs to work with different parts of the document.
Install the Amazon Textract Response Parser library:

python -m pip install amazon-textract-response-parser

The following example shows how to deserialize a Textract AnalyzeID JSON response to an object:

import json
from trp.trp2_analyzeid import TAnalyzeIdDocumentSchema

# j holds the Textract response JSON
t_doc = TAnalyzeIdDocumentSchema().load(json.loads(j))

The following example shows how to serialize a Textract AnalyzeId object to a dictionary:

from trp.trp2_analyzeid import TAnalyzeIdDocumentSchema

t_doc = TAnalyzeIdDocumentSchema().dump(t_doc)

Summary
In this post, we provided an overview of the new Amazon Textract AnalyzeID API to quickly and easily retrieve structured data from U.S. government-issued driver's licenses and passports. We also described how you can parse the Analyze ID response JSON. For more information, see the Amazon Textract Developer Guide, or check out the developer console and try out the Analyze ID API.

About the Authors
Wrick Talukdar is a Senior Solutions Architect with AWS and is based in Calgary, Canada. Wrick works with enterprise AWS customers to transform their business through innovative use of cloud technologies. Outside of work, he enjoys reading and photography.
Lana Zhang is a Sr. Solutions Architect at AWS with expertise in Machine Learning. She is responsible for helping customers architect scalable, secure, and cost-effective workloads on AWS.
