Getting started overview

This guide outlines the essential steps to begin using AWS Textract: setting up an account, managing credentials, and making your first API request. AWS Textract is a machine learning service that automatically extracts text, handwriting, and data from scanned documents. It goes beyond simple optical character recognition (OCR) to identify form fields and information stored in tables, without manual configuration or model training (see the AWS Textract features overview).

The process involves:

  1. Creating an AWS account and an IAM user.
  2. Generating access keys for programmatic interaction.
  3. Configuring your local development environment.
  4. Writing and executing a simple code example to make a Textract API call.

Familiarity with basic AWS concepts, such as Identity and Access Management (IAM) and the AWS Management Console, will make setup easier. This guide focuses on direct API interaction, primarily using the AWS SDKs for Python and Java, which are commonly used for Textract integrations (see the AWS Textract product page).

Quick Reference Table

Step | What to do | Where
1. Create AWS Account | Sign up for a new AWS account. | AWS homepage
2. Create IAM User | Create a dedicated IAM user with programmatic access. | AWS IAM User Guide
3. Attach Policy | Attach the AmazonTextractFullAccess policy to the IAM user. | AWS Textract Access Control
4. Generate Access Keys | Generate an Access Key ID and Secret Access Key for the IAM user. | AWS IAM User Guide on Access Keys
5. Configure Environment | Install AWS SDK and configure credentials locally. | AWS CLI Configuration Guide
6. Make First Request | Execute sample code to detect text or analyze a document. | This guide's "Your first request" section, and AWS Textract Developer Guide

Create an account and get keys

To interact with AWS Textract programmatically, you need an AWS account and credentials. If you don't have an AWS account, you can create one by visiting the AWS homepage and following the sign-up process. This typically involves providing an email address, a password, and a credit card for billing purposes, though Textract offers a free tier for initial usage (see AWS Textract pricing details).

IAM User and Permissions

It is a security best practice to create an AWS Identity and Access Management (IAM) user for programmatic access instead of using your root account credentials. This allows you to grant specific permissions necessary for Textract operations without over-privileging your access.

  1. Sign in to the AWS Management Console: Use your root account or an existing IAM user with administrative privileges.
  2. Navigate to IAM: In the console, search for "IAM" and select the service.
  3. Create a new IAM user:
    • Go to "Users" in the left navigation pane and click "Add users".
    • Enter a user name (e.g., textract-user).
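    • Select "Access key - Programmatic access" as the AWS credential type (see the IAM documentation on creating IAM users). Newer versions of the console may not show this option during user creation; in that case, create the access key afterwards from the user's "Security credentials" tab.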
    • Click "Next: Permissions".
  4. Attach permissions policy:
    • On the "Set permissions" page, select "Attach existing policies directly".
    • Search for AmazonTextractFullAccess and select it. This policy grants the necessary permissions to perform all Textract actions. For production environments, consider creating a more restrictive custom policy based on the principle of least privilege (see the Textract access control documentation).
    • Click "Next: Tags" (optional) and then "Next: Review".
    • Review the user details and click "Create user".
  5. Retrieve Access Keys:
    • After the user is created, you will see the "Access key ID" and "Secret access key".
    • Important: Copy these keys immediately. The secret access key is only displayed once. If you lose it, you will need to generate new keys.
    • Store these keys securely.
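
As an illustration of the least-privilege suggestion in step 4, the following sketch builds a custom policy document that allows only the two synchronous Textract actions. The function name build_textract_policy and the Sid value are hypothetical, and the action list is an assumption about what a first project needs; adjust it to the operations you actually call.

```python
import json


def build_textract_policy(actions=("textract:DetectDocumentText",
                                   "textract:AnalyzeDocument")):
    """Return an IAM policy document that allows only the listed Textract actions.

    Hypothetical helper for illustration; the default action list is an
    assumption about which operations a first project needs.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "TextractSyncOnly",  # illustrative statement ID
                "Effect": "Allow",
                "Action": sorted(actions),
                "Resource": "*",
            }
        ],
    }


if __name__ == "__main__":
    # Print the JSON so it can be pasted into the IAM policy editor.
    print(json.dumps(build_textract_policy(), indent=2))
```

The printed JSON can be pasted into the policy editor in the IAM console and attached to the user in place of AmazonTextractFullAccess.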

Configure AWS CLI and SDKs

You can configure your local environment using the AWS Command Line Interface (CLI) or directly through environment variables/SDK configuration files.

Using AWS CLI (Recommended):

  1. Install AWS CLI: Follow the instructions for your operating system on the AWS CLI installation guide.
  2. Configure AWS CLI: Open your terminal or command prompt and run: aws configure
  3. When prompted, enter your Access Key ID, Secret Access Key, default region (e.g., us-east-1), and default output format (e.g., json) (see the AWS CLI configuration documentation).
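
After running aws configure, a quick sanity check is to confirm that the credentials file contains both keys for the profile you intend to use. The sketch below is a minimal example that assumes the standard INI layout of ~/.aws/credentials; the helper name missing_credential_keys is hypothetical, and the check only verifies that the keys are present, not that AWS accepts them.

```python
import configparser
from pathlib import Path

REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def missing_credential_keys(credentials_path, profile="default"):
    """Return the required keys that are absent or empty for the given profile."""
    # Disable interpolation so unusual characters in a secret are read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(credentials_path)  # a missing file simply yields no sections
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    return [
        key for key in REQUIRED_KEYS
        if not parser.get(profile, key, fallback="").strip()
    ]


if __name__ == "__main__":
    path = Path.home() / ".aws" / "credentials"
    missing = missing_credential_keys(path)
    if missing:
        print(f"Missing in {path}: {', '.join(missing)}")
    else:
        print("Both keys are present for the default profile.")
```

An empty result means both entries exist for the profile.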

Your first request

This section demonstrates how to make a basic Textract API call using Python (Boto3 SDK) and Java (AWS SDK for Java). Ensure your environment is configured with the AWS credentials as described above.

Python (Boto3)

First, install the Boto3 SDK:

pip install boto3

Next, create a Python script (e.g., detect_text.py) to detect text in an image. For this example, you'll need a sample image file (e.g., document.png) in the same directory or accessible via a path. You can use any image with text, such as a screenshot or a scanned document.

import boto3

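def detect_text_from_image(image_path, region_name='us-east-1'):
    """Send a local image to DetectDocumentText and print each detected word."""
    client = boto3.client('textract', region_name=region_name)

    # Read the raw image bytes; the SDK handles the encoding for the request.
    with open(image_path, 'rb') as document:
        image_bytes = document.read()

    response = client.detect_document_text(
        Document={'Bytes': image_bytes}
    )

    # The response contains PAGE, LINE, and WORD blocks; print only the words.
    print('Detected Text:')
    for item in response['Blocks']:
        if item['BlockType'] == 'WORD':
            print(f"  {item['Text']}")

    return response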

if __name__ == '__main__':
    # Replace 'document.png' with the path to your image file
    image_file = 'document.png'
    try:
        detect_text_from_image(image_file)
    except FileNotFoundError:
        print(f"Error: The file '{image_file}' was not found. Please ensure the image is in the correct directory.")
    except Exception as e:
        print(f"An error occurred: {e}")

Execute the script:

python detect_text.py

The output will list the words detected by Textract in your image. For more advanced analysis, such as forms and tables, you would use the analyze_document API call (see the AWS Textract AnalyzeDocument API reference).
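
To give a sense of what analyze_document adds, here is a minimal sketch that requests form extraction and pairs each detected key with its value. It assumes client is a Textract client created with boto3.client('textract') as in the script above; the helper names analyze_forms, extract_key_values, and _text_of are hypothetical. It handles only plain-text values, so checkboxes (selection elements) are ignored.

```python
def analyze_forms(client, image_bytes):
    """Call AnalyzeDocument with form extraction enabled.

    `client` is assumed to be a boto3 Textract client.
    """
    return client.analyze_document(
        Document={"Bytes": image_bytes},
        FeatureTypes=["FORMS"],
    )


def _text_of(block, blocks_by_id):
    """Join the text of all WORD blocks that are children of `block`."""
    words = []
    for rel in block.get("Relationships", []):
        if rel["Type"] == "CHILD":
            for child_id in rel["Ids"]:
                child = blocks_by_id[child_id]
                if child["BlockType"] == "WORD":
                    words.append(child["Text"])
    return " ".join(words)


def extract_key_values(response):
    """Return a dict mapping each form key's text to its value's text."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs = {}
    for block in response["Blocks"]:
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        value_text = ""
        # A KEY block points at its VALUE block through a VALUE relationship.
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                value_text = " ".join(
                    _text_of(blocks_by_id[value_id], blocks_by_id)
                    for value_id in rel["Ids"]
                )
        pairs[_text_of(block, blocks_by_id)] = value_text
    return pairs
```

A typical use would be extract_key_values(analyze_forms(client, image_bytes)), with image_bytes read from the file as in detect_text.py.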

Java (AWS SDK for Java)

First, set up a Maven or Gradle project. For Maven, add the following dependency to your pom.xml:

<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>textract</artifactId>
    <version>2.20.70</version> <!-- Check for the latest version -->
</dependency>
<dependency>
    <groupId>software.amazon.awssdk</groupId>
    <artifactId>regions</artifactId>
    <version>2.20.70</version> <!-- Check for the latest version -->
</dependency>

Create a Java file (e.g., DetectText.java) and place your sample image (e.g., document.jpg) in a location accessible by your application.

import software.amazon.awssdk.regions.Region;
import software.amazon.awssdk.services.textract.TextractClient;
import software.amazon.awssdk.services.textract.model.DetectDocumentTextRequest;
import software.amazon.awssdk.services.textract.model.Document;
import software.amazon.awssdk.services.textract.model.Block;
import software.amazon.awssdk.core.SdkBytes;

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.ByteBuffer;

public class DetectText {

    public static void main(String[] args) {
        String photo = "document.jpg"; // Replace with your image file path
        Region region = Region.US_EAST_1; // Replace with your desired region

        TextractClient textractClient = TextractClient.builder()
                .region(region)
                .build();

        System.out.println("Detecting text in " + photo);

        try (FileInputStream fileInputStream = new FileInputStream(photo)) {
            SdkBytes sourceBytes = SdkBytes.fromByteBuffer(ByteBuffer.wrap(fileInputStream.readAllBytes()));

            Document myDoc = Document.builder()
                    .bytes(sourceBytes)
                    .build();

            DetectDocumentTextRequest detectDocumentTextRequest = DetectDocumentTextRequest.builder()
                    .document(myDoc)
                    .build();

            textractClient.detectDocumentText(detectDocumentTextRequest).blocks().forEach(block -> {
                if ("WORD".equals(block.blockTypeAsString())) {
                    System.out.println("  " + block.text());
                }
            });

        } catch (FileNotFoundException e) {
            System.err.println("Error: The file '" + photo + "' was not found. Please ensure the image is in the correct directory.");
            System.exit(1);
        } catch (IOException e) {
            System.err.println("Error reading image file: " + e.getMessage());
            System.exit(1);
        } catch (Exception e) {
            System.err.println("An error occurred: " + e.getMessage());
            System.exit(1);
        }

        textractClient.close();
    }
}

Compile and run your Java application. The output will display detected words from your image.

Common next steps

After successfully making your first Textract call, consider these common next steps to further integrate and optimize your usage:

  1. Explore AnalyzeDocument: The DetectDocumentText API is good for raw text extraction. For structured data like forms and tables, use the AnalyzeDocument API. This API can extract key-value pairs and table data, which is crucial for automating data entry and processing (see AWS Textract AnalyzeDocument functionality).
  2. Asynchronous Operations: For large or multipage documents, or batches of documents, use the asynchronous operations (StartDocumentTextDetection, GetDocumentTextDetection, StartDocumentAnalysis, GetDocumentAnalysis). The Start operations queue a job that runs in the background and can publish a completion notification to Amazon SNS (often relayed to SQS); the matching Get operations retrieve the results (see asynchronous processing in AWS Textract).
  3. Error Handling and Retries: Implement robust error handling, including retry mechanisms with exponential backoff, to manage transient network issues or service throttling. The AWS SDKs often provide built-in retry logic.
  4. Cost Management: Monitor your Textract usage and costs through the AWS Billing Console. Understand the AWS Textract pricing model, which is based on the number of pages processed and the features used (e.g., text detection, form extraction, table extraction). Take advantage of the free tier during initial development.
  5. Security Best Practices: Refine your IAM policies to grant only the necessary permissions (least privilege). Consider using Amazon S3 for document storage and ensuring proper bucket policies.
  6. Integrate with Other AWS Services: Textract often serves as a component in larger workflows. Integrate it with services like Amazon S3 for storage, AWS Lambda for serverless processing, Amazon Comprehend for natural language processing, or Amazon DynamoDB for storing extracted data.
  7. Review Output Structure: Familiarize yourself with the JSON output structure of Textract, especially the Blocks array, which contains detailed information about detected text, lines, words, forms, and tables. Understanding this structure is key to effectively parsing and utilizing the extracted data (see Textract output objects).
  8. Explore Advanced Features: Look into specialized APIs like AnalyzeID for identity documents or AnalyzeExpense for receipts and invoices if your use case requires them (see the AWS Textract AnalyzeID documentation).
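
Steps 2 and 3 can be combined in a small sketch: start an asynchronous text-detection job for a document in S3, then poll for the result with exponential backoff. Polling is used here as a simpler stand-in for the SNS notification flow, not as a replacement for it. The function names, the delay values, and the injectable sleep parameter are assumptions for illustration; client is assumed to be a boto3 Textract client.

```python
import time


def wait_for_text_detection(client, job_id, initial_delay=1.0, max_delay=30.0,
                            max_attempts=20, sleep=time.sleep):
    """Poll GetDocumentTextDetection until the job finishes, doubling the wait each time."""
    delay = initial_delay
    for _ in range(max_attempts):
        result = client.get_document_text_detection(JobId=job_id)
        status = result["JobStatus"]
        if status in ("SUCCEEDED", "PARTIAL_SUCCESS"):
            return result
        if status == "FAILED":
            message = result.get("StatusMessage", "no status message")
            raise RuntimeError(f"Textract job {job_id} failed: {message}")
        # Still IN_PROGRESS: wait, then back off up to max_delay.
        sleep(delay)
        delay = min(delay * 2, max_delay)
    raise TimeoutError(f"Textract job {job_id} did not finish after {max_attempts} checks")


def start_and_wait(client, bucket, key, **wait_kwargs):
    """Start an asynchronous text-detection job for an S3 object and wait for it."""
    job = client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    return wait_for_text_detection(client, job["JobId"], **wait_kwargs)
```

The returned dictionary holds only the first page of results; when it contains a NextToken, pass that token to further get_document_text_detection calls to fetch the remaining blocks.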

Troubleshooting the first call

Encountering issues during your first Textract API call is common. Here are some steps to diagnose and resolve typical problems:

  1. "No credentials provided" or "Missing authentication token" errors:
    • Check AWS CLI configuration: Ensure you have run aws configure and provided a valid Access Key ID and Secret Access Key. Verify that the ~/.aws/credentials file contains the correct entries (see the AWS CLI configuration guide).
    • Environment variables: If not using the CLI config, ensure AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables are set correctly.
    • Region configuration: Credentials are not tied to a region, but the SDK still needs one. Confirm that a region is set in your code or configuration (e.g., us-east-1) and that Textract is available in that region.
  2. "Access Denied" or "User is not authorized to perform textract:DetectDocumentText":
    • IAM Policy: Double-check that the IAM user or role used has the AmazonTextractFullAccess policy or a custom policy with equivalent permissions attached. Review the policy in the IAM console (see IAM troubleshooting policies).
    • S3 access: If your input documents are in Amazon S3, ensure the calling IAM identity can read the objects (for example, s3:GetObject) and that no bucket policy blocks that access (more relevant for asynchronous operations).
  3. "Invalid image format" or "Unsupported document format":
    • Supported formats: Textract supports PNG, JPEG, TIFF, and PDF files. Ensure your input document is one of these formats (see Textract service limits).
    • File corruption: Verify the integrity of your image file. Try opening it with a standard image viewer.
    • File size limits: Check the maximum file size allowed for synchronous (10MB) and asynchronous (500MB for PDF and TIFF, up to 3,000 pages; 10MB for JPEG and PNG) operations. Large files may need to be processed asynchronously (see Textract service limits).
  4. "The image is too large" or "Request payload exceeded":
    • Synchronous vs. Asynchronous: For larger files, switch from synchronous APIs (detect_document_text, analyze_document) to their asynchronous counterparts (start_document_text_detection, start_document_analysis) which support larger inputs.
  5. No output or empty Blocks array:
    • Image quality: Textract performs best on clear, well-lit images with legible text. Poor quality, blurry, or heavily skewed images may result in limited or no detection.
    • Document content: Ensure the document actually contains text. If it's a blank page or an image without discernible characters, Textract will find nothing.
  6. SDK specific errors (e.g., Boto3 ClientError, Java SdkClientException):
    • SDK version: Ensure you are using a recent and compatible version of the AWS SDK. Check the SDK documentation for version requirements.
    • Network connectivity: Verify your internet connection and ensure there are no firewall rules blocking outbound connections to AWS endpoints.
    • Debugging output: Increase the verbosity of your SDK's logging to get more detailed error messages. For Boto3, you can call boto3.set_stream_logger('') to log all SDK activity to the console.
  7. General API Reference: When in doubt, consult the AWS Textract API Reference for specific error codes and their meanings.
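
Several of the problems above (unsupported format, empty file, oversized payload) can be caught locally before any request is sent. The following sketch is one way to do that; the helper name preflight is hypothetical, the 10 MB threshold is the synchronous limit quoted in this guide, and the suffix list is an assumption that should be checked against the current service limits.

```python
from pathlib import Path

# Assumed from this guide and the service limits page; verify before relying on them.
SUPPORTED_SUFFIXES = {".png", ".jpg", ".jpeg", ".pdf", ".tif", ".tiff"}
SYNC_LIMIT_BYTES = 10 * 1024 * 1024  # 10 MB synchronous limit quoted above


def preflight(path, size_limit=SYNC_LIMIT_BYTES):
    """Return a list of human-readable problems; an empty list means none were found."""
    p = Path(path)
    if not p.is_file():
        return [f"{p} does not exist or is not a file"]

    problems = []
    if p.suffix.lower() not in SUPPORTED_SUFFIXES:
        problems.append(f"unsupported file extension '{p.suffix}'")

    size = p.stat().st_size
    if size == 0:
        problems.append("file is empty")
    elif size > size_limit:
        problems.append(
            f"file is {size} bytes, above the {size_limit}-byte limit; "
            "consider the asynchronous operations"
        )
    return problems


if __name__ == "__main__":
    for problem in preflight("document.png"):
        print(f"Problem: {problem}")
```

The check looks only at the file name and size, so a corrupted or mislabeled file can still pass it and be rejected by the service.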