Authentication overview

AWS Textract, a machine learning service that extracts text and data from documents, authenticates and authorizes requests through AWS Identity and Access Management (IAM), the AWS service for controlling access to AWS resources. Whether you call Textract programmatically through an SDK or directly via the AWS CLI, every request must be authenticated and authorized so that only permitted entities can perform actions.

Authentication verifies the identity of the principal (a person or application) making the request, while authorization determines what actions that principal is allowed to perform on specific Textract resources. All requests to Textract must be signed using Signature Version 4 (SigV4), AWS's protocol for authenticating API requests. This signing process involves using your AWS credentials to create a unique signature for each request, which AWS then verifies upon receipt.
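
To make the signing step more concrete, here is a minimal sketch of one part of SigV4: deriving the signing key from a secret access key and using it to sign a string. It uses only the Python standard library. The function names and the placeholder secret are illustrative assumptions, and the sketch leaves out building the canonical request and the string to sign, which the AWS SDKs and CLI handle for you.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    """Return the raw HMAC-SHA256 digest of message under key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive a SigV4 signing key by chaining HMACs over the credential scope.

    date_stamp is the request date in YYYYMMDD form.
    """
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key: bytes, string_to_sign: str) -> str:
    """Return the hex-encoded signature for an already built string to sign."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()


# Placeholder secret for illustration only; never hardcode a real key.
example_key = derive_signing_key("EXAMPLE-SECRET", "20240115", "us-east-1", "textract")
```

Because the date, region, and service are mixed into the key, a signature computed for one day or one service is not valid for another. In practice you would rely on the SDK to sign requests rather than run code like this yourself.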

Understanding Textract's authentication mechanisms helps you integrate the service securely and prevent unauthorized access to your document processing workflows and sensitive data. The primary methods involve using access keys, temporary security credentials, or IAM roles, each suited to different use cases and security postures.

Supported authentication methods

AWS Textract supports several authentication methods, primarily leveraging the capabilities of AWS IAM. The choice of method often depends on the environment from which Textract is being called (e.g., an EC2 instance, a local development machine, or a serverless function) and the desired level of security and credential management.

The following table outlines the main authentication methods for AWS Textract:

| Method | When to Use | Security Level |
| --- | --- | --- |
| IAM User Access Keys | Local development, non-AWS environments, command-line interface (CLI) access. | Moderate. Long-lived credentials that require careful management to prevent exposure. |
| IAM Roles for EC2 Instances | Applications running on AWS EC2 instances, ECS containers, or EKS pods. | High. Provides temporary, automatically rotated credentials; no need to store keys on the instance. |
| IAM Roles for AWS Lambda Functions | Serverless applications using AWS Lambda. | High. Provides temporary credentials automatically managed by Lambda; no manual key management. |
| Temporary Security Credentials (STS) | Federated users, cross-account access, mobile/desktop applications, situations requiring short-lived credentials. | High. Credentials have a limited lifespan, reducing the risk of compromise. |
| AWS IAM Identity Center (formerly AWS Single Sign-On) | Centralized access management for users across multiple AWS accounts and business applications. | High. Integrates with identity providers and simplifies user access and permissions. |

For most production applications running within the AWS ecosystem, using IAM roles is the recommended approach due to its enhanced security and simplified credential management. Roles provide temporary security credentials that applications can use to make requests, eliminating the need to embed or store long-term access keys.

Getting your credentials

The process of obtaining credentials for AWS Textract depends on the chosen authentication method:

1. IAM User Access Keys

  1. Create an IAM User: Navigate to the IAM console, select 'Users', and create a user with a user name. Depending on the console version, you either select 'Programmatic access' during creation or create an access key afterward from the user's 'Security credentials' tab.
  2. Attach Policies: Grant the necessary permissions. For Textract, you might attach an AWS managed policy such as AmazonTextractFullAccess, or create a custom policy that follows least privilege.
  3. Retrieve Access Keys: When the access key is created, the console shows an Access Key ID and a Secret Access Key. The Secret Access Key is displayed only once, so download or copy it immediately.
  4. Configure: Store these keys securely. For local development, you can run the aws configure command or set environment variables (AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, plus AWS_SESSION_TOKEN when using temporary credentials).
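
As a rough illustration of the environment-variable option in step 4, the sketch below inspects a mapping of environment variables and reports whether it holds long-term or temporary credentials. The function name summarize_env_credentials and the shape of the returned dictionary are hypothetical; only the three variable names come from the step above.

```python
import os
from typing import Mapping, Optional


def summarize_env_credentials(env: Optional[Mapping[str, str]] = None) -> Optional[dict]:
    """Describe the credentials found in env without exposing the secret key.

    Returns None when either required variable is missing or empty.
    """
    env = os.environ if env is None else env
    access_key = env.get("AWS_ACCESS_KEY_ID")
    secret_key = env.get("AWS_SECRET_ACCESS_KEY")
    if not access_key or not secret_key:
        return None
    # A session token alongside the key pair marks the credentials as temporary.
    kind = "temporary" if env.get("AWS_SESSION_TOKEN") else "long-term"
    return {"access_key_id_suffix": access_key[-4:], "kind": kind}
```

Calling summarize_env_credentials() with no argument reads the current process environment, which can serve as a quick preflight check before running code that calls Textract.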

2. IAM Roles for AWS Services (EC2, Lambda, etc.)

  1. Create an IAM Role: In the IAM console, select 'Roles' and 'Create role'. Choose the AWS service that will use this role (e.g., EC2, Lambda).
  2. Attach Policies: Grant the necessary Textract permissions to the role.
  3. Associate Role:
    • For EC2: When launching an EC2 instance, select the created IAM role under 'Advanced details' -> 'IAM instance profile'.
    • For Lambda: When creating or configuring a Lambda function, select the created IAM role as the 'Execution role'.
  4. Automatic Credential Provisioning: The AWS SDKs and CLI retrieve the role's temporary credentials automatically. On EC2 they come from the instance metadata service; in Lambda the runtime supplies them through environment variables. No manual key management is required.
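
Behind the console's "choose the AWS service" step is a trust policy that names which service may assume the role. The sketch below builds such a document for EC2 or Lambda. The function name build_trust_policy is an assumption made for illustration; the service principals ec2.amazonaws.com and lambda.amazonaws.com and the sts:AssumeRole action are the standard values for these two services.

```python
import json


def build_trust_policy(service_principal: str) -> dict:
    """Return a trust policy that lets one AWS service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": service_principal},
                "Action": "sts:AssumeRole",
            }
        ],
    }


# Trust policies for the two services discussed above.
ec2_trust = build_trust_policy("ec2.amazonaws.com")
lambda_trust = build_trust_policy("lambda.amazonaws.com")

# Serialized form, as the IAM API expects a JSON string.
lambda_trust_json = json.dumps(lambda_trust, indent=2)
```

If you create roles with Boto3 instead of the console, a string like lambda_trust_json is what you would pass as the AssumeRolePolicyDocument argument of the IAM client's create_role call. The Textract permissions themselves are attached separately, as in step 2.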

3. Temporary Security Credentials (STS)

AWS Security Token Service (STS) allows you to request temporary, limited-privilege credentials. This is often used in scenarios like federated access or when building custom identity brokers. You can obtain temporary credentials by calling STS API operations like AssumeRole, GetFederationToken, or GetSessionToken. These operations return a set of temporary credentials (access key ID, secret access key, and session token) that are valid for a specified duration.
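
The following sketch shows one way the AssumeRole flow could look with Boto3. The helper name assume_role_client_kwargs, the role ARN, and the session name are hypothetical. The helper takes an STS client as a parameter so it can be exercised without network access, and it returns the keyword arguments that boto3.client accepts for explicit credentials.

```python
from typing import Any, Dict


def assume_role_client_kwargs(
    sts_client: Any,
    role_arn: str,
    session_name: str,
    duration_seconds: int = 900,
) -> Dict[str, str]:
    """Call STS AssumeRole and map the result to boto3.client keyword arguments.

    900 seconds is the shortest duration that AssumeRole accepts.
    """
    response = sts_client.assume_role(
        RoleArn=role_arn,
        RoleSessionName=session_name,
        DurationSeconds=duration_seconds,
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }


# Example usage (requires boto3 and permission to assume the role):
# import boto3
# sts = boto3.client("sts")
# kwargs = assume_role_client_kwargs(
#     sts, "arn:aws:iam::123456789012:role/ExampleTextractRole", "textract-demo"
# )
# textract = boto3.client("textract", region_name="us-east-1", **kwargs)
```

The returned credentials expire after the requested duration, so long-running code would need to call the helper again and build a fresh client.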

Authenticated request example

Once your credentials are configured, making an authenticated request to AWS Textract is handled by the AWS SDKs. The SDKs automatically sign your requests using the available credentials. Here's an example using the Python Boto3 SDK to detect text in a document stored in an S3 bucket:


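```python
import boto3
from botocore.exceptions import BotoCoreError, ClientError


def detect_text_from_s3(bucket_name, document_key):
    """Print each line of text that Textract detects in a document stored in S3."""
    # No credentials are passed here; Boto3 resolves them from the environment.
    textract_client = boto3.client('textract', region_name='us-east-1')

    try:
        response = textract_client.detect_document_text(
            Document={
                'S3Object': {
                    'Bucket': bucket_name,
                    'Name': document_key
                }
            }
        )
    except (BotoCoreError, ClientError) as e:
        # ClientError covers errors returned by the service (for example, access denied);
        # BotoCoreError covers client-side failures such as missing credentials.
        print(f"Error detecting text: {e}")
        return

    print("Detected Text:")
    for item in response['Blocks']:
        if item['BlockType'] == 'LINE':
            print(item['Text'])

# Example usage (assuming credentials are configured via environment variables, ~/.aws/credentials, or IAM role)
# detect_text_from_s3('your-s3-bucket-name', 'path/to/your/document.png')
```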

In this Python example, boto3.client('textract', region_name='us-east-1') initializes the Textract client. Boto3 searches for credentials in a fixed order that includes, from higher to lower precedence, environment variables, the shared credentials file (~/.aws/credentials), container credentials, and EC2 instance profile credentials. In a properly configured environment, you typically do not need to pass credentials explicitly.
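
When it is unclear which credentials the SDK picked up, asking STS who the caller is can help. The sketch below wraps the GetCallerIdentity operation. The helper name describe_caller is hypothetical, and the STS client is passed in so the function can be tried against a stub.

```python
from typing import Any


def describe_caller(sts_client: Any) -> str:
    """Return a one-line summary of the identity behind the resolved credentials."""
    identity = sts_client.get_caller_identity()
    return f"account={identity['Account']} arn={identity['Arn']}"


# Example usage (requires boto3 and valid credentials):
# import boto3
# session = boto3.Session()
# print(describe_caller(session.client("sts")))
# credentials = session.get_credentials()
# if credentials is not None:
#     # Name of the provider that supplied the credentials, such as "env".
#     print(credentials.method)
```

An ARN that names an IAM user suggests long-term access keys are in use, while an assumed-role ARN indicates temporary credentials from a role.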

Security best practices

Adhering to security best practices is essential when authenticating with AWS Textract to protect your AWS account and the data you process:

  • Principle of Least Privilege: Grant only the necessary permissions to users and roles. For Textract, this means providing only the specific API actions required (e.g., textract:DetectDocumentText, textract:AnalyzeDocument) and restricting access to specific S3 buckets where documents are stored. Avoid using * for actions or resources unless absolutely necessary.
  • Use IAM Roles for AWS Services: Whenever possible, use IAM roles for applications running on AWS infrastructure (EC2, Lambda, ECS). Roles provide temporary credentials that are automatically rotated, eliminating the need to store static access keys on instances or within code.
  • Rotate Access Keys Regularly: If you must use IAM user access keys, implement a regular rotation schedule (e.g., every 90 days). You can generate new keys and deactivate old ones in the IAM console.
  • Never Embed Credentials in Code: Do not hardcode access keys or other sensitive credentials directly into your application's source code. Use environment variables, configuration files, or AWS Secrets Manager for credential storage.
  • Enable Multi-Factor Authentication (MFA): Enforce MFA for the root user and for all IAM users, especially administrative users. This adds an extra layer of security beyond a password alone.
  • Monitor AWS CloudTrail Logs: Use AWS CloudTrail to monitor API calls made to Textract and other AWS services. CloudTrail logs provide a detailed history of actions taken, which is invaluable for security auditing and incident response.
  • Encrypt Data in Transit and at Rest: Encrypt documents at rest by enabling server-side encryption on the S3 buckets that store them, and in transit by calling Textract over TLS, which the SDKs and CLI use by default. For asynchronous operations, you can also supply an AWS Key Management Service (KMS) key through the KMSKeyId parameter to encrypt the results that Textract writes to your S3 bucket.
  • Use VPC Endpoints: For enhanced security and to keep traffic within the AWS network, configure VPC endpoints for Textract. This allows your applications in a Virtual Private Cloud (VPC) to communicate with Textract without traversing the public internet.
  • Review IAM Policies Periodically: Regularly audit and review your IAM policies to ensure they still adhere to the principle of least privilege and that no unnecessary permissions have accumulated.
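
As an illustration of the least-privilege guidance above, the sketch below builds a custom policy that allows only the two Textract actions named in the list plus read access to one S3 location. The function name, statement IDs, bucket name, and prefix are hypothetical. The Textract statement uses "*" as its resource on the assumption that these actions cannot be scoped to individual resources; check the service authorization reference before relying on that.

```python
import json


def build_textract_policy(bucket: str, prefix: str = "") -> dict:
    """Return a policy allowing synchronous Textract calls on documents in one S3 location."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "TextractSyncCalls",
                "Effect": "Allow",
                "Action": [
                    "textract:DetectDocumentText",
                    "textract:AnalyzeDocument",
                ],
                # Assumption: these actions do not support resource-level scoping.
                "Resource": "*",
            },
            {
                "Sid": "ReadInputDocuments",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                # Limit reads to objects under the given prefix of one bucket.
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}*",
            },
        ],
    }


policy = build_textract_policy("example-documents-bucket", "incoming/")
policy_json = json.dumps(policy, indent=2)
```

The resulting JSON can be pasted into the IAM console's policy editor or attached to a role, and reviewing it periodically is a simple way to confirm that no extra actions have crept in.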