Getting started overview

Integrating Deepgram's speech-to-text and audio intelligence capabilities involves a standard API integration workflow. This guide outlines the steps to create a Deepgram account, obtain the necessary API credentials, and execute a foundational request to confirm your setup. The process emphasizes using official documentation and SDKs to streamline development.

Deepgram provides API reference documentation and SDKs for multiple programming languages, including Python, Node.js, Go, Ruby, Java, C#, and PHP. These resources cover Deepgram's core capabilities: Speech-to-Text for prerecorded and live streaming audio, Text-to-Speech through Deepgram Aura, and Audio Intelligence features such as summarization and sentiment analysis.

Before making your first API call, ensure you have a Deepgram account and an API key. This key authenticates your requests and links them to your usage plan, which includes a free tier for initial development.

Quick Reference Guide

| Step | What to Do | Where |
|---|---|---|
| 1. Account Creation | Register for a Deepgram account. | Deepgram Signup Page |
| 2. API Key Retrieval | Generate and copy an API key from your console. | Deepgram Console Project Settings |
| 3. Environment Setup | Install the Deepgram SDK or prepare to make direct HTTP requests. | Deepgram Getting Started Documentation |
| 4. First Request | Send an audio file or stream for transcription. | Deepgram Quickstart Examples |
| 5. Verify Results | Check the API response for transcription output. | Your application's console or output log |

Create an account and get keys

To begin using Deepgram, you must first create an account. This account serves as your portal to manage projects, monitor usage, and generate API keys. New accounts receive free usage credit, so you can test and build without immediate cost; check Deepgram's pricing page for the current amount and terms.

  1. Sign Up: Navigate to the Deepgram signup page. You can register using an email address, or through third-party authentication providers like Google or GitHub.
  2. Access Console: After successful registration, you will be redirected to the Deepgram Console. This dashboard provides an overview of your projects and usage.
  3. Generate API Key:
    • In the Deepgram Console, locate the "API Keys" section, typically found under your project settings or a dedicated API key management tab.
    • Click "Create New API Key".
    • Provide a descriptive name for your key (e.g., "Development Key", "Transcription Service Key").
    • Deepgram will generate a unique API key. This key is crucial for authenticating your API requests. Copy this key immediately, as it may not be fully retrievable later for security reasons. Treat your API key like a password; it grants access to your Deepgram account and resources.
  4. Store API Key Securely: It is recommended to store your API key as an environment variable rather than hardcoding it directly into your application code. This practice enhances security and simplifies key rotation. For example, on Linux or macOS, you might use export DEEPGRAM_API_KEY="YOUR_API_KEY_HERE" in your shell profile. For server-side applications, consider using a secret management service.
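Reading the key from the environment, as recommended above, can look like the following minimal Python sketch. The helper name load_api_key is hypothetical; it simply fails with a clear message when the variable is missing, and avoids printing the full secret.

```python
import os


def load_api_key(var_name: str = "DEEPGRAM_API_KEY") -> str:
    """Return the API key from the environment, failing loudly if it is missing."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"{var_name} is not set. Export it in your shell profile, "
            "or load it from your secret manager before starting the app."
        )
    return key


if __name__ == "__main__":
    api_key = load_api_key()
    # Print only a short prefix so the full secret never lands in logs
    print(f"Loaded key starting with {api_key[:4]}...")
```

Failing at startup, rather than on the first API call, makes a missing or misnamed variable obvious immediately.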

Your first request

After setting up your account and obtaining an API key, you can make your first request to the Deepgram API. This example demonstrates how to transcribe an audio file using the Deepgram Python SDK. Deepgram also provides SDKs for Node.js, Go, Ruby, Java, C#, and PHP.

Prerequisites

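  • Python 3 installed (use a currently supported release; Python 3.7 reached end of life in 2023, and the Deepgram SDK's README lists the minimum version it supports).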
  • Deepgram API Key (stored as an environment variable named DEEPGRAM_API_KEY).
  • An audio file (e.g., a short .wav or .mp3 file) for transcription.

Installation (Python SDK)

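Install the Deepgram Python SDK using pip. The example below is written against the SDK's v3 interface (deepgram.listen.prerecorded). Later major versions changed the client API, so pin a 3.x release to run the example unchanged:

pip install "deepgram-sdk>=3,<4"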
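If you prefer direct HTTP requests over the SDK (step 3 of the quick reference), the sketch below uses only Python's standard library. It assumes the https://api.deepgram.com/v1/listen endpoint, an "Authorization: Token <key>" header, and lowercase true/false values for boolean query parameters, as described in Deepgram's API reference; build_listen_request is a hypothetical helper name and the file path is a placeholder.

```python
import json
import os
import urllib.parse
import urllib.request

# Assumed prerecorded transcription endpoint (see Deepgram's API reference)
LISTEN_URL = "https://api.deepgram.com/v1/listen"


def build_listen_request(api_key: str, audio: bytes, mimetype: str, **params) -> urllib.request.Request:
    """Build (but do not send) a POST request for prerecorded transcription."""
    # Convert Python booleans to the lowercase strings the query string expects
    query = {k: str(v).lower() if isinstance(v, bool) else v for k, v in params.items()}
    url = LISTEN_URL + ("?" + urllib.parse.urlencode(query) if query else "")
    return urllib.request.Request(
        url,
        data=audio,
        method="POST",
        headers={"Authorization": f"Token {api_key}", "Content-Type": mimetype},
    )


if __name__ == "__main__":
    with open("your_audio_file.wav", "rb") as f:
        req = build_listen_request(
            os.environ["DEEPGRAM_API_KEY"], f.read(), "audio/wav",
            punctuate=True, language="en-US",
        )
    # Send the request and decode the JSON body
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    print(body["results"]["channels"][0]["alternatives"][0]["transcript"])
```

Separating request construction from sending makes it easy to inspect the exact URL and headers when debugging a 400 or 401 response.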

Example Python Code for File Transcription

This script transcribes a local audio file. Replace "your_audio_file.wav" with the path to your audio file.

import os
from deepgram import DeepgramClient, FileSource, PrerecordedOptions

# Ensure your API key is set as an environment variable, e.g.:
# export DEEPGRAM_API_KEY="YOUR_DEEPGRAM_API_KEY"
API_KEY = os.getenv("DEEPGRAM_API_KEY")

if not API_KEY:
    raise ValueError("DEEPGRAM_API_KEY environment variable not set.")

# Initialize the Deepgram client (v3 SDK interface).
# Pass a DeepgramClientOptions object as a second argument if you need
# custom settings such as verbose logging.
deepgram = DeepgramClient(API_KEY)

# Path to your audio file
FILE = "your_audio_file.wav"

# Read the audio file into memory
with open(FILE, "rb") as file:
    buffer_data = file.read()

# Prepare the source for transcription
payload: FileSource = {
    "buffer": buffer_data,
}

# Configure transcription options
options = PrerecordedOptions(
    model="nova-2",  # example model; see Deepgram's model docs for current options
    punctuate=True,
    diarize=False,
    language="en-US",
)

print(f"Attempting to transcribe: {FILE}")

try:
    # Send the audio and wait for the transcription response
    response = deepgram.listen.prerecorded.v("1").transcribe_file(payload, options)

    # Uncomment to print the full JSON response for inspection
    # print(response.to_json(indent=4))

    # results is a single object; channels and alternatives are lists
    channels = response.results.channels if response.results else []
    if channels and channels[0].alternatives:
        transcript = channels[0].alternatives[0].transcript
        print(f"\nTranscription: {transcript}")
    else:
        print("No transcription found in the response.")

except Exception as e:
    print(f"Error during transcription: {e}")
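Whether you print response.to_json() from the SDK or call the HTTP API directly, the transcript sits at results.channels[0].alternatives[0].transcript in the JSON body: results is an object, while channels and alternatives are lists. The sketch below navigates that path defensively. The extract_transcript helper is hypothetical, and the sample dictionary is a hand-written, abbreviated stand-in rather than real API output.

```python
from typing import Any, Mapping


def extract_transcript(response: Mapping[str, Any], channel: int = 0) -> str:
    """Return the top alternative's transcript, or "" if the path is missing."""
    try:
        channels = response["results"]["channels"]
        return channels[channel]["alternatives"][0]["transcript"]
    except (KeyError, IndexError, TypeError):
        return ""


# Hand-written, abbreviated example shaped like a transcription response
sample = {
    "metadata": {"request_id": "example"},
    "results": {"channels": [{"alternatives": [{"transcript": "hello world", "confidence": 0.98}]}]},
}
print(extract_transcript(sample))  # hello world
print(repr(extract_transcript({"results": {"channels": []}})))  # ''
```

Returning an empty string instead of raising mirrors the "No transcription found" branch in the SDK example, which also covers silent audio.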

Running the Script

  1. Save the code as a .py file (e.g., transcribe_audio.py).
  2. Ensure your DEEPGRAM_API_KEY environment variable is set.
  3. Execute the script from your terminal: python transcribe_audio.py.

The output will display the transcription of your audio file. This confirms that your Deepgram API key is correctly configured and the API is accessible.

Common next steps

After successfully making your first Deepgram API call, consider these next steps to further integrate and optimize your speech applications:

  1. Explore Advanced Transcription Features: Deepgram offers numerous options for transcription, such as speaker diarization (identifying different speakers), punctuation, entity detection, and language selection. Review the Deepgram Prerecorded Speech Recognition API reference to understand and implement these features for richer insights from audio.
  2. Integrate Real-time Transcription: For applications requiring live audio processing (e.g., voice assistants, live call centers), explore Deepgram's real-time streaming API. This involves establishing a WebSocket connection and sending audio chunks as they become available. The Deepgram streaming documentation provides specific examples.
  3. Utilize Audio Intelligence Features: Beyond basic transcription, Deepgram's Audio Intelligence features include sentiment analysis, topic detection, and summarization. These capabilities can enrich your applications with deeper insights from spoken content. Consult the Deepgram Audio Intelligence guides for implementation details.
  4. Explore Text-to-Speech (TTS): If your application requires generating spoken responses, Deepgram Aura Text-to-Speech allows you to convert text into natural-sounding speech. This is useful for interactive voice agents or content narration. Refer to the Deepgram Text-to-Speech documentation.
  5. Implement Webhooks for Asynchronous Processing: For large audio files or batches, consider asynchronous processing with webhooks. When you include a callback URL with a prerecorded request, Deepgram returns a request ID right away and later sends the results to that URL once transcription is complete, so your client does not need to hold a connection open while it waits. This is a common pattern for asynchronous API interactions, as described in Mozilla's Webhook documentation.
  6. Monitor Usage and Billing: Regularly check your Deepgram Console to monitor API usage and understand your billing. This helps manage costs, especially as your application scales beyond the free tier. The Deepgram project dashboard provides detailed metrics.
  7. Review Security Best Practices: Ensure your API keys and sensitive data are handled securely. Adhere to principles like least privilege, regular key rotation, and secure storage of credentials.
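For the real-time streaming API in step 2, audio travels over a WebSocket as a series of small binary messages. The WebSocket client itself is not shown here; this sketch covers only the chunking side with a hypothetical iter_audio_chunks generator. The 8,000-byte default is an illustrative assumption (a quarter second of 16 kHz, 16-bit mono PCM), not a Deepgram requirement; consult the streaming documentation for recommended chunk sizes and for how to signal the end of a stream.

```python
import io
import time
from typing import BinaryIO, Iterator


def iter_audio_chunks(source: BinaryIO, chunk_size: int = 8000, delay: float = 0.0) -> Iterator[bytes]:
    """Yield successive fixed-size chunks from a binary stream.

    delay (seconds) can pace a file replay to roughly real time when
    testing a streaming connection; live microphone input needs no delay.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # end of stream
            break
        yield chunk
        if delay:
            time.sleep(delay)


# Example: replay an in-memory buffer in 4-byte chunks
demo = io.BytesIO(b"abcdefghij")
print(list(iter_audio_chunks(demo, chunk_size=4)))  # [b'abcd', b'efgh', b'ij']
```

Each yielded chunk would be sent as one binary WebSocket message; using a generator keeps memory flat even for long recordings.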

Troubleshooting the first call

When encountering issues with your initial Deepgram API request, consider these common troubleshooting steps:

  1. API Key Validation:
    • Incorrect Key: Double-check that the API key you are using is exactly as copied from the Deepgram Console. Even minor typos can cause authentication failures.
    • Environment Variable Not Set: If using an environment variable (DEEPGRAM_API_KEY), confirm it is correctly set in your current shell session before running the script. Restarting your terminal or IDE might be necessary for new environment variables to take effect.
    • Expired or Revoked Key: API keys can be revoked or expire. Check the status of your key in the Deepgram Console.
  2. Network Connectivity:
    • Ensure your development environment has an active internet connection and is not blocked by a firewall or proxy from accessing api.deepgram.com.
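    • Test connectivity and authentication together with curl -H "Authorization: Token $DEEPGRAM_API_KEY" https://api.deepgram.com/v1/projects. A JSON response listing your projects confirms both; a 401 points to the key. A failing ping is not conclusive, since many networks block ICMP even when HTTPS traffic is allowed.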
  3. Audio File Issues:
    • Unsupported Format: Verify that your audio file is in a supported format (e.g., WAV, MP3, FLAC, M4A).
    • Corrupt File: Ensure the audio file is not corrupt and can be played back locally.
    • Incorrect Path: Confirm the file path in your code correctly points to the audio file.
    • Empty or Silent Audio: Deepgram may return an empty transcription if the audio contains no discernible speech.
  4. SDK and Dependency Issues:
    • SDK Installation: Verify the Deepgram SDK is correctly installed (e.g., pip show deepgram-sdk for Python).
    • Outdated SDK: Consider updating your SDK to the latest version to benefit from bug fixes and new features (e.g., pip install --upgrade deepgram-sdk).
    • Other Dependencies: Check for any missing or conflicting dependencies in your project.
  5. API Response and Error Messages:
    • Inspect the Full Response: Instead of printing only the transcript, print the entire JSON response (with the Python SDK, response.to_json(indent=4)). Error messages are often contained within the response and can provide specific clues.
    • HTTP Status Codes: Pay attention to HTTP status codes. A 401 Unauthorized typically indicates an API key issue, while a 400 Bad Request suggests an issue with your request payload or parameters.
  6. Deepgram Status Page: Check the Deepgram status page to see if there are any ongoing service outages or maintenance that might affect API availability.
  7. Consult Deepgram Documentation and Support: If issues persist, refer to the Deepgram Getting Started guide for more detailed troubleshooting or contact Deepgram support.
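Several of the checks above (environment variable, file path, empty file, format) can run locally before any network call. The preflight function below is a hypothetical helper, not part of the Deepgram SDK. Its format check relies on Python's mimetypes module, which only knows the extensions registered on your system, so treat that message as a prompt to double-check the format rather than proof that Deepgram will reject the file.

```python
import mimetypes
import os
from pathlib import Path
from typing import List


def preflight(audio_path: str, env_var: str = "DEEPGRAM_API_KEY") -> List[str]:
    """Return local problems to fix before calling the API (empty list if none found)."""
    problems = []
    # Checks the environment of *this* process, which is what the SDK sees
    if not os.environ.get(env_var, "").strip():
        problems.append(f"{env_var} is not set in this process's environment")
    path = Path(audio_path)
    if not path.is_file():
        problems.append(f"audio file not found: {path.resolve()}")
    elif path.stat().st_size == 0:
        problems.append(f"audio file is empty: {path}")
    else:
        # Heuristic: guess the MIME type from the file extension only
        mime, _ = mimetypes.guess_type(path.name)
        if mime is None or not mime.startswith("audio/"):
            problems.append(f"extension of {path.name} does not map to an audio/* type locally; confirm the format is supported")
    return problems


if __name__ == "__main__":
    for issue in preflight("your_audio_file.wav") or ["no local problems found"]:
        print(issue)
```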