SDKs overview
The OpenAI Whisper API offers programmatic access to OpenAI's speech-to-text models, covering both transcription and translation of speech into English. To ease integration, OpenAI provides official Software Development Kits (SDKs) for Python and Node.js. These SDKs streamline sending audio files to the API, reading responses, authenticating, and handling errors. Community-driven libraries exist for other languages and usually wrap the underlying REST API.
Using an SDK can simplify development by abstracting HTTP requests and JSON parsing, allowing developers to interact with the Whisper API using native language constructs. This approach can reduce boilerplate code and improve maintainability compared to direct API calls via HTTP clients like cURL or a custom implementation.
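To make the boilerplate concrete, here is a minimal sketch of what a direct call involves when only the Python standard library is used. It builds, but does not send, a multipart/form-data POST to the transcription endpoint. The function name build_transcription_request is hypothetical, and the generic application/octet-stream content type for the file part is an assumption made for illustration.

```python
import urllib.request
import uuid

# Transcription endpoint of the OpenAI REST API.
API_URL = "https://api.openai.com/v1/audio/transcriptions"


def build_transcription_request(api_key, filename, audio_bytes, model="whisper-1"):
    """Build (but do not send) a multipart/form-data POST request.

    Hypothetical helper for illustration; an SDK does this work internally.
    """
    boundary = uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    parts = [
        # First form field: the model name.
        delimiter,
        b'Content-Disposition: form-data; name="model"',
        b"",
        model.encode("utf-8"),
        # Second form field: the audio file itself.
        delimiter,
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode("utf-8"),
        b"Content-Type: application/octet-stream",  # assumption: generic type
        b"",
        audio_bytes,
        # Closing delimiter, followed by a trailing CRLF.
        delimiter + b"--",
        b"",
    ]
    body = b"\r\n".join(parts)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send the request; decoding the JSON response and handling failures would still be up to the caller, which is the work an SDK takes over.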
The OpenAI Whisper API is part of the broader OpenAI API platform, which includes models for various AI tasks. Developers can find comprehensive details on the API's capabilities and parameters within the official OpenAI Audio API reference.
Official SDKs by language
OpenAI officially supports SDKs for Python and Node.js, and these are the recommended way to integrate the Whisper API into applications. Because OpenAI maintains them, they are generally the first clients to reflect changes to the API.
| Language | Package Name | Installation Command | Maturity |
|---|---|---|---|
| Python | openai | pip install openai | Stable |
| Node.js | openai | npm install openai | Stable |
These packages are part of the larger OpenAI API client libraries, meaning they provide access not only to the Whisper API but also to other OpenAI models like GPT for language generation or DALL-E for image generation. Developers should refer to the specific OpenAI API documentation for audio for detailed usage instructions and available methods.
Installation
To begin using the OpenAI Whisper API, you must first install the appropriate SDK for your chosen programming language. The installation process typically involves using a package manager specific to that language.
Python
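For Python, the openai package is installed using pip, the standard package installer for Python. The first 1.x releases of the package required Python 3.7.1 or newer, and later releases have raised that minimum, so check the Python requirement listed on the package's PyPI page for the version you install.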
pip install openai
After installation, you can verify it by importing the library in a Python interpreter or script.
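One way to perform that check from a script is sketched below. It uses only the standard library's importlib.metadata (available from Python 3.8), and the helper name installed_version is made up for this example.

```python
from importlib import metadata


def installed_version(package="openai"):
    """Return the installed version of a package, or None if it is missing.

    Hypothetical helper; it only inspects package metadata and does not
    import the package itself.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("openai is not installed; run: pip install openai")
    else:
        print(f"openai {version} is installed")
```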
Node.js
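For Node.js, the openai package is installed using npm, the default package manager for Node.js. The client interface used in this guide (new OpenAI()) belongs to version 4 and later of the package, which targets maintained LTS releases of Node.js rather than Node.js 12; check the package README for the current minimum version.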
npm install openai
Once installed, the package can be imported into your JavaScript or TypeScript projects.
Before making API calls, you will also need an API key from your OpenAI account, which can be generated and managed on the OpenAI API keys page.
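A small sketch of reading the key from the environment and failing early when it is missing is shown here. The helper name load_api_key is hypothetical; OPENAI_API_KEY is the variable name used in the quickstart examples.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in the given environment variable.

    Hypothetical helper: raises RuntimeError with a readable message
    instead of letting the first API call fail with an auth error.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set; create a key in your OpenAI account "
            "and export it before running this script."
        )
    return key
```

Keeping the key in the environment rather than in source code avoids committing it to version control by accident.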
Quickstart example
The following examples demonstrate how to transcribe an audio file using the OpenAI Whisper API through its official SDKs. These snippets assume you have already installed the respective SDK and have your OpenAI API key configured (e.g., as an environment variable OPENAI_API_KEY).
Python Quickstart
This Python example shows how to transcribe an audio file named audio.mp3 using the openai SDK. The transcript object will contain the transcribed text.
import openai
import os

# Ensure your API key is set as an environment variable or passed directly
# openai.api_key = os.getenv("OPENAI_API_KEY")

# Path to your audio file
audio_file_path = "./audio.mp3"


def transcribe_audio(file_path):
    try:
        # Open the audio in binary mode and send it to the whisper-1 model
        with open(file_path, "rb") as audio_file:
            transcript = openai.audio.transcriptions.create(
                model="whisper-1",
                file=audio_file
            )
        return transcript.text
    except Exception as e:
        return f"An error occurred: {e}"


if __name__ == "__main__":
    transcribed_text = transcribe_audio(audio_file_path)
    print(f"Transcribed Text: {transcribed_text}")
Node.js Quickstart
This Node.js example demonstrates transcribing an audio file named audio.mp3. The transcribed text will be available in the transcript.text property.
const OpenAI = require('openai');
const fs = require('fs');

// The client reads OPENAI_API_KEY from the environment
const openai = new OpenAI();
const audioFilePath = './audio.mp3';

async function transcribeAudio(filePath) {
  try {
    // Stream the audio file to the whisper-1 model
    const transcript = await openai.audio.transcriptions.create({
      file: fs.createReadStream(filePath),
      model: 'whisper-1',
    });
    return transcript.text;
  } catch (error) {
    return `An error occurred: ${error.message}`;
  }
}

(async () => {
  const transcribedText = await transcribeAudio(audioFilePath);
  console.log(`Transcribed Text: ${transcribedText}`);
})();
These examples illustrate the basic process. For more advanced features, such as specifying response formats, language hints, or translating audio to English, consult the OpenAI transcription API reference.
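As a rough illustration of those options, the sketch below passes a language hint and a response format to the transcription call and uses the translation endpoint for English output. The wrapper functions transcribe and translate_to_english are hypothetical. The client argument is assumed to be an OpenAI() instance from the official Python SDK, and language and response_format are request parameters described in the API reference.

```python
from typing import Any, Optional


def transcribe(client: Any, path: str, *,
               language: Optional[str] = None,
               response_format: Optional[str] = None) -> Any:
    """Transcribe an audio file, forwarding only the options that were set.

    Hypothetical wrapper; `client` is assumed to be an OpenAI() instance.
    """
    options = {"model": "whisper-1"}
    if language is not None:
        options["language"] = language  # language code hint, e.g. "de"
    if response_format is not None:
        options["response_format"] = response_format  # e.g. "srt" or "vtt"
    with open(path, "rb") as audio_file:
        return client.audio.transcriptions.create(file=audio_file, **options)


def translate_to_english(client: Any, path: str) -> Any:
    """Translate speech in the audio file into English text."""
    with open(path, "rb") as audio_file:
        return client.audio.translations.create(model="whisper-1", file=audio_file)
```

A call such as transcribe(OpenAI(), "./audio.mp3", language="de", response_format="srt") would request German subtitles. The shape of the return value may depend on the chosen format, so check the API reference before relying on a .text attribute.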
Community libraries
While OpenAI provides official SDKs for Python and Node.js, the broader developer community has created libraries and wrappers for other programming languages and frameworks. These community-maintained projects often aim to provide similar ease of use or integrate Whisper functionality into specific environments where an official SDK is not available.
Examples of community contributions might include:
- Go wrappers: Developers have created Go clients that interact with the OpenAI API, including Whisper endpoints, by making HTTP requests directly.
- PHP clients: Similar to Go, PHP libraries exist that abstract the RESTful calls to the OpenAI API, allowing PHP applications to leverage Whisper's capabilities.
- CLI tools: Command-line interface tools built on top of the OpenAI API often incorporate Whisper for batch processing or quick transcriptions without writing custom code.
When considering a community library, evaluate its maintenance status, documentation quality, and how actively it is developed. Such libraries can offer flexibility, but the official SDKs generally provide the most stable and feature-complete integration with OpenAI's services. These projects can usually be found on platforms like GitHub by searching for keywords such as "OpenAI Whisper [language] client" or "OpenAI API [language] wrapper."
For background on the HTTP requests that many community libraries wrap, the MDN Web Docs page on the Fetch API is a useful primer.
Always verify the source and security practices of any third-party library before incorporating it into a production environment. For critical applications, direct integration via the official SDKs or raw API calls (with proper authentication and error handling) is generally recommended.
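For the error-handling part of a raw or community-library integration, one common pattern is to retry transient failures with exponential backoff. The sketch below is generic and not specific to any OpenAI library: call_with_retries and its parameters are hypothetical names, and the default set of retryable exceptions is an assumption that should be adapted to whatever errors your HTTP client raises.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    operation: Callable[[], T],
    *,
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying listed exceptions with exponential backoff.

    Hypothetical helper: waits base_delay, then 2 * base_delay, and so on
    between attempts, and re-raises the error after the last attempt.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return operation()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The operation is passed as a zero-argument callable, for example a lambda wrapping the transcription request, so the helper stays independent of the client in use. The official SDKs include their own retry behavior, so a wrapper like this matters mainly for raw HTTP calls or third-party clients.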