Pricing overview
Cohere's pricing is primarily usage-based: a pay-as-you-go model in which costs are calculated per token. This applies across its core language models and its embedding and retrieval services, including Command R, Command R+, Command, Embed, and Rerank. Pricing distinguishes between input tokens (the text sent to the model) and output tokens (the text generated by the model), with rates that vary by model complexity and capability. Developers and businesses can therefore scale usage with demand and pay only for the resources consumed. Cohere also offers a free tier for developers to experiment with its APIs, alongside custom enterprise pricing for large-scale deployments and specific requirements (see Cohere's official pricing details).
Token-based pricing is common practice among large language model (LLM) providers and gives granular control over expenditure. A single token typically represents about four characters of English text, though this varies by language and tokenization method (see Google's machine learning glossary definition of a token). Understanding tokenization is crucial for estimating costs, because longer inputs and outputs mean higher token counts and therefore higher costs.
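As a rough illustration of the four-characters-per-token rule of thumb, the sketch below estimates a token count from character length. The function name `estimate_tokens` and the default ratio are assumptions for illustration only; actual counts come from the model's tokenizer and will differ, especially for non-English text.

```python
import math


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (a heuristic, not a tokenizer)."""
    if not text:
        return 0
    # Round up so that short strings still count as at least one token.
    return math.ceil(len(text) / chars_per_token)


prompt = "Write a short product description for a stainless steel water bottle."
print(estimate_tokens(prompt))
```

An estimate like this is only suitable for early budgeting; for billing-grade numbers, count tokens with the provider's own tokenizer.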
Plans and tiers
Cohere offers two main tiers for API access: a Free Tier for development and testing, and a Production Tier for live applications. Enterprise-level solutions are also available under custom agreements.
Free Tier
The Free Tier is designed to allow developers to explore Cohere's capabilities without initial financial commitment. It provides monthly allocations for various models:
- Up to 5 million input tokens for Command R
- Up to 100,000 output tokens for Command R
- Up to 1 million input tokens for Embed
- Up to 1 million input tokens for Rerank
These limits reset monthly, enabling continuous development and prototyping (see the Cohere pricing page).
Production Tier (Pay-as-you-go)
The Production Tier operates on a pay-as-you-go basis, with costs calculated per 1,000 tokens. Rates differ significantly between input and output tokens, as well as between different models. The following table summarizes the general pricing structure for key Cohere models as of May 2026:
| Product/Model | Input price (USD per 1K tokens) | Output price (USD per 1K tokens) | Key limits/features | Best for |
|---|---|---|---|---|
| Command R+ | $3.00 | $15.00 | Most capable, RAG optimized, Tool Use | Complex enterprise RAG, advanced automation |
| Command R | $0.50 | $1.50 | Mid-size, RAG optimized, Tool Use | Enterprise search, conversational AI |
| Command | $0.01 | $0.015 | General purpose, reliable generation | Text generation, content creation, summarization |
| Embed v3 (English) | $0.0001 | N/A (input only) | Optimized for search, 1024 dimensions | Semantic search, retrieval-augmented generation (RAG) |
| Embed v3 (Multilingual) | $0.0001 | N/A (input only) | Supports 100+ languages, 1024 dimensions | Global semantic search, cross-lingual RAG |
| Rerank v3 | $0.001 | N/A (input only) | Improves search relevance | Search result optimization, information retrieval |
Note: All prices are illustrative and subject to change. Refer to the official Cohere pricing page for the most current rates.
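The per-1,000-token arithmetic can be sketched as a small lookup plus a function. This is a minimal sketch, assuming the illustrative rates from the table; the dictionary keys are hypothetical labels chosen for this example, not official API model identifiers.

```python
# Illustrative USD rates per 1,000 tokens as (input, output), taken from the table.
# An output rate of None marks a model billed on input tokens only.
# The keys are hypothetical labels, not official model identifiers.
RATES_PER_1K = {
    "command-r-plus": (3.00, 15.00),
    "command-r": (0.50, 1.50),
    "command": (0.01, 0.015),
    "embed-v3": (0.0001, None),
    "rerank-v3": (0.001, None),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int = 0) -> float:
    """Estimate the cost in USD for a given token volume."""
    input_rate, output_rate = RATES_PER_1K[model]
    if output_rate is None and output_tokens:
        raise ValueError(f"{model} is billed on input tokens only")
    cost = input_tokens / 1000 * input_rate
    if output_rate is not None:
        cost += output_tokens / 1000 * output_rate
    return round(cost, 6)


print(estimate_cost("command", 2_000_000, 1_500_000))
```

Keeping the rates in one table makes it easy to swap in current figures from the official pricing page before relying on any estimate.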
Enterprise Solutions
For organizations with significant usage requirements, specific compliance needs (e.g., SOC 2 Type II, GDPR, HIPAA), or private cloud deployments, Cohere offers custom enterprise agreements. These plans typically include dedicated support, tailored pricing, and advanced security features (see Cohere's enterprise offerings).
Free tier and limits
Cohere provides a generous free tier to facilitate development and experimentation across its primary services. The free tier limits are:
- Command R: 5 million input tokens and 100,000 output tokens per month. This allows for substantial testing of conversational AI and RAG-optimized applications.
- Embed: 1 million input tokens per month. This is sufficient for generating embeddings for a considerable amount of text, enabling the development of semantic search and recommendation systems.
- Rerank: 1 million input tokens per month. This tier supports testing the relevance improvement capabilities of the Rerank model on a significant dataset.
Each limit is shared across all models within its category (e.g., all Embed models draw on the same 1 million input token allowance). Allowances reset at the beginning of each calendar month. Usage beyond these limits requires moving to the production pay-as-you-go tier or upgrading to an enterprise plan (see Cohere's free tier details).
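To check whether a month's usage fits within the allowances listed above, a small comparison like the following can help. It is a sketch under stated assumptions: the category keys and the `free_tier_overage` helper are hypothetical names, and the limits are the figures quoted in this section rather than values read from any API.

```python
# Monthly free-tier allowances in tokens, as quoted in this section.
# The category keys are hypothetical labels for this example.
FREE_TIER_MONTHLY = {
    "command_r_input": 5_000_000,
    "command_r_output": 100_000,
    "embed_input": 1_000_000,
    "rerank_input": 1_000_000,
}


def free_tier_overage(usage: dict[str, int]) -> dict[str, int]:
    """Return the tokens used beyond the monthly allowance for each category."""
    return {
        category: max(0, usage.get(category, 0) - limit)
        for category, limit in FREE_TIER_MONTHLY.items()
    }


usage = {
    "command_r_input": 4_200_000,
    "command_r_output": 130_000,
    "embed_input": 1_000_000,
}
over = free_tier_overage(usage)
# Show only the categories that exceeded their allowance.
print({category: tokens for category, tokens in over.items() if tokens > 0})
```

In this sample, only the Command R output allowance is exceeded, which is the category most likely to run out first given its smaller limit.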
Real-world cost examples
Understanding Cohere's token-based pricing in practical scenarios helps in estimating potential costs. Here are a few examples based on the listed production tier rates (as of May 2026).
Example 1: Basic Text Generation with Command
- Scenario: Generating 10,000 short product descriptions, each averaging 200 input tokens (prompt) and 150 output tokens (generated description).
- Model: Command
- Input Tokens: 10,000 descriptions * 200 input tokens/description = 2,000,000 input tokens
- Output Tokens: 10,000 descriptions * 150 output tokens/description = 1,500,000 output tokens
- Cost Calculation:
  - Input cost: (2,000,000 / 1,000) * $0.01 = $20.00
  - Output cost: (1,500,000 / 1,000) * $0.015 = $22.50
- Total Cost: $42.50
Example 2: Semantic Search with Embed and Rerank
- Scenario: Embedding 500,000 documents for a knowledge base (each 500 tokens) once, and then performing 100,000 search queries (each 50 tokens) per month, with results reranked.
- Models: Embed v3 (Multilingual), Rerank v3
- Initial Embedding Cost (one-time):
  - Documents to embed: 500,000
  - Tokens per document: 500
  - Total Embed input tokens: 500,000 * 500 = 250,000,000
  - Embed cost: (250,000,000 / 1,000) * $0.0001 = $25.00
- Monthly Search & Rerank Cost:
  - Search queries: 100,000
  - Tokens per query (Embed): 50
  - Rerank calls: 100,000 (each call is assumed to pass the query plus the top 50 candidate documents)
  - Total Embed input tokens (monthly): 100,000 * 50 = 5,000,000
  - Total Rerank input tokens (monthly): 100,000 queries * (50 query tokens + 50 documents * 500 tokens/document) = 100,000 * 25,050 = 2,505,000,000, or roughly 2,500,000,000 tokens
  - Embed monthly cost: (5,000,000 / 1,000) * $0.0001 = $0.50
  - Rerank monthly cost: (2,500,000,000 / 1,000) * $0.001 = approximately $2,500.00
- Total Monthly Cost: ~$2,500.50 (plus the one-time embedding cost)
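The search scenario can be reproduced with a short script, which also shows that rerank volume, not embedding, dominates the bill. This is a sketch using the illustrative per-1,000-token rates from this article; the helper name `rerank_tokens_per_query` is an assumption for this example. Without rounding, the rerank token count is 2,505,000,000, so the exact figure comes out slightly higher than the rounded one in the example.

```python
EMBED_RATE_PER_1K = 0.0001   # illustrative USD rate per 1,000 tokens
RERANK_RATE_PER_1K = 0.001   # illustrative USD rate per 1,000 tokens


def rerank_tokens_per_query(query_tokens: int, candidates: int, tokens_per_doc: int) -> int:
    """Tokens sent in one rerank call: the query plus every candidate document."""
    return query_tokens + candidates * tokens_per_doc


queries_per_month = 100_000
per_query = rerank_tokens_per_query(query_tokens=50, candidates=50, tokens_per_doc=500)

# One-time cost of embedding the 500,000-document knowledge base.
one_time_embed_cost = 500_000 * 500 / 1000 * EMBED_RATE_PER_1K
# Recurring monthly costs for embedding queries and reranking candidates.
monthly_embed_cost = queries_per_month * 50 / 1000 * EMBED_RATE_PER_1K
monthly_rerank_cost = queries_per_month * per_query / 1000 * RERANK_RATE_PER_1K

print(f"One-time embedding: ${one_time_embed_cost:,.2f}")
print(f"Monthly embed:      ${monthly_embed_cost:,.2f}")
print(f"Monthly rerank:     ${monthly_rerank_cost:,.2f}")
```

Because the rerank cost scales with candidates times document length, reducing either the number of candidates per call or the size of each passage is the most direct lever on this estimate.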
Example 3: Conversational AI with Command R
- Scenario: A chatbot handling 5,000 conversations per day, with each conversation averaging 10 turns. Each turn involves 150 input tokens (user query + context) and 100 output tokens (bot response).
- Model: Command R
- Daily Interactions: 5,000 conversations * 10 turns/conversation = 50,000 turns
- Daily Input Tokens: 50,000 turns * 150 input tokens/turn = 7,500,000 input tokens
- Daily Output Tokens: 50,000 turns * 100 output tokens/turn = 5,000,000 output tokens
- Monthly Cost (approx. 30 days):
  - Monthly input tokens: 7,500,000 * 30 = 225,000,000
  - Monthly output tokens: 5,000,000 * 30 = 150,000,000
  - Input cost: (225,000,000 / 1,000) * $0.50 = $112,500.00
  - Output cost: (150,000,000 / 1,000) * $1.50 = $225,000.00
- Total Monthly Cost: $337,500.00
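The chatbot projection above is a chain of multiplications, so it is easy to parameterize and rerun with different traffic assumptions. The sketch below uses the illustrative Command R rates from this article; the function `monthly_chat_cost` and its parameter names are hypothetical.

```python
def monthly_chat_cost(
    conversations_per_day: int,
    turns_per_conversation: int,
    input_tokens_per_turn: int,
    output_tokens_per_turn: int,
    input_rate_per_1k: float,
    output_rate_per_1k: float,
    days: int = 30,
) -> tuple[float, float, float]:
    """Return (input cost, output cost, total cost) in USD for one month."""
    turns = conversations_per_day * turns_per_conversation * days
    input_cost = turns * input_tokens_per_turn / 1000 * input_rate_per_1k
    output_cost = turns * output_tokens_per_turn / 1000 * output_rate_per_1k
    return input_cost, output_cost, input_cost + output_cost


# Scenario from the example: 5,000 conversations/day, 10 turns each,
# 150 input and 100 output tokens per turn, illustrative Command R rates.
input_cost, output_cost, total = monthly_chat_cost(5_000, 10, 150, 100, 0.50, 1.50)
print(f"Input: ${input_cost:,.2f}  Output: ${output_cost:,.2f}  Total: ${total:,.2f}")
```

Rerunning the function with shorter responses or fewer turns gives a quick sense of how sensitive the total is to each assumption.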
These examples illustrate how costs accumulate with the scale of operations and the specific models used. The most powerful models, such as Command R+, carry higher per-token rates because of their advanced capabilities. Developers are encouraged to optimize prompts and response length to manage token usage efficiently (see Cohere's tokenization documentation).
How the pricing compares
Cohere's token-based pricing model is standard within the LLM industry, aligning with major alternatives such as OpenAI and Anthropic. Providers typically set per-token rates according to model size, capability, and target use case (e.g., general text generation vs. highly specialized RAG). Direct price comparisons are complicated by differences in tokenization, model performance, and feature sets, but a general overview is still useful.
- OpenAI: Offers a diverse range of models (e.g., GPT-3.5 Turbo, GPT-4) at varying price points. GPT-4 Turbo models, designed for high performance and large context windows, typically have higher per-token costs than GPT-3.5 Turbo. OpenAI also distinguishes between input and output token pricing (see OpenAI's official pricing information). For instance, a basic model like GPT-3.5 Turbo 0125 might be priced at $0.0005 / 1K input tokens and $0.0015 / 1K output tokens, while GPT-4 Turbo could be $0.01 / 1K input and $0.03 / 1K output (see OpenAI API pricing details).
- Anthropic: Provides its Claude series of models (e.g., Claude 3 Opus, Claude 3 Sonnet, Claude 3 Haiku). Anthropic also uses a token-based model with separate rates for input and output, often emphasizing larger context windows and safety features. Claude 3 Haiku, its fastest and most compact model, might be priced around $0.25 / 1M input tokens and $1.25 / 1M output tokens, with Opus being significantly higher (see Anthropic's API pricing documentation).
- Google Cloud Vertex AI: Google's Vertex AI platform hosts various models, including the Gemini family. Pricing is also token-based, with costs varying by model and region. Google often offers tiered pricing or discounts for high-volume usage. For example, Gemini 1.5 Pro might be priced at $0.007 / 1K input and $0.021 / 1K output tokens, with specific rates for multimodal inputs (see Google Cloud Vertex AI pricing).
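The figures above are quoted in mixed units (some per 1K tokens, some per 1M), so comparing them requires normalizing to one unit first. The sketch below converts each quoted rate to USD per 1M tokens. The rates are the hedged "might be priced at" figures from the list, not verified current prices, and the helper `to_per_1m` is a hypothetical name.

```python
def to_per_1m(rate: float, unit_tokens: int) -> float:
    """Convert a rate quoted per `unit_tokens` tokens into a rate per 1M tokens."""
    return rate * 1_000_000 / unit_tokens


# (model, input rate, output rate, tokens per quoted unit), as quoted in the list above.
quoted = [
    ("GPT-3.5 Turbo 0125", 0.0005, 0.0015, 1_000),
    ("GPT-4 Turbo", 0.01, 0.03, 1_000),
    ("Claude 3 Haiku", 0.25, 1.25, 1_000_000),
    ("Gemini 1.5 Pro", 0.007, 0.021, 1_000),
]

normalized = {
    name: (round(to_per_1m(inp, unit), 4), round(to_per_1m(out, unit), 4))
    for name, inp, out, unit in quoted
}

for name, (inp, out) in normalized.items():
    print(f"{name}: ${inp:.2f} input / ${out:.2f} output per 1M tokens")
```

Even after normalizing units, the comparison remains approximate, because each provider tokenizes text differently and the same passage can yield different token counts.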
Cohere's RAG-optimized Command R and Command R+ models are positioned competitively for enterprise use cases where retrieval accuracy and tool use are critical. Its Embed and Rerank models are designed specifically for search and information retrieval. While specific per-token rates can fluctuate, Cohere's pricing strategy aims to provide value through model specialization and performance in key enterprise AI applications (see Cohere's pricing details).