Frequently asked questions

What is Cohere's primary pricing model?

Cohere primarily uses a pay-as-you-go, token-based pricing model. Costs are calculated based on the number of input tokens (text sent to the model) and output tokens (text generated by the model).

Does Cohere offer a free tier for developers?

Yes, Cohere offers a free tier with monthly allocations for its Command R (5M input, 100K output tokens), Embed (1M input tokens), and Rerank (1M input tokens) models, allowing developers to test and build applications.

How do Cohere's prices compare between different models like Command and Command R+?

More capable and specialized models like Command R+ have higher per-token costs than general-purpose models like Command, reflecting their advanced features, performance, and larger context windows.

Are there different prices for input and output tokens?

Yes, for generation models like Command R and Command R+, Cohere typically charges different rates for input tokens and output tokens, with output tokens often being more expensive due to the computational cost of generation.

What happens if I exceed the free tier limits?

If you exceed the monthly free tier limits, your usage will automatically transition to the production pay-as-you-go rates for the respective models. You can monitor your usage in your Cohere account dashboard.

Does Cohere offer custom pricing for large enterprises?

Yes, Cohere provides custom enterprise pricing solutions for organizations with significant usage, specific compliance requirements, or private cloud deployment needs. These plans often include dedicated support and tailored agreements.

How can I estimate the cost of using Cohere's APIs?

To estimate costs, identify the Cohere model you plan to use, estimate the input and output token volume your application will generate, and multiply each volume by the corresponding per-token rate published on the official Cohere pricing page.
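As a worked formula (a sketch; the symbols are introduced here for illustration), the estimate for a single model with rates quoted per 1,000 tokens is shown below. $T_{\mathrm{in}}$ and $T_{\mathrm{out}}$ are the input and output token counts, and $p_{\mathrm{in}}$ and $p_{\mathrm{out}}$ are the per-1K rates. For input-only models such as Embed and Rerank, the output term drops out. The second line plugs in the illustrative Command rates ($0.01 input, $0.015 output per 1K) for 2,000,000 input and 1,500,000 output tokens.

```latex
C = \frac{T_{\mathrm{in}}}{1000}\,p_{\mathrm{in}} + \frac{T_{\mathrm{out}}}{1000}\,p_{\mathrm{out}}

C = \frac{2{,}000{,}000}{1000}(0.01) + \frac{1{,}500{,}000}{1000}(0.015) = 20.00 + 22.50 = 42.50\ \text{USD}
```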

Cohere Pricing: Models, Tiers, and Cost Examples (2026)

Cohere pricing primarily operates on a pay-as-you-go, token-based model for its Command R, Embed, and Rerank APIs. A free tier is available for developers, offering monthly allocations of input and output tokens. Enterprise solutions provide custom pricing structures based on specific usage and deployment requirements, with detailed rates published on the official Cohere pricing page.

Pricing overview

Cohere's pricing is primarily usage-based: a pay-as-you-go model in which costs are calculated per token. This applies across its core language and embedding models, including Command R, Command R+, Command, Embed, and Rerank. The pricing distinguishes between input tokens (the text sent to the model) and output tokens (the text generated by the model), with rates that vary according to each model's complexity and capability. Because you pay only for what you consume, spending scales with demand. Cohere also offers a free tier for developers to experiment with its APIs, alongside custom enterprise pricing for large-scale deployments and specific requirements (see Cohere's official pricing details).

Token-based pricing is common practice among large language model (LLM) providers and gives fine-grained control over spending. As a rule of thumb, a single token represents about four characters of English text, though this varies by language and tokenization method (see Google's machine learning glossary definition of a token). Understanding tokenization is crucial for estimating costs: longer inputs and outputs mean more tokens and therefore higher costs.
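A minimal Python sketch of the four-characters-per-token rule of thumb, useful only for back-of-the-envelope budgeting. `estimate_tokens` is a hypothetical helper, and the ratio is an assumption taken from the rule above; actual counts depend on the model's tokenizer and the language of the text.

```python
import math

# Rule-of-thumb ratio for English text; real tokenizers will differ,
# especially for code, numbers, and non-English languages.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str, chars_per_token: float = CHARS_PER_TOKEN) -> int:
    """Return an approximate token count for `text`, rounded up."""
    if not text:
        return 0
    return math.ceil(len(text) / chars_per_token)


if __name__ == "__main__":
    prompt = "Write a short product description for a stainless steel water bottle."
    print(estimate_tokens(prompt))
```

Rounding up keeps the estimate conservative, which is usually the safer direction for cost planning.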

Plans and tiers

Cohere offers two main tiers for API access: a Free Tier for development and testing, and a Production Tier for live applications. Enterprise-level solutions are also available under custom agreements.

Free Tier

The Free Tier is designed to allow developers to explore Cohere's capabilities without initial financial commitment. It provides monthly allocations for various models:

Up to 5 million input tokens for Command R
Up to 100,000 output tokens for Command R
Up to 1 million input tokens for Embed
Up to 1 million input tokens for Rerank

These limits reset monthly, enabling continuous development and prototyping (see the Cohere pricing page).

Production Tier (Pay-as-you-go)

The Production Tier operates on a pay-as-you-go basis, with costs calculated per 1,000 tokens. Rates differ significantly between input and output tokens, as well as between different models. The following table summarizes the general pricing structure for key Cohere models as of May 2026:

Cohere Production Tier Pricing per 1,000 Tokens
Product/Model	Input (USD per 1K tokens)	Output (USD per 1K tokens)	Key Limits/Features	Best For
Command R+	$3.00	$15.00	Most capable, RAG optimized, Tool Use	Complex enterprise RAG, advanced automation
Command R	$0.50	$1.50	Mid-size, RAG optimized, Tool Use	Enterprise search, conversational AI
Command	$0.01	$0.015	General purpose, reliable generation	Text generation, content creation, summarization
Embed v3 (English)	$0.0001	N/A (input only)	Optimized for search, 1024 dimensions	Semantic search, retrieval-augmented generation (RAG)
Embed v3 (Multilingual)	$0.0001	N/A (input only)	Supports 100+ languages, 1024 dimensions	Global semantic search, cross-lingual RAG
Rerank v3	$0.001	N/A (input only)	Improves search relevance	Search result optimization, information retrieval

Note: All prices are illustrative and subject to change. Refer to the official Cohere pricing page for the most current rates.
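The table can be turned into a small cost estimator. This is a sketch, assuming the illustrative per-1K rates above; the dictionary keys are hypothetical labels chosen for this example, not official Cohere model IDs, and `estimate_cost` is a hypothetical helper. `Decimal` is used so currency arithmetic stays exact.

```python
from decimal import Decimal

# Illustrative (input, output) rates in USD per 1,000 tokens, copied from the table.
# Keys are hypothetical labels, not Cohere API model names.
# Output rate is None for input-only models (Embed, Rerank).
RATES_PER_1K = {
    "command-r-plus": (Decimal("3.00"), Decimal("15.00")),
    "command-r": (Decimal("0.50"), Decimal("1.50")),
    "command": (Decimal("0.01"), Decimal("0.015")),
    "embed-v3-english": (Decimal("0.0001"), None),
    "embed-v3-multilingual": (Decimal("0.0001"), None),
    "rerank-v3": (Decimal("0.001"), None),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int = 0) -> Decimal:
    """Estimated USD cost for one model at the table's per-1K rates."""
    input_rate, output_rate = RATES_PER_1K[model]
    if output_rate is None and output_tokens:
        raise ValueError(f"{model} is input-only; output_tokens must be 0")
    cost = Decimal(input_tokens) / 1000 * input_rate
    if output_rate is not None:
        cost += Decimal(output_tokens) / 1000 * output_rate
    return cost


if __name__ == "__main__":
    print(estimate_cost("command", 2_000_000, 1_500_000))
```

For instance, `estimate_cost("command", 2_000_000, 1_500_000)` evaluates to 42.50, matching the Command total in the first cost example below. Rejecting output tokens for input-only models catches a common mistake when adapting the estimator.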

Enterprise Solutions

For organizations with significant usage requirements, specific compliance needs (e.g., SOC 2 Type II, GDPR, HIPAA), or private cloud deployments, Cohere offers custom enterprise agreements. These plans typically include dedicated support, tailored pricing, and advanced security features (see Cohere's enterprise offerings).

Free tier and limits

Cohere provides a generous free tier to facilitate development and experimentation across its primary services. The free tier limits are:

Command R: 5 million input tokens and 100,000 output tokens per month. This allows for substantial testing of conversational AI and RAG-optimized applications.
Embed: 1 million input tokens per month. This is sufficient for generating embeddings for a considerable amount of text, enabling the development of semantic search and recommendation systems.
Rerank: 1 million input tokens per month. This tier supports testing the relevance improvement capabilities of the Rerank model on a significant dataset.

These limits are shared across all models within a category (for example, all Embed models draw from the same 1 million input-token allowance) and reset at the beginning of each calendar month. Usage beyond these limits is billed at production pay-as-you-go rates; organizations with larger needs can move to an enterprise plan instead (see Cohere's free tier details).
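A minimal sketch of how the free allowance offsets billable usage, assuming the monthly limits listed above and that any usage past the allowance is billed at pay-as-you-go rates. The `(family, direction)` keys and both helper functions are hypothetical names for this example, not part of Cohere's API or billing system. Treating models with no listed allowance as fully billable is also an assumption of this sketch.

```python
from typing import Mapping

# Monthly free-tier allowances listed above, in tokens.
# Keys are illustrative labels, not Cohere identifiers.
FREE_TIER = {
    ("command-r", "input"): 5_000_000,
    ("command-r", "output"): 100_000,
    ("embed", "input"): 1_000_000,
    ("rerank", "input"): 1_000_000,
}


def billable_tokens(family: str, direction: str, used_this_month: int) -> int:
    """Tokens beyond the free allowance (anything without a listed allowance is fully billable)."""
    allowance = FREE_TIER.get((family, direction), 0)
    return max(0, used_this_month - allowance)


def billable_embed_tokens(usage_by_model: Mapping[str, int]) -> int:
    """All Embed models share one allowance, so their usage is summed first."""
    return billable_tokens("embed", "input", sum(usage_by_model.values()))


if __name__ == "__main__":
    print(billable_tokens("command-r", "output", 250_000))
    print(billable_embed_tokens({"english": 700_000, "multilingual": 600_000}))
```

Summing Embed usage before subtracting the allowance reflects the shared-allowance rule. Subtracting per model would double-count the free tokens.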

Real-world cost examples

Understanding Cohere's token-based pricing in practical scenarios helps in estimating potential costs. Here are a few examples based on the listed production tier rates (as of May 2026).

Example 1: Basic Text Generation with Command

Scenario: Generating 10,000 short product descriptions, each averaging 200 input tokens (prompt) and 150 output tokens (generated description).
Model: Command
Input Tokens: 10,000 descriptions * 200 input tokens/description = 2,000,000 input tokens
Output Tokens: 10,000 descriptions * 150 output tokens/description = 1,500,000 output tokens
Cost Calculation:
- Input cost: (2,000,000 / 1,000) * $0.01 = $20.00
- Output cost: (1,500,000 / 1,000) * $0.015 = $22.50
- Total Cost: $42.50
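The Example 1 arithmetic can be reproduced in a few lines of Python. This is a sketch, assuming the illustrative Command rates from the table. `batch_generation_cost` is a hypothetical helper, and `Decimal` keeps the currency math exact instead of relying on binary floating point.

```python
from decimal import Decimal

# Scenario inputs from Example 1.
DESCRIPTIONS = 10_000
INPUT_PER_ITEM = 200
OUTPUT_PER_ITEM = 150

# Illustrative Command rates (USD per 1,000 tokens) from the pricing table.
COMMAND_INPUT_PER_1K = Decimal("0.01")
COMMAND_OUTPUT_PER_1K = Decimal("0.015")


def batch_generation_cost(items, in_per_item, out_per_item, in_rate, out_rate):
    """Return (input_tokens, output_tokens, total_cost) for a batch of similar requests."""
    input_tokens = items * in_per_item
    output_tokens = items * out_per_item
    input_cost = Decimal(input_tokens) / 1000 * in_rate
    output_cost = Decimal(output_tokens) / 1000 * out_rate
    return input_tokens, output_tokens, input_cost + output_cost


if __name__ == "__main__":
    tokens_in, tokens_out, total = batch_generation_cost(
        DESCRIPTIONS, INPUT_PER_ITEM, OUTPUT_PER_ITEM,
        COMMAND_INPUT_PER_1K, COMMAND_OUTPUT_PER_1K,
    )
    print(tokens_in, tokens_out, f"${total:.2f}")  # 2000000 1500000 $42.50
```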

Example 2: Semantic Search with Embed and Rerank

Scenario: Embedding a knowledge base of 500,000 documents (500 tokens each) once, then serving 100,000 search queries (50 tokens each) per month, with each query's candidate results reranked.
Models: Embed v3 (Multilingual), Rerank v3
Initial Embedding Cost (one-time):
- Documents to embed: 500,000
- Tokens per document: 500
- Total Embed input tokens: 500,000 * 500 = 250,000,000
- Embed cost: (250,000,000 / 1,000) * $0.0001 = $25.00
Monthly Search & Rerank Cost:
- Search queries: 100,000
- Tokens per query (Embed): 50
- Rerank calls: 100,000 (one per query; each call processes the query plus its top 50 candidate documents)
- Total Embed input tokens (monthly): 100,000 * 50 = 5,000,000
- Total Rerank input tokens (monthly): 100,000 queries * (50 query tokens + 50 documents * 500 tokens/document) = 100,000 * 25,050 = 2,505,000,000 tokens
- Embed monthly cost: (5,000,000 / 1,000) * $0.0001 = $0.50
- Rerank monthly cost: (2,505,000,000 / 1,000) * $0.001 = $2,505.00
- Total Monthly Cost: $2,505.50 (plus the one-time $25.00 embedding cost)
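A sketch that reproduces Example 2's arithmetic exactly, assuming the illustrative Embed v3 and Rerank v3 per-1K rates from the table. It also assumes, as the scenario does, that each rerank call is billed on the query plus every candidate document's tokens. Both functions are hypothetical helpers written for this example.

```python
from decimal import Decimal

EMBED_PER_1K = Decimal("0.0001")  # illustrative Embed v3 rate (USD per 1K tokens)
RERANK_PER_1K = Decimal("0.001")  # illustrative Rerank v3 rate (USD per 1K tokens)


def one_time_embedding_cost(num_docs: int, tokens_per_doc: int) -> Decimal:
    """Cost of embedding the whole knowledge base once."""
    return Decimal(num_docs * tokens_per_doc) / 1000 * EMBED_PER_1K


def monthly_search_cost(queries: int, query_tokens: int, candidates: int, doc_tokens: int):
    """Return (rerank_tokens, monthly_cost) for embedding each query and reranking its candidates."""
    embed_tokens = queries * query_tokens
    # Assumed billing: the query plus all candidate documents count toward each rerank call.
    rerank_tokens = queries * (query_tokens + candidates * doc_tokens)
    embed_cost = Decimal(embed_tokens) / 1000 * EMBED_PER_1K
    rerank_cost = Decimal(rerank_tokens) / 1000 * RERANK_PER_1K
    return rerank_tokens, embed_cost + rerank_cost


if __name__ == "__main__":
    print(f"${one_time_embedding_cost(500_000, 500):.2f}")  # $25.00
    tokens, monthly = monthly_search_cost(100_000, 50, 50, 500)
    print(tokens, f"${monthly:.2f}")  # 2505000000 $2505.50
```

Under these assumptions the rerank term dominates, and it scales with the number of candidates times their length. Reranking 20 candidates instead of 50 would bring the rerank line down to about $1,005 per month.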

Example 3: Conversational AI with Command R

Scenario: A chatbot handling 5,000 conversations per day, with each conversation averaging 10 turns. Each turn involves 150 input tokens (user query + context) and 100 output tokens (bot response).
Model: Command R
Daily Interactions: 5,000 conversations * 10 turns/conversation = 50,000 turns
Daily Input Tokens: 50,000 turns * 150 input tokens/turn = 7,500,000 input tokens
Daily Output Tokens: 50,000 turns * 100 output tokens/turn = 5,000,000 output tokens
Monthly Cost (approx. 30 days):
- Monthly input tokens: 7,500,000 * 30 = 225,000,000
- Monthly output tokens: 5,000,000 * 30 = 150,000,000
- Input cost: (225,000,000 / 1,000) * $0.50 = $112,500.00
- Output cost: (150,000,000 / 1,000) * $1.50 = $225,000.00
- Total Monthly Cost: $337,500.00
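Example 3's monthly estimate can be sketched as follows, assuming the illustrative Command R rates from the table and a 30-day month. `monthly_chat_cost` is a hypothetical helper.

```python
from decimal import Decimal

COMMAND_R_INPUT_PER_1K = Decimal("0.50")   # illustrative rate (USD per 1K tokens)
COMMAND_R_OUTPUT_PER_1K = Decimal("1.50")  # illustrative rate (USD per 1K tokens)


def monthly_chat_cost(conversations_per_day: int, turns_per_conversation: int,
                      input_per_turn: int, output_per_turn: int, days: int = 30) -> Decimal:
    """Monthly cost when every turn uses a fixed number of input and output tokens."""
    turns = conversations_per_day * turns_per_conversation * days
    input_cost = Decimal(turns * input_per_turn) / 1000 * COMMAND_R_INPUT_PER_1K
    output_cost = Decimal(turns * output_per_turn) / 1000 * COMMAND_R_OUTPUT_PER_1K
    return input_cost + output_cost


if __name__ == "__main__":
    print(f"${monthly_chat_cost(5_000, 10, 150, 100):.2f}")  # $337500.00
```

The flat 150 input tokens per turn is the scenario's own assumption. If an application resends earlier turns as context, input tokens per turn grow with conversation length, and this figure becomes a lower bound.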

These examples show how costs accumulate with the scale of operations and the choice of model. The most capable models, such as Command R+, command higher per-token rates. Keeping prompts concise and limiting response length both help control token usage (see Cohere's tokenization documentation).

How the pricing compares

Cohere's token-based pricing model is standard within the LLM industry and in line with major alternatives such as OpenAI and Anthropic. Providers typically set per-token rates according to model size, capability, and target use case (e.g., general text generation vs. specialized RAG). Direct price comparisons are complicated by differences in tokenization, model performance, and feature sets, but the quoted rates below give a general overview.

OpenAI: Offers a range of models (e.g., GPT-3.5 Turbo, GPT-4) at varying price points. GPT-4 Turbo models, designed for higher performance and larger context windows, typically cost more per token than GPT-3.5 Turbo. OpenAI also prices input and output tokens separately (see OpenAI's official pricing information). For instance, GPT-3.5 Turbo 0125 might be priced at $0.0005 / 1K input tokens and $0.0015 / 1K output tokens, while GPT-4 Turbo could be $0.01 / 1K input and $0.03 / 1K output (see OpenAI API pricing details).
Anthropic: Provides the Claude series of models (e.g., Claude 3 Opus, Claude 3 Sonnet, Claude 3 Haiku). Anthropic also uses token-based pricing with separate input and output rates, often emphasizing larger context windows and safety features. Claude 3 Haiku, its fastest and most compact model, might be priced around $0.25 / 1M input tokens and $1.25 / 1M output tokens (equivalent to $0.00025 / 1K and $0.00125 / 1K), with Opus priced significantly higher (see Anthropic's API pricing documentation).
Google Cloud Vertex AI: Google's Vertex AI platform hosts various models, including its Gemini family. Pricing is also token-based, with costs varying by model and region, and Google often offers tiered pricing or discounts for high-volume usage. For example, Gemini 1.5 Pro might be priced at $0.007 / 1K input and $0.021 / 1K output tokens, with specific rates for multimodal inputs (see Google Cloud Vertex AI pricing).
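Because the quotes above mix per-1K and per-1M units (Claude 3 Haiku is quoted per 1M tokens, the others per 1K), one way to compare them is to normalize everything to a single unit and price a fixed workload. This sketch assumes the quoted figures above, and the helper names are hypothetical. Since each provider tokenizes text differently, the same text yields different token counts, so cross-provider results are rough indications only.

```python
from decimal import Decimal


def per_1k_from_per_1m(rate_per_1m: Decimal) -> Decimal:
    """Convert a USD-per-1M-token rate into USD per 1K tokens."""
    return rate_per_1m / 1000


# Quoted (input, output) rates from the list above, normalized to USD per 1K tokens.
QUOTED_PER_1K = {
    "GPT-3.5 Turbo 0125": (Decimal("0.0005"), Decimal("0.0015")),
    "GPT-4 Turbo": (Decimal("0.01"), Decimal("0.03")),
    "Claude 3 Haiku": (per_1k_from_per_1m(Decimal("0.25")), per_1k_from_per_1m(Decimal("1.25"))),
    "Gemini 1.5 Pro": (Decimal("0.007"), Decimal("0.021")),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost of one workload at a model's quoted rates (token counts assumed equal across providers)."""
    in_rate, out_rate = QUOTED_PER_1K[model]
    return Decimal(input_tokens) / 1000 * in_rate + Decimal(output_tokens) / 1000 * out_rate


if __name__ == "__main__":
    for name in QUOTED_PER_1K:
        print(name, f"${workload_cost(name, 1_000_000, 1_000_000):.2f}")
```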

Cohere's RAG-optimized Command R and Command R+ models are positioned competitively for enterprise use cases where retrieval accuracy and tool use are critical, while its Embed and Rerank models are built specifically for search and information retrieval. Per-token rates change over time, but Cohere's pricing strategy aims to provide value through model specialization and performance in key enterprise AI applications (see Cohere's pricing details).
