What types of models does Together AI support?

Together AI primarily supports open-source large language models (LLMs), including popular model families such as Llama, Mixtral, and Falcon, for both inference and fine-tuning.

Is there a free tier available for Together AI?

Yes, Together AI offers a free tier that includes up to $25 in credits for new users to experiment with their platform and services.

What programming languages are supported by Together AI SDKs?

Together AI provides official SDKs for Python and JavaScript, enabling developers to easily integrate the API into applications written in these languages. cURL examples are also available.

How is Together AI's pricing structured?

Pricing is pay-as-you-go. Inference is billed per token, while fine-tuning costs are calculated hourly based on GPU usage. Specific rates vary by model and GPU type.

What is SOC 2 Type II compliance?

SOC 2 Type II compliance indicates that an independent auditor has examined Together AI's controls for managing customer data and tested how well they operated over a period of time, not just how they were designed. The audit is based on the Trust Services Criteria: security, availability, processing integrity, confidentiality, and privacy.

Can I fine-tune custom models with Together AI?

Yes, Together AI offers a Fine-tuning API that allows users to train and customize open-source LLMs using their own datasets to achieve better performance on specific tasks or domains.

What are serverless GPUs?

Serverless GPUs are Together AI's on-demand GPU computing resources, which scale up or down automatically based on workload. Users do not need to provision or manage their own GPU infrastructure for AI tasks.

Together AI — Open-Source LLM Hosting and Fine-Tuning

Together AI provides an API platform for deploying and fine-tuning open-source large language models (LLMs). It offers access to serverless GPUs for inference and training, positioning itself as a cost-effective solution for developers and researchers working with a range of community-driven AI models. The platform supports various models and aims to simplify the operational complexities of running LLMs.

Overview

Together AI offers an API-driven platform designed for developers and researchers to deploy, fine-tune, and run open-source large language models (LLMs). The service focuses on providing access to a wide array of pre-trained models, including popular architectures like Llama, Mixtral, and Falcon, through its inference API. This approach allows users to integrate advanced AI capabilities into their applications without managing underlying infrastructure or specialized hardware.

The platform is optimized for scenarios requiring cost-effective inference and efficient fine-tuning of custom models. Developers can utilize Together AI's serverless GPU infrastructure to scale their AI workloads dynamically. This flexibility supports various use cases, from integrating generative AI into consumer applications to conducting advanced research on novel model architectures.

Together AI aims to simplify the operational aspects of working with LLMs. Its developer experience is characterized by a straightforward API, documentation with code examples for common tasks, and client libraries for Python and JavaScript. The emphasis is on performance and cost efficiency, making it suitable for projects with budget constraints or those requiring high throughput for LLM interactions. General web performance principles, such as those in the Mozilla Web Performance documentation, also apply to API interactions, including those with AI models.

The service targets a broad audience, including startups building AI-powered products, enterprises looking to experiment with or deploy open-source LLMs, and academic researchers requiring scalable compute resources. By focusing on open-source models, Together AI allows users to avoid vendor lock-in and leverage the rapid advancements occurring within the open-source AI community.

Key applications include natural language generation, text summarization, code generation, and complex reasoning tasks. The platform's fine-tuning capabilities let users adapt general-purpose models to specific domain knowledge or proprietary datasets, improving accuracy and relevance for specialized applications. Fine-tuning trains an existing model on new data to improve its performance on particular tasks, a common practice in machine learning development. Google Cloud's Machine Learning documentation describes a similar approach, applying custom training to various AI models.
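Fine-tuning starts with a dataset in a format the training job can read. The sketch below shows one way to turn prompt/completion pairs into a JSON Lines file. The single "text" field per record is an assumption made for illustration, and write_training_file is a hypothetical helper, not part of any SDK; check the Together AI fine-tuning documentation for the schema a given model expects.

```python
import json
import tempfile
from pathlib import Path


def write_training_file(examples, path):
    """Write (prompt, completion) pairs as JSON Lines and return the record count.

    Assumption: each record is a JSON object with one "text" field.
    """
    written = 0
    with Path(path).open("w", encoding="utf-8") as f:
        for prompt, completion in examples:
            # Skip pairs with an empty side; they add nothing to training.
            if not prompt.strip() or not completion.strip():
                continue
            record = {"text": f"{prompt}\n{completion}"}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
            written += 1
    return written


if __name__ == "__main__":
    pairs = [
        ("Q: What does the warranty cover?", "A: Parts and labor for two years."),
        ("Q: How do I reset the device?", "A: Hold the power button for ten seconds."),
    ]
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp) / "train.jsonl"
        count = write_training_file(pairs, out)
        print(f"Wrote {count} records to {out.name}")
```

Writing one JSON object per line keeps the file streamable, so a large dataset can be validated or uploaded without loading it all into memory.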

Together AI was founded in 2022 and offers a free tier with up to $25 in credits, allowing developers to experiment with the platform before committing to paid usage. The company also maintains SOC 2 Type II compliance, addressing data security and privacy concerns for enterprise users.

Key features

Inference API: Provides programmatic access to a catalog of open-source large language models for text generation, summarization, and other AI tasks (Together AI Inference API Reference).
Fine-tuning API: Enables users to customize pre-trained open-source LLMs with their own datasets to improve performance on specific tasks or domains.
Serverless GPUs: Offers on-demand GPU compute resources for running inference and fine-tuning jobs, abstracting away infrastructure management.
Extensive Model Catalog: Supports a wide range of popular open-source LLMs, including models from Meta (Llama), Mistral AI (Mixtral), and the Technology Innovation Institute (Falcon), giving developers flexibility in model choice.
Developer SDKs: Provides client libraries for Python and JavaScript, simplifying integration into existing application workflows (Together AI Python and JavaScript SDKs).
Cost-Effective Pricing: Bills inference per token and fine-tuning per GPU hour on a pay-as-you-go basis, designed to be competitive for large-scale AI workloads.
SOC 2 Type II Compliance: Demonstrates commitment to data security and privacy standards, important for enterprise deployments.

Pricing

Together AI operates on a pay-as-you-go model for both inference and fine-tuning services. Inference costs are calculated per token, while fine-tuning is billed hourly for GPU usage. A free tier is available, offering up to $25 in credits for new users to explore the platform's capabilities. Pricing details are subject to change; for the most current information, refer to the official Together AI pricing page.

Service	Unit	Pricing (as of 2026-05-29)
Inference (Input)	Per 1M tokens	Varies by model (e.g., Llama-2-7B-Chat: $0.10)
Inference (Output)	Per 1M tokens	Varies by model (e.g., Llama-2-7B-Chat: $0.10)
Fine-tuning	Per GPU hour	Varies by GPU type (e.g., A100: ~$1.50/hour)
Free Tier	Credits	Up to $25
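To make the billing model concrete, here is a minimal cost estimate in Python. The function names are hypothetical, and the rates are only the example figures from the table above ($0.10 per 1M tokens, roughly $1.50 per GPU hour); real rates vary by model and GPU type.

```python
def inference_cost(input_tokens, output_tokens, input_rate_per_m, output_rate_per_m):
    """Return the inference cost in dollars given per-1M-token rates."""
    return (
        input_tokens / 1_000_000 * input_rate_per_m
        + output_tokens / 1_000_000 * output_rate_per_m
    )


def fine_tuning_cost(gpu_hours, rate_per_gpu_hour, gpu_count=1):
    """Return the fine-tuning cost in dollars for hourly GPU billing."""
    return gpu_hours * gpu_count * rate_per_gpu_hour


if __name__ == "__main__":
    # Example rates taken from the table; treat them as placeholders.
    chat = inference_cost(2_000_000, 500_000, 0.10, 0.10)
    tune = fine_tuning_cost(gpu_hours=4, rate_per_gpu_hour=1.50)
    print(f"Inference: ${chat:.2f}")
    print(f"Fine-tuning: ${tune:.2f}")
```

At these example rates, 2M input tokens plus 0.5M output tokens cost $0.25, and a four-hour job on a single GPU costs $6.00, so both would fit within the $25 in free-tier credits.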

Common integrations

LangChain: Integrate Together AI models into LangChain applications for building complex LLM chains and agents (Together AI LangChain integration guide).
LlamaIndex: Connect with LlamaIndex for advanced data indexing and retrieval augmented generation (RAG) using Together AI's inference capabilities.
Hugging Face Transformers: Utilize models hosted on Together AI with the Hugging Face ecosystem for research and development workflows.
Custom Python Applications: Directly integrate the Together AI Python SDK into custom applications for inference and fine-tuning.
Custom JavaScript/TypeScript Applications: Use the Together AI JavaScript SDK to embed LLM capabilities into web and Node.js applications.

Alternatives

Anyscale: Offers a platform for building and scaling AI applications, including support for open-source LLMs, with a focus on Ray for distributed computing.
Fireworks AI: Provides an API for deploying and serving open-source LLMs with low latency and competitive pricing, similar to Together AI's inference service.
Perplexity AI: Focuses on an AI search engine, but also offers an API for its models, which can be an alternative for specific information retrieval and summarization tasks.

Getting started

To begin using Together AI, generate an API key from your account dashboard and use it to authenticate requests to the inference or fine-tuning APIs. The following Python example sends a basic text generation request with the Together AI Python SDK. It uses the client-object interface of current SDK releases; older releases exposed the same request as a module-level together.Complete.create call.

import os

from together import Together

# Read the key from the environment rather than hard-coding it in source.
client = Together(api_key=os.environ["TOGETHER_API_KEY"])


def generate_text(prompt, model="mistralai/Mixtral-8x7B-Instruct-v0.1"):
    """Send one completion request and return the generated text."""
    try:
        response = client.completions.create(
            prompt=prompt,
            model=model,
            max_tokens=100,          # upper bound on generated tokens
            temperature=0.7,         # higher values give more varied output
            top_p=0.7,               # nucleus sampling cutoff
            top_k=50,                # sample only from the 50 most likely tokens
            repetition_penalty=1.0,  # 1.0 applies no penalty
        )
        return response.choices[0].text
    except Exception as e:  # broad catch keeps the example independent of SDK exception names
        return f"Error generating text: {e}"


if __name__ == "__main__":
    user_prompt = "Write a short story about a robot who discovers a love for painting."
    generated_story = generate_text(user_prompt)
    print("Generated Story:\n", generated_story)

This code snippet creates a Together AI client with an API key, then defines a function that sends a text generation request to the specified model. The parameters max_tokens, temperature, top_p, top_k, and repetition_penalty control the length and randomness of the output. Set the TOGETHER_API_KEY environment variable to the API key obtained from the Together AI console before running it.
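The same kind of request can be made without the SDK, which is roughly what a cURL example does. The sketch below builds the HTTP request with Python's standard library only. The endpoint URL, the payload fields, and the shape of the JSON response are assumptions based on the platform's OpenAI-style REST interface; build_completion_request and send are hypothetical helper names. Confirm the details against the Together AI API reference before relying on them.

```python
import json
import os
import urllib.request

# Assumed endpoint for text completions; verify against the API reference.
API_URL = "https://api.together.xyz/v1/completions"


def build_completion_request(prompt, model, api_key, max_tokens=100):
    """Build (but do not send) a POST request for a text completion."""
    payload = {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request, timeout=30):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    key = os.environ.get("TOGETHER_API_KEY")
    req = build_completion_request(
        "Write a haiku about GPUs.",
        "mistralai/Mixtral-8x7B-Instruct-v0.1",
        key or "YOUR_API_KEY",
    )
    print(req.get_method(), req.full_url)
    if key:  # only call the network when a real key is configured
        body = send(req)
        # Assumed response shape: a list of choices, each with a "text" field.
        print(body["choices"][0]["text"])
```

Separating request construction from sending makes the payload easy to inspect or log before any billed call is made.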
