Overview

Together AI offers an API-driven platform designed for developers and researchers to deploy, fine-tune, and run open-source large language models (LLMs). The service focuses on providing access to a wide array of pre-trained models, including popular architectures like Llama, Mixtral, and Falcon, through its inference API. This approach allows users to integrate advanced AI capabilities into their applications without managing underlying infrastructure or specialized hardware.

The platform is optimized for scenarios requiring cost-effective inference and efficient fine-tuning of custom models. Developers can utilize Together AI's serverless GPU infrastructure to scale their AI workloads dynamically. This flexibility supports various use cases, from integrating generative AI into consumer applications to conducting advanced research on novel model architectures.

Together AI aims to simplify the operational aspects of working with LLMs. Its developer experience is characterized by a straightforward API, comprehensive documentation with code examples for common tasks, and client libraries for popular programming languages such as Python and JavaScript. The emphasis is on performance and cost efficiency, making it suitable for projects with budget constraints or those requiring high throughput for LLM interactions. For general performance principles that also apply to API interactions, including those with AI models, see the Mozilla Web Performance documentation.

The service targets a broad audience, including startups building AI-powered products, enterprises looking to experiment with or deploy open-source LLMs, and academic researchers requiring scalable compute resources. By focusing on open-source models, Together AI allows users to avoid vendor lock-in and leverage the rapid advancements occurring within the open-source AI community.

Key applications include natural language generation, text summarization, code generation, and complex reasoning tasks. The platform's fine-tuning capabilities enable users to adapt generic models to specific domain knowledge or proprietary datasets, improving accuracy and relevance for specialized applications. Fine-tuning means continuing to train a pre-existing model on new data so that it performs better on particular tasks, a common practice in machine learning development. Google Cloud's Machine Learning documentation describes a comparable approach, applying custom training to various AI models.

Together AI was founded in 2022 and offers a free tier with up to $25 in credits, allowing developers to experiment with the platform before committing to paid usage. The company also maintains SOC 2 Type II compliance, addressing data security and privacy concerns for enterprise users.

Key features

  • Inference API: Provides programmatic access to a catalog of open-source large language models for text generation, summarization, and other AI tasks (Together AI Inference API Reference).
  • Fine-tuning API: Enables users to customize pre-trained open-source LLMs with their own datasets to improve performance on specific tasks or domains.
  • Serverless GPUs: Offers on-demand GPU compute resources for running inference and fine-tuning jobs, abstracting away infrastructure management.
  • Extensive Model Catalog: Supports a wide range of popular open-source LLMs, including models from Meta and Mistral AI as well as the Falcon family, giving developers flexibility in model choice.
  • Developer SDKs: Provides client libraries for Python and JavaScript, simplifying integration into existing application workflows (Together AI Python and JavaScript SDKs).
  • Cost-Effective Pricing: Utilizes a pay-as-you-go model for inference and hourly billing for fine-tuning, designed to be competitive for large-scale AI workloads.
  • SOC 2 Type II Compliance: Demonstrates commitment to data security and privacy standards, important for enterprise deployments.
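The Fine-tuning API listed above works from user-supplied datasets, so preparing the training file is usually the first step. The sketch below writes examples to a JSON Lines file with one object per line. The JSONL layout, the "text" field name, and the helper write_jsonl are assumptions made for illustration, not a confirmed Together AI schema; check the fine-tuning documentation for the format your chosen model expects.

```python
import json
from pathlib import Path


def write_jsonl(examples, path, text_field="text"):
    """Write one JSON object per line and return how many records were written.

    The single-field record layout is an assumption for illustration only.
    """
    path = Path(path)
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for example in examples:
            text = example.strip()
            if not text:
                continue  # skip empty examples rather than writing blank records
            f.write(json.dumps({text_field: text}, ensure_ascii=False) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    written = write_jsonl(
        ["Q: What is our refund window?\nA: 30 days.", "Q: Who do I contact?\nA: Support."],
        "train.jsonl",
    )
    print(f"Wrote {written} training examples")
```

Keeping one record per line lets large datasets be streamed and validated line by line before upload.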

Pricing

Together AI operates on a pay-as-you-go model for both inference and fine-tuning services. Inference costs are calculated per token, while fine-tuning is billed hourly for GPU usage. A free tier is available, offering up to $25 in credits for new users to explore the platform's capabilities. Pricing details are subject to change; for the most current information, refer to the official Together AI pricing page.

Service             | Unit            | Pricing (as of 2026-05-29)
Inference (Input)   | Per 1M tokens   | Varies by model (e.g., Llama-2-7B-Chat: $0.10)
Inference (Output)  | Per 1M tokens   | Varies by model (e.g., Llama-2-7B-Chat: $0.10)
Fine-tuning         | Per GPU hour    | Varies by GPU type (e.g., A100: ~$1.50/hour)
Free Tier           | Credits         | Up to $25
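Because inference is billed per token and fine-tuning per GPU hour, a workload's cost can be estimated with simple arithmetic. The sketch below uses the example rates quoted above ($0.10 per 1M tokens for Llama-2-7B-Chat, roughly $1.50 per A100 hour) purely as placeholders. The function names and the EXAMPLE_RATES table are hypothetical, and real rates vary by model and change over time.

```python
from decimal import Decimal

# Illustrative rates in USD per 1M tokens, taken from the example figures in
# the pricing section. These are placeholders, not authoritative prices.
EXAMPLE_RATES = {
    "Llama-2-7B-Chat": {"input": Decimal("0.10"), "output": Decimal("0.10")},
}

TOKENS_PER_UNIT = Decimal(1_000_000)


def estimate_inference_cost(model, input_tokens, output_tokens, rates=EXAMPLE_RATES):
    """Return the estimated USD cost of an inference workload as a Decimal."""
    rate = rates[model]
    input_cost = Decimal(input_tokens) / TOKENS_PER_UNIT * rate["input"]
    output_cost = Decimal(output_tokens) / TOKENS_PER_UNIT * rate["output"]
    return input_cost + output_cost


def estimate_finetuning_cost(gpu_hours, hourly_rate=Decimal("1.50")):
    """Return the estimated USD cost of a fine-tuning job billed per GPU hour."""
    return Decimal(str(gpu_hours)) * hourly_rate


if __name__ == "__main__":
    print("Inference:", estimate_inference_cost("Llama-2-7B-Chat", 2_000_000, 500_000))
    print("Fine-tuning:", estimate_finetuning_cost(4))
```

With these placeholder rates, 2M input tokens plus 500K output tokens come to about $0.25, and a 4-hour fine-tuning job to about $6. Decimal is used instead of float so that small per-token amounts do not accumulate rounding error.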

Common integrations

  • LangChain: Integrate Together AI models into LangChain applications for building complex LLM chains and agents (Together AI LangChain integration guide).
  • LlamaIndex: Connect with LlamaIndex for advanced data indexing and retrieval augmented generation (RAG) using Together AI's inference capabilities.
  • Hugging Face Transformers: Utilize models hosted on Together AI with the Hugging Face ecosystem for research and development workflows.
  • Custom Python Applications: Directly integrate the Together AI Python SDK into custom applications for inference and fine-tuning.
  • Custom JavaScript/TypeScript Applications: Use the Together AI JavaScript SDK to embed LLM capabilities into web and Node.js applications.

Alternatives

  • Anyscale: Offers a platform for building and scaling AI applications, including support for open-source LLMs, with a focus on Ray for distributed computing.
  • Fireworks AI: Provides an API for deploying and serving open-source LLMs with low latency and competitive pricing, similar to Together AI's inference service.
  • Perplexity AI: Focuses on an AI search engine, but also offers an API for its models, which can be an alternative for specific information retrieval and summarization tasks.

Getting started

To begin using Together AI, you typically generate an API key from your account dashboard and then use it to authenticate requests to the inference or fine-tuning APIs. The following Python example demonstrates a basic text generation request using the Together AI Python SDK:

import together

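# Module-level (legacy) interface of the Together Python SDK. Newer SDK
# releases use a client object instead, so check your installed version.
together.api_key = "YOUR_API_KEY"

def generate_text(prompt, model="mistralai/Mixtral-8x7B-Instruct-v0.1"):
    """Send a completion request and return the generated text."""
    try:
        response = together.Complete.create(
            prompt=prompt,
            model=model,
            max_tokens=100,          # upper bound on generated tokens
            temperature=0.7,         # higher values give more varied output
            top_p=0.7,               # nucleus sampling threshold
            top_k=50,                # sample only from the 50 most likely tokens
            repetition_penalty=1     # 1 applies no penalty
        )
        return response['output']['choices'][0]['text']
    except Exception as e:  # the SDK's specific exception class varies by version
        return f"Error generating text: {e}"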

if __name__ == "__main__":
    user_prompt = "Write a short story about a robot who discovers a love for painting."
    generated_story = generate_text(user_prompt)
    print("Generated Story:\n", generated_story)

This snippet sets the API key on the together module, then defines a function that sends a text generation request to the specified model. The parameters max_tokens, temperature, top_p, top_k, and repetition_penalty control the length and randomness of the output, letting developers adjust generation behavior without retraining the model. Replace "YOUR_API_KEY" with the API key obtained from the Together AI console.
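To make the sampling parameters more concrete, the sketch below applies temperature, top_k, and top_p to a toy set of token scores. It follows the commonly used definitions of these techniques and is not a description of how Together AI implements them server-side. The function sample_token and the toy logits are hypothetical, and repetition_penalty is left out for brevity.

```python
import math
import random


def sample_token(logits, temperature=0.7, top_k=50, top_p=0.7, rng=random):
    """Pick one token from a {token: logit} mapping using temperature, top-k and top-p."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if top_k < 1:
        raise ValueError("top_k must be at least 1")

    # Temperature scaling: values below 1 sharpen the distribution,
    # values above 1 flatten it.
    scaled = {tok: logit / temperature for tok, logit in logits.items()}

    # Softmax, subtracting the maximum for numerical stability.
    peak = max(scaled.values())
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    ranked = sorted(
        ((tok, w / total) for tok, w in weights.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )

    # Top-k: keep only the k most likely tokens.
    ranked = ranked[:top_k]

    # Top-p (nucleus): keep the smallest prefix whose cumulative
    # probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, prob in ranked:
        kept.append((tok, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    tokens = [tok for tok, _ in kept]
    probs = [prob for _, prob in kept]
    # random.choices renormalizes the remaining weights before drawing.
    return rng.choices(tokens, weights=probs, k=1)[0]


if __name__ == "__main__":
    toy_logits = {"paint": 2.0, "draw": 1.5, "sleep": 0.1, "compute": -1.0}
    print(sample_token(toy_logits, rng=random.Random(0)))
```

Lower temperature, top_k, or top_p values narrow the set of candidate tokens and make output more predictable, while higher values allow more varied text.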