Getting started overview
Apache Superset is an open-source software application designed for data exploration and data visualization. It supports a wide range of data sources through SQLAlchemy, allowing users to create interactive dashboards and charts. As an open-source project, Superset does not require a signup process with a commercial vendor; instead, getting started involves installing the software on a local machine or server.
The primary methods for setting up Apache Superset include using Docker for a containerized environment or installing it directly via pip for a Python environment. Docker is generally recommended for new users due to its simplified dependency management and quicker setup time. A successful setup culminates in accessing the Superset web interface and connecting to a database to begin visualizing data.
Quick Reference Table: Superset Getting Started Steps
| Step | What to Do | Where |
|---|---|---|
| 1. Choose Installation Method | Decide between Docker or pip installation. Docker is recommended for ease of use. | Apache Superset installation documentation |
| 2. Install Dependencies | Install Docker Desktop (if using Docker) or Python and pip (if using pip). | Docker Desktop download page, Python downloads |
| 3. Run Superset | Execute Docker commands or pip install and Superset initialization commands. | Command line interface |
| 4. Create Admin User | Set up an administrative user account for login. | Command line interface during initialization |
| 5. Log In | Access the Superset web interface via your browser. | Web browser (e.g., http://localhost:8088) |
| 6. Connect Data Source | Add a new database connection (e.g., PostgreSQL, MySQL). | Superset web interface > Data > Databases (Settings > Database Connections in recent releases) |
Create an account and get keys
Unlike commercial SaaS platforms, Apache Superset does not involve a traditional account creation process with a third-party provider or API keys that are generated and managed externally. As self-hosted open-source software, user accounts and any necessary credentials (like database connection strings) are managed within your deployed instance of Superset. The initial administrative user is created during the installation process.
For Docker Installation:
If you are using the Docker Compose setup provided by the Apache Superset project, the process typically involves cloning the Superset GitHub repository and then running Docker Compose commands. This creates a local Superset environment.
git clone https://github.com/apache/superset.git
cd superset
docker compose up -d
After the containers are up, the metadata database must be initialized and an admin user must exist. The project's Docker Compose setup normally runs an initialization step that takes care of this. If your setup did not create an admin user, you can create one yourself, for example:
docker compose exec superset superset fab create-admin --username admin --firstname Superset --lastname Admin --email <admin-email> --password admin
This command creates an admin user with the specified username and password; replace <admin-email> with a real address. The password admin is only acceptable for a local trial, so choose a strong password for anything shared or production-facing. More detailed instructions are available in the Apache Superset Docker Compose installation guide.
For pip Installation:
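If you choose to install Superset using pip, you first need a Python environment, ideally a dedicated virtual environment. After installing Superset, you run commands to initialize the metadata database, create the admin user, and start the web server. Recent releases refuse to start while the default secret key is in place, so set your own before running any superset command.
pip install apache-superset
export FLASK_APP=superset
export SUPERSET_SECRET_KEY='replace-with-a-long-random-string'
superset db upgrade
superset fab create-admin
superset init
superset run -p 8088 --with-threads --reload --debugger
The `superset fab create-admin` command prompts you interactively for a username, first name, last name, email, and password for your administrator account. This user is the first one with access to the Superset UI. The final `superset run` command starts a development server on port 8088; it is meant for local use, not production.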
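Once the admin user is created, you can log in to your Superset instance with these credentials. Database connection strings, which play the role that API keys play on hosted platforms by granting access to your data sources, are then configured within the Superset UI after login. These connections are stored in your Superset instance's metadata database, with stored credentials encrypted using the instance's SECRET_KEY, so protect both the metadata database and that key.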
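Although there are no vendor-issued keys, scripts can authenticate against Superset's REST API with the same username and password and receive an access token. The following is a minimal sketch, assuming the `/api/v1/security/login` endpoint and the built-in `db` authentication provider found in recent releases; the function names are hypothetical, and the base URL and credentials are whatever you chose during setup.

```python
import json
from urllib.request import Request, urlopen


def build_login_request(base_url: str, username: str, password: str) -> Request:
    """Build (but do not send) a login request for Superset's REST API."""
    payload = {
        "username": username,
        "password": password,
        "provider": "db",  # assumption: built-in database authentication
        "refresh": True,   # also ask for a refresh token
    }
    return Request(
        url=f"{base_url.rstrip('/')}/api/v1/security/login",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_access_token(base_url: str, username: str, password: str) -> str:
    """Send the login request and return the access token from the response."""
    request = build_login_request(base_url, username, password)
    with urlopen(request, timeout=10) as response:
        return json.load(response)["access_token"]
```

With a running instance, `fetch_access_token("http://localhost:8088", "admin", "<your-password>")` should return a token that later API calls pass in an `Authorization: Bearer ...` header.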
Your first request
A "first request" in the context of Apache Superset typically refers to connecting your first data source and then querying it to create a visualization. This process involves navigating the Superset UI after successful installation and login.
Step 1: Log in to Superset
After completing the installation and admin user creation, open your web browser and navigate to the Superset instance, usually at http://localhost:8088. Log in using the admin credentials you created.
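If the login page does not load, it can help to confirm that the web server is answering at all. The sketch below assumes the instance exposes a `/health` endpoint at the default address `http://localhost:8088`; the function name and the injectable `opener` parameter are illustrative choices, not part of Superset.

```python
from urllib.error import URLError
from urllib.request import urlopen


def superset_is_up(base_url="http://localhost:8088", timeout=5.0, opener=urlopen) -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    health_url = f"{base_url.rstrip('/')}/health"
    try:
        with opener(health_url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False
```

Calling `superset_is_up()` right after `docker compose up -d` may return False for a short while, because the containers need time to finish initializing.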
Step 2: Connect a Data Source
From the Superset interface, go to Data > Databases (in recent releases, Settings > Database Connections). Click the + Database button. You will be prompted to enter connection details for your database. Superset supports a wide array of databases through SQLAlchemy, including PostgreSQL, MySQL, SQLite, and many others. For this example, assume you are connecting to a local PostgreSQL database.
- Database Name: Give your connection a descriptive name (e.g., `MyPostgreSQLDB`).
- SQLAlchemy URI: Enter the connection string. For a local PostgreSQL database, this might look like `postgresql://user:password@host:port/database_name` (e.g., `postgresql://superset_user:mypassword@localhost:5432/superset_db`).
Click Test Connection to verify the URI is correct, then Add to save the database.
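A frequent cause of a failing URI is a password containing characters such as `@` or `/`, which must be percent-encoded before they go into the connection string. The helper below is a small sketch using only the Python standard library; the function name and the sample credentials are made up for illustration.

```python
from urllib.parse import quote_plus


def build_postgres_uri(user: str, password: str, host: str, port: int, database: str) -> str:
    """Assemble a PostgreSQL SQLAlchemy URI, escaping the user and password."""
    return (
        f"postgresql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


uri = build_postgres_uri("superset_user", "p@ss/word", "localhost", 5432, "superset_db")
print(uri)
# postgresql://superset_user:p%40ss%2Fword@localhost:5432/superset_db
```

The resulting string can be pasted into the SQLAlchemy URI field as is.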
Step 3: Add a Dataset
Once the database is connected, you need to expose specific tables as "datasets" for Superset to query. Go to Data > Datasets (in recent releases, Datasets is a top-level menu item). Click the + Dataset button.
- Database: Select the database you just added.
- Schema: Select the relevant schema (e.g., `public`).
- Table Name: Choose a table from your database that you want to visualize.
Click Add. Superset will automatically infer columns and data types, which you can review and adjust if necessary.
Step 4: Create a Chart
With a dataset configured, you can now create your first visualization. Go to Charts and click the + Chart button.
- Choose a Dataset: Select the dataset you just created. Click Choose a chart type.
- Choose a Chart Type: Select a visualization type (e.g., "Bar Chart", "Line Chart"). Click Create New Chart.
- Configure the Chart: In the chart builder interface, drag and drop columns to define your visualization. For a bar chart, you might set a dimension for the X-axis (e.g., `category_column`) and a metric for the Y-axis (e.g., `COUNT(*)` or `SUM(value_column)`).
Click Run Query to see the visualization. You can then Save the chart and optionally add it to a dashboard.
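Conceptually, a bar chart with one dimension and an aggregate metric corresponds to a `GROUP BY` query against the dataset's table. The following sketch reproduces that shape with Python's built-in `sqlite3` module and a hypothetical `sales` table; the table, its rows, and the exact SQL are illustrative assumptions, not the query Superset itself generates.

```python
import sqlite3

# In-memory database with a hypothetical table standing in for your dataset.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (category_column TEXT, value_column REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("books", 12.5), ("books", 7.5), ("games", 30.0)],
)

# Dimension on the X-axis, metrics on the Y-axis.
rows = conn.execute(
    """
    SELECT category_column,
           COUNT(*) AS row_count,
           SUM(value_column) AS total_value
    FROM sales
    GROUP BY category_column
    ORDER BY category_column
    """
).fetchall()
conn.close()

print(rows)
# [('books', 2, 20.0), ('games', 1, 30.0)]
```

Comparing a hand-written query like this in SQL Lab with the query shown in the chart builder is a quick way to check that a chart aggregates what you expect.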
Common next steps
After successfully connecting a data source and creating your first chart in Apache Superset, several common next steps can enhance your data analysis and visualization capabilities:
- Build a Dashboard: Combine multiple charts into an interactive dashboard. Go to Dashboards, click + Dashboard, and then add your saved charts to it. Dashboards allow for comprehensive data storytelling and tracking of key metrics.
- Explore SQL Lab: Use Superset's SQL Lab feature (SQL Lab > SQL Editor) to write custom SQL queries against your connected databases. This is useful for complex data exploration, creating new datasets from custom queries, or debugging data issues.
- Add More Data Sources: Connect additional databases, data warehouses, or even CSV files (via the Upload CSV functionality) to expand the scope of your analysis.
- Configure Access Control: For multi-user environments, set up roles and permissions to control who can access which dashboards, datasets, and features. Go to Security > List Roles to manage this (in recent releases, the Security entries appear under the Settings menu).
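- Scheduled Reports: Configure Superset to send scheduled email reports of dashboards or charts to stakeholders. This requires additional setup beyond the default installation, including the `ALERT_REPORTS` feature flag, a Celery worker and beat scheduler, and mail server settings.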
- Custom Plugins: Extend Superset's functionality by developing or installing custom visualization plugins. This requires Python and JavaScript development skills. The Superset documentation on custom visualizations provides a starting point.
- Production Deployment: For production use cases, move from a local installation to a more robust, scalable deployment using Kubernetes or other orchestration tools, and configure a proper web server (like Gunicorn) and caching.
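Several of these steps come down to overriding defaults in a `superset_config.py` file, which Superset picks up from the Python path or from the location named by the `SUPERSET_CONFIG_PATH` environment variable. The fragment below is a minimal sketch rather than a recommended production configuration: the environment variable names `SUPERSET_METADATA_DB_URI` and `REDIS_URL` and all default values are assumptions made for illustration.

```python
# superset_config.py: a hypothetical minimal override file.
import os

# Superset needs a real, private value here to run; it stays None if the
# environment variable is unset, so export it before starting Superset.
SECRET_KEY = os.environ.get("SUPERSET_SECRET_KEY")

# Metadata database. The variable name and default URI are illustrative only.
SQLALCHEMY_DATABASE_URI = os.environ.get(
    "SUPERSET_METADATA_DB_URI",
    "postgresql://superset:superset@localhost:5432/superset",
)

# Cache backed by Redis (assumes a Redis server is reachable at this URL).
CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 300,  # seconds
    "CACHE_KEY_PREFIX": "superset_",
    "CACHE_REDIS_URL": os.environ.get("REDIS_URL", "redis://localhost:6379/0"),
}

# Enables the alerts and scheduled reports feature mentioned above.
FEATURE_FLAGS = {"ALERT_REPORTS": True}
```

Keeping secrets in environment variables rather than in the file itself makes it easier to share the configuration between environments without leaking credentials.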
Troubleshooting the first call
When getting started with Apache Superset, several issues can arise during installation, database connection, or chart creation. Here are common problems and their solutions:
- Installation Errors (Docker):
  - Issue: Docker containers fail to start or exit immediately.
  - Solution: Check the container logs (`docker compose logs` or `docker logs <container_id>`) for specific error messages. Ensure Docker Desktop is running and has sufficient resources allocated. Stale volumes can also cause conflicts; running `docker compose down -v` and then `docker compose up -d` recreates them. Note that removing volumes also deletes the data stored in them, including the Superset metadata database.
- Installation Errors (pip):
  - Issue: `pip install apache-superset` fails, or subsequent commands (such as `superset db upgrade`) report errors.
  - Solution: Use a Python virtual environment to avoid dependency conflicts. Verify Python version compatibility: each Superset release supports a specific range of Python versions (older releases accepted 3.8+, while recent releases require 3.9 or newer), so check the documentation for the release you are installing. Install the system-level dependencies that database drivers need (e.g., `libpq-dev` for PostgreSQL on Linux). Refer to the Apache Superset system dependencies documentation.
- Login Issues:
  - Issue: Cannot log in with the admin credentials.
  - Solution: Double-check the username and password. If they are forgotten, you can create a new admin user or reset the password via the Superset CLI (e.g., `superset fab reset-password --username admin` for pip installations, or `docker compose exec superset superset fab reset-password --username admin` for Docker). Ensure the Superset web server is running.
- Database Connection Failure:
  - Issue: "Test Connection" fails when adding a database.
  - Solution: Verify the SQLAlchemy URI for typos. Ensure the database server is running and accessible from the machine where Superset is hosted (check firewall rules and network connectivity). When Superset runs in Docker, `localhost` in the URI refers to the container itself, so a database on the host machine needs a different hostname (for example `host.docker.internal` with Docker Desktop). Confirm the database user has the correct permissions and the specified database exists. Install the appropriate database driver (e.g., `pip install psycopg2-binary` for PostgreSQL, `pip install mysqlclient` for MySQL) within your Superset environment.
- Chart Creation Issues:
  - Issue: Chart displays an error or no data.
  - Solution: Check the SQL query generated by Superset in the chart builder; it might contain syntax errors or refer to non-existent columns. Ensure the dataset points to the correct table and schema. Verify that the selected columns contain data. Look at the browser's developer console for JavaScript errors, and at Superset's backend logs for server-side errors.
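For connection failures in particular, two quick checks narrow things down before touching the URI: whether the database port is reachable from the Superset host, and whether the Python driver is importable in the Superset environment. The sketch below uses only the standard library; the function names are hypothetical, and it should be run with the same Python interpreter (or inside the same container) that runs Superset.

```python
import importlib.util
import socket


def port_is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or the hostname could not be resolved.
        return False


def driver_is_installed(module_name: str) -> bool:
    """Return True if a top-level module such as 'psycopg2' can be found."""
    return importlib.util.find_spec(module_name) is not None


# Example usage (values depend on your setup):
#   port_is_reachable("localhost", 5432)
#   driver_is_installed("psycopg2")
```

If the port check fails, the problem is network or firewall related; if only the driver check fails, installing the driver in the Superset environment is usually enough.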