In today’s fast-paced digital landscape, building high-performance APIs is paramount, and FastAPI has emerged as a leading framework for this purpose in Python. Its speed, asynchronous capabilities, and automatic documentation make it a developer favorite. However, deploying an API without robust monitoring is like navigating a ship without a compass – you’re moving, but you have no idea where you’re going or if you’re about to hit an iceberg. Effective monitoring is crucial for understanding your application’s health, identifying bottlenecks, and ensuring a seamless user experience. This guide will walk you through setting up a powerful monitoring stack for your FastAPI applications using Prometheus for metric collection and Grafana for visualization.
Why Monitoring is Non-Negotiable for FastAPI Applications
Running a FastAPI application in production without a monitoring solution leaves you blind to potential issues. Think of monitoring as the vital signs of your application. Without it, you wouldn’t know if your app is:
- Under Heavy Load: Is it struggling with too many requests?
- Experiencing Latency Spikes: Are certain endpoints suddenly slow?
- Throwing Errors: Are users encountering frequent 500 errors?
- Resource Constrained: Is it running out of memory or CPU?
Proactive monitoring allows you to identify and address these problems before they impact your users or lead to costly downtime. It provides the data needed for informed decision-making regarding scaling, optimization, and resource allocation. For businesses in the US, where downtime can translate to significant financial losses and reputational damage, a robust monitoring strategy is not just a best practice – it’s a business imperative.
Understanding the Monitoring Stack: Prometheus and Grafana
Before diving into the implementation, let’s briefly understand the key players in our monitoring stack:
Prometheus: The Time-Series Database and Alerting System
Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. It excels at collecting and storing metrics as time-series data. Here’s how it generally works:
- Scraping: Prometheus pulls metrics from configured targets (our FastAPI application in this case) at specified intervals.
- Storage: It stores these metrics in its time-series database.
- Querying: It provides a powerful query language called PromQL to analyze the collected data.
- Alerting: It can trigger alerts based on defined rules.
Prometheus is designed for reliability and scalability, making it an excellent choice for modern microservices architectures.
Grafana: The Visualization Powerhouse
Grafana is an open-source platform for monitoring and observability. It allows you to query, visualize, alert on, and explore your metrics no matter where they are stored. With Grafana, you can:
- Create Dynamic Dashboards: Build interactive dashboards with various panels (graphs, tables, heatmaps) to display your metrics.
- Connect to Multiple Data Sources: Seamlessly integrate with Prometheus, Elasticsearch, InfluxDB, and many others.
- Set Up Alerts: Configure sophisticated alert rules and notifications.
- Collaborate: Share dashboards and insights with your team.
Together, Prometheus and Grafana form a formidable duo for gaining deep insights into your application’s performance and health.

Integrating Prometheus Metrics into Your FastAPI Application
The first step is to make your FastAPI application expose metrics in a format that Prometheus can scrape. We’ll use the starlette-exporter library, which provides a convenient way to integrate Prometheus metrics into Starlette-based frameworks like FastAPI.
Setting Up Your FastAPI Project
Let’s assume you have a basic FastAPI application. If not, here’s a quick starter:
# main.py
from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def read_root():
    return {"message": "Hello, World!"}

@app.get("/items/{item_id}")
async def read_item(item_id: int):
    return {"item_id": item_id}
Install FastAPI and Uvicorn:
pip install fastapi uvicorn
Adding starlette-exporter
Now, let’s add the exporter. Install the library:
pip install starlette-exporter
Modify your main.py to include the Prometheus middleware:
# main.py
import asyncio

from fastapi import FastAPI, Request
from starlette_exporter import PrometheusMiddleware, handle_metrics

app = FastAPI()

# Add the PrometheusMiddleware
# The 'app_name' parameter helps in distinguishing metrics from different applications
app.add_middleware(PrometheusMiddleware, app_name="fastapi_app_monitor", prefix="fastapi_app")

# Expose the metrics endpoint at /metrics
app.add_route("/metrics", handle_metrics)

@app.get("/")
async def read_root(request: Request):
    # Simulate some work; asyncio.sleep yields control instead of
    # blocking the event loop the way time.sleep would in an async handler
    await asyncio.sleep(0.05)
    return {"message": "Hello, World!"}

@app.get("/items/{item_id}")
async def read_item(item_id: int, request: Request):
    # Simulate some work
    await asyncio.sleep(0.1)
    return {"item_id": item_id}

@app.get("/long_task")
async def long_task(request: Request):
    # Simulate a long-running task
    await asyncio.sleep(2)
    return {"message": "Long task completed"}
Run your application:
uvicorn main:app --host 0.0.0.0 --port 8000
Now, if you navigate to http://localhost:8000/metrics in your browser, you should see a page full of Prometheus-formatted metrics, including request counts, latencies, and more. The starlette-exporter automatically collects a range of default metrics, which is a great starting point.
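To make the format concrete, here is a minimal sketch of reading that text output from Python. The sample text and the parse_samples helper are illustrative assumptions, not part of starlette-exporter, and the exact label names your app emits may differ. The parser handles only simple "name{labels} value" lines and ignores timestamps, escaping, and exemplars.

```python
def parse_samples(text: str) -> dict[str, float]:
    """Map each 'name{labels}' series to its value (simplified parser)."""
    samples: dict[str, float] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # The value is the last space-separated field on the line.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


# Hypothetical excerpt of what /metrics might return.
SAMPLE = """\
# HELP fastapi_app_requests_total Total HTTP requests
# TYPE fastapi_app_requests_total counter
fastapi_app_requests_total{method="GET",path="/",status_code="200"} 12.0
fastapi_app_requests_total{method="GET",path="/items/{item_id}",status_code="200"} 3.0
"""

samples = parse_samples(SAMPLE)
total = sum(samples.values())
print(f"{len(samples)} series, {total} requests in total")
```

Each distinct combination of labels appears as its own line, which is why label choices matter so much later on.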
Note: The app_name and prefix parameters in PrometheusMiddleware are crucial for organizing your metrics, especially when you have multiple applications or services exporting metrics to the same Prometheus instance. A clear naming convention will save you headaches later.

Defining Custom Metrics for Deeper Insights
While starlette-exporter provides excellent default metrics, you’ll often need to track application-specific events or business logic. Prometheus offers four core metric types for this:
- Counter: A cumulative metric that represents a single numerical value that only ever goes up. Useful for counting requests, errors, or completed tasks.
- Gauge: A metric that represents a single numerical value that can arbitrarily go up and down. Useful for tracking current memory usage, number of concurrent requests, or queue sizes.
- Histogram: Samples observations (e.g., request durations or response sizes) and counts them in configurable buckets. Provides insights into distributions and allows for calculating quantiles.
- Summary: Similar to a Histogram, it samples observations but calculates configurable quantiles over a sliding time window on the client side.
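Histogram buckets are cumulative, which is easy to miss. The sketch below is plain Python, not the prometheus_client implementation: it counts observations into "less than or equal to" buckets and then estimates a quantile by linear interpolation, roughly the idea behind PromQL's histogram_quantile. The function names and sample data are made up for illustration, and the +Inf bucket is left out to keep it short.

```python
import bisect


def cumulative_buckets(observations, bounds):
    """Count observations <= each upper bound (buckets are cumulative)."""
    ordered = sorted(observations)
    return [bisect.bisect_right(ordered, bound) for bound in bounds]


def estimate_quantile(q, bounds, counts):
    """Linearly interpolate a quantile inside the bucket that contains it."""
    rank = q * counts[-1]
    for i, count in enumerate(counts):
        if count >= rank:
            # The first bucket is assumed to start at zero.
            lower = bounds[i - 1] if i > 0 else 0.0
            below = counts[i - 1] if i > 0 else 0
            in_bucket = count - below
            if in_bucket == 0:
                return bounds[i]
            return lower + (bounds[i] - lower) * (rank - below) / in_bucket
    return bounds[-1]


# Hypothetical request durations in seconds and bucket upper bounds.
durations = [0.02, 0.04, 0.07, 0.09, 0.2, 0.3, 0.4, 0.8]
bounds = [0.05, 0.1, 0.5, 1.0]

counts = cumulative_buckets(durations, bounds)
median = estimate_quantile(0.5, bounds, counts)
print(counts, median)
```

Because the estimate is interpolated within a bucket, its accuracy depends on how well the bucket boundaries match your real latency range.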
Let’s integrate some custom metrics into our FastAPI application using the prometheus_client library directly.
Implementing Custom Metrics
First, install the prometheus_client library:
pip install prometheus_client
Now, let’s modify our main.py to include custom metrics:
# main.py with custom metrics
import asyncio
import random

from fastapi import FastAPI, Request
from prometheus_client import Counter, Gauge, Histogram
from starlette_exporter import PrometheusMiddleware, handle_metrics

app = FastAPI()
app.add_middleware(PrometheusMiddleware, app_name="fastapi_app_monitor", prefix="fastapi_app")
app.add_route("/metrics", handle_metrics)

# Define custom metrics
# Counter for custom business events
CUSTOM_EVENT_COUNTER = Counter(
    'fastapi_app_custom_event_total',
    'Number of times a custom event occurred',
    ['event_type']  # Label for event type
)

# Gauge for current active users (example)
ACTIVE_USERS_GAUGE = Gauge(
    'fastapi_app_active_users',
    'Current number of active users'
)

# Histogram for processing time of a specific internal logic
PROCESSING_TIME_HISTOGRAM = Histogram(
    'fastapi_app_processing_seconds',
    'Histogram of processing time for a specific internal task',
    buckets=(0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0, float('inf'))
)

@app.get("/")
async def read_root(request: Request):
    # Increment a custom event counter
    CUSTOM_EVENT_COUNTER.labels(event_type='root_access').inc()
    # Simulate active users changing
    ACTIVE_USERS_GAUGE.set(random.randint(50, 200))  # Random for demonstration
    # Simulate some work without blocking the event loop
    await asyncio.sleep(0.05)
    return {"message": "Hello, World!"}

@app.get("/items/{item_id}")
async def read_item(item_id: int, request: Request):
    # Simulate internal processing and record its duration
    with PROCESSING_TIME_HISTOGRAM.time():
        await asyncio.sleep(random.uniform(0.01, 0.3))
    CUSTOM_EVENT_COUNTER.labels(event_type='item_read').inc()
    return {"item_id": item_id}

@app.post("/process_data")
async def process_data(request: Request):
    # Another custom event
    CUSTOM_EVENT_COUNTER.labels(event_type='data_processed').inc()
    return {"status": "data processed"}
After running this updated application and hitting the endpoints a few times, accessing /metrics will show your new custom metrics alongside the default ones. Notice how labels (like event_type) are used to add dimensions to your metrics, allowing for more granular analysis.
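If you want to see how labels split one metric into several series without starting the server, a small experiment with prometheus_client can help. This is a sketch under assumptions: the metric name demo_custom_event_total is made up, and a private CollectorRegistry is used so the experiment does not touch the default registry your app exports.

```python
from prometheus_client import CollectorRegistry, Counter

# A private registry keeps this experiment separate from the app's default one.
registry = CollectorRegistry()

events = Counter(
    "demo_custom_event_total",  # hypothetical metric name
    "Number of times a custom event occurred",
    ["event_type"],
    registry=registry,
)

# Each distinct label value becomes its own time series.
events.labels(event_type="root_access").inc()
events.labels(event_type="item_read").inc()
events.labels(event_type="item_read").inc()

root_count = registry.get_sample_value(
    "demo_custom_event_total", {"event_type": "root_access"}
)
item_count = registry.get_sample_value(
    "demo_custom_event_total", {"event_type": "item_read"}
)
print(root_count, item_count)
```

The same get_sample_value pattern is handy in unit tests when you want to assert that a code path incremented the metric you expected.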
Setting Up Prometheus for Scraping
With your FastAPI application exposing metrics, the next step is to configure Prometheus to scrape them. You’ll need a prometheus.yml configuration file.
Prometheus Configuration File (prometheus.yml)
global:
  scrape_interval: 15s # How frequently Prometheus will scrape targets
  evaluation_interval: 15s # How frequently Prometheus will evaluate rules
scrape_configs:
  - job_name: 'fastapi_app'
    # metrics_path defaults to /metrics
    # scheme defaults to http
    static_configs:
      - targets: ['localhost:8000'] # Replace with your FastAPI app's host and port
  # Example for Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
Save this file as prometheus.yml in the same directory where you’ll run the Prometheus server.
Running Prometheus
You can download Prometheus from its official website or run it via Docker. Using Docker is often the easiest way to get started:
docker run \
  -p 9090:9090 \
  -v /path/to/your/prometheus.yml:/etc/prometheus/prometheus.yml \
  prom/prometheus
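Replace /path/to/your/prometheus.yml with the actual path to your configuration file. Note that inside a container, localhost refers to the container itself. If your FastAPI app runs on the host machine, change the fastapi_app target in prometheus.yml to host.docker.internal:8000 (on Linux, also pass --add-host=host.docker.internal:host-gateway to docker run); otherwise the target will show as down. Once Prometheus is running, you can access its web UI at http://localhost:9090. Go to the ‘Status’ -> ‘Targets’ page to confirm that your fastapi_app target is up and healthy.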
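The same health check can be scripted against Prometheus's HTTP API, which exposes target status at /api/v1/targets. The sketch below assumes Prometheus is reachable at http://localhost:9090; the helper names and the trimmed example payload are illustrative, and only the fields the code reads are shown.

```python
import json
import urllib.request


def down_targets(payload: dict) -> list[str]:
    """Return scrape URLs of active targets whose health is not 'up'."""
    active = payload.get("data", {}).get("activeTargets", [])
    return [t["scrapeUrl"] for t in active if t.get("health") != "up"]


def fetch_targets(base_url: str = "http://localhost:9090") -> dict:
    """Fetch the targets payload from a running Prometheus server."""
    with urllib.request.urlopen(f"{base_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


# Hypothetical payload, trimmed to the fields used above.
example_payload = {
    "status": "success",
    "data": {
        "activeTargets": [
            {"scrapeUrl": "http://localhost:9090/metrics", "health": "up"},
            {"scrapeUrl": "http://localhost:8000/metrics", "health": "down"},
        ]
    },
}

unhealthy = down_targets(example_payload)
print(unhealthy)
```

Against a live server you would call down_targets(fetch_targets()) instead of using the example payload.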
Visualizing Metrics with Grafana
Prometheus gives you the raw data, but Grafana transforms that data into actionable insights through stunning dashboards. Let’s set it up.
Running Grafana
Similar to Prometheus, Grafana can be easily run with Docker:
docker run -d -p 3000:3000 --name grafana grafana/grafana-oss
Grafana will be accessible at http://localhost:3000. The default login credentials are admin for both username and password (you’ll be prompted to change it on first login).
Connecting Grafana to Prometheus
- Add Data Source: In Grafana, navigate to ‘Connections’ -> ‘Data sources’ -> ‘Add new data source’.
- Select Prometheus: Choose ‘Prometheus’ from the list.
- Configure: Set the ‘URL’ to http://host.docker.internal:9090 if Grafana is running in Docker and Prometheus is on your host machine. If both are in Docker, use the Prometheus container’s service name (e.g., http://prometheus:9090 if they are on the same Docker network). For simplicity, if both are running on localhost directly, use http://localhost:9090.
- Save & Test: Click ‘Save & test’. You should see a ‘Data source is working’ message.
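If you would rather script this step than click through the UI, Grafana also accepts new data sources through its HTTP API at /api/datasources. The following is a hedged sketch, not an official client: the helper name is made up, it assumes the default admin/admin credentials are still in place, and it only builds the request so that it can run without a Grafana server.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url, prometheus_url, user="admin", password="admin"):
    """Build (but do not send) a request that registers Prometheus in Grafana."""
    body = json.dumps({
        "name": "Prometheus",
        "type": "prometheus",
        "url": prometheus_url,
        "access": "proxy",  # Grafana's backend makes the queries
        "isDefault": True,
    }).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{grafana_url}/api/datasources",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


req = build_datasource_request("http://localhost:3000", "http://host.docker.internal:9090")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```

The Prometheus URL follows the same rules as in the UI: it must be reachable from wherever Grafana itself is running.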
Building Your First Dashboard
Now you can create dashboards to visualize your FastAPI metrics:
- Create Dashboard: Go to ‘Dashboards’ -> ‘New Dashboard’ -> ‘Add a new panel’.
- Select Data Source: In the ‘Query’ tab, select your Prometheus data source.
- Write PromQL Queries: Use PromQL to query your metrics. For example:
  - To see the overall request rate (requests per second, averaged over 5 minutes): sum(rate(fastapi_app_requests_total{job="fastapi_app"}[5m]))
  - To see average request duration: rate(fastapi_app_request_duration_seconds_sum{job="fastapi_app"}[5m]) / rate(fastapi_app_request_duration_seconds_count{job="fastapi_app"}[5m])
  - To see the rate of a custom event: sum(rate(fastapi_app_custom_event_total{job="fastapi_app", event_type="root_access"}[5m]))
- Customize Panel: Choose the visualization type (Graph, Stat, Gauge), set titles, units, and axes.
- Repeat: Add more panels for different metrics like error rates, CPU usage (if you have a node exporter), active users, and more.
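The average-duration query can look odd at first, since it divides one rate by another. Both rates cover the same window, so the window length cancels and what remains is "seconds spent" divided by "requests served". A minimal sketch of that arithmetic, with made-up scrape values and a hypothetical helper name:

```python
def average_duration(sum_start, sum_end, count_start, count_end):
    """Average seconds per request between two scrapes of _sum and _count."""
    requests = count_end - count_start
    if requests <= 0:
        # No traffic in the window (or the counters were reset).
        return None
    return (sum_end - sum_start) / requests


# Hypothetical values of the _sum and _count series, five minutes apart.
avg = average_duration(sum_start=12.0, sum_end=18.0, count_start=100, count_end=160)
print(avg)
```

Here 6 seconds of total request time spread over 60 requests gives an average of about 0.1 seconds per request. Prometheus additionally handles counter resets and extrapolation, which this sketch ignores.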

Best Practices for FastAPI Monitoring
To get the most out of your monitoring setup, consider these best practices:
- Define SLOs and SLIs: Establish Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for your critical endpoints. This helps you focus on what truly matters to your users.
- Granular Metrics: While default metrics are good, custom metrics for business-critical events or specific internal logic provide deeper, more actionable insights.
- Meaningful Labels: Use labels judiciously to add context to your metrics (e.g., endpoint, status_code, region). This allows for powerful filtering and aggregation in PromQL. However, be mindful not to create too many unique label combinations (high cardinality), as this can strain Prometheus; an unbounded value such as user_id is a typical label to avoid.
- Alerting: Don’t just visualize; set up alerts in Prometheus Alertmanager or Grafana to notify you immediately of critical issues. Think about thresholds for error rates, latency, or resource utilization.
- Performance Considerations: While metric collection is generally lightweight, be cautious about collecting excessively high-cardinality metrics (metrics with many unique label combinations) or very frequent updates for gauges, as this can increase Prometheus’s storage and processing requirements.
- Logging and Tracing: Monitoring is one pillar of observability. Combine it with structured logging (e.g., using Loguru or structlog) and distributed tracing (e.g., with OpenTelemetry) for a complete picture of your application’s behavior.
- Environment Variables for Configuration: Avoid hardcoding sensitive information or configuration values in your application. Use environment variables for things like Prometheus endpoint paths or application names.
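As a sketch of the last point, the middleware settings used earlier could be read from the environment instead of being hardcoded. The variable names METRICS_APP_NAME, METRICS_PREFIX, and METRICS_PATH are assumptions chosen for this example, not names that FastAPI or starlette-exporter look for.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricsSettings:
    app_name: str
    prefix: str
    metrics_path: str


def load_metrics_settings(env=None) -> MetricsSettings:
    """Read metrics settings from the environment, with the guide's defaults."""
    env = os.environ if env is None else env
    return MetricsSettings(
        app_name=env.get("METRICS_APP_NAME", "fastapi_app_monitor"),
        prefix=env.get("METRICS_PREFIX", "fastapi_app"),
        metrics_path=env.get("METRICS_PATH", "/metrics"),
    )


# Passing a dict makes the function easy to try without touching os.environ.
settings = load_metrics_settings({"METRICS_APP_NAME": "orders_api"})
print(settings)
```

In main.py these values would then be passed along, for example app_name=settings.app_name and prefix=settings.prefix in add_middleware, and settings.metrics_path in add_route.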
Conclusion
Monitoring your FastAPI applications with Prometheus and Grafana provides an indispensable toolkit for maintaining performance, reliability, and user satisfaction. By following the steps outlined in this guide, you can set up a robust system to collect, visualize, and alert on critical metrics, giving you the insights needed to keep your applications running smoothly. From basic request tracking to detailed custom business metrics, this stack empowers developers and operations teams to proactively identify and resolve issues, ensuring your FastAPI services are always at their best. Invest in your monitoring strategy today, and gain the confidence to scale and innovate without fear of the unknown.
Frequently Asked Questions
What is the difference between a Prometheus Counter and a Gauge?
A Prometheus Counter is a cumulative metric that can only increase or be reset to zero upon application restart. It’s ideal for tracking events that increment over time, like total requests received or errors encountered. A Gauge, on the other hand, represents a single numerical value that can go up and down arbitrarily. It’s suitable for measurements that fluctuate, such as current memory usage, the number of active users, or queue sizes. You would use a Counter for ‘how many times something happened’ and a Gauge for ‘what is the current value of something’.
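The reset-on-restart behavior is why you normally query counters through functions such as rate() or increase() rather than reading the raw value. The sketch below shows the basic idea of reset handling in plain Python; it is a simplification with a made-up helper name, and the real PromQL functions also extrapolate to the edges of the time window.

```python
def counter_increase(samples):
    """Total increase across scraped values, treating any drop as a restart."""
    increase = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            increase += current - previous
        else:
            # The counter restarted from zero, so its current value is all new.
            increase += current
    return increase


# Hypothetical scrapes: the app restarted between the third and fourth value.
scraped = [100, 130, 160, 5, 25]
total_increase = counter_increase(scraped)
print(total_increase)
```

A Gauge gets no such treatment: a drop in a gauge is simply a lower value, which is exactly what you want for things like queue size.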
Why should I use starlette-exporter instead of just prometheus_client directly?
starlette-exporter provides a convenient, out-of-the-box solution for common FastAPI/Starlette metrics. It automatically instruments your application to collect metrics like request count, request duration, and requests in progress (request and response body sizes are available as optional metrics), saving you significant boilerplate code. While prometheus_client allows for direct and fine-grained control over every metric, starlette-exporter simplifies the initial setup for standard HTTP metrics. You can, and often should, use both: starlette-exporter for common HTTP metrics and prometheus_client for application-specific custom metrics.
What is high cardinality in Prometheus, and why should I avoid it?
High cardinality refers to metrics that have a very large number of unique label combinations. For example, if you add a user_id label to every request metric, and you have millions of unique users, that metric would have extremely high cardinality. Prometheus stores each unique label combination as a separate time series. High cardinality can severely impact Prometheus’s performance, leading to increased memory usage, slower query times, and higher storage requirements. It’s crucial to be mindful of the labels you add and ensure they don’t create an unbounded number of unique series.
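A quick back-of-the-envelope calculation shows how fast this grows. The worst-case number of series for one metric is the product of the distinct values each label can take. The label counts below are invented for illustration, and real apps rarely hit every combination, but the order of magnitude is the point.

```python
from math import prod


def worst_case_series(label_values: dict[str, int]) -> int:
    """Upper bound on time series: product of distinct values per label."""
    return prod(label_values.values())


# Hypothetical label cardinalities for a request counter.
bounded = worst_case_series({"method": 4, "path": 20, "status_code": 6})
with_user_id = worst_case_series(
    {"method": 4, "path": 20, "status_code": 6, "user_id": 1_000_000}
)
print(bounded, with_user_id)
```

A single unbounded label turns a few hundred possible series into hundreds of millions, which is why identifiers like user_id belong in logs or traces rather than in metric labels.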
Can I monitor multiple FastAPI applications with a single Prometheus instance?
Yes, absolutely. Prometheus is designed to scrape metrics from multiple targets. You would simply add more entries to the scrape_configs section in your prometheus.yml file, each pointing to a different FastAPI application’s metrics endpoint. It’s good practice to use unique job_name and potentially app_name prefixes in your PrometheusMiddleware for each application to easily distinguish their metrics in Prometheus and Grafana queries.
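When the list of applications grows, it can be convenient to generate the scrape_configs section rather than edit it by hand. Here is a minimal sketch that renders it as text; the job names and ports are made up, and the output uses the same static_configs layout as the prometheus.yml shown earlier.

```python
def render_scrape_configs(apps: dict[str, str]) -> str:
    """Render a scrape_configs section with one job per application."""
    lines = ["scrape_configs:"]
    for job_name, target in apps.items():
        lines += [
            f"  - job_name: '{job_name}'",
            "    static_configs:",
            f"      - targets: ['{target}']",
        ]
    return "\n".join(lines) + "\n"


# Hypothetical services and the host:port where each exposes /metrics.
config = render_scrape_configs({
    "orders_api": "localhost:8000",
    "billing_api": "localhost:8001",
})
print(config)
```

In Grafana you can then filter any query by job, for example {job="orders_api"}, to separate the applications.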