Monitoring n8n in Production: Metrics, Logs, Alerts
Set up Prometheus metrics, structured logging, execution alerting, and SLO tracking for self-hosted n8n.
Key takeaways
- Set N8N_METRICS=true and scrape the /metrics endpoint with Prometheus.
- Ship structured logs to Loki or your log aggregator.
- Alert on execution failure rate, queue depth, and worker count.
- Define SLOs per workflow — not per instance.
n8n exposes enough internal state to run a real SRE practice on it: Prometheus metrics, structured logs, an executions API, and Error Workflows that run whenever a workflow fails. Here is a minimal but complete observability stack.
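The executions API is handy for ad-hoc triage outside Prometheus. The Python sketch below counts failed executions per workflow. It rests on several assumptions that you should verify against the API reference for your n8n version:

- The public REST API is enabled and lives at `/api/v1/executions`.
- The endpoint accepts `status`, `limit`, and `cursor` query parameters.
- Requests authenticate with an `X-N8N-API-KEY` header.
- Responses are pages shaped like `{"data": [...], "nextCursor": ...}`.

The helper function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter


def executions_url(base_url, status="error", limit=100, cursor=None):
    """Build a URL for the (assumed) public executions endpoint."""
    params = {"status": status, "limit": limit}
    if cursor:
        params["cursor"] = cursor  # continue from the previous page
    return f"{base_url.rstrip('/')}/api/v1/executions?{urllib.parse.urlencode(params)}"


def fetch_page(url, api_key):
    """GET one page of executions, authenticating with the (assumed) API key header."""
    req = urllib.request.Request(
        url, headers={"X-N8N-API-KEY": api_key, "Accept": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def failures_by_workflow(page):
    """Count executions per workflowId in one page shaped like {"data": [...]}."""
    return Counter(item.get("workflowId") for item in page.get("data", []))


def all_failures(base_url, api_key):
    """Follow nextCursor until exhausted and total failed executions per workflow."""
    counts, cursor = Counter(), None
    while True:
        page = fetch_page(executions_url(base_url, cursor=cursor), api_key)
        counts.update(failures_by_workflow(page))
        cursor = page.get("nextCursor")
        if not cursor:
            return counts
```

Under those assumptions, `all_failures("https://n8n.example.com", api_key)` (hypothetical host) returns a `Counter` mapping workflow IDs to failure counts. Those counts are a quick way to see which workflow is driving an error-rate alert.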
Metrics
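Set N8N_METRICS=true and n8n exposes Prometheus metrics at /metrics. Scrape it for execution counts (n8n_workflow_executions_total), execution duration (n8n_workflow_execution_duration_seconds), and queue depth, which is only reported in queue mode. Exact metric names depend on your n8n version and on which N8N_METRICS_INCLUDE_* options are enabled, so confirm them against your own /metrics output. Build a Grafana dashboard with three panels: executions per minute, p95 duration, and error rate.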
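A p95 panel is usually computed from a Prometheus histogram. In Grafana that is PromQL's `histogram_quantile(0.95, ...)` over the `_bucket` series. The Python sketch below makes the arithmetic visible and does two things:

- It parses a simplified form of the Prometheus text exposition format. Label values containing `}` or escapes are not handled, and timestamps are ignored.
- It estimates a quantile by linear interpolation inside the cumulative bucket that holds the target rank. This mirrors the idea behind `histogram_quantile`.

The metric name comes from the paragraph above. The bucket counts are illustrative, not real n8n data.

```python
import re

# name{labels} value  (simplified: no escaped label values, timestamps ignored)
SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return (name, labels_dict, value) tuples from Prometheus text format."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = SAMPLE_RE.match(line)
        if not m:
            continue
        name, raw_labels, value = m.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))  # float("+Inf") -> inf
    return samples


def histogram_quantile(q, buckets):
    """Estimate quantile q from cumulative (upper_bound, count) buckets."""
    buckets = sorted(buckets)
    total = buckets[-1][1]  # the +Inf bucket holds the total observation count
    if total == 0:
        return float("nan")
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf"):
                return prev_bound  # cannot interpolate into +Inf; use last finite bound
            if count == prev_count:
                return bound  # empty bucket (only when rank == 0); avoid dividing by zero
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return buckets[-1][0]


# Illustrative scrape output (made-up counts).
SAMPLE = """\
# HELP n8n_workflow_execution_duration_seconds Example histogram
# TYPE n8n_workflow_execution_duration_seconds histogram
n8n_workflow_execution_duration_seconds_bucket{le="0.5"} 60
n8n_workflow_execution_duration_seconds_bucket{le="1"} 90
n8n_workflow_execution_duration_seconds_bucket{le="5"} 100
n8n_workflow_execution_duration_seconds_bucket{le="+Inf"} 100
"""

samples = parse_metrics(SAMPLE)
buckets = [(float(labels["le"]), value) for name, labels, value in samples if name.endswith("_bucket")]
p95 = histogram_quantile(0.95, buckets)
```

With these counts, 90 of 100 executions finish within 1 s and all within 5 s. The 95th observation therefore falls halfway through the 1 to 5 s bucket, so the estimate is 3 s. Coarse buckets make this estimate coarse as well.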
Logs
Set N8N_LOG_OUTPUT=console and N8N_LOG_FORMAT=json so n8n writes JSON logs to stdout, then ship that stream to Loki, Datadog, or any other log platform. Index workflow_id and execution_id so you can pivot from an alert to the matching logs in one click.
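Your log platform does this filtering for you. The Python sketch below only shows what indexing those two fields buys: one call pulls every line for an execution, and another counts errors per workflow. It assumes each line is a JSON object with `level`, `workflow_id`, and `execution_id` keys. Those key names are taken from the paragraph above and are an assumption; your n8n version may name them differently.

```python
import json
from collections import Counter


def parse_log_lines(lines):
    """Yield each line that parses as a JSON object; skip anything else."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a plain-text line written before JSON logging was configured
        if isinstance(record, dict):
            yield record


def logs_for_execution(records, execution_id):
    """The alert-to-logs pivot: every record for one execution, in original order."""
    return [r for r in records if r.get("execution_id") == execution_id]


def errors_by_workflow(records):
    """Count error-level records per workflow_id."""
    return Counter(r.get("workflow_id") for r in records if r.get("level") == "error")


# Illustrative input; the field names are assumptions, not a documented n8n schema.
RAW = [
    '{"level": "info", "message": "Workflow started", "workflow_id": "wf1", "execution_id": "101"}',
    '{"level": "error", "message": "HTTP 500 from CRM", "workflow_id": "wf1", "execution_id": "101"}',
    '{"level": "error", "message": "Timeout", "workflow_id": "wf2", "execution_id": "202"}',
    "not json",
]
records = list(parse_log_lines(RAW))
```

Skipping unparseable lines, rather than failing on them, keeps the pivot usable when a container mixes a plain-text startup banner into its JSON log stream.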
Alerts
Three alerts cover roughly 80% of incidents: execution error rate above 5%, queue depth above 100 for 5 minutes, and no workers reporting. Route them through Alertmanager to PagerDuty or your on-call tool.
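In production these conditions belong in Prometheus alerting rules, with firing alerts routed by Alertmanager. The Python sketch below only spells out the three conditions so you can unit-test thresholds before encoding them as rules. Some parts are hypothetical:

- The `Snapshot` fields and the alert names are made up for this sketch.
- The error rate is computed from per-window counts. In PromQL you would take a ratio of rates instead.
- The "for 5 minutes" check approximates a rule's `for: 5m`: the backlog must have held in every sample since it started.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    """One observation of the instance (hypothetical fields)."""
    timestamp: float   # seconds since epoch
    executions: int    # executions finished in the evaluation window
    failed: int        # of which failed
    queue_depth: int   # jobs waiting in the queue
    workers: int       # workers currently reporting


ERROR_RATE_THRESHOLD = 0.05  # fire above 5% failures
QUEUE_DEPTH_THRESHOLD = 100  # fire above 100 waiting jobs...
QUEUE_DEPTH_FOR = 5 * 60     # ...sustained for 5 minutes


def firing_alerts(history):
    """Return alert names firing at the latest snapshot; history is sorted by time."""
    latest = history[-1]
    alerts = []

    if latest.executions > 0 and latest.failed / latest.executions > ERROR_RATE_THRESHOLD:
        alerts.append("HighExecutionErrorRate")

    # Walk backwards to find when the current, unbroken backlog started.
    backlog_since = None
    for s in reversed(history):
        if s.queue_depth > QUEUE_DEPTH_THRESHOLD:
            backlog_since = s.timestamp
        else:
            break
    if backlog_since is not None and latest.timestamp - backlog_since >= QUEUE_DEPTH_FOR:
        alerts.append("QueueBacklog")

    if latest.workers == 0:
        alerts.append("NoWorkers")
    return alerts
```

For example, a queue holding 150 jobs at every one-minute sample from t=0 to t=300 fires `QueueBacklog`, while the same backlog observed only from t=0 to t=240 does not fire yet.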
SLOs
Per-workflow SLOs are the right unit because workflows differ in criticality: the lead-router workflow matters far more than the weekly KPI digest. Track availability (the share of executions that succeed) and latency (for example, p95 duration) for each tier-1 workflow.
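A per-workflow SLO check can be as simple as grouping executions by workflow and comparing each group with its own targets. In the Python sketch below, the workflow keys echo the examples above, but the target numbers are hypothetical and chosen only to show a tier-1 workflow getting tighter objectives. Latency uses a nearest-rank percentile over raw durations. This is a different estimator from the bucket interpolation Prometheus applies to histograms.

```python
from collections import defaultdict

# Hypothetical objectives: the tier-1 workflow gets tighter targets.
SLOS = {
    "lead-router": {"availability": 0.995, "p95_seconds": 5.0},
    "weekly-kpi-digest": {"availability": 0.95, "p95_seconds": 600.0},
}


def p95(values):
    """Nearest-rank 95th percentile of a non-empty list (integer math, no float rounding)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n)
    return ordered[rank - 1]


def evaluate_slos(executions, slos=SLOS):
    """executions: iterable of (workflow_name, succeeded, duration_seconds).

    Returns {workflow: {"availability", "p95_seconds", "met"}} for each workflow
    that has an SLO and at least one execution.
    """
    by_workflow = defaultdict(list)
    for name, ok, duration in executions:
        by_workflow[name].append((ok, duration))

    report = {}
    for name, target in slos.items():
        runs = by_workflow.get(name)
        if not runs:
            continue  # no data in this window; treat separately from a breach
        availability = sum(ok for ok, _ in runs) / len(runs)
        latency = p95([d for _, d in runs])
        report[name] = {
            "availability": availability,
            "p95_seconds": latency,
            "met": availability >= target["availability"] and latency <= target["p95_seconds"],
        }
    return report
```

With one failure in 20 lead-router runs, availability is 95% and the 99.5% target is missed even though latency is fine. The same 95% would satisfy the digest's looser target, which is the point of setting SLOs per workflow.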
Frequently asked questions
- Does n8n have built-in alerting?
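- Yes, at the workflow level: an Error Workflow, started by the Error Trigger node, runs whenever a workflow it is assigned to fails. For instance-level alerts you need Prometheus and Alertmanager or an equivalent.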
- Can I export n8n logs to Datadog?
- Yes. With N8N_LOG_OUTPUT=console and N8N_LOG_FORMAT=json, n8n writes JSON logs to stdout, and the Datadog Agent can collect them from there.