n8n Error Handling: Patterns That Don't Wake You Up
Build resilient n8n workflows with Error Workflows, retries, dead-letter queues, and Slack alerts. Patterns proven at scale.
Key takeaways
- Set an Error Workflow on every production workflow.
- Use the Stop and Error node to fail loud on bad data instead of silently passing junk downstream.
- Retry transient failures with node-level Retry On Fail, and add a Wait-node loop with exponential backoff where a fixed delay is not enough.
- Push permanent failures to a dead-letter Postgres table for human review.
A workflow that runs once is a demo. A workflow that runs ten thousand times is software, and software fails. Good error handling is the difference between a Slack ping you investigate Monday morning and a 3 a.m. page. n8n has all the primitives you need — most teams just don't wire them up.
Error Workflows — the safety net
Every workflow has a Settings panel with an Error Workflow field. Point it at a dedicated workflow that starts with an Error Trigger node, receives the error context, and posts to Slack, opens a Linear ticket, or pages PagerDuty. One error workflow can serve all your production workflows. It fires only for automatic (production) executions, not for manual test runs.
Node-level retries
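Open any node's settings → Retry On Fail. Set Max Tries to 3 and Wait Between Tries to 5000 ms. n8n now retries the node whenever it errors, which absorbs transient HTTP 502s, Postgres deadlocks, and rate-limit 429s; the error workflow fires only if the final try also fails. The wait is a fixed delay, and permanent errors are retried just like transient ones. For real backoff, build a retry loop around a Wait node whose delay grows with each attempt and includes jitter.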
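To make "backoff with jitter" concrete, here is a minimal sketch of the delay calculation in Python. It uses capped exponential backoff with full jitter, which is one common variant and not something n8n provides out of the box. The function name and the `base_ms` and `cap_ms` defaults are illustrative assumptions.

```python
import random
from typing import Optional


def backoff_delay_ms(
    attempt: int,
    base_ms: int = 1000,      # assumed starting delay
    cap_ms: int = 60_000,     # assumed upper bound on any single wait
    rng: Optional[random.Random] = None,
) -> int:
    """Return the wait before retry number `attempt` (1-based).

    The ceiling doubles on each attempt up to `cap_ms`; the actual delay
    is drawn uniformly from [0, ceiling] ("full jitter") so that many
    failing executions do not retry in lockstep.
    """
    if attempt < 1:
        raise ValueError("attempt must be >= 1")
    rng = rng or random.Random()
    ceiling = min(cap_ms, base_ms * 2 ** (attempt - 1))
    return rng.randint(0, ceiling)


if __name__ == "__main__":
    for attempt in range(1, 6):
        print(attempt, backoff_delay_ms(attempt))
```

In a workflow, the same arithmetic could live in a Code node that feeds the Wait node's duration, with the attempt counter carried on the item as it goes around the loop.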
Stop and Error — fail loud
When data is malformed, throw. The Stop and Error node halts the execution and surfaces a clear message in the Executions list and the error workflow. Silent passes are the worst kind of bug — they corrupt downstream data and look like success.
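As a rough illustration of failing loud, the sketch below validates an item and raises with a specific message instead of passing it on. The order schema (`order_id`, `email`, `amount`) is hypothetical. In n8n the equivalent would typically be an If node routing bad items to Stop and Error, or a Code node that throws.

```python
from typing import Any, Dict, List

# Hypothetical required fields, for illustration only.
REQUIRED_FIELDS = ("order_id", "email", "amount")


def validate_order(item: Dict[str, Any]) -> Dict[str, Any]:
    """Return the item unchanged if valid; raise ValueError listing every problem."""
    problems: List[str] = []

    for field in REQUIRED_FIELDS:
        if item.get(field) in (None, ""):
            problems.append(f"missing {field}")

    amount = item.get("amount")
    if amount not in (None, ""):
        # bool is a subclass of int, so reject it explicitly.
        is_number = isinstance(amount, (int, float)) and not isinstance(amount, bool)
        if not is_number or amount <= 0:
            problems.append("amount must be a positive number")

    if problems:
        # One message naming every problem is easier to triage than the first failure alone.
        raise ValueError("Invalid order: " + "; ".join(problems))
    return item
```

Collecting all problems before raising keeps the message in the Executions list useful: whoever triages it sees everything wrong with the item in one place.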
Dead-letter queue pattern
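Wrap risky sections in a sub-workflow and call it with Continue On Fail enabled (On Error → Continue in recent n8n versions) so the parent keeps running when the call fails. On failure, write the input payload to a dead_letter_queue Postgres table along with the error message and a timestamp. A daily review workflow lets a human triage and replay the stored items.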
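The table and the two operations around it can be sketched as follows. To keep the example runnable without a database server, it uses Python's built-in sqlite3 as a stand-in for Postgres. The column names, including `replayed_at`, are assumptions rather than a prescribed schema, and in Postgres the payload would more naturally be JSONB and the timestamps TIMESTAMPTZ.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical schema; sqlite3 stands in for Postgres here.
DDL = """
CREATE TABLE IF NOT EXISTS dead_letter_queue (
    id            INTEGER PRIMARY KEY,
    workflow_name TEXT NOT NULL,
    payload       TEXT NOT NULL,   -- JSONB in Postgres
    error_message TEXT NOT NULL,
    failed_at     TEXT NOT NULL,   -- TIMESTAMPTZ in Postgres
    replayed_at   TEXT             -- NULL until a human replays the item
)
"""


def dead_letter(conn, workflow_name, payload, error_message):
    """Store a failed input with its error and return the new row id."""
    cur = conn.execute(
        "INSERT INTO dead_letter_queue "
        "(workflow_name, payload, error_message, failed_at) VALUES (?, ?, ?, ?)",
        (
            workflow_name,
            json.dumps(payload),
            error_message,
            datetime.now(timezone.utc).isoformat(),
        ),
    )
    conn.commit()
    return cur.lastrowid


def pending(conn):
    """Return items awaiting review, oldest first, with payloads decoded."""
    rows = conn.execute(
        "SELECT id, workflow_name, payload, error_message "
        "FROM dead_letter_queue WHERE replayed_at IS NULL ORDER BY failed_at, id"
    ).fetchall()
    return [(r[0], r[1], json.loads(r[2]), r[3]) for r in rows]


conn = sqlite3.connect(":memory:")
conn.execute(DDL)
```

Storing the original input rather than a summary is what makes replay possible: the review workflow can feed the payload straight back into the sub-workflow once the underlying problem is fixed.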
Alerting that humans actually read
Alerts should include: workflow name, execution URL (deep link to the failed run), error message, first-failed node, and a one-click Retry button (a Slack interactive button posting back to a webhook that calls the n8n API to re-execute). Vague alerts get muted; specific alerts get fixed.
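One possible shape for such an alert, sketched in Python: the function below turns error context into a Slack Block Kit message. It assumes the input looks like the Error Trigger payload described in the n8n documentation (`execution.id`, `execution.url`, `execution.error.message`, `execution.lastNodeExecuted`, `workflow.name`), so verify those fields against your version. The `retry_execution` action id is a made-up name that the receiving webhook would have to recognize.

```python
from typing import Any, Dict, List


def build_slack_alert(error_data: Dict[str, Any]) -> Dict[str, Any]:
    """Build a Slack message payload from Error Trigger style data."""
    execution = error_data.get("execution") or {}
    workflow = error_data.get("workflow") or {}
    error = execution.get("error") or {}

    name = workflow.get("name", "unknown workflow")
    node = execution.get("lastNodeExecuted", "unknown node")
    message = error.get("message", "no error message")
    url = execution.get("url")
    execution_id = str(execution.get("id", ""))

    buttons: List[Dict[str, Any]] = []
    if url:
        # Deep link to the failed run.
        buttons.append({
            "type": "button",
            "text": {"type": "plain_text", "text": "Open execution"},
            "url": url,
        })
    if execution_id:
        # Hypothetical action id; the interactivity webhook decides what it means.
        buttons.append({
            "type": "button",
            "text": {"type": "plain_text", "text": "Retry"},
            "style": "primary",
            "action_id": "retry_execution",
            "value": execution_id,
        })

    blocks: List[Dict[str, Any]] = [{
        "type": "section",
        "text": {"type": "mrkdwn", "text": f"*{name}* failed at node *{node}*\n>{message}"},
    }]
    if buttons:
        blocks.append({"type": "actions", "elements": buttons})

    # "text" is the plain fallback shown in notifications.
    return {"text": f"{name} failed: {message}", "blocks": blocks}
```

The buttons are omitted when the URL or execution id is missing, since a button without a target would only add noise to the alert.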
Frequently asked questions
- Where do I see past errors?
- Executions tab → filter Status = Error. Each row links to the failed node with the exact input.
- Can I retry a failed execution?
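- Yes, from the Executions list, or programmatically via the n8n REST API: POST /executions/:id/retry. Check the API reference for your n8n version, since endpoint availability varies between releases.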
- How do I alert on no execution at all?
- Add a heartbeat workflow that pings a dead-man's-snitch service every minute and alerts on missing pings.
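The monitoring side of a heartbeat boils down to one comparison, sketched below in Python. This illustrates the idea rather than any particular service's logic; the one-minute interval matches the answer above, while the 30-second grace period is an assumed value.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_overdue(
    last_ping: Optional[datetime],
    now: datetime,
    interval: timedelta = timedelta(minutes=1),
    grace: timedelta = timedelta(seconds=30),  # assumed tolerance for late pings
) -> bool:
    """Return True when the heartbeat should be treated as missing."""
    if last_ping is None:
        # Never heard from the workflow at all.
        return True
    return now - last_ping > interval + grace


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(is_overdue(now - timedelta(seconds=45), now))  # within interval + grace
    print(is_overdue(now - timedelta(minutes=5), now))   # well past the deadline
```

The grace period matters in practice: without it, a ping that arrives a second late triggers the exact kind of noisy alert this article argues against.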