Webhook Retry - API Resilience
Automatic retry mechanisms ensuring reliable webhook delivery even during temporary service disruptions
Our webhook retry system is designed to maximize delivery success while protecting both sender and receiver from cascading failures. These intelligent retry policies ensure your integration receives critical events without overwhelming your infrastructure.
Retry Policy
All webhook deliveries automatically retry on failure using exponential backoff to allow your service time to recover.
Retry Configuration
| Parameter | Value | Description |
|---|---|---|
| Maximum Delivery Attempts | 15 attempts | Total attempts (original delivery plus retries) before moving to dead letter queue |
| Minimum Backoff Duration | 5 seconds | Initial retry delay after first failure |
| Maximum Backoff Duration | 300 seconds | Maximum delay between retry attempts (5 minutes) |
| Backoff Strategy | Exponential | Delay doubles with each attempt up to maximum |
How it works:
- First retry occurs after 5 seconds
- Each subsequent retry doubles the delay: 5s → 10s → 20s → 40s → 80s → 160s → 300s (max)
- Once the delay reaches 300 seconds, all remaining retries use this interval
- After 15 failed attempts, the event moves to the dead letter queue
- Your endpoint must return a 2xx status code to confirm successful delivery
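The schedule described above can be reproduced with a few lines of Python. This is a minimal sketch based only on the values in the Retry Configuration table; the function name `retry_delays` and its parameters are illustrative and not part of the platform API.

```python
def retry_delays(max_attempts: int = 15,
                 min_backoff: int = 5,
                 max_backoff: int = 300) -> list[int]:
    """Return the delay in seconds that precedes each retry (attempts 2..max_attempts)."""
    delays = []
    delay = min_backoff
    # The first attempt is the original delivery, so there are max_attempts - 1 retries.
    for _ in range(max_attempts - 1):
        delays.append(delay)
        # Double the delay for the next retry, capped at the maximum backoff.
        delay = min(delay * 2, max_backoff)
    return delays


print(retry_delays())
# [5, 10, 20, 40, 80, 160, 300, 300, 300, 300, 300, 300, 300, 300]
```

Doubling 160 seconds would give 320 seconds, so the cap of 300 seconds applies from the seventh retry onward.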
Automatic Subscription Deactivation
To protect against cascading failures, the system monitors webhook health and automatically pauses problematic subscriptions.
Failure Threshold
| Metric | Threshold | Action | Configurable |
|---|---|---|---|
| Failed Delivery Attempts | 150 failures | Subscription set to INACTIVE | No - fixed threshold |
| Time Window | 15 minutes | Rolling window for counting failures | No - fixed window |
How it works:
- System continuously monitors the failed_delivery_attempt count for each subscription
- Counts failures within a rolling 15-minute window
- When failures exceed 150 in any 15-minute period, subscription automatically deactivates
- Deactivated subscriptions stop receiving new events
- Manual reactivation required after addressing underlying issues
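The rolling-window rule above can be sketched as follows. The platform's actual implementation is not documented here, so treat this as an illustration of the concept that you could also use on your side to mirror the count for alerting. The class name `FailureWindow` and its method are hypothetical, and the comparison assumes "exceed 150" means strictly more than 150 failures.

```python
from collections import deque


class FailureWindow:
    """Counts failed delivery attempts inside a rolling time window (illustrative only)."""

    def __init__(self, threshold: int = 150, window_seconds: int = 15 * 60):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._timestamps: deque[float] = deque()

    def record_failure(self, now: float) -> bool:
        """Record one failed attempt at time `now` (in seconds).

        Returns True when the number of failures in the window exceeds the threshold.
        """
        self._timestamps.append(now)
        # Drop failures that have fallen out of the rolling window.
        while self._timestamps and self._timestamps[0] <= now - self.window_seconds:
            self._timestamps.popleft()
        return len(self._timestamps) > self.threshold


window = FailureWindow()
# 151 failures one second apart all fall inside the same 15-minute window.
results = [window.record_failure(float(t)) for t in range(151)]
print(results[149], results[150])  # False True
```

Passing a lower `threshold` (for example 50) gives an early-warning signal before the fixed platform threshold is reached.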
Recovery process:
- Investigate: Review your endpoint logs to identify the root cause, or call the LIKE MAGIC API to list failed deliveries:
    curl -X 'GET' \
      'https://{{monitoringUrl}}/api/event-and-data-hub-service/webhook-subscriptions/{id}/failed-deliveries' \
      -H 'accept: */*' \
      -H 'Authorization: Bearer {token}'
- Fix: Resolve infrastructure, code, or configuration issues
- Reactivate: Manually enable the subscription via API or use Operational Platform - Webhook Settings
- Monitor: Watch failure rates to ensure successful recovery
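If you prefer to script the investigation step, the failed-deliveries request can be built with Python's standard library. This is a sketch under assumptions: the host `monitoring.example.com`, the subscription ID `sub-123`, and the token are placeholders for your own values, the helper name is hypothetical, and the response format is not covered here, so the example only constructs the request.

```python
import urllib.request


def build_failed_deliveries_request(monitoring_host: str,
                                    subscription_id: str,
                                    token: str) -> urllib.request.Request:
    """Build the GET request for a subscription's failed deliveries."""
    url = (
        f"https://{monitoring_host}/api/event-and-data-hub-service/"
        f"webhook-subscriptions/{subscription_id}/failed-deliveries"
    )
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Accept": "*/*", "Authorization": f"Bearer {token}"},
    )


# Placeholder values; replace with your monitoring host, subscription ID, and token.
req = build_failed_deliveries_request("monitoring.example.com", "sub-123", "my-token")
print(req.get_method(), req.full_url)

# To send the request:
# with urllib.request.urlopen(req) as resp:
#     body = resp.read()
```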
Best practices:
- Monitor your endpoint health proactively before reaching threshold
- Set up alerts when failure rate exceeds 50-100 failures per 15 minutes
- Implement graceful degradation when your service experiences issues
- Use retry queues on your side for processing failures
- Return 2xx immediately and process asynchronously to avoid timeout-related failures
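The last two practices can be combined: acknowledge the delivery right away and hand the payload to a background worker. The sketch below is framework-agnostic and uses only the standard library. `handle_webhook` stands in for whatever your HTTP framework calls per request, and the `processed` list is a placeholder for your real business logic; both names are illustrative.

```python
import queue
import threading

events: "queue.Queue[bytes]" = queue.Queue()
processed: list[bytes] = []


def handle_webhook(body: bytes) -> int:
    """Accept a delivery and return the HTTP status code to send back."""
    events.put(body)  # hand off without doing any slow work in the request path
    return 202        # any 2xx status confirms the delivery


def worker() -> None:
    """Process queued events outside the request/response cycle."""
    while True:
        body = events.get()
        try:
            processed.append(body)  # placeholder for real processing
        finally:
            events.task_done()


threading.Thread(target=worker, daemon=True).start()

status = handle_webhook(b'{"event": "example"}')
events.join()  # wait for the worker; only needed in this demonstration
print(status, processed)
```

An in-memory queue loses events if the process crashes, so a durable queue is usually the safer choice in production.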
Understanding Retry Timelines
Example retry schedule for a failing endpoint:
Attempt 1: Immediate (original delivery) ✗
Attempt 2: + 5 seconds ✗
Attempt 3: + 10 seconds (5s + 5s backoff) ✗
Attempt 4: + 20 seconds ✗
Attempt 5: + 40 seconds ✗
Attempt 6: + 80 seconds (~1.3 minutes) ✗
Attempt 7: + 160 seconds (~2.7 minutes) ✗
Attempt 8: + 300 seconds (5 minutes, max) ✗
Attempt 9: + 300 seconds ✗
...
Attempt 15: + 300 seconds ✗
→ Event moves to dead letter queue after ~45 minutes total (5 + 10 + 20 + 40 + 80 + 160 + 8 × 300 = 2,715 seconds of backoff)
Successful early retry:
Attempt 1: Immediate (original delivery) ✗
Attempt 2: + 5 seconds ✗
Attempt 3: + 10 seconds ✓ Success!
→ Event delivered successfully, no further retries
FAQ
What happens to events in the dead letter queue?
Events that fail all 15 delivery attempts are moved to a dead letter topic for manual review.
Can I configure the retry policy for my subscription?
No, retry parameters are standardized across all subscriptions to ensure consistent behavior and platform stability.
Will my subscription automatically reactivate after deactivation?
No, deactivated subscriptions require manual reactivation. This prevents automatic re-engagement with unhealthy endpoints that could cause further issues.
How do I prevent automatic deactivation?
- Ensure your endpoint responds within 5 seconds
- Return 2xx status codes for successful delivery
- Implement proper error handling and graceful degradation
- Monitor failure rates and address issues before hitting the 150-failure threshold
- Process webhook events asynchronously to avoid timeout failures
Do retries count toward the 150-failure threshold?
Yes, every failed delivery attempt (including retries) increments the failure counter. A single event that fails all 15 attempts contributes 15 to the failure count.
Can I request higher failure thresholds for my subscription?
No, the 150-failure/15-minute threshold is fixed for all subscriptions. This standardization ensures platform stability and prevents cascading failures.
Deliver reliably. Fail gracefully. Recover quickly.