🔁 Webhook Retry - API Resilience

Automatic retry mechanisms ensuring reliable webhook delivery even during temporary service disruptions

Our webhook retry system is designed to maximize delivery success while protecting both sender and receiver from cascading failures. These intelligent retry policies ensure your integration receives critical events without overwhelming your infrastructure.

Retry Policy

All webhook deliveries automatically retry on failure using exponential backoff to allow your service time to recover.

Retry Configuration

Parameter	Value	Description
Maximum Delivery Attempts	15 attempts	Total attempts (original delivery plus 14 retries) before the event moves to the dead letter queue
Minimum Backoff Duration	5 seconds	Initial retry delay after first failure
Maximum Backoff Duration	300 seconds	Maximum delay between retry attempts (5 minutes)
Backoff Strategy	Exponential	Delay doubles with each attempt up to maximum

How it works:

First retry occurs after 5 seconds
Each subsequent retry doubles the delay until it hits the cap: 5s → 10s → 20s → 40s → 80s → 160s → 300s (max)
Once the delay reaches 300 seconds, all remaining retries use this interval
After 15 failed attempts, the event moves to the dead letter queue
Your endpoint must return a 2xx status code to confirm successful delivery
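
The schedule above can be reproduced with a few lines of Python. This is only a sketch of the documented numbers, not the platform's implementation; the function name retry_delays is hypothetical, and it assumes the 15 attempts include the original delivery (as in the timeline further down), which leaves 14 retries.

```python
def retry_delays(max_attempts=15, min_backoff=5, max_backoff=300):
    """Return the delay in seconds that precedes each retry (attempts 2..max_attempts)."""
    delays = []
    delay = min_backoff
    for _ in range(max_attempts - 1):  # attempt 1 is the original delivery, so 14 retries
        delays.append(delay)
        delay = min(delay * 2, max_backoff)  # double, but never exceed the cap
    return delays


delays = retry_delays()
print(delays[:8])   # [5, 10, 20, 40, 80, 160, 300, 300]
print(sum(delays))  # 2715
```

Summing the delays gives 2,715 seconds (roughly 45 minutes) of cumulative waiting between the first attempt and the last, not counting the time each request itself takes.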

Automatic Subscription Deactivation

To protect against cascading failures, the system monitors webhook health and automatically pauses problematic subscriptions.

Failure Threshold

Metric	Threshold	Action	Configurable
Failed Delivery Attempts	150 failures	Subscription set to INACTIVE	No - fixed threshold
Time Window	15 minutes	Rolling window for counting failures	No - fixed window

How it works:

System continuously monitors failed_delivery_attempt count for each subscription
Counts failures within a rolling 15-minute window
When failures exceed 150 in any 15-minute period, subscription automatically deactivates
Deactivated subscriptions stop receiving new events
Manual reactivation required after addressing underlying issues
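
The rolling-window rule above can be mirrored on your side, for example to predict when a subscription is about to be deactivated. The following is a minimal sketch, not the platform's actual logic: the class FailureWindow and its method are hypothetical, timestamps are assumed to arrive in non-decreasing order, and "exceed 150" is read as strictly more than 150.

```python
from collections import deque

WINDOW_SECONDS = 15 * 60   # rolling window from the table above
FAILURE_THRESHOLD = 150    # deactivation threshold from the table above


class FailureWindow:
    """Counts failed delivery attempts inside a rolling time window."""

    def __init__(self, window=WINDOW_SECONDS, threshold=FAILURE_THRESHOLD):
        self.window = window
        self.threshold = threshold
        self.failures = deque()  # timestamps (in seconds) of recent failures

    def record_failure(self, now):
        """Record one failure at time `now`; return True once the threshold is exceeded."""
        self.failures.append(now)
        # Drop failures that have aged out of the rolling window.
        while self.failures and self.failures[0] <= now - self.window:
            self.failures.popleft()
        return len(self.failures) > self.threshold
```

Feeding it one failure per second, the 151st call is the first to return True; failures older than 15 minutes no longer count.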

Recovery process:

Investigate: Review your endpoint logs to identify the root cause, or query the failed deliveries through the LIKE MAGIC API:

curl -X 'GET' \
  'https://{{monitoringUrl}}/api/event-and-data-hub-service/webhook-subscriptions/{id}/failed-deliveries' \
  -H 'accept: */*' \
  -H 'Authorization: Bearer {token}'

Fix: Resolve infrastructure, code, or configuration issues
Reactivate: Manually enable the subscription via the API or in Operational Platform - Webhook Settings
Monitor: Watch failure rates to ensure successful recovery

Best practices:

Monitor your endpoint health proactively before reaching threshold
Set up alerts when failure rate exceeds 50-100 failures per 15 minutes
Implement graceful degradation when your service experiences issues
Use retry queues on your side for processing failures
Return 2xx immediately and process asynchronously to avoid timeout-related failures
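
The last practice, acknowledging first and processing later, might look like the sketch below. It uses only the Python standard library and an in-memory queue to stay self-contained; all names (WebhookHandler, process_event, start_server) are hypothetical, the request body is treated as opaque bytes because the payload format is not described here, and a production receiver would more likely hand events to a durable queue.

```python
import queue
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

events = queue.Queue()  # hand-off between the HTTP handler and the worker
processed = []          # stands in for real side effects in this sketch


def process_event(body: bytes) -> None:
    """Placeholder for slow business logic (hypothetical)."""
    processed.append(body)


def worker() -> None:
    """Drain the queue in the background so the handler never waits on processing."""
    while True:
        body = events.get()
        try:
            process_event(body)
        except Exception:
            # The 2xx has already been sent, so the sender will not retry;
            # a real receiver would push the event to its own retry queue here.
            pass
        finally:
            events.task_done()


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        events.put(self.rfile.read(length))  # enqueue only, no processing here
        self.send_response(202)              # any 2xx confirms delivery
        self.end_headers()

    def log_message(self, format, *args) -> None:
        pass  # silence per-request logging


def start_server(port: int = 0) -> HTTPServer:
    """Start the worker and the HTTP server in daemon threads; port 0 picks a free port."""
    threading.Thread(target=worker, daemon=True).start()
    server = HTTPServer(("127.0.0.1", port), WebhookHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

Because the handler only enqueues the body, its response time stays independent of how long processing takes, which is the point of the practice.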

Understanding Retry Timelines

Example retry schedule for a failing endpoint:

Attempt 1:  Immediate (original delivery)        ❌
Attempt 2:  + 5 seconds                          ❌
Attempt 3:  + 10 seconds (5s doubled)            ❌
Attempt 4:  + 20 seconds                         ❌
Attempt 5:  + 40 seconds                         ❌
Attempt 6:  + 80 seconds (~1.3 minutes)          ❌
Attempt 7:  + 160 seconds (~2.7 minutes)         ❌
Attempt 8:  + 300 seconds (5 minutes, max)       ❌
Attempt 9:  + 300 seconds                        ❌
...
Attempt 15: + 300 seconds                        ❌
→ Event moves to dead letter queue after ~45 minutes total

Successful early retry:

Attempt 1:  Immediate (original delivery)        ❌
Attempt 2:  + 5 seconds                          ❌
Attempt 3:  + 10 seconds                         ✅ Success!
→ Event delivered successfully, no further retries

FAQ

What happens to events in the dead letter queue?

Events that fail all 15 delivery attempts are moved to the dead letter queue for manual review.

Can I configure the retry policy for my subscription?

No, retry parameters are standardized across all subscriptions to ensure consistent behavior and platform stability.

Will my subscription automatically reactivate after deactivation?

No, deactivated subscriptions require manual reactivation. This prevents automatic re-engagement with unhealthy endpoints that could cause further issues.

How do I prevent automatic deactivation?

Ensure your endpoint responds within 5 seconds
Return 2xx status codes for successful delivery
Implement proper error handling and graceful degradation
Monitor failure rates and address issues before hitting the 150-failure threshold
Process webhook events asynchronously to avoid timeout failures

Do retries count toward the 150-failure threshold?

Yes, every failed delivery attempt (including retries) increments the failure counter. A single event that fails all 15 attempts contributes 15 to the failure count.
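
To get a feel for how quickly retries add up, the sketch below estimates how many attempts of one permanently failing event land inside a single 15-minute window, and how many events failing at the same moment would push the count past 150. It is a back-of-the-envelope calculation under stated assumptions: request time is ignored, the 15 attempts include the original delivery, and the helper name attempt_offsets is hypothetical.

```python
def attempt_offsets(max_attempts=15, min_backoff=5, max_backoff=300):
    """Seconds after the original delivery at which each attempt fires."""
    offsets, elapsed, delay = [0], 0, min_backoff
    for _ in range(max_attempts - 1):
        elapsed += delay
        offsets.append(elapsed)
        delay = min(delay * 2, max_backoff)
    return offsets


WINDOW_SECONDS = 15 * 60
THRESHOLD = 150

offsets = attempt_offsets()
# Attempts of one failing event that fall into its first 15 minutes.
per_event = sum(1 for t in offsets if t < WINDOW_SECONDS)
# Smallest number of simultaneously failing events that exceeds the threshold.
events_to_trip = THRESHOLD // per_event + 1

print(per_event)       # 8
print(events_to_trip)  # 19
```

Under these assumptions a single event contributes at most 8 failures to any one window, because its 15 attempts are spread over about 45 minutes, so about 19 events failing together are enough to deactivate a subscription.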

Can I request higher failure thresholds for my subscription?

No, the 150-failure/15-minute threshold is fixed for all subscriptions. This standardization ensures platform stability and prevents cascading failures.

Deliver reliably. Fail gracefully. Recover quickly 🎯