๐Ÿ” Webhook Retry - API Resilience

Automatic retry mechanisms ensuring reliable webhook delivery even during temporary service disruptions

Our webhook retry system is designed to maximize delivery success while protecting both sender and receiver from cascading failures. These intelligent retry policies ensure your integration receives critical events without overwhelming your infrastructure.

Retry Policy

All webhook deliveries automatically retry on failure using exponential backoff to allow your service time to recover.

Retry Configuration

| Parameter | Value | Description |
| --- | --- | --- |
| Maximum Delivery Attempts | 15 attempts | Total attempts (the original delivery plus 14 retries) before the event moves to the dead letter queue |
| Minimum Backoff Duration | 5 seconds | Initial retry delay after the first failure |
| Maximum Backoff Duration | 300 seconds | Maximum delay between retry attempts (5 minutes) |
| Backoff Strategy | Exponential | Delay doubles with each attempt, up to the maximum |

How it works:

  • First retry occurs after 5 seconds
  • Each subsequent retry doubles the delay, capped at the maximum: 5s → 10s → 20s → 40s → 80s → 160s → 300s (max)
  • Once the delay reaches 300 seconds, all remaining retries use this interval
  • After 15 failed attempts, the event moves to the dead letter queue
  • Your endpoint must return a 2xx status code to confirm successful delivery
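
The backoff rule above can be sketched in a few lines of Python. This is a minimal illustration that assumes the delay is a plain doubling capped at the maximum, with no jitter; the function name backoff_delay is hypothetical and not part of the platform.

```python
def backoff_delay(retry_number: int, minimum: float = 5.0, maximum: float = 300.0) -> float:
    """Return the delay in seconds before the given retry (1 = first retry).

    Assumption: plain doubling from `minimum`, capped at `maximum`.
    """
    if retry_number < 1:
        raise ValueError("retry_number starts at 1")
    return min(minimum * 2 ** (retry_number - 1), maximum)


# Delays before the first eight retries.
schedule = [backoff_delay(n) for n in range(1, 9)]
print(schedule)  # [5.0, 10.0, 20.0, 40.0, 80.0, 160.0, 300.0, 300.0]
```

The seventh delay would be 320 seconds if doubled without a cap, which is why it is clamped to 300.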

Automatic Subscription Deactivation

To protect against cascading failures, the platform monitors webhook health and automatically pauses subscriptions whose deliveries fail repeatedly.

Failure Threshold

| Metric | Threshold | Action | Configurable |
| --- | --- | --- | --- |
| Failed Delivery Attempts | 150 failures | Subscription set to INACTIVE | No - fixed threshold |
| Time Window | 15 minutes | Rolling window for counting failures | No - fixed window |

How it works:

  • The system continuously monitors the failed_delivery_attempt count for each subscription
  • Failures are counted within a rolling 15-minute window
  • When failures exceed 150 within any 15-minute window, the subscription is automatically deactivated
  • Deactivated subscriptions stop receiving new events
  • Manual reactivation is required after the underlying issues are addressed
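
A rolling-window counter of this kind can be sketched as follows. This is only an illustration of the behavior described above, not the platform's implementation; the class name FailureWindow and its methods are hypothetical, and timestamps are assumed to be seconds passed in by the caller.

```python
from collections import deque


class FailureWindow:
    """Counts failures in a rolling window and flags when the threshold is exceeded."""

    def __init__(self, threshold: int = 150, window_seconds: float = 15 * 60):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._failures = deque()  # timestamps of recent failures, oldest first
        self.active = True

    def record_failure(self, timestamp: float) -> bool:
        """Record one failed attempt and return whether the subscription stays active."""
        self._failures.append(timestamp)
        # Drop failures that have aged out of the rolling window.
        cutoff = timestamp - self.window_seconds
        while self._failures and self._failures[0] <= cutoff:
            self._failures.popleft()
        # "Exceed 150" is read here as strictly more than the threshold.
        if len(self._failures) > self.threshold:
            self.active = False
        return self.active
```

The same structure can be reused on the receiving side to raise an alert at a lower threshold before the subscription is deactivated.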

Recovery process:

  1. Investigate: Review your endpoint logs to identify the root cause, or query the failed deliveries through the LIKE MAGIC API:
    curl -X 'GET' \
      'https://{{monitoringUrl}}/api/event-and-data-hub-service/webhook-subscriptions/{id}/failed-deliveries' \
      -H 'accept: */*' \
      -H 'Authorization: Bearer {token}'
    
  2. Fix: Resolve infrastructure, code, or configuration issues
  3. Reactivate: Manually enable the subscription via the API or in Operational Platform - Webhook Settings
  4. Monitor: Watch failure rates to ensure successful recovery
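
For scripted investigation, the curl call from step 1 could be issued from Python's standard library roughly as below. The function names are hypothetical, the host, subscription ID, and token are placeholders you supply, and the response is assumed to be JSON, which this page does not confirm.

```python
import json
import urllib.request
from urllib.parse import quote


def build_failed_deliveries_request(monitoring_url: str, subscription_id: str,
                                    token: str) -> urllib.request.Request:
    """Build the same GET request as the curl example in step 1."""
    url = (
        f"https://{monitoring_url}/api/event-and-data-hub-service"
        f"/webhook-subscriptions/{quote(subscription_id, safe='')}/failed-deliveries"
    )
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Accept": "*/*", "Authorization": f"Bearer {token}"},
    )


def fetch_failed_deliveries(request: urllib.request.Request, timeout: float = 10.0):
    """Send the request and decode the body (assumed to be JSON)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example (placeholders, not real values):
# request = build_failed_deliveries_request("<monitoringUrl>", "<id>", "<token>")
# failures = fetch_failed_deliveries(request)
```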

Best practices:

  • Monitor your endpoint health proactively, before failures approach the threshold
  • Set up alerts when failure rate exceeds 50-100 failures per 15 minutes
  • Implement graceful degradation when your service experiences issues
  • Use retry queues on your side for processing failures
  • Return 2xx immediately and process asynchronously to avoid timeout-related failures
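
The last two practices can be combined: acknowledge the delivery right away, then do the real work on a background worker and keep anything that fails for your own retry. The sketch below shows the idea with an in-memory queue. All names (receive_webhook, process_event, pending, failed) are hypothetical, and a production setup would likely use a durable queue so that accepted events survive a restart.

```python
import json
import queue
import threading

pending = queue.Queue()  # raw payloads waiting to be processed
failed = []              # stand-in for a local retry queue
handled = []             # stand-in for real business logic results


def receive_webhook(raw_body: bytes) -> int:
    """Call this from your HTTP handler; it only enqueues and acknowledges."""
    pending.put(raw_body)
    return 202  # any 2xx status confirms delivery to the sender


def process_event(event: dict) -> None:
    """Placeholder for slow processing (database writes, downstream calls)."""
    handled.append(event)


def worker() -> None:
    """Process payloads in the background, outside the request/response cycle."""
    while True:
        raw_body = pending.get()
        try:
            process_event(json.loads(raw_body))
        except Exception:
            # Keep the payload for a later retry on your side.
            failed.append(raw_body)
        finally:
            pending.task_done()


threading.Thread(target=worker, daemon=True).start()
```

Because the handler does no parsing or processing, its response time stays short even when downstream systems are slow.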

Understanding Retry Timelines

Example retry schedule for a failing endpoint:

Attempt 1:  Immediate (original delivery)        ❌
Attempt 2:  + 5 seconds                          ❌
Attempt 3:  + 10 seconds (5s doubled)            ❌
Attempt 4:  + 20 seconds                         ❌
Attempt 5:  + 40 seconds                         ❌
Attempt 6:  + 80 seconds (~1.3 minutes)          ❌
Attempt 7:  + 160 seconds (~2.7 minutes)         ❌
Attempt 8:  + 300 seconds (5 minutes, max)       ❌
Attempt 9:  + 300 seconds                        ❌
...
Attempt 15: + 300 seconds                        ❌
→ Event moves to dead letter queue after ~45 minutes total
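
The elapsed time at each attempt can be worked out by summing the listed delays. The sketch below assumes the parameters from the Retry Configuration table and counts backoff delays only; request durations and any jitter, which this page does not specify, would add to the real total.

```python
from itertools import accumulate

MAX_ATTEMPTS = 15
MIN_BACKOFF = 5    # seconds
MAX_BACKOFF = 300  # seconds

# One delay before each retry: 14 delays for 15 attempts.
delays = [min(MIN_BACKOFF * 2 ** n, MAX_BACKOFF) for n in range(MAX_ATTEMPTS - 1)]

# Seconds elapsed since the original delivery when each attempt is made.
offsets = [0, *accumulate(delays)]

for attempt, offset in enumerate(offsets, start=1):
    print(f"Attempt {attempt:2d}: t = {offset:4d}s")

total_backoff = offsets[-1]
print(total_backoff)  # 2715 seconds, about 45 minutes
```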

Successful early retry:

Attempt 1:  Immediate (original delivery)        ❌
Attempt 2:  + 5 seconds                          ❌
Attempt 3:  + 10 seconds                         ✅ Success!
→ Event delivered successfully, no further retries

FAQ

What happens to events in the dead letter queue?

Events that fail all 15 delivery attempts are moved to a dead letter topic for manual review.

Can I configure the retry policy for my subscription?

No, retry parameters are standardized across all subscriptions to ensure consistent behavior and platform stability.

Will my subscription automatically reactivate after deactivation?

No, deactivated subscriptions require manual reactivation. This prevents automatic re-engagement with unhealthy endpoints that could cause further issues.

How do I prevent automatic deactivation?

  • Ensure your endpoint responds within 5 seconds
  • Return 2xx status codes for successful delivery
  • Implement proper error handling and graceful degradation
  • Monitor failure rates and address issues before hitting the 150-failure threshold
  • Process webhook events asynchronously to avoid timeout failures

Do retries count toward the 150-failure threshold?

Yes, every failed delivery attempt (including retries) increments the failure counter. A single event that fails all 15 attempts contributes 15 to the failure count.
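
To get a feel for how quickly retries add up, the sketch below counts how many attempts for one permanently failing event can land inside a single 15-minute window. It assumes the schedule from the Retry Configuration table, no jitter, and negligible request time; the helper max_in_window is illustrative and says nothing about how the platform aligns its window.

```python
from itertools import accumulate

# Attempt times in seconds since the original delivery (15 attempts, 14 delays).
delays = [min(5 * 2 ** n, 300) for n in range(14)]
attempt_times = [0, *accumulate(delays)]


def max_in_window(times, window=15 * 60):
    """Largest number of sorted timestamps that fit in any window of `window` seconds."""
    best = 0
    start = 0
    for end, t in enumerate(times):
        # Slide the window start forward until it spans less than `window` seconds.
        while t - times[start] >= window:
            start += 1
        best = max(best, end - start + 1)
    return best


per_event = max_in_window(attempt_times)
print(per_event)  # 8

# Events failing at the same moment needed to exceed 150 failures in one window.
events_to_trip = 150 // per_event + 1
print(events_to_trip)  # 19
```

Under these assumptions the attempts cluster at the start of the schedule, so a burst of simultaneously failing events reaches the threshold much faster than the same failures spread over time.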

Can I request higher failure thresholds for my subscription?

No, the 150-failure/15-minute threshold is fixed for all subscriptions. This standardization ensures platform stability and prevents cascading failures.


Deliver reliably. Fail gracefully. Recover quickly. 🎯