Degraded

API timeouts elevated

Aug 06 at 04:35am PDT
Affected services
api.firecrawl.dev

Resolved
Aug 06 at 05:19am PDT

The issue is fully resolved. Apologies for the disruption, and thank you for your patience.

The issue was caused by a load spike, which triggered a scale-up to a large number of API and Worker pods. These pods interface heavily with our Dragonfly (Redis-equivalent) instance via BullMQ for job queueing. The increased connection and request volume caused Dragonfly to start queueing pipeline operations, which delayed BullMQ operations, causing scrape jobs to fail and accumulate.
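
For context, each API or Worker pod opens its own BullMQ connections to the shared store, so the connection count grows with the pod count. Below is a minimal sketch of the pattern, with a hypothetical queue name, host, and job handler (not our actual configuration):

```typescript
import { Queue, Worker } from "bullmq";

// Hypothetical connection to the shared Dragonfly (Redis-compatible) instance.
// Every pod that constructs a Queue or Worker opens connections like this,
// so scaling out to many pods multiplies the load on a single Dragonfly.
const connection = { host: "dragonfly.internal", port: 6379 };

// API pods enqueue scrape jobs...
const scrapeQueue = new Queue("scrape", { connection });
await scrapeQueue.add("scrape", { url: "https://example.com" });

// ...and Worker pods consume them. Each worker issues frequent pipelined
// commands (job fetches, lock renewals, heartbeats), so once the server
// starts queueing pipelines, every one of these operations is delayed.
const worker = new Worker(
  "scrape",
  async (job) => {
    // process the scrape job (hypothetical handler)
    return { ok: true, url: job.data.url };
  },
  { connection, concurrency: 10 },
);
```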

We lowered the cap on the maximum number of Worker and API pods, and we made our BullMQ lock renewal less aggressive. This brought the pipeline queue length metric back down to nominal levels.
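
For illustration, BullMQ's worker options include lockDuration and lockRenewTime; raising them makes each worker renew job locks less often, reducing the command rate against Dragonfly. A minimal sketch of this kind of tuning follows (the values are illustrative, not the ones we deployed; the pod cap itself lives in the autoscaler configuration and is not shown):

```typescript
import { Job, Worker } from "bullmq";

const connection = { host: "dragonfly.internal", port: 6379 };

// Hypothetical job handler, standing in for the real scrape processor.
async function processScrapeJob(job: Job) {
  return { ok: true };
}

const worker = new Worker("scrape", processScrapeJob, {
  connection,
  lockDuration: 60_000,  // hold each job lock for 60s (BullMQ's default is 30s)
  lockRenewTime: 30_000, // renew locks less frequently than the default cadence
});
```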

Created
Aug 06 at 04:35am PDT

We are working on resolving the issue.