Previous incidents

August 2025
Aug 27, 2025
1 incident

API is degraded

Degraded

Resolved Aug 27 at 10:18am PDT

The system has recovered. We have escalated the root cause to our upstream provider.

Aug 26, 2025
1 incident

API is degraded

Degraded

Resolved Aug 26 at 08:03am PDT

We are back.

Aug 21, 2025
1 incident

Dashboard is partially unavailable or slower than usual

Degraded

Resolved Aug 21 at 12:57pm PDT

Upstream provider has resolved the issue.

Aug 18, 2025
1 incident

firecrawl.dev is down

Downtime

Resolved Aug 18 at 04:59pm PDT

firecrawl.dev recovered.

Aug 14, 2025
1 incident

API is degraded

Degraded

Resolved Aug 14 at 02:26pm PDT

We are back. Job timeout metrics have recovered to pre-incident levels.

The issue came down to misconfigured pipeline queue limits on Dragonfly: we had set them very high in anticipation of heavy production load, but they turned out to be far too high. As a result, Dragonfly's backpressure mechanisms kicked in much too late, only once the instance was already practically unsalvageable. The configuration has been tuned, and we will continue to monitor it.
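
The general principle here is that backpressure needs to engage well before the instance is overwhelmed. As a rough sketch of that idea (not our actual fix, which was a Dragonfly configuration change), a producer can check queue depth with BullMQ's getWaitingCount() and refuse new work past a threshold; the queue name, threshold, and helper below are illustrative assumptions only.

```ts
// Illustrative only: apply backpressure in the producer before the queue
// backend (Dragonfly/Redis) is forced to. Names and thresholds are hypothetical.
import { Queue } from "bullmq";

const scrapeQueue = new Queue("scrape", {
  connection: { host: "localhost", port: 6379 },
});

// Hypothetical ceiling, well below the point where the datastore itself
// would have to start queueing pipeline operations.
const MAX_WAITING_JOBS = 10_000;

export async function enqueueScrape(url: string): Promise<boolean> {
  const waiting = await scrapeQueue.getWaitingCount();
  if (waiting >= MAX_WAITING_JOBS) {
    // Push back on the caller early instead of letting the backlog grow
    // until the instance is unsalvageable.
    return false;
  }
  await scrapeQueue.add("scrape", { url });
  return true;
}
```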

Aug 13, 2025
1 incident

API is degraded

Degraded

Resolved Aug 13 at 04:37pm PDT

The API has recovered.

We quickly restarted the workers to get them unstuck. After that, we revisited the core issue: many jobs were finishing at the same time, hammering the same sorted set in Redis simultaneously and hitting the same race conditions. We decided to break up these clumps of requests by waiting a random delay before retrying each time the anti-race mechanism activates, which spreads the workers out nicely.

After applying the fix and observ...
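
For readers curious what "a random delay before retrying" looks like in practice, here is a minimal sketch of a jittered retry wrapper. It is not our production code; the delay bounds, function names, and retry limit are illustrative assumptions.

```ts
// Illustrative sketch of breaking up "clumps" of contending workers: when the
// contended operation fails, wait a random delay before retrying so retries no
// longer land on the Redis sorted set at the same instant.
const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical bounds for the random backoff window.
const MIN_JITTER_MS = 50;
const MAX_JITTER_MS = 500;

export async function withJitteredRetry<T>(
  attempt: () => Promise<T>,
  maxAttempts = 5,
): Promise<T> {
  for (let i = 0; ; i++) {
    try {
      return await attempt();
    } catch (err) {
      if (i + 1 >= maxAttempts) throw err;
      const jitter =
        MIN_JITTER_MS + Math.random() * (MAX_JITTER_MS - MIN_JITTER_MS);
      await sleep(jitter); // spread contending workers apart in time
    }
  }
}
```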

Aug 07, 2025
1 incident

API is degraded

Degraded

Resolved Aug 07 at 04:21am PDT

Service is restored. Crawls that appeared "stuck" should now resume.

Aug 06, 2025
1 incident

API timeouts elevated

Degraded

Resolved Aug 06 at 05:19am PDT

The issue is fully resolved. Apologies for the disruption and thank you for your patience.

The issue was caused by a load spike, which triggered a scale-up to a large number of API and Worker pods. These pods interface heavily with our Dragonfly (Redis-equivalent) instance via BullMQ for job queueing. The increased connection count and request volume forced Dragonfly to start queueing pipeline operations, which delayed BullMQ operations, causing the system to fail and scrape jobs to accu...
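
For context on how per-pod pressure on the queue backend can be bounded, here is a hypothetical sketch using BullMQ's worker concurrency and rate-limiter options. It is not the actual remediation; the queue name, concurrency, and limiter values are illustrative assumptions.

```ts
// Illustrative only: bounding how hard workers can drive the shared
// Dragonfly instance, using BullMQ worker options. Values are hypothetical,
// not our production settings.
import { Worker } from "bullmq";

const worker = new Worker(
  "scrape",
  async (job) => {
    // ... fetch and process job.data.url here ...
  },
  {
    connection: { host: "localhost", port: 6379 },
    // Jobs processed in parallel by this worker instance.
    concurrency: 20,
    // Queue-wide rate limit that all workers respect, so scaling up pods
    // does not multiply the request rate against the queue backend.
    limiter: { max: 100, duration: 1000 },
  },
);
```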
