Previous incidents

October 2025
Oct 29, 2025
1 incident

Elevated timeouts

Downtime

Resolved Oct 29 at 04:48pm PDT

Customers who exceeded their concurrency limit during this outage may encounter unexpected errors in their crawls.

The outage affected our concurrency limiting system, causing unique jobs to be inserted into the separate per-team concurrency queue multiple times. When these jobs are promoted from the concurrency queue into the main queue, they retain their IDs, which violates the unique job ID constraint.

A fix for this issue is now deployed.
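
For illustration only, here is a minimal TypeScript sketch of the kind of guard the fix implies: de-duplicating jobs by ID when promoting them from a per-team concurrency queue into the main queue, so a job that was inserted twice is only promoted once. The types, names, and in-memory queues are hypothetical and not Firecrawl's actual implementation.

```ts
// Hypothetical sketch: promote jobs from a per-team concurrency queue into the
// main queue while skipping IDs that are already present. Names are
// illustrative, not Firecrawl's actual code.
interface QueuedJob {
  id: string;       // unique job ID, preserved across queues
  teamId: string;
  payload: unknown;
}

function promoteJobs(concurrencyQueue: QueuedJob[], mainQueue: QueuedJob[]): void {
  // IDs already present in the main queue; inserting these again would
  // violate the unique job ID constraint.
  const existingIds = new Set(mainQueue.map((job) => job.id));

  for (const job of concurrencyQueue) {
    if (existingIds.has(job.id)) {
      continue; // duplicate insert caused by the outage; drop it
    }
    existingIds.add(job.id);
    mainQueue.push(job);
  }
}
```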

2 previous updates

Oct 20, 2025
1 incident

Elevated timeouts

Degraded

Resolved Oct 20 at 09:49am PDT

The issue is now resolved. As part of the hotfix, all jobs waiting in the concurrency queue had to be discarded, which caused ongoing crawls to freeze. Please reach out to support if you are affected. We apologize for the disruption, and thank you for your patience.

1 previous update

Oct 19, 2025
1 incident

Elevated timeouts

Degraded

Resolved Oct 19 at 04:52pm PDT

The issue has been resolved. Thank you for your patience.

2 previous updates

Oct 18, 2025
1 incident

Elevated timeouts

Degraded

Resolved Oct 18 at 12:54pm PDT

We've shipped a fix and the incident is now resolved. Thank you for your patience.

4 previous updates

Oct 11, 2025
1 incident

Elevated timeouts

Degraded

Resolved Oct 11 at 03:09pm PDT

The elevated API timeouts have been fully resolved. All services are operating normally. We apologize for the disruption and appreciate your patience.

2 previous updates

Oct 05, 2025
1 incident

Elevated timeouts

Degraded

Resolved Oct 05 at 07:31am PDT

We have identified and pushed a hotfix for the root cause. We are working on a permanent fix.

1 previous update

Oct 02, 2025
1 incident

Elevated timeouts

Degraded

Resolved Oct 02 at 08:59am PDT

The issue is now resolved. We apologize for the inconvenience. The maintenance we had previously scheduled for Sunday will address the root cause of this downtime.

1 previous update

August 2025
Aug 27, 2025
1 incident

API is degraded

Degraded

Resolved Aug 27 at 10:18am PDT

The system has recovered. We have escalated the root cause to our upstream provider.

1 previous update

Aug 26, 2025
1 incident

API is degraded

Degraded

Resolved Aug 26 at 08:03am PDT

We are back.

1 previous update

Aug 21, 2025
1 incident

Dashboard is partially unavailable or slower than usual

Degraded

Resolved Aug 21 at 12:57pm PDT

Our upstream provider has resolved the issue.

1 previous update

Aug 18, 2025
1 incident

firecrawl.dev is down

Downtime

Resolved Aug 18 at 04:59pm PDT

firecrawl.dev recovered.

1 previous update

Aug 14, 2025
1 incident

API is degraded

Degraded

Resolved Aug 14 at 02:26pm PDT

We are back. Job timeout metrics have recovered to pre-incident levels.

The issue came down to misconfigured pipeline queue limits on Dragonfly. We had configured them with high values in anticipation of heavy production load, but they ended up being far too high. This caused Dragonfly's backpressure mechanisms to kick in far too late, only once the instance was already practically unsalvageable. The configuration has been tuned, and we will continue to monitor this.
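
The fix itself was a server-side configuration change on Dragonfly. As a rough client-side illustration of the same principle, the TypeScript sketch below (using ioredis) caps how many commands are batched into a single pipeline, so queueing pressure surfaces early instead of piling up until the instance is overwhelmed. The cap value and helper are assumptions, not the actual tuning.

```ts
import Redis from "ioredis";

// Illustrative client-side analogue of the tuning described above: bound the
// number of commands batched into a single pipeline so pressure is felt early.
// The limit value is made up for the sketch.
const MAX_PIPELINE_BATCH = 512; // assumption: a deliberately conservative cap

const redis = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379");

async function flushInBatches(commands: Array<[key: string, value: string]>) {
  for (let i = 0; i < commands.length; i += MAX_PIPELINE_BATCH) {
    const pipeline = redis.pipeline();
    for (const [key, value] of commands.slice(i, i + MAX_PIPELINE_BATCH)) {
      pipeline.set(key, value);
    }
    // Awaiting each batch bounds how much work is queued on the server at once.
    await pipeline.exec();
  }
}
```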

5 previous updates

Aug 13, 2025
1 incident

API is degraded

Degraded

Resolved Aug 13 at 04:37pm PDT

The API has recovered.

We quickly restarted the workers to get them unstuck. After that, we revisited the core issue: a large number of jobs were finishing at the same time, hammering the same sorted set in Redis simultaneously and running into the same race condition. We decided to break up these clumps of requests by applying a delay of random length before retrying every time the anti-race mechanism activates, which spreads the workers out nicely (a rough sketch of the idea follows below).

After applying the fix and observ...
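
As a rough sketch of the randomized-retry idea (not the actual worker code), the helper below retries an operation with a delay of random length whenever a race conflict is detected, so contending workers stop retrying in lockstep. The function names, attempt limit, and delay bound are illustrative assumptions.

```ts
// Illustrative sketch of retrying with a randomized delay ("jitter") so that
// workers hitting the same race condition don't all retry at the same instant.
// Names and limits are hypothetical, not Firecrawl's actual implementation.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function withJitteredRetry<T>(
  operation: () => Promise<T>,
  isRaceConflict: (err: unknown) => boolean,
  maxAttempts = 5,
  maxDelayMs = 250,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      if (!isRaceConflict(err) || attempt >= maxAttempts) throw err;
      // A random delay breaks up the "clump" of workers retrying in lockstep.
      await sleep(Math.random() * maxDelayMs);
    }
  }
}
```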

3 previous updates

Aug 07, 2025
1 incident

API is degraded

Degraded

Resolved Aug 07 at 04:21am PDT

Service is restored. Crawls that appeared "stuck" should now resume.

2 previous updates

Aug 06, 2025
1 incident

API timeouts elevated

Degraded

Resolved Aug 06 at 05:19am PDT

The issue is fully resolved. Apologies for the disruption and thank you for your patience.

The issue was caused by a load spike, which triggered a scale-up to a large number of API and worker pods. These pods interface heavily with our Dragonfly (Redis-equivalent) instance via BullMQ for job queueing. The increased number of connections and requests caused Dragonfly to start queueing pipeline operations, which delayed BullMQ operations, making the system fail and scrape jobs accu...
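
For illustration, here is a hedged TypeScript sketch of the kind of mitigation this failure mode suggests: sharing one Redis connection per pod across BullMQ workers and bounding worker concurrency, so scaling out pods multiplies load on Dragonfly less sharply. The queue name, concurrency value, and processor body are assumptions, not Firecrawl's actual setup.

```ts
import { Worker } from "bullmq";
import Redis from "ioredis";

// Hypothetical mitigation sketch: one shared connection per pod and bounded
// worker concurrency, so a scale-up does not multiply Dragonfly load as sharply.
// Queue name, concurrency, and processor are illustrative.
const connection = new Redis(process.env.REDIS_URL ?? "redis://localhost:6379", {
  maxRetriesPerRequest: null, // required by BullMQ workers
});

const scrapeWorker = new Worker(
  "scrape", // assumed queue name
  async (job) => {
    // ... perform the scrape described by job.data ...
    return { ok: true };
  },
  { connection, concurrency: 8 }, // bounded per-pod concurrency (illustrative)
);
```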

1 previous update