Previous incidents

October 2025
Oct 29, 2025
1 incident

Elevated timeouts

Downtime

Resolved Oct 29, 2025 at 11:48pm UTC

Customers who were exceeding their concurrency limit during this outage may have encountered unexpected errors in their crawls.

The outage affected our concurrency limiting system, causing individual jobs to be inserted into a team's per-team concurrency queue multiple times. When those jobs are promoted from the concurrency queue into the main queue, they retain their IDs, which violates the main queue's unique ID constraint.

A fix for this issue is now deployed.
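The failure mode above can be illustrated with a short sketch (hypothetical names, not our actual queueing code): if the same job ID can land in the concurrency queue twice, promotion must deduplicate on ID rather than fail on insert.

```python
# Illustrative sketch only: promoting jobs from a per-team concurrency queue
# into a main queue that enforces unique job IDs. Deduplicating on promotion
# avoids the unique-ID violation when a job was enqueued twice.

def promote(concurrency_queue, main_queue):
    """Move queued jobs into the main queue, skipping duplicate IDs."""
    seen = {job["id"] for job in main_queue}
    for job in concurrency_queue:
        if job["id"] in seen:
            continue  # this job was inserted into the concurrency queue twice
        main_queue.append(job)
        seen.add(job["id"])
    concurrency_queue.clear()

team_queue = [{"id": "job-1"}, {"id": "job-2"}, {"id": "job-1"}]  # duplicate
main = []
promote(team_queue, main)
# main now holds job-1 and job-2 exactly once
```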

2 previous updates

Oct 20, 2025
1 incident

Elevated timeouts

Degraded

Resolved Oct 20, 2025 at 4:49pm UTC

The issue is now resolved. As part of the hotfix, all jobs in the concurrency queue had to be discarded, which froze ongoing crawls -- please reach out to support if you are affected. We apologize for the disruption, and thank you for your patience.

1 previous update

Oct 19, 2025
1 incident

Elevated timeouts

Degraded

Resolved Oct 19, 2025 at 11:52pm UTC

The issue has been resolved. Thank you for your patience.

2 previous updates

Oct 18, 2025
1 incident

Elevated timeouts

Degraded

Resolved Oct 18, 2025 at 7:54pm UTC

We've shipped a fix and the incident is now resolved. Thank you for your patience.

4 previous updates

Oct 11, 2025
1 incident

Elevated timeouts

Degraded

Resolved Oct 11, 2025 at 10:09pm UTC

The elevated API timeouts have been fully resolved. All services are operating normally. We apologize for the disruption and appreciate your patience.

2 previous updates

Oct 05, 2025
1 incident

Elevated timeouts

Degraded

Resolved Oct 5, 2025 at 2:31pm UTC

We have identified and pushed a hotfix for the root cause. We are working on a permanent fix.

1 previous update

Oct 02, 2025
1 incident

Elevated timeouts

Degraded

Resolved Oct 2, 2025 at 3:59pm UTC

The issue is now resolved. We apologize for the inconvenience. The maintenance we had previously scheduled for Sunday will address the root cause of this downtime.

1 previous update

August 2025
Aug 27, 2025
1 incident

API is degraded

Degraded

Resolved Aug 27, 2025 at 5:18pm UTC

The system has recovered. We have escalated the root cause to our upstream provider.

1 previous update

Aug 26, 2025
1 incident

API is degraded

Degraded

Resolved Aug 26, 2025 at 3:03pm UTC

We are back.

1 previous update

Aug 21, 2025
1 incident

Dashboard is partially unavailable or slower than usual

Degraded

Resolved Aug 21, 2025 at 7:57pm UTC

Upstream provider has resolved the issue.

1 previous update

Aug 18, 2025
1 incident

firecrawl.dev is down

Downtime

Resolved Aug 18, 2025 at 11:59pm UTC

firecrawl.dev recovered.

1 previous update

Aug 14, 2025
1 incident

API is degraded

Degraded

Resolved Aug 14, 2025 at 9:26pm UTC

We are back. Job timeout metrics have recovered to pre-incident levels.

The issue came down to misconfigured pipeline queue limits on Dragonfly -- we set them to high values expecting heavy production load, but they turned out to be far too high. As a result, Dragonfly's backpressure mechanisms kicked in much too late, only once the instance was already practically unsalvageable. The configuration has been tuned, and we will continue to monitor this.
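A toy model of this failure mode (not Dragonfly's actual implementation, and the numbers are made up): backpressure only activates once the pending-pipeline queue exceeds its configured limit, so a limit set far too high lets the backlog grow past the point of recovery before any client is throttled.

```python
# Toy model of limit-based backpressure: new pipelined operations are
# accepted until the pending queue reaches the configured limit.

def accepts(pending: int, limit: int) -> bool:
    """Return True while the instance still accepts pipelined operations."""
    return pending < limit

# With a sane limit, backpressure kicks in early during a load spike;
# with an extreme limit, the backlog keeps growing unchecked.
sane_limit, extreme_limit = 1_000, 10_000_000
backlog = 50_000  # pending operations during a load spike

print(accepts(backlog, sane_limit))     # False: clients get throttled
print(accepts(backlog, extreme_limit))  # True: backlog keeps growing
```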

5 previous updates

Aug 13, 2025
1 incident

API is degraded

Degraded

Resolved Aug 13, 2025 at 11:37pm UTC

The API has recovered.

We quickly restarted the workers to get them un-stuck again. After that, we revisited the core issue: many jobs were finishing at the same time, hammering the same sorted set in Redis simultaneously and running into the same race conditions. To break up these clumps of requests, we now apply a timeout of random length before retrying whenever the anti-race mechanism activates. This spreads the workers out nicely.

After applying the fix and observ...
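The mitigation described above amounts to retrying with random jitter. A minimal sketch, with illustrative names standing in for our worker code:

```python
# Minimal sketch of retry-with-jitter: when the anti-race mechanism trips,
# sleep a random amount before retrying so colliding workers do not all
# retry in lockstep. RuntimeError stands in for the contention signal.
import random
import time

def retry_with_jitter(operation, max_attempts=5, max_delay=0.05):
    """Retry `operation`, sleeping a random delay after each contention error."""
    for _ in range(max_attempts):
        try:
            return operation()
        except RuntimeError:
            # Random-length timeout breaks up clumps of colliding workers.
            time.sleep(random.uniform(0, max_delay))
    raise RuntimeError("operation still contended after retries")

# Simulated contended operation: fails twice, then succeeds.
attempts = {"n": 0}

def contended_update():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("lost the race on the sorted set")
    return "ok"

result = retry_with_jitter(contended_update)
```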

3 previous updates

Aug 07, 2025
1 incident

API is degraded

Degraded

Resolved Aug 7, 2025 at 11:21am UTC

Service is restored. Crawls that appeared "stuck" should now resume.

2 previous updates

Aug 06, 2025
1 incident

API timeouts elevated

Degraded

Resolved Aug 6, 2025 at 12:19pm UTC

The issue is fully resolved. Apologies for the disruption and thank you for your patience.

The issue was caused by a load spike, which triggered a scale-up to a large number of API and worker pods. These pods interface heavily with our Dragonfly (Redis-equivalent) instance via BullMQ for job queueing. The increased connections and requests forced Dragonfly to queue pipeline operations, delaying BullMQ operations, making the system fail and scrape jobs accu...
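The amplification works roughly like this (back-of-envelope sketch, numbers hypothetical): each new API or worker pod opens its own BullMQ connections to Dragonfly, so a load-driven scale-up multiplies the connection count the single instance must serve.

```python
# Hypothetical numbers: each pod opens its own set of BullMQ connections,
# so scaling pods up multiplies concurrent clients on the Dragonfly instance.

def total_connections(pods: int, conns_per_pod: int) -> int:
    return pods * conns_per_pod

baseline = total_connections(pods=20, conns_per_pod=10)   # 200 connections
spike = total_connections(pods=200, conns_per_pod=10)     # 2000 connections
print(spike // baseline)  # 10x more concurrent clients hitting Dragonfly
```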

1 previous update