Previous incidents
Elevated timeouts
Resolved Oct 29, 2025 at 11:48pm UTC
Customers who exceeded their concurrency limit during this outage may encounter unexpected errors in their crawls.
This happens because the outage affected our concurrency-limiting system, causing unique jobs to be inserted into a separate per-team concurrency queue multiple times. When these jobs are promoted from the concurrency queue into the main queue, they keep their original IDs, so the duplicate copies violate the unique ID constraint.
A fix for this issue is now deployed.
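The sketch below is a minimal in-memory model of the failure mode described above. It assumes a per-team concurrency queue that promotes jobs into a main queue enforcing unique job IDs. `ConcurrencyQueue`, `MainQueue`, `DuplicateJobError`, and the `dedupe` flag are hypothetical names. This is not Firecrawl's actual queueing code or the fix that was deployed. With deduplication disabled, inserting the same job twice and then promoting both copies reproduces the unique ID violation. Deduplicating on insert is one way to avoid it.

```python
from collections import deque


class DuplicateJobError(Exception):
    """Hypothetical error: a job ID already exists in the main queue."""


class MainQueue:
    # Hypothetical main queue that enforces a unique job ID constraint.
    def __init__(self):
        self.jobs = {}

    def add(self, job_id, payload):
        if job_id in self.jobs:
            raise DuplicateJobError(job_id)
        self.jobs[job_id] = payload


class ConcurrencyQueue:
    # Hypothetical per-team queue for jobs that exceed the team's concurrency limit.
    def __init__(self, dedupe=True):
        self.pending = deque()
        self.ids = set()
        self.dedupe = dedupe

    def enqueue(self, job_id, payload):
        # With dedupe enabled, a repeated insert of the same job ID is ignored,
        # so a job sits in the concurrency queue at most once.
        if self.dedupe and job_id in self.ids:
            return False
        self.ids.add(job_id)
        self.pending.append((job_id, payload))
        return True

    def promote(self, main_queue, n):
        # Move up to n jobs into the main queue. Jobs keep their original IDs,
        # so any duplicate copy triggers the main queue's uniqueness check.
        promoted = []
        while self.pending and len(promoted) < n:
            job_id, payload = self.pending.popleft()
            self.ids.discard(job_id)
            main_queue.add(job_id, payload)
            promoted.append(job_id)
        return promoted


if __name__ == "__main__":
    q = ConcurrencyQueue(dedupe=False)
    q.enqueue("job-1", {})
    q.enqueue("job-1", {})  # duplicate insert, as during the outage
    try:
        q.promote(MainQueue(), 2)
    except DuplicateJobError as e:
        print("unique ID violation:", e)
```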
Elevated timeouts
Resolved Oct 20, 2025 at 4:49pm UTC
The issue is now resolved. As part of the hotfix, all jobs waiting in the concurrency queue had to be discarded, which caused ongoing crawls to freeze. Please reach out to support if you are affected. We apologize for the disruption and thank you for your patience.
Elevated timeouts
Resolved Oct 19, 2025 at 11:52pm UTC
The issue has been resolved. Thank you for your patience.
Elevated timeouts
Resolved Oct 18, 2025 at 7:54pm UTC
We've shipped a fix and the incident is now resolved. Thank you for your patience.
Elevated timeouts
Resolved Oct 11, 2025 at 10:09pm UTC
The elevated API timeouts have been fully resolved. All services are operating normally. We apologize for the disruption and appreciate your patience.
Elevated timeouts
Resolved Oct 5, 2025 at 2:31pm UTC
We have identified and pushed a hotfix for the root cause. We are working on a permanent fix.
Elevated timeouts
Resolved Oct 2, 2025 at 3:59pm UTC
The issue is now resolved. We apologize for the inconvenience. The maintenance we had previously scheduled for Sunday will address the root cause of this downtime.
API is degraded
Resolved Oct 1, 2025 at 5:20am UTC
Resolved
API is degraded
Resolved Sep 19, 2025 at 5:55pm UTC
The incident has been resolved.
API is degraded
Resolved Sep 4, 2025 at 1:32pm UTC
We are back!
API is degraded
Resolved Sep 2, 2025 at 8:58pm UTC
The issue was resolved.
API is degraded
Resolved Aug 27, 2025 at 5:18pm UTC
The system has recovered. We have escalated the root cause to our upstream provider.
API is degraded
Resolved Aug 26, 2025 at 3:03pm UTC
We are back.
Dashboard is partially unavailable or slower than usual
Resolved Aug 21, 2025 at 7:57pm UTC
Our upstream provider has resolved the issue.
firecrawl.dev is down
Resolved Aug 18, 2025 at 11:59pm UTC
firecrawl.dev recovered.
API is degraded
Resolved Aug 14, 2025 at 9:26pm UTC
We are back. Job timeout metrics have recovered to pre-incident levels.
The issue came down to misconfigured pipeline queue limits on Dragonfly. We had set them high in anticipation of heavy production load, but they turned out to be far too high. As a result, Dragonfly's backpressure mechanisms kicked in much too late, only once the instance was already practically unsalvageable. We have tuned the configuration and will continue to monitor it.
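To illustrate why an overly high limit delays backpressure, here is a toy model. It assumes a FIFO queue of pipeline operations drained at a constant rate, so an operation queued behind `depth` others waits roughly `depth / drain_rate` ticks. `PipelineQueue` and `ticks_until_backpressure` are hypothetical. They do not model Dragonfly's internals or any of its real configuration options, and the numbers are arbitrary.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class PipelineQueue:
    """Hypothetical bounded queue of pending pipeline operations."""
    limit: int
    depth: int = 0

    def try_enqueue(self) -> bool:
        # Backpressure: refuse new work once the queue is full,
        # so callers have to slow down or retry.
        if self.depth >= self.limit:
            return False
        self.depth += 1
        return True

    def drain(self, n: int) -> int:
        # Process up to n queued operations in this tick.
        done = min(n, self.depth)
        self.depth -= done
        return done


def ticks_until_backpressure(
    limit: int, arrivals: int, drain_rate: int, max_ticks: int = 1_000_000
) -> Optional[Tuple[int, int]]:
    """Simulate overload (arrivals > drain_rate per tick) and return
    (tick, queue depth) at the moment the first operation is refused."""
    q = PipelineQueue(limit)
    for tick in range(1, max_ticks + 1):
        for _ in range(arrivals):
            if not q.try_enqueue():
                return tick, q.depth
        q.drain(drain_rate)
    return None


if __name__ == "__main__":
    drain_rate = 100
    for limit in (100, 100_000):
        tick, depth = ticks_until_backpressure(limit, arrivals=120, drain_rate=drain_rate)
        # The newest queued op waits about depth / drain_rate ticks.
        print(f"limit={limit}: backpressure at tick {tick}, depth {depth}, "
              f"newest op waits ~{depth / drain_rate:.0f} ticks")
```

With these toy numbers, the small limit pushes back during the first tick with about one tick of queueing delay. The very large limit lets the queue grow for roughly 5,000 ticks, so by the time clients are refused, newly queued operations already face a delay of about 1,000 ticks.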
API is degraded
Resolved Aug 13, 2025 at 11:37pm UTC
The API has recovered.
We quickly restarted the workers to get them unstuck. After that, we revisited the core issue: many jobs were finishing at the same time, hitting the same Redis sorted set simultaneously, and running into the same race conditions. To break up these bursts, we now wait for a random delay before retrying whenever the anti-race mechanism activates. This spreads the workers out nicely.
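The sketch below shows randomized retry delays ("jitter"), assuming the anti-race mechanism signals a lost race by raising an exception. `RaceDetected`, `run_with_jittered_retry`, and the default `max_attempts`/`max_delay_s` values are hypothetical. The actual delay bounds and retry logic used in the fix are not described in this update.

```python
import random
import time
from typing import Callable, Optional


class RaceDetected(Exception):
    """Hypothetical signal that the anti-race check failed
    (e.g. another worker modified the sorted set first)."""


def run_with_jittered_retry(
    op: Callable[[], object],
    max_attempts: int = 10,
    max_delay_s: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
    rng: Optional[random.Random] = None,
):
    """Call op(). Whenever it raises RaceDetected, wait a random delay
    between 0 and max_delay_s before retrying, so workers that collided
    at the same moment retry at different moments."""
    rng = rng or random.Random()
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except RaceDetected:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            sleep(rng.uniform(0, max_delay_s))


if __name__ == "__main__":
    attempts = {"n": 0}

    def flaky_update():
        # Simulated sorted-set update that loses the race twice.
        attempts["n"] += 1
        if attempts["n"] < 3:
            raise RaceDetected()
        return "updated"

    delays = []
    result = run_with_jittered_retry(flaky_update, sleep=delays.append, rng=random.Random(42))
    print(result, attempts["n"], [round(d, 4) for d in delays])
```

Each colliding worker draws its own delay, so their retries no longer land at the same instant. Injecting `sleep` and `rng` keeps the function easy to test without real waiting.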
After applying the fix and observ...
API is degraded
Resolved Aug 7, 2025 at 11:21am UTC
Service is restored. Crawls that appeared "stuck" should now resume.
API timeouts elevated
Resolved Aug 6, 2025 at 12:19pm UTC
The issue is fully resolved. Apologies for the disruption and thank you for your patience.
The issue was caused by a load spike that triggered a scale-up to a large number of API and Worker pods. These pods interact heavily with our Dragonfly (Redis-equivalent) instance through BullMQ for job queueing. The additional connections and requests forced Dragonfly to start queueing pipeline operations, which delayed BullMQ operations, making the system fail and scrape jobs accu...