Previous incidents
Elevated timeouts
Resolved Oct 20 at 09:49am PDT
The issue is now resolved. As part of the hotfix, all concurrency-queued jobs had to be discarded, which caused in-progress crawls to freeze -- please reach out to support if you are affected. We apologize for the disruption, and thank you for your patience.
Elevated timeouts
Resolved Oct 19 at 04:52pm PDT
The issue has been resolved. Thank you for your patience.
Elevated timeouts
Resolved Oct 18 at 12:54pm PDT
We've shipped a fix and the incident is now resolved. Thank you for your patience.
Elevated timeouts
Resolved Oct 11 at 03:09pm PDT
The elevated API timeouts have been fully resolved. All services are operating normally. We apologize for the disruption and appreciate your patience.
Elevated timeouts
Resolved Oct 05 at 07:31am PDT
We have identified and pushed a hotfix for the root cause. We are working on a permanent fix.
Elevated timeouts
Resolved Oct 02 at 08:59am PDT
The issue is now resolved. We apologize for the inconvenience. The maintenance we had previously scheduled for Sunday will address the root cause of this downtime.
API is degraded
Resolved Sep 30 at 10:20pm PDT
Resolved
API is degraded
Resolved Sep 19 at 10:55am PDT
The incident has been resolved.
API is degraded
Resolved Sep 10 at 04:37am PDT
The issue is resolved. We switched the main Firecrawl service to our new queue system last week; however, Fire-Engine remained on the previous queue system and experienced a failure similar to what Firecrawl was experiencing before the switch. We are working on switching Fire-Engine over to the new system as well.
API is degraded
Resolved Sep 04 at 06:32am PDT
We are back!
API is degraded
Resolved Sep 02 at 01:58pm PDT
The issue was resolved.
API is degraded
Resolved Aug 27 at 10:18am PDT
The system has recovered. We have escalated the root cause to our upstream provider.
API is degraded
Resolved Aug 26 at 08:03am PDT
We are back.
Dashboard is partially unavailable or slower than usual
Resolved Aug 21 at 12:57pm PDT
Upstream provider has resolved the issue.
firecrawl.dev is down
Resolved Aug 18 at 04:59pm PDT
firecrawl.dev recovered.
API is degraded
Resolved Aug 14 at 02:26pm PDT
We are back. Job timeout metrics have recovered to pre-incident levels.
The issue came down to misconfigured pipeline queue limits on Dragonfly -- we configured them with high values expecting heavy production load, but they ended up being far higher than needed. This caused Dragonfly's backpressure mechanisms to kick in far too late, only once the instance was already practically unsalvageable. The configuration has been tuned, and we will continue to monitor this.
API is degraded
Resolved Aug 13 at 04:37pm PDT
The API has recovered.
We quickly restarted the workers to get them unstuck. After that, we revisited the core issue: many jobs were finishing at the same time, hammering the same sorted set in Redis simultaneously and running into the same race conditions. We decided to break up these clumps of requests by applying a delay of random length before retrying each time the anti-race mechanism activates. This spreads the workers out nicely.
After applying the fix and observ...
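As an illustration, here is a minimal TypeScript sketch of the randomized retry delay described above; the function name, attempt limit, and delay bound are assumptions for the example, not Firecrawl's actual code.

```ts
// Sketch: retry with a randomized delay so that workers contending on the same
// Redis sorted set do not all retry in lockstep and re-trigger the same races.
const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Wraps an operation and retries it after a random wait whenever it fails
// (here, a thrown error stands in for the anti-race mechanism activating).
async function retryWithJitter<T>(
  operation: () => Promise<T>,
  maxAttempts = 5,
  maxDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation();
    } catch (err) {
      lastError = err;
      // A random delay in [0, maxDelayMs) spreads clumped workers apart.
      await sleep(Math.random() * maxDelayMs);
    }
  }
  throw lastError;
}
```

The randomization is the key point: a fixed retry interval would keep the clump of workers synchronized and hitting the sorted set together on every attempt.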
API is degraded
Resolved Aug 07 at 04:21am PDT
Service is restored. Crawls that appeared "stuck" should now resume.
API timeouts elevated
Resolved Aug 06 at 05:19am PDT
The issue is fully resolved. Apologies for the disruption and thank you for your patience.
The issue was caused by a load spike, which triggered a scale-up to a large number of API and Worker pods. These pods interface heavily with our Dragonfly (Redis-equivalent) instance via BullMQ for job queueing. The increased connections and requests to Dragonfly caused it to start queueing pipeline operations, which delayed BullMQ operations, making the system fail and scrape jobs accu...
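For context, a minimal TypeScript sketch of the kind of BullMQ-on-Dragonfly setup described above, where API pods enqueue scrape jobs and Worker pods consume them over Redis-protocol connections; the queue name, environment variable, and processor body are assumptions for illustration, not Firecrawl's actual code.

```ts
import { Queue, Worker } from "bullmq";
import IORedis from "ioredis";

// One Redis-protocol connection to Dragonfly per process; every additional API
// or Worker pod adds more connections and pipelined commands on the instance.
const connection = new IORedis(process.env.DRAGONFLY_URL ?? "redis://localhost:6379", {
  maxRetriesPerRequest: null, // BullMQ requires this to be null for worker connections
});

// API pods enqueue scrape jobs (queue name "scrape" is assumed for the example).
const scrapeQueue = new Queue("scrape", { connection });

export async function enqueueScrape(url: string) {
  await scrapeQueue.add("scrape", { url });
}

// Worker pods process jobs; `concurrency` bounds in-flight jobs per pod.
const scrapeWorker = new Worker(
  "scrape",
  async (job) => {
    // ... fetch and scrape job.data.url ...
    return { url: job.data.url, status: "done" };
  },
  { connection, concurrency: 10 },
);
```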