Fortnite's Playground Mode Disaster: Here's What Went Wrong
Fortnite's recent Playground mode was one of the most popular limited-time modes ever introduced for the game. It was so popular, in fact, that its launch brought Fortnite's matchmaking servers to their knees, sending the game into emergency maintenance shortly thereafter. Playground mode returned to Fortnite a few days later, and now Epic has delivered an update on just what went wrong with that launch.
In a postmortem published to Epic's website last night, the company explains why, exactly, Playground mode brought Fortnite's matchmaking down. The outage came as a surprise to many. After all, Fortnite is one of the most popular games in the world, so obviously Epic's servers are prepared to handle a large number of players, right? As it turns out, Fortnite actually had the server capacity to handle all of the people looking to create Playground matches – where it failed was in the matchmaking service itself.
Epic explains that Fortnite's matchmaking service (MMS) relies on nodes that put players in open servers that match their requested regions. Given Fortnite's battle royale formula, each server can typically hold 100 players, but Playground mode was limited to just four at a time, meaning that 25 to 100 times as many matches had to be created for Playground mode as for a standard mode. That, by extension, meant that each node now had to manage a server list that was 15 times longer than it normally would be, increasing compute times dramatically.
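To make those multipliers concrete, here's a rough back-of-the-envelope sketch in Python. The 100-player and four-player figures come from the article; the player count and the function names are made up for illustration, and the high end of the range assumes solo players each getting a match of their own.

```python
import math

BATTLE_ROYALE_CAPACITY = 100  # players per standard match (from the article)
PLAYGROUND_MAX_PARTY = 4      # players per Playground match (from the article)


def matches_needed(players: int, players_per_match: int) -> int:
    """Number of matches required to seat every player."""
    return math.ceil(players / players_per_match)


def match_multiplier(party_size: int) -> float:
    """How many Playground matches replace one full 100-player match."""
    return BATTLE_ROYALE_CAPACITY / party_size


if __name__ == "__main__":
    players = 1_000_000  # hypothetical figure, not a number from Epic
    print(matches_needed(players, BATTLE_ROYALE_CAPACITY))  # standard matches
    print(matches_needed(players, PLAYGROUND_MAX_PARTY))    # full four-player parties
    print(match_multiplier(PLAYGROUND_MAX_PARTY))           # low end of the range
    print(match_multiplier(1))                              # high end: solo players
```

A full party of four works out to 25 times as many matches, and a lobby of solo players to 100 times as many, which lines up with the range quoted above.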
The nodes essentially became overwhelmed searching these longer-than-usual server lists for somewhere to place queuing players, eventually looking for free servers on lists that belonged to other nodes. That was enough to create a backlog of queue requests, which Epic says resulted in "a feedback loop that eventually caused the system to grind to a halt."
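For a sense of why longer lists hurt so much, here's a minimal sketch of that lookup pattern. It is not Epic's code: the `Server` and `Node` classes, the linear scan, and the `place_party` helper are all assumptions made for illustration. It just counts how many list entries get inspected when a node fails locally and has to go looking through other nodes' lists.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    region: str
    capacity: int
    players: int = 0

    def has_room(self, party_size: int) -> bool:
        return self.players + party_size <= self.capacity


@dataclass
class Node:
    servers: list = field(default_factory=list)

    def find_local(self, region: str, party_size: int):
        """Scan this node's list in order; return (server, entries inspected)."""
        for inspected, server in enumerate(self.servers, start=1):
            if server.region == region and server.has_room(party_size):
                return server, inspected
        return None, len(self.servers)


def place_party(home: Node, all_nodes: list, region: str, party_size: int):
    """Try the home node first, then fall back to every other node's list."""
    server, cost = home.find_local(region, party_size)
    if server is None:
        for other in all_nodes:
            if other is home:
                continue
            server, extra = other.find_local(region, party_size)
            cost += extra  # each remote lookup adds to the work per request
            if server is not None:
                break
    if server is not None:
        server.players += party_size
    return server, cost
```

In this toy model, the cost of a single request grows with the length of every list it touches, so a list that is many times longer makes each miss that much more expensive, and requests pile up behind it.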
To fix the problem, Epic says that it split Playground matchmaking into its own service cluster so that its problems wouldn't affect matchmaking for other modes. From there, Epic says that "the solution was to give the cluster the ability to bulk rebalance sessions from other nodes to ensure repeated lookups were not necessary." Epic had to spend a lot of time testing this solution in cycles that took several hours to complete each time, but it says that testing ensured Playground mode – or any of Fortnite's various modes, for that matter – would be able to handle a rush of players in the future.
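Epic doesn't publish the implementation, but the idea of a bulk rebalance can be sketched roughly as follows. Everything here is hypothetical: sessions are plain dictionaries, each node's list is a Python list, and `bulk_rebalance` is a name invented for this example. The point is only that a node that has run dry pulls a whole batch of open sessions from another node in one step, rather than querying that node again for every incoming request.

```python
def bulk_rebalance(starved: list, donor: list, region: str, batch_size: int) -> int:
    """Move up to batch_size open sessions for a region from donor to starved.

    Returns the number of sessions moved. Both lists are modified in place.
    """
    moved, kept = [], []
    for session in donor:
        wanted = session["region"] == region and session["open"]
        if wanted and len(moved) < batch_size:
            moved.append(session)
        else:
            kept.append(session)
    donor[:] = kept         # the donor gives up the moved sessions
    starved.extend(moved)   # later requests can now be served locally
    return len(moved)


if __name__ == "__main__":
    donor = [{"id": i, "region": "EU", "open": True} for i in range(5)]
    donor.append({"id": 5, "region": "NA", "open": True})
    starved = []
    print(bulk_rebalance(starved, donor, "EU", 3), len(starved), len(donor))
```

After one call like this, the starved node has its own supply of open sessions again, so the next several requests don't need any cross-node lookups at all.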
In the end, Epic says that this was a "solid reminder that complex distributed systems fail in unpredictable ways." The good news is that the work Epic had to do to bring Playground mode back online will continue to benefit the game as it grows further, and the hope is that we won't see another failure like this. Epic's write-up gives an excellent (and rare) behind-the-scenes look at some of the problems that can plague developers, so be sure to give the entire thing a read if you've got a few spare moments.