December 2019 Stress Test Review
December 9th, 2019 at 11:32 am
What happened during Fractured’s first explosive* open stress test? Let’s find out! (*servers exploded)
Hi, fellow gamer and MMO enthusiast!
Our stress test weekend has ended, and it’s been a great… mess! Not uncommon for a stress test, but perhaps a little worse than expected 😉
When we announced it, we asked ourselves the following questions:
- Client patcher – will downloads stay as fast as they are now?
- Login server – will there be queues?
- Server workers – will the processes that simulate the world server-side be able to handle a lot of concurrent users?
- Game world – will the map get too busy or will it become too hard to find a free plot of land?
We now have answers to all but the last question, so let’s go through them!
Workers were stable
Workers are the Unity processes that simulate the world on the server. Each worker controls a portion of our open world, and thanks to SpatialOS, which stitches the workers together, players notice nothing when crossing from one worker’s area into another’s.
During the stress test, workers showed no sign of overload. Workers controlling areas with new player spawn points and workers controlling hotspots with intense PvE action both ran smoothly. This is great news gameplay-wise, because it shows we can already handle large PvE/PvP battles, and financially, because it means we need less powerful servers.
Interest was high
The Open Test Weekend revealed a level of interest in Fractured that surprised us, and that greatly contributed to the demise of the server 😉
Over 4,000 new accounts were created from the day the test was announced (Wednesday) until it was closed (Saturday). Almost 13,000 characters were created during the test (belonging to ~9,000 distinct accounts).
The moment we reopened the servers after a patch, dozens of users would try to log in at the same time (RIP login server). This was exciting to watch, and we look forward to seeing what happens during our next stress test!
The real trouble started soon after we opened on Friday, and we immediately made it public on our forums. Once a few hundred concurrent users are connected and actively playing, something goes wild (<= getting very technical there) in a component of our backend engine, SpatialOS. This component is called the “runtime”, and it is in charge of handling persistent data and communication between workers (both clients and servers).
Over the weekend, we tried to solve the issue by going after a few possible culprits. We dropped a quick patch on Friday evening, another one deep into Friday night, and the final one on Saturday afternoon. Since the issue was internal to the engine, however, we were shooting in the dark, and guess what? None of our attempts nailed it. It was clear we had to wait until Monday for the SpatialOS engineers to help us figure it out.
On Saturday night, we faced a choice: keep the stress test running with a strict cap on concurrent users and endless login queues, or end it early and run a new stress test in the near future. We picked the latter.
Inadequate web infrastructure
As of today, our login server and web API are the most “alpha” components of Fractured. They’re deployed on a very weak machine and lack basic functionality such as login queues.
We didn’t expect the stress test to put much pressure on our web infrastructure, but it did, and very visibly. At peak times, it took a while just to log in and reach character selection. The cap on concurrent users we had to put in place due to the runtime issue made things worse, increasing the number of players left waiting on the login server.
The good news here is that we just have to move these services to a proper scalable host and expand them a little – a relatively quick job we’ll get done before the next stress test.
Next up: a new stress test open to all registered users! This one will perhaps start early in the week, so we can work with the SpatialOS engineers right away if something goes wrong.
Before that, we are also considering a stress test open to all backers, meaning Beta2/Beta1 pledges but not free accounts. The number of concurrent users during such a test would be much smaller, but hopefully large enough to reveal whether the issue is resolved before we move to the large-scale test.
When that large-scale test will happen is impossible to tell. We’d love to launch it before Christmas, since we know many of you are eager to test more and some didn’t manage to log in at all. However, we can’t make any meaningful prediction of how long it will take to sort the bug out. If not December, it will be January, so stay tuned and…