Hosting ParadiseSS13
With the old host stepping down around April 2021, I offered to take up the mantle. Hosting a game server system for thousands of players around the world is not easy, but I took it on anyway, because no one else was able to.
To summarise the infrastructure, we have:
- 18 servers, a mix of physical and virtual machines, spanning 10 datacentres across 3 continents (North America, Europe and Australia) and 4 providers (ReliableSite, Oracle Cloud, Hetzner, OVH)
- A mix of custom-written and industry-standard software, including but not limited to:
- Proxmox VE Hypervisor
- pfSense Router
- MariaDB
- Redis
- Apache webserver
- MediaWiki
- Invision Community
- tgstation-server
- The full Elasticsearch/Logstash/Kibana stack
- GitLab
- PRTG Network Monitor (network map)
- HAproxy
The main hypervisor is located in New York to aid connectivity for North America, and runs most of the core systems across several VMs. These are:
- The router VM (pfSense)
  - Provides a firewall, NAT for the rest of the VMs, and an OpenVPN server for management tasks to remove the need to have management ports exposed.
- The core VM (Ubuntu Server)
  - Runs the database, Redis server, and several pieces of custom-written internal tooling, and is the centralised data hub for the rest of the infrastructure.
- The webserver VM (Ubuntu Server)
  - Runs the webserver, and that's it.
  - This is its own VM for security reasons, as webapps are a large attack vector.
- The game VM (Windows Server Core)
  - Runs the gameserver, and that's it.
  - This is in its own VM due to a Windows requirement, and to keep it separated for maximum performance.
- The analytics VM (Ubuntu Server)
  - Runs the Elastic stack for analysis of logs and metrics.
  - This is in its own VM so I can delegate control to someone with Elastic certifications, and put a hard limit on Elasticsearch storage.
- The GitLab VM (Ubuntu Server)
  - Runs GitLab, and that's it.
  - This is its own VM for security reasons relating to the GitLab runner, and for resource confinement, as GitLab likes to consume a lot of resources.
- The monitoring VM (Windows Server Core)
  - Runs the PRTG web service and PRTG probe.
  - This is in its own VM due to a Windows requirement, and to avoid bogging down the game VM.
- The stats frontend VM (Ubuntu Server)
  - Runs a custom stats page for game stats.
  - This is in its own VM so I can delegate it to the stats page developer, making Python module management and CD easier.
The other main server here is the offsite backup, located in Germany for maximum geo-resilience. It takes daily snapshots of core directories (SQL backups, game logs, webserver files), allowing for point-in-time history. The backups are done on a pull basis rather than a push basis: the backup server has read access to the servers it backs up from, rather than those servers having write access to the backup server. This way, if a primary server is compromised, whether over the network or via an attack on the provider itself (the backup server is with a separate provider), there is no way to wipe out the backup data, as the compromised machine cannot even see it, let alone perform write or delete operations on it.
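As a rough illustration of the pull model, here is a minimal sketch of a daily snapshot job that could run from the backup server's cron, assuming rsync over SSH with hardlinked snapshots for history. The hostnames, paths, and the read-only backup user are hypothetical, not the production setup:

```python
#!/usr/bin/env python3
"""Pull-based daily snapshots, run from the backup server (sketch only)."""
import datetime
import pathlib
import subprocess

# Hypothetical sources; the backup server holds a read-only SSH key for
# each, so the source machines never gain any access to the backup host.
SOURCES = {
    "sql":  "backup-ro@core.example.org:/srv/backups/sql/",
    "logs": "backup-ro@game.example.org:/srv/game/logs/",
    "web":  "backup-ro@web.example.org:/var/www/",
}
BACKUP_ROOT = pathlib.Path("/srv/offsite-backups")

def snapshot(name: str, remote: str) -> None:
    dest = BACKUP_ROOT / name / datetime.date.today().isoformat()
    latest = BACKUP_ROOT / name / "latest"
    dest.parent.mkdir(parents=True, exist_ok=True)
    cmd = ["rsync", "-a", "--delete"]
    # Hardlink unchanged files against the previous snapshot, so every
    # daily directory is a full tree but only changed files use new space.
    if latest.exists():
        cmd.append(f"--link-dest={latest.resolve()}")
    cmd += [remote, str(dest)]
    subprocess.run(cmd, check=True)
    # Repoint "latest" at the snapshot we just took.
    if latest.is_symlink():
        latest.unlink()
    latest.symlink_to(dest)

if __name__ == "__main__":
    for name, remote in SOURCES.items():
        snapshot(name, remote)
```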
The remaining servers in each region are proxy nodes. As well as the main server in New York taking incoming connections directly, there are relays situated in the following regions:
- US-West (California)
- US-East (Virginia)
- UK (London)
- EU-West (France)
- EU-Central (Poland)
- Australia (Sydney)
This relay system could be accomplished with existing cloud PaaS offerings such as AWS Global Accelerator; however, that would cost ~$400 USD a month with the traffic we shift, while this VM solution costs <$30 USD a month.
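Conceptually, each relay is just a dumb TCP forwarder pointed at the main server in New York (presumably the role the HAProxy instances fill here). As a minimal sketch of the idea, not the production configuration, here is an asyncio-based forwarder; the upstream host and ports are hypothetical:

```python
import asyncio

UPSTREAM_HOST = "game.example.org"  # hypothetical main server address
UPSTREAM_PORT = 6666                # hypothetical game port
LISTEN_PORT = 6666

async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    # Copy bytes in one direction until EOF, then close the write side
    # so the peer sees the disconnect too.
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()

async def handle_client(client_reader, client_writer):
    # Splice the client's connection through to the main game server.
    upstream_reader, upstream_writer = await asyncio.open_connection(
        UPSTREAM_HOST, UPSTREAM_PORT)
    await asyncio.gather(
        pipe(client_reader, upstream_writer),
        pipe(upstream_reader, client_writer),
        return_exceptions=True)

async def main():
    server = await asyncio.start_server(handle_client, "0.0.0.0", LISTEN_PORT)
    async with server:
        await server.serve_forever()

if __name__ == "__main__":
    asyncio.run(main())
```

Players connect to whichever relay is closest to them, and the relay carries their traffic on to the main server in New York.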