Scaling Dedicated Game Servers with Kubernetes, Part 3: Scaling Up Nodes


This is part three of a multi-part series on scaling game servers with Kubernetes.



In the previous two posts we looked at hosting dedicated game servers on Kubernetes and measuring and limiting their memory and CPU resources. In this instalment we look at how we can use the CPU information from the previous post to determine when we need to scale up our Kubernetes cluster because we've run out of room for more game servers as our player base increases.



Separate Apps from Game Servers



Before we start writing code to increase the size of our Kubernetes cluster, the first thing we should do is separate our applications, such as the matchmaker, the game server controller, and the soon-to-be-written node scaler, onto nodes in the cluster other than the ones where the game servers will run.



This has many benefits:



1. The resource usage of the applications no longer has an effect on the game servers. Even if the matchmaker has a CPU spike, there is an extra boundary preventing it from affecting a dedicated game server in play.
2. It makes scaling capacity for dedicated game servers up and down easier, since we only need to look at game server usage across a specific set of nodes, rather than all potential containers across the entire cluster.
3. We can use larger machines with more CPU cores and memory for the game server nodes, and smaller machines with fewer cores and less memory for the controller applications, since they need fewer resources. This lets us pick the right size machine for the job, which gives us great flexibility while still being cost effective.



Kubernetes makes it easy to set up a heterogeneous cluster, and gives us the tools to specify where Pods are scheduled within it.



It's worth noting that there is also a more sophisticated Node Affinity feature in beta, but we don't need it for this example, so we'll ignore its extra complexity for now.



To get started, we need to assign labels (sets of key-value pairs) to the nodes in our cluster. This is exactly the same as you will have seen if you've ever created Pods with Deployments and exposed them with Services, but applied to nodes instead. In Google Cloud Platform's Container Engine, Node Pools apply labels to nodes as they are created and make it easy to set up heterogeneous clusters. You can do the same on other cloud providers, as well as directly through the Kubernetes API and the command line client (for example, kubectl label nodes <node-name> role=apps).



In this example, I added the labels role:apps and role:game-server to the appropriate nodes in my cluster. To control which nodes within the cluster Pods will be scheduled onto, we can then add a nodeSelector to our Kubernetes configurations.



As an example, for the matchmaker app we set the nodeSelector to role: apps to ensure its container instances are created only on the application nodes (those tagged with the "apps" role).
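Here is a minimal sketch of what that Deployment could look like; the name, image, and replica count are illustrative, not the exact configuration from the sample project:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: matchmaker
spec:
  replicas: 2                  # illustrative replica count
  selector:
    matchLabels:
      app: matchmaker
  template:
    metadata:
      labels:
        app: matchmaker
    spec:
      nodeSelector:
        role: apps             # only schedule onto nodes labelled role=apps
      containers:
      - name: matchmaker
        image: gcr.io/example-project/matchmaker:latest   # hypothetical image
```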



In the same way, we can adjust the configuration from the previous article so that all the dedicated game server Pods are scheduled only onto the machines we have specifically designated for them: those tagged with role:game-server.
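That only requires the matching nodeSelector in the game server Pod specification. A rough sketch, with an illustrative image and CPU limit:

```yaml
apiVersion: v1
kind: Pod
metadata:
  generateName: game-server-
spec:
  nodeSelector:
    role: game-server          # only schedule onto the dedicated game server nodes
  restartPolicy: Never
  containers:
  - name: game-server
    image: gcr.io/example-project/game-server:latest      # hypothetical image
    resources:
      limits:
        cpu: "0.1"             # illustrative CPU limit, as discussed in part two
```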



Note that in my sample code, I use the Kubernetes API to provide a configuration identical to the one above, but the yaml version is easier to understand, and it is the format we've been using throughout this series.
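For reference, here is a rough sketch of what creating such a Pod through the Go client can look like; the namespace, image, and helper name are placeholders, not the exact code from the sample project:

```go
package scaler

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// createGameServerPod starts a dedicated game server Pod and pins it to
// the nodes labelled role=game-server via a nodeSelector.
func createGameServerPod(ctx context.Context, clientSet kubernetes.Interface) error {
	pod := &v1.Pod{
		ObjectMeta: metav1.ObjectMeta{GenerateName: "game-server-"},
		Spec: v1.PodSpec{
			NodeSelector:  map[string]string{"role": "game-server"}, // game server nodes only
			RestartPolicy: v1.RestartPolicyNever,
			Containers: []v1.Container{{
				Name:  "game-server",
				Image: "gcr.io/example-project/game-server:latest", // hypothetical image
			}},
		},
	}
	_, err := clientSet.CoreV1().Pods("default").Create(ctx, pod, metav1.CreateOptions{})
	return err
}
```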



A Strategy to Scale Up



Kubernetes on cloud providers tends to come with automated scaling capabilities, such as the Google Cloud Platform Cluster Autoscaler. However, since these are generally built for stateless applications, and our dedicated game servers hold the game simulation in memory, they won't work in this case. That said, with the tools Kubernetes provides, it is not particularly difficult to build your own Kubernetes cluster autoscaler!



Scaling the nodes in a Kubernetes cluster up and down probably makes more sense in a cloud environment, since we only want to pay for the resources we actually need and use. If we were running on our own premises, it might make less sense to change the size of the Kubernetes cluster; we could simply run one large cluster across all of our machines and keep it at a static size, since adding and removing physical machines is far more onerous than on the cloud and wouldn't necessarily save us money, as we lease the machines for longer periods.



There are multiple potential strategies for determining when you want to scale up the number of nodes in your cluster, but for this example we'll keep things relatively simple:



- Determine a minimum and maximum number of nodes for game servers, and make sure we stay within those limits.
- Use CPU resource capacity and usage as the metric for tracking how many dedicated game servers we can fit on a node in our cluster (in this example we're going to assume we always have enough memory).
- Define a buffer of CPU capacity for a set number of game servers to be available in the cluster at all times. That is, add more nodes to the cluster if, at any point, you could not add n game servers to the cluster without running out of CPU resources (a worked example follows this list).
- Whenever a new dedicated game server is started, calculate whether the cluster still has enough CPU capacity to maintain that buffer; if not, add a node.
- As a failsafe, every n seconds also calculate whether we should add a new node to the cluster because the measured CPU capacity has dropped below the buffer.
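As a rough worked example (the numbers here are illustrative, but in line with the demo later in this post): if each dedicated game server is limited to 0.1 CPU and a game server node has 4 CPUs of allocatable capacity, a single node fits roughly 40 game servers. With a buffer of 30 game servers, the node scaler must add a new node as soon as less than 3 CPUs (30 x 0.1) of unallocated capacity remain across the game server nodes.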



Creating a Node Scaler



The node scaler essentially runs an event loop to carry out the strategy outlined above.



Using Go and the Kubernetes Go client library, this is relatively straightforward to implement, as you can see in the Start() function of my node scaler.



Note that I've removed most of the error-handling boilerplate here to keep things clearer, but the original code is available if you need it.



Let's break down what's happening here, for those who aren't as familiar with Go.



- kube.ClientSet() is a small utility function that returns a Kubernetes ClientSet, which gives us access to the Kubernetes API of the cluster we are running in.
- The gameWatcher: Kubernetes provides APIs that let us watch for changes within the cluster. Here that returns a data structure containing a Go channel (essentially a blocking queue), gw.events, which produces a value every time a game server Pod is added to or deleted from the cluster. Look here for the full source of the gameWatcher.
- tick := time.Tick(s.tick) creates a Go channel that produces a value each time a given duration, in this example 10 seconds, has elapsed. Here is the reference for time.Tick if you would like to look at it.
- The main event loop sits under the "// ^^^ MAIN EVENT LOOP HERE ^^^" comment. This block contains a select statement, which blocks until either the gw.events channel or the tick channel (firing every 10 seconds) produces a value, and then executes s.scaleNodes(). In other words, scaleNodes fires whenever a game server is added or removed, or every 10 seconds, whichever happens first (a simplified sketch of this loop follows the list).
- s.scaleNodes() runs the scale node strategy outlined above.
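Here is a minimal, self-contained sketch of that loop, using simplified stand-ins for the gameWatcher and scaleNodes pieces; the real implementations live in the sample project:

```go
package main

import (
	"log"
	"time"
)

// gameWatcher exposes a channel that receives an event whenever a game
// server Pod is added to or deleted from the cluster. The real version
// is fed by the Kubernetes watch API; this stand-in never fires.
type gameWatcher struct {
	events chan struct{}
}

// nodeScaler holds the configuration for the scaling loop.
type nodeScaler struct {
	tick time.Duration // failsafe interval, e.g. 10 * time.Second
}

func (s *nodeScaler) newGameWatcher() *gameWatcher {
	return &gameWatcher{events: make(chan struct{})}
}

// scaleNodes stands in for the strategy described above: check the CPU
// buffer and add nodes if needed.
func (s *nodeScaler) scaleNodes() error {
	log.Println("checking whether the cluster needs more nodes...")
	return nil
}

// Start runs the event loop: scale whenever a game server Pod is added
// or removed, and every s.tick as a failsafe.
func (s *nodeScaler) Start() {
	gw := s.newGameWatcher()
	tick := time.Tick(s.tick)

	for {
		select {
		case <-gw.events: // a game server Pod was added or deleted
		case <-tick: // the failsafe timer fired
		}
		if err := s.scaleNodes(); err != nil {
			log.Printf("error scaling nodes: %v", err)
		}
	}
}

func main() {
	s := &nodeScaler{tick: 10 * time.Second}
	s.Start()
}
```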



Within s.scaleNodes() we query both the CPU limits we set on each Pod and the total CPU available on each Kubernetes node in the cluster, through the Kubernetes API. The Rest API and Go client let us see the CPU limits configured for each Pod, which means we can track the CPU taken up by each of our game servers, as well as by any Kubernetes management Pods. Through the Node specification, the Go client can also track how much CPU capacity is available on each node. From there it is a case of summing up the CPU used by Pods, subtracting it from the capacity of each node, and then determining whether one or more nodes need to be added to the cluster, such that we can maintain that buffer space for new game servers to be created in.
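As an illustration, a simplified version of that calculation could look like the following, using the Kubernetes API types; the function name and parameters are my own, not the exact code from the repository:

```go
package scaler

import (
	v1 "k8s.io/api/core/v1"
)

// nodesToAdd sketches the buffer calculation: given the game server
// nodes, the Pods scheduled onto them, the CPU limit of a single
// dedicated game server, the buffer (in game servers) we want to keep
// free, and the CPU capacity a new node would add, it returns how many
// extra nodes are needed. All CPU values are in millicores.
func nodesToAdd(nodes []v1.Node, pods []v1.Pod, cpuPerServer, buffer, cpuPerNode int64) int64 {
	// sum the allocatable CPU capacity across the game server nodes
	var capacity int64
	for _, n := range nodes {
		capacity += n.Status.Allocatable.Cpu().MilliValue()
	}

	// sum the CPU limits of every Pod on those nodes: game servers
	// plus any Kubernetes management Pods
	var used int64
	for _, p := range pods {
		for _, c := range p.Spec.Containers {
			used += c.Resources.Limits.Cpu().MilliValue()
		}
	}

	// compare what we want to keep free with what is actually free
	want := buffer * cpuPerServer
	available := capacity - used
	if available >= want {
		return 0
	}

	// round up to enough whole nodes to restore the buffer
	shortfall := want - available
	return (shortfall + cpuPerNode - 1) / cpuPerNode
}
```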



If you look at the code in this example, you'll see that we're using the Google Cloud Platform APIs to add nodes to the cluster: the APIs provided for Google Compute Engine Managed Instance Groups let us add (and remove) instances from the Nodepool in the Kubernetes cluster. That being said, any cloud provider will have similar APIs that let you do the same thing, and here you can see the interface we've defined to abstract this implementation detail, so it could easily be modified to work with another provider.
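The shape of such an abstraction could look roughly like this; the interface and method names here are illustrative, not the exact definitions from the repository:

```go
package scaler

// NodePool abstracts the cloud-specific operations for changing the
// number of game server nodes, so the Google Compute Engine backed
// implementation could be swapped for another cloud provider.
type NodePool interface {
	// IncreaseToSize grows the node pool so that it contains at least
	// size nodes. It should be a no-op if the pool is already that big.
	IncreaseToSize(size int64) error
}
```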



Deploying the Node Scaler



The deployment yaml for the node scaler uses environment variables to set all of its configuration options, including:



- which nodes in the cluster should be managed
- how much CPU each dedicated game server needs
- the minimum and maximum number of nodes
- how much buffer should be available at all times
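A sketch of what such a Deployment could look like; the image, labels, and environment variable names are illustrative, so check the repository for the real file:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nodescaler
spec:
  replicas: 1                 # only ever one node scaler at a time
  strategy:
    type: Recreate            # stop the old Pod before starting a new one
  selector:
    matchLabels:
      app: nodescaler
  template:
    metadata:
      labels:
        app: nodescaler
    spec:
      nodeSelector:
        role: apps            # run on the application nodes
      containers:
      - name: nodescaler
        image: gcr.io/example-project/nodescaler:latest   # hypothetical image
        env:
        - name: NODE_SELECTOR          # which nodes to manage
          value: "role=game-server"
        - name: CPU_PER_SERVER         # CPU needed per dedicated game server
          value: "0.1"
        - name: BUFFER_COUNT           # game servers' worth of headroom to keep
          value: "30"
        - name: MIN_NODE_COUNT
          value: "1"
        - name: MAX_NODE_COUNT
          value: "15"
```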



You may have noticed that the deployment is set to have only a single replica of the node scaler. We did this because we always want only one instance of the node scaler active in our Kubernetes cluster at any given point in time. This ensures that we do not have more than one process attempting to scale our nodes up (and eventually down) within the cluster, which could definitely lead to race conditions and likely cause all kinds of weirdness.



Similarly, to make sure the node scaler is properly shut down before a new instance is created when we update it, we configure strategy.type: Recreate, so that Kubernetes destroys the currently running node scaler Pod before recreating the newer version, again avoiding any potential race conditions.



Seeing It in Action



Once we have deployed the node scaler, let's tail its logs and see it in action. In the video below, we see via the logs that when we have one node in the cluster assigned to game servers, we have the capacity to potentially start forty dedicated game servers, and have configured a buffer of 30 dedicated game servers. As we fill the available CPU capacity with running dedicated game servers via the matchmaker, pay attention to how the number of game servers that can be created in the remaining space drops, and eventually a new node is added to maintain the buffer!



What makes Kubernetes so exciting to me is that we can do all of this without having to build much of the foundation ourselves. While we touched on the Kubernetes client in the first post in this series, in this post we've really started to take advantage of it. This is where I feel the true power of Kubernetes lies: an integrated set of tools for running software over a large cluster, that you have a huge amount of control over. In this instance, we haven't had to write code to spin up and spin down dedicated game servers in very specific ways; we could just leverage Pods. And the Watch APIs let us react to events within the Kubernetes cluster as they happen. It's amazing how much Kubernetes gives you out of the box that many of us have built ourselves over the years.



That all being said, scaling up nodes and game servers in our cluster is the comparatively easy part; scaling down is a trickier proposition. We'll need to make sure nodes don't have game servers running on them before shutting them down, while also ensuring that game servers don't end up widely fragmented across the cluster. In the next post in this series we'll look at how Kubernetes can help with these problems too!



As with previous posts, I welcome questions and comments, and you can also reach me via Twitter. You can view my presentation at GDC, as well as the code on GitHub, which is still being actively developed!