Competing in the gaming industry is becoming increasingly costly. To attract new players, companies spend millions of dollars on user acquisition (UA), development and cloud infrastructure. As the space becomes more expensive, revenue and cost optimization can become a matter of survival.
Server outages, game response lag or generally poor performance can cost gaming companies billions of dollars in market capitalization. The value of observability in modern games lies in ensuring that revenue opportunities are not missed in a costly operating landscape. Observability helps protect the investment in user acquisition and maximize the gaming user experience (UX). That experience depends not only on how the game’s UI performs but also on the performance and reliability of the entire gaming environment.
A modern gaming environment is usually highly distributed—a lot of pieces have to come together. The engineering team develops code for the gaming application, which can consist of many microservices, containers, managed cloud services, third-party APIs and serverless functions. Game designers create game levels, the art team produces visual effects and animations and the audio department creates game audio. Once these moving parts are packaged together, the game service has to be deployed on cloud infrastructure and the game itself shipped as executables for consoles, mobile devices and laptops.
All these elements generate massive amounts of observability data, particularly if a game is popular and played by millions of users. That data arrives from gaming services, consoles and other sources in real time; it is not uncommon to deal with hundreds of terabytes of ingested log data per day. To make sure that all gaming elements work as expected, gaming companies have to implement efficient ways of collecting and analyzing it. As a result, improving the gaming experience can be quite challenging for companies with large-scale operations, and their success depends on how efficiently they can harness the data their gaming environments generate.
This is where observability comes in. Observability provides critical insights into what is, or what may be, wrong with the game, all its moving parts and the user experience. This calls for powerful observability capabilities: processing vast amounts of data (logs, metrics, traces, etc.) in real time, handling observability data volumes that fluctuate with user load, and running concurrent real-time queries at massive scale.
So, what, exactly, needs to be measured for a great gaming user experience?
Measuring the Gaming User Experience
Gaming companies are under continuous financial pressure to deliver new features that attract and retain new users in order to offset mounting operating expenses. Attracting new users is a balancing act from the gaming companies’ perspective. If they move too slowly, they will lose gamers. If they move too quickly, the gaming user experience can deteriorate due to newly introduced bugs. Gamers can be very vocal on social networks, especially if their intent is to express their dissatisfaction with compromised gaming quality. As a result, companies are forced to balance the rate of feature introduction with user satisfaction. That’s why they need to measure user experience continuously.
To improve the gaming user experience, gaming organizations must establish measurable key performance indicators (KPIs) that can drive improvements in overall quality. KPIs include, but are not limited to:
- User video quality (frames per second)
- Memory usage
- Crash reports count
- Database performance measurements
- Service latencies
- User request count
- Game-specific errors and warnings count
- Code quality measurements, such as automated test coverage success criteria (a part of the CI/CD pipeline)
Monitoring Service Latency for a High-Quality Gaming Experience
Gaming applications are often bursty when it comes to user load, especially during the introduction of new games or new features. Service latency is one of the critical indicators to monitor when managing fluctuating user demand during those times.
DevOps teams and SREs need to collect logs at massive scale to analyze the performance of services and infrastructure, along with individual host utilization metrics such as CPU utilization. This information gives engineers clues about the sources of bottlenecks and potential deterioration in service latency. Logging tools need to be able to deal with this fluctuating demand, too. This level of insight into the state of the gaming application helps engineers troubleshoot issues before gamers face increased service latency that can impact their gaming experience.
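As a rough illustration of turning raw latency samples into something an engineer can act on, the sketch below computes nearest-rank percentiles and compares the 95th percentile against a latency budget. The function names, the 250 ms budget and the choice of the nearest-rank method are assumptions made for this example; they are not taken from any particular monitoring tool.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer 1..100."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # 1-based nearest rank: smallest rank covering pct percent of the samples.
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_report(latencies_ms, p95_budget_ms=250.0):
    """Summarize request latencies and flag a breach of the p95 budget.

    The 250 ms default is a hypothetical budget, not a recommendation.
    """
    p95 = percentile(latencies_ms, 95)
    return {
        "p50": percentile(latencies_ms, 50),
        "p95": p95,
        "breach": p95 > p95_budget_ms,
    }
```

Tail percentiles are used here rather than averages because a bursty load can leave the mean looking healthy while a noticeable share of players still experiences slow responses.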
Analyzing Game-Specific Observability Data
Game developers can create custom logs and metrics and combine those with observability data from the gaming engines. Both must be analyzed to ensure a high-quality user experience. Some of the common gaming data parameters available for analysis are:
- Log data that indicates when a player joins and leaves the game (associated with the player’s name)
- The location of a player reported at specified times
- Logs related to significant gaming events associated with each player (such as goals achieved or actions performed)
- Logs related to chat messages between players, player location, the time when the message was generated and message context
The more data that is collected, the better the chances of understanding any issues that can turn players away from the game. Observability data management should evolve in parallel with game development. As new features are added, the code should be instrumented with KPIs to ensure the quality of the newly introduced code in the long run.
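To make the join and leave logs listed above more concrete, here is a minimal sketch that derives total time in game per player from such events. It assumes each log line is a JSON object with hypothetical `ts`, `player` and `event` fields; real game engines will use their own formats and field names.

```python
import json
from datetime import datetime


def session_durations(log_lines):
    """Return {player: total seconds in game} from join/leave log lines.

    Assumed line format (hypothetical):
    {"ts": "2024-01-01T12:00:00", "player": "ava", "event": "join"}
    """
    joined = {}   # player -> timestamp of the currently open session
    totals = {}   # player -> accumulated seconds over closed sessions
    for line in log_lines:
        event = json.loads(line)
        ts = datetime.fromisoformat(event["ts"])
        player = event["player"]
        if event["event"] == "join":
            joined[player] = ts
        elif event["event"] == "leave" and player in joined:
            elapsed = (ts - joined.pop(player)).total_seconds()
            totals[player] = totals.get(player, 0.0) + elapsed
    # Players who joined but have not left yet are not counted.
    return totals
```

A leave event without a matching join is ignored here; in practice such gaps may themselves be worth counting, since they can point to dropped log records.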
Database Performance Monitoring Is Critical for Gaming User Experience
To store time-sensitive gaming data, DevOps teams must use fast databases. Database performance, and in particular how quickly the game’s properties are retrieved from a database, directly impacts user satisfaction. Slow database performance degrades the gaming user experience. To avoid service issues, DevOps teams must measure key database performance indicators such as response times and out-of-memory errors reported in logs.
Fast databases can be expensive in an extremely competitive market where gaming companies are always looking for cost optimizations to stay in business. It is non-trivial for DevOps teams to monitor the user experience related to database performance and, at the same time, look for ways to reduce the cost of fast databases.
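One way to capture the database response times discussed in this section is to wrap each query in a timer and record how long it took. The sketch below is a minimal, database-agnostic example; the `QueryTimer` class, the 50 ms threshold and the query names are all hypothetical and would be replaced by whatever the team's database client and metrics pipeline provide.

```python
import time
from contextlib import contextmanager


class QueryTimer:
    """Record how long named queries take and list the slow ones."""

    def __init__(self, slow_threshold_s=0.05, clock=time.perf_counter):
        self.slow_threshold_s = slow_threshold_s  # hypothetical 50 ms budget
        self.clock = clock                        # injectable for testing
        self.samples = []                         # (query name, seconds)

    @contextmanager
    def measure(self, name):
        start = self.clock()
        try:
            yield
        finally:
            # Record the duration even if the query raised an exception.
            self.samples.append((name, self.clock() - start))

    def slow_queries(self):
        return [name for name, secs in self.samples
                if secs > self.slow_threshold_s]
```

Usage would look like `with timer.measure("load_inventory"): run_query(...)`, where `run_query` stands in for the actual database call. The recorded samples can then be shipped as metrics or logged alongside other observability data.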
Build Pipeline and Delivery Monitoring
For a large-scale video game, new code is often pushed many times per day. Builds can be large and are deployed across many gaming consoles and end devices. Continuous monitoring of the build pipeline and of delivery is crucial for these frequent and often large software deployments.
With large builds deployed daily over the network to thousands of gaming platforms—where each platform receives a massive amount of data—monitoring of build deployments should be automated. Errors from log data should trigger alerts and drive redeployments. QA automation is necessary to reduce the burden of QA efforts and make build deliveries less error-prone and less tedious.
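The idea of log errors driving alerts and redeployments can be sketched as a small check over per-platform deployment logs. Everything in this example is an assumption made for illustration: the `ERROR`/`FATAL` log levels, the platform names and the rule that any error marks a platform for redeployment.

```python
import re

# Hypothetical convention: a deployment log line containing ERROR or FATAL
# as a whole word marks a failed step.
ERROR_PATTERN = re.compile(r"\b(ERROR|FATAL)\b")


def platforms_to_redeploy(deploy_logs, max_errors=0):
    """Return the sorted platforms whose logs exceed the allowed error count.

    deploy_logs maps a platform name to the list of its deployment log lines.
    """
    failed = []
    for platform, lines in deploy_logs.items():
        errors = sum(1 for line in lines if ERROR_PATTERN.search(line))
        if errors > max_errors:
            failed.append(platform)
    return sorted(failed)
```

In a real pipeline the returned list would feed an alerting system or a redeployment job rather than being inspected by hand.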
Better Observability Improves In-Game Commerce
Insights into user behavior are as important as monitoring possible issues affecting user experience. Observability tools can provide key insights into the overall engagement of all game users. High engagement then drives more excitement. Higher user excitement may increase the success of ad campaigns and accelerate in-game purchases.
Observability Tools for Managing Gaming Experience
When thinking about observability tools that can be instrumental in driving continuous improvement of user experience in the gaming industry, the following functionalities are important:
- Ability to ingest and analyze log data at petabyte scale and beyond in real time to avoid gaming experience deterioration.
- Zero-schema and ability to accept observability data in various formats from various gaming platforms, cloud servers, gaming consoles, and mobile devices. Because gaming environments use different third-party software packages, logs may come in many formats.
- Ability to create alerts on collected observability data. Alerting can improve user experience by monitoring properties such as the number of active users experiencing client errors.
- Capability to handle backpressure when critical issues occur. An alert storm can clog the data pipes, leaving the observability tool in the dark for hours unless it is designed to cope with backpressure.
- Ability to collect data from build pipelines to monitor frequent and commonly large software builds being deployed daily over the network to numerous gaming platforms.
- Capability to collect data from cloud environments, for example for cloud infrastructure monitoring.
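To illustrate the backpressure point in the list above, the following sketch shows one simple way an ingest path can stay responsive during a burst: a bounded buffer that discards the oldest records when full and counts what it dropped. This is load shedding, a simpler technique than a full backpressure protocol in which producers are told to slow down, and the `BoundedLogBuffer` class is a hypothetical example rather than part of any observability product.

```python
from collections import deque


class BoundedLogBuffer:
    """Fixed-capacity buffer that sheds the oldest record when full."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.dropped = 0        # number of records shed so far
        self._items = deque()

    def offer(self, record):
        """Add a record; return False if an older record had to be shed."""
        shed = len(self._items) >= self.capacity
        if shed:
            self._items.popleft()
            self.dropped += 1
        self._items.append(record)
        return not shed

    def drain(self, max_items):
        """Remove and return up to max_items records, oldest first."""
        batch = []
        while self._items and len(batch) < max_items:
            batch.append(self._items.popleft())
        return batch
```

Exposing the `dropped` counter matters as much as the buffer itself: it lets the tool report how much data it lost during a storm instead of silently going dark.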
Observability is key to protecting gaming revenue opportunities and assuring the success of video games, as long as you know what to look for in observability tools that are used to gather data from gaming environments.