Data Science Insights: Exploring the Influence of COVID-19 on Video Game Popularity and Sales


This is the group project for SDSC1001 – Introduction to Data Science, which I completed in Semester B of my first year (2020/21).

Presentation Slides:

Course Instructor: Dr. Xinyue LI

Abstract: People have spent more time at home during the Covid-19 pandemic, and playing video games has become a common form of entertainment. As a result, the number of players has increased significantly. This paper uses linear regression to investigate the correlation between selected factors and game hit rate, and reviews machine learning algorithms for predicting game sales and success. It is hoped that these factors and algorithms can serve as a reference for predicting future game sales and success.

1. Background, Problems and Motivation

1.1 Background

During the Covid-19 pandemic, many countries introduced measures to limit virus transmission. For example, the United Kingdom imposed a coronavirus lockdown. People spent more time at home because of government policy and health concerns. Under these circumstances, more people spent time playing video games, and gaming companies were willing to invest in releasing new games. Some researchers state that the number of concurrent active users on Steam, one of the video game distributors, exceeded 20 million during the pandemic, the highest figure on its record. As a result, the number of video game players has increased dramatically. King et al. (2020) also indicate that the pandemic has encouraged participation in gaming. The pandemic therefore seems to be a significant factor affecting the video game hit rate; however, other factors might affect it as well. Thus, this paper examines whether such factors affect the video game hit rate and investigates which machine learning algorithm is appropriate for predicting video game sales and game success.

1.2 Definition

Game hit rate

  • Total number of players
    • The number of players from the day the game launched until now.
  • Average concurrent players
    • The average number of players online at the same time over a specific period.
  • Peak concurrent players
    • The highest number of players online at the same time over a specific period.
  • Daily/Weekly/Monthly active players
    • The number of players who logged in to the game at least once in a day, a week or a month.

Video game

  • Electronic games in which players control images on a video screen

Linear regression

  • A method that models the relationship between a scalar response and one or more explanatory variables.
Y = aX + b ...
Y is the response (result) variable
X is the predictor variable
a and b are coefficients
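
The relationship above can be illustrated with a minimal sketch of an ordinary least-squares fit for a single predictor. The function name fit_line and the sample numbers are assumptions made for illustration; they are not taken from the project data, and the project itself used Excel rather than this code.

```python
def fit_line(xs, ys):
    """Return (a, b) that minimise the squared error of Y = aX + b."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squares of X and cross-products of X and Y around their means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("X has no variation, so the slope is undefined")
    a = sxy / sxx              # slope
    b = mean_y - a * mean_x    # intercept
    return a, b


# Illustrative points lying exactly on Y = 2X + 1.
a, b = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)
```

With more than one explanatory variable the same idea applies, but the coefficients are solved jointly rather than with this closed form.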

Support vector regression

  • A machine learning algorithm derived from the Support Vector Machine, which is used for classification. It fits a line together with decision boundary lines at a margin e on either side in order to forecast a continuous variable.
Y = aX + b ...
Decision boundaries:
Y = aX + b - e
Y = aX + b + e
Model satisfaction:
-e < Y - (aX + b) < +e
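
The model-satisfaction condition can be sketched as an epsilon-insensitive loss: a residual inside the tube of half-width e costs nothing, and a residual outside it is penalised by the distance to the tube. The function names and numbers below are hypothetical. The sketch treats a point lying exactly on a boundary as inside the tube, which is the usual convention and differs slightly from the strict inequality written above.

```python
def epsilon_insensitive_loss(y, x, a, b, e):
    """Penalty for one observation under the line Y = aX + b with margin e."""
    residual = y - (a * x + b)
    return max(0.0, abs(residual) - e)


def inside_tube(y, x, a, b, e):
    """True when the observation lies between the two decision boundaries."""
    return epsilon_insensitive_loss(y, x, a, b, e) == 0.0


# Hypothetical line Y = 2X + 1 with margin e = 0.5; the prediction at X = 3 is 7.
print(inside_tube(7.25, 3, 2, 1, 0.5))            # residual 0.25 is within the margin
print(epsilon_insensitive_loss(8, 3, 2, 1, 0.5))  # residual 1.0 exceeds the margin
```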

Random forest

  • A supervised machine learning algorithm that builds a forest of several randomized decision trees. It can be used for classification or regression: each tree is constructed from a random sample of the data, and the trees' outputs are combined into a class or a prediction.
Precondition: A training set S := (x1, y1), . . . , (xn, yn), features F, and number of trees in forest B.
function RandomForest(S, F)
    H ← ∅
    for i ∈ 1, . . . , B do
        S(i) ← A bootstrap sample from S
        hi ← RandomizedTreeLearn(S(i), F)
        H ← H ∪ {hi}
    end for
    return H
end function
function RandomizedTreeLearn(S, F)
    At each node:
        f ← very small subset of F
        Split on best feature in f
    return The learned tree
end function
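
The pseudocode above can be sketched in plain Python for the regression case. This is a minimal illustration under stated assumptions, not the implementation used in the cited studies: the depth limit max_depth, the squared-error split criterion, the parameter n_sub (size of the feature subset f, which must not exceed the number of features) and the averaging of tree outputs are all choices made here for the example, and the data are made up.

```python
import random


def mean(values):
    return sum(values) / len(values)


def sse(values):
    """Sum of squared deviations from the mean (split criterion)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values)


def randomized_tree_learn(rows, targets, n_sub, rng, depth=0, max_depth=3):
    """Grow a regression tree; every node looks at a random feature subset f."""
    if depth >= max_depth or len(set(targets)) == 1:
        return mean(targets)  # leaf: predict the mean target
    features = rng.sample(range(len(rows[0])), n_sub)  # f <- small subset of F
    best = None  # (error, feature index, threshold)
    for f in features:
        # Dropping the largest value keeps both sides of every split non-empty.
        for t in sorted({row[f] for row in rows})[:-1]:
            left = [y for row, y in zip(rows, targets) if row[f] <= t]
            right = [y for row, y in zip(rows, targets) if row[f] > t]
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, f, t)
    if best is None:  # the sampled features offer no usable split
        return mean(targets)
    _, f, t = best
    lo = [i for i, row in enumerate(rows) if row[f] <= t]
    hi = [i for i, row in enumerate(rows) if row[f] > t]
    return {
        "feature": f,
        "threshold": t,
        "left": randomized_tree_learn([rows[i] for i in lo], [targets[i] for i in lo],
                                      n_sub, rng, depth + 1, max_depth),
        "right": randomized_tree_learn([rows[i] for i in hi], [targets[i] for i in hi],
                                       n_sub, rng, depth + 1, max_depth),
    }


def predict_tree(tree, row):
    while isinstance(tree, dict):  # follow the if-then-else rules down to a leaf
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree


def random_forest(rows, targets, n_trees, n_sub, seed=0):
    """Build H: one randomized tree per bootstrap sample of the training set."""
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        forest.append(randomized_tree_learn([rows[i] for i in idx],
                                            [targets[i] for i in idx], n_sub, rng))
    return forest


def predict_forest(forest, row):
    """Regression output: the average of the individual tree predictions."""
    return mean([predict_tree(tree, row) for tree in forest])


# Made-up data: the target depends on the first feature only.
rows = [[i, i % 3] for i in range(20)]
targets = [10 if i < 10 else 100 for i in range(20)]
forest = random_forest(rows, targets, n_trees=5, n_sub=1)
print(predict_forest(forest, [2, 0]), predict_forest(forest, [17, 1]))
```

For classification, the leaves would store a majority class and the forest would take a vote instead of an average.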

Decision tree

  • A supervised machine learning algorithm that predicts a target variable. It is used in classification and prediction problems and consists mainly of “if-then-else” rules.

Internet popularity

  • Number of people who accessed the Internet.

2. Objectives

  • Correlation between price and number of players in a game

First, we assume that price is an important factor in the hit rate of a game. To validate this, we use linear regression to find the correlation between price and number of players. We use data for Counter-Strike: Global Offensive (CS:GO) because Steam provides an API for player numbers, so the public can easily access the information for investigation. Two datasets are needed: one contains the price history of CS:GO from August 2012 to March 2021, and the other is the player data provided by the Steam API. After downloading the two datasets, we had to clean the data in order to reduce noise and improve accuracy. We found that some data were missing from the price dataset; for example, the prices from August 2012 to February 2013 were not recorded. In addition, the game became free to play in December 2018, so from then on the price is zero and no longer suitable for studying the relationship between price and number of players by linear regression. As a result, we could only compare the two datasets from March 2013 to November 2018. The linear regression was calculated in Excel. The full processed dataset can be found in the Appendix.
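
The cleaning step described above can be sketched as follows. The project did this work in Excel, so the function name align_price_and_players, the "YYYY-MM" month keys and the dictionary layout are assumptions made for illustration. The player counts and prices are copied from the Appendix, except that the September 2018 price is deliberately left out to simulate a gap in the price dataset.

```python
def align_price_and_players(players_by_month, price_by_month,
                            start="2013-03", end="2018-11"):
    """Pair price and player count for months inside [start, end] that have both."""
    paired = []
    for month in sorted(players_by_month):
        if not (start <= month <= end):
            continue  # outside the comparable window (e.g. the free-to-play period)
        price = price_by_month.get(month)
        if price is None:
            continue  # price was not recorded for this month
        paired.append((month, price, players_by_month[month]))
    return paired


players = {
    "2018-09": 333163.99,
    "2018-10": 325907.82,
    "2018-11": 310085.43,
    "2018-12": 395509.26,  # first free-to-play month, excluded by the window
}
prices = {
    # "2018-09" is omitted on purpose to simulate a missing price record
    "2018-10": 76,
    "2018-11": 99,
    "2018-12": 0,
}
for row in align_price_and_players(players, prices):
    print(row)
```

ISO-style "YYYY-MM" keys are used here because they sort and compare correctly as plain strings, which keeps the window check simple.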

  • Correlation between internet popularity and video game sales

Our assumption is that video game sales are correlated with internet popularity. To verify this assumption, we used datasets of video game sales by country and of internet popularity in those countries. These two datasets are provided by Zach. The video game sales dataset shows the estimated game revenue in 2020, and the internet popularity dataset shows the number of people in each country who accessed the internet using any device this year. These estimates are based on consumer research, transactional data, quarterly company reports, and census data. The revenues are based on consumer spending in each country and exclude hardware sales and tax. We tested the correlation with linear regression in Excel.

3. Main Results

  • Correlation between price and number of players in a game

After the two datasets were combined and processed with linear regression in Excel, the result was as follows.

SUMMARY OUTPUT

Regression Statistics
Multiple R          0.945345122
R Square            0.864456354
Adjusted R Square   0.874734625
Standard Error      10051.16955
Observations        69

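As the table above shows, Multiple R is about 0.95, which indicates a strong linear relationship between the price and the number of players of the game. Multiple R is always non-negative, so the direction of the relationship has to be read from the sign of the slope coefficient. R Square is about 0.86, which means that about 86% of the variance in the number of players is explained by the regression model.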
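
For readers who want to see how the quantities in such a summary relate to each other, here is a minimal sketch for a regression with one predictor. It uses the standard textbook definitions, which are assumed to match what Excel reports; the function name and the sample data are made up and are not the project data.

```python
import math


def regression_statistics(xs, ys):
    """Summary statistics for a least-squares fit of Y = aX + b (one predictor)."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                      # slope
    b = mean_y - a * mean_x            # intercept
    ss_res = sum((y - (a * x + b)) ** 2 for x, y in zip(xs, ys))
    r_square = max(0.0, 1 - ss_res / syy)   # share of variance explained
    dof = n - 2                        # observations minus two fitted coefficients
    return {
        "Multiple R": math.sqrt(r_square),  # |Pearson correlation| for one predictor
        "R Square": r_square,
        "Adjusted R Square": 1 - (1 - r_square) * (n - 1) / dof,
        "Standard Error": math.sqrt(ss_res / dof),
        "Observations": n,
    }


# Made-up sample data.
stats = regression_statistics([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
for name, value in stats.items():
    print(name, value)
```

Under these definitions R Square is the square of Multiple R, and Adjusted R Square never exceeds R Square, which gives a quick consistency check on any reported summary.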

  • Correlation between internet popularity and video game sales

Combining the two datasets and running the linear regression analysis gave the following results.

Equation: y = (0.000000051)x + 45.6

Multiple R          0.844716
R Square            0.713545
Adjusted R Square   0.677738
Standard Error      0.0000000829
Observations        20

The slope of the equation is positive and the R Square value is about 0.71, which means that internet popularity has a positive relationship with video game sales and that about 71% of the variance in video game sales can be explained by internet popularity.

  • Accuracy of predictive algorithms for game sales

ML algorithm               Accuracy
Random Forest              0.9605
Support Vector Regression  0.8154
Decision Tree              0.8036
Linear Regression          0.7534

Source: Keerthana, B. & Rao, K.V. (2019). Sales Prediction on Video Games Using Machine Learning. Journal of Emerging Technologies and Innovative Research, 6(6).

  • Accuracy of predictive algorithms for game success (number of players)

ML algorithm               Accuracy
Random Forest              0.9750
Support Vector Regression  0.9640

Source: Trněný, M. (2017). Machine Learning for Predicting Success of Video Games. Masaryk University Faculty of Informatics.

4. Interpretation, Comparison, Discussions

  • Correlation between price and number of players in a game

The linear regression result suggests that a correlation exists between price and the number of players in a game. This supports our assumption that price is an important factor in the hit rate of a game. There are several possible reasons for this correlation. Large studios that release higher-priced games aim at the market for high-quality products, and their budgets allow them to invest in advertising so that a larger audience notices the game. As a consequence, while most games cost $10 or less, the more successful ones are generally priced above $40 (Trněný, 2017) (see Figure 1).

On the other hand, the game we used for testing (CS:GO) also shows a different phenomenon. After the game became free on 6 December 2018, the number of new players increased much faster than before, which suggests that people are more willing to try a game when it is free. In conclusion, price has a significant effect on the hit rate of games, whether the price is high or low.

Figure 1

Source: Trněný, M. (2017). Machine Learning for Predicting Success of Video Games. Masaryk University Faculty of Informatics. https://is.muni.cz/th/k2c5b/diploma_thesis_trneny.pdf

  • Correlation between internet popularity and video game sales

Countries with higher internet popularity tend to have higher video game sales, which can be explained by market size. Higher internet popularity means that more people have suitable devices for playing video games and for receiving information about them. In addition, the variety of opponents available online gives players fresh and unique challenges. These are possible reasons for the positive correlation.

Optimal machine learning algorithm for predicting sale or success of games

  • Sale prediction

Linear regression, support vector regression, random forest and decision tree models were constructed to predict game sales (Keerthana & Rao, 2019). Linear regression was the baseline model selected by the research team. Among the four models tested, the accuracy rates were about 75%, 82%, 96% and 80% for linear regression, support vector regression, random forest and decision tree respectively. The optimal ML algorithm for predicting game sales is therefore the random forest model, with an accuracy of about 96%.

  • Success prediction

Support vector regression and random forest provide effective prediction models, with accuracies of about 96% and 97% respectively (Trněný, 2017). One of the objectives of that research was to predict the average number of players of a game. Both algorithms worked for predicting whether a game has more than 100 players on average, a group that covered 33% of the dataset. Hence, the researcher believes that the models can give game producers and developers useful insight into their products.

5. Conclusions

The primary aim of this paper is to investigate whether certain factors affect the video game hit rate. Based on the results, we draw the following conclusions:

  1. There is a strong relationship between game prices and number of players.
  2. The relationship between internet popularity and video game sales is positive.

The phenomenon in (1) could be explained by the mentality of players. As Alha et al. (2014) state, people are more willing to try free-to-play games. For some players, high-priced and low-priced games might not differ much, because they can gain similar amusement from a low-priced game. Low-priced games can therefore attract part of the player base. However, other players are willing to pay for high-priced games because they believe these games provide a higher-quality gaming experience, such as a rich story. As such, there is a market for both high-priced and low-priced games.

The paper also aims to identify the optimal machine learning algorithm for predicting video game sales and game success. Based on the studies reviewed, the random forest model seems to be the most effective algorithm for both tasks, with an accuracy of about 96% in predicting game sales and approximately 97% in predicting game success.

In conclusion, the price of video games and internet popularity are factors that affect the hit rate of games. A suitable machine learning algorithm is also essential for gaming companies to predict future game sales or success. It is hoped that this paper can serve as a reference for future prediction.

References

Alha, K., Koskinen, E., Paavilainen, J., Hamari, J., & Kinnunen, J. (2014). Free-to-Play Games: Professionals’ Perspective. Proceedings of DiGRA Nordic 2014.

Keerthana, B. & Rao, K.V. (2019). Sales Prediction on Video Games Using Machine Learning. Journal of Emerging Technologies and Innovative Research, 6(6). http://www.jetir.org/papers/JETIR1907H50.pdf

King, D. L., Delfabbro, P. H., Billieux, J., & Potenza, M. N. (2020). Problematic online gaming and the COVID-19 pandemic. Journal of Behavioral Addictions, 9(2), 184-186.

Şener, M., Yalcin, T., & Gulseven, O. (2021). The Impact of Covid-19 on the Video Game Industry.

Trněný, M. (2017). Machine Learning for Predicting Success of Video Games. Masaryk University Faculty of Informatics. https://is.muni.cz/th/k2c5b/diploma_thesis_trneny.pdf

Appendix

Price and number of players in CS: GO dataset

Month           Avg. Players  Gain        % Gain    Price (HKD)
March 2021      740927.82     -85.42      -0.01%    0
February 2021   741013.24     -2196.42    -0.30%    0
January 2021    743209.66     25405.91    3.54%     0
December 2020   717803.75     49049.17    7.33%     0
November 2020   668754.58     55087.89    8.98%     0
October 2020    613666.69     6816.37     1.12%     0
September 2020  606850.32     -33107.34   -5.17%    0
August 2020     639957.66     14056.85    2.25%     0
July 2020       625900.81     -45746.65   -6.81%    0
June 2020       671647.46     -97147.79   -12.64%   0
May 2020        768795.25     -88808.97   -10.36%   0
April 2020      857604.22     186570.94   27.80%    0
March 2020      671033.29     127054.13   23.36%    0
February 2020   543979.15     42783.15    8.54%     0
January 2020    501196        44494.44    9.74%     0
December 2019   456701.56     30620.76    7.19%     0
November 2019   426080.81     17085.5     4.18%     0
October 2019    408995.31     -1930.29    -0.47%    0
September 2019  410925.6      -4171.7     -1.00%    0
August 2019     415097.3      21314.48    5.41%     0
July 2019       393782.83     4406.1      1.13%     0
June 2019       389376.72     24959.42    6.85%     0
May 2019        364417.31     12427.39    3.53%     0
April 2019      351989.92     -38250.24   -9.80%    0
March 2019      390240.16     18881.2     5.08%     0
February 2019   371358.96     -30007.91   -7.48%    0
January 2019    401366.87     5857.61     1.48%     0
December 2018   395509.26     85423.83    27.55%    0
November 2018   310085.43     -15822.39   -4.85%    99
October 2018    325907.82     -7256.17    -2.18%    76
September 2018  333163.99     49632.68    17.51%    32
August 2018     283531.31     10224.05    3.74%     48
July 2018       273307.26     6445.02     2.42%     76
June 2018       266862.24     4691.36     1.79%     76
May 2018        262170.88     -26905.82   -9.31%    120
April 2018      289076.7      -65193.64   -18.40%   120
March 2018      354270.33     -28186.77   -7.37%    110
February 2018   382457.1      426.57      0.11%     99
January 2018    382030.53     41153.65    12.07%    49.5
December 2017   340876.88     19745.48    6.15%     60
November 2017   321131.4      -20729.86   -6.06%    120
October 2017    341861.26     -12540.83   -3.54%    99
September 2017  354402.09     -20023.6    -5.35%    99
August 2017     374425.69     -3163.35    -0.84%    99
July 2017       377589.04     3201        0.85%     76
June 2017       374388.04     2558.7      0.69%     76
May 2017        371829.34     -20369.85   -5.19%    99
April 2017      392199.19     5290.47     1.37%     76
March 2017      386908.72     -15476.99   -3.85%    99
February 2017   402385.71     9276.18     2.36%     48
January 2017    393109.53     50913.83    14.88%    38
December 2016   342195.7      13150.44    4.00%     48
November 2016   329045.26     -4031.2     -1.21%    99
October 2016    333076.46     10550.57    3.27%     48
September 2016  322525.89     -24703.36   -7.11%    120
August 2016     347229.25     -6548.31    -1.85%    70
July 2016       353777.56     19466.5     5.82%     49.5
June 2016       334311.06     -4427.34    -1.31%    70
May 2016        338738.39     -37057.47   -9.86%    120
April 2016      375795.87     -3631.08    -0.96%    70
March 2016      379426.95     3141.92     0.83%     66.33
February 2016   376285.02     10913.93    2.99%     44.5
January 2016    365371.09     -12076.02   -3.20%    99
December 2015   377447.11     16521.23    4.58%     49.5
November 2015   360925.88     -1840.21    -0.51%    62
October 2015    362766.09     6860.76     1.93%     48
September 2015  355905.33     -1629.91    -0.46%    62
August 2015     357535.24     28002.87    8.50%     48
July 2015       329532.38     -14623.63   -4.25%    99
June 2015       344156.01     26869.72    8.47%     32
May 2015        317286.29     25537.55    8.75%     32
April 2015      291748.74     23752.43    8.86%     32
March 2015      267996.31     28061.68    11.70%    32
February 2015   239934.64     5863.96     2.51%     48
January 2015    234070.68     50481.18    27.50%    25
December 2014   183589.5      36260.43    24.61%    25
November 2014   147329.07     13791.37    10.33%    48
October 2014    133537.7      2503.02     1.91%     99
September 2014  131034.68     -2151.11    -1.62%    78
August 2014     133185.79     27047.79    25.48%    32
July 2014       106138        21974.38    26.11%    32
June 2014       84163.62      -761.4      -0.90%    78
May 2014        84925.02      6044.49     7.66%     68
April 2014      78880.53      8737.2      12.46%    68
March 2014      70143.33      10351.96    17.31%    68
February 2014   59791.37      4164.01     7.49%     68
January 2014    55627.35      8839.08     18.89%    68
December 2013   46788.27      16897.75    56.53%    68
November 2013   29890.52      1988.38     7.13%     68
October 2013    27902.14      223.23      0.81%     68
September 2013  27678.91      1717.06     6.61%     68
August 2013     25961.85      5469.33     26.69%    68
July 2013       20492.52      2372.68     13.09%    68

Video game sales and internet popularity dataset

Country         Internet Popularity (million)  Sales (million)
China           907.5                          4085.4
United States   283.9                          3692.1
Japan           101.5                          1868.3
South Korea     48.2                           656.4
Germany         75.5                           596.5
United Kingdom  61.8                           551.1
France          58.2                           398.7
Canada          33.7                           3051.
Italy           52.7                           266.1
Spain           40.8                           265.6
Russia          65.2                           200
Sweden          42                             1800
Finland         38                             1734
Australia       29                             143.5
Mexico          27.65                          126.5
Singapore       3.145                          159.3
Vietnam         7.815                          364.2
Poland          23.3                           123
Netherlands     35                             213
Slovakia        33                             165.6

 
