Video Game Localization: How to Find Out User Difficulty and Expectations
Video gaming is fast becoming the most popular recreational pastime globally. The gaming market was reportedly worth $138 billion (USD) in 2018 and has been predicted to grow at a compound annual growth rate of 10%, reaching $180 billion (USD) by 2021.
As a result, more users around the world want popular games to be translated and adapted to their regions, and localization, a subsector of the gaming industry, is growing rapidly. Its goal is to create a smoother playing experience for the end user by taking into account their specific cultural context, while being faithful to the source material.
The problem many game localization companies face, however, is the challenging task of analyzing user reviews for insights that can inform the localization process. This costs the company time and money, and delays users from fully enjoying their chosen game.
This article will demonstrate how such analysis can be done efficiently, by looking at a project carried out for Allcorrect as a part of the Data Analyst Practicum by Yandex bootcamp this year. Specifically, it presents examples of using some popular Python packages for dealing with text data in multiple languages, such as deep_translator, langdetect, and NLTK.
The data
The data used for the report in this article is provided by Allcorrect. It includes game ID, user score and review text in its original language. The games are on both mobile and PC platforms, which have different user scoring systems. The game IDs have been anonymized.
To prepare for the analysis, reviews that mention localization need to be identified. In this project, there are a total of 140 keywords in 27 languages used to filter the data. One of the challenges in this process is the differences among languages. Users might mention the word 'localization' in their reviews, but use more than one expression in their original language. Therefore, having only one translation for 'localization' in the keywords will miss the reviews containing other expressions.
Taking Chinese as an example, the keywords 'Chinese', 'localization' and 'English' can be translated as 中文, 本土化 and 英语 respectively. However, in the user reviews, we also find 华语, 汉化 and 英文 used to refer to the same concepts. These need to be included in the keywords so as not to miss important localization reviews. The same applies to other languages as well.
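As a minimal sketch of this kind of keyword filter, the snippet below checks a review against a list that holds several expressions per concept. The keyword list is only an illustrative subset of the 140 keywords used in the project, and the names LOCALIZATION_KEYWORDS and mentions_localization are hypothetical.

```python
import re

# Illustrative subset only; the project used 140 keywords in 27 languages.
LOCALIZATION_KEYWORDS = [
    "localization", "chinese", "english",
    "中文", "华语",    # two ways of writing "Chinese"
    "本土化", "汉化",  # two ways of writing "localization"
    "英语", "英文",    # two ways of writing "English"
]

# One case-insensitive pattern that matches any keyword anywhere in the text.
KEYWORD_PATTERN = re.compile(
    "|".join(re.escape(keyword) for keyword in LOCALIZATION_KEYWORDS),
    re.IGNORECASE,
)


def mentions_localization(review_text):
    """Return True when the review contains at least one localization keyword."""
    if not isinstance(review_text, str):
        return False
    return KEYWORD_PATTERN.search(review_text) is not None
```

Applied to a pandas column, a function like this produces a boolean mask that keeps only the localization reviews.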
Using the keywords to filter the data, slightly over 1% of it was about localization (a total of 152,447 records). In what follows, we will demonstrate how to carry out language detection, translation, and emoji conversion on text data. Then, we'll discuss how to use a logistic regression model to reveal the dynamics of positive and negative reviews. Finally, we'll examine the most common reasons for negative reviews. (Where there are code examples, the variable reviews is used to refer to the localization review dataset.)
Useful packages, tools and functions
Before diving into the details of the analysis, let's briefly introduce the most useful Python packages that were used in the analysis of large texts in multiple languages. This section will list the packages used, as well as some useful functions that readers could modify and use.
1) langdetect is a package used for language detection. To apply functions to a large dataset, swifter is a useful package that speeds up pandas processing (see the swifter documentation for more information). Below you'll find a demonstration of how the two work together.
2) GoogleTranslator, from the Python deep_translator package, is a useful tool for translating text reviews. Given that GoogleTranslator does not work when the text exceeds a certain length, a try-except comes in handy when defining the function to apply to the dataframe. (More information can be found in the deep_translator documentation.) Below we demonstrate how to use this package to define a function to translate text data.
3) Emojis are used often in review text data. As these reflect users' feelings, we may at times want to keep them. One way to retain such data for text analysis is to convert them into text. This is where the Python package emoji comes in handy. The code below demonstrates the process of using this package to define a function to convert the emojis to text data.
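A minimal sketch of a detection helper, assuming langdetect is installed. The function name detect_language and the 'unknown' fallback label are illustrative choices, not part of the project code.

```python
def detect_language(text):
    """Return a language code such as 'en' or 'zh-cn', or 'unknown' on failure."""
    # Empty or non-string cells (for example NaN) cannot be detected.
    if not isinstance(text, str) or not text.strip():
        return "unknown"

    # Imported here so that langdetect is only required when detection runs.
    from langdetect import detect
    from langdetect.lang_detect_exception import LangDetectException

    try:
        return detect(text)
    except LangDetectException:
        # Raised when the text has no usable features, e.g. only digits or emojis.
        return "unknown"
```

With swifter imported, the helper could then be applied as reviews["language"] = reviews["review"].swifter.apply(detect_language), where the column name "review" is an assumption. Note that langdetect is not deterministic on short texts unless DetectorFactory.seed is set beforehand.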
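A minimal sketch of such a translation function, assuming deep_translator is installed and network access is available. The name translate_to_english and the optional translate argument (which lets a different translation callable be swapped in) are illustrative.

```python
def translate_to_english(text, translate=None):
    """Translate one review into English, returning None when translation fails."""
    # Leave empty or non-string cells untouched.
    if not isinstance(text, str) or not text.strip():
        return text

    if translate is None:
        # Imported here so that deep_translator is only needed for real translation.
        from deep_translator import GoogleTranslator
        translate = GoogleTranslator(source="auto", target="en").translate

    try:
        return translate(text)
    except Exception:
        # Covers over-long texts as well as network or service errors.
        return None
```

Returning None for failed rows makes it easy to find them afterwards, for example to split over-long reviews into smaller pieces and retry.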
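A minimal sketch of the emoji conversion, assuming the emoji package is installed. The function name emojis_to_text is illustrative, and using spaces as delimiters (instead of the default colons) is a design choice so that each emoji name becomes a separate token.

```python
def emojis_to_text(text):
    """Replace each emoji in the text with its English name."""
    # Leave non-string cells (for example NaN) untouched.
    if not isinstance(text, str):
        return text

    # Imported here so that the emoji package is only needed when converting.
    import emoji

    # Spaces as delimiters keep emoji names apart from neighbouring words.
    return emoji.demojize(text, delimiters=(" ", " "))
```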
The tools described above are just some of the ones that proved useful when preparing the dataset for further analysis. With those in place, let's move on to discuss the analysis.
Top 10 requested languages and games
Applying the language detection function to the review text column allows us to find out the most popular languages in which the localization reviews are written.
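A ranking like this can be obtained with a frequency count. The sketch below assumes a pandas dataframe with a detected-language column; the helper name top_n, the column name "language" and the toy data are illustrative.

```python
import pandas as pd


def top_n(reviews, column, n=10):
    """Return the n most frequent values in a column with their review counts."""
    return reviews[column].value_counts().head(n)


# Toy data standing in for the detected-language column.
sample = pd.DataFrame({"language": ["zh-cn", "ko", "zh-cn", "ru", "zh-cn", "ko"]})
top_languages = top_n(sample, "language", n=2)
```

The same helper works for the game ID column to rank games by their number of localization reviews.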
The graph above shows the top 10 languages with the most localization reviews: Simplified Chinese, Korean, Russian, Turkish, English, Spanish, Italian, French, Thai, and Japanese. Examination of the text data further reveals the prevalence of user requests for local versions, or for improvements to existing local versions, of games in these languages. Such information provides useful insights as to which languages the game localization companies should focus on in their strategic development plans.
We also generated the top 10 games (in terms of the number of their localization reviews) below.
The graph above shows the games the company should focus on when developing localization plans.
Further analysis of reviews for a particular game should reveal the users' preferences, such as whether the game should be localized into a particular language or whether existing localized versions should be improved.
User sentiments in localization reviews
To distinguish the user sentiment (positive or negative feelings) in the reviews, reviews were labeled as negative (score 1 or 2 for mobile games and 0 for PC), neutral (score 3), and positive (4 or 5 for mobile games and 1 for PC). A breakdown of review sentiments is shown below.
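The labeling rule above can be sketched as a small function. The function name label_sentiment and the platform values "mobile" and "pc" are assumptions about how the data is encoded.

```python
def label_sentiment(score, platform):
    """Map a user score to 'negative', 'neutral' or 'positive'."""
    if platform == "pc":
        # PC reviews use a binary score: 1 is positive, 0 is negative.
        return "positive" if score == 1 else "negative"
    # Mobile reviews use a 1 to 5 scale.
    if score <= 2:
        return "negative"
    if score == 3:
        return "neutral"
    return "positive"
```

A binary column for modeling can then be derived by dropping the neutral rows and comparing the label with "positive".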
Among all the localization reviews, over 65% were positive, approximately 23% were negative, and 11% remained neutral. It's good news that the majority of the reviews mentioning localization are positive. Our focus here is to understand the dynamics of positive and negative reviews, particularly reasons for negative reviews. To that end, the neutral reviews have been removed and 10% of the total data has been sliced and translated, which is used to build a logistic regression model. The resulting dataframe has the following structure:
Examining positive and negative reviews
Having translated the reviews into English and then converted emojis into text, the positive and negative review texts were used to generate wordclouds to highlight any patterns. (Python has a wordcloud library for this purpose.) First, a wordcloud for positive reviews.
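As a rough sketch of how such a wordcloud could be produced, the snippet below counts word frequencies and hands them to the wordcloud library. The function names, the image size and the whitespace tokenization are illustrative assumptions, and render_wordcloud requires the wordcloud package.

```python
from collections import Counter


def word_frequencies(texts, stop_words=frozenset()):
    """Count how often each lowercased word appears across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(w for w in text.lower().split() if w not in stop_words)
    return counts


def render_wordcloud(frequencies, path):
    """Draw a wordcloud image from word frequencies and save it to a file."""
    # Imported here so that counting works without the wordcloud package.
    from wordcloud import WordCloud

    cloud = WordCloud(width=800, height=400, background_color="white")
    cloud.generate_from_frequencies(frequencies)
    cloud.to_file(path)
```

Running it once on the positive review texts and once on the negative ones gives the two images to compare.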
Words such as 'great', 'good', 'fun', 'better' stood out, but words like 'difficult' and 'problem' were also present. Expressions like 'language support', 'turkish language', and 'chinese' indicate the need for specific languages.
Next, let's see the negative reviews wordcloud.
Again we see words like 'good' stand out. Words such as 'problem', 'bad', 'need', and "can't" are likely to indicate issues faced by users.
Despite the above observations, the wordclouds alone show no distinct differences in patterns between positive and negative reviews. This is why a logistic regression model was built to reveal the underlying dynamics.
A logistic regression model to detect positive or negative review patterns
A logistic regression model is easy to interpret and quick to train. It also performs well on the sparse matrix of vectorized words.
The goal of fitting a logistic model here is not to predict the sentiment of reviews, however. It is believed that the score given by the user reflects their satisfaction level more accurately, whereas the role of their text reviews is to provide additional information or explanation. Therefore, the main goal is to reveal the underlying text patterns.
Before using the review text column and the binary positive column to build a simple logistic regression model, the review text column needs to be processed. The function used to process the text data is shown below.
The key tools needed for this function are provided by the NLTK library: stopwords, word_tokenize, and WordNetLemmatizer.
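A minimal sketch of such a processing function. It assumes nltk is installed and that its tokenizer ('punkt', or 'punkt_tab' on newer releases), 'stopwords' and 'wordnet' resources have been fetched with nltk.download(). The names load_nltk_tools and process_text are illustrative, and the exact cleaning steps used in the project may differ.

```python
def load_nltk_tools():
    """Return (tokenize, stop_words, lemmatize) built from NLTK."""
    # Imported here so that NLTK and its data are only needed when this is called.
    from nltk.corpus import stopwords
    from nltk.stem import WordNetLemmatizer
    from nltk.tokenize import word_tokenize

    return word_tokenize, set(stopwords.words("english")), WordNetLemmatizer().lemmatize


def process_text(text, tokenize, stop_words, lemmatize):
    """Lowercase, tokenize, drop stopwords and symbol-only tokens, then lemmatize."""
    tokens = tokenize(text.lower())
    kept = [
        lemmatize(tok)
        for tok in tokens
        # Keep tokens with at least one letter, so emoji names like thumbs_up survive.
        if any(ch.isalpha() for ch in tok) and tok not in stop_words
    ]
    return " ".join(kept)
```

Passing the three tools in as arguments means they are loaded once and reused for every row, for example reviews["text"].apply(process_text, args=load_nltk_tools()), where the column name "text" is an assumption.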
After processing the text column, the first five rows of the data used for the logistic regression model look like this:
Let's now move on to the steps of fitting a logistic model to the data.
The data frame is split into train and test sets. 80% of the data will be used for training, and 20% will be used for testing. The train_test_split function from Scikit-learn could be used to split the data, but here let's try another way:
To perform logistic regression analysis on text data, the text first needs to be tokenized and vectorized using a tool from the Scikit-learn library, CountVectorizer.
Two hyperparameters are fine-tuned in the model: penalty and C. GridSearchCV is used to find the optimal values for them.
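One way to split without train_test_split is to sample 80% of the rows and drop them from the original to obtain the rest. This is a sketch under the assumption that the dataframe index is unique; the function name and seed are illustrative.

```python
import pandas as pd


def split_train_test(df, train_frac=0.8, seed=42):
    """Randomly split a dataframe into train and test parts."""
    train = df.sample(frac=train_frac, random_state=seed)
    # Everything not sampled for training becomes the test set.
    test = df.drop(train.index)
    return train, test
```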
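The vectorizing and tuning steps can be sketched as follows. The toy reviews, the candidate values in param_grid, the liblinear solver (chosen because it supports both 'l1' and 'l2' penalties) and the 3-fold cross-validation are all assumptions for illustration, not the project's actual settings.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Toy training data standing in for the processed review text and labels.
train_texts = [
    "great game", "great translation", "great story",
    "great fun", "great localization", "great support",
    "terrible game", "terrible translation", "terrible story",
    "terrible bugs", "terrible localization", "terrible support",
]
train_labels = [1] * 6 + [0] * 6

# Turn each review into a sparse vector of word counts.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# Try every combination of penalty and C with cross-validation.
param_grid = {"penalty": ["l1", "l2"], "C": [0.01, 0.1, 1, 10]}
search = GridSearchCV(
    LogisticRegression(solver="liblinear"),
    param_grid,
    cv=3,
    scoring="accuracy",
)
search.fit(X_train, train_labels)

# The model refitted on all training data with the best combination.
best_model = search.best_estimator_
```

After fitting, search.best_params_ holds the selected penalty and C, and best_model.score can be used on the vectorized train and test sets to compare accuracy.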
The grid search indicates that the best results are achieved with C=0.1 and the 'l2' penalty. These values will be used as the hyperparameters.
The result is a successfully built and trained simple logistic regression model. Checking the accuracy score for both training and test data shows that the model has an overall 78% prediction accuracy on the training data, and 73% accuracy on the testing data. Our model is slightly overfitted.
Discriminating words for positive and negative reviews
Using the logistic regression model, words or patterns that feature in the positive or negative reviews can be revealed. Let's look at the 10 most discriminating words for both types of reviews. This is done by looking at the largest and smallest coefficients, respectively.
The 10 most discriminating negative words have been saved for when we examine the negative reviews.
Examine the negative reviews
The most important piece of information that the company wants to know is the reasons behind the negative reviews (involving localization). Therefore, all the negative localization review text data has been translated into English for closer examination. The key method used for this purpose is word concordances, generated for a combination of localization keywords and the discriminating negative words from the logistic regression model.
The negative review text column has been processed in the same way as for the logistic model. A column has also been added to show the number of words in the reviews.
Using this data, it is possible to look at the distribution of review length, the top languages in which those negative reviews were written, as well as which games they refer to, before moving on to the negative review characteristics.
Distribution of length of reviews
There are a total of 31,631 negative reviews about localization in the dataset. Below are two histograms showing the distribution of length in the number of words. The first shows the total number, and the second is with the outliers removed.
From the above histogram, we can determine that most negative reviews are under 50 words in length. We will use this information to filter the reviews when examining their patterns using the method of word concordances.
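A short sketch of the word-count column and the length filter, assuming a pandas dataframe with a processed text column. The column names "text" and "n_words" and the toy rows are illustrative.

```python
import pandas as pd


def add_word_count(reviews, text_column="text"):
    """Return a copy of the dataframe with the number of words per review."""
    out = reviews.copy()
    out["n_words"] = out[text_column].str.split().str.len()
    return out


# Toy data: one short review and one 80-word review.
sample = pd.DataFrame({"text": ["no russian language", "bad translation " * 40]})
counted = add_word_count(sample)

# Keep only the reviews under 50 words for the concordance analysis.
short_reviews = counted[counted["n_words"] < 50]
```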
Top languages and games for negative reviews
Will the top 10 languages for negative reviews differ from those for all reviews? Let's take a look.
As the graph above demonstrates, the top 10 languages dominating the negative reviews are Simplified Chinese, Russian, Korean, Turkish, English, Italian, Spanish, French, Japanese, and German. Compared with the earlier ranking for all reviews, German enters the top 10 and Thai drops out.
Next, let's see what 10 games have the most negative reviews about localization.
The above shows the 10 games that have the most negative reviews that mention localization.
Although not all these reviews are negative exactly because of their localization quality - the filtering of localization reviews wasn't perfect - this data is still useful for localization companies to conduct further investigations into those games.
Examining the negative reviews using word concordances
To examine the text reviews using word concordances, we need to first create a corpus from the review text, instantiate an NLTK text object using the corpus, and then call the concordance method. Below is a code example of this procedure.
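The three steps can be sketched as follows, assuming nltk is installed. The helper name concordance_lines is illustrative, and splitting on whitespace assumes the reviews have already been processed into space-separated tokens.

```python
from nltk.text import Text


def concordance_lines(processed_reviews, word, n_lines=20, width=80):
    """Return up to n_lines context lines in which the word appears."""
    # Step 1: build one corpus of tokens from all the reviews.
    tokens = [tok for review in processed_reviews for tok in review.split()]
    # Step 2: wrap the tokens in an NLTK Text object.
    corpus = Text(tokens)
    # Step 3: collect the concordance lines for the word.
    hits = corpus.concordance_list(word, width=width, lines=n_lines)
    return [hit.line for hit in hits]
```

Calling corpus.concordance(word, lines=20) instead prints the same lines directly, which is convenient in a notebook.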
Using this method, concordances for the following words have been generated:
- language
- translation
- localization/localisation
- problem
- difficult
- rubbish
- terrible
- [censored]
- grammar
These are a mixture of the localization key words, what we observed from the wordclouds, as well as the discriminating negative words generated from the logistic regression model. The list can certainly expand depending on the information being sought.
For each concordance, we have generated 20 matching lines for further examination. Below is an example of such concordances matching the word 'language'.
Examination of the concordances for all the selected keywords shows that, although most reviews related to localization mention the lack of the users' native language in the games, an additional urgency can be detected.
This can be seen in, for example:
- "When is Polish language???????" with many question marks for emphasis
- the "!" after "there is no Russian language!"
- please add/put language of ...
- "Turkish support should be added immediately"
- "don't your hurry up and fix it"
The analysis also reveals the difficulty users experience when playing games without local language support, for example:
- "without Vietnamese it is too difficult to play"
- "difficult to understand because it is not in..."
- "difficult to understand instructions"
- "the method of Japanese mod is difficult"
- "Thai language... difficult ... to be able to play"
Users also noted issues with localization quality, for example:
- "translation is rubbish"
- "terrible translation"
- "The Chinese localization is rubbish, even worse than google translation"
- "weird grammar"
- "can not play because of missing grammar"
and potential cultural conflict, for example:
- "discriminate against Chinese"
Furthermore, given the concern that not all reviews that mention localization are negative because of a specific localization problem, it is important to find out what negative comments are being made about localization. Therefore, the following function might be useful:
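A minimal sketch of such a function, assuming a pandas dataframe with a translated text column. The function name count_expression and the column name "text" are illustrative.

```python
import pandas as pd


def count_expression(reviews, expression, text_column="text"):
    """Return the rows containing the expression and how many there are."""
    # Literal, case-insensitive match; missing texts count as no match.
    mask = reviews[text_column].str.contains(
        expression, case=False, regex=False, na=False
    )
    return reviews[mask], int(mask.sum())
```

Because the matching rows are returned along with the count, they can be grouped afterwards, for example by game ID or detected language, to see where an expression is concentrated.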
The function can of course be applied to find the frequency of any expressions, such as 'language sucks', 'bad grammar', or 'poor translation', to make the search more specific. The function and its modifications will allow the analyst to quickly locate reviews of particular patterns and perform further analysis on them.
For example, applying the function using 'bad translation' reveals there are 48 matching records. A possible next step would be to find out which reviews these are, and which games and languages they concern.
Conclusion
The analysis discussed in this article reveals the most popular languages among the reviews, as well as which games received the most reviews mentioning localization. These provide useful information for the localization company on what to focus on in their strategic planning.
An investigation of the positive and negative reviews translated into English shows that they exhibit similar structures in language use, both noting the lack of games in a specific language and the difficulty that this causes gamers.
A notable difference is that in the positive reviews, the lack of a local language version may not have impacted the enjoyment of the game. In the negative reviews, having no local language version or a bad translation adds to the barriers to gameplay, even arousing a feeling of discrimination in some cases, which explains the urgency and frustration detected.
To sum up, the top reasons for negative sentiments are:
- lack of specific language
- bad/poor/terrible translation
- difficulty understanding or playing in English
- repeated requests to add language not addressed
- discrimination/stereotypes
- bad/awkward localization
- wrong, bad, or weird grammar and punctuation use
The tools and functions described in this article allow the company to quickly distinguish which languages and games they should pay particular attention to in their localization plans. An easy-to-apply method has also been provided, which allows the company to examine the reasons for negative reviews. Using the procedure demonstrated, it is also possible for the localization company to further examine in which languages and games problems such as poor translation or even stereotypes persist, and take action accordingly.