Combination of Self-organizing Map and k-means Methods of Clustering for Online Games Marketing



Introduction
Owing to the popularity of mobile networks and the increasing availability of broadband, the Internet has come to play an important role in people's lives. According to the 2017 "Taiwan Broadband Network Usage Survey" published by the Taiwan Network Information Center (TWNIC), the total number of Internet users in Taiwan reached 17.6 million. Because of the popularity of the mobile Internet, Taiwan's Internet population is approaching the country's total population. In a 2017 survey of game players in Taiwan by the Market Intelligence & Consulting Institute (MIC), (1) 75.5% of players were most interested in mobile/tablet app games among the online game types in the digital game category. This share far surpasses that of computer online games in second place (27.5%), followed by computer web games (25.5%), computer stand-alone games (21.5%), and video console games (16.5%). Although virtual reality (VR) games have enjoyed high exposure in recent years and are beginning to gain a following, their share of 2.6% shows that they still have a considerable distance to go before joining the mainstream. The global digital game market is booming and rife with business opportunities, and it serves as an outlet for applications of new technologies such as VR, augmented reality (AR), and 5G. With the spread of platforms such as game consoles, computers, and mobile devices, games have permeated players' lives and driven related industries such as live streaming and advertising placement. Taiwan has a thriving game industry with active users and live-stream audiences, and its development potential is considerable. (2) In this study, the term online game refers generally to digital games played through mobile, wireless, and wireline media. The survey results above show that online game players account for a considerable proportion of the general population.
Therefore, in the highly competitive online game market, it is important for game manufacturers to understand the diverse attributes of their customers and their preferences for related products in order to expand the video game market. Because products and resources are limited, a marketing strategy cannot cover every potential market; instead, the market must be segmented effectively to find the best market position (niche). Using an online questionnaire, we collect the opinions and preferences of online game customers and then explore the data by clustering. The data are divided into groups with different attributes, allowing us to separate customers effectively and assess the contribution of each attribute group. By then selecting the most suitable potential customers according to given indicators or constraints, a manufacturer can implement appropriate marketing strategies for them. At present, machine learning is the mainstream approach to clustering data. Machine learning methods include artificial neural networks (ANNs), such as the adaptive resonance theory (ART) network and the self-organizing map (SOM). In addition, cluster analysis methods from data mining, such as the k-means algorithm, aim for "the highest similarity within groups and the lowest similarity between groups", which allows them to segment customers effectively and uncover the best market position. For commercial applications, cluster analysis therefore supports better decision making and improves the effectiveness of customer relationship management. This study can also be applied to Internet of Things (IoT) development in marketing, since every customer can be treated as a target or an item for market segmentation. An interface module could be developed to aggregate the data collected from every customer, acting like a sensor that detects the intentions of a customer browsing the Internet. 
In the future, with SOM and k-means as the core of the system, we can develop a complete IoT system for marketing segmentation.
The paper is organized as follows: Sect. 2 introduces the SOM algorithm from machine learning, and Sect. 3 introduces the k-means algorithm for exploratory data clustering. In Sect. 4, we explain the importance of market segmentation in the operation of the online game industry and review recent related research. The integration of SOM and k-means for online game customer segmentation is discussed in Sect. 5. Research results are discussed in Sect. 6, and conclusions are presented in Sect. 7.

SOM
SOM is an unsupervised learning network model in the field of machine learning, introduced by Kohonen and described in his 1990 paper. (3) The basic principle of the SOM network derives from a characteristic of brain structure: brain cells with similar functions gather together, so that "similar things come together". SOM imitates this characteristic. Its output processing units influence one another during learning, so that neighboring output units come to have similar functions, that is, similar connection weights. (4) The SOM architecture consists of an input layer and an output layer. Each input-layer neuron represents one attribute of the input data, and these neurons are independent of each other. The output layer is commonly arranged as a two-dimensional grid, and every output neuron is connected to all input-layer neurons; the weights of these connections form that output neuron's weight vector.
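The learning behavior described above can be illustrated with a minimal sketch. It assumes a rectangular two-dimensional grid, a Gaussian neighborhood, and a linearly decaying learning rate and radius; these are common textbook choices, not the settings used in this study, and the function names `train_som` and `best_matching_unit` are hypothetical.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the grid position (row, col) whose weight vector is closest to x."""
    return min(weights, key=lambda node: sum((w - v) ** 2 for w, v in zip(weights[node], x)))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Train a rows x cols SOM on a list of equal-length numeric vectors (illustrative sketch)."""
    rng = random.Random(seed)
    sigma0 = max(rows, cols) / 2.0
    # Each output node on the 2-D grid holds one weight vector, initialized from a random sample.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        samples = list(data)
        rng.shuffle(samples)
        for x in samples:
            frac = step / total
            lr = lr0 * (1.0 - frac)              # learning rate decays linearly toward 0
            sigma = sigma0 * (1.0 - frac) + 0.5  # neighborhood radius shrinks toward 0.5
            bmu = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - bmu[0]) ** 2 + (c - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma ** 2))  # Gaussian neighborhood on the grid
                # Pull this node's weights toward x; nodes near the winner move the most,
                # which is what makes neighboring nodes end up with similar weights.
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights
```

Because the winner's grid neighbors are also pulled toward each input, nearby nodes end up with similar weight vectors; querying `best_matching_unit` for a new vector then tells which region of the map it belongs to.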
In 2000, Vesanto and Alhoniemi (5) proposed clustering the SOM itself, rather than the raw data, to obtain better data clustering and reduce computation time. In 2005, Bacao et al. (6) proposed that SOM can be used as a substitute for k-means in data clustering. In recent years, research and applications in related fields have increasingly indicated that combining SOM and k-means yields a better data clustering method. (7)(8)(9)
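The two-level idea of clustering the SOM prototypes instead of the raw records can be sketched as follows. This is an illustrative assumption rather than Vesanto and Alhoniemi's exact procedure: the prototypes are taken as given from an already trained map, average-linkage agglomerative clustering is used for the second level, and `average_linkage` and `two_level_labels` are hypothetical names.

```python
import math


def average_linkage(prototypes, k):
    """Agglomerative clustering of prototype vectors: repeatedly merge the two clusters
    with the smallest average pairwise Euclidean distance until k clusters remain."""
    clusters = [[i] for i in range(len(prototypes))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(math.dist(prototypes[i], prototypes[j])
                            for i in clusters[a] for j in clusters[b])
                d = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # b > a, so deleting b leaves index a valid
        del clusters[b]
    labels = [0] * len(prototypes)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def two_level_labels(prototypes, data, k):
    """Level 1 (assumed done): prototypes from a trained SOM. Level 2: cluster the
    prototypes, then give each record the cluster of its best-matching prototype."""
    proto_labels = average_linkage(prototypes, k)
    result = []
    for x in data:
        bmu = min(range(len(prototypes)), key=lambda i: math.dist(x, prototypes[i]))
        result.append(proto_labels[bmu])
    return result
```

Because a map usually has far fewer nodes than there are records, the pairwise merging runs on a small set, which is where the saving in computation time comes from.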

Clustering Analysis by k-means Method
k-means is the simplest and most commonly used clustering algorithm employing a squared error criterion. (10) It starts with a random initial partition and keeps reassigning the patterns to clusters on the basis of the similarity between each pattern and the cluster centers until a convergence criterion is met (e.g., no pattern is reassigned from one cluster to another, or the squared error ceases to decrease significantly after a number of iterations). Clustering methods can be divided into two major families, hierarchical clustering and partitional clustering, as shown in Fig. 1; k-means belongs to the partitional family.
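A minimal sketch of the squared-error k-means procedure described above. It initializes by picking k random patterns as centers (a common variant of the random initial partition) and stops when no pattern is reassigned or the squared error no longer decreases by more than a small tolerance `tol`; the function names and default values are illustrative assumptions.

```python
import random


def squared_error(points, centers, labels):
    """Sum of squared Euclidean distances from each point to its cluster center."""
    return sum(sum((p - c) ** 2 for p, c in zip(x, centers[l])) for x, l in zip(points, labels))


def kmeans(points, k, max_iter=100, tol=1e-9, seed=0):
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # k random patterns as initial centers
    labels, prev_err = None, float("inf")
    for _ in range(max_iter):
        # Assignment step: each pattern goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: sum((p - c) ** 2 for p, c in zip(x, centers[j])))
                      for x in points]
        # Update step: each center becomes the mean of its members (empty clusters keep their center).
        for j in range(k):
            members = [x for x, l in zip(points, new_labels) if l == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
        err = squared_error(points, centers, new_labels)
        # Convergence: no reassignment, or the squared error stopped decreasing noticeably.
        if new_labels == labels or prev_err - err < tol:
            labels = new_labels
            break
        labels, prev_err = new_labels, err
    return centers, labels, squared_error(points, centers, labels)
```

Because the result depends on the random initial centers, k-means is typically run several times with different seeds, and the run with the smallest squared error is kept.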

Marketing Segmentation in Online Games Industry
Market segmentation is the process of dividing the mass market into groups with similar needs and wants. (11) The rationale for market segmentation is that, to gain a competitive advantage and superior performance, firms should "(1) identify segments of industry demand, (2) target specific segments of demand, and (3) develop specific 'marketing mixes' for each targeted market segment". (12) From an economic perspective, segmentation is based on the assumption that heterogeneity in demand allows demand to be disaggregated into segments with distinct demand functions. (13) Because of the boom in online games, the market is fiercely competitive, and much research on the online game industry is being carried out. Because the Taiwanese game industry has largely fallen on hard times, it remains important to understand the characteristics of players, and marketing is of the utmost importance. From the responses to online questionnaires on the preferences of online game players, Lee elucidated players' decision-making process when choosing games. (14) The results of two-stage clustering showed a large gap between the views of players and those of online game companies. (14) The online game market characteristics identified through such research can serve as a reference for marketing. Through a survey of the online game industry, Ren and Hardwick (14) classified 11 online game makers in Taiwan into three types, (1) leaders, (2) challengers, and (3) followers, on the basis of the online game strategies they evaluated, as a business strategy reference for similar operators.

Proposed model for marketing segmentation
Effective market segmentation is an important marketing strategy for online game operators. In the past, market segmentation relied on questionnaire surveys, whose responses were analyzed by factor analysis and related multivariate statistical methods. In the era of big data, large amounts of data are collected through the Internet, and machine learning and data exploration have begun to be used for analysis and application. Kuo and Liao (15) proposed a clustering model that combines ART with k-means to determine the attributes of online game players; the number of clusters obtained by ART (A) coincided with the number of groups obtained by k-means (B). Common attribute groups could thus be identified, giving vendors a means of better customer relationship management.
On the basis of the above literature, although the k-means algorithm is the simplest and most commonly used clustering method, it requires the number of clusters to be specified in advance, and its result depends on the initial centers randomly selected by the software; without suitable choices, a meaningful clustering may not be obtained. Furthermore, the accuracy of clustering depends mainly on the researcher's experience in applying the algorithm appropriately. SOM-like neural models have been reported to be more accurate than traditional multivariate methods. (16) Zhang and Tsay (17) proposed a two-stage clustering strategy to improve the accuracy of grouping. In the first stage, SOM is used to map high-dimensional data to a low-dimensional space so that the data structure is easy to understand. The software "Viscovery SOMine" is used to train the map to convergence, with the aim of reaching a global optimum, and to obtain S, the number of rough groups. In the second stage, the k-means algorithm of the Weka software is used for cluster analysis, with the number of clusters K set equal to S. The purpose is to find the characteristic pattern of the data so that the differences between attributes within the same group are as small as possible. We propose to integrate the SOM and k-means methods into the SOM-k architecture, as shown in Fig. 2.
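The two-stage SOM-k flow can be sketched as below, assuming a trained map is already available as a grid of prototype vectors. How SOMine derives the number of rough groups S is not described here, so `estimate_rough_groups` uses a hypothetical stand-in rule: it counts connected regions of 4-neighboring prototypes (on a rectangular grid) that lie closer than a threshold. Stage 2 then runs k-means on the original data with K = S, using a deterministic farthest-first initialization so that the sketch does not depend on a random seed. All function names and the threshold are hypothetical.

```python
import math


def estimate_rough_groups(grid, threshold):
    """HYPOTHETICAL stage-1 rule: count connected regions of the SOM grid in which
    4-neighboring prototypes lie closer than `threshold`."""
    rows, cols = len(grid), len(grid[0])
    seen, groups = set(), 0
    for r in range(rows):
        for c in range(cols):
            if (r, c) in seen:
                continue
            groups += 1
            seen.add((r, c))
            stack = [(r, c)]
            while stack:  # flood-fill one region
                i, j = stack.pop()
                for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                    if (0 <= ni < rows and 0 <= nj < cols and (ni, nj) not in seen
                            and math.dist(grid[i][j], grid[ni][nj]) < threshold):
                        seen.add((ni, nj))
                        stack.append((ni, nj))
    return groups


def farthest_first(points, k):
    """Deterministic initialization: start from the first point, then repeatedly add
    the point farthest from all centers chosen so far."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda x: min(math.dist(x, c) for c in centers))
        centers.append(list(far))
    return centers


def kmeans(points, k, max_iter=100):
    centers, labels = farthest_first(points, k), None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(x, centers[j])) for x in points]
        if new_labels == labels:  # no reassignment: converged
            break
        labels = new_labels
        for j in range(k):
            members = [x for x, l in zip(points, labels) if l == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


def som_k(grid, data, threshold):
    s = estimate_rough_groups(grid, threshold)  # stage 1: S from the trained map
    centers, labels = kmeans(data, s)           # stage 2: k-means with K = S
    return s, centers, labels
```

The division of labor mirrors the text: the map only supplies the number of groups, while the final memberships come from k-means on the preprocessed data.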

Data description for SOM-k
The data for online game marketing were collected through both online and offline surveys in 2011. Of the 473 responses received, 438 were valid. The attributes of the data are listed in Table 1.

Data preprocessing
The attributes in this study include both numerical and categorical data types, but the SOM clustering algorithm can process only numerical data. To address this, Zhang and Tsay (17) used a co-occurrence clustering method that considers the co-occurrence relationships between the categories of attributes in the data set and assigns numerical values to all categorical attributes according to these relationships, converting the mixed data into fully numerical data. After this conversion, the data can be grouped by various clustering methods; we adopt this method for the numerical conversion of our data.
First, the numerical data are normalized before the SOM algorithm is run, so that attributes with large value ranges do not dominate the distance calculation and the clustering outcomes are reliable. The normalization formula is
$$x_i' = \frac{x_i - x_i^{\min}}{x_i^{\max} - x_i^{\min}}$$
where $x_i$ is an original value of attribute $i$, $x_i'$ is its normalized value, $x_i^{\max}$ and $x_i^{\min}$ are the maximum and minimum values of the original data in attribute $i$, and $i$ indexes the attributes; for example, for the original data attributes {A, B, C}, i = A, B, C. The normalized data of the attributes "Age" and "Monthly expense" are shown in Tables 2 and 3, respectively.
Second, the categorical data are transformed into numerical data by the co-occurrence clustering method. This is a complex and tedious procedure. (18) However, the relationships between attribute categories that it derives from the mixed data play a role similar to that of distances between data objects. After the above procedure is completed, the categorical attributes are digitized as shown in Tables 4 to 6.
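The min-max normalization above is straightforward to compute. For the categorical attributes, the sketch below substitutes a deliberately simplified, hypothetical encoding: each category is replaced by the mean of a normalized numerical attribute over the records in which that category occurs. It only illustrates the idea of deriving numbers from co-occurrence and is not Zhang and Tsay's method; the attribute values in the test are made up.

```python
def min_max_normalize(column):
    """Map values of one numerical attribute to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant attribute: avoid division by zero
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


def cooccurrence_encode(categories, numeric):
    """HYPOTHETICAL simplified stand-in: encode each category as the mean of a
    normalized numerical attribute over the records in which it co-occurs."""
    sums, counts = {}, {}
    for cat, value in zip(categories, numeric):
        sums[cat] = sums.get(cat, 0.0) + value
        counts[cat] = counts.get(cat, 0) + 1
    return [sums[cat] / counts[cat] for cat in categories]
```

For example, with ages [15, 20, 25, 35], `min_max_normalize` returns [0.0, 0.25, 0.5, 1.0], so the youngest respondent maps to 0 and the oldest to 1.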

SOM-k exploration results
After the above preprocessing steps, the 438 samples are fed into SOM-k for cluster exploration. Among the SOM parameters, the map size is the number of nodes generated; the more nodes, the more detailed the map. The nodes are arranged in a hexagonal topology by default. The smaller the average of the attributes, the more data space is required; this value typically lies between 0.5 and 2 normalized units. The training converged at a good rate. The results show that the optimal number of clusters is 3. The network topology space is shown in Fig. 3.
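The hexagonal arrangement of map nodes can be illustrated with a short sketch: odd rows are shifted by half a unit so that each interior node has six equidistant neighbors instead of the four of a rectangular grid. The coordinate convention and the names `hex_positions` and `hex_neighbors` are assumptions for illustration, not Viscovery SOMine's internal layout.

```python
import math


def hex_positions(rows, cols):
    """Cartesian coordinates of SOM nodes on a hexagonal lattice with unit spacing;
    odd rows are shifted right by half a unit."""
    return {(r, c): (c + 0.5 * (r % 2), r * math.sqrt(3) / 2)
            for r in range(rows) for c in range(cols)}


def hex_neighbors(positions, node):
    """Nodes exactly one unit away from `node`, i.e., its immediate hexagonal neighbors."""
    x0, y0 = positions[node]
    return sorted(n for n, (x, y) in positions.items()
                  if n != node and abs(math.hypot(x - x0, y - y0) - 1.0) < 1e-9)
```

Distances computed from these coordinates can replace the rectangular grid distance in a neighborhood function, so that the winner's influence spreads evenly in six directions.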
After SOM has determined the optimal number of clusters, this number is used as the number of clusters K for k-means, and k-means clustering is performed to obtain the final clustering result. Table 7 shows the center (average attribute values) of each of the three clusters.

Results and Discussion
The results of the SOM-k cluster exploration are summarized in Table 8. Age is the most significant attribute distinguishing the three groups, which can be characterized as follows.
Group 1: University students: age 19 to 24 years, students, prefer "RO2".
Group 2: Social youth group: new members of the workforce, college level, age 19 to 24 years, prefer "World of Warcraft".
Group 3: Junior high/high school students: prefer "Maple Story".
We compared these results with the 2011 game rankings by experienced players (18), shown in Table 9.
The social youth group prefers World of Warcraft, which experienced players ranked first. The new Ragnarok Online game (RO2), preferred by the university student group, ranked second among experienced players. The junior high/high school student group prefers Maple Story, which experienced players ranked 7th. Because this ranking reflects the judgment of experienced players, Maple Story, a simple, easy-to-play game whose players are mostly younger, is unlikely to rank highly. Moreover, most players in this study were 19 to 29 years old, so it is to be expected that the game favored by the younger group does not rank high. In summary, this study identified three clusters, two of which are associated with the games ranked first and second by experienced players; this agreement supports the validity of our grouping with respect to the popularity of actual online game products. The current market situation is consistent with our results.

Conclusions
In this study, we applied a two-stage clustering method that combines the SOM network with the k-means algorithm. The number of clusters of online game players was determined, and the clusters were named junior high/high school students, university students, and social youth. The ranking of games by experienced players indicated that the clustering results are reliable. Once the data are grouped, marketing decisions can be made or adjusted on the basis of the clustering results. Suppose that a manufacturer is developing a new online game similar in type to Maple Story. To attract more potential customers, the manufacturer can promote the game mainly through TV shows or popular websites often watched by high school students, who are the main targets. In contrast, if the game maker advertises in professional forums or on financial channels, the publicity benefits will be greatly reduced. On the basis of these results, manufacturers can achieve better customer management and effective promotion in appropriate places while reducing the time and resources spent in inappropriate places, thus avoiding the waste of resources. Such targeting should be part of every company's publicity plans. The clusters also show that, among today's diverse online games, World of Warcraft and the new Ragnarok Online (RO2) have consistently been among the most popular games on game discussion boards. These two online games are still the ones most players like, and game makers can design new games on the basis of a thorough analysis of their architecture or popular attributes.
In this study, the characteristics of players of large-scale online games were examined. Because of time and financial constraints, we did not explore online games on media such as Facebook; such exploration and comparison are left for future research. Since online game players are mostly students, increasing the number of samples in the future would allow the player characteristics in the game market to be better represented.
In the future, the following concept could be used to develop a complete IoT system for marketing segmentation. Since every customer is a target or an item for marketing segmentation, modules that aggregate the data collected from every customer resemble sensors that detect the intentions of customers browsing the Internet. By deploying such virtual sensors across web browsing, the decision making in marketing segmentation could be automated in web systems.
He is also the president of the International Institute of Knowledge Innovation and Invention (IIKII) and the chair of the IEEE Tainan Section Sensors Council. In recent years, he has published more than 100 SCI and SSCI papers.