DLGNN: A Double-layer Graph Neural Network Model Incorporating Shopping Sequence Information for Commodity Recommendation

The essence of recommendation methods for online shopping is to predict whether a user will buy a specified product. Existing methods are usually modeled on a user–item-score matrix


Introduction
With the rapid development of the mobile Internet, people's daily activities cannot be separated from their mobile devices and online information. 5G will connect people with things and things with things; that is, objects in the home, office, and city will be connected, moving toward intelligent systems. However, as the content and services provided by the Internet become increasingly rich, the large amount of redundant information leads to information overload and a poor user experience.
Many online services such as E-commerce, advertising, social networks, and personalized reading are becoming increasingly widespread with the rapid development of the mobile Internet. The core function of recommendation systems (RSs) is to predict how likely a user is to purchase an item on the basis of historical interactions such as browses, clicks, and purchases. (1) RSs rely on large datasets of historical data: a good representation model of the historical data is found, and models and algorithms are then designed for prediction and recommendation.
The two traditional kinds of recommendation methods, content-based recommendation algorithms (2,3) and collaborative filtering (CF) recommendation algorithms, (4) have shortcomings in describing sequential data: each item is treated as independent of the others, and the continuous preference information of items within a session cannot be modeled. (5) In recent years, the emergence of deep learning methods has brought new ideas for RSs. Relevant models and technologies based on deep learning have been applied to RSs, and certain breakthroughs have been achieved; deep learning techniques are currently the most powerful approach to capturing complex relations. Deep neural networks (DNNs) for RSs are also beginning to attract much attention. (6)(7)(8)(9) User attributes and item properties can be added to the models when constructing a neural network, and the impact of these external features on the final results can then be analyzed. It is easier to obtain the preferences of users from user attributes than from historical user behaviors; some attributes may even directly indicate user preferences, and the relationships between these preferences and user behaviors can be further obtained from the output of a neural network. Some existing research methods also add features extracted from relevant texts and images to the neural network. Although the accuracy of algorithms is greatly enhanced by a neural-network-based recommendation model, the effectiveness of the recommendation is not as good as expected.
Generally speaking, an RS needs a very large number of historical user behaviors, which can exceed 10000, containing all the information of the users. CF- and matrix-factorization (MF)-based methods have been proposed to deal with user shopping data, reflecting the fact that the recommendation process has begun to depend on the decisions of algorithms. Algorithms based on MF sufficiently exploit the implicit relevance contained in the user–item-score matrix. (10) Thus, user behaviors are predicted by analyzing user preferences in the historical data. However, in an actual environment, user preferences are essentially dynamic and may change with time. Changes in user preferences in the historical data and the prediction of prospective trends are both ignored in these models. To obtain durable representation vectors of user preferences, constantly updated data should be used in these methods, but this leads to various problems in real applications. On the one hand, the constantly updated dataset is usually small, and thus the reliability of the algorithm becomes low; conversely, changes in user preferences cannot be accurately captured if too much historical data are used. On the other hand, the failure time of the model cannot be predicted, and the frequency of updating the model is hard to control.
Currently, some recommendation models (11) based on a recurrent neural network (RNN) use the purchasing behavior of users to model and express changes in user behaviors. However, the RNN model is better suited to expressing the effect of predecessor nodes on a successor node. In fact, the effect of historical user behaviors on subsequent behaviors is not strictly chronological; rather, historical behaviors construct a user behavior network with cross influences. Other models (12) use a graph structure to model the click sequences of users, but these methods focus unduly on the behaviors of a single user and ignore the global level, i.e., the changes in the item audience.
For a smart city in the future, the relationships among all entities (equivalent to the commodities in this article) can be modeled using an algorithm similar to that for commodity recommendation. Commodities can be represented by sensors, people are the users, and the modeled relationship is the social relationship between people. People's choice of commodities reflects their preference for commodities, and in essence this is a kind of RS.
To solve these problems, a double-layer graph neural network (DLGNN) based on a graph neural network (GNN) is proposed as a recommendation model in this paper. The user-purchased-item sequence and purchased-item-user sequence are both used to construct the graph structure, and the GNN is applied to model these sequences to probe the transformation pattern between different nodes. The states of these transformation patterns are then used to predict user behaviors. On this basis, pretrained embedding vectors of users and commodities are used to represent their global information, and thus additional features can be incorporated to increase the amount of prediction information. The E-commerce recommendation algorithm in this paper is similar to the recommendation problem in a smart city. In summary, this work provides the following main contributions.
1) The user-purchased-item sequence and purchased-item-user sequence are both extracted from the original purchase information and are used to construct two graph structures, which can effectively express changes in user preferences and item audiences.
2) We propose the DLGNN model, a new recommendation framework based on a GNN, in which the nodes of the network express the transition states of users and items. User behaviors are modeled and predicted from the implicit vectors of these states; the GNN models the continuous user-purchased-item and purchased-item-user sequences by graph learning.
3) Our method constructs the representation vectors of users and commodities: a neural network is structured to embed user and item information. By introducing these vectors into the above neural network, both the purchase sequences and the representation vectors can be applied to predict the items purchased.
This paper is organized as follows. Section 2 presents related previous work. 
Details of the shopping prediction model based on the GNN are introduced in Sect. 3. Experiments based on a real dataset in Sect. 4 verify the effectiveness and reliability of the proposed method. Section 5 concludes this paper.

Related Work
RSs have been widely investigated. Many approaches have been proposed to address the following issue: the traditional RS uses explicit/implicit information as the input for prediction, which suffers from sparsity. Existing methods for RSs can be generally categorized into three classes: content-based models, (12) CF, (13) and hybrid recommendation methods. (14,15) Among the various methods, MF (16) is the most representative; it models the user–item interaction function as the inner product of a user latent vector and an item latent vector. However, MF-based methods lack the capability to capture hidden subtle factors, because the inner product is not sufficient for modeling the complex structure of interaction data. (17,18) In recent years, DNNs have yielded state-of-the-art performance on several tasks such as computer vision and natural language processing. Neural-network-based recommendation models have rapidly become a research hotspot, and networks with different structures have been applied to RSs: a DNN, a convolutional neural network (CNN), and an RNN were used in Refs. 19-21, respectively. Some neural network recommendation models have been successfully applied on the Internet; for example, the DNN model of YouTube (22) utilizes the advantages of neural networks for parameter training, and the constructed network effectively combines features of different dimensions. However, most works perform deep content feature learning and resort to CF, which cannot effectively model the highly complex user–item interaction relationship. Moreover, owing to the difficulty of training DNNs, existing models use a shallow architecture, limiting the expressive potential of deep learning. (18) In fact, there are two key points to be considered: 1) embedding, which transforms users and items into vectorized representations, and 2) interaction modeling, which reconstructs historical interactions on the basis of the embeddings. 
GNNs (23,24) are deep-learning-based methods that operate on a graph domain. Also, graph embedding can learn to represent graph nodes, edges, or subgraphs as low-dimensional vectors. (25) Therefore, the GNN method can be used in RSs.
A GNN is a type of neural network that is applied to analyze graph structures. Graph-based tasks such as representation learning are modeled by a GNN. A GNN based on an RNN has been proposed. (23) Adding gated recurrent units (GRUs) to GNNs makes the training more effective. (24) In some studies on RSs, GNNs are combined with traditional models. (26) There are also some methods that use a GNN directly to make predictions. (5)

Proposed Model
In this paper, a recommendation model based on a GNN is proposed. The construction of representation vectors of users and commodities is first introduced. Then, the method for constructing the graph data and the model for analyzing these data using a GNN are both presented. Finally, the shopping prediction model and its training mode are given.

Model framework
The DLGNN model predicts user ratings of a product. The framework of the DLGNN is shown in Fig. 1. The input of the model is the user's scoring matrix for the products, and the output is a score used to predict the user's purchase intention for a specified commodity. The scoring matrix is first converted into sequence data. The sequences of users and commodities are the inputs of the model and are used to construct the graph data and the GNN. In addition, to introduce global characteristic information of users and commodities into the prediction step, the embedding information of users and commodities is constructed from the relevant group data. Then, the GNN-analyzed state transfer matrices and the representation vectors of users and commodities are used to predict the scores of users for specified commodities. Feedback in the form of real scores is used to make the model converge. For a particular user, the scores of different commodities are sorted, and those with the highest scores are recommended.

Representation vector construction
The purpose of constructing representation vectors for users and commodities is to use their global information as supplementary information. The transition information between commodity sequences, which is restricted to the sequence nodes, can be obtained from the GNN; the generalization performance of the model can be enhanced by adding global information. The representation vectors of users and commodities can be seen as their representation vectors in the entire group: the people who bought the same commodities as a user characterize the unique features of that user. The model structure used to construct the representation vectors originates from the continuous bag-of-words (CBOW) model. (27) The neural network structure applied to train the user representation vectors is shown in Fig. 2.
The inputs of the neural network are one-hot vectors denoting the users who bought the same commodities as the i-th user. The hidden layer vector is obtained as

h = f(W_h x_i + b),

where x_i is the one-hot vector of user i, W_h is the weight matrix, b is the bias, and f is the activation function. Using h and the weight matrix W_o, the k-dimensional probability distribution y is expressed as

y = softmax(W_o h).

The output y of the network is a multiclass probability distribution, and the class with the largest value is the class of the predicted user u_k. The loss value is calculated using the one-hot vector of user u_k, and network convergence is performed by back-propagation. When the network converges, for a user u_k, the users who bought the same commodities as u_k are passed to the network, and the hidden layer vector h is taken as the representation vector of user u_k. For the commodities, the items bought by the same users are used to train the commodity representation vectors.
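As an illustration, the CBOW-style forward pass described above can be sketched as follows. The vocabulary size, the random initialization, and the identity activation for the hidden layer are hypothetical choices for the sketch, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, dim = 1000, 128   # hypothetical vocabulary size; 128 matches Sect. 4.5

W_h = rng.normal(scale=0.1, size=(n_users, dim))  # input-to-hidden weights
b = np.zeros(dim)                                  # hidden-layer bias
W_o = rng.normal(scale=0.1, size=(dim, n_users))  # hidden-to-output weights

def forward(context_user_ids):
    """Average the one-hot vectors of the context users, compute the hidden
    vector h, and emit a probability distribution over all users."""
    x = np.zeros(n_users)
    x[context_user_ids] = 1.0 / len(context_user_ids)  # averaged one-hot input
    h = x @ W_h + b                 # after training, h is the representation vector
    logits = h @ W_o
    logits -= logits.max()          # subtract max for numerical stability
    y = np.exp(logits) / np.exp(logits).sum()
    return h, y

# Example: users 3, 17, and 42 bought the same commodities as the target user.
h, y = forward([3, 17, 42])
assert h.shape == (dim,) and abs(y.sum() - 1.0) < 1e-9
```

After training with cross-entropy loss against the one-hot target, `h` would serve as the 128-dimensional representation vector.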
In addition to the above methods for obtaining global representation vectors, there are many representation learning algorithms for training them. Taking the node2vec method (28) as an example, relationships between users are used to construct the user diagram G(v, e). Sentences constructed by nodes are generated by a random walk in the network, and the word2vec model is then used to train the node-embedded expression. However, a large amount of resources is needed when dealing with a large-scale user graph and item graph.

Design of graph
The graph data are constructed on the basis of two types of shopping sequence, the item sequence of a single user and the user sequence of a single item; the final user–item graph is synthesized from these two sequences. The shopping sequence of a user is a list of all recently purchased commodities representing the changes in user preferences; this sequence can be seen as a local graph userGraph = (V_item, E_item), where the nodes represent the recently purchased commodities and a directed edge e_ij between nodes represents the fact that the user purchased v_j after v_i. Note that the length of the sequence should be limited, e.g., to the last r days or the last k purchase records, to prevent outdated historical preferences from reappearing and introducing noise into the model.
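The construction of one user's local graph can be sketched as follows; the function name, the edge-weight convention (counting repeated transitions), and the default truncation length are illustrative assumptions.

```python
from collections import defaultdict

def build_user_graph(purchases, k=15):
    """Build one user's local item graph from the last k purchases.
    `purchases` is a chronological list of item ids; an edge (v_i, v_j)
    means the user bought v_j immediately after v_i."""
    seq = purchases[-k:]                 # truncate to limit stale-preference noise
    nodes = list(dict.fromkeys(seq))     # unique items, order preserved
    edges = defaultdict(int)
    for v_i, v_j in zip(seq, seq[1:]):
        edges[(v_i, v_j)] += 1           # weight = how often the transition occurred
    return nodes, dict(edges)

nodes, edges = build_user_graph(["A", "B", "A", "C"])
assert nodes == ["A", "B", "C"]
assert edges == {("A", "B"): 1, ("B", "A"): 1, ("A", "C"): 1}
```

The user sequence of a single item can be built symmetrically, with users as nodes and edges ordered by purchase time.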
The user sequence of a single item represents the changes in the item audience. Similar to the shopping sequence of a user, the nodes of this network represent the recent users who have purchased item v_k, and a directed edge e_pq between nodes represents the fact that user u_q purchased item v_k after user u_p.
Local graphs of users and commodities capture only partial views of the changes in user preferences and item audiences; these local graphs can be spliced into a global graph. The model cannot be fully trained using only local graphs, but arbitrary training samples can be constructed from the global graph to fully train the model.

Description of GNN
In the last ten years, many methods for analyzing graph data have been proposed, and neural-network-based algorithms are a research hotspot. The gated GNN (GGNN) is an improved version of the traditional GNN that is more suitable for modeling the transitions between sequence nodes. In this paper, the GGNN is used to analyze the node transitions of the shopping sequences of users and commodities. Two synchronously updated GGNN networks are designed to handle these two shopping sequences separately. Here, we take a shopping sequence of users with a length of four as an example. Let matrix A denote the adjacency matrix of the sequence, as shown in Fig. 3. The adjacency matrix is constructed from the incoming and outgoing edges of the nodes, and the transition relationships are represented in matrix form.
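One plausible construction of the adjacency matrix from a sequence, following the common convention for gated GNNs of concatenating degree-normalized outgoing and incoming connection matrices, can be sketched as follows; the normalization scheme and the function name are assumptions of this sketch.

```python
import numpy as np

def sequence_to_adjacency(seq):
    """Convert a purchase sequence into a connection matrix
    A = [A_out | A_in] built from outgoing and incoming edges."""
    nodes = list(dict.fromkeys(seq))          # unique nodes, order preserved
    idx = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    counts = np.zeros((n, n))
    for u, v in zip(seq, seq[1:]):            # consecutive purchases form edges
        counts[idx[u], idx[v]] += 1
    out_deg = counts.sum(axis=1, keepdims=True)
    a_out = np.divide(counts, out_deg, out=np.zeros((n, n)), where=out_deg > 0)
    in_deg = counts.sum(axis=0, keepdims=True)
    a_in = np.divide(counts, in_deg, out=np.zeros((n, n)), where=in_deg > 0).T
    return nodes, np.concatenate([a_out, a_in], axis=1)

# A length-four sequence with one repeated item yields three unique nodes.
nodes, A = sequence_to_adjacency(["v1", "v2", "v3", "v2"])
assert A.shape == (3, 6)
```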
The item representation vectors containing the transition relationships are constructed by the following expression:

a_t = A_t [v_1, v_2, ..., v_n]^T H + b,

where a_t is the aggregated message for the sequence of item representation vectors, A_t is the row of the adjacency matrix corresponding to the current node, and H is a weight matrix. Each item representation is a d-dimensional vector that carries the transition information between the corresponding item and the other items. When an RNN is used to process a_t, the update gate and reset gate of the GRU select the necessary information in a_t and activate the neural cells. The calculation process is as follows:

z_t = σ(W_z a_t + U_z v_{t−1}),
r_t = σ(W_r a_t + U_r v_{t−1}),
ṽ_t = tanh(W_o a_t + U_o (r_t ⊙ v_{t−1})),
v_t = (1 − z_t) ⊙ v_{t−1} + z_t ⊙ ṽ_t.

Here, z_t and r_t are the update and reset gate vectors, respectively, σ is the sigmoid function, and ⊙ is the elementwise multiplication operator. v_t denotes the latent vector of the sequence node V_item,t: the hidden state of the previous time step and the present transition state are combined to update the present hidden state. The output vector v_k of the final sequence node V_item,k represents the hidden state of all user changes for the final item after a series of purchases. Similarly, for the user sequences, the GGNN output vector denotes the state vector of users after all changes in the item audience. In the actual model, two GNNs with the same structure process the sequences of users and items simultaneously.
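The gated update for a single node can be sketched as follows, assuming the standard GRU gating equations; the latent dimension and the random parameter initialization are placeholders for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8  # latent dimension (illustrative; the paper uses 128)

# Gate parameters, randomly initialized for this sketch.
W_z, U_z = rng.normal(size=(d, d)), rng.normal(size=(d, d))
W_r, U_r = rng.normal(size=(d, d)), rng.normal(size=(d, d))
W_o, U_o = rng.normal(size=(d, d)), rng.normal(size=(d, d))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(a_t, v_prev):
    """One gated update: a_t is the message aggregated through the adjacency
    matrix, and v_prev is the node's latent vector from the previous step."""
    z_t = sigmoid(W_z @ a_t + U_z @ v_prev)              # update gate
    r_t = sigmoid(W_r @ a_t + U_r @ v_prev)              # reset gate
    v_tilde = np.tanh(W_o @ a_t + U_o @ (r_t * v_prev))  # candidate state
    return (1 - z_t) * v_prev + z_t * v_tilde            # new hidden state

v_t = gru_step(rng.normal(size=d), rng.normal(size=d))
assert v_t.shape == (d,)
```

Iterating this step along the sequence yields the final node vector that summarizes all transitions.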

Prediction step
After the processing of the two GNNs, the output vectors S_user,i and S_item,j of the user sequence and item sequence are obtained, where S_item,j represents the cumulative state of the audience transition of item j and S_user,i denotes the cumulative state of the purchase-preference transition of user i. The pretrained representation vector E_item,j denotes the representation of item j over all items; similarly, the representation vector E_user,i denotes the representation of user i over all users. The score of user i for item j is predicted from these four vectors alone. To solve the problem that the dimensions of the state-transition vector and the representation vector are not the same, a feature-fusion computation is used, and the feature vectors representing the user and the commodity are obtained as

f_item,j = g(W_emb E_item,j + W_tran S_item,j),
f_user,i = g(W_emb E_user,i + W_tran S_user,i),

where W_emb and W_tran control the weights of these two vectors and g denotes activation and normalization. A fully connected neural network layer is then used to process the fused context features for regression prediction; the equation for predicting scores can be expressed as

ŷ = W_f [f_user,i ; f_item,j] + b_f.

The model is trained by continuous optimization to make its output approach the real scores. The loss function is defined from the real scores y and predicted scores ŷ as

L = (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)².

The stochastic gradient descent algorithm and back-propagation are used to minimize the loss function until the model converges.
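A minimal sketch of the fusion and regression step follows; the shared projection weights, the tanh activation, the function names, and the dimensions are illustrative assumptions rather than the paper's exact configuration.

```python
import numpy as np

rng = np.random.default_rng(2)
d_tran, d_emb, d_f = 128, 128, 64   # illustrative dimensions

W_tran = rng.normal(scale=0.1, size=(d_f, d_tran))  # projects the GNN state vector
W_emb = rng.normal(scale=0.1, size=(d_f, d_emb))    # projects the global embedding
W_fc = rng.normal(scale=0.1, size=2 * d_f)          # final regression layer
b_fc = 0.0

def fuse(S, E):
    """Project the state-transition vector S and the representation vector E
    into a common space and combine them with an activation."""
    return np.tanh(W_tran @ S + W_emb @ E)

def predict_score(S_user, E_user, S_item, E_item):
    """Regress a score from the concatenated fused user and item features."""
    f = np.concatenate([fuse(S_user, E_user), fuse(S_item, E_item)])
    return float(W_fc @ f + b_fc)

def mse_loss(y, y_hat):
    """Mean squared error between real and predicted scores."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean((y - y_hat) ** 2))
```

In training, `mse_loss` would be minimized over mini-batches by stochastic gradient descent.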

Experiments and Analysis
To evaluate the performance of the model, the real dataset used and some existing recommendation methods are first introduced in this paper. The practical effects of the proposed algorithm are then compared with those of the baseline model by considering different evaluation indicators. The effects of embedded vectors of users and commodities on the proposed model are also considered. Finally, the training process and its characteristics are discussed.

Dataset
A real dataset from the well-known Amazon.com is used to evaluate the performance of the proposed algorithm. As a global e-commerce website, Amazon has large, comprehensive datasets (29) with favorable statistical properties, such as a homogeneous distribution and a lack of bias. The Movies and TV dataset is used in this paper; it contains the scores of users for specified commodities and the purchase timestamps. The time span of the data is from 1996 to 2014. The scale of the dataset is listed in Table 1.
The average numbers of purchases are also given for the Movies and TV dataset: each item is purchased approximately 75.3 times on average. From this, it can be inferred that the lengths of the user and item sequences are sufficient for the proposed method, i.e., that changes in user preferences and the item audience are contained in them. This dataset is also suitable for the baseline methods.

Baseline
The following algorithms are used to evaluate the effectiveness of the proposed DLGNN. 1) Bayesian probabilistic matrix factorization (BPMF): (30) BPMF is based on the factorization of the probability matrix and uses the Markov chain Monte Carlo (MCMC) method for approximate inference. 2) SVD++: (31) SVD++ is based on MF and implicit feedback; during the factorization of the user score matrix, the implicit feedback of scores is also considered. 3) TimeSVD: (32) On the basis of SVD++, the impact of time on the purchase is introduced. 4) Factorization machines (FMs): (33) FMs are a type of machine learning algorithm related to MF that can be used for regression prediction of user scores; the crossing relations of different features are expressed by implicit vectors.

Evaluation metrics
The indicators used to evaluate the DLGNN and the baseline methods are defined as follows. 1) RMSE: The root mean square error (RMSE) indicates the bias between the predicted values and real values and is a commonly used evaluation indicator:

RMSE = sqrt( (1/N) Σ_{i=1}^{N} (y_i − ŷ_i)² ).
2) MAE: The mean absolute error (MAE) is expressed as

MAE = (1/N) Σ_{i=1}^{N} |y_i − ŷ_i|.

The actual magnitude of the prediction error is better indicated by the MAE.
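The two metrics above can be computed directly; this is a straightforward implementation of the standard definitions.

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error between real and predicted scores."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error between real and predicted scores."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.mean(np.abs(y - y_hat)))

# A single one-point error on three scores.
assert abs(rmse([4, 3, 5], [3, 3, 5]) - (1 / 3) ** 0.5) < 1e-12
assert abs(mae([4, 3, 5], [3, 3, 5]) - 1 / 3) < 1e-12
```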

Performance evaluation
The following parameter setup is used in the experiments conducted. The dimension of the trained representation vectors for users and commodities is 128. The lengths of the purchase sequences are set to 15. The length of the GNN output vector after processing the sequence data is set to 128. A Gaussian distribution is used to initialize the parameters of the neural network, and the Adam optimizer with a learning rate of 0.001 is used for training. The comparison results are listed in Table 2, from which it can be seen that the DLGNN yields the smallest RMSE and MAE. The performance of BPMF is the lowest because no latent information is utilized. FM, SVD++, and TimeSVD introduce latent vectors into the model, which increase the amount of information in the model and improve the accuracy of the results by training implicit information. This comparison illustrates that the more information used in the model, the better the result achieved. Although TimeSVD utilizes information on the purchase time, it does not work well because it does not consider changes in user preferences: a user's old shopping records interfere with the prediction of new items owing to those changes. This is a drawback of methods based on user–item-score MF; such methods rely too much on the logic in the matrix and cannot capture the transfer of user interest. Although FM introduces more latent information, the choice of data features has a major impact on its results. The DLGNN constructs purchase sequences from the user–item-score matrix to analyze changes in user preferences and product audiences, which allows it to capture external changes in users and commodities, resulting in the higher accuracy of the model.

Analysis of representation module
In the proposed DLGNN method, the pretrained representation vectors of users and commodities are used as global information; this information and the output vectors of the GNN are combined to generate the prediction of user scores. Here, the specific impact of the representation vectors on the final results is analyzed. The full DLGNN model with global representation vectors and a fragmented DLGNN model without them are constructed separately for comparison; the fragmented model directly splices the two GNN outputs and uses linear regression to obtain predictions. The changes in the loss of these two models are shown in Fig. 4.

Analysis of sequence length
The length of the purchase sequence is expected to affect the convergence and performance of the training process. Purchase sequences with three different lengths are used to train the proposed model, and the detailed results are listed in Table 3. It can be seen that both overly long and overly short sequences have a negative impact on the model. When the sequence is too short, the user's purchase-preference information and the product's audience information contained in the sequence are insufficient. When the sequence is too long, the transfer of users and items occurs more than once, which shifts the final GNN output. In our follow-up work, we will additionally focus on the processing of datasets with long purchase sequences. The suitable sequence length is not fixed and should be adjusted according to actual conditions: for a single product type, the sequence length should be set shorter; if the variety of goods is large, it should be set longer.

Conclusion
A recommendation model, DLGNN, based on a GNN is proposed in this paper. The user-purchased-item sequence and purchased-item-user sequence are both used to construct the graph structure, and the GNN is applied to model these sequences to probe the transformation pattern between different nodes. On this basis, the pretrained embedding vectors of users and commodities are used to represent their global information. Experiments based on a real dataset show that the proposed method yields better performance than existing recommendation algorithms.