Computed Tomography Image Recognition with Convolutional Neural Network Using Wearable Sensors

1School of Information Engineering, Jimei University, Xiamen, Fujian 361021, China 2Information and Engineering College, Shenzhen University, Shenzhen, Guangdong 518060, China 3School of Information and Communication, Griffith University, Gold Coast, QLD 4222, Australia 4Fujian Shipping Research Institute, Xiamen, Fujian 361021, China 5Department of Business Administration of Chaoyang University of Technology, Taizhong 413310, Taiwan 6Department of Industrial Engineering and Management, Chaoyang University of Technology, Taizhong 413310, Taiwan 7Department of Information Management, Chaoyang University of Technology, Taizhong 413310, Taiwan


Introduction
To employ preventative measures for diseases, early regular health checks and monitoring are required. However, devices used for these routine checkups in hospitals are bulky electronic systems, and imaging devices are attached to computers and consume large amounts of power. As a result, they are not usable at home. Electronic wearable devices or gadgets worn by patients ubiquitously and continuously capture or track biometric information related to health or fitness. These wearable devices are a better alternative to monitoring health and performing basic health checks. In addition, multiple wearable devices may be connected to a server, which allows the centralized monitoring of multiple patients at once.
A wearable belt-type electrical impedance tomography (EIT) system (1) has been proposed for monitoring lung health. The image reconstructed from an EIT system can be displayed on a mobile application in smart devices, such as tablet computers or smartphones. In this study, we use a method similar to this reconstruction to obtain images and we utilize a convolutional neural network (CNN) to subsequently analyze them.
Medical images play an important role in clinical disease prediction, classification, and treatment. Two-dimensional (2D) chest computed tomography (CT), which provides a variety of information, is very useful for the diagnosis of lung cancer and other diseases. Traditional medical image discrimination and diagnosis mainly rely on experienced doctors. Doctors are subject to human error as a result of distraction, stress, and fatigue, and some CT images may be misread and the patients misdiagnosed.
Deep learning is a popular research direction of artificial intelligence, which is widely used in computer vision, speech processing, big data analysis, and so on, because of its ability to accurately predict and model trends given a sufficient amount of training data. Deep learning is gradually being adopted in medical fields. It functions by building a multilayered network in which computers can automatically discover the representations needed for feature detection or classification from raw data. These rich features are called high-dimensional abstract data.
CNNs (2) are noninvasive and may assist doctors to make more informed diagnoses and adopt more targeted treatment methods. Deep learning has often been used for the detection of anomalies in 2D chest X-ray images. In Ref. 3, using a combination of a CNN and a recurrent neural network (RNN), features that indicate abnormal areas were identified, enabling types of lung abnormality to be determined accurately. The possibility of designing a computer-aided diagnosis for chest X-rays using deep CNNs has also been explored. (4) In addition, a method used to infer abnormal areas in chest CT images has also been proposed. (5) There are many techniques for improving CNN accuracy. To improve the architecture of CNNs, Min et al. (6) embedded a multilayer perceptron in the MLPconv layer, which resulted in the output of a convolutional layer called a network-in-network (NIN) that significantly improved the performance of the CNN. The use of the discrete Fourier transform (DFT) to accelerate the calculations in convolutions has also been proposed. (7) Graham proposed fractional max pooling (FMP), (8) which solves the problem of information loss caused by the reduction of the pool size. Methods used to improve pooling performance in the frequency domain have also been proposed. (9,10) We present a CNN-based method of detecting lung cancer based on CT images obtained from our wearable EIT medical device. We introduce the principle of the CNN in Sect. 2, analyze the factors affecting the performance indicators of the model in Sect. 3, and identify CT images by image enhancement methods in Sect. 4.

CNN
Multiple neurons are connected according to certain rules to form a neural network, and a CNN is a feedforward neural network with localized connections and weight sharing that is especially suitable for dealing with gridlike data. A CNN mainly includes a convolution layer, a pooling layer, and a fully connected layer, as shown in Fig. 1.

Convolution layer
Convolution in a neural network refers to an operation consisting of multiple parallel convolutions. To calculate the kth channel Y [k] of the output feature map, the kth convolution kernel W [k] (comprising the slice matrices W [k,1], W [k,2], ..., W [k,D]) is convolved with the input feature map X (comprising the slice matrices X [1], X [2], ..., X [D]), and then the bias b [k] is added to obtain the weighted input z [k] of the convolutional layer. Finally, the output feature map Y [k] of the kth channel after the nonlinear activation function g(·) is obtained as in Eq. (1):

Y [k] = g(z [k]) = g(Σ_{d=1}^{D} W [k,d] ⊗ X [d] + b [k]). (1)
Here, g(·) is the activation function of the neurons, which is generally nonlinear and continuously differentiable, and helps to approximate complex nonlinear functions, solve nonlinear problems, and enhance the expressiveness and learning ability of the network.
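As a concrete illustration, the per-channel computation of Eq. (1) can be sketched in NumPy as follows; the function name and the explicit loop form are illustrative only and are not part of the system described in this paper.

```python
import numpy as np

def conv2d_channel(X, W, b, g=lambda z: np.maximum(z, 0.0)):
    """Compute one output channel Y[k] = g(sum_d W[k,d] * X[d] + b[k]).

    X: input feature map, shape (D, H, W_in) -- the slices X[1..D]
    W: one convolution kernel, shape (D, kh, kw) -- the slices W[k,1..D]
    b: scalar bias b[k]
    g: nonlinear activation function (ReLU by default)
    """
    D, H, Wd = X.shape
    _, kh, kw = W.shape
    out = np.zeros((H - kh + 1, Wd - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # weighted input z[k] at (i, j): sum over all D input slices
            out[i, j] = np.sum(X[:, i:i + kh, j:j + kw] * W) + b
    return g(out)
```

As is standard in CNN implementations, the sliding-window operation above is a cross-correlation; frameworks conventionally call it "convolution".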

Pooling layer
Convolution layers can effectively adjust the size of the output features, but even with local connections, the convolution operations introduce trainable parameters. The purpose of the pooling layer is to preserve the main features of the network without adding further parameters.
Pooling summarizes the information in each local region. Each M × N region of the input feature map X [d] of the pooling layer is aggregated to obtain the output feature map, typically in one of two ways. (1) Max pooling: the maximum value of all neurons in a region is taken as the output. (2) Average pooling: the average value of all neurons in a region is taken as the output. The difference between convolution and pooling is that the convolution filter changes with the training process, whereas the parameters of the pooling layer are fixed.
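The two pooling operations can be sketched as follows; this minimal NumPy illustration assumes non-overlapping M × N regions, and the function name is ours.

```python
import numpy as np

def pool2d(X, M, N, mode="max"):
    """Pool each non-overlapping M x N region of a single feature map X[d]."""
    H, W = X.shape
    out = np.zeros((H // M, W // N))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            region = X[i * M:(i + 1) * M, j * N:(j + 1) * N]
            # max pooling keeps the strongest response; average pooling
            # keeps the mean response of the region
            out[i, j] = region.max() if mode == "max" else region.mean()
    return out
```

Note that, unlike the convolution sketch above, this function has no trainable weights, matching the observation that pooling-layer parameters are fixed.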

Fully connected layer and softmax classifier
The fully connected layer and the softmax function work together to classify data by using the extracted features. The feature map matrix, which is obtained by inputting images through the convolution and pooling layers, is first converted into a one-dimensional matrix and then passed through a weighted single-layer perceptron to obtain the same number of outputs as categories. Finally, the output is passed through the softmax function to obtain the probability of identifying each category.
When there are a large number of hidden layers, full connection will lead to redundant parameters, so the fully connected layer is usually placed in the last layer or the penultimate layers of the CNN for later classification or regression.
The softmax function is a generalization of the logistic regression function. It maps a K-dimensional vector into another K-dimensional vector whose elements represent the probabilities (in the range 0 to 1) that the input sample belongs to each class; their sum is 1. The softmax classifier is defined as

softmax(z_k) = exp(z_k) / Σ_{i=1}^{K} exp(z_i),

where k = 1, 2, 3, ..., K.
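The softmax mapping above can be sketched in a few lines of NumPy; the max-subtraction trick is a standard numerical-stability measure, not something specific to this paper.

```python
import numpy as np

def softmax(z):
    """Map a K-dimensional vector z to class probabilities that sum to 1."""
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()
```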

Model training process
The method used to train the model is shown in Fig. 2. First, a data set is obtained by collecting a large number of sample images; it is divided into three parts, a training set, a validation set, and a test set, generally in a ratio of 6:2:2. Then, the training set is preprocessed and sent to the CNN for training to extract the features required for the learning task. Training is carried out continuously until the best-performing model, as verified on the validation set, is obtained. Finally, the test set (new, unseen images) is input and an accurate prediction is output by the best model.
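The 6:2:2 split described above can be sketched as follows; the function name, the fixed seed, and holding the samples in a Python list are assumptions made for this illustration.

```python
import numpy as np

def split_dataset(samples, ratios=(0.6, 0.2, 0.2), seed=0):
    """Shuffle and split samples into training/validation/test sets (6:2:2)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))  # random order, reproducible via seed
    n_train = int(len(samples) * ratios[0])
    n_val = int(len(samples) * ratios[1])
    train = [samples[i] for i in idx[:n_train]]
    val = [samples[i] for i in idx[n_train:n_train + n_val]]
    test = [samples[i] for i in idx[n_train + n_val:]]
    return train, val, test
```

Applied to a 475-image data set such as the one used later in this study, this yields 285 training, 95 validation, and 95 test images.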

Analysis of impact factors
The CNN model for cancer identification can be regarded as a binary classification problem, with each subject in the data set corresponding to a class. After training, the classification layer can be removed, and its features can be used for cancer recognition. Because medical data such as those from lung CT are difficult to collect, the data set is relatively small and the traditional method of deep learning has low performance. Therefore, we consider training the network by transfer learning.
Transfer learning provides a framework to leverage an already existing model (based on some training data) in a related domain. We can transfer the knowledge (and data) gained in the previous model to the new domain.
To analyze the impact of various factors on network performance, we use the model introduced in Sect. 2 to construct a CNN for recognizing animal faces, with images taken from the open ImageNet data set. (11) Figure 3 shows that when the number of samples is gradually increased from 10 to 400, the model accuracy increases from 12.50 to 15.82%, but when the number of samples is further increased to 800, the model accuracy increases only slightly and gradually becomes saturated.

Impact of training set size
Obviously, when the model accuracy reaches the saturation point, it is no longer necessary to increase the number of samples.

Impact of iteration
Using 10 test samples, we set the same experimental conditions, gradually increased the number of iterations, and observed the experimental results.
It can be seen from Fig. 4 that as the number of iterations increases, the model accuracy increases gradually. When the number of iterations exceeds 3000, the model accuracy rapidly improves, then saturates above 6000 iterations.
The change in loss function value with the number of iterations is also shown in Fig. 4. It can be seen that as the number of iterations increases, the loss function value decreases monotonically, that is, the average error of the samples in the training set gradually decreases. Thus, increasing the number of iterations clearly plays an important role in model training.

Impact of weight for initialization
The use of a suitable weight initialization affects the convergence speed of gradient descent. Although zero initialization is simple, the gradients updated during layer-by-layer propagation remain equal, so the initialization effect is not ideal. In contrast, a model trained with random initialization can be smoother and simpler, reducing the possibility of overfitting. We trained the model by random initialization, with the training samples iterated in a positive:negative ratio of 3:1. The test results are compared with those for zero initialization in Fig. 5.
In zero initialization, although the training speed is very high, the model accuracy is very low (between 12 and 14%), resulting in a large judgment error during testing. In contrast, with random initialization, the model accuracy shows a steady upward trend as the number of iterations increases during training. We also found that the range of the random initial values should be chosen according to the activation function used in training: if the values are too large or too small, the model accuracy fluctuates considerably.
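The two initialization schemes compared above can be sketched as follows; the scale value is an assumption for this illustration and, as noted, should in practice be matched to the activation function.

```python
import numpy as np

def init_weights(shape, mode="random", scale=0.01, seed=0):
    """Zero vs. small random weight initialization.

    With zeros, every neuron in a layer computes the same output and
    receives the same gradient, so the layer never breaks symmetry.
    Small random values avoid this; their scale should suit the activation.
    """
    if mode == "zero":
        return np.zeros(shape)
    rng = np.random.default_rng(seed)
    return rng.standard_normal(shape) * scale
```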

Analysis of generalization and improvement
Generalization can be defined as a mathematical interpolation or regression over a set of training points. To achieve good generalization accuracy for new examples, we can establish a maximum acceptable error rate and use a validation set to tune the network parameters.
The quality of the model can be increased by reducing the training error and narrowing the gap between the training error and the test error. Underfitting means that the model does not fit the training set well, and overfitting means that the gap between the training error and the test error is too large, such that some of the features collected are not applicable to the test set. (12) To obtain a better model, we should optimize the model in two steps: optimize the algorithm itself and initialize the parameters. The purpose of generalization is to reduce the test error, and common methods for generalization are data set enhancement and the introduction of a parameter norm penalty.
The training and generalization errors change with the training set size, and the expectation of the generalization error never increases with the number of training samples. There are usually two ways to avoid overfitting. One is to minimize the number of features; the algorithm determines which feature variables are most important and discards some irrelevant features. The other is to retain all features but reduce the magnitude or number of parameters, because each feature will contribute to the final prediction.

Data set enhancement
We can easily increase the size of the training set, and thereby improve the generalization ability of the classifier, by adding extra copies of the training images generated by geometric transformations that leave each image's category unchanged. In object recognition, data set enhancement through random rotation is very effective because the category information of the images does not change after the transformation.
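The rotation-based enhancement applied later in this study (90° counterclockwise, 90° clockwise, and 180°) can be sketched with NumPy as follows; the function name is ours.

```python
import numpy as np

def augment_rotations(image):
    """Enhance the data set with the label-preserving rotations used here:
    90 deg counterclockwise, 90 deg clockwise, and 180 deg."""
    return [
        image,                  # original
        np.rot90(image, k=1),   # 90 deg counterclockwise
        np.rot90(image, k=-1),  # 90 deg clockwise
        np.rot90(image, k=2),   # 180 deg
    ]
```

Each original image thus yields four training images, quadrupling the training set without changing any labels.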

Selection of activation function
We found that in the training process, the tanh activation function does not perform well. However, with the rectified linear unit (ReLU) activation function, the model accuracy is greatly improved. As shown in Fig. 6, the tanh function changes slowly as it approaches the saturated region and its derivative tends to 0, which may cause the loss of information and lead to the disappearance of the gradient. The ReLU function effectively solves this problem, and applying it to the output of the linear transformation produces a nonlinear transformation. The partial gradient of the ReLU function is 1 for x > 0 and 0 for x < 0. For x > 0, the gradient is constant at 1, so the phenomenon of gradient disappearance is completely eliminated.
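The contrast in gradient behavior described above can be checked numerically with a small sketch; the helper names are illustrative, and the gradient at x = 0 is taken as 0 by convention here.

```python
import numpy as np

def relu(x):
    """Rectified linear unit: max(x, 0)."""
    return np.maximum(x, 0.0)

def relu_grad(x):
    """ReLU gradient: 1 for x > 0, 0 for x < 0 (0 taken at x = 0 here)."""
    return (x > 0).astype(float)

def tanh_grad(x):
    """tanh gradient, 1 - tanh(x)^2: tends to 0 in the saturated region."""
    return 1.0 - np.tanh(x) ** 2
```

For example, at x = 10 the tanh gradient is essentially 0 (the saturated region), while the ReLU gradient is still exactly 1.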

Lung CT Image Recognition Method
Electronic wearable devices or gadgets worn by consumers can ubiquitously and continuously capture or track biometric information related to health or fitness. The user uses wearable devices at home and then uploads the measurement information to the hospital server. After that, the medical information system sends a medical report to the user. A flow chart of such a report system is shown in Fig. 7.
In our study, the format of the images in the data set was converted from tif to png. The optimal parameters of the CNN model for animal image recognition are used as the initial parameters of our CNN model for lung CT image recognition. Next, the input image set is trained; using the trained model, the probability of lung cancer is predicted for unknown CT images, and the classification results are compared with the labeling results of experts.
Multiple wearable devices may be connected to a common server with a sensor network built to monitor a physical activity at a centralized location. In this paper, we use a similar method to obtain images and analyze them by the CNN method.

Sample and image preprocessing
The original images were taken from a lung CT image data set from the Kaggle competition. (13) This data set includes 475 images, of which 66 are lung cancer CT images used as negative samples and the remaining 409 are healthy CT images used as positive samples. Typical images are shown in Fig. 8. The data set is also divided into three parts with a ratio of 6:2:2 for the training, validation, and test sets, respectively. Before training the CNN, it is necessary to understand the distribution of the data and find its regularities; according to these regularities, the data are filtered and sorted during preprocessing. CT images are obtained by fixing X-ray sources in a certain position, followed by sampling or reconstruction to generate discrete image representations, and finally mapping the values to generate 2D images in any desired direction, (5) so the image preprocessing can omit the step of converting a three-dimensional image into a grayscale image. Since a CNN must have a fixed input size, the image resolution should be reset before training. If the input image is too large, the training speed will be very low and more convolutional layers will be needed. If the input image is too small, too much information will be lost owing to the reduced resolution. In terms of the loss function value and the model accuracy on the validation set, 256 × 256 was found to be a moderate resolution that includes as much of the target area as possible without the image being too large.
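Resizing a grayscale image to the fixed 256 × 256 input can be sketched as follows; nearest-neighbor sampling is an assumption made for this minimal illustration, and any standard resampling method could be substituted.

```python
import numpy as np

def resize_nearest(image, size=(256, 256)):
    """Nearest-neighbor resize of a grayscale image to the fixed CNN input size."""
    H, W = image.shape
    # map each output row/column back to its nearest source row/column
    rows = np.arange(size[0]) * H // size[0]
    cols = np.arange(size[1]) * W // size[1]
    return image[rows][:, cols]
```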

Model training process
In the model training process, it is necessary to adjust and monitor the parameters such as the number of iterations, which is beneficial for observing the trend of model performance as the parameters change.
We set the number of training samples to 10 and obtain the model loss function values for different numbers of iterations. The results are shown in Fig. 9.
It can be seen from Fig. 9 that with increasing number of iterations, the loss function value decreases gradually and the model accuracy increases. This is because the test value gradually approaches the real value in the process of model fitting. After 20000 iterations, the loss function value is close to 0, indicating that overfitting has occurred and that it is necessary to optimize or generalize the model.
In Fig. 10, the original training set has only 284 images, of which 245 are positive samples and 39 are negative samples, a positive-to-negative ratio of about 6:1. After the data set is enhanced (each sample is rotated 90° counterclockwise, 90° clockwise, and by 180°), there are 1136 images in the training set; that is, the numbers of positive and negative samples are quadrupled. As the number of samples increases, the loss function value gradually decreases. When the number of training samples reaches 809, the loss function value is 0.095076 and the accuracy reaches 83.33%, but the recall rate decreases, indicating that the accuracy and recall rate must be balanced in the model training process. Increasing the number of training samples further does not change the loss function value significantly; it basically maintains small fluctuations. Therefore, when the number of training samples is 809, the model performance is good, and from Fig. 10 we can see that not increasing the number of samples beyond this point also keeps both the training time and the calculation cost of the model low.
When the model is in an underfitted state, the training and generalization errors are very high. When the number of samples is increased, the training error decreases, but the distance between the training and generalization errors continues to increase until the overfitting state occurs. After adopting optimization or regularization measures, the generalization gap of the model is significantly narrowed and the generalization ability is improved as shown in Fig. 11.

Comparison with other measurement methods
Earlier research on X-ray detection mainly used CNN methods to find objects, but the objects searched for differed from those in this study. The 2D chest X-ray images classified by the CNN (2) can assist doctors in noninvasive detection, increasing the accuracy of diagnosis and enabling doctors to adopt more targeted treatment methods. There have been related studies on detecting abnormalities in 2D chest X-ray images through deep learning: in Ref. 3, a combination of a CNN and an RNN identified tags representing the locations and types of lung abnormalities, and the design of a computer-aided diagnosis system for chest X-rays using deep CNNs has also been explored. (4) In our study, the objects are CT images and the method is a CNN. Table 1 shows a comparison of different methods for X-ray detection. Moreover, several methods have been used to improve CNN performance. For example, in the convolution layer, a multilayer perceptron embedded in the output of the convolution layer improves its performance, (6) and the DFT is used to accelerate the calculation of the convolution layer. (7) In the pooling layer, FMP was proposed to overcome information loss, (8) and in Refs. 9 and 10, pooling methods that improve performance in the frequency domain were proposed. Table 2 shows a comparison of different methods for improving CNN performance.
From Table 1, it can be seen that, unlike the earlier methods, the CNN-based recognition method used in this study operates on CT images. On the basis of the measurement results of the wearable device, we propose a CNN-based method for CT image recognition to help doctors diagnose diseases. We use image enhancement to preprocess the data, adjust the activation function and initialization weights, train and test on the target images, and extract features adaptively, which also alleviates the overfitting problem.

Conclusion
In this paper, an image recognition model based on a CNN is first established, then various factors affecting the model training process are analyzed. The model accuracy increases with the size of the training set and the number of iterations, then saturates. In addition, the weight initialization factor has a significant impact on model accuracy. An optimization method is proposed to improve the network performance by enhancing the data set. The experimental results show that the recognition rate of lung cancer CT images has room for improvement and that the overfitting problem can be properly addressed when the number of samples is limited.
