Hyperparameter Optimization of Deep Learning Networks for Classification of Breast Histopathology Images

Introduction
Breast cancer is the leading cause of cancer death in women worldwide. (1) Early diagnosis can reduce breast cancer death rates and raise the success rate of early breast cancer treatment to 90%. (2) Although histological image classification is critical in breast cancer diagnosis, the detection and categorization of abnormalities are difficult even for experienced pathologists. (3,4) Recently, machine learning techniques have been successfully applied to classification tasks in the medical field. Automatic detection and diagnosis of tumors through histological image processing and machine learning algorithms can therefore increase the accuracy of breast cancer diagnosis. (5) Automated machine-learning-based tools will assist physicians in image analysis. (6) Machine learning models for diagnosing breast ultrasound images have shown significantly better performance than models trained on clinical features. (7) The complex pipelines of traditional machine learning algorithms, with tedious preprocessing, segmentation, and manual feature extraction steps, reduce their efficiency and accuracy. (8) To overcome these drawbacks of traditional machine learning techniques, deep learning networks have been used for efficient classification. (9) The present study focuses on the application of deep learning networks to the classification of breast histopathology images.
Deep learning networks have multiple levels of abstraction, representation, and information, and have been successfully applied in various domains. (10)(11)(12) For example, convolutional neural networks (CNNs) based on the deep learning network architecture have been reported to be powerful tools for the automated classification of human cancer histopathology images. (13) Khosravi et al. designed several computational methods based on CNNs to effectively classify various histopathology images across different types. (14) Sabeena et al. used modified CNN-based deep transfer learning for automatic mitosis detection in breast histopathology images. (15) In addition, a successful application of the CNN is the LeNet-5 system, which consists of two convolutional layers, two pooling layers, and a fully connected layer. (16) The following methods have been used to classify benign and malignant tumor features: a single-layer CNN, (17) a rotation forest classifier with a parameter-free version of threshold adjacency statistics, (18) LeNet-5 with stochastic gradient descent with momentum, (19) LeNet-5 with adaptive moment estimation, (20) and LeNet-5 with root mean square propagation. (21) Although traditional methods are capable of segmenting and detecting breast tumors using the Breast Cancer Histopathological Database (BreakHis), their classification accuracy is limited. (22) To improve the accuracy of mitosis detection in breast histopathology images, we propose a hyperparameter-optimized CNN approach based on deep learning networks.
However, the training of a CNN based on deep learning networks typically requires a large amount of data to optimize the large number of network parameters, such as the number of filters, convolution kernel size, and number of output feature maps. (23) Hyperparameter optimization in deep learning models is therefore effective for prototype selection and classifier selection, and it has been widely used in the optimization of general machine learning models. (24) Although several methods have been proposed for hyperparameter optimization, the Taguchi method is a unique experimental statistical design that provides a reliable design solution, improves quality, and reduces costs because it allows optimization with the minimum number of experiments. (25) The Taguchi method is a statistical approach for optimizing process parameters by using an orthogonal array consisting of factors and levels to classify the results. (26) Debnath et al. applied the Taguchi method for network optimization in deep learning networks; they achieved superior classification by using hyperparameter optimization to speed up training. (27) In this study, we present a simple and efficient framework for the classification of breast histopathology images that uses hyperparameter optimization of deep learning networks with the Taguchi method.
The objective of this study was to use hyperparameter optimization of deep learning networks for the classification of breast histopathology images. The major contributions of the study are as follows: (1) obtaining the best combination of hyperparameters, such as the convolution kernel size, number of filters, stride, and padding, by the Taguchi method; (2) adjusting the hyperparameters in the network architecture with far fewer experiments than the traditional manual adjustment of hyperparameters; and (3) effectively optimizing the hyperparameters of the CNN by analyzing the importance of each impact factor.
This study is organized as follows. Section 2 presents the hyperparameter optimization methods for deep learning networks. Section 3 discusses the experimental results and data analysis. The final section presents the conclusions and directions for future work.

Materials and Methods
The purpose of this study was to optimize the hyperparameters of a CNN. On the basis of deep learning networks, we obtained the best hyperparameter combinations by performing experiments with the Taguchi method. In these experiments, BreakHis was used to improve the network performance. First, the hyperparameters were set as experimental factors and the variation levels of each factor were determined. The orthogonal arrays required for the experiment were then designed. From the results, the signal-to-noise (S/N) ratio was calculated, and the contribution of each factor was analyzed through the variance to obtain the best hyperparameter combination for the network.

Deep learning networks
This section introduces the architecture and hyperparameters of the deep learning networks, which contain selectable factors. We performed experiments based on the LeNet network structure. Figure 1 depicts the proposed deep learning network architecture, which comprises an input layer, three convolutional layers, two pooling layers, a fully connected layer, and a final classification layer.
(a) Input image size: The input image size in this study was 50 × 50 instead of the 32 × 32 of the original LeNet. With the larger input, more levels can be assigned to the convolutional-layer hyperparameters selected as experimental factors while keeping the size of the final output feature map positive.
(b) Convolutional layers: The proposed network architecture contained three convolutional layers. The selected hyperparameters were the convolution kernel size, number of filters, stride, and padding. The convolution kernel size affects the fineness of feature extraction, and the number of filters determines the number of feature maps; too many filters may lengthen the training time. The stride is the step by which the convolution kernel moves at each feature capture; increasing it makes the extracted feature map smaller and reduces the computational complexity. The padding, set to 0 around the periphery of each input image, is used to increase the size of the feature map. The remaining convolutional layer uses a 1 × 1 kernel to reduce the dimension; after testing, this kernel did not significantly affect the network performance, and it is connected to the final fully connected layer.
(c) Pooling layers: Following the LeNet network architecture, two pooling layers were inserted between the three convolutional layers. The hyperparameters of the pooling layers were not adjusted, and their size was kept at 2 × 2.
(d) Activation function: A ReLU activation function was added after each convolutional layer to ensure efficient gradient descent.
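As a sanity check on the convolutional-layer hyperparameters, the feature-map sizes can be traced with the standard convolution output formula out = (in - k + 2p) / s + 1 (integer division). The sketch below uses the optimized values reported later in this study (kernel 3, stride 2, padding 1 for both adjustable convolutional layers) purely as an illustration; any valid combination can be substituted.

```python
def conv_out(size, kernel, stride, padding):
    # Spatial size of a convolutional layer's output feature map.
    return (size - kernel + 2 * padding) // stride + 1

def pool_out(size, kernel=2, stride=2):
    # Spatial size after a fixed 2 x 2 pooling layer.
    return (size - kernel) // stride + 1

size = 50                       # 50 x 50 input image
size = conv_out(size, 3, 2, 1)  # Conv1: kernel 3, stride 2, padding 1 -> 25
size = pool_out(size)           # Pool1 -> 12
size = conv_out(size, 3, 2, 1)  # Conv2: kernel 3, stride 2, padding 1 -> 6
size = pool_out(size)           # Pool2 -> 3
size = conv_out(size, 1, 1, 0)  # 1 x 1 convolution keeps the spatial size -> 3
print(size)
```

Tracing the sizes this way makes it easy to check that a candidate level combination never drives a feature map below 1 x 1, which is the constraint mentioned above.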

Hyperparameter optimization using the Taguchi method
The Taguchi method is an engineering method that was introduced by Dr. Genichi Taguchi. In this method, statistics are used to perform experiments and data analysis. The Taguchi method is used to design orthogonal arrays to reduce the number of experiments and to observe the degree of variation among experiments. Variance analysis is used to determine the importance of factors, optimize quality, and reduce costs. Figure 2 depicts the experimental procedure of the Taguchi method. The orthogonal arrays in the standard Taguchi method are predefined according to the number of control factors and levels. Therefore, the quality characteristics are evaluated through experiments performed according to the defined orthogonal arrays. In this experiment, we defined the quality characteristics as the performance of the hyperparameters of the CNN based on deep learning networks, and we calculated the S/N ratio for each experiment from multiple observations. The S/N ratio of the experimental results indicates the robustness of the network performance. In this experiment, the S/N ratio for the case of "larger is better" was evaluated as

S/N = -10 log10[(1/n) Σ_{i=1}^{n} (1/y_i^2)],  (1)

where n is the number of observations, y_i is the efficiency of the ith observation, and the S/N ratio is expressed in decibels (dB).
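The larger-is-better S/N ratio of Eq. (1) can be computed directly from the per-experiment observations. A minimal Python sketch follows; the three example accuracies are illustrative, not values from Table 5.

```python
import math

def sn_larger_is_better(observations):
    # Eq. (1): S/N = -10 * log10((1/n) * sum(1/y_i^2)), in dB.
    n = len(observations)
    return -10.0 * math.log10(sum(1.0 / (y * y) for y in observations) / n)

# Three independent observations (test accuracies) of one hyperparameter combination.
print(sn_larger_is_better([0.81, 0.83, 0.82]))
```

Higher (less negative) values indicate a more robust hyperparameter combination, so the combination maximizing this quantity is preferred.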
(a) Selection of control factors and levels
As presented in Table 1, the factors had either two or three levels: A-D were the hyperparameters of the first convolutional layer, and E-H were those of the second convolutional layer. The original LeNet uses a 5 × 5 convolution kernel, which was enlarged to 7 × 7 or reduced to 3 × 3 in the table to determine whether the recognition rate increases for large- or small-scale feature extraction. The stride and padding were increased to 2 and 1, respectively, to avoid an output feature map size of less than zero. After the first convolutional layer, the output feature map is reduced and some features are lost; therefore, we increased the number of filters in the second convolutional layer to retain more features and improve the classification performance.
(b) Design of the orthogonal array
After the control factors and levels were determined, the orthogonal array was designed according to Table 1. First, we calculated the minimum number of experiments and selected an applicable orthogonal array by comparing the total numbers of columns of the candidate arrays (see Table 2). The minimum number of experiments is the total degrees of freedom (DoF) plus one, as described in Eq. (2) and shown in Table 3. After the orthogonal array required for the experiment is selected, it is populated according to the selected factors and levels.
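The minimum-run calculation of Eq. (2) follows directly from the factor levels. The sketch below assumes, as stated in the Results, four factors at two levels and four at three levels (A-H):

```python
def total_dof(factor_levels):
    # A factor with L levels contributes L - 1 degrees of freedom.
    return sum(levels - 1 for levels in factor_levels)

def minimum_runs(factor_levels):
    # Eq. (2): minimum number of experiments = total DoF + 1.
    return total_dof(factor_levels) + 1

levels = [2, 2, 2, 2, 3, 3, 3, 3]  # control factors A-H
print(minimum_runs(levels))  # 13, matching the value reported in the Results
```

Any standard orthogonal array with at least this many runs, and with enough two- and three-level columns to host all factors, is a valid choice; here that leads to the mixed-level L36 array.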

Experimental Results
The first subsection introduces the BreakHis data set, which was used in the CNN based on deep learning networks to classify benign and malignant tumors and to determine the optimal hyperparameter combination using the Taguchi method. The second subsection presents the experimental results obtained using the orthogonal array and the factor significance analysis.

Data set
BreakHis was established by the Federal University of Parana (UFPR) and the Pathological Anatomy and Cytopathology (P&D) Laboratory. (28) The data set is composed of 7909 histopathological images of breast tumors at different magnifications taken from 82 patients, as presented in Table 4 and Fig. 3. The samples were collected through surgical open biopsy, namely, partial mastectomy or excisional biopsy, performed at a hospital with the patient under general anesthesia. Compared with needle aspiration cytology methods, this type of procedure can remove large tissue samples. The data set consisted of 2480 and 5429 images of benign and malignant tumors, respectively, with 700 × 460 pixels. The images were three-channel RGB images with 8-bit depth in each channel and were in PNG format. Images were divided into two main groups, those of benign tumors and those of malignant tumors. Histologically benign is a term referring to a lesion that does not satisfy any criterion of malignancy, such as mitosis, marked cellular atypia, basement membrane disruption, and metastasis. Malignant tumors represent the occurrence of cancer; the lesion can invade and spread to distant sites and cause death.

Table 3 Calculation of the DoF of each factor.
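The study does not detail how BreakHis was partitioned for training; as an illustration only, the class proportions above (2480 benign, 5429 malignant) can be preserved with a simple stratified split. The 70/30 ratio and the function name below are our assumptions, not from the study.

```python
import random

def stratified_split(labels, test_fraction=0.3, seed=0):
    # Split sample indices so each class keeps roughly the same
    # benign/malignant proportion in the train and test sets.
    rng = random.Random(seed)
    by_class = {}
    for i, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(i)
    train, test = [], []
    for lab, idx in by_class.items():
        rng.shuffle(idx)
        cut = round(len(idx) * test_fraction)
        test.extend(idx[:cut])
        train.extend(idx[cut:])
    return train, test

labels = ["benign"] * 2480 + ["malignant"] * 5429  # BreakHis class counts
train, test = stratified_split(labels)
print(len(train), len(test))
```

Stratification matters here because the data set is imbalanced (about 31% benign); a naive random split could skew the class ratio seen during training.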

Experimental results and analysis
The minimum number of experiments was 13. With reference to Table 3, we selected an orthogonal array with more than 16 runs. Because this experiment adopted a mixture of two- and three-level factors, L36(2^11, 3^12) was selected as the experimental orthogonal array: a total of 36 experiments accommodating up to 11 two-level factors and 12 three-level factors. In this experiment, four factors had two levels and four had three levels, labeled A-H. Table 5 presents the experimental results. Each experiment contained three observations, i.e., three tests were performed on the same hyperparameter combination. Before each test, the network training parameters were cleared so that each observation was trained independently. The S/N ratio was calculated using Eq. (1). In this experiment, we used the S/N ratio with larger-is-better characteristics; hence, the larger the S/N ratio, the higher the robustness of the network composed of this factor combination.
The response table for the S/N ratio, significance, and percentage contribution of each factor is presented in Table 6. When the level of a factor is changed independently, the robustness of the network performance changes; the greater the resulting variation in network performance, the greater the effect of this factor on the network. A factor was listed as significant if its significance exceeded a threshold. When the optimization does not reach the goal, the levels of the significant factors can be adjusted. Finally, the best parameter combination selected was A1, B2, C2, D2, E1, F3, G2, and H2, and the factors were ranked by significance in the order F > E > C > A > D > G > B > H.
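The response-table analysis behind Table 6 can be reproduced mechanically: average the S/N ratio over the runs at each level of a factor, then rank the factors by the range of those means. The sketch below uses a toy two-factor, four-run design rather than the study's actual L36 data.

```python
def response_table(sn_ratios, design):
    # design maps factor name -> list of level settings, one per run.
    table = {}
    for factor, levels in design.items():
        means = {}
        for lvl in set(levels):
            at_level = [sn for sn, l in zip(sn_ratios, levels) if l == lvl]
            means[lvl] = sum(at_level) / len(at_level)
        table[factor] = means
    return table

def rank_by_range(table):
    # A larger range (max mean - min mean) means the factor
    # affects network robustness more strongly.
    delta = {f: max(m.values()) - min(m.values()) for f, m in table.items()}
    return sorted(delta, key=delta.get, reverse=True)

# Toy full-factorial example: factor A dominates, factor B is minor.
sn = [10.0, 12.0, 20.0, 22.0]
design = {"A": [1, 1, 2, 2], "B": [1, 2, 1, 2]}
table = response_table(sn, design)
print(rank_by_range(table))  # factor with the larger range comes first
best_levels = {f: max(m, key=m.get) for f, m in table.items()}
print(best_levels)           # per factor, the level with the highest mean S/N
```

Applying the same two steps to the 36-run S/N ratios of Table 5 yields the significance ranking and the best-level combination (A1, B2, C2, D2, E1, F3, G2, H2) reported above.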
The order of the percentage contributions of the factors under hyperparameter optimization for a high accuracy rate is as follows: Conv1_Stride > Conv2_Filter > Conv2_Kernel size > Conv1_Kernel size > Conv1_padding > Conv2_Stride > Conv1_Filter > Conv2_padding. Table 7 compares the performance of the optimized network with that of the original LeNet network. We trained both networks on the same data set (BreakHis) and tested each three times without retaining the training parameters. As shown in Table 7, the proposed method yielded higher average observations over the three tests than the original network, which indicates that the proposed optimization method improved the robustness of the network. Table 8 compares the performances of various methods. The single-layer CNN architecture includes one convolutional layer and one pooling layer; the number of channels in its convolutional layer is 50, which extracts more feature maps. (17) For comparison with traditional manual feature extraction, the extracted features are added to the CNN training method. (18) For LeNet-5, accuracy rates are reported for different gradient descent methods. (19)(20)(21) The experimental results show that the performance of our method reaches 83.19%, which is 5.69% higher than that of the single-layer CNN method. The proposed optimized network architecture exhibited the best performance.

Conclusions
We proposed the hyperparameter optimization of deep learning networks. The Taguchi method was used to perform experiments on the BreakHis data set to classify tumors as malignant or benign. On the basis of the experimental results, the best hyperparameter combination of a CNN based on deep learning networks was obtained. The optimum parameters are as follows: Conv1_Kernel size = 3, Conv1_Filter = 6, Conv1_Stride = 2, Conv1_padding = 1, Conv2_Kernel size = 3, Conv2_Filter = 32, Conv2_Stride = 2, and Conv2_padding = 1. The influence of each control factor on the network performance provides the basis for adjusting each hyperparameter and improving its performance. The results show that, compared with the unoptimized CNN based on deep learning networks, the optimized network performed better, and the proposed method is superior to the other methods. However, the proposed hyperparameter optimization was limited to two convolutional layers and was applied only to the BreakHis data set in this study. Therefore, the proposed hyperparameter optimization of deep learning networks should be applied to different data sets to classify other medical histopathological images in the future.

Table 8 Comparison of performances using various methods.
Method                        Performance (%)
Single-layer CNN (17)         77.50
RF classifier + PFTAS (18)    81.28
LeNet-5 (Sgdm) (19)           80.69
LeNet-5 (Adam) (20)           82.22
LeNet-5 (RMSprop) (21)        82.58
Our method                    83.19