Performance Improvement of Ovarian Cancer Classification Model Using Multiple Biomarkers and Menopause Information

The tumor biomarker test used for the early diagnosis of ovarian cancer is a relatively simple blood test. In previous studies, an ovarian cancer classification model was designed by selecting an optimal combination of 2 or 3 biomarkers from 16 cancer biomarkers that respond specifically to ovarian cancer. Menopausal status is important information for diagnosing ovarian cancer. In this study, we added menopausal status to the classification model and evaluated its performance. The classification model performed better when menopausal clinical information was included.


Introduction
Ovarian cancer is a malignant tumor of the ovary that occurs most often between the ages of 50 and 70. Epithelial ovarian cancer, which accounts for about 90% of ovarian cancers, is found at stage III or later, so the 5-year survival rate after diagnosis is less than 40%. However, the 25% of patients who are diagnosed early have a 5-year survival rate of more than 90%, and the survival rate of stage II patients is over 70%. Early diagnosis of ovarian cancer therefore offers a real possibility of improving clinical outcomes and increasing the survival rate. (1) The use of cancer biomarkers is a relatively simple screening method based on a blood sample, and it allows cancer screening at a lower cost than other diagnostic methods. A cancer biomarker is a molecule that indicates the presence of cancer in the body. It conveys information through specific changes or mutations in genes, RNA, proteins, and metabolites, and biomarkers detect the molecular changes that occur during tumor development.
With biomarkers, cancer can be discovered, prognosis can be determined, and disease progression and therapeutic response can be monitored. (2) In cancer diagnosis, the use of an in vitro diagnostic multivariate index assay presupposes that no single biomarker has a sufficiently high specificity, close to 100%, for a specific cancer. Therefore, multiple biomarkers can be combined and the analysis quantified by statistical methods. (3)(4) The ovarian cancer classification model was designed by finding the optimal biomarker combination from 16 serum biomarkers showing a specific response to ovarian cancer. (7)
In this study, we added menopausal status to the classification model and evaluated its performance. The classification model that included menopausal clinical information performed better than the model that did not include clinical information.

Data collection
The samples used in this study are serum samples from 92 healthy Korean women and 101 patients with ovarian cancer, together with their menopausal status. The sera were provided by Hallym University Medical Center (HUMC) and Asan Medical Center. These samples were reacted with Luminex beads attached to 16 biomarkers, and the fluorescence from the antibodies on the beads was measured.
Unlike in our previous work, in this experiment we additionally use menopause information to improve classification performance. The detailed statistics for the data with menopause information are shown in Table 1. In our previous work, we found two 2-biomarker combinations and two 3-biomarker combinations that classify cancer and normal samples well. (4)
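To give a sense of the search space behind choosing 2- or 3-biomarker combinations from 16 candidates, the following minimal sketch enumerates every such combination. The biomarker names are placeholders (the full panel is not listed in this section), and the sketch does not reproduce the selection procedure of the earlier work; it only counts the candidates.

```python
from itertools import combinations

# Placeholder names: hypothetical stand-ins for the 16 serum biomarkers.
biomarkers = [f"marker_{i:02d}" for i in range(1, 17)]

# Every unordered pair and triple that could serve as a candidate combination.
pairs = list(combinations(biomarkers, 2))
triples = list(combinations(biomarkers, 3))

n_pairs = len(pairs)        # C(16, 2)
n_triples = len(triples)    # C(16, 3)
n_candidates = n_pairs + n_triples

print(n_pairs, n_triples, n_candidates)  # 120 560 680
```

With only 680 candidates, scoring each combination individually (for example by AUC) is computationally feasible, although whether the earlier work searched exhaustively is not stated here.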

Receiver operating characteristic area under the curve (ROC AUC)
Sensitivity and specificity are commonly used to assess test performance; together, these two indicators show how well a classifier distinguishes patients from healthy people. For a given diagnostic system, sensitivity measures how well the system identifies the samples that have the condition, and specificity measures how well it identifies the samples that do not have the condition. In addition, the ROC curve is widely used to determine the accuracy of diagnosis. (8,9)
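As an illustration of these measures, the following minimal sketch computes sensitivity, specificity, and ROC AUC on a small toy data set. The labels, scores, and the 0.5 decision threshold are invented for the example and are not taken from the study; the AUC is computed with the standard pairwise (Mann-Whitney) formulation, which may differ from the software used by the authors.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = cancer, 0 = healthy)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as the probability that a positive sample outscores a negative one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): four cancer samples followed by four healthy samples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
threshold = 0.5  # assumed cut-off for turning scores into predictions
y_pred = [1 if s >= threshold else 0 for s in scores]

sens, spec = sensitivity_specificity(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(sens, spec, auc)  # 0.75 0.75 0.8125
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes performance over all thresholds, which is why both kinds of measure are reported.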

Results
Table 2 shows the performance of a model with and without menopause information. One of the most popular screening tests for ovarian cancer is the CA-125 or HE4 blood test. However, checking the CA-125 level has led to the misdiagnosis of ovarian cancer. The problem with using CA-125 as a screening test for ovarian cancer is that common conditions other than cancer can also cause a high CA-125 level. (3)(4)
Menopause (training data) in Table 2 represents the type of training data. A hyphen ('-') in the menopause field indicates that the data was used regardless of menopause information; this is our baseline. Sensitivity and specificity (fourth column in Table 2) indicate the proportion of cancer patients who are correctly identified as having the condition and the proportion of healthy women who are correctly identified as not having the condition, respectively. Except for the Prolactin-TTR combination (first combination), every combination performs better than its baseline. For the HE4-ELISA-Prolactin combination (second combination), AUC improved from 0.969 to 0.98 and sensitivity increased from 0.78 to 0.91 when premenopause data was used. For the ApoCIII-HE4-ELISA-Prolactin combination (third combination), AUC improved from 0.985 to 0.99 when we used premenopause information.
Furthermore, sensitivity and specificity also increased from 0.901 to 0.9505 and from 0.9565 to 0.9783, respectively. Like the other combinations, the HE4-ELISA-Prolactin-TTR combination (last combination) also showed better AUC, sensitivity, and specificity when we used premenopause information. The Prolactin-TTR combination, however, showed almost the same specificity as the baseline. Figure 1 shows the ROC curve of a model trained with and without the menopausal status for each combination. For clarity, each plot is zoomed in on the top-left region of the overall graph. As shown in Table 2, the performance improved when we used premenopause information rather than postmenopause information. Table 3 shows the performance on premenopause and postmenopause data for a model trained with premenopause data. Menopause (test data) in Table 3 represents the type of test data. The hyphen ('-') represents all the data regardless of the menopausal status. Contrary to our expectation that a model trained with premenopause data would predict premenopause data well, postmenopause data is predicted better than premenopause data.
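The sensitivity and specificity values quoted above for the ApoCIII-HE4-ELISA-Prolactin combination can be related to sample counts. The following sketch assumes that these rates were computed over all 101 patients and all 92 healthy women; the evaluation set for Table 2 is not stated explicitly, so this is an assumption rather than a reported fact. Under that assumption, each rate corresponds to a whole number of correctly classified samples.

```python
# Cohort sizes from the Data collection section.
n_cancer = 101
n_healthy = 92

# Reported rates for the ApoCIII-HE4-ELISA-Prolactin combination.
# ASSUMPTION: each rate is measured over the full cohort of its class.
reported = {
    "sensitivity, baseline": (0.901, n_cancer),
    "sensitivity, premenopause": (0.9505, n_cancer),
    "specificity, baseline": (0.9565, n_healthy),
    "specificity, premenopause": (0.9783, n_healthy),
}

implied_counts = {}
for label, (rate, n) in reported.items():
    k = round(rate * n)  # nearest whole number of correctly classified samples
    implied_counts[label] = k
    print(f"{label}: {k}/{n} = {k / n:.4f}")
```

Read this way, the improvement amounts to five more patients correctly identified (91 to 96 of 101) and two more healthy women correctly identified (88 to 90 of 92). That every rate maps cleanly to an integer count is consistent with the assumption but does not confirm it.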

Conclusions
In our previous work, we found four optimal biomarker combinations that classify cancer and normal samples well. To improve the performance of those combinations, we additionally used the menopausal status in this study and conducted a classification experiment for detecting ovarian cancer. The Prolactin-TTR model showed similar performance, and the remaining models showed better results when the trained model included menopause information. Menopausal status is therefore important information for a classification model aimed at the early diagnosis of ovarian cancer. Specifically, a model trained with premenopause data classifies normal and cancer samples better than a model trained with postmenopause data. Thus, this paper proposes a method of combining clinical information about the patient with sensed biomarkers for the diagnosis of ovarian cancer.

Fig. 1. (Color online) ROC curve for each combination of a model with and without menopause information.

Table 1
Data statistics.

Table 2
Performance of model with and without menopause information.

Table 3
Performance on premenopause and postmenopause data for a model trained with premenopause data.