Using artificial neural network to predict a variety of pathogenic microorganisms

1 Department of Mechanical Engineering, Yuan Ze University, Chungli 32003, Taiwan; comimic@gmail.com; wangzhchuang@126.com; jsshieh@saturn.yzu.edu.tw
2 College of Engineering, Design and Physical Sciences, Brunel University London, Uxbridge UB8 3PH, United Kingdom; Maysam.Abbod@brunel.ac.uk
3 Division of Thoracic Medicine, Department of Internal Medicine, School of Medicine, College of Medicine, Taipei Medical University.
4 Division of Pulmonary Medicine, Department of Internal Medicine, Taipei Medical University Hospital, Taiwan; chshih43@tmu.edu.tw


Introduction
The intensive care unit (ICU) is an important part of hospital treatment and the place where critically ill patients can be saved. Generally, these patients have a poor survival rate unless they receive a correct diagnosis. In order to reduce the mortality rate, patients have to undergo many invasive examinations and treatments.
Although these methods can keep patients alive, they also give bacteria a route into the patient's body. Nowadays, many patients readily develop ventilator-associated pneumonia (VAP). The mortality rate of VAP [1] lies between 20% and 60% [2][3] and can be even higher. The whole detection process takes at least five days to culture and identify which species of bacteria is causing the disease. During this period, medical doctors do not treat the patients without a diagnosis. Unfortunately, some patients might pass away without a true diagnosis while the laboratory results are pending. Meanwhile, the bacteria keep multiplying, which increases the difficulty of treatment. Therefore, replacing the standard operating procedure with online pneumonia detection, without drawing blood, taking chest X-rays or culturing sputum, would be more efficient and safer.
According to a previous study [4], nitrogen (N2), oxygen (O2), water vapor and carbon dioxide (CO2) are produced by breathing. Almost 1259 kinds of volatile organic compounds (VOCs) [5][6][7] can be produced by human metabolic activity. These VOCs circulate through the body in the blood. It is usually unclear whether the empirical antibiotics are effective in the first five days of admission. A fast detection system can be developed using a sensor system that detects bacterial species. The sensor data can be analyzed to identify whether the patient has pneumonia without waiting for the culture period, which lets the physician select a suitable antibiotic in a short time and reduces the mortality rate.
Identifying whether the patient has pneumonia is the first step in this study. Although culturing phlegm and blood can identify the species of bacteria, it still takes at least five days to learn the kind of bacteria and the result of the medication allergy testing. This cultivation period is crucial for patients, who should also receive treatment during it. However, dozens of bacteria can cause pneumonia. Therefore, treatment during the cultivation period can only depend on the judgment of the doctors.
Because bacteria produce VOCs when metabolizing, it is important to analyze the spectrum of cultured pneumonia bacteria and to collect VOCs from exhaled breath. An electronic nose with 11 sensors [8] provides a set of resistance values in response to exhaled volatile compounds. Therefore, using the 11 resistances as input and the cultured bacteria type identified in the hospital as output, an ANN model [9][10][11] can be established to predict in real time whether the patient has pneumonia.

The pneumonia detection system
The development of the pneumonia detection system needs to be supported by a large amount of data. Data from the 11 sensors and pneumonia VOC information have been converted into numerical data that represent the relationship between resistance values and VOCs.
After comparing the variation of each sensor, data should be selected carefully before training the ANN [12][13]. Further details on how to build the system are given in the following sections.

Sensors
In this study, the CHS430, an electronic nose [14][15][16] manufactured by Taiwan Carbon Nanotube Technology Corporation (TCNT), is used to gather bacteria data by sampling the patient's exhaled air, as shown in Fig. 1. The CHS430 consists of 11 sensors, each of which is used to identify different gases. It is designed to identify whether the patient has infectious pneumonia. It uses an internal air sampling pump and an advanced pattern recognition algorithm to detect and recognize chemical vapors, and the algorithm must be carefully designed to get the best sensor performance. Data can be translated to a numerical resistance format in order to be used to design the detection algorithm. Fig. 1 shows the graphical user interface, which provides information to the user about the collected breathing data and shows the reflection pattern of each sensor array.
Fig. 1 The reflection pattern of each sensor array

Data Source
In this study, electronic nose is used to collect the exhaled breathing data. These

Data pre-processing
For data selection, the phases of Flags 1 and 2 have been selected for training.
The electronic nose records the breathing data in steps labeled Flags 0-4. The electronic nose recorded one reading every ten minutes. Since the resistance values are used to identify the status of the gas, Flags 1 and 2 are the most important phases, showing the change of the resistance values and their saturation, as shown in Fig. 2.
Therefore, data from Flag 1 and Flag 2 are used for the analysis in this study. The data from patients with cultured bacteria were all labeled as infection, and the remaining patient data were labeled as non-infection, using a binary system. After testing the accuracy of the model, the data has been normalized. To assess the sensitivity of the sensors, the mean absolute error (MAE) between infection patients and non-infection patients is calculated.
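The selection, labeling and normalization steps can be sketched as follows. This is a minimal illustration on synthetic numbers, not the study's actual pipeline: the array layout (readings by sensors), the helper names `select_flags` and `min_max_normalize`, and the choice of min-max scaling are all assumptions, since the text does not state which normalization was applied.

```python
import numpy as np

def select_flags(readings, flags, keep=(1, 2)):
    """Keep only the rows recorded during the given flag phases."""
    mask = np.isin(flags, keep)
    return readings[mask], flags[mask]

def min_max_normalize(x):
    """Scale each sensor column to [0, 1] (assumed scheme); constant columns map to 0."""
    lo = x.min(axis=0)
    hi = x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (x - lo) / span

# Synthetic stand-in for one recording: 6 readings from 11 sensors.
rng = np.random.default_rng(0)
readings = rng.uniform(100.0, 200.0, size=(6, 11))
flags = np.array([0, 1, 1, 2, 2, 3])  # hypothetical flag sequence

selected, selected_flags = select_flags(readings, flags)
normalized = min_max_normalize(selected)

# Binary labels: 1 = infection, 0 = non-infection (this patient assumed infected).
labels = np.ones(len(selected))
```

Only the four readings taken during Flags 1 and 2 survive the selection, and every reading from the same patient receives the same binary label.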

Sensitivity analysis of sensors
Data from seven patients were collected by the electronic nose. The study had full prior approval by an institutional review board and written informed consent was obtained from all the patients. First, the sensitivity of the sensors is investigated by calculating the MAE between infection patients and non-infection patients. Fig. 3 shows the comparison between infection patients and non-infection patients. From the result, it can be seen that sensors 9 and 10 show no difference in response between infection and non-infection. Hence these two sensors were removed, and the model is built using the other nine sensors.
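A per-sensor MAE screen of this kind might look like the sketch below. It assumes the infection and non-infection readings have been arranged as time-aligned arrays of equal shape, which the text does not specify; the data are synthetic, and the cut-off of 0.05 and the function names are hypothetical.

```python
import numpy as np

def sensor_mae(infected, non_infected):
    """Per-sensor mean absolute error between two (readings x sensors) arrays."""
    return np.mean(np.abs(infected - non_infected), axis=0)

def informative_sensors(mae, threshold):
    """Zero-based indices of sensors whose MAE exceeds the threshold."""
    return np.flatnonzero(mae > threshold)

# Synthetic example: every sensor responds to infection except the ones at
# zero-based indices 8 and 9 (sensors 9 and 10 in one-based numbering).
rng = np.random.default_rng(1)
non_infected = rng.uniform(0.0, 1.0, size=(20, 11))
offset = np.full(11, 0.3)
offset[[8, 9]] = 0.0
infected = non_infected + offset

mae = sensor_mae(infected, non_infected)
kept = informative_sensors(mae, threshold=0.05)  # hypothetical cut-off
```

In this constructed case the two unresponsive sensors have an MAE of zero and are dropped, leaving nine inputs for the model.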

Machine Learning
After performing pre-processing of the data, the data has all been labeled as

Results
The model is created using the ANN toolbox within MATLAB. At the beginning, one hidden layer with random neurons has been set. Fig. 4 shows the training error for the five patients' data. In the figure, the blue dots show the error while the green line shows the separation between Flags 1 and 2. Three additional red lines mark 0.8, 0.5 and 0.2, respectively. Since Flag 2 is more stable than Flag 1, Flag 2 is used as the reference to identify whether the patient is infected or non-infected. In the figure, patients 1 and 3 are lower than 0.2 after the green line, while patient 2 is close to 0.2. For these three patients, it can be judged that they do not have pneumonia.
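For readers without MATLAB, a network with one hidden layer can be sketched in NumPy as below. This is a stand-in for illustration only, not the toolbox model used in the study: the hidden layer size of 5, the tanh and sigmoid activations, the learning rate, the epoch count and the synthetic training data are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(n_inputs, n_hidden, rng):
    """Small random weights for a network with one hidden layer."""
    return {
        "W1": rng.normal(0.0, 0.5, size=(n_inputs, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }

def forward(params, x):
    """Return hidden activations and one output in (0, 1) per reading."""
    hidden = np.tanh(x @ params["W1"] + params["b1"])
    output = sigmoid(hidden @ params["W2"] + params["b2"]).ravel()
    return hidden, output

def train(params, x, targets, lr=0.3, epochs=1000):
    """Full-batch gradient descent on mean squared error; returns the loss history."""
    losses = []
    n = len(x)
    for _ in range(epochs):
        hidden, output = forward(params, x)
        losses.append(float(np.mean((output - targets) ** 2)))
        # Backpropagate the squared error through the sigmoid output.
        d_out = ((2.0 / n) * (output - targets) * output * (1.0 - output))[:, None]
        d_hidden = (d_out @ params["W2"].T) * (1.0 - hidden ** 2)
        params["W2"] -= lr * (hidden.T @ d_out)
        params["b2"] -= lr * d_out.sum(axis=0)
        params["W1"] -= lr * (x.T @ d_hidden)
        params["b1"] -= lr * d_hidden.sum(axis=0)
    return losses

# Synthetic, already normalized readings from nine sensors:
# 20 labeled non-infection (0) and 20 labeled infection (1).
rng = np.random.default_rng(0)
x = np.vstack([rng.uniform(0.0, 0.4, size=(20, 9)),
               rng.uniform(0.6, 1.0, size=(20, 9))])
targets = np.concatenate([np.zeros(20), np.ones(20)])

params = init_params(n_inputs=9, n_hidden=5, rng=rng)
losses = train(params, x, targets)
_, predictions = forward(params, x)
```

Each reading yields one output between 0 and 1, which is the kind of value that can be compared against the 0.2, 0.5 and 0.8 reference lines.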
On the other hand, patients 4 and 5 are higher than 0.8 in Flag 2, hence these two patients are judged to be infected.
Fig. 4 The accuracy of the model
After training the ANN model, data from another two patients (one non-infection patient and one infection patient) are used to test the accuracy of the model. Fig. 5 shows the results on the testing data. The predicted values of the first patient (i.e., patient 6) are less than 0.2, hence this patient is clearly a non-infection case. However, the predicted values of the second patient (i.e., patient 7) show that this patient has been infected with pneumonia.
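The threshold reading of the network outputs can be written as a small decision rule. The sketch below uses the 0.2 and 0.8 levels from the text and assumes the Flag 2 outputs are summarized by their mean; the "undetermined" outcome for values between the two levels and the function name `classify_patient` are assumptions, and the example outputs are made up.

```python
import numpy as np

def classify_patient(outputs, flags, low=0.2, high=0.8):
    """Label a patient from the mean network output over Flag 2 readings."""
    outputs = np.asarray(outputs, dtype=float)
    flags = np.asarray(flags)
    flag2_mean = float(np.mean(outputs[flags == 2]))
    if flag2_mean > high:
        return "infection", flag2_mean
    if flag2_mean < low:
        return "non-infection", flag2_mean
    # Assumed fallback: the text does not define a decision between 0.2 and 0.8.
    return "undetermined", flag2_mean

flags = [1, 1, 2, 2, 2]
label_a, _ = classify_patient([0.40, 0.60, 0.90, 0.95, 0.85], flags)
label_b, _ = classify_patient([0.50, 0.30, 0.10, 0.05, 0.15], flags)
label_c, _ = classify_patient([0.50, 0.50, 0.40, 0.50, 0.60], flags)
```

Only the Flag 2 readings enter the decision, so unstable outputs during Flag 1 do not affect the label.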
The results show that the model predictions match the results obtained from TMUH. After the green line, Flag 2 is stable enough to tell whether the patient has pneumonia. To check the consistency of these data, cross validation is performed. This preliminary study records breathing data from only seven patients: four who are non-infected and three who are infected.
In each model, two cases (one non-infection and one infection) are used to test the accuracy of the model; as a result, three cross validation models were generated. Table 1 shows the result of each model for the same database. Since Flag 2 is more stable than Flag 1, the average over Flag 2 is calculated to identify whether the patient is infected or non-infected. In these results, values larger than 0.8 mean the patient has an infection, whereas values less than 0.2 mean the patient has no infection. Table 1 shows that each model achieved good performance: the accuracy of each model is 100%.
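One way to build three such train/test splits is sketched below. The patient numbers follow the infection status reported in the Results, but the specific pairing of held-out patients is an assumption, since the text does not say which two patients were held out in each model; `paired_holdout_folds` is a hypothetical helper.

```python
def paired_holdout_folds(infected, non_infected):
    """Hold out one infected and one non-infected patient per fold."""
    all_ids = list(non_infected) + list(infected)
    folds = []
    # zip stops at the shorter list, so three infected patients give three folds.
    for pos, neg in zip(infected, non_infected):
        test_ids = [neg, pos]
        train_ids = [p for p in all_ids if p not in test_ids]
        folds.append((train_ids, test_ids))
    return folds

infected = [4, 5, 7]         # patients judged infected in the Results
non_infected = [1, 2, 3, 6]  # patients judged non-infected in the Results
folds = paired_holdout_folds(infected, non_infected)
```

Each fold trains on five patients and tests on two. With four non-infected patients and only three folds, one non-infected patient is never held out under this pairing.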

Conclusion and Future Work
In this study, an electronic nose is used to collect patients' breathing data. Based on the characteristics of the substances in exhaled breath, sensor data were used to identify whether the patient has a VAP infection or not using an ANN, which demonstrated good prediction accuracy.
For future work, the study will attempt to increase the amount of patient data. The proposed algorithm needs to be comprehensively evaluated on a wider database. This will help to enhance the efficiency of the proposed algorithm. Currently, this study suggests that the model built by machine learning has good performance and accuracy.
Further development and more evaluation on a larger database are required to make sure that the model performs well in detecting different types of bacteria.