1 Introduction
Since the last months of 2019, a new disease called COVID-19 (coronavirus disease 2019) has emerged. The virus first appeared in Wuhan, the capital of Hubei province, China. COVID-19 spreads through contact with contaminated surfaces or when infected people cough or sneeze. The symptoms include fever, cough, fatigue and myalgia. On March 11, 2020, the World Health Organization declared a pandemic. At the time of writing, 659544 cases and 30630 deaths have been recorded.
Artificial intelligence, and machine learning in particular, has been a very active research field in medical diagnosis based on the analysis of X-ray images. In general, applying machine learning relies on building a strong training model. The two key steps of this process are feature extraction and feature selection. The first represents an image as a feature vector. Feature extraction methods include the histogram of oriented gradients, local binary patterns, color histograms, Fourier, Gabor, the discrete cosine transform, etc.
The second is feature selection, which selects the relevant and optimal subset of features by removing the non-informative and redundant ones. The classification output depends largely on the features used to build the training model. Not all features are relevant: several act as noise and reduce the accuracy rate.
Feature selection is a data preprocessing step that attempts to select, before classification, the optimal subset of relevant and informative features. Feature selection approaches are divided into three categories: filter, wrapper and embedded. The filter approach uses the general characteristics of the features. It ranks the features according to a certain measure such as the Fisher score [5], Pearson correlation [20], mutual information [19], etc.
The wrapper approach generates feature subsets by using a classifier. The score used to evaluate a candidate subset is generally the classification accuracy rate. The wrapper approach provides good results compared to the filter approach, but it is time-consuming because it calls the classifier several times at each iteration, and the results can depend strongly on that classifier. The wrapper approach generally uses a metaheuristic such as the Genetic Algorithm [7], Particle Swarm Optimization [8], the Gravitational Search Algorithm [16], the Binary Bat Algorithm [14], etc.
The last category is the embedded approach, which integrates feature selection into the classification process, as in SVM-RFE [15].
Many works on medical diagnosis are based on feature selection. In [1], the authors attempt to diagnose autism spectrum disorder using electroencephalograms. They use a feature selection approach based on mutual information, information gain, minimum redundancy maximum relevancy and a genetic algorithm. The classifiers are K nearest neighbor and support vector machine. Xiaoke et al. [6] present a multi-modal neuroimaging feature selection method for the diagnosis of Alzheimer’s disease (AD).
The authors propose a new multi-modal neuroimaging feature selection method based on a consistent metric constraint for AD analysis. A multi-kernel support vector machine is used as the classifier. In [3], the authors incorporate feature selection approaches into neonatal seizure diagnosis. The feature selection is part of a decision support system using electroencephalography. They use ten different feature selection algorithms to select the optimal subset of features. In [12], the authors address the problem of neurological disorder diagnosis for autism. They propose a feature selection approach to reduce the high dimensionality of connectome data. This new approach, called brain network atlas guided feature selection, disentangles the healthy from the disordered connectome.
In this paper, we propose a complete process for COVID-19 diagnosis. This process is composed of three phases.
The first is feature extraction, based on four approaches. The second is feature selection, for which we propose a new approach based on the Multi-Verse Optimizer and a new objective function. The last phase is classification. The approach is tested on COVID-19 chest X-ray images.
The rest of the paper is organized as follows. Section 2 details the proposed approach. Section 3 presents and discusses the experimental results. Section 4 presents the conclusions and some future work.
2 Proposed Approach
The approach proposed in this work contains three phases: feature extraction, feature selection and classification.
2.1 Feature Extraction
The first step is feature extraction, which converts each image into a feature vector. In this study, we propose to combine four feature extraction approaches:
— Pyramid Histogram of Oriented Gradients (PHOG) describes the gradient orientations in the image and is generally used for object detection (Dalal and Triggs, 2005). It counts the occurrences of gradient orientations. The image is divided into sub-regions at different resolutions [4].
— Fourier features are widely used in image processing. The Fourier transform decomposes the image into sine and cosine components. The number of frequencies equals the number of pixels in the image [18].
— Gabor features extract characteristics of scale, orientation and spatial locality, which are combined to recognize a region [9].
— The discrete cosine transform (DCT) is a member of the class of sinusoidal unitary transforms. As a feature extraction method, it divides the image into sub-blocks of differing importance with respect to visual quality [17].
The rationale for combining these feature extraction approaches is to exploit the advantages of each one and to obtain a sufficient number of features.
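The combination step can be sketched as the concatenation of several descriptor vectors. The following minimal Python sketch is only an illustration: it uses two simplified stand-in descriptors (a single-level gradient orientation histogram and a few low-frequency Fourier magnitudes), omits Gabor and DCT, and the function names, bin counts and the random test image are all assumptions rather than the settings used in this study.

```python
import numpy as np

def orientation_histogram(img, bins=8):
    """Histogram of gradient orientations, weighted by gradient magnitude."""
    gy, gx = np.gradient(img.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi)
    hist, _ = np.histogram(angle, bins=bins, range=(0.0, np.pi), weights=magnitude)
    total = hist.sum()
    return hist / total if total > 0 else hist

def fourier_features(img, k=2):
    """Magnitudes of the k x k lowest-frequency 2-D FFT coefficients."""
    spectrum = np.abs(np.fft.fft2(img.astype(float)))
    return spectrum[:k, :k].ravel()

def extract_features(img):
    """Concatenate the individual descriptors into one feature vector."""
    return np.concatenate([orientation_histogram(img), fourier_features(img)])

rng = np.random.default_rng(0)
image = rng.random((64, 64))          # hypothetical stand-in for a grayscale X-ray
features = extract_features(image)    # 8 orientation bins + 4 Fourier magnitudes
```

Each additional descriptor would simply contribute another block of entries to the concatenated vector.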
2.2 Feature Selection
In this section, we present the main step, feature selection, which is essential to the classification process. It selects the optimal subset of features, that is, the relevant and informative ones. Improving this step increases the quality of classification and, consequently, the accuracy rate.
In this study, we cast the feature selection problem as a combinatorial optimization problem defined as follows:
Let F = {F1,..., Fn} be the entire set of features provided by the first step (feature extraction). We define a vector of binary decision variables X = {X1,..., Xn}, where each Xi is 0 or 1. Xi = 1 means that the feature Fi is selected and will be used to build the training model, and Xi = 0 otherwise.
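As a small illustration of this encoding, the binary vector can be applied as a column mask on the feature matrix. The sketch below assumes NumPy and uses a toy matrix with hypothetical values.

```python
import numpy as np

# Toy feature matrix: 3 samples (rows) and n = 4 features (columns).
features = np.arange(12, dtype=float).reshape(3, 4)

# Decision vector X: X_i = 1 keeps feature F_i, X_i = 0 discards it.
x = np.array([1, 0, 1, 0])

# Keep only the columns whose decision variable equals 1.
selected = features[:, x.astype(bool)]
```

Here `selected` keeps the first and third columns, so the training model would be built on two features instead of four.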
To evaluate the quality of a candidate feature subset, a measure must be defined. The objective function computes the score of a candidate feature subset. The objective function J(X) proposed in this study is composed of two terms, the classification accuracy rate J1(X) and the number of selected features J2(X):
The main goal of this objective function is to reduce both the classification error rate and the number of selected features.
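The classification accuracy rate is obtained with five classifiers: Support Vector Machine using a Gaussian kernel, K Nearest Neighbor, Naive Bayes, Discriminant Analysis Classifier and Decision Tree. The class assigned to an instance Z is the one that occurs most often among C1(Z), C2(Z), C3(Z), C4(Z), C5(Z),
where C1(Z), C2(Z), C3(Z), C4(Z), C5(Z) are the classes of Z predicted by the Support Vector Machine, K Nearest Neighbor, Naive Bayes, Discriminant Analysis Classifier and Decision Tree, respectively. J1(X) has the following form: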
where CAR is the classification accuracy rate.
The second term of the objective function is the number of selected features, which the optimization attempts to minimize:
where n is the total number of features.
To minimize the objective function, we propose to use the Multi-Verse Optimizer.
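To make the two ingredients of the objective concrete, the following Python sketch shows a majority vote over classifier outputs and one possible two-term objective. The weighted-sum form, the weight `alpha`, and the definitions J1 = 1 - CAR and J2 = (number of selected features) / n are assumptions made for illustration only; they are a common choice in wrapper feature selection and are not claimed to be the exact expressions used in this study.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the most frequent label among the classifiers' predictions."""
    return Counter(predictions).most_common(1)[0][0]

def objective(car, x, alpha=0.9):
    """Assumed objective: weighted sum of error rate and selected fraction."""
    j1 = 1.0 - car            # classification error rate (assumed form of J1)
    j2 = sum(x) / len(x)      # fraction of selected features (assumed form of J2)
    return alpha * j1 + (1.0 - alpha) * j2

# Five hypothetical classifier outputs for one instance Z.
label = majority_vote(["covid", "normal", "covid", "covid", "normal"])

# Hypothetical accuracy of 95% with 3 of 4 features selected.
score = objective(car=0.95, x=[1, 0, 1, 1])
```

With five voters and two classes a tie cannot occur, so the vote is always well defined in this setting.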
2.2.1 Multi-Verse Optimizer
The Multi-Verse Optimizer (MVO) is a nature-inspired optimization algorithm based on multi-verse theory. Our universe was created by a big explosion called the big bang. The universe is expanding through space, which is caused by eternal inflation. Inflation is the main source of the formation of planets, stars, black holes, etc. [13, 2].
Multi-verse theory posits that other universes exist with different physical laws. Three concepts from cosmology, the white hole, the black hole and the wormhole, are the main keys of multi-verse theory.
Multi-verse theory assumes that there are many universes, each also created by a big bang [13, 2].
The Multi-Verse Optimizer is based on the following rules [13, 2]:
— The higher the inflation rate, the higher the probability of having a white hole,
— The higher the inflation rate, the lower the probability of having a black hole,
— Universes with a higher inflation rate tend to send objects through white holes,
— Universes with a lower inflation rate tend to receive more objects through black holes,
— The objects in all universes may move randomly towards the best universe via wormholes, regardless of the inflation rate,
— Each solution is a universe and each variable in the solution is an object in the universe. The concepts of white and black holes are used for exploration and the concept of wormholes is used for exploitation [13, 2].
The mathematical model is defined as follows [13, 2]:
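Let U denote the set of universes:

$$U = \begin{bmatrix} x_1^1 & x_1^2 & \cdots & x_1^d \\ x_2^1 & x_2^2 & \cdots & x_2^d \\ \vdots & \vdots & \ddots & \vdots \\ x_n^1 & x_n^2 & \cdots & x_n^d \end{bmatrix}$$

where d is the number of variables and n is the number of universes (candidate solutions). Objects are exchanged between universes through white and black holes using roulette wheel selection:

$$x_i^j = \begin{cases} x_k^j & r_1 < NI(U_i) \\ x_i^j & r_1 \ge NI(U_i) \end{cases}$$

where $x_i^j$ is the jth variable of the ith universe, $U_i$ is the ith universe, $NI(U_i)$ is the normalized inflation rate of the ith universe, $r_1$ is a random number in [0, 1], and $x_k^j$ is the jth variable of the kth universe selected by the roulette wheel.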
The pseudocode for this part is presented in Algorithm 1 [13, 2].
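The white and black hole exchange can be sketched in Python as follows. This is a minimal illustration for a minimization problem and not a transcription of Algorithm 1: the function name, the way roulette weights are derived from the inflation rates, and the choice to leave the best universe untouched are assumptions.

```python
import numpy as np

def exchange_objects(universes, inflation, rng):
    """White/black hole exchange for a minimization problem (a sketch)."""
    order = np.argsort(inflation)              # best (lowest) universe first
    sorted_u = universes[order]
    sorted_rates = inflation[order]

    # Normalized inflation rate NI(U_i), used as the exchange probability.
    norm = np.linalg.norm(sorted_rates)
    ni = sorted_rates / norm if norm > 0 else np.zeros_like(sorted_rates)

    # Assumed roulette weights: better (lower) rates get larger weights.
    weights = sorted_rates.max() - sorted_rates + 1e-12
    probs = weights / weights.sum()

    new_u = sorted_u.copy()
    n, d = new_u.shape
    for i in range(1, n):                      # the best universe is kept intact
        for j in range(d):
            if rng.random() < ni[i]:           # r1 < NI(U_i)
                k = rng.choice(n, p=probs)     # white hole chosen by roulette wheel
                new_u[i, j] = sorted_u[k, j]   # object j travels to universe i
    return new_u
```

Because objects only move within the same variable index j, each column of the result contains values taken from the same column of the input.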
In order to provide local changes for each universe and have high probability of improving the inflation rate using wormholes, we assume that wormhole tunnels are always established between a universe and the best universe formed so far. The formulation of this mechanism is as follows:
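$$x_i^j = \begin{cases} \begin{cases} X_j + TDR \times ((ub_j - lb_j) \times r_4 + lb_j) & r_3 < 0.5 \\ X_j - TDR \times ((ub_j - lb_j) \times r_4 + lb_j) & r_3 \ge 0.5 \end{cases} & r_2 < WEP \\ x_i^j & r_2 \ge WEP \end{cases}$$

Here $X_j$ is the jth variable of the best universe formed so far, $lb_j$ and $ub_j$ are the lower and upper bounds of the jth variable, $x_i^j$ is the jth variable of the ith universe, and $r_2$, $r_3$, $r_4$ are random numbers in [0, 1].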
The pseudocode of this part is presented in Algorithm 2 [13, 2].
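The wormhole mechanism can be sketched in Python as below. This is an illustrative reading of the mechanism rather than a copy of Algorithm 2: the function name and signature are hypothetical, and clipping the result to the bounds is an added assumption.

```python
import numpy as np

def wormhole_update(universe, best, lb, ub, wep, tdr, rng):
    """Teleport objects of one universe around the best universe (a sketch)."""
    updated = universe.copy()
    for j in range(universe.size):
        if rng.random() < wep:                                    # r2 < WEP
            step = tdr * ((ub[j] - lb[j]) * rng.random() + lb[j])  # uses r4
            if rng.random() < 0.5:                                # r3 < 0.5
                updated[j] = best[j] + step
            else:
                updated[j] = best[j] - step
    # Assumption: keep the universe inside the search bounds.
    return np.clip(updated, lb, ub)
```

With `wep = 0` the universe is left unchanged, and with `tdr = 0` every teleported object lands exactly on the best universe, which matches the intended shift from exploration to exploitation.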
As seen in the pseudocode, there are two main coefficients: WormholeExistenceProbability (WEP) and TravellingDistanceRate (TDR). The first increases linearly over the iterations in order to emphasize the exploitation phase. The second, TDR, represents the distance rate over which an object can be teleported by a wormhole around the best universe. These two coefficients are defined as follows [13, 2]:
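$$WEP = min + l \times \left( \frac{max - min}{L} \right)$$

$$TDR = 1 - \frac{l^{1/p}}{L^{1/p}}$$

where l is the current iteration, L is the maximum number of iterations, min and max are the minimum and maximum values of WEP, and p is the exploitation accuracy over the iterations [13, 2]. The MVO algorithm is presented in Algorithm 3 [13, 2].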
First, the algorithm randomly generates a set of universes. In each iteration, through white and black holes, objects move from universes with high inflation rates to universes with low inflation rates.
Each universe also undergoes random teleportation of its objects towards the best universe via wormholes [13, 2].
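The overall loop described above can be sketched in Python as follows. It is a compact illustration for continuous minimization, not a transcription of Algorithm 3: the function name `mvo`, the roulette weighting, the bound clipping and the sphere test function are assumptions, while the default parameter values mirror those reported in Section 3.2.

```python
import numpy as np

def mvo(objective, lb, ub, n_universes=60, max_iter=50, p=6.0,
        wep_min=0.2, wep_max=1.0, seed=0):
    """Minimal Multi-Verse Optimizer sketch for continuous minimization."""
    rng = np.random.default_rng(seed)
    lb, ub = np.asarray(lb, float), np.asarray(ub, float)
    d = lb.size
    universes = lb + (ub - lb) * rng.random((n_universes, d))
    best, best_rate = None, np.inf

    for iteration in range(1, max_iter + 1):
        # WEP grows linearly, TDR shrinks towards zero.
        wep = wep_min + iteration * (wep_max - wep_min) / max_iter
        tdr = 1.0 - iteration ** (1.0 / p) / max_iter ** (1.0 / p)

        # Evaluate and sort universes (lowest inflation rate first).
        rates = np.array([objective(u) for u in universes])
        order = np.argsort(rates)
        universes, rates = universes[order], rates[order]
        if rates[0] < best_rate:
            best, best_rate = universes[0].copy(), rates[0]

        norm = np.linalg.norm(rates)
        ni = rates / norm if norm > 0 else np.zeros_like(rates)
        weights = rates.max() - rates + 1e-12      # assumed roulette weights
        probs = weights / weights.sum()
        source = universes.copy()

        for i in range(1, n_universes):            # best universe is kept
            for j in range(d):
                if rng.random() < ni[i]:           # white/black hole exchange
                    k = rng.choice(n_universes, p=probs)
                    universes[i, j] = source[k, j]
                if rng.random() < wep:             # wormhole teleportation
                    step = tdr * ((ub[j] - lb[j]) * rng.random() + lb[j])
                    sign = 1.0 if rng.random() < 0.5 else -1.0
                    universes[i, j] = best[j] + sign * step
        universes = np.clip(universes, lb, ub)     # assumed bound handling

    return best, best_rate

def sphere(v):
    """Hypothetical test objective: sum of squares, minimum at the origin."""
    return float(np.sum(v ** 2))

best, rate = mvo(sphere, lb=[-5, -5, -5], ub=[5, 5, 5],
                 n_universes=20, max_iter=30)
```

For feature selection, the objective would instead train and evaluate the classifiers on the subset encoded by each universe, which is far more expensive than this toy function.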
2.2.2 Proposed Binary Multi-Verse Optimizer
We propose a binary version of the MVO algorithm. Feature selection is a binary problem where 1 means that the feature is selected and 0 otherwise.
In other words, if Xi = 1, the feature Fi is selected and used to build the training model; if Xi = 0, the feature Fi is not selected [10, 11]. For this reason, we use the sigmoid function as follows:
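One common way to binarize a continuous position with a sigmoid transfer function is sketched below in Python. The stochastic thresholding rule (select feature i when a uniform random number falls below the sigmoid of its position) is an assumption borrowed from other binary metaheuristics and may differ from the exact rule used in this study; the function name is hypothetical.

```python
import numpy as np

def binarize(position, rng):
    """Map a continuous universe to a 0/1 feature mask via a sigmoid."""
    prob = 1.0 / (1.0 + np.exp(-position))       # sigmoid in (0, 1)
    # Assumed rule: X_i = 1 with probability sigmoid(x_i), else 0.
    return (rng.random(position.shape) < prob).astype(int)
```

Large positive positions are almost always mapped to 1 and large negative positions to 0, while positions near zero are selected about half of the time.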
3 Experimental Results
We present the experimental results in this section. Performance is evaluated in terms of classification accuracy rate, sensitivity, specificity, positive predictive value and negative predictive value. The following formulas are used to compute these measures. Let NTP be the number of true positives, NTN the number of true negatives, NFP the number of false positives and NFN the number of false negatives. The measures are then defined as follows.
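Accuracy Rate | (NTP + NTN) / (NTP + NTN + NFP + NFN) |
Sensitivity | NTP / (NTP + NFN) |
Specificity | NTN / (NTN + NFP) |
Positive Predictive Value | NTP / (NTP + NFP) |
Negative Predictive Value | NTN / (NTN + NFN) |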
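These measures follow directly from the four confusion counts, as the short Python sketch below illustrates using the standard definitions. The function name and the example counts are hypothetical and are not taken from the experiments.

```python
def diagnostic_metrics(ntp, ntn, nfp, nfn):
    """Standard diagnostic measures from confusion-matrix counts."""
    return {
        "accuracy": (ntp + ntn) / (ntp + ntn + nfp + nfn),
        "sensitivity": ntp / (ntp + nfn),
        "specificity": ntn / (ntn + nfp),
        "ppv": ntp / (ntp + nfp),
        "npv": ntn / (ntn + nfn),
    }

# Hypothetical test set of 20 images: 10 positive and 10 negative.
m = diagnostic_metrics(ntp=9, ntn=10, nfp=0, nfn=1)
```

For these example counts the accuracy is 0.95, the sensitivity 0.90, the specificity and PPV 1.0, and the NPV 10/11, about 0.909.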
3.1 Datasets
The image dataset used in this work was collected by Adrian Rosebrock and is available at PyImageSearch.com. It is composed of 50 X-ray images divided into two categories: 25 images represent normal chests and the remaining 25 are labeled as COVID-19. The images have different sizes. Figure 1 shows some chest X-ray images.
3.2 Parameters Setting
In classification, it is essential to define the training and testing sets. In this study, we divide the dataset into two subsets: 70% of the instances are used for training and 30% for testing. To avoid the problem of overtraining, we randomly split the dataset in each iteration of the algorithm.
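A random 70/30 split of this kind can be sketched in Python as follows. The function name and the use of a shuffled index permutation are assumptions; the sketch also ignores class balance, which a stratified split would preserve.

```python
import numpy as np

def random_split(n_samples, train_fraction=0.7, rng=None):
    """Return shuffled, disjoint train and test index arrays."""
    rng = np.random.default_rng() if rng is None else rng
    indices = rng.permutation(n_samples)
    n_train = int(round(train_fraction * n_samples))
    return indices[:n_train], indices[n_train:]

# 50 images, as in the dataset described above; the seed is arbitrary.
train_idx, test_idx = random_split(50, rng=np.random.default_rng(0))
```

Calling the function again with a fresh generator state yields a different partition, which is how a new split can be drawn at each iteration.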
The parameters of the proposed approach are defined as follows.
The parameters of MVO are:
— Number of universes is 60,
— Number of iterations is 50,
— Coefficient p is 6,
— Minimum value of WEP (min) is 0.2,
— Maximum value of WEP (max) is 1.
As mentioned above, the feature extraction step uses four approaches:
— Features from F1 to F765 are obtained by Pyramid Histogram of Oriented Gradients,
— Features from F766 to F773 are obtained by Fourier,
— Features from F774 to F841 are obtained by Gabor,
— Features from F842 to F844 are obtained by DCT.
The total number of features is 844.
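The layout of the feature vector can be checked with a few lines of Python. The dictionary and helper below are hypothetical names introduced only to verify the index ranges listed above.

```python
# Inclusive 1-based index ranges of each descriptor in the feature vector.
feature_ranges = {
    "PHOG": (1, 765),
    "Fourier": (766, 773),
    "Gabor": (774, 841),
    "DCT": (842, 844),
}

# Number of features contributed by each descriptor.
counts = {name: last - first + 1 for name, (first, last) in feature_ranges.items()}
total = sum(counts.values())

def source_of(index):
    """Return the descriptor that produced feature F_index."""
    for name, (first, last) in feature_ranges.items():
        if first <= index <= last:
            return name
    raise ValueError(f"feature index {index} is out of range")
```

The counts are 765, 8, 68 and 3 features respectively, which sum to the stated total of 844; PHOG therefore accounts for roughly 90% of the vector.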
3.3 Results and Discussion
In this section, we present the experimental results obtained by the proposed approach. Table 1 presents the results.
Measure (%) | SVM | KNN | CNB | DAC | DTREE | This study |
Accuracy | 75 | 75 | 90 | 85 | 75 | 95 |
Sensitivity | 70 | 80 | 100 | 70 | 90 | 90 |
Specificity | 80 | 70 | 80 | 100 | 60 | 100 |
PPV | 77.77 | 72.72 | 83.34 | 100 | 69.23 | 100 |
NPV | 72.72 | 77.78 | 100 | 76.92 | 85.71 | 90.90 |
Table 1 reports the classification accuracy rate, sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) obtained by the proposed approach and by five classifiers using all the features: Support Vector Machine (SVM), K Nearest Neighbor (KNN), Naive Bayes classifier (CNB), Discriminant Analysis Classifier (DAC) and Decision Tree (DTREE).
The results show that the proposed approach provides the highest classification accuracy rate, 95%, followed by Naive Bayes with 90%. The discriminant analysis classifier reaches 85% accuracy. The remaining classifiers reach a classification accuracy rate of 75%.
For the proposed approach, the sensitivity is 90% and the specificity reaches 100%. This means that the proposed approach correctly returns a positive result for 90% of the people who have the disease and a false negative result for the remaining 10%. A specificity of 100% means that the proposed approach correctly returns a negative result for all people who do not have the disease. The positive and negative predictive values are also very satisfactory.
The total number of features is 844. The proposed approach selected 492 of them, which means that about 58% of the features were retained.
This paper can be summarized by the following points:
— The proposed approach is composed of three steps: feature extraction, feature selection and classification.
— The feature set is composed of features extracted using the Pyramid Histogram of Oriented Gradients, Fourier, Gabor and the discrete cosine transform.
— The feature selection approach is based on the Multi-Verse Optimizer, for which a binary version is proposed.
— The fitness function is composed of two terms: the accuracy rate and the number of selected features. The goal is to minimize both the classification error rate and the number of selected features.
— The classification step used to compute the fitness function and the classification accuracy rate is based on five classifiers: Support Vector Machine using a Gaussian kernel, K Nearest Neighbor, Naive Bayes, Discriminant Analysis Classifier and Decision Tree. The class assigned to an instance is chosen among the five classes generated by the different classifiers.
4 Conclusion
This paper proposes an automatic system for COVID-19 diagnosis. The system is composed of three main steps: feature extraction, feature selection and classification. We propose to combine four feature extraction approaches: the Pyramid Histogram of Oriented Gradients, Fourier, Gabor and the discrete cosine transform.
The next step is feature selection, which selects the relevant features. For this step, a wrapper approach based on the Multi-Verse Optimizer is proposed and a binary version of MVO is defined. The objective is to minimize both the number of features and the classification error rate. We combine five classifiers: Support Vector Machine using a Gaussian kernel, K Nearest Neighbor, Naive Bayes, Discriminant Analysis Classifier and Decision Tree. The class assigned to an instance is the class that occurs most often among the classes generated by the classifiers. The dataset is a set of chest X-ray images available on PyImageSearch.com. Performance has been evaluated by analyzing the classification accuracy rate, sensitivity, specificity, positive predictive value and negative predictive value. The analysis of the results indicates that the proposed approach provides satisfactory results compared to classifiers without feature selection. As future work, it would be very interesting to test this approach on a larger dataset containing many more images.