
Computación y Sistemas

On-line ISSN 2007-9737; Print ISSN 1405-5546

Comp. y Sist. vol. 26 no. 3, Ciudad de México, Jul./Sep. 2022; Epub Dec 02, 2022

https://doi.org/10.13053/cys-26-3-3782 

Articles

Comprehensive Performance Analysis on Classical Machine Learning and Deep Learning Methods for Predicting the COVID-19 Infections

Prabhat Kumar1  * 

Selvam Suresh1 

1 Banaras Hindu University, Institute of Science, Department of Computer Science, India. suresh.selvam@bhu.ac.in.


Abstract:

COVID-19 (coronavirus disease) has been declared a pandemic by the WHO (World Health Organization). The number of active COVID-19 cases is increasing day by day, and interpreting clinical laboratory findings to confirm COVID-19 infection is time-consuming. Treatment facilities and proper guidelines for reducing infection rates are limited. To overcome these limitations, clinical decision support systems embedded with prediction algorithms are required. In this study, we architected a clinical prediction system using classical machine learning algorithms, deep learning algorithms, and experimental laboratory data. Our models estimate which patients are likely infected with COVID-19. The prediction performance of our models is evaluated based on the accuracy score. The experimental dataset was provided by Hospital Israelita Albert Einstein in Sao Paulo, Brazil, and includes the records of 600 patients with 18 laboratory findings, of which 10% are COVID-19 positive. Our models were validated with a train-test split approach, 10-fold cross-validation, and the AUC-ROC curve score. The experimental results show that patients infected with COVID-19 are identified with an accuracy of 91.88% by the deep learning method (Convolutional Neural Network, CNN) and 89.79% by classical machine learning (Logistic Regression). This high accuracy is evidence that our prediction models could be readily used for predicting COVID-19 infections and for assisting health experts in diagnosis and clinical studies.

Keywords: COVID-19; coronavirus disease; WHO; machine learning; deep learning; decision support system

1 Introduction

The novel coronavirus (2019-nCoV), later named COVID-19 by the WHO, was first reported in Wuhan, China, on December 31, 2019 [1]. On January 30, 2020, the World Health Organization (WHO) declared the novel CoV outbreak a Public Health Emergency of International Concern under the International Health Regulations [2].

The clinical characteristics of CoV are classified into the most common symptoms (fever, dry cough, and tiredness), less common symptoms (aches and pains, sore throat, diarrhea, conjunctivitis, headache, loss of taste or smell, a rash on the skin, or discoloration of fingers or toes), and serious symptoms (difficulty breathing or shortness of breath, chest pain or pressure, and loss of speech or movement) [3, 4].

The prevention strategies for interrupting the spread of CoV include early detection, isolating and treating cases, contact tracing, and social distancing. CoV can be transmitted directly through close contact (<1 m) with an infected person via coughing or sneezing, and indirectly by touching contaminated surfaces or objects [5].

Recently, an article published in the New England Journal of Medicine provided evidence of the COVID-19 virus spreading through airborne transmission. Home quarantine is sufficient for an otherwise healthy person with mild CoV symptoms; on average, symptoms appear within 5–6 days, while in the worst cases it takes up to 14 days [6].

Over 80% of infected persons recover from COVID-19 with low levels of SARS-CoV-2 antibodies in their blood. Careful observation of antibody development in infected persons helps in developing vaccines and treatments for COVID-19 [7].

In the absence of vaccines or proper treatment, individuals can slow the transmission of COVID-19 by regularly washing their hands with soap and water, maintaining a social distance of at least 1 meter (3 feet) from others, avoiding crowded places, not touching the eyes, nose, and mouth unnecessarily, staying home and self-isolating when minor symptoms appear, and keeping up to date with the latest information from trusted sources, such as the WHO or local and national health authorities [8].

The WHO has brought together the world's scientists and global health professionals to collaborate in accelerating the research and development process and in developing treatments and vaccines for controlling the coronavirus pandemic [9]. Various studies of the epidemiologic history, laboratory conditions, clinical characteristics, treatment regimens, and prognosis of patients have been undertaken since the onset of the COVID-19 outbreak [10, 11].

The clinical characteristics of patients with mild symptoms have been studied, and the outcomes vary greatly [12, 13]. It is very difficult to identify high-risk groups by considering only age and gender factors. Furthermore, it is necessary to predict which groups are infected and to provide treatment under constrained hospital resources, while health practitioners face difficulties in treating patients without any previous experience of the disease. Given these limitations, artificial intelligence (AI) can analyze the data, learn effective patterns, and provide suggestions during decision-making processes. Over the last two decades, AI has achieved countless milestones in health care and advisory systems, such as biomedical information processing, disease diagnostics and prediction, and applications to radiology, pathology, ophthalmology, and dermatology [14–16].

Machine learning algorithms have also been applied to the early detection and prediction of health care issues in areas such as latent diseases [17], health monitoring systems [18], brain stroke [19], early-stage disease risk prediction [20], and acute kidney injury prediction [21]. Similarly, deep learning methods have been devoted to health applications such as Alzheimer's disease [22], emotion analysis for mental health care [23], cancer care [24], and prediction of pain progression in knee osteoarthritis [25].

We have observed that the contributions of AI, machine learning, and deep learning to the health care system are considerable, and the application of such techniques can also be extended to predicting COVID-19 infection.

In this paper, we apply classical supervised machine learning algorithms and deep learning methods to the prediction of COVID-19 infection. Twelve classifiers (nine classical supervised machine learning algorithms and three deep learning methods) are designed and applied to a laboratory dataset to identify infected patients.

The performance of our implemented models is compared based on the classification accuracy rate. The main objectives covered in this paper are summarized as follows:

– Introducing a prediction system for identifying COVID-19 infected persons using machine learning and deep learning algorithms on laboratory data rather than chest X-ray or CT-scan images.

– Comparing our research work across the various machine learning and deep learning algorithms mentioned in this paper, and analyzing the experimental results against other recently published research works.

– Motivating researchers to architect and build more effective models that include additional parameters such as gender, travel details, and previous medical treatment details to boost the prediction of COVID-19 infection outcomes.

The remainder of the paper is organized as follows. Section 2 elaborates on related work regarding the prediction of COVID-19 infections. The essential information about the experimental dataset and a basic introduction to the methodology are described in section 3. Section 4 provides the initial configuration setup for the methods used in this paper.

Section 5 presents the evaluation metrics (accuracy, precision, recall, F1-score, and AUC-ROC score) used to assess classification performance. Section 6 comprehensively analyzes the evaluation parameters and experimental results of the proposed classification models and compares them with recently published works. Finally, section 7 contains the conclusion of our research work and its future scope.

2 Related Works

Continuous monitoring and prediction are very important in health care tasks. Computer-aided clinical systems are widely used as assisting tools for various health-related issues such as diagnosis of breast cancer [26], diagnosis of early gastric cancer [27], brain pathology identification [28], computer-aided drug discovery [29], and health care facilities management [30].

Medical experts can use these techniques as assistance for better prediction of diagnosis-related issues. This study is dedicated to building a recent methodological model for predicting COVID-19 infection. Recently, various works have been published on deep learning methods for COVID-19 infection prediction using chest X-ray or CT-scan images [31–33].

The authors of [34] obtained a clinical dataset from the institutional ethics boards of Wenzhou Central Hospital and Cangnan People's Hospital in Wenzhou, China. Eleven effective features (ALT, myalgias, hemoglobin, gender, temperature, Na+, K+, lymphocyte count, creatinine, age, and white blood cell count) were extracted using feature selection algorithms. Six machine learning algorithms (Logistic Regression, KNN (k=5), Decision Tree based on Gain Ratio and Gini Index, Random Forests, and Support Vector Machine (SVM)) were applied, and accuracy was measured with 10-fold cross-validation.

The SVM obtained the maximum accuracy of 80% among the listed classifiers. The paper [35] also applied machine learning techniques (neural networks, random forests, gradient boosting trees, logistic regression, and support vector machines) to a clinical dataset and measured performance based on AUC, sensitivity, specificity, F1-score, Brier score, Positive Predictive Value (PPV), and Negative Predictive Value (NPV).

That work used a clinical dataset obtained from Hospital Israelita Albert Einstein in São Paulo, Brazil, split into 70% for training and 30% for testing. The SVM and random forest classifiers achieved the best scores for the measured parameters (AUC = 0.847, Sensitivity = 0.677, Specificity = 0.850, F1-score = 0.724, Brier score = 0.160, PPV = 0.778, and NPV = 0.773).

The paper [36] used the same experimental clinical dataset as [35] and applied various machine learning algorithms, including Logistic Regression (LR), Neural Network (NN), Random Forest (RF), Support Vector Machine (SVM), and Gradient Boosting (XGB). Predictive performance was compared in terms of AUC, AUPR, sensitivity, specificity, and specificity at greater than 95% sensitivity (Spec.@95%Sens.). XGB obtained the best experimental result: AUC = 0.66, AUPR = 0.21, sensitivity = 0.75, specificity = 0.49, Spec.@95%Sens. = 0.23.

In [37], deep learning methods (Artificial Neural Network (ANN), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Recurrent Neural Network (RNN), CNN + LSTM, and CNN + RNN) were applied to the clinical dataset used in [35]; experimental results were evaluated with the train-test split and 10-fold cross-validation approaches, and scores were measured based on precision, F1-score, recall, AUC, and accuracy. The hybrid CNN-LSTM model achieved the best predictive scores: accuracy = 86.66%, F1-score = 91.89%, precision = 86.75%, recall = 99.42%, and AUC = 62.50%.

3 Experimental Dataset and Methodology

The purpose of this section is to outline the necessary background information regarding the experimental dataset and methodology used in this paper.

3.1 Dataset Description

Here, we provide a detailed description of the experimental dataset, made available by Hospital Israelita Albert Einstein in Sao Paulo, Brazil, and accessed through [36].

The samples were collected to test for SARS-CoV-2 infection in the early months of 2020 and are available at [38]. This dataset contains records of 5644 patients with 111 different laboratory findings. The infection rate was around 10%, of which around 6.5% of patients required hospitalization and 2.5% required critical care. The remaining 90% of patients tested negative for SARS-CoV-2. Information related to the gender of patients is not provided in this dataset.

The dataset used here consists of ten columns (Patient ID, Patient age quantile, SARS-Cov-2 exam result (negative/positive), Patient admitted to the regular ward (yes/no), Patient admitted to the semi-intensive unit (yes/no), Patient admitted to the intensive care unit (yes/no), Hematocrit, Hemoglobin, Platelets, and Mean platelet volume). We apply the train-test split approach and randomly divide the dataset into training (80%) and testing (20%) subsets for validating our models. Furthermore, 10-fold cross-validation is also used to obtain a more robust estimate of the accuracy of the models.
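The following minimal sketch shows, under assumptions, how this validation setup can be prepared in Python with pandas and scikit-learn; the file name and the target column name are hypothetical placeholders based on the Kaggle dataset description in [38], and missing laboratory values are simply filled with zeros for brevity.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical file name for the data referenced in [38]
df = pd.read_excel("dataset.xlsx")

# Binary target: 1 for a positive SARS-CoV-2 exam result, 0 for negative
y = (df["SARS-Cov-2 exam result"] == "positive").astype(int)

# Keep the numeric laboratory features only; missing values filled with 0 for brevity
X = df.select_dtypes("number").fillna(0)

# 80/20 train-test split, stratified because only about 10% of samples are positive
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

# Standardize the features before feeding them to the models
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
```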

3.2 Methodology

Artificial Intelligence (AI) is a loose interpretation of human intelligence in machines, accomplished through learning, reasoning, and self-correction. An AI-based machine can make decisions based on predefined rules and algorithms without human intervention. Machine learning (ML) and deep learning (DL) are considered subsets of AI and adopt additional features to match or exceed human performance in terms of intelligence and accuracy. The working performance of ML differs from that of DL because of the way data is presented to the system: ML typically requires structured data, whereas DL relies on assembling artificial neural networks (ANN). ML applications still require a degree of human control, while DL systems aim to provide the same capabilities with less human interference. Because of the large amount of data processed and the complex mathematical calculations in the algorithms, DL systems require much more computing power than simple ML systems. Consequently, a deep learning system consumes much more time (a few hours to a few weeks) to train a model than a simple ML model (a few seconds to a few hours).

In this study, we provide a reasonable framework for validating the developed clinical predictive models for COVID-19 infection. We developed twelve different models (nine ML and three DL) for the study: logistic regression, K-neighbors classifier, support vector classifier, decision tree classifier, random forest classifier, AdaBoost classifier, GaussianNB, linear discriminant analysis, quadratic discriminant analysis, Convolutional Neural Network (CNN), Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM). The logistic regression classification algorithm is used to predict the probability of a categorical dependent variable, which should be a binary variable coded as true/false.

This ML model is used for predicting the risk of developing chronic diseases [39], the Trauma and Injury Severity Score (TRISS) [40], diabetes [41], heart disease identification [42], breast cancer [43], Alzheimer's disease [44], etc. K-Nearest Neighbors is a supervised classification algorithm in ML that stores all available cases and classifies new cases based on a similarity measure, e.g., the Hamming or a standardized distance function.

This is a popular method with a wide variety of applications in areas such as voice disorder identification [45], brain tumor classification [46], and more.

Support Vector Machine (SVM) is a supervised machine learning algorithm that can be used for classification, regression, or outlier detection. The SVM constructs a hyperplane or a set of hyperplanes in a high- or infinite-dimensional space. A good classification result is achieved by the hyperplane that has the largest distance to the nearest training data point of any class. This is a popular method with a wide variety of applications in areas such as skin disease detection [47], heart disease diagnosis [48], etc.

Decision tree is a supervised machine learning method that classifies instances according to their feature values [49]. This classifier follows a divide-and-conquer approach that splits the data into subsets with homogeneous properties. The method has innumerable applications in areas such as detection of hepatocellular carcinoma (HCC) in clinical data [50], understanding of Opioid Use Disorder (OUD) [51], etc.

Random forest is a supervised learning algorithm that can be used for both classification and regression. It draws a set of decision trees from randomly selected subsets of the training set and then merges them to obtain a more accurate and stable prediction. Its applications can be found in many areas, such as identification of human vital functions related to the respiratory system [52], prognosis prediction [53], etc.

The AdaBoost (Adaptive Boosting) classifier is an iterative approach that learns from the instances incorrectly classified by weak classifiers and fits additional copies of the classifier to turn them into a strong classifier. Application areas of the AdaBoost classifier include early prediction of lung cancer [54], recognition of diseased pinus trees [55], etc.

The GaussianNB classifier implements the Gaussian Naïve Bayes algorithm for classification. This method has been applied to problems in various areas such as diabetes prediction [56] and prediction models for the detection of cardiac arrest [57].

The Linear Discriminant Analysis (LDA) classifier estimates the probability that a new input set belongs to each class. The class with the highest probability value is designated as the output and forms the prediction. This approach is often preferred for models of human health effects [58], detection of epileptic seizures using EEG signals [59], etc.

Quadratic discriminant analysis (QDA) is used as a classifier but, unlike LDA, cannot be used as a dimensionality reduction technique. This approach is a variation of the LDA classification technique that also allows for non-linear separation of data. This method is applied in various areas such as epileptic seizure detection [60], pre-diagnosis of Huntington's disease [61], etc.

The design of the CNN architecture is inspired by the biological vision system and is composed of four types of layers: convolutional, pooling, activation, and fully-connected layers. Each layer is responsible for transforming the input volume into an output volume through different hidden layers to achieve the predefined goal. The CNN method can be applied in different areas such as automatic skin disease diagnosis [62], pneumonia detection [63], breathing disorder detection [64], arrhythmia classification [65], small lesion detection [66], etc. The idea behind the RNN method is to make use of sequential information, meaning that the output from the previous stage is provided as input to the current stage. The RNN has a notion of memory that stores previously calculated information and thus exhibits temporal dynamic behavior. For this reason, this method represents an attractive option for arrhythmia detection [67], hemoglobin concentration prediction [68], heart sound classification [69], etc.

The LSTM is a special kind of RNN explicitly designed to avoid the long-term dependency problem. The LSTM uses a chain structure that contains four neural network layers and various memory blocks, called cells. These cells are responsible for retaining information, while gates manipulate the memory: the forget gate, input gate, and output gate. The LSTM approach can be used in various fields such as EEG-based emotion classification [70], analysis of psychological effects [71], abnormal heart sound detection [72], and chronic laryngitis classification [73]. Figure 1 shows the logical diagram of the experimental prediction model used in this paper.

Fig. 1 Conceptual view of experimental models used in this paper: from the flow of dataset to ML (Machine Learning) and DL (Deep Learning) model, prediction model, and evaluation results 
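As an illustration of the deep learning side of this pipeline, the sketch below builds an LSTM classifier of the kind described above with Keras; the layer width of 64 units is an assumption, while the optimizer, learning rate, loss, dropout, epoch, and batch-size settings follow the configuration later listed in Table 2.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Dropout
from tensorflow.keras.optimizers import Adam

def build_lstm(n_features: int) -> Sequential:
    # Tabular features are treated as a sequence of length n_features with one channel
    model = Sequential([
        LSTM(64, activation="relu", input_shape=(n_features, 1)),  # width 64 is an assumption
        Dropout(0.4),
        Dense(64, activation="relu"),
        Dropout(0.6),
        Dense(2, activation="softmax"),  # two classes: infected / not infected
    ])
    model.compile(optimizer=Adam(learning_rate=0.0005),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage (X_train, y_train from the dataset sketch in section 3.1):
# model = build_lstm(X_train.shape[1])
# model.fit(X_train.reshape(-1, X_train.shape[1], 1), y_train,
#           epochs=30, batch_size=1024, validation_split=0.1)
```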

4 Configuration of Experimental Methods

In this section, we provide a detailed description of the configuration of the ML and DL methods used in this paper for the prediction of COVID-19 infection. For the ML algorithms, as opposed to the DL methods, we have used the scikit-learn classifiers (logistic regression, K-neighbors classifier, support vector classifier, decision tree classifier, random forest classifier, AdaBoost classifier, GaussianNB, linear discriminant analysis, and quadratic discriminant analysis).

These methods are publicly accessible with full documentation and can be imported from the sklearn library [74]. The initial parameter values for each classifier, together with references to the corresponding user guides, are listed in table 1. The layer architecture, details, and parameters of each DL classifier used in this study are listed in table 2.

Table 1 ML classifiers parameter adjustment 

classifier scikit-learn method parameters Ref.
Logistic Regression sklearn.linear_model.LogisticRegression C=1.0, max_iter=100, penalty='l2', solver='lbfgs', tol=0.0001 [75]
K-Neighbors sklearn.neighbors.KNeighborsClassifier leaf_size=30, metric='minkowski', n_neighbors=3, p=2 [76]
Support Vector sklearn.svm.SVC C=0.025, cache_size=200, degree=3, kernel='rbf', max_iter=-1 [77]
Decision Tree sklearn.tree.DecisionTreeClassifier criterion='gini', min_samples_leaf=1, presort='deprecated', splitter='best' [78]
Random Forest sklearn.ensemble.RandomForestClassifier n_estimators=100, min_samples_split=2, min_samples_leaf=1 [79]
AdaBoost sklearn.ensemble.AdaBoostClassifier algorithm='SAMME.R', learning_rate=1.0, n_estimators=50 [80]
GaussianNB sklearn.naive_bayes.GaussianNB var_smoothing=1e-09 [81]
Linear Discriminant Analysis sklearn.discriminant_analysis.LinearDiscriminantAnalysis solver='svd', tol=0.0001 [82]
Quadratic Discriminant Analysis sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis reg_param=0.0, tol=0.0001 [83]
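A sketch of how the nine scikit-learn classifiers can be instantiated with the Table 1 parameter values is given below; the training data are assumed to come from the split in section 3.1, and `probability=True` is added to the SVC (beyond Table 1) only so that class probabilities are available for the AUC analysis later.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                            QuadraticDiscriminantAnalysis)

classifiers = {
    "Logistic Regression": LogisticRegression(C=1.0, max_iter=100, penalty="l2",
                                              solver="lbfgs", tol=0.0001),
    "K-Neighbors": KNeighborsClassifier(n_neighbors=3, leaf_size=30,
                                        metric="minkowski", p=2),
    "Support Vector": SVC(C=0.025, kernel="rbf", degree=3, cache_size=200,
                          max_iter=-1, probability=True),
    "Decision Tree": DecisionTreeClassifier(criterion="gini", min_samples_leaf=1,
                                            splitter="best"),
    "Random Forest": RandomForestClassifier(n_estimators=100, min_samples_split=2,
                                            min_samples_leaf=1),
    "AdaBoost": AdaBoostClassifier(algorithm="SAMME.R", learning_rate=1.0,
                                   n_estimators=50),
    "GaussianNB": GaussianNB(var_smoothing=1e-09),
    "Linear Discriminant Analysis": LinearDiscriminantAnalysis(solver="svd",
                                                               tol=0.0001),
    "Quadratic Discriminant Analysis": QuadraticDiscriminantAnalysis(reg_param=0.0,
                                                                     tol=0.0001),
}

# Fit each classifier on the training split and report its test-set accuracy
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    print(f"{name}: accuracy = {clf.score(X_test, y_test):.4f}")
```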

Table 2 Architecting configuration of DL methods 

Parameters CNN RNN LSTM
Number of layers 1 1 1
Activation function ReLU, Softmax ReLU, Softmax ReLU, Softmax
Learning rate 0.0005 0.0005 0.0005
Loss function Sparse categorical crossentropy Sparse categorical crossentropy Sparse categorical crossentropy
Number of epoch 30 30 30
Optimizer Adam Adam Adam
Dropout 0.4, 0.6 0.4, 0.6 0.4, 0.6
Batch size 512 1024 1024
Total parameters 9,442,306 2,102,274 4,728,322

The parameters (number of layers, activation function, learning rate, loss function, number of epochs, optimizer, dropout, batch size, and total parameters) define the framework of the DL methods. Each layer is configured with values that can be optimized and that manipulate the input data.

The activation function decides whether a neuron should be activated or not by calculating a weighted sum. How quickly or slowly the neural network model learns a problem depends on the learning rate value. The loss function calculates the prediction error of the neural network. The epoch value counts the number of passes of the full training dataset through the model. The optimizer is responsible for reducing the losses and improving the accuracy as much as possible. Dropout is commonly used in deep neural networks to prevent overfitting. The batch size specifies the number of training samples processed before the model weights are updated. The total number of parameters aggregates all weights and biases.
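To make the Table 2 configuration concrete, the sketch below assembles a 1D-CNN classifier with those settings in Keras; the number of filters, kernel size, and dense-layer width are assumptions, since Table 2 does not specify them.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv1D, MaxPooling1D, Flatten, Dense, Dropout
from tensorflow.keras.optimizers import Adam

def build_cnn(n_features: int) -> Sequential:
    model = Sequential([
        # Single convolutional layer; 64 filters and kernel size 3 are assumptions
        Conv1D(64, kernel_size=3, activation="relu", input_shape=(n_features, 1)),
        MaxPooling1D(pool_size=2),
        Dropout(0.4),
        Flatten(),
        Dense(64, activation="relu"),
        Dropout(0.6),
        Dense(2, activation="softmax"),
    ])
    # Table 2 settings: Adam optimizer, learning rate 0.0005, sparse categorical crossentropy
    model.compile(optimizer=Adam(learning_rate=0.0005),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Usage with the Table 2 epoch and batch-size values:
# model = build_cnn(X_train.shape[1])
# model.fit(X_train.reshape(-1, X_train.shape[1], 1), y_train,
#           epochs=30, batch_size=512, validation_split=0.1)
```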

5 Evaluation Metrics

The evaluation metrics provided by the sklearn library are used to compare the performance of the classifiers. In this section, we discuss the evaluation metrics used for the experimental result analysis, which is based on the train-test split and 10-fold cross-validation approaches and on a comparison with published works.

To evaluate the classification performance of the models, we use accuracy (A), precision (P), recall (R), and the F1-score (F1). For a binary classification problem, the confusion matrix holds the entries True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN). The diagonal entries hold the correct predictions, TP and TN, while the off-diagonal entries, FP and FN, correspond to the wrong predictions made by the classifier.

Accuracy measures the ratio between the number of correct predictions and the total number of input samples. A classification model is considered perfect when the number of correctly predicted samples equals the total number of samples. For a multiclass classification problem, the number of classes is denoted by k.

Precision measures the number of correct positive results divided by the number of positive results predicted by the classifier. Recall measures the number of correct positive results divided by the number of all relevant (actually positive) samples.

The F1-score is primarily used to measure the model's test accuracy and varies between 0 and 1. A model with high precision but low recall can achieve a high accuracy rate yet miss a large number of samples that are difficult to classify. Table 3 lists the equations used to measure classification accuracy, precision, recall, and F1-score, extracted from the confusion matrix. The ROC-AUC (Receiver Operating Characteristic - Area Under the Curve) is frequently used to evaluate the performance of classification and prediction models. It examines the model's ability to distinguish between positive and negative classes. A higher AUC score indicates a better model for predicting whether patients are infected or not infected. The ROC curve is plotted with the False Positive Rate (FPR) on the X-axis and the True Positive Rate (TPR) on the Y-axis (figure 2).

Table 3 Equations for evaluating the classification performance 

Evaluation Metric  Equation
Accuracy (A)       A = (TP + TN) / (TP + FP + TN + FN)
Precision (Pk)     Pk = TP / (TP + FP)
Recall (Rk)        Rk = TP / (TP + FN)
F1-score (F1k)     F1k = 2 × (Pk × Rk) / (Pk + Rk)
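A minimal sketch of computing the Table 3 metrics from the confusion matrix with scikit-learn follows; `classifiers`, `X_test`, and `y_test` are assumed to come from the earlier sketches.

```python
from sklearn.metrics import confusion_matrix

y_pred = classifiers["Logistic Regression"].predict(X_test)
tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()

accuracy  = (tp + tn) / (tp + fp + tn + fn)                # Accuracy (A)
precision = tp / (tp + fp)                                 # Precision (Pk)
recall    = tp / (tp + fn)                                 # Recall (Rk)
f1_score  = 2 * precision * recall / (precision + recall)  # F1-score (F1k)
```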

Fig. 2 AUC-ROC Curve 

The FPR and TPR scores are calculated using expressions (1) and (2), respectively:

FPR = FP / (TN + FP), (1)

TPR = TP / (TP + FN). (2)

The AUC score (between 0 and 1) measures separability: an AUC score near 1 means the model has good separation capability. For a multi-class problem with N classes, we can plot N ROC curves, one per class.
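The sketch below illustrates this AUC-ROC evaluation with scikit-learn and matplotlib; it assumes a fitted classifier exposing `predict_proba`, such as the logistic regression model from the earlier sketch.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, roc_auc_score

# Predicted probability of the positive (infected) class
y_score = classifiers["Logistic Regression"].predict_proba(X_test)[:, 1]

fpr, tpr, _ = roc_curve(y_test, y_score)   # FPR on the X-axis, TPR on the Y-axis
auc = roc_auc_score(y_test, y_score)

plt.plot(fpr, tpr, label=f"Logistic Regression (AUC = {auc:.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="No discrimination")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```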

6 Experimental Results and Discussion

This section presents the experimental results of the ML and DL methods for the prediction of COVID-19 infection, considering a total of 600 patients with 18 different laboratory findings. The results are evaluated based on the train-test split approach, 10-fold cross-validation, the ROC-AUC score, and a comparison with published works.

6.1 Train-Test Split Approach

As can be observed in Table 4, the accuracy of all ML models reached at least 80%. The Logistic Regression, K-Neighbors, and AdaBoost classifiers achieved the best evaluation performance with an 85.00% accuracy score. The Support Vector, Random Forest, GaussianNB, and Linear Discriminant Analysis classifiers were observed as the second-best models. The experimental results of the DL methods were obtained by aggregating the mean values of the accuracy scores. In terms of predictive accuracy, we observed that the overall best score was achieved by CNN with 91.88%, followed by RNN (accuracy = 90.27%) and then LSTM (accuracy = 90.00%).

Table 4 Summary of experimental results of all ML and DL with the train-test split approach. 

Machine Learning Methods           Accuracy (%)
Logistic Regression                85.00
K-Neighbors                        85.00
Support Vector                     84.16
Decision Tree                      82.50
Random Forest                      84.16
AdaBoost                           85.00
GaussianNB                         84.16
Linear Discriminant Analysis       84.16
Quadratic Discriminant Analysis    80.83

Deep Learning Methods              Accuracy (%)
CNN                                91.88
RNN                                90.27
LSTM                               90.00

6.2 10-Fold Cross-Validation Approach

In 10-fold cross-validation, the experimental dataset is randomly partitioned into 10 equal sub-datasets. Of these, a single sub-dataset is used as the validation set, and the remaining nine sub-datasets are used as training data. The cross-validation procedure repeats this process ten times so that each of the 10 subsamples is used exactly once as validation data. The final result is produced by averaging the results of the ten folds. Table 5 shows the experimental results of all models with the 10-fold cross-validation approach.
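A sketch of this 10-fold procedure with scikit-learn is given below; the `classifiers` dictionary and the feature matrix `X` and target `y` are assumed to come from the earlier sketches.

```python
from sklearn.model_selection import KFold, cross_val_score

cv = KFold(n_splits=10, shuffle=True, random_state=42)

for name, clf in classifiers.items():
    scores = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    # Final score is the average accuracy over the ten folds
    print(f"{name}: mean accuracy = {scores.mean():.4f}")
```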

Table 5 Summary of experimental results of all ML and DL with 10 folds cross-validation approach 

Machine Learning Methods           Accuracy (%)
Logistic Regression                89.79
K-Neighbors                        87.29
Support Vector                     87.29
Decision Tree                      83.75
Random Forest                      89.16
AdaBoost                           87.91
GaussianNB                         83.54
Linear Discriminant Analysis       87.70
Quadratic Discriminant Analysis    82.29

Deep Learning Methods              Accuracy (%)
CNN                                87.66
RNN                                86.49
LSTM                               86.66

In cases of relatively small samples, the k-fold cross-validation approach is frequently used to measure the classification performance of classifiers accurately, especially in health studies [35]. Table 5 reports the accuracy scores of all ML and DL classification methods based on the 10-fold cross-validation technique. The performance of all ML algorithms was better with the 10-fold cross-validation approach than with the train-test split strategy, whereas the opposite was observed for the DL methods. The accuracy of all ML models reached at least 82.29%.

Logistic Regression achieved the best accuracy with an 89.79% score, followed by Random Forest as the second-best model with 89.16% accuracy. The experimental results of the DL methods were again obtained using the mean accuracy scores. Among the DL methods, CNN achieved the best accuracy with 87.66%, followed by LSTM (accuracy = 86.66%) and then RNN (accuracy = 86.49%).

6.3 Results Interpretation of Area Covered Under the ROC Curve

The AUC score indicates how well a model separates the classes and ranges in value from 0 to 1. The various points on the ROC curve reflect different characteristics of the model's performance. Table 6 below characterizes a model according to the range of its AUC score [84]. Classification models that achieved an AUC score above 0.60 can be considered acceptable for the clinical prediction of COVID-19.

Table 6 Performance measurement based on AUC score 

AUC-Score Model’s characteristics
0 Inaccurate Test
0.5 No Discrimination
0.6 to 0.8 Acceptable
0.8 to 0.9 Excellent
> 0.90 Outstanding
1 Accurate Test
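A small helper mapping an AUC score to the qualitative labels of Table 6 might look as follows; the labels for the ranges Table 6 leaves unnamed (between 0 and 0.5, and between 0.5 and 0.6) are assumptions.

```python
def auc_characteristic(auc: float) -> str:
    """Map an AUC score to the qualitative characteristics of Table 6."""
    if auc == 1.0:
        return "Accurate test"
    if auc > 0.90:
        return "Outstanding"
    if auc >= 0.8:
        return "Excellent"
    if auc >= 0.6:
        return "Acceptable"
    if auc > 0.5:
        return "Poor"              # range not labelled in Table 6; assumption
    if auc == 0.5:
        return "No discrimination"
    return "Inaccurate test"       # Table 6 labels only AUC = 0 this way; assumption
```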

According to Table 6, the AUC score of logistic regression, which lies between 0.8 and 0.9, is considered excellent (figure 3). The AUC scores of the remaining ML methods were acceptable, with all results higher than 0.66. Among the DL methods, the RNN achieved the highest score (AUC = 0.68), followed by CNN (AUC = 0.62) and then the LSTM approach (AUC = 0.50) (table 7).

Fig. 3 AUC values of all classical machine learning algorithms 

Table 7 AUC values of all deep learning methods 

Deep Learning Methods AUC- Score
CNN 0.6211
RNN 0.6876
LSTM 0.5000

6.4 Comparison of Experimental Results with Recently Published Articles

The papers [34] and [35] used classical machine learning algorithms, i.e., Support Vector Machine and Random Forest, respectively. Similarly, the paper [37] compared the prediction performance of six different classical and hybrid deep learning algorithms, i.e., ANN, CNN, RNN, LSTM, CNN-RNN, and CNN-LSTM. In contrast, we have used both classical ML and DL algorithms. As can be observed in Table 8, the best classification in our work is obtained by the deep learning methods (CNN, RNN, and LSTM). Our study thus exposes the performance of both ML and DL methods and shows that the AUC scores of the methods are acceptable for the prediction of COVID-19 infection.

Table 8 Comparison of experimental results with recently published works 

Ref. Dataset Source Techniques Classification methods Accuracy (%) AUC F1 – Score
[34] Wenzhou Central Hospital and Cangnan People’s Hospital in Wenzhou, China ML Support Vector Machine 80.00 - -
[35] Hospital Israelita Albert Einstein at Sao Paulo, Brazil ML Support Vector Machine, Random Forest - 0.87 0.72
[36] Hospital Israelita Albert Einstein at Sao Paulo, Brazil ML Logistic Regression 89.00 0.85 -
Our work Hospital Israelita Albert Einstein at Sao Paulo, Brazil DL CNN, RNN, LSTM 91.88, 90.27, 90.00 - -

7 Conclusion and Future Works

In this study, we have designed and developed classical machine learning and deep learning models for predicting COVID-19 infection. We carried out nine classical machine learning algorithms (logistic regression, K-neighbors classifier, support vector classifier, decision tree classifier, random forest classifier, AdaBoost classifier, GaussianNB, linear discriminant analysis, and quadratic discriminant analysis) and three deep learning methods (CNN, RNN, and LSTM) to accomplish the clinical prediction task.

The experimental data were preprocessed using standardization and then fed to our models. The classification results were measured based on the accuracy score. To validate our models, we used the train-test split approach, 10-fold cross-validation, and the AUC-ROC curve score. With the train-test split approach, the best result among the deep learning methods was achieved by CNN with an accuracy of 91.88% and an AUC score of 62.11%.

Among the machine learning classifiers, Logistic Regression, K-Neighbors, and AdaBoost obtained the same accuracy of 85.00%, with AUC scores of 85.00%, 78.00%, and 71.00%, respectively. With 10-fold cross-validation, the best accuracy values were achieved by CNN (deep learning) with 87.66% and by Logistic Regression (machine learning) with 89.79%. All of the ML and DL algorithms used in this study achieved an accuracy of over 80%.

A major limitation of this study is the small and imbalanced experimental dataset. The performance of our prediction models can be enhanced by increasing the size of the dataset, either by combining data from different laboratories or by using data augmentation techniques. Further studies can build on our work by including additional parameters such as gender, travel details, and previous medical treatment details to enhance the prediction rate. Based on our experimental results, we conclude that clinical systems should explore the use of artificial intelligence models as decision support systems while reducing personal infection risks.
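As one illustration of the imbalance-handling direction suggested above (not something done in this paper), minority-class oversampling with SMOTE from the imbalanced-learn package could be applied to the training split from section 3.1:

```python
from imblearn.over_sampling import SMOTE

# Oversample the minority (infected) class in the training data only
X_res, y_res = SMOTE(random_state=42).fit_resample(X_train, y_train)

# X_res / y_res now hold a class-balanced training set that can be fed
# to any of the classifiers or deep learning models defined earlier.
```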

Acknowledgments

Prabhat Kumar sincerely acknowledges the University Grants Commission (UGC), New Delhi, India, for awarding the Non-Net Fellowship Scheme [FILE NO.: R/Dev/IX-Sch .(UGC Res. Sch.) 2020-21/13674, Dated: 01-10-2019]. The work was supported under the Institute of Eminence (IoE) Seed Grant by Banaras Hindu University.

References

1. Aylward, B., Liang, W. (2020). Report of the WHO-China joint mission on Coronavirus disease 2019 (COVID-19). WHO-China Jt. Mission Coronavirus Dis. 2019. pp. 6–24.

2. WHO (2020). WHO Director-General's statement on IHR Emergency Committee on Novel Coronavirus (2019-nCoV), August 11, 2020.

3. Wang, D., Hu, B., Hu, C., Zhu, F., Liu, X., Zhang, J., Wang, B., Xiang, H., Cheng, Z., Xiong, Y., et al. (2020). Clinical characteristics of 138 hospitalized patients with 2019 novel Coronavirus-Infected pneumonia in Wuhan, China. JAMA, Vol. 323, No. 11, pp. 1061–1069. DOI: 10.1001/jama.2020.1585.

4. Holshue, M. L., DeBolt, C., Lindquist, S., Lofy, K. H., Wiesman, J., Bruce, H., Spitters, C., Ericson, K., Wilkerson, S., Tural, A., et al. (2020). First case of 2019 novel coronavirus in the United States. New England Journal of Medicine, Vol. 382, pp. 929–936. DOI: 10.1056/NEJMoa2001191.

5. WHO (2020). Modes of transmission of virus causing COVID-19: implications for IPC precaution recommendations, Scientific brief, August 13, 2020.

6. WHO (2020). Coronavirus, August 13, 2020. Available from: https://www.who.int/health-topics/coronavirus#tab=tab_3.

7. WHO (2020). Potent antibodies found in people recovered from COVID-19, August 13, 2020. Available from: https://www.nih.gov/news-events/nih-research-matters/potent-antibodies-found-people-recovered-covid-19.

8. WHO (2020). Advice for the public, August 13, 2020.

9. WHO (2020). Global research on coronavirus disease (COVID-19), August 13, 2020.

10. Huang, C., Wang, Y., Li, X., Ren, L., Zhao, J., Hu, Y., Zhang, L., Fan, G., Xu, J., Gu, X., et al. (2020). Clinical features of patients infected with 2019 novel coronavirus in Wuhan, China. Lancet, Vol. 395, pp. 497–506. DOI: 10.1016/S0140-6736(20)30183-5.

11. Liu, K., Fang, Y. Y., Deng, Y., Liu, W., Wang, M. F., Ma, J. P., Xiao, W., Wang, Y. N., Zhong, M. H., Li, C. H., et al. (2020). Clinical characteristics of novel coronavirus cases in tertiary hospitals in Hubei Province. Chinese Medical Journal, Vol. 133, pp. 1025–1031. DOI: 10.1097/CM9.0000000000000744.

12. Kam, K. Q., Yung, C. F., Cui, L., Lin, R. T. P., Mak, T. M., Maiwald, M., Li, J., Chong, C. Y., Nadua, K., Tan, N. W. H., et al. (2020). A well infant with Coronavirus disease 2019 (COVID-19) with High Viral Load. Clinical Infectious Diseases, Vol. 71, No. 15, pp. 847–849. DOI: 10.1093/cid/ciaa201.

13. Jiehao, C., Jin, X., Daojiong, L., Zhi, Y., Lei, X., Zhenghai, Q., Yuehua, Z., Hua, Z., Ran, J., Pengcheng, L., et al. (2020). A case series of children with 2019 novel coronavirus infection: clinical and epidemiological features. Clinical Infectious Diseases, Vol. 71, No. 6, pp. 1547–1551. DOI: 10.1093/cid/ciaa198.

14. Kulkarni, S., Seneviratne, N., Baig, M. S., Khan, A. H. A. (2020). Artificial intelligence in medicine: where are we now? Academic Radiology, Vol. 27, No. 1, pp. 62–70. DOI: 10.1016/j.acra.2019.10.001.

15. Rowe, J. P., Lester, J. C. (2020). Artificial intelligence for personalized preventive adolescent healthcare. Journal of Adolescent Health, Vol. 67, No. 2, pp. S52–S58. DOI: 10.1016/j.jadohealth.2020.02.021.

16. Rong, G., Méndez, A., Assi, E. B., Zhao, B., Sawan, M. (2020). Artificial intelligence in healthcare: review and prediction case studies. Engineering, Vol. 6, No. 3, pp. 291–301. DOI: 10.1016/j.eng.2019.08.015.

17. Wang, Y., Zhao, Y., Therneau, T. M., Atkinson, E. J., Tafti, A. P., Zhang, N., Amin, S., Limper, A. H., Khosla, S., Liu, H. (2020). Unsupervised machine learning for the discovery of latent disease clusters and patient subgroups using electronic health records. Journal of Biomedical Informatics, Vol. 102, pp. 1–10. DOI: 10.1016/j.jbi.2019.103364.

18. Sheela, K. G., Varghese, A. R. (2019). Machine learning based health monitoring system. Materials Today: Proceedings, Vol. 24, pp. 1788–1794. DOI: 10.1016/j.matpr.2020.03.603.

19. Sirsat, M. S., Fermé, E., Câmara, J. (2020). Machine learning for brain stroke: A review. Journal of Stroke and Cerebrovascular Diseases, Vol. 29, No. 10, pp. 1–17. DOI: 10.1016/j.jstrokecerebrovasdis.2020.10516.

20. Hossain, M. A., Ferdousi, R., Alhamid, M. F. (2020). Knowledge-driven machine learning based framework for early-stage disease risk prediction in edge environment. Journal of Parallel and Distributed Computing, Vol. 146, pp. 25–34. DOI: 10.1016/j.jpdc.2020.07.003.

21. Yang, X., Yu, Y., Xu, J., Shu, H., Xia, J., Liu, H., Wu, Y., Zhang, L., Yu, Z., Fang, M., et al. (2020). Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study. The Lancet Respiratory Medicine, Vol. 8, No. 5, pp. 475–481. DOI: 10.1016/S2213-2600(20)30079-5.

22. An, N., Ding, H., Yang, J., Au, R., Ang, T. F. A. (2020). Deep ensemble learning for Alzheimer’s disease classification. Journal of Biomedical Informatics, Vol. 105, pp. 1–11. DOI: 10.1016/j.jbi.2020.103411.

23. Fei, Z., Yang, E., Li, D. D. U., Butler, S., Ijomah, W., Li, X., Zhou, H. (2020). Deep convolution network based emotion analysis towards mental health care. Neurocomputing, Vol. 388, pp. 212–227. DOI: 10.1016/j.neucom.2020.01.034.

24. Coccia, M. (2020). Deep learning technology for improving cancer care in society: New directions in cancer imaging driven by artificial intelligence. Technology in Society, Vol. 60, pp. 1–11. DOI: 10.1016/j.techsoc.2019.101198.

25. Guan, B., Liu, F., Matthew, P., Mirzaian, A. H., Demehri, S., Neogi, T., Guermazi, A., Kijowski, R. (2020). Deep learning approach to predict pain progression in knee osteoarthritis. Osteoarthritis and Cartilage, Vol. 28, pp. S316. DOI: 10.1016/j.joca.2020.02.489.

26. Rahman, M. M., Ghasemi, Y., Suley, E., Zhou, Y., Wang, S., Rogers, J. (2020). Machine learning based computer aided diagnosis of breast cancer utilizing anthropometric and clinical features. IRBM, Vol. 42, No. 4, pp. 215–226. DOI: 10.1016/j.irbm.2020.05.005.

27. Horiuchi, Y., Hirasawa, T., Ishizuka, N., Tokai, Y., Namikawa, K., Yoshimizu, S., Ishiyama, A., Yoshio, T., Tsuchida, T., Fujisaki, J., et al. (2020). Performance of a computer-aided diagnosis system in diagnosing early gastric cancer using magnifying endoscopy videos with narrow-band imaging (with videos). Gastrointestinal Endoscopy, Vol. 92, No. 4, pp. 856–865. DOI: 10.1016/j.gie.2020.04.079.

28. Gudigar, A., Raghavendra, U., Hegde, A., Kalyani, M., Ciaccio, E. J., Acharya, U. R. (2020). Brain pathology identification using computer aided diagnostic tool: A systematic review. Computer Methods and Programs in Biomedicine, Vol. 187, pp. 1–18. DOI: 10.1016/j.cmpb.2019.105205.

29. Ebhohimen, I. E., Edemhanria, L., Awojide, S., Onyijen, O. H., Anywar, G. (2020). Advances in computer-aided drug discovery. Phytochemicals as Lead Compounds for New Drug Discovery, Elsevier, pp. 25–37. DOI: 10.1016/B978-0-12-817890-4.00003-2.

30. Iadanza, E., Luschi, A. (2019). Computer-aided facilities management in health care. Clinical Engineering Handbook, 2nd ed., Academic Press, pp. 42–51. DOI: 10.1016/B978-0-12-813467-2.00005-5.

31. Panwar, H., Gupta, P. K., Siddiqui, M. K., Morales-Menéndez, R., Singh, V. (2020). Application of deep learning for fast detection of COVID-19 in X-Rays using nCOVnet. Chaos, Solitons and Fractals, Vol. 138, pp. 1–8. DOI: 10.1016/j.chaos.2020.109944.

32. Panwar, H., Gupta, P. K., Siddiqui, M. K., Morales-Menéndez, R., Bhardwaj, P., Singh, V. (2020). A deep learning and Grad-CAM based color visualization approach for fast detection of COVID-19 cases using chest X-ray and CT-Scan images. Chaos, Solitons and Fractals, Vol. 140, pp. 1–12. DOI: 10.1016/j.chaos.2020.110190.

33. Das, N. N., Kumar, N., Kaur, M., Kumar, V., Singh, D. (2020). Automated deep transfer learning-based approach for detection of COVID-19 infection in chest X-rays. IRBM, Vol. 43, No. 2, pp. 114–119. DOI: 10.1016/j.irbm.2020.07.001.

34. Jiang, X., Coffee, M., Bari, A., Wang, J., Jiang, X., Huang, J., Shi, J., Dai, J., Cai, J., Zhang, T., et al. (2020). Towards an artificial intelligence framework for data-driven prediction of coronavirus clinical severity. Computers, Materials & Continua, Vol. 63, No. 1, pp. 537–551. DOI: 10.32604/cmc.2020.010691.

35. Batista, A. F. M., Miraglia, J. L., Donato, T. H. R., Chiavegatto Filho, A. D. P. (2020). COVID-19 diagnosis prediction in emergency care patients: a machine learning approach. MedRxiv. DOI: 10.1101/2020.04.04.20052092.

36. Schwab, P., Schütte, A. D., Dietz, B., Bauer, S. (2020). predCOVID-19: A systematic study of clinical predictive models for Coronavirus Disease 2019. ArXiv:2005.08302, Vol. 76, pp. 1–8.

37. Alakus, T. B., Turkoglu, I. (2020). Comparison of deep learning approaches to predict COVID-19 infection. Chaos, Solitons and Fractals, Vol. 140, pp. 1–7. DOI: 10.1016/j.chaos.2020.110120.

38. Kaggle (2020). Diagnosis of COVID-19 and its clinical spectrum. August 18, 2020. Available from: https://www.kaggle.com/einsteindata4u/covid19.

39. Nusinovici, S., Tham, Y. C., Yan, M. Y. C., Ting, D. S. W., Li, J., Sabanayagam, C., Wong, T. Y., Cheng, C. Y. (2020). Logistic regression was as good as machine learning for predicting major chronic diseases. Journal of Clinical Epidemiology, Vol. 122, pp. 56–69. DOI: 10.1016/j.jclinepi.2020.03.002.

40. Schluter, P. J. (2011). The trauma and injury severity score (TRISS) revised. Injury, Vol. 42, No. 1, pp. 90–96. DOI: 10.1016/j.injury.2010.08.040.

41. Aboagye-Mensah, E. B., Azap, R. A., Odei, J. B., Gray, D. M., Nolan, T. S., Elgazzar, R., White, D., Gregory, J., Joseph, J. J. (2020). The association of ideal cardiovascular health with self-reported health, diabetes, and adiposity in African American males. Preventive Medicine Reports, Vol. 19. DOI: 10.1016/j.pmedr.2020.101151.

42. Ahmed, H., Younis, E. M. G., Hendawi, A., Ali, A. A. (2020). Heart disease identification from patients’ social posts, machine learning solution on spark. Future Generation Computer Systems, Vol. 111, pp. 714–722. DOI: 10.1016/j.future.2019.09.056.

43. Morais-Rodrigues, F., Silverio-Machado, R., Kato, R. B., Rodrigues, D. L. N., Valdez-Baez, J., Fonseca, V., San, E. J., Gomes, L. G. R., dos Santos, R. G., Viana, M. V. C., et al. (2020). Analysis of the microarray gene expression for breast cancer progression after the application modified logistic regression. Gene, Vol. 726, pp. 1–8. DOI: 10.1016/j.gene.2019.144168.

44. Fukunishi, H., Nishiyama, M., Luo, Y., Kubo, M., Kobayashi, Y. (2020). Alzheimer-type dementia prediction by sparse logistic regression using claim data. Computer Methods and Programs in Biomedicine, Vol. 196, pp. 1–8. DOI: 10.1016/j.cmpb.2020.105582.

45. Chen, L., Wang, C., Chen, J., Xiang, Z., Hu, X. (2021). Voice disorder identification by using Hilbert-Huang Transform (HHT) and K Nearest Neighbor (KNN). Journal of Voice, Vol. 35, No. 6, pp. 932.e1–932.e11. DOI: 10.1016/j.jvoice.2020.03.009.

46. Kaplan, K., Kaya, Y., Kuncan, M., Ertunç, H. M. (2020). Brain tumor classification using modified local binary patterns (LBP) feature extraction methods. Medical Hypotheses, Vol. 139. DOI: 10.1016/j.mehy.2020.109696.

47. Balaji, V. R., Suganthi, S. T., Rajadevi, R., Kumar, V. K., Balaji, B. S., Pandiyan, S. (2020). Skin disease detection and segmentation using dynamic graph cut algorithm and classification through Naive Bayes classifier. Measurement, Vol. 163. DOI: 10.1016/j.measurement.2020.107922.

48. Shah, S. M. S., Shah, F. A., Hussain, S. A., Batool, S. (2020). Support vector machines-based heart disease diagnosis using feature subset, wrapping selection and extraction methods. Computers & Electrical Engineering, Vol. 84. DOI: 10.1016/j.compeleceng.2020.106628.

49. Panigrahi, R., Borah, S. (2019). Classification and analysis of Facebook metrics dataset using supervised classifiers. Social Network Analytics, Computational Research Methods and Techniques, Elsevier, pp. 1–19. DOI: 10.1016/B978-0-12-815458-8.00001-3.

50. Radha, P., Divya, R. (2020). An efficient detection of HCC-recurrence in clinical data processing using boosted decision tree classifier. Procedia Computer Science, Vol. 167, pp. 193–204. DOI: 10.1016/j.procs.2020.03.196.

51. Wadekar, A. S. (2020). Understanding opioid use disorder (OUD) using tree-based classifiers. Drug and Alcohol Dependence, Vol. 208. DOI: 10.1016/j.drugalcdep.2020.107839.

52. Proniewska, K., Pregowska, A., Malinowski, K. P. (2020). Identification of human vital functions directly relevant to the respiratory system based on the cardiac and acoustic parameters and random forest. IRBM, Vol. 42, No. 3, pp. 174–179. DOI: 10.1016/j.irbm.2020.02.006.

53. Li, J., Tian, Y., Zhu, Y., Zhou, T., Li, J., Ding, K., Li, J. (2020). A multicenter random forest model for effective prognosis prediction in collaborative clinical research network. Artificial Intelligence in Medicine, Vol. 103. DOI: 10.1016/j.artmed.2020.101814.

54. Tan, C., Chen, H., Xia, C. (2009). Early prediction of lung cancer based on the combination of trace element analysis in urine and an Adaboost algorithm. Journal of Pharmaceutical and Biomedical Analysis, Vol. 49, No. 3, pp. 746–752. DOI: 10.1016/j.jpba.2008.12.010.

55. Hu, G., Yin, C., Wan, M., Zhang, Y., Fang, Y. (2020). Recognition of diseased pinus trees in UAV images using deep learning and AdaBoost classifier. Biosystems Engineering, Vol. 194, pp. 138–151. DOI: 10.1016/j.biosystemseng.2020.03.021.

56. Mujumdar, A., Vaidehi, V. (2019). Diabetes prediction using machine learning algorithms. Procedia Computer Science, Vol. 165, pp. 292–299. DOI: 10.1016/j.procs.2020.01.047.

57. Javan, S. L., Sepehri, M. M., Javan, M. L., Khatibi, T. (2019). An intelligent warning model for early prediction of cardiac arrest in sepsis patients. Computer Methods and Programs in Biomedicine, Vol. 178, pp. 47–58. DOI: 10.1016/j.cmpb.2019.06.010.

58. Worth, A. P., Cronin, M. T. D. (2003). The use of discriminant analysis, logistic regression and classification tree analysis in the development of classification models for human health effects. Journal of Molecular Structure: THEOCHEM, Vol. 622, Nos. 1–2, pp. 97–111. DOI: 10.1016/S0166-1280(02)00622-X.

59. Nkengfack, L. C. D., Tchiotsop, D., Atangana, R., Louis-Door, V., Wolf, D. (2020). EEG signals analysis for epileptic seizures detection using polynomial transforms, linear discriminant analysis and support vector machines. Biomedical Signal Processing and Control, Vol. 62. DOI: 10.1016/j.bspc.2020.102141.

60. Bari, M. F., Fattah, S. A. (2020). Epileptic seizure detection in EEG signals using normalized IMFs in CEEMDAN domain and quadratic discriminant classifier. Biomedical Signal Processing and Control, Vol. 58. DOI: 10.1016/j.bspc.2019.101833.

61. Georgiou-Karistianis, N., Gray, M. A., Domínguez D, J. F., Dymowski, A. R., Bohanna, I., Johnston, L. A., Churchyard, A., Chua, P., Stout, J. C., Egan, G. F. (2013). Automated differentiation of pre-diagnosis Huntington’s disease from healthy control individuals based on quadratic discriminant analysis of the basal ganglia: The IMAGE-HD study. Neurobiology of Disease, Vol. 51, pp. 82–92. DOI: 10.1016/j.nbd.2012.10.001.

62. Shanthi, T., Sabeenian, R. S., Anand, R. (2020). Automatic diagnosis of skin diseases using convolution neural network. Microprocessors and Microsystems, Vol. 76. DOI: 10.1016/j.micpro.2020.103074.

63. Jain, R., Nagrath, P., Kataria, G., Kaushik, V. S., Hemanth, D. J. (2020). Pneumonia detection in chest X-ray images using convolutional neural networks and transfer learning. Measurement, Vol. 165. DOI: 10.1016/j.measurement.2020.108046.

64. Cimr, D., Studnicka, F., Fujita, H., Tomaskova, H., Cimler, R., Kuhnova, J., Slegr, J. (2020). Computer aided detection of breathing disorder from ballistocardiography signal using convolutional neural network. Information Sciences, Vol. 541, pp. 207–217. DOI: 10.1016/j.ins.2020.05.051.

65. Atal, D. K., Singh, M. (2020). Arrhythmia classification with ECG signals based on the optimization-enabled deep convolutional neural network. Computer Methods and Programs in Biomedicine, Vol. 196. DOI: 10.1016/j.cmpb.2020.105607.

66. Savelli, B., Bria, A., Molinara, M., Marrocco, C., Tortorella, F. (2020). A multi-context CNN ensemble for small lesion detection. Artificial Intelligence in Medicine, Vol. 103. DOI: 10.1016/j.artmed.2019.101749.

67. Zhang, J., Liu, A., Gao, M., Chen, X., Zhang, X., Chen, X. (2020). ECG-based multi-class arrhythmia detection using spatio-temporal attention-based convolutional recurrent neural network. Artificial Intelligence in Medicine, Vol. 106. DOI: 10.1016/j.artmed.2020.101856.

68. Pellicer-Valero, O. J., Cattinelli, I., Neri, L., Mari, F., Martín-Guerrero, J. D., Barbieri, C. (2020). Enhanced prediction of hemoglobin concentration in a very large cohort of hemodialysis patients by means of deep recurrent neural networks. Artificial Intelligence in Medicine, Vol. 107. DOI: 10.1016/j.artmed.2020.101898.

69. Deng, M., Meng, T., Cao, J., Wang, S., Zhang, J., Fan, H. (2020). Heart sound classification based on improved MFCC features and convolutional recurrent neural networks. Neural Networks, Vol. 130, pp. 22–32. DOI: 10.1016/j.neunet.2020.06.015.

70. Yang, J., Huang, X., Wu, H., Yang, X. (2020). EEG-based emotion classification based on bidirectional long short-term memory network. Procedia Computer Science, Vol. 174, pp. 491–504. DOI: 10.1016/j.procs.2020.06.117.

71. Ghosh, L., Saha, S., Konar, A. (2020). Bi-directional long short-term memory model to analyze psychological effects on gamers. Applied Soft Computing, Vol. 95. DOI: 10.1016/j.asoc.2020.106573.

72. Zhang, W., Han, J., Deng, S. (2019). Abnormal heart sound detection using temporal quasi-periodic features and long short-term memory without segmentation. Biomedical Signal Processing and Control, Vol. 53. DOI: 10.1016/j.bspc.2019.101560.

73. Guedes, V., Junior, A., Fernandes, Teixeira, F., Teixeira, J. P. (2018). Long short term memory on chronic laryngitis classification. Procedia Computer Science, Vol. 138, pp. 250–257. DOI: 10.1016/j.procs.2018.10.036.

74. Scikit Learn (2020). Supervised learning, scikit-learn-0.23.2 documentation, August 31, 2020.

75. Scikit Learn (2020). Logistic Regression, sklearn.linear_model.LogisticRegression, scikit-learn-0.23.2 documentation, August 31, 2020.

76. Scikit Learn (2020). KNN, sklearn.neighbors.KNeighborsClassifier, scikit-learn-0.23.2 documentation, August 31, 2020.

77. Scikit Learn (2020). sklearn.svm.SVC, scikit-learn-0.23.2 documentation, August 31, 2020.

78. Scikit Learn (2020). Decision Tree, sklearn.tree.DecisionTreeClassifier, scikit-learn-0.23.2 documentation, August 31, 2020.

79. Scikit Learn (2020). Random Forest, sklearn.ensemble.RandomForestClassifier, scikit-learn-0.23.2 documentation, August 31, 2020.

80. Scikit Learn (2020). AdaBoost, sklearn.ensemble.AdaBoostClassifier, scikit-learn-0.23.2 documentation, August 31, 2020.

81. Scikit Learn (2020). Naïve Bayes, sklearn.naive_bayes.GaussianNB, scikit-learn-0.23.2 documentation, August 31, 2020.

82. Scikit Learn (2020). LDA, sklearn.discriminant_analysis.LinearDiscriminantAnalysis, scikit-learn-0.23.2 documentation, August 31, 2020.

83. Scikit Learn (2020). Quadratic Discriminant Analysis, sklearn.discriminant_analysis.QuadraticDiscriminantAnalysis, scikit-learn-0.23.2 documentation, August 31, 2020.

84. Mandrekar, J. N. (2010). Receiver operating characteristic curve in diagnostic test assessment. Journal of Thoracic Oncology, Vol. 5, No. 9, pp. 1315–1316. DOI: 10.1097/JTO.0b013e3181ec173d.

Received: October 16, 2020; Accepted: December 20, 2021

* Corresponding author: Prabhat Kumar, e-mail: prabhat.kumar13@bhu.ac.in

This is an open-access article distributed under the terms of the Creative Commons Attribution License.