1. Introduction
Plant pathogens often destroy the results of farmers' time and effort and cause significant economic damage. A disease identification system that detects leaf diseases can help avoid such losses, and disease-related information can be made accessible via smartphone. Such a system can predict the type of disease accurately, and farmers can use this information to reduce the damage that diseases cause on a large scale. Disease control and management rests on four important notions: surveys, monitoring, classification, and identification. All of these become possible with a correctly implemented leaf disease identification system. In most automatic detection processes, image processing is used to extract visual information, which is then compared with pre-existing data sets (Jagtap & Hambarde, 2014; Huang & Chang, 2020; Kaur et al., 2018; Pugoy & Mariano, 2011). The type of disease and the plant species have a considerable impact on the parameters observed in the various studies. Several variables must be taken into consideration when conducting research on plant diseases, including the identification of weeds and plants, the image background, and the image-capturing conditions. Building a disease detection system requires all of these components to be accounted for to achieve proper recognition and classification, which poses a number of difficulties in the real world. Several attempts have already been made to recognise and classify photos in order to identify leaf infections. This study classifies tomato leaf diseases using the Xception deep learning model optimised with Adam.
This section presents an overview of the literature on plant disease detection. Many studies have examined this problem and proposed methods to solve it, and the literature is vast. More than 100 species of Alternaria are reported across various ecosystems and agro-climatic zones, and they are among the most important phytopathogenic fungi. The evidence indicates that phytopathogenic Alternaria species cause 20% of the damage to agricultural produce. They produce a variety of secondary metabolites in crops, fruits, and vegetables; this issue is explored in Dhaware and Wanjale (2017). In Patokar and Gohokar (2020), the classification accuracy for six classes was 95% with VGG16 and 93.4% with AlexNet.
This issue was investigated by Kaur et al. (2018) and Pugoy and Mariano (2011), who reviewed the available methods for segmentation, feature extraction, and classification of diseases into categories using fruit and leaf images. ANN and SVM were found to have better accuracy than the other classifiers. Jagtap and Hambarde (2014) present the k-means clustering method for segmenting images. Features are extracted from a particular cluster using the grey-level co-occurrence matrix; afterwards, the extracted features are fed to an SVM classifier to categorise images as healthy or disease-affected. The results are verified using various kernels, and the same accuracy is found for the classification of healthy and unhealthy images. A web-based disease detection system is presented in Pantazi et al. (2019), which uses compressed sensing on measurements of the segmented leaf to reduce storage complexity. Compared with existing techniques, the presented system provides a precision of 98.5% and a classification accuracy of 98.4%.
The work in Kanjalkar and Lokhande (2013) and Prasad et al. (2016) focuses on two primary infections, yellowing and Esca, using RGB images and hyperspectral reflectance data from diseased leaves and from healthy plant leaves. Parameters, including texture parameters, are compiled and then examined for leaf yellowing and Esca. The accuracy obtained was 99% for both diseases. The work presented in Dhaygude and Kumbhar (2013) addresses disease detection by comparing FCM and KCM. The analysis with FCM achieved 86% accuracy and KCM achieved 66%. Classification was done using fuzzy c-means clustering. Barot and Limbad (2015) report an accuracy of 85-95% for diseases in brinjal leaves, with the analysis done using the k-means algorithm and an ANN. Kaur and Kaur (2017) and Sethy et al. (2017) discuss identifying individual spots and lesions instead of the full leaf. Because each area has its own features, the information variance decreases without any need for full-leaf images. This enables the detection of several common infections that look similar on the leaf, with an accuracy of nearly 86%.
Ramya and Lydia (2016) use MATLAB to identify infections on leaves and also present the colour variations of the infected leaf area. Hillnhütter et al. (2012) present the use of an expert system for texture-based identification of leaf diseases. In most of the automatic detection algorithms proposed in the literature, visual information is collected from digital images and compared with existing data sets. Many researchers have applied ANNs, probabilistic neural networks, and SVMs to detect diseases in vegetables, cash crops, and cereals (Pixia & Xiangdong, 2013; Pujari et al., 2014; VijayaLakshmi & Mohan, 2016; Wang et al., 2008). Probabilistic models and support vector machine classifiers have been used to identify leaf diseases (Arivazhagan et al., 2013; Jaware et al., 2012; Waghmare et al., 2016). Aphid fungal diseases and tomato crop diseases have been identified and diagnosed with the help of ANNs (Al Bashish et al., 2010; Bauer et al., 2009; Joshi & Jadhav, 2016). Several studies have suggested an algorithm named the "minimum distance classifier" for recognising cucumber leaf diseases (Biswas et al., 2014; De Luna et al., 2017; Gavhale et al., 2014; Jadhav & Patil, 2016; Liu et al., 2017). A methodology for identifying diseases of plants such as jackfruit, pepper, and tomato using the SVM classifier is presented in (Atabay, 2017; Fadzil et al., 2014; Gaikwad & Karande, 2016). Abade et al. (2021) provide an overview of applications of CNN algorithms to the diagnosis of plant diseases. The authors examined 121 studies published in journals between 2010 and 2019. In this analysis, PlantVillage was the most widely used dataset and TensorFlow the most frequently used framework.
Dhaka et al. (2021) provide an overview of the fundamental CNN model procedures that can be used to diagnose plant diseases from photographs of leaf lesions. They also compare several CNN models, pre-processing methods, and frameworks, and examine datasets and model performance measures. Nagaraju and Chawla (2020) conducted a review to identify the most useful datasets, pre-processing methods, and DL methodologies for a variety of plants. They examined a total of 84 papers on the application of DL to the diagnosis of plant diseases. They concluded that many DL approaches are limited in their capacity to examine the original images and that an appropriate pre-processing strategy is required for good model performance.
Vallabhajosyula et al. (2021) used a Kaggle dataset to demonstrate the classification of multiple leaf diseases. A CNN architecture was used for the classification task and reached 100% accuracy on the Kaggle dataset. Hassan et al. (2021) used images from more than one PlantVillage leaf dataset to classify diseases. The architecture used in this work was EfficientNet B0, and the multiclass leaf disease classification reached an accuracy of 99.56%. Yadav et al. (2021) performed peach leaf disease detection with a self-made dataset. A CNN architecture was used, and the reported classification accuracy is 98.75%. In Atila et al. (2021), multiple images from the PlantVillage dataset were classified, and the EfficientNet architecture gave an accuracy of 98.42%.
Many of the methods reviewed above perform offline disease detection and reach an accuracy of up to 98.5% on a personal computer, relying on manual input for identification. Different image features, such as texture and size, are considered for processing, and the analysis is done on platforms such as MATLAB and Python using deep learning, AI, and SVM for classification. The highest accuracies found in the literature for two or more classes are 99% and 100%. The methods, approaches, and classes differ, but the aim of all the reviewed literature is the same: leaf disease identification and classification.
2. Materials and methods
The proposed approach is implemented in a Jupyter notebook with Python on a Dell OptiPlex 7050 running Windows 10 with 16 GB of RAM, a 1 TB hard drive, and a 7th-generation Intel(R) Core(TM) i5 CPU. The PlantVillage dataset is used to analyse photos of leaves with nine different illnesses, including Bacterial Blight and Leaf Spot, as well as healthy leaves and background images, giving a total of 11 image classes. The images are taken from the open-source platform PlantVillage (Huang & Chang, 2020). Each image contains a single leaf and a single background. The training flow for leaf disease detection is shown in Figure 1.
The images are first resized from 256 × 256 to 224 × 224. The database is then partitioned into a training set and a validation set. Data augmentation techniques are used to increase the amount of data by adding slightly altered versions of existing data or new synthetic data created from existing data. During training, augmentation acts as a regulariser and helps reduce overfitting.
Classification models are a subset of supervised machine learning: a classification model reads an input and outputs the category to which that input belongs. In this case, the Xception model was used for classification. The deep neural network was trained on a training dataset of images. Training is iterative: small adjustments are made to the model weights at each step, so the model's performance changes with each iteration. A total of 100 epochs were used during training. A machine learning algorithm is optimised using a loss function. The loss is calculated on both the training and the validation data and reflects the model's performance on these two sets. It summarises the errors made on the examples in the training or validation set, and its value indicates how well or poorly the model performs after each optimisation iteration.
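The text does not name the loss function that was used. As a hedged illustration only, the sketch below assumes categorical cross-entropy, a common choice for multi-class image classifiers. The function names and the toy probabilities are hypothetical and are not taken from the study.

```python
import math


def cross_entropy(probabilities, true_class, eps=1e-12):
    """Loss for one example: -log of the probability given to the true class."""
    return -math.log(max(probabilities[true_class], eps))


def mean_loss(batch_probabilities, true_classes):
    """Average per-example loss over a training or validation batch."""
    losses = [cross_entropy(p, t) for p, t in zip(batch_probabilities, true_classes)]
    return sum(losses) / len(losses)


# Toy 3-class example (hypothetical values, not results from the paper):
# a confident correct prediction gives a small loss,
# a confident wrong prediction gives a large loss.
confident_right = cross_entropy([0.9, 0.05, 0.05], true_class=0)
confident_wrong = cross_entropy([0.05, 0.9, 0.05], true_class=0)
```

Under this assumption, a falling mean loss on the validation set across epochs indicates that the predicted probabilities for the true classes are increasing.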
The performance of the algorithm is evaluated using accuracy, an easily interpreted metric. A model's accuracy is typically assessed after the model parameters have been learned and is expressed as a percentage. It measures how closely the model's predictions match the actual data. After training, the model can be saved to local disk for further processing.
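The preprocessing steps described earlier in this section (resizing to 224 × 224, augmentation, and the training/validation split) can be sketched with NumPy as follows. This is a minimal sketch under stated assumptions: nearest-neighbour resizing, flip-based augmentation, the 80/20 split, and the random seed are all illustrative choices, since the text does not specify them, and the function names are hypothetical.

```python
import numpy as np


def resize_nearest(image, size=224):
    """Resize an H x W x C image to size x size by nearest-neighbour sampling."""
    height, width = image.shape[:2]
    rows = np.arange(size) * height // size
    cols = np.arange(size) * width // size
    return image[rows][:, cols]


def augment(image):
    """Return the image plus two slightly altered copies (assumed: flips)."""
    return [image, np.fliplr(image), np.flipud(image)]


def train_val_split(num_images, val_fraction=0.2, seed=0):
    """Shuffle image indices and split them into training and validation sets."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(num_images)
    num_val = int(num_images * val_fraction)
    return indices[num_val:], indices[:num_val]


# Placeholder 256 x 256 RGB image standing in for a PlantVillage photo.
image = np.zeros((256, 256, 3), dtype=np.uint8)
resized = resize_nearest(image)
augmented = augment(resized)
train_idx, val_idx = train_val_split(100)
```

In practice a deep learning framework would normally handle these steps; the sketch only makes the order of operations explicit.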
In the prediction flow shown in Figure 2, the saved model is first loaded from local disk, and the result of the trained model is then obtained by running prediction with the loaded model. For prediction, the classes are defined as 0: 'Bacterial spot', 1: 'Early_blight', 2: 'healthy', 3: 'Late_blight', 4: 'Leaf_Mold', 5: 'Septoria_leaf_spot', 6: 'Spider_mites', 7: 'Target_Spot', 8: 'mosaic_virus', 9: 'Yellow_Leaf_Curl_Virus', 10: 'background'. The numeric values 0 to 10 appear in the classification summary together with the precision, recall, and F1 score of the respective class.
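The mapping from a model output to a class name can be sketched as below. The class indices and names are those listed in the text; the helper `decode_prediction` and the example score vector are hypothetical and stand in for whatever the loaded model returns.

```python
# Class indices and names as listed in the text.
CLASS_NAMES = {
    0: "Bacterial spot",
    1: "Early_blight",
    2: "healthy",
    3: "Late_blight",
    4: "Leaf_Mold",
    5: "Septoria_leaf_spot",
    6: "Spider_mites",
    7: "Target_Spot",
    8: "mosaic_virus",
    9: "Yellow_Leaf_Curl_Virus",
    10: "background",
}


def decode_prediction(scores):
    """Return (index, name, score) of the highest-scoring class."""
    if len(scores) != len(CLASS_NAMES):
        raise ValueError("expected one score per class")
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, CLASS_NAMES[best], scores[best]


# Hypothetical model output for one image (one score per class).
scores = [0.01, 0.02, 0.90, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.005, 0.005]
index, name, score = decode_prediction(scores)
```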
2.1. Performance metrics
Early detection and correct identification of plant leaf disease is critical to halting the disease's progress and ensuring the health of the crops (Kanjalkar & Lokhande, 2013). The output of the method must therefore be checked for accuracy by a variety of means. Model performance is evaluated using metrics such as accuracy, precision, recall, and F1 score. Even though the terminology sounds complicated, the underlying concepts are simple, and the metrics are computed with the following formulas:
Accuracy
Accuracy measures the proportion of correct predictions among all predictions made. It is one of the most basic model metrics. A good model must have high accuracy; a model with high accuracy can be assumed to be correct most of the time.
CP = Correct Prediction = True Positive + True Negative
IP = Incorrect Prediction = False Positive + False Negative
Accuracy = CP / (CP + IP)
Recall
Recall measures the proportion of actual positive labels that the model correctly predicts as positive.
Recall = True positives / TNPL
TNPL = Total number of positive labels = True Positive + False Negative
Precision
The precision of a model is the proportion of its positive predictions that are correct.
Precision = True positives / TNPP
TNPP = Total number of positive Predictions = True Positive + False Positive
F1 Score
The F1 score takes both recall and precision into account; it is calculated as the harmonic mean of the two.
F1 Score = 2 x (recall x precision) / (recall + precision)
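For a multi-class problem, these formulas are usually applied per class in a one-vs-rest fashion, starting from a confusion matrix. The sketch below illustrates this; the function names and the 3-class confusion matrix are hypothetical and are not data from this study.

```python
def per_class_metrics(confusion, cls):
    """One-vs-rest precision, recall and F1 for class `cls`.

    confusion[i][j] is the number of examples whose true class is i
    and whose predicted class is j.
    """
    tp = confusion[cls][cls]
    fn = sum(confusion[cls]) - tp                    # true class cls, predicted otherwise
    fp = sum(row[cls] for row in confusion) - tp     # predicted cls, true class differs
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


def accuracy(confusion):
    """Correct predictions (the diagonal) divided by all predictions."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted).
confusion = [
    [8, 1, 1],
    [2, 7, 1],
    [0, 0, 10],
]
precision, recall, f1 = per_class_metrics(confusion, 0)
overall_accuracy = accuracy(confusion)
```

For class 0 in this toy matrix there are 8 true positives, 2 false negatives, and 2 false positives, so precision, recall, and F1 are all 0.8, while the overall accuracy is 25/30.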
3. Results and discussion
The aim of this study was to classify tomato leaf images into eleven classes. The Xception model was trained and evaluated using the Adam optimizer with the learning rate set to 1e-4. A classification report and a confusion matrix are used to assess the accuracy of the predictions generated by the classification algorithm, that is, how many of the predictions were correct and how many were not. More specifically, the metrics of a classification report are computed from the true positives, false positives, true negatives, and false negatives. The results are given in the classification summary in Table 1 and presented graphically in the confusion matrix in Figure 3.
Table 1. Classification summary.
Class | Precision | Recall | F1-score | Support
---|---|---|---|---
0 | 0.99 | 0.93 | 0.96 | 335
1 | 0.99 | 0.98 | 0.99 | 397
2 | 0.99 | 0.99 | 0.99 | 357
3 | 1.00 | 0.96 | 0.98 | 356
4 | 0.99 | 0.97 | 0.98 | 388
5 | 0.96 | 1.00 | 0.98 | 345
6 | 0.98 | 0.99 | 0.99 | 352
7 | 0.93 | 0.99 | 0.96 | 364
8 | 1.00 | 1.00 | 1.00 | 370
9 | 1.00 | 0.99 | 1.00 | 403
10 | 1.00 | 1.00 | 1.00 | 208
accuracy | | | 0.98 | 3875
macro avg | 0.98 | 0.98 | 0.98 | 3875
weighted avg | 0.98 | 0.98 | 0.98 | 3875
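As a consistency check, the macro and weighted averages can be recomputed from the per-class rows of Table 1. The sketch below does this; the per-class values are copied from the table, the helper names are hypothetical, and because the table entries are already rounded to two decimals the recomputed averages are approximate.

```python
# Per-class rows copied from Table 1: (precision, recall, f1, support).
ROWS = [
    (0.99, 0.93, 0.96, 335),
    (0.99, 0.98, 0.99, 397),
    (0.99, 0.99, 0.99, 357),
    (1.00, 0.96, 0.98, 356),
    (0.99, 0.97, 0.98, 388),
    (0.96, 1.00, 0.98, 345),
    (0.98, 0.99, 0.99, 352),
    (0.93, 0.99, 0.96, 364),
    (1.00, 1.00, 1.00, 370),
    (1.00, 0.99, 1.00, 403),
    (1.00, 1.00, 1.00, 208),
]


def macro_average(values):
    """Unweighted mean over classes."""
    return sum(values) / len(values)


def weighted_average(values, weights):
    """Mean over classes, weighted by the number of examples per class."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


supports = [row[3] for row in ROWS]
total_support = sum(supports)

macro_precision = macro_average([row[0] for row in ROWS])
macro_recall = macro_average([row[1] for row in ROWS])
macro_f1 = macro_average([row[2] for row in ROWS])
weighted_precision = weighted_average([row[0] for row in ROWS], supports)
weighted_recall = weighted_average([row[1] for row in ROWS], supports)
```

The supports sum to 3875, and each recomputed average rounds to 0.98, in agreement with the last rows of the table.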
Data visualisation
Twenty random images from the dataset were plotted using matplotlib, as shown in Figure 4. subplots() creates a grid of 4 rows and 5 columns and returns one axes object per cell. imshow() displays each image on its axes object, and show() finally displays the figure.
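A minimal sketch of this plotting step is given below. It assumes the images are available as arrays; random arrays are used here as placeholders for the PlantVillage photos, and the function name `plot_image_grid` and the figure size are hypothetical.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_image_grid(images, rows=4, cols=5):
    """Draw up to rows * cols images on a grid of axes and return (fig, axes)."""
    fig, axes = plt.subplots(rows, cols, figsize=(cols * 2, rows * 2))
    for ax, image in zip(axes.flat, images):
        ax.imshow(image)   # draw one image per axes object
        ax.axis("off")     # hide ticks and frame
    return fig, axes


# Placeholder data: 20 random RGB images instead of dataset photos.
rng = np.random.default_rng(0)
images = rng.random((20, 224, 224, 3))
```

Calling `fig, axes = plot_image_grid(images)` followed by `plt.show()` displays the 4 by 5 grid.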
Our methodology was inspired by Patokar and Gohokar (2020), who demonstrated that end-to-end supervised training with a deep learning architecture is practicable even for image classification tasks involving many classes.
We trained our model on plant leaf images using deep learning to classify diseases. The model achieved 98.00% accuracy on the 11 classes of the PlantVillage dataset and classifies diseases accurately without feature engineering. The model loss and accuracy plots are shown in Figure 5. Figure 6 presents random images with their predictions; the prediction was done on the validation set.
4. Conclusion
Leaf disease detection and classification are key to preventing agricultural loss. This paper reviewed plant disease detection methods and classifiers and used computer vision techniques to detect plant leaf diseases. It presented a framework for plant leaf disease detection and explained its workflow in detail. The proposed approach covers plant leaf images from 9 disease classes, 1 healthy class, and 1 background class. Correct predictions can save crops, improve production, and provide accurate information quickly. If such a model is implemented on a hardware platform for real-time use in the future, it could support early-stage disease control. The work done in this research was found to be effective in identifying and classifying tomato leaf diseases.