Journal of Applied Research and Technology

On-line version ISSN 2448-6736; Print version ISSN 1665-6423

J. appl. res. technol vol.19 n.6 Ciudad de México Dec. 2021  Epub Mar 22, 2022

https://doi.org/10.22201/icat.24486736e.2021.19.6.1017 

Articles

Combination of morphology, wavelet and convex Hull features in classification of patchouli varieties with imbalance data using artificial neural network

Wayan Firdaus Mahmudy a, *

Candra Dewi a

Rio Arifando a

Beryl Labique Ahmadie a

Muh Arif Rahman a

a Faculty of Computer Science, Universitas Brawijaya, Indonesia


Abstract

Patchouli plants are the main raw material for essential oils in Indonesia. Patchouli leaves have highly varied physical forms depending on the area where they are planted, which makes the variety difficult to recognize. This makes it hard for farmers to identify varieties without experts' advice, and as there are few experts in this field, a technology for identifying patchouli varieties is required. In this study, the identification model is constructed using a combination of leaf morphological, texture, and shape features. The texture features are obtained using the wavelet transform and the shape features are obtained using the convex hull. The feature extraction results are used as input data for training the classification algorithms. The effectiveness of the input features is tested using three classification methods from the family of artificial neural network algorithms: (1) feedforward neural networks trained with the backpropagation algorithm, (2) learning vector quantization (LVQ), and (3) extreme learning machine (ELM). The synthetic minority over-sampling technique (SMOTE) is employed to address the class imbalance in the patchouli variety dataset. The patchouli variety identification system that combines these three features achieves an average accuracy of 72.61%. This is higher than the accuracy obtained using only morphological features (58.68%), only wavelet features (59.03%), or both (67.25%). This study also shows that applying SMOTE to the imbalanced data increases accuracy, with the highest average accuracy reaching 88.56%.

Keywords: morphology; wavelet; convex hull; neural network; SMOTE; patchouli variety

1. Introduction

The increasing world demand for natural ingredients for cosmetics, perfumes, and medicines also increases the need for essential oils. Essential oil products produced in Indonesia include patchouli, citronella, clove leaf oil, and cananga. With the growing importance of needs and research in the field of essential oils in Indonesia, in 2020 the Ministry of Industry designated essential oil as one of the national research priorities for its use as an antioxidant and anti-aging material.

Among essential oil products, patchouli oil is the largest export commodity, accounting for 60% of Indonesia's essential oil exports. Indonesia is also the largest patchouli oil producer, with an 85% share of the international market (Wahyudi & Ermiati, 2020). Given the growing demand for patchouli oil and the shrinking area that can be planted with patchouli, efforts are needed to increase the productivity of patchouli cultivation.

Knowing the patchouli variety is important because not all patchouli plants yield good oil quality. Besides indicating oil quality, the variety also indicates the plant's resistance to pests and diseases, so that appropriate preventive measures can be taken during cultivation. The usual way to determine patchouli varieties is to consult experts. However, there are few experts in this field, so a tool for identifying patchouli varieties is needed.

In the identification process, the system requires samples of patchouli varieties as input data. However, the number of patchouli samples obtained from the field is imbalanced: some classes have far more samples than others. Classification on imbalanced classes may ignore the classes with fewer samples, which can significantly reduce the accuracy of the classification method (Yıldırım, 2016). Furthermore, the features of minority classes are usually difficult to identify (Jeatrakul et al., 2010). One way to handle imbalanced classes is the synthetic minority over-sampling technique (SMOTE), which generates synthetic minority samples until the minority classes match the size of the majority class.

Previous research has classified patchouli varieties using texture features extracted with the wavelet method, reaching 83.33% accuracy (Dewi et al., 2016). Another study used morphological features, local binary pattern texture features, and convex hulls, with an accuracy of 77.5% (Dewi et al., 2016). These two studies suggest that wavelet feature extraction is more effective than local binary pattern texture features, and that the convex hull feature can increase accuracy in the recognition process (Dewi et al., 2016). Both studies also noted the need to choose a combination of dominant derived features to improve the accuracy of the recognition process.

In this study, the patchouli identification model is constructed using a combination of leaf morphological, texture, and shape features. The texture features are obtained using the wavelet transform and the shape features are obtained using the convex hull. Each of these three feature groups can be expanded into several sub-features; however, not all sub-features have a dominant influence on variety recognition, so the selection of features largely determines the success of the classification. The effectiveness of the input features is tested using three machine learning methods: feedforward neural networks trained with the backpropagation algorithm, learning vector quantization (LVQ), and extreme learning machine (ELM).

Patchouli leaf image samples are taken from several patchouli types, namely Sidikalang, Diploid, Tetraploid, and Patchoulina. The image data are then used as input for the feature extraction process. The extraction results are stored as vectors that serve as input data in the classification process using the proposed methods.

2. Related works

Plant identification using leaf image processing has been addressed in several studies. Principal component analysis (PCA) and elliptic Fourier analysis have been used to describe leaf shapes (Laga et al., 2014; Neto et al., 2006). Other studies have identified leaves by combining deep belief networks and multiple features (Liu & Kan, 2016), by using leaf skeleton patterns (Zhang et al., 2016), or by using leaf texture (Pahikkala et al., 2015).

Plant leaf identification has also been carried out using three features: shape features extracted with the scale invariant feature transform (SIFT) method, color features extracted with the color moment method, and texture features extracted with segmentation-based fractal texture analysis (SFTA). That study reports an accuracy of 94% (Jamil et al., 2015). A combination of the three feature groups (morphology, texture, and shape) with a PNN classifier has also been used for medicinal plant identification, with a maximum accuracy of 74.67% (Herdiyeni et al., 2013).

The use of wavelets for extracting spectral features from hyperspectral images yields promising results for the detection of copper deposits (Abdolmaleki et al., 2017). Bakhshipour et al. (2017) show that wavelet-based feature extraction can improve the performance of weed detection. Another study also demonstrates that using wavelets in feature selection can improve recognition performance (Arora et al., 2012).

Morphological features have also been used in plant identification research (Arora et al., 2012). One example is the study of Wu et al. (2007), which identifies leaf images using morphological features and a probabilistic neural network (PNN) classifier, with an average accuracy of 90.3%. Shape features based on convex hull extraction were used by Lee and Hong (2013), who combined leaf veins and shape features using the fast Fourier transform and convex hulls, obtaining an accuracy of 97.19%.

3. Synthetic minority over-sampling technique (SMOTE)

Data imbalance occurs when the number of objects in a certain class is significantly higher than in other classes. Classes with a greater number of objects are labeled as major classes, while the others are labeled as minor classes. Classification methods that do not handle data imbalance may be dominated by the major classes and ignore the minor classes (Chawla et al., 2002).

The SMOTE method (Chawla et al., 2002), applied to neural network classification in (Jeatrakul et al., 2010), addresses imbalanced data with a principle different from earlier oversampling. Whereas plain oversampling duplicates randomly chosen observations, SMOTE generates new artificial data for the minor classes so that they have an amount of data equivalent to the major classes. The artificial (synthetic) data are generated using the k-nearest neighbors of each minority sample; the number of neighbors is chosen with ease of implementation in mind. Generation differs for numerical and categorical attributes: numerical values are interpolated based on Euclidean distance, while categorical values are taken from the mode (a small sketch follows).
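As a rough illustration of this mechanism, the following Python sketch generates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority-class neighbors. The function name, parameters, and the 14-feature example are ours, not part of the original study.

```python
# Minimal SMOTE sketch (a simplified illustration of the idea in
# Chawla et al. (2002), not the implementation used by the authors).
import numpy as np

def smote(X_minority, n_synthetic, k=5, rng=None):
    """Generate n_synthetic samples by interpolating between each minority
    sample and one of its k nearest minority-class neighbors."""
    rng = np.random.default_rng(rng)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.integers(len(X_minority))
        x = X_minority[i]
        # Euclidean distances to the other minority samples
        d = np.linalg.norm(X_minority - x, axis=1)
        neighbors = np.argsort(d)[1:k + 1]          # skip the sample itself
        xn = X_minority[rng.choice(neighbors)]
        gap = rng.random()                          # interpolation factor in [0, 1)
        synthetic.append(x + gap * (xn - x))
    return np.array(synthetic)

# Usage: oversample a minority class of 10 samples up to 30 samples.
X_min = np.random.rand(10, 14)                      # 14 features, as in this study
X_new = smote(X_min, n_synthetic=20, k=5, rng=0)
```

In practice, a library implementation such as the one in the imbalanced-learn package can be used instead of hand-rolled code.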

4. Feature extraction

4.1. Wavelet

A wavelet is a short wave whose energy is concentrated in a short time interval. The wavelet transform is a development of the Fourier transform, which works on periodic waves (Sanjeevi et al., 2001). The wavelet transform provides time and frequency information simultaneously and performs well for analyzing non-periodic signals; it is therefore appropriate for signal processing and digital image processing (Feng et al., 2011). For discrete signals in two dimensions, the transform is called the 2D discrete wavelet transform (DWT-2D) (Yang et al., 2014).

The DWT-2D is computed by applying low-pass and high-pass filters to the pixel values of the image. The low-pass and high-pass filters are labelled h and g, as shown in Figure 1. The DWT-2D decomposition of the image is done in 3 levels. At every level, the high-pass filters produce the detail coefficients of the image, while the low-pass filters produce a coarse approximation of the image (Shahbahrami, 2012). The DWT-2D transformation itself uses Eq. (1).

$$X_{WT}(i,j) = \sum_{t=-\infty}^{\infty} x(t)\, \psi_{i,j}^{*}(t) \qquad (1)$$

where:

$X_{WT}$ = wavelet transform coefficients

$\psi^{*}$ = complex conjugate of the mother wavelet function

$x(t)$ = input signal

$t$ = time index

$i, j$ = pixel coordinates

Figure 1 Implementation of low-pass and high-pass filters (Shahbahrami, 2012). 

The reverse transformation of DWT-2D is formulated in (2):

$$x(t) = \sum_{\tau=-\infty}^{\infty} \sum_{s=-\infty}^{\infty} X_{WT}(\tau, s)\, \psi_{\tau,s}(t) \qquad (2)$$

where:

$x(t)$ = reconstructed signal (result of the inverse transformation)

$\psi$ = mother wavelet function

$\tau$ = translation (time)

$s$ = scale

Filtering with the low-pass and high-pass filters yields four image sub-bands, as shown in Figure 2. The four sub-bands are labelled LL (low-low), LH (low-high), HL (high-low), and HH (high-high). The LL sub-band results from low-pass filtering in both the horizontal and vertical directions, and the HH sub-band results from high-pass filtering in both directions. The LH sub-band results from low-pass filtering in the horizontal direction and high-pass filtering in the vertical direction, while HL results from high-pass filtering in the horizontal direction and low-pass filtering in the vertical direction.

Figure 2 Sub-section of the image obtained from low-pass and high-pass filter of the image (Shahbahrami, 2012). 

The wavelet transform may be implemented with several algorithms that use different wavelet coefficients. One popular choice is the Daubechies wavelet transform. Daubechies wavelets have computation times comparable to other wavelets and can easily handle the edges of images (Singh & Khare, 2014). The wavelet features use the L1 and L2 energy normalizations, computed using Eq. (3). The features are taken from the high-frequency (HH) sub-bands, and feature extraction is applied at each level of decomposition. In this study we use the 6 wavelet features presented in Table 1 (a computation sketch is given after the table).

$$L_1 = \frac{\sum \left| HH \right|}{MN}, \qquad L_2 = \frac{\sum HH^{2}}{MN} \qquad (3)$$

where:

$HH$ = high-frequency (HH) sub-band coefficients of the wavelet decomposition

$MN$ = width × height of the image

Table 1 Wavelet features. 

Wavelet Features
L1(HH3) L1(HH2) L1(HH1) L2(HH3) L2(HH2) L2(HH1)
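A minimal sketch of this feature computation is shown below. It assumes PyWavelets for the decomposition, a Daubechies wavelet, and the diagonal detail coefficients as the HH sub-bands; the exact wavelet family and normalization used by the authors are not specified, so these are assumptions.

```python
# Sketch of the wavelet energy features in Table 1 (assumption: PyWavelets'
# Daubechies 'db2' wavelet; the diagonal detail coefficients play the role of HH).
import numpy as np
import pywt

def wavelet_features(gray_image, wavelet="db2", levels=3):
    """Return [L1(HH3), L1(HH2), L1(HH1), L2(HH3), L2(HH2), L2(HH1)]."""
    m, n = gray_image.shape
    coeffs = pywt.wavedec2(gray_image, wavelet, level=levels)
    # coeffs = [cA3, (cH3, cV3, cD3), (cH2, cV2, cD2), (cH1, cV1, cD1)]
    hh = {levels - i: detail[2] for i, detail in enumerate(coeffs[1:])}
    l1 = [np.sum(np.abs(hh[lvl])) / (m * n) for lvl in (3, 2, 1)]
    l2 = [np.sum(hh[lvl] ** 2) / (m * n) for lvl in (3, 2, 1)]
    return l1 + l2

# Usage with a random 256x256 grayscale image.
img = np.random.rand(256, 256)
features = wavelet_features(img)
```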

4.2. Leaf morphology

Morphological characteristics are divided into two groups: basic characteristics and derived characteristics. The basic leaf features include diameter (D), physical length (Lp), physical width (Wp), area (A), and perimeter (P). The diameter is the longest distance between two points on the leaf boundary. The physical length is the distance between the two end points along the leaf's main vein. The physical width is the length of the longest line orthogonal to the physical length line. The area is the number of pixels inside the leaf boundary, while the perimeter is the number of pixels on the leaf boundary (Wu et al., 2007).

From these five basic characteristics, derived morphological features are obtained as ratios between the basic leaf characteristics. There are six derived leaf features, namely (Wu et al., 2007) (a small computation sketch follows the list):

1. Aspect ratio

The ratio of the physiological length (Lp) to the physiological width (Wp):

$L_p / W_p$

2. Form factor

Used to determine the shape of a leaf, specifically how round the leaf shape is:

$4\pi A / P^{2}$

3. Rectangularity

Describes how rectangular the leaf surface is:

$L_p W_p / A$

4. Narrow factor

The ratio of the diameter (D) to the physiological length. This feature determines whether the leaf blade is symmetric or asymmetric: if the blade is symmetric, the narrow factor is 1; if asymmetric, it is greater than 1.

$D / L_p$

5. Perimeter ratio of diameter

Measures how oval the leaf is:

$P / D$

6. Perimeter ratio of physiological length and width:

$P / (L_p + W_p)$
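As referenced above, the following short sketch computes the six derived features from the five basic measurements; the function and variable names are ours and the example values are made up.

```python
# Sketch of the six derived morphological features computed from the five
# basic leaf characteristics (names are ours; values would come from the
# segmented leaf image).
import math

def derivative_features(D, Lp, Wp, A, P):
    """D: diameter, Lp: physical length, Wp: physical width,
    A: leaf area, P: leaf perimeter."""
    return {
        "aspect_ratio": Lp / Wp,
        "form_factor": 4 * math.pi * A / P ** 2,
        "rectangularity": Lp * Wp / A,
        "narrow_factor": D / Lp,
        "perimeter_ratio_diameter": P / D,
        "perimeter_ratio_length_width": P / (Lp + Wp),
    }

# Example with made-up measurements (in pixels).
print(derivative_features(D=320, Lp=300, Wp=150, A=35000, P=780))
```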

4.3. Convex Hull

Binary images obtained from preprocessing are used for extraction of shapes using convex hull. This characteristic is calculated from the difference between the image area of the results of convex hulls and the original image area of the leaves (Lee & Hong, 2013). Figure 3 shows an example of a convex hull implementation in a binary leaf image.

Figure 3 Example of (a) binary image, (b) binary image obtained from Convex Hulls (Lee & Hong, 2013). 

Convexity and solidity characteristics make use of the convex hull, or convex set, which is defined as the smallest convex polygon that surrounds an object. The convexity value is the ratio of the perimeter of the convex hull surrounding the object to the perimeter of the object, calculated using Eq. (4).

$$\text{convexity} = \frac{ConvexPerimeter}{ObjectPerimeter} \qquad (4)$$

Solidity is the ratio of the object's area to the area of its convex hull, using the pixels that make up the convex hull. The ratio is formulated in Eq. (5); an implementation sketch follows the equation.

$$\text{Solidity} = \frac{ObjectArea}{ConvexArea} \qquad (5)$$
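A possible implementation sketch of Eqs. (4) and (5) using OpenCV is given below. The assumption that the leaf is the largest external contour of the binary mask is ours, since the paper does not describe the implementation.

```python
# Sketch of the convexity (Eq. 4) and solidity (Eq. 5) features using OpenCV 4.x
# (assumption: the leaf is the largest contour in the binary mask).
import cv2
import numpy as np

def convex_hull_features(binary_mask):
    contours, _ = cv2.findContours(binary_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    leaf = max(contours, key=cv2.contourArea)        # largest contour = leaf
    hull = cv2.convexHull(leaf)
    convexity = cv2.arcLength(hull, True) / cv2.arcLength(leaf, True)
    solidity = cv2.contourArea(leaf) / cv2.contourArea(hull)
    return convexity, solidity

# Usage with a dummy binary image containing a filled ellipse.
mask = np.zeros((200, 200), dtype=np.uint8)
cv2.ellipse(mask, (100, 100), (80, 40), 0, 0, 360, 255, -1)
print(convex_hull_features(mask))
```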

5. Classification algorithms

5.1. Learning vector quantization (LVQ)

The learning vector quantization (LVQ) network is a supervised artificial neural network developed by Teuvo Kohonen in the mid-1980s (Kohonen, 1995). The network has an input layer, an LVQ layer, and an output layer. The output layer contains one processing element for each class, and the LVQ layer contains several processing elements per class. LVQ has been successfully applied to classification problems (Arifando et al., 2019).

The LVQ architecture consists of an input layer, a Kohonen layer (in which processing elements compete, and an input is assigned to a class based on proximity), and an output layer. The steps of the LVQ learning algorithm are as follows (Degang et al., 2007); a Python sketch follows the steps.

1. Let $x$ be an input vector from the training set and $W_i$ the $i$-th reference vector, $W_i \in \mathbb{R}^{n}$.

2. Determine the winner unit c in the competitive process through Equ. 8:

$$\| x - W_c \| = \min_{i} \| x - W_i \| \qquad (8)$$

3. Adjust W c using Equ. 9:

$$W_i(t+1) = \begin{cases} W_c(t) + s(t)\,\alpha(t)\,\left[x - W_c(t)\right], & i = c \\ W_i(t), & i \neq c \end{cases} \qquad (9)$$

in which:

$$s(t) = \begin{cases} 1, & \text{if the classification is correct} \\ -1, & \text{if the classification is wrong} \end{cases}$$

where α(t) is the corresponding learning rate,

$$\alpha(t) = \alpha_0 \left(1 - \frac{t}{T}\right)$$

where $0 < \alpha_0 < 1$, $T$ is the total number of learning iterations, and $W_c(t)$ represents the sequence of values of $W_c$ in the discrete-time domain ($t = 0, 1, 2, \ldots$).
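The following Python sketch illustrates this LVQ1 training loop (Eqs. 8-9). The initialization of one reference vector per class from a random training sample and the per-update learning-rate schedule are our assumptions, not details given in the paper.

```python
# Minimal LVQ1 training sketch following Eqs. (8)-(9).
import numpy as np

def train_lvq(X, y, n_classes, alpha0=0.1, epochs=100, rng=None):
    rng = np.random.default_rng(rng)
    # Initialize one reference vector per class from a random training sample.
    W = np.array([X[rng.choice(np.flatnonzero(y == c))] for c in range(n_classes)])
    T = epochs * len(X)                                    # total number of updates
    t = 0
    for _ in range(epochs):
        for x, label in zip(X, y):
            c = np.argmin(np.linalg.norm(W - x, axis=1))   # winner unit (Eq. 8)
            alpha = alpha0 * (1 - t / T)                   # decaying learning rate
            s = 1.0 if c == label else -1.0
            W[c] += s * alpha * (x - W[c])                 # update rule (Eq. 9)
            t += 1
    return W

def predict_lvq(W, X):
    return np.array([np.argmin(np.linalg.norm(W - x, axis=1)) for x in X])

# Usage: 14-feature samples in 4 classes, as in this study.
X = np.random.rand(91, 14)
y = np.random.randint(0, 4, size=91)
W = train_lvq(X, y, n_classes=4, alpha0=0.1, epochs=100, rng=0)
pred = predict_lvq(W, X)
```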

5.2. Extreme learning machine (ELM)

Extreme learning machine (ELM) is an artificial neural network algorithm often used for classification, regression, clustering, and feature learning, in which the hidden-layer output weights are computed in a single step rather than iteratively (Tang et al., 2016). The advantage of the ELM method is that it can be up to a thousand times faster than neural network algorithms based on backpropagation learning (Ding et al., 2015). ELM has been applied to various classification problems (Alfiyatin et al., 2019). The steps of the ELM algorithm are as follows (Samet & Miri, 2012), with a computational sketch after the derivation:

1. Suppose there are $N$ data samples $(x_i, t_i)$, where $x_i = [x_{i1}, \ldots, x_{im}]^{T} \in \mathbb{R}^{m}$ and $t_i = [t_{i1}, \ldots, t_{iq}]^{T} \in \mathbb{R}^{q}$. A single-hidden-layer feedforward neural network with $\tilde{N}$ hidden nodes and activation function $g(x)$ is modeled as:

$$\sum_{i=1}^{\tilde{N}} \beta_i\, g_i(x_j) = \sum_{i=1}^{\tilde{N}} \beta_i\, g(w_i \cdot x_j + b_i) = o_j, \quad j = 1, \ldots, N \qquad (10)$$

2. In this equation, $w_i = [w_{i1}, \ldots, w_{im}]^{T}$ is the weight vector connecting the input-layer nodes to the $i$-th hidden node, $\beta_i = [\beta_{i1}, \ldots, \beta_{iq}]^{T}$ is the weight vector connecting the $i$-th hidden node to the output-layer nodes, and $b_i$ is the threshold of the $i$-th hidden node. Approximating the samples with zero error, i.e.:

$$\sum_{j=1}^{N} \left\| o_j - t_j \right\| = 0 \qquad (11)$$

means that there exist wi, βi and bi, such that:

$$\sum_{i=1}^{\tilde{N}} \beta_i\, g(w_i \cdot x_j + b_i) = t_j, \quad j = 1, \ldots, N \qquad (12)$$

3. By using the following substitutions:

$$H(w_1, \ldots, w_{\tilde{N}}, b_1, \ldots, b_{\tilde{N}}, x_1, \ldots, x_N) = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_1 + b_{\tilde{N}}) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_{\tilde{N}} \cdot x_N + b_{\tilde{N}}) \end{bmatrix} \qquad (13)$$

where $\beta = [\beta_1^{T}, \ldots, \beta_{\tilde{N}}^{T}]^{T}$ and $T = [t_1^{T}, \ldots, t_N^{T}]^{T}$, all $N$ equations of (12) can be written compactly as:

$$H\beta = T \qquad (14)$$
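Because H is fixed once the input weights and biases are drawn at random, Eq. (14) can be solved for β with the Moore-Penrose pseudoinverse. The sketch below illustrates this, assuming a sigmoid activation and one-hot targets; it is not the authors' implementation.

```python
# Minimal ELM sketch solving Eq. (14) with the Moore-Penrose pseudoinverse.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_elm(X, T_onehot, n_hidden, rng=None):
    rng = np.random.default_rng(rng)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights w_i
    b = rng.normal(size=n_hidden)                 # random hidden biases b_i
    H = sigmoid(X @ W + b)                        # hidden-layer output matrix (Eq. 13)
    beta = np.linalg.pinv(H) @ T_onehot           # output weights from H beta = T (Eq. 14)
    return W, b, beta

def predict_elm(X, W, b, beta):
    return np.argmax(sigmoid(X @ W + b) @ beta, axis=1)

# Usage: 91 samples with 14 features and 4 classes, as in this study.
X = np.random.rand(91, 14)
y = np.random.randint(0, 4, size=91)
T_onehot = np.eye(4)[y]
W, b, beta = train_elm(X, T_onehot, n_hidden=50)
pred = predict_elm(X, W, b, beta)
```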

5.3. Backpropagation algorithm

Backpropagation is a supervised learning algorithm for training an artificial neural network composed of multiple layers of neurons. It adjusts the weights to minimize the error between the predicted output and the target output (Rahmi et al., 2016).

The steps of the backpropagation algorithm can be explained as follows (Liu et al., 2016); a vectorized sketch follows the equations. For each neuron $j$, let $n$ denote the number of neurons in the previous layer, $o_i$ the output of the $i$-th neuron in that layer, $w_i$ the corresponding weight for $o_i$, and $\theta_j$ the bias of neuron $j$. Neuron $j$ computes the input to the sigmoid function, $I_j$, using Eq. (15).

$$I_j = \sum_{i=1}^{n} w_i\, o_i + \theta_j \qquad (15)$$

Let $o_j$ be the output of neuron $j$; it is computed using Eq. (16).

$$o_j = \frac{1}{1 + e^{-I_j}} \qquad (16)$$

If neuron $j$ is in the output layer, the network starts the backpropagation phase. Let $t_j$ be the encoded target output. The algorithm calculates the output error $Err_j$ of neuron $j$ in the output layer using Eq. (17).

$$Err_j = o_j (1 - o_j)(t_j - o_j) \qquad (17)$$

For a hidden neuron $j$, let $k$ denote the number of neurons in the next layer, $w_p$ the weight of the connection from neuron $j$ to neuron $p$, and $Err_p$ the error of neuron $p$ in the next layer. The error $Err_j$ of the $j$-th neuron is given by Eq. (18).

$$Err_j = o_j (1 - o_j) \sum_{p=1}^{k} Err_p\, w_p \qquad (18)$$

Let η denote the learning rate. The neuron j updates its weight wj and bias θj using Eq. 19.

$$\Delta w_j = \eta\, Err_j\, o_j, \qquad \Delta\theta_j = \eta\, Err_j, \qquad w_j \leftarrow w_j + \Delta w_j, \qquad \theta_j \leftarrow \theta_j + \Delta\theta_j \qquad (19)$$
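A compact sketch of one such update for a single-hidden-layer network is shown below. It follows Eqs. (15)-(19), with the weight update applied to every incoming connection; the vectorized form and layer sizes are our assumptions.

```python
# Sketch of one backpropagation update for a single-hidden-layer network
# following Eqs. (15)-(19), with sigmoid units throughout.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, target, W1, b1, W2, b2, eta=0.01):
    # Forward pass (Eqs. 15-16)
    o_hidden = sigmoid(W1 @ x + b1)
    o_out = sigmoid(W2 @ o_hidden + b2)
    # Output-layer error (Eq. 17) and hidden-layer error (Eq. 18)
    err_out = o_out * (1 - o_out) * (target - o_out)
    err_hidden = o_hidden * (1 - o_hidden) * (W2.T @ err_out)
    # Weight and bias updates (Eq. 19)
    W2 += eta * np.outer(err_out, o_hidden)
    b2 += eta * err_out
    W1 += eta * np.outer(err_hidden, x)
    b1 += eta * err_hidden
    return W1, b1, W2, b2
```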

6. Experimental result

This section discusses the tests conducted with the LVQ, ELM, and backpropagation methods to classify the input (the features extracted in the previous stage) into 4 classes of patchouli varieties, namely diploid, tetraploid, patchoulina, and sidikalang. The dataset consists of 91 samples, divided into 63 training samples and 28 test samples.

Each test scenario is run 10 times and the average result is calculated. The aim of these tests is to find the parameter values that produce the highest accuracy; the recommended values are then used in the next testing phase.

6.1. LVQ testing

In this stage, we determine the best learning rate (α) and number of epochs for LVQ. Testing is done with learning rates of 0.01, 0.02, 0.03, 0.04, 0.05 up to 0.9 and with 10, 20, 30, 40, 50 up to 200 epochs. The LVQ learning rate (α) and epoch test results are provided in Table 2 and Table 3.

Table 2 Learning rate test of LVQ. 

learning rate
0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Accuracy (%)
82.86 84.64 83.21 83.93 84.29 83.93 82.50 81.43 84.64 85.00 83.57 83.93 76.07 73.93 70.71 62.14 60.36 60.71
Time (s) 0.394 0.406 0.407 0.407 0.415 0.417 0.419 0.421 0.425 0.425 0.427 0.429 0.435 0.441 0.446 0.447 0.449 0.453

Table 3 Number of Epoch test of LVQ. 

epoch
10 20 30 40 50 60 70 80 90 100 120 140 160 180 200
Accuracy (%)
84.64 85.00 84.64 84.64 84.29 85.71 85.71 85.71 83.93 86.07 83.93 85.36 85.00 85.36 84.64
Time (s) 0.41 0.43 0.45 0.49 0.74 1.05 1.52 2.01 2.43 2.99 3.62 4.49 5.50 6.52 8.23

The learning rate (α) test is done with the number of epochs initially set to 20. As shown in Table 2, accuracy increases as the learning rate grows from 0.01 to 0.1. A larger learning rate allows the LVQ to update its weights more quickly and produce better results. However, the performance of the LVQ decreases for learning rates greater than 0.1: a learning rate that is too high can make the training process unstable, because the weights are updated too quickly and may jump over their optimum values. Table 2 also shows that there is no significant difference in training time for different learning rates.

The epoch test uses the best learning rate found above, 0.1. As shown in Table 3, accuracy increases as the number of epochs grows from 10 to 100; more epochs allow the LVQ to refine its weights further. However, beyond 100 epochs there is no significant improvement in accuracy. We therefore conclude that the best number of epochs is 100, since larger values only increase the training time.

6.2. ELM testing

The aim of this test is to find the parameter values that produce the highest accuracy for use in the next test phase. Parameter testing is performed to find the best number of hidden nodes. Testing is done with 5, 10, 15, 20 up to 100 hidden nodes. The results of the ELM hidden-node test are shown in Table 4.

Table 4 Hidden node test of ELM. 

Number of hidden nodes
5 10 15 20 25 30 35 40 45 50
Accuracy (%)
77.4 79.3 79.9 80 80.1 80.3 80.3 80.3 80.3 80.4
Time (s) 0.152 0.152 0.153 0.156 0.161 0.161 0.163 0.165 0.167 0.173

Table 4 shows that 50 hidden nodes give the maximum accuracy of 80.43%. The results show that the number of hidden nodes strongly influences the accuracy: the more hidden nodes used, the better the output, but the longer the training time. We increased the number of hidden nodes up to 100 and observed no further improvement in accuracy.

6.3. Backpropagation algorithm testing

The best parameter values for backpropagation are determined starting from 25 hidden neurons and 100 epochs. The learning rate (α) is varied from 0.002 to 0.050. Table 5 shows the results of the learning rate testing.

Table 5 Testing of learning rate (α) of backpropagation algorithm. 

learning rate (α) 0.002 0.004 0.006 0.008 0.01 0.02 0.03 0.04 0.05
accuracy (%)
79.64 80.7 82.0 84.5 86.4 86.4 86.4 86.4 86.4

Based on Table 5, a learning rate (α) of 0.01 achieves the maximum accuracy of 86.424%. The pattern is similar to the learning rate testing for the LVQ: a larger learning rate allows backpropagation to update its weights more quickly and produce better results, but accuracy does not improve further for learning rates greater than 0.01.

Table 6 shows the results of the hidden-neuron test, using the best learning rate (α) obtained in the previous test. Fifteen hidden neurons give the maximum accuracy of 89.637%. A larger number of hidden neurons does not significantly increase the accuracy.

Table 6 Testing of the number of hidden neurons of backpropagation. 

# hidden neurons 5 10 15 20 25 30 35 40 45 50
accuracy (%) 83.9 85.7 89.6 88.2 87.1 84.3 88.2 86.8 87.1 88.6

Table 7 shows the results of the epoch test, using the best learning rate (α) and number of hidden neurons obtained in the previous tests. As shown in Table 7, accuracy increases as the number of epochs grows from 10 to 150; more epochs allow backpropagation to refine its weights further. Beyond 150 epochs there is no significant improvement in accuracy.

Table 7 Epoch testing for backpropagation. 

Number of Epoch
10 20 30 40 50 60 70 80 90 100 110 120 130 140 150
Accuracy (%) 77.50 80.36 77.14 78.93 80.00 81.07 81.07 83.21 85.00 83.43 85.00 85.36 85.36 86.07 89.43
Time (s) 0.075 0.119 0.157 0.220 0.288 0.291 0.314 0.345 0.407 0.413 0.480 0.514 0.548 0.573 0.603

6.4. Comparison of input feature and SMOTE combination

At this stage, the learning vector quantization (LVQ), extreme learning machine (ELM), and backpropagation methods are tested with various combinations of the extracted features. There are 3 feature groups, namely texture features extracted using wavelet texture analysis, morphological features, and shape features extracted using the convex hull. The combinations tested are:

1. Morphology (M)

2. Wavelet (W)

3. Morphology + Wavelet (M+W)

4. Morphology + Wavelet + Convex Hull (M+W+C)

An input vector of length 14 stores the 6 morphological features representing the basic leaf characteristics, the 6 wavelet features, and the 2 convex hull features (convexity and solidity), as sketched below.
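A trivial sketch of assembling this input vector is shown below; the three argument names stand for the hypothetical extractor outputs illustrated in the earlier sections.

```python
# Sketch of assembling the 14-dimensional input vector from the three feature groups.
import numpy as np

def build_input_vector(morph6, wavelet6, hull2):
    """morph6: 6 morphological ratios, wavelet6: L1/L2 energies of HH1-HH3,
    hull2: (convexity, solidity). Returns a vector of length 14."""
    x = np.concatenate([morph6, wavelet6, hull2])
    assert x.shape == (14,)
    return x
```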

At this stage, the tests use the best parameters obtained previously in order to determine the effect of the feature combinations on accuracy. The results of the feature tests are presented in Table 8 and Table 9.

Table 8 Comparison of input feature combinations without SMOTE. 

Method M W M + W M + W + C
Backpro 78.21% 81.394% 91.77% 92.49%
LVQ 77.49% 76.42% 81.06% 87.85%
ELM 20.36% 19.29% 28.93% 37.5%
Average 58.68% 59.03% 67.25% 72.61%

Table 9 Comparison of input feature combinations with SMOTE. 

Method M W M + W M + W + C
Backpro 84.42% 92.99% 94.85% 94.71%
LVQ 79.13% 84.28% 89.28% 90.56%
ELM 75% 81.14% 79.43% 80.43%
Average 79.52% 86.13% 87.85% 88.56%

Table 8 and Table 9 show that for all methods the best feature combination is "Morphology + Wavelet + Convex Hull", with the highest values reaching 92.493% without SMOTE and 94.711% for the data processed with SMOTE. In the tests without SMOTE, this favorable accuracy trend occurs only for the backpropagation algorithm and LVQ; ELM, in contrast, reaches a best accuracy of only 37.5%. In other words, ELM is unable to handle classification with imbalanced data.

Adding more features affects all methods, as can be seen from the increase in accuracy as features are added. From these results, it can be concluded that selecting the right combination of features is very important for increasing accuracy; when many features are available, the relevant ones must be identified. When the number of features is small, the gain in accuracy is less significant, but feature selection can still reduce the computational workload of the system.

7. Conclusions

The aim of this research is to obtain the best accuracy for the recognition of patchouli varieties using a combination of morphological, texture, and shape features with artificial neural network algorithms. The higher the accuracy, the better the recognition performance of the method used.

In this study, only 91 samples were used and the data are imbalanced. Future studies are expected to improve the accuracy by increasing the amount of data and the number of classes so that the classification can be performed in more detail. The system can also be developed further using different input features so that it keeps up with continuously growing scientific knowledge.

Conflict of interest

The authors do not have any type of conflict of interest to declare.

Acknowledgments

The authors would like to thank Universitas Brawijaya and Faculty of Computer Science for financial support based on Decree Number 191.2/2019.

References

Abdolmaleki, M., Tabaei, M., Fathianpour, N., & Gorte, B. G. (2017). Selecting optimum base wavelet for extracting spectral alteration features associated with porphyry copper mineralization using hyperspectral images. International Journal of Applied Earth Observation and Geoinformation, 58, 134-144. https://doi.org/10.1016/j.jag.2017.02.005

Alfiyatin, A. N., Rizki, A. M., Mahmudy, W. F., & Ananda, C. F. (2019). Extreme learning machine and particle swarm optimization for inflation forecasting. International Journal of Advanced Computer Science and Applications, 10(4), 473-478.

Arifando, R., Yulianto, F., Mahmudy, W. F., & Sander, B. A. (2019). Hybrid Genetic Algorithm & Learning Vector Quantization for Classification of Social Assistance Recipients. In 2019 International Conference on Sustainable Information Engineering and Technology (SIET) (pp. 316-321). IEEE. https://doi.org/10.1109/SIET48054.2019.8986082

Arora, A., Gupta, A., Bagmar, N., Mishra, S., & Bhattacharya, A. (2012, September). A Plant Identification System using Shape and Morphological Features on Segmented Leaflets: Team IITK, CLEF 2012. In CLEF (Online Working Notes/Labs/Workshop).

Bakhshipour, A., Jafari, A., Nassiri, S. M., & Zare, D. (2017). Weed segmentation using texture features extracted from wavelet sub-images. Biosystems Engineering, 157, 1-12. https://doi.org/10.1016/j.biosystemseng.2017.02.002

Chawla, N. V., Bowyer, K. W., Hall, L. O., & Kegelmeyer, W. P. (2002). SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16, 321-357. https://doi.org/10.1613/jair.953

Degang, Y., Guo, C., Hui, W., & Xiaofeng, L. (2007). Learning vector quantization neural network method for network intrusion detection. Wuhan University Journal of Natural Sciences, 12(1), 147-150. https://doi.org/10.1007/s11859-006-0258-z

Dewi, C., Krisnanti, G. W., Cholissodin, I., & Basuki, A. (2006). Identifying Quality of Patchouli Leaves through Its Leave Image Using Learning Vector Quantization. In The 6th Annual Basic Science International Conference.

Ding, S., Zhao, H., Zhang, Y., Xu, X., & Nie, R. (2015). Extreme learning machine: algorithm, theory and applications. Artificial Intelligence Review, 44(1), 103-115. https://doi.org/10.1007/s10462-013-9405-z

Feng, H. Y., Wang, J. P., Li, Y. C., & Chen, J. (2011, October). Wavelet theory and application summarizing. In International Conference on Information Computing and Applications (pp. 337-343). Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25255-6_43

Herdiyeni, Y., Nurfadhilah, E., Zuhud, E., Damayanti, E., Arai, K., & Okumura, H. (2013). A Computer Aided System For Tropical Leaf Medicinal Plant Identification. International Journal on Advanced Science, Engineering and Information Technology, 3, 23-27. https://doi.org/10.18517/ijaseit.3.1.270

Jamil, N., Hussin, N. A. C., Nordin, S., & Awang, K. (2015). Automatic Plant Identification: Is Shape the Key Feature? Procedia Computer Science, 76, 436-442. https://doi.org/10.1016/j.procs.2015.12.287

Jeatrakul, P., Wong, K. W., & Fung, C. C. (2010). Classification of Imbalanced Data by Combining the Complementary Neural Network and SMOTE Algorithm. In: Wong K.W., Mendis B.S.U., Bouzerdoum A. (eds) Neural Information Processing. Models and Applications. ICONIP 2010. Lecture Notes in Computer Science, vol 6444. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17534-3_19

Kohonen, T. (1995). Learning vector quantization. In Self-organizing maps (pp. 175-189). Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-97610-0_6

Laga, H., Kurtek, S., Srivastava, A., & Miklavcic, S. J. (2014). Landmark-free statistical analysis of the shape of plant leaves. Journal of Theoretical Biology, 363, 41-52. https://doi.org/10.1016/j.jtbi.2014.07.036

Lee, K., & Hong, K. S. (2013). An implementation of leaf recognition system using leaf vein and shape. International Journal of Bioscience and Bio-Technology, 5, 57-65.

Liu, N., & Kan, J. (2016). Improved deep belief networks and multi-feature fusion for leaf identification. Neurocomputing, 216, 460-467. https://doi.org/10.1016/j.neucom.2016.08.005

Liu, Y., Jing, W., & Xu, L. (2016). Parallelizing backpropagation neural network using MapReduce and cascading model. Computational Intelligence and Neuroscience, 2016. https://doi.org/10.1155/2016/2842780

Neto, J. C., Meyer, G. E., Jones, D. D., & Samal, A. K. (2006). Plant species identification using Elliptic Fourier leaf shape analysis. Computers and Electronics in Agriculture, 50(2), 121-134. https://doi.org/10.1016/j.compag.2005.09.004

Pahikkala, T., Kari, K., Mattila, H., Lepistö, A., Teuhola, J., Nevalainen, O. S., & Tyystjärvi, E. (2015). Classification of plant species from images of overlapping leaves. Computers and Electronics in Agriculture, 118, 186-192. https://doi.org/10.1016/j.compag.2015.09.003

Rahmi, A., Wijayaningrum, V. N., Mahmudy, W. F., & Parewe, A. M. A. K. (2016). Offline signature recognition using back propagation neural network. Indonesian Journal of Electrical Engineering and Computer Science, 4(3).

Samet, S., & Miri, A. (2012). Privacy-preserving back-propagation and extreme learning machine algorithms. Data and Knowledge Engineering, 79-80, 40-61. https://doi.org/10.1016/j.datak.2012.06.001

Sanjeevi, S., Vani, K., & Lakshmi, K. (2001, November). Comparison of conventional and wavelet transform techniques for fusion of IRS-1C LISS-III and PAN images. In 22nd Asian Conference on Remote Sensing (pp. 65-85).

Shahbahrami, A. (2012). Algorithms and architectures for 2D discrete wavelet transform. The Journal of Supercomputing, 62(2), 1045-1064. https://doi.org/10.1007/s11227-012-0790-x

Singh, R., & Khare, A. (2014). Fusion of multimodal medical images using Daubechies complex wavelet transform - A multiresolution approach. Information Fusion, 19(1), 49-60. https://doi.org/10.1016/j.inffus.2012.09.005

Tang, J., Deng, C., & Huang, G. (2016). Extreme Learning Machine for Multilayer Perceptron. IEEE Transactions on Neural Networks and Learning Systems, 27(4), 809-821. https://doi.org/10.1109/TNNLS.2015.2424995

Wahyudi, A., & Ermiati. (2020). Prospek Pengembangan Industri Minyak Nilam di Indonesia [Prospects for the development of the patchouli oil industry in Indonesia]. In Bunga Rampai Inovasi Tanaman Atsiri Indonesia. Balai Penelitian Tanaman Rempah dan Obat.

Wu, S. G., Bao, F. S., Xu, E. Y., Wang, Y., Chang, Y., & Xiang, Q. (2007). A Leaf Recognition Algorithm for Plant Classification Using Probabilistic Neural Network. 2007 IEEE International Symposium on Signal Processing and Information Technology, 11-16. https://doi.org/10.1109/ISSPIT.2007.4458016

Yang, L., Tang, Y. Y., & Sun, Q. (2014). Implementation of 2D Discrete Wavelet Transform by Number Theoretic Transform and 2D Overlap-Save Method. Mathematical Problems in Engineering, 2014. https://doi.org/10.1155/2014/532979

Yıldırım, P. (2016). Pattern Classification with Imbalanced and Multiclass Data for the Prediction of Albendazole Adverse Event Outcomes. Procedia Computer Science, 83, 1013-1018. https://doi.org/10.1016/j.procs.2016.04.216

Zhang, L., Weckler, P., Wang, N., Xiao, D., & Chai, X. (2016). Individual leaf identification from horticultural crop images based on the leaf skeleton. Computers and Electronics in Agriculture, 127, 184-196. https://doi.org/10.1016/j.compag.2016.06.017

Peer Review under the responsibility of Universidad Nacional Autónoma de México.

Received: May 10, 2020; Accepted: September 20, 2021; Published: December 31, 2021

∗Corresponding author. E-mail address: wayanfm@ub.ac.id (Wayan Firdaus Mahmudy).

This is an open-access article distributed under the terms of the Creative Commons Attribution License.