Optimizing Electrocardiogram Denoising for Enhanced Cardiovascular Disease Detection: A Metaheuristic Approach

Galvis-Chacón, Javier; Ramos-Soto, Óscar; Oliva, Diego; Valdivia, Arturo; Rostro-González, Horacio; Zapotecas-Martínez, Saúl; Pérez-Cisneros, Marco

doi:10.13053/cys-29-1-5532

Computación y Sistemas

Online version ISSN 2007-9737; print version ISSN 1405-5546

Comp. y Sist. vol. 29 no. 1, Ciudad de México, Jan./Mar. 2025, Epub 05-Dec-2025

Articles of the Thematic Section

Javier Galvis-Chacón¹*

Óscar Ramos-Soto¹

Diego Oliva¹

Arturo Valdivia¹

Horacio Rostro-González²

Saúl Zapotecas-Martínez³

Marco Pérez-Cisneros¹

¹Universidad de Guadalajara, Departamento de Ingeniería Electro-Fotónica, Mexico. oscar.ramos9279@alumnos.udg.mx, diego.oliva@cucei.udg.mx, arturo.valdivia@academicos.udg.mx, marco.perez@cucei.udg.mx.

²Universitat Ramon Llull, GEPI Research Group, School of Engineering, Spain. horacio.rostro@iqs.url.edu.

³Instituto Nacional de Astrofísica, Óptica y Electrónica, Coordinación de Ciencias Computacionales, Mexico. szapotecas@inaoep.mx.

Abstract:

Cardiovascular disease (CVD) is the leading cause of death worldwide, accounting for more deaths than any other known cause. Hence, early detection followed by timely treatment is crucial to preventing premature deaths. In this scenario, the electrocardiogram (ECG) is a key diagnostic tool: it provides critical insight into the heart's electrical activity and allows early identification of potentially lethal conditions such as arrhythmias and heart attacks. Automated ECG analysis is therefore a promising tool for the timely detection of different heart conditions. Nevertheless, noise is always present because of the signal acquisition process, and the degree to which it is removed strongly affects ECG classification accuracy. This paper presents an approach for determining the most effective degree of ECG noise removal. It iteratively combines wavelet-based denoising with Extreme Gradient Boosting (XGBoost) classification, and the best denoising parameter configuration is obtained through optimization with metaheuristic algorithms (MAs). Different MAs are tested to evaluate how much each one improves classification accuracy. The proposal is trained and tested on the public MIT-BIH ECG dataset to demonstrate its effectiveness across different signal acquisitions. The method is intended as a preprocessing stage that improves the accuracy of predictive models based on neural networks and supports the future development of more robust ECG classifier systems, which will improve the detection of CVD.

Keywords: ECG signal; extreme gradient boosting (XGBoost); metaheuristic algorithms (MAs)

1 Introduction

Cardiovascular disease (CVD) is the leading cause of death worldwide, accounting for more deaths than any other known cause. Approximately 80% of these CVD deaths occur in low- and middle-income countries [⁶]. Early detection, followed by timely treatment of these diseases, has been crucial to preventing premature deaths in higher-income countries.

In this regard, one effective and accessible way to identify potential cardiovascular problems is through the use of electrocardiogram (ECG) signals. These signals, which record the electrical activity of the heart, provide relevant information about how the heart functions.

By detecting irregularities in heart rhythm or electrical conduction, the ECG can reveal conditions such as arrhythmias, myocardial infarctions, and other cardiac disorders, allowing healthcare professionals to intervene quickly with appropriate treatment that can help prevent serious complications and improve the patient's cardiovascular health [⁹].

Accordingly, advanced machine learning-based classifiers have been developed and applied to diagnose several conditions from ECG signals [²²]. A major obstacle to their performance is the noise that is inevitably introduced during signal acquisition, which hinders correct classification even when robust algorithms are used.

These artifacts distort the most important features of ECG signals, making accurate identification and interpretation difficult. High levels of noise lead to poor quality data that confuses the model during both training and prediction phases, potentially resulting in misclassifications, such as false positives or false negatives [³].

Additionally, noise can introduce misleading patterns that mimic certain arrhythmias or normal beats, and if present in the training data, can cause the model to overfit to these noisy patterns, reducing its ability to generalize to cleaner data. Relevant work has been done on the preprocessing and denoising of ECG signals [¹⁷], given its importance in the accuracy of classification models [²⁰].

Noise in ECG signals can come from multiple sources, such as electromagnetic interference or patient movement [⁸], and can obscure signal features that are essential for accurate classification. However, most public ECG databases have already been preprocessed to improve signal quality.

In contrast, embedded systems that process this type of signal receive it in its original state, that is, without preprocessing and affected by various types of noise, including artifacts. To design a robust system for these signals, it is therefore essential to simulate the conditions under which they are captured, that is, their raw state.

According to [²¹], this can be achieved by adding noise to the database signals, thus emulating the artifacts introduced during signal capture. Preprocessing and classification techniques can then be applied. For denoising, wavelet transforms are used, as presented in [¹].

In addition, metaheuristic algorithms, such as the particle swarm optimization (PSO) used in [¹⁶], are proposed for finding the optimal threshold of these transforms. This allows a finer and adaptive adjustment of the threshold, improving the system's ability to remove noise effectively without losing important signal information.

This research presents a proposal to improve the efficiency of ECG signal classification by focusing on the denoising stage. First, the performance of wavelet denoising is improved by applying a metaheuristic algorithm to find the optimal denoising threshold for a publicly available dataset.

The denoised signals are then classified using the Extreme Gradient Boosting (XGBoost) algorithm to achieve both binary classification (normal vs. arrhythmias) and multiclass classification (Normal (N), fusion (F), supraventricular ectopic beat (SVEB), and ventricular ectopic beat (VEB)).

Different metaheuristic algorithms were tested for a comparison of their performance in this task: Particle Swarm Optimization (PSO), Differential Evolution (DE), and Genetic Algorithms (GA).

This approach makes it possible to evaluate wavelet denoising and its hybridization with metaheuristic algorithms as a means of improving ECG classification accuracy, contributing to a more accurate and reliable diagnosis of cardiac diseases.

2 Preliminary Concepts

This section concisely presents the theoretical background of this article. Starting with the basic aspects of ECG signals, it covers wavelet denoising, the XGBoost classifier, and the metaheuristic algorithms used in the experiments.

2.1 ECG Signal

An ECG signal is a recording of the heart’s electrical activity, used to study its behavior and detect various cardiac conditions. Although it does not diagnose specific diseases by itself, it provides crucial information to identify conditions such as coronary artery disease, myocardial infarction, cardiac arrhythmias, heart block, ventricular hypertrophy, myocarditis, pericarditis, and congenital heart abnormalities.

A normal-rhythm ECG includes a P wave, a QRS complex, and a T wave, each with characteristic duration and amplitude. The P wave reflects atrial depolarization and marks the beginning of the cardiac cycle, the QRS complex represents ventricular depolarization, and the T wave reflects ventricular repolarization. The ECG profile varies with the position of the electrodes, which allows certain properties of the heart to be better appreciated.

2.2 Wavelet Denoising

In general, the wavelet transform decomposes a signal using scaled and shifted versions of a selected wavelet. Each scale therefore acts as a band-pass filter that passes only the signal components within a certain frequency band [²³].

Wavelets are waveforms of limited duration with an average value of zero, and many types exist. The choice of wavelet depends on the type of signal to be analyzed and on the information to be extracted from it.

In this proposal, the Daubechies wavelets are selected, specifically Daubechies 4 (db4). These wavelets, based on the work of Ingrid Daubechies [⁷], form a family of orthogonal wavelets that define a discrete wavelet transform and are characterized by having the maximum number of vanishing moments for a given support.

Each wavelet in this family is associated with a scaling function (also known as the father wavelet), which generates an orthogonal multiresolution analysis. The db4 wavelet is orthogonal (and therefore also biorthogonal), has filters of length 8, and is asymmetric.

The decomposition and reconstruction equations for db4 are based on the coefficients of its scaling and wavelet filters. Once the signal is in the wavelet domain, its coefficients are thresholded, that is, modified to suppress noise. The complete denoising process can be described as follows:

1. Decomposition: The original signal is decomposed into wavelet coefficients using the wavelet transform with the db4 wavelet.
2. Thresholding: Depending on the configuration, hard or soft thresholding can be applied [¹²]. The soft threshold function sets every coefficient whose magnitude is below the threshold to zero and shrinks the remaining coefficients toward zero by the threshold value.

This function is continuous, which keeps the thresholded coefficients consistent, but it also attenuates the high-frequency coefficients that exceed the threshold. It is defined as:

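$$\bar{w}_{j,k} = \begin{cases} \operatorname{sgn}(w_{j,k})\left(|w_{j,k}| - \lambda\right), & |w_{j,k}| \geq \lambda, \\ 0, & |w_{j,k}| < \lambda, \end{cases} \qquad (1)$$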

where $\bar{w}_{j,k}$ is the estimated (thresholded) wavelet coefficient, $w_{j,k}$ is the $k$-th wavelet coefficient at decomposition level $j$, $\lambda$ is the threshold, and $\operatorname{sgn}(\cdot)$ is the sign function.

3. Reconstruction: Reconstruct the signal from the thresholded coefficients by applying the inverse wavelet transform.
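Eq. (1) and the hard-thresholding alternative mentioned in step 2 can be sketched in a few lines of NumPy. The helper names `soft_threshold` and `hard_threshold` are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np


# Illustrative helpers (hypothetical names, not the authors' code).
def soft_threshold(w, lam):
    """Soft thresholding as in Eq. (1): zero coefficients with |w| < lam
    and shrink the remaining ones toward zero by lam."""
    w = np.asarray(w, dtype=float)
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)


def hard_threshold(w, lam):
    """Hard thresholding for comparison: zero coefficients with |w| < lam
    and keep the remaining ones unchanged."""
    w = np.asarray(w, dtype=float)
    return np.where(np.abs(w) >= lam, w, 0.0)
```

For example, with $\lambda = 1$ the coefficients [-2.0, -0.5, 0.3, 1.5] become [-1, 0, 0, 0.5] under soft thresholding and [-2, 0, 0, 1.5] under hard thresholding, which shows the shrinkage that soft thresholding accepts in exchange for continuity.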

The threshold $\lambda$ is a fundamental parameter of wavelet denoising, since its value directly affects the performance of the denoising process. A fixed threshold is commonly used; in this paper, $\lambda$ is instead the decision variable that the metaheuristic algorithms optimize to find its optimal value.
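The three steps (decomposition, thresholding, reconstruction) can be sketched with the PyWavelets library (`pywt.wavedec`, `pywt.threshold`, `pywt.waverec`). This is a minimal sketch under stated assumptions: the article does not give the decomposition level, which coefficient bands are thresholded, or how $\lambda$ is scaled, so the sketch assumes a 4-level decomposition, thresholds only the detail coefficients, and uses $\lambda$ as an absolute threshold. The function name `wavelet_denoise` is hypothetical.

```python
import numpy as np
import pywt  # PyWavelets


def wavelet_denoise(signal, lam, wavelet="db4", level=4):
    """Decompose, soft-threshold the detail coefficients, and reconstruct.

    Assumptions (not specified in the article): a 4-level decomposition,
    thresholding applied only to the detail bands, and lam used as an
    absolute threshold on the coefficient values.
    """
    signal = np.asarray(signal, dtype=float)
    # 1. Decomposition: coefficients ordered as [cA_level, cD_level, ..., cD_1]
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # 2. Thresholding: keep the approximation, soft-threshold the details (Eq. 1)
    thresholded = [coeffs[0]] + [pywt.threshold(c, lam, mode="soft") for c in coeffs[1:]]
    # 3. Reconstruction: inverse DWT, trimmed to the input length
    return pywt.waverec(thresholded, wavelet)[: signal.size]
```

In the proposed scheme, `lam` is the quantity a metaheuristic would search within the $\lambda$ boundaries listed in Table 1.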

2.3 Extreme Gradient Boosting (XGBoost)

XGBoost [⁵] is an algorithm renowned for its efficiency and performance in several machine learning tasks, especially classification and regression on structured data [¹⁵]. It uses gradient boosting, a form of gradient descent in function space, as its primary optimization step and incorporates regularization techniques, such as L1 and L2 penalties, to prevent overfitting and improve generalization.

XGBoost builds an ensemble of trees, each trained on the residual errors of the combined prediction of all previous trees. It predicts the target variable for new instances by aggregating the predictions of all trees in the ensemble and provides insights into feature importance, highlighting the most relevant features in the prediction process. XGBoost minimizes the following objective function $\mathcal{L}$:

$$\mathcal{L}(\phi) = \sum_{i=1}^{N} \ell(v_i, \hat{v}_i) + \Omega(\phi), \qquad (2)$$

where $\ell(v_i, \hat{v}_i)$ is the loss function that measures the difference between the true value $v_i$ and the predicted value $\hat{v}_i$, $\Omega(\phi)$ is the regularization term that helps prevent overfitting by penalizing model complexity, and $N$ is the number of instances in the dataset. With $\hat{v}_0$ as the initial prediction, the model is initialized as:

$$\hat{v}_0 = \frac{1}{N} \sum_{i=1}^{N} v_i, \qquad (3)$$

where $v_i$ is the target for the $i$-th instance in the dataset. By applying the first- and second-order derivatives of the loss, following the gradient boosting framework [¹⁹], the final model prediction $\hat{v}_f$ is obtained by summing the predictions of all trees in the ensemble:

$$\hat{v}_f = \hat{v}_0 + \sum_{t=1}^{T} \eta\, f_t(x), \qquad (4)$$

where $\eta$ is the learning rate, $f_t(x)$ is the prediction of the $t$-th tree for the input $x$ in boosting round $t$, and $T$ is the total number of boosting rounds (trees) in the ensemble. By iteratively adding trees that focus on the errors of the current ensemble, XGBoost creates a robust model that balances bias and variance, leading to high predictive accuracy and strong generalization capabilities.

The optimization of ϕ involves finding the best structure and weights for each tree that minimize the overall objective function ℒ(ϕ), ensuring that the model effectively captures the most relevant patterns in the data while avoiding overfitting.
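The additive structure of Eqs. (3) and (4) can be illustrated with plain gradient boosting. The sketch below is not XGBoost itself: it assumes squared loss, uses one-split regression trees ("stumps") on a single feature, and omits XGBoost's second-order split scoring and the regularization term $\Omega(\phi)$. The names `fit_stump` and `boost` are hypothetical.

```python
import numpy as np


def fit_stump(x, r):
    """Least-squares regression stump (one split) on a 1-D feature."""
    best_sse, best = np.inf, None
    for s in np.unique(x)[:-1]:              # candidate split thresholds
        left, right = r[x <= s], r[x > s]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse, best = sse, (s, left.mean(), right.mean())
    if best is None:                         # constant feature: predict the mean
        m = r.mean()
        return lambda z: np.full(np.shape(z), m)
    s, lv, rv = best
    return lambda z: np.where(np.asarray(z) <= s, lv, rv)


def boost(x, v, T=50, eta=0.3):
    """Plain gradient boosting with squared loss (not the XGBoost library)."""
    x, v = np.asarray(x, dtype=float), np.asarray(v, dtype=float)
    v0 = v.mean()                            # Eq. (3): initial prediction
    pred = np.full(v.shape, v0)
    trees = []
    for _ in range(T):
        residual = v - pred                  # negative gradient of 0.5 * (v - pred)^2
        f_t = fit_stump(x, residual)
        trees.append(f_t)
        pred = pred + eta * f_t(x)
    # Eq. (4): v0 + sum over t of eta * f_t(x)
    return lambda z: v0 + eta * sum(f(z) for f in trees)
```

With $T = 0$ the model returns the mean of Eq. (3); each additional round moves the prediction by $\eta$ times the new tree's output, which is why a smaller learning rate needs more rounds.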

2.4 Metaheuristic Optimization

2.4.1 GA

Genetic algorithms (GA) are a subtype of evolutionary computation, a branch of artificial intelligence focused on solving optimization problems. Their mechanism is based on the natural evolution and natural selection of living organisms. Genetic algorithms were developed by John Holland in the 1970s [¹¹]. The main structure of GA is depicted by the pseudocode in Algorithm 1.

Algorithm 1 Genetic algorithm (GA)
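Since the pseudocode of Algorithm 1 is not reproduced here, the following is a generic real-coded GA for a single variable such as $\lambda$, offered only as an illustrative sketch, not as the authors' Algorithm 1. The defaults follow Table 1 (population 10, 10 iterations, $\lambda \in [0.01, 0.99]$, crossover probability 0.5, mutation probability 0.9); the operators (binary tournament selection, arithmetic crossover, Gaussian mutation, elitism) and the name `genetic_algorithm` are assumptions.

```python
import numpy as np


def genetic_algorithm(fitness, bounds=(0.01, 0.99), pop_size=10, iters=10,
                      p_cross=0.5, p_mut=0.9, seed=0):
    """Real-coded GA maximizing fitness(lam) over a 1-D interval.

    Operator choices are hypothetical: tournament selection, arithmetic
    crossover, Gaussian mutation, and elitism.
    """
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    pop = rng.uniform(lo, hi, pop_size)
    fit = np.array([fitness(x) for x in pop])

    def tournament():
        # Binary tournament: the fitter of two randomly chosen individuals
        i, j = rng.integers(pop_size, size=2)
        return pop[i] if fit[i] >= fit[j] else pop[j]

    for _ in range(iters):
        elite = pop[np.argmax(fit)]          # elitism: keep the best individual
        children = []
        while len(children) < pop_size - 1:
            a, b = tournament(), tournament()
            if rng.random() < p_cross:       # arithmetic (blend) crossover
                alpha = rng.random()
                a, b = alpha * a + (1 - alpha) * b, alpha * b + (1 - alpha) * a
            for child in (a, b):
                if rng.random() < p_mut:     # Gaussian mutation
                    child = child + rng.normal(0.0, 0.1 * (hi - lo))
                children.append(float(np.clip(child, lo, hi)))
        pop = np.array([elite] + children[: pop_size - 1])
        fit = np.array([fitness(x) for x in pop])
    best = int(np.argmax(fit))
    return pop[best], fit[best]
```

Because the objective in this proposal is classification accuracy, the sketch maximizes rather than minimizes.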

2.4.2 DE

DE is a popular optimization algorithm that can solve a wide range of problems [¹⁰]. It operates through four main stages: Initialization, Mutation, Crossover, and Selection.

DE is a stochastic population-based search technique sensitive to parameters like crossover rate (Cr), scale factor (F), and population size (Np). The stages are summarized below:

– Initialization: A population of candidate solutions is randomly generated within specified bounds. Each solution is represented as a vector $x_i^g$, where $i$ indexes the individual and $g$ the generation.
– Mutation: DE perturbs the population by producing a mutant vector $v_i^{g+1}$ with the DE/rand/1 strategy of Eq. 5:

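$$v_i^{g+1} = x_{r_3}^{g} + F\left(x_{r_1}^{g} - x_{r_2}^{g}\right), \qquad (5)$$

where $r_1$, $r_2$, and $r_3$ are mutually distinct random population indices, all different from $i$.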

– Crossover: Crossover increases population diversity and is controlled by the constant $Cr \in [0, 1]$. Each component $d$ of the trial vector $u_i^{g+1}$ in Eq. 6 is taken either from the mutant vector $v_{i,d}^{g+1}$ or from the target vector $x_{i,d}^{g}$:

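$$u_{i,d}^{g+1} = \begin{cases} v_{i,d}^{g+1}, & \text{if } d = d_{rand} \text{ or } \mathrm{rand}(0,1) \leq Cr, \\ x_{i,d}^{g}, & \text{otherwise}, \end{cases} \qquad (6)$$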

where $d_{rand}$ is an index chosen at random from $\{1, 2, \ldots, D\}$, with $D$ the problem dimension, which ensures that $u_i^{g+1}$ includes at least one component of the mutant vector $v_i^{g+1}$.

– Selection: Greedy selection determines survival based on fitness values: the trial vector $u_i^{g+1}$ replaces the target vector $x_i^{g}$ in the next generation only if its fitness is at least as good. The process is repeated until the optimum is found or the termination criterion is met.
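The four stages above can be sketched as DE/rand/1 with binomial crossover in NumPy. This is an illustrative sketch, not the authors' code: the defaults follow Table 1 (population 10, 10 iterations, $F = 0.5$, $Cr = 0.5$), the objective is maximized (accuracy), and clipping mutant vectors to the bounds is an added assumption. The name `differential_evolution` is hypothetical here and is unrelated to any library function of the same name.

```python
import numpy as np


def differential_evolution(fitness, lower, upper, pop_size=10, iters=10,
                           F=0.5, Cr=0.5, seed=0):
    """DE/rand/1/bin maximizing fitness(x), following Eqs. (5) and (6)."""
    rng = np.random.default_rng(seed)
    lower = np.atleast_1d(np.asarray(lower, dtype=float))
    upper = np.atleast_1d(np.asarray(upper, dtype=float))
    D = lower.size
    pop = rng.uniform(lower, upper, size=(pop_size, D))      # initialization
    fit = np.array([fitness(x) for x in pop])
    for _ in range(iters):
        new_pop, new_fit = pop.copy(), fit.copy()
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            r1, r2, r3 = rng.choice(others, size=3, replace=False)
            # Mutation, Eq. (5); clipping to the bounds is an added assumption
            mutant = np.clip(pop[r3] + F * (pop[r1] - pop[r2]), lower, upper)
            # Binomial crossover, Eq. (6): d_rand forces one mutant component
            mask = rng.random(D) <= Cr
            mask[rng.integers(D)] = True
            trial = np.where(mask, mutant, pop[i])
            # Greedy selection: keep the trial if it is at least as fit
            f_trial = fitness(trial)
            if f_trial >= fit[i]:
                new_pop[i], new_fit[i] = trial, f_trial
        pop, fit = new_pop, new_fit
    best = int(np.argmax(fit))
    return pop[best], fit[best]
```

For the one-dimensional threshold search, `lower=[0.01]` and `upper=[0.99]` would be passed, and the fitness function receives a one-element vector.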

2.4.3 PSO

The Particle Swarm Optimization (PSO) algorithm simulates a swarm of individuals, such as a flock of birds [¹³]. It emulates a collective behavior that, in nature, has two main components: each individual's velocity and the location of the food sources that act as targets. Each individual in PSO is a particle, denoted by $i = 1, 2, \ldots, N$, where $N$ is the number of particles in the swarm.

Each particle is defined by its current position in the search space, $x$, and its velocity toward the target, $v$, which is adjusted throughout the iterative process.

PSO combines two sources of knowledge: the best position found so far by each particle, $p_i(t)$, and the best position known by the whole swarm, $g(t)$. Equation 7 updates the particle's velocity. The inertia weight $w$ in standard PSO decreases linearly from $w_{\max} = 0.9$ to $w_{\min} = 0.4$; $c_1$ and $c_2$ are constants called learning factors that control the influence of $p_i(t)$ and $g(t)$; and $r_1$ and $r_2$ are random values in the range $[0, 1]$. Eq. 8 updates the positions of the particles:

$$v_i(t+1) = w\, v_i(t) + c_1 r_1 \left(p_i(t) - x_i(t)\right) + c_2 r_2 \left(g(t) - x_i(t)\right), \qquad (7)$$

$$x_i(t+1) = x_i(t) + v_i(t+1). \qquad (8)$$
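Eqs. (7) and (8) with the linearly decreasing inertia weight can be sketched as a global-best PSO in NumPy. This is an illustrative sketch, not the authors' implementation: it maximizes the objective, uses $c_1 = 2.0$ and $c_2 = 0.9$ taken from the acceleration coefficients in Table 1, follows the 0.9 to 0.4 inertia schedule described above instead of a constant inertia weight, and clips positions to the bounds as an added assumption. The name `pso` is hypothetical.

```python
import numpy as np


def pso(fitness, lower, upper, n=10, iters=10, c1=2.0, c2=0.9,
        w_max=0.9, w_min=0.4, seed=0):
    """Global-best PSO maximizing fitness(x), following Eqs. (7) and (8)."""
    rng = np.random.default_rng(seed)
    lower = np.atleast_1d(np.asarray(lower, dtype=float))
    upper = np.atleast_1d(np.asarray(upper, dtype=float))
    D = lower.size
    x = rng.uniform(lower, upper, size=(n, D))      # positions
    v = np.zeros((n, D))                            # velocities
    p = x.copy()                                    # personal bests p_i(t)
    p_fit = np.array([fitness(xi) for xi in x])
    g = p[np.argmax(p_fit)].copy()                  # global best g(t)
    for t in range(iters):
        # Inertia weight decreasing linearly from w_max to w_min
        w = w_max - (w_max - w_min) * t / max(iters - 1, 1)
        r1, r2 = rng.random((n, D)), rng.random((n, D))
        v = w * v + c1 * r1 * (p - x) + c2 * r2 * (g - x)   # Eq. (7)
        x = np.clip(x + v, lower, upper)                    # Eq. (8), clipped (assumption)
        f = np.array([fitness(xi) for xi in x])
        improved = f > p_fit                                # update personal bests
        p[improved] = x[improved]
        p_fit[improved] = f[improved]
        g = p[np.argmax(p_fit)].copy()                      # update global best
    return g, p_fit.max()
```

Note that the position update uses the freshly computed velocity, so each particle moves with $v_i(t+1)$ as in Eq. (8).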

3 Proposed Method

The general scheme of this proposal is as follows. First, noise is added to the dataset described in Sect. 3.1 to emulate real ECG acquisition scenarios. The noisy dataset is then denoised with optimized wavelet denoising (WD), in which the threshold $\lambda$ is optimized by a metaheuristic algorithm that uses the accuracy of the classifier stage as its objective function. Finally, the denoised dataset is classified with XGBoost in both binary and multi-class settings, and the results are analyzed with four performance metrics. The approach is presented in Fig. 1 and detailed below.

Fig. 1 General diagram of the proposed approach
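The coupling in Fig. 1, where classifier accuracy serves as the objective of the threshold search, can be sketched as a wrapper that turns "denoise, then train and score" into a fitness function of $\lambda$. This is a hypothetical sketch: `make_objective`, `denoise`, and `evaluate` are illustrative names, where `evaluate` stands for training a classifier such as XGBoost on a training split and returning its accuracy on held-out data. Memoizing repeated $\lambda$ values is a design choice added here, since every evaluation retrains the classifier.

```python
import numpy as np


def make_objective(denoise, evaluate, signals, labels, decimals=4):
    """Wrap 'denoise, then classify' as a fitness function of lambda.

    denoise(signal, lam) -> denoised signal    (hypothetical callback)
    evaluate(X, y)       -> accuracy in [0, 1] (hypothetical callback, e.g.
                            train a classifier and score it on held-out data)
    """
    cache = {}

    def fitness(lam):
        # Accept a scalar or a 1-element vector, since optimizers differ
        # in how they pass candidate solutions.
        lam = round(float(np.ravel(lam)[0]), decimals)
        if lam not in cache:   # each evaluation retrains the classifier
            X = np.array([denoise(s, lam) for s in signals])
            cache[lam] = evaluate(X, labels)
        return cache[lam]

    fitness.cache = cache
    return fitness
```

Any of the optimizers described in Sect. 2.4 can then call `fitness` directly, because it accepts both scalar and vector candidates.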

3.1 Dataset

The MIT-BIH Arrhythmia Database [¹⁸], developed by the Health Sciences and Technology Division of the Massachusetts Institute of Technology (MIT) and Harvard University, is used for the present case study.

The MIT-BIH Arrhythmia Database contains 48 half-hour excerpts of two-channel ambulatory ECG recordings obtained from 47 subjects studied by the BIH Arrhythmia Laboratory between 1975 and 1979. Twenty-three recordings were randomly selected from a set of 4,000 ECGs from a mixed population of inpatients (about 60%) and outpatients (about 40%) at Beth Israel Hospital in Boston.

The remaining 25 recordings were selected from the same set to include less common but clinically significant arrhythmias that would not be well represented in a small random sample. Two or more cardiologists independently annotated each recording; disagreements were resolved to obtain computer-readable reference annotations for each beat (approximately 110,000 annotations in total) included in the database.

This database was chosen because it is one of the most widely used and is recommended by the ANSI/AAMI EC57:1998/(R)2008 standard [²] (AAMI: Association for the Advancement of Medical Instrumentation), which specifies the protocol for performing evaluations so that experiments are reproducible and comparable.

The database identifies 15 different heartbeat classes, which can be organized into two groups, NORMAL and ARRHYTHMIAS. The NORMAL group consists of the Normal (N) class with 45,801 beats, and the ARRHYTHMIAS group is composed of Supraventricular Ectopic Beats (SVEB) with 976 beats, Ventricular Ectopic Beats (VEB) with 3,788 beats, Fusion Beats (F) with 415 beats, and an additional category for unknown beats.

As these counts show, the MIT-BIH database is imbalanced both across classes and across the two groups; for this reason, the SMOTE technique [⁴] was applied. SMOTE oversamples the minority classes to rebalance the original training set.

Instead of simply replicating minority-class instances, SMOTE introduces synthetic examples, each interpolated between a minority-class sample and one of its nearest minority-class neighbors.
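The interpolation idea of SMOTE can be sketched in NumPy as follows. This is a simplified illustration of the core mechanism only (pick a minority sample, pick one of its $k$ nearest minority-class neighbors, and place a synthetic point at a random position on the segment between them); the name `smote` is hypothetical, and in practice a library implementation such as the `SMOTE` class of the imbalanced-learn package would typically be used, applied to the training split only.

```python
import numpy as np


def smote(X_min, n_new, k=5, seed=0):
    """Generate n_new synthetic samples from minority-class samples X_min."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    # Pairwise Euclidean distances within the minority class
    d = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # a sample is not its own neighbor
    k = min(k, len(X_min) - 1)
    neighbors = np.argsort(d, axis=1)[:, :k]  # k nearest neighbors per sample
    base = rng.integers(len(X_min), size=n_new)
    nn = neighbors[base, rng.integers(k, size=n_new)]
    gap = rng.random((n_new, 1))             # random position on each segment
    return X_min[base] + gap * (X_min[nn] - X_min[base])
```

Because every synthetic point lies on a segment between two real minority samples, the new examples stay within the region already occupied by that class instead of duplicating existing beats.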

3.2 Noise Addition

The dataset described in Sect. 3.1 is treated here as a noise-free set of signals. To test the robustness of this proposal in noise removal, two common types of noise are added to these signals: electrical noise and white noise.

One of the most prevalent types of noise in ECG signals is electrical noise, which typically comes from alternating current (AC) power lines and appears as a 50 Hz or 60 Hz sinusoidal interference, depending on the local power supply frequency.

In addition, the movement of electrodes and wires, thermal noise, and the electronic components of the ECG recording equipment can add white noise. The electrical noise $E_n$ is defined as:

$$E_n(t) = A \sin(2\pi f t), \qquad (9)$$

where $A$ and $f$ are the amplitude and frequency of the interference, respectively. In this proposal, $f = 60$ Hz. The white noise $W_n$ is defined as:

$$W_n \sim \mathcal{N}(0, \sigma^2), \qquad (10)$$

where $\mathcal{N}(0, \sigma^2)$ is the normal distribution with mean 0 and variance $\sigma^2$. The final noisy signal $N_s$ is obtained as follows:

$$N_s = C_s - S_\mu + W_n + E_n, \qquad (11)$$

where $C_s$ is the clean ECG signal and $S_\mu$ is its mean value, subtracted to compensate for the offset added by the sum of the different noises. It is defined as:

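$$S_\mu = \frac{1}{N} \sum_{i=1}^{N} C_{s,i}, \qquad (12)$$

where $C_{s,i}$ is the $i$-th sample of $C_s$ and $N$ is the number of samples.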
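Eqs. (9) to (12) can be sketched as a single NumPy function. The article does not report the interference amplitude $A$ or the noise level $\sigma$, so the default values below are hypothetical placeholders; the sampling rate of 360 Hz is that of the MIT-BIH recordings, and the name `add_noise` is illustrative.

```python
import numpy as np


def add_noise(clean, fs=360.0, A=0.1, sigma=0.05, f=60.0, seed=0):
    """Noisy signal N_s = C_s - S_mu + W_n + E_n (Eqs. 9-12).

    A and sigma are hypothetical placeholder values (not given in the
    article); fs = 360 Hz is the MIT-BIH sampling rate.
    """
    rng = np.random.default_rng(seed)
    clean = np.asarray(clean, dtype=float)
    t = np.arange(clean.size) / fs                # sample times in seconds
    E_n = A * np.sin(2 * np.pi * f * t)           # Eq. (9): power-line interference
    W_n = rng.normal(0.0, sigma, clean.size)      # Eq. (10): sigma is the std. dev.
    S_mu = clean.mean()                           # Eq. (12): mean of the clean signal
    return clean - S_mu + W_n + E_n               # Eq. (11)
```

Note that `rng.normal` takes the standard deviation $\sigma$, not the variance $\sigma^2$ that appears in Eq. (10).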

3.3 MAs Parameters Configuration

Because each MA operates differently, each one requires its own parameter configuration. All initial MA parameters were determined experimentally and are presented in Table 1.

Table 1 General and particular parameter settings of the MAs

Algorithm	Parameter	Value
General	Population size	10
General	Maximum iterations	10
General	λ boundaries	[0.01, 0.99]
PSO	Inertia weight	2.0
PSO	Personal acceleration coefficient	2.0
PSO	Social acceleration coefficient	0.9
DE	Scaling factor	0.5
DE	Crossover rate	0.5
GA	Crossover probability	0.5
GA	Mutation probability	0.9

3.4 Performance Metrics

Four metrics are selected to quantitatively assess the performance of the WD optimization: Accuracy, Precision, Recall, and F1 score. Accuracy provides a summary measure of the classifier's performance across all classes, whereas Precision, Recall, and F1 score show whether the model achieves high true positive rates together with low false positive and false negative rates. These metrics are defined in Eqs. 13 through 16:

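$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad (13)$$

$$P = \frac{TP}{TP + FP}, \qquad (14)$$

$$R = \frac{TP}{TP + FN}, \qquad (15)$$

$$F1 = 2 \cdot \frac{P \cdot R}{P + R}, \qquad (16)$$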

where True Positives (TP) represent correctly predicted instances of a class, True Negatives (TN) are instances correctly predicted as not belonging to the class, False Positives (FP) are instances incorrectly predicted as belonging to the class, and False Negatives (FN) are instances of the class incorrectly predicted as belonging to another class, whether it is binary or multi-class classification.
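For the multi-class case, Eqs. 13 to 16 are applied per class in a one-vs-rest fashion, which can be computed directly from a confusion matrix. The sketch below assumes rows hold true labels and columns hold predictions (the convention used by scikit-learn's `confusion_matrix`); the name `per_class_metrics` is hypothetical, and a class that is never predicted would produce a division by zero that real code should guard against.

```python
import numpy as np


def per_class_metrics(cm):
    """Per-class precision, recall, F1 (Eqs. 14-16) and overall accuracy.

    Assumes cm[i, j] counts samples of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp     # predicted as class c but belonging elsewhere
    fn = cm.sum(axis=1) - tp     # belonging to class c but predicted elsewhere
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()   # reduces to Eq. 13 in the binary case
    return precision, recall, f1, accuracy
```

For the binary matrix [[90, 10], [5, 95]], class 0 has $P = 90/95$, $R = 90/100$, and an overall accuracy of $185/200 = 0.925$.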

4 Results and Discussion

This section presents the performance of the metaheuristic algorithms in the denoising step, which was optimized for both binary and multi-class classification. All experiments used the same dataset split and allocation to ensure a fair comparison.

4.1 Binary Classification

Table 2 presents the performance metrics for binary classification of the normal and arrhythmia classes, with the highest value for each metric highlighted in bold. XGBoost without denoising already performs well on all metrics, with an overall accuracy of 97.61%, but it is surpassed by every combination of denoising and optimization; WD + PSO and WD + DE share the highest accuracy of 99.04% and also share the highest F1 score for the arrhythmia class (99.04%). WD + PSO additionally obtains the highest recall for the arrhythmia class (99.12%) and the highest precision for the normal class (99.11%).

Table 2 Performance metrics for binary classification (all values in %)

Method	Normal precision	Normal recall	Normal F1 score	Arrhythmia precision	Arrhythmia recall	Arrhythmia F1 score	Accuracy
No denoising	97.70	97.51	97.60	97.52	97.71	97.62	97.61
WD + PSO	99.11	98.96	99.03	98.96	99.12	99.04	99.04
WD + DE	99.10	98.97	99.04	98.97	99.11	99.04	99.04
WD + GA	99.10	98.90	99.00	98.91	99.11	99.01	99.01

Wavelet denoising combined with a genetic algorithm (WD + GA) also improves performance considerably over no denoising, although it does not surpass WD + PSO or WD + DE. Overall, incorporating WD with optimization algorithms substantially enhances binary classification performance, with PSO and DE yielding the best and very similar results.

Fig. 2 presents the confusion matrix of the binary classification for each proposal. WD + PSO misclassifies fewer arrhythmic beats as normal, while WD + DE misclassifies fewer normal beats as arrhythmic.

Fig. 2 Confusion matrices for binary classification

4.2 Multi-Class Classification

Table 3 shows the performance of multi-class classification for the N, F, SVEB, and VEB classes, with the highest value for each metric in bold. XGBoost without denoising reaches an overall accuracy of 97.89%, which is outperformed by all denoising methods hybridized with optimization algorithms.

Table 3 Performance metrics for multi-class classification (all values in %)

Class	Metric	No denoising	WD + PSO	WD + DE	WD + GA
F	Precision	99.05	99.45	99.54	99.43
F	Recall	98.00	99.50	99.43	99.56
F	F1 score	98.52	99.47	99.49	99.50
N	Precision	97.98	99.20	99.18	99.14
N	Recall	98.30	99.19	99.18	99.18
N	F1 score	98.14	99.19	99.18	99.16
SVEB	Precision	95.91	98.29	98.20	98.20
SVEB	Recall	96.41	98.39	98.43	98.31
SVEB	F1 score	96.16	98.34	98.31	98.25
VEB	Precision	99.31	99.69	99.73	99.77
VEB	Recall	98.45	99.56	99.53	99.44
VEB	F1 score	98.88	99.63	99.63	99.61
Overall	Accuracy	97.89	99.12	99.11	99.09

The WD + PSO combination achieves the highest overall accuracy (99.12%) and substantially improves all metrics compared with no denoising. WD + DE follows closely, with an overall accuracy of 99.11%.

Although WD + GA notably improves performance compared with no denoising, it falls short of WD + PSO and WD + DE. In essence, wavelet denoising paired with optimization algorithms, especially PSO and DE, boosts XGBoost's performance in multi-class ECG signal classification.

This enhancement is reflected in higher accuracy, precision, recall, and F1 scores across all classes compared with the scenario without denoising. Fig. 3 presents the confusion matrices of the multi-class classification.

Fig. 3 Confusion matrices for multi-class classification

As Fig. 3 shows, WD + GA classifies N beats better than WD + PSO and WD + DE but falls behind them on SVEB and VEB. For a visual analysis of the noising and denoising process, Fig. 4 presents an example of an original signal, its noise-added version, and its optimally denoised version.

Fig. 4 Raw ECG, noised, and denoised signals

As observed, the denoised signal preserves the original slopes of the ECG while removing high-frequency components. The threshold $\lambda$ used here is the best value found through metaheuristic optimization.

5 Conclusions and Future Work

This study developed a wavelet denoising stage optimized with metaheuristic techniques and combined it with XGBoost to classify ECG beats from the MIT-BIH database. Noise was added to these signals to simulate the real conditions of direct patient acquisition, including typical artifacts present during measurement.

Daubechies 4 (db4) wavelets were then applied for signal denoising. The wavelet threshold, which is the decision variable in our approach, was optimized using three metaheuristic algorithms: PSO (Particle Swarm Optimization), DE (Differential Evolution), and GA (Genetic Algorithm).

The results show that db4 wavelet denoising combined with metaheuristic optimization considerably improves the quality of the denoised signals, which in turn improves the performance of the XGBoost model in ECG beat classification.

Among the metaheuristic algorithms used, PSO presented the best results, closely followed by DE and then GA. In all cases, the system improved considerably over XGBoost without denoising.

This approach demonstrates the effectiveness of integrating advanced signal processing methods and machine learning techniques to improve accuracy in diagnosing cardiac arrhythmias, even in the presence of noise and artifacts.

This methodology can be extended and applied to other types of wavelets and classifiers, offering considerable potential for developing robust artificial intelligence-assisted medical diagnostic systems. In future work, the following lines of research and development are contemplated:

– Exploration of Metaheuristic Algorithms: Extend the research by testing a wider variety of metaheuristic algorithms to optimize the threshold parameters in the denoising algorithm using wavelets.
– Statistical Analysis: Propose the implementation of rigorous statistical analyses, such as Friedman or Holm tests, to more accurately assess significant differences in the performance of different metaheuristic algorithms applied to ECG signal denoising. These analyses will allow a more comprehensive and objective comparison, providing a solid basis for the selection of the most efficient techniques.
– Evaluation of Different Types of Wavelets: Experiment with other types of wavelets, such as Symlet (sym) wavelets and various additional Daubechies wavelets, as performed in [¹⁴]. This comparison will allow us to determine which is most efficient for the problem at hand.
– Evaluation With Different Types of Simulated Noises: Analyze the performance of the algorithm with the addition of different types of simulated noise, not just white Gaussian noise, that much more accurately emulate the types of real artifacts that affect the quality of ECG signals. This inclusion will allow the robustness of the algorithm to be evaluated in more realistic scenarios.
– Application of Alternative Classifiers: Test with other classifiers to evaluate how they influence the performance of the overall system. Comparison of different classification approaches will help identify the most suitable for the specific task.

These research directions will contribute to a more complete understanding and continuous improvement of the signal denoising and classification process, optimizing both the selection of wavelets and the classification methods used.

References

1. Alfaouri, M., Daqrouq, K. (2008). ECG signal denoising by wavelet transform thresholding. American Journal of Applied Sciences, Vol. 5, No. 3, pp. 276–281.

2. Association for the Advancement of Medical Instrumentation and American National Standards Institute (1999). Testing and reporting performance results of cardiac rhythm and ST-segment measurement algorithms. The Association.

3. Chatterjee, S., Thakur, R. S., Yadav, R. N., Gupta, L., Raghuvanshi, D. K. (2020). Review of noise removal techniques in ECG signals. IET Signal Processing, Vol. 14, No. 9, pp. 569–590. DOI: 10.1049/iet-spr.2020.0104.

4. Chawla, N. V., Bowyer, K. W., Hall, L. O., Kegelmeyer, W. P. (2002). SMOTE: Synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, Vol. 16, pp. 321–357. DOI: 10.1613/jair.953.

5. Chen, T., Guestrin, C. (2016). XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794. DOI: 10.1145/2939672.2939785.

6. Dang, P., Tang, M., Fan, H., Hao, J. (2024). Chronic lead exposure and burden of cardiovascular disease during 1990–2019: A systematic analysis of the global burden of disease study. Frontiers in Cardiovascular Medicine, Vol. 11. DOI: 10.3389/fcvm.2024.1367681.

7. Daubechies, I. (1992). Ten lectures on wavelets. Society for Industrial and Applied Mathematics.

8. Friesen, G. M., Jannett, T. C., Jadallah, M. A., Yates, S. L., Quint, S. R., Nagle, H. T. (1990). A comparison of the noise sensitivity of nine QRS detection algorithms. IEEE Transactions on Biomedical Engineering, Vol. 37, No. 1, pp. 85–98. DOI: 10.1109/10.43620.

9. Goldberger, J. J., Cain, M. E., Hohnloser, S. H., Kadish, A. H., Knight, B. P., Lauer, M. S., Maron, B. J., Page, R. L., Passman, R. S., Siscovick, D., Stevenson, W. G., Zipes, D. P. (2008). American Heart Association/American College of Cardiology Foundation/Heart Rhythm Society scientific statement on noninvasive risk stratification techniques for identifying patients at risk for sudden cardiac death. Journal of the American College of Cardiology, Vol. 52, No. 14, pp. 1179–1199. DOI: 10.1016/j.jacc.2008.05.003.

10. Guo, J., Li, Z., Yang, S. (2018). Accelerating differential evolution based on a subset-to-subset survivor selection operator. Soft Computing, Vol. 23, No. 12, pp. 4113–4130. DOI: 10.1007/s00500-018-3060-x.

11. Holland, J. (1975). Adaptation in natural and artificial systems: An introductory analysis with applications to biology, control and artificial intelligence.

12. Jing-Yi, L., Hong, L., Dong, Y., Yan-Sheng, Z. (2016). A new wavelet threshold function and denoising application. Mathematical Problems in Engineering, Vol. 2016, pp. 1–8. DOI: 10.1155/2016/3195492.

13. Kennedy, J., Eberhart, R. (1995). Particle swarm optimization. Proceedings of International Conference on Neural Networks, Vol. 4, pp. 1942–1948. DOI: 10.1109/ICNN.1995.488968.

14. Lema-Condo, E. L., Bueno-Palomeque, F. L., Castro-Villalobos, S. E., Ordoñez-Morales, E. F., Serpa-Andrade, L. J. (2017). Comparison of wavelet transform Symlets (2-10) and Daubechies (2-10) for an electroencephalographic signal analysis. Proceedings of the IEEE XXIV International Conference on Electronics, Electrical Engineering and Computing, pp. 1–4. DOI: 10.1109/INTERCON.2017.8079702.

15. Li, M., Fu, X., Li, D. (2020). Diabetes prediction based on XGBoost algorithm. IOP Conference Series: Materials Science and Engineering, Vol. 768, No. 7, pp. 072093. DOI: 10.1088/1757-899x/768/7/072093.

16. Manivannan, G. S., Babu, C. G., Rajaguru, H. (2024). Amelioration of multitudinous classifiers performance with hyper-parameters tuning in elephant search optimization for cardiac arrhythmias detection. The Journal of Supercomputing, Vol. 80, No. 10, pp. 14848–14924. DOI: 10.1007/s11227-024-06036-6.

17. Mir, H. Y., Singh, O. (2021). ECG denoising and feature extraction techniques – A review. Journal of Medical Engineering and Technology, Vol. 45, No. 8, pp. 672–684. DOI: 10.1080/03091902.2021.1955032.

18. Moody, G. B., Mark, R. G. (1990). The MIT-BIH arrhythmia database on CD-ROM and software for use with it. Proceedings of Computers in Cardiology, pp. 185–188. DOI: 10.1109/CIC.1990.144205.

19. Natekin, A., Knoll, A. (2013). Gradient boosting machines, a tutorial. Frontiers in Neurorobotics, Vol. 7, pp. 21. DOI: 10.3389/fnbot.2013.00021.

20. Nurmaini, S., Darmawahyuni, A., Sakti-Mukti, A. N., Rachmatullah, M. N., Firdaus, F., Tutuko, B. (2020). Deep learning-based stacked denoising and autoencoder for ECG heartbeat classification. Electronics, Vol. 9, No. 1, pp. 135. DOI: 10.3390/electronics9010135.

21. Poungponsri, S., Yu, X. H. (2013). An adaptive filtering approach for electrocardiogram (ECG) signal noise reduction using neural networks. Neurocomputing, Vol. 117, pp. 206–213. DOI: 10.1016/j.neucom.2013.02.010.

22. Vishwa, A., Lal, M. K., Dixit, S., Vardwaj, P. (2011). Clasification of arrhythmic ECG data using machine learning techniques. International Journal of Artificial Intelligence and Interactive Multimedia, Vol. 1, No. 4, pp. 67–70.

23. Zhang, D. (2021). Wavelet transform. Fundamentals of Image Data Mining, pp. 45–54. DOI: 10.1007/978-3-030-69251-3_3.

Received: June 26, 2024; Accepted: August 14, 2024

* Corresponding author: Javier Galvis-Chacón, e-mail: javier.galvis8583@alumnos.udg.mx

This is an open-access article distributed under the terms of the Creative Commons Attribution License.