SciELO - Scientific Electronic Library Online

 

Journal of applied research and technology

On-line version ISSN 2448-6736; Print version ISSN 1665-6423

J. appl. res. technol vol.21 n.6 Ciudad de México Dec. 2023  Epub Aug 13, 2024

https://doi.org/10.22201/icat.24486736e.2023.21.6.1931 

Articles

New modified Bat algorithm for blind speech enhancement in time domain

Sofiane Fisli (a, *)

Mohamed Djendi (b)

(a) Université 8 Mai 1945 - Guelma, Laboratoire d'Automatique et Informatique de Guelma (LAIG), Guelma, Algeria

(b) University of Blida 1, Signal Processing and Image Laboratory (LATSI), Blida, Algeria


Abstract

We address the speech enhancement problem for dual-channel convolutive mixtures by viewing it in a blind source separation setting. One widely used technique to separate mixed signals is adaptive filtering, where the challenge is to identify an unknown finite impulse response. Traditionally, a gradient-based algorithm is applied to adapt the filter coefficients. However, such algorithms often suffer from premature convergence with long filters and non-stationary inputs, leading to the so-called local minimum problem, which significantly degrades the quality of the enhanced signals. One alternative is to apply population-based metaheuristic algorithms, in which the filter coefficients are adapted iteratively by minimizing a cost function. However, even with these metaheuristic-based solutions, the local minimum problem persists for long filters. To avoid local minima and improve the chance of reaching the global solution, we propose in this paper a novel algorithm, called the modified Bat algorithm, which makes the search process more efficient by enhancing its exploration and exploitation capabilities. Several experiments under different noise types are conducted to compare the proposed modified Bat algorithm with some popular state-of-the-art algorithms. The enhanced signals obtained at the separation outputs show the good behavior and superiority of the proposed algorithm in terms of both system misalignment and segmental signal-to-noise ratio.

Keywords: Speech enhancement; blind source separation; population-based metaheuristic algorithms; system misalignment; segmental signal-to-noise ratio

1. Introduction

Adaptive noise cancellation (ANC) is an approach used to improve the quality of speech signals corrupted by different noises (Loizou, 2013). Numerous speech enhancement techniques based on the gradient-based algorithm family have been suggested (Widrow et al., 1975). The most widely used algorithms of this family are the least mean square (LMS) and normalized least mean square (NLMS) algorithms (Rogers, 1996). However, gradient-based algorithms suffer from the local minimum problem, and the global solution is seldom attained. To avoid local minimum solutions in ANC, many modifications of the NLMS algorithm have been proposed, such as the variable step-size NLMS (VSS-NLMS) (Bendoumia & Djendi, 2015) and wavelet-domain NLMS algorithms (Djendi, 2018).

To overcome this problem, metaheuristic-based algorithms are recommended because of their simple implementation. Furthermore, metaheuristics are well known for their ability to avoid premature convergence, which lowers the chance of falling into local minima (Mahbub et al., 2010). Various metaheuristic algorithms have been used to solve the ANC problem with adaptive infinite impulse response (IIR) filters. Chang and Chen (2010) and Kunche and Reddy (2016) suggested applying the Bat algorithm (BA), the genetic algorithm (GA), particle swarm optimization (PSO) and its variants to ANC.

The aim of this paper is to propose a new, efficient modified Bat algorithm and to implement it in a blind speech enhancement structure; in this work, we only consider convolutive mixtures of signals (Djendi, 2010). This paper is an extended version of our work published in Fisli et al. (2019): we extend the previous version with a new theoretical basis and some efficient modifications that increase its performance and, therefore, the possibility of applying it in other scenarios involving other types of noise. The remainder of this manuscript is organized as follows. Section 2 reviews the mixing process that produces the mixed signals and presents the forward blind source separation (FBSS) structure. Section 3 reviews the standard BA and then presents our modified version. Simulation results are discussed in Section 4, and Section 5 concludes the paper.

2. Problem formulation

2.1. Mixture model

Figure 1 shows the scheme of the convolutive mixture model (two source signals recorded by two microphones), where s1(k) denotes the speech signal and s2(k) represents the punctual noise (Djendi, 2010).

The outputs of the mixing process, m1(k) and m2(k), are given by:

$$ m_1(k) = s_1(k) * h_{11}(k) + s_2(k) * h_{21}(k) \tag{1} $$

$$ m_2(k) = s_2(k) * h_{22}(k) + s_1(k) * h_{12}(k) \tag{2} $$

where h11(k) and h22(k) represent the direct channel paths, and h12(k) and h21(k) represent the cross-coupling effects between the channels. All these impulse responses are finite impulse responses (FIR), and the symbol (*) denotes the convolution operator.

Figure 1 The convolutive mixture model. 

The complete mixing process can be simplified under the following assumptions:

  • The original signals are clean speech and a noise signal, i.e., s1(k) = s(k) and s2(k) = b(k).

  • The direct channel paths are equivalent to the unit impulse response, i.e., h11(k) = h22(k) = δ(k).

  • The input signals are statistically independent.

This simplified convolutive mixing model is widely used because it is well validated in theory and practice.

Figure 2 shows the resulting simplified convolutive model, where the two noisy signals can be written as:

$$ m_1(k) = s(k) + b(k) * h_{21}(k) \tag{3} $$

$$ m_2(k) = b(k) + s(k) * h_{12}(k) \tag{4} $$

Figure 2 The simplified convolutive mixture model. 
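As an illustration of Eqs. (3) and (4), the following minimal Python sketch builds the two microphone signals from a speech signal s(k), a noise signal b(k) and the cross-coupling filters h12(k) and h21(k). It uses plain-Python causal convolution truncated to the signal length; the function names `convolve` and `simplified_mixture` and the toy signals are hypothetical and not taken from the paper.

```python
def convolve(x, h):
    """Causal FIR filtering y(k) = sum_j h(j) x(k - j), truncated to len(x)."""
    return [sum(h[j] * x[k - j] for j in range(min(len(h), k + 1)))
            for k in range(len(x))]


def simplified_mixture(s, b, h12, h21):
    """Microphone signals of Eqs. (3)-(4), assuming h11 = h22 = delta."""
    m1 = [sk + nk for sk, nk in zip(s, convolve(b, h21))]  # speech + filtered noise
    m2 = [nk + sk for nk, sk in zip(b, convolve(s, h12))]  # noise + filtered speech
    return m1, m2


# Toy usage: hypothetical impulses and two-tap cross-coupling filters.
m1, m2 = simplified_mixture([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
                            h12=[0.5, 0.25], h21=[0.3, 0.1])
print(m1)  # [1.0, 0.3, 0.1, 0.0]
print(m2)  # [0.5, 1.25, 0.0, 0.0]
```

Each microphone receives its own source directly plus a filtered copy of the other source, which is exactly the leakage that the separation stage has to undo.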

2.2. Forward blind source separation structure

In Figure 2, we suppose that we have no prior knowledge of the two input signals s(k) and b(k) or of the two cross-coupling impulse responses h12(k) and h21(k). Estimating the original signals from the observations alone is called blind source separation (BSS). Two structures are commonly applied in BSS to retrieve the original signals (Bendoumia & Djendi, 2015).

The forward and backward structures are frequently used in BSS because of their efficiency in speech enhancement applications such as hearing aids, speech recognition and teleconferencing systems. In this work, we use the forward blind source separation (FBSS) structure (see Figure 3). Note that the FBSS can be used only when each observed signal is a linear combination of the input signals. The outputs of the FBSS structure are:

$$ \mathrm{Out}_1(k) = p_1(k) - p_2(k) * w_{21}(k) \tag{5} $$

$$ \mathrm{Out}_2(k) = p_2(k) - p_1(k) * w_{12}(k) \tag{6} $$

Since the FBSS inputs are the two microphone signals, i.e., p1(k) = m1(k) and p2(k) = m2(k), inserting (3) and (4) into (5) and (6), respectively, gives:

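$$ \mathrm{Out}_1(k) = b(k) * [h_{21}(k) - w_{21}(k)] + s(k) * [\delta(k) - h_{12}(k) * w_{21}(k)] \tag{7} $$

$$ \mathrm{Out}_2(k) = s(k) * [h_{12}(k) - w_{12}(k)] + b(k) * [\delta(k) - h_{21}(k) * w_{12}(k)] \tag{8} $$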

Figure 3 The FBSS structure model. 

At the optimal solution of the FBSS, w12(k) = h12(k) and w21(k) = h21(k); the outputs of the unmixing structure are then given by:

$$ \mathrm{Out}_1(k) = s(k) * [\delta(k) - h_{12}(k) * w_{21}(k)] \tag{9} $$

$$ \mathrm{Out}_2(k) = b(k) * [\delta(k) - h_{21}(k) * w_{12}(k)] \tag{10} $$

From (9) and (10), the outputs Out1(k) and Out2(k) provide estimates of the two input signals, but with spectral and temporal distortions. Consequently, post-filters may be necessary at the outputs (Djendi et al., 2006).
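The following sketch implements Eqs. (5) and (6) with p1(k) = m1(k) and p2(k) = m2(k), and checks numerically that setting the adaptive filters to the true cross-coupling responses removes the noise from Out1(k) while leaving the speech filtered by δ(k) - h12(k) * h21(k), as Eq. (9) predicts. It assumes pure Python, hypothetical helper names `convolve` and `fbss`, random stand-in signals and short made-up filters.

```python
import random


def convolve(x, h):
    """Causal FIR filtering, output truncated to len(x)."""
    return [sum(h[j] * x[k - j] for j in range(min(len(h), k + 1)))
            for k in range(len(x))]


def fbss(m1, m2, w12, w21):
    """Forward BSS outputs: Out1 = m1 - m2 * w21, Out2 = m2 - m1 * w12."""
    out1 = [a - c for a, c in zip(m1, convolve(m2, w21))]
    out2 = [a - c for a, c in zip(m2, convolve(m1, w12))]
    return out1, out2


if __name__ == "__main__":
    rng = random.Random(0)
    n = 64
    s = [rng.gauss(0.0, 1.0) for _ in range(n)]   # stand-in for speech
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]   # stand-in for noise
    h12 = [0.6, 0.3, 0.1]                         # hypothetical cross-coupling filters
    h21 = [0.5, -0.2, 0.05]
    m1 = [x + y for x, y in zip(s, convolve(b, h21))]   # Eq. (3)
    m2 = [x + y for x, y in zip(b, convolve(s, h12))]   # Eq. (4)
    out1, out2 = fbss(m1, m2, w12=h12, w21=h21)         # optimal filters
    # Eq. (9) with w21 = h21: noise cancelled, speech filtered by delta - h12 * h21
    expected = [x - y for x, y in zip(s, convolve(convolve(s, h12), h21))]
    print(max(abs(a - e) for a, e in zip(out1, expected)))  # rounding-level value
```

The residual speech distortion comes from the product term h12(k) * h21(k), which is small when the cross-coupling is weak; this is consistent with the closely spaced microphone setting assumed in the paper.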

In this work, we consider only the case where the two microphones are closely spaced, which leads to low distortion; therefore, ŝ(k) = Out1(k) and b̂(k) = Out2(k). Obtaining the estimated source signals thus amounts to finding the optimal adaptive filters w12(k) and w21(k), which can be obtained by minimizing the following objective function:

$$ J = \frac{1}{L}\sum_{k=0}^{L-1} \mathrm{Out}_i(k)^2 \tag{11} $$

where L is the input frame length and i=1, 2 is the channel index.
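A short worked step (our own reading of Eqs. (7) and (11), not a derivation stated by the authors) shows why minimizing the output energy of channel 1 is meaningful when the speech source is silent:

```latex
\begin{aligned}
s(k) = 0 \;&\Rightarrow\; \mathrm{Out}_1(k) = b(k) * \big[h_{21}(k) - w_{21}(k)\big]
  && \text{from Eq. (7)}\\
&\Rightarrow\; J = \frac{1}{L}\sum_{k}\Big(b(k) * \big[h_{21}(k) - w_{21}(k)\big]\Big)^{2} \ge 0
  && \text{Eq. (11), } i = 1\\
&\Rightarrow\; J = 0 \ \text{ when } w_{21}(k) = h_{21}(k).
\end{aligned}
```

When speech is present, the term s(k) * [δ(k) - h12(k) * w21(k)] of Eq. (7) also contributes to J, which is consistent with restricting the adaptation of w21(k) to noise-only periods.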

2.3. Framework for adaptive filtering in FBSS based on metaheuristic

In general, solving an optimization problem with a metaheuristic algorithm requires evaluating the cost function at each iteration on a set of input data. In FBSS problems, the mixed signals, which are the inputs of the online adaptive filters, are not entirely available; therefore, an efficient way to proceed is to evaluate the cost function on the frame of observed signals available at each iteration. Moreover, we propose in this paper to use a manual voice activity detection (MVAD) system to control the adaptation of the filters: the cost function of the filter w21(k) is evaluated only during noise-only periods, whereas the filter w12(k) is updated during voice activity periods. The general scheme of the proposed dual adaptive filtering by FBSS and metaheuristic algorithms is illustrated in Figure 4.

Figure 4 Flowchart of the proposed dual adaptive FBSS based on metaheuristic algorithms.
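The MVAD-controlled adaptation can be sketched as follows: for each frame, a manually supplied speech/noise label selects which filter is adapted and which output energy (Eq. (11)) serves as its cost. The names `frame_cost` and `mvad_objective`, the per-frame boolean label, and the choice to ignore filter memory across frame boundaries are assumptions of this sketch, not details given in the paper.

```python
def convolve(x, h):
    """Causal FIR filtering, output truncated to len(x)."""
    return [sum(h[j] * x[k - j] for j in range(min(len(h), k + 1)))
            for k in range(len(x))]


def frame_cost(p_a, p_b, w):
    """Eq. (11) on one frame: mean energy of p_a - p_b * w."""
    out = [a - c for a, c in zip(p_a, convolve(p_b, w))]
    return sum(o * o for o in out) / len(out)


def mvad_objective(m1_frame, m2_frame, is_speech):
    """Return (filter name, cost function) allowed to adapt on this frame.

    `is_speech` is the manual voice-activity label of the frame (hypothetical input).
    """
    if is_speech:
        # speech present: adapt w12 on Out2 = m2 - m1 * w12, Eq. (6)
        return "w12", lambda w: frame_cost(m2_frame, m1_frame, w)
    # noise only: adapt w21 on Out1 = m1 - m2 * w21, Eq. (5)
    return "w21", lambda w: frame_cost(m1_frame, m2_frame, w)


# Noise-only toy frame: s = 0, so m1 = b * h21 and m2 = b.
b = [1.0, -0.5, 0.25, 0.8, -0.3, 0.6]
h21 = [0.5, 0.2]
m1, m2 = convolve(b, h21), b
name, cost = mvad_objective(m1, m2, is_speech=False)
print(name, cost(h21), cost([0.0, 0.0]) > 0.0)  # w21 0.0 True
```

The returned cost function is what a metaheuristic such as BA or MBA would minimize over the coefficients of the selected filter for that frame.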

3. Algorithms review

This section presents the Bat algorithm and the modifications made to improve its efficiency.

3.1. Bat algorithm (BA)

The Bat algorithm (BA) is a population-based algorithm (Yang, 2010). Bats can hunt even in complete darkness using returning echoes; this echolocation ability allows them to distinguish between obstacles and insects, as shown in Figure 5. The echolocation mechanism can be modeled by a set of mathematical equations describing a swarm of bats, each representing a potential solution. Each bat moves through the search space with a velocity vi and a position xi, using a frequency fmin, a varying wavelength λ and a loudness A0 to search for prey. Bats fine-tune the frequency of their emitted pulses and the pulse emission rate according to their distance from the prey. The optimization process is repeated until the maximum number of iterations is reached; at each iteration, the frequency, velocity and position are updated using the following relations:

$$ f_i = f_{\min} + (f_{\max} - f_{\min})\,\beta \tag{12} $$

$$ v_i^n = v_i^{n-1} + (x_i^n - G_{\mathrm{best}}^n)\, f_i \tag{13} $$

$$ x_i^n = x_i^{n-1} + v_i^n \tag{14} $$

where

$f_{\min}, f_{\max}$: minimum and maximum frequencies,

$f_i$: frequency of the ith bat,

$v_i^n, x_i^n$: velocity and position of the ith bat at iteration n,

$\beta$: a random vector drawn from a uniform distribution in [0, 1],

$G_{\mathrm{best}}^n$: current global best solution.

In addition, a random walk is generated for each bat to improve the local search:

$$ x_i^n \leftarrow x_i^n + \varepsilon A_i^n \tag{15} $$

where

$\varepsilon$: random value in the range [-1, 1],

$A_i^n$: loudness of the ith bat at iteration n.

Figure 5 Bat echolocation mechanism.
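A minimal sketch of one standard BA move, following Eqs. (12)-(15), is given below. The helper names are hypothetical, f_min and f_max are left to the caller, and ε in Eq. (15) is drawn independently for each coefficient, which is one possible reading of the text (a single scalar ε per bat would also fit).

```python
import random


def bat_move(x, v, g_best, f_min, f_max, rng):
    """One standard BA move of a single bat, Eqs. (12)-(14)."""
    beta = rng.random()                                  # uniform in [0, 1)
    f = f_min + (f_max - f_min) * beta                   # Eq. (12)
    v_new = [vj + (xj - gj) * f
             for vj, xj, gj in zip(v, x, g_best)]        # Eq. (13)
    x_new = [xj + vj for xj, vj in zip(x, v_new)]        # Eq. (14)
    return x_new, v_new


def local_walk(x, loudness, rng):
    """Local random walk, Eq. (15); eps drawn per coefficient (assumption)."""
    return [xj + rng.uniform(-1.0, 1.0) * loudness for xj in x]


rng = random.Random(0)
x, v = bat_move([0.5, -0.2], [0.0, 0.0], [0.1, 0.1], f_min=0.0, f_max=2.0, rng=rng)
x_walk = local_walk(x, loudness=0.01, rng=rng)
print(x, x_walk)
```

With x equal to the global best, Eq. (13) leaves the velocity unchanged, which is an easy sanity check on an implementation.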

Furthermore, the loudness Ai and the pulse emission rate ri are updated at every iteration n. As the search proceeds, the loudness decreases and the rate of pulse emission increases once a bat has found its prey. For simplicity, we use A0 = 1 and Amin = 0, where Amin = 0 means that a bat has just found its prey and temporarily stops emitting sound:

$$ A_i^{n+1} = \alpha A_i^n \tag{16} $$

$$ r_i^{n+1} = r_i^0\,\big[1 - e^{-\gamma n}\big] \tag{17} $$

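where

$\alpha, \gamma$: constants,

$A_i^n$: loudness of the ith bat at iteration n,

$r_i^n, r_i^0$: pulse emission rate of the ith bat at iteration n and its initial value.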
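Eqs. (16) and (17) can be evaluated directly; the sketch below prints how the loudness shrinks and the pulse rate grows with the number of updates n. The values α = γ = 0.9 are hypothetical (they are not listed in Table 1); A0 = 1 follows the text above and r0 = 0.01 follows Table 1. In the standard BA these updates are applied only when a bat accepts a new solution, so n here counts updates rather than raw iterations.

```python
import math


def loudness_after(a0, alpha, n):
    """Eq. (16) applied n times from A^0: A^n = alpha**n * A^0."""
    return a0 * alpha ** n


def pulse_rate_next(r0, gamma, n):
    """Eq. (17): pulse rate used at update n + 1."""
    return r0 * (1.0 - math.exp(-gamma * n))


# alpha = gamma = 0.9 are hypothetical; A0 = 1 and r0 = 0.01 as stated above.
for n in (0, 1, 5, 20):
    print(n, round(loudness_after(1.0, 0.9, n), 4),
          round(pulse_rate_next(0.01, 0.9, n), 6))
```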

3.2. Formulation of the proposed modified Bat algorithm

The standard Bat algorithm has become very popular for solving real-world problems effectively; however, in higher-dimensional problems, BA suffers from local minima. To overcome this drawback, a modified Bat algorithm (MBA) is introduced to adapt large adaptive filters. The decaying nature of acoustic impulse responses calls for a different way of generating new solutions, one that improves the local search. In the proposed MBA, the loudness parameter Ai is updated at each iteration, so that it varies during the optimization process according to a negative exponential function:

$$ A_i^n = \theta\, e^{-\mu (n-1)} \tag{18} $$

where

$\theta, \mu$: constants in the range [0, 1].
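The MBA loudness schedule of Eq. (18) is sketched below with the θ and μ values of Table 1. Each iteration multiplies the loudness by e^{-μ}, about 0.961 for μ = 0.04, so it falls from 0.01 at n = 1 to roughly 2 × 10^-11 at n = 500; since Eq. (19) uses the loudness as a step size, refinement steps become very small late in a run. The function name is hypothetical.

```python
import math


def mba_loudness(n, theta=0.01, mu=0.04):
    """Eq. (18): A^n = theta * exp(-mu * (n - 1)) for iteration n = 1, 2, ...

    Defaults are the MBA values listed in Table 1.
    """
    return theta * math.exp(-mu * (n - 1))


for n in (1, 100, 250, 500):
    print(n, f"{mba_loudness(n):.2e}")
```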

Moreover, examining real acoustic impulse responses shows a large difference in magnitude between their first and last coefficients. In the standard BA, all filter coefficients are processed in the same way, which limits exploitation and consequently leads to a poor final solution. Therefore, the proposed algorithm introduces an additional step that improves the quality of the solution by modifying the elements of the global best solution individually according to the following equation:

$$ G_{\mathrm{best},j} = G_{\mathrm{best},j} + \omega A_i^n \tag{19} $$

where

$G_{\mathrm{best},j}$: jth element of the global best solution,

$\omega$: random value in the range [-1, 1].

This step is followed by an evaluation of the objective function: the new solution is accepted only if it yields a lower fitness value than the one obtained with the initial Gbest; otherwise, the change is ignored.

The modified Bat algorithm is expressed by the following pseudo-code.

Begin
 • Set the problem dimension n, the number of bats nn, the maximum number of iterations maxit, the search space R, and the minimum and maximum frequencies fmin and fmax.
 • Randomly generate the positions Xi and velocities Vi of the bats (i = 1, 2, …, nn).
 • Define the pulse frequency fi.
 • Initialize the pulse rates ri and the loudness Ai.
 • Evaluate the objective function for each bat to find the best initial fitness and the best global solution Gbest.
 While (t < maxit)
   • Generate new solutions by adjusting the frequency (Equation 12).
   • Update velocities and positions (Equations 13 and 14).
   If (rand > ri) then
     • Select a solution among the best solutions randomly.
     • Generate a local solution around the selected best solution by a local random walk (Equation 15).
   End if
   • Evaluate the objective function for each bat and update the best fitness and Gbest.
   If (rand < Ai & f(xi) < f(Gbest)) then
     • Accept the new solution, increase ri using Equation (17) and decrease Ai using the modified Equation (18).
   End if
   • Evaluate the objective function for each bat and update the best fitness and Gbest.
   For (j = 1:n)
     • Generate a new solution by modifying only the jth element of Gbest (Equation 19).
     • Evaluate the objective function for the new Gbest.
     • Accept the change in Gbest only if it yields a lower fitness value; otherwise, ignore it.
   End For
 End while
 Return Gbest as the solution
End
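The MBA pseudo-code can be turned into the following Python sketch. It is an interpretation, not the authors' implementation: f_min = 0, f_max = 2 and γ = 0.9 are hypothetical defaults; velocities start at zero; the local walk is taken around the current global best instead of a randomly selected good solution; coefficients are clipped to the search space R; the loudness of Eq. (18) is shared by all bats; and the acceptance test is evaluated against Gbest before Gbest is updated. The remaining defaults (population, maxit, R, r0, θ, μ) follow Table 1.

```python
import math
import random


def mba(objective, dim, n_bats=30, max_it=500, lo=-3.0, hi=3.0,
        f_min=0.0, f_max=2.0, r0=0.01, gamma=0.9,
        theta=0.01, mu=0.04, seed=None):
    """Sketch of the modified Bat algorithm; returns (g_best, best_fitness)."""
    rng = random.Random(seed)

    def clip(z):
        return min(hi, max(lo, z))          # keep coefficients inside R

    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_bats)]
    v = [[0.0] * dim for _ in range(n_bats)]
    fit = [objective(xi) for xi in x]
    r = [0.0] * n_bats                      # pulse rates, raised via Eq. (17)
    i0 = min(range(n_bats), key=lambda i: fit[i])
    g_best, best_fit = list(x[i0]), fit[i0]

    for t in range(1, max_it + 1):
        loud = theta * math.exp(-mu * (t - 1))                    # Eq. (18)
        for i in range(n_bats):
            f = f_min + (f_max - f_min) * rng.random()            # Eq. (12)
            v[i] = [vj + (xj - gj) * f
                    for vj, xj, gj in zip(v[i], x[i], g_best)]    # Eq. (13)
            cand = [clip(xj + vj) for xj, vj in zip(x[i], v[i])]  # Eq. (14)
            if rng.random() > r[i]:
                # local random walk around the current best, Eq. (15)
                cand = [clip(gj + rng.uniform(-1.0, 1.0) * loud) for gj in g_best]
            f_cand = objective(cand)
            improves = f_cand < best_fit
            if improves and rng.random() < loud:
                # accept the move and raise the pulse rate, Eq. (17)
                x[i], fit[i] = cand, f_cand
                r[i] = r0 * (1.0 - math.exp(-gamma * t))
            if improves:
                g_best, best_fit = list(cand), f_cand
        for j in range(dim):
            # perturb only the j-th coefficient of Gbest, Eq. (19)
            trial = list(g_best)
            trial[j] = clip(trial[j] + rng.uniform(-1.0, 1.0) * loud)
            f_trial = objective(trial)
            if f_trial < best_fit:          # keep the change only if fitness drops
                g_best, best_fit = trial, f_trial
    return g_best, best_fit


if __name__ == "__main__":
    def sphere(z):
        """Toy objective (not the paper's FBSS cost)."""
        return sum(c * c for c in z)

    best, value = mba(sphere, dim=4, n_bats=10, max_it=100, seed=1)
    print(value)
```

For the FBSS problem, `objective` would be the frame cost of Eq. (11) for the filter being adapted, with `dim` equal to the filter length L. Because Gbest is only replaced by strictly better candidates, the returned fitness never exceeds the best fitness of the initial swarm.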

4. Analysis of experimental results

In this section, we demonstrate the noise reduction capabilities of the proposed modified Bat algorithm in the context of speech enhancement. We perform extensive experiments with several different noisy observations and compare its performance with well-known metaheuristic algorithms, including the original Bat algorithm (BA) (Yang, 2010), particle swarm optimization (PSO) (Clerc, 2010) and the gray wolf optimizer (GWO) (Okwu & Tartibu, 2021). We use the simplified convolutive mixture model presented in Section 2. The clean speech signal s(k) is a sentence pronounced by one male speaker, sampled at 8 kHz. The clean speech is mixed with three different real noises b(k): white Gaussian, car and USASI noise. The two impulse responses h12(k) and h21(k) are generated from random sequences weighted by negative exponential functions (Djendi, 2010; Djendi et al., 2006). Figure 6 shows a sample of the impulse responses, with length L = 128, used to produce the mixed signals m1(k) and m2(k) when the input signals are speech and USASI noise; the input SNRs at both sensors are SNR1 = SNR2 = -6 dB (see Figure 7).

Figure 6 Sample impulse responses h12(k) (left) and h21(k) (right), with L = 128.

Figure 7 Original speech s(k) [top left], noise signal b(k) [top right], mixed signal m1(k) [bottom left] and mixed signal m2(k) [bottom right].
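The impulse-response generation and SNR setting can be sketched as follows. The paper only states that h12(k) and h21(k) are random sequences weighted by negative exponential functions, so the Gaussian taps, the decay constant 0.05 and the helper names are hypothetical. How the authors calibrated the sensor SNRs is not specified either, so `scale_to_snr` simply rescales one signal relative to another to reach a target power ratio.

```python
import math
import random


def decaying_random_ir(length, decay, rng):
    """Hypothetical IR model: Gaussian random taps under an exp(-decay * k) envelope."""
    return [rng.gauss(0.0, 1.0) * math.exp(-decay * k) for k in range(length)]


def power(x):
    """Mean power of a signal."""
    return sum(v * v for v in x) / len(x)


def scale_to_snr(target, interferer, snr_db):
    """Rescale `interferer` so that 10*log10(P_target / P_interferer) = snr_db."""
    gain = math.sqrt(power(target) / (power(interferer) * 10.0 ** (snr_db / 10.0)))
    return [gain * v for v in interferer]


rng = random.Random(0)
h21 = decaying_random_ir(128, decay=0.05, rng=rng)   # L = 128 as in Figure 6
speech = [rng.gauss(0.0, 1.0) for _ in range(8000)]   # stand-in for 1 s at 8 kHz
noise = scale_to_snr(speech, [rng.gauss(0.0, 1.0) for _ in range(8000)], snr_db=-6.0)
print(len(h21), round(10.0 * math.log10(power(speech) / power(noise)), 6))  # 128 -6.0
```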

Note that the algorithm settings described in Section 3 are used in all tests; furthermore, the same population size, search-space range and number of iterations are used for all algorithms, so that they are compared under identical settings. Moreover, results are obtained using three adaptive filter lengths, L = 32, 64 and 128, and different input SNRs. Finally, all results are averaged over 20 trial runs. There are many ways to compare algorithm performance; in this work, we use two performance measures:

- System misalignment (SM), defined as follows:

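$$ \mathrm{SM}_{\mathrm{dB}} = 20\log_{10}\left(\frac{\lVert \mathbf{h}_{21} - \mathbf{w}_{21} \rVert}{\lVert \mathbf{h}_{21} \rVert}\right) \tag{20} $$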

where ‖·‖ denotes the Euclidean norm, and h21 and w21 denote the true filter vector and the adaptive filter vector, respectively.

- Segmental signal-to-noise ratio (SegSNR), given by the following relation:

$$ \mathrm{SegSNR}_{\mathrm{dB}} = 10\log_{10}\left(\frac{\sum_{i=0}^{P-1} |s(i)|^2}{\sum_{i=0}^{P-1} |s(i) - \mathrm{Out}_1(i)|^2}\right) \tag{21} $$

where |·| denotes the absolute value, s(i) and Out1(i) are the original and estimated speech signals, respectively, and P is the number of samples used to compute the average output SNR. In all experiments, we use a manual voice activity detector (MVAD): the filter w21(k) is updated only during speech-silence (noise-only) periods, whereas w12(k) is updated only during speech periods (Djendi, 2010). Note that the noisy observations m1(k) and m2(k) are processed segment by segment with overlap: each segment contains 256 samples, and segmentation is performed using a Hamming window with 25% overlap between adjacent frames (Kunche & Reddy, 2016).
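Both performance measures, together with the 256-sample Hamming-window segmentation with 25% overlap, can be sketched in pure Python as below. Eq. (21) is implemented literally over P samples; averaging it over consecutive P-sample blocks in `seg_snr_db` is our assumption about how the segmental value is formed, and all function names are hypothetical.

```python
import math


def system_misalignment_db(h, w):
    """Eq. (20): 20*log10(||h - w|| / ||h||) for equal-length filter vectors."""
    err = math.sqrt(sum((hj - wj) ** 2 for hj, wj in zip(h, w)))
    ref = math.sqrt(sum(hj * hj for hj in h))
    if err == 0.0:
        return float("-inf")                 # perfect identification
    return 20.0 * math.log10(err / ref)


def hamming(size):
    """Symmetric Hamming window 0.54 - 0.46*cos(2*pi*n/(size - 1))."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (size - 1)) for n in range(size)]


def segment(x, size=256, overlap=0.25):
    """Hamming-windowed frames of `size` samples sharing an `overlap` fraction."""
    hop = int(round(size * (1.0 - overlap)))  # 192 samples for 256 / 25 %
    win = hamming(size)
    return [[x[start + n] * win[n] for n in range(size)]
            for start in range(0, len(x) - size + 1, hop)]


def snr_db(s, out):
    """Eq. (21) over P = len(s) samples."""
    sig = sum(v * v for v in s)
    err = sum((a - b) ** 2 for a, b in zip(s, out))
    return float("inf") if err == 0.0 else 10.0 * math.log10(sig / err)


def seg_snr_db(s, out, p=256):
    """Average of Eq. (21) over consecutive p-sample blocks (assumed convention)."""
    vals = [snr_db(s[i:i + p], out[i:i + p]) for i in range(0, len(s) - p + 1, p)]
    return sum(vals) / len(vals)


h21 = [1.0, 0.5, 0.25]
w21 = [0.9, 0.45, 0.225]                      # hypothetical estimate, 10 % off per tap
print(round(system_misalignment_db(h21, w21), 2))        # -20.0
print(round(seg_snr_db([1.0] * 512, [0.9] * 512), 2))    # 20.0
print(len(segment([1.0] * 1000)))                        # 4
```

A 10% error on every tap gives ‖h - w‖ = 0.1‖h‖ and therefore SM = -20 dB, which is a convenient reference point when reading the SM curves.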

4.1. System misalignment (SM) evaluation

The SM results obtained by the four algorithms, i.e., BA (Bat algorithm), PSO (particle swarm optimization), GWO (gray wolf optimizer) and the proposed MBA, are shown in Figure 8 (absolute values are plotted to better illustrate the results). The parameters used for each algorithm are summarized in Table 1. The adaptive filter length varies, i.e., L = 32, 64 and 128, and the input SNRs are -6 dB, 0 dB and 6 dB. The punctual noise is white noise, USASI (United States of America Standards Institute, now ANSI) noise or car noise. Note that we are only interested in the filter w21(k), since the speech signal is obtained from the first channel. We observe that the proposed MBA performs significantly better than the other algorithms in all scenarios except the white noise scenario with a small filter (L = 32), where it is slightly inferior to PSO.

Figure 8 Comparison of SM absolute final value results. 

Table 1 The parameter setting for BA, PSO, GWO and proposed MBA algorithms. 

Algorithm: parameters
PSO (Clerc, 2010): maxit = 500; R = [-3, 3]^D; population = 30; w = 2; c1 = 0.9; c2 = 0.4.
GWO (Okwu & Tartibu, 2021): maxit = 500; R = [-3, 3]^D; population = 30.
BA (Yang, 2010): maxit = 500; R = [-3, 3]^D; population = 30; A0 = 0.1; r0 = 0.01.
Proposed MBA [in this paper]: maxit = 500; R = [-3, 3]^n; population = 30; r0 = 0.01; θ = 0.01; μ = 0.04.

In addition, to investigate the convergence speed of the MBA in the transient regime, Figure 9 reports the temporal evolution of the SM criterion for large adaptive filters (L = 128), with the clean signal mixed with white, USASI and car noise at input SNRs of -6 dB, 0 dB and 6 dB, respectively. The proposed MBA needs less time to converge in all scenarios, i.e., it converges to the optimal solution faster than the BA, PSO and GWO algorithms. In other words, the proposed MBA achieves both the lowest steady-state SM values and the fastest convergence, which is a very important characteristic of any adaptive algorithm.

Figure 9 System misalignment criterion estimated on the adaptive filter w21(k) using white noise (left), USASI noise (middle) and car noise (right), with L = 128 in all simulations.

4.2. Segmental signal-to-noise ratio (SegSnr) criterion evaluation

Figure 10 compares the final values of the SegSNR criterion estimated on the denoised signals Out1(k) obtained by each algorithm. The simulation parameters of each algorithm are the same as those given in Table 1. The results indicate that the proposed MBA performs much better than the BA, PSO and GWO algorithms in all scenarios. Figure 11 also reports the temporal evolution of the SegSNR criterion at the first output using an adaptive filter of length L = 128. The experiments are conducted using white, USASI and car noise with input SNRs of -6 dB, 6 dB and 6 dB, respectively.

Figure 10 Comparison results of final values of SegSnr criteria 

Figure 11 SegSNR criterion estimated on the output signal Out1(k) using white noise (left), USASI noise (middle) and car noise (right), with L = 128 in all simulations.

The results in Figure 11 confirm the superiority of the proposed MBA over the BA, PSO and GWO algorithms in terms of convergence speed in the transient regime as well as steady-state performance in all experiments.

5. Conclusion

In this work, we have focused on dual-channel speech enhancement through adaptive filtering. We have suggested using metaheuristic algorithms to adapt the filter coefficients and have developed a new algorithm, namely the modified Bat algorithm (MBA). The proposed MBA is combined with the FBSS structure to reduce the acoustic noise components in noisy observations.

Experimental results indicate that the proposed algorithm outperforms conventional and state-of-the-art metaheuristic algorithms (PSO, BA and GWO) in terms of convergence rate, segmental signal-to-noise ratio and steady-state misalignment. These results suggest that the proposed algorithm could be an appealing solution for speech enhancement and acoustic noise reduction applications.

Acknowledgements

This work was supported by the following grants: Laboratoire d’Automatique et Informatique de Guelma (LAIG), Guelma, Algeria and Signal Processing and Image Laboratory (LATSI), Blida, Algeria.

References

Bendoumia, R., & Djendi, M. (2015). Two-channel variable-step-size forward-and-backward adaptive algorithms for acoustic noise reduction and speech enhancement. Signal Processing, 108, 226-244. https://doi.org/10.1016/j.sigpro.2014.08.035

Chang, C. Y., & Chen, D. R. (2010). Active noise cancellation without secondary path identification by using an adaptive genetic algorithm. IEEE Transactions on Instrumentation and Measurement, 59(9), 2315-2327. https://doi.org/10.1109/TIM.2009.2036410

Clerc, M. (2010). Particle Swarm Optimization. Particle Swarm Optimization, 1942-1948. https://doi.org/10.1002/9780470612163

Djendi, M. (2010). Advanced techniques for two-microphone noise reduction in mobile communications (Ph.D. dissertation). University of Rennes, France (in French). https://www.theses.fr/2010REN1S012

Djendi, M. (2018). A new efficient wavelet-based adaptive algorithm for automatic speech quality enhancement. In Proceedings of the Fourth International Conference on Engineering & MIS 2018 (pp. 1-6). https://doi.org/10.1145/3234698.3234752

Djendi, M., Gilloire, A., & Scalart, P. (2006). Noise cancellation using two closely spaced microphones: Experimental study with a specific model and two adaptive algorithms. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings, 3, III-III. https://doi.org/10.1109/icassp.2006.1660761

Fisli, S., Djendi, M., & Guessoum, A. (2019). A New Dual Modifted Bat Algorithm for Design of Adaptive Noise Canceller. 2019 International Conference on Advanced Electrical Engineering, ICAEE 2019, 1-6. https://doi.org/10.1109/ICAEE47123.2019.9014816

Loizou, P. C. (2013). Speech Enhancement: Theory and Practice. CRC Press, Inc. https://doi.org/10.1201/b14529

Mahbub, U., Shahnaz, C., & Fattah, S. A. (2010). An adaptive noise cancellation scheme using particle swarm optimization algorithm. In 2010 International Conference on Communication Control and Computing Technologies (pp. 683-686). IEEE. https://doi.org/10.1109/ICCCCT.2010.5670753


Okwu, M. O., & Tartibu, L. K. (2021). Grey Wolf Optimizer. Studies in Computational Intelligence, 927, 43-52. https://doi.org/10.1007/978-3-030-61111-8_5

Kunche, P., & Reddy, K. V. V. S. (2016). Metaheuristic applications to speech enhancement (pp. 7-16). Springer International Publishing. https://doi.org/10.1007/978-3-319-31683-3

Rogers, S. (1996). Adaptive filter theory. In Control Engineering Practice (Vol. 4, Issue 11). Pearson Education India. https://doi.org/10.1016/0967-0661(96)82838-3

Widrow, B., Glover, J. R., McCool, J. M., Kaunitz, J., Williams, C. S., Hearn, R. H., ... & Goodlin, R. C. (1975). Adaptive noise cancelling: Principles and applications. Proceedings of the IEEE, 63(12), 1692-1716. https://doi.org/10.1109/PROC.1975.10036

Yang, X. S. (2010). A new metaheuristic bat-inspired algorithm. In Nature inspired cooperative strategies for optimization (NICSO 2010) (pp. 65-74). Berlin, Heidelberg: Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-642-12538-6_6

Peer Review under the responsibility of Universidad Nacional Autónoma de México.

Funding

The authors received no specific funding for this work.

Received: February 28, 2022; Accepted: May 23, 2022; Published: December 31, 2023

*Corresponding author. E-mail address: s.fisli@yahoo.fr (Sofiane Fisli).

Conflict of interest

The authors declare that they have no conflict of interest.

This is an open-access article distributed under the terms of the Creative Commons Attribution License.