
Atmósfera

Print version ISSN 0187-6236

Atmósfera vol.29 n.2 Ciudad de México Apr. 2016

 

Articles

Application of several data-driven techniques to predict a standardized precipitation index

Bahram Choubin1  * 

Arash Malekian2 

Mohammad Golshan3 

1 Faculty of Natural Resources, Sari University of Agriculture Sciences and Natural Resources, Sari, Iran.

2 Faculty of Natural Resources, University of Tehran, Karaj, Iran.

3 Faculty of Natural Resources, Sari University of Agriculture Sciences and Natural Resources, Sari, Iran.


Abstract:

Climate modeling and prediction are important in water resources management, especially in arid and semi-arid regions that frequently suffer from water shortages. The Maharlu-Bakhtegan basin, with an area of 31 000 km², is an arid and semi-arid region located in southwestern Iran, where low precipitation and water shortage cause many problems. This study presents a drought index modeling approach based on large-scale climate indices, using the adaptive neuro-fuzzy inference system (ANFIS), the M5P model tree and the multilayer perceptron (MLP). First, the most effective climate signals were selected from 25 climate signals using factor analysis; subsequently, the standardized precipitation index (SPI) was predicted one to 12 months in advance with ANFIS, the M5P model tree and MLP. The evaluation of the models' performance by error parameters and Taylor diagrams demonstrated that the MLP performs better than the other models. The results also revealed that the accuracy of prediction increased considerably when climate indices of the previous month (t - 1) were used (RMSE = 0.802, ME = -0.002 and PBIAS = -0.47).

Keywords: Standardized precipitation index (SPI); climate signals; multi-layer perceptron (MLP); adaptive neuro-fuzzy inference system (ANFIS); M5P model tree; Taylor diagrams

Resumen:

La modelación y predicción del clima son importantes para la gestión de recursos hidráulicos, especialmente en regiones áridas y semiáridas que con frecuencia sufren escasez de agua. La cuenca de Maharlu-Bakhtegan es una región árida y semiárida de 31 000 km2 localizada al suroeste de Irán, de modo que la precipitación y escasez de agua en esta zona son muy problemáticas. Este estudio presenta una aproximación a la modelación del índice de sequía con base en índices climáticos de larga duración y el uso del sistema adaptativo de inferencia neurodifusa (ANFIS, por sus siglas en inglés), el árbol de decisión M5P y el modelo perceptrón multicapa (MLP, por sus siglas en inglés). Primero se determinó la mayoría de las señales climáticas a partir de 25 señales climáticas utilizando análisis factorial, y posteriormente se predijo un índice estandarizado de precipitación mediante las técnicas ANFIS, MLP y M5P con anticipación de uno a 12 meses. La evaluación de la aptitud del modelo mediante parámetros de error y diagramas de Taylor demostró que el desempeño del MLP es mejor que el de los otros dos modelos. Los resultados también mostraron que la exactitud de la predicción aumentó de manera considerable cuando se utilizaron índices climáticos del mes previo (t - 1) (RMSE = 0.802, ME = -0.002 y PBIAS = -0.47).

1. Introduction

Drought is a recurrent climate feature. This phenomenon, which affects more people than any other hazard, is considered by many to be the most complex but least understood of all natural hazards (Mishra and Desai, 2005). In Iran, arid climate covers an area of 573 884 km² (35.54% of the territory). The Maharlu-Bakhtegan basin is located in this area; therefore, low precipitation and water shortage in this region are very problematic. Meteorological drought occurs when precipitation over a given period falls below its long-term average. Many areas of Iran have arid and semi-arid climates and suffer drought events, so it is necessary to pay more attention to precipitation. Advance knowledge of the likely amount of precipitation is important for planning water resources and for managing agriculture and droughts. Previous studies show that large-scale climate modes (e.g., North Atlantic Oscillation [NAO], Southern Oscillation Index [SOI]) influence climate and precipitation in different parts of the world (Nazemosadat and Cordey, 2000; Karabörk et al., 2005; Gaughan and Waylen, 2012; Berg et al., 2013; Choubin et al., 2014b).

In this study, we used large-scale climate indices for predicting the standardized precipitation index (SPI). Among the several proposed drought monitoring indices, SPI is widely applied for describing and comparing droughts among different time periods and regions with different climatic conditions (Cancelliere et al., 2007). SPI prediction is a critical issue that has attracted much attention worldwide in recent decades for hydrological modeling in arid and semi-arid regions (Rezaeian-Zadeh et al., 2012). Today, non-linear models are increasingly applied to prediction. In previous studies, Dahamsheh and Aksoy (2009), Azadi and Sepaskhah (2012), and Rezaeian-Zadeh et al. (2012) used artificial neural networks (ANNs), and El-Shafie et al. (2011), Sanikhani and Kisi (2012), Jeong et al. (2012), and Choubin et al. (2014a) successfully applied the adaptive neuro-fuzzy inference system (ANFIS) to predict precipitation. In eastern Australia, Deo and Sahin (2015) investigated the application of the ANN model for the prediction of monthly SPIs using hydrometeorological parameters and climate indices. Their results showed that the ANN model is a useful data-driven tool for forecasting monthly SPIs.

In the Awash River Basin (Ethiopia), Belayneh et al. (2014) forecasted long-term SPI drought using wavelet neural networks. Their results indicated that the coupled wavelet neural network (WA-ANN) models were better than all the other models in that study for forecasting SPI 12 and SPI 24 values. Ruigar and Golian (2016) predicted precipitation in the Golestan dam watershed using climate indices; their results indicated that the MLP model is capable of accurately predicting monthly maximum precipitation.

In this study we compared the performances of three modeling techniques for predicting drought in a 43-yr period (1967-2009) in the Maharlu-Bakhtegan basin of Iran. We used the M5P model tree in addition to ANFIS and the multilayer perceptron (MLP) network to predict the SPI using large-scale climate indices as input data, over lead times of 1 to 12 months.
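The lead-time setup can be pictured as pairing the climate indices observed at month t - k with the SPI at month t. The sketch below illustrates that pairing only; the function name and the list-based data layout are hypothetical and are not taken from the study.

```python
def lagged_pairs(indices, spi, lead):
    """Pair climate indices at month t - lead with the SPI at month t.

    indices: list of per-month predictor vectors (hypothetical layout).
    spi:     list of monthly SPI values, same length as `indices`.
    lead:    lead time in months (1 to 12 in the study).
    """
    if lead < 1:
        raise ValueError("lead must be at least one month")
    if len(indices) != len(spi):
        raise ValueError("indices and spi must have the same length")
    # The first `lead` months have no predictor far enough in the past.
    return [(indices[t - lead], spi[t]) for t in range(lead, len(spi))]


# Usage with made-up numbers: six months, one predictor per month.
predictors = [[0.0], [0.1], [0.2], [0.3], [0.4], [0.5]]
targets = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
pairs = lagged_pairs(predictors, targets, lead=2)
```

Each extra month of lead time removes one usable pair from the start of the record, so longer lead times are trained on slightly fewer samples.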

2. Methodology

2.1 Study area

The Maharlu-Bakhtegan basin spreads over 31 000 km². The area, located in southwestern Iran (29° 00' to 31° 14' N, 51° 42' to 54° 31' E) and with an annual precipitation of 270 mm, is one of the most important agricultural centers of Iran (Fig. 1). Precipitation data were collected from the Iranian Water Resource Management Company (TAMAB) for four meteorological stations: Shiraz synoptic station, Dashtbal, Ali Abad Khatr and Dehkade Shahid. First, station data were analyzed and missing data were reconstructed using the correlation method; then the homogeneity and independence of the data were evaluated using the run-test method. Homogeneity and independence were accepted at a high level. We used Thiessen polygons between stations to calculate the average watershed precipitation.

Fig. 1 Study area. 

2.2 Standardized precipitation index

The SPI was formulated by McKee et al. (1993) at the Colorado Climate Center. It is a relatively new drought index based only on precipitation, which is very important to farmers and responds almost immediately to rainfall or dryness. The index is the number of standard deviations by which the observed precipitation deviates from the long-term climatological average. Either a gamma distribution or a Pearson type III distribution is fitted to the precipitation data and then transformed into a normal distribution (Guttman, 1999). The SPI can be calculated for any time scale: yearly, seasonal, monthly or over several months. In this study, a monthly SPI was obtained based on the average rainfall over the basin for a 43-yr period (1967-2009).
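The transformation described above can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes a two-parameter gamma distribution fitted with Thom's maximum-likelihood approximation, a mixed distribution to handle zero-rainfall months, and a single fit for the whole series (operational SPI calculations usually fit each calendar month separately). All function names and sample values are hypothetical.

```python
import math
from statistics import NormalDist


def fit_gamma_thom(values):
    """Thom's approximation to the gamma maximum-likelihood estimates.

    Assumes all values are positive and not all identical.
    """
    mean = sum(values) / len(values)
    a = math.log(mean) - sum(math.log(v) for v in values) / len(values)
    shape = (1.0 + math.sqrt(1.0 + 4.0 * a / 3.0)) / (4.0 * a)
    scale = mean / shape
    return shape, scale


def gamma_cdf(x, shape, scale, max_terms=500):
    """Regularized lower incomplete gamma function by series expansion."""
    if x <= 0:
        return 0.0
    z = x / scale
    term = 1.0 / shape
    total = term
    for k in range(1, max_terms):
        term *= z / (shape + k)
        total += term
        if term < total * 1e-15:
            break
    return total * math.exp(-z + shape * math.log(z) - math.lgamma(shape))


def spi(precipitation):
    """Transform precipitation totals into SPI values (one fit for all data)."""
    wet = [p for p in precipitation if p > 0]
    q = 1.0 - len(wet) / len(precipitation)  # probability of a zero total
    shape, scale = fit_gamma_thom(wet)
    normal = NormalDist()
    result = []
    for p in precipitation:
        h = q + (1.0 - q) * gamma_cdf(p, shape, scale)
        h = min(max(h, 1e-6), 1.0 - 1e-6)  # keep inv_cdf away from 0 and 1
        result.append(normal.inv_cdf(h))
    return result


# Made-up monthly totals in mm, including two dry months.
rain = [12.0, 30.5, 0.0, 45.2, 8.1, 60.3, 22.4, 15.0, 0.0, 38.7, 27.9, 5.6]
spi_values = spi(rain)
```

Negative values indicate drier-than-usual conditions and positive values wetter-than-usual conditions; the wettest month in the sample receives the largest SPI.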

2.3 Large-scale climate indices

Climate signals are oceanic and atmospheric patterns that affect the Earth's climate in different regions. In this study, 25 indices were obtained from the National Oceanic and Atmospheric Administration (NOAA) site (http://www.esrl.noaa.gov/psd/data/climateindices/list/). Factor analysis was then used to choose the most effective climate indices. This technique reduces the complexity of the input variables when there are large volumes of information, thus allowing a better interpretation of the variables.

2.4 MLP network

ANNs are simplified models of the human brain and consist of input, hidden and output layers (Gunaydin, 2009). MLP is the most common neural network model (Zurada, 1992; Hagan et al., 1996).

In this paper, we used the Levenberg-Marquardt (LM) training algorithm to obtain the weights of the MLP network. LM can be thought of as a combination of steepest descent and the Gauss-Newton method. The MLP network consists of an input layer of source neurons, at least one middle or hidden layer of computational neurons, and an output layer of computational neurons. The output of an artificial neuron can be expressed as follows:

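$$y = f\left(\sum_{i=1}^{n} w_i x_i\right) \tag{1}$$

where $n$ is the total number of inputs, $x_1, x_2, \ldots, x_n$ are the inputs, $w_1, w_2, \ldots, w_n$ are the corresponding weights for the inputs, and $f$ is the transfer function.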

In this study, the optimum number of hidden neurons and the transfer functions were determined by trial and error. Logsig and Purelin transfer functions were used in the hidden and output layers, respectively.
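A forward pass through such a network can be sketched as follows. This is an illustration only: the weights and biases are hypothetical placeholders (the trained values are not reported here), the bias terms are an assumption, and the function names `logsig` and `purelin` simply mirror the transfer-function names used in the text.

```python
import math


def logsig(v):
    """Log-sigmoid transfer function, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-v))


def purelin(v):
    """Linear (identity) transfer function."""
    return v


def neuron(inputs, weights, bias, transfer):
    """Weighted sum of the inputs plus a bias, passed through a transfer function."""
    return transfer(sum(w * x for w, x in zip(weights, inputs)) + bias)


def mlp_forward(inputs, hidden_layer, output_neuron):
    """One hidden layer (logsig) followed by a single linear output neuron.

    hidden_layer:  list of (weights, bias) tuples, one per hidden neuron.
    output_neuron: (weights, bias) tuple for the output neuron.
    """
    hidden = [neuron(inputs, w, b, logsig) for w, b in hidden_layer]
    out_weights, out_bias = output_neuron
    return neuron(hidden, out_weights, out_bias, purelin)


# Hypothetical network with three inputs, two hidden neurons and one output.
hidden_layer = [([0.4, -0.2, 0.1], 0.0), ([-0.3, 0.5, 0.2], 0.1)]
output_neuron = ([1.2, -0.7], 0.05)
prediction = mlp_forward([0.30, 0.55, 0.80], hidden_layer, output_neuron)
```

Because the output neuron is linear, the prediction is not confined to the (0, 1) range of the hidden layer, which suits a target such as the SPI that takes both negative and positive values.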

2.5 M5P model tree

Model trees were developed by Quinlan (1992). M5P is a tree-based model used for prediction. Instead of discrete class labels, it uses linear functions at the leaves. M5P is based on the assumption that the functional dependency is not constant over the whole domain, but can be approximated by simpler functions on smaller subdomains (Dimitri and Xue, 2005).
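The idea of linear functions at the leaves can be illustrated with a hand-built tree. The sketch below only shows how a fitted model tree produces a prediction; it does not implement the M5P induction, pruning or smoothing steps, and the split threshold and coefficients are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass
class Leaf:
    """Leaf holding a linear model: intercept + sum(coefficient * input)."""
    coefficients: Sequence[float]
    intercept: float

    def predict(self, x):
        return self.intercept + sum(c * v for c, v in zip(self.coefficients, x))


@dataclass
class Split:
    """Internal node that routes a sample by comparing one input to a threshold."""
    feature: int
    threshold: float
    left: "Node"
    right: "Node"

    def predict(self, x):
        branch = self.left if x[self.feature] <= self.threshold else self.right
        return branch.predict(x)


Node = Union[Leaf, Split]

# Hypothetical tree: one split on input 0, a different linear model on each side.
tree = Split(
    feature=0,
    threshold=0.0,
    left=Leaf(coefficients=[0.5, 0.0], intercept=-0.2),
    right=Leaf(coefficients=[0.1, 0.3], intercept=0.4),
)
low_side = tree.predict([-1.0, 2.0])
high_side = tree.predict([1.0, 2.0])
```

Each subdomain defined by the splits gets its own regression equation, so the overall model is piecewise linear.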

2.6 ANFIS model

ANFIS is a kind of neural network based on the Sugeno fuzzy inference system (Takagi and Sugeno, 1985), and was first introduced by Jang (1993). This system uses either back propagation or a combination of least squares estimation and back propagation for estimating the membership functions' parameters.
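The inference step that ANFIS tunes can be sketched as a first-order Sugeno system. The example is illustrative only: it assumes Gaussian input membership functions and linear rule consequents, and every rule parameter shown is hypothetical. The training of those parameters is not shown.

```python
import math


def gaussian_mf(x, center, sigma):
    """Gaussian membership degree of x, between 0 and 1."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def sugeno_predict(x, rules):
    """First-order Sugeno inference.

    rules: list of (centers, sigmas, coefficients, intercept) tuples, where
    centers and sigmas define one Gaussian membership function per input and
    coefficients and intercept define the linear consequent of the rule.
    """
    weighted_sum = 0.0
    total_strength = 0.0
    for centers, sigmas, coefficients, intercept in rules:
        # Firing strength: product of the membership degrees of all inputs.
        strength = 1.0
        for value, center, sigma in zip(x, centers, sigmas):
            strength *= gaussian_mf(value, center, sigma)
        consequent = intercept + sum(a * v for a, v in zip(coefficients, x))
        weighted_sum += strength * consequent
        total_strength += strength
    # Weighted average of the rule outputs; assumes at least one rule fires.
    return weighted_sum / total_strength


# Two hypothetical rules over a single input.
rules = [
    ([-1.0], [1.0], [0.0], 0.0),  # "input is low"  -> output 0
    ([1.0], [1.0], [0.0], 2.0),   # "input is high" -> output 2
]
midpoint_output = sugeno_predict([0.0], rules)
```

At the midpoint between the two rule centers both rules fire equally, so the output is the average of their consequents.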

Since the number of inputs in our study was greater than six, we could not use grid partitioning because the number of fuzzy rules would be too large (Farokhnia et al., 2011). Instead, a subtractive fuzzy clustering algorithm was used to establish rules based on the relationship between input and output variables (Jang and Sun, 1995). Subtractive fuzzy clustering was introduced by Chiu (1994). In this study, the hybrid optimization method, which combines least-squares estimation with back-propagation gradient descent, was used for optimization. Gaussian and linear membership functions were selected as optimum for the input and output data, respectively. The number of membership functions was determined through trial and error by varying the range of influence from 0.5 to 1.5.
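The growth of the rule base under grid partitioning is easy to quantify: with m membership functions on each of n inputs, every combination becomes a rule, giving m^n rules. The short calculation below uses the eight inputs selected in this study; the membership-function counts are illustrative, and the parameter count assumes first-order Sugeno consequents with n + 1 linear parameters per rule.

```python
def grid_partition_rules(n_inputs, mfs_per_input):
    """Number of fuzzy rules when every combination of membership functions is used."""
    return mfs_per_input ** n_inputs


def consequent_parameters(n_inputs, mfs_per_input):
    """Linear consequent parameters for first-order Sugeno rules (n + 1 per rule)."""
    return grid_partition_rules(n_inputs, mfs_per_input) * (n_inputs + 1)


for mfs in (2, 3, 4):
    print(mfs, grid_partition_rules(8, mfs), consequent_parameters(8, mfs))
# 2 256 2304
# 3 6561 59049
# 4 65536 589824
```

A 43-yr monthly record contains only 516 values, so even two membership functions per input would leave far more parameters than samples. Clustering-based rule generation keeps the rule count tied to the number of clusters instead.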

In the present paper, we used Matlab R2010b for simulating the ANFIS and MLP models, and the Weka package for the M5P model tree. The input data were divided into two parts: 85% for training and 15% for testing.
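An 85/15 partition can be sketched as below. Whether the split in this study was chronological or random is not stated; the sketch assumes a chronological split, which is a common choice for time series because it keeps the test period strictly after the training period. The function name is hypothetical.

```python
def chronological_split(samples, train_fraction=0.85):
    """Split an ordered sequence into leading training and trailing testing parts."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be between 0 and 1")
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


# A 43-yr monthly record has 43 * 12 = 516 values.
months = list(range(43 * 12))
train, test = chronological_split(months)
```

With 516 monthly values this gives 438 training and 78 testing samples; the counts shrink slightly once lagged inputs are formed.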

2.7 Data normalization and evaluation criteria

Climate data in a semi-arid region are sparse and irregular in distribution, and normalizing the data improves the robustness of the climate information. The best range for data normalization is 0.05-0.95 (Hsu et al., 1995), as follows:

$$x_{norm} = 0.05 + 0.9\,\frac{x_r - x_{min}}{x_{max} - x_{min}} \tag{2}$$

where $x_{norm}$ and $x_r$ are the normalized and the original inputs, and $x_{min}$ and $x_{max}$ are the minimum and maximum of the input range, respectively.
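A linear rescaling to the 0.05-0.95 range, together with its inverse for converting model outputs back to original units, can be sketched as follows. The function names are hypothetical, and the sketch assumes the input range is not degenerate (maximum greater than minimum).

```python
def normalize(values, low=0.05, high=0.95):
    """Linearly rescale values so the minimum maps to `low` and the maximum to `high`."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min  # assumed to be non-zero
    return [low + (high - low) * (x - x_min) / span for x in values]


def denormalize(scaled, x_min, x_max, low=0.05, high=0.95):
    """Invert `normalize`, given the minimum and maximum of the original data."""
    return [x_min + (s - low) * (x_max - x_min) / (high - low) for s in scaled]


raw = [-2.0, 0.0, 1.0, 2.0]
scaled = normalize(raw)
restored = denormalize(scaled, min(raw), max(raw))
```

Keeping the targets away from 0 and 1 avoids the saturated tails of a log-sigmoid transfer function. When a model is evaluated on held-out data, the minimum and maximum are normally taken from the training set only.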

Some of the common parameters, including root mean square error (RMSE), mean error (ME) and percent bias (PBIAS) were used to check the performance of the applied models. These indices are valuable because they disclose errors in the units (or squared units) of the constituent of interest, which aids in the analysis of results (Moriasi et al., 2007). The PBIAS measures the average tendency of the simulated data to be larger or smaller than their observed counterparts; the optimal value is 0.0. Positive values indicate a model bias toward underestimation, whereas negative values indicate a bias toward over-estimation (Gupta et al., 1999). These parameters were calculated as follows:

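$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(O_i - P_i\right)^2} \tag{3}$$

$$\mathrm{ME} = \frac{1}{N}\sum_{i=1}^{N}\left(O_i - P_i\right) \tag{4}$$

$$\mathrm{PBIAS} = \frac{\sum_{i=1}^{N}\left(O_i - P_i\right)}{\sum_{i=1}^{N} O_i} \times 100 \tag{5}$$

where $N$ is the number of data points considered, and $O_i$ and $P_i$ are the observed and predicted values, respectively.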
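The three criteria can be computed as in the sketch below. The sign convention is an assumption: errors are taken as observed minus predicted, which matches the statement that positive PBIAS indicates underestimation. The sample values are made up.

```python
import math


def rmse(observed, predicted):
    """Root mean square error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mean_error(observed, predicted):
    """Mean error, observed minus predicted (assumed sign convention)."""
    n = len(observed)
    return sum(o - p for o, p in zip(observed, predicted)) / n


def pbias(observed, predicted):
    """Percent bias; positive values indicate underestimation by the model."""
    return 100.0 * sum(o - p for o, p in zip(observed, predicted)) / sum(observed)


obs = [1.0, 2.0, 3.0, 4.0]
pred = [1.0, 2.0, 3.0, 2.0]
scores = (rmse(obs, pred), mean_error(obs, pred), pbias(obs, pred))
```

Note that PBIAS divides by the sum of the observations. For a standardized variable such as the SPI that sum can be close to zero, so small absolute errors may translate into large percentages.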

3. Results

The Kaiser-Meyer-Olkin (KMO) statistic was 0.69, so the input variables are suitable for factor analysis (Shrestha and Kazama, 2007). Eight components had eigenvalues greater than 1 and explained 81% of the total variance. Thus, after a Varimax rotation, eight climate signals (AMO, AMM, BEST, NINO3.4, NINO4, NTA, SOI, TNA) were selected as the most effective, with factor loadings of 0.904, 0.826, 0.952, 0.918, 0.855, 0.908, -0.849, and 0.927, respectively.

Table I shows the performance of the ANFIS, M5P and MLP models in predicting the SPI time series one to 12 months in advance. For the testing data, the best performance of ANFIS was found for the eight-month-in-advance prediction, with RMSE, ME and PBIAS values of 1.032, 0.011, and 3.55, respectively (Table I). For the M5P model tree, the minimum values of RMSE, ME and PBIAS correspond to the 10-month-in-advance prediction (RMSE = 0.828, ME = -0.007, and PBIAS = -2.12). MLP performed best for the one-month-in-advance prediction, where it also outperformed the other models (RMSE = 0.802, ME = -0.002, and PBIAS = -0.47) (Table I). PBIAS indicated that predictions are mostly overestimated (about 85, 54 and 70% for ANFIS, the M5P model tree and MLP, respectively).

Table I Performance of models in predicting SPI (from one to 12 months in advance). 

We used a Taylor (2001) diagram (Fig. 2) to evaluate the accuracy of ANFIS, MLP, and the M5P model tree. This diagram provides a visual framework for comparing different model results to a reference model or, more commonly, to observations. The Taylor diagram is constructed from the standard deviation (STD), the centered root mean square error (RMSE) and the correlation (COR) between the different models and the observations. STD, RMSE and COR were computed for ANFIS, the M5P model tree and MLP from one to 12 months in advance (Fig. 2). The position of each model in the plot shows how closely the simulated SPI pattern matches the observations. Figure 2 shows that the predictions of ANFIS and MLP agree with the observations, unlike those of the M5P model tree. Although the ANFIS and MLP predictions are quite similar, some MLP predictions are closer to the observations (e.g., the one-month-in-advance predictions). The standard deviations of the predictions indicated that none of the models was able to reproduce the fluctuations in the observed data. Figure 3 compares the observed and predicted SPI for the testing set with a one-month lag time.
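The three statistics summarized by a Taylor diagram can be computed as in the following sketch. It is illustrative only: population (divide-by-N) standard deviations are assumed, the function name is hypothetical, and the sample series are made up.

```python
import math


def taylor_statistics(observed, predicted):
    """Return (std_observed, std_predicted, correlation, centered_rmse)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    # Anomalies: deviations of each series from its own mean.
    anom_o = [o - mean_o for o in observed]
    anom_p = [p - mean_p for p in predicted]
    std_o = math.sqrt(sum(a * a for a in anom_o) / n)
    std_p = math.sqrt(sum(a * a for a in anom_p) / n)
    covariance = sum(a * b for a, b in zip(anom_o, anom_p)) / n
    correlation = covariance / (std_o * std_p)
    centered_rmse = math.sqrt(
        sum((b - a) ** 2 for a, b in zip(anom_o, anom_p)) / n
    )
    return std_o, std_p, correlation, centered_rmse


obs_spi = [0.5, -1.0, 1.5, 0.0, -0.5]
pred_spi = [0.3, -0.6, 0.9, 0.1, -0.2]
std_o, std_p, cor, crmse = taylor_statistics(obs_spi, pred_spi)
```

The three quantities are linked by the law of cosines, centered RMSE² = STD_o² + STD_p² - 2 · STD_o · STD_p · COR, which is what allows them to be drawn on a single two-dimensional diagram. A predicted standard deviation smaller than the observed one, as in this made-up sample, corresponds to a model that damps the observed fluctuations.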

Fig. 2 Comparisons of observed and predicted SPIs by MLP, ANFIS and the M5P model with a one-month delay. 

Fig. 3 Comparisons of observed and predicted SPIs for watershed in a one-month lag-time. 

4. Discussion

In this study we used climate indices to predict SPI. Factor analysis was used to determine the most important large-scale climate signals. Sea surface temperature (SST) in the Pacific Ocean and ENSO (including the BEST, NINO3.4, and NINO4 signals), the Southern Oscillation Index (SOI) and SST in the Atlantic Ocean (including the AMO, AMM, NTA, and TNA signals) were selected as the most important signals. In previous studies, Nazemosadat and Cordey (2000), Mariotti et al. (2002) and Pongracz and Bartholy (2006) showed the direct effect of ENSO on precipitation. Karabörk et al. (2005) indicated an inverse relationship between SOI and precipitation.

The ANFIS model performed best for the eight-month-in-advance predictions, the M5P model for the 10-month-in-advance predictions and the MLP network for the one-month-in-advance predictions. Error parameters (Table I) indicated that the MLP network performed slightly better than the other two models, whereas Dastorani et al. (2010) found that ANNs and ANFIS have almost the same potential for predicting dryland precipitation. The Levenberg-Marquardt training algorithm (used in the MLP network) is more powerful and faster than the standard back-propagation algorithm (used in ANFIS) (Abyaneh et al., 2011). This may be the reason for the better efficiency of the MLP model compared to ANFIS. We also used the Taylor diagram (Fig. 2) to investigate the accuracy of the models. It shows that MLP performs better than the M5P model tree and ANFIS.

5. Conclusion

Modeling is important in hydrology. This study investigated the prediction of SPI with several models based on large-scale climate indices. Results showed that MLP performed better than the M5P model tree and ANFIS (Table I). The best performance of the MLP model for SPI prediction was achieved with eight input neurons, two hidden neurons and one output neuron (MLP [8, 2, 1]) for a one-month-in-advance prediction. The Taylor diagram, a very useful tool for comparing the performance of different models, also indicated that MLP is more efficient than the M5P and ANFIS models. Many parameters exist for assessing the performance of individual models, but hydrologists also need a tool that can compare different models, and the Taylor diagram is helpful for this purpose. We hope modelers make further use of this tool in natural sciences and hydrological modeling.

References

Abyaneh H. Z., A. Moghadam Nia, M. B. Varkeshi, S. Marofi and O. Kisi, 2011. Performance evaluation of ANN and ANFIS models for estimating garlic crop evapotranspiration. J. Irrig. Drain. E.-ASCE 137, 280-286.

Azadi S. and A. R. Sepaskhah, 2012. Annual precipitation forecast for west, southwest, and south provinces of Iran using artificial neural networks. Theor. Appl. Climatol. 109, 175-189.

Belayneh A., J. Adamowski, B. Khalil and B. Ozga-Zielinski, 2014. Long-term SPI drought forecasting in the Awash River Basin in Ethiopia using wavelet neural network and wavelet support vector regression models. J. Hydrol. 508, 418-429.

Berg N., A. Hall, S. B. Capps and M. Hughes, 2013. El Niño-Southern Oscillation impacts on winter winds over Southern California. Clim. Dyn., doi:10.1007/s00382-012-1461-6.

Cancelliere A., G. D. Mauro, B. Bonaccorso and G. Rossi, 2007. Drought forecasting using the Standardized Precipitation Index. Water Resour. Manag. 21, 801-819.

Chiu S. L., 1994. Fuzzy model identification based on cluster estimation. J. Intell. Fuzzy Syst. 2, 267-278.

Choubin B., S. Khalighi-Sigaroodi, A. Malekian, S. Ahmad and P. Attarod, 2014a. Drought forecasting in a semi-arid watershed using climate signals: a neuro-fuzzy modeling approach. J. Mt. Sci. 11, 1593-1605, doi:10.1007/s11629-014-3020-6.

Choubin B., S. Khalighi-Sigaroodi, A. Malekian and Ö. Kişi, 2014b. Multiple linear regression, multi-layer perceptron network and adaptive neuro-fuzzy inference system for the prediction of precipitation based on large-scale climate signals. Hydrol. Sci. J., doi:10.1080/02626667.2014.966721.

Dahamsheh A. and H. Aksoy, 2009. Artificial neural network models for forecasting intermittent monthly precipitation in arid regions. Meteorol. Appl. 16, 325-337.

Dastorani M. T., A. Moghadamnia, J. Piri and M. Rico-Ramirez, 2010. Application of ANN and ANFIS models for reconstructing missing flow data. Environ. Monit. Assess. 166, 421-434.

Deo R. C. and M. Sahin, 2015. Application of the artificial neural network model for prediction of monthly standardized precipitation and evapotranspiration index using hydrometeorological parameters and climate indices in eastern Australia. Atmos. Res. 161, 65-81.

Dimitri S. P. and Y. Xue, 2005. M5 model trees and neural networks: Application to flood forecasting in the upper reach of the Hui River in China. J. Hydrol. Eng. 9, 491-501.

El-Shafie A., O. Jaafer and A. Seyed, 2011. Adaptive neuro-fuzzy inference system based model for rainfall forecasting in Klang River, Malaysia. Int. J. Phys. Sci. 6, 2875-2888.

Farokhnia A., S. Morid and H. R. Byun, 2011. Application of global SST and SLP data for drought forecasting on Tehran plain using data mining and ANFIS techniques. Theor. Appl. Climatol. 104, 71-81.

Gaughan A. E. and P. R. Waylen, 2012. Spatial and temporal precipitation variability in the Okavango-Kwando-Zambezi catchment, southern Africa. J. Arid Environ. 82, 19-30.

Gunaydin O., 2009. Estimation of soil compaction parameters by using statistical analyses and artificial neural networks. Environ. Geol. 57, 203-215.

Gupta H. V., S. Sorooshian and P. O. Yapo, 1999. Status of automatic calibration for hydrologic models: Comparison with multilevel expert calibration. J. Hydrol. Eng. 4, 135-143.

Guttman N. B., 1999. Accepting the standardized precipitation index: A calculation algorithm. J. Amer. Water Resour. Assoc. 35, 311-322.

Hagan M. T., H. B. Demuth and M. H. Beale, 1996. Neural network design. PWS Publishing Company, Boston, 802 pp.

Hsu K. L., H. V. Gupta and S. Sorooshian, 1995. Artificial neural network modeling of rainfall-runoff process. Water Resour. Res. 31, 2517-2530.

Jang J. S. R., 1993. ANFIS: adaptive-network-based fuzzy inference system. IEEE T. Syst. Man. Cyb. 23, 665-685.

Jang J. S. R. and C. T. Sun, 1995. Neuro-fuzzy modeling and control. Proc. IEEE 83, 378-406.

Jeong C. H., Ju-Y. Shin, T. Kim and J. H. Heo, 2012. Monthly precipitation forecasting with a neuro-fuzzy model. Water Resour. Manag. 26, 4467-4483.

Karabörk M. C., E. Kahya and M. Karaca, 2005. The influences of the Southern and North Atlantic Oscillations on climatic surface variables in Turkey. Hydrol. Process. 19, 1185-1211.

Mariotti A., N. Zeng and K. M. Lau, 2002. Euro-Mediterranean rainfall and ENSO-a seasonally varying relationship. S. Afr. J. Sci. 82, 196-198.

McKee T. B., N. J. Doesken and J. Kleist, 1993. The relationship of drought frequency and duration to time scales. Eighth Conference on Applied Climatology, Anaheim, California, January.

Mishra A. K. and V. R. Desai, 2005. Drought forecasting using stochastic models. Stoch. Env. Res. Risk A. 19, 326-339.

Moriasi D. N., J. G. Arnold, M. W. van Liew, R. L. Bingner, R. D. Harmel and T. L. Veith, 2007. Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. T. ASABE 50, 885-900.

Nazemosadat M. J. and I. Cordey, 2000. On the relationship between ENSO and autumn rainfall in Iran. Int. J. Climatol. 20, 47-61.

Pongracz R. and J. Bartholy, 2006. Regional effects of ENSO in Central/Eastern Europe. Adv. Geosci. 6, 133-137.

Quinlan J. R., 1992. Learning with continuous classes. Proceedings of the Australian Joint Conference on Artificial Intelligence. World Scientific, Singapore, pp. 343-348.

Rezaeian-Zadeh M., H. Tabari and H. Abghari, 2012. Prediction of monthly discharge volume by different artificial neural network algorithms in semi-arid regions. Arab J. Geosci., doi:10.1007/s12517-011-0517-y.

Ruigar H. and S. Golian, 2016. Prediction of precipitation in Golestan dam watershed using climate signals. Theor. Appl. Climatol. 123, 671-682.

Sanikhani H. and O. Kisi, 2012. River flow estimation and forecasting by using two different adaptive neuro-fuzzy approaches. Water Resour. Manag. 26, 1715-1729.

Shrestha S. and F. Kazama, 2007. Assessment of surface water quality using multivariate statistical techniques: a case study of the Fuji river basin, Japan. Environ. Model. Software 22, 464-475.

Takagi T. and M. Sugeno, 1985. Fuzzy identification of systems and its applications to modeling and control. IEEE T. Syst. Man. Cyb. 15, 116-132.

Taylor K. E., 2001. Summarizing multiple aspects of model performance in a single diagram. J. Geophys. Res. 106, 7183-7192 (also see the PCMDI Report 55, available at: http://www-pcmdi.llnl.gov/publications/ab55.html).

Zurada J. M., 1992. Introduction to artificial neural systems. West Publishing Company, Saint Paul, Minnesota, 400 pp.

Received: March 25, 2015; Accepted: February 22, 2016

* Corresponding author: Bahram Choubin; email: Bahram368@gmail.com

This is an open-access article distributed under the terms of the Creative Commons Attribution License.