Comparison of algorithms for the prediction of glucose levels in patients with diabetes

Olivares-Vera, Daniel Arturo; Gutiérrez-Hernández, David Asael; Escobar-Acevedo, Marco Antonio; Lara-Rendón, Claudia; Velázquez-Vázquez, Dulce A.

doi:10.21640/ns.v13i26.2752

Nova scientia

On-line version ISSN 2007-0705

Nova scientia vol.13 no.26 León May 2021 Epub Aug 30, 2021

https://doi.org/10.21640/ns.v13i26.2752

Natural Sciences and Engineering

Comparison of algorithms for the prediction of glucose levels in patients with diabetes

Comparación de algoritmos para la predicción de niveles de glucosa en pacientes con diabetes

Daniel Arturo Olivares-Vera^*^**

David Asael Gutiérrez-Hernández^*^**

Marco Antonio Escobar-Acevedo^*

Claudia Lara-Rendón^**

Dulce A. Velázquez-Vázquez^***

^* Universidad De La Salle Bajío, León, Guanajuato, México

^** Tecnológico Nacional de México / Instituto Tecnológico de León, León, Guanajuato, México

^*** Universidad IEXPRO, Tuxtla Gutiérrez, Chiapas, México

Abstract:

This work presents a comparison between two algorithms for the prediction of glucose levels in diabetic patients using univariate time series. The algorithms are applied to the history of fasting glucose levels to predict the following five values. The comparison is performed between 1) autoregressive neural networks (ARNN) and 2) autoregressive integrated moving average (ARIMA) models. A total of 70 series are analyzed, and we show that the predictions of the ARIMA model have percentage errors, relative to the expected values, higher than 25%. In contrast, for the autoregressive neural networks the percentage error was less than 25% in 73% of the cases.

Keywords: autoregressive neural network; ARIMA; univariate time series; prediction; glucose; diabetic; neural networks; diabetes; algorithms; analysis models

Resumen:

Este trabajo presenta una comparación entre dos algoritmos para la predicción de los niveles de glucosa en pacientes diabéticos mediante el uso de series de tiempo univariadas. Los algoritmos se aplican al historial de niveles de glucosa en ayunas para predecir los 5 valores posteriores. La comparación se realiza entre 1) Las Redes Neuronales Autorregresivas (ARNN) y 2) Los modelos de media móvil integrada autorregresiva (ARIMA), se analizan un total de 70 series, y se muestra que los resultados obtenidos para el modelo ARIMA tienen porcentajes de error mayores del 25% del valor predicho con respecto al valor esperado, mientras que para las Redes Neuronales Autorregresivas en el 73% de los casos el porcentaje de error fue menor al 25%.

Palabras clave: red neuronal autorregresiva; ARIMA; series de tiempo univariadas; predicción; glucosa; diabético; redes neuronales; diabetes; algoritmos; modelos de análisis

Introduction

Diabetes mellitus is a chronic degenerative disease characterized by high blood glucose levels. It occurs when the pancreas stops producing insulin, when it does not produce enough of it, or when the organism cannot use insulin properly. Lack of insulin produces high glucose levels in the blood. This phenomenon is known as hyperglycemia and, in the long term, can severely damage many of the body's systems, e.g., the cardiovascular and nervous systems (^{Wilmot et al., 2012}). Consequently, a group of diseases such as cardiovascular diseases, neuropathy, nephropathy, retinopathy, and blindness might follow a diabetes diagnosis. By controlling blood glucose levels, some of these diseases might be prevented or delayed (^{Harris et al., 1987}).

Diabetes is diagnosed by testing blood glucose levels (^{World Health Organization [WHO], 2016}). The diagnosis is made if one or more of the following criteria are satisfied: 1) the fasting blood glucose level is greater than or equal to 126 mg/dl; 2) the blood glucose level remains elevated two hours after ingesting 75 g of glucose; 3) a blood glucose level taken at random is greater than 200 mg/dl.

Many diabetes patients periodically monitor their glucose levels and use insulin shots to compensate for the insufficient insulin production of the pancreas. These patients might benefit from tools that help them decide when to apply insulin (^{Amaris et al., 2017}). A predictive algorithm might be beneficial in these cases: if the historical glucose levels follow a pattern, then their future values might be anticipated. For example, in (^{Zhao et al., 2012}), glucose levels are predicted from continuous monitoring data using autoregressive models with exogenous inputs that establish the future glucose levels as a linear combination of current and recent glucose levels. In that reference, a latent-variable-based technique is used to develop an empirical model for predicting the patient's glucose levels.

Glucose levels are known for their instability and nonlinearity. For example, ^{Frandes et al. (2017)} modeled the glucose dynamics using nonlinear chaotic properties by monitoring the glucose levels of patients under free-living conditions; autoregressive models were applied to predict glucose levels at 30- and 60-minute time intervals. The logistic smooth transition autoregressive model obtained high precision for patients with high glucose variability.

^{Panella (2011)} demonstrated that neural networks are useful for approximating a function from its inputs using previous data in the time series. Gaussian neural networks can be used efficiently to predict the temporal evolution of type 2 diabetes by considering the chaotic nature of biological time series.

^{Ståhl and Johansson (2009)} showed how to estimate quantitative predictive models to design optimal insulin levels for the patients. Three aspects were considered: 1) insulin, 2) glucose, 3) insulin-glucose interaction, and different black-box and gray-box models were developed and analyzed. The models' short-term predictors for the glucose levels were designed to achieve prediction within two hours.

Neural networks (NN), multi-rate regression, and autoregressive integrated moving average (ARIMA) models are the most widely used models for studying the evolution of a series and making predictions. In ^{Velásquez et al. (2008)}, nonlinear models are used to predict the monthly electricity demand: the multilayer perceptron, the autoregressive neural network (ARNN), and the ARIMA model were compared for predicting the monthly electricity demand in Colombia using only the demand's historical data. The ARNN showed the lowest percentage of error, while in (^{Amaris et al., 2017}) an ARIMA model was applied to the analysis of annual volume series of the Magdalena River.

^{Tang et al. (1991)} compared three time series with different characteristics and concluded that for time series with long memory both ARIMA and NN performed similarly, while for short memory the NN appeared to be superior. In contrast, for the prediction of solar radiation, ^{Reikard (2009)} concluded that ARIMA was superior. Another study (^{Adamowski et al., 2012}) compared several linear and nonlinear regression, ARIMA, NN, and wavelet NN models for urban water demand forecasting, concluding that the wavelet NN was superior.

In this work, the fasting glucose level is analyzed to predict the following five values, comparing the ARNN and ARIMA models. The ARNN takes advantage of autoregressive (AR) models and the multilayer perceptron (MLP) to capture the complex dynamics of glucose levels. The ARIMA models are composed of three elements: autoregressive models (AR), an integrator (I), and moving averages (MA), which are useful for fitting longitudinal data.

Several experiments with ARNN were performed using three different configurations, obtained by modifying the number of neurons. The results show that the ARNN compared favorably against the ARIMA model. With the two-layer, ten-neuron ARNN, 73% of the signals obtained error percentages below 25%.

Method

The data used in this work were obtained from the Diabetes-Data database, which contains the records of 70 patients, including dates, glucose monitoring times, and insulin dosages, along with food intake and exercise performed (^{Michael, 2017}).

The ARIMA and ARNN models describe one or more variables over time. These models have been applied to predicting currency exchange rates, rainfall levels, and energy consumption. Artificial neural networks emulate the way the brain processes information and can approximate any function (^{Velásquez et al., 2008}). The ARNN combines an autoregressive linear model (AR) and a multilayer perceptron (MLP) that contains a hidden layer, which allows it to use the advantages of both to capture complex dynamics (^{Velásquez et al., 2008}; ^{Velásquez et al., 2009}). The architecture of an ARNN is shown in Fig. 1.
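As a rough sketch of how a fasting glucose series could be extracted from one patient file, the snippet below assumes the layout documented for the UCI Diabetes Data Set: tab-separated records of date, time, event code, and value, with code 58 marking a pre-breakfast blood glucose measurement. The function name `fasting_series` and the sample records are illustrative and are not taken from the paper.

```python
# Assumption: code 58 identifies pre-breakfast (fasting) glucose readings
# in the UCI Diabetes Data Set record layout (date, time, code, value).
PRE_BREAKFAST_CODE = "58"


def fasting_series(lines):
    """Return the fasting glucose values (mg/dl) found in an iterable of records."""
    values = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 4:
            continue  # skip malformed records
        _date, _time, code, value = fields
        if code != PRE_BREAKFAST_CODE:
            continue  # keep only pre-breakfast measurements
        try:
            values.append(float(value))
        except ValueError:
            continue  # skip non-numeric values
    return values


# Illustrative records (made up for this example).
sample = [
    "04-21-1991\t9:09\t58\t100",
    "04-21-1991\t9:09\t33\t9",
    "04-22-1991\t7:35\t58\t216",
    "04-23-1991\t7:25\t58\t0Hi",
]
print(fasting_series(sample))
```

With a real file, the same function could be called on an open file handle, since iterating a text file yields one record per line.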

Fig. 1 Autoregressive Neural Network Architecture (ARNN).

The ARNN model has a dependent variable f that is obtained by applying a nonlinear function to the N previous values, $X_{t-n}$ for n = 1,…,N:

$$ f = \eta + \sum_{n=1}^{N} \varphi_n X_{t-n} + \sum_{h=1}^{H} \beta_h \, G\left( \omega_h + \sum_{n=1}^{N} \alpha_{n,h} X_{t-n} \right) \qquad (1) $$

Where:

$N$: total number of previous values

$H$: number of neurons in the hidden layer

$\eta, \varphi_n, \beta_h, \omega_h, \alpha_{n,h}$: weight values

$X_{t-n}$: input values

$G$ is the adaptive sigmoid function, defined as:

$$ G(u) = \left[ \frac{1}{1 + \exp(-u)} \right]^{M} \qquad (2) $$

The model parameters are $\eta, \varphi_n, \beta_h, \omega_h, \alpha_{n,h}$ and $M$, for n = 1,…,N and h = 1,…,H, which are estimated by minimizing the regularization error: λE_* where λ is a user-defined parameter (Breu et al., 2011).
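To make the structure of equations (1) and (2) concrete, the following minimal sketch evaluates the ARNN output for a single window of previous values. It assumes that the adaptive parameter M acts as an exponent on the standard logistic function (with M = 1 it reduces to the ordinary sigmoid); the function names and the toy weights are hypothetical, and no training procedure is shown.

```python
import math


def adaptive_sigmoid(u, M=1.0):
    """Logistic function raised to the power M (assumed reading of Eq. 2)."""
    # Numerically stable logistic: avoids overflow for large negative u.
    if u >= 0:
        base = 1.0 / (1.0 + math.exp(-u))
    else:
        e = math.exp(u)
        base = e / (1.0 + e)
    return base ** M


def arnn_forward(x_lags, eta, phi, beta, omega, alpha, M=1.0):
    """Evaluate Eq. (1) for one window of N previous values.

    x_lags : list of N inputs  X_{t-1}, ..., X_{t-N}
    phi    : list of N linear autoregressive weights
    beta   : list of H output weights of the hidden neurons
    omega  : list of H hidden-neuron biases
    alpha  : N x H nested list of input-to-hidden weights
    """
    # Linear autoregressive part: eta + sum_n phi_n * X_{t-n}
    linear_part = eta + sum(p * x for p, x in zip(phi, x_lags))
    # Nonlinear part: sum_h beta_h * G(omega_h + sum_n alpha_{n,h} * X_{t-n})
    nonlinear_part = 0.0
    for h in range(len(beta)):
        u = omega[h] + sum(alpha[n][h] * x_lags[n] for n in range(len(x_lags)))
        nonlinear_part += beta[h] * adaptive_sigmoid(u, M)
    return linear_part + nonlinear_part


# Toy example with N = 2 lags and H = 2 hidden neurons (made-up weights).
f = arnn_forward(
    x_lags=[1.0, 2.0], eta=1.0, phi=[0.5, 0.25],
    beta=[1.0, 1.0], omega=[0.0, 0.0], alpha=[[0.0, 0.0], [0.0, 0.0]],
)
print(f)  # linear part 2.0 plus two neurons at G(0) = 0.5 each
```

Setting every entry of `beta` to zero leaves only the linear AR model, which shows how the ARNN extends the autoregressive model with a hidden layer.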

Box and Jenkins developed statistical models for time series (^{Box et al., 1994}), in which each observation is modeled as a function of previous values (^{Amaris et al., 2017}; ^{Velásquez et al., 2008}; Breu et al., 2011; ^{Casdagli, 1989}; ^{Broz and Viego, 2014}). These models are known as ARIMA and are composed of the following parts, which together are used to fit longitudinal data: 1) autoregressive (AR), 2) integrated (I), and 3) moving average (MA).

The ARIMA models predict the future values of a time series based on its historical behavior, without considering the underlying factors responsible for the variations of the dependent variable (^{Broz and Viego, 2014}). The ARIMA workflow is shown in Fig. 2: the process starts by identifying the candidate model for the series to evaluate, followed by an estimation stage, which refers to selecting the appropriate data. Next, a validation stage takes place, and the process ends with the prediction of future values.

Fig. 2 ARIMA Model.

The values of p, d, and q must be assigned appropriately to model the behavior of the time series, and a reduced set of candidate models is then selected to fit the series. An ARIMA model is specified by three values (p, d, q): p is the order of the autoregressive component (AR), d is the order of the integrated component (I), and q is the order of the moving average component (MA).
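The role of d can be illustrated with a short sketch: the integrated component corresponds to differencing the series d times before the AR and MA terms are applied, and the forecasts are later mapped back to the original scale. The helper names and the glucose values below are hypothetical and serve only as an illustration.

```python
def difference(series, d=1):
    """Apply first-order differencing d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def undifference_once(last_value, diffs):
    """Rebuild levels from forecast differences, starting at the last observed level."""
    restored, level = [], last_value
    for delta in diffs:
        level += delta
        restored.append(level)
    return restored


glucose = [100, 110, 125, 120]           # made-up fasting values (mg/dl)
print(difference(glucose, d=1))          # [10, 15, -5]
print(undifference_once(120, [5, -3]))   # [125, 122]
```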

ARIMA models can be expressed as:

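$$ Y_t = \varphi_1 Y_{t-1} + \cdots + \varphi_p Y_{t-p} + \varepsilon_t - \theta_1 \varepsilon_{t-1} - \cdots - \theta_q \varepsilon_{t-q} \qquad (3) $$

Where:

$\varphi$: autoregressive coefficient

$\theta$: moving average coefficient

$\varepsilon$: error

$Y_{t-1}$: normalized series value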
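Equation (3) can be turned into a forecasting recursion once the coefficients are known. The sketch below is a simplified illustration, not the estimation procedure used in the paper: it assumes the coefficients are given, that errors before the start of the series are zero, and that future errors take their expected value of zero. The function name `arma_forecast` is hypothetical.

```python
def arma_forecast(y, phi, theta, steps=5):
    """Forecast `steps` values with Eq. (3); y is the (differenced, normalized) series."""
    p, q = len(phi), len(theta)

    def predict(history, errors):
        t = len(history)
        ar = sum(phi[i] * history[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma = sum(theta[j] * errors[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        return ar - ma  # Eq. (3) without the unknown current error

    # In-sample residuals: eps_t = Y_t minus the one-step prediction.
    history, errors = [], []
    for value in y:
        errors.append(value - predict(history, errors))
        history.append(value)

    # Out-of-sample recursion: feed forecasts back, future errors set to zero.
    forecasts = []
    for _ in range(steps):
        y_hat = predict(history, errors)
        forecasts.append(y_hat)
        history.append(y_hat)
        errors.append(0.0)
    return forecasts


# Pure AR(1) toy example: each forecast is half of the previous value.
print(arma_forecast([8.0], phi=[0.5], theta=[], steps=5))
# [4.0, 2.0, 1.0, 0.5, 0.25]
```

In practice the coefficients would be estimated from the training data by a statistical package; this sketch only shows how the AR and MA terms combine once they are available.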

Neural networks (NN) have been used for time series prediction. A common error is failing to realize that there is no methodology accepted by the scientific community, only a set of guidelines and critical steps adapted from general heuristics, the researcher's ability, and previous knowledge of the analyzed series (^{Velásquez et al., 2008}; Zhang et al., 1998).

Results

A series of tests were performed based on the literature review. The models were applied to the 70 subjects in the available database in order to compare their performance. Each series has N glucose level samples; 70% of the data was used for training, and 30% for the prediction validation. Each one of these series has a different behavior since each of the individuals has a different lifestyle. In Fig. 3, three different signals are shown. The signals shown in Fig. 4 show glucose levels above 120 mg/dl.
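The 70/30 partition and the construction of autoregressive inputs can be sketched as follows. The paper does not state how the split was made, so this example assumes a chronological split (the earliest 70% of the samples for training) and a sliding window of lagged values; the function names are hypothetical.

```python
def chronological_split(series, train_percent=70):
    """Split a series into an early training part and a late validation part."""
    cut = len(series) * train_percent // 100  # integer arithmetic avoids rounding issues
    return series[:cut], series[cut:]


def make_windows(series, n_lags):
    """Build (previous n_lags values, next value) pairs for a one-step model."""
    return [(series[t - n_lags:t], series[t]) for t in range(n_lags, len(series))]


values = list(range(1, 11))              # stand-in for N glucose samples
train, test = chronological_split(values)
print(len(train), len(test))             # 7 3
print(make_windows([1, 2, 3, 4, 5], 2))  # [([1, 2], 3), ([2, 3], 4), ([3, 4], 5)]
```

Keeping the split chronological prevents the model from seeing values that occur after the ones it is asked to predict.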

Fig. 3 Signals for patients 01, 02, 68.

Fig. 4 Subject 29 ARIMA prediction.

Both the ARIMA and ARNN models were applied to every element of the database. For the ARIMA model, the signals were analyzed in weekly cycles, which showed the best results. The quantity of data available in each series is reduced by the number of cycles needed to find, train, and approximate the expected values.

Fig. 4 and Fig. 5 show the signals from subjects 29 and 56. A zoomed view of the region of interest is also shown, along with the values predicted using ARIMA. In those plots, it can be observed that the expected and predicted values are close to each other.

Fig. 5 Subject 56 ARIMA prediction.

The ARNN was applied to each of the available time series using three different configurations. Fig. 6 and 7 show the predicted values for each of the configurations used by the ARNN: the five-neuron configuration is plotted in red, the ten-neuron configuration in green, and the fifteen-neuron configuration in blue.

Fig. 6 Subject 01 ARNN prediction.

Fig. 7 Subject 56 ARNN prediction.
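The text does not specify how the five successive values are produced from a model that maps previous values to the next one. One common approach, assumed here purely for illustration, is recursive forecasting, in which each prediction is fed back as an input for the next step. The names `recursive_forecast` and `one_step_model` are hypothetical; the stand-in model below simply averages its inputs.

```python
def recursive_forecast(one_step_model, history, n_lags, steps=5):
    """Predict `steps` future values by feeding each prediction back as an input."""
    window = list(history[-n_lags:])
    forecasts = []
    for _ in range(steps):
        y_hat = one_step_model(window)
        forecasts.append(y_hat)
        window = window[1:] + [y_hat]  # drop the oldest value, append the newest
    return forecasts


def mean_model(window):
    """Stand-in one-step model: the average of the lagged values."""
    return sum(window) / len(window)


print(recursive_forecast(mean_model, [1.0, 2.0, 3.0], n_lags=2, steps=3))
# [2.5, 2.75, 2.625]
```

A trained ARNN or ARIMA predictor could be passed in place of `mean_model`, provided it accepts a list of the most recent values.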

The results obtained with the two prediction models were then evaluated. The absolute error (AE), the mean squared error (MSE), and the root mean squared error (RMSE) were used as metrics. The results for the prediction of the five subsequent glucose values are presented in this section. Table 1 shows the average error values, by prediction, for each of the tests performed.

Table 1 Percentage of average error (%).

                      Prediction number
Metric  Model         1st        2nd        3rd        4th        5th        Total
MAE     ARIMA         99.79      80.85      90.09      75.61      93.11      87.89
MAE     ARNN (5)      64.60      57.15      68.16      63.26      65.27      63.69
MAE     ARNN (10)     28.07      33.18      32.44      24.90      34.68      30.65
MAE     ARNN (15)     70.58      60.90      54.58      67.58      83.46      67.42
MSE     ARIMA         14696.77   12006.81   12959.12   10385.84   13557.66   12721.24
MSE     ARNN (5)      7353.49    6757.86    9324.24    8523.18    7449.15    7881.58
MSE     ARNN (10)     1831.04    2510.09    2678.64    1320.53    2773.57    2222.77
MSE     ARNN (15)     9945.59    9462.17    6018.96    14076.87   21935.19   12287.76
RMSE    ARIMA         121.23     109.57     113.83     101.91     116.43     12721.24
RMSE    ARNN (5)      85.75      82.20      96.56      92.32      86.30      88.77
RMSE    ARNN (10)     42.79      50.101     51.756     36.33      52.66      47.14
RMSE    ARNN (15)     99.72      97.274     77.582     118.64     148.10     12287.76
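The error metrics reported in Table 1 can be computed with a few lines of code. The sketch below uses the standard definitions of MAE, MSE, and RMSE; the percentage error relative to the expected value is an assumption, since the text does not give its exact formula. The function names and the sample values are illustrative.

```python
import math


def mean_absolute_error(expected, predicted):
    return sum(abs(e - p) for e, p in zip(expected, predicted)) / len(expected)


def mean_squared_error(expected, predicted):
    return sum((e - p) ** 2 for e, p in zip(expected, predicted)) / len(expected)


def root_mean_squared_error(expected, predicted):
    return math.sqrt(mean_squared_error(expected, predicted))


def mean_percentage_error(expected, predicted):
    """Mean of |expected - predicted| / |expected|, in percent (assumed definition)."""
    return sum(100.0 * abs(e - p) / abs(e) for e, p in zip(expected, predicted)) / len(expected)


expected = [100.0, 200.0]    # made-up glucose values (mg/dl)
predicted = [110.0, 180.0]
print(mean_absolute_error(expected, predicted))    # 15.0
print(mean_squared_error(expected, predicted))     # 250.0
print(mean_percentage_error(expected, predicted))  # 10.0

# RMSE is the square root of MSE; for the first ARIMA prediction in Table 1:
print(round(math.sqrt(14696.77), 2))               # 121.23
```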

After evaluating the four proposed models on the 70 patients, the mean absolute error (MAE), the mean squared error (MSE), and the root mean squared error (RMSE) were calculated for each prediction and each model. With the ten-neuron ARNN, 73% of the 70 subjects obtained MAE percentages lower than 25%. The remaining 27% of the subjects obtained errors between 39 and 156; even so, this was the most accurate of the models. Since glucose levels are known for their instability and nonlinearity, most of the literature on the subject tries to predict glucose in the short term (^{Ståhl, 2009}), using time series sampled at intervals from 5 to 120 minutes, see for example Table 1 in (^{Hameed, 2020}), or in other cases using continuous information (^{Pérez-Gandía, 2010}). The data available to us are sampled approximately every 24 hours; however, this is the kind of data available to DM patients, since they typically measure their glucose before breakfast.

The results obtained with the ARIMA model were not close enough to the sampled glucose values. The predicted values were high, in particular when compared with the values obtained by the ARNN.

A linear regression is applied between the expected and predicted values; a line at a 45-degree angle represents high precision in the predictions. The scatterplots show the positive linear correlation between the sampled glucose levels and each model's predictions. In Fig. 8, the ten-neuron ARNN model is the one that comes closest to a 45-degree straight line. It can also be observed that its data dispersion is smaller than in the other models; thus, it is the best model in our evaluation. It is also possible to infer from our data that the ARIMA model is not appropriate for predicting glucose levels, or at least not when using univariate time series.
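A figure such as "73% of the subjects below 25%" is a share of per-subject errors under a threshold. The following minimal sketch shows that calculation; the function name and the four error values are made up and do not reproduce the study's data.

```python
def share_below(errors, threshold=25.0):
    """Percentage of subjects whose error is strictly below the threshold."""
    return 100.0 * sum(1 for e in errors if e < threshold) / len(errors)


per_subject_error = [10.0, 30.0, 20.0, 50.0]  # hypothetical percentage errors
print(share_below(per_subject_error))          # 50.0
```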

Fig. 8 Glucose sampled value and a) ARIMA, b) ARNN with 5 neurons, c) ARNN with 10 neurons, d) ARNN with 15 neurons.

To compare the performance of the models, a linear regression analysis was performed for each model and each of the five predicted values. The scatterplots and the linear fits for the first, second, third, fourth, and fifth predictions can be observed in Fig. 9, 10, 11, 12, and 13, respectively.

Fig. 9 Linear regression for the first prediction.

Fig. 10 Linear regression for the second prediction.

Fig. 11 Linear regression for the third prediction.

Fig. 12 Linear regression for the fourth prediction.

Fig. 13 Linear regression for the fifth prediction.

The R-squared (coefficient of determination) measures how well a model predicts the sampled data; in other words, it measures the relation between the predicting and target variables. R-squared takes values between 0 and 1: a value close to 0 means that the regression does not explain the variance in the response, whereas a value close to 1 means that it explains the variance in the observed output well. Table 2 lists the R-squared values obtained for each prediction.

Table 2 Coefficient of determination (R-squared).

             Prediction number
Model        1st        2nd        3rd        4th        5th
ARIMA        0.03489    0.004939   0.0156     0.0128     0.006496
ARNN (5)     0.272      0.1769     0.09595    0.03649    0.1266
ARNN (10)    0.8007     0.658      0.6876     0.7695     0.6613
ARNN (15)    0.1545     0.2783     0.3203     0.002408   0.001855
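The coefficient of determination of a linear fit between expected and predicted values can be computed as sketched below, using ordinary least squares and the definition R² = 1 − SS_res / SS_tot. The function names and the sample points are illustrative; a slope close to 1 with an intercept close to 0 corresponds to the 45-degree line mentioned in the text.

```python
def linear_fit(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y):
    """Coefficient of determination of the least-squares line through (x, y)."""
    slope, intercept = linear_fit(x, y)
    mean_y = sum(y) / len(y)
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


expected = [1.0, 2.0, 3.0, 4.0]   # made-up values
predicted = [1.0, 3.0, 2.0, 4.0]
print(linear_fit(expected, predicted))
print(r_squared(expected, predicted))
```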

In Fig. 9, it can be observed that the first prediction of the ARIMA model underperforms, whereas the ten-neuron ARNN model comes closer to the expected value. This is evident when comparing their respective coefficients of determination: the first prediction of the ARIMA model has a value of 0.03489, which is close to 0, while the ten-neuron ARNN has a value of 0.8007, which approaches 1.

In Fig. 10 it can be observed that the models follow the same trend. The R-squared for the second prediction is 0.004939 for the ARIMA model and 0.658 for the ten-neuron ARNN.

Fig. 11, 12 and 13 show the scatterplots of the third, fourth and fifth predictions. The R-squared values are 0.0156, 0.0128, and 0.006496, respectively, for the ARIMA model, and 0.6876, 0.7695 and 0.6613 for the ten-neuron ARNN.

Based on the results obtained from the linear regressions and the R-squared values, the ARIMA model is not adequate for predicting glucose levels, since the values for both tests were close to 0, indicating that there is no reasonable relation between the predicted and target values. For the ARNN model, the results obtained with the regressions are very favorable. This is confirmed by the R-squared values, which in the five predictions are the closest to 1, indicating a good linear relationship between both variables. It should be noted that the first and fourth predictions of the ten-neuron ARNN model are those that are closest to the expected values.

In the ARNN model with fifteen neurons, predictions four and five are not reliable, since their R-squared values are very close to 0. When deciding whether to use this model to predict glucose levels, it is crucial to consider that its predictions would be adequate for only three values ahead. The best model for predicting glucose levels, however, is the ARNN model with ten neurons: its average absolute error is the lowest both for each prediction and overall. In terms of R-squared, it is also the model that finds the best relationship between the prediction and the target variable.

Conclusion

The performance of the ARIMA and ARNN models for the prediction of glucose levels was analyzed. The results show that the ARNN can predict up to five glucose values; in 73% of the cases, the error was below 25%. With the ARIMA model, on the other hand, only 6% of the cases had an error below 25%. It is important to mention that a prediction will never be completely accurate, since many variables related to each patient's behavior are not considered and cannot be controlled. Despite that, we have established that the ARNN is a viable option for glucose prediction, based on the relative and absolute errors for each prediction and as a whole. The ARNN was also the model that obtained the best R-squared between the predicted and sampled values. As future work, we would like to include categorical data in our database to classify the patients according to meal consumption, physical activity, insulin dosage, and sampling time.

References

Adamowski, J., Fung Chan, H., Prasher, S. O., Ozga-Zielinski, B., & Sliusarieva, A. (2012). Comparison of multiple linear and nonlinear regression, autoregressive integrated moving average, artificial neural network, and wavelet artificial neural network methods for urban water demand forecasting in Montreal, Canada. Water Resources Research, 48(1), 1-14. DOI: https://doi.org/10.1029/2010WR009945

Amaris, G., Ávila, H., & Guerrero, T. (2017). Aplicación de modelo ARIMA para el análisis de series de volúmenes anuales en el río Magdalena. Tecnura, 21(2017), 88-101. DOI: https://doi.org/10.14483/udistrital.jour.tecnura.2017.2.a07

Box, G. E. P., & Jenkins, G. M. (1994). Time Series Analysis: Forecasting and Control. Journal of Time Series Analysis, 3. DOI: https://doi.org/10.1111/j.1467-9892.2009.00643.x

Breu, F., Guggenbichler, S., & Wollmann, J. (2008). ARNN: A packages for time series forecasting using autorreresive neural networks. Vasa, 8(2).

Broz, D. R., & Viego, V. N. (2014). Precios De Productos Almacenables. Madera y Bosques, 20(Primavera 2014), 37-46.

Casdagli, M. (1989). Nonlinear prediction of chaotic time series. Physica D: Nonlinear Phenomena, 35(3), 335-356. DOI: https://doi.org/10.1016/0167-2789(89)90074-2

Frandes, M., Timar, B., Timar, R., & Lungeanu, D. (2017). Chaotic time series prediction for glucose dynamics in type 1 diabetes mellitus using regime-switching models. Scientific Reports, 7(1), 1-11. DOI: https://doi.org/10.1038/s41598-017-06478-4

Hameed, H., & Kleinberg, S. (2020). Comparing Machine Learning Techniques for Blood Glucose Forecasting Using Free-living and Patient Generated Data. Proceedings of Machine Learning for Healthcare, 1-22. https://par.nsf.gov/biblio/10177612

Harris, M. I., Hadden, W. C., Knowler, W. C., & Bennett, P. H. (1987). Prevalence of Diabetes and Impaired Glucose Tolerance and Plasma Glucose Levels in U.S. Population Aged 20-74 Yr. Diabetes, 36(4), 523-534. DOI: https://doi.org/10.2337/diab.36.4.523

Michael Kahn, MD, PhD, Washington University, St. Louis, M. (n.d.). UCI Machine Learning Repository: Diabetes Data Set.

Panella, M. (2011). Advances in biological time series prediction by neural networks. Biomedical Signal Processing and Control, 6(2), 112-120. DOI: https://doi.org/10.1016/j.bspc.2010.09.006

Pérez-Gandía, C., Facchinetti, A., Sparacino, G., Cobelli, C., Gómez, E. J., Rigla, M., De Leiva, A., & Hernando, M. E. (2010). Artificial neural network algorithm for online glucose prediction from continuous glucose monitoring. Diabetes Technology and Therapeutics, 12(1), 81-88. DOI: https://doi.org/10.1089/dia.2009.0076

Reikard, G. (2009). Predicting solar radiation at high resolutions: A comparison of time series forecasts. Solar Energy, 83(3), 342-349. DOI: https://doi.org/10.1016/j.solener.2008.08.007

Ståhl, F., & Johansson, R. (2009). Diabetes mellitus modeling and short-term prediction based on blood glucose measurements. Mathematical Biosciences, 217(2), 101-117. DOI: https://doi.org/10.1016/j.mbs.2008.10.008

Tang, Z., de Almeida, C., & Fishwick, P. A. (1991). Time series forecasting using neural networks vs. Box-Jenkins methodology. Simulation, 57(5), 303-310. DOI: https://doi.org/10.1177/003754979105700508

Velásquez, J. D., Dyner, I., & Souza, R. C. (2008). Electricity spot price modelling in Brasil using an autoregressive neural network. Ingeniare, 16(3), 394-403.

Velásquez, J. D., Franco, C. J., & García, H. A. (2009). Un modelo no lineal para la predicción de la demanda mensual de electricidad en colombia. Estudios Gerenciales, 25(112), 37-54. DOI: https://doi.org/10.1016/S0123-5923(09)70079-8

Wilmot, E. G., Edwardson, C. L., Achana, F. A., Davies, M. J., Gorely, T., Gray, L. J., Khunti, K., Yates, T., & Biddle, S. J. H. (2012). Sedentary time in adults and the association with diabetes, cardiovascular disease and death: Systematic review and meta-analysis. Diabetologia, 55(11), 2895-2905. DOI: https://doi.org/10.1007/s00125-012-2677-z

World Health Organization. (2016). Global report on diabetes. Geneva, Switzerland. https://apps.who.int/iris/bitstream/handle/10665/204871/9789241565257_eng.pdf

Zhao, C., Dassau, E., Jovanovič, L., Zisser, H. C., Doyle, F. J., & Seborg, D. E. (2012). Predicting subcutaneous glucose concentration using a latent-variable-based statistical method for type 1 diabetes mellitus. Journal of Diabetes Science and Technology, 6(3), 617-633. DOI: https://doi.org/10.1177/193229681200600317

Received: November 11, 2020; Accepted: March 06, 2021

Corresponding author: Daniel Arturo Olivares-Vera, dolivares@delasalle.edu.mx

This is an open-access article distributed under the terms of the Creative Commons Attribution License