1 Introduction
This work is about Artificial Neural Networks (ANN) and their applications to financial time series forecasting. We use two types of algorithms, backpropagation (BP) and resilient backpropagation (RBP), to produce the weights needed for prediction. The final scientific objective is to use the network weights to estimate some measures of relative importance. One of the main difficulties when using nonparametric methods such as ANN is the interpretation and meaning of the weights (parameters) obtained. Though interpretation is difficult due to the nature and purpose of machine learning methods, we intend to offer some conclusions on the importance of the variables used for prediction. In this respect, ANN analysis is the method for obtaining information about which variables are more relevant for forecasting.
The most common architectures for time series prediction are single-layer and multilayer perceptron feedforward networks. The activation function is usually of the sigmoid type, which is the standard choice when the prediction lies in the range between zero and one. The simplest and most common learning rule for forecasting is the error-correction type. Perhaps the most important parameter in our analysis is the learning rate, which determines how well the updates in the network perform.
When using the traditional Backpropagation algorithm we must do some preliminary work in order to choose the learning rate that best fits our model. This can be a time-consuming process, and there is no assurance that the network will work well. One way around this problem is to use a different algorithm that determines the learning rate endogenously. We decided to use the Resilient Backpropagation (RBP) algorithm, which offers a simple heuristic method to find the network weights without first determining the learning rate.
The RBP algorithm, published by ^{Riedmiller and Braun [14]} in 1993, is an improvement on traditional neural networks trained with the backpropagation algorithm. The authors developed a flexible algorithm to tackle the main problems of traditional Backpropagation, in particular the vanishing gradient problem and the need for cross-validation analysis to estimate the learning rate. The new algorithm allowed for weight backtracking and a heuristic, adjustable learning rate that improved prediction.
The history of Artificial Neural Networks began perhaps in the 1940s, when ^{McCulloch and Pitts [12]} first proposed the idea of simulating neuronal activity using mathematical logic. But it was not until the early 1960s that the idea that machines can learn was first explored by ^{Rosenblatt [15]} with the creation of the Perceptron algorithm. For the first time the possibilities of Artificial Intelligence were recognized when this algorithm was tested on an IBM 704 computer. Although there were great expectations about artificial intelligence at that time, computer technology was not yet advanced enough for Artificial Neural Networks to work at their full potential. A big leap forward in ANN research was P. J. Werbos’s 1974 unpublished PhD dissertation "Beyond regression: new tools for prediction and analysis in the behavioural sciences", where he first proposed backpropagation to train neural networks. A detailed explanation of backpropagation can also be found in ^{Werbos [17]}. The backpropagation algorithm was indeed a breakthrough that allowed the effective use of the gradient descent method in the training of ANN.
The development of the Neocognitron by ^{Fukushima and Miyake [4]} inspired the creation of Convolutional Neural Networks (CNN), which are Deep Neural Networks (DNN) with multiple layers in which some hidden layers, called convolutional layers, perform a convolution that connects to the next hidden layer of neurons. This type of neural network is often used in pattern recognition problems. In 1982 John Hopfield, in ^{Hopfield [6]}, introduced the Hopfield Network with the purpose of modelling human memory. This was later known as a Recurrent Neural Network, in which time lapses between hidden layers and neurons were important to model the human learning process and memory. Several other types of DNN with different variations and architectures have been created in recent years. Deep learning is currently a very dynamic field with great possibilities.
On the side of financial time series analysis, ^{Sapankevych and Sankar [16]} surveyed Support Vector Machines (SVM) with a focus on time series prediction. ^{Kim [11]} analyses financial time series from the Korean composite stock exchange market using SVM, compares the SVM results with Artificial Neural Network (ANN) models, and finds that SVM slightly outperforms the ANN models. ^{Huang et al. [7]} predicted the Japanese Stock Exchange index using SVM and concluded that SVM has a better hit ratio than neural networks. ^{Kara et al. [10]} is a similar analysis of the Istanbul stock index using ANN and SVM. Contrary to the previous works on financial time series, that study concluded that neural networks performed better than SVM, with a higher hit ratio. ^{Cao and Tay [3]} analyse futures contracts in the Chicago Mercantile Market using SVM, backpropagation NN and regularized Radial Basis Function NN. Their results also show that SVM outperforms backpropagation NN and performs similarly to Radial Basis Function NN.
This work shows the basic formulation of Artificial Neural Networks and their practical application to time series. We introduce financial forecasting using the Resilient Backpropagation algorithm (RBP), which was proposed by Martin Riedmiller and Heinrich Braun in ^{Riedmiller and Braun [13]} in 1992. They published their work the following year in ^{Riedmiller and Braun [14]}. This algorithm tries to solve the problem of choosing the learning rate, especially with noisy data. ^{Igel and Husken [9]} is another work on RBP, which explains the weight backtracking technique.
As mentioned before, the final scientific objective is to measure which features are important for prediction in whole financial markets. ^{Huang et al. [8]} is a paper that includes different measures for assessing the relative importance of each feature in the neural network prediction. The two importance measures found in the literature are the Garson and Yoon contribution measures, but we noticed that the two are not highly correlated. The interpretation and comparison of the two measures is not straightforward and, on average, the correlation between them is about 0.54 in our analysis. Our hypothesis is that these contribution measures are not well suited to describe the relative importance of each feature. We decided to construct a simple measure in order to describe the importance of each feature variable in the best ANN model obtained.
Given the above scientific objective, we do not focus on model selection techniques. Although prediction depends on the network architecture and other technical choices such as the learning rule or the activation function, the main purpose is to observe which features are better for prediction. This information is already embedded in the data and its complexity. Although the determination of the best model (e.g. with evolutionary algorithms) is important for prediction, we decided to leave this subject for future research.
In this work we use ANN for binary classification with the objective of predicting ups and downs in stock exchange indexes. In the first part of this work we introduce Neural Networks and the Resilient Backpropagation algorithm. In the second part, we use data from six stock exchange markets (Hong Kong, Japan, Germany, Europe50, Canada and Mexico indexes) in order to predict the ups and downs of the stock indexes. The final part of this work includes an analysis of the relative importance of the feature variables using different contribution measures.
2 Artificial Neural Networks
2.1 The theory
Artificial Neural Networks trained with the Backpropagation algorithm are a traditional method for classification and forecasting. Though several versions of Deep Neural Networks (DNN) are now popular and powerful tools for analysis, the backbone behind all neural network architectures continues to be the gradient descent method used in the Feedforward and Backpropagation algorithms. Both ANN and DNN have a wide range of important commercial applications. There have been numerous efforts to design artificial neural networks based on Von Neumann’s architecture, trying to produce intelligent programs that mimic biological neural networks. Neurons are very special cells in the human brain, interconnected with each other and responding to stimuli using chemical and electric reactions through connections called synapses. The idea of ANN is to simulate the stimuli process of neurons and to let these neurons learn by themselves.
ANN can perform complex classification problems. For a simple binary classification, the idea is to construct a decision function
Which can also be written in the form:
The main objective is to find the vector of weights
The process of training an ANN will depend on the activation function we want to use as well as the method used to find the appropriate weights recursively. Usually, we start training the ANN with random initial values and then apply weights to every data point that will pass on information to a hidden layer, where the information will be processed by an activation function. Weights
To represent
This is akin to a logistic regression function with
The decision function will approximate the label
The crucial step is to minimize the error function
Where the
By computing partial derivatives of the error function with respect to the parameters, the gradients become:
The stochastic gradient descent algorithm allows to learn the decision function
The gradient descent algorithm is a key feature of an ANN. Although more sophisticated algorithms are being developed, gradient descent is still the core method in ANN. It has the disadvantage of the vanishing gradient, which occurs when weights are too small and drive the gradient to zero. Perhaps the vanishing gradient problem was the main disadvantage of the ANN and also the main motivation to develop more sophisticated networks.
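As an illustration of the stochastic gradient descent procedure described in this section, the following minimal sketch trains a single sigmoid neuron on a toy binary problem. It is not the implementation used in this work: the squared error function, the learning rate `lr`, the number of epochs and the names `train_neuron_sgd` and `predict` are assumptions made only for this example.

```python
import math
import random


def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_neuron_sgd(X, y, lr=0.5, epochs=500, seed=0):
    """Fit h(x) = sigmoid(w.x + b) by SGD on the error 0.5 * (h - y)^2.

    lr, epochs and seed are illustrative choices, not values from the text.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the training points in random order
        for i in order:
            h = sigmoid(sum(wj * xj for wj, xj in zip(w, X[i])) + b)
            # Derivative of the error with respect to the pre-activation.
            delta = (h - y[i]) * h * (1.0 - h)
            w = [wj - lr * delta * xj for wj, xj in zip(w, X[i])]
            b -= lr * delta
    return w, b


def predict(w, b, x):
    """Return label 1 when the decision function is at least 0.5."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if sigmoid(z) >= 0.5 else 0


# Toy linearly separable data (logical OR), used only for illustration.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 1, 1, 1]
w, b = train_neuron_sgd(X, y)
print([predict(w, b, x) for x in X])
```

The factor `h * (1.0 - h)` in the update is the derivative of the sigmoid. It shrinks towards zero when the neuron saturates, which is one simple way to see why gradients can vanish.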
Another idea is to separate the data into smaller problems and to solve for each problem separately. For example, in our binary classification problem, some data with a label equal to zero will be a single cluster between two separated clusters of data labelled one. Now we will need two decision functions with more parameters and we need to construct a neural network with two neurons. The idea is to make a decision function of decision functions so that to predict the label
What we are building now is a neural network where the hidden layer that store the activation function
In the above decision function all parameters and biases must be found at the same time using gradient descent. We are iterating forward, which means that iteration to update parameters must go back to each input data point in the training sample
2.2 Backpropagation algorithm
Another way to learn is to use the backpropagation algorithm (BP). But before we may consider the possibility of more complex ANN architectures. We may consider adding more neurons but also additional hidden layers to our Neural Networks, on what is commonly known as Multilayer Perceptron Network. BP requires that once the feedforward process has been completed and we have arrived to the output layer
And the decision function becomes:
The first decision function is just the layer of inputs
And for the bias:
The second part of the above derivatives,
To find the first part of the above derivatives 15 and 16, we define
and since
Now we can obtain the derivative
Where the
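The backward pass described in this section can be sketched for a network with one hidden layer as follows. This is only an illustration under stated assumptions: sigmoid units in both layers, a squared error function, and hypothetical names (`W1`, `b1`, `W2`, `b2`, `forward`, `backprop`) that do not come from the text. The final lines compare one backpropagated derivative with a finite difference approximation.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    """Feedforward pass: returns the hidden activations and the output."""
    W1, b1, W2, b2 = params
    a1 = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
          for row, b in zip(W1, b1)]
    out = sigmoid(sum(w * a for w, a in zip(W2, a1)) + b2)
    return a1, out


def loss(params, x, y):
    """Squared error for one training example (an assumed error function)."""
    return 0.5 * (forward(params, x)[1] - y) ** 2


def backprop(params, x, y):
    """Gradients of the loss with respect to all weights and biases."""
    W1, b1, W2, b2 = params
    a1, out = forward(params, x)
    # Error signal at the output unit.
    delta_out = (out - y) * out * (1.0 - out)
    gW2 = [delta_out * a for a in a1]
    gb2 = delta_out
    # Error signal propagated back to each hidden unit.
    delta_hid = [delta_out * w * a * (1.0 - a) for w, a in zip(W2, a1)]
    gW1 = [[d * xi for xi in x] for d in delta_hid]
    gb1 = list(delta_hid)
    return gW1, gb1, gW2, gb2


# Small random network: 3 inputs, 2 hidden neurons, 1 output.
rng = random.Random(0)
W1 = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
b1 = [rng.uniform(-1, 1) for _ in range(2)]
W2 = [rng.uniform(-1, 1) for _ in range(2)]
b2 = rng.uniform(-1, 1)
params = (W1, b1, W2, b2)
x, y = [0.5, -1.0, 2.0], 1.0

gW1, gb1, gW2, gb2 = backprop(params, x, y)

# Central finite difference for one first-layer weight.
eps = 1e-6
W1[0][1] += eps
loss_plus = loss(params, x, y)
W1[0][1] -= 2 * eps
loss_minus = loss(params, x, y)
W1[0][1] += eps  # restore the original weight
numeric = (loss_plus - loss_minus) / (2 * eps)
print(abs(numeric - gW1[0][1]))
```

A gradient check of this kind is a common way to confirm that the chain rule has been applied consistently before training a larger network.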
2.3 Resilient Backpropagation Neural Networks (RBP)
The backpropagation algorithm allows the network to learn and get the parameters
Another algorithm commonly used in ANN is the heuristic Resilient Backpropagation algorithm (RBP). This algorithm is a slightly different version of the Backpropagation but instead of using the magnitude of the gradients
With the Backpropagation algorithm we have seen that the weights are updated following the general form:
Where
The RBP algorithm proposes that the update is performed with the sign of the derivative rather than the size of it as follows:
Another important change is that the update parameter
Where
The method of weight backtracking is also based on heuristics, and the idea is to keep using previous weights for updating (some weights only). For example, if:
But if less than zero, we use the previous update:
This implementation trick avoids updating the learning rate again, and therefore avoids using the otherwise option above. The advantage of the RBP algorithm is that it reduces computation while achieving similar, if not better, precision. It is very useful when the data contains noise, which means that it performs well when applied to financial time series data sets. In the next section we will present some results using RBP neural networks.
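The update rules of this section can be sketched as a small optimizer that uses only the sign of each partial derivative and keeps one step size per weight. This is a hedged illustration, not the code used for the results in this work. The function name `rprop_plus`, the factors `eta_plus=1.2` and `eta_minus=0.5`, the initial step `delta0` and the step bounds are assumptions (the factors are the values commonly quoted for this algorithm).

```python
def rprop_plus(grad_fn, w0, n_iter=200, delta0=0.1,
               eta_plus=1.2, eta_minus=0.5,
               delta_min=1e-6, delta_max=50.0):
    """Resilient backpropagation with weight backtracking (sketch).

    grad_fn(w) must return the list of partial derivatives at w.
    """
    w = list(w0)
    n = len(w)
    delta = [delta0] * n      # one step size per weight
    prev_grad = [0.0] * n
    prev_step = [0.0] * n
    for _ in range(n_iter):
        grad = list(grad_fn(w))
        for k in range(n):
            product = grad[k] * prev_grad[k]
            if product > 0:
                # Same sign as before: enlarge the step and move on.
                delta[k] = min(delta[k] * eta_plus, delta_max)
                step = -delta[k] if grad[k] > 0 else delta[k]
            elif product < 0:
                # Sign change: shrink the step and undo the last update.
                delta[k] = max(delta[k] * eta_minus, delta_min)
                step = -prev_step[k]
                grad[k] = 0.0  # forces the "otherwise" branch next time
            else:
                # Otherwise: move by the current step size, if any slope.
                if grad[k] > 0:
                    step = -delta[k]
                elif grad[k] < 0:
                    step = delta[k]
                else:
                    step = 0.0
            w[k] += step
            prev_step[k] = step
            prev_grad[k] = grad[k]
    return w


# Toy quadratic whose partial derivatives differ by many orders of magnitude.
target = [1.0, -2.0, 3.0]
scales = [1e-4, 1.0, 1e4]


def grad_fn(w):
    return [2.0 * s * (wk - t) for s, wk, t in zip(scales, w, target)]


w_hat = rprop_plus(grad_fn, [0.0, 0.0, 0.0])
print(w_hat)
```

Because only the sign of each derivative enters the update, the three coordinates converge at a similar pace even though their gradients differ in size by eight orders of magnitude. Plain gradient descent with a single learning rate would struggle in this setting.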
3 Time Series Forecasting
3.1 The data
The first task in this work is to forecast a time series using binary classification with ANN methods. A basic classification would be to describe the behaviour of a stock or stock index in order to predict its movement. Predicting stock prices is important as we would want to decide if we need to buy or sell a stock or predict the ups and downs of a price index. In this case we would want to define a label
The next question is defining the features that will be used to predict the movements in stock prices. In other words, we need the matrix of features x that will help to define the label y. We decided to use some technical analysis concepts as in ^{Kim [11]}, most of them taken from ^{Achelis [1]}. Technical analysis indicators will be our features matrix x. An experienced trader may read the concepts in table 1 and along with additional information then try to predict changes in stock prices. These features are mostly ratios of prices, moving averages or both.
Stochastic %K
Stochastic %D (moving average of %K)
Slow %D (moving average of %D)
Momentum
Rate of Change (ROC)
Williams’ %R
A/D Accumulation/Distribution Oscillator
Disparity5
Price Oscillator (OSCP)
Commodity Channel Index
Relative Strength Index
Table 1 shows twelve well known technical indicators for trading. These are constructed with simple market data such as closing price (CP), lowest low price (LL), highest high price (HH), high (H) and low (L) prices during the trading day or period and it is very common to use Moving Averages (MA) for their construction. The stochastic oscillators such as stochastic
All features in table 1 are associated with the prices of the stocks and are used to interpret the trends of stock market prices. The entire data set for a given stock market index consists of the label y and the feature matrix x that describes the label. All technical analysis indicators will be used to classify our label in both directions, ups and downs, for the entire stock market index. There are dozens more technical analysis indicators that can be used, but we use some of the most popular ones, which have also been applied in other similar research.
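To make the construction of such features concrete, the sketch below computes three of the indicators from lists of closing, high and low prices. The formulas are common textbook definitions and are an assumption here, as are the window length `n`, the toy prices and the function names. They are not necessarily the exact definitions behind table 1.

```python
def stochastic_k(close, high, low, n):
    """Stochastic %K: position of the close inside the n-day price range.

    Assumes the highest high differs from the lowest low in every window.
    """
    out = []
    for t in range(n - 1, len(close)):
        hh = max(high[t - n + 1:t + 1])  # highest high in the window
        ll = min(low[t - n + 1:t + 1])   # lowest low in the window
        out.append(100.0 * (close[t] - ll) / (hh - ll))
    return out


def williams_r(close, high, low, n):
    """Williams' %R on a 0 to 100 scale (distance of the close from the high)."""
    out = []
    for t in range(n - 1, len(close)):
        hh = max(high[t - n + 1:t + 1])
        ll = min(low[t - n + 1:t + 1])
        out.append(100.0 * (hh - close[t]) / (hh - ll))
    return out


def momentum(close, n):
    """Momentum: change in the closing price over n periods."""
    return [close[t] - close[t - n] for t in range(n, len(close))]


# Toy prices, for illustration only.
high = [10, 11, 12, 13, 12]
low = [8, 9, 10, 11, 10]
close = [9, 10, 11, 12, 11]

k = stochastic_k(close, high, low, 3)
r = williams_r(close, high, low, 3)
m = momentum(close, 3)
print(k, r, m)
```

Under these definitions %K and %R add up to 100 at every date. This agrees with the summary statistics of the features, where the means of FastK and Williams R sum to 100 in each market.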
This section contains an empirical analysis using the RBP algorithm in order to predict time series, particularly changes in stock price indexes. We chose to predict changes in six major European, Asian and North American stock market indexes: the European STOXX50, which contains blue chip stocks from the 50 best performing companies in leading sectors in Europe; the DAX, an index that contains 30 blue chip German companies; the Nikkei stock exchange index; the Hang Seng index, which is the stock exchange index of Hong Kong; the Canadian Toronto Stock Exchange index; and the Mexican Stock Exchange index IPC.
We decided to use daily data for each stock exchange from January 2000 to June 2019, fewer than five thousand daily observations in each market. Compared with data from the 1980s and 1990s, the data in the period of analysis are high frequency and contain sharp financial crashes, perhaps due to new trading methods using electronic platforms and the availability of information online. Financial markets are now more competitive as communication technology has improved along with capital mobility. Table 5 in the appendix contains the summary statistics for the six markets on closing, high, low and open market prices.
3.2 Estimation
With the information on the average prices in each market, we first constructed our matrix of features
Because ANN is a supervised machine learning method, we are going to demand to the network to find the best way to predict y using the twelve technical analysis features constructed using indicators in table 1 (matrix
When the whole data set with y and
The only thing left for clarification will be the estimation of the Hit ratio for each prediction. After running each of the ANN models, we will get the predicted values using the test data into a new data set with predicted values for the label
The prediction performance is measured using a hit ratio, defined by:
This hit ratio is the percentage of correct matches where
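A minimal sketch of the hit ratio computation is given below. The 0.5 threshold used to turn a network output into a predicted label and the names `to_label` and `hit_ratio` are assumptions made for this example.

```python
def to_label(prob, threshold=0.5):
    """Turn a network output in (0, 1) into a predicted label (assumed rule)."""
    return 1 if prob >= threshold else 0


def hit_ratio(y_true, y_pred):
    """Share of test observations whose predicted label matches the actual one."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return hits / len(y_true)


# Toy test labels and network outputs, for illustration only.
y_test = [1, 0, 1, 1, 0, 1, 0, 0]
outputs = [0.81, 0.40, 0.35, 0.66, 0.52, 0.90, 0.10, 0.48]
y_hat = [to_label(p) for p in outputs]
print(hit_ratio(y_test, y_hat))
```

In this toy example six of the eight predictions match the actual label, so the hit ratio is 0.75.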
The main part of the empirical analysis requires using ANN to predict time series. We trained different single layer networks using the traditional Backpropagation and Resilient Backpropagation algorithms. At first, single layer neural networks were constructed with 6, 12, 18 and 24 neurons each, using standard logistic and error functions. Later we trained multilayer networks with 6 and 12 neurons in each of three hidden layers. The results are shown in table 2.
Backpropagation (learning rate = 0.1)
INDEX \ Neurons  6  12  18  24  6-6-6  12-12-12
TSE (Canada)  0.4522  0.4522  0.4522  0.5478  0.4522  0.4522
IPC (Mexico)  0.4905  0.5095  0.4905  0.5095  0.4905  0.4905
Nikkei (Japan)  0.4600  0.5400  0.4600  0.5400  0.4600  0.4600
Europe50  0.4763  0.4763  0.4763  0.5237  0.4763  0.4763
Hang Seng (Hong Kong)  0.4757  0.5243  0.4757  0.5243  0.4757  0.4757
DAX (Germany)  0.4612  0.5388  0.4612  0.5388  0.4612  0.4612
Resilient Backpropagation (weight backtracking)
INDEX \ Neurons  6  12  18  24  6-6-6  12-12-12
TSE (Canada)  0.5478  0.5959  0.6678  0.5458  0.5478  0.5478
IPC (Mexico)  0.5095  0.5194  0.5095  0.5095  0.5095  0.5095
Nikkei (Japan)  0.4892  0.4929  0.5743  0.4899  0.5131  0.5049
Europe50  0.4564  0.4515  0.4522  0.4536  0.4557  0.4529
Hang Seng (Hong Kong)  0.5028  0.5125  0.4944  0.4993  0.5271  0.5049
DAX (Germany)  0.5287  0.5273  0.5745  0.5300  0.5388  0.5179
One disadvantage of the ANN is that the cost of training increases as the architecture becomes more complex. As the number of neurons and hidden layers increases, more training time is required. On the other hand, ANN with backpropagation may obtain better performance due to more flexible updating. Table 2 contains the hit ratios for different ANN models with different architectures. The upper part contains the hit ratios using traditional backpropagation with a learning rate of 0.1, while the lower part contains the hit ratios using resilient backpropagation with weight backtracking.
With the only exception being the Europe50 index, we find larger hit ratios using resilient backpropagation. This does not mean that we cannot achieve better results in traditional backpropagation, but for that we need to find the best learning rate and architecture. And this will require additional statistical analysis in order to decide the correct learning rate, as each model is different.
On the other hand, resilient backpropagation has a flexible and heuristic way to choose the learning rate and update the gradient for a better descent. The reader may notice that there is little room for improvement in each model using backpropagation, as we use a single learning rate for every model. However, resilient backpropagation has room for improvement, as the learning is controlled during convergence. The estimates in table 2 will vary if we choose different activation functions, learning rates and gradient methods for updating, but we decided to leave model selection for future research.
4 Contribution measures
This work focuses not only on financial forecasting using ANN but also offers a descriptive analysis on the overall performance of the features used for prediction. This is an important issue because we need information on the relative relevance of each feature in the learning process. We know that each feature was normalized when constructing the matrix x, so we may be able to apply some indicators on similar data and obtain some comparable results.
Index  Garson\Yoon  Garson\Trapezoid  Yoon\Trapezoid 

DAX  0.625  0.816  0.808 
NIKKEI  0.789  0.851  0.931 
IPC  0.550  0.722  0.908 
HS  0.050  0.307  0.881 
EU50  0.619  0.621  0.512 
TSE  0.511  0.698  0.864 
In order to find the relative importance of each feature we must apply a measure using the weights from the ANN analysis. The magnitude of each weight in every network tells us about the relative importance of each feature. This section provides some measures of the relative contribution of each feature to the final output of a neural network. We estimated each contribution measure on the best single hidden layer ANN. For example, if the input layer has
And the second is the Yoon measure:
The Garson measure can be interpreted as the percentage contribution to the final output. The Yoon contribution index is more complicated to interpret, though we may interpret a high absolute value of the Yoon measure as high relevance. Both measures are designed for a single layer neural network, so we use the best single layer result for each model. The results of the estimation are shown in table 4 for each market (the number in parentheses shows the number of neurons in the hidden layer). For four markets the ROC seems to be the feature with the highest contribution to the financial forecasting, the exceptions being the Hang Seng index and the Europe50 index. These were the only two indexes for which we used the best single layer hit ratio obtained with backpropagation.
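The two measures can be sketched in code for a network with one hidden layer and one output. The forms below are the ones commonly cited for the Garson and Yoon measures and are stated here as an assumption. The names `W` (input to hidden weights), `v` (hidden to output weights), `garson` and `yoon` are hypothetical, and bias terms are ignored.

```python
def garson(W, v):
    """Garson contribution of each input as a share of the total.

    W[i][j] is the weight from input i to hidden neuron j,
    v[j] is the weight from hidden neuron j to the output.
    """
    n_in, n_hid = len(W), len(v)
    # Total absolute input weight arriving at each hidden neuron.
    col_abs = [sum(abs(W[i][j]) for i in range(n_in)) for j in range(n_hid)]
    raw = [sum(abs(W[i][j]) / col_abs[j] * abs(v[j]) for j in range(n_hid))
           for i in range(n_in)]
    total = sum(raw)
    return [r / total for r in raw]


def yoon(W, v):
    """Yoon contribution: signed weight products scaled by their absolute sum."""
    prod = [sum(W[i][j] * v[j] for j in range(len(v))) for i in range(len(W))]
    total = sum(abs(p) for p in prod)
    return [p / total for p in prod]


# Toy weights: 2 inputs, 2 hidden neurons (illustration only).
W = [[1.0, 2.0],
     [3.0, -2.0]]
v = [1.0, -1.0]

print(garson(W, v))
print(yoon(W, v))
```

In this form the Garson shares are non-negative and add up to one, which is why they can be read as percentages. The Yoon values keep their sign, so their absolute values are the natural quantity to compare across features.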
Features  TSE (18n)  Nikkei (18n)  DAX (18n)  

Garson  Yoon  Trapezoid  Garson  Yoon  Trapezoid  Garson  Yoon  Trapezoid  
A/D  1.67%  0.019  2.8%  5.52%  0.018  2.03%  5.58%  0.027  2.66% 
CCI  2.43%  0.005  3.6%  6.55%  0.015  3.45%  6.56%  0.006  3.27% 
Disp10  7.26%  0.058  3.8%  8.21%  0.017  4.12%  7.07%  0.041  3.01% 
Disp5  9.92%  0.027  3.7%  9.07%  0.013  4.35%  7.16%  0.006  4.84% 
fastD  12.15%  0.023  10.0%  8.38%  0.093  10.67%  9.20%  0.040  7.44% 
fastK  8.08%  0.010  3.8%  6.73%  0.085  4.93%  8.37%  0.040  5.00% 
Moment  9.56%  0.035  4.9%  8.42%  0.005  4.66%  10.49%  0.020  5.85% 
OSCP  9.65%  0.015  3.9%  9.19%  0.012  3.54%  8.44%  0.009  4.05% 
ROC  15.73%  0.637  38.5%  13.84%  0.722  42.38%  13.42%  0.558  39.58% 
RSI  7.94%  0.010  5.4%  8.70%  0.001  3.61%  8.82%  0.042  5.75% 
slowD  4.08%  0.020  5.3%  6.49%  0.005  4.39%  6.04%  0.011  4.18% 
WilliamsR  11.53%  0.143  14.3%  8.89%  0.015  11.87%  8.84%  0.201  14.37% 
Features  IPC (12n)  HSI (12n)  EU50 (24n)  
Garson  Yoon  Trapezoid  Garson  Yoon  Trapezoid  Garson  Yoon  Trapezoid  
A/D  3.35%  0.001  1.33%  7.54%  0.034  5.71%  7.87%  0.076  6.38% 
CCI  6.84%  0.001  2.95%  8.24%  0.118  8.44%  9.88%  0.162  9.77% 
Disp10  7.49%  0.018  2.99%  7.98%  0.107  9.58%  7.86%  0.033  8.41% 
Disp5  7.13%  0.003  2.86%  8.39%  0.124  9.48%  7.86%  0.087  8.05% 
fastD  10.21%  0.021  8.45%  7.05%  0.003  7.17%  7.71%  0.052  7.09% 
fastK  8.79%  0.098  7.90%  9.59%  0.238  15.42%  9.22%  0.110  13.09% 
Moment  7.51%  0.003  1.42%  7.31%  0.081  6.61%  7.07%  0.027  7.27% 
OSCP  10.09%  0.022  3.23%  8.05%  0.056  7.65%  8.43%  0.078  8.23% 
ROC  13.29%  0.717  49.24%  8.37%  0.065  7.44%  9.10%  0.139  9.73% 
RSI  9.15%  0.003  2.27%  8.95%  0.130  9.35%  7.75%  0.099  7.36% 
slowD  6.41%  0.004  4.47%  6.21%  0.017  6.23%  8.43%  0.046  8.37% 
WilliamsR  9.75%  0.109  12.90%  12.33%  0.028  6.91%  8.82%  0.090  6.25% 
High price  
Index  N  Mean  SD  Median  Min  Max  Range  Skew  Kurtosis 
DAX  4,942  7,346.74  2,795.28  6,801.95  2,319.65  13,596.89  11,277.24  0.49  0.78 
Nikkei  4,776  14,120.67  4,285.77  13,636.81  7,100.77  24,448.07  17,347.30  0.41  0.98 
IPC  4,876  28,553.79  15,434.73  31,543.24  5,109.40  51,772.37  46,662.97  0.24  1.45 
Hang Seng  4,800  19,577.61  5,696.33  20,623.56  8,430.62  33,484.08  25,053.46  0.06  0.86 
EU50  4,850  3,241.86  717.30  3,111.17  1,809.98  5,464.43  3,654.45  0.86  0.52 
TSE  4,917  11,867.28  2,843.10  12,294.60  5,812.90  16,672.70  10,859.80  0.31  1.04 
Low Price  
Index  N  Mean  SD  Median  Min  Max  Range  Skew  Kurtosis 
DAX  4,942  7,233.77  2,780.47  6,691.01  2,188.75  13,517.81  11,329.06  0.49  0.77 
Nikkei  4,776  13,935.29  4,257.91  13,403.06  6,994.90  24,217.26  17,222.36  0.42  0.96 
IPC  4,876  28,168.76  15,280.96  31,087.91  4,950.71  51,524.23  46,573.52  0.23  1.46 
Hang Seng  4,800  19,316.24  5,639.54  20,386.76  8,331.87  32,897.04  24,565.17  0.06  0.88 
EU50  4,850  3,241.86  717.30  3,111.17  1,809.98  5,464.43  3,654.45  0.86  0.52 
TSE  4,917  11,738.87  2,834.99  12,151.10  5,678.30  16,589.80  10,911.50  0.29  1.05 
Open Price  
Index  N  Mean  SD  Median  Min  Max  Range  Skew  Kurtosis 
DAX  4,942  7,293.06  2,788.12  6,746.28  2,203.97  13,577.14  11,373.17  0.49  0.77 
Nikkei  4,776  14,032.76  4,273.57  13,553.15  7,059.77  24,376.17  17,316.40  0.42  0.97 
IPC  4,876  28,362.18  15,362.55  31,307.40  5,077.39  51,590.48  46,513.09  0.23  1.46 
Hang Seng  4,800  19,460.25  5,672.78  20,518.17  8,351.59  33,335.48  24,983.89  0.06  0.87 
EU50  4,850  3,241.86  717.30  3,111.17  1,809.98  5,464.43  3,654.45  0.86  0.52 
TSE  4,917  11,807.88  2,839.82  12,219.80  5,689.40  16,642.10  10,952.70  0.30  1.04 
Close Price  
Index  N  Mean  SD  Median  Min  Max  Range  Skew  Kurtosis 
DAX  4,942  7,292.11  2,787.72  6,748.30  2,202.96  13,559.60  11,356.64  0.49  0.77 
Nikkei  4,776  14,027.96  4,273.60  13,541.62  7,054.98  24,270.62  17,215.64  0.42  0.97 
IPC  4,876  28,368.40  15,360.10  31,321.52  5,081.92  51,713.38  46,631.46  0.23  1.46 
Hang Seng  4,800  19,450.48  5,665.96  20,511.59  8,409.01  33,154.12  24,745.11  0.06  0.87 
EU50  4,850  3,241.86  717.30  3,111.17  1,809.98  5,464.43  3,654.45  0.86  0.52 
TSE  4,917  11,805.41  2,838.86  12,220.20  5,695.30  16,669.40  10,974.10  0.30  1.04 
TSE (Canada)  Mean  SD  Median  Min  Max  Range  Skew  Kurtosis 
Label  0.54  0.50  1.00  0.00  1.00  1.00  0.14  1.98 
AD  13175.28  4966.81  12835.92  5126.09  21788.43  16662.33  0.22  1.33 
CCI  19.59  109.43  37.32  330.81  327.16  657.97  0.43  0.51 
FastK  58.82  30.98  64.24  0.00  100.00  100.00  0.35  1.19 
FastD  58.82  28.77  63.90  0.25  100.00  99.75  0.34  1.23 
SlowD  58.82  27.88  63.72  1.14  98.42  97.28  0.34  1.22 
Williams R  41.18  30.98  35.76  0.00  100.00  100.00  0.35  1.19 
Disp10  100.07  1.71  100.26  85.94  108.36  22.42  1.15  6.55 
Disp5  100.03  1.15  100.14  90.03  106.43  16.40  0.82  6.09 
Moment  6.49  229.82  23.60  1884.90  1225.50  3110.40  0.83  4.29 
OSCP  0.00  0.01  0.00  0.08  0.04  0.12  1.23  6.44 
ROC  0.01  1.07  0.06  9.79  9.37  19.16  0.66  10.07 
RSI  53.14  12.05  53.83  12.78  84.00  71.23  0.27  0.31 
IPC (Mexico)  Mean  SD  Median  Min  Max  Range  Skew  Kurtosis 
Label  0.53  0.50  1.00  0.00  1.00  1.00  0.12  1.99 
AD  28733.10  19597.32  31712.29  973.64  59315.71  58342.07  0.13  1.54 
CCI  20.33  110.74  42.03  357.18  371.29  728.47  0.38  0.51 
FastK  57.88  30.91  63.36  0.00  100.00  100.00  0.34  1.19 
FastD  57.90  28.86  63.21  0.51  99.88  99.37  0.34  1.25 
SlowD  57.92  27.99  63.04  1.91  99.57  97.66  0.34  1.25 
Williams R  42.12  30.91  36.64  0.00  100.00  100.00  0.34  1.19 
Disp10  100.18  2.20  100.29  84.56  112.72  28.17  0.57  4.27 
Disp5  100.08  1.47  100.14  89.78  109.98  20.20  0.33  4.76 
Moment  29.31  665.07  51.18  4496.07  3554.29  8050.36  0.39  3.65 
OSCP  0.00  0.01  0.00  0.08  0.07  0.15  0.72  4.45 
ROC  0.04  1.29  0.07  8.27  10.44  18.71  0.00  5.38 
RSI  53.53  12.56  54.50  11.49  86.44  74.94  0.20  0.51 
Nikkei (Japan)  Mean  SD  Median  Min  Max  Range  Skew  Kurtosis 
Label  0.51  0.50  1.00  0.00  1.00  1.00  0.05  2.00 
AD  33210.80  4452.53  34423.82  40305.15  20749.27  19555.87  0.66  0.51 
CCI  8.83  109.64  19.54  430.68  321.73  752.40  0.24  0.67 
FastK  55.24  32.47  58.55  0.00  100.00  100.00  0.20  1.37 
FastD  55.25  30.28  58.34  0.22  100.00  99.78  0.19  1.41 
SlowD  55.26  29.40  58.24  0.66  98.49  97.83  0.18  1.40 
Williams R  44.76  32.47  41.45  0.00  100.00  100.00  0.20  1.37 
Disp10  100.03  2.40  100.21  79.79  113.32  33.53  0.72  4.09 
Disp5  100.01  1.61  100.15  86.81  113.79  26.98  0.65  5.95 
Moment  2.43  381.10  25.95  2415.93  1671.34  4087.27  0.59  2.75 
OSCP  0.00  0.01  0.00  0.10  0.07  0.16  0.76  3.69 
ROC  0.00  1.52  0.03  12.11  13.23  25.35  0.40  6.28 
RSI  51.83  12.18  51.70  13.54  92.94  79.41  0.10  0.30 
Europe50  Mean  SD  Median  Min  Max  Range  Skew  Kurtosis 
Label  0.51  0.50  1.00  0.00  1.00  1.00  0.03  2.00 
AD  2140.52  717.30  2009.83  708.64  4363.09  3654.45  0.86  0.52 
CCI  8.84  108.23  23.32  366.03  367.35  733.38  0.31  0.60 
FastK  55.99  36.96  61.44  0.00  100.00  100.00  0.25  1.44 
FastD  56.00  34.09  61.01  0.00  100.00  100.00  0.24  1.43 
SlowD  56.02  32.95  60.76  0.00  100.00  100.00  0.23  1.43 
Williams R  44.01  36.96  38.56  0.00  100.00  100.00  0.25  1.44 
Disp10  99.98  2.23  100.23  84.30  110.57  26.27  0.76  3.08 
Disp5  99.99  1.53  100.10  89.74  107.94  18.20  0.44  2.97 
Moment  1.23  85.01  5.83  487.77  435.31  923.08  0.46  2.49 
OSCP  0.00  0.01  0.00  0.07  0.05  0.12  0.75  2.97 
ROC  0.01  1.46  0.02  9.01  10.44  19.45  0.06  4.71 
RSI  51.48  11.04  52.21  14.14  77.81  63.67  0.22  0.54 
Hang Seng (HK)  Mean  SD  Median  Min  Max  Range  Skew  Kurtosis 
Label  0.52  0.50  1.00  0.00  1.00  1.00  0.06  2.00 
AD  21143.14  6147.85  23233.24  8928.13  34672.11  25743.98  0.41  1.10 
CCI  10.54  109.03  19.31  338.04  300.85  638.89  0.17  0.86 
FastK  54.97  32.60  58.64  0.00  100.00  100.00  0.19  1.42 
FastD  54.97  30.54  58.01  1.33  99.59  98.26  0.17  1.45 
SlowD  54.97  29.66  57.92  2.98  98.79  95.81  0.17  1.44 
Williams R  45.03  32.60  41.36  0.00  100.00  100.00  0.19  1.42 
Disp10  100.06  2.39  100.23  76.16  110.29  34.13  0.60  4.57 
Disp5  100.03  1.59  100.10  82.57  113.26  30.68  0.47  7.04 
Moment  9.65  553.11  33.43  4025.33  2952.83  6978.16  0.35  3.34 
OSCP  0.00  0.01  0.00  0.09  0.06  0.16  0.54  3.23 
ROC  0.01  1.47  0.05  13.58  13.41  26.99  0.10  8.05 
RSI  52.11  12.58  52.33  15.05  89.41  74.36  0.04  0.51 
DAX (Germany)  Mean  SD  Median  Min  Max  Range  Skew  Kurtosis 
Label  0.53  0.50  1.00  0.00  1.00  1.00  0.12  1.99 
AD  6225.18  4595.94  5056.53  825.28  15989.25  16814.53  0.66  0.79 
CCI  15.34  109.28  34.90  303.85  346.90  650.75  0.34  0.69 
FastK  57.98  31.51  63.09  0.00  100.00  100.00  0.31  1.29 
FastD  57.98  29.34  62.85  0.45  99.65  99.21  0.29  1.33 
SlowD  57.98  28.45  62.83  2.20  99.40  97.20  0.28  1.32 
Williams R  42.02  31.51  36.91  0.00  100.00  100.00  0.31  1.29 
Disp10  100.07  2.34  100.35  84.06  111.92  27.86  0.79  3.52 
Disp5  100.03  1.57  100.15  90.31  108.14  17.82  0.50  3.18 
Moment  4.40  187.76  16.42  1267.49  807.60  2075.09  0.49  2.14 
OSCP  0.00  0.01  0.00  0.08  0.06  0.14  0.87  3.57 
ROC  0.01  1.47  0.08  8.87  10.80  19.67  0.06  4.59 
RSI  52.78  11.90  53.36  11.24  84.64  73.40  0.17  0.41 
One of the problems of the above measures is consistency. The two measures are positively correlated, but only just. For example, table 3 shows the correlation coefficient between the Garson measure and the Yoon measure. The two measures are highly correlated when analysing the Nikkei index, but they are completely different for the Hang Seng index, with a correlation of just 0.05. Another drawback of the Garson and Yoon measures is that they become difficult to calculate in more complex network architectures. Under such considerations, a different measure is needed to evaluate the contribution of each feature in an ANN model.
We decided to give a geometric interpretation to the weights in order to establish their relevance. For example, in a neural network with one hidden layer, we interpret the weights
An appealing feature of this Trapezoid contribution measure is that it can be applied to any number of hidden layers and neurons in the network, and it is quite easy to calculate and interpret if we express each part as a percentage of the whole area. Table 4 shows the relative importance of each feature from the ANN analysis using the Garson, Yoon and Trapezoid measures. For the Japanese, Canadian, Mexican and German indexes the ROC is the most influential feature for predicting the stock market index, while the fastK is the most important for the Hong Kong and Europe50 indexes.
We may notice that the new Trapezoid contribution measure is highly correlated with the Yoon measure but also moderately correlated with the Garson measure. Most importantly, it is easy to calculate and can be applied to more complex network architectures.
5 Final Conclusions
This work presents financial forecasting using both traditional backpropagation and Resilient Backpropagation Neural Networks, together with an analysis of the relative importance of the features used for forecasting. We use standard single layer and multilayer feedforward architectures to evaluate the performance of both algorithms, along with a sigmoid activation function and an error-correction learning rule, which are common for time series forecasting. The use of the RBP algorithm provides a practical solution to the determination of the learning rate and is especially helpful for data sets with noise such as financial stock indexes. Resilient backpropagation with weight backtracking is a very flexible algorithm that can adjust to changes in model complexity. Sometimes it can find a better solution when the model specification changes.
This work provides a simple contribution measure in order to evaluate the importance of features in financial time series forecasting. The main motivation is the lack of consistency between the two available indexes: the Garson and the Yoon contribution measures. A simple measure using the concept of the area of a trapezoid captures the idea of contribution to the prediction using the ANN weights. This Trapezoid contribution measure uses the ANN weights from the best model (highest hit ratio from a single layer ANN) to calculate the area of an irregular trapezoid for every feature variable. Although this concept is simple, it reflects the magnitude and influence of each weight in the network and can be interpreted as a contribution to the forecasting.
We used the Trapezoid contribution measure along with the Garson and the Yoon measures to analyse the relevance of each feature in the best ANN model for each of the six stock exchange indexes. We concluded that the ROC is perhaps a very relevant feature for at least four of the stock exchange indexes used: IPC, TSE, DAX and Nikkei. The Europe50 index and the Hang Seng index seem to respond more to the FastK indicator, although the Garson and Yoon contribution measures do not consistently show this. In this respect, the Trapezoid contribution measure offers additional relevant information that can be used to evaluate the contribution of each feature in the network.