Daniel Steffen
Community University of Chapeco Region, Brazil
E-mail: daniel_steffen@unochapeco.edu.br
Anselmo Chaves Neto
Federal University of Paraná, Brazil
E-mail: anselmo@ufpr.br
Submission: 04/05/2016
Revision: 20/05/2016
Accept: 23/08/2016
ABSTRACT
In this work, an experimental computer program was developed in Matlab version 7.1. It implements the univariate time series forecasting method called Theta, together with the computer-intensive resampling technique known as "bootstrap", in order to estimate by confidence interval the point forecast obtained by this method. To solve this problem, an algorithm was built that uses Monte Carlo simulation to obtain interval estimates for the forecasts. The Theta model presented in this work performed very well in the M3 competition of Makridakis, in which 3003 series were tested. It is based on the concept of modifying the local curvature of the time series by means of a coefficient theta (Θ). In its simplest approach, the time series is decomposed into two theta lines representing the long-term and short-term components. The prediction is made by combining the forecasts obtained from the two lines of the theta decomposition. The MAPE errors obtained for the estimates confirm the favorable results of the method in the M3 competition, making it a good alternative for time series forecasting.
Keywords:
Forecasting; Time Series; Bootstrap; Theta Model
1. INTRODUCTION
Forecasting models are of great importance in academic and social circles because of their wide applicability in scientific, industrial, commercial and service areas. Companies use such predictions to estimate the demand for their products and, consequently, to plan production schedules, purchasing and other activities, including identifying when and where to focus marketing efforts (LIEBEL, 2004).
An interesting illustration is the construction of a hydroelectric plant, in which knowledge of the time series of the flow of the river that supplies the dam is essential to the development of the project. Likewise, industries currently need to plan in detail their production and the stocks kept at the disposal of operations. Thus, the application of time series methods is essential.
Time series forecasting techniques produce predictions from sequences of past values, in other words, from the observations of the series [$Z_{t-1}$, $Z_{t-2}$, $Z_{t-3}$, ..., $Z_1$]. The objectives of these techniques are:
· forecasting future values of the time series;
· describing the behavior of the series (checking for trend and seasonality);
· identifying the mechanism that generates the series (the stochastic process that generated it).
The modeling of the series is based on a single realization, which requires the underlying stochastic process to be ergodic. The predictions resulting from these techniques may rely only on the information contained in the historical series of interest (methods based on classical statistics) or, in addition to this information, may incorporate other supposedly relevant information that is not contained in the series analyzed (methods based on Bayesian statistics).
According to Boucher and Elsayed (1994), time series forecasting techniques can be divided into two categories: qualitative and quantitative. Quantitative techniques are the most used when a set of data is available, since various statistical methods can then be applied to extrapolate these data and obtain probable future values.
Among the most widespread prediction models, we can mention the Box and Jenkins models. Their advantage is a systematic methodology that provides both point and interval estimates and is based on a series of hypotheses and statistical tests for acceptance of the models. However, they should only be applied to linear series, stationary or not (GUTIERREZ, 2003). This is a very powerful procedure, but it requires considerable expertise from the analyst.
Assimakopoulos and Nikolopoulos (2000) proposed a new univariate forecasting model called Theta, which is relatively simple to apply and showed one of the best performances in time series forecasting. It was one of the methods tested in the M3-Competition (MAKRIDAKIS; HIBON, 2000). The model consists of decomposing the time series into two lines, called theta lines. Each line is extrapolated separately, by linear regression and by Simple Exponential Smoothing (SES), respectively, and the two forecasts are then combined with equal weights to obtain the Theta forecast.
However, one disadvantage is the lack of confidence intervals for the estimates in the work of Assimakopoulos and Nikolopoulos (2000). Confidence intervals are very important because they give a reliable estimate of the size of the error that may be made in an estimate.
The purpose of this work is to apply the computer-intensive resampling technique known as "bootstrap" to estimate, by confidence intervals, the point forecasts obtained by the Theta model. The "bootstrap" method consists of resampling a set of data, either directly or through a fitted model, in order to create replicas of the data, from which the variability of the quantities of interest can be evaluated without analytical calculations.
Therefore, this technique is particularly useful when calculating the estimators by analytical methods is complicated. Resampling offers an alternative way to obtain standard deviations and confidence intervals from the analysis of a data set (DAVISON; HINKLEY, 1997).
2. REVIEW OF LITERATURE
2.1. Theta Model
According to Nikolopoulos et al. (2011), the Theta model is a time series forecasting model developed from the idea that a single extrapolative method is practically unable to capture efficiently all the information hidden in a time series. The Theta model sparked interest in academia due to its surprisingly good forecasting performance in the M3-Competition (MAKRIDAKIS; HIBON, 2000).
According to the analysis of Hyndman and Billah (2003), this model can be understood as being equivalent to simple exponential smoothing with "drift". However, Nikolopoulos and Assimakopoulos (2005) disagree with this view and claim that the Theta model is more general than simple exponential smoothing, because it is an approach to the decomposition of the data and can rely on extrapolation by any forecasting method.
The Theta model is based on modifying the local curvature of the seasonally adjusted time series by means of a coefficient theta (Θ). This coefficient is applied directly to the second differences of the series (ASSIMAKOPOULOS; NIKOLOPOULOS, 2008). The result is a series called a “Theta line”, which maintains the mean and the slope of the original data, but not their curvatures.
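As a compact statement of this construction, write $Z_t$ for the seasonally adjusted series and $L_t(\Theta)$ for the theta line (notation assumed here for illustration). The second differences of the theta line are those of the data scaled by Θ. The closed form in the second line, in which $a + bt$ is the least-squares trend line of the data, follows the analysis of Hyndman and Billah (2003):

```latex
\begin{aligned}
\nabla^2 L_t(\Theta) &= \Theta\,\nabla^2 Z_t, \qquad \nabla^2 Z_t = Z_t - 2Z_{t-1} + Z_{t-2},\\
L_t(\Theta) &= \Theta\,Z_t + (1-\Theta)\,(a + b\,t).
\end{aligned}
```

Setting Θ = 0 gives the regression line itself, and Θ = 2 gives $2Z_t - (a + bt)$, a line whose deviations from the trend are twice those of the data.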
The general formulation of the Theta model is based on the following steps:
· Decomposition of the initial series into two or more theta lines;
· Each theta line is extrapolated separately and the forecasts are simply combined with equal weights.
The best formulation of the model, which was the one tested in the M3-Competition, is the decomposition of the time series into two theta lines. In this case the series of observations $Z_t$ is decomposed as follows:
$Z_t = \frac{1}{2}\left[L_t(\Theta=0) + L_t(\Theta=2)\right]$, (1)
where $L_t(\Theta=0)$ is the linear regression of the data and $L_t(\Theta=2)$ is obtained by the following expression:
$L_t(\Theta=2) = 2Z_t - L_t(\Theta=0)$. (2)
The line $L_t(\Theta=0)$ describes the series as a linear trend, while $L_t(\Theta=2)$ doubles the local curvatures, magnifying the short-term behavior. To extrapolate $L_t(\Theta=2)$, the method of simple exponential smoothing is applied. The final forecast of the Theta model is obtained by combining the two extrapolated lines with equal weights,
$\hat{Z}_{t+h} = \frac{1}{2}\left[\hat{L}_{t+h}(\Theta=0) + \hat{L}_{t+h}(\Theta=2)\right]$. (3)
In practice, the method can be easily implemented in an electronic spreadsheet such as Excel. Nikolopoulos and Assimakopoulos (2005) suggest the following steps for its implementation:
· Step 0: Seasonally adjust the data by the classical multiplicative decomposition method, if necessary;
· Step 1: Apply Linear Regression (LR) to the data and prepare the $L_t(\Theta=0)$ line and its forecasts;
· Step 2: Prepare the values of $L_t(\Theta=2)$ using formula (2);
· Step 3: Extrapolate $L_t(\Theta=2)$ with either SES (Simple Exponential Smoothing) or another smoothing method, such as moving averages;
· Step 4: Combine with equal weights the forecasts from SES and LR.
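Steps 1 to 4 can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the authors' Matlab program: the function names `ses` and `theta_forecast` are hypothetical, Step 0 (seasonal adjustment) is omitted, and the smoothing constant `alpha` is fixed by hand because the text does not say how it was chosen.

```python
import numpy as np

def ses(y, alpha):
    """Simple exponential smoothing; returns the final level (the flat forecast)."""
    level = y[0]
    for value in y[1:]:
        level = alpha * value + (1 - alpha) * level
    return level

def theta_forecast(z, h, alpha=0.5):
    """Theta point forecasts for horizons 1..h using the lines Theta=0 and Theta=2."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    t = np.arange(1, n + 1)
    # Step 1: least-squares line L(Theta=0) and its extrapolation
    slope, intercept = np.polyfit(t, z, 1)
    line0 = intercept + slope * t
    forecast0 = intercept + slope * np.arange(n + 1, n + h + 1)
    # Step 2: L(Theta=2) = 2*Z - L(Theta=0)
    line2 = 2 * z - line0
    # Step 3: SES gives the same (flat) forecast for every horizon
    forecast2 = np.full(h, ses(line2, alpha))
    # Step 4: equal-weight combination
    return 0.5 * (forecast0 + forecast2)

# Hypothetical short series, used only to show the call
print(theta_forecast([45.1, 44.9, 45.0, 45.2, 45.1, 45.3], h=3))
```

Because SES produces a flat forecast, the only horizon-dependent part of the Theta forecast in this sketch is half of the regression slope.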
The Theta model is simple and requires no extensive training. According to the results of the competition of Makridakis and Hibon (2000), the method obtained good predictions for monthly series, whether stationary or with trend or seasonality. Petropoulos and Nikolopoulos (2013) argue for the use of more theta lines, Θ ∈ {-1, 0, 1, 2, 3}, so as to extract even more information from the data. These lines can be extrapolated with other exponential smoothing methods, such as Holt exponential smoothing and Brown exponential smoothing.
2.2. “Bootstrap” Method
"Bootstrap" is a computer-intensive method developed by Bradley Efron (1979) for estimating the variability of statistics. Generally speaking, it is a technique that aims at point or confidence-interval estimation of parameters of interest by resampling the original data. It should be used when the classical methods for this purpose are only asymptotic, are difficult to implement, or simply do not exist for the statistic in question.
As already mentioned, "bootstrap" is a computationally intensive method that uses Monte Carlo simulation to estimate standard errors and confidence intervals. According to Chaves Neto (1991), it is a computationally intensive non-parametric statistical technique that allows the variability of statistics to be evaluated from the data of a single available sample.
The basic idea of the "bootstrap" is to resample the set of observations of the original sample, directly or via a fitted model, in order to create replicas of the data, from which the variability of statistics can be evaluated without the use of analytical methods.
Thus, given a random sample of size n, x’ = [x1, x2, x3, ..., xn], NBS samples of size n are drawn with replacement from the original sample. Each of them is called a "bootstrap" sample and is denoted by x*. Calculating the statistic of interest T on each of the NBS "bootstrap" samples gives the set of "bootstrap" estimates $T^{*}_{i}$, i = 1, 2, ..., NBS. These values provide an approximation of the true sampling distribution of T.
Resampling is based on the empirical distribution, in other words, a probability mass equal to 1/n is placed on each sample point. Thus the empirical distribution $\hat{F}$ assigns probability 1/n to each observation. The key point of the method is therefore sampling with replacement, which allows as many samples as desired to be drawn.
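The resampling scheme just described can be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' code: the function name `bootstrap_estimates`, the choice of the median as the statistic T, and the six sample values (the first six entries of the simulated series, reused here only as convenient numbers) are assumptions made for the example.

```python
import numpy as np

def bootstrap_estimates(x, statistic, nbs=1000, seed=0):
    """Draw nbs resamples of size n with replacement and evaluate the statistic on each."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(x)
    estimates = np.empty(nbs)
    for i in range(nbs):
        # each observation has probability 1/n of being drawn at every position
        resample = rng.choice(x, size=n, replace=True)
        estimates[i] = statistic(resample)
    return estimates

sample = np.array([45.08, 44.69, 44.61, 44.90, 45.21, 45.13])
t_star = bootstrap_estimates(sample, np.median, nbs=2000)
print("bootstrap standard error of the median:", t_star.std(ddof=1))
```

The spread of `t_star` approximates the sampling variability of the statistic without any analytical formula for its standard error.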
The goal is to see how the statistics obtained from the resamples vary due to random sampling. The "bootstrap" is very helpful in parameter estimation problems in which the sampling distribution of the statistic (estimator) is unknown.
Hesterberg et al. (2003) state that the original sample represents the population from which it was drawn. The resamples represent what would be obtained if many samples were taken from the original population. The "bootstrap" distribution of a statistic, based on many resamples, represents an approximation of the true sampling distribution of the statistic. In order to obtain reliable results, thousands of "bootstrap" samples should be drawn from the original sample.
3. PROPOSED METHODOLOGY
The time series analyzed was obtained from a time series generator program (GST), developed experimentally in the Pascal language. Series with 36 observations were generated following the AR(1) model structure, with the autoregressive parameter set inside the stationarity region of the parameter space. The constant term was set to δ = 45 and the noise variance to V($a_t$) = $\sigma^2$ = 0.2. The programs used to achieve the objectives of this work were developed in Matlab version 7.1.
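For readers who want to reproduce a series of this kind, the following Python sketch generates an AR(1) series with NumPy. It is not the GST program. The mean-centered form of the recursion, the value `phi=0.5`, the Gaussian noise and the burn-in length are all assumptions made for illustration, since the exact model equation and parameter value are not recoverable from the text.

```python
import numpy as np

def simulate_ar1(n, phi, delta, sigma2, seed=0, burn_in=100):
    """Simulate Z_t = delta + phi*(Z_{t-1} - delta) + a_t with Gaussian noise a_t."""
    if abs(phi) >= 1:
        raise ValueError("phi must satisfy |phi| < 1 for stationarity")
    rng = np.random.default_rng(seed)
    a = rng.normal(0.0, np.sqrt(sigma2), size=n + burn_in)
    z = np.empty(n + burn_in)
    z[0] = delta + a[0]
    for t in range(1, n + burn_in):
        z[t] = delta + phi * (z[t - 1] - delta) + a[t]
    # discard the burn-in so the start-up value has little influence
    return z[burn_in:]

series = simulate_ar1(n=36, phi=0.5, delta=45.0, sigma2=0.2)
print(series.round(2))
```

With this parameterization the series fluctuates around `delta`. A different convention for the constant term would shift the mean to `delta / (1 - phi)`.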
3.1. Adaptation of the models and computational implementation
3.1.1. Theta Forecasts
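Given the time series $Z_t$, a linear regression model is fitted to the series by the method of ordinary least squares (OLS), obtaining the estimate $\hat{\beta}$ and the vector of fitted values, which will be designated as $L_t(\Theta=0)$. To obtain the other theta line, $L_t(\Theta=0)$ is substituted into the equation:
$L_t(\Theta=2) = 2Z_t - L_t(\Theta=0)$. (4)
$L_t(\Theta=2)$ is extrapolated by simple exponential smoothing (SES). The combination with equal weights at horizon h gives the final forecast of the Theta model:
$\hat{Z}_{t+h} = \frac{1}{2}\left[\hat{L}_{t+h}(\Theta=0) + \hat{L}_{t+h}(\Theta=2)\right]$. (5)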
3.1.2. “Bootstrap” Confidence Interval
In the present study, the residuals are obtained from the fitted series $\hat{Z}_t$, which results from combining, with equal weights, $L_t(\Theta=0)$ and $L_t(\Theta=2)$ after the application of exponential smoothing.
The problem considered is one that uses linear models to estimate the sampling distribution of the statistics used to estimate $\beta$. The general regression model, adapted to the prediction context, is:
$Y = X\beta + \varepsilon$, (5)
where:
$Y$: response vector of dimension n;
$X$: model matrix of order n × p;
$\beta$: parameter vector of dimension p;
$\varepsilon$: residual vector of dimension n.
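With the linear regression model fitted, it is used to generate $L_t(\Theta=2)$ by the equation:
$L_t(\Theta=2) = 2Z_t - L_t(\Theta=0)$. (6)
To generate the fitted line $\hat{Z}_t$, $L_t(\Theta=0)$ and $L_t(\Theta=2)$ are combined with equal weights, after the application of simple exponential smoothing (SES) to $L_t(\Theta=2)$:
$\hat{Z}_t = \frac{1}{2}\left[L_t(\Theta=0) + \hat{L}_t(\Theta=2)\right]$, (7)
where $\hat{L}_t(\Theta=2)$ denotes the smoothed $L_t(\Theta=2)$ line.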
For the purpose of the "bootstrap", the residuals obtained from this combination are used, so that:
$\hat{\varepsilon}_t = Z_t - \hat{Z}_t$, (8)
where:
$Z_t$: original series, generated by the simulator;
$\hat{Z}_t$: series estimated by the Theta model.
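The steps of the computational implementation to obtain confidence intervals for the forecast h periods ahead are:
1) Fit a regression model by ordinary least squares (OLS), obtaining the estimate $\hat{\beta}$ and the vectors $L_t(\Theta=0)$ and $L_t(\Theta=2)$.
2) Apply an exponential smoothing model to $L_t(\Theta=2)$, generating the vector $\hat{L}_t(\Theta=2)$. Obtain the Theta model fit by the equation:
$\hat{Z}_t = \frac{1}{2}\left[L_t(\Theta=0) + \hat{L}_t(\Theta=2)\right]$, (9)
and from it the vector of estimated residuals $\hat{\varepsilon}_t = Z_t - \hat{Z}_t$, which is considered the original sample for the purpose of the “bootstrap”.
3) Select B random samples of size n from the residuals obtained in step (2), using resampling with replacement, with probability 1/n of selection for each residual:
$\varepsilon^{*}_{1,1}, \varepsilon^{*}_{1,2}, \dots, \varepsilon^{*}_{1,n}$
$\varepsilon^{*}_{2,1}, \varepsilon^{*}_{2,2}, \dots, \varepsilon^{*}_{2,n}$
...
$\varepsilon^{*}_{B,1}, \varepsilon^{*}_{B,2}, \dots, \varepsilon^{*}_{B,n} \sim \hat{F}$
4) Generate the pseudo-series with each “bootstrap” sample by the equation:
$Z^{*}_{t} = \hat{Z}_t + \varepsilon^{*}_{t}$. (10)
5) Fit the model again by ordinary least squares to the pseudo-series, obtaining the "bootstrap" estimate $\hat{\beta}^{*}$ and the "bootstrap" vector of the Theta model ($\hat{Z}^{*}_t$).
6) Store the "bootstrap" forecast $\hat{Z}^{*}_{t+h}$ in a vector of dimension B × 1.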
The flowchart in Figure 1 shows the steps of the algorithm. Its output is the "bootstrap" distribution $\{\hat{Z}^{*(b)}_{t+h};\ b = 1, 2, 3, \dots, B\}$, which simulates the sampling distribution of $\hat{Z}_{t+h}$.
Figure 1: Algorithm for obtaining the "bootstrap" distribution of the forecast
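The algorithm can be sketched in Python with NumPy as below. This is an illustrative sketch under stated assumptions, not the authors' Matlab implementation: the function names are hypothetical, the SES constant `alpha` is fixed rather than optimized, the SES fitted values are one-step-ahead predictions initialized at the first observation, and the pseudo-series is formed by adding resampled residuals to the fitted series (the usual residual bootstrap).

```python
import numpy as np

def ses_fit(y, alpha):
    """One-step-ahead SES fitted values and the final level."""
    fitted = np.empty(len(y))
    level = y[0]
    for i, value in enumerate(y):
        fitted[i] = level  # prediction of y[i] made before observing it
        level = alpha * value + (1 - alpha) * level
    return fitted, level

def theta_fit(z, h, alpha):
    """In-sample Theta fit and forecasts for horizons 1..h."""
    n = len(z)
    t = np.arange(1, n + 1)
    slope, intercept = np.polyfit(t, z, 1)         # OLS line, L(Theta=0)
    line0 = intercept + slope * t
    line2 = 2 * z - line0                          # L(Theta=2)
    fitted2, level = ses_fit(line2, alpha)
    fitted = 0.5 * (line0 + fitted2)               # equal-weight combination
    future0 = intercept + slope * np.arange(n + 1, n + h + 1)
    return fitted, 0.5 * (future0 + level)

def bootstrap_theta(z, h, B=1000, alpha=0.5, seed=0):
    """Residual bootstrap of the Theta forecasts; returns point forecasts and a B x h array."""
    z = np.asarray(z, dtype=float)
    rng = np.random.default_rng(seed)
    fitted, point = theta_fit(z, h, alpha)
    residuals = z - fitted
    boot = np.empty((B, h))
    for b in range(B):
        eps_star = rng.choice(residuals, size=len(z), replace=True)
        pseudo = fitted + eps_star                 # pseudo-series
        _, boot[b] = theta_fit(pseudo, h, alpha)   # refit and store the forecasts
    return point, boot

# Synthetic stand-in series; these are not the data of the paper
rng = np.random.default_rng(1)
z = 45 + 0.2 * rng.standard_normal(30)
point, boot = bootstrap_theta(z, h=6, B=500)
print(point.round(4))
print(boot.std(axis=0, ddof=1).round(4))
```

Each column of `boot` is the "bootstrap" distribution of the forecast for one horizon, so its standard deviation and percentiles can be read off directly.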
From this "bootstrap" distribution one can calculate the "bootstrap" standard deviation and the "bootstrap" intervals for $\hat{Z}_{t+h}$. To obtain a percentile confidence interval with a confidence level of 95%, the values of the "bootstrap" distribution are sorted in ascending order, $\hat{Z}^{*}_{(1)} \le \hat{Z}^{*}_{(2)} \le \dots \le \hat{Z}^{*}_{(B)}$, and the 2.5th and 97.5th percentiles of this ordered sample are used, respectively, as the lower and upper limits of the confidence interval.
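The percentile interval can be computed as in the following Python sketch. The function name `percentile_interval` is hypothetical and the input values are synthetic stand-ins, not the "bootstrap" estimates of this study. Note that `np.quantile` interpolates linearly between order statistics by default, which may differ slightly from picking two order statistics directly.

```python
import numpy as np

def percentile_interval(boot_values, level=0.95):
    """Percentile confidence limits and standard deviation of a bootstrap distribution."""
    tail = (1.0 - level) / 2.0
    lower, upper = np.quantile(boot_values, [tail, 1.0 - tail])
    return lower, upper, np.std(boot_values, ddof=1)

# Synthetic stand-in for B = 1000 bootstrap forecasts
rng = np.random.default_rng(0)
boot_values = rng.normal(loc=45.0, scale=0.01, size=1000)
lower, upper, se = percentile_interval(boot_values)
print(f"95% interval: [{lower:.4f}, {upper:.4f}], bootstrap s.e. = {se:.4f}")
```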
4. NUMERICAL RESULTS
Table 1, below, shows the series
generated by the simulator GST with 36 observations. The last six values of
the series were stored for validation and performance testing.
Table 1: Time series generated by simulator
Time Series | Year 1 | Year 2 | Year 3/test
month 1 | 45.08 | 44.84 | 45.05
month 2 | 44.69 | 44.68 | 45.14
month 3 | 44.61 | 44.60 | 44.87
month 4 | 44.90 | 44.70 | 45.04
month 5 | 45.21 | 44.50 | 45.24
month 6 | 45.13 | 45.06 | 45.25
month 7 | 45.15 | 45.12 | 44.93
month 8 | 44.99 | 44.85 | 45.21
month 9 | 45.06 | 44.93 | 45.10
month 10 | 44.89 | 44.60 | 45.18
month 11 | 44.78 | 44.83 | 45.09
month 12 | 44.79 | 44.75 | 45.15
Source: The authors.
The series is decomposed into two theta lines, L(Θ = 0) and L(Θ = 2), which are extrapolated by linear trend and simple exponential smoothing, respectively.
Table 2, below, shows the six series values held out for performance testing, the forecasts for L(Θ = 0) and L(Θ = 2), the forecast of the Theta model, and the evaluation of the quality of the prediction according to the MAPE, MSE and RMSE criteria.
Table 2: Observed values, regression and smoothing lines, Theta forecasts, MAPE, MSE and RMSE
Period (h) | Observed (test) | L(Θ = 0) | L(Θ = 2) | Theta forecast | MAPE (%) | MSE
1 | 44.93 | 44.9652 | 45.5130 | 45.2391 | 0.6880 | 0.0955
2 | 45.21 | 44.9687 | 45.5130 | 45.2409 | 0.0682 | 0.0010
3 | 45.10 | 44.9722 | 45.5130 | 45.2426 | 0.3162 | 0.0203
4 | 45.18 | 44.9757 | 45.5130 | 45.2443 | 0.1424 | 0.0041
5 | 45.09 | 44.9792 | 45.5130 | 45.2461 | 0.3462 | 0.0244
6 | 45.15 | 44.9826 | 45.5130 | 45.2478 | 0.2167 | 0.0096
Mean | | | | | 0.2963% | 0.0258
RMSE | | | | | | 0.1606
Source: The authors.
The RMSE for the series decomposed into two lines, extrapolated by LR and SES respectively, is equal to 0.1606. The mean performance according to the MAPE criterion is 0.2963%. Figure 2, below, shows the time series analyzed, the linear regression line, the exponential smoothing line and the forecasts of the Theta method.
Figure 2: Time series, L(Θ = 0) by LR, L(Θ = 2) by SES, and Theta forecast.
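The error measures reported for the Theta forecasts can be checked directly from the observed test values and the forecasts listed in Table 2. The short Python sketch below does this with NumPy. The variable names are illustrative, and the per-period "MAPE" column is taken to be the absolute percentage error of each period.

```python
import numpy as np

# Observed test values and Theta forecasts, copied from Table 2
observed = np.array([44.93, 45.21, 45.10, 45.18, 45.09, 45.15])
theta = np.array([45.2391, 45.2409, 45.2426, 45.2443, 45.2461, 45.2478])

errors = observed - theta
ape = 100 * np.abs(errors) / observed   # absolute percentage error per period
mape = ape.mean()
mse = np.mean(errors ** 2)
rmse = np.sqrt(mse)

print("APE (%):", ape.round(4))
print("MAPE (%):", round(mape, 4), "MSE:", round(mse, 4), "RMSE:", round(rmse, 4))
```

The values obtained agree with the table to about three decimal places. Small differences in the last digit come from rounding of the tabulated forecasts.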
Table 3, below, shows the forecasts obtained by the Theta method and also the predictions obtained with the Statgraphics software, using traditional methods optimized for the lowest RMSE, together with their respective MAPE errors.
Table 3: Period, Theta and Box & Jenkins forecasts and their MAPEs
Period (h) | Theta forecast | Theta MAPE | Box & Jenkins ARMA(1,0) forecast | Box & Jenkins MAPE
1 | 45.2391 | 0.6880% | 45.0868 | 0.3489%
2 | 45.2409 | 0.0682% | 45.0054 | 0.4525%
3 | 45.2426 | 0.3162% | 44.9648 | 0.2997%
4 | 45.2443 | 0.1424% | 44.9445 | 0.5212%
5 | 45.2461 | 0.3462% | 44.9344 | 0.3450%
6 | 45.2478 | 0.2167% | 44.9293 | 0.4888%
Mean | | 0.2963% | | 0.4095%
Source: The authors.
Table 3 shows that the Theta model in its simplest application, with L(Θ = 0) and L(Θ = 2) and with simple exponential smoothing (SES) used to extrapolate L(Θ = 2), obtained the best performance, with a mean absolute percentage error of 0.2963% against 0.4095% achieved by the Box and Jenkins methodology.
Table 4, below, shows the observed values, the predictions of the Theta method, the "bootstrap" standard error, the "bootstrap" MSE and the lower and upper confidence limits.
Table 4: Observed value, Theta forecast, “bootstrap” standard error, “bootstrap” MSE, confidence interval
Period (h) | Observed | Theta forecast | “Bootstrap” standard error | “Bootstrap” MSE | Lower limit 95% | Upper limit 95%
1 | 44.93 | 45.2391 | 0.0020 | 0.09×10⁻⁴ | 45.2329 | 45.2406
2 | 45.21 | 45.2409 | 0.0040 | 0.37×10⁻⁴ | 45.2286 | 45.2439
3 | 45.10 | 45.2426 | 0.0062 | 0.86×10⁻⁴ | 45.2228 | 45.2488
4 | 45.18 | 45.2443 | 0.0085 | 1.49×10⁻⁴ | 45.2200 | 45.2522
5 | 45.09 | 45.2461 | 0.0108 | 2.42×10⁻⁴ | 45.2141 | 45.2561
6 | 45.15 | 45.2478 | 0.0125 | 3.12×10⁻⁴ | 45.2123 | 45.2600
Mean | | | | 1.39×10⁻⁴ | |
Source: The authors.
Analyzing Table 4, it is noted that the "bootstrap" MSE of the predictions increases with the forecast horizon. This means that the confidence limits span a wider range at each successive horizon, making the predictions less reliable as the forecast horizon increases.
To obtain the "bootstrap" interval for the forecast h periods ahead, the "bootstrap" method was applied to the time series with B = 1000 replications. Figure 3, below, shows the histogram of the "bootstrap" estimates for the first prediction (h = 1). The figure shows a high degree of symmetry, which suggests a Gaussian model for the values obtained.
Figure 3: “Bootstrap” distribution of the forecast for h = 1
5. CONCLUSIONS
This work adopts the method known as "bootstrap" to estimate confidence intervals for the forecasts of the Theta model. The Theta model also proved to be a good alternative for time series forecasting: on the simulated series, its mean MAPE (0.2963%) was lower than that of the Box and Jenkins model (0.4095%).
The method is simple and requires no extensive training, only basic statistics. The results show that greater model complexity does not necessarily lead to better results in modeling data. In its simplest application, the decomposition of the series into a linear regression line and a line extrapolated by simple exponential smoothing gives results at least equivalent to those of other automated methods.
The MAPE errors confirm the favorable results of the M3-Competition of Makridakis and Hibon (2000). The confidence intervals were constructed with the computer-intensive "bootstrap" method, using percentile intervals with 95% confidence.
With 1000 replications, the "bootstrap" distribution showed symmetrical behavior, which suggests an approximately normal distribution for the estimates. The problem of determining how many replications B are required to obtain good estimates of the lower and upper limits of "bootstrap" confidence intervals is discussed in Efron and Tibshirani (1993).
REFERENCES
ASSIMAKOPOULOS, V.; NIKOLOPOULOS, K. (2000). The theta model: a decomposition approach to forecasting. International Journal of Forecasting, v. 16, p. 521-530.
ASSIMAKOPOULOS, V.; NIKOLOPOULOS, K. (2008). Advances in the theta model. University of Peloponnese, Department of Economics.
BOUCHER, T. O.; ELSAYED, E. A. (1994). Analysis and control of production systems. 2nd ed. Prentice Hall, New Jersey.
CHAVES NETO, A. (1991) Bootstrap”
DAVISON, A. C.; HINKLEY, D. V. (1997). Bootstrap Methods and their Application. Cambridge University Press.
EFRON, B. (1979). Bootstrap methods: another look at the jackknife. The Annals of Statistics, v. 7, n. 1, p. 1-26.
EFRON, B.; TIBSHIRANI, R. J. (1993). An introduction to the “bootstrap”. Chapman and Hall, New York.
GUTIÉRREZ, J. L. C. (2003). Monitoramento da Instrumentação da Barragem de Corumba-I por Redes Neurais e Modelos de Box e Jenkins. Dissertation (Master in Civil Engineering), PUC-RIO.
HESTERBERG, T.; MOORE, D. S.; MONAGHAN, S.; CLIPSON, A.; EPSTEIN, R. (2003). “Bootstrap” methods and permutation tests. In: The practice of business statistics. New York: W. H. Freeman.
HYNDMAN, R. J.; BILLAH, B. (2003). Unmasking the Theta method. International Journal of Forecasting, v. 19, n. 2, p. 287-290.
LIEBEL, M. J. (2004). Previsão de Receitas Tributárias – O caso do ICMS no Estado do Paraná. Dissertation (Professional Master’s degree in Engineering), Universidade Federal do Rio Grande do Sul, RS.
MAKRIDAKIS, S.; HIBON, M. (2000). The M3-Competition: results, conclusions and implications. International Journal of Forecasting, v. 16, p. 451-476.
NIKOLOPOULOS, K.; ASSIMAKOPOULOS, V. (2005). Fathoming the Theta model. In: 25th International Symposium on Forecasting, ISF, San Antonio, Texas, USA.
NIKOLOPOULOS, K.; ASSIMAKOPOULOS, V.; BOUGIOUKOS, N.; LITSA, A.; PETROPOULOS, F. (2011). The Theta model: An essential Forecasting Tool for Supply Chain Planning. Advances in Automation and Robotics, Lecture Notes in Electrical Engineering, n. 123, p. 431-437.
PETROPOULOS, F.; NIKOLOPOULOS, K. (2013). Optimizing Theta model for monthly data. In: Proceedings of the 5th International Conference on Agents and Artificial Intelligence.