<html><body>
<!--StartFragment--><div><h1 class="ec ed as ee b ef eg eh ei ej ek el em en eo ep eq er es et eu ev" id="7911">How To Model Time Series Data With Linear Regression</h1></div><h2 class="ew ed as ar cl ex ey ez fa fb fc fd fe ff fg fh fi fj fk fl fm aw" id="8189">Time Series Modeling With Python Code</h2><div class="gd ai r"><span class="ar cl gg at br gh gi gj gk gl ev"><a class="cq cr ba bb bc bd be bf bg bh gm bk gn go" href="https://towardsdatascience.com/@jhwang1992m?source=post_page-----cd94d1d901c0----------------------" rel="noopener">Jiahui Wang</a></span> · <span class="ar cl gg at br gh gi gj gk gl aw"><a class="cq cr ba bb bc bd be bf bg bh gm bk gn go" href="https://towardsdatascience.com/how-to-model-time-series-data-with-linear-regression-cd94d1d901c0?source=post_page-----cd94d1d901c0----------------------" rel="noopener">Apr 8</a> · 10 min read</span></div><article class="meteredContent"><div><section class="dk dl dm dn do"><div class="n p"><div class="z ab ac ae af dp ah ai"><figure class="ib ic id ie if ig dc dd paragraph-image"><div class="ih ii fs ij ai"><div class="dc dd ia"><div class="ip r fs iq"><div class="ir is r"><img alt="Image for post" class="sy xt s t u im ai iw" height="2250" sizes="700px" src="download/0NS8utnPL-0YBZzBJ" width="3000"/></div></div></div></div><figcaption class="ix iy de dc dd iz ja ar cl gg at aw" data-selectable-paragraph="">Photo by <a class="cq dx dy jb 
ea eb" href="https://unsplash.com/@tangib?utm_source=medium&utm_medium=referral" rel="noopener nofollow" target="_blank">tangi bertin</a> on <a class="cq dx dy jb ea eb" href="https://unsplash.com?utm_source=medium&utm_medium=referral" rel="noopener nofollow" target="_blank">Unsplash</a></figcaption></figure><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="391c">Welcome back! This is the 4th post in the <a class="cq dx dy jb ea eb" href="https://towardsdatascience.com/tagged/time-series-modeling" rel="noopener" target="_blank">column</a> to explore analysing and modeling time series data with Python code. In the previous three posts, we have covered <a class="cq dx dy jb ea eb" href="https://towardsdatascience.com/fundamental-statistics-7770376593b" rel="noopener" target="_blank"><strong class="je jq">fundamental statistical concepts</strong></a>, <a class="cq dx dy jb ea eb" href="https://towardsdatascience.com/how-to-analyse-a-single-time-series-variable-11dcca7bf16c" rel="noopener" target="_blank"><strong class="je jq">analysis of a single time series variable</strong></a>, and <a class="cq dx dy jb ea eb" href="https://towardsdatascience.com/how-to-analyse-multiple-time-series-variable-5a8d3a242a2e" rel="noopener" target="_blank"><strong class="je jq">analysis of multiple time series variables</strong></a>. From this post onwards, we will make a step further to explore modeling time series data using linear regression.</p></div></div></section><hr class="jr cl js jt ju jv iy jw jx jy jz ka"/><section class="dk dl dm dn do"><div class="n p"><div class="z ab ac ae af dp ah ai"><h1 class="kb kc as ar kd ke kf kg kh ki kj kk kl km kn ko kp kq kr ks kt ev" data-selectable-paragraph="" id="0934">1. Ordinary Least Squares (OLS)</h1><p class="jc jd as je b ex ku jg fa kv ji jj kw ff jl kx fi jn ky fl jp dk ev" data-selectable-paragraph="" id="2018">We
all learned linear regression in school, and the concept seems quite simple: given a scatter plot of the dependent variable y versus the independent variable x, we can find a line that fits the data well. But wait a moment — how can we measure whether a line fits the data well or not? We cannot just look at the plot and say that one line fits the data better than the others, because different people may judge the fit differently. How can we quantify the evaluation?</p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="b1b3">Ordinary
least squares (OLS) is a method to quantify the evaluation of the
different regression lines. According to OLS, we should choose the
regression line that minimizes the sum of the squares of the differences
between the observed dependent variable and the predicted dependent
variable.</p><figure class="ib ic id ie if ig dc dd paragraph-image"><div class="ih ii fs ij ai"><div class="dc dd kz"><div class="ip r fs iq"><div class="la is r"><div class="ik il s t u im ai br in io"><img alt="Image for post" class="s t u im ai it iu ap xy" height="651" src="download/1cfD_EOOIo6sG1Thch6QeTQ.png" width="1197"/></div><img alt="Image for post" class="sy xt s t u im ai iw" height="651" sizes="700px" src="download/1cfD_EOOIo6sG1Thch6QeTQ.png" srcset="https://miro.medium.com/max/414/1*cfD_EOOIo6sG1Thch6QeTQ.png 276w, https://miro.medium.com/max/828/1*cfD_EOOIo6sG1Thch6QeTQ.png 552w, https://miro.medium.com/max/960/1*cfD_EOOIo6sG1Thch6QeTQ.png 640w, https://miro.medium.com/max/1050/1*cfD_EOOIo6sG1Thch6QeTQ.png 700w" width="1197"/></div></div></div></div><figcaption class="ix iy de dc dd iz ja ar cl gg at aw" data-selectable-paragraph="">Illustration of OLS regression</figcaption></figure><h1 class="kb kc as ar kd ke lb kg kh lc kj kk ld km kn le kp kq lf ks kt ev" data-selectable-paragraph="" id="5738">2. Gauss-Markov Assumptions</h1><p class="jc jd as je b ex ku jg fa kv ji jj kw ff jl kx fi jn ky fl jp dk ev" data-selectable-paragraph="" id="2be3">We
can find a line that best fits the observed data according to the
evaluation standard of OLS. A general format of the line is:</p><figure class="ib ic id ie if ig dc dd paragraph-image"><div class="ih ii fs ij ai"><div class="dc dd lg"><div class="ip r fs iq"><div class="lh is r"><div class="ik il s t u im ai br in io"><img alt="Image for post" class="s t u im ai it iu ap xy" height="137" src="download/1BsIOb5DT_4L6ZOqsyK7M7A.png" width="872"/></div><img alt="Image for post" class="sy xt s t u im ai iw" height="137" sizes="700px" src="download/1BsIOb5DT_4L6ZOqsyK7M7A.png" srcset="https://miro.medium.com/max/414/1*BsIOb5DT_4L6ZOqsyK7M7A.png 276w, https://miro.medium.com/max/828/1*BsIOb5DT_4L6ZOqsyK7M7A.png 552w, https://miro.medium.com/max/960/1*BsIOb5DT_4L6ZOqsyK7M7A.png 640w, https://miro.medium.com/max/1050/1*BsIOb5DT_4L6ZOqsyK7M7A.png 700w" width="872"/></div></div></div></div></figure><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="986c">Here,
μᵢ is the residual term, the part of yᵢ that cannot be explained by xᵢ. We can find the best-fitting regression line according to the OLS criterion, but does OLS always generate the best estimator? Not necessarily: when there is an outlier, for example, the ‘best’ regression line calculated according to OLS clearly does not fit the observed data
well.</p><figure class="ib ic id ie if ig dc dd paragraph-image"><div class="ih ii fs ij ai"><div class="dc dd kz"><div class="ip r fs iq"><div class="la is r"><div class="ik il s t u im ai br in io"><img alt="Image for post" class="s t u im ai it iu ap xy" height="651" src="download/1zvhHrnoVtF8QZrS-tfnIiQ.png" width="1197"/></div><img alt="Image for post" class="sy xt s t u im ai iw" height="651" sizes="700px" src="download/1zvhHrnoVtF8QZrS-tfnIiQ.png" srcset="https://miro.medium.com/max/414/1*zvhHrnoVtF8QZrS-tfnIiQ.png 276w, https://miro.medium.com/max/828/1*zvhHrnoVtF8QZrS-tfnIiQ.png 552w, https://miro.medium.com/max/960/1*zvhHrnoVtF8QZrS-tfnIiQ.png 640w, https://miro.medium.com/max/1050/1*zvhHrnoVtF8QZrS-tfnIiQ.png 700w" width="1197"/></div></div></div></div><figcaption class="ix iy de dc dd iz ja ar cl gg at aw" data-selectable-paragraph="">A case when OLS does not generate the best regression line to describe the data</figcaption></figure><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="1aac"><strong class="je jq">2.1 Gauss-Markov Assumptions for Cross-sectional Data</strong></p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="5912">It
turns out that OLS produces the best linear unbiased estimator (BLUE) of the population parameters only when certain assumptions are fulfilled. For cross-sectional data, there are six Gauss-Markov assumptions that ensure the estimators calculated using OLS are BLUE. When any one of these assumptions is violated, the sample parameters calculated using OLS no longer represent the population parameters well.</p><ol class=""><li class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp li lj lk ev" data-selectable-paragraph="" id="aa71">Linearity
in parameters. This assumption requires that the model be linear in the parameter β; there is no requirement of linearity in the independent variable. Both yᵢ=α + βxᵢ² +μᵢ and yᵢ=α + βln(xᵢ) +μᵢ are linear in β.</li><li class="jc jd as je b ex ll jg fa lm ji jj ln ff jl lo fi jn lp fl jp li lj lk ev" data-selectable-paragraph="" id="47a7">The
independent variable x and dependent variable y are both random
variables. It is worth mentioning that if x and y are both random
variables, the residual term μ will not be autocorrelated.</li><li class="jc jd as je b ex ll jg fa lm ji jj ln ff jl lo fi jn lp fl jp li lj lk ev" data-selectable-paragraph="" id="c06a">No
perfect collinearity between multiple independent variables x₁ and x₂.
If there is perfect collinearity, the regression results will be arbitrary, as OLS cannot differentiate the contributions of x₁ and x₂. Typically, when the R² result is good but the t test for each independent variable is poor, it indicates collinearity.</li><li class="jc jd as je b ex ll jg fa lm ji jj ln ff jl lo fi jn lp fl jp li lj lk ev" data-selectable-paragraph="" id="6ee0">The
residual term μ is exogenous, meaning μᵢ does not change with xᵢ; this can be expressed as cov(μᵢ, xᵢ)=0. Endogeneity may arise from reverse causality or measurement error in x, which causes cov(μᵢ, xᵢ)≠0.</li><li class="jc jd as je b ex ll jg fa lm ji jj ln ff jl lo fi jn lp fl jp li lj lk ev" data-selectable-paragraph="" id="5c9a">Homoscedasticity of the residual term μᵢ: the variance of μᵢ does not change with xᵢ.</li><li class="jc jd as je b ex ll jg fa lm ji jj ln ff jl lo fi jn lp fl jp li lj lk ev" data-selectable-paragraph="" id="bad9">No
autocorrelation of the residual term μᵢ, which can be expressed as cov(μᵢ, μⱼ)=0 for i≠j. Autocorrelation of μᵢ can arise from an omitted independent variable, a mis-specified regression function, measurement error in the independent variables, or clustered errors.</li></ol><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="3036"><strong class="je jq">2.2 Gauss-Markov Assumptions for Time Series Data</strong></p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="8965">Time
series data differs slightly from cross-sectional data. For cross-sectional data, we draw samples from a population, and the Gauss-Markov assumptions require that the independent variable x and the dependent variable y are both random variables. For time series data, we draw samples from the same process over time, so we can no longer assume that the independent variable x is a random variable. Thus, the Gauss-Markov assumptions are stricter for time series data in terms of exogeneity, homoscedasticity, and no autocorrelation: since x is no longer a random variable, each requirement must hold for xₖ at all time points, not just for the xᵢ at the same time point as the residual term μᵢ.</p><h1 class="kb kc as ar kd ke lb kg kh lc kj kk ld km kn le kp kq lf ks kt ev" data-selectable-paragraph="" id="1c78">3. Hypothesis Testing On Linear Regression</h1><p class="jc jd as je b ex ku jg fa kv ji jj kw ff jl kx fi jn ky fl jp dk ev" data-selectable-paragraph="" id="45b4"><strong class="je jq">3.1 Linear Regression in Python</strong></p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="a09f">Here, we continue to use the historical AAPL_price and SPY_price obtained from <a class="cq dx dy jb ea eb" href="https://sg.finance.yahoo.com/quote/AAPL/" rel="noopener nofollow" target="_blank">Yahoo finance</a>.
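</p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="">Before turning to statsmodels, it may help to see what the OLS criterion from Section 1 actually computes. The sketch below uses NumPy on a small synthetic data set (the x and y arrays are illustrative assumptions, not the AAPL/SPY data):</p>

```python
import numpy as np

# Synthetic data: y = 2 + 3x + noise (illustrative only, not AAPL/SPY)
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)
y = 2 + 3 * x + rng.normal(scale=0.5, size=x.size)

# Closed-form OLS for a single regressor:
#   beta = sum((x - mean(x))(y - mean(y))) / sum((x - mean(x))**2)
#   alpha = mean(y) - beta * mean(x)
beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha = y.mean() - beta * x.mean()

# This line minimizes the sum of squared residuals over all candidate lines
residuals = y - (alpha + beta * x)
print(alpha, beta, np.sum(residuals ** 2))
```

<p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="">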
We first scatter plot AAPL_price against SPY_price. Then, to find out to what extent AAPL_price can be explained by the overall stock market price, we will build a linear regression model with SPY_price as the
independent variable x and AAPL_price as the dependent variable y.</p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="8c96">Linear regression can be easily done with statsmodels library in Python.</p><pre class="ib ic id ie if lq lr ca"><span class="ev ls kc as lt b gg lu lv r lw" data-selectable-paragraph="" id="ba74">import numpy as np<br/>import pandas as pd<br/>import matplotlib.pyplot as plt<br/>import statsmodels.api as sm</span><span class="ev ls kc as lt b gg lx ly lz ma mb lv r lw" data-selectable-paragraph="" id="c58b">AAPL_price = pd.read_csv('AAPL.csv',usecols=['Date', 'Close'])<br/>SPY_price = pd.read_csv('SPY.csv',usecols=['Date', 'Close'])</span><span class="ev ls kc as lt b gg lx ly lz ma mb lv r lw" data-selectable-paragraph="" id="edff">X = sm.add_constant(SPY_price['Close'])<br/>model = sm.OLS(AAPL_price['Close'],X)<br/>results = model.fit()</span><span class="ev ls kc as lt b gg lx ly lz ma mb lv r lw" data-selectable-paragraph="" id="39b2">plt.scatter(SPY_price['Close'],AAPL_price['Close'],alpha=0.3)<br/>y_predict = results.params[0] + results.params[1]*SPY_price['Close']<br/>plt.plot(SPY_price['Close'],y_predict, linewidth=3)</span><span class="ev ls kc as lt b gg lx ly lz ma mb lv r lw" data-selectable-paragraph="" id="1d40">plt.xlim(240,350)<br/>plt.ylim(100,350)<br/>plt.xlabel('SPY_price')<br/>plt.ylabel('AAPL_price')<br/>plt.title('OLS Regression')</span><span class="ev ls kc as lt b gg lx ly lz ma mb lv r lw" data-selectable-paragraph="" id="c4d9">print(results.summary())</span></pre><figure class="ib ic id ie if ig dc dd paragraph-image"><div class="dc dd mc"><div class="ip r fs iq"><div class="md is r"><div class="ik il s t u im ai br in io"><img alt="Image for post" class="s t u im ai it iu ap xy" height="288" src="download/1cWBMsoGgEhCO39_nrp2Log.png" width="432"/></div><img alt="Image for post" class="sy xt s t u im ai iw" height="288" sizes="432px" 
src="download/1cWBMsoGgEhCO39_nrp2Log.png" srcset="https://miro.medium.com/max/414/1*cWBMsoGgEhCO39_nrp2Log.png 276w, https://miro.medium.com/max/648/1*cWBMsoGgEhCO39_nrp2Log.png 432w" width="432"/></div></div></div></figure><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="2a6e">Together with the plot to visualize the OLS linear regression results, we can print a summary table, which looks like this:</p><figure class="ib ic id ie if ig dc dd paragraph-image"><div class="dc dd me"><div class="ip r fs iq"><div class="mf is r"><div class="ik il s t u im ai br in io"><img alt="Image for post" class="s t u im ai it iu ap xy" height="325" src="download/1ST-bL7LLxhgk8r8Rn7C3YQ.png" width="566"/></div><img alt="Image for post" class="sy xt s t u im ai iw" height="325" sizes="566px" src="download/1ST-bL7LLxhgk8r8Rn7C3YQ.png" srcset="https://miro.medium.com/max/414/1*ST-bL7LLxhgk8r8Rn7C3YQ.png 276w, https://miro.medium.com/max/828/1*ST-bL7LLxhgk8r8Rn7C3YQ.png 552w, https://miro.medium.com/max/849/1*ST-bL7LLxhgk8r8Rn7C3YQ.png 566w" width="566"/></div></div></div></figure><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="b0ae">Why
are we doing this complex hypothesis testing? How can we interpret the hypothesis testing results? We will answer these questions in the following sections.</p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="75b3"><strong class="je jq">3.2 Why Hypothesis Testing on Linear Regression?</strong></p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="409e">Since
we are using samples to estimate the population, we need to evaluate
how well the population parameters are estimated by the sample
parameters. To conduct hypothesis testing on sample parameters, we need
to know the sample parameter distribution.</p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="1ae2">According
to the central limit theorem, when the sample size is large enough, the
sample distribution of β is normal distribution:</p><figure class="ib ic id ie if ig dc dd paragraph-image"><div class="ih ii fs ij ai"><div class="dc dd mg"><div class="ip r fs iq"><div class="mh is r"><div class="ik il s t u im ai br in io"><img alt="Image for post" class="s t u im ai it iu ap xy" height="177" src="download/1ruvZ0Xc2hJxg7BfdIpIj3w.png" width="988"/></div><img alt="Image for post" class="sy xt s t u im ai iw" height="177" sizes="700px" src="download/1ruvZ0Xc2hJxg7BfdIpIj3w.png" srcset="https://miro.medium.com/max/414/1*ruvZ0Xc2hJxg7BfdIpIj3w.png 276w, https://miro.medium.com/max/828/1*ruvZ0Xc2hJxg7BfdIpIj3w.png 552w, https://miro.medium.com/max/960/1*ruvZ0Xc2hJxg7BfdIpIj3w.png 640w, https://miro.medium.com/max/1050/1*ruvZ0Xc2hJxg7BfdIpIj3w.png 700w" width="988"/></div></div></div></div></figure><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="c801">However,
we do not know the exact population residual variance (σ²). We can use the sample residual variance (σ̂²) to estimate it, but then the sampling distribution of β is no longer a normal
distribution. It becomes t distribution instead:</p><figure class="ib ic id ie if ig dc dd paragraph-image"><div class="ih ii fs ij ai"><div class="dc dd mi"><div class="ip r fs iq"><div class="mj is r"><div class="ik il s t u im ai br in io"><img alt="Image for post" class="s t u im ai it iu ap xy" height="209" src="download/1QQD_uLVv_rpwwdF_0ooXOA.png" width="963"/></div><img alt="Image for post" class="sy xt s t u im ai iw" height="209" sizes="700px" src="download/1QQD_uLVv_rpwwdF_0ooXOA.png" srcset="https://miro.medium.com/max/414/1*QQD_uLVv_rpwwdF_0ooXOA.png 276w, https://miro.medium.com/max/828/1*QQD_uLVv_rpwwdF_0ooXOA.png 552w, https://miro.medium.com/max/960/1*QQD_uLVv_rpwwdF_0ooXOA.png 640w, https://miro.medium.com/max/1050/1*QQD_uLVv_rpwwdF_0ooXOA.png 700w" width="963"/></div></div></div></div></figure><figure class="ib ic id ie if ig dc dd paragraph-image"><div class="ih ii fs ij ai"><div class="dc dd mk"><div class="ip r fs iq"><div class="ml is r"><div class="ik il s t u im ai br in io"><img alt="Image for post" class="s t u im ai it iu ap xy" height="882" src="download/1jJKjgT5ugFYy9CbY1p6iEQ.png" width="2100"/></div><img alt="Image for post" class="sy xt s t u im ai iw" height="882" sizes="700px" src="download/1jJKjgT5ugFYy9CbY1p6iEQ.png" srcset="https://miro.medium.com/max/414/1*jJKjgT5ugFYy9CbY1p6iEQ.png 276w, https://miro.medium.com/max/828/1*jJKjgT5ugFYy9CbY1p6iEQ.png 552w, https://miro.medium.com/max/960/1*jJKjgT5ugFYy9CbY1p6iEQ.png 640w, https://miro.medium.com/max/1050/1*jJKjgT5ugFYy9CbY1p6iEQ.png 700w" width="2100"/></div></div></div></div><figcaption class="ix iy de dc dd iz ja ar cl gg at aw" data-selectable-paragraph="">Sample
distribution of β follows a t distribution, because we do not know the exact population residual variance. The standard error is the standard deviation of the sample parameter.</figcaption></figure><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="faa2"><strong class="je jq">3.3 How To Interpret OLS Statistical Summary?</strong></p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="fa17">Now it is time to come back to the OLS Regression Results table and try to interpret the summary results.</p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="5e44">The
first section of the summary table has R² and the F-statistic, which measure the overall explanatory power of the independent variables over the dependent variable.</p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="0c6d">R²
is the explained sum of squares divided by the total sum of squares. R² lies between 0 and 1, and a larger R² indicates that the dependent variable is better explained by the independent variables.
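</p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="">As a quick sketch of this ratio, R² and adjusted R² can be computed directly from the residuals; the data below are synthetic and illustrative, not the AAPL/SPY series:</p>

```python
import numpy as np

# Illustrative synthetic regression data (assumed, not AAPL/SPY)
rng = np.random.default_rng(1)
x = np.linspace(0, 10, 50)
y = 1 + 2 * x + rng.normal(scale=1.0, size=x.size)

# Closed-form OLS fit with a single regressor
beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
alpha = y.mean() - beta * x.mean()
resid = y - (alpha + beta * x)

n, k = x.size, 1                          # k = number of independent variables
ss_res = np.sum(resid ** 2)               # residual sum of squares
ss_tot = np.sum((y - y.mean()) ** 2)      # total sum of squares
r2 = 1 - ss_res / ss_tot                  # share of variation explained
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)  # penalizes extra regressors
print(r2, adj_r2)
```

<p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="">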
With more independent variables, the resulting R² will be closer to 1, but at the same time more independent variables may result in overfitting. Adjusted R² prefers fewer independent variables by
penalizing the excess independent variables.</p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="efa4">F
statistic tests the joint significance of the independent variables. A low p-value of the F test indicates that the independent variables jointly explain the dependent variable well, i.e., we reject the null hypothesis that all the coefficients are zero.</p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="0cd7">The
second section of the summary table is the t-statistic, which tests the significance of each independent variable. Using the F-statistic and t-statistics together helps to check whether there is collinearity among the independent variables: a good F-statistic with poor t-statistics indicates collinearity.</p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="ed5e">Durbin-Watson
and Jarque-Bera, reported in the third section of the summary table, measure the autocorrelation and normality of the residual term, respectively; both will be discussed in detail in the following sections.</p><h1 class="kb kc as ar kd ke lb kg kh lc kj kk ld km kn le kp kq lf ks kt ev" data-selectable-paragraph="" id="06c8">4. Linear Regression Residual</h1><p class="jc jd as je b ex ku jg fa kv ji jj kw ff jl kx fi jn ky fl jp dk ev" data-selectable-paragraph="" id="52af">The
residual term is important. By checking whether the Gauss-Markov assumptions are fulfilled using the residual term, we can infer the
quality of the linear regression.</p><figure class="ib ic id ie if ig dc dd paragraph-image"><div class="ih ii fs ij ai"><div class="dc dd mm"><div class="ip r fs iq"><div class="mn is r"><div class="ik il s t u im ai br in io"><img alt="Image for post" class="s t u im ai it iu ap xy" height="1060" src="download/13sd1TlhWfGSt-f4wKsURIw.png" width="2171"/></div><img alt="Image for post" class="sy xt s t u im ai iw" height="1060" sizes="700px" src="download/13sd1TlhWfGSt-f4wKsURIw.png" srcset="https://miro.medium.com/max/414/1*3sd1TlhWfGSt-f4wKsURIw.png 276w, https://miro.medium.com/max/828/1*3sd1TlhWfGSt-f4wKsURIw.png 552w, https://miro.medium.com/max/960/1*3sd1TlhWfGSt-f4wKsURIw.png 640w, https://miro.medium.com/max/1050/1*3sd1TlhWfGSt-f4wKsURIw.png 700w" width="2171"/></div></div></div></div><figcaption class="ix iy de dc dd iz ja ar cl gg at aw" data-selectable-paragraph="">Expected value of sample β</figcaption></figure><figure class="ib ic id ie if ig dc dd paragraph-image"><div class="ih ii fs ij ai"><div class="dc dd mo"><div class="ip r fs iq"><div class="mp is r"><div class="ik il s t u im ai br in io"><img alt="Image for post" class="s t u im ai it iu ap xy" height="855" src="download/1-q3Je4RyUrwGAe0zLe7unA.png" width="2255"/></div><img alt="Image for post" class="sy xt s t u im ai iw" height="855" sizes="700px" src="download/1-q3Je4RyUrwGAe0zLe7unA.png" srcset="https://miro.medium.com/max/414/1*-q3Je4RyUrwGAe0zLe7unA.png 276w, https://miro.medium.com/max/828/1*-q3Je4RyUrwGAe0zLe7unA.png 552w, https://miro.medium.com/max/960/1*-q3Je4RyUrwGAe0zLe7unA.png 640w, https://miro.medium.com/max/1050/1*-q3Je4RyUrwGAe0zLe7unA.png 700w" width="2255"/></div></div></div></div><figcaption class="ix iy de dc dd iz ja ar cl gg at aw" data-selectable-paragraph="">Variance of sample β</figcaption></figure><figure class="ib ic id ie if ig dc dd paragraph-image"><div class="dc dd mc"><div class="ip r fs iq"><div class="md is r"><div class="ik il s t u im ai br in io"><img 
alt="Image for post" class="s t u im ai it iu ap xy" height="288" src="download/1QEA1QMyqLKsxVuFKyz-BZA.png" width="432"/></div><img alt="Image for post" class="sy xt s t u im ai iw" height="288" sizes="432px" src="download/1QEA1QMyqLKsxVuFKyz-BZA.png" srcset="https://miro.medium.com/max/414/1*QEA1QMyqLKsxVuFKyz-BZA.png 276w, https://miro.medium.com/max/648/1*QEA1QMyqLKsxVuFKyz-BZA.png 432w" width="432"/></div></div></div></figure><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="cdc0"><strong class="je jq">4.1 Normality test</strong></p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="fc17">It
is important to test whether the residuals are normally distributed. If they are not, the residuals should not be used for the z test or any other test derived from the normal distribution, such as the t
test, F test and chi2 test.</p><figure class="ib ic id ie if ig dc dd paragraph-image"><div class="ih ii fs ij ai"><div class="dc dd mq"><div class="ip r fs iq"><div class="mr is r"><div class="ik il s t u im ai br in io"><img alt="Image for post" class="s t u im ai it iu ap xy" height="861" src="download/1dXgVdSaG6i-_LNx8oHJnNw.png" width="2114"/></div><img alt="Image for post" class="sy xt s t u im ai iw" height="861" sizes="700px" src="download/1dXgVdSaG6i-_LNx8oHJnNw.png" srcset="https://miro.medium.com/max/414/1*dXgVdSaG6i-_LNx8oHJnNw.png 276w, https://miro.medium.com/max/828/1*dXgVdSaG6i-_LNx8oHJnNw.png 552w, https://miro.medium.com/max/960/1*dXgVdSaG6i-_LNx8oHJnNw.png 640w, https://miro.medium.com/max/1050/1*dXgVdSaG6i-_LNx8oHJnNw.png 700w" width="2114"/></div></div></div></div></figure><pre class="ib ic id ie if lq lr ca"><span class="ev ls kc as lt b gg lu lv r lw" data-selectable-paragraph="" id="3a9c">import pandas as pd<br/>import statsmodels.api as sm<br/>from scipy import stats</span><span class="ev ls kc as lt b gg lx ly lz ma mb lv r lw" data-selectable-paragraph="" id="0fe1">AAPL_price = pd.read_csv('AAPL.csv',usecols=['Date', 'Close'])<br/>SPY_price = pd.read_csv('SPY.csv',usecols=['Date', 'Close'])</span><span class="ev ls kc as lt b gg lx ly lz ma mb lv r lw" data-selectable-paragraph="" id="4828">X = sm.add_constant(SPY_price['Close'])<br/>model = sm.OLS(AAPL_price['Close'],X)<br/>results = model.fit()</span><span class="ev ls kc as lt b gg lx ly lz ma mb lv r lw" data-selectable-paragraph="" id="92df">residual = AAPL_price['Close']-results.params[0] - results.params[1]*SPY_price['Close']</span><span class="ev ls kc as lt b gg lx ly lz ma mb lv r lw" data-selectable-paragraph="" id="54b0">print('p value of Jarque-Bera test is: ', stats.jarque_bera(residual)[1])<br/>print('p value of Shapiro-Wilk test is: ', stats.shapiro(residual)[1])<br/>print('p value of Kolmogorov-Smirnov test is: ', stats.kstest(residual, 'norm')[1])</span></pre><p 
class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="ba4b">Output:</p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="10a2">p value of Jarque-Bera test is: 0.0<br/>p value of Shapiro-Wilk test is: 9.164991873555915e-20<br/>p value of Kolmogorov-Smirnov test is: 1.1324826980654097e-55</p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="1f67">If
we choose a significance level of 0.05, then all three normality tests indicate that the residual term does not follow a normal distribution.</p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="2bd2"><strong class="je jq">4.2 Homogeneity test</strong></p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="ab14">Three
commonly used statistical tests for heteroscedasticity are the Goldfeld-Quandt, Breusch-Pagan, and White tests; in that order, each tests against a progressively more general form of heteroscedasticity.</p><figure class="ib ic id ie if ig dc dd paragraph-image"><div class="ih ii fs ij ai"><div class="dc dd ms"><div class="ip r fs iq"><div class="mt is r"><div class="ik il s t u im ai br in io"><img alt="Image for post" class="s t u im ai it iu ap xy" height="736" src="download/1Jc9PDg3u1D6nwUxghs1CjA.png" width="2153"/></div><img alt="Image for post" class="sy xt s t u im ai iw" height="736" sizes="700px" src="download/1Jc9PDg3u1D6nwUxghs1CjA.png" srcset="https://miro.medium.com/max/414/1*Jc9PDg3u1D6nwUxghs1CjA.png 276w, https://miro.medium.com/max/828/1*Jc9PDg3u1D6nwUxghs1CjA.png 552w, https://miro.medium.com/max/960/1*Jc9PDg3u1D6nwUxghs1CjA.png 640w, https://miro.medium.com/max/1050/1*Jc9PDg3u1D6nwUxghs1CjA.png 700w" width="2153"/></div></div></div></div></figure><pre class="ib ic id ie if lq lr ca"><span class="ev ls kc as lt b gg lu lv r lw" data-selectable-paragraph="" id="f630">import numpy as np<br/>import pandas as pd<br/>import matplotlib.pyplot as plt<br/>import statsmodels.api as sm<br/>import statsmodels.stats.api as sms</span><span class="ev ls kc as lt b gg lx ly lz ma mb lv r lw" data-selectable-paragraph="" id="715b">AAPL_price = pd.read_csv('AAPL.csv',usecols=['Date', 'Close'])<br/>SPY_price = pd.read_csv('SPY.csv',usecols=['Date', 'Close'])</span><span class="ev ls kc as lt b gg lx ly lz ma mb lv r lw" data-selectable-paragraph="" id="b4f6">X = sm.add_constant(SPY_price['Close'])<br/>model = sm.OLS(AAPL_price['Close'],X)<br/>results = model.fit()</span><span class="ev ls kc as lt b gg lx ly lz ma mb lv r lw" data-selectable-paragraph="" id="79f4">residual = AAPL_price['Close']-results.params[0] - results.params[1]*SPY_price['Close']</span><span class="ev ls kc as lt b gg lx ly lz ma mb lv r lw" data-selectable-paragraph="" id="60be">print('p value of Goldfeld–Quandt test is: ', sms.het_goldfeldquandt(results.resid, results.model.exog)[1])<br/>print('p value of Breusch–Pagan test is: ', 
sms.het_breuschpagan(results.resid, results.model.exog)[1])<br/>print('p value of White test is: ', sms.het_white(results.resid, results.model.exog)[1])</span></pre><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="9a48">Output is:</p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="036b">p value of Goldfeld–Quandt test is: 2.3805273535080445e-38<br/>p value of Breusch–Pagan test is: 2.599557770260936e-06<br/>p value of White test is: 1.0987132773425074e-22</p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="2516">If
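The idea behind the Breusch-Pagan test can be sketched without statsmodels: regress the squared residuals on a constant and the explanatory variable, and use LM = n·R², which is approximately chi-squared with one degree of freedom here under homoscedasticity. A minimal numpy sketch on synthetic data (not the AAPL/SPY series):

```python
import numpy as np

def breusch_pagan_lm(resid, x):
    """LM statistic of the Breusch-Pagan test: n * R^2 from
    regressing the squared residuals on a constant and x."""
    n = len(resid)
    X = np.column_stack([np.ones(n), x])
    y = resid**2
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    ss_res = ((y - fitted)**2).sum()
    ss_tot = ((y - y.mean())**2).sum()
    return n * (1.0 - ss_res / ss_tot)

rng = np.random.default_rng(1)
x = rng.uniform(1.0, 10.0, size=2000)
homo = rng.normal(scale=1.0, size=2000)   # constant variance
hetero = rng.normal(size=2000) * x        # variance grows with x

# Under homoscedasticity, LM ~ chi2(1); the 5% critical value is ~3.84.
print(breusch_pagan_lm(homo, x))    # small: homoscedasticity not rejected
print(breusch_pagan_lm(hetero, x))  # large: heteroscedasticity detected
```

statsmodels' `sms.het_breuschpagan` used above returns this LM statistic, its p-value, and an F-test variant.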
we choose a significance level of 0.05, then all three heteroscedasticity tests indicate that the residual term is heteroscedastic.</p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="5ba3"><strong class="je jq">4.3 Autocorrelation test</strong></p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="7a1d">The Durbin-Watson
test detects autocorrelation of the residual term at lag 1, while the Breusch-Godfrey test detects autocorrelation of the residual term up to a
lag of N, depending on the setting in the test.</p><pre class="ib ic id ie if lq lr ca"><span class="ev ls kc as lt b gg lu lv r lw" data-selectable-paragraph="" id="ab44">import numpy as np<br/>import pandas as pd<br/>import matplotlib.pyplot as plt<br/>import statsmodels.api as sm</span><span class="ev ls kc as lt b gg lx ly lz ma mb lv r lw" data-selectable-paragraph="" id="311b">AAPL_price = pd.read_csv('AAPL.csv',usecols=['Date', 'Close'])<br/>SPY_price = pd.read_csv('SPY.csv',usecols=['Date', 'Close'])</span><span class="ev ls kc as lt b gg lx ly lz ma mb lv r lw" data-selectable-paragraph="" id="de9e">X = sm.add_constant(SPY_price['Close'])<br/>model = sm.OLS(AAPL_price['Close'],X)<br/>results = model.fit()</span><span class="ev ls kc as lt b gg lx ly lz ma mb lv r lw" data-selectable-paragraph="" id="3736">import statsmodels.stats.api as sms<br/>print('The Durbin-Watson statistic is: ', sms.durbin_watson(results.resid))<br/>print('p value of Breusch-Godfrey test is: ', sms.acorr_breusch_godfrey(results,nlags=1)[3])</span></pre><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="6ab9">Output:</p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="d3e8">The Durbin-Watson statistic is: 0.06916423461968918<br/>p value of Breusch-Godfrey test is: 4.646673126097712e-150</p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="8fbb">Both
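The Durbin-Watson statistic itself is easy to compute by hand: it is the sum of squared first differences of the residuals divided by their sum of squares. A minimal numpy sketch on synthetic residuals (not the regression residuals above):

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum of squared first differences / sum of squares.
    ~2 for uncorrelated residuals, toward 0 for positive
    autocorrelation, toward 4 for negative autocorrelation."""
    resid = np.asarray(resid, dtype=float)
    return (np.diff(resid)**2).sum() / (resid**2).sum()

rng = np.random.default_rng(42)
white_noise = rng.normal(size=1000)   # no autocorrelation
random_walk = white_noise.cumsum()    # strong positive autocorrelation

print(durbin_watson(white_noise))   # close to 2
print(durbin_watson(random_walk))   # close to 0
```

statsmodels' `sms.durbin_watson`, used in the snippet above, computes the same quantity.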
the Durbin-Watson and Breusch-Godfrey tests indicate that there is autocorrelation of the residual term at lag 1. A Durbin-Watson statistic of 2 means no autocorrelation; the closer the statistic is to 0, the stronger the positive autocorrelation (values toward 4 would indicate negative autocorrelation).</p><h1 class="kb kc as ar kd ke lb kg kh lc kj kk ld km kn le kp kq lf ks kt ev" data-selectable-paragraph="" id="59cc">5. Solving Violations of Gauss-Markov Assumptions</h1><p class="jc jd as je b ex ku jg fa kv ji jj kw ff jl kx fi jn ky fl jp dk ev" data-selectable-paragraph="" id="286e"><strong class="je jq">5.1 Violation of Gauss-Markov Assumptions</strong></p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="8acc">When
the Gauss-Markov assumptions are violated, the estimators calculated from the samples are no longer BLUE. The following table shows how violation of the Gauss-Markov assumptions affects the linear regression
quality.</p><figure class="ib ic id ie if ig dc dd paragraph-image"><div class="ih ii fs ij ai"><div class="dc dd mu"><div class="ip r fs iq"><div class="mv is r"><div class="ik il s t u im ai br in io"><img alt="Image for post" class="s t u im ai it iu ap xy" height="638" src="download/1z1Cz1U_AozDit32HkN4jAg.png" width="2488"/></div><img alt="Image for post" class="sy xt s t u im ai iw" height="638" sizes="700px" src="download/1z1Cz1U_AozDit32HkN4jAg.png" srcset="https://miro.medium.com/max/414/1*z1Cz1U_AozDit32HkN4jAg.png 276w, https://miro.medium.com/max/828/1*z1Cz1U_AozDit32HkN4jAg.png 552w, https://miro.medium.com/max/960/1*z1Cz1U_AozDit32HkN4jAg.png 640w, https://miro.medium.com/max/1050/1*z1Cz1U_AozDit32HkN4jAg.png 700w" width="2488"/></div></div></div></div></figure><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="51de"><strong class="je jq">5.2 Weighted Least Squares (WLS)</strong></p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="d9fb">To
account for heteroscedastic error, Weighted Least Squares (WLS) can be
used. WLS transforms the independent variable and the dependent
variable, so that OLS remains BLUE after the transformation.</p><figure class="ib ic id ie if ig dc dd paragraph-image"><div class="ih ii fs ij ai"><div class="dc dd mw"><div class="ip r fs iq"><div class="mx is r"><div class="ik il s t u im ai br in io"><img alt="Image for post" class="s t u im ai it iu ap xy" height="438" src="download/1k_F7OxRdKaYoB393OCPqHQ.png" width="1830"/></div><img alt="Image for post" class="sy xt s t u im ai iw" height="438" sizes="700px" src="download/1k_F7OxRdKaYoB393OCPqHQ.png" srcset="https://miro.medium.com/max/414/1*k_F7OxRdKaYoB393OCPqHQ.png 276w, https://miro.medium.com/max/828/1*k_F7OxRdKaYoB393OCPqHQ.png 552w, https://miro.medium.com/max/960/1*k_F7OxRdKaYoB393OCPqHQ.png 640w, https://miro.medium.com/max/1050/1*k_F7OxRdKaYoB393OCPqHQ.png 700w" width="1830"/></div></div></div></div></figure><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="ac61"><strong class="je jq">5.3 Generalized Least Squares (GLS)</strong></p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="63e0">To
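The WLS transformation can be made concrete with a small sketch: assuming the error standard deviation is proportional to x, dividing the dependent variable, the constant, and the regressor all by x yields homoscedastic errors, so plain OLS on the transformed variables is again efficient. Synthetic data, numpy only (statsmodels' sm.WLS with weights=1/x**2 performs the equivalent computation):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 5000
x = rng.uniform(1.0, 10.0, size=n)
# True model: y = 2 + 3x + e, with sd(e) proportional to x (heteroscedastic).
y = 2.0 + 3.0 * x + rng.normal(size=n) * x

# WLS as OLS on transformed variables: divide y, the constant, and x by x.
# The model becomes y/x = b0*(1/x) + b1 + u, where u = e/x is homoscedastic.
X_t = np.column_stack([1.0 / x, np.ones(n)])
y_t = y / x
(b0, b1), *_ = np.linalg.lstsq(X_t, y_t, rcond=None)

print(b0, b1)   # close to the true intercept 2 and slope 3
```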
account for both heteroscedastic error and serial correlated error,
Generalized Least Squares (GLS) can be used. GLS transforms the
independent variable and the dependent variable in a more complex way
than WLS, so that OLS remains BLUE after the transformation.</p><figure class="ib ic id ie if ig dc dd paragraph-image"><div class="ih ii fs ij ai"><div class="dc dd my"><div class="ip r fs iq"><div class="mz is r"><div class="ik il s t u im ai br in io"><img alt="Image for post" class="s t u im ai it iu ap xy" height="497" src="download/1LMmh1bZmxm-4MqRLUMgomA.png" width="1708"/></div><img alt="Image for post" class="sy xt s t u im ai iw" height="497" sizes="700px" src="download/1LMmh1bZmxm-4MqRLUMgomA.png" srcset="https://miro.medium.com/max/414/1*LMmh1bZmxm-4MqRLUMgomA.png 276w, https://miro.medium.com/max/828/1*LMmh1bZmxm-4MqRLUMgomA.png 552w, https://miro.medium.com/max/960/1*LMmh1bZmxm-4MqRLUMgomA.png 640w, https://miro.medium.com/max/1050/1*LMmh1bZmxm-4MqRLUMgomA.png 700w" width="1708"/></div></div></div></div></figure></div></div></section><hr class="jr cl js jt ju jv iy jw jx jy jz ka"/><section class="dk dl dm dn do"><div class="n p"><div class="z ab ac ae af dp ah ai"><h1 class="kb kc as ar kd ke kf kg kh ki kj kk kl km kn ko kp kq kr ks kt ev" data-selectable-paragraph="" id="0d42">Summary</h1><p class="jc jd as je b ex ku jg fa kv ji jj kw ff jl kx fi jn ky fl jp dk ev" data-selectable-paragraph="" id="a251">In
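A minimal sketch of the GLS idea for AR(1) errors, in the Cochrane-Orcutt style (synthetic data, numpy only; statsmodels offers this as GLSAR): estimate rho from the OLS residuals, quasi-difference both sides of the regression, and rerun OLS on the transformed series.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3000
x = rng.normal(size=n)
# AR(1) errors with rho = 0.8: e_t = 0.8*e_{t-1} + v_t.
e = np.zeros(n)
v = rng.normal(size=n)
for t in range(1, n):
    e[t] = 0.8 * e[t - 1] + v[t]
y = 1.0 + 2.0 * x + e

# Step 1: OLS, then estimate rho from the lag-1 residual autocorrelation.
X = np.column_stack([np.ones(n), x])
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
r = y - X @ beta_ols
rho = (r[1:] * r[:-1]).sum() / (r[:-1]**2).sum()

# Step 2: quasi-difference both sides and rerun OLS (the GLS transform).
y_t = y[1:] - rho * y[:-1]
X_t = np.column_stack([np.ones(n - 1) * (1 - rho), x[1:] - rho * x[:-1]])
beta_gls, *_ = np.linalg.lstsq(X_t, y_t, rcond=None)

print(rho)        # close to 0.8
print(beta_gls)   # close to the true (1, 2)
```

The transformed errors v_t are uncorrelated and homoscedastic, which is why OLS on the transformed series is BLUE again.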
this post, we learnt that OLS generates good estimators only when the Gauss-Markov assumptions are fulfilled. Thus, after linear regression, it is always important to check the residual terms to ensure that the Gauss-Markov assumptions are not violated. Luckily, with the statsmodels library in Python, many statistical tests are conducted automatically during linear regression: a simple print of the OLS regression summary table lets us quickly evaluate the quality of the fit. If the Gauss-Markov assumptions are violated, WLS and GLS are available to transform the independent variable and dependent variable so that OLS
remains BLUE.</p><p class="jc jd as je b ex jf jg fa jh ji jj jk ff jl jm fi jn jo fl jp dk ev" data-selectable-paragraph="" id="263d">Hope you have enjoyed learning time series data modeling using linear regression!</p></div></div></section></div></article><!--EndFragment-->
</body>
</html>