This project is joint work with Raphael Thabut and Eric Vong. It is a practical implementation of *A closed-form filter for binary time series* by Fasano et al.
Article presentation
Denoising data is often necessary: in practice, many observations are corrupted by external effects. Most filtering methods for prediction and smoothing rely on the assumption that the state-space model is Gaussian, because closed-form filters are then available. Even for a binary state-space model, Monte Carlo strategies are usually required for prediction.
The article shows that dynamic probit models with a Gaussian state variable admit a closed-form filter. To do so, the authors prove that the filtering distributions belong to a class of distributions, the unified skew-normal (SUN), whose parameters can be updated with closed-form expressions.
They then show how the algorithm performs in practice, and how the properties of the SUN distribution allow for online inference in high dimensions.
Zoom on the financial application
The article gives an application example using financial time series. It focuses on the analysis of two series: the Nikkei 225 (a Japanese stock index) and the CAC 40 (a French stock index).
As the latter opens after the former, one can try to exploit the cointegration between the two series to obtain relevant predictions. Our data consist of:
- $x_t$, where $x_t = 1$ if the Nikkei 225 opening value increased relative to its close value, and $x_t = 0$ otherwise
- $y_t$, where $y_t = 1$ if the CAC 40 opening value increased relative to its close value, and $y_t = 0$ otherwise
One important objective for a financial application would be to evaluate $\Pr(y_t = 1 \mid x_t)$.
This can be done using a probit model. We infer a binary value from two parameters: the trend of the CAC 40 when the Nikkei 225 opening is negative, and the shift in that trend when the Nikkei 225 opening is positive. Mathematically, this means:
$$\Pr(y_t = 1 \mid x_t) = \Phi(F_t\theta;\, 1)$$

with the following notation:
- $F_t = (1, x_t)$
- $\theta = (\theta_1, \theta_2)$, $\theta_1$ being the trend and $\theta_2$ the shift
For such an application, the purpose of the methods described in this report is to estimate the parameter $\theta$ effectively.
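As a quick illustration, this probit map is straightforward to evaluate. A minimal sketch, where the value of $\theta$ is purely illustrative (it roughly matches the estimates obtained later in the report):

```python
# Evaluating P(y_t = 1 | x_t) = Phi(F_t theta) for the probit model above.
# The value of theta is illustrative only, not an output of the method.
import numpy as np
from scipy.stats import norm

theta = np.array([-0.5, 1.0])  # (theta_1: trend, theta_2: shift)

def prob_up(x_t: int) -> float:
    """P(y_t = 1 | x_t) with F_t = (1, x_t)."""
    F_t = np.array([1.0, x_t])
    return norm.cdf(F_t @ theta)

print(prob_up(0))  # P(CAC 40 up | Nikkei 225 down), about 0.31
print(prob_up(1))  # P(CAC 40 up | Nikkei 225 up), about 0.69
```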
The SUN distribution
Gaussian models are omnipresent in mathematical modeling. They have very strong properties and keep models simple. However, one important drawback is that real-life distributions tend to exhibit some asymmetry.
A first extension of Gaussian models, the skew-normal, was proposed by Azzalini and Dalla Valle (1996). Given the many variants this formulation inspired, a unifying representation, the unified skew-normal (SUN), was proposed by Arellano-Valle and Azzalini (2006).
Given $\theta \in \mathbb{R}^q$, we say that $\theta$ has a unified skew-normal distribution, denoted $\theta \sim \mathrm{SUN}_{q,h}(\xi, \Omega, \Delta, \gamma, \Gamma)$, if its density function $p(\theta)$ is of the form:

$$\phi_q(\theta - \xi;\, \Omega)\, \frac{\Phi_h\!\big(\gamma + \Delta^T\bar{\Omega}^{-1}\omega^{-1}(\theta - \xi);\, \Gamma - \Delta^T\bar{\Omega}^{-1}\Delta\big)}{\Phi_h(\gamma;\, \Gamma)}$$

with:
- $\phi_q(\theta; \Omega)$, where $\phi_q$ is the centered Gaussian density with covariance matrix $\Omega$
- $\Phi_h(\gamma; \Gamma)$, where $\Phi_h$ is the centered Gaussian cumulative distribution function with covariance matrix $\Gamma$
- $\Omega = \omega\bar{\Omega}\omega$, where $\bar{\Omega}$ is the correlation matrix
- $\xi$ controls the location
- $\omega$ controls the scale
- $\Delta$ defines the dependence between the covariance matrices $\Gamma$ and $\bar{\Omega}$
- $\gamma$ controls the truncation
The term:

$$\frac{\Phi_h\!\big(\gamma + \Delta^T\bar{\Omega}^{-1}\omega^{-1}(\theta - \xi);\, \Gamma - \Delta^T\bar{\Omega}^{-1}\Delta\big)}{\Phi_h(\gamma;\, \Gamma)}$$

induces the skewness, and is rescaled by the normalizing term $\Phi_h(\gamma; \Gamma)$.
We also have the following equality in distribution: $\theta \overset{d}{=} \xi + \omega(U_0 + \Delta\Gamma^{-1}U_1)$, with $U_0$ independent of $U_1$, where $U_0 \sim N_q(0, \bar{\Omega} - \Delta\Gamma^{-1}\Delta^T)$ and $U_1 \sim \mathrm{TN}_h(0, \Gamma, -\gamma)$, a normal distribution truncated below $-\gamma$. This representation allows for efficient computation, and it also applies to the probit model.
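This representation can be turned into a sampler directly. A minimal sketch for $h = 1$, with illustrative parameter values (all variable names are ours):

```python
# Sampling theta = xi + omega (U0 + Delta Gamma^{-1} U1) for a SUN_{q,1}
# distribution; parameter values are illustrative only.
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(0)

q = 2
xi = np.zeros(q)                   # location
omega = np.eye(q)                  # scale (diagonal)
Omega_bar = np.eye(q)              # correlation matrix
Delta = np.array([[0.7], [0.3]])   # dependence/skewness (q x h, h = 1)
gamma = np.array([0.0])            # truncation
Gamma = np.array([[1.0]])          # (h x h)

def rvs_sun(n):
    # U0 ~ N_q(0, Omega_bar - Delta Gamma^{-1} Delta^T)
    cov0 = Omega_bar - Delta @ np.linalg.solve(Gamma, Delta.T)
    U0 = rng.multivariate_normal(np.zeros(q), cov0, size=n)
    # U1 ~ N_1(0, Gamma) truncated below -gamma
    sd = np.sqrt(Gamma[0, 0])
    U1 = truncnorm.rvs(-gamma[0] / sd, np.inf, scale=sd, size=n, random_state=rng)
    lin = (Delta @ np.linalg.solve(Gamma, U1[None, :])).T  # rows: Delta Gamma^{-1} U1
    return xi + (U0 + lin) @ omega.T

samples = rvs_sun(10_000)  # skewed in the direction of Delta
```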
Let $y_t = (y_{1t}, \dots, y_{mt})^T \in \{0,1\}^m$ be a binary vector and $\theta_t = (\theta_{1t}, \dots, \theta_{pt})^T \in \mathbb{R}^p$ the vector of parameters. The dynamic probit model is then defined as:
$$p(y_t \mid \theta_t) = \Phi_m(B_t F_t \theta_t;\, B_t V_t B_t), \qquad \theta_t = G_t\theta_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim N_p(0, W_t)$$

with $\theta_0 \sim N_p(a_0, P_0)$ and $B_t = \mathrm{diag}(2y_{1t} - 1, \dots, 2y_{mt} - 1)$.
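To test the filters on synthetic data, one can simulate from this model directly. A minimal sketch for $m = 1$, with the dimensions and values used later in the report:

```python
# Simulating a trajectory from the dynamic probit model (m = 1).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

T, p = 100, 2
G = np.eye(p)            # G_t: random-walk state dynamics
W = 0.01 * np.eye(p)     # W_t: state noise covariance
V = 1.0                  # V_t: observation noise variance

theta = rng.multivariate_normal(np.zeros(p), 3 * np.eye(p))  # theta_0 ~ N_p(a_0, P_0)
x = rng.integers(0, 2, size=T)    # exogenous binary covariate (Nikkei 225 sign)
y = np.empty(T, dtype=int)
for t in range(T):
    theta = G @ theta + rng.multivariate_normal(np.zeros(p), W)
    F_t = np.array([1.0, x[t]])
    y[t] = int(rng.random() < norm.cdf(F_t @ theta / np.sqrt(V)))  # P(y=1) = Phi(F theta)
```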
This representation is equivalent to the following formulation (Albert and Chib 1993):
$$p(z_t \mid \theta_t) = \phi_m(z_t - F_t\theta_t;\, V_t), \qquad \theta_t = G_t\theta_{t-1} + \varepsilon_t, \quad \varepsilon_t \sim N_p(0, W_t)$$

with $y_{it} = \mathbb{1}(z_{it} > 0)$.
Then, under these dynamics, the first-step filtering distribution is:

$$(\theta_1 \mid y_1) \sim \mathrm{SUN}_{p,m}(\xi_{1|1}, \Omega_{1|1}, \Delta_{1|1}, \gamma_{1|1}, \Gamma_{1|1})$$

with the following parameters:
- $\xi_{1|1} = G_1 a_0$
- $\Omega_{1|1} = G_1 P_0 G_1^T + W_1$
- $\Delta_{1|1} = \bar{\Omega}_{1|1}\,\omega_{1|1} F_1^T B_1 s_1^{-1}$
- $\gamma_{1|1} = s_1^{-1} B_1 F_1 \xi_{1|1}$
- $\Gamma_{1|1} = s_1^{-1} B_1 (F_1 \Omega_{1|1} F_1^T + V_1) B_1 s_1^{-1}$
- $s_1 = (\mathrm{diag}(F_1 \Omega_{1|1} F_1^T + V_1))^{1/2}$
More generally, if $(\theta_{t-1} \mid y_{1:t-1}) \sim \mathrm{SUN}_{p,\,m(t-1)}(\xi_{t-1|t-1}, \Omega_{t-1|t-1}, \Delta_{t-1|t-1}, \gamma_{t-1|t-1}, \Gamma_{t-1|t-1})$ is the filtering distribution at $t-1$, then the predictive distribution at $t$ is $(\theta_t \mid y_{1:t-1}) \sim \mathrm{SUN}_{p,\,m(t-1)}(\xi_{t|t-1}, \Omega_{t|t-1}, \Delta_{t|t-1}, \gamma_{t|t-1}, \Gamma_{t|t-1})$ with the following parameters:
- $\xi_{t|t-1} = G_t \xi_{t-1|t-1}$
- $\Omega_{t|t-1} = G_t \Omega_{t-1|t-1} G_t^T + W_t$
- $\Delta_{t|t-1} = \omega_{t|t-1}^{-1} G_t\, \omega_{t-1|t-1} \Delta_{t-1|t-1}$
- $\gamma_{t|t-1} = \gamma_{t-1|t-1}$
- $\Gamma_{t|t-1} = \Gamma_{t-1|t-1}$
At time $t$, the filtering distribution is then defined by the following updates (a code sketch follows the list):
- $\xi_{t|t} = \xi_{t|t-1}$
- $\Omega_{t|t} = \Omega_{t|t-1}$
- $\Delta_{t|t} = [\Delta_{t|t-1},\ \bar{\Omega}_{t|t}\,\omega_{t|t} F_t^T B_t s_t^{-1}]$
- $\gamma_{t|t} = [\gamma_{t|t-1}^T,\ \xi_{t|t}^T F_t^T B_t s_t^{-T}]^T$
- $\Gamma_{t|t} = \begin{pmatrix} A & B^T \\ B & C \end{pmatrix}$, where
  - $A = \Gamma_{t|t-1}$
  - $B = s_t^{-1} B_t F_t\, \omega_{t|t} \Delta_{t|t-1}$
  - $C = s_t^{-1} B_t (F_t \Omega_{t|t} F_t^T + V_t) B_t s_t^{-1}$
- $s_t = (\mathrm{diag}(F_t \Omega_{t|t} F_t^T + V_t))^{1/2}$
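These recursions translate directly into code. A minimal sketch for $m = 1$ observation per step, in our own notation; initializing with $\xi = a_0$, $\Omega = P_0$ and empty $\Delta$, $\gamma$, $\Gamma$, the first call reproduces the first-step formulas above:

```python
# One prediction + update step of the closed-form SUN filter (m = 1).
# The truncation dimension of the SUN grows by one at each call.
import numpy as np

def sun_filter_step(xi, Omega, Delta, gamma, Gamma, y_t, F_t, G_t, W_t, V_t):
    # --- prediction: parameters of (theta_t | y_{1:t-1}) ---
    omega_prev = np.diag(np.sqrt(np.diag(Omega)))    # omega_{t-1|t-1}
    xi = G_t @ xi
    Omega = G_t @ Omega @ G_t.T + W_t
    omega = np.diag(np.sqrt(np.diag(Omega)))         # omega_{t|t-1} = omega_{t|t}
    omega_inv = np.diag(1.0 / np.diag(omega))
    Delta = omega_inv @ G_t @ omega_prev @ Delta
    # gamma and Gamma are unchanged by the prediction step

    # --- update with the new observation y_t ---
    b = 2.0 * y_t - 1.0                              # B_t (scalar for m = 1)
    s = np.sqrt(F_t @ Omega @ F_t + V_t)             # s_t (scalar for m = 1)
    Omega_bar = omega_inv @ Omega @ omega_inv
    col = (Omega_bar @ omega @ F_t * b / s)[:, None] # new column of Delta
    off = (b / s) * (F_t @ omega @ Delta)            # off-diagonal block B of Gamma
    c = (F_t @ Omega @ F_t + V_t) / s**2             # block C (equal to 1 when m = 1)
    Delta = np.hstack([Delta, col])
    gamma = np.append(gamma, b * (F_t @ xi) / s)
    Gamma = np.block([[Gamma, off[:, None]],
                      [off[None, :], np.array([[c]])]])
    return xi, Omega, Delta, gamma, Gamma
```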
Smoothing distributions are available in a similar closed form.
Particle algorithms
Particle filters are a mathematical tool used to estimate the state of a system described by a set of uncertain variables. A particle filter is a recursive Bayesian filter that estimates the state of a system by sampling a large number of “particles”, each of which represents a possible state. The particles are weighted according to their likelihood of being the true state, and the weighted average of the particles is used as the state estimate. In this section we describe two particle algorithms we used to estimate the parameters of the targeted SUN distribution: the bootstrap filter, and an optimal filter proposed by the authors.
To obtain results comparable to the paper's (detailed in the fourth part of this report), we started with the same parameters and used the properties seen previously:
- $a_0 = (0, 0)^T$
- $P_0 = 3 I_2$
- $V_t = 1$
Bootstrap filter
In a bootstrap filter, particles are propagated through the state dynamics and weighted according to their likelihood of being the true state given the new observation; the next state estimate is then obtained by resampling the particles with replacement according to these weights, so that the same particle may be drawn multiple times.
In the specific case of our study, using the properties and assumptions of the model, we implemented the bootstrap filter as follows.
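A minimal sketch of this filter (our variable names; $G_t = I_2$ and $F_t = (1, x_t)$ as in the application):

```python
# Bootstrap particle filter for the dynamic probit model (m = 1):
# propagate, weight by the probit likelihood, resample with replacement.
import numpy as np
from scipy.stats import norm

def bootstrap_filter(y, x, R=10_000, a0=np.zeros(2), P0=3 * np.eye(2),
                     W=0.01 * np.eye(2), V=1.0, rng=None):
    rng = rng or np.random.default_rng()
    p = len(a0)
    particles = rng.multivariate_normal(a0, P0, size=R)  # draws of theta_0
    means = []
    for t in range(len(y)):
        # propagate through the state equation (G_t = I_2 here)
        particles = particles + rng.multivariate_normal(np.zeros(p), W, size=R)
        # weight by the probit likelihood of y_t
        F_t = np.array([1.0, x[t]])
        w = norm.cdf((2 * y[t] - 1) * particles @ F_t / np.sqrt(V))
        w /= w.sum()
        # multinomial resampling with replacement
        particles = particles[rng.choice(R, size=R, p=w)]
        means.append(particles.mean(axis=0))
    return np.array(means)
```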
Here $W_0$ is taken as $W_0 = 0.01 I_2$ (suggested by a graphical search for the maximum of the marginal likelihood, computed under different combinations via the analytical formula).
Independent and identically distributed sampling
A first filter is proposed by the authors in the article. It uses the main properties of the SUN distribution to obtain independent and identically distributed samples from the filtering distribution.
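A minimal sketch, assuming the SUN parameters of the filtering distribution have already been computed with the closed-form recursion above. We draw the truncated-normal component by naive rejection sampling, which becomes inefficient as the truncation dimension grows; the paper relies on specialized multivariate truncated-normal samplers instead:

```python
# I.i.d. sampling from the SUN filtering distribution via the additive
# representation theta = xi + omega (U0 + Delta Gamma^{-1} U1).
import numpy as np

def rvs_sun_filter(xi, Omega, Delta, gamma, Gamma, n, rng=None):
    rng = rng or np.random.default_rng()
    p, h = Delta.shape
    omega = np.sqrt(np.diag(Omega))                  # scale vector
    Omega_bar = Omega / np.outer(omega, omega)       # correlation matrix
    # U0 ~ N_p(0, Omega_bar - Delta Gamma^{-1} Delta^T)
    cov0 = Omega_bar - Delta @ np.linalg.solve(Gamma, Delta.T)
    U0 = rng.multivariate_normal(np.zeros(p), cov0, size=n)
    # U1 ~ N_h(0, Gamma) conditioned on U1 > -gamma, by rejection sampling
    U1 = np.empty((n, h))
    filled = 0
    while filled < n:
        cand = rng.multivariate_normal(np.zeros(h), Gamma, size=n)
        keep = cand[(cand > -gamma).all(axis=1)]
        take = min(len(keep), n - filled)
        U1[filled:filled + take] = keep[:take]
        filled += take
    lin = np.linalg.solve(Gamma, U1.T).T @ Delta.T   # rows: Delta Gamma^{-1} U1
    return xi + (U0 + lin) * omega                   # omega acts diagonally
```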
Optimal filter
The optimal filter introduced by the authors uses the equations of the SUN distribution and of the filtering recursion: at each step, particles are reweighted by the predictive probability of the new observation and then propagated by sampling from the conditional distribution of the state given that observation.
Here $G_t = \Gamma = \bar{\Omega} = I_2$ and $\omega = 0.1$; this filter leverages the SUN distribution properties detailed above. With the starting values, we have $\Gamma = I_2$.
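A minimal sketch under these settings ($m = 1$, $G_t = I_2$). The weights use the closed-form predictive probability $p(y_t \mid \theta_{t-1})$, and the propagation samples exactly from $p(\theta_t \mid \theta_{t-1}, y_t)$; here we sample that conditional through the Albert and Chib (1993) latent-variable representation rather than through the paper's direct SUN sampling:

```python
# "Optimal" particle filter for the dynamic probit model (m = 1):
# reweight by p(y_t | theta_{t-1}), resample, then draw theta_t from
# p(theta_t | theta_{t-1}, y_t) via the latent Gaussian z_t.
import numpy as np
from scipy.stats import norm, truncnorm

def optimal_filter(y, x, R=10_000, a0=np.zeros(2), P0=3 * np.eye(2),
                   W=0.01 * np.eye(2), V=1.0, rng=None):
    rng = rng or np.random.default_rng()
    p = len(a0)
    particles = rng.multivariate_normal(a0, P0, size=R)
    means = []
    for t in range(len(y)):
        F_t = np.array([1.0, x[t]])
        b = 2 * y[t] - 1
        s2 = F_t @ W @ F_t + V                  # Var(z_t | theta_{t-1})
        m = particles @ F_t                     # E(z_t | theta_{t-1}), G_t = I_2
        w = norm.cdf(b * m / np.sqrt(s2))       # p(y_t | theta_{t-1}), closed form
        w /= w.sum()
        particles = particles[rng.choice(R, size=R, p=w)]
        m = particles @ F_t
        # z_t | theta_{t-1}, y_t: normal truncated to the half-line b * z_t > 0
        lo = np.where(b > 0, -m, -np.inf)
        hi = np.where(b > 0, np.inf, -m)
        z = m + np.sqrt(s2) * truncnorm.rvs(lo / np.sqrt(s2), hi / np.sqrt(s2),
                                            size=R, random_state=rng)
        # theta_t | theta_{t-1}, z_t is Gaussian (Kalman-type conditioning)
        k = (W @ F_t) / s2                      # gain vector
        cond_cov = W - np.outer(k, F_t @ W)
        particles = (particles + np.outer(z - m, k)
                     + rng.multivariate_normal(np.zeros(p), cond_cov, size=R))
        means.append(particles.mean(axis=0))
    return np.array(means)
```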
Results and Conclusion
We decided to focus on the convergence of the algorithms, plotting the convergence of the model parameters $\theta_1$ and $\theta_2$.
We observe results similar to those in the paper:
- $\theta_1$ seems to tend towards $-0.5$
- $\theta_2$ seems to tend towards $1$
The same plot for the bootstrap algorithm can be found in the Appendix. We see the same convergence results: $\theta_1$ seems to tend towards $-0.5$ and $\theta_2$ towards $1$.
To compare the accuracy of the algorithms, Fasano et al. used the following procedure (a sketch of our implementation of steps two and four follows the list):
- Compute the corresponding SUN density on a $2000 \times 2000$ grid of equally spaced values of $\theta_{1t}$ and $\theta_{2t}$, for each $t$ in $1, 2, \dots, 97$
- Compute the marginal distributions of $\theta_{1t}$ (resp. $\theta_{2t}$) by summing over the values of $\theta_{2t}$ (resp. $\theta_{1t}$) and multiplying by the grid step size (this approximates the integral of the joint density over one of the two variables by a rectangle method)
- Run each algorithm 100 times with respectively $10^3$, $10^4$, and $10^5$ particles
- Compute the Wasserstein distance between the empirical distribution given by each algorithm and the discretized exact density
- For each algorithm, take the median of the 100 Wasserstein distances computed
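Steps two and four can be implemented along the following lines. A sketch, where `joint_density` denotes the exact SUN density evaluated on the grid and all names are ours:

```python
# Wasserstein distance between a particle cloud and the discretized exact
# marginal (steps 2 and 4 of the procedure above).
import numpy as np
from scipy.stats import wasserstein_distance

def marginal_wasserstein(samples_1d, grid, joint_density, axis, step):
    """samples_1d: particle values for one coordinate of theta_t.
    grid: the equally spaced values used for that coordinate.
    joint_density: exact SUN density evaluated on the 2-D grid.
    axis: the coordinate summed out (rectangle rule) to get the marginal."""
    marginal = joint_density.sum(axis=axis) * step   # discretized marginal weights
    return wasserstein_distance(samples_1d, grid, v_weights=marginal)
```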
We also used this method; however, due to the extensive computing time it requires, we restricted ourselves to grids of size $100 \times 100$ and only went up to $t = 50$. We ended up with the following table:
| Algorithm | $\theta_1$ ($R=10^3$) | $\theta_1$ ($R=10^4$) | $\theta_1$ ($R=10^5$) | $\theta_2$ ($R=10^3$) | $\theta_2$ ($R=10^4$) | $\theta_2$ ($R=10^5$) |
|---|---|---|---|---|---|---|
| Bootstrap | 0.086 | 0.036 | 0.025 | 0.124 | 0.069 | 0.062 |
| Optimal | 0.082 | 0.034 | 0.024 | 0.117 | 0.067 | 0.061 |
We can see that our results are similar to the article's, and the conclusions drawn are the same: although the gain is mild, the proposed “optimal” filter achieves better accuracy than the classic bootstrap filter.
We also reproduced a bar plot showing the frequency at which each algorithm dominates the other across time.
We can see that the “optimal” filter ranks above the bootstrap filter most of the time: it outperforms the bootstrap filter at frequency 0.59 for $R=10^3$, 0.84 for $R=10^4$, and 0.66 for $R=10^5$.
We also compared the performance of the algorithms throughout the time window by averaging the Wasserstein distance over the 100 experiments, for each algorithm at each time $t$; the graph below shows the results:
We can see that the accuracy of both algorithms degrades with time (note that good accuracy here corresponds to a low Wasserstein distance, which can sound confusing). We also noticed throughout the study that more particles yield better results.
Finally, because we are using Monte Carlo methods, we decided to look at the variance of our algorithms; we expected a smaller variance for the “optimal” algorithm. To assess it, we simulated 100 runs of each algorithm with $10^3$ samples and computed the median variance. We obtained the following results:
| Algorithm | Variance of $\theta_1$ | Variance of $\theta_2$ |
|---|---|---|
| Bootstrap | 0.00525312 | 0.00937037 |
| Optimal | 0.00380095 | 0.00727529 |
As expected, the optimal algorithm has lower variance than the bootstrap algorithm. Moreover, the variance is relatively small: on the order of $10^{-3}$ for all coefficients.