We propose an approximation to a model for repeated measures that incorporates random effects, a correlated stochastic process, and measurement error. The stochastic process used in this paper is the Integrated Ornstein-Uhlenbeck (IOU) process. Motivated by the complexity of the model, we take a Bayesian approach and approximate the IOU stochastic process by a continuous spatial model constructed by convolving a very simple, independent process with a kernel function. The approximation offers advantages over direct specification of a spatial process when computing covariance, variogram, and extremal coefficient functions, and it allows empirical estimates to be added to the extremal coefficient plots. It is attractive because it simplifies calculations, especially for very large datasets, reduces computational execution time, and extends beyond simple stationary models.
Stochastic process, Longitudinal, Integrated Ornstein-Uhlenbeck, Bayesian, Spatial, Convolution
62M05, 62M30, 62P10
Longitudinal data consist of measurements of a single variable taken repeatedly over time on each individual. Any approach used to analyze such data must properly account for the correlations among the observations; see Li, N., et al. [1]. The typical structure of longitudinal data is numerous measurements of a possibly multivariate response variable on each subject. There may also be covariates, possibly time varying, that influence the response variable [2]. The aim in the analysis of such data is to understand how the mean structure of the response variable changes with time, the effect of the covariates on the response variable, and the within-subject correlation structure.
Let Xi(tij) be the longitudinal process of subject i at time tij ≥ 0. Values of Xi(tij) are measured intermittently at some set of times tij. The observed longitudinal data on subject i may be subject to "error", so we observe only Yi(tij), whose elements may not exactly equal the corresponding Xi(tij). Given a specification for Xi(tij), the observed longitudinal data are taken to follow
Yi(tij) = Xi(tij)+ εi(tij) (1.1)
where εi(tij) is an intra-subject error.
In this paper, following Taylor, et al., Xu and Zeger, Wang and Taylor, and Abu Bakar, et al. [3-6] we consider a model of the form
Yi(tij) = Xi(tij)+ εi(tij)
Xi(tij) = ai+ btij+ βZi(tij)+ Wi(tij) (1.2)
Here $N$ denotes the normal distribution and $N_{n_i}$ the multivariate normal distribution of dimension $n_i$. $Y_i(t_{ij})$ denotes the observed value of a continuous time-dependent covariate (disease marker or longitudinal measurement) for subject $i$ ($i = 1, \ldots, M$) at time $t_{ij}$ ($j = 1, \ldots, n_i$); $M$ is the number of subjects in the study; $n_i$ is the number of repeated measurements for subject $i$ and may differ between subjects; $X_i(t_{ij})$ is the true value of the marker at time $t_{ij}$; $a_i$ is the random intercept of subject $i$, with the $a_i$ independent and normally distributed with mean $\mu_a$ and variance $\sigma_a^2$, that is, $a_i \sim N(\mu_a, \sigma_a^2)$; $b$ is the fixed slope; $Z_i(t_{ij})$ is the covariate for subject $i$ at time $t_{ij}$, with corresponding unknown regression parameter $\beta$; $\varepsilon_i(t_{ij}) \sim N(0, \sigma_\varepsilon^2)$ represents deviations due to measurement error and "local" biological variation on a time scale short enough that $\varepsilon_i(t_{ij})$ may be taken as independent across $j$; and $W_i = (W_i(t_{i1}), \ldots, W_i(t_{in_i})) \sim N_{n_i}(0, \Lambda)$ is the vector of values of an IOU stochastic process, independent across subjects. The covariance matrix $\Lambda$, with parameters $\alpha_w$ and $\sigma_w^2$, depends on $i$ only through the number $n_i$ of observations and the time points $t_{ij}$ at which measurements are taken; see Henderson, Diggle and Dobson [7].
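To make the covariance matrix of the IOU term concrete, the following minimal sketch builds it for one subject. It assumes the closed-form IOU covariance commonly used in this literature, Cov(W(s), W(t)) = σ²w/(2α³w) [2αw min(s,t) + exp(−αw t) + exp(−αw s) − 1 − exp(−αw |t−s|)], which is not written out in this paper. The function name `iou_covariance` and the parameter values are illustrative only.

```python
import math

def iou_covariance(times, alpha_w, sigma2_w):
    """Covariance matrix of an IOU process at `times` (assumed closed form)."""
    scale = sigma2_w / (2.0 * alpha_w ** 3)
    return [
        [
            scale * (2.0 * alpha_w * min(s, t)
                     + math.exp(-alpha_w * t) + math.exp(-alpha_w * s)
                     - 1.0 - math.exp(-alpha_w * abs(t - s)))
            for t in times
        ]
        for s in times
    ]

# Illustrative values only (not taken from the paper's tables).
times = [0.1 * j for j in range(1, 11)]
Lambda = iou_covariance(times, alpha_w=1.5, sigma2_w=0.5)
# The diagonal (the variance at each time point) depends on t itself,
# which is the non-stationarity discussed in the text.
variances = [Lambda[j][j] for j in range(len(times))]
```

Under this assumed form the variance increases with time since the origin, so the matrix depends on where time zero is placed for each subject.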
In this paper, we introduce an approximation of the IOU process that still gives efficient inference for the model parameters while reducing the dimensionality and complexity of the model. The details of this simple approach are given in the following sections, including the approximate formulation, the likelihood, and parameter estimation. The usefulness of this modeling approach is then demonstrated by simulations.
The main disadvantage of the IOU process is that it is not stationary; hence it is necessary to have a natural time zero for each individual. In some applications there may be no natural time zero, or time zero may not be known exactly. Large longitudinal datasets are often defined on naturally heterogeneous fields or have other inherently spatially varying conditions. It is therefore unreasonable to expect a response variable to be well modeled by a stationary process over a large domain. However, using non-stationary models is difficult in practice because of the conceptual challenges in specifying the model and the computational challenges of fitting it when the data are so large that memory constraints prevent formation of the covariance matrix.
We propose to approximate the IOU stochastic process Wi by a flexible spatial model constructed by convolving a very simple, independent process with a kernel function. Constructing a spatial process in this way has a number of advantages over specification through a spatial covariogram. In particular, the process convolution specification leads to computational simplifications and extends easily beyond simple stationary models. Our modeling approach is similar to that of Higdon [8], who provides simple representations of such models by convolving continuous white noise with a kernel whose shape determines the covariance structure of the resulting process. This approach is an alternative to traditional geostatistical techniques, where a covariance function is specified directly, and it allows increased flexibility, since the choice of kernel also accommodates features such as non-stationarity, anisotropy, and edge effects. Moreover, model (1.2) is a temporal longitudinal model, and applying the proposed approximation significantly reduces the dimensionality of the complex temporal process.
The model we propose for Xi(tij) at time tij is
Xi(tij) = ai+ btij+ βZi(tij)+ Ui(tij),
where U is a mean-zero Gaussian process that approximates the IOU stochastic process W. Hence, model (1.2) becomes
Yi(tij) = Xi(tij)+ εi(tij)
Xi(tij) = ai+ btij+ βZi(tij)+ Ui(tij) (2.1)
Rather than specifying $U(t)$ through its covariance function, we determine it by a latent process $U^*(t)$ and a smoothing kernel $K(t)$. The latent process $U^*(t)$ is restricted to be nonzero only at the spatial sites $t^* = \{t_1^* = t_{i1}, t_2^* = t_{i2}, \ldots, t_{m_i}^* = t_{im_i}\}$ in the time domain, and we define $U^*(t^*) = \{u^*(t_1^*), \ldots, u^*(t_{m_i}^*)\}$. Each $u^*$ is modeled as an independent draw from a mean-zero Gaussian distribution with variance $\sigma_u^2$.
Hence, $U^*$ follows a multivariate normal distribution with mean zero and variance-covariance matrix $\Sigma = \sigma_u^2 I_{m_i}$, where $m_i$ is the number of spatial sites in $t^*$.
The new Gaussian process $U_i(t)$ is then
$$U_i(t_{ij}) = \sum_{r=1}^{m_i} K(t_{ij} - t_r^*)\, U_i^*(t_r^*) \qquad (2.2)$$
where $K(\cdot - t_r^*)$ is a kernel centered at $t_r^*$.
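The construction in (2.2) can be sketched in a few lines. This is a minimal illustration, assuming an unnormalised Gaussian kernel; the knot locations, measurement times, and latent variance below are hypothetical, and the helper names are not from the paper.

```python
import math
import random

def gaussian_kernel(x):
    """Unnormalised Gaussian kernel K(x) = exp(-x^2 / 2) (an assumed choice)."""
    return math.exp(-0.5 * x * x)

def convolve_process(times, knots, latent, kernel=gaussian_kernel):
    """Evaluate U(t) = sum_r K(t - t*_r) U*(t*_r) at each time in `times`."""
    return [sum(kernel(t - k) * u for k, u in zip(knots, latent)) for t in times]

def implied_covariance(times, knots, sigma2_u, kernel=gaussian_kernel):
    """Covariance of U at `times`: sigma2_u * sum_r K(t - t*_r) K(s - t*_r)."""
    return [
        [sigma2_u * sum(kernel(t - k) * kernel(s - k) for k in knots) for s in times]
        for t in times
    ]

rng = random.Random(0)
knots = [0.0, 0.5, 1.0]          # hypothetical sites t*_r
times = [0.2, 0.4, 0.6, 0.8]     # hypothetical measurement times
sigma2_u = 1.0                   # hypothetical latent variance
latent = [rng.gauss(0.0, math.sqrt(sigma2_u)) for _ in knots]
U = convolve_process(times, knots, latent)
C = implied_covariance(times, knots, sigma2_u)
```

Only the few latent values at the knots are random; the values of U at all measurement times follow deterministically from them, which is where the reduction in dimensionality comes from.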
The resulting covariance function for $U(t)$ depends only on the displacement vector $d = t - s$ and is given by
$$C(d) = \mathrm{Cov}(U(t), U(s)) = \int_S K(u - t)\, K(u - s)\, du \qquad (2.3)$$
Table 1 shows kernels that give standard Gaussian, exponential, and spherical covariograms for the process U(t) [9]. The covariogram induced by the biweight kernel of Cleveland [10] is also shown.
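As a worked instance of (2.3), consider the Gaussian kernel. The derivation below is a sketch that assumes a one-dimensional domain equal to the whole real line and an unnormalised kernel $K(x) = \exp(-x^2/2)$; it completes the square in $u$ and uses the standard Gaussian integral.

```latex
\begin{aligned}
C(d) &= \int_{-\infty}^{\infty} e^{-\frac{1}{2}(u-t)^2}\, e^{-\frac{1}{2}(u-s)^2}\, du \\
     &= \int_{-\infty}^{\infty} \exp\!\left(-\Bigl(u - \tfrac{t+s}{2}\Bigr)^2 - \tfrac{(t-s)^2}{4}\right) du
        \qquad \text{since } (u-t)^2 + (u-s)^2 = 2\Bigl(u - \tfrac{t+s}{2}\Bigr)^2 + \tfrac{(t-s)^2}{2} \\
     &= \sqrt{\pi}\, \exp\!\left(-\tfrac{d^2}{4}\right)
      = \sqrt{\pi}\, \exp\!\left(-\tfrac{1}{2}\Bigl(\tfrac{d}{\sqrt{2}}\Bigr)^2\right), \qquad d = t - s .
\end{aligned}
```

So a Gaussian kernel with unit width induces a Gaussian covariogram whose width is larger by a factor of $\sqrt{2}$.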
The process convolution approach also provides a way to build dependent spatial processes; see Ver Hoef and Barry [11]. The basic idea is to build processes U(t) that share part of a common latent process in their construction. Perhaps the biggest attraction of process convolution models is that they give a framework for developing new classes of space and space-time models that allow for more realistic space-time dependence while maintaining some analytic tractability. Generally, one can construct a space-time process by first defining a simple, possibly discrete, process over space and time and then smoothing it with one or more kernels, giving a smooth process over space and time.
This constructive approach is appealing since the resulting models can be extended to allow for generalizations such as non-stationarity, non-Gaussian models, and non-separable space-time dependence structures. See Wolpert and Ickstadt [12], and Higdon, et al. [13] for some purely spatial applications, and Higdon [14] for a space-time model. In addition, models can be constructed in such a way to facilitate computation - such as restricting the underlying process to reside on a lattice so that fast Fourier transforms can be employed.
Based on our previous assumptions, the unknown parameters in model (2.1) are
$\Omega = \{a_1, \ldots, a_M, b, \beta, \sigma_\varepsilon^2, \mu_a, \sigma_a^2, \sigma_u^2\}$. The contribution of subject $i$ to the conditional likelihood function is given by
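$$
\begin{aligned}
[Y_i \mid \Omega, X_i] &= \prod_{j=1}^{n_i} [Y_i(t_{ij}) \mid \Omega, X_i] \\
&= \prod_{j=1}^{n_i} [Y_i(t_{ij}) \mid a_i, b, \beta, \sigma_\varepsilon^2, Z_i(t_{ij}), U_i(t_{ij})] \\
&= \prod_{j=1}^{n_i} \frac{1}{\sqrt{2\pi\sigma_\varepsilon^2}} \exp\!\left(-\frac{\bigl(Y_i(t_{ij}) - (a_i + b t_{ij} + \beta Z_i(t_{ij}) + U_i(t_{ij}))\bigr)^2}{2\sigma_\varepsilon^2}\right) \qquad (3.1)
\end{aligned}
$$
where $[\cdot]$ and $[\cdot \mid \cdot]$ denote marginal and conditional densities, respectively, and $\Omega$ denotes all unknown model parameters.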
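The subject-level contribution (3.1) is a product of independent normal densities, so on the log scale it reduces to a residual sum of squares. A minimal sketch follows; the function name, argument layout, and toy numbers are illustrative and not part of the paper.

```python
import math

def subject_log_likelihood(y, t, z, u, a_i, b, beta, sigma2_eps):
    """Log of (3.1): independent normal errors around a_i + b*t + beta*z + u."""
    n_i = len(y)
    resid_ss = sum(
        (y_j - (a_i + b * t_j + beta * z_j + u_j)) ** 2
        for y_j, t_j, z_j, u_j in zip(y, t, z, u)
    )
    return (-0.5 * n_i * math.log(2.0 * math.pi * sigma2_eps)
            - resid_ss / (2.0 * sigma2_eps))

# Toy data for one subject (hypothetical numbers).
t = [0.1, 0.2, 0.3]
z = [1.0, 0.0, 1.0]
u = [0.05, -0.02, 0.01]
y = [1.2, 1.1, 1.5]
loglik = subject_log_likelihood(y, t, z, u, a_i=1.0, b=0.5, beta=0.2, sigma2_eps=0.25)
```

Working on the log scale avoids numerical underflow when the contributions of many subjects are multiplied together.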
For the prior density of $\Omega$ we assume that $b$, $\beta$, $\sigma_\varepsilon^2$, $\mu_a$, $\sigma_a^2$, and $\sigma_u^2$ have independent prior densities, that $\{a_i, i = 1, \ldots, M\}$ are independent and normally distributed with parameters $\mu_a$ and $\sigma_a^2$, and that $\{U_i, i = 1, \ldots, M\}$ are independent stochastic processes with parameter $\sigma_u^2$.
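From the independence assumptions, the posterior density of all unknown model parameters, $[\Omega \mid \text{data}]$, is proportional to
$$
\left\{ \prod_{i=1}^{M} \left( \prod_{j=1}^{n_i} [Y_i(t_{ij}) \mid a_i, b, \beta, \sigma_\varepsilon^2, Z_i(t_{ij}), U_i(t_{ij})] \right) [U_i \mid \sigma_u^2]\, [a_i \mid \mu_a, \sigma_a^2] \right\} \times [b]\, [\mu_a]\, [\sigma_a^2]\, [\beta]\, [\sigma_u^2]\, [\sigma_\varepsilon^2] \qquad (3.2)
$$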
To fit the full model and make inference about the population parameters, Adaptive Rejection Metropolis Sampling (ARMS) and Gibbs sampling with ARMS steps are used. These methods are MCMC techniques for drawing dependent samples from complex high-dimensional distributions; see Waezizadeh and Mehrpooya [15]. Convergence of the posterior distribution was checked with the Gelman-Rubin convergence statistic R (posterior consistency), as modified by Brooks and Gelman [16]. To apply these methods to our model, the full conditional posterior of each parameter must be derived and a prior density must be chosen for each parameter. Based on the likelihood (3.1) and the posterior (3.2), with IG denoting the inverse gamma distribution and N the normal distribution, the conditional densities of the unknown model parameters are as follows:
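For the error parameter $\sigma_\varepsilon^2$:
$$
\begin{aligned}
[\sigma_\varepsilon^2 \mid \cdot] &\propto \prod_{i=1}^{M} \prod_{j=1}^{n_i} [Y_i(t_{ij}) \mid a_i, b, \beta, \sigma_\varepsilon^2, Z_i(t_{ij}), U_i]\, [\sigma_\varepsilon^2] \\
&\propto \prod_{i=1}^{M} \prod_{j=1}^{n_i} \frac{1}{\sqrt{\sigma_\varepsilon^2}} \exp\!\left(-\frac{\bigl(Y_i(t_{ij}) - (a_i + b t_{ij} + \beta Z_i(t_{ij}) + U_i(t_{ij}))\bigr)^2}{2\sigma_\varepsilon^2}\right) [\sigma_\varepsilon^2] \\
&\propto \frac{1}{(\sigma_\varepsilon^2)^{\sum_{i=1}^{M} n_i / 2}} \exp\!\left(-\frac{\sum_{i=1}^{M} \sum_{j=1}^{n_i} \bigl(Y_i(t_{ij}) - (a_i + b t_{ij} + \beta Z_i(t_{ij}) + U_i(t_{ij}))\bigr)^2}{2\sigma_\varepsilon^2}\right) [\sigma_\varepsilon^2] \\
&\propto (\sigma_\varepsilon^2)^{-\left(\left(\frac{\sum_{i=1}^{M} n_i}{2} - 1\right) + 1\right)} \exp\!\left(-\frac{\sum_{i=1}^{M} \sum_{j=1}^{n_i} \bigl(Y_i(t_{ij}) - (a_i + b t_{ij} + \beta Z_i(t_{ij}) + U_i(t_{ij}))\bigr)^2 / 2}{\sigma_\varepsilon^2}\right) [\sigma_\varepsilon^2] \\
&\propto IG(\alpha_0, \beta_0)\, [\sigma_\varepsilon^2]
\end{aligned}
$$
where
$$\alpha_0 = \frac{\sum_{i=1}^{M} n_i}{2} - 1, \qquad \beta_0 = \frac{\sum_{i=1}^{M} \sum_{j=1}^{n_i} \bigl(Y_i(t_{ij}) - (a_i + b t_{ij} + \beta Z_i(t_{ij}) + U_i(t_{ij}))\bigr)^2}{2}.$$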
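A Gibbs update for the error variance can be sketched as follows. The sketch assumes a conjugate IG(prior_shape, prior_scale) prior, which is an illustrative choice rather than the prior used in the paper; multiplying it by the inverse-gamma kernel above gives an IG(prior_shape + N/2, prior_scale + SS/2) full conditional, where N is the total number of observations and SS the residual sum of squares. The residuals are simulated here purely for demonstration.

```python
import random

def draw_sigma2_eps(residuals, prior_shape, prior_scale, rng):
    """One Gibbs draw of sigma2_eps from its inverse-gamma full conditional.

    Assumes an IG(prior_shape, prior_scale) prior (illustrative choice).
    """
    n_total = len(residuals)
    ss = sum(r * r for r in residuals)
    shape = prior_shape + 0.5 * n_total
    scale = prior_scale + 0.5 * ss
    # If G ~ Gamma(shape, rate=scale) then 1/G ~ IG(shape, scale).
    # random.gammavariate takes (shape, scale), hence the 1/scale argument.
    return 1.0 / rng.gammavariate(shape, 1.0 / scale)

rng = random.Random(1)
# Hypothetical residuals Y - (a_i + b t + beta Z + U), pooled over subjects,
# generated with true variance 0.25.
residuals = [rng.gauss(0.0, 0.5) for _ in range(2000)]
draws = [draw_sigma2_eps(residuals, 0.01, 0.01, rng) for _ in range(500)]
posterior_mean = sum(draws) / len(draws)
```

With a weak prior and many observations, the draws concentrate near the residual mean square.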
In the same manner we found:
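The intercept variance $\sigma_a^2$: $[\sigma_a^2 \mid \cdot] \propto IG(\alpha_0, \beta_0)\, [\sigma_a^2]$, where $\alpha_0 = \frac{M}{2} - 1$ and $\beta_0 = \frac{1}{2} \sum_{i=1}^{M} (a_i - \mu_a)^2$.
The intercept mean $\mu_a$: $[\mu_a \mid \cdot] \propto N(\alpha_0, \beta_0)\, [\mu_a]$, where $\alpha_0 = \frac{1}{M} \sum_{i=1}^{M} a_i$ and $\beta_0 = \frac{\sigma_a^2}{M}$.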
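The random intercept $a_i$, for $i = 1, 2, \ldots, M$: $[a_i \mid \cdot] \propto N(\alpha_0, \beta_0)$, where
$$\alpha_0 = \left( \frac{\sum_{j=1}^{n_i} \bigl(Y_{ij} - (b t_{ij} + \beta Z_i(t_{ij}) + U_i(t_{ij}))\bigr)}{\sigma_\varepsilon^2} + \frac{\mu_a}{\sigma_a^2} \right) \left( \frac{n_i}{\sigma_\varepsilon^2} + \frac{1}{\sigma_a^2} \right)^{-1} \quad \text{and} \quad \beta_0 = \left( \frac{n_i}{\sigma_\varepsilon^2} + \frac{1}{\sigma_a^2} \right)^{-1}.$$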
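The full conditional of the random intercept is a precision-weighted compromise between the subject's own data and the population mean. The sketch below computes its mean and variance and draws from it; the function names and toy inputs are illustrative, and `partial_resid` stands for the values Y_ij − (b t_ij + β Z_i(t_ij) + U_i(t_ij)).

```python
import math
import random

def intercept_conditional(partial_resid, mu_a, sigma2_a, sigma2_eps):
    """Mean and variance of [a_i | .] given the partial residuals of subject i."""
    n_i = len(partial_resid)
    precision = n_i / sigma2_eps + 1.0 / sigma2_a
    variance = 1.0 / precision
    mean = (sum(partial_resid) / sigma2_eps + mu_a / sigma2_a) * variance
    return mean, variance

def draw_intercept(partial_resid, mu_a, sigma2_a, sigma2_eps, rng):
    """One Gibbs draw of a_i from its normal full conditional."""
    mean, variance = intercept_conditional(partial_resid, mu_a, sigma2_a, sigma2_eps)
    return rng.gauss(mean, math.sqrt(variance))

# Hypothetical inputs: ten partial residuals equal to 2, population mean 1.
mean, variance = intercept_conditional([2.0] * 10, mu_a=1.0, sigma2_a=1.0, sigma2_eps=1.0)
a_draw = draw_intercept([2.0] * 10, 1.0, 1.0, 1.0, random.Random(2))
```

In this toy case the conditional mean is 21/11, slightly shrunk from the data average of 2 toward the population mean of 1, and the conditional variance is 1/11.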
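The average rate of the slope $b$: $[b \mid \cdot] \propto N(m_b, \beta_0)\, [b]$, where
$$m_b = \frac{\sum_{i=1}^{M} \sum_{j=1}^{n_i} t_{ij} \bigl(Y_i(t_{ij}) - (a_i + \beta Z_i(t_{ij}) + U_i(t_{ij}))\bigr)}{\sum_{i=1}^{M} \sum_{j=1}^{n_i} t_{ij}^2} \quad \text{and} \quad \beta_0 = \frac{\sigma_\varepsilon^2}{\sum_{i=1}^{M} \sum_{j=1}^{n_i} t_{ij}^2}.$$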
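The effect of the $l$th covariate on the marker, $\beta_l$, $l = 1, \ldots, p$: $[\beta_l \mid \cdot] \propto N(m_{\beta_l}, \beta_0)\, [\beta_l]$, where
$$m_{\beta_l} = \frac{\sum_{i=1}^{M} \sum_{j=1}^{n_i} Z_{il}(t_{ij}) \bigl(Y_i(t_{ij}) - (a_i + b t_{ij} + \beta_{(-l)} Z_{i(-l)}(t_{ij}) + U_i(t_{ij}))\bigr)}{\sum_{i=1}^{M} \sum_{j=1}^{n_i} Z_{il}^2(t_{ij})} \quad \text{and} \quad \beta_0 = \frac{\sigma_\varepsilon^2}{\sum_{i=1}^{M} \sum_{j=1}^{n_i} Z_{il}^2(t_{ij})}.$$
Here $\beta_{(-l)} = (\beta_1, \ldots, \beta_{l-1}, \beta_{l+1}, \ldots, \beta_p)$ and $Z_{i(-l)}$ denotes the remaining covariates after the $l$th covariate $Z_{il}$ is excluded from $Z_i$.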
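For the stochastic process $U(t)$ parameter $\sigma_u^2$: $[\sigma_u^2 \mid \cdot] \propto IG(\alpha_0, \beta_0)\, [\sigma_u^2]$, where
$$\alpha_0 = \frac{\sum_{i}^{m} t_i^*}{m} - 1 \quad \text{and} \quad \beta_0 = \frac{\sum_{i=1}^{m} U_i^{T} A^{-1} U_i}{m},$$
with $U_i = \{u_i(t_1^*), \ldots, u_i(t_{m_i}^*)\}$, $A^{-1} = \Sigma^{-1} \sigma_u^2$, and $\Sigma$ the covariance matrix of the latent process defined before (2.2).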
Since all full conditional densities are in standard form, it is easy to choose conjugate priors for all parameters of model (2.1), and drawing random variates from the full conditional distributions with the Gibbs sampler is straightforward. We therefore use the full conditional density as the proposal density, so that at each updating step a new draw from the full conditional density is always accepted.
To illustrate the proposed model, we set up a simulation study representing a randomized clinical trial in which $M = 500$ subjects are randomized. Each longitudinal marker in model (2.1), $Y_i(t_{ij})$, $i = 1, \ldots, M$, $j = 1, \ldots, n_i$, was simulated as the sum of the trajectory function $X_i(t_{ij})$ and the error term $\varepsilon_i(t_{ij})$; each subject has $n_i = 10$ longitudinal measurements at time points $t = \{t_1 = 0.1, \ldots, t_{10} = 1\}$. For the simulations considered here, the smoothing kernel $K(\cdot)$ is a radially symmetric kernel with $K(t) \propto \exp(-\frac{1}{2}\|t\|^2)$, which gives the covariance function $C(d) \propto \exp\bigl(-\frac{1}{2}(\|t - s\|/\sqrt{2})^2\bigr)$, and the latent process support is defined so that the $t_j^*$ are $m = 5$ equally spaced points ranging from $-0.1$ to $1.2$. Note that this combination of kernel width and spacing of the $t_j^*$ yields a spatial process $U(t)$ via (2.3) that is nearly stationary. If the spacing becomes much larger, or if the kernel width is reduced, the covariance structure $C(d)$ for $U(t)$ becomes unduly influenced by sparseness artifacts.
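The data-generating step of this design can be sketched as below. The design constants (500 subjects, 10 time points from 0.1 to 1, 5 knots from −0.1 to 1.2, Gaussian kernel) follow the text; the parameter values, the seed, and the time-constant binary treatment covariate are hypothetical stand-ins, since the true values used in the study are not reproduced here.

```python
import math
import random

def simulate_subject(times, knots, params, rng):
    """Simulate one subject's markers under model (2.1) with a Gaussian kernel."""
    a_i = rng.gauss(params["mu_a"], math.sqrt(params["sigma2_a"]))
    z_i = float(rng.random() < 0.5)   # hypothetical binary treatment indicator
    latent = [rng.gauss(0.0, math.sqrt(params["sigma2_u"])) for _ in knots]
    y = []
    for t in times:
        # Kernel-smoothed latent process U_i(t), as in (2.2).
        u_t = sum(math.exp(-0.5 * (t - k) ** 2) * u for k, u in zip(knots, latent))
        x_t = a_i + params["b"] * t + params["beta"] * z_i + u_t
        y.append(x_t + rng.gauss(0.0, math.sqrt(params["sigma2_eps"])))
    return {"z": z_i, "y": y}

M, n_i, m = 500, 10, 5
times = [0.1 * j for j in range(1, n_i + 1)]                    # 0.1, ..., 1.0
knots = [-0.1 + r * (1.2 - (-0.1)) / (m - 1) for r in range(m)]  # 5 points on [-0.1, 1.2]
# Hypothetical parameter values, chosen only for illustration.
params = {"mu_a": 1.0, "sigma2_a": 0.5, "b": -0.3, "beta": 0.4,
          "sigma2_u": 0.2, "sigma2_eps": 0.1}
rng = random.Random(2024)
data = [simulate_subject(times, knots, params, rng) for _ in range(M)]
```

Each subject needs only five latent draws for the correlated component, regardless of how many measurement times are added.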
Since the calculations for the simulation study were computationally intensive, we used a cluster of about 20 nodes, each with an AMD Quad-Core Opteron 835X (4 × 2 GHz) and 16 GB RAM. We obtained 3000 iterations after a burn-in of 1000 iterations. Convergence was checked by monitoring the histories of sampled quantities from several different starting points. The histogram, the time series plots of one sequence of Gibbs samples for different numbers of iterations, and the running average over these iterations for the parameter μa are presented in Figure 1.
We also used the Gelman-Rubin convergence statistic, as modified by Brooks and Gelman [16]. Here R is the ratio of the pooled interval width to the within-run interval width, that is, the width of the central 80% interval based on the pooled runs divided by the average width of the 80% intervals within the individual runs. Brooks and Gelman emphasize that one should be concerned both with convergence of R to one and with convergence of both the pooled and within-run interval widths to stability. For our analyses, all values of R converged to 1 within 3000 iterations, hence the burn-in of 1000. The analysis of the 500 simulated data sets for a single scenario took approximately 2.5 hours under the approximate model, whereas it took almost 6 hours without the approximation.
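The interval-based statistic described above can be computed directly from the sampled chains. The following is a minimal sketch using simple empirical quantiles; the function names and the simulated chains are illustrative, and a production diagnostic would typically track R over increasing numbers of iterations.

```python
import random

def central_interval_width(sample, level=0.80):
    """Width of the central `level` interval from empirical quantiles."""
    ordered = sorted(sample)
    n = len(ordered)
    tail = (1.0 - level) / 2.0
    lo = ordered[int(round(tail * (n - 1)))]
    hi = ordered[int(round((1.0 - tail) * (n - 1)))]
    return hi - lo

def interval_r_hat(chains, level=0.80):
    """Pooled interval width divided by the mean within-chain interval width."""
    pooled = [x for chain in chains for x in chain]
    within = sum(central_interval_width(c, level) for c in chains) / len(chains)
    return central_interval_width(pooled, level) / within

rng = random.Random(3)
# Three hypothetical chains sampling the same target: R should be close to 1.
mixed = [[rng.gauss(0.0, 1.0) for _ in range(3000)] for _ in range(3)]
# Chains stuck around different values: R should be well above 1.
stuck = [[rng.gauss(shift, 1.0) for _ in range(3000)] for shift in (-3.0, 0.0, 3.0)]
r_mixed, r_stuck = interval_r_hat(mixed), interval_r_hat(stuck)
```

When the chains have not mixed, the pooled interval is much wider than any single chain's interval, which pushes the ratio above one.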
Taking the parameter values used to generate the data as the true values, the Monte Carlo summary statistics, namely the estimate, Monte Carlo Standard Deviation (MCSD), Mean Squared Error (MSE), 95% Confidence Coverage Rate (CCR), and Bias in Percentage Terms (BPT), are presented in Table 2. MCE stands for Monte Carlo Error and is evaluated as follows. Our simulation study used 500 data replications, so the resulting estimates are subject to sampling variation (Monte Carlo error). This variation for the point estimate is calculated as $\hat{p} = \mathrm{MCSD}/\sqrt{500}$, and the MCE is then $\mathrm{MCE} = \sqrt{\hat{p}(1 - \hat{p})/500}$.
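The summary statistics for one parameter can be computed from the replicate estimates as sketched below. The sketch uses the usual definitions of these quantities, which is an assumption about how Table 2 was produced; the function name, the key `MC_SE_of_mean`, and the simulated replicates are hypothetical.

```python
import math
import random
import statistics

def monte_carlo_summary(estimates, intervals, truth):
    """Summarise replicate estimates of one parameter against its true value."""
    mean_est = statistics.fmean(estimates)
    mcsd = statistics.stdev(estimates)  # Monte Carlo standard deviation
    mse = statistics.fmean([(e - truth) ** 2 for e in estimates])
    coverage = statistics.fmean([float(lo <= truth <= hi) for lo, hi in intervals])
    return {
        "mean": mean_est,
        "MCSD": mcsd,
        "MSE": mse,
        "CCR": coverage,                                    # share of intervals covering truth
        "BPT": 100.0 * (mean_est - truth) / truth,          # bias in percentage terms
        "MC_SE_of_mean": mcsd / math.sqrt(len(estimates)),  # MCSD / sqrt(replications)
    }

rng = random.Random(4)
truth = 2.0
# Hypothetical replicate estimates with standard error 0.1 and normal-theory intervals.
estimates = [rng.gauss(truth, 0.1) for _ in range(500)]
intervals = [(e - 1.96 * 0.1, e + 1.96 * 0.1) for e in estimates]
summary = monte_carlo_summary(estimates, intervals, truth)
```

The MSE combines squared bias and sampling variance, so a small MSE together with coverage near 95% indicates estimates that are both accurate and well calibrated.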
The results in Table 2 support convergence of the Markov chain: the samplers reached convergence within 2,000 iterations following a burn-in of 1,000 iterations. Posterior means, posterior standard deviations, bias as a percentage of the true parameter value, and 95% highest posterior density intervals for each parameter in the proposed and IOU models are presented in Table 3. The estimates of all parameters from the approximate model are quite accurate and efficient.
We have proposed a new model for repeated measures that incorporates random effects, a correlated stochastic process, and measurement error. We approximate the IOU stochastic process Wi by a spatial model constructed by convolving a very simple, independent process with a kernel function. This approach offers advantages over direct specification of a spatial process when computing covariance, variogram, and extremal coefficient functions, and it allows empirical estimates to be added to the extremal coefficient plots. The approximation is attractive because it simplifies calculations, especially for very large datasets, and extends easily beyond simple stationary models to more flexible ones. Moreover, this structure can be used to significantly reduce the dimensionality of a complex temporal process.
A limitation of the proposed model is that the latent process U∗(t) must be nonzero at the spatial sites t∗ in the time domain. To overcome this, the covariance function of U(t) and U(s) was expressed through the displacement vector d = t − s.
A Bayesian approach was taken to fit the proposed model in a simulation study. The numerical results demonstrate that the proposed modeling method gives efficient estimates and good coverage for the population parameters. The estimates are close to the true values of the parameters, and the small biases are due to Monte Carlo simulation error. Compared with the IOU model, the proposed model gives improved estimates for almost all parameters. The proposed model also showed a substantial reduction in execution time. This approach effectively removes the inefficiency of accessing huge non-spatial data by replacing such access patterns in application hotspots with data at the spatial sites in the time domain at runtime. The reduction in execution time of about 60% illustrates the efficiency of the proposed model.