Hierarchical Bayes Approach for Analysis of Item-Level Missing Data

C l i n M e d International Library Citation: Junshan Q, Ram T (2015) Hierarchical Bayes Approach for Analysis of Item-Level Missing Data. Int J Clin Biostat Biom 2:009 Received: April 27, 2016: Accepted: June 28, 2016: Published: June 30, 2016 Copyright: © 2015 Junshan Q, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Junshan and Ram. Int J Clin Biostat Biom 2016, 2:1


ISSN: 2469-5831
particularly in the PROs.In case of missing responses to some of the questions/items in a multi-item questionnaire, the calculation of the total scores could be completed via simply ignoring the missing values or with imputation of missing values.Regardless of the reasons for the omitted responses, analyzing an incomplete data without addressing the missing items may result in misleading results, depending on the extent of missingness.McKnight and McKnight [1] recommended making diagnosis of missing data to distinguish three situations: ignorable given a few missing items, treatable given a reasonable amount of missing items, and non-treatable given a large swath of missing items.Averaging over the available items is often advised given that the items are all expressed in a common metric, and the scale is uni-dimensional and having acceptable consistency or convergent validity (e.g., Cronbach's alpha is sufficiently high, at or above 0.70) [2].It is equivalent to put equal weights over the items given the assumptions are valid.However, this method can result in biased results when there are deviations from the aforementioned assumptions (e.g., inflated estimates of inter-item consistency) or when the missing completely at random assumption (MCAR) is not valid [3].To overcome this problem, using non-missing consistent or convergent items to directly estimate an imputed score has been recommended [2].A complete-case analysis is another option.Similar to the averaging over available cases, this method can result in not only biased analyses when data are not MCAR but also suboptimal power of the study [4].Furthermore, a big proportion of missing item scores can lead to missing total scores at scale level.As such, missing data in a multi-item questionnaire needs to be handled at the item level and/or at the scale level.Previous studies have demonstrated that precision will improve when missingness is handled at the item level [5].There are two main

Introduction Types of missing data
Missing data are primarily due to dropout which can be categorized into different types based on its relation to the response process.For simplicity, it is generally assumed that the relation between a specific type of dropout and the response process can be described using a single (indicator) random variable.In case of distinct types of dropout, it is natural to use the multinomial indicator variables to model the dropout.In addition, monotone missing pattern is a key requirement for most of the methods for handling missing data, especially in the longitudinal data analyses.However, this assumption is not required for the sectional setting where only the observations collected at a single time point are of the primary interest.
ISSN: 2469-5831 and covariates, Z, such as treatment, age, gender, etc., as in Equations (1.1) and (1.2): Here M is the total number of covariates; Y(P) and X(P) represent the variables that are either observed as Y or X, or derived as YP or XP (e.g., predicted mean via Monte Carlo sampling); and Y(X) means that the data could be from domain Y or X.We assume that ijk Y or ijk X are observed for subject i (i = 1,...,N; and N is the total number of subjects enrolled) in treatment group i Z k = on item j, and that ijk Y or ijk X follows a normal distribution with a mean ijk YP or ijk XP and a common precisionτ , which is specified as a fixed number.That is, We model the number of missing observations per subject, u[i] using a categorical distribution with a parameter vector calpha.
, where b = 3; calpha [Z[i], 1: b] follows a Dirichlet distribution with a parameter vector which has each element of 1.
The model is displayed in figure 1.

Prior distributions
To accelerate model convergence, we assume that the elements of the vector, α, are independent and follow a truncated normal distribution with mean 0 and precision 0.001 with range of L to U. That is, '.
(0, 0.001) ( , ) where L α and U α refer to the upper and lower limits of the truncation range for parameter ' jj α ; j refers to the item as the response variable, ' j is the explanatory variable, and I(A) refers to the indicator of the set A.

Mechanism: Relay from completely to partially observed sets
Prior to fitting the model above, the data set was divided into sections by missing patterns as in table 1 with "1" indicating missing and "0" indicating observed.
Table 2 demonstrates how the missing values are predicted.The as the amount of missing data is not extensive and the marginal distribution is approximately uni-modal and symmetric for each item, we can use a normal model to impute the missing data and make inference using the imputed multi-item data [9].

Handling missing data in Bayesian paradigm
From a Bayesian perspective, missing values are treated as unknown quantities, and hence random variables, for which a joint posterior distribution can be derived; or missing data are treated as latent variables that can be included as part of the complete data.Thus, the distinction between the missing data and unknown model parameters is not obvious.Using the OpenBUGS, the Markov chain Monte Carlo (MCMC) samples from the joint posterior distribution of the model parameters and missing values can be generated.In other words, the missing part of the data can be simulated from its conditional posterior distribution given the observed data, and other model parameters.Under a fully Bayesian paradigm, any inference will be based on the conditional posterior distributions of interested quantities directly, while certain multiple imputation approaches relying on the models for imputation of missing values make inferences based on multiple imputed datasets [10].As such, a fully Bayesian approach seems more straightforward than a two-stage multiple imputation approach.
The proposed fully Bayesian approach is analogous to the multiple imputation approach based on the method known as fully conditional specification [10] and is presented in the next Section on Bayesian Modeling.The performance of the proposed approach is assessed in Section on Evaluation of Model Performance with respect to residuals; and inferences on quantities of interest such as overall treatment effect, are compared with the corresponding Frequentist inferences.Sensitivity analyses are also conducted to evaluate the robustness of the proposed approach on missing data mechanism.In Sections on Examples and Real Data Analyses, we further demonstrate how to implement the proposed approach.

Model structure
To start with a simple example, we assume that there are two domains (X and Y) assessed by the 5-level scores, and three items within each domain (j = 1, T) with T = 3.Each item mean at individual level (Y(X)P ij ) can be expressed as a linear function of other items  Note that this procedure can be easily generated to more than 2 domains and more than 3 items per domains.

Evaluation of model performance
Before making any inference based on the posterior distributions, convergence must be achieved for the MCMC simulation of each chain for each parameter.In addition, if the MCMC simulation has an adaptive phase, any inference is made using values sampled after the end of the adaptive phase.The Gelman-Rubin statistic (R), as modified by [11] is calculated to assess the convergence by comparing withinand between-chain variability over the second half of each chain.This R statistic will be greater than 1 if the starting values are suitably overdispersed; it will tend to 1 as convergence is approached.In general practice, if R < 1.05, we might assume convergence has been reached.The MCMC simulation for each model parameter is examined using the R statistic.The convergence of the MCMC simulation for each model parameter of interest is identified for inferences.
To assess the performance of the hierarchical Bayesian model, we plotted the residuals, which are defined as the difference between the predicted mean for each subject and each item and the corresponding simulated observations.To evaluate the proposed approach for handling missing data, we check whether the false positive rate for detecting treatment effect under the null hypothesis can be controlled at the desired level; and whether the power for detecting treatment effect at the desired level under the alternative hypothesis can be retained at least.In addition, the Bayesian analyses results are compared with the frequentist counterpart.

Checking residuals
The prediction ability of the hierarchical Bayesian model plays an important role in handling missing data because the predicted

# of Missing Data Sections Domain
mean will be used to impute the missing values.If the residuals are small in magnitude, the imputed values will be close to the true values of the missing observations and will result in inferences with better precision and less bias.We use the following equation to define residuals for Subject i and Item , where YP[i,j] refers to the predicted mean for Subject i and Item j, and YS[i,j] refers to the simulated observation for Subject i and Item j.

Inference: Detecting treatment effects
Gibbs sampling was used to make inferences because it is hard to find an analytical solution to the hierarchical model with parameters at multiple levels.After checking whether the posterior distributions of model parameters have converged and whether the imputed values are close to the true values of missing observations, we can decide whether this hierarchical Bayes model can be used to make sound inferences.Our primary interest of inference is on detecting treatment effects at each domain and overall levels.
We carried out simulations in two main categories: under the null hypothesis: no treatment effect within each domain and under the alternative hypothesis: significant treatment effect within each domain.

Comparing with the frequentist results
Analysis of Variance (ANOVA) was used to test the treatment effect within each domain at the significance level of 0.05.The analyses results were compared with their counterparts from the Bayesian analysis.Instead of using p-values, the 95% lower and upper bounds of credible intervals for average domain scores were used.If the 95% lower bound of the credible interval for the treated group is greater than the 95% upper bound of the control group, the treatment effect is statistically significant in favor of the treatment.

Sensitivity analyses
Given that the full data distribution can be factored into an extrapolation model and an observed data model, one or more parameters indexing extrapolation may not be identified by the observed data [12].Parameters indexing missing data extrapolation are called sensitivity parameters, in general.This concept is used to assess sensitivity of model-based inferences to either assumptions about missing data mechanisms or informative prior distributions.Under the assumption of MAR, the corresponding constrains require that the conditional distribution for a missing observation 2 ,( ) .
for Subject i Item j in domain X or Y in the missing pattern set with indices ≥ 2 follows the same corresponding distribution as for an observation from the completely observed set, 1 ,( ) .
The above conditions, under the normality assumption in this analysis, can be satisfied by equal means and variances of 2 ,( ) .
Similar constrains can be imposed on the parameters for domain X.As a sensitivity analysis, we further write the model in a large class of missing not at random (MNAR) models indexed by the parameter ( ∆ ) that measures the departure from MAR.Here, the model parameters for partially observed datasets are re-parameterized as functions of the corresponding parameters for completely observed datasets.For example, The analysis results with the MNAR model were compared with the ones using the MAR model.

Examples Under the null hypothesis: no treatment effect and homogeneity across the domains
We generated a data set for 120 subjects evenly distributed between two groups.For each subject, the assessments on three items within the two domains were simulated based upon a multivariate normal distribution.For the placebo group ( 1 µ ) and the treatment group ( 2 µ ), the mean vectors and the covariance matrix (Σ) are shared across the placebo and the treatment groups; both parameters specified as follows: ( )

Under the alternative hypothesis: significant treatment effect and homogeneity across the domain
The mean vector is different between the treatment groups; and the covariance matrix is shared across the placebo and the treatment groups; both parameters are specified as follows: ( ) 1 (0,1) (0, 0.5); (1,1) (0.5,1.5);

Under the alternative hypothesis: significant treatment effect and heterogeneity across the domain
The mean vector is different between the treatment groups and also different across the domains for treatment 2; and the covariance matrix is shared across the placebo and the treatment groups; both parameters are specified as follows.
( ) 1 (0,1) (0, 0.5); Under the alternative hypothesis: significant treatment effect and heterogeneity across the domain with Cronbach's alpha < 0.7 The mean vector is different between the treatment groups and also different across the domains for treatment 2; and the covariance matrix is shared across the placebo and the treatment groups; both

Real Data Analyses
The dataset reported by Jouvent, et al. [13] includes observations on poly-dimensional rating scale of depressive mode of 20 items from 269 subjects.However, there are no missing values in the data frame.For simplicity and comparison purposes, we randomly selected 120 subjects and items e1 to e3 and e10 to e12.The missing data pattern is the same as in the simulation studies.Similar to the simulation studies, the 120 subjects were equally allocated to two groups.The selected 6 items are on observed painful sadness, emotional hyper-expressiveness, emotional instability, worried gesture, observed anhedonia, and felt sadness.The first 3 items were grouped as domain X and the rest 3 items were grouped as domain Y.There is no significant difference between the treatment groups within either domain.The selected 6 items has low Cronbach's alpha (0.284) based on the selected 120 subjects without missing values.As such, the conventional approach of using the mean of the observed items to impute the corresponding missing value is not valid.Our proposed approach is used to handle missing values instead.parameters are specified as follows.Particularly, the Cronbach's alpha for evaluation of consistency across the items is below 0.7.As such, the within-subject imputation is not valid.

Missing Data Generation
To generate missing data, each subject received a number randomly generated from a uniform distribution.The sequence of random numbers was ordered from small to large.The same procedure was repeated for each item per domain.The first C numbers of subjects were assigned to missing values for a particular

ISSN: 2469-5831
Its performance is evaluated by taking the advantage of the fact that true values of the created missing data are known.The prediction ability of the hierarchical Bayesian model is evaluated based on the residuals defined earlier.Inferences on the overall treatment effect and the one within each domain are also performed based on the data with missing values imputed.

Results for the Examples Under the null hypothesis: no treatment effect and homogeneity across the domains
We use a dataset randomly generated to evaluate the model performance under the scenario of null hypothesis.The convergence of the MCMC samples for the model parameters is acceptable based on the trace, history, autocorrelation and background diagnosis plots.Per the residual plots in figure 2 and figure 3, the residuals for each subject and item fall around the zero line and vary roughly between -5 and 5.This indicates that the model has adequate prediction ability.And the predicted posterior means can be used to impute missing values.The posterior distributions of model parameters are summarized in figure 4. Further, inference on treatment effect within each domain was made based on the posterior distribution of mean scores per treatment group.Box plots as in figure 5, figure 6 and figure 7 are used to demonstrate the posterior distributions of mean scores in each domain and overall.As

ISSN: 2469-5831
shown in the figure 5 and figure 6, there is no significant treatment effect in each domain.That is, the lower bound of the 95% credible interval for one treatment group overlaps the upper bound of the 95% credible interval for the other treatment group.There is no overall significant treatment effect as in figure 7.These results agree with the ones by ANOVA.These observations were further corroborated by simulation studies (N = 1000; > 95% times of no significant difference).

Under the alternative hypothesis: significant treatment effect and homogeneity across the domain
We use a dataset randomly generated to evaluate the model performance under the scenario of alternative hypothesis with homogeneity.The convergence of the MCMC samples for the model parameters is acceptable based on the trace history, autocorrelation and background diagnosis plots.The posterior distributions of model parameters are summarized in figure 8. Per the residual plots in figure 9 and figure 10, for each domain, the residuals for each subject and item fall around the zero line and vary roughly between -5 and 5.This indicates that the model has adequate prediction ability, and that the predicted posterior means can be used to impute missing values.Further, inference on treatment effect within each domain was made based on

ISSN: 2469-5831
the posterior distribution of mean scores per treatment group.Box plots as in figure 11, figure 12, and figure 13 are used to demonstrate the posterior distributions of mean scores in each domain and overall.As shown in figure 11, figure 12, and figure 13, there is significant treatment effect in each domain and overall.These results agree with the ones by ANOVA.These observations were further corroborated by simulation studies (N = 1000; > 95% times of significant difference).

Under the alternative hypothesis: significant treatment effect and heterogeneity across the domain
We use a dataset randomly generated to evaluate the model performance under the scenario of heterogenic alternative hypothesis.The convergence of the MCMC samples for the model parameters is acceptable based on the trace, history, autocorrelation and background diagnosis plots.The posterior distributions of model parameters are summarized in figure 14.Per the residual plots in figure 15 and figure 16, for each domain, the residuals for each subject and item fall around the zero line and vary roughly between -5 and 5.This indicates that the model has adequate prediction ability and that the predicted posterior means can be used to impute missing values.Further, inference on treatment effect within each domain was made based on the posterior distribution of mean scores per treatment group.Box plots as in figure 17  19, there is significant treatment effect in each domain and overall.These results agree with the ones by ANOVA.These observations were further corroborated by simulation studies (N = 1000; > 95% times of significant difference).

Sensitivity Analyses
We re-parameterized the model including the sensitivity parameter ( ∆ ) that measures departures from MAR.The model parameters for the partially observed data (i.e., parameter vector alpha2 in figure 20) were parameterized as with certain departures from the ones for the completely observed data (i.e., parameter vector alpha in figure 21).
As shown in figure 20

Real Data Analyses
Our proposed approach are further evaluated using the dataset reported by Jouvent, et al. [13] which includes observations on polydimensional rating scale of depressive mode of 20 items from 269 subjects.Since there are no missing values in the data frame, the data are manipulated as described before.The predictive ability of the Bayesian hierarchical model is evaluated based on the residuals defined earlier.The residual plots in figure 25 and figure 26 show that the residuals for both the domain X and Y vary around 0 and range  There is no significant difference between the two groups within each domain.Similar results for the posterior distribution of the overall average of the sum of item scores (i.e., overall treatment effect) is given in figure 29.dealing with the missingness at the item level particularly when the consistency across the items is not adequate.Even though the conventional approach may be favorable given adequate consistency across the items [14], but it cannot deal with the inherent uncertainty of imputations themselves [15].The inference based on the proposed model is robust with the non-informative priors.The

Discussion
We demonstrated that the multi-level score-type data with missingness can be effectively analyzed using the hierarchical Bayes model with OpenBUGS (Version 3.2.3).In general, such models provide a promising alternative to the conventional approach in ISSN: 2469-5831 posterior distributions of model parameters of interest at multiple levels provide more information than do the maximum likelihood estimators (MLEs) even if the computation of MLEs is feasible.The proposed approach can perform well under MAR, particularly when the factors that influence missingness are included in the model.This leads to valid inference based on imputed items.This feature mirrors the merits of regression imputation [15].However, it is beneficial to perform sensitivity analyses to warrant the validity of the analyses under the MAR assumption.
The proposed method is developed under the cross-sectional setting.When borrowing information across longitudinal observations  ISSN: 2469-5831 within a subject is not feasible, it is an efficient way to handle missing values via borrowing information across domains (i.e., scales) and/or borrowing among subjects.Even though this method can ignore the domain structure, it does have the advantage in making good use of domain structure to handle missing data and make inferences at the domain or item level.It makes this approach more appealing than Domain Y: Group 1 and 2 ISSN: 2469-5831 other analogues.In addition, this method can be further extended to the longitudinal setting.Moreover, the distribution of item scores can be ordinal rather than continuous.
As an alternative, the imputed value can be sampled via the following distribution [16].This approach is similar to stochastic regression imputation [15], which adds additional noise to the predicted values for imputation.Even though this approach makes sampled values for imputation from the intended distribution, it also causes problems such as implausible/out-of-range values.Therefore, the imputed value needs to be restricted to the range of the item score to assist with convergence.In addition, the feedback of the imputed value may be undesirable and should be avoided via cut function in OpenBUGS.To these aims, we use the following manipulation, ) Here UB is the upper bound of the range of the item scores.This method yields comparable results as the proposed approach.
In the real situations, it is quite often that we encounter with a large number of items, in such situations our proposed approach may not be applicable directly.However, we can reduce the dimension of the items via grouping sets of items and replacing them with their sum scores.But one should be careful in taking this action because it requires that the items are similar enough.Alternatively, we can select a certain number of informative predictors [17].

Conclusions
The proposed hierarchical Bayes model can be used to handle missing data at item level and can help make valid inferences at multiple levels of interest via conditional posterior distributions.It can serve as an alternative to the traditional approach in either the primary or the sensitivity analysis.This approach has the potential to be further extended to other more complicated setting.

0 jj∆,
Where j=1,...,T+M; j ' =1,...,T+M.It is MAR When ' =.In the MNAR model, we assume that each departure parameter follows a uniform distribution, Where υ is set at a fraction of one standard deviation of each model parameter estimate.

Figure 2 :Figure 3 :
Figure 2: Residual plot for domain X under the null.

Figure 5 :
Figure 5: Boxplot of posterior distributions of treatment effects in domain X.

Figure 6 :
Figure 6: Boxplot of posterior distributions of treatment effects in domain Y.

Figure 7 :
Figure 7: Posterior distributions of overall treatment effects.
, figure 18, and figure 19 are used to demonstrate the posterior distributions of mean scores in each domain and overall.As shown in figure 17, figure 18, and figure residX

Figure 9 :
Figure 9: Residual plot for domain X under the harmonized alternative.
and figure 21, the posterior distributions of model parameters under either the MAR or MNAR assumptions are comparable.With respect to the inferences on treatment effects under the MNAR assumption, the posterior distributions of treatment effects per domain and overall are summarized as in figure 22, figure 23 and figure 24.The treatment effects are comparable to the corresponding ones under the MAR assumptions figure 17, figure 18 and figure 19 but with slightly larger variations.

Figure 10 :
Figure 10: Residual plot for domain Y under the harmonized alternative.

Figure 11 :
Figure 11: Box plot of treatment effect for domain X under the harmonized alternative.

Figure 12 :Figure 13 :Figure 15 :Figure 16 :Figure 17 :
Figure 12: Box plot of treatment effect for domain Y under the harmonized alternative.

Figure 22 :Figure 23 :Figure 25 :
Figure 22: Posterior distributions of treatment effects for domain X under MNAR.

Figure 26 :Figure 27 :
Figure 26: Residual plot for domain Y for the real data with Cronbach's alpha < 0.7.

Figure 28 :Figure 29 :
Figure 28: Posterior distributions of treatment effects for the two groups within domain Y.
Z Normal Y X P σ 

2469-5831 item
within a domain.The placebo group received more missing values than the treatment group in domain Y.