# Bayesian Cox Proportional Hazards Model in Survival Analysis of HACE1 Gene with Age at Onset of Alzheimer's Disease

##### Ke-Sheng Wang1*, Ying Liu1, Shaoqing Gong2, Chun Xu3, Xin Xie4, Liang Wang1 and Xingguang Luo5,6

1Department of Biostatistics and Epidemiology, College of Public Health, East Tennessee State University, Johnson City, TN, USA

2School of Public Policy and Administration, Xi'an Jiaotong University, Xi'an, China

3Department of Health and Biomedical Sciences, College of Health Affairs, University of Texas Rio Grande Valley, Brownsville, TX, USA

4Department of Economics and Finance, College of Business and Technology, East Tennessee State University, Johnson City, TN, USA

5Department of Psychiatry, Yale University School of Medicine, New Haven, CT, USA

6Biological Psychiatry Research Center, Beijing Huilongguan Hospital, Beijing, China

*Corresponding author: Ke-Sheng Wang, Department of Biostatistics and Epidemiology, College of Public Health, East Tennessee State University, PO Box 70259, Lamb Hall, Johnson City, TN 37614-1700, USA, Tel: +1-423-439-4481, Fax: +1 423-439-4606.

Accepted: November 30, 2017 | Published: December 01, 2017

Citation: Ke-Sheng W, Liu Y, Gong S, Xu C, Xie X, et al. (2017) Bayesian Cox Proportional Hazards Model in Survival Analysis of HACE1 Gene with Age at Onset of Alzheimer's Disease. Int J Clin Biostat Biom 3:014. doi.org/10.23937/2469-5831/1510014

Copyright: © 2017 Ke-Sheng W, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

# Abstract

Alzheimer's Disease (AD), the most common form of dementia, is a chronic neurodegenerative disease. The HECT domain and ankyrin repeat containing E3 ubiquitin protein ligase 1 (HACE1) gene is expressed in human brain and may play a role in the pathogenesis of neurodegenerative disorders. Till now, no previous study has reported the association of the HACE1 gene with the risk and Age at Onset (AAO) of AD; while few studies have checked the proportional hazards assumption in the survival analysis of AAO of AD using Cox proportional hazards model. In this study, we examined the associations of 14 Single Nucleotide Polymorphisms (SNPs) in the HACE1 gene with the risk and the AAO of AD using 791 AD patients and 782 controls. Multiple logistic regression model identified one SNP (rs9499937 with p = 1.8 × 10-3) to be associated with the risk of AD. For survival analysis of AAO, both classic Cox regression model and Bayesian survival analysis using the Cox proportional hazards model were applied to examine the association of each SNP with the AAO. The Hazards Ratio (HR) with its 95% Confidence Interval (CI) was estimated. Survival analysis using the classic Cox regression model showed that 4 SNPs were significantly associated with the AAO (top SNP rs9499937 with HR = 1.33, 95% CI = 1.13-1.57, p = 5.0 × 10-4). Bayesian Cox regression model showed similar but a slightly stronger associations (top SNP rs9499937 with HR = 1.34, 95% CI = 1.11-1.55) compared with the classic Cox regression model. Using an independent family-based sample, one SNP rs9486018 was associated with the risk of AD (p = 0.0323) and the T-T-G haplotype from rs9786015, rs9486018 and rs4079063 showed associations with both the risk and AAO of AD (p = 2.27 × 10-3 and 0.0487, respectively). The findings of this study provide first evidence that several genetic variants in the HACE1 gene were associated with the risk and AAO of AD.

# Keywords

Alzheimer's disease, Age at onset, HACE1, Single nucleotide polymorphisms, Survival analysis, Cox model, Bayesian analysis

# Introduction

The HECT domain and ankyrin repeat containing E3 ubiquitin protein ligase 1 (HACE1) gene (also known as KIAA1320) is located at 6q16.3 [13-15]. The HACE1 is expressed in brain, heart, lung, kidney, testis, and ovary [13,15]. Several studies have implicated that HACE1 is a candidate chromosome 6q21 tumor suppressor gene involved in multiple cancers [16-18]. Recently, it has been reported that HACE1 gene may play a role in neurodegeneration [19] and autophagy pathway [20]; while HACE1 mutations are involved in an autosomal recessive neurodevelopmental disorder [21], and glutamine addiction [22]. A Genome-Wide Association Study (GWAS) identified five Single-Nucleotide Polymorphisms (SNPs) (rs17065302, rs11759010, rs6927608, rs4946645 and rs4245525) within the HACE1 gene associated with equol-producing phenotype such as blood pressure, which may implicate HACE1 in intestinal immune responses [23]. Another GWAS identified 5 SNPs (rs4336470. rs9404576, rs4079063, rs24996663, and rs2499667) in HACE1 associated with neuroblastoma susceptibility [24]; while in a replication study, rs4336470 showed moderate association (p < 0.05) with risk of neuroblastoma [25]. More recently, it has been reported that the five above SNPs in the HACE1 gene may have a weak combined effect (p = 0.065) on neuroblastoma risk in Southern Chinese children [26]. Thus we hypothesized that HACE1 genetic variants may be in association with AD development.

To our best knowledge, no study has focused on the association of the HACE1 gene with the risk and AAO of AD. Even though the Cox proportional hazards model has been used to detect genetic associations with the AAO of AD [27-29], the proportional hazards assumption may be violated and they might not be carefully checked. Bayesian methods have been widely used recently in genetic association studies and provide alternative ways to traditional statistical methods [30-32]. In this study, we explored the association of HACE1 with the AAO of AD by using a Bayesian proportional hazards model in a population-based sample and then a family-based sample for replication.

# Subjects and Methods

## Study population

791 patients with AD and 782 controls with complete genotype and phenotype information in a Canadian sample were selected from the Multi-Site Collaborative Study for Genotype-Phenotype Associations in Alzheimer's disease and the longitudinal follow-up of Genotype-Phenotype Associations in Alzheimer's disease and the Neuroimaging component of Genotype-Phenotype Associations in Alzheimer's disease-Study Accession: phs000219.v1.p1. Covariates include sex and age. The details about these subjects were described in previous studies [27,33]. Genotyping was conducted using the Affymetrix technique. The genotypes of 14 SNPs within the HACE1 gene were available in this data.

A family-based study (1266 AD cases and their relatives, 1070 individuals with the AAO values) were available from the National Institute on Aging - Late Onset Alzheimer's Disease (NIA-LOAD) Family Study: Genome-Wide Association Study for Susceptibility Loci - Study Accession: phs000168.v1.p1. Genotyping by the Center for Inherited Disease Research (CIDR) was performed using the Illumina Infinium II assay protocol. The details about the sample of subjects were described elsewhere [34]. There are 28 SNPs within the HACE1 gene in this family-based sample.

## Statistical analysis

### Descriptive statistics and genotype quality control

Descriptive statistics were used to characterize participants' sex, age and the AAO of AD stratified by AD case and control status [29,35]. Hardy-Weinberg Equilibrium (HWE) was tested for all SNPs using the controls by HAPLOVIEW software [36]. Then, Minor Allele Frequency (MAF) was determined for each SNP. Pair wise Linkage Disequilibrium (LD) statistics (r2) among SNPs were assessed using the European sample from the HapMap dataset (http://hapmap.ncbi.nlm.nih.gov/) and the founders in the family study.

### Multiple logistic and linear regression models in PLINK software

Multiple logistic regression analysis of each SNP with the risk of AD as a binary outcome, adjusted for sex and age, was performed using PLINK [37]; while the asymptotic p-values were obtained and the Odds Ratio (OR) and 95% Confident Interval (CI) were estimated. The parallel procedure was performed for the multiple linear regression analysis of each SNP with the AAO of AD as a continuous outcome. Bonferroni correction (α = 0.05/14 = 3.57 × 10-3) was used to deal with the multiple comparison issue [38].

### Bayesian Cox proportional hazards model in PROC PHREG

The Cox proportional hazards model (1) or Cox regression model [39] is widely used in the analysis of time-to-event data [40-42].

where $h\left(t/x\right)$ is the hazard at time t for a subject (AAO for this study), ${h}_{0}\left(t\right)$ is the baseline hazard function. The Hazard Ratio (HR) is defined as the ratio of the predicated hazard function under two different values of a predictor variable. The PHREG procedure in SAS fits the Cox model by maximizing the partial likelihood function. Both the graphical and numerical methods [43] were used to check the proportional hazards assumption in the ASSESS option of PROC PHREG. The ASSESS option plots the cumulative score residuals against time for each independent variable; while the RESAMPLE option computes the p-value of a Kolmogorov-type supremum test based on a sample of 1,000 simulated residual patterns. A significant p-value indicates a poor fit.

The Akaike Information Criterion (AIC) (2) was used as a measure of better fit among candidate models [44,45].

where x is the random variable, is the maximum likelihood estimate, and k is the number of parameters. A smaller AIC generally indicates a better fit.

Bayesian statistics is an extension of Bayes theorem, which can be written as (3)

where θ is the parameter of interest, Y is the observed evidence, P(Y) is the marginal probability, P(Y|θ) is the likelihood function, P(θ) is the prior, and P(θ|Y) is the posterior probability [32]. Bayesian Cox regression can be requested by using the BAYES statement in the PHREG procedure. Summary statistics (Mean, Standard Deviation, the Highest Posterior Density (HPD) and Credible Intervals, and Correlation Matrix) were computed for each of the parameters. Trace plots, posterior density plots, and autocorrelation function plots were also provided [32]. For Bayesian survival analysis of the AAO of AD, the normal prior was chosen for the coefficients and the Deviance Information Criteria (DIC) was available instead of AIC. DIC is intended as a generalization of AIC [46]. A measure of effective numbers of parameters is defined as pD in (4), where pD is the posterior mean deviance minus the deviance measured at the posterior mean of the parameters.

Then DIC is defined analagously to AIC as in (5). Models with smaller DIC are better supported by the data.

The PHREG procedure in SAS was used to fit the Cox model. Multiple Cox regression model analysis, adjusted for sex and age, was conducted to examine association of each SNP with the AAO of AD. Descriptive statistics and Cox regression analysis were performed with SAS v.9.4 (SAS Institute, Cary, NC, USA).

### Family-based study

A family-based association analysis for AD was performed using PBAT version 3.6.7 [47]. For the risk of AD, the Family‐Based Association Test using Generalized Estimating Equations (FBAT-GEE) was used [48]; while for the AAO, FBAT-Wilcoxon statistics were employed [49]. The AAO values for healthy siblings were censored and age at entry into the study was used. Haplotype analysis was conducted in 2 or 3-SNP sliding window.

# Results

## Descriptive statistics and genotype quality control

The demographic characteristics of the subjects are detailed in Table 1. The mean AAO for cases was 76.4 and 72.3 years, respectively, in the NIA and Canadian samples, respectively. All 14 SNPs had MAF > 5% and were in HWE in the controls (p > 0.05); while 1 of 28 SNPs with HWE < 0.10-4 in the family sample was removed for further analysis.

Table 1: Descriptive characteristics of cases and controls. View Table 1

## Multiple linear and logistic regression analyses using PLINK

We found that one SNP was associated with the risk of AD (rs9499937 with p = 1.8 × 10-3) and four SNPs were associated with the AAO of AD (rs7746856, rs6941988, rs9499937 and rs7770002 with p = 3.09 × 10-2, 3.88 × 10-3, 7.39 × 10-4, and 3.14 × 10-2, respectively) (Table 2). Interestingly, the same SNP rs9499937 showed associations with both the risk and AAO of AD and the results remained significant after Bonferroni correction (p < 3.57 × 10-3).

Table 2: SNPs associated with the risk and/or age at onset of AD (p < 0.05). View Table 2

## Supremum test for proportional hazards assumption

Figure 1 and Figure 2 display the observed standardized score process with 20 simulated realizations from the null distribution for rs9499937 CC and CT genotypes, respectively. The plots showed that the observed process was atypical compared to the simulated realizations and revealed proportional hazards for the two genotypes compared with TT. The Kolmogorov-type supremum test results based on 1,000 simulations for all the covariates were not significant (p > 0.05), suggesting the proportional hazards assumption was valid for all the variables in the Canadian sample.

Figure 1: Explore plot for checking proportional hazards assumption for rs9499937C_C gentoype compared with rs9499937T_T genotype. View Figure 1

Figure 2: Explore plot for checking proportional hazards assumption for rs9499937C_T compared with rs9499937T_T. View Figure 2

## Classic Bayesian Cox proportional hazards model using PROC PHREG

The classic Cox model showed that four SNPs were associated with the AAO of AD (the CT genotype of the top SNP rs9499937 with HR = 1.33, 95% CI = 1.13-1.57, p = 5.0 × 10-4). The HRs based on the Bayesian survival analyses revealed similar but a slightly stronger associations compared with the non-Bayesian analyses results (Table 3). The DIC for the four SNPs were similar to those of AIC. The trace plot, posterior density plot, and autocorrelation function plot based on Bayesian analysis (Figure 3) indicated that the Markov chain had stabilized with good mixing for rs9499937. The posterior density plot, which estimates the posterior marginal distributions for the four regression coefficients, showed a smooth and unimodal shape for the posterior marginal distribution (Figure 4).

Figure 3: Trace plot, autocorrelation function plot, and posterior density plot for rs9499937. View Figure 3

Figure 4: The posterior density plots for the 4 regression coefficients. View Figure 4

Table 3: SNPs associated with the age at onset of AD using PROC PHREG (p < 0.05). View Table 3

## Family-based association analysis

We observed one SNP associated with the risk of AD (rs9486018 with p = 0.0323) by using FBAT-GEE analysis in the family-based study. The T-A haplotype from rs6937432 and rs6940552 revealed mostly significant associations with the risk (p = 9.7 × 10-4). The T-T haplotype from rs9786015 and rs9486018 and the T-G haplotype from rs9486018 and rs4079063 showed significant association with the risk (p = 1.56 × 10-3 and 4.46 × 10-3, respectively) (Table 4). Using the FBAT-Wilcoxon test, the C-C haplotype from rs6937026 and rs6946640 revealed mostly significant associations with the AAO (p = 0.0223). The C-A-G and C-C-A haplotype from rs6437026, rs4946640 and rs6910034 showed associations with the AAO (p = 0.0274 and 0.0225, respectively). The T-T-G haplotype from rs9786015, rs9486018 and rs4079063 showed associations with both the risk (p = 2.27 × 10-3) and the AAO (p = 0.0487).

Table 4: Haplotype analysis of the risk and age at onset of AD in the family sample. View Table 4

## The linkage disequilibrium structure of the HACE1 gene

Using the HapMap data, we identified one haplotype block including the four SNPs associated AD and/or AAO in the case-control study. Figure 5 shows the LD structure based on LD statistics (r2). Based on the rough rule of thumb, values of r2 > 1/3 might indicate sufficiently strong LD that can be used for a fine mapping [50]. The neuroblastoma associated rs4336470 [24,25] had moderate or strong LD with three AAO associated SNPs (rs7746856, rs6941988 and rs7770002 with r2 = 0.6, 0.6 and 0.71, respectively) and weak LD with the risk and the AAO - associated SNP rs9499937 (r2 = 0.27). Furthermore, there was strong LD observed between rs2499663 and three AAO associated SNPs (rs7746856, rs6941988 and rs7770002 with r2 = 0.81, 0.81 and 0.93, respectively). Using the founders in the family study, LD structure based on (r2) was constructed for all 28 SNPs (Figure 6). The two neuroblastoma associated SNPs (rs4336470 and rs9404576) [24,25] had moderate LD with SNPs (such as rs6910034, rs9404573, rs9499934, rs696937432, rs9486015, and rs4079063) which built AD and AAO associated haplotype; while another neuroblastoma associated SNPs rs2499663 [24,25] had moderate or strong LD with SNPs involved in AD and AAO associated haplotype.

Figure 5: Linkage disequilibrium structure (r2) within the HACE1 gene using the HapMap data (Dark area shows r2 = 1). View Figure 5

# Discussion

In the present study we explored the association of 14 HACE1 SNPs with the risk and AAO of AD using a case-control study and identified one SNP associated with AD and four SNPs with the AAO of AD using PLINK software. Interestingly, the same SNP rs9499937 showed associations with both the risk and AAO of AD. Bayesian Cox regressions revealed similar but a slightly stronger associations with the AAO of AD. Using an independent family-based sample, one SNP rs9486018 was associated with the risk of AD while haplotype analyses further revealed the associations with the risk and AAO of AD. The findings of this study provide the first evidence that several genetic variants in the HACE1 gene influenced the risk and the AAO of AD.

A previous study suggested that rs9391227 in the HACE1 gene was associated with celiac disease involved in the immune system and antigen presentation [51]; while another study revealed that five SNPs (rs17065302, rs11759010, rs6927608, rs4946645 and rs4245525) within HACE1 gene associated with equol-producing phenotype such as blood pressure [23] which may implicate HACE1 in immune responses. However, rs9391227 [51] was not available in both the case-control and family-based samples; while equol-producing phenotype associated five SNPs [23] were not associated with the risk or the AAO of AD in the case-control study and rs6927608 was not associated with the risk and AAO of AD in both samples. However, rs9391227 had strong LD with three AAO associated SNPs (rs7746856, rs6941988 and rs7770002 with r2 = 1.0, 1.0 and 0.87, respectively) in the case-control study (Figure 5); whereas rs6927608 had weak LD with other SNPs (Figure 5 and Figure 6). Previous studies have shown that AD is a chronic neurodegenerative disease while autophagy, immune and inflammatory processes are involved in the pathogenesis of AD [1,8,9,52-56].

Figure 6: Linkage disequilibrium structure (r2) within the HACE1 gene of 28 SNPs in the family sample (Dark area shows r2 = 1). View Figure 6

# Conclusion

To the best of our knowledge, this is the first candidate gene study which investigated the associations of HACE1 SNPs with the risk of and the AAO of AD. The findings may serve as a resource for replication in other populations for future investigations on target genetic variation and AD. Future functional studies of this gene may help better characterize the genetic architecture of the risk of and AAO of AD.

# Acknowledgments

We acknowledge the NIH GWAS Data Repository, the Contributing Investigator(s) who contributed the phenotype data and DNA samples from his/her original study and the primary funding organization that supported the contributing study "Multi-Site Collaborative Study for Genotype-Phenotype Associations in Alzheimer's disease and longitudinal follow-up of Genotype-Phenotype Associations in Alzheimer's disease and Neuroimaging component of Genotype-Phenotype Associations in Alzheimer's disease" and "National Institute on Aging - Late Onset Alzheimer's Disease Family Study: Genome-Wide Association Study for Susceptibility Loci". The genotypic and associated phenotypic data used in the study, "Multi-Site Collaborative Study for Genotype-Phenotype Associations in Alzheimer's Disease (GenADA)" were provided by the GlaxoSmithKline, R&D Limited. The datasets used for analyses described in this manuscript were obtained from dbGaP at https://www.ncbi.nlm.nih.gov/gap through dbGaP accession number phs000219.v1.p1. Funding support for the "Genetic Consortium for Late Onset Alzheimer's Disease" was provided through the Division of Neuroscience, NIA. The Genetic Consortium for Late Onset Alzheimer's Disease includes a genome-wide association study funded as part of the Division of Neuroscience, NIA. Assistance with phenotype harmonization and genotype cleaning, as well as with general study coordination, was provided by Genetic Consortium for Late Onset Alzheimer's Disease. The datasets used for analyses described in this manuscript were obtained from dbGaP at https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs000168.v1.p1.

# Role of the Funding Sources

No founding source is given for the present paper.

# Conflict of Interest

The authors declare that they have no conflict of interest.