Numerous studies have shown that individuals with dementia have exhibited activation of inflammatory pathways in their brains. Typically, these studies use traditional and well-established regression methods for data analysis. In this paper, a new approach is introduced that utilizes the analysis of the covariance structure using methods related to the principal component analysis (PCA) theory. Eleven biomarkers related to neuroinflammation were used to determine the association with the onset of dementia. Various demographic covariates were adjusted to account for possible confounding effects of the covariance structure. Three hypothesis testing methods were considered to discern differences between partial covariance matrices for comparing power and Type I errors through simulation studies. Application of hypothesis testing methods using data from Framingham Heart Study (FHS) found significant differences in covariance matrices between the non-dementia and dementia groups.

Inflammatory biomarkers, Dementia, Comparison of covariance matrices, Principal component analysis (PCA)

There is general research consensus that inflammation in pathologically vulnerable regions of the brain is associated with dementia, and specifically with the subtype of Alzheimer’s disease [1]. For example, [2] surmises that miocroglial activation neuroinflammation plays a significant role in the development of dementia. Therefore, it has been suggested that treating inflammatory conditions could inhibit the development of dementia. Given that inflammatory molecules have been found in autopsy [3], it raises the question at what antemortem time point is neuroinflammation a risk factor for the onset of dementia later in life. Whether inflammation is a causal factor of dementia or simply a byproduct is still widely debated among the scientific and clinical communities [4]. Epidemiological studies and treatment trials using anti-inflammatory drugs have been disappointing, although concerns have been raised about the methodological fidelity of the trials [4]. As there is yet a proven and effective cure for dementia, it is crucial to understand the underlying mechanisms further to develop such treatments.

Protein biomarkers are used to quantify the level of inflammation in the brain. For this study, eleven measures that are recognized as inflammatory biomarkers are assessed as risk factors for dementia. The most common inflammatory biomarker is C-reactive protein (CRP), which is found in higher concentrations after an aneurysmal subarachnoid hemorrhage [5]. Other biomarkers include interleukin-6 (IL6) and osteoprotegerin (OPG). IL6 is a pleiotropic cytokine that contributes to host defense during infection and tissue injury [6], while OPG is a soluble secreted protein and decoy receptor that is associated with inflammation after is chemic stroke [7]. These protein biomarkers are gathered by venous blood samples drawn from participants and then the proteins of interest are isolated, frozen, and measured [8]. The data presented in this study is from the Framingham Heart Study (FHS). The FHS has been collecting longitudinal data from three generations of participants for over 7 decades that includes demographics, inflammatory biomarkers, incident dementia and clinical comorbidities.

Traditional hypothesis testing methods are adequate for assessing the effects of biomarkers’ individual effects on dementia. However, multiple-hypothesis testing procedures can produce false positives at an unacceptably high rate [9]. Smart manipulation of the rejection criteria can result in a good balance of power and false positives, but this study circumvents the need by offering an alternative approach. Principal component analysis (PCA) and partial covariance analysis are deployed to condense the problem into a single hypothesis testing procedure. PCA approximates data by the product of the “object patterns” and the “variable patterns” [10], while partial covariance adjusts the scale measures for a set of variables [11]. These methods can summarize a set of variables and identify the fundamental structure of the set, which can be invaluable for understanding the underlying factors that affect the combined impact of these variables. A useful application of the group structure is to facilitate the comparison of a set of variables for different population groups.

For this study, three distinct methods for the hypothesis testing of equivalence of covariance matrices are applied to determine potential inflammation-related risk factors for dementia: Principal Component Group Comparison [12], Forkman’s Test [13], and the Tracy-Widom Statistic [14]. Primary objectives are to compare the test statistics and p-values via various visualization tools, as well as to assess the Type I error rate and power of the hypothesis testing procedure via simulations. Given the novelty of the approaches utilized in this paper, it is imperative to perform simulation analysis for a comprehensive understanding of these inferential tools. For example, the simulation analysis allows visualizing and evaluating each method’s unique advantages and disadvantages within controlled settings. The PCA comparison method is intuitive, and the non-parametric framework requires fewer assumptions about the underlying data generating process but is untested in the context of hypothesis testing. The Tracy-Widom distribution is parametric; therefore, the p-values can be derived from the distribution by theoretical analysis. However, there are strict assumptions for the Tracy-Widom distribution, so the results may be questionable if the assumptions are not met. The Forkman’s test method allows the computation of the p-values without the restrictive distributional assumptions. The results are further summarized into test statistics to test the differences in partial covariance matrices.

In summary, the paper discusses three novel approaches to understanding the differences between covariance structures of multivariable data and provides guidelines for using these methods using extensive simulation studies. The potential clinical contribution of these analyses is their application to confirm the role of inflammatory biomarkers as a risk factor for dementia.

Design In this section, we briefly describe the details of the study design.

Participants: The Framingham Heart Study (FHS) is a community-based multi-generational prospective cohort study that began in 1948 to identify risk factors for cardiovascular disease and subsequently expanded to incorporate study of many common chronic diseases including dementia. For the current study, participants are members of the offspring cohort, which includes the biological children of the original FHS cohort and the spouses of the children [15]. These participants had regular health examinations on average every four years. Data used for this analysis were collected at the seventh and eighth health examinations performed during 1998-2001 and 2005-2008, respectively (n = 2684).

Inflammatory biomarkers: Inflammatory biomarkers were measured from venous blood samples, as previously described [8]. Table 1 lists the 12 different biomarkers of inflammation [16,17].

Table 1: Biomarker sampling method. View Table 1

The biomarkers’ blood concentration levels are typically right-skewed, as there is a lower limit on the levels but no upper limit (Figure 1). Therefore, a logarithmic transformation is applied to biomarkers to reduce skewness and improve the performance of PCA decomposition.

Figure 1: Kernel Density Estimation of CRP vs. Log-Transformed CRP. View Figure 1

Dementia review: The dementia diagnosis of each participant in FHS is evaluated and verified through an adjudication panel that includes at least one neurologist and one neuropsychologist, who determined the diagnosis based on information from clinical examinations, medical records, and, when available, neuropsychological and neurological assessments and family interviews [18,19]. Ongoing surveillance for incident dementia that has been underway since 1976 identifies participants for diagnostic consideration. Therefore, those who do not undergo a dementia review assessment are presumed not to have dementia. Participants identified for possible mild cognitive impairment (MCI) were also excluded from the analysis (n = 158), to create a distinct separation between the dementia and non-dementia groups. Dementia is flagged whether the onset is before or after the biomarkers sampling.

Adjusting for covariates: Covariates that could significantly affect protein biomarker concentration are identified using rudimentary linear models and their mutual correlation. These covariates are age, sex, BMI, current smoking status, total cholesterol, ventricular rate, serum creatinine concentration and indicator of treatment for lipid disorders. Furthermore, the models are adjusted for the time difference between biomarker sampling and dementia diagnosis. Participants with missing covariates are excluded (n = 1021), these exclusions were already applied in Section 4.1.1.

Data imputation: Missing data, be it due to non-attendance at the time of data collection, inconclusive results, unreadable notation, or something else, pose a significant challenge in a clinical setting. However, in traditional statistical analysis, a sample would be discarded if its variables had missing values. Instead, the PCA imputation method is applied on the logarithm of biomarker concentrations to preserve the sample size with minimal effect on the data quality. This method uses a regularized iterative PCA algorithm that fits an expectation-maximization (EM) algorithm to remove noise during the imputation step [20]. Note that imputation works well when the proportion of missing values is not too high; as such, the inflammatory biomarker lysophosphatidic acid (lpa) was not considered in this analysis due to too many missing values (n = 1270). Participants with missing values in the other biomarkers had their missing values imputed (n = 433).

Statistical analyses This section describes the various statistical techniques implemented to analyze the data along with a detailed discussion of the various characteristics of the methodologies.

Participant characteristics: After screening out exclusions and implementing imputation for missing observation, the data included 2684 participants, of which 2536 were in the non-dementia group and 148 in the dementia group. The study sample was characterized using means and percentages along with rudimentary univariate analysis to examine potential differences between the two groups. Table 2 provides the mean and standard deviation of continuous variables. The two-sample t-test of the mean difference between the dementia group and the non-dementia group is included for each variable. Table 3, Table 4 and Table 5 show the counts and proportions for categorical variables, along with the χ 2 test of independence. Age, BMI, creatinine concentration, total cholesterol, current smoking status, and lipid disorder treatment indicator show significant mean differences between the dementia and non-dementia groups whereas sex and ventricular rate were not significantly different. Interestingly, the dementia rate is slightly higher for females than males, which goes against the findings of contemporary studies [21].

Table 2: Quantitative covariates descriptions. View Table 2

Table 3: Covariates variables descriptions (sex). View Table 3

Table 4: Covariates variables descriptions (current smoking status). View Table 4

Table 5: Covariates variables descriptions (treated for lipids). View Table 5

Biomarker characteristics: Table 6 provides the characterization of inflammatory biomarkers using the mean and standard deviation with two samples t-test of the mean difference as a comparison between the dementia and non-dementia groups. The results are mixed; four biomarkers have significant differences, while seven did not. However, this method evaluates biomarkers individually and is not able to detect potential inter-biomarker covariance differences. Furthermore, there may be demographic effects that cannot be captured using simple summary statistics. Therefore, it is necessary to use more sophisticated statistical methods to sufficiently understand the underlying covariance structure between the biomarkers.

Table 6: Log predictor descriptions. View Table 6

Partial covariance: A multivariable data covariance matrix measures the joint structural association among a set of variables. Comparison of covariance matrices is a key tool for understanding differences between multivariable structures of multiple groups of data. This approach has been extensively used in the study of a wide range of problems involving multidimensional data, such as understanding the role of genetic constraints in the determination of evolutionary trajectories in adaptive radiation [22], the response of the genetic architecture to environmental heterogeneity [23], the evolution of phenotypic integration [24,25], multicharacter phenotypic plasticity [26] and sexual dimorphism [27,28] among others.

The partial covariance allows exploration of the structural association between the variables of interest after adjusting for exogenous variables. The logarithmic transformed biomarkers are adjusted by controlling for covariates. This is done by fitting a linear model for each biomarker against the covariates and estimating the covariance using the resulting residuals. Therefore, data analysis uses partial covariance matrices instead of traditional covariance matrices to account for demographic effects.

Preliminary screening for the association of inflammatory biomarkers with dementia: To explore the relationship of inflammatory biomarkers with dementia status, the partial covariance structure of biomarkers is compared between the two groups. Visual comparisons of the partial correlation matrices between the two groups were made using heat maps and scree plots. The heat map visualizes high values using reddish colors and low values using blueish colors, akin to a temperature map. A scree plot is a graphical tool that plots the eigen-values, i.e. amount of variance explained by each principal component, of the covariance matrix by decreasing orders of magnitude. The scree plot can be used to find the number of significant components to keep in a principal component decomposition or the number of dominant factors in factor analysis [29]. These descriptive tools were utilized to identify potential structural differences between the partial covariance matrices of the non-dementia and dementia groups.

Hypothesis testing: Hypothesis testing methods served as a reference analytical tool to assess the difference between the partial covariance matrices of the two groups.

H 0 : P 0 = P 1 , H A : P 0 ≠ P 1

where P 0 and P 1 are the population partial covariance matrices of the non-dementia and dementia groups, respectively. A hypothesis testing method calculates a test statistic and the corresponding p-value from the data that are used to make a decision. Three novel approaches to the hypothesis testing problem were considered, namely, Principal Component Group Comparisons, Forkman’s test, and Tracy-Widom statistic.

Principal components group comparison: The first method to assess the differences between two covariance matrices was the Principal Components Group Comparison (PCGC) method based on the comparison of principal components of different groups [12]. Let L and M be the principal component loading matrices of the non-dementia and dementia groups, respectively, then the test statistic is the minimum angle between the space of the first component, given by

T = cos−1{(λ1)12} where λ 1 is the first eigenvalue of S = LM ′ ML ′ .

A small minimum angle implies that the two groups are similar with respect to the first principal component. As the minimum angle does not follow a known probability distribution, a bootstrap method is used to calculate the p-value. A bootstrap method creates B bootstrap data by randomly shuffling the group labels. Then, a boot-strap test statistic T b is calculated for each bootstrap data ( b = 1 , ..., B ). Finally, the p-value is calculated as the proportion of bootstrap samples in which the observed test statistic is smaller than the bootstrap test statistics.

Forkman’s test: The Forkman’s test takes inspiration from [13], where the authors propose a hypothesis testing method for the number of significant principal components in a standardized data matrix. First, the authors simulate simple parametric bootstrap samples and calculate the test statistic for each sample. The bootstrap is parametric because it assumes a normal distribution. It is relatively simple approach because it is based on a standard distribution with no estimation of parameters. Then, the bootstrap test statistics are compared to the observed test statistic to calculate the bootstrap p-value. Estimation of parameters are not needed because the observed data are already standardized to zero mean and unit variance.

The methods in Forkman, et al. [13] are then modified to test the hypothesis of the difference between two covariance matrices. First the test statistic is calculated as such:

Z=(n0S0+n1S1)−1n1S1 T=λ1/∑k=1pλk Where S 0 and S 1 are the sample partial covariance matrices of the standardized data from the dementia group and the non-dementia group, and n 0 and n 1 are the respective sample sizes. Furthermore, T is the test statistic and λ k is the k th eigenvalue of Z .

In summary, the test statistic is calculated by combining the covariance matrices of two groups in such a way that if the two matrices are equal, the largest eigenvalue of the combined matrices follows the greatest root statistic distribution [14].

The p-value is then calculated using the following boot-strap algorithm:

Algorithm 1 Forkman’s Test P-Value

1: for b ∈ 1 , 2 , ..., B do

2: Generate

3: (Σ is the covariance matrix)

4: ( λ bk is the k th eigenvalue of Z b )

5: end for

6: P-value = ( I is the indicator function)

Tracy-widom statistic: The Tracy-Widom distribution was introduced in [30], as the probability distribution of the normalized eigenvalue of a random Hermitian matrix. Johnstone [31] established the use of the Tracy-Widom distribution of order 1 as the asymptotic distribution of the largest eigenvalue in the covariance matrix of independent Gaussian variables. A follow-up paper [14] explores the application of the Tracy-Widom distribution of order 1 to multivariable analysis, of relevance is the group comparisons of covariance matrices.

The hypothesis tested considered S 0 and S 1 , the sample covariance matrices for both groups. Let n 0 be the number of participants in the non-dementia group, and n 1 for the dementia group. And let λ 1 be the largest eigenvalue of S = ( n 0 S 0 + n 1 S 1 ) −1 n 1 S 1 . Finally, define F 1 as the Tracy-Widom distribution of order 1 and p the number of predictors, then under H 0 :

T=logit(λ1)−μ(p,n0,n1)σ(p,n0,n1)∼F1 Where µ is the centering term, and σ the scaling term, defined as:

μ(p,n0,n1)=2log(tan(γ+ϕ2)) σ(p,n0,n1)3=16((n0+n1−1)sin2(γ+ϕ)sinγsinϕ)−1 Where

sin2(γ2)=min(p,n1)−0.5n0+n1−1 sin2(ϕ2)=max(p,n1)−0.5n0+n1−1 Therefore, T is the test statistic, and the p-value is given by Pr ( T > t|H 0 ) = 1 - F 1 ( t ).

In practice, we approximate the Tracy-Widom p-value using a Gamma distribution as outlined by [32], as the Tracy-Widom p-value is difficult to calculate directly.

Simulation study:

Before addressing the core question related to differences in inflammatory biomarkers between the non-dementia group and the dementia group, it is important to first compare the performances of the three hypothesis testing methods discussed in the last section using simulation studies. Simulation is necessary to evaluate and compare the results of the three methods in a controlled setting, as established theoretical results for these methods are limited. Two groups of random variables are generated from a parametric distribution in the simulations. The first group had a fixed parameter, while the second group varied the parameter in a range where the fixed parameter of the first group is the center. The goal of these simulations is to visualize the change in statistic and p-values as the distributions of the two groups differ. The Type I error and power of these tests are assessed below. The following describes the data-generating process (DGP) for each simulation below.

DGP1: Data is generated from bivariate normal distributions with zero-mean and unit variance, X 0 , X 1 , where the correlation is defined as ρ 0 = 0 , ρ 1 ∈ (−1 , 1).

DGP2 : Following the work of [33], let represent the predictor vector for the i th observation in group k , in this case k = 0 , 1 and generate . Parameters are defined as θ 0 = 3, θ 1 ∈ (1 , 5). This simulation creates a complex dependency based on a moving average process that is not directly obvious.

DGP3: Designed as a multivariable normal simulation with . Where the i th row and j th column of is defined as , which follows the correlation structure of an autoregressive process of order 1 (AR(1)). Specifically, the parameters are defined as . This simulation can be described as a multivariable variation of DGP1 .

In each simulation, the test statistic and the p-value are graphically visualized. A fine grid of discrete values is set for ρ 1 , θ 1 , and ɸ 1 respectively. Each DGP is simulated 10 times, and the average of each set of test statistic or p-values against their respective value of ρ 1 , θ 1 , or ɸ 1 are then plotted.

Furthermore, Type I error and power for each DGP and each statistical method are calculated by estimating the Type I error by setting the parameter of interest equal to ( DGP1 : ρ 0 = ρ 1 = 0 , DGP2 : θ 0 = θ 1 = 3 , DGP3 : ɸ

0 = ɸ 1 = 3). To assess power, a reference point for group0 ( ρ 0 = ɸ 0 = 0 , θ 0 = 3) is set, while setting the parameter of interest of group 1 to values surrounding group 0’s ( ρ 1 = ɸ 1 = (− . 4 , − . 2 , . 2 , . 4) , θ 1 = (1 , 2 , 3 , 4)). The Type I error and power are evaluated in a range of sample sizes ( n 0 = n 1 = 500 , 1000 , 1500) and a fixed number of predictors (p = 10) to evaluate changes in the estimator as sample sizes increase.

Inflammatory biomarkers and effect on dementia: Finally, the effect that inflammatory biomarkers collectively had on dementia risk is analyzed by performing the three aforementioned methods on the partial covariance structure of non-dementia and dementia group.

Simulation analysis This section summarizes the results of the simulation analysis as described in Section 4.2.9.

Test statistics and P-values Results: Figure 2 depicts the graphs of the three test statistics and the respective p-values (Figure 3) for the three different simulations. Note that the parameters for DGP2 ( θ 1 ) are scaled to match the other simulations in the overlying illustration. As a reminder, in the original simulation ( θ 1 ∈ (1 , 5) , θ 0 = 3), this is rescaled to in graphical presentation. The test statistics in DGP1 do not change noticeably with the parameters, although their corresponding p-values changed in a predictable way except for DGP2 . The Tracy-Widom distribution performed well under DGP1 and DGP3 as it exhibits symmetry in the test statistic and the p-value graph. However, the DGP2 simulation was not symmetric but was more significant on the right side. The Tracy-Widom p-value did not reach significance level ( α = 0 . 05) even at , the lowest value in DGP2 simulation. In fact, the minimum of the Tracy-Widom test statistic (the maximum p-value) was reached when θ 1 < 3, lower than the true null case. Lastly, the scaled Forkman’s test statistic had good results in all three simulation methods.

Figure 2: Comparisons of test statistics. View Figure 2

Figure 3: P-value comparisons. View Figure 3

The one-sided peculiarity of Tracy-Widom’s distribution in DGP2 is worth exploring. A compelling argument for this phenomenon is that the size or determinant of Σ 1 has an one-sided effect on the Tracy-Widom distribution, as S 1 is used in the numerator to make the combined co-variance matrix. Indeed, a simulation that varies the matrix size clearly shows this effect, we call this simulation DGP4 .

DGP4: Data was generated from independent bivariate normal distributions with zero-mean and variance . Where k = (0 , 1) and .

DGP4 simulation result (Figure 4) clearly showed the one-sided relationship between the Tracy-Widom statistic and the difference between the variances . Therefore, the appropriate way to test the equality of the co-variance matrices would be by means of a two-sided p-value. However, this contradicts the results of DGP1 and DGP3 , which showed that the Tracy-Widom statistic is two-sided with respect to the difference in covariance between variables. This contradiction creates a difficult dilemma. On the one hand, there is the option of looking for differences in magnitude in variances, where a one-sided p-value would be appropriate. On the other hand, if the objective is to determine differences in correlation between variables, the two-sided p-value would be the appropriate choice. For sake of simplicity, the data analysis focuses on the latter by standardizing the variables of interest. The standardization to zero mean and unit variance is consistent with similar analyses, such as in [34].

Figure 4: DGP4 Simulation results. View Figure 4

Type I error and power results: Next, the three methods using Type I error and power of the tests were assessed (Table 7, Table 8 and Table 9). Type I error is represented in the columns where the parameters for both groups are equal to ( ρ 1 = 0 , θ 1 = 3, and ɸ 1 = 0). Although the PCGC method captures the true Type I error ( α = 0 . 05) quite well, the power is relatively low in all simulations; in fact, the power does not appear to change at all in DGP2 . The Forkman’s test method accurately estimates Type I error, while also having high power for all simulations. Meanwhile, the Tracy-Widom method was consistently conservative in estimating the Type I error in all simulations, with the DGP1 simulation being the most conservative. As for power, the Tracy-Widom method had good power distribution in DGP1 and DGP3 , although is in general less powerful than the Forkman’s test method. However, the Tracy-Widom method was extremely under-powered in DGP2 when θ 1 < θ 0 , this phenomenon is a consequence of Tracy-Widom method’s non-symmetric interaction with DGP2 .

Table 7: Type I error and power under DGP1. View Table 7

Table 8: Type I error and power under DGP2. View Table 8

Table 9: Type I error and power under DGP3. View Table 9

In general, power increases as the sample size increases, whereas the Type I error does not have any discernible changes. Additionally, the power increases along with sample sizes given a fixed number of variables; this is true for all three methods.

In summary, the Forkman’s test method had the most impressive simulation results; not only did it reasonably estimate Type I error, but it also demonstrated high power in all the simulations. The least impressive method was PCGC. While it had reasonable Type I power estimation, it was under-powered in comparison. In addition, PCGC was completely unable to detect the parameter changes in DGP2 . The Tracy-Widom method was conservative in estimating the Type I error in all simulations. It had high power in DGP1 and DGP3 , although not as much as the Forkman’s test. However, the Tracy-Widom method was extremely underpowered in DGP2 when ( θ 1 < θ 0 ) as mentioned previously.

Analysis of inflammatory biomarker data

This section summarizes the results of the analysis of the data on the FHS inflammatory biomarkers and dementia.

Preliminary analysis: These analyzes sought evidence suggesting structural differences in biomarkers between participants in the dementia and non-dementia group. Figure 5 illustrates the comparison between the partial correlation of the two groups using a heat map visualization. There seems to be some negative correlation among some variables in the dementia group, which does not appear in the non-dementia group.

Figure 5: Heatmap comparison. View Figure 5

Another possible way to visualize the structural differences in the partial covariance matrices between the non- dementia group and the dementia group is by comparison of the scree plots (Figure 6). In both groups the eigen values of the first two components appear to be very similar, while there exists minor differences in the latter components.

Figure 6: Scree-plot comparison. View Figure 6

Results of the inferential procedures: The previous sections considered inflammatory biomarkers and the difference in the structure of the covariance between the two groups by generating a simple covariance matrix and the partial covariance matrix. The simple covariance matrix was calculated directly from the log-transformed biomarkers, while the partial covariance was calculated from the residuals after adjusting for the covariates. The results from analyzing the simple covariance would show whether there is a difference in covariance structure by itself, while the results from the partial covariance would show whether differences exist after adjusting for confounding factors.

The results show significant differences between the co-variance matrices in all cases except for the PCGC method on the simple covariance matrix (Table 10). Given the lackluster performance of PCGC in the simulations, the results of Tracy-Widom and Forkman’s test seem more reasonable. Therefore, it appears that there is a significant difference between the simple covariance and partial covariance of the log-adjusted biomarkers of the dementia group and the non-dementia groups. Partial covariance appears to be more significant than simple covariance, which gives credence to covariate adjustments. This result explicitly reveals differences in the dementia groups that the preliminary visualizations did not clearly show.

Table 10: Data analysis, test statistics and P-values. View Table 10

Despite extensive research on individual risk factors for dementia, there are far fewer studies that consider how these risk factors synergistically influence dementia. Current solutions to multivariable problems (such as machine learning) are complex given the large and varied data involved. On the other hand, statistical methods based on comparison of covariance matrices have been applied to a variety of problems, but not to biological risk factors of dementia. The purpose of this study was to test simpler covariance-based models that summarize multiple variables into a single test and score by efficiently examining the combined effect of many variables.

Three methods were investigated that combined the covariance matrices of two groups and then were summarized as a test statistic to test whether the matrices are structurally different. The validity of these methods via simulation, Type I error, and power analysis suggests that the Forkman’s test method is effective in identifying both significant and insignificant differences between covariance matrices. By comparison, the PCGC method has low power and was unreliable in some cases. The Tracy-Widom method was conservative in estimating the Type I error but was accurate in estimating power. However, for non-normal distributed data, there were conflicting effects of variable variances and inter-variable covariances using the Tracy-Widom Statistics. These results indicate that the Forkman’s test and Tracy-Widom methods were reliable methods for analyzing approximately normally distributed data.

The application of these methods to the clinical question of inflammatory biomarkers and their relationship with dementia risk found that these tests detected significant differences in the covariance structure of biomarkers between the dementia group and the non-dementia group. This result reinforces the established consensus that inflammation biomarkers are indeed risk factors for dementia. Furthermore, the 11 biomarkers analyzed in this paper can act as a collective risk factor for dementia. The collective covariance effect is certainly significant for dementia regardless of the significance of the individual mean effects.

There are limitations both in the methods and with the data. These methods are not well tested and thus are not established theoretically, which leads to a reliance on the use of simulation to assess the validity of these methods. Although these methods did determine the existence of a significant difference between two groups collectively among biomarkers, they could not determine which variables are different. Furthermore, the data was not collected from the same period. Interleukin-18 (Il18) samples were measured from blood from the 7 th health exam (1998- 2001), while the rest were measured from health exam 8 (2005-2008).

Furthermore, the covariate information came mainly from exam 8 values; exam 7 values were used only when exam 8 values are unavailable. These compromises could introduce additional noise that could not be controlled in order to maximize the number of biomarkers in the study. Furthermore, there was an imbalance in sample sizes between the non-dementia group (n = 2536) and the dementia group (n = 168). Imbalanced data introduces bias [35] and ’wastes’ the extra samples in the non-dementia group. Future research is warranted to determine theoretical justifications for the Forkman’s test and Tracy-Widom methods using more balanced study groups.

In summary, Forkman’s test and Tracy-Widom methods are valid candidates for testing the significance of co- variance differences among standardized variables between groups, while the PCGC method is unreliable. Forkman’s test appears to have suitable Type I error estimates and strong power, while Tracy-Widom has conservative Type I error estimates but good power. In a clinical research application, data analysis showed significant differences in protein biomarker covariance matrices between the non-dementia group and the dementia group. This result is expected, given the literature on inflammatory biomarkers as a risk factor for dementia. Utilizing new and cutting-edge analytic approaches can further enrich the understanding of inflammatory biomarkers and their effects on dementia.

Acknowledgements We would like to thank the participants of the Framingham Heart Studies for their contributions to the study.

Funding This paper is generously supported by Framingham Heart Study’s National Heart, Lung, and Blood Institute contract (N01-HC-25195), and the National Institute on Aging (AG016495, AG008122, AG062109, AG068753).

Conflict of interest Rhoda Au is a scientific advisor to Signant Health and a scientific consultant to Biogen and the Davos Alzheimer’s Collaborative (DAC). She also serves as Director of the Global Cohort Program for DAC. The other authors have no conflict of interest to declare.

Ethical approval All participants in the FHS study referenced in the data analysis have provided informed consent. The study protocol received approval from the Institutional Review Board of the Boston University Medical Center. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement was followed.

Data availability statement The data that support the findings of this study are available from the corresponding author upon reasonable request.

*Lythrum salicaria*in eastern North America. Evolution 65: 2514-2529. https://pubmed.ncbi.nlm.nih.gov/21884053/

*Lepidium bonariense*. Evolutionary Ecology 24: 1321-1337. https://link.springer.com/article/10.1007/s10682-010-9373-6