Nonparametric Method for Estimation of Controlled Correlations in Studies of VEGF-Hypoxia Relationship

Kodryan MS

doi:10.23937/2469-5831/1510024

International Journal of

Clinical Biostatistics and Biometrics ISSN: 2469-5831

Citation

Kodryan MS, Kuznetsova AV, Klimenko LL, Mazilina AN, Baskakov IV, et al. (2020) Nonparametric Method for Estimation of Controlled Correlations in Studies of VEGF-Hypoxia Relationship. Int J Clin Biostat Biom 6:024. doi.org/10.23937/2469-5831/1510024

Research Article | OPEN ACCESS DOI: 10.23937/2469-5831/1510024

Maxim S Kodryan¹, Anna V Kuznetsova², Luidmila L Klimenko³, Aksana N Mazilina⁴, Ivan V Baskakov³ and Oleg V Senko⁵*

¹Lomonosov Moscow State University, Russia

²Emanuel Institute of Biochemical Physics, Russian Academy of Sciences, Russia

³Semenov Institute of Chemical Physics, Russian Academy of Sciences, Russia

⁴Department of Neurology of Clinic Hospital, Federal Medical and Biological Agency of Russia, Moscow, Russia

⁵Federal Research Center/Research Center of Informatics and Control, Russian Academy of Sciences, Moscow, Russia

Abstract

It is known that vascular endothelial growth factor (VEGF) expression is a response to hypoxia. Hypoxia, in turn, may be detected by oximetry parameters, including venous CO-oximetry indices and the corresponding partial pressures of O₂ and CO₂. However, no significant correlations between VEGF levels and oximetry parameters were found in groups of patients with ischemic stroke and transient ischemic attack. At the same time, an effect related to the relationship between VEGF and sO₂ was observed in the corresponding scatter plots: correlation between serum levels of VEGF and S100 proteins existed only in the group with severe hypoxia, where sO₂ is below a threshold close to 39-40%. So the relationship between VEGF level and the saturation index sO₂ exists in conjunction with an additional factor, namely the S100 level in serum. To assess the statistical significance of the observed regularity it is necessary to test three null hypotheses, each stating that one of the involved factors is independent of the other two. The relationship may be a manifestation of the effect of hypoxia on VEGF. To assess the significance of the hypoxia effect as a whole, all three null hypotheses were tested with a technique developed for this purpose, which is based on random permutations and involves nonparametric combinations of criteria related to single oximetry parameters. The significance assessments also involved a multiplicity adjustment that accounts for the search for additional factors among the variety of biological indices in the analyzed data set. As a result, all three null hypotheses were rejected at the adjusted level p < 0.02 when the effect of hypoxia on the correlation between VEGF and complement component C4 was evaluated.

Keywords

VEGF, Hypoxia, Controlled correlations, Permutation test, Nonparametric combinations, Multiple testing

Introduction

Angiogenesis may be a biological response to insufficient oxygen supply resulting in hypoxia. The key mediator of angiogenesis and probably neurogenesis [1] is vascular endothelial growth factor (VEGF), a homodimeric glycoprotein with a molecular weight of approximately 45 kDa. VEGF expression is activated as a response to stabilization and nuclear translocation of hypoxia-inducible factor-1 (HIF-1) when the intracellular oxygen level is reduced [2]. Existence of the HIF-1/VEGF signalling pathway is confirmed by high levels of VEGF in patients with chronic obstructive pulmonary disease (COPD) [3] and asthma [4] or in subjects with "plateau red face" [5]. It is known that the immune system is involved in angiogenesis via secretion of VEGF and other pro-angiogenic factors by macrophages, neutrophils and other immune cells [6-8]. The oxygen saturation index sO₂ and other CO-oximetry parameters in venous blood reflect the balance between oxygen delivery and oxygen consumption [9]. Low sO₂ corresponds to tissue hypoxia [10]. However, correlation coefficients between CO-oximetry parameters and VEGF levels were small and not statistically significant in groups of patients with severe neurological disorders. The corresponding data set is discussed below. Correlation coefficients between VEGF and the partial pressures of O₂ and CO₂ were also not significant. The lack of reliable ties may be attributed, among other things, to the complexity of the existing dependence, when the relationship between two factors is controlled by a third one. A previous study of the data set with the help of the OVP method [11,12] and visual analysis of the related scatter diagrams uncovered a complex effect involving sO₂ and the serum levels of VEGF and proteins from the S100 family.

The left scatter diagram in Figure 1 corresponds to the group with sO₂ lower than 38.4%. This diagram is consistent with a linear dependence of VEGF on the S100 level. The only exception is one object, which is marked with a red circle. In contrast, there is no noticeable correlation between VEGF and S100 in the right diagram, which corresponds to the group with sO₂ greater than 38.4%. It may be suggested that the correlation observed in the left diagram is caused by the severe hypoxia existing when sO₂ is below 38.4%. The relationship from Figure 1 may be described by the following equations:

Figure 1: Correlations between S100 and VEGF levels are compared in groups with sO₂ < 38.4% (left scatter plot) and sO₂ > 38.4% (right scatter plot).

$\begin{array}{ll} y_{j} = β_{0}^{l} + β_{1}^{l} x_{i j} + ε_{j}^{l} & \text{if } z_{k j} < δ, \\ y_{j} = β_{0}^{r} + β_{1}^{r} x_{i j} + ε_{j}^{r} & \text{if } z_{k j} \geq δ. \end{array} \quad (1)$
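A minimal sketch of how model (1) could be fitted for a fixed threshold δ is given below. It is an illustration only, not the authors' implementation: the function name `fit_two_regime` and the synthetic data are hypothetical, and ordinary least squares is assumed for both regimes.

```python
import numpy as np

def fit_two_regime(y, x, z, delta):
    """Fit separate least-squares lines y ~ x for z < delta and z >= delta."""
    y, x, z = (np.asarray(a, dtype=float) for a in (y, x, z))
    fits = {}
    for name, mask in (("left", z < delta), ("right", z >= delta)):
        # np.polyfit returns coefficients from highest degree: (slope, intercept)
        slope, intercept = np.polyfit(x[mask], y[mask], deg=1)
        fits[name] = (intercept, slope)
    return fits

# Synthetic illustration (hypothetical data): y depends on x only when z < 0.4.
rng = np.random.default_rng(0)
z = rng.uniform(0, 1, 200)
x = rng.normal(size=200)
y = np.where(z < 0.4, 1.0 + 2.0 * x, 0.5) + 0.1 * rng.normal(size=200)
fits = fit_two_regime(y, x, z, delta=0.4)
print(fits)
```

In this sketch the left regime should recover a slope close to 2 and the right regime a slope close to 0, which mirrors the pattern that model (1) is meant to describe.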

Model (1) is evidently equivalent to standard piece-wise regression if X_i is equal to Z_k. It is supposed that the verification procedure must satisfy the following demands:

• The verification procedure must include testing the significance of both variables X and Z.

• Hypoxia is assessed by several oximetry parameters. It may be expected that a statistical technique evaluating the effect of hypoxia on the relationship between VEGF and an additional factor X_i would be more powerful if it combines the statistical tests that assess the effect of each oximetry parameter on the relationship between Y and X_i.

• The effect from Figure 1 was found by testing a variety of factors. So multiple testing must be taken into account when statistical significance is estimated.

A permutation test is an approach capable of meeting the listed demands owing to the following advantages. It may be implemented regardless of whether the underlying distributions of the test statistics are known, and there are no limitations on data set sizes. Different permutation tests were proposed for assessing the importance of each predictor in multiple regression models; a method based on testing the significance of the corresponding partial correlation coefficients may be mentioned in this regard [13]. A permutation test was also used to choose between piece-wise regression and simple linear regression [14]. However, the mentioned techniques are not suitable for evaluating the significance of the effect considered in this paper. Verification is possible with the procedure discussed below, which is based on testing several null hypotheses.

Effective methods for calculating a nonparametric combination (NPC) of several dependent permutation tests [15] may be used for assessing the significance of the hypoxia effect by several oximetry parameters. Permutation tests are also widely used to control multiplicity in various applied tasks, including high-dimensional tasks related to DNA microarray experiments [16-19]. There are different ways to control the family-wise error rate (FWER). Single-step or stepwise procedures may be used to obtain adjusted p-values from previously obtained raw p-values by the min P correcting procedure, or to calculate adjusted p-values directly from the distributions of test statistics by the max T technique [20-22]. Unlike the mentioned works, our paper is focused on problems associated with multiplicity control and combinations of several criteria when more complicated multifactor regularities are studied.

Data Set and Preliminary Results

The effect of hypoxia on the relationship between VEGF and 138 different biochemical, clinical or biophysical parameters was studied in a group of 88 patients aged 33 to 88 with ischemic stroke and transient ischemic attack. Hypoxia level was assessed by the partial pressures of O₂ and CO₂ and also by CO-oximetry parameters in venous blood, which were measured with an ABL80 FLEX CO-OX analyser. Serum levels of VEGF, S100 and complement component C4 were measured by enzyme-linked immunosorbent assay (ELISA). For simplicity, the CO-oximetry parameters together with the partial pressures of O₂ and CO₂ in venous blood are further referred to as oximetry parameters.

Results of standard correlation analysis are given in Table 1.

Table 1: Correlation coefficients between partial pressures of O₂ and CO₂ and CO-oximetry parameters in venous blood and VEGF levels.

It can be seen from Table 1 that statistically significant linear ties between VEGF levels and hypoxia are absent.

It is quite possible that linear ties may nevertheless exist in combination with some additional factors. Such a supposition is supported by the effect of the oxygen saturation index (sO₂) on the relationship between serum levels of VEGF and S100 proteins. This effect was discovered with the help of the OVP technique [11] and visual analysis of the corresponding scatter plots. The left scatter plot in Figure 1 shows a strong linear relationship between VEGF and S100 in the group with saturation level sO₂ less than 38.4%. It is seen from the right scatter plot that a noticeable linear dependence between VEGF and S100 is absent in the group with saturation level sO₂ greater than 38.4%.

It may be supposed from Figure 1 that hypoxia leads to correlation between VEGF (Y) and S100 (X_i). So S100 may be considered as an additional factor. Our goal is to assess the statistical significance of the assumed effect.

Verification of the Effect

Complex dependencies testing: The above supposition is too complex to test using a single null hypothesis. In fact the effect contradicts the following three hypotheses:

a) $H_{0}^{1} (k, i)$: Y is independent of vector (Z_k, X_i);

b) $H_{0}^{2} (k, i)$: X_i is independent of vector (Y, Z_k);

c) $H_{0}^{3} (k, i)$: Z_k is independent of vector (Y, X_i).

All three hypotheses must be rejected to be sure that the supposition is correct. Otherwise the observed pattern may have a simpler explanation. For example, it may be attributed to the existence of a linear relationship between Y and X_i when $H_{0}^{3} (k, i)$ is in fact true.

Testing single hypothesis: The discussed effect is associated with a large difference between $ρ_{l} (Y, X_{i})$ and $ρ_{r} (Y, X_{i})$ and with a large value of one of the two correlation coefficients. High correlation values are easily achieved by chance in small groups. So the two conditions above may be convincing evidence of the effect only when the sizes of both groups are large enough. Thus the mentioned conditions correspond to large values of the following functional:

$Q_{2 ρ} (δ) = \frac{m_{l} m_{r} \bigl| |ρ_{l} (Y, X_{i})| - |ρ_{r} (Y, X_{i})| \bigr|}{1 - \max \{ρ_{l}^{2} (Y, X_{i}), ρ_{r}^{2} (Y, X_{i})\}}, \quad (2)$

where m_l is the number of patients in group $\tilde{s}_{l}$ with Z_k < δ and m_r is the number of patients in group $\tilde{s}_{r}$ with Z_k ≥ δ. The threshold δ is initially unknown. It is proposed to use the optimal threshold δ₀ that is calculated as δ₀ = arg max Q_2ρ(δ). The maximum of the functional Q_2ρ(δ) may be found by trying all boundaries between distinct values of Z_k existing in the full group $\tilde{s}$. Large values of Q_2ρ(δ₀) testify better against each of the three null hypotheses $H_{0}^{1} (k, i)$, $H_{0}^{2} (k, i)$ and $H_{0}^{3} (k, i)$ if δ₀ is searched inside an interval (δ_l, δ_r) that includes only those thresholds for Z_k for which both inequalities m_l > m_thr and m_r > m_thr hold. The search interval is narrowed because large values of Q_2ρ(δ₀) are highly probable when the null hypotheses are true but one of the groups is small: large correlation coefficients evidently occur by chance more often in small groups, and the multiplier m_l m_r alone cannot fully compensate for this increase in probability.
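The threshold search described above can be sketched as follows. This is a hedged illustration with the ordinary Pearson coefficient: the names `q2rho` and `best_threshold`, the default `m_thr=20`, the use of midpoints between neighbouring distinct values of Z_k as candidate boundaries, and the synthetic data are all assumptions made for the example. The sketch also assumes that no group is perfectly correlated, since the denominator of (2) would then vanish.

```python
import numpy as np

def q2rho(y, x, z, delta):
    """Functional (2) for threshold delta, using ordinary Pearson correlations."""
    left = z < delta
    right = ~left
    m_l, m_r = int(left.sum()), int(right.sum())
    rho_l = np.corrcoef(y[left], x[left])[0, 1]
    rho_r = np.corrcoef(y[right], x[right])[0, 1]
    numerator = m_l * m_r * abs(abs(rho_l) - abs(rho_r))
    return numerator / (1.0 - max(rho_l**2, rho_r**2))

def best_threshold(y, x, z, m_thr=20):
    """Return (delta_0, Q(delta_0)) over boundaries with m_l > m_thr and m_r > m_thr."""
    values = np.unique(z)
    candidates = (values[:-1] + values[1:]) / 2.0  # boundaries between distinct values
    best_delta, best_q = None, -np.inf
    for delta in candidates:
        m_l = int(np.sum(z < delta))
        if m_l <= m_thr or len(z) - m_l <= m_thr:
            continue  # outside the narrowed interval (delta_l, delta_r)
        q = q2rho(y, x, z, delta)
        if q > best_q:
            best_delta, best_q = delta, q
    return best_delta, best_q

# Synthetic illustration (hypothetical data): Y and X correlate only when Z < 0.4.
rng = np.random.default_rng(1)
n = 400
z = rng.uniform(size=n)
y = rng.normal(size=n)
e = rng.normal(size=n)
x = np.where(z < 0.4, y + 0.3 * e, np.sqrt(1.09) * e)
delta0, q0 = best_threshold(y, x, z, m_thr=20)
print(delta0, q0)
```

The exhaustive scan is quadratic in the worst case, which is harmless for samples of the size considered here.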

Outlying observations, such as the observation circled in red in the left part of Figure 1, reduce Q_2ρ and so may hinder correct statistical evaluation of the studied effect. So it is better to use a robust version of the Pearson correlation coefficient.

Robust correlation coefficient: Suppose that the dependence of variable U on variable V is studied on data set $\tilde{s}$ = {(u_j, v_j) | j = 1, . . . , m}. At the first step the simple linear regression model U = β₀ + β₁V + e is fitted on $\tilde{s}$. Let $\tilde{s}_{out}$ be the set of all 3σ outliers, that is $\tilde{s}_{out}$ = {(u_j, v_j) : |u_j - β₀ - β₁v_j| > 3σ_e}, where σ_e is the standard error of the regression model. The robust correlation coefficient $\hat{ρ}$ is calculated as the standard Pearson correlation coefficient ρ on the set $\tilde{s}$ \ $\tilde{s}_{out}$.

The robust version of the functional (2) can be written as
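The robust coefficient defined above can be sketched in a few lines. The function name `robust_corr` is hypothetical, and the sketch assumes that σ_e is the residual standard error with m - 2 degrees of freedom, which the text does not specify.

```python
import numpy as np

def robust_corr(u, v):
    """Pearson correlation after dropping 3-sigma outliers of the regression u ~ v."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    beta1, beta0 = np.polyfit(v, u, deg=1)            # slope, intercept
    resid = u - (beta0 + beta1 * v)
    # Assumption: residual standard error with m - 2 degrees of freedom.
    sigma_e = np.sqrt(np.sum(resid**2) / (len(u) - 2))
    keep = np.abs(resid) <= 3.0 * sigma_e
    return np.corrcoef(u[keep], v[keep])[0, 1]

# Illustration on hypothetical data: one gross outlier masks a nearly linear relation.
rng = np.random.default_rng(0)
v = np.linspace(0.0, 1.0, 30)
u = 2.0 * v + 0.02 * rng.normal(size=30)
u[15] += 10.0
plain = np.corrcoef(u, v)[0, 1]
robust = robust_corr(u, v)
print(plain, robust)
```

Only one pass of outlier removal is performed, as in the description above; the regression is not refitted after the outliers are dropped.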

${\hat{Q}}_{2 ρ} (δ) = \frac{m_{l} m_{r} ||{\hat{ρ}}_{l} (Y, X_{i})| - |{\hat{ρ}}_{r} (Y, X_{i})||}{1 - \max \{{\hat{ρ}}_{l}^{2} (Y, X_{i}), {\hat{ρ}}_{r}^{2} (Y, X_{i})\}}, (3)$

where ${\hat{ρ}}_{l}$ and ${\hat{ρ}}_{r}$ are the robust correlation coefficients in groups ${\tilde{s}}_{l}$ and ${\tilde{s}}_{r}$. The optimal threshold δ₀ is now calculated as $δ_{0} = \arg \max_{δ \in (δ_{l}, δ_{r})} {\hat{Q}}_{2 ρ} (δ)$. The functional ${\hat{Q}}_{2 ρ} (δ_{0})$ is used as the statistic for testing null hypotheses $H_{0}^{1} (k, i)$, $H_{0}^{2} (k, i)$ and $H_{0}^{3} (k, i)$. The p-values for these null hypotheses are then calculated according to the following Procedure I:

• a₀) Calculate the optimal threshold δ₀ on data set ${\tilde{S}}_{t}$ as $\arg \max_{δ \in (δ_{l}, δ_{r})} {\hat{Q}}_{2 ρ} (δ)$. The observed value of the statistic $T^{0}$ is taken equal to ${\hat{Q}}_{2 ρ} (δ_{0})$.

To test $H_{0}^{1} (k, i)$ repeat independently steps a₁ and a₂ for r = 1, . . . , N.

• a₁) Take a random permutation ${\tilde{S}}_{t}^{r}$ of ${\tilde{S}}_{t}$. It is obtained from a random permutation $u^{r} = (u_{1}^{r}, ..., u_{m}^{r})$ of the object labels (1, . . . , m).

Let ${\tilde{S}}_{t}^{r} = \{(y_{u_{1}^{r}}, x_{1 i}, z_{1 k}), ..., (y_{u_{m}^{r}}, x_{m i}, z_{m k})\}$.

• a₂) Calculate the optimal threshold δ₀ on data set ${\tilde{S}}_{t}^{r}$ as $\arg \max_{δ \in (δ_{l}, δ_{r})} {\hat{Q}}_{2 ρ} (δ)$. The value of the statistic $T^{r}$ is taken equal to ${\hat{Q}}_{2 ρ} (δ_{0})$.

• Then calculate the estimate of the p-value for the null hypothesis $H_{0}^{1} (k, i)$ as ${\hat{p}}^{1} (k, i) = \frac{\sum_{1 \leq r \leq N} \mathbb{I} (T^{r} \geq T^{0})}{N}$.

To test $H_{0}^{2} (k, i)$ repeat independently steps a₃ and a₂ for r = 1, . . . , N.

• a₃) Take a random permutation ${\tilde{S}}_{t}^{r}$ of ${\tilde{S}}_{t}$ as ${\tilde{S}}_{t}^{r} = \{(y_{1}, x_{u_{1}^{r} i}, z_{1 k}), ..., (y_{m}, x_{u_{m}^{r} i}, z_{m k})\}$, where $u^{r} = (u_{1}^{r}, ..., u_{m}^{r})$ is a random permutation of {1, . . . , m}.

• Implement step (a₂).

• Then calculate the estimate of the p-value for the null hypothesis $H_{0}^{2} (k, i)$ as ${\hat{p}}^{2} (k, i) = \frac{\sum_{1 \leq r \leq N} \mathbb{I} (T^{r} \geq T^{0})}{N}$.

To test $H_{0}^{3} (k, i)$ repeat independently steps a₄ and a₂ for r = 1, . . . , N.

• a₄) Take a random permutation ${\tilde{S}}_{t}^{r}$ of ${\tilde{S}}_{t}$ as ${\tilde{S}}_{t}^{r} = \{(y_{1}, x_{1 i}, z_{u_{1}^{r} k}), ..., (y_{m}, x_{m i}, z_{u_{m}^{r} k})\}$, where $u^{r} = (u_{1}^{r}, ..., u_{m}^{r})$ is a random permutation of {1, . . . , m}.

• Implement step (a₂).

• Then calculate the estimate of the p-value for the null hypothesis $H_{0}^{3} (k, i)$ as ${\hat{p}}^{3} (k, i) = \frac{\sum_{1 \leq r \leq N} \mathbb{I} (T^{r} \geq T^{0})}{N}$.

A regularity for a combination of factors (Y, X_i, Z_k) similar to the one in Figure 1 is considered significant at level α if all three inequalities ${\hat{p}}^{1} (k, i) < α$, ${\hat{p}}^{2} (k, i) < α$ and ${\hat{p}}^{3} (k, i) < α$ hold. It is difficult to reach a theoretical conclusion about the unbiasedness and consistency of the described test. However, its performance can be assessed in experiments with simulated data.
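The permutation scheme of Procedure I can be sketched as below. To keep the example short, the statistic `max_q` is a simplified stand-in that maximizes the non-robust functional (2) instead of the robust functional (3), and it assumes that Z_k has no tied values. The names `max_q` and `procedure_one`, the value `m_thr=10`, the number of permutations and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def max_q(y, x, z, m_thr=10):
    """Stand-in statistic: maximum of functional (2) over admissible thresholds."""
    order = np.argsort(z)                      # assumes no ties in z
    y, x = y[order], x[order]
    m, best = len(z), 0.0
    for m_l in range(m_thr + 1, m - m_thr):    # keeps m_l > m_thr and m_r > m_thr
        r_l = np.corrcoef(y[:m_l], x[:m_l])[0, 1]
        r_r = np.corrcoef(y[m_l:], x[m_l:])[0, 1]
        q = m_l * (m - m_l) * abs(abs(r_l) - abs(r_r)) / (1.0 - max(r_l**2, r_r**2))
        best = max(best, q)
    return best

def procedure_one(y, x, z, stat=max_q, n_perm=100, seed=0):
    """Estimate p-values for H1 (permute Y), H2 (permute X) and H3 (permute Z)."""
    rng = np.random.default_rng(seed)
    t0 = stat(y, x, z)
    counts = np.zeros(3)
    for _ in range(n_perm):
        counts[0] += stat(rng.permutation(y), x, z) >= t0   # steps a1, a2
        counts[1] += stat(y, rng.permutation(x), z) >= t0   # steps a3, a2
        counts[2] += stat(y, x, rng.permutation(z)) >= t0   # steps a4, a2
    return counts / n_perm

# Synthetic illustration (hypothetical data): Y and X correlate only when Z < 0.4.
rng = np.random.default_rng(0)
m = 60
z = rng.uniform(size=m)
y = rng.normal(size=m)
e = rng.normal(size=m)
x = np.where(z < 0.4, y + 0.3 * e, np.sqrt(1.09) * e)
p1, p2, p3 = procedure_one(y, x, z, n_perm=100)
print(p1, p2, p3)
```

Passing the statistic as a callable keeps the permutation logic separate from the choice of functional, so a robust version could be substituted without touching `procedure_one`.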

Experiments with simulated data

Design of experiments: Experiments were designed to imitate the regularity from Figure 1 only for certain groups of variables, while for the remaining ones such regularities were absent. The scenario includes generating variables Y, Z₁, . . . , Z₇, X₁, . . . , X_n. Variables Y and e₁, . . . , e_n were independently sampled from N(0,1); variables Z₁, U_g and U₂, . . . , U₇ were independently sampled from the continuous uniform distribution U(0,1). Variables X₁, . . . , X_n are calculated from Y, e₁, . . . , e_n, Z₁ and U_g. Variables Z₂, . . . , Z₇ are calculated from Z₁ and U₂, . . . , U₇.

1) Variables X₁, . . . , X₃₀ for an observation j were calculated as $X_{i j} = Y_{j} - δ_{1} * e_{i j}$ if Z_1j < 0.4 and as $X_{i j} = δ_{2} * e_{i j}$ otherwise. In the group with Z_1j < 0.4 the correlation level between Y and X_i is determined by parameter δ₁. Experiments with δ₁ = 0.75 and δ₁ = 0.57 were conducted. The choice δ₁ = 0.75 generates data with ρ(Y, X_i) = 0.8 and the choice δ₁ = 0.57 corresponds to ρ(Y, X_i) = 0.87. Parameter δ₂ was taken equal to $\sqrt{1 + δ_{1}^{2}}$. Observations with Z_1j ≥ 0.4 were generated from a distribution with ρ(Y, X_i) = 0.

Thus in the first experiment data was sampled from distribution with ρ(Y, X_i) = 0.8, σ_Y = 1 and $σ_{X_{i}} = \sqrt{1 + {0.75}^{2}}$ when Z₁ < 0.4 and from distribution with ρ(Y, X_i) = 0, σ_Y = 1 and $σ_{X_{i}} = \sqrt{1 + {0.75}^{2}}$ when Z₁ ≥ 0.4.

In the second experiment data was sampled from a distribution with ρ(Y, X_i) = 0.87, σ_Y = 1 and $σ_{X_{i}} = \sqrt{1 + {0.57}^{2}}$ when Z₁ < 0.4 and from a distribution with ρ(Y, X_i) = 0, σ_Y = 1 and $σ_{X_{i}} = \sqrt{1 + {0.57}^{2}}$ when Z₁ ≥ 0.4. So data is generated to produce an effect similar to the effect from Figure 1 for each combination from the set {(Y, X_i, Z_k) | i = 1, . . . , 30, k = 1, . . . , 7}.

2) Variables X₃₁, . . . , X₆₀ for the observation j were calculated as $X_{i j} = Y_{j} - δ_{1} * e_{i j}$ if U_g < 0.4 and as $X_{i j} = δ_{2} * e_{i j}$ if U_g ≥ 0.4. So observations are generated from a mixture of distributions with ρ(Y, X_i) = 0.8 and ρ(Y, X_i) = 0 in the first experiment and from a mixture of distributions with ρ(Y, X_i) = 0.87 and ρ(Y, X_i) = 0 in the second one. Thus data is generated to produce a weak mutual correlation for pairs from {(Y, X_i) | i = 31, . . . , 60} that does not depend on Z₁, . . . , Z₇.

3) Variables X₆₁, . . . , X₉₀ for the observation j were calculated as $X_{i j} = δ_{2} * e_{i j}$. So Y is independent of variables X₆₁, . . . , X₉₀ and Z₁.

4) Variables Z₂, . . . , Z₇ for the observation j were calculated as Z_ij = Z_1j if U_ij ≤ 0.9 and as Z_ij = 1 - Z_1j if U_ij > 0.9, i = 2, . . . , 7. There are no regularities for combinations (Y, X_i, Z_k) with 31 ≤ i ≤ 90 that are similar to the regularity from Figure 1. Variables Z₁, . . . , Z₇ are included in the scenario to imitate all 7 oximetry factors. For combinations (Y, X_i, Z₁) the regularities are more distinct than for combinations (Y, X_i, Z_k) with k > 1. Sets X₁, . . . , X₃₀; X₃₁, . . . , X₆₀; X₆₁, . . . , X₉₀ will be referred to as ${\tilde{C}}_{1}, {\tilde{C}}_{2}, {\tilde{C}}_{3}$ correspondingly.
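The scenario in items 1) to 4) can be sketched as a data generator. Several details are assumptions of the sketch and are not stated in the text: the sample size `m`, the reading that U_g is drawn once per observation, and the reading that Z₂, . . . , Z₇ are each driven by their own uniform variable U₂, . . . , U₇ as the design paragraph states. The function name `simulate` is hypothetical. Since Cov(Y, Y - δ₁e) = 1 and Var(Y - δ₁e) = 1 + δ₁², the correlation in the dependent regime is 1/√(1 + δ₁²), which gives 0.8 for δ₁ = 0.75.

```python
import numpy as np

def simulate(m=88, delta1=0.75, n_per_set=30, seed=0):
    """Generate Y (m,), X (3 * n_per_set, m) and Z (7, m) following items 1) to 4)."""
    rng = np.random.default_rng(seed)
    delta2 = np.sqrt(1.0 + delta1**2)
    y = rng.normal(size=m)
    e = rng.normal(size=(3 * n_per_set, m))
    z1 = rng.uniform(size=m)
    u_g = rng.uniform(size=m)        # assumption: one U_g value per observation
    u = rng.uniform(size=(6, m))     # assumption: U_2..U_7 drive Z_2..Z_7

    x = delta2 * e                   # independent of Y by default (set C3)
    c1 = slice(0, n_per_set)
    c2 = slice(n_per_set, 2 * n_per_set)
    x[c1] = np.where(z1 < 0.4, y - delta1 * e[c1], x[c1])    # item 1: controlled by Z1
    x[c2] = np.where(u_g < 0.4, y - delta1 * e[c2], x[c2])   # item 2: mixture, not Z1

    z_rest = np.where(u <= 0.9, z1, 1.0 - z1)                # item 4
    z = np.vstack([z1, z_rest])
    return y, x, z

y, x, z = simulate()
print(x.shape, z.shape)
```

With a large `m` the empirical correlation between Y and any variable from the first set should approach 0.8 in the group with Z₁ < 0.4 and 0 elsewhere, which gives a quick sanity check of the generator.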

Results of experiment: Results of the first and second experiments are presented in Table 2. Rows of the table correspond to significance levels from p < 0.0001 to p < 0.1. The upper part of the table corresponds to the first experiment (δ₁ = 0.75) and the lower part corresponds to the second experiment (δ₁ = 0.57). The cell at the intersection of the row corresponding to significance level α and the column corresponding to subset ${\tilde{C}}_{j}$ contains the number of triples (Y, X_i, Z₁) with $X_{i} \in {\tilde{C}}_{j}$ for which all three null hypotheses were rejected at least at level α. The number of triples from the set $\{(Y, X_{i}, Z_{k}) | X_{i} \in {\tilde{C}}_{j}, 2 \leq k \leq 7\}$ for which all three null hypotheses were rejected at level α is given in the same cell in parentheses.

Table 2: Results of experiments with simulated data.

It may be seen from Table 2 that all three null hypotheses were rejected for the majority of combinations from the set ${\tilde{C}}_{1} = \{(Y, X_{i}, Z_{k}) | 1 \leq i \leq 30, k = 1, . . . , 7\}$. In the first experiment all null hypotheses were rejected at significance level p < 0.01 for all 30 combinations from ${\tilde{C}}_{1}$ when k = 1 and for 163 combinations out of 180 when k = 2, . . . , 7. In contrast, all null hypotheses were rejected only for a few combinations from the sets ${\tilde{C}}_{2} = \{(Y, X_{i}, Z_{k}) | 31 \leq i \leq 60, k = 1, . . . , 7\}$ and ${\tilde{C}}_{3} = \{(Y, X_{i}, Z_{k}) | 61 \leq i \leq 90, k = 1, . . . , 7\}$. All null hypotheses were rejected at significance level p < 0.1 for only one combination from ${\tilde{C}}_{2}$ and for one combination from ${\tilde{C}}_{3}$ when k = 1. The number of combinations for which all three null hypotheses were rejected equals 6 inside ${\tilde{C}}_{2}$ and 3 inside ${\tilde{C}}_{3}$ for k = 2, . . . , 7. In the second experiment the number of combinations for which all three null hypotheses were rejected at level α was higher than the corresponding number in the first experiment for practically all significance levels. So the results of the experiments support the unbiasedness of the developed criterion.

Experiments with clinical data

The developed technique was applied to find regularities similar to the one shown in Figure 1 in the clinical data set described above. Three null hypotheses were tested for combinations (Y, X_i, Z_k), where Y is the concentration of VEGF in serum and Z₁, . . . , Z₇ are the oximetry parameters sO₂, pO₂, pCO₂, FCOHb, FO₂Hb, FMetHb, FHHb. All 138 variables other than VEGF concentration and the oximetry parameters were tried as additional factors X₁, . . . , X_n. The most significant effects were revealed when concentrations of S100 proteins or complement component C4 in serum were used as the additional factor X. Table 3 presents the calculated p-values ${\hat{p}}^{1}$, ${\hat{p}}^{2}$ and ${\hat{p}}^{3}$ for all such combinations.

Table 3: Results of the null hypotheses testing when S100 or C4 are additional factors.

It is seen from Table 3 that all null hypotheses are rejected at significance level p < 0.001 for combination (VEGF, pO₂, C4), at significance level p < 0.002 for combinations (VEGF, sO₂, C4) and (VEGF, FO₂Hb, C4), and at significance level p < 0.05 for combinations (VEGF, FHHb, C4), (VEGF, sO₂, S100) and (VEGF, pCO₂, S100). The regularity related to the effect of hypoxia on the correlation between VEGF and S100 is shown in Figure 2.

Figure 2: Correlations between S100 and VEGF levels are compared in groups with sO₂ < 39.75% (left scatter plot) and sO₂ > 39.75% (right scatter plot). Boundary point was calculated using Procedure I.

The pattern in Figure 2 is similar to the pattern in Figure 1. However, the boundary point for the pattern in Figure 2 was obtained by Procedure I. This boundary differs from the boundary for the pattern in Figure 1, which was calculated by the OVP method. The correlation coefficient between VEGF and S100 in the group of 33 cases with sO₂ < 39.75% equals 0.64. It increases to 0.88 after removing the outlying object highlighted by a red circle in the left scatter diagram. No relationship between VEGF and S100 exists in the group of 55 cases with sO₂ > 39.75%, as may be seen from the right diagram. The corresponding correlation coefficient equals 0.03.

It may be seen from Figure 3 and Figure 4 that the effect of hypoxia on the relationship between VEGF and C4 is similar to the effect of hypoxia on the relationship between VEGF and S100. The correlation coefficient between VEGF and C4 equals 0.47 in the group of 31 cases with sO₂ < 39.25%, which corresponds to the left scatter diagram in Figure 3. It increases to 0.76 after removing the outlying object highlighted by a red circle in the left scatter diagram. No significant relationship between VEGF and C4 exists in the group of 57 cases with sO₂ > 39.25%, as may be seen from the right diagram. The corresponding correlation coefficient equals -0.11. It increases to 0.05 after removing the outlier highlighted in the right diagram.

Figure 3: Correlations between VEGF and C4 levels are compared in groups with sO₂ < 39.25% (left scatter plot) and sO₂ > 39.25% (right scatter plot). Boundary point was calculated using Procedure I.

Figure 4: Correlations between VEGF and C4 levels are compared in groups with FHHb < 56.3% (left scatter plot) and FHHb > 56.3% (right scatter plot). Boundary point was calculated using Procedure I.

A low saturation index sO₂ corresponds to high FHHb values. A strong correlation between VEGF and C4 exists when FHHb is greater than a certain threshold, whereas the correlation coefficient is close to zero when FHHb is lower than the threshold, as can be seen in Figure 4. The correlation coefficient between VEGF and C4 in the group of 36 cases with FHHb > 56.3% equals 0.45. It increases to 0.73 after removing the outlier highlighted in the right diagram. No significant relationship between VEGF and C4 exists in the group of 52 cases with FHHb < 56.3%, as may be seen from the left diagram. The corresponding correlation coefficient equals -0.1. It increases to 0.07 after removing the outlier highlighted in the left diagram.

Our goal is to test whether hypoxia has an effect on VEGF production by controlling the relationship between VEGF and some additional factor. The hypoxia effect is manifested via effects associated with different oximetry parameters. Existence of the supposed hypoxia effect simultaneously contradicts several null hypotheses associated with different oximetry parameters. The hypoxia effect may be assessed by testing the global null hypotheses $H_{0 c}^{1} (i) = \cap_{k = 1}^{7} H_{0}^{1} (k, i)$, $H_{0 c}^{2} (i) = \cap_{k = 1}^{7} H_{0}^{2} (k, i)$ and $H_{0 c}^{3} (i) = \cap_{k = 1}^{7} H_{0}^{3} (k, i)$ [15]. These global hypotheses may be tested with the help of the nonparametric combinations (NPC) methodology [15]. To test global hypothesis $H_{0 c}^{g} (i)$ (g = 1, . . . , 3) on data set ${\tilde{S}}_{t} = \{(y_{1}, x_{1 i}, z_{11}, . . . , z_{17}), . . . , (y_{m}, x_{m i}, z_{m 1}, . . . , z_{m 7})\}$ the following NPC procedure was used.

Procedure II: Repeat steps (b₁), . . . , (b₄) for k = 1, . . . , 7.

• b₁) Calculate $T^{0}$ according to step (a₀) from Procedure I on data set $\{(y_{1}, x_{1 i}, z_{1 k}), . . . , (y_{m}, x_{m i}, z_{m k})\}$.

• b₂) For r = 1, . . . , N repeat step $b_{2}^{g}$ if $H_{0 c}^{g} (i)$ is tested, g = 1, . . . , 3.

- $b_{2}^{1})$ Take random permutation ${\tilde{S}}_{t}^{r} = \{(y_{u_{1}^{r}}, x_{1 i}, z_{1 k}), . . . , (y_{u_{m}^{r}}, x_{m i}, z_{m k})\}$ of ${\tilde{S}}_{t}$ according to step (a₁) from Procedure I and implement step (a₂) from Procedure I to calculate statistic $T^{r}$ on ${\tilde{S}}_{t}^{r}$.

- $b_{2}^{2})$ Take random permutation ${\tilde{S}}_{t}^{r} = \{(y_{1}, x_{u_{1}^{r} i}, z_{1 k}), . . . , (y_{m}, x_{u_{m}^{r} i}, z_{m k})\}$ of ${\tilde{S}}_{t}$ according to step (a₃) from Procedure I and implement step (a₂) from Procedure I to calculate statistic $T^{r}$ on ${\tilde{S}}_{t}^{r}$.

- $b_{2}^{3})$ Take random permutation ${\tilde{S}}_{t}^{r} = \{(y_{1}, x_{1 i}, z_{u_{1}^{r} k}), . . . , (y_{m}, x_{m i}, z_{u_{m}^{r} k})\}$ of ${\tilde{S}}_{t}$ according to step (a₄) from Procedure I and implement step (a₂) from Procedure I to calculate statistic $T^{r}$ on ${\tilde{S}}_{t}^{r}$.

• b₃) Calculate $\hat{p} (k, i) = \frac{\sum_{1 \leq j \leq N} \mathbb{I} (T^{j} (i) \geq T^{0} (i))}{N}$.

• b₄) Calculate $λ (k, r, i) = \frac{\sum_{1 \leq j \leq N} \mathbb{I} [T^{j} (i) \geq T^{r} (i)]}{N}$ for r = 1, . . . , N.

• Calculate statistic $T_{c}^{0} (i) = ψ [\hat{p} (1, i), . . . , \hat{p} (7, i)]$.

• Calculate statistics $T_{c}^{r} (i) = ψ [λ (1, r, i), . . . , λ (7, r, i)]$ for r = 1, . . . , N.

• Then calculate ${\hat{p}}_{c} (i) = \frac{\sum_{1 \leq r \leq N} \mathbb{I} [T_{c}^{r} (i) \geq T_{c}^{0} (i)]}{N}$, which is the p-value for testing hypothesis $H_{0 c}^{g} (i)$ when step $b_{2}^{g}$ is used.

The test statistic in Procedure II is calculated as a combining function of the p-values related to the partial tests. Several combining functions are discussed in [15]. According to our experiments, the best performance is achieved when a slightly modified Fisher combining function ψ is used. Let ${\hat{p}}_{1}, . . . , {\hat{p}}_{k}$ be p-values calculated by a permutation test with N random permutations. Then $ψ ({\hat{p}}_{1}, . . . , {\hat{p}}_{k}) = - 2 \sum_{i = 1}^{k} \log {\hat{p}}^{'}_{i}$, where ${\hat{p}}^{'}_{i} = {\hat{p}}_{i}$ if ${\hat{p}}_{i} > 0$ and ${\hat{p}}^{'}_{i} = 1/(2 N)$ otherwise.

Results of Procedure II applied to the studied data set are presented in Table 4. It can be seen that the global null hypotheses $H_{0 c}^{1}$ and $H_{0 c}^{2}$ are rejected at level p < 0.0005 when the additional factors are C4 and S100 concentrations. Global null hypothesis $H_{0 c}^{3}$ is rejected at p = 0.0001 when the additional factor is C4 concentration. But $H_{0 c}^{3}$ is not rejected when the additional factor is S100 concentration.
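The combining step of Procedure II with the modified Fisher function ψ can be sketched as follows. The sketch starts from precomputed statistics and assumes, as is usual for NPC, that column r of `t_perm` was obtained with the same permutation u^r for every oximetry parameter k, so that the dependence between partial tests is preserved. The names `fisher_psi` and `npc_pvalue` and the array layout are hypothetical.

```python
import numpy as np

def fisher_psi(p, n_perm):
    """Modified Fisher combination: zero p-values are replaced by 1 / (2 N)."""
    p = np.asarray(p, dtype=float)
    p = np.where(p > 0, p, 1.0 / (2 * n_perm))
    return -2.0 * np.sum(np.log(p), axis=0)

def npc_pvalue(t_obs, t_perm):
    """Combined p-value from observed statistics (K,) and permutation statistics (K, N)."""
    t_obs = np.asarray(t_obs, dtype=float)
    t_perm = np.asarray(t_perm, dtype=float)
    n_perm = t_perm.shape[1]
    # Step b3: partial p-values for the observed statistics.
    p_hat = (t_perm >= t_obs[:, None]).mean(axis=1)
    # Step b4: lam[k, r] = share of permutations j with T^j >= T^r.
    lam = (t_perm[:, None, :] >= t_perm[:, :, None]).mean(axis=2)
    t_c0 = fisher_psi(p_hat, n_perm)       # scalar
    t_cr = fisher_psi(lam, n_perm)         # shape (N,)
    return float(np.mean(t_cr >= t_c0))

# Illustration with hypothetical statistics for K = 7 partial tests and N = 200 permutations.
rng = np.random.default_rng(0)
t_perm = rng.normal(size=(7, 200))
p_extreme = npc_pvalue(t_perm.max(axis=1) + 1.0, t_perm)
p_null = npc_pvalue(t_perm.min(axis=1) - 1.0, t_perm)
print(p_extreme, p_null)
```

The pairwise comparison in step b₄ allocates a K × N × N array, which is fine for a few hundred permutations; for large N a sort-based ranking would be a more economical choice.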

Multiplicity Control

It was necessary to test global null hypotheses from the set $\{H_{0 c}^{1} (i), H_{0 c}^{2} (i), H_{0 c}^{3} (i) | i = 1, . . . , n\}$ to reveal the supposed effect of hypoxia on the relationship between the concentration of VEGF and the concentrations of C4 or S100. So a multiple testing procedure must be used to assess the true statistical significance of the revealed effects. It is sufficient to use a single-step procedure because global null hypotheses were rejected for only two additional factors, C4 and S100. The min P and max T methods are effective tools of multiplicity control [22]. The more universal min P procedure is preferable for the discussed task because the max T technique relies on the supposition, unjustified here, that the test statistics have approximately equal distributions.

Adjusted p-values were calculated by Procedure III presented below.

Repeat steps (c₁), (c₂) and (c₃) for i = 1, . . . , n.

• c₁) Use Procedure II to calculate statistics $T_{c}^{0} (i), T_{c}^{1} (i), . . . , T_{c}^{N} (i)$ on ${\tilde{S}}_{t}$ and on random permutations $\{{\tilde{S}}_{t}^{r} |r = 1, . . ., N\}$ correspondingly. Step $b_{2}^{g}$ is implemented when $H_{0 c}^{g} (i)$ is tested, g = 1, . . . , 3.

• c₂) Calculate $λ_{c} (0, i) = \frac{\sum_{1 \leq j \leq N} \mathbb{I} (T_{c}^{j} (i) \geq T_{c}^{0} (i))}{N}$.

• c₃) Calculate $λ_{c} (r, i) = \frac{\sum_{1 \leq j \leq N} \mathbb{I} (T_{c}^{j} (i) \geq T_{c}^{r} (i))}{N}$ for r = 1, . . . , N.

• c₄) Calculate ${\hat{p}}_{m t} (i) = \frac{\sum_{1 \leq r \leq N} \mathbb{I} [\min_{i' \in \{1, ..., n\}} λ_{c} (r, i') \leq λ_{c} (0, i)]}{N}$.

• c₅) ${\hat{p}}_{m t} (i)$ is equal to ${\hat{p}}_{m t}^{g} (i)$, the adjusted p-value for testing global null hypothesis $H_{0 c}^{g} (i)$, when step $(b_{2}^{g})$ is used at step (c₁).
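Steps (c₂) to (c₄) amount to a single-step min P adjustment, which can be sketched as below. The sketch takes the combined statistics as given and assumes that column r of `tc_perm` comes from the same permutation for every candidate factor i. The names `perm_pvalues` and `minp_adjust` and the random inputs are hypothetical.

```python
import numpy as np

def perm_pvalues(t):
    """For each T^r in t, the share of permutations j with T^j >= T^r (step c3)."""
    s = np.sort(t)
    # searchsorted(..., side="left") counts the values strictly below T^r.
    return (len(t) - np.searchsorted(s, t, side="left")) / len(t)

def minp_adjust(tc_obs, tc_perm):
    """Return raw and single-step min P adjusted p-values for n hypotheses."""
    tc_obs = np.asarray(tc_obs, dtype=float)      # shape (n,)
    tc_perm = np.asarray(tc_perm, dtype=float)    # shape (n, N)
    raw = (tc_perm >= tc_obs[:, None]).mean(axis=1)              # step c2
    lam = np.vstack([perm_pvalues(row) for row in tc_perm])      # step c3
    min_lam = lam.min(axis=0)                                    # minimum over factors
    adjusted = (min_lam[None, :] <= raw[:, None]).mean(axis=1)   # step c4
    return raw, adjusted

# Illustration with hypothetical combined statistics for n = 20 factors, N = 500 permutations.
rng = np.random.default_rng(0)
tc_perm = rng.chisquare(df=14, size=(20, 500))
tc_obs = rng.chisquare(df=14, size=20)
tc_obs[0] = tc_perm.max() + 1.0     # one clearly extreme factor
raw, adjusted = minp_adjust(tc_obs, tc_perm)
print(raw[:3], adjusted[:3])
```

An adjusted p-value can never be smaller than the corresponding raw one, because every permutation counted in step (c₂) is also counted in step (c₄).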

The described Procedure III provides weak FWE control. Procedure III was applied to calculate adjusted p-values for global null hypotheses $H_{0 c}^{1},$ $H_{0 c}^{2}$ and $H_{0 c}^{3} .$ Results are presented in Table 4.

Table 4: Results of the global null hypothesis testing with NPC and multiplicity control with Procedure III.

It is seen from Table 4 that the global null hypotheses $H_{0 c}^{1}$ and $H_{0 c}^{2}$ are rejected when concentrations of C4 and S100 are additional factors. However, $H_{0 c}^{3}$ is not rejected if the additional factor is the concentration of S100. So it is possible that the pattern in Figure 2 is related only to the existence of a linear relationship between VEGF and S100 levels in serum that is not controlled by hypoxia. On the other hand, the set of relationships that are represented in Figure 3 and Figure 4 or mentioned in Table 3, where the concentration of C4 is the additional factor, cannot be explained by some simpler effect. These relationships cannot be reduced to a linear correlation between VEGF and C4. Nor can they be explained by an effect of hypoxia on VEGF only or on C4 only. So the supposition that the relationship between VEGF and C4 levels in serum is controlled by hypoxia is in accordance with the data.

Conclusion

The results may be briefly summarized as follows. A method was developed to discover relationships of the following type in data: a significant linear correlation between two factors $Y$ and $X_{i}$ exists only if a third factor $Z_{k}$ lies on one side of some threshold δ, while on the other side of δ the Pearson correlation coefficient is close to zero.

It was suggested that such a three-factor relationship be considered statistically significant when three null hypotheses are rejected: $H_{0 c}^{1}$, $H_{0 c}^{2}$ and $H_{0 c}^{3}$. These hypotheses were tested with nonparametric permutation tests whose statistic is the optimal value of a special quality functional.
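The general shape of such a test can be illustrated with a short Python sketch. It is not the authors' procedure: the quality functional used here, the largest absolute difference between the magnitudes of the Pearson correlations on the two sides of a threshold, is a stand-in chosen only for illustration, and the names `best_threshold_stat`, `permutation_pvalue`, `min_size` and `which` are hypothetical. Permuting one factor relative to the other two corresponds to a null hypothesis that this factor is independent of the other two.

```python
import numpy as np

def best_threshold_stat(y, x, z, min_size=10):
    """Scan thresholds on z and return the best value of an illustrative functional."""
    y, x, z = (np.asarray(a, dtype=float) for a in (y, x, z))
    order = np.argsort(z)
    y, x = y[order], x[order]
    best = 0.0
    # Each split index k corresponds to a threshold between z[order][k-1] and z[order][k]
    for k in range(min_size, len(z) - min_size + 1):
        r_low = np.corrcoef(y[:k], x[:k])[0, 1]     # correlation below the threshold
        r_high = np.corrcoef(y[k:], x[k:])[0, 1]    # correlation above the threshold
        best = max(best, abs(abs(r_low) - abs(r_high)))
    return best

def permutation_pvalue(y, x, z, which="z", n_perm=200, seed=0):
    """Permutation p-value when the factor named by `which` is shuffled."""
    rng = np.random.default_rng(seed)
    data = {"y": np.asarray(y), "x": np.asarray(x), "z": np.asarray(z)}
    t_obs = best_threshold_stat(data["y"], data["x"], data["z"])
    hits = 0
    for _ in range(n_perm):
        shuffled = dict(data)
        shuffled[which] = rng.permutation(data[which])   # break the link to the other two factors
        t_perm = best_threshold_stat(shuffled["y"], shuffled["x"], shuffled["z"])
        hits += int(t_perm >= t_obs)
    return hits / n_perm
```

Because the threshold is re-optimized for every permutation, the selection of the best threshold is already accounted for in the resulting p-value. The sketch assumes continuous data without ties in the threshold factor.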

Performance of the method was evaluated on tasks with simulated data. Table 2 shows good concordance between the regularities found and the patterns built into the experimental scenario.

The method was applied to test the supposition that hypoxia controls the relationship between serum VEGF concentration and some factor from the analyzed clinical database. Hypoxia was assumed to be manifested by oximetry parameters. The three null hypotheses were rejected for a set of triples $(Y, X_{i}, Z_{k})$, where $Y$ is VEGF concentration, $X_{i}$ is an additional factor and $Z_{k}$ is one of the oximetry parameters.

The significance of the hypoxia effect on the correlation between VEGF level and the additional factor $X_{i}$ may be assessed as the combined significance of the effects related to different oximetry parameters. Combined significance was evaluated with the NPC method, which tests the intersection of the null hypotheses related to the set of triples $\{(Y, X_{i}, Z_{k}) \mid k = 1, \ldots, 7\}$. Single-step permutation-based FWE control was implemented to account for the search for the additional factor among 138 variables.
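A nonparametric combination of partial tests can be sketched as follows. The sketch assumes Fisher's combining function, which is one common choice in the NPC literature. This section does not state which combining function was actually used, so the choice is an assumption, as are the function name `npc_fisher` and the smoothing of partial p-values by (count + 0.5)/(N + 1) that keeps the logarithm finite.

```python
import numpy as np

def npc_fisher(T):
    """Global permutation p-value from K partial statistics (illustrative sketch).

    T has shape (N + 1, K): row 0 holds the observed partial statistics
    (for example, one per oximetry parameter), rows 1..N hold the values
    obtained for N permutations applied jointly to all K partial tests.
    """
    T = np.asarray(T, dtype=float)
    N = T.shape[0] - 1
    # counts[r, k]: number of permutation statistics T^j(k) that are >= T^r(k)
    counts = (T[1:][None, :, :] >= T[:, None, :]).sum(axis=1)
    # Smoothed partial p-values; the smoothing is an assumption of this sketch
    lam = (counts + 0.5) / (N + 1)
    # Fisher's combining function: large values indicate small partial p-values
    combined = -2.0 * np.log(lam).sum(axis=1)
    # Share of permutations whose combined statistic reaches the observed one
    return float(np.mean(combined[1:] >= combined[0]))
```

Using the same permutations for all partial tests preserves the dependence between the oximetry parameters, which is why the combined statistic is referred to its own permutation distribution rather than to a chi-square law.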

All three combined null hypotheses were rejected at the adjusted significance level p < 0.02 when the concentration of complement component C4 is the additional factor. The developed technique may be used in a variety of biomedical tasks where it is necessary to assess the effect of some factor or group of factors on existing linear relationships.

Acknowledgement

This study was supported by RFBR grant 20-01-00609.

Article Details

Volume 6 Issue 1

International Journal of Clinical Biostatistics and Biometrics

ISSN: 2469-5831

Int J Clin Biostat Biom

Abbreviation: ijcbb

DOI: 10.23937/2469-5831/1510024

Pub Date: March 04, 2020

Article Type: Research Article

Pub Type: Open Access

Corresponding author

Oleg V Senko, Federal Research Center/Research Center of Informatics and Control, Russian Academy of Sciences, Moscow, Russia.

Copyright

© 2019 Kodryan MS, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

References

[ref1] Han W, Song X, He R, Li T, Cheng L, et al. (2017) VEGF regulates hippocampal neurogenesis and reverses cognitive deficits in immature rats after status epilepticus through the VEGF R2 signaling pathway. Epilepsy Behav 68: 159-167.

[ref2] Ziello JE, Jovin IS, Huang Y (2007) Hypoxia-Inducible Factor (HIF)-1 regulatory pathway and its potential for therapeutic intervention in malignancy and ischemia. Yale J Biol Med 80: 51-60.

[ref3] Pavlisa G, Pavlisa G, Kusec V, Kolonic SO, Markovic AS, et al. (2010) Serum levels of VEGF and bFGF in hypoxic patients with exacerbated COPD. Eur Cytokine Netw 21: 92-98.

[ref4] Asai K, Kanazawa H, Kamoi H, Shirashi S, Hirata K, et al. (2003) Increased levels of vascular endothelial growth factor in induced sputum in asthmatic patients. Clin Exp Allergy 33: 595-599.

[ref5] Ma L, Chen Y, Jin G, Yang Y, Ga Q, et al. (2015) Vascular endothelial growth factor as a prognostic parameter in subjects with "Plateau Red Face". High Alt Med Biol 16: 147-153.

[ref6] Pollard JW (2004) Tumour-educated macrophages promote tumour progression and metastasis. Nat Rev Cancer 4: 71-78.

[ref7] Ribatti D, Crivellato E (2009) Immune cells and angiogenesis. J Cell Mol Med 13: 2822-2833.

[ref8] Hoeres T, Wilhelm M, Smetak M, Holzmann E, Schulze-Tanzil G, et al. (2018) Immune cells regulate VEGF signalling via release of VEGF and antagonistic soluble VEGF receptor-1. Clin Exp Immunol 192: 54-67.

[ref9] Van Beest P, Wietasch G, Scheeren T, Spronk P, Kuiper M (2011) Clinical review: Use of venous oxygen saturations as a goal - a yet unfinished puzzle. Crit Care 15: 232.

[ref10] Nebout S, Pirracchio R (2011) Should we monitor ScVO2 in critically ill patients? Cardiol Res Pract 2012: 370697.

[ref11] Senko OV, Kuznetsova AV (2006) The optimal valid partitioning procedures. Statistics on the Internet.

[ref12] Kuznetsova AV, Kostomarova IV, Senko OV (2013) Modification of the method of optimal valid partitioning for comparison of patterns related to the occurrence of ischemic stroke in two groups of patients. Pattern Recognition and Image Analysis 22: 10-25.

[ref13] Anderson MJ, Robinson J (2001) Permutation tests for linear models. Aust N Z J Stat 43: 75-88.

[ref14] Kim HJ, Fay MP, Feuer EJ, Midthune DN (2000) Permutation tests for joinpoint regression with applications to cancer rates. Stat Med 19: 335-351.

[ref15] Pesarin F, Salmaso L (2010) Permutation tests for complex data: Theory, Applications and Software. John Wiley and Sons, Ltd.

[ref16] Dudoit S, Popper Shaffer J, Boldrick JC (2003) Multiple hypothesis testing in microarray experiments. Statistical Science 18: 71-103.

[ref17] Tusher VG, Tibshirani R, Chu G (2001) Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci U S A 98: 5116-5121.

[ref18] Ge Y, Sealfon SC, Speed TP (2009) Multiple testing and its applications to microarrays. Stat Meth Med Res 18: 543-563.

[ref19] Goeman JJ, Solari A (2014) Multiple hypothesis testing in genomics. Statist Med 33: 1946-1978.

[ref20] Westfall PH, Young SS (1993) Resampling-based multiple testing: Examples and methods for p-value adjustment. John Wiley & Sons.

[ref21] Ge Y, Dudoit S, Speed TP (2003) Resampling-based multiple testing for microarray data analysis. Test 12: 1-77.

[ref22] Westfall PH, Troendle JF (2008) Multiple testing with minimal assumptions. Biom J 50: 745-755.
