On Sample Size Calculation for Exact Group Sequential Tests for Rare Disease

Jin M

doi:10.23937/2469-5831/1510013

RESEARCH ARTICLE | VOLUME 3, ISSUE 1 | OPEN ACCESS DOI: 10.23937/2469-5831/1510013


Man Jin¹* and James L Kepner²

¹MRL, Merck & Co., Inc., Rahway, USA

²Alpha, Illinois, USA

*Corresponding author: Dr. Man Jin, MRL, Merck & Co., Inc., Rahway, NJ 07065, USA.

Accepted: October 07, 2017 | Published: October 09, 2017

Citation: Jin M, Kepner JL (2017) On Sample Size Calculation for Exact Group Sequential Tests for Rare Disease. Int J Clin Biostat Biom 3:013. doi.org/10.23937/2469-5831/1510013

Copyright: © 2017 Jin M, et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Abstract

For a rare disease, all the patients having the disease constitute a small population, and the standard single-stage hypergeometric test is uniformly most powerful for testing the response probability of a specific treatment regimen. Although exact group sequential designs are widely employed in phase II clinical trials for binomial proportions, it is unknown whether similar tests can be employed for hypergeometric proportions. In this manuscript, we prove that, for hypergeometric proportions, there exist exact group sequential designs that achieve the predesignated significance level and power with a maximum total sample size bounded above by the sample size of the corresponding standard exact single-stage test. Additionally, two types of optimal two-stage designs are examined for a range of design parameters: one minimizes the expected sample size under the null hypothesis, and the other minimizes the maximum sample size.

Keywords

Exact group sequential designs, Hypergeometric distribution, Minimax designs, Optimal designs, Small population, Uniformly most powerful test

Introduction

For a rare disease, all the patients having the disease constitute a small population of size N. Phase II trials of Investigational New Drugs (INDs) are performed to assess whether a new drug shows some promise of activity against the disease [1]. Suppose that, if all patients with the disease were treated, the new drug would show some activity in M of them; the response rate is then defined as p = M/N. Care must be taken to define explicitly what is meant by "show some activity" [1]. For example, the drug may be said to show some activity in a cancer patient whose tumor shrinks by at least 50% after treatment.

Glycogen storage disease type II (also known as Pompe disease) is an autosomal recessive metabolic disorder that damages muscle and nerve cells throughout the body. The disease affects approximately 1 in 140,000 babies and 1 in 60,000 adults a year [2]. Von Hippel-Lindau (VHL) disease, a rare autosomal dominant syndrome, is another example; it affects 1 in 36,000 babies [3].

To conduct clinical trials for such extremely rare diseases, the designs developed here test the null hypothesis $H_{0} : p \leq p_{0}$ that the true response rate is at most some uninteresting level $p_{0}$; that is, the new drug shows some activity in at most $M_{0} = N p_{0}$ patients. If the null hypothesis is true, we require the probability of falsely concluding that the drug is efficacious to be at most a user-specified $α$. We also require that, if a specified alternative hypothesis $H_{1} : p \geq p_{1}$ with $p_{0} < p_{1}$ is true (that is, at least $M_{1} = N p_{1}$ patients respond to the drug), the probability of falsely concluding that the drug is not efficacious be at most a user-specified $β$.

We start our discussion by considering the standard one-stage design for testing

$H_{0} : p \leq p_{0}$ vs. $H_{1} : p > p_{0}$, or equivalently $H_{0} : M \leq M_{0}$ vs. $H_{1} : M > M_{0}$. (1)

To test these hypotheses, a sample of n patients is selected at random from the population of N patients and treated. Let S denote the number of patients in the sample who respond to the treatment. Then S follows the hypergeometric distribution, denoted $H(N, M, n)$:

$\Pr(S = s) = h(s; N, M, n) = \binom{M}{s} \binom{N - M}{n - s} \Big/ \binom{N}{n}, \qquad (2)$

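where $s = \max(0, n + M - N), \ldots, \min(n, M)$. Consider the test with rejection region $S \geq b + 1$, where $b$ is an integer such that $\Pr(S \geq b + 1 \mid p = p_{0}) \leq α$. It can be shown (for example, from the monotone likelihood ratio of the hypergeometric family in $M$ together with the Karlin-Rubin theorem) that this test is a Uniformly Most Powerful (UMP) test of its size for hypotheses (1).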
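As an illustration, the critical value $b$ for a given $n$ can be found by scanning upward until the upper tail under $M_{0}$ drops to $α$. This is a minimal Python sketch using only the standard library; the function names (`h`, `upper_tail`, `critical_value`) and the setting $N = 50$, $M_{0} = 10$, $n = 10$ are our own hypothetical choices, not the authors' R code.

```python
from math import comb


def h(s, N, M, n):
    """Hypergeometric pmf h(s; N, M, n) from equation (2); zero outside the support."""
    if s < max(0, n + M - N) or s > min(n, M):
        return 0.0
    return comb(M, s) * comb(N - M, n - s) / comb(N, n)


def upper_tail(c, N, M, n):
    """Pr(S >= c) for S ~ H(N, M, n)."""
    return sum(h(s, N, M, n) for s in range(max(c, 0), n + 1))


def critical_value(N, M0, n, alpha):
    """Smallest integer b >= -1 with Pr(S >= b + 1 | M = M0) <= alpha.

    The upper tail shrinks as b grows, so the first b meeting the size
    constraint gives the largest rejection region S >= b + 1.
    """
    b = -1
    while upper_tail(b + 1, N, M0, n) > alpha:
        b += 1  # terminates: the tail is 0 once b + 1 > n
    return b


# Hypothetical setting: N = 50, p0 = 0.2 (M0 = 10), n = 10, alpha = 0.05
b = critical_value(50, 10, 10, 0.05)
print(b, upper_tail(b + 1, 50, 10, 10))
```

In this toy setting the scan stops at $b = 4$: $\Pr(S \geq 5 \mid M_{0}) \approx 0.018$, whereas $\Pr(S \geq 4 \mid M_{0}) \approx 0.097$ exceeds $α$.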

Group sequential designs are widely employed in phase II and phase III clinical trials. A group sequential test allows accrual to stop early if the observed response rate is quite high or quite low at an early stage. Fleming [4], Chang, et al. [5], and Simon [6] pointed out that, for binomial distributions, well-designed group sequential tests are more efficient than standard single-stage tests, because on average they require fewer patients to achieve the predesignated significance level and power. However, for hypergeometric distributions it is still unknown whether there exist exact group sequential tests that are more efficient than the standard single-stage exact test described above.

It is known that when the disease population size $N$ is large, the hypergeometric distribution $h(s; N, M, n)$ is well approximated by the binomial distribution $b(s; n, p)$ with $p = M/N$, and the available binomial designs can then be used [6]. For extremely rare diseases, however, designs based on the binomial distribution may have an incorrect significance level; that is, the true type I error rate may differ from the prespecified one. Figure 1 displays a toy example comparing the hypergeometric distribution $h(s; 50, 20, 10)$ with the binomial distribution $b(s; 10, 0.4)$, where $s = 0, 1, \ldots, 10$.
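The toy comparison in Figure 1 can be reproduced numerically. The following minimal Python sketch (standard library only; all names are our own) tabulates both probability mass functions and their variances. The two distributions share the mean $np = 4$, but the hypergeometric variance carries the finite-population factor $(N - n)/(N - 1)$, so it is more concentrated.

```python
from math import comb


def hyper_pmf(s, N, M, n):
    """h(s; N, M, n): responders in a sample of n drawn without replacement."""
    if s < max(0, n + M - N) or s > min(n, M):
        return 0.0
    return comb(M, s) * comb(N - M, n - s) / comb(N, n)


def binom_pmf(s, n, p):
    """b(s; n, p): responders among n independent patients."""
    return comb(n, s) * p**s * (1 - p) ** (n - s)


N, M, n = 50, 20, 10
p = M / N  # 0.4
support = range(n + 1)
hyper = [hyper_pmf(s, N, M, n) for s in support]
binom = [binom_pmf(s, n, p) for s in support]

for s in support:
    print(f"s={s:2d}  hypergeometric={hyper[s]:.4f}  binomial={binom[s]:.4f}")


def variance(pmf):
    """Variance of a pmf given as a list over s = 0, ..., n."""
    mean = sum(s * q for s, q in zip(support, pmf))
    return sum((s - mean) ** 2 * q for s, q in zip(support, pmf))


# Hypergeometric variance = n p (1 - p) (N - n) / (N - 1); binomial = n p (1 - p)
print(variance(hyper), n * p * (1 - p) * (N - n) / (N - 1))
print(variance(binom), n * p * (1 - p))
```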

Motivated by Kepner and Chang [7,8], in this manuscript we show that, for hypergeometric distributions, under mild conditions and for a given number $k \geq 2$ , there exists at least one k-stage group sequential test which has exactly the same maximum total sample size, significance level, and power as the standard single-stage hypergeometric test. Furthermore, motivated by Simon [6], we examine two types of optimal two-stage designs for a range of design parameters; one is optimal in the sense that the expected sample size under the null hypothesis is minimized, and the other is optimal in the sense that the maximum sample size is minimized.

The remainder of the manuscript is organized as follows. In Section 2, group sequential procedures are described. In Section 3, main theorems are presented and proved. In Section 4, two types of optimal two-stage designs are examined. Some brief discussion is given in Section 5 (Figure 1).

Figure 1: Comparing hypergeometric distribution $h (s; 50, 20, 10)$ and binomial distribution $b (s; 10, 0.4)$ , where $s = 0, 1, ..., 10$ .

Group Sequential Tests

A group sequential test is specified by the maximum number of stages, $k$; the cumulative number of patients treated up to each stage $i$, $n_{i}$; and the critical values $\{[a_{1}, b_{1}], \ldots, [a_{k-1}, b_{k-1}], b_{k}\}$, where $a_{i} \leq b_{i}$, $i = 1, \ldots, k - 1$. Let $S_{i}$ be the cumulative number of patients who respond to the treatment among the $n_{i}$ patients treated up to stage $i$. The distribution of $S_{i}$ is $\Pr(S_{i} = s) = h(s; N, M, n_{i})$, where $s = \max(0, n_{i} + M - N), \ldots, \min(n_{i}, M)$. This distribution is known up to the unknown parameter $M$ or, equivalently, $p = M/N$.

The group sequential test is conducted as follows. Start with stage $i = 1$. If $S_{i} \leq a_{i} - 1$, stop sampling and reject $H_{1}$; if $S_{i} \geq b_{i} + 1$, stop sampling and accept $H_{1}$; if $a_{i} \leq S_{i} \leq b_{i}$, continue to stage $i + 1$. At the final stage, accept $H_{1}$ if $S_{k} \geq b_{k} + 1$ and reject $H_{1}$ if $S_{k} \leq b_{k}$.
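The stopping rule above can be written as a small function. This is a minimal Python sketch; the function name and the list-based interface (0-based lists holding $S_{i}$, $a_{i}$ and $b_{i}$) are our own hypothetical choices, not something defined in the source.

```python
def sequential_decision(cum_responses, a, b):
    """Apply the k-stage rule to observed cumulative responses.

    cum_responses -- [S_1, ..., S_k]; entries after the stopping stage are ignored
    a             -- [a_1, ..., a_{k-1}] lower critical values
    b             -- [b_1, ..., b_k] upper critical values
    Returns (stage at which the trial stops, decision).
    """
    k = len(b)
    for i in range(k - 1):
        s = cum_responses[i]
        if s <= a[i] - 1:
            return i + 1, "reject H1"  # futility stop
        if s >= b[i] + 1:
            return i + 1, "accept H1"  # efficacy stop
        # otherwise a_i <= S_i <= b_i: continue to the next stage
    if cum_responses[k - 1] >= b[k - 1] + 1:
        return k, "accept H1"
    return k, "reject H1"


# Hypothetical two-stage Type 3 design: a_1 = 2, b_1 = 6, b_2 = 9
print(sequential_decision([1, 5], a=[2], b=[6, 9]))   # stops at stage 1
print(sequential_decision([4, 10], a=[2], b=[6, 9]))  # reaches stage 2
```

Because $a_{i} \leq b_{i}$, the futility and efficacy conditions at an interim stage can never hold simultaneously, so the order of the two checks does not matter.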

The power function for any such group sequential test is

$P(M) = \Pr\{S_{1} \geq b_{1} + 1 \mid M\} + \Pr\{a_{1} \leq S_{1} \leq b_{1}, S_{2} \geq b_{2} + 1 \mid M\} + \cdots + \Pr\{a_{1} \leq S_{1} \leq b_{1}, \ldots, a_{k-1} \leq S_{k-1} \leq b_{k-1}, S_{k} \geq b_{k} + 1 \mid M\}. \qquad (3)$
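The power function (3) can be evaluated exactly by propagating the distribution of $S_{i}$ over the paths that are still running. The step from stage $i$ to stage $i + 1$ uses the fact that, given $S_{i} = s$, the next $n_{i+1} - n_{i}$ patients are drawn without replacement from the $N - n_{i}$ untreated patients, $M - s$ of whom would respond. The sketch below (Python, standard library only) is our own illustration, not the authors' implementation; the 3-stage design in the usage lines is hypothetical.

```python
from math import comb


def h(s, N, M, n):
    """Hypergeometric pmf h(s; N, M, n); zero outside the support."""
    if s < max(0, n + M - N) or s > min(n, M):
        return 0.0
    return comb(M, s) * comb(N - M, n - s) / comb(N, n)


def power(N, M, n, a, b):
    """Evaluate P(M) in (3) exactly.

    n -- cumulative sample sizes [n_1, ..., n_k]
    a -- [a_1, ..., a_{k-1}];  b -- [b_1, ..., b_k]
    """
    k = len(n)
    # dist[s] = Pr(S_i = s and the trial reached stage i), starting at i = 1
    dist = {s: h(s, N, M, n[0]) for s in range(n[0] + 1) if h(s, N, M, n[0]) > 0}
    total = 0.0
    for i in range(k):
        # add the paths that accept H1 at this stage
        total += sum(pr for s, pr in dist.items() if s >= b[i] + 1)
        if i == k - 1:
            break
        m = n[i + 1] - n[i]  # new patients at the next stage
        nxt = {}
        for s, pr in dist.items():
            if not a[i] <= s <= b[i]:
                continue  # the trial stopped at this stage
            # m new patients from the N - n_i untreated, M - s of whom respond
            for t in range(m + 1):
                q = h(t, N - n[i], M - s, m)
                if q > 0:
                    nxt[s + t] = nxt.get(s + t, 0.0) + pr * q
        dist = nxt
    return total


# Hypothetical 3-stage Type 1 design with N = 50, and the single-stage test it mimics
print(power(50, 10, [8, 9, 10], [0, 0], [4, 4, 4]))
print(power(50, 10, [10], [], [4]))
```

With $k = 1$ the function reduces to the single-stage upper tail, which makes it easy to compare a sequential design against the one-stage test.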

For the required significance level $α$ and power $1 - β$ at $M_{0} + Δ$, one should select $(n_{1}, \ldots, n_{k})$ and $[(a_{1}, b_{1}), \ldots, (a_{k-1}, b_{k-1}), b_{k}]$ such that $P(M_{0}) \leq α$ and $P(M_{0} + Δ) \geq 1 - β$. The power function can also be written as a function of the proportion $p$; with a slight abuse of notation, we write the power function (3) as $P(p)$. Equivalently, for significance level $α$ and power $1 - β$ at $p_{0} + δ$, where $p_{0} = M_{0}/N$ and $δ = Δ/N$, one should select $(n_{1}, \ldots, n_{k})$ and $[(a_{1}, b_{1}), \ldots, (a_{k-1}, b_{k-1}), b_{k}]$ such that $P(p_{0}) \leq α$ and $P(p_{0} + δ) \geq 1 - β$.

As discussed in Kepner and Chang [7], there are three types of group sequential designs. Type 1 designs stop early only to conclude efficacy (i.e., stop early only to accept $H_{1}$), Type 2 designs stop early only to conclude futility (i.e., stop early only to reject $H_{1}$), and Type 3 designs stop early for either efficacy or futility. In the above notation, Type 1 group sequential tests are those with $a_{1} = a_{2} = \cdots = a_{k-1} = 0$, Type 2 group sequential tests are those with $b_{i} = n_{i}$ for $i = 1, \ldots, k - 1$, and Type 3 group sequential tests are those with $0 < a_{i} \leq b_{i} \leq n_{i}$ for $i = 1, \ldots, k - 1$.

Main Theorems

In this section, three theorems are established, one for each of three types of exact group sequential tests discussed in Section 2. These three theorems are similar to those established in Kepner and Chang [7] which were for binomial distributions.

Assume:

(A1) $S \sim H(N, M, n)$;

(A2) $n$ is the smallest sample size for which there is an integer $b$ such that $0 \leq b + 1 \leq n$, $\Pr\{S \geq b + 1 \mid M = M_{0}\} \leq α$, and $\Pr\{S \geq b + 1 \mid M = M_{0} + Δ\} \geq 1 - β$, where $0 < Δ \leq N - M_{0}$;

(A3) $k$ is an integer such that $2 \leq k \leq n$.
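Condition (A2) suggests a direct search for the single-stage design: for each $n$, take the smallest $b$ meeting the size constraint (it maximizes power, because the tail shrinks as $b$ grows) and check the power requirement. The following is a minimal Python sketch of that search; the function names and the toy setting $N = 20$, $M_{0} = 4$, $Δ = 8$ are hypothetical.

```python
from math import comb


def h(s, N, M, n):
    """Hypergeometric pmf h(s; N, M, n); zero outside the support."""
    if s < max(0, n + M - N) or s > min(n, M):
        return 0.0
    return comb(M, s) * comb(N - M, n - s) / comb(N, n)


def tail(c, N, M, n):
    """Pr(S >= c) for S ~ H(N, M, n)."""
    return sum(h(s, N, M, n) for s in range(max(c, 0), n + 1))


def single_stage_design(N, M0, Delta, alpha, beta):
    """Smallest n, and its b, satisfying condition (A2); None if no n <= N works."""
    for n in range(1, N + 1):
        for b in range(-1, n):  # enforces 0 <= b + 1 <= n
            if tail(b + 1, N, M0, n) <= alpha:
                # the smallest b meeting the size constraint maximizes power,
                # so if it fails the power requirement, no b works for this n
                if tail(b + 1, N, M0 + Delta, n) >= 1 - beta:
                    return n, b
                break
    return None


# Hypothetical toy setting: N = 20, p0 = 0.2 (M0 = 4), p1 = 0.6 (Delta = 8)
print(single_stage_design(20, 4, 8, 0.05, 0.2))
```

In this toy setting the search returns $n = 8$ and $b = 3$: with $n = 8$, $\Pr(S \geq 4 \mid M_{0}) \approx 0.014$ and $\Pr(S \geq 4 \mid M_{0} + Δ) \approx 0.887$, while every smaller $n$ falls short of 80% power.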

Theorem 1

Under conditions (A1)-(A3), assume $b \leq n - k$. Then there exists a Type 1 k-stage exact sequential test such that $n_{1} \geq n_{2} - n_{1} \geq \cdots \geq n_{k} - n_{k-1}$ and $n_{k} = n$, the significance level of the test is $α$, and the power of the test at $M = M_{0} + Δ$ is at least $1 - β$.

Proof

Let $n_{1} = n - k + 1$ and $n_{i} = n_{i-1} + 1$ for $i = 2, \ldots, k$, so that $n_{k} = n$ and every stage increment equals $1 \leq n_{1}$, as required. The test statistic at stage $i$ is $S_{i}$. Let $b_{1} = b_{2} = \cdots = b_{k} = b$ and $a_{1} = a_{2} = \cdots = a_{k-1} = 0$. Since $b_{i} + 1 = b + 1 \leq n - k + 1 = n_{1} \leq n_{i}$ for $i = 1, \ldots, k$, the sample sizes $(n_{1}, \ldots, n_{k})$ and critical values $[(a_{1}, b_{1}), \ldots, (a_{k-1}, b_{k-1}), b_{k}]$ define a proper Type 1 k-stage test. Under this test, $H_{1}$ is accepted if and only if $S_{i} \geq b + 1$ for at least one $i$, $1 \leq i \leq k$, which is equivalent to $S_{k} = S \geq b + 1$ because $S_{1} \leq S_{2} \leq \cdots \leq S_{k} = S$. Thus, the power function of the test is the same as that of the single-stage test based on $S$.
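The proof works pathwise: for every ordered sequence of patient outcomes, the sequential test and the single-stage test reach the same decision, so their power functions coincide for every $M$. That pathwise claim can be checked by brute force for small $n$. The sketch below is our own illustration (Python, standard library only), enumerating all $2^{n}$ response patterns for the construction used in the proof.

```python
from itertools import product


def accepts_h1(outcomes, sizes, a, b):
    """Run a k-stage rule on one ordered sequence of 0/1 patient outcomes."""
    k = len(sizes)
    for i, n_i in enumerate(sizes):
        s = sum(outcomes[:n_i])  # S_i
        if s >= b[i] + 1:
            return True  # accept H1 (at the last stage: S_k >= b_k + 1)
        if i < k - 1 and s <= a[i] - 1:
            return False  # early rejection of H1
    return False  # S_k <= b_k


def theorem1_design(n, k, b):
    """Sample sizes and critical values from the proof of Theorem 1."""
    sizes = [n - k + i for i in range(1, k + 1)]  # n_1 = n - k + 1, steps of 1
    return sizes, [0] * (k - 1), [b] * k


def agrees_with_single_stage(n, k, b):
    """True if the sequential decision equals 'S >= b + 1' for every pattern."""
    sizes, a, bs = theorem1_design(n, k, b)
    return all(accepts_h1(x, sizes, a, bs) == (sum(x) >= b + 1)
               for x in product((0, 1), repeat=n))


# Check every k and every b <= n - k allowed by Theorem 1 for n = 8
print(all(agrees_with_single_stage(8, k, b)
          for k in range(2, 9) for b in range(0, 8 - k + 1)))
```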

Theorem 2

Under the conditions (A1)-(A3), assume $b \geq k - 1$ . Then there exists a Type 2 k-stage exact sequential test such that $n_{1} \geq n_{2} - n_{1} \geq ... \geq n_{k} - n_{k - 1}$ and $n_{k} = n$ , the significance level of the test is $α$ and the power of the test at $M = M_{0} + Δ$ is at least $1 - β$ .

Proof

Let $n_{1} = n - k + 1$ and $n_{i} = n_{i-1} + 1$ for $i = 2, \ldots, k$, so that $n_{i} = n - k + i$. The test statistic at stage $i$ is $S_{i}$. Let $a_{i} = b - k + i + 1$ and $b_{i} = n_{i}$ for $i = 1, \ldots, k - 1$, and $b_{k} = b$. Since $1 \leq a_{i} \leq b_{i} = n_{i}$ for $i = 1, \ldots, k - 1$ (the lower bound follows from $b \geq k - 1$ and the upper bound from $b + 1 \leq n$), the sample sizes $(n_{1}, \ldots, n_{k})$ and critical values $[(a_{1}, b_{1}), \ldots, (a_{k-1}, b_{k-1}), b_{k}]$ define a proper Type 2 k-stage test. Under this test, with the convention $a_{k} = b + 1$ (which matches the final-stage rule $S_{k} \leq b_{k}$), $H_{1}$ is rejected if and only if $S_{i} \leq a_{i} - 1 = b - k + i$ for at least one $i$, $1 \leq i \leq k$. In terms of non-responders, there exists at least one $i$ such that $n_{i} - S_{i} \geq n_{i} - b + k - i = n - b$. This is equivalent to $n - S_{k} \geq n - b$, that is, $S_{k} \leq b$, because $n_{1} - S_{1} \leq n_{2} - S_{2} \leq \cdots \leq n_{k} - S_{k}$. Thus, the power function of the test is the same as that of the single-stage test based on $S$.
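The Type 2 construction can be checked the same way: enumerate every ordered response pattern for small $n$ and confirm that rejecting $H_{1}$ at some stage happens exactly when $S_{k} \leq b$. This is a minimal, self-contained Python sketch of our own; it also asserts that each constructed design is proper ($1 \leq a_{i} \leq b_{i}$).

```python
from itertools import product


def accepts_h1(outcomes, sizes, a, b):
    """Run a k-stage rule on one ordered sequence of 0/1 patient outcomes."""
    k = len(sizes)
    for i, n_i in enumerate(sizes):
        s = sum(outcomes[:n_i])  # S_i
        if s >= b[i] + 1:
            return True  # accept H1 (impossible before stage k when b_i = n_i)
        if i < k - 1 and s <= a[i] - 1:
            return False  # early futility stop: reject H1
    return False  # S_k <= b_k


def theorem2_design(n, k, b):
    """Sample sizes and critical values from the proof of Theorem 2."""
    sizes = [n - k + i for i in range(1, k + 1)]  # n_i = n - k + i
    a = [b - k + i + 1 for i in range(1, k)]  # a_i = b - k + i + 1
    upper = sizes[:-1] + [b]  # b_i = n_i for i < k, b_k = b
    return sizes, a, upper


def agrees_with_single_stage(n, k, b):
    """True if the sequential decision equals 'S >= b + 1' for every pattern."""
    sizes, a, upper = theorem2_design(n, k, b)
    assert all(1 <= a_i <= b_i for a_i, b_i in zip(a, upper))  # proper Type 2
    return all(accepts_h1(x, sizes, a, upper) == (sum(x) >= b + 1)
               for x in product((0, 1), repeat=n))


# Check every k and every b with k - 1 <= b <= n - 1 for n = 8
print(all(agrees_with_single_stage(8, k, b)
          for k in range(2, 9) for b in range(k - 1, 8)))
```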

With Theorems 1 and 2 established, the following theorem can be proved by the same argument as Theorem 3 in Kepner and Chang [7].

Theorem 3

Under conditions (A1)-(A3), assume $k - 1 \leq b \leq n - k$. Then there exists a Type 3 exact k-stage sequential test such that $n_{1} \geq n_{2} - n_{1} \geq \cdots \geq n_{k} - n_{k-1}$ and $n_{k} = n$, the significance level of the test is $α$, and the power of the test at $M = M_{0} + Δ$ is at least $1 - β$.

The implications of these theorems are twofold. The theoretical implication is that, for a given number $k \geq 2$ satisfying some mild conditions, there exists at least one k-stage group sequential test whose maximum total sample size is bounded above by the sample size needed for the standard one-stage test to achieve the same significance level and power. The applied implication is that we can search for "optimal" exact k-stage designs among those proper group sequential tests whose maximum total sample sizes are bounded above by the sample size needed for the standard exact one-stage test to achieve the same significance level and power.

Optimal Two-Stage Designs

In this section we focus on optimal two-stage designs; the approach can be extended to optimal k-stage designs. We examine the two types of optimal two-stage designs proposed by Simon [6]: one minimizes the expected sample size under the null hypothesis, and the other minimizes the maximum sample size. They were originally proposed for binomial distributions; here we examine them for hypergeometric distributions, with one small difference. In Simon [6], the optimal designs are searched among all proper two-stage designs. Here, because of Theorems 1-3, we need only search among those proper two-stage designs whose maximum total sample sizes are bounded above by the sample size needed for the standard one-stage design to achieve the same significance level and power. Narrowing the search domain improves computational efficiency.

Consider the first type of optimal two-stage design. Suppose $n_{1}$ patients are treated in the first stage and, if necessary, $n_{2} - n_{1}$ more patients are treated in the second stage, so that the maximum sample size is $n_{2}$. The expected sample size is then $EN = PET \times n_{1} + (1 - PET) \times n_{2}$, where $PET$ is the probability of early termination after the first stage. Whether the trial terminates after the first stage depends on the type of the two-stage design (Type 1, Type 2, or Type 3) and on the number of responses observed among those $n_{1}$ patients (Table 1). Both $EN$ and $PET$ depend on the unknown parameter $M$, the number of patients in the population of size $N$ who would respond. In particular, $PET = 1 - \sum_{s = a_{1}}^{b_{1}} h(s; N, M, n_{1})$. As in Simon [6], we consider the optimal two-stage designs that minimize the expected sample size under the null hypothesis, $EN_{0} = PET_{0} \times n_{1} + (1 - PET_{0}) \times n_{2}$, where $PET_{0} = 1 - \sum_{s = a_{1}}^{b_{1}} h(s; N, M_{0}, n_{1})$. The rationale is that as few patients as possible should be exposed to an ineffective treatment. The second type of optimal two-stage design is easier to describe: it minimizes the maximum sample size $n_{2}$.
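The formulas for $PET$ and $EN$ translate directly into code. This minimal Python sketch is our own illustration; the function names and the design $N = 50$, $M = 10$, $n_{1} = 10$, $n_{2} = 20$, $a_{1} = 0$, $b_{1} = 4$ are hypothetical.

```python
from math import comb


def h(s, N, M, n):
    """Hypergeometric pmf h(s; N, M, n); zero outside the support."""
    if s < max(0, n + M - N) or s > min(n, M):
        return 0.0
    return comb(M, s) * comb(N - M, n - s) / comb(N, n)


def pet(N, M, n1, a1, b1):
    """Probability of stopping after stage 1, i.e. S_1 outside [a_1, b_1]."""
    return 1.0 - sum(h(s, N, M, n1) for s in range(a1, b1 + 1))


def expected_n(N, M, n1, n2, a1, b1):
    """EN = PET * n1 + (1 - PET) * n2."""
    p = pet(N, M, n1, a1, b1)
    return p * n1 + (1 - p) * n2


# Hypothetical Type 1 design evaluated under M = M0 = 10, giving PET_0 and EN_0
print(pet(50, 10, 10, 0, 4), expected_n(50, 10, 10, 20, 0, 4))
```

Here a Type 1 design can stop early only for efficacy, which is rare under $H_{0}$: $PET_{0} \approx 0.018$, so $EN_{0} \approx 19.8$ is close to $n_{2} = 20$. A design with a futility boundary ($a_{1} > 0$) would typically stop early far more often under $H_{0}$.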

Table 1: Notation for interpreting different designs.

We now examine these two types of optimal two-stage designs; the notation used to describe the designs is summarized in Table 1. We consider the following settings: significance level $α = 0.05$ and Type II error rate $β = 0.2$; population size $N = 80$ or $120$; proportion under the null hypothesis $p_{0} = M_{0}/N = 0.1, 0.2, \ldots, 0.7$; and treatment effect $δ = Δ/N = 0.15$ or $0.2$. The resulting optimal designs are presented in Table 2, Table 3, Table 4 and Table 5. These results were obtained with R code, which is available from the first author on request.
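For readers who want to experiment, the search can be sketched as a brute-force enumeration over $(n_{1}, n_{2}, a_{1}, b_{1}, b_{2})$ with $n_{2}$ capped at the single-stage sample size, as Theorems 1-3 allow. This is a minimal Python sketch of our own, not the authors' R code (which is not shown in the source). For each $(n_{1}, n_{2}, a_{1}, b_{1})$ it takes the smallest $b_{2}$ meeting the size constraint, because raising $b_{2}$ only shrinks the acceptance region. The toy setting $N = 20$, $M_{0} = 4$, $Δ = 8$ is hypothetical and much smaller than the settings in the tables, so that the search runs in a fraction of a second.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def h(s, N, M, n):
    """Hypergeometric pmf h(s; N, M, n), cached because the search reuses it heavily."""
    if s < max(0, n + M - N) or s > min(n, M):
        return 0.0
    return comb(M, s) * comb(N - M, n - s) / comb(N, n)


def tail(c, N, M, n):
    """Pr(S >= c) for S ~ H(N, M, n)."""
    return sum(h(s, N, M, n) for s in range(max(c, 0), n + 1))


def power2(N, M, n1, n2, a1, b1, b2):
    """Power function (3) with k = 2."""
    p = tail(b1 + 1, N, M, n1)  # stop after stage 1 and accept H1
    for s in range(a1, b1 + 1):  # continue to stage 2 with S_1 = s
        ps = h(s, N, M, n1)
        if ps > 0:
            # the n2 - n1 new patients come from the N - n1 untreated ones,
            # M - s of whom would respond; accept H1 if S_2 >= b2 + 1
            p += ps * tail(b2 + 1 - s, N - n1, M - s, n2 - n1)
    return p


def search(N, M0, Delta, alpha, beta, n_max):
    """Return (optimal, minimax) two-stage designs with n2 <= n_max (None if none)."""
    optimal = minimax = None
    for n2 in range(2, n_max + 1):
        for n1 in range(1, n2):
            for a1 in range(n1 + 1):
                for b1 in range(a1, n1 + 1):
                    if a1 == 0 and b1 == n1:
                        continue  # never stops early: not a proper two-stage design
                    # smallest b2 meeting the size constraint gives the most power
                    b2 = next((c for c in range(-1, n2)
                               if power2(N, M0, n1, n2, a1, b1, c) <= alpha), None)
                    if b2 is None:
                        continue
                    if power2(N, M0 + Delta, n1, n2, a1, b1, b2) < 1 - beta:
                        continue
                    pet0 = 1 - sum(h(s, N, M0, n1) for s in range(a1, b1 + 1))
                    en0 = pet0 * n1 + (1 - pet0) * n2
                    d = dict(n1=n1, n2=n2, a1=a1, b1=b1, b2=b2, en0=en0, pet0=pet0)
                    if optimal is None or en0 < optimal["en0"]:
                        optimal = d  # first type: minimum EN_0
                    if minimax is None or (n2, en0) < (minimax["n2"], minimax["en0"]):
                        minimax = d  # second type: minimum n2, ties broken by EN_0
    return optimal, minimax


# Hypothetical toy setting: N = 20, M0 = 4 (p0 = 0.2), Delta = 8 (p1 = 0.6).
# n_max = 8 is the smallest single-stage n satisfying (A2) in this setting.
optimal, minimax = search(20, 4, 8, 0.05, 0.2, n_max=8)
print("optimal:", optimal)
print("minimax:", minimax)
```

Restricting the loops to types would only require constraining $a_{1}$ and $b_{1}$ (for example $a_{1} = 0$ for Type 1 or $b_{1} = n_{1}$ for Type 2).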

Table 2: N = 80. Designs for testing H₀: p ≤ p₀ vs. H₁: p > p₀ with α = 0.05 and power at least 80% at p₁ = p₀ + 0.15. For each setting, Rows 1-3 contain the Type 1-3 two-stage designs. The left and right parts are for the first and second types of optimal designs.

Table 3: N = 80. Designs for testing H₀: p ≤ p₀ vs. H₁: p > p₀ with α = 0.05 and power at least 80% at p₁ = p₀ + 0.2. For each setting, Rows 1-3 contain the Type 1-3 two-stage designs. The left and right parts are for the first and second types of optimal designs.

Table 4: N = 120. Designs for testing H₀: p ≤ p₀ vs. H₁: p > p₀ with α = 0.05 and power at least 80% at p₁ = p₀ + 0.15. For each setting, Rows 1-3 contain the Type 1-3 two-stage designs. The left and right parts are for the first and second types of optimal designs.

Table 5: N = 120. Designs for testing H₀: p ≤ p₀ vs. H₁: p > p₀ with α = 0.05 and power at least 80% at p₁ = p₀ + 0.2. For each setting, Rows 1-3 contain the Type 1-3 two-stage designs. The left and right parts are for the first and second types of optimal designs.

Table 2 and Table 3 apply to a small population of size $N = 80$, and Table 4 and Table 5 apply to a small population of size $N = 120$; $δ = 0.15$ for Table 2 and Table 4, and $δ = 0.20$ for Table 3 and Table 5. In each table, the first column corresponds to the one-stage design, the optimal two-stage designs minimizing the expected sample size under $H_{0}$ are shown in the left half of the table, and the optimal two-stage designs minimizing the maximum sample size are shown in the right half. For each setting, the three rows correspond to the three types of two-stage designs (Types 1-3). The tabulated results include the optimal design (in the notation of Table 1), the expected sample size under $H_{0}$, $EN_{0}$, the maximum sample size, $n_{2}$, and the early termination probability under $H_{0}$, $PET_{0}$. In some settings there is more than one optimal design whose minimum maximum sample size equals the sample size of the one-stage design; in that case, the one with the smallest expected sample size under $H_{0}$ is reported and marked with an asterisk.

First and most importantly, the results in Table 2, Table 3, Table 4 and Table 5 illustrate the main theorems: for each setting, there exists at least one two-stage design whose maximum sample size is bounded above by the sample size of the one-stage design. Because the early termination probability is strictly positive for any proper two-stage design, the corresponding expected sample size is always strictly smaller than the sample size of the one-stage design.

Second, in many settings, the maximum sample sizes of the two-stage designs are strictly smaller than the sample size of the corresponding one-stage design. This finding is striking, given that the standard one-stage test is uniformly most powerful.

Third, in some settings the optimal two-stage design minimizing the expected sample size may be more attractive than the one minimizing the maximum sample size. This is the case when the difference in maximum sample sizes is small but the difference in expected sample sizes is large. For example, in the third row of the $p_{0} = 0.2$ block of Table 2, the maximum sample sizes differ by only one, whereas the expected sample sizes differ by 7.5. In other settings, the two types of optimal exact two-stage designs are quite similar. Surprisingly, this finding differs from that of Simon [6], who concluded that the "minimax" designs may be more attractive.

Discussion

Optimal exact group sequential tests for a binomial proportion have been well-studied in the literature, but a corresponding study involving hypergeometric distributions is lacking. This manuscript studies some exact group sequential tests involving hypergeometric distributions, which are useful for investigating treatment effects on rare diseases.

In this manuscript, three theorems have been proved and two types of optimal two-stage designs have been presented. The theorems guarantee the existence of proper exact group sequential designs whose expected sample sizes are strictly smaller than those of standard one-stage designs. The two types of optimal two-stage designs discussed here illustrate two criteria for choosing a design; other criteria exist, for example minimizing the expected sample size at a given parameter value such as $p_{1} = p_{0} + δ$. Moreover, the tabulated results provide detailed comparisons between these two types of optimal designs.

Finally, this manuscript focuses on one-arm phase II clinical trials. One of our future projects is to study the properties of group sequential tests for comparing two hypergeometric distributions, which arise from two-arm phase II clinical trials.

Acknowledgment

The authors would like to thank two anonymous referees for their constructive comments and suggestions, which have led to a significantly improved paper.