For a rare disease, all the patients having the disease constitute a small population, and the standard single-stage hypergeometric test is uniformly most powerful to evaluate the response probability of a specific treatment regimen. Although exact group sequential designs are widely employed in phase II clinical trials for binomial proportions, it is unknown whether or not similar tests can be employed for hypergeometric proportions. In this manuscript, it is proved that, for hypergeometric proportions, there exist exact group sequential designs that achieve the predesignated significance level and power with maximum total sample size bounded above by the sample size for the corresponding standard exact single-stage test. Additionally, two types of optimal two-stage designs are examined for a range of design parameters; one is optimal in the sense that the expected sample size under the null hypothesis is minimized, and the other is optimal in the sense that the maximum sample size is minimized.

Exact group sequential designs, Hypergeometric distribution, Minimax designs, Optimal designs, Small population, Uniformly most powerful test

For a rare disease, all the patients having the disease constitute a small population of size N. Phase II trials of Investigational New Drugs (INDs) are performed in order to assess whether a new drug shows some promise of activity for the disease [1]. Imagine that if all the patients with the disease were to be treated, the new drug would show some promise of activity to M of them, and therefore the response rate could be defined as p = M/N. Care must be taken to explicitly define what is meant by "show some activity" [1]. For example, the drug is said to show some activity to a cancer patient whose tumor shrinks by at least 50% after the treatment.

Glycogen storage disease type II (also known as Pompe disease) is an autosomal recessive metabolic disorder which damages muscle and nerve cells throughout the body. The disease affects approximately 1 in 140,000 babies and 1 in 60,000 adults a year [2]. Von Hippel-Lindau (VHL) disease is another rare disease, an autosomal dominant syndrome which affects 1 in 36,000 babies [3].

To conduct clinical trials for such extremely rare diseases, the designs developed here are based on testing a null hypothesis H0:p≤p0 that the true response rate is at most some uninteresting level p0; that is, the new drug shows some activity in at most M0=Np0 patients. If the null hypothesis is true, then we require that the probability of falsely concluding that the drug is efficacious is at most a user-specified α. We also require that, if a specified alternative hypothesis H1:p≥p1 (that is, at least M1=Np1 patients respond to the drug) is true with p0<p1, then the probability of falsely concluding that the drug is not efficacious is at most a user-specified β.

We start our discussion by considering the standard one-stage design for testing

H0:p≤p0 vs. H1:p>p0, or equivalently H0:M≤M0 vs. H1:M>M0 (1)

In order to test the above hypotheses, a sample of n patients selected randomly from the population of N patients is treated. Let S denote the number of patients in the sample who respond to the treatment. Then S follows the hypergeometric distribution, denoted as H(N,M,n),

$$\Pr(S=s) = h(s;N,M,n) = \frac{\binom{M}{s}\binom{N-M}{n-s}}{\binom{N}{n}}, \tag{2}$$

where s=max(0,n+M−N),...,min(n,M). Consider the test with rejection region S≥b+1, where b is an integer such that Pr(S≥b+1|p=p0)≤α. It can be shown that this test is the uniformly most powerful (UMP) test for hypotheses (1).
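The one-stage design described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the R code used for the tables in this manuscript; the function names `hyper_pmf`, `upper_tail`, and `single_stage_design` are hypothetical, and the setting N=80, M0=16, M1=28 in the example is chosen here for illustration. The search relies on the fact that, for fixed n, both the size and the power of the test with rejection region S≥b+1 decrease as b increases, so the smallest b that meets the size constraint is the only one that needs to be checked for power.

```python
from math import comb

def hyper_pmf(s, N, M, n):
    """h(s; N, M, n): probability of s responders in a sample of n patients
    drawn without replacement from N patients, M of whom would respond."""
    if s < max(0, n + M - N) or s > min(n, M):
        return 0.0
    return comb(M, s) * comb(N - M, n - s) / comb(N, n)

def upper_tail(c, N, M, n):
    """Pr(S >= c) for S ~ H(N, M, n)."""
    return sum(hyper_pmf(s, N, M, n) for s in range(max(c, 0), min(n, M) + 1))

def single_stage_design(N, M0, M1, alpha, beta):
    """Smallest n, with its cutoff b, such that rejecting H0 when S >= b + 1
    has size <= alpha at M0 and power >= 1 - beta at M1."""
    for n in range(1, N + 1):
        for b in range(-1, n):
            if upper_tail(b + 1, N, M0, n) <= alpha:
                # The smallest b meeting the size constraint has the largest power.
                if upper_tail(b + 1, N, M1, n) >= 1 - beta:
                    return n, b
                break
    return None

# Illustrative setting (an assumption, not taken from the tables):
# N = 80, p0 = 0.2 (M0 = 16), delta = 0.15 (M1 = 28).
n, b = single_stage_design(80, 16, 28, alpha=0.05, beta=0.2)
print("one-stage design: n =", n, ", reject H0 if S >=", b + 1)
```

Because the whole population is sampled when n=N, a design always exists whenever M1>M0, so the function returns `None` only for degenerate inputs.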

Group sequential designs are widely employed in phase II and phase III clinical trials. A group sequential test allows early termination of accrual if the observed response rate is quite high or quite low during the early stages. Fleming [4], Chang, et al. [5], and Simon [6] pointed out that, for binomial distributions, well-designed group sequential tests are more efficient than standard single-stage tests, because on average they require fewer patients to achieve the predesignated significance level and power. However, it is still unknown whether, for hypergeometric distributions, there exist exact group sequential tests that are more efficient than the aforementioned standard single-stage exact test.

It is known that when the disease population size N is large, the hypergeometric distribution h(s;N,M,n) can be approximated well by the binomial distribution b(s;n,p), where p=M/N, and then the available binomial designs [6] can be used. But for extremely rare diseases, designs based on the binomial distribution may have an incorrect significance level; that is, the true type-I error may differ from the prespecified type-I error. In Figure 1, we display a toy example showing the difference between the two distributions, the hypergeometric distribution h(s;50,20,10) and the binomial distribution b(s;10,0.4), where s=0,1,...,10.
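The toy comparison can be reproduced numerically with the following sketch, which tabulates h(s;50,20,10) next to b(s;10,0.4). The variable names are illustrative. Both distributions have mean np=4, but the hypergeometric variance is smaller by the finite population correction factor (N−n)/(N−1), which is why binomial-based critical values need not give the intended type-I error in a small population.

```python
from math import comb

N, M, n = 50, 20, 10   # population size, responders in population, sample size
p = M / N              # matching binomial response probability, 0.4

def hyper_pmf(s):
    """h(s; N, M, n) for the toy example."""
    if s < max(0, n + M - N) or s > min(n, M):
        return 0.0
    return comb(M, s) * comb(N - M, n - s) / comb(N, n)

def binom_pmf(s):
    """b(s; n, p) for the toy example."""
    return comb(n, s) * p**s * (1 - p)**(n - s)

hyper = [hyper_pmf(s) for s in range(n + 1)]
binom = [binom_pmf(s) for s in range(n + 1)]

print(" s  hypergeometric  binomial")
for s in range(n + 1):
    print(f"{s:2d}  {hyper[s]:14.4f}  {binom[s]:8.4f}")

def mean_var(pmf):
    """Mean and variance of a pmf given as a list indexed by s."""
    mean = sum(s * q for s, q in enumerate(pmf))
    var = sum((s - mean) ** 2 * q for s, q in enumerate(pmf))
    return mean, var

mean_h, var_h = mean_var(hyper)
mean_b, var_b = mean_var(binom)
print("variance ratio (hypergeometric / binomial):", var_h / var_b)
```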

Motivated by Kepner and Chang [7,8], in this manuscript we show that, for hypergeometric distributions, under mild conditions and for a given number k≥2 , there exists at least one k-stage group sequential test which has exactly the same maximum total sample size, significance level, and power as the standard single-stage hypergeometric test. Furthermore, motivated by Simon [6], we examine two types of optimal two-stage designs for a range of design parameters; one is optimal in the sense that the expected sample size under the null hypothesis is minimized, and the other is optimal in the sense that the maximum sample size is minimized.

The remainder of the manuscript is organized as follows. In Section 2, group sequential procedures are described. In Section 3, main theorems are presented and proved. In Section 4, two types of optimal two-stage designs are examined. Some brief discussion is given in Section 5 (Figure 1).

Group sequential tests are specified by the maximum number of stages, k, the cumulative number of patients to be treated up to each stage i, ni, and the critical values {[a1,b1],...,[ak−1,bk−1],bk}, where ai≤bi, i=1,...,k−1. Let Si be the number of patients who respond positively to the treatment among the ni patients accumulated up to stage i. The distribution of Si is Pr(Si=s)=h(s;N,M,ni), where s=max(0,ni+M−N),...,min(ni,M). The distribution is determined up to an unknown parameter, M or, equivalently, p=M/N.

The group sequential tests are conducted as follows. Start with stage i=1. At stage i<k, if Si≤ai−1, stop sampling and reject H1; if Si≥bi+1, stop sampling and accept H1; if ai≤Si≤bi, continue to stage i+1. At the final stage, accept H1 if Sk≥bk+1 and reject H1 if Sk≤bk.

The power function for any such group sequential test is

$$P(M)=\Pr\{S_1\ge b_1+1\mid M\}+\Pr\{a_1\le S_1\le b_1,\ S_2\ge b_2+1\mid M\}+\cdots+\Pr\{a_1\le S_1\le b_1,\ldots,a_{k-1}\le S_{k-1}\le b_{k-1},\ S_k\ge b_k+1\mid M\}. \tag{3}$$

For the required significance level α and power 1−β at M0+Δ , one should select (n1,...,nk) and [(a1,b1),...,(ak−1,bk−1),bk] such that P(M0)≤α and P(M0+Δ)≥1−β. The power function can be written as a function of proportion p. To abuse the notation slightly, let the power function (3) be written as P(p). Therefore, for the required significance level α and power 1−β at p0+δ , where p0=M0/N and δ=Δ/N , one should select (n1,...,nk) and [(a1,b1),...,(ak−1,bk−1),bk] such that P(p0)≤α and P(p0+δ)≥1−β.
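The power function (3) can be evaluated exactly by propagating the distribution of Si from stage to stage. The sketch below is one way to do this, under the assumption that patients are sampled without replacement across stages, so that, given Si−1=s, the number of new responders at stage i is hypergeometric in the remaining population of N−ni−1 patients with M−s responders. The function name `power` and its argument layout are hypothetical.

```python
from math import comb

def hyper_pmf(s, N, M, n):
    """h(s; N, M, n), zero outside the support."""
    if s < max(0, n + M - N) or s > min(n, M):
        return 0.0
    return comb(M, s) * comb(N - M, n - s) / comb(N, n)

def power(N, M, ns, a, b):
    """Pr(accept H1 | M) for a k-stage test.

    ns : cumulative sample sizes [n1, ..., nk]
    a  : lower limits [a1, ..., a_{k-1}]
    b  : upper limits [b1, ..., bk]
    """
    k = len(ns)
    cont = {0: 1.0}   # Pr(trial still running, cumulative responders = s)
    n_prev = 0
    accept = 0.0
    for i in range(k):
        m = ns[i] - n_prev   # new patients treated at this stage
        dist = {}
        for s_prev, pr in cont.items():
            # Remaining population: N - n_prev patients, M - s_prev responders.
            for x in range(m + 1):
                q = hyper_pmf(x, N - n_prev, M - s_prev, m)
                if q > 0.0:
                    dist[s_prev + x] = dist.get(s_prev + x, 0.0) + pr * q
        cont = {}
        for s, pr in dist.items():
            if s >= b[i] + 1:
                accept += pr              # stop and accept H1
            elif i < k - 1 and s >= a[i]:
                cont[s] = pr              # continue to the next stage
            # otherwise: stop and reject H1
        n_prev = ns[i]
    return accept

# With k = 1 the function reduces to the one-stage tail probability Pr(S >= b + 1).
print(power(50, 20, [10], [], [5]))
# An illustrative two-stage design with early stopping in both directions.
print(power(50, 20, [5, 10], [1], [3, 5]))
```

The significance level of a design is then `power(N, M0, ns, a, b)` and its power at the alternative is `power(N, M0 + Delta, ns, a, b)`.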

As discussed in Kepner and Chang [7], there are three types of group sequential designs. Type 1 designs stop early only to conclude efficacy (i.e., stop early only to accept H1), Type 2 designs stop early only to conclude futility (i.e., stop early only to reject H1), and Type 3 designs stop early for either efficacy or futility. Using the above notation, Type 1 group sequential tests are those with a1=a2=...=ak−1=0, Type 2 group sequential tests are those with bi=ni for i=1,...,k−1, and Type 3 group sequential tests are those with 0<ai≤bi≤ni for i=1,...,k−1.

In this section, three theorems are established, one for each of the three types of exact group sequential tests discussed in Section 2. These three theorems are similar to those established in Kepner and Chang [7] for binomial distributions.

Assume:

(A1) S∼H(N,M,n);

(A2) n is the smallest sample size for which there is an integer b such that 0≤b+1≤n, Pr{S≥b+1|M=M0}≤α, and Pr{S≥b+1|M=M0+Δ}≥1−β, where 0≤Δ<N−M0;

(A3) k is an integer such that 2≤k≤n.

Theorem 1

Under the conditions (A1)-(A3), assume b≤n−k. Then there exists a Type 1 k-stage exact sequential test such that n1≥n2−n1≥...≥nk−nk−1 and nk=n, the significance level of the test is α and the power of the test at M=M0+Δ is at least 1−β.

Proof

Let n1=n−k+1 and ni=ni−1+1 for i=2,...,k. The test statistic at stage i is Si. Let b1=b2=...=bk=b and a1=a2=...=ak−1=0. Since b1+1=b+1≤n−k+1=n1≤ni for i=1,...,k, the sample sizes (n1,...,nk) and the critical values [(a1,b1),...,(ak−1,bk−1),bk] form a proper Type 1 k-stage test. According to this test, H1 is accepted if and only if Si≥b+1 for at least one i, 1≤i≤k, which is equivalent to Sk=S≥b+1, since S1≤S2≤...≤Sk=S. Thus, the power function of the test is the same as that of the single-stage test based on S.

Theorem 2

Under the conditions (A1)-(A3), assume b≥k−1. Then there exists a Type 2 k-stage exact sequential test such that n1≥n2−n1≥...≥nk−nk−1 and nk=n, the significance level of the test is α and the power of the test at M=M0+Δ is at least 1−β.

Proof

Let n1=n−k+1 and ni=ni−1+1 for i=2,...,k; that is, ni=n−k+i. The test statistic at stage i is Si. Let

ai=b−k+i+1 for i=1,...,k−1,

and bi=ni for i=1,...,k−1 and bk=b. Since 1≤ai≤bi=ni for i=1,...,k−1 and k≤b+1≤n, the sample sizes (n1,...,nk) and the critical values [(a1,b1),...,(ak−1,bk−1),bk] form a proper Type 2 k-stage test. According to this test, H1 is rejected if and only if Si≤ai−1=b−k+i for at least one i, 1≤i≤k, where at the final stage the rejection rule Sk≤b corresponds to i=k. That is, there exists at least one i, 1≤i≤k, such that ni−Si≥ni−b+k−i=n−b. This is equivalent to n−Sk≥n−b, because n1−S1≤n2−S2≤...≤nk−Sk. Thus, the power function of the test is the same as that of the single-stage test based on S.
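The constructions in the proofs of Theorems 1 and 2 can be checked numerically. The sketch below builds the Type 1 design (ai=0, bi=b) and the Type 2 design (ai=b−k+i+1, bi=ni for i<k, bk=b) with ni=n−k+i, and compares their exact power with that of the one-stage test at several values of M. The helper names are hypothetical, and the values N=50, n=10, b=5, k=3 are chosen only for illustration; they satisfy k−1≤b≤n−k.

```python
from math import comb

def hyper_pmf(s, N, M, n):
    """h(s; N, M, n), zero outside the support."""
    if s < max(0, n + M - N) or s > min(n, M):
        return 0.0
    return comb(M, s) * comb(N - M, n - s) / comb(N, n)

def power(N, M, ns, a, b):
    """Exact Pr(accept H1 | M) for cumulative sizes ns and limits a, b."""
    k, cont, n_prev, accept = len(ns), {0: 1.0}, 0, 0.0
    for i in range(k):
        m = ns[i] - n_prev
        dist = {}
        for s_prev, pr in cont.items():
            for x in range(m + 1):
                # New responders are hypergeometric in the remaining population.
                q = hyper_pmf(x, N - n_prev, M - s_prev, m)
                if q > 0.0:
                    dist[s_prev + x] = dist.get(s_prev + x, 0.0) + pr * q
        cont = {}
        for s, pr in dist.items():
            if s >= b[i] + 1:
                accept += pr
            elif i < k - 1 and s >= a[i]:
                cont[s] = pr
        n_prev = ns[i]
    return accept

def type1_design(n, b, k):
    """Theorem 1 construction: never stop for futility, constant cutoff b."""
    ns = [n - k + i for i in range(1, k + 1)]
    return ns, [0] * (k - 1), [b] * k

def type2_design(n, b, k):
    """Theorem 2 construction: never stop for efficacy, a_i = b - k + i + 1."""
    ns = [n - k + i for i in range(1, k + 1)]
    a = [b - k + i + 1 for i in range(1, k)]
    return ns, a, ns[:-1] + [b]

N, n, b, k = 50, 10, 5, 3
for M in (10, 20, 30):
    single = power(N, M, [n], [], [b])
    p1 = power(N, M, *type1_design(n, b, k))
    p2 = power(N, M, *type2_design(n, b, k))
    print(f"M={M}: one-stage {single:.6f}, Type 1 {p1:.6f}, Type 2 {p2:.6f}")
```

The three columns agree up to floating-point rounding, as the proofs predict, while the sequential designs can stop before all n patients are treated.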

After Theorems 1-2 are established, following the same proof as that for Theorem 3 in Kepner and Chang [7], we can establish the following theorem.

Theorem 3

Under the conditions (A1)-(A3), assume k−1≤b≤n−k. Then there exists a Type 3 exact k-stage sequential test such that n1≥n2−n1≥...≥nk−nk−1 and nk=n, the significance level of the test is α and the power of the test at M=M0+Δ is at least 1−β.

The implications of these theorems are two-fold. The theoretical implication is that, for a given number k≥2 satisfying some mild conditions, there exists at least one k-stage group sequential test whose maximum total sample size is bounded above by the sample size needed for the standard one-stage test to achieve the same significance level and power. The applied implication is that we can search for "optimal" exact k stage designs among those proper group sequential tests whose maximum total sizes are bounded above by the sample size needed for the standard exact one-stage test to achieve the same significance level and power.

In this section we focus on optimal two-stage designs, which can be extended to optimal k-stage designs. We examine the two types of optimal two-stage designs proposed in Simon [6]; one is optimal in the sense that the expected sample size under the null hypothesis is minimized, and the other is optimal in the sense that the maximum sample size is minimized. Originally they were proposed for binomial distributions; in this manuscript, we examine them for hypergeometric distributions. There is one small difference. In Simon [6], the optimal designs are searched among all the proper two-stage designs. Here, thanks to Theorems 1-3, we only need to search for optimal designs among those proper two-stage designs whose maximum total sizes are bounded above by the sample size needed for the standard one-stage design to achieve the same significance level and power. Narrowing the search domain improves the computational efficiency.

Consider the first type of optimal two-stage designs. Suppose that n1 patients are treated in the first stage and, if necessary, n2−n1 more patients are treated in the second stage, so that the maximum sample size is n2. Then the expected sample size is EN=PET×n1+(1−PET)×n2, where PET is the probability of early termination after the first stage. The decision of whether or not to terminate after the first stage depends on the type of the two-stage design (Type 1, Type 2, or Type 3) and the number of responses observed from those n1 patients (Table 1). The expected sample size EN and the probability of early termination PET depend on the unknown parameter M, which governs the distribution of the number of responses observed in the sample of size n1 from the population of size N. In particular, PET=1−∑_{s=a1}^{b1} h(s;N,M,n1). In this manuscript, as in Simon [6], we consider the optimal two-stage designs which have minimum expected sample size under the null hypothesis, EN0=PET0×n1+(1−PET0)×n2, where PET0=1−∑_{s=a1}^{b1} h(s;N,M0,n1). The rationale behind this is that we should expose as few patients as possible to an ineffective treatment. The second type of optimal two-stage designs is easier to describe. They are the ones that have minimum maximum sample size; that is, n2 is minimized.
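The two quantities PET and EN translate directly into code. The following sketch uses hypothetical function names `pet` and `expected_n`, and the design values in the example (N=80, M0=16, n1=10, n2=20, a1=2, b1=10) are illustrative assumptions rather than entries from the tables.

```python
from math import comb

def hyper_pmf(s, N, M, n):
    """h(s; N, M, n), zero outside the support."""
    if s < max(0, n + M - N) or s > min(n, M):
        return 0.0
    return comb(M, s) * comb(N - M, n - s) / comb(N, n)

def pet(N, M, n1, a1, b1):
    """Probability of early termination: 1 - Pr(a1 <= S1 <= b1 | M)."""
    return 1.0 - sum(hyper_pmf(s, N, M, n1) for s in range(a1, b1 + 1))

def expected_n(N, M, n1, n2, a1, b1):
    """Expected sample size EN = PET * n1 + (1 - PET) * n2."""
    p_stop = pet(N, M, n1, a1, b1)
    return p_stop * n1 + (1.0 - p_stop) * n2

# Illustrative Type 2 first stage (b1 = n1, so stopping is for futility only):
# stop after n1 = 10 patients if fewer than a1 = 2 of them respond.
N, M0 = 80, 16
pet0 = pet(N, M0, n1=10, a1=2, b1=10)
en0 = expected_n(N, M0, n1=10, n2=20, a1=2, b1=10)
print("PET0 =", round(pet0, 4), " EN0 =", round(en0, 2))
```

Evaluating the same functions at M=M0+Δ gives the expected sample size under the alternative, which is the basis of the other optimality criterion mentioned in the discussion.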

Now we are ready to examine these two types of optimal two-stage designs. First, we refresh notation for interpreting different designs in Table 1. We consider the following settings. Set significance level α=0.05 and Type II error rate β=0.2. Set the population size N=80 or 120, the proportion under the null hypothesis p0=M0/N=0.1,0.2,...,0.7 and the treatment effect δ=Δ/N=0.15 or 0.2. The resulting optimal designs are presented in Table 2, Table 3, Table 4 and Table 5. We have obtained these results using R code, which can be requested by sending email to the first author.
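To make the search concrete, the following brute-force sketch looks for the two kinds of optimal Type 2 two-stage designs (futility stopping only) among designs whose maximum sample size does not exceed the one-stage sample size. It is not the R code mentioned above and makes no attempt to reproduce the tabulated values; all function names are hypothetical, and only Type 2 designs are covered. For fixed (n1, n2, a1), the size and power both decrease in b2 while EN0 does not depend on b2, so only the smallest b2 meeting the size constraint is examined.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def hyper_pmf(s, N, M, n):
    """h(s; N, M, n), zero outside the support."""
    if s < max(0, n + M - N) or s > min(n, M):
        return 0.0
    return comb(M, s) * comb(N - M, n - s) / comb(N, n)

def tail(c, N, M, n):
    """Pr(S >= c) for S ~ H(N, M, n)."""
    return sum(hyper_pmf(s, N, M, n) for s in range(max(c, 0), min(n, M) + 1))

def one_stage_n(N, M0, M1, alpha, beta):
    """Sample size of the one-stage exact test."""
    for n in range(1, N + 1):
        for b in range(n):
            if tail(b + 1, N, M0, n) <= alpha:
                if tail(b + 1, N, M1, n) >= 1 - beta:
                    return n
                break
    return N

def accept_prob(N, M, n1, n2, a1, b2):
    """Pr(accept H1 | M): continue if S1 >= a1, then accept if S2 >= b2 + 1."""
    total = 0.0
    for s1 in range(a1, min(n1, M) + 1):
        p1 = hyper_pmf(s1, N, M, n1)
        if p1 > 0.0:
            # Second-stage responders come from the remaining population.
            total += p1 * tail(b2 + 1 - s1, N - n1, M - s1, n2 - n1)
    return total

def search_type2(N, M0, M1, alpha, beta):
    """Return (min-EN0 design, minimax design) as tuples (n1, n2, a1, b2, EN0)."""
    n_max = one_stage_n(N, M0, M1, alpha, beta)
    best_en, best_mm = None, None
    for n2 in range(2, n_max + 1):
        for n1 in range(1, n2):
            for a1 in range(1, n1 + 1):
                pet0 = sum(hyper_pmf(s, N, M0, n1) for s in range(a1))
                en0 = pet0 * n1 + (1 - pet0) * n2
                for b2 in range(n2):
                    if accept_prob(N, M0, n1, n2, a1, b2) <= alpha:
                        if accept_prob(N, M1, n1, n2, a1, b2) >= 1 - beta:
                            d = (n1, n2, a1, b2, en0)
                            if best_en is None or (en0, n2) < (best_en[4], best_en[1]):
                                best_en = d
                            if best_mm is None or (n2, en0) < (best_mm[1], best_mm[4]):
                                best_mm = d
                        break
    return best_en, best_mm
```

For example, `search_type2(30, 6, 18, 0.05, 0.2)` runs quickly on a small illustrative population; the settings with N=80 or 120 involve many more candidate designs, and a plain Python loop of this kind may be slow for them.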

Table 2 and Table 3 apply to a small population of size N=80, and Table 4 and Table 5 apply to a small population of size N=120, where δ is 0.15 for Table 2 and Table 4 and 0.20 for Table 3 and Table 5. In each table, the first column corresponds to the one-stage design, the optimal two-stage designs minimizing the expected sample size under H0 are shown in the left half of the table, and the optimal two-stage designs minimizing the maximum sample size are shown in the right half of the table. For each setting, there are three rows, which correspond to the three types of two-stage designs (Types 1-3), respectively. The tabulated results include the optimal design (Table 1), the expected sample size under H0, EN0, the maximum sample size, n2, and the early termination probability under H0, PET0. For some settings, there is more than one optimal design whose maximum sample size equals the sample size of the one-stage design; in that case, the one with the minimum expected sample size under H0 is reported and is indicated by an asterisk.

First and most importantly, the results in Table 2, Table 3, Table 4 and Table 5 verify the main theorems; that is, for each setting, there exists at least one two-stage design whose maximum sample size is bounded above by the sample size of the one-stage design. Because the early termination probability is strictly positive for any proper two-stage design, the corresponding expected size is always strictly smaller than the sample size of the one-stage design.

Second, in many settings, the maximum sample sizes of the two-stage designs are strictly smaller than the sample size of the corresponding one-stage design. This finding is striking, given that the standard one-stage test is the uniformly most powerful test.

Third, in some settings, the optimal two-stage design minimizing the expected sample size may be more attractive than the optimal two-stage design minimizing the maximum sample size. This is the case when the difference in maximum sample size is small, but the difference in expected sample sizes is large. For example, in Table 2, the third row in the section for p0=0.2 shows a case where the difference in maximum sample size is only one, but the difference in expected sample sizes is 7.5. For other settings, these two types of optimal exact two-stage designs are quite similar. Surprisingly, this finding is different from that in Simon [6], where it was concluded that the "minimax" designs may be more attractive.

Optimal exact group sequential tests for a binomial proportion have been well-studied in the literature, but a corresponding study involving hypergeometric distributions is lacking. This manuscript studies some exact group sequential tests involving hypergeometric distributions, which are useful for investigating treatment effects on rare diseases.

In this manuscript, three theorems have been proved and two types of optimal two-stage designs have been presented. The theorems guarantee the existence of proper exact group sequential designs whose expected sample sizes are strictly smaller than the ones from standard one-stage designs. The optimal two-stage designs discussed here provide two examples of how to construct optimal two-stage designs. There are other criteria; for example, the optimal design minimizing the expected sample size at a given parameter, say p1=p0+δ. Moreover, the tabulated results provide detailed comparisons between these two types of optimal designs.

Finally, this manuscript focuses on one-arm phase II clinical trials. One of our future projects is to study the properties of group sequential tests for comparing two hypergeometric distributions, which arise from two-arm phase II clinical trials.

The authors would like to thank two anonymous referees for their constructive comments and suggestions, which have led to a significantly improved paper.