Pharmacovigilance is the primary method used to identify hazards associated with medicinal products. Using generalized likelihood ratio tests, we examined the adverse reaction signals of the drug class monoamine oxidase inhibitors (MAOIs) in the World Health Organization's pharmacovigilance database. The proposed test procedure can detect adverse reactions of multiple drugs simultaneously. Our findings suggest that 23 common adverse reaction signals are detected within this drug class, and that postural hypotension, high blood pressure, fainting, abnormal heart rhythm, dizziness, headache, and drowsiness are the strongest signals for the MAOI class. An extensive simulation study performed to evaluate the proposed test procedure also suggests that it works well in practice.

Signal detection, Adverse reaction, Monte Carlo simulation, Pharmacovigilance

Reporting of drug or medical device related adverse reactions (ARs) is usually voluntary. One of the major postmarket safety surveillance databases is the World Health Organization's (WHO) global pharmacovigilance database, which contains reports of suspected ARs, so-called Individual Case Safety Reports (ICSRs), collected by national drug authorities in over 110 countries and spanning more than 100,000 different medicinal products. Clinical reviewers evaluate adverse reaction reports to look for new safety concerns that might be related to a marketed product, or to check a manufacturer's compliance with reporting regulations.

Several statistical methods are available for adverse reaction detection in this and other large postmarket databases. These methods include the reporting odds ratio (ROR, [1]), the proportional reporting ratio (PRR, [2]), the multi-item gamma Poisson shrinker (MGPS, [3,4]), the Bayesian confidence propagation neural network (BCPNN, [5]), a Bayesian method based on a new Information Component (IC, [6]), and simplified Bayes (sB, [7]), among others. A likelihood ratio test method ([8]), which assumes that the number of adverse reactions follows a Poisson distribution, was developed to identify adverse reactions for a specific drug or to identify drugs for a particular adverse event. In post-marketing surveillance, the signals of adverse reactions within a drug class, or drug signals for a group of adverse reactions, may be of interest to medical reviewers. In this article, we develop a generalized likelihood ratio test to identify adverse reactions that have high reporting rates compared to other adverse reactions associated with all the drugs of the same class or with similar treatment indications. A drug class refers to a set of drugs that have similar chemical structure, such as the antibiotics drug class containing Penicillins, Tetracyclines, Cephalosporins, Quinolones, Lincomycins, Macrolides, Sulfonamides, Glycopeptides, Aminoglycosides, Carbapenems, etc. A group of adverse reactions refers to a set of preferred terms, such as hepatic failure, alanine aminotransferase abnormal, ascites, blood bilirubin abnormal, cholestatic liver injury, hepatic atrophy, hepatomegaly, Reye's syndrome, and so on, that are all related to hepatocellular injury.

This article is organized as follows. In Section 2, a brief overview of the World Health Organization's global pharmacovigilance database is provided. In Section 3, we give a brief review of the likelihood ratio test (LRT) procedure for adverse reaction detection for a single drug, and then propose a generalized likelihood ratio test procedure, the GLRT, to detect multiple ARs in a drug class. The performance of the GLRT is evaluated using simulated datasets in Section 4. In Section 5, both the LRT and the GLRT are applied to the 2000-2005 and 2005-2010 data from the WHO's global database. Section 6 contains some discussion and concluding remarks.

The WHO's global pharmacovigilance database consists of individual reports with demographic information, route of administration, drug/biological information, medical history, treatment indication, and therapy start and end dates. For adverse reaction detection, a medical dictionary of preferred terms is often used to identify adverse events such as death, stroke, and myocardial infarction. The file also contains verbatim drug names for the drug/biologic information. In studying the drug-AE association, the generic name of the drug is used, which refers to the unique chemical makeup of a drug.

The WHO's global pharmacovigilance database includes reports since 1980; however, researchers and reviewers are more interested in data from recent years. In this article we focus on cases reported to the WHO between 2000 and 2010 for more than 6,500 drugs and 14,000 adverse reactions. For any particular adverse event, the investigators consider all suspect and concomitant drugs.

After summarizing the data files, the WHO pharmacovigilance data can be presented in tabular form with, say, adverse reactions (ARs) as the row variable and drugs as the column variable (as in Table 1), with $n_{ij}$ as the cell count for the $i$th AR and $j$th drug, $n_{i.}$ as the sum of counts for the $i$th AR (the $i$th row total), and $n_{.j}$ as the sum of counts for the $j$th drug (the $j$th column total).
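As a small illustration of this notation, the sketch below computes the row totals, column totals, and grand total from a toy AR-by-drug table. The counts and the helper name `margins` are made up for illustration and are not taken from the WHO data.

```python
def margins(counts):
    """Return (row totals n_i., column totals n_.j, grand total n_..)."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    return row_totals, col_totals, sum(row_totals)


# Toy table: 3 ARs (rows) x 2 drugs (columns); the values are hypothetical.
counts = [
    [12, 30],
    [5, 45],
    [3, 105],
]
row_totals, col_totals, grand_total = margins(counts)

# Expected count E_ij = n_i. * n_.j / n_.. for AR 0 and drug 0.
expected_00 = row_totals[0] * col_totals[0] / grand_total
```

Here `expected_00` is the count that would be expected for the first AR and first drug if the drug were reported at the same rate for every AR.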

We collapse the data table into multiple 3 × 3 tables. For a fixed $j$th drug, we have $I$ such tables (Table 2), each associated with an AR ($i = 1, \dots, I$). We assume that $n_{ij} \sim Poisson(n_{i.} \times p_{ij})$, where $p_{ij}$ is the reporting rate of the $j$th drug for the $i$th AR, and $n_{.j} - n_{ij} \sim Poisson((n_{..} - n_{i.}) q_{ij})$, where $q_{ij}$ is the reporting rate of the $j$th drug for all other ARs combined, excluding the $i$th AR. We also assume that $n_{ij}$ and $n_{.j} - n_{ij}$ are independent. Since drug $j$ is fixed, unless stated otherwise, we suppress the dependence of $p_{ij}$ and $q_{ij}$ on the $j$th drug in the notation. We define the null hypothesis as

$H_{0j}: p_{ij} = q_{ij} = p_{0}$ for all ARs in drug $j$ (i.e., $H_{0j}: \cap_{i}\, p_{ij} = q_{ij} = p_{0}$);

and the alternative hypothesis (one sided) as

$H_{aj}: p_{ij} > q_{ij}$ for at least one AR in drug $j$ (i.e., $H_{aj}: \cup_{i}\, p_{ij} > q_{ij}$).

Under the null hypothesis, the maximum likelihood estimate (MLE) of $p_{0}$ is $\widehat{p}_{0} = \frac{n_{.j}}{n_{..}}$, and the expected number of cases for the $i$th AR and $j$th drug is $E_{ij} = n_{i.} \times \frac{n_{.j}}{n_{..}}$. Under the two-sided alternative hypothesis ($p_{ij} \ne q_{ij}$), the MLEs of $p_{ij}$ and $q_{ij}$ are

$\widehat{p}_{ij} = \frac{n_{ij}}{n_{i.}}$ and $\widehat{q}_{ij} = \frac{n_{.j} - n_{ij}}{n_{..} - n_{i.}}.$
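These estimates follow from the Poisson log-likelihoods. A brief sketch of the derivation, using only the model assumptions stated above and writing $c$ for the terms that do not involve the parameters, is:

$\ell_{a}(p_{ij}, q_{ij}) = n_{ij} \log p_{ij} - n_{i.} p_{ij} + (n_{.j} - n_{ij}) \log q_{ij} - (n_{..} - n_{i.}) q_{ij} + c$

$\frac{\partial \ell_{a}}{\partial p_{ij}} = \frac{n_{ij}}{p_{ij}} - n_{i.} = 0 \;\Rightarrow\; \widehat{p}_{ij} = \frac{n_{ij}}{n_{i.}}, \qquad \frac{\partial \ell_{a}}{\partial q_{ij}} = \frac{n_{.j} - n_{ij}}{q_{ij}} - (n_{..} - n_{i.}) = 0 \;\Rightarrow\; \widehat{q}_{ij} = \frac{n_{.j} - n_{ij}}{n_{..} - n_{i.}}$

$\ell_{0}(p_{0}) = n_{.j} \log p_{0} - n_{..} p_{0} + c, \qquad \frac{\partial \ell_{0}}{\partial p_{0}} = \frac{n_{.j}}{p_{0}} - n_{..} = 0 \;\Rightarrow\; \widehat{p}_{0} = \frac{n_{.j}}{n_{..}}$

Evaluated at the MLEs, the linear terms equal $n_{.j}$ under both hypotheses, so they cancel when the two maximized likelihoods are divided.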

The maximum likelihoods under both the null and the two-sided alternative hypotheses are obtained by replacing the parameters with their MLEs in the likelihood functions, leading to the likelihood ratio for the $i$th AR and $j$th drug,

$LR_{ij} = \frac{L_{a}(\widehat{p}_{ij}, \widehat{q}_{ij})}{L_{0}(\widehat{p}_{0})} = \frac{\left(\frac{n_{ij}}{n_{i.}}\right)^{n_{ij}} \left(\frac{n_{.j} - n_{ij}}{n_{..} - n_{i.}}\right)^{n_{.j} - n_{ij}}}{\left(\frac{n_{.j}}{n_{..}}\right)^{n_{.j}}} = \left(\frac{n_{ij}}{E_{ij}}\right)^{n_{ij}} \left(\frac{n_{.j} - n_{ij}}{n_{.j} - E_{ij}}\right)^{n_{.j} - n_{ij}} \quad \text{(1)}$

where $E_{ij} = \frac{n_{i.} \times n_{.j}}{n_{..}}$ is the expected count for the $i$th AR and $j$th drug. The likelihood ratio test statistic for testing $H_{0j}: p_{i} = q_{i}$ for all ARs in drug $j$ vs. $H_{aj}: p_{i} > q_{i}$ for at least one AR is the maximum likelihood ratio $MLR_{j} = \max_{i}(LR_{ij})$, where the maximum is over ARs $i = 1, \dots, I$. We calculate $LR_{ij} = \left(\frac{n_{ij}}{E_{ij}}\right)^{n_{ij}} \left(\frac{n_{.j} - n_{ij}}{n_{.j} - E_{ij}}\right)^{n_{.j} - n_{ij}} I(\widehat{p}_{ij} > \widehat{q}_{ij})$ and define $MLR_{j} = \max_{i}(LR_{ij})$ as the test statistic. For computational convenience, we may sometimes work with the log-likelihood ratio $\log(LR_{ij})$, which is a monotone function of $LR_{ij}$.
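The statistic in equation (1) can be sketched on the log scale as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the counts in the usage line are made up, and returning negative infinity when the indicator is zero is a convention chosen here so that such ARs never attain the maximum.

```python
import math


def log_lr(n_ij, n_i, n_j, n):
    """Log of LR_ij * I(p_hat > q_hat); -inf when the indicator is 0."""
    # p_hat > q_hat is checked by cross-multiplication to stay in integers.
    if n_ij * (n - n_i) <= (n_j - n_ij) * n_i:
        return -math.inf
    e_ij = n_i * n_j / n        # expected count E_ij
    rest = n_j - n_ij           # reports of drug j for all other ARs
    value = n_ij * math.log(n_ij / e_ij)
    if rest > 0:                # a 0 * log(0) term is taken as 0
        value += rest * math.log(rest / (n_j - e_ij))
    return value


def max_log_lr(column, row_totals):
    """Return (log MLR_j, index of the AR attaining it) for one drug."""
    n_j, n = sum(column), sum(row_totals)
    scores = [log_lr(c, r, n_j, n) for c, r in zip(column, row_totals)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], best


# Hypothetical drug column (n_1j, n_2j, n_3j) and row totals (n_1., n_2., n_3.).
log_mlr, strongest_ar = max_log_lr([12, 5, 3], [42, 50, 108])
```

Working with logarithms avoids overflow when the counts are large, and because the logarithm is monotone the AR that maximizes `log_lr` also maximizes $LR_{ij}$.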

The distribution of MLR under $H_{0}$ is not analytically tractable and is obtained using Monte Carlo simulation as described below. First, the number of cases for each AR, for a given drug $j$, is simulated under $H_{0}$. Under $H_{0}$, since $n_{1j}, \dots, n_{Ij}$, given the marginal totals $n_{1.}, \dots, n_{I.}$, are independent $Poisson(n_{i.} p_{0})$, $i = 1, \dots, I$, the joint distribution of $(n_{1j}, \dots, n_{Ij})$ conditional on $n_{.j}$ and $(n_{1.}, \dots, n_{I.})$ is

$(n_{1j}, n_{2j}, \dots, n_{Ij}) \mid n_{.j}; n_{1.}, \dots, n_{I.} \sim multinomial\left(n_{.j}, \left(\frac{n_{1.}}{n_{..}}, \dots, \frac{n_{I.}}{n_{..}}\right)\right).$

A total of 499 datasets are simulated under H_{0} from the multinomial distribution, and 500 MLRs are calculated: 499 from the simulated datasets and one from the observed dataset. The null hypothesis is rejected at the α = 0.05 level if the MLR from the observed dataset is greater than the 95th percentile of the 500 MLR values (the threshold, T_{α}). The corresponding p-value is 1 - R/500, where R is the rank of the observed MLR among the 500 MLR values. If the p-value of the observed MLR is less than α (say, 0.05), the AR associated with this MLR is declared the strongest signal among all ARs for the j^{th} drug under consideration. Having found the strongest signal, we then move to the second largest LR_{ij}, and so on, and declare the corresponding ARs signals if their LR_{ij} values are greater than T_{α} or their p-values are less than α.
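The Monte Carlo procedure above can be sketched as follows, using only the Python standard library. The function names, the toy counts, and the fixed seed are assumptions made for illustration. Two details are choices made here rather than stated in the text: ties are broken by giving the observed value the lowest possible rank, and the percentile is taken as an order statistic of the sorted values.

```python
import math
import random
from collections import Counter


def log_lr(n_ij, n_i, n_j, n):
    """Log of LR_ij * I(p_hat > q_hat); -inf when the indicator is 0."""
    if n_ij * (n - n_i) <= (n_j - n_ij) * n_i:
        return -math.inf
    e_ij, rest = n_i * n_j / n, n_j - n_ij
    value = n_ij * math.log(n_ij / e_ij)
    return value + (rest * math.log(rest / (n_j - e_ij)) if rest > 0 else 0.0)


def max_log_lr(column, row_totals):
    n_j, n = sum(column), sum(row_totals)
    return max(log_lr(c, r, n_j, n) for c, r in zip(column, row_totals))


def lrt_signals(column, row_totals, n_sim=499, alpha=0.05, seed=0):
    """Return (signal ARs, threshold T_alpha, p-value) for a single drug."""
    rng = random.Random(seed)
    n_j, n = sum(column), sum(row_totals)
    ars = range(len(row_totals))
    observed = max_log_lr(column, row_totals)
    values = [observed]
    for _ in range(n_sim):
        # Multinomial draw: n_j reports allocated with probabilities n_i. / n_..
        draw = Counter(rng.choices(ars, weights=row_totals, k=n_j))
        values.append(max_log_lr([draw[i] for i in ars], row_totals))
    values.sort()
    # 100(1 - alpha)th percentile as an order statistic (475th of 500 values).
    threshold = values[math.ceil((1 - alpha) * len(values) - 1e-9) - 1]
    rank = 1 + sum(v < observed for v in values)
    p_value = 1 - rank / len(values)
    signals = [i for i in ars
               if log_lr(column[i], row_totals[i], n_j, n) > threshold]
    return signals, threshold, p_value


# Hypothetical drug column with one strongly over-reported AR (index 0).
signals, threshold, p_value = lrt_signals([40, 5, 3], [600, 500, 1080])
```

The step-down part of the procedure corresponds to the final list comprehension: every AR whose log-likelihood ratio exceeds the same threshold is reported as a signal.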

The likelihood ratio test is shown, analytically and through extensive simulation study, to control type-I error and false discovery rate (FDR) while retaining good power and sensitivity ([7,8]). In the next section, we generalize the likelihood ratio test procedure to detect all AR signals in a drug class. The methods to detect drug signals for a set of prespecified ARs can be performed in a similar fashion.

In order to develop a test statistic that can identify adverse reactions of multiple drugs in a class, we assume that a drug class has $K$ different drugs (usually $K$ is a small number), and that for the $k$th drug the numbers of reports for the $i$th AR and for all other ARs (excluding the $i$th AR) still follow Poisson distributions:

$n_{ik} \sim Poisson\left(n_{i.} p_{ik}\right)$

$n_{.k} - n_{ik} \sim Poisson\left(\left(n_{..} - n_{i.}\right) q_{ik}\right)$

where p_{ik} is the reporting rate of the k^{th} drug for the i^{th} AR, and q_{ik} is the reporting rate of the k^{th} drug for the other ARs.

The null and alternative hypotheses for detecting AR signals in drug *k* are

${H}_{0k}:{p}_{ik}={q}_{ik}={p}_{0k}$ for all ARs in drug *k* versus ${H}_{ak}:{p}_{ik}>{q}_{ik}$ for at least one AR.

The null and alternative hypotheses for detecting AR signals in a drug class with K drugs are

${H}_{0}:{\cap}_{k}{\cap}_{i}{p}_{ik}={q}_{ik}={p}_{0k}$

where i = 1,..., I and k = 1,..., K, versus

${H}_{a}:{\cup}_{k}{\cup}_{i}{p}_{ik}>{q}_{ik}$

Another way to write the null and alternative hypotheses is $H_{0}: \cap_{k} H_{0k}$ versus $H_{a}: \cup_{k} H_{ak}$. Under the null hypothesis $H_{0}$, the MLE of $p_{0k}$ is $\widehat{p}_{0k} = \frac{n_{.k}}{n_{..}}$, and the expected count for the $i$th AR and $k$th drug is $E_{ik} = n_{i.} \times \frac{n_{.k}}{n_{..}}$. Under the two-sided alternative hypothesis $(\cup_{k} \cup_{i}\, p_{ik} \ne q_{ik})$, the MLEs of $p_{ik}$ and $q_{ik}$ are $\widehat{p}_{ik} = \frac{n_{ik}}{n_{i.}}$ and $\widehat{q}_{ik} = \frac{n_{.k} - n_{ik}}{n_{..} - n_{i.}}.$

The maximum likelihoods under both the null and the two-sided alternative hypotheses are obtained by replacing the parameters with their MLEs in the likelihood functions, leading to the likelihood ratio for the i^{th} AR in the k^{th} drug:

$LR_{ik} = \frac{L_{a}(\widehat{p}_{ik}, \widehat{q}_{ik})}{L_{0}(\widehat{p}_{0k})} = \frac{\left(\frac{n_{ik}}{n_{i.}}\right)^{n_{ik}} \left(\frac{n_{.k} - n_{ik}}{n_{..} - n_{i.}}\right)^{n_{.k} - n_{ik}}}{\left(\frac{n_{.k}}{n_{..}}\right)^{n_{.k}}} = \left(\frac{n_{ik}}{E_{ik}}\right)^{n_{ik}} \left(\frac{n_{.k} - n_{ik}}{n_{.k} - E_{ik}}\right)^{n_{.k} - n_{ik}} \quad \text{(2)}$

The likelihood ratio test statistic for testing $H_{0k}: p_{ik} = q_{ik} = p_{0k}$ versus $H_{ak}: p_{ik} > q_{ik}$ for at least one AR is the maximum likelihood ratio $MLR_{k} = \max_{i}(LR_{ik})$, $i = 1, \dots, I$. The test statistic for testing $H_{0}: \cap_{k} \cap_{i}\, p_{ik} = q_{ik} = p_{0k}$ versus $H_{a}: \cup_{k} \cup_{i}\, p_{ik} > q_{ik}$ is

$MLR = \max_{k}\left(MLR_{k}\right) = \max_{k}\left(\max_{i}\left(LR_{ik}\, I(\widehat{p}_{ik} > \widehat{q}_{ik})\right)\right),$

where $i = 1, \dots, I$ and $k = 1, \dots, K$.

Because the distribution of MLR under $H_{0}$ is not analytically tractable, we again use Monte Carlo simulation to obtain its distribution. For each drug $k$ in the drug class, under $H_{0}$ we generate 499 datasets using

$(n_{1k}, n_{2k}, \dots, n_{Ik}) \mid n_{.k}; n_{1.}, \dots, n_{I.} \sim multinomial\left(n_{.k}, \left(\frac{n_{1.}}{n_{..}}, \dots, \frac{n_{I.}}{n_{..}}\right)\right) \quad \text{(3)}$

and compute 500 values of MLR, including the one from the real data, for each k = 1,..., K. This results in 500 × K MLR values. The null hypothesis is rejected at the α = 0.05 level if the value of MLR from the observed dataset is greater than T_{α}, the 100(1 - α)th percentile of the 500 × K MLR values. After the AR associated with the largest LR_{ik} is identified as a signal (LR_{ik} > T_{α}), we move to the AR with the second largest value of LR_{ik}, determine whether it is a signal, and so on. In this way, the generalized likelihood ratio test procedure controls the Type-I error. It also controls the false discovery rate (FDR), with FDR ≤ α.
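A minimal sketch of the pooled-threshold procedure for a drug class is given below. It is an illustration under stated assumptions, not the authors' code: the function names, the toy counts, and the seed are hypothetical, and the row totals passed in are assumed to be the database-wide totals $n_{i.}$ rather than totals over the class only.

```python
import math
import random
from collections import Counter


def log_lr(n_ik, n_i, n_k, n):
    """Log of LR_ik * I(p_hat > q_hat); -inf when the indicator is 0."""
    if n_ik * (n - n_i) <= (n_k - n_ik) * n_i:
        return -math.inf
    e_ik, rest = n_i * n_k / n, n_k - n_ik
    value = n_ik * math.log(n_ik / e_ik)
    return value + (rest * math.log(rest / (n_k - e_ik)) if rest > 0 else 0.0)


def max_log_lr(column, row_totals):
    n_k, n = sum(column), sum(row_totals)
    return max(log_lr(c, r, n_k, n) for c, r in zip(column, row_totals))


def glrt_signals(table, row_totals, n_sim=499, alpha=0.05, seed=0):
    """table[k] holds (n_1k, ..., n_Ik) for drug k; returns (signals, T_alpha)."""
    rng = random.Random(seed)
    n = sum(row_totals)
    ars = range(len(row_totals))
    pooled = []
    for column in table:
        n_k = sum(column)
        pooled.append(max_log_lr(column, row_totals))   # observed MLR_k
        for _ in range(n_sim):                          # null MLR_k values
            draw = Counter(rng.choices(ars, weights=row_totals, k=n_k))
            pooled.append(max_log_lr([draw[i] for i in ars], row_totals))
    pooled.sort()
    # One threshold from all (n_sim + 1) * K pooled values.
    threshold = pooled[math.ceil((1 - alpha) * len(pooled) - 1e-9) - 1]
    signals = [(i, k) for k, column in enumerate(table) for i in ars
               if log_lr(column[i], row_totals[i], sum(column), n) > threshold]
    return signals, threshold


# Two hypothetical drugs; drug 0 has one strongly over-reported AR (index 0).
table = [[40, 5, 3], [10, 9, 20]]
signals, threshold = glrt_signals(table, [600, 500, 1080])
```

Each signal is returned as an (AR index, drug index) pair, so the ARs common to all drugs in the class can be found by intersecting the AR indices per drug.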

In the following, we present the results from applying the likelihood ratio test procedures discussed in Section 3 to the monoamine oxidase inhibitors (MAOIs). MAOIs are used to treat several conditions, including, but not limited to, depression, generalized anxiety disorder, agitation, obsessive compulsive disorder (OCD), manic-depressive disorders, childhood enuresis (bedwetting), major depressive disorder, diabetic peripheral neuropathic pain, neuropathic pain, social anxiety disorder, and posttraumatic stress disorder (PTSD). The drug class includes Nardil (phenelzine), Parnate (tranylcypromine), Marplan (isocarboxazid), Emsam (selegiline), etc. We select four MAOIs, labeled MAOI1, MAOI2, MAOI3, and MAOI4 (in no specific order, to mask their names), using the WHO 2000-2005 and 2005-2010 data sets. The purpose of this analysis is to identify the AR signals (with high disproportionality rates) associated with the MAOI drug class. We apply the likelihood ratio test (LRT) and the generalized likelihood ratio test (GLRT) to detect adverse reactions.

The results for the MAOI drug class using both the LRT and the GLRT are listed in Table 3. Applying the likelihood ratio test procedure to each of the four drugs in the drug class detects 66, 37, 74, and 45 ARs, respectively, while the generalized likelihood ratio test detects 61, 32, 68, and 39 ARs. For each of the four drugs, the GLRT detects fewer ARs than the LRT. Cross-checking the ARs across the four MAOI drugs shows that 23 common ARs are detected within this drug class. The top ARs are listed in Table 4 and Table 5; postural hypotension, high blood pressure, fainting, abnormal heart rhythm, dizziness, headache, and drowsiness are the strongest signals for the MAOI class.

We then study the performance of the generalized likelihood ratio test (GLRT) using simulated datasets. We simulate datasets based on the four drugs in the monoamine oxidase inhibitors drug class in WHO's global pharmacovigilance database.

Under the null hypothesis, the data are simulated from the multinomial distribution (3). Under the alternative hypothesis, data are generated as follows:

$(n_{1k}, \dots, n_{Ik}) \mid n_{.k}; n_{1.}, \dots, n_{I.} \sim multinomial\left(n_{.k}, \left(rr_{1k} \times r_{0k} \times \frac{n_{1.}}{n_{..}}, \dots, rr_{Ik} \times r_{0k} \times \frac{n_{I.}}{n_{..}}\right)\right) \quad \text{(4)}$

where $k = 1, \dots, 5$, and $rr_{1k}, \dots, rr_{Ik}$ are the relative reporting rates for ARs $1, \dots, I$ in drug $k$, with constraints $0 \le rr_{ik} \times r_{0k} \times \frac{n_{i.}}{n_{..}}$ and $\sum_{i=1}^{I} rr_{ik} \times r_{0k} \times \frac{n_{i.}}{n_{..}} = 1$. The relative reporting rates $rr_{ik}$ are specified as follows: $rr_{ik}$ is assigned a value higher than 1 for ARs selected as signals and 1 for all other ARs. $r_{0k}$ can be regarded as the baseline risk for drug $k$, and it can differ from one drug to another.
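The sum-to-one constraint determines $r_{0k}$ once the relative reporting rates are fixed, which the following sketch makes explicit. The function names and the toy row totals are hypothetical, and a single common value of rr for all signal ARs is assumed for simplicity.

```python
import random
from collections import Counter


def alternative_probs(row_totals, signal_ars, rr):
    """Cell probabilities rr_ik * r_0k * n_i. / n_.. of equation (4)."""
    n = sum(row_totals)
    rr_i = [rr if i in signal_ars else 1.0 for i in range(len(row_totals))]
    # r_0k is fixed by requiring the probabilities to sum to 1.
    r0 = 1.0 / sum(r * t / n for r, t in zip(rr_i, row_totals))
    probs = [r * r0 * t / n for r, t in zip(rr_i, row_totals)]
    return probs, r0


def simulate_alternative(n_k, row_totals, signal_ars, rr, rng):
    """Draw one column (n_1k, ..., n_Ik) for a drug with n_k reports."""
    probs, _ = alternative_probs(row_totals, signal_ars, rr)
    ars = range(len(row_totals))
    draw = Counter(rng.choices(ars, weights=probs, k=n_k))
    return [draw[i] for i in ars]


# Hypothetical row totals; AR 0 is the only signal, with rr = 3.
probs, r0 = alternative_probs([100, 300, 600], {0}, 3.0)
column = simulate_alternative(500, [100, 300, 600], {0}, 3.0, random.Random(1))
```

With these toy values the signal AR's probability rises from 0.10 under the null to 0.25, while the non-signal probabilities are scaled down by the common factor `r0`.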

We evaluate how the relative reporting rate (*rr*), the sample size (n_{.k}) and the number of signals affect the performance of the GLRT through the following four scenarios:

• Scenario 1: One signal is randomly assigned to one drug, and the remaining four drugs are free of signals. Without loss of generality, we assign the signal to the drug with a column total of 12,000.

• Scenario 2: We randomly assign 30 common signals to each drug in the drug class, with a homogeneous relative reporting rate.

• Scenario 3: We randomly assign 30 signals to each drug using homogeneous relative reporting rates (rr) across the drug class, but the signals are not necessarily common between drugs.

• Scenario 4: We follow a process similar to Scenario 3, randomly selecting 30 signals for each drug independently, but we use inhomogeneous rr. A rate of 2 × rr is assigned to those AR signals for which n_{i.} (the total number of reports for the AR) falls between 35,000 and 40,000, a rate of 3 × rr to those for which n_{i.} falls between 20,000 and 25,000, a rate of 4 × rr to those for which n_{i.} falls between 15,000 and 20,000, and a rate of 5 × rr to those for which n_{i.} falls between 6,000 and 12,000. rr is set to 1 for those ARs that are not selected as signals.

In each simulation, we generate 1,000 datasets.

The performance of the proposed methods is evaluated using power, sensitivity (ST), and false discovery rate (FDR). First, power is defined as:

$Power=\frac{Number\text{}of\text{}times\text{}reject\text{}{H}_{0}}{L},$

where L = 1,000 is the total number of simulations. *H _{0}* is rejected when at least one AR in any drug in the drug class is detected as a signal.

The sensitivity of a test is the proportion of positive results that are correctly identified. In our case, sensitivity is defined as:

$ST = \frac{1}{L} \sum_{l=1}^{L} \frac{\text{Number of reactions correctly detected in the } l^{\text{th}} \text{ simulated dataset}}{\text{Total number of true reactions in the } l^{\text{th}} \text{ simulated dataset}}$

The definition of FDR can be illustrated by a 2 × 2 table as in Table 6, where V is the number of falsely detected signals and S is the number of correctly detected signals. FDR is defined as $E(\frac{V}{V+S})$, the expected proportion of rejected null hypotheses that are erroneously rejected. It is estimated by

$FDR = \frac{1}{L} \sum_{l=1}^{L} \frac{\text{Number of reactions falsely detected in the } l^{\text{th}} \text{ simulated dataset } (V)}{\text{Total number of reactions detected in the } l^{\text{th}} \text{ simulated dataset } (V+S)}.$
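The three performance measures can be computed from the detected and true signal sets of each simulated dataset, as in the sketch below. The function name and the toy sets are hypothetical. Two conventions are assumed here: a dataset with no detections contributes 0 to the FDR average, and every simulated dataset is assumed to contain at least one true signal so that the sensitivity ratio is defined.

```python
def evaluate(detected, truth):
    """Return (power, ST, FDR) over L simulated datasets.

    detected[l] and truth[l] are sets of signal labels for dataset l.
    """
    n_datasets = len(detected)
    # Power: share of datasets in which H0 is rejected (any detection).
    power = sum(1 for d in detected if d) / n_datasets
    # Sensitivity: average share of true signals that are detected.
    st = sum(len(d & t) / len(t) for d, t in zip(detected, truth)) / n_datasets
    # FDR: average of V / (V + S), taken as 0 when nothing is detected.
    fdr = sum(len(d - t) / len(d)
              for d, t in zip(detected, truth) if d) / n_datasets
    return power, st, fdr


# Three hypothetical simulated datasets, each with one true signal (label 1).
detected = [{1, 2}, set(), {3}]
truth = [{1}, {1}, {1}]
power, st, fdr = evaluate(detected, truth)
```

In this toy example the first dataset has one correct and one false detection, the second has none, and the third has a single false detection.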

Power, ST, and FDR all take values between 0 and 1. As we shall see in the next section, the GLRT has high sensitivity and low FDR and controls the Type-I error at level α, which indicates its superiority over the conventional likelihood ratio test.

The simulation results shown in Table 7 include power, sensitivity, and false discovery rate for the different scenarios described in Section 5.1.

In Scenario 1, one signal was assigned to an AR with a relatively large or moderate marginal count (28,216 or 4,362). With fixed rr = 3, n_{i.} = 28,216, and sample size n_{.j} = 500, the power is 0.073, ST is 0.13, and FDR is 0.0565. As the sample size n_{.j} increases, the power and ST increase to 1, and FDR decreases from 0.06 to 0.03. When the marginal count of the AR is fixed at n_{i.} = 28,216, as rr increases from 1 to 7, the power increases from 0.06 to 0.75 and then to 1. The same increasing trend is observed for ST. FDR decreases from 0.05 to 0.03, a value lower than the level of significance. The effect of sample size is also evaluated when n_{i.} is fixed at 4,362. The trends remain similar for power, ST, and FDR, though the changes are relatively slower.

In Scenario 2, where 30 common signals are assigned to all four drugs in the drug class, the power and FDR are both 0.06 when rr = 1. As rr increases, the power increases to 1, the ST increases from 0.01 to 0.85, and the FDR decreases from 0.064 to 0.0009. Because multiple signals are assigned randomly, we use the actual sample size (AS) for n_{.j}. Similar trends in power, sensitivity, and FDR are also found for Scenarios 3 and 4.

Besides the effect of the relative reporting rate (rr), the effect of the number of selected true signals on the performance of the GLRT is also studied. In Scenario 2, if the number of signals is changed to 10 or 20, similar trends are observed for power, sensitivity, and FDR, as shown in Table 8. As rr increases, both the power and the sensitivity increase, and the FDR decreases. If rr is fixed, as the number of selected signals increases, the power increases and the FDR decreases.

In this paper we generalized the likelihood ratio test procedure to detect adverse events for a class of drugs and applied it to the WHO's pharmacovigilance database. The proposed methods can also be used to detect drug signals for a group of pre-specified adverse reactions by exchanging the row and column variables. One advantage of the generalized likelihood ratio test presented here is that it can be used to find multiple adverse reactions with both the Type-I error and the false discovery rate controlled, while retaining good power and sensitivity. We note that the GLRT tends to detect fewer adverse reactions than the LRT. This is to be expected: the threshold of the GLRT for the drug class is greater than or equal to the thresholds obtained for each individual drug using the LRT, so the GLRT is more conservative.

The generalized likelihood ratio test procedure provides a useful tool for identifying potential adverse reactions in pharmacovigilance databases. However, the final determination of true adverse reactions should also be based on a thorough review of all available medical records.

The author declares that he has no competing interests.

None.