A statistical test is a formal method used in data analysis to make inferences, identify relationships, and draw conclusions about a population based on sample data. Many emerging biological researchers struggle to manage their statistical findings because they often lack clear guidelines or interpretive frameworks for choosing and interpreting tests. If a researcher uses an incorrect statistical test, the results may be invalid and the conclusions unclear or misleading. An interpretive framework that guides novice biological researchers in selecting statistical tests and interpreting their statistical data is therefore essential. This paper proposes such an interpretive framework for statistical tests for emerging biological researchers. The proposed framework, IFST2BR, was validated through expert reviews and focus group discussions in terms of relevancy and understanding. The validation results demonstrated the practical usability of the proposed framework as a guide and aid for emerging biological researchers.
Interpretive framework, Statistical tests, Biostatistician, Biostatistics Test types, Expert review, Focus group discussion
Statistics is a field that involves the collection and organization of data to address research problems. Data serves as the fundamental building block for gaining knowledge. Using the right data along with appropriate statistical tests is crucial for producing accurate insights. However, selecting the correct statistical test can be challenging, particularly for emerging researchers. When a researcher chooses the wrong statistical test, it can lead to various issues during the interpretation process. Additionally, using an inappropriate test may result in incorrect conclusions [1,2].
Thanks to modern technology, we now have a variety of statistical software and applications, such as R, SPSS, SEM-PLS, and SAS, all of which simplify the process of statistical testing. However, choosing the right test can still be a challenge for biological researchers. While these applications help manage the statistical process, they do not offer guidance on which statistical test is most appropriate for a given situation [3]. Choosing the appropriate test depends on the nature of the collected data and the main purpose of the research. If the researcher selects the right test, the results will be meaningful; if not, they may be misleading [4].
The primary purpose of this paper is to propose a systematic framework that guides junior researchers in the field of biology to first understand their data and then choose the appropriate statistical test. We first present the proposed framework and all its components, which were derived from an extensive study of the research related to the topic and of reliable specialist websites. Next, we provide a brief explanation of its components. After that, we validate the framework using two scientific methods: focus group discussions and expert reviews. Finally, we present the results of the validation and draw our conclusions. Figure 1 illustrates the proposed framework.
Figure 1: The proposed framework IFST2BR with its phases and components.
As indicated in Figure 1, the proposed framework IFST2BR consists of three main sides, each with its own components. Each side and its components are explained in the following paragraphs.
To obtain a correct statistical analysis, the main objective of the analysis must first be determined. The novice biological researcher must ensure that the statistical test used is appropriate for the type of data collected and for the research design. The researcher should therefore ask the following question: What are we looking for? The answer to this question forms the plan for choosing the appropriate statistical test.
This aspect includes four concepts that are necessary for selecting the appropriate statistical test: determining the objectives of the analysis, forming research questions, identifying the independent variables, and identifying the dependent variables. Each concept is explained in the following paragraphs.
Determining the purpose of the statistical analysis is very important because the choice of the appropriate statistical test depends fundamentally on it. Accordingly, the researcher must ask the following question: What knowledge are we looking for? The answer to this question determines the basic purpose of the analysis.
Research questions are often derived from the primary objective of the study and there is a close relationship between them. Accordingly, and based on previous relevant studies, there are several different types of research questions, including diagnostic or classification, exploratory, validity and reliability, predictive, causal or experimental, relationship or correlation, comparative, and descriptive research questions. Well-formulated research questions have a positive impact on the selection of appropriate statistical tests as well as analysis methods that ensure that results are meaningful and consistent with the purpose of the study.
In the context of biostatistics, independent variables, also called predictors, play a prominent role in selecting the appropriate statistical test because they determine the type and structure of the analysis. There are three types of independent variables: categorical (nominal), ordinal, and continuous (quantitative) [5]. Table 1 lists suitable statistical tests based on the independent variables.
Table 1: Statistical test based on independent variables.
In the context of biostatistics, a dependent variable, also known as a response variable, is the crucial variable in any statistical study: it is the variable that the researcher must explain or forecast based on how it responds to changes in the independent variables [6]. There are five types of dependent variables: continuous, categorical, binary, ordinal, and count. Table 2 lists suitable statistical tests based on the dependent variables.
Table 2: Statistical test based on dependent variables.
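The idea of choosing a test from the types of the independent and dependent variables can be sketched as a simple lookup. This is a minimal illustration only: the pairings below are common textbook defaults and do not reproduce the contents of Table 1 or Table 2, and the name `suggest_test` and the type labels are hypothetical.

```python
# Illustrative lookup only: these pairings are common textbook defaults,
# not a reproduction of Table 1 or Table 2 (assumption).
CANDIDATE_TESTS = {
    ("categorical, 2 groups", "continuous"): "independent-samples t-test",
    ("categorical, 3+ groups", "continuous"): "one-way ANOVA",
    ("continuous", "continuous"): "Pearson correlation or linear regression",
    ("categorical", "categorical"): "chi-square test of independence",
    ("continuous", "binary"): "logistic regression",
    ("continuous", "count"): "Poisson regression",
}


def suggest_test(independent: str, dependent: str) -> str:
    """Return a candidate test for a pair of variable types (hypothetical helper)."""
    key = (independent.strip().lower(), dependent.strip().lower())
    return CANDIDATE_TESTS.get(key, "no default; consult a biostatistician")


print(suggest_test("categorical, 2 groups", "continuous"))
print(suggest_test("Continuous", "Binary"))
```

A lookup of this kind only narrows the candidates; the assumptions of the suggested test still need to be checked against the data.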
This aspect covers the concepts needed to identify and interpret the collected data, including all the procedures and processes for managing data [3]. Researchers should therefore understand the following essential elements in order to manage statistical data:
Measurement scales classify the variables that make up the data and determine what type of statistical analysis can be used. These scales typically fall into two key categories: quantitative (interval, ratio) and qualitative (ordinal, nominal) [7].
Nominal scale: Groups data into distinct categories that are mutually exclusive and exhaustive; the categories have no order or ranking. The researcher can calculate counts and modes. Analyses on this scale usually rely on frequencies and proportions [8].
Ordinal scale: Classifies the data into ordered categories, which means that the order of the categories is meaningful on this scale. Arithmetic operations are not meaningful on this scale; instead, the researcher should calculate the median and mode and use non-parametric tests [9].
Interval scale: Like an ordinal scale, it has ordered categories, but the differences between values on this scale are meaningful and comparable. It does not have a true zero point. Researchers can use the standard deviation, median, mean, and mode, as well as addition and subtraction [10].
Ratio scale: Follows the same principle as the interval scale, with the only difference being that it has a true zero point. It supports all the statistics that the interval scale supports, with the addition of the coefficient of variation [11].
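The relationship between measurement scale and meaningful summary statistics can be sketched in code. This is a minimal sketch using Python's standard `statistics` module; the function name `summarize` is hypothetical, and ordinal values are assumed to be coded as ranks (1, 2, 3, ...).

```python
import statistics

SCALES = ("nominal", "ordinal", "interval", "ratio")


def summarize(values, scale):
    """Return only the summaries that are meaningful on the given scale."""
    if scale not in SCALES:
        raise ValueError(f"unknown scale: {scale}")
    level = SCALES.index(scale)
    # Counts and the mode are meaningful on every scale.
    summary = {"count": len(values), "mode": statistics.mode(values)}
    if level == 1:
        # median_low returns an observed category instead of averaging two ranks.
        summary["median"] = statistics.median_low(values)
    if level >= 2:
        summary["median"] = statistics.median(values)
        summary["mean"] = statistics.fmean(values)
        summary["stdev"] = statistics.stdev(values)
    if level == 3:
        # The coefficient of variation needs a true zero, so ratio scale only.
        summary["cv"] = summary["stdev"] / summary["mean"]
    return summary


print(summarize(["A", "B", "A", "O"], "nominal"))  # blood groups
print(summarize([1, 2, 2, 3, 5], "ordinal"))       # ranked severity codes
print(summarize([2, 4, 4, 4, 5, 5, 7, 9], "ratio"))
```

The input values above are made up for illustration.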
Parametric methods: The parametric approach is widely used to interpret and manage biological data in the field of biostatistics. These approaches assume that the data follow a normal distribution and draw inferences and predictions about the population from selected samples [12]. Figure 2 and Figure 3 illustrate a normal and a non-normal distribution, respectively. Table 3 lists some of the approaches most widely used by biostatisticians.
Figure 2: Normal distribution.
Figure 3: Non-normal distribution.
Table 3: Parametric method with examples.
Non-parametric approaches: These methods do not assume that the data follow a normal distribution and do not require the distributional assumptions of parametric methods to be met [13,14].
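The choice between the two families depends on whether the data look approximately normal. As a rough illustration, the sketch below screens a sample by its moment-based skewness, which is zero for a symmetric sample. The cutoff `max_abs_skew=1.0` is an assumed rule of thumb, the function names are hypothetical, and a symmetric sample can still be non-normal, so a formal normality test and a plot of the data remain the proper checks.

```python
import statistics


def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (0 for a symmetric sample)."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5


def suggest_family(values, max_abs_skew=1.0):
    """Rough screen only; max_abs_skew is an assumed rule-of-thumb cutoff."""
    if abs(sample_skewness(values)) <= max_abs_skew:
        return "parametric"
    return "non-parametric"


print(suggest_family([1, 2, 3, 4, 5]))               # symmetric sample
print(suggest_family([1, 1, 1, 1, 2, 2, 3, 50]))     # one extreme value
```

Both samples are made up for illustration.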
There are two types of data sources used by biological researchers in statistical tests. The data sources chosen depend on the field of study, the nature of the required case study, and the objectives of the analysis. Accordingly, the data are either primary or secondary data sources.
Primary data sources: Sources from which the researcher collects original data directly. Data collection from primary sources depends on the research questions.
Secondary data sources: Data sets originally collected by other individuals or organizations for different purposes, which the researcher reuses and analyzes for their own study.
Data collection includes four commonly used approaches in the field of biological statistics: questionnaires, experiments, observations, and administrative data.
Questionnaires: An approach that collects data through questionnaires completed by the selected sample or by users related to the field of study, usually about certain behaviours, points of view, or characteristics.
Experiments: An approach that collects data through experiments under the control of the researcher; it is usually useful for studying and analyzing causal relationships.
Observations: An approach that collects data by observing the selected sample for a certain behaviour or phenomenon and deducing the data and the relationships that link them.
Administrative data: An approach that collects data already held by institutions related to the field of research, such as patient records in hospitals and historical patient data such as laboratory analysis results.
Raw data: The basic data as collected, not yet analyzed or processed, which still needs to be organized.
Processed data: Data that has been cleansed, classified, and organized to make it suitable for the analysis process.
Grouped data: Data that has been classified into groups, for example by age, gender, yearly income, or blood group.
Statistical tests in biology, as in other disciplines, are tools that researchers use to analyze and interpret data, make appropriate decisions, and indicate whether the inferences obtained are statistically significant or could have occurred by chance.
Descriptive statistics are concerned with summarizing and interpreting sample data descriptively, for example by calculating the mean, median, mode, variance, range, and standard deviation, and they also support the interpretation of most charts and graphs. They comprise three key types: a) central tendency, b) spread of the data, and c) data dispersion [15,16].
Parametric tests are typically used to assess differences or find relationships in population parameters. They are very powerful tools because they draw their conclusions from the information in the underlying sample. Parametric tests consist of four types of statistical tests. Table 4 lists the parametric test types with their purposes and assumptions [17-19].
Table 4: Parametric tests overview.
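To make the idea of a parametric test concrete, the sketch below computes the statistic of Welch's two-sample t-test, a variant of the t-test that does not assume equal variances. It is chosen here only as an illustration and is not taken from Table 4. The function name `welch_t` and the sample values are hypothetical, and the p-value is omitted because it requires the t distribution, which the standard library does not provide.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error contributed by each group (sample variance / n).
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    mean_diff = statistics.fmean(sample_a) - statistics.fmean(sample_b)
    t = mean_diff / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


treated = [5, 6, 7, 8, 9]   # made-up measurements
control = [1, 2, 3, 4, 5]   # made-up measurements
t, df = welch_t(treated, control)
print(f"t = {t:.3f}, df = {df:.1f}")
```

The statistic is then compared against the t distribution with `df` degrees of freedom, which statistical software does automatically.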
Non-parametric tests are used when the sample data do not follow a specific distribution. They are often used in biostatistics to test medical data that do not follow a normal distribution. Several types of non-parametric tests are common in biostatistics [20,21]. Table 5 lists these types with the purpose of each.
Table 5: Non-parametric overview.
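As an illustration of how a non-parametric test works on order rather than on distributional assumptions, the sketch below computes the Mann-Whitney U statistic for two independent samples by counting pairwise wins. The test is chosen only as a familiar example and is not taken from Table 5; the function name `mann_whitney_u` and the data are hypothetical, and the p-value step is left to statistical software.

```python
def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b); ties between the two samples count as half a win."""
    u_a = 0.0
    for a in sample_a:
        for b in sample_b:
            if a > b:
                u_a += 1.0
            elif a == b:
                u_a += 0.5
    # Every pair is counted once, so the two statistics sum to n_a * n_b.
    u_b = len(sample_a) * len(sample_b) - u_a
    return u_a, u_b


treated = [5, 6, 7, 8, 9]   # made-up measurements
control = [1, 2, 3, 4, 5]   # made-up measurements
print(mann_whitney_u(treated, control))
```

Because only the ordering of the values matters, replacing the largest value with an extreme outlier leaves the statistic unchanged.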
In this study, the proposed framework was evaluated for its understandability and relevance using a well-known and accepted academic evaluation method: expert review conducted through focus groups [22].
The experts engaged in this evaluation process were lecturers and instructors in biostatistics. The criteria for the experts are as follows:
1. Have a PhD in Biostatistics, Data Science (DS), Bioinformatics or related areas.
2. Have fifteen years or more of teaching background in biostatistics or DS or related areas.
Twenty experts participated in this review session (focus group), and Table 6 details the demographics of the experts. Twenty experts are more than adequate for this study, as endorsed by Fattah, et al. (2022) and Freeman and Nelson (2004) [23,24].
As illustrated in table 6, the experts' backgrounds represent various fields of expertise: 10 participants in biostatistics, 6 in data analysis, and 6 in bioinformatics.
Table 6: Demographic details of experts review (Focus Group).
The objective of the expert review was to conduct a focus group review of the proposed IFST2BR in terms of relevance and understanding, seeking the expert view on each IFST2BR item.
The data collected from the focus group discussion (expert review) are listed in Table 7. The data were documented as the frequency of the experts' responses to the questions asked in the instrument.
Table 7: Overall findings.
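Recording responses as frequencies can be sketched as a simple tally. The sketch assumes a five-point agreement scale and uses made-up responses; it does not reproduce the data in Table 7, and the function name `tabulate` is hypothetical.

```python
from collections import Counter

# Assumed five-point agreement scale (not confirmed by the instrument).
LIKERT = ["strongly disagree", "disagree", "neutral", "agree", "strongly agree"]


def tabulate(responses):
    """Return (label, count, percent) rows in scale order."""
    counts = Counter(responses)
    total = len(responses)
    return [(label, counts[label], 100 * counts[label] / total) for label in LIKERT]


# Made-up responses from 20 hypothetical reviewers to one question.
responses = ["agree"] * 12 + ["strongly agree"] * 6 + ["neutral"] * 2
for label, count, percent in tabulate(responses):
    print(f"{label:<18} {count:>3} {percent:5.1f}%")
```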
As the results in Table 7 and Figures 4, 5, and 6 show, the majority of experts agreed or strongly agreed that the components of the proposed framework are feasible in practical terms, which indicates that the framework can help emerging biostatistics researchers complete their tasks.
Figure 4: Relevancy of IFST2BR.
Figure 5: Understanding of IFST2BR.
Figure 6: General questions.
The results show that the research question has been answered and that the proposed interpretive framework is applicable and workable in the field of biostatistics and useful for emerging researchers in that field. In addition, the systematic approach used to propose and evaluate the interpretive framework can guide researchers who propose and evaluate similar frameworks or models.