3.1 What is Quantitative Research?
Quantitative research is a research method that uses numerical data and statistical analysis to study phenomena. 1 Quantitative research plays an important role in scientific inquiry by providing a rigorous, objective, systematic process using numerical data to test relationships and examine cause-and-effect associations between variables. 1, 2 The goal is to make generalisations about a population (extrapolate findings from the sample to the general population). 2 The data and variables are predetermined and measured as consistently and accurately as possible, and statistical analysis is used to evaluate the outcomes. 2 Quantitative research is based on the scientific method, wherein deductive reductionist reasoning is used to formulate hypotheses about a particular phenomenon.
An Introduction to Research Methods for Undergraduate Health Profession Students Copyright © 2023 by Faith Alele and Bunmi Malau-Aduli is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License , except where otherwise noted.
Introduction to Quantitative Research Design
Quantitative research: target population and sample, script for purpose statement in quantitative methodology.
The first step in developing research is identifying the appropriate quantitative design as well as target population and sample.
Please access the NU library database "SAGE Research Methods" for help in identifying the appropriate design for your quantitative dissertation.
Quantitative studies are experimental, quasi-experimental, or non-experimental.
An experimental design is the traditional study you may be familiar with: participants are randomly assigned to experimental and control groups, and the study investigates the cause-and-effect relationship between the dependent variable(s) and independent variable(s). The independent variable is manipulated by the researcher, who also designs the intervention. Some examples of designs are independent measures/between-groups, repeated measures/within-groups, and matched pairs.
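As a minimal sketch of how participants might be allocated at random to experimental and control groups, the Python snippet below shuffles a list of participant IDs and splits it in half. The participant IDs, the pool of 20 people, and the function name `randomly_assign` are hypothetical and serve only as an illustration.

```python
import random


def randomly_assign(participants, seed=None):
    """Shuffle the participants and split them into two groups of (near) equal size."""
    rng = random.Random(seed)      # a seed makes the allocation reproducible
    shuffled = list(participants)  # copy so the original list is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"experimental": shuffled[:half], "control": shuffled[half:]}


# Hypothetical participant pool: P01 ... P20
participant_ids = [f"P{i:02d}" for i in range(1, 21)]
groups = randomly_assign(participant_ids, seed=42)

print(len(groups["experimental"]), len(groups["control"]))
```

Because chance alone decides group membership, pre-existing differences between participants are expected to balance out across the two groups on average.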
A quasi-experimental design is used when participants cannot be randomly assigned to groups, but the study still focuses on the cause-and-effect relationship between the dependent variable(s) and independent variable(s). The researcher does not have control over the intervention, i.e., the groups already exist, and the independent variable (intervention/treatment) is not manipulated. The intervention/treatment has usually occurred prior to the current study. Control groups can be used but are not required, as they are in an experimental study. Some examples of designs are causal-comparative, regression analysis, and pre-test/post-test.
NOTE: Quasi-experimental is often used interchangeably with ex-post facto design, which means “after the fact.”
A non-experimental design is used when participants are not randomly assigned and a cause-and-effect relationship is neither sought nor possible to establish. These studies can often find a relationship between variables, but not which variable caused the other to change. Therefore, these studies have neither dependent nor independent variables. Some examples of designs are correlational, cross-sectional, and observational.
The primary non-experimental quantitative design is correlational. Keep in mind, however, that a correlational design shows whether a relationship exists between two variables, along with its direction and strength (through the correlation coefficient), but NOT the cause of that relationship.
NOTE: Variables in correlational studies are NOT dependent and independent; they are just variables.
If you wish to conduct a more rigorous type of quantitative study that still examines relationships, you can choose regression analysis, which shows how well one or more variables predict another. In regression analysis, the “independent variable(s)” should be referred to as “predictor variable(s)” and the “dependent variable(s)” as “outcome variable(s).”
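To make the distinction between correlation and regression concrete, here is a small Python sketch that computes a Pearson correlation coefficient and a simple linear regression line by hand. The data (study hours and exam scores for five students) and the function names are invented for illustration; a real analysis would normally use a statistics package and report significance tests as well.

```python
from math import sqrt


def _sums_of_squares(x, y):
    """Return (sxy, sxx, syy, mean_x, mean_y) for two equal-length samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy, sxx, syy, mean_x, mean_y


def pearson_r(x, y):
    """Correlation: direction and strength of a linear relationship (-1 to +1)."""
    sxy, sxx, syy, _, _ = _sums_of_squares(x, y)
    return sxy / sqrt(sxx * syy)


def simple_regression(predictor, outcome):
    """Regression: slope and intercept for predicting the outcome from the predictor."""
    sxy, sxx, _, mean_x, mean_y = _sums_of_squares(predictor, outcome)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data, invented for this example
study_hours = [2, 4, 6, 8, 10]      # predictor variable
exam_scores = [55, 60, 70, 75, 90]  # outcome variable

r = pearson_r(study_hours, exam_scores)
slope, intercept = simple_regression(study_hours, exam_scores)

print(f"r = {r:.3f}")                                        # r = 0.981
print(f"predicted score = {intercept} + {slope} * hours")    # 44.5 + 4.25 * hours
```

Neither number establishes causation: a strong correlation and a well-fitting regression line describe how the variables move together, not why.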
Also, a causal-comparative design (which is a quasi-experimental design) can help determine differences between groups due to an independent variable’s effect on them.
The Target Population.
The target population is the population that the sample will be drawn from. It consists of all individuals who possess the desired characteristics (inclusion criteria) to participate in the dissertation study.
The sampling design represents the plan for obtaining a sample from the target population. A sampling frame can be employed to identify participants and can provide access to the population for recruitment of sample.
The Sampling Frame.
A sampling frame is the list or source that identifies all individuals in the dissertation's target population; it provides access to the population for recruiting the sample. Review Trochim's Knowledge Base at http://www.socialresearchmethods.net/kb/ for more information.
Use the script below, replacing the text in parentheses with the appropriate information, to state the target population.
"The target population for the proposed study is comprised of all (individuals with relevant characteristics), within (describe the sampling frame)."
The Study Sample.
The sample is a subset of the target population. Participants make up the sample and should be described by the characteristics relevant to the dissertation. The sampling method is the technique used to obtain the sample. Review Trochim's Knowledge Base at http://www.socialresearchmethods.net/kb/ for more information.
A power analysis is often conducted to determine the minimum sample size needed for a quantitative study. Calculators such as G*Power can help with this analysis: https://www.psychologie.hhu.de/arbeitsgruppen/allgemeine-psychologie-und-arbeitspsychologie/gpower.html
NOTE: It is important to understand the target population to determine the correct minimum sample size.
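The idea behind a minimum-sample-size calculation can be sketched in a few lines of Python. The example below assumes a two-group comparison of means, a two-tailed test, and the common normal-approximation formula n = 2 × ((z for alpha + z for power) / d)², where d is the expected standardized effect size. The function name and the default values (alpha = 0.05, power = 0.80) are illustrative assumptions, not settings prescribed by this guide; exact t-based calculations, such as those in G*Power, typically return a slightly larger number.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group to compare two means.

    Uses the normal approximation for a two-tailed test; treat the
    result as a rough lower bound rather than a final answer.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # value matching the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# A medium effect (d = 0.5) at alpha = 0.05 and power = 0.80
print(n_per_group(0.5))  # 63 per group under this approximation
```

Smaller expected effects require much larger samples: halving the effect size roughly quadruples the number of participants needed.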
Use the script below to state the sample.
"A (sampling method) was used to determine a sample of (sample number) participants to be recruited for this study. The following inclusion criteria (list relevant characteristics needed to participate) must be met."
Creswell (2003) advised the following script for purpose statements in quantitative methodology:
“The purpose of this _____________________ (experiment? survey?) project is (was? will be?) to test the theory of _________________that _________________ (compares? relates?) the ___________(independent variable) to _________________________(dependent variable), controlling for _______________________ (control variables) for ___________________ (participants) at _________________________ (site). The independent variable(s) _____________________ will be generally defined as _______________________ (provide a general definition). The dependent variable(s) will be generally defined as _____________________ (provide a general definition), and the control and intervening variables(s), _________________ (identify the control and intervening variables) will be statistically controlled in this project” (pg. 97).
Creswell, J. (2003). Research design: Qualitative, quantitative and mixed methods approaches (2nd ed.). SAGE Publications.
- Last Updated: Nov 2, 2023 10:17 AM
- URL: https://resources.nu.edu/c.php?g=1007179
- J Korean Med Sci
- v.37(16); 2022 Apr 25
A Practical Guide to Writing Quantitative and Qualitative Research Questions and Hypotheses in Scholarly Articles
1 Department of General Education, Graduate School of Nursing Science, St. Luke’s International University, Tokyo, Japan.
Glafera Janet Matanguihan
2 Department of Biological Sciences, Messiah University, Mechanicsburg, PA, USA.
The development of research questions and the subsequent hypotheses are prerequisites to defining the main research purpose and specific objectives of a study. Consequently, these objectives determine the study design and research outcome. The development of research questions is a process based on knowledge of current trends, cutting-edge studies, and technological advances in the research field. Excellent research questions are focused and require a comprehensive literature search and in-depth understanding of the problem being investigated. Initially, research questions may be written as descriptive questions which could be developed into inferential questions. These questions must be specific and concise to provide a clear foundation for developing hypotheses. Hypotheses are more formal predictions about the research outcomes. These specify the possible results that may or may not be expected regarding the relationship between groups. Thus, research questions and hypotheses clarify the main purpose and specific objectives of the study, which in turn dictate the design of the study, its direction, and outcome. Studies developed from good research questions and hypotheses will have trustworthy outcomes with wide-ranging social and health implications.
Scientific research is usually initiated by posing evidence-based research questions which are then explicitly restated as hypotheses. 1 , 2 The hypotheses provide directions to guide the study, solutions, explanations, and expected results. 3 , 4 Both research questions and hypotheses are essentially formulated based on conventional theories and real-world processes, which allow the inception of novel studies and the ethical testing of ideas. 5 , 6
It is crucial to have knowledge of both quantitative and qualitative research 2 as both types of research involve writing research questions and hypotheses. 7 However, these crucial elements of research are sometimes overlooked or, if not overlooked, framed without the forethought and meticulous attention they need. Planning and careful consideration are needed when developing quantitative or qualitative research, particularly when conceptualizing research questions and hypotheses. 4
There is a continuing need to support researchers in the creation of innovative research questions and hypotheses, as well as for journal articles that carefully review these elements. 1 When research questions and hypotheses are not carefully thought out, unethical studies and poor outcomes usually ensue. Carefully formulated research questions and hypotheses define well-founded objectives, which in turn determine the appropriate design, course, and outcome of the study. This article then aims to discuss in detail the various aspects of crafting research questions and hypotheses, with the goal of guiding researchers as they develop their own. Examples from the authors and peer-reviewed scientific articles in the healthcare field are provided to illustrate key points.
DEFINITIONS AND RELATIONSHIP OF RESEARCH QUESTIONS AND HYPOTHESES
A research question is what a study aims to answer after data analysis and interpretation. The answer is written at length in the discussion section of the paper. Thus, the research question gives a preview of the different parts and variables of the study meant to address the problem posed in the research question. 1 An excellent research question clarifies the research writing while facilitating understanding of the research topic, objective, scope, and limitations of the study. 5
On the other hand, a research hypothesis is an educated statement of an expected outcome. This statement is based on background research and current knowledge. 8 , 9 The research hypothesis makes a specific prediction about a new phenomenon 10 or a formal statement on the expected relationship between an independent variable and a dependent variable. 3 , 11 It provides a tentative answer to the research question to be tested or explored. 4
Hypotheses employ reasoning to predict a theory-based outcome. 10 These can also be developed from theories by focusing on components of theories that have not yet been observed. 10 The validity of hypotheses is often based on the testability of the prediction made in a reproducible experiment. 8
Conversely, hypotheses can also be rephrased as research questions. Several hypotheses based on existing theories and knowledge may be needed to answer a research question. Developing ethical research questions and hypotheses creates a research design that has logical relationships among variables. These relationships serve as a solid foundation for the conduct of the study. 4 , 11 Haphazardly constructed research questions can result in poorly formulated hypotheses and improper study designs, leading to unreliable results. Thus, the formulations of relevant research questions and verifiable hypotheses are crucial when beginning research. 12
CHARACTERISTICS OF GOOD RESEARCH QUESTIONS AND HYPOTHESES
Excellent research questions are specific and focused. These integrate collective data and observations to confirm or refute the subsequent hypotheses. Well-constructed hypotheses are based on previous reports and verify the research context. These are realistic, in-depth, sufficiently complex, and reproducible. More importantly, these hypotheses can be addressed and tested. 13
There are several characteristics of well-developed hypotheses. Good hypotheses are 1) empirically testable 7 , 10 , 11 , 13 ; 2) backed by preliminary evidence 9 ; 3) testable by ethical research 7 , 9 ; 4) based on original ideas 9 ; 5) grounded in evidence-based logical reasoning 10 ; and 6) predictable. 11 Good hypotheses can imply ethical and positive implications, indicating the presence of a relationship or effect relevant to the research theme. 7 , 11 These are initially developed from a general theory and branch into specific hypotheses by deductive reasoning. In the absence of a theory on which to base the hypotheses, inductive reasoning based on specific observations or findings forms more general hypotheses. 10
TYPES OF RESEARCH QUESTIONS AND HYPOTHESES
Research questions and hypotheses are developed according to the type of research, which can be broadly classified into quantitative and qualitative research. We provide a summary of the types of research questions and hypotheses under quantitative and qualitative research categories in Table 1 .
Research questions in quantitative research
In quantitative research, research questions inquire about the relationships among variables being investigated and are usually framed at the start of the study. These are precise and typically linked to the subject population, dependent and independent variables, and research design. 1 Research questions may also attempt to describe the behavior of a population in relation to one or more variables, or describe the characteristics of variables to be measured ( descriptive research questions ). 1 , 5 , 14 These questions may also aim to discover differences between groups within the context of an outcome variable ( comparative research questions ), 1 , 5 , 14 or elucidate trends and interactions among variables ( relationship research questions ). 1 , 5 We provide examples of descriptive, comparative, and relationship research questions in quantitative research in Table 2 .
Hypotheses in quantitative research
In quantitative research, hypotheses predict the expected relationships among variables. 15 Relationships among variables that can be predicted include those 1) between a single dependent variable and a single independent variable ( simple hypothesis ) or 2) between two or more independent and dependent variables ( complex hypothesis ). 4 , 11 Hypotheses may also specify the expected direction to be followed and imply an intellectual commitment to a particular outcome ( directional hypothesis ). 4 On the other hand, hypotheses may not predict the exact direction and are used in the absence of a theory, or when findings contradict previous studies ( non-directional hypothesis ). 4 In addition, hypotheses can 1) define interdependency between variables ( associative hypothesis ), 4 2) propose an effect on the dependent variable from manipulation of the independent variable ( causal hypothesis ), 4 3) state that no relationship exists between two variables ( null hypothesis ), 4 , 11 , 15 4) replace the working hypothesis if rejected ( alternative hypothesis ), 15 5) explain the relationship of phenomena to possibly generate a theory ( working hypothesis ), 11 6) involve quantifiable variables that can be tested statistically ( statistical hypothesis ), 11 or 7) express a relationship whose interlinks can be verified logically ( logical hypothesis ). 11 We provide examples of simple, complex, directional, non-directional, associative, causal, null, alternative, working, statistical, and logical hypotheses in quantitative research, as well as the definition of quantitative hypothesis-testing research in Table 3 .
Research questions in qualitative research
Unlike research questions in quantitative research, research questions in qualitative research are usually continuously reviewed and reformulated. The central question and associated subquestions are stated more than the hypotheses. 15 The central question broadly explores a complex set of factors surrounding the central phenomenon, aiming to present the varied perspectives of participants. 15
There are varied goals for which qualitative research questions are developed. These questions can function in several ways, such as to 1) identify and describe existing conditions ( contextual research questions ); 2) describe a phenomenon ( descriptive research questions ); 3) assess the effectiveness of existing methods, protocols, theories, or procedures ( evaluation research questions ); 4) examine a phenomenon or analyze the reasons or relationships between subjects or phenomena ( explanatory research questions ); or 5) focus on unknown aspects of a particular topic ( exploratory research questions ). 5 In addition, some qualitative research questions provide new ideas for the development of theories and actions ( generative research questions ) or advance specific ideologies of a position ( ideological research questions ). 1 Other qualitative research questions may build on a body of existing literature and become working guidelines ( ethnographic research questions ). Research questions may also be broadly stated without specific reference to the existing literature or a typology of questions ( phenomenological research questions ), may be directed towards generating a theory of some process ( grounded theory questions ), or may address a description of the case and the emerging themes ( qualitative case study questions ). 15 We provide examples of contextual, descriptive, evaluation, explanatory, exploratory, generative, ideological, ethnographic, phenomenological, grounded theory, and qualitative case study research questions in qualitative research in Table 4 , and the definition of qualitative hypothesis-generating research in Table 5 .
Qualitative studies usually pose at least one central research question and several subquestions starting with How or What . These research questions use exploratory verbs such as explore or describe . These also focus on one central phenomenon of interest, and may mention the participants and research site. 15
Hypotheses in qualitative research
Hypotheses in qualitative research are stated in the form of a clear statement concerning the problem to be investigated. Unlike in quantitative research where hypotheses are usually developed to be tested, qualitative research can lead to both hypothesis-testing and hypothesis-generating outcomes. 2 When studies require both quantitative and qualitative research questions, this suggests an integrative process between both research methods wherein a single mixed-methods research question can be developed. 1
FRAMEWORKS FOR DEVELOPING RESEARCH QUESTIONS AND HYPOTHESES
Research questions followed by hypotheses should be developed before the start of the study. 1 , 12 , 14 It is crucial to develop feasible research questions on a topic that is interesting to both the researcher and the scientific community. This can be achieved by a meticulous review of previous and current studies to establish a novel topic. Specific areas are subsequently focused on to generate ethical research questions. The relevance of the research questions is evaluated in terms of clarity of the resulting data, specificity of the methodology, objectivity of the outcome, depth of the research, and impact of the study. 1 , 5 These aspects constitute the FINER criteria (i.e., Feasible, Interesting, Novel, Ethical, and Relevant). 1 Clarity and effectiveness are achieved if research questions meet the FINER criteria. In addition to the FINER criteria, Ratan et al. described focus, complexity, novelty, feasibility, and measurability for evaluating the effectiveness of research questions. 14
The PICOT and PEO frameworks are also used when developing research questions. 1 The following elements are addressed in these frameworks, PICOT: P-population/patients/problem, I-intervention or indicator being studied, C-comparison group, O-outcome of interest, and T-timeframe of the study; PEO: P-population being studied, E-exposure to preexisting conditions, and O-outcome of interest. 1 Research questions are also considered good if these meet the “FINERMAPS” framework: Feasible, Interesting, Novel, Ethical, Relevant, Manageable, Appropriate, Potential value/publishable, and Systematic. 14
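One way to see how the PICOT elements combine into a single research question is to treat them as fields of a small record. The Python sketch below does this with a dataclass; the class name, the sentence template, and the example values (a hypothetical exercise study in older adults) are illustrative assumptions and are not taken from the frameworks' original sources.

```python
from dataclasses import dataclass


@dataclass
class PICOT:
    """The five PICOT elements of a research question."""
    population: str    # P: population/patients/problem
    intervention: str  # I: intervention or indicator being studied
    comparison: str    # C: comparison group
    outcome: str       # O: outcome of interest
    timeframe: str     # T: timeframe of the study

    def question(self) -> str:
        """Assemble the elements into one draft research question."""
        return (
            f"In {self.population}, does {self.intervention}, "
            f"compared with {self.comparison}, affect {self.outcome} "
            f"over {self.timeframe}?"
        )


# Hypothetical example values, invented for illustration
draft = PICOT(
    population="adults aged 65 and older",
    intervention="a supervised walking program",
    comparison="usual care",
    outcome="the number of falls",
    timeframe="12 months",
)
print(draft.question())
```

Writing the elements out separately makes a missing piece, such as an unstated comparison group or timeframe, easy to spot before the question is finalized.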
As we indicated earlier, research questions and hypotheses that are not carefully formulated result in unethical studies or poor outcomes. To illustrate this, we provide some examples of ambiguous research questions and hypotheses that result in unclear and weak research objectives in quantitative research ( Table 6 ) 16 and qualitative research ( Table 7 ) 17 , and how to transform these ambiguous research question(s) and hypothesis(es) into clear and good statements.
Table 6 notes: a These statements were composed for comparison and illustrative purposes only. b These statements are direct quotes from Higashihara and Horiuchi. 16
Table 7 notes: a This statement is a direct quote from Shimoda et al. 17 The other statements were composed for comparison and illustrative purposes only.
CONSTRUCTING RESEARCH QUESTIONS AND HYPOTHESES
To construct effective research questions and hypotheses, it is very important to 1) clarify the background and 2) identify the research problem at the outset of the research, within a specific timeframe. 9 Then, 3) review or conduct preliminary research to collect all available knowledge about the possible research questions by studying theories and previous studies. 18 Afterwards, 4) construct research questions to investigate the research problem. Identify variables to be assessed from the research questions 4 and make operational definitions of constructs from the research problem and questions. Thereafter, 5) construct specific deductive or inductive predictions in the form of hypotheses. 4 Finally, 6) state the study aims. This general flow for constructing effective research questions and hypotheses prior to conducting research is shown in Fig. 1 .
Research questions are used more frequently in qualitative research than objectives or hypotheses. 3 These questions seek to discover, understand, explore or describe experiences by asking “What” or “How.” The questions are open-ended to elicit a description rather than to relate variables or compare groups. The questions are continually reviewed, reformulated, and changed during the qualitative study. 3 In quantitative research, research questions are used more frequently in survey projects, whereas hypotheses are used more frequently in experiments, to compare variables and their relationships.
Hypotheses are constructed based on the variables identified and as an if-then statement, following the template, ‘If a specific action is taken, then a certain outcome is expected.’ At this stage, some ideas regarding expectations from the research to be conducted must be drawn. 18 Then, the variables to be manipulated (independent) and influenced (dependent) are defined. 4 Thereafter, the hypothesis is stated and refined, and reproducible data tailored to the hypothesis are identified, collected, and analyzed. 4 The hypotheses must be testable and specific, 18 and should describe the variables and their relationships, the specific group being studied, and the predicted research outcome. 18 Hypothesis construction involves deducing a testable proposition from theory and separating the independent and dependent variables so that each is measured separately. 3 Therefore, good hypotheses must be based on good research questions constructed at the start of a study or trial. 12
In summary, research questions are constructed after establishing the background of the study. Hypotheses are then developed based on the research questions. Thus, it is crucial to have excellent research questions to generate superior hypotheses. In turn, these would determine the research objectives and the design of the study, and ultimately, the outcome of the research. 12 Algorithms for building research questions and hypotheses are shown in Fig. 2 for quantitative research and in Fig. 3 for qualitative research.
EXAMPLES OF RESEARCH QUESTIONS FROM PUBLISHED ARTICLES
- EXAMPLE 1. Descriptive research question (quantitative research)
- - Presents research variables to be assessed (distinct phenotypes and subphenotypes)
- “BACKGROUND: Since COVID-19 was identified, its clinical and biological heterogeneity has been recognized. Identifying COVID-19 phenotypes might help guide basic, clinical, and translational research efforts.
- RESEARCH QUESTION: Does the clinical spectrum of patients with COVID-19 contain distinct phenotypes and subphenotypes? ” 19
- EXAMPLE 2. Relationship research question (quantitative research)
- - Shows interactions between dependent variable (static postural control) and independent variable (peripheral visual field loss)
- “Background: Integration of visual, vestibular, and proprioceptive sensations contributes to postural control. People with peripheral visual field loss have serious postural instability. However, the directional specificity of postural stability and sensory reweighting caused by gradual peripheral visual field loss remain unclear.
- Research question: What are the effects of peripheral visual field loss on static postural control ?” 20
- EXAMPLE 3. Comparative research question (quantitative research)
- - Clarifies the difference among groups with an outcome variable (patients enrolled in COMPERA with moderate PH or severe PH in COPD) and another group without the outcome variable (patients with idiopathic pulmonary arterial hypertension (IPAH))
- “BACKGROUND: Pulmonary hypertension (PH) in COPD is a poorly investigated clinical condition.
- RESEARCH QUESTION: Which factors determine the outcome of PH in COPD?
- STUDY DESIGN AND METHODS: We analyzed the characteristics and outcome of patients enrolled in the Comparative, Prospective Registry of Newly Initiated Therapies for Pulmonary Hypertension (COMPERA) with moderate or severe PH in COPD as defined during the 6th PH World Symposium who received medical therapy for PH and compared them with patients with idiopathic pulmonary arterial hypertension (IPAH) .” 21
- EXAMPLE 4. Exploratory research question (qualitative research)
- - Explores areas that have not been fully investigated (perspectives of families and children who receive care in clinic-based child obesity treatment) to have a deeper understanding of the research problem
- “Problem: Interventions for children with obesity lead to only modest improvements in BMI and long-term outcomes, and data are limited on the perspectives of families of children with obesity in clinic-based treatment. This scoping review seeks to answer the question: What is known about the perspectives of families and children who receive care in clinic-based child obesity treatment? This review aims to explore the scope of perspectives reported by families of children with obesity who have received individualized outpatient clinic-based obesity treatment.” 22
- EXAMPLE 5. Relationship research question (quantitative research)
- - Defines interactions between dependent variable (use of ankle strategies) and independent variable (changes in muscle tone)
- “Background: To maintain an upright standing posture against external disturbances, the human body mainly employs two types of postural control strategies: “ankle strategy” and “hip strategy.” While it has been reported that the magnitude of the disturbance alters the use of postural control strategies, it has not been elucidated how the level of muscle tone, one of the crucial parameters of bodily function, determines the use of each strategy. We have previously confirmed using forward dynamics simulations of human musculoskeletal models that an increased muscle tone promotes the use of ankle strategies. The objective of the present study was to experimentally evaluate a hypothesis: an increased muscle tone promotes the use of ankle strategies. Research question: Do changes in the muscle tone affect the use of ankle strategies ?” 23
EXAMPLES OF HYPOTHESES IN PUBLISHED ARTICLES
- EXAMPLE 1. Working hypothesis (quantitative research)
- - A hypothesis that is initially accepted for further research to produce a feasible theory
- “As fever may have benefit in shortening the duration of viral illness, it is plausible to hypothesize that the antipyretic efficacy of ibuprofen may be hindering the benefits of a fever response when taken during the early stages of COVID-19 illness .” 24
- “In conclusion, it is plausible to hypothesize that the antipyretic efficacy of ibuprofen may be hindering the benefits of a fever response . The difference in perceived safety of these agents in COVID-19 illness could be related to the more potent efficacy to reduce fever with ibuprofen compared to acetaminophen. Compelling data on the benefit of fever warrant further research and review to determine when to treat or withhold ibuprofen for early stage fever for COVID-19 and other related viral illnesses .” 24
- EXAMPLE 2. Exploratory hypothesis (qualitative research)
- - Explores particular areas deeper to clarify subjective experience and develop a formal hypothesis potentially testable in a future quantitative approach
- “We hypothesized that when thinking about a past experience of help-seeking, a self distancing prompt would cause increased help-seeking intentions and more favorable help-seeking outcome expectations .” 25
- “Although a priori hypotheses were not supported, further research is warranted as results indicate the potential for using self-distancing approaches to increasing help-seeking among some people with depressive symptomatology.” 25
- EXAMPLE 3. Hypothesis-generating research to establish a framework for hypothesis testing (qualitative research)
- “We hypothesize that compassionate care is beneficial for patients (better outcomes), healthcare systems and payers (lower costs), and healthcare providers (lower burnout). ” 26
- “Compassionomics is the branch of knowledge and scientific study of the effects of compassionate healthcare. Our main hypotheses are that compassionate healthcare is beneficial for (1) patients, by improving clinical outcomes, (2) healthcare systems and payers, by supporting financial sustainability, and (3) HCPs, by lowering burnout and promoting resilience and well-being. The purpose of this paper is to establish a scientific framework for testing the hypotheses above. If these hypotheses are confirmed through rigorous research, compassionomics will belong in the science of evidence-based medicine, with major implications for all healthcare domains.” 26
- EXAMPLE 4. Statistical hypothesis (quantitative research)
- - An assumption is made about the relationship among several population characteristics (gender differences in sociodemographic and clinical characteristics of adults with ADHD). Validity is tested by statistical experiment or analysis (chi-square test, Student's t-test, and logistic regression analysis)
- “Our research investigated gender differences in sociodemographic and clinical characteristics of adults with ADHD in a Japanese clinical sample. Due to unique Japanese cultural ideals and expectations of women's behavior that are in opposition to ADHD symptoms, we hypothesized that women with ADHD experience more difficulties and present more dysfunctions than men . We tested the following hypotheses: first, women with ADHD have more comorbidities than men with ADHD; second, women with ADHD experience more social hardships than men, such as having less full-time employment and being more likely to be divorced.” 27
- “Statistical Analysis
- ( text omitted ) Between-gender comparisons were made using the chi-squared test for categorical variables and Students t-test for continuous variables…( text omitted ). A logistic regression analysis was performed for employment status, marital status, and comorbidity to evaluate the independent effects of gender on these dependent variables.” 27
EXAMPLES OF HYPOTHESIS AS WRITTEN IN PUBLISHED ARTICLES IN RELATION TO OTHER PARTS
- EXAMPLE 1. Background, hypotheses, and aims are provided
- “Pregnant women need skilled care during pregnancy and childbirth, but that skilled care is often delayed in some countries …( text omitted ). The focused antenatal care (FANC) model of WHO recommends that nurses provide information or counseling to all pregnant women …( text omitted ). Job aids are visual support materials that provide the right kind of information using graphics and words in a simple and yet effective manner. When nurses are not highly trained or have many work details to attend to, these job aids can serve as a content reminder for the nurses and can be used for educating their patients (Jennings, Yebadokpo, Affo, & Agbogbe, 2010) ( text omitted ). Importantly, additional evidence is needed to confirm how job aids can further improve the quality of ANC counseling by health workers in maternal care …( text omitted )” 28
- “ This has led us to hypothesize that the quality of ANC counseling would be better if supported by job aids. Consequently, a better quality of ANC counseling is expected to produce higher levels of awareness concerning the danger signs of pregnancy and a more favorable impression of the caring behavior of nurses .” 28
- “This study aimed to examine the differences in the responses of pregnant women to a job aid-supported intervention during ANC visit in terms of 1) their understanding of the danger signs of pregnancy and 2) their impression of the caring behaviors of nurses to pregnant women in rural Tanzania.” 28
- EXAMPLE 2. Background, hypotheses, and aims are provided
- “We conducted a two-arm randomized controlled trial (RCT) to evaluate and compare changes in salivary cortisol and oxytocin levels of first-time pregnant women between experimental and control groups. The women in the experimental group touched and held an infant for 30 min (experimental intervention protocol), whereas those in the control group watched a DVD movie of an infant (control intervention protocol). The primary outcome was salivary cortisol level and the secondary outcome was salivary oxytocin level.” 29
- “ We hypothesize that at 30 min after touching and holding an infant, the salivary cortisol level will significantly decrease and the salivary oxytocin level will increase in the experimental group compared with the control group .” 29
- EXAMPLE 3. Background, aim, and hypothesis are provided
- “In countries where the maternal mortality ratio remains high, antenatal education to increase Birth Preparedness and Complication Readiness (BPCR) is considered one of the top priorities . BPCR includes birth plans during the antenatal period, such as the birthplace, birth attendant, transportation, health facility for complications, expenses, and birth materials, as well as family coordination to achieve such birth plans. In Tanzania, although increasing, only about half of all pregnant women attend an antenatal clinic more than four times . Moreover, the information provided during antenatal care (ANC) is insufficient. In the resource-poor settings, antenatal group education is a potential approach because of the limited time for individual counseling at antenatal clinics.” 30
- “This study aimed to evaluate an antenatal group education program among pregnant women and their families with respect to birth-preparedness and maternal and infant outcomes in rural villages of Tanzania.” 30
- “ The study hypothesis was if Tanzanian pregnant women and their families received a family-oriented antenatal group education, they would (1) have a higher level of BPCR, (2) attend antenatal clinic four or more times, (3) give birth in a health facility, (4) have less complications of women at birth, and (5) have less complications and deaths of infants than those who did not receive the education .” 30
Research questions and hypotheses are crucial components of any type of research, whether quantitative or qualitative. These questions should be developed at the very beginning of the study. Excellent research questions lead to superior hypotheses, which, like a compass, set the direction of research, and can often determine the successful conduct of the study. Many research studies have floundered because the development of research questions and subsequent hypotheses was not given the thought and meticulous attention needed. The development of research questions and hypotheses is an iterative process based on extensive knowledge of the literature and an insightful grasp of the knowledge gap. Focused, concise, and specific research questions provide a strong foundation for constructing hypotheses, which serve as formal predictions about the research outcomes. Research questions and hypotheses are crucial elements of research that should not be overlooked. They should be carefully thought out and constructed when planning research. This avoids unethical studies and poor outcomes by defining well-founded objectives that determine the design, course, and outcome of the study.
Disclosure: The authors have no potential conflicts of interest to disclose.
- Conceptualization: Barroga E, Matanguihan GJ.
- Methodology: Barroga E, Matanguihan GJ.
- Writing - original draft: Barroga E, Matanguihan GJ.
- Writing - review & editing: Barroga E, Matanguihan GJ.
Chapter 3. Introduction to Quantitative Research and Data
Chapter 3 of Library Technology Reports (vol. 53, no. 4), "Applying Quantitative Methods to E-book Collections"
The foundation of any e-book analysis framework rests on knowledge of quantitative data and metrics. Chapter 3 of Library Technology Reports (vol. 53, no. 4), “Applying Quantitative Methods to E-book Collections,” discusses key characteristics of quantitative data and the various types of research questions it can answer. It also lists various performance measures and indicators that can be used in information management environments to support conclusions and provide evidence for e-book collection development decisions. Finally, it provides a research framework that can be used to plan and define collection analysis projects.
Chapter 3 -- Survey Research Design and Quantitative Methods of Analysis for Cross-sectional Data
Almost everyone has had experience with surveys. Market surveys ask respondents whether they recognize products and how they feel about them. Political polls ask questions about candidates for political office or opinions related to political and social issues. Needs assessments use surveys that identify the needs of groups. Evaluations often use surveys to assess the extent to which programs achieve their goals.

Survey research is a method of collecting information by asking questions. Sometimes interviews are done face-to-face with people at home, in school, or at work. Other times questions are sent in the mail for people to answer and mail back. Increasingly, surveys are conducted by telephone.

SAMPLE SURVEYS

Although we want to have information on all people, it is usually too expensive and time consuming to question everyone. So we select only some of these individuals and question them. It is important to select these people in ways that make it likely that they represent the larger group.

The population is all the individuals in whom we are interested. (A population does not always consist of individuals. Sometimes, it may be geographical areas such as all cities with populations of 100,000 or more. Or we may be interested in all households in a particular area. In the data used in the exercises of this module the population consists of individuals who are California residents.) A sample is the subset of the population involved in a study. In other words, a sample is part of the population. The process of selecting the sample is called sampling. The idea of sampling is to select part of the population to represent the entire population.

The United States Census is a good example of sampling. The census tries to enumerate all residents every ten years with a short questionnaire. Approximately every fifth household is given a longer questionnaire. Information from this sample (i.e., every fifth household) is used to make inferences about the population.
Political polls also use samples. To find out how potential voters feel about a particular race, pollsters select a sample of potential voters. This module uses opinions from three samples of California residents age 18 and over. The data were collected during July, 1985, September, 1991, and February, 1995, by the Field Research Corporation (The Field Institute 1985, 1991, 1995). The Field Research Corporation is a widely-respected survey research firm and is used extensively by the media, politicians, and academic researchers. Since a survey can be no better than the quality of the sample, it is essential to understand the basic principles of sampling. There are two types of sampling-probability and nonprobability. A probability sample is one in which each individual in the population has a known, nonzero chance of being selected in the sample. The most basic type is the simple random sample . In a simple random sample, every individual (and every combination of individuals) has the same chance of being selected in the sample. This is the equivalent of writing each person's name on a piece of paper, putting them in plastic balls, putting all the balls in a big bowl, mixing the balls thoroughly, and selecting some predetermined number of balls from the bowl. This would produce a simple random sample. The simple random sample assumes that we can list all the individuals in the population, but often this is impossible. If our population were all the households or residents of California, there would be no list of the households or residents available, and it would be very expensive and time consuming to construct one. In this type of situation, a multistage cluster sample would be used. The idea is very simple. If we wanted to draw a sample of all residents of California, we might start by dividing California into large geographical areas such as counties and selecting a sample of these counties. 
Our sample of counties could then be divided into smaller geographical areas such as blocks and a sample of blocks would be selected. We could then construct a list of all households for only those blocks in the sample. Finally, we would go to these households and randomly select one member of each household for our sample. Once the household and the member of that household have been selected, substitution would not be allowed. This often means that we must call back several times, but this is the price we must pay for a good sample. The Field Poll used in this module is a telephone survey. It is a probability sample using a technique called random-digit dialing . With random-digit dialing, phone numbers are dialed randomly within working exchanges (i.e., the first three digits of the telephone number). Numbers are selected in such a way that all areas have the proper proportional chance of being selected in the sample. Random-digit dialing makes it possible to include numbers that are not listed in the telephone directory and households that have moved into an area so recently that they are not included in the current telephone directory. A nonprobability sample is one in which each individual in the population does not have a known chance of selection in the sample. There are several types of nonprobability samples. For example, magazines often include questionnaires for readers to fill out and return. This is a volunteer sample since respondents self-select themselves into the sample (i.e., they volunteer to be in the sample). Another type of nonprobability sample is a quota sample . Survey researchers may assign quotas to interviewers. For example, interviewers might be told that half of their respondents must be female and the other half male. This is a quota on sex. We could also have quotas on several variables (e.g., sex and race) simultaneously. Probability samples are preferable to nonprobability samples. 
First, they avoid the dangers of what survey researchers call "systematic selection biases" which are inherent in nonprobability samples. For example, in a volunteer sample, particular types of persons might be more likely to volunteer. Perhaps highly-educated individuals are more likely to volunteer to be in the sample and this would produce a systematic selection bias in favor of the highly educated. In a probability sample, the selection of the actual cases in the sample is left to chance. Second, in a probability sample we are able to estimate the amount of sampling error (our next concept to discuss). We would like our sample to give us a perfectly accurate picture of the population. However, this is unrealistic. Assume that the population is all employees of a large corporation, and we want to estimate the percent of employees in the population that is satisfied with their jobs. We select a simple random sample of 500 employees and ask the individuals in the sample how satisfied they are with their jobs. We discover that 75 percent of the employees in our sample are satisfied. Can we assume that 75 percent of the population is satisfied? That would be asking too much. Why would we expect one sample of 500 to give us a perfect representation of the population? We could take several different samples of 500 employees and the percent satisfied from each sample would vary from sample to sample. There will be a certain amount of error as a result of selecting a sample from the population. We refer to this as sampling error . Sampling error can be estimated in a probability sample, but not in a nonprobability sample. It would be wrong to assume that the only reason our sample estimate is different from the true population value is because of sampling error. There are many other sources of error called nonsampling error . 
Nonsampling error would include such things as the effects of biased questions, the tendency of respondents to systematically underestimate such things as age, the exclusion of certain types of people from the sample (e.g., those without phones, those without permanent addresses), or the tendency of some respondents to systematically agree to statements regardless of the content of the statements. In some studies, the amount of nonsampling error might be far greater than the amount of sampling error. Notice that sampling error is random in nature, while nonsampling error may be nonrandom, producing systematic biases. We can estimate the amount of sampling error (assuming probability sampling), but it is much more difficult to estimate nonsampling error. We can never eliminate sampling error entirely, and it is unrealistic to expect that we could ever eliminate nonsampling error. It is good research practice to be diligent in seeking out sources of nonsampling error and trying to minimize them.

DATA ANALYSIS

Examining Variables One at a Time (Univariate Analysis)

The rest of this chapter will deal with the analysis of survey data. Data analysis involves looking at variables or "things" that vary or change. A variable is a characteristic of the individual (assuming we are studying individuals). The answer to each question on the survey forms a variable. For example, sex is a variable: some individuals in the sample are male and some are female. Age is a variable; individuals vary in their ages.

Looking at variables one at a time is called univariate analysis. This is the usual starting point in analyzing survey data. There are several reasons to look at variables one at a time. First, we want to describe the data. How many of our sample are men and how many are women? How many are black and how many are white? What is the distribution by age? How many say they are going to vote for Candidate A and how many for Candidate B?
How many respondents agree and how many disagree with a statement describing a particular opinion?

Another reason we might want to look at variables one at a time involves recoding. Recoding is the process of combining categories within a variable. Consider age, for example. In the data set used in this module, age varies from 18 to 89, but we would want to use fewer categories in our analysis, so we might combine age into age 18 to 29, 30 to 49, and 50 and over. We might want to combine African Americans with the other races to classify race into only two categories: white and nonwhite. Recoding is used to reduce the number of categories in the variable (e.g., age) or to combine categories so that you can make particular types of comparisons (e.g., white versus nonwhite).

The frequency distribution is one of the basic tools for looking at variables one at a time. A frequency distribution is the set of categories and the number of cases in each category. Percent distributions show the percentage in each category. Table 3.1 shows frequency and percent distributions for two hypothetical variables, one for sex and one for willingness to vote for a woman candidate. Begin by looking at the frequency distribution for sex. There are three columns in this table. The first column specifies the categories (male and female). The second column tells us how many cases there are in each category, and the third column converts these frequencies into percents.

Table 3.1 -- Frequency and Percent Distributions for Sex and Willingness to Vote for a Woman Candidate (Hypothetical Data)

Sex

| Category | Freq. | Percent |
|---|---|---|
| Male | 380 | 40.0 |
| Female | 570 | 60.0 |
| Total | 950 | 100.0 |

Voting Preference

| Category | Freq. | Percent | Valid Percent |
|---|---|---|---|
| Willing to Vote for a Woman | 460 | 48.4 | 51.1 |
| Not Willing to Vote for a Woman | 440 | 46.3 | 48.9 |
| Refused | 50 | 5.3 | Missing |
| Total | 950 | 100.0 | 100.0 |

In this hypothetical example, there are 380 males and 570 females or 40 percent male and 60 percent female. There are a total of 950 cases.
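The frequency and percent columns of Table 3.1 can be reproduced with a few lines of Python. The sketch below is only an illustration: the function name `frequency_table` and its `missing` parameter are made up for this example and are not part of the module's software. It also computes a "valid percent", which leaves any category marked as missing (here, the refusals) out of the base.

```python
def frequency_table(counts, missing=()):
    """Return {category: (freq, percent, valid_percent)} for a dict of counts.

    Categories listed in `missing` stay in the percent base but are
    excluded from the valid-percent base (their valid percent is None).
    """
    total = sum(counts.values())
    valid_total = sum(f for c, f in counts.items() if c not in missing)
    table = {}
    for category, freq in counts.items():
        percent = round(100 * freq / total, 1)
        valid = None if category in missing else round(100 * freq / valid_total, 1)
        table[category] = (freq, percent, valid)
    return table


# Hypothetical counts taken from Table 3.1.
sex = frequency_table({"Male": 380, "Female": 570})
voting = frequency_table(
    {"Willing": 460, "Not willing": 440, "Refused": 50},
    missing=("Refused",),
)

for name, table in (("Sex", sex), ("Voting preference", voting)):
    print(name)
    for category, (freq, percent, valid) in table.items():
        print(f"  {category:<12} {freq:>4} {percent:>6} {valid}")
```

Because sex has no missing cases, its percent and valid percent are identical; for voting preference the two differ because the 50 refusals are dropped from the valid base.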
Since we know the sex for each case, there are no missing data (i.e., no cases where we do not know the proper category). Look at the frequency distribution for voting preference in Table 3.1. How many say they are willing to vote for a woman candidate and how many are unwilling? (Answer: 460 willing and 440 not willing) How many refused to answer the question? (Answer: 50) What percent say they are willing to vote for a woman, what percent are not, and what percent refused to answer? (Answer: 48.4 percent willing to vote for a woman, 46.3 percent not willing, and 5.3 percent refused to tell us.)

The 50 respondents who didn't want to answer the question are called missing data because we don't know which category to place them in, so we create a new category (i.e., refused) for them. Since we don't know where they should go, we might want a percentage distribution considering only the 900 respondents who answered the question. We can determine this easily by taking the 50 cases with missing information out of the base (i.e., the denominator of the fraction) and recomputing the percentages. The fourth column in the frequency distribution (labeled "valid percent") gives us this information. Approximately 51 percent of those who answered the question were willing to vote for a woman and approximately 49 percent were not.

With these data we will use frequency distributions to describe variables one at a time. There are other ways to describe single variables. The mean, median, and mode are averages that may be used to describe the central tendency of a distribution. The range and standard deviation are measures of the amount of variability or dispersion of a distribution. (We will not be using measures of central tendency or variability in this module.)

Exploring the Relationship Between Two Variables (Bivariate Analysis)

Usually we want to do more than simply describe variables one at a time. We may want to analyze the relationship between variables.
Morris Rosenberg (1968:2) suggests that there are three types of relationships: "(1) neither variable may influence one another .... (2) both variables may influence one another ... (3) one of the variables may influence the other." We will focus on the third of these types which Rosenberg calls "asymmetrical relationships." In this type of relationship, one of the variables (the independent variable ) is assumed to be the cause and the other variable (the dependent variable ) is assumed to be the effect. In other words, the independent variable is the factor that influences the dependent variable. For example, researchers think that smoking causes lung cancer. The statement that specifies the relationship between two variables is called a hypothesis (see Hoover 1992, for a more extended discussion of hypotheses). In this hypothesis, the independent variable is smoking (or more precisely, the amount one smokes) and the dependent variable is lung cancer. Consider another example. Political analysts think that income influences voting decisions, that rich people vote differently from poor people. In this hypothesis, income would be the independent variable and voting would be the dependent variable. In order to demonstrate that a causal relationship exists between two variables, we must meet three criteria: (1) there must be a statistical relationship between the two variables, (2) we must be able to demonstrate which one of the variables influences the other, and (3) we must be able to show that there is no other alternative explanation for the relationship. As you can imagine, it is impossible to show that there is no other alternative explanation for a relationship. For this reason, we can show that one variable does not influence another variable, but we cannot prove that it does. We can only show that it is more plausible or credible to believe that a causal relationship exists. 
In this section, we will focus on the first two criteria and leave this third criterion to the next section. In the previous section we looked at the frequency distributions for sex and voting preference. All we can say from these two distributions is that the sample is 40 percent men and 60 percent women and that slightly more than half of the respondents said they would be willing to vote for a woman, and slightly less than half are not willing to. We cannot say anything about the relationship between sex and voting preference. In order to determine if men or women are more likely to be willing to vote for a woman candidate, we must move from univariate to bivariate analysis. A crosstabulation (or contingency table ) is the basic tool used to explore the relationship between two variables. Table 3.2 is the crosstabulation of sex and voting preference. In the lower right-hand corner is the total number of cases in this table (900). Notice that this is not the number of cases in the sample. There were originally 950 cases in this sample, but any case that had missing information on either or both of the two variables in the table has been excluded from the table. Be sure to check how many cases have been excluded from your table and to indicate this figure in your report. Also be sure that you understand why these cases have been excluded. The figures in the lower margin and right-hand margin of the table are called the marginal distributions. They are simply the frequency distributions for the two variables in the whole table. Here, there are 360 males and 540 females (the marginal distribution for the column variable-sex) and 460 people who are willing to vote for a woman candidate and 440 who are not (the marginal distribution for the row variable-voting preference). The other figures in the table are the cell frequencies. Since there are two columns and two rows in this table (sometimes called a 2 x 2 table), there are four cells. 
The numbers in these cells tell us how many cases fall into each combination of categories of the two variables. This sounds complicated, but it isn't. For example, 158 males are willing to vote for a woman and 302 females are willing to vote for a woman.

Table 3.2 -- Crosstabulation of Sex and Voting Preference (Frequencies)

| Voting Preference | Male | Female | Total |
|---|---|---|---|
| Willing to Vote for a Woman | 158 | 302 | 460 |
| Not Willing to Vote for a Woman | 202 | 238 | 440 |
| Total | 360 | 540 | 900 |

We could make comparisons rather easily if we had an equal number of women and men. Since these numbers are not equal, we must use percentages to help us make the comparisons. Since percentages convert everything to a common base of 100, the percent distribution shows us what the table would look like if there were an equal number of men and women.

Before we percentage Table 3.2, we must decide which of these two variables is the independent and which is the dependent variable. Remember that the independent variable is the variable we think might be the influencing factor. The independent variable is hypothesized to be the cause, and the dependent variable is the effect. Another way to express this is to say that the dependent variable is the one we want to explain. Since we think that sex influences willingness to vote for a woman candidate, sex would be the independent variable.

Once we have decided which is the independent variable, we are ready to percentage the table. Notice that percentages can be computed in different ways. In Table 3.3, the percentages have been computed so that they sum down to 100. These are called column percents. If they sum across to 100, they are called row percents. If the independent variable is the column variable, then we want the percents to sum down to 100 (i.e., we want the column percents). If the independent variable is the row variable, we want the percents to sum across to 100 (i.e., we want the row percents).
This is a simple, but very important, rule to remember. We'll call this our rule for computing percents. Although we often see the independent variable as the column variable so the table sums down to 100 percent, it really doesn't matter whether the independent variable is the column or the row variable. In this module, we will put the independent variable as the column variable. Many others (but not everyone) use this convention. It would be helpful if you did this when you write your report.

Table 3.3 -- Voting Preference by Sex (Percents)

| Voting Preference | Male | Female | Total |
|---|---|---|---|
| Willing to Vote for a Woman | 43.9 | 55.9 | 51.1 |
| Not Willing to Vote for a Woman | 56.1 | 44.1 | 48.9 |
| Total Percent | 100.0 | 100.0 | 100.0 |
| (Total Frequency) | (360) | (540) | (900) |

Now we are ready to interpret this table. Interpreting a table means to explain what the table is saying about the relationship between the two variables. First, we can look at each category of the independent variable separately to describe the data and then we compare them to each other. Since the percents sum down to 100 percent, we describe down and compare across. The rule for interpreting percents is to compare in the direction opposite to the way the percents sum to 100. So, if the percents sum down to 100, we compare across, and if the percents sum across to 100, we compare down. If the independent variable is the column variable, the percents will always sum down to 100.

In Table 3.3, row one shows the percent of males and the percent of females who are willing to vote for a woman candidate: 43.9 percent of males are willing to vote for a woman, while 55.9 percent of the females are. This is a difference of 12 percentage points. Somewhat more females than males are willing to vote for a woman.
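As a check on the rule for computing percents, the column percents in Table 3.3 can be recomputed from the cell frequencies in Table 3.2. This is a minimal sketch; the nested-dictionary layout and the helper name `column_percents` are assumptions chosen for illustration, not something the module prescribes.

```python
# Cell frequencies from Table 3.2: rows are voting preference, columns are sex.
observed = {
    "Willing": {"Male": 158, "Female": 302},
    "Not willing": {"Male": 202, "Female": 238},
}


def column_percents(table):
    """Percentage each cell on its column total, so each column sums to 100."""
    columns = list(next(iter(table.values())))
    col_totals = {c: sum(row[c] for row in table.values()) for c in columns}
    return {
        r: {c: round(100 * row[c] / col_totals[c], 1) for c in columns}
        for r, row in table.items()
    }


percents = column_percents(observed)
for row_label, row in percents.items():
    print(row_label, row)

# Compare across the first row: females minus males, in percentage points.
difference = round(percents["Willing"]["Female"] - percents["Willing"]["Male"], 1)
print("Difference:", difference)
```

Because sex is the column variable, the percents are based on the column totals (360 males, 540 females), and the comparison is made across each row.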
The second row shows the percent of males and females who are not willing to vote for a woman. Since there are only two rows, the second row will be the complement (or the reverse) of the first row. It shows that males are somewhat more likely to be unwilling to vote for a woman candidate (a difference of 12 percentage points in the opposite direction). When we observe a difference, we must also decide whether it is significant. There are two different meanings for significance-statistical significance and substantive significance. Statistical significance considers whether the difference is great enough that it is probably not due to chance factors. Substantive significance considers whether a difference is large enough to be important. With a very large sample, a very small difference is often statistically significant, but that difference may be so small that we decide it isn't substantively significant (i.e., it's so small that we decide it doesn't mean very much). We're going to focus on statistical significance, but remember that even if a difference is statistically significant, you must also decide if it is substantively significant. Let's discuss this idea of statistical significance. If our population is all men and women of voting age in California, we want to know if there is a relationship between sex and voting preference in the population of all individuals of voting age in California. All we have is information about a sample from the population. We use the sample information to make an inference about the population. This is called statistical inference . We know that our sample is not a perfect representation of our population because of sampling error . Therefore, we would not expect the relationship we see in our sample to be exactly the same as the relationship in the population. Suppose we want to know whether there is a relationship between sex and voting preference in the population. 
It is impossible to prove this directly, so we have to demonstrate it indirectly. We set up a hypothesis (called the null hypothesis ) that says that sex and voting preference are not related to each other in the population. This basically says that any difference we see is likely to be the result of random variation. If the difference is large enough that it is not likely to be due to chance, we can reject this null hypothesis of only random differences. Then the hypothesis that they are related (called the alternative or research hypothesis ) will be more credible.
In the first column of Table 3.4, we have listed the four cell frequencies from the crosstabulation of sex and voting preference. We'll call these the observed frequencies (fo) because they are what we observe from our table. In the second column, we have listed the frequencies we would expect if, in fact, there is no relationship between sex and voting preference in the population. These are called the expected frequencies (fe).

We'll briefly explain how these expected frequencies are obtained. Notice from Table 3.1 that 51.1 percent of the sample were willing to vote for a woman candidate, while 48.9 percent were not. If sex and voting preference are independent (i.e., not related), we should find the same percentages for males and females. In other words, 48.9 percent (or 176) of the 360 males and 48.9 percent (or 264) of the 540 females would be unwilling to vote for a woman candidate. (This explanation is adapted from Norusis 1997.)

Now we want to compare these two sets of frequencies to see how closely the observed frequencies match the expected frequencies. We subtract the expected from the observed frequencies (column three). We are interested in the sum of these differences for all cells in the table. Since they always sum to zero, we square the differences (column four) to get positive numbers. Finally, we divide each squared difference by the expected frequency (column five). (Don't worry about why we do this. The reasons are technical and don't add to your understanding.) The sum of column five (12.52) is called the chi square statistic.

If the observed and the expected frequencies are identical (no difference), chi square will be zero. The greater the difference between the observed and expected frequencies, the larger the chi square. If we get a large chi square, we are willing to reject the null hypothesis. How large does the chi square have to be?
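The column-by-column calculation described above can be sketched in a few lines. The observed frequencies are an assumption: they are not listed in this section and were reconstructed from the Table 3.3 percentages and column totals. The function name `chi_square` is hypothetical, and the expected frequencies are obtained as (column total x row total) / sample size, which is equivalent to applying the overall percentages to each sex.

```python
# Observed cell frequencies (fo), keyed by (sex, voting preference).
# ASSUMPTION: reconstructed from the Table 3.3 percentages and column totals.
observed = {
    ("Male", "Willing"): 158,
    ("Male", "Not willing"): 202,
    ("Female", "Willing"): 302,
    ("Female", "Not willing"): 238,
}


def chi_square(observed):
    """Return (chi square, expected frequencies) for a two-way table of counts."""
    n = sum(observed.values())
    sex_totals, pref_totals = {}, {}
    for (sex, pref), fo in observed.items():
        sex_totals[sex] = sex_totals.get(sex, 0) + fo
        pref_totals[pref] = pref_totals.get(pref, 0) + fo

    expected, chi2 = {}, 0.0
    for (sex, pref), fo in observed.items():
        # Expected frequency (fe) if sex and voting preference were unrelated.
        fe = sex_totals[sex] * pref_totals[pref] / n
        expected[(sex, pref)] = fe
        # Column five: squared difference divided by the expected frequency.
        chi2 += (fo - fe) ** 2 / fe
    return chi2, expected


chi2, expected = chi_square(observed)
for cell, fe in expected.items():
    fo = observed[cell]
    print(cell, "fo =", fo, "fe =", round(fe, 1), "fo - fe =", round(fo - fe, 1))
print("chi square =", round(chi2, 2))
```

With these counts the differences (fo - fe) are +26 or -26 in every cell, which is why they sum to zero before squaring, and the rounded statistic agrees with the 12.52 reported in the text.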
We reject the null hypothesis of no relationship between the two variables when the probability of getting a chi square this large or larger by chance, if the null hypothesis were true, is very small. That is, we reject it if a chi square this large would rarely occur by chance (usually less than once in a hundred or less than five times in a hundred). In this example, the probability of getting a chi square as large as 12.52 or larger by chance is less than one in a thousand. This is so unlikely that we reject the null hypothesis, and we conclude that the alternative hypothesis (i.e., there is a relationship between sex and voting preference) is credible (not that it is necessarily true, but that it is credible). There is always a small chance that the null hypothesis is true even when we decide to reject it. In other words, we can never be sure that it is false. We can only conclude that a result like ours would be very unlikely if it were true.

Just because we have concluded that there is a relationship between sex and voting preference does not mean that it is a strong relationship. It might be a moderate or even a weak relationship. There are many statistics that measure the strength of the relationship between two variables. Chi square is not a measure of the strength of the relationship. It just helps us decide if there is a basis for saying a relationship exists, regardless of its strength. Measures of association estimate the strength of the relationship and are often used with chi square. (See Appendix D for a discussion of how to compute the two measures of association discussed below.)

Cramer's V is a measure of association appropriate when one or both of the variables consist of unordered categories. For example, race (white, African American, other) or religion (Protestant, Catholic, Jewish, other, none) are variables with unordered categories. Cramer's V is a measure based on chi square. It ranges from zero to one.
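As a rough check on the figures above, the probability attached to a chi square of 12.52 and the corresponding Cramer's V can be sketched as follows. Several things here are assumptions not stated in the text: the degrees of freedom for a 2 x 2 table are taken as (rows - 1) x (columns - 1) = 1, the tail probability uses the identity that holds only for one degree of freedom, and V is computed with the common textbook formula sqrt(chi square / (n x min(rows - 1, columns - 1))), which may be presented differently in Appendix D.

```python
import math

chi2 = 12.52        # chi square reported in the text
n = 900             # total sample size from Table 3.3
rows, cols = 2, 2   # voting preference (2 categories) by sex (2 categories)

# ASSUMPTION: degrees of freedom = (rows - 1) * (cols - 1), which is 1 here.
df = (rows - 1) * (cols - 1)

# For 1 degree of freedom only, the probability of a chi square this large
# or larger under the null hypothesis equals erfc(sqrt(chi2 / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

# ASSUMPTION: standard formula for Cramer's V.
cramers_v = math.sqrt(chi2 / (n * min(rows - 1, cols - 1)))

print("degrees of freedom:", df)
print("p-value below one in a thousand:", p_value < 0.001)
print("Cramer's V:", round(cramers_v, 3))
```

Under these assumptions the probability falls below one in a thousand, as the text states, while V comes out at roughly 0.12, near the zero end of its range. This illustrates the point that a statistically significant relationship is not necessarily a strong one.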
The closer V is to zero, the weaker the relationship; the closer to one, the stronger the relationship.

Gamma (sometimes referred to as Goodman and Kruskal's Gamma) is a measure of association appropriate when both of the variables consist of ordered categories. For example, if respondents answer that they strongly agree, agree, disagree, or strongly disagree with a statement, their responses are ordered. Similarly, if we group age into categories such as under 30, 30 to 49, and 50 and over, these categories would be ordered. Ordered categories can logically be arranged in only two ways: low to high or high to low. Gamma ranges from -1 to +1; that is, its numerical value ranges from zero to one, and its sign can be positive or negative. For this module, the sign of Gamma would have no meaning, so ignore the sign and focus on the numerical value. Like V, the closer to zero, the weaker the relationship, and the closer to one, the stronger the relationship.

Choosing whether to use Cramer's V or Gamma depends on whether the categories of the variables are ordered or unordered. However, dichotomies (variables consisting of only two categories) may be treated as if they are ordered even if they are not. For example, sex is a dichotomy consisting of the categories male and female. There are only two possible ways to order sex: male, female and female, male. Or, race may be classified into two categories: white and nonwhite. We can treat dichotomies as if they consisted of ordered categories because they can be ordered in only two ways. In other words, when one of the variables is a dichotomy, treat this variable as if it were ordinal; if the other variable is ordered (or is itself a dichotomy), use Gamma. This is important when choosing an appropriate measure of association.

In this chapter we have described how surveys are done and how we analyze the relationship between two variables. In the next chapter we will explore how to introduce additional variables into the analysis.

REFERENCES AND SUGGESTED READING

Methods of Social Research

Riley, Matilda White. 1963. Sociological Research I: A Case Approach. New York: Harcourt, Brace and World.
Hoover, Kenneth R. 1992. The Elements of Social Scientific Thinking (5th Ed.). New York: St. Martin's.

Interviewing

Gorden, Raymond L. 1987. Interviewing: Strategy, Techniques and Tactics. Chicago: Dorsey.

Survey Research and Sampling

Babbie, Earl R. 1990. Survey Research Methods (2nd Ed.). Belmont, CA: Wadsworth.
Babbie, Earl R. 1997. The Practice of Social Research (8th Ed.). Belmont, CA: Wadsworth.

Statistical Analysis

Knoke, David, and George W. Bohrnstedt. 1991. Basic Social Statistics. Itasca, IL: Peacock.
Riley, Matilda White. 1963. Sociological Research II: Exercises and Manual. New York: Harcourt, Brace and World.
Norusis, Marija J. 1997. SPSS 7.5 Guide to Data Analysis. Upper Saddle River, NJ: Prentice Hall.

Data Sources

The Field Institute. 1985. California Field Poll Study, July, 1985. Machine-readable codebook.
The Field Institute. 1991. California Field Poll Study, September, 1991. Machine-readable codebook.
The Field Institute. 1995. California Field Poll Study, February, 1995. Machine-readable codebook.