OOMPH Library Resources: PHW 250/250B Epidemiologic Methods: Epidemiologic Case Study Resources

Epidemiologic Case Studies

  • Epidemiologic Case Studies (US CDC) These case studies are interactive exercises developed to teach epidemiologic principles and practices. They are based on real-life outbreaks and public health problems and were developed in collaboration with the original investigators and experts from the Centers for Disease Control and Prevention (CDC). The case studies require students to apply their epidemiologic knowledge and skills to problems confronted by public health practitioners at the local, state, and national level every day.
  • Case Studies (WHO) From "Strengthening health security by implementing the International Health Regulations," each case has learning objectives and documentation.
  • Case Studies in Social Medicine A series of Perspective articles from the New England Journal of Medicine that highlight the importance of social concepts and social context in clinical medicine. The series uses discussions of real clinical cases to translate theories and methods for understanding social processes into terms that can readily be used in medical education, clinical practice, and health system planning.
  • African Case Studies in Public Health Case study exercises based on real events in African contexts and written by experienced Africa-based public health trainers and practitioners. These case studies represent the most up-to-date and context-appropriate case study exercises for African public health training programs. These exercises are designed to reinforce and instill competencies for addressing health threats in the future leaders of public health in Africa.
  • Case Consortium @ Columbia University: Public Health Cases The case collection includes "teaching" cases. Nearly all the cases are multimedia and based on original research; a few are written from secondary sources. All cases are offered free of charge.
  • Epi Teams Training: Case Studies From the North Carolina Institute for Public Health, this curriculum includes several interactive case studies designed to be used by the Epi Team as a group. These case studies are based on actual outbreaks that have occurred in North Carolina and elsewhere.
  • National Center for Case Study Teaching in Science The mission of the NCCSTS at the University at Buffalo is to promote the development and dissemination of materials and practices for case teaching in the sciences. Our website provides access to an award-winning collection of peer-reviewed case studies. We offer a five-day summer workshop and a two-day fall conference to train faculty in the case method of teaching science. In addition, we are actively engaged in educational research to assess the impact of the case method on student learning. "Case Collection" includes over 100 public health cases.

  • Last Updated: May 22, 2024 8:31 AM
  • URL: https://guides.lib.berkeley.edu/publichealth/PHW250

PH717 Module 1B - Descriptive Tools

Descriptive epidemiology & descriptive statistics.

Case Reports and Case Series

A case report is a detailed description of disease occurrence in a single person. Unusual features of the case may suggest a new hypothesis about the causes or mechanisms of disease.

Example: Acquired Immunodeficiency in an Infant; Possible Transmission by Means of Blood Products

In April 1983 it had not yet been shown that AIDS could be transmitted by blood or blood products. An infant born with Rh incompatibility required blood products from 18 donors over 8 weeks and subsequently developed unusual recurrent infections with opportunistic agents such as Candida. The infant's T cell count was low, suggesting AIDS. There was no family history of immunodeficiency, but one of the blood donors was found to have died of AIDS. This led the investigators to hypothesize that AIDS could be transmitted by blood transfusion.

Link to article by Ammann AJ et al: Acquired immunodeficiency in an infant: possible transmission by means of blood products. The Lancet 1:956-958, 1983.

A case series is a report on the characteristics of a group of subjects who all have a particular disease or condition. Common features among the group may suggest hypotheses about disease causation. Note that the "series" may be small (as in the example below) or it may be large (hundreds or thousands of "cases"). However, the chief limitation is that there is no comparison group. Consequently, common features may suggest hypotheses, but these need to be tested with some sort of analytical study before an association can be accepted as valid.

Example: Discovery of HIV in the United States

This was an extraordinarily important case series (a detailed description of characteristics of a series of people who all have the same disease) that suggested that this new syndrome was associated with sexual activity in male homosexuals. Alerting the medical establishment and proposing a hypothesis was an important milestone in the AIDS epidemic.

Link to article by Gottlieb MS, et al: Pneumocystis carinii pneumonia and mucosal candidiasis in previously healthy homosexual men: evidence of a new acquired cellular immunodeficiency. N Engl J Med 1981;305:1425-1431.

There had been a number of case reports of liver cancers in young women taking oral contraceptives. A study was undertaken by contacting all of the cancer registries collaborating with the American College of Surgeons. The investigators wanted to collect information on as many of these rare liver tumors as possible across the US.  

Table - Oral Contraceptive Use Among Women Who Developed Liver Cancer

What conclusions can you draw from these data regarding a possible increased risk of liver cancer in women taking oral contraceptives? Think about it before you look at the answer.

Content ©2020. All Rights Reserved. Date last modified: September 10, 2020. Wayne W. LaMorte, MD, PhD, MPH

Epidemiologic Case Studies

These case studies are interactive exercises developed to teach epidemiologic principles and practices. They are based on real-life outbreaks and public health problems and were developed in collaboration with the original investigators and experts from the Centers for Disease Control and Prevention (CDC). The case studies require students to apply their epidemiologic knowledge and skills to problems confronted by public health practitioners at the local, state, and national level every day.

Three types of epidemiologic case studies are available.

Computer-Based Case Studies

Can be used as self-study and in the classroom.

Botulism in Argentina (CB3058)

E. coli O157:H7 Infection in Michigan (CB3075)

Gastroenteritis at a University in Texas (CB3076)

Classroom Case Studies

Primarily for use in a group setting with a knowledgeable instructor.

Instructor’s Guide

Foodborne Disease

Waterborne Disease

Outbreak Simulation

Gives students the opportunity to work through an outbreak investigation as a lead investigator.

Outbreak Simulation: Pharyngitis in Louisiana (CB3050)

  • Page last reviewed: September 11, 2017
  • Page last updated: September 11, 2017
  • Office of Public Health Scientific Services ;
  • Center for Surveillance, Epidemiology, and Laboratory Services ;
  • Division of Scientific Education and Professional Development

Acute and long-term outcomes of SARS-CoV-2 infection in school-aged children in England: Study protocol for the joint analysis of the COVID-19 schools infection survey (SIS) and the COVID-19 mapping and mitigation in schools (CoMMinS) study

Affiliations.

  • 1 Population Health Sciences, University of Bristol, Bristol, United Kingdom.
  • 2 Centre for Academic Primary Care, University of Bristol, Bristol, United Kingdom.
  • 3 Department of Infectious Disease Epidemiology, Faculty of Epidemiology and Population Health, London School of Hygiene & Tropical Medicine, London, United Kingdom.
  • 4 Office for National Statistics, Newport, South Wales, United Kingdom.
  • 5 NIHR Bristol Biomedical Research Centre, University of Bristol, Bristol, United Kingdom.
  • 6 Department of Infection Biology, Faculty of Infectious and Tropical Diseases, LSHTM, London, United Kingdom.
  • 7 Department of Non-Communicable Disease Epidemiology, Faculty of Epidemiology and Population Health, LSHTM, United Kingdom.
  • 8 MRC Integrative Epidemiology Unit, University of Bristol, Bristol, United Kingdom.
  • 9 Faculty of Epidemiology and Population Health, LSHTM, United Kingdom.
  • 10 Health Data Research UK (HDR UK) South-West, Bristol, United Kingdom.
  • PMID: 38776311
  • DOI: 10.1371/journal.pone.0303892

Background: The symptom profiles of acute SARS-CoV-2 infection and long-COVID in children and young people (CYP), risk factors, and associated healthcare needs, are poorly defined. The Schools Infection Survey 1 (SIS-1) was a nationwide study of SARS-CoV-2 infection in primary and secondary schools in England during the 2020/21 school year. The Covid-19 Mapping and Mitigation in Schools (CoMMinS) study was conducted in schools in the Bristol area over a similar period. Both studies conducted testing to identify current and previous SARS-CoV-2 infection, and recorded symptoms and school attendance. These research data have been linked to routine electronic health record (EHR) data.

Aims: To better understand the short- and long-term consequences of SARS-CoV-2 infection, and their risk factors, in CYP.

Methods: Retrospective cohort and nested case-control analyses will be conducted for SIS-1 and CoMMinS data linked to EHR data for the association between (1) acute symptomatic SARS-CoV-2 infection and risk factors; (2) SARS-CoV-2 infection and long-term effects on health: (a) persistent symptoms; (b) any new diagnosis; (c) a new prescription in primary care; (d) health service attendance; (e) a high rate of school absence.

Results: Our study will improve understanding of long-COVID in CYP by characterising the trajectory of long-COVID in CYP in terms of things like symptoms and diagnoses of conditions. The research will inform which groups of CYP are more likely to get acute- and long-term outcomes of SARS-CoV-2 infection, and patterns of related healthcare-seeking behaviour, relevant for healthcare service planning. Digested information will be produced for affected families, doctors, schools, and the public, as appropriate.

Conclusion: Linked SIS-1 and CoMMinS data represent a unique and rich resource for understanding the impact of SARS-CoV-2 infection on children's health, benefiting from enhanced SARS-CoV-2 testing and ability to assess a wide range of outcomes.

Copyright: © 2024 Looker et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

  • COVID-19* / diagnosis
  • COVID-19* / epidemiology
  • Case-Control Studies
  • England / epidemiology
  • Retrospective Studies
  • Risk Factors
  • SARS-CoV-2* / isolation & purification

Attack Rate, Case Fatality Rate and Predictors of Pertussis Outbreak During Pertussis Outbreak Investigation in Ethiopia: Systematic Review and Meta-Analysis

  • Review Article
  • Open access
  • Published: 15 May 2024

  • Mengistie Kassahun Tariku 1 ,
  • Abebe Habtamu Belete 1 ,
  • Daniel Tarekegn Worede 1 ,
  • Simachew Animen Bante 2 ,
  • Agumas Alemu Alehegn 3 ,
  • Biniam Kebede Assen 3 ,
  • Bantayehu Addis Tegegne 4 &
  • Sewnet Wongiel Misikir 5  

Pertussis, a highly contagious, vaccine-preventable respiratory infection caused by Bordetella pertussis, is a leading global public health issue. Ethiopia is currently conducting multiple pertussis outbreak investigations, but there is a lack of comprehensive information on attack rate, case fatality rate, and infection predictors. This study aimed to measure attack rates, case fatality rates, and factors associated with pertussis outbreak.

This systematic review and meta-analysis covered published and unpublished observational studies of pertussis outbreaks in Ethiopia from 2009 to 2023 and followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline. The sources searched included Science Direct, MEDLINE/PubMed, African Journals Online, Google Scholar, and registers. The data were collected in an Excel spreadsheet and then exported to STATA version 17 for analysis. Subgroup analysis was conducted to identify potential disparities. A random effects model was used to account for heterogeneity among studies, and the I² statistic was used to assess heterogeneity. The attack rate, case fatality rate, and odds ratio (OR) were presented using forest plots with 95% confidence intervals. Egger’s and Begg’s tests were used to evaluate publication bias.

Seven pertussis outbreak investigations with a total of 2824 cases and 18 deaths were incorporated. The pooled attack and case fatality rates were 10.78 (95% CI: 8.1–13.5) per 1000 population and 0.8% (95% CI: 0.01–1.58%), respectively. The highest and lowest attack rates were in the Oromia region (55.7 per 1000 population) and in the Amhara region (2.61 per 1000 population), respectively. Predictors of pertussis outbreak were being unvaccinated [odds ratio (OR) = 3.05, 95% CI: 1.83–4.27] and contact history [OR = 3.44, 95% CI: 1.69–5.19].

High attack and case fatality rates, with notable variation between outbreaks, were reported. Being unvaccinated and having a contact history were the predictors of contracting pertussis in Ethiopia. Routine vaccination and contact tracing efforts should be strengthened.

1 Introduction

A pertussis outbreak investigation involves identifying and confirming suspected outbreaks through prompt and intelligent use of appropriate procedures to contain the outbreak [ 1 ]. The aim of the pertussis outbreak investigation is to assess the outbreak’s magnitude, locate the source and population at risk, and initiate prompt case management in order to lower morbidity and mortality [ 2 ].

Pertussis (whooping cough) is a highly communicable respiratory infection caused by Bordetella pertussis [ 3 ]. A suspected pertussis case is a cough lasting 14 days or more, or a cough of any duration with paroxysms or whoop. A confirmed case is one that meets the clinical case definition and is linked epidemiologically or directly to a laboratory-confirmed case [ 4 , 5 ]. Pertussis has three stages: catarrhal, paroxysmal, and convalescent. Grasping, fever, congestion, and sneezing are all part of the catarrhal stage. A severe cough, cyanosis, and rapid coughing are symptoms of the paroxysmal stage. During the convalescent stage, the cough lessens in intensity over two to three weeks and eventually goes away [ 1 , 6 , 7 ].

Pertussis can be transmitted through respiratory contact with infected individuals’ secretions, up to two weeks before and after symptoms appear [ 8 , 9 ]. There are no known sources of pertussis in animals, insects, or vectors; humans are the only known reservoir for the disease [ 10 , 11 ].

Globally, over 150,000 pertussis cases are reported and 160,700 children die annually. Africa has the highest case and death rates globally, accounting for 33% of cases and 58% of deaths [ 12 ]. In 2016, Ethiopia reported 4,719 confirmed pertussis cases and 9 deaths [ 13 ].

Pertussis cases and outbreaks are primarily influenced by factors such as living in close proximity to an infected person, waning immunity post-vaccination, and not being immunized [ 10 , 14 ]. Among people living in the same household as a case, 80–100% of those without diphtheria, pertussis and tetanus (DPT) vaccination are susceptible on exposure, compared with 20% of those who are immunized [ 15 ].

The highest likelihood of pertussis-related morbidity and mortality is present in infants and early childhood [ 16 ]. Young unvaccinated infants, under-vaccinated preschool children, and those under 6 months old are at higher risk for severe complications related to pertussis [ 17 ].

Vaccination is the most effective method for preventing pertussis in all age groups [ 18 ]. High vaccine coverage leads to high protection in children under five, while minor reductions can increase cases [ 1 , 19 ]. Three doses of the pertussis vaccine, when completed, prevent 95% of deaths and 80% of cases. It has been demonstrated that incomplete vaccination can prevent severe morbidity; mortality is reduced by 50% and 80%, respectively, after one and two doses [ 1 , 20 ].

Most African countries, except Morocco and Rwanda, have varying DPT3 coverage by 25%, with Ethiopia, Somalia, and Angola having low coverage and high dropout rates [ 21 ].

Ethiopia’s national vaccination strategy includes Pentavalent vaccines at 6, 10, and 14 weeks, but not pertussis booster. Despite good coverage, recent outbreaks of pertussis have been reported in various localities [ 22 ].

In 2009 and 2017, Ethiopia introduced advanced field epidemiology and frontline programs, respectively, to improve outbreak identification and investigation [ 23 ]. Meta-analysis is a widely used tool that integrates findings from various studies to inform decision-making in evidence-based medicine [ 24 ]. Ethiopia is currently conducting multiple pertussis outbreak investigations, but there is a lack of comprehensive information on attack rate, case fatality rate, and infection predictors. Therefore, this study aimed to measure the pooled attack rates, case fatality rates, and factors associated with pertussis outbreak.

2.1 Study Design and Searching Methods

A systematic review and meta-analysis of published and unpublished studies on pertussis (whooping cough) outbreaks was conducted from December 1 to 25, 2023, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline. The databases of Science Direct, MEDLINE/PubMed, African Journals Online, and Google Scholar were searched for published studies. The search terms were “Whooping cough or pertussis or Bordetella pertussis and attack rate or incidence and case fatality rate or mortality and determinants or risk factors and outbreak or epidemic and investigation or study or search and Ethiopia.”

2.2 Study Selection and Eligibility Criteria

All published or unpublished articles on pertussis (whooping cough) outbreak investigations in Ethiopia were included in this systematic review and meta-analysis. The included studies used observational study designs. Studies had to be written in English and accessible online between 2009 and 2023. Outbreak investigations without attack rate or case fatality rate measurement were excluded.

2.3 Measurement of Outcomes

The outcome variables were the pertussis attack rate and the case fatality rate. The attack rate is the number of cases of the disease divided by the total population at risk, multiplied by 100; the case fatality rate is the number of deaths from the disease divided by the total number of cases, multiplied by 100 [ 25 ]. Factors influencing the likelihood of getting a Bordetella pertussis infection were additional outcome variables. The factors examined were contact history and vaccination status, reported as odds ratios with 95% CIs.
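The two outcome definitions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names and the `per` argument are assumptions, and the sample outbreak figures are hypothetical. The text defines the multiplier as 100, while the results in this article are reported per 1000 population, so the multiplier is left configurable.

```python
def attack_rate(cases, population_at_risk, per=1000):
    """Cases divided by the population at risk, scaled by `per`."""
    if population_at_risk <= 0:
        raise ValueError("population_at_risk must be positive")
    return cases / population_at_risk * per


def case_fatality_rate(deaths, cases):
    """Deaths divided by cases, expressed as a percentage."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return deaths / cases * 100


# Hypothetical outbreak: 120 cases among 15,000 people at risk, 2 deaths.
ar = attack_rate(120, 15_000)
cfr = case_fatality_rate(2, 120)
print(f"AR = {ar:.1f} per 1000, CFR = {cfr:.2f}%")
```

Passing `per=100` reproduces the percentage form given in the definition.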

2.4 Quality Assessment and Data Extraction

Two authors independently examined titles and abstracts, reviewed full-length articles, and evaluated the quality of studies to include or exclude. To ensure transparent communication and thorough analysis, the team then convened with a third author to arrive at a consensus decision. The Joanna Briggs Institute quality check tool was used to assess each study’s quality [ 26 ]. The eight checklist items in this tool evaluate the following aspects of research quality: (1) evaluating inclusion and exclusion criteria; (2) summarizing the study subject and setting; (3) measuring outcome; (4) measuring exposure; (5) identifying confounding factors; (6) confounding factor control strategies; (7) suitable statistical analysis; and (8) objective and standard criteria applied. Seven studies that received a score of at least six out of eight were considered suitable for inclusion in the systematic review and meta-analysis.

Prior to collecting data, various kinds of literature were reviewed in order to adapt a standard tool. Two authors worked independently to develop the data extraction tool, and two more authors made revisions. The tool was approved by all authors prior to data collection. Author names, publication years, study designs, study periods, study settings, sample sizes, descriptive data analysis (attack rate, case fatality rate, outbreak duration, and vaccination status), and Pertussis/ Whooping cough outbreak factors were among the details included in the data extraction tool.

2.5 Data Management and Analysis

To perform the meta-analysis, data were gathered and arranged in an Excel spreadsheet and then imported into STATA version 17. The primary objective of the systematic review and meta-analysis was the thorough evaluation and consolidation of attack rates, case fatality rates, and variables influencing Bordetella pertussis infection.

The attack rate, case fatality rate, and standard error data from each study were used to calculate the pooled attack and case fatality rates as well as the corresponding 95% confidence intervals (CI). The findings were visually represented using forest plots, which showed the odds ratios (OR) corresponding to variables linked to Bordetella pertussis infection as well as the 95% confidence intervals (CI) for attack and case fatality rates.

Subgroup analysis, which took into account variables like study design and geography, was done to look into any potential disparities. The meta-analysis used a random effects model to take into consideration the heterogeneity among the included studies.
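As a rough illustration of random effects pooling, the sketch below implements the DerSimonian-Laird estimator in plain Python. The source reports using STATA version 17 and does not say which between-study variance estimator was applied, so the choice of DerSimonian-Laird, the function name, and the input values are all assumptions made for illustration.

```python
import math


def dersimonian_laird(estimates, std_errors, z=1.96):
    """Pool study estimates with a DerSimonian-Laird random effects model.

    Returns (pooled estimate, (lower, upper) confidence limits, tau squared).
    """
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two studies with matching inputs")

    # Fixed effect (inverse variance) weights and pooled estimate.
    w = [1 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w

    # Cochran's Q and the method-of-moments between-study variance.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random effects weights add tau squared to each study variance.
    w_re = [1 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    return pooled, (pooled - z * se_pooled, pooled + z * se_pooled), tau2


# Hypothetical attack rates per 1000 population with their standard errors.
pooled, ci, tau2 = dersimonian_laird([2.5, 7.0, 17.1], [0.4, 0.9, 1.5])
print(f"pooled = {pooled:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

When the studies disagree by more than their sampling error predicts, tau squared grows and the weights become more equal, which is why a random effects estimate can differ noticeably from a fixed effect one.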

The I² statistic and its associated p-value were used to assess heterogeneity among the studies. A p-value less than 0.05 was taken to indicate heterogeneity. I² values of 25%, 50%, and 75% were taken to indicate low, moderate, and high heterogeneity, respectively [ 27 ]. Potential publication bias was evaluated using Egger’s and Begg’s tests, with a p-value of less than 0.05 indicating significance [ 28 ].
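The heterogeneity measure described above can be computed from Cochran's Q. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the p-value for Q is omitted because it needs a chi-square distribution function, and treating 25%, 50%, and 75% as lower cut-offs for each label (with "negligible" below 25%) is one possible reading of the thresholds in the text.

```python
def cochran_q_and_i2(estimates, std_errors):
    """Return Cochran's Q and the I-squared statistic (as a percentage)."""
    k = len(estimates)
    if k < 2 or k != len(std_errors):
        raise ValueError("need at least two studies with matching inputs")
    w = [1 / se ** 2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1
    # I-squared is the share of Q in excess of its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


def heterogeneity_label(i2):
    """Map I-squared to a label using the 25/50/75% thresholds (assumed cut-offs)."""
    if i2 >= 75:
        return "high"
    if i2 >= 50:
        return "moderate"
    if i2 >= 25:
        return "low"
    return "negligible"


# Hypothetical study estimates and standard errors.
q, i2 = cochran_q_and_i2([2.5, 7.0, 17.1], [0.4, 0.9, 1.5])
print(f"Q = {q:.1f}, I2 = {i2:.1f}% ({heterogeneity_label(i2)})")
```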

3.1 Study Selection

We were able to obtain a total of 242 records through electronic database searches. One hundred thirty investigations were removed from consideration after a preliminary screening process that evaluated titles, abstracts, and full article reviews. Following this, eligibility was assessed for 25 articles, of which 18 were rejected because of insufficient reporting. In the end, seven studies satisfied the requirements and were included in the systematic review and the meta-analysis (Fig.  1 ).

Fig. 1 Flow diagram of pertussis outbreak investigation included in systematic review and meta-analysis in Ethiopia, 1980–2023

3.2 Characteristics of Included Investigations

Seven pertussis (whooping cough) outbreak investigations were included in this systematic review and meta-analysis, with a total of 2824 confirmed cases and 18 deaths recorded (ages 1 month to 51 years) [ 4 , 5 , 29 , 30 , 31 , 32 , 33 ]. Of the seven studies, four had a case-control design and the other three had a descriptive cross-sectional design. Five investigations were conducted in the Amhara region [ 4 , 5 , 30 , 32 , 33 ], one in the Southern Nations, Nationalities, and Peoples' Region (SNNPR) [ 29 ], and one in the Oromia region [ 31 ]. The included investigations were conducted from 2017 to 2023, and the outbreaks lasted from 21 to 149 days (Table 1 ).

3.3 Study Bias Assessment Results

Every article was carefully evaluated: studies scoring 8 out of 8 were classified as good quality, and studies scoring 6 or 7 were classified as medium risk. No study was excluded on the basis of this appraisal. Cross-sectional and case-control studies were assessed using the following criteria (Table  2 ).

3.4 Attack Rate

The systematic review and meta-analysis yielded a pooled attack rate (AR) of 10.78 (95% CI: 8.1–13.5) per 1000 population [ 4 , 5 , 30 , 31 , 32 , 33 , 34 ] (Fig.  2 ). Heterogeneity was significant and high (p-value < 0.0001; I² = 99.81%, 95% CI: 99.78–99.84%) (Fig.  3 ). Noticeable publication bias was found, based on Egger’s test (P < 0.0001) and the funnel plot (Fig.  4 ).

Fig. 2 Forest plot of attack rate of pertussis outbreak in Ethiopia, 2009–2023

Fig. 3 Galbraith plot with 95% CI of precision of attack rate during pertussis outbreak investigations in Ethiopia, 2009–2023

Fig. 4 Funnel plot with pseudo 95% CI to assess publication bias of the meta-analysis of attack rate during pertussis outbreak investigations in Ethiopia, 2009–2023

3.5 Case Fatality Rate

The pooled CFR for this study was 0.8% (95% CI: 0.01–1.58%) [ 4 , 28 , 30 , 31 ] (Fig.  5 ). Moderate heterogeneity was present (p-value = 0.05; I² = 61.47%, 95% CI: 0–87.1%) (Fig.  6 ). Publication bias was detected, based on Egger’s test (p-value = 0.015) and the funnel plot (Fig.  7 ).

Fig. 5 Forest plot of case fatality rate of pertussis outbreak in Ethiopia, 2009–2023

Fig. 6 Galbraith plot with 95% CI of precision of case fatality rate during pertussis outbreak investigations in Ethiopia, 2009–2023

Fig. 7 Funnel plot with pseudo 95% CI to assess publication bias of the meta-analysis of case fatality rate during pertussis outbreak investigations in Ethiopia, 2009–2023

3.6 Subgroup-Analysis

Based on a subgroup analysis of attack rates by region, the Oromia region had the highest attack rate, with 55.7 (95% CI: 50.6–60.7) per 1000 population. The Amhara region had the lowest reported AR, 2.61 (95% CI: 1.81–3.40) per 1000 population. A higher attack rate was reported in the cross-sectional study design: 26.30 (95% CI: 11.35–41.26) per 1000 population (Table  3 ).

3.7 Predictors of Pertussis Infection

Three outbreak investigations found a significant association between vaccination status (vaccinated or unvaccinated) and the likelihood of contracting pertussis [ 4 , 5 ]. According to the meta-analysis, individuals who had not received the pertussis vaccination had about threefold higher odds of getting the illness (OR = 3.05, 95% CI: 1.83–4.27). No heterogeneity was detected (I² = 0, P-value = 0.6623).

Contact history was found to be a significant risk factor for contracting pertussis in three outbreak investigations [ 4 , 5 , 30 ]. Individuals with a contact history were more than 3 times as likely to contract pertussis (OR = 3.44, 95% CI: 1.69–5.19). The heterogeneity test showed no heterogeneity (I² = 0, P-value = 0.6250) (Table  4 ).
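For readers unfamiliar with how an odds ratio such as those above is obtained from a single case-control study, the following sketch computes one from a 2×2 table with a log-scale (Woolf) confidence interval. The counts are hypothetical and the function name is an assumption; the source does not describe how the individual investigations computed their intervals.

```python
import math


def odds_ratio(exposed_cases, exposed_controls,
               unexposed_cases, unexposed_controls, z=1.96):
    """Odds ratio from a 2x2 table with a Woolf (log-scale) confidence interval."""
    cells = (exposed_cases, exposed_controls, unexposed_cases, unexposed_controls)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive")
    estimate = (exposed_cases * unexposed_controls) / (exposed_controls * unexposed_cases)
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se_log = math.sqrt(sum(1 / n for n in cells))
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical counts: exposure = unvaccinated.
# 30 unvaccinated cases, 20 unvaccinated controls,
# 20 vaccinated cases, 40 vaccinated controls.
or_, lo, hi = odds_ratio(30, 20, 20, 40)
print(f"OR = {or_:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
```

An interval that excludes 1 indicates a statistically significant association at the chosen confidence level.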

4 Discussion

To assess the total evidence from pertussis outbreak investigations in Ethiopia, a systematic review and meta-analysis was carried out. The pooled attack rate was estimated at 10.78 (95% CI: 8.06–13.51) per 1000 population. The highest and lowest attack rates were 55.682 per 1000 population [ 31 ] and 0.864 per 1000 population [ 5 ], respectively. There could be several reasons for this variation in attack rate between outbreaks, including the population’s vaccination status, the duration of the outbreak, its range, and the time at which the intervention was launched. Differences may also exist in the response-to-action threshold and the duration of the outbreak investigation.

In four studies, children under five years old had the highest age-specific attack rate: 197.7/1000 [ 31 ], 73.6/1000 [ 30 ], 6.8/1000 [ 4 ], and 5.5/1000 [ 29 ] population. The age group of 5 to 9 years old had the highest attack rate (245 per 1000 population) according to one outbreak investigation [ 5 ]. Children under the age of four were the most affected group in another investigation [ 33 ].

A sex-specific attack rate was found to be similar in four outbreak investigations [ 5 , 30 , 31 , 32 ], whereas the highest attack rate was found in females in two outbreak investigations [ 4 , 33 ].

Six articles [ 4 , 5 , 29 , 31 , 32 , 33 ] used the epidemic curve to plot the outbreak over time, while the epidemic curve was not used to characterize the outbreak in terms of time in one outbreak investigation [ 30 ]. Between September and January, three distinct outbreaks had occurred [ 29 , 33 ].

Two outbreak investigations used place-specific attack rates but did not use a map [ 5 , 31 ], while three outbreak investigations did not use place-specific attack rates [ 4 , 30 , 33 ]. In one outbreak investigation, both a place-specific attack rate and a map were used [ 29 ]. The attack rate showed significant regional variation: between 0.864 and 7.5 cases per 1000 population in the Amhara region [ 4 , 5 , 30 , 32 , 33 ], 55.682 cases per 1000 population in the Oromia region [ 31 ], and 17.080 cases per 1000 population in the SNNPR [ 29 ]. According to subgroup analysis, the Amhara region had the lowest pooled attack rate (2.61 per 1000 population) [ 4 , 5 , 30 , 32 , 33 ], while the Oromia region had the highest pooled attack rate (55.682 per 1000 population) [ 31 ]. This discrepancy could result from differences in how quickly the outbreak was detected and the response started. In the Oromia region, the district health office and the zonal health department responded two weeks and two months, respectively, after the outbreak began [ 31 ]. Another explanation for this variation could be the different denominators used to calculate the attack rate. In the Oromia region, the population of the affected kebele, the smallest administrative unit, served as the denominator [ 31 ], whereas other outbreak investigations used the district population, which included people living outside the affected kebeles. This could inflate the AR in the Oromia region.
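The denominator effect described above is easy to see with a small worked example. The case count and population sizes below are hypothetical, chosen only to show how the same outbreak yields very different attack rates depending on whether the affected kebele or the whole district is used as the population at risk.

```python
def attack_rate_per_1000(cases, population):
    """Attack rate expressed per 1000 population."""
    return cases / population * 1000


# Hypothetical figures, not taken from any of the included investigations.
cases = 50
kebele_population = 2_000       # affected kebele only
district_population = 40_000    # whole district, including unaffected kebeles

ar_kebele = attack_rate_per_1000(cases, kebele_population)
ar_district = attack_rate_per_1000(cases, district_population)

print(f"kebele denominator:   {ar_kebele:.2f} per 1000")
print(f"district denominator: {ar_district:.2f} per 1000")
print(f"ratio: {ar_kebele / ar_district:.0f}x")
```

With identical case counts, the smaller denominator produces an attack rate 20 times higher in this example, which is why attack rates computed on different population bases are hard to compare directly.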

In subgroup analysis by study design, cross-sectional studies had the higher pooled attack rate, 26.30 (95% CI: 11.35–41.26) per 1000 population. This may be because case-control investigations focus primarily on identifying risk factors during data collection, which could compromise active case searching.

In this study, the pooled case fatality rate (CFR) was 0.8% (95% CI: 0.01–1.58%). The CFR in four outbreaks [ 4 , 29 , 31 , 32 ] ranged from 0.33 to 3.72%, while in three other outbreak investigations it was zero [ 5 , 30 , 33 ]. The variation may be attributed to differences in outbreak duration. The Amhara region experienced an outbreak with a CFR of 3.72% that lasted 112 days [ 4 ], the Oromia region an outbreak with a CFR of 0.68% that lasted 85 days [ 31 ], and the SNNPR an outbreak with a CFR of 0.33% that lasted 149 days [ 29 ]. The remaining outbreaks, which had zero CFR, lasted 60 days [ 30 ], 30 days [ 33 ] and 21 days [ 5 ]. A difference in when interventions started could be another factor. Three deaths were recorded in the Oromia region before outbreak control and public health response measures began. Additional deaths might have been avoided if clinical case management had started sooner [ 31 ].

In two outbreak investigations, the highest CFR was observed in the 5–9 year age group, 6.3% [ 4 ] and 1.4% [ 31 ]. In another outbreak investigation, the age group ≤ 5 years had the highest CFR, 0.87% [ 29 ]. In one unpublished outbreak investigation, the highest CFR was reported in the age group < 1 year, 17% [ 32 ]. This variation might be due to differences in immunization status across age groups.

In three outbreak investigations, females had the highest CFR: 2% [ 32 ], 0.91% [ 31 ] and 4.34% [ 4 ].

Childhood pertussis vaccination provides limited protection when incomplete, but a completed three-dose series prevents 95% of deaths and 80% of cases [ 1 , 19 ]. Individuals in the same household who have not received the diphtheria, pertussis and tetanus (DPT) vaccine are 80–100% susceptible upon exposure [ 13 ]. This study found that individuals who had not received the pertussis vaccination had a nearly threefold increased risk of contracting the disease [ 4 , 5 , 32 ]. Two outbreak investigations reported that all cases had unknown vaccination status [ 32 , 34 ], while in two other investigations, 41% of cases had completed three doses of DPT (DPT3) [ 5 , 30 ]. A study in Janamora district, Amhara region, found that 86.6% of cases were unvaccinated [ 30 ], while 51.2% had completed DPT3 in Mekdela district, South Wollo zone, Amhara region [ 4 ]. Another study in Mahal Saynt district, South Wollo Zone, Amhara Region, showed that 34.29% of cases were not vaccinated [ 32 ]. In the Oromia investigation, the officially reported vaccination coverage in the study setting was 100% [ 31 ]. In four outbreak investigations, there was no regular routine immunization service and kebele health posts did not have functional refrigerators for storing vaccines [ 4 , 30 , 31 , 33 ]. In the SNNPR study, the investigation team did not find tools for continuously recording temperature [ 29 ].

Pertussis cases and outbreaks are primarily influenced by factors such as living in close proximity to an infected person [ 10 , 13 ]. Individuals with a contact history were more than three times as likely to contract pertussis as those with no contact history. This is consistent with a study conducted in Australia [ 34 ].

Although this study provides consolidated evidence on pertussis outbreak investigations in Ethiopia, it has some limitations. First, it included only pertussis outbreak investigations that were written in English and accessible online, which might under- or overestimate the attack rate and case fatality rate. Second, the cross-sectional and case-control study designs employed in all of the included outbreak investigations restrict the ability to evaluate causal relationships.
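As an illustration of how an association of this size is typically quantified in a case-control investigation, the following Python sketch computes an odds ratio and an approximate 95% confidence interval from a 2×2 table using the standard log (Woolf) method. The counts are hypothetical and the function name `odds_ratio_with_ci` is an assumption for this example; the sketch does not reproduce the pooled analysis of this review.

```python
import math


def odds_ratio_with_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and Woolf (log-based) confidence interval for a 2x2 table.

    a: exposed cases       b: exposed controls
    c: unexposed cases     d: unexposed controls
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the log method")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: exposure = history of contact with a pertussis case.
or_value, ci_lower, ci_upper = odds_ratio_with_ci(a=30, b=15, c=20, d=35)
print(f"OR = {or_value:.2f} (95% CI: {ci_lower:.2f}-{ci_upper:.2f})")
```

An interval that excludes 1 indicates an association that is statistically significant at the chosen level; with sparse cells, exact methods or a continuity correction may be preferable.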

5 Conclusion

Our results showed attack and case fatality rates that were higher than those in the 2023 Provisional Pertussis Surveillance Report and that differed significantly between the study regions. In Ethiopia, the risk factors for contracting pertussis were not being vaccinated and having a history of contact. The overarching measures are to enhance routine vaccination and strengthen contact tracing.

Data Availability

The data sets generated during the current study are available from the corresponding author upon reasonable request.

Abbreviations

AR: Attack rate

CFR: Case fatality rate

CI: Confidence interval

DPT: Diphtheria, pertussis, tetanus

OR: Odds ratio

References

Blain A, Tiwari T. Manual for the Surveillance of Vaccine-Preventable Diseases. Atlanta, GA: US Department of Health and Human Services. 2017;500:2017.

Forsyth K, Tan T, von König C-HW, Caro JJ, Plotkin S. Potential strategies to reduce the burden of pertussis. Pediatr Infect Dis J. 2005;24(5):S69–74.

Trainor EA, Nicholson TL, Merkel TJ. Bordetella pertussis transmission. Pathogens Disease. 2015;73(8).

Alamaw SD, Kassa AW, Gelaw YA. Pertussis outbreak investigation of Mekdela district, south Wollo Zone, Amhara region, north-west Ethiopia. BMC Res Notes. 2017;10:1–7.

Yeshanew AG, Lankir D, Wondimu J, Solomon S. Pertussis outbreak investigation in Northwest Ethiopia: a community based study. PLoS ONE. 2022;17(2):e0263708.

Kline JM, Lewis WD, Smith EA, Tracy LR, Moerschel SK. Pertussis: a reemerging infection. Am Family Phys. 2013;88(8):507–14.

World Health Organization. WHO vaccine-preventable diseases: monitoring system: 2009 global summary. World Health Organization; 2009.

Centers for Disease Control and Prevention. Manual for the surveillance of vaccine-preventable diseases. Atlanta: Centers for Disease Control and Prevention, 1997. 2003.

Gopal DP, Barber J, Toeg D. Pertussis (whooping cough). BMJ. 2019;364:l401.

Hamborsky J, Kroger A. Epidemiology and prevention of vaccine-preventable diseases. E-Book: The Pink Book: Public Health Foundation; 2015.

Ryu S, Kim JJ, Chen M-Y, Jin H, Lee HK, Chun BC. Outbreak investigation of pertussis in an elementary school: a case-control study among vaccinated students. Clin Experimental Vaccine Res. 2018;7(1):70–5.

Yeung KHT, Duclos P, Nelson EAS, Hutubessy RCW. An update of the global burden of pertussis in children younger than 5 years: a modelling study. Lancet Infect Dis. 2017;17(9):974–80.

Taye S, Tessema B, Gelaw B, Moges F. Assessment of pertussis vaccine protective effectiveness in children in the Amhara regional state, Ethiopia. International journal of microbiology. 2020;2020.

Wensley A, Hughes G, Campbell H, Amirthalingam G, Andrews N, Young N, et al. Risk factors for pertussis in adults and teenagers in England. Epidemiol Infect. 2017;145(5):1025–36.

Hughes MM, Englund JA, Kuypers J, Tielsch JM, Khatry SK, Shrestha L, et al. Population-based pertussis incidence and risk factors in infants less than 6 months in Nepal. J Pediatr Infect Dis Soc. 2017;6(1):33–9.

Calvert A, Karampelas K, Andrews N, England A, Hallis B, Jones CE et al. Optimising the timing of whooping cough immunisation in MUMs (OpTIMUM): a randomised controlled trial investigating the timing of pertussis vaccination in pregnancy. Lancet Microbe. 2021.

Baxter R, Bartlett J, Rowhani-Rahbar A, Fireman B, Klein NP. Effectiveness of pertussis vaccines for adolescents and adults: case-control study. BMJ. 2013;347.

Powell-Jackson T, Fabbri C, Dutt V, Tougher S, Singh K. Effect and cost-effectiveness of educating mothers about childhood DPT vaccination on immunisation uptake, knowledge, and perceptions in Uttar Pradesh, India: a randomised controlled trial. PLoS Med. 2018;15(3):e1002519.

Crowcroft N, Stein C, Duclos P, Birmingham M. How best to estimate the global burden of pertussis? Lancet Infect Dis. 2003;3(7):413–8.

Mosser JF, Gagne-Maynard W, Rao PC, Osgood-Zimmerman A, Fullman N, Graetz N, et al. Mapping diphtheria-pertussis-tetanus vaccine coverage in Africa, 2000–2016: a spatial and temporal modelling study. Lancet. 2019;393(10183):1843–55.

Wolter N, Cohen C, Tempia S, Walaza S, Moosa F, du Plessis M, et al. Epidemiology of Pertussis in individuals of all ages hospitalized with respiratory illness in South Africa, January 2013—December 2018. Clin Infect Dis. 2021;73(3):e745–53.

Argaw MD, Desta BF, Tsegaye ZT, Mitiku AD, Atsa AA, Tefera BB, et al. Immunization data quality and decision making in pertussis outbreak management in southern Ethiopia: a cross sectional study. Archives Public Health. 2022;80(1):49.

Kebebew T, Woldetsadik MA, Barker J, Cui A, Abedi AA, Sugerman DE, et al. Evaluation of Ethiopia’s field epidemiology training program–frontline: perspectives of implementing partners. BMC Health Serv Res. 2023;23(1):406.

Bello A, Wiebe N, Garg A, Tonelli M. Evidence-based decision-making 2: systematic reviews and meta-analysis. Clin Epidemiology: Pract Methods. 2015:397–416.

Muller A, Leeuwenburg J, Pratt D. Pertussis: epidemiology and control. Bull World Health Organ. 1986;64(2):321.

Porritt K, Gomersall J, Lockwood C. JBI’s systematic reviews: study selection and critical appraisal. AJN Am J Nurs. 2014;114(6):47–52.

Sedgwick P. Meta-analyses: what is heterogeneity? BMJ. 2015;350.

Furuya-Kanamori L, Xu C, Lin L, Doan T, Chu H, Thalib L, et al. P value–driven methods were underpowered to detect publication bias: analysis of Cochrane review meta-analyses. J Clin Epidemiol. 2020;118:86–92.

Mitiku AD, Argaw MD, Desta BF, Tsegaye ZT, Atsa AA, Tefera BB, et al. Pertussis outbreak in southern Ethiopia: challenges of detection, management, and response. BMC Public Health. 2020;20:1–12.

Almaw L, Bizuneh H. Pertussis outbreak investigation in Janamora district, Amhara Regional State, Ethiopia: a case-control study. Pan Afr Med J. 2019;34(1).

Badeso MH, Kalili FS, Bogale NB. Pertussis outbreak investigation in Likimsa-Bokore kebele, Meda Walebu district, Bale zone, Oromia region, Ethiopia, 2019. 2021.

Seid Mohammed ZA. Pertussis outbreak investigation in Mahal Saynt District, South Wollo Zone, Amhara Region, Ethiopia. 2023.

Wagaye FE, Asrat A, Shimekaw B, Hassen M, Terefe W, Gelaw A, et al. Pertussis outbreak investigation in south Gondar zone, Northwest, Ethiopia. Public Health. 2023;9(2):1–5.

Kovitwanichkanont T. Public health measures for pertussis prevention and control. Aust N Z J Public Health. 2017;41(6).

Acknowledgements

We would like to acknowledge all authors of the investigations included in this systematic review and meta-analysis.

This work was funded by the authors.

Author information

Authors and affiliations.

Department of Public Health, College of Health Science, Debre Markos University, Debre Markos, Ethiopia

Mengistie Kassahun Tariku, Abebe Habtamu Belete & Daniel Tarekegn Worede

Department of Midwifery, College of Medicine and Health Science, Bahir Dar University, Bahir Dar, Ethiopia

Simachew Animen Bante

Amhara Regional State Public Health Institute, Bahir Dar, Ethiopia

Agumas Alemu Alehegn & Biniam Kebede Assen

Department of Pharmacy, College of Health Science, Debre Markos University, Debre Markos, Ethiopia

Bantayehu Addis Tegegne

Department of Medical Laboratory Technology, Felege Hiwot Comprehensive Specialized Hospital, Bahir Dar, 680, Ethiopia

Sewnet Wongiel Misikir

Contributions

The primary manuscript text was written by M.K. and S.W.; A.H. and B.A. generated Figs. 2, 3, 4, 5, 6 and 7 and Tables 1, 2, 3 and 4. D.T. and B.K. created and revised the PRISMA diagram. S.A. and A.A. drafted the abstract with input from all authors. All authors subsequently reviewed the manuscript.

Corresponding author

Correspondence to Mengistie Kassahun Tariku.

Ethics declarations

Ethics approval and consent to participate.

Not applicable.

Consent for Publication

Competing interests.

The authors declare that they have no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Tariku, M.K., Belete, A.H., Worede, D.T. et al. Attack Rate, Case Fatality Rate and Predictors of Pertussis Outbreak During Pertussis Outbreak Investigation in Ethiopia: Systematic Review and Meta-Analysis. J Epidemiol Glob Health (2024). https://doi.org/10.1007/s44197-024-00234-4

Received: 20 March 2024

Accepted: 22 April 2024

Published: 15 May 2024

DOI: https://doi.org/10.1007/s44197-024-00234-4

Big Data and Infectious Disease Epidemiology: Bibliometric Analysis and Research Agenda

Lateef Babatunde Amusa

1 Centre for Applied Data Science, University of Johannesburg, Johannesburg, South Africa

2 Department of Statistics, University of Ilorin, Ilorin, Nigeria

Hossana Twinomurinzi

Edith Phalane

3 Pan African Centre for Epidemics Research (PACER) Extramural Unit, South African Medical Research Council/University of Johannesburg, Johannesburg, South Africa

4 Department of Environmental Health, Faculty of Health Sciences, University of Johannesburg, Johannesburg, South Africa

Refilwe Nancy Phaswana-Mafuya

Infectious diseases represent a major challenge for health systems worldwide. With the recent global pandemic of COVID-19, the need to research strategies to treat these health problems has become even more pressing. Although the literature on big data and data science in health has grown rapidly, few studies have synthesized these individual studies, and none has identified the utility of big data in infectious disease surveillance and modeling.

The aim of this study was to synthesize research and identify hotspots of big data in infectious disease epidemiology.

Bibliometric data from 3054 documents that satisfied the inclusion criteria retrieved from the Web of Science database over 22 years (2000-2022) were analyzed and reviewed. The search retrieval occurred on October 17, 2022. Bibliometric analysis was performed to illustrate the relationships between research constituents, topics, and key terms in the retrieved documents.

The bibliometric analysis revealed internet searches and social media as the most utilized big data sources for infectious disease surveillance or modeling. The analysis also placed US and Chinese institutions as leaders in this research area. Disease monitoring and surveillance, utility of electronic health (or medical) records, methodology framework for infodemiology tools, and machine/deep learning were identified as the core research themes.

Conclusions

Proposals for future studies are made based on these findings. This study will provide health care informatics scholars with a comprehensive understanding of big data research in infectious disease epidemiology.

Introduction

Globally, the infectious disease burden continues to be substantial in low-income and lower-middle-income countries, while morbidity and mortality related to neglected tropical diseases, HIV infection, tuberculosis, and malaria remain high. Tuberculosis and malaria are endemic to many areas, imposing substantial but steady burdens. At the same time, other infections such as influenza fluctuate in pervasiveness and intensity, disrupting developing and developed settings alike when an outbreak or epidemic occurs. Additionally, deaths due to emerging and reemerging infectious diseases, as compared with seasonal and endemic infections, have persisted throughout the 21st century. This portrays a new era of infectious disease, defined by outbreaks of emerging, reemerging, and endemic pathogens that spread quickly with the help of global mobility and climate change [ 1 ].

Moreover, the risk from infectious diseases is globally shared. While infectious diseases thrive in underresourced settings, inequalities and inequities in accessing health and health care create a favorable environment for infectious diseases to spread [ 2 , 3 ]. Addressing inequalities and inequities in accessing health care, and improving surveillance and monitoring of infectious diseases should be prioritized to minimize the emergence and spread of infections.

Recent years have witnessed the rapid emergence of big data and data science research, propelled by the increasing availability of digital traces [ 4 ]. The growing volume of electronic records and passive data generated by social media, the internet, and other digital sources can be mined for pattern discovery and knowledge extraction. Like most buzzwords, big data has no straightforward meaning and its definition is evolving. Broadly, big data refers to a large volume of structured or unstructured data, with largeness itself associated with three major terms known as the “3 Vs”: volume (large quantity), velocity (coming in at unprecedented real-time speeds), and variety (increasing collection from different data sources). Additional characteristics of big data include veracity, validity, volatility, and value [ 5 ]. In epidemiology and infectious diseases research, the last decade has seen a significant spike in the number of studies using digital epidemiology and big data tools to enhance health systems in terms of disease surveillance, modeling, and evidence-based responses [ 4 , 6 - 8 ]. Digital epidemiology uses digital data or online sources to gain insight into disease dynamics and health equity, and to inform public health programs and policies [ 9 , 10 ].

The success of infectious disease control relies heavily on surveillance systems tracking diseases, pathogens, and clinical outcomes [ 11 ]. However, conventional surveillance systems are known to frequently have severe time lags and limited spatial resolution; therefore, surveillance systems that are robust, local, and timely are critically needed. It is crucial to monitor and forecast emerging and reemerging infections [ 12 ] such as severe acute respiratory syndrome, pandemic influenza, Ebola, Zika, and drug-resistant pathogens, especially in resource-limited settings such as low-middle–income countries. Using big data to strengthen surveillance systems is critical for future pandemic preparedness. This approach provides big data streams that can be triangulated with spatial and temporal data. These big data streams include digital data sources such as mobile health apps, electronic health (or medical) records, social media, internet searches, mobile phone network data, and GPS mobile data. Many studies have demonstrated the usefulness of real-time data in health assessments [ 13 - 18 ]. Some of these studies have been used explicitly for the monitoring and forecasting of epidemics such as COVID-19 [ 19 ], Zika [ 13 ], Ebola [ 16 ], and influenza [ 14 ].

The body of extant literature at the nexus of big data, epidemiology, and infectious diseases is rapidly growing. However, despite its growth and dispersion, there has been limited synthesis of the applications. A previous study [ 20 ] performed a bibliometric analysis focusing only on HIV. A bibliometric analysis is a statistical or quantitative analysis of large-scale bibliographic metadata (or metrics of published studies) on a given topic. These quantitative analyses detect patterns, networks, and trends among the bibliographic metadata [ 21 , 22 ]. Thus, the aim of this study was to examine the evolution of big data in epidemiology and infectious diseases and to identify gaps and opportunities for further research. The study findings reveal interesting patterns and can inform current research priorities and future directions in big data–driven infectious diseases research.

Study Design

A bibliometric analysis was performed to understand and explore research on big data in infectious disease modeling and surveillance. The adopted bibliometric methodology involved three main phases: data collection, data analysis, and data visualization and reporting [ 23 ].

Search Strategy

Regarding data collection, which entails querying and exporting data from selected databases, we queried the Web of Science (WoS) core databases for publications using specific inclusion and exclusion criteria. Compared to other databases, the WoS has been shown to have better-quality bibliometric information [ 23 , 24 ] and greater coverage of high-impact journals [ 25 ]. With the aid of domain experts from the fields of both big data and epidemiology, we iteratively developed a search strategy. The following search string was applied to the titles, abstracts, and keywords of all documents and returned 3235 publications in the WoS collection:

(Epidemic* OR “infectious disease*” OR “Disease surveillance” OR “disease transmission” OR “disease outbreak*” OR (“communicable disease*” NOT “non-communicable disease”) OR syndemic* OR HIV OR AIDS OR “human immunodeficiency virus” OR coronavirus* OR SARS-CoV-2 OR COVID-19 OR Influenza OR flu OR Zika OR Ebola OR MERS OR “Middle East respiratory syndrome” OR Tuberculosis OR “Monkey Pox” OR “Dengue virus” OR Hepatitis*)
(“BIG DATA” OR “web mining” OR “opinion mining” OR “Google Trend*” OR “Google search*” OR “Google quer*” OR “Internet search*” OR “Internet quer*” OR “search engine quer*” OR “Digital traces” OR “electronic health records” OR “Digital epidemiology”)
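A search string of this shape can be assembled programmatically, which makes the term lists easier to review and revise. The Python sketch below uses only a handful of the terms listed above for brevity. Two points are assumptions rather than details stated in the text: that the disease group and the big data group are combined with AND, and that the query is wrapped in the WoS topic field tag `TS=`. The helper name `or_group` is hypothetical.

```python
def or_group(terms):
    """Join search terms into one parenthesized OR group."""
    return "(" + " OR ".join(terms) + ")"


# Abbreviated term lists; the full lists appear in the search string above.
disease_terms = ["Epidemic*", '"infectious disease*"', '"disease outbreak*"', "Zika", "Ebola"]
big_data_terms = ['"BIG DATA"', '"web mining"', '"Digital epidemiology"']

# Assumption: the two groups are intersected with AND and searched as a topic query.
query = f"TS=({or_group(disease_terms)} AND {or_group(big_data_terms)})"
print(query)
```

Keeping the two concept groups as separate lists mirrors the structure of the strategy: a record must match at least one disease term and at least one big data term.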

Screening Strategy

Documents that were not written in English or were not peer-reviewed, including editorial materials, letters, meeting abstracts, news items, book reviews, and retracted publications, were removed from the data set given the focus on bibliometric analysis, leaving 3054 documents for the analytic sample ( Figure 1 ).

Figure 1. Flow chart of the literature selection process.

The 3054 bibliographic records were exported into the R package bibliometrix [ 23 ] for analysis. This package was used to conduct performance analysis and science mapping of big data in infectious disease epidemiology. Performance analysis evaluates the production and impact of research constituents, including authors, institutions, countries, and journals. Science mapping examines the relationships between the research constituents by analyzing the topic’s conceptual, intellectual, and social structure.

There are several metrics available for bibliometric analysis. In this study, the primary metrics used for evaluating productivity and influence were the H-index and M-index. The H-index is the largest number h such that h published papers have each been cited at least h times [ 26 ]. The H-index can be computed for different bibliometric units of analysis: authors, journals, institutions, and countries. The M-index adjusts the H-index for academic age (ie, it divides the H-index by the number of years since the researcher’s first publication). Other performance analysis metrics were obtained from yearly research output and citation counts. These metrics also contribute to identifying the main themes and the key actors in the research area.
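The two indices can be made concrete with a short Python sketch. The citation counts and years below are hypothetical, and the function names are illustrative only; the study itself obtained these metrics from the bibliometrix package. Treating an academic age of zero as one year is an assumption made here to avoid division by zero.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def m_index(citations, first_publication_year, current_year):
    """H-index divided by academic age in years (minimum age of 1 assumed)."""
    academic_age = max(1, current_year - first_publication_year)
    return h_index(citations) / academic_age


# Hypothetical author with seven papers, first published in 2017.
paper_citations = [25, 8, 5, 3, 3, 1, 0]
h = h_index(paper_citations)
m = m_index(paper_citations, first_publication_year=2017, current_year=2022)
print(f"H-index = {h}, M-index = {m:.2f}")
```

Because the M-index normalizes by career length, it allows a rough comparison between early-career and established researchers that the raw H-index does not.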

In terms of science mapping, network maps were constructed for some selected bibliographic units of analysis [ 27 ]. These networks exhibit frequency distributions of the involved bibliographic data over time. For instance, international collaborations can be explored by assessing same-country publications. A cocitation network analysis was also used to analyze publication references. In addition, using the Louvain clustering algorithm and a greedy optimization technique [ 28 ], a co-occurrence analysis was used to understand the conceptual structure of the research area. The basic purpose of co-occurrence analysis is to investigate the link between keywords based on the number of times they appear together in a publication. Notable research topics and over-time trends were detected by generating clusters for author-provided keywords [ 29 ]. VOSviewer [ 30 ] was used to construct the network visualizations. Each network node represents a research constituent (eg, author, country, institution, article, document source, keyword). The node’s size is proportional to the occurrence frequency of the relevant parameters. The degree of association is represented by the thickness of the link between nodes, and the various colors reflect distinct clusters.
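The counting step that underlies a keyword co-occurrence network can be sketched in Python as follows. The keyword lists are hypothetical, the function name `cooccurrence_counts` is illustrative, and lowercasing keywords before counting is an assumption about normalization. The clustering and visualization steps, performed in the study with the Louvain algorithm and VOSviewer, are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(keyword_lists):
    """Count how many publications contain each unordered pair of keywords."""
    counts = Counter()
    for keywords in keyword_lists:
        # Normalize and de-duplicate so each pair is counted once per publication.
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        for pair in combinations(unique, 2):
            counts[pair] += 1
    return counts


# Hypothetical author-provided keywords for three publications.
papers = [
    ["COVID-19", "big data", "machine learning"],
    ["big data", "COVID-19"],
    ["machine learning", "electronic health records"],
]
counts = cooccurrence_counts(papers)
for pair, n in counts.most_common():
    print(pair, n)
```

Each pair count corresponds to the weight of a link between two keyword nodes; a community detection algorithm is then applied to the weighted network to obtain thematic clusters.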

Descriptive Summary

The bibliographic data set comprises 3054 documents from 1600 sources, 14,351 authors, and 121,726 references. From the 3054 documents, 2666 (87.30%) were original research articles and the remaining 388 (12.70%) were review papers. The research output before 2009 was relatively low. The annual publication output during the 27 years (1995-2022) grew steadily, with a yearly growth rate of 26.5%. The publication growth increased steeply between 2013 and 2020 ( Figure 2 ). Table 1 presents the summary statistics of the primary characteristics of these 3054 publications, including the time span and information about documents and authors.
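The document-type shares above can be checked directly, and a yearly growth rate can be sketched as a compound annual growth rate. In the Python sketch below, the document counts come from the text, whereas the publication counts passed to `compound_annual_growth_rate` are hypothetical; whether the reported 26.5% was computed with this exact formula is an assumption.

```python
def percentage_share(part: int, total: int) -> float:
    """Share of `part` in `total`, in percent, rounded to two decimals."""
    return round(100 * part / total, 2)


def compound_annual_growth_rate(first_count: float, last_count: float, n_intervals: int) -> float:
    """Compound annual growth rate in percent over `n_intervals` yearly steps."""
    if first_count <= 0 or n_intervals <= 0:
        raise ValueError("first_count and n_intervals must be positive")
    return ((last_count / first_count) ** (1 / n_intervals) - 1) * 100


# Counts reported in the text.
share_articles = percentage_share(2666, 3054)
share_reviews = percentage_share(388, 3054)
print(f"articles: {share_articles:.2f}%, reviews: {share_reviews:.2f}%")

# Hypothetical series: 10 publications growing to 160 over 4 yearly intervals.
growth = compound_annual_growth_rate(first_count=10, last_count=160, n_intervals=4)
print(f"hypothetical annual growth: {growth:.1f}%")
```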

Figure 2. Annual growth of publications related to big data in infectious diseases research.

Table 1. Main descriptive summary of the extracted bibliographic records from 1995 to 2022.

As shown in Table 2 , the most productive and influential sources publishing on topics related to big data and infectious diseases epidemiology were Journal of Medical Internet Research and PLoS One (H-index=18), followed by IEEE Access (H-index=13). In terms of productivity, Journal of Medical Internet Research produced a slightly higher number of publications (n=61) than the next best journal PLoS One (n=56). PLoS One had the highest number of total citations at 1893.

Table 2. Top 10 productive and influential publication sources ranked by H-index.

As shown in Table 3 , the most productive and influential author was Zhang Y (H-index=17), followed by Li X (H-index=13) and Wang J (H-index=12). Wang L had the highest total citations (n=1072), which was substantially higher than the next most impactful author Wang J (total citations=861).

Table 3. Top 10 productive and influential authors ranked by H-index and total citations.

a Not available.

The aim and scope of the top 10 most influential journals, as listed in Table 2 , is to publish medical research, medical informatics, or multidisciplinary studies. It can thus be inferred that major future breakthroughs regarding big data in infectious diseases epidemiology will likely appear in these journals.

Figure 3 displays the top 20 most productive institutions. Institutional contributions were assessed by affiliations with at least one author in the publication. The top three institutions, which together accounted for 21.3% of the publications among the top 20, were the University of California, Harvard Medical School (7.9%), and Icahn School of Medicine at Mount Sinai (6.4%); the latter two are medical schools. Columbia University and Oxford University completed the top 5, each accounting for more than 6% of the total. Other institutions in the top 20 were research universities with varied emphases: London School of Hygiene and Tropical Medicine focuses on global and public health, Taipei Medical University is medical-based, and Huazhong University of Science and Technology is focused on science and technology. The United States accounted for the majority of the top 10 most productive institutions, including most of the top 5.

Figure 3. Top 20 institutions by number of publications. CALIF: California; HARVARD MED SCH: Harvard Medical School; ICAHN SCH MED MT SINAI: Icahn School of Medicine at Mount Sinai; LONDON SCH HYG AND TROP MED: London School of Hygiene & Tropical Medicine; PENN: Pennsylvania; UNIV: University.

The 20 most productive countries ( Figure 4 ) are led by the United States and China, accounting for more than half (57.3%) of the total publication output. The United States alone accounted for 41.1% of the productivity in this field. The other countries in the top five were the United Kingdom (9.4%), India (4.4%), and Canada (3.3%).

Figure 4. Top 20 productive countries by number of publications.

Computer science was the most productive research domain in the bibliographic collection ( Figure 5 ), accounting for 17.6% of publications among the top 10 subject areas. In order of productivity, the other research subjects in the top 5 were public, environmental, and occupational health (11.4%); health care services (9.6%); medical informatics (9.0%); and engineering (8.8%).

Figure 5. Top 10 key subject areas by number of publications.

Two major clusters of countries represent the collaboration patterns of the most productive countries ( Figure 6 ). The network was set to include only countries with at least 10 documents, resulting in 50 productive countries. The clustering results demonstrated a demarcation of European countries from the others. For instance, cluster 1 (red) represented most countries from Europe, with England, Germany, and Spain being the core countries. Non-European countries constituted the second cluster (green). The United States and China were the core countries of this group.

Figure 6. Network of country collaborations (≥10 documents, 50 countries, 2 clusters).

Regarding collaboration strength, the United States, with a total link strength of 570, featured the highest number of partners (48), accounting for almost all 50 countries in the network (96%). China, which distantly followed the United States, featured 38 partners and a total link strength of 304. This implies that collaboration is mainly regional.

Figure 7 shows a network map of cocited references in this research area, wherein each node’s size represents the citation strength of the individual study. The network was set to include only studies with at least 25 citations, resulting in 37 studies. Ginsberg et al [31] published the most highly cited article (185 citations). This 13-year-old study presented a method that used Google search queries to track flu-like illnesses in a population. The second most cited study, by Eysenbach [9], introduced the concept of infodemiology, the science of using the internet (eg, social media, search engines, blogs, and websites) to inform public health and public policy. Table 4 further summarizes the top 15 most cited references, including the title, year of publication, number of citations, type of disease, and data source.

Figure 7. Network of cocited references.

Table 4. Summary of the top 15 most cited references.

a NA: not applicable (eg, a review paper, no particular disease or data source for a case study).

b Online platform of real-time COVID-19 cases in China.

c Internet searches include Google Trends and Baidu Index.

d Weibo is a China-based social media platform.

The 37 studies in the network map of cocited references formed four thematic clusters (Figure 7): disease monitoring and surveillance (cluster 1), utility of electronic health (or medical) records (cluster 2), methodology framework for infodemiology tools (cluster 3), and machine learning and deep learning methods (cluster 4).

Keyword co-occurrence analysis supplements the thematic clusters derived from the reference cocitation analysis and helps identify the core topics and contents [29]. As shown in Figure 8, the co-occurrence network displayed 100 relevant keywords after a selection threshold of 10 keyword occurrences was applied. The top 5 most frequently used keywords were COVID-19, big data, machine learning, coronavirus, and electronic health records.

Figure 8. Co-occurrence networks of author keywords.

The 100 author-derived keywords produced four clusters from the coword analysis (Figure 8). Cluster 1 (yellow-green) is related to public health and infectious diseases, with top keywords such as COVID-19, SARS-CoV-2, epidemiology, and epidemics. Cluster 2 (green) is related to electronic storage and delivery of health care, with top keywords including electronic health records, clinical decision support, primary care, epidemiology, and telemedicine. Cluster 3 (blue) involves infodemiology tools, with top keywords including coronavirus, google trends, social media, infodemiology, and surveillance. Cluster 4 (red) is more coherent and broadly related to big data and artificial intelligence, including top keywords big data, machine learning, artificial intelligence, deep learning, and big data analytics.
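The thresholding step behind a keyword co-occurrence network can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the software used in the study: each keyword is counted once per paper, keywords below a minimum occurrence count are dropped, and pairs of surviving keywords are counted per paper. The function name `cooccurrence` and the toy records are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence(records, min_occurrences):
    """Count keyword pairs, keeping only keywords used often enough.

    records is a list of keyword lists, one list per paper.
    """
    # Count each keyword at most once per paper.
    freq = Counter(kw for kws in records for kw in set(kws))
    kept = {kw for kw, n in freq.items() if n >= min_occurrences}
    pairs = Counter()
    for kws in records:
        # Sort so that each unordered pair has one canonical key.
        present = sorted(set(kws) & kept)
        pairs.update(combinations(present, 2))
    return freq, pairs

# Hypothetical toy records, NOT the study's data.
records = [
    ["covid-19", "big data", "machine learning"],
    ["covid-19", "big data"],
    ["covid-19", "google trends"],
]
freq, pairs = cooccurrence(records, min_occurrences=2)
print(pairs[("big data", "covid-19")])  # 2
```

With the threshold of 10 occurrences described above, the same filter would leave only the frequently used keywords as nodes, and the pair counts would become the edge weights that a clustering step then groups.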

Systematic Review of the Top 20 Papers

The top 20 papers were then screened against two criteria: (1) addressed at least one infectious disease and (2) utilized a big data source. A review of these 20 papers (summarized in Table 5) was then performed. The selected studies mainly utilized novel data sources, including internet search engine data (Google Trends: n=11; Baidu or Weibo index: n=2; Yahoo: n=1) and social media data (Twitter: n=5). Other data sources included electronic health or medical records (n=3) and Tencent migration data (n=1). The most frequently studied diseases were COVID-19 (n=10) [35,36,39,42,45-50], followed by influenza (n=8) [37,40,43,44,51-54]. Only one study considered the Zika virus [55], and another considered the trio of meningitis, legionella pneumonia, and Ebola [56].

Table 5. Summary of the top 20 studies that addressed an infectious disease and utilized a big data source.

Principal Findings

Novel big data streams have created interesting opportunities for infectious disease monitoring and control. The review of the top 20 papers suggests that high-volume electronic health records and digital traces such as internet searches and social media dominate. Of note is the relatively high use of Google Trends. Most studies correlated Google Trends data with official data on disease occurrence, spread, and outbreaks. Some of these studies further adopted nowcasting for disease surveillance. However, using Google Trends for forecasts and predictions in infectious disease epidemiology would fill a gap in the extant literature: few studies have gone as far as predicting incidents and occurrences, even though data on reported cases of various health concerns and the associated Google Trends data have been correlated in many studies. Predicting the future is hard; hence, more reliable and efficient methodologies are needed for forecasting infectious disease outbreaks.
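The correlation approach described above can be illustrated with a lagged Pearson correlation between a search-interest series and a reported-case series. The sketch below is a hypothetical example on made-up weekly numbers; it does not reproduce the method of any reviewed study, and the function names `pearson` and `lagged_correlation` are assumptions introduced for illustration.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def lagged_correlation(searches, cases, lag):
    """Correlate searches in week t with cases in week t + lag (lag >= 0)."""
    if lag == 0:
        return pearson(searches, cases)
    return pearson(searches[:-lag], cases[lag:])

# Made-up weekly series: cases repeat the search curve one week later.
searches = [10, 20, 40, 80, 40, 20, 10]
cases = [5, 10, 20, 40, 80, 40, 20]

r_same_week = lagged_correlation(searches, cases, 0)
r_one_week = lagged_correlation(searches, cases, 1)
print(round(r_one_week, 3))  # 1.0
```

A correlation that peaks at a positive lag only shows that search activity preceded reported cases in the historical record. Moving from that to forecasting requires a model evaluated on data it has not seen, which is the step the text notes few studies have taken.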

There are a few drawbacks to digital trace data that should be considered. Many of these data streams lack demographic information such as age and gender, which is essential in almost any epidemiological study. In addition, they represent a growing but still limited population segment, with infants absent and older adults underrepresented relative to younger people. Geographic coverage is also heterogeneous, with underrepresentation in developing countries, although these biases tend to fade and are arguably less pronounced than those found in traditional global surveillance systems. Further, the retrieved data are subject to spatial and temporal uncertainty. Accordingly, hybrid systems should be developed that supplement rather than replace conventional surveillance systems and improve prospects for accurate infectious disease models and forecasts.

Most studies, except for those in the United States and China, were conducted in the European context. Thus, more studies need to test the utility of these big data streams for infectious disease epidemiology in the context of more countries, especially in Africa. Future research questions should ask if any cross-cultural differences between countries affect the adoption and use of big data in infectious disease epidemiology.

The vast majority of infectious diseases have a global distribution. Apart from the coronavirus, influenza, Zika, and Ebola virus outbreaks that are featured in our review, the utility of these big data sources for more infectious diseases should be studied.

Limitations

A few limitations were inherent in our study. First, like any bibliometric study, ours is limited by the search terms and database used. This study utilized English publications from the WoS core collection; therefore, relevant publications may have been missed. However, our choice of WoS was informed by its greater coverage of high-impact journals. Second, some studies may have been published after we concluded document extraction. Accordingly, this study does not claim to be exhaustive but rather extensive.

Future Research Agenda and Conclusions

The bibliometric study identified the United States and China as research leaders in this field, with most affiliations from the Harvard Medical School and the University of California. Top authors were Zhang Yi and Li Xingwang. Journal of Medical Internet Research and PLoS One are the most productive and influential journals in this field. Internet searches and social media data are the most utilized data sources. COVID-19 and influenza were the most studied infectious diseases. The main research themes in this area of research were disease monitoring and surveillance, utility of electronic health (or medical) records, methodology framework for infodemiology tools, and machine/deep learning. Most research papers on big data in infectious diseases epidemiology were published in outlets related to computer science, public health, and health care services.

Opportunities for future research emerge directly from the results of this study. Integrating multiple surveillance platforms, including big data tools, is critical to better understanding pathogen spread. It is also paramount that research needs align with a global view of disease risk. The risk of infectious disease is globally shared in an increasingly connected world. The COVID-19 pandemic, including the rapid global circulation of evolved strains, has emphasized the need for an interdisciplinary, collaborative, global framework for infectious disease research and control. There is a need to empower epidemiologists and public health scientists to leverage insights from big data for infectious disease prevention and control.

Conflicts of Interest: None declared.

COMMENTS

  1. Epidemiologic Case Study Resources

    The case studies include links to websites and videos, discussion and interactive questions, plus a full package of instructor resources including a helpful instructor's guide with sample answers to discussion questions, and a test bank. The 6 Interactive Case Studies include: 1. Clinical course of COVID-19 2. Epidemiology of COVID-19 3.

  2. Case Reports and Case Series

    Key Concept: The key to identifying a case series is that all of the subjects included in the study have the primary disease or outcome of interest. For example, an article reported on 239 people who got bird flu. The article might present tables and graphs that gave information about their age, occupation, where they lived, whether they lived or died, etc., but basically it is a detailed ...

  3. Epidemiology Training & Resources|Epidemic Intelligence Service|CDC

    Epidemiology Training & Resources. The CDC Field Epidemiology Manual — This manual serves as an essential resource for epidemiologists and other health professionals working in local, state, national, and international settings for effective outbreak response to acute and emerging threats. CDC EIS Case Studies in Applied Epidemiology ...

  4. Designing and Conducting Analytic Studies in the Field

    Case-case studies are similar to case-control studies, except that controls have an illness not linked to the outbreak. Case-control studies are probably the type most often appropriate for field investigations. Although conceptually straightforward, the design of an effective epidemiologic study requires many careful decisions.

  5. The case for case-cohort: An applied epidemiologist's guide to re

    When designing epidemiologic studies, we are often confronted with tradeoffs between statistical precision, measurement accuracy, ... The case-cohort study includes (1) a sample of individuals from the cohort who have experienced the outcome of interest ("cases") and (2) a sample of individuals randomly selected from among the members of ...

  6. Epidemiology in Practice: Case-Control Studies

    Introduction. A case-control study is designed to help determine if an exposure is associated with an outcome (i.e., disease or condition of interest). In theory, the case-control study can be described simply. First, identify the cases (a group known to have the outcome) and the controls (a group known to be free of the outcome).

  7. EIS Case Studies

    Each EIS Case Study consists of an instructor guide or a student guide. Instructors or trainers ( not students ): obtain Applied Epidemiology Case Studies instructors' guides. Send an e-mail to: [email protected] and label subject line: "Case Study Instructor Guide Request". Students: Use Adobe Acrobat reader, to view or print the student guides.

  8. Case studies in applied epidemiology

    In 1988, EIS training staff looked for a new name, and settled on "case study." However, these applied epidemiology case studies differ in a number of ways from what are called case studies in other disciplines, particularly the case study based on a single patient in clinical medicine or psychology or the case studies used in business ...

  9. Introduction to Epidemiological Studies

    The basic epidemiological study designs are cross-sectional, case-control, and cohort studies. Cross-sectional studies provide a snapshot of a population by determining both exposures and outcomes at one time point. Cohort studies identify the study groups based on the exposure and, then, the researchers follow up study participants to measure ...

  10. Introduction to Epidemiology|Public Health 101 Series|CDC

    Introduction to Epidemiology. Epidemiology is the "study of distribution and determinants of health-related states among specified populations and the application of that study to the control of health problems" (A Dictionary of Epidemiology). These materials provide an overview of epidemiology investigations, methods, and data collection.

  11. CDC Epidemiology Case Studies

    CDC developed case studies in applied epidemiology based on real-life epidemiologic investigations and used them for training new Epidemic Intelligence Service (EIS) officers, CDC's "disease detectives." EIS offers these carefully crafted epidemiology case studies for schools of medicine, nursing, and public health to use as a ...

  12. How epidemiology has shaped the COVID pandemic

    The pandemic has changed epidemiology. As with many fields that are directly involved in the study of COVID-19, epidemiologists are collaborating across borders and time zones. They are sharing ...

  13. The Case Time Series Design : Epidemiology

    using traditional studies, they require innovative analytical approaches. Here we present a new study design, called case time series, for epidemiologic investigations of transient health risks associated with time-varying exposures. This design combines a longitudinal structure and flexible control of time-varying confounders, typical of aggregated time series, with individual-level analysis ...

  14. Classroom Case Studies

    The epidemiologic case studies for the classroom are based on real-life outbreaks and public health problems. They were developed in collaboration with the original investigators and experts from the Centers for Disease Control and Prevention (CDC). In these case studies, a group of students works through a public health problem with guidance ...

  15. Epidemiologic Case Studies

    These case studies are interactive exercises developed to teach epidemiologic principles and practices. They are based on real-life outbreaks and public health problems and were developed in collaboration with the original investigators and experts from the Centers for Disease Control and Prevention (CDC). The case studies require students to ...

  16. Case-control study

    case-control study, in epidemiology, observational (nonexperimental) study design used to ascertain information on differences in suspected exposures and outcomes between individuals with a disease of interest (cases) and comparable individuals who do not have the disease (controls). Analysis yields an odds ratio (OR) that reflects the relative probabilities of exposure in the two populations.

  17. Designing an Interactive Field Epidemiology Case Study Training for

    A key to a successful case study is placing learners in an active role, rather than asking them to reproduce information-driven training concepts in the context of the scenario. Although a case study is designed to be a classroom tool, in field epidemiology the case study should strive to put the learner in the field.

  18. Methodology minute: An overview of the case-case study design and its

    The case-case study design is a potentially useful tool for infection preventionists during outbreak or cluster investigations. This column clarifies terminology related to case-case, case-control, and case-case-control study designs. ... MPH, Department of Epidemiology and Biostatistics, Mel and Enid Zuckerman College of Public Health ...

  19. PDF EPI Case Study 1 Incidence, Prevalence, and Disease Surveillance

    EPIDEMIOLOGY CASE STUDY 1: Incidence, Prevalence, and Disease Surveillance; Historical Trends in the Epidemiology of M. tuberculosis STUDENT VERSION 1.0 6 Table 4. Tuberculosis Cases, Case Rates per 100,000 Population, Deaths, and Death Rates per 100,000 Population, and Percent Change: United States, 1953-20072 Source: CDC.

  20. What Is a Case-Control Study?

    Case-control studies are a type of observational study often used in fields like medical research, environmental health, or epidemiology. While most observational studies are qualitative in nature, case-control studies can also be quantitative, and they often are in healthcare settings. Case-control studies can be used for both exploratory and ...

  21. PDF Understanding the Epidemiologic Triangle through Infectious Disease

    This activity will help you teach about the scientific concept of the Epidemiologic Triangle using an infectious disease example. Once students understand the Triangle, they can apply it to other diseases they study. This exercise will refine research, reasoning, and problem solving skills.

  22. A Carbon Monoxide Poisoning Case Insight: The Convergence of Social

    This study explores the role of social media in public health through a case of carbon monoxide (CO) poisoning prevented and subsequently overlooked, ... forming the basis of "Media Epidemiology." This new field studies how digital communication impacts health-related behaviors and outcomes. Our analysis highlights social media as a tool ...

  23. Case Studies

    The case studies in applied epidemiology have been developed at CDC and used in training for Epidemic Intelligence Service (EIS) Officers, "disease detectives". The case studies allow students to practice their epidemiologic skills in the classroom to carefully crafted exercises that detail real public health problems.

  24. Epidemiological Study Designs: Traditional and Novel Approaches to

    The central focus of life course epidemiology and life course approaches to health development is on the complex processes underlying the occurrence and accrual of risks at multiple levels and their impact on the developing individual. Reflecting the multilevel and integrated features of human health development that are at the centre of life course health-development (LCHD) principles, study ...

  25. Acute and long-term outcomes of SARS-CoV-2 infection in school ...

    Background: The symptom profiles of acute SARS-CoV-2 infection and long-COVID in children and young people (CYP), risk factors, and associated healthcare needs, are poorly defined. The Schools Infection Survey 1 (SIS-1) was a nationwide study of SARS-CoV-2 infection in primary and secondary schools in England during the 2020/21 school year.

  26. Epidemiology of COVID-19: An updated review

    One case report study showed a person-to-person transmission between health-care workers and patients. ... The Epidemiology Team of Coronavirus Pneumonia Emergency Response (2020) represented that COVID-19 nosocomial coughing transmission is still imprecise, but in China, around 1716 hospital staff have been infected by February 2020 during ...

  27. Attack Rate, Case Fatality Rate and Predictors of Pertussis ...

    Background Pertussis, a highly contagious, vaccine-preventable respiratory infection caused by Bordetella pertussis, is a leading global public health issue. Ethiopia is currently conducting multiple pertussis outbreak investigations, but there is a lack of comprehensive information on attack rate, case fatality rate, and infection predictors. This study aimed to measure attack rates, case ...

  28. About Disaster Epidemiology

    Typically, the main objectives of disaster epidemiology are the following: Prevent or reduce the number of deaths, illnesses, and injuries caused by disasters. Provide timely and accurate health information for decision-makers. Improve prevention and mitigation strategies for future disasters by gaining information for future response preparation.

  29. Big Data and Infectious Disease Epidemiology: Bibliometric Analysis and

    For epidemiology and infectious diseases research, this means that in the last decade, there has been a significant spike in the number of studies with considerable interest in using digital epidemiology and big data tools to enhance health systems in terms of disease surveillance, modeling, and evidence-based responses [4,6-8].