Systematic Reviews and Meta-Analyses: Data Extraction
Data extraction, sometimes called data collection or data abstraction, is the process of extracting and organizing information from each included (relevant) study.
The synthesis approach(es) (e.g., meta-analysis, framework synthesis) that you intend to use will inform data extraction.
As in all other stages of a systematic review, 2 data extractors should extract data from each included reference. The exact procedure may vary according to your resource capacity. For example, if you are managing a large corpus, you may have a team of 10 extractors working in 5 pairs, with each pair extracting data from its own portion of the included material.
Note: experience in the field does not necessarily increase the accuracy of this process. For more on this topic, see Horton et al. (2010), 'Systematic review data extraction: cross-sectional study showed that experience did not increase accuracy', and Jones et al. (2005), 'High prevalence but low impact of data extraction and reporting errors were found in Cochrane systematic reviews'.
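When two extractors work on the same reference, their forms need to be compared so that disagreements can be discussed and resolved. The sketch below shows one way this comparison might be done. The field names, study IDs, and values are hypothetical, and it assumes each extractor's data is held as a dictionary keyed by study ID.

```python
def find_discrepancies(extractor_a, extractor_b):
    """Return {study_id: {field: (value_a, value_b)}} for every mismatch.

    A study or field present for only one extractor is reported with
    None for the other extractor's value.
    """
    discrepancies = {}
    for study_id in sorted(set(extractor_a) | set(extractor_b)):
        record_a = extractor_a.get(study_id, {})
        record_b = extractor_b.get(study_id, {})
        mismatched = {}
        for field in sorted(set(record_a) | set(record_b)):
            value_a = record_a.get(field)
            value_b = record_b.get(field)
            if value_a != value_b:
                mismatched[field] = (value_a, value_b)
        if mismatched:
            discrepancies[study_id] = mismatched
    return discrepancies


# Hypothetical extraction results from two independent extractors.
extractor_a = {
    "study_01": {"design": "RCT", "n_total": 120, "events_treatment": 12},
    "study_02": {"design": "cohort", "n_total": 340, "events_treatment": 41},
}
extractor_b = {
    "study_01": {"design": "RCT", "n_total": 120, "events_treatment": 21},
    "study_02": {"design": "cohort", "n_total": 340, "events_treatment": 41},
}

discrepancies = find_discrepancies(extractor_a, extractor_b)
for study_id, mismatched in discrepancies.items():
    for field, (value_a, value_b) in mismatched.items():
        print(f"{study_id}: '{field}' differs (A={value_a!r}, B={value_b!r})")
```

A list like this only flags where the two forms differ. Deciding which value is correct still requires going back to the source article, ideally following the disagreement process described in your protocol.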
Note: Effect Size Measurements
Defining ahead of time which measurement of effect(s) will be relevant and useful is important, especially if you hope to pursue a meta-analysis. Though it is unlikely that all of your studies will produce the same measurement of effect (e.g., odds ratio, relative risk ratio), many of these measurements can be transformed or converted to the measurement you need for your meta-analysis.
If converting effect sizes, be sure to provide enough detail about this process in your manuscript that another team could replicate it. It is best to collect the original outputs from articles before converting effect sizes. Tools are available for converting effect sizes, such as the Campbell Collaboration's tool for calculating or converting effect sizes and the effect size converter from MIT.
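As an illustration of what such a conversion involves, the sketch below converts an odds ratio to a standardized mean difference (Cohen's d) using the commonly cited logit approximation, d = ln(OR) × √3 / π. This is only one of several possible conversions and it assumes an underlying logistic distribution. The function names and the example value are hypothetical and are not taken from either tool mentioned above.

```python
import math

# Scaling factor for the logit method: d = ln(OR) * sqrt(3) / pi
LOGIT_TO_D = math.sqrt(3) / math.pi


def odds_ratio_to_d(odds_ratio):
    """Convert an odds ratio to Cohen's d (logit approximation)."""
    if odds_ratio <= 0:
        raise ValueError("An odds ratio must be greater than zero.")
    return math.log(odds_ratio) * LOGIT_TO_D


def se_log_odds_ratio_to_se_d(se_log_or):
    """Convert the standard error of ln(OR) to the standard error of d."""
    return se_log_or * LOGIT_TO_D


def d_to_odds_ratio(d):
    """Convert Cohen's d back to an odds ratio (inverse of the above)."""
    return math.exp(d / LOGIT_TO_D)


# Keep the original value next to the converted one so the step can be audited.
original_or = 2.0  # hypothetical value as reported in an article
record = {
    "reported_measure": "odds ratio",
    "reported_value": original_or,
    "converted_measure": "Cohen's d (logit method)",
    "converted_value": odds_ratio_to_d(original_or),
}
print(record)
```

Storing the reported value together with the converted value, as in `record`, follows the advice above to collect original outputs before converting, and makes it easier to describe the conversion in the manuscript.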
Data Extraction Templates
Data extraction is often performed using a single form to extract data from all included (relevant) studies in a uniform manner. Because the data extraction stage is driven by the scope and goals of a systematic review, there is no gold standard or one-size-fits-all approach to developing a data extraction form.
However, there are templates and guidance available to help in the creation of your forms.
Because it is standard to include the data extraction form in the supplemental material of a systematic review and/or meta-analysis, you may also consider the forms developed and/or used in similar reviews that are already published or in progress.
As is the case with the critical appraisal, the type of data you are able to extract will also depend on the study design. Therefore, it is likely that the exact data you extract from each individual article will vary somewhat.
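To make the idea of a single, uniform form concrete, the sketch below represents one extraction form as a Python dataclass and writes the completed records to CSV. Every field name and example value is hypothetical. Your own form should follow the scope of your review and the templates listed below. Optional fields are left empty when a study design does not provide that information.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class ExtractionRecord:
    """One row of a hypothetical data extraction form."""
    study_id: str
    first_author: str
    year: int
    study_design: str
    extractor: str
    # Fields below may not apply to every study design.
    sample_size: Optional[int] = None
    outcome: Optional[str] = None
    effect_measure: Optional[str] = None
    effect_estimate: Optional[float] = None
    notes: str = ""


def records_to_csv(records):
    """Serialize records to CSV text with one column per form field."""
    buffer = io.StringIO()
    column_names = [f.name for f in fields(ExtractionRecord)]
    writer = csv.DictWriter(buffer, fieldnames=column_names)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))  # None is written as an empty cell
    return buffer.getvalue()


# Hypothetical records: a trial with an effect estimate and a qualitative study.
records = [
    ExtractionRecord("study_01", "Author A", 2015, "RCT", "AB",
                     sample_size=120, outcome="infection",
                     effect_measure="risk ratio", effect_estimate=0.8),
    ExtractionRecord("study_02", "Author B", 2018, "qualitative", "CD",
                     notes="no quantitative outcome reported"),
]
csv_text = records_to_csv(records)
print(csv_text)
```

Defining the fields once, in one place, means every extractor fills in the same columns, and the resulting file can be shared as supplemental material.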
Data Extraction Form Templates
Cochrane | One form for randomized controlled trials (RCTs) only; one form for RCTs and non-RCTs
Joanna Briggs Institute (JBI) | Several forms located in each relevant chapter:
- Qualitative data (appendix 2.3)
- Text and opinion data (appendix 4.3)
- Prevalence studies (prevalence data; appendix 5.2)
- Mixed method (convergent integrated approach; appendix 8.1)
- Diagnostic test accuracy (appendix 9.3)
- Measurement properties (appendix 12.1) with table of results template (appendix 12.2)
Present Data Extracted
Data extracted from each reference is presented as a summary table or summary of findings table and described in the narrative.
A summary table, like the examples seen below, provides readers with a quick-glance summary of the study details that are important to the systematic review and/or meta-analysis. As in the other stages of a review, what you collect and report will depend on the scope of the review and the type of synthesis you plan to conduct.
Qualitative Data Only
Summary table from: Bin-Reza, F., Lopez Chavarrias, V., Nicoll, A., & Chamberland, M. E. (2012). The use of masks and respirators to prevent transmission of influenza: a systematic review of the scientific evidence. Influenza and other respiratory viruses, 6(4), 257–267. doi:10.1111/j.1750-2659.2011.00307.x
Quantitative Data (meta-analysis)
Summary table from: Simpson, S. S., Rorie, M., Alper, M., Schell‐Busey, N., Laufer, W. S., & Smith, N. C. (2014). Corporate Crime Deterrence: A Systematic Review. Campbell Systematic Reviews, 10(1), 1–105. https://doi.org/10.4073/csr.2014.4
It may be appropriate to include more than one summary table. For example, one table may present basic information about each study, such as author names, year of publication, year(s) the study was conducted, study design, and funding agency; another table may present details more specific to the qualitative synthesis; and a third table may present information specifically relevant to the meta-analysis, such as effect sizes and confidence intervals. Additionally, it is best practice to have one summary table for each outcome.
Chapter 5: Collecting data
- 5.2 Sources of data
- 5.3 What data to collect
- 5.4 Data collection tools
- 5.5 Extracting data from reports
- 5.6 Extracting study results and converting to the desired format
- 5.7 Managing and sharing data
Chapter 6: Choosing effect measures and computing estimates of effect
- 6.1 Types of data and effect measures
- 6.2 Study designs and identifying the unit of analysis
- 6.3 Extracting estimates of effect directly
- 6.4 Dichotomous outcome data
- 6.5 Continuous outcome data
- 6.6 Ordinal outcome data and measurement scales
- 6.7 Count and rate data
- 6.8 Time-to-event data
- 6.9 Conditional outcomes only available for subsets of participants
Step 4: Data extraction
Conducting systematic reviews of intervention questions II: Relevance screening, data extraction, assessing risk of bias, presenting the results and interpreting the findings. Sargeant JM, O’Connor AM. Zoonoses Public Health. 2014 Jun;61 Suppl 1:39-51. doi: 10.1111/zph.12124. PMID: 24905995
Study designs and systematic reviews of interventions: building evidence across study designs. Sargeant JM, Kelton DF, O’Connor AM. Zoonoses Public Health. 2014 Jun;61 Suppl 1:10-7. doi: 10.1111/zph.12127. PMID: 24905992
Randomized controlled trials and challenge trials: Design and criterion for validity. Sargeant JM, Kelton DF, O’Connor AM. Zoonoses Public Health. 2014;61 Suppl 1:18-27. PMID: 24905993
C43. Using data collection forms (protocol & review / final manuscript)
C44. Describing studies (review / final manuscript)
C45. Extracting study characteristics and outcome data in duplicate (protocol & review / final manuscript)
C46. Making maximal use of data (protocol & review / final manuscript)
C47. Examining errata (review / final manuscript)
C49. Choosing intervention groups in multi-arm studies (protocol & review / final manuscript)
C50. Checking accuracy of numeric data in the review (review / final manuscript)
Reporting in Protocol and Final Manuscript
In the Protocol | PRISMA-P
Data Collection Process (Item 11c)
"...forms should be developed a priori and included in the published or otherwise available review protocol as an appendix or as online supplementary materials"
Include strategies for reducing error:
"...level of reviewer experience has not been shown to affect extraction error rates. As such, additional strategies planned to reduce errors, such as training of reviewers and piloting of extraction forms should be described."
Include how to handle missing information:
"...in the absence of complete descriptions of treatments, outcomes, effect estimates, or other important information, reviewers may consider asking authors for this information. Whether reviewers plan to contact authors of included studies and how this will be done (such as a maximum of three email attempts) to obtain missing information should be documented in the protocol."
Data Items (Item 12)
List and define all variables for which data will be sought (such as PICO items, funding sources) and any pre-planned data assumptions and simplifications
Include any assumptions by extractors:
"...describe assumptions they intend to make if they encounter missing or unclear information and explain how they plan to deal with such data or lack thereof"
Outcomes and Prioritization (Item 13)
List and define all outcomes for which data will be sought, including prioritisation of main and additional outcomes, with rationale
In the Final Manuscript | PRISMA
Data Collection Process (Item 9; report in methods)
- Report how many reviewers collected data from each report, whether multiple reviewers worked independently or not (for example, data collected by one reviewer and checked by another), and any processes used to resolve disagreements between data collectors.
- Report any processes used to obtain or confirm relevant data from study investigators (such as how they were contacted, what data were sought, and success in obtaining the necessary information).
- If any automation tools were used to collect data, report how the tool was used (such as machine learning models to extract sentences from articles relevant to the PICO characteristics), how the tool was trained, and what internal or external validation was done to understand the risk of incorrect extractions.
- If articles required translation into another language to enable data collection, report how these articles were translated (for example, by asking a native speaker or by using software programs).
- If any software was used to extract data from figures, specify the software used.
- If any decision rules were used to select data from multiple reports corresponding to a study, and any steps were taken to resolve inconsistencies across reports, report the rules and steps used.
Data Items (Item 10; report in methods)
- List and define the outcome domains and time frame of measurement for which data were sought (Item 10a)
- Specify whether all results that were compatible with each outcome domain in each study were sought, and, if not, what process was used to select results within eligible domains (Item 10a)
- If any changes were made to the inclusion or definition of the outcome domains or to the importance given to them in the review, specify the changes, along with a rationale (Item 10a)
- If any changes were made to the processes used to select results within eligible outcome domains, specify the changes, along with a rationale (Item 10a)
- List and define all other variables for which data were sought. It may be sufficient to report a brief summary of information collected if the data collection and dictionary forms are made available (for example, as additional files or deposited in a publicly available repository) (Item 10b)
- Describe any assumptions made about any missing or unclear information from the studies. For example, in a study that includes “children and adolescents,” for which the investigators did not specify the age range, authors might assume that the oldest participants would be 18 years, based on what was observed in similar studies included in the review, and should report that assumption (Item 10b)
- If a tool was used to inform which data items to collect (such as the Tool for Addressing Conflicts of Interest in Trials (TACIT) or a tool for recording intervention details), cite the tool used (Item 10b)
Consider specifying which outcome domains were considered the most important for interpreting the review’s conclusions (such as “critical” versus “important” outcomes) and provide rationale for the labelling (such as “a recent core outcome set identified the outcomes labelled ‘critical’ as being the most important to patients”) (Item 10a)
Effect Measures (Item 12; report in methods)
- Specify for each outcome or type of outcome (such as binary, continuous) the effect measure(s) (such as risk ratio, mean difference) used in the synthesis or presentation of results.
- State any thresholds or ranges used to interpret the size of effect (such as minimally important difference; ranges for no/trivial, small, moderate, and large effects) and the rationale for these thresholds.
- If synthesised results were re-expressed to a different effect measure, report the methods used to re-express results (such as meta-analysing risk ratios and computing an absolute risk reduction based on an assumed comparator risk)
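The last item above mentions re-expressing a pooled risk ratio as an absolute risk reduction using an assumed comparator risk. A minimal sketch of that arithmetic follows, using absolute risk reduction = assumed comparator risk × (1 − risk ratio). The pooled risk ratio, its confidence interval, and the comparator risk are all hypothetical numbers chosen for illustration.

```python
def absolute_risk_reduction(risk_ratio, assumed_comparator_risk):
    """Re-express a risk ratio as an absolute risk reduction.

    The risk in the intervention group is taken to be the assumed
    comparator risk multiplied by the risk ratio.
    """
    if not 0.0 <= assumed_comparator_risk <= 1.0:
        raise ValueError("The comparator risk must lie between 0 and 1.")
    if risk_ratio < 0:
        raise ValueError("A risk ratio cannot be negative.")
    intervention_risk = assumed_comparator_risk * risk_ratio
    return assumed_comparator_risk - intervention_risk


def per_1000(risk_difference):
    """Express a risk difference as a whole number of people per 1000."""
    return round(risk_difference * 1000)


# Hypothetical pooled result: RR 0.80 (95% CI 0.70 to 0.90),
# with an assumed comparator risk of 10%.
pooled_rr, rr_lower, rr_upper = 0.80, 0.70, 0.90
comparator_risk = 0.10

arr = absolute_risk_reduction(pooled_rr, comparator_risk)
# The upper RR bound gives the smallest reduction, the lower bound the largest.
arr_smallest = absolute_risk_reduction(rr_upper, comparator_risk)
arr_largest = absolute_risk_reduction(rr_lower, comparator_risk)

print(f"{per_1000(arr)} fewer per 1000 "
      f"(from {per_1000(arr_smallest)} fewer to {per_1000(arr_largest)} fewer)")
```

Because the result depends directly on the assumed comparator risk, that assumption and its source should be reported alongside the re-expressed value.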
Study Characteristics (Item 17; report in results)
- Cite each included study
- Present the key characteristics of each study in a table or figure (considering a format that will facilitate comparison of characteristics across the studies)
If the review examines the effects of interventions, consider presenting an additional table that summarises the intervention details for each study
Results of Individual Studies (Item 19; report in results)
- For all outcomes, irrespective of whether statistical synthesis was undertaken, present for each study summary statistics for each group (where appropriate). For dichotomous outcomes, report the number of participants with and without the events for each group; or the number with the event and the total for each group (such as 12/45). For continuous outcomes, report the mean, standard deviation, and sample size of each group.
- For all outcomes, irrespective of whether statistical synthesis was undertaken, present for each study an effect estimate and its precision (such as standard error or 95% confidence/credible interval). For example, for time-to-event outcomes, present a hazard ratio and its confidence interval.
- If study-level data are presented visually or reported in the text (or both), also present a tabular display of the results.
- If results were obtained from multiple sources (such as journal article, study register entry, clinical study report, correspondence with authors), report the source of the data. This need not be overly burdensome. For example, a statement indicating that, unless otherwise specified, all data came from the primary reference for each included study would suffice. Alternatively, this could be achieved by, for example, presenting the origin of each data point in footnotes, in a column of the data table, or as a hyperlink to relevant text highlighted in reports (such as using SRDR Data Abstraction Assistant).
- If applicable, indicate which results were not reported directly and had to be computed or estimated from other information (see item #13b)
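Several of the items above involve values that are computed rather than read directly from a report: an effect estimate with its precision, or a standard error recovered from a published confidence interval. The sketch below shows the standard large-sample calculations for a risk ratio from dichotomous data. The intervention-group count of 12/45 echoes the format in the example above, while the comparator count of 20/44 is hypothetical. The sketch assumes no zero cells. Studies with zero events need a different approach, such as a continuity correction.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def risk_ratio_with_ci(events_1, total_1, events_2, total_2, z=Z_95):
    """Return (risk ratio, SE of ln RR, CI lower bound, CI upper bound)."""
    if events_1 <= 0 or events_2 <= 0:
        raise ValueError("Zero events: this simple formula does not apply.")
    if events_1 > total_1 or events_2 > total_2:
        raise ValueError("Events cannot exceed the group total.")
    rr = (events_1 / total_1) / (events_2 / total_2)
    se_log_rr = math.sqrt(1 / events_1 - 1 / total_1
                          + 1 / events_2 - 1 / total_2)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, se_log_rr, lower, upper


def se_from_ci(lower, upper, z=Z_95):
    """Recover the SE of a log ratio measure from its reported CI bounds."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


# 12/45 with the event in the intervention group, 20/44 in the comparator.
rr, se_log_rr, ci_lower, ci_upper = risk_ratio_with_ci(12, 45, 20, 44)
print(f"RR = {rr:.2f}, 95% CI {ci_lower:.2f} to {ci_upper:.2f}")

# If an article reports only the ratio and its CI, the SE can be estimated.
estimated_se = se_from_ci(ci_lower, ci_upper)
```

Any value obtained this way should be flagged in the results table as computed or estimated, as the last item above requires.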