Study evaluation requirements
Study evaluations are performed on an endpoint/outcome-specific basis. For each evaluation domain, core and prompting questions are provided to guide the reviewer in assessing different aspects of study design and conduct related to reporting, risk of bias and study sensitivity. For some domains (see below), additional outcome- or chemical-specific refinements to the criteria used to answer the questions should be developed a priori by reviewers. Each domain receives a judgment of Good, Adequate, Deficient, Not Reported or Critically Deficient accompanied by the rationale and primary study-specific information supporting the judgment. Once all domains are evaluated, a confidence rating of High, Medium, or Low confidence or Uninformative is assigned for each endpoint/outcome from the study. The overall confidence rating should, to the extent possible, reflect interpretations of the potential influence on the results (including the direction and/or magnitude of influence) across all domains. The rationale supporting the overall confidence rating should be documented clearly and consistently, including a brief description of any important strengths and/or limitations that were identified and their potential impact on the overall confidence.
Domain judgments and overall ratings for all individual endpoints/outcomes can be captured by a single (default) response, or you can create override responses assigned to individual endpoints, outcomes, or results; provide a descriptive label to describe which components the score refers to. Each response must have a single default score; when selecting the default representative rating for the domains and overall rating (i.e., the drop-down selection with the associated color code), it is typically most appropriate to select the judgment that best represents the study overall.
Follow link to see attachments that contain example answers to the animal study evaluation domains. It is really helpful to have this document open when conducting reviews.
Follow link to see attachments that contain example prompting and follow-up questions for epidemiological studies.
Requirements by Study Type
Domain | Metric | Bioassay | Epidemiology | In Vitro |
---|---|---|---|---|
Reporting Quality100502978 | Reporting quality100504889 | ✔ | - | - |
Selection and Performance100502979 | Allocation100504890 | ✔ | - | - |
Selection and Performance100502979 | Observational bias/blinding100504891 | ✔ | - | - |
Selection and Performance100502979 | Participant selection100504892 | - | ✔ | - |
Confounding/Variable Control100502980 | Confounding/variable control100504893 | ✔ | - | - |
Selective Reporting/Attrition100502981 | Selective reporting and attrition100504894 | ✔ | - | - |
Exposure Methods100502982 | Chemical administration and characterization100504895 | ✔ | - | - |
Exposure Methods100502982 | Exposure timing, frequency and duration100504896 | ✔ | - | - |
Exposure Methods100502982 | Exposure measures100504897 | - | ✔ | - |
Outcome Methods/Results Presentation100502983 | Outcome Assessment100504898 | ✔ | - | - |
Outcome Methods/Results Presentation100502983 | Results presentation100504899 | ✔ | - | - |
Outcome Methods/Results Presentation100502983 | Outcome measures100504900 | - | ✔ | - |
Confounding100502984 | Confounding100504901 | - | ✔ | - |
Analysis100502985 | Does the analysis strategy and presentation convey the necessary familiarity with the data and assumptions?100504902 | - | ✔ | - |
Selective Reporting100502986 | Selective Reporting100504903 | - | ✔ | - |
Sensitivity100502987 | Are there concerns for study sensitivity100504904 | - | ✔ | - |
Overall Study Confidence100502988 | Overall confidence (animal)100504905 | ✔ | - | - |
Overall Study Confidence100502988 | Overall confidence (epi)100504906 | - | ✔ | - |
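The table above can also be read as a simple lookup from study type to the metrics a reviewer must complete. The following is a minimal sketch of that reading, not part of any tool: the names `REQUIREMENTS` and `required_metrics` are hypothetical, and the numeric identifiers attached to the domain and metric names are omitted for readability.

```python
# Hypothetical encoding of the "Requirements by Study Type" table.
# Each entry is (domain, metric, study types marked with a check mark).
REQUIREMENTS = [
    ("Reporting Quality", "Reporting quality", {"bioassay"}),
    ("Selection and Performance", "Allocation", {"bioassay"}),
    ("Selection and Performance", "Observational bias/blinding", {"bioassay"}),
    ("Selection and Performance", "Participant selection", {"epidemiology"}),
    ("Confounding/Variable Control", "Confounding/variable control", {"bioassay"}),
    ("Selective Reporting/Attrition", "Selective reporting and attrition", {"bioassay"}),
    ("Exposure Methods", "Chemical administration and characterization", {"bioassay"}),
    ("Exposure Methods", "Exposure timing, frequency and duration", {"bioassay"}),
    ("Exposure Methods", "Exposure measures", {"epidemiology"}),
    ("Outcome Methods/Results Presentation", "Outcome Assessment", {"bioassay"}),
    ("Outcome Methods/Results Presentation", "Results presentation", {"bioassay"}),
    ("Outcome Methods/Results Presentation", "Outcome measures", {"epidemiology"}),
    ("Confounding", "Confounding", {"epidemiology"}),
    ("Analysis",
     "Does the analysis strategy and presentation convey the necessary "
     "familiarity with the data and assumptions?",
     {"epidemiology"}),
    ("Selective Reporting", "Selective Reporting", {"epidemiology"}),
    ("Sensitivity", "Are there concerns for study sensitivity", {"epidemiology"}),
    ("Overall Study Confidence", "Overall confidence (animal)", {"bioassay"}),
    ("Overall Study Confidence", "Overall confidence (epi)", {"epidemiology"}),
]


def required_metrics(study_type):
    """Return the (domain, metric) pairs required for a study type."""
    key = study_type.strip().lower()
    return [(domain, metric) for domain, metric, types in REQUIREMENTS if key in types]


if __name__ == "__main__":
    for domain, metric in required_metrics("Bioassay"):
        print(f"{domain}: {metric}")
```

Because no metric in the table is marked for In Vitro studies, `required_metrics("in vitro")` returns an empty list.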
Reporting Quality100502978
Reporting quality100504889
CORE QUESTION
Does the study report information for evaluating the design and conduct of the study for the endpoint(s)/outcome(s) of interest?
PROMPTING QUESTIONS
Does the study report the following?
Critical information necessary to perform study evaluation:
- Species; test article name; levels and duration of exposure; route (e.g., oral; inhalation); qualitative or quantitative results for at least one endpoint of interest
Important information for evaluating the study methods:
- Test animal: strain, sex, source, and general husbandry procedures
- Exposure methods: source, purity, method of administration
- Experimental design: frequency of exposure, animal age and lifestage during exposure and at endpoint/outcome evaluation
- Endpoint evaluation methods: assays or procedures used to measure the endpoints/outcomes of interest
NOTES:
- Reviewers should reach out to authors to obtain missing information when studies are considered key for hazard evaluation and/or dose-response.
- This domain is limited to reporting. Other aspects of the exposure methods, experimental design, and endpoint evaluation methods are evaluated using the domains related to risk of bias and study sensitivity.
BASIC CONSIDERATIONS
These considerations typically do not need to be refined by assessment teams, although in some instances the important information may be refined depending on the endpoints/outcomes of interest or the chemical under investigation.
A judgment and rationale for this domain should be given for the study. Typically, these will not change regardless of the endpoints/outcomes investigated by the study. In the rationale, reviewers should indicate whether the study adhered to GLP, OECD, or other testing guidelines.
Good: All critical and important information is reported or inferable for the endpoints/outcomes of interest.
Adequate: All critical information is reported but some important information is missing. However, the missing information is not expected to significantly impact the study evaluation.
Deficient: All critical information is reported but important information is missing that is expected to significantly reduce the ability to evaluate the study.
Critically Deficient: Study report is missing any pieces of critical information. Studies that are Critically Deficient for reporting are Uninformative for the overall rating and not considered further for evidence synthesis and integration.
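The four reporting-quality judgments above follow a fixed order of checks, which can be sketched as a small decision function. This is an illustration only: the function name and its three boolean inputs are hypothetical, and each input still reflects a reviewer's judgment rather than something computed from the study.

```python
def reporting_quality_judgment(critical_missing, important_missing, impairs_evaluation):
    """Sketch of the reporting-quality rubric.

    critical_missing   -- any piece of critical information is missing
    important_missing  -- some important (non-critical) information is missing
    impairs_evaluation -- the missing important information is expected to
                          significantly reduce the ability to evaluate the study
    """
    if critical_missing:
        # Such studies are Uninformative for the overall rating.
        return "Critically Deficient"
    if not important_missing:
        return "Good"
    if impairs_evaluation:
        return "Deficient"
    return "Adequate"


if __name__ == "__main__":
    print(reporting_quality_judgment(False, True, False))
```

Checking critical information first mirrors the rubric: Good, Adequate, and Deficient all presuppose that every piece of critical information is reported.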
EXAMPLE RATING
Study - Good - Important information is provided for test species, strain, sex, age, exposure methods, experimental design, endpoint evaluations and the presentation of results. The authors report that 'the study was conducted in compliance with the OECD guidelines for Good Laboratory Practice [c(81) 30 (Final)]'.
Selection and Performance100502979
Allocation100504890
CORE QUESTION
Were animals assigned to experimental groups using a method that minimizes selection bias?
PROMPTING QUESTIONS
For each study:
- Did each animal or litter have an equal chance of being assigned to any experimental group (i.e., random allocation)?
- Is the allocation method described?
- Aside from randomization, were any steps taken to balance variables across experimental groups during allocation?
BASIC CONSIDERATIONS
These considerations typically do not need to be refined by assessment teams.
A judgment and rationale for this domain should be given for each cohort or experiment in the study.
Good: Experimental groups were randomized and any specific randomization procedure was described or inferable (e.g., computer-generated scheme). [Note that normalization is not the same as randomization (see response for 'Adequate').]
Adequate: Authors report that groups were randomized but do not describe the specific procedure used (e.g., 'animals were randomized'). Alternatively, authors used a non-random method to control for important modifying factors across experimental groups (e.g., body weight normalization).
Not Reported (interpreted as Deficient): No indication of randomization of groups or other methods (e.g., normalization) to control for important modifying factors across experimental groups.
Critically Deficient: Bias in the animal allocations was reported or inferable.
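The allocation judgments above can likewise be sketched as a short decision function. The function and parameter names are hypothetical, and the inputs are reviewer determinations (for example, whether a randomization procedure is "described or inferable"), so this is a reading aid under those assumptions rather than a substitute for the rubric.

```python
def allocation_judgment(bias_evident, randomized, procedure_described, non_random_balancing):
    """Sketch of the allocation rubric.

    bias_evident         -- bias in allocation was reported or inferable
    randomized           -- authors report that groups were randomized
    procedure_described  -- the specific randomization procedure is described
                            or inferable (e.g., computer-generated scheme)
    non_random_balancing -- a non-random method controlled for important
                            modifying factors (e.g., body weight normalization)
    """
    if bias_evident:
        return "Critically Deficient"
    if randomized and procedure_described:
        return "Good"
    if randomized or non_random_balancing:
        # Normalization alone is not randomization, so it rates Adequate at best.
        return "Adequate"
    return "Not Reported (interpreted as Deficient)"


if __name__ == "__main__":
    print(allocation_judgment(False, True, False, False))
```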
EXAMPLE RATING
All Cohorts/Experiments - Good - The study authors report that 'Fifty males and fifty females were randomly assigned to groups by a computer-generated weight-ordered distribution such that individual body weights did not exceed ±20% of the mean weight for each sex.'
Observational bias/blinding100504891
CORE QUESTION
Did the study implement measures to reduce observational bias?
PROMPTING QUESTIONS
For each endpoint/outcome or grouping of endpoints/outcomes in a study:
- Does the study report blinding or other methods/procedures for reducing observational bias?
- If not, did the study use a design or approach for which such procedures can be inferred?
- What is the expected impact of failure to implement (or report implementation) of these methods/procedures on results?
BASIC CONSIDERATIONS
These considerations typically do not need to be refined by the assessment teams. [Note that it can be useful for teams to identify highly subjective measures of endpoints/outcomes where observational bias may strongly influence results prior to performing evaluations.]
A judgment and rationale for this domain should be given for each endpoint/outcome or group of endpoints/outcomes investigated in the study.
Good: Measures to reduce observational bias were described (e.g., blinding to conceal treatment groups during endpoint evaluation; consensus-based evaluations of histopathology lesions[1]).
Adequate: Methods for reducing observational bias (e.g., blinding) can be inferred or were reported but described incompletely.
Not Reported: Measures to reduce observational bias were not described.
- (interpreted as Adequate): The potential concern for bias was mitigated based on use of automated/computer driven systems, standard laboratory kits, relatively simple, objective measures (e.g., body or tissue weight), or screening-level evaluations of histopathology.
- (interpreted as Deficient): The potential impact on the results is major (e.g., outcome measures are highly subjective).
Critically Deficient: Strong evidence for observational bias that could have impacted results.
[1] For non-targeted or screening-level histopathology outcomes often used in guideline studies, blinding during the initial evaluation of tissues is generally not recommended as masked evaluation can make 'the task of separating treatment-related changes from normal variation more difficult' and 'there is concern that masked review during the initial evaluation may result in missing subtle lesions.' Generally, blinded evaluations are recommended for targeted secondary review of specific tissues or in instances when there is a pre-defined set of outcomes that is known or predicted to occur (Crissman, 2004).
EXAMPLE RATINGS
Histopathology - Good - Although the study did not indicate blinding, blinding during the initial evaluation of tissues for initial or non-targeted evaluations is generally not recommended as masked evaluation can make the task of separating treatment-related changes from normal variation more difficult and may result in subtle lesions being overlooked (Crissman, 2004). The study did include a secondary evaluation by a pathology working group (PWG) review on coded pathology slides which minimized the potential for observational bias.
Organ weights, FOB, motor activity, swim maze and histopathology - Good - Authors reported that the investigators were blinded to the animal treatment group during evaluation for all outcome measures. Although blinding is not recommended for initial or non-targeted evaluations (Crissman, 2004), this study evaluated prespecified outcomes in targeted evaluations for which blinding is appropriate (cell counts in the CA3 region of the hippocampus).
Confounding/Variable Control100502980
Confounding/variable control100504893
CORE QUESTION
Are variables with the potential to confound or modify results controlled for and consistent across all experimental groups?
PROMPTING QUESTIONS
For each study:
- Are there differences across the treatment groups (e.g., co-exposures, vehicle, diet, palatability, husbandry, health status, etc.) that could bias the results?
- If differences are identified, to what extent are they expected to impact the results?
BASIC CONSIDERATIONS
These considerations may need to be refined by assessment teams, as the specific variables of concern can vary by experiment or chemical.
A judgment and rationale for this domain should be given for each cohort or experiment in the study, noting when the potential for confounding is restricted to specific endpoints/outcomes.
Good: Outside of the exposure of interest, variables that are likely to confound or modify results appear to be controlled for and consistent across experimental groups.
Adequate: Some concern that variables that were likely to confound or modify results were uncontrolled or inconsistent across groups, but are expected to have a minimal impact on the results.
Deficient: Notable concern that potentially confounding variables were uncontrolled or inconsistent across groups, and are expected to substantially impact the results.
Critically Deficient: Confounding variables were presumed to be uncontrolled or inconsistent across groups, and are expected to be a primary driver of the results.
EXAMPLE RATING
All Cohorts/Experiments/Endpoints - Good - Based on the study report, vehicle (deionized water with 2% tween 80) and husbandry practices were inferred to be the same in controls and treatment groups. The experimental conditions described provided no indication of concern for uncontrolled variables or different practices across groups.
Selective Reporting/Attrition100502981
Selective reporting and attrition100504894
CORE QUESTION
Did the study report results for all prespecified outcomes and tested animals?
PROMPTING QUESTIONS
For each study:
Selective reporting bias:
- Are all results presented for endpoints/outcomes described in the methods (see note)?
Attrition bias:
- Are all animals accounted for in the results?
- If there are discrepancies, do authors provide an explanation (e.g., death or unscheduled sacrifice during the study)?
- If unexplained results omissions and/or attrition are identified, what is the expected impact on the interpretation of the results?
NOTE: This domain does not consider the appropriateness of the analysis/results presentation. This aspect of study quality is evaluated in another domain.
BASIC CONSIDERATIONS
These considerations typically do not need to be refined by assessment teams.
A judgment and rationale for this domain should be given for each cohort or experiment in the study.
Good: Quantitative or qualitative results were reported for all prespecified outcomes (explicitly stated or inferred), exposure groups and evaluation timepoints. Data not reported in the primary article is available from supplemental material. If results omissions or animal attrition are identified, the authors provide an explanation and these are not expected to impact the interpretation of the results.
Adequate: Quantitative or qualitative results are reported for most prespecified outcomes (explicitly stated or inferred), exposure groups and evaluation timepoints. Omissions and/or attrition are not explained, but are not expected to significantly impact the interpretation of the results.
Deficient: Quantitative or qualitative results are missing for many prespecified outcomes (explicitly stated or inferred), exposure groups and evaluation timepoints and/or high animal attrition; omissions and/or attrition are not explained and may significantly impact the interpretation of the results.
Critically Deficient: Extensive results omissions and/or animal attrition are identified and prevent comparisons of results across treatment groups.
EXAMPLE RATING
Inhalation study - Good - Animal loss was reported (the authors treated 10 rats/sex/dose group and noted one death in a high-dose male rat at day 85 of study). All endpoints described in methods were reported qualitatively or quantitatively.
Exposure Methods100502982
Chemical administration and characterization100504895
CORE QUESTION
Did the study adequately characterize exposure to the chemical of interest and the exposure administration methods?
PROMPTING QUESTIONS
For each study:
- Does the study report the source and purity and/or composition (e.g., identity and percent distribution of different isomers) of the chemical? If not, can the purity and/or composition be obtained from the supplier (e.g., as reported on the website)?
- Was independent analytical verification of the test article purity and composition performed?
- Did the authors take steps to ensure the reported exposure levels were accurate?
- For inhalation studies: were target concentrations confirmed using reliable analytical measurements in chamber air?
- For oral studies: if necessary based on consideration of chemical-specific knowledge (e.g., instability in solution; volatility) and/or exposure design (e.g., the frequency and duration of exposure), were chemical concentrations in the dosing solutions or diet analytically confirmed?
- Are there concerns about the methods used to administer the chemical (e.g., inhalation chamber type, gavage volume, etc.)?
NOTE: Consideration of the appropriateness of the route of exposure is not evaluated at the individual study level. Relevance and utility of the routes of exposure are considered in the PECO criteria for study inclusion and during evidence synthesis.
BASIC CONSIDERATIONS
It is essential that these criteria are considered, and potentially refined, by assessment teams, as the specific variables of concern can vary by chemical (e.g., stability may be an issue for one chemical but not another).
A judgment and rationale for this domain should be given for each cohort or experiment in the study.
Good: Chemical administration and characterization is complete (i.e., source, purity, and analytical verification of the test article are provided). There are no concerns about the composition, stability, or purity of the administered chemical, or the specific methods of administration. For inhalation studies, chemical concentrations in the exposure chambers are verified using reliable analytical methods.
Adequate: Some uncertainties in the chemical administration and characterization are identified but these are expected to have minimal impact on interpretation of the results (e.g., source and vendor-reported purity are presented, but not independently verified; purity of the test article is sub-optimal but not concerning; for inhalation studies, actual exposure concentrations are missing or verified with less reliable methods).
Deficient: Uncertainties in the exposure characterization are identified and expected to substantially impact the results (e.g., source of the test article is not reported; levels of impurities are substantial or concerning; deficient administration methods, such as use of static inhalation chambers or a gavage volume considered too large for the species and/or lifestage at exposure).
Critically Deficient: Uncertainties in the exposure characterization are identified and there is reasonable certainty that the results are largely attributable to factors other than exposure to the chemical of interest (e.g., identified impurities are expected to be a primary driver of the results).
EXAMPLE RATINGS
Oral study - Good - Source (3M) and purity (98%) are described, and the authors provided verification using analytical methods (GC/MS). Addressing concerns about known instability in solution for this chemical, the authors verified the dosing solutions twice weekly over the course of the experiment. Animals were exposed via gavage with all dose groups receiving the same volume.
Inhalation study - Good - Source (3M) and purity (98%) of the test article are described. All animals were transferred to dynamic inhalation exposure chambers for the exposures. The concentration of the test chemical in the air was continuously monitored from the animals' breathing zone throughout the 6-hour exposure periods and mean daily average concentrations and variability were reported.
Exposure timing, frequency and duration100504896
CORE QUESTION
Were the timing, frequency, and duration of exposure sensitive for the endpoint(s)/outcome(s) of interest?
PROMPTING QUESTIONS
For each endpoint/outcome or grouping of endpoints/outcomes in a study:
- Does the exposure period include the critical window of sensitivity?
- Was the duration and frequency of exposure sensitive for detecting the endpoint of interest?
BASIC CONSIDERATIONS
Considerations for this domain are highly variable depending on the endpoint(s)/outcome(s) of interest and must be refined by assessment teams.
A judgment and rationale for this domain should be given for each endpoint/outcome or group of endpoints/outcomes investigated in the study.
Good: The duration and frequency of the exposure were sensitive and the exposure included the critical window of sensitivity (if known).
Adequate: The duration and frequency of the exposure were sensitive and the exposure covered most of the critical window of sensitivity (if known).
Deficient: The duration and/or frequency of the exposure was not sensitive and did not include the majority of the critical window of sensitivity (if known). These limitations are expected to bias the results towards the null.
Critically Deficient: The exposure design was not sensitive and is expected to strongly bias the results towards the null. The rationale should indicate the specific concern(s).
EXAMPLE RATINGS
All Endpoints/Outcomes - Good - Study uses a standard OECD short-term (28-day) study design to examine toxicological effects that are routinely evaluated in this testing guideline.
Developmental and Male Reproductive effects - Good - The experimental design and exposure period were appropriate for evaluation of potential male reproductive and developmental effects. The experiment was designed to evaluate reproductive and developmental outcomes and followed recommendations in OECD 416 and EPA OPPT 870.3800 guidelines.
Outcome Methods/Results Presentation100502983
Outcome Assessment100504898
CORE QUESTION
Are the procedures sensitive and specific for evaluating the endpoint(s)/outcome(s) of interest?
PROMPTING QUESTIONS
For each endpoint/outcome or grouping of endpoints/outcomes in a study:
- Are there concerns regarding the specificity and validity of the protocols?
- Are there serious concerns regarding the sample size (see note)?
- Are there concerns regarding the timing of the endpoint assessment?
NOTE: Sample size alone is not a reason to conclude an individual study is critically deficient.
BASIC CONSIDERATIONS
Considerations for this domain are highly variable depending on the endpoint(s)/outcome(s) of interest and must be refined by assessment teams.
A judgment and rationale for this domain should be given for each endpoint/outcome or group of endpoints/outcomes investigated in the study.
Examples of potential concerns include:
- Selection of protocols that are insensitive or non-specific for the endpoint of interest
- Use of unreliable methods to assess the outcome
- Assessment of endpoints at inappropriate or insensitive ages, or without addressing known endpoint variation (e.g., due to circadian rhythms, estrous cyclicity, etc.).
- Decreased specificity or sensitivity of the response due to the timing of endpoint evaluation, as compared to exposure (e.g., short-acting depressant or irritant effects of chemicals; insensitivity due to prolonged period of non-exposure prior to testing).
EXAMPLE RATING
Organ weight, body weights, and hormone measures - Good - No concerns regarding the specificity and validity of the protocols and measures were identified. Study authors used standard methodology for evaluating organ and body weights. Thyroid hormones were measured using commercial electrochemiluminescence-immunoassay methods, and the known diurnal variation in these measures was accounted for during blood collection.
Results presentation100504899
CORE QUESTION
Are the results presented in a way that makes the data usable and transparent?
PROMPTING QUESTIONS
For each endpoint/outcome or grouping of endpoints/outcomes in a study:
- Does the level of detail allow for an informed interpretation of the results?
- Are the data analyzed, compared, or presented in a way that is inappropriate or misleading?
BASIC CONSIDERATIONS
Considerations for this domain are highly variable depending on the outcomes of interest and must be refined by assessment teams.
A judgment and rationale for this domain should be given for each endpoint/outcome or group of endpoints/outcomes investigated in the study.
Examples of potential concerns include:
- Non-preferred presentation, such as developmental toxicity data averaged across pups in a treatment group, when litter responses are more appropriate
- Failing to present quantitative results
- Pooling data when responses are known or expected to differ substantially (e.g., across sexes or ages)
- Failing to report on or address overt toxicity when exposure levels are known or expected to be highly toxic
- Lack of full presentation of the data (e.g., presentation of mean without variance data; concurrent control data are not presented)
EXAMPLE RATING
All Endpoints/Outcomes - Good - There are no notable concerns about the way the results are analyzed or presented.
Overall Study Confidence100502988
Overall confidence (animal)100504905
CORE QUESTION
Considering the identified strengths and limitations, what is the overall confidence rating for the endpoint(s)/outcome(s) of interest?
PROMPTING QUESTIONS
For each endpoint/outcome or grouping of endpoints/outcomes in a study:
- Were concerns (i.e., limitations or uncertainties) related to the reporting quality, risk of bias, or sensitivity identified?
- If yes, what is their expected impact on the overall interpretation of the reliability and validity of the study results, including (when possible) interpretations of impacts on the magnitude or direction of the reported effects?
NOTE: Reviewers should mark studies that are rated lower than high confidence only due to low sensitivity (i.e., bias towards the null) for additional consideration during evidence synthesis. If the study is otherwise well-conducted and an effect is observed, the confidence may be increased.
BASIC CONSIDERATIONS
The overall confidence rating considers the likely impact of the noted concerns (i.e., limitations or uncertainties) in reporting, bias and sensitivity on the results.
A confidence rating and rationale should be given for each endpoint/outcome or group of endpoints/outcomes investigated in the study.
High confidence: No notable concerns are identified (e.g., most or all domains rated Good).
Medium confidence: Some concerns are identified, but they are expected to have minimal impact on the interpretation of the results (e.g., most domains rated Adequate or Good; may include studies with Deficient ratings if concerns are not expected to strongly impact the magnitude or direction of the results). Any important concerns should be carried forward to evidence synthesis.
Low confidence: Identified concerns are expected to significantly impact the study results or their interpretation (e.g., generally, Deficient ratings for one or more domains). The concerns leading to this confidence judgment must be carried forward to evidence synthesis (see note).
Uninformative: Serious flaw(s) that make the study results unusable for informing hazard identification (e.g., generally, Critically Deficient rating in any domain; many Deficient ratings). Uninformative studies are not considered further in the synthesis and integration of evidence.
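The parenthetical examples in these definitions suggest a rough starting point for the overall rating, sketched below. This is a simplification under stated assumptions, not the rating method itself: the function name is hypothetical, "most" is read as "more than half", Not Reported judgments are assumed to have already been resolved to their interpreted level (Adequate or Deficient), and the "many Deficient ratings" route to Uninformative is not encoded because no threshold is given. The definitions allow a study with Deficient ratings to remain Medium confidence, so the reviewer's judgment of the expected impact on the results takes precedence over any such tally.

```python
ALLOWED = {"Good", "Adequate", "Deficient", "Critically Deficient"}


def suggest_overall_confidence(domain_judgments):
    """Suggest a starting overall confidence rating from domain judgments.

    domain_judgments -- non-empty list of judgments, one per domain, with any
                        Not Reported entries already replaced by their
                        interpreted level.
    """
    if not domain_judgments:
        raise ValueError("at least one domain judgment is required")
    unknown = set(domain_judgments) - ALLOWED
    if unknown:
        raise ValueError(f"unrecognized judgments: {sorted(unknown)}")

    if "Critically Deficient" in domain_judgments:
        return "Uninformative"
    if "Deficient" in domain_judgments:
        # "Generally" Low; reviewers may justify Medium if impact is limited.
        return "Low"
    good = domain_judgments.count("Good")
    if good > len(domain_judgments) / 2:  # assumption: "most" means more than half
        return "High"
    return "Medium"


if __name__ == "__main__":
    print(suggest_overall_confidence(["Good", "Adequate", "Adequate"]))
```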
EXAMPLE RATINGS
Reproductive and developmental effects other than behavior - High Confidence - The study was well-designed for the evaluation of reproductive and developmental toxicity induced by chemical exposure. The study applied established approaches, recommendations, and best practices, and employed an appropriate exposure design for these endpoints. Evidence was presented clearly and transparently.
Behavioral measures - Low Confidence - The cursory cage-side observations of activity are considered to be insensitive and non-specific methods for detecting motor effects, with a strong bias towards the null.
Selection and Performance100502979
Participant selection100504892
EXAMPLE TEXT: Adequate. Nested case-control design in Mexico City birth cohort with 30 cases of preterm birth and 30 controls selected randomly from same population of women who were recruited during prenatal visits at one of four clinics (serving low to moderate income population). Recruitment and eligibility criteria (inclusion/exclusion criteria) discussed. Little discussion of participants versus nonparticipants but the available information indicates that differential selection is possible but not likely. Participation rate reported to be low (36%). Evaluates the vulnerable population of low-moderate income pregnant women.
Add other concerns or limitations.
Add impact and direction to effect estimate, if applicable.
RATING GUIDANCE: Is there evidence that selection into or out of the study (or analysis sample) was jointly related to exposure and to outcome?
Study design, where and when was the study conducted, and who was included? Recruitment process, exclusion and inclusion criteria, type of controls, total eligible, comparison between participants and nonparticipants (or followed and not followed), final analysis group. Does the study include potential vulnerable/susceptible groups or lifestages?
Exposure Methods100502982
Exposure measures100504897
EXAMPLE TEXT: Poor for long-chained (DEHP, DiNP) and adequate for short-chained (DEP, DBP, DiBP) phthalate metabolites based on number of samples. A single spot (second morning void) urine sample was collected from each woman during a third-trimester visit to the project's research center; third trimester sample is relevant to later term preterm births. Analytical approach described and appropriate. High percent >LOD.
Add other concerns or limitations.
Add impact and direction to effect estimate, if applicable.
RATING GUIDANCE: Does the exposure measure reliably distinguish between levels of exposure in a time window considered most relevant for a causal effect with respect to the development of the outcome?
Source(s) of exposure (consumer products, occupational, an industrial accident) and source(s) of exposure data, blinding to outcome, level of detail for job history data, when measurements were taken, type of biomarker(s), assay information, reliability data from repeat measures studies, validation studies.
Follow link to see attachments that contain example prompting and follow-up questions for epidemiological studies.
Outcome Methods/Results Presentation
Outcome measures
EXAMPLE TEXT: Adequate. Preterm birth defined by length of gestation (<37 weeks), a standard measure of birth outcome, with gestational age estimated from maternal recall of the date of the last menstrual period rather than the preferred early ultrasound. Reliance on maternal recall may misclassify preterm cases; this misclassification is likely nondifferential with respect to exposure, and differential misclassification is possible but unlikely.
Add other concerns or limitations.
Add impact and direction to effect estimate, if applicable.
RATING GUIDANCE: Does the outcome measure reliably distinguish the presence or absence (or degree of severity) of the outcome?
Source of outcome (effect) measure, blinding to exposure status or level, how measured/classified, incident versus prevalent disease, evidence from validation studies, prevalence (or distribution summary statistics for continuous measures).
Follow link to see attachments that contain example prompting and follow-up questions for epidemiological studies.
Confounding
Confounding
EXAMPLE TEXT: Adequate. Information on key confounders was collected through questionnaire. The strategy for evaluating confounding and the process for retaining variables in the models were described. Rationale for selecting confounders not provided. Inclusion in model not solely based on statistical significance. Adjustment for relevant co-exposures.
Add other concerns or limitations.
Add impact and direction to effect estimate, if applicable.
RATING GUIDANCE: Is confounding of the effect of the exposure unlikely?
Background research on key confounders for specific populations or settings; participant characteristic data, by group; strategy/approach for consideration of potential confounding; strength of associations between exposure and potential confounders and between potential confounders and outcome; degree of exposure to the confounder in the population.
Follow link to see attachments that contain example prompting and follow-up questions for epidemiological studies.
Analysis
Does the analysis strategy and presentation convey the necessary familiarity with the data and assumptions?
EXAMPLE TEXT: Adequate. Multivariable (multivariate) logistic regression used to account for potential confounding variables; quantitative results presented (ORs and 95% CIs, with ORs adjusted for confounders). Imputation techniques used to fill in phthalate metabolite concentrations that were below the LOD. Amount of missing data not noted. Exposure dichotomized at the median (reduced sensitivity); results adjusted for urine creatinine and for specific gravity to assess the effect of the adjustment method used.
Add other concerns or limitations.
Add impact and direction to effect estimate, if applicable.
RATING GUIDANCE: Does the analysis strategy and presentation convey the necessary familiarity with the data and assumptions?
Extent (and if applicable, treatment) of missing data for exposure, outcome, and confounders, approach to modeling, classification of exposure and outcome variables (continuous versus categorical), testing of assumptions, sample size for specific analyses, relevant sensitivity analyses.
An ideal study would convey a thoughtful and thorough description of the analytical approach, and descriptive data for key variables (e.g., exposure measures, outcome measures), including the amount of missing data (or proportion less than the limit of detection [LOD]). The ideal analysis would use an appropriate and well-thought-out modeling approach for the study design (e.g., logistic regression for case-control data) and specify the covariates used in the final model; the methods should be described in enough detail that they could be applied to the data from another study. In addition, the results should be presented with sufficient detail to enable estimation of effect estimates and precision of the estimates (e.g., standard error [SE] or confidence interval [CI]).
Follow link to see attachments that contain example prompting and follow-up questions for epidemiological studies.
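As an illustration of what "sufficient detail to enable estimation of effect estimates and precision" can mean in practice, the sketch below computes a crude (unadjusted) odds ratio with a Wald 95% CI from a 2x2 table, and recovers the SE of the log odds ratio from a reported CI. The function names and the cell counts are hypothetical and are not taken from any study; an adjusted OR from a logistic regression model would generally differ from this crude estimate.

```python
import math

Z_95 = 1.96  # normal quantile for a two-sided 95% interval


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int) -> tuple[float, float, float]:
    """Return (OR, lower, upper) for a 2x2 table (hypothetical helper).

    a = exposed cases,    b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - Z_95 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + Z_95 * se_log_or)
    return odds_ratio, lower, upper


def se_from_ci(lower: float, upper: float) -> float:
    """Recover the SE of log(OR) from a reported 95% CI."""
    return (math.log(upper) - math.log(lower)) / (2 * Z_95)


# Hypothetical counts for a study with 30 cases and 30 controls.
or_, lo, hi = odds_ratio_wald_ci(a=20, b=10, c=15, d=15)
print(f"OR = {or_:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")
print(f"SE of log(OR) recovered from the CI = {se_from_ci(lo, hi):.3f}")
```

A reviewer can use the second function when a paper reports only the OR and CI: if the SE cannot be recovered this way (for example, no interval or counts are given), that is a reporting limitation worth noting in the rationale.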
Selective Reporting
Selective Reporting
EXAMPLE TEXT: Adequate. No concerns for selective reporting.
RATING GUIDANCE: Is there concern for selective reporting?
Rating should be two-level: Adequate or Deficient.
Are results presented with adequate detail for all the endpoints of interest? Are results presented for the full sample as well as for specified subgroups? Were stratified analyses (effect modification) motivated by a specific hypothesis?
Follow link to see attachments that contain example prompting and follow-up questions for epidemiological studies.
Sensitivity
Are there concerns for study sensitivity?
EXAMPLE TEXT: Deficient. Small sample size. Potential nondifferential misclassification of outcome and exposure. Low exposure levels. Narrow range of exposure. Healthy worker effect.
Add other concerns or limitations.
Add impact and direction to effect estimate, if applicable.
RATING GUIDANCE: Are there concerns for study sensitivity?
What exposure range is spanned in this study? What are the ages of participants (e.g., not too young in studies of pubertal development)? What is the length of follow-up (for outcomes with long latency periods)? Choice of referent group and the level of exposure contrast between groups (i.e., the extent to which the 'unexposed group' is truly unexposed, and the prevalence of exposure in the group designated as 'exposed'). Is the study relevant to the exposure and outcome of interest?
Follow link to see attachments that contain example prompting and follow-up questions for epidemiological studies.
Overall Study Confidence
Overall confidence (epi)
EXAMPLE TEXT: Low confidence. Give brief rationale for rating.
Add other concerns or limitations.
Add impact and direction to effect estimate, if applicable.
RATING GUIDANCE: Once the evaluation domains have been classified, these ratings will be combined to reach an overall study confidence classification of High, Medium, Low, or Uninformative.
This classification will be based on the classifications in the evaluation domains and will include consideration of the likely impact of the noted deficiencies in bias and sensitivity on the results. Studies with critical deficiencies in any evaluation domain will be classified as Uninformative. Other classifications will generally follow a sorting such that High Confidence studies would have the highest evaluation ('Good') for all or most domains; Low Confidence studies would have a 'Deficient' evaluation for one or more domains (unless the impact of the particular limitation(s) is judged unlikely to be severe); and Medium Confidence studies fall between these groups (e.g., most domains receiving a mid-level 'Adequate' evaluation, with no limitations judged to be severe). Once the initial evaluation has been performed with consensus between reviewers, the classifications will be re-evaluated, looking at the variability 'within' and 'between' levels to ensure that the separation between the levels of confidence is appropriate and that no additional criteria need to be considered.
Follow link to see attachments that contain example prompting and follow-up questions for epidemiological studies.
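The general sorting described in the rating guidance above can be sketched as a small helper that proposes a starting point for the overall rating. This is only a sketch under stated assumptions, not a substitute for reviewer judgment: the function name is hypothetical, "most domains" is read as more than half, "Not Reported" is treated like "Deficient", and the reviewer's judgment about severity is passed in as a flag.

```python
from typing import Mapping

GOOD = "Good"
ADEQUATE = "Adequate"
DEFICIENT = "Deficient"
NOT_REPORTED = "Not Reported"
CRITICALLY_DEFICIENT = "Critically Deficient"
VALID = {GOOD, ADEQUATE, DEFICIENT, NOT_REPORTED, CRITICALLY_DEFICIENT}


def suggest_overall_confidence(
    domain_judgments: Mapping[str, str],
    deficiency_likely_severe: bool = True,
) -> str:
    """Suggest a starting overall rating from domain judgments (hypothetical helper)."""
    judgments = list(domain_judgments.values())
    if not judgments:
        raise ValueError("at least one domain judgment is required")
    unknown = set(judgments) - VALID
    if unknown:
        raise ValueError(f"unrecognized judgment(s): {sorted(unknown)}")

    # A critical deficiency in any domain makes the study Uninformative.
    if CRITICALLY_DEFICIENT in judgments:
        return "Uninformative"

    # Assumption: "Not Reported" is handled like "Deficient".
    has_deficiency = DEFICIENT in judgments or NOT_REPORTED in judgments
    if has_deficiency and deficiency_likely_severe:
        return "Low"

    # Assumption: "all or most domains" means more than half rated Good.
    if not has_deficiency and judgments.count(GOOD) * 2 > len(judgments):
        return "High"
    return "Medium"


# Illustrative judgments loosely following the example texts; labels are hypothetical.
example = {
    "Selection": ADEQUATE,
    "Exposure": DEFICIENT,
    "Outcome": ADEQUATE,
    "Confounding": ADEQUATE,
    "Analysis": ADEQUATE,
    "Selective Reporting": ADEQUATE,
    "Sensitivity": DEFICIENT,
}
print(suggest_overall_confidence(example))
```

The suggested value is only the first pass; the re-evaluation of variability within and between levels, and the documented rationale, remain the reviewers' responsibility.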