Non-randomized clinical trials: definition

Provided that effect estimates from the included studies can be expressed using consistent effect measures, we recommend that review authors display individual study results for NRSI with similar study design features using forest plots, as a standard feature. If consistent effect measures are not available or calculable, additional tables should be used to present results in a systematic format (see also Chapter 12). If the features of studies are not sufficiently similar to combine in a meta-analysis (which is expected to be the norm for reviews that include NRSI), we recommend displaying the results of included studies in a forest plot but suppressing the summary estimate (see Chapter 12). For example, in a review of the effects of circumcision on risk of HIV infection, a forest plot illustrated the result from each study without synthesizing them (Siegfried et al). Studies may be sorted in the forest plot, or shown in separate forest plots, by study design feature or by risk of bias.
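
As a minimal, hypothetical sketch of such a display (the study names, effect estimates and standard errors below are invented, and any general plotting library could be used), individual log odds ratios can be drawn with confidence intervals and no pooled estimate:

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical NRSI results: log odds ratios with standard errors,
# grouped by study design (cohort vs. case-control).
studies = ["Cohort A", "Cohort B", "Case-control C", "Case-control D"]
log_or = np.array([-0.35, -0.50, -0.20, -0.60])
se = np.array([0.15, 0.22, 0.18, 0.25])

lo, hi = log_or - 1.96 * se, log_or + 1.96 * se
y = np.arange(len(studies))[::-1]           # top-to-bottom ordering

fig, ax = plt.subplots(figsize=(6, 3))
ax.errorbar(log_or, y, xerr=[log_or - lo, hi - log_or],
            fmt="s", color="black", capsize=3)
ax.axvline(0, linestyle="--", linewidth=1)  # line of no effect
ax.set_yticks(y)
ax.set_yticklabels(studies)
ax.set_xlabel("log odds ratio (95% CI)")
ax.set_title("Individual study results, no pooled estimate")
plt.tight_layout()
plt.show()
```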

For example, the circumcision studies were separated into cohort studies, cross-sectional studies and case-control studies. Heterogeneity diagnostics and investigations (e.g. subgroup analyses) can still be undertaken for such displays. Non-statistical syntheses of quantitative intervention effects (see Chapter 12) are challenging, however, because it is difficult to set out or describe results without being selective or emphasizing some findings over others.

Ideally, authors should set out in the review protocol how they plan to use narrative synthesis to report the findings of primary studies. As highlighted at the outset, review authors have a duty to summarize available evidence about interventions, balancing harms against benefits and qualified with a certainty assessment.

Nevertheless, obtaining definitive results about the likely effects of an intervention based on NRSI alone can be difficult (Deeks et al). Challenges arise at all stages of conducting a review of NRSI: deciding which study design features should be specified as eligibility criteria, searching for studies, assessing studies for potential bias, and deciding how to synthesize results.

A review author needs to satisfy the reader of the review that these challenges have been adequately addressed, or should discuss how and why they cannot be met. In this section, the challenges are illustrated with reference to issues raised in the different sections of this chapter. The Discussion section of the review should address the extent to which the challenges have been met.

Even if the choice of eligible study design features can be justified, it may be difficult to show that all relevant studies have been identified because of poor indexing and inconsistent use of study design labels or poor reporting of design features by researchers. Comprehensive search strategies that focus only on the health condition and intervention of interest are likely to result in a very long list of bibliographic records including relatively few eligible studies; conversely, restrictive strategies will inevitably miss some eligible studies.

In practice, available resources may make it impossible to process the results from a comprehensive search, especially since review authors will often have to read full papers rather than abstracts to determine eligibility. The implications of using a more or less comprehensive search strategy are not known. Interpretation of the results of a review of NRSI should include consideration of the likely direction and magnitude of bias, although this can be challenging to do.

Some of the biases that affect randomized trials also affect NRSI, but typically to a greater extent. For example, attrition in NRSI is often worse (and more poorly reported), intervention and outcome assessment are rarely conducted according to standardized protocols, outcomes are rarely assessed blind to the allocated intervention or comparator, and there is typically little protection against selection of the reported result.

We recommend using the ROBINS-I tool to assess the risk of bias because of the consensus among a large team of developers that it covers all important bias domains. This is not true of any other tool to assess the risk of bias in NRSI.

The importance of individual bias domains may vary according to the review question; for example, confounding may be less likely to arise in NRSI of long-term or adverse effects, or of some public health primary prevention interventions. As with randomized trials, one clue to the presence of bias is notable between-study heterogeneity. Although heterogeneity can arise through differences in participants, interventions and outcome assessments, the possibility that bias is the cause of heterogeneity in reviews of NRSI must be seriously considered.

However, lack of heterogeneity does not indicate lack of bias, since it is possible that a consistent bias applies in all studies. This is a subject of ongoing research which is attempting to gather empirical evidence on factors such as study design features and intervention type that determine the size and direction of the biases.

The ability to predict both the likely magnitude of bias and the likely direction of bias would greatly improve the usefulness of evidence from systematic reviews of NRSI.

There is currently some evidence that in limited circumstances the direction, at least, can be predicted (Henry et al). Assembling the evidence from NRSI on a particular health question enables informed debate about its meaning and importance, and the certainty that can be attributed to it. Critically, there needs to be a debate about whether the findings could be misleading. This emphasizes the general concern about biases in NRSI, and the difficulties of attributing causality to the observed associations between intervention and outcome.

In preference to traditional evidence hierarchies, the GRADE approach is recommended for assessing the certainty of a body of evidence in Cochrane Reviews, and is summarized in Chapter 14. The certainty is rated down in the presence of serious concerns about study limitations (risk of bias), indirectness of evidence, heterogeneity, imprecision or publication bias.

For example, the strength of evidence for an association may be enhanced by a subset of primary studies that have tested considerations about causality not usually applied to randomized trial evidence (Bradford Hill), or by the use of negative controls (Jackson et al). In some contexts, little prognostic information may be known, limiting the identification of possible confounders (Jefferson et al). Whether the debate concludes that the evidence from NRSI is adequate for informed decision making, or that there is a need for randomized trials, will depend on the value placed on the uncertainty arising through use of potentially biased NRSI, and the collective value of the observed effects.

The GRADE approach interprets certainty as the certainty that the effect of the intervention is large enough to reach a threshold for action. This value may depend on the wider healthcare context.

It may not be possible to include assessments of the value within the review itself, and it may become evident only as part of the wider debate following publication. For example, is evidence from NRSI of a rare serious adverse effect adequate to decide that an intervention should not be used?

The evidence has low certainty due to a lack of randomized trials but the value of knowing that there is the possibility of a potentially serious harm is considerable, and may be judged sufficient to withdraw the intervention. It is worth noting that the judgement about withdrawing an intervention may depend on whether equivalent benefits can be obtained from elsewhere without such a risk; if not, the intervention may still be offered but with full disclosure of the potential harm.

Where evidence of benefit is also uncertain, the value attached to a systematic review of NRSI of harm may be even greater. In contrast, evidence of a small benefit of a novel intervention from a systematic review of NRSI may not be sufficient for decision makers to recommend widespread implementation in the face of the uncertainty of the evidence and the costs arising from provision of the intervention.

In these circumstances, decision makers may conclude that randomized trials should be undertaken to improve the certainty of the evidence if practicable and if the investment in the trial is likely to be repaid in the future. Carrying out a systematic review of NRSI is likely to require complex decisions, often necessitating members of the review team with content knowledge and methodological expertise about NRSI at each stage of the review.

Potential review authors should therefore seek to collaborate with methodologists, irrespective of whether a review aims to investigate harms or benefits, short-term or long-term outcomes, or frequent or rare events. Review teams may be keen to include NRSI in systematic reviews in areas where there are few or no randomized trials because they have the ambition to improve the evidence base in their specialty areas (a key motivation for many Cochrane Reviews).

However, for reviews of NRSI to estimate the effects of an intervention on short-term and expected outcomes, review authors should also recognize that the resources required to do a systematic review of NRSI are likely to be much greater than for a systematic review of randomized trials. Inclusion of NRSI to address some review questions will be invaluable in addressing the broad aims of a review; however, the conclusions in relation to some review questions are likely to be much weaker and may make a relatively small contribution to the topic.

Therefore, review authors and Cochrane Review Group editors need to decide at an early stage whether the investment of resources is likely to be justified by the priority of the research question. Bringing together the required team of healthcare professionals and methodologists may be easier for systematic reviews of NRSI to estimate the effects of an intervention on long-term and rare adverse outcomes, for example when considering the side effects of drugs.

A review of this kind is likely to provide important missing evidence about the effects of an intervention in a priority area. However, these reviews may require the input of additional specialist authors, for example those with relevant content (e.g. pharmacological) expertise. There is a pressing need in many health conditions to supplement traditional systematic reviews of randomized trials of effectiveness with systematic reviews of adverse (unintended) effects. It is likely that these systematic reviews will usually need to include NRSI.

References

Systematic reviews of nonrandomized clinical studies in the orthopaedic literature. Clinical Orthopaedics and Related Research.
Bradford Hill A. The environment and disease: association or causation? Proceedings of the Royal Society of Medicine; 58.
Empirical evidence for selective reporting of outcomes in randomized trials: comparison of protocols to published articles. JAMA.
Evaluating non-randomised intervention studies. Health Technology Assessment; 7.
Doll R. Doing more good than harm: the evaluation of health care interventions: summation of the conference. Annals of the New York Academy of Sciences.
North of England evidence based guidelines development project: methods of guideline development. BMJ.
Limited search strategies were effective in finding relevant nonrandomized studies. Journal of Clinical Epidemiology; 59.
When are randomised trials unnecessary? Picking signal from noise.
Room for improvement? A survey of the methods used in systematic reviews of adverse effects. Health Information and Libraries Journal; 23.
Identifying systematic reviews of the adverse effects of health care interventions.
Agreement between randomized and non-randomized studies: the effects of bias and confounding.
Using big data to emulate a target trial when a randomized trial is not available. American Journal of Epidemiology.
Measuring inconsistency in meta-analyses.
Issues relating to study design and risk of bias when including non-randomized studies in systematic reviews on the effects of interventions. Research Synthesis Methods; 4.
Evidence of bias in estimates of influenza vaccine effectiveness in seniors. International Journal of Epidemiology; 35.
Assessment of the efficacy and effectiveness of influenza vaccines in healthy children: systematic review. The Lancet.

In a trial intended to show that the difference between an experimental and a control treatment is smaller than a specified amount, a noninferiority design statistically tests the null hypothesis that the experimental treatment is inferior to the control by at least the prespecified (equivalence) margin.

One approach to specifying the margin is based on clinical significance, which can obviously be subjective. Sometimes it is possible to choose a margin so wide that a treatment declared noninferior may in fact have no effect, or even a detrimental effect. The margin should be based on both statistical reasoning and clinical judgment, and cannot be greater than the smallest effect that the active control could reliably be expected to have over placebo, as established in earlier placebo-controlled trials.
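
Stated formally (a sketch; here θ denotes the true treatment difference, experimental minus control, on a scale where larger values are better, and δ > 0 is the prespecified noninferiority margin):

```latex
% Noninferiority hypotheses for the treatment difference \theta = \theta_E - \theta_C
% with prespecified margin \delta > 0 (larger outcomes are better):
H_0 : \theta \le -\delta \quad \text{(experimental inferior by at least } \delta\text{)}
H_1 : \theta > -\delta \quad \text{(experimental not unacceptably inferior)}
% Decision rule: declare noninferiority if the lower limit of the
% (1 - 2\alpha) two-sided confidence interval for \theta lies above -\delta,
% which is equivalent to a one-sided test at level \alpha.
```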

Using the treatment difference observed for the control treatment in a previously published placebo-controlled trial is one way to inform the size of the margin statistically. If there is a need to ensure that the experimental treatment retains some of the control treatment's effect, the margin can be chosen as a fraction of that control treatment effect (Figure: noninferiority margin; the positioning of the outcome result for each treatment is indicated).

As previously pointed out, setting an inappropriate margin can cause a noninferiority test to misleadingly conclude that an ineffective treatment is effective. In some cases, noninferiority tests can be useless unless the trial is carefully designed. A clinical trial should have the ability to distinguish effective treatments from those that are less effective, or ineffective.

This is defined as "assay sensitivity" and there is a question of whether noninferiority trials have the power to detect a beneficial treatment against a placebo even if a placebo group is included in the trial.

For example, even if the control treatment has shown efficacy in previous placebo-controlled trials, its effect cannot be relied upon to occur consistently in the current trial; if both treatments are in fact ineffective in the current trial, the test may simply declare the noninferiority of an experimental treatment to an ineffective control (Figure: hypothetical scenario; numbers represent the values of a positive outcome, and the noninferiority margin is determined by halving the control effect estimated from a historical placebo-controlled trial).

The presence of assay sensitivity in a noninferiority trial is not verifiable; it can only be assumed, based on historical evidence of sensitivity to drug effects, the similarity of the trial design to designs that were able to distinguish the efficacy of the active control from that of placebo, and the quality of trial conduct.
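
To make the concern concrete, consider some invented numbers consistent with that scenario (assumed purely for illustration): in a historical placebo-controlled trial the control achieved a response of 60 versus 40 on placebo, an effect of 20, so the margin is set at half of this, namely 10. In the current noninferiority trial both arms happen to be ineffective and respond at roughly the placebo level, say 41 for the experimental treatment and 40 for the control. The observed difference of 1 lies comfortably above -10, so noninferiority is declared even though neither treatment is doing anything.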

Trial designs should be compared closely in terms of inclusion criteria, methods of diagnosis, and concomitant treatments in order to evaluate consistency over time. Assessing noninferiority in this context resembles an indirect comparison, i.e. the experimental treatment is effectively compared with placebo via the historical trials of the active control.

The sample size of a noninferiority trial is very sensitive to the expected effects of the experimental treatments and controls. Although there could be other reasons for undertaking noninferiority trials, showing noninferiority would be more appropriate when there is an expectation that 2 treatments are similar.

A larger sample size is needed if a new treatment is assumed to be slightly less effective than the control, since in such situations it is more difficult to show noninferiority, unless a considerably narrower CI is obtained. On the other hand, the required sample size can be reduced if a new treatment is assumed to be slightly more effective than the active control.

The noninferiority margin is another major factor that influences sample size: the greater the tolerance that is allowed, the smaller the sample size that is needed. However, an inflated margin may cause considerable loss of statistical power if noninferiority would, in the end, be accepted only within a smaller margin (Table 1).
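
The following rough sketch uses the standard normal-approximation sample-size formula for a noninferiority comparison of two proportions; the response rates, margin, power and one-sided alpha are hypothetical planning values, and a real trial would usually rely on purpose-built software or simulation.

```python
from math import ceil
from scipy.stats import norm

def n_per_arm_noninferiority(p_exp, p_ctl, margin, alpha=0.025, power=0.9):
    """Approximate sample size per arm for a noninferiority comparison of
    two proportions (higher proportion = better outcome).

    H0: p_exp - p_ctl <= -margin  vs  H1: p_exp - p_ctl > -margin,
    tested one-sided at level `alpha` (0.025 matches a 95% two-sided CI).
    Normal-approximation formula, for illustration only.
    """
    z_a, z_b = norm.ppf(1 - alpha), norm.ppf(power)
    variance = p_exp * (1 - p_exp) + p_ctl * (1 - p_ctl)
    effect = (p_exp - p_ctl) + margin   # distance from the true difference to the margin
    return ceil((z_a + z_b) ** 2 * variance / effect ** 2)

# Control response 70%; margin of 10 percentage points.
print(n_per_arm_noninferiority(0.70, 0.70, 0.10))  # treatments truly equal (~441 per arm)
print(n_per_arm_noninferiority(0.68, 0.70, 0.10))  # experimental slightly worse -> larger n
print(n_per_arm_noninferiority(0.72, 0.70, 0.10))  # experimental slightly better -> smaller n
print(n_per_arm_noninferiority(0.70, 0.70, 0.05))  # narrower margin -> much larger n
```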

(Table 1: numbers represent the values of a positive outcome.)

Intention-to-treat (ITT) analysis is conventionally accepted as an unbiased analytical approach for superiority trials.

Analysing all randomized patients according to the treatments to which they were assigned, regardless of whether they received the treatment or not, has a conservative effect on the conclusions of a superiority trial. However, ITT analysis may not be conservative for noninferiority trials, since including dropouts in the analysis tends to bias the results toward equivalence, even when the experimental treatment is less effective than the control.
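
A small hypothetical simulation (all rates and sample sizes invented) illustrates this dilution: among patients who actually receive their assigned treatment the experimental arm is inferior by the full margin, yet as non-adherence grows the ITT difference shrinks toward zero and noninferiority is declared more often.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 500                                   # patients per arm
p_ctl, p_exp, p_none = 0.70, 0.60, 0.50   # response if treated (control / experimental) or untreated
margin = 0.10                             # noninferiority margin (risk difference)

def one_trial(nonadherence):
    """One simulated trial analysed by ITT: non-adherent patients stay in their
    assigned arm but respond at the 'untreated' rate, diluting the true difference."""
    adhere_c = rng.random(n) > nonadherence
    adhere_e = rng.random(n) > nonadherence
    resp_c = rng.random(n) < np.where(adhere_c, p_ctl, p_none)
    resp_e = rng.random(n) < np.where(adhere_e, p_exp, p_none)
    pc, pe = resp_c.mean(), resp_e.mean()
    se = np.sqrt(pc * (1 - pc) / n + pe * (1 - pe) / n)
    return (pe - pc) - 1.96 * se > -margin   # noninferiority declared?

for nonadherence in (0.0, 0.2, 0.4):
    declared = np.mean([one_trial(nonadherence) for _ in range(2000)])
    print(f"non-adherence {nonadherence:.0%}: noninferiority declared in {declared:.0%} of trials")
```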

The per-protocol analysis, which includes all patients who satisfactorily complied with the assigned treatment and who had no major protocol violations, is more likely to identify any treatment differences, but it can also substantially bias the results in either direction. The recommended approach for noninferiority trials is to perform both analyses and to conclude noninferiority only if both analyses yield the same conclusion.

Interpreting a noninferiority trial additionally as a superiority trial is credible, without need for a statistical penalty for multiple testing. The opposite approach, however, is not valid: if a superiority trial fails to reject the null hypothesis but the data appear to suggest treatment equivalence, one may be tempted to infer noninferiority.

If there is a possibility for testing noninferiority alongside a superiority test, one should predefine both hypotheses with a justifiable margin for noninferiority in the protocol. Testing noninferiority based on an ad hoc determination of a noninferiority margin after a trial is complete would not be acceptable due to bias.

If this definition is used with the trigger point being the first dose of the study medication, the full analysis set and the safety population will most likely be identical. It is not uncommon to define two populations that are identical but use them for different analyses: for safety analyses, the safety population is used; for efficacy analyses, the full analysis set is used. Another term that can be used in non-randomized studies is the evaluable population, usually defined as all subjects who receive any amount of the study medication and have at least one post-baseline efficacy measurement.

The evaluable population in non-randomized clinical trials is similar to the modified ITT population in randomized clinical trials, in which some randomized subjects are excluded from the analysis for justifiable reasons.
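
A minimal sketch of how these population flags might be derived from subject-level data (the column names are assumptions made for illustration):

```python
import pandas as pd

# Hypothetical subject-level data, one row per subject.
subjects = pd.DataFrame({
    "subject_id":              [1, 2, 3, 4],
    "received_any_dose":       [True, True, True, False],
    "n_postbaseline_efficacy": [3, 0, 2, 0],
})

# Safety population: any subject who received any amount of study medication.
subjects["safety"] = subjects["received_any_dose"]

# Full analysis set, with the first dose as the trigger point: here identical
# to the safety population, as noted in the text.
subjects["full_analysis_set"] = subjects["received_any_dose"]

# Evaluable population: any dose plus at least one post-baseline efficacy measurement.
subjects["evaluable"] = subjects["received_any_dose"] & (subjects["n_postbaseline_efficacy"] > 0)

print(subjects)
```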

In summary, while the general principle is the same, different terms may be preferred depending on whether a study is randomized or non-randomized.

This is in contrast to the Habicht classification, which links these designs to the strength of the obtainable evidence (adequacy, plausibility, probability) [12]. A cluster-randomised trial (CRT) as defined here needs to fulfil two basic criteria: (1) the intervention is allocated at random, or using a quasi-random method of systematic allocation, and (2) there are sufficient clusters to allow a statistically meaningful comparison between the intervention and control arms.

If fewer than 4 clusters are allocated to each of the intervention and control arms (5–6 in a pair-matched trial), then the statistical between-arm comparison is not informative, as there is no chance for the p value to be low; for example, with three clusters per arm an exact rank-based comparison of cluster-level summaries cannot reach p < 0.05.
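
A quick way to see this limit, using an exact rank-based comparison of cluster-level summaries with three (hypothetical) clusters per arm:

```python
from scipy.stats import mannwhitneyu

# Cluster-level outcome summaries (hypothetical values), 3 clusters per arm.
intervention = [0.10, 0.12, 0.15]
control      = [0.30, 0.35, 0.40]

# Even with the most extreme possible separation, an exact two-sided
# Wilcoxon/Mann-Whitney comparison of 3 vs 3 clusters cannot go below p = 0.10.
res = mannwhitneyu(intervention, control, alternative="two-sided", method="exact")
print(res.pvalue)   # 0.1
```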

A trial in which too few clusters are allocated to allow a statistical between-arm comparison is not a CRT by this definition (in contrast to the EPOC definition [9]). The methodology of CRTs has been described in depth [1, 2, 15]. The key difference between individually randomised and cluster-randomised studies lies in the loss of study power due to cluster-randomisation, often expressed as the design effect [1, 2, 15].

The design effect is the factor by which the sample size needs to be multiplied to account for clustering. The design effect can be unpredictable for outcomes with a high spatial or temporal variability, as in the case of many infections [ 16 ]. Underestimating the design effect occurs even in otherwise well-planned studies. For example, in the large-scale ZAMSTAR trial testing two tuberculosis case-finding interventions, the design effect was considerably higher than expected [ 17 ].
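
For planning purposes, the usual approximation for equal cluster sizes is DEFF = 1 + (m - 1) x ICC, where m is the cluster size and ICC is the intracluster correlation coefficient; the sketch below (with hypothetical planning numbers) shows how strongly the required sample size depends on an ICC that is often poorly known in advance.

```python
def design_effect(cluster_size, icc):
    """Standard approximation for equal cluster sizes: DEFF = 1 + (m - 1) * ICC."""
    return 1 + (cluster_size - 1) * icc

def clustered_sample_size(n_individual, cluster_size, icc):
    """Total sample size after inflating an individually randomised
    requirement `n_individual` by the design effect."""
    return int(round(n_individual * design_effect(cluster_size, icc)))

# Hypothetical planning numbers: 400 participants needed under individual
# randomisation, 50 participants per cluster.
for icc in (0.01, 0.05, 0.20):
    print(icc, design_effect(50, icc), clustered_sample_size(400, 50, icc))
# ICC 0.01 -> DEFF 1.49;  ICC 0.05 -> DEFF 3.45;  ICC 0.20 -> DEFF 10.8
```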

Unless reliable design effect estimates are available or the effect size of interest is large, CRTs can be an expensive gamble. Results from several inconclusive CRTs can be pooled in meta-analysis to improve precision. Statistically, meta-analysis is much more straightforward for CRTs than for non-randomised studies: in non-randomised studies the pooled effects reflect the average amount of confounding, whereas in CRTs confounding can only be due to chance.

Unlike in CRTs, including a large number of studies in a meta-analysis of non-randomised studies does not minimise confounding. While all study designs provide more convincing evidence if the effect size of interest is large, the CRT remains the only study design suitable for investigating small effects. Both the confidence interval of the effect size and the risk of confounding can be minimised by increasing the sample size (in particular the number of clusters), sometimes allowing the detection of very small effects [18].

All other study designs are at higher risk of confounding, the size of which is independent of the sample size.

Even if confounding can be minimised by statistical methods, the potential for residual confounding is likely to be larger than an expected small intervention effect. As will be discussed in more detail below, the CRT is the only study design that does not require a baseline measure of the outcome to minimise confounding, although a baseline can help to improve study power, explore any imbalances and adjust for these if appropriate. Imbalances between study arms can be assumed to have arisen by chance, unless proven otherwise.

Baseline measurements are costly, may cause reactivity in the study population due to repeated surveying, and may already be outdated by the time a delayed intervention is delivered. If no baseline survey is needed, the investigators are in the comfortable position of letting the implementers work according to their schedule, and can use the time to develop study procedures in a subset of the trial population or in external pilot clusters. A disadvantage of CRTs is that, for ethical reasons, participants often (but not always [19]) need to be told that they are part of a trial, possibly altering their behaviour and their responses to questions [20–22].

This may be a considerable problem especially in trials that are neither blinded nor use an objective outcome measure. Several meta-analyses have shown that such trials produce estimates that are severely affected by responder and observer bias [ 20 , 21 ]. These trials are the most problematic for the public since randomised trials carry a large weight in decision making, while it is the process of informed consent usually required in a randomised trial that may contribute to bias [ 21 ].

Not all may be lost for unblinded trials with a subjective outcome in situations where the purpose of the outcome assessment can be hidden from the study participants, for example by presenting it as a general health survey. In this context, the unit of treatment allocation may be important.

If an unblinded intervention evaluated using a subjective outcome (e.g. point-of-use water treatment assessed through self-reported diarrhoea) is allocated at household level, participants can readily link the outcome assessment to the intervention, so apparent effects may largely reflect responder bias. If allocation is done at community level (e.g. village-wide sanitation), this link is less obvious to participants. In contrast to the household-level water treatment trials, several CRTs on community-level sanitation (an unblinded intervention with the same outcome, self-reported diarrhoea symptoms, and equally poor compliance with the intervention) showed no effect at all [23–25].

In both the point-of-use water treatment and the sanitation trials, compliance with the intervention was very poor. For both interventions, a true effect would therefore have been biologically implausible. The absence of an observed effect in the sanitation trials may thus be regarded not only as evidence for the absence of a true effect but (in contrast to the water treatment trials) also as evidence for a lack of responder bias, possibly because participants did not link the health surveys to the intervention or did not expect any benefit from giving false information [23–25].

Allocation is done by the investigator or the implementer (e.g. the organisation rolling out a programme). The EPOC definition of non-randomised trials requires that the investigator controls allocation [9]. In the definition used here, allocation is not random and it does not matter who allocates.

An implementer may decide to deliver an intervention in ten villages, and an evaluator may choose ten suitable control villages for comparison [ 26 , 27 ].

With notable exceptions [ 28 ], participants may not need to be explicitly told that they are part of a trial. Trial procedures may more easily be camouflaged as general demographic and health surveys than in a CRT, which may reduce responder bias. NCTs need to demonstrate that intervention and control arms are comparable.

Unlike in CRTs, imbalances cannot be assumed to be due to chance until proven otherwise, and such proof is usually impossible. Most often, baseline characteristics are used to adjust for imbalances. Baseline variables may include (1) demographic and socio-economic characteristics and other covariates potentially associated with the outcome and the intervention, and (2) a baseline measure of the outcome of interest. These two types of measure need to be clearly distinguished, as it can be argued that adjusting for the latter is likely to be more effective than adjusting for the former.

In a sense, baseline variables are a predictor of the baseline measure of the study outcome, which in turn is a predictor of the outcome at follow-up. Hence, the baseline measure of the study outcome can be regarded as more proximate to the potential outcome than other baseline variables. Investigating trends of the study outcome from baseline to follow-up is a fairly transparent way of exploring whether baseline imbalances may have affected the effect estimate, as the trends in the outcome in different study arms can be openly discussed.

If there is no baseline measure of the study outcome, then one can only compare other baseline variables (e.g. demographic or socio-economic characteristics) between arms and adjust for any imbalances in multivariable models. Such models, however, usually represent a black box with an unknown amount of residual confounding [30]. It can be argued that the only way to make an NCT convincing is to obtain a precise baseline measurement of the study outcome and use it in the final analysis [31]. No amount of multivariable adjustment or matching of other variables, even if done with great care [27, 32], can replace the value of a precise baseline measure of the study outcome.

Statistical methods to account for baseline measures are imperfect and continue to be debated [33, 34]. They include analysis of covariance (the lagged regression method) [33, 34], analysis of change scores [4, 33, 34], and exploration of the interaction between treatment allocation and time point [35].

In the analysis of covariance method, regression models are used that include the baseline measure as just another explanatory variable. The analysis of change scores is based on a between-arm comparison of the difference between the outcome at follow-up and the outcome at baseline, measured in the same individual or, in cluster-level analysis [36], in the same cluster [34, 37]. The interaction approach is required if different individuals are measured at baseline and follow-up, and is calculated as the interaction term between treatment allocation and time point (e.g. baseline vs. follow-up).
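
The three approaches can be sketched with ordinary least squares on a small invented cluster-level data set as follows (in practice the standard errors would additionally need to account for clustering and repeated measurements):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical cluster-summary data: y0 = baseline outcome, y1 = follow-up outcome,
# treat = 1 for intervention clusters.
df = pd.DataFrame({
    "cluster": range(12),
    "treat":   [1]*6 + [0]*6,
    "y0":      [4.1, 3.8, 4.5, 4.0, 3.9, 4.2, 4.3, 4.0, 3.7, 4.4, 4.1, 3.9],
    "y1":      [3.2, 3.0, 3.6, 3.1, 3.3, 3.4, 4.2, 4.1, 3.8, 4.5, 4.0, 4.0],
})

# 1) Analysis of covariance / lagged regression: adjust follow-up for its baseline value.
ancova = smf.ols("y1 ~ treat + y0", data=df).fit()

# 2) Change-score analysis: compare the within-cluster change between arms.
df["change"] = df["y1"] - df["y0"]
change = smf.ols("change ~ treat", data=df).fit()

# 3) Difference-in-differences via a treatment-by-time interaction, usable when
#    different individuals are measured at baseline and follow-up.
long = df.melt(id_vars=["cluster", "treat"], value_vars=["y0", "y1"],
               var_name="time", value_name="y")
long["post"] = (long["time"] == "y1").astype(int)
did = smf.ols("y ~ treat * post", data=long).fit()

print(ancova.params["treat"], change.params["treat"], did.params["treat:post"])
```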

The effect estimates produced by the change-score and interaction approaches are sometimes referred to as difference-in-differences (DID) estimates [35]. All three methods work well if baseline imbalances are relatively small, but become problematic if imbalances are large [4, 34, 37], which is fair enough as in this case the trial arms are probably not comparable to start with. The regression approach works well if baseline and follow-up measures are highly correlated, which is often the case for continuous variables such as child anthropometrics or blood pressure.

The regression approach is problematic for binary outcomes. Binary outcomes measured at two different time points (e.g. disease prevalence at baseline and at follow-up) are often only weakly or moderately correlated. Adjusting for a baseline measure showing only a low or moderate correlation with the follow-up measure leads to regression dilution bias, and to failure of the regression model to adequately adjust for any baseline imbalance [4].

The change-score approach may be preferable in this situation [33]. It is important to maximise between-arm comparability and not rely solely on statistical methods to achieve balance, since the three methods mentioned above each rely on a number of assumptions. Various matching methods can be applied to achieve comparability [31], including the use of publicly available census data [27, 32]. The most promising approach may be to match intervention and control clusters according to the baseline measure of the outcome of interest, which, however, may not yet be available at the time of recruitment.

In a controlled interrupted time series (CITS), many repeated measurements of the outcome of interest are taken before and after the intervention across a number of intervention and control clusters.

Usually, CITS require the use of regularly collected routine data, which are often only available at the level of large administrative units (e.g. districts or provinces). The analysis focuses on whether a certain change in the outcome has taken place after the intervention in the intervention clusters but not in the control clusters.

To include intervention and control clusters in the same model, they need to be reasonably comparable. CITS have the advantage that the requirement of including at least 4–6 clusters per arm [1, 13] may be relaxed by including a fixed effect for cluster intercepts to control for time-invariant differences between clusters.
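
A minimal segmented-regression sketch of such a model, fitted to synthetic data with invented parameter values: the cluster fixed effects absorb time-invariant differences between clusters, while the treatment-by-post and treatment-by-time-since-intervention interactions capture the level and slope changes attributable to the intervention.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Synthetic controlled interrupted time series: 4 clusters (2 intervention, 2 control),
# 24 monthly observations each, intervention introduced at month 12.
rows = []
for cluster in range(4):
    treat = int(cluster < 2)
    for t in range(24):
        post = int(t >= 12)
        tsince = max(0, t - 12)
        y = (5 + 0.5 * cluster            # cluster-specific level
             + 0.05 * t                   # common secular trend
             - 1.0 * treat * post         # level drop after intervention (intervention arm only)
             + rng.normal(scale=0.3))
        rows.append(dict(cluster=cluster, treat=treat, t=t, post=post, tsince=tsince, y=y))
ts = pd.DataFrame(rows)

# Segmented regression with a control arm and fixed cluster intercepts.
model = smf.ols("y ~ t + post + tsince + treat:post + treat:tsince + C(cluster)", data=ts).fit()
print(model.params.filter(like="treat"))  # treat:post = level change, treat:tsince = slope change
```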

It may not be necessary to consider random variation in the intervention effect across clusters.

The key feature of a controlled before-after (CBA) study as defined here is that the intervention and control arms are not compared statistically. The control arm only serves to give an idea of what the trend in the intervention arm might have been in the absence of the intervention.

Whether or not the intervention is allocated at random is of no relevance for this definition. This is in contrast with the EPOC definition, which defines a CBA as a trial in which before and after measures are taken and in which allocation is non-random and outside the control of the investigator [9], equivalent to the MRC definition of a natural experiment [14].

The design and interpretation of CBA studies have been described [5], often disregarding the issue of cluster-level allocation. In CBAs, the study outcomes can be compared statistically between different points in time before and after the intervention, but only separately for the intervention and the control clusters, not between them.


