# What is the difference between statistically significant evidence and clinically significant evidence? How would each of these findings be used to advance an evidence-based project?

To decide whether a new treatment should be used, statistical significance of its effectiveness over current treatment alone is insufficient. Measures of the size of the treatment effects (that is, clinical significance) are also necessary.

Statistical significance measures how likely it is that any apparent difference in outcome between the treatment and control groups is real and not due to chance. p Values and confidence intervals (CI) are the most commonly used measures of statistical significance. The p value gives the probability that an outcome at least as extreme as the one observed would have arisen by chance under the null hypothesis that the new and the control treatments are equally effective. CI estimate the range within which the real result is likely to lie. Hence, if the trial were carried out many times, the 95% CI of the difference in treatment outcomes between the two groups, calculated each time, would contain the real difference on 95% of the occasions.

Clinical significance measures how large the differences in treatment effects are in clinical practice. Different measures have been devised. Relative risk is the ratio of the risk (event rate) in the treatment group to the risk in the control group. It is independent of the prevalence of the disease and can be applied to populations with different prevalence of the disease. However, patients may not consider this measure relevant to them, as it does not specify the size of the absolute risk. The measures absolute risk reduction (ARR) and number needed to treat (NNT) vary with the prevalence of the disease. ARR is simply the difference in absolute risk between the control group and the treatment group. NNT is the number of patients who need to be treated to prevent one adverse event, and is numerically equal to 1/ARR. NNT has been highlighted as a meaningful measure of clinical significance.3 The level of treatment effect regarded as clinically significant also depends on the severity of the disease and any potential side effects of the treatment.
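As a rough illustration of these definitions, the following Python sketch computes relative risk, ARR, and NNT from event counts. The function name `risk_measures` and the trial counts are hypothetical and chosen only to show how the same relative risk can correspond to very different ARR and NNT values when the baseline risk differs.

```python
def risk_measures(events_treat, n_treat, events_ctrl, n_ctrl):
    """Return relative risk, ARR and NNT from event counts (illustrative helper)."""
    risk_treat = events_treat / n_treat   # event rate in the treatment group
    risk_ctrl = events_ctrl / n_ctrl      # event rate in the control group
    relative_risk = risk_treat / risk_ctrl
    arr = risk_ctrl - risk_treat          # absolute risk reduction
    # NNT = 1/ARR; it is only meaningful when the treatment reduces risk.
    nnt = 1 / arr if arr > 0 else float("inf")
    return {"relative_risk": relative_risk, "arr": arr, "nnt": nnt}


# Hypothetical data: the relative risk is 0.5 in both populations,
# but the baseline (control) risk is 20% in one and 2% in the other.
high_risk = risk_measures(10, 100, 20, 100)
low_risk = risk_measures(1, 100, 2, 100)

for label, m in (("high baseline risk", high_risk), ("low baseline risk", low_risk)):
    print(f"{label}: RR={m['relative_risk']:.2f}, ARR={m['arr']:.2f}, NNT={m['nnt']:.0f}")
```

With these made-up counts, halving the risk corresponds to an NNT of about 10 in the high risk population and about 100 in the low risk population, which is why ARR and NNT are often more informative to an individual patient than relative risk alone.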

A common way of combining statistical and clinical significance is to state a measure of clinical significance (for example, relative risk, ARR, or NNT) with its 95% CI. For example, taking into account the prevalence of the condition, an ARR of 0.2 (95% CI 0.1 to 0.4) indicates that the estimated ARR is 0.2 and that the real ARR is likely to lie between 0.1 and 0.4; intervals calculated in this way would contain the real ARR on 95% of the occasions if the trial were carried out many times. The fact that the lower confidence limit is greater than zero means that the treatment is significantly more effective than control at p<0.05.
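One simple way to obtain such an interval is the normal (Wald) approximation for the difference between two proportions. The sketch below assumes that approximation, which the text does not prescribe, and uses hypothetical counts; `arr_confidence_interval` is an illustrative name, not an established function.

```python
from math import sqrt
from statistics import NormalDist


def arr_confidence_interval(events_treat, n_treat, events_ctrl, n_ctrl, level=0.95):
    """Wald (normal approximation) confidence interval for the ARR.

    Assumption: both groups are large enough for the normal approximation
    to be reasonable; other interval methods exist and may differ.
    """
    p_treat = events_treat / n_treat
    p_ctrl = events_ctrl / n_ctrl
    arr = p_ctrl - p_treat
    # Standard error of the difference between two independent proportions.
    se = sqrt(p_treat * (1 - p_treat) / n_treat + p_ctrl * (1 - p_ctrl) / n_ctrl)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return arr, arr - z * se, arr + z * se


# Hypothetical trial: 30/100 events on the new treatment, 50/100 on control.
arr, lower, upper = arr_confidence_interval(30, 100, 50, 100)
print(f"ARR = {arr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
print("significant at p<0.05" if lower > 0 else "not significant at p<0.05")
```

For these made-up counts the interval runs from roughly 0.07 to 0.33, so the lower limit is above zero and the result would be reported as statistically significant.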

However, this method has drawbacks for clinicians who wish to apply the results in their clinical practice. First, it only indicates statistical significance against the null hypothesis that the treatments are equally effective. However, as there may be direct and indirect costs and risks of the new treatment, the levels of treatment effect regarded as clinically worthwhile are likely to differ among clinicians and settings. If one clinician considers that introduction of the new treatment is worthwhile only if the absolute risk is reduced by at least 15 percentage points (an ARR of 0.15), say, it is important to know how likely it is that the clinical trial observations would have arisen by chance under the null hypothesis that the new treatment has an ARR of less than 0.15, and not under the null hypothesis that the two treatments are equally effective.

Secondly, the 95% CI for ARR tends to be interpreted either as statistically significant (if the CI does not include zero) or as not significant (if the CI includes zero). In clinical practice, such a dichotomy may not be useful, and the size of the treatment effect has to be balanced against the statistical significance. For example, compare an ARR of 0.1 (95% CI 0.05 to 0.2) with an ARR of 0.5 (95% CI –0.01 to 0.8). The former is regarded as statistically significant while the latter is not. However, clinical significance is likely to be higher in the latter than in the former. The absolute risk reduction with its confidence limits does not tell clinicians how to balance these two factors.

Bayesian statistical methods are advocated as alternatives to avoid these problems. However, the subjective synthesis of all available information needed to determine the prior distribution, and the complex computation required, have often rendered this method impractical.

I propose, as an alternative, that a plot of p value against ARR or of p value against NNT will be useful to clinicians who practise evidence based medicine.

## p Value-ARR or p value-NNT plots

The null hypothesis is that the ARR for the new treatment is less than x. p Values are calculated for a range of values of x. These p values are plotted on the y axis and the ARR on the x axis. Hence, for a range of values of ARR, we have the corresponding probability that the clinical trial observations would have arisen by chance if the real ARR were less than the given values. The method of computation is shown in appendix
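The points of such a plot could be computed along the following lines. This is only a sketch under stated assumptions: it uses a one-sided test based on the normal (Wald) approximation, which may differ from the appendix method, and the function name `p_value_arr_curve` and the trial counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def p_value_arr_curve(events_treat, n_treat, events_ctrl, n_ctrl, thresholds):
    """Return (x, p) pairs, where p is the one-sided p value for the
    null hypothesis that the real ARR is no greater than x.

    Assumption: normal approximation with the unpooled standard error.
    """
    p_treat = events_treat / n_treat
    p_ctrl = events_ctrl / n_ctrl
    arr_observed = p_ctrl - p_treat
    se = sqrt(p_treat * (1 - p_treat) / n_treat + p_ctrl * (1 - p_ctrl) / n_ctrl)
    standard_normal = NormalDist()
    curve = []
    for x in thresholds:
        z = (arr_observed - x) / se
        # Probability of observing an ARR at least this large if the real ARR were x.
        curve.append((x, 1 - standard_normal.cdf(z)))
    return curve


# Hypothetical trial: 30/100 events on the new treatment, 50/100 on control.
thresholds = [i * 0.05 for i in range(7)]  # x = 0.00, 0.05, ..., 0.30
curve = p_value_arr_curve(30, 100, 50, 100, thresholds)
for x, p in curve:
    print(f"ARR threshold {x:.2f}: p = {p:.4f}")
```

Plotting the second element of each pair on the y axis against the first on the x axis gives the p value-ARR curve. A clinician who requires an ARR of at least 0.15 would read off the p value at x = 0.15 rather than at x = 0; replacing each x by 1/x on the horizontal axis gives the corresponding p value-NNT plot.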
