TABLE 1

| DRUG | PROCEDURE |
|---|---|
| 1. Unchanging compound | 1. Evolves continuously |
| 2. Complications increase with use | 2. Complications decrease with use |
| 3. Results unrelated to physician skill | 3. Results vary with operator |
| 4. Placebo usually available | 4. No placebo |
| 5. Crossovers rare | 5. Crossovers common |

In contrast, new procedures are introduced while they are imperfect. The indications are uncertain, and the risks are high. As the procedure becomes more widespread, refinements occur and the risks decline, often dramatically. In early trials, results vary considerably in the hands of different operators. Placebos are obviously not available for any invasive procedure, and crossovers from medical to procedural treatment groups are commonplace.
Bias in Randomized Studies
It is commonly assumed that randomization eliminates bias. Though this is mostly true for trials that compare different drug regimens, when new procedures are compared with medical therapy, some types of bias cannot be avoided.
First is the possibility of pre-selection bias, which can occur before patients are referred to the study. When two different drug protocols or two surgical protocols are being compared with each other, referring physicians rarely have reason to prefer one to the other, and many are willing to accept randomization of their patients. In contrast, when a new procedure (think of coronary bypass in its early years) is being evaluated, physicians are more likely either to want it or to distrust it for their high-risk patients, and they refer such patients for the therapy they prefer. The result is a diversion of high-risk patients away from randomization, while the lower-risk patients, about whom physicians feel less strongly, are randomized.
It is commonly observed that patients in both limbs of randomized trials fare better than historical controls. This result is generally attributed to both the stringent selection criteria that define the cohort eligible for the trial, and the more attentive, more standardized, and thus more effective therapy that trial patients receive. Undiscussed is the important influence of referral bias, which yields a low-risk cohort. This issue merits concern because in low-risk patients, who experience few adverse events, it is difficult to prove that a new therapy is more effective.
An example of this phenomenon occurred during early efforts to assess the efficacy of coronary artery bypass (CABG). In the 1970s a randomized trial of CABG for stable angina was initiated at the Oregon Health Sciences University. I had just joined the faculty after completing my cardiac surgery training there. It was striking to observe the relentless diversion of referrals for CABG away from the University and to private surgical teams who readily accommodated the referring physicians' growing preference for the new option of surgical therapy. The trial terminated with much smaller patient subgroups than originally planned, and with statistically inconclusive results.[5]
A second cause of unintended bias is the fact that even participating investigators have human instincts and subtle biases that can influence the selection or exclusion of patients for randomization. In the U.S. Veterans Administration study of coronary bypass surgery, patients could be excluded because of undefined and subjective criteria such as "poor left ventricular function." Such vague definitions led to a tendency at many participating hospitals to include only the most stable and willing patients.[6]
A third source of bias is non-adherence to the assigned therapy, or "crossing over." In drug studies, crossovers are usually infrequent, they can occur in both directions, and they have little effect on the statistical analyses. In contrast, when an operation or procedure is being studied, crossovers can occur only in one direction – from medical to procedural therapy. If they occur frequently and freely, the medical group eventually comes to be composed only of patients in whom medical therapy is successful. Even the most elaborate statistical manipulations cannot overcome some of the confounding statistical effects of large numbers of crossovers. This was a particular issue in early studies that compared CABG with medical therapy, since patients with persistent symptoms opted overwhelmingly for bypass surgery, and were no longer “available” to experience complications such as death or myocardial infarction on late follow-up of the medical group.
This latter criticism has usually been met with the argument that patients who do well after crossing over to surgery after a period of medical therapy demonstrate that there is no harm in reserving surgical therapy for patients who fail a trial of initial medical therapy. This conclusion is valid, but the favorable experience with such crossovers must not then be used to compare the results of initial surgical with initial medical therapy. Unfortunately, it is unusual for any later citation of the results of such trials to mention the incidence and problem of crossovers.
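The way one-way crossovers deplete the medical group can be illustrated with a toy simulation. This is only a sketch: the risk mix, event rates, and crossover rates are hypothetical numbers chosen for illustration, not data from any of the trials discussed, and the function name `simulate_medical_arm` is likewise an assumption. It compares the late event rate the whole medical arm would have shown had everyone stayed on medical therapy with the rate seen among the patients who remain after the sicker ones cross over.

```python
import random


def simulate_medical_arm(n_patients=50_000, seed=0):
    """Toy model of one-way crossover out of a randomized medical arm.

    All probabilities below are hypothetical and chosen only for illustration.
    Returns a pair:
      (event rate if every patient had stayed on medical therapy,
       event rate among the patients who did not cross over).
    """
    rng = random.Random(seed)
    events_all = 0
    events_stayed = 0
    n_stayed = 0
    for _ in range(n_patients):
        high_risk = rng.random() < 0.40
        # Late event (death or infarction) while on medical therapy.
        event = rng.random() < (0.30 if high_risk else 0.05)
        # Assumption: higher-risk, symptomatic patients cross over far more often.
        crossed_over = rng.random() < (0.70 if high_risk else 0.10)
        events_all += event
        if not crossed_over:
            n_stayed += 1
            events_stayed += event
    return events_all / n_patients, events_stayed / n_stayed


full_arm_rate, remaining_rate = simulate_medical_arm()
print(f"whole medical arm, no crossovers: {full_arm_rate:.3f}")
print(f"patients left after crossovers:   {remaining_rate:.3f}")
```

Under these assumed rates, the patients left in the medical group show a clearly lower event rate than the arm as originally randomized, even though medical therapy itself is unchanged. The difference comes entirely from who leaves the group.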
When Should Randomized Trials of Procedures Be Done?
Let’s get back to Spodick’s suggestion mentioned earlier that randomized studies should be performed from "the very first clinical trial…of…costly procedures of all kinds."[3] Accompanying this well-intentioned if somewhat naive proposal was the more fevered suggestion that editors of scientific journals should reject reports of non-randomized trials! Both of these ideas fail to acknowledge that the rapid evolution of surgical and other procedural techniques usually invalidates "early" trials. The U.S. Veterans Administration study of coronary bypass surgery began in the early 1970s, and when results were finally reported it was intensely criticized for high mortality and poor graft patency rates in comparison with contemporary outcomes.[5] Progress was so dramatic and the decline in morbidity and mortality was so precipitous that initial entries (1970-1972) were actually discarded. Ultimately, the V.A. study had little impact on the practice of coronary surgery as it was outdated on arrival. In a more recent example, randomized trials of intracoronary stents had no sooner been reported than the results were superseded by dramatic changes in anticoagulant therapy.[7],[8] The story is being repeated yet again with drug-eluting stents, for which antiplatelet regimens continue to evolve, and clinical studies of their results lose impact soon after publication.
Another disadvantage of early trials of operations and procedures is their tendency to stifle technical progress, which comes from many investigators. By their nature, trials are confined to a restricted list of institutions, which limits innovation by others.
On the other hand, if a randomized trial is delayed until a procedure is "mature", and good results are the norm, it will be difficult or impossible to randomize a broad spectrum of patients. In the Coronary Artery Surgery Study (CASS) begun in the United States after coronary bypass was well established, some institutions did not randomize any patients and participated only in a registry.[9]
The incentive to participate in early randomized trials is provided by grants that are often vital to the participating institutions. Even when a proposed trial is premature, a decision not to participate in a multi-institutional study, or (perish the thought) to abandon an ongoing trial that is becoming obsolete and pointless, is a form of bureaucratic and academic self-mutilation. Large collaborative studies generate their own central bureaucracy and peripheral constituency, with many clinicians, statisticians, data coordinators, etc. being supported by grant funds. When the U.S. Veterans Administration proposed a randomized study of the Bjork-Shiley mechanical heart valve versus the Hancock porcine valve in 1976, I declined to participate. (I was then in Milwaukee, where the Zablocki Veterans Administration Hospital was part of the academic teaching program at the Medical College of Wisconsin.) The study was doomed to premature obsolescence and irrelevance by the predictably rapid development of prosthetic heart valves. Patients were enrolled between 1977 and 1982, and the study plodded stubbornly onward for many years at great public expense until late results were reported in 1993. By then, no models of the Bjork-Shiley valve were even sold in the U.S., and the standard Hancock model initially used in the study had long since been supplanted by hydraulically superior models.[10]
Conclusions
Randomized trials that compare new procedures with established ones should guard against pre-randomization bias and must allocate patients to treatment groups based on objective or quantitative criteria, not on subjective clinical judgment. Risk, length of follow-up, and sample size must be used to calculate the statistical power of the study, so that a significant difference between treatments does not remain undetected (a Type II error). There should already be sufficient experience with the new procedure so that complication rates have stabilized, and so that participating operators are equally comfortable with all procedures being studied.
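The dependence of statistical power on baseline risk and sample size can be made concrete with a short calculation. The sketch below uses the standard normal approximation for a two-sided comparison of two event rates. The function name `power_two_proportions`, the event rates, and the group size of 300 are hypothetical, chosen only to contrast a high-risk with a low-risk cohort under the same relative treatment effect.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_treated, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two event rates.

    Uses the usual normal approximation and ignores the negligible chance
    of a significant result in the wrong direction.
    """
    normal = NormalDist()
    z_alpha = normal.inv_cdf(1 - alpha / 2)
    p_bar = (p_control + p_treated) / 2
    # Standard error under the null hypothesis (pooled event rate).
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    # Standard error under the alternative (separate event rates).
    se_alt = sqrt(
        (p_control * (1 - p_control) + p_treated * (1 - p_treated)) / n_per_group
    )
    return normal.cdf((abs(p_control - p_treated) - z_alpha * se_null) / se_alt)


# Hypothetical cohorts: the same halving of the event rate, 300 patients per group.
high_risk_power = power_two_proportions(0.20, 0.10, n_per_group=300)
low_risk_power = power_two_proportions(0.04, 0.02, n_per_group=300)

print(f"high-risk cohort: power = {high_risk_power:.2f}")
print(f"low-risk cohort:  power = {low_risk_power:.2f}")
```

With these assumed figures the high-risk cohort has power above 0.9, while the low-risk cohort has power of only about 0.3, so a true halving of the event rate would usually go undetected (a Type II error). The probability of that error is one minus the power.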
Even with the above stipulations, however, randomized trials that compare medical with procedural therapy pose many additional problems, and few such studies have had a major impact on clinical practice. (I have discussed these problems and examples at greater length elsewhere.[11]) The most useful randomized studies of procedures are those that compare one procedure with another, or those that assess a specific refinement in an established procedure or with an established device, such as the use of different anticoagulation regimens for coronary stents.
Finally, randomized trials look at defined (and usually low-risk) cohorts that rarely reflect the full spectrum of patients seen in clinical practice, and the results of such trials usually cannot be generalized. Non-randomized retrospective studies and prospective registries can provide clinically useful information that is complementary. The introduction of propensity scoring to derive comparable groups from large registries, and meta-analysis of multiple clinical series, can substitute for the slavish use of randomized studies in settings where such studies are unlikely to be helpful. Though sophisticated statistical manipulation is needed to compare outcomes in registry patients, the results can be a useful guide to managing the full spectrum of patients in a typical clinician’s practice. Large registries with a much broader spectrum of patients from the general population can therefore be vital companions to randomized clinical trials.
REFERENCES