FOOD AND DRUG ADMINISTRATION
CENTER FOR DRUG EVALUATION AND RESEARCH

EIGHTIETH MEETING OF THE
CARDIOVASCULAR AND RENAL DRUGS ADVISORY COMMITTEE

8:30 a.m.
Thursday, February 27, 1997

Jack Masur Auditorium
Building 10, Clinical Center
National Institutes of Health
9000 Rockville Pike
Bethesda, Maryland

APPEARANCES

COMMITTEE MEMBERS:

BARRY MASSIE, M.D., Chairman (present morning session)
Director, Coronary Care Unit
Department of Medicine
Veterans Administration Hospital
4150 Clement Street
San Francisco, California 94121

JOAN C. STANDAERT, Executive Secretary
Center for Drug Evaluation and Research
Food and Drug Administration
234 Summit Street, Room 117
Toledo, Ohio 43604

ROBERT CALIFF, M.D. (present afternoon session)
Professor of Medicine
Director, Duke Clinical Research Center
Duke University Medical Center
2024 West Main Street, Box 31123
Durham, North Carolina 27707

JOHN DiMARCO, M.D.
Professor of Medicine
Cardiovascular Division
University of Virginia Hospital, Box 158
Hospital Drive, 5th Floor
Private Clinic, Room 3608
Charlottesville, Virginia 22908

CINDY GRINES, M.D.
Director, Cardiac Catheterization
Division of Cardiovascular Disease
William Beaumont Hospital
3601 West Thirteenth Mile Road
Royal Oak, Michigan 48073-6769

MARVIN KONSTAM, M.D.
Professor of Medicine
New England Medical Center
750 Washington Street, Box 108
Boston, Massachusetts 02111

JoANN LINDENFELD, M.D. (present morning session)
Professor of Medicine
Division of Cardiology
University of Colorado Health Science Center
4200 East Ninth Avenue, B-130
Denver, Colorado 80262

LEMUEL MOYE, M.D., PH.D.
Associate Professor of Biometry
University of Texas Health Science Center at Houston
Coordinating Center for Clinical Trials
1200 Herman Pressler Street, Suite 801
Houston, Texas 77030

CYNTHIA RAEHL, PHARM.D.
Consumer Representative
Chair, Pharmacy Department
School of Pharmacy
Texas Tech University Health Sciences Center
1300 South Coulter Drive
Amarillo, Texas 79106-9711

DAN RODEN, M.D.C.M.
Vanderbilt University
Division of Clinical Pharmacology
532C Medical Research Building-1
23rd and Pierce Avenue
Nashville, Tennessee 37232-6602

UDHO THADANI, FRCP (present morning session)
Professor of Medicine
Division of Cardiology
Oklahoma University Health Sciences Center
920 S.L. Young Boulevard, 5-SP-300
Oklahoma City, Oklahoma 73104

MICHAEL WEBER, M.D.
Chairman, Department of Medicine
Brookdale University Hospital Medical Center
1 Brookdale Plaza
Brooklyn, New York 11212

COMMITTEE CONSULTANTS:

JEFFREY BORER, M.D.
ROBERT CODY, M.D. (present morning session)
RALPH D'AGOSTINO, PH.D.

FOOD AND DRUG ADMINISTRATION STAFF:

BOB FENICHEL, M.D.
RAYMOND LIPICKY, M.D.
NORMAN STOCKBRIDGE, M.D.
ROBERT TEMPLE, M.D.

MEDCO REPRESENTATIVES:

JAY COHN, M.D.
LLOYD FISHER, PH.D.
CESARE ORLANDI, M.D.
JOSEPH QUINN, M.S.P.H.

SMITHKLINE BEECHAM REPRESENTATIVES:

WILSON COLUCCI, M.D.
LLOYD FISHER, PH.D.
MILTON PACKER, M.D.
ROBERT L. POWELL, PH.D.
NEIL SHUSTERMAN, M.D.
JIM TIEDE, PH.D.

C O N T E N T S

MORNING SESSION

NDA 20-727, BIDIL (hydralazine HCl and isosorbide dinitrate) to be indicated for congestive heart failure

AGENDA ITEM

OPEN PUBLIC HEARING

MEDCO RESEARCH, INC. PRESENTATION:
Introduction by Dr. Cesare Orlandi
Historical Overview, Clinical Efficacy by Dr. Jay Cohn
Statistical Overview by Dr. Joseph Quinn
Summary/Conclusions by Dr. Jay Cohn

COMMITTEE REVIEW AND DISCUSSION

AFTERNOON SESSION

NDA 20-297 s-001, COREG (carvedilol) to be indicated for congestive heart failure

AGENDA ITEM

SMITHKLINE BEECHAM PRESENTATION:
Introduction by Dr. Robert Powell
Clinical Program by Dr. Neil Shusterman

COMMITTEE REVIEW AND DISCUSSION

P R O C E E D I N G S
(8:30 a.m.)

DR.
MASSIE: I want to welcome everybody to the 80th meeting of the Cardio-Renal Advisory Panel which we're going to have today. Before getting started, let me briefly just introduce the members of the committee who are sitting from my left to my right: Dr. Dan Roden, Dr. Marvin Konstam, Dr. Cynthia Raehl, Dr. Michael Weber, Dr. Lemuel Moye, Dr. JoAnn Lindenfeld, our Secretary, Joan Standaert, Dr. DiMarco, Dr. Rob Califf, Dr. Udho Thadani, and not yet but to come later, Dr. Cynthia Grines. Dr. Lipicky representing the Division of Cardio-Renal Drugs is on the far left, and I guess Dr. Temple will be joining us. In addition, we have several outside consultants for today's meeting. Dr. Ralph D'Agostino, who will be a voting member as a special government employee, as will Dr. Jeffrey Borer, and Dr. Robert Cody is our special consultant, but unfortunately not able to vote. The first order of business is that we are open for public comment. If anybody has any comments, we'd be happy to entertain them at this time. In the absence of public comment, we can proceed with our business. Joan Standaert is going to discuss the waivers and potential conflicts of interest of the committee members. MS. STANDAERT: The following announcement addresses the issue of conflict of interest with regard to this meeting and is made a part of the record to preclude even the appearance of such at this meeting. Based on the submitted agenda for the meeting and all financial interests reported by the committee participants, it has been determined that all interests in firms regulated by the Center for Drug Evaluation and Research present no potential for an appearance of a conflict of interest at this meeting, with the following exceptions. In accordance with 18 U.S.C. 208(b), full waivers have been granted to Drs. Barry Massie, Lemuel Moye, and Dr. Robert Califf, which permit them to participate in all official matters concerning Posicor. In addition, Dr. Dan Roden and Dr. 
Udho Thadani are excluded from participating in all official matters concerning Posicor. Further, in accordance with 18 U.S.C. 208(b)(3), a limited waiver has been granted to Dr. Udho Thadani. I'm sorry. I'm reading the wrong announcement. Well, I'll start over again. Sorry, excuse me. We'll do that again tomorrow. This is the announcement for February 27th, 1997. The following announcement addresses the issue of conflict of interest with regard to this meeting and is made a part of the record to preclude even the appearance of such at this meeting. Based on the submitted agenda for the meeting and all financial interests reported by the committee participants, it has been determined that all interests in firms regulated by the Center for Drug Evaluation and Research present no potential for an appearance of a conflict of interest at this meeting, with the following exceptions. In accordance with 18 U.S.C. 208(b)(3) full waivers have been granted to Drs. JoAnn Lindenfeld, Lemuel Moye, Marvin Konstam, and Dr. Dan Roden, which permit them to participate in all official matters concerning BiDil. In addition, Dr. Robert Califf is excluded from participating in all official matters concerning BiDil. Further, in accordance with 18 U.S.C. 208(b)(3), a waiver has been granted to Dr. Marvin Konstam, which permits him to participate in all official matters concerning Coreg. However, Drs. Barry Massie, JoAnn Lindenfeld, and Dr. Udho Thadani are excluded from participating in all official matters regarding Coreg. Copies of the waiver statements may be obtained by submitting a written request to the agency's Freedom of Information Office, Room 12A-30 of the Parklawn Building. We would also like to disclose for the record that Dr. Robert Califf and his employer, the Duke University Medical Center, have interests which do not constitute a financial interest within the meaning of 18 U.S.C. 208(a), but which could create the appearance of a conflict. 
The agency has determined that notwithstanding these involvements, that the interest of the government in Dr. Califf's participation outweighs the concern that the integrity of the agency's programs and operations may be questioned. Therefore, Dr. Califf may participate in all official matters concerning Coreg. With respect to FDA's invited guest expert, Dr. Robert J. Cody has reported interests which we believe should be made public to allow the participants to objectively evaluate his comments. Dr. Cody would like to disclose that he has conducted clinical trials and consulted for SmithKline Beecham, Merck, and Zeneca. He has also given presentations which were sponsored by SmithKline Beecham, and Merck. In the event that the discussions involve any other products or firms not already on the agenda for which an FDA participant has a financial interest, the participants are aware of the need to exclude themselves from such involvement and their exclusion will be noted for the record. With respect to all other participants, we ask in the interest of fairness that they address any current or previous financial involvement with any firm whose products they may wish to comment upon. That concludes the statement for February 27th, 1997. DR. MASSIE: Thank you very much, Joan. Well, as is probably apparent to all the members of this audience, as well as all the committee members, we have a very full agenda today and I'm going to do my best to keep the first half on time. In the interest of trying to proceed smoothly, I'm going to ask the committee members to try not to interrupt the sponsor's presentation midstream because we will allow a block of time for questions thereafter and I think that will allow the information to flow more smoothly. So, I guess we are ready for the presentation for BiDil, NDA 20-727. DR. ORLANDI: Dr. Massie, members of the committee, Dr. Lipicky, ladies and gentlemen, good morning. 
We are here today to present to you BiDil for the treatment of congestive heart failure. BiDil is a formulation of two drugs you are very familiar with, hydralazine and isosorbide dinitrate. Our application is based on two landmark clinical trials conducted in the 80s by the Veterans Administration, the V-HeFT I and V-HeFT II studies. Based on the results of these studies, we propose that BiDil is useful in the treatment of congestive heart failure as an adjunct to digitalis and diuretics. It is also our opinion that this formulation is most appropriately used in patients who are not taking ACE inhibitors, which have also become part of standard therapy. Dr. Jay Cohn, who led the V-HeFT trials effort, will provide a historical overview of the trials. Mr. Joe Quinn will then address specific statistical issues that have been raised by the agency. And Dr. Cohn will also conclude our presentation with a brief summary of the findings. I just wanted to mention briefly that we have a number of consultants in the audience to address any specific question that the committee may have. This list includes Dr. Lloyd Fisher, who conducted a re-analysis of the V-HeFT trials, Dr. Uri Elkayam, Dr. Kirk Adams, Dr. Ho-Leung Fung, and Dr. Alan Forrest. Dr. Cohn?

DR. COHN: Thank you very much, Cesare, and I'd like to express my appreciation to the FDA and to the committee for giving me the opportunity to review with you the trials that we initiated really almost 20 years ago with the planning of the first V-HeFT trial, the vasodilator heart failure trials, which have continued to date, and the results of these first two trials will be the basis for our discussion this morning.
What we would like to propose to you at the end of this presentation is that there is a strong basis for approval of BiDil for heart failure and we would propose that this be based on a survival benefit for BiDil as compared to placebo, on the basis of a strong trend for improved exercise tolerance versus both placebo and versus Enalapril in these two trials, on the basis of a sustained increase in ejection fraction that we believe not only confirms the mechanism of action of this drug combination but also confirms that there is a long-term effect of this drug combination. This combination of therapy has a well-established rationale and an even better rationale today than at the time these studies were initiated, and we'll go into that in the course of this presentation. The safety of this drug combination of these two long-used agents is well established. This combination is already widely recommended as a treatment option in essentially all of the treatment guidelines that have been published in the last few years. And the approval of this combination is required to provide prescribing information to physicians who have been told to use this drug combination. Now then, hydralazine and isosorbide dinitrate were first used in combination. We did this, and Joe Franciosa, who worked with me at that time, is in the audience here today. We did this on the basis of the potency of this combination as a vasodilator, and the dramatic acute hemodynamic effect that this drug combination produced. At that time we predicted that this favorable hemodynamic effect might be translated into a long-term benefit but there were no long-term data available in order to determine that. V-HeFT, then, was organized as a landmark heart failure study, the first mortality trial undertaken in heart failure, with a goal to assess long-term efficacy of this vasodilator therapy added to conventional therapy, which at that time was digitalis and diuretics.
ACE inhibitors had not been developed at that time. And it was possible, of course, at that time to include a placebo group because there was no other effective therapy, and this provided the first and, I must say, the only data that will exist either now or in the future of digitalis-diuretic therapy with placebo added in long-term therapy of heart failure. We would suggest that the impact of the findings of V-HeFT are that there is now demonstrated efficacy of chronic therapy, and this was indeed the first therapy which was demonstrated to be effective, and it has provided a new treatment option for the management of the patient with heart failure, which has already been accepted by most guideline committees. Well, V-HeFT I was a trial assessing vasodilator therapy in long-term therapy compared to placebo, added to, as I pointed out, digoxin and diuretic therapy for patients with heart failure. The two vasodilator regimens that were employed in this study were the hydralazine isosorbide dinitrate combination, and an alternate vasodilator, Prazosin, which had a rather similar hemodynamic effect in this patient population when given acutely. The comparison of the survival times between the placebo arm and the vasodilator arms was proposed to use a one-sided hypothesis because there was no reason at that point to consider any adverse effect of this therapy. The question was, is the therapy effective. So, it was a one-sided hypothesis, and therefore one-sided tests were proposed. V-HeFT II was initiated after the completion of V-HeFT I, and it was undertaken to determine whether the effective arm in V-HeFT I, that is the hydralazine-isosorbide dinitrate arm, had an effect comparable or different from that of Enalapril, which at that point in time had already been evaluated for short-term therapy of heart failure and it appeared to be effective. These drugs then were added to pre-existing conventional therapy of digoxin and diuretic. 
No placebo arm was included in V-HeFT II because it was felt by the planning committee that it was unethical after the results of V-HeFT I to have a placebo-treated group for long-term therapy. And since it was not known which of the two treatment arms would be more beneficial, it was a two-sided hypothesis and two-sided tests that were employed. Now, these trials that I'm going to tell you about were both randomized, double-blinded. One was placebo-controlled, the other had a positive control. All patients were followed for at least 6 months after randomization into the trial, and the survival status was confirmed in all patients at the planned date of completion of the trial. Both of these trials were planned to be completed at a specific date, and that date was indeed utilized in termination of the trial. The inclusion criteria were all males. The studies were all performed in Veterans Affairs hospitals. They were males between the ages of 18 and 75. They had a history of heart failure, with limitation of exercise tolerance for at least 3 months prior to screening. They all remained symptomatic, despite the use of digoxin and diuretics, and they had objective measurements that made them eligible. That is, there was cardiac dysfunction, as defined by either an enlarged heart on chest x-ray, greater than .55 cardiothoracic ratio, or a radionuclide ejection fraction of less than 45 percent, or a dilated left ventricle on echocardiography with a left ventricular internal dimension in diastole of greater than 2.7 centimeters per meter squared. These criteria were used in both trials. In addition, the patients were all subjected to a bicycle ergometer exercise test with measurement of gas exchange. And they had to have a reduced peak oxygen consumption less than 25 ml per kilogram per minute to be eligible for the trial. The major endpoint in both trials was survival time, and two related endpoints were utilized.
That is, the overall survival, and of course the 2-year survival. Of course, the reason for doing that is that if one follows patients long enough, everyone will die and it was thought that perhaps a 2-year endpoint might be a more sensitive marker for a favorable effect of the therapy. So, these were both proposed as analytical endpoints. Now, the survival time was proposed to be carried out by the log-rank test, with the addition of a Cox proportional hazards model, using baseline patient characteristics as modifiers for the Cox model. I will, in addition, talk to you about two major endpoints of the trial, secondary endpoints, at least. That is, changes in left ventricular ejection fraction and changes in peak oxygen consumption, both of which are the major determinants of survival in this population. These two endpoints were selected by the FDA with its consultant Milton Packer, and Milton is in the audience if there are any questions about his selection of these two as two of the criteria on which to adjust the mortality with the Cox analysis. These, we all agree, are major endpoints in the management of heart failure, so that these two I will talk to you about in some detail. These were assessed by repeated measures analysis, and the t-test of change from baseline by individual visits was utilized for statistical analysis. Well, these are the patient characteristics in the two trials. I think you can see that the characteristics in the two patient populations are quite similar. That is, the patients were somewhere between 58 and 60 years old, a little older in V-HeFT II. The study was obviously performed later. The ejection fraction ranged around 28 to 29 or 30 percent in both trials. The cardiothoracic ratio exhibited an enlarged heart in both studies. The peak oxygen consumption averaged around 15 or 14 ml per kilogram per minute, and patients all had heart failure for at least 2 or 3 years. The majority of the patients were Caucasian.
That is, about 70 percent of them in both trials, but there was a fairly sizeable number of African-Americans in the trial. We won't go into that, but we have much data comparing the Caucasian and African-American responses. This is the duration of heart failure, which predominantly was between 6 and 48 months. About 55 percent of the patients had coronary disease as the etiology of their heart failure, and about 45 percent had what was thought to be non-ischemic cardiomyopathy. Now, this was the major endpoint of the trial, which was to monitor mortality, and I plot here the differences between the placebo and the hydralazine-isosorbide dinitrate groups during the 4-plus years in which the patients were followed. You can see that at the initiation there were 273 placebo patients and 186 hydralazine-isosorbide dinitrate patients. That was a planned preference for placebo entrance because we had a third treatment arm, which was Prazosin, which I don't have plotted here. I'll show you the survival curves, including Prazosin, in a moment. But this was an attempt to have a larger placebo group so that we could have more confidence in the placebo arm. You can see at the end of 1 year, there had been a 19.5 percent mortality in the placebo arm, and a 12.1 percent mortality with H/ISDN, and that was a 38 percent mortality reduction. At the end of 2 years, the differences were 34 and 25.6 percent, or a 25 percent reduction. At 3 years the reduction was at 23 percent, and by 4 years it was a little under 10 percent. By 5 years, of course, the numbers became very small. So, after 3 years we really have very little power and obviously an instability of the survival curves at that point in time. This indeed is a plot of the three survival curves in V-HeFT I. H/ISDN in yellow at the top showing that there is a clear reduced mortality or improved survival in this treatment compared to the placebo group in blue. 
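The mortality reductions quoted above for V-HeFT I can be checked with simple arithmetic. The sketch below assumes the quoted percentages are relative reductions in cumulative mortality, computed as (placebo rate minus treated rate) divided by placebo rate; the transcript does not state the formula, and the function name `relative_reduction` is purely illustrative.

```python
def relative_reduction(control_rate: float, treated_rate: float) -> float:
    """Relative reduction in mortality, as a fraction of the control rate."""
    if control_rate <= 0:
        raise ValueError("control_rate must be positive")
    return (control_rate - treated_rate) / control_rate


# Cumulative mortality (percent) as quoted in the presentation:
# (placebo, H/ISDN) at each follow-up point.
quoted = {
    "1 year": (19.5, 12.1),
    "2 years": (34.0, 25.6),
}

for label, (placebo, h_isdn) in quoted.items():
    reduction = relative_reduction(placebo, h_isdn)
    print(f"{label}: {reduction:.1%} relative reduction")
```

Under that assumption the 1-year figures give about 37.9 percent and the 2-year figures about 24.7 percent, which round to the 38 and 25 percent cited in the presentation.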
The Prazosin group, in red, superimposes on the placebo group until this very terminal end, where there's great instability in the numbers. I think this was the first evidence that a potent vasodilator, which Prazosin is, is not necessarily effective in heart failure, so the original concept that the two vasodilator arms would behave similarly was contradicted by this study. We now know that the efficacy of the vasodilator chronically is not necessarily related to its hemodynamic effect. This is just a brief summary of the statistical analysis of this trial. You'll hear a good deal more about this later from Joe Quinn, but just to briefly tell you what the statistics were. Using the log-rank test, the p value for the 2-year mortality reduction from hydralazine and nitrate compared to placebo was .0279, and the risk ratio was .7. And as you can see, the 95 percent confidence intervals did not overlap 1. When the Cox model was employed, using the three variables that were chosen by the FDA and its independent consultant to be employed to adjust the log-rank test, the p value fell to .0168 and again, the confidence interval did not overlap 1. Overall mortality by the log-rank test was a p value of .046, and the confidence interval did include 1. When the Cox model was employed, that fell to .0177 and the confidence interval did not overlap 1.

DR. MOYE: One question to clarify. Can you go back to the previous slide, please?

DR. COHN: Yes, sure, Lem.

DR. MOYE: When you show the log-rank p value of .046, now say again what the threshold was that the investigators determined prospectively for stat significance here.

DR. COHN: Oh, we'll get into that a good deal later. I'm just showing you the raw p values with no intent to suggest that this has met any criteria that was established. So, we'll get into that in more detail later on. These are the raw numbers. Now, the other major endpoints that I wanted to bring to your attention were the peak oxygen consumption and the ejection fraction.
This is the intent-to-treat analysis of changes in peak oxygen consumption over time in the two treatment arms of interest. There was clearly a trend, but no statistical difference between the two. A trend for the H/ISDN group to exhibit sustained improvement in peak oxygen consumption, which did not occur in the placebo arm, but none of these differences were statistically significant. Now, V-HeFT I gas exchange was done by a primitive methodology that we developed ourselves and we put some instruments together. There was no commercial instrument at that point available for use. It was a mixing chamber. It was a pretty crude way to measure peak oxygen consumption. V-HeFT II, that I'll show you in a moment, was carried out with modern technology with breath-by-breath gas exchange data, so that I have more confidence in the V-HeFT II gas exchange data. However, the protocol said that exercise tests would only be included for analysis if they were terminated by dyspnea or fatigue. And therefore, in the protocol analysis, there were fewer patients included because those who had orthopedic reasons, et cetera, for not finishing an exercise test were excluded. When we did that, the data looked pretty similar. That is, the green line, which is H/ISDN, exhibited some improvement, which seemed to be at least unchanged or improved over time. The placebo group in blue exhibited what was a trend toward a decline. And one time point, at 1 year, exhibited a significant p value of less than .05 for the improvement of exercise performance in the H/ISDN group compared to the placebo group. That might be shown even a little more clearly on this next slide, in which we have done an analysis looking at the changes in exercise performance by groups. So, this represents in the placebo group on the left and the H/ISDN group on the right all patients who reached 1 year after randomization and what their exercise tests showed. 
First of all, there were more people who died in the placebo group. This has already been pointed out. So, this excluded them from a repeat exercise test, and more were excluded in the placebo arm. Then we've looked at three different levels of peak oxygen consumption, using 0.7 ml per kilogram per minute as the dividing point because the mean increase in the H/ISDN group was 0.7 at 1 year. You can see that in purple are those whose exercise performance worsened over that 1-year period of time, and there were fewer here than here. In yellow are those whose exercise performance stayed the same between the two, and there were more people in this group than in this group. In this top bar are those whose exercise performance improved over that period of time, and there were more here than there were in the placebo group. And these are the ones who, for administrative reasons, had missing data and they were equal in the two treatment arms. By a chi-square analysis of these two distributions it's significant at the p = .024 level. Now, the ejection fraction changes were very dramatic and consistent. That is, H/ISDN produced a significant and sustained improvement in ejection fraction. These are all measured by radionuclide techniques sequentially. In contrast, the placebo group exhibited no improvement and a progressive decline over time, which did not occur in this group. I personally view the sustained improvement of ejection fraction as a structural alteration in the left ventricle with reduction of the remodeling process, which appears to progress in the placebo group. Now, this slide adds the Prazosin group to this analysis. The Prazosin group now is in blue and the placebo group in yellow. You can see that these two track together, and there was a progressive decline of ejection fraction in both groups, indicating that this vasodilator, Prazosin, did not favorably affect the structural remodeling process in the left ventricle, whereas this vasodilator, H/ISDN, did.
This indeed tracks directly with the mortality results that I've shown you before, and we have data that I won't have time to go into to suggest to you that the changes in ejection fraction are indeed very powerful predictors of the change in mortality in the individual patient groups. Well, when we completed V-HeFT I, there had been also the publication of data from the CONSENSUS study done in northern Scandinavia, in class 4 heart failure, using Enalapril as the treatment option. That study eventually led to the approval of Enalapril for mortality reduction in heart failure. And that was a 1-year trial, so that at the end of 1 year, in CONSENSUS, this was the mortality in the placebo group, very high because these were class 4 heart failure patients, and this was the mortality in the Enalapril group at 1 year, and that represented a 31 percent reduction. When we looked at the V-HeFT data in terms of 1-year data, this was the reduction from 20 percent to 13 percent which represented a 38 percent reduction in mortality. We thought that these two reductions were quite comparable, but this was indeed a very different patient population than V-HeFT and we therefore asked the question, would Enalapril have the same beneficial effect, or greater beneficial effect in mild to moderate heart failure as did hydralazine and isosorbide dinitrate in V-HeFT I, and that was the basis for the design of V-HeFT II. These were the survival curves from V-HeFT II, Enalapril in green, H/ISDN in yellow, and you can see that there was a clear trend for improved survival with Enalapril, compared to hydralazine and isosorbide dinitrate. This is the statistical analysis in brief of that difference by log-rank test, and a two-sided p value here. The p was .017 for the 2-year mortality difference between the two, favoring Enalapril, with a risk ratio of 1.46. 
The overall mortality difference did not achieve statistical significance, .0828, but a clear trend for a favorable effect of Enalapril compared to H/ISDN. But here the confidence intervals overlap 1.0. The other endpoints, again, in V-HeFT II are striking. This was the intent-to-treat changes in oxygen consumption during exercise, and once again, H/ISDN exhibited a modest but sustained improvement in peak oxygen consumption, certainly for the first two years, and at least three time points these increases, when compared to the changes with placebo, were statistically significant -- not placebo -- Enalapril, I'm sorry, were statistically significant. At no time point during follow-up did Enalapril produce an improvement in peak oxygen consumption. In fact, oxygen consumption tended to decline over time. Now, this was the intent-to-treat analysis, but the protocol analysis, again, defined the changes to be identified only in patients who stopped exercising for dyspnea or fatigue, and this is the protocol analysis, showing pretty much the same thing, that there was a strong trend for an improvement with H/ISDN and not with Enalapril. These were the statistically significant points. I must emphasize to you that at 3 months and at 6 months, which represents the time frame for which the FDA has all existing data on changes in exercise and heart failure therapy, H/ISDN exhibited significant improvement in peak exercise capacity compared to Enalapril. I think that if the study therefore had been terminated at 6 months, as have most other exercise studies, there would have been no question that this therapy was more effective than the ACE inhibitor in symptom relief or exercise performance in heart failure. Now, once again, the ejection fraction changes were very striking with both therapies. That is, both Enalapril in green and H/ISDN in yellow produced a sizeable and sustained improvement in ejection fraction. 
In fact, at 3 months the increase in the H/ISDN group was greater than the increase in the Enalapril group. Thereafter the two were similar, suggesting that both interventions favorably affect the remodeling process in the left ventricle. Now, we were struck when we completed V-HeFT II, and we had the same treatment arm in both trials -- that is the H/ISDN arm was exactly identical and the therapy was identical in the two trials. The survival curves for these two treatment arms were superimposable. That implied to us that there must have been some stability in this response since it was so reproducible. Now, this of course then comes to the placebo arms because we did not repeat a placebo in V-HeFT II and therefore, we are dependent on the placebo group in V-HeFT I. I have given you a list here of the so-called placebo groups in more recent heart failure trials. I have given you data on the number of deaths in these trials, in the placebo arms, the duration of follow-up, and the use of what may be critical co-therapy. That is, the use of nitrates and the use of ACE inhibitors. In V-HeFT I there were 120 deaths, so this is a rather robust sample. The follow-up was 2.3 years. None of the patients in the placebo arm received nitrates, and none of them received ACE inhibitors. This is a true placebo group, added to digoxin and diuretic therapy. CONSENSUS, that I have already alluded to, had only 55 deaths in the placebo arm. The follow-up averaged only 0.5 years. 45 percent of those patients were treated with a nitrate, which obviously potentially contaminates the placebo group. In the SOLVD trial, which is the largest clinical experience in heart failure trials, there were 510 deaths in the placebo arm, and a follow-up of 3.4 years, which makes this a very robust placebo group. But it isn't a placebo group because 45 percent of these patients were treated with nitrates chronically, and 23 percent in the placebo group were given ACE inhibitors as drop-in therapy. 
So, this is certainly not a placebo group. Now, the more recent trials, PROMISE, which exhibited an adverse effect of milrinone in heart failure, had 127 deaths in the placebo group, an average follow-up of only 0.5 years. But 59 percent of that placebo group was treated with nitrates, and essentially all of them at least were by protocol on an ACE inhibitor. We don't have the actual data in the paper. The Vesnarinone trial, which was not replicated by the more recent VEST study, had only 33 deaths in the initial Vesnarinone mortality trial in the placebo arm. The follow-up was only 0.5 years. We don't know about nitrates, but 90 percent of the placebo group were on ACE inhibitors. The more recent Carvedilol American data that you'll be reviewing later this morning, in the placebo group there were 31 deaths. As you know, the follow-up was only 0.5 years, but once again, 32 percent of them were receiving nitrates and essentially all of them were receiving an ACE inhibitor. So, this is to point out to you that we will never again have a placebo arm comparable to V-HeFT I because it is ethically indefensible to any longer treat patients without an ACE inhibitor, and we would like to suggest that after today's meeting it would be equally indefensible to treat them without a nitrate along with hydralazine. Well, if we can use that placebo arm, then, as a comparator, we can put a plot of the five treatment arms from V-HeFT, and in fact this analysis was recommended by the FDA for us to do. So, this is in response to their raising the issue about comparing the V-HeFT I placebo group here in red with Enalapril in blue and with the two hydralazine-isosorbide dinitrate curves in yellow. Now, a 2-year endpoint was indeed a pre-study endpoint, so we have dropped a vertical at 2 years in the placebo arm, and discovered this is the mortality at 2 years and we put a horizontal line over to this mortality, which is about 65 percent. 
Then we determined at what time point would you reach that same mortality if you had instead been treated with hydralazine and isosorbide dinitrate, and these two curves, which were just superimposed, show you that there is a prolongation of life by an average of about 10 months. If instead we had used Enalapril, the prolongation of life would have been longer, at maximum probably another 8 months. Obviously the effect is more than 50 percent of this effect, and this effect is enhanced by the fact that there was a little blip on that Enalapril curve there, but be that as it may, it's clear that Enalapril had a more favorable effect than did hydralazine and isosorbide dinitrate. But both are very importantly better than the placebo or Prazosin arms shown here. Well, in summary, then, I've told you that mortality is reduced by H/ISDN, that there is a strong trend for improved exercise tolerance by H/ISDN, and that there is sustained improvement in ejection fraction with H/ISDN. Now, a number of statistical issues have been raised by the FDA, and I'll now turn the podium over to Joe Quinn, who will address some of these issues. Joe? DR. QUINN: Thank you, Dr. Cohn, and good morning to everyone. I would like to discuss several important statistical issues that have been raised by the agency that potentially impact the interpretation of the nominal p values in the application. The first issue is the impact of the interim analysis. This slide summarizes the interim results of the overall survival time that were provided to the V-HeFT I Operations Committee. Note that even though the protocol specified -- sorry. This slide summarizes the interim results of the overall survival time that were provided to the V-HeFT I Operations Committee. Note that even though the protocol specified a one-sided test hypothesis, the p values shown are two-sided p values, as the committee wanted to be conservative in their decisionmaking. 
There were four interim analyses that were conducted using the O'Brien-Fleming criteria. These are shown on the right-hand side with the critical values. Additionally, there were four interim analyses conducted for administrative purposes. The columns represent the protocol-specified ways of comparing the arms. Overall tests between the 3 curves, using a 2 degree of freedom test, a combined active versus placebo arm, and the two pairwise comparisons of the active arms to placebo. Note that the first three looks at the data were performed using an overall test. A trend was observed in February of 1983 comparing the best arm, which was H/ISDN, to the worst arm, placebo. It was after this analysis that the Operations Committee unblinded themselves. In May of 1983 it is important to note that a significant difference using the O'Brien-Fleming stopping boundary was observed between H/ISDN and placebo. However, the trial continued without change. The majority of the protocol specified comparisons were made after the significant interim result was established, pointed out in this area. Note that even though the overall survival time was used for this analysis, the significant results obtained in May of 1983 were more similar to a 2-year endpoint. The O'Brien-Fleming method was not the pre-specified method in the protocol but was used after the method was published in 1979 as it was easier to implement than the Canner method that was a pre-specified method. The next slide I'm going to show you is not included in the committee packet of slides, nor has it been shared with the FDA, due to the recent completion of this simulation, but we feel it shows strong, supportive information that also assesses the sensitivity of the O'Brien-Fleming method that was used. This slide summarizes the simulation of an interim analysis using the protocol specified Canner method. 
The results of this simulation support the findings of the O'Brien-Fleming method, indicating a superior mortality benefit for H/ISDN over placebo in May of 1983, as well as in August 1984. The critical p value that was used for this simulation was a .0125 for the comparison of H/ISDN to placebo, a .0125 for Prazosin versus placebo, and .025 for the combined active versus placebo. There were several reasons that the committee did not stop the study after the significant interim finding in May of 1983 was observed. Importantly, the impact of the differences in the baseline characteristics of the patients upon survival had not yet been assessed, and the committee wanted to establish the length of benefit of the H/ISDN effect. In summary, there was a statistically significant interim analysis in May of 1983, according to the O'Brien-Fleming stopping criteria. The study continued beyond May of 1983 to investigate the length of benefit of effect. The protocol-specified secondary analyses -- that is, the Cox model -- were justified based upon the significant interim results. As the May 1983 analysis met the stopping criteria, no penalty is required for the interim analyses that were conducted after this May 1983 finding. The next issue is the multiple treatment arm comparisons. As previously shown, the interim testing was conducted in a protected fashion. First, the overall test of the three treatment arms, using a 2 degree of freedom test, was employed. Secondly, the best versus the worst arms were compared in February of 1983, and again in May of 1983. Only after a significant result was obtained in May of 1983 were the combined active versus placebo arms and pairwise comparisons made. We would suggest that after significant differences were established in May of 1983, no alpha penalty is warranted for the protocol-planned comparisons performed subsequent to this time. 
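The critical values quoted for the Canner-method simulation can be checked arithmetically. The following is a minimal sketch, assuming the three levels (.0125, .0125, and .025) are meant to add up to an overall level of .05 in a Bonferroni-style allocation; the transcript does not state how the split was derived, and the function names are illustrative only.

```python
# Critical p values quoted for the simulated Canner-method interim analysis.
# Assumption: the levels are additive shares of one overall alpha.
critical_values = {
    "H/ISDN vs placebo": 0.0125,
    "Prazosin vs placebo": 0.0125,
    "combined active vs placebo": 0.025,
}


def total_alpha(levels):
    """Sum of the per-comparison levels (the Bonferroni bound on overall error)."""
    return sum(levels.values())


def is_significant(p_value, comparison, levels=critical_values):
    """True if a p value falls below the critical value for that comparison."""
    return p_value < levels[comparison]


if __name__ == "__main__":
    print("overall level:", round(total_alpha(critical_values), 4))
```

Under this reading, a pairwise comparison must clear .0125 rather than .025 whenever the combined active versus placebo comparison is also tested.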
The next issue is the stepwise approach to the analysis, that is, a non-significant log-rank test, then a Cox model analysis. The significant log-rank interim analysis in May of 1983 justified the protocol-specified secondary analysis, the Cox model, without alpha penalty. The analysis of covariance method gave a more precise estimate of the true treatment effect, especially for overall survival where the estimate of effect is more variable because of the small number of patients in the trial after 3 years. The next issue is the imputation of missing covariate values. There were a total of 459 patients in the placebo and H/ISDN treatment arms in V-HeFT I. There were 51 of 459 patients that were missing either baseline ejection fraction or baseline peak O2, two of the covariates selected by Dr. Packer as a consultant to FDA. This slide shows the baseline mean values of ejection fraction and max oxygen consumption by survival status. There was a consistent trend independent of treatment group showing that patients dying during the study had a lower baseline EF and lower baseline max O2 than those alive at the end of the study. This slide shows similar data in a slightly different fashion. This slide shows the cumulative mortality by baseline ejection fraction and baseline oxygen consumption during peak exercise. The patients with the lowest ejection fraction and lowest oxygen consumption had the highest mortality during the study. There were incremental advantages in total mortality observed by baseline ejection fraction and oxygen consumption, with patients having the higher baseline ejection fraction and higher baseline oxygen consumption having the lowest mortality. This next slide summarizes a simulation analysis performed by Dr. Jim Hung at FDA, showing alternative methods for imputing missing values of baseline ejection fraction and maximum oxygen consumption. 
The first row of this table shows the results when the maximum non-missing value is used to impute the missing values for the patients that died, and the minimum non-missing value is used to impute the missing values for the patients that survived. This approach may not make sense, given the data which I have just shown to you, and what we know about the trials regarding the prognostic significance of ejection fraction and oxygen consumption upon survival. The second row shows the results if one uses the mean value to impute the missing values for all patients with missing ejection fraction or max oxygen consumption, regardless of whether they died during the trial. This is the method that most closely resembles the approach used by Dr. Lloyd Fisher for the analysis submitted in our application. This method leads to a p value of .016 in the Cox model for overall survival and a p value of .013 for 2-year survival. The third row shows the results obtained if one uses the minimum non-missing value for ejection fraction and max O2 as the imputed value for those patients that died during the trial, and the maximum non-missing value of ejection fraction and max O2 for those that did not die during the trial. Based upon the data that I have just shown you, and based upon what we know about the prognostic significance of ejection fraction and oxygen consumption from other trials, this method has strong intuitive appeal. Using this method to impute the missing 51 covariate values, one obtains a p value of .007 for the overall survival, and .01 for the 2-year survival. The true p value probably lies somewhere in between .016 and .007 for the overall survival and probably somewhere between .013 and .01 for the 2-year survival. Finally, I would like to point out regarding the last column, labeled log-rank/Cox, these columns indicate the simulation results for the incremental increase in the p value for conducting the Cox analysis after a non-significant log-rank test. 
As previously mentioned, the statistically significant log-rank test in the May 1983 interim analysis provided a rationale for conducting the Cox analysis without an adjustment in the p value for this approach. In summary, the sensitivity analysis conducted by Dr. Hung indicates a range of nominal p values, depending upon the method used for imputing the missing covariates. Use of a minimum value for deaths and maximum value for survivors is reasonable, given the observed findings and what we know about the prognostic significance of these covariates. Use of a mean value may lead to a more conservative p value, especially for overall survival. The next issue is the two protocol-specified primary endpoints. There was a 33 percent reduction in mortality through 2 years, and a 27 percent reduction in overall study mortality for H/ISDN treated patients. The one-sided 95 percent confidence intervals indicate the H/ISDN risk reduction is consistent with the range of observed findings. The observed risk reduction at both time periods was consistent and correlated, and both findings represent different point estimates of one endpoint, that is, survival. In summary, the consistent risk reduction was observed at 2 years and overall study. The protocol specified an evaluation at two time points to assess the length of benefit of effect. The estimate of effect may be influenced by the sample size at 2 years, and at the end of study, and it may be reasonable to consider survival data as one endpoint, having two point estimates of effect for the modest alpha penalty imposed upon the nominal p values. The last issue is the issue of the replication of the study findings. There are three questions that have been suggested by the agency that must be addressed for this issue. Would H/ISDN have beaten placebo if it had been studied in V-HeFT II? And what is an appropriate placebo group to use for this comparison? 
And is the point estimate of the effect size for H/ISDN less than half the effect size of Enalapril? Because of ethical concerns, the demonstrated mortality benefits observed in V-HeFT I were not replicated. However, the agency has suggested analyses that might be supportive of the mortality benefit and the following is presented as supportive information. We strongly feel that the randomized concurrent control arm of V-HeFT I is the appropriate basis for the mortality benefit. It has been proposed by the agency that the placebo arm from the SOLVD treatment study may be an appropriate arm for comparison. This slide shows the risk ratio for mortality relative to Enalapril for SOLVD treatment, placebo, and V-HeFT II H/ISDN. When the H/ISDN effect is compared to this placebo, there is no observed difference in the risk estimates. This is true at both the 2-year and the overall time points. However, this placebo arm is flawed for the purpose of making this comparison, for the following reasons. This study allowed the active use of vasodilators and nitrates and the study also allowed open-label use of ACE inhibitors. This placebo arm is therefore not an adequate control for making this comparison. And one would not expect to observe a difference between H/ISDN and such an arm. A more appropriate control arm is the placebo arm from V-HeFT I. The placebo arm from V-HeFT I allowed only digitalis and diuretic use. Once V-HeFT I was completed, it was no longer ethical to use this control arm in this patient population. Use of the V-HeFT I placebo group as a control group for V-HeFT II makes sense, given the similarity of the patient populations studied and the conduct and handling of both trials. As previously shown by Dr. Cohn, this slide shows the survival profile for H/ISDN treated patients in V-HeFT I and V-HeFT II. It is clear that this profile is very similar, but this does not allow one to conclude that the risk reduction for H/ISDN is replicated in V-HeFT II. 
To do that, one must also consider the data from the second arm of that trial, Enalapril, and how each arm would have performed relative to a placebo group, had there been one. This slide shows the risk reduction and the 95 percent confidence intervals for V-HeFT I and V-HeFT II, as well as the risk reduction for Enalapril compared to V-HeFT I placebo, and H/ISDN from V-HeFT II compared to V-HeFT I placebo. It is important to note the following. The risk reduction observed for H/ISDN in V-HeFT I, .73, is consistent with the observed risk reduction for H/ISDN in V-HeFT II, compared to V-HeFT I placebo, .75. There is a strong suggestion of an overall Enalapril benefit in V-HeFT II, even though the 95 percent confidence interval includes 1. The risk reduction observed for Enalapril, compared to V-HeFT I placebo, .61, is consistent with the expected conclusion of an Enalapril survival benefit. Importantly, the point estimate of the H/ISDN risk reduction, .75, is not less than half of the point estimate of Enalapril, when both are compared to a common placebo. Also, the upper bound of the Enalapril effect overlaps the point estimate of the H/ISDN effect, as does the lower bound of the H/ISDN effect overlap the point estimate of Enalapril. In summary, V-HeFT I was the only study with a true placebo arm. The Enalapril survival benefit versus V-HeFT I placebo was consistent with the expected survival benefit of Enalapril. The H/ISDN survival benefit from V-HeFT I was replicated in V-HeFT II when compared to the V-HeFT I placebo group, and the point estimate of the V-HeFT II H/ISDN survival effect was not less than half the effect size of Enalapril, when compared to a V-HeFT I placebo. In conclusion, it is reasonable to expect little or no impact upon the nominal p values due to the issues described. The extent of the alpha penalty does not impact the interpretation of the observed survival benefit for H/ISDN in V-HeFT I. 
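The "not less than half" comparison can be checked from the point estimates quoted in this part of the presentation. The following is a minimal sketch, assuming the quoted values (.75 for H/ISDN and .61 for Enalapril) are risk ratios relative to the V-HeFT I placebo arm; the presentation does not say on which scale the effect sizes are compared, so both a linear and a log scale are shown, and the function names are illustrative only.

```python
import math

# Point estimates quoted in the presentation, read here as risk ratios
# versus the V-HeFT I placebo arm (an assumption, not stated explicitly).
RR_HISDN = 0.75      # V-HeFT II H/ISDN vs V-HeFT I placebo
RR_ENALAPRIL = 0.61  # V-HeFT II Enalapril vs V-HeFT I placebo


def fraction_linear(rr_test, rr_reference):
    """Share of the reference risk reduction achieved, using 1 - risk ratio."""
    return (1.0 - rr_test) / (1.0 - rr_reference)


def fraction_log(rr_test, rr_reference):
    """Share of the reference effect achieved on the log risk-ratio scale."""
    return math.log(rr_test) / math.log(rr_reference)


if __name__ == "__main__":
    print("linear scale:", round(fraction_linear(RR_HISDN, RR_ENALAPRIL), 2))
    print("log scale:   ", round(fraction_log(RR_HISDN, RR_ENALAPRIL), 2))
```

On either scale the H/ISDN point estimate comes out above one half of the Enalapril point estimate. The sketch uses point estimates only and says nothing about the uncertainty of a cross-trial comparison.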
And it is reasonable to conclude that the H/ISDN survival benefit was replicated in a second study. And now Dr. Jay Cohn will provide a clinical wrap-up to the presentation. DR. COHN: Well, a number of other endpoints were monitored in V-HeFT I and II and time won't allow us to go into all these, but a few of them have been specifically addressed by the agency, and I'll try to provide those data. In V-HeFT I and V-HeFT II, we measured cardiac hospitalizations as well as we could. Quite a different population because these were VA centers and the criteria for admission to a VA hospital are quite different from those to private hospitals. Quality of life was assessed in both trials, but I must point out to you that in 1979 when we planned V-HeFT I, there were really no appropriate quality of life instruments that could be used, so this was truly not a quality of life assessment. We did use a form in V-HeFT II that I will show you in a moment. It was never validated. It has not been re-used. We have subsequently developed a Minnesota Living with Heart Failure Questionnaire, which was not employed in all the centers in V-HeFT II. We monitored heart size, we monitored echocardiograms. We did Holter monitoring, and we measured plasma norepinephrine levels, and time will not allow me to go into these endpoints. The time to death or hospitalization is shown here because the agency asked about hospitalizations. This is the V-HeFT I data showing time to death or hospitalization, and you can see there was a clear trend for the H/ISDN group to fare better than the placebo group, but this was not statistically significant. This is the analysis of the V-HeFT II, that is, time to death or hospitalization. 
In V-HeFT II, and as you might predict, there was a more favorable effect of Enalapril compared to H/ISDN, largely reflecting the mortality difference because when we look at just the time to first hospitalization for any reason in V-HeFT II, the two curves for Enalapril and H/ISDN superimpose and there is no difference at all between them. If one accepts, then, that Enalapril has a significant impact on hospitalizations and reduces it, as it has in other studies, one might conclude that H/ISDN is not different from Enalapril in that regard. This is the quality of life assessment we did in V-HeFT II, called a Heart Condition Assessment Score. This is the changes over time, an increase being an improvement in quality of life, a decrease being a decrease in quality of life, and there is no striking difference between H/ISDN and Enalapril. At the first time point, 3 months, where the agency has almost all of its data on quality of life in heart failure and the effects of therapy, H/ISDN exhibited a significant increase. Enalapril did not. That p value was less than .05 at 3 months. Thereafter, quality of life declined progressively in both groups, which tells you a little bit about the natural history of heart failure. At all time points, though, there was a greater decline in quality of life in the Enalapril group than in the H/ISDN group, suggesting a trend for more favorable effect of H/ISDN, consistent with the trends on exercise performance. Now, the safety of these two drugs I won't go into. You have it in your document, all the side effect data. The safety has been well characterized. We know, and it has been confirmed, that H/ISDN causes headache and that is reduced when the dose is reduced. We know that Enalapril causes cough and that clearly appeared in the database. There were essentially no instances of lupus in V-HeFT I. 
There were two possible cases in V-HeFT II, but it's clear that the incidence of lupus as a complication of hydralazine is exceedingly uncommon in this patient population. Now, the issue of nitrate tolerance has been raised repeatedly, both in the clinical arena and by the agency because of the well-known tolerance that develops to continuous nitrate administration in the treatment of angina. The mechanisms for this nitrate tolerance have in the past not been clarified. There are many mechanisms that have been suggested, but there is recent and perhaps the most exciting data of all, the role of hydralazine as an inhibitor of nitrate tolerance. It appears that when we serendipitously put these two drugs together in the late 1970s, not knowing at all what the interaction was but knowing that they were both vasodilators, we did something that proved to be remarkably effective, and that is, we added to nitrate a nitrate tolerance inhibitor. I'll show you just briefly the data on that issue. It has been well established in a number of laboratories, laboratories of Munzel and Harrison and Bassenge, that nitrate tolerance is associated with the generation of superoxides at the endothelial surface. These superoxides chew up nitric oxide and thus inhibit the nitric oxide effect which characterizes the hemodynamic response to nitrates. This is just one slide from a paper by Winslow that was published in the Journal of Clinical Investigation last year, in which superoxides are measured in response to NADH addition as a substrate. This is carried out in ground-up aortas from rabbits, who were either not treated with nitroglycerin or treated with nitroglycerin, or treated with nitroglycerin in addition to hydralazine for 3 days before the aortas were taken out and ground up. You will notice that this is the superoxide production in response to NADH in a control animal that received neither nitroglycerin nor hydralazine. 
This second black bar is the increase that is identified when the animal had been treated for 3 days with nitroglycerin, an excess by about two or three-fold of the amount of superoxide that is produced in the vasculature. When hydralazine was added to nitroglycerin in the treatment of these animals for 3 days, there was no excess of superoxide produced, implying that the hydralazine had prevented the generation of the superoxide which causes nitrate tolerance. Now, the in vivo documentation of this combination has been well established. This is a study performed by Dr. Ho-Leung Fung, who is in the audience in case there are any questions raised about this, in which he took rats with myocardial infarction who had an elevated left ventricular end diastolic pressure and infused nitroglycerin continuously. In open circles is the response of the left ventricular end diastolic pressure to nitroglycerin. It comes down and then gradually recovers, despite the fact that the nitroglycerin infusion is continued. This recovery to pre-treatment levels implies nitroglycerin tolerance, the hemodynamic effect of the nitroglycerin. When in fact he added hydralazine to the regimen, which in itself did not change LVEDP, the fall was comparable with the nitroglycerin but now the nitroglycerin effect was sustained over 10 hours. This is a significant inhibition of the tolerance that developed in the nitroglycerin-alone treated rats. And then to bring this to the clinic, Dr. Uri Elkayam and his colleagues in Los Angeles -- and Uri is also in the audience in case there are any questions -- did the same trial in humans with heart failure. Infusion of nitroglycerin in these patients with heart failure produced a decline in the pulmonary capillary wedge pressure and then when the nitroglycerin infusion was continued, the wedge pressure rose progressively, implying tolerance to the hemodynamic effects of nitroglycerin. 
When hydralazine was co-administered with the nitroglycerin, the favorable effect of nitroglycerin on pulmonary-capillary pressure was sustained. So, there appears to be rather persuasive evidence now that hydralazine is a potent antioxidant which inhibits the tolerance that may develop to nitroglycerin or to isosorbide dinitrate. Now, I am not willing to accede that hemodynamic tolerance is necessarily also implied, that there is tolerance to the anti-remodeling effect of nitrates on left ventricular function. I think these must be viewed as separate endpoints, and we can't assume that one is related to the other. But I think that this is clear evidence that whatever tolerance might develop during chronic administration of isosorbide dinitrate should very much be inhibited by the co-administration of hydralazine. Well, I alluded at the beginning to the fact that the guideline committees have approved this therapy already and I just remind you, and you have in your briefing document the details of these guidelines, and in fact many members of this committee have served on these guideline committees. There are three identified here. That is, the guidelines issued by the American College of Cardiology and the American Heart Association for the treatment of heart failure, the guidelines issued by the Agency for Health Care Policy and Research, and the guidelines for heart failure treatment issued by the World Health Organization. All of these guidelines recommend for therapy of heart failure digoxin and diuretics, the use of ACE inhibitors, and the use of hydralazine and isosorbide dinitrate in patients who are not taking an ACE inhibitor. They do not suggest this should replace ACE inhibitors, but that this should be used in patients who do not take those drugs. Well, I just would like to finish up by putting in context what I have learned from these V-HeFT trials because this has changed the paradigm. 
We used to think that heart failure was a syndrome in which there were many endpoints, all of which should be in concert. I think we now know that they are distinct, and that the progressive process in the left ventricle with dilatation, which we call remodeling, and a progressive fall in ejection fraction leads to premature death from arrhythmias or pump failure, and this process may continue and progress to death in the absence of symptoms. In fact, the SOLVD trial, the SOLVD prevention trial, was initiated to identify patients out here with a low EF and no symptoms. So that it is quite possible to go through this whole disease without symptoms. The presence of symptoms relates largely to noncardiac factors which may be variably stimulated by this process in the left ventricle, and may include neurohormonal activation and multiple other factors as well. Most of the data that have been previously reviewed by the FDA for treatment of heart failure for relief of symptoms have involved short-term studies in which symptom relief is really a short-term goal of therapy. In contrast, if one is interested in this process leading to death, one must do a long-term trial and one must use therapies to interfere with this process that may be quite separate from therapies aimed at relieving symptoms. So, I currently view the management of heart failure really with two different goals in mind. One is short-term symptom relief, and for that we often use -- we do use -- diuretics, and vasodilators may favorably affect short-term symptoms by producing a favorable hemodynamic effect. And we even use occasionally positively inotropic drugs like dobutamine and milrinone in order to have a favorable effect on hemodynamics and on symptoms, despite the fact that we know that these drugs shorten life expectancy, apparently, and some of these drugs have no effect on life expectancy and some may shorten it. 
So that there is no relationship between the favorable effect of these drugs on symptoms and the potential for therapy to alter the long-term course of the disease. From what we now know, progressive left ventricular dysfunction can be inhibited and therefore mortality reduced by ACE inhibitors, by hydralazine and isosorbide dinitrate, I believe by beta-blockers -- and you are going to be dealing with that contentious issue this afternoon -- and perhaps by other neurohormonal inhibitors which can alter the milieu and influence the rate at which the left ventricle remodels, yet to be determined out here. But I think we have reached the point now where we have to identify specific endpoints for a therapeutic approach. The only agent which appears on both sides of these columns is hydralazine and isosorbide dinitrate because it does relieve symptoms and improves exercise as a potent vasodilator, and it also inhibits the progressive remodeling process in the left ventricle. Well, in summary, then, I hope we have been able to convince you, Mr. Chairman, that there is a strong basis for approval of BiDil for congestive heart failure. 
That is, that the combination of hydralazine and isosorbide dinitrate exhibits a survival benefit compared to placebo; that it exhibits a strong trend for improved exercise tolerance versus both placebo and versus Enalapril in V-HeFT II; that it produces a sustained improvement in ejection fraction, which I believe means that it is inhibiting the remodeling process and it also confirms the long-term effect of these two vasodilators; that this combination therapy has a well-established rationale, even more well-established by the recent data relating to nitrate tolerance; that the safety of this combination is well-established; that it is already widely recommended as a treatment option in all the guidelines issued for the management of heart failure; and that indeed the practicing physicians require prescribing information to properly utilize this remarkably effective therapy. Thank you very much. DR. MASSIE: Thank you very much, Jay. The way I think we should proceed from here is first open up this presentation to questions from the committee and our consultants. We are going to lead off with our reviewers, as we usually do, and our consultants, and if the reviewers from the FDA want to ask some questions at that point, that would also be appropriate. Then we'll ask the reviewers from the FDA for comments and then we'll proceed on to the questions. So, why don't we start. Lem, do you want to start, since you had some statistical questions? DR. MOYE: Sure. In nowhere as part of the slide presentation that we saw today did I see the -- and if this was here and I missed it, I apologize, but I don't think I saw the log-rank analyses which led to the p value of 0.093. And I wondered if you could comment on that. DR. QUINN: I think you are referring to the two-sided log-rank test that was in the original application? DR. MOYE: That's right. DR. 
QUINN: Well, that has been presented as the one-sided p value that corresponds to that two-sided test, as the protocol specified the one-sided p value as the appropriate method. DR. MOYE: And so the one-sided p value is what precisely? DR. QUINN: Can I go back to that slide? It would be the one from Dr. Cohn's presentation of the summary of the V-HeFT I survival. DR. MOYE: That's where I think I first asked the question. It's 0.04. DR. COHN: It's .046, I think. DR. MOYE: Okay. Now, the threshold for significance, which was prospectively specified by the investigators, was at, again one-sided, 0.025. Is that right? DR. QUINN: Well, it's difficult to interpret the protocol actually. The protocol suggested that different alternatives could be employed, depending upon the number of comparisons that were made. And the protocol suggests that if the combined active versus placebo arm was compared, as well as the two individual active arms to placebo, then that the individual active arms to placebo could be compared at the .0125. However, the protocol doesn't necessarily lead one to believe that all those tests would be conducted and the pairwise comparisons could also have been tested at .025 and the rationale that I'm trying to make is that the interim analysis of May 1983 that met the O'Brien-Fleming stopping criteria, was the significant log-rank test found for the trial. DR. MOYE: But since the trial was allowed to continue, I think it's also admissible that that might not be the definitive p value because of course, as you get these multiple p values, as you go through the interim analyses, one could choose any p value they wanted and continue to go through the trial, amassing additional p value. There is a problem with that approach, right? Okay. One other question. The protocol is actually quite laudatory of the log-rank test. 
I will not read the individual statements from the protocol, but there are I think two locations where they mention the superiority of the log-rank test and that it is distribution-free, and I think they go so far as to say that one of the best tests available to identify small differences between treatments is the log-rank test. Yet, now there is a good deal of emphasis on the Cox analysis approach. I could only find one brief mention of the Cox analysis approach in the protocol and if I compared statements about the log-rank versus statements about the Cox, my view would be that the investigators were hanging their hopes on the log-rank and not the Cox. Yet, we see a good deal of analyses today centered on the Cox regression analysis approach. DR. QUINN: Well, the survival curves become more variable at later time points of the trial, and the Cox model helps to partition out some of that variability and to assess the treatment effect. DR. COHN: Yes, if I could just comment about that, Dr. Moye, because you have to remember, this protocol was planned in 1978 or 1979. There were no data yet on long-term follow-up of heart failure. So, the possible potency of covariates and variables in influencing mortality was completely unknown at that time. I think that all current trials in heart failure are done recognizing those variables and adjusting for them, usually with a Cox analysis. I agree with you. At the time this protocol was written, the Cox analysis was not necessarily identified as an important determinant, for the very reason that we were not very cognizant of how important these were going to be in influencing this ultimate survival. DR. MOYE: So, I guess the crux of the matter here for me is, is there ever a circumstance when the primary statistical prospectively stated analysis plan can be adumbrated, can be substituted by another analysis plan using another stat analysis procedure? DR.
COHN: Again, I think you are entirely right, and that is why we have gone into this intensive analysis of the statistics because that question has come up repeatedly and we can only show you the data as they are. These are the p values. One has to interpret them as one chooses to do. But keep in mind that this is a study designed 20 years ago. This was a VA cooperative study. This was not designed really as a regulatory study so that careful selection of criteria for endpoint were not as precise as one would see in a protocol designed today with the goal to come to this committee and ask for approval. So, one has to look at this a little differently than one might at a more recently organized mega-trial in which p values are clearly defined as the goals for the trial. DR. MOYE: Thank you. DR. MASSIE: Let me just read the statement I think that Dr. Moye was pointing out. This is in the analysis method of the protocol, the PF1 on page 34, where it said that variables which are prognostically important will be identified by comparing survival curves of patients on different levels of baseline variables. The life table regression procedures of Cox will also be used to identify variables that are prognostically important and to obtain estimates of treatment effects adjusted for any inequality in their distribution between treatments. Now, one thing that struck me on the baseline characteristics is there were no inequalities of those prognostically important variables. Was that the case? DR. COHN: Yes. There were no significant differences when one asks are there differences between the two groups, but of course there are subtle differences which may impact upon mortality that don't reach statistical significance when one compares the two groups. It has been the usual approach in V-HeFT to look at all variables and not just confine oneself to variables that show a significant difference between the two treatment arms.
And you can see the degree of adjustment that was required when we switched from a log-rank test to a Cox analysis, albeit using now only those variables identified by the agency and not the variables that we had originally planned on using because they were preselected independently. DR. MASSIE: It's just that in my naivete I was surprised that there was such a substantial difference in the outcome of those analyses, despite the lack of what looked like even trends. I saw a .5 difference in VO2, but everything else looked right-on. I wondered how much of that might have been as a result of imputation of the missing values as opposed to -- DR. COHN: Well, there were only 51 missing values in this whole group out of -- DR. QUINN: 459. DR. COHN: -- 400 and some patients, so it's really a relatively small number. It would probably be appropriate -- Lloyd, do you want to make a comment about that? Because Lloyd has really spent a lot of time going over these data. DR. FISHER: Well, just that the reason you can get a difference is, there are papers out showing in the Cox model, if the Cox model with covariates holds, if it is appropriate -- and that is an if -- then if you leave out other covariates, you bias the estimated effect downward. That is, I think, Piantadosi and Sam Wieand and some other people have published that. So, perhaps the reason there is a change is slightly analogous to in the analysis of variance you can reduce your variability by taking into account factors. It is not that you are correcting for baseline imbalance, but you have a more precise treatment estimate when you also take into account the other factors, and that does not contribute to the variability. That would be my guess that that is how this happens. Now, having said that, how you would actually prove a statement like that I am not sure, but it certainly can happen mathematically. DR. MASSIE: Lem, and then I thought I would ask Dr. D'Agostino to comment after Lem. DR.
MOYE: Just one brief question. I wonder if you would comment on the concerns that have been raised about the lack of fit of the Cox regression model. DR. FISHER: Pardon me. About the lack of fit? DR. MOYE: The difficulty with the believability of the underlying assumptions required by the Cox regression model. DR. FISHER: Yes, I would be happy to comment on that because that's actually how I got involved in this. Things were somewhat down the road and the FDA review said the fit was examined in two ways, minus log plots. And also on the SAS output there was a statistic for fit. One of the p values for goodness of fit was .049 or something. And I came in there and I looked at the plots and I said, hey, this is proportional hazards. I knew that. I mean, I looked at it. Now, this isn't proof, this is Gestalt. So, what I suggested to the agency, I said, let's go to the randomization test. We'll use the Cox statistic but because we're worried about the parametric assumptions, we will go to the randomization test for the treatment effect, which is what we did, the primary thing actually that I did in my analysis. Before that was done that was agreed upon at a meeting with the agency that -- of course, the randomization test is always valid. It doesn't depend on the assumptions. The p value actually turned out to be almost exactly the same. To be perfectly frank, even before we did it, I knew that would happen because I had seen the plots and it looked like proportional hazards. But nevertheless, I think it will alleviate that concern with the agency. I assume that Jim is here, and if the agency still has concerns about that, they can bring them up. I don't think that's much of an issue here. DR. MASSIE: Ralph, do you want to help? DR. D'AGOSTINO: Well, I'm not going to help but I'm going to say something. I guess I'm not overwhelmed with the notion that the protocol says a log-rank test as the major test, and then later on one may want to shift to a Cox.
I have written protocols where the analysis that we actually used wasn't even invented when the protocol was being written. So, the notion of shifting is not too dramatic. But in this case here there is such a heavy reliance on the log-rank test that you sort of say that this is the procedure to be used, and then when they're shifting to the Cox, as the analysis is produced, it does become bothersome in terms of trying to sort out, is it chasing after something that's going to show significance, or is it something that you really believe is the best method. The other point that really bothers me is that I can't sort out what the primary variables are. It seems to me like there are a lot of primary variables, which means that there is a lot of testing that is going to go on. And now it looks like there is only a couple of primary variables, which means maybe there shouldn't be too much adjusting. Could someone really clarify? I thought I heard a presentation that there were a lot of secondary variables, but in the materials I had, there were something like six primary variables. That would lead you to say that you committed to those variables. DR. COHN: The protocol did have, I think, six variables as primary endpoints. I know if I were rewriting the protocol today -- and I can't do that -- and we had in mind a regulatory consideration, we would have more precisely defined what were primary and what were secondary. In 1979, that was not done. We all knew as we were progressing -- and we certainly have learned since then -- that the important variables are the ones that I focused on this morning because we now know those are the important variables in heart failure. How did we learn that? We learned that from V-HeFT. So, this is a self-fulfilling prophecy. You do the study, you learn about the disease by doing the study, and then it would be nice to then go back and redesign your protocol, but we don't have the luxury to do that. So, what you are saying is correct. 
One has to recognize there were a number of variables. The beauty, of course, of this is that every variable went in the same direction. So, we haven't hidden anything. I have alluded to some of those. The trends were all favorable in everything that we looked at. I hope that gives some comfort to the agency in the approval of the drug because there really is consistency across all the variables. DR. D'AGOSTINO: One of the concerns that I think we might have with that is that you then really use the study in an exploratory fashion, which is we learn from studies. But it still then leaves us with the sense that, do we believe the way the variables ultimately were sorted out would, in fact, be confirmed in yet another trial. I think this is where my problems come from. DR. COHN: We have what we believe is strong support for the other variables in V-HeFT II. So, you have seen two trials in which the second -- the other endpoints all went in the same direction, and I think that should give you confidence that V-HeFT I has been replicated. DR. MASSIE: JoAnn? DR. LINDENFELD: Dr. Cohn, I have some concerns about dosing intervals. V-HeFT I and V-HeFT II were both q.i.d., and I understand the approval is for t.i.d., or is it for q.i.d.? DR. COHN: No, the approval should be for q.i.d. The data are q.i.d. DR. LINDENFELD: All right, good. DR. COHN: I think some of the recommendations, at least in one of the guidelines, is for t.i.d., based upon intuition, certainly not based upon data, and we are here with data, not intuition. DR. LINDENFELD: Good. DR. MASSIE: I'm just going to go to our consultants first and then we will open it up to the whole committee. Bob and Jim, any comments, questions? DR. CODY: A couple of questions. Did any patients who participated in V-HeFT I participate in V-HeFT II? How many would you say, what percentage? DR. COHN: Yes. I think about 15 to 20 percent of V-HeFT I patients who survived V-HeFT I were recycled and re-randomized into V-HeFT II. 
This of course is a major reason why we have never merged these two databases because of the overlap of patients. We have done extensive analysis to see whether there was any difference in behavior of those patients who were re-randomized as compared to those new patients entered into V-HeFT II, and there appeared to be no interaction whatsoever. So, we feel comfortable that they can be treated as if they were the same subset -- from the same set of the population, but that does influence a couple of things in terms of age, for instance. They already were a few years older. DR. CODY: In terms of the very sophisticated statistical analyses that have been done and presented today, has this been factored in, or does it need to be factored in? I would have to defer that to people who know a lot more about statistics than I do. DR. QUINN: Actually I can answer that question, that the results were done both ways for V-HeFT II, both using all the patients that were randomized to that trial, as well as looking at the patients that had not been in the V-HeFT I study. The results were absolutely consistent using both methods. DR. CODY: What percentage of the patients in V-HeFT I and II were women? DR. QUINN: There were no women in the trial. It was all conducted in the VA hospital setting. DR. CODY: I raise this because of the current VA and NIH push to include women in heart failure trials in a more representative fashion. This is certainly an issue with at least one of the VA-sponsored heart failure trials that are currently underway. We are assuming that we could extrapolate these findings across genders. Is that a reasonable assumption? DR. COHN: Well, I guess your assumption, Jeff, is as good as mine. Certainly when one has looked at the response in women versus men in the trials where both groups have been included, such as the SOLVD trial, there appeared to be no difference in the therapeutic response. 
Women were not included in V-HeFT because we recognized we would have so few that it would not be possible to analyze them separately, so we confined the study to males. The extrapolation to the female population then is going to be a matter of judgment rather than of data. DR. CODY: I guess a final comment is, I think a very important statement that has been made by the presenters, and that is the need for prescribing information for this combination. What data exists to suggest or to guide people when to use BiDil instead of an ACE inhibitor? When do you use BiDil in addition to an ACE inhibitor, and can these findings of functional class 2 and 3 patients be extrapolated to functional class 4. DR. COHN: Your last comment is a very important one, and since there was an exercise entrance criteria in all of these patients, class 4 patients were substantially eliminated from the trial. There were a smattering of patients who were said to be in class 4 failure, and as you know, a patient might have been in class 4 failure last month and now is ambulatory and functional and gets included in the trial. Is he now a class 4 or is he a class 3? We argue about that all the time. But there really is little data in class 4 patients in this trial. There is a good deal of experience with the combination therapy clinically on hemodynamics in class 4 failure, but they were not included in this trial. I think your first issue was -- DR. CODY: Using BiDil instead of an ACE inhibitor or in addition to? DR. COHN: Yes, the place of this in therapy. Obviously, the labeling that is being requested would point this out as alternative therapy to an ACE inhibitor in patients who were not receiving an ACE inhibitor usually because of intolerance or perceived intolerance. 
We know that the analyses done of the use of ACE inhibitors in patients with heart failure continue to suggest that there is a large number of patients not receiving an ACE inhibitor who, on the basis of the data, should be receiving an ACE inhibitor. So, this would be alternative therapy for that group of patients who physicians choose not to use an ACE inhibitor. We are providing no data on this combination added to an ACE inhibitor, and we would not anticipate that that should be included in the labeling. Many of us in clinical practice use that combination because we have found anecdotally that it is effective. But there have been no systematic studies done of hydralazine-isosorbide dinitrate added to an ACE inhibitor to justify that as a labeling indication. DR. CODY: I agree with you that there are patients where we would use the combination, and generally those would be the patients who aren't doing well. They might be the functional class 4 patients who are not responding to an ACE inhibitor or the hydralazine nitrates, so we combine them. Where clinically, where this piling on concept is used for the sickest patients, do we have to have some special wording or recommendations about that? DR. COHN: Yes, I agree with you completely, Bob. That is really the way we have to focus this therapy based upon the data from V-HeFT. We have to limit the indication to what has been demonstrated in V-HeFT. I appreciate your comments, Bob. DR. MASSIE: Just one qualification and then Ray has a question. When you say in people who are not using ACE inhibitors, would it make more sense in people who have been tried on ACE inhibitors and have not tolerated them?
In order to best serve the educational function that what we are trying to do is get people to use ACE inhibitors and we know there are people in whom you can't, but not just in people who are not on them because that would be any heart failure patient who is newly diagnosed and hasn't yet had a chance to be treated. DR. COHN: Well, you know, you may be right. On the other hand, if you look at the ancillary endpoints such as exercise, and one had a therapeutic goal in an individual in whom prolongation of life, based on whatever other issues might be present in that individual, was not your primary emphasis, and your primary emphasis was to allow the patient to do a little more exercise, one might conceivably feel that in that instance the mortality benefit of Enalapril was not important to this patient. Now, these are judgmental issues that physicians have to cope with, so it is difficult to demand that all physicians give all patients with heart failure an ACE inhibitor. It is important to show them the benefit of ACE inhibitors so that they can choose to use those drugs in the appropriate patient population. So, it is a very nebulous kind of distinction, but I think physicians have to be given choices. DR. MASSIE: Ray? DR. LIPICKY: I have forgotten the operant policy during the studies with respect to how the dose was manipulated with respect -- DR. MASSIE: Can you speak a little louder, please? DR. LIPICKY: Sorry. I have forgotten the operant policy with respect to how dose was manipulated during the studies. Was it titrate to maximum tolerated dose with some upper limit? DR. COHN: Yes. The upper limit was 40 milligrams 4 times a day of isosorbide dinitrate, and 75 milligrams 4 times a day of hydralazine. It was a dual titration, and that is, both drugs were increased at subsequent visits until the patient achieved that higher dose. 
If headache, which was the major side effect, intervened, the dose could be either held or even reduced, and that is why the mean dose in V-HeFT I was about 240 milligrams of hydralazine, not 300, and the mean dose of isosorbide dinitrate was about 110 milligrams and not 160 milligrams reflecting that. DR. LIPICKY: But dose was increased for both. DR. COHN: For both. DR. LIPICKY: They were not changed independently. DR. COHN: No. Although if a side effect occurred, the physicians were encouraged to reduce the dose of the ISDN first because it was our impression that that was the more likely cause of headache. So, they might have reduced one and not the other. And sometimes they discontinued one and not the other. Now, we had a little trouble dealing with that discontinuation of one of the drugs because they were taking one and they weren't taking two. Knowing what we know now, it's possible you need to take both in order to get the beneficial effect. But it was all an intent-to-treat analysis anyway, so that analysis was not influenced by whether they did or didn't take both drugs. DR. LIPICKY: From the vantage point of instructions for use, and based on the experience, do you think it's a problem that one has to take both and does not have a choice in titrating one or the other, depending on adverse symptoms? DR. COHN: I think our data would suggest that if one wants to attain the benefit of this drug combination, one should use the two drugs. We have no way of analyzing what the optimal dose of each of those two combinations is, as you know, and this was not a dose response study. So, we are left with a strategy for therapy, a strategy for reducing the dose if side effects occurred, and when one used that strategy, we reduce mortality. Now, I think from a labeling standpoint, all we can do is recommend that strategy in the labeling, knowing full well that that may not be the optimal strategy or the only strategy, but the only strategy we studied. DR. 
MASSIE: Ray, while you have the microphone, before we open the general discussion, maybe we can get you to clarify something for us. The idea of a combination drug as opposed to the two components of the drug. I think you started hinting on that point a bit. What would the FDA see as a reason for approving a combination drug when we have the two components, and I guess then I would like Jay to follow up and tell us why he thinks it is better to have this combination rather than the two components, what advantage it provides, because ordinarily I know in combination drugs where we have dealt with them for hypertension, you have to show both components are effective and then there is some advantage to having the two together. Maybe, Ray, you could tell us why we should be thinking about this. DR. LIPICKY: It could be a very long discussion but I think the short discussion is that if one has a trial where one thinks there has been documentation of an alteration in irreversible harm, and one knew that, say, it was a single chemical entity, but it was a racemate, nobody would have any problems whatsoever in saying that the drug, a racemate, did it. I think that if you consider this to have been documented, to have an effect on something that is irreversible, well, then you are stuck, or not stuck. It is appropriate to consider the combination as a single drug. Ordinarily one would expect to be able to document that drug A plus drug B has a bigger effect than either drug A or drug B alone at the appropriate doses. But ordinarily one would be concerned about that if one could in fact do studies that would allow one to determine that. It is unlikely to be able to do them for irreversible harm, especially with a study that is 20 years old. DR. TEMPLE: Barry? DR. MASSIE: Can I just let Jay respond? DR. TEMPLE: Barry? I'd like to add something. DR. MASSIE: Go ahead. DR. 
TEMPLE: We have a combination policy that doesn't distinguish really between taking the two drugs separately and putting them in the same tablet. That is theoretically of no concern. It is never a benefit to have them in one tablet except convenience. There can't be a medical benefit from taking them together as both separately. The question is, do they each contribute, as Ray said. The longstanding policy is you have to demonstrate that each component makes a contribution. We have, however, tried to confront the question of, suppose somebody shows you that you've done something important with two drugs and it is really not possible anymore to test the two components because you can't have the placebo group to do it. What we have said is, if there is a plausible basis for having both components, on theoretical grounds we would sort of live with the discomfort of approving the combination if it had an important effect on survival or irreversible morbidity or something like that. DR. COHN: I think the answer to your question, Barry, is a complicated one. Let me put it this way. If one looks at the use of this drug combination in its generic form, out in the community, there is very little use of hydralazine. There is substantial use of nitrates. ISDN is widely employed in heart failure, without labeling, and without indication and without marketing. Hydralazine is not used, perhaps for several reasons. Number one, physicians don't like writing so many prescriptions because they would have to write two separate prescriptions. Patients do not like taking so many pills. And there is no dosage form available of hydralazine which matches the dosage form used in V-HeFT. So, there are several impediments to the use of hydralazine. The nitrate use suggests that physicians are very comfortable using ISDN because they are comfortable with that drug. 
And they are using it for reasons which are mysterious because there is no existing database which suggests that ISDN should be used in patients with heart failure, other than the V-HeFT database, which we now believe strongly suggests, based upon the new data that I showed you, that hydralazine should be used along with ISDN. So, having labeling for BiDil, if it will help physicians to understand the application, the dosing and the usefulness of these two drugs and can do it in a single prescription, with a single tablet that patients will be much more comfortable taking, I think it can have a profoundly favorable effect on the management of this syndrome because, despite the fact that all the guideline committees recommend using this combination, it is not being used. There has to be some explanation for that, and that's the best explanation I have, is what I have given you. DR. MASSIE: Okay. Well, what we are going to do is, Jeff, since he has not gone, and then we will start from the right. DR. BORER: Most of the questions I had have been answered, but I need a clarification here, if I can have one, please, and then based on the response to that I have several questions I would like to pose. First of all, I would like a clear statement of what is being requested for approval here. What is the indication? Are we talking about approving the combination for reducing mortality rate in patients with congestive heart failure, or are we being asked for approval of the combination for the treatment in general of people with heart failure because at least three things look like they get better, mortality rate and maybe exercise tolerance and maybe ejection fraction? What indication is the sponsor seeking here in the approval process? DR. COHN: Well, I guess if you are speaking of the sponsor, maybe we should turn to the sponsor. Cesare, do you want to respond to that as the sponsor? DR. 
ORLANDI: The indication that we are pursuing is for treatment of congestive heart failure in addition to digitalis and diuretics in patients actually not taking ACE inhibitors. This is based, indeed, on data that we feel are convincing, that are mortality data and ejection fraction data. DR. BORER: Okay. So, you are not specifically suggesting that the drug is indicated for reduction of mortality, but rather that it is indicated in general for treatment of patients with heart failure. Is that right? DR. ORLANDI: We feel that we have demonstrated actually an effect on mortality as well. DR. BORER: I think that Jay is absolutely right, of course. You can't be penalized for not doing what you didn't know to do at the time you did it because the data were not available. I come to these data with a sort of a general bias in favor of the combination being good. However, we are being asked to approve the combination for something here. Now I understand that it is for the general treatment of patients with heart failure, particularly because of mortality reduction. And that may be a good thing to do. But if we are going to do that then obviously everybody has to feel comfortable with the consistency and reproducibility of the effects, and therefore there are a number of statistical considerations that I would like some, again, clarification about here. I don't think that the p values are ironclad rules that one must follow because they say this or that, and I know the FDA regulations aren't written that way either. They are guidelines. On the other hand, we have information really from two trials. One of them was placebo-controlled. As I look at the data, the general Gestalt is that ejection fraction clearly is improved when you give the combination. That is a good thing. Exercise tolerance, well, you know, it doesn't really quite make it statistically but it goes in the right way. That is convincing. 
And then we have mortality, which is of course a very compelling argument if it is a reasonable one. But that is where this desire I would have to be able to be convinced of the consistency and reproducibility of the results begins to founder a little bit because of statistical considerations that I am not sophisticated enough to really understand. The way I see it, we have a hypothesis that allowed only a one-directional response, maybe reasonable, so we used a one-tailed t-test. We say that there is no penalty for looking at the data many times if you passed a predetermined boundary that was determined by the Data and Safety Monitoring Committee at an early look. That may be right, but I have never heard that before, but maybe it is right. We have multiple pre-specified endpoints and we have no penalty for looking at those, even though they presumably could have gone either way. And that is okay because mortality is so important. But we only have one placebo-controlled trial and then we use a second trial where the placebo is present but it's a historical control. So, all of that is not the way we are accustomed to seeing data, and I would like to have clarified for me whether it is really legitimate to say we don't have to pay a penalty after a pre-specified stopping rule is passed but we decide to go on anyway because we wanted to see if the result was consistent over time. If it is really legitimate not to pay a penalty when we talk about consistency, if there are multiple pre-specified endpoints but there is one that really looks real good. What's the answer to that? Is there a statistician? Lloyd, perhaps? DR. MASSIE: We've heard the answer from the sponsor. I would like to have the answer from our two committee statisticians. DR. COHN: Could I just add one point here because I didn't bring this up before. I am reading now from the V-HeFT I protocol under Sample Size and Duration. 
"The primary objective of the study is to determine if the survival time is increased on vasodilator therapy as compared to the survival time in the placebo group." That was the primary endpoint. So, don't allow all these other endpoints to dilute that out. It is the primary endpoint. DR. BORER: That is a good point and I accept that. DR. COHN: I would like the response from the agency, but just to remind you, the reason for imposing a penalty for multiple looks is that you always have the opportunity to stop the trial if you surpass the guidelines for the endpoints and the multiple looks. If you surpass the endpoint and don't stop the trial, you really have eliminated the need for any more penalty because you haven't responded to it in the first place. So, the multiple looks have not really contributed to your final decision. That is a nuance, and I would love to hear responses from the statisticians on that, but just intuitively it seems to me that makes sense. DR. MASSIE: Lem? DR. MOYE: I have somewhat a different view. (Laughter.) DR. MOYE: The purpose of corrections for multiple looks is to ensure that you have preserved the type 1 error at an acceptable bound. The type 1 error, I think, is really a cause for lots of concern and lots of confusion. From my way of looking at it, the type 1 error is a matter of population protection. The experimenters have an obligation to protect the population from which they derive their patients and the derived sample. They protect the derived sample, of course, by taking care of the patients as best they can. They protect the population by ensuring that they don't inflict unnecessary false positives or false negatives. The way to provide the insurance for false positives is the alpha level. For every decision that is made concerning a hypothesis test, you have the potential for sampling error propagation, and there are two ways to handle that. The far superior way, again in my opinion, is for the investigators to handle it. 
That is to say, the investigators must say with absolute clarity what they are going to do with the primary endpoint, how they are going to test it, and what they are going to do with secondary endpoints. They must provide, if you will, a decision path, how they are going to work through the collection of endpoints that they have. They are in the best position to do it because they can do it prospectively. They have an excellent fund of knowledge to do it, but I must confess they are not used to doing that, and perhaps the reason is that we have not asked them to do that. Because we haven't, we find ourselves again in the position of trying to make some determination and some post hoc correction of these accumulated decisions. I think if the investigators surrender their mandate, because that's what they have in the beginning, for controlling these alpha errors to us retrospectively, then it is up to us to come up with our own. My personal one is a very conservative one which penalizes investigators for each decision they make, so that in this circumstance, where the type 1 error at the 2-year interim analysis is very small, you nevertheless accumulate some error because that decision may have been wrong. You accumulate alpha from that and you move on, so that as one progresses through the secondary endpoints, the alpha eventually accumulates. You stop when you reach the bound, whatever that bound happens to have been. Typically it's at the .05 level. So, I am arguing, number one, for a prospective plan for the spending of alpha, but in its absence -- and most times I am afraid it is absent -- a very conservative post hoc plan for the accumulation of alpha, and that way we can ensure that the probability of making at least one type 1 error from all of them is acceptably small for the population at large. DR. MASSIE: Ralph? DR. D'AGOSTINO: I agree very much with the spirit of what was just said, but I would like to add a couple of comments to it.
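As a reading aid, the bookkeeping of accumulated alpha that Dr. Moye describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used by the committee or the sponsor: the per-decision alpha values are hypothetical, the function names are invented for this sketch, the running total relies on the Bonferroni (union) bound, and the second function additionally assumes the tests are independent.

```python
from typing import List


def decisions_within_bound(alphas: List[float], bound: float = 0.05) -> int:
    """Count how many decisions, taken in order, fit before the summed
    alpha would exceed the bound (Bonferroni-style accumulation)."""
    spent = 0.0
    count = 0
    for a in alphas:
        if spent + a > bound:
            break  # the next decision would push the total past the bound
        spent += a
        count += 1
    return count


def familywise_error_independent(alpha: float, k: int) -> float:
    """Probability of at least one type 1 error across k independent
    tests, each run at the same nominal alpha (assumption: independence)."""
    return 1.0 - (1.0 - alpha) ** k


# Hypothetical values: a very small alpha at an interim look,
# followed by three further decisions at .02 each.
hypothetical_alphas = [0.001, 0.02, 0.02, 0.02]
print(decisions_within_bound(hypothetical_alphas))
print(round(familywise_error_independent(0.05, 6), 3))
```

With these made-up values, only the first three decisions fit under the .05 bound, and six independent tests each run at .05 would carry roughly a one-in-four chance of at least one false positive.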
I think this idea of saying you cross the boundary and then you no longer pay a penalty, well, say you cross the boundary and you find later on that your mortality for the full study isn't significant. Do you still believe it is significant because you crossed the boundary earlier? Do you start running into making decisions later on that you will change your mind or you will do different things, depending on what those later analyses produce, and you have to have some kind of way of guiding yourself in terms of alpha -- I don't like this notion of alpha spending, but what do you believe about it as you start looking at the data in a further fashion? I think that -- and this is a good example -- you have marginal significance with the 2-year mortality. I mean, why isn't it .001 so there was no confusion? It is hovering around. You fuss with one analysis and it crosses over the significance. Another analysis and it becomes slightly better for you. There is not a very comfortable feeling about that, and these multiple looks at the data really can't be just dismissed because you have protected yourself earlier. So, I think that we really are in a situation that we can't say you crossed the boundary, therefore you forget about the alpha. I don't think that really is the case here. And I do think that this question that you raise -- and I was trying to say somewhat the same. You come into this study with certain notions that you want to look at survival, and you've seen something that looks like a 2-year survival. Will you see it again? I am not sure you will see it again. I am not sure I am convinced by the data I have seen here. And I realize that survival is the major thing, but you still carry with you six primary outcomes and what are you going to do with those? Are you just going to ignore them? Those are all sort of look-see. I am not going to think about any significance on them?
You certainly are, and once you start playing that game, I think you have to say, how am I going to use my alpha, how am I going to be able to sit back and say, I really believe what I have. And I think that we are left in a situation where we see the survival but I would like to see another test of it. DR. MASSIE: Jeff, do you have more? DR. BORER: No. DR. MASSIE: JoAnn, go ahead. DR. LINDENFELD: Just in this same vein, I wonder when we use a second analysis to assess mortality, when we know that the first analysis has been borderline, does the second type of analysis need to be stricter than ordinary criteria, once we know that the initial analysis was of borderline significance? DR. D'AGOSTINO: Are you asking me that? DR. LINDENFELD: Yes. DR. D'AGOSTINO: I'm not sure I know the question. Are you saying if they put a second study together? DR. LINDENFELD: I'm sorry. In the initial study, once you know the initial method of statistical analysis was of borderline significance, and we go to a second, already knowing that the first was borderline. DR. MASSIE: This is the log-rank versus the Cox? DR. LINDENFELD: Versus the Cox, right. DR. D'AGOSTINO: No. This is the notion, I think, that was raised in the question, are you looking for the test that is going to do the best for you? DR. LINDENFELD: Exactly. Shouldn't the second be perhaps stricter than if it was primary -- DR. D'AGOSTINO: I think so and I think that there is real justification for that. Again, I don't see anything wrong with, say, doing an analysis that makes no adjustment for baseline variables, seeing where that goes, and then doing a sharper analysis that includes covariates to get rid of some of the variability. I am not sure it's imbalances that you need to correct, but you want to reduce some of the variability.
But as you progress through that, if it is stated in the protocol that the real analysis that you are going to put your final weight on is the Cox regression that does all the covariates, then I think you can wait to see what that produces. But if your protocol says I am going to look at the log-rank and maybe look at the Cox, or it is unclear what you are going to do with the Cox, and then you really move to the Cox with the hope that it is going to give you some significance because the log-rank didn't, I think you are in a situation where you are beginning to doubt how much certainty you can get from the study. DR. MASSIE: I think the committee has been restive and also very cooperative in not interrupting. We were scheduled for a break, but I would like to try to make a pass-through here and continue the discussion, starting down there. Udho? DR. THADANI: I have a couple of comments and a couple of questions. I want to reiterate, the study of V-HeFT I and V-HeFT II was in class 2 and 3 failures, and only in males, so the application of what we say is only to those groups. There is no data on top of ACE inhibitors from either of the trials. Jay mentioned that the study started in 1980 and since we did not know the moving target, it should not be penalized. We learn with experience and the committee has to make a decision on what we know now, not what was known before. Life is a penalty. As we get older, we are going to die sooner. Now, Jay mentioned that we don't know why nitrates are used without hydralazine, and I think about 30 percent of the patients in CHF have coronary artery disease, in some studies 40 percent. The reason nitrates alone are used is really for 20 percent despite -- especially in class 2 and 3 failure, the angina. So, we are using nitrates to treat angina, not necessarily heart failure symptoms. It is difficult sometimes, when they are getting exercise-induced dyspnea, to distinguish how much is heart failure versus how much is angina.
So, I don't think I have much problem why we are not using hydralazine all the time in those patients. That is a comment. Jay, you mentioned in your very earlier statement that efficacy perhaps is not related to hemodynamics, and yet you turn around and say, well, hydralazine probably prevents tolerance and probably is doing that, so it doesn't jibe. Now, one of the questions on tolerance with nitrates is a moving target. We still don't know what exactly produces it. That is why the sulfhydryl hypothesis, the neurohormones, now the oxygen radical, and I could say there are receptors. So, I really don't know what to call this. Now, you are alluding to Dr. Elkayam's study that hydralazine prevented tolerance perhaps rebounding. He is in the audience. I am going to ask him again. I have asked him several times before. There is no hydralazine group in that study and the study was only 24 hours. So I have no clue whatsoever what it will do if you do it at 1 week or 2 weeks. Perhaps it might delay the tolerance. By 2 weeks there is no efficacy whatsoever. So, I am not willing to buy that as data showing that -- postulation, yes, but not a convincing proof that that is how the combination is working. So, I want a comment on that before I say something else. Perhaps you or Uri, who is in the audience, want to allude to it. DR. COHN: Let me see if I can try to express the relationship between hemodynamics and long-term benefit because, yes, I did make the point that the hemodynamic effects that may well influence symptoms in the short term are not necessarily indicative of a long-term benefit. Yes, it is true that nitrates on hemodynamics apparently produce tolerance. Partial tolerance, we would say, because we have done some studies on nitrate therapy chronically and are able to demonstrate that, given with a drug-free interval at night, of 10 or 12 hours at night, that nitrate effect persists.
It is true that the benefit that Uri and Leung and others have shown with nitrate co-administration is on hemodynamics and is short-term. So, it provides us a potential mechanism, but it certainly provides us no proof that the long-term benefit is enhanced by the combination and would not also occur with the individual drug. I wouldn't suggest to you at all that these data can be directly extrapolated to the long-term effects of nitrates. Now in our view, the long-term effects of nitrates on left ventricular remodeling are non-hemodynamic. We have data in an animal preparation in the canine that when you administer nitrates for 3 months, you prevent the remodeling of the left ventricle without a demonstrable hemodynamic effect. So, my view is that this mechanism of action of nitrates is through nitric oxide effect either on the interstitium and the collagen, or on the myocyte directly to alter the remodeling process. I think there is growing support for the idea that this remodeling is a non-hemodynamic phenomenon. I have no idea whether tolerance develops at all to that effect. We have no data that hydralazine would influence that effect whatsoever. So, as with most drugs -- and I think this applies to practically every drug that has been approved by the FDA and that we use daily in practice -- we do not know how they work. Much as we would love to know, we are always grasping at straws to try to find out how drugs such as ACE inhibitors work, or why do beta-blockers lower blood pressure. I have no idea, but they do. So, we have to separate mechanism from efficacy to some extent. We are dealing here with potential mechanisms, but clearly not proof of the mechanism of long-term efficacy. DR. THADANI: Also, I think I would like to point out that the ISDN regimen is used four times a day, which in angina is really no better than placebo in several studies. So, it is a very different regimen.
So, I would like to suggest let's keep tolerance out of the discussion because you are talking about survival. A combination may or may not be relevant. Now, point two I want to make is that in the V-HeFT II, Enalapril was definitely superior to combination therapy with ISDN hydralazine. There is no way around that, so that is a fact. So, my judgment would be that even -- which I use sometimes in patients who do not tolerate the ACE, I would hate to suggest that we should put a broad labeling that patients should be -- this is an alternative treatment. I think you have got a study in which you did a study and Enalapril came out as good or better. So, it's only in patients who are not able to tolerate ACE, not that they are not taking ACE because a lot of physicians do not prescribe ACE, or patients who have renal dysfunction, that might be the way to go. So, I think one has to be concerned when you are looking at the labeling issue. The question was raised in the V-HeFT I. You adjusted, depending on the headache, separately ISDN lower dose without leaving the hydralazine. How are you going to do that with a combination? Do you have any ideas, or what do you suggest we should be doing? DR. MASSIE: Just respond to that last question about the adjustment. DR. COHN: The combination has two different dosage forms which provide different relative amounts of ISDN and hydralazine. So, there is some flexibility available by altering the tablet. DR. MASSIE: John, do you have a question? DR. DiMARCO: I am still a little bit uncomfortable about recommending a combination drug when we do not really have data about either of the agents. Am I understanding you correctly, that you think that nitrates, appropriately administered with a drug-free interval, might produce the same effect you saw with the combination? DR. COHN: There is no data whatsoever on mortality or left ventricular remodeling.
All I am saying is that one can maintain the hemodynamic effect of nitrates if you administer them chronically in this patient population with heart failure, measuring hemodynamics. But that has nothing to do with the long-term efficacy that we have demonstrated in this trial. So, the answer is, we don't know. DR. DiMARCO: When you were planning the trial, why didn't you then have a nitrate control group? Or a nitrate group? DR. COHN: This trial was done to prove concept. This was vasodilator concept, and we wanted to use the most potent vasodilators we had available at that time to prove concept. You would never in 1997 design a mortality trial with 640 patients and three treatment arms. Come on. I mean, we are dealing with a different era. We were breaking new ground. We didn't know what the mortality was going to be. We hit actually the placebo mortality right on. That was really our predicted mortality, so we were remarkably fortunate in guessing right. But obviously the study is not powered the way one would power it today, and Dr. D'Agostino is indeed right. Don't we wish we had a larger database from which we could then show a p value that no one would argue with? Wouldn't we wish we had two trials that demonstrated efficacy against a placebo arm? The latter is not possible because we can no longer do a placebo arm. The former we have to live with what we have, and I urge you to remember that p values don't tell you much about the magnitude of effect, but they tell you something about the confidence you would have in that effect. The magnitude of this effect is quite large. The confidence is a little lower because it is a small study, and we have to temper our judgment based upon what we have, and not what we wish we had. DR. MASSIE: Before we cross to the left, Bob? DR. 
TEMPLE: It is tricky to watch all the dancing p values, but it seems to me one can summarize it by saying the results are analysis-dependent, and that if you make any kind of correction as suggested by Drs. Moye or D'Agostino, it is going to rise above nominal significance. There could be an interminable debate about what that correction should be for the multiple endpoints, the multiple looks, and the fact that it was a three-group study. For starters, you have got to correct for that. So, in that sort of situation one wouldn't ordinarily say that the study is unusable, but one would then look to the next batch of data one has, and in particular look at V-HeFT II. I am interested in people's views about the novel, to say the least, approach to dealing with a study that actually showed inferiority of the drug in question to another therapy, which is not easy to do if you're an active drug, and relies on an imputed placebo group to conclude that even though it was inferior, it probably still had some effect and would have had some effect in a study of V-HeFT II size. Now, that is a very novel argument, as I am sure Dr. Cohn knows, and given the unequivocally borderline at best values in the first study, what does one make of a confirmatory study that loses to Enalapril and has to beat a putative placebo that wasn't there? How plausible is all that? Because nobody is going to make V-HeFT I overwhelming. There are too many things one can say about it. So, it is crucial to know how to interpret V-HeFT II. DR. COHN: I would like to remind you, Bob, that although we all agree that Enalapril beat H/ISDN in V-HeFT II, it only did it at the 2-year time point, not overall. And it was at the 2-year time point where there is much less argument about whether H/ISDN beat placebo. It is the overall p value that we are worried about. So, you can't have it both ways. If H/ISDN beat placebo at 2 years, then so did Enalapril beat H/ISDN.
If we look at the overall result, neither p value reaches the nominal level of significance, and we have to really look at these two studies then quite similarly. DR. TEMPLE: Jay, even the value at 2 years is not clear-cut. Some of them are, depending on what you do, above .05, some are below .05, but that is completely uncorrected for things that need correcting. At a minimum there is the Prazosin group. You have got to make some correction for that. And I am not one for major corrections for multiple looks here, perhaps, but there were other endpoints and there were multiple analyses, and whatever you think the right correction is, it is borderline. Then at the 2-year time, which was the sort of agreed upon time, it actually lost to another drug. So, you have to believe there is room for it to be somewhat effective but still not as effective as another drug in a study where the overall reduction in mortality was -- I don't know -- like 20 percent. Sorry. You didn't actually measure it. It was likely to be in the neighborhood of 20 or so percent based on other Enalapril data. That is a lot to believe. I just wonder what people think about that. DR. MASSIE: Let's move along to the left and see if any of the comments -- DR. LINDENFELD: A quick question. In terms of remodeling, in V-HeFT I there was echo data. Was there a difference between the hydralazine-isodi group and placebo in terms of end diastolic dimension over time? DR. COHN: No, the data really were in concert. That is, the LVIDD, I think, as we did it -- and we have improved our methodology since V-HeFT I, but the trends were the same. Clearly, a less precise measurement than the MUGAs that we did sequentially, but the directional changes were the same. DR. MOYE: Just to respond to Bob's question, part of which I think was the notion of comparing to -- having an imputed placebo group. I confess that my reaction is overly negative to that because to me it looks very much like a historical control.
There are differences in baseline characteristics between V-HeFT II and V-HeFT I and differences in the choice of medication. I can't say that I have learned anything reliably from that kind of analysis approach. DR. MASSIE: I understand we are going to have a specific presentation relevant to that point by the division after the break. DR. LIPICKY: You have a specific question dealing with that point. It would probably be good to have this discussion at the time that you are answering questions and not when you are trying to clarify the data. DR. MASSIE: Okay. So, Bob, you've prepared us to get ready. DR. WEBER: Jay, I have got a couple of short questions. First of all, you were talking earlier that many clinicians are reluctant to use hydralazine, or simply do not use hydralazine, for treating patients with congestive failure, and that may be partly because, in the minds of many physicians, hydralazine is not always an appropriate drug for patients who have got ischemic heart disease. When you look back at the V-HeFT experiences, what proportion of patients in fact had that failure on the basis of ischemic disease, and was there any information whether the treatment was more effective in those with ischemic disease than those with other etiologies? DR. COHN: No. As I pointed out, about 55 percent of the patients in V-HeFT I had ischemic disease, as opposed to 45 percent non-ischemic. The response was quite similar in the two groups. We actually have a slide. I don't think we want to waste our time showing it. But the reduction in mortality in those with CAD and those without CAD was quite comparable. DR. WEBER: Beyond what you showed us -- and I must confess, I had not really been aware in depth of these data before -- that hydralazine may work to prevent the tolerance to the ISDN, do we have enough information to make a comment describing the clinical pharmacology of the product? 
To me, if we are debating whether or not a combination is appropriate, is there some mechanism that we can reliably put down in writing that would provide encouragement and support for the use of the two drugs as a tandem? DR. COHN: You are talking about blood levels? DR. WEBER: If such were available. DR. COHN: Well, of course there are now a lot of data available because of the bioequivalence issue relating to the BiDil combination and the individual components. If you want to get into that, I could ask some of those that have been directly involved with that to discuss it. DR. LIPICKY: Is that what you want? Blood levels? DR. WEBER: No. I am really looking more for a justification for the mechanism of the two drugs. DR. LIPICKY: How about death? That is what Jay is offering us. If you use them in combination, it saves lives. Isn't that good enough? DR. WEBER: It certainly justifies the treatment. I'm just coming back to -- DR. COHN: Well, the pharmacologic mechanism that really led Joe Franciosa and I to use this combination was that when one gave isosorbide dinitrate by itself, the pulmonary capillary wedge pressure fell, and the cardiac output went up only a little bit. When one gave hydralazine alone, the pulmonary capillary wedge pressure barely changed, but the cardiac output went up a lot. When we gave the two together, we got a greater fall in pulmonary capillary wedge pressure and a greater increase in cardiac output. So, hemodynamically these two drugs are indeed remarkably additive. Now, you have got to put all that aside and say, that was a wonderful idea in 1978 or 1979, but now we understand the mechanism of long-term efficacy in heart failure is not related to that remarkable hemodynamic effect probably and relates to some other action of these two drugs together which favorably affect all these outcome measurements in heart failure. And do we know exactly what they are? I think I know, but I am in a minority and I haven't convinced everybody. DR.
WEBER: One last quick question, and it is one that a couple of other people have already alluded to, but I think it is somewhat troubling. The labeling that was initially proposed was that BiDil would be used as an adjunct to diuretic and dig. You then suggested -- your own phraseology was -- that it would be used as an alternative to an ACE inhibitor. Those are slightly different. I don't know if both can be reconciled into one instruction. But bearing in mind the thought that came up and hasn't really been talked about yet, the possibility of a mortality claim, and if that were the case, would the same labeling be in place, or would it just be a general mortality claim that might infer that the drug could be used in patients already receiving an ACE inhibitor? Obviously, we don't have those data, but clearly we have to try and understand how in the real world physicians would interpret these labels. What are we going to say? DR. COHN: Well, as you know, I am a strong advocate of ACE inhibitors to treat heart failure, and I believe any labeling for this drug combination would have to make it clear that ACE inhibitors produce a more favorable effect on mortality and should be employed as the agent of choice in patients with heart failure, and that this would be an alternative for those patients who, for one reason or another, are not using an ACE inhibitor, and that would usually be because of perceived intolerance to the drugs. The instructions for use, then, need to be provided to physicians. But I think the labeling would have to make it abundantly clear that there is a mortality benefit from an ACE inhibitor as compared to this combination. DR. MASSIE: Cynthia? Ray? DR. LIPICKY: I think that it might be useful to say that the exact labeling is something that may get worked out. The thing that is at stake here is whether it should be approved, so that labeling would have to be written. DR. RODEN: (Inaudible.) DR. LIPICKY: Well, I understand that. 
If you are having trouble deciding whether it has an effect that trials have shown in patients with heart failure, then you are going to have to say it should not be approved. If you are not having trouble in deciding what it does in patients with heart failure, you will be able to tell us how it should be labeled. DR. WEBER: No, I have no trouble deciding that it does something, Ray, I just want to know what it is. (Laughter.) DR. MASSIE: Cynthia? DR. RAEHL: Dr. Cohn, in your experience, would you suggest that the most commonly prescribed of these products would be the two dosages with the 20 milligram ISDN component, based upon what you know was the maximum expected dose in the V-HeFT trials, and what was the actual mean ISDN dose in the range? DR. COHN: Well, I am not sure I could project what dose is going to be used. Physicians tend to use drugs in lower doses than are recommended. It's been our experience with ACE inhibitors that if you tell a physician to use an ACE inhibitor, he or she will use 2.5 milligrams of Enalapril once a day and feel that they have accomplished the therapeutic goal. So, I can't predict which one is going to be used. DR. RAEHL: Well, the reason why I ask -- and perhaps it will come up in the biopharm review, but I would be interested in the prospective comments of the sponsor -- is that assuming that many patients will not tolerate 160 milligrams of ISDN a day, which I think is a good assumption, then the two middle doses of 37.5 and 20, and 75 and 20 could come into play more often. Yet, it appears to me from the biopharm review, we don't have what I would consider the very basic bioequivalency or dissolution data. I must admit, I am quite surprised to see that at this stage of the investigative process. DR. MASSIE: Maybe a focused response on the biopharmacology. Do we have a reasonable 20 milligram component of the nitrates in those pills? DR. ORLANDI: I would probably defer the question to Dr. 
Forrest who actually performed the studies. DR. MASSIE: You have to come up if you are going to comment. DR. RAEHL: I think to expedite this rather simple yes or no question, whether or not some of that basic bioequivalency data is available for those middle dosages. Or maybe someone from the agency can give a quick answer. DR. LIPICKY: Yes, I should be able to but I can't. DR. MASSIE: I think the question is -- and I think this question is triggered by some concerns in the reviews by the agency, so maybe they can comment -- do we expect to get the same effect from this 20 milligram nitrate component of this pill as they have gotten from isosorbide dinitrate 20 milligrams? DR. LIPICKY: Well, maybe I can address that, and if I say something wrong, holler at me. I am talking to the people who know the data, okay? Because I am recalling it. In the bioequivalence studies, where there was plasma concentration versus time, not in vitro dissolution now, but bioequivalence studies, the to-be-marketed formulation that was studied -- and was it one or two doses? VOICE: Two doses. DR. LIPICKY: Two doses? Low and high? Fine. The lowest dosage form and the highest dosage form, which is usually what we ask people to do, were not bioequivalent to anything. They were not bioequivalent to ISDN as it is available on the market, or hydralazine as it is available on the market, or to either of the formulations that were in either V-HeFT I or V-HeFT II. So, now when I have made such a broad, sweeping statement, what does that mean? It means that the usual generic rule did not apply. It was 21 percent, but it covered a whole range of dosing from 20 to 160 milligrams a day. So, it is still titratable, it still covers plasma concentration ranges that probably one would achieve. But in fact, it is not precisely bioequivalent, so it could never be a generic. So, from my vantage point, I don't think that's a big problem. 
Since the instructions for use are titrate to maximum tolerated dose -- it isn't, give this dose and then sit back and wait. It is, if that dose doesn't make people say they are sick, give them some more, and if that next dose doesn't make them sick, give them some more, and then they will have a mortality benefit. So, it is titratable. It is dose-proportional, but in fact it is not a generic product. Does that respond to your question or your concern? DR. RAEHL: I think it does, but it also raises the issue once again that even though instructions for many of these are to titrate to maximum tolerable dose, we know we don't do that. That's medical practice. DR. MASSIE: Okay, we are going to try to take a break in about three or four minutes. Let's just get the last two committee members' questions. DR. KONSTAM: I just had a comment and a question. The comment relates to guidelines because with regard to our construct of the guideline with the Agency for Health Care Policy and Research, we grappled a lot with this issue of making recommendations for non-approved products. I think Barry served on that committee, too. I don't think it was the intention of our panel to usurp the role of the agency. I think that we came to recognize that the criteria that we would use to make recommendations were of a different standard than regulatory standard. They were sort of a clinician's surrogate, perhaps, I might call it, if you will. I think we were concerned about safety and we were concerned that some medium level of evidence -- and there were a variety of different levels of evidence that we could use, but would accept therapies that were clearly short of a regulatory standard. 
Now, having said that, if you are interested in getting people to follow a guideline, I agree with what Jay said, that the current preparations of these agents I think represent really very real practical obstacles to following those recommendations that would be overcome by the type of preparation that's under proposal, that is, if it can be found to meet the regulatory standard. So just to say that. The question I had relates to stopping the trial, or not stopping the trial. I just wonder, why didn't you stop the trial after the fourth look? You're saying at the end that you felt that it was unethical to do another placebo trial because you were convinced of the effect, but why weren't you convinced enough of that after the fourth look? DR. COHN: Well, that is actually a very difficult question and I have grappled with that recently in the light of new concepts that should we have stopped the trial. I think Joe provided a list of the various reasons why the committee in the documents at that time listed why they were not stopping the trial, and obviously it was to learn more about the therapies, learn more about heart failure, learn more about the potential predictors of a favorable response, and to see whether the effect was durable, that is, did it last longer or was this going to be short-term. I think it would have been an appropriate strategy at that point actually, in retrospect, to have stopped randomization for ethical reasons and to continue follow-up, which is an intermediate strategy that has been used in some trials to date. But at that time in 1983, we had no experience in this syndrome, and I was a strong advocate for continuing the trial because it was an important study. I didn't think that the data at that point would be persuasive enough to let everyone in the world be treated with a combination, and therefore we needed to augment our database to learn more about the disease. And we did. 
But I think you are raising a very interesting ethical dilemma that the committee probably didn't spend as much time agonizing over as they should have. DR. MASSIE: Dan? DR. RODEN: I will take the opportunity to make one comment and then ask a couple of questions. The comment echoes what Bob said, and that is, I am troubled by being asked to approve a compound or a preparation for which the indication is not very clear to me, and which is demonstrably inferior to currently available therapies, although I understand why we're here. So my questions. I have two. One is the question of not mortality but of symptom relief. You showed us some data showing improvement in ejection fraction, improvement in VO2, and I wonder if you can make some comment as to the sort of clinical significance of those as opposed to the statistical significance of those. The changes seem to me pretty small. So, that was one question. Then the other question has to do with numbers of patients that were excluded from V-HeFT I on the way in. This was touched on earlier with respect to the question of unstable angina, for example, and the reluctance to use hydralazine. I can read the exclusion criteria but it doesn't really say how many patients were actually excluded from the trials on the way in for those kinds of reasons. DR. COHN: Yes. I think there were 3,400 patients screened and 640 entered. So, it is about a 5 to 1 ratio of screened to entered, if that's what you mean. The reasons for exclusion were down that whole list, of course, and some of them were inability to perform an exercise test or chest pain during an exercise test that excluded them. So, this is a selected population of patients in whom heart failure is causing their symptoms, not some other abnormality of their circulatory system. I think the issue about what the endpoint is is a very important one. 
From my last or next-to-last slide, I look forward to the day when we will target therapies for specific mechanisms of disease or surrogates for disease, rather than for the disease in general because we throw people into convenient wastebaskets and call this a disease. The labeling for this drug has been proposed to you for heart failure, and we know that heart failure is a heterogeneous syndrome with many different endpoints. DR. MASSIE: Jay, can you make it specific? Ejection fraction, exercise tolerance? DR. COHN: Yes. Now, you asked about peak VO2. There is no way of defining clinically whether a statistical improvement in peak VO2 is really an endpoint that is important or only statistical. As I showed you in V-HeFT II, H/ISDN would have been approved on the basis of exercise performance, statistically significant differences at 3 and 6 months, had this been a 6-month trial. DR. RODEN: You showed us VO2 data. How about exercise time? DR. COHN: Well, VO2 was the primary endpoint defined in the protocol. We have exercise time which follows the same path, but it was VO2 that was really chosen to be the exercise. Now, what we've learned, of course, and what was obviously intuitively apparent to everybody before was that when you do a long-term trial and look at non-mortality endpoints, you get a biased population progressively as the mortality difference enlarges. So, data after 6 months is contaminated by the fact that people are dying and dropping out, and yet you are only studying the exercise in the survivors. And you can put a substitute in, but we didn't do that. We have done that. When you substitute for mortality, and give a low value of exercise to those people who died, which has been done in some trials, you obviously show a benefit of H/ISDN because the people are not dying. And you get a very low value for the placebo group because they have died. So, that to me is not any longer looking at exercise. 
That is looking at kind of an overall phenomenon which I find contaminated. But that is another endpoint and it could have been used. DR. MASSIE: Just one quick question on my part. It is interesting, if we look at resting VO2, it sometimes goes up during therapies, particularly sympathetically activating therapy. These differences in peak VO2 were small enough that they potentially could be explained by a difference in the baseline pre-exercise VO2. Did you do any statistics or analyses on the VO2 before exercise? DR. COHN: I can't recall that we did. I think there was no significant change, but I can't recall if we really analyzed that, Barry. DR. MASSIE: I think it is an important issue because although you said directionally the exercise time went in a different direction -- I mean, went in the same direction, the statistics weren't there, and it was, as far as I remember, not even really close to significant at any point in time. DR. COHN: Well, the trouble with an exercise time measure is you have got flat workloads. You can increase by 2 minutes and have the same workload, so that creates -- DR. MASSIE: Peak VO2 should be better. I just had that concern because certainly with some of the positive inotropes, I have seen an increase in resting VO2 that is equal to the magnitude of the increase in peak VO2 that we have seen there. DR. THADANI: One second before you finish that up. Looking at the graphs and the table on VO2 and exercise, VO2 again is a moving target, only showing benefit at 12 months, not other time points, and there is no significance at all on total exercise duration at any time points. DR. MASSIE: Correct. DR. THADANI: So, I think what is important. Before we jump onto whether that is definitely beneficial, I have not seen any benefit. DR. MASSIE: Let's take a 15-minute break, and when we come back, it will be time for people from the agency to make any comments and then we'll go on to the questions. (Recess.) DR. 
MASSIE: Before we move on to the questions, I do not know if there are any specific comments from the division reviewers that we haven't covered. If there are, this would be a good time to raise any comments and questions. DR. LIPICKY: No, there are not. DR. MASSIE: There are not. Okay. So we thought of everything. Well, I think, then, we should move on to the questions. Maybe as we go through the preamble the rest of the committee will find their place. I am not going to read the whole preamble which basically outlines what we are being asked to consider, the approval of BiDil, and some concerns related to multiplicity, bioequivalence, which I guess we have covered, and tolerance, which we have at least discussed. The first set of questions starts with V-HeFT I, and I will read that in full because I think this is important. Factors that might affect interpretation of the mortality results include the following. There were four interim analyses conducted by O'Brien/Fleming rules. The protocol outlined three possible comparisons in the primary analysis, using the log-rank test. These comparisons included each active treatment arm to each other, the combined vasodilator arms to placebo, each active arm to placebo. Each of these analyses was performed at least once during the course of the study. There were two other analyses: one, a protocol specified Cox regression intended to identify covariates that were important and, two, a retrospective Cox regression analysis for the placebo versus isosorbide dinitrate comparison, using baseline covariates specified by the division. The Cox regression analysis required the somewhat arbitrary imputation of missing baseline values. Mortality was specified to be evaluated as either total mortality over the duration of the study, or as the 2-year mortality, or at the 2-year point. The published description of the study in the NDA submission reported nominal p values. 
In interpreting the p values for mortality analyses in V-HeFT I, by what factor, if any, should the nominal p value be inflated for? Now, some of these points we have touched on but I think we will revisit them briefly here. My plan is to ask our statistical primary reviewer and consultant to comment first on the statistically related questions, and our clinical reviewer to comment first on the clinical, if I can make that distinction. It is not always so easy. So, I think this first one we will start from the statistical point of view. Lem, multiple endpoints? DR. MOYE: I think I can answer questions 1.1 through 1.5 very succinctly. First, I would say that the primary responsibility for setting corrections for endpoints in interim analysis reside with the investigators. I think I mentioned that before. I will just say briefly now. If they don't do it, the responsibility devolves on us. The spending function that I used allocates alpha as the investigators said they would for the primary endpoint. Now, the primary endpoint alpha here is either .09 for a two-tailed test or about .046, I think, for the one-tailed test. That already exceeds the type 1 error for the primary endpoint. Any adjustments that I make to look at additional endpoints, be they primary or secondary endpoints, or times of treatments are going to increase the p values well above an acceptable level. So, I would say that the information that we get from the secondary endpoints here really is not contributory to making this study positive. Now, you do learn a great deal from these analyses and from these evaluations, but it really doesn't add in an important way to making the study positive. DR. MASSIE: Let me just clarify that. In making that statement you are not paying attention to the Cox proportional hazards model of what was stated to be the primary endpoint. Is that correct? DR. MOYE: Right. I was going to hold off on that until we got to question 3. Let me say briefly about that. 
My read of the protocol is that the analysis plan was to be the log-rank, and to be in a position now where the log-rank says one thing and the Cox regression analysis says something else is pretty intolerable. When I look to get out of that, I look first to see what the investigators said they were going to do. My read of the protocol is that the investigators planned to put most of their weight prospectively on the log-rank. Any other view of that, I think, runs very close to being a post hoc view because we now know what the data show and we now are in a position of having to choose our statistical analysis in view of the data we have before us. The clearest and cleanest answer, I think, is to go with the log-rank. DR. MASSIE: And then let me, just being the devil's advocate, or I don't know what advocate, say, what about the fact the 2-year endpoint log-rank, I think, came much closer than the .09 to achieving statistical significance, at least nominally? Wasn't that the case? What was the log-rank p value? DR. ORLANDI: It was .056. DR. MASSIE: And that is for the two-tailed? DR. COHN: The one-tailed 2-year log-rank was what? We will get that. DR. MASSIE: I guess of the hierarchy of endpoints, mortality was clearly meant to be the first one. My reading, by the way, is that overall mortality was really clear -- DR. COHN: The 2-year one-sided p-value log-rank, which was defined in the original proposal, was .0279. There it is: 2-year mortality log-rank .0279. With the Cox model it was .0168. DR. MASSIE: So again, I guess by Lem's criteria that wouldn't affect your answer to these questions either? DR. MOYE: That's correct. DR. LIPICKY: I am not sure that I understood Lem's answer. May I ask you to clarify it so I understand? DR. MOYE: To which part? DR. LIPICKY: Well, to any part. (Laughter.) DR. MOYE: Okay. DR. LIPICKY: Let's take the log-rank of .0279. That is the nominal value. Right? DR. 
MOYE: I guess I would say arguably, because my view of the protocol was that it was overall mortality and not 2-year. DR. LIPICKY: That's fine. So, that number needs to be adjusted somehow, or does it? That is what 1.1 through 1.5 are asking. That is, if you looked at that number, that looks to me like it is significant, but we know that all of these other things were done in addition to doing that. How should that influence how I look at that number? DR. MOYE: My view, Ray, is that this number is not significant. DR. LIPICKY: Why is that? DR. MOYE: Because the one-sided p value threshold was .025. That's my read. So, this is not significant. They have exceeded their alpha, and so any other decisions I make on any other endpoints are going to further inflate alpha and therefore -- DR. LIPICKY: Fine. Then could I pick another analysis where you would not be able to say that? DR. MOYE: You can do anything you like, Ray. (Laughter.) DR. LIPICKY: So, let's forget that the log-rank did meet their prespecified endpoint, and let's pick the Cox model of .0168. So, they didn't prespecify what they had to do. Right? DR. MOYE: Okay. So, now -- DR. LIPICKY: So, let's assume that that was the only test that was going to be done. DR. MOYE: Hypothetically they said the Cox model was going to be it and they hit at .0177. Okay? DR. LIPICKY: Yes. That's fine. We'll pick overall mortality, .0177. And that was the prespecified one to do, but all these other analyses were done also. DR. MOYE: Yes. DR. LIPICKY: How should that .0177 be viewed? That's what these questions were meant to get at, okay, because that looks like a pretty significant number to me. DR. MOYE: Right, and the adjustment I would make in the analysis path, going through these endpoints, would be as follows. I would leave the .0177 untouched. I wouldn't snatch defeat from the jaws of victory for the investigation. I would leave the .0177. That does leave me some alpha left to spend on secondary and tertiary -- DR. 
LIPICKY: But I'm not even interested in secondary and tertiary. I guess what this question was oriented towards asking was whether any of these p values for the primary endpoint can be taken at their face value. Am I making sense? Because there were interim analyses, because there was more than one primary endpoint, because there was more than one analysis. DR. MOYE: Okay, I understand. DR. LIPICKY: So, that is what this question was meant to get at, not what your bottom line is. DR. MOYE: Let me handle the interim analyses question first. Decisions are made to continue the trial at prespecified time periods during the trial's duration. That's spending alpha. The way these interim monitoring rules are constructed is that the alpha that you spend is very, very minute. So, in the end, I don't have .025 to spend. Maybe I have .023 to spend. So, from that point of view, the .0177 I think still stands. DR. LIPICKY: It might be .0179 now, though. DR. MOYE: Right. DR. LIPICKY: That's the sort of look that I would like you to give it. DR. MOYE: Now, the issue of having multiple primary endpoints is a little problematic because what is required is a prospective statement by the investigators on how they are going to allocate their .025. Now, if they are planning to allocate .025 to each of them, using kind of a Bonferroni approximation of the overall alpha, the overall alpha exceeds the -- or is at the .05 level, which is unacceptable for a one-sided test. So, the investigators would need to say at what level they are going to accept each of the tests. DR. LIPICKY: But they didn't. DR. MOYE: But they did not. DR. LIPICKY: So, you've got to set it. DR. MOYE: I beg pardon? DR. LIPICKY: So, you have to set it. DR. MOYE: That's right. And so what I would do would be to say, if I have .025 to spend on either of them, that would be for me the probability of making at least one mistake, say a mistake on primary endpoint 1, a mistake on primary endpoint 2, or a mistake on both. 
I would also say that I want to apportion alpha equally between the two because I do not have any reason to spend it more on one than the other. DR. COHN: They are very interrelated endpoints. Aren't they? DR. MOYE: Well, we're going to get there. Next point is I would also assume that they are independent. Now, that is very conservative and I am doing it for two reasons. Number one, because the investigators let me down and didn't do it up front. And number two, because I don't know exactly what the nature of the dependency is. It is easy to argue about the dependency but I would have to specify exactly what the nature of the dependency is. DR. FISHER: Can I ask a technical question? DR. MOYE: Yes, sir. DR. FISHER: If you perform a randomization test and take the minimum p value, you are absolutely protected by all of our rules and it takes into account the unknown correlational structure. In other words, you let the data tell you, but because you are doing the randomization test, it strikes me when you have highly correlated endpoints, that makes much more sense to most statisticians, I would say, than to assume the worst case, or do a Bonferroni or whatever. DR. MOYE: Well, I guess I am just a little uncomfortable, and maybe most statisticians are not, but I am just a little uncomfortable with the randomization test. I am much more comfortable with -- I am most comfortable, if I can say this, with the investigators saying exactly what they are going to do. Once they have not done that, then the floodgates open and we can all bring our different alpha spending functions to bear. If the question is how I would approach this, I would approach it as I said, and make conservative assumptions because, again, I want to be sure that I am preserving the type 1 error at a minimum bound, considering the issues that have come up today. DR. MASSIE: Let me interrupt. We have a lot of questions and I don't want to get into a technical statistical thing. 
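Dr. Fisher's suggestion above, a randomization test on the minimum p value across correlated endpoints, can be sketched as follows. This is only a minimal illustration under stated assumptions: the data are synthetic, the function names are hypothetical, and a one-sided difference in means stands in for the log-rank and Cox analyses actually used in V-HeFT I. The point it shows is that re-shuffling treatment labels keeps each patient's endpoints together, so the correlation between endpoints is carried into the reference distribution instead of being assumed away.

```python
import random


def endpoint_stats(rows, labels):
    """Difference in means (treated minus control) for each endpoint."""
    out = []
    for j in range(len(rows[0])):
        treated = [r[j] for r, g in zip(rows, labels) if g == 1]
        control = [r[j] for r, g in zip(rows, labels) if g == 0]
        out.append(sum(treated) / len(treated) - sum(control) / len(control))
    return out


def min_p_randomization_test(rows, labels, n_perm=500, seed=0):
    """Return (smallest unadjusted p, min-p adjusted p), both one-sided."""
    rng = random.Random(seed)
    observed = endpoint_stats(rows, labels)
    shuffled = list(labels)
    all_stats = [observed]  # the observed labelling is part of the reference set
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # re-randomize labels; each patient's endpoints stay paired
        all_stats.append(endpoint_stats(rows, shuffled))

    def p_values(s):
        # One-sided p value of each endpoint statistic within the reference set.
        return [
            sum(1 for t in all_stats if t[j] >= s[j]) / len(all_stats)
            for j in range(len(s))
        ]

    obs_min_p = min(p_values(observed))
    # Adjusted p: how often a labelling yields a minimum p at least this small.
    hits = sum(1 for s in all_stats if min(p_values(s)) <= obs_min_p)
    return obs_min_p, hits / len(all_stats)


# Synthetic example: two endpoints that share a common per-patient component.
data_rng = random.Random(1)
rows, labels = [], []
for i in range(40):
    group = i % 2
    shared = data_rng.gauss(0, 1)
    rows.append((shared + 0.5 * group + data_rng.gauss(0, 0.5),
                 shared + 0.5 * group + data_rng.gauss(0, 0.5)))
    labels.append(group)

raw_p, adjusted_p = min_p_randomization_test(rows, labels)
print("smallest unadjusted p:", raw_p, "min-p adjusted p:", adjusted_p)
```

The adjusted value is never smaller than the smallest unadjusted p value, and the more strongly the endpoints are correlated, the smaller the penalty tends to be relative to a Bonferroni-style split.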
I think what we're doing now is instructive, Ray, making Lem put his money where his mouth is, so to speak. DR. MOYE: Put my alpha where my mouth is. DR. MASSIE: And we should go through that quickly, and we have a second statistician we are going to bring up the same opinions on. Let's stay away from philosophy and just hit the questions. DR. MOYE: So, Ray, I would work through the very small amount of algebra and I would come up with a number. DR. LIPICKY: Yes, well, can you give a guess as to what that would be? DR. MOYE: Sure. DR. LIPICKY: I guess what I am looking for is some limits here because what you are saying, as I understand it, is that because of whatever was done in the beginning, you are being given the latitude to make the rule anything you want to make it. DR. MOYE: Yes. DR. LIPICKY: So what I am looking for is what that limit would be, and how .0177 fits into that. DR. MOYE: Right. For two primary endpoints, let's say the overall alpha after the interim monitoring was .024. Then I would come up with about .013. DR. LIPICKY: And that is just for two -- DR. MOYE: For two primaries. DR. LIPICKY: For two primaries, and there were six. DR. MOYE: If there were six, it would be -- DR. MASSIE: Come on, Ray. We know that they identified a single primary endpoint. DR. LIPICKY: No. DR. MASSIE: There were not six endpoints I think from the way the protocol was written. DR. COHN: The protocol was written that mortality is the major endpoint of the trial, and it links 2-year and overall, which are closely correlated. So, you don't spend all your alpha -- DR. LIPICKY: Well, okay, let's be sure we understand exactly what's being said here. There is no question that the trial was sized on the basis of mortality. DR. COHN: And the statement is in the statistical analysis that the primary endpoint is mortality. DR. LIPICKY: Right, but elsewhere in the protocol it says we have these primary outcomes. DR. COHN: It was probably not written properly. 
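The arithmetic behind the per-endpoint threshold discussed above can be sketched as follows. This is a minimal illustration, assuming the conditions Dr. Moye describes: endpoints treated as independent, alpha apportioned equally, and an overall one-sided alpha of .024 remaining after interim monitoring. The function names are hypothetical and do not come from the protocol. Under these assumptions the two-endpoint threshold comes out near .012, close to but not identical with the "about .013" quoted in the discussion; the six-endpoint case raised by Dr. Lipicky is computed for comparison.

```python
def sidak_per_test_alpha(overall_alpha, n_tests):
    """Per-test alpha so that P(at least one false positive) equals
    overall_alpha across n_tests independent tests."""
    return 1.0 - (1.0 - overall_alpha) ** (1.0 / n_tests)


def bonferroni_per_test_alpha(overall_alpha, n_tests):
    """Simpler, slightly more conservative equal split of alpha."""
    return overall_alpha / n_tests


overall_alpha = 0.024   # assumed alpha left after interim monitoring
observed_p = 0.0177     # hypothetical Cox p value used in the discussion

for n_endpoints in (1, 2, 6):
    threshold = sidak_per_test_alpha(overall_alpha, n_endpoints)
    verdict = "below" if observed_p < threshold else "above"
    print(f"{n_endpoints} primary endpoint(s): threshold {threshold:.4f}, "
          f"Bonferroni {bonferroni_per_test_alpha(overall_alpha, n_endpoints):.4f}, "
          f"observed p is {verdict} the threshold")
```

With a single primary endpoint the .0177 falls under the threshold; once alpha is split across two or more endpoints on these conservative assumptions, it no longer does.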
They were all listed as primary endpoints but the primary endpoint was really the mortality. The others should have been listed as secondary, and they weren't. DR. LIPICKY: Fine. DR. COHN: I plead guilty. DR. LIPICKY: Okay, so that's for that. And then for the multiple statistical methods, do you need to make some other correction because the problem is we have all these analyses to look at and some of them look pretty good. DR. MOYE: Well, the correction I make really is one that frankly ignores the other analyses. I go exclusively by what the investigators said they were going to do. I rely on their diligence and their efforts in putting together an acceptable protocol, and that means -- DR. LIPICKY: Right, but I said that it was okay for them to do a Cox analysis. DR. MOYE: Yes. I don't know why. (Laughter.) DR. LIPICKY: And so since I said it was okay, they did it and now I have another p value. What am I going to do with that? DR. MOYE: My advice to you, sir, is to ignore it. DR. MASSIE: Let's just say that they made the point and said, they are both very important and if either one comes up positive, we are going to call it a positive trial. Let's say they have written it that way because some might read it that way. DR. MOYE: Okay. Well, if they had written it that way, then I would have them apportion alpha for each of those tests that they did. Here I think I guess I would be a little more amenable to the point that Jay brought up about there being correlation here. I am not sure I know exactly -- DR. LIPICKY: Okay. I think I understand what you are saying and I have it in perspective with respect to these two numbers and that is fine now, as far as I am concerned. If Dr. D'Agostino might comment a little bit. DR. MASSIE: Let's take Lem off the hot seat and have Ralph comment on those five sort of statistical -- DR. 
D'AGOSTINO: When I was dean of the graduate school at BU, the faculty would come in or the chairman would come in and you would expect from the sciences this rigorous evaluation of their faculty for salary increases, and you would expect maybe the people from humanities would be a little bit more loose about it. Well, what used to happen is that the humanities would come in with these scales on how they rated their faculty and the mathematics used to come in and say, this guy is a good guy and so forth. I am a statistician, but I am going to just say this is easy to handle because if I look at the best world that they produce in terms of their hypotheses testing, and you want to do lots of different tests and what have you, none of them are smashing. The levels of significance are not marginal. They are significant by our rules if this was the only thing they are looking at, but they have five or six endpoints to run around with. They have three groups to deal with. Multiply it any way you please. It really is that once you start saying that there are multiple activities going on here, all of these are levels of significance that they have quickly run you into seven, eight p's of .07 or .9 and so forth, or .09, .10. They quickly get you beyond what we conventionally think. I think that this is a nice study but there is so much going on that nothing comes out that clear that can really stand up to any kind of adjustments that you want to make. I am willing to let them do the Cox regression. I am still not impressed by the final answer they get. DR. MASSIE: Is that good enough for you, or do you want to get specific answers? DR. LIPICKY: That discussion actually answers 2 and 3 also. So, you just got through three questions. DR. D'AGOSTINO: Thank you. DR. MASSIE: Maybe I should thank you too, but I think before we jump to 4, if anybody has some burning comments on the committee, let's hear that. DR. 
TEMPLE: You may well have answered all the questions, but the way this was formed was to see if you could put more specific numbers on at least some components of that, and maybe that's asking a question that's unreasonable and you have to resort to the Gestalt. But, for example, having three groups is fairly specific, three groups with a common placebo. You are going to hear another thing later in this meeting where someone said, well, my critical p value is therefore .035 because I have three groups and the common placebo. So, it is not quite Bonferroni because there is a shared placebo, so they are not independent. That is one component of this, surely, that's easy to put a number on. So I guess my thinking was, whatever you say the value here is, you have got to inflate it by 50 percent or -- DR. D'AGOSTINO: Well, I am trying to say the same thing. Because you have the three groups, there is this sort of multiplication right there. Because you have a number of endpoints, there is a multiplication there. There is no way that I can take this reading that they aren't declaring these as primary endpoints. I know the primary primary is mortality, but the other ones are still called primary and I don't know what that means if it doesn't mean that they are the major variables. DR. TEMPLE: Of course, some of those sort of lean, and I have never understood how, if you have six endpoints and some of them are sort of marginal, you adjust your correction for that. That's for another two-week workshop. DR. MASSIE: Udho. DR. THADANI: Barry, I think one of the things which disturbs me, if we stick to the protocol, that it did not show a significant log-rank and then you could show how we can do different statistics to prove the point. I think here we are making a decision on, as has been alluded by the two statisticians, that log-rank was negative and you could bring a lot of tests. I think we have to live with what the data is, and so one has concerns. DR. 
MASSIE: Yes, I just think, because obviously there are some implications of skipping immediately to question 4 from this, that if you put it in the best light, the protocol did specify two different tests up front. They did give some alpha, specifically for in fact the multiple comparisons, and they did talk about both 2-year and total mortality. My own reaction is, you have to obviously adjust for the groups and if you are willing to let 2-year mortality rise to the level of total mortality -- and it seemed to me there really was a differential emphasis in the protocol -- then you really would have to correct in some way for that and nobody has told us exactly how because I don't think they are totally independent endpoints. You would expect that they would track to some extent. But we have heard our two statisticians tell us that whether they look at it in a quantitative way or in a qualitative way that they are not blown away by the certainty of this result. DR. D'AGOSTINO: Another point to mention here, which I think has been mentioned a couple of times, if there is no guidance in the protocol on how one divides this alpha up, then you can't take the attitude of using the sharpest way of doing it and taking into the correlation and so forth because then you are playing the same game the protocol has played. There is nothing there. If you are going to get to these numbers, you have to take a sort of conservative view. There are three groups, so there's that multiplication. There are six endpoints and there's that multiplication. I don't see how you can get out of that. DR. MASSIE: Okay. Are we ready to move on to question 4? Was there a statistically significant effect found in V-HeFT I for mortality in the entire study, first, and I guess in terms of discussion, 2-year mortality? We might have comments on both of those. Let me ask JoAnn her thoughts, having heard the statisticians' input. DR. 
LINDENFELD: Well, I think we have heard all of those and it is still very borderline in the best case, that mortality at 2 years or in the entire study period is significant. DR. MASSIE: Any other comments from the committee? DR. THADANI: Are you asking whether you are convinced or not? Is that what you're saying? DR. MASSIE: Yes. We are being asked whether there was a statistically significant effect, having considered what the level of significance might need to be for all of the questions raised in 1, and obviously one would have to, I guess, be persuaded by the Cox more than the log-rank if we were going to say yes. DR. THADANI: We are already implying that we are going to rely more on log-rank, and the answer has to be no, I am not convinced it has shown a benefit. DR. LINDENFELD: And this was discussed in earlier meetings and it was suggested that some confirmatory data would be needed to support this. I guess we will go on and discuss that in a minute. Because of the borderline significance. DR. MASSIE: You mean earlier FDA discussions with the sponsor? DR. LINDENFELD: Right, in earlier FDA discussions. DR. MASSIE: Any comments from the left? It sounds like this is something we need to vote on. DR. LIPICKY: Yes, please. DR. MASSIE: Okay. Let's start down at the left-hand end of the table. We are voting whether there was a statistically significant effect on mortality during the entire study period for V-HeFT I. Dan, you want to lead off? DR. RODEN: No. DR. KONSTAM: Well, I always feel awkward voting on this sort of question because this is the sort of question I ask my statistician, and I haven't heard any of our statistical advisers advise us that the answer to this question is yes, so I don't see how I can vote anything other than no. DR. RAEHL: No. DR. WEBER: Now, this is not the 2-year. DR. MASSIE: No. I think they are asked separately. DR. WEBER: In that case, for the moment, no. DR. MOYE: No for the overall. DR. LINDENFELD: No. DR. MASSIE: No. DR. 
DiMARCO: No. DR. THADANI: No. DR. GRINES: No. DR. BORER: No. DR. D'AGOSTINO: No. DR. MASSIE: Okay. Well, I guess the same question, then. Was there a statistically significant effect found in V-HeFT I for 2-year mortality? I think it is only fair to start at the other end. I don't know whether the other end starts with Ralph or Cindy. We'll let Ralph go first. DR. D'AGOSTINO: Well, if we are consistent with the discussion we had in 1, 2, and 3, we have to say no. DR. BORER: I agree. No. DR. MASSIE: Bob is not allowed to vote. DR. GRINES: No. DR. THADANI: No. DR. DiMARCO: No. DR. MASSIE: Well, I also will have to say no, since I have to rely on the statisticians to some extent. DR. LINDENFELD: No. DR. MOYE: No, for 2-year. DR. WEBER: No. DR. RAEHL: No. DR. KONSTAM: No. DR. RODEN: No. DR. MASSIE: Did we get all the way up? Okay. All right, question 5, and Ray, you can at some point tell me if there are other questions we can skip along the way. Was there a statistically significant effect found for hospitalizations for cardiovascular causes in V-HeFT I? DR. LINDENFELD: No. DR. MASSIE: Is there any discussion? DR. LINDENFELD: I think it was clearly no. DR. MASSIE: I guess rather than try to find if there is a consensus, we will just vote. Cindy, why don't you start. Oh, I am not sure you were here for all those data actually. You can abstain if you weren't here for that. We are in hospitalizations. DR. GRINES: Actually I did miss that. DR. MASSIE: I think you weren't there. Udho? DR. THADANI: I am not convinced. No. DR. MASSIE: Ralph? DR. DiMARCO: No. DR. D'AGOSTINO: No. DR. BORER: No. DR. LINDENFELD: No. DR. MOYE: No. DR. MASSIE: Cynthia? DR. RAEHL: I need a minute. DR. RODEN: We don't have the data. DR. WEBER: Yes, where is this stuff? DR. MASSIE: The data was presented, I believe, for hospitalizations, or at least it was in the package. The hospitalizations were not different. DR. COHN: Yes. No, we didn't claim any -- DR. 
THADANI: Actually it's on FDA packet also, page 18 and 19. DR. LINDENFELD: There's no disagreement about hospitalizations. DR. MASSIE: All right. Is everybody voting? Well, keep going. DR. WEBER: No. DR. RAEHL: No. DR. KONSTAM: No. DR. RODEN: No. DR. MASSIE: One abstention. Right. There were three measures of exercise tolerance in V-HeFT I. For which of these were there statistically significant treatment effects? These are the maximum oxygen consumption, the total duration of symptom-limited exercise, and submaximal exercise duration. Do you want to comment on these? DR. LINDENFELD: I think that total duration was not significant, and the submaximal exercise duration, there wasn't enough data really to evaluate that. Maximum oxygen consumption is certainly of borderline significance. Some points are positive and some are not, so overall I don't think there is a definite overall improvement, although it is very suggestive. Strong trends. DR. MASSIE: Lem, any comment? DR. MOYE: I think if you consider the comments we made about corrections that came out in questions 1 through 3, then I look at question 6 about statistically significant. I am thinking statistically significant after making the kinds of adjustments that we have discussed. And after making those adjustments, I don't see that we can say that any of these are significant. DR. MASSIE: I haven't put Bob on the line since he hadn't voted, and you are not allowed to vote. Do you think there is a significant effect on any of the exercise measurements? DR. CODY: (Inaudible.) DR. MASSIE: Jeff? DR. BORER: No. DR. MASSIE: Okay, well, JoAnn has pointed out that the only one probably worth voting on is the maximum oxygen consumption. Maybe we can start down all the way on the left. DR. THADANI: Again, Barry, that is only on one point in time, of all the different points. DR. MASSIE: I think the question is, do we feel that there is a significant effect on maximum oxygen consumption. 
I guess our thinking should take into consideration the various time points and what was found. Dan, you want to start us off on that? DR. RODEN: Can you state the question? DR. MASSIE: Yes. The question is, was there a significant treatment effect on maximum oxygen consumption at peak exercise during a maximum exercise tolerance test? DR. RODEN: Yes. DR. KONSTAM: I am not sure. Can we hear the statisticians' view of this first? DR. MASSIE: We heard Lem. We didn't put Ralph on line. DR. D'AGOSTINO: If you want to just say let's look at this particular variable -- forget for the moment the multiple testing and so forth -- I had asked the question, okay are you going to do lots of different time points. Do you see a consistency in significance across those time points, and you see a couple here and there and they sort of fade away. So, even in this situation of not getting caught up with the multiple testing, the data is not overwhelming. If you now want to add to the fact that this is in the presence of all these other tests going on, I don't see this is demonstrating significance. DR. MASSIE: Jay showed us the 3 and 6-month data which he said in itself would be enough to -- or was that in V-HeFT II? DR. COHN: The V-HeFT II data. DR. MASSIE: In V-HeFT I it was the 12-month point. DR. THADANI: It's on page 17 of the FDA document. There is only one point which -- DR. MASSIE: Yes? DR. RODEN: I was looking at V-HeFT II. DR. MASSIE: Let's start all over. DR. RODEN: Starting all over. That's B20 in the package. For V-HeFT I I think the answer is no. DR. MASSIE: Okay, no. DR. KONSTAM: No. DR. RAEHL: No. DR. WEBER: Yes, I guess I would say no too, but you look long and hard at this and I guess we are going to have the opportunity of revisiting this when we look at V-HeFT II and putting them both together. So, it's no for the moment, is my vote. DR. MOYE: No. DR. LINDENFELD: No. DR. MASSIE: No. DR. DiMARCO: No. DR. THADANI: No. DR. GRINES: No. DR. BORER: No. DR. 
D'AGOSTINO: No twice. DR. TEMPLE: We didn't actually discuss this, but I have a side question for the committee that sooner or later I think we will have to answer, which is, what would a change in maximum oxygen consumption imply to the committee in the absence of a change in exercise tolerance? Two different measures of the same thing. I don't believe we have ever relied on oxygen consumption as a measure of a symptomatic improvement in heart failure, but it wouldn't be -- DR. MASSIE: Well, let me just make a quick answer to that. I know we have been here and we have looked at studies where they have had both, but the oxygen consumption was always the sub-study of overall exercise tolerance. In this particular protocol, oxygen consumption is clearly identified as the primary measure. So, I think following the trend, I would take what the protocol tells us. But if they, for instance, said both equally, how would we weigh one versus the other? DR. TEMPLE: No, I am asking whether you think that is a -- well, you can defer this until afterward, but sooner or later we probably ought to touch on it. Is it a suitable measure of symptomatic improvement? It is an unfamiliar measure of symptomatic improvement, that's for sure, but it still might be reasonable. Sooner or later I think we would like to hear what you think about that, but you don't have to do it now. DR. MASSIE: Okay, well, maybe we won't do it now. Was there a statistically significant effect found for quality of life in V-HeFT I? As I remember, we should probably skip that because the sponsors didn't feel they had seen one either. The measurements were not very precise. Number 8, was there a statistically significant effect found for left ventricular ejection fraction in V-HeFT I? Maybe we'll start with JoAnn. DR. LINDENFELD: Yes, I think there was. DR. MASSIE: Not so much for voting, but I guess discussion purposes. DR. LINDENFELD: Yes, I think at both 8 weeks and a year there was a significant effect. DR. 
MASSIE: Anybody have any other comments? DR. KONSTAM: I'd like a little bit more clarification of this. Was this one of the predefined secondary endpoints? DR. MASSIE: I think it was one of the predefined -- well, the secondary of the primaries. DR. LINDENFELD: Yes, the primaries. DR. MASSIE: It was a primary endpoint but we've all I think judged it to be below mortality, which was mentioned so often. DR. KONSTAM: Okay, so in that light, I am not sure precisely what this question is asking. Is it asking that question after the correction for the fact that there are five other primary endpoints? DR. LIPICKY: That is correct. DR. MASSIE: That is correct. But we are not voting now. I think this is worthy of some discussion and I see Dan has his hand up too. DR. RODEN: Ray, do you want to put in the labeling, or does somebody want to have this drug indicated to improve ejection fraction? DR. LIPICKY: No. I don't think it is a matter of what the labeling would be or the approval. I understand that. I understand it was a facetious question. But the thing that this was trying to elicit was, what are the positive things you can take away from the trial, and the reasons why one would say that the trial was positive. MS. STANDAERT: Ray, we can't hear you. DR. LIPICKY: Oh, sorry. The reason for the question was to see what the positive things are that one could take away from the trial, and what it would be labeled for would come later once one finally came to the conclusion that the trials found something. DR. MASSIE: So, I guess the question is sort of a totality. Do we think there is an improvement in the ejection fraction and that thought should include all the provisos as to whether we should even be looking at it, because Lem would tell us we should not be looking at it, I guess. DR. LIPICKY: No. What did I say that made you say that? DR. MASSIE: I don't know. I think when somebody said after all the correction and all the rest -- DR. LIPICKY: Yes. 
Well, I think all of that is true. That is, would you look at this trial, and on the basis of the result say that there was an effect on ejection fraction found with some degree of certainty. And that includes everything that we have been talking about all the way along. DR. FISHER: Could I make one technical comment, Barry? The statisticians might like to look on page A130 of appendix 4 of the report because I did a generalized estimating equation analysis with a variety of different models which incorporate all the time points at once. DR. MASSIE: Do you have a page number? DR. FISHER: Yes, A130. The reason for pointing this out is it relates to the relative strength of effect because we have been talking a lot about adjustments and things. DR. MASSIE: I would guess I would interpret this question, or at least I would vote on this question, saying that one could agree that the ejection fraction has been changed, and not agree that that makes it a positive trial. Is that fair? DR. LINDENFELD: Yes. DR. LIPICKY: Sure. DR. KONSTAM: There are a couple of different dimensions to this discussion. One of them is what exactly are we being asked, and is it a purely mathematical statistical question. And if that's the case, I might as well pass, and I think a lot of the other people on the panel would pass and just defer to the statisticians to reach a consensus about the mathematics. I guess I would ask Ray, is there a question beyond that? Is there a question that you are asking the clinically oriented reviewers or experts to then look back at the data after the statistical mathematical analysis and ask, okay, do you have reason to believe it now in the context of the mathematics, which is a different sort of question. DR. DiMARCO: I'd like to second Dr. Konstam there. It would be hard for me as a clinician to overrule our two statistical consultants who told me it is hard to find anything statistically positive in this trial. 
You know, and the question comes up, if you word it as, is it statistically significant, we have already said it is not, essentially. DR. MASSIE: No, I don't think we have. Not for ejection fraction. Bob has got his hand waving. DR. TEMPLE: I didn't hear any reservations from the biostatisticians about ejection fraction. It's significant at four zeros. Maybe you'll tell us you don't like the way we put the questions, but this question was to say, is that real. Not, is it valuable, is it a basis for claims, is it a basis for labeling. None of that stuff. That all comes later. This is, did it happen. DR. KONSTAM: But I think that is important generally with regard to all of these questions because I find myself then, after we ask the purely statistical question, going back and looking at it again and seeing if there's any reason that I would believe, based on pathophysiology, clinical insight, whatever. That maybe it is a marginally mathematically evident piece of information and I believe it because it fits a paradigm that I have. And if that is the question, then we could add insight to that. DR. LIPICKY: Fine. Perhaps all of these questions should be re-worded then and take statistically significant away and say, do you believe it? Is this real? But, you see, the problem is, if you can answer that question outside of the realm of statistical significance, I wonder how you can do that. DR. KONSTAM: I have thought about this, and I think what you do is you bring in the questions about does it fit a pathophysiologic paradigm. Does it fit based on other information that you bring to bear that is not mathematical? DR. LIPICKY: Right, but if the statisticians said that this has a p of .5 and it fit your model, you'd still think it was real? DR. KONSTAM: Not at all. But if the mathematics put it in a marginal zone then I think I would draw upon that set of information -- DR. LIPICKY: Fine. So, answer this question in terms of whether you believe it. Is this effect real? 
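Editorial illustration (not part of the spoken record): the exchange above turns on whether a nominal p value still counts as significant once the other primary endpoints are taken into account. The sketch below applies a simple Bonferroni correction, which is only one of several possible adjustments; the record does not say which method the reviewers had in mind. The endpoint count of six (ejection fraction plus the "five other primary endpoints" mentioned earlier), the nominal p value of .0001, and the marginal p value of .04 are all assumptions chosen for illustration.

```python
def bonferroni_adjust(p_nominal, n_endpoints):
    """Return the Bonferroni-adjusted p value, capped at 1.0."""
    return min(1.0, p_nominal * n_endpoints)


# Assumption: six primary endpoints in total (ejection fraction plus five others).
N_PRIMARY_ENDPOINTS = 6
# Conventional overall alpha referred to in the discussion.
ALPHA = 0.05

# Assumed nominal p value of the order quoted for ejection fraction.
ef_adjusted = bonferroni_adjust(0.0001, N_PRIMARY_ENDPOINTS)

# Hypothetical "marginal" nominal p value, for contrast only.
marginal_adjusted = bonferroni_adjust(0.04, N_PRIMARY_ENDPOINTS)

print(f"ejection fraction: adjusted p = {ef_adjusted:.4f}, "
      f"significant = {ef_adjusted < ALPHA}")
print(f"marginal endpoint: adjusted p = {marginal_adjusted:.2f}, "
      f"significant = {marginal_adjusted < ALPHA}")
```

Under these assumptions a very small nominal p value stays below .05 after multiplication by the number of endpoints, while a marginal one does not, which is the distinction the panel is trying to draw.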
DR. MASSIE: And I think that is what we have been trying to do, although I guess there is a varying level of how people answer these questions in terms of that type of thinking. Lem says we shouldn't think about this. I think he means that in a different type of context than can we look at these numbers and decide. I shouldn't be reading into what Lem is saying. But does this make a trial positive is one question. Do we believe that there is a real effect on ejection fraction is a second question. DR. LIPICKY: It should not be looked at from the vantage point of, does this make a positive trial. We have never asked you to count the number of positive trials. So whether you think this is a positive trial or not is irrelevant. DR. COHN: But I'd ask you, Barry, in light of this discussion, to go back to the mortality issue and ask the panel whether they think the mortality reduction is real, even if not statistically significant by nominal p values because that is clearly the question that is being proposed by the agency to find out. The p value question only can depend upon Lem and Ralph, but the implication of the reduction of mortality requires the clinical judgment that Marvin has identified. DR. LIPICKY: I'll second that. DR. MASSIE: Okay. DR. TEMPLE: Barry, I want to object. I'm sorry. We are dancing with words here and it is treacherous. If the committee wants to tell us, yes, I sort of believe it, I can't quite tell you why, the p values are all over the map, that is not very helpful to us because that's information we can't use. You can't approve a drug because someone has an emotional reaction to it. These questions all bear on a seasoned judgment, but it's a judgment based on statistical and other considerations. I just want to make one observation. We have a large number of biostatisticians and we think that we share these judgments about -- we clinicians, I mean -- adjustments and things like that with the biostatisticians. 
They know the math in ways we can't even begin to fathom, but how many endpoints are reasonable and things like that are questions that clinicians -- that we feel we have to deal with, and we've never been told by biostatisticians that we don't dare think about those things. So, too much helplessness in the face of good statisticians is not necessary. These are all things that intelligent people skilled in study design can think about. So, the question here was, given the p value .0001 for the ejection fraction data, is there a question about whether that happened? Later on you get to, suppose it did happen? Who cares? That is a different, perfectly important question. But at this point you don't have to really defer to -- you can reach an answer with the advice of your -- DR. MASSIE: I think the reason why this question was sort of re-opened is that it may be that some of the newer members of the committee, which is almost everybody, were not thinking along the same wavelengths. I think that is an important thing to clarify. I think when I would think of these questions, yes, you would take the data from the statisticians, you weigh it against what you've seen in the midst of all this morass of information, and you make a clinical judgment. That in fact is how I voted already, but I am not sure everybody else did. DR. TEMPLE: The formulation I had trouble with was, apart from statistical considerations, do you think this works? The clinicians aren't supposed to be apart from the statistical -- DR. KONSTAM: Dr. Temple, can I just respond to -- DR. TEMPLE: -- supposed to be in light of statistical. DR. MASSIE: Right. Well, is that what you meant? DR. KONSTAM: Can I respond to that? First of all, I agree with you completely. None of us should be saying things that we can't defend by something. So nobody just sort of looks at it and I can't tell you why. Obviously we need to tell you why. The second thing is, I guess all of us feel, okay what do you want from us. 
I think that there is confusion. I think Barry said it, about is this a purely mathematical question that we are being asked, or is it a question that is based on the mathematics at first and then interpreted with an eye to pathophysiology and clinical judgment. It sounds like you want the latter. Where this really does come to bear, based on some of the discussions among the statisticians that I have heard, there are circumstances of some very, very compelling findings that seem very, very compelling to us, and some of the statisticians say, just don't look at it. I can't deal with this. It's a post hoc analysis and yet it's an enormous finding. Then I think that is the circumstance where we might want to say, all right, but do you believe it anyway, and why. DR. MASSIE: Ralph. DR. D'AGOSTINO: I think what you just said is very important, and what I think I am saying is that that is not present in this study with regard to mortality. When we were talking about the adjustments, the questions 1, 2, and 3, they were focused on mortality. No matter how you look at the statistical procedures that they used and put forth to us, any kind of adjustment takes them over. With this particular one, I think any kind of adjustment still keeps it in the statistically significant. We are moving very quickly through these, but don't carry with you that I said or that Lem said that everything is out now because of the multiple testing. This is very significant from a statistics point of view. Then there is the other question. DR. MOYE: Well, certainly, if you just look at the nominal p value. If you just want to confine your view to the nominal p value, as though that was the only evaluation done in this program, then you have to argue for its statistical significance, but that is not the only evaluation done in the program. The question is, what does the p value mean? 
I mean, if all you will do is look at these nominal p values and say, yes, they are significant because they are less than .05 or less than .01 or .001, then the overall alpha means nothing with that interpretation. I guess I insist that the overall alpha is very important. DR. D'AGOSTINO: No, I am not disagreeing. I am saying that if you apply all the discussion I had a moment ago where I was being very loose about it, I think that you have a very small p to begin with. As you multiply it for other things, you still have a small p. And I agree with you. I am not saying, I don't think anybody is saying that you should look at this as it stands by itself. What I am saying is, when you make all those adjustments, we make a number of those adjustments, this would probably still withstand the rigor of .05. DR. MASSIE: Yes. Let's call this question. Ralph, this is the ejection fraction question. Is there a significant effect on ejection fraction shown in this trial? DR. D'AGOSTINO: Yes. DR. BORER: Yes, I think there is. I would like to add one thing, just a plea to Jay. Anywhere else in the world you can say whatever you want, but in this one building, just building 10 here at the NIH, please don't call the technique you use to measure it MUGA. (Laughter.) DR. MASSIE: With that comment, Cindy? DR. GRINES: Yes, it's significant. DR. THADANI: Yes. DR. DiMARCO: Yes. DR. MASSIE: Yes. DR. LINDENFELD: Yes. DR. MOYE: No. DR. WEBER: Yes. DR. RAEHL: Yes. DR. KONSTAM: I'm not sure. Let me just clarify my answer. I'm not sure on the basis of what I have heard from the statisticians whether it is statistically significant or not. DR. MOYE: I guess we disagree. We have a different point of view. DR. KONSTAM: Exactly. Therefore, I -- DR. MASSIE: By the way, I phrased the question without the word "statistically." DR. KONSTAM: Good. So, I believe the data. I think there is an increase in ejection fraction. DR. RODEN: I think ejection fraction rises. (Laughter.) DR. 
MASSIE: Ray, I'm sorry to have to ask this. Do you want us to vote on whether we think there is a significant effect on mortality without the "statistical" in there? DR. LIPICKY: I think I would like to hear a yes or no vote that takes away the statistically significant part. Did V-HeFT show an effect on mortality compared to placebo? This is the kind of feeling question, so you don't have to break it up into 2-year and total mortality. Was there an observation that makes you believe there was a mortality effect seen? DR. KONSTAM: Now, Ray, could you give us some kind of idea what level of certainty you would like to see. DR. LIPICKY: Statistically significant. (Laughter.) DR. KONSTAM: 60/40? DR. MASSIE: Are you convinced that there is an effect on mortality? I think that is the question. DR. LIPICKY: You have already made it very clear what you think the statisticians think. So, this question is asking you what you think, so that we have it clearly -- DR. KONSTAM: Is it pretty clear to us that there is an effect on survival. DR. LIPICKY: Right. Was a survival effect shown. DR. MOYE: Well, we have to be careful about that. DR. MASSIE: I don't think we should say pretty clear. I think you want to know whether we are convinced that there was a mortality effect. DR. LIPICKY: That the trial found an effect. DR. MOYE: But there is no question that in V-HeFT I more patients died on placebo than died on the hydralazine-isosorbide dinitrate. There is no question about that. That's the data. The issue is, what does that mean for the population at large? DR. LIPICKY: Well, yes and no. Well, let me be sure that I am not doing something totally crazy. The committee very clearly indicated what it thought the study showed in a quantitative way, that is, in a carefully considered statistical fashion. The question that I want the committee to address now with just a yes and no vote, is, even though they think that, what does their gut say? 
And I don't know that we'll pay any attention to your gut, and probably should not, but because this discussion is going on, I would just like to know what the result of that vote would be. DR. MASSIE: Can I rephrase the question, then? Because I think we have to have some quantitative sense. The question I would ask is, if this study was repeated 100 times, are we -- DR. LIPICKY: I think that is a reasonable question. Would you expect to find this many times? DR. COHN: You know, it's more than gut, Ray, because we're using an arbitrary p cutoff of .05 and our statisticians -- or whatever number they are using, because it gives us that greater confidence in the result. But even then we are only dealing with 19 out of 20 possibilities. And if it is .07 or .08, does that mean it's the same as .9? No, it doesn't. DR. MASSIE: Well, we can't rediscover total philosophy, I think, but I think this is probably an important sort of way for Ray to get some information from the committee which is, do we think that if we did this trial 100 times, that at least 95 percent of the time we would show a convincing level of reduction in mortality. Is that fair enough, Ray? DR. LIPICKY: It is for me. DR. KONSTAM: Can I just throw one other thing in? I guess one might ask, is it convincing to us enough as a very well informed clinician. Would you use it in clinical practice based on everything you know? And to me when you get up around, let's say -- just to set different standards, when you get up around the 60/40, 70/30 level, as long as you are pretty convinced it's safe, you would probably use it. Now, it sounds to me that the FDA requires a higher level of certainty than that. DR. LIPICKY: Well, you have responded to the question of statistical significance. We are not interested in whether or not you would use it. We are interested in you as a scientist and investigator. 
Are you convinced that this trial, from that vantage point, if repeated, would find an effect, that again you would believe? That's the question. Certainly in the practice of medicine one uses things that one has absolutely no knowledge about at all. I know that. DR. MASSIE: I think we need to move on. I think we need to give this opinion. Everybody should look forward to question 20, which is whether BiDil should be approved for heart failure. That is certainly going to be something where if there is a different answer to this question from the statistical question, then people will have to deal with that somehow. I am going to start down there with Cindy. Do you want to answer this more clinically oriented question? Are you convinced that this drug has a significant, but not necessarily statistically significant, effect on mortality? DR. GRINES: We are back to the beginning of the questions? DR. MASSIE: Yes. DR. GRINES: With V-HeFT I? Actually that is what we have been discussing here, and I think that there does appear to be a clinically significant difference in mortality. Whether it meets the biostatistical criteria, I'm not capable of answering that. DR. MASSIE: Udho? DR. THADANI: I think as an investigator and a clinician, I am having problems. The trend is in the right direction but I am not convinced that it is significant. So if you would repeat the experiment, it may not bear out again. DR. MASSIE: Ralph? DR. D'AGOSTINO: Do you want a comment or do you want a vote? DR. MASSIE: I want an answer. DR. D'AGOSTINO: I just don't see how we can separate all the previous discussion. I think there is no way of knowing. I think the answer you just got was the correct answer. There is no way of knowing. It remains to be seen. DR. MASSIE: That's a no, I think. Jeff? DR. LIPICKY: Was that a yes or no? DR. MASSIE: That was a no. He doesn't feel -- he's not convinced. DR. BORER: I would say no also, and I would like to amplify a little bit. 
I think that if the study were repeated, it's more likely that you would come up with a similar answer than that you wouldn't come up with a similar answer. But the data that have been presented leave me without a reasonable assurance that you will come up on the second try with the same kind of result. Certainly if I were a clinician faced with a patient with congestive heart failure who couldn't take ACE inhibitors, seeing these data and seeing that they sort of trend in the right direction and there is nothing to suggest that they do something bad to people, and the condition is a horrible one with a terrible outcome, I would probably go with this combination of drugs, even though I don't have the strongest of data to support my doing that because I don't have any other options and it's a bad situation and I'm not doing much harm that I can see. But as a regulatory issue, I have to say no. DR. LIPICKY: What do you mean, Jeff? You have sucrose sitting on the shelf. DR. BORER: But I have no data about sucrose. DR. LIPICKY: Yes, you do. Right there in this trial. DR. BORER: They don't even trend in the right direction. DR. MASSIE: Since this is an opinion, we should get Bob's opinion, even if he can't vote. DR. CODY: We're talking about V-HeFT I and not the drugs themselves overall. We're talking about this study. It gets into temporal issues, when the study was conducted, et cetera. I guess all I could say is, when this study came out, it influenced my treatment practices. So, did I think it was significant? Yes. DR. DiMARCO: I agree with Dr. Borer, except I am going to answer yes instead of no, because I just think the level of certainty I have is much less than I usually see for trials that are submitted for approval. DR. 
MASSIE: I thought Jeff said it very well, too, but I am going to vote no because I do believe that "convinced" for me is informed by statisticians and also by whether I think 95 times out of 100 the same thing would happen, and I am not convinced that that would be the case. DR. LINDENFELD: I am not convinced it would be 95 out of 100, although I do believe it might be 70/30. DR. MASSIE: Wait. Is that a yes or a no? DR. LINDENFELD: It's a no. DR. MOYE: I think all roads lead to the statistics here. Whether you say statistically significant, whether you say you are sampling from the population at large, they all lead to statistics and the answer is still no. DR. WEBER: Well, I have a slightly different point of view. First of all, I agree with what Lem and Ralph have said about the statistics. I think we have to acknowledge that there is a right way to do them, and that at the end of play we didn't quite have a statistically significant result, but it wasn't a huge deviation from that. I look at the survival in V-HeFT I on the combination of hydralazine and isosorbide dinitrate. I look at the data from V-HeFT II. It gives me reassurance that that first line is real. DR. MASSIE: No, you're not looking at V-HeFT II. DR. WEBER: No, but I know from a collateral source that this line is probably realistic. It separates from two other lines on the same graph. As far as I am concerned, whether it is 95 percent or 93 percent, I believe that it is clinically meaningful and I would vote yes. DR. MASSIE: Cynthia. DR. RAEHL: One, I don't believe in the infallibility of statistics and that all things lead to that. Looking at the collection of the data, I would suggest, yes, I believe this does suggest a very strong trend. DR. KONSTAM: To the question about whether I am convinced that this is right and reproducible, the answer is no. 
I would add, however, that it considerably surpasses my usual threshold of evidence that I require to recommend to clinicians to use, so I would readily use it in clinical practice. But am I convinced it's right? No. DR. RODEN: I think the reason we are having difficulty is because if it's real, it's a small effect. It's not a huge effect, and that is why, given what the statisticians have told us and given my bias about a small effect, I vote no. DR. MASSIE: Okay, I hope that nobody remembers we ever had this discussion when we move on to regulatory things. (Laughter.) DR. MASSIE: But I think it's very important because there is a paranoia that we are marionettes and statisticians tell us what to do, and I think the answer is we need to think for ourselves also but listen. With that comment, I think we are down to -- where are we? Nine. Headache and blood pressure are consistent. Do you want us to continue with that? DR. LIPICKY: No, you can skip that. DR. MASSIE: Moving on to V-HeFT II, and again, Ray or Bob, you can tell us if we don't need to keep on going on every question. V-HeFT II had no placebo group. The division and the advisory committee have held that a successful active comparator trial requires one to conclude that new treatment would have beaten placebo had there been a placebo group, and that the estimated effect size of the new treatment is not less than half of the effect size for the comparator agent. Now, I know that Bob Fenichel has some comments. I think we don't have a great deal of time, but having spent 12 hours of my life listening to this discussion and now being the only one left on the committee that did, there actually has been a discussion and some standards that the FDA has recommended, at least, to sponsors about what it might take to get a drug approved if there is no placebo group in the trial. I wonder whether maybe in three minutes you could summarize that discussion. DR. LIPICKY: I don't think he needs to do anything yet. DR. 
MASSIE: Yet. Okay. DR. LIPICKY: Let's see if you understand. DR. MASSIE: Okay. One way in which it could be concluded that hydralazine-isosorbide was superior to placebo would be if the combination were superior to Enalapril in V-HeFT II. Was hydralazine-isosorbide dinitrate superior to Enalapril for any of the mortality or exercise endpoints? Okay, any discussion? How about JoAnn? DR. LINDENFELD: No, I don't believe so. There were no overall definite benefits. DR. MASSIE: Any other comments or discussion from the committee? DR. RODEN: The data are more compelling here than they were in V-HeFT I. DR. MASSIE: You have to talk louder or more directly. DR. RODEN: The data looked more compelling. The graphs looked nicer in V-HeFT II than they do in V-HeFT I. So, I would be interested in the statistical opinion with regard to, for example, the change in VO2 or the change in -- what was the other one -- change in ejection fraction. DR. MASSIE: Change in VO2 or change in ejection fraction. Dan is not sure that he hasn't seen something there, I guess. DR. RODEN: Graph B27, for example, in the book looks more convincing than in V-HeFT I. DR. MASSIE: Okay, any other comments? Dan wanted to hear from -- DR. RODEN: From Ralph or Lem. DR. MASSIE: So, there are the two pieces of data we have. Since we know there was not an improvement in mortality compared to Enalapril, these are the two. Any other comments? I think we do need to hear from -- DR. RODEN: I think this is an important issue because if in fact an improvement in symptoms has been demonstrated using that kind of measure, then it will affect the way we vote on your last question. So, Lem or Ralph? DR. MASSIE: Lem? DR. MOYE: I guess I am furiously calculating here. This is another study now where you do have a positive endpoint on mortality. We're talking about V-HeFT II. Right? DR. MASSIE: DR. MOYE: But it is a positive endpoint in -- DR. KONSTAM: In the other direction. DR. D'AGOSTINO: It is in the wrong direction. 
The competitor beat out the product. It went in the wrong direction. DR. MOYE: Right, and I guess that is what I'm trying to work through now. I would say that perhaps an argument here is admissible. At least from my alpha spending function, it would be admissible to consider it. DR. MASSIE: Ralph? DR. D'AGOSTINO: Are we going to go item by item? It just went in the wrong direction, so -- DR. MASSIE: Well, mortality we are not discussing, but what the question asks if I can still dig it up. DR. D'AGOSTINO: Which one? DR. MASSIE: On exercise and presumably ejection fraction, you also concluded -- well -- DR. THADANI: You skipped 10 and 11. DR. MASSIE: No, I thought I was reading 10. Mortality or exercise, right. In other words, the question is, since there is no placebo group, how do we interpret it? And the question says, if the drug, BiDil, beat what we think to be an effective comparator, that would be a positive trial. It did not for mortality. So, we're talking about exercise. DR. THADANI: Barry, before you go further, can you allude one thing? Since there is no placebo, we are already saying that we are not going to compare it with historical placebo. I think we should establish that, because our problems with that -- DR. MASSIE: No, we haven't said that at all. DR. THADANI: No, but I am just saying, in order to make a rational decision, since there is no placebo, I think there are two active controls and it happened that ISDN-hydralazine was inferior to Enalapril. So as far as mortality, I think we have to accept that. Right? DR. MASSIE: Right. What they are asking is, even though it was inferior to mortality, was it superior to Enalapril for exercise? DR. CODY: In terms of exercise time or O2? DR. MASSIE: Well, I think that they wisely said exercise endpoints. I assume that that was written with some intention. DR. LIPICKY: That was written intentionally that way, yes. You'll have to decide. DR. 
D'AGOSTINO: I'm not at all convinced by any of the data I see before me. If it is a blanket answer, I would say no. DR. MASSIE: Bob? Jeff? You're not convinced either? DR. CODY: No, not on the O2s. DR. MASSIE: Since as I understand there was no significant difference on the other exercise endpoints, it really is the O2s we are talking about. Any other comments on this? DR. RODEN: But you didn't see data on the other exercise secondary endpoints. So, there is no difference in exercise time or other endpoints, other secondary endpoints, Jay. DR. COHN: All the exercise endpoints trended in the same direction. The primary exercise endpoint was peak VO2 in this trial, and that is the one that achieved a statistically significant benefit of H/ISDN at 3 and 6 months. Now, looking at the whole curves, of course, one as a statistician might raise issue about using those two points and disregarding the rest of the curves, but you must recognize as an agency that up to 6 months is the only data you have ever seen before. So, based upon 3 and 6 months, you would have approved this drug on the basis of better exercise, peak VO2 than with the comparator agent, which was Enalapril. DR. KONSTAM: Is there a statistical approach to determining whether the two curves differ from each other, and should that be applied? DR. D'AGOSTINO: If you have a question that says what happens in 6 months, that can be a very real question and you could apply a technique that solely focuses on that 6 months. Or you could ask a question about the whole curve. You may not want to ask the question about the whole curve. It may just be in particular time points. I'm not looking at that particular time point and saying that's significant and therefore I really have to focus on it. I'm looking at all of the measurements that I have before me. I see something significant here. It's kind of nice, but I have seen a lot of things that aren't significant, and why would I extract this one and believe in it? 
So, I am not looking at the full curve. I'm trying to look at these significant results in the context of all the other procedures that have been played here. I don't think there is any hope, if you compare those two curves. You may have something if you focused on that beginning part of it. DR. MOYE: I would like to second what Ralph said and add just one other thing. This is difficult for me to interpret independent of mortality data because you are losing patients as you go along the follow-up time. Maybe the patients that you were losing who were dying were ones who, if they had survived, would have had lower EF's. So, I just have a difficult time trying to interpret this. DR. MASSIE: Yes, I think that's an important point. I don't know if Jay has done this. In the first study it was pretty clear that more people, if anything, were dying on hydralazine and isosorbide dinitrate, and therefore the placebo group might be at an advantage, if you look at exercise time. In this, more patients died with hydralazine-isosorbide dinitrate. Was a carry-forward analysis done? In other words, in the people who died, did we look at their last exercise measurement and carry that forward through any of these endpoints? DR. COHN: Well, we don't like doing that because it really implies that patients would have stayed the same had they been alive, and the other alternative is to substitute obviously a low value for those who died. I don't like doing that either because I think that obviously dilutes the data in a favorable direction. But I can't believe it. I think what we've learned since these trials have been done is that you would not choose to do prolonged exercise testing over a long period of time because of differential mortality, and that is why I focus on the 3 and 6-month data because there was not differential mortality during those early time points when you can really evaluate the effect of the therapy on a non-mortality endpoint. 
After 6 months you begin to see a difference that makes it very difficult to interpret, and I am not sure how you do it. DR. THADANI: Barry, can I comment? On the documents you were given from the FDA, if you look on page 40, the middle paragraph, it says, "For exercise tolerance and other endpoints related to cardiac function, hydralazine-ISDN had very little advantage over Enalapril for maximum oxygen consumption. Multivariate analysis showed no significant difference between the treatments." So, I think I will go along with that because that's in your documents. All of you have that. DR. MASSIE: What page is that? DR. THADANI: Page 40. Under V-HeFT II, third paragraph. DR. D'AGOSTINO: I told you that without looking at the text. DR. LIPICKY: We can show you that analysis on a transparency, if you'd like. I don't, unfortunately, know -- DR. BORER: To be fair, that analysis is over the entire curve, which was the issue that is somewhat at issue. DR. LIPICKY: Does anyone know the page number in the reviews where those tables appear? DR. BORER: Yes, the raw data, 32 and 33. DR. THADANI: And also the p values varied. DR. TEMPLE: Barry? DR. MASSIE: Yes, Bob? DR. TEMPLE: There's another problem with picking the 3 and 6-month time points. If one had observed a benefit accrued only belatedly, one could think of a plausible reason for why that should occur because there is more time to prevent remodeling. You can always think of a good reason for anything. So, unless the 3 to 6 were specifically identified as the time points of interest, you have to worry that you are once again following your nose. Always a problem. DR. CODY: Can I just amplify on my comments? It has something to do with what you raised earlier. This slide shows the peak VO2's, and it assumes there was no difference statistically, even biologically, in the peak O2's at baseline. The previous graph, which is B26, shows the mean change for the two groups. 
There the outcome is a little bit different, certainly at least at 6 months. I guess you'd have to wonder what would happen if a placebo group was in there. Would it be negative at 3 and 6 months, or would it fall between the groups? So, given that we don't use peak VO2 all that much as an endpoint independent of exercise time, and with the understanding that 3 and 6 months is as long as perhaps a lot of other trials, so one has to be careful about putting too much weight on the absence of significance over time down the line, it still comes across to me as sort of a weak observation given that exercise time didn't change. DR. MOYE: I guess I would just say, putting together what Ralph has said about the concerns about the multiple time points, and also this differential censoring effect, where patients on the hydralazine-ISDN are more likely to die, and you don't know what their EF's would have been, I am inclined to vote that there is no significant effect here for the change over time. DR. MASSIE: I think there are a lot of questions, and the one that Bob brought up earlier is, if we were to agree that there is an increase in peak VO2, does that mean that patients feel better, because I guess that is why exercise came into play, as a quantitative measurement of symptoms. There are the multiple points in time. I think it is not a clear-cut issue but I guess we need to vote on this. In the absence of further comment, I think the way we should phrase the vote is, was hydralazine-isosorbide dinitrate superior to Enalapril for any exercise endpoint, or for exercise endpoints? JoAnn, do you want to vote first? DR. LINDENFELD: If we're going to divide them up, I think it was significant for peak VO2 at 6 months. The FDA analysis doesn't give it quite statistical significance at 6 months. I think it was .1. But the exercise time, and then I think the time and anaerobic threshold were not significant at all. DR. 
MASSIE: So that is a yes, that -- I think the way, if I can hazard to guess what Ray is asking, is, do we feel exercise capacity was improved in V-HeFT II? DR. LIPICKY: I'm sorry, I was talking to someone else when you were talking. DR. MASSIE: This question, since you didn't say VO2, are you asking us whether we think exercise capacity was improved in V-HeFT II? DR. RODEN: You are using exercise capacity in a sort of general way. That's the way I interpret your question. My answer is no. DR. MASSIE: We started down there. We'll keep on coming from there. DR. KONSTAM: No. DR. RAEHL: No. DR. WEBER: No. I'd agree with JoAnn. I'm not sure what "in a general way" means but we do have these compelling data with the peak oxygen consumption at 3 and 6 months. So, to that extent, I would say yes, I think there is a difference between the two treatments, at least at that point of the study. DR. MOYE: No. DR. LINDENFELD: I think when you combine all three measurements of exercise capacity, the answer is no. DR. MASSIE: I think I'm going to have to vote yes, but there's an important proviso, which is the question I asked Jay. I'd like to see whether the increase from rest to exercise in VO2 was improved significantly, and if it were, I would feel that is a quantitative measure of exercise capacity, but obviously we are not going to know that today. But I will vote yes, assuming if it were ever to be examined, it were found to be positive in that manner. DR. DiMARCO: I vote yes for the 3 and 6-month time points, but I really can't make a decision about the rest of the study. DR. THADANI: My answer is no. At 3 months it's significant, but the FDA analysis at 6 months, p is .11. So, the answer is no. DR. GRINES: I vote yes for 3 and 6 months. DR. BORER: I would have to say no. I have concern about the 3 and 6-month points too, and again, they sort of just make it statistically, according to the analyses we were given, but no adjustments were made. 
The entire discussion we had before is applicable here, and the putative difference is relatively small. So, my overall response is that the answer is no. DR. D'AGOSTINO: No. DR. MASSIE: We've got the vote. Maybe we can gain some momentum here now. If hydralazine-isosorbide dinitrate were not superior to Enalapril, it might still be superior to placebo. 11.1, the sponsor argues that an answer to that question would best be derived by comparing the hydralazine-isosorbide dinitrate group to the placebo group in V-HeFT I. The division argues that the best comparison would be with the results of the SOLVD treatment trial, where the magnitude of the effect of Enalapril was demonstrated, or with a combination of the results of SOLVD treatment and V-HeFT I. What is the appropriate placebo group for this comparison? Any questions or comments? DR. LIPICKY: This is for mortality now. DR. MASSIE: Right. DR. MOYE: I guess I have to say I'm extremely discouraged by this question because I think that the analysis that is suggested here is incorrect. We are talking essentially about historical controls, and they are so fatally flawed with differences in therapy, baseline therapy, and baseline characteristics for the groups that I don't think this question is addressable. DR. D'AGOSTINO: I feel like I am being asked if I should kill my brother or my sister. (Laughter.) DR. D'AGOSTINO: I think both answers would be problematic. (Laughter.) DR. THADANI: I think, Barry, without having a placebo control, we can't guess anything. The trial was done. Enalapril was superior, and we ought to just leave it at that. We can't presume what placebo would have done. It could have gone in any direction, so we can't use historical controls. DR. 
MASSIE: Well, I guess I need to raise the question that the agency is probably asking us not so much about this specific experience, perhaps, but we are in a day where there will be uncontrolled trials, and the division has sponsored a meeting where that has been discussed and how to imply things. It was in the context of thrombolytic therapy where it wasn't possible. DR. LIPICKY: Barry, that's correct, but you have enough drug to consider. So, if you keep going around the questions, the way in which the answers are coming out are answering the question. DR. MASSIE: All right. What is the appropriate placebo group for this comparison? DR. LIPICKY: And what you have said so far is that there isn't any placebo comparison that would be appropriate. DR. MASSIE: And you are happy with that answer? If that isn't an allowable answer, then we should have a vote. DR. LIPICKY: Well, yes. You can always tell us we don't know what we're asking. DR. MASSIE: Bob? DR. TEMPLE: Ray is right about needing to get on with it, but this is one of the most complicated questions in general we face. What do you do when you can't randomize to placebo anymore, which maybe is a conclusion reached too soon sometimes, but was certainly the conclusion reached in this case. In the case of thrombolytics, after some very elaborate discussions, anyone who is here will recall 40 consultants, each giving their opinion about things at considerable length. In a setting where everyone agreed that it could be expected that the active control would always beat placebo, that conclusion has not been considered here. 
But in that setting there were rules and approaches suggested that were, at least in the opinion of the committee then, pretty stringent which included the idea that at least 50 percent of the effect of the active control should be retained not by point estimate, to my best recollection, but as a lower bound confidence interval, a very stringent criterion, which basically means the new drug has to be at least a point estimate better than the old one. What is interesting about this case, though, is that no one contemplated -- no one clearly thought about doing that where the new agent actually is inferior to the control agent. And the question of whether you can really ever make a persuasive case in that setting is something that bears further discussion. It's plainly very difficult, but you never like to say never. The setting when this was last considered for thrombolytics was quite different. We had four, five trials where there was a regular 2 percent or 25 percent reduction or whatever, and everybody sort of took that as a given, that had there been a placebo, it would have been 2 percent worse than the control. The ability to say that here seems much more complicated, to say the least. Now, what is novel here is that Jay has made the argument that you have got a relevant group that is basically treated the same way. It is in the same environment, and it is rather more plausible than most historical controls. That is not a silly thing to say by any means. The question is whether you believe it. DR. MASSIE: I think what we are going to do because we can't recreate the 40-consultant, 12-hour meeting, is to move on to question 11.2, which is, had a placebo been present, does this committee think it is likely that hydralazine-isosorbide dinitrate would have been greater than that of placebo, the effect of. That seems like a concrete enough thing that we could answer, and I guess I heard some discussion and sentiment about that already. So, maybe we can vote. DR. 
LIPICKY: To my mind, you have already answered that question, if in fact the committee says there is no placebo group you can compare the results to. Then you have already answered the question and you don't need to. DR. MASSIE: No. That was the comment of some individual members of the committee. Do you want us to vote or -- DR. KONSTAM: Can I just ask a question? The way you worded this question is different from the way that you worded other questions. You asked, is it likely. I guess this brings me back to asking you, in our own minds, somehow, what level of evidence are you asking at each of these points. I mean, for me, is it likely? The answer would be yes. Am I convinced? The answer would be overwhelmingly no. DR. LIPICKY: Right, but I think we need to take 15 or 20 minutes to get at that. And there isn't enough time. So, I would suggest that we skip this question in its entirety. It is getting nowhere. DR. MASSIE: Okay, sold. Does the mortality of V-HeFT II confirm the findings of V-HeFT I? Do we need to discuss that since we -- DR. THADANI: Are we comparing apples and oranges? DR. MASSIE: -- findings of V-HeFT I that were significant? DR. RODEN: I thought we decided that V-HeFT I probably didn't show a mortality benefit versus placebo. So, I don't think this is an answerable question either. DR. MASSIE: Right. That is what I am asking. Should we answer that question? DR. RODEN: Well, the answer is no. DR. THADANI: Barry, didn't you say that since there is no placebo and we have two, we can't even talk about that? So, we are comparing apples and oranges while we are answering the question. It's a moot point. DR. MASSIE: It would seem that question 12 is moot. DR. LIPICKY: It seems fairly straightforward. If you have said that V-HeFT I has no mortality effect, then in fact there is no effect to confirm. DR. MASSIE: Right. DR. LIPICKY: And that would be a very simple answer. I think everyone agrees on that. Anyone who doesn't agree on that, raise their hand. 
(No response.) DR. LIPICKY: Okay, you're done. DR. COHN: Is no effect the same as a p value greater than .05? DR. TEMPLE: This is probably a mistake. The committee voted that there was no effect clearly shown in V-HeFT I, but when you asked various members what did they sort of grunt-believe, there was a sort of a mixed belief that there might have been a little something going on. One possibility is that V-HeFT II supports your belief that there's a little something going on because you believe the placebo group that has been developed. Another possibility is that in turning out inferior to a drug whose effect is not that large, it actually weakens whatever thoughts you had about V-HeFT I. So, I guess I wouldn't insist on a vote, but some sense of this may help us figure out how to say or do what we say or do. But if you think that's distracting, feel free. DR. MASSIE: It's getting a little distracting. DR. FENICHEL: Bob Fenichel, FDA. Let me see if I can sharpen this question. I think it's wrong to reject the question on the grounds that, having found no statistically significant effect in V-HeFT I, it is intrinsically a non-confirmable trial. Perhaps it's useful to propose a thought experiment. Suppose V-HeFT I had in fact been replicated, say, 40 times, and the question were now, do these 40 replications confirm each other? Now, if you had held that V-HeFT I was a complete waste of time, that it showed nothing, not just in a statistical sense, but it didn't give you any sort of feeling as to where the truth lay, then the answer plainly would be, well, no. Now you have just done nothing 40 times. But the answer might be, well, V-HeFT I trended in the right direction and gee, if you saw a trend in the right direction 40 independent times, that would be pretty impressive. So, it is possible for a study which shows nothing by itself to be confirmable or not, and that is what this question was trying to draw your attention to. DR. 
THADANI: The question said V-HeFT II was designed to compare hydralazine with Enalapril has no relevance to V-HeFT I. All you are saying in this particular study, Enalapril was superior to hydralazine, and there is no way of saying that it replicated the V-HeFT I study. I don't know how you can say that. DR. FENICHEL: I think that the facts make it difficult for me to find confirmation in V-HeFT II of V-HeFT I, but I think that it is an askable question. Suppose it were true that Enalapril is known to be immensely better than placebo every time, essentially reducing mortality to 0 from 100 percent, and suppose it were further true that in V-HeFT II the hydralazine-isosorbide dinitrate combination had been associated with, say, 2 percent mortality instead of 100 percent, and it were large enough to be distinguishable, surely, from Enalapril. Nevertheless, you might say, gee, that's pretty good. That really does confirm the kind of trend in V-HeFT I. What I am asking you to do, I think there is an answerable question. I think this is not a null question. V-HeFT I is confirmable or disconfirmable on the basis of a positive control study. Now, you may believe that it was not in fact confirmed by this particular study, but that is a separate answer. I am asking you not to reject the question. DR. MASSIE: So, you are just asking, could you confirm the results -- DR. FENICHEL: No. I'm asserting that it couldn't confirm. I am saying -- DR. LIPICKY: Maybe the way to put it is, let's say that hydralazine-ISDN significantly beat Enalapril and you are wondering, did ISDN-hydralazine beat placebo in V-HeFT I. I think you could say V-HeFT II confirms V-HeFT I. It turns out it was the converse. So, the question is addressable, okay. It is really a very simple answer I think, and you were making it complicated so I was going to skip it. DR. 
MASSIE: Well, I think that you heard that the committee didn't feel you could draw conclusions from V-HeFT II about the effect of hydralazine and isosorbide dinitrate on mortality. And therefore, both because there was no finding to confirm, and because they couldn't draw conclusions, it sounds like we have already answered the question. DR. TEMPLE: Barry? DR. MASSIE: Yes. DR. TEMPLE: Let me try again. To those people who thought that V-HeFT I might have been a good lean, do they feel stronger or less strong, having seen the results of V-HeFT II in which the drug was inferior to another agent? DR. MASSIE: Well, there were four of them, according to the vote, and maybe we can just ask them and move on. DR. TEMPLE: Fine. But I think that is the sort of thing we are getting at. Obviously you can't confirm something you didn't believe fully in the first place. DR. GRINES: I was one who thought that the mortality figures looked pretty good for V-HeFT I, but I would say that if there is a 30 percent reduction in V-HeFT I, I would have expected to see some sort of advantage or at least equivalence in the second study. To my knowledge, ACE inhibitors don't reduce mortality by 50 percent, so I am somewhat surprised that isosorbide and hydralazine lost in V-HeFT II. I would say that I am less enthused about the mortality advantage. DR. MASSIE: Okay. Anybody else? DR. WEBER: Yes. One of the things that was impressive to me in the two separate studies, nevertheless done by the same bunch of investigators and using very comparable patients, was that the survival curves on the two studies were virtually superimposable. In fact, they truly were superimposable. So, what happened when you got hydralazine-ISDN in the second study was that you finish up pretty much the same place as the first study. So, whatever the first study showed, I guess the second study wouldn't weaken my belief in the results of the first study. 
It wouldn't confirm it, but it wouldn't disconfirm it or nonconfirm it either. So, as one of the people who originally said I thought there was probably something in it, I haven't been persuaded to abandon that point of view. DR. MASSIE: Well, there, Bob, you get two sides of the same thing. Can we move on? Exercise capacity was measured by maximum oxygen consumption at peak exercise and total duration of maximum exercise. Results of both measures of exercise capacity in both the division's and sponsor's view gave similar results. DR. LIPICKY: Barry, excuse me. You are anticipating the answer you gave several questions back. It was going to be repeated. But that's okay. You can skip this. You really have already answered it. You just didn't go through it in the detail we wanted. So, we know your answer to this. You can go on to 14. DR. MASSIE: Which was hospitalizations, but as I remember, there were no data to support a reduction in hospitalizations, I mean, within the same absence of a placebo control group. So, maybe there is not much that we need to further do with that. JoAnn? DR. LINDENFELD: Right. There was no hospitalization difference. DR. MASSIE: And now we get to the ejection fraction. Was there a statistically significant treatment effect on ejection fraction favoring hydralazine-isosorbide dinitrate in V-HeFT II? DR. LINDENFELD: There was at 3 months but not thereafter, so I think overall probably not. DR. MASSIE: Anybody feel otherwise? (No response.) DR. MASSIE: Is that okay, Ray? DR. LIPICKY: Yes. DR. MASSIE: How compelling is the evidence that hydralazine prevents the occurrence of tolerance to isosorbide dinitrate? Do you want us to discuss that? DR. LIPICKY: Yes. Well, all right, no. (Laughter.) DR. LIPICKY: Skip it. DR. MASSIE: I don't think you will get a definitive answer out of this committee. DR. KONSTAM: Can I ask Jay a question about this? You talked about data from your remodeling model with nitrates showing benefit without hydralazine. 
Isn't that right? How then would you argue that perhaps the remodeling endpoint requires the protection from tolerance? DR. COHN: Well, the data in the animal model were done with isosorbide 5-mononitrate given twice daily. We don't suggest for a moment that ISDN by itself on remodeling would necessarily induce tolerance. For those who think that it may -- and I think Ray has been very concerned about the tolerance issue -- at least we have a mechanism by which this combination might inhibit tolerance and we don't know in humans, of course, whether the tolerance is a major issue or not. I think all those things are true. I think it's important to point out when we get to the ejection fraction data, although that has never served as a provable indication for the treatment of heart failure, that there is remarkable congruence in all trials between the change in ejection fraction and the change in mortality. There is in the animal model that I have alluded to clear evidence that nitrates and ACE inhibitors block remodeling that an alpha blocker equivalent to Prazosin that we used in V-HeFT I does not, despite the fact that it has a hemodynamic effect. So, there is incredible congruence between mortality effects, long-term changes in ejection fraction, and the drugs that one uses. So, although this is not ready yet I think for an indication, I think it should be strongly considered as clear evidence that a drug is altering the natural history of the disease. DR. MASSIE: What about digoxin? DR. COHN: We haven't studied digoxin in our model. DR. MASSIE: In your model, but in the human it's a drug that increases ejection fraction and seems to have a neutral effect on mortality. DR. COHN: Of course, in the dig trial there was no ejection fraction measurement, so we don't know what happened in that trial. The only data we have is up to 6 months, which may well be in the right direction. 
And as you know, in the dig trial there was a reduction of pump failure deaths, but the drug independently has an adverse effect on electrical activity and there was an increase in sudden deaths, probably so. That may well be consistent with the other observations. DR. THADANI: Jay, on that question on ejection fraction, inotropes increase ejection fraction, have a negative effect on mortality. There may be a difference between the mechanism of increasing ejection fraction or vasodilators might be different. DR. COHN: No, there is no data showing long-term improvement in ejection fraction with any inotropic drug other than digoxin, that Barry points out, which we have 6-month data for. But there is no other data. We have data out to 4 years here with ejection fraction, which really is a remodeling phenomenon, not the short-term hemodynamic increase in ejection fraction that may occur when you give dobutamine or amrinone acutely and demonstrate that the heart empties better temporarily, but that is not a remodeling issue. DR. THADANI: Would hydralazine alone do that without nitrates? DR. COHN: We don't know. DR. TEMPLE: Jay, do you know whether if you stop the drug for a couple of weeks, the increased ejection fraction persists? DR. COHN: No, we've never studied that, Bob. That has been studied with the ACE inhibitor. When one stops the drug, the benefit persists so that it isn't all hemodynamic. It's structural. But we've never done that with hydralazine and isosorbide dinitrate. DR. TEMPLE: That seems a crucial question if one wants to believe that there's remodeling and all that stuff, to be sure you separate out the hemodynamic effects. But you think it probably would persist? DR. COHN: Oh, I think a lot of it will because it is a structural change. It's not a hemodynamic change. 
I think we know very well from a lot of basic studies that that is a change in myocyte length and interstitium which really would persist when one stops the drug, at least for a period of time. DR. CODY: Of course, the other paradox here is that even though the ejection fraction goes up, we can't really attach a mechanism to it. I think people implicitly attach that contractility is improved. The paradox is that, to pick up on Udho and Jay's point, positive inotropes do not uniformly increase ejection fraction, whereas beta-blockers, at least in a number of trials, do. So, if the beta-blocker is increasing the ejection fraction is it by positive inotropic effect? DR. MASSIE: I guess we will hear more about this. Okay, the next set of questions I'm not sure is relevant at this point, that is, getting into the details of the dosing and instructions. DR. LIPICKY: Fine. Why don't you just go on to -- DR. MASSIE: We could go back to 17. DR. LIPICKY: Well, go on to the last question. That's fine. DR. MASSIE: So, I think the last question is, should BiDil be approved for use in the treatment of congestive heart failure? Again, I guess any discussion. We'll start with JoAnn. DR. LINDENFELD: I think given all the discussion that we're not certain of the benefit, it would be hard to recommend it. So, I would vote no. DR. KONSTAM: Barry, are we discussing or are we voting yet? DR. MASSIE: Discussing. DR. KONSTAM: I guess the question that I have in my mind is, is there a circumstance where the FDA would consider a lower standard for approvability than is normal, because in my mind the data clearly do not reach the usual level of approvability statistically, or on the basis of a clinical judgment of a certain finding, or a clearly reproducible finding. I guess the arguments in my mind that perhaps the FDA might consider a lower threshold here are some of the arguments that Jay made. This is a study that was done a long time ago that cannot be reproduced. 
The data trend strongly in the right direction. There is not a significant concern about safety. The finding in favor of survival at least tracks with some other things like the ejection fraction, which often goes along, and the study cannot be reproduced at this point. For all of those reasons, I think if there were a circumstance where the FDA would consider a lower level of approvability than the usual, I think this might be one. DR. THADANI: I think, realizing the study was done in 1980 but we learn with experience so we can't say what we have learned we are going to ignore and go on the evidence or the judgment made in 1980. One of the worries I have, you are going to lower the standard of approvability. Once you start that, there is no end to it and you could talk about any trial we do. You've got six endpoints, four endpoints, one is positive, you are going to approve it. So, I think I've got major problems with the previous speaker on that issue. DR. MASSIE: Well, I don't think -- he was asking. DR. TEMPLE: We're actually actively thinking about how much evidence one needs, but there were a couple of things that are worth saying. There isn't any magic p value. There are conventions but we don't sit here and say if you're .053, you're toast, or something like that. It doesn't work that way. From time to time, although I don't know if everybody would agree, we've actually accepted one-sided tests. The approval of nifedipine for unstable angina was done with a one-sided test because we thought we were confirming something. I just thought I'd mention that. We don't usually do it, but that's really a convention. It's not written in stone. There isn't any law or regulation that says the p value shall be .05 two-sided or .025 one-sided. It's a convention and it represents a strength of evidence. So, that's one thing. Everybody knows that from time to time we have accepted persuasive, unreplicated findings as a basis for approval. 
That usually involves a clear mortality effect and it usually involves very low p values so that they are persuasive in one way or another on their own. The discussion has been very interesting. I think, from my point of view, we'd have a rather more difficult time if all we had was V-HeFT I because then you'd have a marginal -- then everybody would look and say, well, it's sort of close and it's mortality and you can't do it again and all that. To me it's why I keep pressing this question. The existence of V-HeFT II is, relatively speaking, quite damaging because it isn't easy to lose to another active drug, if you're active. Those studies are not very sensitive to do that. So, when you talk about lowering the standard, the question is, what does that mean and what do people have in mind? There isn't any absolutely rigid standard, as anyone who has followed these committee meetings can easily deduce. But there is some sense that you're supposed to believe it with a reasonably high level of assurance, and we get advisory committees to help us figure out what that level of assurance ought to be. But there isn't a whole lot of credit given because it's old, because the study was smaller than it would have been if they were really thinking about a mortality trial. Those are just the facts of life and it's hard to give credit for that because when it's too small, that means you don't really know the answer. It's not that the answer would have been the same but more significant if you had done a bigger study. You don't know that. So, those are just some thoughts. We are actively trying to write down the sorts of things we think about when we consider evidence. One thing we definitely can do is think about the relevance of pharmacologic activities of short-term studies and things like that. We can do that. The question we will put to advisors is what cases we should do that in. DR. MASSIE: Bob? DR. 
CODY: Of course, any decisions along these lines today not only affect deliberations for future trials but maybe as soon as this afternoon. So, I think there is a difference here. This is not a new drug or a new drug combination. When I said no for the statistics but yes to how it influenced me clinically, back in the 1980s this V-HeFT I was the final nail in the coffin for Prazosin. So, I wasn't alone in that decision. Everybody else in heart failure stopped using Prazosin. So, people voted with their guts for V-HeFT I a long time ago. V-HeFT II, I would agree, is sort of a negative in the sense that while, yes, the combination was better than placebo, but it looks like Enalapril is better than this combination. However, it looks like because of the exercise, maybe the combination of Enalapril and hydralazine and nitrates would be beneficial. That remains untested. So, we can't really address that at all. So, I think what we are looking at is a drug combination that we do have a lot of historical information about. We know, as Jay has pointed out, that there has been no really serious adverse effect with this combination sufficient enough to raise it to everyone's level of consciousness. This has been approved by three guideline groups as an effective alternate therapy. The drug combination suggests we put brackets around a totally loosely uncontrolled range of drug combinations of hydralazine and nitrates that are being used out there. I think that the data that was presented does suggest that the combination of hydralazine with the nitrates attenuates the adverse effects of nitrate tolerance. So, I don't think we're being asked to really necessarily here make with our decision something that's going to "lower the standards" for future deliberations, even as soon as this afternoon. So to address Marv's question, Dr. 
Konstam's question, I think this might be a case for the reasons that he mentioned and I've mentioned, where one might look at this combination differently and, at the same time, not lowering the standards for how a new drug coming here might be evaluated. DR. MASSIE: Jeff? DR. BORER: I think that all the points that have been made just now by Bob and by Udho are very important, but since the two components are available, and since they are usable if someone believes they are appropriate to be used, I don't think it's necessary to consider changing FDA standards for approvability. The drugs are there, the literature is there. It's not illegal to put them together in a certain dosage form and use them if you believe that's right. The question we're being asked is, are the data supporting that use sufficiently consistent and sufficiently reproducible so that we can recommend that that should be done as a routine procedure by any clinician who wants to do it? I think the answer is, we don't have the degree of assurance that would allow us to do that. Again, if somebody disagrees, the drugs are there, the literature is there. One is free to use these drugs in heart failure. The practical impact of the FDA not approving this combination today is that there won't be an economic incentive for the sponsor to get out and provide educational material for a lot of doctors to know how to use the drugs best. I think that is the impact. DR. CODY: Just along those lines, one of the issues that came up was the need for prescribing information. I think ultimately that's going to be addressed best in labeling. But getting back to our very early discussions -- and Barry raised this -- yes, it might be true that only 40 or 50 percent of patients are receiving ACE inhibitors, but that doesn't mean the other 50 percent wouldn't benefit from them. It means that for whatever reason the message isn't out there. 
So, the use of the term of this as alternate therapy might be a little bit misleading in that regard. At the same time, in the labeling we can't say anything about whether or not this combination is beneficial added to an ACE inhibitor. There are also the minor issues of no data for functional class 4 patients and perhaps even a smaller but I think timely or at least politically correct point, we don't have any information in women. DR. MASSIE: Well, I think we've had some discussion. I think we need to take a final vote on this. Why don't we start on the right there. Cynthia, do you want to lead off the vote on whether this is approvable or should be approved for the treatment of heart failure? DR. GRINES: No. DR. MASSIE: Udho? DR. THADANI: No. DR. MASSIE: Jeff? DR. BORER: No. DR. D'AGOSTINO: No. DR. DiMARCO: No. DR. MASSIE: No. DR. LINDENFELD: No. DR. MOYE: No. DR. WEBER: I'm going to vote yes. I felt that there was enough information provided to give me reasonable confidence that the drug was beneficial or the product was beneficial or would be beneficial in a different formulation. I believe that physicians are being encouraged to use it by reputable agencies that issue guidelines and advice to physicians. I think the availability of the combination would encourage physicians to use hydralazine together with the ISDN rather than just the ISDN alone, which I think may be important to improve the quality of care. And I do not have the same concern that one or two others have expressed about weakening FDA standards. I think FDA standards are absolutely critical when you're considering new molecules or new paradigms of treatment or some kind of definite new step away from what's currently being done or what's currently available. But here we're just finding a way to make more convenient, and I think more efficacious, a treatment that's already in use, and I can't see any real down side to approving it. So, that's why I would vote yes. DR.
RAEHL: I think the totality of the evidence supports approval of this drug. However, I would say that when we think about labeling that it would be adjunctive to dig and diuretics in the treatment of chronic heart failure in particularly those patients who have sensitivity or demonstrated intolerance to an ACE inhibitor. DR. KONSTAM: I guess I would give the FDA my clear view that this does not meet its usual criteria for acceptability. I think that's very clear to me. I would still vote yes, approval, because I think perhaps this fits into a circumstance that I might lower that standard without, as Mike points out, influencing future decisions. DR. RODEN: No. DR. MASSIE: Well, I think that we've gotten through the agenda, and it's one minute before the scheduled time. So, we should get together by 10 after 2:00. That's 40 minutes, and get the discussion started shortly thereafter. (Whereupon, at 1:29 p.m., the committee was recessed, to reconvene at 2:15 p.m., this same day.) AFTERNOON SESSION (2:21 p.m.) DR. DiMARCO: This afternoon we will be discussing NDA 20-297, Coreg, or carvedilol, for the indication of congestive heart failure. The first part of the presentation will be the sponsor's presentation. If they would like to start, they are free to do so. DR. POWELL: Thank you. Dr. Temple, Dr. DiMarco, Dr. Lipicky, members of the Cardiorenal Advisory Committee, good afternoon. My name is Bob Powell. I'm Vice President for Regulatory Affairs at SmithKline Beecham Pharmaceuticals. We at SmithKline Beecham and Boehringer Mannheim appreciate this opportunity to present the information on the use of carvedilol, Coreg, for the treatment of congestive heart failure. As you know, this meeting represents the second time that the committee has been asked to review carvedilol for the treatment of congestive heart failure. 
For those of you who were present in May, you will recall that at that time the committee and the FDA agreed that carvedilol SNDA contains one study, study 240, that provides information that would support the approval of carvedilol for heart failure. However, while the results of several other trials were presented at that time, and most of these trials showed favorable effects of carvedilol in a variety of prespecified clinically important measures of efficacy, many of these measures were secondary endpoints in trials that did not achieve statistical significance for their primary endpoint -- DR. DiMARCO: Excuse me, Dr. Powell. Can we dim the lights for the front so the slides project better? DR. POWELL: -- in trials that did not achieve statistical significance for their primary endpoint, notably improvement in exercise tolerance. Also, the final report from a major study, study 223, performed in Australia and New Zealand, had not been submitted to the FDA prior to that meeting and thus the committee was unable to fully assess the value of study 223 at that time. The committee by a vote of 4 to 2 did not recommend approval of Coreg. Finally, I should add that neither the committee nor FDA raised any significant safety concerns about the use of carvedilol for the treatment of CHF. Since the May meeting, we have submitted and FDA has reviewed three types of new information. First, we submitted the final report of the long-term results of our second major trial, study 223, which we believe confirms the effects seen in study 240. Second, in response to specific requests from FDA, we submitted additional analyses of the effects of carvedilol on morbidity and mortality, as observed in previously reported multi-center trials, studies 240, 221, and 220. And third, we have provided additional information to clarify the effect of carvedilol on CHF symptoms and clinical status. 
I am pleased to note that we have reached full agreement with the FDA on the data, the analyses, and the nominal p values associated with the carvedilol database that are summarized in your briefing document. What remains, however, is to reach agreement on the proper interpretation of these data and analyses. As part of the meeting, you will be asked several questions concerning the efficacy of Coreg, all of which lead ultimately to the single question, should Coreg be approved for the treatment of congestive heart failure. In examining this question, we wish to emphasize that there is a hierarchy of drug effects that should be considered in deciding if Coreg demonstrates a clinically important effect in the treatment of congestive heart failure. In any hierarchy of benefits, mortality would be the most important. Second would be effects on morbidity, followed by effects on clinical symptoms and status, and finally, effects on hemodynamics. While we are not seeking a survival claim per se for Coreg, it is of note that in the U.S. multi-center trial program treatment with carvedilol was associated with a 65 percent decrease in the risk of death, with the nominal p value of 0.0001. While this observation led to the Data and Safety Monitoring Board's recommendation to terminate the entire program and offer carvedilol to patients taking placebo, we understand that there can be a legitimate debate about the appropriate regulatory course of action for a survival claim based on data from trials not specifically designed to detect a mortality benefit. Nevertheless, this finding does effectively rule out the possibility that carvedilol has an adverse effect on survival. A directionally positive but not statistically significant effect on mortality was also observed in study 223, and the overall mortality benefit based on all placebo-controlled trials with carvedilol suggests a relative risk of 0.51, and an overall nominal p value of 0.001. 
In our hierarchy of effects, after a reduction in mortality, evidence of a reduction in the worsening of CHF would clearly be an important benefit. In all of our multi-center trials, we prospectively monitored hospitalization as a measure of worsening CHF. You will find that it is the data on hospitalizations, or more specifically, the combined endpoint of hospitalization or death, that are most convincing that Coreg produces a clinical benefit in these patients. Why the combined endpoint? In fact, the combined endpoint is clearly a better measure of worsening of CHF than hospitalization alone because death is in fact simply the worst possible outcome. If, for example, all patients in one group had been hospitalized and all patients in the other group had died, the conclusion would not be that the deceased group was better off. Thus, we believe that mortality is an appropriate and necessary part of the measure of morbidity. We will provide prospective data from two well-controlled trials and retrospective analyses of data from two additional trials that demonstrate a positive effect on the combined endpoint of morbidity and mortality. The third benefit of Coreg in our benefit hierarchy is the effect of Coreg on the clinical symptoms and status. Data from eight placebo-controlled trials show that Coreg produced consistently favorable effects on each of several measures of symptoms and clinical status within and across studies, including New York Heart Association class, CHF symptoms, and global assessments. In addition, to complete our hierarchical profile, Coreg produced consistently favorable effects on left ventricular ejection fraction measured noninvasively, as well as left ventricular function assessed invasively. Thus, Coreg produces a variety of favorable effects across the entire hierarchy of desirable outcomes for patients with CHF.
However, there is considerable variability with respect to the ability to demonstrate any one of these effects in any given clinical study. This raises an important question. How variable are the outcomes of clinical trials of compounds recognized to be useful in CHF? In response to a request by Dr. Temple, Dr. Milton Packer has reviewed the data from the NDAs of all five drugs approved for CHF during the last decade. He found that there was no measure that distinguishes drug from placebo in all trials for any given compound, and that clinical symptoms were significantly improved compared to placebo in only about half of the trials in which they were measured. Further, despite the fact that fewer than 50 percent of the trials conducted with these drugs reached their primary endpoint, it was possible to assess their effectiveness on the basis of concordance of findings across trials, including those which failed to reach the protocol-defined primary endpoint. In our presentation today, we will provide evidence from two well-controlled trials that demonstrate a reduction in morbidity and mortality, as well as a concordance of data demonstrating positive effects on clinical symptoms and status, in addition to favorable effects on hemodynamics. We believe that these data meet the usual standard used to assess drugs for the treatment of CHF. In fact, based on this database, Coreg is now available in several countries outside the U.S. for the treatment of CHF. Having said that, we recognize that the carvedilol database does have some limitations, and we would propose that these can be adequately handled in labeling. For example, we believe that one limitation of the current file is the lack of any extensive clinical experience in patients with class IV heart failure. 
Therefore, we are not at this time recommending that carvedilol be used to treat patients with CHF symptoms at rest, particularly those hospitalized for worsening heart failure or receiving intravenous medications for heart failure. One final comment. In the FDA medical and statistical review, it is noted that carvedilol is, after all, currently approved in the U.S. for hypertension, and it is suggested that since carvedilol could be available for the treatment of hypertension, there is no compelling need to approve carvedilol for CHF at this time, suggesting that physicians would be free to use the drug, off label, for CHF based on peer-reviewed publications that have appeared in the New England Journal of Medicine, Circulation, and most recently, The Lancet, describing the clinical program for carvedilol in CHF. We would take exception to this view, particularly for a drug like carvedilol. Initiation of therapy with carvedilol in CHF requires considerable caution and can be best achieved only if the prescribing physician has substantial knowledge about the associated risks and their management. It is the need for careful titration and a desire to provide the appropriate information to the physician that has resulted in our decision not to launch carvedilol for hypertension, despite its approval for that indication some 17 months ago. We believe that the proper physician education cannot be readily achieved under current FDA regulations without an approved claim for the treatment of CHF, and we are accordingly strongly committed to accomplish this educational task if the drug is approved. With this background, Dr. 
Neil Shusterman, Vice President of Cardiovascular Clinical Research, SmithKline Beecham Pharmaceuticals, will now present the following information: a very brief review of the pharmacologic profile of Coreg, a description of its hemodynamic effects, and a more detailed review of the principal benefits of Coreg, its effect on morbidity and mortality, followed by a review of its beneficial effects on clinical symptoms and status. He will finish with brief comments on the safety and tolerability and some concluding remarks. Dr. Shusterman? I might point out, as Neil walks up, all the slides are numbered in the lower right-hand corner and can be retrieved by number if you have any questions, comments, or wish a more in-depth discussion following Neil's presentation. DR. SHUSTERMAN: Thank you, Dr. Powell. Dr. Temple, Dr. Lipicky, members of the Cardiovascular and Renal Drugs Advisory Committee, ladies and gentlemen. I would like to begin by briefly reviewing the pharmacologic properties of carvedilol, which may contribute to its efficacy in the treatment of heart failure. The drug is a nonselective beta receptor antagonist and a selective alpha-1 receptor antagonist, with no agonist actions at either receptor. In addition, unlike other beta-blockers, the drug exerts antioxidant and anti-proliferative effects. These occur in concentrations similar to those that act on adrenergic receptors and produce a variety of benefits in experimental models of myocardial and vascular injury. However, the importance of these additional properties to the efficacy of the drug in heart failure remains unknown. The actions of carvedilol on alpha and beta receptors are responsible for the hemodynamics of the drug in patients with heart failure. The first doses of carvedilol produced alpha blockade, resulting in peripheral vasodilatation. As a result, blood pressure falls slightly, but cardiac function is usually maintained.
During long-term treatment, carvedilol, like other beta-blockers, improves cardiac performance as reflected by the increases in stroke volume and decreases in right and left ventricular filling pressures. The drug produces consistent decreases in pulmonary artery pressures and heart rate, but with little effect in the long term on systemic blood pressure. In summary, the hemodynamic effects of first doses of carvedilol appear to be primarily related to the actions on alpha receptors, whereas the hemodynamic effects of long-term treatment with carvedilol are related to its actions on beta receptors. The ability of carvedilol to improve cardiac function has been confirmed in each of the placebo-controlled trials that have been carried out with the drug. In all eight studies, therapy with carvedilol, when added to conventional therapy for 4 to 12 months, produced a 5 to 9 ejection fraction unit increase in the left ventricular ejection fraction, associated with the p values shown here. This increase in ejection fraction with chronic treatment is greater than that reported for any other drug evaluated for the treatment of heart failure. The improvement in ejection fraction with carvedilol was related to the dose of the drug. In a parallel dose response study that compared a low dose, an intermediate dose, and a high dose, carvedilol produced dose-dependent increases in ejection fraction after 6 months of treatment. The p value for this dose response was 0.008. This improvement in ejection fraction was associated with a decrease in the chamber dimensions of the left ventricle. This decrease in dimensions occurred in addition to any effect that an ACE inhibitor might have to diminish left ventricular size. In this placebo-controlled long-term study of carvedilol in patients already receiving an ACE inhibitor, left ventricular end-diastolic and end-systolic dimensions were measured at baseline, and then after 6 and 12 months of treatment.
Whereas both dimensions remained stable over time with the placebo treatment, these decreased over time with treatment with carvedilol. The p values for the differences between the two groups for both measurements at the two time points ranged from 0.0004 to 0.047. I would now like to describe the clinical trials development program for carvedilol in heart failure. Carvedilol has been evaluated, as I said, in eight randomized, double-blind, placebo-controlled trials. These studies can be grouped into three categories. First, the single-center studies, study 033, BMI-01, and 035, each of which enrolled approximately 40 to 60 patients and maintained double-blind therapy for approximately 4 months. Second, the U.S. multi-center program, consisting of four studies, 239, 221, 220, and 240. These studies recruited several hundred patients and lasted approximately 6.5 to as long as 15 months. And third, study 223, the Australia-New Zealand heart failure trial, which was the largest and longest of the trials with double-blind treatment lasting outwards to 18 to 24 months. Now, the designs of the single-center trials are described on this slide. Each of the three trials enrolled patients with a ventricular ejection fraction less than or equal to 35 percent, with persistent symptoms of heart failure despite treatment with digitalis, diuretics, and an ACE inhibitor. Study 033 enrolled patients with mild heart failure, while study BMI-01 enrolled patients with more moderate heart failure, and study 035 enrolled patients with severe heart failure. Again, patients were randomized to either carvedilol or placebo for a treatment period of 4 months at the target doses indicated here. The design of the U.S. multi-center program is shown on this slide.
In this program patients with heart failure and an ejection fraction less than or equal to 35 percent, despite conventional therapy, again with digitalis, diuretics, and an ACE inhibitor, entered a common screening period, during which an evaluation of exercise performance by a 6-minute walk test was performed. Based on the performance on this test, patients were then stratified into one of four individual trials. Patients with preserved exercise tolerance were stratified to study 240. Those with intermediate exercise tolerance went to either study 221 or 220, and those with the most impaired exercise tolerance were assigned to study 239. In all studies, after an open-label period, during which patients received low doses of carvedilol to determine tolerability, patients were then randomized to placebo or carvedilol, up-titrated to target doses over a period of 2 to 8 weeks, and then were continued on double-blind therapy during a maintenance period lasting from 6 to 12 months. During this time, background medications were to be kept constant. Please note that the allocation ratios to carvedilol or placebo varied in the four studies from 1 to 1, to 3 to 1. In three of the studies, 240, 221, and 239, carvedilol was titrated to a target dose in the range of 25 to 50 milligrams twice daily, whereas one trial, study 220, was designed as a parallel dose-response study with patients being randomized to placebo or one of three doses of carvedilol. The occurrence of major fatal and non-fatal events within each study and across the entire program was prospectively monitored by a data and safety monitoring board, which recommended early termination of the entire U.S. program because of the finding of a favorable effect of carvedilol on mortality. Early termination of the U.S. program modestly affected follow-up and recruitment in study 240, but had a pronounced effect on both enrollment and follow-up in study 239. 
Finally, the Australia-New Zealand trial was a multi-center study which enrolled patients with heart failure due to ischemic heart disease. Again, following baseline evaluation, patients with an ejection fraction less than 45 percent receiving a diuretic and an ACE inhibitor, were randomized to either placebo or carvedilol. This study had two phases. The short-term phase was designed to evaluate several physiologic and clinical endpoints at 6 and 12 months, whereas the objective of the long-term phase was to evaluate the effect of carvedilol on morbidity and mortality over 18 to 24 months. I would like to begin the discussion of the clinical efficacy of carvedilol with a review of the effects of the drug on morbidity and mortality. There are two reasons to do so. First, the effect of any drug on morbidity and mortality is of unquestioned clinical importance. Second, all of the new information and analyses that have been performed since the May 1996 meeting of the advisory committee have focused on the effects of carvedilol on morbidity and mortality. When the clinical trials program with carvedilol was being designed, there were good reasons to think that the drug could have a favorable impact on the natural history of heart failure. In experimental models of heart failure, long-term beta blockade can prevent the progression of heart failure when initiated early and can reverse the process of ventricular remodeling when started later. Beta blockade has been shown to prolong survival in cardiomyopathic hamsters also. And furthermore, long-term treatment with beta blockade has been shown to reduce the combined risk of morbidity and mortality in large-scale trials with metoprolol and with bisoprolol. Evidence that carvedilol could reduce morbidity and mortality emerged from the early single-center studies with the drug. 
One of the single-center studies enrolled high risk patients and in this study the investigators performed a retrospective analysis of the effect of treatment on major cardiovascular endpoints. Morbidity and mortality was defined as death, plus hospitalization for worsening heart failure, or resuscitated sudden death. The combined risk of morbidity and mortality was significantly reduced in the carvedilol group, p equals 0.028. Thus, based on the experimental data, information from trials with other beta blockers, and the results of carvedilol in the study just described, the hypothesis that carvedilol had a favorable effect on morbidity and mortality was developed and then prospectively evaluated in the two largest placebo-controlled trials carried out with the drug, studies 240 and 223. These two trials also had the longest duration of follow-up. First, I will describe study 240. This slide shows its design, which is typical for all of the studies in the carvedilol program. After an initial screening period, all patients received open-label therapy with low doses of carvedilol for 2 to 3 weeks. Those who tolerated 6.25 milligrams twice daily were then randomized to carvedilol or placebo in a 2 to 1 ratio. The double-blind period consisted first of an up-titration period where patients were increased to the target dose, and after that by a 12-month maintenance period, during which background therapy was to be kept constant. Of the 389 patients who entered the run-in period of study 240, 23 did not complete for the reasons shown at the top of this slide. As a result, 366 patients were randomized, 134 to placebo, 232 to carvedilol. The two groups were balanced with respect to baseline demographic and clinical characteristics. During double-blind therapy, approximately 10 percent of placebo patients but only 7 percent of carvedilol patients failed either to complete double-blind therapy or to remain active at the time of the DSMB decision to terminate the U.S. program.
The most important reasons for withdrawal were death and worsening heart failure, and these events occurred more frequently in the placebo group than in the carvedilol group. The primary endpoint for study 240 specified in the original protocol was the combined risk of morbidity and mortality. This was prospectively defined as the composite of three events: first, death due to heart failure, or sudden death; second, hospitalization for worsening heart failure; and third, worsening heart failure that was severe enough to require a sustained increase in background medication as defined on this slide. Several physiologic and clinical measures were prospectively defined as secondary endpoints in this trial, and these will be discussed a little later in this presentation. It should be noted that the FDA medical and statistical review refers to incomplete data on study medication in this trial, including a reference to missing data on 60 percent of patients. While study medication data have always been complete since the submission of the NDA, prior to May 2nd, 1996, the data were not in an easily extractable form for computer analysis. On that date, SmithKline Beecham delivered new data sets and programs to the FDA that allowed for complete electronic replication of our results. Again, I want to emphasize that no medication data are missing on any of the patients. This slide shows the effect of carvedilol on the primary endpoint of study 240. When all randomized patients were included in the analysis, 21 percent of the placebo group, but only 11 percent of the carvedilol group experienced one or more of the three events that comprised the primary endpoint. This difference reflected a 48 percent reduction in relative risk, significant at a p value of 0.008. This slide also shows the breakdown of the three events in hierarchical order, with death taking precedence over hospitalization, which took precedence over a change in CHF medications.
As can be seen, each component of the primary endpoint occurred less frequently on carvedilol than on placebo. The same data as shown on the previous slide are displayed here as Kaplan-Meier plots using a time-to-first-event analysis. As can be seen, the curves diverge early and continue to separate over time. Placebo is indicated in blue-green and carvedilol in yellow. Using this analytic approach, carvedilol reduced the combined risk of morbidity and mortality by 53 percent, with a p value of 0.005. To further evaluate whether the effect on the combined endpoint was related to a concordant effect on its individual components, the hierarchical analysis was repeated twice after sequentially omitting the least important components. This was done in response to a specific FDA request, and these additional analyses were submitted by the sponsor to the FDA since the first advisory committee meeting on carvedilol. The top line shows the primary endpoint as defined in the protocol, and as I showed on the previous slides. The second line shows the effect of carvedilol on death and hospitalization. This analysis was carried out by omitting the medication component. Please note that the risk reduction remains stable and the treatment effect remains significant. This third line shows the effect of carvedilol on heart failure deaths. This analysis was carried out by omitting any contribution of either hospitalizations or background medications. Even when more than 90 percent of the events are eliminated, the effect of carvedilol remained significant. The analyses on this slide demonstrate that the benefit of carvedilol on the primary endpoint in study 240 does not depend on the medication component. These results and their supportive analyses demonstrate that when added to conventional therapy for up to 15 months, carvedilol reduces the combined risk of morbidity and mortality in patients with chronic heart failure. 
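For readers unfamiliar with the Kaplan-Meier plots used in the time-to-first-event analysis, the following Python sketch shows how the product-limit estimate behind such a curve is computed. It is a minimal illustration under stated assumptions: the follow-up times are invented, the function name `kaplan_meier` is hypothetical, and nothing here reproduces the study 240 data or the sponsor's software.

```python
# Illustrative sketch only: the follow-up times below are invented, not trial data.

def kaplan_meier(times, observed):
    """Return [(time, event_free_probability)] at each distinct event time.

    times    -- follow-up time per patient (months to first event or censoring)
    observed -- True if the first event occurred, False if the patient was censored
    """
    data = sorted(zip(times, observed))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        events = 0
        leaving = 0
        # Group all patients who share this follow-up time.
        while i < len(data) and data[i][0] == t:
            events += int(data[i][1])
            leaving += 1
            i += 1
        if events:
            # Product-limit step: multiply by the fraction surviving this time.
            survival *= 1.0 - events / at_risk
            curve.append((t, survival))
        # Patients with an event or censored at t leave the risk set.
        at_risk -= leaving
    return curve


# Hypothetical arm of six patients.
toy_times = [2, 4, 4, 6, 9, 12]
toy_observed = [True, True, False, True, False, False]
for month, prob in kaplan_meier(toy_times, toy_observed):
    print(month, round(prob, 3))
```

Computing one such curve per treatment arm and plotting them together gives the kind of display described on the slide, where early divergence of the curves indicates an early treatment effect.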
Next I want to review the results of study 223, which was the second large-scale trial that prospectively evaluated the effect of carvedilol on morbidity and mortality. Again, this slide shows the study design and as in study 240, after an initial screening period, patients received low doses of carvedilol and those who were able to tolerate 6.25 milligrams twice a day were then randomized in a 1-to-1 fashion to either carvedilol or placebo. This was then up-titrated during the first 2 weeks of the study until target doses were reached, and after that, patients were continued on double-blind therapy for a maintenance phase that extended for 18 to 24 months. Please remember that the study was divided into two phases, a short-term phase, which was designed to evaluate the effects of carvedilol on exercise tolerance, ejection fraction, and left ventricular dimensions, and the long-term phase, which was designed to evaluate the effect of the drug on the combined risk of morbidity and mortality. A total of 415 patients were randomized into the study, 208 to placebo, 207 to carvedilol. The two groups were well matched for most baseline characteristics, except for nominally significant differences for sex, New York Heart Association class, and history of angina. This study was an investigator-initiated study. The original protocol was written by the investigators, and it did not clearly define primary or secondary endpoints. However, the protocol defined specific objectives for the two distinct phases of the study. First, it defined the objectives of the short-term phase. "Data on exercise capacity, left ventricular function, and left ventricular size will be collected at baseline, and after 6 and 12 months of follow-up." Next, it defined the objectives for the long-term phase. "Data on mortality and major morbidity, including hospitalization for worsening CHF, and signs and symptoms of heart failure will be collected over 18 months of follow-up in all patients." 
The two phases were quite distinct. The short-term objectives were not evaluated during the long-term phase, and the long-term objectives were not analyzed during the short-term phase. Carvedilol exerted a favorable effect on two of the three short-term objectives, specifically on left ventricular ejection fraction and on left ventricular dimensions, but it had no effect on maximal exercise tolerance. Now, I want to focus the remainder of this discussion on mortality and major morbidity in this study, and in so doing, I want to emphasize that this information has been analyzed and submitted by the sponsor to the FDA since the first advisory committee meeting on carvedilol. Mortality and major morbidity was defined by the investigators as the main objective of the long-term phase of the study. This was not only noted in the original protocol, but it was specifically reiterated in writing by the investigators following the completion of the short-term phase. Before the blind of the study was broken, mortality and major morbidity was operationally defined as the combined risk of all-cause mortality and all hospitalizations. This was to be analyzed by a time-to-first-event approach, using a log-rank test. Morbidity and mortality was monitored during the trial by a prospectively constituted data and safety monitoring board. All patients in this study were to be followed for the intended duration of double-blind therapy. No patient was lost to follow-up for the analysis of mortality. And in addition, no patients were lost to follow-up for the analysis of hospitalizations. This slide was an earlier version, and the reason for that is we have now obtained complete follow-up information on hospitalizations until the end of this study in all 7 patients. An analysis you will see today, hopefully, reflects the inclusion of these new data. 
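The long-term endpoint of study 223 was to be analyzed by a time-to-first-event approach using a log-rank test. The Python sketch below shows the standard two-group log-rank calculation. It is an illustration only: the data are invented, the function name `logrank_test` is hypothetical, and the transcript does not say what software or exact variant of the test was used.

```python
# Illustrative sketch only: the follow-up times below are invented, not trial data.
import math


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi_square, p_value) on 1 degree of freedom."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in pooled if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n = sum(1 for u, _, _ in pooled if u >= t)                # at risk, both groups
        n_a = sum(1 for u, _, g in pooled if u >= t and g == 0)   # at risk, group A
        d = sum(1 for u, e, _ in pooled if u == t and e)          # events at time t
        d_a = sum(1 for u, e, g in pooled if u == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the group A event count at time t.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    if variance == 0:
        return 0.0, 1.0
    chi_square = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Hypothetical months to death or first hospitalization; False marks censoring.
placebo_times = [3, 5, 7, 9, 14, 18]
placebo_events = [True, True, True, True, False, False]
active_times = [6, 11, 16, 18, 18, 18]
active_events = [True, True, False, False, False, False]
chi2, p = logrank_test(placebo_times, placebo_events, active_times, active_events)
print(round(chi2, 2), round(p, 3))
```

The test compares the observed number of first events in one arm with the number expected if both arms shared the same hazard at every event time, which is why it pairs naturally with Kaplan-Meier displays.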
At the end of the trial, 80 to 85 percent of the patients at risk were still taking double-blind medication and this proportion was similar in the placebo and carvedilol groups. This slide shows the effect of carvedilol on all-cause mortality and all hospitalizations. As can be seen, the curves diverged after about 6 to 7 months and continued to separate over time. Overall, carvedilol reduced the combined risk of morbidity and mortality by 23 percent, p equals 0.045. This slide shows the effect of carvedilol on the individual components of the combined endpoint. The effect on mortality alone and the effect on morbidity alone are concordant with the overall effect on the combined endpoint. Although not specified in the original protocol, the FDA requested that the effects of carvedilol on morbidity and mortality in study 223 be adjusted for baseline covariates to account for the potential imbalances in the two groups at baseline. To do so, three covariates of prognostic significance were selected: NYHA class, maximal exercise tolerance, and ejection fraction. When any combination of these three variables was included as covariates in a Cox regression model, the magnitude of the risk reduction with carvedilol remained extremely stable and the effect of carvedilol remained significant. To summarize at this point, the effect of carvedilol on morbidity and mortality was prospectively defined and evaluated in two studies, study 240 and study 223. In both studies carvedilol produced an important effect to reduce the combined risk of morbidity and mortality, over a follow-up period as long as 18 to 24 months. In order to further substantiate this effect, the FDA requested additional retrospective analyses of the other large multi-center studies, that is, studies 221 and 220, recognizing that study 239 was too small and too brief to produce meaningful results. 
Both studies 221 and 220 were designed to evaluate the effect of carvedilol in patients with moderate to severe heart failure and had virtually identical study protocols. Very quickly, this slide shows the design for study 221 and you will notice it is identical to study 240, except that patients were randomized 1 to 1 to carvedilol or placebo, and the maintenance phase was 6 months rather than 12 months. Of the 301 patients who entered the run-in in this study, 23 did not complete, for the reasons shown up here, and therefore, 278 patients were randomized to placebo or carvedilol, 145 to placebo, 133 to carvedilol, and the two treatment groups were balanced with respect to baseline clinical and demographic characteristics. During double-blind therapy, almost 21 percent of the placebo patients, but 14 percent of the carvedilol group, failed to complete double-blind therapy. All reasons for withdrawal, including withdrawals for death and worsening CHF, were more frequent in the placebo group than in the carvedilol group. The primary endpoint for study 221 was exercise tolerance. Several assessments of symptoms and clinical status were specified as secondary endpoints. One of the prespecified secondary endpoints was morbidity, which was defined in the protocol as hospitalization for cardiovascular reasons. In addition, mortality was also specified in the original protocol as a safety objective and was prospectively monitored by an independent committee. This slide shows the effect of carvedilol on these prespecified analyses. When analyzed according to the original protocol, carvedilol reduced the risk of cardiovascular hospitalization by 46 percent and the risk of death by 43 percent. Although the effect on morbidity alone was nominally significant, one needs to be very careful about looking at morbidity alone. 
Since patients with the most rapid or severe deterioration die before being hospitalized, an analysis of morbidity alone excludes patients with the worst outcome. For these reasons, the FDA asked the sponsor to perform an analysis of the combined risk of morbidity and mortality. To ensure the robustness of these analyses, three definitions were used: first, the combination of morbidity alone and mortality alone as defined in the original protocol for study 221; second, the combined risk of morbidity and mortality as defined in study 240; and third, the combination of morbidity and mortality as defined in study 223. This slide shows the effects of carvedilol on the combined risk of morbidity and mortality, using these three definitions. As can be seen, depending on the definition used, the combined risk of morbidity and mortality was reduced by 39 to 43 percent with carvedilol. With the study 240 definition, the p value was 0.004. With the study 221 definition, the p value was 0.029, and with the study 223 definition, the p value was 0.019. This slide shows the Kaplan-Meier curves for the effect of carvedilol on the combined risk of death or hospitalization for any reason. This analysis is being shown graphically because it was the definition used in the study 223 and it makes the fewest assumptions about the specific cause of an event. As can be seen, the curves diverged early and continued to separate during the entire duration of follow-up. Overall carvedilol reduced the combined risk by 39 percent, p equals 0.019. Next, let us take a look at study 220, the second study in patients with moderate to severe heart failure. Shown here again is the design, and again, it is identical to study 221, except now patients are randomized to one of three doses of carvedilol rather than to a single dose. Otherwise, the two protocols were identical. 
Of the 376 patients who entered the run-in, 31 did not complete for the reasons shown up here, leaving 345 patients to be randomized to the groups as shown below. The four groups were balanced with respect to baseline clinical and demographic characteristics. During double-blind therapy, 25 percent of the placebo group but only 8 to 17 percent of patients randomized to carvedilol failed to complete double-blind therapy. The p value for the relation between dose and completion rate was 0.008. Again, as in the case of study 221, the primary endpoint for this study was exercise tolerance. Several assessments of symptoms and clinical status were specified as secondary endpoints. One of those prespecified secondary endpoints was morbidity, defined in the same way, that is, hospitalization for cardiovascular reasons. And mortality was also prespecified as a safety objective for the study. This slide shows the effect of carvedilol on these prespecified analyses. When analyzed according to the original protocol, the risk of cardiovascular hospitalization was reduced by carvedilol by 45 percent, and the risk of death was reduced by 73 percent, with the p value shown here. However, as stated earlier, we should be careful in looking at morbidity alone because patients with the most rapid or severe deterioration died before being hospitalized, and an analysis of morbidity alone therefore excludes patients with the worst outcome. For these reasons again, we were requested to perform an analysis on the combined risk by the FDA and to ensure the robustness of those analyses, we performed them in the exact same three ways as of study 221. That is, the combination of morbidity alone and mortality alone as defined in the original protocol for study 220, the combination of morbidity and mortality from study 240, and the combination of morbidity and mortality from the Australian-New Zealand trial, study 223. 
This slide shows the effect of carvedilol on the combined risk of morbidity and mortality using these three definitions. For all of these analyses, please note that the effect in the three carvedilol groups has been combined into a single group. As can be seen, depending upon the definition used, the combined risk was reduced by carvedilol from 32 percent to some 54 percent. When the study 240 definition was used, the p value was .049. Using the study 220 definition, the p value was .001, and using the study 223 definition, the p value was .002. This slide shows the Kaplan-Meier curves for the effect of carvedilol on the combined risk of death or hospitalization for any reason. Again, this is being shown because it was the definition used in 223 and uses the fewest assumptions in assigning the cause for an event. Again, the curves diverged early, remained separated throughout the follow-up period, and the overall reduction in the combined risk was 49 percent, with a p value of 0.002. This slide and the two that follow summarize the effect of carvedilol on morbidity and mortality in the major multi-center trials. This slide shows the effect of carvedilol on the combined risk of worsening heart failure, leading to death, hospitalization, or a need for increased medications. Please recall that this was the definition of morbidity and mortality and the primary endpoint in study 240. For all three studies, the relative risks are less than 1, and there is considerable overlap of the confidence intervals, none of which include unity. When the studies were combined to estimate overall treatment effect, carvedilol reduced the combined risk of morbidity and mortality by 40 percent, p equals 0.0001. This slide summarizes the effect of carvedilol on the combined risk of death or hospitalization due to any cause, and again, this was the definition of morbidity and mortality from the long-term phase of study 223. 
For all four studies, the relative risks are less than 1, and there is substantial overlap of the confidence intervals. When the studies were combined to estimate an overall treatment effect, carvedilol reduced the combined risk of morbidity and mortality by 31 percent, again with a p value of 0.0001. Finally, this slide summarizes the effect of carvedilol on the combined risk of death or cardiovascular hospitalization. This was the definition of combined risk based on the prespecified definition of morbidity alone and the prespecified safety objective of mortality alone in studies 221 and 220. Again, for all four studies, the relative risks are below 1, the confidence intervals overlap, and when the studies were combined, an overall estimate of the treatment effect showed that carvedilol reduced the combined risk by 35 percent, with a p value of 0.0001. In summary, carvedilol reduced the combined risk of morbidity and mortality in all four studies of sufficient size and duration where this could be evaluated. Such an effect is of unquestioned clinical importance and such consistency cannot be attributed to chance alone. The effect on the combined endpoint is the result of a concordant effect on the components of morbidity alone and mortality alone. For these reasons, we believe that the effect of carvedilol on morbidity and mortality forms the principal basis for the approval of the drug for heart failure. However, the benefits of carvedilol extend beyond morbidity and mortality, and I will now present data on the drug's favorable effects on symptoms and clinical status. A discussion of the effects of carvedilol on symptoms and clinical status is inherently more difficult than the assessment of morbidity and mortality. This is because measures of symptoms and clinical status are not well standardized, are subject to considerable observer bias, and are accompanied by a significant placebo effect. 
There is in fact considerable uncertainty as to how to measure the effect of a new drug on symptoms and clinical status. No single measure has been developed that can accurately reflect the utility of a new agent. Present measures assess three distinct but complementary aspects of heart failure, that is, the severity of symptoms, the impact of heart failure on the ability to exercise, known as functional capacity, or the impact of heart failure on general well-being and quality of life. Each of these three aspects of the disease can be assessed either qualitatively or quantitatively. The qualitative measures resemble the usual interaction between a physician and a patient in the clinical setting whereas the quantitative measures are primarily used in clinical trials. Most of these measures of efficacy were utilized in the carvedilol development program. With this in mind, I want to review the effects of carvedilol on the placebo-controlled trials carried out with the drug on symptoms and clinical status. First I will focus on the three single-center studies. These studies were primarily designed as hemodynamic studies. Each achieved its primary, prespecified endpoint, which was a measure of cardiac performance. However, each trial also specified measures of symptoms and clinical status as secondary endpoints. The shorthand used on this slide and subsequent ones expresses the results in a placebo-subtracted change at study endpoint. Positive changes favor carvedilol, negative changes favor placebo. Results that are nominally significant at the p equal 0.05 level are highlighted in yellow. Despite their small size and relatively brief duration, carvedilol had a significant effect on heart failure symptoms in both trials in which it was measured. It also had a significant effect in two out of the three trials that measured New York Heart Association class. In contrast, carvedilol had inconsistent effects on exercise tolerance. 
The drug had no effect on maximal exercise capacity in any of the three studies, and it had equivocal effects on submaximal exercise capacity. Studies 033 and BMI-01 assessed submaximal exercise capacity by observing the duration of exercise at 80 percent peak effort, and carvedilol had a favorable effect in BMI-01, but did not have it in 033. Only one study evaluated the effect of carvedilol on the 6-minute walk of the single center studies, and in this study the effect was significant. The inconsistency of the effects of carvedilol on exercise tolerance was attributed to the known ability of beta-blockers to attenuate exercise performance. Next I want to turn our attention to the effects of carvedilol on symptoms and clinical status in the multi-center trials carried out with the drug. As you will recall, there were five multi-center trials but I want to focus your attention on four of the trials. Because study 239 was terminated early and had only a brief duration of treatment, I will not discuss the specific results of this study, although those were presented in the briefing document. Instead, I will focus on the other four studies. First, study 240. As you will recall, the primary endpoint for this study was morbidity and mortality. However, this protocol also specified a variety of clinical measures as secondary endpoints. This slide shows the effect of carvedilol on symptoms and clinical status in this study. As can be seen, carvedilol exerted a favorable effect on each of these measures, and this effect was statistically significant for New York Heart Association class, CHF symptom score, physician and patient global assessment. For all qualitative measures of efficacy, more patients felt better and fewer patients felt worse with carvedilol than with placebo. Thus, the results of study 240 are consistent with the results of the single-center trials. 
Specifically, carvedilol improved the symptoms and functional capacity of patients with heart failure, but had no effect on maximal exercise capacity. Next, let us proceed to study 221. In this study exercise tolerance, both near-maximal and submaximal, was specified as the primary endpoint, and a variety of clinical measures were specified as secondary endpoints. By design, these clinical measures included the same measures of clinical efficacy assessed in the study I just showed, 240. Carvedilol had no effect on maximal exercise tolerance as measured by the 9-minute treadmill distance. By the protocol-specified analysis, the drug had only a small effect on the 6-minute walk distance, which was not statistically significant. However, this analysis excluded 26 patients who did not have on-therapy assessments, and imputed values for 23 others who did not complete the study. Alternate approaches to this analysis that attempted to include a greater number of patients with less imputation of data are shown here for your interest. This slide shows the effect of carvedilol on the prespecified secondary endpoints of symptoms and clinical status in study 221. As can be seen, carvedilol exerted a favorable effect on all measures, and this effect was statistically significant for New York Heart Association class, physician global assessment, and patient global assessment. As in the case of study 240, for the qualitative measures of efficacy, more patients felt better and fewer patients felt worse with carvedilol than with placebo. And adding these results onto the previous results I have shown, we can see that they are consistent with the single-center results, as well as with study 240. Specifically, carvedilol improved the functional capacity of patients with heart failure but had no effect on maximal exercise capacity, and only small effects on submaximal exercise capacity. Next, I will discuss the results of study 220. 
Recall that this was identical in design to study 221, except for the number of treatment arms. In this study there were three treatment arms for active drug, representing three separate doses. The primary and secondary endpoints here, identical to study 221, were exercise tolerance, but this study was designed to evaluate the significance of the observed dose response relationships that compared carvedilol with placebo. Because the effects of carvedilol on symptoms and clinical status were not related to dose, none of the primary analyses of study 220 were statistically significant. However, when all of the doses were combined and compared with placebo, there was a parallelism between the effects of carvedilol shown in study 221, which I previously displayed, and the effects in study 220 for a number of the endpoints. This is not too surprising since, again, these two studies recruited the same types of patients and had a similar design. Finally, I would like to present the results of study 223. You will recall that this study had two principal objectives, a short-term one, based on clinical and physiologic measures, and a long-term one based on morbidity and mortality. Measures of symptoms and clinical status were not described in the protocol but were collected in the case record form. Although there was a mild tendency for worsening on carvedilol after 6 months, this was no longer present after 12 months of treatment, and the results for 12 months are shown here. For all three measures, there was little difference between carvedilol and placebo. In part, this can be attributed to the large number of minimally symptomatic patients in this study who mathematically could not improve. Lastly, I want to present the effects of carvedilol on a measure of clinical status that was used in all of the U.S. multi-center trials: the global assessment of well-being. 
This measure requires both the patient and the physician to ascertain how in general terms the patient is doing compared with the patient status at the start of the study, so it's a relative measure. This simple measure closely resembles the discussion that normally takes place between patients and physicians in an office setting. Here I summarize the effect of carvedilol on the global assessment using the protocol-specified approach. An up arrow indicates the percentage of patients who improved in a given treatment arm, and a down arrow indicates the percentage who worsened. In each of the three largest U.S. multi-center trials, regardless of whether the assessment was by the patient or by the physician, more patients improved and fewer patients worsened on carvedilol compared to placebo. The effect was statistically significant or nearly so in all six assessments. This slide summarizes the effects of carvedilol on symptoms and clinical status. Carvedilol produced a consistent improvement in CHF symptoms, a consistent improvement in NYHA class, and a consistent effect on overall well-being, but only small effects on exercise tolerance and quality of life. The effects of carvedilol on CHF symptoms, NYHA class, and overall well-being were nominally significant in a number of trials. In addition, in each of these four trials carvedilol improved at least three different measures of clinical efficacy. Two of these trials were single-center studies, and two of the trials were multi-center studies. We understand that we should be cautious in interpreting an effect on a secondary endpoint that did not reach significance on the primary endpoint, but we should note that in all four of these studies the effect of carvedilol on the primary endpoint was statistically significant, or nearly so. In studies 035 and BMI-01, carvedilol improved the primary endpoint of cardiac function. 
In study 240, carvedilol achieved the primary endpoint of morbidity and mortality, and in study 221, carvedilol improved the primary endpoint of 6-minute walk distance. This pattern of consistency within and across studies is unlikely to have occurred by chance alone, and supports the favorable effect of carvedilol on morbidity and mortality. Upon the request of the FDA, we will not be presenting detailed information about the safety of carvedilol because the FDA has not raised any safety concerns about the use of carvedilol in heart failure in the clinical trials. Full information about safety is contained in the briefing document provided to the committee. We do wish to emphasize the following points, though. First, that carvedilol was safe and well tolerated in the clinical trials when used with caution, when up-titration was guided by specific algorithms and when investigators were fully educated about the risks. In the absence of these measures, the relationship of benefit to risk may be considerably altered. Second, side effects are frequent during initiation of therapy, but these can be managed in most cases by careful titration of carvedilol and by changes in the doses of concomitant medications. Third, unlike many other drugs evaluated for heart failure, carvedilol exerts no adverse effect on survival. In conclusion, we have demonstrated that carvedilol produces improvement in each aspect of the hierarchy of the evaluation of heart failure. First, carvedilol has substantial effects on cardiac function in patients with heart failure. Ejection fraction was improved in all eight placebo-controlled trials carried out with the drug and is accompanied by a favorable change in cardiac dimensions. Second, carvedilol produced consistent effects within and across studies to improve the symptoms and clinical status of patients with heart failure. 
This was shown best on the qualitative measures of patient assessment such as symptoms, New York Heart Association class, and global assessment. These situations most nearly reproduce the usual clinical interaction between a physician and a patient. And third, carvedilol produced substantial effects on the unquestionably important endpoints of morbidity and mortality, as shown in this summary slide. These benefits are observed consistently across the controlled trials carried out with the drug, regardless of the definition used, and additional analyses requested by the FDA confirm the robustness of this effect. Therefore, based on all the data presented today and the documents submitted to the committee, the carvedilol SNDA meets the usual standard employed to conclude that drugs are effective in the treatment of heart failure. Thank you. DR. DiMARCO: The schedule calls for a break now, but I think in view of the time, the fact that we started at 2:15, we'll continue until about 4 o'clock before we take a break. Dr. Lipicky, do you or any members of the division wish to say anything at this point? DR. LIPICKY: No. DR. MOYE: Can I ask a question of the FDA? Is it true that the issue of medication status has been resolved to your satisfaction in study 240? DR. LIPICKY: That's a very complicated question. I guess the short answer is yes. The long answer is that, depending on what you do, you see there are slightly different p values associated with the analysis. Regardless of what you do and what you assume, it's always got better than a nominal p of .05. DR. MOYE: So, regardless of what you do, the p value winds up being less than .05. DR. SHUSTERMAN: But Dr. Moye, the issue that arose last time, whether the FDA had executable files to run algorithms, has been solved. DR. LIPICKY: Okay, fine. Let me say, the problem was that in the original data that we obtained -- we had all of the original data that was needed to run analyses ourselves. 
The data in those files regarding the medications was in our judgment not sufficiently obvious to be able to decide what the basis for decisionmaking was. Those files were sent. We don't think it's a big issue. Did I say something wrong? And so okay, I was just checking with the reviewers, and they say we don't think that's a big problem. DR. DiMARCO: Dr. Temple? DR. TEMPLE: The reason it's not a big problem is that we did an extremely conservative analysis based on the small fraction of patients whose medications were thought to be adequate, the result of which was to greatly reduce the data set, but it was still significant at the usual level, even doing that, which is very adverse when you do an analysis for only about a third of the patients. DR. DiMARCO: Dr. D'Agostino? DR. D'AGOSTINO: Can I ask a couple of questions? I wasn't here for the previous discussion so you have to excuse me if I ask the same questions as were already asked and clarified. I have a couple of problems with the idea of having as an endpoint death due to CHF as opposed to all-cause mortality. You've done both. Also the hospitalization. But in the analyses or in the studies where you actually did do the hospitalization due to CHF and the death due to CHF, could you just say some words about how you decided somebody dies of CHF and hospitalizations? DR. SHUSTERMAN: The technical answer is that the investigators coded the deaths when they occurred and coded the hospitalizations. They were then reviewed internally in a blinded fashion to ensure that deaths that reasonably could be considered also to be CHF were not excluded. But it turns out the majority of the deaths, 70, 75 percent, would be considered related to heart failure, either a sudden death, which is why we included that definition, or worsening of the signs and symptoms ultimately leading to death. DR. D'AGOSTINO: One of your analyses that has a nice p value with it represents four deaths in one group and zero in another group. 
I sit on the Framingham committees where we try to figure out what people are dying of. My sense of our imprecision would put a few of the zeroes, deaths suddenly. That group would probably have quite a number that might be related to CHF. I like the all-cause mortality and I like the hospitalization for the sort of all-cause or the CVD hospitalization. Just what I am trying to clarify is I don't think we should keep jumping back and forth between the two because I think there are problems with the CHF hospitalization and mortality. DR. DiMARCO: I think Dr. Califf has a comment here. DR. CALIFF: I just want to stay on this line, and just for understanding because I really agree with Ralph and I would hope that maybe we could quickly dispose of cause-specific endpoints in the discussion. But just to be sure, you said they were internally reviewed also. By whom, and was the internal review blinded? And are the data you are presenting the internal review or the investigators' call on the cause-specific endpoints, both death and hospitalization? DR. SHUSTERMAN: The internal review was done by the medical monitors for the trials. It was done before the blind was broken. And what was the last part of your question? DR. CALIFF: The data you presented, is it the investigators' call or is it the internally reviewed call? DR. SHUSTERMAN: The more conservative call was used, so if the investigators excluded a death and ours included it, we would include the death, or the hospitalization. DR. DiMARCO: Dr. Temple, you had a comment? DR. TEMPLE: There is a slight problem in your preference for all-cause mortality. It wasn't the identified primary endpoint, so you get into a bit of a box. We don't disagree with you at all, but in study 240, the identified endpoint was -- DR. D'AGOSTINO: No. I understand that. 240 is 240, and we make whatever we can out of that, but we're going to move to looking at mortality across these studies. DR. TEMPLE: That actually helps a lot. DR.
DiMARCO: Dr. Borer. DR. BORER: When the multi-center study was stopped all the patients on placebo were offered carvedilol. DR. SHUSTERMAN: That is correct. DR. BORER: I don't know how many of them took it, and I don't know whether you followed these patients, but that's going to be my question. And I understand fully all the problems associated with looking at data you didn't expect to look at and all that. Just my own information, if you have the data, it would be of interest to me to know whether a substantial majority of the patients who were on placebo took the carvedilol and if you followed them. And if you did follow them, was there an upward break in the survival curve in that previously placebo-treated group. So that it would approach the carvedilol curve. Do we know that? DR. SHUSTERMAN: I can tell you that greater than 90 percent of the patients in both arms at the time the trials were discontinued, continued on open label carvedilol. We have done the analysis that you have suggested. That is, look at the mortality in patients previously on placebo, switched over to carvedilol. As you can imagine, because it's not a concurrent control, it is, I think, a sloppy analysis, but the curve for those patients for mortality tracked the group that started on carvedilol. That we have on a slide and there it is. The lowest line is placebo during the double-blind trials. This is carvedilol during the double-blind trials, and this line is patients previously on placebo switched over to carvedilol. The numbers drop off after about a year or so here when this analysis was done, so I wouldn't pay too much attention to the tails, but for the first part of the curve virtually overlapping the original carvedilol. DR. COLUCCI: If I could just add, this data, this analysis was exactly as shown here, shown at the American Heart Association meeting in November. 
The other interesting line there are the patients on drug who have now been followed for a median time of over 2 years. DR. DiMARCO: Could you please identify yourself, sir? DR. COLUCCI: Wilson Colucci. I was the PI of study 240. So, I just wanted to amplify a little bit on this data. This was an analysis undertaken by the investigators and presented as an abstract at the American Heart meeting last November. Then to amplify further, the treatment group, who have now been continued for a median treatment of over 2 years, have continued on a virtually identical trajectory to that shown there. DR. KONSTAM: And, Bill, which trials are represented in these? Are these all four U.S. trials? DR. COLUCCI: These are all of the patients in the U.S. carvedilol trials. DR. SHUSTERMAN: That is correct. This is a combined. DR. DiMARCO: Dr. Califf? DR. CALIFF: I have a couple of questions. Maybe I should go to the last one first because I don't know if you have the data handy. What would help me the most would actually be to have the actual data for death and all-cause hospitalization for all the studies that have been done. It seems like it's been hard for me to follow. There's so much jumping around. But if we just had all-cause mortality and all-cause hospitalization with the number of patients, the number of events, and the odds ratios for each study and then combined, I think that would give us a very nice way of looking at the data in an unambiguous way that would get us out of this flipping back and forth. I don't know if you have the data handy. It seemed like one of the slides you showed almost had all of it but not quite. DR. SHUSTERMAN: The last slide showed it obviously graphically in terms of the relative risk and the confidence intervals as lines, but you want the actual numbers. DR. CALIFF: It didn't have all the studies. It was missing at least one. DR. SHUSTERMAN: It was only missing 239.
That was the small study that was terminated early and had about 3 months of follow-up. But if we could put the last slide on. DR. CALIFF: If there's a way to get at the actual numbers, it would help me, and you'll understand why when I ask the next couple of questions. The odds ratio plots look nice but the numbers are not there. Let me go ahead with the other questions while you're looking for the slide. This is a simple technical question I'm sure you've dealt with already in terms of the left ventricular function endpoints. How many missing values did you have, and what did you do with the missing values in the analysis in terms of people who died or people who had inadequate studies, or just didn't come back? DR. SHUSTERMAN: You're talking about for ejection fraction? DR. CALIFF: Yes. DR. SHUSTERMAN: For ejection fraction, that represents all patients with a baseline and an on-therapy value. So, we did not impute data for patients who were missing and we didn't carry forward baseline values for those who didn't have an on-therapy value. DR. CALIFF: The concern is obviously the people who were sicker were more likely to die and not come back, and so the natural outcome of a study where you had deaths would be that the ejection fractions would go up, since the patients who were better off were more likely to be alive. Of course, you have a comparison with a control group, a placebo group. I'm just wondering if you did any worst case kind of analyses to account for that. DR. SHUSTERMAN: Well, we did not do a worst case analysis on ejection fraction because we provide that data to help explain the hemodynamic effect of the drug, but as has been shown earlier, that's not the basis that we're seeking approval for. DR. CALIFF: But the same would hold for symptomatic status. DR. 
SHUSTERMAN: Well, for symptomatic status we actually have done those analyses and when you do that kind of effect, that is put in worst case for a patient who drops out, then actually the results look better than I have shown here. DR. FISHER: Rob, your argument is correct. People who die are more likely to have low ejection fractions, but there are more deaths in the placebo arm, and it was compared to placebo, which would tend to bias against the active therapy -- DR. DiMARCO: Could you please identify yourself, sir? DR. FISHER: -- if you have a superior survival. DR. CALIFF: Lloyd Fisher. I'll identify him. (Laughter.) DR. CALIFF: But that's an indirect argument. I'm really just asking whether you directly address the problem in the analyses. It's also true that people who are sicker, feeling worse, are less likely to come back to get their non-invasive study. I'm just asking if there were any direct analyses done. It sounds like there weren't. The last question is, I'm really hung up on the run-in phase, and this was probably reviewed last time and I wasn't here. But it seems to me that our charge is to try to answer the question, would we recommend this as a therapy for a physician and a patient with heart failure, who doesn't have the opportunity to go through a run-in phase before things start to count. I don't really know how to deal with that. I don't know how to put a study in which all the patients that had bad things happen are eliminated before you start the treatment. I'm sure that you must have done some indirect analyses to try to sort of take that into account, but it's a bothersome thing that I hope the committee and the clinicians and statisticians will help me out with. And the reason it's important to me and the reason I keep asking for numbers is that the differences -- all of the p values are small. The numbers of events are also relatively small. 
And if you did a worst case analysis, for example, and counted all the deaths and all the drop-outs as attributable to the treated group, I wonder what the results would look like. DR. SHUSTERMAN: Before we show that slide, could we hold up on that slide because I want to address that question. I think you raised some really important issues there. The first is, the use of a run-in design is not at all unusual and in fact, it has been preferred at times by the FDA because it actually enhances the power of the trial. Of course, that leaves you with the dilemma, what to do with the events during that time and you have stated that. The problem there is, of course, there is no control to compare against. You don't know whether to attribute an event to the drug, to the patient's underlying disease, or to the phase of the moon. But what we have done and what we believe is that the most appropriate analysis, where there is a control, is from the point of randomization. Nonetheless, we've done several analyses and Dr. Fisher can speak to the one on mortality, where we have included the run-in events, and they do not affect the overall effect of carvedilol. We've done that on 240, 223, and the overall mortality. I think maybe Dr. Fisher would like to speak about mortality. DR. FISHER: I think it's a very important issue and I agree it's debatable whether there should be run-in periods of mortality trials because as nice as enrichment trials are in certain situations, here you worry you help some people but other people you hurt. The net effect is zero, but you kill off the high risk people early so they're not in the trial when you randomize. I thought this was going to come up last time and I brought a lot of data. I don't have my data with me but I can tell you the results of my analysis and then the agency statisticians can check it when they get done. 
I did a lot of analyses, and what I did was compared both the overall mortality rate on carvedilol during the run-in period with the carvedilol patients during the trial and they were the same. Of course, you don't have incredible statistical power, but there was no directional trend or anything. It looked the same. And looking at the very short early survival curve, because that's all you have in the run-in period, it looked the same. So, there's no indication whatsoever that a high-risk group was eliminated differentially because to me that was a very important concern. And I think it potentially could be very important in a setting such as this, and I was surprised it didn't receive more attention last time. DR. MOYE: Lloyd, just to follow up. What would happen if you assumed very conservatively that all of the patients who died in their run-in period would have wound up in the active group and all the survivors wound up in the placebo group? DR. FISHER: Well, it would have shifted things in the opposite direction than if I had assumed they all would have ended up in the placebo group and died. DR. MOYE: But do you think that the p value would have changed substantially from what was reported for total mortality for the U.S. studies? DR. FISHER: To me, Lem, to be honest, this is an irrational analysis. DR. MOYE: It's a very conservative analysis. Whether it's irrational or not I don't know. DR. FISHER: We can all agree to that. I haven't done that, so I don't directly know the answer to that. Maybe somebody else has. But all I can tell you is I compared, as best I could, the rate and there was no indication of an elevated mortality rate on carvedilol during the run-in period. DR. CALIFF: I'll buy that. I'm not challenging that you did that. 
You would agree -- and I sense a little uncertainty on your part -- that it's a little hard to say, I'm going to give you this treatment and if you make it through the first 2 weeks, then we've proven that things will be better. That's not ideal. But it looks like there are about 15 deaths in the run-in phase. The document has a nice summary of the run-in phase of each study. It looks like about 15 deaths, and you could pretty well -- DR. SHUSTERMAN: Seven deaths in the U.S. placebo controlled multi-center trials. Seven deaths. DR. CALIFF: There are 4 in study 035 in the run-in phase, and then there are 2 deaths in the New Zealand study. DR. SHUSTERMAN: That's right. Two deaths, plus seven. We presented the U.S. multi-center phase and the Australian-New Zealand. DR. CALIFF: I actually got 9. Well, anyhow, between 13 and 15 deaths. DR. PACKER: This is Milton Packer. Your question I think relates to whether there were a number of run-in events and were they of significant number, given the numerators, the total number of events in the trial because it's hard to know how to make these adjustments for run-in events, but if the numbers are small then I guess everyone would be reassured. I'm trying to get the exact numbers, but I can give you something that I think is within about 10 events, and we'll get the exact numbers for you in just a few minutes. But over all the placebo-controlled trials, there were about 400 deaths and hospitalizations for any cause. In the placebo group, it was 200 over 604, and the carvedilol group about 190 over 903. That's a 32 percent event rate in the placebo group, a 21 percent rate in the carvedilol group. You have already seen the p value is .00001. But we are talking about 400 events. DR. CALIFF: But if we look at those who dropped out in the run-in phase -- and about half of those appear to be due to either death or heart failure -- that's going to be about 120 to 140 additional events, divided by half, maybe 70. 
I'm not saying I know what to do with this. It's just a point of discomfort. I've said enough and I'll stop it now. But I hope that people on the panel will help me understand how to deal with this. DR. DiMARCO: Ralph? DR. D'AGOSTINO: I'd like to ask another question about study 223. I'm not clear on this long-term follow-up piece. Were the individuals still under a double-blind regimen, or was it an open-label component? It's not clear to me. DR. SHUSTERMAN: That's a good question. It was double-blind treatment for the entire 18 to 24 months of the trial. DR. D'AGOSTINO: So, all of the analysis -- DR. SHUSTERMAN: No open-label at all. DR. D'AGOSTINO: Can I ask one more question? It's unfair, but let me ask it. Do primary events mean nothing in the company? What is the company policy when they run a study and the primary events do not turn out to be significant? Is it then assumed that you can ignore them? This is a philosophical question, but I'm trying to get a sense of how I'm to respond to these post hoc and retrospective analyses in the presence of a non-significant primary. Could you tell me what you would do? Would you multiply something, a multiplicative factor for the analyses you've done? Or would you just ignore it? DR. SHUSTERMAN: I'm going to ask Dr. Jim Tiede, who's head of biometrics at SmithKline to address that. DR. TIEDE: Ralph, I'm not quite sure which studies you're referring to, but if it's studies 220 and 221, we looked at those as more confirmatory information. We had the prespecified endpoints for 240 and 223, but then wanted to see if that would hold up when we looked at other trials. So, it wasn't that we were ignoring the primary endpoint. We recognized that we didn't achieve it. We were looking just to see, did those studies provide a result that was contradictory to the two primary studies. DR. DiMARCO: Dr. Lipicky, do you have a comment? DR. LIPICKY: Milton, those numbers of 400, that was for all kinds of events? DR. 
PACKER: No cause-specific. This is the most conservative, no-bias interpretation. DR. LIPICKY: I just looked at studies 240 and 239 and for deaths, all told there were 10. A big jump to 400. DR. PACKER: There were a lot of hospitalizations. DR. LIPICKY: I see, so that's in hospitalizations -- DR. PACKER: And hospitalizations, which is the -- and it's all-cause mortality and all hospitalizations. DR. LIPICKY: Well, that was 10 all-cause deaths I was talking about. DR. PACKER: Yes. The total number of deaths I believe in the U.S. program, all-cause, is 53 and in Australia-New Zealand it's about 55 I believe. It's a little over 100 events so that 25 percent of the all-cause deaths and all-cause hospitalizations is mortality, and 75 is all-cause hospitalization. DR. KONSTAM: So, could we complete the story about if you want to call it a numerator of the events in the run-in phase again in like terms. So, the total number of deaths in the run-in phase were 9? DR. PACKER: Nine. Seven in the U.S. study, I believe 2 in Australia-New Zealand. DR. KONSTAM: Compared to about 50 -- DR. PACKER: About 100 combined, 110 combined. DR. KONSTAM: 110 combined what? DR. PACKER: U.S. and Australia-New Zealand. Deaths. Just to go through it again, total number of run-in deaths in all multi-center trials is 9. Total number of deaths in the multi-center trials is approximately 110. Total number of deaths and hospitalizations -- and we're working to get the precise numbers -- is between 25 and 30. Total number of deaths and hospitalizations all across the same corresponding analysis is in excess of 400. DR. KONSTAM: I was under the impression that you had done an analysis. Let's just stick to mortality for a second with that worst case scenario of applying all the deaths to the carvedilol group. Was that analysis not done? DR.
PACKER: Yes, an analysis was done, and showed that even if you took all the deaths and attributed it to carvedilol, which would be the most conservative analysis, the effect was still statistically significant. The same analysis has been done for 240 and for 223. DR. KONSTAM: Milton, what did it do to the nominal p value? DR. PACKER: .01, if I remember correctly. You're taking all the events, attributing it to one therapy. DR. KONSTAM: All the deaths. DR. PACKER: All the deaths, attributing it to one therapy. .01. DR. SHUSTERMAN: And we can show that. If we can bring up prime 14, we have that. This is the analysis if you include all deaths during run-in, attribute them to carvedilol. The relative risk is shown there. I can't read it quite from here but maybe you can. And the p value, as Milton said, was .0108. DR. CALIFF: Again, this only includes four studies, it looks like. DR. SHUSTERMAN: This is for the U.S. multi-center program. We did it for the U.S. multi-center program. DR. DiMARCO: So, it doesn't include the 2 deaths in New Zealand and the 4 deaths in the 035? DR. PACKER: 035 had 5 deaths and, again, that was in favor of carvedilol. Please remember, 035 had a 2 to 1 randomization. DR. CALIFF: During the run-in phase, Milton, for 035, I think it said there were 4 deaths in the run-in phase. DR. PACKER: Gee, that's my study. I should know, huh? DR. SHUSTERMAN: I can tell you, across all of the trials we have not done this analysis. We have done this for the multi-center trials, and that's what you see here. This counts the 7 run-in deaths against carvedilol. DR. DiMARCO: Dr. Moye? DR. MOYE: My understanding of protocol 223 is that the primary endpoint was an exercise tolerance endpoint. Do you disagree with that? DR. SHUSTERMAN: Protocol 223 had two distinct phases. I think that's pretty clear from the way the investigators set up the trial. 
Exercise left ventricular ejection fraction and left ventricular size were measured at 6 and 12 months, but morbidity and mortality was measured throughout the entire 18 months of the trial. In fact, none of those three measures that I mentioned were ever done again after the 12-month point of the trial. DR. MOYE: Well, I'm confused. Let me read you a statement from the protocol. "The proposed study of 450 patients is not expected" -- I say again, not expected -- "to provide any reliable evidence about the effects of carvedilol on survival. Several thousand patients followed for several years would be necessary for the reliable detection of plausibly moderate treatment benefits. Such a study would be proposed if the results in this present trial indicate that carvedilol was well tolerated by a large proportion of the study population." DR. SHUSTERMAN: That is precisely correct. This was not a mortality trial, and that's a very important point, and obviously was extremely underpowered to be a mortality trial. We're talking about the combined endpoint of morbidity and mortality, and for that the trial following patients for 18 to 24 months, some 415 patients, was an appropriate vehicle for that endpoint. That was the purpose for the follow-up of that length and duration, especially after no other measurements of status were performed after 12 months. So, this was not a mortality trial. The protocol is totally correct. We are not bringing it here as a mortality trial. But the protocol specifically said, and I gave you the quote, that mortality and major morbidity were to be looked at over the entire 18 months. DR. MOYE: Well, again, I must say, I'm confused by other statements in the protocol because they seem to, from my point of view, contradict this. Another statement says, "The major objectives of the study are to determine the effects of this treatment on exercise capacity, left ventricular function, and left ventricular size after 6 and 12 months of treatment." 
DR. SHUSTERMAN: That indeed was the major objective for the short-term phase of the trial, and they were measured at that time point. DR. MOYE: I can find no statement in here that says that a portion of the primary endpoint is for combined morbidity and mortality, with a statement about the definition of morbidity. I mean, to me it all seems extremely vague and very general, which is very curious for the statement for a primary objective. Do you disagree with that? DR. SHUSTERMAN: The protocol as written by the investigators is vaguer than I think you or I would have ideally liked to see. This was written some six years ago by them, and the protocol was initiated by them also. But it was clear that a long-term phase was a major aspect of this trial. Just because the patients were followed for so long with the express purpose. And that purpose has been stated by the investigators, not only the quote that I have in the protocol, but in additional communications. If I could have prime slide 2. Their initial short-term phase results were published in Circulation, and this manuscript was submitted in January 1995. Now, following completion of the short-term phase of the trial, and the observation that carvedilol had no adverse effect on exercise capacity or left ventricular function or size, the decision was made to continue treatment and follow-up of study patients with the main objective of determining the effects, if any, of treatment with carvedilol on hospital admission and mortality. So, this is in the protocol. This was a specific, active decision that was made for the long-term purpose of the trial. If I could have the next slide, please. DR. MOYE: Just a second, please. If I read this right, the decision was made to continue -- and please tell me if I'm wrong. The decision was made after the trial started to continue to the long-term follow-up. Is that right? DR. 
SHUSTERMAN: The decision was made to not stop the trial, to allow it to continue for 18 to 24 months. It was pre-written into the protocol that they would look, and if there was an adverse effect on the 6 and 12 months, that the protocol could be stopped. So, the protocol from the start was 18 months with the ability to stop it if there was something bad happening. DR. MOYE: But there was no stated endpoint at 18 months, no well-defined endpoint? DR. SHUSTERMAN: Yes, if we could have the next slide. This is the correspondence with the principal investigator in the trial, and the definition of the endpoint mortality and major morbidity is shown here. The definition that we adopted for this outcome was death from any cause, or hospital admission for any cause, during the period between randomization and the end of follow-up. And secondly, when was this decision made? The decision to use this global index of mortality and major morbidity was made prior to unblinding the data on these outcomes. DR. MOYE: It was made after the trial was started, but prior to unblinding. DR. SHUSTERMAN: After it started, before it was unblinded. Based on what was said in the protocol about mortality and major morbidity. DR. MOYE: Okay. Well, then would you have any response to the criticism that the investigators had a sense of how things were going and therefore tailored their definition for morbidity based on their sense for the direction of the data? DR. SHUSTERMAN: Well, I would say that if they used a cause-specific definition, Dr. Moye, but here they picked the broadest, least-biased, most assumption-free. All deaths, all hospitalizations. It didn't matter what they were admitted to the hospital for. So, I think that has the least bias in terms of being influenced by anything they knew during the trial. DR. PACKER: And then this decision to go forward after the completion of short-term phase is specifically mentioned in the original protocol. 
Lem, the decision to go forward with the long-term phase was specifically mentioned in the original protocol. The original protocol said that data on mortality, major morbidity would be collected for 18 months, extended by a protocol amendment to 24 months. Then later on in the protocol it says that decision would hold unless at the completion of the short-term phase there was an adverse effect on the exercise tolerance or LV chamber-size dimensions, ejection fraction measured during the short-term phase. Everything you've seen here was prespecified in the original protocol. DR. MOYE: Except for the definition of morbidity. Right? DR. PACKER: I think that if you and I would recognize that the definition of mortality and major morbidity needs defining in this protocol, because it is not clear from the protocol what it is, what we can be reassured by is, one, they used the broadest, least-biased definition possible. And two, they defined it before they had any knowledge of any of the treatment effects before the blind was broken. I think it's perfectly consistent with the way the clinical trials are run. If there is anything vague in the protocol that needs defining, you define it up front, you define it in the least-biased way possible, and you define it before you break any codes. DR. MOYE: The message I get from you, then, not to belabor this, is that the decision for a follow-up study was made before the trial started, but the decision as to the composition of the morbidity endpoint was not. But it was made before the results were unblinded. DR. PACKER: The fact that there was such an endpoint for the long-term phase was prespecified in the original protocol. The precise definition of what morbidity/mortality would be was made before the blind, as you have said, after the original protocol, but I think that it was made quite fairly. It's the broadest definition of morbidity and mortality one can think of. DR. DiMARCO: I'm glad you both agree. Dr. Borer? DR.
BORER: I don't want to belabor the run-in phase mortality issue. You've been very forthcoming about the data and the analyses that you did to try to deal with this, but I would like to know more precisely what kind of analysis was done. The primary outcome analyses were time-varying analyses, and these events occurred before time zero. What did you do? Add all the patients from the run-in phase who died as carvedilol patients and assume they died at time 0? Or was some other kind of analysis performed? DR. SHUSTERMAN: It was a time-to-event analysis but instead of starting the clock, so to speak, at the point of randomization, it was started at the point of the very first dose of carvedilol during the open-label period for all of the patients. Patients who died were attributed to carvedilol. Patients who lived and went to placebo were attributed to placebo. So, it was from the very first dose. DR. BORER: I see. Thank you. DR. KONSTAM: And what's the statistical test? DR. SHUSTERMAN: Log-rank test. That was a Kaplan-Meier curve. DR. DiMARCO: Dr. Califf? DR. CALIFF: Again, I know we are belaboring this, but that's been done for all of the trials with all of the patients. Has it or has it not? DR. SHUSTERMAN: For deaths it has not been done for all of the double-blind trials. DR. CALIFF: What about death and all-cause hospitalization? DR. SHUSTERMAN: But it has been done for the U.S. multi-center program, which was the major component of all of the trials. DR. DiMARCO: This is 4:05. We'll take a 10-minute break now and reconvene at 4:15. (Recess.) DR. DiMARCO: I'd like to get started. Before I poll the committee and ask if there are any additional questions, I'd like to ask the division if they have any questions for the sponsor. DR. LIPICKY: No. DR. DiMARCO: Would any of the reviewers like to say anything at this point? The medical or statistical reviewers? DR. D'AGOSTINO: I want to make a comment. Oh, you're talking about the FDA reviewers. DR. 
DiMARCO: The FDA reviewers. DR. D'AGOSTINO: I would think that it's important to hear one of the FDA people make a presentation. They have quite a different spin on some of these data. We could pick it up on our own, but I think it's important to hear them say something. DR. RODEN: While people are coming to the microphone, can I ask what kind of criteria people used to decide -- DR. DiMARCO: Dan, why don't we do the FDA reviewers first and then we'll get to your question. DR. STOCKBRIDGE: I'm Norman Stockbridge, FDA. I don't have anything new to offer on the basis of what we heard this morning. I think you have our reviews and know what our biases are here. If you have specific questions that you want to ask us, we can certainly respond to that. DR. DiMARCO: Rob? DR. CALIFF: Maybe you could just say a few words about your assessment of death and all-cause hospitalization, and your view about the run-in phase and analyses that you did on that. DR. STOCKBRIDGE: The issues about run-in phase, I think we were fairly satisfied that the point estimate one has of the mortality rate specifically during the run-in phase was not much different from what you saw during the trial. So, it wasn't like there was a large effect, and a lot of people who were at risk of dying as a result of exposure to the drug got filtered out through that. I think we were fairly satisfied with that. We were also satisfied, I think, that there was no large rebound effect associated with coming off of the drug and going into the placebo group. The early mortality rate in the placebo group was not unusually high. I don't believe we have done any analysis that tried to look at death plus hospitalization the same way. Perhaps we should. DR. DiMARCO: Ralph? DR. D'AGOSTINO: I would like to ask some questions, trying to make a judgment from the presentation we just had and what I see in the review. For example, in study 223 with the phase 2, there were three primary endpoints in phase 1. 
One of them I guess was statistically significant, the other one clearly wasn't, and the third one was sort of marginal, which would bump up. But then there's the phase 2 that comes in, where there is this mortality and morbidity. I think it's very important to get a sense of what we think about that level of significance that's produced. Would you let it go as it stands, or are you suggesting that it should be -- DR. STOCKBRIDGE: I have a great deal of trouble finding the words in the protocol that identify a long-term endpoint for this trial. And that is the only document that we have had, we have seen, that addresses what the endpoints in that trial might be. I believe that the Australia-New Zealand group had an interest in monitoring safety through an extended period during that trial, apparently for the purpose of considering a definitive trial later on. I think that if there had been a potential basis for approval in a primary endpoint that was identified from the short-term phase, I think they would have been horrified to have had that taken away from them because of some adjustment for multiple endpoints, including the long-term phase. I don't think that was what their intent was at all. DR. D'AGOSTINO: Can I ask a couple more questions? In study 240, where the combined endpoint was quite significant, the significance that's falling apart if you remove the medication, in particular, when you look at just straight mortality, this CHF mortality, you only have 4 versus 0 deaths. I guess I'd like some sort of statement about or some discussion about how one should look at that medication. I started off understanding that 240 was a study which everybody thought was quite nice, and the more I looked at it, the more problems I had with it. I'd like to hear your view on that. DR. STOCKBRIDGE: The FDA reviewers never did an analysis that was exactly the same as what the sponsor did. In the first place, the only analyses we did were time-to-first-event. 
The original protocol called for an analysis that just simply counted events. The original protocol called for components of that endpoint which were cause-specific mortality, and cause-specific hospitalization. The FDA reviewers never looked at cause-specific hospitalization or cause-specific mortality. What we analyzed was all-cause mortality and all-cause hospitalization in that, and however you do that, it is still true that the p value floats up by a factor of about 10 if you drop the medications component. DR. DiMARCO: Dr. Borer? DR. BORER: You raised what sounds to me like sort of a critical point a couple of minutes ago, and I guess we'll need a response from the company about this. But if indeed the purpose of counting deaths and hospitalizations for 18 to 24 months was to monitor safety and be able to make a safety statement, rather than to do an analysis for efficacy, the importance -- I hesitate to use the word "significance" because I think it's wrong here, but the importance of any conclusions based on that analysis would lose a great deal to me I think. Once you said it, it reminded me of one of the slides, describing study 220, where we were told that mortality was a prespecified endpoint. But it was actually prespecified as a safety assessment. I'm sure it was prespecified that deaths were going to be counted to make sure the drug was safe, but that's different from a prespecified efficacy endpoint that you build an analysis around. DR. STOCKBRIDGE: None of the U.S. trials have a protocol that specifies mortality or mortality plus all-cause hospitalization, some combination of those, as a primary or secondary endpoint. DR. BORER: The point I was making was that it does sound as if there's some similarity between the prespecification of mortality as a safety issue in one of the U.S. trials and perhaps the intent to look at mortality and morbidity in 223 as a safety issue. I'd want to know from the company whether that's what was done. DR. 
PACKER: Jeff, the investigators made clear that their intent was to look at a reduction in the combined risk of morbidity and mortality. That was made clear not only in the original protocol but also made clear in the January 1995 decision to go forward with the long-term phase. Remember, the protocol specifically said that unless something bad happened on the physiologic and clinical endpoint, that they would make a decision to go long-term. At that time they said that they were going long-term to look for a reduction in morbidity and mortality. It never looked at any of the data and so they were actually going for this. Now, their desire was to do a mortality trial, not to do a morbidity/mortality trial. They were doing a morbidity/mortality trial to hopefully get someone to say they were going to do a mortality trial. So, the protocol says that their hopes are eventually to do a mortality trial, but what they prespecified was the hypothesis that carvedilol would reduce the combined risk of morbidity and mortality. There's actually a considerable amount of correspondence and material in writing to indicate that. DR. DiMARCO: Do both Dr. Temple and Dr. Lipicky want to make comments? DR. TEMPLE: To some extent this is, I think, being made more complicated than it needs to be. It's perfectly obvious by now that there is nothing in the protocol for the 223 study that nicely, neatly says, my primary endpoint for the second phase is X. If there were, we wouldn't have had to be discussing it so long. I think what the company is arguing, that you can deduce what it was from these various things, from the statement in the publication and so on. My guess is it's not going to get any better than that and you sort of have to decide whether you believe that's persuasive or not because the usual kind of statement just isn't going to be there. But I was interested in knowing why Dr. 
Stockbridge particularly thought that they were doing it to find out if it was lethal because the phrases that were up there didn't seem to be saying that, but you might have some other reason for thinking that. Before you answer, it's important -- probably everybody knows this already but I want to be sure -- to distinguish between an endpoint that includes death and a mortality endpoint. It's very clear that there was no mortality endpoint in the U.S. trials, but there were endpoints that included one or another kind of death along with other things. So, it's not that death was uninteresting. It's that they weren't trials to find a mortality endpoint specifically. Then the third thing that's worth noting is, they tended to look at cause-specific stuff and we tended to ask them not to because we're suspicious of the ability to classify things properly. So, both endpoints were shown and you can try to decide what that means. But they were looking for some secondary endpoints at some of these mortality/morbidity things in trials. But before Dr. Stockbridge goes away, I think it's important to pin down what in there would make one think that they really weren't looking for a benefit at all because that would be very troublesome. DR. LIPICKY: Norman can answer that, but I would like to also, since opinion was asked for. I know that during our discussions with the sponsor setting up the development program, the only interest was to show there was no adverse effect on mortality. That is the only thing we ever talked about. That there was to be an expected beneficial effect on mortality was not in the cards. It was to get a point estimate to show that symptoms would be better in some fashion and there would be no adverse effect on mortality, or not a very large one. I guess I could have planted that idea in Norman's mind. 
The second part of it is, as you read through the protocol, boy, if anyone can read through that 223 protocol and come to the opinion that there was hypothesis-testing going on looking for a benefit, they've got a really good imagination. So, then you put that together and you say, well, they're interested in seeing what happens long-term. I have a question to ask. Those letters that were shown up here, when were they written? After the study was completed, before the analyses? DR. SHUSTERMAN: The letter I quoted was written and communicated to us when these results came out in The Lancet, which is in the last week. But the same statements are in The Lancet publication. DR. LIPICKY: That's fine. So, the first that anybody knew anything that was even close to some written-down intent was after The Lancet publication was published? DR. PACKER: That's not true, Ray. These were specifically outlined in the investigator's publication in January 1995, when they completed the analysis of the short-term phase, and in that publication said that not having any adverse effect in the short-term phase, as specified in the original protocol, the investigators are now continuing the trial for the main objective of looking at the hypothesis that carvedilol reduces morbidity and mortality. That was in January of 1995, before any blind for morbidity and mortality was broken. DR. LIPICKY: Only the short-term part. Is that right? DR. PACKER: Right. And the investigators were not privy to any information about morbid or mortal events that occurred in the study. DR. DiMARCO: Cindy? DR. GRINES: I've been reviewing this protocol, and it mentions twice in the protocol that they actually propose to perform a mortality trial if there's any plausible moderate treatment benefits. So, I can't tell from this protocol that they were expecting to see an increase in mortality because it's twice mentioned that they wanted to eventually do a large mortality trial. DR. 
LIPICKY: I would like to add one more comment to the opinion business. I think we missed in our first review the importance of the run-in period and how that might influence things, and I'm differing a little bit with what Dr. Stockbridge said in that I think we missed this run-in business because the major attention was paid to the post-randomization process. We were not convinced that it was an effective therapy on the basis of post-randomization analyses, so we didn't look for much more from the vantage point of what would the design of the trials mean for approvability. DR. KONSTAM: So, what's your feeling about it now? DR. LIPICKY: Well, I think we missed it and I certainly am thinking about it for the first time. DR. KONSTAM: Based on the data that you've seen. DR. SHUSTERMAN: Regarding the run-in phase, could we have backup slide 3 please because I think this is germane to this discussion. DR. PACKER: There are actually two backup slides. That's the first one. It's not unusual to have trials that have run-in phases. In fact, I think such run-in phases have been actually advocated by FDA. An example of a trial that has a run-in phase, for example, is the SOLVD treatment trial. What we did in this slide was just to compare on the left the run-in events corrected for time in the SOLVD treatment arm with the run-in events in the U.S. carvedilol arm. As you can see, the event rates during these two periods are pretty similar to each other. As Dr. Stockbridge has emphasized, he didn't see any evidence for an excessive increase in the events during this run-in period. This diagram would confirm his impression. He didn't find any excess after randomization. The event rate during the run-in was similar to the event rate post-randomization. DR. LIPICKY: Are you talking about deaths? DR. PACKER: I'm talking about deaths. DR. LIPICKY: There were very few deaths. So, the fact that someone couldn't find anything in the run-in phase is no big surprise. DR. 
PACKER: Deaths are pretty serious. DR. LIPICKY: But there weren't very many. The number were not very many. DR. PACKER: Well, that's because there was -- DR. LIPICKY: So, if you look at it per trial, you will not -- DR. PACKER: No, no. This is not per trial. This is for the entire program. DR. LIPICKY: What did Dr. Stockbridge say? DR. PACKER: Sorry? DR. LIPICKY: You were commenting on Dr. Stockbridge's comment, and was his comment on all of the data in all of the trials or was it a trial-by-trial comment? DR. PACKER: I guess we would have to ask him. DR. STOCKBRIDGE: I suppose I could have pulled out the primary review and looked at this again, but I didn't. I believe that what we looked at in terms of the run-in period was the mortality in the U.S. experience, the four major trials. We looked at all of the deaths that occurred during those few weeks. DR. LIPICKY: And that was a total of 12? DR. STOCKBRIDGE: Yes. Well, it was something like that. DR. PACKER: It was seven events. DR. LIPICKY: And you didn't find anything in that signal, and that means something? Come on. DR. PACKER: Ray, what can you find without a control group? What is findable in the absence of a control group? DR. LIPICKY: Nothing but you were saying we didn't find anything, and I'm just trying to emphasize the fact that that was a non-statement. DR. PACKER: But it's quoting Dr. Stockbridge's statement of a few minutes ago. DR. MOYE: But you can bound it. It sounded like that if you assumed the worst case -- DR. PACKER: Can we have the next slide please? DR. MOYE: -- then the p value still winds up being fairly small. DR. CALIFF: Don't leave the slide yet. DR. MOYE: Now, if the concern is not mortality but morbidity and mortality, what then happens? DR. CALIFF: Can I just make a comment about this slide? I'm not as convinced as Milton yet. 
Again, I just want to keep pushing the worst case here to set that boundary, but the underlying mortality in the carvedilol trials is about half of what it is in the SOLVD trial. So, one would expect the run-in phase mortality to be lower I would think. In addition, we've already learned I think that the U.S. experience had a lower run-in mortality than the trials that have been excluded. As you said, Milton, there were four in your study and the New Zealand had another two deaths. So, I think we need to see all the data from all the studies, and this doesn't, to me at least, provide a convincing case that there's not -- it's convincing there's not a huge problem. It's not a lethal problem. It's not a terrible problem, but I think it is an issue that needs to be sorted out. DR. PACKER: Rob, in the spirit of being most conservative, let me just say that it is true that you might expect perhaps the event rate to be lower if carvedilol was exerting an effect. But remember, this run-in period, this 2, 3, 4 weeks, the carvedilol curves don't separate for 3 to 6 months. So, you're not going to see a beneficial reduction in a run-in period of only 2 to 3 weeks in duration. DR. SHUSTERMAN: And they only received the lowest dose. DR. PACKER: And they only received the lowest dose during this period of time. DR. CALIFF: I'm making a very different point which is that the underlying mortality in the -- it's a lower risk population at a lower risk of death, and therefore in the run-in phase, you would expect .25 or .20 or something to be consistent with what you see in the rest of the trial. DR. PACKER: Why would that be? DR. CALIFF: Because you have patients with a 5 percent risk of dying at 6 months compared to patients with a 10 percent risk of dying at 6 months. DR. PACKER: Rob, if I can focus your attention on the bottom of this slide. The actual event rate at 6 months in the U.S. 
carvedilol/placebo arm -- and don't forget almost all of these patients, more than 95 percent, were getting ACE inhibitors. It's exactly what was observed in the SOLVD treatment arm when patients were getting an ACE inhibitor. So, in fact, the evidence is rather striking that these patients are a similar risk. DR. KONSTAM: But it's a little lower than in the SOLVD placebo arm, which is what Rob is saying. It is a different baseline population, and it may well be that it's different because they're on ACE inhibitors. So, it is a little bit different. But, Rob, I think the magnitude of this is real small. If it's 7 deaths, okay. So, maybe it's a couple one way or the other. I think the issue really is what's the worst case scenario and the worst case scenario is that all 7 deaths go over to carvedilol. I guess that's the analysis that we saw. DR. PACKER: Can I have the next slide? Rob, this is the best we can do to address your request in the short period of time available. This shows the five multi-center trials. You can see the deaths in run-in and then the deaths after randomization. Placebo is on the left and carvedilol is on the right. I think you can feel comfortable that if you took all of the deaths in the run-in period, the difference between 9.9 -- these are crude ratios. Please forgive us. These are not time-to-first-event analyses. If you took the .55 carried over to carvedilol, I'm not certain we would be having much of an impact on the delta in mortality. I agree that it would be important, Lem, to look at this for mortality and hospitalization. It's just not possible to do that right this minute. It's easy to get the mortality data. It's hard to get the hospitalization data. DR. DiMARCO: Dr. Roden, did you have a question? DR. RODEN: I had a couple of questions. One is perhaps to Dr. Lipicky and Dr. Temple. I've seen the list of questions, but I want to know what it is we're being asked. 
Are we being asked to decide whether carvedilol reduces mortality, or are we being asked whether it's a useful drug in the treatment of patients with heart failure? So, that's one question. Then the other question I had for Milton or someone over there, and that is mortality is easy. Hospitalization seems to me a very, very mushy endpoint depending on physician preferences. I'd like some sense of how you think those kinds of decisions were made and whether it is conceivable that because these patients are part of a trial in which endpoints were desirable, that there's a bias toward increasing the number of hospitalizations. DR. PACKER: It's very hard to address that last question because I would like to think that the patients who are in a trial -- and I think you would share this view -- in general tend to get a much more vigilant type of care than patients who are not in a trial. This is regardless of whether they get placebo or active therapy. DR. RODEN: Well, hospitalization as a sort of endpoint may change the way people approach that. DR. PACKER: No, I understand. I think that in general when one looks at either mortality or combined risk of morbidity and mortality, in general the event rates are lower than the trial predicts. There are lots of reasons why. It's a lower risk patient population, or perhaps because investigators really are more vigilant. So, in general, the hospitalization rates are lower than expected because of that. It's hard to distinguish between that and other factors that you might identify. There's no doubt that the most unbiased, clinically important endpoint in the measurement of heart failure -- and by the way, measurement of any disease -- is mortality. But I think that ranking right underneath that is mortality and morbidity. That has certain advantages. One is it tells you a little bit more about what the drug is doing to the progression of the disease because hospitalization is an intermediate endpoint. 
Two, hospitalization is easy to quantify. It's objective. DR. RODEN: So you say. DR. PACKER: Three, biases hopefully should be randomizable out between placebo and active therapy. I share Ralph's concern about cause specificity, which is why just a week ago I emphasized in an editorial in the New England Journal that one really needs to focus on all-cause mortality and all-cause hospitalization as a combined endpoint because that gives you the highest degree of comfort that what you're looking at is not subject to biases. It is right underneath mortality as the most important thing you can measure in heart failure. DR. DiMARCO: Dr. Temple? DR. PACKER: And by the way, that's why the analysis on carvedilol on all-cause mortality and all hospitalizations across studies is so important. DR. TEMPLE: Just a little history and theory. We've certainly accepted reduction in cardiovascular hospitalization rates as a legitimate endpoint. That's what SOLVD Prevention showed. We've said repeatedly that it's cleaner and easier to understand if you look at all-cause anything, but that doesn't mean that it's not sometimes reasonable to look at cause-specific events. What you have to do is protect yourself against bias and make sure there's an independent group that's blinded that makes the interpretations. It would be, I think, a real stretch to say that one mustn't do that, however. But I wanted to go back to the previous discussion. Dr. Stockbridge was initially satisfied that nothing funny was going on in the pre-randomization period by looking at overall rates. Dr. Califf has raised the question that there's a simpler way to do this and a more conservative way to look at this by simply attributing all of those events to the drug even though they happened before randomization. We will obviously eventually be able to see that.
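(Illustration, not part of the proceedings.) The conservative approach just described, charging every pre-randomization event to the drug, can be sketched in a few lines of Python. This is a minimal sketch with hypothetical counts that are not taken from the carvedilol program; only the figure of 7 run-in deaths echoes the number mentioned earlier in the discussion. The `Arm` class and the `worst_case` function are names invented for this sketch, and it compares crude proportions only, not time-to-event curves.

```python
from dataclasses import dataclass


@dataclass
class Arm:
    """Deaths and patients counted in one treatment arm."""
    deaths: int
    patients: int

    @property
    def crude_rate(self) -> float:
        # Crude proportion of patients who died; ignores follow-up time.
        return self.deaths / self.patients


def worst_case(treated: Arm, run_in_deaths: int) -> Arm:
    """Charge every run-in death to the treated arm.

    Patients who died during the open-label run-in were never randomized,
    so they are added to both the numerator and the denominator.
    """
    return Arm(treated.deaths + run_in_deaths, treated.patients + run_in_deaths)


# Hypothetical counts, chosen only to make the arithmetic concrete.
placebo = Arm(deaths=30, patients=400)
carvedilol = Arm(deaths=20, patients=700)
adjusted = worst_case(carvedilol, run_in_deaths=7)

print(f"placebo crude rate:              {placebo.crude_rate:.3f}")
print(f"carvedilol crude rate:           {carvedilol.crude_rate:.3f}")
print(f"carvedilol, worst-case run-in:   {adjusted.crude_rate:.3f}")
```

The same reallocation could be applied to deaths plus hospitalizations once those run-in counts are available, which is the analysis the committee members ask for.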
I want to be sure we understand what point is being made because we've looked at both pre-randomization deaths and pre-randomization deaths plus hospitalizations. The main endpoint on which the company is hoping, at least potentially, to rely is not the deaths. It's the deaths plus hospitalizations. So, if one wants to look at the robustness of that finding, then it's the death plus hospitalization issue that needs to be looked at, and they have apparently done that for the U.S. multi-center studies, but that leaves some things out and it shouldn't be very difficult, even if they can't do it right now, to get the rest of those data. So, do I understand that that's the main question that's being raised? You'd like to see an analysis that conservatively attributes all of those to the drug as if none of them would have happened in a placebo group. Is that correct? DR. CALIFF: That's my question. It may not reflect anybody else's interest. DR. MOYE: I agree. I think that's critical here. DR. TEMPLE: There is a second question and it's important. We know from the CAST, for example, which used a screening period and had lots of deaths during the pre-randomization period, that at least for moricizine the action was all in the pre-randomization period. The mortality attributed to the drug was all in the screening period, and by the time you got into the randomized period, it wasn't nearly as bad to be on moricizine because you had killed off all the susceptibles. So, looking for excess deaths, people hypersensitive to a beta-blocker, is also a reasonable question, but my impression is that what Dr. Stockbridge did reassured him on that question that people were not dying at a terrible rate during that period. So, I just wanted to be sure I got the two questions separated and know what everybody wants to find out. DR. DiMARCO: Dr. Konstam? DR. KONSTAM: I'd like to ask two questions.
I'd like to ask Milton, and maybe other people in the audience who were investigators could comment on it too. Following up Dan's point with regard to potential bias, I'm a little concerned about heart rate effects, and I think that hospitalization is subjective in the sense that it requires a decision by the clinician, as certainly does change in medication. I know if I have a patient whose heart rate is 110, I'm much more likely to be worried about that patient than somebody whose heart rate is 80. It's one of the things that consciously or unconsciously goes into my mind as a clinician. I wonder if you can comment on the likelihood that that biased these endpoints that required subjective judgment. DR. PACKER: Well, that question, I think the issue of potential bias comes up with any beta-blocker for any indication other than mortality. Taken to an extreme -- and I know you're not suggesting that -- you would say that the only endpoint you would believe in a beta-blocker trial would be mortality because any other endpoint could conceivably be influenced by a physician who may be influenced by what he sees as a change in heart rate. What's gratifying about what you see in the carvedilol program, for reasons that are not clear, is that in the analyses that have been done to date on a variety of endpoints -- and I don't think it has been done on hospitalization, by the way -- there doesn't appear to be any difference in the delta heart rate in the patients who had or did not have an event. In other words, it wasn't as if the investigators systematically hospitalized all the patients with high heart rates. DR. DiMARCO: I think what Marv was saying is that if the patient presents with worsening heart failure and a high heart rate, they're more likely to be hospitalized because they may appear sicker. DR. 
PACKER: I think, John, there's a likelihood that the bias would have been in the opposite direction because if an investigator could determine what the patient was on by looking at their heart rates -- and there's evidence from these trials that they could not determine who was on placebo or carvedilol by an individual measurement of heart rate. So, let me emphasize that. But if they could, they might be more likely to hospitalize someone who they thought was on a beta-blocker. DR. KONSTAM: I was thinking pretty much what John said, that I don't think it's so much an issue to me of the investigator guessing what they're on as much as that heart rate per se is one of the things that good clinicians -- that enters into their decisionmaking consciously or unconsciously. I'm a little concerned about that as a bias in the data set. DR. WEBER: But you're saying, Milton, that in fact did not take place. DR. PACKER: That did not take place. DR. DiMARCO: What I'd like to do now is I'd like to just -- I'm sorry. DR. LIPICKY: Can I respond to Dan's question about what the questions are about? DR. DiMARCO: Surely. DR. LIPICKY: I think I'll lead you through the heuristic that's in the questions. It starts out posing that one usually likes to think of having two trials that meet their primary endpoints and that the p value from that is .05 squared. That's the degree of certainty that one would like to have that this is generalizable, it will happen in the patient population, so on and so forth. So, the questions start out saying can you come to that conclusion from primary endpoints. If you can, then that becomes the indication. If from the primary endpoints one can say mortality and hospitalizations decrease on carvedilol, that is an indication. Well, one might not be able to or one might be able to.
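(Illustration, not part of the proceedings.) The ".05 squared" figure can be checked with a short calculation. This is a sketch under simplifying assumptions: the trials are independent, there is no true treatment effect, and each trial is tested at a significance level of .05. The function name `chance_all_significant` is made up for illustration.

```python
def chance_all_significant(alpha: float, n_trials: int) -> float:
    """Probability that n independent trials with no true effect
    all reach p < alpha purely by chance."""
    return alpha ** n_trials


single = chance_all_significant(0.05, 1)
both = chance_all_significant(0.05, 2)

print(f"one trial by chance:  {single:.4f}")
print(f"two trials by chance: {both:.4f}")
```

Under these assumptions, requiring two successful trials lowers the chance of a false positive conclusion from 1 in 20 to 1 in 400.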
If you can't, then you can look at the secondary endpoints, create a similar circumstance where you are convinced that secondary endpoints have been affected to the degree that you would like to have for approvability, then you have an indication. Now, then you might say, well, you might not be able to do that. So, then you can take retrospective endpoints. Every time you take this step, there's clearly an inference problem, and there are clear judgment problems, and there are clear penalties you pay. And you are asked to define them. So, you may be able to look at retrospectively defined endpoints and say, I know what this drug does. That is the indication. Then you are given the opportunity in the very last question of saying, I can't make up my mind like that but it's got to be good for something. (Laughter.) DR. LIPICKY: And you get to vote on that. DR. DiMARCO: Is that a suggestion we should move on to the questions, Ray? Are there any other members of the panel who would like to ask questions directly of the sponsor? Marv? DR. KONSTAM: There was one other question I wanted to direct to Dr. Stockbridge. I still wasn't quite clear what you were saying about protocol 240. You did a different set of analyses than the sponsor did. Was your conclusion not still that it reached this primary endpoint regardless of how you did the analysis? DR. STOCKBRIDGE: The analysis that we did had a p value for study 240 of less than .05. Is that the question you asked me? DR. KONSTAM: Yes, I guess so. So, it met its primary endpoint any way you looked at it. DR. STOCKBRIDGE: Well, again, what we looked at was not exactly the same as what the protocol-specified endpoint was. What we looked at included all-cause hospitalization rather than cardiovascular hospitalization. It included all-cause death, not cause-specific death. So, in that sense, what we analyzed was not the primary endpoint. DR. KONSTAM: Now I'm more confused.
If it was looked at from the perspective of the primary endpoint, it also reached it. Is that not right? That's correct. DR. COLUCCI: Bill Colucci again. It should be perfectly clear, if it isn't already, the primary prespecified endpoints which were quite cause-specific were strongly positive. DR. KONSTAM: Right. DR. COLUCCI: More conservative analyses done by the FDA were also significant. And reduction of the components of one, two -- leaving any single component -- all of those were also positive whether by cause-specific or cause-nonspecific analysis. So, I've seen no analysis of that study of any kind that was ever not statistically significant. DR. STOCKBRIDGE: If you leave out the medications component from either the sponsor's analysis or the FDA analysis which has no cause-specific component to it, the p value goes up by a factor of 10 in both those cases. DR. KONSTAM: Right. But it's still .05. DR. STOCKBRIDGE: Well, the FDA analysis started with a p value of .04 and went to .4. DR. MOYE: Right, but leaving out the medications is not per protocol. DR. STOCKBRIDGE: That's true. DR. MOYE: The per protocol included the medications. DR. STOCKBRIDGE: You're absolutely right. DR. KONSTAM: Any interpretation of the primary endpoint was reached. There's no debate about it reaching its primary endpoint. I just want to understand. DR. MOYE: Now, the strength comes from the medications presumably, but the endpoint prespecified included the meds. DR. KONSTAM: The strength comes from the medications by Dr. Stockbridge's analysis but -- DR. LIPICKY: No. By anybody's. DR. KONSTAM: I said it wrong. It falls away from significant value if the analysis is done according to Dr. Stockbridge's analysis which was not the per protocol analysis. So, if you go per protocol and you take away the medications, it still sticks to -- DR. D'AGOSTINO: But, you know, we've been making a big discussion about ignoring the primary outcome.
This is one case where the primary outcome did in fact turn out to produce significance. We probably don't want to wander too much away from that. (Laughter.) DR. DiMARCO: Any other comments from the panel? (No response.) DR. DiMARCO: Let's get started with the questions then. Question 1.1 is, study 240 had a primary endpoint of time to the first event of sudden death, death from progression of heart failure, hospitalization for worsening heart failure, or sustained increase in a specified group of heart failure drugs. Elimination of the medications component of the endpoint from either the sponsor's analysis, which included cause-specific mortality and hospitalization, or the reviewers' analysis, which included all-cause mortality and hospitalization, greatly increases the p value from 0.003 to 0.029 and from 0.04 to 0.378, respectively, suggesting that most of the statistical power lies in the medications component. What effect does this observation have on the clinical interpretation of the results of study 240? Rob, do you want to start off? DR. CALIFF: Well, I think this gets deep into the heart of the philosophy of determining health effects of a treatment. It's one thing to say that the drug has a mechanism of action that improves a component of a person's health. It's another to say that you improve the overall health state of the individual or of the population. I think it's not nearly as strong when you have a change in the p value that's so substantial when you move away from the cause-specific events. I think it's a point of weakness that's fairly substantial. DR. DiMARCO: Ralph, would you like to comment? DR. D'AGOSTINO: I always like to make clinical statements. I just think that the study 240 -- somehow or other, we have to take it on its grounds. It had a protocol. It had a primary event. I think it is driven by the medication component. 
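(Illustration, not part of the proceedings.) The p values quoted in Question 1.1 can be turned into fold changes to check the "factor of about 10" described earlier. This is a minimal Python sketch; the dictionary labels and the function name `fold_change` are invented here, while the four p values are those read into the record.

```python
def fold_change(p_full: float, p_without_meds: float) -> float:
    """How many times larger the p value becomes when the
    medications component is dropped from the endpoint."""
    return p_without_meds / p_full


# (p value with medications component, p value without it), from Question 1.1.
analyses = {
    "sponsor (cause-specific)": (0.003, 0.029),
    "FDA reviewers (all-cause)": (0.04, 0.378),
}

for label, (p_full, p_reduced) in analyses.items():
    ratio = fold_change(p_full, p_reduced)
    below = p_reduced < 0.05
    print(f"{label}: {p_full} -> {p_reduced} "
          f"({ratio:.1f}-fold), still below .05: {below}")
```

On these figures the sponsor's analysis stays below .05 without the medications component while the reviewers' analysis does not, which is the distinction the panel goes on to debate.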
It weakens the sort of statistical power and so forth, but nonetheless it still, on their grounds, maintains the significance. So, I think if we play statistical significance, I think we still have a result here. It certainly does weaken it clinically, though. DR. DiMARCO: Jeff, I'll ask you. DR. BORER: Yes, I agree with that. Certainly eliminating the medications component weakens the strength of the conclusions and the degree of consistency of the data, but nonetheless, the study, as it was set out to be done, achieved the goal that it set out to achieve. It proved the hypothesis it tried to test. In addition, from what I've read here and from what I've heard, every other kind of analysis that could be done of these data still comes up with a statistically significant result in favor of the drug. Now, of course, not all that overwhelmingly strong, but still statistically significant. I'd like to see some confirmation from a second trial, but I would accept this as a positive trial. DR. DiMARCO: Would anyone else like to comment on this question? DR. CALIFF: Just one more little emphasis. The big change is not taking away the drug effect or the effect of counting the change in medication. The big change is going from cause-specific to all-cause. Is it right that we should be looking for overall health effects rather than cause-specific outcomes? Do you really care if you die from heart failure or from something else? DR. BORER: No. I agree with you about looking at all-cause. I'm sorry. That's quite right. But even if you do that, still the overall analysis of all-cause hospitalization and all-cause mortality together is, as I understand it, statistically significant. DR. CALIFF: No. I think it's .378. DR. DiMARCO: That's the one that goes to .378. DR. 
CALIFF: I think it's a great cause-specific answer to a question about how the drug works, but as far as the overall health effect -- that's why I say it concerns me that the p value goes all the way to .38 when you count all-cause. If it just changed a little bit, I'd be more comfortable. DR. DiMARCO: Dr. Temple? DR. TEMPLE: Can someone remind me? If you do all-cause mortality plus all-cause hospitalization plus the medications, what's the result of that? Because that's what we are really talking about here. It's the morbidity plus mortality endpoint that changes if you go to all-cause, but what happens if you keep the endpoint as planned but modify it in that way? Does anybody know? DR. PACKER: Bob, that analysis hasn't been done, but based on the magnitude of the effect on all-cause mortality and all hospitalization, plus the medications -- and remember, it's a 2 to 1 randomization at 50 percent lower risk. I have to emphasize this hasn't been done, but you're combining two components which go very strongly in the same direction. It's very likely that that will be statistically significant even when the original primary endpoint of 240 is broadened in a non-cause-specific fashion. Of course, now that the committee has asked for that analysis, it can be done. DR. TEMPLE: Given that the results are largely driven by the medication, it seems likely that it's going to come out that way. DR. STOCKBRIDGE: If I understood your question correctly, that is the analysis that the FDA reviewers did. It was non-cause-specific plus the medications. That gave you a p value of .04. DR. TEMPLE: Okay, but that was on a very much reduced data set because you didn't find most of the medication data adequate. DR. STOCKBRIDGE: All of that is certainly true. DR. TEMPLE: I was just trying to distinguish between two points that are being made. 
One, as Ralph said, for better or worse, you live with roughly the endpoints that they chose because we're not supposed to go flipping away from endpoints any more than they are. And if you do that and stick with the medication, it really turns out not to matter very much whether you use cause-specific or non-cause-specific because that's not the major driver of that endpoint. So, it's not surprising. It just means that what you've got here is an endpoint that is mostly about worsening heart failure as measured by medication use. It's not really a mortality/morbidity, in some other sense, endpoint. It's a more limited endpoint. You have to decide whether that's clinically relevant or meaningful. DR. KONSTAM: John? DR. DiMARCO: Marv? DR. KONSTAM: I think you could approach this question from two different perspectives: a statistical perspective and a clinical perspective. From the statistical perspective, I think you have to give this trial its due, as Dr. Temple was saying, and as we go to beat up on other trials, as we'll come to, that don't reach their primary endpoint, this one does and we've got to stick with that. From the clinical perspective, Rob, I am somewhat concerned, but I guess I'm not as concerned as you sound to be about the change in the p values from the medication difference when you take away the medications. The p value rises in large part because the number of events is falling, but in fact the other two contributors are still going in the same direction for one point. And for another point, it seems to me to be consistent with other things in the data set. So, although it's obviously a big driver to the magnitude of the p value and I'm a little bit worried about it, I'm not that worried about it. DR. CALIFF: But again, I'm responding less to that than to the change from .04 to .378 which makes me wonder if there's an excess of non-cardiac hospital admissions in the carvedilol group. DR. COLUCCI: I think that Dr. Konstam's point is very important. 
That study was powered based on estimates of the event rate for the three-component endpoint so that it really is not powered to see anything when you take one of those away, particularly the one that is the major contributor. I think it is fair to go to a non-cause-specific death and hospitalization, but really to take away medications really just under-powers it tremendously. It was prospectively designed to be powered for this endpoint. The other thing to be said is that although medications have traditionally not been quantified in big clinical trials, they generally show up as an increase or a decrease. This study had really the first and most specific criteria for change in medications of any trial that has been done so far. That required a 50 percent increase that was sustained for over 30 days or the addition of a new additional drug. So, these are much more firm objective endpoints for change in medication than have been used in the past and I think are much less susceptible to the type of criticism that is very justified because, after all, diuretics are adjusted up and down, a little here and there. But these are substantial changes that are maintained for a substantial period of time. DR. DiMARCO: Dr. Raehl? DR. RAEHL: I think you just answered my question, but to clarify, the change in medication to which we're referring is primarily due to diuretic regimen changes. Correct? DR. COLUCCI: Or ACE inhibitors. DR. RAEHL: There seems to be disagreement among your -- DR. COLUCCI: The majority of changes were diuretics. The majority were diuretics that were changed, but it could have also been -- DR. RAEHL: Is that data anywhere in our review that we could actually see how the medication profiles, if you will, changed on an aggregate basis, that being a primary driver? DR. COLUCCI: I believe that's available. DR. SHUSTERMAN: We don't have it here, but I would agree with Dr. Colucci, probably diuretics are about a half -- DR. 
DiMARCO: You have to use the microphone please. DR. SHUSTERMAN: It also included changes in nitrates, changes in ACE inhibitors which made up the other half. There were no changes in digoxin -- DR. RAEHL: So, the criterion was a 50 percent increase in any of those medications, but about half of the time it was about a 50 percent or more increase in diuretics. Is that fair to say? DR. SHUSTERMAN: Or the addition of a new diuretic. Correct, yes. DR. COLUCCI: It had to be at least a 50 percent increase in dose of a diuretic or a 50 percent increase in dose of an ACE inhibitor or nitrate or other vasodilator. About half of the time, that was the diuretic. The rest of the drugs contributed to the other half. DR. DiMARCO: Are there any comments with regard to question 1.1? (No response.) DR. DiMARCO: Let's move on to number 1.2. We are now moving to study 223. What clinical benefit was the primary endpoint in the short-term phase? Rob? DR. CALIFF: My reading of the short-term phase is we didn't have any evidence of substantial clinical benefit. We had ejection fraction and ventricular diameter measurements that were improved. The primary clinical endpoint was not affected. Those are nice and very stimulating endpoints, but not measuring clinical status. DR. DiMARCO: So, you would conclude that there was no real demonstrated clinical benefit in the short-term phase. DR. CALIFF: If our goal is to understand overall health benefits to the patients or populations, there were none demonstrated. DR. DiMARCO: Ralph, any comments? DR. D'AGOSTINO: I agree with that. DR. DiMARCO: Anyone on the committee have a different opinion? DR. GRINES: I don't remember that that data was shown to us other than symptomatic data, just some of the negative exercise data. But wasn't there a change in ejection fraction? And we saw no symptomatic data. So, to determine clinical benefit short-term would be very difficult. DR. 
CALIFF: All right, but the primary endpoint of clinical benefit was the exercise endpoint. Ejection fraction was definitely improved, but that's not a patient-oriented endpoint. DR. DiMARCO: So, I think that we can move on to 1.2.2. Here the question is, what clinical benefit was the primary endpoint in the long-term phase? DR. CALIFF: I think we reviewed that ad nauseam here. The wording is not specific. The analyses have been done in a variety of ways, and probably by the way we have come to as the preferred way in this group, it's marginal at .04 but it's there. DR. D'AGOSTINO: I think that the protocol is quite deficient in flagging what the long-term is going to be about, and I don't see any way out of that. If you do say that, well, give them the benefit of the doubt that this hospitalization and death that you're involved with there and you get a significance of .04 or .035 -- I forget exactly what it was -- there's still the presence of these other outcomes down the way, but I don't see how you can just disjoint this particular analysis from the fact that there were three other analyses going on. DR. DiMARCO: Other comments from the committee on that question? DR. MOYE: I just need to be sure. Study 223 was the Australian-New Zealand study. Right? Hopefully we want to call it a primary endpoint, but the long-term phase, in any event, looked at combined morbidity and mortality. Is that right? Isn't that right? DR. D'AGOSTINO: Yes. DR. MOYE: And the p value for that was provided as? DR. CALIFF: .045. DR. MOYE: Even though this was a prospectively designed follow-up trial, there was no prospective statement about what to do with these multiple goals/objectives. DR. D'AGOSTINO: It isn't even clear what they were going to do with the particular variable. I do not read it that they were necessarily going to analyze it. I'm not clear why it was being collected, but certainly it doesn't go on to your question. DR. MOYE: The result becomes even more skeptical. DR. 
DiMARCO: Following up on that, is there a possibility that baseline prognostic factors could be used to adjust in this study? DR. WEBER: If I could ask a question. When they are in fact used to adjust -- this is on slide 43 of the handout we've got -- the p value seems to get quite a bit stronger, especially when you adjust for New York Heart Association class and ejection fraction and so forth. Is there some explanation for why that is? DR. CALIFF: I might comment on that since we're doing that a lot prospectively now. It's sort of hard to understand why you would adjust in a randomized trial in the first place, but if you did it -- and I think Lloyd Fisher commented on this before -- in general your p value gets smaller because you're taking away some variance in the estimates. That's about as far as my knowledge goes. DR. D'AGOSTINO: Or the other response to that is that you only see it when the p value gets smaller. Why would a sponsor present it if the p value is getting bigger? So, it's not clear that it will always get smaller. DR. PACKER: Ralph? John, with your permission. The sponsor did not propose doing any covariate analysis here. It was done only in response to an FDA request. DR. D'AGOSTINO: I understand that. It was just a comment. I was obviously being facetious. But I think the .045 is the result that they wanted to produce and we have to grapple with the question of whether or not we think it should be adjusted for multiple testing. DR. DiMARCO: Any other comments? Dr. Temple? DR. TEMPLE: What's your answer to your own question? What sort of adjustment might there be? Well, there were three endpoints. DR. D'AGOSTINO: There were three endpoints. No, I think -- DR. TEMPLE: One of which made it. DR. D'AGOSTINO: One of which made it. I think it's a good question and I don't really know the answer because I'm not completely convinced that one has to link the first with the second. 
I think you might be able to argue -- and I was actually hoping that there would be a discussion -- that in fact you can separate the front piece from the back piece and buy power or buy alpha of .05 for both pieces. I think that's a reasonable suggestion to make here. So, I guess I have an opinion, but I'd really like to see how other people sort out that. DR. TEMPLE: That seems important because the thinking about this study goes in two stages. One is can you decide that the second phase was designed to study anything in particular. Once you get over that hump, then you have to figure out what to do with the nominal p value they observed. So, I guess I hope there's some discussion too. DR. DiMARCO: Dr. Borer? DR. BORER: Personally I'm relatively unconcerned about dissociating short-term from long-term. I don't think that's intrinsically unreasonable if that's what you said you were going to do. It seems as if they're fairly potentially self-contained analyses that have nothing to do with one another. But hump number 1 here is the one that I'm having a little trouble with. I accept the explanation that Milton and the others gave about the intention of the investigators, but I'd feel a lot better about that if it were clearly stated somewhere where I could understand it better. So, that's the trouble that I'm having with this. If it was actually a prespecified hypothesis to be tested, that the combined endpoint would improve on the drug at 18 to 24 months, I would accept the p value as being important and I would say 223 corroborated 240 and I'd be very happy. But I'm not sure of the intent of the sponsor and the investigators in doing the study. DR. PACKER: Jeff, the correspondence that documents this can be shared with the committee. I think everyone recognizes that 223, perhaps because it was an investigator-initiated protocol, could have and should have been better written and more clear. 
But the written materials that the investigators have provided make it clear what their intent was, make clear the division between the short- and long-term phases, and that they defined mortality and major morbidity a priori as all-cause hospitalization and all-cause mortality, the least biased of all the analyses. DR. D'AGOSTINO: I think that it really comes down to, in terms of our opinion as giving advice, it's not in the protocol. Would we as experts in statistics and clinicians be willing to separate the two and would we be willing to let the second phase live with its .045? I don't think there's any hope of getting something out of your correspondence. DR. PACKER: If I could add, I think everyone appreciates the uncertainty in the original protocol, and I think that was part of the motivation why the FDA asked for the analyses of 220 and 221. The concept was that if one could find confirmation of exactly the same analysis, combined morbidity/mortality, not cause-specific in other trials in the U.S. program -- and all the trials that could find it were looked at. And it was found. It was retrospective. The individual component, the morbidity component, was prospective. The combined was retrospective. If one could see it there and one sees it there with small p values, one can get additional comfort about the effects seen in 223. DR. KONSTAM: John? DR. DiMARCO: Marv? DR. KONSTAM: I haven't heard any viewpoint on the panel that this trial has any hope of standing on its own. The question then is, what support, if any, does it give to 240? That's obviously the question. DR. D'AGOSTINO: Can I interrupt? I'm sorry. I think it has to stand on its own to answer the question. We are being asked does it stand on its own. Am I wrong on that? DR. TEMPLE: Not solo, but in combination with the others. DR. D'AGOSTINO: No, no, but I mean the question that we have before us. Do we have the traditional two studies? DR. DiMARCO: Two studies, yes. 
This has to be one study that's accepted. DR. KONSTAM: Well, let me just finish my comment. I think the issue to me is do I believe this endpoint. Can I look at the whole data set and believe any endpoint? That's what I'm grappling with. We have a study that's positive and that's 240. In my view I agree with you. The study clearly doesn't stand on its own. So, the issue then becomes is there any value in it at all to me. I guess I have to say there is some to me. By the way, why doesn't it stand on its own? I think all of us agree that we don't see a primary endpoint. So, there's a big problem there. DR. LIPICKY: But, Marvin, you'll be able to say something about that when you get down the end of the list. DR. KONSTAM: Oh, is that right? DR. LIPICKY: Yes, because you'll be able to say I couldn't draw a conclusion in number 1. You might be able to use it in number 2. Well, if you can't draw a conclusion in number 2, you might be able to use it in number 3. Well, if you can't draw a conclusion in number 3, number 4 says can you draw any conclusions at all. DR. KONSTAM: Well, let me ask you. The issue to me is, is there a primary endpoint that is met? DR. LIPICKY: Right. DR. KONSTAM: And if there is a primary endpoint that is met, it's in protocol 240. DR. LIPICKY: Correct. DR. KONSTAM: Then the question for me is, is there any corroboration for that anywhere in the data set? DR. LIPICKY: You were asked on some primary endpoint here, 223, and the answer is yes or no. DR. KONSTAM: Is it not possible to draw some corroboration regarding a primary endpoint in study A from another study that does not meet its primary endpoint? Is that not possible? That's what I'm asking. DR. LIPICKY: We're talking about making a decision here about whether we have two studies that have met a primary endpoint. Okay? So, that is the question under consideration. Later you'll be able to say, well, maybe not but something happened here. 
You will get the opportunity to answer the thing you want to address. You need to hear, can you make it clean? And either you can or can't. DR. PACKER: Ray, I just have a question again, with John's permission. If I understand it, the pivotal question for this part, which is 1.3, is prefaced by the phrase "with appropriate consideration of the supporting evidence from primary and secondary endpoints of these and other clinical trials," which I think addresses Marv's issue directly. DR. LIPICKY: Well, he has to make a decision as to whether 223 satisfies the condition for saying that it met a primary endpoint. If he wants to draw on other trials, as you want him to, he can do that. DR. PACKER: That's what your question asks him to do. DR. WEBER: It asks for secondary endpoints too. DR. CALIFF: I think we need to answer the technical question of whether we're accepting that there is a primary endpoint in 223 which is less than .05. DR. DiMARCO: That we're going to accept. DR. CALIFF: Right. Although I'm an advocate of efficient trial design and trying to answer as many questions as you can, it's hard to accept totally unhinging within the same trial these four endpoints. It's kind of like taking four shots off the tee and saying, I'll take my best and not worry about the other three. I don't think it ought to be a severe penalty because there are clearly sort of two things being addressed here. But I think this is where even I would ask for a statistician's -- DR. D'AGOSTINO: I could have given my opinion immediately, but I thought that might be inappropriate. I would say, no, this doesn't make it. I think that my sense is that this doesn't make it on its own because of the questions with the four possible outcomes and so forth. But we do pick up an alternative way of bringing this back to us with the next question. So, to be clean, I think this is a no. We can't ignore those other pieces. DR. 
DiMARCO: I think we've gotten to the point then that we can probably take a vote on 1.3 which is really the question we're dealing with. Are we going to accept both of these as trials which stand alone essentially? DR. D'AGOSTINO: Well, 1.3 says, "other clinical trials." DR. DiMARCO: That's right. And the question is, should carvedilol be approved for the treatment of heart failure on the basis of p less than 0.05 on the primary endpoints in each of two adequate and well-controlled studies? And that refers to 240 and 223. DR. TEMPLE: We need to clarify this. The order of this is that first you look at individual studies and their primary endpoints. Then you look at other endpoints, secondary endpoints, and then you go for it and look at anything you want to. So, this is the one about primary endpoints. The question here, just to be clear -- as I read it again, it's not perfectly clear -- is, based primarily on two studies that may have met fully, partly, whatever their primary endpoints -- and you're allowed to think about all the other stuff -- does this make it in the conventional way, the conventional way being you have two well-controlled studies that demonstrate something that they set out to demonstrate. That's sort of the conventional way. But we didn't want to say you couldn't even think about all those other things, so that's why 1.3 says, oh, well, you can think about those things. But the primary focus here is on those two studies and what they showed on their primary endpoints. I don't know if that helps. DR. DiMARCO: Is everybody clear on the question now? Okay. Ralph, do you want to start? DR. D'AGOSTINO: Well, again, I just was thrown an oddball there I guess because I'm still looking at the statement that says "supporting evidence from primary and secondary endpoints of these and other clinical trials." I think that "other clinical trials" is very important. These two studies standing on their own -- first, 240, we voted that it makes it. 
I think 223 standing solely on its own doesn't make it, but I think 223, in conjunction with these other analyses we've seen, does make it. So, my answer is yes to this question. I hope I'm reading it correctly. DR. TEMPLE: Well, Ray may want to comment too, but I think that's what we were asking you. As everyone has said, there are things about 223 that are not perfect, to say the least. On the other hand, it's not nothing, if I hear you. So, the question here is, well, along with that which is flawed and has problems and the other study and the other data -- now, you may want to spend some time discussing the other data before you give a yes, but that was the idea of this question. Ray, do you buy all that? DR. LIPICKY: Then the decision is not being made on the basis of meeting the primary endpoints in two trials. I must admit there's more emphasis to those words than probably belong in that question, and I'd like to take those words out. DR. TEMPLE: The reference to the other stuff. DR. LIPICKY: Because people get a chance in later questions to say, oh, yes, the totality of evidence is overwhelming. The question that number 1 is addressing is can you get there from primary endpoints in two trials. DR. D'AGOSTINO: I don't want to lose the way this is worded. So, somebody keep it for a later discussion. If we drop that reference to other trials and other supporting data, I would say, no, we don't have two clean trials. DR. TEMPLE: Maybe that question is really part of 2. I think that's what Ray is saying. DR. LIPICKY: Yes. Those words don't belong in 1. It was an error to get them there. DR. STOCKBRIDGE: Could I argue that the words really ought to belong in 1 since I put them there? (Laughter.) DR. STOCKBRIDGE: I think what one normally has is something better than two trials with a p of less than .05. 
One normally has two trials with a p of less than .05 plus secondary endpoints that help you feel good, that make you believe that the primary endpoint is a plausible one, that the primary endpoint finding is a plausible one. One's confidence is usually better than is expressed by two trials and only one finding in each of those trials. DR. D'AGOSTINO: I hope the FDA is going to pay for my stay tonight because I can see I'm going to miss my plane. (Laughter.) DR. D'AGOSTINO: The "other clinical trials" is the piece. If you want to feel good about these trials and so forth, there are things to feel good about. It's the "other clinical trials" I think that is very important in terms of the way I would vote. DR. DiMARCO: So, the vote is? DR. D'AGOSTINO: If we remove that phrase from it, it's no, these trials don't stand on their own. DR. DiMARCO: We've got some discrepancy from our two FDA representatives here. Does the committee want to take that phrase out and just say the trials have to stand on their own? Maybe I'll just take a hand vote. How many want to leave it that the trials have to stand on their own at this point in this question? DR. WEBER: Why don't we do it both ways very quickly, John? DR. DiMARCO: Okay. DR. CALIFF: Yes. I think it's important to be clear if we're saying we're going to make an exception to a clear-cut standard. So, the first question without the phrase. DR. DiMARCO: So, we'll vote on this both with and without the phrase. The first time we'll vote without the phrase, so that the trials have to stand absolutely on their own as two independent trials with significant results. DR. D'AGOSTINO: No. DR. BORER: I'm sufficiently concerned about all the issues that have been raised about 223 that I would reluctantly have to say no as well if we can only consider those two trials. DR. DiMARCO: Cindy? DR. GRINES: I think 240 meets its endpoint. It sounds like 223 probably does not, although it has clinical benefit. 
So, I guess I would answer no to 1.3. But I'd like to point out that the materials that we were given state that in reconsidering carvedilol, the committee is reminded that Federal regulations pertaining to approval simply call for evidence of benefit from clinical trials. They specifically say that the regulations do not specify a p less than .05 on primary endpoints in two studies. So, I think it's important for our committee to be clear on that. DR. DiMARCO: Yes. I don't think this is going to be the final word on whether the recommendation will be for approval or not. This is just to answer this question. Rob? DR. CALIFF: No. DR. DiMARCO: I'll vote yes. DR. MOYE: I think that 240 is fine, but 223 in my view is extremely problematic, so I say no. DR. RAEHL: No, for the same reasons. DR. WEBER: I also, I guess like everyone, have a little bit of a problem with 223, and this is a little bit of a phony vote because if you told me there was absolutely no other data and everything had to rise and fall on 240 and 223 and that was it, I'd vote yes. But knowing that there's other stuff, I can have the luxury of -- (Laughter.) DR. WEBER: -- a throwaway vote, as Lem calls it. DR. RAEHL: That's why I jumped ahead. DR. KONSTAM: I don't see a primary endpoint in 223, so I vote no. DR. RODEN: No. DR. DiMARCO: So, if we throw out the phrase "with consideration of the other trials," the committee has voted as we've heard. What if we keep the phrase in? DR. D'AGOSTINO: There's a grab bag of these secondary endpoints, some of which are significant, some of which aren't significant, and so forth, but there is a direction to them. Then there are these other bits of information that come from the other trials. Nothing is without fault, but I think that my hearing of the presentation and my reading of it is that they are very consistent, and I would say with this other supporting information, I would vote yes. DR. DiMARCO: Jeff? DR. BORER: Yes, I would echo that view. 
I think with all the problems -- and I see a lot of problems -- still the overall consistency through analysis after analysis and analysis that was specified by the sponsor and analyses that were asked for by the FDA, they all seemed to go in the same direction. So, I would say the same thing as Ralph did. I would say yes if we include everything. DR. DiMARCO: Dr. Temple, do you have a comment? DR. TEMPLE: As you do this, could you identify somewhat more specifically what's making you feel that way? I ask that because in the later questions, we point out that there were some specified secondary endpoints like globals, the New York Heart, and all that. That's one category of stuff. Then there's also the mortality plus morbidity analyses, and that's a different category of stuff. So, not to argue the point but say which of those or both are convincing. DR. D'AGOSTINO: I was trying to say both actually in my answer. I think that the global, the New York categories. To me I'm more persuaded by the mortality and morbidity analyses, but I do think the secondary also are important. DR. BORER: Yes. That's exactly what I would say. I think the consistency of the mortality and morbidity data are most persuasive, but it's nice when you see the matrix, seeing that for most of the other endpoints that may not even be related in terms of pathophysiology to the -- mortality and morbidity may not be related all that closely, anyway -- that even there the results all generally tend in the same positive direction with the drug. DR. DiMARCO: Cindy? DR. GRINES: I'm going to answer yes to question 1.3, and it's primarily based on the statistical reviewer's table, table 9 on page 16, where it outlines a lot of the different endpoints. It seems that the majority of the studies showed either a trend or a significant difference in favor of mortality. All showed ejection fraction. 
Most showed improvements in New York Heart Association class, subjective scores, objective scores, progression of heart failure. And I think it's very consistent throughout the trials. DR. DiMARCO: Rob? DR. CALIFF: I'll make a few comments here. I'm going to say yes too, but it's kind of in the blind pig finds an acorn category because the primary endpoints by themselves don't cut it. We end up with this sort of mishmash of, in all different kinds of studies, all different kinds of endpoints being positive. You get the feeling that something good is in there. It's not very directed in the way that you'd like to see it. First of all, I know we're going to get to this, but it's good to see some actual patient assessment and physician assessment in a blinded way. I think that's very reassuring. Then the last thing. The one contingency I have is I am still hung up on the run-in phase. The preliminary, sort of off-the-cuff data looks good, but I'd like reassurance that since this was not primary, secondary, or even tertiary in my mind in terms of the global endpoint of death and all-cause hospitalization, that if you did the worst case analysis, you would get something like a p value of .01 or .001 in favor of treatment. It looks like it's going to be that way, but if you throw in everything bad you can, make it the worst case, it still looks good. Even though it wasn't looked for, I don't see how you can turn your back to that. I think it's compelling. DR. DiMARCO: So, that's a yes. DR. CALIFF: Yes. DR. DiMARCO: My vote was earlier yes, and I would just say I'd continue to vote yes. I really think that this part of the question is almost sort of like 1.8 instead of question 2 because what we're talking about is we're accepting something less than the perfect study in 223 because of all the other evidence which has a broad pattern of consistency. So, I sort of think we're edging towards question 2, but again I'll vote yes. DR. MOYE: I'm going to vote no. 
I think that the evolution of clinical trial methodology should be for more stringent requirements not less stringent, and I think that the investigators and sponsor have the resources and the intellectual horsepower to do things right. I don't think they did things right in 223. I think that the table that was mentioned, table 9, is not persuasive because it comes before table 11 which shows a great many more study endpoints and p values and shows positive ones as well as negative ones. In the absence of prospective statements by the sponsor as to how to be guided through this, I again have the freedom to choose a very conservative track, and the conservative track is concerned for the risk of a type 1 error in the population at large. I think looking at a hodgepodge of secondary endpoints makes it very likely that we are misleading the population at large. I can't imagine writing a label that lists a host of different benefits and then says, but we're probably wrong on at least one of them, and that's tantamount to what we're doing by looking at this collection of secondary endpoints in a very unstructured manner. So, my vote is no. DR. DiMARCO: Mike? DR. WEBER: Well, I guess like most people I have shared concerns about 223, and I think as much as anything, it's an irritation that it took us so long to get to the real story of 223. I don't think that prospectively before the study began, the investigators had talked about short-term and long-term endpoints. I think most of us now have a pretty good sense of how the study evolved, and there's nothing wrong with how the study evolved. We're all investigators. We know how things happen and how new ideas come along during the course of a study. I think all of us around this table are experienced enough to know that those data can still be very helpful. They may not be perfect. They may not satisfy Dr. Moye's strict rules for clinical trials, but it would have been very helpful. 
And it's annoying that it took us so long to get at it. I in fact think the data from 223 are quite useful. They may not be perfect, but they're quite useful. I think the point that Cindy made was important, that ultimately we don't have to be judges of the perfection of trials. We have to be satisfied that a drug is efficacious and beneficial, and I think obviously most of us are. I'm also reassured that some of the less dramatic findings, the assessments by the physicians, the assessments by the patients, which were done blinded of course, went very strongly in the same direction as the more objective findings. So, I really have no difficulty in believing that carvedilol is a good drug for the treatment of congestive heart failure however it may ultimately be labeled, and I vote yes. DR. RAEHL: I'm influenced by the morbidity and mortality data first but also by the overall trends of some of the secondary factors as described in table 9, in particular the Heart Association classification and ejection fraction. So, therefore, I'll vote yes. DR. KONSTAM: I'm going to vote yes. To me the crux of the matter is this. We have one trial with a positive primary endpoint and that's 240. I guess the only way I can approach it then, without another primary endpoint that's positive, is do I believe the results of 240, yes or no, and my answer is, yes, I do believe the results of 240. The reason I do comes from a variety of different sources of information. The first is that 223 looks similar although it is not a primary endpoint, but I think there's evidence in 223 that pushes me toward believing 240. I think there's evidence in 220 and 221 that pushes me toward believing the results of 240, again totally the result of post hoc analyses, and therefore no way they could stand on their own, but again they push me toward believing the results of 240. 
What also pushes me toward believing the results of 240 is that I think there is evidence in the data set toward each of the individual components of 240, again not from any one primary endpoint, but I can't help being influenced by the overall direction and magnitude of change of the overall mortality, which is a component of 240, and although obviously not a primary endpoint and it wasn't a single trial to look at it, I can't help but recall the overall magnitude of this non-specified endpoint, nevertheless a very important endpoint, mortality, over the whole data set, again using it to support the fact that I believe the components of 240. Finally, I would say that I think that the results of 240 are believable from what else we know. They're not like a Vesnarinone result, for example, that's out of the blue and nobody would have expected based on what we know about that drug or other inotropic drugs. It's in the setting of sort of 20 years of thinking about beta-blockers and a lot of trends in similar directions that one can find in the literature about beta-blockers. So, I guess I don't find the results of 240 a surprise. So, for all of those reasons, I believe the result of 240 and therefore I vote yes. DR. DiMARCO: Dan? DR. RODEN: As I read this question, we're being asked, whether on the basis of these and other clinical trials, carvedilol should be approved for the treatment of heart failure, blah, blah, blah, having reached primary endpoints in two adequate and well-controlled studies. Now, I agree with everything that Marvin says, including the Vesnarinone comment and the decades of experience with beta-blockers, but I for exactly that reason vote no. There aren't two adequate and well-controlled trials. DR. LIPICKY: So, has anyone kept track of the vote? DR. DiMARCO: Yes. The vote is 8 to 2. DR. LIPICKY: 8 to 2. DR. DiMARCO: Yes. DR. LIPICKY: So, if it's 8 to 2 to approve, then we are done. DR. 
DiMARCO: Yes, because the next three questions all start "if not." DR. LIPICKY: Correct. But before we quit, I'd like to find out two things so I understand the sense of the committee. So, what does carvedilol do? (Laughter.) DR. LIPICKY: Can someone tell me? Should it be allowed to claim that it saves lives? DR. KONSTAM: We haven't discussed that. DR. LIPICKY: Should it be allowed to claim that it decreases the progression of heart failure? Because that's what hospitalizations and so on were in the past, but this is all-cause hospitalization you guys settled on. Should it be allowed to claim it makes people feel better? And by not going through all the rest of the questions, we weren't ever able to get that commitment. Would somebody -- anybody -- tell me what they think carvedilol does? DR. WEBER: Could I ask the sponsor? You started out today by saying you were not seeking a mortality claim. Is there something you know that we don't know? (Laughter.) DR. LIPICKY: Yes. It doesn't make it. DR. WEBER: I'm getting back to the questions Dr. Lipicky has just asked. DR. LIPICKY: You should not ask the sponsor anything now. You as a committee have said approve it. I want to know for what. DR. KONSTAM: Ray, based on the -- DR. LIPICKY: Okay? Tell me. Don't base anything. Just say the words. DR. KONSTAM: I want to look up the primary endpoint of 240 before I answer the question. (Laughter.) DR. LIPICKY: Progression of heart failure, a combined endpoint. DR. KONSTAM: Does it say that in the protocol? DR. LIPICKY: Yes. DR. KONSTAM: Okay, then that's what I would go with. DR. LIPICKY: Yes. It was combined endpoint progression of heart failure. DR. RODEN: I actually think symptom relief of heart failure. I think hospitalization is a very tough endpoint to figure out what it means especially with a drug that can be unblinded so easily. So, I think the data, as I see them, say that carvedilol doesn't do anything bad. 
There are these decades of experience that Marvin has pointed out, and most of the studies say that there is some relief in some of those indices of function, some improvement in some of those indices of function. Whether that's progression of disease, which I think is a pretty broad kind of claim, or relief of symptoms, which I think is a more constricted claim, I'll let Ray decide. DR. PACKER: Dan, don't you think hospitalization is harder than symptoms? Firmer. I don't mean harder -- actually harder to achieve, yes. DR. RODEN: No, and I think that hospitalization patterns in New Zealand may be very different from hospitalization patterns in Los Angeles compared to Des Moines, Iowa. DR. PACKER: But you're seeing it geographically all over the world. DR. RODEN: Well, I think the physician practices are different and I think clinicians approach patients with tachycardia differently from patients with no tachycardia and heart failure. So, I think that that's actually less firm. DR. PACKER: I need to say that because hospitalization is in vogue now as a heart failure endpoint. DR. RODEN: I guess except in California where no one goes into the hospital. DR. CALIFF: Let me speak out in favor of hospitalization as an endpoint. I think although there are clearly different rates of hospitalization in different societies and health care systems, the relative difference, if shown to be constant, is very important and represents clearly something that's bad for the patient, which is what we're charged to deal with. DR. KONSTAM: What I would vote for is indices of morbidity. It reduces the frequency of indices of morbidity. DR. LIPICKY: The indication would be this is for decreasing the incidences of morbidity. DR. KONSTAM: Of indicators of morbidity. DR. LIPICKY: Indicators of morbidity in patients with congestive heart failure. DR. KONSTAM: That's a first cut, yes. (Laughter.) DR. LIPICKY: Can anyone do better than that? DR.
DiMARCO: I would go actually what the endpoints in the trial were which are complications of congestive heart failure -- DR. LIPICKY: But what were they? DR. DiMARCO: -- which include death, hospitalization, and -- DR. LIPICKY: So, they get a mortality claim. DR. CALIFF: No, it's not a mortality claim. It's a composite claim that includes mortality. DR. LIPICKY: Death is in there, though. DR. DiMARCO: Death is in there because that's what we voted. DR. KONSTAM: It's a little confusing if you say it that way, though, because it sounds like we're saying there's an effect on mortality. I would say that mortality is an indicator of morbidity. (Laughter.) DR. WEBER: You'll get no argument from anyone on that. But a number of the trials used the combined morbidity and mortality endpoint, and it was significant. How are you going to separate them out? DR. LIPICKY: All right, I understand. So, that clarified it a little bit, well enough so that we'll be able to argue. Then I'd like the committee to argue with me when I assert that. What you all said was if you don't make it with your primary endpoints, root around in your data and find retrospective endpoints that sound good and you can wow us. DR. KONSTAM: No. DR. LIPICKY: No. DR. KONSTAM: Not at all. DR. LIPICKY: Well, how did you make that decision? DR. KONSTAM: Well, I already told you. The issue is that we have a primary endpoint that's positive, and getting back to the discussion -- DR. LIPICKY: One trial. DR. KONSTAM: Right, exactly. And getting back to the discussion from this morning, I guess the only thing I'd feel comfortable commenting on is whether or not I believe it based on the entire -- DR. LIPICKY: One trial? DR. KONSTAM: That's right. Do I believe that that finding in that trial is positive or not? I do feel that I believe the result of that trial based on many other things in the data set. DR. LIPICKY: So, you believe the results of study 240 based on seven other trials. DR. 
KONSTAM: I could go through it again. DR. LIPICKY: Well, that's what there are, seven other trials. DR. KONSTAM: Specific elements of those trials. DR. LIPICKY: But your conclusion that those other trials support 240 is because there were some p values for totally retrospectively defined endpoints. DR. TEMPLE: Ray, that's incorrect and you mustn't keep saying that. Most of those endpoints were prospective. They were just secondary. DR. LIPICKY: Which ones? DR. TEMPLE: Which ones were secondary? Death, global -- DR. LIPICKY: This is mortality and all-cause hospitalization. You invented that endpoint yourself. (Laughter.) DR. TEMPLE: Yes. DR. LIPICKY: Okay. There's no question that it's retrospective. DR. TEMPLE: Ray, if you want them to use their official secondary endpoint which is cause-specific, they win on that too. DR. LIPICKY: That was not a secondary endpoint in any protocol. It was also a made-up endpoint. DR. TEMPLE: This committee told us several things and it's perfectly clear to me what they told us. They said they do not agree that just because you fail on your primary endpoint, you can never learn anything from your secondary endpoints. DR. LIPICKY: I -- DR. TEMPLE: I'd like to finish. DR. LIPICKY: But -- DR. TEMPLE: I want to finish. Thank you. (Laughter.) DR. TEMPLE: They told us that. They were very clear on it. They also said that there were certain kinds of other endpoints, like the combination of all-cause mortality and morbidity, that are so persuasive in this setting that they're even willing to believe that. Now, that's debatable. Everybody can argue about that. DR. LIPICKY: That's fine. DR. TEMPLE: But that's what they told us. DR. LIPICKY: I just want to be sure that I understood them that retrospective endpoints that never were stated anywhere at all are perfectly okay. DR. KONSTAM: As an element of corroboration. DR. LIPICKY: No, this isn't corroboration. This replaces a trial. You have a trial and you're trying to make up a second. 
Is that not what corroboration is? How do you verify? DR. KONSTAM: No. Again, the only question I keep asking myself is do I believe the result of that one trial that is clearly positive. DR. LIPICKY: And you do that by? DR. KONSTAM: Well, I'm using the word "corroboration." The question is, do you need another positive primary endpoint in order to say that? And I'm saying no. I believe that without another primary endpoint. DR. LIPICKY: Fine, but you said that more than that, that you could make up the endpoint. We need to be certain that we understand exactly what you think. DR. KONSTAM: Yes. I think in some cases -- DR. CALIFF: Ray, it sounds to me like there are a variety of different reasons why people were swayed. In my case it actually was -- I think you're close to being right, but it's just that the endpoint that, as you say, was "made up" is the big one. It's something that's, at least in my mind, impossible to ignore if it's overwhelming, even if you weren't looking for it, even if you didn't think you were going to find it. And it's there in study after study. You put them all together. It's still there. If you count in the deaths in the run-in phase, it's still there. DR. LIPICKY: We would have gotten into all of this if you hadn't answered the first question yes. How do you know it's there? Because what you've done -- understand what you have done. For primary endpoints it's pretty clear what you're doing. Two .05's is .025 squared. Now you get into the secondary endpoint business because the primary endpoints don't get you to the level of confidence you want. What are the p values you need for the secondary endpoints? At least they were mentioned. Certainly you're not talking .05 anymore. You're elevating these guys to some status that is different from what a secondary endpoint is. You're out of this .05 stuff, and I would have liked to have seen what you thought was significant and at what level you thought it was significant. 
Now even if it's a big-deal, retrospective endpoint, you're really elevating that guy to something enormous, of enormous status. It is the same as a primary endpoint for purposes of approvability. What p value tells you that it's there study after study? DR. CALIFF: Actually I think this is a very important point. We might be sharper on it tomorrow, but certainly in monitoring clinical trials -- and I think somebody may have said this in one of the briefing documents -- when you hit a p value of .001 for all-cause mortality, for example, that's kind of my choking point to say it's kind of hard to ignore this and you may be crossing the boundary of acceptable continuation of not giving people this treatment. DR. LIPICKY: How many of those did you see? DR. CALIFF: You remember I specifically said in a worst case analysis, better than .001 for death or death and all-cause hospitalization would do it for me regardless. DR. LIPICKY: Well, but your comment was that -- and we accept your recommendation. I'm just trying to understand the basis of it. Okay? Your statement was that it was always there, and what I see are p values that go from .002 looking across the sponsor's thing -- so for one study .026, .035, and .378. This is for mortality and all-cause hospitalization. That's your big-deal endpoint. That's the nominal p values, uncorrected for anything including multiplicity and having been retrospective. DR. CALIFF: Those were for individual studies. DR. LIPICKY: You made the statement that it's really there, you really think it's there a lot, and I'm wondering how you made that decision. DR. KONSTAM: Just following up on what Rob said, I guess the single thing in the data set that makes me feel most comfortable that 240 is correct is the overall survival data. Now, then you ask the question, what kind of statistical correction would you do to that survival data given the fact that it's not a specified endpoint? 
I have no idea how to answer that from a mathematical viewpoint. DR. LIPICKY: So, you are saying you're comfortable with 240 for different reasons than Rob said he was comfortable with 240. DR. KONSTAM: I thought he said the same thing. DR. LIPICKY: No. He said it was mortality and all-cause hospitalization. You're saying mortality. DR. CALIFF: No. I said either one, but mortality being the strongest. DR. LIPICKY: The mortality being the strongest? DR. CALIFF: Sure. How can you ignore it? DR. LIPICKY: We really are back to square one. DR. KONSTAM: Both. DR. MOYE: This is the trap that we fall into. With no guidance from the investigators on how to interpret secondary endpoints, we're left with trying to root through this maze and it's very tough. With eight or nine of us, there will be eight or nine different paths that we're going to take, and it's going to be almost impossible to build a consensus about this. DR. KONSTAM: But, Lem, just the way I look at it, I think mortality to me is such a big deal, and that's what Rob is saying. We have this big magnitude effect with a very, very small p value, and I guess normally what I would do under that situation is turn to the statistician and say, what is the likelihood that that's a chance finding? DR. MOYE: Right, so you should. DR. KONSTAM: And I haven't gotten any guidance from that mathematically. So, now I'm going back to saying, you know what? It looks pretty big and it looks like a pretty small number. It trends in the same direction in every single trial, and it's not a surprise from a pathophysiologic viewpoint. To me that adds up to enough to be something of corroboration that the endpoint in 240, of which survival was an element, is real. DR. DiMARCO: Dan? DR. RODEN: I wasn't going to say anything more, but I feel compelled. I really feel that there is no basis for making a claim or including in the label the idea that this drug does anything to mortality. 
We have criteria for establishing drug effects on mortality. Those involve large trials. The designs are not a secret from anybody, certainly not a secret from Milton. (Laughter.) DR. RODEN: The notion that based on one trial that includes mortality as one of its many, many composite endpoints and then a bunch of other touchy-feely data, which I must say makes me feel good too, we'll approve a drug or we'll give it a labeling for mortality I think is completely inappropriate. My notion was that we feel good enough about this database that the drug ought to probably be approved for the management of symptomatic heart failure with the idea of reducing symptoms. If you want a mortality claim, there are ways to get a mortality claim, but I really feel very strongly that we shouldn't entertain a mortality claim at all. DR. WEBER: Even, Dan, if mortality was a pivotal part of the data that led us to our decision and conclusion? DR. RODEN: It's a very small part. Boy, it's a tiny part, though. DR. WEBER: Tiny but important. DR. RODEN: There are hundreds of endpoints, and of those hundreds of endpoints, there are tens of deaths. So, the endpoint you're talking about is hospitalization. You're not talking about mortality as the endpoint. DR. DiMARCO: Jeff? DR. BORER: I want to make several points. First of all, I think Marvin is right in his statement of the indication for the use of the drug, that it's a reduction in morbidity, the incidence of morbidity, whatever, because I agree that death is the worst morbid event you can have. And I don't think there are data here that allow us to support an independent mortality claim. But I do think that morbidity defined by hospitalizations, which I think is a pretty reasonable operational definition, of which mortality is the worst example, has been reasonably shown to be reduced by this drug. 
Now, in terms of the supporting evidence, it's true, you could look at these trials and it may be correct to say -- I don't think it is -- well, you know, they looked at them with no rhyme or reason, everything is secondary, it's retrospective, it's this and that, and therefore it doesn't work. I would suggest a slightly different way of looking at what we've seen here today. We have a large set of data from several different centers, many sites, and in addition to whatever analyses the sponsor decided to do or the investigators decided to do, which maybe they did -- I don't think they did -- to make themselves look better, the FDA came in and said, well, here's this database. We want these analyses done. So, go do these. Maybe they're different. In fact, they are different than the analyses that the sponsor chose to do. They may be analyses which are more difficult to get a positive answer with, perhaps a little bit harder in terms of the type of endpoints, more conservative. They're the kinds of endpoints Rob was talking about. When those analyses were done, coming down as a deus ex machina out of the sky, the results are positive, as were the results when the sponsor did whatever analyses the sponsor wanted to do. I look at, for example, study 220, which I don't suggest by itself should stand alone as the study, but if we correct for 10 other endpoints that were looked for here, we still have a p value of p less than .02 or something like that. I think that's pretty good. I think that one has to look at the consistency of the data across the entire package we've seen. I don't like the methods that we used for some of the studies. I don't like some of the analyses that were done. I share some of Lem's concerns about the way things were done and the way they shouldn't have been done. I think all that's true. But in the face of that, we're looking at an extraordinarily consistent database.
I find that compelling and I find it extraordinarily consistent with regard to what I think is a very important endpoint, and that is major morbidity defined by hospitalization, supplemented by the worst morbid event you can have which is death. So, for that reason, I think that the NDA supports the approval of the drug for that indication. With regard to symptomatic heart failure, maybe yes, maybe no. The data certainly trend in favor of symptom reduction, but I'm not totally convinced by that. But at least they trend in the right way to help support my belief that the major morbidity is reduced. DR. WEBER: And how would you express that or define it, Jeff? DR. BORER: How would I define what? DR. WEBER: The indication. DR. BORER: For the reduction in major morbidity as measured by hospitalizations and death. DR. DiMARCO: Cindy? DR. GRINES: I'd like, I guess, to say that in my opinion we approve this based on study 240, and I think that the indication for the drug should be based on the primary endpoint of 240 which is a combination of death and progression of heart failure. As we've all been talking, we're really swayed a lot by the mortality, and in 240 mortality alone was significant. The combined mortality from all these trials is highly significant. I think that it's not consistent with previous recommendations of this panel to just kick mortality out, even though that was the primary endpoint. There are drugs that have been approved with combined endpoints, including mortality, in which there is absolutely zero difference in mortality, but since it was a part of the primary endpoint, it was included in labeling. So, I don't know why we would change that in this particular study, particularly since most of the people on the panel are very impressed with the mortality differences. DR. 
KONSTAM: The problem with that, Cindy, is if you're approving it on the basis of 240 and using the words "morbidity" and "mortality," then it sounds like it's clear to us that the drug reduces mortality. That's what it sounds like. I can't get the wording. That's the problem that I have. DR. GRINES: All I know is that there are drugs that have been approved in just the past couple years in which mortality and other endpoints are in the label and there was zero difference in mortality, no significant difference whatsoever if you look at mortality individually. So, why is this drug being treated differently? DR. DiMARCO: Dr. Temple? DR. TEMPLE: Those drugs don't have a mortality claim. They have a mortality mention. You could say that about aspirin, for example, which in secondary prevention decreases the sum of hospitalization plus death, mostly hospitalization, and you have to struggle to find an actual separate mortality claim. But I guess if 240 is one of the main things one is relying on and really nothing else, then I'm newly disturbed because 240 alone, as a predominantly symptomatic trial, doesn't seem persuasive on its own. What I heard people saying is that they find some of these other endpoints, whether Ray is right to say they were picked out of the air or not quite right, persuasive even though they weren't the primary endpoints of the trial because they're so consistent and because their p values are in many cases so small. So, I heard some people say it, but that's not what I heard when the voting went around, that this was entirely based on 240. It was also based on a belief that you could learn something from those other trials. I have to say, if that's really not true, I'm distressed by what we've been told because we're going to have difficulty acting on it I think. DR. BORER: It's true for me. DR.
TEMPLE: For us to approve a claim solely on the basis of 240, one study in a symptomatic circumstance with a sort of marginal value, depending on how you do it, would be an unpleasant precedent for me personally. But that isn't what I heard people saying. I heard them saying they saw things in those other studies that should be considered persuasive, not perfect. We all know the flaws, but that there was information from those studies, which were all well-controlled studies by the way, that meant something. How to exactly write the claims I don't think is going to be as hard as all that actually. DR. DiMARCO: Ray? DR. LIPICKY: Well, I just want to defend myself for a moment. The all-cause hospitalizations and mortality appeared for the first time in November of 1996 in a letter that you wrote to them. It never appeared anywhere in any protocol, any analytical plan, or any correspondence prior to that time. So, if that is the basis of feeling comfortable, it is as retrospective as I can ever dream of something being retrospective. DR. TEMPLE: But, Ray, wasn't there a secondary endpoint in many or all of the studies -- DR. LIPICKY: No, sir. DR. TEMPLE: Hold on. I didn't finish my question. I'd like to. May I? Do I have permission? Wasn't there an endpoint of death plus hospitalization probably cause-specific plus increased use of -- DR. LIPICKY: No, sir. DR. TEMPLE: Just in 240. DR. LIPICKY: Yes. DR. TEMPLE: We know that was true in 240. DR. LIPICKY: Yes. DR. TEMPLE: I'd like confirmation from the sponsor on that. DR. LIPICKY: Well, I copied the protocols. They're in my review -- or not review -- my memo. The protocols are copied exactly verbatim. You won't find those words. DR. TEMPLE: You won't find all-cause, I'm sure of that. DR. LIPICKY: No. You won't find a combined endpoint of death plus something anywhere. DR. TEMPLE: Except in 240. DR. LIPICKY: Except for 240. That had a combined endpoint primary.
You will not find a combined anything anywhere, primary or secondary, in any other protocol. DR. PACKER: Ray, to be precise, each one of the U.S. multi-center protocols had hospitalizations for cardiovascular causes as its secondary endpoint. DR. LIPICKY: Death plus. DR. PACKER: Death should always be combined with hospitalization as a worst case. DR. LIPICKY: Terrific. So, you just didn't write well. All I'm saying is it is not anywhere written down. DR. PACKER: You can't take the worst case out of analysis of a non-fatal event. You can't do that. DR. LIPICKY: All right. DR. MOYE: You shouldn't do it. The way to ensure you don't is to say it prospectively. I agree you shouldn't do it, but was that said? DR. PACKER: It wasn't said but the data speak for themselves. DR. CALIFF: I would like to just hear what Lem says. How persuasive would an unexpected mortality benefit need to be for you? We haven't seen what the actual statistical assessment is here for all the data on this drug. We really actually haven't seen that today. DR. MOYE: You really missed the May meeting, didn't you? (Laughter.) DR. CALIFF: Yes. You're saying there's no circumstance. DR. MOYE: I'm saying that the finding for mortality was a surprise and was not a prospectively stated endpoint, and that since bad surprises can occur after good surprises, that we should not accept the good surprise on its face value. It should be confirmed in a trial done to look specifically at mortality as the primary endpoint and pushed through to the end, done in a correct way. We can learn a lot of things. Discovery is fine, but you shouldn't label based on discovery I think. DR. DiMARCO: Well, are there any other comments from anyone on the panel? (No response.) DR. DiMARCO: I think I'll adjourn the meeting. Thank you all for coming. (Whereupon, at 6:20 p.m., the committee was recessed, to reconvene at 8:30 a.m., Friday, February 28, 1997.)