Predicting Poor Outcomes in Heart Failure
Fall 2011 - Volume 15 Number 4
Background: Health plans must prioritize disease management efforts to reduce hospitalization and mortality rates in heart failure patients.
Investigations into risk factors for poor outcomes in patients with heart failure have included patient-reported symptoms, comorbid conditions, and laboratory findings.1,2 These studies show that increases in patient-reported symptoms, extreme hemoglobin values, and poor renal function are important risk factors for poor outcomes. However, findings from echocardiogram data, an important diagnostic tool in heart failure, have been less well studied for their predictive value across the spectrum of heart failure. Among patients with systolic dysfunction (ejection fraction <50%), lower ejection fraction values are associated with worse survival.3 Other studies, however, have demonstrated that higher ejection fraction values are associated with worse outcomes than lower values among patients with preserved left ventricular function.4 In this study, we examine the predictive value of echocardiogram measurements, including both ejection fraction and left ventricular wall thickness, to further clarify these risks.
We stress prediction in the analysis and interpretation. Our interest is predicting outcomes among heart failure patients using information that providers and health plans collect routinely, not hypothesis testing5p1-9 (eg, does low ejection fraction cause mortality). This is an important distinction: although every causal factor is a predictor, not every predictor is a cause,6 so the variables included in a prediction model, and the order of their entry during model building, may differ from those of a causal model. This study sought to predict outcomes among heart failure patients, to develop a prognostic risk model that could separate higher- from lower-risk patients, and to determine how effectively ejection fraction and left ventricular wall thickness improved predictions. Whereas other investigators have examined prediction rules in the inpatient setting and for patients with severe heart failure,7,8 we included patients more representative of those in a community-based setting.
We conducted a retrospective study on a prevalence cohort of patients. Patients included in the study were adult (age 18 years and older) members of Kaiser Foundation Health Plan of the Northwest (Health Plan) who had an echocardiogram completed between 1999 and 2004 and a diagnosis of heart failure. Patients were followed for up to 5 years or until April 1, 2005, death, or disenrollment from the Health Plan (whichever came first). The patient's first echocardiogram served as the index date. Patients were required to have at least 1 year of Health Plan membership (and prescription benefit coverage) before their index echocardiogram. All patients had 1 to 3 years of baseline data from which baseline covariates were extracted. We included patients with a diagnosis of heart failure (International Classification of Diseases, Ninth Revision [ICD-9] 428) from the inpatient or outpatient setting during the baseline period—or up to 30 days after their index echocardiogram (to account for diagnoses assigned after an echocardiogram, presumably on the basis of the echocardiogram findings). Others have found that ICD-9 428 has a positive predictive value of 82%9 for heart failure, although that test performance may not be the same in our setting. However, we chose this pragmatic inclusion strategy in order to mimic the data readily available (eg, without chart review) to a centralized population management department. The Kaiser Permanente Northwest (KPNW) institutional review board approved this study.
The most current covariate baseline value (before the index date) was extracted from KPNW's electronic medical record (EMR), including laboratory data from inpatient or outpatient visits, patient registries, and echocardiogram findings (see Table 1 for a listing). Covariate laboratory findings were obtained from outpatient laboratory data only, because outpatient values are less likely to be influenced by acute events. The estimated glomerular filtration rate (eGFR) was calculated using the four-variable Modification of Diet in Renal Disease (MDRD) equation.10
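As a concrete illustration, the four-variable MDRD estimate can be computed as sketched below. This is our illustration, not the study's code; the function name is ours, and the leading coefficient is an assumption (186 in the original MDRD formulation; laboratories with IDMS-traceable creatinine assays use 175).

```python
def egfr_mdrd(scr_mg_dl, age_years, female, black, k=186.0):
    """Four-variable MDRD estimate of glomerular filtration rate,
    in mL/min/1.73 m^2. The coefficient k is assay-dependent:
    186 (original MDRD) or 175 (IDMS-traceable creatinine)."""
    egfr = k * (scr_mg_dl ** -1.154) * (age_years ** -0.203)
    if female:
        egfr *= 0.742  # female adjustment factor
    if black:
        egfr *= 1.212  # Black race adjustment factor
    return egfr
```

For example, under these assumptions a 60-year-old non-Black man with a serum creatinine of 1.0 mg/dL has an eGFR of roughly 81 mL/min/1.73 m^2.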
Our outcome of interest was a composite of all-cause mortality or hospitalization (whichever came first) with a primary discharge diagnosis of heart failure.
We used Cox regression to predict the combined endpoint of all-cause mortality or first cardiovascular hospitalization. Patients were required to have a measured ejection fraction. As recommended by experts,5p41-5 we required at least 20 endpoint events for each degree of freedom reflected in candidate variables. We included candidate variables with <15% missing data in the model and used single imputation for missing values.
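The single-imputation step can be sketched as follows. This is an illustrative implementation under assumptions, since the paper does not specify the imputation model; consistent with the later description of assigning values from patients' other known characteristics, missing values here are filled with a least-squares prediction from one observed covariate. The function name and record layout are ours.

```python
def impute_single(records, x_key, y_key):
    """Regression-based single imputation: fill missing y_key values
    with a least-squares prediction from x_key, fit on complete cases.
    records is a list of dicts; missing values are None."""
    complete = [r for r in records if r[y_key] is not None]
    n = len(complete)
    mean_x = sum(r[x_key] for r in complete) / n
    mean_y = sum(r[y_key] for r in complete) / n
    sxx = sum((r[x_key] - mean_x) ** 2 for r in complete)
    sxy = sum((r[x_key] - mean_x) * (r[y_key] - mean_y) for r in complete)
    slope = sxy / sxx if sxx else 0.0
    for r in records:
        if r[y_key] is None:  # impute in place
            r[y_key] = mean_y + slope * (r[x_key] - mean_x)
    return records
```

Unlike multiple imputation, this approach fills each gap with one value and therefore understates the uncertainty introduced by the missing data.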
We included covariates if they were prevalent in at least 10% of patients. When deciding the variables to include in our model, we tried to balance each variable's ease of electronic extraction and measurement error. To do this we fit a series of models starting with easily obtainable demographic characteristics (ie, Model 1, including age, gender, race, body mass index [BMI]), then added variables readily measured at a primary care office visit (Model 2, adding eGFR, hemoglobin, blood pressure to Model 1 variables). In the next model we added echocardiogram data (Model 3, adding ejection fraction and posterior heart wall thickness, plus an interaction with wall thickness and BMI to standardize for body mass to Model 2 variables). These clinical measures are more expensive to measure, but are thought to be closely related to heart failure prognosis. Our next model added a set of comorbid conditions (Model 4, adding smoking, diabetes, dyslipidemia, and hypothyroidism to Model 3 variables) that were readily diagnosed by patient inquiry or laboratory tests (ie, minimal measurement error). Model 5 added comorbid conditions (transient ischemic attack, chronic lung disease, heart valve disease, atrial fibrillation/flutter, depression, coronary artery disease, peripheral vascular disease, and stroke) that required complex clinical interaction to diagnose.
Model fit was assessed using the Bayesian Information Criterion (BIC). We calculated the concordance statistic5p465-507 (C-statistic = 1 is perfect prediction; C-statistic = 0.5 is chance prediction) to evaluate each prediction model's accuracy. Some readers may be more familiar with the area under the receiver operating characteristic curve (AUROC) in logistic regression, which is the analog of Cox regression's C-statistic. To determine the added predictive value of each model's set of additional variables, we evaluated the marginal relative change (%) in C-statistic accuracy between models.11 Patients were categorized into quintiles of predicted risk. The mean predicted risk (overall and per quintile) was then compared with the mean observed risk to examine model calibration (predicted risk/observed risk). A perfectly calibrated model would have identical observed and predicted risk. We plotted the observed risk against the predicted risk (within quintiles of predicted risk) to evaluate the risk model's calibration (ie, the extent to which the predicted risks over- or underestimate the observed risks).12 We assessed the model's discriminative ability by dividing the predicted risk in the highest quintile by the predicted risk in the lowest quintile.12
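The C-statistic for a survival risk score is the fraction of comparable patient pairs in which the model assigns the higher risk to the patient who fails sooner (Harrell's concordance). A minimal sketch, with a hypothetical function name, handling right-censoring in the usual way:

```python
def harrell_c(times, events, risks):
    """Harrell's concordance statistic for a survival risk score.
    A pair (i, j) is comparable when subject i had the event and a
    shorter observed time than j; the pair is concordant when i also
    has the higher predicted risk. Ties in risk count as one half."""
    concordant = comparable = 0.0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable
```

With a binary outcome and no censoring, this quantity reduces to the AUROC mentioned above.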
We used Stata 9.2 (StataCorp, College Station, Texas, USA) and R (version 2.4), open-source software from the R Foundation for Statistical Computing (www.R-project.org).
The authors had full access to the data and take responsibility for its integrity. All authors have read and agree to the manuscript as written.
Among 519,383 adults aged 18 years and older, we found 10,265 with a diagnosis of heart failure—8291 of whom had an echocardiogram. Our analysis dataset included the 4696 of those patients with at least 1 year of Health Plan membership and pharmacy coverage before their echocardiogram. Table 1 shows the baseline characteristics and event rates. Older age, low ejection fraction, and cardiovascular disease were associated with higher death rates, whereas increasing BMI, blood pressure, and dyslipidemia were associated with lower event rates. The overall observed 5-year risk was 56% (95% confidence interval [CI], 54% to 58%).
Table 2 shows the main results for the prediction models. The first model, with demographic characteristics only, had a C-statistic of 0.63, suggesting accuracy 13 percentage points better than chance prediction. In Model 2 (demographics along with hemoglobin, eGFR, and blood pressure), the C-statistic shows a 31% relative improvement: from 0.63 to 0.67. Adding echocardiogram data in Model 3 showed a 6% relative improvement, to 0.68. Adding diagnoses that are readily measured in Model 4 improved the accuracy to 0.69 (6% relative improvement), and adding further comorbidities raised the C-statistic to 0.71 (an 11% relative improvement). The hazard ratios (HRs) for baseline characteristics were consistent across models, and model fit improved (ie, lower BIC), though not appreciably, with added characteristics. Few hazard ratios exceeded 1.5; the exceptions were older age, anemia, low eGFR, and low ejection fraction. Ejection fraction <20%, however, was a characteristic of <5% of the population. No HRs exceeded 2.5. No interactions were significant at p < 0.05.
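The relative improvements quoted here are consistent with measuring each C-statistic as its gain over chance prediction (C = 0.5). A small sketch (our arithmetic, not the authors' code) reproduces the quoted percentages:

```python
def relative_improvement(c_prev, c_new, chance=0.5):
    """Marginal relative change in C-statistic accuracy, where
    accuracy is measured as the gain over chance prediction (0.5)."""
    return (c_new - chance) / (c_prev - chance) - 1.0

# Each step between successive models in Table 2:
for prev, new in [(0.63, 0.67), (0.67, 0.68), (0.68, 0.69), (0.69, 0.71)]:
    print(f"{prev} -> {new}: {relative_improvement(prev, new):.0%}")
```

Running this yields the 31%, 6%, 6%, and 11% relative improvements reported in the text.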
As discussed above, the C-statistic results show that Models 2 and 3 have similar discriminatory power. Another dimension of discrimination is the ratio of observed risks across the highest and lowest quintiles of predicted risk. The risk model from Model 2 showed that patients in the highest quintile were about 3 times as likely to have the outcome as patients in the lowest quintile: 84% (highest quintile); 66% (60th to 79th percentiles); 53% (middle quintile); 42% (20th to 39th percentiles); 30% (lowest quintile). Similar results were found for Model 3 (see Figure 1).
The calibration of Models 2 and 3 was also very similar, as shown in Figure 1. Specifically, calibration was excellent at the highest level of predicted risk (ie, 84% observed vs 84% predicted for Model 2, and 85% observed vs 85% predicted for Model 3) and was within 5% for all quintiles in both models.
We found that easily accessible data from EMRs can be combined to predict patients at risk of poor outcomes from heart failure, and that they predict as well as models using less easily accessible clinical data. From the perspective of the Health Plan, our prediction model may be most valuable for prioritizing centralized disease management program efforts by stratifying patients according to their absolute risk of poor outcomes. Care Managers might then use that risk data to focus coordination of care efforts on those at highest risk, or perhaps to deliver specific health prevention information to patients not yet at the highest risk. Information on individual patient risk level could also be provided to physicians as part of their decision making as they identify patients most likely to benefit from case management, a complex patient medical home or referral for other heart failure specific services. Unlike many previous efforts at risk prediction for patients with heart failure, our analysis was not restricted to any particular subgroup of patients; instead, the population we used is representative of the community setting. This is an important point for disease management efforts because those responsible for population management are concerned with the totality of the Health Plan members with heart failure. We also purposefully focused on demographic and clinical findings that are available from routinely collected EMR data, making our results more immediately applicable to centralized care management programs.
Some data, such as ejection fraction measurements, are inconvenient to obtain from automated data sources because they do not exist in easily extractable data fields. However, our analysis showed little increase in predictive ability when we added increasingly complex clinical measurements. Our findings illustrate that the added predictive ability of knowing ejection fraction is small compared with a model that includes demographics, blood pressure, renal function, and anemia status. The small increase in accuracy when echocardiogram measurements were added is particularly interesting and has potentially significant implications for disease management prioritization efforts: although ejection fraction was available to us electronically at KPNW, not all managed care environments enjoy such access.
Some may find it surprising that ejection fraction did not add more predictive ability in our study. To put this into context, previous investigations have shown that each 10-point decrease in ejection fraction below 45% is related to an approximately 30% increase in death rate (HR, 1.31; 95% CI, 1.24 to 1.38) over a median of 38 months of follow-up, after adjusting for other potent risk factors (including, for example, New York Heart Association heart failure class).3 Importantly, our findings confirm this strong relation between ejection fraction and poor outcomes: the HR for individuals with poor ejection fraction was greater than 2.0 (eg, Model 3: HR, 2.18; 95% CI, 1.77 to 2.68), meaning that they were about twice as likely to suffer the outcome as patients with better ejection fraction, so ejection fraction does have independent predictive ability. Additionally, our findings depend on the specific prediction time frame (ie, 5 years) that we used; a shorter time frame might have suggested stronger predictive value for echocardiogram findings. Thus, taken on its own, ejection fraction is an important risk factor, but as Guyatt has pointed out,13 predicting absolute risk requires a clinician to balance competing risk factors simultaneously. Doing so is an extremely difficult cognitive challenge. A regression-based approach like the one taken here can help providers and health plans avoid double-counting the contribution of correlated risk factors. It should be noted that we used ejection fraction findings from a single point in time; additional prognostic value may be available, for example, from serial ejection fraction measurements. We purposefully included the entire spectrum of heart failure patients (ie, both preserved and reduced systolic function) in our model.
Thus, our findings do not address whether ejection fraction measurements would add predictive ability if stratified models were built for patients with preserved systolic function and, separately, for patients with systolic dysfunction.
A study8 of 4 clinical prediction rules in hospitalized heart failure patients examined predictive ability for the outcomes of inpatient death, complications, and 30-day mortality. The measure of accuracy used in that study (the area under the receiver operating characteristic [ROC] curve, analogous to the C-statistic used in our study) was below 0.62 for the composite of inpatient death or complications and as high as 0.74 for inpatient death alone. Our study differs in important ways: 1) we used outpatient instead of inpatient characteristics; 2) we were able to assess the usefulness of echocardiogram data instead of physical examination findings such as pulse and respiratory rate; and 3) we had a 5-year follow-up period, not a 30-day period. In spite of these differences, however, the 2 studies have remarkably similar findings and illustrate the difficulty of predicting outcomes in patients with heart failure. They further suggest that investigators are missing important characteristics in existing models of heart failure prognosis. For example, prognosis may depend on a patient's willingness to adhere to medications and other daily disease management efforts, which are difficult to capture at baseline.
Our study's main limitation is the lack of a protocol for measuring the characteristics completely and reliably. We had to impute 11% of BMI values, for example, because BMI was not collected during the baseline period. It is possible that BMI would be a stronger predictor if we had measured it more completely, instead of assigning patients' values according to their other (known) characteristics. The other characteristics that contributed to the risk prediction model had far less missing data (for example, 99.8% of patients had a recorded systolic blood pressure value), but may have been measured unreliably because we relied on a single value. Other investigators have shown that predicting cardiovascular events on the basis of a single baseline blood pressure value underestimates the strength of the relation by as much as 60%, a statistical problem known as regression dilution bias.14 The characteristics that we evaluated might have discriminated patients' risk more effectively if we had been able to reduce regression dilution bias through repeated baseline measurements. Although a prognostic risk model based on repeated baseline measurements would be better in theory, it would be impractical for most health plans, as they lack repeated baseline measures collected according to a protocol. Our findings should be validated in other cohorts.
We suggest that Model 2 could be used for disease management prioritization efforts, provided that all patients in the population to be stratified have a recent echocardiogram and a diagnosis of heart failure. We feel this is important because every patient who contributed to our risk model had an echocardiogram, which may have influenced the spectrum of heart failure patients. Patients without an echocardiogram, for example, may have had less severe disease, or at least been less symptomatic, compared with patients who had one. Because we excluded patients without an echocardiogram, we cannot evaluate how the effectiveness of our risk model's predictions varies across the entire spectrum of patients with a diagnosis of heart failure.15 Decision makers who use our risk model to prioritize patients for disease management may therefore elect to calculate predictions only for patients who have had an echocardiogram. Including lower-risk heart failure patients would probably compromise the risk model's accuracy and reduce its transportability, because successful transportability of a risk model to other clinical populations depends on the comparability of their disease spectrum.16
This study was funded through a contract with Amgen Pharmaceuticals. Although Amgen had an opportunity to comment on the manuscript, the company had no role in the study design, analysis, or writing of the manuscript; the authors bear full responsibility for this final version.
1. Go AS, Yang J, Ackerson LM, et al. Hemoglobin level, chronic kidney disease, and the risks of death and hospitalization in adults with chronic heart failure: the Anemia in Chronic Heart Failure: Outcomes and Resource Utilization (ANCHOR) Study. Circulation 2006 Jun 13;113(23):2713-23.