In this work we examined various aspects of generic outcome models for adult intensive care patients. After a review of the main generic models developed over the last 30 years, we discuss some methodological fundamentals of outcome prediction model development. The objective of these theoretical and methodological descriptions is to help the reader understand what severity of illness models are and how they are developed. We did not intend to present an exhaustive review of existing models, nor did we aim to offer the reader a method for developing his or her own severity model.
In a first research paper, we studied the problem of severity of illness data collection. This issue may appear trivial; however, adequate data collection is a necessary prerequisite for accurate score calculation and hence for the best possible performance of these indices. In this study, we surveyed three groups of data collectors: untrained junior clinical staff, a trained research coordinator, and senior clinical staff with extensive experience in collecting the APACHE II score (Ledoux, Finfer et al. 2005). We evaluated the
impact of expertise on the collection of the APACHE II score and on the derived risk of death. We showed that, for most APACHE II variables, the lowest rates of agreement were found between the inexperienced group and the other groups. Interestingly, although the discrimination of the APACHE II score was not affected, calibration proved poor for predictions established from the data collected by junior clinical staff. As a result, the ratio between observed mortality and mortality predicted by the APACHE II score
(Standardized Mortality Ratio, SMR) tended to be higher, leading to a falsely pejorative evaluation of the ICU. These results lead us, like other authors (Goldhill and Sumner 1998; Polderman, Jorna et al. 2001), to stress the importance of accurate severity score data gathering and to recommend that ICUs be provided with sufficient resources to train and employ dedicated data collectors.
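For reference, the SMR relates the number of observed deaths to the number predicted by the model; writing $y_i$ for the observed hospital outcome of patient $i$ (1 = death) and $\hat{p}_i$ for the model-predicted risk of death, it is:

```latex
\mathrm{SMR} = \frac{\text{observed deaths}}{\text{expected deaths}}
             = \frac{\sum_{i=1}^{n} y_i}{\sum_{i=1}^{n} \hat{p}_i}
```

An SMR above 1 indicates more deaths than the model predicts, which is why miscalibrated predictions derived from inaccurately collected data can make a unit appear falsely poor.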
Another problem encountered with severity of illness indices is that their performance is not stable over time. Several authors have indeed shown that the performance of severity of illness indices and their associated mortality prediction models deteriorates over time (Rowan, Kerr et al. 1993; Apolone, Bertolini et al. 1996; Moreno, Miranda et al. 1998). Two factors may explain this deterioration: changes in the intensive care population and the evolution of available therapies. Indeed, when the APACHE II score is used to compute a mortality prediction for a group of patients admitted to intensive care in 2009, the resulting prognosis refers to the ICU population and treatments of the years 1979-1982. Yet the intensive care population has clearly changed over the last 30 years; Hariharan et al. showed, for instance, that the proportion of octogenarians doubled between 1996 (6%) and 2008 (12.5%) and that the ICU stay of patients admitted after elective surgery lengthened (Hariharan and Paddle 2009).
Given these observations, new models have been developed. Among them, the SAPS 3 admission score is certainly the most interesting (Metnitz, Moreno et al. 2005; Moreno, Metnitz
et al. 2005). Among recent models, it is indeed the only one developed from a vast international sample of patients from a large number of countries across three continents (Europe, America and Oceania), thus allowing model customization according to geographical area. However, even though this severity score appears promising, external validation studies in independent patient samples remain scarce. We present here a study which is, to our knowledge, the first to validate the SAPS 3 score in an independent general intensive care population and to show the superiority of the SAPS 3 admission score over the APACHE II score (Ledoux, Canivet et al. 2008). In this study, which included more than 800 patients, we observed that the SAPS 3 admission score customized for Western Europe performed better (better discrimination and calibration) than the APACHE II score. One may deplore that, to date, few clinical studies, therapeutic trials in particular, refer to the SAPS 3 model. The APACHE II score indeed remains, except in France, the leading model, despite the fact that even Knaus, the original developer of APACHE II, advised researchers to
discontinue the use of APACHE II for outcome assessment (Knaus 2005).
We then turned to possible ways of improving generic indices. We started from the observation that the explanatory power of the acute physiology component tended to decrease in more recent models. Indeed, 73% of the explanatory power of the APACHE III score (Knaus, Wagner et al. 1991) was shown to derive from physiology variables (Ridley 1998), whereas this proportion falls to less than 30% in the SAPS 3 admission score (Moreno, Metnitz et al. 2005). The reduced contribution of physiological disturbances to the explanatory power of the SAPS 3 model may partly be explained by the input of the patient's preadmission characteristics. One cannot exclude, however, that the physiology variables used in current prediction models lack discrimination. We therefore explored physiology variables other than those commonly used in outcome models, considering three organ systems: brain, heart and kidneys. In severity models, cerebral function is generally evaluated using the Glasgow Coma Scale (GCS). This scale, however, presents several flaws: it does not assess brainstem function, it is theoretically not applicable to intubated patients, and it lacks the discrimination needed to identify conditions such as the minimally conscious state (MCS) or locked-in syndrome (LIS). We describe here the Full Outline of UnResponsiveness (FOUR) score (Wijdicks, Bamlet et al. 2005), which could advantageously replace the Glasgow Coma Scale in future model developments. Renal function is a well-known risk factor for morbidity and mortality (Anderson, O'Brien et al. 1999; Franga, Kratz et al. 2000; Penta de Peppo, Nardi et al. 2002). In outcome models, renal dysfunction is often evaluated using serum creatinine. However, several authors have shown that serum creatinine is not a good marker of glomerular filtration rate (Perrone, Madias et al. 1992; Herget-Rosenthal, Marggraf et al. 2004). Cystatin C could be a better marker than creatinine for estimating ICU patients' risk of death. We therefore conducted a study in patients admitted to the ICU after open heart surgery, in which we showed that the glomerular filtration rate (GFR) estimated from serum cystatin C was a better risk marker for 1-year mortality than the GFR estimated from serum creatinine (Ledoux, Monchi et al. 2007). Hence,
the use of cystatin C in outcome prediction models could prove interesting. In severity of illness indices, circulatory function markers are limited to blood pressure and heart rate. In an unpublished study (Ledoux 2008) including more than 500 patients admitted to intensive care after open cardiac surgery, we showed that a 1-year mortality prediction model based on objective variables such as age and troponin T, pro-BNP and CRP levels performed at least as well as the EuroSCORE. Although we focused on a subgroup of ICU patients, it seems reasonable to think that introducing variables reflecting the degree of cardiac ischemia, such as troponin T, or the degree of ventricular dysfunction, such as pro-BNP, could be valuable in future model developments.
Developing severity indices is not an end in itself. Outcome prediction models are instruments whose purpose is to help the clinician improve quality of care. These models can play a role at various levels of medical practice. Although they are not designed for this purpose, outcome prediction models can assist in individual prediction. They bring objective information on the patient's condition that may strengthen clinical perception and hence make the physician more confident in his decisions. The patient also benefits from this objectivity. However, when used for individual risk assessment, outcome
prediction models must be interpreted in the light of the patient's whole clinical picture. A more common use of severity scores is ICU performance evaluation; these instruments allow observed and predicted mortality to be compared, the SMR to be computed, and hence the ICU's efficacy to be assessed and benchmarked. However, ICU evaluation using the SMR has limitations. For this application, the severity models must be correctly calibrated; poor calibration may indeed lead to false conclusions.
Moreover, hospital mortality may be influenced by factors unrelated to ICU efficacy, such as hospital discharge practices, the availability of step-down structures like nursing homes or palliative care institutions, or patient and family priorities. Finally, although hospital outcome is an important endpoint, other endpoints such as long-term survival and quality of life may be more meaningful and more relevant for the patient and his relatives.
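As an illustrative sketch (not drawn from the studies above), the SMR used for such benchmarking can be computed directly from patient-level data, with hypothetical outcomes and predicted risks:

```python
# Illustrative sketch: SMR for a hypothetical group of ICU patients.
# `observed` holds hospital outcomes (1 = died, 0 = survived);
# `predicted` holds each patient's model-predicted risk of death.

def standardized_mortality_ratio(observed, predicted):
    """SMR = observed deaths / expected deaths (sum of predicted risks)."""
    expected = sum(predicted)
    if expected == 0:
        raise ValueError("expected mortality is zero")
    return sum(observed) / expected

# Hypothetical data: 5 patients, 2 deaths, predicted risks summing to 2.0,
# so the SMR is about 1 (observed mortality matches the prediction).
observed = [1, 0, 0, 1, 0]
predicted = [0.60, 0.10, 0.20, 0.70, 0.40]
print(standardized_mortality_ratio(observed, predicted))  # ≈ 1.0
```

An SMR above 1 would suggest more deaths than predicted; as noted above, this interpretation is only valid if the underlying model is well calibrated for the population at hand.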
The study of outcome prediction models inevitably leads to ethical considerations, especially issues related to end-of-life decisions in the ICU. The Ethicus study demonstrated that end-of-life decisions are routine in the ICU: life support therapy was limited in 3 out of every 4 patients who died there (Sprung, Cohen et al. 2003). Information from outcome prediction models could help physicians in their decision-making process. Previous studies showed that end-of-life decisions were difficult in up to 72% of discussions (Sharma 2004); using severity models may help reduce the burden such decisions place on physicians. Outcome models may also be more equitable for patients, since they do not incorporate value-based judgments. Nonetheless, several factors limit the use of severity scores in end-of-life decisions, the main one being clinicians' resistance.
In this work, we showed that, in spite of their limitations, generic outcome indices are likely to contribute to improving quality of care. Several countries have set up national projects aiming at ICU evaluation (ICNARC; de Keizer, Bonsel et al. 2000; Villers, Fulgencio et al. 2006). We believe it would be useful to launch a similar program in Belgium to allow ICU benchmarking across the country. It is important, however, to keep in mind that the goal of such a project should not be to rank ICUs but rather to create a base of knowledge enabling each ICU to progress. With this in mind, it seems important to us that such a project be led by healthcare professionals with the support of their scientific society.