Why is there a need to assess the psychometric or clinimetric properties of an outcome measure in different clinical populations? This is a fair question. For example, a considerable body of research suggests that the Functional Independence Measure (FIM) is a valid and reliable measure. Do we really need to test it in different diagnostic populations? The short answer is ‘absolutely’. The long answer is a bit academic, but important all the same.
The FIM was developed to assess the burden of care in the stroke population (Granger et al. 1986). There has been significant investment in the development of the FIM, and it has become the gold standard for the assessment of basic function (e.g. transfers, mobility, dressing, grooming, bowel and bladder). In fact, it is core to the minimum dataset used in many administrative databases, such as the CIHI Rehabilitation Reporting System and the Uniform Data Set for Medical Rehabilitation Centers in the United States. Despite the popularity of the FIM (now a proprietary entity) and its universal recognition, attempts to use it across a broad range of disabling physical disorders, including SCI, have revealed deficiencies and inadequacies. In fact, Catz and colleagues (1997) created the Spinal Cord Independence Measure (SCIM) in response to frustrations with using the FIM to categorize the functional changes associated with Activities of Daily Living (ADL) during SCI rehabilitation. Their results suggest that responsiveness, or the ability to detect change, is better in the SCIM than in the FIM (Catz et al. 1997; 2006; Itzkovich et al. 2003). Now in its third version, the SCIM III is gaining international acceptance as the measure of choice for assessing functioning and the performance of activities of daily living after SCI (Anderson et al. 2008; Dunn et al. 2009).
Another example is the Short Form-36 (Ware & Sherbourne 1992) and its shorter cousin, the Short Form-12. These extremely popular generic surveys of health-related Quality of Life (QOL) include items oriented around activity limitation at the personal level, as well as participation restriction at a societal level (e.g. can you lift and carry an object; can you climb stairs?). It seems obvious that a good proportion of the SCI population would not be able to complete many of these activities. This is why it is critical to confirm that each survey item is first and foremost appropriate for the level of SCI being assessed, as unacceptable items can alter the individual’s response (for example, how seriously they answer) or confound the data from each study cohort. This stance does not mean that new tools should be created for every diagnosis, health condition or situation (Streiner & Norman 2004). It does mean that existing tools must be validated for each study population, so that they are both sufficiently accurate and sensitive to detect a meaningful difference in a functionally significant clinical endpoint between the experimental and control groups of a trial (Steeves et al. 2006).
If the above reasons are not compelling enough, Portney and Watkins (2000), in their discussion of generalizability theory (a form of reliability theory in which measurement error is viewed as multidimensional), remind us that establishing population-specific reliability is essential, especially in clinical practice. The nuances of many factors, such as pain, spasticity and deformity, can alter the reliability of any obtained result. In short, while an absence of evidence is not evidence of absence, we are obligated to demonstrate and document the reliability and validity of a test score in each population in order to have faith in our results.