For the last five years, OHE's Deputy Director, Paula Lorgelly, has been researching reporting behaviour, or reporting heterogeneity, in self-reported health (SRH) measures, including the EQ-5D. Paula, along with colleagues at the Centre for Health Economics, Monash University; Curtin Business School, Curtin University; the Faculty of Health and Medicine, Lancaster University; and the Centre for Health Economics, University of York, has worked on a programme of research designing and testing tools to identify reporting heterogeneity in the EQ-5D-5L.
Intergroup comparisons using SRH rely on the measure being an accurate reflection of the latent underlying health of the groups or individuals concerned. However, responses to questions on subjective scales, such as the domains of the EQ-5D, will be incomparable if certain groups of people systematically differ in their use of the response categories. Systematic variation in the use of response categories is known as reporting heterogeneity, response-scale heterogeneity, or differential item functioning (DIF), and has been observed across a range of respondent characteristics including age, socioeconomic status and nationality. DIF has been shown to exist in other self-reported measures of health, but has largely been overlooked in the case of the increasingly popular Patient Reported Outcome Measures (PROMs).
Anchoring vignettes have been shown to be a promising approach for detecting DIF. This approach involves presenting a series of vignettes (brief descriptions of the health state of a hypothetical individual) for each health construct of interest, at varying levels of severity.
Consider the figure below. Suppose we have two vignettes, where the person in vignette 1 (Olivia) is described as having fewer health problems (in this example, mobility problems) than the person described in vignette 2 (Vicky).
How two groups (A and B) might, on average, rate the health/mobility of the vignettes is illustrated below, where the fixed health of each vignette is represented by the dotted horizontal lines. Group B is relatively optimistic: its ratings are more favourable than Group A's for both vignettes. On average, Group A reports slight problems for vignette 1 and severe problems for vignette 2, while Group B reports no problems for vignette 1 and moderate problems for vignette 2. Because each vignette describes a fixed level of health, this difference in ratings reveals a difference in how the two groups use the response scale, which can in turn be used to anchor their self-reports. Although relatively simple in execution, the approach rests on two identifying assumptions: response consistency (respondents rate the vignettes and their own health using the same thresholds) and vignette equivalence (all respondents perceive each vignette as describing the same underlying level of health).
Figure: Logic underlying anchoring vignettes to locate respondent thresholds
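To make the logic concrete, the sketch below implements a standard nonparametric anchoring adjustment of the kind associated with the vignette literature: each respondent's self-rating is recoded relative to their own ratings of the two vignettes, so that respondents who use the response scale differently become comparable. The function name and two-vignette setup are illustrative, not taken from the papers discussed here.

```python
def anchor_adjust(self_rating, vignette_ratings):
    """Recode a self-rating relative to this respondent's own
    vignette ratings (a nonparametric anchoring adjustment).

    Ratings use the EQ-5D-5L response scale, 1 = no problems
    through 5 = extreme problems, so lower means better health.
    vignette_ratings must be ordered (mild, severe); ties and
    reversed orderings need extra handling in real data.
    """
    v_mild, v_severe = vignette_ratings
    if self_rating < v_mild:
        return 1   # reports better health than the mild vignette
    if self_rating == v_mild:
        return 2   # level with the mild vignette
    if self_rating < v_severe:
        return 3   # between the two vignettes
    if self_rating == v_severe:
        return 4   # level with the severe vignette
    return 5       # reports worse health than the severe vignette

# Two respondents give the same raw self-rating (3), but the second
# rates the vignettes more optimistically, so after anchoring their
# self-report is placed lower (worse) on the common scale.
print(anchor_adjust(3, (2, 4)))  # between the vignettes -> 3
print(anchor_adjust(3, (1, 3)))  # level with the severe vignette -> 4
```

The point of the example is that identical raw responses can map to different adjusted positions once each respondent's own vignette ratings are used as anchors — which is exactly the reporting heterogeneity the figure illustrates.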
The programme of work began with a phase of qualitative research to design vignettes that satisfied the identifying assumptions. We found that both EQ-5D domain-specific and more holistic vignettes were feasible (Au and Lorgelly, 2014).
Next we sought to establish whether the anchoring vignette approach could be used to identify DIF, and specifically whether it passed tests for response consistency and vignette equivalence. Utilising data collected in two surveys in Australia, we were able to identify DIF in the EQ-5D-5L, at least for certain age groups (Knott et al., 2016).
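As a rough illustration of the kind of check involved (not the formal statistical tests reported in Knott et al., 2016), one simple diagnostic is the share of respondents who rank the mild vignette as at least as healthy as the severe one; widespread reversals would cast doubt on vignette equivalence. The function below is a hypothetical sketch.

```python
from typing import Iterable, Tuple

def vignette_order_consistency(ratings: Iterable[Tuple[int, int]]) -> float:
    """Share of respondents who rate the mild vignette at least as
    healthy (i.e. a lower-or-equal EQ-5D-5L response) as the severe
    vignette. Each tuple is (mild_rating, severe_rating) for one
    respondent; a value well below 1.0 would flag a problem with
    the assumed vignette ordering.
    """
    ratings = list(ratings)
    consistent = sum(1 for mild, severe in ratings if mild <= severe)
    return consistent / len(ratings)

# Two of three respondents rate the vignettes in the expected order.
print(vignette_order_consistency([(1, 4), (2, 3), (4, 2)]))
```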
We further found that failing to adjust for DIF in the EQ-5D could lead to misleading conclusions about group differences. The next phase of research is to test whether vignettes collected in one dataset can be used to adjust for DIF in another dataset, following similar work using SRH (Harris et al., 2016). Our early results, discussed at a recent HESG meeting, suggest that this is possible (Knott and Lorgelly, 2016).
Future work will extend this research to other countries, to understand cross-cultural reporting behaviour and what effect it might have on multi-national trials.
Slides of a recent presentation of this programme of work given at the London Health Economics Group are available here.
Knott, R., Au, N., Hollingsworth, B., Lorgelly, P. (in press). Response-scale heterogeneity in the EQ-5D. Health Economics Letters.