Published online by Cambridge University Press: 03 January 2019
Index mining is a new discipline that aims to identify the composite measures or indices most relevant to a given context or outcome. After reviewing three frailty indices and principal component (PC)-based indices, we show several situations that can lead to ineffective indices, that is, indices that contain bias or fail to represent the underlying theories.
We reproduced and reviewed the three frailty indices and the 134,689 PC-based indices from previous publications. The impact of aggregating the input variables on the final indices was analyzed using forward stepwise regression.
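As a rough illustration of forward stepwise regression, the sketch below greedily adds the input variable that most increases R² and stops when the gain becomes negligible. The function name `forward_stepwise`, the `min_gain` threshold, and the R²-based stopping rule are assumptions made for this example; the selection criterion used in the study is not stated here.

```python
import numpy as np

def forward_stepwise(X, y, min_gain=0.01):
    """Greedy forward selection on the columns of X (hypothetical helper).

    At each step, fit ordinary least squares with an intercept for every
    not-yet-selected column and keep the one giving the highest R^2.
    Stop when the best R^2 gain falls below min_gain (assumed rule).
    """
    n, p = X.shape
    selected = []
    best_r2 = 0.0
    total_ss = np.sum((y - y.mean()) ** 2)
    while len(selected) < p:
        candidates = []
        for j in range(p):
            if j in selected:
                continue
            design = np.column_stack([np.ones(n), X[:, selected + [j]]])
            coef, *_ = np.linalg.lstsq(design, y, rcond=None)
            resid = y - design @ coef
            r2 = 1.0 - np.sum(resid ** 2) / total_ss
            candidates.append((r2, j))
        r2, j = max(candidates)
        if r2 - best_r2 < min_gain:
            break
        selected.append(j)
        best_r2 = r2
    return selected, best_r2
```

Applied to the input variables of an index, the order in which columns enter and the R² gain at each step give a simple picture of how much each variable contributes to the final index.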
Several methods of combining the input variables were associated with ineffective projection of information onto the indices. The most common causes of ineffective summation are illustrated in three examples involving input variables that were positively correlated, negatively correlated, or uncorrelated with the outcome. Ineffective indices often resulted from the summation of redundant information or uncorrelated variables.
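The effect of summation can be sketched with simulated data. The example below is not one of the three examples from the study; the variable names, noise levels, and the function `index_outcome_correlations` are assumptions chosen only to show how adding an uncorrelated, redundant, or oppositely signed variable changes the correlation between a summed index and the outcome.

```python
import numpy as np

def index_outcome_correlations(n=5000, seed=0):
    """Correlate several summed indices with a simulated outcome (illustrative)."""
    rng = np.random.default_rng(seed)
    outcome = rng.normal(size=n)
    # Input positively correlated with the outcome.
    informative = outcome + rng.normal(size=n)
    # Input unrelated to the outcome.
    uncorrelated = rng.normal(size=n)
    # Near duplicate of the informative input (redundant information).
    redundant = informative + 0.1 * rng.normal(size=n)
    # Input negatively correlated with the outcome.
    negative = -outcome + rng.normal(size=n)

    indices = {
        "informative_only": informative,
        "plus_uncorrelated": informative + uncorrelated,
        "plus_redundant": informative + redundant,
        "plus_negative": informative + negative,
    }
    return {
        name: float(np.corrcoef(values, outcome)[0, 1])
        for name, values in indices.items()
    }
```

Under these assumptions, adding the uncorrelated variable dilutes the index, adding the redundant variable leaves its correlation with the outcome essentially unchanged, and adding the negatively correlated variable without reversing its sign cancels most of the signal.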
The creation of ineffective indices can be avoided if the relationships between input variables and outcomes are properly scrutinized. The creation of composite measures and indices is still a discipline under active development. The three examples we identified represent mistakes that may be repeated unintentionally and need to be addressed with explicit rules. A reporting guide for the creation of composite measures has been proposed. We recommend a proper review of index objectives, data characteristics, and data limitations before creating composite measures or indices.