Advanced Analysis | Machine Learning | Regression | Using Displayr | What is...

The Difference Between Shapley Regression and Relative Weights

Shapley regression and Relative Weights are two methods for estimating the importance of predictor variables in linear regression. Studies have shown that the two, despite being constructed in very different ways, provide surprisingly similar scores((Grömping, U. (2015). Variable importance in regression models, WIREs Comput Stat 7, 137-152.))((Lebreton, J.M., Ployhart, R.E., & Ladd, R.T. (2004). A Monte Carlo Comparison of Relative Importance Methodologies. Organizational Research Methods 7, 258 - 282.)). As the computational expense of Shapley regression by far outweighs that of Relative weights, and if the two yield essentially the same results, then you can consider Relative Weights a strong alternative to Shapley. In this blog post, I explore the relationship between the two methods.

Shapley regression

Shapley regression (also known as dominance analysis or LMG) is a computationally intensive method popular amongst researchers. To describe the calculation of the score of a predictor variable, first consider the difference in R² from adding this variable to a model containing a subset of the other predictor variables. Then, obtain the score by computing these differences over all possible subsets and averaging. Whilst simple in principle, the number of cases to consider doubles with every additional predictor variable. Thus, a model with 20 variables will have over 1 million different cases. This results in a non-trivial calculation, even on a modern desktop computer.

To demonstrate the calculation of a Shapley, consider a three-variable regression of y on X₁, X₂, and X₃. Using real data, the R² for the regressions run on the predictor subsets are:

$\begin{array}{c|c|c} \textrm{Model}&\textrm{Predictors}&R^2\\ \hline 0&\textrm{none}&0\\ 1&X_1&0.1064\\ 2&X_2&0.1254\\ 3&X_3&0.1288\\ {1,2}&X_1,X_2&0.1807\\ {1,3}&X_1,X_3&0.1700\\ {2,3}&X_2,X_3&0.1873\\ {1,2,3}&X_1,X_2,X_3&0.2144 \end{array}$

Calculate the Shapley score for X₁ by averaging the differences in R² of models with and without X₁ as a predictor variable:

$\frac13(R^2_{1,2,3}-R^2_{2,3})+\frac16(R^2_{1,2}-R^2_2)+\frac16(R^2_{1,3}-R^2_3)+\frac13(R^2_1-R^2_0)=0.0606$

Note that weights are chosen such that the weight for adding X₁ to a model with two predictors equates to the total weight for adding X₁ to a model with one predictor. This is also the same as the weight for adding X₁ to a model with no predictors. This general pattern holds for models with more predictors. Similarly, the Shapley scores for X₂ and X₃ are:

$\frac13(R^2_{1,2,3}-R^2_{1,3})+\frac16(R^2_{1,2}-R^2_1)+\frac16(R^2_{2,3}-R^2_3)+\frac13(R^2_2-R^2_0)=0.0788$

$\frac13(R^2_{1,2,3}-R^2_{1,2})+\frac16(R^2_{1,3}-R^2_1)+\frac16(R^2_{2,3}-R^2_2)+\frac13(R^2_3-R^2_0)=0.0751$

Relative Weights

An alternative to Shapely regression, Relative Weights((Johnson, J.W. (2000). A heuristic method for estimating the relative weight of predictor variables in multiple regression. Multivariate behavioral research 35, 1-19.)) produces similar scores with much faster computation. This method transforms the predictor variables to a set of orthogonal variables (not correlated with each other). When the outcome variable is regressed on these orthogonal variables, the squared coefficients from the regression represent each orthogonal variable's contribution to the R². The matrix from the initial transformation is then used to convert these contributions to importance scores of the original predictor variables. This procedure involves running a fixed number of linear algebra operations whose complexity only increases as a square of the number of variables, rather than exponentially, making it much faster to run.

Analytical expressions of scores

I sought to find analytical expressions for the two methods to see how they differ. Since they are identical for models with two predictors, I consider models with three predictors: models with more predictors makes analysis much more complicated. Comparing models with three predictors will provide us with insight into how the methods differ in general.

Neither method depends on the offset and scale of the outcome and predictor variables. Therefore it can be assumed that they are normalized to have a mean of zero and a variance of one. As a result, two methods can be calculated from the correlations between the predictor variables and the regression coefficients.

I managed to derive a general analytical expression for Shapley scores for models with three predictors, but unfortunately it is too long to display in its entirety in this blog post and there is no easy interpretation of it as far as I can tell. I display the first six terms for the Shapley score of the first predictor below, with another 47 terms not shown here:

$\frac16\left(\frac{4{{\beta}_{2}}{{\beta}_{3}}{{{{\rho}_{23}}}^{3}}}{1-{{{{\rho}_{23}}}^{2}}}+\frac{4{{\beta}_{1}}{{\beta}_{3}}{{\rho}_{13}}{{{{\rho}_{23}}}^{2}}}{1-{{{{\rho}_{23}}}^{2}}}+\frac{4{{\beta}_{1}}{{\beta}_{2}}{{\rho}_{12}}{{{{\rho}_{23}}}^{2}}}{1-{{{{\rho}_{23}}}^{2}}}+\frac{2{{{{\beta}_{3}}}^{2}}{{{{\rho}_{23}}}^{2}}}{1-{{{{\rho}_{23}}}^{2}}}+\frac{2{{{{\beta}_{2}}}^{2}}{{{{\rho}_{23}}}^{2}}}{1-{{{{\rho}_{23}}}^{2}}}+\frac{4{{{{\beta}_{1}}}^{2}}{{\rho}_{12}}{{\rho}_{13}}{{\rho}_{23}}}{1-{{{{\rho}_{23}}}^{2}}}+\ldots\right),$

where $\beta$ represents the vector of coefficients and $\rho$ the correlation matrix of predictors.

Deriving a general expression for Relative Weights turned out to be too difficult even with the assistance of computer algebra system. The problem becomes much more tractable when I fix values for the correlations and coefficients. Therefore I explore how the two methods differ when fixing all but one of the values.

Varying correlation between predictors

In the first comparison I set the values of the coefficients $\beta$ and the correlation matrix of the predictors $\rho$ to be:

$\beta= \left[ {\begin{array}{c} 0.5\\ -0.2\\ 0.05\end{array} } \right] \quad\textrm{and}\quad \rho= \left[ {\begin{array}{ccc} 1 & \rho_{12} & 0.5 \\ \rho_{12} & 1 & 0.5 \\ 0.5 & 0.5 & 1\end{array} } \right]$

where $\rho_{12}$ is allowed to vary. I chose these values to be similar to those found in a typical regression model. Although a correlation coefficient may range from -1 to 1, $\rho_{12}$ is further constrained by the requirement that the matrix $\rho$ needs to be a valid correlation matrix. This limits the range of $\rho_{12}$ to -0.5 to 1. The chart below plots the Shapley scores and Relative Weights over this range, for the three predictor variables X₁, X₂ and X₃:

Note the remarkable similarity of the two methods for most values of $\rho_{12}$ , except near -0.5. At this point, the correlation matrix becomes more pathological and unlikely to be encountered in the real-world. Beyond 0.5, $\rho$ is no longer a valid correlation matrix. If I sort the predictor variables by their scores, the order of the variables is almost identical between methods. The order only differs where variable lines intersect. For example, between values of 0.65 to 0.68 for $\rho_{12}$ , the scores for the second predictor variable X₂ are less than the Relative Weights of the third predictor variable X₃ but more than the Shapley scores for X₃. In any case, the order of variables within this range is not important as the scores are very close to each other and variables should be considered equivalent for comparison purposes.

Varying a regression coefficient

In the second comparison I set the values of the coefficients $\beta$ and the correlation matrix of the predictors $\rho$ to

$\beta= \left[ {\begin{array}{c} \beta_1\\ 0.1\\ -0.4\end{array} } \right] \quad\textrm{and}\quad \rho= \left[ {\begin{array}{ccc} 1 & 0.5 & 0.2 \\ 0.5 & 1 & -0.3 \\ 0.2 & -0.3 & 1\end{array} } \right]$

where I am now varying the regression coefficient $\beta_1$ instead of a correlation coefficient. Again, I chose these values to be similar to those found in a typical regression model. In the chart below, we have restricted the range of $\beta_1$ such that the sum of the scores (equal to the R²) is no more than 1:

The two methods are again very similar and the order of variables is the same except over a small range of $\beta_1$ . The two methods differ most when the magnitude of $\beta_1$ is largest, but even then, inferences made using the two methods are unlikely to differ.

In summary, my findings match those of previous studies comparing these methods, which used models with more predictors on specific data sets. Consequently, you can substitute Shapley regression with Relative Weights and vice versa.

Click here to go to the sign-up screen for Displayr. Following sign-up you will be taken to the document containing an example of Relative Weights applied to a regression model.

Acknowledgements

The outputs shown above use plotly.

CAPABILITIES

OBJECTIVES

TECHNIQUES

TECHNIQUES

LEARN

SUPPORT

LATEST WEBINAR

The Difference Between Shapley Regression and Relative Weights

Shapley regression

Relative Weights

Analytical expressions of scores

Varying correlation between predictors

Varying a regression coefficient

Acknowledgements

Prepare to watch, play, learn, make, and discover!

Get access to all the premium content on Displayr

Last question, we promise!

What type of survey data are you working with? (select all that apply)

CAPABILITIES

OBJECTIVES

TECHNIQUES

TECHNIQUES

LEARN

SUPPORT

LATEST WEBINAR

The Difference Between Shapley Regression and Relative Weights

Shapley regression

Relative Weights

Analytical expressions of scores

Varying correlation between predictors

Varying a regression coefficient

Acknowledgements

Prepare to watch, play, learn, make, and discover!

Get access to all the premium content on Displayr

Last question, we promise!

What type of survey data are you working with? (select all that apply)

eBook: How to Weight Survey Data