Driver Analysis in Displayr

Displayr makes driver analysis both easy and fast. This post gives an overview of the key features in Displayr for performing driver analysis (i.e., working out the relative importance of predictors of brand performance, customer satisfaction, and NPS). It describes the available driver analysis methods, stacking, options for missing data, in-built diagnostics for model checking and improvement, and how to create outputs from the driver analysis.

For more detail about what method to use when, see our driver analysis webinar and eBook.

Choice of driver analysis method

All the widely used methods for driver analysis are available in Displayr. They are accessed via the same menu option, so you can toggle between them. (For readers working outside Displayr, a rough plain-R sketch of the same families of methods follows the list below.)

  • Correlations: Insert > Regression > Driver analysis and set Output to Correlation. This method is appropriate when you are unconcerned about correlations between predictor variables.
  • Jaccard coefficient/index: Insert > Regression > Driver analysis and set Output to Jaccard Coefficient (note that Jaccard Coefficient is only available when Type is set to Linear). This is similar to correlation, except it is only appropriate when both the predictor and outcome variables are binary.
  • Generalized Linear Models (GLMs), such as linear regression and binary logit, and the related quasi-GLM methods (e.g., ordered logit): Insert > Regression > Linear, Binary Logit, Ordered Logit, etc. These address correlations between the predictor variables, and each method is designed for a different distribution of the outcome variable (e.g., linear for a numeric outcome, binary logit for a two-category outcome, ordered logit for an ordinal outcome).
  • Shapley Regression: Insert > Regression > Driver analysis and set Output to Shapley Regression (note that Shapley Regression is only available when Type is set to Linear). This is a regularized regression, designed for situations where linear regression results are unreliable due to high correlations between predictors.
  • Johnson's relative weights: Insert > Regression > Driver analysis, with Output set to Relative Importance Analysis. As with Shapley Regression, this is a regularized regression, but unlike Shapley it is applicable to all Type settings (e.g., ordered logit, binary logit).
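
To see the mechanics outside Displayr's menus, below is a minimal sketch in plain R of the same families of methods. The data frame dat and outcome nps are assumptions for illustration, and the relaimpo package's LMG decomposition is a Shapley-style averaging of R-squared contributions, not Displayr's exact implementation.

```r
# Minimal plain-R sketch of the method families above (assumed data frame
# 'dat' with numeric outcome 'nps' and numeric predictor columns).
library(relaimpo)  # provides LMG, a Shapley-style R-squared decomposition

preds <- setdiff(names(dat), "nps")

# Correlations: fine when correlations between predictors are not a concern
cors <- cor(dat[preds], dat$nps, use = "complete.obs")

# A GLM (here, linear regression): adjusts for correlated predictors
fit <- lm(nps ~ ., data = dat)

# LMG averages each predictor's R-squared contribution over all orderings,
# the same idea Shapley Regression and Johnson's Relative Weights pursue
imp <- calc.relimp(fit, type = "lmg", rela = TRUE)
imp@lmg  # shares of R-squared, summing to 1
```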

Stacking

Often driver analysis is performed using data for multiple brands at the same time. Traditionally, this is addressed by creating a new data file that stacks the data from each brand on top of each other (see What is Data Stacking?); a minimal sketch of manual stacking appears after the list below. When performing driver analysis in Displayr, however, the data can be stacked automatically by:

  • Checking the Stack data option.
  • Selecting variable sets for Outcome and Predictors that contain multiple variables (for Predictors, these need to be set as Binary - Grid or Number - Grid).
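
For anyone stacking manually rather than using the checkbox, here is a hedged sketch using tidyr. The measure_brand column naming convention and the brand names are hypothetical.

```r
# Manual stacking sketch: wide data with one column per measure-brand pair,
# rearranged to one row per respondent-brand combination (hypothetical names)
library(tidyr)

wide <- data.frame(
  id         = 1:3,
  like_Alpha = c(7, 5, 9), like_Beta = c(4, 6, 8),
  fun_Alpha  = c(1, 0, 1), fun_Beta  = c(0, 1, 1)
)

stacked <- pivot_longer(
  wide,
  cols      = -id,
  names_to  = c(".value", "brand"),  # '.value' keeps 'like'/'fun' as columns
  names_sep = "_"
)
stacked  # columns: id, brand, like, fun
```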

Missing data

By default, all the driver analysis methods exclude all cases with missing data from their analysis (this occurs after any stacking has been performed). However, there are three additional Missing data options that can be relevant:

  • If using Correlation, Jaccard Coefficient, or Linear Regression, you can select Use partial data (pairwise correlations), in which case all the available data is analyzed: each case contributes whatever information it has, even when some of its predictors are missing.
  • If using Shapley Regression, Johnson's Relative Weights (Relative Importance Analysis), or any of the GLMs and quasi-GLMs, Multiple imputation can be used. This is generally the best method for dealing with missing data, except in situations where Dummy variable adjustment is appropriate (see the next point).
  • If using Shapley Regression, Johnson's Relative Weights (Relative Importance Analysis), or any of the GLMs and quasi-GLMs, Dummy variable adjustment can be used. This method is appropriate when the data is missing because it cannot exist. For example, if the predictors are ratings of satisfaction with a bank's call centers, branches, and website, and data is missing because a person has never used one of them, this setting is appropriate (a sketch of the idea follows this list). By contrast, if the data is missing because the person didn't feel like providing an answer, multiple imputation is preferable.
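
To make the dummy-variable-adjustment logic concrete, here is a rough sketch in R. The variable names (nps, branch_sat, web_sat) are hypothetical, and this illustrates the general technique rather than Displayr's exact implementation.

```r
# Dummy-variable adjustment sketch: 'branch_sat' is missing for people who
# never used a branch (hypothetical variables, not Displayr's internal code)
dat$branch_used <- as.numeric(!is.na(dat$branch_sat))  # 0 = data cannot exist
dat$branch_sat[is.na(dat$branch_sat)] <- 0             # arbitrary constant fill

# The indicator absorbs the level difference for non-users, so the
# branch_sat coefficient is driven only by people with real ratings
fit <- lm(nps ~ branch_sat + branch_used + web_sat, data = dat)
```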

Diagnostics for model checking and improvement

A key feature of Displayr's driver analysis is that it contains many tools for automatically checking the data for problems, including VIFs and G-VIFs for highly correlated predictors, a test of heteroscedasticity, tests for outliers, and checks that the Type setting has been chosen correctly. Where Displayr identifies a serious issue, it shows an error; for less serious issues, it shows a warning (in orange) with suggestions for resolving the problem.
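
One of these checks is easy to reproduce outside Displayr: the car package's vif() function flags highly correlated predictors. The model and variable names below are hypothetical.

```r
# Collinearity check on a fitted model using the car package's VIF function
library(car)

fit <- lm(nps ~ price + quality + service, data = dat)  # hypothetical model
vif(fit)  # rough rule of thumb: values above ~4-10 signal collinearity trouble
```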

One particular diagnostic that sometimes stumps new users is that, by default, Displayr sometimes shows negative importance scores for Shapley Regression and Johnson's Relative Weights. As both methods are defined under the assumption that importance scores are positive, the appearance of negative scores can cause confusion. What is going on is that Displayr also runs a traditional multiple regression and shows the signs from that regression on the relative importance outputs, as a warning that the assumption of positive importance may not hold. This can be turned off by checking Absolute importance scores.

Outputs

The standard output from all but the GLMs is a table like the one below. The second column of numbers shows the selected importance metric, and the first column shows this scaled to sum to 100.

Quad map

A key aspect of how driver analysis works in Displayr is that it can be hooked up directly to a scatterplot, thereby creating a quad map. See Creating Quad Maps in Displayr.
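
For readers building a quad map outside Displayr, a minimal ggplot2 sketch is below; all attribute names and numbers are invented for illustration.

```r
# Minimal quad-map sketch: importance on x, performance on y, dashed means
# splitting the plot into quadrants (all values invented for illustration)
library(ggplot2)

quad <- data.frame(
  attribute   = c("Price", "Coverage", "Speed", "Service"),
  importance  = c(18, 25, 9, 14),
  performance = c(62, 71, 35, 48)
)

ggplot(quad, aes(importance, performance, label = attribute)) +
  geom_point() +
  geom_text(vjust = -0.8) +
  geom_vline(xintercept = mean(quad$importance), linetype = "dashed") +
  geom_hline(yintercept = mean(quad$performance), linetype = "dashed") +
  labs(x = "Importance", y = "Performance")
```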

Crosstabs of importance scores

All the driver analysis methods have a Crosstab interaction option. Select a categorical variable, and the result is a crosstab showing the importance scores for each unique value of that variable, with bold indicating significant differences and color-coding showing relativities.

Accessing the importance scores by code

The importance scores can also be accessed by code. For example, model.1$importance$raw.importance contains the raw importance scores, where model.1 is the name of the main driver analysis output.

This can then be used in other reporting. For example, when inserted via Insert > R Output, table.Q14.7[order(model.1$importance$raw.importance, decreasing = TRUE), ] sorts a table called table.Q14.7 by the importance scores, and paste(names(sort(model.1$importance$raw.importance, decreasing = TRUE)), collapse = "\n") creates a textbox containing the attributes sorted from most to least important.
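
Pulling those snippets together, here is a sketch of how they combine in an R Output; model.1 and table.Q14.7 are the example names from above and only exist inside a Displayr document.

```r
# Reusing the importance scores elsewhere in a report (names as in the
# examples above: model.1 is the driver analysis, table.Q14.7 a table)
raw <- model.1$importance$raw.importance

# Sort the table so the biggest driver comes first
table.Q14.7[order(raw, decreasing = TRUE), ]

# Text for a textbox: attributes from most to least important
paste(names(sort(raw, decreasing = TRUE)), collapse = "\n")
```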

Using Text Data for Driver Analysis

A driver analysis is used to highlight the key drivers of performance. Traditionally, it uses quantitative data, where the outcome variable is often satisfaction, likelihood to recommend, or some other measure of interest. The predictors of the outcome are ratings or a multiple-response question indicating the performance of the product(s) being analyzed. However, text data from open-ended questions, tweets, or other sources can also provide useful predictors. In this post, I present an example looking at drivers of preference for US phone companies, and discuss a couple of traps to avoid.

The case study

The data is from a study of the US cell phone market collected by Qualtrics in July and August of 2019. I've used two questions for the analysis. The first is a quantitative question, which measures how likely people are to recommend their main phone brand. The second is qualitative, where people have listed what they like about their brand.

Prior to running the driver analysis, I coded the open-ended data into the categories shown below. You can also use automated techniques for extracting key concepts from the data rather than coding it manually. However, such data is generally a bit noisier, so the resulting driver analysis may be less valid.

Conducting the driver analysis

As we discuss in our eBook on driver analysis, it is normally good practice to use Johnson's Relative Weights or the near-identical Shapley Regression, as they both rescale the data and deal with multicollinearity. But in this case, there is a smarter approach, which is just to use good old-fashioned linear regression. What makes it smarter?

  • One of the key features of coded data is that some categories are bigger than others. In the table earlier in the post, 37% of people are categorized as Reliable/Coverage/Service, and only 2% as Speed. Using Johnson's Relative Weights or Shapley Regression will all but ensure that Reliable/Coverage/Service comes out as very important and Speed does not. We want the driver analysis to determine importance from the relationship between the predictors and the outcome, not from the number of responses in each category.
  • When we use linear regression, we can interpret the estimated coefficients as differential impacts on NPS (a minimal sketch of this reading follows the list). The table below, for example, tells us that, all else being equal, if a person likes their phone company due to Price, then their NPS score will be, on average, 18 points higher.
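
A sketch of this reading in R is below. The category names are hypothetical 0/1-coded variables, and the assumption is that nps holds each respondent's score on the -100/0/100 NPS coding, so means are Net Promoter Scores.

```r
# Reading coefficients as differential NPS impacts (hypothetical 0/1-coded
# categories; 'nps' assumed coded -100/0/100 so averages are NPS)
fit <- lm(nps ~ Price + Reliable + Speed + Customer_service, data = coded)

coef(fit)["Price"]
# e.g., a value of ~18 reads: all else being equal, people who mention
# Price average an NPS 18 points higher
```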

The table below shows the results of a linear regression. At first glance the regression seems to make sense. People who said they like Nothing have a much lower NPS, as we would expect. But there is actually a problem here. The goal of driver analysis is to understand how experiences with the company influence attitude towards the company, where NPS is a measurement of that attitude. The categories of Nothing, I like them, and Everything aren't actually experiences at all. Rather, they are attitudes. So the regression we have is meaningless: it currently tells us that how much people like their cell phone carrier predicts their attitude to their cell phone carrier, which is tautological.

The solution to the tautology is to remove the predictors that are attitudes, which gives the model below. I've also removed Other as it is really a grab-bag of other things and thus uninterpretable.
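
In R, dropping predictors from a fitted model is a one-liner with update(); the names below are illustrative R-safe versions of the post's categories.

```r
# Refit without the attitudinal categories and the Other grab-bag
# (hypothetical variable names based on the categories in the post)
fit2 <- update(fit, . ~ . - Nothing - I_like_them - Everything - Other)
summary(fit2)
```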

Checking all the standard things

The next step is to run the standard checks of a regression model (e.g., outliers, multicollinearity, etc.). We discuss these in more detail in our eBook on driver analysis.

Putting it together as a quad map

The quad map below plots the importance scores (the Estimate column from above) on the x-axis and the performance (the percentage of people who mention each issue) on the y-axis. In this case it delivers some great news: it identifies three opportunities for phone companies to differentiate themselves. The attributes of Speed, Payment arrangements, and Customer service are all in the bottom-right "quadrant". These are things that people find very important, but where the average phone company performs poorly, suggesting that if a phone company can persuade more people of its excellence in these areas, it will improve its NPS.

Some traps to avoid

Performing driver analysis using text data can be a great win. But I will finish off the post by pointing out a few traps that can catch the unwary. They all relate to inadvertently using inappropriate data:

  1. Data from people with a known attitude. Sometimes open-ended questions are only asked of people who gave a high (or low) rating. Unfortunately, such data is not suitable for a driver analysis. The whole point of driver analysis is to see how one thing (the text data) predicts another (the overall measure of preference). But if we have only conducted the analysis among people who like their brand, then we have insufficient variation in their attitude to the brand to work out what causes it. The same problem exists if we have only collected text data from people known to dislike the brand.
  2. Using data from a Why did you say that? question. A second problem is where people were first asked their attitude, and then asked why they said that. This is a problem because the actual meaning of this question is contextual. The person who said they really disliked the brand reads the question as why did you dislike the brand?, whereas the person who likes the brand reads it as why do you like the brand? This means the text data is not comparable (e.g., if somebody says "price", it may mean the price is too high or too low).
  3. Using sentiment analysis on a How do you feel style question. In the case study I am using a rating of likelihood to recommend as the outcome variable. An alternative approach is to use an open-ended question and create an outcome variable by sentiment analysis. However, some care is required here, as this can easily be invalid. For example, let's say you asked How do you feel about Microsoft? Some people may respond by saying how much they like Microsoft. Other people may interpret this as an opportunity to describe what Microsoft is good at. A driver analysis of such data will be meaningless: it will show that people who mention specific things (e.g., Microsoft is innovative) are less likely to express an attitude (e.g., I love Microsoft), as in effect they answered a different question. We would end up with a driver analysis that tells us that being innovative is bad!
How to Identify the Key Drivers of Your Net Promoter Score

What is driver analysis?

A customer feedback survey should aim to answer two questions when it comes to the Net Promoter Score (NPS):

  1. How likely are your customers to recommend your product or service?
  2. What are the key factors influencing your customers’ likelihood to recommend your product or service?

The first question is answered simply by calculating the Net Promoter Score. The second question is a lot harder to answer and involves what is commonly known as ‘driver analysis.’ The underlying goal of driver analysis is to determine the key attributes of your product or service that determine your Net Promoter Score. These attributes are referred to as ‘drivers.’

Driver analysis requires that you ask some follow-up questions about how the respondent would rate different attributes of your brand. For example, a tech company could poll customers on a range of brand perception attributes – fun, value, innovative, stylish, ease of use, etc. – to determine the key Net Promoter Score drivers.

Driver analysis often requires the use of statistical methods like linear regression modeling and relative weights analysis, which is more advanced than most forms of survey data analysis. However, it is well worth the effort.

Why is NPS driver analysis important?

Computing your Net Promoter Score is a great first step, but the simple statistic doesn’t tell you anything about why your customers are likely (or unlikely) to recommend your product or service. Driver analysis allows you to pinpoint the key factors driving their responses.

This information can influence how to tailor your product and where you focus your efforts. If a tech company finds that being perceived as ‘fun’ is a larger driver of NPS than being perceived as ‘innovative,’ then they may alter their marketing strategy to adopt a more ‘fun’ approach.

A practical example of NPS driver analysis

To better understand NPS driver analysis, let’s dive into a real-world example. Using Displayr, we analyzed NPS data from 14 large technology companies to determine which brand perception attributes played the largest role in influencing Net Promoter Scores. Survey respondents were asked how likely they were to recommend the given brands, as well as whether they associated the brands with specific perception attributes.

Regression modeling

To perform the driver analysis, we used two regression models to determine the effect of each brand perception attribute on a respondent's NPS response.

The first model is an ordered logit model, otherwise known as an ordered logistic regression. The model estimates the effect and significance each brand attribute has on overall Net Promoter Scores.
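
An ordered logit can also be fitted outside Displayr. Below is a sketch using MASS::polr, assuming the likelihood-to-recommend item is stored as an ordered factor; the variable names are hypothetical, and this is the generic model, not Displayr's exact output.

```r
# Ordered logit sketch with MASS::polr (hypothetical variables; Hess = TRUE
# stores the Hessian so summary() can report standard errors)
library(MASS)

fit <- polr(ordered(recommend) ~ fun + value + innovative + stylish,
            data = dat, Hess = TRUE)
summary(fit)
```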

 

The ‘Estimate’ column measures the effect each brand attribute has on Net Promoter Scores: the larger the number, the larger the effect. The ‘p’ column measures the statistical significance of the brand attribute. If a brand attribute has a p-value below 0.05, we can conclude that it plays a significant role in determining NPS.

The second model is similar to the first, but there is one important distinction. Instead of estimating the overall effect each brand attribute has on NPS, it estimates the ‘relative importance.’ This means that it estimates the importance of each brand attribute in relation to the others.

The relative importance of each brand attribute can be interpreted as a percentage. For example, our model suggests that ‘fun’ accounts for almost 25% of the variation in NPS.

Data visualization

The two regression models have unpacked a lot of useful information and insights from the data set. Now it’s time to communicate our findings. To do this, we will create a data visualization that is both informative and easy to interpret.

The bar chart ranks the relative importance of each brand attribute, allowing us to compare their effects. It is easy for anyone to see that ‘fun’ is the most important attribute without having to interpret regression output data.
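
A sketch of such a chart in R with ggplot2 is below, using invented scores.

```r
# Bar chart of relative importance, largest driver on top (invented numbers)
library(ggplot2)

ri <- data.frame(
  attribute  = c("Fun", "Value", "Innovative", "Stylish", "Easy to use"),
  importance = c(24.8, 18.2, 15.9, 11.4, 9.7)
)

ggplot(ri, aes(reorder(attribute, importance), importance)) +
  geom_col() +
  coord_flip() +
  labs(x = NULL, y = "Relative importance (%)")
```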

Try it yourself

Want to try analyzing NPS drivers for yourself? Click the button below for a simple step-by-step guide to recreate the data models and visualizations you just saw!

Learn NPS Driver Analysis in Displayr

Customer Satisfaction: General, Product, & Attribute Questions

General satisfaction

A customer's general satisfaction is their satisfaction with your brand or company as a whole. This is also known as their relational satisfaction, as it refers to a customer's overall relationship with your brand. This is the measure that the American Customer Satisfaction Index (ACSI) uses in their annual reviews. The general Customer Satisfaction question is a good customer feedback survey question because it measures someone's overarching attitude towards your brand, rather than their specific experiences with a product or service. In some ways, this question is similar to the NPS question, since it is attitudinal rather than specific. Your general satisfaction score gives you an idea of where you sit, which provides a good benchmark for more specific measures.

Product/service satisfaction

This measures a customer's satisfaction with a specific product or service. For example, if the general customer satisfaction measures somebody's satisfaction with Apple, this question measures their satisfaction with the iPhone. This is the first step in "drilling down" into a general satisfaction measure. Measuring how satisfied customers are with individual products means you can compare across products. It also allows you to identify whether certain products have a significantly lower satisfaction rating than others or the brand overall. This is also a good place to identify services which may need improvement, such as a website or customer support. This is known as transactional satisfaction, as the sentiment measured here is related to a specific transaction or experience a customer has recently had.

Attribute satisfaction

This question gets right down to the nitty-gritty details and asks about the customer's satisfaction with particular features (attributes) of a certain product. In the Apple example from earlier, this question would ask about satisfaction with the iPhone's screen, battery life, or audio quality (for instance). This is the most granular of these three measures. This question allows you to drill down even further into your customer satisfaction ratings.

Why do we need all three?

Asking about just one type of customer satisfaction could tell you something about how satisfied your customers are. However, what it can't tell you is why they are or are not satisfied, and what you should do to improve. This is where combining the three types of questions comes in handy! Gathering customer satisfaction data at these different levels will point you to what is making your customers dissatisfied, and will allow you to conduct a driver analysis.

How to do a Driver Analysis?

Tutorial: CSAT Driver Analysis in Displayr

What is Driver Analysis?
Driver analysis is used to answer questions such as:

  • What is the best way to improve the preference for a brand?
  • Should a firm concentrate on reducing the price or improving quality?
  • Should a brand focus on being positioned as being cool, or competent?

Outputs from driver analysis

The key output from driver analysis is a measure of the relative importance of each of the predictor variables in predicting the outcome variable. These importance scores are also known as importance weights. Typically, they will add up to either 100% or the R-squared statistic.


Data required for driver analysis

Driver analysis is usually performed using data from surveys, where data has been collected for one or multiple brands. For each of the brands included in the survey, there is typically an overall rating of performance, as well as ratings of performance on the various aspects that contribute to it (i.e., the drivers of overall performance).


Such data is typically collected using one or more grid questions, such as the example below ("Hilton"). The last row collects data on the overall level of performance, which is the outcome of interest to Hilton. The other rows measure Hilton's performance on various attributes. Each of these attributes is a driver of the outcome of overall service delivery.

How are driver importance scores computed?

There are two technical challenges that need to be resolved when performing driver analysis. One is to ensure that all the predictors are on the same scale, and the other is to address correlations between predictors.

If one predictor is on a scale of 0 to 100, and another on a scale of 0 to 1, the first predictor will end up with a coefficient around 1/100th that of the second, all else being equal. This can be resolved either by rescaling the data to make it comparable (e.g., making all predictors have a range of 1 or a standard deviation of 1), or by using statistics that ignore scale, such as correlations, standardized betas, Shapley Regression, and Johnson's Relative Weights.
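
A sketch of the rescaling route in R is below, standardizing every predictor to a standard deviation of 1; the data frame and outcome name are assumptions.

```r
# Standardize predictors so scale differences stop distorting importance
# (assumed data frame 'dat' with outcome column 'overall')
preds <- setdiff(names(dat), "overall")
dat[preds] <- lapply(dat[preds], function(x) as.numeric(scale(x)))

fit <- lm(overall ~ ., data = dat)  # coefficients are now comparable betas
```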

The more challenging problem with driver analysis is dealing with correlations between predictor variables, which make it hard to obtain reliable conclusions about relative importance. This is addressed by using methods specifically developed for this purpose, such as Shapley Regression and Johnson's Relative Weights.

Analyzing data for multiple brands

Often a survey will collect data on multiple brands, and the goal of driver analysis is to quantify the average importance of the predictors across all the brands. This is performed in the same way as described above, except that the data needs to first be stacked.

Typically, the data will initially be in a wide format, such as shown below.

Stacking when conducting driver analysis involves rearranging the data so that it instead has a single outcome variable column and a single column for each predictor, as shown below. Typically, a new data file is created that contains the stacked data.


Acknowledgments

The Hilton grid comes from http://blog.clientheartbeat.com/customer-survey-examples/

The Problem with Using Multiple Linear Regression for Key Driver Analysis: A Case Study of the Cola Market

A key driver analysis investigates the relative importance of predictors against an outcome variable, such as brand preference. Many techniques have been developed for key driver analysis, to name but a few: Preference Regression, Shapley Regression, Relative Weights, and Jaccard Correlations.

The best of the methods for regular day-to-day use of key driver analysis seems to be Johnson's Relative Weights technique, yet the standard technique taught in introductory statistics classes is Multiple Linear Regression. In this post, I compare Johnson's Relative Weights to Multiple Linear Regression, and use a case study to illustrate why the introductory technique is best left in introductory classes.

Download your free Driver Analysis ebook


Key driver analysis of the cola market

The data set I am using for this case study comes from a survey of the cola market. The brands considered are Coca-Cola, Diet Coke, Coke Zero, Pepsi, Pepsi Lite, and Pepsi Max. There were 327 respondents in the study. The 34 predictor variables contain information about the brand perceptions held by the consumers in the sample. I consider the relationship between these perceptions and how much the respondents like the brands (Hate ... Love). The data has been stacked, and there are 1,893 cases with complete data for the analysis.

The labeled scatterplot below shows the coefficients from the Multiple Linear Regression on the x-axis versus the relative importance scores computed using Johnson's Relative Weights on the y-axis. While the results are correlated, they are by no means strongly correlated. Remember that we are plotting the same data with the same basic type of analysis (i.e., predicting an outcome as a weighted sum of predictors).

The most interesting contrast is for the perception of Unconventional. The traditional regression shows it to be the third most important variable. However, the Relative Weights method suggests it is the 14th most important of the variables. That is a staggeringly big difference in interpretation.



Which estimate is better?

The Relative Weights estimates are the better of the two. This can be seen by inspecting a few additional analyses.

The first analysis to check predicts brand preference using only Unconventional as the predictor. This model has an R² of .009. By contrast, the model using only Reliable as a predictor has an R² of .1883. This simple but easy-to-understand analysis suggests that Reliable is around 20 times as important as Unconventional, which is a lot more consistent with the conclusion from the Relative Weights than the Multiple Linear Regression.

When a model is estimated using both Unconventional and Reliable as predictors, its R² is .1903. Thus, adding Unconventional to the model that previously only used Reliable increases the explanatory power by a paltry .0020. When done the other way around, adding Reliable to the model that only contains Unconventional adds .1813. Again, this suggests that Reliable is much more important than Unconventional.

The regression model with all 34 predictors has an R² of .4008. If we remove Unconventional from this model, the R² drops by .0071, compared to a drop of .0118 for Reliable. This suggests that Reliable is around 1.7 times as important as Unconventional.

In theory, we could repeat this analysis for all possible models involving the 34 predictors. That is, see what impact Unconventional has with each possible combination of predictors, and repeat the analysis for Reliable. This is how Shapley Regression computes importance. But, as we have 34 predictors, this would involve computing 17,179,869,184 regressions, and I have better things to do. Fortunately, Johnson's Relative Weights approximates the Shapley Regression scores. The estimates are that Unconventional will, on average, improve R² by .01, whereas Reliable improves R² by .044, suggesting that Reliable is around four times as important as Unconventional. This relativity is what is shown in the importance scores (i.e., vertical distances on the scatterplot above).
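
The incremental-R² comparisons above are easy to reproduce. Below is a sketch; the variable names follow the case study, the data frame name is assumed, and the quoted values are the ones reported in the post.

```r
# Incremental R-squared comparisons from the text (data frame 'stacked'
# assumed to hold the stacked cola data; commented values are from the post)
r2 <- function(f) summary(lm(f, data = stacked))$r.squared

r2(like ~ Unconventional)             # ~ .009
r2(like ~ Reliable)                   # ~ .1883
r2(like ~ Reliable + Unconventional)  # ~ .1903

# Shapley/LMG importance generalizes this: it averages each predictor's
# R-squared increment over every subset of the other predictors, which
# relaimpo::calc.relimp(type = "lmg") computes without fitting all models
```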


Why does the multiple linear regression get it so wrong?

If you have ever studied introductory statistics, there is a good chance you were shown a proof that multiple linear regression estimates are the best possible unbiased estimates. So why is it getting it wrong here? The multiple linear regression result implies that Reliable is around 1.3 times as important as Unconventional. This is smaller than suggested by any of the other analyses I have conducted, and is most similar to the comparison of models containing all of the variables except Reliable or Unconventional. Why does the multiple linear regression get it so "wrong"?

The answer is that multiple regression makes a quite different assumption from the one implicit in my comparison. Multiple regression assumes that all the variables in the model are causally related to the outcome variable. So its coefficient for Unconventional is the estimated effect of this attribute under the assumption that all the other 33 predictors in the model do in fact cause brand preference. The relative importance analysis instead implicitly assumes that we are not really sure which variables are true predictors, and the importance score is an estimate of the incremental effect of Unconventional across all possible models.

In the case of key driver analysis, I think it is fair to say that we never really know which of the predictors are appropriate. The assumption of the Relative Weights method is much safer.


What about the whole issue of correlated predictors?

Usually, when people discuss Relative Weights and the closely related Shapley Regression, the discussion is about how these methods perform better when the predictor variables are correlated. If predictor variables are correlated, the effect of a variable will inevitably change a lot depending on which other variables are included in the analysis. That is, if two highly correlated variables are both included in the analysis, their effects typically cancel out to an extent. Relative Weights and Shapley Regression essentially take the average effect across all the possible combinations of predictors, which makes them less sensitive to correlations between the predictors. With multiple regression, correlations between predictors can cause results to be unstable (i.e., to differ a lot from analysis to analysis). As the other methods essentially average across models, the instability cancels out.


Conclusion

The conclusion is straightforward: if performing Key Driver Analysis, you are better off using Relative Weights or a similar method, rather than Multiple Linear Regression.

Download your free Driver Analysis ebook


TRY IT OUT
If you want to see all the detailed results referred to in this post, or run similar analyses yourself, click here to log in to Displayr and see the document. You can see the R code by clicking on any of the results and selecting Properties > R CODE, on the right of the screen.


5 Ways to Visualize Relative Importance Scores from Key Driver Analysis

This post shows five ways of visualizing relative importance scores from a key driver analysis. To this end, I use a case study on the cola market, where a survey measured attitudes to six brands. Each brand was rated on 34 different personality dimensions: "Next, we would like you to imagine that each of the cola brands you see below has a distinct personality. Using your imagination, take a moment and think about what kind of personality each cola would have, e.g., masculine/feminine, shy/out-going, etc." All the visualizations in this post can be replicated here.

1. A table with statistical significance

In the rest of this post I show nice graphical outputs, but I start with a table. I find this is often the best way to get my bearings when checking that the driver analysis has been useful. The advantage the table has over the prettier outputs is that we can simultaneously see:

  • The relative importance scores, scaled so that their absolute values sum to 100. The key thing to look for here is that the relativities make sense. In this example, where the focus is on understanding brand positioning, eight drivers have a negative relative importance, which does not make sense. The fix in this case is to exclude these variables, as done in the next output. (Not all software for computing relative importance outputs negative scores, so if your results are all positive it is useful to check that this is not merely an assumption of the software.)
  • The Raw scores, which sum to the R-squared statistic. These are by definition always positive, and are used to compute the relative importance scores, which are scaled so that their absolute values sum to 100, with signs taken from a standard multiple regression. The raw scores allow us to quickly verify that many of the predictors in this example are trivial.
  • The p-values, and the associated t-statistics and standard errors.

     


2. Bar or column charts

The classic way of showing importance is as a bar or column chart. Often there are large numbers of variables, which makes it difficult to get a readable chart. For instance, note the overlapping labels on the example chart below.

     

3. Pie and donut charts

Although the purists hate them, pie and donut charts are often useful when portraying importance scores. They allow viewers to get a feeling for the cumulative impact of the drivers. In this example, we can see that only 3 of our drivers explain more than one-quarter of the variance, and seven explain more than half the variance.

4. Performance-importance charts

Performance-importance charts, also known as quad charts, show the importance scores relative to the average values on the predictor variables. The example below shows the performance for Diet Coke. This labeled scatterplot allows us to quickly see that Diet Coke does really well on one thing, being Health Conscious, but this is not very important. The things that are important - being Reliable, Fun, and Confident - are all things that Diet Coke does poorly on.


5. Correspondence analysis bubble charts

This last visualization shows a bubble chart: correspondence analysis determines the positions of the bubbles, and the absolute value of relative importance determines their sizes. We can see that Diet Coke and Diet Pepsi are skewing towards being Innocent and Health-Conscious. The more popular Coke and Pepsi are associated with being Traditional, Reliable, and Confident.

TRY IT OUT

You can replicate these visualizations for yourself in Displayr. To see Displayr in action, grab a demo.

Grab it here