Tools for Evaluating Segmentations

The easy bit of market segmentation is creating segments. The hard bit is working out if the resulting segments are useful. In this post, I review six tools for making this process more efficient: heatmaps that summarize lots of crosstabs, segment comparison tables, smart tables, correspondence analysis, bubble charts designed for comparing segments, and automatic updating.

Recap: How to create segments

Creating segments consists of two steps. First, we find some data that describes key differences between people. Typically the best data is obtained from surveys, collecting data on attitudes and behavior.  The second step is to form the segments, where the options are to use:

  1. Pre-defined segments. For example, age, gender, family life stage, company size, or industry.
  2. Statistical techniques, such as cluster analysis, latent class analysis, and neural networks, which create new segmentations, where each segment consists of people that are similar based on the data being analyzed.
  3. A hybrid strategy, which combines pre-defined segments with statistical techniques.

The beginner's mistake is to create a single segmentation. This mistake usually stems from the misunderstanding that a market contains a small number of segments and that the goal of market segmentation is to find them.

The expert move is to create many segmentations - typically dozens - and work out which of these is "best", where best means provides the most strategic advantage. The key insight behind this approach is that in just about all markets there is no small set of "natural" segments. Rather, there are an infinite number of ways of carving segments out of the market. The more segmentations that are evaluated, the greater the likelihood that a good one is found.

Once you have created multiple segmentations to be compared, you should end up with a data file, which contains:

  • One or more segment membership variables. Each segmentation will be represented by a separate variable. That is, a single column of data, where each person has a number assigned to them (e.g., a 1 for the first segment, 2 for the second segment, etc.).
  • Profiling variables, describing key differences between people. Typically this will include things like their attitudes, behaviors, demographics, and media usage.

Evaluating segments and segmentations

And now we move on to the meat of this post. If we have multiple segmentations, how do we efficiently compare them? The traditional approach has been to create lots and lots of crosstabs and read through them all. This is a slow and painful process, and if you go down this route the odds are you will only end up evaluating a small number of possible segmentations, which is, as described in the previous section, the wrong route. However, there are a number of ways of short-cutting this.

Heatmap summarizing lots of crosstabs

One way of automating the process of inspecting lots of crosstabs is to create all the crosstabs, but then, rather than read them, create a heatmap that summarizes what they show. The heatmap below compares six alternative segmentations using 115 profiling variables. Each cell of the heatmap shows the statistical significance of the crosstab of one profiling variable (row) with one segmentation (column). I've represented the statistical significance using z-statistics, as they create a better visualization than p-values (a z of more than 1.96 corresponds to a p of 0.05 or less).

Looking at the heatmap below we can see, for example, that segmentation 1's segments differ more in terms of work status, occupation, and age, than do any of the other segmentations (i.e., the blue is darker, which means a higher z-score, suggesting a more significant relationship). The second segmentation better explains differences in top of mind awareness, the perception that brands are fashionable, etc.

You can find out more about how to create this heatmap by reading this post.
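
If you are working outside Displayr, something similar can be put together in a few lines of R. The sketch below is a rough approximation, not the exact method described in that post: it simulates stand-in data (the segmentation and profiling variable names are hypothetical) and converts each crosstab's chi-square p-value into a z-statistic.

    # A minimal sketch with simulated stand-in data: three hypothetical
    # segmentation variables (seg1-seg3) and three profiling variables
    set.seed(1)
    df <- data.frame(seg1 = sample(1:3, 500, replace = TRUE),
                     seg2 = sample(1:4, 500, replace = TRUE),
                     seg3 = sample(1:3, 500, replace = TRUE),
                     age = sample(1:5, 500, replace = TRUE),
                     gender = sample(1:2, 500, replace = TRUE),
                     usage = sample(1:4, 500, replace = TRUE))
    seg.names <- c("seg1", "seg2", "seg3")
    profilers <- setdiff(names(df), seg.names)

    # Convert each crosstab's chi-square p-value to a z-statistic
    most.significant.results <- sapply(seg.names, function(seg)
        sapply(profilers, function(profiler) {
            p.value <- chisq.test(table(df[[profiler]], df[[seg]]))$p.value
            qnorm(1 - p.value / 2)
        }))

    heatmap(most.significant.results, Rowv = NA, Colv = NA, scale = "none")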

Counting the number of significant differences

The chart below counts up the number of profiling variables that were statistically significant, as shown in the heatmap. It tells us that the 3rd and 5th segmentations are related to more of the profiling variables than the others, and should be the first segmentations we focus on. Additional insight can be obtained by doing similar analyses for sets of profiling variables (e.g., demographics, usage variables, etc.). And while you can do this by counting in your head, it's just a line of magic code to automate the process of counting up the number of differences: colSums(most.significant.results > 1.96).
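
The counting step is then a one-liner on the matrix from the sketch above:

    # Count significant crosstabs (z > 1.96) for each segmentation
    counts <- colSums(most.significant.results > 1.96)
    sort(counts, decreasing = TRUE)   # the most promising segmentations first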

One big segment profile table

Once you have identified the key segmentations to focus on, the next step is to carefully examine each to understand how their segments differ.

This next time saver is one that took us a bit of time to add to the software. We kept getting the request, but for a while didn't appreciate why it was important. The basic idea of the table below is that it is a single table, rather than lots of tables, that summarizes the relationship between the segments in a segmentation and all the profiling variables. Now, you may be thinking "it's just a table", but it has a few special features:

  1. It shows the segment names and sizes at the top and when you scroll they stay pinned to the top
  2. Shading is used to show the magnitude of differences between segments and you can control this (e.g., I've used borders instead)
  3. Font color is used to show significance tests
  4. It is showing both categorical and numeric data
  5. You can embed it on a web page to give stakeholders access to it
  6. You can set up filters so that clients can further drill into it

This is created using Insert/Create > Segments > Segment Comparison Table.

 

Smart tables

A practical problem with segment comparison tables is that with a big study they can just be too big to use. So, how should we select which variables to include? We should only include strategically interesting variables. But what if we have lots of these? The simplest approach is to use automated statistical tests to identify which variables to include. In Q this is done using Insert > Tables > Smart Tables. In Displayr, using Insert > More > Tables > Lots of Crosstabs and choosing the options for deleting non-significant tables.

Correspondence analysis for better understanding a categorical profiling variable

The heatmap allows us to compare lots of segmentations. The segment comparison table gives us detailed information on a single segmentation. The next level of analysis is to get a lot of depth on a set of variables, or on the categories of a categorical variable. Correspondence analysis is often the best tool for this, as it draws our attention to the key differences between the segments. The example below shows us that Segments 3 and 4 skew to older people, and the others to younger people.

Bubble charts for comparing segmentations

If we have a single key numeric variable, we can compare it across multiple segments using a scatterplot like the one shown below, where the alternative segmentations appear on the x-axis, the key variable on the y-axis, and the sizes of the segments are shown by bubbles. We can see in this example that age seems to be the best of the alternative segmentations being compared, as it has the highest degree of discrimination and there are no huge segments. (The variation in the sizes of the age segments hints at some methodological problems, however...) See chapter 3 of our Brand Analytics ebook for instructions on how to create this visualization.

 

Lastly, don't forget automatic updating

Most importantly, both of our products, Q and Displayr, have the ability to automatically update charts and tables with new data. This means that you can create a detailed set of tables or visualizations describing segments, and then automatically populate these with alternative segmentations.

Improving Segmentations Using Within Variable and Within Case Scaling

This post describes how to apply the three standard ways of scaling (transforming) rating scales data prior to using cluster analysis and other algorithms for forming segments: standardizing within variable, standardizing within case, and unit scores within variable. The post starts with a discussion of the reasons for scaling. It then reviews the three standard ways of scaling and some other common ways of transforming data prior to creating segments. The post ends with a discussion of how to validate segmentations after scaling.

Why data is scaled

Segmentation algorithms, such as k-means, attempt to create groups that optimally summarize the strongest patterns in the data. However, sometimes the strongest patterns in the data are not very useful. In particular, three common problems are:

  • A small number of variables dominate the segmentation. For example, if one variable has a range of 11, and the others all have a range of 1, it's likely that differences on the variable with the large range will dominate the segmentation. This problem is addressed by scaling variables.
  • Patterns consistent with response biases are identified. For example, the segmentation below is based on ratings of how important people believe different things are to buyers of market research services. Two clusters are shown. The average for cluster 1 is higher on every single variable than for cluster 2. One explanation is that cluster 1 just regards everything as more important. The more likely explanation is that cluster 1 consists of people who have a tendency to give higher ratings (i.e., a yeah-saying bias). This problem is addressed by scaling within case.
  • The segments are just not interesting/useful, and there is a desire to obtain segments that are in some ways different. Some people new to segmentation are a bit surprised by this goal, as they often have a misunderstanding that segmentation is about "finding" segments in data. However, segmentation is more about "creating" segments. There are numerous arbitrary decisions that go into segmentation, and each reveals some different aspect of the underlying data. The goal is to find the segmentation that is most useful.

The three common ways of scaling

Standardizing data, in the context of clustering and other segmentation techniques, usually refers to changing the data so that it has a mean of 0 and a standard deviation of 1.

Standardizing within variable

The toy example below shows a data set containing four observations (cases) and three variables. Variable A will be ignored by any sensible segmentation algorithm, as it has no variation. But, all else being equal, a segmentation algorithm will be more likely to generate segments that differ on variable C than B. This is because C has a higher variance (standard deviation) than B. A fix for this problem is to modify each variable so that they all have the same standard deviation. The simplest way of doing this is to subtract the mean from each variable and divide by the standard deviation, as done on the right. In Displayr, this is done by selecting the variables or variable set and clicking Object Inspector > ACTIONS > Scale within variable - standardize or by selecting this option from Insert > Transform.

Note that after this scaling variable A contains entirely missing data, and needs to be excluded from any segmentation.
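
As a minimal sketch in base R, with made-up numbers chosen to echo the toy example (the figure's exact values are not reproduced here):

    # Four cases; A has no variation, C has more variance than B
    x <- matrix(c(1, 1, 1, 1,     # variable A
                  2, 3, 1, 1,     # variable B
                  1, 2, 4, 7),    # variable C
                nrow = 4, dimnames = list(NULL, c("A", "B", "C")))
    scale(x)   # standardizes each column to mean 0 and standard deviation 1
    # Column A becomes NaN (its standard deviation is 0), so it must be
    # excluded before clustering.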

Standardizing within case

Both before and after standardizing within variable, variables B and C are highly correlated (the standardization does not affect their correlation). When most of the variables are highly correlated, the resulting segments are all but guaranteed to differ primarily in terms of their average values. Standardizing within case means scaling each case (row) of the raw data so that it has a mean of 0 and a standard deviation of 1. In Displayr, this is done by selecting the variables or variable set and clicking Object Inspector > ACTIONS > Scale within case - standardize or by selecting this option from Insert > Transform.

Compare the data for cases 3 and 4. In the original data, case 4 has values of 1 and 7, whereas case 3 has values of 1 and 4. After the scaling, cases 3 and 4 are now identical. Also note that variable A previously contained no information, but it now does contain variation, as case 2's score of 1 on A is, by this case's standards, nowhere near as low a score as it is for the other cases.
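
Continuing the sketch above, standardizing within case applies the same transformation to rows instead of columns:

    # Standardize each case (row) to mean 0 and standard deviation 1
    t(scale(t(x)))
    # Rows 3 and 4 become identical, and column A now varies across cases.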

The output below shows the two cluster solution for the market researcher data after standardizing within case. The yeah-saying bias has been removed.

Unit scaling within variables

An alternative to standardizing within variables is to scale the data to have a unit scale, which means a minimum value of 0 and a maximum value of 1. This form of scaling is most useful when the input data has different scales (e.g., some variables may be on 2-point scales and others on 9-point scales). In Displayr, this is done by selecting the variables or variable set and clicking Object Inspector > ACTIONS > Scale within variable - unit or by selecting this option from Insert > Transform.
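
A minimal sketch, reusing the hypothetical matrix from above:

    # Rescale each variable (column) to lie between 0 and 1
    unit <- function(v) (v - min(v)) / (max(v) - min(v))
    apply(x, 2, unit)
    # As with standardizing, the constant variable A produces NaN (0/0)
    # and must be dropped before clustering.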

Other scalings

Other transformations

Any transformation of variables can be used as a way of preparing data prior to using a segmentation algorithm, such as logarithms, rankings, square roots, and top 2 boxes, to name a few. These are available in Displayr via Insert > Transform and by clicking Object Inspector > ACTIONS.

Dimension reduction

Another popular approach to scaling data is to use dimension reduction techniques such as principal component analysis/factor analysis, and multiple correspondence analysis.

Multiple scalings

It is not unknown to apply multiple scalings in sequence. For example, first standardizing within case and then within variable, or, the other way around.

Validating scalings

Ultimately, any approach to scaling the data is arbitrary and, as such, it may improve things or make them worse.

Evaluating differences with the raw data

The most fundamental check is to examine the resulting segmentation using the original, untransformed, variables. The first two columns of data in the table below just repeat the initial segmentation from near the beginning of this post. The columns on the right demonstrate that the segments formed using data standardized within case are different even when compared using the original data. This is important both as a check of validity and for reporting (as showing results using scaled variables is a sure-fire way of confusing end-users of a segmentation).

Comparing to other data

A second way of validating the scaling is to check that the segments are correlated with other variables. For example, the segments formed with the standardized data do predict differences in whether somebody is a research provider or not, which demonstrates that the segments are not merely "noise".

General usefulness

The last key consideration when evaluating a scaling is the most important: is the resulting segmentation useful for the purpose for which it has been created?

For more information about how to perform segmentation, see our webinar and our eBook.

How to do K-means Cluster Analysis in Displayr

Choose your clustering variables

To run k-means in Displayr, we first need to select the variables that we want to use as inputs to the segmentation, commonly called the clustering variables. In the example below, we'll use a behavioral and attitudinal statement battery on mobile technology. Questions were asked on a 5-point agree/disagree scale. We'll use the top 2 box responses to each of the statements as the inputs to our k-means cluster analysis.

You can use any other numeric variables as clustering variables that can potentially provide differentiation between the respondents and therefore help define the clusters.


Running the k-Means Cluster Analysis

To set up the cluster analysis in Displayr, select Insert > Group/Segment > K-Means Cluster Analysis. A cluster analysis object will be added to the current page. The next step is to add the input variables to the cluster analysis. In this case, we'll select the 11 behavioral/attitudinal statements from the Variables drop-down in the Inputs section on the right. If the variables are grouped in a Variable Set, then the Variable Set may be selected instead, which is more convenient than selecting multiple variables.

Next, we select the number of clusters that we want to create. I have opted to create 3 clusters, but you can choose anything you want here. For this example, we'll leave the rest of the inputs with the default values selected. The following table of means output is generated.
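
For readers who want to see roughly what is happening under the hood, the sketch below runs the same style of analysis in plain R on simulated data with hypothetical variable names. It is an approximation only; Displayr's built-in routine adds the formatted table of means, shading, and significance testing discussed below.

    # Simulated stand-in for 11 statements on a 5-point agree/disagree scale
    set.seed(123)
    ratings <- as.data.frame(replicate(11, sample(1:5, 300, replace = TRUE)))
    names(ratings) <- paste0("statement", 1:11)

    # Top 2 box recode: 1 if the respondent agrees (4 or 5), 0 otherwise
    top2 <- as.data.frame(lapply(ratings, function(v) as.numeric(v >= 4)))

    fit <- kmeans(top2, centers = 3, nstart = 20)   # 3 clusters, 20 random starts
    table(fit$cluster)                              # cluster sizes (n)
    round(t(fit$centers) * 100)                     # mean top-2-box % by cluster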

 

Interpreting the Results

The standard table of means output shown above lists each of the clustering variables in the rows and shows the mean Top 2 Box percentage for each of the clusters.

  • The size of each cluster (n) is shown in the column header.
  • The red and blue highlights indicate whether the Top 2 Box score is higher (blue) or lower (red) than the overall mean. The red and blue colors are also scaled to provide some additional differentiation (darker shades of red/blue are farther from the mean).
  • Means in bold font are significantly higher/lower than the mean score.
  • The R-Squared value shows the proportion of variance in each clustering variable that is explained by the cluster assignment (a minimal sketch of this computation appears after this list). In the example above, we can see that there are 4 statements that have a greater impact on the segment/cluster predictions than do the remaining variables.
  • The p-value shows which statement variables are significant in the model.
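
Continuing the hypothetical R sketch from above, the per-variable R-Squared and p-value can be approximated with a one-way ANOVA of each clustering variable on the cluster assignment (this mirrors, rather than reproduces, Displayr's computation):

    cluster <- factor(fit$cluster)
    stats <- t(sapply(top2, function(v) {
        model <- summary(lm(v ~ cluster))
        f <- model$fstatistic
        c(r.squared = model$r.squared,
          p.value = unname(pf(f[1], f[2], f[3], lower.tail = FALSE)))
    }))
    round(stats, 3)   # one row per statement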

Saving Cluster Membership

Individual respondents can be assigned to the individual clusters in Displayr by first selecting the k-Means Cluster Analysis output and then selecting Insert > Group/Segment > Save Variable(s) > Membership. A new categorical variable is added to the top of the data set called "Segment/Cluster memberships from kmeans". Locate the new variable in the Data Sets tree and hover over it to preview the respondent level membership data or drag the variable onto the page to create a table.

This segment/cluster variable can be used for profiling against your demographic variables. Once you've identified the key differences between your clusters, try to come up with names that describe each cluster. You can then add these names to the cluster variable by first selecting the variable in the Data Sets tree, clicking the Labels button in the Properties panel on the right, and entering the cluster names in the Label column. Click OK to save the cluster names.


How to Work Out the Number of Segments in a Market Segmentation

Strategic usefulness

In choosing the number of segments, the key determinant is typically which number of segments leads to the solution with the most useful strategic and practical implications. This is usually judged best by the end-users of the segmentation, as it is a managerial rather than a statistical question.

The no small segments and extent of association with other data methods are both closely related to strategic usefulness.

No small segments

The basic idea of this approach is that you choose the highest number of segments, such that none of the segments are small (less than 5% of the sample). This rule has two justifications. One is that solutions with very small segments are unlikely to be statistically reliable. The other is that small segments are unlikely to be strategically usable.

A weakness of this approach is the difficulty of justifying the choice of cutoff value.
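
To make the rule concrete, here is a minimal sketch in R, illustrated with k-means on stand-in data:

    # Choose the highest k whose smallest segment is at least 5% of the sample
    set.seed(42)
    x <- scale(iris[, 1:4])   # stand-in data; use your own segmentation variables

    smallest.share <- sapply(2:10, function(k)
        min(kmeans(x, centers = k, nstart = 20)$size) / nrow(x))
    names(smallest.share) <- 2:10
    smallest.share
    max(as.numeric(names(which(smallest.share >= 0.05))))   # highest acceptable k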

Extent of association with other data

This approach involves assessing the extent to which each number of segments solution (i.e., the two-segment solution, the three-segment solution, etc.) are associated with other data. There are two rationales for this approach. One is that the stronger the association with other data, the greater the likelihood that the solution is valid, rather than just reflecting noise. The second is that if a solution is not associated with other data, it will be difficult to use in practice (e.g., it will be difficult to target advertising and distribution if the segmentation is not related to variables that are correlated with advertising usage and shopping behavior).

A practical challenge with this approach is that any truly novel and interesting finding is one that does not relate strongly to existing data.


Cross-validation

This approach involves using only a subset of the data for each subject (or whatever other unit of analysis is used) when fitting a model for a specified number of segments. Subsequently, it involves computing some measure of fit (e.g., log-likelihood) of the fitted model with the observations not used in estimation. This is repeated for different numbers of segments (e.g., from 1 to 10), and the number of segments with the best fit is selected. A variant of this, called K-fold cross-validation, instead operates by splitting the sample into K groups, estimating the model K times, each time on K-1 of the groups, and judging each model based on its fit to the group left out.

The strength of these approaches is that they rely on few assumptions. The weakness is that when there is little data, or only small differences between the number of segments, these approaches are not highly reliable.
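
The sketch below shows one simple variant of this idea in R, holding out whole cases rather than observations within each subject (the procedure described above is more involved and depends on the model): each candidate number of segments is fitted on a training split and scored by the squared distances of held-out cases to their nearest cluster center.

    set.seed(1)
    x <- scale(iris[, 1:4])                      # stand-in data
    train <- sample(nrow(x), 0.7 * nrow(x))

    holdout.sse <- sapply(2:8, function(k) {
        centers <- kmeans(x[train, ], centers = k, nstart = 20)$centers
        # distances between each held-out case and each cluster center
        d <- as.matrix(dist(rbind(centers, x[-train, ])))[-(1:k), 1:k]
        sum(apply(d, 1, min)^2)
    })
    names(holdout.sse) <- 2:8
    holdout.sse   # lower is better, though it tends to keep falling with k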

Penalized fit heuristics

Provided there are no technical errors, it should always be the case that the more segments you have, the better the segments will fit the data. At some point, however, adding segments will overfit the data. Penalized fit heuristics are metrics that start with a computation of fit, and then penalize this based on the number of segments.

Dozens and perhaps hundreds of penalized fit heuristics have been developed, such as the Bayesian information criterion (BIC), the gap statistic, and the elbow method (where the penalty factor is based on the perceptions of the analyst rather than a cut-and-dried rule).

A practical challenge with all penalized fit heuristics is that they tend to be optimized to work well for a very specific problem but poorly in other contexts. As a result, such heuristics are not in widespread use.


Statistical tests

Statistical tests, such as likelihood ratio tests, can also be used to compare different numbers of segments, where the difference distribution is bootstrapped.

Practical problems with this approach include that it is highly computationally intensive, that software is not widely available, and that the local optima that inevitably occur mean that the bootstrapped likelihood ratio test will be highly biased.

Entropy

When latent class analysis is used for segmentation, an output from latent class analysis is an estimate of the probability that each subject (e.g., person) is in each of the segments. This data can be summarized into a single number, called entropy, which takes a value of 1 when all respondents have a probability of 1 of being in one class, and a value of 0 when the probabilities of being assigned to a class are constant for all subjects. (Sometimes this is scaled in reverse, where 0 indicates all respondents have a probability of 1 and 1 indicates constant probabilities.)

The main role of entropy is to rule out the number of segments when the entropy is too low (less than 0.8). The evidence in favor of selecting the number of segments based on entropy is weak.

Replicability

Replicability is computed by either randomly sampling with replacement (bootstrap replication) or splitting a sample into two groups. Segmentation is conducted in the replication samples. The number of segments which gets the most consistent results (i.e., consistent between the samples) is considered to be the best. This approach can also be viewed as a form of cross-validation.

Two challenges with this approach are that local optima may be more replicable than global optima (i.e., it may be easier to replicate a poor solution than a better solution), and that replicability declines based on the number of segments, all else being equal.



How to Work Out the Number of Clusters in Cluster Analysis

Penalized fit heuristics

Provided there are no technical errors, it should always be the case that the more clusters you have, the better the clusters will fit the data. At some point, however, adding clusters will overfit the data. Penalized fit heuristics are metrics that start with a computation of fit, and then penalize this based on the number of clusters.

Dozens and perhaps hundreds of penalized fit heuristics have been developed, such as the Bayesian information criterion (BIC), the gap statistic, and the elbow method (where the penalty factor is based on the perceptions of the analyst rather than a cut-and-dried rule).

A practical challenge with all penalized fit heuristics is that they tend to be optimized to work well for a very specific problem but work poorly in other contexts. As a result, such heuristics are not in widespread use.

Statistical tests

Statistical tests, such as likelihood ratio tests, can also be used to compare different numbers of clusters. In practice, these tests make very strong and difficult-to-justify assumptions, and none of these tests has ever been widely adopted.

The extent of association with other data

This approach involves assessing the extent to which each cluster solution (i.e., the two-cluster solution, the three-cluster solution, etc.) is associated with other data. The basic idea is that the stronger the association with other data, the greater the likelihood that the solution is valid, rather than just reflecting noise.

A practical challenge with this approach is that any truly novel and interesting finding is one that does not relate strongly to existing classifications.

Replicability

Replicability is computed by either randomly sampling with replacement (bootstrap replication) or splitting a sample into two groups. Cluster analysis is conducted in the replication samples. The number of clusters that gets the most consistent results (i.e., consistent between the samples) is considered to be the best. This approach can also be viewed as a form of cross-validation.

Two challenges with this approach are that local optima may be more replicable than global optima (i.e., it may be easier to replicate a poor solution than a better solution), and that replicability declines based on the number of clusters, all else being equal.

No small classes

The basic idea of this approach is that you choose the highest number of classes, such that none of the classes are small (e.g., less than 5% of the sample). This rule has long been used in practice as a part of the idea of domain-usefulness but has recently been discovered to also have some theoretical justification (Nasserinejad, K., van Rosmalen, J., de Kort, W., & Lesaffre, E. (2017). Comparison of criteria for choosing the number of classes in Bayesian finite mixture models. PLoS ONE, 12).

A weakness of this approach is the difficulty of specifying the cutoff value.

Domain-usefulness

Perhaps the most widely used approach is to choose the solution that appears, to the analyst, to be the most interesting.

What are the Strengths and Weaknesses of Hierarchical Clustering?

The strengths of hierarchical clustering are that it is easy to understand and easy to do. The weaknesses are that it rarely provides the best solution, it involves lots of arbitrary decisions, it does not work with missing data, it works poorly with mixed data types, it does not work well on very large data sets, and its main output, the dendrogram, is commonly misinterpreted. There are better alternatives, such as latent class analysis.

Easy to understand and easy to do…

There are four types of clustering algorithms in widespread use: hierarchical clustering, k-means cluster analysis, latent class analysis, and self-organizing maps. The math of hierarchical clustering is the easiest to understand. It is also relatively straightforward to program. Its main output, the dendrogram, is also the most appealing of the outputs of these algorithms.

… But rarely provides the best solution

The scatterplot below shows data simulated to be in two clusters. The simplest hierarchical cluster analysis algorithm, single-linkage, has been used to extract two clusters. One observation -- shown as a filled red circle -- has been allocated to one cluster, and the remaining 199 observations to the other.

It is obvious when you look at this plot that the solution is poor. It is relatively straightforward to modify the assumptions of hierarchical cluster analysis to get a better solution (e.g., changing single-linkage to complete-linkage). However, in real-world applications the data is typically in high dimensions and cannot be visualized on a plot like this, which means that poor solutions may be found without it being obvious that they are poor.

[Scatterplot: two simulated clusters, with the single-linkage solution isolating a single observation]
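
This failure mode is easy to reproduce. A minimal sketch in R, with simulated data standing in for the plot above:

    set.seed(7)
    x <- rbind(matrix(rnorm(200, mean = 0), ncol = 2),
               matrix(rnorm(200, mean = 2), ncol = 2))   # two overlapping groups

    single <- cutree(hclust(dist(x), method = "single"), k = 2)
    complete <- cutree(hclust(dist(x), method = "complete"), k = 2)

    table(single)     # single-linkage typically strands one or two points
    table(complete)   # complete-linkage usually recovers two sizeable groups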

Arbitrary decisions

When using hierarchical clustering it is necessary to specify both the distance metric and the linkage criteria. There is rarely any strong theoretical basis for such decisions. A core principle of science is that findings are not the result of arbitrary decisions, which makes the technique of dubious relevance in modern research.

Missing data

Most hierarchical clustering software does not work when values are missing in the data.

Data types

With many types of data, it is difficult to determine how to compute a distance matrix. There is no straightforward formula that can compute a distance when the variables are both numeric and qualitative. For example, how can one compute the distance between a 45-year-old man, a 10-year-old girl, and a 46-year-old woman? Formulas have been developed, but they involve arbitrary decisions.

Misinterpretation of the dendrogram

Dendrograms are provided as an output to hierarchical clustering. Many users believe that such dendrograms can be used to select the number of clusters. However, this is true only when the ultrametric tree inequality holds, which is rarely, if ever, the case in practice.

There are better alternatives

More modern techniques, such as latent class analysis, address all the issues with hierarchical cluster analysis.

What are Segmentation Variables?

When people refer to segmentation variables, they are usually referring to one of the following:

  • A single variable that is used to allocate people to segments
  • A set of variables that are used to allocate people to segments based on some logical relationship
  • A set of variables that are used in a predictive statistical algorithm to predict segmentation membership
  • A set of variables that are used in a segmentation algorithm
  • A variable in a data file or database that records segment membership

A single variable that is used to allocate people to segments

Often, one key characteristic of people is used to define segments. Airlines allocate flyers into segments (tiers) based on status credits (longer flights and more expensive flights earn more status credits). Banks allocate customers into segments based on the profit that the customers are likely to provide to the bank.

A set of variables used to allocate people to segments based on a logical relationship

Sometimes there are logical relationships between small numbers of variables that can be exploited when allocating customers to segments. Direct marketers allocate customers into segments and prioritize these segments based on how recently people have responded, how frequently they have responded, and how much they have spent when previously buying things as a result of direct-mail campaigns. Packaged-goods companies like Nestle and Unilever classify people into life-stage segments, based on the age, marital status, and the number of children (e.g., one segment of families with young children, another with teens, “empty nesters”, etc.).

A set of variables that are used in a predictive statistical algorithm to predict segmentation membership

Sometimes a single variable has been identified as being useful for segmentation, but the variable’s value is unknown for some people. For example, a bank may know how much profit its customers provide, but the bank cannot know the potential profit of competitors’ customers. Predictive models can be used to predict the profit of the current customers based on other variables, such as age, where people live, marital status, race, etc. These other variables are sometimes referred to in this context as segmentation variables, and the predictions made using these segmentation variables can be used to prioritize the customers of other banks (e.g., to target with Facebook ads).

A set of variables that are used in a segmentation algorithm

Sometimes a large number of variables are identified, and segmentation algorithms, such as k-means cluster analysis and latent class analysis, are used to identify groups of people that are similar to each other. The variables that are used in this analysis are referred to as segmentation variables.

A variable in a data file or database that records segment membership

When segments are formed using any of the methods discussed above, people in a database or data file are assigned into segments. E.g., the first person may be in segment 1, the second person in segment 3, and so on. The data that contains this segment membership information is also often referred to as the segmentation variable.

This article restricts itself to people. However, the same ideas apply to other units of analysis (e.g., grouping households, countries, occasions, etc.).


How to Deal with Missing Values in Cluster Analysis

Most of the widely used cluster analysis algorithms can be highly misleading or can simply fail when most or all of the observations have some missing values. There are five main approaches to dealing with missing values in cluster analysis: using algorithms specifically designed for missing values, imputation, treating the data as categorical, forming clusters based on complete cases and allocating partial data to clusters, and forming clusters using only the complete data.

The different approaches have been ordered in terms of how safe they are. The safest techniques are introduced first.

Cluster analysis techniques designed for missing data

With very few exceptions, most of the cluster analysis techniques designed explicitly to deal with missing data are called latent class analysis rather than cluster analysis. There are some technical differences between the two techniques, but ultimately, latent class analysis is just an improved version of cluster analysis, where one of the improvements is the way it deals with missing data.

Impute missing values

Imputation refers to tools for predicting the values that would have existed were the data not missing. Provided that you use a sensible approach to imputing missing values (and replacing missing values with the variable's average is not a sensible approach), running cluster or latent class analysis on the imputed data set means that the missing data is treated in a better way than occurs by default with cluster analysis. By default, most cluster analysis methods assume that data is missing completely at random (MCAR), which is both a strong assumption and one that is rarely correct; using imputation implicitly involves making the more relaxed assumption that data is missing at random (MAR), which is better.
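
As a minimal sketch of the impute-then-cluster workflow in R, assuming the mice package is installed (any sensible imputation tool would do):

    library(mice)

    set.seed(3)
    df <- iris[, 1:4]                               # stand-in data
    df[sample(nrow(df), 30), "Sepal.Width"] <- NA   # poke some holes in it

    # Single imputation, for simplicity; proper multiple imputation uses m > 1
    imputed <- complete(mice(df, m = 1, printFlag = FALSE))
    fit <- kmeans(scale(imputed), centers = 3, nstart = 20)
    table(fit$cluster)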

Use techniques developed for categorical data

Cluster and latent class techniques have been developed for modeling categorical data. When the data contains missing values, if the variables are treated as categorical and the missing values are added to the data as another category, then these cluster analysis techniques developed for categorical data can be used.

At a theoretical level, the benefit of this approach is that it makes the fewest assumptions about missing data. However, the cost of this assumption is that often the resulting clusters are largely driven by differences in missing values patterns, which is rarely desirable.

Form clusters based on complete cases, and then allocate partial cases to segments

A popular approach to clustering with missing values is to cluster only observations with complete cases, and then assign the observations with incomplete data to the most similar segment based on the data available. For example, this approach is used in SPSS with the setting of Options > Missing Values > Exclude case pairwise.

A practical problem with this approach is that if the observations with missing values are different in important ways from those with no missing values, this is not going to be discovered. That is, this method assumes that all the key differences of interest are evident in the data where there are no missing values.

Form clusters based only on complete cases

The last approach is to ignore the data that has missing values, and perform the analysis only on observations with complete data. This does not work at all if you have missing values for all cases. Where the sample size with complete data is small, the technique is inherently unreliable. Where the sample size is larger, the approach is still biased, except where the people with missing data are identical to the observations with complete data apart from the "missingness" of the data. That is, this approach involves making the strong MCAR assumption.

 

How to Work Out the Number of Classes in Latent Class Analysis

Cross-validation

Cross-validation on a latent class model involves using only a subset of the data for each subject (or whatever other unit of analysis is used) when fitting a model for a specified number of classes, and then computing some measure of fit (e.g., log-likelihood) of the fitted model with the observations not used in estimation. This is repeated for different numbers of classes (e.g., from 1 to 10), and the number of classes with the best fit is selected. A variant of this, called K-fold cross-validation, instead operates by splitting the sample into K groups, estimating the model K times, each time on K-1 of the groups, and judging each model based on its fit to the group left out.

The strength of these approaches is that they rely on few assumptions. The weakness is that when there is little data, or only small differences between the number of classes, these approaches are not highly reliable.

Information criteria

Provided there are no technical errors, it should always be the case that the more classes you have, the better the classes will fit the data. At some point, however, adding classes will overfit the data. Information criteria are heuristics that start with a computation of fit (the log-likelihood), and then penalize this based on the number of classes. Information criteria commonly applied to the selection of the number of classes include the Bayesian information criterion (BIC), the deviance information criterion, and the corrected Akaike information criterion (CAIC).

Information criteria are easy to compute, but have, at best, weak theoretical support when applied to latent class analysis.
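
As a minimal sketch in R, assuming the mclust package, which fits Gaussian mixture models (a latent class model for numeric data) and reports BIC directly:

    library(mclust)

    x <- scale(iris[, 1:4])       # stand-in data
    fit <- Mclust(x, G = 1:10)    # fits models with 1 to 10 classes
    plot(fit, what = "BIC")       # note: mclust's BIC is better when larger
    summary(fit)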

Statistical tests

Statistical tests, such as likelihood ratio tests, can also be used to compare different numbers of classes, where the difference distribution is bootstrapped.

Practical problems with this approach include that it is highly computationally intensive, that software is not widely available, and that the local optima that inevitably occur mean that the bootstrapped likelihood ratio test will be highly biased.

Extent of association with other data

This approach involves assessing the extent to which each number of classes solution (i.e., the two-class solution, the three-class solution, etc.) are associated with other data. The basic idea is that the stronger the association with other data, the greater the likelihood that the solution is valid, rather than just reflecting noise.

A practical challenge with this approach is that any truly novel and interesting finding is one that does not relate strongly to existing classifications.

Entropy

An output from latent class analysis is an estimate of the probability that each subject (e.g., person) is in each of the classes. This data can be summarized into a single number, called entropy, which takes a value of 1 when all respondents have a probability of 1 of being in one class, and a value of 0 when the probabilities of being assigned to a class are constant for all subjects. (Sometimes this is scaled in reverse, where 0 indicates all respondents have a probability of 1 and 1 indicates constant probabilities.)

The main role of entropy is to rule out the number of classes when the entropy is too low (e.g., less than 0.8). The evidence in favor of selecting the number of classes based on entropy is weak.
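
A minimal sketch of the computation, given a matrix of posterior class probabilities (rows are respondents, columns are classes). The scaling here matches the description above: 1 for certain assignment, 0 for constant probabilities.

    relative.entropy <- function(p) {
        p <- pmax(p, 1e-12)                          # avoid log(0)
        1 - sum(-p * log(p)) / (nrow(p) * log(ncol(p)))
    }

    # Near-certain assignments give a value near 1 ...
    relative.entropy(matrix(c(0.99, 0.01,
                              0.02, 0.98), nrow = 2, byrow = TRUE))
    # ... constant probabilities give 0
    relative.entropy(matrix(1 / 3, nrow = 10, ncol = 3))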

Replicability

Replicability is computed by either randomly sampling with replacement (bootstrap replication) or splitting a sample into two groups. Latent class analysis is conducted in the replication samples. The number of classes which gets the most consistent results (i.e., consistent between the samples) is considered to be the best. This approach can also be viewed as a form of cross-validation.

Two challenges with this approach are that local optima may be more replicable than global optima (i.e., it may be easier to replicate a poor solution than a better solution), and that replicability declines based on the number of classes, all else being equal.

No small classes

The basic idea of this approach is that you choose the highest number of classes, such that none of the classes are small (e.g., less than 5% of the sample). This rule has long been used in practice as a part of the idea of domain-usefulness but has recently been discovered to also have some theoretical justification (Nasserinejad, K., van Rosmalen, J., de Kort, W., & Lesaffre, E. (2017). Comparison of criteria for choosing the number of classes in Bayesian finite mixture models. PLoS ONE, 12).

A weakness of this approach is the difficulty of justifying the choice of cutoff value.

Domain-usefulness

Perhaps the most widely used approach is to choose the number of classes that creates the solution that is, to the analyst, most interesting.

How to Reduce the Number of Segmentation Variables

A practical challenge when working out how to segment is that there are usually lots of possible variables, and you need to reduce that number. For example, if you use all the techniques in How to Identify Relevant Variables for Market Segmentation, you will often end up with a very long list of segmentation variables. The list will probably be too long for you to measure every variable, so the next decision is how to reduce the list in size.


Only use theoretically sensible variables

A useful segmentation variable should either:

  • Explain a difference in how or why people buy
  • Explain a difference in terms of how attractive customers are to a firm

If your variables do not relate to the benefits that people seek, how they interact, what prevents them from buying, purchasing power, or customer value/profitability, they are unlikely to be useful. See Which Segmentation Variables Should You Use and Why?.

Use variables that are known to describe differences between people

Only include variables where there is a good reason to believe people differ.  Virtually everybody considers taste important when buying food, so there is little point in measuring the importance of taste in any market, since measured differences will likely just reflect measurement error. Similarly, everybody wants “good value”, so if you ask that question you will be measuring differences in how people like to tick boxes in a questionnaire, not real differences that you can use to build a segmentation strategy.

Relate to marketing activities

Good variables often relate to the type of marketing activities that will need to be employed and the degree of marketing effort required.  If a variable is measuring a difference between consumers that is not obvious in its marketing implications, serious thought should be given to excluding the variable.

Good variables are ones that are easy to measure

If a variable is hard to measure, the resulting data will be noisy, which makes it poor for segmentation. Most psychological variables fall into this category, which is why, while people often describe consumers in terms of personality, few segmentations end up being successfully created around personality.

Align variables with strategic plans

Include variables that explicitly relate to the strategic planning of the company.  For example, if key marketing issues relate to advertising, then awareness, knowledge and product experience are useful for identifying target consumers, while psychographics and lifestyle are useful in tailoring the message.  If the priority of the firm is new product development, then the focus should be on demand-creating conditions (e.g., jobs-to-be-done) and preferences for different products.


How to Write “Golden Questions” for Market Segmentation

The main applications of golden questions are:

  • As discovery questions in a sales scenario (e.g., a salesperson may be trained to ask a few questions, which allows them to work out the segment somebody is in)
  • When collecting data in sign-up forms (e.g., industry, firm size)
  • When screening people to participate in market research studies

There are three basic approaches to creating golden questions: use judgment to write a single question; use judgment to write a set of questions; use machine learning to select the golden questions from a larger set of questions.


Example of a judgment-based golden question

The most straightforward approach to writing a golden question is to theorize about relevant segments and write one or a small number of questions that allow consumers to select which segment they are in.  For example, in the 1990s Microsoft segmented people according to their attitude to technology using the following question:

Which one of the following phrases best describes you and your attitude towards technology in general?

I love technology, I spend a lot of time reading up on new developments and trying new things

I like to try and keep up with technology but usually need someone to help me or explain things

I accept technological developments but don't really get a buzz out of them

I feel behind when it comes to technology but am open to learning more about it

I find the whole idea of new technology a bit daunting at times

I think the world would be a better place without technological change

Example of a set of judgment-based questions

Sometimes it can be faster and easier to ask multiple questions.  The diagram below shows how McKinsey has historically measured life stage:

[Diagram: McKinsey's life-stage classification questions]

Empirical approaches

Empirical approaches proceed in two stages. First, a segmentation algorithm, such as k-means cluster analysis or latent class analysis, is used to form segments using all the relevant variables. Then, a machine learning or statistical tool, such as linear discriminant analysis or a random forest, is used to identify the golden questions. Empirical approaches always involve making a tradeoff between the accuracy of the model and the number of golden questions (fewer golden questions mean less accuracy).
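
A minimal sketch of the two stages in R, on simulated data and assuming the randomForest package (linear discriminant analysis would work similarly):

    library(randomForest)

    set.seed(11)
    questions <- as.data.frame(replicate(20, sample(1:5, 500, replace = TRUE)))
    names(questions) <- paste0("q", 1:20)    # hypothetical candidate questions

    # Stage 1: form segments from all the questions
    segments <- factor(kmeans(scale(questions), centers = 4, nstart = 20)$cluster)

    # Stage 2: rank the questions by how well they predict segment membership
    rf <- randomForest(questions, segments, importance = TRUE)
    imp <- importance(rf)[, "MeanDecreaseAccuracy"]
    head(sort(imp, decreasing = TRUE), 5)    # candidate golden questions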



What is Cluster Analysis?

Cluster analysis refers to algorithms that group similar objects into groups called clusters. The endpoint of cluster analysis is a set of clusters, where each cluster is distinct from the other clusters, and the objects within each cluster are broadly similar to each other. For example, in the scatterplot below, two clusters are shown, one by filled circles and one by unfilled circles.

[Scatterplot: two clusters, one shown with filled circles and one with unfilled circles]

The required data for cluster analysis

Typically, cluster analysis is performed on a table of raw data, where each row represents an object and the columns represent the quantitative characteristics of the objects. These quantitative characteristics are called clustering variables. For example, in the table below there are 18 objects and two clustering variables, x and y. Cluster analysis can also be performed using data in a distance matrix.

[Table: raw data for 18 objects on two clustering variables, x and y]

Why is cluster analysis used?

In the example above, it is easy to detect the existence of the clusters visually because the plot shows only two dimensions of data. Typically, though, cluster analysis is performed on high-dimensional data (e.g., 30 variables), where there is no good way to visualize all the data.

The outputs from k-means cluster analysis

The main output from cluster analysis is a table showing the mean values of each cluster on the clustering variables. The table of means for the data examined in this article is shown below.

[Table: mean values of each cluster on the clustering variables]

A second output shows which object has been classified into which cluster, as shown below. Other outputs include plots and diagnostics designed to assess how much variation exists within and between clusters.

Cluster analysis algorithms

Cluster analysis is a computationally hard problem. For most real-world problems, computers are not able to examine all the possible ways in which objects can be grouped into clusters. Thousands of algorithms have been developed that attempt to provide approximate solutions to the problem. The three main ones are:

  • Hierarchical clustering. This technique starts by treating each object as a separate cluster. Then, it repeatedly executes the following two steps: (1) identify the two clusters that are closest together, and (2) merge the two most similar clusters. This continues until all the clusters are merged together.


  • k-means cluster analysis. This technique requires the user to specify a required number of clusters. Initially, observations are allocated to clusters using some arbitrary process (e.g., randomly). Then, the cluster means are computed, and objects are allocated to the closest cluster. These last two steps are repeated until the clusters do not change.
  • Latent class analysis. In terms of process, this is like k-means, except that it can be used with both numeric and non-numeric data.


What is Hierarchical Clustering?

Hierarchical clustering, also known as hierarchical cluster analysis, is an algorithm that groups similar objects into groups called clusters. The endpoint is a set of clusters, where each cluster is distinct from each other cluster, and the objects within each cluster are broadly similar to each other.


Required data

Hierarchical clustering can be performed with either a distance matrix or raw data. When raw data is provided, the software will automatically compute a distance matrix in the background. The distance matrix below shows the distance between six objects.

[Figure: distance matrix between six objects]
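If you already have a distance matrix like the one above, SciPy can cluster directly from it. A minimal sketch, with an invented six-object matrix standing in for real data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

# Hypothetical symmetric distance matrix for six objects (zeros on the diagonal).
D = np.array([
    [ 0, 2, 6, 10, 9, 8],
    [ 2, 0, 5,  9, 8, 7],
    [ 6, 5, 0,  4, 5, 6],
    [10, 9, 4,  0, 3, 2],
    [ 9, 8, 5,  3, 0, 1],
    [ 8, 7, 6,  2, 1, 0],
])

# squareform() converts the matrix into the condensed form linkage() expects.
Z = linkage(squareform(D), method="average")
print(fcluster(Z, t=2, criterion="maxclust"))  # allocate the objects to 2 clusters
```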

 


 

How hierarchical clustering works

Hierarchical clustering starts by treating each observation as a separate cluster. Then, it repeatedly executes the following two steps: (1) identify the two clusters that are closest together, and (2) merge them. This iterative process continues until all the clusters are merged together, as illustrated in the diagrams below.
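The merge-the-closest-pair loop can also be written out directly. The sketch below is a deliberately naive single-linkage version on made-up points, just to show the mechanics; real implementations are far more efficient:

```python
import numpy as np

# Made-up 2-D observations; each starts in its own cluster.
points = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
                   [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])
clusters = [[i] for i in range(len(points))]

def cluster_distance(a, b):
    # Single-linkage: the distance between the closest pair of members.
    return min(np.linalg.norm(points[i] - points[j]) for i in a for j in b)

while len(clusters) > 1:
    # Step 1: find the two closest clusters.
    i, j = min(((i, j) for i in range(len(clusters))
                for j in range(i + 1, len(clusters))),
               key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]))
    # Step 2: merge them (j > i, so popping j leaves index i valid).
    clusters[i] += clusters.pop(j)
    print(clusters)
```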

[Figure: hierarchical cluster analysis]

The main output of Hierarchical Clustering is a dendrogram, which shows the hierarchical relationship between the clusters:

[Figure: hierarchical clustering and dendrograms]
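A dendrogram can be produced from any linkage result. A minimal sketch with SciPy and matplotlib, on invented coordinates:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Hypothetical 2-D observations; SciPy computes the distances internally.
points = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
                   [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])

Z = linkage(points, method="ward")
dendrogram(Z, labels=[f"obj {i}" for i in range(len(points))])
plt.ylabel("merge distance")
plt.show()
```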

 


 

Measures of distance (similarity)

In the example above, the distance between two clusters has been computed based on the length of the straight line drawn from one cluster to another. This is commonly referred to as the Euclidean distance. Many other distance metrics have been developed.

The choice of distance metric should be made based on theoretical concerns from the domain of study. That is, a distance metric needs to define similarity in a way that is sensible for the field of study. For example, if clustering crime sites in a city, city-block distance may be appropriate (or, better yet, the time taken to travel between each location). Where there is no theoretical justification for an alternative, Euclidean distance should generally be preferred, as it is usually the appropriate measure of distance in the physical world.
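The crime-site example can be made concrete. SciPy's pdist supports both metrics, and on the made-up coordinates below the straight-line and city-block distances clearly differ:

```python
from scipy.spatial.distance import pdist

sites = [[0, 0], [3, 4], [6, 8]]  # hypothetical crime-site coordinates on a grid
print(pdist(sites, metric="euclidean"))  # straight line: [ 5. 10.  5.]
print(pdist(sites, metric="cityblock"))  # along the grid: [ 7. 14.  7.]
```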

 


 

Linkage Criteria

After selecting a distance metric, it is necessary to determine from where the distance between clusters is computed. For example, it can be computed between the two closest points of a pair of clusters (single-linkage), the two most distant points (complete-linkage), the centers of the clusters (mean or average-linkage), or some other criterion. Many linkage criteria have been developed.

As with distance metrics, the choice of linkage criteria should be made based on theoretical considerations from the domain of application. A key theoretical issue is what causes variation. For example, in archeology, we expect variation to occur through innovation and natural resources, so working out whether two groups of artifacts are similar may best be done by comparing their most similar members (i.e., single-linkage).

Where there are no clear theoretical justifications for the choice of linkage criteria, Ward’s method is the sensible default. This method works out which observations to group based on reducing the sum of squared distances of each observation from the average observation in a cluster. This is often appropriate as this concept of distance matches the standard assumptions of how to compute differences between groups in statistics (e.g., ANOVA, MANOVA).
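In SciPy, the linkage criterion is just a parameter, so it is easy to see how the choice changes the merges on the same data. A small sketch, with the data made up for illustration:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))  # made-up observations

for method in ("single", "complete", "average", "ward"):
    Z = linkage(X, method=method)
    # Each row of Z records one merge: the two clusters joined and their distance.
    print(method, Z[-1].round(2))
```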

 


 

Agglomerative versus divisive algorithms

Hierarchical clustering typically works by sequentially merging similar clusters, as shown above. This is known as agglomerative hierarchical clustering. In theory, it can also be done by initially grouping all the observations into one cluster, and then successively splitting these clusters. This is known as divisive hierarchical clustering. Divisive clustering is rarely done in practice.


What is k-Means Cluster Analysis? https://www.displayr.com/what-is-k-means-cluster-analysis/

The required data for k-means cluster analysis

k-means cluster analysis is performed on a table of raw data, where each row represents an object and the columns represent quantitative characteristics of the objects. These quantitative characteristics are called clustering variables. For example, in the table below there are 18 objects, and there are two clustering variables, x and y. In a real-world application, there will typically be many more objects and more variables. For example, in market segmentation, where k-means is used to find groups of consumers with similar needs, each object is a person and each variable is commonly a rating of how important various things are to consumers (e.g., quality, price, customer service, convenience).

[Table: the required data for k-means cluster analysis]

How k-means cluster analysis works

Step 1: Specify the number of clusters (k). The first step in k-means is to specify the number of clusters, which is referred to as k. Traditionally, researchers conduct k-means multiple times, exploring different numbers of clusters (e.g., from 2 through 10).
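Before moving on to step 2, here is a hedged sketch of that exploration in Python, using scikit-learn on made-up data: run k-means for each candidate k and compare the within-cluster sum of squares (an "elbow" in these values is one common heuristic for choosing k):

```python
import numpy as np
from sklearn.cluster import KMeans

# Made-up data with three underlying groups.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1, (30, 2)) for m in (0, 5, 10)])

for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1))  # within-cluster sum of squares
```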

Step 2: Allocate objects to clusters. The most straightforward approach is to randomly assign objects to clusters, but there are many other approaches (e.g., using hierarchical clustering). The 18 objects have been represented by dots on a scatterplot, as seen in the diagram below, where x is shown by the horizontal position of each object and y by the vertical. The objects have been randomly assigned to the two clusters (k = 2), where one cluster is shown with filled dots and the other with unfilled dots.

[Figure: how k-means cluster analysis works]

Step 3: Compute cluster means. For each cluster, the average value is computed for each of the variables. In the plot below, the average value of the filled dots for the variable represented by the horizontal position (x) of the dots is around 15; for the variable on the vertical dimension (y), it is around 12. These two means are represented by the filled cross. Or, stated slightly differently: the filled cross is in the middle of the black dots. Similarly, the white cross is in the middle of the white dots. These crosses are variously referred to as the cluster centers, cluster means, and cluster centroids.

[Figure: compute cluster means]

Step 4: Allocate each observation to the closest cluster center. In the plot above, some of the filled dots are closer to the white cross and some of the white dots are closer to the black cross. When we reallocate the observations to the closest clusters we get the plot below.

[Figure: allocate each observation to the closest cluster center]

Step 5: Repeat steps 3 and 4 until the solution converges. Looking at the plot above, we can see that the crosses (the cluster means) are no longer accurate. The following plot shows that they have been recomputed using step 3. In this example, the cluster analysis has converged (i.e., reallocating observations and updating means cannot improve the solution). When you have more data, more iterations are typically required (i.e., steps 3 and 4 are repeated until no observations change clusters).

The algorithm described above is known as the batch algorithm. Many other variants of k-means have been developed. Perhaps the most popular of these moves objects to a cluster one at a time, updating the mean each time.
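The batch algorithm described in steps 2 through 5 is short enough to sketch directly. This is an illustrative implementation only, assuming made-up data and ignoring edge cases such as a cluster becoming empty:

```python
import numpy as np

def batch_kmeans(X, k, max_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(X))          # step 2: random allocation
    for _ in range(max_iter):
        # Step 3: compute the mean of each cluster (assumes none is empty).
        means = np.array([X[labels == c].mean(axis=0) for c in range(k)])
        # Step 4: allocate each observation to the closest cluster mean.
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):        # step 5: converged
            break
        labels = new_labels
    return labels, means

# Hypothetical data: 18 objects in two groups, echoing the article's example.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (9, 2)), rng.normal(6, 1, (9, 2))])
labels, means = batch_kmeans(X, k=2)
print(labels, means, sep="\n")
```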


The outputs from k-means cluster analysis

The main output from k-means cluster analysis is a table showing the mean values of each cluster on the clustering variables. The table of means for the data examined above is shown below:

[Table: the outputs from k-means cluster analysis]

A second output shows which object has been classified into which cluster, as shown below. Other outputs include plots and diagnostics designed to assess how much variation exists within and between clusters.

[Figure: the outputs from k-means cluster analysis (cluster membership)]
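With a library implementation, both outputs fall straight out of the fitted model. A minimal sketch using scikit-learn on a small hypothetical table of clustering variables:

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical clustering variables (rows: objects; columns: x and y).
data = pd.DataFrame({"x": [1.2, 1.5, 8.1, 7.9, 1.1, 8.4],
                     "y": [2.0, 1.8, 9.0, 8.7, 2.3, 9.4]})

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(data)
# Output 1: the table of cluster means on the clustering variables.
print(pd.DataFrame(km.cluster_centers_, columns=data.columns))
# Output 2: which object has been classified into which cluster.
print(km.labels_)
```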


Which segmentation variables should you use, and why? https://www.displayr.com/segmentation-variables-use/

This article discusses how to work out which segmentation variables are appropriate from a list of variables. If you do not yet have a list, please first read How to Identify Relevant Variables for Market Segmentation.


The choice of segmentation variables is one of the key strategic decisions when segmenting a market.  As befits such an important decision, an enormous amount of work has gone into exploring all manner of different segmentation variables.  Demographics have been used, such as age, gender, height, weight, race and social class.  Firmographics – which are characteristics of companies, such as number of employees, turnover and industry – are regularly used in business segmentations.  Many other variables have been proposed, including decision-making processes, situational factors, personality, profitability, benefits sought, and even star sign.  Just about every consumer variable seems to have been used when segmenting a market.

The number of variables that have been developed creates an enormous challenge.  Too many have been proposed to make it practical for us to empirically compare them all when trying to segment a market.  Consequently, we need to instead employ some theory.  The key bit of theory is that we need to distinguish between a segmentation variable, which is a difference between analysis units (e.g., people) that is strategically meaningful, and the measurement of a segmentation variable, which is an empirical question.  This distinction can be illustrated by examining what is perhaps the world's first segmentation case study.  The world's first historian, Herodotus, writing two and a half millennia ago, tells of how Egyptian priests would sell the offcuts of their sacrificial bulls in towns containing Greeks but would throw them away if the towns contained no Greeks.  The segmentation variable in this case is willingness-to-pay for bull offcuts.  The ethnicity of the townspeople – that is, whether they are Greek or not – is a measurement of this underlying variable.  Whether there were Greeks in the town was not, in itself, interesting to the priests when working out whether to market their product; rather, they were interested in whether there was sufficient demand, and they worked this out by checking to see if there were any Greeks in the town.

This distinction between the true variable of interest and the observable variable, which is a measurement of the variable of interest, is a standard distinction that occurs throughout science.  When this distinction is employed it becomes clear that many of the popular segmentation variables should be viewed as being measures of segmentation variables, rather than true segmentation variables in themselves.  Consider the widespread use of firm size as a segmentation variable.  Whether a firm has five or 500 employees is interesting only because of the types of differences that would be expected to be correlated with the number of employees.  A bigger firm would, in general, be expected to need more phone lines and thus provide more profit.  It would also be expected to need a switchboard and a more stable IT backbone.  Thus, the number of employees is the observed variable; and it is being used to account for lots of different segmentation variables, such as profitability, preferences for switchboards, internet speed, and many other variables that are correlated with firm size.

While there are an infinite number of segmentation variables, they can be grouped into five broad groups in terms of the underlying types of differences between people that they are trying to explain:

  1. Preferences for product benefits, where consumers differ in terms of the product benefits that they seek to obtain from a transaction. In the market for coffee, some consumers are caffeine intolerant and consequently choose decaffeinated coffee, while others may seek a caffeine "hit".
  2. Consumer interaction effects, where the preferences or behaviour of one or more people influence the preferences or behaviour of another group. For example, the clothes worn by Pink may be emulated by some consumers, and avoided by others.
  3. Choice barriers, which are factors that prevent consumers from choosing the products that are most consistent with their underlying needs and wants. For example, somebody who has no interest in apps may buy an iPhone because they are unaware that their needs could be met just as well by a much cheaper Android phone.

Although understanding demand is often central to market segmentation, sometimes it is useful to focus on the firm’s own needs and wants.  While preferences for product benefits, consumer interaction effects and choice barriers may all have supply-side implications, there are two additional types of differences between customers that are relevant to the segmentation strategy:

  4. Bargaining power, where customers differ in terms of their ability to negotiate reduced prices. In the construction market, larger companies have a much greater degree of bargaining power, which enables them to get "better deals" than smaller companies.
  5. Profitability, where buyers provide different levels of profit to the firm. In financial services, for example, a small proportion of consumers provide much of the industry's profit.

Each of these is discussed in more detail in the next sections.


Preferences for product benefits

The classical marketing view of segmentation is that it is all about customer needs and how these shape preferences for different benefits provided by products.  This reflects a widespread recognition that if consumers differ in terms of the relative importance they attach to different benefits provided by products and services, these differences provide a justification for segmentation. In financial services, consumers differ in terms of their preferences for electronic versus over-the-counter transactions.  Consumer preferences for ice cream vary according to flavor and fat content.  Consumer needs for decay-preventing oral care products differ according to the strength of their tooth enamel.

The more precisely a product meets a consumer’s needs, the more likely the consumer will be to buy the product.  Tools like cluster analysis and latent class analysis allow us to create segments of customers that are relatively similar in terms of their product benefit preferences.
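As a sketch of what this looks like in practice: given a hypothetical matrix of importance ratings, a Gaussian mixture model (one common stand-in for latent class analysis with numeric data) assigns each consumer to a benefit segment. The data and number of segments below are assumptions for illustration:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical importance ratings (rows: consumers; columns: benefits,
# e.g., quality, price, customer service, convenience), on a 1-9 scale.
rng = np.random.default_rng(0)
ratings = np.clip(np.vstack([rng.normal(m, 1, (40, 4))
                             for m in ([8, 3, 5, 4], [3, 8, 4, 6])]), 1, 9)

segments = GaussianMixture(n_components=2, random_state=0).fit_predict(ratings)
print(np.bincount(segments))  # how many consumers fall in each segment
```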

Consumer interactions

A quick tour of world cuisine serves to remind us that we are not born with many of our tastes, whether we're talking about Australians and Vegemite, Americans and root beer, Chinese and the fallopian tubes of frogs (or "fat of the snow toad" as they are called to make them sound more appealing) or Danes and ammonia-flavored licorice.  For reasons of pragmatism, food tastes are usually assumed to be fixed preferences that can be measured at a point in time.  However, there are many circumstances in which our preferences for products and product attributes can be seen as being a direct consequence of the views and behaviour of others; and this provides us with opportunities for segmentation.

In many markets, lots of people will not buy a product until others have already bought it.  There are a few reasons for this.  Some people only hear about new products by word of mouth, so they cannot buy until others have bought.  For some people, waiting to see others buy is a way of reducing risk.  In some situations, the actual economic benefit of a product changes depending upon who else is using it.  For example, social media products like Facebook and LinkedIn only become useful when lots of people that you know are already using them. For these reasons, technology companies like HP court potential early adopters, seeking their input into design and sometimes even giving them free products, in the hope that they become adopters and advocates.

Pester power, whereby children persuade their parents to buy products, is another form of consumer interaction which has a long and sordid history in marketing.  The basic idea, which is generally denied by most companies that practice it, is that you develop marketing communications specifically for small children (who are a segment), so that they irritate their parents to the point of purchase.  The same idea occurs in business markets, where within an organization there are lots of people who are not decision-makers, but who are either gatekeepers who control the flow of information to the decision-makers, or who are influential and can persuade the decision-makers.

In grouping the various types of consumer interaction effects together it is not being suggested that they are the same, but that they all share a common basic implication.  When consumer interaction effects exist in a market, firms cannot view market segmentation as being about measuring needs and finding groups of similar people; rather, the existence of consumer interaction effects requires marketers to recognize the interrelationships between consumers.  In the U.S. some banks not only assess the profitability of individual consumers, they also assess the profitability of households in recognition that the satisfaction of a low profitability consumer could impact upon the banking behaviour of another more profitable member of the same household.  Similarly, in many countries the consumption of gelato can be understood in terms of social group, with Italian communities, Italophiles and “foodies” being heavy consumers.

In markets where consumers differ only in their preferences for different product benefits, economies of scale and manufacturing technologies become key determinants of the number of segments that a firm should create. By contrast, rather than providing an opportunity for better meeting consumers’ needs, consumer interaction effects can present a constraint on marketing strategy, dictating the number of segments and how differences between the segments need to be taken into account.  Consider the market for a game of soccer in the UK.  Regardless of what similarities and differences consumers may have in terms of price sensitivity, comfort, food and so on, the consumer interaction effects – that is, brawling between supporters of different Premier League teams – makes it mandatory that stadia separate the supporters of the different teams.

The methodological challenges presented by consumer interaction effects are also distinct from those faced when focusing solely on preferences for product benefits.  At the most fundamental level, how consumers are likely to interact needs to be known before we start to form segments, otherwise we will fail to collect relevant data for forming segments.  For example, if most people will buy software only if it looks better than what they currently use and they know a few people who have used it and liked it, we need to find a segment of people that will buy without knowing anybody who has adopted -- and focus on understanding these consumers’ needs and wants.

Choice barriers

Grouping consumers solely according to the product benefits they seek and how they interact in making purchase decisions implicitly assumes that consumers have what has been described in the economics literature as an “irrational passion for dispassionate rationality.” That is, they have perfect knowledge of the available products and other consumers, and appropriately consider these factors when purchasing products.  However, few would maintain that this is an accurate description of consumer behavior.  Factors that constrain “homo-economicus” from maximizing his utility can be described as choice barriers.  Researchers have identified a variety of choice barriers, including consumers’ awareness, knowledge and perceptions, switching costs, and decision-making and information-processing style.

From a segmentation perspective, the role of choice barriers is quite distinct.  Product benefit preferences and consumer interaction effects are generally viewed as intrinsic characteristics of consumers that may not be easily changed by the marketer; however, choice barriers can be created, reinforced, weakened, and destroyed.  When choice barriers are used as segmentation variables they generally lead to two different types of strategies.  First, they lead to strategies of prioritizing customers according to the likely impact of the choice barriers on purchasing.  Less effort gets expended on customers whose choice barriers make them unlikely to leave.  For example, banks tend to end up charging their existing customers worse interest rates than those they offer to attract competitors' customers, because they know that it's a hassle for their customers to defect.  Second, choice barriers often lead to obvious marketing initiatives designed to reduce the barriers.  Advertising helps with awareness.  Ease of opening accounts reduces switching costs.

Where choice barriers are key segmentation variables, it is generally better to segment using judgment rather than statistical methods.  A common example of this is loyalty segmentation.

Bargaining power

The fourth generic type of segmentation variable is bargaining power. Where one consumer can obtain the same product as another consumer but at a lower price, bargaining power is a factor.    Differences in bargaining power enable firms to implement price discrimination (i.e., charge higher prices to consumers with lower bargaining power); an obvious example of this is the price differences charged to locals versus tourists in many countries.

In some situations, bargaining power is caused by another type of segmentation variable.  For example, a consumer who is unaware that some bank managers have the ability to negotiate lower interest rates than those advertised will generally have a lower degree of bargaining power than a consumer who is aware.  While the most common cause of bargaining power is the level of competition, other causes include social advantage, being a shareholder of the organization selling the product, network membership, and government legislation.

Continuing the banking example, the number of banks with branches in an area differs greatly by geography.  The number of bank branches in rural areas is generally lower than in urban areas.  The result of this is that consumers differ in their bargaining power, with rural consumers generally having less of an ability to force banks to compete for their patronage.  Of course, ethical and legal concerns may prevent banks from taking this segmentation opportunity.  Another example of segmentation by bargaining power is in the ice cream market, where the prices are often higher in locations where there is no competition, such as the theater and at sporting venues.

The three previously discussed generic types of segmentation variables are, in the main, characteristics of consumers.  By contrast, bargaining power is a direct function of supply rather than a characteristic of demand.  Even if two consumers have identical needs, wants, resources, and decision-making processes, differing levels of competition for their business may dictate that they should be treated differently.

At a methodological level, the key distinction between bargaining power and the other types of segmentation variables is that bargaining power requires an understanding of the environment that the buyer is in – such as the number of competitors bidding for their trade – rather than an understanding of the characteristics of the buyers.

Profitability

The fifth type of generic segmentation variable is profitability.  Where one consumer provides a greater amount of profit to a firm than another (or has the potential to), profit may be a useful segmentation variable.  Profitability segmentation -- also known as the "80:20 rule", the Pareto rule, "volumetric segmentation", and customer lifetime value segmentation -- has undergone a renaissance in recent years, with modern database technology greatly improving our ability to measure and access the segments. The most visible example is the proliferation of loyalty programs, which reward consumers according to their volume and value of purchasing.

Although a retail bank could attempt to satisfy the needs of all of its customers, it would be at a serious competitive disadvantage by doing so, potentially acquiring a large number of highly satisfied but unprofitable customers.  The economics of banking – where around five to ten percent of customers can account for 90 percent or more of industry profitability – makes it essential for banks’ segmentation strategies to focus primarily on customer profitability rather than on customer needs.

The obvious implications of differences in the amount of profitability that different customers provide have led some to conclude that profitability is generally more appropriate for segmentation than demographics or psychographics.  Furthermore, segmentations based on “hard numbers” -- rather than “soft” marketing concepts such as brand attitude -- can be easily communicated throughout an organization. Nevertheless, profitability segmentation is not always applicable; it can be very difficult to calculate profitability, particularly over a customer’s lifetime, and in many industries there are no customer databases that identify members of each segment, which makes accessibility poor.

While customer profitability is a single segmentation variable, often it must be calculated using multiple pieces of data.  Banks construct individual measures of profitability for use when segmenting markets by combining information on balances, interest rates and transacting behavior, while many other firms use recency, frequency and monetary value as proxies for profitability.
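A minimal sketch of the recency-frequency-monetary proxy, assuming a hypothetical customer table: each dimension is scored into thirds and the scores summed, so higher totals stand in for higher likely profitability:

```python
import pandas as pd

# Hypothetical per-customer transaction summary.
customers = pd.DataFrame({
    "recency_days": [5, 40, 200, 10, 90, 300],
    "frequency":    [30, 12, 2, 25, 6, 1],
    "monetary":     [900, 300, 50, 700, 150, 20],
})

def tercile(series, ascending=True):
    # Rank first so ties cannot break the quantile cut; 1 = worst, 3 = best.
    ranked = series.rank(method="first", ascending=ascending)
    return pd.qcut(ranked, 3, labels=[1, 2, 3]).astype(int)

customers["rfm_score"] = (tercile(customers["recency_days"], ascending=False)
                          + tercile(customers["frequency"])
                          + tercile(customers["monetary"]))
print(customers.sort_values("rfm_score", ascending=False))
```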

Most service companies use profitability segmentation.  Banks try to be nice to customers with home loans and large portfolios of products, offering them various free services, removing bank fees and giving them personal relationship managers.  By contrast, customers who are unprofitable find they wait in long lines, pay high fees and often feel, quite rightly, that the bank wishes they would defect.

A particularly appealing aspect of profitability segmentation is its simplicity.  If you can allocate customers into segments, it is straightforward to work out how they need to be treated.  The Australian carrier Qantas, with its frequent flyer program, treats its most frequent flyers, Platinums, with great respect.  They are regularly referred to by name.  They get priority access to good seats.  A nicer lounge.  Priority check-in.  And fancy bottles of wine if they participate in market research.

Many marketers are uncomfortable with profitability segmentation, which is essentially exploitative in its focus; it does not fit well with the marketing philosophy of creating shareholder value by creating value for customers.  Ultimately this is a question of philosophy; it is undeniable that firms can profit by segmenting based on profitability.  A more compelling criticism of segmenting using profitability is that it can confuse cause and effect.  That some customers are profitable is often a consequence of marketing activities; the customers that are unprofitable are perhaps the ones whose needs are not being met, rather than customers that should be de-prioritized (the normal consequence of using profitability as a segmentation variable).   Similarly, the most profitable customers may already be completely happy, so any additional attention could either be unwanted or may simply increase costs and reduce profitability.  There is evidence that “loyalty programs” focused on high-value customers may not work.

Ultimately these criticisms of profitability segmentation have some validity but they are insufficient grounds to reject its use.  Of the five generic types of segmentation variables, profitability is by far the easiest to incorporate into an operational segmentation.  This is because often good measures of profitability exist in customer databases (e.g., the purchasing record of a customer can be used to estimate profitability), whereas the other variables are often too weakly correlated with the type of data on a customer database to permit a segmentation being operationalized.


How to Identify Relevant Variables for Market Segmentation https://www.displayr.com/how-to-identify-relevant-variables-for-market-segmentation/

Conduct brainstorming sessions with the key stakeholders.

A useful way of organizing such a brainstorming session is to give them the task of creating a segmentation tree, successively splitting the market based on key variables (e.g., splitting first based on users and non-users, then splitting each of these by another variable, and so on).  Useful participants at the brainstorming are marketers, market researchers, ad agencies, people with relevant operational experience and any parties with expertise in the product category and the business problem that has motivated the need for the study.  It is often advisable to avoid senior executive staff who might attempt to demonstrate leadership in the brainstorming process (as the purpose is to identify relevant variables, rather than lead).  A brainstorming session should always precede any research with consumers, so that the research can be directed to validating and extending the initial list of attributes. Where consumer research precedes the brainstorming session, the outcome is invariably a list of attributes that is far from exhaustive and contains no insights that could not have been reached through brainstorming.

Identify factors relating to purchase/non-purchase

For example, conduct exploratory research in which you ask people which products they do and do not like or buy, what the reasons for this are, and what the makers of the products would have to do to make them appealing enough to buy.

Observe purchase patterns

How do consumers purchase products? Marketing gossip has it that 60% of grocery purchase decisions are made in the store and take only a few seconds. If this is true, it is extremely unlikely that the factors influencing purchase are utilitarian benefits - they’re much more likely to relate to point-of-sale factors and price.

Observe how consumers use products

In the early 1970s Pepsi gave 350 families the opportunity to order home-delivered Pepsi and competitive soft drinks at discount prices. No matter how many bottles were ordered, the consumers always drank them, leading Pepsi to conclude that the volume of soft drink consumption was driven in part by whether or not consumers could get the product home. Pepsi tapped into this attribute by developing the plastic bottle, completely reshaping the soft drink market by enabling consumers to buy more product.

Identify consumers' perception of risk

Midas has built a successful brake repair business by taking its customers through a checklist of all that can be wrong and is wrong, thereby reducing consumers' fears of being exploited.

Use Kelly's (1955) Personal Construct Theory

Consumers are presented with sets of three existing products and asked to identify the attributes on which the products are similar and different. Content analysis (commonly, researcher judgment) is used to identify the underlying attributes. This approach is generally useful only when researching product categories about which little is known, or when attempting to elicit image-based attributes.

Laddering methodologies, such as means-end chains, can be a useful means of identifying relevant attributes. See http://www.mktresearch.org/wiki/Laddering for more information.

Consult the academic, trade, technical and consumer literature

A good literature review can save substantial time and money. The coffee, automobile and toothpaste markets, for example, have received much attention in the theoretical and methodological marketing literature. For consumer products, magazines such as Choice and Consumer Reports, and online product comparison services like priceline.com, often contain comparisons of products on what are believed to be the key attributes. Similarly, in industrial markets, technical reports and magazines often compare products based upon key performance criteria.

You can also use the framework described in Which Segmentation Variables Should You Use and Why.


Acknowledgments

The Pepsi and Midas examples are from MacMillan and McGrath (1996).

Kelly, G.A. 1955. The Psychology of Personal Constructs. New York: Norton.

MacMillan, Ian, and Rita Gunther McGrath. 1996. Discover Your Products' Hidden Potential. Harvard Business Review 74 (May-June): 58-73.

Reynolds, Thomas J., and Jonathan Gutman. 1988. Laddering Theory, Method, Analysis, and Interpretation. Journal of Advertising Research 26 (1, February/March).

 

What is Operational Segmentation? https://www.displayr.com/what-is-operational-segmentation/

The key technical distinction between operational and inspirational segmentation relates to how individuals are allocated to segments when they interact with an organization.  When an inspirational segmentation has been conducted, people see the results of the segmentation, most commonly as either new products or marketing communications.  Usually, the result is an increase in choice: for example, the choice between Coke, Diet Coke, Coke Zero and Fanta.  In an operational segmentation, people are placed in a segment. It is not their choice.

Any organization that restricts access to some of its offers to different groups of customers is employing an operational segmentation.  For example, organizations that employ highly targeted advertising (e.g., only advertising in magazines read by young women) use operational segmentation. So do firms that send special offers via direct marketing to a subset of their customers.

Examples of operational segmentation

Historically, Dell Computers required consumers to indicate whether they were in the Home & Home Office, Small Business, Medium & Large Business, State & Local Government, Federal Government, Education or Healthcare segments.  Different segments received different levels of sales support, different prices and were even directed towards different products. People in these segments had no choice about which segment they were in. If you were allocated to the Home & Home Office segment, you did not have access to the offers developed for the Federal Government segment.

Airlines also employ operational segmentations via their frequent flyer programs, with customers in higher tiers receiving "better" products and customers in lower tiers receiving less attractive ones.

Example of an inspirational segmentation

The classic example of inspirational segmentation is Russell Haley’s segmentation of the market for toothpaste (see What is Market Segmentation Research).  He identified four segments: the Worriers, who want to stop decay, the Sociables who want to attract attention, the Sensories who are motivated by flavors, and the Independents who are motivated by price. While the segments can be used to inspire product development and communication, there is no way for a company to work out to which segment each of its customers and potential customers belong.

Mixtures of Operational and Inspirational Segmentation

Organizations usually employ a mixture of overlapping operational and inspirational segmentations.  For example, most firms break up the world into different territories, focusing either on a subset of territories or offering different products to different territories.  Cadbury, for example, has different formulations of its Dairy Milk chocolate in different areas of the world.  This is an example of an operational segmentation; but inspirational segmentations are developed within operational segments, with the goal of inspiring marketers to develop better products and marketing communications.


Acknowledgments

The term “operational segmentation” comes from Piercy and Morgan (1993). It can be seen as a mechanism for implementing third-degree price discrimination and is essentially the same idea as the Frank et al. (1972) concept of Controlled Coverage.

Frank, Ronald E., William F. Massy, and Yoram Wind. 1972. Market Segmentation. New Jersey: Prentice Hall, Inc.

Piercy, Nigel F., and Neil A. Morgan. 1993. Strategic and operational market segmentation: a managerial analysis. Journal of Strategic Marketing 1 (2):123-140.

 

What is Market Segmentation Research? https://www.displayr.com/what-is-market-segmentation-research/

The output of segmentation research

The table below is an example of the end-point of a basic segmentation study. It describes four segments of toothpaste buyers. Typically, there will be a lot more information than shown here, both in terms of richer qualitative information and detailed tables of differences between segments.

Types of segmentation research

Broadly speaking, segmentation research can be classified into four broad types of methodology:

  • Quantitative survey-based research
  • Research carried out on secondary data
  • Research carried out based on company databases
  • Qualitative research

Quantitative survey-based research

Survey-based research involves the collection of data from a survey of people in the market of interest, and then using a segmentation algorithm, such as k-means cluster analysis or latent class analysis, to form segments. This is the main way that companies form market segments, because this method is the most flexible and produces the most detailed outputs, in that:

  • Any form of data can be collected, such as demographics, behavior, attitudes, price sensitivity, preferences, etc.
  • As a byproduct, it can produce all the key bits of information that are required for the implementation of the market segmentation (e.g., the size of the segments, the differences between segments in their attitudes, media viewing, etc.)

Research carried out on secondary data

Secondary data is data that has already been collected and is usually in the public domain. Most commonly this is data collected by government statistical agencies, such as census data. Segmentations based on secondary data tend to focus on demographics.

Research carried out based on company databases

Segmentation research studies based on company databases tend to focus on behavioral data, such as frequency and types of products purchased, customer value, and loyalty.

Qualitative research

The three most common qualitative research methods are focus groups, in-depth interviews, and ethnography. The strength of qualitative research is its ability to provide complex textual descriptions of how people experience a given research issue. One advantage of qualitative methods in exploratory research is that the use of open-ended questions and probing gives participants the opportunity to respond in their own words. Researchers often use qualitative and quantitative material to complement each other, with the final report containing qualitative descriptions of the segments alongside estimates of their sizes.

The term rich data describes the notion that qualitative data, and their subsequent representation, should reveal the complexities and the richness of what is being studied; this is the type of data that can be used for forming segments. However, qualitative segmentations present the most challenges in terms of implementation, because the nature of the research means that much of the information required for effective implementation is missing (e.g., media usage, the size of the segments).


Acknowledgments

The table of outputs is adapted from Haley, R. I. (1968). "Benefit Segmentation: A Decision Oriented Research Tool." Journal of Marketing 30(July): 30-35.
