The easy bit of market segmentation is creating segments. The hard bit is working out if the resulting segments are useful. In this post, I review six tools for making this process more efficient: heatmaps that summarize lots of crosstabs, segment comparison tables, smart tables, correspondence analysis bubble charts, bubble charts that have been designed for comparing segments, and automatic updating.

Recap: How to create segments

Creating segments consists of two steps. First, we find some data that describes key differences between people. Typically, the best data comes from surveys that collect data on attitudes and behavior. The second step is to form the segments, where the options are to use:

  1. Pre-defined segments. For example, age, gender, family life stage, company size, or industry.
  2. Statistical techniques, such as cluster analysis, latent class analysis, and neural networks, which create new segmentations, where each segment consists of people that are similar based on the data being analyzed.
  3. Strategy, which involves a combination of using pre-defined segments and statistical techniques.

The beginner's mistake is to create a single segmentation. It usually starts from the mistaken belief that a market contains a small number of segments and that the goal of market segmentation is to find them.

The expert move is to create many segmentations - typically dozens - and work out which of these is "best", where best means the one that provides the most strategic advantage. The key insight behind this approach is that in just about all markets there is no "natural" small number of segments. Rather, there are an infinite number of ways of carving segments out of the market. The more segmentations that are evaluated, the better the likelihood that a good one is found.

Once you have created multiple segmentations to be compared, you should end up with a data file, which contains:

  • One or more segment membership variables. Each segmentation will be represented by a separate variable. That is, a single column of data, where each person has a number assigned to them (e.g., a 1 for the first segment, 2 for the second segment, etc.).
  • Profiling variables, describing key differences between people. Typically this will include things like their attitudes, behaviors, demographics, and media usage.
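As a rough illustration of that layout, here is a minimal Python sketch of such a data file held in memory. The column names (`seg_a`, `seg_b`, `age`, `brand_aware`) and all of the values are invented for the example; a real file would have many more respondents and profiling variables.

```python
from collections import Counter

# Hypothetical data file: one row per respondent (names and values are invented).
respondents = [
    {"seg_a": 1, "seg_b": 2, "age": 24, "brand_aware": "Yes"},
    {"seg_a": 1, "seg_b": 1, "age": 31, "brand_aware": "No"},
    {"seg_a": 2, "seg_b": 2, "age": 58, "brand_aware": "Yes"},
    {"seg_a": 2, "seg_b": 3, "age": 63, "brand_aware": "Yes"},
    {"seg_a": 3, "seg_b": 3, "age": 45, "brand_aware": "No"},
]

# One segment membership variable per segmentation, plus the profiling variables.
segmentation_columns = ["seg_a", "seg_b"]
profiling_columns = ["age", "brand_aware"]


def segment_sizes(rows, segmentation):
    """Count how many respondents fall into each segment of one segmentation."""
    return dict(Counter(row[segmentation] for row in rows))


for name in segmentation_columns:
    print(name, segment_sizes(respondents, name))
```

Keeping every segmentation as its own column in the same file is what makes the comparisons later in this post straightforward: each one can be crossed with the same profiling variables.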

Evaluating segments and segmentations

And now we move on to the meat of this post. If we have multiple segmentations, how do we efficiently compare them? The traditional approach has been to create lots and lots of crosstabs, and read through them all. This is a slow and painful process, and if you go down this route the odds are that you will end up evaluating only a small number of possible segmentations, which is, as described in the previous section, the wrong route. However, there are a number of ways of short-cutting this.

Heatmap summarizing lots of crosstabs

One way of automating the process of inspecting lots of crosstabs is to create all the crosstabs but then, rather than read them, create a heatmap that summarizes what they show. The heatmap below compares six alternative segmentations using 115 profiling variables. Each row of the heatmap shows the statistical significance of the crosstabs of one profiling variable with each of the segmentations. I've represented the statistical significance using z-statistics, as they create a better visualization than p-values (a z of more than 1.96 corresponds to a p-value of 0.05 or less).
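To make the idea of one z-statistic per crosstab concrete, here is a minimal Python sketch. The post does not say which test sits behind each cell of the heatmap, so this assumes a Pearson chi-square test on a crosstab of counts, with the p-value converted to the equivalent two-sided z. The function names are made up for the example, and it assumes the table has no empty rows or columns.

```python
import math
from statistics import NormalDist


def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df


def chi_square_p_value(statistic, df):
    """Upper-tail chi-square probability, using the series expansion of the
    regularized lower incomplete gamma function."""
    a, x = df / 2.0, statistic / 2.0
    if x <= 0.0:
        return 1.0
    if x > 500.0:
        return 1e-15  # far beyond any usual threshold; avoids overflow in the series
    term = 1.0 / a
    total = term
    n = 0
    while term > total * 1e-15:
        n += 1
        term *= x / (a + n)
        total += term
    lower = total * math.exp(a * math.log(x) - x - math.lgamma(a))
    return min(1.0, max(1.0 - lower, 1e-15))  # floor keeps the z-statistic finite


def crosstab_z(table):
    """Two-sided z-statistic equivalent to the chi-square p-value of a crosstab."""
    statistic, df = chi_square_statistic(table)
    p = chi_square_p_value(statistic, df)
    return -NormalDist().inv_cdf(p / 2.0)


# Hypothetical crosstab: rows are segments, columns are categories of one profiling variable.
print(round(crosstab_z([[30, 10], [10, 30]]), 2))
```

Running this for every combination of profiling variable and segmentation would give the grid of numbers that a heatmap like the one in this post colors in.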

Looking at the heatmap below we can see, for example, that segmentation 1's segments differ more in terms of work status, occupation, and age, than do any of the other segmentations (i.e., the blue is darker, which means a higher z-score, suggesting a more significant relationship). The second segmentation better explains differences in top of mind awareness, the perception that brands are fashionable, etc.

You can find out more about how to create this heatmap by reading this post.

Counting the number of significant differences

The chart below counts up the number of profiling variables that were statistically significant, as shown in the heatmap. It tells us that the 3rd and 5th segmentations are related to more of the profiling variables than the others, and should be the first segmentations we focus on. Additional insight can be obtained by doing similar analyses for sets of profiling variables (e.g., demographics, usage variables, etc.). And while you can create these by counting them in your head, it's just a line of magic code to automate the process of counting up the number of differences: colSums(most.significant.results > 1.96).
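For readers who don't use R, the same count can be sketched in Python. The z-statistics below are invented, and the layout (one row per profiling variable, one value per segmentation) is an assumption that mirrors how colSums is used in the one-liner.

```python
# Hypothetical z-statistics: one entry per profiling variable,
# with one value per segmentation (all numbers are invented).
segmentations = ["Segmentation 1", "Segmentation 2", "Segmentation 3"]
z_statistics = {
    "Age":         [4.1, 1.2, 2.5],
    "Work status": [3.3, 0.4, 2.1],
    "Awareness":   [0.9, 2.8, 2.2],
    "Fashionable": [1.1, 3.0, 1.5],
}


def count_significant(z_by_variable, names, threshold=1.96):
    """Count, for each segmentation, the profiling variables with z above the threshold."""
    counts = {name: 0 for name in names}
    for z_values in z_by_variable.values():
        for name, z in zip(names, z_values):
            if z > threshold:
                counts[name] += 1
    return counts


print(count_significant(z_statistics, segmentations))
```

Passing in only the rows for, say, the demographic variables gives the per-set counts mentioned above.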

One big segment profile table

Once you have identified the key segmentations to focus on, the next step is to carefully examine each to understand how their segments differ.

This next time saver is one that took us a bit of time to add to the software. We kept getting the request, but for some reason didn't figure out why it was important. The basic idea of the table below is that it is a single table, rather than lots of tables, that summarizes the relationship between the segments in a segmentation and all the profiling variables. Now, you may be thinking "it's just a table", but it has a few special features:

  1. It shows the segment names and sizes at the top and when you scroll they stay pinned to the top
  2. Shading is used to show the magnitude of differences between segments and you can control this (e.g., I've used borders instead)
  3. Font color is used to show significance tests
  4. It is showing both categorical and numeric data
  5. You can embed it on a web page to give stakeholders access to it
  6. You can set up filters so that clients can further drill into it

This is created using Insert/Create > Segments > Segment Comparison Table.
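If you want to see what goes into such a table, the numbers behind it can be sketched in a few lines of Python. This is only an illustration of the underlying summary (segment sizes, means of numeric variables, and percentages for categorical variables); it is not how the product builds the table, it leaves out the shading and significance testing, and the data and names are invented.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical respondents with one segmentation and two profiling variables.
respondents = [
    {"segment": "A", "age": 24, "aware": "Yes"},
    {"segment": "A", "age": 30, "aware": "No"},
    {"segment": "B", "age": 58, "aware": "Yes"},
    {"segment": "B", "age": 62, "aware": "Yes"},
]


def profile_table(rows, segment_key, numeric_keys, categorical_keys):
    """Build one column of summary statistics per segment."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[segment_key]].append(row)
    table = {}
    for segment, members in groups.items():
        column = {"n": len(members)}
        for key in numeric_keys:
            column[f"mean {key}"] = mean(r[key] for r in members)
        for key in categorical_keys:
            # Use every level seen in the whole sample so all columns share the same rows.
            for level in sorted({r[key] for r in rows}):
                share = sum(r[key] == level for r in members) / len(members)
                column[f"% {key} = {level}"] = 100.0 * share
        table[segment] = column
    return table


table = profile_table(respondents, "segment", ["age"], ["aware"])
for segment, column in table.items():
    print(segment, column)
```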


Smart tables

A practical problem with segment comparison tables is that with a big study they can just be too big to use. So, how should we select which variables to include? We should only include strategically interesting variables. But what if we have lots of these? The simplest approach is to use automated statistical tests to identify which variables to include. In Q this is done using Insert > Tables > Smart Tables. In Displayr, use Insert > More > Tables > Lots of Crosstabs and choose the options for deleting non-significant tables.
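The general idea of letting a statistical test pick the variables can be sketched in Python as follows. This is not what Smart Tables does internally, which the post doesn't describe; it simply assumes you already have one z-statistic per profiling variable for the segmentation of interest, and the numbers are invented.

```python
# Hypothetical z-statistics for one segmentation, one per profiling variable.
z_by_variable = {"Age": 4.1, "Gender": 0.7, "Awareness": 2.4, "Region": 1.3}


def select_significant(z_values, threshold=1.96):
    """Return the variables with z above the threshold, most significant first."""
    kept = {name: z for name, z in z_values.items() if z > threshold}
    return sorted(kept, key=kept.get, reverse=True)


print(select_significant(z_by_variable))
```

Only the variables that survive the filter would then go into the segment comparison table.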

Correspondence analysis for better understanding a categorical profiling variable

The heatmap allows us to compare lots of segmentations. The segment comparison table gives us detailed information on a single segmentation. The next level of analysis is to get a lot of depth on a set of variables or categories from categorical variables. Correspondence analysis is often the best tool for this, as it draws our attention to the key differences between the segments. The example below shows us that Segments 3 and 4 skew to older people, and the others to younger people.
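Correspondence analysis works from how far each cell of a crosstab departs from what we would expect if segment and category were unrelated. The Python sketch below computes only those departures, as Pearson standardized residuals. It is not correspondence analysis itself, which goes on to decompose a scaled version of these residuals into a small number of dimensions for plotting, and the counts are invented.

```python
import math


def standardized_residuals(table):
    """Pearson residuals (observed - expected) / sqrt(expected) for a table of counts.

    Positive values mean the cell is over-represented relative to independence.
    Assumes no empty rows or columns.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    residuals = []
    for i, row in enumerate(table):
        out = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            out.append((observed - expected) / math.sqrt(expected))
        residuals.append(out)
    return residuals


# Hypothetical counts: rows are two segments, columns are "younger" and "older".
counts = [[30, 10], [10, 30]]
for segment_row in standardized_residuals(counts):
    print([round(value, 2) for value in segment_row])
```

In this made-up example the first segment has a positive residual for "younger" and the second for "older", which is the kind of skew a correspondence analysis map makes visible at a glance.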

Bubble charts for comparing segmentations

If we have a single key numeric variable, we can compare it across multiple segmentations using a scatterplot like the one shown below, where the alternative segmentations appear on the x-axis, the key variable on the y-axis, and the sizes of the segments are shown by bubbles. We can see in this example that age seems to be the best of the alternative segmentations being compared, as it has the highest degree of discrimination and there are no huge segments. (The variation in the sizes of the age segments hints at some methodological problems, however...) See chapter 3 of our Brand Analytics ebook for instructions on how to create this visualization.
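The numbers that feed a chart like this are easy to sketch in Python: one point per segment, with the segmentation as the x position, the segment's mean on the key variable as the y position, and its share of the sample as the bubble size. The data and column names are invented, and measuring "discrimination" as the range of segment means is my simplifying assumption rather than a definition from the ebook.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical respondents with two alternative segmentations and one key variable.
respondents = [
    {"age_group": "Under 40", "cluster": 1, "spend": 10.0},
    {"age_group": "Under 40", "cluster": 2, "spend": 14.0},
    {"age_group": "40 plus", "cluster": 1, "spend": 30.0},
    {"age_group": "40 plus", "cluster": 2, "spend": 26.0},
]


def bubble_points(rows, segmentations, key_variable):
    """One point per segment: x = segmentation, y = mean of key variable, size = share."""
    points = []
    for segmentation in segmentations:
        groups = defaultdict(list)
        for row in rows:
            groups[row[segmentation]].append(row[key_variable])
        for segment, values in groups.items():
            points.append({
                "x": segmentation,
                "label": segment,
                "y": mean(values),
                "size": len(values) / len(rows),
            })
    return points


def spread(points, segmentation):
    """Range of segment means for one segmentation (a crude discrimination measure)."""
    ys = [p["y"] for p in points if p["x"] == segmentation]
    return max(ys) - min(ys)


points = bubble_points(respondents, ["age_group", "cluster"], "spend")
for name in ["age_group", "cluster"]:
    print(name, spread(points, name))
```

In this toy data the age-based segmentation separates the segment means while the cluster-based one does not, which is the pattern to look for on the chart.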


Lastly, don't forget automatic updating

Most importantly, both of our products, Q and Displayr, have the ability to automatically update charts and tables with new data. This means that you can create a detailed set of tables or visualizations describing segments, and then automatically populate these with alternative segmentations.