Monday, June 11, 2012

Segmentation .... now or later?


Very frequently while building analytical models, I have had clients state, "Let's start with customer segmentation." I have to spend considerable time convincing them that segmentation is not mandatory as the first step of modelling. Unfortunately, a majority of statisticians start with segmentation. They claim that the population needs to be clustered into homogeneous segments. Every business user is also convinced that his customer base is a collection of homogeneous clusters.

This thinking is very flawed for two reasons:

  1. Segmentation is a relative activity. That is, one needs to do "statistical" segmentation towards some goal, whether it is to understand customer value, default, or cross-sell. Segmentation as a standalone activity does not provide much value. This is the reason I always dissuade my customers from doing just a "segmentation" exercise. (See my post on "statistical segmentation" titled perspectives of segmentation.)
  2. Grouping customers into clusters introduces bias into the model building process. Let me elaborate further on this.


The primary premise of clustering is that the customer base consists of groups of individuals who behave in a similar pattern. The corollary is that customers belonging to different clusters behave differently.

My econometrics professor in college had a wonderful way of explaining statistical concepts through real-life scenarios or philosophies. He stated that the basic premise of law is "a person is innocent unless proven guilty." He told us to keep this premise in mind when defining the hypothesis for model building. Going by this, we start with the assumption that the customer population is homogeneous unless proven otherwise. This proof can only come with model building.

By performing segmentation upfront, we are assuming that the customer population is not homogeneous. With this presumption we are introducing a bias into the model building process.

From an effort perspective, I take a cost accountant's view on this. A segmentation exercise typically leads to the definition of 5 to 10 segments. The next step will be to build a separate model for each of the 5 (or 10) segments. This multiplies the effort required. And all because of an unproven assumption that different segments behave differently from one another.

An effective approach is to first assume that the customer base is homogeneous, then build a single model for the target variable. The next step is to find the variance within the test population. The variance can be examined either across the score deciles or across the values of the significant variables, to identify the value that has the highest variance. This indicates a likely set of customers who behave differently from the rest of the population and hence account for the variance in the test population.
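The variance check on a significant variable can be sketched in a few lines of Python. This is only an illustration under my own assumptions: "variance" is read here as the mean squared error of the single model's score within each value of a variable, and the record layout (`actual`, `score`, `product`) and function names are hypothetical, not taken from any client project.

```python
from collections import defaultdict

def error_by_value(rows, variable):
    """Mean squared error of the single model's score, per value of one variable."""
    sums = defaultdict(lambda: [0.0, 0])  # value -> [sum of squared errors, count]
    for row in rows:
        bucket = sums[row[variable]]
        bucket[0] += (row["actual"] - row["score"]) ** 2
        bucket[1] += 1
    return {value: total / count for value, (total, count) in sums.items()}

def worst_fit_value(rows, variable):
    """Value of the variable where the single model fits worst."""
    errors = error_by_value(rows, variable)
    return max(errors, key=errors.get)

# Hypothetical test-population records: actual outcome (0/1) and model score.
test_rows = [
    {"product": "A", "actual": 1, "score": 0.9},
    {"product": "A", "actual": 0, "score": 0.1},
    {"product": "B", "actual": 1, "score": 0.2},
    {"product": "B", "actual": 0, "score": 0.7},
    {"product": "C", "actual": 1, "score": 0.8},
    {"product": "C", "actual": 0, "score": 0.3},
]

print(worst_fit_value(test_rows, "product"))
```

In this made-up data the model fits holders of product "B" far worse than everyone else, which makes that group the candidate for a separate model.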

At one client where we adopted this approach, we found that "product holding" was showing the highest variance. On evaluating its values, we found that one particular product accounted for the high variance. As a next step, we split the population into two segments: one holding this product and the other covering the rest of the population. Two separate models were built, one for each segment, and the scores were merged (after normalization). Since the population creating variance in the original model was now separated, the rest of the population was comparatively more homogeneous. The model for the variant population was custom built for that population and hence a better fit. Thus, the combined scoring was more accurate than the first single model. This accuracy met the client's acceptable threshold, and we deployed the score in the business operations.
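The split-and-merge step might look roughly like the Python sketch below. The normalization method is my assumption for illustration: min-max scaling within each segment is used here only because it is simple, and it may not be what a given project needs. The customer fields, the product name "X", and the two stand-in scoring functions are all hypothetical.

```python
def min_max(scores):
    """Rescale scores to the 0..1 range (assumed normalization for this sketch)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5 for _ in scores]  # all scores equal: nothing to rank
    return [(s - lo) / (hi - lo) for s in scores]

def merge_segment_scores(customers, holds_product, model_holders, model_rest):
    """Score each segment with its own model, normalize, and merge by customer id."""
    holders = [c for c in customers if holds_product(c)]
    rest = [c for c in customers if not holds_product(c)]
    merged = {}
    for segment, model in ((holders, model_holders), (rest, model_rest)):
        if not segment:
            continue
        raw = [model(c) for c in segment]
        for customer, score in zip(segment, min_max(raw)):
            merged[customer["id"]] = score
    return merged

# Hypothetical customers and stand-in "models" on different raw scales.
customers = [
    {"id": 1, "products": {"X"}, "balance": 100},
    {"id": 2, "products": {"X", "Y"}, "balance": 300},
    {"id": 3, "products": {"Y"}, "balance": 10},
    {"id": 4, "products": set(), "balance": 20},
    {"id": 5, "products": {"Y"}, "balance": 30},
]
merged = merge_segment_scores(
    customers,
    holds_product=lambda c: "X" in c["products"],  # the filter-based split
    model_holders=lambda c: c["balance"] / 1000,   # stand-in for the holders' model
    model_rest=lambda c: c["balance"] * 2,         # stand-in for the rest-of-population model
)
print(merged)
```

The point of normalizing before merging is that the two models produce scores on different raw scales, so they cannot be compared or ranked together until they are brought onto a common one.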

The entire exercise involved three models and one filter-based population split. Compare this with the "traditional" approach of one segmentation (leading to 5 to 10 segments) and one model for each segment (approximately 5 to 10 models). We completed the exercise in 4 days against what would have taken us more than 10 days the traditional way.
