Friday, November 13, 2009

Beware of gut feel!!!

Last week I was in discussion with the founders of a firm that provides analytical solutions for the retail industry. They have, in the recent past, added a predictive analytics module to their offering stack.

One of the founders asked me how one would approach cross-selling for a retailer. I said that a first step would be to do a segmentation analysis.

He wanted to know how one would do the segmentation and which variables would form the inputs to the exercise. I mentioned that a host of variables can be considered: demographics, transactions, interactions and, if available, even psychographics. When we got into the details, I said that RFM (recency, frequency, monetary value) variables could form inputs, so one should consider variables such as weekend visits, weekday visits, time of visit, day of visit, etc.
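To make the idea of RFM-style inputs concrete, here is a minimal sketch of how such variables could be derived from raw transactions. The record layout, the customer IDs, the amounts and the function name build_features are all hypothetical, made up purely for illustration; a real retail data set would look different.

```python
from datetime import date

# Hypothetical transaction records: (customer_id, visit_date, amount_spent).
transactions = [
    ("C1", date(2009, 11, 7), 40.0),    # Saturday
    ("C1", date(2009, 11, 10), 15.0),   # Tuesday
    ("C2", date(2009, 10, 4), 120.0),   # Sunday
    ("C2", date(2009, 10, 31), 80.0),   # Saturday
    ("C3", date(2009, 9, 16), 22.5),    # Wednesday
]


def build_features(transactions, as_of):
    """Derive per-customer RFM and visit-pattern variables."""
    features = {}
    for customer, day, amount in transactions:
        row = features.setdefault(customer, {
            "recency_days": None,   # days since the most recent visit
            "frequency": 0,         # number of visits
            "monetary": 0.0,        # total amount spent
            "weekend_visits": 0,
            "weekday_visits": 0,
        })
        days_ago = (as_of - day).days
        if row["recency_days"] is None or days_ago < row["recency_days"]:
            row["recency_days"] = days_ago
        row["frequency"] += 1
        row["monetary"] += amount
        # date.weekday() returns 5 for Saturday and 6 for Sunday.
        if day.weekday() >= 5:
            row["weekend_visits"] += 1
        else:
            row["weekday_visits"] += 1
    return features


features = build_features(transactions, as_of=date(2009, 11, 13))
for customer, row in sorted(features.items()):
    print(customer, row)
```

Each customer ends up with one row of candidate inputs, which can sit alongside demographic variables in the segmentation data set.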

At this stage, we had a disagreement. He stated that RFM cannot be an input to a segmentation exercise; rather, it should be used to describe a segment. I explained to him that at the initial stage we should provide all possible variables, and derivations of those variables, as inputs to the statistical model and let the model identify and select the significant ones. He kept insisting, and in fact continuously interjecting, that RFM cannot be an input.

I explained to him again that if he takes the stance that RFM cannot be an input to modelling, then he is letting gut feel or instinct override the modelling exercise. This is not the right way to do modelling. The best way is to let the model do the variable selection. At the end of this exercise, it may still turn out that none of the RFM variables is significant. But if a few of the RFM-related variables are in fact significant in identifying clusters or segments, then letting instinct take over and withholding them from the segmentation model will result in incorrect segments, and any subsequent modelling exercise will further magnify this error. Often this is what happens in reality. Forcing instinct into a statistical modelling exercise often leads to incorrect inference or scoring, and the blame for the failure is always put squarely on the modelling process.
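A toy sketch of this argument follows. It is only an illustration under stated assumptions: the six customers are invented, plain k-means with fixed starting centers stands in for "the model" (the discussion above did not name a specific technique), and the share of each variable's variance explained by the clusters is just one simple way to see which inputs separate the segments.

```python
from math import sqrt

# Hypothetical customers: two RFM-style inputs plus one demographic input.
names = ["recency_days", "frequency", "age"]
rows = [
    [2, 20, 30], [3, 18, 50], [4, 22, 40],   # recent, frequent shoppers
    [60, 2, 50], [62, 3, 30], [65, 1, 40],   # lapsed, infrequent shoppers
]


def standardize(rows):
    """Z-score each column so no variable dominates because of its units."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, stds)] for r in rows]


def kmeans(points, centers, max_iter=20):
    """Plain Lloyd's k-means with caller-supplied starting centers."""
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centers[k])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(c) / len(members) for c in zip(*members)]
    return labels


def variance_share(points, labels):
    """Per variable: between-cluster sum of squares / total sum of squares."""
    shares = []
    for j in range(len(points[0])):
        col = [p[j] for p in points]
        grand = sum(col) / len(col)
        total = sum((v - grand) ** 2 for v in col)
        between = 0.0
        for k in set(labels):
            grp = [v for v, lab in zip(col, labels) if lab == k]
            between += len(grp) * (sum(grp) / len(grp) - grand) ** 2
        shares.append(between / total)
    return shares


z = standardize(rows)
labels = kmeans(z, [list(z[0]), list(z[-1])])  # start from first and last row
share = dict(zip(names, variance_share(z, labels)))
print(labels)
print(share)
```

In this made-up data, the two RFM-style variables account for almost all of the separation between the two clusters, while age accounts for none. Had those variables been filtered out upfront, the only remaining input would have been one that does not distinguish the two groups at all.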

As Malcolm Gladwell states, instinct is often good, but it should be validated. Validated instinct, as he defines it, is instinct generated after 10,000 hours of going through the same grind. And even with such instinct, he states, it still helps to do statistical modelling as an independent activity, since the modelling can be used to validate the instinct.

The person I was having this discussion with obviously did not come with 10,000 hours of retail exposure. We parted as not the best of friends. I still hold to my stand that it is best to provide as many variables as possible as inputs to the modelling exercise and find the significant ones, rather than filtering out any variable upfront.

Need to know how you can segment your customers or, more importantly, why you should segment your customer base? Contact me at michaeldsilva@gmail.com. I will be glad to discuss and assist you with the same.
 