Wednesday, March 17, 2010

The TIME Factor in Modeling

Last week I read a question in one of my LinkedIn communities on whether "marketing is a science or an art". There was a lot of debate about it being a bit of both, or more of one and less of the other. I put in my two bits as well. Then I found myself pondering the question. That seems a bit odd, since I remember facing the same question in the marketing management paper during the semester exams of my MBA program... the answer then was by rote. Today I ponder. Does that mean I am wiser? Before you answer, that was rhetorical. My response was that marketing is both an art and a science. The science of marketing lies in all the theory and algorithms available to be applied. The art of marketing lies in choosing which theory or algorithm to use.

I recall an incident at a general insurance company a few months back. The team had built a cross-sell model. They had first segmented the customer base and done a product association analysis. Two particular products were found to be closely associated. For the sake of confidentiality, let's call them ... well ... Product 1 and Product 2 (not very innovative ... are we?).

The next step was to find customers who had only Product 1 and not Product 2, and treat them as the target for scoring the propensity to purchase Product 2. A data set of the customers' demographics, transactions, etc. was formed, and the model building and scoring process was executed. The customers were then ranked to give the potential base for cross-sell.
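The target selection described above can be sketched in a few lines of Python. This is only an illustration: the customer IDs, the `holdings` structure and the `cross_sell_targets` function are hypothetical names, not anything the team actually used.

```python
# Hypothetical product holdings per customer (illustrative IDs and data only).
holdings = {
    "C001": {"Product 1"},
    "C002": {"Product 1", "Product 2"},
    "C003": {"Product 2"},
    "C004": {"Product 1", "Product 3"},
}


def cross_sell_targets(holdings, have="Product 1", lack="Product 2"):
    """Return customers who hold `have` but not `lack`, sorted by ID."""
    return sorted(
        customer
        for customer, products in holdings.items()
        if have in products and lack not in products
    )


targets = cross_sell_targets(holdings)
print(targets)
```

Customers who already hold both products are left out of the target list; they are the ones whose behaviour the model learns from.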

While the process used was appropriate, there was a major flaw in the way the scoring exercise was carried out. The data for model building was derived from a cut of the customer database as of a particular date, say January 31, 2010. This is where the process went drastically wrong: a cardinal mistake by statistical standards.

For the sake of explanation, consider three customers who have bought both Product 1 and Product 2. The following gives the timeline of purchase of the two products.




As can be observed, by taking the last 12 months of data from a cut of Jan 2010, the actual purchases of the two products were not taken into consideration. For Customer 1, the purchase of Product 2, which is the target event in this analysis, actually happened outside the period of analysis.
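To make the problem concrete, here is a minimal Python sketch of the fixed-cut approach. The purchase dates are made up for illustration (they are not the dates from the actual case), and `months_before` is a small hypothetical helper for calendar-month arithmetic.

```python
import calendar
from datetime import date


def months_before(d, months):
    """Return the date `months` calendar months before `d`, clamping the day."""
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


# Flawed approach: one fixed 12-month window for everybody.
CUT_DATE = date(2010, 1, 31)
WINDOW_START = months_before(CUT_DATE, 12)

# Hypothetical Product 2 purchase dates (assumed for illustration only).
product2_purchase = {
    "Customer 1": date(2008, 6, 15),
    "Customer 2": date(2009, 5, 10),
    "Customer 3": date(2009, 11, 20),
}


def in_fixed_window(d):
    """True if `d` falls inside the fixed analysis window."""
    return WINDOW_START < d <= CUT_DATE


# Customers whose target event is invisible to the fixed window.
missed = [c for c, d in product2_purchase.items() if not in_fixed_window(d)]
print(missed)
```

With these assumed dates, Customer 1's purchase of Product 2 falls before the window starts, so the target event never shows up in the data. For the other two customers, part of the window covers behaviour after the purchase rather than before it.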

The correct approach would be to identify the event of the purchase of Product 2 and term its date the base period. Because it depends on when Product 2 was purchased, the base period will be different for each customer. Then take the data for the 12 months preceding this base period.

The reason this needs to be done is that we are studying the pattern of behaviour a customer exhibits before purchasing the product. Thus, the period of analysis is relative to the purchase of the product.
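The per-customer window can be sketched as follows. Again, this is a rough illustration under assumptions: the dates, the transaction records and the function names (`months_before`, `observation_window`, `transactions_in_window`) are all hypothetical.

```python
import calendar
from datetime import date


def months_before(d, months):
    """Return the date `months` calendar months before `d`, clamping the day."""
    total = d.year * 12 + (d.month - 1) - months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def observation_window(base_date, months=12):
    """Window of `months` months leading up to the base date (end exclusive)."""
    return months_before(base_date, months), base_date


# Hypothetical base dates: when each customer bought Product 2.
product2_purchase = {
    "Customer 2": date(2009, 5, 10),
    "Customer 3": date(2009, 11, 20),
}

# Hypothetical transactions: (customer, transaction date, amount).
transactions = [
    ("Customer 2", date(2008, 9, 1), 120.0),
    ("Customer 2", date(2009, 2, 14), 80.0),
    ("Customer 2", date(2009, 8, 3), 60.0),  # after the purchase: excluded
    ("Customer 3", date(2009, 3, 5), 200.0),
]


def transactions_in_window(customer):
    """Keep only the customer's transactions before their own base date."""
    start, end = observation_window(product2_purchase[customer])
    return [t for t in transactions if t[0] == customer and start <= t[1] < end]


for customer in product2_purchase:
    print(customer, observation_window(product2_purchase[customer]))
```

Each customer gets a different window, and only behaviour leading up to the purchase feeds the model. Anything after the purchase is deliberately dropped.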

The following figure shows the period for which data needs to be extracted for each customer. This depends on the purchase of Product 2, so notice that the period is different for each customer. In fact, for Customer 1, this period is way in the past and goes beyond the period that was displayed. It probably needs to be decided whether Customer 1 is a "vintage" customer and should be excluded from the analysis.



This is a classic case of the wrong science applied to the correct art. No doubt this cross-sell campaign had a high chance of failure. And the blame would be put on the statistical model, which failed to predict the correct potential base.

If you want to avoid this and similar pitfalls, I will be glad to discuss ... contact me at michaeldsilva@gmail.com.
 