Predictive Analytics: Risks and Failure Rates

Today's topic is Predictive Analytics, and this is a continuation of a series of audio podcast interviews on the topic. Dr. Eric Siegel is a former computer science professor at Columbia University and the President of Prediction Impact.

Dr. Siegel, good afternoon and thanks again for agreeing to this interview and for your time today.

Q: We've introduced Predictive Analytics, its business benefits, an example business case, and a client deployment. For today's topic, I'd like to ask you to address: what are the risks when deploying predictive analytics, and how often do they fail?

A: Well, the main risk is actually on the business process side, which is that buy-in and support for the deployment of a model won't hold strong once that model is complete.  Now this risk is averted by following a standardized business process model known in one form as CRISP-DM, which stands for Cross-Industry Standard Process for Data Mining.  It's an iterative process involving both technical phases and decision-makers of various ilks, to ensure that the technology will be positioned in a way that produces business value, that you can produce a model over the data available, and that the model will produce predictive scores that are actually actionable with regard to what's possible on the operational side.
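For readers who want a concrete picture of the iteration Dr. Siegel describes, here is a minimal Python sketch of the CRISP-DM cycle. The six phase names are the standard CRISP-DM phases; the loop structure, iteration count, and sign-off flag are illustrative placeholders rather than anything specified in the interview.

```python
# Minimal sketch of the CRISP-DM cycle as an iterative loop.
# The six phase names are the standard CRISP-DM phases; the loop logic
# and the sign-off flag are hypothetical placeholders for illustration.

CRISP_DM_PHASES = [
    "Business Understanding",  # align with decision-makers on the business goal
    "Data Understanding",      # take stock of the data actually available
    "Data Preparation",        # build the modeling dataset
    "Modeling",                # train candidate predictive models
    "Evaluation",              # check the model against the business objective
    "Deployment",              # put predictive scores into operational use
]

def run_crisp_dm(max_iterations: int = 3) -> None:
    """Walk the phases, looping back when evaluation does not sign off."""
    for iteration in range(1, max_iterations + 1):
        print(f"--- iteration {iteration} ---")
        for phase in CRISP_DM_PHASES[:-1]:
            print(f"phase: {phase}")
        # Placeholder for the evaluation-phase decision made jointly by
        # the technical team and the business stakeholders.
        model_is_actionable = iteration >= 2
        if model_is_actionable:
            print(f"phase: {CRISP_DM_PHASES[-1]}")  # deploy only after sign-off
            break

if __name__ == "__main__":
    run_crisp_dm()
```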

Now as far as the technical deficiency of a model — that it just doesn’t predict as well as you’d hope — the risk of this can be kept at a real minimum basically in three ways, all three of which are integrated within the standard process model:

1) The first of the three is that when you evaluate a predictive model, you don't evaluate over the same data that you used to create it (that's called the training data, the learning data).  The data used is held-aside data, called test data, which gives you an unbiased, realistic view of how good that model really is.  So if it's not doing well on that data, you know you need to revisit during these pre-production technical phases and change the data or change the method until you get a better model (see the first sketch after this list).

2) Once you have a model that looks good and it's time to deploy it, the second of the three risk-mitigating aspects of the process is to deploy it only in a small dose.  So keep the current method of decision-making in place, and then, perhaps 5% of the time, or, let's say, limited to only an hour or a day of processing, deploy the model so that it stands in contrast to how decisions are made currently.  That way you can see whether the effect of the model is indeed proven, and whether profits have increased, or responses have increased, or what have you (see the second sketch after this list).

3) And then, likewise, in the third manner, that same sort of "duality", basically A/B testing (you're testing "use this model" versus "don't use this model"), is something you ideally always keep going, so that you have a small control set that keeps doing things the old way, or, in any case, in a way that does not require a predictive model.  You then have that as a baseline against which you're constantly monitoring the performance of the predictive model, and you can decide: this predictive model has been doing well for a while, but now its performance is degrading, so it's time to produce a new model; let's collect some new data and start the cycle over again (see the third sketch after this list).
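As a concrete illustration of point 1, here is a minimal Python sketch of held-aside test evaluation. The synthetic dataset, the logistic regression model, the 30% split, and the AUC metric are illustrative assumptions, not choices taken from the interview.

```python
# Minimal sketch of point 1: evaluate on held-aside test data,
# never on the training data used to build the model.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a real modeling dataset.
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Hold aside 30% of the rows as test data before any modeling.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The training-set score is optimistically biased; the test-set score is
# the unbiased, realistic view of how good the model really is.
train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"train AUC: {train_auc:.3f}  test AUC: {test_auc:.3f}")

# If test performance is not good enough, revisit the pre-production
# phases: change the data or change the method, then re-evaluate.
```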
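Point 2, deploying the model only in a small dose, can be sketched as a simple routing rule: a small, randomly chosen share of decisions is handled by the model while the rest continue under the current method. The 5% share and the decision functions below are hypothetical placeholders.

```python
# Minimal sketch of point 2: route only a small share of decisions
# (e.g. 5%) through the new model; everything else keeps using the
# current decision-making method. The decision functions are stand-ins.
import random

MODEL_SHARE = 0.05  # fraction of cases decided by the new model

def current_method_decision(case: dict) -> bool:
    """Stand-in for however decisions are made today."""
    return case.get("priority", 0) > 5

def model_decision(case: dict) -> bool:
    """Stand-in for acting on the predictive model's score."""
    return case.get("score", 0.0) > 0.5

def route(case: dict, rng: random.Random) -> tuple[str, bool]:
    """Send a small random dose of cases to the model, the rest to the
    existing method, and record which arm made the decision."""
    if rng.random() < MODEL_SHARE:
        return "model", model_decision(case)
    return "current", current_method_decision(case)

rng = random.Random(0)
cases = [{"priority": i % 10, "score": (i % 100) / 100} for i in range(1000)]
decisions = [route(case, rng) for case in cases]
model_count = sum(1 for arm, _ in decisions if arm == "model")
print(f"{model_count} of {len(cases)} cases were decided by the model")
```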
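And point 3, keeping a permanent control group so the model's lift can be monitored over time, might look something like the following. The response rates, the lift threshold, and the retraining trigger are invented numbers used purely for illustration.

```python
# Minimal sketch of point 3: keep a permanent control group that uses
# the old method, compare the model arm against it period by period,
# and flag when the model's lift degrades enough to warrant retraining.
# All numbers below are invented for illustration.

def lift(model_rate: float, control_rate: float) -> float:
    """Relative improvement of the model arm over the control baseline."""
    return (model_rate - control_rate) / control_rate

# Observed response rates per period: (model arm, control arm).
periods = [
    (0.060, 0.040),  # model clearly ahead
    (0.058, 0.040),
    (0.052, 0.041),
    (0.045, 0.041),  # performance degrading
]

MIN_ACCEPTABLE_LIFT = 0.20  # retrain once lift drops below 20%

for i, (model_rate, control_rate) in enumerate(periods, start=1):
    current_lift = lift(model_rate, control_rate)
    print(f"period {i}: lift = {current_lift:.0%}")
    if current_lift < MIN_ACCEPTABLE_LIFT:
        print("lift has degraded: collect new data and rebuild the model")
        break
```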