In October 2011, at the Predictive Analytics World conference in New York City, IBM showcased Watson at the Expo. Watson is the supercomputer made famous by beating Jeopardy champions Ken Jennings and Brad Rutter. Conference attendees could go head-to-head with Watson answering Jeopardy-style questions. With great pride, I can say I beat Watson in a best-of-three round. Before you deem me a “super-genius” (I am no Xander), let me confess two things. First, the smartest thing I did was quit while I was ahead; I have no doubt Watson would have crushed me over an extended period of time. Second, the questions I beat Watson on were about popular culture. The first was about candy (“What is Baby Ruth?”) and the second was about television (“What is Friends?”).
What makes Watson so impressive is its mastery of the English language. Anyone who has tried to learn a second language knows that what makes language difficult is the subtleties that can only be deciphered with context and experience. It is our ability to process and use judgment that makes the human brain a supercomputer like no other.
What attracted me to the conference was the keynote by Tom Davenport, author of Competing on Analytics, and a workshop on common data mining mistakes by Dr. John Elder. Most of the attendees were brilliant technologists, engineers and mathematicians. Needless to say, I was probably the only human capital specialist in the Hilton ballroom.
In the vernacular of the conference, data mining and predictive analytics were used synonymously. Dr. Elder described data mining as “a blend of business system engineering, elegant statistical methods and industrial strength computing” but concluded it is “still art more than science.” Elder advises that we think about data mining and the associated technology not as a robot delivering answers but as an Iron Man suit that enhances our decision-making capabilities.
Key take-aways from:
The Best and the Worst of Predictive Analytics: Predictive Modeling Methods and Common Data Mining Mistakes
- Take precautions to guard against “proving” that something random is real. Human beings are expert pattern finders, and without a disciplined two-step analysis approach, it is easy to fall prey to our instincts. A two-step analysis means we cannot judge the quality of a first-generation result until we can examine a second generation or, as Elder put it, “we can’t determine how well the child turned out until we see the grandchild.”
- Another danger to guard against is sampling errors or “leaks from the future.” Statistics tells us the strength of a relationship among a random sample of data points, the key word being random. When we start adding criteria to our sample, we influence the results. I accept that organizations may not have the time or resources to apply the advanced techniques Dr. Elder presented as best practices. The lesson I took away is this: when evaluating a result, especially R², understand how the sample was constructed. A carefully pruned sample will usually show a higher R² than a true random sample, so a high value on its own proves little.
- My favorite lesson: Fancier tools equate to more ways to mess up. Dr. Elder provided many examples of colleagues who became so enamored with the technology or their techniques that they lost sight of the problem they were trying to solve. Elder’s advice for success is persistence, attitude, teamwork and humility, or PATH.
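For readers who want to see the first two lessons in action, here is a minimal Python sketch using made-up data. It is my own illustration, not one of Dr. Elder's techniques: the function names (`best_noise_predictor`, `pruned_vs_random_r2`), the sample sizes and the simple half-and-half holdout split are all assumptions chosen to keep the example small. The first function picks the "best" of many purely random predictors and then checks it on rows it has never seen, a rough stand-in for waiting to see the grandchild. The second compares R² on a hand-pruned sample with R² on a random sample of the same size.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def r_squared(xs, ys):
    """R-squared of a one-variable linear fit (the squared correlation)."""
    return pearson_r(xs, ys) ** 2


def best_noise_predictor(n_rows=40, n_candidates=200, seed=0):
    """Pick the best of many pure-noise predictors, then re-check it on unseen rows."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(2 * n_rows)]
    candidates = [[rng.gauss(0, 1) for _ in range(2 * n_rows)]
                  for _ in range(n_candidates)]
    # Step 1: choose the winner using only the first half of the rows.
    scores = [r_squared(c[:n_rows], target[:n_rows]) for c in candidates]
    best = max(range(n_candidates), key=lambda i: scores[i])
    # Step 2: score that same winner on the held-out second half.
    holdout = r_squared(candidates[best][n_rows:], target[n_rows:])
    return scores[best], holdout


def pruned_vs_random_r2(n_rows=200, keep=100, seed=1):
    """Compare R-squared on a hand-pruned sample with a random sample of equal size."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n_rows)]
    ys = [x + rng.gauss(0, 1) for x in xs]  # real but noisy relationship
    # Pruned sample: keep only the rows that sit closest to the line y = x.
    by_fit = sorted(range(n_rows), key=lambda i: abs(ys[i] - xs[i]))[:keep]
    # Random sample: keep the same number of rows, chosen blindly.
    by_chance = rng.sample(range(n_rows), keep)
    pruned = r_squared([xs[i] for i in by_fit], [ys[i] for i in by_fit])
    rand = r_squared([xs[i] for i in by_chance], [ys[i] for i in by_chance])
    return pruned, rand


if __name__ == "__main__":
    in_sample, holdout = best_noise_predictor()
    print(f"Best noise predictor: in-sample R2={in_sample:.2f}, holdout R2={holdout:.2f}")
    pruned, rand = pruned_vs_random_r2()
    print(f"Pruned sample R2={pruned:.2f}, random sample R2={rand:.2f}")
```

The winning noise column looks meaningful only on the rows used to select it, and the pruned sample flatters a relationship that is much weaker in the full data. Both are reasons to ask how a sample was built before trusting a score.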
PATH to success:
- Persistence - Attack repeatedly, from different angles. Automate essential steps. Externally check work.
- Attitude - Optimistic, can-do.
- Teamwork - Business and statistical experts must cooperate. Does everyone want the project to succeed?
- Humility - Learning from others requires vulnerability. Don't expect too much of technology.
All the technology at the Expo was impressive, even “mind-blowing,” but a realization came over me. Our technology “overlords” will always need us…the converse is not true. Of course, it would be an inconvenience, to say the least, to go back to calculating standard deviations, t-tests and confidence intervals by hand, but it can be done. Children can be taught (and even entertained) without the use of an electrical outlet. We might even start to remember phone numbers again. The point is, while the technological capability is awesome, without human interpretation and judgment, it is merely a parlor trick.