The goal of this article is to identify important characteristics of data science in a business context, a.k.a. business intelligence (BI). The two terms are used interchangeably throughout.
You might be familiar with the TV show NUMB3RS, which aired in the US between 2005 and 2010. For those who are not, here is the basic idea: a gifted young mathematician from the fictitious CalSci helps his older brother, an FBI agent, solve crimes. The show is very entertaining, even for viewers who are not mathematically inclined, and covers a wide range of algorithms and mathematical theories. It also features two of my favorite technologies: Mathematica (a skilled observer will recognize some palettes) and Apple MacBook Pros. Wolfram Research "partnered with CBS in promoting math awareness", as mentioned on this website. There, you will also find some interesting mathematical details about the different episodes.
Charlie's use of data, mathematics and computers is reminiscent of data science. Had the show aired more recently, Charlie might in fact have been a data scientist. Although technology is an important part of the show, it is never in the foreground: you can feel its presence, but you barely see it. Powerful and seamless is what any good technology should be. And this leads to the first lesson we can learn from NUMB3RS.
Lesson #1: Data science is not mainly about technology
As in the show, you can't solve even the simplest problems without technology. It is an incredible tool, but it is not at the center of the story. Most companies hire data scientists within their IT department, which results in hiring mostly developers with some modeling skills. According to Gartner, about 80% of business intelligence implementations fail. By refocusing our priorities on the problems at hand and on how to model them, we might make it easier to find data scientists and make those projects more successful.
Lesson #2: Data science is about solving problems
In the first episode, Charlie offers to help catch a serial killer, using geographic profiling to identify the most probable location of the killer's home. His brother had never thought math could be of any help in his job. Similarly, the business usually can't imagine which problems data science can solve. Identifying those problems can only be achieved in collaboration with modelers, who help the business identify and clarify the problems to be solved.
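To give a feel for what geographic profiling involves, here is a minimal sketch of a distance-decay scoring model in the spirit of Rossmo's formula, the technique commonly associated with this episode. It is an illustration under stated assumptions, not the show's actual model: the grid, the crime coordinates, the buffer radius `buffer_b` and the decay exponents `f` and `g` are all hypothetical values. Each cell of a grid is scored as a possible home location. Scores fall off with distance from the crime sites, but cells very close to a crime site also score lower, reflecting the assumption that offenders tend to avoid striking right next to home (the "buffer zone").

```python
def rossmo_scores(crimes, grid_w, grid_h, buffer_b=2.0, f=1.2, g=1.2):
    """Score every grid cell (x, y) as a candidate home location.

    crimes: list of (x, y) crime-site coordinates (hypothetical data).
    buffer_b, f, g: assumed buffer radius and decay exponents.
    The overall scaling constant of Rossmo's formula is omitted because
    it does not change which cell scores highest.
    """
    scores = {}
    for x in range(grid_w):
        for y in range(grid_h):
            total = 0.0
            for cx, cy in crimes:
                d = abs(x - cx) + abs(y - cy)  # Manhattan (street-grid) distance
                if d > buffer_b:
                    # Outside the buffer zone: score decays with distance.
                    total += 1.0 / d ** f
                else:
                    # Inside the buffer zone: score rises toward the buffer edge.
                    total += buffer_b ** (g - f) / (2 * buffer_b - d) ** g
            scores[(x, y)] = total
    return scores


# Usage: four hypothetical crime sites on an 11 x 11 grid.
crimes = [(0, 5), (10, 5), (5, 0), (5, 10)]
scores = rossmo_scores(crimes, 11, 11)
most_likely_home = max(scores, key=scores.get)
```

The highest-scoring cells are where investigators would start looking. As the lessons below stress, choosing the parameters and checking the assumptions (a street grid, a buffer zone) is where the FBI's expert knowledge comes in.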
Lesson #3: Data science must use expert knowledge
Once a problem is clearly defined, it must be modeled within a relevant mathematical framework. Charlie always asks his brother and his team the right questions, often highlighting, and sometimes questioning, the assumptions underlying their expert knowledge. Sometimes their interaction alone is enough to help the FBI move the case in another direction, without resorting to a model. Most often, it helps him identify the algorithm he will use and the data he will need. This knowledge is necessary to guide the search. Blind application of data mining to large datasets is doomed to fail most of the time, but you might still get lucky: even a blind squirrel finds a nut once in a while.
Lesson #4: Data science must be explained in simple terms
A strong emphasis is put on how Charlie explains the model he will use, each time resorting to analogies with everyday situations. He makes sure his FBI colleagues understand the ideas behind the methodology he is going to use. The same care is given to communicating his results. The message here is that you need to adapt your communication to your audience. This may be obvious, but it is always worth remembering.
Lesson #5: Data science is more than just a lab experiment
Larry, Charlie's physicist friend, regularly pushes him to leave his office and gather data in the field. How can you model a phenomenon if you are not directly exposed to it? Another argument Larry likes to make is that it is often difficult, even impossible, to fully account for the human dimension. This affects not only the modeling process but also the interpretation of the results.
Lesson #6: Data science is not just R&D
Once he gets results, Charlie's work is not over. He helps determine the best course of action, monitors the implementation, and may correct his models or gather new data, depending on the outcome of the FBI's actions. Similarly, data scientists need to be involved in implementing and monitoring the actions their research recommended. In a way, they need to be accountable for them.
Lesson #7: Data science is about applying the scientific method
The scientific method relies on two pillars: theory and data. You can start with a theory and verify it against data, or you can mine your data and come up with a theory to explain the patterns you observed. In either case, you need both. The idea that you can rely on data mining alone, without trying to explain your results, is very dangerous. This is another reason why you shouldn't be a blind squirrel. It is a constant in NUMB3RS: theories and data are both required, and they need to agree with each other.
Lesson #8: Data science is about teamwork
Teamwork is an integral part of the dynamics at play in the show, on two levels. The first level is between Charlie and the FBI, where the group dynamic is crucial to producing good results. The second level is between Charlie, Larry and Amita (Charlie's former student, who later becomes a professor and his girlfriend). Getting a different perspective from them regularly helps Charlie change his approach to a particular problem. We find exactly the same dynamics in business intelligence, both between the data scientists and the business and within the data science team. Teams, not star data scientists, produce better results. Building teams will also help alleviate the scarcity of data science resources.
Lesson #9: Data science is about trust
In the first few episodes, Charlie has to convince the FBI that he can add value. He has to build trust. As the show progresses, the almost magical results he regularly obtains convince the FBI agents of the robustness and benefits of his approach. This trust, necessary to most human relationships, is a cornerstone of any business intelligence project. Tackling smaller, easier problems first is a good way to build it, before attacking more complex and riskier projects.
Lesson #10: Data science helps foster an analytics-based culture
By working closely with Charlie, the FBI agents become more and more familiar with the concepts he uses. They regularly come to him with new problems, mention what he did in a previous case, and ask whether he could apply the same methodology to help solve the new one. Building an analytics-based culture is a very important aspect of business intelligence too. Once you have built trust and clearly explained your methods, your internal customers will come asking for more.
NUMB3RS is a good illustration of what data science should look like. Watching the show might give you some additional ideas, and in any case you should have a great time doing so. If you are interested in the math behind it, you can also check out this website.