Over my years in data science and predictive analytics, I have noticed some awful practices that young, and sometimes seasoned, analysts follow. These bad practices, I believe, put the careers of these data scientists on a collision course, much like the Titanic's. I will present the six worst mistakes that I feel are at the root of these bad practices and, with the help of some memorable quotes, suggest strategies to avoid them. To begin with, let me present the purpose of being a data scientist, which in my opinion is similar to being a detective. The following quote by Sherlock Holmes sums it up:
My name is Sherlock Holmes. It is my business to know what other people don’t know.
― Sherlock Holmes
Now, coming back to the six worst mistakes for data scientists, here is my list:
- Focus on tools rather than business problems
- Planning communication last
- Data analysis without a question / plan
- Don’t read enough
- Fail to simplify
- Don’t sell well
1) Focus on Tools rather than Business Problems
The expectations of life depend upon diligence; the mechanic that would perfect his work must first sharpen his tools.
― Confucius
In addition to programming languages such as SAS, R, and Python, tools for data scientists include statistical and machine learning methods and algorithms. I am certainly not trying to undermine the importance of these tools when I ask data scientists to shift their focus away from them. Mastering tools, as Confucius suggested, is at the core of being a good craftsman. However, to make my point, imagine going to a doctor who is much more confident in her skill with a stethoscope than in diagnosing patients. Some data scientists similarly focus too much on tools rather than on the problems these tools are meant to solve. In my opinion, a good practice for data scientists is to always question the purpose of using a tool and how it will help solve the problem at hand.
It is the old experience that a rude instrument in the hand of a master craftsman will achieve more than the finest tool wielded by the uninspired journeyman.
― Karl Pearson
2) Planning Communication Last
The most important things are the hardest to say, because words diminish them.
― Stephen King
Trust me, in your career as a data scientist you will communicate some really important things: messages that will challenge the status quo and change the way organizations do business. Hence, you can't leave the planning of communication until the end of the analysis. On the contrary, I believe that planning communication alongside your investigation actually enhances the quality of your analysis. Good communication flows like a tightly knit and gripping story. When you plan your communication along with the analysis, your analysis also flows like a story. In my opinion, a good practice for data scientists is to take time away from their analysis every day and structure their results and thoughts in the form of a story.
Think like a wise man but communicate in the language of the people.
― William Butler Yeats
3) Data Analysis without a Question / Plan
If you don’t know what you want, you end up with a lot you don’t.
― Chuck Palahniuk
Easy availability of data often makes data scientists jump straight into the data without well-defined questions. This is suicidal for any data science project. Data science is a structured process that starts with well-defined questions and objectives. Next comes setting a few hypotheses that serve the grand objective.
It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.
― Sherlock Holmes
Let me draw the distinction between theorizing and hypothesizing. Hypotheses are testable: facts either support or dispel them. As data scientists, our job is to be dispassionate about our hypotheses. The idea is to be truth seekers rather than to produce a self-serving analysis. Additionally, during your analysis, you will come across several clues that were not part of the hypotheses. You build your story on top of these clues like a true detective. However, having clearly defined questions before the analysis is the most important aspect for data scientists.
Judge a man by his questions rather than by his answers.
4) Don’t Read Enough
A reader lives a thousand lives before he dies, said Jojen. The man who never reads lives only one.
― George R.R. Martin
I have found reading extremely helpful throughout my career in data science. The most powerful aspect of reading is the way it helps us generate ideas and communicate them. Data scientists across the globe are doing some really cool work, and reading is our gateway to that work. In addition to books, there are many other resources from which a data scientist can gain knowledge, including academic articles, research papers, white papers, blogs, and LinkedIn articles. Reading takes discipline, and it is easy to let it slip when the workload is heavy. However, I believe daily reading should be part of the job description for every data scientist. For a successful career in data science, I recommend that you set aside at least an hour of your working day to read.
It is what you read when you don’t have to that determines what you will be when you can’t help it.
― Oscar Wilde
5) Fail to Simplify
Everything should be made as simple as possible, but not simpler
― Albert Einstein
At the core of any data science activity, which is often surrounded by complicated mathematics, hacking, and analysis, lies a simple idea. Simplification is getting at the core of that idea. It is often believed that you must simplify things for others, i.e. your business users and audience. On the contrary, I believe simplification is an activity you must do for yourself. It helps you develop a deeper relationship with your work.
Simplicity is the ultimate sophistication.
― Leonardo da Vinci
6) Don’t Sell Well
The story of the human race is the story of men and women selling themselves short.
― Abraham Maslow
Many data scientists believe that selling is not a part of their job and, trust me, they couldn't be more wrong. Whether you are working with internal or external customers, selling is an integral part of your job. To make my point, even the greatest scientists had to sell their science: Einstein sold Relativity, Darwin had to sell Evolution, and Newton sold Gravity. These greatest creations of the human mind would have stayed in oblivion had it not been for the great salesmanship of their creators.
I am an artisan. I only became an artist when people watch what I do. That is when it becomes art.
― Rhys Ifans
The most important aspect for a data scientist is to ensure that their work gets integrated with business processes. Trust me, this requires some hard selling. If you believe your solution has value, you need to sell it well to show its promise.
Salesmanship is limitless. Our very living is selling. We are all salespeople.
― James Cash Penney
These are some of the important lessons I have learned in my career in data science. I must say I didn’t know them at the beginning and I hope they will help you with your career.
‘Come, Watson, come!’ [Sherlock Holmes] cried. ‘The game is afoot. Not a word! Into your clothes and come!’
― Sherlock Holmes