Over the years in my career in data science and predictive analytics, I have noticed some awful practices that young, and sometimes seasoned, analysts follow. These bad practices, I believe, put the careers of these data scientists on a collision course, much like the Titanic’s. I will present the six worst mistakes that I feel are at the root of these bad practices. Additionally, I will suggest strategies to avoid these mistakes using some memorable quotes. To begin with, let me present the purpose of being a data scientist, which in my opinion is similar to being a detective. The following quote by Sherlock Holmes sums it up:
My name is Sherlock Holmes. It is my business to know what other people don’t know.
― Sherlock Holmes
Now, coming back to the six worst mistakes for data scientists, here is my list:
- Focus on tools rather than business problems
- Planning communication last
- Data analysis without a question / plan
- Don’t read enough
- Fail to simplify
- Don’t sell well
1) Focus on Tools rather than Business Problems
The expectations of life depend upon diligence; the mechanic that would perfect his work must first sharpen his tools.
― Confucius
In addition to programming languages such as SAS, R, and Python, tools for data scientists include statistical and machine learning methods and algorithms. I am certainly not trying to undermine the importance of these tools when I ask data scientists to shift their focus away from them. Mastering tools, as Confucius suggested, is at the core of being a good craftsman. However, to make my point, imagine going to a doctor who is much more confident in her skill with a stethoscope than in diagnosing patients. Some data scientists likewise focus too much on tools rather than on the problems these tools are meant to solve. In my opinion, a good practice for data scientists is to always question the purpose of using a tool and how it will help solve the problem at hand.
It is the old experience that a rude instrument in the hand of a master craftsman will achieve more than the finest tool wielded by the uninspired journeyman.
— Karl Pearson
2) Planning Communication Last
The most important things are the hardest to say, because words diminish them.
― Stephen King
Trust me, in your career as a data scientist you will communicate some really important things: messages that will challenge the status quo and change the way organizations do business. Hence, you can’t leave the task of planning communication until the end of the analysis. On the contrary, I believe that planning communication alongside your investigation and analysis actually enhances the quality of the analysis. Good communication flows like a tightly knit, gripping story. When you plan your communication along with the analysis, your analysis also flows like a story. In my opinion, a good practice for data scientists is to step away from their analysis every day and structure their results and thoughts in the form of a story.
Think like a wise man but communicate in the language of the people.
― William Butler Yeats
3) Data Analysis without a Question / Plan
If you don’t know what you want, you end up with a lot you don’t.
― Chuck Palahniuk
The easy availability of data often tempts data scientists to jump straight into the data without well-defined questions. This is suicidal for any data science project. Data science is a structured process that starts with well-defined questions and objectives. Next comes formulating a few hypotheses that serve the overall objective.
It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.
― Sherlock Holmes
Let me draw a distinction between theorizing and hypothesizing. Hypotheses are testable: facts either support or refute them. As data scientists, our job is to be dispassionate about our hypotheses. The idea is to be truth seekers rather than to do a self-serving analysis. Additionally, during your analysis you will come across several clues that were not part of the hypotheses. You build your story on top of these clues like a true detective. However, having clearly defined questions before the analysis remains the most important aspect for data scientists.
Judge a man by his questions rather than by his answers.
― Voltaire
4) Don’t Read Enough
A reader lives a thousand lives before he dies, said Jojen. The man who never reads lives only one.
― George R.R. Martin, A Dance with Dragons
I have found reading extremely helpful throughout my career in data science. The most powerful aspect of reading is the way it helps us generate ideas and also communicate them. Data scientists across the globe are doing some really cool work, and reading is our gateway to that work. In addition to books, there are many other resources through which a data scientist can gain knowledge, including academic articles, research papers, white papers, blogs, and LinkedIn articles. Reading requires discipline, and it is easy to let it slip when the workload is heavy. However, I believe daily reading should be part of the job description for every data scientist. For a successful career in data science, I recommend that you set aside at least an hour of your working day for reading.
It is what you read when you don’t have to that determines what you will be when you can’t help it.
― Oscar Wilde
5) Fail to Simplify
Everything should be made as simple as possible, but not simpler
― Albert Einstein
At the core of any data science activity, which is often surrounded by complicated mathematics, hacking, and analysis, lies a simple idea. Simplification means getting to the core of that idea. It is often believed that you must simplify things for others, i.e., your business users and audience. On the contrary, I believe simplification is an activity you must do for yourself. It helps you develop a deeper relationship with your work.
Simplicity is the ultimate sophistication.
― Leonardo da Vinci
6) Don’t Sell Well
The story of the human race is the story of men and women selling themselves short.
― Abraham Maslow
Many data scientists believe that selling is not part of their job, and trust me, they couldn’t be more wrong. Whether you are working with internal or external customers, selling is an integral part of your job. To illustrate my point, even the greatest scientists had to sell their science: Einstein sold Relativity, Darwin had to sell Evolution, and Newton sold Gravity. These great creations of the human mind might have remained in oblivion had it not been for the salesmanship of their creators.
I am an artisan. I only became an artist when people watch what I do. That is when it becomes art.
― Rhys Ifans
The most important aspect for a data scientist is to ensure that their work gets integrated into business processes. Trust me, this requires some hard selling. If you believe your solution has value, you need to sell it well so that it can deliver on its promise.
Salesmanship is limitless. Our very living is selling. We are all salespeople.
― James Cash Penney
Sign-off Note
These are some of the important lessons I have learned in my career in data science. I must admit I didn’t know them at the beginning, and I hope they will help you in your career.
‘Come, Watson, come!’ [Sherlock Holmes] cried. ‘The game is afoot. Not a word! Into your clothes and come!’
― Sherlock Holmes
All so true. I would also like to add: not following a rigorous methodology. Even in grad school, I noticed that so many wanted to jump quickly to the analysis of the data using some “cool” algorithm. However, what separated the exceptional results from the good ones was usually data pre-processing and the development of thorough feature sets prior to analysis.
Excellent points. Lack of focus on sound foundations and rigorous methodology is another worst mistake for data scientists. Thanks for pointing this out.
Hi bro… the word ‘thanks’ is not enough for you. Really, what amazing blogs. This is a library. I’m at the initial stage and have just started my career as a data scientist. I’m a hard-core fan of Sherlock. After reading these blogs with quotes I was amazed. Everyone would appreciate you for providing this free educational service and giving wonderful information with quotes, which helps us enhance our skills and gives us a chance to fill our loopholes with concrete. The way you explain is mind-blowing; even people from other backgrounds can understand it. You are amazing, bro… great job… Keep it up and go ahead… once again, thank you… Jai Hind!!
I tend to like this quotation better:
“No amount of experimentation can ever prove me right; a single experiment can prove me wrong.”
Great quotation – that’s Einstein. Theoretical physicists tend to think this way, as experimental physicists are always probing their masterpiece theories for fissures.
That’s the reason why, they say, mathematicians usually leave the planet certain that no one will ever demolish their brilliant ideas.
Outstanding article; it covers almost all the details we fail to focus on during our working hours. What an article!!!!!!
As data scientists, our best products come from working with an excellent subject matter expert. Data scientists are experts in one field. You need a village for the final product.
Very useful tips, and put together very beautifully.
Excellent advice, and from the thoughtfully selected quotes it’s clear that you follow it yourself.
I’m just a bit confused by your take on theory versus hypothesis. Having come from a background in physics, I regard a theory as a thorough treatment of a phenomenon, explaining the causative mechanisms and providing rules for prediction. A promising theory is built from a synthesis of tested hypotheses and provides the platform to generate and test new hypotheses, which will ultimately either strengthen or disprove the theory. Unfortunately, in common usage, people confuse the words theory and speculation, which leads to ridiculous arguments like, “evolution is only a theory — it hasn’t been proved!”
That’s a good question. I was primarily focusing on this quote by Sherlock Holmes: “It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.” Laws of physics and other physical sciences are developed over a long period of time with careful experiments and hard data. Einstein’s general theory of relativity was a speculation before careful experiments showed it to be consistent with natural phenomena. Some of the definitions of theory I could find on the net are: ‘a supposition or a system of ideas intended to explain something’ and ‘an idea used to account for a situation or justify a course of action.’ I couldn’t establish for sure that theories are always backed by hard evidence and data. Let me know your thoughts.
All good points, especially 1 and 5. Not sure about the ‘hard selling’ skills. I like factual evidence coupled with good use cases, so communication/presentation skills are definitely useful. If it comes off as ‘selling’, there is most likely bias, and bias is something we don’t need in data scientists; we get enough of that from social media!
Very solid advice. I came across this and found it in line with my values as an analyst.
Great advice. The quotes that you chose illustrate your points nicely!
Roopam, I think your points on the 360-degree view of data science projects are worth framing and studying every day before work to ensure we don’t forget them. As one says, ‘Time is the best healer, and forgetfulness seeps in that healing’. Thank you for sharing your years of experience, distilled into a short article, to help other data scientists, budding or otherwise.
Very nice, and it definitely captures the essence of data science, which is thinking like a detective.
Great post! Have a nice day! 🙂
Excellent Advice