Are you preparing for a data science job interview? To help you, in this article I will explore some of the most common techniques data scientists use to select their future colleagues. I will also share many sample questions for data science job interviews and suggest a few strategies to prepare. Please post your comments, questions, and challenges in the discussion section at the bottom; I will be more than happy to help.
Data science job interviews and selection techniques can be categorized into the following 7 classes across 3 levels.
|Data Science Job Interview Type|Level and Significance for Selection|
|---|---|
|1. Puzzles & Riddles; 2. Quick Math; 3. Guesstimation Questions|Level 1: These are usually warm-up interview questions to assess the logical and analytical aptitude of the candidate. You will rarely get a job offer after clearing only this level.|
|4. Programming & Data Preparation Challenges; 5. Statistics & Machine Learning Questions|Level 2: Things get serious from this stage onward, since these are part of a data scientist's daily activities. Some entry-level candidates may get a job offer after clearing this level.|
|6. Case Study Problems / Problem Solving Experience; 7. Analyze This / Take Home Analysis|Final level 3: This is where the hiring authority is seriously considering you for the position. You will most likely secure an offer after clearing this level.|
If you are preparing for a data science job interview, it is a good idea to know what you might face. I will discuss each of these categories in detail later in the article, introduce a few sample questions from each category, and suggest ways to prepare. Please share your solution approaches and thoughts on these sample questions in the discussion section of this article.
However, before we explore the selection criteria and screening process for data science job interviews, let us learn about a group of people who have mastered the art of selection and screening: Nigerian scamsters.
I am sure at some point you have received a Nigerian scam mail in your Gmail or Yahoo mailbox. These mails typically come from an unknown person desperate to transfer a huge sum of money to your bank account because of turmoil in some African country. The senders often portray themselves as high-ranking bankers or the offspring of some rich person. These emails always contain laughable sentences.
Most of us brush these mails off as ridiculous because of their ludicrous language and content. However, the success of these scam mails can be gauged from the following statistics in a report by ultrascan-agi. In 2013, Nigerian scams cost victims close to 13 billion dollars, and between 2006 and 2013 the losses were roughly 82 billion dollars. That's a lot of money, and I am sure these mails have lured a lot of victims. Scamsters are clearly an extremely clever lot, so one wonders why they would write such laughable mails to catch victims.
Nigerian scamsters are indeed a clever bunch. These mails are the initial screening step in an intriguing modus operandi that eventually requires victims to transfer money into an unknown bank account belonging to the scamsters. The scamsters are essentially applying a first-level filter to identify people stupid enough to go the distance in this scam. Someone who can't spot the ridiculous language and absurd content of these mails as suspicious is certainly a great target. The motive of Nigerian scamsters is completely wrong and illegal, but their selection process is immaculate at identifying the right candidates, those who will go the distance.
Data Science Job Interview
On some level, every job interview process, including those for data science, is far from perfect and full of flaws. An interview tries to measure a candidate's ability to run a marathon by asking them to run a few 100-meter dashes, which is almost impossible. However, as we have seen, Nigerian scamsters have figured out a quick yet extremely effective strategy to select the right candidates for their purpose. They get it right because they know exactly who their target is. In data science job interviews too, the goal is to know clearly what you are looking for in a candidate and to design the selection procedure around it. Now let us discuss current practices in data science job interviews.
Level 1 – Data Science Job Interview
Level 1 of data science job interviews assesses the logical and analytical aptitude of the candidate. It usually includes:
1.1 Puzzles & Riddles
1.2 Quick Math
1.3 Guesstimation Questions
If typical interviews are like a 100-meter dash, these level 1 interviews are more like a 10-meter ultra-fast sprint. They have their place and importance, but it is impossible to assess a candidate purely on them. Still, if you are preparing for a data science interview, it is a good idea to brush up on these questions. I will suggest a few books later in this article to sharpen your skills with puzzles and guesstimations.
1.1 Puzzles & Riddles
Puzzles and riddles are an integral part of interviews at tech companies like Google and Microsoft. In data science interviews, expect a few puzzles based on probability theory, like the sample questions below.
1. You are at Ranthambore National Park, where the probability of sighting a tiger on a full-day trip (8 hours long) is 95%. What is the probability that you will see a tiger on a half-day trip (4 hours long)?
2. Can you explain the solution to the Monty Hall Problem? (read the problem and solution approach at this link)
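A minimal sketch of how one might check both puzzles in Python. For the tiger question it assumes (our assumption, not stated in the puzzle) that sightings are independent and equally likely at any time of day, so a tiger-free full day means two tiger-free half days: P(no tiger in 4 hours)² = 0.05. The Monty Hall part is a plain simulation; both function names are hypothetical.

```python
import math
import random

def half_day_sighting_probability(full_day_p: float) -> float:
    # Assumption: sightings are memoryless and uniform over the day, so
    # P(no tiger in 8h) = P(no tiger in 4h) ** 2.
    p_miss_half_day = math.sqrt(1.0 - full_day_p)
    return 1.0 - p_miss_half_day

def monty_hall_win_rate(trials: int, switch: bool, seed: int = 0) -> float:
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        # The host opens a door that was not picked and does not hide the car.
        opened = next(d for d in range(3) if d != pick and d != car)
        if switch:
            # Switch to the only other closed door.
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += int(pick == car)
    return wins / trials

print(round(half_day_sighting_probability(0.95), 3))  # 0.776
print(monty_hall_win_rate(100_000, switch=True))       # close to 2/3
print(monty_hall_win_rate(100_000, switch=False))      # close to 1/3
```

Under a different assumption (say, tigers are far more active at dawn), the half-day answer would change, which is itself a useful point to raise with the interviewer.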
1.2 Quick Math
Again, expect a few quick math questions to be tossed at you during an interview. They are not tough with a calculator, but during an interview you will have to solve them without one. What the interviewer looks for is a ballpark figure, not an answer that is exact to the decimal.
1. What percentage is 7 of 24?
2. Is 23 times 17 greater than 450?
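One way to handle both questions in your head, shown as worked arithmetic rather than code, is to round to friendly numbers and then correct:

```latex
\begin{aligned}
\frac{7}{24} &\approx \frac{7}{25} = \frac{28}{100} = 28\% \quad (\text{a slight underestimate since } 24 < 25\text{; exactly } 29.17\%)\\
23 \times 17 &= (20+3)(20-3) = 20^2 - 3^2 = 400 - 9 = 391 < 450
\end{aligned}
```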
These questions are genuinely useful because data scientists need to check the validity of their results regularly. These skills help you spot any glaring discrepancy in your results upfront; it's highly embarrassing if your customer points one out to you during a presentation.
1.3 Guesstimation Questions
These questions are borrowed directly from the interviews of consulting companies like McKinsey and BCG. Data science roles usually involve a significant amount of consulting and customer management, so there is a fair overlap between data science and consulting interviews.
1. Estimate market size : How many laptops are sold in India every year?
2. How many tennis balls can you fit in a Boeing 747?
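Both questions come down to the same move: break the unknown quantity into factors you can estimate, then combine them. A minimal sketch for the tennis-ball question follows; the ball diameter (roughly 6.7 cm), the packing fraction of about 0.64 for randomly poured spheres, and especially the cabin dimensions are illustrative assumptions, not data about the Boeing 747. The helper name is hypothetical.

```python
import math

def balls_that_fit(space_m3: float, ball_diameter_m: float = 0.067,
                   packing_fraction: float = 0.64) -> int:
    # Volume of one ball: (4/3) * pi * r^3
    radius = ball_diameter_m / 2
    ball_volume = (4.0 / 3.0) * math.pi * radius ** 3
    # Randomly poured spheres fill only about 64% of the available space.
    return int(space_m3 * packing_fraction / ball_volume)

# Hypothetical cabin dimensions (length x width x height, in metres), for illustration only.
cabin_volume = 60 * 6 * 2.5
print(f"{balls_that_fit(cabin_volume):,}")  # a few million, given these assumptions
```

The laptop market-size question follows the same pattern with different factors (for example, number of likely buyers × purchase rate per year), and interviewers usually care more about the factors you choose than about the final number.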
On the surface these two guesstimation questions may seem different, but they are solved using the same approach. You will find the following books useful for such questions.
Level 2 – Data Science Job Interview
Level 2 of a data science job interview often has:
2.1 Statistics and Machine Learning Questions
2.2 Programming & Data Preparation Challenges
2.1 Statistics and Machine Learning Questions
Statistics and machine learning are key concepts that you need a firm grasp of to be a good data scientist.
1. What are K-means clusters? Suggest at least a couple of ways to select the optimal value of K.
2. How are artificial neural networks (ANN) different from logistic regression? Can you create a logistic regression model through ANN? How?
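A minimal NumPy-only sketch touching both questions, on synthetic data. The first part scores K = 2 to 6 with two common methods: the elbow in the within-cluster sum of squares (inertia) and the mean silhouette coefficient. The second part shows that a network with no hidden layer, a single sigmoid output neuron, and log-loss is a logistic regression. The helper names (kmeans, silhouette, train_single_neuron) are hypothetical, written here for illustration rather than taken from a library.

```python
import numpy as np

rng = np.random.default_rng(42)

# --- Part 1: choosing K for K-means ---
def kmeans(X, k, n_init=10, n_iter=100, seed=0):
    """Lloyd's algorithm with random restarts; returns (inertia, labels) of the best run."""
    rs = np.random.default_rng(seed)
    best = None
    for _restart in range(n_init):
        centers = X[rs.choice(len(X), size=k, replace=False)]
        for _step in range(n_iter):
            # Squared distance from every point to every center
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Move each center to the mean of its points (keep it if its cluster is empty)
            centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                                for j in range(k)])
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        inertia = d2.min(axis=1).sum()  # within-cluster sum of squares
        if best is None or inertia < best[0]:
            best = (inertia, labels)
    return best

def silhouette(X, labels):
    """Mean silhouette coefficient: (b - a) / max(a, b) averaged over all points."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    scores = []
    for i in range(len(X)):
        own = labels == labels[i]
        own[i] = False
        if not own.any():  # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        a = D[i, own].mean()  # mean distance to the point's own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])  # nearest other cluster
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Synthetic data: three well-separated blobs (illustrative only)
true_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.5 * rng.standard_normal((50, 2)) for c in true_centers])

inertias, sil_scores = {}, {}
for k in range(2, 7):
    inertias[k], labels = kmeans(X, k)
    sil_scores[k] = silhouette(X, labels)
best_k = max(sil_scores, key=sil_scores.get)
print(inertias)  # look for the "elbow" where the drop flattens out
print(best_k)    # K with the highest mean silhouette

# --- Part 2: logistic regression as a one-neuron network ---
def train_single_neuron(X, y, lr=0.1, epochs=2000):
    """No hidden layer, sigmoid output, log-loss: equivalent to logistic regression."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # sigmoid activation
        err = p - y                             # gradient of log-loss w.r.t. the pre-activation
        w -= lr * X.T @ err / len(y)
        b -= lr * err.mean()
    return w, b

Xc = rng.standard_normal((200, 2))
yc = (Xc[:, 0] + Xc[:, 1] > 0).astype(float)
w, b = train_single_neuron(Xc, yc)
accuracy = np.mean((1.0 / (1.0 + np.exp(-(Xc @ w + b))) > 0.5) == yc)
print(w, b, accuracy)
```

In practice scikit-learn's KMeans and silhouette_score do the same job; the hand-rolled versions here just make the mechanics visible.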
2.2 Programming & Data Preparation Challenges
Data science professionals need proficiency in programming languages like R, Python, or SAS. They also need to understand different strategies to handle and pre-process data. This type of interview is usually an effort to gauge the candidate's proficiency in these skills.
1. What is the largest data size you have handled? What are the challenges you faced while handling that data?
2. What is the command to create a histogram in R?
Personally, I am not a huge fan of asking programming questions (like question 2) in an interview. Data science languages are mostly scripting languages with built-in functions for most tasks, and I don't think data scientists need to memorize these functions; most of the time, Google is there to help at work. For instance, the answer to the second question is hist(), which is a bit dull. However, you will find a few interviewers asking such questions, and you will find many of them on this site: R Interview Questions. While preparing, I suggest you focus on the conceptual and logical angles of algorithms.
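For readers who work in Python rather than R, a rough counterpart of R's hist() is sketched below, assuming NumPy is available: numpy.histogram returns the bin counts and edges, and matplotlib's plt.hist would draw the same bins. The values are made up for illustration.

```python
import numpy as np

# Made-up sample values, for illustration only
values = np.array([1.2, 2.5, 2.7, 3.1, 3.3, 3.8, 4.0, 4.4, 5.9, 6.1])

# Five equal-width bins between the minimum and maximum value
# (NumPy's last bin also includes its right edge)
counts, edges = np.histogram(values, bins=5)
for count, lo, hi in zip(counts, edges[:-1], edges[1:]):
    print(f"{lo:5.2f} to {hi:5.2f}: {'#' * int(count)}")
```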
Final Level 3 – Data Science Job Interview
The final level of a data science job interview involves:
3.1 Case Study Problems
3.2 Analyze This / Take Home Analysis
3.1 Case Study Problems
These questions are the real deal in many data science job interviews, which makes sense since solving business problems is a data scientist's primary role.
1. A retail chain is not happy with its return on marketing investment (ROMI). How could you help it improve ROMI using business analytics?
2. How could you help a telecommunication company improve its profitability through data science?
3. How will you build a credit scorecard for a bank?
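Case studies are mostly about approach, but for the scorecard question it can help to know how a model's predicted probability is commonly turned into points. A minimal sketch of the points-to-double-the-odds (PDO) scaling follows; the anchor values (600 points at good:bad odds of 50:1, 20 points to double the odds) and the helper name are illustrative assumptions, not something the case study prescribes.

```python
import math

def make_score_scaler(base_score: float = 600.0, base_odds: float = 50.0,
                      pdo: float = 20.0):
    # Score = offset + factor * ln(odds); every `pdo` points doubles the good:bad odds.
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)

    def score(p_good: float) -> float:
        # p_good would come from a model such as a logistic regression on the bank's data.
        odds = p_good / (1.0 - p_good)
        return offset + factor * math.log(odds)

    return score

score = make_score_scaler()
print(round(score(50 / 51)))    # odds 50:1 -> 600
print(round(score(100 / 101)))  # odds 100:1 -> 620
```

The log-odds link makes this pairing natural: a logistic regression already predicts ln(odds) as a linear function of the inputs, so the scaled score stays linear in them too.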
These case study interviews often start with broad problem statements like the ones above, but they get extremely detailed as the interview progresses. Candidates are expected to focus more on their approach to solving the problem than on the final solution.
You will find the case studies on YOU CANalytics useful while preparing for case study problems.
3.2 Analyze This / Take Home Analysis
Analysis of real data is another selection procedure that is highly popular among data scientists. In this form of screening, the candidate is given a large data set to analyse. Often, the output of this analysis is a complete model, e.g. a logistic regression or a decision tree. The candidate is assessed on the presentation of her thought process, data preparation strategy, exploratory data analysis, and model results. There are several variants of this form of interview; in an extended form, the candidate is allowed to take the data home and is given a few days to work on the analysis.
1. Analyse this graph and suggest at least 3 key findings.
2. Consider y as the dependent variable, and x1 – x4 as the independent variables in the multiple linear regression model. What are your expectations about the regression coefficients for this data? Is there something you need to be careful about while generating your regression equation?
3. Attached is a dataset for a lending company. Do your analysis and report the significant factors responsible for credit defaults. Also, prepare a short report / presentation of your approach.
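The data for question 2 is not reproduced here, so this is only a guess at what to be careful about, but one common caution when a regression has several predictors is multicollinearity, which makes individual coefficients unstable and their signs hard to interpret. A minimal NumPy sketch of the variance inflation factor (VIF) check follows, on synthetic data where x4 is nearly x1 + x2; the vif helper is hypothetical, written for illustration.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing column j on the other columns plus an intercept."""
    n, p = X.shape
    result = []
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ beta
        centered = target - target.mean()
        r2 = 1.0 - (resid @ resid) / (centered @ centered)
        result.append(1.0 / (1.0 - r2))
    return np.array(result)

# Synthetic data: x4 is almost exactly x1 + x2 (illustrative only)
rng = np.random.default_rng(0)
x1, x2, x3 = rng.standard_normal((3, 500))
x4 = x1 + x2 + 0.05 * rng.standard_normal(500)
X = np.column_stack([x1, x2, x3, x4])

vifs = vif(X)
print(np.round(vifs, 1))  # x3 stays near 1; x1, x2 and x4 are heavily inflated
```

A frequently quoted rule of thumb treats a VIF above roughly 5 to 10 as a warning sign worth investigating before trusting individual coefficients.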
Please share your solution approaches and thoughts on the sample questions in this article in the discussion section at the bottom. Also, feel free to share your questions, doubts, and suggestions while preparing for a data science job interview. I would love to help.