{"id":4590,"date":"2015-02-01T18:14:51","date_gmt":"2015-02-01T12:44:51","guid":{"rendered":"http:\/\/ucanalytics.com\/blogs\/?p=4590"},"modified":"2016-09-14T14:42:13","modified_gmt":"2016-09-14T09:12:13","slug":"master-art-data-preparation-data-science","status":"publish","type":"post","link":"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/","title":{"rendered":"Master the Art of Data Preparation for Data Science"},"content":{"rendered":"<hr \/>\n<div id=\"attachment_4591\" style=\"width: 283px\" class=\"wp-caption alignright\"><a href=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/01\/photo-5.jpg\"><img aria-describedby=\"caption-attachment-4591\" data-attachment-id=\"4591\" data-permalink=\"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/photo-5-2\/\" data-orig-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/01\/photo-5.jpg?fit=273%2C448&amp;ssl=1\" data-orig-size=\"273,448\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Data Preparation and Master Chef &#8211; by Roopam\" data-image-description=\"\" data-image-caption=\"&lt;p&gt;Data Preparation and Master Chef &#8211; by Roopam&lt;\/p&gt;\n\" data-medium-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/01\/photo-5.jpg?fit=183%2C300&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/01\/photo-5.jpg?fit=273%2C448&amp;ssl=1\" decoding=\"async\" loading=\"lazy\" class=\"wp-image-4591 size-full\" 
src=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/01\/photo-5.jpg?resize=273%2C448\" alt=\"Data Preparation and Master Chef - by Roopam\" width=\"273\" height=\"448\" srcset=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/01\/photo-5.jpg?w=273&amp;ssl=1 273w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/01\/photo-5.jpg?resize=152%2C250&amp;ssl=1 152w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/01\/photo-5.jpg?resize=183%2C300&amp;ssl=1 183w\" sizes=\"(max-width: 273px) 100vw, 273px\" data-recalc-dims=\"1\" \/><\/a><p id=\"caption-attachment-4591\" class=\"wp-caption-text\">Data Preparation and Master Chef &#8211; by Roopam<\/p><\/div>\n<p>Every data scientist knows that in a typical business analytics or data science project, 70-80% of the time goes into data preparation and data preprocessing. This work is usually considered drudgery compared with the statistical modeling, machine learning, and business insights that follow. Yet every good data scientist understands that data preparation is an art and a highly intellectual exercise. In this article, we will explore the art and science of data preparation. But first, let&#8217;s experience some culinary delights in the&#8230;<\/p>\n<h2><span style=\"color: #3366ff;\">Master Chef&#8217;s Kitchen<\/span><\/h2>\n<p>My wife is a huge fan of MasterChef Australia. I think she has watched every season of the show since its inception. I, on the other hand, enjoy eating good food more than watching someone cook it and judges relish it. Still, over the years I have watched a few episodes because of my wife, and I must say that cooking holds many good lessons for data scientists. In virtually every episode of MasterChef I have seen, the participants spend the larger part of their time preparing ingredients for their final dishes. 
They all run around collecting the right supplies for their dishes from the larder. To me, this looked remarkably similar to data scientists running around to pull the right data fields from data sources &amp; data warehouses. The peeling, chopping, cutting, and roasting that follow are the culinary equivalent of data preparation in data science.<\/p>\n<p>I used to think that master chefs in big restaurants simply ordered sous chefs and assistants around to prepare the ingredients for their main dishes. However, in one episode of MasterChef, the celebrity chef Marco Pierre White asked the participants to chop onions very finely to test their knife skills. He first demonstrated his own meticulous knife skills, chopping an onion into perfect, minuscule pieces within a few seconds. For Marco Pierre White, this was not a display of superiority but a necessity for the dish he had in mind. Similarly, a senior data scientist needs to be fully involved in data preparation and preprocessing to produce the desired models and business results.<\/p>\n<p>A data scientist can&#8217;t expect a delicious model unless she has spent enough time preparing the right ingredients for it through data preparation. In the next section, let&#8217;s explore some key elements of<\/p>\n<h2><span style=\"color: #3366ff;\">Data Preparation<\/span><\/h2>\n<p>In today&#8217;s cut-throat business environment, continuous improvement to generate competitive advantage is the only way for companies to survive. For modern businesses, analytics is an essential instrument for generating that advantage. Analytics and data science produce novel business insights that drive better business actions and keep the company ahead in the race. Even in the age of big data, companies ignore the fundamental rules of rigorous research and analysis design at their own peril. 
As we will see, intelligent data preparation is a fundamental aspect of rigorous data science design. Unfortunately, to date there is no software or tool anywhere in the industry that can create an intelligent data preparation design. Hence, the onus of data preparation still rests with creative human minds.<\/p>\n<p>Before I share a few key aspects of data preparation, let us have a look at a simplified data schema from banking. Although the schema comes from banking, the data preparation strategies we will discuss apply to other industries as well, including healthcare, telecom, and retail.<\/p>\n<div id=\"attachment_4616\" style=\"width: 607px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/01\/Banking-Databases.jpg\"><img aria-describedby=\"caption-attachment-4616\" data-attachment-id=\"4616\" data-permalink=\"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/banking-databases\/\" data-orig-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/01\/Banking-Databases.jpg?fit=811%2C539&amp;ssl=1\" data-orig-size=\"811,539\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Banking Databases\" data-image-description=\"\" data-image-caption=\"&lt;p&gt;A Simple Schematic of Banking Datasets&lt;\/p&gt;\n\" data-medium-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/01\/Banking-Databases.jpg?fit=300%2C199&amp;ssl=1\" 
data-large-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/01\/Banking-Databases.jpg?fit=640%2C425&amp;ssl=1\" decoding=\"async\" loading=\"lazy\" class=\" wp-image-4616\" src=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/01\/Banking-Databases.jpg?resize=597%2C397\" alt=\"A Simple Schematic of Banking Datasets\" width=\"597\" height=\"397\" srcset=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/01\/Banking-Databases.jpg?w=811&amp;ssl=1 811w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/01\/Banking-Databases.jpg?resize=250%2C166&amp;ssl=1 250w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/01\/Banking-Databases.jpg?resize=300%2C199&amp;ssl=1 300w\" sizes=\"(max-width: 597px) 100vw, 597px\" data-recalc-dims=\"1\" \/><\/a><p id=\"caption-attachment-4616\" class=\"wp-caption-text\">A Simple Schematic of Banking Datasets<\/p><\/div>\n<div id=\"attachment_4624\" style=\"width: 340px\" class=\"wp-caption alignright\"><a href=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/02\/Bank-Statement.jpg\"><img aria-describedby=\"caption-attachment-4624\" data-attachment-id=\"4624\" data-permalink=\"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/bank-statement\/\" data-orig-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/02\/Bank-Statement.jpg?fit=621%2C480&amp;ssl=1\" data-orig-size=\"621,480\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Bank Statement\" 
data-image-description=\"\" data-image-caption=\"&lt;p&gt;Bank Statement &#8211; Source Wikipedia&lt;\/p&gt;\n\" data-medium-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/02\/Bank-Statement.jpg?fit=300%2C232&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/02\/Bank-Statement.jpg?fit=621%2C480&amp;ssl=1\" decoding=\"async\" loading=\"lazy\" class=\"wp-image-4624\" src=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/02\/Bank-Statement.jpg?resize=330%2C255\" alt=\"Bank Statement - Source Wikipedia\" width=\"330\" height=\"255\" srcset=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/02\/Bank-Statement.jpg?w=621&amp;ssl=1 621w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/02\/Bank-Statement.jpg?resize=250%2C193&amp;ssl=1 250w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/02\/Bank-Statement.jpg?resize=300%2C232&amp;ssl=1 300w\" sizes=\"(max-width: 330px) 100vw, 330px\" data-recalc-dims=\"1\" \/><\/a><p id=\"caption-attachment-4624\" class=\"wp-caption-text\">Bank Statement (click to enlarge) &#8211; Source Wikipedia<\/p><\/div>\n<p>In every industry, IT systems are designed to capture transaction data. For example, consider the adjacent chequing \/ savings account statement, where every debit and credit transaction for a customer is recorded along with a description. This is similar to the statement you receive for your own bank account. Your bank also captures transaction information for the investment and loan products you hold with it. Its databases hold transaction-level information for all customers, identified by account number. Hence, the base data that data scientists start with is a continuous stream of transactions. Amid such granular data, it is easy to lose sight of the big picture during data preparation. 
In my opinion, the following five points will help you keep your data preparation focused on the right things:<\/p>\n<p><strong>1) Business objectives and questions: <\/strong>business objectives are the driving force for data preparation. From the business objectives come the questions that the analytics will answer. I have noticed a tendency among new data scientists to jump straight into the data without focusing on the business objectives. My recommendation: don&#8217;t touch the data until you are clear about the business objectives and business questions. A clear data strategy based on business objectives will keep you from getting lost in a labyrinth of huge data and save you a lot of rework time.<\/p>\n<p><strong>2) Curiosity: <\/strong>data science doesn&#8217;t start with data but with curiosity, which is the key to being a good data scientist. All data scientists possess an inner desire to decipher and learn the business patterns or facts hidden within data, and data preparation is the first step toward deciphering new patterns. Hence, let your curiosity run wild while preparing derived data fields from the transaction data.<\/p>\n<p><strong>3) Unit of analysis &amp; business hypotheses: <\/strong>one could analyse transaction data at various levels, e.g. customer, branch, region, agent, relationship manager, or channel partner. The unit of analysis for model development comes from the business objective. For instance, a customer risk scorecard has the customer as its unit of analysis, while a business expansion model has the branch or region. Once the unit of analysis is defined, it is good practice to formulate a few business hypotheses, or hunches, about variables that you believe will feature in your predictive model. 
Alternatively, one might take a complete data mining approach, creating hundreds of variables during data preparation to detect patterns with the target variable. I prefer a mix of hypothesis-driven (hunch-driven) and data mining approaches during data preparation.<\/p>\n<p><strong>4) Data roll-up &amp; data quality checks: <\/strong>after defining your unit of analysis and your list of prospective predictor variables, you are ready to approach the data. The idea is to roll up the transaction data to your unit of analysis and prepare predictor variables for model development. Most statistics and data mining books hand you data that is already the end product of this exercise. These datasets make your life easy, but they also prevent you from experiencing an essential process in data science.<\/p>\n<p><strong>5) Operationalization of the analytics base table: <\/strong>finally, data scientists must think about how their model will be operationalized on business systems while preparing data. It is better to lose a few notches of predictive power than to prepare a complicated dataset that is difficult to productionize.<\/p>\n<h4><span style=\"color: #3366ff;\">Sign-off Note<\/span><\/h4>\n<p>My wife pointed out to me that as the contestants on MasterChef get better at cooking, the judges put a lot of emphasis on the final plating and presentation of the dish. The idea is that the served dish needs to be aesthetically pleasing, which enhances the experience of eating it. This is an important lesson for data scientists when presenting their models and analysis results to the company&#8217;s senior management. 
Communicating and presenting predictive models to make them appetising is an art every great data scientist masters.<\/p>\n<p>See you soon with a new post.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Every data scientist knows that in a typical business analytics or data science project, 70-80% of the time goes into data preparation and data preprocessing. This work is usually considered drudgery compared with the statistical modeling, machine learning, and business insights that follow. Yet every good data scientist understands that data preparation is an art<\/p>\n<p><a class=\"excerpt-more blog-excerpt\" href=\"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/\">Read More&#8230;<\/a><\/p>\n","protected":false},"author":1,"featured_media":4591,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_newsletter_tier_id":0,"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false}}},"categories":[62],"tags":[7,52,6,10],"jetpack_publicize_connections":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v17.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Master the Art of Data Preparation for Data Science &ndash; YOU CANalytics |<\/title>\n<meta name=\"description\" content=\"In this article, we will learn the art and science of data preparation and data preprocessing for analytics and data science.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/\" \/>\n<meta property=\"og:locale\" 
content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Master the Art of Data Preparation for Data Science &ndash; YOU CANalytics |\" \/>\n<meta property=\"og:description\" content=\"In this article, we will learn the art and science of data preparation and data preprocessing for analytics and data science.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/\" \/>\n<meta property=\"og:site_name\" content=\"YOU CANalytics |\" \/>\n<meta property=\"article:author\" content=\"roopam\" \/>\n<meta property=\"article:published_time\" content=\"2015-02-01T12:44:51+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2016-09-14T09:12:13+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/01\/photo-5.jpg?fit=273%2C448&#038;ssl=1\" \/>\n\t<meta property=\"og:image:width\" content=\"273\" \/>\n\t<meta property=\"og:image:height\" content=\"448\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Roopam Upadhyay\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Organization\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#organization\",\"name\":\"YOU CANalytics\",\"url\":\"https:\/\/ucanalytics.com\/blogs\/\",\"sameAs\":[],\"logo\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#logo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/11\/YOU-CANalytics-Logo.jpg?fit=607%2C120\",\"contentUrl\":\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/11\/YOU-CANalytics-Logo.jpg?fit=607%2C120\",\"width\":607,\"height\":120,\"caption\":\"YOU CANalytics\"},\"image\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#logo\"}},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#website\",\"url\":\"https:\/\/ucanalytics.com\/blogs\/\",\"name\":\"YOU CANalytics |\",\"description\":\"Explore the Power of Data Science\",\"publisher\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/ucanalytics.com\/blogs\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/#primaryimage\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/01\/photo-5.jpg?fit=273%2C448&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/01\/photo-5.jpg?fit=273%2C448&ssl=1\",\"width\":273,\"height\":448,\"caption\":\"Data Preparation and Master Chef - by 
Roopam\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/#webpage\",\"url\":\"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/\",\"name\":\"Master the Art of Data Preparation for Data Science &ndash; YOU CANalytics |\",\"isPartOf\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/#primaryimage\"},\"datePublished\":\"2015-02-01T12:44:51+00:00\",\"dateModified\":\"2016-09-14T09:12:13+00:00\",\"description\":\"In this article, we will learn the art and science of data preparation and data preprocessing for analytics and data science.\",\"breadcrumb\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/ucanalytics.com\/blogs\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Master the Art of Data Preparation for Data Science\"}]},{\"@type\":\"Article\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/#webpage\"},\"author\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#\/schema\/person\/55961a1cea272ecdf290cb387be069b6\"},\"headline\":\"Master the Art of Data Preparation for Data 
Science\",\"datePublished\":\"2015-02-01T12:44:51+00:00\",\"dateModified\":\"2016-09-14T09:12:13+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/#webpage\"},\"wordCount\":1273,\"commentCount\":1,\"publisher\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#organization\"},\"image\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/01\/photo-5.jpg?fit=273%2C448&ssl=1\",\"keywords\":[\"Business Analytics\",\"data preparation\",\"Predictive Analytics\",\"Roopam Upadhyay\"],\"articleSection\":[\"Analytics Tips and Tricks\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/#respond\"]}]},{\"@type\":\"Person\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#\/schema\/person\/55961a1cea272ecdf290cb387be069b6\",\"name\":\"Roopam Upadhyay\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#personlogo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/dd1aa0b0e813f7639800bcfad6a554f1?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/dd1aa0b0e813f7639800bcfad6a554f1?s=96&d=mm&r=g\",\"caption\":\"Roopam Upadhyay\"},\"description\":\"This blog contains my personal views and thoughts on predictive Analytics and big data. - Roopam Upadhyay\",\"sameAs\":[\"roopam\"],\"url\":\"https:\/\/ucanalytics.com\/blogs\/author\/roopam\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Master the Art of Data Preparation for Data Science &ndash; YOU CANalytics |","description":"In this article, we will learn the art and science of data preparation and data preprocessing for analytics and data science.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/","og_locale":"en_US","og_type":"article","og_title":"Master the Art of Data Preparation for Data Science &ndash; YOU CANalytics |","og_description":"In this article, we will learn the art and science of data preparation and data preprocessing for analytics and data science.","og_url":"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/","og_site_name":"YOU CANalytics |","article_author":"roopam","article_published_time":"2015-02-01T12:44:51+00:00","article_modified_time":"2016-09-14T09:12:13+00:00","og_image":[{"width":273,"height":448,"url":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/01\/photo-5.jpg?fit=273%2C448&ssl=1","type":"image\/jpeg"}],"twitter_misc":{"Written by":"Roopam Upadhyay","Est. 
reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Organization","@id":"https:\/\/ucanalytics.com\/blogs\/#organization","name":"YOU CANalytics","url":"https:\/\/ucanalytics.com\/blogs\/","sameAs":[],"logo":{"@type":"ImageObject","@id":"https:\/\/ucanalytics.com\/blogs\/#logo","inLanguage":"en-US","url":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/11\/YOU-CANalytics-Logo.jpg?fit=607%2C120","contentUrl":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/11\/YOU-CANalytics-Logo.jpg?fit=607%2C120","width":607,"height":120,"caption":"YOU CANalytics"},"image":{"@id":"https:\/\/ucanalytics.com\/blogs\/#logo"}},{"@type":"WebSite","@id":"https:\/\/ucanalytics.com\/blogs\/#website","url":"https:\/\/ucanalytics.com\/blogs\/","name":"YOU CANalytics |","description":"Explore the Power of Data Science","publisher":{"@id":"https:\/\/ucanalytics.com\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ucanalytics.com\/blogs\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"ImageObject","@id":"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/#primaryimage","inLanguage":"en-US","url":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/01\/photo-5.jpg?fit=273%2C448&ssl=1","contentUrl":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/01\/photo-5.jpg?fit=273%2C448&ssl=1","width":273,"height":448,"caption":"Data Preparation and Master Chef - by Roopam"},{"@type":"WebPage","@id":"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/#webpage","url":"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/","name":"Master the Art of Data Preparation for Data Science &ndash; YOU CANalytics 
|","isPartOf":{"@id":"https:\/\/ucanalytics.com\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/#primaryimage"},"datePublished":"2015-02-01T12:44:51+00:00","dateModified":"2016-09-14T09:12:13+00:00","description":"In this article, we will learn the art and science of data preparation and data preprocessing for analytics and data science.","breadcrumb":{"@id":"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/ucanalytics.com\/blogs\/"},{"@type":"ListItem","position":2,"name":"Master the Art of Data Preparation for Data Science"}]},{"@type":"Article","@id":"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/#article","isPartOf":{"@id":"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/#webpage"},"author":{"@id":"https:\/\/ucanalytics.com\/blogs\/#\/schema\/person\/55961a1cea272ecdf290cb387be069b6"},"headline":"Master the Art of Data Preparation for Data Science","datePublished":"2015-02-01T12:44:51+00:00","dateModified":"2016-09-14T09:12:13+00:00","mainEntityOfPage":{"@id":"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/#webpage"},"wordCount":1273,"commentCount":1,"publisher":{"@id":"https:\/\/ucanalytics.com\/blogs\/#organization"},"image":{"@id":"https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/#primaryimage"},"thumbnailUrl":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/01\/photo-5.jpg?fit=273%2C448&ssl=1","keywords":["Business Analytics","data 
preparation","Predictive Analytics","Roopam Upadhyay"],"articleSection":["Analytics Tips and Tricks"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/ucanalytics.com\/blogs\/master-art-data-preparation-data-science\/#respond"]}]},{"@type":"Person","@id":"https:\/\/ucanalytics.com\/blogs\/#\/schema\/person\/55961a1cea272ecdf290cb387be069b6","name":"Roopam Upadhyay","image":{"@type":"ImageObject","@id":"https:\/\/ucanalytics.com\/blogs\/#personlogo","inLanguage":"en-US","url":"https:\/\/secure.gravatar.com\/avatar\/dd1aa0b0e813f7639800bcfad6a554f1?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/dd1aa0b0e813f7639800bcfad6a554f1?s=96&d=mm&r=g","caption":"Roopam Upadhyay"},"description":"This blog contains my personal views and thoughts on predictive Analytics and big data. - Roopam Upadhyay","sameAs":["roopam"],"url":"https:\/\/ucanalytics.com\/blogs\/author\/roopam\/"}]}},"jetpack_featured_media_url":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/01\/photo-5.jpg?fit=273%2C448&ssl=1","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p3L0jT-1c2","jetpack-related-posts":[{"id":4403,"url":"https:\/\/ucanalytics.com\/blogs\/career-data-science-analytics-play-strengths\/","url_meta":{"origin":4590,"position":0},"title":"Career in Data Science and Analytics &#8211; Play to Your Strengths","author":"Roopam Upadhyay","date":false,"format":false,"excerpt":"There is a lot of exuberance about data science as a career\u00a0choice among young professionals. This exuberance (for once) is not at all irrational because the field has tons of potential. 
I have been asked on many occasions by\u00a0professionals considering a career change to data science and young graduates trying\u2026","rel":"","context":"In &quot;Analytics Tips and Tricks&quot;","block_context":{"text":"Analytics Tips and Tricks","link":"https:\/\/ucanalytics.com\/blogs\/category\/analytics-tips\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2014\/11\/photo-1.jpg?fit=640%2C433&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2014\/11\/photo-1.jpg?fit=640%2C433&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2014\/11\/photo-1.jpg?fit=640%2C433&ssl=1&resize=525%2C300 1.5x"},"classes":[]},{"id":633,"url":"https:\/\/ucanalytics.com\/blogs\/data-visualization-case-study-banking\/","url_meta":{"origin":4590,"position":1},"title":"Data Visualization &#8211; Banking Case Study Example (Part 1)","author":"Roopam Upadhyay","date":false,"format":false,"excerpt":"A Scientist & An Artist A few weeks ago while wandering around in Florence, the birthplace of the Renaissance, I could not escape the thought of Leonardo da Vinci : the greatest polymath of all times. 
Leonardo\u2019s illustrious resume contains titles such as painter, inventor, physicist, astronomer, engineer, biologist, anatomist,\u2026","rel":"","context":"In &quot;Banking Risk Case Study Example&quot;","block_context":{"text":"Banking Risk Case Study Example","link":"https:\/\/ucanalytics.com\/blogs\/category\/risk-analytics\/banking-risk-case-study-example\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2013\/09\/Data-Visualization.jpg?fit=1098%2C1059&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2013\/09\/Data-Visualization.jpg?fit=1098%2C1059&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2013\/09\/Data-Visualization.jpg?fit=1098%2C1059&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2013\/09\/Data-Visualization.jpg?fit=1098%2C1059&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2013\/09\/Data-Visualization.jpg?fit=1098%2C1059&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":11933,"url":"https:\/\/ucanalytics.com\/blogs\/how-data-science-will-shape-post-covid-banking-video-discussion\/","url_meta":{"origin":4590,"position":2},"title":"How data science will shape post-COVID banking? &#8211; Video Discussion","author":"Roopam Upadhyay","date":false,"format":false,"excerpt":"How data science will shape post-COVID banking? 
had a thought-provoking discussion with\u00a0FrankBanker: https:\/\/www.youtube.com\/watch?v=o-3uPgjSFSY&feature=youtu.be 02:01 (Part 1) Impact on variables in Credit Models; 05:33 (Part 2) Are we going back to Judgemental Lending? 07:50 (Part 3) Evaluating analytics readiness of Banks; 12:20 (Part 4) Is \u2018IT\u2019 the right place for Data Analytics? 14:15 (Part 5)\u2026","rel":"","context":"In &quot;Marketing Analytics&quot;","block_context":{"text":"Marketing Analytics","link":"https:\/\/ucanalytics.com\/blogs\/category\/marketing-analytics\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2020\/07\/Data-Analytics-for-Banking-FB.jpg?fit=788%2C444&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2020\/07\/Data-Analytics-for-Banking-FB.jpg?fit=788%2C444&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2020\/07\/Data-Analytics-for-Banking-FB.jpg?fit=788%2C444&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2020\/07\/Data-Analytics-for-Banking-FB.jpg?fit=788%2C444&ssl=1&resize=700%2C400 2x"},"classes":[]},{"id":2783,"url":"https:\/\/ucanalytics.com\/blogs\/in-conversation-with-eric-siegel-author-predictive-analytics\/","url_meta":{"origin":4590,"position":3},"title":"In Conversation with Eric Siegel: Author &#8216;Predictive Analytics&#8217;","author":"Roopam Upadhyay","date":false,"format":false,"excerpt":"In Conversation with.. Today we are starting a new series on YOU CANalytics called 'in conversation with'. In this series we will talk to the leaders and experts of predictive analytics and big data to gain deeper insight into the field. Dr. 
Eric Siegel Our first guest for the series\u2026","rel":"","context":"In &quot;Events &amp; Interviews&quot;","block_context":{"text":"Events &amp; Interviews","link":"https:\/\/ucanalytics.com\/blogs\/category\/events-and-interviews\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/12\/Slide15.jpg?fit=290%2C210&ssl=1&resize=350%2C200","width":350,"height":200},"classes":[]},{"id":6241,"url":"https:\/\/ucanalytics.com\/blogs\/4-ps-to-bring-data-science-to-boardroom-the-economic-times-business-analytics-summit\/","url_meta":{"origin":4590,"position":4},"title":"4 Ps to Bring Data Science to Boardroom @ The Economic Times Business Analytics Summit","author":"Roopam Upadhyay","date":false,"format":false,"excerpt":"A couple of\u00a0weeks ago I got an\u00a0opportunity to be a\u00a0part of a\u00a0panel discussion at 'The Economic Times Business Analytics Summit'. The topic of the\u00a0discussion was\u00a0'overcoming the challenges of bringing data science to the boardroom'.\u00a0The panel had a well-balanced representation from both industry and academia. 
It was an interesting and thought-provoking\u2026","rel":"","context":"In &quot;Events &amp; Interviews&quot;","block_context":{"text":"Events &amp; Interviews","link":"https:\/\/ucanalytics.com\/blogs\/category\/events-and-interviews\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/10\/The-Economic-Times-Business-Analytics-Summit.jpg?fit=678%2C395&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/10\/The-Economic-Times-Business-Analytics-Summit.jpg?fit=678%2C395&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/10\/The-Economic-Times-Business-Analytics-Summit.jpg?fit=678%2C395&ssl=1&resize=525%2C300 1.5x"},"classes":[]},{"id":7454,"url":"https:\/\/ucanalytics.com\/blogs\/data-science-job-interview-types-sample-questions-preparation-strategies\/","url_meta":{"origin":4590,"position":5},"title":"7 Data Science Job Interview Types, Sample Questions, and Preparation Strategies","author":"Roopam Upadhyay","date":false,"format":false,"excerpt":"Are you preparing for a data science job interview? To help you,\u00a0in this article I\u00a0will explore some of the most common techniques used by data scientists to select their future colleagues. 
Additionally, I will also share many sample questions for data science job interviews and suggest a few strategies to\u2026","rel":"","context":"In &quot;Analytics Tips and Tricks&quot;","block_context":{"text":"Analytics Tips and Tricks","link":"https:\/\/ucanalytics.com\/blogs\/category\/analytics-tips\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/01\/Data-Science-Analytics-Interview.jpg?fit=470%2C640&ssl=1&resize=350%2C200","width":350,"height":200},"classes":[]}],"_links":{"self":[{"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/posts\/4590"}],"collection":[{"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/comments?post=4590"}],"version-history":[{"count":0,"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/posts\/4590\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/media\/4591"}],"wp:attachment":[{"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/media?parent=4590"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/categories?post=4590"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/tags?post=4590"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}