{"id":9649,"date":"2017-01-31T11:26:03","date_gmt":"2017-01-31T05:56:03","guid":{"rendered":"http:\/\/ucanalytics.com\/blogs\/?p=9649"},"modified":"2017-04-29T21:43:54","modified_gmt":"2017-04-29T16:13:54","slug":"cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2","status":"publish","type":"post","link":"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/","title":{"rendered":"Cluster Analysis Puzzle : Initial Random Seeds &#8211; Learn by Doing! (Part 2)"},"content":{"rendered":"<p><a href=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Freedom.jpg\"><img data-attachment-id=\"9647\" data-permalink=\"https:\/\/ucanalytics.com\/blogs\/?attachment_id=9647\" data-orig-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Freedom.jpg?fit=323%2C534&amp;ssl=1\" data-orig-size=\"323,534\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Freedom\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Freedom.jpg?fit=181%2C300&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Freedom.jpg?fit=323%2C534&amp;ssl=1\" decoding=\"async\" loading=\"lazy\" class=\" wp-image-9647 alignright\" src=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Freedom.jpg?resize=293%2C485\" alt=\"\" width=\"293\" height=\"485\" 
srcset=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Freedom.jpg?w=323&amp;ssl=1 323w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Freedom.jpg?resize=151%2C250&amp;ssl=1 151w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Freedom.jpg?resize=181%2C300&amp;ssl=1 181w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Freedom.jpg?resize=18%2C30&amp;ssl=1 18w\" sizes=\"(max-width: 293px) 100vw, 293px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<hr \/>\n<p>This is a continuation of the\u00a0<strong><a href=\"http:\/\/ucanalytics.com\/blogs\/cluster-analysis-learn-by-doing-analytics-challenge-part-1\/\">cluster analysis puzzle<\/a>.\u00a0<\/strong>In that puzzle, we noticed different results for k-means clusters in different runs. Some of you (Emily, Ramya, Alard, and Pintu) pointed out initial random seeds as the reason for this inconsistency.<\/p>\n<p>Now, this inconsistency of results is a big problem for cluster analysis. In this article, we will collectively try to come up with solutions to this problem. I will again ask a few questions. Again, your answers will provide directions and help move things forward. So, please participate and share your thoughts in the comments section. Remember, there are no wrong answers.<\/p>\n<p>Before we move further, to think creatively about this problem, let me draw a few links between&#8230;<\/p>\n<h2><span style=\"color: #3366ff;\">Initial Random Seeds and Human Success<\/span><\/h2>\n<p>Rahul Gandhi is an Indian politician. In 2013, he made a speech for which he was mocked and criticized endlessly. He compared a concept in physics, escape velocity, with social upliftment. Let me first try to explain escape velocity. When you throw a ball up in the air, it invariably comes back to the\u00a0ground because of Earth&#8217;s gravitational pull. 
If, however, you throw the ball hard\u00a0enough for it to reach escape velocity, i.e. 11.2 km\/sec for Earth, it will fly off into space and will not return to Earth.<\/p>\n<p>In his speech, Rahul Gandhi was addressing a section of Dalits, historically an oppressed class in India. He said Dalits need Jupiter&#8217;s escape velocity to succeed in their lives on Earth. Notably, Jupiter&#8217;s escape velocity is more than 5 times the escape velocity of Earth. In short, Dalits need to work much harder to succeed than other communities. Rahul Gandhi&#8217;s detractors didn&#8217;t like his argument, as they believe that in modern India everyone has an equal opportunity to succeed.<\/p>\n<h4><span style=\"color: #3366ff;\">Does Initial Disadvantage Persist? &#8211; Random Seeds &amp; Cluster Analysis<\/span><\/h4>\n<p>Let&#8217;s evaluate the argument about different escape velocities for different people. To do so, we will look at some evidence from recent research in sociology and economics. As this is a data science blog, I must present the data from a couple of research papers. The first paper, from\u00a0<strong><a href=\"http:\/\/scholar.harvard.edu\/files\/lkatz\/files\/aer.103.3_fryer_katz_pp_2013_all_0.pdf?m=1369069030\">Harvard University<\/a>,\u00a0<\/strong>estimates the influence of neighborhood on children&#8217;s test scores. 
Another paper, from the\u00a0<strong><a href=\"http:\/\/www.nber.org\/papers\/w16381.pdf\">National Bureau of Economic Research, Massachusetts<\/a>,<\/strong><strong>\u00a0<\/strong>estimates the influence of a child&#8217;s kindergarten test scores on their salary as an adult.<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Statistics-Social-Justice-1.jpg\"><img data-attachment-id=\"9660\" data-permalink=\"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/statistics-social-justice-1\/\" data-orig-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Statistics-Social-Justice-1.jpg?fit=1198%2C464&amp;ssl=1\" data-orig-size=\"1198,464\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Statistics &#8211; Social Justice 1\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Statistics-Social-Justice-1.jpg?fit=300%2C116&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Statistics-Social-Justice-1.jpg?fit=640%2C248&amp;ssl=1\" decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-full wp-image-9660\" src=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Statistics-Social-Justice-1.jpg?resize=640%2C248\" alt=\"\" width=\"640\" height=\"248\" srcset=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Statistics-Social-Justice-1.jpg?w=1198&amp;ssl=1 
1198w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Statistics-Social-Justice-1.jpg?resize=250%2C97&amp;ssl=1 250w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Statistics-Social-Justice-1.jpg?resize=300%2C116&amp;ssl=1 300w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Statistics-Social-Justice-1.jpg?resize=768%2C297&amp;ssl=1 768w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Statistics-Social-Justice-1.jpg?resize=1024%2C397&amp;ssl=1 1024w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Statistics-Social-Justice-1.jpg?resize=30%2C12&amp;ssl=1 30w\" sizes=\"(max-width: 640px) 100vw, 640px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p>These results show that an initial disadvantage in life &#8211; upbringing and neighborhood &#8211; persists into adulthood and shows up in salaries. This means a person brought up\u00a0in a poor neighborhood has to work extra hard to get the same salary as a person brought up\u00a0in a middle-class neighborhood.<\/p>\n<h4><span style=\"color: #3366ff;\">Nobody Conspires &#8211; Initial Random Seeds &amp; Cluster Analysis<\/span><\/h4>\n<p>It may seem that society always deliberately conspires against and oppresses certain communities and individuals. Malcolm Gladwell, however, argues that sometimes this apparent conspiracy is completely random and unintentional. In his fascinating book Outliers: The Story of Success, he highlighted the research work of Roger Barnsley about players in the Canadian Hockey League. 
Barnsley observed the following patterns in the birth months of the players in the hockey league.<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Canadian-Hockey-League.jpg\"><img data-attachment-id=\"9683\" data-permalink=\"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/canadian-hockey-league\/\" data-orig-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Canadian-Hockey-League.jpg?fit=1030%2C425&amp;ssl=1\" data-orig-size=\"1030,425\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Canadian Hockey League\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Canadian-Hockey-League.jpg?fit=300%2C124&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Canadian-Hockey-League.jpg?fit=640%2C264&amp;ssl=1\" decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-full wp-image-9683\" src=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Canadian-Hockey-League.jpg?resize=640%2C264\" alt=\"\" width=\"640\" height=\"264\" srcset=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Canadian-Hockey-League.jpg?w=1030&amp;ssl=1 1030w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Canadian-Hockey-League.jpg?resize=250%2C103&amp;ssl=1 250w, 
https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Canadian-Hockey-League.jpg?resize=300%2C124&amp;ssl=1 300w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Canadian-Hockey-League.jpg?resize=768%2C317&amp;ssl=1 768w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Canadian-Hockey-League.jpg?resize=1024%2C423&amp;ssl=1 1024w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Canadian-Hockey-League.jpg?resize=30%2C12&amp;ssl=1 30w\" sizes=\"(max-width: 640px) 100vw, 640px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p>One would expect an even distribution of players&#8217; birth months, i.e. close to 25% in each of the four quarters. However, there are 4 times as many players born in the January-March quarter as in October-December. There is also a consistent downward trend. What is going on here? First of all, astrology has nothing to do with these anomalous results, i.e. Aquarians are not better hockey players than Sagittarians.<\/p>\n<p>Apparently, this discrepancy is the result of an innocent decision by the hockey association while setting the cutoff date for each age group. Importantly, the eligibility cutoff for age-class hockey (where the boys are grouped according to age) is the 1st of January. This means that everyone born in the year 2008 is counted as a 9-year-old in 2017. However, boys born in January 2008, although they are 9-year-olds on paper, are actually almost a year older than boys born in December 2008. This difference of nearly a year in age is hugely significant, as it is like a 10-year-old competing against a 9-year-old. Clearly, all else being equal, the boys born in January have a better chance of being selected for better training and facilities. Hence, the initial advantage for these boys persists into their careers as adult hockey players. 
This is similar to the effect of initial random seeds in cluster analysis.<\/p>\n<h2><span style=\"color: #3366ff;\">Initial Random Seeds and Cluster Analysis<\/span><\/h2>\n<p>In the previous part of this\u00a0<strong><a href=\"http:\/\/ucanalytics.com\/blogs\/cluster-analysis-learn-by-doing-analytics-challenge-part-1\/\">cluster analysis puzzle<\/a>, <\/strong>we ran k-means clustering with two different initial random seeds and got completely different results. Remember, we had set the number of clusters to 3, i.e. k=3. Essentially, the algorithm started from 3 randomly chosen points (the initial cluster centroids) on the XY-plane. These centroids were then moved iteratively until they settled into the final clusters. Read this\u00a0<strong><a href=\"http:\/\/ucanalytics.com\/blogs\/customer-segmentation-cluster-analysis-telecom-case-study-example\/\">cluster analysis case study<\/a>\u00a0<\/strong>to understand this iterative process.<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/cluster-analysis-with-different-initial-random-seeds.jpg\"><img data-attachment-id=\"9746\" data-permalink=\"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/cluster-analysis-with-different-initial-random-seeds\/\" data-orig-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/cluster-analysis-with-different-initial-random-seeds.jpg?fit=1946%2C663&amp;ssl=1\" data-orig-size=\"1946,663\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cluster analysis with different initial random 
seeds\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/cluster-analysis-with-different-initial-random-seeds.jpg?fit=300%2C102&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/cluster-analysis-with-different-initial-random-seeds.jpg?fit=640%2C218&amp;ssl=1\" decoding=\"async\" loading=\"lazy\" class=\"aligncenter size-full wp-image-9746\" src=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/cluster-analysis-with-different-initial-random-seeds.jpg?resize=640%2C218\" alt=\"\" width=\"640\" height=\"218\" srcset=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/cluster-analysis-with-different-initial-random-seeds.jpg?w=1946&amp;ssl=1 1946w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/cluster-analysis-with-different-initial-random-seeds.jpg?resize=250%2C85&amp;ssl=1 250w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/cluster-analysis-with-different-initial-random-seeds.jpg?resize=300%2C102&amp;ssl=1 300w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/cluster-analysis-with-different-initial-random-seeds.jpg?resize=768%2C262&amp;ssl=1 768w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/cluster-analysis-with-different-initial-random-seeds.jpg?resize=1024%2C349&amp;ssl=1 1024w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/cluster-analysis-with-different-initial-random-seeds.jpg?resize=30%2C10&amp;ssl=1 30w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/cluster-analysis-with-different-initial-random-seeds.jpg?w=1280 1280w\" sizes=\"(max-width: 640px) 100vw, 640px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p>I am sure by now, you must have figured that the positions of initial random 
seeds are responsible for these different results. This is similar to how people&#8217;s success depends on their initial circumstances.<\/p>\n<h2><span style=\"color: #3366ff;\">Human Success and K-Means Optimization<\/span><\/h2>\n<blockquote><p><em>The two most important requirements for major success are: first, being in the right place at the right time, and second, doing something about it.<\/em><\/p>\n<p style=\"text-align: right;\">&#8211; Ray Kroc<\/p>\n<\/blockquote>\n<p>Imagine that, as humans, we were allowed to live multiple lives in different eras, with different surroundings and circumstances. In all these different lives we would have different levels of success. In this case, a good question is: how would one define and measure success? Is having wealth or having friends a good measure of success? Is it about achieving power or achieving peace in life?<\/p>\n<p><a href=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Cluster-analysis-challenge-1-e1483773216799.jpeg\"><img data-attachment-id=\"9578\" data-permalink=\"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-learn-by-doing-analytics-challenge-part-1\/cluster-analysis-challenge-1\/\" data-orig-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Cluster-analysis-challenge-1-e1483773216799.jpeg?fit=700%2C447&amp;ssl=1\" data-orig-size=\"700,447\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Cluster analysis challenge 1\" data-image-description=\"\" data-image-caption=\"\" 
data-medium-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Cluster-analysis-challenge-1-e1483773216799.jpeg?fit=300%2C192&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Cluster-analysis-challenge-1-e1483773216799.jpeg?fit=640%2C409&amp;ssl=1\" decoding=\"async\" loading=\"lazy\" class=\"wp-image-9578 alignright\" src=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Cluster-analysis-challenge-1-e1483773216799.jpeg?resize=316%2C202\" alt=\"\" width=\"316\" height=\"202\" srcset=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Cluster-analysis-challenge-1-e1483773216799.jpeg?w=700&amp;ssl=1 700w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Cluster-analysis-challenge-1-e1483773216799.jpeg?resize=250%2C160&amp;ssl=1 250w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Cluster-analysis-challenge-1-e1483773216799.jpeg?resize=300%2C192&amp;ssl=1 300w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Cluster-analysis-challenge-1-e1483773216799.jpeg?resize=30%2C19&amp;ssl=1 30w\" sizes=\"(max-width: 316px) 100vw, 316px\" data-recalc-dims=\"1\" \/><\/a>This is essentially what we will do with the k-means clustering algorithm to produce consistent results. In short, we will make cluster centroids live multiple lives and choose the most successful centroids. The essential thing here is to define success. Note that, just as different people can have different definitions\u00a0of success in life, the definition of successful cluster centroids can also differ depending on the problem at hand. Also remember, using the textbook definition of successful clusters is like using someone else&#8217;s definition of success for your life. 
It may or may not work.<\/p>\n<h4><span style=\"color: #3366ff;\">A Few Questions<\/span><\/h4>\n<p>On this note, let me pose a few questions for you to mull over.<\/p>\n<table style=\"width: 653px; height: 120px; border-color: #000000; background-color: #b8e0fc;\">\n<tbody>\n<tr>\n<td style=\"width: 644.4px;\">1) What are the success factors that you will consider to define perfect clusters? I suggest you look at the data we are working with to define your factors in plain and simple English.<\/p>\n<p>2) Think of ways to measure your success factors for perfect cluster centroids and report your results.<\/p>\n<p>3) Can you think of reasons why your factors might fail to generate successful clusters on some other datasets?<\/p>\n<p>4) Finally, how will you spread the initial random seeds across the data to maximize their chances of success? Don&#8217;t forget that your initial seeds need to achieve success fast as well.<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<h4><span style=\"color: #3366ff;\">R Code for Multiple Lives of Cluster Centers<\/span><\/h4>\n<p>Many great minds have pondered the same questions that you were trying to answer in the previous section. They have come up with some good strategies. However, these questions are still open to better and more creative answers. I would love to hear your solutions. We will use a package in R to make the centroids live many lives. First, let&#8217;s download our original data through the following line of code.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n# Read the comma-separated puzzle data from the blog\r\na = read.delim('http:\/\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Cluster-Analysis-Data.txt',sep = &quot;,&quot;,header = TRUE)<\/pre>\n<p>Now, we will run the\u00a0k-means clustering algorithm 100 times and keep the most successful centroids. We will use the\u00a0fpc package for this with the following lines of code. 
Notice, we have defined the value of k = 3 in the kmeansruns function.<\/p>\n<pre class=\"brush: r; first-line: 2; title: ; notranslate\" title=\"\">\r\n# Install fpc once, then load it\r\ninstall.packages(&quot;fpc&quot;)\r\nlibrary(fpc)\r\n# Keep the best of 100 k-means runs with k = 3\r\nkm = kmeansruns(a, krange = 3, runs = 100)\r\nplot(a, col = km$cluster, pch = 16)\r\n# Legend colors 1:3 match the cluster numbers used to color the points\r\nlegend(-3, 23, c('cluster 1','cluster 2','cluster 3'), pch = 16, col = 1:3)\r\n<\/pre>\n<p>You can run this code multiple times, and each time you should get essentially the same clusters (the cluster numbers, and hence the colors, may be swapped between runs). The effect of initial random seeds is largely neutralized by the kmeansruns function. These are the consistent results from multiple runs of the k-means algorithm.<br \/>\n<a href=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/cluster-analysis-challenge-2-e1483773283742.jpeg\"><img data-attachment-id=\"9579\" data-permalink=\"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-learn-by-doing-analytics-challenge-part-1\/cluster-analysis-challenge-2\/\" data-orig-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/cluster-analysis-challenge-2-e1483773283742.jpeg?fit=693%2C444&amp;ssl=1\" data-orig-size=\"693,444\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"cluster analysis challenge 2\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/cluster-analysis-challenge-2-e1483773283742.jpeg?fit=300%2C192&amp;ssl=1\" 
data-large-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/cluster-analysis-challenge-2-e1483773283742.jpeg?fit=640%2C410&amp;ssl=1\" decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-9579\" src=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/cluster-analysis-challenge-2-e1483773283742.jpeg?resize=585%2C374\" alt=\"\" width=\"585\" height=\"374\" srcset=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/cluster-analysis-challenge-2-e1483773283742.jpeg?w=693&amp;ssl=1 693w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/cluster-analysis-challenge-2-e1483773283742.jpeg?resize=250%2C160&amp;ssl=1 250w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/cluster-analysis-challenge-2-e1483773283742.jpeg?resize=300%2C192&amp;ssl=1 300w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/cluster-analysis-challenge-2-e1483773283742.jpeg?resize=30%2C19&amp;ssl=1 30w\" sizes=\"(max-width: 585px) 100vw, 585px\" data-recalc-dims=\"1\" \/><\/a><\/p>\n<p>This brings us to another question.<\/p>\n<table style=\"width: 652px; height: 69px; border-color: #000000; background-color: #b8e0fc;\">\n<tbody>\n<tr>\n<td style=\"width: 643.6px;\">5) What factors does kmeansruns\u00a0use\u00a0to measure the success of cluster centroids?<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Also, you can run this piece of code, in which you let kmeansruns\u00a0identify the\u00a0appropriate value of k by trying values from 2 to 10.<\/p>\n<pre class=\"brush: r; first-line: 7; title: ; notranslate\" title=\"\">kmeans1=kmeansruns(a,krange = 2:10,runs=100)<\/pre>\n<p>Finally, to reinforce your learning of k-means clustering, I suggest you simulate some more datasets and use\u00a0kmeansruns\u00a0with different settings. I am sure you will find some surprising results. 
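<\/p>\n<p>As one possible starting point for such experiments, here is a minimal sketch in base R. It is an illustration under stated assumptions rather than part of the original puzzle: the simulated data (three Gaussian blobs) and the object names sim, single, and multi are made up, and it uses the nstart argument of the base kmeans function instead of fpc. It compares the total within-cluster sum of squares from ten single-start runs with the result of 100 random starts.<\/p>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\n# Hypothetical simulated data: three 2-D Gaussian blobs (not the puzzle dataset)\r\nset.seed(42)\r\nsim = rbind(\r\n  matrix(rnorm(100, mean = 0, sd = 1), ncol = 2),\r\n  matrix(rnorm(100, mean = 5, sd = 1), ncol = 2),\r\n  cbind(rnorm(50, mean = 0, sd = 1), rnorm(50, mean = 5, sd = 1))\r\n)\r\ncolnames(sim) = c('x', 'y')\r\n\r\n# One random start per seed: the total within-cluster sum of squares can differ between seeds\r\nsingle = sapply(1:10, function(s) {\r\n  set.seed(s)\r\n  kmeans(sim, centers = 3, nstart = 1)$tot.withinss\r\n})\r\nprint(round(single, 1))\r\n\r\n# 100 random starts: kmeans keeps the start with the lowest total within-cluster sum of squares\r\nmulti = kmeans(sim, centers = 3, nstart = 100)\r\nprint(round(multi$tot.withinss, 1))\r\n<\/pre>\n<p>If the ten single-start values are not all identical, you are seeing the effect of initial random seeds directly; the multi-start value will usually match the smallest of them. 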
Let us know what you found.<\/p>\n<h4><span style=\"color: #3366ff;\">Sign-off Note<\/span><\/h4>\n<p>Oh, I wish that, as with cluster optimization, we could create the most successful versions of ourselves by living in different times, places, and circumstances. Nature is not so kind, but she has empowered us with a crucial thing: the ability to define the measure of one&#8217;s\u00a0own success in life. Don&#8217;t follow other people&#8217;s measure of success as your own. It may or may not work for you.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is a continuation of the\u00a0cluster analysis puzzle.\u00a0In this puzzle, we had noticed different results for k-mean clusters in different runs. Some of you (Emily. Ramya, Alard, and Pintu) have pointed out initial random seeds as the reason for this inconsistency. Now, this inconsistency of results is a big problem for cluster analysis. In this<\/p>\n<p><a class=\"excerpt-more blog-excerpt\" href=\"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/\">Read More&#8230;<\/a><\/p>\n","protected":false},"author":1,"featured_media":9648,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_newsletter_tier_id":0,"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false}}},"categories":[75,81],"tags":[],"jetpack_publicize_connections":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v17.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Cluster Analysis Puzzle : Initial Random Seeds - Learn by Doing! 
(Part 2) &ndash; YOU CANalytics |<\/title>\n<meta name=\"description\" content=\"This is a continuation of cluster analysis puzzle, in this part we will measure the significance of initial random seeds on formation of perfect clusters.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Cluster Analysis Puzzle : Initial Random Seeds - Learn by Doing! (Part 2) &ndash; YOU CANalytics |\" \/>\n<meta property=\"og:description\" content=\"This is a continuation of cluster analysis puzzle, in this part we will measure the significance of initial random seeds on formation of perfect clusters.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/\" \/>\n<meta property=\"og:site_name\" content=\"YOU CANalytics |\" \/>\n<meta property=\"article:author\" content=\"roopam\" \/>\n<meta property=\"article:published_time\" content=\"2017-01-31T05:56:03+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2017-04-29T16:13:54+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Freedom-1.jpg?fit=632%2C372&#038;ssl=1\" \/>\n\t<meta property=\"og:image:width\" content=\"632\" \/>\n\t<meta property=\"og:image:height\" content=\"372\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Roopam Upadhyay\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Organization\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#organization\",\"name\":\"YOU CANalytics\",\"url\":\"https:\/\/ucanalytics.com\/blogs\/\",\"sameAs\":[],\"logo\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#logo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/11\/YOU-CANalytics-Logo.jpg?fit=607%2C120\",\"contentUrl\":\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/11\/YOU-CANalytics-Logo.jpg?fit=607%2C120\",\"width\":607,\"height\":120,\"caption\":\"YOU CANalytics\"},\"image\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#logo\"}},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#website\",\"url\":\"https:\/\/ucanalytics.com\/blogs\/\",\"name\":\"YOU CANalytics |\",\"description\":\"Explore the Power of Data Science\",\"publisher\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/ucanalytics.com\/blogs\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/#primaryimage\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Freedom-1.jpg?fit=632%2C372&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Freedom-1.jpg?fit=632%2C372&ssl=1\",\"width\":632,\"height\":372},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/#webpage\",\"url\":\"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/\",\"name\":\"Cluster Analysis Puzzle : Initial Random Seeds - Learn by Doing! (Part 2) &ndash; YOU CANalytics |\",\"isPartOf\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/#primaryimage\"},\"datePublished\":\"2017-01-31T05:56:03+00:00\",\"dateModified\":\"2017-04-29T16:13:54+00:00\",\"description\":\"This is a continuation of cluster analysis puzzle, in this part we will measure the significance of initial random seeds on formation of perfect 
clusters.\",\"breadcrumb\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/ucanalytics.com\/blogs\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Cluster Analysis Puzzle : Initial Random Seeds &#8211; Learn by Doing! (Part 2)\"}]},{\"@type\":\"Article\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/#webpage\"},\"author\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#\/schema\/person\/55961a1cea272ecdf290cb387be069b6\"},\"headline\":\"Cluster Analysis Puzzle : Initial Random Seeds &#8211; Learn by Doing! 
(Part 2)\",\"datePublished\":\"2017-01-31T05:56:03+00:00\",\"dateModified\":\"2017-04-29T16:13:54+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/#webpage\"},\"wordCount\":1546,\"commentCount\":1,\"publisher\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#organization\"},\"image\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Freedom-1.jpg?fit=632%2C372&ssl=1\",\"articleSection\":[\"Analytics Challenge\",\"Cluster Analysis - Analytics Challenge\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/#respond\"]}]},{\"@type\":\"Person\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#\/schema\/person\/55961a1cea272ecdf290cb387be069b6\",\"name\":\"Roopam Upadhyay\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#personlogo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/dd1aa0b0e813f7639800bcfad6a554f1?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/dd1aa0b0e813f7639800bcfad6a554f1?s=96&d=mm&r=g\",\"caption\":\"Roopam Upadhyay\"},\"description\":\"This blog contains my personal views and thoughts on predictive Analytics and big data. - Roopam Upadhyay\",\"sameAs\":[\"roopam\"],\"url\":\"https:\/\/ucanalytics.com\/blogs\/author\/roopam\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Cluster Analysis Puzzle : Initial Random Seeds - Learn by Doing! 
(Part 2) &ndash; YOU CANalytics |","description":"This is a continuation of cluster analysis puzzle, in this part we will measure the significance of initial random seeds on formation of perfect clusters.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/","og_locale":"en_US","og_type":"article","og_title":"Cluster Analysis Puzzle : Initial Random Seeds - Learn by Doing! (Part 2) &ndash; YOU CANalytics |","og_description":"This is a continuation of cluster analysis puzzle, in this part we will measure the significance of initial random seeds on formation of perfect clusters.","og_url":"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/","og_site_name":"YOU CANalytics |","article_author":"roopam","article_published_time":"2017-01-31T05:56:03+00:00","article_modified_time":"2017-04-29T16:13:54+00:00","og_image":[{"width":632,"height":372,"url":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Freedom-1.jpg?fit=632%2C372&ssl=1","type":"image\/jpeg"}],"twitter_misc":{"Written by":"Roopam Upadhyay","Est. 
reading time":"8 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Organization","@id":"https:\/\/ucanalytics.com\/blogs\/#organization","name":"YOU CANalytics","url":"https:\/\/ucanalytics.com\/blogs\/","sameAs":[],"logo":{"@type":"ImageObject","@id":"https:\/\/ucanalytics.com\/blogs\/#logo","inLanguage":"en-US","url":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/11\/YOU-CANalytics-Logo.jpg?fit=607%2C120","contentUrl":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/11\/YOU-CANalytics-Logo.jpg?fit=607%2C120","width":607,"height":120,"caption":"YOU CANalytics"},"image":{"@id":"https:\/\/ucanalytics.com\/blogs\/#logo"}},{"@type":"WebSite","@id":"https:\/\/ucanalytics.com\/blogs\/#website","url":"https:\/\/ucanalytics.com\/blogs\/","name":"YOU CANalytics |","description":"Explore the Power of Data Science","publisher":{"@id":"https:\/\/ucanalytics.com\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ucanalytics.com\/blogs\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"ImageObject","@id":"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/#primaryimage","inLanguage":"en-US","url":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Freedom-1.jpg?fit=632%2C372&ssl=1","contentUrl":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Freedom-1.jpg?fit=632%2C372&ssl=1","width":632,"height":372},{"@type":"WebPage","@id":"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/#webpage","url":"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/","name":"Cluster Analysis Puzzle : Initial Random Seeds - Learn by Doing! 
(Part 2) &ndash; YOU CANalytics |","isPartOf":{"@id":"https:\/\/ucanalytics.com\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/#primaryimage"},"datePublished":"2017-01-31T05:56:03+00:00","dateModified":"2017-04-29T16:13:54+00:00","description":"This is a continuation of cluster analysis puzzle, in this part we will measure the significance of initial random seeds on formation of perfect clusters.","breadcrumb":{"@id":"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/ucanalytics.com\/blogs\/"},{"@type":"ListItem","position":2,"name":"Cluster Analysis Puzzle : Initial Random Seeds &#8211; Learn by Doing! (Part 2)"}]},{"@type":"Article","@id":"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/#article","isPartOf":{"@id":"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/#webpage"},"author":{"@id":"https:\/\/ucanalytics.com\/blogs\/#\/schema\/person\/55961a1cea272ecdf290cb387be069b6"},"headline":"Cluster Analysis Puzzle : Initial Random Seeds &#8211; Learn by Doing! 
(Part 2)","datePublished":"2017-01-31T05:56:03+00:00","dateModified":"2017-04-29T16:13:54+00:00","mainEntityOfPage":{"@id":"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/#webpage"},"wordCount":1546,"commentCount":1,"publisher":{"@id":"https:\/\/ucanalytics.com\/blogs\/#organization"},"image":{"@id":"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/#primaryimage"},"thumbnailUrl":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Freedom-1.jpg?fit=632%2C372&ssl=1","articleSection":["Analytics Challenge","Cluster Analysis - Analytics Challenge"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/ucanalytics.com\/blogs\/cluster-analysis-puzzle-initial-random-seeds-learn-by-doing-part-2\/#respond"]}]},{"@type":"Person","@id":"https:\/\/ucanalytics.com\/blogs\/#\/schema\/person\/55961a1cea272ecdf290cb387be069b6","name":"Roopam Upadhyay","image":{"@type":"ImageObject","@id":"https:\/\/ucanalytics.com\/blogs\/#personlogo","inLanguage":"en-US","url":"https:\/\/secure.gravatar.com\/avatar\/dd1aa0b0e813f7639800bcfad6a554f1?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/dd1aa0b0e813f7639800bcfad6a554f1?s=96&d=mm&r=g","caption":"Roopam Upadhyay"},"description":"This blog contains my personal views and thoughts on predictive Analytics and big data. 
- Roopam Upadhyay","sameAs":["roopam"],"url":"https:\/\/ucanalytics.com\/blogs\/author\/roopam\/"}]}},"jetpack_featured_media_url":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Freedom-1.jpg?fit=632%2C372&ssl=1","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p3L0jT-2vD","jetpack-related-posts":[{"id":9519,"url":"https:\/\/ucanalytics.com\/blogs\/cluster-analysis-learn-by-doing-analytics-challenge-part-1\/","url_meta":{"origin":9649,"position":0},"title":"Cluster Analysis Puzzle &#8211; Learn by Doing! (Part 1)","author":"Roopam Upadhyay","date":false,"format":false,"excerpt":"Cluster analysis is a powerful analytical technique to group or segment identical elements i.e. customers, products etc. In this series of articles, you will explore nuances of cluster analysis and its applications. Analytics challenges, on YOU CANalytics, are designed like puzzles where your participation is extremely important to move things\u2026","rel":"","context":"In &quot;Analytics Challenge&quot;","block_context":{"text":"Analytics Challenge","link":"https:\/\/ucanalytics.com\/blogs\/category\/analytics-challenge\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2017\/01\/Twins-and-Cluster-Analysis-1.jpg?fit=427%2C233&ssl=1&resize=350%2C200","width":350,"height":200},"classes":[]},{"id":1385,"url":"https:\/\/ucanalytics.com\/blogs\/customer-segmentation-outliers-telecom-case-study-part-3\/","url_meta":{"origin":9649,"position":1},"title":"Cluster Analysis and Outliers \u2013 Telecom Case Study Example (Part 3)","author":"Roopam Upadhyay","date":false,"format":false,"excerpt":"Outliers \"I refuse to join any club that would have me as a member.\" - Groucho Marx This witty statement came from (according to me) one of the funniest men in the history of American cinema \u2013 Julius Henry Marx better known as Groucho Marx. 
Groucho was certainly a very\u2026","rel":"","context":"In &quot;Marketing Analytics&quot;","block_context":{"text":"Marketing Analytics","link":"https:\/\/ucanalytics.com\/blogs\/category\/marketing-analytics\/"},"img":{"alt_text":"Groucho - by Roopam","src":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2014\/01\/photo.jpg?fit=768%2C1024&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2014\/01\/photo.jpg?fit=768%2C1024&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2014\/01\/photo.jpg?fit=768%2C1024&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2014\/01\/photo.jpg?fit=768%2C1024&ssl=1&resize=700%2C400 2x"},"classes":[]},{"id":1532,"url":"https:\/\/ucanalytics.com\/blogs\/customer-segmentation\/","url_meta":{"origin":9649,"position":2},"title":"Telecom Case (Part 4) &#8211; Customer Segmentation and Application","author":"Roopam Upadhyay","date":false,"format":false,"excerpt":"Telecom Case Study \u2013 Customer Segmentation For the last few articles we have been working on a telecom case study to create customer segments (Part 1, Part 2 and Part 3). In this case, you are the head of customer insights and marketing at a telecom company, ConnectFast Inc. 
Recall,\u2026","rel":"","context":"In &quot;Marketing Analytics&quot;","block_context":{"text":"Marketing Analytics","link":"https:\/\/ucanalytics.com\/blogs\/category\/marketing-analytics\/"},"img":{"alt_text":"Customer Segmentation - by Roopam","src":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2014\/02\/photo1.jpg?fit=742%2C1024&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2014\/02\/photo1.jpg?fit=742%2C1024&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2014\/02\/photo1.jpg?fit=742%2C1024&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2014\/02\/photo1.jpg?fit=742%2C1024&ssl=1&resize=700%2C400 2x"},"classes":[]},{"id":1116,"url":"https:\/\/ucanalytics.com\/blogs\/customer-segmentation-cluster-analysis-telecom-case-study-example\/","url_meta":{"origin":9649,"position":3},"title":"Customer Segmentation &#038; Cluster Analysis &#8211; Telecom Case Study Example (Part 1)","author":"Roopam Upadhyay","date":false,"format":false,"excerpt":"Galaxies and Cluster Analysis I live in Mumbai (Bombay), the financial capital of India and one of the largest cities in the world. One of the problems of living in a large city is that you rarely see stars in the night sky. 
The limited sky one can see through\u2026","rel":"","context":"In &quot;Marketing Analytics&quot;","block_context":{"text":"Marketing Analytics","link":"https:\/\/ucanalytics.com\/blogs\/category\/marketing-analytics\/"},"img":{"alt_text":"The Night Sky - by Roopam","src":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2013\/11\/sky-1.jpg?fit=768%2C1024&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2013\/11\/sky-1.jpg?fit=768%2C1024&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2013\/11\/sky-1.jpg?fit=768%2C1024&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2013\/11\/sky-1.jpg?fit=768%2C1024&ssl=1&resize=700%2C400 2x"},"classes":[]},{"id":1259,"url":"https:\/\/ucanalytics.com\/blogs\/customer-segmentation-cluster-analysis-telecom-case-study-part-2\/","url_meta":{"origin":9649,"position":4},"title":"Customer Segmentation &#038; Cluster Analysis \u2013 Telecom Case Study Example(Part 2)","author":"Roopam Upadhyay","date":false,"format":false,"excerpt":"In one of\u00a0the previous articles, we have started with a case study example from the telecom sector. We learned about cluster analysis using black holes as an analogy. In that article, we used Euclidean distance to form customer segments. 
Let us continue with the same case study and learn about\u2026","rel":"","context":"In &quot;Marketing Analytics&quot;","block_context":{"text":"Marketing Analytics","link":"https:\/\/ucanalytics.com\/blogs\/category\/marketing-analytics\/"},"img":{"alt_text":"Euclid - by Roopam","src":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2013\/12\/unnamed.jpg?fit=524%2C615&ssl=1&resize=350%2C200","width":350,"height":200},"classes":[]},{"id":7324,"url":"https:\/\/ucanalytics.com\/blogs\/business-process-optimization-call-center-case-study-example-part-2\/","url_meta":{"origin":9649,"position":5},"title":"Process Optimization &#038; Real Time Analytics  &#8211; Case Study Example (Part 2)","author":"Roopam Upadhyay","date":false,"format":false,"excerpt":"This is a continuation of the case study example for\u00a0optimization of a business process. You can find the previous part at\u00a0this link. In this case study example, you are helping a company reduce their process turn-around-time through advanced analytics and data science. 
You will use\u00a0some of the key data science\u2026","rel":"","context":"In &quot;Call Center Case Study Example&quot;","block_context":{"text":"Call Center Case Study Example","link":"https:\/\/ucanalytics.com\/blogs\/category\/call-center-case-study-example\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/12\/Jigsaw-Yin-Yang.jpg?fit=445%2C336&ssl=1&resize=350%2C200","width":350,"height":200},"classes":[]}],"_links":{"self":[{"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/posts\/9649"}],"collection":[{"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/comments?post=9649"}],"version-history":[{"count":0,"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/posts\/9649\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/media\/9648"}],"wp:attachment":[{"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/media?parent=9649"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/categories?post=9649"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/tags?post=9649"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}