{"id":9018,"date":"2016-10-09T13:44:47","date_gmt":"2016-10-09T08:14:47","guid":{"rendered":"http:\/\/ucanalytics.com\/blogs\/?p=9018"},"modified":"2017-04-29T21:37:28","modified_gmt":"2017-04-29T16:07:28","slug":"step-step-regression-models-pricing-case-study-example-part-5","status":"publish","type":"post","link":"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/","title":{"rendered":"Step by Step Regression Modeling Using Principal Component Analysis &#8211; Case Study Example (Part 5)"},"content":{"rendered":"<hr \/>\n<div id=\"attachment_9017\" style=\"width: 928px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/09\/Sumo-and-Regression-Model.jpg\"><img aria-describedby=\"caption-attachment-9017\" data-attachment-id=\"9017\" data-permalink=\"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/sumo-and-regression-model\/\" data-orig-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/09\/Sumo-and-Regression-Model.jpg?fit=918%2C384&amp;ssl=1\" data-orig-size=\"918,384\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"sumo-and-regression-model\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/09\/Sumo-and-Regression-Model.jpg?fit=300%2C125&amp;ssl=1\" 
data-large-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/09\/Sumo-and-Regression-Model.jpg?fit=640%2C268&amp;ssl=1\" decoding=\"async\" loading=\"lazy\" class=\"wp-image-9017 size-full\" src=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/09\/Sumo-and-Regression-Model.jpg?resize=640%2C268\" alt=\"sumo-and-regression-model\" width=\"640\" height=\"268\" srcset=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/09\/Sumo-and-Regression-Model.jpg?w=918&amp;ssl=1 918w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/09\/Sumo-and-Regression-Model.jpg?resize=250%2C105&amp;ssl=1 250w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/09\/Sumo-and-Regression-Model.jpg?resize=300%2C125&amp;ssl=1 300w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/09\/Sumo-and-Regression-Model.jpg?resize=768%2C321&amp;ssl=1 768w\" sizes=\"(max-width: 640px) 100vw, 640px\" data-recalc-dims=\"1\" \/><\/a><p id=\"caption-attachment-9017\" class=\"wp-caption-text\">Regression Models and Information Asymmetry &#8211; by Roopam<\/p><\/div>\n<p>This is a continuation of our case study example to estimate property pricing. In this part, you will learn the nuances of regression modeling by building three different regression models and comparing their results.\u00a0We will also use the results of the <strong><a href=\"http:\/\/ucanalytics.com\/blogs\/principal-component-analysis-step-step-guide-r-regression-case-study-example-part-4\/\">principal component analysis<\/a><\/strong>, discussed in the last part, to develop a regression model. 
You can find all the parts of this case study at the following link: <strong><a href=\"http:\/\/ucanalytics.com\/blogs\/category\/pricing-case-study-example\/\">regression analysis case study example<\/a>.<\/strong><\/p>\n<p>However, before we start building regression models, let me highlight the importance of information in pricing and explain how data science &amp; regression create a level playing field by eliminating information asymmetry.<\/p>\n<h2><span style=\"color: #3366ff;\">Information Asymmetry &amp; Regression Models<\/span><\/h2>\n<p><a href=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/10\/KKK.jpg\"><img data-attachment-id=\"9069\" data-permalink=\"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/kkk\/\" data-orig-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/10\/KKK.jpg?fit=374%2C645&amp;ssl=1\" data-orig-size=\"374,645\" data-comments-opened=\"1\" data-image-meta=\"{&quot;aperture&quot;:&quot;0&quot;,&quot;credit&quot;:&quot;&quot;,&quot;camera&quot;:&quot;&quot;,&quot;caption&quot;:&quot;&quot;,&quot;created_timestamp&quot;:&quot;0&quot;,&quot;copyright&quot;:&quot;&quot;,&quot;focal_length&quot;:&quot;0&quot;,&quot;iso&quot;:&quot;0&quot;,&quot;shutter_speed&quot;:&quot;0&quot;,&quot;title&quot;:&quot;&quot;,&quot;orientation&quot;:&quot;0&quot;}\" data-image-title=\"Regression Analysis and KKK\" data-image-description=\"\" data-image-caption=\"\" data-medium-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/10\/KKK.jpg?fit=174%2C300&amp;ssl=1\" data-large-file=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/10\/KKK.jpg?fit=374%2C645&amp;ssl=1\" decoding=\"async\" loading=\"lazy\" class=\"wp-image-9069 alignright\" src=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/10\/KKK.jpg?resize=254%2C438\" alt=\"Regression Analysis and KKK\" width=\"254\" 
height=\"438\" srcset=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/10\/KKK.jpg?w=374&amp;ssl=1 374w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/10\/KKK.jpg?resize=145%2C250&amp;ssl=1 145w, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/10\/KKK.jpg?resize=174%2C300&amp;ssl=1 174w\" sizes=\"(max-width: 254px) 100vw, 254px\" data-recalc-dims=\"1\" \/><\/a>You want to sell the house you purchased 8 years ago. You get mixed messages from different sources about the state of the real-estate market: some say the housing market is booming, while others believe it&#8217;s a bust. Thoroughly confused, you approach a real-estate agent in good faith to help you get a good deal. The agent is an expert, and since he earns a 1.5% commission on the selling price, you expect him to get you the best deal. Freakonomics, a book by Steven Levitt &amp; Stephen Dubner, argues otherwise. As part of his research, Levitt compared the sales of houses owned by real-estate agents with the sales of houses owned by their clients. He observed that houses owned\u00a0by agents sold for about 3% more than their clients&#8217; houses. So how do these agents get better deals for their own properties? They keep their own properties on the market for roughly 10 more days than their clients&#8217; properties. On a $300,000 house, these additional 10 days can fetch about $10,000 more. However, if the property belongs to a client, the agent&#8217;s 1.5% share of that extra $10,000 is just $150 for 10 extra days of effort. He would rather close the deal early and move on to the next one.<\/p>\n<p>According to Freakonomics, real-estate agents are like the Ku Klux Klan (KKK), a notorious secret society responsible for lynching African\u00a0Americans.\u00a0The existence of both the KKK and real-estate agents depends on information asymmetry: the more information they have relative to others, the stronger they become. 
The KKK was eventually destroyed by efforts to make its secret information public. Once the mask was removed from the KKK&#8217;s face and the information asymmetry was gone, the secret society was blown away like a puff of dust.\u00a0Data science has a massive role to play in the democratization of information. Imagine a world where everyone has access to all the data\/information, and everyone knows how to extract knowledge from that data. In this scenario, expertise will rely\u00a0upon sophisticated human skills like creativity and innovation rather than secrecy and deceit.<\/p>\n<p>As we move towards our regression case study, we can take a big lesson from this Freakonomics analysis of\u00a0real-estate agents. While our case study uses just a handful of predictor variables to estimate housing prices, many interesting phenomena outside our dataset also determine housing prices, such as whether the owner of the house is also a real-estate agent. As data scientists, it&#8217;s our job to unearth these phenomena and build robust models.<\/p>\n<h2><span style=\"color: #3366ff;\">Case Study Example &#8211; Regression Model<\/span><\/h2>\n<p>In this case study example, you are building regression models to help an investment firm make money through property price arbitrage. You are under a lot of pressure from your client to deliver the price estimation model soon. You have already prepared your data by treating outliers and missing values. To begin with, you will build a complete model with all the predictor variables. You can find the entire R code used in this article at this link:\u00a0<strong><a href=\"http:\/\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/10\/Regression-Models-R-Code.txt\">regression-models-r-code<\/a>.<\/strong><\/p>\n<p>The first step in model building is to load the data into R and identify the numeric and categorical predictor variables. We will also tag house price as the target (response) variable. 
The next few lines of code do exactly this.<\/p>\n<h4><span style=\"color: #3366ff;\">Step 1: fetch data for regression modeling &amp; tag the variables<\/span><\/h4>\n<pre class=\"brush: r; title: ; notranslate\" title=\"\">\r\nClean_Data = read.csv('http:\/\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/09\/Regression-Clean-Data.csv')  # data cleaned in the earlier parts<\/pre>\n<p>Now, we will tag the variables based on their properties.<\/p>\n<pre class=\"brush: r; first-line: 2; title: ; notranslate\" title=\"\">\r\nnumeric=c('Dist_Taxi','Dist_Market','Dist_Hospital','Carpet','Builtup','Rainfall')  # numeric predictors\r\ncategoric = c('Parking', 'City_Category')  # categorical predictors (lm turns them into dummy variables)\r\nTarget = c('House_Price')  # response variable\r\n<\/pre>\n<p>The next step is to divide your sample into training and test sets. We will build all three models on the training set and evaluate their\u00a0performance on the test set. The training set is a random selection of 70% of the data, and the remaining 30% forms the test set.<\/p>\n<h4><span style=\"font-size: 12pt;\"><strong><span style=\"color: #ff0000;\"><span style=\"color: #3366ff;\">Step 2: prepare train and test data for regression modeling<\/span><\/span><\/strong><\/span><\/h4>\n<pre class=\"brush: r; first-line: 5; title: ; notranslate\" title=\"\">\r\nset.seed(42)  # make the random split reproducible\r\ntrain = sample(nrow(Clean_Data), 0.7*nrow(Clean_Data))  # row indices of a random 70% training sample\r\ntest = setdiff(seq_len(nrow(Clean_Data)), train)  # the remaining 30% of rows form the test set<\/pre>\n<p>Now, we will build our first regression model with all the available variables in our dataset.<\/p>\n<h4><span style=\"font-size: 12pt;\"><span style=\"color: #3366ff;\"><strong>Step 3: build 1st regression model with all the available variables<\/strong> <\/span><\/span><\/h4>\n<pre class=\"brush: r; first-line: 8; title: ; notranslate\" title=\"\">\r\nOrg_Reg=lm(House_Price~.,data=Clean_Data[train,c(Target,numeric,categoric)])  # regress price on all predictors\r\nsummary(Org_Reg)  # coefficients, standard errors and p-values\r\n<\/pre>\n<p>These are the results of the first regression model.<\/p>\n<table border=\"2\" 
width=\"505\">\n<tbody>\n<tr>\n<td style=\"background-color: #16599c; width: 178px;\" width=\"178\"><strong><span style=\"color: #ffffff;\">Coefficients:<\/span><\/strong><\/td>\n<td style=\"background-color: #16599c; width: 178px;\" width=\"70\"><strong><span style=\"color: #ffffff;\">\u00a0<\/span><\/strong><\/td>\n<td style=\"background-color: #16599c; width: 178px;\" width=\"66\"><strong><span style=\"color: #ffffff;\">\u00a0<\/span><\/strong><\/td>\n<td style=\"background-color: #16599c; width: 178px;\" width=\"62\"><strong><span style=\"color: #ffffff;\">\u00a0<\/span><\/strong><\/td>\n<td style=\"background-color: #16599c; width: 178px;\" width=\"70\"><strong><span style=\"color: #ffffff;\">\u00a0<\/span><\/strong><\/td>\n<td style=\"background-color: #16599c; width: 178px;\" width=\"59\"><strong><span style=\"color: #ffffff;\">\u00a0<\/span><\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"background-color: #5289bf; width: 178px;\"><strong><span style=\"color: #ffffff;\">\u00a0<\/span><\/strong><\/td>\n<td style=\"background-color: #5289bf; width: 178px;\"><strong><span style=\"color: #ffffff;\">Estimate<\/span><\/strong><\/td>\n<td style=\"background-color: #5289bf; width: 178px;\"><strong><span style=\"color: #ffffff;\">Std.Error<\/span><\/strong><\/td>\n<td style=\"background-color: #5289bf; width: 178px;\"><span style=\"color: #ffffff;\"><b>t value<\/b><\/span><\/td>\n<td style=\"background-color: #5289bf; width: 178px;\"><span style=\"color: #ffffff;\"><strong>Pr(&gt;|t|)<\/strong><\/span><\/td>\n<td style=\"background-color: #5289bf; width: 178px;\"><\/td>\n<\/tr>\n<tr>\n<td style=\"background-color: #bbe6fa;\">(Intercept)<\/td>\n<td>5.25E+06<\/td>\n<td>4.40E+05<\/td>\n<td>11.913<\/td>\n<td>\u00a0&lt; 2e-16<\/td>\n<td style=\"background-color: #f2e8cb;\"><strong>***<\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"background-color: #bbe6fa;\">Dist_Taxi<\/td>\n<td>3.25E+01<\/td>\n<td>3.07E+01<\/td>\n<td>1.059<\/td>\n<td>0.2902<\/td>\n<td style=\"background-color: 
#f2e8cb;\"><strong>\u00a0<\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"background-color: #bbe6fa;\">Dist_Market<\/td>\n<td>4.74E+00<\/td>\n<td>2.38E+01<\/td>\n<td>0.199<\/td>\n<td>0.8421<\/td>\n<td style=\"background-color: #f2e8cb;\"><strong>\u00a0<\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"background-color: #bbe6fa;\">Dist_Hospital<\/td>\n<td>8.27E+01<\/td>\n<td>3.43E+01<\/td>\n<td>2.408<\/td>\n<td>0.0163<\/td>\n<td style=\"background-color: #f2e8cb;\"><strong>*<\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"background-color: #bbe6fa;\">Carpet<\/td>\n<td>-1.61E+03<\/td>\n<td>3.92E+03<\/td>\n<td>-0.41<\/td>\n<td>0.6818<\/td>\n<td style=\"background-color: #f2e8cb;\"><strong>\u00a0<\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"background-color: #bbe6fa;\">Builtup<\/td>\n<td>2.02E+03<\/td>\n<td>3.27E+03<\/td>\n<td>0.617<\/td>\n<td>0.5376<\/td>\n<td style=\"background-color: #f2e8cb;\"><strong>\u00a0<\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"background-color: #bbe6fa;\">Rainfall<\/td>\n<td>-2.01E+02<\/td>\n<td>1.76E+02<\/td>\n<td>-1.146<\/td>\n<td>0.2524<\/td>\n<td style=\"background-color: #f2e8cb;\"><strong>\u00a0<\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"background-color: #bbe6fa;\">(Parking) No Parking<\/td>\n<td>-6.70E+05<\/td>\n<td>1.59E+05<\/td>\n<td>-4.222<\/td>\n<td>2.78E-05<\/td>\n<td style=\"background-color: #f2e8cb;\"><strong>***<\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"background-color: #bbe6fa;\">(Parking) NotProvided<\/td>\n<td>-5.09E+05<\/td>\n<td>1.43E+05<\/td>\n<td>-3.56<\/td>\n<td>0.0004<\/td>\n<td style=\"background-color: #f2e8cb;\"><strong>***<\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"background-color: #bbe6fa;\">(Parking) Open<\/td>\n<td>-2.83E+05<\/td>\n<td>1.31E+05<\/td>\n<td>-2.156<\/td>\n<td>0.0315<\/td>\n<td style=\"background-color: #f2e8cb;\"><strong>*<\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"background-color: #bbe6fa;\">(City_Category) CAT B<\/td>\n<td>-1.81E+06<\/td>\n<td>1.11E+05<\/td>\n<td>-16.388<\/td>\n<td>\u00a0&lt; 
2e-16<\/td>\n<td style=\"background-color: #f2e8cb;\"><strong>***<\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"background-color: #bbe6fa;\">(City_Category) CAT C<\/td>\n<td>-2.87E+06<\/td>\n<td>1.22E+05<\/td>\n<td>-23.404<\/td>\n<td>\u00a0&lt; 2e-16<\/td>\n<td style=\"background-color: #f2e8cb;\"><strong>***<\/strong><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Examine the results of the first regression model in the table above. The first thing to notice is that all the variables are part of this model, including the categorical variables, i.e. parking and city category. The categorical variables are converted to dummy variables: each category except one reference level is represented as a separate 0\/1 variable, and the reference level is absorbed into the intercept. This is why, for example, the table has rows for CAT B and CAT C but none for CAT A.<\/p>\n<p>The next thing to notice is the significance of these variables in the model, shown by the p-values and significance stars in the last two columns of the table. In this model, the carpet and built-up areas of the house do not show up as significant. This is surprising, since we noticed while doing <strong><a href=\"http:\/\/ucanalytics.com\/blogs\/bivariate-analysis-leverage-regression-case-study-example-part-3\/\">bivariate analysis<\/a><\/strong> that these variables had significant correlations with the house price. What is happening here?\u00a0If you remember, these two variables are highly correlated with each other. This is the demon of multicollinearity at work: it inflates the standard errors of correlated predictors, so important variables get tagged as unimportant. We need to do something about multicollinearity. 
But before we build our next model to handle multicollinearity with principal component analysis, let&#8217;s evaluate the performance of this &#8216;all variable model&#8217; on the test sample.<\/p>\n<h4><span style=\"font-size: 12pt;\"><span style=\"color: #3366ff;\">Step 4: evaluate performance of the 1st regression model<\/span><\/span><\/h4>\n<pre class=\"brush: r; first-line: 10; title: ; notranslate\" title=\"\">\r\nEstimate=predict(Org_Reg,type='response',newdata=Clean_Data[test,c(numeric,categoric,Target)])  # predicted prices for the test rows\r\nObserved=subset(Clean_Data[test,c(numeric,categoric,Target)],select=Target)  # actual prices for the test rows\r\nformat(cor(Estimate,Observed$House_Price)^2,digits=4)  # squared correlation, reported as the test-set R-square\r\n<\/pre>\n<p>In the above code, &#8216;Estimate&#8217; holds the model&#8217;s estimated house prices for the test sample and &#8216;Observed&#8217; holds the actual house prices. The correlation between the observed and estimated values tells us how accurate the model is, and the square of this correlation is used here as the R-square value, or predictive power, of the model.\u00a0The R-square value for this multicollinearity-infected model is 0.4489, which means that the predictor variables explain around 45% of the variation in house prices in the test sample.<\/p>\n<p>The next step for us is to remove multicollinearity from our model. 
A good way to achieve this is to build the model with <a href=\"http:\/\/ucanalytics.com\/blogs\/principal-component-analysis-step-step-guide-r-regression-case-study-example-part-4\/\"><strong>the orthogonal principal components derived from the original variables.<\/strong><\/a>\u00a0Remember, principal component analysis transforms a set of correlated numeric variables into uncorrelated components.<\/p>\n<h4><span style=\"font-size: 12pt;\"><span style=\"color: #3366ff;\">Step 5: prepare data for 2nd regression model with principal components<\/span><\/span><\/h4>\n<pre class=\"brush: r; first-line: 13; title: ; notranslate\" title=\"\">\r\nlibrary(FactoMineR)\r\nData_for_PCA = Clean_Data[,numeric]  # numeric predictors only\r\npca1 = PCA(Data_for_PCA, graph = FALSE)  # standardized PCA on all rows (train and test); graph = FALSE skips the plots\r\nPCA_data=as.data.frame(cbind(Clean_Data[train,c(Target,categoric)],pca1$ind$coord[train,]))  # component scores replace the numeric predictors\r\n<\/pre>\n<p>In PCA_data, all the numeric variables have been replaced by their principal component scores. We will use this data to build our second regression model and counter multicollinearity.<\/p>\n<h4><span style=\"font-size: 12pt; color: #3366ff;\">Step 6: build 2nd regression model with principal components<\/span><\/h4>\n<pre class=\"brush: r; first-line: 17; title: ; notranslate\" title=\"\">\r\nStep_PCA_Reg = step(lm(House_Price~., data = PCA_data))  # stepwise term selection by AIC\r\nsummary(Step_PCA_Reg)\r\n<\/pre>\n<table border=\"2\" width=\"505\">\n<tbody>\n<tr style=\"height: 23.05px;\">\n<td style=\"background-color: #16599c; width: 178px; height: 23.05px;\" width=\"178\"><strong><span style=\"color: #ffffff;\">Coefficients:<\/span><\/strong><\/td>\n<td style=\"background-color: #16599c; width: 178px; height: 23.05px;\" width=\"70\"><strong><span style=\"color: #ffffff;\">\u00a0<\/span><\/strong><\/td>\n<td style=\"background-color: #16599c; width: 178px; height: 23.05px;\" width=\"66\"><strong><span style=\"color: #ffffff;\">\u00a0<\/span><\/strong><\/td>\n<td style=\"background-color: #16599c; width: 178px; height: 23.05px;\" 
width=\"62\"><strong><span style=\"color: #ffffff;\">\u00a0<\/span><\/strong><\/td>\n<td style=\"background-color: #16599c; width: 178px; height: 23.05px;\" width=\"70\"><strong><span style=\"color: #ffffff;\">\u00a0<\/span><\/strong><\/td>\n<td style=\"background-color: #16599c; width: 178px; height: 23.05px;\" width=\"59\"><strong><span style=\"color: #ffffff;\">\u00a0<\/span><\/strong><\/td>\n<\/tr>\n<tr style=\"height: 23px;\">\n<td style=\"background-color: #5289bf; width: 178px; height: 23px;\"><strong><span style=\"color: #ffffff;\">\u00a0<\/span><\/strong><\/td>\n<td style=\"background-color: #5289bf; width: 178px; height: 23px;\"><strong><span style=\"color: #ffffff;\">Estimate<\/span><\/strong><\/td>\n<td style=\"background-color: #5289bf; width: 178px; height: 23px;\"><strong><span style=\"color: #ffffff;\">Std.Error<\/span><\/strong><\/td>\n<td style=\"background-color: #5289bf; width: 178px; height: 23px;\"><span style=\"color: #ffffff;\"><b>t value<\/b><\/span><\/td>\n<td style=\"background-color: #5289bf; width: 178px; height: 23px;\"><span style=\"color: #ffffff;\"><strong>Pr(&gt;|t|)<\/strong><\/span><\/td>\n<td style=\"background-color: #5289bf; width: 178px; height: 23px;\"><\/td>\n<\/tr>\n<tr style=\"height: 23px;\">\n<td style=\"background-color: #bbe6fa; height: 23px;\">(Intercept)<\/td>\n<td style=\"height: 23px;\">7684893<\/td>\n<td style=\"height: 23px;\">120912<\/td>\n<td style=\"height: 23px;\">63.558<\/td>\n<td style=\"height: 23px;\">\u00a0&lt; 2e-16<\/td>\n<td style=\"background-color: #f2e8cb; height: 23px;\"><strong>***<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 23px;\">\n<td style=\"background-color: #bbe6fa; height: 23px;\">Comp 1<\/td>\n<td style=\"height: 23px;\">181462<\/td>\n<td style=\"height: 23px;\">32083<\/td>\n<td style=\"height: 23px;\">5.656<\/td>\n<td style=\"height: 23px;\">2.37e-08<\/td>\n<td style=\"background-color: #f2e8cb; height: 
23px;\"><strong>\u00a0***<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 23px;\">\n<td style=\"background-color: #bbe6fa; height: 23px;\">Comp 2<\/td>\n<td style=\"height: 23px;\">149740<\/td>\n<td style=\"height: 23px;\">34506<\/td>\n<td style=\"height: 23px;\">4.340<\/td>\n<td style=\"height: 23px;\">1.67e-05<\/td>\n<td style=\"background-color: #f2e8cb; height: 23px;\"><strong>\u00a0***<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 47px;\">\n<td style=\"background-color: #bbe6fa; height: 47px;\">(Parking) No Parking<\/td>\n<td style=\"height: 47px;\">-643139<\/td>\n<td style=\"height: 47px;\">157929<\/td>\n<td style=\"height: 47px;\">-4.072<\/td>\n<td style=\"height: 47px;\">5.26e-05<\/td>\n<td style=\"background-color: #f2e8cb; height: 47px;\"><strong>***<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 47px;\">\n<td style=\"background-color: #bbe6fa; height: 47px;\">(Parking) NotProvided<\/td>\n<td style=\"height: 47px;\">-503083<\/td>\n<td style=\"height: 47px;\">142925<\/td>\n<td style=\"height: 47px;\">-3.520<\/td>\n<td style=\"height: 47px;\">0.0004<\/td>\n<td style=\"background-color: #f2e8cb; height: 47px;\"><strong>***<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 23px;\">\n<td style=\"background-color: #bbe6fa; height: 23px;\">(Parking) Open<\/td>\n<td style=\"height: 23px;\">-280855<\/td>\n<td style=\"height: 23px;\">130877<\/td>\n<td style=\"height: 23px;\">-2.146<\/td>\n<td style=\"height: 23px;\">0.0322<\/td>\n<td style=\"background-color: #f2e8cb; height: 23px;\"><strong>*<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 47px;\">\n<td style=\"background-color: #bbe6fa; height: 47px;\">(City_Category) CAT B<\/td>\n<td style=\"height: 47px;\">-1802882<\/td>\n<td style=\"height: 47px;\">110352<\/td>\n<td style=\"height: 47px;\">-16.338<\/td>\n<td style=\"height: 47px;\">\u00a0&lt; 2e-16<\/td>\n<td style=\"background-color: #f2e8cb; height: 47px;\"><strong>***<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 47px;\">\n<td style=\"background-color: #bbe6fa; height: 
47px;\">(City_Category) CAT C<\/td>\n<td style=\"height: 47px;\">-2860830<\/td>\n<td style=\"height: 47px;\">1.22E+05<\/td>\n<td style=\"height: 47px;\">-23.418<\/td>\n<td style=\"height: 47px;\">\u00a0&lt; 2e-16<\/td>\n<td style=\"background-color: #f2e8cb; height: 47px;\"><strong>***<\/strong><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>As you may have noticed, none of the original numeric variables appear in this model; instead it uses the uncorrelated principal components Comp 1 and Comp 2. We have also run stepwise regression with step(), which drops variables and components whenever removing them lowers the model&#8217;s AIC. In this case, only components 1 &amp; 2 were retained, and the remaining components were dropped because they did not help in estimating house prices. Let&#8217;s see how accurately this new model performs on the test dataset.<\/p>\n<h4><span style=\"font-size: 12pt; color: #3366ff;\"><strong>Step 7: performance evaluation of the 2nd regression model<\/strong><\/span><\/h4>\n<pre class=\"brush: r; first-line: 19; title: ; notranslate\" title=\"\">\r\nPCA_Estimate=predict(Step_PCA_Reg,type='response',newdata=cbind(Clean_Data[test,c(Target,categoric)],pca1$ind$coord[test,]))  # score the test rows using their component scores\r\nformat(cor(PCA_Estimate, Observed$House_Price)^2, digits=4)  # test-set R-square\r\n<\/pre>\n<p>The accuracy, or R-square value, of this model is\u00a00.4559, a slight improvement over the original model. More importantly, the components in this model are uncorrelated, so we have tackled the demons of multicollinearity.<\/p>\n<p>It can be hard for an analyst to explain a model built on principal components to clients.\u00a0Moreover, principal components add another level of complexity when the model is operationalized, because every new property must first be standardized and projected onto the stored component loadings before it can be scored. 
Hence, where possible, it is a good idea to build the model with the original raw variables.\u00a0You may remember this table from the previous part of this case study on <a href=\"http:\/\/ucanalytics.com\/blogs\/principal-component-analysis-step-step-guide-r-regression-case-study-example-part-4\/\">principal component analysis<\/a>.<\/p>\n<table cellspacing=\"2\">\n<tbody>\n<tr style=\"background-color: #4d6edb; height: 23px;\">\n<td style=\"height: 23px;\" width=\"140\"><\/td>\n<td style=\"height: 23px;\" width=\"80\"><strong><span style=\"color: #ffffff;\">comp 1<\/span><\/strong><\/td>\n<td style=\"height: 23px;\" width=\"80\"><strong><span style=\"color: #ffffff;\">comp 2<\/span><\/strong><\/td>\n<td style=\"height: 23px;\" width=\"80\"><strong><span style=\"color: #ffffff;\">comp 3<\/span><\/strong><\/td>\n<td style=\"height: 23px;\" width=\"80\"><strong><span style=\"color: #ffffff;\">comp 4<\/span><\/strong><\/td>\n<td style=\"height: 23px;\" width=\"80\"><strong><span style=\"color: #ffffff;\">comp 5<\/span><\/strong><\/td>\n<td style=\"height: 23px;\" width=\"80\"><strong><span style=\"color: #ffffff;\">comp 6<\/span><\/strong><\/td>\n<\/tr>\n<tr style=\"height: 23px;\">\n<td style=\"width: 140px; background-color: #ffe552; height: 23px;\" width=\"140\"><strong><span style=\"color: #000000;\">Dist_Hospital<\/span><\/strong><\/td>\n<td style=\"height: 23px;\" width=\"80\"><strong>88%<\/strong><\/td>\n<td style=\"height: 23px;\" width=\"80\">0%<\/td>\n<td style=\"height: 23px;\" width=\"80\">0%<\/td>\n<td style=\"height: 23px;\" width=\"80\">2%<\/td>\n<td style=\"height: 23px;\" width=\"80\">10%<\/td>\n<td style=\"height: 23px;\" width=\"80\">0%<\/td>\n<\/tr>\n<tr style=\"height: 23px;\">\n<td style=\"width: 140px; background-color: #ffe552; height: 23px;\" width=\"140\"><span style=\"color: 
#000000;\">Dist_Taxi<\/span><\/td>\n<td style=\"height: 23px;\" width=\"80\">76%<\/td>\n<td style=\"height: 23px;\" width=\"80\">0%<\/td>\n<td style=\"height: 23px;\" width=\"80\">1%<\/td>\n<td style=\"height: 23px;\" width=\"80\">17%<\/td>\n<td style=\"height: 23px;\" width=\"80\">6%<\/td>\n<td style=\"height: 23px;\" width=\"80\">0%<\/td>\n<\/tr>\n<tr style=\"height: 23.4502px;\">\n<td style=\"width: 140px; background-color: #ffe552; height: 23.4502px;\" width=\"140\"><span style=\"color: #000000;\">Dist_Market<\/span><\/td>\n<td style=\"height: 23.4502px;\" width=\"80\">61%<\/td>\n<td style=\"height: 23.4502px;\" width=\"80\">0%<\/td>\n<td style=\"height: 23.4502px;\" width=\"80\">0%<\/td>\n<td style=\"height: 23.4502px;\" width=\"80\">38%<\/td>\n<td style=\"height: 23.4502px;\" width=\"80\">1%<\/td>\n<td style=\"height: 23.4502px;\" width=\"80\">0%<\/td>\n<\/tr>\n<tr style=\"height: 23px;\">\n<td style=\"width: 140px; background-color: #ffe552; height: 23px;\" width=\"140\"><span style=\"color: #000000;\">Rainfall<\/span><\/td>\n<td style=\"height: 23px;\" width=\"80\">1%<\/td>\n<td style=\"height: 23px;\" width=\"80\">1%<\/td>\n<td style=\"height: 23px;\" width=\"80\">98%<\/td>\n<td style=\"height: 23px;\" width=\"80\">0%<\/td>\n<td style=\"height: 23px;\" width=\"80\">0%<\/td>\n<td style=\"height: 23px;\" width=\"80\">0%<\/td>\n<\/tr>\n<tr style=\"height: 23px;\">\n<td style=\"width: 140px; background-color: #ffe552; height: 23px;\" width=\"140\"><strong><span style=\"color: #000000;\">Carpet<\/span><\/strong><\/td>\n<td style=\"height: 23px;\" width=\"80\">0%<\/td>\n<td style=\"height: 23px;\" width=\"80\"><strong>100%<\/strong><\/td>\n<td style=\"height: 23px;\" width=\"80\">0%<\/td>\n<td style=\"height: 23px;\" width=\"80\">0%<\/td>\n<td style=\"height: 23px;\" width=\"80\">0%<\/td>\n<td style=\"height: 23px;\" width=\"80\">0%<\/td>\n<\/tr>\n<tr style=\"height: 23px;\">\n<td style=\"width: 140px; background-color: #ffe552; height: 23px;\" 
width=\"140\"><span style=\"color: #000000;\">Builtup<\/span><\/td>\n<td style=\"height: 23px;\" width=\"80\">0%<\/td>\n<td style=\"height: 23px;\" width=\"80\">100%<\/td>\n<td style=\"height: 23px;\" width=\"80\">0%<\/td>\n<td style=\"height: 23px;\" width=\"80\">0%<\/td>\n<td style=\"height: 23px;\" width=\"80\">0%<\/td>\n<td style=\"height: 23px;\" width=\"80\">0%<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>As you can see, distance to hospital dominates comp 1, while carpet area and built-up area both sit entirely on comp 2. Since carpet and built-up area are highly correlated, we keep just one of them, carpet area, to represent comp 2. Hence, we will build our 3rd and final model with distance to hospital and carpet area.<\/p>\n<h4><span style=\"font-size: 12pt; color: #3366ff;\"><strong>Step 8: build 3rd regression model with dominant variables in significant principal components<\/strong><\/span><\/h4>\n<pre class=\"brush: r; first-line: 21; title: ; notranslate\" title=\"\">\r\nnumeric_new = c('Dist_Hospital','Carpet')  # dominant variables of comp 1 and comp 2\r\nNew_Reg=lm(House_Price~.,data=Clean_Data[train,c(Target,numeric_new,categoric)])  # refit on the training rows\r\nsummary(New_Reg)\r\n<\/pre>\n<table border=\"2\" width=\"505\">\n<tbody>\n<tr style=\"height: 23.05px;\">\n<td style=\"background-color: #16599c; width: 178px; height: 23.05px;\" width=\"178\"><strong><span style=\"color: #ffffff;\">Coefficients:<\/span><\/strong><\/td>\n<td style=\"background-color: #16599c; width: 178px; height: 23.05px;\" width=\"70\"><strong><span style=\"color: #ffffff;\">\u00a0<\/span><\/strong><\/td>\n<td style=\"background-color: #16599c; width: 178px; height: 23.05px;\" width=\"66\"><strong><span style=\"color: #ffffff;\">\u00a0<\/span><\/strong><\/td>\n<td style=\"background-color: #16599c; width: 178px; height: 23.05px;\" width=\"62\"><strong><span style=\"color: #ffffff;\">\u00a0<\/span><\/strong><\/td>\n<td style=\"background-color: #16599c; width: 178px; height: 23.05px;\" width=\"70\"><strong><span style=\"color: #ffffff;\">\u00a0<\/span><\/strong><\/td>\n<td style=\"background-color: #16599c; width: 178px; height: 23.05px;\" width=\"59\"><strong><span style=\"color: 
#ffffff;\">\u00a0<\/span><\/strong><\/td>\n<\/tr>\n<tr style=\"height: 23px;\">\n<td style=\"background-color: #5289bf; width: 178px; height: 23px;\"><strong><span style=\"color: #ffffff;\">\u00a0<\/span><\/strong><\/td>\n<td style=\"background-color: #5289bf; width: 178px; height: 23px;\"><strong><span style=\"color: #ffffff;\">Estimate<\/span><\/strong><\/td>\n<td style=\"background-color: #5289bf; width: 178px; height: 23px;\"><strong><span style=\"color: #ffffff;\">Std.Error<\/span><\/strong><\/td>\n<td style=\"background-color: #5289bf; width: 178px; height: 23px;\"><span style=\"color: #ffffff;\"><b>t value<\/b><\/span><\/td>\n<td style=\"background-color: #5289bf; width: 178px; height: 23px;\"><span style=\"color: #ffffff;\"><strong>Pr(&gt;|t|)<\/strong><\/span><\/td>\n<td style=\"background-color: #5289bf; width: 178px; height: 23px;\"><\/td>\n<\/tr>\n<tr style=\"height: 23px;\">\n<td style=\"background-color: #bbe6fa; height: 23px;\">(Intercept)<\/td>\n<td style=\"height: 23px;\">5050297<\/td>\n<td style=\"height: 23px;\">406156<\/td>\n<td style=\"height: 23px;\">12.434<\/td>\n<td style=\"height: 23px;\">\u00a0&lt; 2e-16<\/td>\n<td style=\"background-color: #f2e8cb; height: 23px;\"><strong>***<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 23px;\">\n<td style=\"background-color: #bbe6fa; height: 23px;\">Dist_Hospital<\/td>\n<td style=\"height: 23px;\">109<\/td>\n<td style=\"height: 23px;\">18.7<\/td>\n<td style=\"height: 23px;\">5.824<\/td>\n<td style=\"height: 23px;\">9.22e-09<\/td>\n<td style=\"background-color: #f2e8cb; height: 23px;\"><strong>\u00a0***<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 23px;\">\n<td style=\"background-color: #bbe6fa; height: 23px;\">Carpet<\/td>\n<td style=\"height: 23px;\">811<\/td>\n<td style=\"height: 23px;\">195<\/td>\n<td style=\"height: 23px;\">4.161<\/td>\n<td style=\"height: 23px;\">3.61e-05<\/td>\n<td style=\"background-color: #f2e8cb; height: 23px;\"><strong>\u00a0***<\/strong><\/td>\n<\/tr>\n<tr 
style=\"height: 47px;\">\n<td style=\"background-color: #bbe6fa; height: 47px;\">(Parking) No Parking<\/td>\n<td style=\"height: 47px;\">-646164<\/td>\n<td style=\"height: 47px;\">157896<\/td>\n<td style=\"height: 47px;\">-4.092<\/td>\n<td style=\"height: 47px;\">5.26e-05<\/td>\n<td style=\"background-color: #f2e8cb; height: 47px;\"><strong>***<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 47px;\">\n<td style=\"background-color: #bbe6fa; height: 47px;\">(Parking) NotProvided<\/td>\n<td style=\"height: 47px;\">-497397<\/td>\n<td style=\"height: 47px;\">142745<\/td>\n<td style=\"height: 47px;\">-3.485<\/td>\n<td style=\"height: 47px;\">0.0005<\/td>\n<td style=\"background-color: #f2e8cb; height: 47px;\"><strong>***<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 23px;\">\n<td style=\"background-color: #bbe6fa; height: 23px;\">(Parking) Open<\/td>\n<td style=\"height: 23px;\">-274208<\/td>\n<td style=\"height: 23px;\">130744<\/td>\n<td style=\"height: 23px;\">-2.097<\/td>\n<td style=\"height: 23px;\">0.0363<\/td>\n<td style=\"background-color: #f2e8cb; height: 23px;\"><strong>*<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 47px;\">\n<td style=\"background-color: #bbe6fa; height: 47px;\">(City_Category) CAT B<\/td>\n<td style=\"height: 47px;\">-1811069<\/td>\n<td style=\"height: 47px;\">110093<\/td>\n<td style=\"height: 47px;\">-16.450<\/td>\n<td style=\"height: 47px;\">\u00a0&lt; 2e-16<\/td>\n<td style=\"background-color: #f2e8cb; height: 47px;\"><strong>***<\/strong><\/td>\n<\/tr>\n<tr style=\"height: 47px;\">\n<td style=\"background-color: #bbe6fa; height: 47px;\">(City_Category) CAT C<\/td>\n<td style=\"height: 47px;\">-2854096<\/td>\n<td style=\"height: 47px;\">122091<\/td>\n<td style=\"height: 47px;\">-23.377<\/td>\n<td style=\"height: 47px;\">\u00a0&lt; 2e-16<\/td>\n<td style=\"background-color: #f2e8cb; height: 47px;\"><strong>***<\/strong><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>This model uses far more interpretable numeric variables: distance to hospital and carpet area.
Carpet area is now significant because builtup area has been removed from the model; remember that carpet area was not significant in the 1st model. The remaining question is how accurate this model is compared with the model built on principal components. Let&#8217;s evaluate the performance of this model.<\/p>\n<h4><span style=\"font-size: 12pt; color: #3366ff;\"><strong>Step 9: evaluate the performance of the 3rd regression model<\/strong><\/span><\/h4>\n<pre class=\"brush: r; first-line: 24; title: ; notranslate\" title=\"\">\r\nNew_Estimate=predict(New_Reg,type='response',newdata=Clean_Data[test,c(numeric,categoric,Target)])  # predicted prices for the test rows\r\nObserved=subset(Clean_Data[test,c(numeric,categoric,Target)],select=Target)  # observed prices for the same rows\r\nformat(cor(New_Estimate,Observed$House_Price)^2,digits=4)  # squared correlation, used as the test-set R-square<\/pre>\n<p>The R-square of this model on the test data is 0.4517. It is computed as the squared correlation between predicted and observed prices. That is a reasonable result. The slight loss of accuracy compared with the principal component model is an acceptable trade-off. The model uses the original variables, so new properties do not have to be transformed into principal component scores, which makes it much simpler to operationalize on your client&#8217;s system. This is the final model that you will share with your client.<\/p>\n<h4><span style=\"color: #3366ff;\">Sign-off Note<\/span><\/h4>\n<p>Despite your best efforts, your model explains only about 45% of the variation in house prices. Still, it is a better basis for estimating house prices than having no model at all. It also leaves you slightly better equipped to challenge pseudo-experts, such as some real estate agents. However, about 55% of the variation in this data remains unexplained by these predictor variables. To put pseudo-experts out of business for good, you will need to bring new and innovative variables into the model.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This is a continuation of our case study example to estimate property pricing.
In this part, you will learn nuances of regression modeling by building three different regression models and compare their results.\u00a0We will also use results of the principal component analysis, discussed in the last part, to develop a regression model. You can find<\/p>\n<p><a class=\"excerpt-more blog-excerpt\" href=\"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/\">Read More&#8230;<\/a><\/p>\n","protected":false},"author":1,"featured_media":9017,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_newsletter_tier_id":0,"jetpack_publicize_message":"","jetpack_is_tweetstorm":false,"jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","enabled":false}}},"categories":[80],"tags":[],"jetpack_publicize_connections":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v17.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Step by Step Regression Modeling Using Principal Component Analysis - Case Study Example (Part 5) &ndash; YOU CANalytics |<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Step by Step Regression Modeling Using Principal Component Analysis - Case Study Example (Part 5) &ndash; YOU CANalytics |\" \/>\n<meta property=\"og:description\" content=\"This is a continuation of our case study example to estimate property pricing. 
In this part, you will learn nuances of regression modeling by building three different regression models and compare their results.\u00a0We will also use results of the principal component analysis, discussed in the last part, to develop a regression model. You can findRead More...\" \/>\n<meta property=\"og:url\" content=\"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/\" \/>\n<meta property=\"og:site_name\" content=\"YOU CANalytics |\" \/>\n<meta property=\"article:author\" content=\"roopam\" \/>\n<meta property=\"article:published_time\" content=\"2016-10-09T08:14:47+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2017-04-29T16:07:28+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/09\/Sumo-and-Regression-Model.jpg?fit=918%2C384&#038;ssl=1\" \/>\n\t<meta property=\"og:image:width\" content=\"918\" \/>\n\t<meta property=\"og:image:height\" content=\"384\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Roopam Upadhyay\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"10 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Organization\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#organization\",\"name\":\"YOU CANalytics\",\"url\":\"https:\/\/ucanalytics.com\/blogs\/\",\"sameAs\":[],\"logo\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#logo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/11\/YOU-CANalytics-Logo.jpg?fit=607%2C120\",\"contentUrl\":\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/11\/YOU-CANalytics-Logo.jpg?fit=607%2C120\",\"width\":607,\"height\":120,\"caption\":\"YOU CANalytics\"},\"image\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#logo\"}},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#website\",\"url\":\"https:\/\/ucanalytics.com\/blogs\/\",\"name\":\"YOU CANalytics |\",\"description\":\"Explore the Power of Data Science\",\"publisher\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/ucanalytics.com\/blogs\/?s={search_term_string}\"},\"query-input\":\"required 
name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/#primaryimage\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/09\/Sumo-and-Regression-Model.jpg?fit=918%2C384&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/09\/Sumo-and-Regression-Model.jpg?fit=918%2C384&ssl=1\",\"width\":918,\"height\":384},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/#webpage\",\"url\":\"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/\",\"name\":\"Step by Step Regression Modeling Using Principal Component Analysis - Case Study Example (Part 5) &ndash; YOU CANalytics |\",\"isPartOf\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/#primaryimage\"},\"datePublished\":\"2016-10-09T08:14:47+00:00\",\"dateModified\":\"2017-04-29T16:07:28+00:00\",\"breadcrumb\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/ucanalytics.com\/blogs\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Step by Step Regression Modeling Using Principal Component Analysis &#8211; Case Study Example (Part 
5)\"}]},{\"@type\":\"Article\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/#webpage\"},\"author\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#\/schema\/person\/55961a1cea272ecdf290cb387be069b6\"},\"headline\":\"Step by Step Regression Modeling Using Principal Component Analysis &#8211; Case Study Example (Part 5)\",\"datePublished\":\"2016-10-09T08:14:47+00:00\",\"dateModified\":\"2017-04-29T16:07:28+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/#webpage\"},\"wordCount\":2089,\"commentCount\":11,\"publisher\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#organization\"},\"image\":{\"@id\":\"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/09\/Sumo-and-Regression-Model.jpg?fit=918%2C384&ssl=1\",\"articleSection\":[\"Pricing Case Study Example\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/#respond\"]}]},{\"@type\":\"Person\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#\/schema\/person\/55961a1cea272ecdf290cb387be069b6\",\"name\":\"Roopam Upadhyay\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/ucanalytics.com\/blogs\/#personlogo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/dd1aa0b0e813f7639800bcfad6a554f1?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/dd1aa0b0e813f7639800bcfad6a554f1?s=96&d=mm&r=g\",\"caption\":\"Roopam Upadhyay\"},\"description\":\"This blog contains my personal views and thoughts on 
predictive Analytics and big data. - Roopam Upadhyay\",\"sameAs\":[\"roopam\"],\"url\":\"https:\/\/ucanalytics.com\/blogs\/author\/roopam\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Step by Step Regression Modeling Using Principal Component Analysis - Case Study Example (Part 5) &ndash; YOU CANalytics |","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/","og_locale":"en_US","og_type":"article","og_title":"Step by Step Regression Modeling Using Principal Component Analysis - Case Study Example (Part 5) &ndash; YOU CANalytics |","og_description":"This is a continuation of our case study example to estimate property pricing. In this part, you will learn nuances of regression modeling by building three different regression models and compare their results.\u00a0We will also use results of the principal component analysis, discussed in the last part, to develop a regression model. You can findRead More...","og_url":"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/","og_site_name":"YOU CANalytics |","article_author":"roopam","article_published_time":"2016-10-09T08:14:47+00:00","article_modified_time":"2017-04-29T16:07:28+00:00","og_image":[{"width":918,"height":384,"url":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/09\/Sumo-and-Regression-Model.jpg?fit=918%2C384&ssl=1","type":"image\/jpeg"}],"twitter_misc":{"Written by":"Roopam Upadhyay","Est. 
reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Organization","@id":"https:\/\/ucanalytics.com\/blogs\/#organization","name":"YOU CANalytics","url":"https:\/\/ucanalytics.com\/blogs\/","sameAs":[],"logo":{"@type":"ImageObject","@id":"https:\/\/ucanalytics.com\/blogs\/#logo","inLanguage":"en-US","url":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/11\/YOU-CANalytics-Logo.jpg?fit=607%2C120","contentUrl":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/11\/YOU-CANalytics-Logo.jpg?fit=607%2C120","width":607,"height":120,"caption":"YOU CANalytics"},"image":{"@id":"https:\/\/ucanalytics.com\/blogs\/#logo"}},{"@type":"WebSite","@id":"https:\/\/ucanalytics.com\/blogs\/#website","url":"https:\/\/ucanalytics.com\/blogs\/","name":"YOU CANalytics |","description":"Explore the Power of Data Science","publisher":{"@id":"https:\/\/ucanalytics.com\/blogs\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/ucanalytics.com\/blogs\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"ImageObject","@id":"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/#primaryimage","inLanguage":"en-US","url":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/09\/Sumo-and-Regression-Model.jpg?fit=918%2C384&ssl=1","contentUrl":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/09\/Sumo-and-Regression-Model.jpg?fit=918%2C384&ssl=1","width":918,"height":384},{"@type":"WebPage","@id":"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/#webpage","url":"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/","name":"Step by Step Regression Modeling Using Principal Component Analysis - Case Study Example (Part 5) 
&ndash; YOU CANalytics |","isPartOf":{"@id":"https:\/\/ucanalytics.com\/blogs\/#website"},"primaryImageOfPage":{"@id":"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/#primaryimage"},"datePublished":"2016-10-09T08:14:47+00:00","dateModified":"2017-04-29T16:07:28+00:00","breadcrumb":{"@id":"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/ucanalytics.com\/blogs\/"},{"@type":"ListItem","position":2,"name":"Step by Step Regression Modeling Using Principal Component Analysis &#8211; Case Study Example (Part 5)"}]},{"@type":"Article","@id":"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/#article","isPartOf":{"@id":"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/#webpage"},"author":{"@id":"https:\/\/ucanalytics.com\/blogs\/#\/schema\/person\/55961a1cea272ecdf290cb387be069b6"},"headline":"Step by Step Regression Modeling Using Principal Component Analysis &#8211; Case Study Example (Part 
5)","datePublished":"2016-10-09T08:14:47+00:00","dateModified":"2017-04-29T16:07:28+00:00","mainEntityOfPage":{"@id":"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/#webpage"},"wordCount":2089,"commentCount":11,"publisher":{"@id":"https:\/\/ucanalytics.com\/blogs\/#organization"},"image":{"@id":"https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/#primaryimage"},"thumbnailUrl":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/09\/Sumo-and-Regression-Model.jpg?fit=918%2C384&ssl=1","articleSection":["Pricing Case Study Example"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/ucanalytics.com\/blogs\/step-step-regression-models-pricing-case-study-example-part-5\/#respond"]}]},{"@type":"Person","@id":"https:\/\/ucanalytics.com\/blogs\/#\/schema\/person\/55961a1cea272ecdf290cb387be069b6","name":"Roopam Upadhyay","image":{"@type":"ImageObject","@id":"https:\/\/ucanalytics.com\/blogs\/#personlogo","inLanguage":"en-US","url":"https:\/\/secure.gravatar.com\/avatar\/dd1aa0b0e813f7639800bcfad6a554f1?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/dd1aa0b0e813f7639800bcfad6a554f1?s=96&d=mm&r=g","caption":"Roopam Upadhyay"},"description":"This blog contains my personal views and thoughts on predictive Analytics and big data. 
- Roopam Upadhyay","sameAs":["roopam"],"url":"https:\/\/ucanalytics.com\/blogs\/author\/roopam\/"}]}},"jetpack_featured_media_url":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/09\/Sumo-and-Regression-Model.jpg?fit=918%2C384&ssl=1","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/p3L0jT-2ls","jetpack-related-posts":[{"id":8388,"url":"https:\/\/ucanalytics.com\/blogs\/regression-analysis-pricing-case-study-example-part-1\/","url_meta":{"origin":9018,"position":0},"title":"Regression Analysis &#8211; Pricing Case Study Example (Part 1)","author":"Roopam Upadhyay","date":false,"format":false,"excerpt":"How to figure out if you are paying the right price for the property you are about to purchase? Welcome to a new data science case study example on YOU CANalytics to identify the right housing price. Pricing is a highly important and\u00a0specialized function for any business. A right price\u2026","rel":"","context":"In &quot;Pricing Case Study Example&quot;","block_context":{"text":"Pricing Case Study Example","link":"https:\/\/ucanalytics.com\/blogs\/category\/pricing-case-study-example\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/07\/Connect-the-Dots.jpg?fit=397%2C603&ssl=1&resize=350%2C200","width":350,"height":200},"classes":[]},{"id":8700,"url":"https:\/\/ucanalytics.com\/blogs\/principal-component-analysis-step-step-guide-r-regression-case-study-example-part-4\/","url_meta":{"origin":9018,"position":1},"title":"Principal Component Analysis: Step-by-Step Guide using R- Regression Case Study Example (Part 4)","author":"Roopam Upadhyay","date":false,"format":false,"excerpt":"Principal component analysis is a wonderful technique for data reduction without losing critical information. Yes, you could reduce the size of 2GB data to a few MBs without losing a lot of information. This is like a mp3 version of music. 
Many, including some experienced data scientists, find principal component\u2026","rel":"","context":"In &quot;Pricing Case Study Example&quot;","block_context":{"text":"Pricing Case Study Example","link":"https:\/\/ucanalytics.com\/blogs\/category\/pricing-case-study-example\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/08\/principal-component-analysis-Death-Profile.jpg?fit=495%2C329&ssl=1&resize=350%2C200","width":350,"height":200},"classes":[]},{"id":8649,"url":"https:\/\/ucanalytics.com\/blogs\/bivariate-analysis-leverage-regression-case-study-example-part-3\/","url_meta":{"origin":9018,"position":2},"title":"Bivariate Analysis &#038; Leverage &#8211; Regression Case Study Example (Part 3)","author":"Roopam Upadhyay","date":false,"format":false,"excerpt":"Welcome back to the\u00a0case study example for regression analysis where you are helping an investment firm make money through property price arbitrage. In the last two parts (Part 1 & Part 2) you started with the univariate analysis to identify patterns in the data including missing data and outliers. 
In\u2026","rel":"","context":"In &quot;Pricing Case Study Example&quot;","block_context":{"text":"Pricing Case Study Example","link":"https:\/\/ucanalytics.com\/blogs\/category\/pricing-case-study-example\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/08\/Regression-Case-Study-Example.jpg?fit=1156%2C720&ssl=1&resize=350%2C200","width":350,"height":200,"srcset":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/08\/Regression-Case-Study-Example.jpg?fit=1156%2C720&ssl=1&resize=350%2C200 1x, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/08\/Regression-Case-Study-Example.jpg?fit=1156%2C720&ssl=1&resize=525%2C300 1.5x, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/08\/Regression-Case-Study-Example.jpg?fit=1156%2C720&ssl=1&resize=700%2C400 2x, https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/08\/Regression-Case-Study-Example.jpg?fit=1156%2C720&ssl=1&resize=1050%2C600 3x"},"classes":[]},{"id":8488,"url":"https:\/\/ucanalytics.com\/blogs\/data-preparation-regression-pricing-case-study-example-part-2\/","url_meta":{"origin":9018,"position":3},"title":"Data Preparation for Regression &#8211; Pricing Case Study Example (Part 2)","author":"Roopam Upadhyay","date":false,"format":false,"excerpt":"In the last post we had started a case study example for regression analysis to help an investment firm make money through property price arbitrage\u00a0(read part 1 :\u00a0regression case study example).\u00a0This is an interactive case study example and required your help to move forward. 
These are some of your observations\u2026","rel":"","context":"In &quot;Analytics Labs&quot;","block_context":{"text":"Analytics Labs","link":"https:\/\/ucanalytics.com\/blogs\/category\/analytics-labs\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/08\/Regression-analysis.jpg?fit=448%2C528&ssl=1&resize=350%2C200","width":350,"height":200},"classes":[]},{"id":9145,"url":"https:\/\/ucanalytics.com\/blogs\/data-simulation-regression-modeling-pricing-case-study-example-part-6\/","url_meta":{"origin":9018,"position":4},"title":"Data Simulation for Regression Modeling &#8211; Pricing Case Study Example (Part 6)","author":"Roopam Upadhyay","date":false,"format":false,"excerpt":"\"Data! Data! Data!\" he cried impatiently. \"I can't make bricks without clay.\" - Sherlock Holmes This is a continuation of our regression case study example. In the previous parts, we have learned, as Sherlock Holmes says, to make bricks i.e. develop regression models. In this part, we will learn how\u2026","rel":"","context":"In &quot;Pricing Case Study Example&quot;","block_context":{"text":"Pricing Case Study Example","link":"https:\/\/ucanalytics.com\/blogs\/category\/pricing-case-study-example\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2016\/10\/Potter-1.jpg?fit=403%2C301&ssl=1&resize=350%2C200","width":350,"height":200},"classes":[]},{"id":5782,"url":"https:\/\/ucanalytics.com\/blogs\/how-effective-is-my-marketing-budget-regression-with-arima-errors-arimax-case-study-example-part-5\/","url_meta":{"origin":9018,"position":5},"title":"How Effective is My Marketing Budget? 
&#8211; Regression with ARIMA Errors, Case Study Example (Part 5)","author":"Roopam Upadhyay","date":false,"format":false,"excerpt":"So far we have covered the following topics in this case study example\u00a0on time series forecasting and ARIMA models: Part 1\u00a0: Introduction to time series modeling & forecasting Part 2: Time series decomposition to decipher patterns and trends before forecasting Part 3: Introduction to ARIMA models for forecasting Part 4:\u2026","rel":"","context":"In &quot;Manufacturing Case Study Example&quot;","block_context":{"text":"Manufacturing Case Study Example","link":"https:\/\/ucanalytics.com\/blogs\/category\/manufacturing-case-study-example\/"},"img":{"alt_text":"","src":"https:\/\/i0.wp.com\/ucanalytics.com\/blogs\/wp-content\/uploads\/2015\/07\/rope-walk.jpg?fit=480%2C640&ssl=1&resize=350%2C200","width":350,"height":200},"classes":[]}],"_links":{"self":[{"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/posts\/9018"}],"collection":[{"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/comments?post=9018"}],"version-history":[{"count":0,"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/posts\/9018\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/media\/9017"}],"wp:attachment":[{"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/media?parent=9018"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/categories?post=9018"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/ucanalytics.com\/blogs\/wp-json\/wp\/v2\/tags?post=9018"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templ
ated":true}]}}