How do you figure out whether you are paying the right price for the property you are about to purchase? Welcome to a new data science case study example on YOU CANalytics to identify the right housing price. Pricing is a highly important and specialized function for any business. The right price can make the difference between profit and loss. In this case study, we will use the example of property pricing to gain a deeper understanding of regression analysis.
Regression analysis is the mother of all machine learning and analysis techniques. Hence, it is essential for every data scientist to have an intuitive understanding of regression. This understanding helps them appreciate other advanced data science and analytics techniques. In this case study, we will explore the nuances of regression analysis, including data preparation, correlation analysis, principal component analysis (PCA), traditional regression with variable selection, and regression with regularization (Ridge and Lasso, as used in machine learning).
Based on suggestions from several readers, I will share the data for this case study example right at the beginning so that you can play around with it and learn. Essentially, we will work on this case study together. Download the data file: Regression analysis data. But before we analyse this data, let’s create some connections between regression analysis and the zodiac signs that govern your daily horoscope.
Zodiac Signs & Regression Analysis – Connect the Dots
The night sky has always fascinated humans. For centuries, human imagination has looked at the night sky as a vast canvas with stars as dots waiting to be connected. Constellations are the results of this imaginative thinking. A constellation is a group of stars connected together to form a mythical character such as Orion or the Great Bear. There are 88 officially recognized constellations, of which the twelve most popular are the zodiac signs, i.e. Aquarius, Pisces, Aries, etc. Constellations had a practical use in ancient times, when they were used to identify seasons. Each of these 12 zodiac constellations is clearly visible in the night sky during a particular period of the calendar year. For instance, Aries is visible in October, Taurus in November, and so on. In the absence of modern calendars, farmers used the position of the constellations to plan their crops. This representation displays the relative position of the zodiac constellations to the Earth and the Sun.
So how did constellations become part of horoscopes? In several cultures, changes in seasons are also associated with changes in fate. This makes sense, since agricultural productivity is directly linked to seasons. Hence, the zodiac constellations, which change with the seasons, became the indicators for horoscopes. In October, we see Aries in the clear night sky. During the same period, the Sun is blocking Libra on the other side of space. This means Libra is in the house of the Sun. If you were born between 23rd September and 23rd October, your zodiac sign is Libra. The other aspects of horoscopes are as much a product of human imagination as the shapes of the constellations.
Again, what do constellations and zodiac signs have to do with regression analysis? Regression analysis is also an effort to connect the dots, similar to forming constellations from stars. The major difference is that regression analysis relies not on human imagination but on mathematics to find the optimal connection, typically the line that minimizes the sum of squared distances between the dots and the line. Keeping this in mind, let’s move to our case study example.
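To make the idea of a mathematically optimal connection concrete, here is a minimal sketch of ordinary least squares with a single predictor, written in plain Python. The numbers are made up for illustration and are not taken from the case study data; the function names `fit_line` and `sum_squared_errors` are also just illustrative choices.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = co-variation of x and y divided by variation of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point of means
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_errors(xs, ys, intercept, slope):
    """Total squared vertical distance between the dots and a given line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical dots (NOT from the case study data)
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_line(xs, ys)
best = sum_squared_errors(xs, ys, intercept, slope)
# Any other line, e.g. one drawn "by imagination", has a larger error
guess = sum_squared_errors(xs, ys, 0.0, 2.1)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, best={best:.3f}, guess={guess:.3f}")
```

For these made-up dots, the fitted slope comes out at about 1.99 and the intercept at about 0.05, and no hand-drawn line can achieve a smaller sum of squared errors.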
Housing Price – Regression Analysis Case Study Example
Buying cheap and selling dear is the fundamental goal in a market economy. If you purchase something below its market price, you have more room to make a profit. ByeBuyHome is a property listing site that aggregates ready-to-buy properties and their quoted prices across the country. This is a good opportunity for property investors to identify properties that are selling below their market value. The question is how to identify whether a property is up for grabs at a lower price than the market.
You are a data analytics consultant to one such investing firm. You have accumulated data for the properties sold this month along with the features of these properties: Regression analysis data. This data contains these parameters
You have calculated the distance for the first three variables based on your proprietary algorithm and data from Google Maps. Now that you have some data with you, here are your two immediate goals:
1) Do an exploratory data analysis to identify the initial patterns in this data and report your findings in the ‘Leave a Comment’ section at the bottom of this page. Please don’t do any regression analysis at this point; just explore the data, i.e. identify outliers, missing values, and univariate and bivariate patterns.
2) Think of yourself as a god who has access to all the information in the Universe. What information (variables) would you use to estimate the house price? Please report the variables of your choice in the comments section.
I look forward to reading your comments. Your answers will lead us to the next part of this case study example. We don’t need horoscopes anymore in this case study to estimate the right price and fate of a house; it is all mathematics and logic from here on.
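If you are unsure where to start with the exploration in goal 1, the checks named there (missing values, outliers, bivariate patterns) can be sketched as follows. This is only a rough starting point using Python's standard library: the two columns and their values are hypothetical stand-ins, not the actual case study data, and the 1.5 × IQR rule is just one common convention for flagging outliers.

```python
import math
import statistics


def count_missing(values):
    """Number of missing entries (represented here as None)."""
    return sum(1 for v in values if v is None)


def iqr_outliers(values, k=1.5):
    """Values lying more than k * IQR beyond the first or third quartile."""
    clean = sorted(v for v in values if v is not None)
    q1, _, q3 = statistics.quantiles(clean, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in clean if v < low or v > high]


def pearson(xs, ys):
    """Pearson correlation over rows where both values are present."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    mean_x = sum(x for x, _ in pairs) / len(pairs)
    mean_y = sum(y for _, y in pairs) / len(pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical columns (names and values are assumptions for illustration)
area = [1000, 1200, None, 1500, 1100, 1300, 1400, 1250, 1350, 9000]
price = [50, 61, 58, None, 55, 66, 70, 62, 68, 72]

print("missing in area:", count_missing(area))
print("outliers in area:", iqr_outliers(area))
print("correlation(area, price):", round(pearson(area, price), 3))
```

Running the same three checks on every variable in the real data file, and plotting each one, should surface most of the patterns worth reporting. Note how a single extreme value, such as the 9000 above, can distort a correlation, which is one reason to look for outliers before any modelling.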