Facebook and Indian Elections
India is going to hold its 16th national elections in the middle of next year. Around 450 million* people are expected to vote. Just conducting the elections will require a huge infrastructure. On top of this, political parties are going to shove huge amounts of cash into the election kiln to woo voters.
If Facebook, a synonym for big data, is to be believed, the results of the elections are already out: the BJP is the hands-down winner and Mr. Modi is the next prime minister. Then should we not just cancel the elections and choose the next government using Facebook as the source? This would save the nation a huge amount of money at a time when the economy is crumbling. However, one must analyze the results from Facebook or any other social media site carefully before making a hard judgment. Mr. Modi may well be the next prime minister of India; however, the final results of the elections will have very little to do with the information available on Facebook. So, what are we missing here?
Last week a major Indian business daily, Mint, published an interview with Mark Zuckerberg, the founder of Facebook Inc. He mentioned that Facebook has around 1 billion active users, while the population of the world is over 7 billion. He feels that getting the next 5 billion people onto the internet is going to be a humongous task, and that connecting the first billion was nothing in comparison. Boy, he is right! Internet.org** is his effort to accomplish this task. OK, coming back to the Indian elections: Facebook has around 82 million monthly active users in India. I expect a sizable fraction of this user base will not vote, for reasons such as being underage, apathy, or not being a registered voter. I would estimate that only 20% of Indian Facebook users will vote; that is about 16 million people, or a little under 4% of the expected votes. On top of this, the clamor we are hearing on Facebook is coming from a relatively small fraction of those 82 million active users. It is possible that the BJP and Mr. Modi sponsored most of it.
* Roughly 65% of India's 1.2 billion people are eligible to vote, and historically India has seen a voter turnout of around 58%. Hence 1.2 billion × 65% × 58% ≈ 452 million.
** Internet.org is publicised by Facebook as an effort to make the internet accessible to the poor. However, it has also been criticised as a violation of net neutrality: Internet.org allows users to access only Facebook-approved websites through the Reliance network (this does seem fishy).
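The back-of-the-envelope estimates in this section can be checked with a few lines of Python. All inputs are the rough figures quoted above (population, eligibility, turnout, Facebook users in India, and the guessed 20% voting rate); the variable names are only illustrative.

```python
# Rough inputs quoted in the post; these are estimates, not official statistics.
POPULATION = 1.2e9          # Indian population
ELIGIBLE_SHARE = 0.65       # share of the population eligible to vote
TURNOUT = 0.58              # historical voter turnout
FB_MONTHLY_ACTIVE = 82e6    # Facebook monthly active users in India
FB_VOTING_SHARE = 0.20      # guessed share of those users who will actually vote

# Expected number of votes cast in the election.
expected_votes = POPULATION * ELIGIBLE_SHARE * TURNOUT

# Facebook users who are likely to vote, and their share of all votes.
fb_voters = FB_MONTHLY_ACTIVE * FB_VOTING_SHARE
fb_share_of_votes = fb_voters / expected_votes

print(f"Expected votes: {expected_votes / 1e6:.0f} million")
print(f"Facebook users who vote: {fb_voters / 1e6:.1f} million")
print(f"Share of all votes: {fb_share_of_votes:.1%}")
```

Running it prints about 452 million expected votes, 16.4 million Facebook voters, and a share of roughly 3.6% of the expected votes, in line with the 4% estimate above.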
Big Data does not equal total population
I must say I am excited about big data. I know the pain of data collection; sometimes it is bad enough to make one abandon the research altogether. Finally, it seems, humanity has come to a place where data is generated on its own, without intense and often boring labor. Now inquisitive minds can ask good questions and form testable hypotheses, and the data is already waiting to be analyzed.
Several people have proclaimed the death of sampling, the technique that has brought science this far. We have a new way of doing science: analyze the complete, big, fat population! OK, this is where the euphoria ends. Yes, the data is big, but it is still not the population. You could have a trillion data points to analyze, and that would still be a sample. Additionally, the age-old problem of sample bias will not disappear just because the data is big. As the Indian elections example shows, the Facebook data is neither the total population nor free of bias.
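To make the sample-bias point concrete, here is a toy simulation in Python. Every number in it is invented for illustration, not an estimate of Facebook or of Indian voters: a hypothetical electorate in which 35% support a hypothetical "party A", and a hypothetical platform that supporters join with probability 0.60 and everyone else with probability 0.15. The function compares the share of supporters seen on the platform with the share in a small simple random sample.

```python
import random


def share(flags):
    """Fraction of True values in a list of booleans."""
    return sum(flags) / len(flags)


def simulate(population_size=200_000, support_rate=0.35,
             join_if_supporter=0.60, join_if_not=0.15,
             random_sample_size=2_000, seed=7):
    """Compare a huge self-selected sample with a small random sample.

    All rates are hypothetical and chosen only to illustrate selection bias.
    """
    rng = random.Random(seed)

    # Hypothetical electorate: True means "supports party A".
    population = [rng.random() < support_rate for _ in range(population_size)]

    # Self-selected "platform" sample: supporters are more likely to join and post.
    platform = [v for v in population
                if rng.random() < (join_if_supporter if v else join_if_not)]

    # Small simple random sample drawn without replacement from the whole electorate.
    random_sample = rng.sample(population, random_sample_size)

    return {
        "true_share": share(population),
        "platform_size": len(platform),
        "platform_share": share(platform),
        "random_size": len(random_sample),
        "random_share": share(random_sample),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}" if isinstance(value, float) else f"{name}: {value}")
```

With these made-up settings, the self-selected platform sample has roughly 60,000 members, about thirty times the random sample, yet it reports around 68% support against a true share near 35%, while the 2,000-person random sample lands close to the truth. Collecting more users from the same biased source only narrows the error bars around the wrong answer.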
Big Data Analytics
Last year, I attended a daylong conference on big data analytics organized by SAS. In one of the sessions, an eloquent speaker presented a technology that performs logistic regression on a few million observations with several thousand variables in a matter of minutes. The technology harnesses the combined power of several servers and machines. I was spellbound! It takes me much longer to do the same task with just a handful of variables and a few thousand observations. When I came to my senses, I heard someone in the audience claim that we could take humans out of the process of predictive analysis: this technology could be set on autopilot to create predictive models and run the whole business. I immediately experienced the horror of all horrors. This is completely against the scientific thinking I have imbibed through years of scientific training.
Lessons from Real Science
The excitement of big data can easily lead people with dubious scientific knowledge to demolish the tenets of human knowledge. Genetics and astronomy are possibly the best-known examples of big data in mainstream science. Next in line could be the data generated by the Large Hadron Collider (LHC). None of these fields is thinking about purging humans from the process. Rather, big data is creating an enhanced ecosystem for humans to take science to the next level. We need to take a few lessons from these sciences to build a scientific approach to big data analytics.
I am excited about the prospects of big data. I believe that big data is going to shape human understanding of the world for generations to come. However, as of now, the field is going through a fair bit of irrational exuberance. I would divide the whole big data frenzy into three parts: information, knowledge, and wisdom. Information can be further divided into the quantity and the quality of information. When we discuss big data, I feel we put excessive emphasis on the quantity of information, while most discussions fail to capture knowledge and wisdom. I am also not sure that a large amount of information always leads to the latter two. I believe we need to distinguish between these three aspects to have a more meaningful discussion on big data. Could someone please turn on the lights once the ruckus is over?