Please read the disclaimer about the Free PDF Books in this article at the bottom
The one thing they love more than a hero is to see a hero fail, fall, die trying. In spite of everything you’ve done for them, eventually, they will hate you [Spider-Man].
– Green Goblin / Norman Osborn
R vs Python- by Roopam
Batman v Superman: Dawn of Justice was released in March 2016. The film didn’t do too well, but pitting these two superheroes against each other is an interesting idea. Both were introduced in comic books in the late 1930s by DC Comics, and both fight crime and criminals. However, over more than 75 years they have developed into sharply contrasting characters. They are as different as day and night. Superman represents the bright, sunny side of life, while Batman represents the dark, chilling nights. Notably, Superman gets his superpowers from the sun, while fear of bats and dark nights is the source of Batman’s power.
R vs Python – Superheroes
Let us continue with the theme of contrasting superheroes with a common mission. This time, we will make the superheroes of data science compete against each other: R vs Python. The idea of this article is to explain the superpowers of both R and Python, and also to suggest books for learning them. Most of these books are available online for free for the purpose of evaluation, and I will share those links here. To explain the superpowers of R and Python, let me draw a few connections between them and the DC Comics superheroes.
You may find it unusual but I see a few similarities between R and Batman. Moreover, for me, Python and Superman have some things in common as well. Let me create a table to list these similarities.
Analysis Tool | Similar Superhero | Super Powers in Common
R | Batman | Detective Work, Intelligence, Cunning, Usage of Tools, More Brain than Muscles
Python | Superman | Muscle Power, Super Strength, Elegance, Wide Range, More Muscles than Brain
Let me try to explain the reasons for these distinctions between R and Python in the next segment. Also, let us figure out a good approach for data scientists while using these languages.
R vs Python / R and Python : Which is a Good Approach?
Both R and Python are open-source, free-to-use, high-level programming languages. R was developed specifically for statistical computing. It has plenty of add-on packages and tools to support machine learning and data analysis. Python, on the other hand, is a powerful general-purpose programming language with special applications in data preparation, data munging, and data analysis.
This distinction is also why different communities of analysts prefer one language or the other. Python is often preferred by computer programmers trying to develop skills in number crunching and analysis. R, on the other hand, is preferred by mathematicians and statisticians. This difference is glaring in the learning resources (books and online) for these languages. For instance, consider the following four books for R available online for free (click on the books to read them for free).
YOU CANalytics Book Rating (5 / 5) – for all the 4 books mentioned below
All these books are high-quality statistical texts with R as the preferred language. These are just a few examples. Please note that the first book does not use R, but is by the same authors as the second book. You will rarely find books of this nature with Python as the preferred language. Hence, R is much better equipped to tackle problems in data mining and statistical analysis. On the other hand, Python provides great tools for working with unstructured and complicated datasets such as images, written text (web, emails, etc.), genomics, and sound.
In essence, Python and R together complete the toolkit for a data scientist. Hence, for a pragmatic and application-oriented data scientist, it is essential to understand the super-powers and qualities of both these languages.
R Qualities
Use R for analysis, data visualization, and modeling
Offers great flexibility for analysis
R makes it easy to think while doing your analysis
Constant upgrades and enhancements of analysis packages because of a highly active community in statistics and mathematics
Exceptional data visualization tools
Python Qualities
Use Python for data preparation and data munging, especially for unstructured data like web, images, text, etc.
Great flexibility and ability to extract information from free text, websites, and social media sites
Good at mining images and preparing data for analysis
Can handle large volumes of data better than R
For a serious data scientist, it is a good idea to have some functional knowledge of both R and Python. Hence, a practical approach is to think of them together as R and Python – instead of R vs Python. In the following section, I will suggest books and online resources for both R and Python.
R – Books and Online Resources
In one of the earlier articles on YOU CANalytics, I suggested many books and online resources to learn R. I have recently added links to PDF files for the R books in that article, so I suggest you revisit that post even if you have read it before. You can find that post at the following link – Learn R : 12-books (Free PDFs) and Online Resources.
R and Python – Books and Online Resources
This book uses both R and Python for marketing analytics. It is rare to find books that use both languages.
Marketing Data Science: Modeling Techniques in Predictive Analytics with R and Python – Thomas W. Miller
YOU CANalytics Book Rating (4.7 / 5)
“When I prepare data for analysis or work on the web, I use Python. For modeling or graphics, I often use R” – this statement by the author of this book summarizes the way data scientists want to use R and Python. This is an excellent book to learn marketing analytics. It covers all the major data science activities for marketers, including pricing, promotion, product design, and recommendation. However, before you reach for this book, make sure you have some functional knowledge of either R or Python.
Now let me introduce a few books I have found useful to learn Python. I have divided these books into four different categories based on their utilities. These books will be presented under the following categories:
Books: Python for General Purposes of Data Science
Books: Python for Specialized Applications in Data Science
Books: Python for Text Analytics
Books: Python for Image Analytics
Also, be prepared to see some exotic animals on the cover pages of almost all the books to follow.
1. Books: Python for General Purposes of Data Science
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython – Wes McKinney
YOU CANalytics Book Rating (4.2 / 5)
As I mentioned earlier, Python is excellent when it comes to data preparation, data munging, data wrangling, etc. This is a good book to start learning these skills. The book uses IPython, a friendly interactive interface, for all its code, which makes it easy for beginners and non-programmers. Additionally, it provides good working knowledge of NumPy and Pandas.
Data Science from Scratch: First Principles with Python – Joel Grus
YOU CANalytics Book Rating (4 / 5)
This book takes a much more balanced approach to theory and programming than most other books on the market that use Python as the language of choice. I still feel there are many better books, based on R, for learning the machine learning and statistical aspects of data science. However, if you want to learn these topics through Python, ‘Data Science from Scratch’ is not a bad book to start with.
2. Books: Python for Specialized Applications in Data Science
Programming Collective Intelligence: Building Smart Web 2.0 Applications – Toby Segaran
YOU CANalytics Book Rating (5 / 5)
This is a wonderful book for the following reasons: it is brilliantly written, fun to read, makes the reader think, and is quite practical. While reading this book you can easily tell that the author loves his subject. Collective intelligence is about making decisions through the wisdom of the crowd instead of one expert opinion. The book introduces practical approaches to extract this knowledge from the web. Given that the book was written in 2007, some of its code is outdated. However, the underlying principles and ideas are extremely relevant and will continue to be so. I strongly recommend that you read this book.
Read Full PDF: Programming Collective Intelligence (Use the first link in the Google Search)
Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More – Matthew A. Russell
YOU CANalytics Book Rating (4.8 / 5)
Are you interested in mining social media sites? Twitter, Facebook, LinkedIn, Google+: this book has a chapter on extracting information from each of these sites and more. It is especially good for extracting information from Twitter. However, I must offer a word of caution: the APIs for these social media sites change quite regularly, so you will hit a roadblock a few times while using the code from this book. I suggest you buy the latest edition and refer to the internet during your practice sessions.
3. Books: Python for Text Analytics
One of the most complicated problems in machine learning is extracting meaning from free-flowing text through algorithms. This book will introduce you to the wonderful world of text processing in an intuitive fashion. You will learn about string functions and operations, regular expressions, text parsing, etc. This is a great book to start your text processing journey. Notice, there are no animals on the cover of this book – how fascinating!
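As a small taste of the string operations, regular expressions, and parsing mentioned above, here is a minimal sketch using only Python’s standard library. The sample sentence and the `tokenize` helper are made up for illustration and are not taken from the book.

```python
import re
from collections import Counter

# Hypothetical sample text, invented for this illustration.
text = "R is great for modeling. Python is great for text; Python parses e-mails, web pages, etc."

def tokenize(raw):
    """Lowercase the text and pull out word tokens (hyphenated words stay whole)."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", raw.lower())

tokens = tokenize(text)

# Count how often each word occurs.
word_counts = Counter(tokens)

# A plain string method: check how the text starts.
starts_with_r = text.startswith("R ")

# The most frequent words are a crude first step toward summarizing free text.
top_words = word_counts.most_common(3)
print(top_words)
```

Real text analytics goes much further (stemming, stop words, parsing), but most pipelines begin with a tokenizing and counting step along these lines.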
Natural Language Processing with Python – Steven Bird, Ewan Klein, and Edward Loper
YOU CANalytics Book Rating (3.8 / 5)
The book can be considered a manual for the Python NLTK (Natural Language Toolkit). NLTK is a powerful toolkit for natural language processing (NLP), i.e. making machines understand human languages. The book doesn’t cover the theoretical depth and nuances of NLP, which is a bit frustrating. However, it is still a good book for learning NLTK.
Read Full PDF: Natural Language Processing with Python
4. Books: Python for Image Analytics
Programming Computer Vision with Python – Jan Erik Solem
YOU CANalytics Book Rating (4.3 / 5)
A greyscale digital image is just a large matrix of numbers, one per pixel. A color image has 3 such matrices, one each for the red, green, and blue (RGB) levels. A widescreen Full HD TV has image matrix dimensions of 1920 x 1080 pixels. A color image with these dimensions stores over 6 million numbers (1920 x 1080 x 3) to represent the individual pixels. Now, if you want to learn more about manipulating image matrices and image processing, read this book. It is a gentle introduction to computer vision. The question is, can the computer see the world the way you and I see it?
Read Full PDF: Programming Computer Vision with Python
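To make the matrix arithmetic above concrete, here is a minimal sketch in plain Python, with no imaging library. The tiny 2 x 3 image and the helper `count_pixel_values` are hypothetical, and the 1920 x 1080 frame size is an assumption standing in for a Full HD screen.

```python
def count_pixel_values(width, height, channels):
    """Number of stored values: one per pixel per channel."""
    return width * height * channels

# A hypothetical 2 x 3 greyscale image: one matrix, one intensity (0-255) per pixel.
grey = [
    [0, 128, 255],
    [64, 192, 32],
]

# The same-sized image in color: three matrices, one each for red, green, and blue.
color = {
    "red":   [[255, 0, 0], [10, 20, 30]],
    "green": [[0, 255, 0], [40, 50, 60]],
    "blue":  [[0, 0, 255], [70, 80, 90]],
}

rows, cols = len(grey), len(grey[0])
grey_values = count_pixel_values(cols, rows, 1)
color_values = count_pixel_values(cols, rows, len(color))

# Assumed Full HD frame (1920 x 1080) with 3 color channels.
full_hd_values = count_pixel_values(1920, 1080, 3)
print(grey_values, color_values, full_hd_values)
```

In practice these matrices would be held in NumPy arrays rather than nested lists, but the counting works the same way.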
Learning OpenCV: Computer Vision with the OpenCV Library – Gary Bradski & Adrian Kaehler
YOU CANalytics Book Rating (4.8 / 5)
Computer vision is a fascinating topic, as mentioned earlier. While we see pictures of a butterfly, computers see matrices of numbers. The question is how to make the computer identify the butterfly within the pixel numbers. OpenCV (Open Source Computer Vision Library) is a powerful library written in C and C++ that has answers to this question. OpenCV can be called from Python for image processing. This book is a great introduction to OpenCV. A must-read if you want to learn image processing and image analytics.
The one thing they love more than a hero is to see a hero fail, fall, die trying. In spite of everything you’ve done for them, eventually, they will hate you [Spider-Man].
– Green Goblin / Norman Osborn
I guess we do love to see our heroes fail. Why else would we make them compete against each other? I don’t know the reason for this. Possibly, as a race, we are sadistic creatures. Possibly, we are just jealous of people better than us. Possibly, we love sadness despite our claims about our love for happiness. Possibly, we relate to the demons these superheroes fight. Possibly, we believe in the futility of life.
All the above reasons are just half-truths to me. For me, a more likely reason is that we have both day and night inside us. Some days it is bright and sunny for us, and on other days it is pitch-dark. Let us embrace the grayness of life. In the same breath, let us embrace both Python and R with their individual shortcomings. Let’s stop pulling our superheroes down.
Disclaimer: Roopam Upadhyay or YOU CANalytics has no affiliation with either the authors of the books or the websites hosting the PDF books shared in this post. I am assuming that none of the PDF file links I have shared in this article is a copyright infringement since they are among the top Google search results. Several of these files are from either the authors' webpages or from scholarly links. In case you believe otherwise about any link, please let me know and I will remove that link.
Thanks a lot and very nice way of presentation and suggestion
Thank you for discussing the similarities and differences btw R and Python. Nice simplification of their uses and great resource with available texts to learn Python software.
Hi,
Nice website and updates. Enjoyed.
Cheers,
Thanks a lot for this nice article and very useful resources! As an R addict who senses access to Python’s world would widen the world, I greatly appreciate the recommendation for a good start with the blue and yellow snakes!
Best wishes
Hello Sir,
I am a bit out of context here in terms of the topic discussed above.
Can you please help me with some information on ‘Beta Distribution’. How to fit Beta distribution on an existing dataset varying from 0 to 1.
There are two methods for this. You could either use maximum likelihood estimate to calculate alpha and beta parameters for the beta distribution, or simply use mean and standard deviation of your data for the same.
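The second method in the reply above (using the mean and standard deviation) is usually called the method of moments. Here is a minimal sketch, assuming the data lie strictly between 0 and 1; the function name `fit_beta_moments` and the sample values are hypothetical.

```python
from statistics import mean, variance

def fit_beta_moments(data):
    """Estimate beta-distribution parameters from the sample mean and variance."""
    m = mean(data)
    v = variance(data)  # sample variance, i.e. standard deviation squared
    if not 0 < v < m * (1 - m):
        raise ValueError("variance must be positive and below mean * (1 - mean)")
    common = m * (1 - m) / v - 1
    alpha = m * common
    beta = (1 - m) * common
    return alpha, beta

# Hypothetical observations between 0 and 1.
sample = [0.12, 0.35, 0.41, 0.58, 0.27, 0.33, 0.49, 0.22]
alpha, beta = fit_beta_moments(sample)
print(alpha, beta)
```

For the maximum likelihood route, one option is `scipy.stats.beta.fit(data, floc=0, fscale=1)`, which pins the distribution to the 0 to 1 range. That needs SciPy installed and is not shown here.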
Hello,
Thanks for characterizing Python and R somewhat. I work mainly with Python myself and got more interested in R through your article. Interesting with the reference to Hastie et al.’s books. I would just like to add a concise, good book for starting to use Python (together with libraries such as NumPy, matplotlib, etc.), which is a good book if you already know some programming and your focus is on using Python for computations.
Claus Führer et al., Computing with Python: An Introduction to Python for Science and Engineering, published by Pearson, 2014.
http://www.amazon.com/Computing-Python-introduction-science-engineering-ebook/dp/B00IZI60QC/ref=sr_1_1?ie=UTF8&qid=1453978821&sr=8-1&keywords=fuhrer+computing+with+python
Very nice website.
Awesome Blogs and topics with free pdfs. Really I will recommend this website to my friends as well..
Lovely article – one of the best things I’ve recently read, and by far the most useful.
Hi all
Me, I started with R, without any prior knowledge of programming (even mathematics basics at that time were a mystery to me).
But by putting in an effort (the NYJH Coursera course on R programming was the first), I overcame my lack of prior knowledge, and this I think was due to R’s easy (in my view) programming style and the language idiosyncrasy.
Also I guess the teaching style had something to do with it.
I’ve also taken the ML course at Stanford, by the authors of the 2 books you present (ISLR and Stat. Learning), and found it to be an eye opener (R is also used in the course).
In all, from the 10+ R courses I’ve taken, these were the ones that helped me the most.
I know most people suggest that fresh starters follow the path from Python to data science, but to me Python was like reading hieroglyphics at that time.. :)
But then again, I am not one who’s famous for his perception >:) :)
Eventually of course I had to learn Python too. As the article well puts it, you need both skills if you are to “play” with data in the real world…
Plus a few other skills (if your target is to become an analyst of some kind).
On Python I’ve taken several introductory courses, plus a couple with Spark (where you will find that knowing Python helps). I am at this stage for now, trying to make sense of it all.. :)
regards to all
Hiii…..Your blog about the Python and R programming is really much informative and helpful to all people…Thanks for your informative sharing….
Thank you
Thank you so much
Python for Data Analysis by Wes McKinney is gold. Why? Because Wes McKinney is one of the creators of pandas itself [1]. So whatever this book teaches comes directly from the creator.
Ref.
[1] https://en.wikipedia.org/wiki/Wes_McKinney