Please read the disclaimer about the Free PDF Books in this article at the bottom

R, an open-source programming language for statistics and data mining, is slowly but surely catching up in its race with commercial software like SAS and SPSS. I believe R will eventually replace SAS as the language of choice for modeling and analysis in most organizations. The primary reason is plainly commercial: most organizations are questioning the heavy annual cost of SAS on their P&L statement, and the availability of R as a free and viable replacement only sharpens that question. R is a highly advanced language with over 5000 add-on packages to assist in data management and analysis. Most senior analysts and analytics leaders have already started polishing their R skills. In this article, I will introduce the books and online resources that will help you learn R and its applications. Before introducing these resources, let me explain why you need many resources for self-learning.

## Non-Linear Self-Learning

Humans are obsessed with linearity. Look at our houses, furniture, televisions, photo frames or cabinets: they all follow linear designs. The reason is that linearity is simple; however, it is certainly not natural. Outside our houses, nature is flourishing with non-linearity – trees, mountains, rivers and the human body all follow non-linear patterns and dynamics (to explore more, read about fractal geometry and chaos theory, or we will discuss it in some later articles on YOU CANalytics).

Learning and teaching in schools and universities usually take the linear path; however, self-learning, in my opinion, is highly non-linear. Unlike school learning, self-learning is driven by purpose and need, hence one tends to hop between books, chapters, and the internet – I say this from experience. Let me present the resources that have helped me the most to learn R. I have divided these resources into the following 5 categories:

**R for Reference**: these books cover most essential aspects about R and also serve well as reference books

**R with Theory**: these books are great if you want to understand fundamentals of statistics and machine learning while using R as the tool

**R with Applications**: these books use case studies or applications based learning

**R Graphics and Programming**: the focus of these books is on R graphics or programming

**Online Resource**: short online courses and computer-based learning tools (I have also included the most important online data repository over here)

Also, check out some awesome free books on Python for data science. Both R and Python are essential in a data scientist’s toolkit.

## 1. R for Reference

R for Data Science – Hadley Wickham & Garrett Grolemund

YOU CANalytics Book Rating

(5 / 5)

First of all, thanks to Jared for recommending this book in the comments section of this article. I have spent the last hour or so skimming through the book and believe it deserves a place right at the top. This is an extremely well-written and practical reference book. Moreover, I believe this is a good book for beginners to start with.

**Read the Full Book: R for Data Science**

R in Action – Robert Kabacoff

YOU CANalytics Book Rating

(5 / 5)

Here is another exceptional book to start learning R on your own. Robert Kabacoff, the author, has done a phenomenal job: the organization of the book is immaculate and the presentation is friendly. I highly recommend either this book or *R for Data Science* to start your journey to learn R.

**Read Full PDF: R in Action**

R for Everyone: Advanced Analytics and Graphics – Jared P. Lander

YOU CANalytics Book Rating

(5 / 5)

Jared Lander wastes no time on base graphics (which come pre-installed with R) but jumps directly to the ggplot2 package (a much more advanced and sleek graphics package). This sets the tone for the book, i.e. don’t learn things you won’t use in real-life applications later. I recommend this book for a fast-paced way to learn R.

**Read Partial PDF to Evaluate Table of Contents: R for Everyone**

The R Book – Michael J. Crawley

YOU CANalytics Book Rating

(4.8 / 5)

With close to a thousand pages and vast coverage, ‘The R Book’ could be called the Bible for R. The book starts with simple concepts in R and gradually moves to highly advanced topics. Its breadth is evident from the dedicated chapters on topics as diverse as data frames, graphics, Bayesian statistics, and survival analysis. Essentially, this is a must-have reference book for any wannabe R programmer. For a beginner, however, the thickness of the book could be intimidating.

**Read Full PDF: The R Book**

## 2. R with Theory

An Introduction to Statistical Learning: with Applications in R – Gareth James et al.

YOU CANalytics Book Rating

(5 / 5)

This book is a high-quality statistical text with R as the software of choice. If you want to get comfortable with fundamental concepts in parallel with learning R, then this is the book for you. Having said this, you will love this book even if you have studied advanced statistics. The book also covers some advanced machine learning concepts such as support vector machines (SVM) and regularization. A great book by all means.

**Read Full PDF: An Introduction to Statistical Learning**

Machine Learning with R – Brett Lantz

YOU CANalytics Book Rating

(4.5 / 5)

If you want to learn R from the machine learning perspective, then this is the book for you. Some people take a lot of interest in the fine demarcation between statistics and machine learning; however, for me, there is too much overlap between the topics. I have given up on the distinction as it makes no difference from the applications perspective. The book introduces the RWeka package; Weka is another open-source software package used extensively in academic research.

**Read Full PDF: Machine Learning with R**

## 3. R with Applications

R and Data Mining: Examples and Case Studies – Yanchang Zhao

YOU CANalytics Book Rating

(4.3 / 5)

There are other books that use a case-study approach to teach R. I like this one because of the interesting topics it covers, including text mining, social network analysis and time series modeling. Having said this, the author could have put some effort into the formatting of this book, which is plain ugly. At times you will feel you are reading a master’s-level project report while skimming through the book. However, once you get over this aspect, the content is really good for learning R.

**Read Full PDF: R and Data Mining**

Data Mining with Rattle and R: The Art of Excavating Data for Knowledge Discovery (Use R!) – Graham Williams

YOU CANalytics Book Rating

(4.2 / 5)

Rattle is no SAS Enterprise Miner or SPSS Modeler (both commercial GUI-based data mining tools). However, trust me, apart from a few minor issues Rattle is not at all bad. The book is a great reference to Rattle (a GUI add-on package for R to mine data) for data mining. I really hope they keep working on Rattle to make it better, as it has a lot of potential.

**Read Full PDF: Data Mining with Rattle and R**

## 4. R Graphics and Programming

ggplot2: Elegant Graphics for Data Analysis (Use R!) – Hadley Wickham

YOU CANalytics Book Rating

(4 / 5)

‘ggplot2’ is an exceptional package for creating wonderful graphics in R. It is much better than the base graphics that come pre-installed with R, so I would recommend you start directly with ggplot2 without wasting your time on base graphics. ‘R for Everyone’, discussed above, has a good introduction to ggplot2. However, if you want to explore ggplot2 in greater depth, then this is the book for you.

**Read Full PDF: ggplot2**

Though I prefer ggplot2, Lattice is another package on par with ggplot2. A good book to start with Lattice is ‘Lattice: Multivariate Data Visualization with R (Use R!)’ by

The Art of R Programming: A Tour of Statistical Software Design –Norman Matloff

YOU CANalytics Book Rating

(4.2 / 5)

If you want to learn the programming and coding aspects of R more than the analysis aspects, then this is the book for you. The author has extensive experience in R coding, and that is evident when you read this book. I must warn you that at times, while reading this book, one wonders about the utility of some of the things Mr. Matloff talks about. Nevertheless, this is the best book on the market to learn R programming. The author also touches on the issues of parallel computing in R, a topic highly relevant in the day and age of big data.

**Read Full PDF: The Art of R Programming**

## 5. Online Resource

YOU CANalytics Resource Rating

(4.9 / 5)

This is a wonderful place to learn R programming. Before jumping to the books, I recommend you take this free online course. You don’t need to install R on your system to complete this course. It will take you less than an hour to complete this course but will prepare you well for further learning. (**Link**)

Coursera: R Programming – Roger D. Peng

YOU CANalytics Resource Rating

(3.5 / 5)

I had really high expectations of this course on Coursera. They were high because Dr. Andrew Ng is associated with the site and his course on machine learning is delightful. However, the course by Dr. Roger D. Peng fell short of my expectations by some margin. The instructor is a good communicator and an expert in R, and the topics of this course are highly relevant for learning R. The biggest problem for me is the course’s tone, which is highly didactic. If Dr. Peng could slightly redesign this course around applications and examples, it would become a fantastic course. (**Link**)

Lynda.com: R Statistics Essential Training

YOU CANalytics Resource Rating

(4.5 / 5)

This course is not as comprehensive as the Coursera course above. However, its tone is much more applied and learner-friendly. (**Link**)

UCI Machine Learning Repository

YOU CANalytics Resource Rating

(5 / 5)

The UCI Machine Learning Repository has tons of freely available datasets. This site is not associated with R. However, the ‘datasets’ package in R has many of the datasets taken from this site. The reason you may still want to go to this site is that it provides links to research papers that have used these datasets. (**Link**)
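As a minimal sketch of how to explore the data sets that ship with R, the following base R calls list the contents of the ‘datasets’ package and load one of them. It makes no claim about which bundled data sets originate from the UCI repository; `iris` is used only because a copy of the Iris data is also hosted there.

```r
# List the data sets bundled with the 'datasets' package (installed with base R).
# data(package = ...) returns an object whose $results element is a character
# matrix with the columns "Package", "LibPath", "Item" and "Title".
available <- data(package = "datasets")$results
head(available[, c("Item", "Title")])

# Load one bundled data set and inspect its structure.
data(iris)
str(iris)
```

Running `?iris` (or `?` followed by any other item name) opens the help page for that data set, which usually cites its original source.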

A few more great online resources to learn R:

1) Datacamp (Link): Great courses on R; try this site for some interactive courses on R

2) Open Intro (Link): This site has some really good tutorials for doing basic statistics in R

3) R-tutor (Link): This is a good site to start learning R from scratch

4) R-bloggers (Link): A great collection of blogs on R; may not be the place you want to visit first up

5) Kaggle (Link): This link has 3 good tutorials to learn R

#### Sign-off Note

Let me draw a loose parallel between Excel and R to offer you some advice about learning R. As I have mentioned earlier, R has more than 5000 add-on packages in the CRAN library and millions of functions for data analysis. This may sound a bit daunting to a new learner. Luckily, the online resources are quite powerful, so the sheer number of functions won’t be a challenge. Moreover, if you have worked in Excel, you will know that there are just a handful of functions that you use repeatedly based on your style of analysis. The same pattern will emerge with R as well. Hence, don’t get intimidated by the number of functions.
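To make the "handful of functions" point concrete, here is a small sketch that uses only base R and the built-in `mtcars` data set. The particular functions shown are an illustrative choice of everyday workhorses, not a prescribed list.

```r
# A few base R functions that cover a large share of day-to-day exploration.
data(mtcars)

str(mtcars)          # structure: column names, types, first few values
head(mtcars, 3)      # first three rows
summary(mtcars$mpg)  # five-number summary plus the mean for one column

# Average miles per gallon by number of cylinders (formula interface).
avg_mpg <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(avg_mpg)
```

Typing `?aggregate` (or `?` followed by any function name) at the R prompt opens the built-in help page for that function.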

Enjoy learning R! It is good fun.

Disclaimer: Neither Roopam Upadhyay nor YOU CANalytics has any affiliation with the authors of the books or the websites hosting the PDF books shared in this post. I am assuming that none of the PDF file links I have shared in this article is a copyright infringement since they are among the top Google search results. Several of these files are from either the authors’ webpages or from scholarly links. If you believe otherwise about any link, please let me know and I will remove that link.

Nice work Roopam..keep this good work up and thanks for aggregating these meaningful n imp information in one go..much appreciated.

I second what Abhinav has said. Great job in helping others with relevant knowledge sharing. Keep up the oustanding work !!

IBM has proven with its recent SPSS Modeler with R (Version 16) release that these two programs can actually CO HABITAT and EXTEND R. Realistically, R doesn’t scale well, doesn’t support enterprise deployment, and let’s face it not all people are R programmers – you need Modeler to make R usable. Other vendors are also trying to do this, but with Modeler 16 IBM SPSS is way ahead of the game.

Thanks Randy for pointing out that Modeler 16 has included this feature to integrate R. I have used Clementine in the past, and Modeler 14 was the last version that I tried. I like the way SPSS (now IBM) had developed Clementine, and it’s a great tool to mine data. However, the price tag attached to this software often deters organizations from replacing SAS with SPSS. Clementine, now Modeler, has Python libraries as the underlying computing algorithms (of course they don’t reveal it anywhere). I think it would be great if someone could develop a viable platform like Modeler for R with a reasonable price tag.

I suggest you add Coursera’s course recommended title “Software for Data Analysis-Programming with R” by John Chambers. A little verbose at times, but worth treading it.

excellent collection- thanks!

Thank you for your useful information. Nice collection!

This is a great idea!

Hi Roopam,

Amazing collection. Kudos !!. Do you have some thing similar for Python as well? can you please write similar stuff for python that would be really great help.

Thanks and Regards,

S.S.Pradeep

Thanks Pradeep, try this link to find free books on Python for data science.

thanks roopam

Roopam,

Thanks for compiling! A few comments:

1) I have found the Matloff book the most useful. It’s reasonably clear in its design and examples but fairly thorough in its discussion of base R. I still refer to it frequently.

2) Hadley Wickham’s new book on advanced R isn’t listed and probably deserves consideration

3) One of the concerns I have when considering reference books is how recently they were written. 3-5 years out is a long time in the software world both in terms of changes to base as well as new and expanded packages. Perhaps list the year of publication above?

Thanks again for your hard work reading all of the above and compiling this list!

Thanks Ben,

I also had a good time reading Matloff’s book. Both Matloff’s and Hadley Wickham’s books are primarily focused towards advanced coding / programming in R. I still need to read Advanced R by Wickham to evaluate it properly. However, it will be great if you could evaluate this book in a paragraph or 2 – I will add your comments (with credit) in this article with the book cover. I am sure the readers will find it useful.

I appreciate your 3rd point regarding the year of publication for the books, but I believe it is more appropriate for coding / programming books. For statistics and machine learning related text on R, I believe, the body of knowledge doesn’t change much over a period of 3-5 years.

Excellent piece of information. ‘R’ is widely used in various high tech research in many sectors. There are some vendors who has got commercial versions of ‘R’ which are as good as other commercial languages and offers all advanced features like in memory processing etc.

I hope you don’t mind my adding my own publication to the list.

Business Case Analysis with R: A Simulation Tutorial to Support Complex Business Decisions takes the use of R in a different direction from that of data and statistical analysis to business case analysis. This is the top level description…

Yes! Thank you for sharing this. I have been looking for business case studies using R. Would love to see more!

Kayla, Nearly a year later, I just now saw your response. Were you able to make use of the tutorial?

As always, Roopam, you have done fabulous work and a great service to the data analytics community in describing all of these resources for learning R and your personal experiences with them. Kudos to you! As I am currently inexperienced with R and trying to get up to speed, it looks like the best sequence with online resources might be Code School, then Lynda, then Coursera, moving from basic to heavy duty. Does that make sense? Also, in terms of certification/recognition by potential employers, I know that the Coursera offerings from Johns Hopkins have good value, is this also true of the Lynda offerings, or do they merely serve as a “warm-up” for the more demanding courses? Additionally, I am also trying to figure which of the R interfaces (like R studio) would be the best to pursue. I must apologize, I have not read all of your blogs on YOU CANalytics, it is very possible you have commented elsewhere on these issues. Any thoughts you have on this would be much appreciated. Thanks!

Thanks James!

Yes, your sequence of courses seems right to me in terms of difficulty levels. Between Code School and Lynda, I would recommend squeezing in two more free courses: Open Intro and Data Camp (the links are available in the table above Sign-off Note). If you feel ready after them, you could skip Lynda altogether and move to Kaggle challenges. Lynda, in my opinion, serves more as a warm-up. I don’t feel employers and the professional community consider it a high-value addition to the CV. However, it is a good course to start with.

In terms of R interfaces, I am highly biased towards RStudio. I have never used any other interface after using RStudio for all these years. I used to rely on the base R interface, which I have not used for more than five years now. RStudio slowly grows on you, so I recommend sticking with it. You may want to try out Rattle as well. I have heard good reviews about the H2O package but have not tried it just yet.

I didn’t see it mentioned but I found datacamp.com to be particularly useful in learning R. That is a great online resource as well.

Thanks for your commendation.

DataCamp is awesome, as others have mentioned…and I am a huge fan of Nicole Radziwill’s Statistics The Easy Way With R. It is user friendly and covers the R basics for those getting started, also includes links to data sets.

I think you need to look at overall schema of data science offered by coursera. Dr Peng programming in R is an introduction in R, is one of the subject. 9 out 10 subjects in data science required lots of R programming knowledge, so, it’s incremental learning. you need to start somewhere.

Dear Roopam,

I’d like to add another book to your book review which I think should be added to a new category which I would call ‘overall data science process’. The title of the book is:

Behind every good decision: How Anyone Can Use Business Analytics to Turn Data into Profitable Insight.

I read the book and it has 2 main components in my view:

1. Examples of how to use business analytics to gain a competitive advantage. These examples are not exhaustive, but more of a describing nature.

2. The overall flow of a data science project in a business environment. They present a method that they call ‘badir’ which I guess is another implementation of the ‘crisp’ model. The great thing about this book is that they describe in a very rigorous way what steps to take to go from a business question to good insights and what pitfalls to avoid.

3. How to create an analytics organisation.

My experience in engineering is that using a structured approach to solving problems is one of the most important aspects of making a project successful, and this book explains in great detail how to do that for data science.

The only comment that I have is that they call the job of a data scientist ‘simple’ for most problems. My experience is that only a limited group of people is capable of solving problems in a structured way suggesting it’s not that simple.

The book is real easy to read.

Best regards,

Hans

Thanks for sharing.

can u refer a book for me. i am working on orcacle bi (OBIEE). oracle integrated with R. i want to learn R for analytic s point of view.

I’d recommend

Business Intelligence with R – From Acquiring Data to Pattern Exploration

by Dwight Barry

https://leanpub.com/businessintelligencewithr

This is a new book that is not 100% finished, but it is available. I reviewed it and found it to be very helpful.

I also have a book on using R for business case analysis, which is a slightly different use case for R from its usual data analytics. It incorporates principles of decision and risk analysis.

Business Case Analysis with R – A Simulation Tutorial to Support Complex Business Decisions

https://leanpub.com/bizanalysiswithr

My book was republished by Springer-Nature/Apress last month. It is now available at

https://www.apress.com/us/book/9781484234945#

and Amazon

https://www.amazon.com/Business-Case-Analysis-Simulation-Tutorials-ebook/dp/B07B6Q7TK1/

“R for Data Science” is now also complete and available online.

http://r4ds.had.co.nz/

by Garrett Grolemund and Hadley Wickham

Thanks, Jared. This book deserves a place right at the top.

Try this

https://www.edx.org/course/analytics-edge-mitx-15-071x-2

Great postings! I feel lucky to join this group.

R Programming is an software environment for statistical computing which are most widely used by data miners and statisticians for developing statistical software and data analysis….The blog is very informative …Thanks for updating these types of informative…

Nice blog…Am an beginner to R Programming field..This information is much useful for all…Keep instantly updating these types of informative…

Can we use R for Retail Analytics also ?

Is there any book for ” Retail Analytics using R “.

Nice information Thanks

Hi, I am really happy to found such a helpful and fascinating post that is written in well manner. Thanks for sharing such an informative post.

Thank you for sharing great resources to learn data science. I am looking for Data Science Training, Some suggest me to join janbasktraining for Data Science Training. please suggest me This is best for data Science learning or not.

Thanks Roopam

I was wondering if it is possible for you to create a similar blog for SAS?

Thanks again for your great contribution!

Very Useful

hi

I want this book for free (file book) pdf

R for Everyone: Advanced Analytics and Graphics – Jared P. Lander

YOU CANalytics Book Rating (5 / 5)

Very nice collection, amazing read, R’ offers advanced features like in memory processing etc, and allot of vendors use this commercial version of R’ Very useful information.

Hey I got a question!

I was looking through “The R Book” and noticed that it uses a lot of imported data sets. Where do I get those data sets for practicing?