R for Statistical Computing

R is an open-source software environment that provides the muscle for statistical data analysis, graphics and algorithm development. It compiles and runs on a wide variety of UNIX platforms, Windows and macOS.

R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team. R is named partly after the first names of its first two authors.

1) Install R from http://cran.r-project.org/bin/windows/base/ (this is the Windows installer; CRAN also provides builds for macOS and Linux)
2) Install RStudio (an IDE for R) from http://www.rstudio.com/products/rstudio/download/
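Once both are installed, a quick sanity check is to run a few lines in the RStudio Console to confirm which version of R it picked up. This is a minimal sketch; the last check relies on RStudio setting the `RSTUDIO` environment variable to "1" in its sessions, so treat that part as an assumption.

```r
# Print the installed R version string, e.g. "R version 4.x.x (date)"
print(R.version.string)

# Major and minor version numbers as separate strings
cat("Major:", R.version$major, " Minor:", R.version$minor, "\n")

# Assumption: RStudio sets the RSTUDIO environment variable to "1"
in_rstudio <- identical(Sys.getenv("RSTUDIO"), "1")
cat("Running inside RStudio:", in_rstudio, "\n")
```

Run from a plain R console instead of RStudio, the last line simply reports FALSE.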

Why is R so IMPORTANT?
http://www.kdnuggets.com/2017/05/poll-analytics-data-science-machine-learning-software-leaders.html
The poll above lists R as the second most popular language for Data Science / Data Analytics in 2017. I shall not repeat what others have said about the benefits of R, but will point to their link instead.

Why is now the time to learn R?

R, a statistical computing language that is handy for analyzing and visualizing big data, came in at sixth place in a 2015 ranking of the Top 10 Programming Languages. The year before, it was in ninth place, and its rise reflects the growing importance of big data to a number of fields.

For first-time R users, click here.

Below are some brief explanations or tips which could be useful for first-time users of R.
From RStudio, go to File -> New File -> R Script. The file extension for an R script is .R. I normally include these three lines at the beginning of my R scripts:

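rm(list = ls())                   # remove all variables from the global environment
cat("\014")                       # clear the RStudio Console (prints a form-feed character)
if (dev.cur() != 1) { dev.off() } # close the current graphics device (clearing plots), if one is open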

You can also clear the Console by placing the cursor in the Console and pressing Ctrl+L.

If you want to run a particular line or a few lines, highlight them and click the Run button. If you want to run the complete R script, click the Source button.
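Clicking Source is roughly equivalent to calling `source()` on the saved file from the Console. The sketch below writes a small throwaway script to a temporary file (the file and its contents are purely illustrative) and then runs the whole thing with `source()`; `echo = TRUE` prints each expression as it runs, similar to RStudio's "Source with Echo".

```r
# Write a tiny illustrative script to a temporary .R file
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "x <- c(2, 4, 6)",
  "x_mean <- mean(x)",
  "print(x_mean)"
), script_path)

# Run the complete script, as the Source button does;
# echo = TRUE also prints each expression before its output
source(script_path, echo = TRUE)

# Variables created by the script now exist in the global environment
print(exists("x_mean"))
```

By default `source()` evaluates the script in the global environment, which is why `x_mean` is still available afterwards.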

Stored variables are listed in the Environment tab on the right-hand panel.
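The Environment tab has a Console counterpart: `ls()` returns the names of the objects in the global environment, `ls.str()` shows each object's type with a short preview, and `rm()` deletes selected objects. A small sketch with made-up variable names:

```r
# Create a few example variables (names are illustrative)
height_cm <- c(160, 172, 181)
city <- "Auckland"

# Names of objects in the global environment (what the Environment tab lists)
print(ls())

# Type and a short preview of each object
print(ls.str())

# Remove a single variable; rm(list = ls()) would remove them all
rm(city)
print(exists("city"))
```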

To comment or uncomment multiple lines of R code, highlight them and press Ctrl+Shift+C.

You can google any question about R, and there are bound to be some suitable answers. If not, go to StackOverflow and post your question there.

R can import data from many different sources, be it local files or files on the internet. It can read Excel files, SPSS files, SAS datasets, data from relational databases (MySQL, SQLite, Oracle, MS SQL) and data from non-relational data stores (Hadoop, MongoDB). In addition, R can read through the HTML of a website (what we call web scraping) and import the tables you want, whether simple, complicated or multiple, as well as structured XML or JSON files.
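As a minimal sketch of two of these routes: base R's `read.csv()` reads a local CSV file (here a temporary file created on the fly from the built-in `mtcars` data, standing in for your own data), and the `DBI` package with the `RSQLite` driver queries a relational database (here an in-memory SQLite database). `DBI` and `RSQLite` are add-on packages that must be installed separately, so that part only runs if they are available. Excel, SPSS/SAS, web pages and JSON are usually handled by further packages (for example `readxl`, `haven`, `rvest` and `jsonlite`), which are not shown here.

```r
# --- Local file: write mtcars to a temporary CSV, then read it back ---
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = TRUE)

# row.names = 1 uses the first column (the car names) as row names
cars_csv <- read.csv(csv_path, row.names = 1)
str(cars_csv[, 1:3])

# --- Relational database: in-memory SQLite via DBI (optional packages) ---
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  DBI::dbWriteTable(con, "mtcars", mtcars)

  # Plain SQL; the result comes back as an ordinary data frame
  mpg_by_cyl <- DBI::dbGetQuery(
    con,
    "SELECT cyl, AVG(mpg) AS avg_mpg FROM mtcars GROUP BY cyl ORDER BY cyl"
  )
  print(mpg_by_cyl)

  DBI::dbDisconnect(con)
} else {
  message("Install DBI and RSQLite to run the database example.")
}
```

For a real MySQL, Oracle or MS SQL server, only the `dbConnect()` call changes (different driver and connection details); the query functions stay the same.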

A comprehensive list of categorized R packages and frameworks for different task domains has been compiled here. See this link for the complete alphabetical list of all R packages.
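Once you have picked a package from those lists, you install it once with `install.packages()` and load it in each session with `library()`. The helper below, `ensure_package()`, is a hypothetical convenience function (not part of R) that installs a package from CRAN only if it is missing and then loads it; installing needs an internet connection.

```r
# Hypothetical helper: install a CRAN package only if it is missing, then load it
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # downloads from the configured CRAN mirror
  }
  library(pkg, character.only = TRUE)  # attach the package to the search path
  invisible(TRUE)
}

# "tools" ships with R but is not attached by default, so nothing is downloaded;
# for an add-on package you would write e.g. ensure_package("readxl")
ensure_package("tools")
print("package:tools" %in% search())
```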

For more advanced R users with compute-intensive applications, a viable option is to consider Graphics Processing Units (GPUs), which can deliver significant performance gains. A GeForce graphics card from NVIDIA would be needed.

To learn more about R’s memory management, click here.
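A couple of base R functions are handy for a first look at memory use: `object.size()` estimates how much memory an object occupies, and `gc()` triggers garbage collection and returns a summary of memory in use. The vector below is just an example of roughly 8 MB (one million doubles at 8 bytes each).

```r
# Example object: one million doubles (8 bytes each, roughly 8 MB)
x <- runif(1e6)

# Estimated memory used by the object, in bytes and in a readable unit
size_bytes <- as.numeric(object.size(x))
print(object.size(x))
print(format(object.size(x), units = "auto"))

# Remove the object and run garbage collection to free its memory;
# gc() also returns a matrix summarising memory in use
rm(x)
mem_summary <- gc()
print(mem_summary)
```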

For seasoned MATLAB users, here is a good reference guide comparing R and MATLAB functions side by side. Also attached here is a useful cheat sheet on data exploration using R.
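To give a flavour of that kind of mapping, here are a few common matrix operations with the MATLAB equivalent noted in the comments, followed by the usual first-look functions for exploring a data frame (`str()`, `summary()`, `head()`) applied to the built-in `iris` dataset. This is a small illustrative selection, not taken from the linked guide or cheat sheet.

```r
# MATLAB: A = [1 2; 3 4]   (R fills column by column unless byrow = TRUE)
A <- matrix(c(1, 2, 3, 4), nrow = 2, byrow = TRUE)

# MATLAB: B = eye(2)
B <- diag(2)

# MATLAB: A'        -> transpose
At <- t(A)

# MATLAB: A * B     -> matrix product (in R, * is element-wise)
AB <- A %*% B

# MATLAB: A .* A    -> element-wise product
AA <- A * A

# MATLAB: size(A)   -> dimensions
print(dim(A))

# MATLAB: inv(A)    -> inverse
A_inv <- solve(A)

# Quick data exploration on the built-in iris data frame
str(iris)              # column types and first values
print(summary(iris))   # per-column summary statistics
print(head(iris, 3))   # first three rows
```

The most common surprise for MATLAB users is the operator swap: `*` in R is element-wise, and matrix multiplication is `%*%`.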

List of collections of R Books

See this link for a step-by-step guide to learning Data Science with R.
