1) Using R to search Twitter Data (Text Mining), see also this.

2) Using R to plot Ebola data

3) Using R as PivotTables

4) Using R for Web Scraping
Recommended Books
a) XML and Web Technologies for Data Sciences with R
b) Automated Data Collection with R: A Practical Guide to Web Scraping and Text Mining

5) R Web deployments using Shiny, Shiny CheatSheet
a) My First Shiny Application
b) Retrieve Production Data
c) Online Survey.

6) Dropbox and R Integration

7) R and D3.js Integration. See code from here.

8) Some important R packages you MUST know
a) dplyr: provides a set of tools for efficiently manipulating datasets in R. (old: plyr). A fantastic tutorial from the author of dplyr, Hadley Wickham. A comprehensive video on dplyr is here. This website also provides very good information on summarizing data using the ddply(), summaryBy() and aggregate() functions. See also here.
b) tidyr: easily tidy data with spread() and gather() functions. (old: reshape2)
c) ggvis: the goal is to make it easy to describe interactive web graphics in R. (see also ggplot2, which remains the standard for static graphics)
d) magrittr: provides a mechanism for chaining commands with a new forward-pipe operator, %>%
e) Some combined examples of dplyr, reshape2, forward-pipe operator.
f) Data Wrangling, CheatSheet
g) Data Transformation with dplyr Cheat Sheet
h) tidyr, gather(), separate(), spread()
i) tidyr, gather():From wide to long, spread():From long to wide
j) Data manipulation in both R and Python
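
As a rough illustration of how the packages above fit together, the sketch below chains dplyr verbs with %>% and reshapes a table with tidyr's spread() and gather(). The sales data frame and its column names are invented for this example.

```r
library(dplyr)   # also makes the %>% pipe from magrittr available
library(tidyr)

# Hypothetical data: units sold per store and quarter
sales <- data.frame(
  store   = c("A", "A", "B", "B"),
  quarter = c("Q1", "Q2", "Q1", "Q2"),
  units   = c(10, 15, 7, 9),
  stringsAsFactors = FALSE
)

# dplyr: total units per store
totals <- sales %>%
  group_by(store) %>%
  summarise(total_units = sum(units))

# tidyr: from long to wide (one column per quarter) ...
wide <- spread(sales, key = quarter, value = units)

# ... and from wide back to long
long <- gather(wide, key = "quarter", value = "units", Q1, Q2)
```

Newer tidyr versions also offer pivot_wider() and pivot_longer() for the same two reshaping tasks.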

9) data.table vs data.frame

10) Efficient way to access data from a list of data frames, especially for large number of elements. See here and here.

11) Tips on Handling Big Data in R.

12) We can speed up R code substantially by preallocating memory for variables (vectors, matrices, lists, etc.). See here and this link.
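
A minimal sketch of the idea, using only base R. The function names grow and prealloc are made up for this example; the timings will differ from machine to machine, so none are quoted here.

```r
n <- 10000

# Growing a vector: every c() call builds a new, longer copy
grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i^2)
  out
}

# Preallocating: the full-length vector is created once and filled in place
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

t_grow     <- system.time(res_grow <- grow(n))
t_prealloc <- system.time(res_prealloc <- prealloc(n))
```

Printing t_grow and t_prealloc shows the elapsed times side by side; both functions return the same values.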

13) R joins (left, right, inner, full, semi, anti), see also here.
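
The six join types can be sketched with dplyr, assuming that package is the one the linked posts use. The customers and orders tables are invented for this example.

```r
library(dplyr)

# Hypothetical tables sharing the key column "id"
customers <- data.frame(id = c(1, 2, 3), name = c("Ann", "Bob", "Cy"),
                        stringsAsFactors = FALSE)
orders    <- data.frame(id = c(2, 3, 4), amount = c(50, 20, 70))

left  <- left_join(customers, orders, by = "id")   # all customers, amount NA if no order
right <- right_join(customers, orders, by = "id")  # all orders, name NA if no customer
inner <- inner_join(customers, orders, by = "id")  # ids present in both tables
full  <- full_join(customers, orders, by = "id")   # ids present in either table
semi  <- semi_join(customers, orders, by = "id")   # customers that have an order
anti  <- anti_join(customers, orders, by = "id")   # customers without an order
```

Note that semi_join() and anti_join() only filter the first table; they never add columns from the second.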

14) How to find consecutive repeats in R, see here.
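
The linked answer is not reproduced here; one common base R approach, shown as a sketch, is run-length encoding with rle(). The vector x is an invented example.

```r
x <- c(1, 1, 2, 3, 3, 3, 1)

# rle() collapses the vector into runs of equal neighbouring values
runs <- rle(x)

# Values that occur at least twice in a row
repeated_values <- runs$values[runs$lengths > 1]

# Position in x where each run starts, then only the repeated runs
run_starts    <- cumsum(c(1, head(runs$lengths, -1)))
repeat_starts <- run_starts[runs$lengths > 1]
```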

15) We all know the average of 4 and 5 is 4.5. So why does mean(4,5) give 4? Because mean() only averages its first argument; the 5 is matched to the next parameter, trim. The answer is to write mean(c(4,5)), which gives 4.5.
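
The difference can be checked directly in base R:

```r
wrong <- mean(4, 5)      # x = 4, trim = 5: the result is based on 4 alone
right <- mean(c(4, 5))   # x = c(4, 5): both numbers are averaged

# sum() behaves differently: it adds up all of its arguments
total <- sum(4, 5)
```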

Issues-Related & Syntax

(1) If you have tried to deploy a Shiny application using R version 3.2.0, you would have encountered the below error.
Error: Unhandled Exception: Child Task 27278508 failed: Error parsing manifest: Maximum supported R version is: 3.1.3 Execution halted.
Solution: Use R version 3.1.3 instead.

(2) With R version 3.1.3 and RStudio version 0.98.1103, pressing the Run App button with the server.R file active produced the following problem:
R Session Aborted. R encountered a fatal error. The session was terminated.
Solution: install.packages("Rcpp") and the issue is solved.

(3) When loading library(gdata), the following errors appear:
gdata: Unable to locate valid perl interpreter
gdata: read.xls() will be unable to read Excel XLS and XLSX files unless the ‘perl=’ argument is used to specify the location of a valid perl interpreter.
Solution: Read here and download and install ActivePerl from
Loading library(gdata) again should produce the following output:
gdata: read.xls support for ‘XLS’ (Excel 97-2004) files ENABLED.
gdata: read.xls support for ‘XLSX’ (Excel 2007+) files ENABLED.

(4) When reading a .csv file, the following warning message may appear:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
EOF within quoted string
Solution: require(data.table), then read the file with fread().

(5) Be extra careful with if-else statements in R, especially for C++ programmers. See here.
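
The linked post is not reproduced here. As an assumption about what it covers, the sketch below shows two pitfalls that commonly surprise C++ programmers: where `else` may start, and the fact that `if` is not vectorised. The function name classify is made up for this example.

```r
# At top level, R finishes the `if` statement at the end of its line,
# so an `else` that starts the next line is a syntax error.
bad_code <- "if (x > 2) print('big')\nelse print('small')"
parse_result <- tryCatch(
  { parse(text = bad_code); "parsed" },
  error = function(e) "syntax error"
)

# Keeping `else` on the same line as the closing brace works everywhere.
classify <- function(x) {
  if (x > 2) {
    "big"
  } else {
    "small"
  }
}

# `if` expects a single TRUE/FALSE; for whole vectors use ifelse().
labels <- ifelse(c(1, 5, 3) > 2, "big", "small")
```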

(6) Be extra careful when you copy R code containing double quotes from the internet or a PDF file and paste it into RStudio. When the word inside the double quotes is black, it is wrong: the pasted quotes are typographic (curly) quotes, which R does not accept as string delimiters. The word inside the double quotes should be light green, as seen in the second line. See here.

(7) When using list() as shown here, the error Error in `*tmp*`[[i]] : subscript out of bounds will appear.
Solution: Use lapply(), as shown here.
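
The linked code is not available here, so the following is only one plausible way the error arises, together with the lapply() alternative. The element names value and square are invented for this example.

```r
# Assigning into a nested element that does not exist yet fails,
# because results[[1]] cannot be looked up in an empty list.
results <- list()
msg <- tryCatch(
  { results[[1]][["square"]] <- 1; "no error" },
  error = function(e) conditionMessage(e)
)

# lapply() builds the whole list in one go, one element per input value.
squares <- lapply(1:3, function(i) list(value = i, square = i^2))
```

Because lapply() returns a list of the right length, there is no index bookkeeping that could run out of bounds.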

(8) When using data.table with R version 3.2.0 as shown here, the Error in setkeyv(…, physical = FALSE) : 4 arguments passed to .Internal(nchar) which requires 3 will appear.
Solution: Use R version 3.2.2, as shown here.

(9) When creating plots you might get the error "figure margins too large". To avoid it, first check the output of par("mar"). You should be getting: [1] 5.1 4.1 4.1 2.1
To change that, write: par(mar=c(1,1,1,1)). This should rectify the error.
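
A small sketch of the same steps that also restores the original margins afterwards. The off-screen pdf(NULL) device is an addition for this example so that it runs without a plot window.

```r
pdf(NULL)                          # off-screen device, no file is written

default_mar <- par("mar")          # bottom, left, top, right margins in lines
old <- par(mar = c(1, 1, 1, 1))    # shrink the margins; old settings are returned
small_mar <- par("mar")

plot(1:10)                         # the plot now has more room

par(old)                           # restore the previous margins
invisible(dev.off())
```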

(10) Be extra careful if you have loaded library(dplyr) first, followed by library(plyr). See here for the message. If you run this code, the result is wrong: it gives a single mean value. You need to correct the code as shown here, which gives 2 mean values.
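
The linked code is not reproduced here. One way to sidestep the masking, shown as a sketch with invented data, is to name the package explicitly when calling summarise():

```r
library(dplyr)

# Hypothetical data: two groups with two values each
df <- data.frame(group = c("a", "a", "b", "b"), value = c(1, 3, 10, 20),
                 stringsAsFactors = FALSE)

# dplyr::summarise() is used even if plyr was loaded afterwards
# and its own summarise() masks the dplyr one.
group_means <- df %>%
  group_by(group) %>%
  dplyr::summarise(mean_value = mean(value))
```

Loading plyr before dplyr is the other common remedy, since the package loaded last wins the name.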

(11) Avoid using a for loop over a list; instead use the lapply function, as seen here.

(12a) mutate with ifelse: ifelse() needs a third argument, for example a zero (0), at the end; otherwise R complains "Error: argument "no" is missing, with no default". See here.

(12b) mutate with ifelse: change particular rows based on a condition, e.g. meter_type == "BYPASS", and update another column (min_5_flow_adjusted) for those rows. See here.
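
Both points can be sketched together. The column names meter_type and min_5_flow_adjusted come from the tip above; the source column min_5_flow, the sample values and the choice of 0 as the replacement are assumptions made for this example.

```r
library(dplyr)

# Hypothetical meter readings
readings <- data.frame(
  meter_type = c("BYPASS", "MAIN", "BYPASS"),
  min_5_flow = c(12, 30, 8),
  stringsAsFactors = FALSE
)

# ifelse(test, yes, no): all three arguments are supplied.
# BYPASS rows get 0, every other row keeps its original flow.
adjusted <- readings %>%
  mutate(min_5_flow_adjusted = ifelse(meter_type == "BYPASS", 0, min_5_flow))
```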

(13) To combine different files with different columns into a single data.frame, use rbind.fill. See here. The plyr library is needed for the rbind.fill function.
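
A minimal sketch with two invented data frames standing in for the files:

```r
# Two hypothetical data frames with only partly overlapping columns
jan <- data.frame(id = 1:2, flow = c(5.0, 6.5))
feb <- data.frame(id = 3:4, flow = c(7.1, 4.2), note = c("ok", "check"),
                  stringsAsFactors = FALSE)

# rbind.fill() stacks the rows and fills columns missing from an input with NA
combined <- plyr::rbind.fill(jan, feb)
```

Calling plyr::rbind.fill() without library(plyr) keeps plyr from masking dplyr functions of the same name.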

(14) Be extra careful with these two statements here, especially when used in a for loop.

(15) DO NOT USE read.xlsx2 with Shiny; there are some hidden issues. Instead, use read_excel from the readxl package.

(16) When you want to read a CSV file and do not know whether its separator is a comma or a semicolon, do not use read.csv or read.csv2. Instead, use fread from the data.table package, which automatically detects the separator and reads the file properly.
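
A short sketch of this behaviour. The two temporary files and their contents are invented for this example.

```r
library(data.table)

# Write two small hypothetical files, one per separator
comma_file <- tempfile(fileext = ".csv")
semi_file  <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,Ann,90", "2,Bob,85"), comma_file)
writeLines(c("id;name;score", "1;Ann;90", "2;Bob;85"), semi_file)

# The same call reads both files; fread() detects the separator on its own
from_comma <- fread(comma_file)
from_semi  <- fread(semi_file)
```

If the detection ever guesses wrong, the separator can still be given explicitly through the sep argument of fread().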

