8) Some important R packages you MUST know
a) dplyr: provides a set of tools for efficiently manipulating datasets in R (old: plyr). There is a fantastic tutorial from the author of dplyr, Hadley Wickham. A comprehensive video on dplyr is here. This website also provides very good information on summarizing data using the ddply(), summaryBy() and aggregate() functions. See also here.
b) tidyr: easily tidy data with spread() and gather() functions. (old: reshape2)
c) ggvis: its goal is to make it easy to describe interactive web graphics in R. (old: ggplot2)
d) magrittr: provides a mechanism for chaining commands with a new forward-pipe operator, %>%
e) Some combined examples of dplyr, reshape2, forward-pipe operator.
f) Data Wrangling, CheatSheet
g) Data Transformation with dplyr Cheat Sheet
h) tidyr, gather(), separate(), spread()
i) tidyr, gather():From wide to long, spread():From long to wide
k) data manipulation in both R and Python
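As a rough illustration of how the packages above fit together, here is a minimal sketch. The scores data frame and its column names are made up for this example. Note that gather() and spread() still work but have been superseded in tidyr by pivot_longer() and pivot_wider().

```r
library(dplyr)   # also makes the %>% pipe available
library(tidyr)

# Hypothetical wide-format data: one row per student, one column per subject.
scores <- data.frame(
  student = c("A", "B", "C"),
  math    = c(90, 70, 80),
  physics = c(60, 85, 75)
)

# gather(): from wide to long (one row per student/subject pair).
long <- scores %>%
  gather(key = "subject", value = "score", math, physics)

# dplyr: one mean per subject.
subject_means <- long %>%
  group_by(subject) %>%
  summarise(mean_score = mean(score))

# spread(): from long back to wide.
wide <- long %>%
  spread(key = subject, value = score)
```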
14) How to find consecutive repeats in R, see here.
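One possible approach, which may differ from the one in the link, is the base R function rle() (run-length encoding). The vector x below is made up for illustration.

```r
x <- c(1, 1, 2, 2, 2, 3, 1, 1)

runs <- rle(x)
# runs$values  holds the value of each run:  1 2 3 1
# runs$lengths holds the length of each run: 2 3 1 2

# Values that occur at least twice in a row.
repeated <- runs$values[runs$lengths > 1]
```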
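15) We all know the average of the numbers 4 and 5 is 4.5. So why does mean(4,5) give 4? Because mean() averages only its first argument; the 5 is taken as the trim argument, and with that much trimming mean() returns the median of the single value 4. We should write mean(c(4,5)) instead, which gives 4.5.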
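The difference is easy to check in the console:

```r
wrong <- mean(4, 5)      # only 4 is averaged; 5 is matched to 'trim'
right <- mean(c(4, 5))   # both numbers are in the vector being averaged
```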
Issues-Related & Syntax
(1) If you have tried to deploy a Shiny application using R version 3.2.0, you may have encountered the error below.
Error: Unhandled Exception: Child Task 27278508 failed: Error parsing manifest: Maximum supported R version is: 3.1.3 Execution halted.
Solution: Use R version 3.1.3 instead.
(2) With R version 3.1.3 and RStudio version 0.98.1103, pressing the Run App button with the server.R file active caused the problem below:
R Session Aborted. R encountered a fatal error. The session was terminated.
Solution: run install.packages("Rcpp") and the issue is solved.
(3) When loading gdata with library(gdata), the following messages come out:
gdata: Unable to locate valid perl interpreter
gdata: read.xls() will be unable to read Excel XLS and XLSX files unless the ‘perl=’ argument is used to specify the location of a valid perl interpreter.
Solution: Read here, then download and install ActivePerl from http://www.activestate.com/activeperl/downloads
Loading library(gdata) again should produce the following output:
gdata: read.xls support for ‘XLS’ (Excel 97-2004) files ENABLED.
gdata: read.xls support for ‘XLSX’ (Excel 2007+) files ENABLED.
(4) When reading data from a .csv file, the following warning message may appear:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
EOF within quoted string
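A commonly suggested workaround, offered here as an assumption since the original gives no solution, is to disable quote handling with quote = "". The file below is made up to reproduce an unbalanced quote.

```r
# Hypothetical file whose first data row contains an unbalanced double quote.
path <- tempfile(fileext = ".csv")
writeLines(c("id,name", "1,\"unterminated", "2,Smith"), path)

# quote = "" tells read.csv to treat quote characters as ordinary text.
df <- read.csv(path, quote = "", stringsAsFactors = FALSE)
```

The trade-off is that fields which legitimately contain commas inside quotes will no longer be kept together, so check the column count after reading.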
(5) Be extra careful with if-else statements in R, especially for C++ programmers. See here.
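One well-known pitfall of this kind, which may or may not be the one the link covers, is the placement of else. The sketch below parses two code strings to show the difference without stopping the script.

```r
# At top level, 'else' on its own line is a syntax error: R has already
# finished parsing the complete 'if' statement on the previous line.
bad <- "if (TRUE) {\n  1\n}\nelse {\n  2\n}"
result <- tryCatch(parse(text = bad), error = function(e) "syntax error")

# Keeping 'else' on the same line as the closing brace parses fine.
good <- "if (TRUE) {\n  1\n} else {\n  2\n}"
value <- eval(parse(text = good))
```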
(6) Be extra careful when you copy R code containing double quotes from the internet or a PDF file and paste it into RStudio. The pasted quotes are often typographic (curly) quotes, which R does not accept as string delimiters. When the word inside the double quotes is black, it is wrong. The color of the word inside the double quotes should be light green, as seen in the second line. See here.
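A small sketch of the problem and one way to repair pasted code. The string pasted is a made-up example, and the curly quotes are written as \u escapes so the snippet itself stays plain ASCII.

```r
# \u201c and \u201d are the curly quotes that often arrive via copy-paste.
pasted <- "x <- \u201chello\u201d"
outcome <- tryCatch(
  {
    parse(text = pasted)
    "parsed"
  },
  error = function(e) "parse error"
)

# Replace the curly quotes with plain ASCII double quotes, then run the code.
fixed <- gsub("\u201c", "\"", pasted, fixed = TRUE)
fixed <- gsub("\u201d", "\"", fixed, fixed = TRUE)
eval(parse(text = fixed))   # assigns the string hello to x
```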
(8) When using data.table with R version 3.2.0 as shown here, the following error appears: Error in setkeyv(…, physical = FALSE) : 4 arguments passed to .Internal(nchar) which requires 3
Solution: Use R version 3.2.2, as shown here.
(9) When creating plots, you may get this error: "Error in plot.new(): figure margins too large". To diagnose it, first check the output of par("mar"). By default you should get: 5.1 4.1 4.1 2.1
To shrink the margins, write: par(mar=c(1,1,1,1)). This should resolve the error.
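A minimal sketch of the same fix that also restores the previous margins afterwards, so later plots are not affected:

```r
# par() returns the old settings, so they can be restored later.
# The four numbers are the bottom, left, top and right margins, in lines.
old <- par(mar = c(1, 1, 1, 1))

plot(1:10)

par(old)   # put the previous margins back
```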
(10) Be extra careful if you have loaded library(dplyr) first, followed by library(plyr). See here for the message. In that order, plyr functions such as summarise mask the dplyr versions, so this code is wrong and gives a single mean value. You need to correct the code as shown here, which gives 2 mean values.
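One way to make code independent of the loading order is to call the functions with their package prefix. The sketch below uses the built-in mtcars data as a stand-in, since the original example is only linked; it assumes both dplyr and plyr are installed.

```r
library(dplyr)

# am has two levels (0 and 1), so we expect two group means.
grouped <- group_by(mtcars, am)

# plyr's summarise ignores the grouping: one overall mean.
wrong <- plyr::summarise(grouped, mean_mpg = mean(mpg))

# dplyr's summarise respects the grouping: one mean per group.
right <- dplyr::summarise(grouped, mean_mpg = mean(mpg))
```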
(11) Avoid using a for loop over a list; use the lapply function instead, as seen here.
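For comparison, here is a small sketch of both styles on a made-up list:

```r
values <- list(a = 1:3, b = 4:6, c = 7:9)

# for-loop version: create the result list first, then fill it.
totals_loop <- list()
for (nm in names(values)) {
  totals_loop[[nm]] <- sum(values[[nm]])
}

# lapply version: one call, and the names are kept automatically.
totals <- lapply(values, sum)
```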
(12a) mutate with ifelse: ifelse needs a third argument (the "no" value, e.g. a zero) at the end, otherwise R complains: Error: argument "no" is missing, with no default. See here.
(12b) mutate with ifelse: change particular rows based on a certain condition, e.g. meter_type == "BYPASS", and update another column (min_5_flow_adjusted) for those rows. See here.
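A minimal sketch of both points. Only meter_type, "BYPASS" and min_5_flow_adjusted come from the text above; the min_5_flow column, the "MAIN" type and all values are assumptions made up for illustration.

```r
library(dplyr)

# Hypothetical meter readings.
flows <- data.frame(
  meter_type = c("BYPASS", "MAIN", "BYPASS"),
  min_5_flow = c(12, 30, 7)
)

# ifelse(test, yes, no): the third argument is required.
# BYPASS rows get 0; all other rows keep their original flow.
adjusted <- flows %>%
  mutate(min_5_flow_adjusted = ifelse(meter_type == "BYPASS", 0, min_5_flow))
```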
(13) To combine different files with different columns into a single data.frame, use rbind.fill. See here. You need the plyr library to use the rbind.fill function.
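A short sketch with two made-up data frames standing in for the files:

```r
# Two hypothetical data frames that share only the id column.
df1 <- data.frame(id = 1:2, a = c("x", "y"))
df2 <- data.frame(id = 3:4, b = c(TRUE, FALSE))

# rbind.fill stacks the rows and fills the missing columns with NA.
combined <- plyr::rbind.fill(df1, df2)
```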
(14) Be extra careful with these two statements here, especially when they are used in a for loop.
(15) DO NOT USE read.xlsx2 (from the xlsx package) with Shiny, there are some hidden issues. Instead, use read_excel (from the readxl package).
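A minimal sketch of read_excel outside Shiny, using the sample workbook that ships with readxl. In a Shiny app the path would typically come from the uploaded file (for a fileInput with the hypothetical id "file", that is input$file$datapath).

```r
library(readxl)

# Path to an example workbook bundled with the readxl package.
path <- readxl_example("datasets.xlsx")

# Read one sheet by name into a data frame (tibble).
cars <- read_excel(path, sheet = "mtcars")
```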
(16) When you want to read a CSV file whose separator may be either a comma or a semicolon, do not use read.csv or read.csv2. Instead, use fread from the data.table package, which automatically detects the separator and then reads the file properly.
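A small sketch with a made-up semicolon-separated file:

```r
library(data.table)

# Hypothetical file that uses semicolons instead of commas.
path <- tempfile(fileext = ".csv")
writeLines(c("id;city;value", "1;Oslo;3.5", "2;Lima;4.25"), path)

# No sep argument is given: fread detects the separator by itself.
dt <- fread(path)
```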