A note I wrote 3 years ago based on an introductory course provided by Microsoft.
R is a beautiful language, but somehow that beauty slips away easily.
So I haven’t used it for a long, long time.
Module 1
The True Basics
Variable
<-: define a variable
a <- 15
Basic Calculation
+
-
*
/
^
%%
Workspace
ls(): Check the workspace
rm(): Delete variables from the workspace
# clear the workspace
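A minimal sketch of the basics above, using made-up variable names (a, b, c_pow). It only illustrates assignment, two of the operators, and the workspace helpers.

```r
a <- 15            # assign 15 to a
b <- a %% 4        # modulo: the remainder of 15 divided by 4
c_pow <- 2 ^ 3     # exponentiation
print(ls())        # names currently defined in the workspace
rm(a)              # delete only a
# rm(list = ls()) would delete everything in the workspace at once
```

After rm(a), the name a is gone while b and c_pow are still available.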
Basic Data Types
Check the type
class() # Output is the name of the type
is.*() # Output is a logical value (TRUE or FALSE)
Coercion: Convert the Type
as.*()
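A small sketch of type checking and coercion. The variable x is made up for illustration; the functions are the base R class(), is.*() and as.*() families mentioned above.

```r
x <- 5L
class(x)              # "integer"
is.numeric(x)         # TRUE: integers count as numeric
is.character(x)       # FALSE
as.character(x)       # "5"
as.numeric("4.5")     # 4.5
as.logical("TRUE")    # TRUE
as.numeric(TRUE)      # 1
```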
Logical
TRUE
Numeric
4.5
Integers are also numeric
5L # 5
Character
"Hello world!"
Further Readings
Module 2
Create And Name Vectors
Vector
Sequence of data elements
All of the same basic type
One-dimensional arrays that can hold numeric data, character data, or logical data
Create a vector
c()
> c("hearts", "spades", "diamonds", "diamonds", "spades")
Name a vector
names()
> remain <- c(11, 12, 11, 13)
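A short sketch of naming a vector with names(). The order of the suit names is my assumption for illustration.

```r
remain <- c(11, 12, 11, 13)
suits <- c("spades", "hearts", "diamonds", "clubs")  # assumed order
names(remain) <- suits   # attach one name per element
print(remain)            # values are printed under their names
print(names(remain))     # the names can be read back as a character vector
```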
Single value = vector
R does not provide a data structure to hold a single number or a single character string or any other basic data type: they’re all just vectors of length 1.
> my_apples <- 5
Vectors are homogeneous
Only hold elements of the same type (important things are worth saying twice ^O^)
Atomic vectors, as opposed to lists (which can hold elements of different types)
Automatic coercion if necessary
When the values passed to c() have different basic data types, R automatically coerces them so that the resulting vector contains elements of a single type.
> drawn_ranks <- c(7, 4, "A", 10, "K", 3, 2, "Q")
> drawn_ranks
[1] "7"  "4"  "A"  "10" "K"  "3"  "2"  "Q"
> class(drawn_ranks)
[1] "character"
‘Upgrading’ logicals to numerics, logicals to characters and numerics to characters when necessary.
Vector Calculus
> my_apples <- 5 # my_apples is a vector!
Elementwise
> earnings <- c(50, 100, 30)
Mathematics naturally extends.
> earnings/10
Multiplication and division are done elementwise!
> earnings <- c(50, 100, 30)
sum() and >
> earnings <- c(50, 100, 30)
sum() can also be applied to a logical vector, in which case it counts the number of TRUE values.
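A sketch of elementwise arithmetic and of sum() on a logical vector. The earnings values come from the notes; the expenses vector is a hypothetical one added for illustration.

```r
earnings <- c(50, 100, 30)
expenses <- c(30, 40, 80)        # hypothetical values

earnings / 10                    # 5 10 3
earnings * c(1, 2, 3)            # 50 200 90 (elementwise)
bank <- earnings - expenses      # 20 60 -50
sum(bank)                        # 30

won <- earnings > expenses       # TRUE TRUE FALSE
sum(won)                         # 2: the number of TRUE values
```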
Vector subsetting
Subset by index
> remain <- c(spades = 11, hearts = 12,
Subset by name
> remain <- c(spades = 11, hearts = 12,
Subset multiple elements
> remain <- c(spades = 11, hearts = 12,
Subset all but some
> remain <- c(spades = 11, hearts = 12,
Subset using logical vector
> remain <- c(spades = 11, hearts = 12,
R is smart enough to see that the vector of logicals you passed is shorter than the remain vector, so it repeats its contents until it has the same length as remain. This is called “recycling”.
> remain <- c(spades = 11, hearts = 12,
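The subsetting variants above can be sketched as follows. The diamonds and clubs values are my assumption, since the original code is cut off after hearts.

```r
remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)  # last two values assumed

by_index   <- remain[1]                            # first element, name kept
by_name    <- remain["spades"]                     # same element, selected by name
several    <- remain[c(4, 1)]                      # clubs, then spades
all_but    <- remain[-1]                           # everything except the first
by_logical <- remain[c(FALSE, TRUE, FALSE, TRUE)]  # hearts and clubs
recycled   <- remain[c(TRUE, FALSE)]               # recycled to TRUE FALSE TRUE FALSE
```

The last call selects spades and diamonds, because the two-element logical vector is repeated to cover all four elements.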
Further Readings
Module 3: Matrix
Create and Name Matrices
Features
Vector: 1D array of data elements
Matrix: 2D array of data elements
Rows and columns
One atomic vector type
To hold different types, use a list or a data.frame instead
Create a matrix
matrix()
> matrix(1:6, nrow = 2)
Recycling
When we pass the matrix function a vector that is too short to fill up the entire matrix, recycling kicks in.
> matrix(1:3, nrow = 2, ncol = 3)
When the size of the matrix is not a whole multiple of the vector’s length, R still recycles the vector but issues a warning.
> matrix(1:4, nrow = 2, ncol = 3)
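A sketch of how matrix() fills and recycles. The names m1 to m4 are made up, and suppressWarnings() is used only so the last call runs quietly.

```r
m1 <- matrix(1:6, nrow = 2)                  # filled column by column
m2 <- matrix(1:6, nrow = 2, byrow = TRUE)    # filled row by row
m3 <- matrix(1:3, nrow = 2, ncol = 3)        # 1:3 is used twice to fill 6 cells
m4 <- suppressWarnings(matrix(1:4, nrow = 2, ncol = 3))  # 4 values do not fit 6 cells evenly

print(m1)
print(m2)
print(m3)
print(m4)
```

In m3 the fill order is 1, 2, 3, 1, 2, 3 going down the columns, so the second column holds 3 and 1.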
rbind(), cbind()
The arguments of these two functions are the pieces to combine, and they also follow the recycling rule.
> cbind(1:3, 1:3) # each vector becomes a column
Combine with matrix()
> m <- matrix(1:6, byrow = TRUE, nrow = 2)
Naming a matrix
rownames(), colnames()
> m <- matrix(1:6, byrow = TRUE, nrow = 2)
One-line approach
> m <- matrix(1:6, byrow = TRUE, nrow = 2, dimnames = list(c("row1", "row2"), c("col1", "col2", "col3")))
Coercion
> num <- matrix(1:8, ncol = 2)
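A sketch that puts binding, naming and coercion together. The char matrix and the added row and column values are hypothetical, chosen just to show the behaviour.

```r
m <- matrix(1:6, byrow = TRUE, nrow = 2)
rownames(m) <- c("row1", "row2")
colnames(m) <- c("col1", "col2", "col3")

m_wide <- cbind(m, col4 = c(7, 8))   # add a named column
m_tall <- rbind(m, row3 = 7:9)       # add a named row

num   <- matrix(1:8, ncol = 2)
char  <- matrix(LETTERS[1:6], nrow = 4, ncol = 3)  # hypothetical character matrix
mixed <- cbind(num, char)            # numbers are coerced to character
print(mixed)
```

Because a matrix holds a single atomic type, combining num with char turns every element into a character string.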
Subsetting Matrices
Subset element
> m <- matrix(sample(1:15, 12), nrow = 3)
Subset column or row
> m
Subset multiple elements
We can’t select elements that don’t share a row or column index. If you want to select the 11 on row 1, column 2 and the 8 on row 2, column 3, a call such as m[c(1, 2), c(2, 3)] will not give the wanted result. Instead, R will return a submatrix.
> m
Subset by name
> rownames(m) <- c("r1", "r2", "r3")
Mix names and numbers
> m[2,"c"]
Subset with logical vector
> m
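A sketch of matrix subsetting on a fixed matrix, so the results are reproducible (the notes use sample(), which changes on every run). The column names a to d are assumed. The last line goes slightly beyond the notes and shows one way to pick individual elements that do not share a row or column.

```r
m <- matrix(1:12, nrow = 3,
            dimnames = list(c("r1", "r2", "r3"), c("a", "b", "c", "d")))

m[1, 3]                       # single element: 7
m[3, ]                        # whole third row
m[, "b"]                      # whole column b
sub <- m[c(1, 2), c(2, 3)]    # a 2 x 2 submatrix, not two single elements
m[2, "c"]                     # mixing a number and a name: 8
m[c(TRUE, FALSE, TRUE), ]     # rows r1 and r3
picked <- m[cbind(c(1, 2), c(2, 3))]  # elements (1,2) and (2,3) only
```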
Matrix Calculus
Matrix Calculus
Matrix special functions: colSums(), rowSums()
Like vectors, matrices support the standard arithmetic operations.
Elementwise computation
Example: lotr_matrix
> the_fellowship <- c(316, 556)
Matrix - Scalar Calculus
> lotr_matrix
Matrix - Matrix Calculus
> lotr_matrix
Matrix Calculus
> lotr_matrix
Matrix Multiplication
The * operator is not matrix multiplication in the mathematical sense, which uses %*%. It works elementwise, just like with vectors.
> lotr_matrix
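A sketch of matrix arithmetic. Only the_fellowship comes from the notes; the other two rows, the column names and the conversion rate are illustrative assumptions.

```r
the_fellowship <- c(316, 556)
two_towers     <- c(343, 584)   # illustrative values
return_king    <- c(378, 742)   # illustrative values
lotr_matrix <- rbind(the_fellowship, two_towers, return_king)
colnames(lotr_matrix) <- c("US", "non-US")   # assumed column names

scaled  <- lotr_matrix / 1.12                          # matrix - scalar (hypothetical rate)
reduced <- lotr_matrix - matrix(50, nrow = 3, ncol = 2) # matrix - matrix, elementwise
rowSums(lotr_matrix)                                   # 872 927 1120
colSums(lotr_matrix)                                   # 1037 1882

elementwise  <- lotr_matrix * lotr_matrix              # each cell squared
true_product <- t(lotr_matrix) %*% lotr_matrix         # mathematical product, 2 x 2
```

The difference between * and %*% shows in the shapes: elementwise is still 3 x 2, while true_product is 2 x 2.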
Matrices and Vectors
Very similar: both are simply data structures that store elements of the same type.
Vector = 1D, matrix = 2D: the vector stores its elements in a one-dimensional sequence, while the matrix uses a two-dimensional grid structure.
Coercion if necessary: when you try to store elements of different types.
Recycling if necessary: column by column.
Elementwise calculus: calculations are straightforward, and all of them are performed elementwise.
Further reading
r-project/matrices: also covers more advanced operations on matrices, such as taking the determinant or the transpose of a matrix in R
ats.ucla.edu/matrices: explains working with matrices in R once more and discusses some more technical examples
Module 4: Factors
Factors
Categorical Variables
Limited number of different values
Belong to a category (ordered or non-ordered)
In R: factor (a data structure for storing categorical variables)
Create a factor
factor()
An example: blood type.
> blood <- c("B", "AB", "O", "A", "O", "O", "A", "B")
When we call the factor function, R basically does two things:
It scans through the vector to find the different categories and sorts the levels alphabetically.
It converts the character vector, blood in this example, to a vector of integer values. These integers correspond to a set of character values to use when the factor is displayed.
Thus, factors are actually integer vectors, where each integer corresponds to a category, or a level.
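The two steps can be checked directly. This is a small sketch using only base R functions on the blood vector from the notes.

```r
blood <- c("B", "AB", "O", "A", "O", "O", "A", "B")
blood_factor <- factor(blood)

levels(blood_factor)       # "A" "AB" "B" "O": sorted alphabetically
as.integer(blood_factor)   # 3 2 4 1 4 4 1 3: the underlying integer codes
typeof(blood_factor)       # "integer"
print(table(blood_factor)) # how often each level occurs
```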
Order levels differently
levels = c(…)
> blood_factor2 <- factor(blood,
    levels = c("O", "A", "B", "AB"))
> blood_factor2
[1] B AB O A O O A B
Levels: O A B AB
> str(blood_factor2)
 Factor w/ 4 levels "O","A","B","AB": 3 4 1 2 1 1 2 3
> str(blood_factor)
 Factor w/ 4 levels "A","AB","B","O": 3 2 4 1 4 4 1 3
Rename factor levels
levels(raw factor) <- c(…)
labels = …
> blood <- c("B", "AB", "O", "A", "O", "O", "A", "B")
Sometimes it’s a bit confusing. For both of these approaches, it’s important to follow the same order as the order of the factor levels: first A, then AB, then B and then O.
Thus, to avoid mistakes, combine manually specifying the levels argument with the labels argument.
> factor(blood,
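A sketch of both renaming approaches. The BT_ prefixed label names are hypothetical, chosen only for illustration.

```r
blood <- c("B", "AB", "O", "A", "O", "O", "A", "B")

# Approach 1: rename after creation; the order must follow A, AB, B, O
renamed <- factor(blood)
levels(renamed) <- c("BT_A", "BT_AB", "BT_B", "BT_O")

# Approach 2: state levels and labels together, so the pairing is explicit
explicit <- factor(blood,
                   levels = c("O", "A", "B", "AB"),
                   labels = c("BT_O", "BT_A", "BT_B", "BT_AB"))

print(renamed)
print(explicit)
```

Both factors display the same values; they differ only in the order of their levels.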
Nominal Variables versus Ordinal Variables
An example: T-shirt size vs. blood type.
ordered = TRUE
> blood <- c("B", "AB", "O", "A", "O", "O", "A", "B")
Ordered factor
> tshirt <- c("M", "L", "S", "S", "L", "M", "L", "M")
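A sketch of an ordered factor for the T-shirt sizes, assuming the natural order S < M < L.

```r
tshirt <- c("M", "L", "S", "S", "L", "M", "L", "M")
tshirt_factor <- factor(tshirt, ordered = TRUE,
                        levels = c("S", "M", "L"))  # assumed order: S < M < L

print(tshirt_factor)                           # Levels: S < M < L
is_smaller <- tshirt_factor[1] < tshirt_factor[2]  # M < L
largest <- max(tshirt_factor)                  # comparisons make max() meaningful
```

For a nominal factor such as blood type, the same comparison is not meaningful: R returns NA with a warning.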
Wrap-up
Factors for storing categorical variables
Factors are integer vectors
Change factor levels: levels() function or labels argument
Ordered factors: ordered = TRUE. Catering to both nominal and ordinal variables
Extra knowledge from labs
summary(): a generic function that produces summaries of many kinds of objects, including the results of model fitting functions.
> # Definition of survey_vector and survey_factor
> survey_vector <- c("R", "L", "L", "R", "R")
> survey_factor <- factor(survey_vector, levels = c("R", "L"), labels = c("Right", "Left"))
>
> # Summarize survey_vector
> summary(survey_vector)
   Length     Class      Mode
        5 character character
>
> # Summarize survey_factor
> summary(survey_factor)
Right  Left
    3     2
Further Readings
Module 5: Lists
Create and name lists
Features
Vector: 1D, same type
Matrix: 2D, same type
List
Can store different R objects (vectors, matrices, dates, data frames, factors and many more)
No coercion
Loss of some functionality that vectors and matrices offer
Calculations with lists are far less straightforward, because lists do not have to follow a predefined structure.
Create list
list()
> c("Rsome times", 190, 5)
Name list
> song <- list("Rsome times", 190, 5)
One-line approach
list(name = value…)
> song <- list(title = "Rsome times",
    duration = 190,
    track = 5)
Better way to print list
str()
> str(song)
List in List
> similar_song <- list(title = "R you on time?",
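A sketch contrasting c() with list(), including a nested list. The duration of the similar song is an assumed value, since the original code is cut off after the title.

```r
coerced <- c("Rsome times", 190, 5)      # c() coerces everything to character
song <- list("Rsome times", 190, 5)      # list() keeps each type
names(song) <- c("title", "duration", "track")

similar_song <- list(title = "R you on time?",
                     duration = 230)     # duration assumed for illustration
song <- list(title = "Rsome times", duration = 190, track = 5,
             similar = similar_song)     # a list inside a list
str(song)                                # compact overview of the structure
```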
Subset and Extend Lists
Note: the printouts of lists are produced with the str() function.
[ versus [[
An example: the song list
> similar_song <- list(title = "R you on time?",
[ returns a list
[[ returns the element itself
> song[1]
A [[ error
> song[[c(1, 3)]] # double brackets are only to select single elements from a list.
Double brackets select only a single element from a list.
Subset by names
> song[["duration"]] # with double quotes
Subset by logicals
> song[c(FALSE, TRUE, TRUE, FALSE)]
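A sketch of the difference between [ and [[ on the song list. The duration of the nested song is an assumed value. The reason song[[c(1, 3)]] fails is that [[ with a vector indexes recursively: it asks for the third element of the first element.

```r
song <- list(title = "Rsome times", duration = 190, track = 5,
             similar = list(title = "R you on time?", duration = 230))  # 230 assumed

as_list  <- song[1]                 # a list of length 1
as_value <- song[[1]]               # the character string itself
two      <- song[c(1, 3)]           # a list with title and track
song[["duration"]]                  # 190
by_logical <- song[c(FALSE, TRUE, TRUE, FALSE)]  # duration and track

nested <- song[[c(4, 1)]]           # recursive: same as song[[4]][[1]]
failed <- tryCatch(song[[c(1, 3)]], error = function(e) "error")
```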
$ and extending
$ works just like double brackets, but only on named lists.
# select an element
Add elements to embedded lists.
> song$similar$reason <- "too long"
Thus, when extending a list with c(), it is better to wrap the elements you want to add in another list() call.
c(shining_list,
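A sketch of why the extra list() matters when extending with c(). The song data is reused from earlier in the notes, with an assumed duration for the similar song.

```r
song    <- list(title = "Rsome times", duration = 190)
similar <- list(title = "R you on time?", duration = 230)  # duration assumed

flat   <- c(song, similar)                   # elements are spliced in: 4 top-level items
nested <- c(song, list(similar = similar))   # similar stays one element: 3 items

song$similar <- similar                      # extending with $ gives the nested form too
song$similar$reason <- "too long"            # add to the embedded list
str(song)
```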
Further readings
Module 6: Data Frame
Explore the Data Frame
Datasets
Observation
Variables
Example: Datasets
each person = observation
properties (name, age …) = variables
name   age  child
Pete   30   TRUE
Frank  21   TRUE
Hehe   25   TRUE
Matrix? Need different types
List? Not very practical
Data Frame
Specifically for datasets
Rows = observations (persons)
Columns = variables (age, name, …)
Contain elements of different types
Elements in same column: same type
Create Data Frame
Usually, we don’t need to create a data frame manually.
Import from data source
CSV file
Relational Database (e.g. SQL)
Software packages (Excel, SPSS …)
Create Data Frame Manually
data.frame()
> name <- c("Anne", "Pete", "Frank", "Julia", "Cath")
Name Data Frame
> names(df) <- c("Name", "Age", "Child") # names() function
Like in matrices, it’s also possible to name the rows of the data frame, but that’s generally not a good idea.
Data Frame Structure
A data frame is actually a list of vectors that all have the same length.
Strings & factor coercion.
stringsAsFactors = FALSE
> str(df)
Unlike a plain list, a data frame requires all of its vectors to have the same length.
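A sketch of building the data frame by hand, using the values shown later in these notes. Note that stringsAsFactors = FALSE has been the default since R 4.0; it is passed explicitly here so the result is the same on older versions.

```r
name  <- c("Anne", "Pete", "Frank", "Julia", "Cath")
age   <- c(28, 30, 21, 39, 35)
child <- c(FALSE, TRUE, TRUE, FALSE, TRUE)

df <- data.frame(name, age, child, stringsAsFactors = FALSE)
names(df) <- c("Name", "Age", "Child")
str(df)          # 5 obs. of 3 variables, one type per column
is.list(df)      # TRUE: a data frame is a list under the hood

# Vectors of unequal length are rejected
length_error <- tryCatch(data.frame(a = 1:2, b = 1:3),
                         error = function(e) conditionMessage(e))
```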
Subset - Extend - Sort Data Frames
Subset Data Frame
Subsetting syntax from matrices and lists
[ from matrices
[[ and $ from lists
Example: people
> name <- c("Anne", "Pete", "Frank", "Julia", "Cath")
Subset Data Frame ~ Matrix
> people[3,2]
Select multiple elements
> people[c(3, 5), c("age", "child")]
Data Frame ~ List
A vector is returned.
> people$age
A data.frame is returned.
> people["age"]
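A sketch of the matrix-style and list-style subsetting on the people data frame, rebuilt here from the values printed further down in the notes.

```r
people <- data.frame(
  name  = c("Anne", "Pete", "Frank", "Julia", "Cath"),
  age   = c(28, 30, 21, 39, 35),
  child = c(FALSE, TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

people[3, 2]                               # matrix style: 21
part <- people[c(3, 5), c("age", "child")] # rows 3 and 5, two columns
ages_vec <- people$age                     # list style: a numeric vector
ages_vec2 <- people[["age"]]               # same vector
ages_df <- people["age"]                   # single brackets: a one-column data frame
```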
Extend Data Frame
Add columns = add variables
Add rows = add observations
Add columns
List approach
XX$YY
XX[[YY]]
> height <- c(163, 177, 163, 162, 157)
> people$height <- height
> people[["height"]] <- height
> people
   name age child height
1  Anne  28 FALSE    163
2  Pete  30  TRUE    177
3 Frank  21  TRUE    163
4 Julia  39 FALSE    162
5  Cath  35  TRUE    157
Matrix approach
cbind()
> weight <- c(74, 63, 68, 55, 56)
> cbind(people, weight)
   name age child height weight
1  Anne  28 FALSE    163     74
2  Pete  30  TRUE    177     63
3 Frank  21  TRUE    163     68
4 Julia  39 FALSE    162     55
5  Cath  35  TRUE    157     56
Add rows
> tom <- data.frame("Tom", 37, FALSE, 183)
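A sketch of adding a row with rbind(). One thing to watch: the new row must be a data frame whose column names match, otherwise rbind() stops with an error. The unnamed tom from above is kept as tom_unnamed to show this.

```r
people <- data.frame(
  name   = c("Anne", "Pete", "Frank", "Julia", "Cath"),
  age    = c(28, 30, 21, 39, 35),
  child  = c(FALSE, TRUE, TRUE, FALSE, TRUE),
  height = c(163, 177, 163, 162, 157),
  stringsAsFactors = FALSE
)

tom_unnamed <- data.frame("Tom", 37, FALSE, 183, stringsAsFactors = FALSE)
unnamed_result <- tryCatch(rbind(people, tom_unnamed),
                           error = function(e) "error")   # names do not match

tom <- data.frame(name = "Tom", age = 37, child = FALSE, height = 183,
                  stringsAsFactors = FALSE)
people <- rbind(people, tom)   # works: same column names
print(people)
```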
Sorting
order()
> sort(people$age)
[1] 21 28 30 35 39 # But it doesn't help us...
> ranks <- order(people$age)
> ranks
[1] 3 1 2 5 4
> people$age
[1] 28 30 21 39 35
# 21 is lowest: its index, 3, comes first in ranks
# 28 is second lowest: its index, 1, comes second in ranks
# 39 is highest: its index, 4, comes last in ranks
Then we can reorder the data.frame
> people[ranks, ]
decreasing = TRUE
> people[order(people$age, decreasing = TRUE), ]
Extra knowledge from lab
Select multiple rows or columns
dataframe[n:m, n:m]
Example: Planets
> planets_df
Instead of having to define a vector rings_vector, which we then use to subset planets_df, we could’ve also used:
subset()
subset(planets_df, subset = has_rings == TRUE)
or
planets_df[planets_df$has_rings == TRUE, ]
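A sketch comparing the three ways of filtering rows. The planets_df built here is a reduced, hypothetical stand-in with only two columns, since the original data frame from the lab is not shown in these notes.

```r
planets_df <- data.frame(
  name = c("Mercury", "Venus", "Earth", "Mars",
           "Jupiter", "Saturn", "Uranus", "Neptune"),
  has_rings = c(FALSE, FALSE, FALSE, FALSE, TRUE, TRUE, TRUE, TRUE),
  stringsAsFactors = FALSE
)

first_three <- planets_df[1:3, 1:2]              # dataframe[n:m, n:m]

rings_vector <- planets_df$has_rings             # explicit helper vector
via_vector <- planets_df[rings_vector, ]
via_subset <- subset(planets_df, subset = has_rings == TRUE)
via_inline <- planets_df[planets_df$has_rings == TRUE, ]
```

All three filtered results contain the same four planets; subset() simply saves repeating the data frame name inside the brackets.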
Further Readings
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.