A note I wrote 3 years ago based on an introductory course provided by Microsoft.

R is a beautiful language. Somehow, that beauty slips away easily.

So I haven’t used it for a long long time.

Module 1

The True Basics

Variable

<-: define a variable

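Arithmetic operators:

+ (addition)
- (subtraction)
* (multiplication)
/ (division)
^ (exponentiation)
%% (modulo)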

Workspace

ls(): List the variables in the workspace
rm(): Delete variables in the workspace
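
A minimal sketch of assignment, the arithmetic operators, and the workspace functions. The variable names `x` and `y` are made up for illustration.

```r
# Assign values with <- (x and y are hypothetical names)
x <- 7
y <- 2

x + y    # addition: 9
x - y    # subtraction: 5
x * y    # multiplication: 14
x / y    # division: 3.5
x ^ y    # exponentiation: 49
x %% y   # modulo (remainder of 7 divided by 2): 1

ls()     # lists the variables currently in the workspace
rm(y)    # removes y from the workspace
```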

Basic Data Types

Check the type

class() # Output is the name of the type

is.*() # Output is a logical value (TRUE or FALSE), e.g. is.numeric()

as.*() # Converts a value to the given type, e.g. as.character()

Numeric

An integer is also numeric: is.numeric() returns TRUE for an integer, although class() reports "integer".
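
The type-checking and conversion functions can be tried out like this. It is only a sketch; the variable names `n`, `i` and `s` are invented for the example.

```r
n <- 42.5
i <- 42L            # the L suffix creates an integer
s <- "forty-two"

class(n)            # "numeric"
class(i)            # "integer"
is.numeric(i)       # TRUE: integers count as numeric
is.character(s)     # TRUE

as.character(n)     # "42.5"
as.numeric("3.14")  # 3.14
as.integer(n)       # 42 (the fractional part is dropped)
```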

Module 2

Create And Name Vectors

Vector

• Sequence of data elements

• All of the same basic type

• One dimensional arrays that can hold numeric data, character data, or logical data

c( )

names( )

Single value = vector

R does not provide a data structure to hold a single number or a single character string or any other basic data type: they’re all just vectors of length 1.
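
A short sketch of c(), names(), and the "single value = vector" idea. The card counts in `remain` are example values, not data from the course.

```r
# Build a vector with c() and label its elements with names()
remain <- c(11, 12, 11, 13)
names(remain) <- c("spades", "hearts", "diamonds", "clubs")
remain

# A single value is just a vector of length 1
is.vector(5)      # TRUE
length(5)         # 1
length("hello")   # 1 (one string, not five characters)
```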

Vectors are homogeneous

• Only hold elements of the same type (important things are worth saying twice ^O^)

• Atomic vectors, in contrast to lists (which can hold elements of different types)

• Automatic coercion if necessary

• If the basic data types passed to c() differ, R automatically performs coercion so that you end up with a vector whose elements all have the same type.

• This means ‘upgrading’ logicals to numerics, logicals to characters, and numerics to characters when necessary.
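
The coercion rules can be checked directly. This sketch uses throwaway values.

```r
class(c(TRUE, 2))     # "numeric":   TRUE is upgraded to 1
class(c(TRUE, "a"))   # "character": TRUE is upgraded to "TRUE"
class(c(1, "a"))      # "character": 1 is upgraded to "1"

mixed <- c(1.5, "a", TRUE)
mixed                 # "1.5" "a" "TRUE"
```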

Vector Calculus

Element-wise

Arithmetic operations extend naturally to vectors.

Multiplication and division are done element-wise too!

sum() and >

sum() can also be applied to a logical vector, in which case it counts the number of TRUE values in the vector.
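
A small sketch of element-wise arithmetic and of sum() combined with >. The vectors `earnings` and `expenses` are hypothetical.

```r
earnings <- c(50, 100, 30)
expenses <- c(30, 40, 80)

earnings - expenses        # 20 60 -50
earnings * c(1, 2, 3)      # 50 200 90 (element by element)
earnings / 10              # 5 10 3

sum(earnings)              # 180
earnings > expenses        # TRUE TRUE FALSE
sum(earnings > expenses)   # 2: the number of TRUE values
```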

Vector subsetting

Subset using logical vector

R is smart enough to see that the vector of logicals you passed it is shorter than the `remain` vector, so it repeats the contents of the logical vector until it has the same length as `remain`. This is called “recycling”.
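
Subsetting with a logical vector, including the recycling case, might look like this. The values in `remain` are illustrative.

```r
remain <- c(spades = 11, hearts = 12, diamonds = 11, clubs = 13)

# Full-length logical vector: keep the positions marked TRUE
remain[c(FALSE, TRUE, FALSE, TRUE)]   # hearts and clubs

# Shorter logical vector: recycled to TRUE FALSE TRUE FALSE
recycled <- remain[c(TRUE, FALSE)]
recycled                              # spades and diamonds

# A comparison produces the logical vector for you
remain[remain > 11]                   # hearts and clubs
```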

Module 3: Matrix

Create and Name Matrices

Features

• Vector: 1D array of data elements

• Matrix: 2D array of data elements

• Rows and columns

• One atomic vector type, unlike lists (to hold different types, use a list or data.frame)

Create a matrix

matrix()

recycling

When we pass the matrix function a vector that is too short to fill up the entire matrix, recycling kicks in.

When the matrix is filled with a vector whose length does not divide evenly into the number of cells, R still recycles the values but issues a warning.
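
A sketch of matrix() and its recycling behaviour, with made-up dimensions.

```r
# Filled column by column by default
matrix(1:6, nrow = 2)

# Filled row by row instead
matrix(1:6, nrow = 2, byrow = TRUE)

# 1:3 is too short for 6 cells, so it is recycled: 1 2 3 1 2 3
recycled <- matrix(1:3, nrow = 2, ncol = 3)
recycled

# Length 4 does not fit evenly into 6 cells:
# R still fills the matrix but issues a warning
uneven <- matrix(1:4, nrow = 2, ncol = 3)
```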

rbind(), cbind()

The arguments to these two functions are the pieces to combine (vectors or matrices). Shorter vectors are recycled here as well.

Combine with matrix()

Naming a matrix

rownames(), colnames()

One-line approach
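
A sketch of rbind(), cbind(), and naming. The row and column names are invented, and reading the one-line version as the `dimnames` argument of matrix() is my assumption about what the original screenshot showed.

```r
# Stack vectors as rows
m <- rbind(1:3, 4:6)        # 2 x 3 matrix

# Add to an existing matrix
wider  <- cbind(m, c(7, 8)) # extra column: 2 x 4
taller <- rbind(m, 0)       # the single 0 is recycled across the new row

# Name rows and columns after the fact
rownames(m) <- c("r1", "r2")
colnames(m) <- c("a", "b", "c")

# Or do everything in one call via dimnames
m2 <- matrix(1:6, nrow = 2, byrow = TRUE,
             dimnames = list(c("r1", "r2"), c("a", "b", "c")))
```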

Subsetting Matrices

Subset multiple elements

We can’t select several elements in one call unless they share a row or column index. If you want to select the 11, on row 1 and column 2, and the 8, on row 2 and column 3, a call that passes rows c(1, 2) and columns c(2, 3) will not give the wanted result. Instead, R will return the sub-matrix formed by those rows and columns.

Subset by name

Names and numeric indices can be mixed in the same call.
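
The subsetting cases can be sketched as follows. Only the 11 (row 1, column 2) and the 8 (row 2, column 3) come from the note; the other values and all names are made up. The last line, indexing with a two-column matrix of (row, column) pairs, is an extra that the note does not cover.

```r
m <- matrix(c(5, 11, 4,
              9,  2, 8),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("a", "b", "c")))

m[1, 2]               # 11: a single element
m[2, ]                # the whole second row
m[, c("a", "c")]      # columns selected by name
m["r1", 2]            # 11: a name and a number mixed

# Rows 1-2 and columns 2-3 give a 2 x 2 sub-matrix, not just 11 and 8
sub <- m[c(1, 2), c(2, 3)]

# To pick exactly (1, 2) and (2, 3), index with (row, column) pairs
pairs <- m[cbind(c(1, 2), c(2, 3))]
pairs                 # 11 8
```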

Matrix Calculus

Matrix Calculus

• Matrix special functions: colSums(), rowSums()

• As with vectors, standard arithmetic is possible on matrices.

• Element-wise computation

Matrix Multiplication

The `*` operator is not matrix multiplication in the mathematical sense, which uses `%*%`. It works element-wise, just as it does for vectors.
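
A sketch contrasting `*` with `%*%`, using two small made-up matrices.

```r
a <- matrix(1:4, nrow = 2)            # columns: (1, 2) and (3, 4)
b <- matrix(c(2, 0, 0, 2), nrow = 2)  # 2 times the identity matrix

a * b      # element-wise: rows (2, 0) and (0, 8)
a %*% b    # matrix product: rows (2, 6) and (4, 8)

a * 10     # a scalar is applied to every element
colSums(a) # 3 7
rowSums(a) # 4 6
```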

Matrices and Vectors

Very similar
: both are simply data structures that can store elements of the same type

Vector = 1D, matrix = 2D
: The vector does this in a one-dimensional sequence, while the matrix uses a two-dimensional grid structure.

Coercion if necessary
: when you want to store elements of different types.

Recycling if necessary
: column by column.

Element-wise calculus
: Calculations are straightforward; all of them are performed element-wise.

• r-project/matrices: it also contains more advanced operations on matrices, like how to take the determinant of a matrix in R or how to transpose a matrix

• ats.ucla.edu/matrices: Here the concept of working with matrices in R is once again explained and some more technical examples are also discussed

Module 4: Factors

Factors

Categorical Variables

• Limited number of different values

• Each value belongs to a category (ordered or non-ordered)

• In R: factor (a data structure to save categorical variables)

Create a factor

factor()

An example: blood type.

When we call the factor function, R basically does 2 things.

1. It scans through the vector to see the different categories that are in there and sorts the levels alphabetically.

2. It converts the character vector, blood in this example, to a vector of integer values. These integers correspond to a set of character values to use when the factor is displayed.

Thus, factors are actually integer vectors, where each integer corresponds to a category, or a level.

levels = c(…)

Rename factor levels

levels(raw factor) <- c(…)

labels = …

Sometimes it’s a bit confusing. For both of these approaches, it’s important to follow the same order as the order of the factor levels: first A, then AB, then B and then O.

Thus, to solve it: combine manually specifying both the `levels` and the `labels` arguments.
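
The blood type example can be sketched like this. The data values and the `BT_` label names are illustrative assumptions.

```r
blood <- c("B", "AB", "O", "A", "O", "O", "A", "B")

# Default: levels are sorted alphabetically
blood_factor <- factor(blood)
levels(blood_factor)       # "A" "AB" "B" "O"
as.integer(blood_factor)   # 3 2 4 1 4 4 1 3: the underlying integer codes

# Custom level order
factor(blood, levels = c("O", "A", "B", "AB"))

# Rename afterwards: must follow the existing level order A, AB, B, O
renamed <- blood_factor
levels(renamed) <- c("BT_A", "BT_AB", "BT_B", "BT_O")

# Safer: give levels and labels together so the pairing is explicit
both <- factor(blood,
               levels = c("O", "A", "B", "AB"),
               labels = c("BT_O", "BT_A", "BT_B", "BT_AB"))
```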

Nominal Variables versus Ordinal Variables

An example: T-shirt size (ordinal) vs. blood type (nominal).

ordered = TRUE
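
A sketch of an ordered factor for the T-shirt example. The sizes in `tshirt` are made-up data.

```r
tshirt <- c("M", "L", "S", "S", "L", "M", "L", "M")

# ordered = TRUE plus explicit levels defines S < M < L
tshirt_factor <- factor(tshirt, ordered = TRUE, levels = c("S", "M", "L"))
tshirt_factor

# Comparisons now make sense
tshirt_factor[1] < tshirt_factor[2]   # TRUE: M is smaller than L
```

The same comparison on an unordered factor such as blood type is not meaningful; R returns NA with a warning.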

Wrap-up

• Factors for storing categorical variables

• Factors are integer vectors

• Change factor levels: levels() function or labels argument

• Ordered factors: ordered = TRUE. Catering to both nominal and ordinal variables

Extra knowledge from labs

summary()
: A generic function that produces summaries of many kinds of objects, including the results of model fitting functions. For a factor, it returns the number of observations in each level.
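
For instance, summary() is far more informative on a factor than on the raw character vector. The `blood` values below are example data.

```r
blood <- c("B", "AB", "O", "A", "O", "O", "A", "B")

summary(blood)                   # only length, class and mode
counts <- summary(factor(blood)) # counts per level
counts                           # A: 2, AB: 1, B: 2, O: 3
```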

Module 5: Lists

Create and name lists

Features

• Vector: 1D, same type

• Matrix: 2D, same type

• List

• Can store different R objects (vectors, matrices, dates, data frames, factors and many more)
• No coercion
• Loss of some functionality that vectors and matrices offer
• Calculations with lists are far less straightforward, because lists have no predefined structure to follow.

list()

Name list

One-line approach

list(name = value…)

Better way to print list

str()
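
A sketch of creating, naming, and printing a list. The song details are invented for the example.

```r
# Create first, name afterwards
song <- list("Rsome times", 190, 5)
names(song) <- c("title", "duration", "track")

# Or name the elements in one line
song <- list(title = "Rsome times", duration = 190, track = 5)

# No coercion: each element keeps its own type
sapply(song, class)   # character, numeric, numeric

# Compact printout of the structure
str(song)
```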

Subset and Extend Lists

Note: the list printouts here come from the `str()` function.

`[` versus `[[`

An example: the song list

`[` returns a list

`[[` returns the element itself

A `[[` Error

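Double brackets select a single element from a list. Passing a vector such as c(1, 3) does not select elements 1 and 3; R reads it as nested indexing (element 3 of element 1), which usually fails.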

$ and extending

It works just the same as the double brackets but only works on named lists.
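
The differences between `[`, `[[` and `$` can be sketched on a small song list. All values and the added element names are hypothetical.

```r
song <- list(title = "Rsome times", duration = 190, track = 5)

song[1]            # a list of length 1
song[[1]]          # the element itself: "Rsome times"
class(song[1])     # "list"
class(song[[1]])   # "character"

song[c(1, 3)]      # sublist with title and track
# song[[c(1, 3)]]  # error: read as song[[1]][[3]], subscript out of bounds

song$duration      # 190, same as song[["duration"]]

# Extending: assign to a name that does not exist yet
song$sent <- TRUE
song[["writers"]] <- c("Kurt", "Lisa")
names(song)
```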

Wrap-up

• `[[` or `[` ?

• `[[` to select list element
• `[` results in sublist
• `[[` and `$` to subset and extend lists

Extra knowledge from lab

c()

: We can even use the c() function to add an element

shining_list <- c(shining_list, my_opinion = "Love it!")

vector in list

: if we want to add an entire vector as an element to the list

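shining_list_ext <- c(shining_list, opinions = c("Love it!", "Hate it!"))
: This will return a list in which the two strings are added as separate elements, named opinions1 and opinions2, rather than as a single vector.

: Thus, we’d better wrap the elements we want to add in another list() call.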
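
A sketch of the difference. The starting contents of `shining_list` are assumed here, since the original list is not shown in the note.

```r
# Hypothetical starting list
shining_list <- list(title = "The Shining", year = 1980)

# Without list(): the vector is split into separate elements
wrong <- c(shining_list, opinions = c("Love it!", "Hate it!"))
names(wrong)   # "title" "year" "opinions1" "opinions2"

# With list(): the vector stays together as one element
right <- c(shining_list, list(opinions = c("Love it!", "Hate it!")))
names(right)   # "title" "year" "opinions"
right$opinions
```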


Module 6: Data Frame

Explore the Data Frame

Datasets

• Observation

• Variables

• Example: Datasets

• each person = observation
• properties (name, age …) = variables

|name|age|child|
|--|--|--|
|Pete|30|TRUE|
|Frank|21|TRUE|
|Hehe|25|TRUE|

• Matrix? No, we need different types

• List? Not very practical

Data Frame

• Specifically for datasets

• Rows = observations (persons)

• Columns = variables (age, name, …)

• Contain elements of different types

• Elements in same column: same type

Create Data Frame

Usually, we don’t need to create a data frame manually.

• Import from data source

• CSV file

• Relational Database (e.g. SQL)

• Software packages (Excel, SPSS …)

data.frame()

Name Data Frame

Like in matrices, it’s also possible to name the rows of the data frame, but that’s generally not a good idea.

Data Frame Structure

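• A data frame is actually a list whose elements are vectors of the same length.

• Strings & factor coercion: data.frame() can convert character vectors to factors. Before R 4.0.0 this was the default; pass the argument below to keep strings as strings.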

stringsAsFactors = FALSE

A requirement that is not present for lists is that the length of the vectors you put in the list has to be equal.
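
Building the small dataset from the table above by hand might look like this. The capitalised column names are an arbitrary choice for the example.

```r
name  <- c("Pete", "Frank", "Hehe")
age   <- c(30, 21, 25)
child <- c(TRUE, TRUE, TRUE)

# Each vector becomes a column; keep strings as character
people <- data.frame(name, age, child, stringsAsFactors = FALSE)

# Rename the columns
names(people) <- c("Name", "Age", "Child")

str(people)       # 3 obs. of 3 variables, one type per column
is.list(people)   # TRUE: a data frame is a list underneath

# Columns must have equal length:
# data.frame(name, age = c(30, 21))   # error: differing number of rows
```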

Subset - Extend - Sort Data Frames

Subset Data Frame

• Subsetting syntax from matrices and lists

• [ from matrices

• [[ and $ from lists

Subset Data Frame ~ Matrix

Select multiple information

Data Frame ~ List

`[[` and `$` return a vector.

`[` with a single name or index returns a data.frame.
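
Both subsetting styles can be sketched on the people data from the table above. The object name `people` is my own.

```r
people <- data.frame(name  = c("Pete", "Frank", "Hehe"),
                     age   = c(30, 21, 25),
                     child = c(TRUE, TRUE, TRUE),
                     stringsAsFactors = FALSE)

# Matrix style: [row, column]
people[3, 2]                        # 25
people[3, "age"]                    # 25
people[c(1, 3), c("name", "age")]   # a smaller data frame
people[2, ]                         # one whole observation

# List style
people$age        # a vector: 30 21 25
people[["age"]]   # the same vector
people["age"]     # a data frame with one column
```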

Extend Data Frame

• List approach

XX$YY <- …
XX[["YY"]] <- …

• Matrix approach

cbind()
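
A sketch of both ways to add a column. The new columns and their values (`height`, `weight`, `city`) are invented.

```r
people <- data.frame(name = c("Pete", "Frank", "Hehe"),
                     age  = c(30, 21, 25),
                     stringsAsFactors = FALSE)

# List approach: assign to a new column name
people$height <- c(163, 177, 155)
people[["weight"]] <- c(74, 63, 68)

# Matrix approach: bind a new column
people <- cbind(people, city = c("Ghent", "Leuven", "Bruges"))

names(people)   # "name" "age" "height" "weight" "city"
```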

Sorting

order()

Then we can reorder the data.frame using the indices that order() returns.

decreasing = TRUE
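
Sorting the people data by age can be sketched as follows. The object names are hypothetical.

```r
people <- data.frame(name = c("Pete", "Frank", "Hehe"),
                     age  = c(30, 21, 25),
                     stringsAsFactors = FALSE)

# order() returns the row positions in sorted order, not the sorted values
ranks <- order(people$age)
ranks                     # 2 3 1

# Use those positions as row indices
by_age <- people[ranks, ]
by_age$name               # "Frank" "Hehe" "Pete"

# Oldest first
oldest_first <- people[order(people$age, decreasing = TRUE), ]
oldest_first$name         # "Pete" "Hehe" "Frank"
```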

Extra knowledge from lab

Select multiple rows or columns

dataframe[n:m, n:m]

Example: Planets

Instead of having to define a vector `rings_vector`, which we then use to subset `planets_df`, we could’ve also used:

subset()

or 