Commit

Merge branch 'master' of github.com:swcarpentry/r-novice-inflammation
fmichonneau committed Jul 1, 2019
2 parents b4e77f5 + 7aa4e9b commit e1d2add
Showing 1 changed file with 74 additions and 26 deletions.
100 changes: 74 additions & 26 deletions _episodes_rmd/13-supp-data-structures.Rmd
@@ -7,14 +7,18 @@ questions:
- "What are the different data structures in R?"
- "How do I access data within the various data structures?"
objectives:
- "Expose learners to the different data types in R."
- "Expose learners to the different data types in R and show how these data
types are used in data structures."
- "Learn how to create vectors of different types."
- "Be able to check the type of vector."
- "Learn about missing data and other special values."
- "Getting familiar with the different data structures (lists, matrices, data frames)."
- "Get familiar with the different data structures (lists, matrices, data frames)."
keypoints:
- "R's basic data types are character, numeric, integer, complex, and logical."
- "R's basic data structures include the vector, list, matrix, data frame, and factors."
- "R's basic data structures include the vector, list, matrix, data frame, and
factors. Some of these structures require that all members be of the same data
type (e.g. vectors, matrices) while others permit multiple data types (e.g.
lists, data frames)."
- "Objects may have attributes, such as name, dimension, and class."
source: Rmd
---
@@ -24,27 +28,30 @@ source("../bin/chunk-options.R")
knitr_fig_path("13-supp-data-structures-")
```

### Understanding Basic Data Types in R
### Understanding Basic Data Types and Data Structures in R

To make the best of the R language, you'll need a strong understanding of the
basic data types and data structures and how to operate on those.
basic data types and data structures and how to operate on them.

Very important to understand because these are the objects you will manipulate
on a day-to-day basis in R. Dealing with object conversions is one of the most
common sources of frustration for beginners.
Data structures are very important to understand because these are the objects you
will manipulate on a day-to-day basis in R. Dealing with object conversions is one
of the most common sources of frustration for beginners.

**Everything** in R is an object.

R has 6 (although we will not discuss the raw class for this workshop) atomic
vector types.
R has 6 basic data types. (In addition to the five listed below, there is also
*raw* which will not be discussed in this workshop.)

* character
* numeric (real or decimal)
* integer
* logical
* complex

By *atomic*, we mean the vector only holds data of a single type.
Elements of these data types may be combined to form data structures, such as
atomic vectors. When we call a vector *atomic*, we mean that the vector only
holds data of a single data type. Below are examples of atomic character vectors,
numeric vectors, integer vectors, etc.

* **character**: `"a"`, `"swc"`
* **numeric**: `2`, `15.5`
@@ -84,7 +91,7 @@ R has many __data structures__. These include
* data frame
* factors

### Atomic Vectors
### Vectors

A vector is the most common and basic data structure in R and is pretty much the
workhorse of R. Technically, vectors can be one of two types:
@@ -263,23 +270,52 @@ nchar("Software Carpentry")

In R matrices are an extension of the numeric or character vectors. They are not
a separate type of object but simply an atomic vector with dimensions; the
number of rows and columns.
number of rows and columns. As with atomic vectors, the elements of a matrix must
be of the same data type.

```{r}
m <- matrix(nrow = 2, ncol = 2)
m
dim(m)
```

You can check that matrices are vectors with a class attribute of `matrix` by using `class()` and `typeof()`.
You can check that matrices are vectors with a class attribute of `matrix` by using
`class()` and `typeof()`.

```{r}
m <- matrix(c(1:3))
class(m)
typeof(m)
```

While `class()` shows that m is a matrix, `typeof()` shows that fundamentally the matrix is an integer vector.
While `class()` shows that m is a matrix, `typeof()` shows that fundamentally the
matrix is an integer vector.
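
The same-type rule can be seen directly with a short sketch. The object names `mixed` and `ints` are hypothetical and not part of the lesson; the example assumes only base R. When values of different types are supplied, R coerces them to one common type, and the dimensions are stored as an attribute on an otherwise ordinary atomic vector.

```{r}
# Hypothetical example: mixed input is coerced to a single type
mixed <- matrix(c(1, "a", TRUE, 2), nrow = 2)
typeof(mixed)      # every element has become a character string

# A matrix is an atomic vector that carries a `dim` attribute
ints <- matrix(1:6, nrow = 2)
is.atomic(ints)
attributes(ints)
```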

> ## Data types of matrix elements
>
> Consider the following matrix:
>
> ```{r matrix-typeof}
> FOURS <- matrix(
> c(4, 4, 4, 4),
> nrow = 2,
> ncol = 2)
> ```
>
> Given that `typeof(FOURS[1])` returns `"double"`, what would you expect
> `typeof(FOURS)` to return? How do you know this is the case even without
> running this code?
>
> *Hint* Can matrices be composed of elements of different data types?
>
> > ## Solution
> > We know that `typeof(FOURS)` will also return `"double"` since matrices
> > are made of elements of the same data type. Note that you could do
> > something like `as.character(FOURS)` if you needed the elements of `FOURS`
> > *as characters*.
>
> {: .solution}
{: .challenge}
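
As a small follow-up to the solution above, the sketch below shows one thing to watch for when converting a matrix: in base R, `as.character()` returns a plain vector and drops the dimensions. Assigning to `storage.mode()` is one way to change the type while keeping the matrix shape. The names `fours_chr` and `fours_mat` are made up for this illustration.

```{r}
FOURS <- matrix(c(4, 4, 4, 4), nrow = 2, ncol = 2)

# as.character() converts the elements but discards the dim attribute
fours_chr <- as.character(FOURS)
typeof(fours_chr)
dim(fours_chr)

# storage.mode<- changes the type and keeps the 2 x 2 shape
fours_mat <- FOURS
storage.mode(fours_mat) <- "character"
typeof(fours_mat)
dim(fours_mat)
```
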
Matrices in R are filled column-wise.
@@ -296,7 +332,8 @@ dim(m) <- c(2, 5)

This takes a vector and transforms it into a matrix with 2 rows and 5 columns.

Another way is to bind columns or rows using `cbind()` and `rbind()`.
Another way is to bind columns or rows using `rbind()` and `cbind()` ("row bind"
and "column bind", respectively).

```{r}
x <- 1:3
@@ -305,7 +342,8 @@ cbind(x, y)
rbind(x, y)
```

You can also use the `byrow` argument to specify how the matrix is filled. From R's own documentation:
You can also use the `byrow` argument to specify how the matrix is filled.
From R's own documentation:

```{r}
mdat <- matrix(c(1, 2, 3, 11, 12, 13),
@@ -315,7 +353,8 @@ mdat <- matrix(c(1, 2, 3, 11, 12, 13),
mdat
```

Elements of a matrix can be referenced by specifying the index along each dimension (e.g. "row" and "column") in single square brackets.
Elements of a matrix can be referenced by specifying the index along each
dimension (e.g. "row" and "column") in single square brackets.

```{r}
mdat[2, 3]
@@ -398,22 +437,25 @@ names(xlist)
> {: .solution}
{: .challenge}
Lists can be extremely useful inside functions. Because the functions in R are able to return only a single object, you can "staple" together lots
of different kinds of results into a single object that a function can return.
Lists can be extremely useful inside functions. Because the functions in R are
able to return only a single object, you can "staple" together lots of different
kinds of results into a single object that a function can return.
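
As a minimal sketch of this "stapling" idea, a function can bundle several results of different types and lengths into one named list. The function `summarize_values` and its element names are hypothetical, chosen only for illustration.

```{r}
# Hypothetical helper: returns several summaries in a single list
summarize_values <- function(x) {
  list(
    n = length(x),     # integer count
    mean = mean(x),    # single numeric value
    range = range(x)   # numeric vector of length 2 (min, max)
  )
}

res <- summarize_values(c(2, 4, 6, 8))
res$mean
res[["range"]]
```
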
A list does not print to the console like a vector. Instead, each element of the
list starts on a new line.
Elements are indexed by double brackets. Single brackets will still return
a(nother) list. If the elements of a list are named, they can be referenced by the `$` notation (i.e. `xlist$data`).
a(nother) list. If the elements of a list are named, they can be referenced by
the `$` notation (i.e. `xlist$data`).
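
The difference between the two bracket styles can be sketched with a small list. `example_list` and its element names are made up for this illustration and are separate from the lesson's `xlist`.

```{r}
example_list <- list(label = "swc", values = 1:10)

class(example_list["values"])     # single brackets: still a list
class(example_list[["values"]])   # double brackets: the element itself
example_list$label                # `$` works because the element is named
```
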
### Data Frame
A data frame is a very important data type in R. It's pretty much the *de facto*
data structure for most tabular data and what we use for statistics.
A data frame is a *special type of list* where every element of the list has same length (i.e. data frame is a "rectangular" list).
A data frame is a *special type of list* where every element of the list has the same
length (i.e. a data frame is a "rectangular" list).
Data frames can have additional attributes such as `rownames()`, which can be
useful for annotating data, like `subject_id` or `sample_id`. But most of the
@@ -455,27 +497,33 @@ is.list(dat)
class(dat)
```

Because data frames are rectangular, elements of data frame can be referenced by specifying the row and the column index in single square brackets (similar to matrix).
Because data frames are rectangular, elements of a data frame can be referenced by specifying
the row and the column index in single square brackets (similar to a matrix).

```{r}
dat[1, 3]
```

As data frames are also lists, it is possible to refer to columns (which are elements of such list) using the list notation, i.e. either double square brackets or a `$`.
As data frames are also lists, it is possible to refer to columns (which are elements of
that list) using the list notation, i.e. either double square brackets or a `$`.

```{r}
dat[["y"]]
dat$y
```

The following table summarizes the one-dimensional and two-dimensional data structures in R in relation to diversity of data types they can contain.
The following table summarizes the one-dimensional and two-dimensional data structures in
R according to the diversity of data types they can contain.

| Dimensions | Homogeneous | Heterogeneous |
| ------- | ---- | ---- |
| 1-D | atomic vector | list |
| 2-D | matrix | data frame |

> Lists can contain elements that are themselves muti-dimensional (e.g. a lists can contain data frames or another type of objects). Lists can also contain elements of any length, therefore list do not necessarily have to be "rectangular". However in order for the list to qualify as a data frame, the lenghth of each element has to be the same.
> Lists can contain elements that are themselves multi-dimensional (e.g. a list can contain
> data frames or other types of objects). Lists can also contain elements of any length,
> therefore lists do not necessarily have to be "rectangular". However, in order for a list
> to qualify as a data frame, the length of each element has to be the same.
{: .callout}
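
The "rectangular" requirement can be checked with a short sketch, assuming only base R. The lists `ragged` and `rect` are hypothetical. Lengths 3 and 2 are used deliberately, because R can recycle shorter elements when the lengths divide evenly.

```{r}
# Elements of different lengths: a valid list, but not rectangular
ragged <- list(a = 1:3, b = c("x", "y"))
lengths(ragged)

# Converting it to a data frame fails; tryCatch() captures the error
result <- tryCatch(
  as.data.frame(ragged),
  error = function(e) "not rectangular"
)
result

# Elements of equal length convert without complaint
rect <- list(a = 1:3, b = c("x", "y", "z"))
df <- as.data.frame(rect)
dim(df)
```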

> ## Column Types in Data Frames