---
title: "R4SPA: R Packages and Training to enable Statistical Programming in R"
author: "Kieran Martin"
date: "2018/08/16"
output:
  xaringan::moon_reader:
    css: ["./libs/remark-css/default.css","./libs/remark-css/metropolis.css", "./libs/remark-css/metropolis-fonts.css" ]
    lib_dir: libs
    chakra:  libs/remark-latest.min.js
    nature:
      highlightStyle: github
      highlightLines: true
      countIncrementalSlides: false
---


```{r setup, include=FALSE}
options(htmltools.dir.version = FALSE)
```

# Outline
.font150[
Who are we?

What is the problem we are facing?

How are we solving it?
]

---

# Who are we?

- This work is a collaboration between two people:
  - Kieran Martin and Craig Gower
  - Check out https://github.com/gowerc and https://github.com/kieranjmartin
  - My Twitter is @kjmartinstats
- We are both Data Analytic Specialists at Roche

.footnote[Slides today are hosted here: https://kieranjmartin.github.io/R4SPA-talk/r4spa.html]

---

# What is the problem?

Statistical Programmers want to use R!
<br><br>
--

- High attendance at R training courses

- Lots of engagement in discussions about R
--
<br><br><br>
.content-box-green[But very little... actual R outputs!]

---

# What is the problem?

Lack of use cases for R

--

One key piece of programming work is setting up and QCing analysis datasets

--

 Reluctance to use R for this task
 <br><br><br>
--

.content-box-blue-centre[
**Why?**
]
---
# Why?

.content-box-blue[
Belief that data manipulation in R is **difficult**
]
--
<br>
.content-box-blue[
Lack of tools (no equivalent of SAS `PROC COMPARE`)
]
--
<br>
.content-box-blue[
Training was disconnected from real tasks
]

---

# What are we doing?
.content-box-green[
In-house training on **data manipulation** using the **tidyverse**
]

--

**What makes this different?**

--
- Focused on **one** task: data derivation

--
- tidyverse makes code **easier** to read (especially for those with less R experience)

--
- Exercises based on our data and specifications

--
- Plan to train people who have a specific **use case** for the training
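
---

# What does this look like?

A minimal sketch of the kind of derivation the training covers. The data and variable names (`dm`, `AGEGR1`, `SEXN`) are made up for illustration and are not taken from our specifications.

```{r, warning=FALSE, message=FALSE}
library(dplyr)

# Hypothetical subject-level data; names and values are illustrative only
dm <- tibble::tibble(
    USUBJID = c("001", "002", "003"),
    AGE     = c(34, 67, 52),
    SEX     = c("M", "F", "F")
)

# Derive an age group and a numeric sex code, one readable step at a time
adsl <- dm %>%
    mutate(
        AGEGR1 = if_else(AGE >= 65, ">=65", "<65"),
        SEXN   = case_when(SEX == "M" ~ 1, SEX == "F" ~ 2)
    )
adsl
```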


---

# What are we doing?

**diffdf**
.content-box-green[
Package for **comparing datasets**

Gives **informative feedback** on where issues are
]

--

Main page: https://gowerc.github.io/diffdf/

Now on CRAN: https://CRAN.R-project.org/package=diffdf

Check out GitHub: https://github.com/gowerc/diffdf

---
# diffdf: missing columns
<br>

```{r, include = FALSE}

LENGTH <- 30
set.seed(12334)
test_data <- tibble::tibble(
    ID          = 1:LENGTH,
    GROUP1      = rep( c(1,2) , each = LENGTH/2),
    GROUP2      = rep( c(1:(LENGTH/2)), 2 ),
    INTEGER     = rpois(LENGTH , 40),
    BINARY      = sample( c("M" , "F") , LENGTH , replace = TRUE),
    DATE        = lubridate::ymd("2000-01-01") + rnorm(LENGTH, 0, 7000),
    DATETIME    = lubridate::ymd_hms("2000-01-01 00:00:00") + rnorm(LENGTH, 0, 200000000),
    CONTINUOUS  = rnorm(LENGTH , 30 , 12),
    CATEGORICAL = factor(sample( c("A" , "B" , "C") , LENGTH , replace = TRUE)),
    LOGICAL     = sample( c(TRUE , FALSE) , LENGTH , replace = TRUE),
    CHARACTER   = stringi::stri_rand_strings(LENGTH,  rpois(LENGTH , 13),  pattern = "[ A-Za-z0-9]")
)
```


```{r, warning=FALSE}
library(diffdf)

test_data2 <- test_data
# Drop the sixth column (DATE)
test_data2 <- test_data2[,-6]
diffdf(test_data , test_data2)
```
---
# diffdf: missing rows
<br>
```{r, warning=FALSE}
test_data2 <- test_data
test_data2 <- test_data2[1:(nrow(test_data2) - 2),]
diffdf(test_data, test_data2, keys = "ID")
```
---
# diffdf: different values
<br>
```{r, warning=FALSE}
test_data2 <- test_data
test_data2[5,2] <- 6  # change GROUP1 in row 5
difval <- diffdf(test_data , test_data2, keys = "ID" )
difval$NumDiff
difval$VarDiff_GROUP1
```
---
# diffdf: different attributes
<br>
```{r, warning=FALSE}
test_data2 <- test_data
attr(test_data$ID , "label") <- "This is an interesting label"
attr(test_data2$ID , "label") <- "A different label"
diffdf(test_data , test_data2, keys = "ID" )
```
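---
# diffdf: finding the problem rows
<br>
A sketch of how the rows behind a difference might be pulled out, assuming the installed version of diffdf exports `diffdf_issuerows()`. The `base_df` and `comp_df` data frames are toy data made up for this slide.

```{r, warning=FALSE}
library(diffdf)

# Toy data: identical apart from one changed value for ID 3
base_df <- data.frame(ID = 1:5, VALUE = c(10, 20, 30, 40, 50))
comp_df <- base_df
comp_df$VALUE[3] <- 99

# suppress_warnings = TRUE silences the warning raised when differences exist
result <- diffdf(base_df, comp_df, keys = "ID", suppress_warnings = TRUE)

# Subset the comparison data to the rows flagged in the diffdf result
issue_rows <- diffdf_issuerows(comp_df, result)
issue_rows
```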
---

# Plans for the future

.content-box-green[
Roll out training across sites
]
--
<br>
.content-box-blue[
Build more packages to address common problems
]
--
<br>
.content-box-green[
Build more training focusing on different tasks in R
]