---
output:
  github_document:
    html_preview: false
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
library(prt)
```
# prt
<!-- badges: start -->
[![Lifecycle](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
[![Codecov test coverage](https://app.codecov.io/gh/nbenn/prt/branch/master/graph/badge.svg?token=HvOM3yosW3)](https://app.codecov.io/gh/nbenn/prt)
[![R build status](https://github.com/nbenn/prt/workflows/build/badge.svg)](https://github.com/nbenn/prt/actions?query=workflow%3Abuild)
[![pkgdown build status](https://github.com/nbenn/prt/workflows/pkgdown/badge.svg)](https://github.com/nbenn/prt/actions?query=workflow%3Apkgdown)
[![covr status](https://github.com/nbenn/prt/workflows/coverage/badge.svg)](https://github.com/nbenn/prt/actions?query=workflow%3Acoverage)
<!-- badges: end -->
Building on `data.frame` serialization provided by [`fst`](https://www.fstpackage.org), `prt` offers an interface for working with partitioned `data.frame`s, saved as individual `fst` files.
## Installation
You can install the development version of [prt](https://nbenn.github.io/prt/) from GitHub by running
```{r gh-dev, eval = FALSE}
source("https://install-github.me/nbenn/prt")
```
Alternatively, if you have the `remotes` package installed, the latest release can be installed by calling `install_github()` as
```{r gh-rel, eval = FALSE}
# install.packages("remotes")
remotes::install_github("nbenn/prt@*release")
```
## Short demo
A `prt` object can be created either by calling `new_prt()` on a list of previously created `fst` files or by coercing a `data.frame` object to `prt` using `as_prt()`.
```{r create}
tmp <- tempfile()
dir.create(tmp)
flights <- as_prt(nycflights13::flights, n_chunks = 2L, dir = tmp)
print(flights)
```
When a `prt` object is created from a `data.frame`, the specified number of files is written to the directory of choice (a newly created directory within `tempdir()` by default).
```{r inspect}
list.files(tmp)
```
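The other route mentioned above, calling `new_prt()` on existing `fst` files, could look roughly like the following sketch. It assumes that `new_prt()` accepts a character vector of file paths and that the files were written with `fst::write_fst()`. The split of `mtcars` into two halves, the directory name `fst_dir` and the file names are purely illustrative.

```{r new-prt, eval = FALSE}
library(prt)

# Illustrative only: split a small data.frame into two equally sized parts
parts <- split(mtcars, rep(1:2, each = nrow(mtcars) / 2))

# Write each part to its own fst file in a fresh directory
fst_dir <- tempfile()
dir.create(fst_dir)
files <- file.path(fst_dir, paste0(seq_along(parts), ".fst"))
for (i in seq_along(parts)) {
  fst::write_fst(parts[[i]], files[i])
}

# Assumption: new_prt() takes a character vector of fst file paths
cars <- new_prt(files)
print(cars)

unlink(fst_dir, recursive = TRUE)
```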
Subsetting and printing are closely modeled after `tibble`, and behavior that deviates from that of `tibble` will most likely be considered a bug (please [report](https://github.com/nbenn/prt/issues/new)). Some design choices that do set a `prt` object apart from a `tibble` include the use of `data.table`s for any result of a subsetting operation and the complete disregard for `row.names`.
In addition to standard subsetting operations involving the functions <code>`[`()</code>, <code>`[[`()</code> and <code>`$`()</code>, the base generic function `subset()` is implemented for the `prt` class, enabling subsetting operations using non-standard evaluation. Combined with random access to tables stored as `fst` files, this can make data access more efficient in cases where only a subset of the data is of interest.
```{r subset}
jan <- flights[flights$month == 1, ]
identical(jan, subset(flights, month == 1))
print(jan)
```
A subsetting operation on a `prt` object yields a `data.table`. If the full table is of interest, a `prt`-specific implementation of the `as.data.table()` generic is available.
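A minimal sketch of the full-table conversion, assuming the `data.table` package is installed and using `mtcars` as stand-in data (the object names `cars` and `cars_dt` are illustrative):

```{r full-table, eval = FALSE}
library(prt)

# Stand-in data: store mtcars as two fst partitions in a temporary directory
cars <- as_prt(mtcars, n_chunks = 2L)

# Read all partitions back into a single in-memory data.table
cars_dt <- data.table::as.data.table(cars)

# Check the result type and that no rows were lost
data.table::is.data.table(cars_dt)
nrow(cars_dt) == nrow(mtcars)
```

Since this reads every partition into memory, it is best kept for tables that fit comfortably in RAM; for anything else, `subset()` on the `prt` object is likely the better choice.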
```{r cleanup}
unlink(tmp, recursive = TRUE)
```