This repository has been archived by the owner on Aug 13, 2020. It is now read-only.
Hi,

Assuming that is (even technically) possible, it would be useful to have the data indexed but not yet loaded into RAM, as in sparklyr (see https://www.rdocumentation.org/packages/sparklyr/versions/1.0.2/topics/spark_read_parquet). That would allow the user to load very large parquet files but pay only for what is actually used, similar to what vroom does (https://github.com/r-lib/vroom).

What do you think?

Thanks!
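For reference, the sparklyr pattern mentioned above looks roughly like this (a minimal sketch; the connection, table name, and file path are placeholders, and memory = FALSE is what keeps the data from being cached eagerly):

```r
library(sparklyr)

sc <- spark_connect(master = "local")

# memory = FALSE registers the parquet file as a Spark table without caching
# it; rows are only pulled when a downstream operation actually needs them.
flights <- spark_read_parquet(
  sc,
  name   = "flights",          # placeholder table name
  path   = "flights.parquet",  # placeholder file path
  memory = FALSE
)

head(flights, 10)  # only a handful of rows are materialised on the R side
```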
Great idea!! Maybe you should work with Jim Hester (@jimhester, the vroom author) to get a single package that handles CSV + parquet super fast? That would be a killer package in my opinion! And more devs are needed to fix bugs and other inefficiencies. What do you think?
Check out the altrep branch in this repo... for now, it materialises everything at once, but things like this should no longer read any unrelated payload data:
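Something like the following is presumably the access pattern meant here (a hypothetical sketch: read_parquet stands in for this package's reader function, and the file and column names are made up):

```r
# With ALTREP-backed columns, each vector is handed to R as a lazy stand-in,
# so touching one column should not decode the others' payload data.
df <- read_parquet("big.parquet")  # assumed reader name

mean(df$price)        # decodes only the pages backing the 'price' column
head(df$carrier, 10)  # payload data of unrelated columns is never read
```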