1BRC in Python with Dask and Spark #450
scharlottej13
started this conversation in
Show and tell
Inspired by #62, I wanted to share Dask and PySpark implementations of the challenge. Both were run on an M1 Mac (8-core CPU, 16 GB memory). Dask took 32.8 seconds and Spark took 2 min 1 s.
It's worth noting that the Dask implementation uses Dask Expressions, which is under active development. Dask users can expect to see these changes in core Dask DataFrame soon.
More details in our blog post.
Dask implementation - 32.8 seconds (± 2.43 s across 7 trials)
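The code from the original post is not reproduced here, so the following is only a minimal sketch of how a Dask DataFrame version of the challenge might look, not the implementation that was timed. The file name `measurements.txt`, the column names `station` and `measure`, and the helper names `format_stations` and `run_dask` are all assumptions made for illustration. It also does not handle the challenge's exact rounding rules.

```python
def format_stations(rows):
    """Render (station, min, mean, max) tuples as {name=min/mean/max, ...}.

    Hypothetical helper: the exact rounding rules of the challenge
    are not reproduced here.
    """
    parts = [
        f"{name}={lo:.1f}/{mean:.1f}/{hi:.1f}"
        for name, lo, mean, hi in sorted(rows)  # sorted by station name
    ]
    return "{" + ", ".join(parts) + "}"


def run_dask(path="measurements.txt"):
    """Aggregate min/mean/max per station with Dask DataFrame.

    The path and column names are assumptions, not taken from the post.
    """
    # Imported lazily so format_stations works without Dask installed.
    import dask.dataframe as dd

    df = dd.read_csv(
        path,
        sep=";",            # 1BRC rows look like "Hamburg;12.0"
        header=None,
        names=["station", "measure"],
    )
    stats = (
        df.groupby("station")["measure"]
        .agg(["min", "mean", "max"])
        .compute()          # triggers the parallel computation
    )
    # Each tuple is (station, min, mean, max); station is the index.
    rows = list(stats.itertuples(name=None))
    return format_stations(rows)
```

Calling `run_dask()` requires Dask to be installed and a measurements file at the assumed path. `format_stations` can be tried on its own with a couple of hand-written tuples.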
Spark implementation - 2 min 1 s (± 6.45 s across 7 trials)
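As with the Dask version, the original PySpark code is not reproduced here. The sketch below shows one way the same aggregation could be written with the PySpark DataFrame API. It assumes a local Spark session, a file named `measurements.txt`, and the column names `station` and `measure`. The helpers `local_master` and `run_spark` are hypothetical names introduced for illustration, and none of this is the configuration that was actually benchmarked.

```python
def local_master(n_cores=None):
    """Build a local Spark master URL (hypothetical helper).

    None means "use every available core", e.g. all 8 on the M1 above.
    """
    return "local[*]" if n_cores is None else f"local[{n_cores}]"


def run_spark(path="measurements.txt", n_cores=None):
    """Aggregate min/mean/max per station with PySpark.

    The path and column names are assumptions, not taken from the post.
    """
    # Imported lazily so local_master works without PySpark installed.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.sql.types import (
        DoubleType, StringType, StructField, StructType,
    )

    spark = (
        SparkSession.builder
        .master(local_master(n_cores))
        .appName("1brc")
        .getOrCreate()
    )
    # An explicit schema avoids an extra pass over the file to infer types.
    schema = StructType([
        StructField("station", StringType()),
        StructField("measure", DoubleType()),
    ])
    df = spark.read.csv(path, sep=";", schema=schema)
    stats = df.groupBy("station").agg(
        F.min("measure").alias("min"),
        F.mean("measure").alias("mean"),
        F.max("measure").alias("max"),
    )
    # collect() brings the small per-station result back to the driver.
    return sorted(tuple(row) for row in stats.collect())
```

`run_spark()` returns a list of `(station, min, mean, max)` tuples sorted by station name. It needs PySpark and a Java runtime to be available.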