What?

A simple parallel MapReduce implementation in Erlang. The emphasis is on simplicity and understandability of the code.

Check out the demo files for simple examples, including:

- building an inverted index of a collection of text documents
- counting word frequencies in a collection of text documents
- a grep tool to search a collection of text documents
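
To make the word frequency example concrete, here is a minimal sequential sketch of how such a count can be phrased as a map step, a grouping step and a reduce step. It is not the code in `demo_frequency.erl`: the module name `wordcount_sketch`, the function `count/1` and the `{Name, Text}` input format are assumptions made for illustration, and documents are passed in as strings rather than read from disk.

```erlang
-module(wordcount_sketch).
-export([count/1]).

%% Hypothetical helper, not part of this repository.
%% Docs is a list of {Name, Text} pairs standing in for files on disk.
count(Docs) ->
    %% Map phase: every document emits a list of {Word, 1} pairs.
    Pairs = lists:flatmap(fun map_doc/1, Docs),
    %% Grouping phase: collect all values emitted for the same key.
    Grouped = lists:foldl(
                fun({Word, N}, Acc) ->
                        maps:update_with(Word, fun(Ns) -> [N | Ns] end, [N], Acc)
                end, #{}, Pairs),
    %% Reduce phase: sum the values collected for each word.
    maps:map(fun(_Word, Ns) -> lists:sum(Ns) end, Grouped).

%% Split a document into lowercase words and emit one pair per occurrence.
map_doc({_Name, Text}) ->
    Words = string:lexemes(string:lowercase(Text), " \n\t.,;:!?"),
    [{Word, 1} || Word <- Words].
```

Calling `wordcount_sketch:count([{"dogs", "Rover barks"}, {"cars", "the rover"}])` returns a map from each word to its number of occurrences. A parallel version would run the map step for each document in its own process.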

To compile:

make code

To run the inverted index demo:

cd ebin
erl
% in the erl prompt:
> Index = demo_inverted_index:index(test). % index the test subdirectory
> demo_inverted_index:query_index(Index, rover). % what files contain the word 'rover'?
{ok,["test/dogs","test/cars"]}
> halt().
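
For readers unfamiliar with the data structure, the following is a small in-memory sketch of an inverted index: a map from each word to the documents that contain it. It is an illustration only, not the implementation in `demo_inverted_index.erl`. The module name `inverted_index_sketch`, the functions `build/1` and `lookup/2`, the `{Name, Text}` input format and the `not_found` return value are all assumptions; words are strings here rather than atoms.

```erlang
-module(inverted_index_sketch).
-export([build/1, lookup/2]).

%% Hypothetical helper, not part of this repository.
%% Docs is a list of {Name, Text} pairs standing in for files on disk.
build(Docs) ->
    lists:foldl(fun add_doc/2, #{}, Docs).

%% Record Name under every distinct word that occurs in Text.
add_doc({Name, Text}, Index0) ->
    Words = lists:usort(string:lexemes(string:lowercase(Text), " \n\t.,;:!?")),
    lists:foldl(
      fun(Word, Index) ->
              maps:update_with(Word, fun(Names) -> [Name | Names] end, [Name], Index)
      end, Index0, Words).

%% Return the sorted list of documents containing Word, if any.
lookup(Index, Word) ->
    case maps:find(Word, Index) of
        {ok, Names} -> {ok, lists:sort(Names)};
        error -> not_found
    end.
```

With `Index = inverted_index_sketch:build([{"test/dogs", "Rover barks"}, {"test/cars", "A Rover car"}])`, the call `inverted_index_sketch:lookup(Index, "rover")` finds both documents.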

To clean up:

make clean

Why?

This implementation is part of the teaching material for my multicore programming course at the Vrije Universiteit Brussel (VUB) in Brussels, Belgium.

The goal is to teach students the fundamentals of MapReduce (in particular, the API of the Map and Reduce operations, and how these are combined to formulate large data processing jobs) while also increasing their fluency in Erlang. The code showcases process spawning, synchronization via message passing, and process termination.
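
The three Erlang features mentioned above can be seen together in a very small parallel map. This is a sketch under stated assumptions rather than an excerpt from `mapreduce.erl`: the module name `spawn_sketch` and the function `parallel_map/2` are hypothetical, and there is no error handling, so a worker that crashes would leave the caller waiting forever.

```erlang
-module(spawn_sketch).
-export([parallel_map/2]).

%% Hypothetical helper, not part of this repository.
%% Applies F to every element of Inputs, one process per element.
parallel_map(F, Inputs) ->
    Parent = self(),
    %% Process spawning: each worker computes one result and sends it
    %% back to the parent, tagged with a unique reference.
    Refs = [begin
                Ref = make_ref(),
                spawn(fun() -> Parent ! {Ref, F(Input)} end),
                Ref
            end || Input <- Inputs],
    %% Synchronization via message passing: a selective receive per
    %% reference collects the results in input order.
    %% Process termination: each worker exits normally once its fun returns.
    [receive {Ref, Result} -> Result end || Ref <- Refs].
```

For example, `spawn_sketch:parallel_map(fun(X) -> X * X end, [1, 2, 3])` squares each number in its own process and returns the results in the original order.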

Slideware

I gave a 40-minute talk about this project at the Erlang Factory Lite Brussels. The slides are available here (pdf).

In Scala

I contributed a chapter to Philipp Haller's book Actors in Scala (chapter 9, Distributed and Parallel Computing) that discusses an adaptation of this MapReduce implementation to Scala.

Acknowledgements

The inverted index example was taken from Joe Armstrong's Programming Erlang book.

Feedback

I welcome any feedback at tvcutsem at vub.ac.be, or drop me a line on Twitter.
