Note: Please view the notebook or HTML file rather than the PDF file. The PDF file does not include the visualisations.
Note: Please follow the installation docs for the GDV course first.
-
Clone this repository to your computer using git.
git clone https://github.com/bryanvanhuyneghem/Distributed-Data-Processing.git
-
Download your assigned datasets to this folder of the repository.
-
Add all the files of the dataset to the .gitignore file so that they are not added to the git repository. For more information on gitignore files, see the git docs.
-
Open project.code-workspace using Visual Studio Code. Note: If you're working on Windows, make sure that your Docker instance is running.
-
Click on the "Remote Explorer" tab in the left sidebar.
- Click on the + next to CONTAINERS,
- choose "Open Current Folder in Container",
- choose "Python 3 - Anaconda". This will create a container to develop in.
-
Wait until the container is set up. This can take a few minutes because the container needs to be pulled and built. You can check the progress by clicking "Starting Dev Container (show log)" in the notification at the bottom right of VS Code.
-
When the container is set up, open lab1-project.ipynb and start coding!
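
The .gitignore step above can also be scripted. The following is a minimal sketch, not part of the course material: the helper name `ignore_dataset_files` and the `data` folder in the usage note are hypothetical, and the sketch assumes the dataset folder lives inside the repository and that its file names contain no characters with a special meaning in gitignore patterns (such as `*`, `#` or `!`).

```python
from pathlib import Path


def ignore_dataset_files(repo_root, dataset_dir):
    """Append every file under dataset_dir to repo_root/.gitignore.

    Hypothetical helper: dataset_dir must be located inside repo_root.
    Returns the list of entries that were newly added.
    """
    repo_root = Path(repo_root)
    dataset_dir = Path(dataset_dir)
    gitignore = repo_root / ".gitignore"

    # Keep the existing entries so that re-running does not duplicate lines.
    existing = gitignore.read_text().splitlines() if gitignore.exists() else []
    known = set(existing)

    new_entries = []
    for path in sorted(dataset_dir.rglob("*")):
        if path.is_file():
            # gitignore patterns use forward slashes; a leading "/" anchors
            # the pattern to the folder that contains the .gitignore file.
            entry = "/" + path.relative_to(repo_root).as_posix()
            if entry not in known:
                new_entries.append(entry)
                known.add(entry)

    if new_entries:
        gitignore.write_text("\n".join(existing + new_entries) + "\n")
    return new_entries
```

Assuming the datasets were downloaded to a folder named `data` in the repository, calling `ignore_dataset_files(".", "data")` from the repository root should make the dataset files disappear from the untracked files shown by `git status`. If every file in that folder is part of the dataset, a single `/data/` line in .gitignore is a simpler alternative.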