Advanced-Database-Systems

This project was implemented for the undergraduate course Advanced Topics of Database Systems @ECE, NTUA, GR. Given a large dataset of movies and information about them, the purpose of the exercise was to:

  1. Use the Spark framework to implement a set of queries with both the RDD API and Spark SQL.
  2. Support .csv and .parquet input files for the SQL queries.
  3. Compare query response times across all setups: RDD vs. SQL, and .csv vs. .parquet input (SQL only).
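The timing comparison in step 3 could be organised along the lines of the sketch below. It is not the project's actual benchmark code: the helper name `time_setups`, the setup labels, and the stand-in workloads are hypothetical, and plain Python callables take the place of the real Spark queries.

```python
import time


def time_setups(setups, repeats=3):
    """Time each zero-argument callable and return {label: best seconds}.

    `setups` maps a label (hypothetical, e.g. "rdd", "sql_csv", "sql_parquet")
    to a callable that runs one query to completion.
    """
    results = {}
    for label, run in setups.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            run()
            timings.append(time.perf_counter() - start)
        # Keep the fastest run to reduce noise from warm-up and scheduling.
        results[label] = min(timings)
    return results


if __name__ == "__main__":
    # Stand-in workloads; in a real run each callable would execute a query.
    demo = {
        "rdd": lambda: sum(x * x for x in range(100_000)),
        "sql_csv": lambda: sorted(range(100_000), reverse=True),
        "sql_parquet": lambda: list(range(100_000)),
    }
    for label, seconds in time_setups(demo).items():
        print(f"{label}: {seconds:.4f} s")
```

When the callables wrap Spark queries, each one needs to end in an action (such as collecting or counting the result), because Spark transformations are lazy and would otherwise return before any work is done.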

=============================================================================================

  1. Create a function that implements a repartition join.
  2. Create a function that implements a broadcast join.
  3. Compare the running time of the two joins on the given data.
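The two join strategies listed above can be illustrated with a minimal, single-process sketch. This is not the repository's Spark implementation: the function names, the `(key, value)` record layout, and the sample data are assumptions made for illustration, and the "shuffle" is simulated with an in-memory dictionary.

```python
from collections import defaultdict
from itertools import product


def repartition_join(left, right):
    """Reduce-side (repartition) inner join of two lists of (key, value) pairs."""
    # Map phase: tag every record with its source so both inputs can be
    # shuffled together.
    tagged = [(k, ("L", v)) for k, v in left] + [(k, ("R", v)) for k, v in right]

    # Shuffle phase: bring all records with the same key together.
    groups = defaultdict(lambda: ([], []))
    for key, (tag, value) in tagged:
        groups[key][0 if tag == "L" else 1].append(value)

    # Reduce phase: pair every left value with every right value per key.
    return [
        (key, (l, r))
        for key, (lefts, rights) in groups.items()
        for l, r in product(lefts, rights)
    ]


def broadcast_join(small, large):
    """Map-side (broadcast) inner join: `small` is assumed to fit in memory."""
    # Build a hash table from the small input; in a cluster this table
    # would be shipped to every worker.
    lookup = defaultdict(list)
    for key, value in small:
        lookup[key].append(value)

    # Stream the large input and probe the table; no shuffle is needed.
    return [(key, (s, l)) for key, l in large for s in lookup.get(key, [])]


if __name__ == "__main__":
    # Hypothetical sample data: (movie_id, title) and (movie_id, rating).
    movies = [(1, "Heat"), (2, "Alien")]
    ratings = [(1, 4.5), (1, 3.0), (2, 5.0), (3, 2.0)]
    print(sorted(repartition_join(movies, ratings)))
    print(sorted(broadcast_join(movies, ratings)))
```

Both functions return the same set of joined records. The difference is in data movement: the repartition join shuffles both inputs by key, while the broadcast join only copies the small input and leaves the large one in place, which is why it tends to be faster when one side is small.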

SETUP

All queries were run on a two-node cluster (master/slave), with each node having 2 GB of RAM. The VMs were provided by the Okeanos project @NTUA.

TODO

Query descriptions will be uploaded in English and in Greek :)