jkvoulgaridis/Advanced-Database-Systems

Advanced-Database-Systems

This project was implemented for the undergraduate course Advanced Topics of Database Systems @ECE, NTUA, GR. Given a large dataset of movies and information about them, the purpose of the exercise was to:

  1. Use the Spark framework to implement a set of queries with both the RDD API and Spark SQL.
  2. Support both .csv and .parquet input files for the SQL queries.
  3. Compare the response time of each query across all setups: RDD vs. SQL, and .csv vs. .parquet (SQL only).
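The timing comparison in step 3 can be sketched as a small harness. This is a minimal illustration, not the project's actual measurement code: the names `time_query` and `compare_setups`, the setup labels, and the choice of the median over a few repeats are all assumptions made for the example.

```python
import statistics
import time


def time_query(run_query, repeats=3):
    """Return the median wall-clock time in seconds over `repeats` calls.

    `run_query` is a hypothetical zero-argument callable that executes one
    query end to end.
    """
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)


def compare_setups(setups, repeats=3):
    """Time every setup in a {label: callable} mapping.

    Labels such as "rdd", "sql_csv" and "sql_parquet" are illustrative only.
    """
    return {label: time_query(fn, repeats) for label, fn in setups.items()}


if __name__ == "__main__":
    # Stand-in workloads; in practice each callable would run one Spark query.
    timings = compare_setups({
        "rdd": lambda: sum(range(100_000)),
        "sql_csv": lambda: sorted(range(100_000)),
    })
    for label, seconds in timings.items():
        print(f"{label}: {seconds:.4f} s")
```

Because Spark transformations are lazy, each callable passed to the harness should end in an action (for example `collect()` or `count()`), otherwise only the construction of the query plan is timed.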

=============================================================================================

  1. Create a function that implements a repartition join
  2. Create a function that implements a broadcast join
  3. Compare the running time of the above joins on the given data.
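The two join strategies can be illustrated on plain Python lists of `(key, value)` pairs. This is a sketch of the general techniques only, not the repository's Spark implementation: the function names, the `num_partitions` parameter, and the sample data are hypothetical. A repartition join routes records from both inputs to a partition by key hash and joins inside each partition; a broadcast join builds a hash table from the small input and probes it while scanning the large one.

```python
from collections import defaultdict


def repartition_join(left, right, num_partitions=4):
    """Inner join of two lists of (key, value) pairs via hash partitioning."""
    # "Map" phase: route every record to a partition chosen by hash(key),
    # keeping left and right values in separate buffers per key.
    partitions = [defaultdict(lambda: ([], [])) for _ in range(num_partitions)]
    for key, value in left:
        partitions[hash(key) % num_partitions][key][0].append(value)
    for key, value in right:
        partitions[hash(key) % num_partitions][key][1].append(value)

    # "Reduce" phase: inside each partition, pair up the buffered values.
    joined = []
    for partition in partitions:
        for key, (left_values, right_values) in partition.items():
            for lv in left_values:
                for rv in right_values:
                    joined.append((key, (lv, rv)))
    return joined


def broadcast_join(large, small):
    """Inner join that hashes the small input and streams the large one."""
    # The small input is assumed to fit in memory on every worker.
    lookup = defaultdict(list)
    for key, value in small:
        lookup[key].append(value)
    # .get() avoids creating empty entries for keys missing from `small`.
    return [(key, (lv, sv)) for key, lv in large for sv in lookup.get(key, [])]


if __name__ == "__main__":
    # Made-up sample data for illustration.
    large = [(1, "a"), (2, "b"), (1, "c"), (3, "d")]
    small = [(1, "x"), (2, "y"), (4, "z")]
    print(sorted(repartition_join(large, small)))
    print(sorted(broadcast_join(large, small)))
```

Both functions return the same set of joined pairs. The difference is in data movement: the repartition join shuffles both inputs, while the broadcast join copies only the small input, which generally pays off when one side is much smaller than the other.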

SETUP

All queries ran on a cluster of two nodes (master/slave), each with 2GB of RAM. The VMs were assigned by the Okeanos project @NTUA.

TODO

Query descriptions will be uploaded in English and in Greek :)
