Advanced-Database-Systems

This project was implemented for the undergraduate course Advanced Topics of Database Systems at ECE, NTUA, Greece. Given a large dataset of movies and related information, the goals of the exercise were to:

Use the Spark framework to implement a set of queries, both with the RDD API and with Spark SQL.
Support both .csv and .parquet input files for the Spark SQL queries.
Compare the time each query needs to return a response across all possible setups: RDD vs. SQL, and .csv vs. .parquet (the latter only for SQL).
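Comparing response times fairly requires timing every setup the same way. The sketch below is a minimal, hypothetical harness (`time_query` and `compare_setups` are not from the repository). It assumes each query is wrapped in a zero-argument callable that ends with a Spark action such as `collect()` or `show()`, because Spark transformations are lazy and only an action actually runs the job. Reporting the minimum and median over a few runs helps smooth out one-off effects such as the first run being slower.

```python
import statistics
import time
from typing import Callable, Dict, List


def time_query(run_query: Callable[[], object], repeats: int = 3) -> Dict[str, float]:
    """Run `run_query` `repeats` times and return wall-clock timings in seconds.

    `run_query` must force execution itself (e.g. end with collect()),
    since Spark only executes a job when an action is called.
    """
    if repeats < 1:
        raise ValueError("repeats must be >= 1")
    timings: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }


def compare_setups(setups: Dict[str, Callable[[], object]],
                   repeats: int = 3) -> Dict[str, Dict[str, float]]:
    """Time every setup (e.g. 'rdd', 'sql-csv', 'sql-parquet') with the same harness."""
    return {name: time_query(fn, repeats) for name, fn in setups.items()}


if __name__ == "__main__":
    # Stand-in workloads; in the project these would be the RDD / SQL queries.
    results = compare_setups({
        "small": lambda: sum(range(1_000)),
        "large": lambda: sum(range(1_000_000)),
    })
    for name, stats in results.items():
        print(f"{name}: median {stats['median']:.4f}s")
```

In the project, the dictionary passed to `compare_setups` would map labels such as `"rdd"`, `"sql-csv"` and `"sql-parquet"` to callables that run the corresponding query; those labels are placeholders.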

=============================================================================================

Create a function that implements a repartition join.
Create a function that implements a broadcast join.
Compare the running time of the two joins on the given data.
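The two joins differ in how data moves between nodes. A repartition (reduce-side) join tags every record with its source table, shuffles both tables by join key so that matching records meet in the same reducer, and emits the cross product per key. A broadcast (map-side) join ships the smaller table to every worker as an in-memory hash table and probes it while scanning the larger table, so no shuffle is needed. The sketch below is not the repository's code: it simulates the map, shuffle and reduce phases in plain Python over lists of partitions so the data flow is visible without a cluster. The function names, the `key_idx` and `num_reducers` parameters, and the toy movie/rating rows are hypothetical.

```python
from collections import defaultdict
from itertools import product
from typing import Hashable, List, Tuple

Record = Tuple            # one row; the join key sits at position `key_idx`
Partitions = List[List[Record]]
Joined = Tuple[Hashable, Record, Record]


def repartition_join(left: Partitions, right: Partitions,
                     key_idx: int = 0, num_reducers: int = 2) -> List[Joined]:
    # Map phase: emit (key, (tag, record)); the tag records the source table.
    mapped = [(rec[key_idx], (tag, rec))
              for tag, table in (("L", left), ("R", right))
              for partition in table
              for rec in partition]
    # Shuffle phase: hash-partition by key so equal keys reach the same reducer.
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for key, tagged in mapped:
        reducers[hash(key) % num_reducers][key].append(tagged)
    # Reduce phase: split each key's values by tag and emit their cross product.
    out: List[Joined] = []
    for groups in reducers:
        for key, values in groups.items():
            l_recs = [rec for tag, rec in values if tag == "L"]
            r_recs = [rec for tag, rec in values if tag == "R"]
            out.extend((key, l, r) for l, r in product(l_recs, r_recs))
    return out


def broadcast_join(small: Partitions, large: Partitions,
                   key_idx: int = 0) -> List[Joined]:
    # "Broadcast": build one hash table from the small table. In PySpark this
    # dict would be shipped to every worker with sc.broadcast(...).
    table = defaultdict(list)
    for partition in small:
        for rec in partition:
            table[rec[key_idx]].append(rec)
    # Map-only phase: each partition of the large table probes the table
    # locally, so the large table is never shuffled.
    out: List[Joined] = []
    for partition in large:
        for rec in partition:
            key = rec[key_idx]
            for s in table.get(key, []):
                out.append((key, s, rec))
    return out


if __name__ == "__main__":
    # Toy partitioned data: (movie_id, title) and (movie_id, rating).
    movies = [[(1, "A"), (2, "B")], [(3, "C")]]
    ratings = [[(1, 4.0), (1, 3.5)], [(3, 5.0), (4, 2.0)]]
    # Both joins produce the same rows; only the order may differ.
    print(sorted(repartition_join(movies, ratings)))
    print(sorted(broadcast_join(movies, ratings)))
```

The trade-off this illustrates: the repartition join moves both tables across the network, while the broadcast join avoids the shuffle but requires the smaller table to fit in memory on every worker, which is a real constraint on nodes with 2 GB of RAM.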

SETUP

All queries were run on a cluster of two nodes (master/slave), each with 2 GB of RAM. The VMs were assigned through the Okeanos project @NTUA.

TODO

Query descriptions will be uploaded in English and in Greek :)

Repository contents:

parquet-files
LICENSE
README.md
rdd_q1.py
rdd_q2.py
rdd_q3.py
rdd_q4.py
rdd_q5.py
sql_q1.py
sql_q2.py
sql_q3.py
sql_q4.py
sql_q5.py

Repository: jkvoulgaridis/Advanced-Database-Systems

Folders and files

Latest commit

History

Repository files navigation

Advanced-Database-Systems

SETUP

TODO

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages