This is an ongoing notebook for a Kaggle competition (https://www.kaggle.com/c/expedia-hotel-recommendations/data). The goal is to predict each user's preferred hotel cluster. I keep updating this notebook.
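Kaggle scored this competition with mean average precision at 5 (MAP@5): each test row may receive up to five ranked hotel cluster predictions, and a correct cluster earns more credit the higher it is ranked. The following is a minimal plain-Python sketch of that metric, assuming exactly one true `hotel_cluster` per row. The function names `average_precision_at_k` and `map_at_k` are hypothetical and are not taken from the notebook.

```python
from typing import List, Sequence


def average_precision_at_k(actual: int, predicted: Sequence[int], k: int = 5) -> float:
    """AP@k for a row with a single true cluster.

    With one relevant item, AP@k reduces to 1 / (position of the first
    correct prediction), counting positions from 1, or 0 if the true
    cluster is not among the first k predictions.
    """
    for position, cluster in enumerate(predicted[:k], start=1):
        if cluster == actual:
            return 1.0 / position
    return 0.0


def map_at_k(actuals: Sequence[int], predictions: Sequence[Sequence[int]], k: int = 5) -> float:
    """Mean of AP@k over all rows (hypothetical helper, not the notebook's code)."""
    if len(actuals) != len(predictions):
        raise ValueError("actuals and predictions must have the same length")
    if not actuals:
        return 0.0
    scores: List[float] = [
        average_precision_at_k(a, p, k) for a, p in zip(actuals, predictions)
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Row 1 is right at rank 1 (1.0), row 2 at rank 2 (0.5), row 3 misses (0.0).
    print(map_at_k([1, 2, 3], [[1, 5, 6], [7, 2, 8], [9, 10, 11]]))
```

Because the score rewards rank, submitting five distinct, well-ordered clusters per row is never worse than submitting fewer.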

The training data is a 3.8 GB dataset of more than 37 million samples (user log data), which includes information such as the user's country, the hotel's country, and the check-in and check-out dates of each search.
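Raw check-in and check-out dates are usually more informative as derived features, for example the length of the stay. Here is a minimal sketch, assuming the dates arrive as ISO `YYYY-MM-DD` strings in columns named `srch_ci` and `srch_co` (assumed column names; check them against the competition's data page) and that some rows have missing or invalid dates. The helper `stay_length_days` is hypothetical.

```python
from datetime import date
from typing import Optional


def stay_length_days(srch_ci: Optional[str], srch_co: Optional[str]) -> Optional[int]:
    """Nights between check-in and check-out, or None if the row is unusable.

    Assumes ISO 'YYYY-MM-DD' strings; missing values, unparseable dates and
    check-outs before check-ins all map to None so they can be filtered later.
    """
    if not srch_ci or not srch_co:
        return None
    try:
        check_in = date.fromisoformat(srch_ci)
        check_out = date.fromisoformat(srch_co)
    except ValueError:
        # e.g. malformed strings or impossible dates such as 2014-02-30
        return None
    nights = (check_out - check_in).days
    return nights if nights >= 0 else None


if __name__ == "__main__":
    print(stay_length_days("2014-08-27", "2014-08-31"))  # four nights
```

At the scale of the full training set the same computation would be done inside Spark rather than row by row in Python, for example with `pyspark.sql.functions.datediff` after converting both columns to dates.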

The test set contains more than 250k user log entries.

Because of the size of the training data, Python libraries such as pandas and scikit-learn, which work in memory on a single machine, would fail, so here I use PySpark, the Python API for Apache Spark. I also use Amazon's distributed computing service, Elastic MapReduce (EMR), to speed up the analysis and the training of machine learning models.

hamedrazavi/expedia_hotel_recommendations

Folders and files

Latest commit

History

Repository files navigation

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages