- The purpose of this project is to analyze Amazon reviews written by members of the paid Amazon Vine program, a service that allows manufacturers and publishers to receive reviews of their products, and to determine whether Vine reviews are biased compared with non-Vine reviews.
- We use PySpark to perform the ETL process: extract the dataset, transform the data, connect to an AWS RDS instance, and load the transformed data into a PostgreSQL database managed through pgAdmin. We then calculate several metrics on the loaded data.
- We focused on US reviews in the software category.
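
The ETL steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual notebook: the function names, the `source_path` argument, the `vine_table` table name and the credentials are hypothetical placeholders, and the column names (`review_id`, `star_rating`, `helpful_votes`, `total_votes`, `vine`, `verified_purchase`) are assumed to match the Amazon review TSV layout. It also assumes the PostgreSQL JDBC driver is available to Spark.

```python
def build_jdbc_url(host, port, database):
    """Build a PostgreSQL JDBC URL for the RDS instance (values are placeholders)."""
    return f"jdbc:postgresql://{host}:{port}/{database}"


def run_etl(source_path, jdbc_url, user, password, table="vine_table"):
    """Extract a review TSV, keep the columns needed for the Vine analysis, load to PostgreSQL."""
    # Imported here so the helper above can be used without a Spark installation.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("VineReviewETL").getOrCreate()

    # Extract: the file is assumed to be tab-separated with a header row.
    reviews = spark.read.option("header", True).option("sep", "\t").csv(source_path)

    # Transform: select the assumed columns and cast the numeric ones.
    vine_df = (
        reviews.select(
            "review_id",
            col("star_rating").cast("int").alias("star_rating"),
            col("helpful_votes").cast("int").alias("helpful_votes"),
            col("total_votes").cast("int").alias("total_votes"),
            "vine",
            "verified_purchase",
        )
        .dropna(subset=["review_id", "star_rating"])
    )

    # Load: append the rows to the target table on the RDS instance.
    vine_df.write.jdbc(
        url=jdbc_url,
        table=table,
        mode="append",
        properties={"user": user, "password": password, "driver": "org.postgresql.Driver"},
    )
    return vine_df
```

The table can then be inspected in pgAdmin once the load completes.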
- Vine reviews
- Non-Vine reviews
- Vine reviews
- Non-Vine reviews
- Vine reviews
- Non-Vine reviews
- For the software category, 41% of the reviews in the Vine program were 5-star reviews, whereas only 29% of the non-Vine reviews were. This suggests a positivity bias in reviews written through the Vine program.
- We could perform additional analysis on the statistical distribution (mean, median and mode) of the star ratings for the Vine and non-Vine reviews.
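
As a rough sketch of how the 5-star share and the suggested distribution statistics could be computed for each group, the plain-Python example below may help. The function names are hypothetical, and the two rating lists are made-up sample values for illustration only, not figures from the dataset.

```python
from statistics import mean, median, multimode


def five_star_share(ratings):
    """Return the percentage of ratings equal to 5 (0.0 for an empty list)."""
    if not ratings:
        return 0.0
    return 100.0 * sum(1 for r in ratings if r == 5) / len(ratings)


def summarize(ratings):
    """Summarize a non-empty list of star ratings."""
    return {
        "count": len(ratings),
        "five_star_pct": five_star_share(ratings),
        "mean": mean(ratings),
        "median": median(ratings),
        "mode": multimode(ratings),  # list, since several ratings can tie
    }


# Made-up sample ratings, for illustration only.
vine_ratings = [5, 5, 4, 3, 5, 4]
non_vine_ratings = [5, 1, 2, 4, 5, 3, 1, 4]

for label, ratings in (("Vine", vine_ratings), ("Non-Vine", non_vine_ratings)):
    print(label, summarize(ratings))
```

Comparing the mean and median alongside the 5-star percentage would show whether the difference between groups comes only from the top rating or from a shift across the whole distribution.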