Home

Welcome to the cs108-project wiki!

Database Development

Have to build the database by scraping data from the web.

Requirements

Selenium
Scrapy
Chromedriver
Numpy
Pandas

Resources

Scrapy
Selenium only
Resource given

[Scraping IMDB Reviews in Python using Selenium]

Problems Faced

learning the classes and functions of Selenium
- used the documentation on their website
- used the references they gave
- used Copilot
we were trying to click the button before it had loaded
- used the WebDriverWait class, which has an until function that waits until the element has loaded, i.e. it's clickable
- if the button/element still isn't clickable, I used location_once_scrolled_into_view, which is a property of the WebElement class, so that the window scrolls until the button is in view and clickable
💡 a `WebElement` isn't clickable if it's not in view
needed to scrape the corresponding ratings from other sites, given that I had a list of all the movies scraped from IMDb
- used Google search
- the search showed different kinds of results for different movies
selenium.webdriver is slow, and using headless mode wasn't working
- added an Options (which is a class) argument
- the speed didn't change, so I left it as it is
the scraping code takes 1 hour to run
- it's normal, so I left it as it is
couldn't gather the Google data for 6 movies
- these movies:
  - The Hunt
  - The Kid
  - Rocky
  - To Be or Not to Be
  - Psycho
  - Untouchable
- made another script to gather the data for these movies separately

Website for Searching Movies

A website with a search field and a submit button that shows movie information

Have to use a backend, but I don't fully understand how it works yet
Used Node.js to create a server
Used Express, which is a Node.js module
- this handles the GET and POST requests we send from the client (browser) to the server (my laptop)
- it uses the express() function, which basically creates an app
used body-parser to parse the body of the request I get from the client
used the file system (fs) module to access the IMDb data, stored in imdb.json
```
const fs = require('fs');
fs.readFile('imdb.json', 'utf8', (err, data) => { /* code */ });
```
as of now, my server listens for requests on port 3000; this is also done by Express
```
app.listen(3000, () => { /* code */ });
```
using EJS (Embedded JavaScript) to display the webpage dynamically
- for which I used Express's renderer, with the view engine set to ‘ejs’
```
app.set('view engine', 'ejs')
```
- whenever I get a GET request to the ‘/’ route, I render the webpage (which should be kept in views/) using
```
// inside the route handler; res is the response object Express passes in
res.render('index', { movies: movies });
```
parsed the data using JSON.parse()

Problems Faced

stored the directors and cast in arrays. when stringified, they turned into strings, and I couldn't convert them back into arrays.
for parsing the data, JSON.parse() requires strings to be enclosed in double quotes (“”), but the data used single quotes (‘’), so I used string manipulation to convert every ‘ to “, assuming there are no other ‘ characters except those enclosing the data.
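
The quote conversion described above could look roughly like this. It is only a minimal sketch, not the project's actual code: the helper name `parseListString` and the sample input are hypothetical, and it relies on the same assumption that no apostrophes occur inside the values (a name such as O'Brien would break it).

```javascript
// Hypothetical helper: turns a single-quoted list string into a real array.
// Assumes the only ' characters are the ones enclosing each value.
function parseListString(text) {
  const jsonText = text.replace(/'/g, '"'); // JSON only accepts double quotes
  return JSON.parse(jsonText);
}

// Made-up input in the stringified form
const sampleCast = parseListString("['Name One', 'Name Two']");
console.log(Array.isArray(sampleCast), sampleCast);
```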

Movie Recommendation

The user rates a few movies. Based on those ratings, we have to suggest 5 movies.

Used a different route /ratings for this.
Stored rated movies & corresponding ratings in two arrays (ratedMovies and ratings )
The rating should be between 0 and 10 (inclusive)
Displayed that info on website
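
A rough sketch of how the range check and the two arrays could fit together. The helper names `isValidRating` and `addRating` are hypothetical, and the sketch assumes ratings arrive as strings from a form field.

```javascript
const ratedMovies = [];
const ratings = [];

// Hypothetical helper: true only for a number from 0 to 10 inclusive.
function isValidRating(value) {
  // Number('') and Number(null) are 0, so reject empty input explicitly
  if (value === null || value === undefined || String(value).trim() === '') return false;
  const rating = Number(value);
  return Number.isFinite(rating) && rating >= 0 && rating <= 10;
}

// Stores the movie and its rating at the same index of the two arrays.
// Returns false (and stores nothing) when the rating is out of range.
function addRating(movie, value) {
  if (!isValidRating(value)) return false;
  ratedMovies.push(movie);
  ratings.push(Number(value));
  return true;
}
```

Keeping the check in one place means the /ratings route only has to call `addRating` and show an error when it returns false.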

Algorithm for Suggestions

Among the rated movies, calculated the average rating of each genre (0 if it isn’t rated)
For example, if 3 Idiots (Comedy, Drama) was rated 1 and Oppenheimer (Drama, Thriller) was rated 2, the genre ratings would be { Comedy: 1, Drama: 1.5, Thriller: 2 }
Then I’d assign a value to each movie, which is the sum of the weighted average ratings of that movie’s genres.
The weight is directly proportional to the frequency of that genre in the rated movies.
If a particular genre isn’t rated, its rating is taken as 0.
Then I’d give the top 5 movies according to this value.
All this is displayed using EJS of course.
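
The steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the function names, the `movies` object shape (title mapped to a list of genres), the choice of weight as a genre's share of all genre occurrences among the rated movies, and skipping already-rated movies are all assumptions.

```javascript
// Average rating and weight per genre among the rated movies.
// `movies` maps a title to its list of genres (hypothetical data shape).
function genreStats(movies, ratedMovies, ratings) {
  const sum = {};
  const count = {};
  ratedMovies.forEach((title, i) => {
    for (const genre of movies[title]) {
      sum[genre] = (sum[genre] || 0) + ratings[i];
      count[genre] = (count[genre] || 0) + 1;
    }
  });
  const total = Object.values(count).reduce((a, b) => a + b, 0);
  const average = {};
  const weight = {};
  for (const genre of Object.keys(count)) {
    average[genre] = sum[genre] / count[genre];
    weight[genre] = count[genre] / total; // assumption: weight = share of occurrences
  }
  return { average, weight };
}

// Scores every movie and returns the titles with the highest values.
function suggest(movies, ratedMovies, ratings, howMany = 5) {
  const { average, weight } = genreStats(movies, ratedMovies, ratings);
  return Object.keys(movies)
    .filter((title) => !ratedMovies.includes(title)) // assumption: skip rated movies
    .map((title) => ({
      title,
      // genres that were never rated contribute 0
      score: movies[title].reduce((s, g) => s + (weight[g] || 0) * (average[g] || 0), 0),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, howMany)
    .map((entry) => entry.title);
}
```

With the example ratings from above (3 Idiots rated 1, Oppenheimer rated 2), `genreStats` gives Drama a weight of 0.5 and Comedy and Thriller 0.25 each, so an unrated Drama/Thriller movie would score 0.5 × 1.5 + 0.25 × 2 = 1.25.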

Resources

Help from [Gemini] & [ChatGPT] to get an idea of the algorithms used in content-based and collaborative filtering, which are used to suggest movies based on the genres & ratings provided.

Problems Faced

Didn’t use weights at first, which produced inconsistent suggestions.

Customisation

Registration System

The user should log in / sign up to rate

Rating

😩 Scrape user reviews from Rotten Tomatoes and Metacritic. Scraping again!! Couldn’t you have said this before?

Recommender

SpellCheck

Enhanced UI

Created by Awez
