Awez edited this page Apr 17, 2024 · 4 revisions

Welcome to the cs108-project wiki!

Database Development

The movie database has to be built by scraping data from online sources.

Requirements

  • Selenium
  • Scrapy
  • ChromeDriver
  • NumPy
  • Pandas

Resources

Problems Faced

  • learning the classes and functions of Selenium

    • used the documentation on their website
    • used the references they gave
    • used Copilot
  • the script was trying to click the button before it had loaded

    • used the WebDriverWait class, whose until method waits until a condition is met, here that the element has loaded and is clickable
    • if the button/element still wasn’t clickable, I used location_once_scrolled_into_view, a property of the WebElement class, so that the window scrolls until the element is in view and can be clicked
    💡 a `WebElement` isn’t clickable if it’s not in view
  • needed to scrape the corresponding ratings from other sites, given the list of all movies I had scraped from IMDb

    • used Google search
    • the search showed different kinds of results for different movies
  • selenium.webdriver is slow, and using headless mode wasn’t working

    • added an Options (which is a class) argument
    • the speed didn’t change, so I left it as it is
  • the scraping code takes 1 hour to run

    • this is normal, so I left it as it is
  • couldn’t gather the “Google” data for 6 movies

    • these movies:
      • The Hunt
      • The Kid
      • Rocky
      • To Be or Not to Be
      • Psycho
      • Untouchable
    • made another script to gather the data for these movies separately

Website for Searching Movies

A website with a search and submit button that shows movie information
  • A backend is needed, but I don’t yet understand how it works

  • Used Node.js to create a server

  • Used Express, which is a Node.js module

    • it handles the GET and POST requests sent from the client (browser) to the server (my laptop)
    • it provides an express() function, which creates an app
  • used body-parser to parse the body of the request received from the client

  • used the file system (fs) module to access the IMDb data, stored in imdb.json

    fs.readFile('imdb.json', 'utf8', (err, data) => { /* code */ });
  • for now, the server listens for requests on port 3000; this is also done through Express

    app.listen(3000, () => { /* code */ });
  • used EJS (Embedded JavaScript) to display the webpage dynamically

    • for this I used Express’s renderer, with the view engine set to 'ejs'
    app.set('view engine', 'ejs');
    • whenever a GET request arrives at the '/' route, I render the webpage (the template should be kept in views/) using
    res.render('index', {movies: movies});
  • parsed the data using JSON.parse()
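
The lookup behind the search button could be sketched roughly as below. This is only an illustration: the helper name searchMovies and the title field are assumptions, since the actual shape of imdb.json isn't shown here, and the sample data is parsed from an inline string rather than read from the file.

```javascript
// Hypothetical helper: case-insensitive title search over the parsed data.
// Assumes each movie object has a string field called `title`.
function searchMovies(movies, query) {
  const needle = query.trim().toLowerCase();
  if (needle === '') {
    return []; // an empty search matches nothing
  }
  return movies.filter((movie) => movie.title.toLowerCase().includes(needle));
}

// Stand-in for the text that fs.readFile would hand to the callback.
const sampleJson = '[{"title": "Rocky"}, {"title": "Psycho"}]';
const sampleMovies = JSON.parse(sampleJson);

const matches = searchMovies(sampleMovies, 'rock');
```

In a route handler, the array returned by searchMovies would be the value passed to the template as movies.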

Problems Faced

  • stored the directors and cast in arrays. When stringified, each array turned into a string, and I couldn’t convert it back into an array.
  • JSON.parse() requires strings to be enclosed in double quotes (“ ”), but the data used single quotes (‘ ’), so I used string manipulation to convert every ‘ to “, assuming there are no other ‘ except those enclosing the data.
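
A minimal sketch of that quote conversion, assuming the stored value looks like a single-quoted array literal; the function name and the sample values are made up for illustration. It relies on the same assumption as above: no apostrophes inside the values themselves.

```javascript
// Turn a single-quoted array string into a real array.
// Assumption: the only ' characters are the ones enclosing each value,
// so a name containing an apostrophe would break this.
function parseSingleQuotedArray(text) {
  const jsonText = text.replace(/'/g, '"'); // swap every ' for "
  return JSON.parse(jsonText);
}

const directors = parseSingleQuotedArray("['Director One', 'Director Two']");
```

If the data may contain apostrophes, storing it with JSON.stringify() in the first place avoids the conversion altogether.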

Movie Recommendation

The user rates a few movies, and based on those ratings we have to suggest 5 movies.
  • Used a different route /ratings for this.
  • Stored rated movies & corresponding ratings in two arrays (ratedMovies and ratings )
  • Each rating should be between 0 and 10 (inclusive)
  • Displayed that information on the website
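
The two parallel arrays and the range check could look something like the sketch below. The helper name addRating is hypothetical, and treating form input as strings is an assumption about how the values reach the server.

```javascript
// Parallel arrays, as described above: ratings[i] belongs to ratedMovies[i].
const ratedMovies = [];
const ratings = [];

// Hypothetical helper: store a rating only if it is a number in [0, 10].
function addRating(title, value) {
  // Form fields arrive as strings; reject blanks before converting.
  if (typeof value === 'string' && value.trim() === '') {
    return false;
  }
  const rating = Number(value);
  if (!Number.isFinite(rating) || rating < 0 || rating > 10) {
    return false;
  }
  ratedMovies.push(title);
  ratings.push(rating);
  return true;
}
```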

Algorithm for Suggestions

  • Among the rated movies, calculated the average rating of each genre (0 if the genre isn’t rated)
  • E.g. if 3 Idiots (Comedy, Drama) was rated 1 and Oppenheimer (Drama, Thriller) was rated 2, the genre ratings would be { Comedy: 1, Drama: 1.5, Thriller: 2 }
  • Then I assign a value to each movie, which is the weighted sum of the average ratings of that movie’s genres.
  • The weight is directly proportional to the frequency of that genre in the rated movies.
  • If a particular genre isn’t rated, its rating is taken as 0.
  • Then I give the top 5 movies according to this value.
  • All of this is displayed using EJS, of course.
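
The steps above can be sketched as follows. This is an illustration under assumptions, not the project’s actual code: the function names, the { title, genres } shape, and the exact weight (the share of rated movies that carry a genre) are all guesses consistent with “directly proportional to the frequency”.

```javascript
// Average rating and weight per genre, from the parallel arrays.
// Assumed shape: movie = { title: string, genres: string[] }.
function genreStats(ratedMovies, ratings) {
  const sums = {};
  const counts = {};
  ratedMovies.forEach((movie, i) => {
    for (const genre of movie.genres) {
      sums[genre] = (sums[genre] || 0) + ratings[i];
      counts[genre] = (counts[genre] || 0) + 1;
    }
  });
  const stats = {};
  for (const genre of Object.keys(sums)) {
    stats[genre] = {
      average: sums[genre] / counts[genre],
      // Assumption: weight = fraction of rated movies having this genre.
      weight: counts[genre] / ratedMovies.length,
    };
  }
  return stats;
}

// Value of one movie: sum of weight * average over its genres.
function scoreMovie(movie, stats) {
  return movie.genres.reduce((total, genre) => {
    const s = stats[genre];
    return total + (s ? s.weight * s.average : 0); // unrated genre counts as 0
  }, 0);
}

// Top `count` movies by value, highest first.
function suggest(allMovies, ratedMovies, ratings, count = 5) {
  const stats = genreStats(ratedMovies, ratings);
  return allMovies
    .map((movie) => ({ title: movie.title, score: scoreMovie(movie, stats) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, count);
}
```

With the example above (3 Idiots rated 1, Oppenheimer rated 2), genreStats gives averages of 1, 1.5 and 2 for Comedy, Drama and Thriller. Whether already-rated movies should be left out of the suggestions is not covered by this sketch.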

Resources

  • Help from [Gemini] & [ChatGPT] to get an idea of the algorithms used in content-based and collaborative filtering, which suggest movies based on the genres and ratings provided.

Problems Faced

  • At first I didn’t use weights, which produced inconsistent suggestions.

Customisation

Registration System

The user should log in or sign up in order to rate movies.

Rating

😩 Scrape user reviews from Rotten Tomatoes and Metacritic. Scraping again!! Couldn’t you have said this before?

Recommender

SpellCheck

Enhanced UI