Home
Welcome to the cs108-project wiki!
- Have to make a database by scraping data from online sources.
  - Selenium
  - Scrapy
  - Chromedriver
  - NumPy
  - Pandas
- Scrapy
- Selenium only
- Resource given
- Learning the classes and functions of Selenium
  - used the documentation on their website
  - used the references they gave
  - used Copilot
- We were trying to click the button before it had loaded
  - used the `WebDriverWait` class, whose `until` method waits until the element has loaded, i.e. until it's clickable
  - if the button/element still wasn't clickable, I used `location_once_scrolled_into_view`, a property of the `WebElement` class, so that the window scrolls the element into view and it becomes clickable
- Needed to scrape the corresponding ratings from other sites, given a list of all the movies scraped from IMDb
  - used Google search
  - the search showed different kinds of results for different movies
- `selenium.webdriver` is slow; tried headless mode, but it wasn't working
  - added an `Options` argument (`Options` is a class)
  - the speed didn't change, so I left it as it is
- The scraping code takes 1 hour to run
  - that's normal, so I left it as it is
- Couldn't gather the Google data for 6 movies
  - these movies:
    - The Hunt
    - The Kid
    - Rocky
    - To Be or Not to Be
    - Psycho
    - Untouchable
  - made another script to gather the data for these movies separately
Website with a search box and a submit button that shows movie information
- Have to use a backend, but I don't understand how it works
- Used Node.js to create a server
- Used Express, a web framework for Node.js (installed as an npm package)
  - it handles the GET and POST requests we send from the client (browser) to the server (my laptop)
  - calling the `express()` function creates an app
- Used body-parser to parse the body of the requests I get from the client
- Used the file system module (`fs`) to access the IMDb data stored in `imdb.json`
  `fs.readFile('imdb.json', 'utf8', (err, data) => { /* code */ });`
- For now, my server listens for requests on port 3000; this is also done by Express
  `app.listen(3000, () => { /* code */ });`
- Using EJS (Embedded JavaScript templates) to display the webpage dynamically
  - for this I used Express's view rendering, with the view engine set to `'ejs'`
    `app.set('view engine', 'ejs');`
  - whenever I get a GET request to the `/` route, I render the page (which should be kept in `views/`) with
    `res.render('index', { movies: movies });`
- Parsed the data using `JSON.parse()`
  - the directors and cast were stored as arrays, but when stringified they turned into strings, and I couldn't convert them back into arrays
  - `JSON.parse()` requires strings to be enclosed in `"`, but these were enclosed in `'`, so I used string manipulation to convert every `'` to `"`, assuming there are no other `'` characters except those enclosing the data
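A minimal sketch of the loading and quote-fixing steps above, not the project's actual code. It assumes `imdb.json` holds an array of movie objects whose `directors` and `cast` fields are single-quoted list strings such as `"['Aamir Khan', 'R. Madhavan']"`; those field names and the file layout are assumptions. `parseListField` and `loadMovies` are hypothetical helper names. Like the approach described above, the sketch assumes that every `'` encloses a value, so a name containing an apostrophe would break it.

```javascript
const fs = require('fs');

// Hypothetical helper: turns a single-quoted list string such as
// "['Aamir Khan', 'R. Madhavan']" into a real array.
// Assumes every ' encloses a value, so a name containing an
// apostrophe (e.g. O'Brien) would make JSON.parse throw.
function parseListField(value) {
  if (Array.isArray(value)) return value; // already an array, nothing to do
  return JSON.parse(value.replace(/'/g, '"'));
}

// Reads the movie file asynchronously and passes back the movies with their
// list fields (assumed here to be `directors` and `cast`) converted to arrays.
function loadMovies(path, callback) {
  fs.readFile(path, 'utf8', (err, data) => {
    if (err) return callback(err);
    let movies;
    try {
      movies = JSON.parse(data).map((movie) => ({
        ...movie,
        directors: parseListField(movie.directors),
        cast: parseListField(movie.cast),
      }));
    } catch (parseErr) {
      return callback(parseErr); // malformed JSON or an unexpected field value
    }
    callback(null, movies);
  });
}
```

In the server, the bare `fs.readFile` call could then become `loadMovies('imdb.json', (err, movies) => { ... })`, so the template receives real arrays instead of strings.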
The user rates a few movies; based on those ratings, we have to suggest 5 movies
- Used a separate route, `/ratings`, for this
- Stored the rated movies and their corresponding ratings in two arrays (`ratedMovies` and `ratings`)
- A rating must be between 0 and 10 (inclusive)
- Displayed that information on the website
- Among the rated movies, calculated the average rating of each genre (0 if it isn't rated)
- For example, if 3 Idiots (Comedy, Drama) was rated 1 and Oppenheimer (Drama, Thriller) was rated 2, the genre ratings would be { Comedy: 1, Drama: 1.5, Thriller: 2 }
- Then I assign each movie a value: the sum of the weighted average ratings of that movie's genres
- The weight of a genre is directly proportional to how often that genre appears among the rated movies
- If a genre isn't rated, its rating is taken as 0
- Finally, I suggest the top 5 movies by this value
- All of this is displayed using EJS, of course
- Got help from Gemini and ChatGPT to understand the algorithms used in content-based and collaborative filtering, which suggest movies based on genres and ratings
- Initially I didn't use weights, which produced inconsistent suggestions
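The genre-weighted scoring described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code. Each movie is assumed to be an object with a `title` and a `genres` array, and `ratedMovies` and `ratings` are the two parallel arrays mentioned above. A genre's weight is taken as its count among the rated movies divided by the number of rated movies. Any constant factor would give the same ranking, because the weight only needs to be proportional to the frequency. Skipping movies the user has already rated is an extra assumption. `genreStats` and `suggestMovies` are hypothetical names.

```javascript
// Assumed data shape: { title: string, genres: string[] }.
// ratedMovies[i] was given ratings[i] by the user.

// Average rating per genre over the rated movies, plus how often each genre appears.
function genreStats(ratedMovies, ratings) {
  const sums = {};
  const counts = {};
  ratedMovies.forEach((movie, i) => {
    const rating = ratings[i];
    // Ratings must lie in 0-10 inclusive; the negated check also rejects NaN.
    if (!(rating >= 0 && rating <= 10)) {
      throw new RangeError(`Rating for ${movie.title} must be between 0 and 10`);
    }
    for (const genre of movie.genres) {
      sums[genre] = (sums[genre] || 0) + rating;
      counts[genre] = (counts[genre] || 0) + 1;
    }
  });
  const averages = {};
  for (const genre of Object.keys(sums)) {
    averages[genre] = sums[genre] / counts[genre];
  }
  return { averages, counts };
}

// Score = sum over the movie's genres of weight(genre) * averageRating(genre).
// Unrated genres contribute 0. Assumed weight: count among rated movies / number of rated movies.
function suggestMovies(allMovies, ratedMovies, ratings, howMany = 5) {
  const { averages, counts } = genreStats(ratedMovies, ratings);
  const ratedTitles = new Set(ratedMovies.map((m) => m.title));
  const n = ratedMovies.length || 1; // avoid dividing by zero when nothing is rated

  return allMovies
    .filter((movie) => !ratedTitles.has(movie.title)) // assumption: don't re-suggest rated movies
    .map((movie) => {
      let score = 0;
      for (const genre of movie.genres) {
        const weight = (counts[genre] || 0) / n;
        score += weight * (averages[genre] || 0);
      }
      return { title: movie.title, score };
    })
    .sort((a, b) => b.score - a.score) // highest score first
    .slice(0, howMany);
}
```

With the example from above (3 Idiots rated 1, Oppenheimer rated 2), `genreStats` gives averages of `{ Comedy: 1, Drama: 1.5, Thriller: 2 }`. Drama gets weight 1 because it appears in both rated movies, while Comedy and Thriller each get 0.5. A Drama/Thriller movie would therefore score 1 × 1.5 + 0.5 × 2 = 2.5.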
The user should log in / sign up to rate 😩
Scrape user reviews from Rotten Tomatoes and Metacritic. Scraping again!! Couldn't you have said this before?
Created by Awez