
Updated scraper for URL changes. #4

Closed
wants to merge 4 commits into from

MutazAshhab
Contributor

Scraper now captures the address and council reference correctly.

Also added support for using a proxy.

Disabled SSL certificate verification, as the previous run of this scraper failed with an SSL certificate check error.
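Turning verification off unconditionally drops certificate checking on every run. A minimal sketch of keeping the workaround opt-in, assuming a hypothetical environment variable `SCRAPER_SKIP_SSL_VERIFY` and a hypothetical helper `ssl_verify_mode`; neither is part of this PR. The returned OpenSSL constant would be handed to Mechanize through its `verify_mode=` setter.

```ruby
require 'openssl'

# Hypothetical helper: pick the certificate verification mode from an
# environment flag instead of disabling verification for every run.
# SCRAPER_SKIP_SSL_VERIFY is an assumed variable name, not a morph.io setting.
# The env parameter defaults to ENV; any Hash works, which keeps it testable.
def ssl_verify_mode(env = ENV)
  if env["SCRAPER_SKIP_SSL_VERIFY"] == "1"
    OpenSSL::SSL::VERIFY_NONE # opt-in workaround for a broken certificate chain
  else
    OpenSSL::SSL::VERIFY_PEER # default: verify the server certificate
  end
end
```

In the scraper this would read `agent.verify_mode = ssl_verify_mode` right after `agent = Mechanize.new`.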

Member

mlandauer left a comment

Thanks for working on this! It's not at all clear to me from the commit messages what you're changing and why. "URL changes" could mean anything, for example, and you're rewriting the code in a not-completely-obvious way while also adding functionality, which makes reviewing harder. It's great when each commit makes one atomic, self-contained change. Personally, I also put refactorings into their own commits so that anyone can easily verify them and more easily eyeball the real changes happening elsewhere.

scraper.rb Outdated
@@ -35,5 +34,5 @@

# puts record.inspect
ScraperWiki.save_sqlite(['council_reference'], record)
-puts "Saving " + council_ref
+puts "Saving " + council_ref.strip()
Member

If council_ref needs whitespace removed, why not do that before the record is saved to the database?
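A minimal sketch of what the reviewer is suggesting: strip the value once, when the record is built, so the row saved to SQLite and the log line both use the cleaned value. `build_record` is a hypothetical helper, and the field names other than `council_reference` are assumptions modelled on common PlanningAlerts scrapers.

```ruby
require 'date'

# Hypothetical helper: normalise whitespace while building the record,
# so nothing downstream (database row or log output) sees the raw value.
# Field names besides 'council_reference' are assumed for illustration.
def build_record(council_ref, address, info_url)
  {
    'council_reference' => council_ref.strip,
    'address'           => address.strip,
    'info_url'          => info_url,
    'date_scraped'      => Date.today.to_s
  }
end
```

The save and log lines would then become `ScraperWiki.save_sqlite(['council_reference'], record)` followed by `puts "Saving " + record['council_reference']`, with no second `strip()` needed.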

@@ -1,38 +1,67 @@
require 'scraperwiki'
require 'mechanize'

-starting_url = 'http://www.sorell.tas.gov.au/publications/currently-advertised-applications/'
+def scrape()
Member

You're mixing refactoring with changes that are not at all clear from the commit message, so this whole commit is difficult for me to review.

@@ -6,6 +6,13 @@ def scrape()

agent = Mechanize.new

if ENV["MORPH_AUSTRALIAN_PROXY"]
Member

This change (or something similar) is already being introduced by another PR, #3.
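The proxy hunk in this PR keys off `ENV["MORPH_AUSTRALIAN_PROXY"]`. A minimal sketch, assuming that variable holds a full proxy URL with credentials (for example `http://user:password@host:port`), of splitting it into the positional arguments that `Mechanize#set_proxy(address, port, user, password)` takes. `proxy_settings` is a hypothetical name, not code from this PR or from #3.

```ruby
require 'uri'

# Hypothetical helper: parse a proxy URL from the environment into
# [host, port, user, password] for Mechanize#set_proxy.
# Returns nil when no proxy is configured, so the caller can skip it.
# The env parameter defaults to ENV; any Hash works, which keeps it testable.
def proxy_settings(env = ENV)
  raw = env["MORPH_AUSTRALIAN_PROXY"]
  return nil if raw.nil? || raw.strip.empty?

  uri = URI.parse(raw.strip)
  [uri.host, uri.port, uri.user, uri.password]
end
```

In the scraper this would be used as `settings = proxy_settings` followed by `agent.set_proxy(*settings) if settings`, leaving direct connections untouched when the variable is unset.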

MutazAshhab closed this on Dec 9, 2021.