Updated scraper for URL changes. #4
Thanks for working on this! It's not at all clear to me from the commit messages what you're changing and why. "URL changes" could mean anything, for example, and you're rewriting the code in a not-completely-obvious way while adding functionality, which makes reviewing harder. It's great when a commit makes one atomic, self-contained change. Personally, I also separate refactoring into its own commits so that anyone can easily verify those and also more easily eyeball the real changes that are happening elsewhere.
scraper.rb
Outdated
@@ -35,5 +34,5 @@

# puts record.inspect
ScraperWiki.save_sqlite(['council_reference'], record)
puts "Saving " + council_ref
puts "Saving " + council_ref.strip()
If council_ref needs whitespace removed, why don't you do that before the data is saved to the database?
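To illustrate the suggestion, here is a minimal sketch of stripping whitespace before the record is saved rather than only when it is printed. `clean_record` is a hypothetical helper and the sample values are made up; neither comes from this PR, and the actual save call is only indicated in a comment.

```ruby
# Hypothetical helper (not part of this PR): returns a copy of the record
# with leading and trailing whitespace removed from every String value.
# Non-string values are passed through unchanged.
def clean_record(record)
  record.transform_values { |value| value.is_a?(String) ? value.strip : value }
end

# Made-up sample data, for illustration only.
record = {
  'council_reference' => "  DA 2016 / 123 \n",
  'address' => ' 1 Example St '
}

record = clean_record(record)

# In the scraper, the save would happen here, after cleaning, e.g.
#   ScraperWiki.save_sqlite(['council_reference'], record)
puts "Saving " + record['council_reference']
```

Cleaning the whole record in one place means the database and the log output both see the same value, so no separate `strip` is needed in the `puts` line.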
@@ -1,38 +1,67 @@
require 'scraperwiki'
require 'mechanize'

starting_url = 'http://www.sorell.tas.gov.au/publications/currently-advertised-applications/'
def scrape()
You're mixing up refactoring with some changes which are not at all clear from the commit message, so this whole commit is difficult for me to review.
@@ -6,6 +6,13 @@ def scrape()

agent = Mechanize.new

if ENV["MORPH_AUSTRALIAN_PROXY"]
This change (or something similar) is already being introduced by another PR #3
Scraper now captures address and council reference correctly.
Also added support for proxy use.
Disabled SSL check as the previous run of this scraper encountered an SSL check error.
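For readers unfamiliar with the proxy change, the following is a rough sketch of how an optional proxy setting could be read from the environment. It assumes `MORPH_AUSTRALIAN_PROXY` holds a full proxy URL (for example `http://user:password@host:port`); that format is an assumption, not something stated in this PR. `proxy_settings` is a hypothetical helper, and the sketch uses only the standard library, so it does not configure Mechanize itself.

```ruby
require 'uri'

# Hypothetical helper (not part of this PR): parses a proxy URL into its
# parts. Returns nil when the value is missing or blank, so the caller can
# simply skip proxy configuration.
def proxy_settings(value)
  return nil if value.nil? || value.strip.empty?

  uri = URI.parse(value.strip)
  { host: uri.host, port: uri.port, user: uri.user, password: uri.password }
end

# Assumption: the variable contains a URL such as
# http://user:password@proxy.example.com:8888
settings = proxy_settings(ENV['MORPH_AUSTRALIAN_PROXY'])

if settings
  # With Mechanize, these values could then be handed to the agent, e.g.
  #   agent.set_proxy(settings[:host], settings[:port],
  #                   settings[:user], settings[:password])
  puts "Using proxy #{settings[:host]}:#{settings[:port]}"
else
  puts 'No proxy configured'
end
```

Returning `nil` for an unset variable keeps the scraper working unchanged in environments where no proxy is configured.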