Improve location tag #1

Open
lefnire opened this issue Mar 26, 2016 · 1 comment


lefnire commented Mar 26, 2016

All job attributes are stored as Tags, including location. Currently location is saved simply as a string, eg "San Francisco, CA". There are two problems:

  1. These strings aren't normalized; they're scraped per job and can be anything. If you currently search "San..." you'll see "San Francisco, CA", "San Francisco", "SF/Bay Area", "SF, CA, USA", etc. So we need to normalize locations (so that "San Francisco, CA" can only ever be one location tag).
  2. There's no locational awareness for radius search preferences (no lat/lng information).

I've looked into the Google Maps geocoding API and other geocoding APIs; we can't use these due to terms issues (will explain if desired). Luckily we're using Postgres, which has a PostGIS Tiger Geocoder extension for just this purpose. I had a helluva time setting it up, and if I'm not mistaken it only applies to the USA? If I'm wrong, and we can set it up, we could store location tags as lat/lng tuples for (1) location normalization; (2) location radius scoring.
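For reference, once lat/lng is available from some source, radius scoring itself is cheap. A minimal sketch using the haversine formula; the `{lat, lng}` object shape and the function names are made up for illustration, not anything that exists in the codebase yet:

```javascript
// Mean Earth radius in kilometers (spherical approximation).
const EARTH_RADIUS_KM = 6371;

function toRadians(degrees) {
  return (degrees * Math.PI) / 180;
}

// Great-circle distance between two {lat, lng} points, in kilometers.
function distanceKm(a, b) {
  const dLat = toRadians(b.lat - a.lat);
  const dLng = toRadians(b.lng - a.lng);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(a.lat)) * Math.cos(toRadians(b.lat)) * Math.sin(dLng / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// True when a job's location falls inside the user's preferred radius.
function withinRadius(jobLocation, userLocation, radiusKm) {
  return distanceKm(jobLocation, userLocation) <= radiusKm;
}
```

If PostGIS does end up working, the same check would more likely live in the database; this is only the in-process equivalent.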

But let's punt Geocoder for later, and as a short-term solution simply dump all world cities into our database via the Adwords cities csv (creative commons). Then we'll prevent creating any new location tags, since they're all there. This will solve the normalization issue (not the radius issue).


Some technical notes for using adwords.csv:

We'll want to filter out too-small administrative divisions (see "Target Type"). I'm not sure which ones to use besides City, Country, Province, State (ideas?)
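A rough sketch of the parse / filter step, dependency-free. It assumes the csv has a header row with "Target Type" and "Canonical Name" columns (the second name is my assumption about the file) and that quoted fields never span multiple lines. `KEEP_TYPES` is just the list above and is still up for discussion:

```javascript
// Target types to keep; the list proposed in this issue, not a settled choice.
const KEEP_TYPES = new Set(['City', 'Country', 'Province', 'State']);

// Split one CSV line into fields, honoring double-quoted fields that contain commas.
function parseCsvLine(line) {
  const fields = [];
  let field = '';
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === '"' && line[i + 1] === '"') {
        field += '"'; // doubled quote inside a quoted field
        i++;
      } else if (ch === '"') {
        inQuotes = false;
      } else {
        field += ch;
      }
    } else if (ch === '"') {
      inQuotes = true;
    } else if (ch === ',') {
      fields.push(field);
      field = '';
    } else {
      field += ch;
    }
  }
  fields.push(field);
  return fields;
}

// Return [{text, type}] for every row whose Target Type is in KEEP_TYPES.
function filterLocations(csvText) {
  const lines = csvText.split(/\r?\n/).filter((l) => l.trim() !== '');
  const header = parseCsvLine(lines[0]);
  const nameIdx = header.indexOf('Canonical Name'); // assumed column name
  const typeIdx = header.indexOf('Target Type');
  if (nameIdx === -1 || typeIdx === -1) throw new Error('unexpected CSV header');
  return lines
    .slice(1)
    .map(parseCsvLine)
    .filter((row) => KEEP_TYPES.has(row[typeIdx]))
    .map((row) => ({ text: row[nameIdx], type: row[typeIdx] }));
}
```

Usage would be something like `filterLocations(fs.readFileSync('adwords.csv', 'utf8'))`, with the result then uploaded to the database.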

Process: (1) parse / filter the csv (see above); (2) upload to the database (returning values); (3) store the results (along with id) in locations.json; (4) copy/paste said file to the client, so it can be used by both client & server. Reasons for this procedure:

  • Client: the file is huge, so instead of require()ing on client (isomorphic, which will dramatically increase client's bundle.js) we can add locations.json to client/www which'll be picked up & cached by Cloudfront for much faster delivery. Additionally, it'll only be fetch()d when needed (SeedTags & CreateJob)
  • Server: doing a tag.text LIKE location.text every time we want to create a new job & pair its location will overload the database. So instead, keep locs = require('locations.json') handy to perform a similarity comparison (anyone have experience here?). This won't be an issue for custom-created jobs, since the client will be selecting from auto-complete options. But for scraped jobs, locations will come in willy-nilly and we'll want to find their closest matches.
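One possible starting point for the server-side similarity comparison: normalize both strings, then pick the entry with the smallest Levenshtein edit distance, rejecting anything below a threshold. This is a sketch, not a recommendation; the function names, the `{id, text}` entry shape, and the `0.6` cutoff are all assumptions to be tuned:

```javascript
// Lowercase and collapse anything that isn't a letter or digit into single spaces.
function normalize(text) {
  return text.toLowerCase().replace(/[^\p{L}\p{N}]+/gu, ' ').trim();
}

// Classic Levenshtein edit distance, keeping only two rows of the DP table.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]; 1 means identical after normalization.
function similarity(a, b) {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - levenshtein(a, b) / longest;
}

// Best-matching entry of locs for a scraped string, or null if nothing is close enough.
function closestLocation(scraped, locs, minScore = 0.6) {
  const needle = normalize(scraped);
  let best = null;
  let bestScore = 0;
  for (const loc of locs) {
    const score = similarity(needle, normalize(loc.text));
    if (score > bestScore) {
      best = loc;
      bestScore = score;
    }
  }
  return bestScore >= minScore ? best : null;
}
```

With `locs = require('locations.json')`, a scraped "San Francisco" would resolve to the "San Francisco, CA" entry. Edit distance won't catch abbreviations like "SF/Bay Area", so those would probably need a small hand-written alias table on top. It's also a linear scan per scraped job, which may need caching or an index if the list is as large as expected.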

lefnire commented Mar 28, 2016
