- Uses expressjs as a server
- Utilized node streams to import data into in-memory
- Implemented a suggestion API endpoint as per the requirements
- Implement caching on the suggestion API endpoint via redis on suggestions API endpoint
- Implemented a stream API endpoint which utilizes node streams to read file and return desired suggestions. This was more of a intellectual curiosity.
- Setup CI/CD pipeline via CircleCI and Heroku
We implemented caching to allow faster response time and would speed up request. Other strategies we can use here are:
- Implement rate limiting using load balancers, nginx or use bottleneck or express-rate-limit to rate limit the api
- Enforce a minimum length for the query string, i.e. 3 chars
City data is imported in the importData.js
file. We utilize Node Stream to pipe individual lines of the file and process each one individual.
Currently we are simply storing the cities in-memory. We can easily add a pipe to store the cities in a Database since since PSQL copy command supports streaming binary data directly into the table.
[GET] /suggestions
: Implements the standard suggestions endpoint without any streaming as per the minimum requirements
[GET] /stream
: Implements the suggestions endpoint with streaming. Streaming directly from the tsv file. As we process the lines we are formatting, searching and outputting the suggestions:- Create a readable stream from tsv file
- Pipe into city population formatter
- Pipe into city coordinate formatter
- Pipe into city state formatter
- Pipe into filter by city based on config
- Pipe into search by search term
- Pipe into add score to city
- Pipe into city to json formatter
- Pipe into suggestions array
The scoring algorithm is comprised of two parts:
-
The
search_term
score Thesearch_term
score is a float between 0 and 1 which represents the confidence level of the match. It is calculated by dividing the length of thesearch_term
by the length of the city's name.
search_term.length / city.name.length
If the lengths match it would be an full match and yield a value of 1. If it is a partial match then the length ofsearch_term
will be inferior to the length of city's name and will yield a fractional value less then 1. -
The
distance
score Thedistance
score is a float between 0 and 1 which represents the closeness of two coordinates, where 1 implies both coordinate are at the same exact point and 0 mean they are the farthest apart. It is calculated by taking the difference between the biggest possible distance and the distance between both point and dividing that by the biggest possible distance between 2 coordinates on earth.EARTH_CIRCUMFERENCE = 40075.0 // in kilometers HALF_EARTH_CIRCUMFERENCE = EARTH_CIRCUMFERENCE / 2.0 // biggest ditance between 2 coordinates on earth distanceInKm = distanceBetweenCoordinates(coordinate_a, coordinate_b) ((HALF_EARTH_CIRCUMFERENCE - distanceInKm) / HALF_EARTH_CIRCUMFERENCE)
Once we retrieve both scores we weight them based on the configuration value config.suggestionConfig.coordinateScoreWeight
. If we are not taking into account distance
score (since query does not contain latitude and longitude) we would simply use 1 as the weight score for search_term
score
final_score = (search_term_score * searc_term_score_weight) + (distance_score * distance_score_weight)
├── Makefile Usefull commands
├── README.md This file
├── app.js Express app initialization and setup
├── config.js Environment configuration
├── constants.js Envinroment constant use throughout
├── data
│ ├── README.md
│ ├── admin_1_code.js Province and state data
│ └── cities_canada-usa.tsv City Data
├── domain
│ ├── suggestor.helper.js Scoring and match helper functions
│ └── suggestor.js Iterates through suggestions, filter, searching and scoring them
├── lib
│ ├── configureRedis.js Redis configuration file
│ └── importData.js Imports data in-memory utilising node streams
├── package-lock.json
├── package.json NPM configuration
├── routes
│ ├── index.js Loads all API endpoints
│ ├── routes.helper.js Helper method used in mutiple API endpoints
│ ├── stream.js Implements stream API endpoint
│ └── suggestions.js Implements suggestions API endpoint
└── test
├── domain
│ └── suggestor.helper.js Test helper function located in domain > suggestor.helper.js
└── suggestions.js Test suggestions API endpoint
The application is deployed on Heroku at https://geosuggest.herokuapp.com and it tested via CircleCI. I setup heroku to auto deploy when the master branch on github pass circle ci's continuous integration.
- Add fuzzy search to match ascii name of the city
- Search also through city's altername names (i.e.
alt_name
field in data sheet) - Build a frequency map of search term to determine search patterns
- Test the sorting
- https://www.freecodecamp.org/news/node-js-streams-everything-you-need-to-know-c9141306be93/
- https://stackoverflow.com/questions/990904/remove-accents-diacritics-in-a-string-in-javascript
Design an API endpoint that provides auto-complete suggestions for large cities. The suggestions should be restricted to cities in the USA and Canada with a population above 5000 people.
- the endpoint is exposed at
/suggestions
- the partial (or complete) search term is passed as a querystring parameter
q
- the caller's location can optionally be supplied via querystring parameters
latitude
andlongitude
to help improve relative scores - the endpoint returns a JSON response with an array of scored suggested matches
- the suggestions are sorted by descending score
- each suggestion has a score between 0 and 1 (inclusive) indicating confidence in the suggestion (1 is most confident)
- each suggestion has a name which can be used to disambiguate between similarly named locations
- each suggestion has a latitude and longitude
- all functional tests should pass (additional tests may be implemented as necessary).
- the final application should be deployed to Heroku.
- feel free to add more features if you like!
These responses are meant to provide guidance. The exact values can vary based on the data source and scoring algorithm
Near match
GET /suggestions?q=Londo&latitude=43.70011&longitude=-79.4163
{
"suggestions": [
{
"name": "London, ON, Canada",
"latitude": "42.98339",
"longitude": "-81.23304",
"score": 0.9
},
{
"name": "London, OH, USA",
"latitude": "39.88645",
"longitude": "-83.44825",
"score": 0.5
},
{
"name": "London, KY, USA",
"latitude": "37.12898",
"longitude": "-84.08326",
"score": 0.5
},
{
"name": "Londontowne, MD, USA",
"latitude": "38.93345",
"longitude": "-76.54941",
"score": 0.3
}
]
}
No match
GET /suggestions?q=SomeRandomCityInTheMiddleOfNowhere
{
"suggestions": []
}
- All code should be written in Javascript
- Mitigations to handle high levels of traffic should be implemented
- Challenge is submitted as pull request against this repo (fork it and create a pull request).
- Documentation and maintainability is a plus
- Geonames provides city lists Canada and the USA http://download.geonames.org/export/dump/readme.txt
- http://www.nodejs.org/
- http://ejohn.org/blog/node-js-stream-playground/
Begin by forking this repo and cloning your fork. GitHub has apps for Mac and Windows that make this easier.
Get started by installing nodejs.
For OS X users, use Homebrew and brew install nvm
Once that's done, from the project directory, run
nvm use
In the project directory run
npm install
The test suite can be run with
npm test
To start a local server run
PORT=3456 npm start
which should produce output similar to
Server running at http://127.0.0.1:3456/suggestions