Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Use Libpostal service #146

Open
wants to merge 5 commits into
base: master
Choose a base branch
from
Open

WIP: Use Libpostal service #146

wants to merge 5 commits into from

Conversation

orangejulius
Copy link
Member

@orangejulius orangejulius commented May 20, 2018

Looking for early review, not 100% production ready yet

Use microservice-wrapper to avoid having to load libpostal locally.

Note: this now requires a new configuration section in pelias.json, a
top-level services key with the usual properties. Here's an example
full pelias.json:

Update: The code will fall back to api.services.libpostal if services.libpostal is not defined. This helps ensure backwards compatibility.

{
  "api": {
    "textAnalyzer": "libpostal"
  },
  "services": {
    "libpostal": {
      "url": "http://libpostal-service-url:8080",
      "timeout": 4000
    }
  }
}

Notes

This is a breaking change, since a new configuration option is required.

The deasync module was chosen to give the libpostal service a synchronous-like interface. We should carefully evaluate whether there are any issues with this. We can avoid using it, however it would be quite a bit of work, as many sections of the code, including lots of unit tests, would have to be re-written in a callback style.

See update below, this code has now been rewritten in an asynchronous style.

Fixes: #106

@orangejulius
Copy link
Member Author

This has now been rewritten to use Promises, async/await, and asynchronous interfaces in general. It's a lot of changed code, and I'm actually not an expert in using the new asynchronous tools in newer Javascript in production. In practice I think we'll need more try/catch blocks or general error handling for asynchronous code failures. I think even something like a single intermittent network glitch could crash the service right now.

However this should be a big change as the interpolation service will take only a few MB of RAM, and is thus much easier and cheaper to deploy.

Copy link
Member

@missinglink missinglink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

partial review: just wanted to discuss how the async/await stuff looks.

number: analyze.housenumber( number ),
street: analyze.street( street )
};
analyze.street(street, function streetAnalyzeCallback(err, street, metadata) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the idea behind this syntax? It seems... verbose

analyze.street(street, function streetAnalyzeCallback(err, street, metadata) {

wouldn't that be simpler as:

analyze.street(street, (err, street, metadata) => {

I see that streetAnalyzeCallback isn't referenced anywhere else in the file so I'm guessing it's for stack traces?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see in another file below that you're using the syntax:

const analyze_street = util.promisify(analyze.street);
var names = await analyze_street();

I think that syntax is much cleaner, and it allows you to leave most of the code untouched and avoids the closure?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was simply because I was hoping to avoid having to use async/await syntax at all in this PR. Despite the verboseness I'm honestly not a fan of additional language complexity. Functions are well understood by everyone :)

However, because later on we are depending on many calls to libpostal returning before proceeding, using Promisify.all is basically the only way to get the behavior we want.

Since we're already going to be using the new syntax, I'll update this code to use it as well.

@orangejulius orangejulius self-assigned this Nov 25, 2019
@orangejulius
Copy link
Member Author

Turns out there is some more work to do:

  • We will probably want to add some logging of structured data regarding each request to libpostal, including the query that was sent and how long it took to get a response
  • In to record response time information, pelias-microservice-wrapper exposes a metadata object as a 3rd parameter returned to callback functions. This makes it a bit harder to use util.promisify, so sticking with a standard callback architecture where possible might be best
  • Finally, while doing some more testing of the interpolation service it turns out that the demo is broken because the /extract/geojson endpoint was not converted to use the libpostal service. It's a challenging endpoint that needs to normalize multiple street names and therefore can generate multiple calls to the libpostal service, so this will take some work.

BREAKING CHANGE

Use microservice-wrapper to avoid having to load libpostal locally.

Note: this now requires a new configuration section in `pelias.json`, a
top-level `services` key with the usual properties. Here's an example
full `pelias.json`:

```
{
  "api": {
    "textAnalyzer": "libpostal"
  },
  "services": {
    "libpostal": {
      "url": "http://libpostal-service-url:8080",
      "timeout": 4000
    }
  }
}
```

Fixes #106
Libpostal data no longer is needed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use libpostal service
2 participants