Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support ISP database for ASN data and move parts to Java #111

Merged
merged 6 commits into from
Apr 24, 2017

Conversation

suyograo
Copy link
Contributor

This adds support to GeoIP2 ISP database which includes the ASN data.

Also, while doing this, it made sense to me to just move most parts to Java since we were already using the java APIs and not the ruby libs. All existing tests pass.

'dma_code', 'ip', 'latitude',
'longitude', 'postal_code', 'region_name',
'region_code', 'timezone', 'location']
config :fields, :validate => :array
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docs still mention the same defaults, but the actual defaults have moved to Java enum class.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not listing a :default here makes the documentation generator (which will change soon with @ph's work) show that the setting has no default value. Could be confusing.

However, I think given @ph/@dede's efforts to change how plugins get documented, this is probably a fine change.

(Mostly noting this for posterity)

@@ -113,132 +96,44 @@ class LogStash::Filters::GeoIP < LogStash::Filters::Base
# of the same geoip_type share the same cache. The last declared cache size will 'win'. The reason for this is that there would be no benefit
# to having multiple caches for different instances at different points in the pipeline, that would just increase the
# number of cache misses and waste memory.
config :lru_cache_size, :validate => :number, :default => 1000
config :lru_cache_size, :validate => :number, :default => 1000, :deprecated => "This field has been deprecated in favor of cache_size."
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we were never using this, and it was lingering around

Copy link

@robcowart robcowart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a little feedback on field naming.

As a user the field names anonymous_system_number and anonymous_system_organization seem unnecessary long. Imagine typing those into the filter bar in Kibana. Simply using 'asn' and 'as_org' would seem sufficient. Pretty much everyone who would care what about these values uses the term ASN. Afterall, the field 'isp' isn't named 'internet_service_provider', so why name the AS-related fields in long form?

That said, if the fields are named that way within the MaxMind database and the goal is to stick with that naming, I can understand that reasoning too.

import java.util.Locale;

enum Fields {
ANONYMOUS_SYSTEM_NUMBER("anonymous_system_number"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In ASN, the A is autonomous, not anonymous

'dma_code', 'ip', 'latitude',
'longitude', 'postal_code', 'region_name',
'region_code', 'timezone', 'location']
config :fields, :validate => :array
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not listing a :default here makes the documentation generator (which will change soon with @ph's work) show that the setting has no default value. Could be confusing.

However, I think given @ph/@dede's efforts to change how plugins get documented, this is probably a fine change.

(Mostly noting this for posterity)

import java.net.UnknownHostException;
import java.util.*;

public class GeoIpFilter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IP is capitalized, and writing it "Ip" feels wrong. I dont' think we have a firm style guide on this, but my practice has always been acronyms stay all caps.

This kind of mixing gets confusing when, for example, Java crypto calls it "SSL" but Netty calls it "Ssl" ...

Do we have a practice for this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i was debating this. I do like acronyms as capitalized. I'll change it to GeoIP


private Set<Fields> createDesiredFields(List<String> fields) {
Set<Fields> desiredFields = EnumSet.noneOf(Fields.class);
if (fields == null) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also worth checking if fields.isEmpty() ? Or would we reject an empty fields list at configuration-time?

}
// only do event.set(@target) if the lookup result is not null
if (event.getField(targetField) == null) {
event.setField(targetField, Collections.emptyMap());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this necessary? event.set("[foo][bar]", ...) should create any parent fields if it doesn't exist... I think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no it doesn't. I had to put this in, otherwise, tests fail. It was in the old ruby code as well.

@suyograo
Copy link
Contributor Author

@robacj suggestion accepted. I shortened the fields like you mentioned.

@jordansissel updated after your comments.

public class GeoIPFilter {
private static Logger logger = LogManager.getLogger();
private static final String CITY_DB_TYPE = "GeoLite2-City";
private static final String COUNTRY_DB_TYPE = "GeoLite2-Country";

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The commercial databases have the type GeoIP2-City and GeoIP2-Country. It'd likely be worth supporting those too, especially given that they are mentioned in the recent blog post.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@oschwald good call. Updated.

Copy link
Contributor

@jordansissel jordansissel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM. <3

@suyograo suyograo merged commit ec18789 into logstash-plugins:master Apr 24, 2017
@suyograo suyograo deleted the java_this branch April 24, 2017 17:56

// if location is empty, there is no point populating geo data
// and most likely all other fields are empty as well
if (location.getLatitude() == null && location.getLongitude() == null) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you're rewriting this anyway, please don't keep this mistake. There are GeoIP records with no lat-lon that this plugin should be able to handle.

@acchen97
Copy link

Minor note, changelog is missing for this release.

@zcola
Copy link

zcola commented May 18, 2017

When can I use it?

@hackery
Copy link

hackery commented Jun 22, 2017

@zcola you can use it now by updating the plugin within your current logstash:

> logstash-plugin update logstash-filter-geoip
Updating logstash-filter-geoip
Updated logstash-filter-geoip 4.0.4 to 4.1.1

@zcola
Copy link

zcola commented Jun 23, 2017

@hackery
I upgraded to 4.2.0 no show isp,, here is my compilation script fragment

  subdivisions        => [ 'array', 'map'],
    traits        =>  'map',
    location            => 'map',
...
my $tree = MaxMind::DB::Writer::Tree->new(
    ip_version    => 4,
    record_size   => 32,
    database_type => 'GeoIP2-ISP',
    languages     => ['en'],
...


        en      => $arr[14],
                            zh_CN   => $arr[15],
                        },
                    },
                    traits    => {
                            isp   => $arr[17],
                            organization  => $arr[19],
                    },

logstash return

{
"@timestamp" => 2017-06-23T08: 01: 36.884Z,
"Message_geo" => {
"City_name" => "Guangzhou",
"Time zone" => "Asia / Shanghai"
"Ip" => "1.0.8.0"
"Latitude" => 23.1167,
"Country_name" => "China"
"Country_code2" => "CN",
"Continent_code" => "AS",
"Country_code3" => "CN",
"Region_name" => "Guangdong",
"Location" => {
"Lon" => 113.25,
"Lat" => 23.1167
},
"Region_code" => "44"
"Longitude" => 113.25
},
"@version" => "1"
"Host" => "xxx"

Use python to parse the return

 }
    },
    "traits": {
        "organization": "\u58a8\u5c14\u672cGoldenit\u6709\u9650\u516c\u53f8",
        "isp": ""
    },
    "subdivisions": [
        {
            "iso_code": "VIC",
            "names": {
                "en": "Victoria",
                "zh_CN": ""
            }
        }

Is my compiler script wrong?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants