From db58229481f3a19a93c208fb4bcb332f72c204e9 Mon Sep 17 00:00:00 2001
From: Janosch <99879757+jkppr@users.noreply.github.com>
Date: Fri, 3 Nov 2023 14:13:53 +0000
Subject: [PATCH] Updating the feature extraction analyzer documentation (#2973)

* Update the feature extraction analyzer documentation
---
 data/winevt_features.yaml                   |   2 +-
 docs/guides/analyzers/feature_extraction.md | 102 ++++++++++++++++++--
 2 files changed, 94 insertions(+), 10 deletions(-)

diff --git a/data/winevt_features.yaml b/data/winevt_features.yaml
index ea898f926f..c6f5e78638 100644
--- a/data/winevt_features.yaml
+++ b/data/winevt_features.yaml
@@ -47,7 +47,7 @@
 # For more details and examples of such an extraction check the Timesketch
 # documentation:
 #
-# TODO(Add documentation link)
+# https://timesketch.org/guides/analyzers/feature_extraction/
 #
 # ------------------------------------------------------------------------
 # 4624: An account was successfully logged on.
diff --git a/docs/guides/analyzers/feature_extraction.md b/docs/guides/analyzers/feature_extraction.md
index 9796aabfdc..d8ccc4b320 100644
--- a/docs/guides/analyzers/feature_extraction.md
+++ b/docs/guides/analyzers/feature_extraction.md
@@ -2,22 +2,41 @@
 hide:
   - footer
 ---
-The feature extraction analyzer creates attributes out of event data based on regular expressions. Different
-features can be specified in the `data/regex_features.yaml` file.
+The feature extraction analyzer creates attributes out of event data based on
+different extraction plugins.

-Please be aware that this analyzer does *not* extract ipv4, email-addresses and similar from *all* events, but only those that match the query_string.
+Currently supported:
+* Regular expression based extractions
+  * [Regex Extraction Plugin](#regex-extraction-plugin)
+* Plaso-parsed Windows event logs
+  * [Winevt Extraction Plugin](#winevt-extraction-plugin)
+
+> **Note**
+Please be aware that this analyzer does *not* extract ipv4, email addresses and
+similar from *all* events, but only those that match the definitions configured
+for the plugins explained below!

 ### Use case

-This analyzer is helpful to built a list of `email_addresses` in a sketch that are used in in `WEBHIST`. To do that, run the analyzer to have the feature extracted. Check the results by running a query: `email_address:*`.
+This analyzer is helpful to extract additional data from events as separate
+attributes. Those extracted attributes can then be used in search, lookups,
+correlations, aggregations or with analyzers.
+
+For example: In the default configuration, the analyzer will extract
+`email_addresses` from the message field of events with the source `WEBHIST`
+matching the regular expression.

-Those results now can be used in an aggregation to plot a table limited to that column.
+## Regex Extraction Plugin

-Another way of extracting that information is via API, querying events that contain `email_address:*` as a pandas dataframe, and work from there.
+This feature extraction plugin uses regular expressions to extract matching
+strings from an existing event attribute (e.g. message) and adds them as a new
+attribute to the event.

 ### Configuration

-A feature extraction definition looks like this:
+Features are defined in [data/regex_features.yaml](../../../data/regex_features.yaml)
+
+A regex based feature extraction definition looks like this:

 ```
 name:
@@ -40,9 +59,9 @@ name:
       keep_multimatch: False
 ```

-Each definition needs to define either a query_string or a query_dsl.
+Each definition needs to define either a `query_string` or a `query_dsl`.

-`re_flags` is a list of flags as strings from the re module. These include:
+`re_flags` is a list of flags as strings from the `re` module. These include:
 - DEBUG
 - DOTALL
 - IGNORECASE
@@ -72,3 +91,68 @@ The feature extraction works in the way that the query is run, and the regular e
 The first value extracted is then stored inside the "store_as" attribute. If there are emojis or tags defined they are also applied to that event.

 In the end, if a view is supposed to be created a view searching for the added tag is added (only if there are results).
+
+## Winevt Extraction Plugin
+
+This feature extraction plugin uses configured mappings to create new attributes
+for Windows Event Log events that were parsed using [Plaso](https://github.com/log2timeline/plaso).
+
+The mapping is based on the `strings` array that Plaso generates for
+the event data entries.
+
+> **Note**
+The winevt extraction plugin does *not* map all Windows Event Log fields. It
+only maps the ones configured in [data/winevt_features.yaml](../../../data/winevt_features.yaml)!
+
+### Configuration
+
+Features are defined in [data/winevt_features.yaml](../../../data/winevt_features.yaml)
+
+A mapping for a Windows Event uses the yaml format and looks like this:
+
+```
+name:
+
+      source_name:          Type: list[str] | REQUIRED | case-insensitive
+                            A list of source names to match against. Multiple
+                            entries will be checked with OR.
+
+      provider_identifier:  Type: list[str] | OPTIONAL | case-insensitive
+                            A list of provider identifiers to match against.
+                            Multiple entries will be checked with OR.
+
+      event_version:        Type: int | REQUIRED
+                            The event version to match against.
+
+      event_identifier:     Type: int | REQUIRED
+                            The event identifier to match against.
+
+      references:           Type: list[str] | OPTIONAL
+                            A list of references to provide as context and
+                            source for the event mapping. E.g. a URL to the
+                            official Microsoft documentation on the event.
+
+      mapping:              Type: list[dict] | REQUIRED
+                            A list of dicts that define the new attribute name
+                            and the string index of the event to extract the
+                            value from. Additionally it can also contain an
+                            alias list to add multiple attributes with
+                            the same value but different names.
+
+        name:               Type: str | REQUIRED
+                            The name of the new attribute to create.
+
+        string_index:       Type: int | REQUIRED | Starting at index 0
+                            The string index of the event to extract the
+                            value from. Based on the Plaso extracted "strings"
+                            attribute with Windows eventlog entries.
+
+        aliases:            Type: list[str] | OPTIONAL
+                            A list of aliases to add additionally to the
+                            official name of the attribute. This can be used
+                            to add different field names matching individual
+                            field name ontologies. E.g. srcIP, domain, etc.
+```
+
+Check out the preconfigured mappings for some examples:
+[data/winevt_features.yaml](../../../data/winevt_features.yaml)
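
The winevt mapping documented in this patch can be illustrated with a minimal Python sketch. This is not the Timesketch implementation. The function names `event_matches` and `extract_winevt_attributes`, the sample feature, the event identifier 9999 and the attribute names are hypothetical and exist only for illustration. The sketch assumes the event is available as a plain dict whose keys mirror the configuration fields and that it carries the Plaso `strings` list.

```python
from __future__ import annotations

from typing import Any


def event_matches(event: dict[str, Any], feature: dict[str, Any]) -> bool:
    """Check the match criteria: source, optional provider, version, identifier."""
    # source_name is required and compared case-insensitively (entries are ORed).
    sources = [name.lower() for name in feature["source_name"]]
    if str(event.get("source_name", "")).lower() not in sources:
        return False

    # provider_identifier is optional; when present it is also ORed.
    providers = [p.lower() for p in feature.get("provider_identifier", [])]
    if providers:
        if str(event.get("provider_identifier", "")).lower() not in providers:
            return False

    return (
        event.get("event_version") == feature["event_version"]
        and event.get("event_identifier") == feature["event_identifier"]
    )


def extract_winevt_attributes(
    event: dict[str, Any], feature: dict[str, Any]
) -> dict[str, str]:
    """Return the new attributes a mapping would add to a matching event."""
    if not event_matches(event, feature):
        return {}

    strings = event.get("strings", [])
    new_attributes: dict[str, str] = {}
    for entry in feature["mapping"]:
        index = entry["string_index"]  # zero-based index into "strings"
        if not 0 <= index < len(strings):
            continue  # skip indexes the event does not provide
        # The official name and every alias receive the same value.
        for name in [entry["name"], *entry.get("aliases", [])]:
            new_attributes[name] = strings[index]
    return new_attributes


# Hypothetical feature definition and event, for illustration only.
feature = {
    "source_name": ["Example-Source"],
    "event_version": 0,
    "event_identifier": 9999,
    "mapping": [
        {"name": "user_name", "string_index": 0, "aliases": ["username"]},
        {"name": "src_ip", "string_index": 2},
    ],
}
event = {
    "source_name": "example-source",
    "event_version": 0,
    "event_identifier": 9999,
    "strings": ["alice", "WORKSTATION", "10.0.0.5"],
}
print(extract_winevt_attributes(event, feature))
```

Events that do not match every required criterion yield no new attributes, which mirrors the note that only configured events are mapped.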