
Postprocessing is Noisy and Wasteful #65

Open
secbug opened this issue Jan 24, 2019 · 2 comments

Comments

@secbug
Contributor

secbug commented Jan 24, 2019

Turning on the entropy calculator fills the default logs with lines like:
INFO:pastehunter.py:Running Post Module postprocess.post_entropy on
It also runs on blacklisted pastes, which wastes CPU time.

Affected code:

# If any of the blacklist rules appear then empty the result set
if conf['yara']['blacklist'] and 'blacklist' in results:
    results = []
    logger.info("Blacklisted {0} paste {1}".format(paste_data['pastesite'], paste_data['pasteid']))

# Post Process

# If post module is enabled and the paste has a matching rule.
post_results = paste_data
for post_process, post_values in conf["post_process"].items():
    if post_values["enabled"]:
        if any(i in results for i in post_values["rule_list"]) or "ALL" in post_values["rule_list"]:
            logger.debug("Running Post Module {0} on {1}".format(post_values["module"], paste_data["pasteid"]))
            post_module = importlib.import_module(post_values["module"])
            post_results = post_module.run(results,
                                            raw_paste_data,
                                            paste_data
                                            )

The important part of the logic is this condition:
if any(i in results for i in post_values["rule_list"]) or "ALL" in post_values["rule_list"]:

A post module runs when either any(i in results for i in post_values["rule_list"]) or "ALL" in post_values["rule_list"] is true. The second check never looks at results, so it still passes after the blacklist has emptied them. This means a post-processor such as the entropy calculator, when configured with "ALL", will be run on EVERY paste, blacklisted or not.
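To make the behaviour concrete, here is a minimal sketch that isolates the condition into a helper. The function name should_run and the rule name "email_list" are made up for illustration and are not part of PasteHunter; only the boolean expression is taken from the code quoted above.

```python
def should_run(results, rule_list):
    """Hypothetical helper wrapping the condition from the affected code."""
    return any(i in results for i in rule_list) or "ALL" in rule_list


# The blacklist branch has already emptied the result set for this paste.
blacklisted_results = []

# A module configured with "ALL" still runs, because the second half of
# the `or` never inspects `results`.
print(should_run(blacklisted_results, ["ALL"]))

# A module tied to a specific (made-up) rule name is skipped as expected.
print(should_run(blacklisted_results, ["email_list"]))
```

The first call evaluates to True and the second to False, which is why only the "ALL" modules keep running on blacklisted pastes.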

@secbug secbug changed the title Turning on entropy calculator fills INFO log level with data Postprocessing is Noisy and Wasteful Jan 28, 2019
@secbug
Contributor Author

secbug commented Jan 28, 2019

A question for the creator, @kevthehermit: what is meant to take priority in the code, the blacklist or the "parse all" setting? I believe whichever comes first in the code will make the difference. For example:

Current with Blacklist and Parse All:
Blacklist - result = []
Post_process_current - process only "all"
Parse_all setting - result = [none_empty]
print() everything

Option A:
Blacklist - result = []
Parse_all setting - result = [none_empty]
Post_process_current - process everything applicable and process all on all
print() everything

Option B:
Parse_all setting - result = [none_empty]
Blacklist - result = []
Post_process_current - process everything applicable and process non-blacklisted on all
print() non-blacklisted

And there are a few more possibilities.

@kevthehermit
Owner

I can see your point. Process ALL was just a lazy way to say a module should run on everything without having to specify a list of every rule.

Store All should take priority over the blacklist. The idea of the blacklist was to help reduce false positives in data you wanted to keep. Store All should ignore all other filters.

I have updated the workflow to post process a blacklisted item only if Store All is True. If Store All is False, a blacklisted item is not post processed.
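The rule described above could be sketched roughly as follows. This is not the actual commit: the function name should_post_process and the blacklisted and store_all parameters are assumptions made for illustration, and the real change may be structured differently.

```python
def should_post_process(results, rule_list, blacklisted, store_all):
    """Hypothetical gate for a single post module.

    blacklisted: assumed flag, True if a blacklist rule matched the paste.
    store_all:   assumed flag mirroring the Store All setting.
    """
    # A blacklisted paste is only post processed when Store All is on.
    if blacklisted and not store_all:
        return False
    # Otherwise fall back to the original rule matching.
    return "ALL" in rule_list or any(rule in results for rule in rule_list)


# Blacklisted paste, Store All off: skipped even for an "ALL" module.
print(should_post_process([], ["ALL"], blacklisted=True, store_all=False))

# Blacklisted paste, Store All on: the "ALL" module runs.
print(should_post_process([], ["ALL"], blacklisted=True, store_all=True))
```

Checking the blacklist flag before the rule match keeps the "ALL" shortcut from bypassing the blacklist, which was the waste reported in the first comment.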

As for the log output:
All output modules generate an info log. I don't want to start setting logging per post process module, as it just becomes harder to manage.
I can set the default to disable entropy calculation. I use it to look for encrypted blobs or large chunks of base64 and binary data, so it may not be useful for others.
