Add support for using VEX statements to filter/enrich match results #1365

Open
wagoodman opened this issue Jun 28, 2023 · 13 comments
Labels
enhancement New feature or request

Comments

@wagoodman
Contributor

wagoodman commented Jun 28, 2023

Given a set of VEX statements, each of which represents a status assessment for a vulnerability matched with a product, it would be ideal to filter grype results down to useful or novel results (removing results that have a not_affected status in a VEX statement). The primary motivator is to reduce the result size when possible, helping the user focus on results that have a practical impact (and not spend time attempting to remediate non-issues).

One question might be: where should these VEX documents come from? There is a bit of a spectrum here, and I think that motivates a possible implementation path:

Enable grype to be able to...

  1. take a single document that may have one or more VEX statements and filter the results. This is a good first step since we could blindly take statements and apply them to all artifacts regardless of the scan target. (edit: implemented in Ignore/add match results based on OpenVEX documents #1397)
  2. take multiple VEX documents and filter the results. This has an added challenge of determining which documents apply to the artifact scanned. There are also secondary problems such as if given a directory with a large set of documents, are there cheap and easy ways to filter down to the correct set of files (do we need an index? or to build an index?)?
  3. pull down remote VEX documents given an explicit reference (e.g. a git repo URL) and filter the results using only applicable documents. The reference should not be vague, so the added challenge is less about discovery and more about authentication and caching concerns.
  4. discover VEX documents for a given input reference (e.g. alpine:latest). There are a lot of added challenges here but ultimately this would be the most "magical", requiring the least amount of user input to leverage vex documents. This added automagic-ness should not sacrifice security concerns to achieve this and ideally would require no additional configuration (not a requirement though). There are a lot of directions this could go in, so I'll leave this speculation for later.

The nice thing about this path is that we defer decisions about where these documents come from while working on the logistics of lining up the existing OpenVEX spec.
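
As a rough illustration of the index question raised in item 2, the sketch below builds an in-memory map from product identifier to the VEX files that mention it. It assumes OpenVEX-style JSON in which each statement carries a `products` list whose entries hold an `@id`; the function names `build_index` and `documents_for` are hypothetical and not part of grype.

```python
import json
from collections import defaultdict
from pathlib import Path


def build_index(directory):
    """Map each product identifier to the set of VEX file names that mention it.

    Assumed document shape (OpenVEX-style):
    {"statements": [{"products": [{"@id": "pkg:..."}], ...}]}
    """
    index = defaultdict(set)
    for path in sorted(Path(directory).glob("*.json")):
        try:
            doc = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # skip unreadable files and files that are not valid JSON
        if not isinstance(doc, dict):
            continue
        for statement in doc.get("statements", []):
            for product in statement.get("products", []):
                # Accept either {"@id": "..."} objects or bare identifier strings.
                product_id = product.get("@id") if isinstance(product, dict) else product
                if product_id:
                    index[product_id].add(path.name)
    return index


def documents_for(index, product_id):
    """Return the sorted file names of the documents that mention product_id."""
    return sorted(index.get(product_id, ()))
```

An index like this is rebuilt on every run. Whether it is worth persisting between runs is the open question from item 2.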

For 1 something like this could be the input:

$ grype myimage:tag --vex ./path/to/vexdoc.json

The same input could work for 2 and 3, where the argument might be a directory and we look for vex documents, or the argument could be a remote resource such as [email protected]:myorg/myvexrepo.git... I'm softly suggesting this initially to help set an initial direction, but consider none of this set in stone.

Side note: I think we should focus initial conversations and efforts just on 1 for now, but I wanted to at least get a vision going for later.

One question I have about this feature is could there be multiple modes in how you would use a vex document? The initial suggestion at the top of this issue is primarily as a filter, and I was thinking about suggesting --filter or something similar as a step one. However, vex documents could also be used as a source of vulnerabilities based off of the status field with a value of affected. This means that filter is potentially the wrong verb to use based off of potential future usage... so I fell back to specifying "what" is being input ("vex") instead of an operation on the CLI. (dev note: this is where we add new flags)

I'm assuming that either mode (filtering and adding) would be useful depending on the use case, and that they are not mutually exclusive. I tend to add config items instead of CLI flags/args when there are "knobs" like these for different use cases and a sensible default behavior. That being said, adding vex.filter_not_affected and vex.add_affected configurables (GRYPE_VEX_FILTER_NOT_AFFECTED and GRYPE_VEX_ADD_AFFECTED env vars) would be nice, with a default of true for both (dev note: here's where we bind new config elements into the application config).
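
To make the two knobs concrete, here is a minimal sketch of how they might be resolved from the environment. The option and env var names are the ones proposed above, not a confirmed grype configuration. The `read_vex_options` helper and the rule for parsing boolean strings are assumptions made for illustration.

```python
import os

# Proposed config keys mapped to (proposed env var, default).
# Defaults follow the suggestion of "true for both".
VEX_OPTIONS = {
    "vex.filter_not_affected": ("GRYPE_VEX_FILTER_NOT_AFFECTED", True),
    "vex.add_affected": ("GRYPE_VEX_ADD_AFFECTED", True),
}

# Assumption: these spellings turn an option off; any other non-empty value turns it on.
_FALSE_VALUES = {"0", "false", "no", "off"}


def read_vex_options(environ=None):
    """Resolve each option from the environment, falling back to its default."""
    environ = os.environ if environ is None else environ
    options = {}
    for key, (env_name, default) in VEX_OPTIONS.items():
        raw = environ.get(env_name)
        if raw is None or raw.strip() == "":
            options[key] = default
        else:
            options[key] = raw.strip().lower() not in _FALSE_VALUES
    return options
```

With this shape, a user who only wants filtering could set `GRYPE_VEX_ADD_AFFECTED=false` and leave everything else at the defaults.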

When it comes to the JSON output, grype rarely drops match results when filters are applied; instead they are partitioned into separate sections of the JSON output: matches and ignoredMatches. When we filter out results based on vex statements, I think we should put these matches into the ignoredMatches section, allowing the user to audit the total set of results found.
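
The partitioning idea could be sketched roughly as follows. This is not grype's implementation: the statement shape is simplified to a vulnerability ID plus a status, statements are applied to every match regardless of product (as in step 1), and `partition_matches` is a hypothetical name.

```python
def partition_matches(matches, statements, ignore_statuses=("not_affected",)):
    """Split matches into kept and ignored sets instead of dropping any.

    matches:    dicts shaped like the JSON output, e.g. {"vulnerability": {"id": ...}}
    statements: simplified VEX statements, e.g. {"vulnerability": "CVE-...", "status": ...}
    """
    # Vulnerability IDs that some statement marks with a suppressing status.
    suppressed = {
        s["vulnerability"] for s in statements if s["status"] in ignore_statuses
    }
    kept, ignored = [], []
    for match in matches:
        if match["vulnerability"]["id"] in suppressed:
            ignored.append(match)
        else:
            kept.append(match)
    return {"matches": kept, "ignoredMatches": ignored}
```

Every input match ends up in exactly one of the two lists, so the full result set stays auditable.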

With each record we tend to capture "how" the match was made in the .matchDetails of the match object. For example, a match made against the alpine:3.2 image might look like this:

  {
   "vulnerability": {
    "id": "CVE-2023-0466",
    "dataSource": "https://nvd.nist.gov/vuln/detail/CVE-2023-0466",
    "namespace": "nvd:cpe",
    "severity": "Medium",
    ...
   },
   "relatedVulnerabilities": [],
   "matchDetails": [
    {
     "type": "cpe-match",
     "matcher": "apk-matcher",
     "searchedBy": {
      "namespace": "nvd:cpe",
      "cpes": [
       "cpe:2.3:a:openssl:openssl:1.0.2k-r0:*:*:*:*:*:*:*"
      ],
      "Package": {
       "name": "openssl",
       "version": "1.0.2k-r0"
      }
     },
     "found": {
      "vulnerabilityID": "CVE-2023-0466",
      "versionConstraint": ">= 1.0.2, < 1.0.2zh || >= 1.1.1, < 1.1.1u || >= 3.0.0, < 3.0.9 || >= 3.1.0, < 3.1.1 (unknown)",
      "cpes": [
       "cpe:2.3:a:openssl:openssl:*:*:*:*:*:*:*:*"
      ]
     }
    }
   ],
   "artifact": {
    "id": "11081b02f0e7cc1f",
    "name": "libcrypto1.0",
    "version": "1.0.2k-r0",
    "type": "apk",
    ...
   }
  }

Here the matchDetails show what we searchedBy (given the package details) and, in the found section, which elements contributed towards finding a match. I think the matchDetails field should be amended to account for matches added purely from vex statements, so we can show our work in how the match was made, like we do with all of our other matchers.

Similarly, when we ignore a match based on a vex statement we should also note why it was ignored. Today we do this in the IgnoredMatch object, which is a superset of the Match object that additionally captures the ignore rules that apply to the match. Looking at how we express ignore rules, a question that comes to mind is "should we fit vex concepts into these ignore rules? or should we add something else? (or change how this works fundamentally?)"
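
One possible answer to the "fit vex concepts into ignore rules" question is sketched below: the ignored record is a copy of the match plus a rule-like entry describing the statement that caused it. The key `appliedIgnoreRules` and the `vex-status` / `vex-justification` fields are placeholders for illustration, not a confirmed grype schema, and `to_ignored_match` is a hypothetical helper.

```python
import copy


def to_ignored_match(match, statement):
    """Return a copy of match annotated with the VEX statement that ignored it.

    statement is a simplified VEX statement such as
    {"vulnerability": "CVE-...", "status": "not_affected", "justification": "..."}.
    """
    ignored = copy.deepcopy(match)  # leave the original match untouched
    ignored["appliedIgnoreRules"] = [
        {
            "vulnerability": statement["vulnerability"],
            "vex-status": statement["status"],
            "vex-justification": statement.get("justification", ""),
        }
    ]
    return ignored
```

Keeping the reason next to the match means a user auditing the ignored section can see which statement was responsible without re-reading the VEX document.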

Ok, I have more thoughts and questions around how might the UI get updated, should we refactor the workflow to account for filtering logic earlier in processing, and related topics... but this has gotten verbose, let me stop here for now and open up the floor.

CC: @luhring @jspeed-meyers @puerco

@wagoodman wagoodman added the enhancement New feature or request label Jun 28, 2023
@dlorenc

dlorenc commented Jul 1, 2023

This sounds awesome, and I agree with the phased approach:

> 1. take a single document that may have one or more VEX statements and filter the results. This is a good first step since we could blindly take statements and apply them to all artifacts regardless of the scan target.
> 2. take multiple VEX documents and filter the results. This has an added challenge of determining which documents apply to the artifact scanned. There are also secondary problems such as if given a directory with a large set of documents, are there cheap and easy ways to filter down to the correct set of files (do we need an index? or to build an index?)?
> 3. pull down remote VEX documents given an explicit reference (e.g. a git repo URL) and filter the results using only applicable documents. The reference should not be vague, so the added challenge is less about discovery and more about authentication and caching concerns.
> 4. discover VEX documents for a given input reference (e.g. alpine:latest). There are a lot of added challenges here but ultimately this would be the most "magical", requiring the least amount of user input to leverage vex documents. This added automagic-ness should not sacrifice security concerns to achieve this and ideally would require no additional configuration (not a requirement though). There are a lot of directions this could go in, so I'll leave this speculation for later.

I have a lot of ideas for 3 and 4, but we can cross those bridges as we get there. From the Wolfi side, we're happy to act as guinea pigs and help design the download/discovery/caching/magic parts to make sure it works well with Grype and users get the magical experience without having to sacrifice security or control.

@luhring
Contributor

luhring commented Jul 1, 2023

I love all of this. 😍

A couple of small thoughts:

  1. According to the spec, in addition to not_affected, we'd also want fixed to be filtered out from Grype's results. (See this spec link for the differentiation if it's helpful.)
  2. The notion of adding to Grype's results via the affected status is absolutely correct. In terms of the development plan, we may want to consider making this something we come back to right after solving for the filtering use case end-to-end, instead of at the same time as filtering. I think we'd want to tackle affected before going to items 3 and 4 above (e.g. magical discovery), but there might be enough to figure out with affected that it makes sense to complete a depth-first implementation without it first, and then come right back to it.
  3. I just want to "+1" the importance of the .ignoredMatches and .matchDetails consideration points. And I like your initial suggestions @wagoodman.
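
Putting points 1 and 2 together, the four OpenVEX status values could map to actions roughly as in the sketch below. Treating `under_investigation` as "keep the match as-is" is an assumption, and the names `ACTION_BY_STATUS` and `action_for` are hypothetical.

```python
IGNORE = "ignore"  # move an existing match into the ignored set
ADD = "add"        # add a match based on the statement alone
KEEP = "keep"      # leave any existing match untouched

ACTION_BY_STATUS = {
    "not_affected": IGNORE,
    "fixed": IGNORE,              # per point 1: fixed is filtered as well
    "affected": ADD,              # per point 2: a source of new matches
    "under_investigation": KEEP,  # assumption: no filtering while unresolved
}


def action_for(status):
    """Return the action for a VEX status, rejecting unknown values."""
    try:
        return ACTION_BY_STATUS[status]
    except KeyError:
        raise ValueError(f"unknown VEX status: {status!r}") from None
```

Splitting the decision out like this would also let the `affected` handling land later without touching the filtering path.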

This is easily the Grype feature I'm most excited about.

@puerco
Contributor

puerco commented Jul 13, 2023

This is great, I've been diving into the grype code over the past week and I think that I have a good grasp on what @wagoodman and @luhring are mentioning here. I'll write an initial patch to propose # 1 above (take a single document...)

@puerco
Contributor

puerco commented Jul 20, 2023

OK, I opened #1397 which implements item 1 🚀

@sej7278

sej7278 commented Oct 24, 2023

Is this only for container images? With pkg:rpm/kernel@version as the product purl (from vexctl) I get this:

* unable to find matches against VEX sources: unable to find matches against VEX documents: 
checking matches against VEX data: reading product identifiers from context: 
source type not supported for VEX

which seems to come from https://github.com/anchore/grype/blob/main/grype/vex/openvex/implementation.go#L70

@sethmlarson

sethmlarson commented Nov 22, 2023

This is awesome, I am excited that VEX is being integrated directly into scanning tools. This greatly helps fight back against the systematic false-positives that will only get worse as more automated tooling tries its best to automate something which isn't automatable.

I want to create and publish SBOMs for CPython, but want to do so in a way that allows our team to mark vulnerabilities in bundled dependencies (like OpenSSL, which we only use a small subset of features) as not affecting CPython so as not to cause alarm and increase demands on volunteers to make unnecessary security releases.

This is the architecture I am imagining for CPython's SBOM and VEX, all SBOM documents would be referencing a VEX document (potentially stored publicly on GitHub) so we're able to make statements about vulnerabilities in dependencies post-release without requiring everyone update their SBOMs.

[Image: Screenshot from 2023-11-22 10-18-55]

CycloneDX currently has support for specifying VEX documents (via externalReferences of type vulnerability-assertion). I wasn't able to find a similar mechanism immediately when looking at SPDX.
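
For illustration, discovering those references from a CycloneDX JSON SBOM might look like the sketch below. It assumes the standard `externalReferences` arrays (on the BOM, on `metadata.component`, and on nested `components`) and the `vulnerability-assertion` type mentioned above; `vex_references` is a hypothetical helper, and fetching or verifying the referenced documents is left out.

```python
def vex_references(bom):
    """Collect URLs of externalReferences typed 'vulnerability-assertion'.

    bom is a parsed CycloneDX JSON document (a dict). Order of first
    appearance is preserved and duplicates are dropped.
    """
    urls = []

    def collect(holder):
        for ref in holder.get("externalReferences", []):
            if ref.get("type") == "vulnerability-assertion" and ref.get("url"):
                urls.append(ref["url"])
        # Components may nest further components.
        for child in holder.get("components", []):
            collect(child)

    collect(bom)
    root = bom.get("metadata", {}).get("component")
    if root:
        collect(root)
    return list(dict.fromkeys(urls))
```

A scanner could feed the returned URLs into whatever download and trust checks are eventually chosen for the remote-document phases.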

Obviously right now we could (and will try to) tell everyone to use our VEX statements with our SBOMs, but I suspect that there will be a percentage of folks which don't do that and then we end up having to engage with them piecemeal regardless. Would be great if there was a way for it all to happen auto-magically.

Is this use-case covered in phase 3, and if not, can it be?

@puerco
Contributor

puerco commented Nov 22, 2023

@sethmlarson I would love to talk more about it, we are working exactly on this on OpenVEX. One way to magically discover the documents is via the SBOM reference as you mentioned it, but I would love to talk about the other methods we are implementing and exploring support for other well known data storage locations. Do you want to transfer this as an issue in openvex/spec? We can continue the conversation there!

@puerco
Contributor

puerco commented Nov 22, 2023

@sej7278 Yes, for now container images are the only artifact type that can incorporate vex data. But we are working to support more. Can you explain your use case a little bit more?

@sej7278

sej7278 commented Nov 22, 2023

@puerco I'm basically interested in RPMs.

So an SBOM generated from RPMs and SRPMs using syft, then vulnerability scanned using grype and linking to vex data to clarify if the package is actually vulnerable or not.

A general improvement on scans based on arbitrary version strings from binaries is the endgame.

@szh

szh commented Jun 17, 2024

I'm confused as to the status of this feature. I'm trying to use it for container images, and there is already a --vex command line option, but it doesn't seem to be working (see #1836). Is it supposed to be implemented already or not?

@szh

szh commented Jun 24, 2024

Never mind, it seems that the issue I'm having is due to a matching issue. I'll work on a fix and submit a PR soon.

@psirenny

psirenny commented Oct 23, 2024

> @sej7278 Yes, for now the only artifact type that can incorporate vex data are container images. But we are working to support more, can you explain your use case a little bit more?

We scan artifacts before packing/shipping them off to air-gapped environments. For example, a set of resolved pip dependencies whose wheel files are then sent to a self-hosted PyPI repository.

@RingoDev

RingoDev commented Jan 5, 2025

> This is awesome, I am excited that VEX is being integrated directly into scanning tools. This greatly helps fight back against the systematic false-positives that will only get worse as more automated tooling tries its best to automate something which isn't automatable.
>
> I want to create and publish SBOMs for CPython, but want to do so in a way that allows our team to mark vulnerabilities in bundled dependencies (like OpenSSL, which we only use a small subset of features) as not affecting CPython so as not to cause alarm and increase demands on volunteers to make unnecessary security releases.
>
> This is the architecture I am imagining for CPython's SBOM and VEX, all SBOM documents would be referencing a VEX document (potentially stored publicly on GitHub) so we're able to make statements about vulnerabilities in dependencies post-release without requiring everyone update their SBOMs.
>
> [Image: Screenshot from 2023-11-22 10-18-55]
>
> CycloneDX currently has support for specifying VEX documents (via externalReferences of type vulnerability-assertion). I wasn't able to find a similar mechanism immediately when looking at SPDX.
>
> Obviously right now we could (and will try to) tell everyone to use our VEX statements with our SBOMs, but I suspect that there will be a percentage of folks which don't do that and then we end up having to engage with them piecemeal regardless. Would be great if there was a way for it all to happen auto-magically.
>
> Is this use-case covered in phase 3, and if not, can it be?

SPDX has had this externalRef specced out for some time now as well, @sethmlarson: https://github.com/spdx/spdx-3-model/blob/3.0.1/model/Core/Vocabularies/ExternalRefType.md?plain=1#L64

I am coming from a very similar place: we would like to be able to "mute" false-positive vulnerabilities found for deps in published SBOMs. I would also be willing to contribute this to grype, but it would probably make sense to get #1619 merged first.

9 participants