Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

datastore: fix get query caching #1449

Merged
merged 1 commit into from
Dec 18, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
130 changes: 73 additions & 57 deletions datastore/postgres/get.go
Original file line number Diff line number Diff line change
Expand Up @@ -51,7 +51,7 @@ func (s *MatcherStore) Get(ctx context.Context, records []*claircore.IndexRecord
defer tx.Rollback(ctx)
// start a batch
batch := &pgx.Batch{}
resCache := map[string]pgx.Rows{}
resCache := map[string][]*claircore.Vulnerability{}
rqs := []*recordQuery{}
for _, record := range records {
query, err := buildGetQuery(record, &opts)
Expand All @@ -72,7 +72,6 @@ func (s *MatcherStore) Get(ctx context.Context, records []*claircore.IndexRecord
resCache[query] = nil
}
// send the batch

start := time.Now()
res := tx.SendBatch(ctx, batch)
// Can't just defer the close, because the batch must be fully handled
Expand All @@ -83,72 +82,89 @@ func (s *MatcherStore) Get(ctx context.Context, records []*claircore.IndexRecord
results := make(map[string][]*claircore.Vulnerability)
vulnSet := make(map[string]map[string]struct{})
for _, rq := range rqs {
rows, ok := resCache[rq.query]
rid := rq.record.Package.ID
vulns, ok := resCache[rq.query]
if !ok {
return nil, fmt.Errorf("unexpected vulnerability query: %s", rq.query)
}
if rows == nil {
rows, err = res.Query()
if err != nil {
res.Close()
return nil, err
if vulns != nil { // We already have results we don't need to go back to the DB.
RTann marked this conversation as resolved.
Show resolved Hide resolved
if _, ok := vulnSet[rid]; !ok {
vulnSet[rid] = make(map[string]struct{})
}
resCache[rq.query] = rows
}

// unpack all returned rows into claircore.Vulnerability structs
for rows.Next() {
// fully allocate vuln struct
v := &claircore.Vulnerability{
Package: &claircore.Package{},
Dist: &claircore.Distribution{},
Repo: &claircore.Repository{},
for _, v := range vulns {
if _, ok := vulnSet[rid][v.ID]; !ok {
vulnSet[rid][v.ID] = struct{}{}
results[rid] = append(results[rid], v)
}
}

var id int64
err := rows.Scan(
&id,
&v.Name,
&v.Description,
&v.Issued,
&v.Links,
&v.Severity,
&v.NormalizedSeverity,
&v.Package.Name,
&v.Package.Version,
&v.Package.Module,
&v.Package.Arch,
&v.Package.Kind,
&v.Dist.DID,
&v.Dist.Name,
&v.Dist.Version,
&v.Dist.VersionCodeName,
&v.Dist.VersionID,
&v.Dist.Arch,
&v.Dist.CPE,
&v.Dist.PrettyName,
&v.ArchOperation,
&v.Repo.Name,
&v.Repo.Key,
&v.Repo.URI,
&v.FixedInVersion,
&v.Updater,
)
v.ID = strconv.FormatInt(id, 10)
continue
}
results[rid] = []*claircore.Vulnerability{}
err := func() error {
rows, err := res.Query()
if err != nil {
res.Close()
return nil, fmt.Errorf("failed to scan vulnerability: %v", err)
return fmt.Errorf("error getting rows: %w", err)
}
defer rows.Close()
// unpack all returned rows into claircore.Vulnerability structs
for rows.Next() {
// fully allocate vuln struct
v := &claircore.Vulnerability{
Package: &claircore.Package{},
Dist: &claircore.Distribution{},
Repo: &claircore.Repository{},
}

rid := rq.record.Package.ID
if _, ok := vulnSet[rid]; !ok {
vulnSet[rid] = make(map[string]struct{})
}
if _, ok := vulnSet[rid][v.ID]; !ok {
vulnSet[rid][v.ID] = struct{}{}
results[rid] = append(results[rid], v)
var id int64
err := rows.Scan(
&id,
&v.Name,
&v.Description,
&v.Issued,
&v.Links,
&v.Severity,
&v.NormalizedSeverity,
&v.Package.Name,
&v.Package.Version,
&v.Package.Module,
&v.Package.Arch,
&v.Package.Kind,
&v.Dist.DID,
&v.Dist.Name,
&v.Dist.Version,
&v.Dist.VersionCodeName,
&v.Dist.VersionID,
&v.Dist.Arch,
&v.Dist.CPE,
&v.Dist.PrettyName,
&v.ArchOperation,
&v.Repo.Name,
&v.Repo.Key,
&v.Repo.URI,
&v.FixedInVersion,
&v.Updater,
)
v.ID = strconv.FormatInt(id, 10)
if err != nil {
res.Close()
return fmt.Errorf("failed to scan vulnerability: %w", err)
}

if _, ok := vulnSet[rid]; !ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize this was here before but I'm wondering: is it possible we see the same rid twice in the rqs loop and they both have different queries and different sets of vulnerabilities? If so, then it seems like, in the end, we give them both the same set of vulns instead of separate. Is this expected? If so, when?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think this is a dormant bug, it probably never comes up in practice because:

  1. With VEX the repository doesn't matter until Vulnerable() is called (so after the querying) so the queries per package are the same.
  2. Before that, in the OVAL files, each vulnerability referenced all the possible repos so I suppose you were bound to get a comparable set of Vulns for each package.ID anyway.

I think this probably deserves more discussion and a separate issue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Every other time a matcher queries on driver.RepositoryName, it's a mono-repository representing that ecosytem

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I looked into it a bit, and I realize that's supposed to happen due to the nature of IndexRecords. As in, there is a record for every unique for every repository in every environment in every package. So, there should be an example where we see the same rid with different queries (for example two different potential repositories) and it's possible we get two different results. At least in this particular example that I'm thinking of (rpms), this was something which has popped up and was a bug, but I believe switching to VEX resolved it? Or, it's still a problem because we have to guess which repository is responsible for the found rpms (we try them all and see what happens)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it's not an issue for VEX because the specific repo isn't part of the query anymore, it's checked after the vulns are returned. But, it's still a little bit of a bear trap for future stuff.

vulnSet[rid] = make(map[string]struct{})
}
if _, ok := vulnSet[rid][v.ID]; !ok {
vulnSet[rid][v.ID] = struct{}{}
results[rid] = append(results[rid], v)
}
}
return nil
}()
if err != nil {
return nil, err
}
resCache[rq.query] = results[rid]
}
if err := res.Close(); err != nil {
return nil, fmt.Errorf("some weird batch error: %v", err)
Expand Down
Loading