-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
datastore: Add vuln & enrich stream updates #1315
Conversation
e08f257
to
e2c4a0f
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1315 +/- ##
==========================================
- Coverage 55.84% 55.76% -0.09%
==========================================
Files 266 266
Lines 16704 16759 +55
==========================================
+ Hits 9328 9345 +17
- Misses 6411 6445 +34
- Partials 965 969 +4 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comments on specific bits. I'd also prefer commits for Enrichment
and Vulnerability
changes individually.
datastore/postgres/enrichment.go
Outdated
"github.com/quay/claircore/datastore" | ||
"github.com/quay/zlog" | ||
|
||
"github.com/quay/claircore/libvuln/driver" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix grouping.
"github.com/quay/zlog" | ||
|
||
"github.com/quay/claircore" | ||
"github.com/quay/claircore/datastore" | ||
"github.com/quay/claircore/libvuln/driver" | ||
"github.com/quay/claircore/pkg/microbatch" | ||
"github.com/quay/zlog" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix grouping.
type deltaOpts struct { | ||
deletedVulns []string | ||
vulns []*claircore.Vulnerability | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dislike "options structs" as a style issue. I think a deleted Iter[string]
would be fine in the "internal" signature. I can harass @crozzy into helping bash that into shape -- just pull the "delta" changes into a commit that we can mess with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dislike "options structs" as a style issue.
What are the pain points, in your opinion?
I think a
deleted Iter[string]
would be fine in the "internal" signature.
The internal method needs all vulns upfront when delete=true
. It makes things a little less straightforward to change if I have to iterate over the vulns when delete=true
to handle the deletion steps first (they come first in the update logic) to decide then not to iterate if delete=false
and iterate if delete=false
.
The proposed patch seems the most straightforward way to achieve what we want; Considering we are sunsetting this interface sometime in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, missed the fact that the iterator for this particular call can be called twice. For the internal API I think this assumption is fair. I am going to update this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are the pain points, in your opinion?
They're generally used in APIs to retain source-level compatibility when adding API surface, but I think that's better implemented the way this package had it: add a new function that has the new arguments, then implement the "old" API on top of it. The other reason people seem to use them is to keep a function signature "manageable," but I find that means it's very easy to not notice when a function is gaining unique code paths. If a function has too many arguments, that's good signal of either irreducible complexity in the current design or a need to refactor the function, depending on whether refactoring is possible. Either way, an options struct sort of obscures what a 200-character long function signature makes plain.
I think they can be fine when an API really does want to make the user "fill out a form" or as an alternative to forcing a user to write boilerplate, but I don't think that applies here.
libvuln/jsonblob/jsonblob.go
Outdated
@@ -455,6 +530,16 @@ func (s *Store) DeltaUpdateVulnerabilities(ctx context.Context, updater string, | |||
return uuid.Nil, nil | |||
} | |||
|
|||
// UpdateEnrichmentsIter is unimplemented. | |||
func (s *Store) UpdateEnrichmentsIter(_ context.Context, _ string, _ driver.Fingerprint, _ datastore.EnrichmentIter) (uuid.UUID, error) { | |||
return uuid.Nil, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have this return something wrapping errors.ErrUnsupported
.
libvuln/jsonblob/jsonblob.go
Outdated
|
||
// UpdateVulnerabilitiesIter is unimplemented. | ||
func (s *Store) UpdateVulnerabilitiesIter(_ context.Context, _ string, _ driver.Fingerprint, _ datastore.VulnIter) (uuid.UUID, error) { | ||
return uuid.Nil, nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have this return something wrapping errors.ErrUnsupported
.
bb0672e
to
0feda8e
Compare
datastore/enrichment.go
Outdated
@@ -4,17 +4,22 @@ import ( | |||
"context" | |||
|
|||
"github.com/google/uuid" | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whitespace pedantry: this line should be here.
datastore/matcher_store.go
Outdated
// Iter is an iterator function that accepts a callback 'yield' to handle each | ||
// iterator item. The consumer can signal the iterator to break or retry by | ||
// returning an error. The iterator itself returns an error if the iteration | ||
// cannot continue or was interrupted unexpectedly. | ||
type Iter[T any] func(yield func(T) error) error | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to keep this aligned to the rangefunc
experiment syntax (which looks likely to land in Go 1.23), so I'd rather we remove the error returns:
// Iter is an iterator function that accepts a callback 'yield' to handle each | |
// iterator item. The consumer can signal the iterator to break or retry by | |
// returning an error. The iterator itself returns an error if the iteration | |
// cannot continue or was interrupted unexpectedly. | |
type Iter[T any] func(yield func(T) error) error | |
// Iter is an iterator function that accepts a callback 'yield' to handle each | |
// iterator item. The consumer can signal the iterator to break or retry by | |
// returning an error. The iterator itself returns an error if the iteration | |
// cannot continue or was interrupted unexpectedly. | |
type Iter[T any] func(yield func(T, error) bool) | |
Alternatively, if you feel strongly about always communicating the error, we could expand this a little:
// Iter is an iterator function that accepts a callback 'yield' to handle each | |
// iterator item. The consumer can signal the iterator to break or retry by | |
// returning an error. The iterator itself returns an error if the iteration | |
// cannot continue or was interrupted unexpectedly. | |
type Iter[T any] func(yield func(T) error) error | |
type Iter[T any] func(yield func(T, error) bool) | |
type IntoIter[T any] func(/*ctx?*/) (Iter[T], func() error) |
(This idea stolen from here.)
And then the using code needs to call the second function (and the compiler should complain if that doesn't happen) and the implementer just needs to keep an error
around to return for the second function to return.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, thanks for the context. Yielding errors suffice. I've updater the Iter
interface. Here's an example of what users could do using the jsonblob.Iter
:
count := 0
it := loader.NextIter()
ref, err = u.store.UpdateVulnerabilitiesIter(ctx, it.Updater, it.Fingerprint, func(yield
func(*claircore.Vulnerability, error) bool) {
for it.Next() && yield(it.Vulnerability, nil) {
count++
}
if loader.Err() != nil {
_ = yield(nil, loader.Err())
}
})
1a52476
to
81f00ae
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Datastore changes look good, especially once the restriction on the iterator needing to restart is documented.
The jsonblob changes might make more sense if implemented on a new type, instead of having already tricksy code now have two access patterns intertwined.
// UpdateVulnerabilitiesIter performs the same operation as | ||
// UpdateVulnerabilities, but accepting an iterator function. | ||
UpdateVulnerabilitiesIter(ctx context.Context, updater string, fingerprint driver.Fingerprint, vulnIter VulnerabilityIter) (uuid.UUID, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// UpdateVulnerabilitiesIter performs the same operation as | |
// UpdateVulnerabilities, but accepting an iterator function. | |
UpdateVulnerabilitiesIter(ctx context.Context, updater string, fingerprint driver.Fingerprint, vulnIter VulnerabilityIter) (uuid.UUID, error) | |
// UpdateVulnerabilitiesIter performs the same operation as | |
// UpdateVulnerabilities, but accepting an iterator function. | |
// | |
// The passed iterator must be able to iterate over the sequence multiple times. | |
UpdateVulnerabilitiesIter(ctx context.Context, updater string, fingerprint driver.Fingerprint, vulnIter VulnerabilityIter) (uuid.UUID, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// The passed iterator must be able to iterate over the sequence multiple times.
That's not true for UpdateVulnerabilitiesIter
. It would be for a potential DeltaUpdateVulnerabilitiesIter
, but this patch does not aim to support that.
libvuln/jsonblob/jsonblob.go
Outdated
@@ -117,6 +144,54 @@ func (l *Loader) Entry() *Entry { | |||
return l.e | |||
} | |||
|
|||
// Iter returns an iterator for read all objects for the current update operation. | |||
func (l *Loader) Iter() *iter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't return an unexported type.
It seems like it should be returning the equivalent of iter.Seq2[Entry, err]
, documenting that the returned Entry
objects contain a single Enrichment
or Vulnerability
, and calling (*Loader).Iter()
is mutually exclusive with calling (*Loader).Entry()
.
Alternatively, maybe just adding a StreamingLoader
or IterLoader
with a better/different interface instead of intertwining this with Loader
would prevent the possibility of misuse. It could even return something else instead of an Entry
to avoid the subtlety of that type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This shouldn't return an unexported type.
My goal is to prevent code from outside the package to create iter
objects. I need clarification of why this should not be an unexported type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively, maybe just adding a StreamingLoader or IterLoader with a better/different interface instead of intertwining this with Loader would prevent the possibility of misuse.
That's a valid point.
Considering the Loader
already uses the Next()/Entry()/Err()
pattern, and considering we don't want to break backward compatibility, it seems less effort to continue using the Loader
internals vs. a new type. We would want the existing Loader
to use the StreamingLoader
anyway (if not, it's one additional code path for this, and I would rather avoid).
So we have the current proposal, where the Loader
supports two iteration patterns:
- Calling
Next()
+Entry()
to retrieve batches per updater. - Calling
NextIter()
+Iter()
to retrieve individual items per updater.
Or your proposal:
It seems like it should be returning the equivalent of iter.Seq2[Entry, err], documenting that the returned Entry objects contain a single Enrichment or Vulnerability, and calling (*Loader).Iter() is mutually exclusive with calling (*Loader).Entry().
Let's ignore rangefunc
for now -- which I don't think it worth doing anyway.
That option implies we would reuse Next()
for both (1.) and (2.) in the previous proposal. It lets the user call Next()
once and decide if either return the whole batch with Entry()
or an iterator using Iter()
. But, that's not possible unless I change the contract of Entry()
dictating the data is already available when Next() == true
. I would have to either return nil
(which I don't think the current contract allows) or partial data (which could result in misuse before the caller checks things failed calling Loader.Err()
).
A third option is to pass an option to the Loader
, which would change the behavior of Next()
and Entry()
to be of NextIter()
and Iter()
(we would reuse Entry
to iterate over individual items instead of populating the slices). That's convoluted.
So, the two iterator patterns (the current approach) balance out the need for functionality, minimal changes, and test coverage. I want to maintain backward compatibility, avoid regressions, and add a new access pattern, at the expense of adding two access patterns to the Loader
API. But I think that's OK because that API will die soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Quoting myself:
Let's ignore rangefunc for now -- which I don't think is worth doing anyway.
I lied. I played a bit with the idea today. The implementation turned out simpler, albeit a bit unusual, because I went ahead and used nested iterators. @hdonnay, let me know if you would prefer to pursue that instead d2531cb
By the way, we would have to keep the previous Next()/Entry()/Err()
implementation as is for now, so there would be two paths for the same thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
} | ||
} | ||
delVulns := func(yield func(string, error) bool) { | ||
for _, s := range deletedVulns { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
other implementations above used the index i
instead. Curious why the change here?
} | ||
|
||
func (s *MatcherStore) updateVulnerabilities(ctx context.Context, updater string, fingerprint driver.Fingerprint, vulns []*claircore.Vulnerability, deletedVulns []string, delta bool) (uuid.UUID, error) { | ||
func (s *MatcherStore) updateVulnerabilities(ctx context.Context, updater string, fingerprint driver.Fingerprint, vulnIter datastore.VulnerabilityIter, delIter datastore.Iter[string]) (uuid.UUID, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delIter
is a bit confusing to me because it makes me think it's short for delta
instead of delete
. Can we make it more explicit and call it deleteIter
?
libvuln/jsonblob/jsonblob.go
Outdated
// | ||
// Users are expected to call [*iter.Next()] to read and parse the next object, | ||
// which can be retrieved using [iter.Vulnerability] or [iter.Enrichment] based | ||
// on the update kind. Errors are reported in the [Loader.Err]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think other comments use a single space after a period
@@ -139,6 +167,7 @@ func (s *MatcherStore) updateVulnerabilities(ctx context.Context, updater string | |||
return uuid.Nil, fmt.Errorf("failed to create update_operation: %w", err) | |||
} | |||
|
|||
delta := delIter != nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this used?
@@ -181,18 +210,20 @@ func (s *MatcherStore) updateVulnerabilities(ctx context.Context, updater string | |||
} | |||
|
|||
if len(oldVulns) > 0 { | |||
for _, v := range vulns { | |||
vulnIter(func(v *claircore.Vulnerability, _ error) bool { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should this and delIter
make assumptions about the passed in err
that it's safe to ignore?
fb99454
to
4895118
Compare
Signed-off-by: J. Victor Martins <[email protected]>
Signed-off-by: J. Victor Martins <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We talked a little on Slack and decided split out the "provider" part of jsonblob
.
All this LGTM.
/fast-forward |
Add iterator-based vulnerability/enrichment writes to the existing DB APIs.