-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cleanup(engines): detach per-cpu kernel metrics from global kernel metrics #2031
cleanup(engines): detach per-cpu kernel metrics from global kernel metrics #2031
Conversation
Please double check driver/API_VERSION file. See versioning. /hold |
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #2031 +/- ##
=======================================
Coverage 74.30% 74.30%
=======================================
Files 253 253
Lines 30966 30966
Branches 5397 5400 +3
=======================================
Hits 23010 23010
- Misses 7932 7946 +14
+ Partials 24 10 -14
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Perf diff from master - unit tests
Heap diff from master - unit tests
Heap diff from master - scap file
Benchmarks diff from master
|
userspace/libpman/src/stats.c
Outdated
* The following `if` handle the case in which we want to get the metrics per CPU but not the global ones. | ||
* It is an unsual case but at the moment we support it. | ||
*/ | ||
if ((flags & METRICS_V2_KERNEL_COUNTERS_PER_CPU) && !(flags & METRICS_V2_KERNEL_COUNTERS)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say that if KERNEL_COUNTERS are disabled, KERNEL_COUNTERS_PER_CPU must be disabled too!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see 3 options:
- handling the 2 stats separately (so looping 2 times among CPUs if both are enabled)
- handling the 2 stats together so
METRICS_V2_KERNEL_COUNTERS_PER_CPU
can be enabled only ifMETRICS_V2_KERNEL_COUNTERS
is enabled. We create a dependecy - like case 1 but we have a duplicated logic that allow us to loop just once if both metrics are enabled (implemented in this PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd go with 2, easier and expected since both metric flags share same prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep makes sense, we just need to put somewhere a log that warns the user if it enables a flag without the other
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also say that if only METRICS_V2_KERNEL_COUNTERS_PER_CPU
is passed, we silently enable METRICS_V2_KERNEL_COUNTERS
too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can also say that if only METRICS_V2_KERNEL_COUNTERS_PER_CPU is passed, we silently enable METRICS_V2_KERNEL_COUNTERS too.
Great idea, i will go for it!
/milestone 0.18.0 |
5827698
to
35f2152
Compare
35f2152
to
e3442fc
Compare
@@ -193,8 +193,8 @@ TEST(kmod, metrics_v2_check_per_CPU_stats) | |||
|
|||
ssize_t num_online_CPUs = sysconf(_SC_NPROCESSORS_ONLN); | |||
|
|||
// We want to check our CPUs counters | |||
uint32_t flags = METRICS_V2_KERNEL_COUNTERS; | |||
// Enabling `METRICS_V2_KERNEL_COUNTERS_PER_CPU` we also enable `METRICS_V2_KERNEL_COUNTERS` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should definitely unify the test for the 3 engines in some way because we are copying and pasting the same code 3 times for all the tests, in the end, the interface is the same... BTW I'm not doing it in this PR :/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree!
userspace/libscap/scap.c
Outdated
@@ -302,6 +302,12 @@ int32_t scap_get_stats(scap_t* handle, scap_stats* stats) | |||
// | |||
const struct metrics_v2* scap_get_stats_v2(scap_t* handle, uint32_t flags, uint32_t* nstats, int32_t* rc) | |||
{ | |||
// If we enable per-cpu counters, we also enable kenrel global counters by default. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as suggested by @FedeDP
Left a minor suggestion, otherwise LGTM! |
We should be fine now :) |
There is a "tmp" commit still 😄 |
Signed-off-by: Andrea Terzolo <[email protected]>
Signed-off-by: Andrea Terzolo <[email protected]>
… enabled Signed-off-by: Andrea Terzolo <[email protected]>
Signed-off-by: Andrea Terzolo <[email protected]> Co-authored-by: Federico Di Pierro <[email protected]>
e85976c
to
2300299
Compare
reworded it, thank you! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
LGTM label has been added. Git tree hash: 8b865d3317e02cb4c9dddb77f2c8fda7cfae22d6
|
Signed-off-by: Andrea Terzolo <[email protected]> Co-authored-by: Melissa Kilby <[email protected]>
Signed-off-by: Andrea Terzolo <[email protected]> Co-authored-by: Melissa Kilby <[email protected]>
switch(stat) | ||
{ | ||
case RUN_CNT: | ||
strlcat(stats[offset].name, bpf_libbpf_stats_names[RUN_CNT], sizeof(stats[offset].name)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Follow up here: Shouldn't this stay here, because we concat the name according to the switch statement?
Also we seem to use stat for the loop and here for the switch statement. Perhaps let's use separate wording for clarity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea is to call strlcat
just once with the generic variable stat
strlcat(stats[offset].name, bpf_libbpf_stats_names[stat], sizeof(stats[offset].name));
instead of triplicating the same line 3 times using an explicit enum
strlcat(stats[offset].name, bpf_libbpf_stats_names[RUN_CNT], sizeof(stats[offset].name));
strlcat(stats[offset].name, bpf_libbpf_stats_names[RUN_TIME_NS], sizeof(stats[offset].name));
strlcat(stats[offset].name, bpf_libbpf_stats_names[AVG_TIME_NS], sizeof(stats[offset].name));
Also we seem to use stat for the loop and here for the switch statement. Perhaps let's use separate wording for clarity?
I am not sure I got this, we are using stat
(the index of the array) in the switch case to select the right metric
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looked again and yes stat is the index to the bpf_libbpf_stats ... I suppose a big confusion with stat and offset. Thanks for clarifying and also working on this.
@@ -1849,22 +1856,20 @@ const struct metrics_v2* scap_bpf_get_stats_v2(struct scap_engine_handle engine, | |||
{ | |||
strlcpy(stats[offset].name, info.name, METRIC_NAME_MAX); | |||
} | |||
strlcat(stats[offset].name, bpf_libbpf_stats_names[stat], sizeof(stats[offset].name)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] re comment here https://github.com/falcosecurity/libs/pull/2031/files#diff-12833abd4271488260dae0ba178c6ad3f0bc63642f793a20b06ab4eb10d02cf9L1839 libbpf stats were introduced w/ kernel 5.1 so folks with lower kernels can't reach this code since we check for libbpf stats being enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right but since usually many bpf features are backported I'm not so confident in removing it... I found this commit 957ab1c, unfortunately, I don't remember why I added it but i bet I had found an issue on some old machines...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair, yes the backports.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
LGTM label has been added. Git tree hash: 667ee5b9ecfd8e1c9ef82047cc3269bfe7a37aac
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/approve
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Andreagit97, FedeDP, incertum The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/unhold |
What type of PR is this?
/kind cleanup
Any specific area of the project related to this PR?
/area libscap-engine-bpf
/area libscap-engine-kmod
/area libscap-engine-modern-bpf
Does this PR require a change in the driver versions?
No
What this PR does / why we need it:
As explained in issue #2028, it is better to split the per-CPU counters from the global counters for verbosity reasons.
More in detail when the per-CPU counters are enabled, libscap under the hood also enables the global counters. This is done to avoid a double loop over all the CPUs and to keep the code simpler without duplications. The idea behind this choice is that usually, a user should enable the per-cpu stats to obtain more insights with respect to the global ones so the global ones should be already enabled...
Which issue(s) this PR fixes:
Fixes #2028
Special notes for your reviewer:
Does this PR introduce a user-facing change?: