Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ct: add support for mark and labels #434

Open
wants to merge 4 commits into
base: main
Choose a base branch
from
Open

ct: add support for mark and labels #434

wants to merge 4 commits into from

Conversation

atenart
Copy link
Contributor

@atenart atenart commented Oct 18, 2024

RFC as I don't want this to be merged before the bindgen PR (#419); this will need to be rebased.

A test image is available and can be used as follow:

$ RETIS_TAG=ct-mark-label ./retis_in_container.sh collect -c ct ...

Labels can be set for testing, eg:

$ iptables -A INPUT -m connlabel --label 42 --set

A few questions / remarks:

  • I took the opportunity to make the parent ct information to be printed on a newline. Is that understandable?
  • Both the mark and labels are retrieved for the base and the parent nf_conn.
  • Labels are a bitmask; I chose to print them as an hex value to save some space. Is that fine?

@igsilya can you help testing to make sure this looks good? Thanks!

Close #407.

@atenart atenart added the run-functional-tests Request functional tests to be run by CI label Oct 18, 2024
@atenart atenart added this to the v1.5 milestone Oct 18, 2024
retis-events/src/ct.rs Outdated Show resolved Hide resolved
Comment on lines +57 to +63
/// u128 representation in the events. We can't use the Rust primitive as serde
/// does not handle the type well.
#[event_type]
pub struct U128 {
hi: u64,
lo: u64,
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Due to serde limitations I chose to represent u128 as the above. Another way would be to use something like U128(u128) and represent it as a string in JSON.

@atenart atenart changed the title [RFC] ct: add support for mark and labels ct: add support for mark and labels Oct 31, 2024
@igsilya
Copy link

igsilya commented Nov 8, 2024

Feels like I'm getting some random garbage instead of labels. Every time I run the test they are different or not show up at all. For example:

1326209075534 (12) [swapper/12] 0 [tp] openvswitch:ovs_do_execute_action #134c83864ebffff98c3d24a6400 (skb ffff98c3c2e21200) n 0
  if 97 (ovs-p3) rxif 97 10.1.1.4.80 > 10.1.1.3.46922 ttl 64 tos 0x0 id 0 off 0 [DF] len 60 proto TCP (6) flags [S.] seq 769399365 ack 3766449066 win 65160
  exec ct zone 0
  ↳ 1326209079076 (12) [swapper/12] 0 [tp] openvswitch:ovs_do_execute_action #134c83864ebffff98c3d24a6400 (skb ffff98c3c2e21200) n 1
      if 97 (ovs-p3) rxif 97 10.1.1.4.80 > 10.1.1.3.46922 ttl 64 tos 0x0 id 0 off 0 [DF] len 60 proto TCP (6) flags [S.] seq 769399365 ack 3766449066 win 65160
      exec recirc 0x2
      ct_state REPLY tcp (SYN_RECV) orig [10.1.1.3.46922 > 10.1.1.4.80] reply [10.1.1.4.80 > 10.1.1.3.46922] zone 0 mark 0
  ↳ 1326209086216 (12) [swapper/12] 0 [tp] skb:kfree_skb #134c83864ebffff98c3d24a6400 (skb ffff98c3c2e21200) n 2 drop (reason openvswitch/OVS_DROP_EXPLICIT)
      if 97 (ovs-p3) rxif 97 10.1.1.4.80 > 10.1.1.3.46922 ttl 64 tos 0x0 id 0 off 0 [DF] len 60 proto TCP (6) flags [S.] seq 769399365 ack 3766449066 win 65160
      ct_state REPLY tcp (SYN_RECV) orig [10.1.1.3.46922 > 10.1.1.4.80] reply [10.1.1.4.80 > 10.1.1.3.46922] zone 0 mark 0 labels 0x64766564752d646d65

It supposed to be the same ct entry, but the labels are not shown the first time. And the 0x64766564752d646d65 looks suspiciously like an ASCII text. :)

The test that produced this is using either 0x2 or 0xa000d000005000001 as a label and the test passes, so these values are there.

@igsilya
Copy link

igsilya commented Nov 8, 2024

For the formatting of the parent entry, it looks like this:

1974207113815 (0) [python3] 15686 [tp] openvswitch:ovs_do_execute_action #1cba7e9c346ffff98c3d19d0f00 (skb ffff98c3f246eae8) n 0
  if 106 (ovs-p1) rxif 106 10.1.1.2.45333 > 10.1.1.9.39971 ttl 64 tos 0x0 id 29108 off 0 [DF] len 60 proto TCP (6) flags [S] seq 359608549 win 64240
  exec ct zone 1 nat
  ↳ 1974207122891 (0) [python3] 15686 [tp] openvswitch:ovs_do_execute_action #1cba7e9c346ffff98c3d19d0f00 (skb ffff98c3f246eae8) n 1
      if 106 (ovs-p1) rxif 106 10.1.1.2.45333 > 10.1.1.9.39971 ttl 64 tos 0x0 id 29108 off 0 [DF] len 60 proto TCP (6) flags [S] seq 359608549 win 64240
      exec recirc 0x2
      ct_state RELATED tcp (SYN_SENT) orig [10.1.1.2.45333 > 10.1.1.9.39971] reply [10.1.1.1.39971 > 10.1.1.2.45333] zone 1 mark 0
      parent [tcp (ESTABLISHED) orig [10.1.1.1.56222 > 10.1.1.2.21] reply [10.1.1.2.21 > 10.1.1.9.56222] zone 1 mark 0]
  ↳ 1974207131313 (0) [python3] 15686 [tp] openvswitch:ovs_dp_upcall #1cba7e9c346ffff98c3d19d0f00 (skb ffff98c3f246eae8) n 2
      if 106 (ovs-p1) rxif 106 10.1.1.2.45333 > 10.1.1.9.39971 ttl 64 tos 0x0 id 29108 off 0 [DF] len 60 proto TCP (6) flags [S] seq 359608549 win 64240
      upcall (miss) port 2548630282 cpu 0
      ct_state RELATED tcp (SYN_SENT) orig [10.1.1.2.45333 > 10.1.1.9.39971] reply [10.1.1.1.39971 > 10.1.1.2.45333] zone 1 mark 0
      parent [tcp (ESTABLISHED) orig [10.1.1.1.56222 > 10.1.1.2.21] reply [10.1.1.2.21 > 10.1.1.9.56222] zone 1 mark 0]
  ↳ 1974207149639 (0) [python3] 15686 [kr] ovs_dp_upcall #1cba7e9c346ffff98c3d19d0f00 (skb ffff98c3f246eae8) n 3
      if 106 (ovs-p1) rxif 106 10.1.1.2.45333 > 10.1.1.9.39971 ttl 64 tos 0x0 id 29108 off 0 [DF] len 60 proto TCP (6) flags [S] seq 359608549 win 64240
      upcall_ret (0/1974207131313) ret 0
      ct_state RELATED tcp (SYN_SENT) orig [10.1.1.2.45333 > 10.1.1.9.39971] reply [10.1.1.1.39971 > 10.1.1.2.45333] zone 1 mark 0
      parent [tcp (ESTABLISHED) orig [10.1.1.1.56222 > 10.1.1.2.21] reply [10.1.1.2.21 > 10.1.1.9.56222] zone 1 mark 0]

It might be better if the parent was indented further to the right, so it better aligns with the current connection and that it looks more like part of the ct_state and not something separate from it. It doesn't have to align perfectly, and it is not possible unless we align parts of the message internally, but it might make sense to at least shift the whole thing len("ct_state ") characters to the right. WDYT?

  ↳ 1974207149639 (0) [python3] 15686 [kr] ovs_dp_upcall #1cba7e9c346ffff98c3d19d0f00 (skb ffff98c3f246eae8) n 3
      if 106 (ovs-p1) rxif 106 10.1.1.2.45333 > 10.1.1.9.39971 ttl 64 tos 0x0 id 29108 off 0 [DF] len 60 proto TCP (6) flags [S] seq 359608549 win 64240
      upcall_ret (0/1974207131313) ret 0
      ct_state RELATED tcp (SYN_SENT) orig [10.1.1.2.45333 > 10.1.1.9.39971] reply [10.1.1.1.39971 > 10.1.1.2.45333] zone 1 mark 0
               parent [tcp (ESTABLISHED) orig [10.1.1.1.56222 > 10.1.1.2.21] reply [10.1.1.2.21 > 10.1.1.9.56222] zone 1 mark 0]

@atenart
Copy link
Contributor Author

atenart commented Nov 14, 2024

Feels like I'm getting some random garbage instead of labels. Every time I run the test they are different or not show up at all. For example:

1326209075534 (12) [swapper/12] 0 [tp] openvswitch:ovs_do_execute_action #134c83864ebffff98c3d24a6400 (skb ffff98c3c2e21200) n 0
  if 97 (ovs-p3) rxif 97 10.1.1.4.80 > 10.1.1.3.46922 ttl 64 tos 0x0 id 0 off 0 [DF] len 60 proto TCP (6) flags [S.] seq 769399365 ack 3766449066 win 65160
  exec ct zone 0
  ↳ 1326209079076 (12) [swapper/12] 0 [tp] openvswitch:ovs_do_execute_action #134c83864ebffff98c3d24a6400 (skb ffff98c3c2e21200) n 1
      if 97 (ovs-p3) rxif 97 10.1.1.4.80 > 10.1.1.3.46922 ttl 64 tos 0x0 id 0 off 0 [DF] len 60 proto TCP (6) flags [S.] seq 769399365 ack 3766449066 win 65160
      exec recirc 0x2
      ct_state REPLY tcp (SYN_RECV) orig [10.1.1.3.46922 > 10.1.1.4.80] reply [10.1.1.4.80 > 10.1.1.3.46922] zone 0 mark 0
  ↳ 1326209086216 (12) [swapper/12] 0 [tp] skb:kfree_skb #134c83864ebffff98c3d24a6400 (skb ffff98c3c2e21200) n 2 drop (reason openvswitch/OVS_DROP_EXPLICIT)
      if 97 (ovs-p3) rxif 97 10.1.1.4.80 > 10.1.1.3.46922 ttl 64 tos 0x0 id 0 off 0 [DF] len 60 proto TCP (6) flags [S.] seq 769399365 ack 3766449066 win 65160
      ct_state REPLY tcp (SYN_RECV) orig [10.1.1.3.46922 > 10.1.1.4.80] reply [10.1.1.4.80 > 10.1.1.3.46922] zone 0 mark 0 labels 0x64766564752d646d65

It supposed to be the same ct entry, but the labels are not shown the first time. And the 0x64766564752d646d65 looks suspiciously like an ASCII text. :)

The test that produced this is using either 0x2 or 0xa000d000005000001 as a label and the test passes, so these values are there.

Thanks for testing! I believe there are two issues:

  • The garbage value being reported is an issue in how we report labels from BPF to userspace: if no label needs to be reported the data is not initialized to 0 and garbage is reported and parsed as valid labels. This is easy to fix.
  • It seems the BPF side fails to identify when labels are available and only report them in some cases. I'll investigate[1].

(The above is probably a combination of the two issues).

Also there is the issue reported in #445.

[Edit] [1] I think the issue is the NF_CT_EXT_LABELS value is not stable in various kernel configurations as it depends on the kernel config. We can't use it as-is in the BPF code and need to use CO-RE.

@atenart
Copy link
Contributor Author

atenart commented Nov 14, 2024

It might be better if the parent was indented further to the right, so it better aligns with the current connection and that it looks more like part of the ct_state and not something separate from it. It doesn't have to align perfectly, and it is not possible unless we align parts of the message internally, but it might make sense to at least shift the whole thing len("ct_state ") characters to the right. WDYT?

  ↳ 1974207149639 (0) [python3] 15686 [kr] ovs_dp_upcall #1cba7e9c346ffff98c3d19d0f00 (skb ffff98c3f246eae8) n 3
      if 106 (ovs-p1) rxif 106 10.1.1.2.45333 > 10.1.1.9.39971 ttl 64 tos 0x0 id 29108 off 0 [DF] len 60 proto TCP (6) flags [S] seq 359608549 win 64240
      upcall_ret (0/1974207131313) ret 0
      ct_state RELATED tcp (SYN_SENT) orig [10.1.1.2.45333 > 10.1.1.9.39971] reply [10.1.1.1.39971 > 10.1.1.2.45333] zone 1 mark 0
               parent [tcp (ESTABLISHED) orig [10.1.1.1.56222 > 10.1.1.2.21] reply [10.1.1.2.21 > 10.1.1.9.56222] zone 1 mark 0]

IMO we should avoid to shift things more, we already have events (in series) and their sections. Maybe instead we can use a special char to start the parent line? is already used but we can find something else. If no better idea \ could do (and if in the future a section uses more than 2 lines, we could combine it with |).

  ↳ 1974207149639 (0) [python3] 15686 [kr] ovs_dp_upcall #1cba7e9c346ffff98c3d19d0f00 (skb ffff98c3f246eae8) n 3
      if 106 (ovs-p1) rxif 106 10.1.1.2.45333 > 10.1.1.9.39971 ttl 64 tos 0x0 id 29108 off 0 [DF] len 60 proto TCP (6) flags [S] seq 359608549 win 64240
      upcall_ret (0/1974207131313) ret 0
      ct_state RELATED tcp (SYN_SENT) orig [10.1.1.2.45333 > 10.1.1.9.39971] reply [10.1.1.1.39971 > 10.1.1.2.45333] zone 1 mark 0
      \ parent [tcp (ESTABLISHED) orig [10.1.1.1.56222 > 10.1.1.2.21] reply [10.1.1.2.21 > 10.1.1.9.56222] zone 1 mark 0]

WDYT?

As it contains helpers.

Signed-off-by: Antoine Tenart <[email protected]>
While the mark is an optional feature that can be compiled out, let's
not use a dedicated event section for just an u32. Instead we detect if
the mark value should be seen as valid in the Rust side while processing
the raw event.

Signed-off-by: Antoine Tenart <[email protected]>
While labels are an optional feature that can be compiled out, let's not
use a dedicated event section for just 128 bits. Instead we detect if
the labels value should be seen as valid in the Rust side while
processing the raw event. Also 0 means there is no label.

Signed-off-by: Antoine Tenart <[email protected]>
@igsilya
Copy link

igsilya commented Nov 14, 2024

IMO we should avoid to shift things more, we already have events (in series) and their sections. Maybe instead we can use a special char to start the parent line?

I like how the protocol and reply align in the shifted version, and it's easier to compare IPs this way. But I won't insist. Having a special character seems better than not having it.

@atenart
Copy link
Contributor Author

atenart commented Nov 14, 2024

@igsilya the various issues should be fixed now. I refreshed the container image.

@igsilya
Copy link

igsilya commented Nov 14, 2024

Updated version seems to work:

25320889831 (24) [python3] 2994 [tp] openvswitch:ovs_do_execute_action #5e53dfd42ffff90be14a28280 (skb ffff90be0ac7e0e8) n 0
  if 10 (ovs-p1) rxif 10 10.1.1.2.59825 > 10.1.1.9.44731 ttl 64 tos 0x0 id 29302 off 0 [DF] len 1439 proto TCP (6) flags [P.] seq 2695371793:2695373180 ack 848702181 win 502
  exec ct zone 1 nat
  ↳ 25320894171 (24) [python3] 2994 [tp] openvswitch:ovs_do_execute_action #5e53dfd42ffff90be14a28280 (skb ffff90be0ac7e0e8) n 1
      if 10 (ovs-p1) rxif 10 10.1.1.2.59825 > 10.1.1.1.44731 ttl 64 tos 0x0 id 29302 off 0 [DF] len 1439 proto TCP (6) flags [P.] seq 2695371793:2695373180 ack 848702181 win 502
      exec recirc 0x2
      ct_state ESTABLISHED tcp (ESTABLISHED) orig [10.1.1.2.59825 > 10.1.1.9.44731] reply [10.1.1.1.44731 > 10.1.1.2.59825] zone 1 mark 0 labels 0x4d2000000000000000000000001
      \ parent [tcp (ESTABLISHED) orig [10.1.1.1.39854 > 10.1.1.2.21] reply [10.1.1.2.21 > 10.1.1.9.39854] zone 1 mark 0 labels 0x4d2000000000000000000000001]
  ↳ 25320900071 (24) [python3] 2994 [tp] openvswitch:ovs_do_execute_action #5e53dfd42ffff90be14a28280 (skb ffff90be0ac7e0e8) n 2
      if 10 (ovs-p1) rxif 10 10.1.1.2.59825 > 10.1.1.1.44731 ttl 64 tos 0x0 id 29302 off 0 [DF] len 1439 proto TCP (6) flags [P.] seq 2695371793:2695373180 ack 848702181 win 502
      [recirc_id 0x2] exec oport 2
      ct_state ESTABLISHED tcp (ESTABLISHED) orig [10.1.1.2.59825 > 10.1.1.9.44731] reply [10.1.1.1.44731 > 10.1.1.2.59825] zone 1 mark 0 labels 0x4d2000000000000000000000001
      \ parent [tcp (ESTABLISHED) orig [10.1.1.1.39854 > 10.1.1.2.21] reply [10.1.1.2.21 > 10.1.1.9.39854] zone 1 mark 0 labels 0x4d2000000000000000000000001]

Tested with RHEL 9.4 stock kernel and 6.10-rc6 upstream kernel. I no longer see any garbage data and the labels are the labels match what OVS has in OpenFlow rules. Also works without kenrel config available.

@atenart
Copy link
Contributor Author

atenart commented Nov 14, 2024

Tested with RHEL 9.4 stock kernel and 6.10-rc6 upstream kernel. I no longer see any garbage data and the labels are the labels match what OVS has in OpenFlow rules. Also works without kenrel config available.

Thanks for testing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run-functional-tests Request functional tests to be run by CI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[feature] Print out ct mark and labels
2 participants