-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dhcp-relay: Format of the option82 field #5
Comments
From RFC3046 section 2.1:
In the current example, DHCP option 82 is inserted as the first DHCP option. Adding it as the last option will require parsing of all options - is that feasible in XDP? |
yoelcaspersen ***@***.***> writes:
From [RFC3046 section 2.1](https://datatracker.ietf.org/doc/html/rfc3046#section-2.1):
> A DHCP relay agent adding a Relay Agent Information field SHALL
> add it as the last option (but before 'End Option' 255, if
> present) in the DHCP options field of any recognized BOOTP or DHCP
> packet forwarded from a client to a server.
In the current example, DHCP option 82 is inserted as the first DHCP
option. Adding it as the last option will require parsing of all
options - is that feasible in XDP?
Don't see why not; a bounded loop should be able to get through the
whole packet before running out of iterations...
|
I figure out how to loop through the DHCP options in 09bc685. As for the format of Option 82: I think both sub-options should contain ASCII only to ensure compatibility with most DHCP servers out there (writing non-ASCII characters in configuration files is going to be an issue for most sysadmins). That begs the question: How can we parse ASCII text in an efficient manner with eBPF? |
On 13 October 2021 21:14:12 CEST, Yoel Caspersen ***@***.***> wrote:
I figure out how to loop through the DHCP options in 09bc685.
As for the format of Option 82: I think both sub-options should contain ASCII only to ensure compatibility with most DHCP servers out there (writing non-ASCII characters in configuration files is going to be an issue for most sysadmins).
That begs the question: How can we parse ASCII text in an efficient manner with eBPF?
Efficient? Sure, just look at the bytes. Convenient? Not so much, there are no string handling functions available...
|
I suppose we could make a small reverse lookup table, so we can translate ASCII to actual byte values. Let's say we receive an ASCII text like: ens6f0.83.200 If we split by the dot character (0x2E), we are left with the interface name, outer and inner VLAN. Can we find the ifindex from interface name in XDP - or do we need to maintain a map for that? |
Another thought is to just do a route lookup on the address that the DHCP server is offering to the client and route the packet based on that?
We're missing the ability to get the VLANs from route lookups so the control plan will have to insert that into a map until we fix that...
|
I considered that possibility as well, but wouldn't that just give us the ifindex of the VLAN interface, when what we really need is the ifindex of the physical interface? I am starting to like the idea of a stateful DHCP relay... |
Yoel Caspersen ***@***.***> writes:
I considered that possibility as well, but wouldn't that just give us
the ifindex of the VLAN interface, when what we really need is the
ifindex of the physical interface?
Well, you can put anything you want into the map, so just include the
physical interface? :)
I am starting to like the idea of a stateful DHCP relay...
Why?
|
That is under the assumption that we actually have a map. In the current state, the DHCP relay works without knowing in advance which VLANs to expect or allow - as long as there are 2 VLAN tags, the relay will process the requests.
Because it would make things really simple. When a DHCP request arrives, we save the following information in a map:
When the response arrives from the server, we check the map and find all the stuff we need to forward the packet to the client - no extra VLAN config needed. However, making the relay stateful would require regular cleanup of the map from user space (one of the reasons I would prefer a stateless DHCP relay in the first place). |
Yoel Caspersen ***@***.***> writes:
However, making the relay stateful would require regular cleanup of
the map from user space (one of the reasons I would prefer a stateless
DHCP relay in the first place).
You could use an LRU type map where old entries are automatically
evicted when the map runs out of space. If you size the map reasonably,
you don't really need to do any explicit cleanup. There will be stale
entries kept around, but that doesn't really matter as long as you check
the timestamp before using them and clean up stale entries when you
encounter them (and as long as the map is sized to not exhaust
resources).
|
Could you point to an example of that? |
Yoel Caspersen ***@***.***> writes:
> You could use an LRU type map where old entries are automatically evicted when the map runs out of space
Could you point to an example of that?
Sure - I use the same approach in this example to avoid having to
garbage collect:
https://github.com/xdp-project/bpf-examples/blob/master/preserve-dscp/preserve_dscp_kern.c
|
Thanks, @tohojo. I have added state to the latest commit (ac9373c). Next issue: Building a proper Option 82 Circuit ID string including interface name and VLAN tags, e.g.:
I get the eth0 part from the user space program through a map, but I need to convert the VLAN tags to ASCII. In normal C, I would use sprintf:
but that doesn't fly with eBPF. Do you have an idea how we can convert the VLAN tags to ASCII in a somewhat efficient manner? |
I found a way to do it using the modulo operator - code is committed in a9003fa. It is not efficient, but it will allow us to move forward with the return path using real replies from DHCP servers. Let me know if you come up with a better approach than mine :) |
As a general point, if you want code review, please open pull requests. It's possible to comment on commits on github, but such discussions tend to disappear into the void afterwards, so it's not a very good place to do so. A few comments on that commit: * Device name is 16 characters long and prepended with dash, e.g.:
* ----ens6f0.83.20 Why the dashes? It's text, so just make it shorter? Also, is the interface name really the best identifier here? That means you'll have to update the DHCP server config if the BNG interface configuration changes. Why not put in a logical identifier instead (say "pop-XXX" or whatever makes sense)? for (c = VLAN_ASCII_MAX; c > 0; c--) {
buf[i--] = (inner_vlan % 10) + '0';
inner_vlan /= 10;
if (inner_vlan == 0) {
break;
}
} This is an obvious contender for a small helper function (since you're repeating it twice). for (c = IF_NAMESIZE - 1; c >= 0; c--) { Here you're looping up to IF_NAMESIZE, but that's the size of the buffer, which you already used up some of; so this is liable to write beyond the beginning of the buffer. |
Well, that was my first approach, but I ran into a few issues with that. I haven't found an elegant way to copy a variable number of bytes in XDP - the built-in memcpy takes a const length. Unless we make a workaround for writing a variable number of bytes, we'll need to keep option 82 length static.
That's also an issue with our ZTE setup. But you are right, configuring a logical identifier probably would make better sense.
Hence the check:
I am aware that both buffers are IF_NAMESIZE (16 bytes) long - under normal circumstances, the physical interface name should be no longer than 6 characters to allow VLAN IDs to be appended (each VLAN takes up to 4 ASCII characters + 1 character for a separating dot). Strictly speaking, there is no guarantee which naming scheme the administrator will use, so your idea to use a configurable logical identifier is probably better. We still end up in the same place, though: Without a way to copy a variable number of bytes, option 82 circuit ID will in most cases be longer than strictly necessary. |
Well, you basically implemented one - the code that copies the interface name into the buffer is a variable-length |
Sure, but I suppose |
"slightly" being the operative word, there... you're copying a few dozen bytes at most, so I doubt you'd notice the difference. You could optimise it to copy the data in 8-byte chunks (since instructions are 64 bits wide). This would look something like: void memcpy_var(void *to, void *from, __u16 len)
{
__u64 *t64 = to, *f64 = from;
__u8 *t8, *f8;
int i;
for (i = 0; i < 100 && len >= 8; i++, len -= 8)
*t64++ = *f64++;
t8 = (__u8*)t64;
f8 = (__u8*)f64;
for (i = 0; i < 7 && len > 0; i++, len--)
*t8++ = *f8++
} This is basically what the static version does (except it elides the loop checks and just inlines the copy instructions because it knows the size at compile time): |
I tried your approach, but the compiler chokes on the following lines: *t64++ = *f64++; and: *t8++ = *f8++;
After adding compiler flags -fno-builtin the code compiles. However, when I try to use memcpy_var() within the write_dhcp_option82() function:
I get a nasty error from the verifier:
Close... but not quite there yet. |
Yoel Caspersen ***@***.***> writes:
I tried your approach, but the compiler chokes on the following lines:
```c
*t64++ = *f64++;
```
and:
```c
*t8++ = *f8++
```
`A call to built-in function 'memcpy' is not supported.`
Ah, so this would be because the compiler realises it's a memcpy and
changes it.
After adding compiler flags -fno-builtin the code compiles.
However, when I try to use memcpy_var() within the write_dhcp_option82() function:
```
if (sizeof (option.circuit_id.val) == sizeof (buf)) {
memcpy_var(option.circuit_id.val, buf + i, sizeof (buf) - i);
}
```
I get a nasty error from the verifier:
`misaligned stack access off (0x0; 0x0)+-17+0 size 8`
Close... but not quite there yet.
Hmm, right, it would need to check for alignment as well. Not sure if we
can check values of pointers like that in BPF, but in any case for
copying a few dozen bytes like you're doing I'm not sure it's really
worth it. Simpler to just skip the first bit that does u64-based copy
and copy the data byte-by-byte...
|
I modified your function to:
Same error - misaligned stack access. |
Yoel Caspersen ***@***.***> writes:
> Simpler to just skip the first bit that does u64-based copy and copy the data byte-by-byte...
I modified your function to:
```
void memcpy_var(void *to, void *from, __u16 len) {
__u8 *t8 = to, *f8 = from;
int i;
for (i = 0; i < len; i++, len--)
*t8++ = *f8++;
}
```
Same error - misaligned stack access.
Hmm, that sounds like the compiler is being "helpful" and optimising the
access. Could you please post the full verifier log?
|
Sure, you'll find it here: |
Yoel Caspersen ***@***.***> writes:
> Hmm, that sounds like the compiler is being "helpful" and optimising the access. Could you please post the full verifier log?
Sure, you'll find it here:
[2021-10-21 Verifier log - BNG router.txt](https://github.com/xdp-project/BNG-router/files/7388539/2021-10-21.Verifier.log.-.BNG.router.txt)
Hmm, the line information from the log is this:
; case 8: jmp_8: __it_set(d, 64);
934: (7b) *(u64 *)(r10 -90) = r9
misaligned stack access off (0x0; 0x0)+0+-90 size 8
processed 811 insns (limit 1000000) max_states_per_insn 1 total_states 41 peak_states 41 mark_read 18
that __it_set() call looks like it's coming from __bpf_memzero(), so I
think maybe it's not due to the memcpy_var()?
|
It could be the memset() that follows the copy operation:
When I disable the memset() statement, the verifier accepts memcpy_var(), but memcpy_var() copies only the first 8 bytes of buf. When I enable it, the verifier complains about misaligned stack access - unless I replace memcpy_var() with memcpy(). Then it works as expected. |
I fixed this by removing And now I can copy part of the buffer:
But I still cannot initialize the option.remove_id.val with memset - the verifier complains about misaligned stack access:
If I remove the memset statement, the XDP program works, but I guess option.remote_id.val should be initialized - otherwise it will contain random stack memory, right? |
This is length is still constant at compile time, so you don't actually need memcpy_var() for it, and also you're not cutting off the unused part of the buffer (which was the point of this whole exercise, wasn't it? :)).
Could you please open a pull request with the entire code tidbit? It's a bit hard to tell what's what with these small snippets... |
My point was to show that the same code that works with memcpy() fails with memcpy_var() - although it should be doing exactly the same.
Done. I found out that I could build a custom memset_var() function that passes the verifier checks. I suppose it is less efficient, but it seems to work. |
Quote:
The text was updated successfully, but these errors were encountered: