dhcp-relay: Format of the option82 field #5

tohojo · 2021-08-18T13:10:32Z

Quote:

Anyway, I think it is time to consult the RFC (
https://tools.ietf.org/rfc/rfc3046.txt), more specifically section 3.1 and
3.2, which defines the content of the sub options in Option 82.

In my view, Agent Circuit ID should contain the following, as it defines
the circuit:

Interface name

Outer VLAN

Inner VLAN

Preferably in ASCII format to ensure compatibility with as many DHCP
servers as possible.

For inspiration, our ZTE BNGs use the following format:

eth 0/0/5:30.3611 0/0/0/0/0/0

where 0/0/5 is the interface name, 30 is the outer VLAN and 3611 is the
inner VLAN. I don't know why they append " 0/0/0/0/0/0".

Agent Remote ID should not contain VLAN tags as this sub option is used to
identify the remote host. It could contain the MAC address of the client,
but that would be a bit redundant. In our current setup, we don't use Agent
Remote ID at all.

I also think GIADDR should be set - preferably as a configurable option.

yoelcaspersen · 2021-10-03T04:42:30Z

From RFC3046 section 2.1:

A DHCP relay agent adding a Relay Agent Information field SHALL add it as the last option (but before 'End Option' 255, if present) in the DHCP options field of any recognized BOOTP or DHCP packet forwarded from a client to a server.

In the current example, DHCP option 82 is inserted as the first DHCP option. Adding it as the last option will require parsing of all options - is that feasible in XDP?

tohojo · 2021-10-05T16:49:38Z

yoelcaspersen ***@***.***> writes:

From [RFC3046 section 2.1](https://datatracker.ietf.org/doc/html/rfc3046#section-2.1): > A DHCP relay agent adding a Relay Agent Information field SHALL > add it as the last option (but before 'End Option' 255, if > present) in the DHCP options field of any recognized BOOTP or DHCP > packet forwarded from a client to a server. In the current example, DHCP option 82 is inserted as the first DHCP option. Adding it as the last option will require parsing of all options - is that feasible in XDP?

Don't see why not; a bounded loop should be able to get through the whole packet before running out of iterations...

yoelcaspersen · 2021-10-13T19:14:01Z

I figure out how to loop through the DHCP options in 09bc685.

As for the format of Option 82: I think both sub-options should contain ASCII only to ensure compatibility with most DHCP servers out there (writing non-ASCII characters in configuration files is going to be an issue for most sysadmins).

That begs the question: How can we parse ASCII text in an efficient manner with eBPF?

tohojo · 2021-10-13T19:23:40Z

On 13 October 2021 21:14:12 CEST, Yoel Caspersen ***@***.***> wrote: I figure out how to loop through the DHCP options in 09bc685. As for the format of Option 82: I think both sub-options should contain ASCII only to ensure compatibility with most DHCP servers out there (writing non-ASCII characters in configuration files is going to be an issue for most sysadmins). That begs the question: How can we parse ASCII text in an efficient manner with eBPF?

Efficient? Sure, just look at the bytes. Convenient? Not so much, there are no string handling functions available...

yoelcaspersen · 2021-10-13T19:37:39Z

Efficient? Sure, just look at the bytes. Convenient? Not so much, there are no string handling functions available...

I suppose we could make a small reverse lookup table, so we can translate ASCII to actual byte values.

Let's say we receive an ASCII text like:

ens6f0.83.200

If we split by the dot character (0x2E), we are left with the interface name, outer and inner VLAN. Can we find the ifindex from interface name in XDP - or do we need to maintain a map for that?

tohojo · 2021-10-13T19:48:26Z

Another thought is to just do a route lookup on the address that the DHCP server is offering to the client and route the packet based on that? We're missing the ability to get the VLANs from route lookups so the control plan will have to insert that into a map until we fix that...

yoelcaspersen · 2021-10-13T19:55:14Z

I considered that possibility as well, but wouldn't that just give us the ifindex of the VLAN interface, when what we really need is the ifindex of the physical interface?

I am starting to like the idea of a stateful DHCP relay...

tohojo · 2021-10-13T20:57:34Z

Yoel Caspersen ***@***.***> writes:

I considered that possibility as well, but wouldn't that just give us the ifindex of the VLAN interface, when what we really need is the ifindex of the physical interface?

Well, you can put anything you want into the map, so just include the physical interface? :)

I am starting to like the idea of a stateful DHCP relay...

Why?

yoelcaspersen · 2021-10-14T06:23:43Z

Well, you can put anything you want into the map, so just include the physical interface? :)

That is under the assumption that we actually have a map. In the current state, the DHCP relay works without knowing in advance which VLANs to expect or allow - as long as there are 2 VLAN tags, the relay will process the requests.

Why?

Because it would make things really simple. When a DHCP request arrives, we save the following information in a map:

Client MAC (key)
Client VLANs
ifindex
Timestamp

When the response arrives from the server, we check the map and find all the stuff we need to forward the packet to the client - no extra VLAN config needed.

However, making the relay stateful would require regular cleanup of the map from user space (one of the reasons I would prefer a stateless DHCP relay in the first place).

tohojo · 2021-10-14T11:41:15Z

Yoel Caspersen ***@***.***> writes:

However, making the relay stateful would require regular cleanup of the map from user space (one of the reasons I would prefer a stateless DHCP relay in the first place).

You could use an LRU type map where old entries are automatically evicted when the map runs out of space. If you size the map reasonably, you don't really need to do any explicit cleanup. There will be stale entries kept around, but that doesn't really matter as long as you check the timestamp before using them and clean up stale entries when you encounter them (and as long as the map is sized to not exhaust resources).

yoelcaspersen · 2021-10-15T08:59:37Z

You could use an LRU type map where old entries are automatically evicted when the map runs out of space

Could you point to an example of that?

tohojo · 2021-10-15T09:17:45Z

Yoel Caspersen ***@***.***> writes:

> You could use an LRU type map where old entries are automatically evicted when the map runs out of space Could you point to an example of that?

Sure - I use the same approach in this example to avoid having to garbage collect: https://github.com/xdp-project/bpf-examples/blob/master/preserve-dscp/preserve_dscp_kern.c

yoelcaspersen · 2021-10-17T09:46:35Z

Sure - I use the same approach in this example to avoid having to garbage collect: https://github.com/xdp-project/bpf-examples/blob/master/preserve-dscp/preserve_dscp_kern.c

Thanks, @tohojo. I have added state to the latest commit (ac9373c).

Next issue: Building a proper Option 82 Circuit ID string including interface name and VLAN tags, e.g.:

eth0.80.25

I get the eth0 part from the user space program through a map, but I need to convert the VLAN tags to ASCII.

In normal C, I would use sprintf:

char buf[IF_NAMESIZE];
sprintf(buf, "%s.%d.%d", device_name, outer_vlan, inner_vlan);

but that doesn't fly with eBPF. Do you have an idea how we can convert the VLAN tags to ASCII in a somewhat efficient manner?

yoelcaspersen · 2021-10-18T07:26:59Z

Do you have an idea how we can convert the VLAN tags to ASCII in a somewhat efficient manner?

I found a way to do it using the modulo operator - code is committed in a9003fa.

It is not efficient, but it will allow us to move forward with the return path using real replies from DHCP servers. Let me know if you come up with a better approach than mine :)

tohojo · 2021-10-18T12:24:23Z

As a general point, if you want code review, please open pull requests. It's possible to comment on commits on github, but such discussions tend to disappear into the void afterwards, so it's not a very good place to do so.

A few comments on that commit:

 * Device name is 16 characters long and prepended with dash, e.g.:
 * ----ens6f0.83.20

Why the dashes? It's text, so just make it shorter? Also, is the interface name really the best identifier here? That means you'll have to update the DHCP server config if the BNG interface configuration changes. Why not put in a logical identifier instead (say "pop-XXX" or whatever makes sense)?

	for (c = VLAN_ASCII_MAX; c > 0; c--) {
		buf[i--] = (inner_vlan % 10) + '0';
		inner_vlan /= 10;
		if (inner_vlan == 0) {
			break;
		}
	}

This is an obvious contender for a small helper function (since you're repeating it twice).

for (c = IF_NAMESIZE - 1; c >= 0; c--) {

Here you're looping up to IF_NAMESIZE, but that's the size of the buffer, which you already used up some of; so this is liable to write beyond the beginning of the buffer.

yoelcaspersen · 2021-10-18T14:49:45Z

Why the dashes? It's text, so just make it shorter?

Well, that was my first approach, but I ran into a few issues with that.

I haven't found an elegant way to copy a variable number of bytes in XDP - the built-in memcpy takes a const length. Unless we make a workaround for writing a variable number of bytes, we'll need to keep option 82 length static.

Also, is the interface name really the best identifier here? That means you'll have to update the DHCP server config if the BNG interface configuration changes.

That's also an issue with our ZTE setup. But you are right, configuring a logical identifier probably would make better sense.

Here you're looping up to IF_NAMESIZE, but that's the size of the buffer, which you already used up some of; so this is liable to write beyond the beginning of the buffer.

Hence the check:

if (i < 0) { break; }

I am aware that both buffers are IF_NAMESIZE (16 bytes) long - under normal circumstances, the physical interface name should be no longer than 6 characters to allow VLAN IDs to be appended (each VLAN takes up to 4 ASCII characters + 1 character for a separating dot). Strictly speaking, there is no guarantee which naming scheme the administrator will use, so your idea to use a configurable logical identifier is probably better.

We still end up in the same place, though: Without a way to copy a variable number of bytes, option 82 circuit ID will in most cases be longer than strictly necessary.

tohojo · 2021-10-18T15:32:16Z

I haven't found an elegant way to copy a variable number of bytes in XDP - the built-in memcpy takes a const length. Unless we make a workaround for writing a variable number of bytes, we'll need to keep option 82 length static.

Well, you basically implemented one - the code that copies the interface name into the buffer is a variable-length memcpy() :)

yoelcaspersen · 2021-10-18T19:50:05Z

Well, you basically implemented one - the code that copies the interface name into the buffer is a variable-length memcpy() :)

Sure, but I suppose memcpy() derivatives perform significantly better than copying one byte at a time in a loop - right?

tohojo · 2021-10-18T20:13:17Z

"slightly" being the operative word, there... you're copying a few dozen bytes at most, so I doubt you'd notice the difference.

You could optimise it to copy the data in 8-byte chunks (since instructions are 64 bits wide). This would look something like:

void memcpy_var(void *to, void *from, __u16 len)
{
__u64 *t64 = to, *f64 = from;
__u8 *t8, *f8;
int i;

for (i = 0; i < 100 && len >= 8; i++, len -= 8)
  *t64++ = *f64++;

t8 = (__u8*)t64;
f8 = (__u8*)f64;
for (i = 0; i < 7 && len > 0; i++, len--)
  *t8++ = *f8++
}

This is basically what the static version does (except it elides the loop checks and just inlines the copy instructions because it knows the size at compile time):
https://github.com/xdp-project/bpf-examples/blob/master/include/bpf/builtins.h#L154

yoelcaspersen · 2021-10-21T09:02:28Z

I tried your approach, but the compiler chokes on the following lines:

  *t64++ = *f64++;

and:

  *t8++ = *f8++;

A call to built-in function 'memcpy' is not supported.

After adding compiler flags -fno-builtin the code compiles.

However, when I try to use memcpy_var() within the write_dhcp_option82() function:

if (sizeof (option.circuit_id.val) == sizeof (buf)) {
	memcpy_var(option.circuit_id.val, buf + i, sizeof (buf) - i);
}

I get a nasty error from the verifier:

misaligned stack access off (0x0; 0x0)+-17+0 size 8

Close... but not quite there yet.

tohojo · 2021-10-21T10:49:19Z

Yoel Caspersen ***@***.***> writes:

I tried your approach, but the compiler chokes on the following lines: ```c *t64++ = *f64++; ``` and: ```c *t8++ = *f8++ ``` `A call to built-in function 'memcpy' is not supported.`

Ah, so this would be because the compiler realises it's a memcpy and changes it.

After adding compiler flags -fno-builtin the code compiles. However, when I try to use memcpy_var() within the write_dhcp_option82() function: ``` if (sizeof (option.circuit_id.val) == sizeof (buf)) { memcpy_var(option.circuit_id.val, buf + i, sizeof (buf) - i); } ``` I get a nasty error from the verifier: `misaligned stack access off (0x0; 0x0)+-17+0 size 8` Close... but not quite there yet.

Hmm, right, it would need to check for alignment as well. Not sure if we can check values of pointers like that in BPF, but in any case for copying a few dozen bytes like you're doing I'm not sure it's really worth it. Simpler to just skip the first bit that does u64-based copy and copy the data byte-by-byte...

yoelcaspersen · 2021-10-21T11:02:20Z

Simpler to just skip the first bit that does u64-based copy and copy the data byte-by-byte...

I modified your function to:

void memcpy_var(void *to, void *from, __u16 len) {
	__u8 *t8 = to, *f8 = from;
	int i;

	for (i = 0; i < len; i++, len--)
		*t8++ = *f8++;
}

Same error - misaligned stack access.

tohojo · 2021-10-21T11:16:54Z

Yoel Caspersen ***@***.***> writes:

> Simpler to just skip the first bit that does u64-based copy and copy the data byte-by-byte... I modified your function to: ``` void memcpy_var(void *to, void *from, __u16 len) { __u8 *t8 = to, *f8 = from; int i; for (i = 0; i < len; i++, len--) *t8++ = *f8++; } ``` Same error - misaligned stack access.

Hmm, that sounds like the compiler is being "helpful" and optimising the access. Could you please post the full verifier log?

yoelcaspersen · 2021-10-21T11:23:59Z

Hmm, that sounds like the compiler is being "helpful" and optimising the access. Could you please post the full verifier log?

Sure, you'll find it here:

2021-10-21 Verifier log - BNG router.txt

tohojo · 2021-10-21T12:30:12Z

Yoel Caspersen ***@***.***> writes:

> Hmm, that sounds like the compiler is being "helpful" and optimising the access. Could you please post the full verifier log? Sure, you'll find it here: [2021-10-21 Verifier log - BNG router.txt](https://github.com/xdp-project/BNG-router/files/7388539/2021-10-21.Verifier.log.-.BNG.router.txt)

Hmm, the line information from the log is this: ; case 8: jmp_8: __it_set(d, 64); 934: (7b) *(u64 *)(r10 -90) = r9 misaligned stack access off (0x0; 0x0)+0+-90 size 8 processed 811 insns (limit 1000000) max_states_per_insn 1 total_states 41 peak_states 41 mark_read 18 that __it_set() call looks like it's coming from __bpf_memzero(), so I think maybe it's not due to the memcpy_var()?

yoelcaspersen · 2021-10-21T13:18:10Z

that __it_set() call looks like it's coming from __bpf_memzero(), so I think maybe it's not due to the memcpy_var()?

It could be the memset() that follows the copy operation:

	if (sizeof (option.circuit_id.val) == sizeof (buf)) {
		memcpy_var(option.circuit_id.val, buf, sizeof (buf));
	}

	/* Initialize remote ID */
	memset(option.remote_id.val, 0, sizeof (option.remote_id.val));

When I disable the memset() statement, the verifier accepts memcpy_var(), but memcpy_var() copies only the first 8 bytes of buf.

When I enable it, the verifier complains about misaligned stack access - unless I replace memcpy_var() with memcpy(). Then it works as expected.

yoelcaspersen · 2021-10-21T13:36:50Z

but memcpy_var() copies only the first 8 bytes of buf.

I fixed this by removing len-- from the loop in memcpy_var().

And now I can copy part of the buffer:

memcpy_var(option.circuit_id.val, buf + i, sizeof (buf) - i);

But I still cannot initialize the option.remove_id.val with memset - the verifier complains about misaligned stack access:

memset(option.remote_id.val, 0, sizeof (option.remote_id.val));

If I remove the memset statement, the XDP program works, but I guess option.remote_id.val should be initialized - otherwise it will contain random stack memory, right?

tohojo · 2021-10-21T14:31:35Z

if (sizeof (option.circuit_id.val) == sizeof (buf)) {
memcpy_var(option.circuit_id.val, buf, sizeof (buf));
}

This is length is still constant at compile time, so you don't actually need memcpy_var() for it, and also you're not cutting off the unused part of the buffer (which was the point of this whole exercise, wasn't it? :)).

If I remove the memset statement, the XDP program works, but I guess option.remote_id.val should be initialized - otherwise it will contain random stack memory, right?

Could you please open a pull request with the entire code tidbit? It's a bit hard to tell what's what with these small snippets...

yoelcaspersen · 2021-10-24T13:11:48Z

This is length is still constant at compile time, so you don't actually need memcpy_var() for it, and also you're not cutting off the unused part of the buffer (which was the point of this whole exercise, wasn't it? :)).

My point was to show that the same code that works with memcpy() fails with memcpy_var() - although it should be doing exactly the same.

Could you please open a pull request with the entire code tidbit?

Done. I found out that I could build a custom memset_var() function that passes the verifier checks. I suppose it is less efficient, but it seems to work.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

dhcp-relay: Format of the option82 field #5

dhcp-relay: Format of the option82 field #5

tohojo commented Aug 18, 2021

yoelcaspersen commented Oct 3, 2021

tohojo commented Oct 5, 2021 via email

yoelcaspersen commented Oct 13, 2021

tohojo commented Oct 13, 2021 via email

yoelcaspersen commented Oct 13, 2021

tohojo commented Oct 13, 2021 via email

yoelcaspersen commented Oct 13, 2021

tohojo commented Oct 13, 2021 via email

yoelcaspersen commented Oct 14, 2021

tohojo commented Oct 14, 2021 via email

yoelcaspersen commented Oct 15, 2021

tohojo commented Oct 15, 2021 via email

yoelcaspersen commented Oct 17, 2021 •

edited

Loading

yoelcaspersen commented Oct 18, 2021

tohojo commented Oct 18, 2021

yoelcaspersen commented Oct 18, 2021

tohojo commented Oct 18, 2021

yoelcaspersen commented Oct 18, 2021

tohojo commented Oct 18, 2021

yoelcaspersen commented Oct 21, 2021 •

edited

Loading

tohojo commented Oct 21, 2021 via email

yoelcaspersen commented Oct 21, 2021

tohojo commented Oct 21, 2021 via email

yoelcaspersen commented Oct 21, 2021

tohojo commented Oct 21, 2021 via email

yoelcaspersen commented Oct 21, 2021

yoelcaspersen commented Oct 21, 2021

tohojo commented Oct 21, 2021

yoelcaspersen commented Oct 24, 2021

dhcp-relay: Format of the option82 field #5

dhcp-relay: Format of the option82 field #5

Comments

tohojo commented Aug 18, 2021

yoelcaspersen commented Oct 3, 2021

tohojo commented Oct 5, 2021 via email

yoelcaspersen commented Oct 13, 2021

tohojo commented Oct 13, 2021 via email

yoelcaspersen commented Oct 13, 2021

tohojo commented Oct 13, 2021 via email

yoelcaspersen commented Oct 13, 2021

tohojo commented Oct 13, 2021 via email

yoelcaspersen commented Oct 14, 2021

tohojo commented Oct 14, 2021 via email

yoelcaspersen commented Oct 15, 2021

tohojo commented Oct 15, 2021 via email

yoelcaspersen commented Oct 17, 2021 • edited Loading

yoelcaspersen commented Oct 18, 2021

tohojo commented Oct 18, 2021

yoelcaspersen commented Oct 18, 2021

tohojo commented Oct 18, 2021

yoelcaspersen commented Oct 18, 2021

tohojo commented Oct 18, 2021

yoelcaspersen commented Oct 21, 2021 • edited Loading

tohojo commented Oct 21, 2021 via email

yoelcaspersen commented Oct 21, 2021

tohojo commented Oct 21, 2021 via email

yoelcaspersen commented Oct 21, 2021

tohojo commented Oct 21, 2021 via email

yoelcaspersen commented Oct 21, 2021

yoelcaspersen commented Oct 21, 2021

tohojo commented Oct 21, 2021

yoelcaspersen commented Oct 24, 2021

yoelcaspersen commented Oct 17, 2021 •

edited

Loading

yoelcaspersen commented Oct 21, 2021 •

edited

Loading