Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pppd: implement net-init, net-pre-up and net-down. #367

Merged

Conversation

jkroonza
Copy link
Contributor

This no longer differentiates between which sub protocols brings an
interface up or down, merely tracks which sub protocols have brought up
and interface, and which have not yet taken it down.

There is one caveat here, I use pointer comparison on these names, so
the absolutely cannot be anything other than a compile-time constant,
and I'm not sure whether or not at least -O1 is required or not, ie, if
the compiler will eliminate multiple "STRINGS" to a single instance in
memory.

I'm unable to test the solaris code, but I don't see any reason why this
should not work.

Signed-off-by: Jaco Kroon [email protected]

@jkroonza
Copy link
Contributor Author

This is aimed to replace both #341 and #342 with the strategy as discussed in #341.

This is the initial step in that it basically provides a consolidate up/down in main.c, which ALL protocols (eg, MPLSCP should it ever get set up) can use to up/down the interface, and which will then execute net-init, net-pre-up and net-down as appropriate.

Pushed now to leverage the build environments to at least compile-test the non-linux stuff so long.

@jkroonza jkroonza force-pushed the consolidated-ifup-down-and-scripts branch from 31fb4d9 to bf9329e Compare August 24, 2022 14:38
@jkroonza
Copy link
Contributor Author

# /home/jkroon/projects/ppp/pppd/pppd user [email protected] password OkeeLoSh0joo debug maxfail 0 persist connect true plugin /home/jkroon/projects/ppp/pppd/plugins/pppoe/.libs/pppoe.so eth0.2 nodetach
Plugin /home/jkroon/projects/ppp/pppd/plugins/pppoe/.libs/pppoe.so loaded.
PPPoE plugin from pppd 2.4.10-dev
Send PPPOE Discovery V1T1 PADI session 0x0 length 12
 dst ff:ff:ff:ff:ff:ff  src 34:48:ed:8d:d4:61
 [service-name] [host-uniq b6 04 00 00]
Recv PPPOE Discovery V1T1 PADO session 0x0 length 49
 dst 34:48:ed:8d:d4:61  src 00:d0:f6:b6:56:d5
 [service-name] [AC-name tcap-ip-bng-1] [host-uniq b6 04 00 00] [AC-cookie 20 60 8f 6c a0 52 0f f2 b9 73 d9 43 85 ea 20 e0]
Send PPPOE Discovery V1T1 PADR session 0x0 length 32
 dst 00:d0:f6:b6:56:d5  src 34:48:ed:8d:d4:61
 [service-name] [host-uniq b6 04 00 00] [AC-cookie 20 60 8f 6c a0 52 0f f2 b9 73 d9 43 85 ea 20 e0]
Recv PPPOE Discovery V1T1 PADS session 0x1 length 12
 dst 34:48:ed:8d:d4:61  src 00:d0:f6:b6:56:d5
 [service-name] [host-uniq b6 04 00 00]
PPP session is 1
Connected to 00:d0:f6:b6:56:d5 via interface eth0.2
using channel 25
Using interface ppp1
Script /etc/ppp/net-init started (pid 1209)
Script /etc/ppp/net-init finished (pid 1209), status = 0x0
Connect: ppp1 <--> eth0.2
sent [LCP ConfReq id=0x1 <mru 1492> <magic 0x62b36091>]
rcvd [LCP ConfAck id=0x1 <mru 1492> <magic 0x62b36091>]
sent [LCP ConfReq id=0x1 <mru 1492> <magic 0x62b36091>]
rcvd [LCP ConfAck id=0x1 <mru 1492> <magic 0x62b36091>]
rcvd [LCP ConfReq id=0x47 <mru 1492> <auth pap> <magic 0x136385c4>]
sent [LCP ConfAck id=0x47 <mru 1492> <auth pap> <magic 0x136385c4>]
sent [PAP AuthReq id=0x1 user="<removed>" password=<hidden>]
rcvd [PAP AuthAck id=0x1 "Login ok"]
Remote message: Login ok
PAP authentication succeeded
peer from calling number 00:D0:F6:B6:56:D5 authorized
kernel does not support PPP filtering
sent [IPCP ConfReq id=0x1 <addr 154.73.32.7>]
sent [IPV6CP ConfReq id=0x1 <addr fe80::e4cf:b137:6c20:f011>]
rcvd [IPV6CP ConfReq id=0x8a <addr fe80::02d0:f6ff:feb6:5580>]
sent [IPV6CP ConfAck id=0x8a <addr fe80::02d0:f6ff:feb6:5580>]
rcvd [IPCP ConfReq id=0x75 <addr 165.16.204.97>]
sent [IPCP ConfAck id=0x75 <addr 165.16.204.97>]
rcvd [IPCP ConfNak id=0x1 <addr 165.16.204.108>]
sent [IPCP ConfReq id=0x2 <addr 165.16.204.108>]
rcvd [IPV6CP ConfAck id=0x1 <addr fe80::e4cf:b137:6c20:f011>]
Script /etc/ppp/net-pre-up started (pid 1238)
Script /etc/ppp/net-pre-up finished (pid 1238), status = 0x0
IPV6CP is now UP
local  LL address fe80::e4cf:b137:6c20:f011
remote LL address fe80::02d0:f6ff:feb6:5580
Script /etc/ppp/ipv6-up started (pid 1260)
rcvd [IPCP ConfAck id=0x2 <addr 165.16.204.108>]
Script /etc/ppp/ip-pre-up started (pid 1262)
Script /etc/ppp/ip-pre-up finished (pid 1262), status = 0x0
IPCP is now UP
local  IP address 165.16.204.108
remote IP address 165.16.204.97
Script /etc/ppp/ip-up started (pid 1346)
Script /etc/ppp/ipv6-up finished (pid 1260), status = 0x0
Script /etc/ppp/ip-up finished (pid 1346), status = 0x0


^CTerminating on signal 2
Connect time 8.1 minutes.
Sent 97674 bytes, received 11780 bytes.
IPCP is now DOWN
Script /etc/ppp/ip-down started (pid 17481)
IPV6CP is now DOWN
Script /etc/ppp/net-down started (pid 17483)
Script /etc/ppp/ipv6-down started (pid 17484)
sent [LCP TermReq id=0x2 "User request"]
rcvd [LCP TermAck id=0x2]
Connection terminated.
Connect time 8.1 minutes.
Sent 97674 bytes, received 11780 bytes.
Script /etc/ppp/ip-down finished (pid 17481), status = 0x0
Script /etc/ppp/net-down finished (pid 17483), status = 0x0
Script /etc/ppp/ipv6-down finished (pid 17484), status = 0x0

Ordering:

net-init successfully executed as per above as the first script, just after "Using interface ppp1" message.

After this, authentication was completed. At this point IPCP and IPV6CP was initiated.

IPV6CP completed first (but it could just as well have been IPCP.

This correctly triggered net-pre-up.

After which the normal IPv6 config and ipv6-up scripts was run (not waited for).

IPCP then came up, and executed it's usual ip-pre-up (after configuring the IPs actually, and really was live by the time the script executed).

This correctly ran ip-pre-up, and the set_ifup did NOT trigger net-pre-up again. As expected.

After this the ip-up script was also triggered and not waited for.

Upon ^C both IPCP and IPV6CP was terminated, and upon termination of the latter net-down was triggered.

There is one debate here - should the ip-down and ipv6-down execute PRIOR to net-down? (None of them are waited for, so I don't think such a guarantee makes sense.) As it stands, net-down will trigger PRIOR to the "last proto down" executing it's script, to "fix" this I'll need to move the execution of ip-down and ipv6 down prior to calling set_ifdown, but still after deconfiguring the protocol from the OS.

pppd/main.c Outdated Show resolved Hide resolved
pppd/main.c Outdated
* use set_ipup("IPCP") ...
*/
int
set_ifup(const char* name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a big fan of the new static variable up_protos[]. The protocols are already a part of the protocols[] (struct protoent). Seems like there should be a possibility of querying that structure to figure out if it is set in state up or not.

Also, I would imagine that NET_PRE_UP would be executed like IP_PRE_UP which means all configured network protocols have marked themselves as done (either UP or FAIL), but interface is not yet in configured with IFF_UP yet).

A script could possibly query the configuration of the interface being brought up to find the values it needs if it isn't passed into the net-pre-up script (as ip-pre-up would pass in the IP information, or the IPv6 information).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

array of ints vs array of pointers? I was also wondering how to avoid up_protos, but alas, couldn't think of any. Even considered a bitmask ... do you have a better suggestion?

Yes, NET_PRE_UP gets executed like IP_PRE_UP, but since we cannot determine the order in which protocols configure, we can make no guarantees w.r.t. configured status (some protocol - probably exactly one - may already be configured, but it could be either ipv6cp or ipcp or theoretically some other), only that the interface will not yet be in IFF_UP state, but about to transition into that.

Yes, a script can either query the details using say "ip" or by using the environment variables (as set by script_setenv calls scattered through pppd - but again, no guarantees can be made since probably only one of ipv6cp or ipcp will have completed at this stage).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eliminated the sys-* netif_set_up and netif_set_down functions in preference of exposing a single netif_set_state() from each.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I would look at main.c and find how they processes the protocol structure in get_input(), but if you could add a int field in there that says state, and this new function would
pseudo code:

for (protocol entry : protocols) {
if entry.id == PPP_EAP || entry.id == PPP_LCP || entry.id == .. ) // or, add a field phase in each protocol entry, i.e. mass update the files, then if (PHASE_NETWORK != entry.phase)
continue;
if !entry.up
return -1
}

// all NETWORK protocols are UP here.
setifstate(unit, 1);

Generally, adding the extra information to protocols structure; could help readability of the application. PHASE_NETWORK would be up when all NETWORK protocols has been negotiated.

Similarly, a flag to say the protocol is an AUTH protocol, you could chose to only pass the information to that protocol instead of checking EAP and LCP. See RFC1661.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to run NET_PRE_UP before all protocols has completed negotiation runs you afoul with the problem that not all information can be extracted. Though, trying to pass it to the script may be difficult, but a script can be made to query the state of the interface and extract the IPv6, IPv4 and MPLS information. If you execute the NET_PRE_UP before all network protocols have been negotiated then this information would be lacking at that point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the current protocol structure doesn't state if a protocol was intended to run during a particular phase, that could possibly be added in a separate change to ensure once you iterate through the structure don't have to e.g. compare it against a set of hard-coded protocol numbers (e.g. see get_input() in main.c).

This would be helpful here too where you iterate through the protocols and would only set the interface in state up when all the protocols has come up. Unfortunately, this probably require it to expose yet another callback to the protocol code to say: yeah you are up and scripts like ip-up and ip6-up gets executed (after interface has been configured in state up (IFF_UP)).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@enaess thinking on this I agree this would be nice.

However, and this is a big issue, both ip-pre-up and ip-up is called from the same function in ipcp.c - this is a sequential step. So this would require a fairly major redesign.

I can certainly add a notify() and/or net_{pre_,}up_hook() mechanism for you, that's quite trivial.

Do we know of many systems using ip-pre-up? My recommendation would be that protocol-specific pre-ups remain, but that they be moved prior to interface configuration. This does of course imply that net-pre-up will likely be called after one protocol-specific pre-up, which I think is not ideal. Unless all protocols adheres to:

  1. Ensure iface is UP.
  2. Invoke protocol-specific pre-up (currently only ip, not ipv6), providing "will be used" details.
  3. Configure interface for protocol.
  4. Invoke up (currently ip and ipv6).

Alternatively (very major rework referred to above) we can somehow permit all protocols to do something like the following:

  1. Upon starting pppd, prior to proceeding indicate that the protocol (if to be negotiated) needs to finish negotiation prior to interface up.
  2. If negotiation fails remove from the "desired" set.
  3. If negotiation succeeds, configure the interface as required.
  4. Invoke pre-up. (This can potentially move to pre-up callback in 5 in order to only invoke it after net-pre-up I reckon).
  5. Call if_setup(), additionally providing two optional call-backs. 1 to invoke when ready to bring interface up, just prior to actually doing so. 1 to invoke just after bringing the interface up.

All functionality in the current _up functions after the call to if_setup would thus move into the latter callback. I don't see a specific need to the first callback, but I'd rather someone look at it rather than for it.

One problem I possibly foresee is ondemand ... but I'd have to take a much closer look. As I see it there would then be four net-* scripts:

  1. net-init - as per current.
  2. net-pre-neg (need a better name, net-ready came to mind but that implies to iface is ready to bring up, which is not the case) - invoked post authentication but prior to initiating IPCP and IPV6CP (and others).
  3. net-pre-up - invoked just prior to the pre-up callbacks (such that it's possible to get ordering net-pre-up followed by proto-pre-up).
  4. net-down (as current).

Would this be more suitable? What if we WANT to bring up the interface when the first protocol is ready instead of when ALL protocols are ready? My opinion is that the current setup kinda works for me in that once the first protocol is good to go the interface can come up, so if IPv6 goes back and forth, failing to negotiate, why should it delay IPv4 or the other way around?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@enaess could you please provide some feedback here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the general outline of the above seems okay. However, I would want to solicit feedback from @paulusmack to make sure he is okay with a change like that and what you are trying to do. The other part of this is how "risky" is a change like that, meaning will it break or become fragile, e.g. is ip-pre-up / net-pre-up deterministic in all cases like ip is up, but ip6 failed negotiation, the test-coverage, etc.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paulusmack @enaess may I please request some direction here?

pppd/main.c Outdated
int i = 0;
const char** t;

while (up_protos && up_protos[i] && up_protos[i] != name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not use a strcmp() here instead of comparing the memory address?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no real reason other than a compiler shouldn't duplicate the string and as such a pointer comparison is good enough. I've never seen that it gets duplicated but I've only ever worked with gcc in the last 19 years and some MSVC prior to that. Can convert if that's preferred, doesn't make a significant difference, and since others will probably wonder the same thing it's probably worth the change from a sanity perspective and just in case some compiler does actually duplicate the strings?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@enaess are you happy with the explanation or should I switch to strcmp?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not found of pointer comparisons as they imply deep understanding of the code and what you are trying to do. It is for the same reason Java does string comparisons using String.equals() vs. ==, they mean two completely different things.

pppd/sys-linux.c Outdated Show resolved Hide resolved
@jkroonza jkroonza force-pushed the consolidated-ifup-down-and-scripts branch from b810041 to 6c56f53 Compare August 24, 2022 20:27
@jkroonza jkroonza mentioned this pull request Aug 24, 2022
pppd/main.c Outdated
run_net_script(PPP_PATH_NET_PREUP, 1);
if (if_indextoname(ifindex, iftmpname) && strcmp(iftmpname, ifname)) {
info("Detected interface name change from %s to %s.", ifname, iftmpname);
strcpy(ifname, iftmpname);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to call script_setenv here for IFNAME? What do I pass for iskey?

Only case where set_ifunit is called with iskey=0 is under multilink, in which case pppd joining a bundle I suspect (but don't KNOW) that the bundle interface will already be UP and thus set_ifup will never be called.

We've only ever used multilink in a handful of cases, and those use-cases are long-gone now so don't have an environment where I can scope and check this.

@Neustradamus
Copy link
Member

@EasyNetDev: What do you think?
We wait you...

@enaess
Copy link
Contributor

enaess commented Oct 22, 2022

@Neustradamus I don't see this PR as a reason to hold up a release. While some good collaboration happened here, the idea of a new net-pre-up and net-up callback to improve on the current problems related to ip-pre-up and absence of ipv6-pre-up can be implemented as a separate PR later

@jkroonza
Copy link
Contributor Author

@Neustradamus I don't see this PR as a reason to hold up a release. While some good collaboration happened here, the idea of a new net-pre-up and net-up callback to improve on the current problems related to ip-pre-up and absence of ipv6-pre-up can be implemented as a separate PR later

Hi,

I would really appreciate if we can get this merged, what would need to happen and what can I do from my side to assist in this regard?

@jkroonza jkroonza mentioned this pull request Dec 29, 2022
@jkroonza
Copy link
Contributor Author

Copied from PR #111

I'm a little confused by the intent of this change (#367).

OK.

There are a couple of reasons that the sifup/sifdown paths (which are by design intended ONLY to set flags, not to run scripts) are separated. One is that different platforms have different ways to represent multiple addresses on a given interface. This is a place where I think Solaris differs substantially from Linux. Solaris has "logical interfaces." A logical interface is the entity that holds the address and a set of independent flags for EACH address. That is, unlike Linux, each network layer protocol (actually each address) can be marked up or down independently.

Right. I was not aware of this. Does that actually happen that way? Because the various functions simply marks the interface up/down in the case of pppd, not a specific protocol/address here.

On Solaris, the SIOC[SG]IF* ioctls and the SIOC[SG]LIF* ioctls are different. The [SG]IF ioctls deal with legacy-style BSD-ish interfaces (dating back to SunOS 4.x days), while the [SG]LIF ioctls deal with logical interfaces. In the proposed change, all of the "LIF" handling has been deleted from the code base. I haven't tested this, but I wouldn't be surprised if this caused problems.

I can only test on Linux unfortunately. I looked at the code, and as far as I could determine, the code which was removed was 100% identical. I could have missed something. The primary aim here was to have a consolidated if up/down interface, which then did all the common work, around dispatching to the OS specific functions. Not to change behaviour, but to enable adding OS and protocol-independent up/down work to interfaces without having to do this in multiple places.

(The ioctl split exists in order to maintain binary compatibility with SunOS applications.) For example, on a Solaris 11 machine, I see this: lo0: flags=2001000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4,VIRTUAL> mtu 8232 index 1 inet 127.0.0.1 netmask ff000000 net0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2 inet 10.50.31.94 netmask ffff0000 broadcast 10.50.255.255 ether 8c:73:6e:c0:e:8a lo0: flags=2002000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv6,VIRTUAL> mtu 8252 index 1 inet6 ::1/128 net0: flags=20002004841<UP,RUNNING,MULTICAST,DHCP,IPv6> mtu 1500 index 2 inet6 fe80::8e73:6eff:fec0:e8a/10 ether 8c:73:6e:c0:e:8a net0:1: flags=20002004841<UP,RUNNING,MULTICAST,DHCP,IPv6> mtu 1500 index 2 inet6 fd48:3e6a:e278::e30/128 Things to note in this example (sorry about the line wrap): there are two distinct instances of "net0." They are distinguished by address family. And there's a separate "net0:1" logical interface that was created by DHCPv6 and is managed by that protocol. And for IPv6, in general, only the LIF ioctls will work on Solaris, due to the change in interface (wider values in the ioctl data structure). As for the rest of the change, I'm not sure I follow what the ultimate architectural goal of this might be. To me, each NCP is (somewhat unfortunately) different in terms of what it actually negotiates and what the units of currency are. This means that, for instance, IPv4 negotiates distinct endpoint addresses that are used at the application layer, but IPv6 negotiates only the IDs used for link-locals (which no application should touch), and MPLS has no addresses at all (LDP does that).

OK, so the question here is if it's possible to UP and/or DOWN the IPv4 vs IPv6 net0 interfaces (and even net0:1 independently of net0)? Based on what you stated, it sounds like a "yes it can", but can you please confirm? From what I recall the pppd code most certainly did not do that. And in fact, there was quite a bit of confusing stuff around upping and downing interfaces, and it looked like it's possible for IPv6 to tear down, and bring down the entire interface (ie, ifdown) without actually tearing down LCP as a whole, leaving pppd in a situation where it could consider IPv4 and MPLS etc to be "up", but due to the interface being down it's actually down. If you're right (I'm not saying you're wrong, just looking to understand better) about the way solaris handled this then I'm way off base and will probably need to rethink this a bit.

And that's ignoring other more exotic protocols such as AppleTalk that negotiate a separate network number and node address. All of this means that the environment variables and the command line arguments (again, unfortunately) differ for these external scripts, and is another reason these code paths are separate. I don't see what the grand script unification does except maybe cause compatibility issues. Moreover, there's no notion in PPP of having reached a stage of "fully up." Any link can bring up or tear down an NCP at any time, so if the ultimate goal is to get to a point where we can just invoke one "I'm up" script, I don't think that can be done in any coherent way.

I think this touches on some of the discussion and issue points. So what I desperately want is an init script to execute right at the start of pppd (net-init as implemented in that PR), then scripts to execute just before upping the interface (protocol independent, net-preup, after auth, but prior to setting the interface up), and then to execute when tearing down (less critical, but we do have lated ifb interfaces used for shaping ingress traffic from the ppp interfaces which needs to be torn down too).

(I'm not sure if that's where this was going, but just a guess.) Some minor issues:

  • If we're going to delete 'unit' numbers, let's not do it piecemeal. Let's have a single change that nukes them all, rather than a series of changes that arbitrarily yank them from one old interface or another. I'm fine with removal of that, as it was used only be a few odd non-public branches of the code, and is hard to maintain correctly when "nobody" is using it. I'm just not fine with incremental breakage; it seems unprincipled to me. (As this change is doing with sifup et al. Yes, I know it's not just this change; this is a pet peeve.)

Wasn't my intention to break anything, and I did take as much care as I could to maintain backwards compatibility. I do agree with your pet peeve and I can only state that was not my intention. You reference 'unit' numbers here specifically, I do recall yanking certain parameters which I think could have been unit numbers, they were yanked from functions I was modifying where to the best of my ability to determine they were not being used anyway. You reference "odd non-public branches" that may have used them, of this I was not aware. I also didn't spot specific other places in the (fairly large) codebase where they could be removed, but I could well have missed stuff, obviously, which is why we have reviews of code before merging :).

  • Definitely agree with the other commenters saying that comparing static strings by address only is not a good idea. It saves almost nothing in execution time, and opens up a neat hole for someone else to fall into by way of a bad compilation flag, a new pppd plugin, or some other contrivance. I don't see why this mechanism is so complex, when we already have per-NCP registration mechanisms for the FSM and this sort of state information could be carried there, but if it must be separate and must be done with a global array, then at least use strcmp.

OK, so add this to the per protocol structure is what I'm getting? This may also allow the one requested additional feature called net-up to be invoked once all protocols have been brought up.

The one other requested change was to only actually up the interface once all desired protocols have been negotiated and brought up, but as stated, that would require changes to most sub protocols which can obviously be done. Specifically, sub protocols would need to signal "configuration done, ready to be upped, please call this callback when interface is up". This can be done, and I'm happy to put in the effort for the in-tree protocols. I would, however, recommend that this be configurable behaviour.

If we're going that far, I'd further recommend that any and all other pre-up scripts be nuked, since already they have no guarantee that the interface is actually still down. Alternatively, we need to follow the above methodology and enforce that.

This is perhaps a bigger change than what should be merged at this point though, and could cause further delays to an overdue ppp release.

  • I saw some discussion on the issue of running the "down" script before the link goes down. It might seem nice to run the script before a locally-initiated tear-down starts, but that should never be given as a feature to the user, because there's no general way to make it work. The other end can terminate at any time, and in practice the network connection itself may well have ended before any sort of notification is received (which may also come in the form of a simple loss of carrier). So, sure, if you can invoke it early, and believe it to be important, then do so, but don't advertise it too strongly in any documentation or suggest that anyone should rely on it. (For what it's worth, I like interfaces that give consistent semantics rather than "helpful" ones. In other words, I'd prefer to see the "down" script run after the link goes down in all cases, so users can't become dependent on happenstance. But I guess others may differ or have particular limited usage cases in mind.)

Agreed about the consistency, will gladly make this change. They way I understood it was that locally the interface will still reflect UP, but I do agree with you, it will already no longer be operational if remote side tears down, so conceding to rather execute post down. This was a debate in my head and honestly it doesn't particularly matter that much.

@jkroonza
Copy link
Contributor Author

jkroonza commented Jan 2, 2023

@carlsonj replying here rather so we can split the release discussion from this, I think they should be separate at this point.

In Solaris, a logical interface is a specific protocol and a specific address. Marking an interface by name does that. It's not the same as Linux.

ACK. Will re-look at this code.

Here's what was missed: if (ioctl(ipfd, SIOCGIFFLAGS, &ifr) < 0) { if (ioctl(fd, SIOCGLIFFLAGS, &lifr) < 0) {
Two big differences there:

  1. The IPv4 code uses the common "ipfd" file descriptor, which is open on "/dev/udp" for somewhat legacy reasons. The IPv6 code uses a temporary IPv6 UDP socket. (Not sure why; the code likely predates the addition of the ipfd6 variable. If I ever run into Adi Masputra again, I'll have to ask.) This is where the address family comes in.
  2. The IPv4 code uses SIOCGIFFLAGS. The IPv6 code uses SIOCGLIFFLAGS. The two are not the same; GIF != GLIF. I agree the naming here is an eye test, though.

ACK. I'll re-look. My suggestion is then to proceed with a new ppp release without this patch. Work does need to be done here, and we'll keep using this patch on our systems for the time being.

  1. pppd is initially launched. No need for a special script here, as the launcher can just do whatever is needed.

Can it? Let's say pppd is launched by xl2tpd or pppoe-server, you excpect the system administrator to somehow figure out which unit number is going to be used and wrapping pppd? Guessing that the "connect" option can be abused here, but at this stage (to the best of my knowledge) the unit number still won't be available. The disconnect script also doesn't run reliably due to "script is not run if the modem has already hung up". init script seems to be more aimed at initializing serial lines, either way, from a quick look these scripts only execute when ttyname option is given such that devnam is not the empty string. In fact:

Jan 2 12:05:11 kerberos pppd[9457]: In file /etc/ppp/options.lns: unrecognized option 'init'
Jan 2 12:05:52 cerberus pppd[24775]: In file /etc/ppp/options.lns: unrecognized option 'connect'
Jan 2 12:06:32 cerberus pppd[8513]: In file /etc/ppp/options.lns: unrecognized option 'disconnect'

This is when incoming over xl2tpd, pppd invoked by same.

  1. pppd completes negotiating LCP but has not yet done AUTH (if any). We don't have this now. Not sure if it's important. It could be added.

This is the net-init part of same. Yes it's important. We need the unit number for initialization even prior to authentication. Mostly this ensures that ppp "state files" as we keep them were properly cleaned up during previous runs of pppd but there are other uses for us too.

  1. pppd completes authentication of the peer and has not yet done any of the NCPs. This is auth-up.

There is both an auth-up (auth success) and auth-down (auth failure). Guessing the latter can be used to implement some form of source blacklisting, ie, for pppoe add the source mac into ebtables or similar to prevent further abuse, or to an ignore list for pppoe-server (which is a non-existent feature currently, but can be added, with the caveat that an attacker at this level can trivially bypass a mac block list, and can even utilize that to create a denial-of-service to other legitimate users. More useful for things like pptp and l2tpd wherewe can do a source IP block, but not useful in our primary case where we have a PPPoE to PPPoL2TP bridge as the LAC.

  1. pppd completes negotiating an NCP but has not yet marked the network interface as "up" for traffic so no data is flowing yet. This is ip-pre-up. (Sadly, we have this for IPv4 only. Someone might want to add ipv6-pre-up.)

I created a PR for exactly that originally. It was shot down for various reasons, primarily that ip-pre-up is already unreliable (except perhaps for solaris), and adding ipv6-pre-up adds to the confusion. Two primary suggestions that came from that:

  1. Delay "upping" the interface until all NCPs has completed, that way ipv6-pre-up would make more sense too. And net-init, net-pre-up, net-up and net-down could be further enhanced possibly. The question is, does net-pre-up then execute as the first pre-up script, or the last one in such a case. Would require callbacks into NCPs to notify NCPs that interface has now being brought up.
  2. (my personal preference) Modify behaviour of pre-up scripts to provide details as to what will be configured (with no guarantee as to iface up/down state), passing the information as arguments to the script or environment variables (which I believe is already the case). Thus:
    2.1. Execute net-pre-up;
    2.2. bring up iface for operating systems that doesn't support per-protocol interface state (based on solaris input);
    2.3. Set env variables;
    2.4. execute ncp-pre-up;
    2.5. configure interface;
    2.6. bring up protocol specific interfaces for operating systems that supports that; and
    2.7. execute ncp-up.

Note that the current patch series executes one NCP-pre-up prior to net-pre-up prior to upping the interface. This, following this discussion, will break at least solaris. This was my oversight plainly. The above should deal better with both Linux and Solaris type methodologies.

The involved scripts documentation would then be:

net-init - executes the moment the ppp interface has been created, at this point the interface will be down, unconfigured and prior to authentication commencing. You can use this script

auth-up/down - as per current. It should be noted these are executed asynchronously and thus precludes use for performing work between auth and NCPs being brought up.

net-pre-up - executes once the first NCP has completed negotiating. This is guaranteed to execute whilst the interface is still down, will be executed synchronously and should thus perform it's work in as short a time-frame as possible. No protocol specific information will be available at this point. You can safely rename the interface from this script, for example, based on authentication information.

*-pre-up (currently only ip-pre-up): Will be executed once the NCP has completed, prior to configuring the interface and, for operating systems such as Solaris that uses per-protocol interfaces, bringing up the per-protocol interface. For all other operating systems the interface will already be UP, but not yet configured for the specific protocol. The to be configured details will be available

*-up (I believe just ip-up and ipv6-up): Will be executed once the interface has been configured for the given NCP.

There was quite a large discussion around this. Essentially, pre-up cannot (In Linux world at least) be guaranteed to execute whilst the interface is still down. The consolidation was primarily done to simplify the net-* scripts for me, and because I thought it made sense to push this kind of work through a central controller function. I still think it does, but plainly it's not quite as simple as I thought.

  1. pppd has marked the network interface "up" and traffic can now flow. This is ip-up and ipv6-up (and others where supported).
  2. pppd has lost a network interface that was "up." This is ip-down and ipv6-down (and others).

For solaris where there is a per-protocol interface this makes sense, for Linux it most certainly doesn't make that much sense, since the interface has a single global state, eg:

6611: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 34:48:ed:8d:d4:61 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.144/24 brd 192.168.1.255 scope global dynamic noprefixroute eth0
       valid_lft 361sec preferred_lft 286sec
    inet6 fe80::cbb1:36d1:d0dc:a6a7/64 scope link 
       valid_lft forever preferred_lft forever
  1. pppd has lost an authenticated peer. This is auth-down. "net-init" sounds a whole lot like point (1) to me. I don't see why anyone needs a special hook for this.

Refer above. auth-down executes on authentication failure, once a peer has been authenticated, the ppp is authenticed, authentication won't happen again.

The really narrow case that could be made (in the current system design) would be to have a script that fires when the interface is activated, just to support the rarely used "on-demand" feature in pppd, and also have that script fire immediately on the usual sort of pppd invocation. I guess I could get behind that, but it's definitely not like this proposed change at all.

I disagree. The patch series does not take ondemand into consideration at all currently. This point is valid, and if there is demand for hook scripts here I could work that into the system. ondemand itself I suspect makes less and less sense overall, except perhaps for things like vpn connections (client side). Eg, have the pppd sit there with a route pointed at the remote network, and bring up the vpn whenever it's needed, this to me makes some sense, however, ncp-pre-up in above should deal with those use-cases. I think. Haven't thought about this much at all.

"net-preup" already exists. That's point (4). Just use ip-pre-up. Tearing down also exists. That's point (6). So I'm not exactly sure what's being fixed here.

No it doesn't. In solaris world this statement may be true, but certainly not in Linux world. As it stands ip-pre-up has zero guarantees on linux with regards to being executed with the interface still being down.

I was specifically referring to the change for sifup() and sifdown() to remove the unit number and replace with the renamed set_ifup() and set_ifdown().

OK. Not sure what I missed here then, but I'll re-look.

Oof. Why? I think that roughly 99 and 44/100ths of this can be done using reasonably well crafted scripts with the existing interfaces. Granted, it looks like if you negotiate both IPV6CP and IPCP on the same link then it might be possible that ip-pre-up can fire "too late" but ON LINUX ONLY, so that's a Linux specific bug.

As I understand, so that we can have a guarantee that the interface is still down when executing ip-pre-up. With the information around Solaris now I had to think wider about this, and I suspect the above strategy should be suitable all around? The change in behaviour from a system administrator point of view would then be restricted to ncp-pre-up being executed prior to actually configuring the interface (which makes sense on Linux, not so much on Solaris - perhaps we need to vary behaviour here based on OS behaviour?)

I guess I wouldn't mind seeing "if-pre-up", "if-up", and "if-down" scripts that are invoked on Linux only when the whole interface is marked up and down by pppd. Not entirely sure if that fixes your problem, but it sounds like something that sys-linux.c could do without hurting others.

I call it net-, you call it if-, guess these are both aimed at the same. I don't care about if-up personally but yea, I guess this can be expanded to (keeping with my net- rather than if-, naming is irrelevant IMHO):

net-init - executed prior to negotiation (even auth).
net-pre-up - execute prior to bringing interface up. Permits renaming (using eg "ip li set dev pppX name pppX-jkroon").
net-up - executed after bringing interface up (on solaris I guess this will make little sense since it doesn't have a single state, but the interface could still be RUNNING at this point, just not protocol-specific UP?) This could be the added from my patch series fairly trivially and makes some sense for me.
(net-all-up - executed after all NCPs has either finished being configured, or failed conrfiguration. I see no point in this and thus won't endeavour to implement this unless someone requests me to do so)
net-down - executed after tearing down the interface, prior to destroying it (ie, ppp* still exist).

It's also possible that what you're really proposing is introducing a new "phase" for pppd, and I think that'd need a little more discussion. This would be a phase after NCP that represents "all NCPs resolved" (as described above), and possibly on Linux only, would be a place where the interface is finally marked "up." That seems like a more radical change than what's proposed here, but at least seems consistent with the existing design. Or, in a simpler model, we could just have a new pppd option that disables the modification of the IFF_UP flag by pppd, leaving that part to external scripts (or programs) to resolve. Then you can do whatever you want in your "if-up" and set the flag at your leisure. You might still need the "all NCPs" script to make that work.

You would. personally I have no need for a NCP specific pre-up in the Linux world, however, in Solaris world (without thinking too hard about this) it would make a lot of sense since each protocol really has it's own interface.

Otherwise, removing the existing scripts would also be a pretty big and incompatible change. I can't see how something like that would ever reach consensus without a fork. Even then, I think it'd be a long term disaster.

Just to be clear, not all suggestions in this thread came from me. My aim was "simply" to get a net-init, net-pre-up and net-down. Consolidation was done in order to simplify that (ie, no need to signal net-pre-up from hordes of different locations), and the consolidation I believe is what broke things overall for solaris as I understand. I'd still like to action that, and put as few things as possible in sys-*.c such that the behaviour of pppd between different operating systems remains as close as viably possible between different operating systems (and to make porting between operating systems easier, not sure if *-bsd is currently supported, for example, or whehter that's even viable). Currently we only have Linux and Solaris.

Agreed. So the ncp-pre-up being pre or post configuration could be a configurable option such that the default is to stick with current (broken on Linux) behaviour, but allow for the alternative behaviour. Or conversely we could have different behaviour hard wired in pppd based on the OS design. But that's still an incompatible change for Linux at least, so I think we should run this as an option and highly recommend the option being used on Linux then but not currently make that default, just warn that you should consider it's use (once off for every pppd run if one of the ncp-pre-up scripts is executed, unless the no- variant is explicitly set such that the system administrator has indicated to opt-in to the potentially broken behaviour), in a further future version we could then start warning that the default behaviour will change at some point, and then change it eventually.

Sorry for these long messages, but I feel it's important to not leave stones unturned, and to make sure that everything is well understood. Would the above strategy be adequate for you to keep existing behaviour as close as possible, whilst still introducing the new behaviour (which enables improvements for Linux sysadmins then).


-- James Carlson 42.703N 71.076W FN42lq08 @.***>

@carlsonj
Copy link
Contributor

carlsonj commented Jan 3, 2023 via email

@jkroonza
Copy link
Contributor Author

jkroonza commented Jan 3, 2023

Without quoting (because the interface confuses me here and it's getting absurdly hard to sort it out):

If we can fix ip-pre-up, that solves our requirement for net/if-pre-up. Having looked at the code I do not thing this is a trivial endeavour though. Possible but not easy.

We still need/want a net/if-init.

And some form of -down script (doesn't really matter if we use ip-down or ipv6-down, but we generally can only take action on the last one ... how to figure that out, thus for me net/if-down makes sense to execute once all NCPs have been brought down.

ip-* only executes if IPCP is being negotiated, so it's conceivable that in the medium to long term ip-pre-up may no longer be usable. We certainly are trying to get to a point where we can ONLY have IPv6 on interfaces where possible, and tunneling IPv4 EDGES via IPv6 (most likely candidate currently is 464 XLAT or similar). I thus reason that a net/if-pre-up makes sense anyway.

I'm not sure net/if-up makes sense, for that matter even ip-up and ipv6-up makes little sense to me as everything we want to achieve should happen in pre-up. I guess there are possible absurd things like "enable bgp peers" that should only happen once the protocol is actually up though (crazy example, since I can't think of anything better). This is protocol specific though, and I can't envision any need for net/if-up.

So the issue on Linux happens with a sequence like:

LCP comes up.
AUTH phase happens (in neither, one or both directions, doens't matter).
NCPs starts. (let's say IPCP and IPV6CP).
IPV6CP completes, and brings interface UP. Only IPv6 LL at this stage. Other handling happens outside of pppd by way of RAs etc ...

at this stage the interface is UP and starts passing IPv6 packets.

There is no ipv6-pre-up either (adding this makes a lot of sense for Solaris, and my PR which has been closed can be used for this). In Linux this would suffer the same issue as ip-pre-up currently. One of the two would execute with the interface being down, but not both. And if another NCP is also being negotiated, it could be neither.

No IPCP completes, and pppd does this:
Configure the interface's address.
At this point packets can already be passed (Linux, not Solaris).
ip-pre-up executes.
Interface is "brought up" (ie, counter incremented).
ip-up executes.

Not too worried about teardown. Hoping this brings some clarity as to the perceived problem.

I'm thinking that documentation on Linux should be updated to reflect reality, as that is MUCH, MUCH simpler than fixing the overall problem (which I believe would require modification to ALL NCPs - some of which may not even be in-tree).

I'm thinking introducing interface/network level init, pre-up and down makes sense since these are protocol (NCP) agnostic.

I'm thinking having an option to explicitly execute any NCP specific -pre-up PRIOR to actually configuring also makes sense for me.

Currently ip-pre-up permits for interface renaming. Which is where the idea to permit this from net-pre-up came from. And there was quite a nasty other patch floating around to enable ifname to use % codes to include things like usernames etc into the interface name. The primary motivation being to set the interface name based on the authenticated username, eg, if user jkroon dials in over l2tp/ipsec, the interface can be renamed to something like vpn-jkroon rather than pppX. This can only be done after authentication. We don't have a specific need for this to be honest, this forms part of our state files.

The state files I reference are basically short key=value pair text files we keep on a per-interface basis, for example:

direction=in
protocol=l2tp
user=jkroon@realm

Except for in net-init and net-down these are append-only files. In net-init it's blanked, in net-down it's removed. This allows us to generate lists of connected users, and the mechanism they're using to connect. Information about LACs and calling station IDs are also added to this file. Yes, most/all of this information is also available radius side, but it's still convenient from a system administrator perspective to have this information "locally" on the LNS.

In terms of where we come from, we're an ISP, so using options like "unit" and "ifname" in the options files is precluded since it would cause problems, and we can't afford to have a per-user options file, not to mention that you can't reliably from pppoe-server nor xl2tpd determine which options file would be required until after PPP auth has happened.

Regarding having the option to "modify" behaviour - I do concede that this is nasty, but I don't see a better mechanism. The choices with respect to ip-pre-up (and any other NCP specific pre-up) are as follows:

  1. Leave broken. But at least document it.
  2. Try to fix it, probably requiring all NCPs to be modified.
  3. Leave broken by default, document it and give an option for changed behaviour (and warn if it's not set). Eg, preup-prior-to-conf and preup-after-conf (naming is too long, but you get the idea, issue a warning if neither option is set ... in Linux).

I'm not convinced of any of this, and as per yourself, this just doesn't sit quite right.

@carlsonj
Copy link
Contributor

carlsonj commented Jan 4, 2023 via email

@jkroonza
Copy link
Contributor Author

jkroonza commented Jan 4, 2023

Thank you, that feedback was very clear and concise, much appreciated. Fully agreed around technical debt, I've spotted a fair amount of that in ppp project just trying to implement this. Hoping to SOLVE some of this for the future, not add to it, so your input goes a long way for that. Regarding comparing ethernet to ppp ... I don't think that's a always a fair comparison, ethernet is designed as a multi-access network, not point-to-point, and whilst many of the issues may be similar, this is not always the case, especially when you consider ISP type use. If you look at dhcpcd for example you will note it also has a very comprehensive hook script mechanism that can be used to take a number of actions, and for us these actions are very different when compared to ppp. So even on a client or "access" side, we generally treat this very differently.

I was not aware of extra_options, to the best of what I can determine looking at auth.c this only applies to local secrets files.

163 /* Extra options to apply, from the secrets file entry for the peer. */
164 static struct wordlist *extra_options;

Yes, to a degree I think net-pre-up is more future-proof than ip-pre-up as it does not rely on IPCP from being run. It also makes it more clear that it will execute regardless of NCPs being negotiated. This world is probably further off than I'd like to see, but I figured whilst I'm messing/tampering with it I might just as well.

I personally do believe that net-pre-up vs ncp-pre-up makes some sense. And I do believe both have their uses. For example, it makes more sense for me to do NCP agnostic work (like setting up tc on the ppp interface) from a network-agnostic script, and then to possibly amend or add to it from ncp-pre-up PRIOR to the interface coming up. Frankly, in this specific use-case we have nothing protocol specific going on, however in some other cases we do have some ipv4 vs ipv6 specific things going on. the ipv6 stuff currently due to lack of ipv6-pre-up just happens from ipv6-up, and we use procfs to not auto-configure an interface by default, but then on the ppp interface if we'd like it enable it from ipv6-up, so this isn't a major blockage currently. Based on the fact that this has not been requested previously, I suspect that either IPv6 penetration isn't what we'd like it to be yet, or very few people care enough (or have figured out workarounds like we have). Towards this end, no, I do not believe that it's one or the other, it really should be both.

I do agree I'd prefer to have the interface configured for the NCP prior to invoking the ncp-pre-up, but if we can't guarantee that this can happen, then I'd like the OPTION of having it execute PRIOR to configuring the interface with IP addresses, but with the environment variables already set. As we've established, behaviour on solaris is sane, linux is not. I'd rather have the issue documented at the very least in the man page, and workarounds provided if the issue can't be properly fixed. With an option to pick which quirky behaviour the system administrator prefers, and I see three options (based on your input too):

  1. Delay interface UP until all NCPs have completed (you stated you will investigate options, there are downsides to this w.r.t. buggy remotes resulting in long, sometimes indefinite negotiation, and simply stating the remote side should fix isn't always viable/practical).
  2. Execute pre-up prior to configuring (adding addresses) to interface, with no guarantee that interface is still down.
  3. Execute pre-up after configuring, with no guarantee that interface is still down.

I'll happily concede that 1 should be the default, but I'd like the option of behaviour 2 or 3 too.

I'd like to have the options that interface comes up even if only one NCP completes, in case of remote bugs (being on the ISP side it's hard to guarantee what stupidity we face from the remote side, we've seen some ... interesting things).

We are pushing to a world where we can deploy routers that supports 464 XLAT such that we can move to a world where we only have IPv4 where absolutely required, so these routers will NACK IPCP, and then NAT46 IPv4 into the well-known 64:ff9b::/96, which the core ISP network can then NAT64 again on some dedicated host(s) which is connected to the world with IPv4. This is still a bit off, but yes, I'm at the same time trying to future-proof for this. IPv4 is fast becoming a scarce resource, and at the current IP lease rates this isn't something we're looking forward too, and thus looking to mitigate early.

With regards to l2tp and pppoe other "outer" protocols, we can trust the l2tp host (FNO controlled, and if we can't trust them, then we've got substantially bigger problems than rogue clients), but we can't trust the pppoe nor the ppp side of things. In the case of the FNO bridging PPPoE to L2TP the SCCRP authenticates the LAC, not client, so the LAC (which bridges between PPPoE and L2TP) is authenticated at this stage, but ppp ends up authenticating the final client. VPN use typically only authenticates in ppp anyway, and SCCRP doesn't happen here (PPP/L2TP/IPSec for example).

I've not actually used MPLSCP on ppp yet, so I honestly don't know what's all involved there, or even how it works, but we will have to look at this later in the year for another project.

@carlsonj
Copy link
Contributor

carlsonj commented Jan 5, 2023 via email

@jkroonza
Copy link
Contributor Author

jkroonza commented Jan 6, 2023

640KB :p.

For tc commands, yes, we definitely DO NOT require any form of IP information here. At least, for what we use it for :).

Your "later renegotiate" reasoning looks solid. But I disagree, once the interface is up it should remain up. It's trivial to disable ipv6 temporarily on a per interface basis via sysctl (net.ipv6.conf.${interface}.disable_ipv6), but I don't see similar for ipv4. And once disabled, to re-enable experience dictates that a number of other sysctl's for same iface also needs to be properly reset. I've not dug too deep into this. This is a really, really sticky case, but not one I'm too worried about.

I still do not like there being any form of longer delay bringing the interface up than absolutely required. I really feel strongly that (on Linux at least) this should be configurable behaviour, or at the very least, if no ncp-pre-up scripts are present, should default to not delaying bringing the interface up. Since pre-up's are handled from the NCPs this gets tricky again and would require co-operation from NCPs to indicate which has it.

For downing the interface temporarily that's a bad idea IMHO, and again, if this is the way you want to go, sure, but I'd definitely want a configuration option to avoid this behaviour.

Given the discussion, would it be OK for me to so long amend the net-* scripts (would you prefer if-*?) to provide for network agnostic net-init (executes prior to auth even, the moment LCP has been established), net-pre-up (auth completed, either when first NCP signals ready, or prior to NCP being initiated), net-down (after LCP has shut down, prior to deleting the interface from the OS).

Can this be done currently by way of plugin rather than coding this into the core? Then users looking for this can just load this as a plugin rather, eg, netscripts or ifscripts?

@carlsonj
Copy link
Contributor

carlsonj commented Jan 6, 2023 via email

@jkroonza
Copy link
Contributor Author

jkroonza commented Jan 8, 2023

I hear you regarding plugin vs external scripts. So from a system administrator perspective, it is simpler to just modify shell scripts, and it's also a lot simpler to debug these, and if something goes wrong with them, the probability of impacting or even crashing pppd is much lower. We prefer doing as little as possible from inside the pppd binary for this reason. It's also lower maintenance overhead to not have to keep non-public plugins in sync with the pppd project than notification scripts (the available scripts has to the best of my personal knowledge not changed in decades). I suspect I'm potentially one of the first pushing for that.

net-init - purpose: to notify that a new ppp negotiation has started. Basically, a new "unit" has been created. Most people will not find this useful, since quite frankly, as you say, there isn't much information available here. It's pre-auth, pre-everything. I don't recall how I implemented in the associated patch, so I'm not sure if it's actually pre or post LCP established. For our use-case this doesn't particularly matter. Our use-case is to reset out-of-pppd data we associate with the "unit" where the unit number is the primary key (if you think in terms of relational data). Other possible use-cases I can imagine relates to detection of attacks and blacklisting. Ie, if we get a number of initiations but auth never happens then potentially we're looking at some form of brute-force or denial of service attack. Anyhow, I agree this is probably fuzzy at best and difficult to define well in terms of alternative envisioned use-cases. IIRC this happens the moment we have a unit number, even before LCP is established but I'd have to re-check. Earlier == better here.

net-pre-up - purpose: to initialize networking prior to bringing up the network interface. As it's implemented right now, it comes up after the first NCP, which as per yourself is very unclear semantics, can ip-pre-up rely on net-pre-up having executed or not? So that's just not happy. Since we use net-pre-up instead of ip-pre-up for us that didn't matter. I'd prefer to keep delaying this as far as possible to the first NCP actually bringing the interface up, but yea, that's tricky because it means all plugins will have to signal that the NCP is now ready to configure the interface, then execute it's own pre-up, configure and then signal for bringing the interface up. Another question is if the interface isn't actually going to come up, why execute net-pre-up? The simpler methodology is to just execute this once the NCP negotiation is about to start, so it's really net-pre-ncp rather than net-pre-up. The stuff we intend to perform here is essentially house-keeping again, setting up tc against the interface, some rudimentary firewalling updates (from here we simply assume that either ipv4 or ipv6 or both may be negotiated and set up for both accordingly, it would be nice if we could separate ipv4 and ipv6 into their own -pre-up, but that doesn't work currently). Renaming the interface can happen here (not in init). I'm OK if this happens when we move into the NCP negotiation phase (sorry if my naming of phases aren't 100%) just after AUTH finished (or in the case of no auth, directly after LCP has established).

net-down - purpose: clean up out-of-pppd data, including undoing tc (the related ifb at least since when pppX gets destroyed any tc data directly associated with that will auto clean-up) and possible firewall updates. This would execute AFTER LCP has been torn down, and by implication AFTER all NCPs have been brought down (including their -down scripts have been executed). I hear you regarding the risks, please refer below.

net-up we don't care about, but if you want to keep with the NCP parallels it may be worthwhile to also consider this, to be executed once the interface is UP, whether that is delayed until all NCPs are "up" or the moment when the interface comes up ... hard to define. For Solaris this would (should) be once all NCPs are up in my opinion where it makes most sense. For Linux ... this is the sticky part around the entire discussion we've been circling the whole time. Frankly, to avoid this complication, let's just NOT implement this until it's asked for. But I think if this gets implemented it should be "after all NCPs have estalished or failed", in other words "when all NCPs have completed their negotiations". Re-negotiations may also mess with this, as such, this is an over-complication we (in my opinion) should not try and solve now, or ever ideally, only if someone asks for it.

You mentioned buffering of packets, I believe the Linux kernel already handles this to a degree (specifically, there is a file-descriptor from which packets are read, which implies the kernel is buffering them - or rather, buffering at least one, but probably more). So again, at worst, this may look like an "overly long latency issue" to the protocol, and except if it's exceptionally long as in the order of seconds it should be able to recover. I think these scripts executing timeously is the responsibility of the system administrator.

If the distinction is made between the interface state and the protocol state, then separate net-* or if-* (please indicate your preference) scripts compared to ncp-* scripts are perfectly sensible. Perhaps more so on Linux than Solaris. ncp-init may also in some cases make sense if a system administrator may want some way to indicate whether or not an NCP may or may not proceed based on authentication credentials or something (or perhaps net-pre-up should have some way of setting additional options based on whatever auth data may or may not be available - this way one could also use a private radius tag to provide data from radius which this script can then translate into pppd options - just spitballing here, since the radattr plugin already writes attributes to a file, eg, based on data from radius allow ipv4 or ipv6 to be negotiated, or even MPLS ... or associate the interface with a specific VRF in an MPLS world - I don't see a standard radius tag for this based on a VERY QUICK google, so some other mechanism may be required). Yes, there is an MPLSCP expernal plugin floating around which Gentoo have been including for a while now - I'm not sure the source.

I'd highly prefer all scripts to be synchronous, especially init and pre-up, down is less of an issue. With regards to these causing problems, suggestion perhaps, we can always adjust the log entries for executing scripts to indicate if they're being executed synchronously or asynchronously, as well as how long they actually took (in milliseconds), and for any synchronous script issue a warning if they take longer than time X, and we could even have a config option to wait max X time for synchronous scripts before sending them a SIGTERM, and then treating them as async, or simply forgetting about them. This has other risks I'm not sure I like, but frankly, sorting the scripts to not block should remain the responsibility of the system administrator of those scripts, and pppd's responsibility should end with warning in the man page about the risks. Something like this:

WARNING: This script is executed synchronously and if it doesn't complete in a timeous manner could cause protocol failures for the associated ppp protocol, and by implication the connection.

As a general rule, then I guess the protocol independent rules then for external scripts are:

-init - executed when protocol negotiation is initiated (ip-init when IPCP is started, ipv6-iit when IPV6CP is started, net-init when the ppp interface is created). sync.

-pre-up - executed prior to bringing the protocol interface up (see below regarding Linux). sync.

-up - executed after bringing the protocol interface up. async.

-down - executed after bringing the protocol down. sync/async/don't care, whatever the current status quo is.

Regarding Linux: -pre-up can be safetly executed prior to configuration in most cases, as long as the scripts don't depend on the configuration being present. Towards this end there is three strategies:

  1. Delay interface UP status until all NCPs have completed negotiation (or at least all that have a pre-up script). This is mostly workable, with a few caveats:
    1.1. Re-negotiations, or "later negotiations", eg, initially only IPV6CP is negotiated, then later IPCP is negotiated - do we bring if down just to be able to execute ip-pre-up with the down guarantee?
    1.2. NCP negotiation failures may cause unnecessary delays (such as on 3GPP). There are timeouts here, as have been pointed out, so worst case delay should be a few seconds, I believe these timeouts have been introduced after I originally ran into this around 2003-2007, I've not looked into this.
  2. Ignore the problem and execute as per current.
  3. Execute the script prior to performing configuration at the very least.

I honestly don't like any of these options in full. Each have advantages and disadvantages. I personally think this should be configurable to indicate "which broken behaviour you'd prefer if you insist on using a protocol specific pre-up", something like:

pre-up-behaviour (Linux specific) one of the following:

delayup (default) - delay bringing the ppp interface up until all NCPs with pre-up scripts have completed negotiation. It has to be noted that in the (rare) case where a NCP is (re-)negotiated at a later stage the guarantee that during execution of pre-up scripts the interface is not actually guaranteed to be down.

delayconfig - delay configuration of the interface until after the pre-up script has executed. Environment variables containing the relevant information will still be set up prior to execution of the script.

historic - Would appreciate a better name here. The interface is configured prior to pre-up, for the first pre-up there is a reasonable chance that the interface may still be down, but it cannot be guaranteed.

One could elaborate, or link to a page explaining the issue.

At this stage, it may also be a good thing to bring the script names into the NCP specific protocol structure in a way so that consistent cross-protocol behaviour can eb enforced, but this may also be reasoning too far ahead. I'd have to dig a bit deeper, but for the moment I think if we can just kinda try to finalize the desired behaviour.

@jkroonza jkroonza marked this pull request as draft February 14, 2023 06:16
@Neustradamus
Copy link
Member

@jkroonza: New commits have been added, maybe you can look :)

@jkroonza
Copy link
Contributor Author

@jkroonza: New commits have been added, maybe you can look :)

On this PR? Or on master? I can see there's merge conflicts, either way, I don't think this is going to be ready for 2.5.0 ...

@Neustradamus
Copy link
Member

@jkroonza: 2.5.0 is now ready if nothing new ^^
If you have a moment to look...

@jkroonza
Copy link
Contributor Author

@jkroonza: 2.5.0 is now ready if nothing new ^^ If you have a moment to look...

What I've seen and tested I'm happy with.

I'll re-work this code once 2.5.0 is released, hopefully for 2.5.1 but I suspect not, otherwise I'll just use it as a patch of my own until 2.6.0.

For now I've discovered at least one flaw in this patch series that's annoying, to say the least, but for 99% of our use-cases it works. If fixed to work as per the discussion in this thread it'll solve the other 1% too, for us at least. It will, however, require more work and testing than what should be done prior to 2.5.0 at this stage.

Other pending on work on rp-pppoe (server specifically) is also quite a bit more urgent for us and I've got a handle on an xl2tpd bug too that's of the three probably top priority. So any spare time that I manage to carve out currently will go to that. Most of the voice (asterisk) R&D that's ongoing currently should be completed in the next three weeks hopefully, then I can swing back to networking for a stint.

@Neustradamus
Copy link
Member

Hello @jkroonza, have you looked for your PR?

@jkroonza
Copy link
Contributor Author

Hi,

Still need to loop back to this ... hopefully in the coming week.

@jkroonza
Copy link
Contributor Author

Hey, I re-read this and other relevant discussions. I suspect I'll need to dig a bit deeper into the FSM implementation. There is definite work to be done here, and I'm happy to attempt it, going to need to sleep a bit on this.

Out of hand though, the fsm_callbacks struct - is it OK to remove say the up(fsm*) function in preference of calls like configure(fsm*), pre_up_script() and up_script()? Where the script calls really just provide a path (or potentially ncp_scripts) should be split into it's own structure.

Renaming and splitting here serves two purposes:

  1. It prevents plugins from migrating silently without thought being given to the situation; and
  2. We can run multiple network pre-ups in parallel (start the pre-up for the protocol the moment the NCP has completed, not wait for them until we are ready to up the actual interface, at which point we wait for all the pre-up scripts to complete).

{net/if}-pre-up would then execute first, guaranteed to complete, at which point other pre-ups can execute in parallel. This would mean that the FSM would need to become aware of scripts for various network protocols (And I'd put down into this too so we can wait for all of those too prior to pppd itself terminating). Since this work is happening anyway, we might just as well move this into the FSM so we can enforce behaviour of network scripts to that we can ensure -down doesn't get called prior to -up completing, and re-negotiation can't start (or complete?) until previous -down scripts have finished.

Additional (global) configuration options:

no-delay-iff-up - (Linux onl) By default on Linux bringing the interface into an up state is delayed until all network control protocols that has pre-up scripts that exist have completed (either successfully or failed) and all of these scripts (where applicable) have been executed.

no-down-if-for-preup - (Linux only) On Linux if a protocol gets negotiated after an interface has been brought up, then if said network protocol has a pre-up script, in order to maintain the constraint that the interface will be "still down" when the pre-up script it run the interface will be flagged down prior to configuring the protocol, including executing the pre-up, and then the interface will be re-enabled prior to finally executing the up script.

I'm not aware of any implementations that start negotiation after initial negotiations has completed, so perhaps the second option here should specify behaviour or "late" negotians (I have no idea for a name for such a option) with options such as:

  • down-iface-for-pre-up - (Default) bring the interface down in order to configure and execute pre-up.
  • only-if-no-pre-up - (Alternate default?) Permit a protocol if and only if there is no (known to FSM at least) pre-up script.
  • ignore-pre-up - Just don't execute the pre-up script.
  • leave-iface-up-for-pre-up - Execute the pre-up script even though the interface is already up.

Of these only only-if-no-pre-up and down-iface-for-pre-up IMHO really is sensible, but it may be that a sysadmin prefers to execute the script with the interface down, but really don't want to bring the interface down at all once it's up.

This overall is quite a major change, which has impact primarily in the Linux world, and if I do this right, almost none for Solaris.

@carlsonj @Neustradamus - opinions, objections, enlightenments, pitfalls or any other comments prior to me descending down this rabbit hole again?

@Neustradamus
Copy link
Member

@enaess, @paulusmack, @EasyNetDev: What do you think about it and the last @jkroonza comment?

@jkroonza
Copy link
Contributor Author

@enaess, @paulusmack, @EasyNetDev: What do you think about it and the last @jkroonza comment?

Really would appreciate a bit of direction here before commencing work.

@paulusmack
Copy link
Collaborator

I would prefer not to change the FSM callbacks since they correspond to events and actions defined in the relevant RFCs. The "up" action generally means that the two ends have reached agreement on how the link is to be configured. Then there is the question of how to do that configuring in practice in a race-free manner, which I assume is what we're discussing here. It seems to me that we have a higher-level state machine for which the "up" and "down" actions for the individual protocols become input events. Maybe expressing that state machine more formally might help with solving the races.

@jkroonza
Copy link
Contributor Author

jkroonza commented Aug 3, 2023

May I then simply implement a net-pre-up that executes (blocking) once auth is done but PRIOR to protocol negotiations starting?

And net-down (non-blocking) which executes just prior to taking LCP down?

Since ip-pre-up is then broken by design (and based on the feedback cannot be fixed), may I recommend that a big fat warning be added to the documentation? Something like:

WARNING: Please note that on systems where a single interface carries multiple protocols (Linux) ip-pre-up is NOT actually guaranteed to execute prior to the interface moving into an up state, although IP information won't be known you should consider using net-pre-up instead, alternatively, disable other NCPs such that IPv4 is the only negotiated protocol - which will also result in a guarantee that ip-pre-up is called prior to the interface going into an UP state.

@paulusmack
Copy link
Collaborator

May I then simply implement a net-pre-up that executes (blocking) once auth is done but PRIOR to protocol negotiations starting?

And net-down (non-blocking) which executes just prior to taking LCP down?

Since ip-pre-up is then broken by design (and based on the feedback cannot be fixed), may I recommend that a big fat warning be added to the documentation? Something like:

WARNING: Please note that on systems where a single interface carries multiple protocols (Linux) ip-pre-up is NOT actually guaranteed to execute prior to the interface moving into an up state, although IP information won't be known you should consider using net-pre-up instead, alternatively, disable other NCPs such that IPv4 is the only negotiated protocol - which will also result in a guarantee that ip-pre-up is called prior to the interface going into an UP state.

That all sounds fine to me.

@jkroonza jkroonza force-pushed the consolidated-ifup-down-and-scripts branch from 0d12ca7 to 7f65867 Compare September 26, 2023 21:45
@jkroonza
Copy link
Contributor Author

May I then simply implement a net-pre-up that executes (blocking) once auth is done but PRIOR to protocol negotiations starting?
And net-down (non-blocking) which executes just prior to taking LCP down?
Since ip-pre-up is then broken by design (and based on the feedback cannot be fixed), may I recommend that a big fat warning be added to the documentation? Something like:
WARNING: Please note that on systems where a single interface carries multiple protocols (Linux) ip-pre-up is NOT actually guaranteed to execute prior to the interface moving into an up state, although IP information won't be known you should consider using net-pre-up instead, alternatively, disable other NCPs such that IPv4 is the only negotiated protocol - which will also result in a guarantee that ip-pre-up is called prior to the interface going into an UP state.

That all sounds fine to me.

Sorry for the delay. Moving to pppd 2.5.0 is getting some priority now.

Try as much as I could I could not completely axe the net-init mechanism. So please consider the motivation:

We have incoming ppp over ethernet (pppoe-server), l2tp (xl2tpd) as well as pptpd (some things just won't die). None of the daemons allows without some tweaking a mechanism to pre-run stuff prior to exec'ing pppd without basically wrapping pppd, and certainly the unit number won't be available at that stage, so we really just need to make sure that the moment the unit number is allocated, we clean the slate and verify that out external (from perspective of pppd) management state trackers are properly reset back into sane state (in case where pppd is forcefully killed or segfaults and doesn't clean up properly this is the only sane spot to perform the cleanup).

net-pre-up also permits interface renaming (there was an attempt to implement very complex code in pppd to perform interface renaming, this way this code can be handled externally to pppd using whatever script mechanism the op prefers). Any variables exported during the auth phase is also available here.

net-down should execute after all else have torn down, and ip-down and ipv6-down have been initiated (but possibly not completed).

Documentation pending, but review on code in the meantime please.

@jkroonza jkroonza changed the title pppd: consolidate the interface up/down code a bit. pppd: implement net-init, net-pre-up and net-down. Sep 27, 2023
@jkroonza jkroonza force-pushed the consolidated-ifup-down-and-scripts branch from 7f65867 to a0ad71a Compare September 27, 2023 21:56
@jkroonza jkroonza marked this pull request as ready for review September 27, 2023 21:56
@Neustradamus
Copy link
Member

@paulusmack, @carlsonj, @enaess, @EasyNetDev: Can you look this PR?
@jkroonza has worked recently on it :)

@paulusmack
Copy link
Collaborator

The commit description is not adequate. I want to see there a brief outline of what these new scripts are for and what problem is solved by having them, and also what other problems might still exist that this doesn't solve. I want people in the future to be able to understand the motivation for these changes and the design decisions behind them without needing access to github.com.

net-init executes as a blocking script directly after the unit number
becomes available.  This can be used to initialise aspects related to
the ppp connection that lives outside of the ppp connection.  It can
also be used to clean up (in the author's extremely unlikely case) where
a previous pppd crashed, and net-down didn't execute in order to clean
up.

net-pre-up executes as a blocking script after auth, prior to NCPs being
negotiated.  Unlike ip-pre-up this is guaranteed to execute prior to the
interface being brought up, and can be used in an NCP agnostic manner to
pre-initialise aspects of the interface for which it still needs to be
down (amongst others it's recommended that firewall changes happen
here).

net-down executes in a non-blocking manner just prior to pppd
terminating and can be used to clean up actions from previous scripts.

You will notice that I mention ip-pre-up doesn't gaurantee that the
interface will still be down, this is because in a Linux world all
protocols runs on the same interface, compared to solaris where I'm
informed each protocol runs on it's own sub-interface, each of which has
it's own operational state.  The man page for pppd has also been
adjusted to indicate as much.

Signed-off-by: Jaco Kroon <[email protected]>
@jkroonza jkroonza force-pushed the consolidated-ifup-down-and-scripts branch from a0ad71a to 3906398 Compare October 3, 2023 06:13
@jkroonza
Copy link
Contributor Author

jkroonza commented Oct 3, 2023

The commit description is not adequate. I want to see there a brief outline of what these new scripts are for and what problem is solved by having them, and also what other problems might still exist that this doesn't solve. I want people in the future to be able to understand the motivation for these changes and the design decisions behind them without needing access to github.com.

That should be better now.

Whilst not my original strategy this does address all our concerns, as such I don't have further concerns around this, but there are from your side I'll be happy to add to the commit message.

@Neustradamus
Copy link
Member

@paulusmack: The @jkroonza commit description is good now?

@paulusmack paulusmack merged commit 1480d43 into ppp-project:master Oct 11, 2023
1 check passed
@Neustradamus
Copy link
Member

@EasyNetDev: What do you think about the @jkroonza merged PR?

@jkroonza jkroonza deleted the consolidated-ifup-down-and-scripts branch December 4, 2023 13:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

5 participants