
[pfsync multiqueue] tcp connections dropped when pfsync is active and multiqueue is enabled #8059

Open
2 tasks done
fhloston opened this issue Nov 13, 2024 · 3 comments
Labels
support Community support

Comments

@fhloston
Contributor

fhloston commented Nov 13, 2024

Describe the bug

When HA is configured and pfsync is active, a low percentage (2-4%) of TCP connections are dropped.

To Reproduce

Steps to reproduce the behavior:

  1. Install two OPNsense 24.7.8 firewalls with static WAN and LAN IPs.
  2. Configure WAN and LAN CARP IPs.
  3. Configure outbound NAT to translate the LAN range to the CARP IP.
  4. Enable pfsync on both, on the LAN interface (for simplicity, the dedicated pfsync interface has been omitted).
  5. Install a client in the LAN.
  6. Run the reproducer on the client; set the -m timeout parameter well above the usual download time. A local download target is recommended.
    fail=0;success=0;while :;do curl -o /dev/null -m 30 https://fsn1-speed.hetzner.com/1GB.bin && echo "success $((++success)) fail $fail" || echo "success $success fail $((++fail))";done
  7. Observe the fail count. In my case it looks like this after some hours:
    "success 12539 fail 514"

When the issue occurs, curl's current download rate drops to 0 and no more packets are received.
Assumption: the TCP state is removed and all packets are dropped.
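
One way to check this assumption (a sketch, not verified here; 192.168.1.50 stands in for the client's LAN address) is to watch the pf state table on the active firewall while a download is stalled:

```
# Run on the active (CARP master) firewall while curl hangs.
# 192.168.1.50 is a placeholder for the client's LAN IP.
pfctl -ss | grep 192.168.1.50
# If the assumption holds, the previously ESTABLISHED state for the
# stalled connection no longer appears while curl is still waiting.
```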

Expected behavior

"success 12539 fail 0"

No dropped TCP connections. When pfsync is disabled, no connections are dropped.

Describe alternatives you considered

Setting sync compatibility to 24.1 or 24.7 does not change anything, nor does switching between multicast and unicast pfsync. The only things that help are disabling pfsync entirely or pointing the unicast IPs at addresses other than the master or backup.
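
For context, these settings map roughly onto the pfsync(4) interface parameters below (a sketch of the underlying ifconfig calls, not the exact commands OPNsense runs; vtnet1 and 10.0.0.2 are placeholders for the sync interface and peer address):

```
# Multicast pfsync: state updates are multicast on the sync interface.
ifconfig pfsync0 syncdev vtnet1 up

# Unicast pfsync: state updates go to one explicit peer.
ifconfig pfsync0 syncdev vtnet1 syncpeer 10.0.0.2 up
```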

I recreated the same setup with voldemort 2.7.2 to rule out general pfsync issues or issues from the environment/load. That setup does not show the issue on the same machine type and virtualization environment, even on the same hypervisor and bridge, over several hours.

Additional context

The issue has probably been there for several months, if not years. We have especially had issues with Docker layer updates stalling; we could never pinpoint this until now. Those stalls also go away when pfsync is turned off.

Environment

OPNsense 24.7.8
Proxmox VE 8.2.7

```
bios: ovmf
boot: order=scsi0
cores: 2
cpu: host
efidisk0: nvme:vm-2180-disk-0,efitype=4m,pre-enrolled-keys=1,size=1M
hotplug: disk,network,usb,cpu
machine: q35
memory: 4096
name: fw01-debug-sense
net0: virtio=BC:24:11:0B:83:7E,bridge=vmbr0,queues=4
net1: virtio=BC:24:11:7E:BE:DA,bridge=abc1001,queues=4,tag=123
numa: 1
ostype: other
scsi0: nvme:vm-2180-disk-1,discard=on,iothread=1,size=18G,ssd=1
scsihw: virtio-scsi-single
serial0: socket
smbios1: uuid=1fdfcdb9-c5f3-468e-a066-16a705ac3d98
sockets: 1
vga: qxl
vmgenid: b6bee8e4-09d7-4d31-b2d4-96043b2d8069
```
@AdSchellevis added the support (Community support) label Nov 13, 2024
@fhloston
Contributor Author

Update: getting rid of queues=4 in the Proxmox NIC configuration mitigates the issue.

Should pfsync work with multiqueue in OPNsense?
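
For reference, this is the kind of change that mitigates it on the Proxmox side (a sketch; assumes the VM ID is 2180, matching the disk names in the config above):

```
# Rewrite net0 without the queues option; MAC and bridge stay as before.
qm set 2180 --net0 virtio=BC:24:11:0B:83:7E,bridge=vmbr0
```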

@fhloston fhloston changed the title [pfsync bug?] tcp connections dropped when pfsync is active [pfsync multiqueue] tcp connections dropped when pfsync is active and multiqueue is enabled Nov 14, 2024
@fhloston
Contributor Author

Update 2: I also tested with queues=8, which is even worse. However, I am in contact with two other people who run OPNsense on Proxmox; they both use queues=8 and HA/pfsync without any issue. Mysterious.

@fhloston
Contributor Author

Update 3: voldemort 2.7.2 does not use multiqueue, so the difference is explainable.
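
If multiqueue is the trigger, another way to isolate it (untested here) would be to disable multiqueue inside the guest via the vtnet(4) loader tunable, instead of changing the Proxmox NIC definition:

```
# /boot/loader.conf.local on the OPNsense guest (reboot required):
# tell the vtnet(4) driver not to negotiate virtio-net multiqueue
hw.vtnet.mq_disable=1
```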
