Will newer DRBD userland break older Kernels ? #99

Open
bernardgut opened this issue Sep 21, 2024 · 0 comments

Comments

@bernardgut

Hello

Sorry in advance if this is obvious, but I have been testing Linstor/DRBD on Talos for two weeks now. I have had a lot of issues with piraeus-operator (v2.6.0) on my Talos (1.7.6) cluster, namely:

  • Random quorum loss on volumes (replicas stuck in connecting(<nodeID>)/Unconnected(<nodeID>)), with no errors in drbdadm.
  • Volumes that randomly get stuck in "Terminating" when I delete them, with the events on the PV stating: Warning VolumeFailedDelete 4m42s linstor.csi.linbit.com_linstor-csi-controller-84674bd55b-4kd2n_20cbf029-05ef-4869-bf72-b9782a25f513 (combined from similar events): rpc error: code = Internal desc = failed to delete volume: Message: 'Resource 'pvc-a423738b-8249-48ae-8a57-a708f87c98e5' is still in use.'; Cause: 'Resource is mounted/in use.'; Details: 'Node: n2, Resource: pvc-a423738b-8249-48ae-8a57-a708f87c98e5'; Correction: 'Un-mount resource 'pvc-a423738b-8249-48ae-8a57-a708f87c98e5' on the node 'n2'.'; Reports: '[66EEEB28-00000-000019]' (I can share the error reports if you want).

Because I seemed to be the only one seeing these, I assumed I must have made a mistake somewhere in my config.

After investigating for a week, I found out that when you install mainline Talos (currently 1.7.6) and set up the drbd kernel module, you get version 9.2.8. The DRBD version currently packaged with piraeus-operator is 9.2.11. This is quite obscure: you have to go looking for it in the image tags, and Talos does not push extension updates to older Talos versions, so you have to wait for a new Talos release to get the latest version.

I now assume this is the reason why I am seeing all these issues. But before I open a PR against the Talos section of the piraeus-operator repo to add warnings to the documentation, so that other people who are new to this don't hit the same issue, I need to confirm this:

  • Can running DRBD 9.2.11 userland against a DRBD 9.2.8 kernel cause these kinds of issues?

I am pretty new to DRBD, so apologies if the answer is obvious. Either way, if the answer is yes, then I would suggest adding a small printk("drbd kernel version mismatch <9.X.X> vs <9.X.Y>") so that the mismatch shows up in dmesg. I can even open the PR myself if you show me which file to do it in.
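
To illustrate the kind of check meant here, below is a minimal user-space sketch in Python, not DRBD code. The function names parse_version and mismatch_warning are hypothetical, and the two version strings are assumed to be collected by hand (for example the kernel module version as reported on the node, and the version from the image tag).

```python
import re


def parse_version(text):
    """Return the first X.Y.Z version found in text as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no X.Y.Z version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def mismatch_warning(kernel_text, expected_text):
    """Return a warning string if the two versions differ, else None."""
    kernel = parse_version(kernel_text)
    expected = parse_version(expected_text)
    if kernel == expected:
        return None
    return "drbd kernel version mismatch <{}> vs <{}>".format(
        ".".join(map(str, kernel)),
        ".".join(map(str, expected)),
    )


if __name__ == "__main__":
    # Versions taken from this report: kernel module 9.2.8, packaged 9.2.11.
    print(mismatch_warning("9.2.8", "9.2.11"))
```

Versions are compared as integer tuples rather than as strings, so 9.2.11 and 9.2.8 are treated as numbers per component. Whether a mismatch like this is actually harmful is exactly the open question above; the sketch only detects it.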

Thanks
Bernard.
