Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add Docker support #3

Draft
wants to merge 8 commits into
base: halium-9.0
Choose a base branch
from

Conversation

RoiArthurB
Copy link

@RoiArthurB RoiArthurB commented May 22, 2023

Currently, docker can't run on the Droidian's OP3 custom kernel. The limitation isn't the kernel version (as it has to be at least 3.10 and we're on 3.18), so it's a module limitation.

Currently, trying to run docker gives this error:

$ sudo docker run hello-world
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error mounting "mqueue" to rootfs at "/dev/mqueue": mount mqueue:/dev/mqueue (via /proc/self/fd/6), flags: 0xe: no such device: unknown.
ERRO[0000] error waiting for container:   

After some researches (here, here, here, here, ) I found [this article]https://blog.hypriot.com/post/verify-kernel-container-compatibility/() giving a script to check every modules/flag/else needed for docker to run properly. The current result looks like this :

$ ./check-config.sh 
info: reading kernel config from /proc/config.gz ...

Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- apparmor: enabled, but apparmor_parser missing
    (use "apt-get install apparmor" to fix this)
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled
- CONFIG_BRIDGE: enabled
- CONFIG_BRIDGE_NETFILTER: enabled
- CONFIG_IP_NF_FILTER: enabled
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled
- CONFIG_NETFILTER_XT_MATCH_IPVS: missing
- CONFIG_NETFILTER_XT_MARK: enabled
- CONFIG_IP_NF_NAT: enabled
- CONFIG_NF_NAT: enabled
- CONFIG_POSIX_MQUEUE: missing
- CONFIG_DEVPTS_MULTIPLE_INSTANCES: enabled
- CONFIG_NF_NAT_IPV4: enabled
- CONFIG_NF_NAT_NEEDED: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_SECCOMP_FILTER: enabled
- CONFIG_CGROUP_PIDS: missing
- CONFIG_MEMCG_SWAP: enabled
- CONFIG_MEMCG_SWAP_ENABLED: enabled
    (cgroup swap accounting is currently enabled)
- CONFIG_MEMCG_KMEM: missing
- CONFIG_RESOURCE_COUNTERS: enabled
- CONFIG_IOSCHED_CFQ: enabled
- CONFIG_CFQ_GROUP_IOSCHED: missing
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: missing
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: missing
- CONFIG_NET_CLS_CGROUP: enabled
- CONFIG_CGROUP_NET_PRIO: missing
- CONFIG_CFS_BANDWIDTH: missing
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_IP_NF_TARGET_REDIRECT: enabled
- CONFIG_IP_VS: missing
- CONFIG_IP_VS_NFCT: missing
- CONFIG_IP_VS_PROTO_TCP: missing
- CONFIG_IP_VS_PROTO_UDP: missing
- CONFIG_IP_VS_RR: missing
- CONFIG_SECURITY_SELINUX: enabled
- CONFIG_SECURITY_APPARMOR: enabled
- CONFIG_EXT3_FS: missing
- CONFIG_EXT3_FS_XATTR: missing
- CONFIG_EXT3_FS_POSIX_ACL: missing
- CONFIG_EXT3_FS_SECURITY: missing
    (enable these ext3 configs if you are using ext3 as backing filesystem)
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: missing
    - CONFIG_BRIDGE_VLAN_FILTERING: missing
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled
      - CONFIG_CRYPTO_GCM: enabled
      - CONFIG_CRYPTO_SEQIV: enabled
      - CONFIG_CRYPTO_GHASH: enabled
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled
      - CONFIG_XFRM_ALGO: enabled
      - CONFIG_INET_ESP: enabled
      - CONFIG_INET_XFRM_MODE_TRANSPORT: enabled
  - "ipvlan":
    - CONFIG_IPVLAN: missing
  - "macvlan":
    - CONFIG_MACVLAN: missing
    - CONFIG_DUMMY: missing
  - "ftp,tftp client in container":
    - CONFIG_NF_NAT_FTP: enabled
    - CONFIG_NF_CONNTRACK_FTP: enabled
    - CONFIG_NF_NAT_TFTP: enabled
    - CONFIG_NF_CONNTRACK_TFTP: enabled
- Storage Drivers:
  - "btrfs":
    - CONFIG_BTRFS_FS: missing
    - CONFIG_BTRFS_FS_POSIX_ACL: missing
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled
  - "zfs":
    - /dev/zfs: missing
    - zfs command: missing
    - zpool command: missing

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000

This PR (any help open) aim to try to enable/add every needed module in the kernel to have Docker working on Droidian OP3 :D

PS: Currently as a DRAFT to discuss this idea as issues/discussions are disabled

@RoiArthurB RoiArthurB changed the title feat: Enable POSIX Message Queues feat: Add Docker support May 22, 2023
@RoiArthurB
Copy link
Author

UPDATE : I tried this new kernel (with just the activation of mqueue), and it gives some very exciting results already ! The test image is working just fine now:

droidian@lineageoneplus3:~$ sudo docker run hello-world
[sudo] password for droidian: 

Hello from Docker!
This message shows that your installation appears to be working correctly.

To generate this message, Docker took the following steps:
 1. The Docker client contacted the Docker daemon.
 2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
    (arm64v8)
 3. The Docker daemon created a new container from that image which runs the
    executable that produces the output you are currently reading.
 4. The Docker daemon streamed that output to the Docker client, which sent it
    to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
 $ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID:
 https://hub.docker.com/

For more examples and ideas, visit:
 https://docs.docker.com/get-started/

I'll try docker on my OP3 furthermore to see if it's limited or not, but if the idea of bringing Docker to Droidian is something you're interested in, it doesn't seem so far from us 😉

@RoiArthurB
Copy link
Author

RoiArthurB commented May 22, 2023

Just to update you guys, docker shell is working, but all the networking and other stuff is broken at this state.

I think (obviously) it comes from missing modules from the list on the original post. From the most trivial (i.e. listed but not set), we can found this list (categorized for clearer understanding of each) :

  • General setup
    • CONFIG_POSIX_MQUEUE=y
  • IO Schedulers
    • CONFIG_CFQ_GROUP_IOSCHED is not set
  • GCOV-based kernel profiling
    • CONFIG_BLK_DEV_THROTTLING is not set
  • Classification
    • CONFIG_CGROUP_NET_PRIO is not set
  • RCU Subsystem
    • CONFIG_MEMCG_KMEM is not set
    • CONFIG_CFS_BANDWIDTH is not set
  • Xtables matches
    • CONFIG_IP_VS is not set
    • ? CONFIG_IP_VS_NFCT: missing
    • ? CONFIG_IP_VS_PROTO_TCP: missing
    • ? CONFIG_IP_VS_PROTO_UDP: missing
    • ? CONFIG_IP_VS_RR: missing
  • IEEE 1394 (FireWire) support
    • CONFIG_VXLAN is not set
    • CONFIG_MACVLAN is not set
    • CONFIG_DUMMY is not set
  • File systems
    • CONFIG_EXT3_FS is not set
    • ? CONFIG_EXT3_FS_XATTR: missing
    • ? CONFIG_EXT3_FS_POSIX_ACL: missing
    • ? CONFIG_EXT3_FS_SECURITY: missing
    • CONFIG_BTRFS_FS is not set
    • ? CONFIG_BTRFS_FS_POSIX_ACL: missing
      (PS: Question mark item are ones missing but probably hidden as the root parameter isn't set)

However, there's also this list of missing modules :

  • CONFIG_NETFILTER_XT_MATCH_IPVS
  • CONFIG_CGROUP_PIDS
  • CONFIG_CGROUP_HUGETLB
  • CONFIG_BRIDGE_VLAN_FILTERING
  • CONFIG_IPVLAN

I think, for most part, enabling those modules wouldn't create any issues nor instability (as probably already mostly coded), but I would prefer to discuss it with you guys ( @FakeShell and @Bettehem ). What's your point of view on all this PR ? :)

@FakeShell
Copy link
Collaborator

@RoiArthurB Hey

Having the ability to use docker is great. From the list you have mentioned the only ones that stand out to me as hard dependencies to get everything working are the last 5 you have mentioned. It requires more stability testing than wireguard tho.
Feel free to test it out and report back

@Bettehem
Copy link
Member

I agree with @FakeShell , docker support would be pretty awesome! But it will definitely need some testing. Thanks for looking at this :)

@RoiArthurB
Copy link
Author

RoiArthurB commented May 23, 2023

Hey buds !

So I tried to walk step by step activating modules and I almost finished the todo list

Show modules checklist
  • General setup
    • CONFIG_POSIX_MQUEUE=y
  • IO Schedulers
    • CONFIG_CFQ_GROUP_IOSCHED is not set
  • GCOV-based kernel profiling
    • CONFIG_BLK_DEV_THROTTLING is not set
  • Classification
    • CONFIG_CGROUP_NET_PRIO is not set
  • RCU Subsystem
    • CONFIG_MEMCG_KMEM is not set
    • CONFIG_CFS_BANDWIDTH is not set
  • Xtables matches
    • CONFIG_IP_VS is not set
    • CONFIG_IP_VS_NFCT: missing
    • CONFIG_IP_VS_PROTO_TCP: missing
    • CONFIG_IP_VS_PROTO_UDP: missing
    • CONFIG_IP_VS_RR: missing
  • IEEE 1394 (FireWire) support
    • CONFIG_VXLAN is not set
    • CONFIG_MACVLAN is not set
    • CONFIG_DUMMY is not set
  • File systems
    • CONFIG_EXT3_FS is not set
    • ? CONFIG_EXT3_FS_XATTR: missing
    • ? CONFIG_EXT3_FS_POSIX_ACL: missing
    • ? CONFIG_EXT3_FS_SECURITY: missing
      => Not running droidian on ext3
    • CONFIG_BTRFS_FS is not set
    • ? CONFIG_BTRFS_FS_POSIX_ACL: missing
  • Unclassified
    • CONFIG_NETFILTER_XT_MATCH_IPVS
    • ONFIG_CGROUP_HUGETLB
    • CONFIG_BRIDGE_VLAN_FILTERING

Totally missing from menuconfig :

  • CONFIG_CGROUP_PIDS
  • CONFIG_IPVLAN

But once I restart the check script some modules aren't well activated... Namely those :

  • CONFIG_CGROUP_PIDS: missing
  • CONFIG_CGROUP_HUGETLB: missing
  • CONFIG_CGROUP_NET_PRIO: missing
  • CONFIG_BRIDGE_VLAN_FILTERING: missing
Show full check-list
$ ./check-config.sh 
info: reading kernel config from /proc/config.gz ...

Generally Necessary:
- cgroup hierarchy: properly mounted [/sys/fs/cgroup]
- apparmor: enabled, but apparmor_parser missing
    (use "apt-get install apparmor" to fix this)
- CONFIG_NAMESPACES: enabled
- CONFIG_NET_NS: enabled
- CONFIG_PID_NS: enabled
- CONFIG_IPC_NS: enabled
- CONFIG_UTS_NS: enabled
- CONFIG_CGROUPS: enabled
- CONFIG_CGROUP_CPUACCT: enabled
- CONFIG_CGROUP_DEVICE: enabled
- CONFIG_CGROUP_FREEZER: enabled
- CONFIG_CGROUP_SCHED: enabled
- CONFIG_CPUSETS: enabled
- CONFIG_MEMCG: enabled
- CONFIG_KEYS: enabled
- CONFIG_VETH: enabled
- CONFIG_BRIDGE: enabled
- CONFIG_BRIDGE_NETFILTER: enabled
- CONFIG_IP_NF_FILTER: enabled
- CONFIG_IP_NF_TARGET_MASQUERADE: enabled
- CONFIG_NETFILTER_XT_MATCH_ADDRTYPE: enabled
- CONFIG_NETFILTER_XT_MATCH_CONNTRACK: enabled
- CONFIG_NETFILTER_XT_MATCH_IPVS: enabled
- CONFIG_NETFILTER_XT_MARK: enabled
- CONFIG_IP_NF_NAT: enabled
- CONFIG_NF_NAT: enabled
- CONFIG_POSIX_MQUEUE: enabled
- CONFIG_DEVPTS_MULTIPLE_INSTANCES: enabled
- CONFIG_NF_NAT_IPV4: enabled
- CONFIG_NF_NAT_NEEDED: enabled

Optional Features:
- CONFIG_USER_NS: enabled
- CONFIG_SECCOMP: enabled
- CONFIG_SECCOMP_FILTER: enabled
- CONFIG_CGROUP_PIDS: missing
- CONFIG_MEMCG_SWAP: enabled
- CONFIG_MEMCG_SWAP_ENABLED: enabled
    (cgroup swap accounting is currently enabled)
- CONFIG_MEMCG_KMEM: enabled
- CONFIG_RESOURCE_COUNTERS: enabled
- CONFIG_IOSCHED_CFQ: enabled
- CONFIG_CFQ_GROUP_IOSCHED: enabled
- CONFIG_BLK_CGROUP: enabled
- CONFIG_BLK_DEV_THROTTLING: enabled
- CONFIG_CGROUP_PERF: enabled
- CONFIG_CGROUP_HUGETLB: missing
- CONFIG_NET_CLS_CGROUP: enabled
- CONFIG_CGROUP_NET_PRIO: missing
- CONFIG_CFS_BANDWIDTH: enabled
- CONFIG_FAIR_GROUP_SCHED: enabled
- CONFIG_IP_NF_TARGET_REDIRECT: enabled
- CONFIG_IP_VS: enabled
- CONFIG_IP_VS_NFCT: enabled
- CONFIG_IP_VS_PROTO_TCP: enabled
- CONFIG_IP_VS_PROTO_UDP: enabled
- CONFIG_IP_VS_RR: enabled
- CONFIG_SECURITY_SELINUX: enabled
- CONFIG_SECURITY_APPARMOR: enabled
- CONFIG_EXT3_FS: missing
- CONFIG_EXT3_FS_XATTR: missing
- CONFIG_EXT3_FS_POSIX_ACL: missing
- CONFIG_EXT3_FS_SECURITY: missing
    (enable these ext3 configs if you are using ext3 as backing filesystem)
- CONFIG_EXT4_FS: enabled
- CONFIG_EXT4_FS_POSIX_ACL: enabled
- CONFIG_EXT4_FS_SECURITY: enabled
- Network Drivers:
  - "overlay":
    - CONFIG_VXLAN: enabled
    - CONFIG_BRIDGE_VLAN_FILTERING: missing
      Optional (for encrypted networks):
      - CONFIG_CRYPTO: enabled
      - CONFIG_CRYPTO_AEAD: enabled
      - CONFIG_CRYPTO_GCM: enabled
      - CONFIG_CRYPTO_SEQIV: enabled
      - CONFIG_CRYPTO_GHASH: enabled
      - CONFIG_XFRM: enabled
      - CONFIG_XFRM_USER: enabled
      - CONFIG_XFRM_ALGO: enabled
      - CONFIG_INET_ESP: enabled
      - CONFIG_INET_XFRM_MODE_TRANSPORT: enabled
  - "ipvlan":
    - CONFIG_IPVLAN: missing
  - "macvlan":
    - CONFIG_MACVLAN: enabled
    - CONFIG_DUMMY: enabled
  - "ftp,tftp client in container":
    - CONFIG_NF_NAT_FTP: enabled
    - CONFIG_NF_CONNTRACK_FTP: enabled
    - CONFIG_NF_NAT_TFTP: enabled
    - CONFIG_NF_CONNTRACK_TFTP: enabled
- Storage Drivers:
  - "btrfs":
    - CONFIG_BTRFS_FS: missing
    - CONFIG_BTRFS_FS_POSIX_ACL: missing
  - "overlay":
    - CONFIG_OVERLAY_FS: enabled
  - "zfs":
    - /dev/zfs: missing
    - zfs command: missing
    - zpool command: missing

Limits:
- /proc/sys/kernel/keys/root_maxkeys: 1000000

Finally, I'm not sure it made us doing any improvement... It still can't manage PID or GID inside the container

$ docker-compose up
Creating radarr ... done
Attaching to radarr
radarr    | s6-ipcserver-socketbinder: fatal: unable to create socket: Permission denied
radarr    | s6-ipcserver-socketbinder: fatal: unable to create socket: Permission denied
radarr    | s6-ipcserver-socketbinder: fatal: unable to create socket: Permission denied

Nor properly bridge network card from inside container to outside world

$ docker run -it --rm ubuntu bash
Unable to find image 'ubuntu:latest' locally
latest: Pulling from library/ubuntu
79d0ea7dc1a8: Pull complete 
Digest: sha256:dfd64a3b4296d8c9b62aa3309984f8620b98d87e47492599ee20739e8eb54fbf
Status: Downloaded newer image for ubuntu:latest
root@a6e64b5a258d:/# apt update
Ign:1 http://ports.ubuntu.com/ubuntu-ports jammy InRelease
Ign:2 http://ports.ubuntu.com/ubuntu-ports jammy-updates InRelease
Ign:3 http://ports.ubuntu.com/ubuntu-ports jammy-backports InRelease
Ign:4 http://ports.ubuntu.com/ubuntu-ports jammy-security InRelease
Ign:1 http://ports.ubuntu.com/ubuntu-ports jammy InRelease
Ign:2 http://ports.ubuntu.com/ubuntu-ports jammy-updates InRelease
Ign:3 http://ports.ubuntu.com/ubuntu-ports jammy-backports InRelease
Ign:4 http://ports.ubuntu.com/ubuntu-ports jammy-security InRelease
Ign:1 http://ports.ubuntu.com/ubuntu-ports jammy InRelease
Ign:2 http://ports.ubuntu.com/ubuntu-ports jammy-updates InRelease
Ign:3 http://ports.ubuntu.com/ubuntu-ports jammy-backports InRelease
Ign:4 http://ports.ubuntu.com/ubuntu-ports jammy-security InRelease
Err:1 http://ports.ubuntu.com/ubuntu-ports jammy InRelease
  Temporary failure resolving 'ports.ubuntu.com'
Err:2 http://ports.ubuntu.com/ubuntu-ports jammy-updates InRelease
  Temporary failure resolving 'ports.ubuntu.com'
Err:3 http://ports.ubuntu.com/ubuntu-ports jammy-backports InRelease
  Temporary failure resolving 'ports.ubuntu.com'
Err:4 http://ports.ubuntu.com/ubuntu-ports jammy-security InRelease
  Temporary failure resolving 'ports.ubuntu.com'
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
All packages are up to date.
W: Failed to fetch http://ports.ubuntu.com/ubuntu-ports/dists/jammy/InRelease  Temporary failure resolving 'ports.ubuntu.com'
W: Failed to fetch http://ports.ubuntu.com/ubuntu-ports/dists/jammy-updates/InRelease  Temporary failure resolving 'ports.ubuntu.com'
W: Failed to fetch http://ports.ubuntu.com/ubuntu-ports/dists/jammy-backports/InRelease  Temporary failure resolving 'ports.ubuntu.com'
W: Failed to fetch http://ports.ubuntu.com/ubuntu-ports/dists/jammy-security/InRelease  Temporary failure resolving 'ports.ubuntu.com'
W: Some index files failed to download. They have been ignored, or old ones used instead.

I'll keep digging it and keep you informed. Feel free to propose tracks to solve problems or else as always 😉

@FakeShell
Copy link
Collaborator

Hey
the issue looks to be about DNS. about the defconfig options, CONFIG_CGROUP_PIDS
CONFIG_IPVLAN
they were introduced on 4.5 and 3.19. so to get them enabled, you'd have to backport them to this 3.18 kernel. CONFIG_IPVLAN from the description looks quite important tho.
CONFIG_CGROUP_PIDS is also missing as its introduced in 4.5 and we suspect the issue with bluebinder (the bluetooth binder proxy) is also related to this option. currently we disable all sandboxing to start and get bluetooth running on the oneplus 3 because the kernel is too old.

@FakeShell
Copy link
Collaborator

Looking further into, you can probably cherry-pick droidian-lavender/kernel-xiaomi-lavender@49b786e for CONFIG_CGROUP_PIDS and droidian-lavender/kernel-xiaomi-lavender@2ad7bf3 for CONFIG_IPVLAN to get them working

Bettehem pushed a commit that referenced this pull request May 28, 2023
[ Upstream commit 721b1d98fb517ae99ab3b757021cf81db41e67be ]

kcopyd has no upper limit to the number of jobs one can allocate and
issue. Under certain workloads this can lead to excessive memory usage
and workqueue stalls. For example, when creating multiple dm-snapshot
targets with a 4K chunk size and then writing to the origin through the
page cache. Syncing the page cache causes a large number of BIOs to be
issued to the dm-snapshot origin target, which itself issues an even
larger (because of the BIO splitting taking place) number of kcopyd
jobs.

Running the following test, from the device mapper test suite [1],

  dmtest run --suite snapshot -n many_snapshots_of_same_volume_N

, with 8 active snapshots, results in the kcopyd job slab cache growing
to 10G. Depending on the available system RAM this can lead to the OOM
killer killing user processes:

[463.492878] kthreadd invoked oom-killer: gfp_mask=0x6040c0(GFP_KERNEL|__GFP_COMP),
              nodemask=(null), order=1, oom_score_adj=0
[463.492894] kthreadd cpuset=/ mems_allowed=0
[463.492948] CPU: 7 PID: 2 Comm: kthreadd Not tainted 4.19.0-rc7 #3
[463.492950] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
[463.492952] Call Trace:
[463.492964]  dump_stack+0x7d/0xbb
[463.492973]  dump_header+0x6b/0x2fc
[463.492987]  ? lockdep_hardirqs_on+0xee/0x190
[463.493012]  oom_kill_process+0x302/0x370
[463.493021]  out_of_memory+0x113/0x560
[463.493030]  __alloc_pages_slowpath+0xf40/0x1020
[463.493055]  __alloc_pages_nodemask+0x348/0x3c0
[463.493067]  cache_grow_begin+0x81/0x8b0
[463.493072]  ? cache_grow_begin+0x874/0x8b0
[463.493078]  fallback_alloc+0x1e4/0x280
[463.493092]  kmem_cache_alloc_node+0xd6/0x370
[463.493098]  ? copy_process.part.31+0x1c5/0x20d0
[463.493105]  copy_process.part.31+0x1c5/0x20d0
[463.493115]  ? __lock_acquire+0x3cc/0x1550
[463.493121]  ? __switch_to_asm+0x34/0x70
[463.493129]  ? kthread_create_worker_on_cpu+0x70/0x70
[463.493135]  ? finish_task_switch+0x90/0x280
[463.493165]  _do_fork+0xe0/0x6d0
[463.493191]  ? kthreadd+0x19f/0x220
[463.493233]  kernel_thread+0x25/0x30
[463.493235]  kthreadd+0x1bf/0x220
[463.493242]  ? kthread_create_on_cpu+0x90/0x90
[463.493248]  ret_from_fork+0x3a/0x50
[463.493279] Mem-Info:
[463.493285] active_anon:20631 inactive_anon:4831 isolated_anon:0
[463.493285]  active_file:80216 inactive_file:80107 isolated_file:435
[463.493285]  unevictable:0 dirty:51266 writeback:109372 unstable:0
[463.493285]  slab_reclaimable:31191 slab_unreclaimable:3483521
[463.493285]  mapped:526 shmem:4903 pagetables:1759 bounce:0
[463.493285]  free:33623 free_pcp:2392 free_cma:0
...
[463.493489] Unreclaimable slab info:
[463.493513] Name                      Used          Total
[463.493522] bio-6                   1028KB       1028KB
[463.493525] bio-5                   1028KB       1028KB
[463.493528] dm_snap_pending_exception     236783KB     243789KB
[463.493531] dm_exception              41KB         42KB
[463.493534] bio-4                   1216KB       1216KB
[463.493537] bio-3                 439396KB     439396KB
[463.493539] kcopyd_job           6973427KB    6973427KB
...
[463.494340] Out of memory: Kill process 1298 (ruby2.3) score 1 or sacrifice child
[463.494673] Killed process 1298 (ruby2.3) total-vm:435740kB, anon-rss:20180kB, file-rss:4kB, shmem-rss:0kB
[463.506437] oom_reaper: reaped process 1298 (ruby2.3), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

Moreover, issuing a large number of kcopyd jobs results in kcopyd
hogging the CPU, while processing them. As a result, processing of work
items, queued for execution on the same CPU as the currently running
kcopyd thread, is stalled for long periods of time, hurting performance.
Running the aforementioned test we get, in dmesg, messages like the
following:

[67501.194592] BUG: workqueue lockup - pool cpus=4 node=0 flags=0x0 nice=0 stuck for 27s!
[67501.195586] Showing busy workqueues and worker pools:
[67501.195591] workqueue events: flags=0x0
[67501.195597]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
[67501.195611]     pending: cache_reap
[67501.195641] workqueue mm_percpu_wq: flags=0x8
[67501.195645]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
[67501.195656]     pending: vmstat_update
[67501.195682] workqueue kblockd: flags=0x18
[67501.195687]   pwq 5: cpus=2 node=0 flags=0x0 nice=-20 active=1/256
[67501.195698]     pending: blk_timeout_work
[67501.195753] workqueue kcopyd: flags=0x8
[67501.195757]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
[67501.195768]     pending: do_work [dm_mod]
[67501.195802] workqueue kcopyd: flags=0x8
[67501.195806]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
[67501.195817]     pending: do_work [dm_mod]
[67501.195834] workqueue kcopyd: flags=0x8
[67501.195838]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
[67501.195848]     pending: do_work [dm_mod]
[67501.195881] workqueue kcopyd: flags=0x8
[67501.195885]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=1/256
[67501.195896]     pending: do_work [dm_mod]
[67501.195920] workqueue kcopyd: flags=0x8
[67501.195924]   pwq 8: cpus=4 node=0 flags=0x0 nice=0 active=2/256
[67501.195935]     in-flight: 67:do_work [dm_mod]
[67501.195945]     pending: do_work [dm_mod]
[67501.195961] pool 8: cpus=4 node=0 flags=0x0 nice=0 hung=27s workers=3 idle: 129 23765

The root cause for these issues is the way dm-snapshot uses kcopyd. In
particular, the lack of an explicit or implicit limit to the maximum
number of in-flight COW jobs. The merging path is not affected because
it implicitly limits the in-flight kcopyd jobs to one.

Fix these issues by using a semaphore to limit the maximum number of
in-flight kcopyd jobs. We grab the semaphore before allocating a new
kcopyd job in start_copy() and start_full_bio() and release it after the
job finishes in copy_callback().

The initial semaphore value is configurable through a module parameter,
to allow fine tuning the maximum number of in-flight COW jobs. Setting
this parameter to zero initializes the semaphore to INT_MAX.

A default value of 2048 maximum in-flight kcopyd jobs was chosen. This
value was decided experimentally as a trade-off between memory
consumption, stalling the kernel's workqueues and maintaining a high
enough throughput.

Re-running the aforementioned test:

  * Workqueue stalls are eliminated
  * kcopyd's job slab cache uses a maximum of 130MB
  * The time taken by the test to write to the snapshot-origin target is
    reduced from 05m20.48s to 03m26.38s

[1] https://github.com/jthornber/device-mapper-test-suite

Signed-off-by: Nikos Tsironis <[email protected]>
Signed-off-by: Ilias Tsitsimpis <[email protected]>
Signed-off-by: Mike Snitzer <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
(cherry picked from commit 1d8db45d31789884302798495ce635a6df9de868)
Bettehem pushed a commit that referenced this pull request May 28, 2023
commit cf3591ef832915892f2499b7e54b51d4c578b28c upstream.

Revert the commit bd293d071ffe65e645b4d8104f9d8fe15ea13862. The proper
fix has been made available with commit d0a255e795ab ("loop: set
PF_MEMALLOC_NOIO for the worker thread").

Note that the fix offered by commit bd293d071ffe doesn't really prevent
the deadlock from occuring - if we look at the stacktrace reported by
Junxiao Bi, we see that it hangs in bit_wait_io and not on the mutex -
i.e. it has already successfully taken the mutex. Changing the mutex
from mutex_lock to mutex_trylock won't help with deadlocks that happen
afterwards.

PID: 474    TASK: ffff8813e11f4600  CPU: 10  COMMAND: "kswapd0"
   #0 [ffff8813dedfb938] __schedule at ffffffff8173f405
   #1 [ffff8813dedfb990] schedule at ffffffff8173fa27
   #2 [ffff8813dedfb9b0] schedule_timeout at ffffffff81742fec
   #3 [ffff8813dedfba60] io_schedule_timeout at ffffffff8173f186
   #4 [ffff8813dedfbaa0] bit_wait_io at ffffffff8174034f
   #5 [ffff8813dedfbac0] __wait_on_bit at ffffffff8173fec8
   #6 [ffff8813dedfbb10] out_of_line_wait_on_bit at ffffffff8173ff81
   #7 [ffff8813dedfbb90] __make_buffer_clean at ffffffffa038736f [dm_bufio]
   #8 [ffff8813dedfbbb0] __try_evict_buffer at ffffffffa0387bb8 [dm_bufio]
   #9 [ffff8813dedfbbd0] dm_bufio_shrink_scan at ffffffffa0387cc3 [dm_bufio]
  #10 [ffff8813dedfbc40] shrink_slab at ffffffff811a87ce
  #11 [ffff8813dedfbd30] shrink_zone at ffffffff811ad778
  #12 [ffff8813dedfbdc0] kswapd at ffffffff811ae92f
  #13 [ffff8813dedfbec0] kthread at ffffffff810a8428
  #14 [ffff8813dedfbf50] ret_from_fork at ffffffff81745242

Signed-off-by: Mikulas Patocka <[email protected]>
Cc: [email protected]
Fixes: bd293d071ffe ("dm bufio: fix deadlock with loop device")
Depends-on: d0a255e795ab ("loop: set PF_MEMALLOC_NOIO for the worker thread")
Signed-off-by: Mike Snitzer <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Signed-off-by: Greg Kroah-Hartman <[email protected]>
Change-Id: I6cd6b254cca8cb8e72c197d44b91790c1a799554
(cherry picked from commit 0170e2f25f92ff74510c411a1e918d8416ae9b5f)
Bettehem pushed a commit that referenced this pull request May 28, 2023
[ Upstream commit b387e9b58679c60f5b1e4313939bd4878204fc37 ]

When system memory is in heavy pressure, bch_gc_thread_start() from
run_cache_set() may fail due to out of memory. In such condition,
c->gc_thread is assigned to -ENOMEM, not NULL pointer. Then in following
failure code path bch_cache_set_error(), when cache_set_flush() gets
called, the code piece to stop c->gc_thread is broken,
         if (!IS_ERR_OR_NULL(c->gc_thread))
                 kthread_stop(c->gc_thread);

And KASAN catches such NULL pointer deference problem, with the warning
information:

[  561.207881] ==================================================================
[  561.207900] BUG: KASAN: null-ptr-deref in kthread_stop+0x3b/0x440
[  561.207904] Write of size 4 at addr 000000000000001c by task kworker/15:1/313

[  561.207913] CPU: 15 PID: 313 Comm: kworker/15:1 Tainted: G        W         5.0.0-vanilla+ #3
[  561.207916] Hardware name: Lenovo ThinkSystem SR650 -[7X05CTO1WW]-/-[7X05CTO1WW]-, BIOS -[IVE136T-2.10]- 03/22/2019
[  561.207935] Workqueue: events cache_set_flush [bcache]
[  561.207940] Call Trace:
[  561.207948]  dump_stack+0x9a/0xeb
[  561.207955]  ? kthread_stop+0x3b/0x440
[  561.207960]  ? kthread_stop+0x3b/0x440
[  561.207965]  kasan_report+0x176/0x192
[  561.207973]  ? kthread_stop+0x3b/0x440
[  561.207981]  kthread_stop+0x3b/0x440
[  561.207995]  cache_set_flush+0xd4/0x6d0 [bcache]
[  561.208008]  process_one_work+0x856/0x1620
[  561.208015]  ? find_held_lock+0x39/0x1d0
[  561.208028]  ? drain_workqueue+0x380/0x380
[  561.208048]  worker_thread+0x87/0xb80
[  561.208058]  ? __kthread_parkme+0xb6/0x180
[  561.208067]  ? process_one_work+0x1620/0x1620
[  561.208072]  kthread+0x326/0x3e0
[  561.208079]  ? kthread_create_worker_on_cpu+0xc0/0xc0
[  561.208090]  ret_from_fork+0x3a/0x50
[  561.208110] ==================================================================
[  561.208113] Disabling lock debugging due to kernel taint
[  561.208115] irq event stamp: 11800231
[  561.208126] hardirqs last  enabled at (11800231): [<ffffffff83008538>] do_syscall_64+0x18/0x410
[  561.208127] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
[  561.208129] #PF error: [WRITE]
[  561.312253] hardirqs last disabled at (11800230): [<ffffffff830052ff>] trace_hardirqs_off_thunk+0x1a/0x1c
[  561.312259] softirqs last  enabled at (11799832): [<ffffffff850005c7>] __do_softirq+0x5c7/0x8c3
[  561.405975] PGD 0 P4D 0
[  561.442494] softirqs last disabled at (11799821): [<ffffffff831add2c>] irq_exit+0x1ac/0x1e0
[  561.791359] Oops: 0002 [#1] SMP KASAN NOPTI
[  561.791362] CPU: 15 PID: 313 Comm: kworker/15:1 Tainted: G    B   W         5.0.0-vanilla+ #3
[  561.791363] Hardware name: Lenovo ThinkSystem SR650 -[7X05CTO1WW]-/-[7X05CTO1WW]-, BIOS -[IVE136T-2.10]- 03/22/2019
[  561.791371] Workqueue: events cache_set_flush [bcache]
[  561.791374] RIP: 0010:kthread_stop+0x3b/0x440
[  561.791376] Code: 00 00 65 8b 05 26 d5 e0 7c 89 c0 48 0f a3 05 ec aa df 02 0f 82 dc 02 00 00 4c 8d 63 20 be 04 00 00 00 4c 89 e7 e8 65 c5 53 00 <f0> ff 43 20 48 8d 7b 24 48 b8 00 00 00 00 00 fc ff df 48 89 fa 48
[  561.791377] RSP: 0018:ffff88872fc8fd10 EFLAGS: 00010286
[  561.838895] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
[  561.838916] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
[  561.838934] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
[  561.838948] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
[  561.838966] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
[  561.838979] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
[  561.838996] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
[  563.067028] RAX: 0000000000000000 RBX: fffffffffffffffc RCX: ffffffff832dd314
[  563.067030] RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000297
[  563.067032] RBP: ffff88872fc8fe88 R08: fffffbfff0b8213d R09: fffffbfff0b8213d
[  563.067034] R10: 0000000000000001 R11: fffffbfff0b8213c R12: 000000000000001c
[  563.408618] R13: ffff88dc61cc0f68 R14: ffff888102b94900 R15: ffff88dc61cc0f68
[  563.408620] FS:  0000000000000000(0000) GS:ffff888f7dc00000(0000) knlGS:0000000000000000
[  563.408622] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  563.408623] CR2: 000000000000001c CR3: 0000000f48a1a004 CR4: 00000000007606e0
[  563.408625] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  563.408627] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  563.904795] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
[  563.915796] PKRU: 55555554
[  563.915797] Call Trace:
[  563.915807]  cache_set_flush+0xd4/0x6d0 [bcache]
[  563.915812]  process_one_work+0x856/0x1620
[  564.001226] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
[  564.033563]  ? find_held_lock+0x39/0x1d0
[  564.033567]  ? drain_workqueue+0x380/0x380
[  564.033574]  worker_thread+0x87/0xb80
[  564.062823] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
[  564.118042]  ? __kthread_parkme+0xb6/0x180
[  564.118046]  ? process_one_work+0x1620/0x1620
[  564.118048]  kthread+0x326/0x3e0
[  564.118050]  ? kthread_create_worker_on_cpu+0xc0/0xc0
[  564.167066] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
[  564.252441]  ret_from_fork+0x3a/0x50
[  564.252447] Modules linked in: msr rpcrdma sunrpc rdma_ucm ib_iser ib_umad rdma_cm ib_ipoib i40iw configfs iw_cm ib_cm libiscsi scsi_transport_iscsi mlx4_ib ib_uverbs mlx4_en ib_core nls_iso8859_1 nls_cp437 vfat fat intel_rapl skx_edac x86_pkg_temp_thermal coretemp iTCO_wdt iTCO_vendor_support crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel ses raid0 aesni_intel cdc_ether enclosure usbnet ipmi_ssif joydev aes_x86_64 i40e scsi_transport_sas mii bcache md_mod crypto_simd mei_me ioatdma crc64 ptp cryptd pcspkr i2c_i801 mlx4_core glue_helper pps_core mei lpc_ich dca wmi ipmi_si ipmi_devintf nd_pmem dax_pmem nd_btt ipmi_msghandler device_dax pcc_cpufreq button hid_generic usbhid mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect xhci_pci sysimgblt fb_sys_fops xhci_hcd ttm megaraid_sas drm usbcore nfit libnvdimm sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs
[  564.299390] bcache: bch_count_io_errors() nvme0n1: IO error on writing btree.
[  564.348360] CR2: 000000000000001c
[  564.348362] ---[ end trace b7f0e5cc7b2103b0 ]---

Therefore, it is not enough to only check whether c->gc_thread is NULL,
we should use IS_ERR_OR_NULL() to check both NULL pointer and error
value.

This patch changes the above buggy code piece in this way,
         if (!IS_ERR_OR_NULL(c->gc_thread))
                 kthread_stop(c->gc_thread);

Signed-off-by: Coly Li <[email protected]>
Signed-off-by: Jens Axboe <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
Change-Id: I343982388e3519fbda228da83d168e4d3230002a
(cherry picked from commit f1319aaca990cc3cd7f231e19c8e41eb8bd4053d)
Bettehem pushed a commit that referenced this pull request May 28, 2023
[ Upstream commit f6766ff6afff70e2aaf39e1511e16d471de7c3ae ]

We need to check mddev->del_work before flush workqueu since the purpose
of flush is to ensure the previous md is disappeared. Otherwise the similar
deadlock appeared if LOCKDEP is enabled, it is due to md_open holds the
bdev->bd_mutex before flush workqueue.

kernel: [  154.522645] ======================================================
kernel: [  154.522647] WARNING: possible circular locking dependency detected
kernel: [  154.522650] 5.6.0-rc7-lp151.27-default #25 Tainted: G           O
kernel: [  154.522651] ------------------------------------------------------
kernel: [  154.522653] mdadm/2482 is trying to acquire lock:
kernel: [  154.522655] ffff888078529128 ((wq_completion)md_misc){+.+.}, at: flush_workqueue+0x84/0x4b0
kernel: [  154.522673]
kernel: [  154.522673] but task is already holding lock:
kernel: [  154.522675] ffff88804efa9338 (&bdev->bd_mutex){+.+.}, at: __blkdev_get+0x79/0x590
kernel: [  154.522691]
kernel: [  154.522691] which lock already depends on the new lock.
kernel: [  154.522691]
kernel: [  154.522694]
kernel: [  154.522694] the existing dependency chain (in reverse order) is:
kernel: [  154.522696]
kernel: [  154.522696] -> #4 (&bdev->bd_mutex){+.+.}:
kernel: [  154.522704]        __mutex_lock+0x87/0x950
kernel: [  154.522706]        __blkdev_get+0x79/0x590
kernel: [  154.522708]        blkdev_get+0x65/0x140
kernel: [  154.522709]        blkdev_get_by_dev+0x2f/0x40
kernel: [  154.522716]        lock_rdev+0x3d/0x90 [md_mod]
kernel: [  154.522719]        md_import_device+0xd6/0x1b0 [md_mod]
kernel: [  154.522723]        new_dev_store+0x15e/0x210 [md_mod]
kernel: [  154.522728]        md_attr_store+0x7a/0xc0 [md_mod]
kernel: [  154.522732]        kernfs_fop_write+0x117/0x1b0
kernel: [  154.522735]        vfs_write+0xad/0x1a0
kernel: [  154.522737]        ksys_write+0xa4/0xe0
kernel: [  154.522745]        do_syscall_64+0x64/0x2b0
kernel: [  154.522748]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
kernel: [  154.522749]
kernel: [  154.522749] -> #3 (&mddev->reconfig_mutex){+.+.}:
kernel: [  154.522752]        __mutex_lock+0x87/0x950
kernel: [  154.522756]        new_dev_store+0xc9/0x210 [md_mod]
kernel: [  154.522759]        md_attr_store+0x7a/0xc0 [md_mod]
kernel: [  154.522761]        kernfs_fop_write+0x117/0x1b0
kernel: [  154.522763]        vfs_write+0xad/0x1a0
kernel: [  154.522765]        ksys_write+0xa4/0xe0
kernel: [  154.522767]        do_syscall_64+0x64/0x2b0
kernel: [  154.522769]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
kernel: [  154.522770]
kernel: [  154.522770] -> #2 (kn->count#253){++++}:
kernel: [  154.522775]        __kernfs_remove+0x253/0x2c0
kernel: [  154.522778]        kernfs_remove+0x1f/0x30
kernel: [  154.522780]        kobject_del+0x28/0x60
kernel: [  154.522783]        mddev_delayed_delete+0x24/0x30 [md_mod]
kernel: [  154.522786]        process_one_work+0x2a7/0x5f0
kernel: [  154.522788]        worker_thread+0x2d/0x3d0
kernel: [  154.522793]        kthread+0x117/0x130
kernel: [  154.522795]        ret_from_fork+0x3a/0x50
kernel: [  154.522796]
kernel: [  154.522796] -> #1 ((work_completion)(&mddev->del_work)){+.+.}:
kernel: [  154.522800]        process_one_work+0x27e/0x5f0
kernel: [  154.522802]        worker_thread+0x2d/0x3d0
kernel: [  154.522804]        kthread+0x117/0x130
kernel: [  154.522806]        ret_from_fork+0x3a/0x50
kernel: [  154.522807]
kernel: [  154.522807] -> #0 ((wq_completion)md_misc){+.+.}:
kernel: [  154.522813]        __lock_acquire+0x1392/0x1690
kernel: [  154.522816]        lock_acquire+0xb4/0x1a0
kernel: [  154.522818]        flush_workqueue+0xab/0x4b0
kernel: [  154.522821]        md_open+0xb6/0xc0 [md_mod]
kernel: [  154.522823]        __blkdev_get+0xea/0x590
kernel: [  154.522825]        blkdev_get+0x65/0x140
kernel: [  154.522828]        do_dentry_open+0x1d1/0x380
kernel: [  154.522831]        path_openat+0x567/0xcc0
kernel: [  154.522834]        do_filp_open+0x9b/0x110
kernel: [  154.522836]        do_sys_openat2+0x201/0x2a0
kernel: [  154.522838]        do_sys_open+0x57/0x80
kernel: [  154.522840]        do_syscall_64+0x64/0x2b0
kernel: [  154.522842]        entry_SYSCALL_64_after_hwframe+0x49/0xbe
kernel: [  154.522844]
kernel: [  154.522844] other info that might help us debug this:
kernel: [  154.522844]
kernel: [  154.522846] Chain exists of:
kernel: [  154.522846]   (wq_completion)md_misc --> &mddev->reconfig_mutex --> &bdev->bd_mutex
kernel: [  154.522846]
kernel: [  154.522850]  Possible unsafe locking scenario:
kernel: [  154.522850]
kernel: [  154.522852]        CPU0                    CPU1
kernel: [  154.522853]        ----                    ----
kernel: [  154.522854]   lock(&bdev->bd_mutex);
kernel: [  154.522856]                                lock(&mddev->reconfig_mutex);
kernel: [  154.522858]                                lock(&bdev->bd_mutex);
kernel: [  154.522860]   lock((wq_completion)md_misc);
kernel: [  154.522861]
kernel: [  154.522861]  *** DEADLOCK ***
kernel: [  154.522861]
kernel: [  154.522864] 1 lock held by mdadm/2482:
kernel: [  154.522865]  #0: ffff88804efa9338 (&bdev->bd_mutex){+.+.}, at: __blkdev_get+0x79/0x590
kernel: [  154.522868]
kernel: [  154.522868] stack backtrace:
kernel: [  154.522873] CPU: 1 PID: 2482 Comm: mdadm Tainted: G           O      5.6.0-rc7-lp151.27-default #25
kernel: [  154.522875] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
kernel: [  154.522878] Call Trace:
kernel: [  154.522881]  dump_stack+0x8f/0xcb
kernel: [  154.522884]  check_noncircular+0x194/0x1b0
kernel: [  154.522888]  ? __lock_acquire+0x1392/0x1690
kernel: [  154.522890]  __lock_acquire+0x1392/0x1690
kernel: [  154.522893]  lock_acquire+0xb4/0x1a0
kernel: [  154.522895]  ? flush_workqueue+0x84/0x4b0
kernel: [  154.522898]  flush_workqueue+0xab/0x4b0
kernel: [  154.522900]  ? flush_workqueue+0x84/0x4b0
kernel: [  154.522905]  ? md_open+0xb6/0xc0 [md_mod]
kernel: [  154.522908]  md_open+0xb6/0xc0 [md_mod]
kernel: [  154.522910]  __blkdev_get+0xea/0x590
kernel: [  154.522912]  ? bd_acquire+0xc0/0xc0
kernel: [  154.522914]  blkdev_get+0x65/0x140
kernel: [  154.522916]  ? bd_acquire+0xc0/0xc0
kernel: [  154.522918]  do_dentry_open+0x1d1/0x380
kernel: [  154.522921]  path_openat+0x567/0xcc0
kernel: [  154.522923]  ? __lock_acquire+0x380/0x1690
kernel: [  154.522926]  do_filp_open+0x9b/0x110
kernel: [  154.522929]  ? __alloc_fd+0xe5/0x1f0
kernel: [  154.522935]  ? kmem_cache_alloc+0x28c/0x630
kernel: [  154.522939]  ? do_sys_openat2+0x201/0x2a0
kernel: [  154.522941]  do_sys_openat2+0x201/0x2a0
kernel: [  154.522944]  do_sys_open+0x57/0x80
kernel: [  154.522946]  do_syscall_64+0x64/0x2b0
kernel: [  154.522948]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
kernel: [  154.522951] RIP: 0033:0x7f98d279d9ae

And md_alloc also flushed the same workqueue, but the thing is different
here. Because all the paths call md_alloc don't hold bdev->bd_mutex, and
the flush is necessary to avoid race condition, so leave it as it is.

Signed-off-by: Guoqing Jiang <[email protected]>
Signed-off-by: Song Liu <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
Signed-off-by: Lee Jones <[email protected]>
Change-Id: Ifb73108ae7c83ced33e92e19ff7893aa4ff6658a
(cherry picked from commit 41c5dc7a4c7dfb120987564b10715de0c9f5be94)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants