Kissingwolf's Blog

Docker failure (device or resource busy)

Cause

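  1. A mount namespace bug in the CentOS/RedHat 3.10.0 kernel: a service started with the systemd PrivateTmp setting runs in a private mount namespace, and the mounts held in that namespace prevent the same file systems from being unmounted in the global namespace.
  2. The systemd unit file for the ntpd service sets "PrivateTmp=true". As a result, the mount table of ntpd's private namespace (/proc/$ntpd_pid/mounts) contains the file systems mounted for docker containers. Docker has to unmount them when it destroys a container; the two conflict, and the destroy operation fails.
  3. Docker does not hit this bug when creating or running containers, or when adding containers and images. The error appears only when a container is destroyed, because the mounted image file system cannot be unmounted.
  4. The problem occurs with every Docker storage-driver, not only with the devicemapper storage-driver.
  5. Besides ntpd, the services that trigger this bug by default on CentOS 7 include: brandbot.service, dbus-org.freedesktop.hostname1.service, dbus-org.freedesktop.import1.service, dbus-org.freedesktop.locale1.service, dbus-org.freedesktop.machine1.service, dbus-org.freedesktop.timedate1.service, httpd.service, systemd-hostnamed.service, systemd-importd.service, systemd-localed.service, systemd-machined.service, systemd-timedated.service.
  6. Any process started through unshare (that is, in a mount namespace of its own) also triggers this bug.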
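
The list of affected services in item 5 can be rebuilt on a given host by scanning the unit files. The following is a minimal sketch, not a definitive tool: the function name list_privatetmp_units is hypothetical, the default directory is assumed to be the vendor unit path used elsewhere in this post, and only the unit files themselves are inspected (drop-ins and runtime overrides are ignored).

```shell
#!/bin/sh
# Hypothetical helper: list service unit files that enable PrivateTmp.
# $1: directory holding *.service files (assumed default: /usr/lib/systemd/system)
list_privatetmp_units() {
    unit_dir="${1:-/usr/lib/systemd/system}"
    # -l prints only the names of matching files; true/yes/on/1 are the
    # values systemd treats as boolean true (lowercase spellings only here).
    grep -lE '^[[:space:]]*PrivateTmp[[:space:]]*=[[:space:]]*(true|yes|on|1)[[:space:]]*$' \
        "$unit_dir"/*.service 2>/dev/null |
    while IFS= read -r unit_file; do
        basename "$unit_file"
    done | sort
}
```

Calling list_privatetmp_units with no argument scans the default directory; passing another directory (for example /etc/systemd/system) scans that one instead.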

Solution

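  1. Before starting the docker service, edit /usr/lib/systemd/system/docker.service and add MountFlags=slave to the [Service] section, then run systemctl daemon-reload. Once docker has been restarted, the file systems mounted for docker containers are kept separate from the global mount table and are no longer affected by "Systemd PrivateTmp".
  2. If "Systemd PrivateTmp" has already taken effect and Docker cannot destroy a container, restart the services that set "PrivateTmp=true" with systemctl. Docker can then destroy the container on which docker rm -f container_name was run. Destroying another container later, however, requires restarting the services that set "PrivateTmp=true" once more.
  3. CentOS 7.4 / RHEL 7.4 (kernel-3.10.0-693) mitigates the problem but does not eliminate it: in the same setup an error is still reported, but the operation itself is not affected.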
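
As a variation on item 1, the same MountFlags=slave setting can be supplied through a systemd drop-in instead of editing the packaged unit file, so that a package update does not overwrite it. This is a sketch under assumptions: the function name write_mountflags_dropin and the file name mountflags.conf are arbitrary choices, and /etc/systemd/system is assumed as the drop-in root.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in that sets MountFlags=slave for docker.service.
# $1: systemd configuration root (assumed default: /etc/systemd/system)
write_mountflags_dropin() {
    conf_root="${1:-/etc/systemd/system}"
    dropin_dir="$conf_root/docker.service.d"
    mkdir -p "$dropin_dir" || return 1
    # A drop-in only needs the section header and the keys it overrides.
    printf '[Service]\nMountFlags=slave\n' > "$dropin_dir/mountflags.conf"
}
```

After running write_mountflags_dropin as root, systemctl daemon-reload and a restart of docker are still required, exactly as in item 1.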

Analysis

  • Environment: kernel: 3.10.0-514.26.2.el7.x86_64, docker: 17.06.2.ce-1, ntp: 4.2.6p5-25
[root@kevinzou ~]# uname -a
Linux kevinzou.kissingwolf.com 3.10.0-514.26.2.el7.x86_64 #1 SMP Tue Jul 4 15:04:05 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux
[root@kevinzou ~]# rpm -q docker-ce
docker-ce-17.06.2.ce-1.el7.centos.x86_64
[root@kevinzou ~]# rpm -q ntp
ntp-4.2.6p5-25.el7.centos.2.x86_64
  • Reproducing the problem with the docker devicemapper storage-driver
[root@kevinzou ~]# systemctl start ntpd
[root@kevinzou ~]# systemctl start docker
[root@kevinzou ~]# docker info
Containers: 1
Running: 0
Paused: 0
Stopped: 1
Images: 1
Server Version: 17.06.2-ce
Storage Driver: devicemapper
Pool Name: docker-253:0-101567388-pool
Pool Blocksize: 65.54kB
Base Device Size: 10.74GB
....
[root@kevinzou ~]# lsns
NS TYPE NPROCS PID USER COMMAND
4026531836 pid 205 1 root /usr/lib/systemd/systemd --switched-root --system --deserialize 21
4026531837 user 205 1 root /usr/lib/systemd/systemd --switched-root --system --deserialize 21
4026531838 uts 205 1 root /usr/lib/systemd/systemd --switched-root --system --deserialize 21
4026531839 ipc 205 1 root /usr/lib/systemd/systemd --switched-root --system --deserialize 21
4026531840 mnt 199 1 root /usr/lib/systemd/systemd --switched-root --system --deserialize 21
4026531856 mnt 1 17 root kdevtmpfs
4026531956 net 205 1 root /usr/lib/systemd/systemd --switched-root --system --deserialize 21
4026532462 mnt 1 587 root /usr/lib/systemd/systemd-udevd
4026532463 mnt 1 774 root /usr/bin/vmtoolsd
4026532547 mnt 2 802 root /usr/sbin/NetworkManager --no-daemon
4026532548 mnt 1 1415 ntp /usr/sbin/ntpd -u ntp:ntp -g
[root@kevinzou ~]# docker run -d busybox /bin/sh -c "while : ; do sleep 1000 ; done"
7795c776e4d0dd704a146cb17ffa16340aed88abd39e1a18ae8675c29a9ea606
[root@kevinzou ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/vg0-root 52403200 3211128 49192072 7% /
devtmpfs 986864 0 986864 0% /dev
tmpfs 997636 0 997636 0% /dev/shm
tmpfs 997636 780 996856 1% /run
tmpfs 997636 0 997636 0% /sys/fs/cgroup
/dev/sda5 20971520 16928 18856704 1% /home
/dev/sda1 201380 164836 36544 82% /boot
/dev/mapper/vg0-opt 10190100 36892 9612536 1% /opt
tmpfs 199528 0 199528 0% /run/user/0
/dev/dm-3 10474496 34920 10439576 1% /var/lib/docker/devicemapper/mnt/589d673ac29d6128fbcc90edc6f644a6c71900a577303b5cbf2ad3b61619fbca
shm 65536 0 65536 0% /var/lib/docker/containers/7795c776e4d0dd704a146cb17ffa16340aed88abd39e1a18ae8675c29a9ea606/shm
[root@kevinzou ~]# systemctl status ntpd |grep PID
Main PID: 1415 (ntpd)
[root@kevinzou ~]# grep devicemapper /proc/1415/mounts
/dev/mapper/vg0-root /var/lib/docker/devicemapper xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0

With ntpd started first and docker started (and the container created) afterwards, nothing goes wrong.

  • Restart the ntpd service to trigger the bug
[root@kevinzou ~]# systemctl restart ntpd
[root@kevinzou ~]# systemctl status ntpd |grep PID
Main PID: 1942 (ntpd)
[root@kevinzou ~]# grep devicemapper /proc/1942/mounts
/dev/mapper/vg0-root /var/lib/docker/devicemapper xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
/dev/mapper/docker-253:0-101567388-589d673ac29d6128fbcc90edc6f644a6c71900a577303b5cbf2ad3b61619fbca /var/lib/docker/devicemapper/mnt/589d673ac29d6128fbcc90edc6f644a6c71900a577303b5cbf2ad3b61619fbca xfs rw,seclabel,relatime,nouuid,attr2,inode64,logbsize=64k,sunit=128,swidth=128,noquota 0 0
[root@kevinzou ~]# docker rm -f $(docker ps -q)
Error response from daemon: driver "devicemapper" failed to remove root filesystem for 7795c776e4d0dd704a146cb17ffa16340aed88abd39e1a18ae8675c29a9ea606: failed to remove device 589d673ac29d6128fbcc90edc6f644a6c71900a577303b5cbf2ad3b61619fbca: Device is Busy
(the command hung for about 10 s at this point; the container was then confirmed removed)
[root@kevinzou ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
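
The manual grep against /proc/<pid>/mounts above can be generalized to every process on the host. The sketch below is an illustration rather than a definitive tool: the function name find_mount_holders is hypothetical, the second argument exists only so the function can be pointed at a copy of /proc, and processes that share PID 1's mount namespace are skipped on the assumption that only private namespaces are of interest.

```shell
#!/bin/sh
# Hypothetical helper: print "PID COMMAND" for each process in a private mount
# namespace whose mount table contains a mount point under the given prefix.
# $1: mount point prefix, e.g. /var/lib/docker/devicemapper/mnt/
# $2: proc directory (assumed default: /proc)
find_mount_holders() {
    prefix="$1"
    proc_root="${2:-/proc}"
    host_ns=$(readlink "$proc_root/1/ns/mnt" 2>/dev/null)
    for pid_dir in "$proc_root"/[0-9]*; do
        [ -r "$pid_dir/mounts" ] || continue
        # Skip processes that live in the same mount namespace as PID 1.
        ns=$(readlink "$pid_dir/ns/mnt" 2>/dev/null)
        [ -n "$ns" ] && [ "$ns" = "$host_ns" ] && continue
        # Field 2 of each line in /proc/<pid>/mounts is the mount point.
        if awk -v p="$prefix" 'index($2, p) == 1 { found = 1 } END { exit !found }' \
               "$pid_dir/mounts" 2>/dev/null; then
            printf '%s %s\n' "${pid_dir##*/}" "$(cat "$pid_dir/comm" 2>/dev/null)"
        fi
    done
}
```

Passing the prefix with a trailing slash (for example /var/lib/docker/devicemapper/mnt/ or /var/lib/docker/overlay/) keeps the match to container mounts and leaves out the storage directory itself, which appears in the mount table even before the bug is triggered.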
  • Reproducing the problem with the docker overlay storage-driver
[root@kevinzou ~]# systemctl start ntpd
[root@kevinzou ~]# systemctl start docker
[root@kevinzou ~]# docker info
Containers: 5
Running: 0
Paused: 0
Stopped: 5
Images: 1
Server Version: 17.06.2-ce
Storage Driver: overlay
Backing Filesystem: xfs
Supports d_type: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
[root@kevinzou ~]# lsns
NS TYPE NPROCS PID USER COMMAND
4026531836 pid 215 1 root /usr/lib/systemd/systemd --switched-root --system --deserialize 21
4026531837 user 215 1 root /usr/lib/systemd/systemd --switched-root --system --deserialize 21
4026531838 uts 215 1 root /usr/lib/systemd/systemd --switched-root --system --deserialize 21
4026531839 ipc 215 1 root /usr/lib/systemd/systemd --switched-root --system --deserialize 21
4026531840 mnt 209 1 root /usr/lib/systemd/systemd --switched-root --system --deserialize 21
4026531856 mnt 1 17 root kdevtmpfs
4026531956 net 215 1 root /usr/lib/systemd/systemd --switched-root --system --deserialize 21
4026532462 mnt 1 587 root /usr/lib/systemd/systemd-udevd
4026532463 mnt 1 774 root /usr/bin/vmtoolsd
4026532547 mnt 2 802 root /usr/sbin/NetworkManager --no-daemon
4026532548 mnt 1 1942 ntp /usr/sbin/ntpd -u ntp:ntp -g
[root@kevinzou ~]# docker run -d busybox /bin/sh -c "while : ; do sleep 1000 ; done"
58818b40586e230715df416779ec23799000616b0f0c09686ca1ccc9a8a4401d
[root@kevinzou ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/vg0-root 52403200 3211932 49191268 7% /
devtmpfs 986864 0 986864 0% /dev
tmpfs 997636 0 997636 0% /dev/shm
tmpfs 997636 780 996856 1% /run
tmpfs 997636 0 997636 0% /sys/fs/cgroup
/dev/sda5 20971520 16928 18856704 1% /home
/dev/sda1 201380 164836 36544 82% /boot
/dev/mapper/vg0-opt 10190100 36892 9612536 1% /opt
tmpfs 199528 0 199528 0% /run/user/0
overlay 52403200 3211932 49191268 7% /var/lib/docker/overlay/a1adbe925183f6969aec2a38c9505e2bc7b9184ddeb362f2c6ad4e07d0727164/merged
shm 65536 0 65536 0% /var/lib/docker/containers/58818b40586e230715df416779ec23799000616b0f0c09686ca1ccc9a8a4401d/shm
[root@kevinzou ~]# systemctl status ntpd|grep PID
Main PID: 1942 (ntpd)
[root@kevinzou ~]# grep overlay /proc/1942/mounts
/dev/mapper/vg0-root /var/lib/docker/overlay xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0

As with devicemapper, when ntpd is started first and docker is started (and the container created) afterwards, nothing goes wrong.

  • Restart the ntpd service to trigger the bug
[root@kevinzou ~]# systemctl restart ntpd
[root@kevinzou ~]# systemctl status ntpd|grep PID
Main PID: 2255 (ntpd)
[root@kevinzou ~]# grep overlay /proc/2255/mounts
/dev/mapper/vg0-root /var/lib/docker/overlay xfs rw,seclabel,relatime,attr2,inode64,noquota 0 0
overlay /var/lib/docker/overlay/a1adbe925183f6969aec2a38c9505e2bc7b9184ddeb362f2c6ad4e07d0727164/merged overlay rw,seclabel,relatime,lowerdir=/var/lib/docker/overlay/5b1bd094a587e8b891dcadec0ccbeb9feb8afed7880cfd62831f01105b92df2d/root,upperdir=/var/lib/docker/overlay/a1adbe925183f6969aec2a38c9505e2bc7b9184ddeb362f2c6ad4e07d0727164/upper,workdir=/var/lib/docker/overlay/a1adbe925183f6969aec2a38c9505e2bc7b9184ddeb362f2c6ad4e07d0727164/work 0 0
[root@kevinzou ~]# docker rm -f $(docker ps -q)
Error response from daemon: driver "overlay" failed to remove root filesystem for 58818b40586e230715df416779ec23799000616b0f0c09686ca1ccc9a8a4401d: remove /var/lib/docker/overlay/a1adbe925183f6969aec2a38c9505e2bc7b9184ddeb362f2c6ad4e07d0727164/merged: device or resource busy
[root@kevinzou ~]# docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
  • Add MountFlags=slave to the [Service] section of /usr/lib/systemd/system/docker.service to resolve the bug
[root@kevinzou ~]# systemctl stop docker
[root@kevinzou ~]# vi /usr/lib/systemd/system/docker.service
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service
Wants=network-online.target
[Service]
Type=notify
# the default is not to use systemd for cgroups because the delegate issues still
# exists and systemd currently does not support the cgroup feature set required
# for containers run by docker
ExecStart=/usr/bin/dockerd
ExecReload=/bin/kill -s HUP $MAINPID
# Having non-zero Limit*s causes performance problems due to accounting overhead
# in the kernel. We recommend using cgroups to do container-local accounting.
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
# Uncomment TasksMax if your systemd version supports it.
# Only systemd 226 and above support this version.
#TasksMax=infinity
TimeoutStartSec=0
# set delegate yes so that systemd does not reset the cgroups of docker containers
Delegate=yes
# kill only the docker process, not all processes in the cgroup
KillMode=process
# restart the docker process if it exits prematurely
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s
MountFlags=slave
[Install]
WantedBy=multi-user.target
[root@kevinzou ~]# systemctl daemon-reload
[root@kevinzou ~]# systemctl start docker
[root@kevinzou ~]# lsns
NS TYPE NPROCS PID USER COMMAND
4026531836 pid 207 1 root /usr/lib/systemd/systemd --switched-root --system --deserialize 21
4026531837 user 207 1 root /usr/lib/systemd/systemd --switched-root --system --deserialize 21
4026531838 uts 207 1 root /usr/lib/systemd/systemd --switched-root --system --deserialize 21
4026531839 ipc 207 1 root /usr/lib/systemd/systemd --switched-root --system --deserialize 21
4026531840 mnt 199 1 root /usr/lib/systemd/systemd --switched-root --system --deserialize 21
4026531856 mnt 1 17 root kdevtmpfs
4026531956 net 207 1 root /usr/lib/systemd/systemd --switched-root --system --deserialize 21
4026532462 mnt 1 587 root /usr/lib/systemd/systemd-udevd
4026532463 mnt 1 774 root /usr/bin/vmtoolsd
4026532547 mnt 2 802 root /usr/sbin/NetworkManager --no-daemon
4026532548 mnt 1 2255 ntp /usr/sbin/ntpd -u ntp:ntp -g
4026532553 mnt 2 2373 root /usr/bin/dockerd
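
The last lsns line shows dockerd in a mount namespace of its own (4026532553) instead of the one used by PID 1 (4026531840). The same check can be scripted by comparing the ns/mnt links under /proc. This is a minimal sketch; the function name in_separate_mnt_ns is hypothetical, and the third argument exists only so the function can be tried against a copy of /proc.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the two PIDs are in different mount namespaces,
# 1 if they share one, 2 if either namespace link cannot be read.
# $1, $2: process IDs   $3: proc directory (assumed default: /proc)
in_separate_mnt_ns() {
    pid_a="$1"
    pid_b="$2"
    proc_root="${3:-/proc}"
    # Each link reads like "mnt:[4026531840]"; equal text means same namespace.
    ns_a=$(readlink "$proc_root/$pid_a/ns/mnt") || return 2
    ns_b=$(readlink "$proc_root/$pid_b/ns/mnt") || return 2
    [ "$ns_a" != "$ns_b" ]
}
```

On the host, running in_separate_mnt_ns 1 "$(pgrep -o -x dockerd)" as root is expected to succeed once MountFlags=slave is in effect, and to fail when dockerd still shares the global mount namespace.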
  • Test that the bug is no longer triggered after the change
[root@kevinzou ~]# docker run -d busybox /bin/sh -c "while : ; do sleep 1000 ; done"
be96dd4a7f8bf51f36a5ba18b007bf4f7dd2245c805ad42f60fd3017f56bbe2f
[root@kevinzou ~]# systemctl restart ntpd
[root@kevinzou ~]# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/mapper/vg0-root 52403200 3212008 49191192 7% /
devtmpfs 986864 0 986864 0% /dev
tmpfs 997636 0 997636 0% /dev/shm
tmpfs 997636 780 996856 1% /run
tmpfs 997636 0 997636 0% /sys/fs/cgroup
/dev/sda5 20971520 16928 18856704 1% /home
/dev/sda1 201380 164836 36544 82% /boot
/dev/mapper/vg0-opt 10190100 36892 9612536 1% /opt
tmpfs 199528 0 199528 0% /run/user/0
[root@kevinzou ~]# systemctl status ntpd|grep PID
Main PID: 2544 (ntpd)
[root@kevinzou ~]# grep overlay /proc/2544/mounts
[root@kevinzou ~]# docker rm -f $(docker ps -q)
be96dd4a7f8b

Other concepts

Systemd PrivateTmp : True/yes

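Runs the service in a separate mount namespace with its own private /tmp and /var/tmp. Without this setting, a service runs in the system-wide mount namespace it inherits.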

Systemd MountFlags: shared/slave/private

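Sets the mount propagation flag of the file system namespace; one of shared, slave, or private. These flags control how mount and unmount operations propagate between the host and the container. shared means mounts and unmounts are synchronized between host and container (visible in both directions). slave means mounts and unmounts made inside the container do not propagate to the host, while file systems mounted on the host remain visible to processes in the container. private means mounts and unmounts on the host do not propagate into the container, and mounts in the container do not propagate to the host (invisible in both directions). The default is normally shared, but if any of PrivateTmp=, PrivateDevices=, ProtectSystem=, ProtectHome=, ReadOnlyPaths=, InaccessiblePaths=, ReadWritePaths= is used, the default is automatically downgraded from shared to slave, because these options require that mounts and unmounts do not propagate from the container to the host.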
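
The propagation type described above is recorded per mount in /proc/<pid>/mountinfo: a shared:N tag marks a shared mount, master:N marks a slave mount, and a mount with neither tag is private. The sketch below reads that file to report the type for one mount point. It is an illustration only; the function name mount_propagation and the "shared+slave" label are assumptions made here, not systemd or kernel terminology.

```shell
#!/bin/sh
# Hypothetical helper: print the propagation type of a mount point.
# $1: mount point   $2: mountinfo file (assumed default: /proc/self/mountinfo)
mount_propagation() {
    target="$1"
    mountinfo="${2:-/proc/self/mountinfo}"
    awk -v t="$target" '
        $5 == t {                      # field 5 is the mount point
            shared = 0; slave = 0; unbindable = 0
            # Optional tags start at field 7 and end at the "-" separator.
            for (i = 7; i <= NF && $i != "-"; i++) {
                if ($i ~ /^shared:/)       shared = 1
                else if ($i ~ /^master:/)  slave = 1
                else if ($i == "unbindable") unbindable = 1
            }
            if (shared && slave) result = "shared+slave"
            else if (shared)     result = "shared"
            else if (slave)      result = "slave"
            else if (unbindable) result = "unbindable"
            else                 result = "private"
            # No early exit: the last matching line is the topmost mount.
        }
        END { if (result == "") exit 1; print result }
    ' "$mountinfo"
}
```

To inspect how a particular service sees a mount, pass that process's own file as the second argument, for example /proc/<pid>/mountinfo for the ntpd or dockerd PID.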

References

https://access.redhat.com/errata/RHBA-2017:1620

https://github.com/moby/moby/issues/22260

https://github.com/moby/moby/issues/27381