1 Image Metadata

By default, Docker stores its data in /var/lib/docker on Linux, although the storage driver and directory structure differ from system to system. This document uses Docker images in the OCI standard format as an example to describe how Docker images are stored.

Figure 2.3 Image storage directory

The hash value of the file content serves as the unique ID of each image layer. After an image is obtained, it is indexed as follows: the Docker server reads the image's manifest file and locates the config file based on the sha256 digest of the config recorded in the manifest. The server then traverses all layers listed in the manifest, looks up each layer's content locally by its sha256 digest, and finally reassembles the complete image.
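The lookup described above can be sketched in a few lines of Python. This is a conceptual model only: the manifest and blob store below are toy in-memory structures with illustrative contents, not Docker's actual on-disk layout, but the indexing logic (content addressed by sha256 digest) follows the OCI scheme.

```python
import hashlib
import json

def digest(data: bytes) -> str:
    """Content-addressed ID in the OCI style: 'sha256:' + hex digest."""
    return "sha256:" + hashlib.sha256(data).hexdigest()

# Toy blob store standing in for the local image storage directory.
config_blob = b'{"architecture": "amd64", "os": "linux"}'
layer_blobs = [b"layer-0 tar data", b"layer-1 tar data"]
blobs = {digest(b): b for b in [config_blob] + layer_blobs}

# A minimal manifest referencing the config and the layers by digest.
manifest = {
    "config": {"digest": digest(config_blob)},
    "layers": [{"digest": digest(b)} for b in layer_blobs],
}

def resolve_image(manifest, blobs):
    """Mimic the server's lookup: fetch the config by its sha256
    digest, then fetch every layer in manifest order."""
    config = json.loads(blobs[manifest["config"]["digest"]])
    layers = [blobs[entry["digest"]] for entry in manifest["layers"]]
    return config, layers

config, layers = resolve_image(manifest, blobs)
print(config["os"], len(layers))
```

Because every blob is addressed by the hash of its own content, a corrupted or substituted layer would simply fail the digest lookup.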

2 Storage Driver

Ideally, directories with heavy read/write activity are placed on mounted volumes, and data is seldom written directly into the writable layer of the container. In some circumstances, however, data must be written into the writable layer. In that case, a storage driver is required to act as the medium between the container and the host. Docker relies on storage drivers to manage images and to handle the storage of, and interaction with, container instances.

Currently, Docker supports five storage drivers: AUFS, BtrFS, Device Mapper, OverlayFS, and ZFS[i]. No single storage driver fits all application scenarios, so you need to select the appropriate driver for your specific scenario to maximize Docker's performance.

Table 2.2 Comparisons of common storage drivers

AUFS
  Characteristics: UnionFS; not merged into the mainline kernel; file-level storage.
  Advantages/disadvantages: AUFS, the first storage driver used by Docker, is relatively stable, widely used in production environments, and backed by strong community support. Because AUFS is a multi-layered system, copy-on-write (CoW) operations on large files in lower layers can be slow.
  Application scenario: high-concurrency, low-I/O scenarios.

OverlayFS
  Characteristics: UnionFS; merged into the mainline kernel; file-level storage.
  Advantages/disadvantages: OverlayFS has only two layers. It always copies the entire file regardless of how much of the file is modified, so modifying large files takes longer than modifying small ones.
  Application scenario: high-concurrency, low-I/O scenarios.

Device Mapper
  Characteristics: merged into the mainline kernel; block-level storage.
  Advantages/disadvantages: Regardless of file size, Device Mapper copies only the blocks to be modified rather than the entire file. However, it does not support shared storage: if multiple containers read the same file simultaneously, multiple copies must be generated, and starting and stopping many containers can exhaust disk space.
  Application scenario: not suitable for I/O-intensive scenarios.

BtrFS
  Characteristics: merged into the mainline kernel; file-level storage.
  Advantages/disadvantages: BtrFS operates directly on underlying devices and allows devices to be added dynamically. It does not support shared storage: when multiple containers read the same file simultaneously, multiple copies must be generated.
  Application scenario: not suitable for PaaS platforms with very many containers.

ZFS
  Characteristics: all devices are aggregated into a storage pool for centralized management.
  Advantages/disadvantages: ZFS allows multiple containers to share a cache block and suits environments with large memory. However, CoW worsens fragmentation, making files' physical addresses on the hard disk discontinuous and sequential reads difficult for Docker.
  Application scenario: PaaS scenarios with dense container deployments.
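The contrast the table draws between file-level drivers (AUFS, OverlayFS) and block-level drivers (Device Mapper) can be illustrated with a back-of-the-envelope cost model. The function and numbers below are illustrative assumptions, not measurements; the 64 KiB block size is a placeholder, not a documented Device Mapper default.

```python
def cow_copy_bytes(file_size: int, modified_bytes: int,
                   granularity: str, block_size: int = 64 * 1024) -> int:
    """Bytes copied on the first write under copy-on-write.

    File-level drivers copy the whole file up into the writable layer;
    block-level drivers copy only the blocks that the write touches.
    """
    if granularity == "file":
        return file_size
    # Worst case: the modified range straddles a block boundary,
    # so one extra block is copied.
    blocks = -(-modified_bytes // block_size) + 1  # ceiling division + 1
    return min(blocks * block_size, file_size)

one_gib = 1 << 30
# Changing 1 KiB of a 1 GiB file:
print(cow_copy_bytes(one_gib, 1024, "file"))   # the entire 1 GiB file
print(cow_copy_bytes(one_gib, 1024, "block"))  # only two 64 KiB blocks
```

This is why the table notes that modifying large files is slow under OverlayFS, while Device Mapper's copy cost depends on the size of the write rather than the size of the file.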

3 Data Volume

Generally, stateful containers need to store data persistently. As mentioned above, file system modifications occur in the uppermost read-write layer. Within a container's lifecycle, data persists even after the container is stopped. Once the container is deleted, however, its data layer is deleted along with it.

Therefore, Docker uses volumes for persistent storage of container data. Data volumes are the preferred mechanism for persisting data in Docker containers. Bind mounts depend on the directory structure of the host, whereas data volumes are managed by Docker. Compared with bind mounts, data volumes have the following advantages:

  • They can be easily backed up or migrated.
  • They can be managed with Docker CLI commands or Docker APIs.
  • They can be used on both Linux and Windows systems.
  • They can be securely shared among multiple containers.
  • Volume drivers allow volume contents to be stored on remote hosts or cloud providers, to be encrypted, and to be extended with other functionality.

In addition, volumes are a better choice for data storage than the read-write layer of a container. This is because volumes can be used for persistent data storage without increasing the container size, and therefore the storage is independent of the container lifecycle.
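The lifecycle difference can be sketched with a toy model: the writable layer is discarded with the container, while a Docker-managed volume outlives it. All class and path names below are hypothetical illustrations, not Docker APIs.

```python
class Volume:
    """Stand-in for a Docker-managed volume: outlives any one container."""
    def __init__(self):
        self.data = {}

class Container:
    """Stand-in for a container with a writable layer and volume mounts."""
    def __init__(self, volumes=None):
        self.writable_layer = {}      # discarded when the container is deleted
        self.volumes = volumes or {}  # mount point -> Volume

    def write(self, path, content):
        for mount, vol in self.volumes.items():
            if path.startswith(mount):
                vol.data[path] = content      # persisted in the volume
                return
        self.writable_layer[path] = content   # lives only in the writable layer

data_vol = Volume()
c1 = Container(volumes={"/data": data_vol})
c1.write("/data/app.db", "records")   # lands in the volume
c1.write("/tmp/scratch", "cache")     # lands in the writable layer
del c1  # deleting the container discards its writable layer...

c2 = Container(volumes={"/data": data_vol})
print(c2.volumes["/data"].data)  # ...but the volume's data survives
```

The same separation is what lets a new container pick up exactly where a deleted one left off, and why the container image itself stays small regardless of how much data accumulates in the volume.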

Figure 2.4 Mounting volumes on the Docker host

(To be continued)

[i] Docker storage drivers, https://docs.docker.com/storage/storagedriver/select-storage-driver/
