Changes for page ZFS Administration - Part I - VDEVs

Comparing the version saved on 2024-09-01 12:39 with the version saved on 2024-09-01 08:54

Details

- Content
... ... @@ -1,4 +1,4 @@
1 -So, I've blogged a few times randomly about getting ZFS on GNU/Linux, and it's been a hit. I've had plenty of requests for blogging more. So, this will be the first in a long series of posts about how you can administer your ZFS filesystems and pools. You should start first by reading [[how to get ZFS installed into your GNU/Linux>> doc:Tech-Tips.ZFS-The-Aaron-Topponce-Archive.WebHome]] system here on this blog, then continue with this post.
1 +So, I've blogged a few times randomly about getting ZFS on GNU/Linux, and it's been a hit. I've had plenty of requests for blogging more. So, this will be the first in a long series of posts about how you can administer your ZFS filesystems and pools. You should start first by reading [[how to get ZFS installed into your GNU/Linux>>url:https://web.archive.org/web/20210430213532/http://pthree.org/2012/04/17/install-zfs-on-debian-gnulinux/]] system here on this blog, then continue with this post.
2 2 
3 3 == Virtual Device Introduction ==
4 4 
... ... @@ -92,20 +92,17 @@
92 92 
93 93 Notice that "mirror-0" is now the VDEV, with each physical device managed by it. As mentioned earlier, this would be analogous to a Linux software RAID "/dev/md0" device representing the four physical devices. Let's now clean up our pool, and create another.
94 94 
95 -{{code language="bash session"}}
96 -# zpool destroy tank
97 -{{/code}}
95 +{{{# zpool destroy tank}}}
98 98 
99 99 == Nested VDEVs ==
100 100 
101 101 VDEVs can be nested. A perfect example is a standard RAID-1+0 (commonly referred to as "RAID-10"). This is a stripe of mirrors. In order to specify the nested VDEVs, I just put them on the command line in order (emphasis mine):
102 102 
103 -{{code language="bash session"}}
104 -# zpool create tank mirror sde sdf mirror sdg sdh
101 +{{{# zpool create tank mirror sde sdf mirror sdg sdh
105 105 # zpool status
106 106 pool: tank
107 107 state: ONLINE
108 -
105 + scan: none requested
109 109 config:
110 110 
111 111 NAME STATE READ WRITE CKSUM
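A quick sketch to go with the nested VDEV example above: because top-level VDEVs are always dynamically striped, the same "mirror" syntax can also grow an existing pool. This assumes the RAID-1+0 pool from the example still exists and that two more unused disks are available (the names sdi and sdj below are placeholders):

{{{# zpool add tank mirror sdi sdj   # attach a third mirror VDEV; new writes stripe across all three mirrors
# zpool status tank                  # a new "mirror-2" VDEV appears alongside mirror-0 and mirror-1}}}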
... ... @@ -117,27 +117,22 @@
117 117 sdg ONLINE 0 0 0
118 118 sdh ONLINE 0 0 0
119 119 
120 -errors: No known data errors
121 -{{/code}}
117 +errors: No known data errors}}}
122 122 
123 -
124 124 The first VDEV is "mirror-0" which is managing /dev/sde and /dev/sdf. This was done by calling "mirror sde sdf". The second VDEV is "mirror-1" which is managing /dev/sdg and /dev/sdh. This was done by calling "mirror sdg sdh". Because VDEVs are always dynamically striped, "mirror-0" and "mirror-1" are striped, thus creating the RAID-1+0 setup. Don't forget to clean up before continuing:
125 125 
126 -{{code language="bash session"}}
127 -# zpool destroy tank
128 -{{/code}}
121 +{{{# zpool destroy tank}}}
129 129 
130 130 == File VDEVs ==
131 131 
132 132 As mentioned, pre-allocated files can be used for setting up zpools on your existing ext4 filesystem (or whatever). It should be noted that this is meant entirely for testing purposes, and not for storing production data. Using files is a great way to have a sandbox, where you can test compression ratio, the size of the deduplication table, or other things without actually committing production data to it. When creating file VDEVs, you cannot use relative paths, but must use absolute paths. Further, the image files must be preallocated, and not sparse files or thin provisioned. Let's see how this works:
133 133 
134 -{{code language="bash session"}}
135 -# for i in {1..4}; do dd if=/dev/zero of=/tmp/file$i bs=1G count=4 &> /dev/null; done
127 +{{{# for i in {1..4}; do dd if=/dev/zero of=/tmp/file$i bs=1G count=4 &> /dev/null; done
136 136 # zpool create tank /tmp/file1 /tmp/file2 /tmp/file3 /tmp/file4
137 137 # zpool status tank
138 138 pool: tank
139 139 state: ONLINE
140 -
132 + scan: none requested
141 141 config:
142 142 
143 143 NAME STATE READ WRITE CKSUM
... ... @@ -147,25 +147,21 @@
147 147 /tmp/file3 ONLINE 0 0 0
148 148 /tmp/file4 ONLINE 0 0 0
149 149 
150 -errors: No known data errors
151 -{{/code}}
142 +errors: No known data errors}}}
152 152 
153 153 In this case, we created a RAID-0. We used preallocated files filled from /dev/zero that are each 4 GB in size. Thus, the size of our zpool is 16 GB in usable space. Each file, as with our first example using disks, is a VDEV. Of course, you can treat the files as disks, and put them into a mirror configuration, RAID-1+0, RAIDZ-1 (coming in the next post), etc.
154 154 
155 -{{code language="bash session"}}
156 -# zpool destroy tank
157 -{{/code}}
146 +{{{# zpool destroy tank}}}
158 158 
159 159 == Hybrid pools ==
160 160 
161 161 This last example should show you the complex pools you can set up by using different VDEVs. Using our four file VDEVs from the previous example, and our four disk VDEVs /dev/sde through /dev/sdh, let's create a hybrid pool with cache and log drives. Again, I emphasized the nested VDEVs for clarity:
162 162 
163 -{{code language="bash session"}}
164 -# zpool create tank mirror /tmp/file1 /tmp/file2 mirror /tmp/file3 /tmp/file4 log mirror sde sdf cache sdg sdh
152 +{{{# zpool create tank mirror /tmp/file1 /tmp/file2 mirror /tmp/file3 /tmp/file4 log mirror sde sdf cache sdg sdh
165 165 # zpool status tank
166 166 pool: tank
167 167 state: ONLINE
168 -
156 + scan: none requested
169 169 config:
170 170 
171 171 NAME STATE READ WRITE CKSUM
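A related sketch for the hybrid pool example above: the log and cache VDEVs do not have to be declared at pool creation time. Assuming an existing pool named "tank" and the same spare devices, they can be attached to (and removed from) a live pool:

{{{# zpool add tank log mirror sde sdf   # attach a mirrored ZFS intent log after the fact
# zpool add tank cache sdg sdh           # attach two read cache devices
# zpool remove tank sdg                  # cache and log devices can be removed without touching pool data}}}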
... ... @@ -184,26 +184,22 @@
184 184 
185 185 sdg ONLINE 0 0 0
186 186 sdh ONLINE 0 0 0
187 -errors: No known data errors
188 -{{/code}}
175 +errors: No known data errors}}}
189 189 
190 190 There's a lot going on here, so let's dissect it. First, we created a RAID-1+0 using our four preallocated image files. Notice the VDEVs "mirror-0" and "mirror-1", and what they are managing. Second, we created a third VDEV called "mirror-2" that actually is not used for storing data in the pool, but is used as a ZFS intent log, or ZIL. We'll cover the ZIL in more detail in another post. Then we created two VDEVs for caching data called "sdg" and "sdh". These are standard disk VDEVs that we've already learned about. However, they are also managed by the "cache" VDEV. So, in this case, we've used 6 of the 7 VDEVs listed above; the only one missing is "spare".
191 191 
192 192 Noticing the indentation will help you see what VDEV is managing what. The "tank" pool is comprised of the "mirror-0" and "mirror-1" VDEVs for long-term persistent storage. The ZIL is managed by "mirror-2", which is comprised of /dev/sde and /dev/sdf. The read-only cache VDEV is managed by two disks, /dev/sdg and /dev/sdh. Neither the "logs" nor the "cache" are long-term storage for the pool, thus creating a "hybrid pool" setup.
193 193 
194 -{{code language="bash session"}}
195 -# zpool destroy tank
196 -{{/code}}
181 +{{{# zpool destroy tank}}}
197 197 
198 198 == Real life example ==
199 199 
200 200 In production, the files would be physical disks, and the ZIL and cache would be fast SSDs. Here is my current zpool setup which is storing this blog, among other things:
201 201 
202 -{{code language="bash session"}}
203 -# zpool status pool
187 +{{{# zpool status pool
204 204 pool: pool
205 205 state: ONLINE
206 -
190 + scan: scrub repaired 0 in 2h23m with 0 errors on Sun Dec 2 02:23:44 2012
207 207 config:
208 208 
209 209 NAME STATE READ WRITE CKSUM
... ... @@ -221,22 +221,19 @@
221 221 ata-OCZ-REVODRIVE_OCZ-33W9WE11E9X73Y41-part2 ONLINE 0 0 0
222 222 ata-OCZ-REVODRIVE_OCZ-X5RG0EIY7MN7676K-part2 ONLINE 0 0 0
223 223 
224 -errors: No known data errors
225 -{{/code}}
208 +errors: No known data errors}}}
226 226 
227 227 Notice that my "logs" and "cache" VDEVs are OCZ Revodrive SSDs, while the four platter disks are in a RAIDZ-1 VDEV (RAIDZ will be discussed in the next post). However, notice that the name of the SSDs is "ata-OCZ-REVODRIVE_OCZ-33W9WE11E9X73Y41-part1", etc. These are found in /dev/disk/by-id/. The reason I chose these instead of "sdb" and "sdc" is because the cache and log devices don't necessarily store the same ZFS metadata. Thus, when the pool is being created on boot, they may not come into the pool, and could be missing. Or, the motherboard may assign the drive letters in a different order. This isn't a problem with the main pool, but is a big problem on GNU/Linux with log and cache devices. Using the device names under /dev/disk/by-id/ ensures greater persistence and uniqueness.
228 228 
229 229 Also notice the simplicity in the implementation. Consider doing something similar with LVM, RAID and ext4. You would need to do the following:
230 230 
231 -{{code language="bash session"}}
232 -# mdadm -C /dev/md0 -l 0 -n 4 /dev/sde /dev/sdf /dev/sdg /dev/sdh
214 +{{{# mdadm -C /dev/md0 -l 0 -n 4 /dev/sde /dev/sdf /dev/sdg /dev/sdh
233 233 # pvcreate /dev/md0
234 234 # vgcreate tank /dev/md0
235 235 # lvcreate -l 100%FREE -n videos tank
236 236 # mkfs.ext4 /dev/tank/videos
237 237 # mkdir -p /tank/videos
238 -# mount -t ext4 /dev/tank/videos /tank/videos
239 -{{/code}}
220 +# mount -t ext4 /dev/tank/videos /tank/videos}}}
240 240 
241 241 The above was done in ZFS (minus creating the logical volume, which we will get to later) with one command, rather than seven.
242 242 
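For comparison, a sketch of the single ZFS command that the closing paragraph above refers to, using the same four disks; the dataset on the second line is the optional analogue of the logical volume and is covered later in the series:

{{{# zpool create tank sde sdf sdg sdh   # striped pool, automatically mounted at /tank
# zfs create tank/videos                 # optional dataset, automatically mounted at /tank/videos}}}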