Changes for page ZFS Administration - Part I - VDEVs


From the version of 2024-09-01 08:29
to the version of 2024-09-01 08:57

Let's start by creating a simple zpool with my 4 drives. I could create a zpool named "tank" with the following command:

{{code language="bash session"}}
# zpool create tank sde sdf sdg sdh
{{/code}}

In this case, I'm using four disk VDEVs. Notice that I'm not using full device paths, although I could. Because VDEVs are always dynamically striped, this is effectively a RAID-0 between four drives (no redundancy). We should also check the status of the zpool:

{{code language="bash session"}}
# zpool status tank
  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          sde       ONLINE       0     0     0
          sdf       ONLINE       0     0     0
          sdg       ONLINE       0     0     0
          sdh       ONLINE       0     0     0

errors: No known data errors
{{/code}}

Let's tear down the zpool, and create a new one. Run the following before continuing, if you're following along in your own terminal:

{{code language="bash session"}}
# zpool destroy tank
{{/code}}

== A simple mirrored zpool ==

In this next example, I wish to mirror all four drives (/dev/sde, /dev/sdf, /dev/sdg and /dev/sdh). So, rather than using the disk VDEV, I'll be using "mirror". The command is as follows:

{{code language="bash session"}}
# zpool create tank mirror sde sdf sdg sdh
# zpool status tank
  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sdh     ONLINE       0     0     0

errors: No known data errors
{{/code}}

Notice that "mirror-0" is now the VDEV, with each physical device managed by it. As mentioned earlier, this would be analogous to a Linux software RAID "/dev/md0" device representing the four physical devices. Let's now clean up our pool, and create another.

{{code language="bash session"}}
# zpool destroy tank
{{/code}}

== Nested VDEVs ==

VDEVs can be nested. A perfect example is a standard RAID-1+0 (commonly referred to as "RAID-10"). This is a stripe of mirrors. In order to specify the nested VDEVs, I just put them on the command line in order (emphasis mine):

{{code language="bash session"}}
# zpool create tank mirror sde sdf mirror sdg sdh
# zpool status
  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            sde     ONLINE       0     0     0
            sdf     ONLINE       0     0     0
          mirror-1  ONLINE       0     0     0
            sdg     ONLINE       0     0     0
            sdh     ONLINE       0     0     0

errors: No known data errors
{{/code}}

The first VDEV is "mirror-0", which is managing /dev/sde and /dev/sdf. This was done by calling "mirror sde sdf". The second VDEV is "mirror-1", which is managing /dev/sdg and /dev/sdh. This was done by calling "mirror sdg sdh". Because VDEVs are always dynamically striped, "mirror-0" and "mirror-1" are striped, thus creating the RAID-1+0 setup.
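Because the pool is simply a dynamic stripe across its top-level VDEVs, a RAID-1+0 pool like this could also be grown later by adding another mirror VDEV. A minimal sketch, assuming two additional (hypothetical) drives /dev/sdi and /dev/sdj were available:

{{code language="bash session"}}
# zpool add tank mirror sdi sdj
{{/code}}

A third VDEV, "mirror-2", would then appear alongside "mirror-0" and "mirror-1" in the zpool status output, and writes would be striped across all three mirrors.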
Don't forget to clean up before continuing:

{{code language="bash session"}}
# zpool destroy tank
{{/code}}

== File VDEVs ==

As mentioned, pre-allocated files can be used for setting up zpools on your existing ext4 filesystem (or whatever). It should be noted that this is meant entirely for testing purposes, and not for storing production data. Using files is a great way to have a sandbox, where you can test compression ratio, the size of the deduplication table, or other things without actually committing production data to it. When creating file VDEVs, you cannot use relative paths, but must use absolute paths. Further, the image files must be preallocated, and not sparse files or thin provisioned. Let's see how this works:

{{code language="bash session"}}
# for i in {1..4}; do dd if=/dev/zero of=/tmp/file$i bs=1G count=4 &> /dev/null; done
# zpool create tank /tmp/file1 /tmp/file2 /tmp/file3 /tmp/file4
# zpool status tank
  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME          STATE     READ WRITE CKSUM
        tank          ONLINE       0     0     0
          /tmp/file1  ONLINE       0     0     0
          /tmp/file2  ONLINE       0     0     0
          /tmp/file3  ONLINE       0     0     0
          /tmp/file4  ONLINE       0     0     0

errors: No known data errors
{{/code}}

In this case, we created a RAID-0. We used preallocated files using /dev/zero that are each 4 GB in size. Thus, the size of our zpool is 16 GB in usable space. Each file, as with our first example using disks, is a VDEV. Of course, you can treat the files as disks, and put them into a mirror configuration, RAID-1+0, RAIDZ-1 (coming in the next post), etc.

{{code language="bash session"}}
# zpool destroy tank
{{/code}}

== Hybrid pools ==

This last example should show you the complex pools you can set up by using different VDEVs. Using our four file VDEVs from the previous example, and our four disk VDEVs /dev/sde through /dev/sdh, let's create a hybrid pool with cache and log drives. Again, I emphasized the nested VDEVs for clarity:

{{code language="bash session"}}
# zpool create tank mirror /tmp/file1 /tmp/file2 mirror /tmp/file3 /tmp/file4 log mirror sde sdf cache sdg sdh
# zpool status tank
  pool: tank
 state: ONLINE
  scan: none requested
config:

        NAME            STATE     READ WRITE CKSUM
        tank            ONLINE       0     0     0
          mirror-0      ONLINE       0     0     0
            /tmp/file1  ONLINE       0     0     0
            /tmp/file2  ONLINE       0     0     0
          mirror-1      ONLINE       0     0     0
            /tmp/file3  ONLINE       0     0     0
            /tmp/file4  ONLINE       0     0     0
        logs
          mirror-2      ONLINE       0     0     0
            sde         ONLINE       0     0     0
            sdf         ONLINE       0     0     0
        cache
          sdg           ONLINE       0     0     0
          sdh           ONLINE       0     0     0

errors: No known data errors
{{/code}}

There's a lot going on here, so let's dissect it. First, we created a RAID-1+0 using our four preallocated image files. Notice the VDEVs "mirror-0" and "mirror-1", and what they are managing. Second, we created a third VDEV called "mirror-2" that actually is not used for storing data in the pool, but is used as a ZFS intent log, or ZIL. We'll cover the ZIL in more detail in another post. Then we created two VDEVs for caching data called "sdg" and "sdh". These are standard disk VDEVs that we've already learned about. However, they are also managed by the "cache" VDEV. So, in this case, we've used 6 of the 7 VDEVs listed above; the only one missing is "spare".

Noticing the indentation will help you see what VDEV is managing what. The "tank" pool is comprised of the "mirror-0" and "mirror-1" VDEVs for long-term persistent storage. The ZIL is managed by "mirror-2", which is comprised of /dev/sde and /dev/sdf. The read-only cache VDEV is managed by two disks, /dev/sdg and /dev/sdh. Neither the "logs" nor the "cache" are long-term storage for the pool, thus creating a "hybrid pool" setup.
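The only VDEV type from the list that this hybrid pool doesn't exercise is "spare". As a minimal, hedged sketch, assuming an additional (hypothetical) unused drive /dev/sdi were available, a hot spare could be added to, and later removed from, the pool like so:

{{code language="bash session"}}
# zpool add tank spare sdi
# zpool remove tank sdi
{{/code}}

The spare would show up under its own "spares" section in the zpool status output, sitting idle until it's needed to replace a failed drive.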
{{code language="bash session"}}
# zpool destroy tank
{{/code}}

== Real life example ==

In production, the files would be physical disks, and the ZIL and cache would be fast SSDs. Here is my current zpool setup which is storing this blog, among other things:

{{code language="bash session"}}
# zpool status pool
  pool: pool
 state: ONLINE
  scan: scrub repaired 0 in 2h23m with 0 errors on Sun Dec 2 02:23:44 2012
config:

        NAME                                              STATE     READ WRITE CKSUM
        ...
          ata-OCZ-REVODRIVE_OCZ-33W9WE11E9X73Y41-part2    ONLINE       0     0     0
          ata-OCZ-REVODRIVE_OCZ-X5RG0EIY7MN7676K-part2    ONLINE       0     0     0

errors: No known data errors
{{/code}}

Notice that my "logs" and "cache" VDEVs are OCZ Revodrive SSDs, while the four platter disks are in a RAIDZ-1 VDEV (RAIDZ will be discussed in the next post). However, notice that the names of the SSDs are "ata-OCZ-REVODRIVE_OCZ-33W9WE11E9X73Y41-part1", etc. These are found in /dev/disk/by-id/. The reason I chose these instead of "sdb" and "sdc" is because the cache and log devices don't necessarily store the same ZFS metadata. Thus, when the pool is imported at boot, they may not come into the pool, and could be missing. Or, the motherboard may assign the drive letters in a different order. This isn't a problem with the main pool, but is a big problem on GNU/Linux with log and cache devices. Using the device names under /dev/disk/by-id/ ensures greater persistence and uniqueness.

Also, notice the simplicity in the implementation. Consider doing something similar with LVM, RAID and ext4. You would need to do the following:

{{code language="bash session"}}
# mdadm -C /dev/md0 -l 0 -n 4 /dev/sde /dev/sdf /dev/sdg /dev/sdh
# pvcreate /dev/md0
# vgcreate tank /dev/md0
# lvcreate -l 100%FREE -n videos tank
# mkfs.ext4 /dev/tank/videos
# mkdir -p /tank/videos
# mount -t ext4 /dev/tank/videos /tank/videos
{{/code}}

The above was done in ZFS (minus creating the logical volume, which we'll get to later) with one command, rather than seven.
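To make that comparison concrete, here is a rough sketch of the ZFS equivalent, assuming the same four drives. The second command creates the "videos" dataset (ZFS's counterpart to the logical volume, covered later in this series); the pool and the dataset are mounted automatically at /tank and /tank/videos, so there are no separate mkfs, mkdir or mount steps:

{{code language="bash session"}}
# zpool create tank sde sdf sdg sdh
# zfs create tank/videos
{{/code}}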
== Conclusion ==

This should act as a good starting point for getting a basic understanding of zpools and VDEVs. The rest of it is all downhill from here. You've made it over the "big hurdle" of understanding how ZFS handles pooled storage. We still need to cover RAIDZ levels, and we still need to go into more depth about log and cache devices, as well as pool settings, such as deduplication and compression, but all of these will be handled in separate posts. Then we can get into ZFS filesystem datasets, their settings, and their advantages and disadvantages. But, you now have a head start on the core part of ZFS pools.

----

(% style="text-align: center;" %)
Posted by Aaron Toponce on Tuesday, December 4, 2012, at 6:00 am.
Filed under [[Debian>>url:https://web.archive.org/web/20210430213532/https://pthree.org/category/debian/]], [[Linux>>url:https://web.archive.org/web/20210430213532/https://pthree.org/category/linux/]], [[Ubuntu>>url:https://web.archive.org/web/20210430213532/https://pthree.org/category/ubuntu/]], [[ZFS>>url:https://web.archive.org/web/20210430213532/https://pthree.org/category/zfs/]].
Follow any responses to this post with its [[comments RSS>>url:https://web.archive.org/web/20210430213532/https://pthree.org/2012/12/04/zfs-administration-part-i-vdevs/feed/]] feed.
You can [[post a comment>>url:https://web.archive.org/web/20210430213532/https://pthree.org/2012/12/04/zfs-administration-part-i-vdevs/#respond]] or [[trackback>>url:https://web.archive.org/web/20210430213532/https://pthree.org/2012/12/04/zfs-administration-part-i-vdevs/trackback/]] from your blog.
For IM, Email or Microblogs, here is the [[Shortlink>>url:https://web.archive.org/web/20210430213532/https://pthree.org/?p=2584]].

----

{{box title="**Archived From:**"}}
[[https:~~/~~/web.archive.org/web/20210430213532/https:~~/~~/pthree.org/2012/12/04/zfs-administration-part-i-vdevs/>>https://web.archive.org/web/20210430213532/https://pthree.org/2012/12/04/zfs-administration-part-i-vdevs/]]
{{/box}}