Last modified by Drunk Monkey on 2024-09-01 12:39

From version 5.1
edited by Drunk Monkey
on 2024-09-01 08:57
Change comment: There is no comment for this version
To version 4.2
edited by Drunk Monkey
on 2024-09-01 08:52
Change comment: There is no comment for this version

Page properties
Content
... ... @@ -37,17 +37,14 @@
37 37  
38 38  Let's start by creating a simple zpool with my 4 drives. I could create a zpool named "tank" with the following command:
39 39  
40 -{{code language="bash session"}}
41 -# zpool create tank sde sdf sdg sdh
42 -{{/code}}
40 +{{{# zpool create tank sde sdf sdg sdh}}}
43 43  
44 44  In this case, I'm using four disk VDEVs. Notice that I'm not using full device paths, although I could. Because VDEVs are always dynamically striped, this is effectively a RAID-0 between four drives (no redundancy). We should also check the status of the zpool:
45 45  
46 -{{code language="bash session"}}
47 -# zpool status tank
44 +{{{# zpool status tank
48 48   pool: tank
49 49   state: ONLINE
50 - scan: none requested
47 + scan: none requested
51 51  config:
52 52  
53 53   NAME STATE READ WRITE CKSUM
... ... @@ -57,26 +57,21 @@
57 57   sdg ONLINE 0 0 0
58 58   sdh ONLINE 0 0 0
59 59  
60 -errors: No known data errors
61 -{{/code}}
57 +errors: No known data errors}}}
62 62  
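Incidentally, the same pool could have been created with full device paths; roughly:

{{code language="bash session"}}
# zpool create tank /dev/sde /dev/sdf /dev/sdg /dev/sdh
{{/code}}
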
63 63  Let's tear down the zpool, and create a new one. Run the following before continuing, if you're following along in your own terminal:
64 64  
65 -{{code language="bash session"}}
66 -# zpool destroy tank
67 -{{/code}}
61 +{{{# zpool destroy tank}}}
68 68  
69 -
70 70  == A simple mirrored zpool ==
71 71  
72 72  In this next example, I wish to mirror all four drives (/dev/sde, /dev/sdf, /dev/sdg and /dev/sdh). So, rather than using the disk VDEV, I'll be using "mirror". The command is as follows:
73 73  
74 -{{code language="bash session"}}
75 -# zpool create tank mirror sde sdf sdg sdh
67 +{{{# zpool create tank mirror sde sdf sdg sdh
76 76  # zpool status tank
77 77   pool: tank
78 78   state: ONLINE
79 - scan: none requested
71 + scan: none requested
80 80  config:
81 81  
82 82   NAME STATE READ WRITE CKSUM
... ... @@ -87,25 +87,21 @@
87 87   sdg ONLINE 0 0 0
88 88   sdh ONLINE 0 0 0
89 89  
90 -errors: No known data errors
91 -{{/code}}
82 +errors: No known data errors}}}
92 92  
93 93  Notice that "mirror-0" is now the VDEV, with each physical device managed by it. As mentioned earlier, this would be analogous to a Linux software RAID "/dev/md0" device representing the four physical devices. Let's now clean up our pool, and create another.
94 94  
95 -{{code language="bash session"}}
96 -# zpool destroy tank
97 -{{/code}}
86 +{{{# zpool destroy tank}}}
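
For comparison, the Linux software RAID analogue mentioned above would be built with something roughly like this four-way mdadm mirror (a sketch, not a command used in this series):

{{code language="bash session"}}
# mdadm --create /dev/md0 --level=1 --raid-devices=4 /dev/sde /dev/sdf /dev/sdg /dev/sdh
{{/code}}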
98 98  
99 99  == Nested VDEVs ==
100 100  
101 101  VDEVs can be nested. A perfect example is a standard RAID-1+0 (commonly referred to as "RAID-10"). This is a stripe of mirrors. In order to specify the nested VDEVs, I just put them on the command line in order (emphasis mine):
102 102  
103 -{{code language="bash session"}}
104 -# zpool create tank mirror sde sdf mirror sdg sdh
92 +{{{# zpool create tank mirror sde sdf mirror sdg sdh
105 105  # zpool status
106 106   pool: tank
107 107   state: ONLINE
108 - scan: none requested
96 + scan: none requested
109 109  config:
110 110  
111 111   NAME STATE READ WRITE CKSUM
... ... @@ -117,27 +117,22 @@
117 117   sdg ONLINE 0 0 0
118 118   sdh ONLINE 0 0 0
119 119  
120 -errors: No known data errors
121 -{{/code}}
108 +errors: No known data errors}}}
122 122  
123 -
124 124  The first VDEV is "mirror-0", which is managing /dev/sde and /dev/sdf. This was done by calling "mirror sde sdf". The second VDEV is "mirror-1", which is managing /dev/sdg and /dev/sdh. This was done by calling "mirror sdg sdh". Because VDEVs are always dynamically striped, "mirror-0" and "mirror-1" are striped, thus creating the RAID-1+0 setup. Don't forget to clean up before continuing:
125 125  
126 -{{code language="bash session"}}
127 -# zpool destroy tank
128 -{{/code}}
112 +{{{# zpool destroy tank}}}
129 129  
130 130  == File VDEVs ==
131 131  
132 132  As mentioned, pre-allocated files can be used for setting up zpools on your existing ext4 filesystem (or whatever). It should be noted that this is meant entirely for testing purposes, and not for storing production data. Using files is a great way to have a sandbox, where you can test compression ratio, the size of the deduplication table, or other things without actually committing production data to it. When creating file VDEVs, you cannot use relative paths, but must use absolute paths. Further, the image files must be preallocated, and not sparse files or thin provisioned. Let's see how this works:
133 133  
134 -{{code language="bash session"}}
135 -# for i in {1..4}; do dd if=/dev/zero of=/tmp/file$i bs=1G count=4 &> /dev/null; done
118 +{{{# for i in {1..4}; do dd if=/dev/zero of=/tmp/file$i bs=1G count=4 &> /dev/null; done
136 136  # zpool create tank /tmp/file1 /tmp/file2 /tmp/file3 /tmp/file4
137 137  # zpool status tank
138 138   pool: tank
139 139   state: ONLINE
140 - scan: none requested
123 + scan: none requested
141 141  config:
142 142  
143 143   NAME STATE READ WRITE CKSUM
... ... @@ -147,25 +147,21 @@
147 147   /tmp/file3 ONLINE 0 0 0
148 148   /tmp/file4 ONLINE 0 0 0
149 149  
150 -errors: No known data errors
151 -{{/code}}
133 +errors: No known data errors}}}
152 152  
153 153  In this case, we created a RAID-0. We used preallocated files, filled from /dev/zero, that are each 4 GB in size. Thus, our zpool has 16 GB of usable space. Each file, as with our first example using disks, is a VDEV. Of course, you can treat the files as disks, and put them into a mirror configuration, RAID-1+0, RAIDZ-1 (coming in the next post), etc.
154 154  
155 -{{code language="bash session"}}
156 -# zpool destroy tank
157 -{{/code}}
137 +{{{# zpool destroy tank}}}
158 158  
159 159  == Hybrid pools ==
160 160  
161 161  This last example should show you the complex pools you can set up by using different VDEVs. Using our four file VDEVs from the previous example, and our four disk VDEVs /dev/sde through /dev/sdh, let's create a hybrid pool with cache and log drives. Again, I emphasized the nested VDEVs for clarity:
162 162  
163 -{{code language="bash session"}}
164 -# zpool create tank mirror /tmp/file1 /tmp/file2 mirror /tmp/file3 /tmp/file4 log mirror sde sdf cache sdg sdh
143 +{{{# zpool create tank mirror /tmp/file1 /tmp/file2 mirror /tmp/file3 /tmp/file4 log mirror sde sdf cache sdg sdh
165 165  # zpool status tank
166 166   pool: tank
167 167   state: ONLINE
168 - scan: none requested
147 + scan: none requested
169 169  config:
170 170  
171 171   NAME STATE READ WRITE CKSUM
... ... @@ -184,26 +184,22 @@
184 184   sdg ONLINE 0 0 0
185 185   sdh ONLINE 0 0 0
186 186  
187 -errors: No known data errors
188 -{{/code}}
166 +errors: No known data errors}}}
189 189  
190 190  There's a lot going on here, so let's dissect it. First, we created a RAID-1+0 using our four preallocated image files. Notice the VDEVs "mirror-0" and "mirror-1", and what they are managing. Second, we created a third VDEV called "mirror-2" that actually is not used for storing data in the pool, but is used as a ZFS intent log, or ZIL. We'll cover the ZIL in more detail in another post. Then we created two VDEVs for caching data called "sdg" and "sdh". These are standard disk VDEVs that we've already learned about. However, they are also managed by the "cache" VDEV. So, in this case, we've used 6 of the 7 VDEVs listed above; the only one missing is "spare".
191 191  
192 192  Noticing the indentation will help you see which VDEV is managing what. The "tank" pool is comprised of the "mirror-0" and "mirror-1" VDEVs for long-term persistent storage. The ZIL is managed by "mirror-2", which is comprised of /dev/sde and /dev/sdf. The read-only cache VDEV is managed by two disks, /dev/sdg and /dev/sdh. Neither the "logs" nor the "cache" are long-term storage for the pool, thus creating a "hybrid pool" setup.
193 193  
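As a side note, log and cache VDEVs don't have to be specified at pool creation; on a pool built without them, they could be attached after the fact with something along these lines:

{{code language="bash session"}}
# zpool add tank log mirror sde sdf
# zpool add tank cache sdg sdh
{{/code}}
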
194 -{{code language="bash session"}}
195 -# zpool destroy tank
196 -{{/code}}
172 +{{{# zpool destroy tank}}}
197 197  
198 198  == Real life example ==
199 199  
200 200  In production, the files would be physical disks, and the ZIL and cache would be fast SSDs. Here is my current zpool setup, which is storing this blog, among other things:
201 201  
202 -{{code language="bash session"}}
203 -# zpool status pool
178 +{{{# zpool status pool
204 204   pool: pool
205 205   state: ONLINE
206 - scan: scrub repaired 0 in 2h23m with 0 errors on Sun Dec 2 02:23:44 2012
181 + scan: scrub repaired 0 in 2h23m with 0 errors on Sun Dec 2 02:23:44 2012
207 207  config:
208 208  
209 209   NAME STATE READ WRITE CKSUM
... ... @@ -221,22 +221,19 @@
221 221   ata-OCZ-REVODRIVE_OCZ-33W9WE11E9X73Y41-part2 ONLINE 0 0 0
222 222   ata-OCZ-REVODRIVE_OCZ-X5RG0EIY7MN7676K-part2 ONLINE 0 0 0
223 223  
224 -errors: No known data errors
225 -{{/code}}
199 +errors: No known data errors}}}
226 226  
227 227  Notice that my "logs" and "cache" VDEVs are OCZ Revodrive SSDs, while the four platter disks are in a RAIDZ-1 VDEV (RAIDZ will be discussed in the next post). However, notice that the names of the SSDs are "ata-OCZ-REVODRIVE_OCZ-33W9WE11E9X73Y41-part1", etc. These are found in /dev/disk/by-id/. The reason I chose these instead of "sdb" and "sdc" is that the cache and log devices don't necessarily store the same ZFS metadata. Thus, when the pool is being assembled on boot, they may not come into the pool, and could be missing. Or, the motherboard may assign the drive letters in a different order. This isn't a problem with the main pool, but is a big problem on GNU/Linux with log and cache devices. Using the device names under /dev/disk/by-id/ ensures greater persistence and uniqueness.
228 228  
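To see what names are available, the stable identifiers can be listed directly; for example:

{{code language="bash session"}}
# ls -l /dev/disk/by-id/
{{/code}}
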
229 229  Also, notice the simplicity of the implementation. Consider doing something similar with LVM, RAID and ext4; you would need to do the following:
230 230  
231 -{{code language="bash session"}}
232 -# mdadm -C /dev/md0 -l 0 -n 4 /dev/sde /dev/sdf /dev/sdg /dev/sdh
205 +{{{# mdadm -C /dev/md0 -l 0 -n 4 /dev/sde /dev/sdf /dev/sdg /dev/sdh
233 233  # pvcreate /dev/md0
234 234  # vgcreate tank /dev/md0
235 235  # lvcreate -l 100%FREE -n videos tank
236 236  # mkfs.ext4 /dev/tank/videos
237 237  # mkdir -p /tank/videos
238 -# mount -t ext4 /dev/tank/videos /tank/videos
239 -{{/code}}
211 +# mount -t ext4 /dev/tank/videos /tank/videos}}}
240 240  
241 241  The above was done in ZFS (minus creating the logical volume, which we will get to later) with one command, rather than seven.
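
Roughly speaking (the dataset step is covered in a later post), the ZFS equivalent boils down to:

{{code language="bash session"}}
# zpool create tank sde sdf sdg sdh
# zfs create tank/videos
{{/code}}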
242 242  
... ... @@ -245,6 +245,7 @@
245 245  This should act as a good starting point for getting a basic understanding of zpools and VDEVs. The rest of it is all downhill from here. You've made it over the "big hurdle" of understanding how ZFS handles pooled storage. We still need to cover RAIDZ levels, and we still need to go into more depth about log and cache devices, as well as pool settings, such as deduplication and compression, but all of these will be handled in separate posts. Then we can get into ZFS filesystem datasets, their settings, and advantages and disadvantages. But, you now have a head start on the core part of ZFS pools.
246 246  
247 247  ----
220 +
248 248  (% style="text-align: center;" %)
249 249  Posted by Aaron Toponce on Tuesday, December 4, 2012, at 6:00 am.
250 250  Filed under [[Debian>>url:https://web.archive.org/web/20210430213532/https://pthree.org/category/debian/]], [[Linux>>url:https://web.archive.org/web/20210430213532/https://pthree.org/category/linux/]], [[Ubuntu>>url:https://web.archive.org/web/20210430213532/https://pthree.org/category/ubuntu/]], [[ZFS>>url:https://web.archive.org/web/20210430213532/https://pthree.org/category/zfs/]].
... ... @@ -251,6 +251,7 @@
251 251  Follow any responses to this post with its [[comments RSS>>url:https://web.archive.org/web/20210430213532/https://pthree.org/2012/12/04/zfs-administration-part-i-vdevs/feed/]] feed.
252 252  You can [[post a comment>>url:https://web.archive.org/web/20210430213532/https://pthree.org/2012/12/04/zfs-administration-part-i-vdevs/#respond]] or [[trackback>>url:https://web.archive.org/web/20210430213532/https://pthree.org/2012/12/04/zfs-administration-part-i-vdevs/trackback/]] from your blog.
253 253  For IM, Email or Microblogs, here is the [[Shortlink>>url:https://web.archive.org/web/20210430213532/https://pthree.org/?p=2584]].
227 +
254 254  ----
255 255  
256 256  {{box title="**Archived From:**"}}