LVM Setup (on SATABeast)
By Chris Pepper on Monday, February 23 2009, 16:43 - Linux
Update 2010/05/06: Apparently I was wrong. ext3 uses 32-bit block numbers, from 0 to 4,294,967,295. With 4kbyte blocks (the maximum on i386 & x86_64 systems) this gives a maximum ext3 filesystem of (2^32-1) * 4096 = 17,592,186,040,320 bytes. Using LVM with 4096kbyte physical extents, this means ext3 filesystems must be under 4,194,304 PEs, so use lvcreate --extents 4194303. 4,194,303 4096kbyte physical extents = 4,294,966,272 4kbyte blocks = 17,179,865,088 kbytes (17,592,181,850,112 bytes) in the resulting filesystem.
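To double-check the arithmetic in that update, here is a small sketch in plain sh integer math. The variable names are mine, and it assumes the 4kbyte block size and 4096kbyte extent size mentioned above.

```shell
#!/bin/sh
# ext3 block numbers are 32-bit, so the highest block number is 2^32 - 1.
BLOCK_BYTES=4096     # 4kbyte filesystem blocks
PE_KIB=4096          # LVM physical extent size in kbytes

MAX_BLOCKS=$(( (1 << 32) - 1 ))
MAX_FS_BYTES=$(( MAX_BLOCKS * BLOCK_BYTES ))

# 4kbyte blocks per extent, then the largest whole number of extents
# that still stays within the 32-bit block limit.
BLOCKS_PER_PE=$(( PE_KIB * 1024 / BLOCK_BYTES ))
MAX_PE=$(( MAX_BLOCKS / BLOCKS_PER_PE ))

# Size of a logical volume built from exactly MAX_PE extents.
LV_BLOCKS=$(( MAX_PE * BLOCKS_PER_PE ))
LV_KIB=$(( MAX_PE * PE_KIB ))
LV_BYTES=$(( LV_BLOCKS * BLOCK_BYTES ))

echo "max ext3 size: $MAX_FS_BYTES bytes"
echo "max extents:   $MAX_PE"
echo "LV blocks:     $LV_BLOCKS"
echo "LV size:       $LV_KIB kbytes = $LV_BYTES bytes"
```

The value of MAX_PE is the number to hand to lvcreate --extents.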
Update 2009/03/10: It looks like mke2fs is smart enough to automatically select the 4k blocksize, and -Tlargefile4 is not necessary (which is good, as it was interfering with our backups).
We compared performance between 10-disk and 20-disk RAID6 sets on a SATABeast, and discovered the performance difference is not significant, so we chose the most space-efficient reasonable layout: 2 20-disk RAID6 sets, each containing a single volume of the same size. These appear to the Linux host as two 16.37tibyte LUNs. We're using device mapper multipathing to provide fault tolerance across both FC paths (in Nexsan's recommended "All Paths All LUNs" mode, each LUN is available via both controllers). Everything except the performance testing is handled via the SATABeast administration interfaces.
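As a rough check on those LUN sizes, here is a sketch that compares the two layouts. It assumes the 1,000,137,687,040-byte drive size and the 40 non-spare disks quoted further down in this post, and that RAID 6 costs exactly two disks per set; the function names are made up for illustration.

```shell
#!/bin/sh
DRIVE_BYTES=1000137687040    # one "1 terabyte" drive
TIB=$(( 1 << 40 ))

# usable_bytes SETS DISKS_PER_SET
# RAID 6 spends two disks per set on parity.
usable_bytes() {
    echo $(( $1 * ($2 - 2) * DRIVE_BYTES ))
}

# to_tib BYTES
# Prints TiB with two decimals (truncated), using integer math only.
to_tib() {
    c=$(( $1 * 100 / TIB ))
    printf '%d.%02d\n' $(( c / 100 )) $(( c % 100 ))
}

echo "one 20-disk set: $(to_tib "$(usable_bytes 1 20)") TiB"
echo "2 x 20-disk:     $(to_tib "$(usable_bytes 2 20)") TiB"
echo "4 x 10-disk:     $(to_tib "$(usable_bytes 4 10)") TiB"
```

The first line should match the 16.37tibyte LUN size the host reports, and the last two show what the 20-disk layout gains over Nexsan's suggested 10-disk sets.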
Within Linux, we create 2 LVM logical volumes of just under 8tibyte from each LUN (at the time I believed this was the largest ext3 could handle; see the 2010 update above), and a third with the leftover 384gibyte.
The SATABeast lets the host see and use the volumes while it's still generating parity on the underlying RAID arrays ("Online Creation"), but creating file systems is much slower during this process.
A fully configured SATABeast contains 42 1,000,137,687,040-byte ("1 terabyte") drives. They reserve 2 for spares, so we have 40 disks to work with. Nexsan suggests 4 10-disk RAID sets, but RAID 6 allocates 2 disks per RAID set to parity, so with 4 10-disk sets we would 'waste' 10 disks, and our usable space would be 4 volumes, each 8 usable disks = 8,001,101,496,320 bytes / 1024^4 * 4 RAID sets = 29tibyte usable. 29tibyte is a lot, but only 70% of the specified "42tbyte", so we'd really like to be more space efficient -- we have 2 hot spare and 4 p
I set up /etc/fstab earlier, using ext3 labels.
How to create an ~8tibyte ext3 filesystem on a large multipath raw volume:
pvcreate /dev/mapper/mpath3
vgcreate satabeast1vg /dev/mapper/mpath3
vgdisplay
# Note number of usable extents (PE).
lvcreate -n satabeast1lv --extents 2096128 satabeast1vg
# Use the number from above, or at least something close to the largest valid ext3 volume size.
mkfs.ext3 -L/satabeast1 -Tlargefile4 -b4096 /dev/satabeast1vg/satabeast1lv
# If the SATABeast is still calculating parity, this takes a while. Go get some food...
vi /etc/fstab
# I use something like:
# LABEL=/satabeast1 /satabeast1 ext3 defaults 0 0
mount /dev/satabeast1vg/satabeast1lv /satabeast1
df -h /satabeast1
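Since it is easy to mix up the volume group, logical volume and label names in the steps above, here is a sketch of a helper that derives them all from one base name. print_lv_plan is a hypothetical function of my own, not an LVM tool, and it only prints the commands; nothing is executed. It leaves off -Tlargefile4 and -b4096, per the 2009/03/10 update.

```shell
#!/bin/sh
# print_lv_plan DEVICE NAME EXTENTS
# Hypothetical helper: prints the commands for one ext3 logical volume,
# using NAME consistently for the VG, the LV, the label and the mount point.
print_lv_plan() {
    dev=$1
    name=$2
    extents=$3
    vg="${name}vg"
    lv="${name}lv"
    echo "pvcreate $dev"
    echo "vgcreate $vg $dev"
    echo "lvcreate -n $lv --extents $extents $vg"
    echo "mkfs.ext3 -L/$name /dev/$vg/$lv"
    echo "mount /dev/$vg/$lv /$name"
}

print_lv_plan /dev/mapper/mpath3 satabeast1 2096128
```

Review the printed commands before running any of them by hand as root.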
Here's my transcript of creating an LVM volume group with two not-quite-8tibyte filesystems and one 384gibyte filesystem on a 16.37tibyte LUN.
[root@norimaki device-mapper-multipath-0.4.7]# pvcreate /dev/mapper/mpath3
Physical volume "/dev/mapper/mpath3" successfully created
[root@norimaki device-mapper-multipath-0.4.7]# vgcreate noribeast0vg /dev/mapper/mpath3
Volume group "noribeast0vg" successfully created
[root@norimaki device-mapper-multipath-0.4.7]# lvcreate -n noribeast0lv --extents 2096128 noribeast0vg
Logical volume "noribeast0lv" created
Logical volume "noribeast0lv" successfully removed
[root@norimaki device-mapper-multipath-0.4.7]# lvcreate -n noribeast0a --extents 2096128 noribeast0vg
Logical volume "noribeast0a" created
[root@norimaki device-mapper-multipath-0.4.7]# lvcreate -n noribeast0b --extents 2096128 noribeast0vg
Logical volume "noribeast0b" created
[root@norimaki device-mapper-multipath-0.4.7]# vgdisplay
--- Volume group ---
VG Name noribeast0vg
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 5
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 2
Open LV 0
Max PV 0
Cur PV 1
Act PV 1
VG Size 16.37 TB
PE Size 4.00 MB
Total PE 4292125
Alloc PE / Size 4192256 / 15.99 TB
Free PE / Size 99869 / 390.11 GB
VG UUID QLdntg-9ccY-0HYe-DnWI-Lxxu-Huzj-S3bExH
[root@norimaki device-mapper-multipath-0.4.7]# lvcreate -n noribeast0c --extents 99869 noribeast0vg
Logical volume "noribeast0c" created
[root@norimaki device-mapper-multipath-0.4.7]# mkfs.ext3 -L/noribeast0a -Tlargefile -b4096 /dev/noribeast0vg/noribeast0a
mke2fs 1.39 (29-May-2006)
Filesystem label=/noribeast0a
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
8384512 inodes, 2146435072 blocks
107321753 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
65504 block groups
32768 blocks per group, 32768 fragments per group
128 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848, 512000000, 550731776, 644972544, 1934917632
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 25 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
[root@norimaki device-mapper-multipath-0.4.7]# time mkfs.ext3 -L/noribeast0b -Tlargefile -b4096 /dev/noribeast0vg/noribeast0b
mke2fs 1.39 (29-May-2006)
Filesystem label=/noribeast0b
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
8384512 inodes, 2146435072 blocks
107321753 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
65504 block groups
32768 blocks per group, 32768 fragments per group
128 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848, 512000000, 550731776, 644972544, 1934917632
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 23 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
real 9m52.451s
user 0m0.657s
sys 0m3.660s
[root@norimaki device-mapper-multipath-0.4.7]# time mkfs.ext3 -L/noribeast0c /dev/noribeast0vg/noribeast0c
mke2fs 1.39 (29-May-2006)
Filesystem label=/noribeast0c
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
51134464 inodes, 102265856 blocks
5113292 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
3121 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 27 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
real 1m45.683s
user 0m0.172s
sys 0m10.150s
[root@norimaki device-mapper-multipath-0.4.7]# df -h |grep nori
/dev/mapper/noribeast0vg-noribeast0a
8.0T 175M 7.6T 1% /noribeast0a
/dev/mapper/noribeast0vg-noribeast0b
8.0T 175M 7.6T 1% /noribeast0b
/dev/mapper/noribeast0vg-noribeast0c
384G 195M 365G 1% /noribeast0c
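The numbers in the transcript can be cross-checked with a little integer arithmetic. This sketch takes the extent counts from the vgdisplay output above and assumes the 4 MB extent size and 4kbyte block size shown there; the variable names are mine.

```shell
#!/bin/sh
PE_MIB=4             # PE Size from vgdisplay
BLOCK_KIB=4          # Block size from mke2fs
TOTAL_PE=4292125     # Total PE from vgdisplay
BIG_LV_PE=2096128    # extents given to noribeast0a and noribeast0b

ALLOC_PE=$(( 2 * BIG_LV_PE ))
FREE_PE=$(( TOTAL_PE - ALLOC_PE ))

# 4kbyte blocks in each kind of logical volume
BIG_LV_BLOCKS=$(( BIG_LV_PE * PE_MIB * 1024 / BLOCK_KIB ))
SMALL_LV_BLOCKS=$(( FREE_PE * PE_MIB * 1024 / BLOCK_KIB ))

# Leftover space in hundredths of a GiB, to avoid floating point
FREE_GIB_X100=$(( FREE_PE * PE_MIB * 100 / 1024 ))

echo "allocated extents: $ALLOC_PE"
echo "free extents:      $FREE_PE"
echo "blocks per big LV: $BIG_LV_BLOCKS"
echo "blocks in small LV: $SMALL_LV_BLOCKS"
printf 'free space: %d.%02d GiB\n' $(( FREE_GIB_X100 / 100 )) $(( FREE_GIB_X100 % 100 ))
```

The results line up with the Alloc PE, Free PE and block counts that vgdisplay and mke2fs reported.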