I am going to wait for the ~7TB data transfer to complete before I plan to work through your md0 “fix”. I see what you mean about it not being necessary (as it’s all up and working as is)… but I’d like to keep learning the “correct” ways/options to do things; understanding the pros and cons. (I’ve got 5 or 6 different documents with my on the fly learning notes)
Just curious what the correct command would have looked like to use/claim md0 right away… would it be an adjustment to the create or assemble (or both) command(s)?
Hmmm, my initial research suggests that following the wiki example created /dev/md/RAID1Array specifically (instead of /dev/md0)… and had I adjusted the initial create command to use /dev/md0 instead, then things would have been as expected during the wiki’s format phase… is this correct?
During creation, it can be specified with --name=0. I don’t recall intentionally doing that for Ubuntu or openSUSE (but then again, I might have and simply forgot? or it’s possible those distros use hooks that automatically insert certain options when using the mdadm tool. Perhaps they check existing arrays, and use an automatic sequential numbering system, starting from zero.)
Regardless, you can specify --name=0 during --create or --assemble --update=name.
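As a sketch, the two variants might look something like this (the member devices /dev/sdb1 and /dev/sdc1 are placeholders, not your actual setup):

```shell
# At creation time: ask for /dev/md0 explicitly and name the array "0"
sudo mdadm --create /dev/md0 --name=0 --level=1 --raid-devices=2 \
    /dev/sdb1 /dev/sdc1

# After the fact: stop the running array, then reassemble it
# while rewriting the name stored in the superblock
sudo mdadm --stop /dev/md127
sudo mdadm --assemble /dev/md0 --name=0 --update=name /dev/sdb1 /dev/sdc1
```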
Maybe the wiki is based on an older version of mdadm. I’m not entirely sure. But if you don’t specify --name=XXX it’s considered a “nameless” md127 array. The references to /dev/md/MyCustomName specify a symlink that points to the actual device (such as /dev/md0, /dev/md1, /dev/md127, etc.)
Any and all actions on your array can be directed to the symlink /dev/md/Raid1Array (scrubs, status, stopping, updating, etc.), because it’s the equivalent of directing them to /dev/mdX
Think of how LUKS and LVM use the device-mapper.
Entries under /dev/mapper, such as:
/dev/mapper/cryptoPV
/dev/mapper/vgBigGroup-lvRoot
/dev/mapper/vgBigGroup-lvHome
Are in fact symlinks that point to the real thing, such as:
/dev/dm-0
/dev/dm-1
/dev/dm-2
That’s why it’s better to target the symlinks, since those are based on names you choose (and can remember), and they don’t change; unlike the entries directly under /dev/ which can change.
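You can verify the symlinks yourself (using the example names above; the actual dm-N targets will differ per system):

```shell
# List the mapper names and where each one points
ls -l /dev/mapper/

# Resolve one name to its real device node
readlink -f /dev/mapper/vgBigGroup-lvRoot

# The same applies to md arrays
readlink -f /dev/md/RAID1Array
```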
EDIT: This might be a mistake in the Wiki (or not fully explored):
As far as I’m aware, --name only works with integers. If you specify something like --name=MyAwesomeRaid, it will show up in the metadata details, but it will not use /dev/mdMyAwesomeRaid; it will instead fall back to /dev/md127. (Notice I wrote /dev/mdMyAwesomeRaid, rather than /dev/md/MyAwesomeRaid)
As far as what symlink you’ll see under /dev/md/, it depends on the path you provided before the block devices to be used, such as /dev/md/Raid1Array
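(The create command in question presumably looked something like this — the array name matches the example below, and the member disks are placeholders:)

```shell
# The path given before the member devices determines the /dev/md/ symlink
sudo mdadm --create /dev/md/MyMediaStorage --level=1 --raid-devices=2 \
    /dev/sdb1 /dev/sdc1
```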
That will create a symlink of /dev/md/MyMediaStorage that points to /dev/md0
With the proper ARRAY entry defined in mdadm.conf, all actions should target the symlink /dev/md/MyMediaStorage
EDIT 2: Now that I think about it, I think “name” is only useful as an alternative method of identifying an array. But using the array’s UUID or devices works well, and the UUID and devices are specified in the mdadm.conf file anyway, so it’s no mystery which array you’re trying to assemble or inspect when using the mdadm command.
In other words you can assemble an array (e.g, /dev/md/Raid1Array) by:
manually specifying the devices required to assemble it
specifying the UUID, which will be scanned for all block devices that have mdadm metadata that matches the UUID
specifying the “name”, which will be scanned for all block devices that have mdadm metadata that matches the name
using the mdadm.conf file to automatically assemble all possible arrays based on device availability
using the mdadm.conf file to assemble a particular array based on a matching UUID, name, or devices
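Sketched as commands (the UUID and device names are placeholders):

```shell
# 1) Manually list the member devices
sudo mdadm --assemble /dev/md/Raid1Array /dev/sdb1 /dev/sdc1

# 2) Scan all block devices for a matching array UUID
sudo mdadm --assemble /dev/md/Raid1Array --scan \
    --uuid=aaaabbbb:ccccdddd:eeeeffff:00001111

# 3) Scan for a matching name (as stored in the superblock metadata)
sudo mdadm --assemble /dev/md/Raid1Array --scan --name=Raid1Array

# 4) Assemble every array that mdadm.conf (and scanning) can find
sudo mdadm --assemble --scan

# 5) Assemble one particular array defined in mdadm.conf
sudo mdadm --assemble --scan /dev/md/Raid1Array
```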
No matter what “name” is created/updated, it defaults to hostname:arrayname, such as:
linuxpc:0
linuxpc:5
linuxpc:MyMedia
Hostname can be specified with --homehost, such as --homehost=officepc.
And since I’m okay with using /dev/md/RAID1Array, I may just end up leaving things as they are… then again, since I created my own fstab entry, in most cases I’ll be referring to the array by /data/raid1.
Would it be correct to assume that /dev/md/RAID1Array and /data/raid1 are effectively the same thing? Or is /data/raid1 fine for day to day data tasks, and /dev/md/RAID1Array (or /dev/md127) required for system commands like mdadm, format, etc?
Because if in the end I can use the /data/raid1 mountpoint I ultimately wanted for most things… the importance of which md# was created starts to fade away.
/data/raid1 is the directory where your ext4 file-system is mounted (specified in your fstab). It’s only useful for file- and folder-based operations. /data/raid1 is unknown to mdadm, just like /home/username is not relevant to mdadm arrays.
(After all, you ran rsync with /data/media/ as the destination, rather than /dev/md/RAID1Array)
/dev/md/RAID1Array is the fully assembled array, which also happens to be the block device that an ext4 file-system was formatted on. This (or the UUID) is what you issue mdadm commands against. It’s also where you issue fsck against (make sure the file-system is not being used and is unmounted first.)
/data/raid1 is a nice convenience for day to day data tasks/access and auto-mounting, but /dev/md/RAID1Array (or /dev/mdx) is required for system commands like mdadm, format, fsck, etc… that target the raid array.
Remember, you can still have access to the block device (aka, the assembled array at /dev/md/RAID1Array), yet /data/raid1 becomes an empty folder if you unmount the ext4 file-system (which lives on /dev/md/RAID1Array).
It’s true that the umount command can accept the mount location as an argument (such as umount /data/raid1), but that’s because it allows for more than one way to figure out “what” you’re trying to unmount.
The same is true for mount, which also checks the fstab for entries if not enough information is provided.
EDIT: If you ever transfer those drives to another computer, you can use whatever is available within the devices’ metadata to re-assemble the array. So for example, even without specifying what drives/partitions contain mdadm superblocks, you can simply feed it something like this:
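(The command shown was presumably along these lines — the UUID here is a placeholder; use the one reported by mdadm --detail or recorded in your mdadm.conf:)

```shell
# Assemble by UUID alone; --scan finds the member devices for you
sudo mdadm --assemble /dev/md/RAID1Array --scan \
    --uuid=aaaabbbb:ccccdddd:eeeeffff:00001111
```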
The above command will search for block devices on your computer, find some with mdadm superblocks, check for the specified UUID, and use them to assemble the array on your new computer.
If for whatever reason it cannot find all the necessary devices, the command will abort and warn you. (By default, mdadm refuses to assemble a degraded array if it cannot use all of the array’s devices.) You can “force” it to bypass this warning, but it’s dangerous. The only time that would be necessary is to try to save the existing data in an emergency, when there’s no time to rebuild with a new disk and bring the array back to a healthy state.
The data has finished copying from the NAS to the Software RAID (over 7TB, leaving “147GiB” of free space)… and at some point after I was finished focusing on something else, I noticed a consistent rhythmic hum/clack from the RAID’s mechanical drives (it lasts for 1-2 seconds, subsides for 1-2 seconds, then the cycle repeats) that I didn’t notice during the file transfer (but that doesn’t mean it wasn’t there).
So I ran the only command I know right now to see what (if anything) was going on…
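(Likely the command in question — /proc/mdstat is where the in-memory bitmap line mentioned next shows up:)

```shell
cat /proc/mdstat

# or, for more per-array detail:
sudo mdadm --detail /dev/md/RAID1Array
```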
… and it doesn’t appear to have much to say… other than letting me know about the in-memory bitmap (basically a cache of what’s in the on-disk bitmap − it allows bitmap operations to be more efficient).
Are there any commands or GUI tools that might give me some more insight into the disk/array activity? Perhaps even some that would also be good to check/monitor/scrub the RAID array?
I’d hate to reboot or something when the array is in the middle of something and cause any issues.
Nothing appears wrong. The bitmap (whether internal or external) is akin to a file-system’s journal. Its cousin in ZFS is the ZIL (intent log). After some time with no writes, the bitmap shouldn’t be using any pages for cached writes. You can attempt to flush it by unmounting the ext4 file-system, stopping the array, and then reassembling it.
It will also be interesting to see whether you still hear the rhythmic hums and clacks after reassembling the array and waiting through a period of idle time with no data activity. (This rules out any other mechanical drives in the system.)
You filled it a bit too close for comfort in terms of future fragmentation and performance.
You might be able to squeeze in a bit of extra capacity by removing the reserved superuser blocks from the ext4 file-system. (I believe it defaults to 5%, unless that has changed recently.) Its original purpose was to prevent locking yourself out of the system on the chance that you filled the file-system 100% and could not even write/modify anything for the sake of recovery or emergency. It’s not really necessary for a purely “data storage” use case like yours.
Make sure you unmount the file-system first, but leave the array assembled, and then remove the reserved superuser blocks:
sudo tune2fs -m 0 /dev/md/RAID1Array
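If you want to verify the change, this read-only check shows the reserved block count (run it before and after; it’s safe even while mounted):

```shell
sudo tune2fs -l /dev/md/RAID1Array | grep -i "reserved block count"
```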
EDIT: This concerns the ext4 file-system, nothing to do with mdadm, per se.
Yes, the data is a bit tight… but it will be shrinking over time. Lots of “windows only” bloat (drivers/installers, etc) in it currently that will be pruned over time while I stay focused on Manjaro.
Didn’t need to re-assemble the array… the disk activity stopped (I think somewhere between the umount and tune2fs commands completing) as I reclaimed that 5% (now at 520GiB free)…
Once I re-mounted, the drives clacked away merrily for about 5 seconds (I suspect while the bitmap/cache was rebuilt)… and have stayed silent so far… thank you winnie!
This is interesting… I caught wind of iotop and installed it through PAMAC. No read/write numbers… but apparently some IO activity for something called ext4lazyinit…
Apparently the kernel is tasked with handling some of the final touches of the ext4 format’s initialization… and this thread I found seems to echo my experience. It was probably delayed from finishing because I went straight from formatting the array to using rsync to fill it with data.
Now it makes sense why unmounting and remounting stopped the noise… as it only starts/continues after being mounted. And I probably had the kernel working double-time, trying to work on that while I was loading up the drive with data… and it obviously still had more to do after the data copying was complete.
Also interesting to learn there’s an extra parameter to include to not use “lazy initialization”… and according to man mkfs.ext4, there are actually two lazy features…
lazy_itable_init[= <0 to disable, 1 to enable>]
If enabled and the uninit_bg feature is enabled, the inode table will not be fully initialized by mke2fs. This speeds up filesystem initialization noticeably, but it requires the kernel to finish initializing the filesystem in the background when the filesystem is first mounted. If the option value is omitted, it defaults to 1 to enable lazy inode table zeroing.
lazy_journal_init[= <0 to disable, 1 to enable>]
If enabled, the journal inode will not be fully zeroed out by mke2fs. This speeds up filesystem initialization noticeably, but carries some small risk if the system crashes before the journal has been overwritten entirely one time. If the option value is omitted, it defaults to 1 to enable lazy journal inode zeroing.
I think I’ll be adding these two extra parameters to my mechanical drive formats.
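For reference, a sketch of what such a format command could look like (the target device is a placeholder):

```shell
# Disable both lazy-init features so mkfs does all initialization up front
# (slower format, but no background ext4lazyinit writes after mounting)
sudo mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/md/SomeFutureArray
```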
Even that is new to me (lazy_itable_init, lazy_journal_init), but like I said, I’ve moved exclusively to XFS (local) and ZFS (NAS). Per your discovery, it seems that it only has an effect for some time after formatting the file-system, and you shouldn’t have too many issues using it normally.
While you’re at it, you should check if your 8TB drives support TLER/ERC, and if so, if the firmware is set to 7 seconds by default. (It’s very likely, since 8TB and larger are usually white-label enterprise or NAS drives.)
sudo smartctl -l scterc /dev/sdx
It prints the timeout for TLER in “deciseconds”, so a value of 70 = 7.0 seconds.
Linux/mdadm waits 30 seconds before it considers a SATA/SCSI drive “unresponsive” and tries to bring it back up or simply offline it (in which case your RAID array will drop to a “degraded” state.)
If your drives do not support ERC/TLER (or they support it, but are not configured to use it), they will try for an indefinite period of time (internally) to correct their own errors / relocate bad sectors. The problem is, if this time exceeds 30 seconds, even a healthy drive can be kicked out of the array.
Setting TLER to 7.0 seconds (“70 deciseconds”) is recommended. (Don’t try to set it to anything shorter than 7 seconds, as I’ve read that the drive’s firmware might simply ignore it without notifying you that the number is invalid.)
If your drive supports TLER, but it’s not enabled, you can manually enable it (yet this will not persist through system reboots.)
sudo smartctl -l scterc,70,70 /dev/sdx
In order for it to apply after every reboot, you need to make a script or cron job that will do it upon booting up your computer.
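One minimal sketch, using root cron @reboot entries (the drive paths are placeholders; a systemd oneshot service or udev rule works just as well):

```shell
# In root's crontab (edit with: sudo crontab -e)
@reboot /usr/bin/smartctl -l scterc,70,70 /dev/sdb
@reboot /usr/bin/smartctl -l scterc,70,70 /dev/sdc
```

Since /dev/sdX letters can shuffle between boots, the stable /dev/disk/by-id/ paths are safer targets for entries like these.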
How would I check for and change write caching on the drives? I’m assuming having write caching disabled would be a better option in case of a crash (preserving data integrity at the cost of speed)… or maybe this is an old idea that might not be as relevant in GNU/Linux as it was in Windows? EDIT: Well look at that… it’s another hdparm command/parameter…
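(The parameter in question is presumably -W; a sketch with a placeholder device path:)

```shell
# Query the current write-cache setting
sudo hdparm -W /dev/sdb

# Disable write caching (safer on sudden power loss, slower writes)
sudo hdparm -W 0 /dev/sdb

# Re-enable write caching
sudo hdparm -W 1 /dev/sdb
```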
In one of my other posts the discussion evolved to adjusting APM settings on the mechanical drives to 254 (or 255 if they support it) but I thought I would reserve looking into that for this discussion (deciding how the drives would be attached first; SATA versus HBA)… would you have a recommendation for APM settings?
Nothing needs to be done! They support it and are already set by the factory at 7 seconds. This is one of the selling points of “NAS-ready” drives, among other features (such as longer MTBF and constant operation in a vibration-heavy chassis or server rack).
The drop in write performance from disabling it might not be worth it, considering ext4 uses a journal (and thus buffers against a dirty state, and will re-check itself if it was previously not unmounted cleanly). For future projects, ZFS (and Btrfs, as used in your Synology NAS) are copy-on-write file-systems, which means it’s nearly impossible to have corruption due to a crash or power loss. (That’s not to say you should neglect a UPS battery backup in case of sudden power loss.)
I always have mine disabled. It’s healthier for the drive. Acoustic (-M) and APM (-B) and auto-suspend (-S) should always be disabled, especially if used in a RAID array or ZFS pool. Drives barely use any power on idle (around 3 to 5 watts, spinning). Depending where you live, that’s about 30¢ to 50¢ per month on your electric bill if you leave them running 24/7.
Many thanks once again for your great advice winnie!
I’m glad to hear I can leave the write caching on for performance with EXT4! I do have a UPS connected and think I have things set up to power down at 25% battery… although I haven’t tested the settings yet by pulling power
I’ll work on disabling Acoustic (-M), APM (-B) and auto-suspend (-S) on both drives!
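For reference, a sketch of those hdparm invocations (drive paths are placeholders):

```shell
# Disable APM (255 = off; some drives only accept 254 = max performance)
sudo hdparm -B 255 /dev/sdb

# Disable the standby/auto-suspend timeout
sudo hdparm -S 0 /dev/sdb

# AAM (-M) is skipped here, since some drives report it as unsupported
```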
I had to do a double-take on this. I just realized, and correct me if I’m wrong: THIS is your first time jumping into Linux as a legit alternative to Windows?
Well gosh darn it! You’re crazy! A deep dive right into software RAID and rsync’ing from a NAS server and esoteric file-system options!
The first time when I ditched Windows for Linux, I took baby steps:
“Okay… so… the terminal, um, that’s like the CMD.exe thingy in Windows, right? Okay… I can… do stuff in the terminal… okay… so wait… package manager? Like for zip files? Oh, package manager is like for installing software? Whatever. How do I run my .exe files? Is Firefox like IE but with an orange icon?”
Hehe… I thought asking for help in advance was a baby step?!
I’m an older “Computer Engineering Technologist” who graduated back in the day when 386’s were king. I’ve dabbled with a few “Linux Live” CD/DVD/USB distros on and off over the past [cough] decades, but never put significant effort into actually trying to replace Windows with it after learning early on that I’d have to give up some of my favorite PC activities, like winding down in a good RPG or MMO… GNU/Linux just wasn’t going to let me keep playing my favorite titles (until more recently).
But as luck would have it, much of what I play is on Steam, and earlier this year I caught a video (might have been this one or one like it) from Anthony at LTT where he was talking about “Gaming on Linux” (POP_OS and Manjaro) and that planted the seed for me to embrace all the good things Steam has been doing in this area over the past few years… with the help of other technologies like wine; and all the great upstream and downstream support found in the various GNU/Linux distributions of today.
Needless to say I’m glad to be rid of all the MS data-mining/telemetry, and happy to learn more about Manjaro and GNU/Linux as it is supporting my geekiness and Steam game play beautifully. And I’m digging in deep enough to try support the few people (like my parents) that will likely follow me to Linux; and likely future n00bs like me in this forum and other aspects of the GNU/Linux community that present themselves along the way.
I still have lots to learn, and I am prioritizing my posts and learning based on…
where I am in my migration
what apps/functionality I want/need next
what hardware I want/need to get working next
what presents itself as a learning opportunity along the way
Ok, I ran through disabling Acoustic (-M), APM (-B) and auto-suspend (-S) on both drives… but based on my steps, it looks like Acoustic (-M) is “not supported” for my WD Red’s so I left it as is…
EDIT: Wow… it’s been 7 hours (according to the forum) since I last ran iotop… and it’s still listing ext4lazyinit as working… hopefully it’s settled come the morning.
Don’t forget to create a custom udev rule so that those values are re-applied on each reboot. (I name my custom files with “99-” so I can keep track of them if I need to review/edit.)
For example, to apply it to all “spinning” drives in the system:
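A sketch of such a rule (the file name and hdparm values are illustrative; saved as something like /etc/udev/rules.d/99-hdparm.rules):

```
# Match whole rotational (spinning) disks as they appear, and apply hdparm settings
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", \
  RUN+="/usr/bin/hdparm -B 255 -S 0 -M 0 /dev/%k"
```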
It doesn’t hurt to specify -M0 as a matter of practice, since it gets ignored if it’s not supported by the drive anyway. That keeps the entry good for future use, should a drive support -M.
If you prefer to use a custom script (that runs with elevated privileges), have it do something like:
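A minimal sketch of such a script (selecting drives by their rotational flag; run it as root at boot via cron @reboot or a systemd oneshot unit):

```shell
#!/bin/bash
# Apply hdparm settings to every rotational (spinning) disk in the system.
for disk in /sys/block/sd*; do
    [ -e "$disk" ] || continue                      # no sd* devices present
    if [ "$(cat "$disk/queue/rotational")" = "1" ]; then
        hdparm -B 255 -S 0 -M 0 "/dev/$(basename "$disk")"
    fi
done
```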
Many thanks for the keeping the learning curve moving forward winnie! I really appreciate your thoroughness.
Okay, I’m going to try to implement a udev custom rule because I think it fits nicely in with the other customization files I’ve been playing with so far… like /etc/fstab, /etc/mdadm.conf, and /etc/sysctl.d/30-swap_usage.conf (custom file from Fabby to control swappiness and vfs_cache_pressure).
But that syntax is above my head (is that some form of regex or bash scripting?), so more learning to do… although it seems to target just the sdx devices, which would mean my sda Samsung EVO SSD (which likely doesn’t care about these settings) and my sdb & sdc WD Red’s.
This is an old article, but it gives a good idea of the gist of writing a udev rule (which is rare for an end-user to do anyways. I doubt you’ll need more than this very one.)