I am going to wait for the ~7TB data transfer to complete before I plan to work through your md0 “fix”. I see what you mean about it not being necessary (as it’s all up and working as is)… but I’d like to keep learning the “correct” ways/options to do things; understanding the pros and cons. (I’ve got 5 or 6 different documents with my on the fly learning notes)
Just curious what the correct command would have looked like to use/claim md0 right away… would it be an adjustment to the create or assemble (or both) command(s)?
Hmmm, my initial research suggests that following the wiki example created /dev/md/RAID1Array specifically (instead of /dev/md0)… and had I adjusted the initial create command to use /dev/md0 instead, then things would have been as expected during the wiki’s format phase… is this correct?
During creation, it can be specified with --name=0. I don’t recall intentionally doing that for Ubuntu or openSUSE (but then again, I might have and simply forgot? or it’s possible those distros use hooks that automatically insert certain options when using the mdadm tool. Perhaps they check existing arrays, and use an automatic sequential numbering system, starting from zero.)
Regardless, you can specify --name=0 during --create or --assemble --update=name.
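As a sketch, the two variants might look something like this (the member devices /dev/sdb1 and /dev/sdc1 are placeholders, not your actual setup):

```shell
# At creation time: ask for /dev/md0 explicitly and name the array "0"
sudo mdadm --create /dev/md0 --name=0 --level=1 --raid-devices=2 \
    /dev/sdb1 /dev/sdc1

# After the fact: stop the running array, then reassemble it
# while rewriting the name stored in the superblock
sudo mdadm --stop /dev/md127
sudo mdadm --assemble /dev/md0 --name=0 --update=name /dev/sdb1 /dev/sdc1
```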
Maybe the wiki is based on an older version of mdadm. I’m not entirely sure. But if you don’t specify --name=XXX it’s considered a “nameless” md127 array. The references to /dev/md/MyCustomName specify a symlink that points to the actual device (such as /dev/md0, /dev/md1, /dev/md127, etc.)
Any and all actions on your array can be directed to the symlink /dev/md/Raid1Array (scrubs, status, stopping, updating, etc.), because it’s the equivalent of directing them to /dev/mdX
Think of how LUKS and LVM use the device-mapper.
Entries under /dev/mapper, such as:
/dev/mapper/cryptoPV
/dev/mapper/vgBigGroup-lvRoot
/dev/mapper/vgBigGroup-lvHome
Are in fact symlinks that point to the real thing, such as:
/dev/dm-0
/dev/dm-1
/dev/dm-2
That’s why it’s better to target the symlinks, since those are based on names you choose (and can remember), and they don’t change; unlike the entries directly under /dev/ which can change.
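You can verify the symlinks yourself (using the example names above; the actual dm-N targets will differ per system):

```shell
# List the mapper names and where each one points
ls -l /dev/mapper/

# Resolve one name to its real device node
readlink -f /dev/mapper/vgBigGroup-lvRoot

# The same applies to md arrays
readlink -f /dev/md/RAID1Array
```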
EDIT: This might be a mistake in the Wiki (or not fully explored):
As far as I’m aware, --name only works with integers. If you specify something like --name=MyAwesomeRaid, it will show up in the metadata details, but it will not use /dev/mdMyAwesomeRaid; it will instead fall back to /dev/md127. (Notice I wrote /dev/mdMyAwesomeRaid, rather than /dev/md/MyAwesomeRaid)
As far as what symlink you’ll see under /dev/md/, it depends on the path you provided before the block devices to be used, such as /dev/md/Raid1Array
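(The create command in question presumably looked something like this — the array name matches the example below, and the member disks are placeholders:)

```shell
# The path given before the member devices determines the /dev/md/ symlink
sudo mdadm --create /dev/md/MyMediaStorage --level=1 --raid-devices=2 \
    /dev/sdb1 /dev/sdc1
```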
That will create a symlink of /dev/md/MyMediaStorage that points to /dev/md0
With the proper ARRAY entry defined in mdadm.conf, all actions should target the symlink /dev/md/MyMediaStorage
EDIT 2: Now that I think about it, I think “name” is only useful as an alternative method of identifying an array. But using the array’s UUID or devices works well, and the UUID and devices are specified in the mdadm.conf file anyway, so it’s no mystery which array you’re trying to assemble or inspect when using the mdadm command.
In other words you can assemble an array (e.g, /dev/md/Raid1Array) by:
manually specifying the devices required to assemble it
specifying the UUID, which will be scanned for all block devices that have mdadm metadata that matches the UUID
specifying the “name”, which will be scanned for all block devices that have mdadm metadata that matches the name
using the mdadm.conf file to automatically assemble all possible arrays based on device availability
using the mdadm.conf file to assemble a particular array based on a matching UUID, name, or devices
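Sketched as commands (the UUID and device names are placeholders):

```shell
# 1) Manually list the member devices
sudo mdadm --assemble /dev/md/Raid1Array /dev/sdb1 /dev/sdc1

# 2) Scan all block devices for a matching array UUID
sudo mdadm --assemble /dev/md/Raid1Array --scan \
    --uuid=aaaabbbb:ccccdddd:eeeeffff:00001111

# 3) Scan for a matching name (as stored in the superblock metadata)
sudo mdadm --assemble /dev/md/Raid1Array --scan --name=Raid1Array

# 4) Assemble every array that mdadm.conf (and scanning) can find
sudo mdadm --assemble --scan

# 5) Assemble one particular array defined in mdadm.conf
sudo mdadm --assemble --scan /dev/md/Raid1Array
```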
No matter what “name” is created/updated, it defaults to hostname:arrayname, such as:
linuxpc:0
linuxpc:5
linuxpc:MyMedia
Hostname can be specified with --homehost, such as --homehost=officepc.
And since I’m okay with using /dev/md/RAID1Array, I may just end up leaving things as they are… then again, since I created my own fstab entry, in most cases I’ll be referring to the array by /data/raid1.
Would it be correct to assume that /dev/md/RAID1Array and /data/raid1 are effectively the same thing? Or is /data/raid1 fine for day to day data tasks, and /dev/md/RAID1Array (or /dev/md127) required for system commands like mdadm, format, etc?
Because if in the end I can use the /data/raid1 mountpoint I ultimately wanted for most things… the importance of which md# was created starts to fade away.
/data/raid1 is the directory where your ext4 file-system is mounted (specified in your fstab). It’s only useful for file- and folder-based operations. /data/raid1 is unknown to mdadm, just like /home/username is not relevant to mdadm arrays.
(After all, you ran rsync with /data/media/ as the destination, rather than /dev/md/RAID1Array)
/dev/md/RAID1Array is the fully assembled array, which also happens to be the block device that an ext4 file-system was formatted on. This (or the UUID) is what you issue mdadm commands against. It’s also where you issue fsck against (make sure the file-system is not being used and is unmounted first.)
/data/raid1 is a nice convenience for day to day data tasks/access and auto-mounting, but /dev/md/RAID1Array (or /dev/mdx) is required for system commands like mdadm, format, fsck, etc… that target the raid array.
Remember, you can still have access to the block device (aka, the assembled array at /dev/md/RAID1Array), yet /data/raid1 becomes an empty folder if you unmount the ext4 file-system (which lives on /dev/md/RAID1Array).
It’s true that the umount command can accept the mount location as an argument (such as umount /data/raid1), but that’s because it allows for more than one way to figure out “what” you’re trying to unmount.
The same is true for mount, which also checks the fstab for entries if not enough information is provided.
EDIT: If you ever transfer those drives to another computer, you can use whatever is available within the devices’ metadata to re-assemble the array. So for example, even without specifying what drives/partitions contain mdadm superblocks, you can simply feed it something like this:
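(The command shown was presumably along these lines — the UUID here is a placeholder; use the one reported by mdadm --detail or recorded in your mdadm.conf:)

```shell
# Assemble by UUID alone; --scan finds the member devices for you
sudo mdadm --assemble /dev/md/RAID1Array --scan \
    --uuid=aaaabbbb:ccccdddd:eeeeffff:00001111
```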
The above command will search for block devices on your computer, find some with mdadm superblocks, check for the specified UUID, and use them to assemble the array on your new computer.
If for whatever reason it cannot find all the necessary devices, the command will abort and warn you. (By default, mdadm refuses to assemble a degraded array if it cannot use all of the array’s devices.) You can “force” it to bypass this warning, but it’s dangerous. The only time that would be necessary is to try to save the existing data in an emergency, when there’s no time to rebuild with a new disk and bring the array back to a healthy state.
The data has finished copying from the NAS to the Software RAID (over 7TB, leaving “147GiB” of free space)… and at some point after I was finished focusing on something else, I noticed a consistent rhythmic hum/clack from the RAID’s mechanical drives (it lasts for 1-2 seconds, subsides for 1-2 seconds, then the cycle repeats) that I didn’t notice during the file transfer (but that doesn’t mean it wasn’t there).
So I ran the only command I know right now to see what (if anything) was going on…
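(Likely the command in question — /proc/mdstat is where the in-memory bitmap line mentioned next shows up:)

```shell
cat /proc/mdstat

# or, for more per-array detail:
sudo mdadm --detail /dev/md/RAID1Array
```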
… and it doesn’t appear to have much to say… other than letting me know about the in-memory bitmap (basically a cache of what’s in the on-disk bitmap − it allows bitmap operations to be more efficient).
Are there any commands or GUI tools that might give me some more insight into the disk/array activity? Perhaps even some that would also be good to check/monitor/scrub the RAID array?
I’d hate to reboot or something when the array is in the middle of something and cause any issues.
Nothing appears wrong. The bitmap (whether internal or external) is akin to a file-system’s journal. Its cousin in ZFS is the ZIL (intent log). After some time with no writes, the bitmap shouldn’t be using any pages for cached writes. You can attempt to flush it by unmounting the ext4 file-system, stopping the array, and then reassembling it.
It will also be interesting to see whether you still hear the rhythmic hums and clacks after reassembling the array and waiting through a period of idle time with no data activity. (This rules out any other mechanical drives in the system.)
You filled it a bit too close for comfort in terms of future fragmentation and performance.
You might be able to squeeze in a bit of extra capacity by removing the reserved superuser blocks from the ext4 file-system. (I believe it defaults to 5%, unless that has changed recently.) Its original purpose was to prevent locking yourself out of the system on the chance that you filled the file-system 100% and could not even write/modify anything for the sake of recovery or emergency. It’s not really necessary for a purely “data storage” use case like yours.
Make sure you unmount the file-system first, but leave the array assembled, and then remove the reserved superuser blocks:
sudo tune2fs -m 0 /dev/md/RAID1Array
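If you want to verify the change, this read-only check shows the reserved block count (run it before and after; it’s safe even while mounted):

```shell
sudo tune2fs -l /dev/md/RAID1Array | grep -i "reserved block count"
```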
EDIT: This concerns the ext4 file-system, nothing to do with mdadm, per se.
Yes, the data is a bit tight… but it will be shrinking over time. Lots of “windows only” bloat (drivers/installers, etc) in it currently that will be pruned over time while I stay focused on Manjaro.
Didn’t need to re-assemble the array… the disk activity stopped (I think somewhere between the umount and tune2fs commands completing) as I reclaimed that 5% (now at 520GiB free)…
Once I re-mounted, the drives clacked away merrily for about 5 seconds (I suspect while the bitmap/cache was rebuilt)… and have stayed silent so far… thank you winnie!
This is interesting… I caught wind of iotop and installed it through PAMAC. No read/write numbers… but apparently some IO activity for something called ext4lazyinit…
Apparently the kernel is tasked with handling some of the final touches of the ext4 format’s initialization… and this thread I found seems to echo my experience. It was probably delayed from finishing because I went straight from formatting the array to using rsync to fill it with data.
Now it makes sense why unmounting and remounting stopped the noise… as it only starts/continues after being mounted. And I probably had the kernel working double-time, trying to work on that while I was loading up the drive with data… and it obviously still had more to do after the data copying was complete.
Also interesting to learn there’s an extra parameter to include to not use “lazy initialization”… and according to man mkfs.ext4, there are actually two lazy features…
lazy_itable_init[= <0 to disable, 1 to enable>]
If enabled and the uninit_bg feature is enabled, the inode table will not be fully initialized by mke2fs. This speeds up filesystem initialization noticeably, but it requires the kernel to finish initializing the filesystem in the background when the filesystem is first mounted. If the option value is omitted, it defaults to 1 to enable lazy inode table zeroing.
lazy_journal_init[= <0 to disable, 1 to enable>]
If enabled, the journal inode will not be fully zeroed out by mke2fs. This speeds up filesystem initialization noticeably, but carries some small risk if the system crashes before the journal has been overwritten entirely one time. If the option value is omitted, it defaults to 1 to enable lazy journal inode zeroing.
I think I’ll be adding these two extra parameters to my mechanical drive formats.
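For reference, a sketch of what such a format command could look like (the target device is a placeholder):

```shell
# Disable both lazy-init features so mkfs does all initialization up front
# (slower format, but no background ext4lazyinit writes after mounting)
sudo mkfs.ext4 -E lazy_itable_init=0,lazy_journal_init=0 /dev/md/SomeFutureArray
```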
Even that is new to me (lazy_itable_init, lazy_journal_init), but like I said, I’ve moved exclusively to XFS (local) and ZFS (NAS). Per your discovery, it seems that it only has an effect for some time after formatting the file-system, and you shouldn’t have too many issues using it normally.
While you’re at it, you should check if your 8TB drives support TLER/ERC, and if so, if the firmware is set to 7 seconds by default. (It’s very likely, since 8TB and larger are usually white-label enterprise or NAS drives.)
sudo smartctl -l scterc /dev/sdx
It prints the timeout for TLER in “deciseconds”, so a value of 70 = 7.0 seconds.
Linux/mdadm waits 30 seconds before it considers a SATA/SCSI drive “unresponsive” and tries to bring it back up or simply offline it (in which case your RAID array will drop to a “degraded” state.)
If your drives do not support ERC/TLER (or they support it, but are not configured to use it), they will try for an indefinite period of time (internally) to correct their own errors / relocate bad sectors. The problem is, if this time exceeds 30 seconds, even a healthy drive can be kicked out of the array.
Setting TLER to 7.0 seconds (“70 deciseconds”) is recommended. (Don’t try to set it to anything shorter than 7 seconds, as I’ve read that the drive’s firmware might simply ignore it without notifying you that the number is invalid.)
If your drive supports TLER, but it’s not enabled, you can manually enable it (yet this will not persist through system reboots.)
sudo smartctl -l scterc,70,70 /dev/sdx
In order for it to apply after every reboot, you need to make a script or cron job that will do it upon booting up your computer.
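One minimal sketch, using root cron @reboot entries (the drive paths are placeholders; a systemd oneshot service or udev rule works just as well):

```shell
# In root's crontab (edit with: sudo crontab -e)
@reboot /usr/bin/smartctl -l scterc,70,70 /dev/sdb
@reboot /usr/bin/smartctl -l scterc,70,70 /dev/sdc
```

Since /dev/sdX letters can shuffle between boots, the stable /dev/disk/by-id/ paths are safer targets for entries like these.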
How would I check for and change write caching on the drives? I’m assuming having write caching disabled would be a better option in case of a crash (preserving data integrity at the cost of speed)… or maybe this is an old idea that might not be as relevant in GNU/Linux as it was in Windows? EDIT: Well look at that… it’s another hdparm command/parameter…
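(The parameter in question is presumably -W; a sketch with a placeholder device path:)

```shell
# Query the current write-cache setting
sudo hdparm -W /dev/sdb

# Disable write caching (safer on sudden power loss, slower writes)
sudo hdparm -W 0 /dev/sdb

# Re-enable write caching
sudo hdparm -W 1 /dev/sdb
```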
In one of my other posts the discussion evolved to adjusting APM settings on the mechanical drives to 254 (or 255 if they support it) but I thought I would reserve looking into that for this discussion (deciding how the drives would be attached first; SATA versus HBA)… would you have a recommendation for APM settings?
Nothing needs to be done! They support it and are already set by the factory at 7 seconds. This is one of the selling points of “NAS-ready” drives, among other features (such as longer MTBF and constant operation in a vibration-heavy chassis or server rack).
The drop in write performance from disabling it might not be worth it, considering ext4 uses a journal (and thus buffers against a dirty state, and will re-check itself if it was previously not unmounted cleanly). For future projects, ZFS (and Btrfs, as used in your Synology NAS) are copy-on-write file-systems, which means it’s nearly impossible to have corruption due to a crash or power loss. (That’s not to say you should neglect a UPS battery backup in case of sudden power loss.)
I always have mine disabled. It’s healthier for the drive. Acoustic (-M) and APM (-B) and auto-suspend (-S) should always be disabled, especially if used in a RAID array or ZFS pool. Drives barely use any power on idle (around 3 to 5 watts, spinning). Depending where you live, that’s about 30¢ to 50¢ per month on your electric bill if you leave them running 24/7.
Many thanks once again for your great advice winnie!
I’m glad to hear I can leave the write caching on for performance with EXT4! I do have a UPS connected and think I have things set up to power down at 25% battery… although I haven’t tested the settings yet by pulling power
I’ll work on disabling Acoustic (-M), APM (-B) and auto-suspend (-S) on both drives!
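For reference, a sketch of those hdparm invocations (drive paths are placeholders):

```shell
# Disable APM (255 = off; some drives only accept 254 = max performance)
sudo hdparm -B 255 /dev/sdb

# Disable the standby/auto-suspend timeout
sudo hdparm -S 0 /dev/sdb

# AAM (-M) is skipped here, since some drives report it as unsupported
```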
I had to do a double-take on this. I just realized, and correct me if I’m wrong: THIS is your first time jumping into Linux as a legit alternative to Windows?
Well gosh darn it! You’re crazy! A deep dive right into software RAID and rsync’ing from a NAS server and esoteric file-system options!
The first time when I ditched Windows for Linux, I took baby steps:
“Okay… so… the terminal, um, that’s like the CMD.exe thingy in Windows, right? Okay… I can… do stuff in the terminal… okay… so wait… package manager? Like for zip files? Oh, package manager is like for installing software? Whatever. How do I run my .exe files? Is Firefox like IE but with an orange icon?”
Hehe… I thought asking for help in advance was a baby step?!
I’m an older “Computer Engineering Technologist” who graduated back in the day when 386’s were king. I’ve dabbled with a few “Linux Live” CD/DVD/USB distros on and off over the past [cough] decades, but never put significant effort into actually trying to replace Windows with it after learning early on that I’d have to give up some of my favorite PC activities, like winding down in a good RPG or MMO… GNU/Linux just wasn’t going to let me keep playing my favorite titles (until more recently).
But as luck would have it, much of what I play is on Steam, and earlier this year I caught a video (might have been this one or one like it) from Anthony at LTT where he was talking about “Gaming on Linux” (POP_OS and Manjaro) and that planted the seed for me to embrace all the good things Steam has been doing in this area over the past few years… with the help of other technologies like wine; and all the great upstream and downstream support found in the various GNU/Linux distributions of today.
Needless to say I’m glad to be rid of all the MS data-mining/telemetry, and happy to learn more about Manjaro and GNU/Linux as it is supporting my geekiness and Steam game play beautifully. And I’m digging in deep enough to try support the few people (like my parents) that will likely follow me to Linux; and likely future n00bs like me in this forum and other aspects of the GNU/Linux community that present themselves along the way.
I still have lots to learn, and I am prioritizing my posts and learning based on…
where I am in my migration
what apps/functionality I want/need next
what hardware I want/need to get working next
what presents itself as a learning opportunity along the way
Ok, I ran through disabling Acoustic (-M), APM (-B) and auto-suspend (-S) on both drives… but based on my steps, it looks like Acoustic (-M) is “not supported” for my WD Red’s so I left it as is…
EDIT: Wow… it’s been 7 hours (according to the forum) since I last ran iotop… and it’s still listing ext4lazyinit as working… hopefully it’s settled come the morning.
Don’t forget to create a custom udev rule so that those values are re-applied on each reboot. (I name my custom files with “99-” so I can keep track of them if I need to review/edit.)
For example, to apply it to all “spinning” drives in the system:
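A sketch of such a rule (the file name and hdparm values are illustrative; saved as something like /etc/udev/rules.d/99-hdparm.rules):

```
# Match whole rotational (spinning) disks as they appear, and apply hdparm settings
ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", \
  RUN+="/usr/bin/hdparm -B 255 -S 0 -M 0 /dev/%k"
```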
It doesn’t hurt to specify -M0 as a matter of practice, since it gets ignored if it’s not supported by the drive anyway. That keeps the entry good for future use, should a drive support -M.
If you prefer to use a custom script (that runs with elevated privileges), have it do something like:
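A minimal sketch of such a script (selecting drives by their rotational flag; run it as root at boot via cron @reboot or a systemd oneshot unit):

```shell
#!/bin/bash
# Apply hdparm settings to every rotational (spinning) disk in the system.
for disk in /sys/block/sd*; do
    [ -e "$disk" ] || continue                      # no sd* devices present
    if [ "$(cat "$disk/queue/rotational")" = "1" ]; then
        hdparm -B 255 -S 0 -M 0 "/dev/$(basename "$disk")"
    fi
done
```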
Many thanks for the keeping the learning curve moving forward winnie! I really appreciate your thoroughness.
Okay, I’m going to try to implement a udev custom rule because I think it fits nicely in with the other customization files I’ve been playing with so far… like /etc/fstab, /etc/mdadm.conf, and /etc/sysctl.d/30-swap_usage.conf (custom file from Fabby to control swappiness and vfs_cache_pressure).
But that syntax is above my head (is that some form of regex or bash scripting?), so more learning to do… although it seems to target just the sdx devices, which would mean my sda Samsung EVO SSD (which likely doesn’t care about these settings) and my sdb & sdc WD Red’s.
This is an old article, but it gives a good idea of the gist of writing a udev rule (which is rare for an end-user to do anyways. I doubt you’ll need more than this very one.)