[Testing Update] 2019-12-16 - KDE-git, Systemd v244.1, Upstream Rebuilds

@philm on GitHub

it all comes down to: Also what is interesting is: it happens on Ryzen CPUs.

I'm seeing the same problem on a Threadripper 1950X (ASRock Taichi mobo) and even on some old FX systems (FX8350 on a Gigabyte 990FXA-UD3 and FX6300 on an ASRock 970-Extreme3 mobo).

I'm not exactly on Manjaro, I'm using Arch ... I actually blacklisted Arch's upstream systemd packages and replaced them with your systemd version because you guys seem to actually know what you're doing (I'm not risking any data corruption because the Arch maintainers seem to love dumpster fires). I reverted to your 242.153-2 and everything's fine.

EDIT: Fixed typo

2 Likes

Moreover, it could be a nice addition to 5.4 as well. Now it is one of the reasons why I compile 5.4 myself.

patch for 5.4
---
v2: Use devm_kfree() to release memory in error path

 drivers/nvme/host/Kconfig      |  10 ++
 drivers/nvme/host/Makefile     |   1 +
 drivers/nvme/host/core.c       |   5 +
 drivers/nvme/host/nvme-hwmon.c | 163 +++++++++++++++++++++++++++++++++
 drivers/nvme/host/nvme.h       |   8 ++
 5 files changed, 187 insertions(+)
 create mode 100644 drivers/nvme/host/nvme-hwmon.c

diff --git a/drivers/nvme/host/Kconfig b/drivers/nvme/host/Kconfig
index 2b36f052bfb9..aeb49e16e386 100644
--- a/drivers/nvme/host/Kconfig
+++ b/drivers/nvme/host/Kconfig
@@ -23,6 +23,16 @@ config NVME_MULTIPATH
 	   /dev/nvmeXnY device will show up for each NVMe namespaces,
 	   even if it is accessible through multiple controllers.
 
+config NVME_HWMON
+	bool "NVME hardware monitoring"
+	depends on (NVME_CORE=y && HWMON=y) || (NVME_CORE=m && HWMON)
+	help
+	  This provides support for NVME hardware monitoring. If enabled,
+	  a hardware monitoring device will be created for each NVME drive
+	  in the system.
+
+	  If unsure, say N.
+
 config NVME_FABRICS
 	tristate
 
diff --git a/drivers/nvme/host/Makefile b/drivers/nvme/host/Makefile
index 8a4b671c5f0c..03de4797a877 100644
--- a/drivers/nvme/host/Makefile
+++ b/drivers/nvme/host/Makefile
@@ -14,6 +14,7 @@ nvme-core-$(CONFIG_TRACING)		+= trace.o
 nvme-core-$(CONFIG_NVME_MULTIPATH)	+= multipath.o
 nvme-core-$(CONFIG_NVM)			+= lightnvm.o
 nvme-core-$(CONFIG_FAULT_INJECTION_DEBUG_FS)	+= fault_inject.o
+nvme-core-$(CONFIG_NVME_HWMON)		+= nvme-hwmon.o
 
 nvme-y					+= pci.o
 
diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index fa7ba09dca77..fc1d4b146717 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2796,6 +2796,9 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 	ctrl->oncs = le16_to_cpu(id->oncs);
 	ctrl->mtfa = le16_to_cpu(id->mtfa);
 	ctrl->oaes = le32_to_cpu(id->oaes);
+	ctrl->wctemp = le16_to_cpu(id->wctemp);
+	ctrl->cctemp = le16_to_cpu(id->cctemp);
+
 	atomic_set(&ctrl->abort_limit, id->acl + 1);
 	ctrl->vwc = id->vwc;
 	if (id->mdts)
@@ -2897,6 +2900,8 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 
 	ctrl->identified = true;
 
+	nvme_hwmon_init(ctrl);
+
 	return 0;
 
 out_free:
diff --git a/drivers/nvme/host/nvme-hwmon.c b/drivers/nvme/host/nvme-hwmon.c
new file mode 100644
index 000000000000..af5eda326ec6
--- /dev/null
+++ b/drivers/nvme/host/nvme-hwmon.c
@@ -0,0 +1,163 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * NVM Express hardware monitoring support
+ * Copyright (c) 2019, Guenter Roeck
+ */
+
+#include <linux/hwmon.h>
+
+#include "nvme.h"
+
+struct nvme_hwmon_data {
+	struct nvme_ctrl *ctrl;
+	struct nvme_smart_log log;
+};
+
+static int nvme_hwmon_get_smart_log(struct nvme_hwmon_data *data)
+{
+	return nvme_get_log(data->ctrl, NVME_NSID_ALL, NVME_LOG_SMART, 0,
+			    &data->log, sizeof(data->log), 0);
+}
+
+static int nvme_hwmon_read(struct device *dev, enum hwmon_sensor_types type,
+			   u32 attr, int channel, long *val)
+{
+	struct nvme_hwmon_data *data = dev_get_drvdata(dev);
+	struct nvme_smart_log *log = &data->log;
+	int err;
+	int temp;
+
+	err = nvme_hwmon_get_smart_log(data);
+	if (err)
+		return err < 0 ? err : -EPROTO;
+
+	switch (attr) {
+	case hwmon_temp_max:
+		*val = (data->ctrl->wctemp - 273) * 1000;
+		break;
+	case hwmon_temp_crit:
+		*val = (data->ctrl->cctemp - 273) * 1000;
+		break;
+	case hwmon_temp_input:
+		if (!channel)
+			temp = le16_to_cpup((__le16 *)log->temperature);
+		else
+			temp = le16_to_cpu(log->temp_sensor[channel - 1]);
+		*val = (temp - 273) * 1000;
+		break;
+	case hwmon_temp_crit_alarm:
+		*val = !!(log->critical_warning & NVME_SMART_CRIT_TEMPERATURE);
+		break;
+	default:
+		err = -EOPNOTSUPP;
+		break;
+	}
+	return err;
+}
+
+static const char * const nvme_hwmon_sensor_names[] = {
+	"Composite",
+	"Sensor 1",
+	"Sensor 2",
+	"Sensor 3",
+	"Sensor 4",
+	"Sensor 5",
+	"Sensor 6",
+	"Sensor 7",
+	"Sensor 8",
+};
+
+static int nvme_hwmon_read_string(struct device *dev,
+				  enum hwmon_sensor_types type, u32 attr,
+				  int channel, const char **str)
+{
+	*str = nvme_hwmon_sensor_names[channel];
+	return 0;
+}
+
+static umode_t nvme_hwmon_is_visible(const void *_data,
+				     enum hwmon_sensor_types type,
+				     u32 attr, int channel)
+{
+	const struct nvme_hwmon_data *data = _data;
+
+	switch (attr) {
+	case hwmon_temp_crit:
+		if (!channel && data->ctrl->cctemp)
+			return 0444;
+		break;
+	case hwmon_temp_max:
+		if (!channel && data->ctrl->wctemp)
+			return 0444;
+		break;
+	case hwmon_temp_crit_alarm:
+		if (!channel)
+			return 0444;
+		break;
+	case hwmon_temp_input:
+	case hwmon_temp_label:
+		if (!channel || data->log.temp_sensor[channel - 1])
+			return 0444;
+		break;
+	default:
+		break;
+	}
+	return 0;
+}
+
+static const struct hwmon_channel_info *nvme_hwmon_info[] = {
+	HWMON_CHANNEL_INFO(chip, HWMON_C_REGISTER_TZ),
+	HWMON_CHANNEL_INFO(temp,
+			   HWMON_T_INPUT | HWMON_T_MAX | HWMON_T_CRIT |
+				HWMON_T_LABEL | HWMON_T_CRIT_ALARM,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL,
+			   HWMON_T_INPUT | HWMON_T_LABEL),
+	NULL
+};
+
+static const struct hwmon_ops nvme_hwmon_ops = {
+	.is_visible = nvme_hwmon_is_visible,
+	.read = nvme_hwmon_read,
+	.read_string = nvme_hwmon_read_string,
+};
+
+static const struct hwmon_chip_info nvme_hwmon_chip_info = {
+	.ops = &nvme_hwmon_ops,
+	.info = nvme_hwmon_info,
+};
+
+void nvme_hwmon_init(struct nvme_ctrl *ctrl)
+{
+	struct device *dev = ctrl->device;
+	struct nvme_hwmon_data *data;
+	struct device *hwmon;
+	int err;
+
+	data = devm_kzalloc(dev, sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return;
+
+	data->ctrl = ctrl;
+
+	err = nvme_hwmon_get_smart_log(data);
+	if (err) {
+		dev_warn(dev, "Failed to read smart log (error %d)\n", err);
+		devm_kfree(dev, data);
+		return;
+	}
+
+	hwmon = devm_hwmon_device_register_with_info(dev, dev_name(dev),
+						     data,
+						     &nvme_hwmon_chip_info,
+						     NULL);
+	if (IS_ERR(hwmon)) {
+		dev_warn(dev, "Failed to instantiate hwmon device\n");
+		devm_kfree(dev, data);
+	}
+}
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 22e8401352c2..e6460c1216bc 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -231,6 +231,8 @@ struct nvme_ctrl {
 	u16 kas;
 	u8 npss;
 	u8 apsta;
+	u16 wctemp;
+	u16 cctemp;
 	u32 oaes;
 	u32 aen_result;
 	u32 ctratt;
@@ -652,4 +654,10 @@ static inline struct nvme_ns *nvme_get_ns_from_dev(struct device *dev)
 	return dev_to_disk(dev)->private_data;
 }
 
+#if IS_ENABLED(CONFIG_NVME_HWMON)
+void nvme_hwmon_init(struct nvme_ctrl *ctrl);
+#else
+static inline void nvme_hwmon_init(struct nvme_ctrl *ctrl) { }
+#endif
+
 #endif /* _NVME_H */
-- 
2.17.1
$ sensors
...

nvme0-pci-0300
Adapter: PCI adapter
Composite:    +41.0°C  (high = +68.0°C, crit = +71.0°C)
Sensor 1:     +41.0°C  
Sensor 2:     +48.0°C
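
For reference, the patch above hands temperatures to hwmon in millidegrees Celsius, converting the drive's Kelvin readings with (temp - 273) * 1000. A minimal shell sketch of the same integer arithmetic follows; the function name kelvin_to_millicelsius and the sample reading of 314 K are made up for illustration and are not part of the patch.

```shell
#!/bin/sh
# Hypothetical helper mirroring the patch's conversion:
# (kelvin - 273) * 1000. Note the patch uses 273, not 273.15.
kelvin_to_millicelsius() {
    echo $(( ($1 - 273) * 1000 ))
}

# Illustrative input only: a reading of 314 K.
kelvin_to_millicelsius 314
```

A value of 41000 millidegrees is what sensors would then display as +41.0°C.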
1 Like

Going further in my investigations, I confirmed that hibernation itself works when triggered directly, by doing:

sudo -s -H
echo disk > /sys/power/state

The issue appears to be that one. The only difference is that my filesystem is ext4, not Btrfs. Indeed, it can be solved by adding

[Service]
Environment=SYSTEMD_BYPASS_HIBERNATION_MEMORY_CHECK=1

to both systemd-logind and systemd-hibernate, by running sudo systemctl edit systemd-logind.service and sudo systemctl edit systemd-hibernate.service

And rebooting, of course…
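
As a rough sketch of what systemctl edit ends up creating: each unit gets a drop-in file named override.conf under /etc/systemd/system/<unit>.d/. The helper below writes that file for both units. The function name write_bypass_override and its optional root-prefix argument are assumptions for illustration; the prefix only exists so the result can be inspected in a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helper: write the drop-in that `systemctl edit` would create.
# $1 is an optional root prefix (an assumption, for dry runs); empty means "/".
write_bypass_override() {
    root="${1:-}"
    for unit in systemd-logind.service systemd-hibernate.service; do
        dir="$root/etc/systemd/system/$unit.d"
        mkdir -p "$dir" || return 1
        printf '%s\n' \
            '[Service]' \
            'Environment=SYSTEMD_BYPASS_HIBERNATION_MEMORY_CHECK=1' \
            > "$dir/override.conf" || return 1
    done
}

# Dry run into a scratch directory, then inspect one of the results.
scratch="$(mktemp -d)"
write_bypass_override "$scratch"
cat "$scratch/etc/systemd/system/systemd-hibernate.service.d/override.conf"
```

Calling write_bypass_override as root with no argument would write to the real location, which should be equivalent to the two systemctl edit calls.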

3 Likes

Hi,
I do not have this issue on an Intel P4 + Nvidia. Here's my configuration:


5.4.3-1-MANJARO x86_64 bits
systemd 244.1-1.1
inxi -Fxzc0

System:    Host:  Kernel: 5.4.3-1-MANJARO x86_64 bits: 64 compiler: gcc v: 9.2.0 Desktop: Xfce 4.14.1 
           Distro: Manjaro Linux 
Machine:   Type: Desktop Mobo: ASUSTeK model: P5WD2-Premium v: Rev 1.xx serial: <filter> BIOS: American Megatrends v: 0422 
           date: 05/27/2005 
CPU:       Topology: Single Core model: Intel Pentium 4 bits: 64 type: MT arch: Netburst Smithfield rev: 3 L2 cache: 2048 KiB 
           flags: lm nx pae sse sse2 sse3 bogomips: 14457 
           Speed: 3600 MHz min/max: 2800/3600 MHz Core speeds (MHz): 1: 3600 2: 2800 
Graphics:  Device-1: NVIDIA GT215 [GeForce GT 240] driver: nvidia v: 340.107 bus ID: 08:00.0 
           Display: x11 server: X.Org 1.20.6 driver: nvidia resolution: 1920x1080~60Hz 
           OpenGL: renderer: GeForce GT 240/PCIe/SSE2 v: 3.3.0 NVIDIA 340.107 direct render: Yes 
Audio:     Device-1: Intel NM10/ICH7 Family High Definition Audio vendor: ASUSTeK driver: snd_hda_intel v: kernel 
           bus ID: 00:1b.0 
           Device-2: NVIDIA High Definition Audio driver: snd_hda_intel v: kernel bus ID: 08:00.1 
           Sound Server: ALSA v: k5.4.3-1-MANJARO 
Network:   Device-1: Marvell 88E8001 Gigabit Ethernet vendor: ASUSTeK driver: skge v: 1.14 port: 7800 bus ID: 01:05.0 
           IF: enp1s5 state: up speed: 1000 Mbps duplex: full mac: <filter> 
           Device-2: Intel 82573V Gigabit Ethernet vendor: ASUSTeK driver: e1000e v: 3.2.6-k port: 9800 bus ID: 03:00.0 
           IF: enp3s0 state: down mac: <filter> 
Drives:    Local Storage: total: 298.09 GiB used: 40.69 GiB (13.7%) 
           ID-1: /dev/sda vendor: Western Digital model: WD3200BPVT-24JJ5T0 size: 298.09 GiB 
Partition: ID-1: / size: 286.06 GiB used: 40.69 GiB (14.2%) fs: ext4 dev: /dev/sda1 
           ID-2: swap-1 size: 6.45 GiB used: 0 KiB (0.0%) fs: swap dev: /dev/sda2 
Sensors:   Message: No sensors data was found. Is sensors configured? 
Info:      Processes: 151 Uptime: 6m Memory: 2.92 GiB used: 700.5 MiB (23.4%) Init: systemd Compilers: gcc: 9.2.0 Shell: bash 
           v: 5.0.11 inxi: 3.0.37 

No (apparent) issues with systemd on an Intel 3570K with integrated GPU. Systemd v243 worked fine as well.

For the time being we might revert to the 242 series then. Let's see when it gets sorted out.

I also had the video driver (I guess) crash after suspend (to RAM), with nvidia (nvidia-modeset). The relevant error is:

Dec 18 17:45:47 kernel: nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
Dec 18 17:45:47 kernel: nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer
Dec 18 17:45:47 kernel: nvidia-modeset: ERROR: GPU:0: Display engine push buffer channel allocation failed: 0x65 (Call timed out [NV_ERR_TIMEOUT])
Dec 18 17:45:47 kernel: nvidia-modeset: ERROR: GPU:0: Failed to allocate display engine core DMA push buffer

The journal shows everything else returning successfully from suspend; the network (Wi-Fi) was up and running.
Still, I got a black screen with no working TTY.

I have a desktop with a (manual) PRIME setup

inxi -SMGxxz
System:    Host: ma64testimg Kernel: 5.3.16-1-MANJARO x86_64 bits: 64 compiler: gcc v: 9.2.0 Desktop: KDE Plasma 5.17.4 
           tk: Qt 5.13.2 wm: kwin_x11 dm: LightDM, SDDM Distro: Manjaro Linux 
Machine:   Type: Desktop Mobo: ASUSTeK model: P7H55-M v: Rev X.0x serial: <filter> BIOS: American Megatrends v: 1101 
           date: 08/18/2010 
Graphics:  Device-1: Intel Core Processor Integrated Graphics vendor: ASUSTeK driver: i915 v: kernel bus ID: 00:02.0 
           chip ID: 8086:0042 
           Device-2: NVIDIA GF116 [GeForce GTX 550 Ti] vendor: eVga.com. driver: nvidia v: 390.132 bus ID: 01:00.0 
           chip ID: 10de:1244 
           Display: x11 server: X.Org 1.20.6 driver: modesetting,nvidia alternate: fbdev,intel,nouveau,nv,vesa 
           compositor: kwin_x11 resolution: 1920x1080~60Hz, 1920x1080~60Hz 
           OpenGL: renderer: GeForce GTX 550 Ti/PCIe/SSE2 v: 4.6.0 NVIDIA 390.132 direct render: Yes

Is that really necessary? Possibly yes. To be on the safe side?
For me, 244 plus kernel 5.4 works without complaint
on a Ryzen 3 1300X. Mobo: Asus Prime B350 Plus

Just don't downgrade those packages and stick with 244 if it's ok for you.

1 Like

We got a report that on Linux 5.4.3, sound on the HDMI port may suddenly become extremely loud and get stuck at maximum volume.

Seeing that issue here on kernel 5.3.16 as well as 5.4. (Nvidia 440xx)

1 Like

It works without problems for many, but Manjaro needs to ship reliable software, which is not the case when people encounter kernel panics.

4 Likes

5.4.5 is now out. Should you try it out before downgrading?

Manjaro needs to ship reliable software

That is right. Absolutely!
++++
I am on testing and partially upgraded (YES, I know what I'm doing) to:

warning: linux54: local (5.4.5-1) is newer than core (5.4.2-1)
warning: linux54-headers: local (5.4.5-1) is newer than core (5.4.2-1)
warning: systemd: local (244.1-1.1) is newer than core (242.153-2)
warning: systemd-libs: local (244.1-1) is newer than core (242.153-2)
warning: systemd-resolvconf: local (244.1-1) is newer than core (242.153-2)
warning: systemd-sysvcompat: local (244.1-1) is newer than core (242.153-2)

System is stable as a rock....

$ journalctl -p err -b
-- Logs begin at Thu 2019-12-05 10:56:28 CET, end at Thu 2019-12-19 09:16:55 CET. --
-- No entries --
++++
And:
No errors using systemd-manager 1.0.0-2 from the AUR.

I was wondering why I kept getting "systemd: local is newer than core" notifications. I don't know if this is related, but starting today, my kernel modules have been getting deleted on reboot. I boot up, systemd-modules-load.service fails during boot and drops me to an emergency shell, I reinstall my kernel, I get a million of the "cannot find file information for /lib/modules/5.4xxx/xxxx" notifications during the install, and the files get replaced. I booted into my Arch install to try and see what was going on, and sure enough there was no /lib/modules folder. It was getting completely erased every time I booted. Thank god for Timeshift is all I have to say.

How sure is enough? :rofl:
Unless this is a joke... it's a serious system mess and you should investigate (create a new topic).

BTW there are newer Testing Update Announcements...

2 Likes

It's definitely not a joke, and as soon as I downgraded systemd as recommended, it went away completely.

Then I guess I am just lucky that I still have this folder while being on

$ pacman -Q systemd
systemd 244.1-1.1
1 Like

core

Last updated: Mon 30 Dec 20:02:02 UTC 2019

Package   stable      testing     unstable
systemd   242.153-2   242.153-2   242.153-2

I'm lucky too

sgs@mx Linux 5.4.6-2-MANJARO x86_64 18.1.5 Juhraya
~ >>> pacman -Q systemd                                         
systemd 244.1-1.1

1 Like

So please post in the newest testing update thread.

This topic was automatically closed 15 days after the last reply. New replies are no longer allowed.
