ryzen
24 February 2024 23:34
1
CPU: 2x 48-core AMD EPYC 7642 (-MCP SMP-) speed/min/max: 1576/1500/2300 MHz
Kernel: 6.6.16-2-MANJARO x86_64 Up: 8m Mem: 4.88/251.73 GiB (1.9%)
Storage: 1.85 TiB (1.2% used) Procs: 1171 Shell: Bash inxi: 3.3.33
Hi there, I have built a dual EPYC machine on a Supermicro motherboard to run Python scripts on very large numbers (thousands to millions of digits). On Windows 10 the scripts run at a reasonable speed. On this machine it is very, very slow. I notice the CPU speed/min/max is not correct.
It is a fresh install, all up to date. Is there something I have messed up? BIOS looks ok. These CPUs are base 2.3 GHz, boost up to 3.3 GHz. Thanks for your help.
ryzen
25 February 2024 00:05
3
Trying to work out how to paste the contents of, or a link to, a txt file. The forum is not allowing me to post a link (can’t include links in your post).
Welcome to the forum!
See [HowTo] Post command output and file content as formatted text
There are quite a few pastebin hosts allowed for new users, you should have no issue pasting a link unless it’s some obscure one.
ryzen
25 February 2024 00:08
6
System:
Kernel: 6.6.16-2-MANJARO arch: x86_64 bits: 64 compiler: gcc v: 13.2.1
clocksource: tsc avail: hpet,acpi_pm
parameters: BOOT_IMAGE=/boot/vmlinuz-6.6-x86_64
root=UUID=6c3c70d5-994f-4aec-b639-c548602c205d rw nouveau.modeset=0 quiet
cryptdevice=UUID=5ad75e2b-6b15-442b-a01e-12101132d822:luks-5ad75e2b-6b15-442b-a01e-12101132d822
root=/dev/mapper/luks-5ad75e2b-6b15-442b-a01e-12101132d822 splash
apparmor=1 security=apparmor udev.log_priority=3
Desktop: Xfce v: 4.18.1 tk: Gtk v: 3.24.36 wm: xfwm4 v: 4.18.0
with: xfce4-panel tools: xfce4-screensaver vt: 7 dm: LightDM v: 1.32.0
Distro: Manjaro base: Arch Linux
Machine:
Type: Server System: Supermicro product: Super Server v: 0123456789
serial: <superuser required> Chassis: type: 17 v: 0123456789
serial: <superuser required>
Mobo: Supermicro model: H11DSi-NT v: 2.00 serial: <superuser required>
uuid: <superuser required> UEFI-[Legacy]: American Megatrends v: 2.3
date: 08/02/2021
CPU:
Info: model: AMD EPYC 7642 bits: 64 type: MCP SMP arch: Zen 2 gen: 3
level: v3 note: check built: 2020-22 process: TSMC n7 (7nm) family: 0x17 (23)
model-id: 0x31 (49) stepping: 0 microcode: 0x830107B
Topology: cpus: 2x cores: 48 smt: <unsupported> cache: L1: 2x 3 MiB (6 MiB)
desc: d-48x32 KiB; i-48x32 KiB L2: 2x 24 MiB (48 MiB) desc: 48x512 KiB
L3: 2x 256 MiB (512 MiB) desc: 16x16 MiB
Speed (MHz): avg: 1610 high: 3300 min/max: 1500/2300 boost: enabled
scaling: driver: acpi-cpufreq governor: schedutil cores: 1: 2300 2: 1500
3: 1500 4: 1500 5: 1500 6: 3300 7: 3300 8: 1500 9: 1500 10: 1500 11: 1500
12: 1500 13: 1500 14: 1500 15: 3300 16: 2300 17: 1500 18: 1500 19: 1500
20: 1500 21: 1500 22: 1500 23: 1500 24: 1500 25: 1500 26: 1500 27: 1500
28: 1500 29: 1500 30: 1500 31: 1500 32: 1500 33: 1500 34: 1500 35: 1500
36: 1500 37: 1500 38: 1500 39: 1500 40: 1500 41: 1500 42: 1500 43: 1500
44: 1500 45: 1500 46: 1500 47: 1500 48: 1500 49: 1500 50: 1500 51: 1500
52: 1500 53: 1500 54: 1500 55: 1500 56: 1500 57: 1500 58: 1467 59: 1500
60: 3300 61: 3295 62: 1500 63: 1500 64: 1500 65: 1500 66: 1500 67: 1500
68: 1500 69: 1500 70: 1500 71: 1500 72: 1500 73: 1500 74: 1500 75: 1500
76: 1500 77: 1500 78: 1500 79: 1500 80: 1500 81: 1500 82: 1500 83: 1500
84: 1500 85: 1500 86: 1500 87: 1500 88: 1500 89: 1500 90: 1500 91: 1500
92: 1500 93: 1500 94: 1500 95: 1500 96: 1500 bogomips: 441875
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Vulnerabilities:
Type: gather_data_sampling status: Not affected
Type: itlb_multihit status: Not affected
Type: l1tf status: Not affected
Type: mds status: Not affected
Type: meltdown status: Not affected
Type: mmio_stale_data status: Not affected
Type: retbleed mitigation: untrained return thunk; SMT disabled
Type: spec_rstack_overflow mitigation: SMT disabled
Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via
prctl
Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer
sanitization
Type: spectre_v2 mitigation: Retpolines, IBPB: conditional, STIBP:
disabled, RSB filling, PBRSB-eIBRS: Not affected
Type: srbds status: Not affected
Type: tsx_async_abort status: Not affected
Graphics:
Device-1: NVIDIA GV100 [TITAN V] driver: nvidia v: 545.29.06
alternate: nouveau,nvidia_drm non-free: 545.xx+ status: current (as of
2024-02; EOL~2026-12-xx) arch: Volta code: GV1xx process: TSMC 12nm
built: 2017-2020 pcie: gen: 3 speed: 8 GT/s lanes: 16 bus-ID: 01:00.0
chip-ID: 10de:1d81 class-ID: 0300
Device-2: NVIDIA GV100 [TITAN V] driver: nvidia v: 545.29.06
alternate: nouveau,nvidia_drm non-free: 545.xx+ status: current (as of
2024-02; EOL~2026-12-xx) arch: Volta code: GV1xx process: TSMC 12nm
built: 2017-2020 pcie: gen: 3 speed: 8 GT/s lanes: 16 bus-ID: 21:00.0
chip-ID: 10de:1d81 class-ID: 0300
Device-3: ASPEED Graphics Family vendor: Super Micro driver: ast v: kernel
ports: active: VGA-1 empty: Virtual-1 bus-ID: 42:00.0 chip-ID: 1a03:2000
class-ID: 0300
Display: x11 server: X.org v: 1.21.1.11 compositor: xfwm4 v: 4.18.0 driver:
X: loaded: modesetting,nvidia unloaded: nouveau alternate: fbdev,nv,vesa
gpu: ast display-ID: :0.0 note: <missing: xdpyinfo/xrandr>
Monitor-1: VGA-1 model: Dell U2412M serial: <filter> built: 2020
res: 1920x1200 dpi: 94 gamma: 1.2 size: 518x324mm (20.39x12.76")
diag: 611mm (24.1") ratio: 16:10 modes: max: 1920x1080 min: 640x480
API: EGL v: 1.5 hw: drv: nvidia platforms: device: 0 drv: nvidia device: 1
drv: nvidia device: 4 drv: swrast gbm: drv: kms_swrast surfaceless:
drv: nvidia x11: drv: zink inactive: wayland,device-2,device-3
API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: mesa v: 23.3.5-manjaro1.1
glx-v: 1.4 direct-render: yes renderer: llvmpipe (LLVM 16.0.6 256 bits)
device-ID: ffffffff:ffffffff memory: 245.83 GiB unified: yes
Audio:
Device-1: NVIDIA driver: snd_hda_intel v: kernel pcie: gen: 3 speed: 8 GT/s
lanes: 16 bus-ID: 01:00.1 chip-ID: 10de:10f2 class-ID: 0403
Device-2: NVIDIA driver: snd_hda_intel v: kernel pcie: gen: 3 speed: 8 GT/s
lanes: 16 bus-ID: 21:00.1 chip-ID: 10de:10f2 class-ID: 0403
API: ALSA v: k6.6.16-2-MANJARO status: kernel-api with: aoss
type: oss-emulator tools: alsactl,alsamixer,amixer
Server-1: JACK v: 1.9.22 status: off tools: N/A
Server-2: PipeWire v: 1.0.3 status: active with: 1: pipewire-pulse
status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
tools: pactl,pw-cat,pw-cli,wpctl
Network:
Device-1: Intel Ethernet X550 vendor: Super Micro driver: ixgbe v: kernel
pcie: gen: 2 speed: 5 GT/s lanes: 8 port: N/A bus-ID: 61:00.0
chip-ID: 8086:1563 class-ID: 0200
IF: eno1 state: up speed: 1000 Mbps duplex: full mac: <filter>
Device-2: Intel Ethernet X550 vendor: Super Micro driver: ixgbe v: kernel
pcie: gen: 2 speed: 5 GT/s lanes: 8 port: N/A bus-ID: 61:00.1
chip-ID: 8086:1563 class-ID: 0200
IF: eno2 state: down mac: <filter>
Info: services: NetworkManager
Drives:
Local Storage: total: 1.85 TiB used: 22.35 GiB (1.2%)
SMART Message: Required tool smartctl not installed. Check --recommends
ID-1: /dev/sda maj-min: 8:0 vendor: Samsung model: SSD 870 EVO 2TB
size: 1.82 TiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
tech: SSD serial: <filter> fw-rev: 3B6Q scheme: MBR
ID-2: /dev/sdb maj-min: 8:16 vendor: SanDisk model: Cruzer Blade
size: 29.25 GiB block-size: physical: 512 B logical: 512 B type: USB rev: 2.0
spd: 480 Mb/s lanes: 1 mode: 2.0 tech: N/A serial: <filter> fw-rev: 1.00
scheme: MBR
Partition:
ID-1: / raw-size: 1.82 TiB size: 1.79 TiB (98.37%) used: 22.35 GiB (1.2%)
fs: ext4 dev: /dev/dm-0 maj-min: 254:0
mapped: luks-5ad75e2b-6b15-442b-a01e-12101132d822
Swap:
Alert: No swap data was found.
Sensors:
System Temperatures: cpu: 45.8 C mobo: N/A
Fan Speeds (rpm): N/A
Info:
Memory: total: 256 GiB note: est. available: 251.73 GiB used: 5.89 GiB (2.3%)
Processes: 1392 Power: uptime: 4m states: freeze,mem,disk suspend: s2idle
wakeups: 0 hibernate: shutdown avail: reboot,suspend,test_resume
image: 100.68 GiB services: upowerd,xfce4-power-manager Init: systemd
v: 255 default: graphical tool: systemctl
Packages: pm: pacman pkgs: 1100 libs: 324 tools: pamac pm: flatpak pkgs: 0
Compilers: clang: 16.0.6 gcc: 13.2.1 Shell: Bash v: 5.2.26
running-in: xfce4-terminal inxi: 3.3.33
Thank you - note, I have SMT disabled in bios (don’t want to use it)
cscs
25 February 2024 00:19
7
ryzen:
Memory: total: 256 GiB
Dayum.
I think you have enough ram.
So these speeds look incorrect to you?
ryzen
25 February 2024 01:06
9
The ‘high’ is correct (3300), but the average and min/max are throttled way down. The CPU speed would explain some of the slow Python performance, but this is bad enough that the same code takes 3-4 times longer to run on this machine than on Windows.
This motherboard has a bunch of northbridge settings all set to auto. I have a machine next to it dual xeon v3 which is also slow in running python on manjaro (h/t is disabled). Would really like python running much faster, as fast or faster than the windows machine.
Difficult to say - some ideas
power profile
disable vulnerability mitigation
bottlenecks in script
run your script without X loaded (console)
dmt
25 February 2024 11:58
11
dgdg
25 February 2024 12:30
12
Are you sure that the CPU min/max is not correct? On AMD’s specification page it lists max all-core clocks of 2.3GHz and boost of 3.3GHz, which matches what you’ve posted (EPYC CPUs don’t tend to clock that high; server CPUs tend to value performance per watt over raw performance). I would also assume that a lower than expected clock speed would cause everything to run slow, not just Python.
Have you considered running pybench to get an idea of general Python performance? That might help narrow down if there’s a specific aspect of your script that is causing the problem.
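For a quick sanity check before reaching for a full benchmark, a minimal timing sketch along these lines can show whether plain interpreter speed differs between the two machines. This is not the pybench script itself; the function names and workload sizes are made up for illustration.

```python
import time


def fib_recursive(n):
    """Naive recursive Fibonacci: stresses function-call overhead."""
    return n if n < 2 else fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_iterative(n):
    """Iterative Fibonacci: stresses big-integer addition for large n."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def timed(func, *args):
    """Run func once and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Workload sizes are arbitrary; raise them for longer, steadier timings.
    for name, func, arg in [
        ("fib recursive", fib_recursive, 25),
        ("fib iterative", fib_iterative, 100_000),
    ]:
        _, seconds = timed(func, arg)
        print(f"{name}({arg}): {seconds:.3f} s")
```

Running the same file on both operating systems with the same Python version gives a like-for-like number, which is easier to reason about than the wall time of a large script.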
cscs
25 February 2024 20:43
13
I’m not sure you want to be using cpufreq either.
https://wiki.archlinux.org/title/CPU_frequency_scaling#Scaling_drivers
For my (zen3) ryzen I use amd_pstate=active, which translates to amd-pstate-epp in use.
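To see which scaling driver and governor are actually in use, the Linux cpufreq sysfs files can be read directly. A small sketch, assuming the usual /sys/devices/system/cpu layout; the helper name read_cpufreq is only for illustration.

```python
from pathlib import Path


def read_cpufreq(cpu=0, sysfs_root="/sys/devices/system/cpu"):
    """Return the scaling driver, governor, and frequency limits for one CPU.

    Values come from the Linux cpufreq sysfs interface (frequencies are
    in kHz). A file that cannot be read is reported as None.
    """
    base = Path(sysfs_root) / f"cpu{cpu}" / "cpufreq"
    info = {}
    for name in ("scaling_driver", "scaling_governor",
                 "scaling_min_freq", "scaling_max_freq"):
        try:
            info[name] = (base / name).read_text().strip()
        except OSError:
            info[name] = None
    return info


if __name__ == "__main__":
    for key, value in read_cpufreq().items():
        print(f"{key}: {value}")
```

On the machine in this thread the driver line would be expected to read acpi-cpufreq, matching the inxi output above.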
ryzen
26 February 2024 06:11
14
Thanks for replies. Will try them out and let you know how I go.
Out of curiosity I tested my system using the pybench script mentioned above.
My system is nowhere near comparable to yours.
System CPU RAM info
$ inxi -SCm
System:
Host: tiger Kernel: 6.6.18-1-MANJARO arch: x86_64 bits: 64
Desktop: KDE Plasma v: 5.93.0 Distro: Manjaro Linux
Memory:
System RAM: total: 64 GiB available: 62.62 GiB
used: 3.37 GiB (5.4%)
Message: For most reliable report, use superuser + dmidecode.
Array-1: capacity: 1024 GiB note: check slots: 8 modules: 4
EC: Multi-bit ECC
Device-1: DIMM5 type: no module installed
Device-2: DIMM6 type: no module installed
Device-3: DIMM7 type: DDR4 size: 16 GiB speed: 3200 MT/s
Device-4: DIMM8 type: DDR4 size: 16 GiB speed: 3200 MT/s
Device-5: DIMM4 type: no module installed
Device-6: DIMM3 type: no module installed
Device-7: DIMM2 type: DDR4 size: 16 GiB speed: 3200 MT/s
Device-8: DIMM1 type: DDR4 size: 16 GiB speed: 3200 MT/s
CPU:
Info: 12-core model: AMD Ryzen Threadripper PRO 5945WX s bits: 64
type: MT MCP cache: L2: 6 MiB
Speed (MHz): avg: 514
min/max: 400/4978:4565:4705:4841:5254:5118:5943:5666:5807:5530:5394
cores: 1: 400 2: 400 3: 400 4: 400 5: 400 6: 400 7: 1769 8: 400
9: 400 10: 400 11: 400 12: 400 13: 400 14: 400 15: 400 16: 400
17: 400 18: 400 19: 400 20: 400 21: 400 22: 400 23: 400 24: 1785
Kernel cmdline
$ cat /proc/cmdline
BOOT_IMAGE=/boot/vmlinuz-6.6-x86_64 root=UUID=07c78795-e8a4-4134-be2e-be5908c5b9f8 rw quiet splash nowatchdog udev.log_priority=3 mitigations=off amd_pstate=active
Result
$ python bench.py
calculating pi:
100%|████████████████| 33554431/33554431 [00:05<00:00, 6151904.59it/s]
calculating fib recursive:
97%|████████████████████████████████▏| 37/38 [00:09<00:00, 3.93it/s]
calculating fib iterative:
100%|███████████████████| 1048574/1048574 [00:05<00:00, 178670.19it/s]
benchmark time: 0:00:20.864621
If you run something like btop at the same time you will likely see the system is using only a few cores, possibly only one.
This is your Python script not utilizing the potential of the system.
I was wondering myself why I couldn’t utilize all cores calculating pi. I searched and found an article you may also find interesting - How to Use 100% of All CPU Cores in Python - Super Fast Python
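As a rough illustration of the idea (not taken from the linked article), independent CPU-bound tasks can be spread over worker processes with multiprocessing.Pool. The task below is a made-up stand-in for real big-number work.

```python
import os
from multiprocessing import Pool


def power_bits(exponent):
    """CPU-bound stand-in task: bit length of 3**exponent."""
    return (3 ** exponent).bit_length()


def run_parallel(exponents, workers=None):
    """Spread independent tasks over worker processes (one per core by default)."""
    with Pool(processes=workers or os.cpu_count()) as pool:
        return pool.map(power_bits, exponents)


if __name__ == "__main__":
    # Each exponent is handled by a separate process, so several cores get busy.
    print(run_parallel([100_000, 200_000, 300_000, 400_000]))
```

This only helps when the work splits into chunks that do not depend on each other; a single long calculation still runs on one core.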
I have a mini pc with win11 and manjaro stable
System:
Host: bmax4 Kernel: 6.6.16-2-MANJARO arch: x86_64 bits: 64
Desktop: KDE Plasma v: 5.27.10 Distro: Manjaro Linux
Memory:
System RAM: total: 16 GiB available: 15.31 GiB used: 2 GiB (13.1%)
Array-1: capacity: 64 GiB slots: 2 modules: 1 EC: None
Device-1: Controller0-ChannelA-DIMM0 type: DDR4 size: 16 GiB
speed: 2667 MT/s
Device-2: Controller1-ChannelA-DIMM0 type: no module installed
CPU:
Info: quad core model: Intel N100 bits: 64 type: MCP cache: L2: 2 MiB
Speed (MHz): avg: 703 min/max: 700/3400 cores: 1: 700 2: 708 3: 704 4: 703
minipc win11 - Python 3.12
calculating pi:
100%|███████████████████████████████████████████████████████| 33554431/33554431 [00:10<00:00, 3253375.04it/s]
calculating fib recursive:
97%|█████████████████████████████████████████████████████████████████████▊ | 37/38 [00:12<00:00, 2.96it/s]
calculating fib iterative:
100%|███████████████████████████████████████████████████████████| 1048574/1048574 [00:11<00:00, 95308.51it/s]
benchmark time: 0:00:33.981077
same pc - manjaro - Python 3.11
calculating pi:
100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 33554431/33554431 [00:07<00:00, 4761923.08it/s]
calculating fib recursive:
97%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 37/38 [00:08<00:00, 4.19it/s]
calculating fib iterative:
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 1048574/1048574 [00:09<00:00, 105661.70it/s]
benchmark time: 0:00:25.827358
result ?
EDIT
and Python 3.12 is faster than 3.11 ?
manjaro pyenv python 3.12.1
calculating pi:
100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 33554431/33554431 [00:11<00:00, 3026243.36it/s]
calculating fib recursive:
97%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████▋ | 37/38 [00:13<00:00, 2.64it/s]
calculating fib iterative:
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████| 1048574/1048574 [00:09<00:00, 110279.49it/s]
benchmark time: 0:00:34.598512
What’s not to like with Manjaro
Even python is faster …
I think the fact that Python doesn’t default to using all cores makes it easy to compare systems - it also makes it clear that Python is incredibly effective when utilised to its full potential.
ryzen
26 February 2024 09:08
18
System:
Kernel: 6.6.16-2-MANJARO arch: x86_64 bits: 64 compiler: gcc v: 13.2.1
clocksource: tsc avail: hpet,acpi_pm
parameters: BOOT_IMAGE=/boot/vmlinuz-6.6-x86_64
root=UUID=6c3c70d5-994f-4aec-b639-c548602c205d rw nouveau.modeset=0 quiet
cryptdevice=UUID=5ad75e2b-6b15-442b-a01e-12101132d822:luks-5ad75e2b-6b15-442b-a01e-12101132d822
root=/dev/mapper/luks-5ad75e2b-6b15-442b-a01e-12101132d822 splash
apparmor=1 security=apparmor udev.log_priority=3
Desktop: Xfce v: 4.18.1 tk: Gtk v: 3.24.36 wm: xfwm4 v: 4.18.0
with: xfce4-panel tools: xfce4-screensaver vt: 7 dm: LightDM v: 1.32.0
Distro: Manjaro base: Arch Linux
Machine:
Type: Server System: Supermicro product: Super Server v: 0123456789
serial: <superuser required> Chassis: type: 17 v: 0123456789
serial: <superuser required>
Mobo: Supermicro model: H11DSi-NT v: 2.00 serial: <superuser required>
uuid: <superuser required> UEFI-[Legacy]: American Megatrends v: 2.3
date: 08/02/2021
CPU:
Info: model: AMD EPYC 7642 bits: 64 type: MCP SMP arch: Zen 2 gen: 3
level: v3 note: check built: 2020-22 process: TSMC n7 (7nm) family: 0x17 (23)
model-id: 0x31 (49) stepping: 0 microcode: 0x830107B
Topology: cpus: 2x cores: 48 smt: <unsupported> cache: L1: 2x 3 MiB (6 MiB)
desc: d-48x32 KiB; i-48x32 KiB L2: 2x 24 MiB (48 MiB) desc: 48x512 KiB
L3: 2x 256 MiB (512 MiB) desc: 16x16 MiB
Speed (MHz): avg: 3223 high: 3276 min/max: 1500/2300 boost: enabled
scaling: driver: acpi-cpufreq governor: schedutil cores: 1: 3275 2: 3265
3: 3274 4: 3275 5: 3265 6: 3160 7: 3273 8: 3274 9: 3270 10: 3275 11: 3270
12: 3274 13: 3276 14: 1454 15: 3274 16: 3274 17: 3276 18: 3275 19: 3275
20: 3274 21: 3275 22: 3273 23: 3274 24: 3273 25: 3274 26: 3275 27: 3272
28: 3270 29: 3275 30: 3272 31: 3275 32: 3265 33: 3264 34: 3275 35: 3275
36: 3273 37: 3273 38: 3275 39: 3275 40: 3275 41: 2195 42: 3254 43: 3266
44: 3275 45: 3268 46: 3268 47: 3267 48: 3273 49: 3273 50: 3273 51: 3274
52: 3201 53: 3269 54: 3274 55: 3270 56: 3272 57: 3259 58: 3269 59: 3263
60: 3261 61: 3260 62: 3241 63: 3255 64: 3275 65: 3264 66: 3250 67: 3274
68: 3271 69: 3274 70: 3272 71: 3274 72: 3274 73: 3274 74: 3274 75: 3274
76: 3274 77: 3274 78: 3271 79: 3274 80: 3271 81: 3265 82: 3274 83: 3274
84: 3267 85: 3270 86: 3267 87: 3275 88: 3274 89: 3270 90: 3273 91: 1833
92: 3268 93: 3273 94: 3274 95: 3275 96: 3272 bogomips: 441757
Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm
Vulnerabilities:
Type: gather_data_sampling status: Not affected
Type: itlb_multihit status: Not affected
Type: l1tf status: Not affected
Type: mds status: Not affected
Type: meltdown status: Not affected
Type: mmio_stale_data status: Not affected
Type: retbleed mitigation: untrained return thunk; SMT disabled
Type: spec_rstack_overflow mitigation: SMT disabled
Type: spec_store_bypass mitigation: Speculative Store Bypass disabled via
prctl
Type: spectre_v1 mitigation: usercopy/swapgs barriers and __user pointer
sanitization
Type: spectre_v2 mitigation: Retpolines, IBPB: conditional, STIBP:
disabled, RSB filling, PBRSB-eIBRS: Not affected
Type: srbds status: Not affected
Type: tsx_async_abort status: Not affected
Graphics:
Device-1: NVIDIA GV100 [TITAN V] driver: nvidia v: 545.29.06
alternate: nouveau,nvidia_drm non-free: 545.xx+ status: current (as of
2024-02; EOL~2026-12-xx) arch: Volta code: GV1xx process: TSMC 12nm
built: 2017-2020 pcie: gen: 3 speed: 8 GT/s lanes: 16 bus-ID: 01:00.0
chip-ID: 10de:1d81 class-ID: 0300
Device-2: NVIDIA GV100 [TITAN V] driver: nvidia v: 545.29.06
alternate: nouveau,nvidia_drm non-free: 545.xx+ status: current (as of
2024-02; EOL~2026-12-xx) arch: Volta code: GV1xx process: TSMC 12nm
built: 2017-2020 pcie: gen: 3 speed: 8 GT/s lanes: 16 bus-ID: 21:00.0
chip-ID: 10de:1d81 class-ID: 0300
Device-3: ASPEED Graphics Family vendor: Super Micro driver: ast v: kernel
ports: active: VGA-1 empty: Virtual-1 bus-ID: 42:00.0 chip-ID: 1a03:2000
class-ID: 0300
Display: x11 server: X.org v: 1.21.1.11 compositor: xfwm4 v: 4.18.0 driver:
X: loaded: modesetting,nvidia unloaded: nouveau alternate: fbdev,nv,vesa
gpu: ast display-ID: :0.0 note: <missing: xdpyinfo/xrandr>
Monitor-1: VGA-1 model: Dell U2412M serial: <filter> built: 2020
res: 1920x1200 dpi: 94 gamma: 1.2 size: 518x324mm (20.39x12.76")
diag: 611mm (24.1") ratio: 16:10 modes: max: 1920x1200 min: 640x480
API: EGL v: 1.5 hw: drv: nvidia platforms: device: 0 drv: nvidia device: 1
drv: nvidia device: 4 drv: swrast gbm: drv: kms_swrast surfaceless:
drv: nvidia x11: drv: zink inactive: wayland,device-2,device-3
API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: mesa v: 23.3.5-manjaro1.1
glx-v: 1.4 direct-render: yes renderer: llvmpipe (LLVM 16.0.6 256 bits)
device-ID: ffffffff:ffffffff memory: 245.83 GiB unified: yes
Audio:
Device-1: NVIDIA driver: snd_hda_intel v: kernel pcie: gen: 3 speed: 8 GT/s
lanes: 16 bus-ID: 01:00.1 chip-ID: 10de:10f2 class-ID: 0403
Device-2: NVIDIA driver: snd_hda_intel v: kernel pcie: gen: 3 speed: 8 GT/s
lanes: 16 bus-ID: 21:00.1 chip-ID: 10de:10f2 class-ID: 0403
API: ALSA v: k6.6.16-2-MANJARO status: kernel-api with: aoss
type: oss-emulator tools: alsactl,alsamixer,amixer
Server-1: JACK v: 1.9.22 status: off tools: N/A
Server-2: PipeWire v: 1.0.3 status: active with: 1: pipewire-pulse
status: active 2: wireplumber status: active 3: pipewire-alsa type: plugin
tools: pactl,pw-cat,pw-cli,wpctl
Network:
Device-1: Intel Ethernet X550 vendor: Super Micro driver: ixgbe v: kernel
pcie: gen: 2 speed: 5 GT/s lanes: 8 port: N/A bus-ID: 61:00.0
chip-ID: 8086:1563 class-ID: 0200
IF: eno1 state: down mac: <filter>
Device-2: Intel Ethernet X550 vendor: Super Micro driver: ixgbe v: kernel
pcie: gen: 2 speed: 5 GT/s lanes: 8 port: N/A bus-ID: 61:00.1
chip-ID: 8086:1563 class-ID: 0200
IF: eno2 state: up speed: 1000 Mbps duplex: full mac: <filter>
Info: services: NetworkManager
Drives:
Local Storage: total: 1.82 TiB used: 25.32 GiB (1.4%)
SMART Message: Required tool smartctl not installed. Check --recommends
ID-1: /dev/sda maj-min: 8:0 vendor: Samsung model: SSD 870 EVO 2TB
size: 1.82 TiB block-size: physical: 512 B logical: 512 B speed: 6.0 Gb/s
tech: SSD serial: <filter> fw-rev: 3B6Q scheme: MBR
Partition:
ID-1: / raw-size: 1.82 TiB size: 1.79 TiB (98.37%) used: 25.32 GiB (1.4%)
fs: ext4 dev: /dev/dm-0 maj-min: 254:0
mapped: luks-5ad75e2b-6b15-442b-a01e-12101132d822
Swap:
Alert: No swap data was found.
Sensors:
System Temperatures: cpu: 51.8 C mobo: N/A
Fan Speeds (rpm): N/A
Info:
Memory: total: 256 GiB note: est. available: 251.73 GiB used: 7.36 GiB (2.9%)
Processes: 2035 Power: uptime: 1d 8h 1m states: freeze,mem,disk
suspend: s2idle wakeups: 0 hibernate: shutdown
avail: reboot,suspend,test_resume image: 100.68 GiB
services: upowerd,xfce4-power-manager Init: systemd v: 255
default: graphical tool: systemctl
Packages: pm: pacman pkgs: 1100 libs: 324 tools: pamac pm: flatpak pkgs: 0
Compilers: clang: 16.0.6 gcc: 13.2.1 Shell: Bash v: 5.2.26
running-in: xfce4-terminal inxi: 3.3.33
The above is ‘inxi -Fazy’ run while some 9-threaded number tests were running in a Linux program. For me the CPU core speed is likely to be the culprit, so I will try to figure it out and test the suggestions above to set a higher default clock speed. Thanks for the feedback.
dgdg
26 February 2024 12:17
19
If I’m reading that correctly, your system is clocking to the correct speed according to AMD’s specs. As I said, Epyc CPUs aren’t meant to be that fast; they’re meant to be power efficient. You can overclock, sure, but that’s an overclock.
That being said, have you looked at alternative versions of Python? If you’re only running pure Python code, there’s a good chance that the pypy version of Python could speed it up, as that’s a fair chunk faster. On average, pypy is about 5x faster, but that average hides some really monstrous speedups for some code.
Editing to add: It looks like the Manjaro packaged versions of pypy are currently a little broken, so you may have to download pypy from https://www.pypy.org/. They’re not completely broken - just some weirdness (I think someone might have compiled in some default code? Not entirely sure)
ryzen
26 February 2024 19:32
20
calculating pi:
100%|██████████████████████████████████████████████| 33554431/33554431 [00:09<00:00, 3458013.93it/s]
calculating fib recursive:
97%|█████████████████████████████████████████████████████████████▎ | 37/38 [00:18<00:00, 1.98it/s]
calculating fib iterative:
100%|█████████████████████████████████████████████████| 1048574/1048574 [00:10<00:00, 102368.05it/s]
benchmark time: 0:00:39.346528
OK so not great, but probably acceptable for the CPU's rated speed. I think I understand what’s going on now - thanks all for the responses. This thing still thinks it’s a server. The CCX cores are in groups of 3 with a lot of L3 cache, and I have noticed quite a big improvement going back to single-threaded runs. Time to rethink how I manage my workload. Will need to understand the BIOS settings a lot better. Cheers.
{edit} An example: on this machine a large-number (non-Python) routine runs in 10 mins single-threaded, versus 30 mins multi-threaded, versus 50 mins on a Windows box with a Ryzen 9. So my thinking is too old school; I need to understand thread handling better. More is not always better.
In summary, this is not a Manjaro/Python issue at all. I will investigate the pypy option to see if it can speed up execution times, but glad I asked the question. Thanks for the help!!
dgdg
26 February 2024 22:57
22
Well, it is a server. And in fact, it’s a server with a rather complex architecture. Not only do you have multiple CCXs, but you’ve got two CPUs as well.
Definitely look into minimising your inter-thread communication, and if threads have to communicate a lot, locate them on at least the same CPU, if not the same CCX. Also I can’t help but notice you used the term threads for Python code - Python has a global interpreter lock, so only one thread can execute Python code at once (so some C extensions can run in parallel, but otherwise you get no parallelisation). You may want to look into multiprocessing to fix that.
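A rough sketch of what that could look like on Linux, using separate processes pinned to small groups of cores with os.sched_setaffinity. The grouping assumes consecutive core IDs share a CCX (groups of 3, as described earlier in the thread); that mapping is an assumption and should be checked against the real topology, for example with lscpu -e. The function names are hypothetical.

```python
import os
from multiprocessing import Process


def core_groups(cores, group_size=3):
    """Split core IDs into consecutive groups of group_size."""
    cores = sorted(cores)
    return [cores[i:i + group_size] for i in range(0, len(cores), group_size)]


def pinned_worker(cores, exponent):
    """Restrict this process to the given cores, then do CPU-bound work."""
    os.sched_setaffinity(0, cores)  # Linux-only; 0 means the calling process
    bits = (3 ** exponent).bit_length()  # stand-in for real big-number work
    allowed = sorted(os.sched_getaffinity(0))
    print(f"pid {os.getpid()} on cores {allowed}: {bits} bits")


if __name__ == "__main__":
    # Assumption: consecutive core IDs share a CCX; verify before relying on it.
    groups = core_groups(os.sched_getaffinity(0))
    procs = [Process(target=pinned_worker, args=(group, 200_000))
             for group in groups[:4]]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```

Because each worker is its own process, the global interpreter lock does not serialise them, and keeping a worker inside one core group keeps its data in that group's L3 cache.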