GPU numerical error is abnormally high

I ran a simple Python script on the GPU under Manjaro and found that the floating-point error is significant even though the operations are simple. When I boot into Windows, or run on other machines with weaker GPUs, the error does not appear. Any idea how to deal with this? Thank you! The script to reproduce:

import random

import numpy as np
import torch
from torch.optim import Adam

seed = 1
device = "cuda"

torch.manual_seed(seed)
torch.cuda.manual_seed(seed)
random.seed(seed)
np.random.seed(seed)


input = torch.tensor([1.0, 2.0], device=device)
target = torch.tensor([4.0, 5.0], device=device)

model = torch.nn.Linear(2, 2, bias=False)
model.to(device)
optimizer = Adam(model.parameters(), lr=0.1)


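for i in range(2):
    print(model.weight)
    optimizer.zero_grad()
    # Note: this step() runs before backward(). It is kept as-is because
    # the outputs below were produced with it.
    optimizer.step()
    output = model(input)  # forward pass
    print(f"{i} output: {output}")
    loss = (target - output).sum()
    print(f"{i} loss: {loss}")
    loss.backward()
    optimizer.step()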

The output on the GPU:

Parameter containing:
tensor([[ 0.3643, -0.3121],
        [-0.1371,  0.3319]], device='cuda:0', requires_grad=True)
0 output: tensor([-0.2598,  0.5265], device='cuda:0', grad_fn=<SqueezeBackward3>)
0 loss: 8.7332763671875
Parameter containing:
tensor([[ 0.4643, -0.2121],
        [-0.0371,  0.4319]], device='cuda:0', requires_grad=True)
1 output: tensor([0.2410, 1.0275], device='cuda:0', grad_fn=<SqueezeBackward3>)
1 loss: 7.7315521240234375

On the CPU the result is correct:

Parameter containing:
tensor([[ 0.3643, -0.3121],
        [-0.1371,  0.3319]], requires_grad=True)
0 output: tensor([-0.2599,  0.5267], grad_fn=<SqueezeBackward3>)
0 loss: 8.733150482177734
Parameter containing:
tensor([[ 0.4643, -0.2121],
        [-0.0371,  0.4319]], requires_grad=True)
1 output: tensor([0.2412, 1.0277], grad_fn=<SqueezeBackward3>)
1 loss: 7.731115341186523

The model weights are the same across machines and on the CPU, so the problem seems to be just the multiply-and-sum in [1.0, 2.0] @ weights. Any suggestions?
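
One possible explanation, not confirmed by anything above, is that the Ampere GPU runs float32 matrix multiplies in TF32 mode, which keeps only 10 mantissa bits of each input. The sketch below is a rough pure-Python check of that guess: it rounds the printed weights (4 decimals only, so the numbers are approximate) to a TF32-like precision and compares the dot products with plain float32. The helper names to_float32, to_tf32, and dot are made up for this illustration, and the rounding is a simplification of what the hardware does.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_tf32(x: float) -> float:
    """Approximate TF32 input rounding (assumption, simplified).

    Keeps 10 of float32's 23 mantissa bits by adding half of the
    dropped range and then masking off the low 13 bits.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x1000) & 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def dot(a, b, rounder):
    """Dot product with every input passed through `rounder` first."""
    return sum(rounder(x) * rounder(y) for x, y in zip(a, b))


inputs = [1.0, 2.0]
# Weights as printed in the first iteration (only 4 decimals are known).
weight_rows = [[0.3643, -0.3121], [-0.1371, 0.3319]]

for row in weight_rows:
    full = dot(row, inputs, to_float32)
    reduced = dot(row, inputs, to_tf32)
    print(f"float32: {full:.6f}  TF32-like: {reduced:.6f}  "
          f"diff: {abs(full - reduced):.1e}")
```

If the differences it prints are of the same order as the CPU/GPU gap above (around 1e-4), TF32 is a plausible cause. In PyTorch this mode is controlled by torch.backends.cuda.matmul.allow_tf32; setting it to False before running the script would test the guess directly.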

System info:

System:
  Kernel: 5.17.15-1-MANJARO arch: x86_64 bits: 64 compiler: gcc v: 12.1.0
    parameters: BOOT_IMAGE=/boot/vmlinuz-5.17-x86_64
    root=UUID=17a57bc5-4569-4d1b-8454-2c7d491f058c rw quiet apparmor=1
    security=apparmor udev.log_priority=3 snd_hda_intel.power_save=0
    button.lid_init_state=open
  Desktop: Xfce v: 4.16.0 tk: Gtk v: 3.24.29
    info: vala-panel, xfce4-panel, plank wm: xfwm v: 4.16.1 vt: 7 dm: LightDM
    v: 1.30.0 Distro: Manjaro Linux base: Arch Linux
Machine:
  Type: Laptop System: Razer product: Blade 15 (2022) - RZ09-0421 v: 8.04
    serial: <superuser required> Chassis: type: 10 serial: <superuser required>
  Mobo: Razer model: CH580 v: 4 serial: <superuser required> UEFI: Razer
    v: 1.09 date: 02/22/2022
Battery:
  ID-1: BAT0 charge: 80.3 Wh (100.0%) condition: 80.3/80.2 Wh (100.1%)
    volts: 17.5 min: 15.4 model: Razer Blade type: Unknown serial: <filter>
    status: full
  ID-2: hidpp_battery_0 charge: 76% condition: N/A volts: 4.0 min: N/A
    model: Logitech G903 LIGHTSPEED Wireless Gaming Mouse w/ HERO type: N/A
    serial: <filter> status: discharging
CPU:
  Info: model: 12th Gen Intel Core i7-12800H bits: 64 type: MST AMCP
    arch: Alder Lake gen: core 12 built: 2021 process: Intel 7 (10nm ESF)
    family: 6 model-id: 0x9A (154) stepping: 3 microcode: 0x41C
  Topology: cpus: 1x cores: 14 mt: 6 tpc: 2 st: 8 threads: 20 smt: enabled
    cache: L1: 1.2 MiB desc: d-8x32 KiB, 6x48 KiB; i-6x32 KiB, 8x64 KiB
    L2: 11.5 MiB desc: 6x1.2 MiB, 2x2 MiB L3: 24 MiB desc: 1x24 MiB
  Speed (MHz): avg: 1266 high: 2170 min/max: 400/2400:1800 scaling:
    driver: intel_pstate governor: powersave cores: 1: 1816 2: 636 3: 1062
    4: 694 5: 915 6: 999 7: 2170 8: 1999 9: 1253 10: 808 11: 1861 12: 1974
    13: 1793 14: 1640 15: 1088 16: 979 17: 1069 18: 509 19: 1106 20: 959
    bogomips: 112160
  Flags: avx avx2 ht lm nx pae sse sse2 sse3 sse4_1 sse4_2 ssse3 vmx
  Vulnerabilities:
  Type: itlb_multihit status: Not affected
  Type: l1tf status: Not affected
  Type: mds status: Not affected
  Type: meltdown status: Not affected
  Type: mmio_stale_data status: Not affected
  Type: spec_store_bypass
    mitigation: Speculative Store Bypass disabled via prctl
  Type: spectre_v1
    mitigation: usercopy/swapgs barriers and __user pointer sanitization
  Type: spectre_v2 mitigation: Enhanced IBRS, IBPB: conditional, RSB filling
  Type: srbds status: Not affected
  Type: tsx_async_abort status: Not affected
Graphics:
  Device-1: Intel Alder Lake-P Integrated Graphics vendor: Razer USA
    driver: i915 v: kernel arch: Gen12.2 process: Intel 10nm built: 2021-22+
    ports: active: eDP-1 empty: none bus-ID: 00:02.0 chip-ID: 8086:46a6
    class-ID: 0300
  Device-2: NVIDIA GA103M [GeForce RTX 3080 Ti Laptop GPU] vendor: Razer USA
    driver: nvidia v: 515.48.07 alternate: nouveau,nvidia_drm non-free: 515.xx+
    status: current (as of 2022-06) arch: Ampere process: TSMC n7 (7nm)
    built: 2020-22 pcie: gen: 2 speed: 5 GT/s lanes: 8 link-max: gen: 4
    speed: 16 GT/s lanes: 16 bus-ID: 01:00.0 chip-ID: 10de:2460 class-ID: 0300
  Device-3: IMC Networks Integrated RGB Camera type: USB driver: uvcvideo
    bus-ID: 1-2:2 chip-ID: 13d3:5279 class-ID: 0e02 serial: <filter>
  Display: x11 server: X.Org v: 21.1.3 with: Xwayland v: 22.1.2
    compositor: xfwm v: 4.16.1 driver: X: loaded: modesetting,nvidia
    alternate: fbdev,nouveau,nv,vesa gpu: i915 display-ID: :0.0 screens: 1
  Screen-1: 0 s-res: 4608x2011 s-dpi: 96 s-size: 1219x532mm (47.99x20.94")
    s-diag: 1330mm (52.36")
  Monitor-1: HDMI-1-0 pos: primary,top-right res: 2560x1440 hz: 60 dpi: 108
    size: 600x340mm (23.62x13.39") diag: 690mm (27.15") modes: N/A
  Monitor-2: eDP-1 pos: bottom-l res: 2048x1152 hz: 60 dpi: 151
    size: 344x194mm (13.54x7.64") diag: 395mm (15.55") modes: N/A
  Message: Unable to show GL data. Required tool glxinfo missing.
Audio:
  Device-1: Intel Alder Lake PCH-P High Definition Audio vendor: Razer USA
    driver: sof-audio-pci-intel-tgl
    alternate: snd_hda_intel,snd_sof_pci_intel_tgl bus-ID: 00:1f.3
    chip-ID: 8086:51c8 class-ID: 0401
  Device-2: NVIDIA vendor: Razer USA driver: N/A alternate: snd_hda_intel
    pcie: gen: 2 speed: 5 GT/s lanes: 8 link-max: gen: 4 speed: 16 GT/s
    lanes: 16 bus-ID: 01:00.1 chip-ID: 10de:2288 class-ID: 0403
  Sound Server-1: ALSA v: k5.17.15-1-MANJARO running: yes
  Sound Server-2: JACK v: 1.9.21 running: no
  Sound Server-3: PulseAudio v: 16.1 running: no
  Sound Server-4: PipeWire v: 0.3.52 running: yes
Network:
  Device-1: Intel Alder Lake-P PCH CNVi WiFi vendor: Rivet Networks
    driver: iwlwifi v: kernel bus-ID: 00:14.3 chip-ID: 8086:51f0 class-ID: 0280
  IF: wlo1 state: up mac: <filter>
  IF-ID-1: tun0 state: unknown speed: 10 Mbps duplex: full mac: N/A
Bluetooth:
  Device-1: Intel type: USB driver: btusb v: 0.8 bus-ID: 1-10:6
    chip-ID: 8087:0033 class-ID: e001
  Report: rfkill ID: hci0 rfk-id: 1 state: down bt-service: enabled,running
    rfk-block: hardware: no software: yes address: see --recommends
Drives:
  Local Storage: total: 953.87 GiB used: 247.39 GiB (25.9%)
  SMART Message: Required tool smartctl not installed. Check --recommends
  ID-1: /dev/nvme0n1 maj-min: 259:0 model: NVMe CA6-8D1024 size: 953.87 GiB
    block-size: physical: 512 B logical: 512 B speed: 63.2 Gb/s lanes: 4
    type: SSD serial: <filter> rev: ERA0902 temp: 36.9 C scheme: GPT
Partition:
  ID-1: / raw-size: 672.78 GiB size: 661.15 GiB (98.27%)
    used: 247.34 GiB (37.4%) fs: ext4 dev: /dev/nvme0n1p5 maj-min: 259:5
  ID-2: /boot/efi raw-size: 100 MiB size: 96 MiB (96.00%)
    used: 50.7 MiB (52.8%) fs: vfat dev: /dev/nvme0n1p1 maj-min: 259:1
Swap:
  Kernel: swappiness: 60 (default) cache-pressure: 100 (default)
  ID-1: swap-1 type: file size: 256 MiB used: 40.2 MiB (15.7%) priority: 50
    file: /var/lib/systemd-swap/swapfc/1
Sensors:
  System Temperatures: cpu: 45.0 C mobo: N/A
  Fan Speeds (RPM): N/A
Info:
  Processes: 486 Uptime: 2d 22h 58m wakeups: 87 Memory: 31.04 GiB
  used: 10.87 GiB (35.0%) Init: systemd v: 251 default: graphical
  tool: systemctl Compilers: gcc: 12.1.0 clang: 13.0.1 Packages: 1482
  pacman: 1459 lib: 335 flatpak: 10 snap: 13 Shell: Bash v: 5.1.16
  running-in: xfce4-terminal inxi: 3.3.19