Outgoing SSH hangup

I have a bit of a weird issue that started just recently. When I ssh into another machine from my Manjaro system (called beast), the ssh session hangs and I can only end it using the ssh escape sequence ~. (tilde followed by a dot). It does not matter which system I ssh into: after around 30 seconds (but not exactly 30 seconds) it just stops. As beast is my primary desktop, I am like a carpenter with a broken hammer :frowning:

What I have tried so far:

  • Searched the internet and this forum
  • Installed kitty next to konsole: same issue.
  • Running journalctl -f in a separate window, to see if a message coincides with the hangup: no hints there
  • ssh into a set of different systems: all show the same problem, hangup in about 30 seconds
  • Used another system to ssh to the same set of systems: no issues, ssh still working after 5 minutes
  • ssh from beast to beast: no issues, ssh still working after 5 minutes
  • Booted into kernels 6.6.19 and 6.1.80: same issue, hangup in about 30 seconds

Although I don’t see myself as a Linux (or Manjaro) noob, I am at a loss here. I hope someone here can help me solve this (or give me pointers for where to look).

System:
  Host: beast Kernel: 6.6.19-1-MANJARO arch: x86_64 bits: 64
  Desktop: KDE Plasma v: 5.27.11 Distro: Manjaro Linux
Machine:
  Type: Desktop System: ASUS product: N/A v: N/A serial: <superuser required>
  Mobo: ASUSTeK model: ROG STRIX Z490-E GAMING v: Rev 1.xx
    serial: <superuser required> UEFI: American Megatrends v: 0607
    date: 05/29/2020
CPU:
  Info: 10-core Intel Core i9-10900KF [MT MCP] speed (MHz): avg: 2083
    min/max: 800/5300
Graphics:
  Device-1: NVIDIA GA104 [GeForce RTX 3070 Ti] driver: nvidia v: 550.54.14
  Display: x11 server: X.Org v: 21.1.11 with: Xwayland v: 23.2.4 driver: X:
    loaded: nvidia gpu: nvidia resolution: 1: 2560x1440~144Hz 2: 2560x1440~144Hz
  API: OpenGL v: 4.6.0 compat-v: 4.5 vendor: nvidia mesa v: 550.54.14
    renderer: NVIDIA GeForce RTX 3070 Ti/PCIe/SSE2
Network:
  Device-1: Intel Comet Lake PCH CNVi WiFi driver: iwlwifi
  Device-2: Intel Ethernet I225-V driver: igc
Drives:
  Local Storage: total: 1.82 TiB used: 1.03 TiB (56.4%)
Info:
  Memory: total: 32 GiB available: 31.24 GiB used: 8.23 GiB (26.3%)
  Processes: 468 Uptime: 14m Shell: Zsh inxi: 3.3.33

Post output of

ssh -v ...

Also, can you check if destination machine is receiving any packets?

At first glance this sounds like some routing problem. Do you use VPN?


Run drill <remote-host> to verify the expected IP.

Run traceroute <remote-ip> to the remote system(s) failing.

Can you immediately reconnect? This seems to be a timeout configured (maybe on the server) that aggressively disconnects unused connections.

Are you somehow sure that the server behaves correctly otherwise?

In my config, I have:

Host *
    ServerAliveInterval 5
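
For context, a rough sketch of what that setting does, assuming the default ServerAliveCountMax of 3: the client sends a keepalive after 5 idle seconds and gives up after 3 unanswered ones, so a dead link ends the session after roughly 15 seconds instead of hanging. The function names are made up for illustration; the -o options are the command-line form of the same ssh_config keywords.

```shell
#!/bin/sh
# Approximate seconds before ssh drops an unresponsive connection:
# ServerAliveInterval * ServerAliveCountMax.
keepalive_timeout() {
    echo $(( $1 * $2 ))
}

# Hypothetical wrapper: same effect as the config above, but per invocation.
ssh_keepalive() {
    ssh -o ServerAliveInterval=5 -o ServerAliveCountMax=3 "$@"
}

keepalive_timeout 5 3   # prints 15
```

Usage would be something like ssh_keepalive root@pve.familie-dokter.lan. This does not repair a broken link; it only makes the client exit on its own.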

I run a small script that shows a “clock” of increasing seconds. So I use ssh to run this script on another machine:

ssh -v  root@pve.familie-dokter.lan '
start=$(date +%s)                           
while true; do
    time="$(($(date +%s) - $start))"
    printf "%s\r" "$(date -u -d "@$time" +%H:%M:%S)"
done
' 2>ssh.log

As you can see here the script stops at 29 seconds… and at that point I have to stop ssh, as it will not respond any more. Log is added below
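
A lighter variant of the same idea, as a sketch only: it ticks once per second instead of spinning, and formats the elapsed time with shell arithmetic so it does not depend on GNU date -d. The names fmt_elapsed and clock are invented for this example.

```shell
#!/bin/sh
# Format a number of seconds as HH:MM:SS using only shell arithmetic.
fmt_elapsed() {
    printf '%02d:%02d:%02d' $(( $1 / 3600 )) $(( $1 % 3600 / 60 )) $(( $1 % 60 ))
}

# Print an updating clock, one tick per second, until interrupted.
clock() {
    start=$(date +%s)
    while true; do
        printf '%s\r' "$(fmt_elapsed $(( $(date +%s) - start )))"
        sleep 1
    done
}

fmt_elapsed 29; echo   # prints 00:00:29
```

clock runs until interrupted, so it is only defined here and not called.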

Another example: not using the script just an interactive shell, it stops responding after about 30 seconds, so I stop it using ssh escape codes.

I don’t understand the routing remark: if routing is a problem, can it still mess up an already established ssh session? I am not using a VPN; this is all local on one subnet.

Yes I can reconnect for another 30 seconds immediately after I close the “hung connection”.
When I connect from my laptop to this same server: no issues whatsoever.
When I connect from my steamdeck to this same server: no issues whatsoever.

The ssh.log from the first example:

OpenSSH_9.6p1, OpenSSL 3.2.1 30 Jan 2024
debug1: Reading configuration data /etc/ssh/ssh_config
debug1: /etc/ssh/ssh_config line 2: include /etc/ssh/ssh_config.d/*.conf matched no files
debug1: Connecting to pve.familie-dokter.lan [192.168.3.10] port 22.
debug1: Connection established.
debug1: identity file /home/dries/.ssh/id_rsa type 0
debug1: identity file /home/dries/.ssh/id_rsa-cert type -1
debug1: identity file /home/dries/.ssh/id_ecdsa type -1
debug1: identity file /home/dries/.ssh/id_ecdsa-cert type -1
debug1: identity file /home/dries/.ssh/id_ecdsa_sk type -1
debug1: identity file /home/dries/.ssh/id_ecdsa_sk-cert type -1
debug1: identity file /home/dries/.ssh/id_ed25519 type 3
debug1: identity file /home/dries/.ssh/id_ed25519-cert type -1
debug1: identity file /home/dries/.ssh/id_ed25519_sk type -1
debug1: identity file /home/dries/.ssh/id_ed25519_sk-cert type -1
debug1: identity file /home/dries/.ssh/id_xmss type -1
debug1: identity file /home/dries/.ssh/id_xmss-cert type -1
debug1: identity file /home/dries/.ssh/id_dsa type -1
debug1: identity file /home/dries/.ssh/id_dsa-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_9.6
debug1: Remote protocol version 2.0, remote software version OpenSSH_9.2p1 Debian-2+deb12u2
debug1: compat_banner: match: OpenSSH_9.2p1 Debian-2+deb12u2 pat OpenSSH* compat 0x04000000
debug1: Authenticating to pve.familie-dokter.lan:22 as 'root'
debug1: load_hostkeys: fopen /home/dries/.ssh/known_hosts2: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts2: No such file or directory
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: sntrup761x25519-sha512@openssh.com
debug1: kex: host key algorithm: ssh-ed25519
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: SSH2_MSG_KEX_ECDH_REPLY received
debug1: Server host key: ssh-ed25519 SHA256:/AKVmsRxSoX9rlEgdKe9uLALVnm8RbEfpO6XB4C8fis
debug1: load_hostkeys: fopen /home/dries/.ssh/known_hosts2: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts: No such file or directory
debug1: load_hostkeys: fopen /etc/ssh/ssh_known_hosts2: No such file or directory
debug1: Host 'pve.familie-dokter.lan' is known and matches the ED25519 host key.
debug1: Found key in /home/dries/.ssh/known_hosts:85
debug1: ssh_packet_send2_wrapped: resetting send seqnr 3
debug1: rekey out after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: ssh_packet_read_poll2: resetting read seqnr 3
debug1: SSH2_MSG_NEWKEYS received
debug1: rekey in after 134217728 blocks
debug1: SSH2_MSG_EXT_INFO received
debug1: kex_ext_info_client_parse: server-sig-algs=<ssh-ed25519,sk-ssh-ed25519@openssh.com,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521,sk-ecdsa-sha2-nistp256@openssh.com,webauthn-sk-ecdsa-sha2-nistp256@openssh.com,ssh-dss,ssh-rsa,rsa-sha2-256,rsa-sha2-512>
debug1: kex_ext_info_check_ver: publickey-hostbound@openssh.com=<0>
debug1: SSH2_MSG_SERVICE_ACCEPT received
debug1: Authentications that can continue: publickey,password
debug1: Next authentication method: publickey
debug1: get_agent_identities: bound agent to hostkey
debug1: get_agent_identities: agent returned 2 keys
debug1: Will attempt key: /home/dries/.ssh/id_rsa RSA SHA256:P5vMmO7lz0hHXGwu9xPw9hlWPW6kAIAmCSMRVpiE8pg agent
debug1: Will attempt key: /home/dries/.ssh/id_ed25519 ED25519 SHA256:hEMLwKWjnLUANVaTiTM3miJCBahNBuaSUNp5BnfPkls agent
debug1: Will attempt key: /home/dries/.ssh/id_ecdsa 
debug1: Will attempt key: /home/dries/.ssh/id_ecdsa_sk 
debug1: Will attempt key: /home/dries/.ssh/id_ed25519_sk 
debug1: Will attempt key: /home/dries/.ssh/id_xmss 
debug1: Will attempt key: /home/dries/.ssh/id_dsa 
debug1: Offering public key: /home/dries/.ssh/id_rsa RSA SHA256:P5vMmO7lz0hHXGwu9xPw9hlWPW6kAIAmCSMRVpiE8pg agent
debug1: Server accepts key: /home/dries/.ssh/id_rsa RSA SHA256:P5vMmO7lz0hHXGwu9xPw9hlWPW6kAIAmCSMRVpiE8pg agent
Authenticated to pve.familie-dokter.lan ([192.168.3.10]:22) using "publickey".
debug1: channel 0: new session [client-session] (inactive timeout: 0)
debug1: Requesting no-more-sessions@openssh.com
debug1: Entering interactive session.
debug1: pledge: filesystem
debug1: client_input_global_request: rtype hostkeys-00@openssh.com want_reply 0
debug1: client_input_hostkeys: searching /home/dries/.ssh/known_hosts for pve.familie-dokter.lan / (none)
debug1: client_input_hostkeys: searching /home/dries/.ssh/known_hosts2 for pve.familie-dokter.lan / (none)
debug1: client_input_hostkeys: hostkeys file /home/dries/.ssh/known_hosts2 does not exist
debug1: client_input_hostkeys: no new or deprecated keys from server
debug1: Remote: /root/.ssh/authorized_keys:6: key options: agent-forwarding port-forwarding pty user-rc x11-forwarding
debug1: Remote: /root/.ssh/authorized_keys:6: key options: agent-forwarding port-forwarding pty user-rc x11-forwarding
debug1: Sending command: 
start=$(date +%s)                           
while true; do
    time="$(($(date +%s) - $start))"
    printf "%s\\r" "$(date -u -d "@$time" +%H:%M:%S)"
done

debug1: pledge: fork
debug1: channel 0: free: client-session, nchannels 1
Killed by signal 2.

And if you do not run that script?

I created the script to be able to debug the problem, so if I do not run the script the same thing happens (as you can see in the linked example in my previous reply)

In your example without script you can clearly connect to pve.

In my example with script I can also connect to pve, actually the script (scriptje) is running on pve

I had to ask :grin:

I don’t have an issue with outgoing ssh connections - but for the sake of testing - I just did what you do - testing with the timer and the timer is not the issue - if that is of any comfort and I am sure you knew that.

My ssh connection to an internal device is running a steady clock.

 $ ssh pw1.net.nix.dk 'start=$(date +%s)                           
while true; do
    time="$(($(date +%s) - $start))"
    printf "%s\\r" "$(date -u -d "@$time" +%H:%M:%S)"
done'
^C:07:05

As your connection is good for a short period

  • you need to look at your network configuration
  • possibly your remote device
  • think back
  • what did you change
  • ufw or firewalld
  • what does the log on the remote system tell you
  • what makes it stall

Regarding the routing issue: this can happen if the local network connection is reset, for example by a flaky link. That would point to an unstable network driver, or to an MTU that is too high. Perhaps you have jumbo frames enabled, which not all NICs support (older NICs most likely do not). The MTU is usually set to Automatic.

This would also - partly - explain why you don’t see this behaviour on other systems.
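
A quick way to test the MTU theory, as a sketch assuming iputils ping on Linux and IPv4: send pings that may not be fragmented and that exactly fill one frame. The payload is the MTU minus 28 bytes (20 for the IPv4 header, 8 for the ICMP header). The function names are hypothetical.

```shell
#!/bin/sh
# ICMP payload size that exactly fills one frame of the given MTU:
# MTU minus 20 bytes IPv4 header and 8 bytes ICMP header.
icmp_payload_for_mtu() {
    echo $(( $1 - 28 ))
}

# Send 3 pings with fragmentation prohibited, sized for the given MTU.
# $1 = host, $2 = MTU to test
probe_mtu() {
    ping -c 3 -M do -s "$(icmp_payload_for_mtu "$2")" "$1"
}

icmp_payload_for_mtu 1500   # prints 1472
```

Something like probe_mtu 192.168.3.10 1500 would then show whether full-size frames get through; if they fail while a smaller size passes, the MTU is a likely suspect. ip link show lists the MTU currently configured on each interface.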

Is the network connection Wifi or Ethernet?

If it is Ethernet - have you tried another cable?


Have you anything in ~/.ssh/config ?

If you have root access to one of the target machines, one trick I’ve found useful for debugging SSH problems over the years is to shut down the SSHD daemon, then from a terminal run
/usr/bin/sshd -ddd
which will give you very verbose output when you try to connect from another machine. It’s got me out of trouble many times.
Don’t forget to restart the daemon afterwards.

I was thinking along the same lines, but your summary helped me rethink.

I realised my system also has wifi built in. The connection does stay up when I disconnect wired and switch to wifi.

Yes,

All of this led me to the conclusion that it is more than "just ssh", and indeed using iperf3 I see the same thing:

$ iperf3 -c pve.familie-dokter.lan
Connecting to host pve.familie-dokter.lan, port 5201
[  5] local 192.168.3.50 port 42358 connected to 192.168.3.10 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   512 KBytes  4.19 Mbits/sec    2   1.41 KBytes       
[  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes       
[  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes       
[  5]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes       
[  5]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes       
[  5]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes       
[  5]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes       
[  5]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes       
[  5]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes       
[  5]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   512 KBytes   419 Kbits/sec    5             sender
[  5]   0.00-10.00  sec  65.0 KBytes  53.3 Kbits/sec                  receiver

iperf Done.
$ iperf3 -c silverlaptop.familie-dokter.lan
Connecting to host silverlaptop.familie-dokter.lan, port 5201
[  5] local 192.168.3.50 port 46832 connected to 192.168.3.64 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   384 KBytes  3.14 Mbits/sec    2   1.41 KBytes       
[  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes       
[  5]   2.00-3.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes       
[  5]   3.00-4.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes       
[  5]   4.00-5.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes       
[  5]   5.00-6.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes       
[  5]   6.00-7.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes       
[  5]   7.00-8.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes       
[  5]   8.00-9.00   sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes       
[  5]   9.00-10.00  sec  0.00 Bytes  0.00 bits/sec    0   1.41 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec   384 KBytes   315 Kbits/sec    5             sender
[  5]   0.00-10.00  sec  65.0 KBytes  53.3 Kbits/sec                  receiver

iperf Done.

I have some more digging to do!
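
To compare runs while changing one thing at a time, a small helper can count the stalled intervals. This is only a sketch: count_stalled is a made-up name, and it assumes the default iperf3 client text output shown above, with a stream ID of at most two digits so the columns split the same way.

```shell
#!/bin/sh
# Count per-second intervals in which no data was transferred.
# Reads iperf3 client text output on stdin; the sender/receiver
# summary lines are skipped.
count_stalled() {
    awk '!/sender|receiver/ && $4 == "sec" && $5 == "0.00" && $6 == "Bytes" { n++ }
         END { print n + 0 }'
}

printf '%s\n' \
  '[  5]   0.00-1.00   sec   512 KBytes  4.19 Mbits/sec    2   1.41 KBytes' \
  '[  5]   1.00-2.00   sec  0.00 Bytes  0.00 bits/sec    1   1.41 KBytes' |
  count_stalled   # prints 1
```

Piping a real run through it, for example iperf3 -c pve.familie-dokter.lan | count_stalled, gives one number per test that is easy to compare between wired and wifi.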

I have booted into a live Manjaro disk on my desktop system. When I do that, everything appears to work as it should, so I think the “network environment” (cable, switch, etc.) can be ruled out. This is a problem in my Manjaro desktop configuration. So now I need to find out when all this started and what I changed to cause it…

Test with a new user.

I am still not sure what caused it, but I removed my network definition and added a new one for the wired interface, and that appears to have done the trick. So: problem solved.
Thanks @zbe , @linux-aarhus , @mithrial , @andreas85 , @beermad for spending some time on your Sunday to help me out.
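
For anyone landing here later, a rough command-line sketch of that fix. It assumes the connection is managed by NetworkManager (the thread does not say so explicitly), and the profile and interface names are placeholders, not taken from the thread. Setting NMCLI to echo gives a dry run that only prints the arguments.

```shell
#!/bin/sh
# Recreate a wired NetworkManager profile from scratch.
# $1 = old profile name, $2 = interface, $3 = new profile name (all placeholders).
recreate_wired() {
    nm=${NMCLI:-nmcli}
    "$nm" connection delete "$1" &&
    "$nm" connection add type ethernet ifname "$2" con-name "$3" &&
    "$nm" connection up "$3"
}

# Dry run: prints the three nmcli argument lists without changing anything.
# Unset NMCLI to run nmcli for real.
NMCLI=echo
recreate_wired "Wired connection 1" enp5s0 wired-new
```

nmcli connection show lists the existing profile names, which is where the first argument would come from.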
