A Windows virtual machine running under VMware 17 Pro was freezing several times a day, seemingly due to a brief network outage. The cause was found to be a Network Manager setting on the host computer, a laptop running Linux Mint 21.2. As explained below, the fix was to disable connectivity checking, which is on by default, in the NM configuration. The side effect is that captive portal detection is disabled too.
Virtual Machine Freezes Momentarily
A Windows 10 VM in regular use briefly “froze” several times a day. Every time, messages like these were left in the kernel log (shown here in dmesg -T format) on the Linux Mint host (called pluto):
[Thu Sep 21 14:14:40 2023] userif-3: sent link down event.
[Thu Sep 21 14:14:40 2023] userif-3: sent link up event.
[Thu Sep 21 14:14:40 2023] userif-3: sent link down event.
[Thu Sep 21 14:14:40 2023] userif-3: sent link up event.
and the following lines were simultaneously written to /var/log/syslog:
Sep 21 14:14:39 pluto NetworkManager[874]: [1695302079.6626] manager: NetworkManager state is now CONNECTED_SITE
Sep 21 14:14:39 pluto NetworkManager[874]: [1695302079.8698] manager: NetworkManager state is now CONNECTED_GLOBAL
Here “CONNECTED_SITE” corresponds to the disconnection (NetworkManager believes that only the local site, not the Internet, is reachable), while “CONNECTED_GLOBAL” matches the reconnection.
Network Manager Connectivity Check Failure
I increased NetworkManager verbosity by adding these lines to /etc/NetworkManager/NetworkManager.conf:
[logging]
level=DEBUG
and (as root user) restarted the service:
systemctl restart NetworkManager
Among the more detailed messages now written to syslog were these, indicating that
every 5 minutes NetworkManager tests Internet connectivity (not simply local network
connectivity):
Sep 21 15:40:06 pluto NetworkManager[12519]: [1695307206.6865] connectivity: (enp4s0,IPv4,55) start request to 'http://connectivity-check.ubuntu.com./' (try resolving 'connectivity-check.ubuntu.com.' using systemd-resolved)
Sep 21 15:40:06 pluto NetworkManager[12519]: [1695307206.7123] connectivity: (enp4s0,IPv4,55) check completed: FULL; status header found
The above lines indicate a successful test. But occasionally the test fails,
like this:
Sep 21 15:34:46 pluto NetworkManager[12519]: [1695306886.5517] connectivity: (enp4s0,IPv4,50) start request to 'http://connectivity-check.ubuntu.com./' (try resolving 'connectivity-check.ubuntu.com.' using systemd-resolved)
Sep 21 15:35:06 pluto NetworkManager[12519]: [1695306906.6664] connectivity: (enp4s0,IPv4,50) check completed: LIMITED; timeout
The “LIMITED; timeout” denotes a failure to get a response from
connectivity-check.ubuntu.com within about 20 seconds. According to forum
discussions, this server does not respond reliably.
The failure is reflected in the kernel log one second later:
[Thu Sep 21 15:35:07 2023] userif-3: sent link down event.
[Thu Sep 21 15:35:07 2023] userif-3: sent link up event.
For every such failure, VMware responds (correctly) by downing and then upping its virtual interfaces, causing the “freezes” above. The Linux host briefly downs its real network adapter too, but so briefly as to be unnoticeable (e.g. network radio plays uninterrupted).
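To get a feel for how often the check fails, the “check completed” lines can be tallied by outcome. The following is only a sketch: the function name count_connectivity_results is made up for illustration, and it assumes the log lines have the same shape as the debug-level samples shown above.

```shell
# Tally NetworkManager connectivity-check outcomes.
# Reads syslog-style lines on stdin and prints one "STATE count" line per outcome.
# Assumes lines of the form "... connectivity: (...) check completed: STATE; reason".
count_connectivity_results() {
    # Extract STATE from each matching line, then count the occurrences of each.
    sed -n 's/.*connectivity: .*check completed: \([A-Z]*\);.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage (debug logging must be enabled for these lines to exist):
#   count_connectivity_results < /var/log/syslog
```

A high proportion of LIMITED results relative to FULL would support the suspicion that the endpoint, rather than the local network, is the problem.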
Configuration Change
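Following the second answer to this Askubuntu forum question, I disabled the connectivity check altogether by creating this empty file on the Linux VMware host. An empty file here masks the file of the same name that the distribution ships under /usr/lib/NetworkManager/conf.d, which is where the check is configured: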
sudo touch /etc/NetworkManager/conf.d/20-connectivity-ubuntu.conf
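After restarting NetworkManager (as in the earlier step), one way to confirm that the change took effect is to look at the merged configuration and see whether a [connectivity] uri is still set. This is a sketch only: connectivity_uri is a hypothetical helper name, and it assumes the configuration is printed in the usual key=value form with section headers.

```shell
# Print the uri set in the [connectivity] section of a NetworkManager
# configuration read from stdin. Prints nothing if no uri is set there.
connectivity_uri() {
    awk '
        /^\[/ { in_section = ($0 == "[connectivity]") }
        in_section && /^uri[ \t]*=/ {
            sub(/^uri[ \t]*=[ \t]*/, "")
            print
        }
    '
}

# Usage (may need sudo on some systems):
#   NetworkManager --print-config | connectivity_uri
```

If nothing is printed, no check URI is configured in the files NetworkManager reads. Some builds may carry a compiled-in default, so treat an empty result as a strong hint rather than proof.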
Side Effects
The purpose of the connectivity check is to let NetworkManager detect whether the system can actually reach the Internet or is behind a captive portal. (Search the Network Manager Configuration page for “connectivity section”.) With the check disabled, connecting to wireless in a hotel or other public place could become difficult, because the hotel’s wi-fi login page (captive portal) might not appear.
Reversing the change might help in this circumstance, e.g.:
sudo rm /etc/NetworkManager/conf.d/20-connectivity-ubuntu.conf
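For a laptop that moves between home and public wi-fi, the two commands above could be wrapped in a small toggle. This is a minimal sketch under stated assumptions: toggle_connectivity_check is a hypothetical name, the directory argument exists only so the function can be tried on a scratch directory, and the function refuses to delete a non-empty file in case it holds real settings.

```shell
# Create the empty masking file if it is absent, remove it if it is present.
# $1: conf.d directory (defaults to the real NetworkManager one).
toggle_connectivity_check() {
    conf_dir=${1:-/etc/NetworkManager/conf.d}
    mask="$conf_dir/20-connectivity-ubuntu.conf"
    if [ -e "$mask" ] && [ -s "$mask" ]; then
        # A non-empty file is not the mask described above; leave it alone.
        echo "refusing to remove non-empty $mask" >&2
        return 1
    elif [ -e "$mask" ]; then
        rm -- "$mask" && echo "check enabled (mask removed)"
    else
        : > "$mask" && echo "check disabled (mask created)"
    fi
}

# Usage on the real directory (run as root, then restart the service):
#   toggle_connectivity_check && systemctl restart NetworkManager
```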
Root Cause
The virtual machine appeared to freeze several times a day. VMware was briefly (and correctly) downing the machine’s virtual network adapters in response to a network event from the Linux host.
Linux was downing its real network adapter in response to a signal from Network Manager that the Internet had become unreachable. This signal was incorrect. All that had become unreachable was the remote Ubuntu server that is meant to answer these connectivity checks (connectivity-check.ubuntu.com).
Conclusion
Having the connectivity check turned on by default is questionable. It allows easy connection to public wi-fi networks, but that is not a concern for most Linux servers. Having a server automatically reach out to the Internet could be seen as a security or privacy issue. It would be better to provide the service but leave its activation optional.
It does not help that the connectivity check relies on Internet servers that respond slowly, and occasionally not at all. The result is some interruption of the host’s network stack, with application impact such as the VM issue above.
Alternative Solutions
Rather than being disabled, the connectivity check could instead be configured to use a more reliable endpoint, such as network-test.debian.org/nm. To do so, edit the file /etc/NetworkManager/conf.d/11-connectivity.conf
and add the lines:
[connectivity]
uri=http://network-test.debian.org/nm
interval=300
At the time of writing (9th Jan 2024), “http://network-test.debian.org/nm” responds more quickly and more uniformly to curl than the default endpoint “http://connectivity-check.ubuntu.com”.
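A comparison like the one above can be repeated with a short loop around curl. The sketch below is illustrative only: probe_endpoint and summarise_times are hypothetical helper names, the default of five tries is arbitrary, and the 20 second cap is chosen to mirror the gap between request and timeout in the failed check logged earlier.

```shell
# Summarise response times (seconds, one per line on stdin) as count, min, max, mean.
summarise_times() {
    awk '{
             v = $1 + 0
             if (NR == 1 || v < min) min = v
             if (NR == 1 || v > max) max = v
             sum += v
         }
         END {
             if (NR) printf "n=%d min=%.3f max=%.3f mean=%.3f\n", NR, min, max, sum / NR
         }'
}

# Request a URL several times and print the total time of each request.
# $1: URL to test, $2: number of tries (default 5).
probe_endpoint() {
    url=$1
    tries=${2:-5}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -w prints the elapsed time even when the request times out.
        curl -s -o /dev/null --max-time 20 -w '%{time_total}\n' "$url"
        i=$((i + 1))
    done
}

# Usage:
#   probe_endpoint http://network-test.debian.org/nm | summarise_times
#   probe_endpoint http://connectivity-check.ubuntu.com | summarise_times
```

A wide spread between min and max, or a max close to 20, would point to the kind of slow or missing responses that trigger the LIMITED state.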