Hi Lemmy!
I'm encountering network stability issues on my Mac mini ("Core i7" 2.3, Late 2012) running Proxmox VE (Linux 6.5.13-1-pve). The enp1s0f0 interface (tg3 driver) frequently drops, logging "Link is down" messages before recovering. This affects my Plex media server and has persisted across different OS installations.
Here are some key excerpts from the logs showing the problem:
NETDEV WATCHDOG: enp1s0f0 (tg3): transmit queue 0 timed out
tg3 0000:01:00.0 enp1s0f0: transmit timed out, resetting
tg3 0000:01:00.0 enp1s0f0: Link is down
kernel: vmbr0: port 1(enp1s0f0) entered disabled state
kernel: tg3 0000:01:00.0 enp1s0f0: Link is up at 1000 Mbps, full duplex
Here's when it has happened:
journalctl | grep -E "Link is down|Link is up" | grep 'enp1s0f0'
Mar 03 00:03:05 macminiserver kernel: tg3 0000:01:00.0 enp1s0f0: Link is down
Mar 03 00:03:09 macminiserver kernel: tg3 0000:01:00.0 enp1s0f0: Link is up at 1000 Mbps, full duplex
Mar 03 15:35:30 macminiserver kernel: tg3 0000:01:00.0 enp1s0f0: Link is down
Mar 03 15:35:34 macminiserver kernel: tg3 0000:01:00.0 enp1s0f0: Link is up at 1000 Mbps, full duplex
Mar 04 12:43:45 macminiserver kernel: tg3 0000:01:00.0 enp1s0f0: Link is down
Mar 04 12:43:49 macminiserver kernel: tg3 0000:01:00.0 enp1s0f0: Link is up at 1000 Mbps, full duplex
Mar 07 09:14:48 macminiserver kernel: tg3 0000:01:00.0 enp1s0f0: Link is down
Mar 07 09:14:52 macminiserver kernel: tg3 0000:01:00.0 enp1s0f0: Link is up at 1000 Mbps, full duplex
The issue is super intermittent. It seems to happen more often when I'm watching something on Plex (Direct Play to an Apple TV), but it has also happened overnight with little activity. I've also streamed all day on multiple devices, run ping to and from multiple devices, run mtr between it and my other Mac, and run 16 hours of iperf3, all with zero issues!
- Does anyone have guidance on how I can determine whether this is a hardware issue or a driver/kernel problem?
- I've ordered a Plugable USB-to-Ethernet adapter so I can bypass the built-in NIC and test whether something else is the cause (I needed a good USB-to-Ethernet adapter anyway, so it was time).
- Would an external solution suffice, or is it time for a new system?
- Should I focus on further diagnostics in a different environment, or is it time to retire this box and get something new?
I'm happy to share more of my syslog and my network setup. I'm in an NYC apartment, so my options for changing the environment are limited. I haven't encountered (or noticed) the issue on any of my other devices on the same switch, which are hooked up to the router the same way. So far I've tried a different port and cable, but I haven't physically moved it to another switch yet.
I'm getting closer and closer to just buying a Dell OptiPlex (probably an 11th Gen i7), cannibalizing the SSD, and playing with more services. But my original goal was to learn and to host Plex cheaply and easily on this older hardware, and my sanity is running out!
Thanks
I seem to recall a similar VMware complaint that was fixed with some ring buffer tuning... but that error message doesn't quite match it.
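If you want to rule ring sizes in or out, `ethtool -g enp1s0f0` shows the current and maximum RX/TX ring sizes, and `ethtool -G enp1s0f0 rx N` changes them. Whether tg3 on this Mac mini accepts larger values is an assumption to verify. Below is a small sketch, using the hypothetical helper name `ring_headroom`, that summarizes the `ethtool -g` output. It assumes the usual "Pre-set maximums:" / "Current hardware settings:" layout.

```shell
#!/bin/sh
# Hypothetical helper: read `ethtool -g IFACE` output on stdin and print
# the current vs. maximum size of the RX and TX rings.
ring_headroom() {
    awk '
    /^Pre-set maximums/          { section = "max"; next }
    /^Current hardware settings/ { section = "cur"; next }
    # Match only the plain RX: and TX: lines; RX Mini / RX Jumbo are skipped.
    $1 == "RX:" || $1 == "TX:" {
        key = substr($1, 1, 2)
        val[section, key] = $2
    }
    END {
        for (i = 1; i <= 2; i++) {
            k = (i == 1) ? "RX" : "TX"
            printf "%s current=%s max=%s\n", k, val["cur", k], val["max", k]
        }
    }'
}
```

Usage: `ethtool -g enp1s0f0 | ring_headroom`. If "current" is well below "max", raising it is a cheap experiment. It would only be worth keeping if the timeouts actually stop.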
TX queue timeouts can have several causes, but I wonder if you're seeing the end result of a spammy Ethernet flow control implementation, where the device can't transmit because the link is continuously paused.
If so, there may be rx_xoff counters viewable from within Proxmox, or "ethtool -S enp1s0f0" would tell you whether the device is seeing pause frames from the switch on a regular Linux host.
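A single `ethtool -S` dump only shows totals since boot. Taking two snapshots a minute or so apart, ideally while Plex is playing, shows whether pause frames are actively arriving. Below is a minimal sketch, using the hypothetical helper name `pause_deltas`. It does not hard-code tg3 counter names. It assumes any counter containing "xoff", "xon", "pause" or "flow" is flow-control related.

```shell
#!/bin/sh
# Hypothetical helper: compare two saved `ethtool -S` outputs and print
# the flow-control-looking counters that increased between them.
pause_deltas() {
    awk -F: '
    # First file: remember each counter value as the baseline.
    NR == FNR { base[$1] = $2 + 0; next }
    # Second file: report matching counters whose value went up.
    tolower($1) ~ /xoff|xon|pause|flow/ {
        d = ($2 + 0) - base[$1]
        if (d > 0) { name = $1; gsub(/^[ \t]+/, "", name); printf "%s +%d\n", name, d }
    }' "$1" "$2"
}
```

For example: `ethtool -S enp1s0f0 > before.txt; sleep 60; ethtool -S enp1s0f0 > after.txt; pause_deltas before.txt after.txt`. If xoff counters climb steadily, `ethtool -a enp1s0f0` shows the current pause settings. `ethtool -A enp1s0f0 rx off tx off` turns flow control off as a test.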
The link-down event tends to be the driver resetting the NIC to recover from a hung queue (in your logs, "transmit timed out, resetting" comes right before it). So if it's not flow control, a driver/firmware upgrade may be possible, or there may be a series of tunables to try if a bug somewhere in packet handling is hanging the NIC itself.
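Before chasing a driver/firmware upgrade or tunables, it helps to record the current kernel, driver and firmware versions and the existing settings in one place. That also makes them easy to share. The sketch below does this with a hypothetical function, `nic_snapshot`. It only calls standard `ethtool` queries: `-i` for driver and firmware-version, `-k` for offload features, `-a` for pause settings and `-g` for ring sizes. Which of these settings matter for tg3 is left open here.

```shell
#!/bin/sh
# Hypothetical helper: dump kernel version plus driver/firmware info and
# the main tunables (offloads, pause, rings) for one interface.
nic_snapshot() {
    iface=$1
    echo "== kernel: $(uname -r)"
    for opt in -i -k -a -g; do
        echo "== ethtool $opt $iface"
        # Keep going even if the driver doesn't support a query.
        ethtool "$opt" "$iface" 2>&1
    done
}
```

Usage: `nic_snapshot enp1s0f0 > nic-report.txt`. Taking a fresh snapshot after each kernel update or setting change gives you a record to diff against the journal.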