Network Stability Issues with tg3 Driver - Hardware Replacement or Further Testing? (lemmy.world)

submitted 8 months ago by robalees@lemmy.world to c/linux@lemmy.ml

Hi Lemmy!

I'm encountering network stability issues on my Mac mini ("Core i7" 2.3, Late 2012) running Proxmox VE (Linux 6.5.13-1-pve). The enp1s0f0 interface (tg3 driver) frequently drops, logging "Link is down" messages before recovering. This affects my Plex media server and has persisted across different OS installations.

(related posts here and here)

Here are some key excerpts from the logs showing the problem:

NETDEV WATCHDOG: enp1s0f0 (tg3): transmit queue 0 timed out
tg3 0000:01:00.0 enp1s0f0: transmit timed out, resetting
tg3 0000:01:00.0 enp1s0f0: Link is down
kernel: vmbr0: port 1(enp1s0f0) entered disabled state
kernel: tg3 0000:01:00.0 enp1s0f0: Link is up at 1000 Mbps, full duplex

Here is when it has happened:

journalctl | grep -E "Link is down|Link is up" | grep 'enp1s0f0'
Mar 03 00:03:05 macminiserver kernel: tg3 0000:01:00.0 enp1s0f0: Link is down
Mar 03 00:03:09 macminiserver kernel: tg3 0000:01:00.0 enp1s0f0: Link is up at 1000 Mbps, full duplex
Mar 03 15:35:30 macminiserver kernel: tg3 0000:01:00.0 enp1s0f0: Link is down
Mar 03 15:35:34 macminiserver kernel: tg3 0000:01:00.0 enp1s0f0: Link is up at 1000 Mbps, full duplex
Mar 04 12:43:45 macminiserver kernel: tg3 0000:01:00.0 enp1s0f0: Link is down
Mar 04 12:43:49 macminiserver kernel: tg3 0000:01:00.0 enp1s0f0: Link is up at 1000 Mbps, full duplex
Mar 07 09:14:48 macminiserver kernel: tg3 0000:01:00.0 enp1s0f0: Link is down
Mar 07 09:14:52 macminiserver kernel: tg3 0000:01:00.0 enp1s0f0: Link is up at 1000 Mbps, full duplex
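
To put numbers on these link flaps, here is a minimal Python sketch that pairs each "Link is down" line with the next "Link is up" line and reports how long the link stayed down. The function name `link_outages` and the `year` parameter are illustrative assumptions, not part of any existing tool; syslog-style timestamps carry no year, so one has to be supplied, and the default below is arbitrary.

```python
import re
from datetime import datetime

# Matches lines such as:
# "Mar 03 00:03:05 macminiserver kernel: tg3 0000:01:00.0 enp1s0f0: Link is down"
LINK_EVENT = re.compile(
    r"^(?P<stamp>\w{3} +\d+ \d{2}:\d{2}:\d{2}) .*?"
    r"(?P<iface>\S+): Link is (?P<state>up|down)"
)


def link_outages(lines, iface="enp1s0f0", year=2024):
    """Pair each 'Link is down' with the next 'Link is up' for one interface.

    Returns a list of (down_time, seconds_until_up) tuples.
    `year` is an assumption: the log timestamps do not include one.
    """
    outages = []
    down_at = None
    for line in lines:
        match = LINK_EVENT.search(line)
        if not match or match["iface"] != iface:
            continue
        when = datetime.strptime(
            f"{year} {match['stamp']}", "%Y %b %d %H:%M:%S"
        )
        if match["state"] == "down":
            down_at = when
        elif down_at is not None:
            # First "up" after a "down": record the outage and reset.
            outages.append((down_at, (when - down_at).total_seconds()))
            down_at = None
    return outages
```

Fed the eight lines above, this yields four outages of 4 seconds each. A recovery time that constant would fit a driver reset after a hung queue, though it does not by itself rule out a cable or switch problem.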

The issue is very intermittent. It seems to happen more often when I am watching something on Plex (Direct Play to Apple TV), but it has also happened overnight with limited activity. On the other hand, I've successfully streamed all day on multiple devices, run ping to and from multiple devices, run mtr between it and my other Mac, and run 16 hours of iperf3 with zero issues!

Does anyone have any guidance on how I can determine whether this is a hardware issue or driver/kernel related?
I've ordered a Plugable USB to Ethernet Adapter so I can bypass the NIC and test whether something else is the cause. I needed a good USB to Ethernet adapter anyway, so it was time.
Would an external solution suffice, or is it time for a new system?
Should I focus on further diagnostics in a different environment, or is it time to retire this box and get something new?

I'm happy to share more of my syslog and my network setup. I'm in a NYC apartment, so my options for changing the environment are limited. I have not encountered (or noticed) the issue with any of my other devices on the same switch, which are hooked up to the router in the same manner. I have tried a different port and cable so far, but have not yet physically moved the machine to another switch.

I'm getting closer and closer to just buying a Dell Optiplex (probably 11th Gen i7), cannibalizing the SSD, and trying to play with more services. My original goal was to learn and to host Plex cheaply and easily on this older hardware, but my sanity is running out!

Thanks

[–] Sorcaeden@lemmy.world 2 points 8 months ago

I seem to recall a VMware complaint similar to this too, and there was a ring buffer tuning to do to fix it... But that error message doesn't seem quite right to match it.

TX queue timeouts can be caused by several things, but I wonder if you're not seeing an end result of a spammy Ethernet flow control implementation where the device can't transmit because the link is continuously paused.

If so, there may be rx_xoff counters viewable from within Proxmox, or "ethtool -S enp1s0f0" (capital S, which prints the NIC statistics) would tell you whether the device is seeing pause frames from the switch on a regular Linux host.
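
As a rough way to check this, the Python sketch below parses the text printed by `ethtool -S <iface>` (capital S, which prints driver statistics), keeps the counters whose names look flow-control related, and compares two snapshots taken some time apart. The substrings in `PAUSE_HINTS` and the function names are assumptions for illustration; the exact counter names depend on the driver, so check what your NIC actually reports.

```python
import re

# Substrings that commonly appear in flow-control counter names.
# These are assumptions; the exact names vary by driver.
PAUSE_HINTS = ("xoff", "xon", "pause", "flow_control")


def pause_counters(ethtool_stats_text):
    """Extract flow-control-looking counters from `ethtool -S` output text."""
    counters = {}
    for line in ethtool_stats_text.splitlines():
        # Statistic lines look like "     some_counter_name: 12345".
        match = re.match(r"\s*(\S+):\s*(\d+)\s*$", line)
        if match and any(hint in match[1] for hint in PAUSE_HINTS):
            counters[match[1]] = int(match[2])
    return counters


def counter_deltas(before, after):
    """Return the counters that increased between two snapshots."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }
```

Capture the command's output once, capture it again after a link drop, and pass both through `pause_counters` and then `counter_deltas`. Pause-related counters that climb sharply around a drop would support the flow-control theory; counters that stay flat point toward the driver, firmware, or hardware instead.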

The link down tends to be a reaction by the driver to recover from a hung queue, so if it's not flow control, there could be a driver/firmware upgrade possible, or a series of tunables if there's a bug somewhere in packet handling land resulting in the NIC itself hanging.