Troubleshooting NIC Logs by NETDEV WATCHDOGBroadcom NIC Teaming with Hyper-V R2SATA drive problems with two SIL RAID cardsDual Port Server Nic QuestionMissing Jumbo and MTU for Broadcom NIC?What network loads require NIC polling vs interrupts?linux hibernate working but slowNIC Teaming with Broadcom on HyperV 2008 R2Ubuntu server - NIC suddenly loses connectivityKVM NIC passthrough “device already in use”How do I troubleshoot slow NIC on R610 Dell?

Reaction of borax with NaOH

Why can't RGB or bicolour LEDs produce a decent yellow?

Create a list of all possible Boolean configurations of three constraints

Can I make ravioli dough with only all-purpose flour or do I NEED semolina flour?

Speculative Biology of a Haplodiploid Humanoid Species

Extracting sublists that contain similar elements

What does "Ich wusste, dass aus dir mal was wird" mean?

Does Lawful Interception of 4G / the proposed 5G provide a back door for hackers as well?

Why do Thanos's punches not kill Captain America or at least cause some mortal injuries?

Does kinetic energy warp spacetime?

Proof that the inverse image of a single element is a discrete space

Exception propagation: When should I catch exceptions?

What is the significance of 4200 BCE in context of farming replacing foraging in Europe?

Why doesn't Rocket Lab use a solid stage?

Who was this character from the Tomb of Annihilation adventure before they became a monster?

What are the implications of the new alleged key recovery attack preprint on SIMON?

Anatomically Correct Carnivorous Tree

Setting the major mode of a new buffer interactively

What is the best way for a skeleton to impersonate human without using magic?

List software from restricted, multiverse separately

How to compact two the parabol commands in the following example?

How can a Lich look like a human without magic?

Light Switch Terminals

Does the 500 feet falling cap apply per fall, or per turn?



Troubleshooting NIC Logs by NETDEV WATCHDOG


Broadcom NIC Teaming with Hyper-V R2SATA drive problems with two SIL RAID cardsDual Port Server Nic QuestionMissing Jumbo and MTU for Broadcom NIC?What network loads require NIC polling vs interrupts?linux hibernate working but slowNIC Teaming with Broadcom on HyperV 2008 R2Ubuntu server - NIC suddenly loses connectivityKVM NIC passthrough “device already in use”How do I troubleshoot slow NIC on R610 Dell?






.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty height:90px;width:728px;box-sizing:border-box;








1















I have a CentOS 6.6 server (Linux 2.6.32-504) running on a brand new Dell PowerEdge R320. Just the other day, I began noticing problems in /var/log/messages related to one of my NICs.



Note: this is a server cluster that requires eth0 to be used in a crossover network to a redundant backup server, so we use eth1 to talk to the customer's network and the world; so it would be challenging to switch NICs for troubleshooting purposes. I'll also note I'm not seeing any issues on eth0, though.



I'd like to troubleshoot this as much as possible ~ is it a hardware failure, Linux kernel doesn't like the Dell hardware, etc.? I am not a career Linux administrator, so I'm kind of green in this regard.



Google searches seem to turn up more questions for me, than answers (and only seem to apply to other distros; but point out possibilities such as kernel/firmware issues. I humbly beg for this community's expertise and ideas!



The one thing I noticed was that the interface was coming back up as 100Mb/half (when it's really on a 1Gb/full).



I'll provide as much as I can think of, that's hopefully relevant.



Here is the snippet of /var/log/messages where I noticed the issue... does this mean anything to anyone?



Mar 22 16:10:07 ind1un043 kernel: ------------[ cut here ]------------
Mar 22 16:10:07 ind1un043 kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26b/0x280() (Not tainted)
Mar 22 16:10:07 ind1un043 kernel: Hardware name: PowerEdge R320
Mar 22 16:10:07 ind1un043 kernel: NETDEV WATCHDOG: eth1 (tg3): transmit queue 0 timed out
Mar 22 16:10:07 ind1un043 kernel: Modules linked in: nls_utf8 hfsplus(U) mpt3sas mpt2sas scsi_transport_sas raid_class mptctl mptbase dell_rbu coretemp 8021q garp stp llc wctc4xxp(U) dahdi_transcode(U) wcb4xxp(U) wctdm(U) wcfxo(U) wctdm24xxp(U) wcte11xp(U) wct1xxp(U) wcte12xp(U) dahdi_voicebus(U) wct4xxp(U) oct612x(U) dahdi(U) crc_ccitt ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 uinput iTCO_wdt iTCO_vendor_support microcode dcdbas sb_edac edac_core ipmi_devintf power_meter acpi_ipmi ipmi_si ipmi_msghandler sg shpchp tg3 ptp pps_core lpc_ich mfd_core ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif ahci wmi megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
Mar 22 16:10:07 ind1un043 kernel: Pid: 9, comm: ksoftirqd/1 Not tainted 2.6.32-504.el6.x86_64 #1
Mar 22 16:10:07 ind1un043 kernel: Call Trace:
Mar 22 16:10:07 ind1un043 kernel: <IRQ> [<ffffffff81074df7>] ? warn_slowpath_common+0x87/0xc0
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff81074ee6>] ? warn_slowpath_fmt+0x46/0x50
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8147df7b>] ? dev_watchdog+0x26b/0x280
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff81060d7c>] ? scheduler_tick+0xcc/0x260
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8147dd10>] ? dev_watchdog+0x0/0x280
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff81087db7>] ? run_timer_softirq+0x197/0x340
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff810b0275>] ? tick_dev_program_event+0x65/0xc0
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff810b034a>] ? tick_program_event+0x2a/0x30
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8100fc15>] ? do_softirq+0x65/0xa0
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff81533c0a>] ? smp_apic_timer_interrupt+0x4a/0x60
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20
Mar 22 16:10:07 ind1un043 kernel: <EOI> [<ffffffff8107d4c5>] ? ksoftirqd+0xd5/0x110
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8107d3f0>] ? ksoftirqd+0x0/0x110
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8109e66e>] ? kthread+0x9e/0xc0
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Mar 22 16:10:07 ind1un043 kernel: ---[ end trace c154004f7af06fb3 ]---
Mar 22 16:10:07 ind1un043 kernel: tg3 0000:02:00.1: eth1: transmit timed out, resetting
Mar 22 16:10:08 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0x00000000: 0x165f14e4, 0x00100406, 0x02000000, 0x00800010
Mar 22 16:10:08 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0x00000010: 0xd90d000c, 0x00000000, 0xd90e000c, 0x00000000
Mar 22 16:10:08 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0x00000020: 0xd90f000c, 0x00000000, 0x00000000, 0x04f71028
... (TOO MANY CHARACTERS TO POST ON SERVER FAULT, SO STRIPPING) ...
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0x00007020: 0x00000000, 0x00000000, 0x00000406, 0x10004000
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0x00007030: 0x000e0000, 0x00004af8, 0x00170030, 0x00000000
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0: Host status block [00000001:00000063:(0000:0695:0000):(0000:01b8)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0: NAPI info [00000063:00000063:(01a5:01b8:01ff):0000:(0095:0000:0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 1: Host status block [00000001:000000ac:(0000:0000:0000):(083a:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 1: NAPI info [000000ac:000000ac:(0000:0000:01ff):083a:(003a:003a:0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 2: Host status block [00000001:0000004f:(089e:0000:0000):(0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 2: NAPI info [00000046:00000046:(0000:0000:01ff):0895:(0095:0095:0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 3: Host status block [00000001:00000012:(0000:0000:0000):(0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 3: NAPI info [00000012:00000012:(0000:0000:01ff):0fbc:(07bc:07bc:0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 4: Host status block [00000001:000000d4:(0000:0000:0742):(0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 4: NAPI info [000000d4:000000d4:(0000:0000:01ff):0742:(0742:0742:0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: tg3_stop_block timed out, ofs=1400 enable_bit=2
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: tg3_stop_block timed out, ofs=c00 enable_bit=2
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: Link is down
Mar 22 16:10:11 ind1un043 kernel: tg3 0000:02:00.1: eth1: Link is up at 100 Mbps, half duplex
Mar 22 16:10:11 ind1un043 kernel: tg3 0000:02:00.1: eth1: Flow control is off for TX and off for RX
Mar 22 16:10:11 ind1un043 kernel: tg3 0000:02:00.1: eth1: EEE is disabled


But before that, here is the snippet of /var/log/messages where the NIC initializes seemingly normally during a boot...



Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address 14:18:77:32:77:3f
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: dma_rwctrl[00000001] dma_mask[64-bit]
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address 14:18:77:32:77:40
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: dma_rwctrl[00000001] dma_mask[64-bit]


Here is the snippet of /var/log/messages where the NIC comes up seemingly normally, but not sure in what context (it's just after the init immediately above this snippet)...



Mar 21 16:01:13 ind1un043 kernel: ADDRCONF(NETDEV_UP): eth0: link is not ready
Mar 21 16:01:13 ind1un043 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: Link is up at 1000 Mbps, full duplex
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: Flow control is on for TX and on for RX
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: EEE is disabled
Mar 21 16:01:13 ind1un043 kernel: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: Link is up at 100 Mbps, half duplex
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: Flow control is off for TX and off for RX
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: EEE is disabled
Mar 21 16:01:13 ind1un043 kernel: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready


And, here is the snippet of /var/log/messages where the NIC seems to soon-after have other issues, renegotiating or something. This happens randomly several times over the next few days...



Mar 21 16:01:42 ind1un043 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
Mar 21 16:01:44 ind1un043 kernel: tg3 0000:02:00.1: eth1: Link is up at 100 Mbps, half duplex
Mar 21 16:01:44 ind1un043 kernel: tg3 0000:02:00.1: eth1: Flow control is off for TX and off for RX
Mar 21 16:01:44 ind1un043 kernel: tg3 0000:02:00.1: eth1: EEE is disabled
Mar 21 16:01:44 ind1un043 kernel: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready


Here is my lspci | grep -i net...



02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet PCIe
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet PCIe









share|improve this question
























  • Couple of things I would try first. 1: Check for errors in the iDRAC 2: grab the latest SUU and make sure all your drivers/firmware are up to date 3: check the switch it is connected to for errors 4: check that the switch it's connected to isn't hard coded to 100/half for some reason.

    – Zypher
    Mar 25 '16 at 15:41


















1















I have a CentOS 6.6 server (Linux 2.6.32-504) running on a brand new Dell PowerEdge R320. Just the other day, I began noticing problems in /var/log/messages related to one of my NICs.



Note: this is a server cluster that requires eth0 to be used in a crossover network to a redundant backup server, so we use eth1 to talk to the customer's network and the world; so it would be challenging to switch NICs for troubleshooting purposes. I'll also note I'm not seeing any issues on eth0, though.



I'd like to troubleshoot this as much as possible ~ is it a hardware failure, Linux kernel doesn't like the Dell hardware, etc.? I am not a career Linux administrator, so I'm kind of green in this regard.



Google searches seem to turn up more questions for me, than answers (and only seem to apply to other distros; but point out possibilities such as kernel/firmware issues. I humbly beg for this community's expertise and ideas!



The one thing I noticed was that the interface was coming back up as 100Mb/half (when it's really on a 1Gb/full).



I'll provide as much as I can think of, that's hopefully relevant.



Here is the snippet of /var/log/messages where I noticed the issue... does this mean anything to anyone?



Mar 22 16:10:07 ind1un043 kernel: ------------[ cut here ]------------
Mar 22 16:10:07 ind1un043 kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26b/0x280() (Not tainted)
Mar 22 16:10:07 ind1un043 kernel: Hardware name: PowerEdge R320
Mar 22 16:10:07 ind1un043 kernel: NETDEV WATCHDOG: eth1 (tg3): transmit queue 0 timed out
Mar 22 16:10:07 ind1un043 kernel: Modules linked in: nls_utf8 hfsplus(U) mpt3sas mpt2sas scsi_transport_sas raid_class mptctl mptbase dell_rbu coretemp 8021q garp stp llc wctc4xxp(U) dahdi_transcode(U) wcb4xxp(U) wctdm(U) wcfxo(U) wctdm24xxp(U) wcte11xp(U) wct1xxp(U) wcte12xp(U) dahdi_voicebus(U) wct4xxp(U) oct612x(U) dahdi(U) crc_ccitt ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 uinput iTCO_wdt iTCO_vendor_support microcode dcdbas sb_edac edac_core ipmi_devintf power_meter acpi_ipmi ipmi_si ipmi_msghandler sg shpchp tg3 ptp pps_core lpc_ich mfd_core ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif ahci wmi megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
Mar 22 16:10:07 ind1un043 kernel: Pid: 9, comm: ksoftirqd/1 Not tainted 2.6.32-504.el6.x86_64 #1
Mar 22 16:10:07 ind1un043 kernel: Call Trace:
Mar 22 16:10:07 ind1un043 kernel: <IRQ> [<ffffffff81074df7>] ? warn_slowpath_common+0x87/0xc0
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff81074ee6>] ? warn_slowpath_fmt+0x46/0x50
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8147df7b>] ? dev_watchdog+0x26b/0x280
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff81060d7c>] ? scheduler_tick+0xcc/0x260
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8147dd10>] ? dev_watchdog+0x0/0x280
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff81087db7>] ? run_timer_softirq+0x197/0x340
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff810b0275>] ? tick_dev_program_event+0x65/0xc0
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff810b034a>] ? tick_program_event+0x2a/0x30
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8100fc15>] ? do_softirq+0x65/0xa0
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff81533c0a>] ? smp_apic_timer_interrupt+0x4a/0x60
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20
Mar 22 16:10:07 ind1un043 kernel: <EOI> [<ffffffff8107d4c5>] ? ksoftirqd+0xd5/0x110
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8107d3f0>] ? ksoftirqd+0x0/0x110
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8109e66e>] ? kthread+0x9e/0xc0
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Mar 22 16:10:07 ind1un043 kernel: ---[ end trace c154004f7af06fb3 ]---
Mar 22 16:10:07 ind1un043 kernel: tg3 0000:02:00.1: eth1: transmit timed out, resetting
Mar 22 16:10:08 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0x00000000: 0x165f14e4, 0x00100406, 0x02000000, 0x00800010
Mar 22 16:10:08 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0x00000010: 0xd90d000c, 0x00000000, 0xd90e000c, 0x00000000
Mar 22 16:10:08 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0x00000020: 0xd90f000c, 0x00000000, 0x00000000, 0x04f71028
... (TOO MANY CHARACTERS TO POST ON SERVER FAULT, SO STRIPPING) ...
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0x00007020: 0x00000000, 0x00000000, 0x00000406, 0x10004000
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0x00007030: 0x000e0000, 0x00004af8, 0x00170030, 0x00000000
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0: Host status block [00000001:00000063:(0000:0695:0000):(0000:01b8)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0: NAPI info [00000063:00000063:(01a5:01b8:01ff):0000:(0095:0000:0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 1: Host status block [00000001:000000ac:(0000:0000:0000):(083a:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 1: NAPI info [000000ac:000000ac:(0000:0000:01ff):083a:(003a:003a:0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 2: Host status block [00000001:0000004f:(089e:0000:0000):(0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 2: NAPI info [00000046:00000046:(0000:0000:01ff):0895:(0095:0095:0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 3: Host status block [00000001:00000012:(0000:0000:0000):(0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 3: NAPI info [00000012:00000012:(0000:0000:01ff):0fbc:(07bc:07bc:0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 4: Host status block [00000001:000000d4:(0000:0000:0742):(0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 4: NAPI info [000000d4:000000d4:(0000:0000:01ff):0742:(0742:0742:0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: tg3_stop_block timed out, ofs=1400 enable_bit=2
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: tg3_stop_block timed out, ofs=c00 enable_bit=2
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: Link is down
Mar 22 16:10:11 ind1un043 kernel: tg3 0000:02:00.1: eth1: Link is up at 100 Mbps, half duplex
Mar 22 16:10:11 ind1un043 kernel: tg3 0000:02:00.1: eth1: Flow control is off for TX and off for RX
Mar 22 16:10:11 ind1un043 kernel: tg3 0000:02:00.1: eth1: EEE is disabled


But before that, here is the snippet of /var/log/messages where the NIC initializes seemingly normally during a boot...



Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address 14:18:77:32:77:3f
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: dma_rwctrl[00000001] dma_mask[64-bit]
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address 14:18:77:32:77:40
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: dma_rwctrl[00000001] dma_mask[64-bit]


Here is the snippet of /var/log/messages where the NIC comes up seemingly normally, but not sure in what context (it's just after the init immediately above this snippet)...



Mar 21 16:01:13 ind1un043 kernel: ADDRCONF(NETDEV_UP): eth0: link is not ready
Mar 21 16:01:13 ind1un043 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: Link is up at 1000 Mbps, full duplex
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: Flow control is on for TX and on for RX
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: EEE is disabled
Mar 21 16:01:13 ind1un043 kernel: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: Link is up at 100 Mbps, half duplex
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: Flow control is off for TX and off for RX
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: EEE is disabled
Mar 21 16:01:13 ind1un043 kernel: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready


And, here is the snippet of /var/log/messages where the NIC seems to soon-after have other issues, renegotiating or something. This happens randomly several times over the next few days...



Mar 21 16:01:42 ind1un043 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
Mar 21 16:01:44 ind1un043 kernel: tg3 0000:02:00.1: eth1: Link is up at 100 Mbps, half duplex
Mar 21 16:01:44 ind1un043 kernel: tg3 0000:02:00.1: eth1: Flow control is off for TX and off for RX
Mar 21 16:01:44 ind1un043 kernel: tg3 0000:02:00.1: eth1: EEE is disabled
Mar 21 16:01:44 ind1un043 kernel: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready


Here is my lspci | grep -i net...



02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet PCIe
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet PCIe









share|improve this question
























  • Couple of things I would try first. 1: Check for errors in the iDRAC 2: grab the latest SUU and make sure all your drivers/firmware are up to date 3: check the switch it is connected to for errors 4: check that the switch it's connected to isn't hard coded to 100/half for some reason.

    – Zypher
    Mar 25 '16 at 15:41














1












1








1








I have a CentOS 6.6 server (Linux 2.6.32-504) running on a brand new Dell PowerEdge R320. Just the other day, I began noticing problems in /var/log/messages related to one of my NICs.



Note: this is a server cluster that requires eth0 to be used in a crossover network to a redundant backup server, so we use eth1 to talk to the customer's network and the world; so it would be challenging to switch NICs for troubleshooting purposes. I'll also note I'm not seeing any issues on eth0, though.



I'd like to troubleshoot this as much as possible ~ is it a hardware failure, Linux kernel doesn't like the Dell hardware, etc.? I am not a career Linux administrator, so I'm kind of green in this regard.



Google searches seem to turn up more questions for me, than answers (and only seem to apply to other distros; but point out possibilities such as kernel/firmware issues. I humbly beg for this community's expertise and ideas!



The one thing I noticed was that the interface was coming back up as 100Mb/half (when it's really on a 1Gb/full).



I'll provide as much as I can think of, that's hopefully relevant.



Here is the snippet of /var/log/messages where I noticed the issue... does this mean anything to anyone?



Mar 22 16:10:07 ind1un043 kernel: ------------[ cut here ]------------
Mar 22 16:10:07 ind1un043 kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26b/0x280() (Not tainted)
Mar 22 16:10:07 ind1un043 kernel: Hardware name: PowerEdge R320
Mar 22 16:10:07 ind1un043 kernel: NETDEV WATCHDOG: eth1 (tg3): transmit queue 0 timed out
Mar 22 16:10:07 ind1un043 kernel: Modules linked in: nls_utf8 hfsplus(U) mpt3sas mpt2sas scsi_transport_sas raid_class mptctl mptbase dell_rbu coretemp 8021q garp stp llc wctc4xxp(U) dahdi_transcode(U) wcb4xxp(U) wctdm(U) wcfxo(U) wctdm24xxp(U) wcte11xp(U) wct1xxp(U) wcte12xp(U) dahdi_voicebus(U) wct4xxp(U) oct612x(U) dahdi(U) crc_ccitt ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 uinput iTCO_wdt iTCO_vendor_support microcode dcdbas sb_edac edac_core ipmi_devintf power_meter acpi_ipmi ipmi_si ipmi_msghandler sg shpchp tg3 ptp pps_core lpc_ich mfd_core ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif ahci wmi megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
Mar 22 16:10:07 ind1un043 kernel: Pid: 9, comm: ksoftirqd/1 Not tainted 2.6.32-504.el6.x86_64 #1
Mar 22 16:10:07 ind1un043 kernel: Call Trace:
Mar 22 16:10:07 ind1un043 kernel: <IRQ> [<ffffffff81074df7>] ? warn_slowpath_common+0x87/0xc0
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff81074ee6>] ? warn_slowpath_fmt+0x46/0x50
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8147df7b>] ? dev_watchdog+0x26b/0x280
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff81060d7c>] ? scheduler_tick+0xcc/0x260
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8147dd10>] ? dev_watchdog+0x0/0x280
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff81087db7>] ? run_timer_softirq+0x197/0x340
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff810b0275>] ? tick_dev_program_event+0x65/0xc0
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff810b034a>] ? tick_program_event+0x2a/0x30
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8100fc15>] ? do_softirq+0x65/0xa0
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff81533c0a>] ? smp_apic_timer_interrupt+0x4a/0x60
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20
Mar 22 16:10:07 ind1un043 kernel: <EOI> [<ffffffff8107d4c5>] ? ksoftirqd+0xd5/0x110
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8107d3f0>] ? ksoftirqd+0x0/0x110
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8109e66e>] ? kthread+0x9e/0xc0
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Mar 22 16:10:07 ind1un043 kernel: ---[ end trace c154004f7af06fb3 ]---
Mar 22 16:10:07 ind1un043 kernel: tg3 0000:02:00.1: eth1: transmit timed out, resetting
Mar 22 16:10:08 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0x00000000: 0x165f14e4, 0x00100406, 0x02000000, 0x00800010
Mar 22 16:10:08 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0x00000010: 0xd90d000c, 0x00000000, 0xd90e000c, 0x00000000
Mar 22 16:10:08 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0x00000020: 0xd90f000c, 0x00000000, 0x00000000, 0x04f71028
... (TOO MANY CHARACTERS TO POST ON SERVER FAULT, SO STRIPPING) ...
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0x00007020: 0x00000000, 0x00000000, 0x00000406, 0x10004000
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0x00007030: 0x000e0000, 0x00004af8, 0x00170030, 0x00000000
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0: Host status block [00000001:00000063:(0000:0695:0000):(0000:01b8)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0: NAPI info [00000063:00000063:(01a5:01b8:01ff):0000:(0095:0000:0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 1: Host status block [00000001:000000ac:(0000:0000:0000):(083a:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 1: NAPI info [000000ac:000000ac:(0000:0000:01ff):083a:(003a:003a:0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 2: Host status block [00000001:0000004f:(089e:0000:0000):(0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 2: NAPI info [00000046:00000046:(0000:0000:01ff):0895:(0095:0095:0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 3: Host status block [00000001:00000012:(0000:0000:0000):(0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 3: NAPI info [00000012:00000012:(0000:0000:01ff):0fbc:(07bc:07bc:0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 4: Host status block [00000001:000000d4:(0000:0000:0742):(0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 4: NAPI info [000000d4:000000d4:(0000:0000:01ff):0742:(0742:0742:0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: tg3_stop_block timed out, ofs=1400 enable_bit=2
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: tg3_stop_block timed out, ofs=c00 enable_bit=2
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: Link is down
Mar 22 16:10:11 ind1un043 kernel: tg3 0000:02:00.1: eth1: Link is up at 100 Mbps, half duplex
Mar 22 16:10:11 ind1un043 kernel: tg3 0000:02:00.1: eth1: Flow control is off for TX and off for RX
Mar 22 16:10:11 ind1un043 kernel: tg3 0000:02:00.1: eth1: EEE is disabled


But before that, here is the snippet of /var/log/messages where the NIC initializes seemingly normally during a boot...



Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address 14:18:77:32:77:3f
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: dma_rwctrl[00000001] dma_mask[64-bit]
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address 14:18:77:32:77:40
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: dma_rwctrl[00000001] dma_mask[64-bit]


Here is the snippet of /var/log/messages where the NIC comes up seemingly normally, but not sure in what context (it's just after the init immediately above this snippet)...



Mar 21 16:01:13 ind1un043 kernel: ADDRCONF(NETDEV_UP): eth0: link is not ready
Mar 21 16:01:13 ind1un043 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: Link is up at 1000 Mbps, full duplex
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: Flow control is on for TX and on for RX
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: EEE is disabled
Mar 21 16:01:13 ind1un043 kernel: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: Link is up at 100 Mbps, half duplex
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: Flow control is off for TX and off for RX
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: EEE is disabled
Mar 21 16:01:13 ind1un043 kernel: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready


And, here is the snippet of /var/log/messages where the NIC seems to soon-after have other issues, renegotiating or something. This happens randomly several times over the next few days...



Mar 21 16:01:42 ind1un043 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
Mar 21 16:01:44 ind1un043 kernel: tg3 0000:02:00.1: eth1: Link is up at 100 Mbps, half duplex
Mar 21 16:01:44 ind1un043 kernel: tg3 0000:02:00.1: eth1: Flow control is off for TX and off for RX
Mar 21 16:01:44 ind1un043 kernel: tg3 0000:02:00.1: eth1: EEE is disabled
Mar 21 16:01:44 ind1un043 kernel: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready


Here is my lspci | grep -i net...



02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet PCIe
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet PCIe









share|improve this question
















I have a CentOS 6.6 server (Linux 2.6.32-504) running on a brand new Dell PowerEdge R320. Just the other day, I began noticing problems in /var/log/messages related to one of my NICs.



Note: this is a server cluster that requires eth0 to be used in a crossover network to a redundant backup server, so we use eth1 to talk to the customer's network and the world; so it would be challenging to switch NICs for troubleshooting purposes. I'll also note I'm not seeing any issues on eth0, though.



I'd like to troubleshoot this as much as possible ~ is it a hardware failure, Linux kernel doesn't like the Dell hardware, etc.? I am not a career Linux administrator, so I'm kind of green in this regard.



Google searches seem to turn up more questions for me, than answers (and only seem to apply to other distros; but point out possibilities such as kernel/firmware issues. I humbly beg for this community's expertise and ideas!



The one thing I noticed was that the interface was coming back up as 100Mb/half (when it's really on a 1Gb/full).



I'll provide as much as I can think of, that's hopefully relevant.



Here is the snippet of /var/log/messages where I noticed the issue... does this mean anything to anyone?



Mar 22 16:10:07 ind1un043 kernel: ------------[ cut here ]------------
Mar 22 16:10:07 ind1un043 kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26b/0x280() (Not tainted)
Mar 22 16:10:07 ind1un043 kernel: Hardware name: PowerEdge R320
Mar 22 16:10:07 ind1un043 kernel: NETDEV WATCHDOG: eth1 (tg3): transmit queue 0 timed out
Mar 22 16:10:07 ind1un043 kernel: Modules linked in: nls_utf8 hfsplus(U) mpt3sas mpt2sas scsi_transport_sas raid_class mptctl mptbase dell_rbu coretemp 8021q garp stp llc wctc4xxp(U) dahdi_transcode(U) wcb4xxp(U) wctdm(U) wcfxo(U) wctdm24xxp(U) wcte11xp(U) wct1xxp(U) wcte12xp(U) dahdi_voicebus(U) wct4xxp(U) oct612x(U) dahdi(U) crc_ccitt ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 uinput iTCO_wdt iTCO_vendor_support microcode dcdbas sb_edac edac_core ipmi_devintf power_meter acpi_ipmi ipmi_si ipmi_msghandler sg shpchp tg3 ptp pps_core lpc_ich mfd_core ext4 jbd2 mbcache sr_mod cdrom sd_mod crc_t10dif ahci wmi megaraid_sas dm_mirror dm_region_hash dm_log dm_mod [last unloaded: speedstep_lib]
Mar 22 16:10:07 ind1un043 kernel: Pid: 9, comm: ksoftirqd/1 Not tainted 2.6.32-504.el6.x86_64 #1
Mar 22 16:10:07 ind1un043 kernel: Call Trace:
Mar 22 16:10:07 ind1un043 kernel: <IRQ> [<ffffffff81074df7>] ? warn_slowpath_common+0x87/0xc0
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff81074ee6>] ? warn_slowpath_fmt+0x46/0x50
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8147df7b>] ? dev_watchdog+0x26b/0x280
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff81060d7c>] ? scheduler_tick+0xcc/0x260
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8147dd10>] ? dev_watchdog+0x0/0x280
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff81087db7>] ? run_timer_softirq+0x197/0x340
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff810b0275>] ? tick_dev_program_event+0x65/0xc0
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8107d8b1>] ? __do_softirq+0xc1/0x1e0
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff810b034a>] ? tick_program_event+0x2a/0x30
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8100c30c>] ? call_softirq+0x1c/0x30
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8100fc15>] ? do_softirq+0x65/0xa0
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8107d765>] ? irq_exit+0x85/0x90
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff81533c0a>] ? smp_apic_timer_interrupt+0x4a/0x60
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8100bb93>] ? apic_timer_interrupt+0x13/0x20
Mar 22 16:10:07 ind1un043 kernel: <EOI> [<ffffffff8107d4c5>] ? ksoftirqd+0xd5/0x110
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8107d3f0>] ? ksoftirqd+0x0/0x110
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8109e66e>] ? kthread+0x9e/0xc0
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8100c20a>] ? child_rip+0xa/0x20
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8109e5d0>] ? kthread+0x0/0xc0
Mar 22 16:10:07 ind1un043 kernel: [<ffffffff8100c200>] ? child_rip+0x0/0x20
Mar 22 16:10:07 ind1un043 kernel: ---[ end trace c154004f7af06fb3 ]---
Mar 22 16:10:07 ind1un043 kernel: tg3 0000:02:00.1: eth1: transmit timed out, resetting
Mar 22 16:10:08 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0x00000000: 0x165f14e4, 0x00100406, 0x02000000, 0x00800010
Mar 22 16:10:08 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0x00000010: 0xd90d000c, 0x00000000, 0xd90e000c, 0x00000000
Mar 22 16:10:08 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0x00000020: 0xd90f000c, 0x00000000, 0x00000000, 0x04f71028
... (TOO MANY CHARACTERS TO POST ON SERVER FAULT, SO STRIPPING) ...
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0x00007020: 0x00000000, 0x00000000, 0x00000406, 0x10004000
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0x00007030: 0x000e0000, 0x00004af8, 0x00170030, 0x00000000
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0: Host status block [00000001:00000063:(0000:0695:0000):(0000:01b8)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 0: NAPI info [00000063:00000063:(01a5:01b8:01ff):0000:(0095:0000:0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 1: Host status block [00000001:000000ac:(0000:0000:0000):(083a:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 1: NAPI info [000000ac:000000ac:(0000:0000:01ff):083a:(003a:003a:0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 2: Host status block [00000001:0000004f:(089e:0000:0000):(0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 2: NAPI info [00000046:00000046:(0000:0000:01ff):0895:(0095:0095:0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 3: Host status block [00000001:00000012:(0000:0000:0000):(0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 3: NAPI info [00000012:00000012:(0000:0000:01ff):0fbc:(07bc:07bc:0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 4: Host status block [00000001:000000d4:(0000:0000:0742):(0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: 4: NAPI info [000000d4:000000d4:(0000:0000:01ff):0742:(0742:0742:0000:0000)]
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: tg3_stop_block timed out, ofs=1400 enable_bit=2
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: tg3_stop_block timed out, ofs=c00 enable_bit=2
Mar 22 16:10:09 ind1un043 kernel: tg3 0000:02:00.1: eth1: Link is down
Mar 22 16:10:11 ind1un043 kernel: tg3 0000:02:00.1: eth1: Link is up at 100 Mbps, half duplex
Mar 22 16:10:11 ind1un043 kernel: tg3 0000:02:00.1: eth1: Flow control is off for TX and off for RX
Mar 22 16:10:11 ind1un043 kernel: tg3 0000:02:00.1: eth1: EEE is disabled


But before that, here is the snippet of /var/log/messages where the NIC initializes seemingly normally during a boot...



Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: PCI INT A -> GSI 16 (level, low) -> IRQ 16
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address 14:18:77:32:77:3f
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: dma_rwctrl[00000001] dma_mask[64-bit]
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: PCI INT B -> GSI 17 (level, low) -> IRQ 17
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: Tigon3 [partno(BCM95720) rev 5720000] (PCI Express) MAC address 14:18:77:32:77:40
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: attached PHY is 5720C (10/100/1000Base-T Ethernet) (WireSpeed[1], EEE[1])
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: RXcsums[1] LinkChgREG[0] MIirq[0] ASF[1] TSOcap[1]
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: dma_rwctrl[00000001] dma_mask[64-bit]


Here is the snippet of /var/log/messages where the NIC comes up seemingly normally, but not sure in what context (it's just after the init immediately above this snippet)...



Mar 21 16:01:13 ind1un043 kernel: ADDRCONF(NETDEV_UP): eth0: link is not ready
Mar 21 16:01:13 ind1un043 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: Link is up at 1000 Mbps, full duplex
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: Flow control is on for TX and on for RX
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.0: eth0: EEE is disabled
Mar 21 16:01:13 ind1un043 kernel: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: Link is up at 100 Mbps, half duplex
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: Flow control is off for TX and off for RX
Mar 21 16:01:13 ind1un043 kernel: tg3 0000:02:00.1: eth1: EEE is disabled
Mar 21 16:01:13 ind1un043 kernel: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready


And, here is the snippet of /var/log/messages where the NIC seems to soon-after have other issues, renegotiating or something. This happens randomly several times over the next few days...



Mar 21 16:01:42 ind1un043 kernel: ADDRCONF(NETDEV_UP): eth1: link is not ready
Mar 21 16:01:44 ind1un043 kernel: tg3 0000:02:00.1: eth1: Link is up at 100 Mbps, half duplex
Mar 21 16:01:44 ind1un043 kernel: tg3 0000:02:00.1: eth1: Flow control is off for TX and off for RX
Mar 21 16:01:44 ind1un043 kernel: tg3 0000:02:00.1: eth1: EEE is disabled
Mar 21 16:01:44 ind1un043 kernel: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready


Here is my lspci | grep -i net...



02:00.0 Ethernet controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet PCIe
02:00.1 Ethernet controller: Broadcom Corporation NetXtreme BCM5720 Gigabit Ethernet PCIe






linux centos6 dell-poweredge nic broadcom






share|improve this question















share|improve this question













share|improve this question




share|improve this question








edited Mar 25 '16 at 15:32







csr19us

















asked Mar 25 '16 at 15:14









csr19uscsr19us

63




63












  • Couple of things I would try first. 1: Check for errors in the iDRAC 2: grab the latest SUU and make sure all your drivers/firmware are up to date 3: check the switch it is connected to for errors 4: check that the switch it's connected to isn't hard coded to 100/half for some reason.

    – Zypher
    Mar 25 '16 at 15:41


















  • Couple of things I would try first. 1: Check for errors in the iDRAC 2: grab the latest SUU and make sure all your drivers/firmware are up to date 3: check the switch it is connected to for errors 4: check that the switch it's connected to isn't hard coded to 100/half for some reason.

    – Zypher
    Mar 25 '16 at 15:41

















Couple of things I would try first. 1: Check for errors in the iDRAC 2: grab the latest SUU and make sure all your drivers/firmware are up to date 3: check the switch it is connected to for errors 4: check that the switch it's connected to isn't hard coded to 100/half for some reason.

– Zypher
Mar 25 '16 at 15:41






Couple of things I would try first. 1: Check for errors in the iDRAC 2: grab the latest SUU and make sure all your drivers/firmware are up to date 3: check the switch it is connected to for errors 4: check that the switch it's connected to isn't hard coded to 100/half for some reason.

– Zypher
Mar 25 '16 at 15:41











1 Answer
1






active

oldest

votes


















0














Not sure if this is exactly an "answer" or not, but it did solve my issue, as far as this particular case is concerned. This seems to be related to ACPI.



I edited my /etc/grub.conf file to add the following to the end of the kernel line:



noapic acpi=off pci=noacpi



I'm now no longer seeing any errors in the log.



Could it be that power saving features were taking the interface down and bringing it back up in a slower mode? Or could it be some bug with the hardware/firmware and this kernel? Not sure ~ but this at least fixes my issue (it's a production server, whose network we never want to be down).






share|improve this answer























    Your Answer








    StackExchange.ready(function()
    var channelOptions =
    tags: "".split(" "),
    id: "2"
    ;
    initTagRenderer("".split(" "), "".split(" "), channelOptions);

    StackExchange.using("externalEditor", function()
    // Have to fire editor after snippets, if snippets enabled
    if (StackExchange.settings.snippets.snippetsEnabled)
    StackExchange.using("snippets", function()
    createEditor();
    );

    else
    createEditor();

    );

    function createEditor()
    StackExchange.prepareEditor(
    heartbeatType: 'answer',
    autoActivateHeartbeat: false,
    convertImagesToLinks: true,
    noModals: true,
    showLowRepImageUploadWarning: true,
    reputationToPostImages: 10,
    bindNavPrevention: true,
    postfix: "",
    imageUploader:
    brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
    contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
    allowUrls: true
    ,
    onDemand: true,
    discardSelector: ".discard-answer"
    ,immediatelyShowMarkdownHelp:true
    );



    );













    draft saved

    draft discarded


















    StackExchange.ready(
    function ()
    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fserverfault.com%2fquestions%2f766109%2ftroubleshooting-nic-logs-by-netdev-watchdog%23new-answer', 'question_page');

    );

    Post as a guest















    Required, but never shown

























    1 Answer
    1






    active

    oldest

    votes








    1 Answer
    1






    active

    oldest

    votes









    active

    oldest

    votes






    active

    oldest

    votes









    0














    Not sure if this is exactly an "answer" or not, but it did solve my issue, as far as this particular case is concerned. This seems to be related to ACPI.



    I edited my /etc/grub.conf file to add the following to the end of the kernel line:



    noapic acpi=off pci=noacpi



    I'm now no longer seeing any errors in the log.



    Could it be that power saving features were taking the interface down and bringing it back up in a slower mode? Or could it be some bug with the hardware/firmware and this kernel? Not sure ~ but this at least fixes my issue (it's a production server, whose network we never want to be down).






    share|improve this answer



























      0














      Not sure if this is exactly an "answer" or not, but it did solve my issue, as far as this particular case is concerned. This seems to be related to ACPI.



      I edited my /etc/grub.conf file to add the following to the end of the kernel line:



      noapic acpi=off pci=noacpi



      I'm now no longer seeing any errors in the log.



      Could it be that power saving features were taking the interface down and bringing it back up in a slower mode? Or could it be some bug with the hardware/firmware and this kernel? Not sure ~ but this at least fixes my issue (it's a production server, whose network we never want to be down).






      share|improve this answer

























        0












        0








        0







        Not sure if this is exactly an "answer" or not, but it did solve my issue, as far as this particular case is concerned. This seems to be related to ACPI.



        I edited my /etc/grub.conf file to add the following to the end of the kernel line:



        noapic acpi=off pci=noacpi



        I'm now no longer seeing any errors in the log.



        Could it be that power saving features were taking the interface down and bringing it back up in a slower mode? Or could it be some bug with the hardware/firmware and this kernel? Not sure ~ but this at least fixes my issue (it's a production server, whose network we never want to be down).






        share|improve this answer













        Not sure if this is exactly an "answer" or not, but it did solve my issue, as far as this particular case is concerned. This seems to be related to ACPI.



        I edited my /etc/grub.conf file to add the following to the end of the kernel line:



        noapic acpi=off pci=noacpi



        I'm now no longer seeing any errors in the log.



        Could it be that power saving features were taking the interface down and bringing it back up in a slower mode? Or could it be some bug with the hardware/firmware and this kernel? Not sure ~ but this at least fixes my issue (it's a production server, whose network we never want to be down).







        share|improve this answer












        share|improve this answer



        share|improve this answer










        answered Mar 29 '16 at 13:05









        csr19uscsr19us

        63




        63



























            draft saved

            draft discarded
















































            Thanks for contributing an answer to Server Fault!


            • Please be sure to answer the question. Provide details and share your research!

            But avoid


            • Asking for help, clarification, or responding to other answers.

            • Making statements based on opinion; back them up with references or personal experience.

            To learn more, see our tips on writing great answers.




            draft saved


            draft discarded














            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fserverfault.com%2fquestions%2f766109%2ftroubleshooting-nic-logs-by-netdev-watchdog%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown





















































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown

































            Required, but never shown














            Required, but never shown












            Required, but never shown







            Required, but never shown







            Popular posts from this blog

            Wikipedia:Vital articles Мазмуну Biography - Өмүр баян Philosophy and psychology - Философия жана психология Religion - Дин Social sciences - Коомдук илимдер Language and literature - Тил жана адабият Science - Илим Technology - Технология Arts and recreation - Искусство жана эс алуу History and geography - Тарых жана география Навигация менюсу

            Bruxelas-Capital Índice Historia | Composición | Situación lingüística | Clima | Cidades irmandadas | Notas | Véxase tamén | Menú de navegacióneO uso das linguas en Bruxelas e a situación do neerlandés"Rexión de Bruxelas Capital"o orixinalSitio da rexiónPáxina de Bruselas no sitio da Oficina de Promoción Turística de Valonia e BruxelasMapa Interactivo da Rexión de Bruxelas-CapitaleeWorldCat332144929079854441105155190212ID28008674080552-90000 0001 0666 3698n94104302ID540940339365017018237

            What should I write in an apology letter, since I have decided not to join a company after accepting an offer letterShould I keep looking after accepting a job offer?What should I do when I've been verbally told I would get an offer letter, but still haven't gotten one after 4 weeks?Do I accept an offer from a company that I am not likely to join?New job hasn't confirmed starting date and I want to give current employer as much notice as possibleHow should I address my manager in my resignation letter?HR delayed background verification, now jobless as resignedNo email communication after accepting a formal written offer. How should I phrase the call?What should I do if after receiving a verbal offer letter I am informed that my written job offer is put on hold due to some internal issues?Should I inform the current employer that I am about to resign within 1-2 weeks since I have signed the offer letter and waiting for visa?What company will do, if I send their offer letter to another company