
[net] r8169: fix NAPI handling under high load

Message ID 8f84fe39-3d8d-396d-3b97-027e0a83f8cb@gmail.com
State Changes Requested, archived
Delegated to: David Miller
Series [net] r8169: fix NAPI handling under high load

Commit Message

Heiner Kallweit Oct. 16, 2018, 8:37 p.m. UTC
rtl_rx() and rtl_tx() are called only if the respective bits are set
in the interrupt status register. Under high load NAPI may not be
able to process all data (work_done == budget) and it will schedule
subsequent calls to the poll callback.
rtl_ack_events() however resets the bits in the interrupt status
register, therefore subsequent calls to rtl8169_poll() won't call
rtl_rx() and rtl_tx() - chip interrupts are still disabled.

Fix this by calling rtl_rx() and rtl_tx() independent of the bits
set in the interrupt status register. Both functions will detect
if there's nothing to do for them.
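
For illustration, the relevant poll logic before this change, condensed
from the patch at the bottom of this page (slow-event handling and the
NAPI completion path omitted):

	status = rtl_get_events(tp);
	rtl_ack_events(tp, status & ~tp->event_slow);	/* clears the RX/TX bits */

	if (status & RTL_EVENT_NAPI_RX)		/* false on a rescheduled poll */
		work_done = rtl_rx(dev, tp, (u32) budget);

	if (status & RTL_EVENT_NAPI_TX)		/* likewise false */
		rtl_tx(dev, tp);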

This issue has been there more or less forever (at least it exists in
3.16 already), so I can't provide a "Fixes" tag. 

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
---
 drivers/net/ethernet/realtek/r8169.c | 8 +++-----
 1 file changed, 3 insertions(+), 5 deletions(-)

Comments

Holger Hoffstätte Oct. 16, 2018, 9:17 p.m. UTC | #1
On 10/16/18 22:37, Heiner Kallweit wrote:
> rtl_rx() and rtl_tx() are called only if the respective bits are set
> in the interrupt status register. Under high load NAPI may not be
> able to process all data (work_done == budget) and it will schedule
> subsequent calls to the poll callback.
> rtl_ack_events() however resets the bits in the interrupt status
> register, therefore subsequent calls to rtl8169_poll() won't call
> rtl_rx() and rtl_tx() - chip interrupts are still disabled.

Very interesting! Could this be the reason for the mysterious
hangs & resets we experienced when enabling BQL for r8169?
They happened more often with TSO/GSO enabled and several people
attempted to fix those hangs unsuccessfully; it was later reverted
and has been since then (#87cda7cb43).
If this bug has been there "forever" it might be tempting to
re-apply BQL and see what happens. Any chance you could give that
a try? I'll gladly test patches, just like I'll run this one.

cheers
Holger
Stephen Hemminger Oct. 16, 2018, 10:17 p.m. UTC | #2
On Tue, 16 Oct 2018 22:37:31 +0200
Heiner Kallweit <hkallweit1@gmail.com> wrote:

> rtl_rx() and rtl_tx() are called only if the respective bits are set
> in the interrupt status register. Under high load NAPI may not be
> able to process all data (work_done == budget) and it will schedule
> subsequent calls to the poll callback.
> rtl_ack_events() however resets the bits in the interrupt status
> register, therefore subsequent calls to rtl8169_poll() won't call
> rtl_rx() and rtl_tx() - chip interrupts are still disabled.
> 
> Fix this by calling rtl_rx() and rtl_tx() independent of the bits
> set in the interrupt status register. Both functions will detect
> if there's nothing to do for them.
> 
> This issue has been there more or less forever (at least it exists in
> 3.16 already), so I can't provide a "Fixes" tag. 
> 
> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>

Another issue is this:

	if (work_done < budget) {
		napi_complete_done(napi, work_done);

		rtl_irq_enable(tp, enable_mask);
		mmiowb();
	}

	return work_done;
}


The code needs to check return value of napi_complete_done.

	if (work_done < budget &&
	    napi_complete_done(napi, work_done)) {
		rtl_irq_enable(tp, enable_mask);
		mmiowb();
	}

	return work_done;
}

Try that, it might fix the problem and your logic would
be unnecessary
Stephen Hemminger Oct. 16, 2018, 11:03 p.m. UTC | #3
On Tue, 16 Oct 2018 23:17:31 +0200
Holger Hoffstätte <holger@applied-asynchrony.com> wrote:

> On 10/16/18 22:37, Heiner Kallweit wrote:
> > rtl_rx() and rtl_tx() are called only if the respective bits are set
> > in the interrupt status register. Under high load NAPI may not be
> > able to process all data (work_done == budget) and it will schedule
> > subsequent calls to the poll callback.
> > rtl_ack_events() however resets the bits in the interrupt status
> > register, therefore subsequent calls to rtl8169_poll() won't call
> > rtl_rx() and rtl_tx() - chip interrupts are still disabled.  
> 
> Very interesting! Could this be the reason for the mysterious
> hangs & resets we experienced when enabling BQL for r8169?
> They happened more often with TSO/GSO enabled and several people
> attempted to fix those hangs unsuccessfully; it was later reverted
> and has been since then (#87cda7cb43).
> If this bug has been there "forever" it might be tempting to
> re-apply BQL and see what happens. Any chance you could give that
> a try? I'll gladly test patches, just like I'll run this one.
> 
> cheers
> Holger

Many drivers have buggy usage of napi_complete_done.

Might even be worth forcing all network drivers to check the return
value. But fixing 150 broken drivers will be a nuisance.

diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index dc1d9ed33b31..c38bc66ffe74 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -466,7 +466,8 @@ static inline bool napi_reschedule(struct napi_struct *napi)
        return false;
 }
 
-bool napi_complete_done(struct napi_struct *n, int work_done);
+bool __must_check napi_complete_done(struct napi_struct *n, int work_done);
+
 /**
  *     napi_complete - NAPI processing complete
  *     @n: NAPI context
Florian Fainelli Oct. 16, 2018, 11:08 p.m. UTC | #4
On 10/16/2018 04:03 PM, Stephen Hemminger wrote:
> On Tue, 16 Oct 2018 23:17:31 +0200
> Holger Hoffstätte <holger@applied-asynchrony.com> wrote:
> 
>> On 10/16/18 22:37, Heiner Kallweit wrote:
>>> rtl_rx() and rtl_tx() are called only if the respective bits are set
>>> in the interrupt status register. Under high load NAPI may not be
>>> able to process all data (work_done == budget) and it will schedule
>>> subsequent calls to the poll callback.
>>> rtl_ack_events() however resets the bits in the interrupt status
>>> register, therefore subsequent calls to rtl8169_poll() won't call
>>> rtl_rx() and rtl_tx() - chip interrupts are still disabled.  
>>
>> Very interesting! Could this be the reason for the mysterious
>> hangs & resets we experienced when enabling BQL for r8169?
>> They happened more often with TSO/GSO enabled and several people
>> attempted to fix those hangs unsuccessfully; it was later reverted
>> and has been since then (#87cda7cb43).
>> If this bug has been there "forever" it might be tempting to
>> re-apply BQL and see what happens. Any chance you could give that
>> a try? I'll gladly test patches, just like I'll run this one.
>>
>> cheers
>> Holger
> 
> Many drivers have buggy usage of napi_complete_done.
> 
> Might even be worth forcing all network drivers to check the return
> value. But fixing 150 broken drivers will be a nuisance.

I had started doing that about a month ago in light of the ixgbe
ndo_poll_controller vs. napi problem, but have not had time to submit
that series yet:

https://github.com/ffainelli/linux/commits/napi-check

feel free to piggy back on top of that series if you would like to
address this.

> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index dc1d9ed33b31..c38bc66ffe74 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -466,7 +466,8 @@ static inline bool napi_reschedule(struct napi_struct *napi)
>         return false;
>  }
>  
> -bool napi_complete_done(struct napi_struct *n, int work_done);
> +bool __must_check napi_complete_done(struct napi_struct *n, int work_done);
> +
>  /**
>   *     napi_complete - NAPI processing complete
>   *     @n: NAPI context
>
Eric Dumazet Oct. 17, 2018, 12:19 a.m. UTC | #5
On 10/16/2018 03:17 PM, Stephen Hemminger wrote:
> On Tue, 16 Oct 2018 22:37:31 +0200
> Heiner Kallweit <hkallweit1@gmail.com> wrote:
> 
>> rtl_rx() and rtl_tx() are called only if the respective bits are set
>> in the interrupt status register. Under high load NAPI may not be
>> able to process all data (work_done == budget) and it will schedule
>> subsequent calls to the poll callback.
>> rtl_ack_events() however resets the bits in the interrupt status
>> register, therefore subsequent calls to rtl8169_poll() won't call
>> rtl_rx() and rtl_tx() - chip interrupts are still disabled.
>>
>> Fix this by calling rtl_rx() and rtl_tx() independent of the bits
>> set in the interrupt status register. Both functions will detect
>> if there's nothing to do for them.
>>
>> This issue has been there more or less forever (at least it exists in
>> 3.16 already), so I can't provide a "Fixes" tag. 
>>
>> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
> 
> Another issue is this:
> 
> 	if (work_done < budget) {
> 		napi_complete_done(napi, work_done);
> 
> 		rtl_irq_enable(tp, enable_mask);
> 		mmiowb();
> 	}
> 
> 	return work_done;
> }
> 
> 
> The code needs to check return value of napi_complete_done.
> 
> 	if (work_done < budget &&
> 	    napi_complete_done(napi, work_done)) {
> 		rtl_irq_enable(tp, enable_mask);
> 		mmiowb();
> 	}
> 
> 	return work_done;
> }
> 
> Try that, it might fix the problem and your logic would
> be unnecessary 
> 

Well, I do not believe this is related.

Testing napi_complete_done() is not mandatory, it is only an optimization [1]
and only for busy polling users.

In short, by default, this should not matter, since busy-polling is not enabled by default.


[1] busy polling users are spinning anyway, so it is not even clear if this
   is really an optimization, unless maybe the cost of irq enabling is really really high...
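
In code terms, a minimal sketch of the checked idiom being discussed
(names are taken from the r8169 snippet quoted above; the comment restates
the busy-polling point made here, it is not driver documentation):

	if (work_done < budget &&
	    napi_complete_done(napi, work_done)) {
		/* napi_complete_done() returns false only when busy polling
		 * owns this NAPI context; the busy-polling thread keeps
		 * spinning and will poll again, so skipping the interrupt
		 * re-enable here is an optimization, not a correctness fix.
		 */
		rtl_irq_enable(tp, enable_mask);
		mmiowb();
	}

	return work_done;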
Eric Dumazet Oct. 17, 2018, 12:21 a.m. UTC | #6
On 10/16/2018 04:03 PM, Stephen Hemminger wrote:

> Many drivers have buggy usage of napi_complete_done.
> 
> Might even be worth forcing all network drivers to check the return
> value. But fixing 150 broken drivers will be a nuisance.
> 
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index dc1d9ed33b31..c38bc66ffe74 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -466,7 +466,8 @@ static inline bool napi_reschedule(struct napi_struct *napi)
>         return false;
>  }
>  
> -bool napi_complete_done(struct napi_struct *n, int work_done);
> +bool __must_check napi_complete_done(struct napi_struct *n, int work_done);
> +
>  /**
>   *     napi_complete - NAPI processing complete
>   *     @n: NAPI context
> 

I disagree completely.
This never has been a requirement.
Eric Dumazet Oct. 17, 2018, 12:23 a.m. UTC | #7
On 10/16/2018 04:08 PM, Florian Fainelli wrote:

> I had started doing that about a month ago in light of the ixgbe
> ndo_poll_controller vs. napi problem, but have not had time to submit
> that series yet:
> 
> https://github.com/ffainelli/linux/commits/napi-check
> 
> feel free to piggy back on top of that series if you would like to
> address this.


But the root cause was different, you remember?

We fixed the real issue with netpoll's ability to stick all NIC IRQs onto one
victim CPU.
Florian Fainelli Oct. 17, 2018, 3:10 a.m. UTC | #8
On 10/16/2018 5:23 PM, Eric Dumazet wrote:
> 
> 
> On 10/16/2018 04:08 PM, Florian Fainelli wrote:
> 
>> I had started doing that about a month ago in light of the ixgbe
>> ndo_poll_controller vs. napi problem, but have not had time to submit
>> that series yet:
>>
>> https://github.com/ffainelli/linux/commits/napi-check
>>
>> feel free to piggy back on top of that series if you would like to
>> address this.
> 
> 
> But the root cause was different, you remember?
> 
> We fixed the real issue with netpoll's ability to stick all NIC IRQs onto one
> victim CPU.

I do remember; after seeing the resolution I kind of decided to leave
this in a branch but not submit it because it was not particularly
relevant anymore.
Heiner Kallweit Oct. 17, 2018, 6:12 p.m. UTC | #9
On 16.10.2018 23:17, Holger Hoffstätte wrote:
> On 10/16/18 22:37, Heiner Kallweit wrote:
>> rtl_rx() and rtl_tx() are called only if the respective bits are set
>> in the interrupt status register. Under high load NAPI may not be
>> able to process all data (work_done == budget) and it will schedule
>> subsequent calls to the poll callback.
>> rtl_ack_events() however resets the bits in the interrupt status
>> register, therefore subsequent calls to rtl8169_poll() won't call
>> rtl_rx() and rtl_tx() - chip interrupts are still disabled.
> 
> Very interesting! Could this be the reason for the mysterious
> hangs & resets we experienced when enabling BQL for r8169?
> They happened more often with TSO/GSO enabled and several people
> attempted to fix those hangs unsuccessfully; it was later reverted
> and has been since then (#87cda7cb43).
> If this bug has been there "forever" it might be tempting to
> re-apply BQL and see what happens. Any chance you could give that
> a try? I'll gladly test patches, just like I'll run this one.
> 
After reading through the old mail threads regarding BQL on r8169
I don't think the fix here is related.
It seems that BQL on r8169 worked fine for most people, just one
had problems on one of his systems. I assume the issue was specific
to one chip version. Of the ~50 chip versions supported by
r8169, more or less each one requires its own quirks.
If we're lucky the chip-version-specific issue has been fixed in
the meantime and we can simply apply the old BQL patch again.

If it turns out that certain chip versions can't be used with BQL,
then we can disable the feature for these chip versions instead
of removing the feature completely.
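
Purely as a sketch of that idea (the helper below is hypothetical, it does
not exist in the driver), the BQL accounting from the patch further down
could be made conditional on the detected chip version:

	/* hypothetical helper: returns false for chip versions that turn
	 * out to misbehave with BQL, keyed off tp->mac_version
	 */
	if (rtl_chip_supports_bql(tp))
		netdev_sent_queue(dev, skb->len);

with matching checks around netdev_completed_queue() and
netdev_reset_queue().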

I will apply the old BQL patch and see how it behaves on my system
(with GRO and SG enabled).

> cheers
> Holger
> 
Heiner
Heiner Kallweit Oct. 17, 2018, 6:48 p.m. UTC | #10
Tomas,

more than three years back you reported network problems after BQL
was added to the r8169 driver. Due to this the change was reverted.
Now the discussion to add BQL popped up again.
You mentioned that the issue exists on one of your systems only.
Therefore it could be an issue specific to a particular chip version.

I'd be interested in the chip version of the affected system.
You linked to another similar report, there the chip version was:
r8169 0000:10:00.0 eth0: RTL8168c/8111c at 0xf8130000, 00:e0:4c:68:48:d2, XID 1c4000c0 IRQ 29

In case you still have the affected system or at least the old dmesg
logs, I'd appreciate if you could let me know the quoted line from
dmesg output.


Thanks a lot,
Heiner

-------- Forwarded Message --------
Subject: Re: [PATCH net] r8169: fix NAPI handling under high load
Date: Wed, 17 Oct 2018 20:12:48 +0200
From: Heiner Kallweit <hkallweit1@gmail.com>
To: Holger Hoffstätte <holger@applied-asynchrony.com>, David Miller <davem@davemloft.net>, Realtek linux nic maintainers <nic_swsd@realtek.com>
CC: netdev@vger.kernel.org <netdev@vger.kernel.org>

On 16.10.2018 23:17, Holger Hoffstätte wrote:
> On 10/16/18 22:37, Heiner Kallweit wrote:
>> rtl_rx() and rtl_tx() are called only if the respective bits are set
>> in the interrupt status register. Under high load NAPI may not be
>> able to process all data (work_done == budget) and it will schedule
>> subsequent calls to the poll callback.
>> rtl_ack_events() however resets the bits in the interrupt status
>> register, therefore subsequent calls to rtl8169_poll() won't call
>> rtl_rx() and rtl_tx() - chip interrupts are still disabled.
> 
> Very interesting! Could this be the reason for the mysterious
> hangs & resets we experienced when enabling BQL for r8169?
> They happened more often with TSO/GSO enabled and several people
> attempted to fix those hangs unsuccessfully; it was later reverted
> and has been since then (#87cda7cb43).
> If this bug has been there "forever" it might be tempting to
> re-apply BQL and see what happens. Any chance you could give that
> a try? I'll gladly test patches, just like I'll run this one.
> 
After reading through the old mail threads regarding BQL on r8169
I don't think the fix here is related.
It seems that BQL on r8169 worked fine for most people, just one
had problems on one of his systems. I assume the issue was specific
to one chip version. Of the ~50 chip versions supported by
r8169, more or less each one requires its own quirks.
If we're lucky the chip-version-specific issue has been fixed in
the meantime and we can simply apply the old BQL patch again.

If it turns out that certain chip versions can't be used with BQL,
then we can disable the feature for these chip versions instead
of removing the feature completely.

I will apply the old BQL patch and see how it behaves on my system
(with GRO and SG enabled).

> cheers
> Holger
> 
Heiner
Holger Hoffstätte Oct. 17, 2018, 7:11 p.m. UTC | #11
On 10/17/18 20:12, Heiner Kallweit wrote:
> On 16.10.2018 23:17, Holger Hoffstätte wrote:
>> On 10/16/18 22:37, Heiner Kallweit wrote:
>>> rtl_rx() and rtl_tx() are called only if the respective bits are set
>>> in the interrupt status register. Under high load NAPI may not be
>>> able to process all data (work_done == budget) and it will schedule
>>> subsequent calls to the poll callback.
>>> rtl_ack_events() however resets the bits in the interrupt status
>>> register, therefore subsequent calls to rtl8169_poll() won't call
>>> rtl_rx() and rtl_tx() - chip interrupts are still disabled.
>>
>> Very interesting! Could this be the reason for the mysterious
>> hangs & resets we experienced when enabling BQL for r8169?
>> They happened more often with TSO/GSO enabled and several people
>> attempted to fix those hangs unsuccessfully; it was later reverted
>> and has been since then (#87cda7cb43).
>> If this bug has been there "forever" it might be tempting to
>> re-apply BQL and see what happens. Any chance you could give that
>> a try? I'll gladly test patches, just like I'll run this one.
>>
> After reading through the old mail threads regarding BQL on r8169
> I don't think the fix here is related.
> It seems that BQL on r8169 worked fine for most people, just one
> had problems on one of his systems. I assume the issue was specific

I continued to use the BQL patch in my private tree after it was reverted
and also had occasional timeouts, but *only* after I started playing
with ethtool to change offload settings. Without offloads or the BQL patch
everything has been rock-solid since then.
The other weird problem was that timeouts would occur on an otherwise
*completely idle* system. Since that occasionally borked my NFS server
over night I ultimately removed BQL as well. Rock-solid since then.

> I will apply the old BQL patch and see how it behaves on my system
> (with GRO and SG enabled).

I don't think it still applies cleanly, but if you cook up an updated
version I'll gladly test it.

Thanks! :)
Holger
Heiner Kallweit Oct. 17, 2018, 7:27 p.m. UTC | #12
On 17.10.2018 21:11, Holger Hoffstätte wrote:
> On 10/17/18 20:12, Heiner Kallweit wrote:
>> On 16.10.2018 23:17, Holger Hoffstätte wrote:
>>> On 10/16/18 22:37, Heiner Kallweit wrote:
>>>> rtl_rx() and rtl_tx() are called only if the respective bits are set
>>>> in the interrupt status register. Under high load NAPI may not be
>>>> able to process all data (work_done == budget) and it will schedule
>>>> subsequent calls to the poll callback.
>>>> rtl_ack_events() however resets the bits in the interrupt status
>>>> register, therefore subsequent calls to rtl8169_poll() won't call
>>>> rtl_rx() and rtl_tx() - chip interrupts are still disabled.
>>>
>>> Very interesting! Could this be the reason for the mysterious
>>> hangs & resets we experienced when enabling BQL for r8169?
>>> They happened more often with TSO/GSO enabled and several people
>>> attempted to fix those hangs unsuccessfully; it was later reverted
>>> and has been since then (#87cda7cb43).
>>> If this bug has been there "forever" it might be tempting to
>>> re-apply BQL and see what happens. Any chance you could give that
>>> a try? I'll gladly test patches, just like I'll run this one.
>>>
>> After reading through the old mail threads regarding BQL on r8169
>> I don't think the fix here is related.
>> It seems that BQL on r8169 worked fine for most people, just one
>> had problems on one of his systems. I assume the issue was specific
> 
> I continued to use the BQL patch in my private tree after it was reverted
> and also had occasional timeouts, but *only* after I started playing
> with ethtool to change offload settings. Without offloads or the BQL patch
> everything has been rock-solid since then.
> The other weird problem was that timeouts would occur on an otherwise
> *completely idle* system. Since that occasionally borked my NFS server
> over night I ultimately removed BQL as well. Rock-solid since then.
> 
>> I will apply the old BQL patch and see how it behaves on my system
>> (with GRO and SG enabled).
> 
> I don't think it still applies cleanly, but if you cook up an updated
> version I'll gladly test it.
> 
> Thanks! :)
> Holger
> 

Good to know. What's your kernel version and RTL8168 chip version?
Regarding the chip version the dmesg line with the XID would be relevant.

Below is the slightly modified original BQL patch, I just moved the call
to netdev_reset_queue(). This patch applies at least to latest linux-next.

My test system:
- RTL8168evl
- latest linux-next
- BQL patch applied
- SG/GRO enabled:

rx-checksumming: on
tx-checksumming: on
        tx-checksum-ipv4: on
        tx-checksum-ip-generic: off [fixed]
        tx-checksum-ipv6: on
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: off [fixed]
        tx-tcp-mangleid-segmentation: on
        tx-tcp6-segmentation: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: on
tx-vlan-offload: on

I briefly tested normal operation and did some tests with iperf3.
Everything looks good so far.

---
 drivers/net/ethernet/realtek/r8169.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 0d8070adc..e236b46b8 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -5852,6 +5852,7 @@ static void rtl8169_tx_clear(struct rtl8169_private *tp)
 {
 	rtl8169_tx_clear_range(tp, tp->dirty_tx, NUM_TX_DESC);
 	tp->cur_tx = tp->dirty_tx = 0;
+	netdev_reset_queue(tp->dev);
 }
 
 static void rtl_reset_work(struct rtl8169_private *tp)
@@ -6154,6 +6155,8 @@ static netdev_tx_t rtl8169_start_xmit(struct sk_buff *skb,
 
 	txd->opts2 = cpu_to_le32(opts[1]);
 
+	netdev_sent_queue(dev, skb->len);
+
 	skb_tx_timestamp(skb);
 
 	/* Force memory writes to complete before releasing descriptor */
@@ -6252,7 +6255,7 @@ static void rtl8169_pcierr_interrupt(struct net_device *dev)
 
 static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp)
 {
-	unsigned int dirty_tx, tx_left;
+	unsigned int dirty_tx, tx_left, bytes_compl = 0, pkts_compl = 0;
 
 	dirty_tx = tp->dirty_tx;
 	smp_rmb();
@@ -6276,10 +6279,8 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp)
 		rtl8169_unmap_tx_skb(tp_to_dev(tp), tx_skb,
 				     tp->TxDescArray + entry);
 		if (status & LastFrag) {
-			u64_stats_update_begin(&tp->tx_stats.syncp);
-			tp->tx_stats.packets++;
-			tp->tx_stats.bytes += tx_skb->skb->len;
-			u64_stats_update_end(&tp->tx_stats.syncp);
+			pkts_compl++;
+			bytes_compl += tx_skb->skb->len;
 			dev_consume_skb_any(tx_skb->skb);
 			tx_skb->skb = NULL;
 		}
@@ -6288,6 +6289,13 @@ static void rtl_tx(struct net_device *dev, struct rtl8169_private *tp)
 	}
 
 	if (tp->dirty_tx != dirty_tx) {
+		netdev_completed_queue(dev, pkts_compl, bytes_compl);
+
+		u64_stats_update_begin(&tp->tx_stats.syncp);
+		tp->tx_stats.packets += pkts_compl;
+		tp->tx_stats.bytes += bytes_compl;
+		u64_stats_update_end(&tp->tx_stats.syncp);
+
 		tp->dirty_tx = dirty_tx;
 		/* Sync with rtl8169_start_xmit:
 		 * - publish dirty_tx ring index (write barrier)
Holger Hoffstätte Oct. 17, 2018, 8:07 p.m. UTC | #13
On 10/17/18 21:27, Heiner Kallweit wrote:
(snip)
> Good to know. What's your kernel version and RTL8168 chip version?
> Regarding the chip version the dmesg line with the XID would be relevant.

4.18.15 + PDS (custom CPU scheduler) + cherry pickings from mainline.
Applied both the original patch in this thread & bql, built fine.

Server:
r8169 0000:04:00.0 eth0: RTL8168evl/8111evl, c8:60:00:68:33:cc, XID 2c900800, IRQ 30

Workstation:
r8169 0000:04:00.0 eth0: RTL8168evl/8111evl, 50:e5:49:41:7d:ad, XID 2c900800, IRQ 33

So same chipsets.

On both:
ethtool --coalesce eth0 rx-frames 0 rx-usecs 50 tx-frames 0 tx-usecs 50
ethtool --offload eth0 rx on tx on gro on gso on sg on tso on

Let's see how it goes. :)

cheers
Holger
Francois Romieu Oct. 17, 2018, 11:30 p.m. UTC | #14
Heiner Kallweit <hkallweit1@gmail.com> :
[...]
> This issue has been there more or less forever (at least it exists in
> 3.16 already), so I can't provide a "Fixes" tag. 

Hardly forever. It fixes da78dbff2e05630921c551dbbc70a4b7981a8fff.
Francois Romieu Oct. 17, 2018, 11:30 p.m. UTC | #15
Holger Hoffstätte <holger@applied-asynchrony.com> :
[...]
> I continued to use the BQL patch in my private tree after it was reverted
> and also had occasional timeouts, but *only* after I started playing
> with ethtool to change offload settings. Without offloads or the BQL patch
> everything has been rock-solid since then.
> The other weird problem was that timeouts would occur on an otherwise
> *completely idle* system. Since that occasionally borked my NFS server
> over night I ultimately removed BQL as well. Rock-solid since then.

The bug will induce delayed rx processing when a spike of "load" is
followed by an idle period. I do not see how bql would matter per se though.
David Miller Oct. 18, 2018, 5:21 a.m. UTC | #16
From: Francois Romieu <romieu@fr.zoreil.com>
Date: Thu, 18 Oct 2018 01:30:45 +0200

> Heiner Kallweit <hkallweit1@gmail.com> :
> [...]
>> This issue has been there more or less forever (at least it exists in
>> 3.16 already), so I can't provide a "Fixes" tag. 
> 
> Hardly forever. It fixes da78dbff2e05630921c551dbbc70a4b7981a8fff.

I don't see exactly how that can be true.

That commit didn't change the parts of the NAPI poll processing which
are relevant here, mainly the guarding of the RX and TX work using
the status bits which are cleared.

Maybe I'm missing something?  If so, indeed it would be nice to add
a proper Fixes: tag here.

Thanks!
Jonathan Woithe Oct. 18, 2018, 5:58 a.m. UTC | #17
On Thu, Oct 18, 2018 at 01:30:51AM +0200, Francois Romieu wrote:
> Holger Hoffstätte <holger@applied-asynchrony.com> :
> [...]
> > I continued to use the BQL patch in my private tree after it was reverted
> > and also had occasional timeouts, but *only* after I started playing
> > with ethtool to change offload settings. Without offloads or the BQL patch
> > everything has been rock-solid since then.
> > The other weird problem was that timeouts would occur on an otherwise
> > *completely idle* system. Since that occasionally borked my NFS server
> > over night I ultimately removed BQL as well. Rock-solid since then.
> 
> The bug will induce delayed rx processing when a spike of "load" is
> followed by an idle period.

If this is the case, I wonder whether this bug might also be the cause of
the long reception delays we've observed at times when a period of high
network load is followed by almost nothing[1].  That thread[2] details the
investigations subsequently done.  A git bisect showed that commit
da78dbff2e05630921c551dbbc70a4b7981a8fff was the origin of the misbehaviour
we were observing.

We still see the problem when we test with recent kernels.  It would be
great if the underlying problem has now been identified.

I can possibly scrape some hardware together to test any proposed fix under
our workload if there was interest.

Regards
  jonathan

[1] https://marc.info/?l=linux-netdev&m=136281333207734&w=2
[2] https://marc.info/?t=136281339500002&r=1&w=2
Heiner Kallweit Oct. 18, 2018, 5:58 a.m. UTC | #18
On 18.10.2018 07:21, David Miller wrote:
> From: Francois Romieu <romieu@fr.zoreil.com>
> Date: Thu, 18 Oct 2018 01:30:45 +0200
> 
>> Heiner Kallweit <hkallweit1@gmail.com> :
>> [...]
>>> This issue has been there more or less forever (at least it exists in
>>> 3.16 already), so I can't provide a "Fixes" tag. 
>>
>> Hardly forever. It fixes da78dbff2e05630921c551dbbc70a4b7981a8fff.
> 
> I don't see exactly how that can be true.
> 
> That commit didn't change the parts of the NAPI poll processing which
> are relevant here, mainly the guarding of the RX and TX work using
> the status bits which are cleared.
> 

AFAICS Francois is right and patch da78dbff2e05 ("r8169: remove work
from irq handler") introduced the guarding of RX and TX work.
I just checked back to 3.16 as the oldest LTS kernel version.

> Maybe I'm missing something?  If so, indeed it would be nice to add
> a proper Fixes: tag here.
> 
Shall I submit a v2 including the Fixes line?

> Thanks!
>
Heiner Kallweit Oct. 18, 2018, 6:03 a.m. UTC | #19
On 18.10.2018 07:58, Jonathan Woithe wrote:
> On Thu, Oct 18, 2018 at 01:30:51AM +0200, Francois Romieu wrote:
>> Holger Hoffstätte <holger@applied-asynchrony.com> :
>> [...]
>>> I continued to use the BQL patch in my private tree after it was reverted
>>> and also had occasional timeouts, but *only* after I started playing
>>> with ethtool to change offload settings. Without offloads or the BQL patch
>>> everything has been rock-solid since then.
>>> The other weird problem was that timeouts would occur on an otherwise
>>> *completely idle* system. Since that occasionally borked my NFS server
>>> over night I ultimately removed BQL as well. Rock-solid since then.
>>
>> The bug will induce delayed rx processing when a spike of "load" is
>> followed by an idle period.
> 
> If this is the case, I wonder whether this bug might also be the cause of
> the long reception delays we've observed at times when a period of high
> network load is followed by almost nothing[1].  That thread[2] details the
> investigations subsequently done.  A git bisect showed that commit
> da78dbff2e05630921c551dbbc70a4b7981a8fff was the origin of the misbehaviour
> we were observing.
> 
> We still see the problem when we test with recent kernels.  It would be
> great if the underlying problem has now been identified.
> 
> I can possibly scrape some hardware together to test any proposed fix under
> our workload if there was interest.
> 
Proposed fix is here:
https://patchwork.ozlabs.org/patch/985014/
Would be good if you could test it. Thanks!

Heiner

> Regards
>   jonathan
> 
> [1] https://marc.info/?l=linux-netdev&m=136281333207734&w=2
> [2] https://marc.info/?t=136281339500002&r=1&w=2
>
Jonathan Woithe Oct. 18, 2018, 6:15 a.m. UTC | #20
On Thu, Oct 18, 2018 at 08:03:32AM +0200, Heiner Kallweit wrote:
> On 18.10.2018 07:58, Jonathan Woithe wrote:
> > On Thu, Oct 18, 2018 at 01:30:51AM +0200, Francois Romieu wrote:
> >> Holger Hoffstätte <holger@applied-asynchrony.com> :
> >> [...]
> >> The bug will induce delayed rx processing when a spike of "load" is
> >> followed by an idle period.
> > 
> > If this is the case, I wonder whether this bug might also be the cause of
> > the long reception delays we've observed at times when a period of high
> > network load is followed by almost nothing[1].  That thread[2] details the
> > investigations subsequently done.  A git bisect showed that commit
> > da78dbff2e05630921c551dbbc70a4b7981a8fff was the origin of the misbehaviour
> > we were observing.
> > 
> > We still see the problem when we test with recent kernels.  It would be
> > great if the underlying problem has now been identified.
> > 
> > I can possibly scrape some hardware together to test any proposed fix under
> > our workload if there was interest.
> > 
> Proposed fix is here:
> https://patchwork.ozlabs.org/patch/985014/
> Would be good if you could test it. Thanks!

I should be able to do so tomorrow.  Which kernel would you like me to apply
the patch to?

Regards
  jonathan
David Miller Oct. 18, 2018, 6:24 a.m. UTC | #21
From: Heiner Kallweit <hkallweit1@gmail.com>
Date: Thu, 18 Oct 2018 07:58:52 +0200

> On 18.10.2018 07:21, David Miller wrote:
>> From: Francois Romieu <romieu@fr.zoreil.com>
>> Date: Thu, 18 Oct 2018 01:30:45 +0200
>> 
>>> Heiner Kallweit <hkallweit1@gmail.com> :
>>> [...]
>>>> This issue has been there more or less forever (at least it exists in
>>>> 3.16 already), so I can't provide a "Fixes" tag. 
>>>
>>> Hardly forever. It fixes da78dbff2e05630921c551dbbc70a4b7981a8fff.
>> 
>> I don't see exactly how that can be true.
>> 
>> That commit didn't change the parts of the NAPI poll processing which
>> are relevant here, mainly the guarding of the RX and TX work using
>> the status bits which are cleared.
>> 
> 
> AFAICS Francois is right and patch da78dbff2e05 ("r8169: remove work
> from irq handler") introduced the guarding of RX and TX work.
> I just checked back to 3.16 as oldest LTS kernel version.

Aha, now I see it.

>> Maybe I'm missing something?  If so, indeed it would be nice to add
>> a proper Fixes: tag here.
>> 
> Shall I submit a v2 including the Fixes line?

Yes, please do!
Holger Hoffstätte Oct. 18, 2018, 11:52 a.m. UTC | #22
On 10/18/18 08:15, Jonathan Woithe wrote:
> On Thu, Oct 18, 2018 at 08:03:32AM +0200, Heiner Kallweit wrote:
>> On 18.10.2018 07:58, Jonathan Woithe wrote:
>>> On Thu, Oct 18, 2018 at 01:30:51AM +0200, Francois Romieu wrote:
>>>> Holger Hoffstätte <holger@applied-asynchrony.com> :
>>>> [...]
>>>> The bug will induce delayed rx processing when a spike of "load" is
>>>> followed by an idle period.
>>>
>>> If this is the case, I wonder whether this bug might also be the cause of
>>> the long reception delays we've observed at times when a period of high
>>> network load is followed by almost nothing[1].  That thread[2] details the
>>> investigations subsequently done.  A git bisect showed that commit
>>> da78dbff2e05630921c551dbbc70a4b7981a8fff was the origin of the misbehaviour
>>> we were observing.
>>>
>>> We still see the problem when we test with recent kernels.  It would be
>>> great if the underlying problem has now been identified.
>>>
>>> I can possibly scrape some hardware together to test any proposed fix under
>>> our workload if there was interest.
>>>
>> Proposed fix is here:
>> https://patchwork.ozlabs.org/patch/985014/
>> Would be good if you could test it. Thanks!
> 
> I should be able to do so tomorrow.  Which kernel would you like me to apply
> the patch to?

Hi Jonathan,

I'm already running it on 4.18.15, so either that or latest 4.19-rc would
work as well.

cheers
Holger
Jonathan Woithe Oct. 19, 2018, 7:29 a.m. UTC | #23
On Thu, Oct 18, 2018 at 01:52:33PM +0200, Holger Hoffstätte wrote:
> On 10/18/18 08:15, Jonathan Woithe wrote:
> > On Thu, Oct 18, 2018 at 08:03:32AM +0200, Heiner Kallweit wrote:
> > > Proposed fix is here:
> > > https://patchwork.ozlabs.org/patch/985014/
> > > Would be good if you could test it. Thanks!
> > 
> > I should be able to do so tomorrow.  Which kernel would you like me to apply
> > the patch to?
> 
> Hi Jonathan,
> 
> I'm already running it on 4.18.15, so either that or latest 4.19-rc would
> work as well.

It turns out I couldn't compile 4.18.15 conveniently for the test system due
to the age of the compiler.  The compiler, like the rest of the
distribution, is ancient: the presence of the bug has prevented us updating
the system due to a need to support hardware in the field.  The bug has
affected the operation of our systems in all kernels since (I think) 3.3.

I could compile 4.14 though - I had previously confirmed that the problem
was still seen with that kernel.  I therefore applied the fix to 4.14 (which
was trivial) and started a test with that.  The system has been running
without exhibiting the effects of the bug for over five hours.  With an
unpatched 4.14, symptoms were typically seen within 30 minutes.

I will leave the system to operate over the weekend to be sure, but at this
stage it seems likely that the patch will resolve this long-standing
difficulty that we've experienced.  I will report back on Monday.

Regards
  jonathan
Holger Hoffstätte Oct. 20, 2018, 9:55 a.m. UTC | #24
On 10/17/18 22:07, Holger Hoffstätte wrote:
> On 10/17/18 21:27, Heiner Kallweit wrote:
> (snip)
>> Good to know. What's your kernel version and RTL8168 chip version?
>> Regarding the chip version the dmesg line with the XID would be relevant.
> 
> 4.18.15 + PDS (custom CPU scheduler) + cherry pickings from mainline.
> Applied both the original patch in this thread & bql, built fine.

Good news everyone! Been running with the new BQL patch for three days
now on 2 machines and not a single hang/reset, regardless of load.
Coupled with the original patch in this thread (already in mainline)
this looks pretty good!

So while I can only speak for myself & my hardware, here's a

Tested-by: Holger Hoffstätte <holger@applied-asynchrony.com>

Thanks Heiner!

-h
Jonathan Woithe Oct. 21, 2018, 11:07 p.m. UTC | #25
On Fri, 19 Oct 2018 17:59:21 +1030, Jonathan Woithe wrote:
> On 10/18/18 08:15, Jonathan Woithe wrote:
> > On Thu, Oct 18, 2018 at 08:03:32AM +0200, Heiner Kallweit wrote:
> > > Proposed fix is here:
> > > https://patchwork.ozlabs.org/patch/985014/
> > > Would be good if you could test it. Thanks!
> > 
> > I should be able to do so tomorrow.  Which kernel would you like me to apply
> > the patch to?
>
> It turns out I couldn't compile 4.18.15 conveniently ...
> I could compile 4.14 though - I had previously confirmed that the problem
> was still seen with that kernel.  I therefore applied the fix to 4.14 (which
> was trivial) and started a test with that.  The system has been running
> without exhibiting the effects of the bug for over five hours.  With an
> unpatched 4.14, symptoms were typically seen within 30 minutes.
> 
> I will leave the system to operate over the weekend to be sure, but at this
> stage it seems likely that the patch will resolve this long-standing
> difficulty that we've experienced.  I will report back on Monday.

Our test system has now been running for just under 3 days (69 hours) and
there has not been any incidence of the problem with this patch applied to
the 4.14 r8169 driver.  Without the patch, multiple problems are logged
within 30 minutes.

Given this, I would conclude that this patch fixes the problem for us. 
Although belated (since the patch has already been accepted):

  Tested-by: Jonathan Woithe <jwoithe@atrad.com.au>

Thank you very much for your work which led to the patch!  It means we can
finally provide an upgrade path for systems in the field which are equipped
with the r8169 hardware.

For reference, the problem being tested on our systems is the one discussed
in the "r8169 regression: UDP packets dropped intermittantly" thread on this
mailing list.

Regards
  jonathan

Patch

diff --git a/drivers/net/ethernet/realtek/r8169.c b/drivers/net/ethernet/realtek/r8169.c
index 8c4f49adc..7caf3b7e9 100644
--- a/drivers/net/ethernet/realtek/r8169.c
+++ b/drivers/net/ethernet/realtek/r8169.c
@@ -6528,17 +6528,15 @@  static int rtl8169_poll(struct napi_struct *napi, int budget)
 	struct rtl8169_private *tp = container_of(napi, struct rtl8169_private, napi);
 	struct net_device *dev = tp->dev;
 	u16 enable_mask = RTL_EVENT_NAPI | tp->event_slow;
-	int work_done= 0;
+	int work_done;
 	u16 status;
 
 	status = rtl_get_events(tp);
 	rtl_ack_events(tp, status & ~tp->event_slow);
 
-	if (status & RTL_EVENT_NAPI_RX)
-		work_done = rtl_rx(dev, tp, (u32) budget);
+	work_done = rtl_rx(dev, tp, (u32) budget);
 
-	if (status & RTL_EVENT_NAPI_TX)
-		rtl_tx(dev, tp);
+	rtl_tx(dev, tp);
 
 	if (status & tp->event_slow) {
 		enable_mask &= ~tp->event_slow;