[LEDE-DEV] ar71xx: Add GRO support to ag71xx

Message ID 20170903183546.16749-1-rosenp@gmail.com
State Accepted

Commit Message

Rosen Penev Sept. 3, 2017, 6:35 p.m. UTC
On a TL-WN710N, this patch increases iperf performance from ~92.5 to ~93.5 Mbps. Keep in mind the WN710N is a 100 Mbps device. I expect greater numbers from gigabit devices.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
---
 .../ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c  | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Michael Yartys via Lede-dev Sept. 3, 2017, 9:17 p.m. UTC | #1
On Sunday, September 3, 2017 11:35:46 AM CEST Rosen Penev wrote:
> On a TL-WN710N, this patch increases iperf performance from ~92.5 to ~93.5 Mbps. Keep in mind the WN710N is a 100 Mbps device. I expect greater numbers from gigabit devices.
> 
> Signed-off-by: Rosen Penev <rosenp@gmail.com>
> ---
I've done a quick test of the patch on my WD Range Extender.
(It has an Atheros AR9344 rev 2 SoC @ CPU:560.000MHz, DDR:400.000MHz.
The PHY is an AR8035, which supports 1 Gbit/s links.)

The range extender (DUT) was running iperf3 server in both tests.
Another desktop PC was acting as the iperf3 client.

without the patch:

Connecting to host range-extender, port 5201
[  4] local 192.168.8.7 port 51518 connected to 192.168.8.204 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  23.5 MBytes   197 Mbits/sec    0    105 KBytes
[  4]   1.00-2.00   sec  23.7 MBytes   199 Mbits/sec    0    105 KBytes
[  4]   2.00-3.00   sec  23.6 MBytes   198 Mbits/sec    0    105 KBytes
[  4]   3.00-4.00   sec  23.0 MBytes   193 Mbits/sec    0    105 KBytes
[  4]   4.00-5.00   sec  23.4 MBytes   197 Mbits/sec    0    105 KBytes
[  4]   5.00-6.00   sec  23.3 MBytes   195 Mbits/sec    0    105 KBytes
[  4]   6.00-7.00   sec  23.4 MBytes   196 Mbits/sec    0    105 KBytes
[  4]   7.00-8.00   sec  23.6 MBytes   198 Mbits/sec    0    105 KBytes
[  4]   8.00-9.00   sec  23.1 MBytes   194 Mbits/sec    0    105 KBytes
[  4]   9.00-10.00  sec  22.1 MBytes   185 Mbits/sec    0    105 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   233 MBytes   195 Mbits/sec    0             sender
[  4]   0.00-10.00  sec   232 MBytes   195 Mbits/sec                  receiver

iperf Done.

with the patch (GRO enabled, which is the default):

Connecting to host range-extender, port 5201
[  4] local 192.168.8.7 port 52004 connected to 192.168.8.204 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  32.7 MBytes   274 Mbits/sec    0    106 KBytes       
[  4]   1.00-2.00   sec  32.6 MBytes   274 Mbits/sec    0    106 KBytes       
[  4]   2.00-3.00   sec  32.4 MBytes   272 Mbits/sec    0    106 KBytes       
[  4]   3.00-4.00   sec  32.3 MBytes   271 Mbits/sec    0    106 KBytes       
[  4]   4.00-5.00   sec  32.5 MBytes   273 Mbits/sec    0    106 KBytes       
[  4]   5.00-6.00   sec  32.5 MBytes   273 Mbits/sec    0    106 KBytes       
[  4]   6.00-7.00   sec  32.6 MBytes   273 Mbits/sec    0    106 KBytes       
[  4]   7.00-8.00   sec  32.4 MBytes   272 Mbits/sec    0    106 KBytes       
[  4]   8.00-9.00   sec  32.6 MBytes   273 Mbits/sec    0    106 KBytes       
[  4]   9.00-10.00  sec  31.4 MBytes   264 Mbits/sec    0    106 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   324 MBytes   272 Mbits/sec    0             sender
[  4]   0.00-10.00  sec   324 MBytes   272 Mbits/sec                  receiver

iperf Done.

with the patch, after disabling GRO on the DUT:

(range-extender) # ethtool -K eth0 gro off

Connecting to host range-extender, port 5201
[  4] local 192.168.8.7 port 52120 connected to 192.168.8.204 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  24.8 MBytes   208 Mbits/sec    0    105 KBytes       
[  4]   1.00-2.00   sec  23.6 MBytes   198 Mbits/sec    0    105 KBytes       
[  4]   2.00-3.00   sec  24.5 MBytes   206 Mbits/sec    0    105 KBytes       
[  4]   3.00-4.00   sec  23.9 MBytes   201 Mbits/sec    0    105 KBytes       
[  4]   4.00-5.00   sec  24.6 MBytes   207 Mbits/sec    0    105 KBytes       
[  4]   5.00-6.00   sec  24.7 MBytes   207 Mbits/sec    0    105 KBytes       
[  4]   6.00-7.00   sec  24.5 MBytes   206 Mbits/sec    0    105 KBytes       
[  4]   7.00-8.00   sec  24.0 MBytes   201 Mbits/sec    0    105 KBytes       
[  4]   8.00-9.00   sec  24.3 MBytes   204 Mbits/sec    0    105 KBytes       
[  4]   9.00-10.00  sec  24.5 MBytes   206 Mbits/sec    0    105 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   244 MBytes   204 Mbits/sec    0             sender
[  4]   0.00-10.00  sec   243 MBytes   204 Mbits/sec                  receiver

iperf Done.

So, the throughput went from 195 Mbits/sec to 272 Mbits/sec.
The gain would be (272 Mbps - 195 Mbps) / 195 Mbps = 0.3949 ~ 40%
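
The arithmetic above can be reproduced with a small helper. This is only a sketch; the function name relative_gain is made up for illustration and is not part of iperf3 or the driver.

```c
#include <assert.h>

/*
 * Relative throughput gain of `after` over `before`, as a fraction
 * (0.40 means 40%). Both values must use the same unit.
 * Hypothetical helper, for illustration only.
 */
static double relative_gain(double before, double after)
{
	assert(before > 0.0);
	return (after - before) / before;
}
```

relative_gain(195.0, 272.0) evaluates to about 0.395, matching the ~40% above. Applied to the third run (patched, GRO switched off, 204 Mbits/sec) it gives about 0.046, so with GRO disabled the patched build stays within roughly 5% of the unpatched baseline.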

Regards,
Christian
Rosen Penev Sept. 3, 2017, 10:46 p.m. UTC | #2
That's...way better than I expected given how minimal my changes are.

For some context, the ag71xx driver is special in that it does not seem
to do any hardware offloading to the NIC.

As far as I understand this change, GRO takes packets of up to 1500 bytes
(the MTU) and merges them into blocks of up to 64 KB, which the kernel
then processes.
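
To put a rough number on that description, the sketch below estimates how many full-sized TCP segments could fit into one merged packet. Everything in it is an assumption made for illustration: IPv4 and TCP headers of 20 bytes each with no options, a merged packet capped at 65535 bytes, and the helper names tcp_mss and segments_per_aggregate. It is not how the kernel computes anything.

```c
#include <assert.h>

#define IPV4_HDR_LEN	20u	/* assumption: no IP options */
#define TCP_HDR_LEN	20u	/* assumption: no TCP options */
#define MAX_AGGREGATE	65535u	/* assumption: largest IPv4 total length */

/* Payload bytes carried by one full-sized TCP segment at the given MTU. */
static unsigned int tcp_mss(unsigned int mtu)
{
	assert(mtu > IPV4_HDR_LEN + TCP_HDR_LEN);
	return mtu - IPV4_HDR_LEN - TCP_HDR_LEN;
}

/* Upper bound on full-sized segments that fit into one merged packet. */
static unsigned int segments_per_aggregate(unsigned int mtu)
{
	return (MAX_AGGREGATE - IPV4_HDR_LEN - TCP_HDR_LEN) / tcp_mss(mtu);
}
```

Under these assumptions a 1500-byte MTU gives an MSS of 1460 bytes and at most 44 segments per merged packet, so in the best case the upper layers handle one packet where they previously handled 44. Real traffic usually carries TCP options and GRO often flushes earlier, so actual aggregates are likely smaller.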

I would be curious if anyone can do latency comparisons before this
change and after. I do know this driver to have lower latency than
others due to lack of offloads.

I guess all that's left is to add GSO support to the driver. That seems
like a lot more work than a three line change though. ¯\_(ツ)_/¯

Florian Fainelli Sept. 4, 2017, 2:10 a.m. UTC | #3
On 09/03/17 at 15:46, rosenp@gmail.com wrote:
> That's...way better than I expected given how minimal my changes are.
> 
> For some context, the ag71xx driver is special in that it does not seem
> to do any hardware offloading to the NIC.
> 
> As far as I understand this change, GRO takes 1500 MTU packets and
> packs then into 64Kb blocks which the kernel then processes.
> 
> I would be curious if anyone can do latency comparisons before this
> change and after. I do know this driver to have lower latency than
> others due to lack of offloads.
> 
> I guess all that's left is to add GSO support to the driver. That seems
> like a lot more work than a three line change though. ¯\_(ツ)_/¯

You could look into adding software TSO:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/net/ethernet/marvell/mv643xx_eth.c?id=3ae8f4e0b98b640aadf410c21185ccb6b5b02351

This makes a huge difference with mv643xx_eth. There were a number of
fixes on top of this initial commit, but I would be curious to see what
it gives you with ag71xx.

Rosen Penev Sept. 4, 2017, 4:15 p.m. UTC | #4
After a bit more research, it turns out this change is really boring.
Eric Dumazet made the same change to many Ethernet drivers earlier this year.

As far as TSO goes, no idea how to even go about that. As an
interesting side note, TSO support broke mv643xx_eth for a while.

Felix Fietkau Oct. 17, 2017, 2:02 p.m. UTC | #5
On 2017-09-03 20:35, Rosen Penev wrote:
> On a TL-WN710N, this patch increases iperf performance from ~92.5 to ~93.5 Mbps.
> Keep in mind the WN710N is a 100 Mbps device. I expect greater numbers
> from gigabit devices.
> 
> Signed-off-by: Rosen Penev <rosenp@gmail.com>
Hi Rosen,

Sorry about that, but I will have to revert this change. It causes a
serious regression in LAN->WAN routing performance on various devices.
I did some digging and found out why:
For GRO to work properly, checksums of incoming packets have to be
verified very early in the network stack. The Ethernet MAC does not
support rx checksum offload, so this has to happen in software.
Due to the very small cache size, this causes a significant increase in
memory bus traffic.
It might be possible in the future to avoid this by making use of the
checksum offload engine, but that's a separate component on the chip and
not present on every SoC (only the newer ones).
It also requires a significant rework of the Ethernet driver, which I
don't have any time for.
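
To illustrate why verifying checksums in software is expensive on a CPU with a small cache, here is a minimal sketch of a 16-bit one's-complement checksum in the style of RFC 1071. It is not the kernel's implementation (the kernel uses optimized, architecture-specific routines), and the name inet_checksum is chosen for this example only. The point is that the loop has to read every byte of the packet, whereas plain forwarding presumably only needs to look at the headers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * 16-bit one's-complement checksum over `len` bytes, RFC 1071 style.
 * Illustrative only; the 32-bit accumulator is fine for packet-sized
 * buffers (up to 64 KB) but not for arbitrarily large ones.
 */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(data != NULL || len == 0);

	/* Add up big-endian 16-bit words; this touches every byte. */
	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)data[i] << 8) | data[i + 1];

	/* An odd trailing byte is padded with a zero low byte. */
	if (len & 1)
		sum += (uint32_t)data[len - 1] << 8;

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

A receiver verifies a packet by running the same sum over the data including the transmitted checksum field; a result of 0 means the data is consistent. For a full-sized frame that is roughly 1500 bytes pulled through the cache per packet before GRO can merge anything.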

- Felix
Dave Taht Oct. 17, 2017, 3:26 p.m. UTC | #6
Felix Fietkau <nbd@nbd.name> writes:

> On 2017-09-03 20:35, Rosen Penev wrote:
>> On a TL-WN710N, this patch increases iperf performance from ~92.5 to ~93.5
>> Mbps. Keep in mind the WN710N is a 100 Mbps device. I expect greater numbers
>> from gigabit devices.
>> 
>> Signed-off-by: Rosen Penev <rosenp@gmail.com>
> Hi Rosen,
>
> Sorry about that, but I will have to revert this change. It causes a
> serious regression in LAN->WAN routing performance on various devices.
> I did some digging and found out why:
> For GRO to work properly, checksums of incoming packets have to be
> verified very early in the network stack. The Ethernet MAC does not
> support rx checksum offload, so this has to happen in software.
> Due to the very small cache size, this causes a significant increase in
> memory bus traffic.

Also, for the record: if the WAN side has to be shaped to lower speeds
(say, below 40 Mbit/s) via SQM, GRO bulking up a microburst into a
superpacket mandates sch_cake (rather than fq_codel), which peels the
superpacket apart again to keep latencies low there.

There are a lot of devices that do GRO that perhaps shouldn't. mvneta
has very aggressive soft-GRO, in particular. Shipping out one 64k
superpacket takes half a second at 1 Mbit/s.
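
The half-second figure follows from simple serialization-delay arithmetic, sketched below. The helper name serialization_delay is hypothetical, and the sketch assumes that "64k" means 65536 bytes and that 1 Mbit/s means 1,000,000 bits per second.

```c
#include <assert.h>

/*
 * Seconds needed to put `bytes` on a link running at `bits_per_sec`.
 * Hypothetical helper, for illustration only.
 */
static double serialization_delay(double bytes, double bits_per_sec)
{
	assert(bits_per_sec > 0.0);
	return bytes * 8.0 / bits_per_sec;
}
```

serialization_delay(65536.0, 1e6) is about 0.52 seconds. At the 40 Mbit/s mentioned above, the same superpacket still occupies the link for about 13 milliseconds, during which no other flow can send.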

> It might be possible in the future to avoid this by making use of the
> checksum offload engine, but that's a separate component on the chip and
> not present on every SoC (only the newer ones).
> It also requires a significant rework of the Ethernet driver, which I
> don't have any time for.
>
> - Felix
>
> _______________________________________________
> Lede-dev mailing list
> Lede-dev@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/lede-dev
Patch

diff --git a/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c b/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c
index 566e9513d8..ae1bdf6066 100644
--- a/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c
+++ b/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c
@@ -1089,7 +1089,7 @@  next:
 
 	while ((skb = __skb_dequeue(&queue)) != NULL) {
 		skb->protocol = eth_type_trans(skb, dev);
-		netif_receive_skb(skb);
+		napi_gro_receive(&ag->napi, skb);
 	}
 
 	DBG("%s: rx finish, curr=%u, dirty=%u, done=%d\n",
@@ -1141,7 +1141,7 @@  static int ag71xx_poll(struct napi_struct *napi, int limit)
 		DBG("%s: disable polling mode, rx=%d, tx=%d,limit=%d\n",
 			dev->name, rx_done, tx_done, limit);
 
-		napi_complete(napi);
+		napi_complete_done(napi, rx_done);
 
 		/* enable interrupts */
 		spin_lock_irqsave(&ag->lock, flags);
@@ -1160,7 +1160,7 @@  oom:
 		pr_info("%s: out of memory\n", dev->name);
 
 	mod_timer(&ag->oom_timer, jiffies + AG71XX_OOM_REFILL);
-	napi_complete(napi);
+	napi_complete_done(napi, rx_done);
 	return 0;
 }