Message ID | 20170903183546.16749-1-rosenp@gmail.com |
---|---|
State | Accepted |
Series | [LEDE-DEV] ar71xx: Add GRO support to ag71xx |
On Sunday, September 3, 2017 11:35:46 AM CEST Rosen Penev wrote:
> On a TL-WN710N, this patch increases iperf performance from ~92.5 to ~93.5 mbps. Keep in mind the WN710N is a 100mbps device. I expect greater numbers from gigabit devices.
>
> Signed-off-by: Rosen Penev <rosenp@gmail.com>
> ---

I've done a quick test of the patch on my WD Range Extender.
(It has an Atheros AR9344 rev 2 SoC @ CPU:560.000MHz, DDR:400.000MHz.
The PHY is an AR8035, which supports 1 GBit/s links.)

The range extender (DUT) was running the iperf3 server in all tests.
Another desktop PC was acting as the iperf3 client.

without the patch:

Connecting to host range-extender, port 5201
[  4] local 192.168.8.7 port 51518 connected to 192.168.8.204 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  23.5 MBytes   197 Mbits/sec    0    105 KBytes
[  4]   1.00-2.00   sec  23.7 MBytes   199 Mbits/sec    0    105 KBytes
[  4]   2.00-3.00   sec  23.6 MBytes   198 Mbits/sec    0    105 KBytes
[  4]   3.00-4.00   sec  23.0 MBytes   193 Mbits/sec    0    105 KBytes
[  4]   4.00-5.00   sec  23.4 MBytes   197 Mbits/sec    0    105 KBytes
[  4]   5.00-6.00   sec  23.3 MBytes   195 Mbits/sec    0    105 KBytes
[  4]   6.00-7.00   sec  23.4 MBytes   196 Mbits/sec    0    105 KBytes
[  4]   7.00-8.00   sec  23.6 MBytes   198 Mbits/sec    0    105 KBytes
[  4]   8.00-9.00   sec  23.1 MBytes   194 Mbits/sec    0    105 KBytes
[  4]   9.00-10.00  sec  22.1 MBytes   185 Mbits/sec    0    105 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   233 MBytes   195 Mbits/sec    0             sender
[  4]   0.00-10.00  sec   232 MBytes   195 Mbits/sec                  receiver

iperf Done.

with the patch (gro enabled - this is done by default):

Connecting to host range-extender, port 5201
[  4] local 192.168.8.7 port 52004 connected to 192.168.8.204 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  32.7 MBytes   274 Mbits/sec    0    106 KBytes
[  4]   1.00-2.00   sec  32.6 MBytes   274 Mbits/sec    0    106 KBytes
[  4]   2.00-3.00   sec  32.4 MBytes   272 Mbits/sec    0    106 KBytes
[  4]   3.00-4.00   sec  32.3 MBytes   271 Mbits/sec    0    106 KBytes
[  4]   4.00-5.00   sec  32.5 MBytes   273 Mbits/sec    0    106 KBytes
[  4]   5.00-6.00   sec  32.5 MBytes   273 Mbits/sec    0    106 KBytes
[  4]   6.00-7.00   sec  32.6 MBytes   273 Mbits/sec    0    106 KBytes
[  4]   7.00-8.00   sec  32.4 MBytes   272 Mbits/sec    0    106 KBytes
[  4]   8.00-9.00   sec  32.6 MBytes   273 Mbits/sec    0    106 KBytes
[  4]   9.00-10.00  sec  31.4 MBytes   264 Mbits/sec    0    106 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   324 MBytes   272 Mbits/sec    0             sender
[  4]   0.00-10.00  sec   324 MBytes   272 Mbits/sec                  receiver

iperf Done.

(range-extender) # ethtool -K eth0 gro off

Connecting to host range-extender, port 5201
[  4] local 192.168.8.7 port 52120 connected to 192.168.8.204 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  24.8 MBytes   208 Mbits/sec    0    105 KBytes
[  4]   1.00-2.00   sec  23.6 MBytes   198 Mbits/sec    0    105 KBytes
[  4]   2.00-3.00   sec  24.5 MBytes   206 Mbits/sec    0    105 KBytes
[  4]   3.00-4.00   sec  23.9 MBytes   201 Mbits/sec    0    105 KBytes
[  4]   4.00-5.00   sec  24.6 MBytes   207 Mbits/sec    0    105 KBytes
[  4]   5.00-6.00   sec  24.7 MBytes   207 Mbits/sec    0    105 KBytes
[  4]   6.00-7.00   sec  24.5 MBytes   206 Mbits/sec    0    105 KBytes
[  4]   7.00-8.00   sec  24.0 MBytes   201 Mbits/sec    0    105 KBytes
[  4]   8.00-9.00   sec  24.3 MBytes   204 Mbits/sec    0    105 KBytes
[  4]   9.00-10.00  sec  24.5 MBytes   206 Mbits/sec    0    105 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec   244 MBytes   204 Mbits/sec    0             sender
[  4]   0.00-10.00  sec   243 MBytes   204 Mbits/sec                  receiver

iperf Done.

So, the throughput went from 195 Mbits/sec to 272 Mbits/sec.
The gain would be (272 Mbps - 195 Mbps) / 195 Mbps = 0.3949 ~ 40%

Regards,
Christian
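
The gain arithmetic in the message above can be checked with a few lines of C. This is only a sketch for reproducing the percentage from the iperf3 summaries; the function name `relative_gain` is made up for illustration and is not part of the patch or of iperf3.

```c
#include <assert.h>

/* Relative throughput gain of `patched` over `baseline`, as a fraction.
 * Both arguments must use the same unit (Mbit/s in the thread above).
 * Hypothetical helper, not part of the ag71xx patch. */
double relative_gain(double baseline, double patched)
{
    assert(baseline > 0.0); /* a zero baseline has no meaningful gain */
    return (patched - baseline) / baseline;
}
```

Calling `relative_gain(195.0, 272.0)` gives roughly 0.395, matching the ~40% quoted. The same function applied to the run with GRO switched off via ethtool (204 Mbit/s against the 195 Mbit/s baseline) gives about 0.046, so most of the improvement disappears once GRO is disabled.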
That's... way better than I expected given how minimal my changes are.

For some context, the ag71xx driver is special in that it does not seem to do any hardware offloading to the NIC.

As far as I understand this change, GRO takes 1500-byte (MTU-sized) packets and packs them into blocks of up to 64 KB, which the kernel then processes.

I would be curious if anyone can do latency comparisons before and after this change. I do know this driver to have lower latency than others due to the lack of offloads.

I guess all that's left is to add GSO support to the driver. That seems like a lot more work than a three-line change though. ¯\_(ツ)_/¯

On Sun, 2017-09-03 at 23:16 +0200, Christian Lamparter wrote:
> So, the throughput went from 195 Mbits/sec to 272 Mbits/sec.
> The gain would be (272 Mbps - 195 Mbps) / 195 Mbps = 0.3949 ~ 40%
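
To put the description of GRO (MTU-sized packets merged into blocks of up to 64 KB) into numbers, here is a rough sketch of how many TCP segments could fit into one merged packet. The header sizes and the 65535-byte ceiling (the largest value of the IPv4 total-length field) are assumptions chosen for illustration, and `segments_per_superpacket` is a hypothetical name; the kernel applies further limits that this ignores.

```c
#include <assert.h>

/* Upper bound on how many full-sized TCP segments fit into one merged
 * packet. All sizes are in bytes. Hypothetical helper for illustration;
 * it does not model the kernel's actual GRO limits. */
unsigned segments_per_superpacket(unsigned mtu, unsigned ip_hdr,
                                  unsigned tcp_hdr, unsigned max_ip_len)
{
    assert(mtu > ip_hdr + tcp_hdr);
    assert(max_ip_len > ip_hdr + tcp_hdr);

    unsigned mss = mtu - ip_hdr - tcp_hdr;           /* payload per segment */
    unsigned budget = max_ip_len - ip_hdr - tcp_hdr; /* payload per merged packet */
    return budget / mss;
}
```

With a 1500-byte MTU, a 20-byte IPv4 header and a 20-byte TCP header, this yields 44 segments; with 12 bytes of TCP options (for example timestamps) it yields 45. Either way, the stack above the driver handles one large packet instead of several dozen small ones, which is where the CPU savings are expected to come from.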
On 09/03/17 at 15:46, rosenp@gmail.com wrote:
> I guess all that's left is to add GSO support to the driver. That seems
> like a lot more work than a three line change though. ¯\_(ツ)_/¯

You could look into adding software TSO:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/net/ethernet/marvell/mv643xx_eth.c?id=3ae8f4e0b98b640aadf410c21185ccb6b5b02351

This makes a huge difference with mv643xx_eth. There were a number of fixes on top of this initial commit, but I would be curious to see what it gives you with ag71xx.
After a bit more research, it turns out this change is really boring. Eric Dumazet did this for many Ethernet drivers early this year.

As far as TSO goes, I have no idea how to even go about that. As an interesting side note, TSO support broke mv643xx_eth for a while.

On Sun, 2017-09-03 at 19:10 -0700, Florian Fainelli wrote:
> You could look into adding software TSO:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/net/ethernet/marvell/mv643xx_eth.c?id=3ae8f4e0b98b640aadf410c21185ccb6b5b02351
>
> this makes a huge difference with mv643xx_eth. There were a number of
> fixes on top of this initial commit but I would be curious to see what
> it gives you with ag71xx
On 2017-09-03 20:35, Rosen Penev wrote:
> On a TL-WN710N, this patch increases iperf performance from ~92.5 to ~93.5 mbps.
> Keep in mind the WN710N is a 100mbps device. I expect greater numbers from gigabit devices.
>
> Signed-off-by: Rosen Penev <rosenp@gmail.com>

Hi Rosen,

Sorry about that, but I will have to revert this change. It causes a serious regression in LAN->WAN routing performance on various devices.

I did some digging and found out why:
For GRO to work properly, checksums of incoming packets have to be verified very early in the network stack. The Ethernet MAC does not support rx checksum offload, so this has to happen in software.
Due to the very small cache size, this causes a significant increase in memory bus traffic.

It might be possible in the future to avoid this by making use of the checksum offload engine, but that's a separate component on the chip and not present on every SoC (only the newer ones).
It also requires a significant rework of the Ethernet driver, which I don't have any time for.

- Felix
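
To illustrate why verifying checksums in software is costly on a small-cache SoC, here is a minimal user-space sketch of the Internet checksum as described in RFC 1071. It is not the kernel's implementation (the kernel uses optimized, architecture-specific routines such as `csum_partial`); the name `internet_checksum` is hypothetical. The point is only that the loop has to read every byte of the packet.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over `len` bytes, treating the data as
 * big-endian 16-bit words. Illustrative sketch only; suitable for
 * buffers up to 64 KB so the 32-bit accumulator cannot overflow. */
uint16_t internet_checksum(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    assert(len <= 65536);

    uint32_t sum = 0;
    size_t i;

    /* Every payload byte is loaded from memory here. */
    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];

    /* An odd trailing byte is padded with zero on the right. */
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;

    /* Fold the carries back into the low 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;
}
```

For a routed packet that would otherwise only have its headers inspected, a loop like this pulls the entire payload through the CPU cache, which is consistent with the increase in memory bus traffic described above.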
Felix Fietkau <nbd@nbd.name> writes:

> Sorry about that, but I will have to revert this change. It causes a
> serious regression in LAN->WAN routing performance on various devices.
> I did some digging and found out why:
> For GRO to work properly, checksums of incoming packets have to be
> verified very early in the network stack. The Ethernet MAC does not
> support rx checksum offload, so this has to happen in software.
> Due to the very small cache size, this causes a significant increase in
> memory bus traffic.

Also, for the record, if there is a need to shape the WAN side to lower speeds (say, below 40 Mbit/s) via sqm, GRO bulking up a microburst into a superpacket mandates sch_cake (rather than fq_codel) to peel it apart again and keep latencies low there.

There are a lot of devices that do GRO that perhaps shouldn't. mvneta has very aggressive soft-GRO, in particular.

Shipping out one 64k superpacket takes half a second at 1 Mbit/s.

> It might be possible in the future to avoid this by making use of the
> checksum offload engine, but that's a separate component on the chip and
> not present on every SoC (only the newer ones).
> It also requires a significant rework of the Ethernet driver, which I
> don't have any time for.
>
> - Felix
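
The "half a second at 1mbit" figure follows from the time it takes to put the bits of one packet on the wire. A small sketch, assuming "64k" means 65536 bytes and ignoring framing overhead; `serialization_seconds` is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

/* Time in seconds needed to transmit `bytes` on a link running at
 * `link_bits_per_sec`. Ignores Ethernet framing and queueing delays.
 * Hypothetical helper for illustration. */
double serialization_seconds(size_t bytes, double link_bits_per_sec)
{
    assert(link_bits_per_sec > 0.0);
    return (double)bytes * 8.0 / link_bits_per_sec;
}
```

`serialization_seconds(65536, 1e6)` is about 0.52 s, whereas a single 1500-byte packet on the same link takes about 12 ms. Even at 40 Mbit/s a 64 KB superpacket occupies the link for roughly 13 ms, during which nothing else can be sent unless the qdisc splits the packet back into segments.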
diff --git a/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c b/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c
index 566e9513d8..ae1bdf6066 100644
--- a/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c
+++ b/target/linux/ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c
@@ -1089,7 +1089,7 @@ next:
 
 	while ((skb = __skb_dequeue(&queue)) != NULL) {
 		skb->protocol = eth_type_trans(skb, dev);
-		netif_receive_skb(skb);
+		napi_gro_receive(&ag->napi, skb);
 	}
 
 	DBG("%s: rx finish, curr=%u, dirty=%u, done=%d\n",
@@ -1141,7 +1141,7 @@ static int ag71xx_poll(struct napi_struct *napi, int limit)
 	DBG("%s: disable polling mode, rx=%d, tx=%d,limit=%d\n",
 		dev->name, rx_done, tx_done, limit);
 
-	napi_complete(napi);
+	napi_complete_done(napi, rx_done);
 
 	/* enable interrupts */
 	spin_lock_irqsave(&ag->lock, flags);
@@ -1160,7 +1160,7 @@ oom:
 		pr_info("%s: out of memory\n", dev->name);
 
 	mod_timer(&ag->oom_timer, jiffies + AG71XX_OOM_REFILL);
-	napi_complete(napi);
+	napi_complete_done(napi, rx_done);
 	return 0;
 }
On a TL-WN710N, this patch increases iperf performance from ~92.5 to ~93.5 mbps. Keep in mind the WN710N is a 100mbps device. I expect greater numbers from gigabit devices.

Signed-off-by: Rosen Penev <rosenp@gmail.com>
---
 .../ar71xx/files/drivers/net/ethernet/atheros/ag71xx/ag71xx_main.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)