mbox series

[0/3] : net: dsa: mt7530: support MT7530 in the MT7621 SoC

Message ID 20181130075737.8041-1-gerg@kernel.org
Headers show
Series : net: dsa: mt7530: support MT7530 in the MT7621 SoC | expand

Message

Greg Ungerer Nov. 30, 2018, 7:57 a.m. UTC
I have been working towards supporting the MT7530 switch as used in the
MediaTek MT7621 SoC. Unlike the MediaTek MT7623 the MT7621 is built around
a dual core MIPS CPU architecture. But underneath it is what appears to
be the same 7530 switch.

The following 3 patches are more of an RFC than anything. They allow
use of the mt7530 dsa driver on this device - though with some issues
still to resolve. The primary change required is to not use the 7623
specific clock and regulator setup - none of that applies when using
the 7621 (and maybe other devices?). The other change required is to
set the 7530 MFC register CPU port number and enable bit.

The unresolved issues I still have appear to be more related to the
MT7621 ethernet driver (drivers/staging/mt7621-eth/*). I am hoping
someone might have some ideas on these. I don't really have any good
documentation on the ethernet devices on the 7621, so I am kind of
working in the dark here.

1. TX packets are not getting an IP header checksum via the normal
   off-loaded checksumming when in DSA mode. I have to switch off
   NETIF_F_IP_CSUM, so the software stack generates the checksum.
   That checksum offloading works ok when not using the 7530 DSA driver.

2. Maximal sized RX packets get silently dropped. So receive side packets
   that are large (perfect case is the all-but-last packets in a fragemented
   larger packet) appear to be dropped at the mt7621 ethernet MAC level.
   The 7530 MIB switch register counters show receive packets at the physical
   switch port side and at the CPU switch port - but I get no packets
   received or errors in the 7621 ethernet MAC. If I set the mtu of the
   server at the other end a little smaller (a few bytes is enough) then
   I get all the packets through. It seems like the DSA/VLAN tag bytes
   are causing a too large packet to get silently dropped somewhere.

Any thoughts greatly appreciated...

 Documentation/devicetree/bindings/net/dsa/mt7530.txt |    5 +
 drivers/net/dsa/mt7530.c                             |   95 ++++++++++++-------
 drivers/net/dsa/mt7530.h                             |    5 +
 3 files changed, 70 insertions(+), 35 deletions(-)

Comments

René van Dorst Nov. 30, 2018, 11:27 a.m. UTC | #1
Quoting gerg@kernel.org:

> I have been working towards supporting the MT7530 switch as used in the
> MediaTek MT7621 SoC. Unlike the MediaTek MT7623 the MT7621 is built around
> a dual core MIPS CPU architecture. But underneath it is what appears to
> be the same 7530 switch.
>
> The following 3 patches are more of an RFC than anything. They allow
> use of the mt7530 dsa driver on this device - though with some issues
> still to resolve. The primary change required is to not use the 7623
> specific clock and regulator setup - none of that applies when using
> the 7621 (and maybe other devices?). The other change required is to
> set the 7530 MFC register CPU port number and enable bit.


Hi Greg,

Good to see that more people are working on the MT7621 device [1].
So I added Bjorn to the CC.

I am also working on this but on the OpenWRT side.
My current code works for a Ubiquiti EdgeRouter X SFP. See kernel [2],  
openwrt [3]

Current status:

Using OpenWRT provided mainline v4.14 driver MT7530 and MT7623.
I patches so that MT7621 is supported.
This means DSA part is also working, internal and external phys are detected.
I can use all of the five RJ45 ports and also MT7520 switch port 5  
which connects to a external phy (at8033) for the SFP port.
Last added TRGMII part also seems to work but with issues, see below.
Openwrt uses port 5 as wan and gets a dhcp lease.

Issues:
- I can't get 2nd GMAC talk to external phy. I have tried many many  
knobs but without success.
   GMAC seems to work but no data is transmitted/received over the cable.
   But I think this can be done later on. Adding basic support for  
MT7621 is good start.
- Ethernet driver expects that the macs are initialized so that the  
mtk_hw_init can setup the hardware registers.
   But they are not. See [4]
   I don't know how to fix this. For the current code it is not an  
issue. It still works.
   But it should be fixed.
   Because of this I can't read the mac0 "phy-mode". I need this info  
to setup the tgrmii clock at hardware init.
- Ethernet speed is unstable ~30-100mbit. I think I broke something. I  
have seems 1gbit before.

I hope that this can help you the get a step further.

Greats,

René

[1] https://lists.openwrt.org/pipermail/openwrt-devel/2018-August/013614.html
[2] https://github.com/vDorst/linux-1/commits/mt7621-dsa-trgmii
[3] https://github.com/vDorst/openwrt/commits/mt7621-dsa-trgmii
[4]  
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/ethernet/mediatek/mtk_eth_soc.c?h=v4.14.84#n1946
René van Dorst Nov. 30, 2018, 11:30 a.m. UTC | #2
Quoting gerg@kernel.org:

> I have been working towards supporting the MT7530 switch as used in the
> MediaTek MT7621 SoC. Unlike the MediaTek MT7623 the MT7621 is built around
> a dual core MIPS CPU architecture. But underneath it is what appears to
> be the same 7530 switch.
>
> The following 3 patches are more of an RFC than anything. They allow
> use of the mt7530 dsa driver on this device - though with some issues
> still to resolve. The primary change required is to not use the 7623
> specific clock and regulator setup - none of that applies when using
> the 7621 (and maybe other devices?). The other change required is to
> set the 7530 MFC register CPU port number and enable bit.

Hi Greg,

Good to see that more people are working on the MT7621 device [1].
So I added Bjorn to the CC.

I am also working on this but on the OpenWRT side.
My current code works for a Ubiquiti EdgeRouter X SFP. See kernel [2],  
openwrt [3]

Current status:

Using OpenWRT provided mainline v4.14 driver MT7530 and MT7623.
I patches so that MT7621 is supported.
This means DSA part is also working, internal and external phys are detected.
I can use all of the five RJ45 ports and also MT7520 switch port 5  
which connects to a external phy (at8033) for the SFP port.
Last added TRGMII part also seems to work but with issues, see below.
Openwrt uses port 5 as wan and gets a dhcp lease.

Issues:
- I can't get 2nd GMAC talk to external phy. I have tried many many  
knobs but without success.
   GMAC seems to work but no data is transmitted/received over the cable.
   But I think this can be done later on. Adding basic support for  
MT7621 is good start.
- Ethernet driver expects that the macs are initialized so that the  
mtk_hw_init can setup the hardware registers.
   But they are not. See [4]
   I don't know how to fix this. For the current code it is not an  
issue. It still works.
   But it should be fixed.
   Because of this I can't read the mac0 "phy-mode". I need this info  
to setup the tgrmii clock at hardware init.
- Ethernet speed is unstable ~30-100mbit. I think I broke something. I  
have seems 1gbit before.

I hope that this can help you the get a step further.

Greats,

René

[1] https://lists.openwrt.org/pipermail/openwrt-devel/2018-August/013614.html
[2] https://github.com/vDorst/linux-1/commits/mt7621-dsa-trgmii
[3] https://github.com/vDorst/openwrt/commits/mt7621-dsa-trgmii
[4]  
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/ethernet/mediatek/mtk_eth_soc.c?h=v4.14.84#n1946
Bjørn Mork Nov. 30, 2018, 12:16 p.m. UTC | #3
gerg@kernel.org writes:

> I have been working towards supporting the MT7530 switch as used in the
> MediaTek MT7621 SoC. Unlike the MediaTek MT7623 the MT7621 is built around
> a dual core MIPS CPU architecture. But underneath it is what appears to
> be the same 7530 switch.

Great!  Good to see someone pushing this idea forward.

> The following 3 patches are more of an RFC than anything. They allow
> use of the mt7530 dsa driver on this device - though with some issues
> still to resolve. The primary change required is to not use the 7623
> specific clock and regulator setup - none of that applies when using
> the 7621 (and maybe other devices?). The other change required is to
> set the 7530 MFC register CPU port number and enable bit.
>
> The unresolved issues I still have appear to be more related to the
> MT7621 ethernet driver (drivers/staging/mt7621-eth/*). I am hoping
> someone might have some ideas on these. I don't really have any good
> documentation on the ethernet devices on the 7621, so I am kind of
> working in the dark here.

No offense, but the mt7621-eth driver in staging is horrible.  What both
René and I have had some success with is adapting the mtk_eth_soc driver
already in drivers/net/ethernet/mediatek/.  Yes, I know this is supposed
to be for other SoCs, but the basic design is obviously the same.

I have had some success with a first hackish attemt based on OpenWrt.
You can find the early tree here, but note that my focus was basically
getting one specific MT7621 board up and running:
https://github.com/bmork/LEDE/tree/mt7621-with-mainline-eth-driver

This patch has most of the necessary changes to enable that driver for
MT7621:
https://github.com/bmork/LEDE/commit/3293bc63f5461ca1eb0bbc4ed90145335e7e3404

Not a big deal, as you can see.  There is of course a reason I didn't
submit this here yet: It is by no means finished...  But it works. And I
have both GMACs working with this driver, which was my primary goal.

> 1. TX packets are not getting an IP header checksum via the normal
>    off-loaded checksumming when in DSA mode. I have to switch off
>    NETIF_F_IP_CSUM, so the software stack generates the checksum.
>    That checksum offloading works ok when not using the 7530 DSA driver.

Hmm.  How do I test this?

> 2. Maximal sized RX packets get silently dropped. So receive side packets
>    that are large (perfect case is the all-but-last packets in a fragemented
>    larger packet) appear to be dropped at the mt7621 ethernet MAC level.
>    The 7530 MIB switch register counters show receive packets at the physical
>    switch port side and at the CPU switch port - but I get no packets
>    received or errors in the 7621 ethernet MAC. If I set the mtu of the
>    server at the other end a little smaller (a few bytes is enough) then
>    I get all the packets through. It seems like the DSA/VLAN tag bytes
>    are causing a too large packet to get silently dropped somewhere.

Are you referring to the configured MTU size or some other maximal size?
If MTU, then I don't seem to have this issue with the driver from
drivers/net/ethernet/mediatek/.



Bjørn
Greg Ungerer Nov. 30, 2018, 1:25 p.m. UTC | #4
Hi Rene,

On 30/11/18 9:27 pm, René van Dorst wrote:
> Quoting gerg@kernel.org:
> 
>> I have been working towards supporting the MT7530 switch as used in the
>> MediaTek MT7621 SoC. Unlike the MediaTek MT7623 the MT7621 is built around
>> a dual core MIPS CPU architecture. But underneath it is what appears to
>> be the same 7530 switch.
>>
>> The following 3 patches are more of an RFC than anything. They allow
>> use of the mt7530 dsa driver on this device - though with some issues
>> still to resolve. The primary change required is to not use the 7623
>> specific clock and regulator setup - none of that applies when using
>> the 7621 (and maybe other devices?). The other change required is to
>> set the 7530 MFC register CPU port number and enable bit.
> 
> 
> Hi Greg,
> 
> Good to see that more people are working on the MT7621 device [1].
> So I added Bjorn to the CC.

Nice, thanks for the pointers.


> I am also working on this but on the OpenWRT side.
> My current code works for a Ubiquiti EdgeRouter X SFP. See kernel [2], openwrt [3]

I forgot to mention I am working from mainline kernels, so those
patches of mine are against 4.20-rc4. I am working on some new
custom hardware at the moment, but I have an Oolite v8.0 board I
can run code on too.


> Current status:
> 
> Using OpenWRT provided mainline v4.14 driver MT7530 and MT7623.
> I patches so that MT7621 is supported.
> This means DSA part is also working, internal and external phys are detected.
> I can use all of the five RJ45 ports and also MT7520 switch port 5 which connects to a external phy (at8033) for the SFP port.
> Last added TRGMII part also seems to work but with issues, see below.
> Openwrt uses port 5 as wan and gets a dhcp lease.
> 
> Issues:
> - I can't get 2nd GMAC talk to external phy. I have tried many many knobs but without success.
>    GMAC seems to work but no data is transmitted/received over the cable.
>    But I think this can be done later on. Adding basic support for MT7621 is good start.
> - Ethernet driver expects that the macs are initialized so that the mtk_hw_init can setup the hardware registers.
>    But they are not. See [4]
>    I don't know how to fix this. For the current code it is not an issue. It still works.
>    But it should be fixed.
>    Because of this I can't read the mac0 "phy-mode". I need this info to setup the tgrmii clock at hardware init.
> - Ethernet speed is unstable ~30-100mbit. I think I broke something. I have seems 1gbit before.
> 
> I hope that this can help you the get a step further.

Thanks, that is all good info.

Regards
Greg


> René
> 
> [1] https://lists.openwrt.org/pipermail/openwrt-devel/2018-August/013614.html
> [2] https://github.com/vDorst/linux-1/commits/mt7621-dsa-trgmii
> [3] https://github.com/vDorst/openwrt/commits/mt7621-dsa-trgmii
> [4] https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/ethernet/mediatek/mtk_eth_soc.c?h=v4.14.84#n1946
> 
>
Andrew Lunn Nov. 30, 2018, 1:33 p.m. UTC | #5
> 2. Maximal sized RX packets get silently dropped. So receive side packets
>    that are large (perfect case is the all-but-last packets in a fragemented
>    larger packet) appear to be dropped at the mt7621 ethernet MAC level.
>    The 7530 MIB switch register counters show receive packets at the physical
>    switch port side and at the CPU switch port - but I get no packets
>    received or errors in the 7621 ethernet MAC. If I set the mtu of the
>    server at the other end a little smaller (a few bytes is enough) then
>    I get all the packets through. It seems like the DSA/VLAN tag bytes
>    are causing a too large packet to get silently dropped somewhere.

Hi Gerg

Try increasing the MTU on the master device. Some hardware will reject
receiving packets bigger than the MTU. But if you increase the MTU by
the DSA header size, it will then receive the frame.

I have a patchset i will be posting soon to do this automatically.

  Andrew
Andrew Lunn Nov. 30, 2018, 1:37 p.m. UTC | #6
> 1. TX packets are not getting an IP header checksum via the normal
>    off-loaded checksumming when in DSA mode. I have to switch off
>    NETIF_F_IP_CSUM, so the software stack generates the checksum.
>    That checksum offloading works ok when not using the 7530 DSA driver.

With some vendors MAC hardware, there is a field in the descriptor to
indicate how big a VLAN tag the frame has. The hardware can then use
this information to skip over the VLAN tags to find the IP header, and
then perform checksuming. You might be able to re-use that, consider
the DSA header as part of the VLAN header.

Other vendors, there is no way i've found to get hadware offload of
checksumming working.

    Andrew
Greg Ungerer Nov. 30, 2018, 1:41 p.m. UTC | #7
Hi Bjorn,

On 30/11/18 10:16 pm, Bjørn Mork wrote:
> gerg@kernel.org writes:
> 
>> I have been working towards supporting the MT7530 switch as used in the
>> MediaTek MT7621 SoC. Unlike the MediaTek MT7623 the MT7621 is built around
>> a dual core MIPS CPU architecture. But underneath it is what appears to
>> be the same 7530 switch.
> 
> Great!  Good to see someone pushing this idea forward.
> 
>> The following 3 patches are more of an RFC than anything. They allow
>> use of the mt7530 dsa driver on this device - though with some issues
>> still to resolve. The primary change required is to not use the 7623
>> specific clock and regulator setup - none of that applies when using
>> the 7621 (and maybe other devices?). The other change required is to
>> set the 7530 MFC register CPU port number and enable bit.
>>
>> The unresolved issues I still have appear to be more related to the
>> MT7621 ethernet driver (drivers/staging/mt7621-eth/*). I am hoping
>> someone might have some ideas on these. I don't really have any good
>> documentation on the ethernet devices on the 7621, so I am kind of
>> working in the dark here.
> 
> No offense, but the mt7621-eth driver in staging is horrible.  What both
> René and I have had some success with is adapting the mtk_eth_soc driver
> already in drivers/net/ethernet/mediatek/.  Yes, I know this is supposed
> to be for other SoCs, but the basic design is obviously the same.

Yes, that makes a lot of sense. I have only been working with the
linux mainline mt7621 staging driver so far for this mt7530 work.
(I started trying to make this work with linux 4.19).


> I have had some success with a first hackish attemt based on OpenWrt.
> You can find the early tree here, but note that my focus was basically
> getting one specific MT7621 board up and running:
> https://github.com/bmork/LEDE/tree/mt7621-with-mainline-eth-driver

Thats great, thanks for the pointer, I'll have a close look at that.


> This patch has most of the necessary changes to enable that driver for
> MT7621:
> https://github.com/bmork/LEDE/commit/3293bc63f5461ca1eb0bbc4ed90145335e7e3404
> 
> Not a big deal, as you can see.  There is of course a reason I didn't
> submit this here yet: It is by no means finished...  But it works. And I
> have both GMACs working with this driver, which was my primary goal.
> 
>> 1. TX packets are not getting an IP header checksum via the normal
>>     off-loaded checksumming when in DSA mode. I have to switch off
>>     NETIF_F_IP_CSUM, so the software stack generates the checksum.
>>     That checksum offloading works ok when not using the 7530 DSA driver.
> 
> Hmm.  How do I test this?

For me every transmitted packet had the wrong IP header checksum.
Every packet I received at the other end of the wire (dumped with
tcpdump) showed up with wrong header checksum. I am guessing you didn't
see this behavior - you can't really miss it :-)


>> 2. Maximal sized RX packets get silently dropped. So receive side packets
>>     that are large (perfect case is the all-but-last packets in a fragemented
>>     larger packet) appear to be dropped at the mt7621 ethernet MAC level.
>>     The 7530 MIB switch register counters show receive packets at the physical
>>     switch port side and at the CPU switch port - but I get no packets
>>     received or errors in the 7621 ethernet MAC. If I set the mtu of the
>>     server at the other end a little smaller (a few bytes is enough) then
>>     I get all the packets through. It seems like the DSA/VLAN tag bytes
>>     are causing a too large packet to get silently dropped somewhere.
> 
> Are you referring to the configured MTU size or some other maximal size?
> If MTU, then I don't seem to have this issue with the driver from
> drivers/net/ethernet/mediatek/.

I was referring to the mtu on the system at the other end of the wire.
For me that was my development PC (just a xubuntu 18.04, x86 machine).
If I reduced it there from the default mtu of 1500 I could get packets
to be received on the mt7621. The behavior was obvious to see with large
sent packets (something simple like ping -s 2000), the first fragment was
silently dropped bu the second fragment would get through.

Anyway, I will try out the changes you have for the drivers/net/ethernet/mediatek
driver, they sound very promising.

Thanks
Greg
Andrew Lunn Nov. 30, 2018, 1:42 p.m. UTC | #8
> > 1. TX packets are not getting an IP header checksum via the normal
> >    off-loaded checksumming when in DSA mode. I have to switch off
> >    NETIF_F_IP_CSUM, so the software stack generates the checksum.
> >    That checksum offloading works ok when not using the 7530 DSA driver.
> 
> Hmm.  How do I test this?

If there are no IP checksums in the frame, the receiver will generally
drop the packet.

ethtool -k will show you what features the MAC has in terms of
offloading. So if you see NETIF_F_IP_CSUM, you know the MAC should be
doing it in hardware.

      Andrew
Greg Ungerer Nov. 30, 2018, 1:45 p.m. UTC | #9
Hi Andrew,

On 30/11/18 11:37 pm, Andrew Lunn wrote:
>> 1. TX packets are not getting an IP header checksum via the normal
>>     off-loaded checksumming when in DSA mode. I have to switch off
>>     NETIF_F_IP_CSUM, so the software stack generates the checksum.
>>     That checksum offloading works ok when not using the 7530 DSA driver.
> 
> With some vendors MAC hardware, there is a field in the descriptor to
> indicate how big a VLAN tag the frame has. The hardware can then use
> this information to skip over the VLAN tags to find the IP header, and
> then perform checksuming. You might be able to re-use that, consider
> the DSA header as part of the VLAN header.
> 
> Other vendors, there is no way i've found to get hadware offload of
> checksumming working.

Thanks for the tips, I will try out the master mtu change too.

Regards
Greg
Greg Ungerer Dec. 3, 2018, 6:47 a.m. UTC | #10
Hi Andrew,

On 30/11/18 11:33 pm, Andrew Lunn wrote:
>> 2. Maximal sized RX packets get silently dropped. So receive side packets
>>     that are large (perfect case is the all-but-last packets in a fragemented
>>     larger packet) appear to be dropped at the mt7621 ethernet MAC level.
>>     The 7530 MIB switch register counters show receive packets at the physical
>>     switch port side and at the CPU switch port - but I get no packets
>>     received or errors in the 7621 ethernet MAC. If I set the mtu of the
>>     server at the other end a little smaller (a few bytes is enough) then
>>     I get all the packets through. It seems like the DSA/VLAN tag bytes
>>     are causing a too large packet to get silently dropped somewhere.
> 
> Hi Gerg
> 
> Try increasing the MTU on the master device. Some hardware will reject
> receiving packets bigger than the MTU. But if you increase the MTU by
> the DSA header size, it will then receive the frame.

I tried increasing it on the master, and it fails to set:

  # ifconfig eth0 mut 1500
  # ifconfig eth0 mtu 1504
  eth0: mtu greater than device maximum
  SIOCSIFMTU: Invalid argument

Looking within that staging driver it seems to generously set the size
of the RX descriptor buffers, so there is enough room in them. And glancing
around the driver I don't see it using the MTU to restrict the receive
packaet size (though I may be missing it).


> I have a patchset i will be posting soon to do this automatically.

That would be good.

Regards
Greg
Greg Ungerer Dec. 3, 2018, 7:20 a.m. UTC | #11
Hi Bjorn,

On 30/11/18 10:16 pm, Bjørn Mork wrote:
> gerg@kernel.org writes:
> 
>> I have been working towards supporting the MT7530 switch as used in the
>> MediaTek MT7621 SoC. Unlike the MediaTek MT7623 the MT7621 is built around
>> a dual core MIPS CPU architecture. But underneath it is what appears to
>> be the same 7530 switch.
> 
> Great!  Good to see someone pushing this idea forward.
> 
>> The following 3 patches are more of an RFC than anything. They allow
>> use of the mt7530 dsa driver on this device - though with some issues
>> still to resolve. The primary change required is to not use the 7623
>> specific clock and regulator setup - none of that applies when using
>> the 7621 (and maybe other devices?). The other change required is to
>> set the 7530 MFC register CPU port number and enable bit.
>>
>> The unresolved issues I still have appear to be more related to the
>> MT7621 ethernet driver (drivers/staging/mt7621-eth/*). I am hoping
>> someone might have some ideas on these. I don't really have any good
>> documentation on the ethernet devices on the 7621, so I am kind of
>> working in the dark here.
> 
> No offense, but the mt7621-eth driver in staging is horrible.  What both
> René and I have had some success with is adapting the mtk_eth_soc driver
> already in drivers/net/ethernet/mediatek/.  Yes, I know this is supposed
> to be for other SoCs, but the basic design is obviously the same.
> 
> I have had some success with a first hackish attemt based on OpenWrt.
> You can find the early tree here, but note that my focus was basically
> getting one specific MT7621 board up and running:
> https://github.com/bmork/LEDE/tree/mt7621-with-mainline-eth-driver
> 
> This patch has most of the necessary changes to enable that driver for
> MT7621:
> https://github.com/bmork/LEDE/commit/3293bc63f5461ca1eb0bbc4ed90145335e7e3404

I applied this to my main debug linux-4.19 kernel. Didn't apply completely
cleanly but was easy to fix up.

Using that everything came up detected (the 7530 switch) and I could
quickly see that it does not suffer the problems I listed below. Both
RX and TX packets of any size work!

However I also quickly discovered that this driver was pretty unstable.
Put a bit of packet load on it, and it would stop responding, CPU lock up,
and occasional rcu stalled messages from the kernel.

The following change helped alot, but I still get some problems under
sustained load and some types of port setups. Still trying to figure
out what exactly is going on.

--- a/linux/drivers/net/ethernet/mediatek/mtk_eth_soc.c
+++ b/linux/drivers/net/ethernet/mediatek/mtk_eth_soc.c
@@ -1750,8 +1750,8 @@ static irqreturn_t mtk_handle_irq_rx(int irq, void *_eth)
  
         if (likely(napi_schedule_prep(&eth->rx_napi))) {
                 __napi_schedule(&eth->rx_napi);
-               mtk_rx_irq_disable(eth, MTK_RX_DONE_INT);
         }
+       mtk_rx_irq_disable(eth, MTK_RX_DONE_INT);
  
         return IRQ_HANDLED;
  }
@@ -1762,11 +1762,53 @@ static irqreturn_t mtk_handle_irq_tx(int irq, void *_eth)
  
         if (likely(napi_schedule_prep(&eth->tx_napi))) {
                 __napi_schedule(&eth->tx_napi);
-               mtk_tx_irq_disable(eth, MTK_TX_DONE_INT);
         }
+       mtk_tx_irq_disable(eth, MTK_TX_DONE_INT);
  
         return IRQ_HANDLED;
  }


Anyway, this really looks like the right approach to me. This driver is
clearly capable of supporting the mt7621 ethernet ports. No need for the
staging driver.

Regards
Greg



> Not a big deal, as you can see.  There is of course a reason I didn't
> submit this here yet: It is by no means finished...  But it works. And I
> have both GMACs working with this driver, which was my primary goal.
> 
>> 1. TX packets are not getting an IP header checksum via the normal
>>     off-loaded checksumming when in DSA mode. I have to switch off
>>     NETIF_F_IP_CSUM, so the software stack generates the checksum.
>>     That checksum offloading works ok when not using the 7530 DSA driver.
> 
> Hmm.  How do I test this?
> 
>> 2. Maximal sized RX packets get silently dropped. So receive side packets
>>     that are large (perfect case is the all-but-last packets in a fragemented
>>     larger packet) appear to be dropped at the mt7621 ethernet MAC level.
>>     The 7530 MIB switch register counters show receive packets at the physical
>>     switch port side and at the CPU switch port - but I get no packets
>>     received or errors in the 7621 ethernet MAC. If I set the mtu of the
>>     server at the other end a little smaller (a few bytes is enough) then
>>     I get all the packets through. It seems like the DSA/VLAN tag bytes
>>     are causing a too large packet to get silently dropped somewhere.
> 
> Are you referring to the configured MTU size or some other maximal size?
> If MTU, then I don't seem to have this issue with the driver from
> drivers/net/ethernet/mediatek/.
> 
> 
> 
> Bjørn
>
Bjørn Mork Dec. 3, 2018, 11:34 a.m. UTC | #12
[ fixed Johns address - the openwrt.org email was apparently never restored? ]

Greg Ungerer <gerg@kernel.org> writes:

> The following change helped alot, but I still get some problems under
> sustained load and some types of port setups. Still trying to figure
> out what exactly is going on.
>
> --- a/linux/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> +++ b/linux/drivers/net/ethernet/mediatek/mtk_eth_soc.c
> @@ -1750,8 +1750,8 @@ static irqreturn_t mtk_handle_irq_rx(int irq, void *_eth)
>        if (likely(napi_schedule_prep(&eth->rx_napi))) {
>                 __napi_schedule(&eth->rx_napi);
> -               mtk_rx_irq_disable(eth, MTK_RX_DONE_INT);
>         }
> +       mtk_rx_irq_disable(eth, MTK_RX_DONE_INT);
>        return IRQ_HANDLED;
>  }
> @@ -1762,11 +1762,53 @@ static irqreturn_t mtk_handle_irq_tx(int irq, void *_eth)
>        if (likely(napi_schedule_prep(&eth->tx_napi))) {
>                 __napi_schedule(&eth->tx_napi);
> -               mtk_tx_irq_disable(eth, MTK_TX_DONE_INT);
>         }
> +       mtk_tx_irq_disable(eth, MTK_TX_DONE_INT);
>        return IRQ_HANDLED;
>  }

Yes, sorry I didn't point to that as well.  Just to be clear:  I have no
clue how this thing is actually wired up, or if you could use three
interrupts on the MT7621 too. I just messed with it until I got
something to work, based on Renés original idea and code.

> Anyway, this really looks like the right approach to me. This driver is
> clearly capable of supporting the mt7621 ethernet ports. No need for the
> staging driver.

Great! Thanks for doing this.

I did make a feeble attempt at testing this with current mainline
myself, but the only MT7621 board I have is using NAND flash.  So I
started trying to forward port the mtk-nand2 driver from OpenWrt. And
failed. Probably a simple mixup while trying to adjust to the many
changes in the raw NAND API between v4.14 and v.4.20.  Then I
optimistically attempted to use the mainline mtk-nand driver instead,
assuming it would be as simple as with the mtk-eth driver.  Which it
wasn't, of course. I guess there are a lot of things I do not understand
wrt flash and HW ECC etc...

Short version: I won't be able to test the mainline mtk-eth driver with
MT7621 on newer kernels before smarter people like John upgrade the
OpenWrt kernel.


Bjørn
René van Dorst Dec. 3, 2018, 2 p.m. UTC | #13
Quoting Bjørn Mork <bjorn@mork.no>:
> Greg Ungerer <gerg@kernel.org> writes:
>
>> The following change helped alot, but I still get some problems under
>> sustained load and some types of port setups. Still trying to figure
>> out what exactly is going on.
>>
>> --- a/linux/drivers/net/ethernet/mediatek/mtk_eth_soc.c
>> +++ b/linux/drivers/net/ethernet/mediatek/mtk_eth_soc.c
>> @@ -1750,8 +1750,8 @@ static irqreturn_t mtk_handle_irq_rx(int irq,  
>> void *_eth)
>>        if (likely(napi_schedule_prep(&eth->rx_napi))) {
>>                 __napi_schedule(&eth->rx_napi);
>> -               mtk_rx_irq_disable(eth, MTK_RX_DONE_INT);
>>         }
>> +       mtk_rx_irq_disable(eth, MTK_RX_DONE_INT);
>>        return IRQ_HANDLED;
>>  }
>> @@ -1762,11 +1762,53 @@ static irqreturn_t mtk_handle_irq_tx(int  
>> irq, void *_eth)
>>        if (likely(napi_schedule_prep(&eth->tx_napi))) {
>>                 __napi_schedule(&eth->tx_napi);
>> -               mtk_tx_irq_disable(eth, MTK_TX_DONE_INT);
>>         }
>> +       mtk_tx_irq_disable(eth, MTK_TX_DONE_INT);
>>        return IRQ_HANDLED;
>>  }
>
> Yes, sorry I didn't point to that as well.  Just to be clear:  I have no
> clue how this thing is actually wired up, or if you could use three
> interrupts on the MT7621 too. I just messed with it until I got
> something to work, based on Renés original idea and code.

My idea is a just a copy of mtk_handle_irq_{rx,tx} see [1]
You probably want to look at the staging driver or Ubiquity source  
with a 3.10.x kernel [2] or padavan with 3.4.x kernel [3].
AFAIK mt7621 only has 1 IRQ for ethernet part.

Greats,

René

[1]  
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/ethernet/mediatek/mtk_eth_soc.c#n1739
[2]  
https://www.ubnt.com/download/edgemax/edgerouter-x-sfp/default/edgerouter-er-xer-x-sfpep-r6-firmware-v1107
[3]  
https://bitbucket.org/padavan/rt-n56u/src/e6f45337528f668651e251057a1a0fec735f6df1/trunk/linux-3.4.x/drivers/net/raeth/raether.c?at=master&fileviewer=file-view-default#raether.c-658
John Crispin Dec. 3, 2018, 2:02 p.m. UTC | #14
On 03/12/2018 15:00, René van Dorst wrote:
> Quoting Bjørn Mork <bjorn@mork.no>:
>> Greg Ungerer <gerg@kernel.org> writes:
>>
>>> The following change helped alot, but I still get some problems under
>>> sustained load and some types of port setups. Still trying to figure
>>> out what exactly is going on.
>>>
>>> --- a/linux/drivers/net/ethernet/mediatek/mtk_eth_soc.c
>>> +++ b/linux/drivers/net/ethernet/mediatek/mtk_eth_soc.c
>>> @@ -1750,8 +1750,8 @@ static irqreturn_t mtk_handle_irq_rx(int irq, 
>>> void *_eth)
>>>        if (likely(napi_schedule_prep(&eth->rx_napi))) {
>>>                 __napi_schedule(&eth->rx_napi);
>>> -               mtk_rx_irq_disable(eth, MTK_RX_DONE_INT);
>>>         }
>>> +       mtk_rx_irq_disable(eth, MTK_RX_DONE_INT);
>>>        return IRQ_HANDLED;
>>>  }
>>> @@ -1762,11 +1762,53 @@ static irqreturn_t mtk_handle_irq_tx(int 
>>> irq, void *_eth)
>>>        if (likely(napi_schedule_prep(&eth->tx_napi))) {
>>>                 __napi_schedule(&eth->tx_napi);
>>> -               mtk_tx_irq_disable(eth, MTK_TX_DONE_INT);
>>>         }
>>> +       mtk_tx_irq_disable(eth, MTK_TX_DONE_INT);
>>>        return IRQ_HANDLED;
>>>  }
>>
>> Yes, sorry I didn't point to that as well.  Just to be clear:  I have no
>> clue how this thing is actually wired up, or if you could use three
>> interrupts on the MT7621 too. I just messed with it until I got
>> something to work, based on Renés original idea and code.
>
> My idea is a just a copy of mtk_handle_irq_{rx,tx} see [1]
> You probably want to look at the staging driver or Ubiquity source 
> with a 3.10.x kernel [2] or padavan with 3.4.x kernel [3].
> AFAIK mt7621 only has 1 IRQ for ethernet part.

correct there is only 1 single IRQ on mt7621

     John



>
> Greats,
>
> René
>
> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/net/ethernet/mediatek/mtk_eth_soc.c#n1739
> [2] 
> https://www.ubnt.com/download/edgemax/edgerouter-x-sfp/default/edgerouter-er-xer-x-sfpep-r6-firmware-v1107
> [3] 
> https://bitbucket.org/padavan/rt-n56u/src/e6f45337528f668651e251057a1a0fec735f6df1/trunk/linux-3.4.x/drivers/net/raeth/raether.c?at=master&fileviewer=file-view-default#raether.c-658
>
>
Greg Ungerer Dec. 4, 2018, 7:23 a.m. UTC | #15
Hi Bjorn,

On 3/12/18 9:34 pm, Bjørn Mork wrote:
> [ fixed Johns address - the openwrt.org email was apparently never restored? ]
> 
> Greg Ungerer <gerg@kernel.org> writes:
> 
>> The following change helped alot, but I still get some problems under
>> sustained load and some types of port setups. Still trying to figure
>> out what exactly is going on.
>>
>> --- a/linux/drivers/net/ethernet/mediatek/mtk_eth_soc.c
>> +++ b/linux/drivers/net/ethernet/mediatek/mtk_eth_soc.c
>> @@ -1750,8 +1750,8 @@ static irqreturn_t mtk_handle_irq_rx(int irq, void *_eth)
>>         if (likely(napi_schedule_prep(&eth->rx_napi))) {
>>                  __napi_schedule(&eth->rx_napi);
>> -               mtk_rx_irq_disable(eth, MTK_RX_DONE_INT);
>>          }
>> +       mtk_rx_irq_disable(eth, MTK_RX_DONE_INT);
>>         return IRQ_HANDLED;
>>   }
>> @@ -1762,11 +1762,53 @@ static irqreturn_t mtk_handle_irq_tx(int irq, void *_eth)
>>         if (likely(napi_schedule_prep(&eth->tx_napi))) {
>>                  __napi_schedule(&eth->tx_napi);
>> -               mtk_tx_irq_disable(eth, MTK_TX_DONE_INT);
>>          }
>> +       mtk_tx_irq_disable(eth, MTK_TX_DONE_INT);
>>         return IRQ_HANDLED;
>>   }
> 
> Yes, sorry I didn't point to that as well.  Just to be clear:  I have no
> clue how this thing is actually wired up, or if you could use three
> interrupts on the MT7621 too. I just messed with it until I got
> something to work, based on Renés original idea and code.

Understood.

By way of progress I have found that making the single IRQ handler
look like this is better than the last patch:

static irqreturn_t mtk_handle_irq(int irq, void *_eth)
{
         struct mtk_eth *eth = _eth;
         irqreturn_t rc = IRQ_NONE;

         if (mtk_r32(eth, MTK_PDMA_INT_MASK) & MTK_RX_DONE_INT) {
                 if (mtk_r32(eth, MTK_PDMA_INT_STATUS) & MTK_RX_DONE_INT)
                         rc = mtk_handle_irq_rx(irq, _eth);
         }

         if (mtk_r32(eth, MTK_QDMA_INT_MASK) & MTK_TX_DONE_INT) {
                 if (mtk_r32(eth, MTK_QMTK_INT_STATUS) & MTK_TX_DONE_INT)
                         rc = mtk_handle_irq_tx(irq, _eth);
         }

         return rc;
}

So not calling the RX or TX handlers if their interrupts
where not masked on. For some work loads that is quite reliable.
Flood ping through one port and out the other can get a lof of
packets through.

However I still get dev_watchdog timeouts after a while and with
more mixed packet loads. Seems like something is racing on the
TX path side.

Regards
Greg



>> Anyway, this really looks like the right approach to me. This driver is
>> clearly capable of supporting the mt7621 ethernet ports. No need for the
>> staging driver.
> 
> Great! Thanks for doing this.
> 
> I did make a feeble attempt at testing this with current mainline
> myself, but the only MT7621 board I have is using NAND flash.  So I
> started trying to forward port the mtk-nand2 driver from OpenWrt. And
> failed. Probably a simple mixup while trying to adjust to the many
> changes in the raw NAND API between v4.14 and v.4.20.  Then I
> optimistically attempted to use the mainline mtk-nand driver instead,
> assuming it would be as simple as with the mtk-eth driver.  Which it
> wasn't, of course. I guess there are a lot of things I do not understand
> wrt flash and HW ECC etc...
> 
> Short version: I won't be able to test the mainline mtk-eth driver with
> MT7621 on newer kernels before smarter people like John upgrade the
> OpenWrt kernel.
> 
> 
> Bjørn
>
Greg Ungerer Dec. 7, 2018, 7:12 a.m. UTC | #16
Hi John,

On 4/12/18 12:02 am, John Crispin wrote:
> On 03/12/2018 15:00, René van Dorst wrote:
>> Quoting Bjørn Mork <bjorn@mork.no>:
>>> Greg Ungerer <gerg@kernel.org> writes:
>>>
>>>> The following change helped alot, but I still get some problems under
>>>> sustained load and some types of port setups. Still trying to figure
>>>> out what exactly is going on.
>>>>
>>>> --- a/linux/drivers/net/ethernet/mediatek/mtk_eth_soc.c
>>>> +++ b/linux/drivers/net/ethernet/mediatek/mtk_eth_soc.c
>>>> @@ -1750,8 +1750,8 @@ static irqreturn_t mtk_handle_irq_rx(int irq, void *_eth)
>>>>        if (likely(napi_schedule_prep(&eth->rx_napi))) {
>>>>                 __napi_schedule(&eth->rx_napi);
>>>> -               mtk_rx_irq_disable(eth, MTK_RX_DONE_INT);
>>>>         }
>>>> +       mtk_rx_irq_disable(eth, MTK_RX_DONE_INT);
>>>>        return IRQ_HANDLED;
>>>>  }
>>>> @@ -1762,11 +1762,53 @@ static irqreturn_t mtk_handle_irq_tx(int irq, void *_eth)
>>>>        if (likely(napi_schedule_prep(&eth->tx_napi))) {
>>>>                 __napi_schedule(&eth->tx_napi);
>>>> -               mtk_tx_irq_disable(eth, MTK_TX_DONE_INT);
>>>>         }
>>>> +       mtk_tx_irq_disable(eth, MTK_TX_DONE_INT);
>>>>        return IRQ_HANDLED;
>>>>  }
>>>
>>> Yes, sorry I didn't point to that as well.  Just to be clear:  I have no
>>> clue how this thing is actually wired up, or if you could use three
>>> interrupts on the MT7621 too. I just messed with it until I got
>>> something to work, based on Renés original idea and code.
>>
>> My idea is a just a copy of mtk_handle_irq_{rx,tx} see [1]
>> You probably want to look at the staging driver or Ubiquity source with a 3.10.x kernel [2] or padavan with 3.4.x kernel [3].
>> AFAIK mt7621 only has 1 IRQ for ethernet part.
> 
> correct there is only 1 single IRQ on mt7621

One of the main differences I see between the mainline mtk_eth_soc.c
and the older mediatek/openwrt driver is that the older driver uses
the PDMA module for TX transmission, while the mainline uses the
QDMA module. I have no documentation on the what the differences
are between the 2 (or why there is even 2 DMA engines in there?).
Can you shed any light on that?

I did a quick and dirty recode of the QDMA transmission parts of
the mainline driver code to use the PDMA engine instead. The most
immediate result is that it suffers the same IP header checksum
problem on TX packets :-(  But it also still suffers from the
same occasional TX watchdog timeout I see with only the mainline
driver and basic support of MT7621.

What I see with the TX watchdog timeouts is that there is valid
TX descriptors, but the frame engine is just not processing them.
It seems to be just sitting there idle. The CTX and DTX registers
look valid and consistent with the local last_free/next_free
pointers.

Regards
Greg
NeilBrown Dec. 11, 2018, 5:02 a.m. UTC | #17
On Fri, Nov 30 2018, Bjørn Mork wrote:

> gerg@kernel.org writes:
>
>> I have been working towards supporting the MT7530 switch as used in the
>> MediaTek MT7621 SoC. Unlike the MediaTek MT7623 the MT7621 is built around
>> a dual core MIPS CPU architecture. But underneath it is what appears to
>> be the same 7530 switch.
>
> Great!  Good to see someone pushing this idea forward.
>
>> The following 3 patches are more of an RFC than anything. They allow
>> use of the mt7530 dsa driver on this device - though with some issues
>> still to resolve. The primary change required is to not use the 7623
>> specific clock and regulator setup - none of that applies when using
>> the 7621 (and maybe other devices?). The other change required is to
>> set the 7530 MFC register CPU port number and enable bit.
>>
>> The unresolved issues I still have appear to be more related to the
>> MT7621 ethernet driver (drivers/staging/mt7621-eth/*). I am hoping
>> someone might have some ideas on these. I don't really have any good
>> documentation on the ethernet devices on the 7621, so I am kind of
>> working in the dark here.
>
> No offense, but the mt7621-eth driver in staging is horrible.  What both
> René and I have had some success with is adapting the mtk_eth_soc driver
> already in drivers/net/ethernet/mediatek/.  Yes, I know this is supposed
> to be for other SoCs, but the basic design is obviously the same.
>
> I have had some success with a first hackish attemt based on OpenWrt.
> You can find the early tree here, but note that my focus was basically
> getting one specific MT7621 board up and running:
> https://github.com/bmork/LEDE/tree/mt7621-with-mainline-eth-driver
>
> This patch has most of the necessary changes to enable that driver for
> MT7621:
> https://github.com/bmork/LEDE/commit/3293bc63f5461ca1eb0bbc4ed90145335e7e3404
>
> Not a big deal, as you can see.  There is of course a reason I didn't
> submit this here yet: It is by no means finished...  But it works. And I
> have both GMACs working with this driver, which was my primary goal.

Thanks a lot for posting this.  I'd be very happy to see the staging
driver disappear once this is ready and in mainline.

I got your patch working on 4.20-rc5 and did a performance comparison.
With the staging driver (using iperf3) I get
  220 MBit/sec in
  680 MBit/sec out

with the patched mainline driver I get
  190 MBit/sec in
   93 MBit/sec out

(numbers are a bit rubbery, but within 10%)

I haven't looked into why this might be, but thought I would mention it.

Strangely when I test with scp, I get about 10MB/sec in both directions
with both drivers.  Maybe the CPU limits encryption speed.

I have a 4.4-based kernel where I get 940MBit/sec both ways - using a
precursor of the current staging driver.

Thanks,
NeilBrown


>
>> 1. TX packets are not getting an IP header checksum via the normal
>>    off-loaded checksumming when in DSA mode. I have to switch off
>>    NETIF_F_IP_CSUM, so the software stack generates the checksum.
>>    That checksum offloading works ok when not using the 7530 DSA driver.
>
> Hmm.  How do I test this?
>
>> 2. Maximal sized RX packets get silently dropped. So receive side packets
>>    that are large (perfect case is the all-but-last packets in a fragemented
>>    larger packet) appear to be dropped at the mt7621 ethernet MAC level.
>>    The 7530 MIB switch register counters show receive packets at the physical
>>    switch port side and at the CPU switch port - but I get no packets
>>    received or errors in the 7621 ethernet MAC. If I set the mtu of the
>>    server at the other end a little smaller (a few bytes is enough) then
>>    I get all the packets through. It seems like the DSA/VLAN tag bytes
>>    are causing a too large packet to get silently dropped somewhere.
>
> Are you referring to the configured MTU size or some other maximal size?
> If MTU, then I don't seem to have this issue with the driver from
> drivers/net/ethernet/mediatek/.
>
>
>
> Bjørn
Bjørn Mork Dec. 11, 2018, 8:28 a.m. UTC | #18
NeilBrown <neil@brown.name> writes:

> I got your patch working on 4.20-rc5 and did a performance comparison.
> With the staging driver (using iperf3) I get
>   220 MBit/sec in
>   680 MBit/sec out
>
> with the patched mainline driver I get
>   190 MBit/sec in
>    93 MBit/sec out
>
> (numbers are a bit rubbery, but within 10%)
>
> I haven't looked into why this might be, but thought I would mention it.
>
> Strangely when I test with scp, I get about 10MB/sec in both directions
> with both drivers.  Maybe the CPU limits encryption speed.
>
> I have a 4.4-based kernel where I get 940MBit/sec both ways - using a
> precursor of the current staging driver.

Yes, thanks for bringing this up. I should have mentioned that I haven't
considered performance at all yet.


Bjørn
NeilBrown Dec. 16, 2018, 10:08 p.m. UTC | #19
On Tue, Dec 11 2018, NeilBrown wrote:

>
> I got your patch working on 4.20-rc5 and did a performance comparison.
> With the staging driver (using iperf3) I get
>   220 MBit/sec in
>   680 MBit/sec out
>
> with the patched mainline driver I get
>   190 MBit/sec in
>    93 MBit/sec out
>
> (numbers are a bit rubbery, but within 10%)
>
> I haven't looked into why this might be, but thought I would mention it.
>
> Strangely when I test with scp, I get about 10MB/sec in both directions
> with both drivers.  Maybe the CPU limits encryption speed.
>
> I have a 4.4-based kernel where I get 940MBit/sec both ways - using a
> precursor of the current staging driver.

Just FYI, I've been looking further into this, and I don't think the
problem is (entirely) related to the driver.

In my 4.4 kernel, the build_skb() call in (the equivalent of)
mtk_poll_rx() takes about 1.2usec and the call to napi_gro_receive()
takes about 3usec.

In my 4.20 kernel, these calls take about 30 and 24 usec respectively.
This easily explains the slowdown.
I don't yet know why, and won't have time to look for a few days.
I haven't looked into how this affects the
drivers/net/ethernet/mediatek driver in 4.20.

If anyone has ideas about why these might be so slow, I'd love to hear
them.

Thanks,
NeilBrown
David Miller Dec. 16, 2018, 10:14 p.m. UTC | #20
From: NeilBrown <neil@brown.name>
Date: Mon, 17 Dec 2018 09:08:54 +1100

> In my 4.4 kernel, the build_skb() call in (the equivalent of)
> mtk_poll_rx() takes about 1.2usec and the call to napi_gro_receive()
> takes about 3usec.
> 
> In my 4.20 kernel, these calls take about 30 and 24 usec respectively.
> This easily explains the slowdown.

That's a huge difference.

Nothing jumps out as a possible cause except perhaps retpoline or
something like that.
NeilBrown Dec. 16, 2018, 11:19 p.m. UTC | #21
On Sun, Dec 16 2018, David Miller wrote:

> From: NeilBrown <neil@brown.name>
> Date: Mon, 17 Dec 2018 09:08:54 +1100
>
>> In my 4.4 kernel, the build_skb() call in (the equivalent of)
>> mtk_poll_rx() takes about 1.2usec and the call to napi_gro_receive()
>> takes about 3usec.
>> 
>> In my 4.20 kernel, these calls take about 30 and 24 usec respectively.
>> This easily explains the slowdown.
>
> That's a huge difference.
>
> Nothing jumps out as a possible cause except perhaps retpoline or
> something like that.

I'll keep that in mind - thanks.

My guess was CPU-cache invalidation.
I just checked and the other CPU core (there are two - each
hyper-threaded - "other" meaning not the one that handles ethernet
interrupts) gets several thousand "IPI resched" interrupts while
running a 10 second (226MByte) iperf3 receive test.
About 17KB transferred per IPI.
I cannot see where build_skb() would do cache invalidation though.

Thanks,
NeilBrown
Florian Fainelli Dec. 17, 2018, midnight UTC | #22
On December 16, 2018 3:19:22 PM PST, NeilBrown <neil@brown.name> wrote:
>On Sun, Dec 16 2018, David Miller wrote:
>
>> From: NeilBrown <neil@brown.name>
>> Date: Mon, 17 Dec 2018 09:08:54 +1100
>>
>>> In my 4.4 kernel, the build_skb() call in (the equivalent of)
>>> mtk_poll_rx() takes about 1.2usec and the call to napi_gro_receive()
>>> takes about 3usec.
>>> 
>>> In my 4.20 kernel, these calls take about 30 and 24 usec
>respectively.
>>> This easily explains the slowdown.
>>
>> That's a huge difference.
>>
>> Nothing jumps out as a possible cause except perhaps retpoline or
>> something like that.
>
>I'll keep that in mind - thanks.
>
>My guess was CPU-cache invalidation.
>I just checked and the other CPU core (there are two - each
>hyper-threaded - "other" meaning not the one that handles ethernet
>interrupts) gets several thousand "IPI resched" interrupts while
>running a 10 second (226MByte) iperf3 receive test.
>About 17KB transferred per IPI.
>I cannot see where build_skb() would do cache invalidation though.

It doesn't the driver is responsible for that. How is coherency maintained between cores?

The IPI could be due to receive packet steering, is the MAC multi queue aware on the RX path?
NeilBrown Dec. 17, 2018, 7:11 a.m. UTC | #23
On Sun, Dec 16 2018, Florian Fainelli wrote:

> On December 16, 2018 3:19:22 PM PST, NeilBrown <neil@brown.name> wrote:
>>On Sun, Dec 16 2018, David Miller wrote:
>>
>>> From: NeilBrown <neil@brown.name>
>>> Date: Mon, 17 Dec 2018 09:08:54 +1100
>>>
>>>> In my 4.4 kernel, the build_skb() call in (the equivalent of)
>>>> mtk_poll_rx() takes about 1.2usec and the call to napi_gro_receive()
>>>> takes about 3usec.
>>>> 
>>>> In my 4.20 kernel, these calls take about 30 and 24 usec
>>respectively.
>>>> This easily explains the slowdown.
>>>
>>> That's a huge difference.
>>>
>>> Nothing jumps out as a possible cause except perhaps retpoline or
>>> something like that.
>>
>>I'll keep that in mind - thanks.
>>
>>My guess was CPU-cache invalidation.
>>I just checked and the other CPU core (there are two - each
>>hyper-threaded - "other" meaning not the one that handles ethernet
>>interrupts) gets several thousand "IPI resched" interrupts while
>>running a 10 second (226MByte) iperf3 receive test.
>>About 17KB transferred per IPI.
>>I cannot see where build_skb() would do cache invalidation though.
>
> It doesn't the driver is responsible for that. How is coherency maintained between cores?

I suspect so - yes.  Coherency only needs explicit management with DMA
is used.  This wasn't the problem.

Further investigation showed that the problem was that I had
CONFIG_SLUB_DEBUG set.  That was probably useful in some earlier
debugging exercise, but it clearly isn't useful when performance-testing
the network.
I removed that and I have much nicer numbers - not quite the consistent
900+ that I saw with 4.4, but a lot closer.

Thanks for the encouragement, and sorry of the noise.

NeilBrown


>
> The IPI could be due to receive packet steering, is the MAC multi queue aware on the RX path?
> -- 
> Florian