mbox series

[v4,00/11] mediatek: add support for MediaTek Ethernet MAC

Message ID 20200520112523.30995-1-brgl@bgdev.pl
Headers show
Series mediatek: add support for MediaTek Ethernet MAC | expand

Message

Bartosz Golaszewski May 20, 2020, 11:25 a.m. UTC
From: Bartosz Golaszewski <bgolaszewski@baylibre.com>

This adds support for the Ethernet Controller present on MediaTeK SoCs from
the MT8* family.

First we convert the existing DT bindings for the PERICFG controller to YAML
and add a new compatible string for mt8516 variant of it. Then we add the DT
bindings for the MAC.

Next we do some cleanup of the mediatek ethernet drivers directory.

The largest patch in the series adds the actual new driver.

The rest of the patches add DT fixups for the boards already supported
upstream.

v1 -> v2:
- add a generic helper for retrieving the net_device associated with given
  private data
- fix several typos in commit messages
- remove MTK_MAC_VERSION and don't set the driver version
- use NET_IP_ALIGN instead of a magic number (2) but redefine it as it defaults
  to 0 on arm64
- don't manually turn the carrier off in mtk_mac_enable()
- process TX cleanup in napi poll callback
- configure pause in the adjust_link callback
- use regmap_read_poll_timeout() instead of handcoding the polling
- use devres_find() to verify that struct net_device is managed by devres in
  devm_register_netdev()
- add a patch moving all networking devres helpers into net/devres.c
- tweak the dma barriers: remove where unnecessary and add comments to the
  remaining barriers
- don't reset internal counters when enabling the NIC
- set the net_device's mtu size instead of checking the framesize in
  ndo_start_xmit() callback
- fix a race condition in waking up the netif queue
- don't emit log messages on OOM errors
- use dma_set_mask_and_coherent()
- use eth_hw_addr_random()
- rework the receive callback so that we reuse the previous skb if unmapping
  fails, like we already do if skb allocation fails
- rework hash table operations: add proper timeout handling and clear bits when
  appropriate

v2 -> v3:
- drop the patch adding priv_to_netdev() and store the netdev pointer in the
  driver private data
- add an additional dma_wmb() after reseting the descriptor in
  mtk_mac_ring_pop_tail()
- check the return value of dma_set_mask_and_coherent()
- improve the DT bindings for mtk-eth-mac: make the reg property in the example
  use single-cell address and size, extend the description of the PERICFG
  phandle and document the mdio sub-node
- add a patch converting the old .txt bindings for PERICFG to yaml
- limit reading the DMA memory by storing the mapped addresses in the driver
  private structure
- add a patch documenting the existing networking devres helpers

v3 -> v4:
- drop the devres patches: they will be sent separately
- call netdev_sent_queue() & netdev_completed_queue() where appropriate
- don't redefine NET_IP_ALIGN: define a private constant in the driver
- fix a couple typos
- only disabe/enable the MAC in suspend/resume if netif is running
- drop the count field from the ring structure and instead calculate the number
  of used descriptors from the tail and head indicies
- rework the locking used to protect the ring structures from concurrent
  access: use cheaper spin_lock_bh() and completely disable the internal
  spinlock used by regmap
- rework the interrupt handling to make it more fine-grained: onle re-enable
  TX and RX interrupts while they're needed, process the stats updates in a
  workqueue, not in napi context
- shrink the code responsible for unmapping and freeing skb memory
- rework the barriers as advised by Arnd

Bartosz Golaszewski (11):
  dt-bindings: convert the binding document for mediatek PERICFG to yaml
  dt-bindings: add new compatible to mediatek,pericfg
  dt-bindings: net: add a binding document for MediaTek Ethernet MAC
  net: ethernet: mediatek: rename Kconfig prompt
  net: ethernet: mediatek: remove unnecessary spaces from Makefile
  net: ethernet: mtk-eth-mac: new driver
  ARM64: dts: mediatek: add pericfg syscon to mt8516.dtsi
  ARM64: dts: mediatek: add the ethernet node to mt8516.dtsi
  ARM64: dts: mediatek: add an alias for ethernet0 for pumpkin boards
  ARM64: dts: mediatek: add ethernet pins for pumpkin boards
  ARM64: dts: mediatek: enable ethernet on pumpkin boards

 .../arm/mediatek/mediatek,pericfg.txt         |   36 -
 .../arm/mediatek/mediatek,pericfg.yaml        |   64 +
 .../bindings/net/mediatek,eth-mac.yaml        |   89 +
 arch/arm64/boot/dts/mediatek/mt8516.dtsi      |   17 +
 .../boot/dts/mediatek/pumpkin-common.dtsi     |   34 +
 drivers/net/ethernet/mediatek/Kconfig         |    8 +-
 drivers/net/ethernet/mediatek/Makefile        |    3 +-
 drivers/net/ethernet/mediatek/mtk_eth_mac.c   | 1668 +++++++++++++++++
 8 files changed, 1881 insertions(+), 38 deletions(-)
 delete mode 100644 Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.txt
 create mode 100644 Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.yaml
 create mode 100644 Documentation/devicetree/bindings/net/mediatek,eth-mac.yaml
 create mode 100644 drivers/net/ethernet/mediatek/mtk_eth_mac.c

Comments

Arnd Bergmann May 20, 2020, 2:37 p.m. UTC | #1
On Wed, May 20, 2020 at 1:25 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
>
> From: Bartosz Golaszewski <bgolaszewski@baylibre.com>
>
> This adds the driver for the MediaTek Ethernet MAC used on the MT8* SoC
> family. For now we only support full-duplex.
>
> Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>

Looks much better, thanks for addressing my feedback. A few more things
about this version:

> ---
>  drivers/net/ethernet/mediatek/Kconfig       |    6 +
>  drivers/net/ethernet/mediatek/Makefile      |    1 +
>  drivers/net/ethernet/mediatek/mtk_eth_mac.c | 1668 +++++++++++++++++++
>  3 files changed, 1675 insertions(+)
>  create mode 100644 drivers/net/ethernet/mediatek/mtk_eth_mac.c
>
> diff --git a/drivers/net/ethernet/mediatek/Kconfig b/drivers/net/ethernet/mediatek/Kconfig
> index 5079b8090f16..5c3793076765 100644
> --- a/drivers/net/ethernet/mediatek/Kconfig
> +++ b/drivers/net/ethernet/mediatek/Kconfig
> @@ -14,4 +14,10 @@ config NET_MEDIATEK_SOC
>           This driver supports the gigabit ethernet MACs in the
>           MediaTek SoC family.
>
> +config NET_MEDIATEK_MAC
> +       tristate "MediaTek Ethernet MAC support"
> +       select PHYLIB
> +       help
> +         This driver supports the ethernet IP on MediaTek MT85** SoCs.

I just noticed how the naming of NET_MEDIATEK_MAC and NET_MEDIATEK_SOC
for two different drivers doing the same thing is really confusing.

Maybe someone can come up with a better name, such as one
based on the soc it first showed up in.

> +       struct mtk_mac_ring_desc *desc = &ring->descs[ring->head];
> +       unsigned int status;
> +
> +       status = desc->status;
> +
> +       ring->skbs[ring->head] = desc_data->skb;
> +       ring->dma_addrs[ring->head] = desc_data->dma_addr;
> +       desc->data_ptr = desc_data->dma_addr;
> +
> +       status |= desc_data->len;
> +       if (flags)
> +               status |= flags;
> +       desc->status = status;
> +
> +       /* Flush previous modifications before ownership change. */
> +       dma_wmb();
> +       desc->status &= ~MTK_MAC_DESC_BIT_COWN;

You still do the read-modify-write on the word here, which is
expensive on uncached memory. You have read the value already,
so better use an assignment rather than &=, or (better)
READ_ONCE() and WRITE_ONCE() to prevent the compiler
from adding further accesses.


> +static void mtk_mac_lock(struct mtk_mac_priv *priv)
> +{
> +       spin_lock_bh(&priv->lock);
> +}
> +
> +static void mtk_mac_unlock(struct mtk_mac_priv *priv)
> +{
> +       spin_unlock_bh(&priv->lock);
> +}

I think open-coding the locks would make this more readable,
and let you use spin_lock() instead of spin_lock_bh() in
those functions that are already in softirq context.

> +static void mtk_mac_intr_enable_tx(struct mtk_mac_priv *priv)
> +{
> +       regmap_update_bits(priv->regs, MTK_MAC_REG_INT_MASK,
> +                          MTK_MAC_BIT_INT_STS_TNTC, 0);
> +}
> +static void mtk_mac_intr_enable_rx(struct mtk_mac_priv *priv)
> +{
> +       regmap_update_bits(priv->regs, MTK_MAC_REG_INT_MASK,
> +                          MTK_MAC_BIT_INT_STS_FNRC, 0);
> +}

These imply reading the irq mask register and then writing it again,
which is much more expensive than just writing it. It's also not
atomic since the regmap does not use a lock.

I don't think you actually need to enable/disable rx and tx separately,
but if you do, then writing to the Ack register as I suggested instead
of updating the mask would let you do this.

> +/* All processing for TX and RX happens in the napi poll callback. */
> +static irqreturn_t mtk_mac_handle_irq(int irq, void *data)
> +{
> +       struct mtk_mac_priv *priv;
> +       struct net_device *ndev;
> +       bool need_napi = false;
> +       unsigned int status;
> +
> +       ndev = data;
> +       priv = netdev_priv(ndev);
> +
> +       if (netif_running(ndev)) {
> +               status = mtk_mac_intr_read(priv);
> +
> +               if (status & MTK_MAC_BIT_INT_STS_TNTC) {
> +                       mtk_mac_intr_disable_tx(priv);
> +                       need_napi = true;
> +               }
> +
> +               if (status & MTK_MAC_BIT_INT_STS_FNRC) {
> +                       mtk_mac_intr_disable_rx(priv);
> +                       need_napi = true;
> +               }

I think you mixed up the rx and tx bits here: when you get
an rx interrupt, that one is already blocked until it gets
acked and you just need to disable tx until the end of the
poll function.

However, I suspect that the overhead of turning them off
is higher than what  you can save, and simply ignoring
the mask with

if (status & (MTK_MAC_BIT_INT_STS_FNRC | MTK_MAC_BIT_INT_STS_TNTC))
        napi_schedule(&priv->napi);

would be simpler and faster.

 +               /* One of the counters reached 0x8000000 - update stats and
> +                * reset all counters.
> +                */
> +               if (unlikely(status & MTK_MAC_REG_INT_STS_MIB_CNT_TH)) {
> +                       mtk_mac_intr_disable_stats(priv);
> +                       schedule_work(&priv->stats_work);
> +               }
> + befor
> +               mtk_mac_intr_ack_all(priv);

The ack here needs to be dropped, otherwise you can get further
interrupts before the bottom half has had a chance to run.

You might be lucky because you had already disabled the individual
bits earlier, but I don't think that was intentional here.

> +static int mtk_mac_netdev_start_xmit(struct sk_buff *skb,
> +                                    struct net_device *ndev)
> +{
> +       struct mtk_mac_priv *priv = netdev_priv(ndev);
> +       struct mtk_mac_ring *ring = &priv->tx_ring;
> +       struct device *dev = mtk_mac_get_dev(priv);
> +       struct mtk_mac_ring_desc_data desc_data;
> +
> +       desc_data.dma_addr = mtk_mac_dma_map_tx(priv, skb);
> +       if (dma_mapping_error(dev, desc_data.dma_addr))
> +               goto err_drop_packet;
> +
> +       desc_data.skb = skb;
> +       desc_data.len = skb->len;
> +
> +       mtk_mac_lock(priv);
> +
> +       mtk_mac_ring_push_head_tx(ring, &desc_data);
> +
> +       netdev_sent_queue(ndev, skb->len);
> +
> +       if (mtk_mac_ring_full(ring))
> +               netif_stop_queue(ndev);
> +
> +       mtk_mac_unlock(priv);
> +
> +       mtk_mac_dma_resume_tx(priv);

mtk_mac_dma_resume_tx() is an expensive read-modify-write
on an mmio register, so it would make sense to defer it based
on netdev_xmit_more(). (I had missed this in the previous
review)

> +static void mtk_mac_tx_complete_all(struct mtk_mac_priv *priv)
> +{
> +       struct mtk_mac_ring *ring = &priv->tx_ring;
> +       struct net_device *ndev = priv->ndev;
> +       int ret, pkts_compl, bytes_compl;
> +       bool wake = false;
> +
> +       mtk_mac_lock(priv);
> +
> +       for (pkts_compl = 0, bytes_compl = 0;;
> +            pkts_compl++, bytes_compl += ret, wake = true) {
> +               if (!mtk_mac_ring_descs_available(ring))
> +                       break;
> +
> +               ret = mtk_mac_tx_complete_one(priv);
> +               if (ret < 0)
> +                       break;
> +       }
> +
> +       netdev_completed_queue(ndev, pkts_compl, bytes_compl);
> +
> +       if (wake && netif_queue_stopped(ndev))
> +               netif_wake_queue(ndev);
> +
> +       mtk_mac_intr_enable_tx(priv);

No need to ack the interrupt here if napi is still active. Just
ack both rx and tx when calling napi_complete().

Some drivers actually use the napi budget for both rx and tx:
if you have more than 'budget' completed tx frames, return
early from this function and skip the napi_complete even
when less than 'budget' rx frames have arrived.

This way you get more fairness between devices and
can run for longer with irqs disabled as long as either rx
or tx is busy.

         Arnd
Bartosz Golaszewski May 20, 2020, 5:34 p.m. UTC | #2
śr., 20 maj 2020 o 16:37 Arnd Bergmann <arnd@arndb.de> napisał(a):
>
> On Wed, May 20, 2020 at 1:25 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
> >
> > From: Bartosz Golaszewski <bgolaszewski@baylibre.com>
> >
> > This adds the driver for the MediaTek Ethernet MAC used on the MT8* SoC
> > family. For now we only support full-duplex.
> >
> > Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com>
>
> Looks much better, thanks for addressing my feedback. A few more things
> about this version:
>
> > ---
> >  drivers/net/ethernet/mediatek/Kconfig       |    6 +
> >  drivers/net/ethernet/mediatek/Makefile      |    1 +
> >  drivers/net/ethernet/mediatek/mtk_eth_mac.c | 1668 +++++++++++++++++++
> >  3 files changed, 1675 insertions(+)
> >  create mode 100644 drivers/net/ethernet/mediatek/mtk_eth_mac.c
> >
> > diff --git a/drivers/net/ethernet/mediatek/Kconfig b/drivers/net/ethernet/mediatek/Kconfig
> > index 5079b8090f16..5c3793076765 100644
> > --- a/drivers/net/ethernet/mediatek/Kconfig
> > +++ b/drivers/net/ethernet/mediatek/Kconfig
> > @@ -14,4 +14,10 @@ config NET_MEDIATEK_SOC
> >           This driver supports the gigabit ethernet MACs in the
> >           MediaTek SoC family.
> >
> > +config NET_MEDIATEK_MAC
> > +       tristate "MediaTek Ethernet MAC support"
> > +       select PHYLIB
> > +       help
> > +         This driver supports the ethernet IP on MediaTek MT85** SoCs.
>
> I just noticed how the naming of NET_MEDIATEK_MAC and NET_MEDIATEK_SOC
> for two different drivers doing the same thing is really confusing.
>
> Maybe someone can come up with a better name, such as one
> based on the soc it first showed up in.
>

This has been discussed under one of the previous submissions.
MediaTek wants to use this IP on future designs as well and it's
already used on multiple SoCs so they want the name to be generic. I
also argued that this is a driver strongly tied to a specific
platform(s) so if someone wants to compile it - they probably know
what they're doing.

That being said: I verified with MediaTek and the name of the IP I can
use is "star" so they proposed "mtk-star-eth". I would personally
maybe go with "mtk-star-mac". How about those two?

> > +       struct mtk_mac_ring_desc *desc = &ring->descs[ring->head];
> > +       unsigned int status;
> > +
> > +       status = desc->status;
> > +
> > +       ring->skbs[ring->head] = desc_data->skb;
> > +       ring->dma_addrs[ring->head] = desc_data->dma_addr;
> > +       desc->data_ptr = desc_data->dma_addr;
> > +
> > +       status |= desc_data->len;
> > +       if (flags)
> > +               status |= flags;
> > +       desc->status = status;
> > +
> > +       /* Flush previous modifications before ownership change. */
> > +       dma_wmb();
> > +       desc->status &= ~MTK_MAC_DESC_BIT_COWN;
>
> You still do the read-modify-write on the word here, which is
> expensive on uncached memory. You have read the value already,
> so better use an assignment rather than &=, or (better)
> READ_ONCE() and WRITE_ONCE() to prevent the compiler
> from adding further accesses.
>

Will do.

>
> > +static void mtk_mac_lock(struct mtk_mac_priv *priv)
> > +{
> > +       spin_lock_bh(&priv->lock);
> > +}
> > +
> > +static void mtk_mac_unlock(struct mtk_mac_priv *priv)
> > +{
> > +       spin_unlock_bh(&priv->lock);
> > +}
>
> I think open-coding the locks would make this more readable,
> and let you use spin_lock() instead of spin_lock_bh() in
> those functions that are already in softirq context.
>

Will do.

> > +static void mtk_mac_intr_enable_tx(struct mtk_mac_priv *priv)
> > +{
> > +       regmap_update_bits(priv->regs, MTK_MAC_REG_INT_MASK,
> > +                          MTK_MAC_BIT_INT_STS_TNTC, 0);
> > +}
> > +static void mtk_mac_intr_enable_rx(struct mtk_mac_priv *priv)
> > +{
> > +       regmap_update_bits(priv->regs, MTK_MAC_REG_INT_MASK,
> > +                          MTK_MAC_BIT_INT_STS_FNRC, 0);
> > +}
>
> These imply reading the irq mask register and then writing it again,
> which is much more expensive than just writing it. It's also not
> atomic since the regmap does not use a lock.
>
> I don't think you actually need to enable/disable rx and tx separately,
> but if you do, then writing to the Ack register as I suggested instead
> of updating the mask would let you do this.
>
> > +/* All processing for TX and RX happens in the napi poll callback. */
> > +static irqreturn_t mtk_mac_handle_irq(int irq, void *data)
> > +{
> > +       struct mtk_mac_priv *priv;
> > +       struct net_device *ndev;
> > +       bool need_napi = false;
> > +       unsigned int status;
> > +
> > +       ndev = data;
> > +       priv = netdev_priv(ndev);
> > +
> > +       if (netif_running(ndev)) {
> > +               status = mtk_mac_intr_read(priv);
> > +
> > +               if (status & MTK_MAC_BIT_INT_STS_TNTC) {
> > +                       mtk_mac_intr_disable_tx(priv);
> > +                       need_napi = true;
> > +               }
> > +
> > +               if (status & MTK_MAC_BIT_INT_STS_FNRC) {
> > +                       mtk_mac_intr_disable_rx(priv);
> > +                       need_napi = true;
> > +               }
>
> I think you mixed up the rx and tx bits here: when you get
> an rx interrupt, that one is already blocked until it gets
> acked and you just need to disable tx until the end of the
> poll function.
>
> However, I suspect that the overhead of turning them off
> is higher than what  you can save, and simply ignoring
> the mask with
>
> if (status & (MTK_MAC_BIT_INT_STS_FNRC | MTK_MAC_BIT_INT_STS_TNTC))
>         napi_schedule(&priv->napi);
>
> would be simpler and faster.
>
>  +               /* One of the counters reached 0x8000000 - update stats and
> > +                * reset all counters.
> > +                */
> > +               if (unlikely(status & MTK_MAC_REG_INT_STS_MIB_CNT_TH)) {
> > +                       mtk_mac_intr_disable_stats(priv);
> > +                       schedule_work(&priv->stats_work);
> > +               }
> > + befor
> > +               mtk_mac_intr_ack_all(priv);
>
> The ack here needs to be dropped, otherwise you can get further
> interrupts before the bottom half has had a chance to run.
>

My thinking was this: if I mask the relevant interrupt (TX/RX
complete) and ack it right away, the status bit will be asserted on
the next packet received/sent but the process won't get interrupted
and when I unmask it, it will fire right away and I won't have to
recheck the status register. I noticed that if I ack it at the end of
napi poll callback, I end up missing certain TX complete interrupts
and end up seeing a lot of retransmissions even if I reread the status
register. I'm not yet sure where this race happens.

> You might be lucky because you had already disabled the individual
> bits earlier, but I don't think that was intentional here.
>
> > +static int mtk_mac_netdev_start_xmit(struct sk_buff *skb,
> > +                                    struct net_device *ndev)
> > +{
> > +       struct mtk_mac_priv *priv = netdev_priv(ndev);
> > +       struct mtk_mac_ring *ring = &priv->tx_ring;
> > +       struct device *dev = mtk_mac_get_dev(priv);
> > +       struct mtk_mac_ring_desc_data desc_data;
> > +
> > +       desc_data.dma_addr = mtk_mac_dma_map_tx(priv, skb);
> > +       if (dma_mapping_error(dev, desc_data.dma_addr))
> > +               goto err_drop_packet;
> > +
> > +       desc_data.skb = skb;
> > +       desc_data.len = skb->len;
> > +
> > +       mtk_mac_lock(priv);
> > +
> > +       mtk_mac_ring_push_head_tx(ring, &desc_data);
> > +
> > +       netdev_sent_queue(ndev, skb->len);
> > +
> > +       if (mtk_mac_ring_full(ring))
> > +               netif_stop_queue(ndev);
> > +
> > +       mtk_mac_unlock(priv);
> > +
> > +       mtk_mac_dma_resume_tx(priv);
>
> mtk_mac_dma_resume_tx() is an expensive read-modify-write
> on an mmio register, so it would make sense to defer it based
> on netdev_xmit_more(). (I had missed this in the previous
> review)
>

Thanks for bringing it to my attention, I'll see about using it.

> > +static void mtk_mac_tx_complete_all(struct mtk_mac_priv *priv)
> > +{
> > +       struct mtk_mac_ring *ring = &priv->tx_ring;
> > +       struct net_device *ndev = priv->ndev;
> > +       int ret, pkts_compl, bytes_compl;
> > +       bool wake = false;
> > +
> > +       mtk_mac_lock(priv);
> > +
> > +       for (pkts_compl = 0, bytes_compl = 0;;
> > +            pkts_compl++, bytes_compl += ret, wake = true) {
> > +               if (!mtk_mac_ring_descs_available(ring))
> > +                       break;
> > +
> > +               ret = mtk_mac_tx_complete_one(priv);
> > +               if (ret < 0)
> > +                       break;
> > +       }
> > +
> > +       netdev_completed_queue(ndev, pkts_compl, bytes_compl);
> > +
> > +       if (wake && netif_queue_stopped(ndev))
> > +               netif_wake_queue(ndev);
> > +
> > +       mtk_mac_intr_enable_tx(priv);
>
> No need to ack the interrupt here if napi is still active. Just
> ack both rx and tx when calling napi_complete().
>
> Some drivers actually use the napi budget for both rx and tx:
> if you have more than 'budget' completed tx frames, return
> early from this function and skip the napi_complete even
> when less than 'budget' rx frames have arrived.
>

IIRC Jakub said that the most seen approach is to free all TX descs
and receive up to budget packets, so this is what I did. I think it
makes the most sense.

Best regards,
Bartosz

> This way you get more fairness between devices and
> can run for longer with irqs disabled as long as either rx
> or tx is busy.
>
>          Arnd
Arnd Bergmann May 20, 2020, 9:22 p.m. UTC | #3
On Wed, May 20, 2020 at 7:35 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
> śr., 20 maj 2020 o 16:37 Arnd Bergmann <arnd@arndb.de> napisał(a):

> > I just noticed how the naming of NET_MEDIATEK_MAC and NET_MEDIATEK_SOC
> > for two different drivers doing the same thing is really confusing.
> >
> > Maybe someone can come up with a better name, such as one
> > based on the soc it first showed up in.
> >
>
> This has been discussed under one of the previous submissions.
> MediaTek wants to use this IP on future designs as well and it's
> already used on multiple SoCs so they want the name to be generic. I
> also argued that this is a driver strongly tied to a specific
> platform(s) so if someone wants to compile it - they probably know
> what they're doing.
>
> That being said: I verified with MediaTek and the name of the IP I can
> use is "star" so they proposed "mtk-star-eth". I would personally
> maybe go with "mtk-star-mac". How about those two?

Both seem fine to me. If this was previously discussed, I don't want
do further bike-shedding and I'd trust you to pick a sensible name
based on the earlier discussions.

> >  +               /* One of the counters reached 0x8000000 - update stats and
> > > +                * reset all counters.
> > > +                */
> > > +               if (unlikely(status & MTK_MAC_REG_INT_STS_MIB_CNT_TH)) {
> > > +                       mtk_mac_intr_disable_stats(priv);
> > > +                       schedule_work(&priv->stats_work);
> > > +               }
> > > + befor
> > > +               mtk_mac_intr_ack_all(priv);
> >
> > The ack here needs to be dropped, otherwise you can get further
> > interrupts before the bottom half has had a chance to run.
> >
>
> My thinking was this: if I mask the relevant interrupt (TX/RX
> complete) and ack it right away, the status bit will be asserted on
> the next packet received/sent but the process won't get interrupted
> and when I unmask it, it will fire right away and I won't have to
> recheck the status register. I noticed that if I ack it at the end of
> napi poll callback, I end up missing certain TX complete interrupts
> and end up seeing a lot of retransmissions even if I reread the status
> register. I'm not yet sure where this race happens.

Right, I see. If you just ack at the end of the poll function, you need
to check the rings again to ensure you did not miss an interrupt
between checking observing both rings to be empty and the irq-ack.

I suspect it's still cheaper to check the two rings with an uncached
read from memory than to to do the read-modify-write on the mmio,
but you'd have to measure that to be sure.

> > > +static void mtk_mac_tx_complete_all(struct mtk_mac_priv *priv)
> > > +{
> > > +       struct mtk_mac_ring *ring = &priv->tx_ring;
> > > +       struct net_device *ndev = priv->ndev;
> > > +       int ret, pkts_compl, bytes_compl;
> > > +       bool wake = false;
> > > +
> > > +       mtk_mac_lock(priv);
> > > +
> > > +       for (pkts_compl = 0, bytes_compl = 0;;
> > > +            pkts_compl++, bytes_compl += ret, wake = true) {
> > > +               if (!mtk_mac_ring_descs_available(ring))
> > > +                       break;
> > > +
> > > +               ret = mtk_mac_tx_complete_one(priv);
> > > +               if (ret < 0)
> > > +                       break;
> > > +       }
> > > +
> > > +       netdev_completed_queue(ndev, pkts_compl, bytes_compl);
> > > +
> > > +       if (wake && netif_queue_stopped(ndev))
> > > +               netif_wake_queue(ndev);
> > > +
> > > +       mtk_mac_intr_enable_tx(priv);
> >
> > No need to ack the interrupt here if napi is still active. Just
> > ack both rx and tx when calling napi_complete().
> >
> > Some drivers actually use the napi budget for both rx and tx:
> > if you have more than 'budget' completed tx frames, return
> > early from this function and skip the napi_complete even
> > when less than 'budget' rx frames have arrived.
> >
>
> IIRC Jakub said that the most seen approach is to free all TX descs
> and receive up to budget packets, so this is what I did. I think it
> makes the most sense.

Ok, he's probably right then.

My idea was that the dma_unmap operation for the tx cleanup is
rather expensive on chips without cache-coherent DMA, so you
might not want to do too much of it but rather do it in reasonably
sized batches. It would also avoid the case where you renable the
tx-complete interrupt after cleaning the already-sent frames but
then immediately get an irq when the next frame that is already
queued is done.

This probably depends on the specific workload which one works
better here.

         Arnd
Bartosz Golaszewski May 22, 2020, 7:44 a.m. UTC | #4
śr., 20 maj 2020 o 23:23 Arnd Bergmann <arnd@arndb.de> napisał(a):
>
> On Wed, May 20, 2020 at 7:35 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
> > śr., 20 maj 2020 o 16:37 Arnd Bergmann <arnd@arndb.de> napisał(a):
>
> > > I just noticed how the naming of NET_MEDIATEK_MAC and NET_MEDIATEK_SOC
> > > for two different drivers doing the same thing is really confusing.
> > >
> > > Maybe someone can come up with a better name, such as one
> > > based on the soc it first showed up in.
> > >
> >
> > This has been discussed under one of the previous submissions.
> > MediaTek wants to use this IP on future designs as well and it's
> > already used on multiple SoCs so they want the name to be generic. I
> > also argued that this is a driver strongly tied to a specific
> > platform(s) so if someone wants to compile it - they probably know
> > what they're doing.
> >
> > That being said: I verified with MediaTek and the name of the IP I can
> > use is "star" so they proposed "mtk-star-eth". I would personally
> > maybe go with "mtk-star-mac". How about those two?
>
> Both seem fine to me. If this was previously discussed, I don't want
> do further bike-shedding and I'd trust you to pick a sensible name
> based on the earlier discussions.
>
> > >  +               /* One of the counters reached 0x8000000 - update stats and
> > > > +                * reset all counters.
> > > > +                */
> > > > +               if (unlikely(status & MTK_MAC_REG_INT_STS_MIB_CNT_TH)) {
> > > > +                       mtk_mac_intr_disable_stats(priv);
> > > > +                       schedule_work(&priv->stats_work);
> > > > +               }
> > > > + befor
> > > > +               mtk_mac_intr_ack_all(priv);
> > >
> > > The ack here needs to be dropped, otherwise you can get further
> > > interrupts before the bottom half has had a chance to run.
> > >
> >
> > My thinking was this: if I mask the relevant interrupt (TX/RX
> > complete) and ack it right away, the status bit will be asserted on
> > the next packet received/sent but the process won't get interrupted
> > and when I unmask it, it will fire right away and I won't have to
> > recheck the status register. I noticed that if I ack it at the end of
> > napi poll callback, I end up missing certain TX complete interrupts
> > and end up seeing a lot of retransmissions even if I reread the status
> > register. I'm not yet sure where this race happens.
>
> Right, I see. If you just ack at the end of the poll function, you need
> to check the rings again to ensure you did not miss an interrupt
> between checking observing both rings to be empty and the irq-ack.
>
> I suspect it's still cheaper to check the two rings with an uncached
> read from memory than to to do the read-modify-write on the mmio,
> but you'd have to measure that to be sure.
>

Unfortunately the PHY on the board I have is 100Mbps which is the
limiting factor in benchmarking this driver. :(

If you're fine with this - I'd like to fix the minor issues you
pointed out and stick with the current approach for now. We can always
fix the implementation in the future once a board with a Gigabit PHY
is out. Most ethernet drivers don't use such fine-grained interrupt
control anyway. I expect the performance differences to be miniscule
really.

Bart
Arnd Bergmann May 22, 2020, 7:49 a.m. UTC | #5
On Fri, May 22, 2020 at 9:44 AM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
> śr., 20 maj 2020 o 23:23 Arnd Bergmann <arnd@arndb.de> napisał(a):
> > On Wed, May 20, 2020 at 7:35 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote:
> > > śr., 20 maj 2020 o 16:37 Arnd Bergmann <arnd@arndb.de> napisał(a):

> > > My thinking was this: if I mask the relevant interrupt (TX/RX
> > > complete) and ack it right away, the status bit will be asserted on
> > > the next packet received/sent but the process won't get interrupted
> > > and when I unmask it, it will fire right away and I won't have to
> > > recheck the status register. I noticed that if I ack it at the end of
> > > napi poll callback, I end up missing certain TX complete interrupts
> > > and end up seeing a lot of retransmissions even if I reread the status
> > > register. I'm not yet sure where this race happens.
> >
> > Right, I see. If you just ack at the end of the poll function, you need
> > to check the rings again to ensure you did not miss an interrupt
> > between checking observing both rings to be empty and the irq-ack.
> >
> > I suspect it's still cheaper to check the two rings with an uncached
> > read from memory than to to do the read-modify-write on the mmio,
> > but you'd have to measure that to be sure.
> >
>
> Unfortunately the PHY on the board I have is 100Mbps which is the
> limiting factor in benchmarking this driver. :(
>
> If you're fine with this - I'd like to fix the minor issues you
> pointed out and stick with the current approach for now. We can always
> fix the implementation in the future once a board with a Gigabit PHY
> is out. Most ethernet drivers don't use such fine-grained interrupt
> control anyway. I expect the performance differences to be miniscule
> really.

Ok, fair enough. The BQL limiting is the part that matters the most
for performance on slow lines (preventing long latencies from
buffer bloat), and  you have that now.

       Arnd