Message ID | 20200520112523.30995-1-brgl@bgdev.pl |
---|---|
Headers | show |
Series | mediatek: add support for MediaTek Ethernet MAC | expand |
On Wed, May 20, 2020 at 1:25 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote: > > From: Bartosz Golaszewski <bgolaszewski@baylibre.com> > > This adds the driver for the MediaTek Ethernet MAC used on the MT8* SoC > family. For now we only support full-duplex. > > Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Looks much better, thanks for addressing my feedback. A few more things about this version: > --- > drivers/net/ethernet/mediatek/Kconfig | 6 + > drivers/net/ethernet/mediatek/Makefile | 1 + > drivers/net/ethernet/mediatek/mtk_eth_mac.c | 1668 +++++++++++++++++++ > 3 files changed, 1675 insertions(+) > create mode 100644 drivers/net/ethernet/mediatek/mtk_eth_mac.c > > diff --git a/drivers/net/ethernet/mediatek/Kconfig b/drivers/net/ethernet/mediatek/Kconfig > index 5079b8090f16..5c3793076765 100644 > --- a/drivers/net/ethernet/mediatek/Kconfig > +++ b/drivers/net/ethernet/mediatek/Kconfig > @@ -14,4 +14,10 @@ config NET_MEDIATEK_SOC > This driver supports the gigabit ethernet MACs in the > MediaTek SoC family. > > +config NET_MEDIATEK_MAC > + tristate "MediaTek Ethernet MAC support" > + select PHYLIB > + help > + This driver supports the ethernet IP on MediaTek MT85** SoCs. I just noticed how the naming of NET_MEDIATEK_MAC and NET_MEDIATEK_SOC for two different drivers doing the same thing is really confusing. Maybe someone can come up with a better name, such as one based on the soc it first showed up in. > + struct mtk_mac_ring_desc *desc = &ring->descs[ring->head]; > + unsigned int status; > + > + status = desc->status; > + > + ring->skbs[ring->head] = desc_data->skb; > + ring->dma_addrs[ring->head] = desc_data->dma_addr; > + desc->data_ptr = desc_data->dma_addr; > + > + status |= desc_data->len; > + if (flags) > + status |= flags; > + desc->status = status; > + > + /* Flush previous modifications before ownership change. */ > + dma_wmb(); > + desc->status &= ~MTK_MAC_DESC_BIT_COWN; You still do the read-modify-write on the word here, which is expensive on uncached memory. You have read the value already, so better use an assignment rather than &=, or (better) READ_ONCE() and WRITE_ONCE() to prevent the compiler from adding further accesses. > +static void mtk_mac_lock(struct mtk_mac_priv *priv) > +{ > + spin_lock_bh(&priv->lock); > +} > + > +static void mtk_mac_unlock(struct mtk_mac_priv *priv) > +{ > + spin_unlock_bh(&priv->lock); > +} I think open-coding the locks would make this more readable, and let you use spin_lock() instead of spin_lock_bh() in those functions that are already in softirq context. > +static void mtk_mac_intr_enable_tx(struct mtk_mac_priv *priv) > +{ > + regmap_update_bits(priv->regs, MTK_MAC_REG_INT_MASK, > + MTK_MAC_BIT_INT_STS_TNTC, 0); > +} > +static void mtk_mac_intr_enable_rx(struct mtk_mac_priv *priv) > +{ > + regmap_update_bits(priv->regs, MTK_MAC_REG_INT_MASK, > + MTK_MAC_BIT_INT_STS_FNRC, 0); > +} These imply reading the irq mask register and then writing it again, which is much more expensive than just writing it. It's also not atomic since the regmap does not use a lock. I don't think you actually need to enable/disable rx and tx separately, but if you do, then writing to the Ack register as I suggested instead of updating the mask would let you do this. > +/* All processing for TX and RX happens in the napi poll callback. */ > +static irqreturn_t mtk_mac_handle_irq(int irq, void *data) > +{ > + struct mtk_mac_priv *priv; > + struct net_device *ndev; > + bool need_napi = false; > + unsigned int status; > + > + ndev = data; > + priv = netdev_priv(ndev); > + > + if (netif_running(ndev)) { > + status = mtk_mac_intr_read(priv); > + > + if (status & MTK_MAC_BIT_INT_STS_TNTC) { > + mtk_mac_intr_disable_tx(priv); > + need_napi = true; > + } > + > + if (status & MTK_MAC_BIT_INT_STS_FNRC) { > + mtk_mac_intr_disable_rx(priv); > + need_napi = true; > + } I think you mixed up the rx and tx bits here: when you get an rx interrupt, that one is already blocked until it gets acked and you just need to disable tx until the end of the poll function. However, I suspect that the overhead of turning them off is higher than what you can save, and simply ignoring the mask with if (status & (MTK_MAC_BIT_INT_STS_FNRC | MTK_MAC_BIT_INT_STS_TNTC)) napi_schedule(&priv->napi); would be simpler and faster. + /* One of the counters reached 0x8000000 - update stats and > + * reset all counters. > + */ > + if (unlikely(status & MTK_MAC_REG_INT_STS_MIB_CNT_TH)) { > + mtk_mac_intr_disable_stats(priv); > + schedule_work(&priv->stats_work); > + } > + befor > + mtk_mac_intr_ack_all(priv); The ack here needs to be dropped, otherwise you can get further interrupts before the bottom half has had a chance to run. You might be lucky because you had already disabled the individual bits earlier, but I don't think that was intentional here. > +static int mtk_mac_netdev_start_xmit(struct sk_buff *skb, > + struct net_device *ndev) > +{ > + struct mtk_mac_priv *priv = netdev_priv(ndev); > + struct mtk_mac_ring *ring = &priv->tx_ring; > + struct device *dev = mtk_mac_get_dev(priv); > + struct mtk_mac_ring_desc_data desc_data; > + > + desc_data.dma_addr = mtk_mac_dma_map_tx(priv, skb); > + if (dma_mapping_error(dev, desc_data.dma_addr)) > + goto err_drop_packet; > + > + desc_data.skb = skb; > + desc_data.len = skb->len; > + > + mtk_mac_lock(priv); > + > + mtk_mac_ring_push_head_tx(ring, &desc_data); > + > + netdev_sent_queue(ndev, skb->len); > + > + if (mtk_mac_ring_full(ring)) > + netif_stop_queue(ndev); > + > + mtk_mac_unlock(priv); > + > + mtk_mac_dma_resume_tx(priv); mtk_mac_dma_resume_tx() is an expensive read-modify-write on an mmio register, so it would make sense to defer it based on netdev_xmit_more(). (I had missed this in the previous review) > +static void mtk_mac_tx_complete_all(struct mtk_mac_priv *priv) > +{ > + struct mtk_mac_ring *ring = &priv->tx_ring; > + struct net_device *ndev = priv->ndev; > + int ret, pkts_compl, bytes_compl; > + bool wake = false; > + > + mtk_mac_lock(priv); > + > + for (pkts_compl = 0, bytes_compl = 0;; > + pkts_compl++, bytes_compl += ret, wake = true) { > + if (!mtk_mac_ring_descs_available(ring)) > + break; > + > + ret = mtk_mac_tx_complete_one(priv); > + if (ret < 0) > + break; > + } > + > + netdev_completed_queue(ndev, pkts_compl, bytes_compl); > + > + if (wake && netif_queue_stopped(ndev)) > + netif_wake_queue(ndev); > + > + mtk_mac_intr_enable_tx(priv); No need to ack the interrupt here if napi is still active. Just ack both rx and tx when calling napi_complete(). Some drivers actually use the napi budget for both rx and tx: if you have more than 'budget' completed tx frames, return early from this function and skip the napi_complete even when less than 'budget' rx frames have arrived. This way you get more fairness between devices and can run for longer with irqs disabled as long as either rx or tx is busy. Arnd
śr., 20 maj 2020 o 16:37 Arnd Bergmann <arnd@arndb.de> napisał(a): > > On Wed, May 20, 2020 at 1:25 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote: > > > > From: Bartosz Golaszewski <bgolaszewski@baylibre.com> > > > > This adds the driver for the MediaTek Ethernet MAC used on the MT8* SoC > > family. For now we only support full-duplex. > > > > Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> > > Looks much better, thanks for addressing my feedback. A few more things > about this version: > > > --- > > drivers/net/ethernet/mediatek/Kconfig | 6 + > > drivers/net/ethernet/mediatek/Makefile | 1 + > > drivers/net/ethernet/mediatek/mtk_eth_mac.c | 1668 +++++++++++++++++++ > > 3 files changed, 1675 insertions(+) > > create mode 100644 drivers/net/ethernet/mediatek/mtk_eth_mac.c > > > > diff --git a/drivers/net/ethernet/mediatek/Kconfig b/drivers/net/ethernet/mediatek/Kconfig > > index 5079b8090f16..5c3793076765 100644 > > --- a/drivers/net/ethernet/mediatek/Kconfig > > +++ b/drivers/net/ethernet/mediatek/Kconfig > > @@ -14,4 +14,10 @@ config NET_MEDIATEK_SOC > > This driver supports the gigabit ethernet MACs in the > > MediaTek SoC family. > > > > +config NET_MEDIATEK_MAC > > + tristate "MediaTek Ethernet MAC support" > > + select PHYLIB > > + help > > + This driver supports the ethernet IP on MediaTek MT85** SoCs. > > I just noticed how the naming of NET_MEDIATEK_MAC and NET_MEDIATEK_SOC > for two different drivers doing the same thing is really confusing. > > Maybe someone can come up with a better name, such as one > based on the soc it first showed up in. > This has been discussed under one of the previous submissions. MediaTek wants to use this IP on future designs as well and it's already used on multiple SoCs so they want the name to be generic. I also argued that this is a driver strongly tied to a specific platform(s) so if someone wants to compile it - they probably know what they're doing. That being said: I verified with MediaTek and the name of the IP I can use is "star" so they proposed "mtk-star-eth". I would personally maybe go with "mtk-star-mac". How about those two? > > + struct mtk_mac_ring_desc *desc = &ring->descs[ring->head]; > > + unsigned int status; > > + > > + status = desc->status; > > + > > + ring->skbs[ring->head] = desc_data->skb; > > + ring->dma_addrs[ring->head] = desc_data->dma_addr; > > + desc->data_ptr = desc_data->dma_addr; > > + > > + status |= desc_data->len; > > + if (flags) > > + status |= flags; > > + desc->status = status; > > + > > + /* Flush previous modifications before ownership change. */ > > + dma_wmb(); > > + desc->status &= ~MTK_MAC_DESC_BIT_COWN; > > You still do the read-modify-write on the word here, which is > expensive on uncached memory. You have read the value already, > so better use an assignment rather than &=, or (better) > READ_ONCE() and WRITE_ONCE() to prevent the compiler > from adding further accesses. > Will do. > > > +static void mtk_mac_lock(struct mtk_mac_priv *priv) > > +{ > > + spin_lock_bh(&priv->lock); > > +} > > + > > +static void mtk_mac_unlock(struct mtk_mac_priv *priv) > > +{ > > + spin_unlock_bh(&priv->lock); > > +} > > I think open-coding the locks would make this more readable, > and let you use spin_lock() instead of spin_lock_bh() in > those functions that are already in softirq context. > Will do. > > +static void mtk_mac_intr_enable_tx(struct mtk_mac_priv *priv) > > +{ > > + regmap_update_bits(priv->regs, MTK_MAC_REG_INT_MASK, > > + MTK_MAC_BIT_INT_STS_TNTC, 0); > > +} > > +static void mtk_mac_intr_enable_rx(struct mtk_mac_priv *priv) > > +{ > > + regmap_update_bits(priv->regs, MTK_MAC_REG_INT_MASK, > > + MTK_MAC_BIT_INT_STS_FNRC, 0); > > +} > > These imply reading the irq mask register and then writing it again, > which is much more expensive than just writing it. It's also not > atomic since the regmap does not use a lock. > > I don't think you actually need to enable/disable rx and tx separately, > but if you do, then writing to the Ack register as I suggested instead > of updating the mask would let you do this. > > > +/* All processing for TX and RX happens in the napi poll callback. */ > > +static irqreturn_t mtk_mac_handle_irq(int irq, void *data) > > +{ > > + struct mtk_mac_priv *priv; > > + struct net_device *ndev; > > + bool need_napi = false; > > + unsigned int status; > > + > > + ndev = data; > > + priv = netdev_priv(ndev); > > + > > + if (netif_running(ndev)) { > > + status = mtk_mac_intr_read(priv); > > + > > + if (status & MTK_MAC_BIT_INT_STS_TNTC) { > > + mtk_mac_intr_disable_tx(priv); > > + need_napi = true; > > + } > > + > > + if (status & MTK_MAC_BIT_INT_STS_FNRC) { > > + mtk_mac_intr_disable_rx(priv); > > + need_napi = true; > > + } > > I think you mixed up the rx and tx bits here: when you get > an rx interrupt, that one is already blocked until it gets > acked and you just need to disable tx until the end of the > poll function. > > However, I suspect that the overhead of turning them off > is higher than what you can save, and simply ignoring > the mask with > > if (status & (MTK_MAC_BIT_INT_STS_FNRC | MTK_MAC_BIT_INT_STS_TNTC)) > napi_schedule(&priv->napi); > > would be simpler and faster. > > + /* One of the counters reached 0x8000000 - update stats and > > + * reset all counters. > > + */ > > + if (unlikely(status & MTK_MAC_REG_INT_STS_MIB_CNT_TH)) { > > + mtk_mac_intr_disable_stats(priv); > > + schedule_work(&priv->stats_work); > > + } > > + befor > > + mtk_mac_intr_ack_all(priv); > > The ack here needs to be dropped, otherwise you can get further > interrupts before the bottom half has had a chance to run. > My thinking was this: if I mask the relevant interrupt (TX/RX complete) and ack it right away, the status bit will be asserted on the next packet received/sent but the process won't get interrupted and when I unmask it, it will fire right away and I won't have to recheck the status register. I noticed that if I ack it at the end of napi poll callback, I end up missing certain TX complete interrupts and end up seeing a lot of retransmissions even if I reread the status register. I'm not yet sure where this race happens. > You might be lucky because you had already disabled the individual > bits earlier, but I don't think that was intentional here. > > > +static int mtk_mac_netdev_start_xmit(struct sk_buff *skb, > > + struct net_device *ndev) > > +{ > > + struct mtk_mac_priv *priv = netdev_priv(ndev); > > + struct mtk_mac_ring *ring = &priv->tx_ring; > > + struct device *dev = mtk_mac_get_dev(priv); > > + struct mtk_mac_ring_desc_data desc_data; > > + > > + desc_data.dma_addr = mtk_mac_dma_map_tx(priv, skb); > > + if (dma_mapping_error(dev, desc_data.dma_addr)) > > + goto err_drop_packet; > > + > > + desc_data.skb = skb; > > + desc_data.len = skb->len; > > + > > + mtk_mac_lock(priv); > > + > > + mtk_mac_ring_push_head_tx(ring, &desc_data); > > + > > + netdev_sent_queue(ndev, skb->len); > > + > > + if (mtk_mac_ring_full(ring)) > > + netif_stop_queue(ndev); > > + > > + mtk_mac_unlock(priv); > > + > > + mtk_mac_dma_resume_tx(priv); > > mtk_mac_dma_resume_tx() is an expensive read-modify-write > on an mmio register, so it would make sense to defer it based > on netdev_xmit_more(). (I had missed this in the previous > review) > Thanks for bringing it to my attention, I'll see about using it. > > +static void mtk_mac_tx_complete_all(struct mtk_mac_priv *priv) > > +{ > > + struct mtk_mac_ring *ring = &priv->tx_ring; > > + struct net_device *ndev = priv->ndev; > > + int ret, pkts_compl, bytes_compl; > > + bool wake = false; > > + > > + mtk_mac_lock(priv); > > + > > + for (pkts_compl = 0, bytes_compl = 0;; > > + pkts_compl++, bytes_compl += ret, wake = true) { > > + if (!mtk_mac_ring_descs_available(ring)) > > + break; > > + > > + ret = mtk_mac_tx_complete_one(priv); > > + if (ret < 0) > > + break; > > + } > > + > > + netdev_completed_queue(ndev, pkts_compl, bytes_compl); > > + > > + if (wake && netif_queue_stopped(ndev)) > > + netif_wake_queue(ndev); > > + > > + mtk_mac_intr_enable_tx(priv); > > No need to ack the interrupt here if napi is still active. Just > ack both rx and tx when calling napi_complete(). > > Some drivers actually use the napi budget for both rx and tx: > if you have more than 'budget' completed tx frames, return > early from this function and skip the napi_complete even > when less than 'budget' rx frames have arrived. > IIRC Jakub said that the most seen approach is to free all TX descs and receive up to budget packets, so this is what I did. I think it makes the most sense. Best regards, Bartosz > This way you get more fairness between devices and > can run for longer with irqs disabled as long as either rx > or tx is busy. > > Arnd
On Wed, May 20, 2020 at 7:35 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote: > śr., 20 maj 2020 o 16:37 Arnd Bergmann <arnd@arndb.de> napisał(a): > > I just noticed how the naming of NET_MEDIATEK_MAC and NET_MEDIATEK_SOC > > for two different drivers doing the same thing is really confusing. > > > > Maybe someone can come up with a better name, such as one > > based on the soc it first showed up in. > > > > This has been discussed under one of the previous submissions. > MediaTek wants to use this IP on future designs as well and it's > already used on multiple SoCs so they want the name to be generic. I > also argued that this is a driver strongly tied to a specific > platform(s) so if someone wants to compile it - they probably know > what they're doing. > > That being said: I verified with MediaTek and the name of the IP I can > use is "star" so they proposed "mtk-star-eth". I would personally > maybe go with "mtk-star-mac". How about those two? Both seem fine to me. If this was previously discussed, I don't want do further bike-shedding and I'd trust you to pick a sensible name based on the earlier discussions. > > + /* One of the counters reached 0x8000000 - update stats and > > > + * reset all counters. > > > + */ > > > + if (unlikely(status & MTK_MAC_REG_INT_STS_MIB_CNT_TH)) { > > > + mtk_mac_intr_disable_stats(priv); > > > + schedule_work(&priv->stats_work); > > > + } > > > + befor > > > + mtk_mac_intr_ack_all(priv); > > > > The ack here needs to be dropped, otherwise you can get further > > interrupts before the bottom half has had a chance to run. > > > > My thinking was this: if I mask the relevant interrupt (TX/RX > complete) and ack it right away, the status bit will be asserted on > the next packet received/sent but the process won't get interrupted > and when I unmask it, it will fire right away and I won't have to > recheck the status register. I noticed that if I ack it at the end of > napi poll callback, I end up missing certain TX complete interrupts > and end up seeing a lot of retransmissions even if I reread the status > register. I'm not yet sure where this race happens. Right, I see. If you just ack at the end of the poll function, you need to check the rings again to ensure you did not miss an interrupt between checking observing both rings to be empty and the irq-ack. I suspect it's still cheaper to check the two rings with an uncached read from memory than to to do the read-modify-write on the mmio, but you'd have to measure that to be sure. > > > +static void mtk_mac_tx_complete_all(struct mtk_mac_priv *priv) > > > +{ > > > + struct mtk_mac_ring *ring = &priv->tx_ring; > > > + struct net_device *ndev = priv->ndev; > > > + int ret, pkts_compl, bytes_compl; > > > + bool wake = false; > > > + > > > + mtk_mac_lock(priv); > > > + > > > + for (pkts_compl = 0, bytes_compl = 0;; > > > + pkts_compl++, bytes_compl += ret, wake = true) { > > > + if (!mtk_mac_ring_descs_available(ring)) > > > + break; > > > + > > > + ret = mtk_mac_tx_complete_one(priv); > > > + if (ret < 0) > > > + break; > > > + } > > > + > > > + netdev_completed_queue(ndev, pkts_compl, bytes_compl); > > > + > > > + if (wake && netif_queue_stopped(ndev)) > > > + netif_wake_queue(ndev); > > > + > > > + mtk_mac_intr_enable_tx(priv); > > > > No need to ack the interrupt here if napi is still active. Just > > ack both rx and tx when calling napi_complete(). > > > > Some drivers actually use the napi budget for both rx and tx: > > if you have more than 'budget' completed tx frames, return > > early from this function and skip the napi_complete even > > when less than 'budget' rx frames have arrived. > > > > IIRC Jakub said that the most seen approach is to free all TX descs > and receive up to budget packets, so this is what I did. I think it > makes the most sense. Ok, he's probably right then. My idea was that the dma_unmap operation for the tx cleanup is rather expensive on chips without cache-coherent DMA, so you might not want to do too much of it but rather do it in reasonably sized batches. It would also avoid the case where you renable the tx-complete interrupt after cleaning the already-sent frames but then immediately get an irq when the next frame that is already queued is done. This probably depends on the specific workload which one works better here. Arnd
śr., 20 maj 2020 o 23:23 Arnd Bergmann <arnd@arndb.de> napisał(a): > > On Wed, May 20, 2020 at 7:35 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote: > > śr., 20 maj 2020 o 16:37 Arnd Bergmann <arnd@arndb.de> napisał(a): > > > > I just noticed how the naming of NET_MEDIATEK_MAC and NET_MEDIATEK_SOC > > > for two different drivers doing the same thing is really confusing. > > > > > > Maybe someone can come up with a better name, such as one > > > based on the soc it first showed up in. > > > > > > > This has been discussed under one of the previous submissions. > > MediaTek wants to use this IP on future designs as well and it's > > already used on multiple SoCs so they want the name to be generic. I > > also argued that this is a driver strongly tied to a specific > > platform(s) so if someone wants to compile it - they probably know > > what they're doing. > > > > That being said: I verified with MediaTek and the name of the IP I can > > use is "star" so they proposed "mtk-star-eth". I would personally > > maybe go with "mtk-star-mac". How about those two? > > Both seem fine to me. If this was previously discussed, I don't want > do further bike-shedding and I'd trust you to pick a sensible name > based on the earlier discussions. > > > > + /* One of the counters reached 0x8000000 - update stats and > > > > + * reset all counters. > > > > + */ > > > > + if (unlikely(status & MTK_MAC_REG_INT_STS_MIB_CNT_TH)) { > > > > + mtk_mac_intr_disable_stats(priv); > > > > + schedule_work(&priv->stats_work); > > > > + } > > > > + befor > > > > + mtk_mac_intr_ack_all(priv); > > > > > > The ack here needs to be dropped, otherwise you can get further > > > interrupts before the bottom half has had a chance to run. > > > > > > > My thinking was this: if I mask the relevant interrupt (TX/RX > > complete) and ack it right away, the status bit will be asserted on > > the next packet received/sent but the process won't get interrupted > > and when I unmask it, it will fire right away and I won't have to > > recheck the status register. I noticed that if I ack it at the end of > > napi poll callback, I end up missing certain TX complete interrupts > > and end up seeing a lot of retransmissions even if I reread the status > > register. I'm not yet sure where this race happens. > > Right, I see. If you just ack at the end of the poll function, you need > to check the rings again to ensure you did not miss an interrupt > between checking observing both rings to be empty and the irq-ack. > > I suspect it's still cheaper to check the two rings with an uncached > read from memory than to to do the read-modify-write on the mmio, > but you'd have to measure that to be sure. > Unfortunately the PHY on the board I have is 100Mbps which is the limiting factor in benchmarking this driver. :( If you're fine with this - I'd like to fix the minor issues you pointed out and stick with the current approach for now. We can always fix the implementation in the future once a board with a Gigabit PHY is out. Most ethernet drivers don't use such fine-grained interrupt control anyway. I expect the performance differences to be miniscule really. Bart
On Fri, May 22, 2020 at 9:44 AM Bartosz Golaszewski <brgl@bgdev.pl> wrote: > śr., 20 maj 2020 o 23:23 Arnd Bergmann <arnd@arndb.de> napisał(a): > > On Wed, May 20, 2020 at 7:35 PM Bartosz Golaszewski <brgl@bgdev.pl> wrote: > > > śr., 20 maj 2020 o 16:37 Arnd Bergmann <arnd@arndb.de> napisał(a): > > > My thinking was this: if I mask the relevant interrupt (TX/RX > > > complete) and ack it right away, the status bit will be asserted on > > > the next packet received/sent but the process won't get interrupted > > > and when I unmask it, it will fire right away and I won't have to > > > recheck the status register. I noticed that if I ack it at the end of > > > napi poll callback, I end up missing certain TX complete interrupts > > > and end up seeing a lot of retransmissions even if I reread the status > > > register. I'm not yet sure where this race happens. > > > > Right, I see. If you just ack at the end of the poll function, you need > > to check the rings again to ensure you did not miss an interrupt > > between checking observing both rings to be empty and the irq-ack. > > > > I suspect it's still cheaper to check the two rings with an uncached > > read from memory than to to do the read-modify-write on the mmio, > > but you'd have to measure that to be sure. > > > > Unfortunately the PHY on the board I have is 100Mbps which is the > limiting factor in benchmarking this driver. :( > > If you're fine with this - I'd like to fix the minor issues you > pointed out and stick with the current approach for now. We can always > fix the implementation in the future once a board with a Gigabit PHY > is out. Most ethernet drivers don't use such fine-grained interrupt > control anyway. I expect the performance differences to be miniscule > really. Ok, fair enough. The BQL limiting is the part that matters the most for performance on slow lines (preventing long latencies from buffer bloat), and you have that now. Arnd
From: Bartosz Golaszewski <bgolaszewski@baylibre.com> This adds support for the Ethernet Controller present on MediaTeK SoCs from the MT8* family. First we convert the existing DT bindings for the PERICFG controller to YAML and add a new compatible string for mt8516 variant of it. Then we add the DT bindings for the MAC. Next we do some cleanup of the mediatek ethernet drivers directory. The largest patch in the series adds the actual new driver. The rest of the patches add DT fixups for the boards already supported upstream. v1 -> v2: - add a generic helper for retrieving the net_device associated with given private data - fix several typos in commit messages - remove MTK_MAC_VERSION and don't set the driver version - use NET_IP_ALIGN instead of a magic number (2) but redefine it as it defaults to 0 on arm64 - don't manually turn the carrier off in mtk_mac_enable() - process TX cleanup in napi poll callback - configure pause in the adjust_link callback - use regmap_read_poll_timeout() instead of handcoding the polling - use devres_find() to verify that struct net_device is managed by devres in devm_register_netdev() - add a patch moving all networking devres helpers into net/devres.c - tweak the dma barriers: remove where unnecessary and add comments to the remaining barriers - don't reset internal counters when enabling the NIC - set the net_device's mtu size instead of checking the framesize in ndo_start_xmit() callback - fix a race condition in waking up the netif queue - don't emit log messages on OOM errors - use dma_set_mask_and_coherent() - use eth_hw_addr_random() - rework the receive callback so that we reuse the previous skb if unmapping fails, like we already do if skb allocation fails - rework hash table operations: add proper timeout handling and clear bits when appropriate v2 -> v3: - drop the patch adding priv_to_netdev() and store the netdev pointer in the driver private data - add an additional dma_wmb() after reseting the descriptor in mtk_mac_ring_pop_tail() - check the return value of dma_set_mask_and_coherent() - improve the DT bindings for mtk-eth-mac: make the reg property in the example use single-cell address and size, extend the description of the PERICFG phandle and document the mdio sub-node - add a patch converting the old .txt bindings for PERICFG to yaml - limit reading the DMA memory by storing the mapped addresses in the driver private structure - add a patch documenting the existing networking devres helpers v3 -> v4: - drop the devres patches: they will be sent separately - call netdev_sent_queue() & netdev_completed_queue() where appropriate - don't redefine NET_IP_ALIGN: define a private constant in the driver - fix a couple typos - only disabe/enable the MAC in suspend/resume if netif is running - drop the count field from the ring structure and instead calculate the number of used descriptors from the tail and head indicies - rework the locking used to protect the ring structures from concurrent access: use cheaper spin_lock_bh() and completely disable the internal spinlock used by regmap - rework the interrupt handling to make it more fine-grained: onle re-enable TX and RX interrupts while they're needed, process the stats updates in a workqueue, not in napi context - shrink the code responsible for unmapping and freeing skb memory - rework the barriers as advised by Arnd Bartosz Golaszewski (11): dt-bindings: convert the binding document for mediatek PERICFG to yaml dt-bindings: add new compatible to mediatek,pericfg dt-bindings: net: add a binding document for MediaTek Ethernet MAC net: ethernet: mediatek: rename Kconfig prompt net: ethernet: mediatek: remove unnecessary spaces from Makefile net: ethernet: mtk-eth-mac: new driver ARM64: dts: mediatek: add pericfg syscon to mt8516.dtsi ARM64: dts: mediatek: add the ethernet node to mt8516.dtsi ARM64: dts: mediatek: add an alias for ethernet0 for pumpkin boards ARM64: dts: mediatek: add ethernet pins for pumpkin boards ARM64: dts: mediatek: enable ethernet on pumpkin boards .../arm/mediatek/mediatek,pericfg.txt | 36 - .../arm/mediatek/mediatek,pericfg.yaml | 64 + .../bindings/net/mediatek,eth-mac.yaml | 89 + arch/arm64/boot/dts/mediatek/mt8516.dtsi | 17 + .../boot/dts/mediatek/pumpkin-common.dtsi | 34 + drivers/net/ethernet/mediatek/Kconfig | 8 +- drivers/net/ethernet/mediatek/Makefile | 3 +- drivers/net/ethernet/mediatek/mtk_eth_mac.c | 1668 +++++++++++++++++ 8 files changed, 1881 insertions(+), 38 deletions(-) delete mode 100644 Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.txt create mode 100644 Documentation/devicetree/bindings/arm/mediatek/mediatek,pericfg.yaml create mode 100644 Documentation/devicetree/bindings/net/mediatek,eth-mac.yaml create mode 100644 drivers/net/ethernet/mediatek/mtk_eth_mac.c