Message ID | 87ob17kz96.fsf_-_@xmission.com
---|---
State | RFC, archived
Delegated to: | David Miller
On Sat, 2014-03-15 at 13:01 -0700, Eric W. Biederman wrote:
> Given this comment I have to ask: How insane is the mellanox mlx4
> driver that has separate rx and tx queues, separate rx and tx interrupts
> and uses separate napi bottom halves to process each, and honors the
> budget passed into its rx napi handler.

I believe this driver had TX completion from hard irq, and NAPI polling
for RX. This was changed quite recently to also do TX completion from
the softirq handler.

> Right now the mlx4 code disables interrupts and calls napi_synchronize
> in its netpoll routine instead of just scheduling the napi bottom
> halves as all of the sane drivers do. napi_synchronize trying to sleep
> in hard irq context is pretty terrible, but I can see roughly how that
> should be fixed.
>
> diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> index fa5ee719e04b..2e6fded14e60 100644
> --- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> +++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
> @@ -1302,16 +1302,10 @@ out:
>  static void mlx4_en_netpoll(struct net_device *dev)
>  {
>  	struct mlx4_en_priv *priv = netdev_priv(dev);
> -	struct mlx4_en_cq *cq;
> -	unsigned long flags;
>  	int i;
>
>  	for (i = 0; i < priv->rx_ring_num; i++) {
> -		cq = priv->rx_cq[i];
> -		spin_lock_irqsave(&cq->lock, flags);
> -		napi_synchronize(&cq->napi);
> -		mlx4_en_process_rx_cq(dev, cq, 0);
> -		spin_unlock_irqrestore(&cq->lock, flags);
> +		napi_schedule(&priv->tx_cq[i]->napi);
>  	}
>  }

Wait a minute, I thought ndo_poll_controller() had to be synchronous,
not schedule a napi ?

> What I can't see is what is a clean thing to do with the mlx4 tx
> napi bottom half, as it won't process the tx cq when I pass in a
> budget of 0.

Just ignore the budget to drain tx queues, as David said.

> What I can't see is how to prevent a netpoll stalling after enough
> packets are transmitted from hard irq context, say with sysrq-l on
> a machine with enough cpus to fill the tx queue.

Is this specific to mlx4 ?
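To make "just ignore the budget" concrete, here is a minimal sketch of a TX-completion NAPI handler that drains the ring even when netpoll hands it a budget of 0. The function name is hypothetical, and the mlx4_en_process_tx_cq()/mlx4_en_arm_cq() signatures are assumptions modeled on the driver's naming, not verbatim driver code:

/* Sketch only: a TX NAPI poll handler that ignores the budget.
 * mlx4_en_process_tx_cq() and mlx4_en_arm_cq() are assumed helpers
 * following the driver's naming; the real signatures may differ.
 */
static int mlx4_en_poll_tx_cq_sketch(struct napi_struct *napi, int budget)
{
	struct mlx4_en_cq *cq = container_of(napi, struct mlx4_en_cq, napi);
	struct net_device *dev = cq->dev;

	/* TX completion work is bounded by the ring size, so process it
	 * unconditionally instead of bailing out when budget == 0. */
	mlx4_en_process_tx_cq(dev, cq);

	/* No RX budget consumed: report 0, complete, and re-arm the CQ. */
	napi_complete(napi);
	mlx4_en_arm_cq(netdev_priv(dev), cq);
	return 0;
}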
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Sun, 16 Mar 2014 09:17:01 -0700

> Wait a minute, I thought ndo_poll_controller() had to be synchronous,
> not schedule a napi ?

ndo_poll_controller() basically runs the hardware interrupt handler,
which might schedule NAPI processing. Then netpoll invokes the NAPI
poll in whatever state ndo_poll_controller() left it in.
On Mon, 2014-03-17 at 17:22 -0400, David Miller wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Sun, 16 Mar 2014 09:17:01 -0700
>
> > Wait a minute, I thought ndo_poll_controller() had to be synchronous,
> > not schedule a napi ?
>
> ndo_poll_controller() basically runs the hardware interrupt handler,
> which might schedule NAPI processing. Then netpoll invokes
> the NAPI poll in whatever state ndo_poll_controller() left it in.

Ah right, this makes sense, poll_napi() uses a spin_trylock()...
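Putting the two messages together: a compressed, paraphrased sketch of the netpoll poll path under discussion, where ndo_poll_controller() runs first and each NAPI poll is then invoked directly with a budget of 0 behind a spin_trylock(). This is simplified for illustration, not verbatim net/core/netpoll.c:

/* Paraphrased sketch of the netpoll poll path; simplified. */
static void netpoll_poll_dev_sketch(struct net_device *dev)
{
	const struct net_device_ops *ops = dev->netdev_ops;
	struct napi_struct *napi;

	/* Step 1: stand in for the hardware interrupt handler.  A driver
	 * whose real IRQ handler just does napi_schedule() can do the
	 * same here. */
	ops->ndo_poll_controller(dev);

	/* Step 2: run each NAPI poll directly, in whatever state step 1
	 * left it.  The trylock is what lets netpoll safely skip a queue
	 * that a concurrent softirq is already polling. */
	list_for_each_entry(napi, &dev->napi_list, dev_list) {
		if (napi->poll_owner != smp_processor_id() &&
		    spin_trylock(&napi->poll_lock)) {
			napi->poll(napi, 0);	/* budget 0: no RX work */
			spin_unlock(&napi->poll_lock);
		}
	}
}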
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index fa5ee719e04b..2e6fded14e60 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -1302,16 +1302,10 @@ out:
 static void mlx4_en_netpoll(struct net_device *dev)
 {
 	struct mlx4_en_priv *priv = netdev_priv(dev);
-	struct mlx4_en_cq *cq;
-	unsigned long flags;
 	int i;

 	for (i = 0; i < priv->rx_ring_num; i++) {
-		cq = priv->rx_cq[i];
-		spin_lock_irqsave(&cq->lock, flags);
-		napi_synchronize(&cq->napi);
-		mlx4_en_process_rx_cq(dev, cq, 0);
-		spin_unlock_irqrestore(&cq->lock, flags);
+		napi_schedule(&priv->tx_cq[i]->napi);
 	}
 }