Message ID | 1389805867-22409-1-git-send-email-zoltan.kiss@citrix.com |
---|---|
State | Superseded, archived |
Delegated to: | David Miller |
Any reviews on this one? It fixes an important lockup situation, so either this
or some other fix should go in soon.

On 15/01/14 17:11, Zoltan Kiss wrote:
> The recent patch to fix receive side flow control (11b57f) solved the
> spinning thread problem, but caused another one. The receive side can
> stall if:
> - [THREAD] xenvif_rx_action sets rx_queue_stopped to true
> - [INTERRUPT] interrupt happens, and sets rx_event to true
> - [THREAD] then xenvif_kthread sets rx_event to false
> - [THREAD] rx_work_todo doesn't return true anymore
>
> Also, if an interrupt is sent but there is still no room in the ring, it
> takes quite a long time until xenvif_rx_action realizes it. This patch
> ditches those two variables and reworks rx_work_todo. If the thread
> finds it can't fit more skbs into the ring, it saves the last slot
> estimation into rx_last_skb_slots, otherwise it's kept as 0. Then
> rx_work_todo will check if:
> - there is something to send to the ring (like before)
> - there is space for the topmost packet in the queue
>
> I think that's a more natural and optimal thing to test than two bools
> which are set somewhere else.
>
> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
> ---
>  drivers/net/xen-netback/common.h    |  6 +-----
>  drivers/net/xen-netback/interface.c |  1 -
>  drivers/net/xen-netback/netback.c   | 16 ++++++----------
>  3 files changed, 7 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> index 4c76bcb..ae413a2 100644
> --- a/drivers/net/xen-netback/common.h
> +++ b/drivers/net/xen-netback/common.h
> @@ -143,11 +143,7 @@ struct xenvif {
>  	char rx_irq_name[IFNAMSIZ+4]; /* DEVNAME-rx */
>  	struct xen_netif_rx_back_ring rx;
>  	struct sk_buff_head rx_queue;
> -	bool rx_queue_stopped;
> -	/* Set when the RX interrupt is triggered by the frontend.
> -	 * The worker thread may need to wake the queue.
> -	 */
> -	bool rx_event;
> +	RING_IDX rx_last_skb_slots;
>  
>  	/* This array is allocated seperately as it is large */
>  	struct gnttab_copy *grant_copy_op;
> diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> index b9de31e..7669d49 100644
> --- a/drivers/net/xen-netback/interface.c
> +++ b/drivers/net/xen-netback/interface.c
> @@ -100,7 +100,6 @@ static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
>  {
>  	struct xenvif *vif = dev_id;
>  
> -	vif->rx_event = true;
>  	xenvif_kick_thread(vif);
>  
>  	return IRQ_HANDLED;
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 2738563..bb241d0 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -477,7 +477,6 @@ static void xenvif_rx_action(struct xenvif *vif)
>  	unsigned long offset;
>  	struct skb_cb_overlay *sco;
>  	bool need_to_notify = false;
> -	bool ring_full = false;
>  
>  	struct netrx_pending_operations npo = {
>  		.copy = vif->grant_copy_op,
> @@ -487,7 +486,7 @@ static void xenvif_rx_action(struct xenvif *vif)
>  	skb_queue_head_init(&rxq);
>  
>  	while ((skb = skb_dequeue(&vif->rx_queue)) != NULL) {
> -		int max_slots_needed;
> +		RING_IDX max_slots_needed;
>  		int i;
>  
>  		/* We need a cheap worse case estimate for the number of
> @@ -510,9 +509,10 @@ static void xenvif_rx_action(struct xenvif *vif)
>  		if (!xenvif_rx_ring_slots_available(vif, max_slots_needed)) {
>  			skb_queue_head(&vif->rx_queue, skb);
>  			need_to_notify = true;
> -			ring_full = true;
> +			vif->rx_last_skb_slots = max_slots_needed;
>  			break;
> -		}
> +		} else
> +			vif->rx_last_skb_slots = 0;
>  
>  		sco = (struct skb_cb_overlay *)skb->cb;
>  		sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
> @@ -523,8 +523,6 @@ static void xenvif_rx_action(struct xenvif *vif)
>  
>  	BUG_ON(npo.meta_prod > ARRAY_SIZE(vif->meta));
>  
> -	vif->rx_queue_stopped = !npo.copy_prod && ring_full;
> -
>  	if (!npo.copy_prod)
>  		goto done;
>  
> @@ -1727,8 +1725,8 @@ static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif,
>  
>  static inline int rx_work_todo(struct xenvif *vif)
>  {
> -	return (!skb_queue_empty(&vif->rx_queue) && !vif->rx_queue_stopped) ||
> -		vif->rx_event;
> +	return !skb_queue_empty(&vif->rx_queue) &&
> +	       xenvif_rx_ring_slots_available(vif, vif->rx_last_skb_slots);
>  }
>  
>  static inline int tx_work_todo(struct xenvif *vif)
> @@ -1814,8 +1812,6 @@ int xenvif_kthread(void *data)
>  		if (!skb_queue_empty(&vif->rx_queue))
>  			xenvif_rx_action(vif);
>  
> -		vif->rx_event = false;
> -
>  		if (skb_queue_empty(&vif->rx_queue) &&
>  		    netif_queue_stopped(vif->dev))
>  			xenvif_start_queue(vif);
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message-----
> From: Zoltan Kiss
> Sent: 20 January 2014 12:23
> To: Ian Campbell; Wei Liu; xen-devel@lists.xenproject.org;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Jonathan Davies
> Cc: Paul Durrant
> Subject: Re: [PATCH net-next v2] xen-netback: Rework rx_work_todo
>
> Any reviews on this one? It fixes an important lockup situation, so
> either this or some other fix should go in soon.
>
> On 15/01/14 17:11, Zoltan Kiss wrote:
> > The recent patch to fix receive side flow control (11b57f) solved the
> > spinning thread problem, but caused another one. The receive side can
> > stall if:
> > - [THREAD] xenvif_rx_action sets rx_queue_stopped to true
> > - [INTERRUPT] interrupt happens, and sets rx_event to true
> > - [THREAD] then xenvif_kthread sets rx_event to false
> > - [THREAD] rx_work_todo doesn't return true anymore
> >
> > Also, if an interrupt is sent but there is still no room in the ring, it
> > takes quite a long time until xenvif_rx_action realizes it. This patch
> > ditches those two variables and reworks rx_work_todo. If the thread
> > finds it can't fit more skbs into the ring, it saves the last slot
> > estimation into rx_last_skb_slots, otherwise it's kept as 0. Then
> > rx_work_todo will check if:
> > - there is something to send to the ring (like before)
> > - there is space for the topmost packet in the queue
> >
> > I think that's a more natural and optimal thing to test than two bools
> > which are set somewhere else.
> >
> > Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
> > ---
> >  drivers/net/xen-netback/common.h    |  6 +-----
> >  drivers/net/xen-netback/interface.c |  1 -
> >  drivers/net/xen-netback/netback.c   | 16 ++++++----------
> >  3 files changed, 7 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
> > index 4c76bcb..ae413a2 100644
> > --- a/drivers/net/xen-netback/common.h
> > +++ b/drivers/net/xen-netback/common.h
> > @@ -143,11 +143,7 @@ struct xenvif {
> >  	char rx_irq_name[IFNAMSIZ+4]; /* DEVNAME-rx */
> >  	struct xen_netif_rx_back_ring rx;
> >  	struct sk_buff_head rx_queue;
> > -	bool rx_queue_stopped;
> > -	/* Set when the RX interrupt is triggered by the frontend.
> > -	 * The worker thread may need to wake the queue.
> > -	 */
> > -	bool rx_event;
> > +	RING_IDX rx_last_skb_slots;
> >  
> >  	/* This array is allocated seperately as it is large */
> >  	struct gnttab_copy *grant_copy_op;
> > diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
> > index b9de31e..7669d49 100644
> > --- a/drivers/net/xen-netback/interface.c
> > +++ b/drivers/net/xen-netback/interface.c
> > @@ -100,7 +100,6 @@ static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
> >  {
> >  	struct xenvif *vif = dev_id;
> >  
> > -	vif->rx_event = true;
> >  	xenvif_kick_thread(vif);
> >  
> >  	return IRQ_HANDLED;
> > diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> > index 2738563..bb241d0 100644
> > --- a/drivers/net/xen-netback/netback.c
> > +++ b/drivers/net/xen-netback/netback.c
> > @@ -477,7 +477,6 @@ static void xenvif_rx_action(struct xenvif *vif)
> >  	unsigned long offset;
> >  	struct skb_cb_overlay *sco;
> >  	bool need_to_notify = false;
> > -	bool ring_full = false;
> >  
> >  	struct netrx_pending_operations npo = {
> >  		.copy = vif->grant_copy_op,
> > @@ -487,7 +486,7 @@ static void xenvif_rx_action(struct xenvif *vif)
> >  	skb_queue_head_init(&rxq);
> >  
> >  	while ((skb = skb_dequeue(&vif->rx_queue)) != NULL) {
> > -		int max_slots_needed;
> > +		RING_IDX max_slots_needed;
> >  		int i;
> >  
> >  		/* We need a cheap worse case estimate for the number of
> > @@ -510,9 +509,10 @@ static void xenvif_rx_action(struct xenvif *vif)
> >  		if (!xenvif_rx_ring_slots_available(vif, max_slots_needed)) {
> >  			skb_queue_head(&vif->rx_queue, skb);
> >  			need_to_notify = true;
> > -			ring_full = true;
> > +			vif->rx_last_skb_slots = max_slots_needed;
> >  			break;
> > -		}
> > +		} else
> > +			vif->rx_last_skb_slots = 0;
> >  
> >  		sco = (struct skb_cb_overlay *)skb->cb;
> >  		sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
> > @@ -523,8 +523,6 @@ static void xenvif_rx_action(struct xenvif *vif)
> >  
> >  	BUG_ON(npo.meta_prod > ARRAY_SIZE(vif->meta));
> >  
> > -	vif->rx_queue_stopped = !npo.copy_prod && ring_full;
> > -
> >  	if (!npo.copy_prod)
> >  		goto done;
> >  
> > @@ -1727,8 +1725,8 @@ static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif,
> >  
> >  static inline int rx_work_todo(struct xenvif *vif)
> >  {
> > -	return (!skb_queue_empty(&vif->rx_queue) && !vif->rx_queue_stopped) ||
> > -		vif->rx_event;
> > +	return !skb_queue_empty(&vif->rx_queue) &&
> > +	       xenvif_rx_ring_slots_available(vif, vif->rx_last_skb_slots);
> >  }
> >  
> >  static inline int tx_work_todo(struct xenvif *vif)
> > @@ -1814,8 +1812,6 @@ int xenvif_kthread(void *data)
> >  		if (!skb_queue_empty(&vif->rx_queue))
> >  			xenvif_rx_action(vif);
> >  
> > -		vif->rx_event = false;
> > -

The minimal patch is to simply move this line up above the previous if clause,
but I'm happy with your patch as it stands, so

Reviewed-by: Paul Durrant <paul.durrant@citrix.com>

> >  		if (skb_queue_empty(&vif->rx_queue) &&
> >  		    netif_queue_stopped(vif->dev))
> >  			xenvif_start_queue(vif);
> >
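To illustrate the "minimal patch" remark above, here is a small user-space sketch of the two orderings of the flag clear, as I read the suggestion. It is only a model: `struct model_flags`, `rx_action_then_interrupt()` and the two `iteration_*` helpers are hypothetical names invented for this example, and the interrupt is simulated as firing immediately after the ring-full check rather than truly concurrently. Only the predicate in `old_rx_work_todo()` mirrors the pre-patch code from the quoted diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state the pre-patch predicate looks at. */
struct model_flags {
	bool queue_nonempty;	/* stands in for !skb_queue_empty(&vif->rx_queue) */
	bool rx_queue_stopped;
	bool rx_event;
};

/* Same shape as the pre-patch rx_work_todo() condition. */
static bool old_rx_work_todo(const struct model_flags *f)
{
	return (f->queue_nonempty && !f->rx_queue_stopped) || f->rx_event;
}

/* The ring is found full, then the frontend interrupt arrives right after. */
static void rx_action_then_interrupt(struct model_flags *f)
{
	f->rx_queue_stopped = true;	/* [THREAD] nothing could be copied */
	f->rx_event = true;		/* [INTERRUPT] frontend posted requests */
}

/* Pre-patch ordering in the thread loop: do the work, then clear the flag. */
static bool iteration_clear_after(struct model_flags *f)
{
	assert(f->queue_nonempty);	/* the scenario needs a queued packet */
	rx_action_then_interrupt(f);
	f->rx_event = false;		/* overwrites the event that just arrived */
	return old_rx_work_todo(f);
}

/* Ordering of the suggested minimal fix: clear the flag, then do the work. */
static bool iteration_clear_before(struct model_flags *f)
{
	assert(f->queue_nonempty);
	f->rx_event = false;
	rx_action_then_interrupt(f);
	return old_rx_work_todo(f);	/* the late event is still visible */
}
```

Starting both helpers from `{ .queue_nonempty = true }`, the first one reports no work (the stall from the commit message) while the second still reports work, because an event that arrives during the RX pass is no longer cleared afterwards. The posted patch sidesteps the ordering question by removing both flags.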
On Wed, Jan 15, 2014 at 05:11:07PM +0000, Zoltan Kiss wrote:
> The recent patch to fix receive side flow control (11b57f) solved the
> spinning thread problem, but caused another one. The receive side can
> stall if:
> - [THREAD] xenvif_rx_action sets rx_queue_stopped to true
> - [INTERRUPT] interrupt happens, and sets rx_event to true
> - [THREAD] then xenvif_kthread sets rx_event to false
> - [THREAD] rx_work_todo doesn't return true anymore
>
> Also, if an interrupt is sent but there is still no room in the ring, it
> takes quite a long time until xenvif_rx_action realizes it. This patch
> ditches those two variables and reworks rx_work_todo. If the thread
> finds it can't fit more skbs into the ring, it saves the last slot
> estimation into rx_last_skb_slots, otherwise it's kept as 0. Then
> rx_work_todo will check if:
> - there is something to send to the ring (like before)
> - there is space for the topmost packet in the queue
>
> I think that's a more natural and optimal thing to test than two bools
> which are set somewhere else.
>
> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>

Sorry for the delay.

Paul, thanks for reviewing.

Acked-by: Wei Liu <wei.liu2@citrix.com>

Wei.
On 20/01/14 16:38, Wei Liu wrote:
> On Wed, Jan 15, 2014 at 05:11:07PM +0000, Zoltan Kiss wrote:
>> The recent patch to fix receive side flow control (11b57f) solved the
>> spinning thread problem, but caused another one. The receive side can
>> stall if:
>> - [THREAD] xenvif_rx_action sets rx_queue_stopped to true
>> - [INTERRUPT] interrupt happens, and sets rx_event to true
>> - [THREAD] then xenvif_kthread sets rx_event to false
>> - [THREAD] rx_work_todo doesn't return true anymore
>>
>> Also, if an interrupt is sent but there is still no room in the ring, it
>> takes quite a long time until xenvif_rx_action realizes it. This patch
>> ditches those two variables and reworks rx_work_todo. If the thread
>> finds it can't fit more skbs into the ring, it saves the last slot
>> estimation into rx_last_skb_slots, otherwise it's kept as 0. Then
>> rx_work_todo will check if:
>> - there is something to send to the ring (like before)
>> - there is space for the topmost packet in the queue
>>
>> I think that's a more natural and optimal thing to test than two bools
>> which are set somewhere else.
>>
>> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
>
> Sorry for the delay.
>
> Paul, thanks for reviewing.
>
> Acked-by: Wei Liu <wei.liu2@citrix.com>

Hi,

This patch hasn't made it to net-next yet, maybe because the subject doesn't
suggest that this is a bugfix. I suggest applying it as soon as possible,
otherwise netback will be quite broken.

Zoli
On 04/02/14 19:19, Zoltan Kiss wrote:
> On 20/01/14 16:38, Wei Liu wrote:
>> On Wed, Jan 15, 2014 at 05:11:07PM +0000, Zoltan Kiss wrote:
>>> The recent patch to fix receive side flow control (11b57f) solved the
>>> spinning thread problem, but caused another one. The receive side can
>>> stall if:
>>> - [THREAD] xenvif_rx_action sets rx_queue_stopped to true
>>> - [INTERRUPT] interrupt happens, and sets rx_event to true
>>> - [THREAD] then xenvif_kthread sets rx_event to false
>>> - [THREAD] rx_work_todo doesn't return true anymore
>>>
>>> Also, if an interrupt is sent but there is still no room in the ring, it
>>> takes quite a long time until xenvif_rx_action realizes it. This patch
>>> ditches those two variables and reworks rx_work_todo. If the thread
>>> finds it can't fit more skbs into the ring, it saves the last slot
>>> estimation into rx_last_skb_slots, otherwise it's kept as 0. Then
>>> rx_work_todo will check if:
>>> - there is something to send to the ring (like before)
>>> - there is space for the topmost packet in the queue
>>>
>>> I think that's a more natural and optimal thing to test than two bools
>>> which are set somewhere else.
>>>
>>> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
>>
>> Sorry for the delay.
>>
>> Paul, thanks for reviewing.
>>
>> Acked-by: Wei Liu <wei.liu2@citrix.com>
>
> Hi,
>
> This patch hasn't made it to net-next yet, maybe because the subject
> doesn't suggest that this is a bugfix. I suggest applying it as soon as
> possible, otherwise netback will be quite broken.

I've reposted it with a clearer subject, sorry for being too vague.

Zoli
diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 4c76bcb..ae413a2 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -143,11 +143,7 @@ struct xenvif {
 	char rx_irq_name[IFNAMSIZ+4]; /* DEVNAME-rx */
 	struct xen_netif_rx_back_ring rx;
 	struct sk_buff_head rx_queue;
-	bool rx_queue_stopped;
-	/* Set when the RX interrupt is triggered by the frontend.
-	 * The worker thread may need to wake the queue.
-	 */
-	bool rx_event;
+	RING_IDX rx_last_skb_slots;
 
 	/* This array is allocated seperately as it is large */
 	struct gnttab_copy *grant_copy_op;
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index b9de31e..7669d49 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -100,7 +100,6 @@ static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
 {
 	struct xenvif *vif = dev_id;
 
-	vif->rx_event = true;
 	xenvif_kick_thread(vif);
 
 	return IRQ_HANDLED;
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 2738563..bb241d0 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -477,7 +477,6 @@ static void xenvif_rx_action(struct xenvif *vif)
 	unsigned long offset;
 	struct skb_cb_overlay *sco;
 	bool need_to_notify = false;
-	bool ring_full = false;
 
 	struct netrx_pending_operations npo = {
 		.copy = vif->grant_copy_op,
@@ -487,7 +486,7 @@ static void xenvif_rx_action(struct xenvif *vif)
 	skb_queue_head_init(&rxq);
 
 	while ((skb = skb_dequeue(&vif->rx_queue)) != NULL) {
-		int max_slots_needed;
+		RING_IDX max_slots_needed;
 		int i;
 
 		/* We need a cheap worse case estimate for the number of
@@ -510,9 +509,10 @@ static void xenvif_rx_action(struct xenvif *vif)
 		if (!xenvif_rx_ring_slots_available(vif, max_slots_needed)) {
 			skb_queue_head(&vif->rx_queue, skb);
 			need_to_notify = true;
-			ring_full = true;
+			vif->rx_last_skb_slots = max_slots_needed;
 			break;
-		}
+		} else
+			vif->rx_last_skb_slots = 0;
 
 		sco = (struct skb_cb_overlay *)skb->cb;
 		sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
@@ -523,8 +523,6 @@ static void xenvif_rx_action(struct xenvif *vif)
 
 	BUG_ON(npo.meta_prod > ARRAY_SIZE(vif->meta));
 
-	vif->rx_queue_stopped = !npo.copy_prod && ring_full;
-
 	if (!npo.copy_prod)
 		goto done;
 
@@ -1727,8 +1725,8 @@ static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif,
 
 static inline int rx_work_todo(struct xenvif *vif)
 {
-	return (!skb_queue_empty(&vif->rx_queue) && !vif->rx_queue_stopped) ||
-		vif->rx_event;
+	return !skb_queue_empty(&vif->rx_queue) &&
+	       xenvif_rx_ring_slots_available(vif, vif->rx_last_skb_slots);
 }
 
 static inline int tx_work_todo(struct xenvif *vif)
@@ -1814,8 +1812,6 @@ int xenvif_kthread(void *data)
 		if (!skb_queue_empty(&vif->rx_queue))
 			xenvif_rx_action(vif);
 
-		vif->rx_event = false;
-
 		if (skb_queue_empty(&vif->rx_queue) &&
 		    netif_queue_stopped(vif->dev))
 			xenvif_start_queue(vif);
The recent patch to fix receive side flow control (11b57f) solved the
spinning thread problem, but caused another one. The receive side can
stall if:
- [THREAD] xenvif_rx_action sets rx_queue_stopped to true
- [INTERRUPT] interrupt happens, and sets rx_event to true
- [THREAD] then xenvif_kthread sets rx_event to false
- [THREAD] rx_work_todo doesn't return true anymore

Also, if an interrupt is sent but there is still no room in the ring, it
takes quite a long time until xenvif_rx_action realizes it. This patch
ditches those two variables and reworks rx_work_todo. If the thread
finds it can't fit more skbs into the ring, it saves the last slot
estimation into rx_last_skb_slots, otherwise it's kept as 0. Then
rx_work_todo will check if:
- there is something to send to the ring (like before)
- there is space for the topmost packet in the queue

I think that's a more natural and optimal thing to test than two bools
which are set somewhere else.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
---
 drivers/net/xen-netback/common.h    |  6 +-----
 drivers/net/xen-netback/interface.c |  1 -
 drivers/net/xen-netback/netback.c   | 16 ++++++----------
 3 files changed, 7 insertions(+), 16 deletions(-)
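As a rough illustration of the reworked check described in the commit message, the following user-space sketch models how rx_last_skb_slots feeds back into rx_work_todo. It is not the driver code: `struct model_vif`, `slots_available()`, `model_rx_action()` and `model_rx_work_todo()` are hypothetical stand-ins, the ring is reduced to a plain counter of free slots, and every packet is assumed to need the same number of slots. The real xenvif_rx_ring_slots_available() works on ring producer and consumer indices, which this model does not attempt to reproduce.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the few struct xenvif fields involved. */
struct model_vif {
	unsigned int queued_skbs;	/* packets waiting in rx_queue */
	unsigned int free_slots;	/* ring requests not yet consumed */
	unsigned int rx_last_skb_slots;	/* 0, or the estimate that did not fit */
};

/* Simplified analogue of xenvif_rx_ring_slots_available(). */
static bool slots_available(const struct model_vif *vif, unsigned int needed)
{
	return vif->free_slots >= needed;
}

/* One pass of the dequeue loop; each packet needs slots_per_skb slots. */
static void model_rx_action(struct model_vif *vif, unsigned int slots_per_skb)
{
	assert(slots_per_skb > 0);	/* a packet occupies at least one slot */

	while (vif->queued_skbs > 0) {
		if (!slots_available(vif, slots_per_skb)) {
			/* Remember how much room the head packet needs. */
			vif->rx_last_skb_slots = slots_per_skb;
			break;
		}
		vif->rx_last_skb_slots = 0;
		vif->free_slots -= slots_per_skb;
		vif->queued_skbs--;
	}
}

/* Same shape as the post-patch rx_work_todo() condition. */
static bool model_rx_work_todo(const struct model_vif *vif)
{
	return vif->queued_skbs > 0 &&
	       slots_available(vif, vif->rx_last_skb_slots);
}
```

With three queued packets of two slots each and five free slots, one pass of `model_rx_action()` copies two packets and records 2 in rx_last_skb_slots, so `model_rx_work_todo()` stays false until the frontend frees one more slot. The wake-up condition is recomputed from the ring state each time it is evaluated, so there is no separately maintained flag that could be cleared at the wrong moment.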