Patchwork [net,v3] xen-netback: Fix Rx stall due to race condition

login
register
mail settings
Submitter Zoltan Kiss
Date Feb. 4, 2014, 7:54 p.m.
Message ID <1391543677-24039-1-git-send-email-zoltan.kiss@citrix.com>
Download mbox | patch
Permalink /patch/316748/
State Accepted
Delegated to: David Miller
Headers show

Comments

Zoltan Kiss - Feb. 4, 2014, 7:54 p.m.
The recent patch to fix receive side flow control
(11b57f90257c1d6a91cee720151b69e0c2020cf6: xen-netback: stop vif thread
spinning if frontend is unresponsive) solved the spinning thread problem,
however caused an another one. The receive side can stall, if:
- [THREAD] xenvif_rx_action sets rx_queue_stopped to true
- [INTERRUPT] interrupt happens, and sets rx_event to true
- [THREAD] then xenvif_kthread sets rx_event to false
- [THREAD] rx_work_todo doesn't return true anymore

Also, if interrupt sent but there is still no room in the ring, it take quite a
long time until xenvif_rx_action realize it. This patch ditch that two variable,
and rework rx_work_todo. If the thread finds it can't fit more skb's into the
ring, it saves the last slot estimation into rx_last_skb_slots, otherwise it's
kept as 0. Then rx_work_todo will check if:
- there is something to send to the ring (like before)
- there is space for the topmost packet in the queue

I think that's more natural and optimal thing to test than two bool which are
set somewhere else.

Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
Acked-by: Wei Liu <wei.liu2@citrix.com>
---
v2:
- Improve description of the problem

v3:
- Change subject and resend against net to emphasize this is a bugfix

 drivers/net/xen-netback/common.h    |    6 +-----
 drivers/net/xen-netback/interface.c |    1 -
 drivers/net/xen-netback/netback.c   |   16 ++++++----------
 3 files changed, 7 insertions(+), 16 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Feb. 6, 2014, 12:24 a.m.
From: Zoltan Kiss <zoltan.kiss@citrix.com>
Date: Tue, 4 Feb 2014 19:54:37 +0000

> The recent patch to fix receive side flow control
> (11b57f90257c1d6a91cee720151b69e0c2020cf6: xen-netback: stop vif thread
> spinning if frontend is unresponsive) solved the spinning thread problem,
> however caused an another one. The receive side can stall, if:
> - [THREAD] xenvif_rx_action sets rx_queue_stopped to true
> - [INTERRUPT] interrupt happens, and sets rx_event to true
> - [THREAD] then xenvif_kthread sets rx_event to false
> - [THREAD] rx_work_todo doesn't return true anymore
> 
> Also, if interrupt sent but there is still no room in the ring, it take quite a
> long time until xenvif_rx_action realize it. This patch ditch that two variable,
> and rework rx_work_todo. If the thread finds it can't fit more skb's into the
> ring, it saves the last slot estimation into rx_last_skb_slots, otherwise it's
> kept as 0. Then rx_work_todo will check if:
> - there is something to send to the ring (like before)
> - there is space for the topmost packet in the queue
> 
> I think that's more natural and optimal thing to test than two bool which are
> set somewhere else.
> 
> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
> Reviewed-by: Paul Durrant <paul.durrant@citrix.com>
> Acked-by: Wei Liu <wei.liu2@citrix.com>

Applied, thanks.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 4c76bcb..ae413a2 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -143,11 +143,7 @@  struct xenvif {
 	char rx_irq_name[IFNAMSIZ+4]; /* DEVNAME-rx */
 	struct xen_netif_rx_back_ring rx;
 	struct sk_buff_head rx_queue;
-	bool rx_queue_stopped;
-	/* Set when the RX interrupt is triggered by the frontend.
-	 * The worker thread may need to wake the queue.
-	 */
-	bool rx_event;
+	RING_IDX rx_last_skb_slots;
 
 	/* This array is allocated seperately as it is large */
 	struct gnttab_copy *grant_copy_op;
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index b9de31e..7669d49 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -100,7 +100,6 @@  static irqreturn_t xenvif_rx_interrupt(int irq, void *dev_id)
 {
 	struct xenvif *vif = dev_id;
 
-	vif->rx_event = true;
 	xenvif_kick_thread(vif);
 
 	return IRQ_HANDLED;
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 2738563..bb241d0 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -477,7 +477,6 @@  static void xenvif_rx_action(struct xenvif *vif)
 	unsigned long offset;
 	struct skb_cb_overlay *sco;
 	bool need_to_notify = false;
-	bool ring_full = false;
 
 	struct netrx_pending_operations npo = {
 		.copy  = vif->grant_copy_op,
@@ -487,7 +486,7 @@  static void xenvif_rx_action(struct xenvif *vif)
 	skb_queue_head_init(&rxq);
 
 	while ((skb = skb_dequeue(&vif->rx_queue)) != NULL) {
-		int max_slots_needed;
+		RING_IDX max_slots_needed;
 		int i;
 
 		/* We need a cheap worse case estimate for the number of
@@ -510,9 +509,10 @@  static void xenvif_rx_action(struct xenvif *vif)
 		if (!xenvif_rx_ring_slots_available(vif, max_slots_needed)) {
 			skb_queue_head(&vif->rx_queue, skb);
 			need_to_notify = true;
-			ring_full = true;
+			vif->rx_last_skb_slots = max_slots_needed;
 			break;
-		}
+		} else
+			vif->rx_last_skb_slots = 0;
 
 		sco = (struct skb_cb_overlay *)skb->cb;
 		sco->meta_slots_used = xenvif_gop_skb(skb, &npo);
@@ -523,8 +523,6 @@  static void xenvif_rx_action(struct xenvif *vif)
 
 	BUG_ON(npo.meta_prod > ARRAY_SIZE(vif->meta));
 
-	vif->rx_queue_stopped = !npo.copy_prod && ring_full;
-
 	if (!npo.copy_prod)
 		goto done;
 
@@ -1727,8 +1725,8 @@  static struct xen_netif_rx_response *make_rx_response(struct xenvif *vif,
 
 static inline int rx_work_todo(struct xenvif *vif)
 {
-	return (!skb_queue_empty(&vif->rx_queue) && !vif->rx_queue_stopped) ||
-		vif->rx_event;
+	return !skb_queue_empty(&vif->rx_queue) &&
+	       xenvif_rx_ring_slots_available(vif, vif->rx_last_skb_slots);
 }
 
 static inline int tx_work_todo(struct xenvif *vif)
@@ -1814,8 +1812,6 @@  int xenvif_kthread(void *data)
 		if (!skb_queue_empty(&vif->rx_queue))
 			xenvif_rx_action(vif);
 
-		vif->rx_event = false;
-
 		if (skb_queue_empty(&vif->rx_queue) &&
 		    netif_queue_stopped(vif->dev))
 			xenvif_start_queue(vif);