[v2] xen-netback: fix occasional leak of grant ref mappings under memory pressure

Message ID 1551358083-14227-1-git-send-email-igor.druzhinin@citrix.com
State Accepted
Delegated to: David Miller
Series [v2] xen-netback: fix occasional leak of grant ref mappings under memory pressure

Commit Message

Igor Druzhinin Feb. 28, 2019, 12:48 p.m. UTC
The zero-copy callback flag is not yet set on the frag list skb at the
point where xenvif_handle_frag_list() returns -ENOMEM. As a result, grant
ref mappings leak, since xenvif_zerocopy_callback() is never called for
these fragments. The leaked mappings build up and eventually cause Xen
to kill Dom0 as the slots get reused for new mappings:

"d0v0 Attempt to implicitly unmap a granted PTE c010000329fce005"

That behavior is observed under certain workloads where sudden spikes
of page cache writes coexist with active atomic skb allocations from
network traffic. Additionally, rework the logic to deal with frag_list
deallocation in a single place.

Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
---
 drivers/net/xen-netback/netback.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)
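
The ordering problem described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the driver code: `toy_skb`, `toy_zerocopy_prepare()`, `toy_free()`, `toy_handle_frag_list()` and `toy_tx_submit()` are hypothetical stand-ins for the kernel structures and functions named in the patch, and the count of three mapped slots is arbitrary. The model assumes that freeing a buffer whose flag was never armed leaves its mappings behind, which is how the commit message describes the leak.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb whose frags hold grant mappings. */
struct toy_skb {
	bool zerocopy;              /* models the zero-copy callback flag */
	int mapped_slots;           /* grant mappings still held */
	struct toy_skb *frag_list;  /* chained buffer, as with frag_list */
};

/* Models xenvif_skb_zerocopy_prepare(): arm the callback exactly once. */
static void toy_zerocopy_prepare(struct toy_skb *skb)
{
	assert(!skb->zerocopy);     /* arming twice is a bug in this model */
	skb->zerocopy = true;
}

/*
 * Models kfree_skb(): the attached frag list is freed as well. The
 * "callback" only releases mappings when the flag was armed; otherwise
 * they count as leaked. Returns the number of leaked mappings.
 */
static int toy_free(struct toy_skb *skb)
{
	int leaked = 0;

	if (!skb)
		return 0;
	leaked += toy_free(skb->frag_list);
	skb->frag_list = NULL;
	if (!skb->zerocopy)
		leaked += skb->mapped_slots;
	skb->mapped_slots = 0;
	return leaked;
}

/* Models the helper: an allocation failure returns before any cleanup. */
static int toy_handle_frag_list(struct toy_skb *skb, bool alloc_fails)
{
	(void)skb;
	return alloc_fails ? -ENOMEM : 0;
}

/*
 * Models the frag_list branch of the submit loop. prepare_first selects
 * the patched ordering (flag armed before the helper is called).
 * Head skb bookkeeping is outside this sketch, so the head holds no slots.
 */
static int toy_tx_submit(bool prepare_first, bool alloc_fails)
{
	struct toy_skb nskb = { .zerocopy = false, .mapped_slots = 3 };
	struct toy_skb skb = { .zerocopy = true, .frag_list = &nskb };

	if (prepare_first)
		toy_zerocopy_prepare(&nskb);

	if (toy_handle_frag_list(&skb, alloc_fails))
		return toy_free(&skb);  /* error path frees head and frag list */

	skb.frag_list = NULL;           /* models skb_frag_list_init() */
	if (!prepare_first)
		toy_zerocopy_prepare(&nskb); /* old ordering armed it only here */
	return toy_free(&nskb) + toy_free(&skb);
}
```

In this model, `toy_tx_submit(false, true)` reports all three mappings as leaked, while `toy_tx_submit(true, true)` reports none; the success path reports none with either ordering. Arming the flag before the fallible call is what makes the error path safe.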

Comments

Wei Liu Feb. 28, 2019, 12:52 p.m. UTC | #1
On Thu, Feb 28, 2019 at 12:48:03PM +0000, Igor Druzhinin wrote:
> Zero-copy callback flag is not yet set on frag list skb at the moment
> xenvif_handle_frag_list() returns -ENOMEM. This eventually results in
> leaking grant ref mappings since xenvif_zerocopy_callback() is never
> called for these fragments. Those eventually build up and cause Xen
> to kill Dom0 as the slots get reused for new mappings:
> 
> "d0v0 Attempt to implicitly unmap a granted PTE c010000329fce005"
> 
> That behavior is observed under certain workloads where sudden spikes
> of page cache writes coexist with active atomic skb allocations from
> network traffic. Additionally, rework the logic to deal with frag_list
> deallocation in a single place.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>

Acked-by: Wei Liu <wei.liu2@citrix.com>

> ---
>  drivers/net/xen-netback/netback.c | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 80aae3a..f09948b 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -1072,11 +1072,6 @@ static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *s
>  		skb_frag_size_set(&frags[i], len);
>  	}
>  
> -	/* Copied all the bits from the frag list -- free it. */
> -	skb_frag_list_init(skb);
> -	xenvif_skb_zerocopy_prepare(queue, nskb);
> -	kfree_skb(nskb);
> -
>  	/* Release all the original (foreign) frags. */
>  	for (f = 0; f < skb_shinfo(skb)->nr_frags; f++)
>  		skb_frag_unref(skb, f);
> @@ -1145,6 +1140,8 @@ static int xenvif_tx_submit(struct xenvif_queue *queue)
>  		xenvif_fill_frags(queue, skb);
>  
>  		if (unlikely(skb_has_frag_list(skb))) {
> +			struct sk_buff *nskb = skb_shinfo(skb)->frag_list;
> +			xenvif_skb_zerocopy_prepare(queue, nskb);
>  			if (xenvif_handle_frag_list(queue, skb)) {
>  				if (net_ratelimit())
>  					netdev_err(queue->vif->dev,
> @@ -1153,6 +1150,9 @@ static int xenvif_tx_submit(struct xenvif_queue *queue)
>  				kfree_skb(skb);
>  				continue;
>  			}
> +			/* Copied all the bits from the frag list -- free it. */
> +			skb_frag_list_init(skb);
> +			kfree_skb(nskb);
>  		}
>  
>  		skb->dev      = queue->vif->dev;
> -- 
> 2.7.4
>
David Miller Feb. 28, 2019, 6:37 p.m. UTC | #2
From: Igor Druzhinin <igor.druzhinin@citrix.com>
Date: Thu, 28 Feb 2019 12:48:03 +0000

> Zero-copy callback flag is not yet set on frag list skb at the moment
> xenvif_handle_frag_list() returns -ENOMEM. This eventually results in
> leaking grant ref mappings since xenvif_zerocopy_callback() is never
> called for these fragments. Those eventually build up and cause Xen
> to kill Dom0 as the slots get reused for new mappings:
> 
> "d0v0 Attempt to implicitly unmap a granted PTE c010000329fce005"
> 
> That behavior is observed under certain workloads where sudden spikes
> of page cache writes coexist with active atomic skb allocations from
> network traffic. Additionally, rework the logic to deal with frag_list
> deallocation in a single place.
> 
> Signed-off-by: Paul Durrant <paul.durrant@citrix.com>
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>

Applied and queued up for -stable, thanks.
Patch

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 80aae3a..f09948b 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1072,11 +1072,6 @@  static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *s
 		skb_frag_size_set(&frags[i], len);
 	}
 
-	/* Copied all the bits from the frag list -- free it. */
-	skb_frag_list_init(skb);
-	xenvif_skb_zerocopy_prepare(queue, nskb);
-	kfree_skb(nskb);
-
 	/* Release all the original (foreign) frags. */
 	for (f = 0; f < skb_shinfo(skb)->nr_frags; f++)
 		skb_frag_unref(skb, f);
@@ -1145,6 +1140,8 @@  static int xenvif_tx_submit(struct xenvif_queue *queue)
 		xenvif_fill_frags(queue, skb);
 
 		if (unlikely(skb_has_frag_list(skb))) {
+			struct sk_buff *nskb = skb_shinfo(skb)->frag_list;
+			xenvif_skb_zerocopy_prepare(queue, nskb);
 			if (xenvif_handle_frag_list(queue, skb)) {
 				if (net_ratelimit())
 					netdev_err(queue->vif->dev,
@@ -1153,6 +1150,9 @@  static int xenvif_tx_submit(struct xenvif_queue *queue)
 				kfree_skb(skb);
 				continue;
 			}
+			/* Copied all the bits from the frag list -- free it. */
+			skb_frag_list_init(skb);
+			kfree_skb(nskb);
 		}
 
 		skb->dev      = queue->vif->dev;