
xen-netback: fix occasional leak of grant ref mappings under memory pressure

Message ID 1551319382-32595-1-git-send-email-igor.druzhinin@citrix.com
State Changes Requested
Delegated to: David Miller
Series xen-netback: fix occasional leak of grant ref mappings under memory pressure

Commit Message

Igor Druzhinin Feb. 28, 2019, 2:03 a.m. UTC
Zero-copy callback flag is not yet set on frag list skb at the moment
xenvif_handle_frag_list() returns -ENOMEM. This eventually results in
leaking grant ref mappings since xenvif_zerocopy_callback() is never
called for these fragments. Those eventually build up and cause Xen
to kill Dom0 as the slots get reused for new mappings.

That behavior is observed under certain workloads where sudden spikes
of page cache usage for writes coexist with active atomic skb allocations.

Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
---
 drivers/net/xen-netback/netback.c | 3 +++
 1 file changed, 3 insertions(+)
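
For background, the helper whose absence on the error path causes the leak is xenvif_skb_zerocopy_prepare(). A paraphrased sketch (based on drivers/net/xen-netback/common.h around this kernel version; treat the body as an approximation, not verbatim driver code):

static inline void xenvif_skb_zerocopy_prepare(struct xenvif_queue *queue,
                                               struct sk_buff *skb)
{
        /* Arm the zerocopy destructor: when this skb is freed,
         * xenvif_zerocopy_callback() runs and releases the grant ref
         * mapping slots behind its foreign frags. An skb freed without
         * this flag set triggers no callback, so its slots stay
         * mapped -- the leak this patch fixes. */
        skb_shinfo(skb)->tx_flags |= SKBTX_DEV_ZEROCOPY;
        atomic_inc(&queue->inflight_packets);
}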

Comments

Paul Durrant Feb. 28, 2019, 9:46 a.m. UTC | #1
> -----Original Message-----
> From: Igor Druzhinin [mailto:igor.druzhinin@citrix.com]
> Sent: 28 February 2019 02:03
> To: xen-devel@lists.xenproject.org; netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> Cc: Wei Liu <wei.liu2@citrix.com>; Paul Durrant <Paul.Durrant@citrix.com>; davem@davemloft.net; Igor
> Druzhinin <igor.druzhinin@citrix.com>
> Subject: [PATCH] xen-netback: fix occasional leak of grant ref mappings under memory pressure
> 
> Zero-copy callback flag is not yet set on frag list skb at the moment
> xenvif_handle_frag_list() returns -ENOMEM. This eventually results in
> leaking grant ref mappings since xenvif_zerocopy_callback() is never
> called for these fragments. Those eventually build up and cause Xen
> to kill Dom0 as the slots get reused for new mappings.
> 
> That behavior is observed under certain workloads where sudden spikes
> of page cache usage for writes coexist with active atomic skb allocations.
> 
> Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> ---
>  drivers/net/xen-netback/netback.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 80aae3a..2023317 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -1146,9 +1146,12 @@ static int xenvif_tx_submit(struct xenvif_queue *queue)
> 
>  		if (unlikely(skb_has_frag_list(skb))) {
>  			if (xenvif_handle_frag_list(queue, skb)) {
> +				struct sk_buff *nskb =
> +						skb_shinfo(skb)->frag_list;
>  				if (net_ratelimit())
>  					netdev_err(queue->vif->dev,
>  						   "Not enough memory to consolidate frag_list!\n");
> +				xenvif_skb_zerocopy_prepare(queue, nskb);
>  				xenvif_skb_zerocopy_prepare(queue, skb);
>  				kfree_skb(skb);
>  				continue;

Whilst this fix will do the job, I think it would be better to get rid of the kfree_skb() from inside xenvif_handle_frag_list() and always deal with it here rather than having it happen in two different places. Something like the following...

---8<---
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 80aae3a32c2a..093c7b860772 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1027,13 +1027,13 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
 /* Consolidate skb with a frag_list into a brand new one with local pages on
  * frags. Returns 0 or -ENOMEM if can't allocate new pages.
  */
-static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *skb)
+static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *skb,
+                                  struct sk_buff *nskb)
 {
        unsigned int offset = skb_headlen(skb);
        skb_frag_t frags[MAX_SKB_FRAGS];
        int i, f;
        struct ubuf_info *uarg;
-       struct sk_buff *nskb = skb_shinfo(skb)->frag_list;

        queue->stats.tx_zerocopy_sent += 2;
        queue->stats.tx_frag_overflow++;
@@ -1072,11 +1072,6 @@ static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *s
                skb_frag_size_set(&frags[i], len);
        }

-       /* Copied all the bits from the frag list -- free it. */
-       skb_frag_list_init(skb);
-       xenvif_skb_zerocopy_prepare(queue, nskb);
-       kfree_skb(nskb);
-
        /* Release all the original (foreign) frags. */
        for (f = 0; f < skb_shinfo(skb)->nr_frags; f++)
                skb_frag_unref(skb, f);
@@ -1145,7 +1140,11 @@ static int xenvif_tx_submit(struct xenvif_queue *queue)
                xenvif_fill_frags(queue, skb);

                if (unlikely(skb_has_frag_list(skb))) {
-                       if (xenvif_handle_frag_list(queue, skb)) {
+                       struct sk_buff *nskb = skb_shinfo(skb)->frag_list;
+
+                       xenvif_skb_zerocopy_prepare(queue, nskb);
+
+                       if (xenvif_handle_frag_list(queue, skb, nskb)) {
                                if (net_ratelimit())
                                        netdev_err(queue->vif->dev,
                                                   "Not enough memory to consolidate frag_list!\n");
@@ -1153,6 +1152,10 @@ static int xenvif_tx_submit(struct xenvif_queue *queue)
                                kfree_skb(skb);
                                continue;
                        }
+
+                       /* Copied all the bits from the frag list. */
+                       skb_frag_list_init(skb);
+                       kfree(nskb);
                }

                skb->dev      = queue->vif->dev;
---8<---

What do you think?

  Paul

> --
> 2.7.4
Andrew Cooper Feb. 28, 2019, 9:50 a.m. UTC | #2
On 28/02/2019 02:03, Igor Druzhinin wrote:
> Zero-copy callback flag is not yet set on frag list skb at the moment
> xenvif_handle_frag_list() returns -ENOMEM. This eventually results in
> leaking grant ref mappings since xenvif_zerocopy_callback() is never
> called for these fragments. Those eventually build up and cause Xen
> to kill Dom0 as the slots get reused for new mappings.

It's worth pointing out that what (debug) Xen notices is dom0 performing
an implicit grant unmap.

~Andrew
Wei Liu Feb. 28, 2019, 11:01 a.m. UTC | #3
On Thu, Feb 28, 2019 at 09:46:57AM +0000, Paul Durrant wrote:
> > -----Original Message-----
> > From: Igor Druzhinin [mailto:igor.druzhinin@citrix.com]
> > Sent: 28 February 2019 02:03
> > To: xen-devel@lists.xenproject.org; netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> > Cc: Wei Liu <wei.liu2@citrix.com>; Paul Durrant <Paul.Durrant@citrix.com>; davem@davemloft.net; Igor
> > Druzhinin <igor.druzhinin@citrix.com>
> > Subject: [PATCH] xen-netback: fix occasional leak of grant ref mappings under memory pressure
> > 
> > Zero-copy callback flag is not yet set on frag list skb at the moment
> > xenvif_handle_frag_list() returns -ENOMEM. This eventually results in
> > leaking grant ref mappings since xenvif_zerocopy_callback() is never
> > called for these fragments. Those eventually build up and cause Xen
> > to kill Dom0 as the slots get reused for new mappings.
> > 
> > That behavior is observed under certain workloads where sudden spikes
> > of page cache usage for writes coexist with active atomic skb allocations.
> > 
> > Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> > ---
> >  drivers/net/xen-netback/netback.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> > index 80aae3a..2023317 100644
> > --- a/drivers/net/xen-netback/netback.c
> > +++ b/drivers/net/xen-netback/netback.c
> > @@ -1146,9 +1146,12 @@ static int xenvif_tx_submit(struct xenvif_queue *queue)
> > 
> >  		if (unlikely(skb_has_frag_list(skb))) {
> >  			if (xenvif_handle_frag_list(queue, skb)) {
> > +				struct sk_buff *nskb =
> > +						skb_shinfo(skb)->frag_list;
> >  				if (net_ratelimit())
> >  					netdev_err(queue->vif->dev,
> >  						   "Not enough memory to consolidate frag_list!\n");
> > +				xenvif_skb_zerocopy_prepare(queue, nskb);
> >  				xenvif_skb_zerocopy_prepare(queue, skb);
> >  				kfree_skb(skb);
> >  				continue;
> 
> Whilst this fix will do the job, I think it would be better to get rid of the kfree_skb() from inside xenvif_handle_frag_list() and always deal with it here rather than having it happen in two different places. Something like the following...

+1 for having only one place. 

> 
> ---8<---
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 80aae3a32c2a..093c7b860772 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -1027,13 +1027,13 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
>  /* Consolidate skb with a frag_list into a brand new one with local pages on
>   * frags. Returns 0 or -ENOMEM if can't allocate new pages.
>   */
> -static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *skb)
> +static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *skb,
> +                                  struct sk_buff *nskb)
>  {
>         unsigned int offset = skb_headlen(skb);
>         skb_frag_t frags[MAX_SKB_FRAGS];
>         int i, f;
>         struct ubuf_info *uarg;
> -       struct sk_buff *nskb = skb_shinfo(skb)->frag_list;
> 
>         queue->stats.tx_zerocopy_sent += 2;
>         queue->stats.tx_frag_overflow++;
> @@ -1072,11 +1072,6 @@ static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *s
>                 skb_frag_size_set(&frags[i], len);
>         }
> 
> -       /* Copied all the bits from the frag list -- free it. */
> -       skb_frag_list_init(skb);
> -       xenvif_skb_zerocopy_prepare(queue, nskb);
> -       kfree_skb(nskb);
> -
>         /* Release all the original (foreign) frags. */
>         for (f = 0; f < skb_shinfo(skb)->nr_frags; f++)
>                 skb_frag_unref(skb, f);
> @@ -1145,7 +1140,11 @@ static int xenvif_tx_submit(struct xenvif_queue *queue)
>                 xenvif_fill_frags(queue, skb);
> 
>                 if (unlikely(skb_has_frag_list(skb))) {
> -                       if (xenvif_handle_frag_list(queue, skb)) {
> +                       struct sk_buff *nskb = skb_shinfo(skb)->frag_list;
> +
> +                       xenvif_skb_zerocopy_prepare(queue, nskb);
> +
> +                       if (xenvif_handle_frag_list(queue, skb, nskb)) {
>                                 if (net_ratelimit())
>                                         netdev_err(queue->vif->dev,
>                                                    "Not enough memory to consolidate frag_list!\n");
> @@ -1153,6 +1152,10 @@ static int xenvif_tx_submit(struct xenvif_queue *queue)
>                                 kfree_skb(skb);
>                                 continue;
>                         }
> +
> +                       /* Copied all the bits from the frag list. */
> +                       skb_frag_list_init(skb);
> +                       kfree(nskb);

I think you want kfree_skb here?

Wei.

>                 }
> 
>                 skb->dev      = queue->vif->dev;
> ---8<---
> 
> What do you think?
> 
>   Paul
> 
> > --
> > 2.7.4
>
Paul Durrant Feb. 28, 2019, 11:21 a.m. UTC | #4
> -----Original Message-----
> From: Wei Liu [mailto:wei.liu2@citrix.com]
> Sent: 28 February 2019 11:02
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: Igor Druzhinin <igor.druzhinin@citrix.com>; xen-devel@lists.xenproject.org;
> netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Wei Liu <wei.liu2@citrix.com>;
> davem@davemloft.net
> Subject: Re: [PATCH] xen-netback: fix occasional leak of grant ref mappings under memory pressure
> 
> On Thu, Feb 28, 2019 at 09:46:57AM +0000, Paul Durrant wrote:
> > > -----Original Message-----
> > > From: Igor Druzhinin [mailto:igor.druzhinin@citrix.com]
> > > Sent: 28 February 2019 02:03
> > > To: xen-devel@lists.xenproject.org; netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> > > Cc: Wei Liu <wei.liu2@citrix.com>; Paul Durrant <Paul.Durrant@citrix.com>; davem@davemloft.net;
> Igor
> > > Druzhinin <igor.druzhinin@citrix.com>
> > > Subject: [PATCH] xen-netback: fix occasional leak of grant ref mappings under memory pressure
> > >
> > > Zero-copy callback flag is not yet set on frag list skb at the moment
> > > xenvif_handle_frag_list() returns -ENOMEM. This eventually results in
> > > leaking grant ref mappings since xenvif_zerocopy_callback() is never
> > > called for these fragments. Those eventually build up and cause Xen
> > > to kill Dom0 as the slots get reused for new mappings.
> > >
> > > That behavior is observed under certain workloads where sudden spikes
> > > of page cache usage for writes coexist with active atomic skb allocations.
> > >
> > > Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> > > ---
> > >  drivers/net/xen-netback/netback.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> > > index 80aae3a..2023317 100644
> > > --- a/drivers/net/xen-netback/netback.c
> > > +++ b/drivers/net/xen-netback/netback.c
> > > @@ -1146,9 +1146,12 @@ static int xenvif_tx_submit(struct xenvif_queue *queue)
> > >
> > >  		if (unlikely(skb_has_frag_list(skb))) {
> > >  			if (xenvif_handle_frag_list(queue, skb)) {
> > > +				struct sk_buff *nskb =
> > > +						skb_shinfo(skb)->frag_list;
> > >  				if (net_ratelimit())
> > >  					netdev_err(queue->vif->dev,
> > >  						   "Not enough memory to consolidate frag_list!\n");
> > > +				xenvif_skb_zerocopy_prepare(queue, nskb);
> > >  				xenvif_skb_zerocopy_prepare(queue, skb);
> > >  				kfree_skb(skb);
> > >  				continue;
> >
> > Whilst this fix will do the job, I think it would be better to get rid of the kfree_skb() from
> inside xenvif_handle_frag_list() and always deal with it here rather than having it happen in two
> different places. Something like the following...
> 
> +1 for having only one place.
> 
> >
> > ---8<---
> > diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> > index 80aae3a32c2a..093c7b860772 100644
> > --- a/drivers/net/xen-netback/netback.c
> > +++ b/drivers/net/xen-netback/netback.c
> > @@ -1027,13 +1027,13 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
> >  /* Consolidate skb with a frag_list into a brand new one with local pages on
> >   * frags. Returns 0 or -ENOMEM if can't allocate new pages.
> >   */
> > -static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *skb)
> > +static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *skb,
> > +                                  struct sk_buff *nskb)
> >  {
> >         unsigned int offset = skb_headlen(skb);
> >         skb_frag_t frags[MAX_SKB_FRAGS];
> >         int i, f;
> >         struct ubuf_info *uarg;
> > -       struct sk_buff *nskb = skb_shinfo(skb)->frag_list;
> >
> >         queue->stats.tx_zerocopy_sent += 2;
> >         queue->stats.tx_frag_overflow++;
> > @@ -1072,11 +1072,6 @@ static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *s
> >                 skb_frag_size_set(&frags[i], len);
> >         }
> >
> > -       /* Copied all the bits from the frag list -- free it. */
> > -       skb_frag_list_init(skb);
> > -       xenvif_skb_zerocopy_prepare(queue, nskb);
> > -       kfree_skb(nskb);
> > -
> >         /* Release all the original (foreign) frags. */
> >         for (f = 0; f < skb_shinfo(skb)->nr_frags; f++)
> >                 skb_frag_unref(skb, f);
> > @@ -1145,7 +1140,11 @@ static int xenvif_tx_submit(struct xenvif_queue *queue)
> >                 xenvif_fill_frags(queue, skb);
> >
> >                 if (unlikely(skb_has_frag_list(skb))) {
> > -                       if (xenvif_handle_frag_list(queue, skb)) {
> > +                       struct sk_buff *nskb = skb_shinfo(skb)->frag_list;
> > +
> > +                       xenvif_skb_zerocopy_prepare(queue, nskb);
> > +
> > +                       if (xenvif_handle_frag_list(queue, skb, nskb)) {
> >                                 if (net_ratelimit())
> >                                         netdev_err(queue->vif->dev,
> >                                                    "Not enough memory to consolidate frag_list!\n");
> > @@ -1153,6 +1152,10 @@ static int xenvif_tx_submit(struct xenvif_queue *queue)
> >                                 kfree_skb(skb);
> >                                 continue;
> >                         }
> > +
> > +                       /* Copied all the bits from the frag list. */
> > +                       skb_frag_list_init(skb);
> > +                       kfree(nskb);
> 
> I think you want kfree_skb here?

No. nskb is the frag list... it is unlinked from skb by the call to skb_frag_list_init() and then it can be freed on its own. The skb is what we need to retain, because that now contains all the data.

  Cheers,

    Paul

> 
> Wei.
> 
> >                 }
> >
> >                 skb->dev      = queue->vif->dev;
> > ---8<---
> >
> > What do you think?
> >
> >   Paul
> >
> > > --
> > > 2.7.4
> >
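
In code terms, the handoff Paul describes is the following (a minimal illustrative sketch; skb_frag_list_init() is a one-line helper in include/linux/skbuff.h that just clears the frag_list pointer):

        struct sk_buff *nskb = skb_shinfo(skb)->frag_list;

        skb_frag_list_init(skb);   /* skb_shinfo(skb)->frag_list = NULL */
        kfree_skb(nskb);           /* the unlinked frag list now holds its
                                    * own sole reference and is freed on
                                    * its own; skb, which carries all the
                                    * consolidated data, is retained */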
Igor Druzhinin Feb. 28, 2019, 11:43 a.m. UTC | #5
On 28/02/2019 11:21, Paul Durrant wrote:
>>> @@ -1153,6 +1152,10 @@ static int xenvif_tx_submit(struct xenvif_queue *queue)
>>>                                 kfree_skb(skb);
>>>                                 continue;
>>>                         }
>>> +
>>> +                       /* Copied all the bits from the frag list. */
>>> +                       skb_frag_list_init(skb);
>>> +                       kfree(nskb);
>>
>> I think you want kfree_skb here?
> 
> No. nskb is the frag list... it is unlinked from skb by the call to skb_frag_list_init() and then it can be freed on its own. The skb is what we need to retain, because that now contains all the data.
> 

Are you saying previous code in xenvif_handle_frag_list() incorrectly
called kfree_skb()?

Igor
Paul Durrant Feb. 28, 2019, 11:49 a.m. UTC | #6
> -----Original Message-----
> From: Igor Druzhinin [mailto:igor.druzhinin@citrix.com]
> Sent: 28 February 2019 11:44
> To: Paul Durrant <Paul.Durrant@citrix.com>; Wei Liu <wei.liu2@citrix.com>
> Cc: xen-devel@lists.xenproject.org; netdev@vger.kernel.org; linux-kernel@vger.kernel.org;
> davem@davemloft.net
> Subject: Re: [PATCH] xen-netback: fix occasional leak of grant ref mappings under memory pressure
> 
> On 28/02/2019 11:21, Paul Durrant wrote:
> >>> @@ -1153,6 +1152,10 @@ static int xenvif_tx_submit(struct xenvif_queue *queue)
> >>>                                 kfree_skb(skb);
> >>>                                 continue;
> >>>                         }
> >>> +
> >>> +                       /* Copied all the bits from the frag list. */
> >>> +                       skb_frag_list_init(skb);
> >>> +                       kfree(nskb);
> >>
> >> I think you want kfree_skb here?
> >
> > No. nskb is the frag list... it is unlinked from skb by the call to skb_frag_list_init() and then it
> can be freed on its own. The skb is what we need to retain, because that now contains all the data.
> >
> 
> Are you saying previous code in xenvif_handle_frag_list() incorrectly
> called kfree_skb()?

No, it correctly called kfree_skb() on nskb in the success case. What Wei and I would prefer is a single place where the frag list is freed, in both the success and error cases.

  Paul

> 
> Igor
Paul Durrant Feb. 28, 2019, 12:07 p.m. UTC | #7
> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-bounces@lists.xenproject.org] On Behalf Of Paul Durrant
> Sent: 28 February 2019 11:22
> To: Wei Liu <wei.liu2@citrix.com>
> Cc: Igor Druzhinin <igor.druzhinin@citrix.com>; Wei Liu <wei.liu2@citrix.com>; netdev@vger.kernel.org;
> linux-kernel@vger.kernel.org; xen-devel@lists.xenproject.org; davem@davemloft.net
> Subject: Re: [Xen-devel] [PATCH] xen-netback: fix occasional leak of grant ref mappings under memory
> pressure
> 
> > -----Original Message-----
> > From: Wei Liu [mailto:wei.liu2@citrix.com]
> > Sent: 28 February 2019 11:02
> > To: Paul Durrant <Paul.Durrant@citrix.com>
> > Cc: Igor Druzhinin <igor.druzhinin@citrix.com>; xen-devel@lists.xenproject.org;
> > netdev@vger.kernel.org; linux-kernel@vger.kernel.org; Wei Liu <wei.liu2@citrix.com>;
> > davem@davemloft.net
> > Subject: Re: [PATCH] xen-netback: fix occasional leak of grant ref mappings under memory pressure
> >
> > On Thu, Feb 28, 2019 at 09:46:57AM +0000, Paul Durrant wrote:
> > > > -----Original Message-----
> > > > From: Igor Druzhinin [mailto:igor.druzhinin@citrix.com]
> > > > Sent: 28 February 2019 02:03
> > > > To: xen-devel@lists.xenproject.org; netdev@vger.kernel.org; linux-kernel@vger.kernel.org
> > > > Cc: Wei Liu <wei.liu2@citrix.com>; Paul Durrant <Paul.Durrant@citrix.com>; davem@davemloft.net;
> > Igor
> > > > Druzhinin <igor.druzhinin@citrix.com>
> > > > Subject: [PATCH] xen-netback: fix occasional leak of grant ref mappings under memory pressure
> > > >
> > > > Zero-copy callback flag is not yet set on frag list skb at the moment
> > > > xenvif_handle_frag_list() returns -ENOMEM. This eventually results in
> > > > leaking grant ref mappings since xenvif_zerocopy_callback() is never
> > > > called for these fragments. Those eventually build up and cause Xen
> > > > to kill Dom0 as the slots get reused for new mappings.
> > > >
> > > > That behavior is observed under certain workloads where sudden spikes
> > > > of page cache usage for writes coexist with active atomic skb allocations.
> > > >
> > > > Signed-off-by: Igor Druzhinin <igor.druzhinin@citrix.com>
> > > > ---
> > > >  drivers/net/xen-netback/netback.c | 3 +++
> > > >  1 file changed, 3 insertions(+)
> > > >
> > > > diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> > > > index 80aae3a..2023317 100644
> > > > --- a/drivers/net/xen-netback/netback.c
> > > > +++ b/drivers/net/xen-netback/netback.c
> > > > @@ -1146,9 +1146,12 @@ static int xenvif_tx_submit(struct xenvif_queue *queue)
> > > >
> > > >  		if (unlikely(skb_has_frag_list(skb))) {
> > > >  			if (xenvif_handle_frag_list(queue, skb)) {
> > > > +				struct sk_buff *nskb =
> > > > +						skb_shinfo(skb)->frag_list;
> > > >  				if (net_ratelimit())
> > > >  					netdev_err(queue->vif->dev,
> > > >  						   "Not enough memory to consolidate frag_list!\n");
> > > > +				xenvif_skb_zerocopy_prepare(queue, nskb);
> > > >  				xenvif_skb_zerocopy_prepare(queue, skb);
> > > >  				kfree_skb(skb);
> > > >  				continue;
> > >
> > > Whilst this fix will do the job, I think it would be better to get rid of the kfree_skb() from
> > inside xenvif_handle_frag_list() and always deal with it here rather than having it happen in two
> > different places. Something like the following...
> >
> > +1 for having only one place.
> >
> > >
> > > ---8<---
> > > diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> > > index 80aae3a32c2a..093c7b860772 100644
> > > --- a/drivers/net/xen-netback/netback.c
> > > +++ b/drivers/net/xen-netback/netback.c
> > > @@ -1027,13 +1027,13 @@ static void xenvif_tx_build_gops(struct xenvif_queue *queue,
> > >  /* Consolidate skb with a frag_list into a brand new one with local pages on
> > >   * frags. Returns 0 or -ENOMEM if can't allocate new pages.
> > >   */
> > > -static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *skb)
> > > +static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *skb,
> > > +                                  struct sk_buff *nskb)
> > >  {
> > >         unsigned int offset = skb_headlen(skb);
> > >         skb_frag_t frags[MAX_SKB_FRAGS];
> > >         int i, f;
> > >         struct ubuf_info *uarg;
> > > -       struct sk_buff *nskb = skb_shinfo(skb)->frag_list;
> > >
> > >         queue->stats.tx_zerocopy_sent += 2;
> > >         queue->stats.tx_frag_overflow++;
> > > @@ -1072,11 +1072,6 @@ static int xenvif_handle_frag_list(struct xenvif_queue *queue, struct sk_buff *s
> > >                 skb_frag_size_set(&frags[i], len);
> > >         }
> > >
> > > -       /* Copied all the bits from the frag list -- free it. */
> > > -       skb_frag_list_init(skb);
> > > -       xenvif_skb_zerocopy_prepare(queue, nskb);
> > > -       kfree_skb(nskb);
> > > -
> > >         /* Release all the original (foreign) frags. */
> > >         for (f = 0; f < skb_shinfo(skb)->nr_frags; f++)
> > >                 skb_frag_unref(skb, f);
> > > @@ -1145,7 +1140,11 @@ static int xenvif_tx_submit(struct xenvif_queue *queue)
> > >                 xenvif_fill_frags(queue, skb);
> > >
> > >                 if (unlikely(skb_has_frag_list(skb))) {
> > > -                       if (xenvif_handle_frag_list(queue, skb)) {
> > > +                       struct sk_buff *nskb = skb_shinfo(skb)->frag_list;
> > > +
> > > +                       xenvif_skb_zerocopy_prepare(queue, nskb);
> > > +
> > > +                       if (xenvif_handle_frag_list(queue, skb, nskb)) {
> > >                                 if (net_ratelimit())
> > >                                         netdev_err(queue->vif->dev,
> > >                                                    "Not enough memory to consolidate frag_list!\n");
> > > @@ -1153,6 +1152,10 @@ static int xenvif_tx_submit(struct xenvif_queue *queue)
> > >                                 kfree_skb(skb);
> > >                                 continue;
> > >                         }
> > > +
> > > +                       /* Copied all the bits from the frag list. */
> > > +                       skb_frag_list_init(skb);
> > > +                       kfree(nskb);
> >
> > I think you want kfree_skb here?
> 
> No. nskb is the frag list... it is unlinked from skb by the call to skb_frag_list_init() and then it
> can be freed on its own. The skb is what we need to retain, because that now contains all the data.

Sorry, I misread/misunderstood what you were getting at. Yes, I meant kfree_skb(nskb).

  Paul

> 
>   Cheers,
> 
>     Paul
> 
> >
> > Wei.
> >
> > >                 }
> > >
> > >                 skb->dev      = queue->vif->dev;
> > > ---8<---
> > >
> > > What do you think?
> > >
> > >   Paul
> > >
> > > > --
> > > > 2.7.4
> > >
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xenproject.org
> https://lists.xenproject.org/mailman/listinfo/xen-devel
Wei Liu Feb. 28, 2019, 12:37 p.m. UTC | #8
On Thu, Feb 28, 2019 at 12:07:07PM +0000, Paul Durrant wrote:
> Yes, I meant kfree_skb(nskb).
> 

In that case I think your patch looks fine.

Wei.
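
Taken together, the shape the thread converges on for xenvif_tx_submit() looks roughly like this (a sketch combining Paul's proposal with Wei's kfree_skb() correction; a hedged reconstruction, not the final committed code):

		if (unlikely(skb_has_frag_list(skb))) {
			struct sk_buff *nskb = skb_shinfo(skb)->frag_list;

			/* Arm the callback on the frag list up front so its
			 * grant ref mappings are released on any free path. */
			xenvif_skb_zerocopy_prepare(queue, nskb);

			if (xenvif_handle_frag_list(queue, skb, nskb)) {
				if (net_ratelimit())
					netdev_err(queue->vif->dev,
						   "Not enough memory to consolidate frag_list!\n");
				xenvif_skb_zerocopy_prepare(queue, skb);
				/* nskb is still linked as skb's frag list, so
				 * this frees both skbs and fires both
				 * zerocopy callbacks. */
				kfree_skb(skb);
				continue;
			}

			/* Success: everything was copied out of the frag
			 * list, so unlink and free it in this one place. */
			skb_frag_list_init(skb);
			kfree_skb(nskb);
		}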

Patch

diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index 80aae3a..2023317 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -1146,9 +1146,12 @@  static int xenvif_tx_submit(struct xenvif_queue *queue)
 
 		if (unlikely(skb_has_frag_list(skb))) {
 			if (xenvif_handle_frag_list(queue, skb)) {
+				struct sk_buff *nskb =
+						skb_shinfo(skb)->frag_list;
 				if (net_ratelimit())
 					netdev_err(queue->vif->dev,
 						   "Not enough memory to consolidate frag_list!\n");
+				xenvif_skb_zerocopy_prepare(queue, nskb);
 				xenvif_skb_zerocopy_prepare(queue, skb);
 				kfree_skb(skb);
 				continue;