[net,RFC] xen-netback: Fix grant ref resolution in RX path

Message ID	1399991500-4432-1-git-send-email-zoltan.kiss@citrix.com
State	RFC, archived
Delegated to:	David Miller
Headers	show Return-Path: <netdev-owner@vger.kernel.org> From: Zoltan Kiss <zoltan.kiss@citrix.com> To: <xen-devel@lists.xenproject.org>, <ian.campbell@citrix.com>, <wei.liu2@citrix.com>, <linux@eikelenboom.it> CC: <paul.durrant@citrix.com>, <netdev@vger.kernel.org>, <david.vrabel@citrix.com>, <davem@davemloft.net>, Zoltan Kiss <zoltan.kiss@citrix.com> Subject: [PATCH net RFC] xen-netback: Fix grant ref resolution in RX path Date: Tue, 13 May 2014 15:31:40 +0100 Message-ID: <1399991500-4432-1-git-send-email-zoltan.kiss@citrix.com> MIME-Version: 1.0 Content-Type: text/plain Sender: netdev-owner@vger.kernel.org Precedence: bulk

Zoltan Kiss May 13, 2014, 2:31 p.m. UTC

The original series for reintroducing grant mapping for netback had a patch [1]
to handle receiving of packets from an another VIF. Grant copy on the receiving
side needs the grant ref of the page to set up the op.
The original patch assumed (wrongly) that the frags array haven't changed. In
the case reported by Sander, the sending guest sent a packet where the linear
buffer and the first frag were under PKT_PROT_LEN (=128) bytes.
xenvif_tx_submit() then pulled up the linear area to 128 bytes, and ditched the
first frag. The receiving side had an off-by-one problem when gathered the grant
refs.
This patch fixes that by checking whether the actual frag's page pointer is the
same as the page in the original frag list. It can handle any kind of changes on
the original frags array, like:
- removing granted frags from the beginning or the end
- adding local pages to the frags list
To keep it optimized to the most common cases, it doesn't handle when the order
of the original frags changed. That would require ubuf to be reseted to the
beginning of the chain (skb_shinfo(skb)->destructor_arg), and reiterating
through the list every time.

OPEN QUESTIONS:
- Is it a safe assumption that nothing changes the order of the original frags?
  Removing them from the array or injecting new pages anywhere is not a problem.
- I used UINT_MAX as a kind of INVALID_GRANT_REF, however there is no such thing
  in the grant mapping API. Should we codify this or is it better if we just
  find another way to distinguish whether a frag is local or not?
- Should this fix go to David's net tree or directly to the mainline tree? Or
  both?

[1]: 3e2234: xen-netback: Handle foreign mapped pages on the guest RX path

Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
---
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Ian Campbell May 13, 2014, 3:47 p.m. UTC | #1

On Tue, 2014-05-13 at 15:31 +0100, Zoltan Kiss wrote:
> The original series for reintroducing grant mapping for netback had a patch [1]
> to handle receiving of packets from an another VIF. Grant copy on the receiving
> side needs the grant ref of the page to set up the op.
> The original patch assumed (wrongly) that the frags array haven't changed. In
> the case reported by Sander, the sending guest sent a packet where the linear
> buffer and the first frag were under PKT_PROT_LEN (=128) bytes.
> xenvif_tx_submit() then pulled up the linear area to 128 bytes, and ditched the
> first frag. The receiving side had an off-by-one problem when gathered the grant
> refs.

The grant refs are gather here from the ubuf_info * link list stored in
skb->destructor_arg, is that right?

And the issue is that pulling up can consume a frag (e.g. and shift
everything up) without adjusting that linked list?

I think the commit message should make explicit reference to this linked
list and where it lives and make the implications of doing a pull up on
that list clear.

So the fix is when walking the list instead of assuming a 1:1
relationship between frags and list elements you need to traverse the
list looking for one which matches?

The ubuf's themselves are part of pending_tx_info -- is it the case that
the pending entries are retained until the skb completes even when the
corresponding frag is pulled up?

I suppose the network stack does pull ups itself -- so we can't fix this
by just fixing up the ubuf list as we do the initial PKT_PROT_LEN pull
up because there will be other pull ups which we don't control?

> This patch fixes that by checking whether the actual frag's page pointer is the
> same as the page in the original frag list. It can handle any kind of changes on
> the original frags array, like:
> - removing granted frags from the beginning or the end
> - adding local pages to the frags list
> To keep it optimized to the most common cases, it doesn't handle when the order
> of the original frags changed. That would require ubuf to be reseted to the
> beginning of the chain (skb_shinfo(skb)->destructor_arg), and reiterating
> through the list every time.

Do we need to keep the ubuf list around after this operation or could it
be destructive? I ask because by removing the entries from the list as
they are consumed the optimal case will then always have the next frag
in the head but it will handle reordering.

> OPEN QUESTIONS:
> - Is it a safe assumption that nothing changes the order of the original frags?
>   Removing them from the array or injecting new pages anywhere is not a problem.
> - I used UINT_MAX as a kind of INVALID_GRANT_REF, however there is no such thing
>   in the grant mapping API. Should we codify this or is it better if we just
>   find another way to distinguish whether a frag is local or not?

I think UINT_MAX is probably OK, but can't you avoid this by doing the
search in the existing loop over the frags which calls
xenvif_gop_frag_copy? If you do that then you can pass foreign_vif as
NULL in the cases where there is no ubuf entry.

> - Should this fix go to David's net tree or directly to the mainline tree? Or
>   both?

Via David as normal I think, but we need this for v3.15 final.

> [1]: 3e2234: xen-netback: Handle foreign mapped pages on the guest RX path
> 
> Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
> ---
> diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
> index 7666540..e18ac23 100644
> --- a/drivers/net/xen-netback/netback.c
> +++ b/drivers/net/xen-netback/netback.c
> @@ -278,7 +278,7 @@ static void xenvif_gop_frag_copy(struct xenvif *vif, struct sk_buff *skb,
>  		copy_gop->flags = GNTCOPY_dest_gref;
>  		copy_gop->len = bytes;
>  
> -		if (foreign_vif) {
> +		if (foreign_vif && (foreign_gref != UINT_MAX)) {
>  			copy_gop->source.domid = foreign_vif->domid;
>  			copy_gop->source.u.ref = foreign_gref;
>  			copy_gop->flags |= GNTCOPY_source_gref;
> @@ -388,15 +388,22 @@ static int xenvif_gop_skb(struct sk_buff *skb,
>  
>  	if ((skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY) &&
>  		 (ubuf->callback == &xenvif_zerocopy_callback)) {
> -		int i = 0;
>  		foreign_vif = ubuf_to_vif(ubuf);
>  
> -		do {
> -			u16 pending_idx = ubuf->desc;
> -			foreign_grefs[i++] =
> -				foreign_vif->pending_tx_info[pending_idx].req.gref;
> -			ubuf = (struct ubuf_info *) ubuf->ctx;
> -		} while (ubuf);
> +		for (i = 0; i < nr_frags; i++) {
> +			foreign_grefs[i] = UINT_MAX;
> +			do {
> +				u16 pending_idx = ubuf->desc;
> +
> +				ubuf = (struct ubuf_info *) ubuf->ctx;
> +				if (skb_shinfo(skb)->frags[i].page.p ==
> +					foreign_vif->mmap_pages[pending_idx]) {
> +					foreign_grefs[i] =
> +						foreign_vif->pending_tx_info[pending_idx].req.gref;
> +					break;
> +				}
> +			} while (ubuf);
> +		}
>  	}
>  
>  	data = skb->data;


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Eric Dumazet May 13, 2014, 4:13 p.m. UTC | #2

On Tue, 2014-05-13 at 15:31 +0100, Zoltan Kiss wrote:
> The original series for reintroducing grant mapping for netback had a patch [1]
> to handle receiving of packets from an another VIF. Grant copy on the receiving
> side needs the grant ref of the page to set up the op.
> The original patch assumed (wrongly) that the frags array haven't changed. In
> the case reported by Sander, the sending guest sent a packet where the linear
> buffer and the first frag were under PKT_PROT_LEN (=128) bytes.
> xenvif_tx_submit() then pulled up the linear area to 128 bytes, and ditched the
> first frag. The receiving side had an off-by-one problem when gathered the grant
> refs.
> This patch fixes that by checking whether the actual frag's page pointer is the
> same as the page in the original frag list. It can handle any kind of changes on
> the original frags array, like:
> - removing granted frags from the beginning or the end
> - adding local pages to the frags list
> To keep it optimized to the most common cases, it doesn't handle when the order
> of the original frags changed. That would require ubuf to be reseted to the
> beginning of the chain (skb_shinfo(skb)->destructor_arg), and reiterating
> through the list every time.
> 
> OPEN QUESTIONS:
> - Is it a safe assumption that nothing changes the order of the original frags?
>   Removing them from the array or injecting new pages anywhere is not a problem.
> - I used UINT_MAX as a kind of INVALID_GRANT_REF, however there is no such thing
>   in the grant mapping API. Should we codify this or is it better if we just
>   find another way to distinguish whether a frag is local or not?
> - Should this fix go to David's net tree or directly to the mainline tree? Or
>   both?
> 
> [1]: 3e2234: xen-netback: Handle foreign mapped pages on the guest RX path
> 
> Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
> ---


The 'cleanup' of stale ubufs should be right after __pskb_pull_tail().

This is the function that can 'consume frags' after all.

Its not clear that you catch all cases, like skbs being purged in case
of device dismantle.

I am not saying your patch is wrong, only that it adds yet an obscure
thing with no comments. In two years, nobody will understand this.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Sander Eikelenboom May 13, 2014, 7:25 p.m. UTC | #3

Tuesday, May 13, 2014, 6:13:55 PM, you wrote:

> On Tue, 2014-05-13 at 15:31 +0100, Zoltan Kiss wrote:
>> The original series for reintroducing grant mapping for netback had a patch [1]
>> to handle receiving of packets from an another VIF. Grant copy on the receiving
>> side needs the grant ref of the page to set up the op.
>> The original patch assumed (wrongly) that the frags array haven't changed. In
>> the case reported by Sander, the sending guest sent a packet where the linear
>> buffer and the first frag were under PKT_PROT_LEN (=128) bytes.
>> xenvif_tx_submit() then pulled up the linear area to 128 bytes, and ditched the
>> first frag. The receiving side had an off-by-one problem when gathered the grant
>> refs.
>> This patch fixes that by checking whether the actual frag's page pointer is the
>> same as the page in the original frag list. It can handle any kind of changes on
>> the original frags array, like:
>> - removing granted frags from the beginning or the end
>> - adding local pages to the frags list
>> To keep it optimized to the most common cases, it doesn't handle when the order
>> of the original frags changed. That would require ubuf to be reseted to the
>> beginning of the chain (skb_shinfo(skb)->destructor_arg), and reiterating
>> through the list every time.
>> 
>> OPEN QUESTIONS:
>> - Is it a safe assumption that nothing changes the order of the original frags?
>>   Removing them from the array or injecting new pages anywhere is not a problem.
>> - I used UINT_MAX as a kind of INVALID_GRANT_REF, however there is no such thing
>>   in the grant mapping API. Should we codify this or is it better if we just
>>   find another way to distinguish whether a frag is local or not?
>> - Should this fix go to David's net tree or directly to the mainline tree? Or
>>   both?
>> 
>> [1]: 3e2234: xen-netback: Handle foreign mapped pages on the guest RX path
>> 
>> Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
>> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
>> ---

Will test the patch tomorrow and see how this works out for me.

> The 'cleanup' of stale ubufs should be right after __pskb_pull_tail().

> This is the function that can 'consume frags' after all.

> Its not clear that you catch all cases, like skbs being purged in case
> of device dismantle.
Hmm that's quite important since a partial fix only makes the potential 
remaining cases even harder to spot and debug i'm afraid.

> I am not saying your patch is wrong, only that it adds yet an obscure
> thing with no comments. In two years, nobody will understand this.

--
Sander



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Zoltan Kiss May 13, 2014, 8:18 p.m. UTC | #4

On 13/05/14 16:47, Ian Campbell wrote:
> On Tue, 2014-05-13 at 15:31 +0100, Zoltan Kiss wrote:
>> The original series for reintroducing grant mapping for netback had a patch [1]
>> to handle receiving of packets from an another VIF. Grant copy on the receiving
>> side needs the grant ref of the page to set up the op.
>> The original patch assumed (wrongly) that the frags array haven't changed. In
>> the case reported by Sander, the sending guest sent a packet where the linear
>> buffer and the first frag were under PKT_PROT_LEN (=128) bytes.
>> xenvif_tx_submit() then pulled up the linear area to 128 bytes, and ditched the
>> first frag. The receiving side had an off-by-one problem when gathered the grant
>> refs.
>
> The grant refs are gather here from the ubuf_info * link list stored in
> skb->destructor_arg, is that right?
Yes
>
> And the issue is that pulling up can consume a frag (e.g. and shift
> everything up) without adjusting that linked list?
Yes. Or any kind of change on the frags list (expect changing size and 
offset of an entry) breaks this part of the code at the moment.

>
> I think the commit message should make explicit reference to this linked
> list and where it lives and make the implications of doing a pull up on
> that list clear.
Ok, I'll add a reference to the description in common.h
>
> So the fix is when walking the list instead of assuming a 1:1
> relationship between frags and list elements you need to traverse the
> list looking for one which matches?
Yes.
>
> The ubuf's themselves are part of pending_tx_info -- is it the case that
> the pending entries are retained until the skb completes even when the
> corresponding frag is pulled up?
Yes. Pulling does remove the page from the frag list and put_page it, 
and when the callback called, it'll make sure it is given back to the 
sender.
>
> I suppose the network stack does pull ups itself -- so we can't fix this
> by just fixing up the ubuf list as we do the initial PKT_PROT_LEN pull
> up because there will be other pull ups which we don't control?
Yes. And I guess it's better to assume that other things can happen, 
e.g. see pskb_trim
>
>> This patch fixes that by checking whether the actual frag's page pointer is the
>> same as the page in the original frag list. It can handle any kind of changes on
>> the original frags array, like:
>> - removing granted frags from the beginning or the end
>> - adding local pages to the frags list
>> To keep it optimized to the most common cases, it doesn't handle when the order
>> of the original frags changed. That would require ubuf to be reseted to the
>> beginning of the chain (skb_shinfo(skb)->destructor_arg), and reiterating
>> through the list every time.
>
> Do we need to keep the ubuf list around after this operation or could it
> be destructive? I ask because by removing the entries from the list as
> they are consumed the optimal case will then always have the next frag
> in the head but it will handle reordering.
Yes, we need this chained list until the zerocopy callback walks through 
it and add the pending_idx's to the dealloc queue. I was tinkering with 
the idea of moving the not matching entries back to the head of the 
list, as the callback doesn't really care about the order, and I guess 
the guest neither. But that would look even scarier.
>
>> OPEN QUESTIONS:
>> - Is it a safe assumption that nothing changes the order of the original frags?
>>    Removing them from the array or injecting new pages anywhere is not a problem.
>> - I used UINT_MAX as a kind of INVALID_GRANT_REF, however there is no such thing
>>    in the grant mapping API. Should we codify this or is it better if we just
>>    find another way to distinguish whether a frag is local or not?
>
> I think UINT_MAX is probably OK, but can't you avoid this by doing the
> search in the existing loop over the frags which calls
> xenvif_gop_frag_copy? If you do that then you can pass foreign_vif as
> NULL in the cases where there is no ubuf entry.
Yes, that sounds better! That would make a lot of things easier.

I've reworked the patch to include frag reordering as well.

Zoli

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Ian Campbell May 14, 2014, 8:32 a.m. UTC | #5

On Tue, 2014-05-13 at 09:13 -0700, Eric Dumazet wrote:
> On Tue, 2014-05-13 at 15:31 +0100, Zoltan Kiss wrote:
> > The original series for reintroducing grant mapping for netback had a patch [1]
> > to handle receiving of packets from an another VIF. Grant copy on the receiving
> > side needs the grant ref of the page to set up the op.
> > The original patch assumed (wrongly) that the frags array haven't changed. In
> > the case reported by Sander, the sending guest sent a packet where the linear
> > buffer and the first frag were under PKT_PROT_LEN (=128) bytes.
> > xenvif_tx_submit() then pulled up the linear area to 128 bytes, and ditched the
> > first frag. The receiving side had an off-by-one problem when gathered the grant
> > refs.
> > This patch fixes that by checking whether the actual frag's page pointer is the
> > same as the page in the original frag list. It can handle any kind of changes on
> > the original frags array, like:
> > - removing granted frags from the beginning or the end
> > - adding local pages to the frags list
> > To keep it optimized to the most common cases, it doesn't handle when the order
> > of the original frags changed. That would require ubuf to be reseted to the
> > beginning of the chain (skb_shinfo(skb)->destructor_arg), and reiterating
> > through the list every time.
> > 
> > OPEN QUESTIONS:
> > - Is it a safe assumption that nothing changes the order of the original frags?
> >   Removing them from the array or injecting new pages anywhere is not a problem.
> > - I used UINT_MAX as a kind of INVALID_GRANT_REF, however there is no such thing
> >   in the grant mapping API. Should we codify this or is it better if we just
> >   find another way to distinguish whether a frag is local or not?
> > - Should this fix go to David's net tree or directly to the mainline tree? Or
> >   both?
> > 
> > [1]: 3e2234: xen-netback: Handle foreign mapped pages on the guest RX path
> > 
> > Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
> > Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
> > ---
> 
> 
> The 'cleanup' of stale ubufs should be right after __pskb_pull_tail().
> 
> This is the function that can 'consume frags' after all.

That would be OK for the call to __pskb_pull_tail in netback itself --
but what about any other calls from other bits of the network stack
which don't know about this driver-specific data structure?

> Its not clear that you catch all cases, like skbs being purged in case
> of device dismantle.

Doesn't that go through the normal skb destroy path, as opposed to
manipulating an existing skb?

> I am not saying your patch is wrong, only that it adds yet an obscure
> thing with no comments. In two years, nobody will understand this.

Agreed.

Ian.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Zoltan Kiss May 14, 2014, 11:12 a.m. UTC | #6

On 13/05/14 17:13, Eric Dumazet wrote:
> On Tue, 2014-05-13 at 15:31 +0100, Zoltan Kiss wrote:
>> The original series for reintroducing grant mapping for netback had a patch [1]
>> to handle receiving of packets from an another VIF. Grant copy on the receiving
>> side needs the grant ref of the page to set up the op.
>> The original patch assumed (wrongly) that the frags array haven't changed. In
>> the case reported by Sander, the sending guest sent a packet where the linear
>> buffer and the first frag were under PKT_PROT_LEN (=128) bytes.
>> xenvif_tx_submit() then pulled up the linear area to 128 bytes, and ditched the
>> first frag. The receiving side had an off-by-one problem when gathered the grant
>> refs.
>> This patch fixes that by checking whether the actual frag's page pointer is the
>> same as the page in the original frag list. It can handle any kind of changes on
>> the original frags array, like:
>> - removing granted frags from the beginning or the end
>> - adding local pages to the frags list
>> To keep it optimized to the most common cases, it doesn't handle when the order
>> of the original frags changed. That would require ubuf to be reseted to the
>> beginning of the chain (skb_shinfo(skb)->destructor_arg), and reiterating
>> through the list every time.
>>
>> OPEN QUESTIONS:
>> - Is it a safe assumption that nothing changes the order of the original frags?
>>    Removing them from the array or injecting new pages anywhere is not a problem.
>> - I used UINT_MAX as a kind of INVALID_GRANT_REF, however there is no such thing
>>    in the grant mapping API. Should we codify this or is it better if we just
>>    find another way to distinguish whether a frag is local or not?
>> - Should this fix go to David's net tree or directly to the mainline tree? Or
>>    both?
>>
>> [1]: 3e2234: xen-netback: Handle foreign mapped pages on the guest RX path
>>
>> Reported-by: Sander Eikelenboom <linux@eikelenboom.it>
>> Signed-off-by: Zoltan Kiss <zoltan.kiss@citrix.com>
>> ---
>
>
> The 'cleanup' of stale ubufs should be right after __pskb_pull_tail().
We can't fix every place in the kernel where frags might be changed, 
especially with a netback specific stuff, so unfortunately that won't work
>
> This is the function that can 'consume frags' after all.
>
> Its not clear that you catch all cases, like skbs being purged in case
> of device dismantle.
We need this list for two reason:
a) give back the pages to the sending guest (kfree/skb_copy_ubufs)
b) find out the grant refs when the skb is sent to another vif
b) is handled by this patch. For a) netback doesn't mind if granted 
frags were removed and/or local ones were injected. It only needs to 
give back the pages, it doesn't matter how the skb ended up.
The only other problematic point if frags are passed around between 
skbs, I'll write a separate mail about it.

>
> I am not saying your patch is wrong, only that it adds yet an obscure
> thing with no comments. In two years, nobody will understand this.
I agree, in v3 there will be more comment lines than actual new code :)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Eric Dumazet May 14, 2014, 11:35 a.m. UTC | #7

On Wed, 2014-05-14 at 12:12 +0100, Zoltan Kiss wrote:
> On 13/05/14 17:13, Eric Dumazet wrote:
> >
> > The 'cleanup' of stale ubufs should be right after __pskb_pull_tail().
> We can't fix every place in the kernel where frags might be changed, 
> especially with a netback specific stuff, so unfortunately that won't work
> >
> > This is the function that can 'consume frags' after all.
> >
> > Its not clear that you catch all cases, like skbs being purged in case
> > of device dismantle.
> We need this list for two reason:
> a) give back the pages to the sending guest (kfree/skb_copy_ubufs)

So If networking stack takes a reference on one particular frag, or
releases a ref on a frag, how is it done exactly ?

Are you 'giving back' page to the guest later because ubuf chain is now
obsolete ???


> b) find out the grant refs when the skb is sent to another vif
> b) is handled by this patch. For a) netback doesn't mind if granted 
> frags were removed and/or local ones were injected. It only needs to 
> give back the pages, it doesn't matter how the skb ended up.

> The only other problematic point if frags are passed around between 
> skbs, I'll write a separate mail about it.

Well, it seems this is exactly the point.
You give 'very special skb' to the stack, expecting stack will never do
any frag games, like in skb_try_coalesce()



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Zoltan Kiss May 14, 2014, 1:25 p.m. UTC | #8

On 14/05/14 12:35, Eric Dumazet wrote:
> On Wed, 2014-05-14 at 12:12 +0100, Zoltan Kiss wrote:
>> On 13/05/14 17:13, Eric Dumazet wrote:
>>>
>>> The 'cleanup' of stale ubufs should be right after __pskb_pull_tail().
>> We can't fix every place in the kernel where frags might be changed,
>> especially with a netback specific stuff, so unfortunately that won't work
>>>
>>> This is the function that can 'consume frags' after all.
>>>
>>> Its not clear that you catch all cases, like skbs being purged in case
>>> of device dismantle.
>> We need this list for two reason:
>> a) give back the pages to the sending guest (kfree/skb_copy_ubufs)
>
> So If networking stack takes a reference on one particular frag, or
> releases a ref on a frag, how is it done exactly ?
Releasing a frag (= put_page) is not a problem, it will be given back to 
the guest when the skb is freed up. But taking an another ref is bad, 
because the destructor_arg is on shinfo, an another skb won't have the 
information about how to release that frag. That's why skb_orphan_frags 
exist, it triggers a local copy of the frags to be done, and release 
them back, so later on this feature doesn't cause a trouble. skb_clone 
does that as first thing, for example.
But of course it should be carefully checked that functions which place 
frags into another frags arrays should call orphan_frags, e.g. I guess 
skb_shift does such thing. That's what I intend to start another thread 
about.
>
> Are you 'giving back' page to the guest later because ubuf chain is now
> obsolete ???
>
>
>> b) find out the grant refs when the skb is sent to another vif
>> b) is handled by this patch. For a) netback doesn't mind if granted
>> frags were removed and/or local ones were injected. It only needs to
>> give back the pages, it doesn't matter how the skb ended up.
>
>> The only other problematic point if frags are passed around between
>> skbs, I'll write a separate mail about it.
>
> Well, it seems this is exactly the point.
> You give 'very special skb' to the stack, expecting stack will never do
> any frag games, like in skb_try_coalesce()
That's another function which needs an orphan_frag. Btw. usually these 
functions doesn't touch these packets, as they don't reach the upper 
layers of the stack, unless a frontend wants a socket connection to the 
backend domain.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

David Vrabel May 14, 2014, 1:37 p.m. UTC | #9

On 14/05/14 09:32, Ian Campbell wrote:
>> I am not saying your patch is wrong, only that it adds yet an obscure
>> thing with no comments. In two years, nobody will understand this.
> 
> Agreed.

The long term solution is to make grant copy use the supplied VA to copy
to/from which would mean we no longer need to pass the grant ref around
to do the copy.

This is non-trivial and would required an updated Xen.

It would also fix other use cases (such as a domU providing an iSCSI
target which is used as the block device for a local blkback).

David
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Ian Campbell May 14, 2014, 4:40 p.m. UTC | #10

On Wed, 2014-05-14 at 14:25 +0100, Zoltan Kiss wrote:

> But of course it should be carefully checked that functions which place 
> frags into another frags arrays should call orphan_frags, e.g. I guess 
> skb_shift does such thing. That's what I intend to start another thread 
> about.

I see Eric has posted a fix for skb_try_coalesce(), but is that likely
to be acceptable for 3.15? How many other similar patches are we
expecting and/or how long do you think this careful checking will take?

Given that we are now at 3.15-rc5 I think we need to decide how to
proceed, either push ahead fixing all these issues or (partially) revert
this netback feature and try again for 3.16.

Ian.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Sander Eikelenboom May 14, 2014, 5:28 p.m. UTC | #11

Wednesday, May 14, 2014, 6:40:39 PM, you wrote:

> On Wed, 2014-05-14 at 14:25 +0100, Zoltan Kiss wrote:

>> But of course it should be carefully checked that functions which place 
>> frags into another frags arrays should call orphan_frags, e.g. I guess 
>> skb_shift does such thing. That's what I intend to start another thread 
>> about.

> I see Eric has posted a fix for skb_try_coalesce(), but is that likely
> to be acceptable for 3.15? How many other similar patches are we
> expecting and/or how long do you think this careful checking will take?

> Given that we are now at 3.15-rc5 I think we need to decide how to
> proceed, either push ahead fixing all these issues or (partially) revert
> this netback feature and try again for 3.16.

> Ian.

Hi Zoltan,

Thanks for the patch, and for what it's worth:

With Zoltan's v3 patch, I have ran the same tests as i did while reporting the 
regression and when that worked out ok stressed netback and netfront some extra this afternoon.
The patch fixes the regression for me, no corruption, no problems with SSL 
connections anymore and i haven't spotted anything else.

So as a fix for the immediate regression you may put on a tested-by :-)

On the discussion about the rest of the fallout i don't think i contribute that 
much.

--
Sander

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Zoltan Kiss May 14, 2014, 6:44 p.m. UTC | #12

On 14/05/14 17:40, Ian Campbell wrote:
>
> On Wed, 2014-05-14 at 14:25 +0100, Zoltan Kiss wrote:
>
>> But of course it should be carefully checked that functions which place
>> frags into another frags arrays should call orphan_frags, e.g. I guess
>> skb_shift does such thing. That's what I intend to start another thread
>> about.
>
> I see Eric has posted a fix for skb_try_coalesce(), but is that likely
> to be acceptable for 3.15? How many other similar patches are we
> expecting and/or how long do you think this careful checking will take?
>
> Given that we are now at 3.15-rc5 I think we need to decide how to
> proceed, either push ahead fixing all these issues or (partially) revert
> this netback feature and try again for 3.16.
I guess these problems apply to KVM as well, and it haven't caused them 
problem so far, so I guess the problem shouldn't be that big. I assume 
they are using this feature to send fragmented skbs out from a guest, 
but I might be wrong.
Anyway, I think I'll check tomorrow if there are further places to worry 
about, but I don't think if there are any of them which are easily 
reproducible.

Zoli
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Ian Campbell May 15, 2014, 8:31 a.m. UTC | #13

On Wed, 2014-05-14 at 12:12 +0100, Zoltan Kiss wrote:
> > The 'cleanup' of stale ubufs should be right after __pskb_pull_tail().
> We can't fix every place in the kernel where frags might be changed, 
> especially with a netback specific stuff, so unfortunately that won't work

Is it worth fixing up the ones in netback though so that the things
injected into the stack are consistent when we hand them over? It would
avoid some search overhead on the rx path at the other end I guess?
Perhaps not significant though.

Ian.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Ian Campbell May 15, 2014, 8:41 a.m. UTC | #14

On Wed, 2014-05-14 at 19:44 +0100, Zoltan Kiss wrote:
> On 14/05/14 17:40, Ian Campbell wrote:
> >
> > On Wed, 2014-05-14 at 14:25 +0100, Zoltan Kiss wrote:
> >
> >> But of course it should be carefully checked that functions which place
> >> frags into another frags arrays should call orphan_frags, e.g. I guess
> >> skb_shift does such thing. That's what I intend to start another thread
> >> about.
> >
> > I see Eric has posted a fix for skb_try_coalesce(), but is that likely
> > to be acceptable for 3.15? How many other similar patches are we
> > expecting and/or how long do you think this careful checking will take?
> >
> > Given that we are now at 3.15-rc5 I think we need to decide how to
> > proceed, either push ahead fixing all these issues or (partially) revert
> > this netback feature and try again for 3.16.
> I guess these problems apply to KVM as well, and it haven't caused them 
> problem so far, so I guess the problem shouldn't be that big. I assume 
> they are using this feature to send fragmented skbs out from a guest, 
> but I might be wrong.
> Anyway, I think I'll check tomorrow if there are further places to worry 
> about, but I don't think if there are any of them which are easily 
> reproducible.

OK, thanks. Assuming Eric's patch is acceptable for 3.15 it sounds like
that is the way to go.

Ian.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Zoltan Kiss May 15, 2014, 10:14 a.m. UTC | #15

On 15/05/14 09:31, Ian Campbell wrote:
> On Wed, 2014-05-14 at 12:12 +0100, Zoltan Kiss wrote:
>>> The 'cleanup' of stale ubufs should be right after __pskb_pull_tail().
>> We can't fix every place in the kernel where frags might be changed,
>> especially with a netback specific stuff, so unfortunately that won't work
>
> Is it worth fixing up the ones in netback though so that the things
> injected into the stack are consistent when we hand them over? It would
> avoid some search overhead on the rx path at the other end I guess?
> Perhaps not significant though.

There are plans to remove that unconditional pull, as it damages 
performance. It is better to use during checksum setup maybe_pull_tail, 
and pull up whatever is needed for checksum setup (this is already done, 
partially). A sensible netfront would send the header in the first slot 
anyway, so netback won't pull, and it definitely won't pull the whole 
first frag.

Regards,

Zoli
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[net,RFC] xen-netback: Fix grant ref resolution in RX path

Commit Message

Comments

Patch