GRO: fix merging a paged skb after non-paged skbs

Submitter Michal Schmidt
Date Jan. 24, 2011, 5:47 p.m.
Message ID <20110124184752.1d0947dd@delilah>
Permalink /patch/80221/
State Superseded
Delegated to: David Miller

Comments

Michal Schmidt - Jan. 24, 2011, 5:47 p.m.
Suppose that several linear skbs of the same flow were received by GRO. They
were thus merged into one skb with a frag_list. Then a new skb of the same flow
arrives, but it is a paged skb with data starting in its frags[].

Before adding the skb to the frag_list skb_gro_receive() will of course adjust
the skb to throw away the headers. It correctly modifies the page_offset and
size of the frag, but it leaves incorrect information in the skb:
 ->data_len is not decreased at all.
 ->len is decreased only by headlen, as if no change were done to the frag.
Later in a receiving process this causes skb_copy_datagram_iovec() to return
-EFAULT and this is seen in userspace as the result of the recv() syscall.

In practice the bug can be reproduced with the sfc driver. By default the
driver uses an adaptive scheme when it switches between using
napi_gro_receive() (with skbs) and napi_gro_frags() (with pages). The bug is
reproduced when under rx load with enough successful GRO merging the driver
decides to switch from the former to the latter.

Manual control is also possible, so reproducing this is easy with netcat:
 - on machine1 (with sfc): nc -l 12345 > /dev/null
 - on machine2: nc machine1 12345 < /dev/zero
 - on machine1:
   echo 1 > /sys/module/sfc/parameters/rx_alloc_method  # use skbs
   echo 2 > /sys/module/sfc/parameters/rx_alloc_method  # use pages
 - See that nc has quit suddenly.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
---
 net/core/skbuff.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)
Ben Hutchings - Jan. 25, 2011, 1:24 a.m.
On Mon, 2011-01-24 at 18:47 +0100, Michal Schmidt wrote:
> Suppose that several linear skbs of the same flow were received by GRO. They
> were thus merged into one skb with a frag_list. Then a new skb of the same flow
> arrives, but it is a paged skb with data starting in its frags[].
> 
> Before adding the skb to the frag_list skb_gro_receive() will of course adjust
> the skb to throw away the headers. It correctly modifies the page_offset and
> size of the frag, but it leaves incorrect information in the skb:
>  ->data_len is not decreased at all.
>  ->len is decreased only by headlen, as if no change were done to the frag.
> Later in a receiving process this causes skb_copy_datagram_iovec() to return
> -EFAULT and this is seen in userspace as the result of the recv() syscall.
> 
> In practice the bug can be reproduced with the sfc driver. By default the
> driver uses an adaptive scheme when it switches between using
> napi_gro_receive() (with skbs) and napi_gro_frags() (with pages). The bug is
> reproduced when under rx load with enough successful GRO merging the driver
> decides to switch from the former to the latter.
[...]

This is odd because I thought we made sure to flush before making such a
change.  Perhaps that got lost during the conversion from inet_lro to
GRO?

Anyway, thanks very much for fixing this.

Ben.
Ben Hutchings - Feb. 7, 2011, 8:39 p.m.
On Tue, 2011-01-25 at 11:24 +1000, Ben Hutchings wrote:
> On Mon, 2011-01-24 at 18:47 +0100, Michal Schmidt wrote:
> > Suppose that several linear skbs of the same flow were received by GRO. They
> > were thus merged into one skb with a frag_list. Then a new skb of the same flow
> > arrives, but it is a paged skb with data starting in its frags[].
> > 
> > Before adding the skb to the frag_list skb_gro_receive() will of course adjust
> > the skb to throw away the headers. It correctly modifies the page_offset and
> > size of the frag, but it leaves incorrect information in the skb:
> >  ->data_len is not decreased at all.
> >  ->len is decreased only by headlen, as if no change were done to the frag.
> > Later in a receiving process this causes skb_copy_datagram_iovec() to return
> > -EFAULT and this is seen in userspace as the result of the recv() syscall.
> > 
> > In practice the bug can be reproduced with the sfc driver. By default the
> > driver uses an adaptive scheme when it switches between using
> > napi_gro_receive() (with skbs) and napi_gro_frags() (with pages). The bug is
> > reproduced when under rx load with enough successful GRO merging the driver
> > decides to switch from the former to the latter.
> [...]
> 
> This is odd because I thought we made sure to flush before making such a
> change.  Perhaps that got lost during the conversion from inet_lro to
> GRO?

That is indeed the case; commit da3bc07171dff957906cbe2ad5abb443eccf57c4
made the following deletions:

-       /* Both our generic-LRO and SFC-SSR support skb and page based
-        * allocation, but neither support switching from one to the
-        * other on the fly. If we spot that the allocation mode has
-        * changed, then flush the LRO state.
-        */
-       if (unlikely(channel->rx_alloc_pop_pages != (rx_buf->page != NULL))) {
-               efx_flush_lro(channel);
-               channel->rx_alloc_pop_pages = (rx_buf->page != NULL);
-       }

Ben.

> Anyway, thanks very much for fixing this.
> 
> Ben.
>
Herbert Xu - Feb. 8, 2011, 8:49 a.m.
On Mon, Feb 07, 2011 at 08:39:20PM +0000, Ben Hutchings wrote:
> 
> That is indeed the case; commit da3bc07171dff957906cbe2ad5abb443eccf57c4
> made the following deletions:
> 
> -       /* Both our generic-LRO and SFC-SSR support skb and page based
> -        * allocation, but neither support switching from one to the
> -        * other on the fly. If we spot that the allocation mode has
> -        * changed, then flush the LRO state.
> -        */
> -       if (unlikely(channel->rx_alloc_pop_pages != (rx_buf->page != NULL))) {
> -               efx_flush_lro(channel);
> -               channel->rx_alloc_pop_pages = (rx_buf->page != NULL);
> -       }

Oops, sorry about that.

How about changing skb_gro_receive to detect such switches and
simply return an error, which should have the same effect as
flushing that flow?

Cheers,
Ben Hutchings - Feb. 8, 2011, 3:04 p.m.
On Tue, 2011-02-08 at 19:49 +1100, Herbert Xu wrote:
> On Mon, Feb 07, 2011 at 08:39:20PM +0000, Ben Hutchings wrote:
> > 
> > That is indeed the case; commit da3bc07171dff957906cbe2ad5abb443eccf57c4
> > made the following deletions:
> > 
> > -       /* Both our generic-LRO and SFC-SSR support skb and page based
> > -        * allocation, but neither support switching from one to the
> > -        * other on the fly. If we spot that the allocation mode has
> > -        * changed, then flush the LRO state.
> > -        */
> > -       if (unlikely(channel->rx_alloc_pop_pages != (rx_buf->page != NULL))) {
> > -               efx_flush_lro(channel);
> > -               channel->rx_alloc_pop_pages = (rx_buf->page != NULL);
> > -       }
> 
> Oops, sorry about that.
> 
> How about changing skb_gro_receive to detect such switches and
> simply return an error, which should have the same effect as
> flushing that flow?

That would work, though it looks like Michal has managed to make it
tolerate switches.  (I haven't yet tested the result myself.)

Ben.
Herbert Xu - Feb. 8, 2011, 8:54 p.m.
On Tue, Feb 08, 2011 at 03:04:44PM +0000, Ben Hutchings wrote:
>
> That would work, though it looks like Michal has managed to make it
> tolerate switches.  (I haven't yet tested the result myself.)

Well the question is do we really want to still keep merging
in case of a switch? It could potentially get messy if it switches
back over and over again.

However, I suppose other merging criteria will stop the merging
in such cases so it's probably not a big deal.

Cheers,

Patch

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index d31bb36..c231f5b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -2746,7 +2746,7 @@  merge:
 	if (offset > headlen) {
 		skbinfo->frags[0].page_offset += offset - headlen;
 		skbinfo->frags[0].size -= offset - headlen;
-		offset = headlen;
+		skb->data_len -= offset - headlen;
 	}
 
 	__skb_pull(skb, offset);