
[net-next,3/3] udp: only use paged allocation with scatter-gather

Message ID 20180514230747.118875-4-willemdebruijn.kernel@gmail.com
State Changes Requested, archived
Series udp gso fixes

Commit Message

Willem de Bruijn May 14, 2018, 11:07 p.m. UTC
From: Willem de Bruijn <willemb@google.com>

Paged allocation stores most payload in skb frags. This helps udp gso
by avoiding copying from the gso skb to segment skb in skb_segment.

But without scatter-gather, data must be linear, so do not use paged
mode unless NETIF_F_SG.

Fixes: 15e36f5b8e98 ("udp: paged allocation with gso")
Reported-by: Sean Tranchetti <stranche@codeaurora.org>
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 net/ipv4/ip_output.c  | 2 +-
 net/ipv6/ip6_output.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Comments

Eric Dumazet May 14, 2018, 11:12 p.m. UTC | #1
On 05/14/2018 04:07 PM, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> Paged allocation stores most payload in skb frags. This helps udp gso
> by avoiding copying from the gso skb to segment skb in skb_segment.
> 
> But without scatter-gather, data must be linear, so do not use paged
> mode unless NETIF_F_SG.
> 
> Fixes: 15e36f5b8e98 ("udp: paged allocation with gso")
> Reported-by: Sean Tranchetti <stranche@codeaurora.org>
> Signed-off-by: Willem de Bruijn <willemb@google.com>
> ---
>  net/ipv4/ip_output.c  | 2 +-
>  net/ipv6/ip6_output.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index b5e21eb198d8..b38731d8a44f 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -884,7 +884,7 @@ static int __ip_append_data(struct sock *sk,
>  
>  	exthdrlen = !skb ? rt->dst.header_len : 0;
>  	mtu = cork->gso_size ? IP_MAX_MTU : cork->fragsize;
> -	paged = !!cork->gso_size;
> +	paged = cork->gso_size && (rt->dst.dev->features & NETIF_F_SG);
>  
>  	if (cork->tx_flags & SKBTX_ANY_SW_TSTAMP &&
>  	    sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index 7f4493080df6..35a940b9f208 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -1262,7 +1262,7 @@ static int __ip6_append_data(struct sock *sk,
>  		dst_exthdrlen = rt->dst.header_len - rt->rt6i_nfheader_len;
>  	}
>  
> -	paged = !!cork->gso_size;
> +	paged = cork->gso_size && (rt->dst.dev->features & NETIF_F_SG);
>  	mtu = cork->gso_size ? IP6_MAX_MTU : cork->fragsize;
>  	orig_mtu = mtu;
>  
> 

As I said, this won't help for stacked devices.

Bonding might advertise NETIF_F_SG, but one slave might not.
Willem de Bruijn May 14, 2018, 11:30 p.m. UTC | #2
On Mon, May 14, 2018 at 7:12 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> On 05/14/2018 04:07 PM, Willem de Bruijn wrote:
>> From: Willem de Bruijn <willemb@google.com>
>>
>> Paged allocation stores most payload in skb frags. This helps udp gso
>> by avoiding copying from the gso skb to segment skb in skb_segment.
>>
>> But without scatter-gather, data must be linear, so do not use paged
>> mode unless NETIF_F_SG.
>>
>> Fixes: 15e36f5b8e98 ("udp: paged allocation with gso")
>> Reported-by: Sean Tranchetti <stranche@codeaurora.org>
>> Signed-off-by: Willem de Bruijn <willemb@google.com>
>> ---
>>  net/ipv4/ip_output.c  | 2 +-
>>  net/ipv6/ip6_output.c | 2 +-
>>  2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
>> index b5e21eb198d8..b38731d8a44f 100644
>> --- a/net/ipv4/ip_output.c
>> +++ b/net/ipv4/ip_output.c
>> @@ -884,7 +884,7 @@ static int __ip_append_data(struct sock *sk,
>>
>>       exthdrlen = !skb ? rt->dst.header_len : 0;
>>       mtu = cork->gso_size ? IP_MAX_MTU : cork->fragsize;
>> -     paged = !!cork->gso_size;
>> +     paged = cork->gso_size && (rt->dst.dev->features & NETIF_F_SG);
>>
>>       if (cork->tx_flags & SKBTX_ANY_SW_TSTAMP &&
>>           sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)
>> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
>> index 7f4493080df6..35a940b9f208 100644
>> --- a/net/ipv6/ip6_output.c
>> +++ b/net/ipv6/ip6_output.c
>> @@ -1262,7 +1262,7 @@ static int __ip6_append_data(struct sock *sk,
>>               dst_exthdrlen = rt->dst.header_len - rt->rt6i_nfheader_len;
>>       }
>>
>> -     paged = !!cork->gso_size;
>> +     paged = cork->gso_size && (rt->dst.dev->features & NETIF_F_SG);
>>       mtu = cork->gso_size ? IP6_MAX_MTU : cork->fragsize;
>>       orig_mtu = mtu;
>>
>>
>
> As I said, this wont help for stacked device
>
> bonding might advertise NETIF_F_SG, but one slave might not.

I don't quite follow. The reported crash happens in the protocol layer,
because of this check. With pagedlen we have not allocated
sufficient space for the skb_put.

                if (!(rt->dst.dev->features&NETIF_F_SG)) {
                        unsigned int off;

                        off = skb->len;
                        if (getfrag(from, skb_put(skb, copy),
                                        offset, copy, off, skb) < 0) {
                                __skb_trim(skb, off);
                                err = -EFAULT;
                                goto error;
                        }
                } else {
                        int i = skb_shinfo(skb)->nr_frags;

Are you referring to a separate potential issue in the gso layer?
If a bonding device advertises SG, but a slave does not, then
skb_segment on the slave should build linear segs? I have not
tested that.
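
To make the failure mode above concrete, here is a minimal userspace sketch, not kernel code. The names `toy_skb`, `toy_alloc`, `toy_put` and `toy_append` are hypothetical, and the allocation rule is simplified: paged mode reserves linear room for headers only, whereas the real function also caps the linear part. It only illustrates why a linear copy cannot fit once the skb was sized for paged mode, and how the patched `paged` expression avoids that on a device without NETIF_F_SG.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the linear area of an sk_buff. */
struct toy_skb {
	size_t len;      /* bytes used in the linear area */
	size_t capacity; /* bytes allocated for the linear area */
};

/* Simplified: paged mode reserves linear room for headers only. */
static struct toy_skb toy_alloc(size_t hdrlen, size_t payload, bool paged)
{
	struct toy_skb skb = { .len = hdrlen,
			       .capacity = paged ? hdrlen : hdrlen + payload };
	return skb;
}

/* Models skb_put(): false stands for the overflow that panics in the kernel. */
static bool toy_put(struct toy_skb *skb, size_t copy)
{
	assert(skb->len <= skb->capacity);
	if (copy > skb->capacity - skb->len)
		return false;
	skb->len += copy;
	return true;
}

/* One append call; 'patched' selects the paged expression from this patch. */
static bool toy_append(size_t gso_size, bool dev_sg, bool patched,
		       size_t hdrlen, size_t payload)
{
	bool paged = patched ? (gso_size && dev_sg) : !!gso_size;
	struct toy_skb skb = toy_alloc(hdrlen, payload, paged);

	if (!dev_sg)
		return toy_put(&skb, payload); /* linear copy path */
	return true; /* SG path: payload would be attached as frags */
}
```

In this model, `toy_append(1400, false, false, 28, 4000)` fails because the skb was sized for paged mode, while the same call with `patched` set to true succeeds.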
Eric Dumazet May 14, 2018, 11:45 p.m. UTC | #3
On 05/14/2018 04:30 PM, Willem de Bruijn wrote:

> I don't quite follow. The reported crash happens in the protocol layer,
> because of this check. With pagedlen we have not allocated
> sufficient space for the skb_put.
> 
>                 if (!(rt->dst.dev->features&NETIF_F_SG)) {
>                         unsigned int off;
> 
>                         off = skb->len;
>                         if (getfrag(from, skb_put(skb, copy),
>                                         offset, copy, off, skb) < 0) {
>                                 __skb_trim(skb, off);
>                                 err = -EFAULT;
>                                 goto error;
>                         }
>                 } else {
>                         int i = skb_shinfo(skb)->nr_frags;
> 
> Are you referring to a separate potential issue in the gso layer?
> If a bonding device advertises SG, but a slave does not, then
> skb_segment on the slave should build linear segs? I have not
> tested that.

Given that the device attribute could change under us, we need to not
crash, even if initially we thought NETIF_F_SG was available.

Unless you want to hold RTNL in UDP xmit :)

Ideally, GSO should be always on, as we did for TCP.

Otherwise, I can guarantee syzkaller will hit again.
Willem de Bruijn May 15, 2018, 2:14 p.m. UTC | #4
On Mon, May 14, 2018 at 7:45 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
> On 05/14/2018 04:30 PM, Willem de Bruijn wrote:
>
>> I don't quite follow. The reported crash happens in the protocol layer,
>> because of this check. With pagedlen we have not allocated
>> sufficient space for the skb_put.
>>
>>                 if (!(rt->dst.dev->features&NETIF_F_SG)) {
>>                         unsigned int off;
>>
>>                         off = skb->len;
>>                         if (getfrag(from, skb_put(skb, copy),
>>                                         offset, copy, off, skb) < 0) {
>>                                 __skb_trim(skb, off);
>>                                 err = -EFAULT;
>>                                 goto error;
>>                         }
>>                 } else {
>>                         int i = skb_shinfo(skb)->nr_frags;
>>
>> Are you referring to a separate potential issue in the gso layer?
>> If a bonding device advertises SG, but a slave does not, then
>> skb_segment on the slave should build linear segs? I have not
>> tested that.
>
> Given that the device attribute could change under us, we need to not
> crash, even if initially we thought NETIF_F_SG was available.
>
> Unless you want to hold RTNL in UDP xmit :)
>
> Ideally, GSO should be always on, as we did for TCP.
>
> Otherwise, I can guarantee syzkaller will hit again.

Ah, right. Thanks, Eric!

I'll read that feature bit only once.
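
As a rough illustration of reading the bit once, the userspace sketch below compares testing the flag at each use with taking a single snapshot. Everything here is hypothetical (`toy_dev`, `toy_plan`, `TOY_F_SG`, and the `between` hook that simulates the flag being cleared mid-call); it is not the kernel code. A snapshot only keeps the decisions within one call consistent, so it does not cover a flag change between two corked calls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_F_SG (UINT64_C(1) << 0) /* hypothetical stand-in for NETIF_F_SG */

struct toy_dev {
	uint64_t features;
};

struct toy_plan {
	bool paged;       /* allocate headers only, payload in frags */
	bool linear_copy; /* later branch: copy payload into the linear area */
};

/* Simulates another context clearing the SG bit mid-call. */
static void toy_clear_sg(struct toy_dev *dev)
{
	dev->features &= ~TOY_F_SG;
}

/* Tests the bit at each use, so the two decisions can disagree. */
static struct toy_plan plan_read_twice(struct toy_dev *dev, bool gso,
				       void (*between)(struct toy_dev *))
{
	struct toy_plan p;

	assert(dev != NULL);
	p.paged = gso && (dev->features & TOY_F_SG) != 0;
	if (between)
		between(dev);
	p.linear_copy = (dev->features & TOY_F_SG) == 0;
	return p;
}

/* Takes one snapshot, so both decisions see the same value. */
static struct toy_plan plan_read_once(struct toy_dev *dev, bool gso,
				      void (*between)(struct toy_dev *))
{
	struct toy_plan p;
	bool sg;

	assert(dev != NULL);
	sg = (dev->features & TOY_F_SG) != 0;
	p.paged = gso && sg;
	if (between)
		between(dev);
	p.linear_copy = !sg;
	return p;
}
```

With the flag cleared between the two reads, `plan_read_twice` ends up with both `paged` and `linear_copy` set, which is the inconsistent combination; `plan_read_once` does not.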
Willem de Bruijn May 15, 2018, 8:04 p.m. UTC | #5
On Tue, May 15, 2018 at 10:14 AM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> On Mon, May 14, 2018 at 7:45 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>
>> On 05/14/2018 04:30 PM, Willem de Bruijn wrote:
>>
>>> I don't quite follow. The reported crash happens in the protocol layer,
>>> because of this check. With pagedlen we have not allocated
>>> sufficient space for the skb_put.
>>>
>>>                 if (!(rt->dst.dev->features&NETIF_F_SG)) {
>>>                         unsigned int off;
>>>
>>>                         off = skb->len;
>>>                         if (getfrag(from, skb_put(skb, copy),
>>>                                         offset, copy, off, skb) < 0) {
>>>                                 __skb_trim(skb, off);
>>>                                 err = -EFAULT;
>>>                                 goto error;
>>>                         }
>>>                 } else {
>>>                         int i = skb_shinfo(skb)->nr_frags;
>>>
>>> Are you referring to a separate potential issue in the gso layer?
>>> If a bonding device advertises SG, but a slave does not, then
>>> skb_segment on the slave should build linear segs? I have not
>>> tested that.
>>
>> Given that the device attribute could change under us, we need to not
>> crash, even if initially we thought NETIF_F_SG was available.
>>
>> Unless you want to hold RTNL in UDP xmit :)
>>
>> Ideally, GSO should be always on, as we did for TCP.
>>
>> Otherwise, I can guarantee syzkaller will hit again.
>
> Ah, right. Thanks, Eric!
>
> I'll read that feature bit only once.

This issue is actually deeper and not specific to gso.
With corking it is trivial to turn off sg in between calls.

I'll need to send a separate fix for that.
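
The corking case can be sketched in userspace as follows. This is a simplified model under stated assumptions, not the kernel logic: `toy_cork`, `toy_append` and `TOY_MTU` are hypothetical, the first call is assumed to size the linear area for its own data when SG is on and for a full MTU when SG is off, and later calls either copy linearly or attach a frag. It shows that a per-call snapshot of the flag does not help when the flag changes between two calls on the same pending skb.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MTU 1500 /* hypothetical MTU used to size a non-SG allocation */

/* Hypothetical corked state: one pending skb shared across calls. */
struct toy_cork {
	bool   have_skb;
	size_t len;        /* bytes in the linear area */
	size_t capacity;   /* bytes allocated for the linear area */
	size_t frag_bytes; /* bytes attached as page frags */
};

/*
 * One append call (assumes copy <= TOY_MTU on the first call).
 * Returns false where the kernel's skb_put() would run past the buffer.
 */
static bool toy_append(struct toy_cork *cork, bool dev_sg, size_t copy)
{
	const bool sg = dev_sg; /* per-call snapshot of the feature bit */

	assert(cork != NULL);
	if (!cork->have_skb) {
		/* First call: allocate and fill the linear area. */
		cork->have_skb = true;
		cork->capacity = sg ? copy : TOY_MTU;
		cork->len = copy;
		cork->frag_bytes = 0;
		return true;
	}
	if (!sg) {
		/* Linear copy path: needs tailroom in the existing skb. */
		if (copy > cork->capacity - cork->len)
			return false;
		cork->len += copy;
	} else {
		cork->frag_bytes += copy; /* SG path: attach as a frag */
	}
	return true;
}
```

Calling `toy_append` first with SG on and then with SG off on the same `toy_cork` fails on the second call, because the linear area was sized while SG was still advertised.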
Willem de Bruijn May 15, 2018, 11:57 p.m. UTC | #6
On Tue, May 15, 2018 at 4:04 PM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> On Tue, May 15, 2018 at 10:14 AM, Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
>> On Mon, May 14, 2018 at 7:45 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>
>>>
>>> On 05/14/2018 04:30 PM, Willem de Bruijn wrote:
>>>
>>>> I don't quite follow. The reported crash happens in the protocol layer,
>>>> because of this check. With pagedlen we have not allocated
>>>> sufficient space for the skb_put.
>>>>
>>>>                 if (!(rt->dst.dev->features&NETIF_F_SG)) {
>>>>                         unsigned int off;
>>>>
>>>>                         off = skb->len;
>>>>                         if (getfrag(from, skb_put(skb, copy),
>>>>                                         offset, copy, off, skb) < 0) {
>>>>                                 __skb_trim(skb, off);
>>>>                                 err = -EFAULT;
>>>>                                 goto error;
>>>>                         }
>>>>                 } else {
>>>>                         int i = skb_shinfo(skb)->nr_frags;
>>>>
>>>> Are you referring to a separate potential issue in the gso layer?
>>>> If a bonding device advertises SG, but a slave does not, then
>>>> skb_segment on the slave should build linear segs? I have not
>>>> tested that.
>>>
>>> Given that the device attribute could change under us, we need to not
>>> crash, even if initially we thought NETIF_F_SG was available.
>>>
>>> Unless you want to hold RTNL in UDP xmit :)
>>>
>>> Ideally, GSO should be always on, as we did for TCP.
>>>
>>> Otherwise, I can guarantee syzkaller will hit again.
>>
>> Ah, right. Thanks, Eric!
>>
>> I'll read that feature bit only once.
>
> This issue is actually deeper and not specific to gso.
> With corking it is trivial to turn off sg in between calls.
>
> I'll need to send a separate fix for that.

This would do it. The extra branch is unfortunate, but I see no easy
way around it for the corking case.

It will obviously not build a linear skb, but validate_xmit_skb will clean
that up for such edge cases.

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 66340ab750e6..e7daec7c7421 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1040,7 +1040,8 @@ static int __ip_append_data(struct sock *sk,
                if (copy > length)
                        copy = length;

-               if (!(rt->dst.dev->features&NETIF_F_SG)) {
+               if (!(rt->dst.dev->features&NETIF_F_SG) &&
+                   skb_tailroom(skb) >= copy) {
                        unsigned int off;
Willem de Bruijn May 16, 2018, 8:10 p.m. UTC | #7
On Tue, May 15, 2018 at 7:57 PM, Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> On Tue, May 15, 2018 at 4:04 PM, Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
>> On Tue, May 15, 2018 at 10:14 AM, Willem de Bruijn
>> <willemdebruijn.kernel@gmail.com> wrote:
>>> On Mon, May 14, 2018 at 7:45 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>>
>>>>
>>>> On 05/14/2018 04:30 PM, Willem de Bruijn wrote:
>>>>
>>>>> I don't quite follow. The reported crash happens in the protocol layer,
>>>>> because of this check. With pagedlen we have not allocated
>>>>> sufficient space for the skb_put.
>>>>>
>>>>>                 if (!(rt->dst.dev->features&NETIF_F_SG)) {
>>>>>                         unsigned int off;
>>>>>
>>>>>                         off = skb->len;
>>>>>                         if (getfrag(from, skb_put(skb, copy),
>>>>>                                         offset, copy, off, skb) < 0) {
>>>>>                                 __skb_trim(skb, off);
>>>>>                                 err = -EFAULT;
>>>>>                                 goto error;
>>>>>                         }
>>>>>                 } else {
>>>>>                         int i = skb_shinfo(skb)->nr_frags;
>>>>>
>>>>> Are you referring to a separate potential issue in the gso layer?
>>>>> If a bonding device advertises SG, but a slave does not, then
>>>>> skb_segment on the slave should build linear segs? I have not
>>>>> tested that.
>>>>
>>>> Given that the device attribute could change under us, we need to not
>>>> crash, even if initially we thought NETIF_F_SG was available.
>>>>
>>>> Unless you want to hold RTNL in UDP xmit :)
>>>>
>>>> Ideally, GSO should be always on, as we did for TCP.
>>>>
>>>> Otherwise, I can guarantee syzkaller will hit again.
>>>
>>> Ah, right. Thanks, Eric!
>>>
>>> I'll read that feature bit only once.
>>
>> This issue is actually deeper and not specific to gso.
>> With corking it is trivial to turn off sg in between calls.
>>
>> I'll need to send a separate fix for that.
>
> This would do it. The extra branch is unfortunate, but I see no easy
> way around it for the corking case.
>
> It will obviously not build a linear skb, but validate_xmit_skb will clean
> that up for such edge cases.
>
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 66340ab750e6..e7daec7c7421 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -1040,7 +1040,8 @@ static int __ip_append_data(struct sock *sk,
>                 if (copy > length)
>                         copy = length;
>
> -               if (!(rt->dst.dev->features&NETIF_F_SG)) {
> +               if (!(rt->dst.dev->features&NETIF_F_SG) &&
> +                   skb_tailroom(skb) >= copy) {
>                         unsigned int off;

Reminder that this is a separate draft patch to net unrelated to gso.

A simpler branch

> -               if (!(rt->dst.dev->features&NETIF_F_SG)) {
> +               if (skb_tailroom(skb) >= copy) {

is probably sufficient, but might have subtle side-effects when SG is
off, where allocation padding allows data to fit that would currently be
added as a frag. That is risky for a stable patch, with no significant benefit.

On the other extreme, I can define

  bool sg = rt->dst.dev->features & NETIF_F_SG;

and refer to that in both current sites that test the flag. But this
will not help the corking case where the function is entered twice
for the same skb. I'll add that in the net-next gso fix where the flag
is tested three times.

But I intend to send this snippet (also for v6) as is.
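
For comparison, the three branch conditions discussed in this thread can be written as a small userspace truth function. This is only a sketch: `toy_guard`, `toy_copy_linear` and `toy_overflows` are hypothetical names, and the model reduces each condition to "copy into the linear area" versus "attach as a frag".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_guard {
	GUARD_CURRENT,  /* !sg */
	GUARD_PROPOSED, /* !sg && tailroom >= copy */
	GUARD_SIMPLER,  /* tailroom >= copy */
};

/* True if the payload is copied into the linear area, false if it becomes a frag. */
static bool toy_copy_linear(enum toy_guard guard, bool sg,
			    size_t tailroom, size_t copy)
{
	switch (guard) {
	case GUARD_CURRENT:
		return !sg;
	case GUARD_PROPOSED:
		return !sg && tailroom >= copy;
	case GUARD_SIMPLER:
		return tailroom >= copy;
	}
	assert(!"unknown guard");
	return false;
}

/* True if the chosen path would write past the linear area. */
static bool toy_overflows(enum toy_guard guard, bool sg,
			  size_t tailroom, size_t copy)
{
	return toy_copy_linear(guard, sg, tailroom, copy) && copy > tailroom;
}
```

In this model only `GUARD_CURRENT` can overflow (SG off with too little tailroom). The proposed and simpler conditions never overflow, and they pick different paths only when the SG bit is set and the tailroom suffices.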
Patch

diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index b5e21eb198d8..b38731d8a44f 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -884,7 +884,7 @@  static int __ip_append_data(struct sock *sk,
 
 	exthdrlen = !skb ? rt->dst.header_len : 0;
 	mtu = cork->gso_size ? IP_MAX_MTU : cork->fragsize;
-	paged = !!cork->gso_size;
+	paged = cork->gso_size && (rt->dst.dev->features & NETIF_F_SG);
 
 	if (cork->tx_flags & SKBTX_ANY_SW_TSTAMP &&
 	    sk->sk_tsflags & SOF_TIMESTAMPING_OPT_ID)
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 7f4493080df6..35a940b9f208 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1262,7 +1262,7 @@  static int __ip6_append_data(struct sock *sk,
 		dst_exthdrlen = rt->dst.header_len - rt->rt6i_nfheader_len;
 	}
 
-	paged = !!cork->gso_size;
+	paged = cork->gso_size && (rt->dst.dev->features & NETIF_F_SG);
 	mtu = cork->gso_size ? IP6_MAX_MTU : cork->fragsize;
 	orig_mtu = mtu;