ipv6: handle -EFAULT from skb_copy_bits

Message ID	1482323232.2260.2.camel@stressinduktion.org
State	RFC, archived
Delegated to:	David Miller
Headers	show Return-Path: <netdev-owner@vger.kernel.org> Message-ID: <1482323232.2260.2.camel@stressinduktion.org> Subject: Re: ipv6: handle -EFAULT from skb_copy_bits From: Hannes Frederic Sowa <hannes@stressinduktion.org> To: Cong Wang <xiyou.wangcong@gmail.com>, Dave Jones <davej@codemonkey.org.uk> Cc: David Miller <davem@davemloft.net>, Linux Kernel Network Developers <netdev@vger.kernel.org> Date: Wed, 21 Dec 2016 13:27:12 +0100 In-Reply-To: <CAM_iQpXWCWMKDjLVTLuHNSKKGq5MWf5ZRLpyE=PDD3EgQ7PS4Q@mail.gmail.com> References: <20161219.144848.736886978545428136.davem@davemloft.net> <20161220003144.omoqyghgdfbxdyuu@codemonkey.org.uk> <20161220004013.43pcopemc6im32az@codemonkey.org.uk> <20161219.203623.119653805184192345.davem@davemloft.net> <20161220181728.dd2cynjwrceruwcu@codemonkey.org.uk> <CAM_iQpUgqJEG544HqH1iwdQFL9-nV3-hMuuH_eU9OnJ--pX+jg@mail.gmail.com> <20161220221214.w3zerfiy4wu6apee@codemonkey.org.uk> <CAM_iQpXWCWMKDjLVTLuHNSKKGq5MWf5ZRLpyE=PDD3EgQ7PS4Q@mail.gmail.com> Content-Type: text/plain; charset="UTF-8" Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: netdev-owner@vger.kernel.org Precedence: bulk

Hannes Frederic Sowa Dec. 21, 2016, 12:27 p.m. UTC

On Tue, 2016-12-20 at 22:09 -0800, Cong Wang wrote:
> On Tue, Dec 20, 2016 at 2:12 PM, Dave Jones <davej@codemonkey.org.uk> wrote:
> >         fd = socket(AF_INET6, SOCK_RAW, 7);
> > 
> >         setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
> >         setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);
> > 
> 
> Interesting, you set the checksum offset to be 0, but the packet size
> is actually 49, transport header is located at offset 48, so apparently
> the packet doesn't have room for a 16bit checksum after network header.
> 
> Your original patch seems reasonable to me, unless there is some
> check in __ip6_append_data() which is supposed to catch this, but
> CHECKSUM is specific to raw socket only.

The calculation of total_len is wrong here:

	total_len = inet_sk(sk)->cork.base.length;
	if (offset >= total_len - 1) {
		err = -EINVAL;
		ip6_flush_pending_frames(sk);
		goto out;
	}


At least for this bug to fix we need to subtract the extension header
length after the fragmentation header, so:

Hannes Frederic Sowa Dec. 21, 2016, 12:41 p.m. UTC | #1

On Wed, 2016-12-21 at 13:27 +0100, Hannes Frederic Sowa wrote: 
> @@ -555,8 +566,8 @@ static int rawv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6,
>                 goto out;
>  
>         offset = rp->offset;
> -       total_len = inet_sk(sk)->cork.base.length;
> -       if (offset >= total_len - 1) {
> +       transport_len = raw6_corked_transport_len(sk);
> +       if (offset >= transport_len - 1) {
>                 err = -EINVAL;
>                 ip6_flush_pending_frames(sk);
>                 goto out;
> @@ -598,7 +609,7 @@ static int rawv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6,
>                 tmp_csum = csum_sub(tmp_csum, csum_unfold(csum));
>  
>         csum = csum_ipv6_magic(&fl6->saddr, &fl6->daddr,
> -                              total_len, fl6->flowi6_proto, tmp_csum);
> +                              transport_len, fl6->flowi6_proto, tmp_csum);
>  
> 

Ops, here we need actually the total_len plus the opt->opt_nflen to
always calculate the correct checksum.

David Miller Dec. 21, 2016, 7:04 p.m. UTC | #2

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Wed, 21 Dec 2016 13:41:13 +0100

> On Wed, 2016-12-21 at 13:27 +0100, Hannes Frederic Sowa wrote: 
>> @@ -555,8 +566,8 @@ static int rawv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6,
>>                 goto out;
>>  
>>         offset = rp->offset;
>> -       total_len = inet_sk(sk)->cork.base.length;
>> -       if (offset >= total_len - 1) {
>> +       transport_len = raw6_corked_transport_len(sk);
>> +       if (offset >= transport_len - 1) {
>>                 err = -EINVAL;
>>                 ip6_flush_pending_frames(sk);
>>                 goto out;
>> @@ -598,7 +609,7 @@ static int rawv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6,
>>                 tmp_csum = csum_sub(tmp_csum, csum_unfold(csum));
>>  
>>         csum = csum_ipv6_magic(&fl6->saddr, &fl6->daddr,
>> -                              total_len, fl6->flowi6_proto, tmp_csum);
>> +                              transport_len, fl6->flowi6_proto, tmp_csum);
>>  
>> 
> 
> Ops, here we need actually the total_len plus the opt->opt_nflen to
> always calculate the correct checksum.

It's a real shame we can't just use skb_transport_offset().  This value
has essentially been calculated for us already.

Also, if we iterate over multiple SKBs in the write queue, don't you have
to redo this calculation for every SKB we iterate over?

Furthermore, what if the user queued up some SKBs in the raw socket
with MSG_MORE, and then changed some of the options for a subsequent
sendmsg() call?

Given all of this, I think the best thing to do is validate the offset
after the queue walks, which is pretty much what Dave Jones's original
patch was doing.

Sigh... well, at least we now understand what's going on here.

Hannes Frederic Sowa Dec. 21, 2016, 9:33 p.m. UTC | #3

On Wed, 2016-12-21 at 14:04 -0500, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Wed, 21 Dec 2016 13:41:13 +0100
> 
> > On Wed, 2016-12-21 at 13:27 +0100, Hannes Frederic Sowa wrote: 
> >> @@ -555,8 +566,8 @@ static int rawv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6,
> >>                 goto out;
> >>  
> >>         offset = rp->offset;
> >> -       total_len = inet_sk(sk)->cork.base.length;
> >> -       if (offset >= total_len - 1) {
> >> +       transport_len = raw6_corked_transport_len(sk);
> >> +       if (offset >= transport_len - 1) {
> >>                 err = -EINVAL;
> >>                 ip6_flush_pending_frames(sk);
> >>                 goto out;
> >> @@ -598,7 +609,7 @@ static int rawv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6,
> >>                 tmp_csum = csum_sub(tmp_csum, csum_unfold(csum));
> >>  
> >>         csum = csum_ipv6_magic(&fl6->saddr, &fl6->daddr,
> >> -                              total_len, fl6->flowi6_proto, tmp_csum);
> >> +                              transport_len, fl6->flowi6_proto, tmp_csum);
> >>  
> >> 
> > 
> > Ops, here we need actually the total_len plus the opt->opt_nflen to
> > always calculate the correct checksum.
> 
> It's a real shame we can't just use skb_transport_offset().  This value
> has essentially been calculated for us already.

This code path is called when we want to send out the socket write
queue. Because of MSG_MORE we might have multiple skbs in there, but
only the first one actually carries the true header, the others are
optimized for the ipv6 fragmentation fast path, thus we need to use
skb_transport_offset to find the correct offset but we must not
touch/read the data area before, as it is undefined data at that point.

Basically total_len so far accumulated all the payload length written
to the socket from the syscall argument from the user, but
unfortunately it also accounts for the first sendmsg's synthesized
extension header length in the write queue. We need to reverse this
calculation at this point in time.

So total_len is independent from the loop we do below and the length
should always reflect the length of all skbs stored in the write queue.

The loop adjusts the offset of the checksum in case the checksum offset
points to data in one of the later skbs, but we know before if that can
happen or not because of the cork length.

> Also, if we iterate over multiple SKBs in the write queue, don't you have
> to redo this calculation for every SKB we iterate over?

All the skb's payload length are summed up in the cork's length field,
so we don't need to sum it up again but just can use that value as is
minus the extension header adjustment.

> Furthermore, what if the user queued up some SKBs in the raw socket
> with MSG_MORE, and then changed some of the options for a subsequent
> sendmsg() call?

We only refresh the extension headers the next time we prepare an ipv6
header, which is the first sendmsg after one without MSG_MORE. We are
protected in this situation and don't change header lengths later
during further MSG_MORE sendmsgs.

> Given all of this, I think the best thing to do is validate the offset
> after the queue walks, which is pretty much what Dave Jones's original
> patch was doing.

I think both approaches protect against the bug reasonably well, but
Dave's patch has a bug: we must either call ip6_flush_pending_frames to
clear the socket write queue with the buggy send request.

> Sigh... well, at least we now understand what's going on here.

Yep, the code is more than a bit complex. :/

Bye,
Hannes

Dave Jones Dec. 22, 2016, 1:40 a.m. UTC | #4

On Wed, Dec 21, 2016 at 10:33:20PM +0100, Hannes Frederic Sowa wrote:

 > > Given all of this, I think the best thing to do is validate the offset
 > > after the queue walks, which is pretty much what Dave Jones's original
 > > patch was doing.
 > 
 > I think both approaches protect against the bug reasonably well, but
 > Dave's patch has a bug: we must either call ip6_flush_pending_frames to
 > clear the socket write queue with the buggy send request.

I can fix that up and resubmit, or we can go with your approach.
DaveM ?

	Dave

David Miller Dec. 22, 2016, 3:29 a.m. UTC | #5

From: Dave Jones <davej@codemonkey.org.uk>
Date: Wed, 21 Dec 2016 20:40:19 -0500

> On Wed, Dec 21, 2016 at 10:33:20PM +0100, Hannes Frederic Sowa wrote:
> 
>  > > Given all of this, I think the best thing to do is validate the offset
>  > > after the queue walks, which is pretty much what Dave Jones's original
>  > > patch was doing.
>  > 
>  > I think both approaches protect against the bug reasonably well, but
>  > Dave's patch has a bug: we must either call ip6_flush_pending_frames to
>  > clear the socket write queue with the buggy send request.
> 
> I can fix that up and resubmit, or we can go with your approach.
> DaveM ?

Please respin your patch with the fix Dave.

ipv6: handle -EFAULT from skb_copy_bits

Commit Message

Comments

Patch