diff mbox series

[SRU,X/C] UBUNTU: SAUCE: ipv6: frags: fix skb extraction in ip6_expire_frag_queue()

Message ID 20190606093151.32010-2-stefan.bader@canonical.com
State New
Headers show
Series [SRU,X/C] UBUNTU: SAUCE: ipv6: frags: fix skb extraction in ip6_expire_frag_queue() | expand

Commit Message

Stefan Bader June 6, 2019, 9:31 a.m. UTC
BugLink: https://bugs.launchpad.net/bugs/1824687

The backport of 05c0b86b9696802fd0ce5676a92a63f1b455bdf3 upstream
in linux-4.4.y stable changed ip6_expire_frag_queue() to be similar
to ip_expire(). However, using skb_get() leads to a crash while
sending the ICMP message due to a check for shared SKBs.

   kernel BUG at linux-4.4.0/net/core/skbuff.c:1207!
   RIP: 0010:[<ffffffff81740953>]
    [<ffffffff81740953>] pskb_expand_head+0x243/0x250
    [<ffffffff81740e50>] __pskb_pull_tail+0x50/0x350
    [<ffffffff8183939a>] _decode_session6+0x26a/0x400
    [<ffffffff817ec719>] __xfrm_decode_session+0x39/0x50
    [<ffffffff818239d0>] icmpv6_route_lookup+0xf0/0x1c0
    [<ffffffff81824421>] icmp6_send+0x5e1/0x940
    [<ffffffff8183d431>] icmpv6_send+0x21/0x30
    [<ffffffff8182b500>] ip6_expire_frag_queue+0xe0/0x120

For IPv4 the ip_expire() function however did change considerably
since then. In

  fa0f527358 "ip: use rb trees for IP frag queue."

the SKB might be taken from a rbtree (use of rbtrees for IPv4 was
backported to 4.4.y upstream).
Along with those obvious changes, the code also is modified to
actually de-queue the SKB from whichever source it was taken.
This also got rid of the skb_get() which causes problems in
icmpv6_send(). And latest upstream code uses inet_frag_pull_head()
which does the same.

To fix the crash in IPv6, we use the same modifications added
to ip_expire() by fa0f527358. This might be too much change for
now because IPv6 only starts using rbtrees for frags with

  997dd96471 "net: IP6 defrag: use rbtrees in nf_conntrack_reasm.c"

which has not been backported to 4.4.y. Testing by a reporter was
showing good results. Likely the else part never gets used until
997dd96471 is backported, too. And that needs more changes.
Some upstream (stable) discussion was started but has not yet
resulted in any usable results. So adding this as SAUCE for now
to get the kernel stable (based on testing).

Fixes: bf8187348f "ipv6: frags: rewrite ip6_expire_frag_queue()"
       in the linux-4.4.y stable tree.
(based-on: f78a3f45e7 "ip: use rb trees for IP frag queue." 4.4.y)
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
---
 net/ipv6/reassembly.c | 21 ++++++++++++++++++---
 1 file changed, 18 insertions(+), 3 deletions(-)

Comments

Andrea Righi June 6, 2019, 10:02 a.m. UTC | #1
On Thu, Jun 06, 2019 at 11:31:51AM +0200, Stefan Bader wrote:
> BugLink: https://bugs.launchpad.net/bugs/1824687
> 
> The backport of 05c0b86b9696802fd0ce5676a92a63f1b455bdf3 upstream

To be extremely picky here I'd use the standard git commit description
style, something like:

The backport of commit 05c0b86b9696 ("ipv6: frags: rewrite
ip6_expire_frag_queue()") upstream in linux-4.4.y ...

> in linux-4.4.y stable changed ip6_expire_frag_queue() to be similar
> to ip_expire(). However, using skb_get() leads to a crash while
> sending the ICMP message due to a check for shared SKBs.
> 
>    kernel BUG at linux-4.4.0/net/core/skbuff.c:1207!
>    RIP: 0010:[<ffffffff81740953>]
>     [<ffffffff81740953>] pskb_expand_head+0x243/0x250
>     [<ffffffff81740e50>] __pskb_pull_tail+0x50/0x350
>     [<ffffffff8183939a>] _decode_session6+0x26a/0x400
>     [<ffffffff817ec719>] __xfrm_decode_session+0x39/0x50
>     [<ffffffff818239d0>] icmpv6_route_lookup+0xf0/0x1c0
>     [<ffffffff81824421>] icmp6_send+0x5e1/0x940
>     [<ffffffff8183d431>] icmpv6_send+0x21/0x30
>     [<ffffffff8182b500>] ip6_expire_frag_queue+0xe0/0x120
> 
> For IPv4 the ip_expire() function however did change considerably
> since then. In
> 
>   fa0f527358 "ip: use rb trees for IP frag queue."
> 
> the SKB might be taken from a rbtree (use of rbtrees for IPv4 was
> backported to 4.4.y upstream).
> Along with those obvious changes, the code also is modified to
> actually de-queue the SKB from whichever source it was taken.
> This also got rid of the skb_get() which causes problems in
> icmpv6_send(). And latest upstream code uses inet_frag_pull_head()
> which does the same.
> 
> To fix the crash in IPv6, we use the same modifications added
> to ip_expire() by fa0f527358. This might be too much change for
> now because IPv6 only starts using rbtrees for frags with
> 
>   997dd96471 "net: IP6 defrag: use rbtrees in nf_conntrack_reasm.c"
> 
> which has not been backported to 4.4.y. Testing by a reporter was
> showing good results. Likely the else part never gets used until
> 997dd96471 is backported, too. And that needs more changes.
> Some upstream (stable) discussion was started but has not yet
> resulted in any usable results. So adding this as SAUCE for now
> to get the kernel stable (based on testing).
> 
> Fixes: bf8187348f "ipv6: frags: rewrite ip6_expire_frag_queue()"
>        in the linux-4.4.y stable tree.
> (based-on: f78a3f45e7 "ip: use rb trees for IP frag queue." 4.4.y)
> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
> ---
>  net/ipv6/reassembly.c | 21 ++++++++++++++++++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
> 
> diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
> index ec917f58d105..b1949b37ac8c 100644
> --- a/net/ipv6/reassembly.c
> +++ b/net/ipv6/reassembly.c
> @@ -110,16 +110,31 @@ void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
>  	IP6_INC_STATS_BH(net, __in6_dev_get(dev), IPSTATS_MIB_REASMTIMEOUT);
>  
>  	/* Don't send error if the first segment did not arrive. */
> -	head = fq->q.fragments;
> -	if (!(fq->q.flags & INET_FRAG_FIRST_IN) || !head)
> +	if (!(fq->q.flags & INET_FRAG_FIRST_IN))
>  		goto out;
>  
> +	if (fq->q.fragments) {
> +		head = fq->q.fragments;
> +		fq->q.fragments = head->next;
> +	} else {
> +		head = skb_rb_first(&fq->q.rb_fragments);
> +		if (!head)
> +			goto out;
> +		rb_erase(&head->rbnode, &fq->q.rb_fragments);
> +		memset(&head->rbnode, 0, sizeof(head->rbnode));
> +		barrier();
> +	}
> +
> +	if (head == fq->q.fragments_tail)
> +		fq->q.fragments_tail = NULL;
> +
> +	sub_frag_mem_limit(fq->q.net, head->truesize);
> +
>  	/* But use as source device on which LAST ARRIVED
>  	 * segment was received. And do not use fq->dev
>  	 * pointer directly, device might already disappeared.
>  	 */
>  	head->dev = dev;
> -	skb_get(head);
>  	spin_unlock(&fq->q.lock);
>  
>  	icmpv6_send(head, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0);

We can't see from the patch, but, looking at the code, fa0f527358 is
also moving a kfree_skb(head) after rcu_read_unlock() in ip_expire(). I
was wondering if we should do the same here... in general it looks more
safe to kfree() outside the RCU section, but I don't know if it's
actually needed here.

Something like the following (on top of your patch). What do you think?

 net/ipv6/reassembly.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index b1949b37ac8c..c0e2ff764e3f 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -138,13 +138,14 @@ void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
 	spin_unlock(&fq->q.lock);
 
 	icmpv6_send(head, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0);
-	kfree_skb(head);
 	goto out_rcu_unlock;
 
 out:
 	spin_unlock(&fq->q.lock);
 out_rcu_unlock:
 	rcu_read_unlock();
+	if (head)
+		kfree_skb(head);
 	inet_frag_put(&fq->q);
 }
 EXPORT_SYMBOL(ip6_expire_frag_queue);
Stefan Bader June 6, 2019, 10:12 a.m. UTC | #2
On 06.06.19 12:02, Andrea Righi wrote:
> On Thu, Jun 06, 2019 at 11:31:51AM +0200, Stefan Bader wrote:
>> BugLink: https://bugs.launchpad.net/bugs/1824687
>>
>> The backport of 05c0b86b9696802fd0ce5676a92a63f1b455bdf3 upstream
> 
> To be extremely picky here I'd use the standard git commit description
> style, something like:
> 
> The backport of commit 05c0b86b9696 ("ipv6: frags: rewrite
> ip6_expire_frag_queue()") upstream in linux-4.4.y ...

Meh ok, since I have to re-do for below...

> 
>> in linux-4.4.y stable changed ip6_expire_frag_queue() to be similar
>> to ip_expire(). However, using skb_get() leads to a crash while
>> sending the ICMP message due to a check for shared SKBs.
>>
>>    kernel BUG at linux-4.4.0/net/core/skbuff.c:1207!
>>    RIP: 0010:[<ffffffff81740953>]
>>     [<ffffffff81740953>] pskb_expand_head+0x243/0x250
>>     [<ffffffff81740e50>] __pskb_pull_tail+0x50/0x350
>>     [<ffffffff8183939a>] _decode_session6+0x26a/0x400
>>     [<ffffffff817ec719>] __xfrm_decode_session+0x39/0x50
>>     [<ffffffff818239d0>] icmpv6_route_lookup+0xf0/0x1c0
>>     [<ffffffff81824421>] icmp6_send+0x5e1/0x940
>>     [<ffffffff8183d431>] icmpv6_send+0x21/0x30
>>     [<ffffffff8182b500>] ip6_expire_frag_queue+0xe0/0x120
>>
>> For IPv4 the ip_expire() function however did change considerably
>> since then. In
>>
>>   fa0f527358 "ip: use rb trees for IP frag queue."
>>
>> the SKB might be taken from a rbtree (use of rbtrees for IPv4 was
>> backported to 4.4.y upstream).
>> Along with those obvious changes, the code also is modified to
>> actually de-queue the SKB from whichever source it was taken.
>> This also got rid of the skb_get() which causes problems in
>> icmpv6_send(). And latest upstream code uses inet_frag_pull_head()
>> which does the same.
>>
>> To fix the crash in IPv6, we use the same modifications added
>> to ip_expire() by fa0f527358. This might be too much change for
>> now because IPv6 only starts using rbtrees for frags with
>>
>>   997dd96471 "net: IP6 defrag: use rbtrees in nf_conntrack_reasm.c"
>>
>> which has not been backported to 4.4.y. Testing by a reporter was
>> showing good results. Likely the else part never gets used until
>> 997dd96471 is backported, too. And that needs more changes.
>> Some upstream (stable) discussion was started but has not yet
>> resulted in any usable results. So adding this as SAUCE for now
>> to get the kernel stable (based on testing).
>>
>> Fixes: bf8187348f "ipv6: frags: rewrite ip6_expire_frag_queue()"
>>        in the linux-4.4.y stable tree.
>> (based-on: f78a3f45e7 "ip: use rb trees for IP frag queue." 4.4.y)
>> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
>> ---
>>  net/ipv6/reassembly.c | 21 ++++++++++++++++++---
>>  1 file changed, 18 insertions(+), 3 deletions(-)
>>
>> diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
>> index ec917f58d105..b1949b37ac8c 100644
>> --- a/net/ipv6/reassembly.c
>> +++ b/net/ipv6/reassembly.c
>> @@ -110,16 +110,31 @@ void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
>>  	IP6_INC_STATS_BH(net, __in6_dev_get(dev), IPSTATS_MIB_REASMTIMEOUT);
>>  
>>  	/* Don't send error if the first segment did not arrive. */
>> -	head = fq->q.fragments;
>> -	if (!(fq->q.flags & INET_FRAG_FIRST_IN) || !head)
>> +	if (!(fq->q.flags & INET_FRAG_FIRST_IN))
>>  		goto out;
>>  
>> +	if (fq->q.fragments) {
>> +		head = fq->q.fragments;
>> +		fq->q.fragments = head->next;
>> +	} else {
>> +		head = skb_rb_first(&fq->q.rb_fragments);
>> +		if (!head)
>> +			goto out;
>> +		rb_erase(&head->rbnode, &fq->q.rb_fragments);
>> +		memset(&head->rbnode, 0, sizeof(head->rbnode));
>> +		barrier();
>> +	}
>> +
>> +	if (head == fq->q.fragments_tail)
>> +		fq->q.fragments_tail = NULL;
>> +
>> +	sub_frag_mem_limit(fq->q.net, head->truesize);
>> +
>>  	/* But use as source device on which LAST ARRIVED
>>  	 * segment was received. And do not use fq->dev
>>  	 * pointer directly, device might already disappeared.
>>  	 */
>>  	head->dev = dev;
>> -	skb_get(head);
>>  	spin_unlock(&fq->q.lock);
>>  
>>  	icmpv6_send(head, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0);
> 
> We can't see from the patch, but, looking at the code, fa0f527358 is
> also moving a kfree_skb(head) after rcu_read_unlock() in ip_expire(). I
> was wondering if we should do the same here... in general it looks more
> safe to kfree() outside the RCU section, but I don't know if it's
> actually needed here.
> 
> Something like the following (on top of your patch). What do you think?

I did not consider the rcu section. But makes sense, though I then also have to
init head to NULL.

> 
>  net/ipv6/reassembly.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
> index b1949b37ac8c..c0e2ff764e3f 100644
> --- a/net/ipv6/reassembly.c
> +++ b/net/ipv6/reassembly.c
> @@ -138,13 +138,14 @@ void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
>  	spin_unlock(&fq->q.lock);
>  
>  	icmpv6_send(head, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0);
> -	kfree_skb(head);
>  	goto out_rcu_unlock;
>  
>  out:
>  	spin_unlock(&fq->q.lock);
>  out_rcu_unlock:
>  	rcu_read_unlock();
> +	if (head)
> +		kfree_skb(head);
>  	inet_frag_put(&fq->q);
>  }
>  EXPORT_SYMBOL(ip6_expire_frag_queue);
>
diff mbox series

Patch

diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index ec917f58d105..b1949b37ac8c 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -110,16 +110,31 @@  void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
 	IP6_INC_STATS_BH(net, __in6_dev_get(dev), IPSTATS_MIB_REASMTIMEOUT);
 
 	/* Don't send error if the first segment did not arrive. */
-	head = fq->q.fragments;
-	if (!(fq->q.flags & INET_FRAG_FIRST_IN) || !head)
+	if (!(fq->q.flags & INET_FRAG_FIRST_IN))
 		goto out;
 
+	if (fq->q.fragments) {
+		head = fq->q.fragments;
+		fq->q.fragments = head->next;
+	} else {
+		head = skb_rb_first(&fq->q.rb_fragments);
+		if (!head)
+			goto out;
+		rb_erase(&head->rbnode, &fq->q.rb_fragments);
+		memset(&head->rbnode, 0, sizeof(head->rbnode));
+		barrier();
+	}
+
+	if (head == fq->q.fragments_tail)
+		fq->q.fragments_tail = NULL;
+
+	sub_frag_mem_limit(fq->q.net, head->truesize);
+
 	/* But use as source device on which LAST ARRIVED
 	 * segment was received. And do not use fq->dev
 	 * pointer directly, device might already disappeared.
 	 */
 	head->dev = dev;
-	skb_get(head);
 	spin_unlock(&fq->q.lock);
 
 	icmpv6_send(head, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0);