diff mbox series

[SRU,V2,X/C] UBUNTU: SAUCE: ipv6: frags: fix skb extraction in ip6_expire_frag_queue()

Message ID 20190606102407.881-2-stefan.bader@canonical.com
State New
Headers show
Series [SRU,V2,X/C] UBUNTU: SAUCE: ipv6: frags: fix skb extraction in ip6_expire_frag_queue() | expand

Commit Message

Stefan Bader June 6, 2019, 10:24 a.m. UTC
BugLink: https://bugs.launchpad.net/bugs/1824687

The backport of

  05c0b86b96 ("ipv6: frags: rewrite ip6_expire_frag_queue()")

to linux-4.4.y stable changed ip6_expire_frag_queue() to be similar
to ip_expire(). However, using skb_get() leads to a crash while
sending the ICMP message due to a check for shared SKBs.

   kernel BUG at linux-4.4.0/net/core/skbuff.c:1207!
   RIP: 0010:[<ffffffff81740953>]
    [<ffffffff81740953>] pskb_expand_head+0x243/0x250
    [<ffffffff81740e50>] __pskb_pull_tail+0x50/0x350
    [<ffffffff8183939a>] _decode_session6+0x26a/0x400
    [<ffffffff817ec719>] __xfrm_decode_session+0x39/0x50
    [<ffffffff818239d0>] icmpv6_route_lookup+0xf0/0x1c0
    [<ffffffff81824421>] icmp6_send+0x5e1/0x940
    [<ffffffff8183d431>] icmpv6_send+0x21/0x30
    [<ffffffff8182b500>] ip6_expire_frag_queue+0xe0/0x120

For IPv4 the ip_expire() function however did change considerably
since then. In

  fa0f527358 ("ip: use rb trees for IP frag queue.")

the SKB might be taken from a rbtree (use of rbtrees for IPv4 was
backported to 4.4.y upstream).
Along with those obvious changes, the code also is modified to
actually de-queue the SKB from whichever source it was taken.
This also got rid of the skb_get() which causes problems in
icmpv6_send(). And latest upstream code uses inet_frag_pull_head()
which does the same.

To fix the crash in IPv6, we use the same modifications added
to ip_expire() by fa0f527358. This might be too much change for
now because IPv6 only starts using rbtrees for frags with

  997dd96471 ("net: IP6 defrag: use rbtrees in nf_conntrack_reasm.c")

which has not been backported to 4.4.y. Testing by a reporter was
showing good results. Likely the else part never gets used until
997dd96471 is backported, too. And that needs more changes.
Some upstream (stable) discussion was started but has not yet
resulted in any usable results. So adding this as SAUCE for now
to get the kernel stable (based on testing).

Fixes: bf8187348f ("ipv6: frags: rewrite ip6_expire_frag_queue()")
       in the linux-4.4.y stable tree.
(based-on: f78a3f45e7 ("ip: use rb trees for IP frag queue." 4.4.y))
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
---
 net/ipv6/reassembly.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

Comments

Colin Ian King June 6, 2019, 10:35 a.m. UTC | #1
On 06/06/2019 11:24, Stefan Bader wrote:
> BugLink: https://bugs.launchpad.net/bugs/1824687
> 
> The backport of
> 
>   05c0b86b96 ("ipv6: frags: rewrite ip6_expire_frag_queue()")
> 
> to linux-4.4.y stable changed ip6_expire_frag_queue() to be similar
> to ip_expire(). However, using skb_get() leads to a crash while
> sending the ICMP message due to a check for shared SKBs.
> 
>    kernel BUG at linux-4.4.0/net/core/skbuff.c:1207!
>    RIP: 0010:[<ffffffff81740953>]
>     [<ffffffff81740953>] pskb_expand_head+0x243/0x250
>     [<ffffffff81740e50>] __pskb_pull_tail+0x50/0x350
>     [<ffffffff8183939a>] _decode_session6+0x26a/0x400
>     [<ffffffff817ec719>] __xfrm_decode_session+0x39/0x50
>     [<ffffffff818239d0>] icmpv6_route_lookup+0xf0/0x1c0
>     [<ffffffff81824421>] icmp6_send+0x5e1/0x940
>     [<ffffffff8183d431>] icmpv6_send+0x21/0x30
>     [<ffffffff8182b500>] ip6_expire_frag_queue+0xe0/0x120
> 
> For IPv4 the ip_expire() function however did change considerably
> since then. In
> 
>   fa0f527358 ("ip: use rb trees for IP frag queue.")
> 
> the SKB might be taken from a rbtree (use of rbtrees for IPv4 was
> backported to 4.4.y upstream).
> Along with those obvious changes, the code also is modified to
> actually de-queue the SKB from whichever source it was taken.
> This also got rid of the skb_get() which causes problems in
> icmpv6_send(). And latest upstream code uses inet_frag_pull_head()
> which does the same.
> 
> To fix the crash in IPv6, we use the same modifications added
> to ip_expire() by fa0f527358. This might be too much change for
> now because IPv6 only starts using rbtrees for frags with
> 
>   997dd96471 ("net: IP6 defrag: use rbtrees in nf_conntrack_reasm.c")
> 
> which has not been backported to 4.4.y. Testing by a reporter was
> showing good results. Likely the else part never gets used until
> 997dd96471 is backported, too. And that needs more changes.
> Some upstream (stable) discussion was started but has not yet
> resulted in any usable results. So adding this as SAUCE for now
> to get the kernel stable (based on testing).
> 
> Fixes: bf8187348f ("ipv6: frags: rewrite ip6_expire_frag_queue()")
>        in the linux-4.4.y stable tree.
> (based-on: f78a3f45e7 ("ip: use rb trees for IP frag queue." 4.4.y))
> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
> ---
>  net/ipv6/reassembly.c | 26 +++++++++++++++++++++-----
>  1 file changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
> index ec917f58d105..5b09ce54c476 100644
> --- a/net/ipv6/reassembly.c
> +++ b/net/ipv6/reassembly.c
> @@ -92,7 +92,7 @@ EXPORT_SYMBOL(ip6_frag_init);
>  void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
>  {
>  	struct net_device *dev = NULL;
> -	struct sk_buff *head;
> +	struct sk_buff *head = NULL;
>  
>  	rcu_read_lock();
>  	spin_lock(&fq->q.lock);
> @@ -110,26 +110,42 @@ void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
>  	IP6_INC_STATS_BH(net, __in6_dev_get(dev), IPSTATS_MIB_REASMTIMEOUT);
>  
>  	/* Don't send error if the first segment did not arrive. */
> -	head = fq->q.fragments;
> -	if (!(fq->q.flags & INET_FRAG_FIRST_IN) || !head)
> +	if (!(fq->q.flags & INET_FRAG_FIRST_IN))
>  		goto out;
>  
> +	if (fq->q.fragments) {
> +		head = fq->q.fragments;
> +		fq->q.fragments = head->next;
> +	} else {
> +		head = skb_rb_first(&fq->q.rb_fragments);
> +		if (!head)
> +			goto out;
> +		rb_erase(&head->rbnode, &fq->q.rb_fragments);
> +		memset(&head->rbnode, 0, sizeof(head->rbnode));
> +		barrier();
> +	}
> +
> +	if (head == fq->q.fragments_tail)
> +		fq->q.fragments_tail = NULL;
> +
> +	sub_frag_mem_limit(fq->q.net, head->truesize);
> +
>  	/* But use as source device on which LAST ARRIVED
>  	 * segment was received. And do not use fq->dev
>  	 * pointer directly, device might already disappeared.
>  	 */
>  	head->dev = dev;
> -	skb_get(head);
>  	spin_unlock(&fq->q.lock);
>  
>  	icmpv6_send(head, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0);
> -	kfree_skb(head);
>  	goto out_rcu_unlock;
>  
>  out:
>  	spin_unlock(&fq->q.lock);
>  out_rcu_unlock:
>  	rcu_read_unlock();
> +	if (head)
> +		kfree_skb(head);
>  	inet_frag_put(&fq->q);
>  }
>  EXPORT_SYMBOL(ip6_expire_frag_queue);
> 

this now closer matches the intent from the commit fa0f527358bd900ef9
that it is based on.  The original had positive test results, so I'm
hoping this will be OK too (the delta is only minor) and addresses the
concerns in Andrea's comments.

So, based on this,

Acked-by: Colin Ian King <colin.king@canonical.com>
Andrea Righi June 6, 2019, 11:01 a.m. UTC | #2
On Thu, Jun 06, 2019 at 12:24:07PM +0200, Stefan Bader wrote:
> BugLink: https://bugs.launchpad.net/bugs/1824687
> 
> The backport of
> 
>   05c0b86b96 ("ipv6: frags: rewrite ip6_expire_frag_queue()")
> 
> to linux-4.4.y stable changed ip6_expire_frag_queue() to be similar
> to ip_expire(). However, using skb_get() leads to a crash while
> sending the ICMP message due to a check for shared SKBs.
> 
>    kernel BUG at linux-4.4.0/net/core/skbuff.c:1207!
>    RIP: 0010:[<ffffffff81740953>]
>     [<ffffffff81740953>] pskb_expand_head+0x243/0x250
>     [<ffffffff81740e50>] __pskb_pull_tail+0x50/0x350
>     [<ffffffff8183939a>] _decode_session6+0x26a/0x400
>     [<ffffffff817ec719>] __xfrm_decode_session+0x39/0x50
>     [<ffffffff818239d0>] icmpv6_route_lookup+0xf0/0x1c0
>     [<ffffffff81824421>] icmp6_send+0x5e1/0x940
>     [<ffffffff8183d431>] icmpv6_send+0x21/0x30
>     [<ffffffff8182b500>] ip6_expire_frag_queue+0xe0/0x120
> 
> For IPv4 the ip_expire() function however did change considerably
> since then. In
> 
>   fa0f527358 ("ip: use rb trees for IP frag queue.")
> 
> the SKB might be taken from a rbtree (use of rbtrees for IPv4 was
> backported to 4.4.y upstream).
> Along with those obvious changes, the code also is modified to
> actually de-queue the SKB from whichever source it was taken.
> This also got rid of the skb_get() which causes problems in
> icmpv6_send(). And latest upstream code uses inet_frag_pull_head()
> which does the same.
> 
> To fix the crash in IPv6, we use the same modifications added
> to ip_expire() by fa0f527358. This might be too much change for
> now because IPv6 only starts using rbtrees for frags with
> 
>   997dd96471 ("net: IP6 defrag: use rbtrees in nf_conntrack_reasm.c")
> 
> which has not been backported to 4.4.y. Testing by a reporter was
> showing good results. Likely the else part never gets used until
> 997dd96471 is backported, too. And that needs more changes.
> Some upstream (stable) discussion was started but has not yet
> resulted in any usable results. So adding this as SAUCE for now
> to get the kernel stable (based on testing).
> 
> Fixes: bf8187348f ("ipv6: frags: rewrite ip6_expire_frag_queue()")
>        in the linux-4.4.y stable tree.
> (based-on: f78a3f45e7 ("ip: use rb trees for IP frag queue." 4.4.y))
> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
> ---
>  net/ipv6/reassembly.c | 26 +++++++++++++++++++++-----
>  1 file changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
> index ec917f58d105..5b09ce54c476 100644
> --- a/net/ipv6/reassembly.c
> +++ b/net/ipv6/reassembly.c
> @@ -92,7 +92,7 @@ EXPORT_SYMBOL(ip6_frag_init);
>  void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
>  {
>  	struct net_device *dev = NULL;
> -	struct sk_buff *head;
> +	struct sk_buff *head = NULL;
>  
>  	rcu_read_lock();
>  	spin_lock(&fq->q.lock);
> @@ -110,26 +110,42 @@ void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
>  	IP6_INC_STATS_BH(net, __in6_dev_get(dev), IPSTATS_MIB_REASMTIMEOUT);
>  
>  	/* Don't send error if the first segment did not arrive. */
> -	head = fq->q.fragments;
> -	if (!(fq->q.flags & INET_FRAG_FIRST_IN) || !head)
> +	if (!(fq->q.flags & INET_FRAG_FIRST_IN))
>  		goto out;
>  
> +	if (fq->q.fragments) {
> +		head = fq->q.fragments;
> +		fq->q.fragments = head->next;
> +	} else {
> +		head = skb_rb_first(&fq->q.rb_fragments);
> +		if (!head)
> +			goto out;
> +		rb_erase(&head->rbnode, &fq->q.rb_fragments);
> +		memset(&head->rbnode, 0, sizeof(head->rbnode));
> +		barrier();
> +	}
> +
> +	if (head == fq->q.fragments_tail)
> +		fq->q.fragments_tail = NULL;
> +
> +	sub_frag_mem_limit(fq->q.net, head->truesize);
> +
>  	/* But use as source device on which LAST ARRIVED
>  	 * segment was received. And do not use fq->dev
>  	 * pointer directly, device might already disappeared.
>  	 */
>  	head->dev = dev;
> -	skb_get(head);
>  	spin_unlock(&fq->q.lock);
>  
>  	icmpv6_send(head, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0);
> -	kfree_skb(head);
>  	goto out_rcu_unlock;
>  
>  out:
>  	spin_unlock(&fq->q.lock);
>  out_rcu_unlock:
>  	rcu_read_unlock();
> +	if (head)
> +		kfree_skb(head);
>  	inet_frag_put(&fq->q);
>  }
>  EXPORT_SYMBOL(ip6_expire_frag_queue);

Looks good to me!

Acked-by: Andrea Righi <andrea.righi@canonical.com>
Stefan Bader June 6, 2019, 11:22 a.m. UTC | #3
On 06.06.19 12:24, Stefan Bader wrote:
> BugLink: https://bugs.launchpad.net/bugs/1824687
> 
> The backport of
> 
>   05c0b86b96 ("ipv6: frags: rewrite ip6_expire_frag_queue()")
> 
> to linux-4.4.y stable changed ip6_expire_frag_queue() to be similar
> to ip_expire(). However, using skb_get() leads to a crash while
> sending the ICMP message due to a check for shared SKBs.
> 
>    kernel BUG at linux-4.4.0/net/core/skbuff.c:1207!
>    RIP: 0010:[<ffffffff81740953>]
>     [<ffffffff81740953>] pskb_expand_head+0x243/0x250
>     [<ffffffff81740e50>] __pskb_pull_tail+0x50/0x350
>     [<ffffffff8183939a>] _decode_session6+0x26a/0x400
>     [<ffffffff817ec719>] __xfrm_decode_session+0x39/0x50
>     [<ffffffff818239d0>] icmpv6_route_lookup+0xf0/0x1c0
>     [<ffffffff81824421>] icmp6_send+0x5e1/0x940
>     [<ffffffff8183d431>] icmpv6_send+0x21/0x30
>     [<ffffffff8182b500>] ip6_expire_frag_queue+0xe0/0x120
> 
> For IPv4 the ip_expire() function however did change considerably
> since then. In
> 
>   fa0f527358 ("ip: use rb trees for IP frag queue.")
> 
> the SKB might be taken from a rbtree (use of rbtrees for IPv4 was
> backported to 4.4.y upstream).
> Along with those obvious changes, the code also is modified to
> actually de-queue the SKB from whichever source it was taken.
> This also got rid of the skb_get() which causes problems in
> icmpv6_send(). And latest upstream code uses inet_frag_pull_head()
> which does the same.
> 
> To fix the crash in IPv6, we use the same modifications added
> to ip_expire() by fa0f527358. This might be too much change for
> now because IPv6 only starts using rbtrees for frags with
> 
>   997dd96471 ("net: IP6 defrag: use rbtrees in nf_conntrack_reasm.c")
> 
> which has not been backported to 4.4.y. Testing by a reporter was
> showing good results. Likely the else part never gets used until
> 997dd96471 is backported, too. And that needs more changes.
> Some upstream (stable) discussion was started but has not yet
> resulted in any usable results. So adding this as SAUCE for now
> to get the kernel stable (based on testing).
> 
> Fixes: bf8187348f ("ipv6: frags: rewrite ip6_expire_frag_queue()")
>        in the linux-4.4.y stable tree.
> (based-on: f78a3f45e7 ("ip: use rb trees for IP frag queue." 4.4.y))
> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
> ---

Applied to xenial/master-next and cosmic/master-next. Thanks.

-Stefan

>  net/ipv6/reassembly.c | 26 +++++++++++++++++++++-----
>  1 file changed, 21 insertions(+), 5 deletions(-)
> 
> diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
> index ec917f58d105..5b09ce54c476 100644
> --- a/net/ipv6/reassembly.c
> +++ b/net/ipv6/reassembly.c
> @@ -92,7 +92,7 @@ EXPORT_SYMBOL(ip6_frag_init);
>  void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
>  {
>  	struct net_device *dev = NULL;
> -	struct sk_buff *head;
> +	struct sk_buff *head = NULL;
>  
>  	rcu_read_lock();
>  	spin_lock(&fq->q.lock);
> @@ -110,26 +110,42 @@ void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
>  	IP6_INC_STATS_BH(net, __in6_dev_get(dev), IPSTATS_MIB_REASMTIMEOUT);
>  
>  	/* Don't send error if the first segment did not arrive. */
> -	head = fq->q.fragments;
> -	if (!(fq->q.flags & INET_FRAG_FIRST_IN) || !head)
> +	if (!(fq->q.flags & INET_FRAG_FIRST_IN))
>  		goto out;
>  
> +	if (fq->q.fragments) {
> +		head = fq->q.fragments;
> +		fq->q.fragments = head->next;
> +	} else {
> +		head = skb_rb_first(&fq->q.rb_fragments);
> +		if (!head)
> +			goto out;
> +		rb_erase(&head->rbnode, &fq->q.rb_fragments);
> +		memset(&head->rbnode, 0, sizeof(head->rbnode));
> +		barrier();
> +	}
> +
> +	if (head == fq->q.fragments_tail)
> +		fq->q.fragments_tail = NULL;
> +
> +	sub_frag_mem_limit(fq->q.net, head->truesize);
> +
>  	/* But use as source device on which LAST ARRIVED
>  	 * segment was received. And do not use fq->dev
>  	 * pointer directly, device might already disappeared.
>  	 */
>  	head->dev = dev;
> -	skb_get(head);
>  	spin_unlock(&fq->q.lock);
>  
>  	icmpv6_send(head, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0);
> -	kfree_skb(head);
>  	goto out_rcu_unlock;
>  
>  out:
>  	spin_unlock(&fq->q.lock);
>  out_rcu_unlock:
>  	rcu_read_unlock();
> +	if (head)
> +		kfree_skb(head);
>  	inet_frag_put(&fq->q);
>  }
>  EXPORT_SYMBOL(ip6_expire_frag_queue);
>
dann frazier June 10, 2019, 3:49 p.m. UTC | #4
On Thu, Jun 6, 2019 at 5:22 AM Stefan Bader <stefan.bader@canonical.com> wrote:
>
> On 06.06.19 12:24, Stefan Bader wrote:
> > BugLink: https://bugs.launchpad.net/bugs/1824687
> >
> > The backport of
> >
> >   05c0b86b96 ("ipv6: frags: rewrite ip6_expire_frag_queue()")
> >
> > to linux-4.4.y stable changed ip6_expire_frag_queue() to be similar
> > to ip_expire(). However, using skb_get() leads to a crash while
> > sending the ICMP message due to a check for shared SKBs.
> >
> >    kernel BUG at linux-4.4.0/net/core/skbuff.c:1207!
> >    RIP: 0010:[<ffffffff81740953>]
> >     [<ffffffff81740953>] pskb_expand_head+0x243/0x250
> >     [<ffffffff81740e50>] __pskb_pull_tail+0x50/0x350
> >     [<ffffffff8183939a>] _decode_session6+0x26a/0x400
> >     [<ffffffff817ec719>] __xfrm_decode_session+0x39/0x50
> >     [<ffffffff818239d0>] icmpv6_route_lookup+0xf0/0x1c0
> >     [<ffffffff81824421>] icmp6_send+0x5e1/0x940
> >     [<ffffffff8183d431>] icmpv6_send+0x21/0x30
> >     [<ffffffff8182b500>] ip6_expire_frag_queue+0xe0/0x120
> >
> > For IPv4 the ip_expire() function however did change considerably
> > since then. In
> >
> >   fa0f527358 ("ip: use rb trees for IP frag queue.")
> >
> > the SKB might be taken from a rbtree (use of rbtrees for IPv4 was
> > backported to 4.4.y upstream).
> > Along with those obvious changes, the code also is modified to
> > actually de-queue the SKB from whichever source it was taken.
> > This also got rid of the skb_get() which causes problems in
> > icmpv6_send(). And latest upstream code uses inet_frag_pull_head()
> > which does the same.
> >
> > To fix the crash in IPv6, we use the same modifications added
> > to ip_expire() by fa0f527358. This might be too much change for
> > now because IPv6 only starts using rbtrees for frags with
> >
> >   997dd96471 ("net: IP6 defrag: use rbtrees in nf_conntrack_reasm.c")
> >
> > which has not been backported to 4.4.y. Testing by a reporter was
> > showing good results. Likely the else part never gets used until
> > 997dd96471 is backported, too. And that needs more changes.
> > Some upstream (stable) discussion was started but has not yet
> > resulted in any usable results. So adding this as SAUCE for now
> > to get the kernel stable (based on testing).
> >
> > Fixes: bf8187348f ("ipv6: frags: rewrite ip6_expire_frag_queue()")
> >        in the linux-4.4.y stable tree.
> > (based-on: f78a3f45e7 ("ip: use rb trees for IP frag queue." 4.4.y))
> > Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
> > ---
>
> Applied to xenial/master-next and cosmic/master-next. Thanks.

fyi, cosmic/master-next is failing to build for me after this patch:
  https://paste.ubuntu.com/p/kjKJBk6BRN/

  -dann

> >  net/ipv6/reassembly.c | 26 +++++++++++++++++++++-----
> >  1 file changed, 21 insertions(+), 5 deletions(-)
> >
> > diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
> > index ec917f58d105..5b09ce54c476 100644
> > --- a/net/ipv6/reassembly.c
> > +++ b/net/ipv6/reassembly.c
> > @@ -92,7 +92,7 @@ EXPORT_SYMBOL(ip6_frag_init);
> >  void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
> >  {
> >       struct net_device *dev = NULL;
> > -     struct sk_buff *head;
> > +     struct sk_buff *head = NULL;
> >
> >       rcu_read_lock();
> >       spin_lock(&fq->q.lock);
> > @@ -110,26 +110,42 @@ void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
> >       IP6_INC_STATS_BH(net, __in6_dev_get(dev), IPSTATS_MIB_REASMTIMEOUT);
> >
> >       /* Don't send error if the first segment did not arrive. */
> > -     head = fq->q.fragments;
> > -     if (!(fq->q.flags & INET_FRAG_FIRST_IN) || !head)
> > +     if (!(fq->q.flags & INET_FRAG_FIRST_IN))
> >               goto out;
> >
> > +     if (fq->q.fragments) {
> > +             head = fq->q.fragments;
> > +             fq->q.fragments = head->next;
> > +     } else {
> > +             head = skb_rb_first(&fq->q.rb_fragments);
> > +             if (!head)
> > +                     goto out;
> > +             rb_erase(&head->rbnode, &fq->q.rb_fragments);
> > +             memset(&head->rbnode, 0, sizeof(head->rbnode));
> > +             barrier();
> > +     }
> > +
> > +     if (head == fq->q.fragments_tail)
> > +             fq->q.fragments_tail = NULL;
> > +
> > +     sub_frag_mem_limit(fq->q.net, head->truesize);
> > +
> >       /* But use as source device on which LAST ARRIVED
> >        * segment was received. And do not use fq->dev
> >        * pointer directly, device might already disappeared.
> >        */
> >       head->dev = dev;
> > -     skb_get(head);
> >       spin_unlock(&fq->q.lock);
> >
> >       icmpv6_send(head, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0);
> > -     kfree_skb(head);
> >       goto out_rcu_unlock;
> >
> >  out:
> >       spin_unlock(&fq->q.lock);
> >  out_rcu_unlock:
> >       rcu_read_unlock();
> > +     if (head)
> > +             kfree_skb(head);
> >       inet_frag_put(&fq->q);
> >  }
> >  EXPORT_SYMBOL(ip6_expire_frag_queue);
> >
>
>
> --
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
Stefan Bader June 11, 2019, 6:58 a.m. UTC | #5
On 10.06.19 17:49, dann frazier wrote:
> On Thu, Jun 6, 2019 at 5:22 AM Stefan Bader <stefan.bader@canonical.com> wrote:
>>
>> On 06.06.19 12:24, Stefan Bader wrote:
>>> BugLink: https://bugs.launchpad.net/bugs/1824687
>>>
>>> The backport of
>>>
>>>   05c0b86b96 ("ipv6: frags: rewrite ip6_expire_frag_queue()")
>>>
>>> to linux-4.4.y stable changed ip6_expire_frag_queue() to be similar
>>> to ip_expire(). However, using skb_get() leads to a crash while
>>> sending the ICMP message due to a check for shared SKBs.
>>>
>>>    kernel BUG at linux-4.4.0/net/core/skbuff.c:1207!
>>>    RIP: 0010:[<ffffffff81740953>]
>>>     [<ffffffff81740953>] pskb_expand_head+0x243/0x250
>>>     [<ffffffff81740e50>] __pskb_pull_tail+0x50/0x350
>>>     [<ffffffff8183939a>] _decode_session6+0x26a/0x400
>>>     [<ffffffff817ec719>] __xfrm_decode_session+0x39/0x50
>>>     [<ffffffff818239d0>] icmpv6_route_lookup+0xf0/0x1c0
>>>     [<ffffffff81824421>] icmp6_send+0x5e1/0x940
>>>     [<ffffffff8183d431>] icmpv6_send+0x21/0x30
>>>     [<ffffffff8182b500>] ip6_expire_frag_queue+0xe0/0x120
>>>
>>> For IPv4 the ip_expire() function however did change considerably
>>> since then. In
>>>
>>>   fa0f527358 ("ip: use rb trees for IP frag queue.")
>>>
>>> the SKB might be taken from a rbtree (use of rbtrees for IPv4 was
>>> backported to 4.4.y upstream).
>>> Along with those obvious changes, the code also is modified to
>>> actually de-queue the SKB from whichever source it was taken.
>>> This also got rid of the skb_get() which causes problems in
>>> icmpv6_send(). And latest upstream code uses inet_frag_pull_head()
>>> which does the same.
>>>
>>> To fix the crash in IPv6, we use the same modifications added
>>> to ip_expire() by fa0f527358. This might be too much change for
>>> now because IPv6 only starts using rbtrees for frags with
>>>
>>>   997dd96471 ("net: IP6 defrag: use rbtrees in nf_conntrack_reasm.c")
>>>
>>> which has not been backported to 4.4.y. Testing by a reporter was
>>> showing good results. Likely the else part never gets used until
>>> 997dd96471 is backported, too. And that needs more changes.
>>> Some upstream (stable) discussion was started but has not yet
>>> resulted in any usable results. So adding this as SAUCE for now
>>> to get the kernel stable (based on testing).
>>>
>>> Fixes: bf8187348f ("ipv6: frags: rewrite ip6_expire_frag_queue()")
>>>        in the linux-4.4.y stable tree.
>>> (based-on: f78a3f45e7 ("ip: use rb trees for IP frag queue." 4.4.y))
>>> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
>>> ---
>>
>> Applied to xenial/master-next and cosmic/master-next. Thanks.
> 
> fyi, cosmic/master-next is failing to build for me after this patch:
>   https://paste.ubuntu.com/p/kjKJBk6BRN/

Hrm, I should have checked whether rbtree usage was added to ipv4 in 4.18. Was
fooled by the ipv6 function looking just like in the 4.4.y backports. And the
ip_expire() function back then seems to just do the skb_get(). So either
everything is even more horribly broken or the bug statement was not added.
Since we have no explicit reports for Cosmic, we should drop it there for now.

-Stefan
> 
>   -dann
> 
>>>  net/ipv6/reassembly.c | 26 +++++++++++++++++++++-----
>>>  1 file changed, 21 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
>>> index ec917f58d105..5b09ce54c476 100644
>>> --- a/net/ipv6/reassembly.c
>>> +++ b/net/ipv6/reassembly.c
>>> @@ -92,7 +92,7 @@ EXPORT_SYMBOL(ip6_frag_init);
>>>  void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
>>>  {
>>>       struct net_device *dev = NULL;
>>> -     struct sk_buff *head;
>>> +     struct sk_buff *head = NULL;
>>>
>>>       rcu_read_lock();
>>>       spin_lock(&fq->q.lock);
>>> @@ -110,26 +110,42 @@ void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
>>>       IP6_INC_STATS_BH(net, __in6_dev_get(dev), IPSTATS_MIB_REASMTIMEOUT);
>>>
>>>       /* Don't send error if the first segment did not arrive. */
>>> -     head = fq->q.fragments;
>>> -     if (!(fq->q.flags & INET_FRAG_FIRST_IN) || !head)
>>> +     if (!(fq->q.flags & INET_FRAG_FIRST_IN))
>>>               goto out;
>>>
>>> +     if (fq->q.fragments) {
>>> +             head = fq->q.fragments;
>>> +             fq->q.fragments = head->next;
>>> +     } else {
>>> +             head = skb_rb_first(&fq->q.rb_fragments);
>>> +             if (!head)
>>> +                     goto out;
>>> +             rb_erase(&head->rbnode, &fq->q.rb_fragments);
>>> +             memset(&head->rbnode, 0, sizeof(head->rbnode));
>>> +             barrier();
>>> +     }
>>> +
>>> +     if (head == fq->q.fragments_tail)
>>> +             fq->q.fragments_tail = NULL;
>>> +
>>> +     sub_frag_mem_limit(fq->q.net, head->truesize);
>>> +
>>>       /* But use as source device on which LAST ARRIVED
>>>        * segment was received. And do not use fq->dev
>>>        * pointer directly, device might already disappeared.
>>>        */
>>>       head->dev = dev;
>>> -     skb_get(head);
>>>       spin_unlock(&fq->q.lock);
>>>
>>>       icmpv6_send(head, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0);
>>> -     kfree_skb(head);
>>>       goto out_rcu_unlock;
>>>
>>>  out:
>>>       spin_unlock(&fq->q.lock);
>>>  out_rcu_unlock:
>>>       rcu_read_unlock();
>>> +     if (head)
>>> +             kfree_skb(head);
>>>       inet_frag_put(&fq->q);
>>>  }
>>>  EXPORT_SYMBOL(ip6_expire_frag_queue);
>>>
>>
>>
>> --
>> kernel-team mailing list
>> kernel-team@lists.ubuntu.com
>> https://lists.ubuntu.com/mailman/listinfo/kernel-team
Stefan Bader June 11, 2019, 7:21 a.m. UTC | #6
For now dropped from Cosmic.
diff mbox series

Patch

diff --git a/net/ipv6/reassembly.c b/net/ipv6/reassembly.c
index ec917f58d105..5b09ce54c476 100644
--- a/net/ipv6/reassembly.c
+++ b/net/ipv6/reassembly.c
@@ -92,7 +92,7 @@  EXPORT_SYMBOL(ip6_frag_init);
 void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
 {
 	struct net_device *dev = NULL;
-	struct sk_buff *head;
+	struct sk_buff *head = NULL;
 
 	rcu_read_lock();
 	spin_lock(&fq->q.lock);
@@ -110,26 +110,42 @@  void ip6_expire_frag_queue(struct net *net, struct frag_queue *fq)
 	IP6_INC_STATS_BH(net, __in6_dev_get(dev), IPSTATS_MIB_REASMTIMEOUT);
 
 	/* Don't send error if the first segment did not arrive. */
-	head = fq->q.fragments;
-	if (!(fq->q.flags & INET_FRAG_FIRST_IN) || !head)
+	if (!(fq->q.flags & INET_FRAG_FIRST_IN))
 		goto out;
 
+	if (fq->q.fragments) {
+		head = fq->q.fragments;
+		fq->q.fragments = head->next;
+	} else {
+		head = skb_rb_first(&fq->q.rb_fragments);
+		if (!head)
+			goto out;
+		rb_erase(&head->rbnode, &fq->q.rb_fragments);
+		memset(&head->rbnode, 0, sizeof(head->rbnode));
+		barrier();
+	}
+
+	if (head == fq->q.fragments_tail)
+		fq->q.fragments_tail = NULL;
+
+	sub_frag_mem_limit(fq->q.net, head->truesize);
+
 	/* But use as source device on which LAST ARRIVED
 	 * segment was received. And do not use fq->dev
 	 * pointer directly, device might already disappeared.
 	 */
 	head->dev = dev;
-	skb_get(head);
 	spin_unlock(&fq->q.lock);
 
 	icmpv6_send(head, ICMPV6_TIME_EXCEED, ICMPV6_EXC_FRAGTIME, 0);
-	kfree_skb(head);
 	goto out_rcu_unlock;
 
 out:
 	spin_unlock(&fq->q.lock);
 out_rcu_unlock:
 	rcu_read_unlock();
+	if (head)
+		kfree_skb(head);
 	inet_frag_put(&fq->q);
 }
 EXPORT_SYMBOL(ip6_expire_frag_queue);