diff mbox

[0/2,BUG,FIXES,-,3.10.27] sit: More backports

Message ID 52E9267C.90403@6wind.com
State RFC, archived
Delegated to: David Miller
Headers show

Commit Message

Nicolas Dichtel Jan. 29, 2014, 4:04 p.m. UTC
Le 29/01/2014 13:57, Steven Rostedt a écrit :
> On Wed, 29 Jan 2014 12:04:05 +0100
> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>
>> Le 28/01/2014 21:57, Steven Rostedt a écrit :
>>> At Red Hat we base our real-time kernel off of 3.10.27 and do lots of
>>> stress testing on that kernel. This has discovered some bugs that we
>>> can hit with the vanilla 3.10.27 kernel (no -rt patches applied).
>>>
>>> I sent out a bug fix that can cause a crash with the current 3.10.27
>>> when you add and then remove the sit module. That patch is obsoleted by
>>> these patches, as that patch was not enough.
>> Can you explain a bit more which problem remains after that patch?
>> I wonder if a problem remains also with ip6_tunnel.ko (net/ipv6/ip6_tunnel.c),
>> the same problem was spotted into this module.
>
> Hmm, OK it may only need the first version of the patch. It was one of
> those cases where it didn't fix the second bug, so we added the full
> patch as well. Then we found the second bug but never tried all the
> tests with the smaller version of the patch. I'll put back the first
> patch and then ask our QA to retest it with the older version and the
> other patch.
>
>>
>>>
>>> A previous patch that was backported:
>>>
>>>     Upstream commit 205983c43700ac3a81e7625273a3fa83cd2759b5
>>>     sit: allow to use rtnl ops on fb tunnel
>>>
>>> Had a depenency on commit 5e6700b3bf98 ("sit: add support of x-netns")
>>> which was not backported. The dependency was only on part of that
>>> commit which is what I backported.
>> I cannot comment directly the patch, it was an attachement, hence I put my
>
> It's not really an attachment. It was sent with quilt mail, which only
> makes it look like one. What mail client are you using? mutt, alpine,
> evolution, claws-mail, and thunderbird all show it as a normal patch.
> I do know that k9-mail on android makes it into an attachment.
Thunderbird. The patch was show as a normal aptch, but when I do "reply all", I
get an empty mail.

>
>> comments here.
>> In patch 0001-sit-Unregister-sit-devices-with-rtnl_link_ops.patch, I wonder how
>> 'if (dev_net(t->dev) != net)' can be wrong. If commit 5e6700b3bf98 ("sit: add
>> support of x-netns") has not been backported, this test is always true.
>
> Should it just be removed then? This has fixed our bugs, but perhaps it
> opened new ones we haven't detected yet. I can remove the if and
> unregister and see if it still works. Or perhaps calling unregister all
> the time isn't bad.
Ok, I've think a bit more to this problem. First, let's explain it.

rmmod sit
=> sit_cleanup()
   => rtnl_link_unregister()
     => __rtnl_kill_links()
       => for_each_netdev(net, dev) {
            if (dev->rtnl_link_ops == ops)
               ops->dellink(dev, &list_kill);
          }
  **at this point, the FB device is deleted (and all sit tunnels)**

   => unregister_pernet_device()
     => unregister_pernet_operations()
       => ops_exit_list()
         => sit_exit_net()
           => sit_destroy_tunnels()
              in this function, no tunnel is found
           unregister_netdevice_queue(sitn->fb_tunnel_dev, &list);
   ** second deletion, here is the bug **

Now, what happen on netns deletion with the previous patch? Only sit_exit_net()
will be called, hence with the previous patch, the fb device will not be
destroyed, netns will leak.

Your patch serie seems to be the good way to go (note that patch 1/2 does not
compile) but I think the fix is smaller because we don't have x-netns.

Here is my proposal, if you agree, I will send the same patch for ip6_tunnnel,
which have the netns leak.


 From d101450583c3a472a2a94904cfe13fd4e7d2f519 Mon Sep 17 00:00:00 2001
From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Date: Wed, 29 Jan 2014 16:40:30 +0100
Subject: [PATCH] sit: fix double free of fb_tunnel_dev on exit

This problem was fixed upstream by commit 9434266f2c64 ("sit: fix use after free
of fb_tunnel_dev").
The upstream patch depends on upstream commit 5e6700b3bf98 ("sit: add support of
x-netns"), which was not backported into 3.10 branch.

First, explain the problem: when the sit module is unloaded, sit_cleanup() is
called.
rmmod sit
=> sit_cleanup()
   => rtnl_link_unregister()
     => __rtnl_kill_links()
       => for_each_netdev(net, dev) {
         if (dev->rtnl_link_ops == ops)
         	ops->dellink(dev, &list_kill);
         }
At this point, the FB device is deleted (and all sit tunnels).
   => unregister_pernet_device()
     => unregister_pernet_operations()
       => ops_exit_list()
         => sit_exit_net()
           => sit_destroy_tunnels()
           In this function, no tunnel is found.
           => unregister_netdevice_queue(sitn->fb_tunnel_dev, &list);
We delete the FB device a second time here!

Because we cannot simply remove the second deletion (sit_exit_net() must remove
the FB device when a netns is deleted), we add an rtnl ops which delete all sit
device excepting the FB device and thus we can keep the explicit deletion in
sit_exit_net().

CC: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
---
  net/ipv6/sit.c | 10 ++++++++++
  1 file changed, 10 insertions(+)

  	.maxtype	= IFLA_IPTUN_MAX,
@@ -1517,6 +1526,7 @@ static struct rtnl_link_ops sit_link_ops __read_mostly = {
  	.changelink	= ipip6_changelink,
  	.get_size	= ipip6_get_size,
  	.fill_info	= ipip6_fill_info,
+	.dellink	= ipip6_dellink,
  };

  static struct xfrm_tunnel sit_handler __read_mostly = {

Comments

Willem de Bruijn Jan. 29, 2014, 5:21 p.m. UTC | #1
> From d101450583c3a472a2a94904cfe13fd4e7d2f519 Mon Sep 17 00:00:00 2001
> From: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> Date: Wed, 29 Jan 2014 16:40:30 +0100
> Subject: [PATCH] sit: fix double free of fb_tunnel_dev on exit
>
> This problem was fixed upstream by commit 9434266f2c64 ("sit: fix use after free
> of fb_tunnel_dev").
> The upstream patch depends on upstream commit 5e6700b3bf98 ("sit: add support of
> x-netns"), which was not backported into 3.10 branch.
>
> First, explain the problem: when the sit module is unloaded, sit_cleanup() is
> called.
> rmmod sit
> => sit_cleanup()
>   => rtnl_link_unregister()
>     => __rtnl_kill_links()
>       => for_each_netdev(net, dev) {
>         if (dev->rtnl_link_ops == ops)
>                 ops->dellink(dev, &list_kill);
>         }
> At this point, the FB device is deleted (and all sit tunnels).
>   => unregister_pernet_device()
>     => unregister_pernet_operations()
>       => ops_exit_list()
>         => sit_exit_net()
>           => sit_destroy_tunnels()
>           In this function, no tunnel is found.
>           => unregister_netdevice_queue(sitn->fb_tunnel_dev, &list);
> We delete the FB device a second time here!
>
> Because we cannot simply remove the second deletion (sit_exit_net() must remove
> the FB device when a netns is deleted), we add an rtnl ops which delete all sit
> device excepting the FB device and thus we can keep the explicit deletion in
> sit_exit_net().
>
> CC: Steven Rostedt <rostedt@goodmis.org>
> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
> ---
>  net/ipv6/sit.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
> index 0491264b8bfc..620d326e8fdd 100644
> --- a/net/ipv6/sit.c
> +++ b/net/ipv6/sit.c
> @@ -1507,6 +1507,15 @@ static const struct nla_policy ipip6_policy[IFLA_IPTUN_MAX + 1] = {
>  #endif
>  };
>
> +static void ipip6_dellink(struct net_device *dev, struct list_head *head)
> +{
> +       struct net *net = dev_net(dev);
> +       struct sit_net *sitn = net_generic(net, sit_net_id);
> +
> +       if (dev != sitn->fb_tunnel_dev)
> +               unregister_netdevice_queue(dev, head);
> +}
> +
>  static struct rtnl_link_ops sit_link_ops __read_mostly = {
>         .kind           = "sit",
>         .maxtype        = IFLA_IPTUN_MAX,
> @@ -1517,6 +1526,7 @@ static struct rtnl_link_ops sit_link_ops __read_mostly = {
>         .changelink     = ipip6_changelink,
>         .get_size       = ipip6_get_size,
>         .fill_info      = ipip6_fill_info,
> +       .dellink        = ipip6_dellink,
>  };
>
>  static struct xfrm_tunnel sit_handler __read_mostly = {
> --
> 1.8.4.1


This looks good to me. It is the same as the backport "sit: fix use
after free of fb_tunnel_dev" (9434266f2c64), minus the small code
cleanup at the end of that patch that is not relevant to 3.10.27 (do
not define sit_net *sitn in sit_exit_net if it is not used there. But in
3.10.27 it is used, in unregister_netdevice_queue).
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steven Rostedt Jan. 29, 2014, 8:48 p.m. UTC | #2
On Wed, 29 Jan 2014 17:04:12 +0100
Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
 
> Your patch serie seems to be the good way to go (note that patch 1/2 does not
> compile) but I think the fix is smaller because we don't have x-netns.
> 
> Here is my proposal, if you agree, I will send the same patch for ip6_tunnnel,
> which have the netns leak.
> 

Hold on. Seems that the kernels that were being tested in QA had more
code than what I was testing. Clark had backported "sit: fix use after
free of fb_tunnel_dev" and that was what was causing the
unlist_netdevice() to be missed.

When I started working on vanilla 3.10.27 as well, I first did my
original patch (which just removes the call to
unregister_netdevice_queue() from sit_exit_net()). I asked to have that
added to our kernel for testing, and they told me it was already there
via Clark's backport. Then I did the full backport as well and looked
for the leak. I'm now thinking that the full backport is not needed as
that was what caused the leak.

According to commit 9434266f2c645d4fcf62a03a8e36ad8075e37943 "sit: fix
use after free of fb_tunnel_dev", it states:

    Bug: The fallback device is created in sit_init_net and assumed to
    be freed in sit_exit_net. First, it is dereferenced in that
    function, in sit_destroy_tunnels:
    
            struct net *net = dev_net(sitn->fb_tunnel_dev);
    
    Prior to this, rtnl_unlink_register has removed all devices that
    match rtnl_link_ops == sit_link_ops.

    Commit 205983c43700 added the line
    
    +       sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops;
    
    which cases the fallback device to match here and be freed before it
    is last dereferenced.

Commit 205983c43700 was backported to 3.10, but without commit
5e6700b3bf98 "sit: add support of x-netns" which was what added the

  net = dev_net(sitn->fb_tunnel_dev);

Which looks to me that the only reason I need to port back commit
5e6700b3bf98 is if I add the full backport of 9434266f2c645d4f.

Seems to me that my original patch may be good enough. The one that I
said this series obsoletes.

Note, I've talked with the people that are doing the testing, and I'm
having them revert all changes except for that one fix and rerun the
tests again. I should know the results by tomorrow.

Let me know if "sit: fix use after free of fb_tunnel_dev" still needs
to be backported due to some other way that the fallback device can be
dereferenced after being freed.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicolas Dichtel Jan. 30, 2014, 9:28 a.m. UTC | #3
Le 29/01/2014 21:48, Steven Rostedt a écrit :
> On Wed, 29 Jan 2014 17:04:12 +0100
> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>
>> Your patch serie seems to be the good way to go (note that patch 1/2 does not
>> compile) but I think the fix is smaller because we don't have x-netns.
>>
>> Here is my proposal, if you agree, I will send the same patch for ip6_tunnnel,
>> which have the netns leak.
>>
>
> Hold on. Seems that the kernels that were being tested in QA had more
> code than what I was testing. Clark had backported "sit: fix use after
> free of fb_tunnel_dev" and that was what was causing the
> unlist_netdevice() to be missed.
>
> When I started working on vanilla 3.10.27 as well, I first did my
> original patch (which just removes the call to
> unregister_netdevice_queue() from sit_exit_net()). I asked to have that
> added to our kernel for testing, and they told me it was already there
> via Clark's backport. Then I did the full backport as well and looked
> for the leak. I'm now thinking that the full backport is not needed as
> that was what caused the leak.
>
> According to commit 9434266f2c645d4fcf62a03a8e36ad8075e37943 "sit: fix
> use after free of fb_tunnel_dev", it states:
>
>      Bug: The fallback device is created in sit_init_net and assumed to
>      be freed in sit_exit_net. First, it is dereferenced in that
>      function, in sit_destroy_tunnels:
>
>              struct net *net = dev_net(sitn->fb_tunnel_dev);
>
>      Prior to this, rtnl_unlink_register has removed all devices that
>      match rtnl_link_ops == sit_link_ops.
>
>      Commit 205983c43700 added the line
>
>      +       sitn->fb_tunnel_dev->rtnl_link_ops = &sit_link_ops;
>
>      which cases the fallback device to match here and be freed before it
>      is last dereferenced.
>
> Commit 205983c43700 was backported to 3.10, but without commit
> 5e6700b3bf98 "sit: add support of x-netns" which was what added the
>
>    net = dev_net(sitn->fb_tunnel_dev);
>
> Which looks to me that the only reason I need to port back commit
> 5e6700b3bf98 is if I add the full backport of 9434266f2c645d4f.
>
> Seems to me that my original patch may be good enough. The one that I
> said this series obsoletes.
>
> Note, I've talked with the people that are doing the testing, and I'm
> having them revert all changes except for that one fix and rerun the
> tests again. I should know the results by tomorrow.
>
> Let me know if "sit: fix use after free of fb_tunnel_dev" still needs
> to be backported due to some other way that the fallback device can be
> dereferenced after being freed.

Steve, I think the patch I sent yesterday is the good fix. At the end, it's
a backport of Willem's patch. Note that he also ack that patch.
The first version you sent (which removes
unregister_netdevice_queue(sitn->fb_tunnel_dev, &list)) will introduce a
memory leak when the user destroy a netns.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steven Rostedt Jan. 30, 2014, 1:31 p.m. UTC | #4
On Thu, 30 Jan 2014 10:28:43 +0100
Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:


> Steve, I think the patch I sent yesterday is the good fix. At the end, it's
> a backport of Willem's patch. Note that he also ack that patch.
> The first version you sent (which removes
> unregister_netdevice_queue(sitn->fb_tunnel_dev, &list)) will introduce a
> memory leak when the user destroy a netns.

Hi Nicolas,

I reverted my patches and applied and tested your patches locally and
they passed my first line testing. I'm going to have them entered into
our test suite, after removing our other patches, and see if they solve
all the bugs that we were tripping over.

I'll let you know when these are finished.

Thanks!

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nicolas Dichtel Jan. 30, 2014, 1:42 p.m. UTC | #5
Le 30/01/2014 14:31, Steven Rostedt a écrit :
> On Thu, 30 Jan 2014 10:28:43 +0100
> Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:
>
>
>> Steve, I think the patch I sent yesterday is the good fix. At the end, it's
>> a backport of Willem's patch. Note that he also ack that patch.
>> The first version you sent (which removes
>> unregister_netdevice_queue(sitn->fb_tunnel_dev, &list)) will introduce a
>> memory leak when the user destroy a netns.
>
> Hi Nicolas,
>
> I reverted my patches and applied and tested your patches locally and
> they passed my first line testing. I'm going to have them entered into
> our test suite, after removing our other patches, and see if they solve
> all the bugs that we were tripping over.
>
> I'll let you know when these are finished.
Thank you for testing.


Regards,
Nicolas
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Steven Rostedt Jan. 30, 2014, 10:10 p.m. UTC | #6
On Thu, 30 Jan 2014 14:42:57 +0100
Nicolas Dichtel <nicolas.dichtel@6wind.com> wrote:


> > I'll let you know when these are finished.
> Thank you for testing.

Our short tests that were crashing have all now passed :-) with your
patches applied.

We are now running our full tier tests on the code, but I don't think
we need to wait. Please add my tags to your patches, and hopefully we
can get David to give us a new blessing on your patch set.

David and Greg,

You can ignore my patches as it seems that Nicolas's patches solve the
issues we have been seeing on 3.10.

Nicolas,

Thanks for all your effort in helping us out here :-)

Reported-by: Steven Rostedt <srostedt@redhat.com>
Tested-by: Steven Rostedt <srostedt@redhat.com> (and our entire MRG team)
Tested-by: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
Tested-by: John Kacur <jkacur@redhat.com>

I don't usually give my Red Hat email for things like this, but as this
was found and tested with the Red Hat infrastructure, with the help
from Luis and John (and others), I would like to give them credit too.

-- Steve
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv6/sit.c b/net/ipv6/sit.c
index 0491264b8bfc..620d326e8fdd 100644
--- a/net/ipv6/sit.c
+++ b/net/ipv6/sit.c
@@ -1507,6 +1507,15 @@  static const struct nla_policy 
ipip6_policy[IFLA_IPTUN_MAX + 1] = {
  #endif
  };

+static void ipip6_dellink(struct net_device *dev, struct list_head *head)
+{
+	struct net *net = dev_net(dev);
+	struct sit_net *sitn = net_generic(net, sit_net_id);
+
+	if (dev != sitn->fb_tunnel_dev)
+		unregister_netdevice_queue(dev, head);
+}
+
  static struct rtnl_link_ops sit_link_ops __read_mostly = {
  	.kind		= "sit",