mbox

[RFC/Review] Prevent network namespace memory exhaustion

Message ID 1300981277-26338-1-git-send-email-stefan.bader@canonical.com
State New
Headers show

Pull-request

git://kernel.ubuntu.com/smb/ubuntu-lucid netnsbpv2

Message

Stefan Bader March 24, 2011, 3:41 p.m. UTC
BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/720095

This series of patches addresses a problem that we caught after
enabling network namespaces (CONFIG_NET_NS) in Lucid, which was done
(although the feature was still marked experimental) to support
containerized use cases (and we would get complaints if we removed
it now).

I tried to come up with a usable solution. Unfortunately, picking the
minimal set of patches that prevents the memory buildup also causes the
rate of connects (which in this case makes heavy use of network
namespace cloning) to go down noticeably.
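
To illustrate what "network namespace cloning" means in this workload,
here is a minimal sketch of a daemon that gives every session its own
namespace via clone(CLONE_NEWNET). The handler and stack size are
hypothetical (this is not vsftp's actual code); the point is that
every connect creates a namespace and every disconnect destroys one:

#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static char child_stack[16 * 1024];

/* hypothetical per-session handler; runs with a private network stack */
static int handle_session(void *arg)
{
        return 0;
}

int main(void)
{
        for (;;) {
                /* accept() omitted; each session gets its own netns
                 * (creating one needs CAP_SYS_ADMIN) */
                pid_t pid = clone(handle_session,
                                  child_stack + sizeof(child_stack),
                                  CLONE_NEWNET | SIGCHLD, NULL);
                if (pid < 0)
                        exit(1);
                /* when the child exits its namespace is torn down;
                 * that teardown path is what these patches touch */
                waitpid(pid, NULL, 0);
        }
}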

The second half improves the situation slightly, but still not as much
as was achieved in Maverick. And using the Maverick backport causes
other problems in the specific case for which the bug was reported.

To quantify that a bit better:

Lucid current		10 connections per second
Lucid set 1		 1 connection every  2 seconds
Lucid set 2		 2 connections every 3 seconds
Maverick		 2 connections per second

There has been no way to verify how bad the impact of the slowdown
would be in a real production environment. So it might be a viable
approach to limit the changes to the first set, assuming that creating
and destroying namespaces is not the common use case we have.

Should there be performance complaints, we could still consider taking
a closer look at the second set (or more).

So generally, does this sound like an approach we can SRU? And
second, more eyes looking at the set(s) would be appreciated.

-Stefan

These are enough to prevent memory from being eaten (a simplified
sketch of the batching pattern follows the list):
* net: Introduce unregister_netdevice_queue()
* net: Introduce unregister_netdevice_many()
* net: add a list_head parameter to dellink() method
* veth: Fix veth_dellink method
* veth: Fix unregister_netdevice_queue for veth
* net: Implement for_each_netdev_reverse.
* net: Batch network namespace destruction.
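
As an illustration of what the first set changes, here is a simplified
sketch (not the verbatim upstream code; the surrounding function names
are hypothetical, and locking and error handling are omitted). Instead
of paying one RCU grace period per device, callers queue devices on a
list and flush the whole batch at once:

#include <linux/list.h>
#include <linux/netdevice.h>

/* old shape: every unregister pays its own RCU grace period */
static void teardown_one(struct net_device *dev)
{
        unregister_netdevice(dev);
}

/* new shape: dellink() just queues the device on the caller's list... */
static void example_dellink(struct net_device *dev, struct list_head *head)
{
        unregister_netdevice_queue(dev, head);
}

/* ...and the caller tears down the whole batch in one call, so N
 * devices share a single grace period instead of paying N of them */
static void teardown_many(struct list_head *head)
{
        unregister_netdevice_many(head);
}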

These seem to improve the rate of connects to vsftp (though not as
much as Maverick; a conceptual sketch follows the list):
* net: Automatically allocate per namespace data.
* net: Add support for batching network namespace cleanups
* netns: Add an explicit rcu_barrier to unregister_pernet_{device|subsys}
* net: Use rcu lookups in inet_twsk_purge.
* tcp: fix inet_twsk_deschedule()
* net: Batch inet_twsk_purge
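
The common thread in this second set is amortizing RCU grace periods
over a whole batch of dying namespaces instead of paying one per
namespace. A conceptual sketch, assuming a hypothetical helper
run_pernet_exits() standing in for the pernet exit calls (the real
code lives in net/core/net_namespace.c and differs in detail):

#include <linux/list.h>
#include <linux/rcupdate.h>
#include <net/net_namespace.h>

static void run_pernet_exits(struct net *net);  /* hypothetical helper */

/* old shape: one namespace at a time, so N dying namespaces meant
 * N synchronize_rcu() waits */
static void cleanup_one(struct net *net)
{
        run_pernet_exits(net);
        synchronize_rcu();
}

/* new shape: run every subsystem's exit over the whole batch, then
 * wait for a single grace period covering all of them */
static void cleanup_batch(struct list_head *dead_list)
{
        struct net *net;

        list_for_each_entry(net, dead_list, cleanup_list)
                run_pernet_exits(net);
        synchronize_rcu();
}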

The following changes since commit 054b34d3a38dc2a775ab722411b934b52a33707f:
  Brad Figg (1):
        UBUNTU: Ubuntu-2.6.32-31.60

are available in the git repository at:

  git://kernel.ubuntu.com/smb/ubuntu-lucid netnsbpv2

Eric Dumazet (5):
      net: Introduce unregister_netdevice_queue()
      net: Introduce unregister_netdevice_many()
      net: add a list_head parameter to dellink() method
      veth: Fix veth_dellink method
      tcp: fix inet_twsk_deschedule()

Eric W. Biederman (8):
      veth: Fix unregister_netdevice_queue for veth
      net: Implement for_each_netdev_reverse.
      net: Batch network namespace destruction.
      net: Automatically allocate per namespace data.
      net: Add support for batching network namespace cleanups
      netns: Add an explicit rcu_barrier to unregister_pernet_{device|subsys}
      net: Use rcu lookups in inet_twsk_purge.
      net: Batch inet_twsk_purge

 drivers/net/macvlan.c            |    6 +-
 drivers/net/veth.c               |    6 +-
 include/linux/netdevice.h        |   12 ++-
 include/net/inet_timewait_sock.h |    6 +-
 include/net/net_namespace.h      |   32 ++++-
 include/net/rtnetlink.h          |    3 +-
 net/8021q/vlan.c                 |    8 +-
 net/8021q/vlan.h                 |    2 +-
 net/core/dev.c                   |  120 ++++++++++-----
 net/core/net_namespace.c         |  296 +++++++++++++++++++++++---------------
 net/core/rtnetlink.c             |   14 +-
 net/ipv4/inet_timewait_sock.c    |   47 ++++---
 net/ipv4/tcp_ipv4.c              |   11 +-
 net/ipv6/tcp_ipv6.c              |   11 +-
 14 files changed, 369 insertions(+), 205 deletions(-)

Comments

Tim Gardner March 25, 2011, 3 a.m. UTC | #1
On 03/24/2011 09:41 AM, Stefan Bader wrote:
> [...]

That's a honking big patch set for an SRU. It's not clear to me from the
commit logs, but I assume they are all clean cherry-picks?

I'm still not convinced that CONFIG_NET_NS=n isn't the best solution,
despite the complaints that change might elicit. I'd like to hear from
the consumers of network namespaces about how they are using the
feature, and possible workarounds if it were to go away.

rtg
Daniel Lezcano March 25, 2011, 9:46 a.m. UTC | #2
On 03/25/2011 04:00 AM, Tim Gardner wrote:
> On 03/24/2011 09:41 AM, Stefan Bader wrote:
>> [...]
>
> That's a honking big patch set for an SRU. It's not clear to me from the
> commit logs, but I assume they are all clean cherry-picks?
>
> I'm still not convinced that CONFIG_NET_NS=n isn't the best solution,
> despite the complaints that change might elicit. I'd like to hear from
> the consumers of network namespaces about how they are using the
> feature, and possible workarounds if it were to go away.

The users are heavily using all the namespaces and cgroups through
Linux Containers (http://lxc.sourceforge.net).
There is no workaround if it is not set. If you remove this feature,
IMO people will really complain.

The patchset providing the batching was introduced to speed up
network namespace destruction. Before these patches, destroying
thousands of network namespaces was taking a very long time (AFAIR,
about 20 minutes). With this patchset it takes 2 minutes.
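
As a rough illustration of the cost described here (a hypothetical
reproducer, not the test actually used): each unshare(CLONE_NEWNET)
gives the process a fresh namespace and orphans the previous one,
which the kernel then has to tear down.

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
        int i;

        /* needs root (CAP_SYS_ADMIN); creates and orphans one
         * network namespace per iteration */
        for (i = 0; i < 1000; i++) {
                if (unshare(CLONE_NEWNET) < 0) {
                        perror("unshare");
                        return 1;
                }
        }
        return 0;       /* ~1000 namespace teardowns now pending */
}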
Tim Gardner March 25, 2011, 12:57 p.m. UTC | #3
On 03/25/2011 03:46 AM, Daniel Lezcano wrote:
> [...]
>
> The users are heavily using all the namespaces and cgroups through
> Linux Containers (http://lxc.sourceforge.net).
> There is no workaround if it is not set. If you remove this feature,
> IMO people will really complain.
>
> The patchset providing the batching was introduced to speed up
> network namespace destruction. Before these patches, destroying
> thousands of network namespaces was taking a very long time (AFAIR,
> about 20 minutes). With this patchset it takes 2 minutes.
>

You aren't telling me _how_ network namespaces are being used. For
example, I know of a commercial workload using vsftp. They didn't set
out to use NET_NS; they just got it for free because the option is
turned on. Not only does NET_NS slow down socket teardown, but it leaks
enough memory that eventually the OOM killer cranks up. While vsftp is
using NET_NS, it's not dependent on it and would function perfectly
fine without it, and quite a bit faster.

NET_NS was not ready for prime time in 2.6.32 and should never have been 
enabled in an LTS kernel.

rtg
John Johansen March 25, 2011, 1:16 p.m. UTC | #4
On 03/24/2011 08:00 PM, Tim Gardner wrote:
> On 03/24/2011 09:41 AM, Stefan Bader wrote:
>> BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/720095
>>
>> This series of patches tries to cover a problem that we caught by
>> enabling network namespaces (CONFIG_NET_NS) in Lucid, which was done
>> (although the feature was still marked experimental) to support
>> containerize usecases (and we would get some complaints by
>> removing it now).
>>
>> I tried to come up with some usable solution. Unfortunately picking the
>> minimal set of patches which prevents the memory buildup, also causes the
>> rate of connects (which in that case makes use of network namespace cloning
>> a lot) to go down noticeably.
>>
>> The second half would improve the situation slightly but still not as
>> much as it has been achieved in Maverick. And using the Maverick
>> backport causes other problems in that specific case the bug is
>> reported.
>>
>> To quantify that a bit better:
>>
>> Lucid current        10 connections per second
>> Lucid set 1         1 connection every  2 seconds
>> Lucid set 2         2 connections every 3 seconds
>> Maverick         2 connections per second
>>
>> There has not been a way to verify how bad the impact of the slowdown
>> would be in a real production environment. So it might be a viable
>> approach to limit changes to the first set. Assuming that creating
>> and destroying namespaces is not the common usecase we have.
>>
>> Should there be performance complaints, we still could think of
>> having a closer look at the second set (or more).
>>
>> So generally, does this sound like an approach we can SRU? And
>> second, more eyes looking at the set(s) would be appreciated.
>>
>> -Stefan
>>
>> Those are enough to prevent memory being eaten:
>> * net: Introduce unregister_netdevice_queue()
>> * net: Introduce unregister_netdevice_many()
>> * net: add a list_head parameter to dellink() method
>> * veth: Fix veth_dellink method
>> * veth: Fix unregister_netdevice_queue for veth
>> * net: Implement for_each_netdev_reverse.
>> * net: Batch network namespace destruction.
>>
>> Those seem to speed up the number of connects to vsftp per time (though
>> not as much as Maverick):
>> * net: Automatically allocate per namespace data.
>> * net: Add support for batching network namespace cleanups
>> * netns: Add an explicit rcu_barrier to unregister_pernet_{device|subsys}
>> * net: Use rcu lookups in inet_twsk_purge.
>> * tcp: fix inet_twsk_deschedule()
>> * net: Batch inet_twsk_purge
>>
>> The following changes since commit 054b34d3a38dc2a775ab722411b934b52a33707f:
>>    Brad Figg (1):
>>          UBUNTU: Ubuntu-2.6.32-31.60
>>
>> are available in the git repository at:
>>
>>    git://kernel.ubuntu.com/smb/ubuntu-lucid netnsbpv2
>>
>> Eric Dumazet (5):
>>        net: Introduce unregister_netdevice_queue()
>>        net: Introduce unregister_netdevice_many()
>>        net: add a list_head parameter to dellink() method
>>        veth: Fix veth_dellink method
>>        tcp: fix inet_twsk_deschedule()
>>
>> Eric W. Biederman (8):
>>        veth: Fix unregister_netdevice_queue for veth
>>        net: Implement for_each_netdev_reverse.
>>        net: Batch network namespace destruction.
>>        net: Automatically allocate per namespace data.
>>        net: Add support for batching network namespace cleanups
>>        netns: Add an explicit rcu_barrier to unregister_pernet_{device|subsys}
>>        net: Use rcu lookups in inet_twsk_purge.
>>        net: Batch inet_twsk_purge
>>
>>   drivers/net/macvlan.c            |    6 +-
>>   drivers/net/veth.c               |    6 +-
>>   include/linux/netdevice.h        |   12 ++-
>>   include/net/inet_timewait_sock.h |    6 +-
>>   include/net/net_namespace.h      |   32 ++++-
>>   include/net/rtnetlink.h          |    3 +-
>>   net/8021q/vlan.c                 |    8 +-
>>   net/8021q/vlan.h                 |    2 +-
>>   net/core/dev.c                   |  120 ++++++++++-----
>>   net/core/net_namespace.c         |  296 +++++++++++++++++++++++---------------
>>   net/core/rtnetlink.c             |   14 +-
>>   net/ipv4/inet_timewait_sock.c    |   47 ++++---
>>   net/ipv4/tcp_ipv4.c              |   11 +-
>>   net/ipv6/tcp_ipv6.c              |   11 +-
>>   14 files changed, 369 insertions(+), 205 deletions(-)
>>
> 
> That's a honking big patch set for an SRU. It's not clear to me from the commit logs, but I assume they are all clean cherry-picks?
> 
Mostly clean; there are a few minor fixups for differences that aren't being patched in.  I went through all the patches and they all looked good.

> I'm still not convinced that CONFIG_NET_NS=n isn't the best solution, despite the complaints that change might elicit. I'd like to hear from the consumers of network namespaces about how they are using the feature, and possible workarounds if it were to go away.
> 
That is the solution I would like, but I think that, at least for the server, that is going to be problematic.  Containers are seeing a lot of use.

If we were to go with an SRU of this I would lean towards the smaller patchset that is enough to prevent memory being eaten (7 of the 13), and then if speed is a problem the remaining 6 could be SRUed afterwards.
Tim Gardner March 25, 2011, 1:49 p.m. UTC | #5
On 03/25/2011 07:16 AM, John Johansen wrote:

>> I'm still not convinced that CONFIG_NET_NS=n isn't the best
>> solution, despite the complaints that change might elicit. I'd like
>> to hear from the consumers of network namespaces about how they
>> are using the feature, and possible workarounds if it were to go
>> away.
>>
> That is the solution I would like, but I think that, at least for the
> server, that is going to be problematic. Containers are seeing a lot
> of use.
>

While containers in general are in use, are network namespaces
proactively being used? Is there some workload that is _dependent_ on
NET_NS? I'm not proposing that we disable containers or other
namespace features, only NET_NS.

> If we were to go with an SRU of this I would lean towards the smaller
> patchset that is enough to prevent memory being eaten (7 of the 13),
> and then if speed is a problem the remaining 6 could be SRUed
> afterwards.

I'm not keen on releasing a kernel that reduces connection
setup/teardown rates by an order of magnitude. Surely this'll have an adverse
impact on web servers and the like.

rtg
John Johansen March 25, 2011, 2:30 p.m. UTC | #6
On 03/25/2011 06:49 AM, Tim Gardner wrote:
> On 03/25/2011 07:16 AM, John Johansen wrote:
> 
>>> [...]
>> That is the solution I would like, but I think that, at least for the
>> server, that is going to be problematic. Containers are seeing
>> a lot of use.
>>
> 
> While containers in general are in use, are network namespaces proactively being used? Is there some workload that is _dependent_ on NET_NS? I'm not proposing that we disable containers or other namespace features, only NET_NS.
> 
I don't know the answer to that; it is worth exploring.  It is proactively being used in that some applications are requesting the CLONE_NEWNET flag, and I have seen container workloads that could claim to require NET_NS (essentially replacing virtual machines with containers), but I am not sure what kernel they were using.  I actually would like to turn NET_NS off too; my concern is that it is a regression of a feature that some (unquantifiable) set of users are using.

>> If we were to go with an SRU of this I would lean towards the smaller
>> patchset that is enough to prevent memory being eaten (7 of the 13),
>> and then if speed is a problem the remaining 6 could be
>> afterwards.
> 
> I'm not keen on releasing a kernel that reduces connection setup/teardown rates by an order of magnitude. Surely this'll have an adverse impact on web servers and the like.
> 
Neither am I, but my perhaps flawed understanding was that it should only affect connection setup/teardown if a new network namespace is being created, and I doubt most use cases actually do this.  This is actually something we should get a better handle on: which workloads that use NET_NS are noticeably impacted by this.
Tim Gardner March 25, 2011, 2:36 p.m. UTC | #7
On 03/25/2011 08:30 AM, John Johansen wrote:
> [...]
>
> Neither am I, but my perhaps flawed understanding was that it should
> only affect connection setup/teardown if a new network namespace
> is being created, and I doubt most use cases actually do this. This
> is actually something we should get a better handle on: which
> workloads that use NET_NS are noticeably impacted by this.

Well, there is an alternative for those folks that _are_ dependent on 
NET_NS:

sudo apt-get install linux-image-server-lts-backport-maverick

rtg
John Johansen March 25, 2011, 2:43 p.m. UTC | #8
On 03/25/2011 07:36 AM, Tim Gardner wrote:
> [...]
>
> Well, there is an alternative for those folks that _are_ dependent on NET_NS:
> 
> sudo apt-get install linux-image-server-lts-backport-maverick
> 
Oh right, that convinces me.  Turn it off.
Stefan Bader March 28, 2011, 8:05 a.m. UTC | #9
On 03/25/2011 03:43 PM, John Johansen wrote:
>> [...]
>> Well, there is an alternative for those folks that _are_ dependent on NET_NS:
>>
>> sudo apt-get install linux-image-server-lts-backport-maverick
>>
> Oh right, that convinces me.  Turn it off.

Sorry for stepping back into this discussion late. I think as long as it is ok
to go with a Maverick kernel, this clearly would be better than any partial
backport for NET_NS. At least in the one specific case it was suggested that
doing this was (at least for the moment) not possible, as Maverick caused other
problems. There were, however, only very vague hints about what those other
problems really are, so it is hard to tell whether and how they could be solved.

Just for the record, it is my understanding (as John already mentioned) that the
slowdown only affects connections made with network namespace cloning involved.
If that feature is not used, there should be no real slowdown.

Summarizing, I think the safest solution is to turn the feature off. But we need
to be careful about it. This being an LTS release, there could also be people
using it who, for various reasons (sw certification), cannot move to a newer
kernel that easily. Unfortunately it is hard to find that out before doing it.

-Stefan
Tim Gardner March 29, 2011, 2:51 p.m. UTC | #10
On 03/28/2011 02:05 AM, Stefan Bader wrote:
> [...]
>
> Summarizing, I think the safest solution is to turn the feature off. But we need
> to be careful about it. This being an LTS release, there could also be people
> using it who, for various reasons (sw certification), cannot move to a newer
> kernel that easily. Unfortunately it is hard to find that out before doing it.
>
> -Stefan

So, are you gonna send a patch disabling NET_NS?
Stefan Bader March 29, 2011, 3:13 p.m. UTC | #11
On 03/29/2011 04:51 PM, Tim Gardner wrote:
> [...]
>
> So, are you gonna send a patch disabling NET_NS?
> 
Oh well, yeah. Knowingly breaking something just scares me. But I'll send
something out and let the stable guys decide. :-P

-Stefan