diff mbox

2.6.38 containers bug: Infinite loop in /proc/sys/net/ipv6/neigh/neigh/neigh...

Message ID 20110330.020120.02272125.davem@davemloft.net
State Not Applicable, archived
Delegated to: David Miller
Headers show

Commit Message

David Miller March 30, 2011, 9:01 a.m. UTC
From: Rob Landley <rlandley@parallels.com>
Date: Wed, 30 Mar 2011 03:29:16 -0500

> In the host context a find on /proc completes, but inside an lxc
> container a find on /proc never completes, due to the endless loop in
> the title.  (It's not a symlink, it seems to be a cross linked directory.)
> 
> This is vanilla 2.6.38, I can attach my .config if you think it'd help.
>  (The container's the lxc debian sid template but that's probably not
> relevant.)

Please CC: netdev@vger.kernel.org when a bug might be related to
networking, as is obviously the case here.

This bug should be fixed by the following patch:

--------------------
commit 9d2a8fa96a44ba242de3a6f56acaef7a40a97b97
Author: Eric W. Biederman <ebiederm@aristanetworks.com>
Date:   Mon Mar 21 18:23:34 2011 -0700

    net ipv6: Fix duplicate /proc/sys/net/ipv6/neigh directory entries.
    
    When I was fixing issues with unregisgtering tables under /proc/sys/net/ipv6/neigh
    by adding a mount point it appears I missed a critical ordering issue, in the
    ipv6 initialization.  I had not realized that ipv6_sysctl_register is called
    at the very end of the ipv6 initialization and in particular after we call
    neigh_sysctl_register from ndisc_init.
    
    "neigh" needs to be initialized in ipv6_static_sysctl_register which is
    the first ipv6 table to initialized, and definitely before ndisc_init.
    This removes the weirdness of duplicate tables while still providing a
    "neigh" mount point which prevents races in sysctl unregistering.
    
    This was initially reported at https://bugzilla.kernel.org/show_bug.cgi?id=31232
    Reported-by: sunkan@zappa.cx
    Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Rob Landley March 30, 2011, 10:16 a.m. UTC | #1
On 03/30/2011 04:01 AM, David Miller wrote:
> From: Rob Landley <rlandley@parallels.com>
> Date: Wed, 30 Mar 2011 03:29:16 -0500
> 
>> In the host context a find on /proc completes, but inside an lxc
>> container a find on /proc never completes, due to the endless loop in
>> the title.  (It's not a symlink, it seems to be a cross linked directory.)
>>
>> This is vanilla 2.6.38, I can attach my .config if you think it'd help.
>>  (The container's the lxc debian sid template but that's probably not
>> relevant.)
> 
> Please CC: netdev@vger.kernel.org when a bug might be related to
> networking, as is obviously the case here.
> 
> This bug should be fixed by the following patch:
> 
> --------------------
> commit 9d2a8fa96a44ba242de3a6f56acaef7a40a97b97
> Author: Eric W. Biederman <ebiederm@aristanetworks.com>
> Date:   Mon Mar 21 18:23:34 2011 -0700
> 
>     net ipv6: Fix duplicate /proc/sys/net/ipv6/neigh directory entries.
>     
>     When I was fixing issues with unregisgtering tables under /proc/sys/net/ipv6/neigh
>     by adding a mount point it appears I missed a critical ordering issue, in the
>     ipv6 initialization.  I had not realized that ipv6_sysctl_register is called
>     at the very end of the ipv6 initialization and in particular after we call
>     neigh_sysctl_register from ndisc_init.
>     
>     "neigh" needs to be initialized in ipv6_static_sysctl_register which is
>     the first ipv6 table to initialized, and definitely before ndisc_init.
>     This removes the weirdness of duplicate tables while still providing a
>     "neigh" mount point which prevents races in sysctl unregistering.
>     
>     This was initially reported at https://bugzilla.kernel.org/show_bug.cgi?id=31232
>     Reported-by: sunkan@zappa.cx
>     Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
>     Signed-off-by: David S. Miller <davem@davemloft.net>


Yes, that fixed it.  Pinging the stable guys to make sure this goes in a
dot release.

Thanks,

Rob
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller March 30, 2011, 10:35 a.m. UTC | #2
From: Rob Landley <rlandley@parallels.com>
Date: Wed, 30 Mar 2011 05:16:19 -0500

> Pinging the stable guys to make sure this goes in a dot release.

Please do not do this.

I let fixes soak in Linus's tree for a week or even more before I submit
them to -stable.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rob Landley March 30, 2011, 10:45 a.m. UTC | #3
On 03/30/2011 05:35 AM, David Miller wrote:
> From: Rob Landley <rlandley@parallels.com>
> Date: Wed, 30 Mar 2011 05:16:19 -0500
> 
>> Pinging the stable guys to make sure this goes in a dot release.
> 
> Please do not do this.

Too late, but presumably they got your NAK.

> I let fixes soak in Linus's tree for a week or even more before I submit
> them to -stable.

So you think letting it soak in the merge window and the churn of -rc1
(which currently doesn't boot for me due to an ide issue) will provide
more validity than a specific "There was an infinite loop in the
filesystem, there is now no longer an infinite loop in the filesystem,
here's the specific test for it" and visual inspection ofthe patch?

*shrug*  I hear what you say.  It's certainly a point of view.

Rob
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller March 30, 2011, 10:48 a.m. UTC | #4
From: Rob Landley <rlandley@parallels.com>
Date: Wed, 30 Mar 2011 05:45:08 -0500

> So you think letting it soak in the merge window and the churn of -rc1
> (which currently doesn't boot for me due to an ide issue) will provide
> more validity than a specific "There was an infinite loop in the
> filesystem, there is now no longer an infinite loop in the filesystem,
> here's the specific test for it" and visual inspection ofthe patch?

Yes I absolutely do think it's better to let it soak somewhere than
to repeat the reason why this patch is necessary in the first place.

This patch is necessary because Eric already screwed up trying to fix
this neigh dirctory under ipv6 once, and he got it wrong.

That patch looked correct and straightforward.

Yet it introduced the bug you're seeing.

Given that consideration, are you still holding steady to your
opinion wise guy? :-)

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rob Landley March 30, 2011, 11:27 a.m. UTC | #5
On 03/30/2011 05:48 AM, David Miller wrote:
> From: Rob Landley <rlandley@parallels.com>
> Date: Wed, 30 Mar 2011 05:45:08 -0500
> 
>> So you think letting it soak in the merge window and the churn of -rc1
>> (which currently doesn't boot for me due to an ide issue) will provide
>> more validity than a specific "There was an infinite loop in the
>> filesystem, there is now no longer an infinite loop in the filesystem,
>> here's the specific test for it" and visual inspection ofthe patch?
> 
> Yes I absolutely do think it's better to let it soak somewhere than
> to repeat the reason why this patch is necessary in the first place.
> 
> This patch is necessary because Eric already screwed up trying to fix
> this neigh dirctory under ipv6 once, and he got it wrong.
> 
> That patch looked correct and straightforward.
> 
> Yet it introduced the bug you're seeing.
> 
> Given that consideration, are you still holding steady to your
> opinion wise guy? :-)

The patch fixed the specific bug I hit (find /proc hanging), and didn't
noticeably make the system worse in my use cases.  I have a specific
test case which used to fail and now succeeds.  The system _without_ the
patch is broken, and I weigh the possibility of breakage with the patch
against that. I wasn't particularly concerned with it being the "right"
fix from a long-term structural standpoint because that's what 2.6.39 is
for.  I just wanted the obvious breakage (a regression from 2.6.37) to
go away, I tested this fix and it Worked For Me (tm), hence letting the
stable guys know about it being the obvious next move.

How many people do you think are likely to even try containers under
2.6.39-rc1 or -rc2?  Eric's patch introducing the bug is dated January
31, and it was in 2.6.38-rc4.  Apparently nobody noticed the breakage in
-rc5, -rc6, -rc7, or -rc8.  His patch fixing the bug is dated one week
after 2.6.38 shipped.  I also noticed it in the stable kernel, not in
any of the 2.6.38-rc releases.  So at least two people hit this in
-stable already, and nobody hit it in -dev for four -rc releases.  I
have tested it in 2.6.38 and it Worked For Me.  Supposing the new patch
did introduce subtle breakage, how is leaving it in the 2.6.39-dev
series for a _month_ more likely to find it than the entire second half
of the 2.6.38 development cycle (when it was at its most stable, and
thus easiest to test)?

I have no idea if this patch fixes 2.6.39-rc1, because -rc1 still
doesn't _boot_ in my test environment.  Many other things about -rc1 are
broken.  That's sort of the point of -rc1.  A fairly small number of
people test the development branch, and this is the most unstable time
of the development cycle where many things about it are _expected_ not
to work.  Especially right now, I don't see how letting it "soak" for a
week or two in the -dev branch accomplishes anything whatsoever.

So yes, I am still holding steady to my opinion.  But you're the
maintainer, so my opinion doesn't really matter here.

Rob
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/ipv6/sysctl_net_ipv6.c b/net/ipv6/sysctl_net_ipv6.c
index 7cb65ef..6dcf5e7 100644
--- a/net/ipv6/sysctl_net_ipv6.c
+++ b/net/ipv6/sysctl_net_ipv6.c
@@ -17,6 +17,16 @@ 
 
 static struct ctl_table empty[1];
 
+static ctl_table ipv6_static_skeleton[] = {
+	{
+		.procname	= "neigh",
+		.maxlen		= 0,
+		.mode		= 0555,
+		.child		= empty,
+	},
+	{ }
+};
+
 static ctl_table ipv6_table_template[] = {
 	{
 		.procname	= "route",
@@ -37,12 +47,6 @@  static ctl_table ipv6_table_template[] = {
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
-	{
-		.procname	= "neigh",
-		.maxlen		= 0,
-		.mode		= 0555,
-		.child		= empty,
-	},
 	{ }
 };
 
@@ -160,7 +164,7 @@  static struct ctl_table_header *ip6_base;
 
 int ipv6_static_sysctl_register(void)
 {
-	ip6_base = register_sysctl_paths(net_ipv6_ctl_path, empty);
+	ip6_base = register_sysctl_paths(net_ipv6_ctl_path, ipv6_static_skeleton);
 	if (ip6_base == NULL)
 		return -ENOMEM;
 	return 0;