Message ID | 1324528656.2621.19.camel@edumazet-laptop |
---|---|
State | RFC, archived |
Delegated to: | David Miller |
Headers | show |
On Thu, Dec 22, 2011 at 05:37:36AM +0100, Eric Dumazet wrote: > Le mercredi 21 décembre 2011 à 23:12 +0000, Chris Boot a écrit : > > > > (gdb) list *ip_fragment+0x9d > > 0xffffffff812a62e1 is in ip_fragment (include/net/dst.h:209). > > 204 return dst_metric(dst, RTAX_FEATURES) & feature; > > 205 } > > 206 > > 207 static inline u32 dst_mtu(const struct dst_entry *dst) > > 208 { > > 209 return dst->ops->mtu(dst); > > 210 } > > 211 > > 212 /* RTT metrics are stored in milliseconds for user ABI, but used as jiffies */ > > This one is different, its not IPv6 related but IPv4 : > > fake_dst_ops lacks a .mtu() field > > Bug added in commit 618f9bc74a039da76 (net: Move mtu handling down to > the protocol depended handlers) > Before I did this patch, I changed the .default_mtu field of "struct dst_ops" to .mtu and converted all users of default_mtu to mtu. fake_dst_ops never had a default_mtu field, so I did not touch it. I guess this bug is arround since fake_dst_ops exists. It's now just much easier to trigger as commit 618f9bc74a039da76 changed dst_mtu() to call dst->ops->mtu() unconditionally. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le jeudi 22 décembre 2011 à 07:38 +0100, Steffen Klassert a écrit : > On Thu, Dec 22, 2011 at 05:37:36AM +0100, Eric Dumazet wrote: > > Le mercredi 21 décembre 2011 à 23:12 +0000, Chris Boot a écrit : > > > > > > (gdb) list *ip_fragment+0x9d > > > 0xffffffff812a62e1 is in ip_fragment (include/net/dst.h:209). > > > 204 return dst_metric(dst, RTAX_FEATURES) & feature; > > > 205 } > > > 206 > > > 207 static inline u32 dst_mtu(const struct dst_entry *dst) > > > 208 { > > > 209 return dst->ops->mtu(dst); > > > 210 } > > > 211 > > > 212 /* RTT metrics are stored in milliseconds for user ABI, but used as jiffies */ > > > > This one is different, its not IPv6 related but IPv4 : > > > > fake_dst_ops lacks a .mtu() field > > > > Bug added in commit 618f9bc74a039da76 (net: Move mtu handling down to > > the protocol depended handlers) > > > > Before I did this patch, I changed the .default_mtu field of "struct dst_ops" > to .mtu and converted all users of default_mtu to mtu. fake_dst_ops never > had a default_mtu field, so I did not touch it. I guess this bug is arround > since fake_dst_ops exists. It's now just much easier to trigger as commit > 618f9bc74a039da76 changed dst_mtu() to call dst->ops->mtu() unconditionally. I dont think so. Prior to 618f9bc74a0, we were calling static inline u32 dst_mtu(const struct dst_entry *dst) { u32 mtu = dst_metric_raw(dst, RTAX_MTU); if (!mtu) mtu = dst->ops->mtu(dst); return mtu; } with dst = fake_rtable and we did : dst_init_metrics(&rt->dst, br_dst_default_metrics, true); so dst_metric_raw(dst, RTAX_MTU) was returning 1500 So bug is new, we dont need to backport this fix. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Dec 22, 2011 at 08:51:17AM +0100, Eric Dumazet wrote: > > Prior to 618f9bc74a0, we were calling > > static inline u32 dst_mtu(const struct dst_entry *dst) > { > u32 mtu = dst_metric_raw(dst, RTAX_MTU); > > if (!mtu) > mtu = dst->ops->mtu(dst); > > return mtu; > } > > with dst = fake_rtable > > and we did : > > dst_init_metrics(&rt->dst, br_dst_default_metrics, true); > > so dst_metric_raw(dst, RTAX_MTU) was returning 1500 > > So bug is new, we dont need to backport this fix. > Ok, I see. Thanks for pointing it out! -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Le jeudi 22 décembre 2011 à 08:58 +0100, Steffen Klassert a écrit : > On Thu, Dec 22, 2011 at 08:51:17AM +0100, Eric Dumazet wrote: > > > > Prior to 618f9bc74a0, we were calling > > > > static inline u32 dst_mtu(const struct dst_entry *dst) > > { > > u32 mtu = dst_metric_raw(dst, RTAX_MTU); > > > > if (!mtu) > > mtu = dst->ops->mtu(dst); > > > > return mtu; > > } > > > > with dst = fake_rtable > > > > and we did : > > > > dst_init_metrics(&rt->dst, br_dst_default_metrics, true); > > > > so dst_metric_raw(dst, RTAX_MTU) was returning 1500 > > > > So bug is new, we dont need to backport this fix. > > > > Ok, I see. Thanks for pointing it out! You're welcome. By the way we no longer have this 1500 hard coded, not sure it it matters or not, but it looks better anyway. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Thu, Dec 22, 2011 at 09:05:38AM +0100, Eric Dumazet wrote: > > By the way we no longer have this 1500 hard coded, not sure it it > matters or not, but it looks better anyway. > It looks like the bridge code sets the mtu value on this metric to the mtu of the bridge device in br_add_if(). So seems to be not really fixed to 1500. Not sure why netfilter really needs to have this 1500 in the metric before the mtu is initialized from the bridge code. We could make fake_mtu() look like dn_dst_mtu(). The mtu of the bridge device should be reflected in the metric once the bridge code initialized it. Also learned pmtu values would be taken into account then. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 22/12/2011 04:37, Eric Dumazet wrote: > Le mercredi 21 décembre 2011 à 23:12 +0000, Chris Boot a écrit : >> On 21 Dec 2011, at 21:58, Chris Boot wrote: >> >>> On 21/12/2011 20:52, Eric Dumazet wrote: >>>> Le mercredi 21 décembre 2011 à 21:28 +0100, Eric Dumazet a écrit : >>>>> Le mercredi 21 décembre 2011 à 20:05 +0000, Chris Boot a écrit : >>>>>> On 21/12/2011 18:00, Eric Dumazet wrote: >>>>>>> Le mercredi 21 décembre 2011 à 18:36 +0100, Eric Dumazet a écrit : >>>>>>> >>>>>>>> Good point, thats a different problem then, since 3.1 is not supposed to >>>>>>>> have this bug. >>>>>>>> >>>>>>>> It seems rt->rt6i_peer points to invalid memory in your crash. >>>>>>>> >>>>>>>> (RBX=00000000000001f4) >>>>>>>> >>>>>>>> 8b 83 a4 00 00 00 mov 0xa4(%rbx),%eax p->refcnt >>>>>>>> 1f4+a4 -> CR2=0000000000000298 >>>>>>>> >>>>>>> It would help if you can confirm latest linux tree can reproduce the >>>>>>> bug. >>>>>> Hi Eric, >>>>>> >>>>>> I just built a v3.2-rc6-140-gb9e26df with the same config as the Debian >>>>>> 3.1.0 kernel. I can reproduce the bug just as easily with this kernel as >>>>>> with the Debian kernel. Unfortunately I wasn't able to get an entire >>>>>> trace, for some reason it didn't appear to be printed to the serial port >>>>>> and hung after the (long) list of loaded kernel modules. The crash >>>>>> happens at the same offset: >>>>>> >>>>> Thanks ! >>>>> >>>>> Oh well, br_netfilter fake_rtable strikes again. >>>>> >>>>> I'll cook a patch in a couple of minutes... >>>>> >>>> Could you try following patch ? >>>> >>>> [snip] >>> Eric, >>> >>> It looks good! The rsync that caused the crash real quick hasn't done it at all with the patch applied. I'll keep testing it of course, but I think that's done it. >> No, sorry, false hope. The following does look rather different however: >> >> [snip] > This one is different, its not IPv6 related but IPv4 : > > fake_dst_ops lacks a .mtu() field > > Bug added in commit 618f9bc74a039da76 (net: Move mtu handling down to > the protocol depended handlers) > > Here is an updated patch, thanks again ! > > [snip] Eric, So far so good. I've had this running for several hours this morning with more of the prodding that would normally have crashed it, both IPv4 and IPv6, and it's holding up well. Thanks again, Chris
diff --git a/include/net/dst.h b/include/net/dst.h index 6faec1a..75766b4 100644 --- a/include/net/dst.h +++ b/include/net/dst.h @@ -53,6 +53,7 @@ struct dst_entry { #define DST_NOHASH 0x0008 #define DST_NOCACHE 0x0010 #define DST_NOCOUNT 0x0020 +#define DST_NOPEER 0x0040 short error; short obsolete; diff --git a/net/bridge/br_netfilter.c b/net/bridge/br_netfilter.c index d6ec372..fa8b8f7 100644 --- a/net/bridge/br_netfilter.c +++ b/net/bridge/br_netfilter.c @@ -114,12 +114,18 @@ static struct neighbour *fake_neigh_lookup(const struct dst_entry *dst, const vo return NULL; } +static unsigned int fake_mtu(const struct dst_entry *dst) +{ + return dst->dev->mtu; +} + static struct dst_ops fake_dst_ops = { .family = AF_INET, .protocol = cpu_to_be16(ETH_P_IP), .update_pmtu = fake_update_pmtu, .cow_metrics = fake_cow_metrics, .neigh_lookup = fake_neigh_lookup, + .mtu = fake_mtu, }; /* @@ -141,7 +147,7 @@ void br_netfilter_rtable_init(struct net_bridge *br) rt->dst.dev = br->dev; rt->dst.path = &rt->dst; dst_init_metrics(&rt->dst, br_dst_default_metrics, true); - rt->dst.flags = DST_NOXFRM; + rt->dst.flags = DST_NOXFRM | DST_NOPEER; rt->dst.ops = &fake_dst_ops; } diff --git a/net/ipv4/route.c b/net/ipv4/route.c index 252c512..a5004f1 100644 --- a/net/ipv4/route.c +++ b/net/ipv4/route.c @@ -1366,7 +1366,7 @@ void __ip_select_ident(struct iphdr *iph, struct dst_entry *dst, int more) { struct rtable *rt = (struct rtable *) dst; - if (rt) { + if (rt && !(rt->dst.flags & DST_NOPEER)) { if (rt->peer == NULL) rt_bind_peer(rt, rt->rt_dst, 1); @@ -1377,7 +1377,7 @@ void __ip_select_ident(struct iphdr *iph, struct dst_entry *dst, int more) iph->id = htons(inet_getid(rt->peer, more)); return; } - } else + } else if (!rt) printk(KERN_DEBUG "rt_bind_peer(0) @%p\n", __builtin_return_address(0)); diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index 84d0bd5..ec56271 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -603,7 +603,7 @@ void ipv6_select_ident(struct frag_hdr *fhdr, struct rt6_info *rt) static atomic_t ipv6_fragmentation_id; int old, new; - if (rt) { + if (rt && !(rt->dst.flags & DST_NOPEER)) { struct inet_peer *peer; if (!rt->rt6i_peer)