Message ID | CANaxB-wntm+Vu5z0k9bRYLvPX3ezFZSEQkGPGH3AVYDT5DrOKw@mail.gmail.com |
---|---|
State | Superseded |
Headers | show |
Andrey Wagin <avagin@gmail.com> wrote: > When a conntrack is created by kernel, it is initialized (sets > IPS_{DST,SRC}_NAT_DONE_BIT bits in nf_nat_setup_info) and only then it > is added in hashes (__nf_conntrack_hash_insert), so one conntract > can't be initialized from a few threads concurrently. > > ctnetlink can add an uninitialized conntrack (w/o > IPS_{DST,SRC}_NAT_DONE_BIT) in hashes, then a few threads can look up > this conntrack and start initialize it concurrently. It's dangerous, > because BUG can be triggered from nf_nat_setup_info. Good catch. I don't see a good solution at the moment. We can force null bindings if no nat transformation is specified from userspace. But this would mean that rules specified in the nat table are not evaluated anymore when the first packet arrives. The only other solution is see is full serialization via ct->lock. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Tue, Jan 07, 2014 at 01:32:00PM +0100, Florian Westphal wrote: > Andrey Wagin <avagin@gmail.com> wrote: > > When a conntrack is created by kernel, it is initialized (sets > > IPS_{DST,SRC}_NAT_DONE_BIT bits in nf_nat_setup_info) and only then it > > is added in hashes (__nf_conntrack_hash_insert), so one conntract > > can't be initialized from a few threads concurrently. > > > > ctnetlink can add an uninitialized conntrack (w/o > > IPS_{DST,SRC}_NAT_DONE_BIT) in hashes, then a few threads can look up > > this conntrack and start initialize it concurrently. It's dangerous, > > because BUG can be triggered from nf_nat_setup_info. > > Good catch. > > I don't see a good solution at the moment. > > We can force null bindings if no nat transformation is specified > from userspace. But this would mean that rules specified in the nat > table are not evaluated anymore when the first packet arrives. conntracks that are added via ctnetlink should already include the nat information (if any) at creation time, this is how it works with state-sync via conntrackd when nat is present. I think attaching the null binding seems like the easier solution to me. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From f501d2e80c2811aa7d69110d72f24e6b7173d920 Mon Sep 17 00:00:00 2001 From: Andrey Vagin <avagin@openvz.org> Date: Mon, 6 Jan 2014 18:59:37 +0400 Subject: [PATCH] debug Signed-off-by: Andrey Vagin <avagin@openvz.org> --- net/core/dev.c | 2 ++ net/netfilter/nf_conntrack_netlink.c | 1 + net/netfilter/nf_nat_core.c | 8 ++++++++ 3 files changed, 11 insertions(+) diff --git a/net/core/dev.c b/net/core/dev.c index 4fc1722..671e4b4 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2981,7 +2981,9 @@ static int get_rps_cpu(struct net_device *dev, struct sk_buff *skb, struct rps_sock_flow_table *sock_flow_table; int cpu = -1; u16 tcpu; + static int xxx = 0; + return (xxx++) % 2; if (skb_rx_queue_recorded(skb)) { u16 index = skb_get_rx_queue(skb); if (unlikely(index >= dev->real_num_rx_queues)) { diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c index 08870b8..2f6c724 100644 --- a/net/netfilter/nf_conntrack_netlink.c +++ b/net/netfilter/nf_conntrack_netlink.c @@ -1720,6 +1720,7 @@ ctnetlink_create_conntrack(struct net *net, u16 zone, if (tstamp) tstamp->start = ktime_to_ns(ktime_get_real()); + printk("%s:%d: %p\n", __func__, __LINE__, ct); err = nf_conntrack_hash_check_insert(ct); if (err < 0) goto err2; diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c index 63a8154..5600bfd 100644 --- a/net/netfilter/nf_nat_core.c +++ b/net/netfilter/nf_nat_core.c @@ -379,6 +379,14 @@ nf_nat_setup_info(struct nf_conn *ct, NF_CT_ASSERT(maniptype == NF_NAT_MANIP_SRC || maniptype == NF_NAT_MANIP_DST); + if (net != &init_net) + { + long long i; + printk("%s:%d %p %d %lx\n", __func__, __LINE__, ct, maniptype, ct->status); + for (i = 0; i < 10000000000; i++) + asm("nop"); + printk("%s:%d %p %d %lx\n", __func__, __LINE__, ct, maniptype, ct->status); + } BUG_ON(nf_nat_initialized(ct, maniptype)); /* What we've got will look like inverse of reply. Normally -- 1.8.4.2