| Message ID | 497F350A.9020509@cosmosbay.com |
|---|---|
| State | Not Applicable, archived |
| Delegated to | David Miller |
Eric Dumazet wrote:
> [PATCH] netfilter: Get rid of central rwlock in tcp conntracking
>
> TCP connection tracking suffers of huge contention on a global rwlock,
> used to protect tcp conntracking state.
>
> As each tcp conntrack state have no relations between each others, we
> can switch to fine grained lock. Using an array of spinlocks avoids
> enlarging size of connection tracking structures, yet giving reasonable
> fanout.
>
> tcp_print_conntrack() doesnt need to lock anything to read
> ct->proto.tcp.state, so speedup /proc/net/ip_conntrack as well.
>
> nf_conntrack_hash_rnd_initted & nf_conntrack_hash_rnd declared read_mostly

This looks good to me. Rick, would you like to give it a try?

I'll convert the remaining conntrack protocols when applying it.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Patrick McHardy wrote:
> Eric Dumazet wrote:
>
>> [PATCH] netfilter: Get rid of central rwlock in tcp conntracking
>>
>> TCP connection tracking suffers of huge contention on a global rwlock,
>> used to protect tcp conntracking state.
>>
>> As each tcp conntrack state have no relations between each others, we
>> can switch to fine grained lock. Using an array of spinlocks avoids
>> enlarging size of connection tracking structures, yet giving reasonable
>> fanout.
>>
>> tcp_print_conntrack() doesnt need to lock anything to read
>> ct->proto.tcp.state, so speedup /proc/net/ip_conntrack as well.
>>
>> nf_conntrack_hash_rnd_initted & nf_conntrack_hash_rnd declared read_mostly
>
> This looks good to me. Rick, would you like to give it a try?
>
> I'll convert the remaining conntrack protocols when applying it.

I will give it a try and let folks know the results - unless told
otherwise, I will ass-u-me I only need rerun the "full_iptables" test
case.

rick jones
Rick Jones wrote:
> Patrick McHardy wrote:
>
>> Eric Dumazet wrote:
>>
>>> [PATCH] netfilter: Get rid of central rwlock in tcp conntracking
>>>
>>> TCP connection tracking suffers of huge contention on a global rwlock,
>>> used to protect tcp conntracking state.
>>>
>>> As each tcp conntrack state have no relations between each others, we
>>> can switch to fine grained lock. Using an array of spinlocks avoids
>>> enlarging size of connection tracking structures, yet giving
>>> reasonable fanout.
>>>
>>> tcp_print_conntrack() doesnt need to lock anything to read
>>> ct->proto.tcp.state, so speedup /proc/net/ip_conntrack as well.
>>>
>>> nf_conntrack_hash_rnd_initted & nf_conntrack_hash_rnd declared
>>> read_mostly
>>
>> This looks good to me. Rick, would you like to give it a try?
>>
>> I'll convert the remaining conntrack protocols when applying it.
>
> I will give it a try and let folks know the results - unless told
> otherwise, I will ass-u-me I only need rerun the "full_iptables" test
> case.

The runemomniagg2.sh script is still running, but the initial cycles
profile suggests that the main change is converting the write_lock time
into spinlock contention time, with 78.39% of the cycles spent in
ia64_spinlock_contention. When the script completes I'll upload the
profiles and the netperf results to the same base URL as in the basenote
under "contrack01/"

rick jones
>> I will give it a try and let folks know the results - unless told
>> otherwise, I will ass-u-me I only need rerun the "full_iptables" test
>> case.
>
> The runemomniagg2.sh script is still running, but the initial cycles
> profile suggests that the main change is converting the write_lock time
> into spinlock contention time with 78.39% of the cycles spent in
> ia64_spinlock_contention. When the script completes I'll upload the
> profiles and the netperf results to the same base URL as in the basenote
> under "contrack01/"

The script completed - although at some point I hit an fd limit - I
think I have an fd leak in netperf somewhere :(.

Anyhow, there are still some netperfs that end up kicking the bucket
during the run - I suspect starvation, because where in the other
configs (no iptables, and empty iptables) each netperf seems to consume
about 50% of a CPU - stands to reason - 64 netperfs, 32 cores - in the
"full" case I see many netperfs consuming 100% of a CPU. My gut is
thinking that one or more netperf contexts gets stuck doing something on
behalf of others. There is also ksoftirqd time for a few of those
processes.

Anyhow, the spread on trans/s/netperf is now 600 to 500 or 6000, which
does represent an improvement.

rick jones

PS - just to be certain that running out of fd's didn't skew the results
I'm rerunning the script with ulimit -n 10240 and will see if that
changes the results any. And I suppose I need to go fd leak hunting in
the netperf omni code :(
Rick Jones wrote:
>>> I will give it a try and let folks know the results - unless told
>>> otherwise, I will ass-u-me I only need rerun the "full_iptables"
>>> test case.
>>
>> The runemomniagg2.sh script is still running, but the initial cycles
>> profile suggests that the main change is converting the write_lock
>> time into spinlock contention time with 78.39% of the cycles spent in
>> ia64_spinlock_contention. When the script completes I'll upload the
>> profiles and the netperf results to the same base URL as in the
>> basenote under "contrack01/"
>
> The script completed - although at some point I hit an fd limit - I
> think I have an fd leak in netperf somewhere :(.
>
> Anyhow, there are still some netperfs that end up kicking the bucket
> during the run - I suspect starvation because where in the other
> configs (no iptables, and empty iptables) each netperf seems to
> consume about 50% of a CPU - stands to reason - 64 netperfs, 32 cores
> - in the "full" case I see many netperfs consuming 100% of a CPU. My
> gut is thinking that one or more netperf contexts gets stuck doing
> something on behalf of others. There is also ksoftirqd time for a few
> of those processes.
>
> Anyhow, the spread on trans/s/netperf is now 600 to 500 or 6000, which
> does represent an improvement.
>
> rick jones
>
> PS - just to be certain that running out of fd's didn't skew the
> results I'm rerunning the script with ulimit -n 10240 and will see if
> that changes the results any. And I suppose I need to go fd leak
> hunting in the netperf omni code :(

Thanks for the report.

If you have so much contention on spinlocks, maybe the hash function is
not good at all...

	hash = (unsigned long)ct;
	hash ^= hash >> 16;
	hash ^= hash >> 8;

I ass-u-me you compiled your kernel with NR_CPUS=32 ?
> Thanks for the report
>
> If you have so much contention on spinlocks, maybe hash function is not
> good at all...
>
>	hash = (unsigned long)ct;
>	hash ^= hash >> 16;
>	hash ^= hash >> 8;
>
> I ass-u-me you compiled your kernel with NR_CPUS=32 ?

I believe so - CONFIG_NR_CPUS in .config is 64 anyway. If there is a
more definitive place to check I'd be happy to look.

rick
Rick Jones wrote:
>> Thanks for the report
>>
>> If you have so much contention on spinlocks, maybe hash function is
>> not good at all...
>>
>>	hash = (unsigned long)ct;
>>	hash ^= hash >> 16;
>>	hash ^= hash >> 8;
>>
>> I ass-u-me you compiled your kernel with NR_CPUS=32 ?
>
> I believe so - CONFIG_NR_CPUS in .config is 64 anyway. If there is a
> more definitive place to check I'd be happy to look.
>
> rick

By any chance, are you using SLAB instead of SLUB ?

While running your netperf session, issuing:

	grep conntrack /proc/slabinfo

would help to know the hash dispersion on your machine.
>> By any chance, are you using SLAB instead of SLUB ?

Seems that way:

sut42:~/net-next-2.6# grep SLAB .config
CONFIG_SLAB=y
CONFIG_SLABINFO=y
# CONFIG_DEBUG_SLAB is not set
sut42:~/net-next-2.6# grep SLUB .config
# CONFIG_SLUB is not set

The config file should be up under the base URL:
ftp://ftp.netperf.org/iptable_scaling

The system started as a Debian Lenny with a 2.6.26 kernel, and I did a
make oldconfig and mostly took defaults (save multiq).

>> While running your netperf session, issuing:
>>
>>	grep conntrack /proc/slabinfo
>>
>> would help to know the hash dispersion on your machine.

sut42:~/net-next-2.6# grep conntrack /proc/slabinfo
nf_conntrack_expect    0    0  240  66   1 : tunables 120  60   8 : slabdata   0   0   0
nf_conntrack         276  742  304  53   1 : tunables  54  27   8 : slabdata  14  14   0

About a minute later it is:

nf_conntrack_expect    0    0  240  66   1 : tunables 120  60   8 : slabdata   0   0   0
nf_conntrack         315  742  304  53   1 : tunables  54  27   8 : slabdata  14  14   0

After about another 40 to 60 seconds it was:

nf_conntrack_expect    0    0  240  66   1 : tunables 120  60   8 : slabdata   0   0   0
nf_conntrack         314  742  304  53   1 : tunables  54  27   8 : slabdata  14  14   7
Rick Jones wrote:
>>> I will give it a try and let folks know the results - unless told
>>> otherwise, I will ass-u-me I only need rerun the "full_iptables"
>>> test case.
>>
>> The runemomniagg2.sh script is still running, but the initial cycles
>> profile suggests that the main change is converting the write_lock
>> time into spinlock contention time with 78.39% of the cycles spent in
>> ia64_spinlock_contention. When the script completes I'll upload the
>> profiles and the netperf results to the same base URL as in the
>> basenote under "contrack01/"
>
> The script completed - although at some point I hit an fd limit - I
> think I have an fd leak in netperf somewhere :(.
>
> Anyhow, there are still some netperfs that end up kicking the bucket
> during the run - I suspect starvation because where in the other
> configs (no iptables, and empty iptables) each netperf seems to
> consume about 50% of a CPU - stands to reason - 64 netperfs, 32 cores
> - in the "full" case I see many netperfs consuming 100% of a CPU. My
> gut is thinking that one or more netperf contexts gets stuck doing
> something on behalf of others. There is also ksoftirqd time for a few
> of those processes.
>
> Anyhow, the spread on trans/s/netperf is now 600 to 500 or 6000, which
> does represent an improvement.

Yes indeed you have a speedup; tcp conntracking is OK.

You now hit the nf_conntrack_lock spinlock we have in the generic
conntrack code (net/netfilter/nf_conntrack_core.c).

nf_ct_refresh_acct() for instance has to lock it.

We really want some finer locking here.
Eric Dumazet wrote:
> Rick Jones wrote:
>> Anyhow, the spread on trans/s/netperf is now 600 to 500 or 6000, which
>> does represent an improvement.
>
> Yes indeed you have a speedup, tcp conntracking is OK.
>
> You now hit the nf_conntrack_lock spinlock we have in generic conntrack
> code (net/netfilter/nf_conntrack_core.c)
>
> nf_ct_refresh_acct() for instance has to lock it.
>
> We really want some finer locking here.

That looks more complicated, since it requires taking multiple locks
occasionally (f.i. hash insertion, potentially helper-related and
expectation-related stuff), and there is the unconfirmed_list, where
fine-grained locking can't really be used without changing it to a hash.
Patrick McHardy wrote:
> Eric Dumazet wrote:
>> Rick Jones wrote:
>>> Anyhow, the spread on trans/s/netperf is now 600 to 500 or 6000,
>>> which does represent an improvement.
>>
>> Yes indeed you have a speedup, tcp conntracking is OK.
>>
>> You now hit the nf_conntrack_lock spinlock we have in generic
>> conntrack code (net/netfilter/nf_conntrack_core.c)
>>
>> nf_ct_refresh_acct() for instance has to lock it.
>>
>> We really want some finer locking here.
>
> That looks more complicated since it requires to take multiple locks
> occasionally (f.i. hash insertion, potentially helper-related and
> expectation-related stuff), and there is the unconfirmed_list, where
> fine-grained locking can't really be used without changing it to
> a hash.

Yes, it's more complicated, but look at what we did in 2.6.29 for
tcp/udp sockets, using RCU to have lockless lookups. Yes, we still take
a lock when doing an insert or delete at socket bind/unbind time.

We could keep a central nf_conntrack_lock to guard insertions/deletions
from the hash and the unconfirmed_list.

But *normal* packets that only need to change the state of one
particular connection could use RCU (without a spinlock) to locate the
conntrack, then lock the found conntrack to perform all state changes.
Eric Dumazet wrote:
> Patrick McHardy wrote:
>> That looks more complicated since it requires to take multiple locks
>> occasionally (f.i. hash insertion, potentially helper-related and
>> expectation-related stuff), and there is the unconfirmed_list, where
>> fine-grained locking can't really be used without changing it to
>> a hash.
>
> Yes its more complicated, but look what we did in 2.6.29 for tcp/udp
> sockets, using RCU to have lockless lookups. Yes, we still take a lock
> when doing an insert or delete at socket bind/unbind time.
>
> We could keep a central nf_conntrack_lock to guard insertions/deletes
> from hash and unconfirmed_list.
>
> But *normal* packets that only need to change state of one particular
> connection could use RCU (without spinlock) to locate the conntrack,
> then lock the found conntrack to perform all state changes.

Well... RCU is already used by conntrack :)

Maybe only __nf_ct_refresh_acct() needs to avoid taking
nf_conntrack_lock.
Eric Dumazet wrote:
> Eric Dumazet wrote:
>> Patrick McHardy wrote:
>>> That looks more complicated since it requires to take multiple locks
>>> occasionally (f.i. hash insertion, potentially helper-related and
>>> expectation-related stuff), and there is the unconfirmed_list, where
>>> fine-grained locking can't really be used without changing it to
>>> a hash.
>>
>> Yes its more complicated, but look what we did in 2.6.29 for tcp/udp
>> sockets, using RCU to have lockless lookups. Yes, we still take a lock
>> when doing an insert or delete at socket bind/unbind time.
>>
>> We could keep a central nf_conntrack_lock to guard insertions/deletes
>> from hash and unconfirmed_list.
>>
>> But *normal* packets that only need to change state of one particular
>> connection could use RCU (without spinlock) to locate the conntrack,
>> then lock the found conntrack to perform all state changes.
>
> Well... RCU is already used by conntrack :)
>
> Maybe only __nf_ct_refresh_acct() needs not taking nf_conntrack_lock

The lock is currently used to avoid timer update races. Martin has two
old patches to remove it, but they require changes to the timer code to
support a mod_timer variant that doesn't enable an inactive timer:

http://people.netfilter.org/gandalf/performance/patches/mod_timer_noact
http://people.netfilter.org/gandalf/performance/patches/__nf_ct_refresh_acct-locking
diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index 2e0c536..288aff5 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -129,6 +129,38 @@ struct nf_conn
 	struct rcu_head rcu;
 };

+#if defined(CONFIG_SMP) || defined(CONFIG_DEBUG_SPINLOCK) || \
+    defined(CONFIG_PROVE_LOCKING)
+
+/*
+ * We reserve 16 locks per cpu (typical cache line size is 64 bytes)
+ * maxed to 512 locks so that ct_hlock[] uses at most 2048 bytes of memory.
+ * (on lockdep we have a quite big spinlock_t, so keep the size down there)
+ */
+#ifdef CONFIG_LOCKDEP
+#define CT_HASH_LOCK_SZ 64
+#elif NR_CPUS >= 32
+#define CT_HASH_LOCK_SZ 512
+#else
+#define CT_HASH_LOCK_SZ (roundup_pow_of_two(NR_CPUS) * 16)
+#endif
+
+extern spinlock_t ct_hlock[CT_HASH_LOCK_SZ];
+
+#else
+#define CT_HASH_LOCK_SZ 0
+#endif
+
+static inline spinlock_t *ct_lock_addr(const struct nf_conn *ct)
+{
+	if (CT_HASH_LOCK_SZ) {
+		unsigned long hash = (unsigned long)ct;
+
+		hash ^= hash >> 16;
+		hash ^= hash >> 8;
+		return &ct_hlock[hash % CT_HASH_LOCK_SZ];
+	}
+	return NULL;
+}
+
 static inline struct nf_conn *
 nf_ct_tuplehash_to_ctrack(const struct nf_conntrack_tuple_hash *hash)
 {
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 90ce9dd..68822d8 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -61,9 +61,9 @@ struct nf_conn nf_conntrack_untracked __read_mostly;
 EXPORT_SYMBOL_GPL(nf_conntrack_untracked);

 static struct kmem_cache *nf_conntrack_cachep __read_mostly;
-
-static int nf_conntrack_hash_rnd_initted;
-static unsigned int nf_conntrack_hash_rnd;
+spinlock_t ct_hlock[CT_HASH_LOCK_SZ];
+static int nf_conntrack_hash_rnd_initted __read_mostly;
+static unsigned int nf_conntrack_hash_rnd __read_mostly;

 static u_int32_t __hash_conntrack(const struct nf_conntrack_tuple *tuple,
 				  unsigned int size, unsigned int rnd)
@@ -1141,7 +1141,7 @@ module_param_call(hashsize, nf_conntrack_set_hashsize, param_get_uint,
 static int nf_conntrack_init_init_net(void)
 {
 	int max_factor = 8;
-	int ret;
+	int i, ret;

 	/* Idea from tcp.c: use 1/16384 of memory.  On i386: 32MB
 	 * machine has 512 buckets. >= 1GB machines have 16384 buckets. */
@@ -1174,6 +1174,8 @@ static int nf_conntrack_init_init_net(void)
 		ret = -ENOMEM;
 		goto err_cache;
 	}
+	for (i = 0; i < CT_HASH_LOCK_SZ; i++)
+		spin_lock_init(&ct_hlock[i]);

 	ret = nf_conntrack_proto_init();
 	if (ret < 0)
diff --git a/net/netfilter/nf_conntrack_proto_tcp.c b/net/netfilter/nf_conntrack_proto_tcp.c
index a1edb9c..247e82f 100644
--- a/net/netfilter/nf_conntrack_proto_tcp.c
+++ b/net/netfilter/nf_conntrack_proto_tcp.c
@@ -26,9 +26,6 @@
 #include <net/netfilter/nf_conntrack_ecache.h>
 #include <net/netfilter/nf_log.h>

-/* Protects ct->proto.tcp */
-static DEFINE_RWLOCK(tcp_lock);
-
 /* "Be conservative in what you do,
     be liberal in what you accept from others."
     If it's non-zero, we mark only out of window RST segments as INVALID. */
@@ -297,9 +294,7 @@ static int tcp_print_conntrack(struct seq_file *s, const struct nf_conn *ct)
 {
 	enum tcp_conntrack state;

-	read_lock_bh(&tcp_lock);
 	state = ct->proto.tcp.state;
-	read_unlock_bh(&tcp_lock);

 	return seq_printf(s, "%s ", tcp_conntrack_names[state]);
 }
@@ -705,14 +700,14 @@ void nf_conntrack_tcp_update(const struct sk_buff *skb,
 	end = segment_seq_plus_len(ntohl(tcph->seq), skb->len, dataoff, tcph);

-	write_lock_bh(&tcp_lock);
+	spin_lock_bh(ct_lock_addr(ct));
 	/*
 	 * We have to worry for the ack in the reply packet only...
 	 */
 	if (after(end, ct->proto.tcp.seen[dir].td_end))
 		ct->proto.tcp.seen[dir].td_end = end;
 	ct->proto.tcp.last_end = end;
-	write_unlock_bh(&tcp_lock);
+	spin_unlock_bh(ct_lock_addr(ct));
 	pr_debug("tcp_update: sender end=%u maxend=%u maxwin=%u scale=%i "
 		 "receiver end=%u maxend=%u maxwin=%u scale=%i\n",
 		 sender->td_end, sender->td_maxend, sender->td_maxwin,
@@ -821,7 +816,7 @@ static int tcp_packet(struct nf_conn *ct,
 	th = skb_header_pointer(skb, dataoff, sizeof(_tcph), &_tcph);
 	BUG_ON(th == NULL);

-	write_lock_bh(&tcp_lock);
+	spin_lock_bh(ct_lock_addr(ct));
 	old_state = ct->proto.tcp.state;
 	dir = CTINFO2DIR(ctinfo);
 	index = get_conntrack_index(th);
@@ -851,7 +846,7 @@ static int tcp_packet(struct nf_conn *ct,
 		    && ct->proto.tcp.last_index == TCP_RST_SET)) {
 			/* Attempt to reopen a closed/aborted connection.
 			 * Delete this connection and look up again. */
-			write_unlock_bh(&tcp_lock);
+			spin_unlock_bh(ct_lock_addr(ct));

 			/* Only repeat if we can actually remove the timer.
 			 * Destruction may already be in progress in process
@@ -887,7 +882,7 @@ static int tcp_packet(struct nf_conn *ct,
 			 * that the client cannot but retransmit its SYN and
 			 * thus initiate a clean new session.
 			 */
-			write_unlock_bh(&tcp_lock);
+			spin_unlock_bh(ct_lock_addr(ct));
 			if (LOG_INVALID(net, IPPROTO_TCP))
 				nf_log_packet(pf, 0, skb, NULL, NULL, NULL,
 					      "nf_ct_tcp: killing out of sync session ");
@@ -900,7 +895,7 @@ static int tcp_packet(struct nf_conn *ct,
 		ct->proto.tcp.last_end =
 		    segment_seq_plus_len(ntohl(th->seq), skb->len, dataoff, th);

-		write_unlock_bh(&tcp_lock);
+		spin_unlock_bh(ct_lock_addr(ct));
 		if (LOG_INVALID(net, IPPROTO_TCP))
 			nf_log_packet(pf, 0, skb, NULL, NULL, NULL,
 				      "nf_ct_tcp: invalid packet ignored ");
@@ -909,7 +904,7 @@ static int tcp_packet(struct nf_conn *ct,
 		/* Invalid packet */
 		pr_debug("nf_ct_tcp: Invalid dir=%i index=%u ostate=%u\n",
 			 dir, get_conntrack_index(th), old_state);
-		write_unlock_bh(&tcp_lock);
+		spin_unlock_bh(ct_lock_addr(ct));
 		if (LOG_INVALID(net, IPPROTO_TCP))
 			nf_log_packet(pf, 0, skb, NULL, NULL, NULL,
 				      "nf_ct_tcp: invalid state ");
@@ -940,7 +935,7 @@ static int tcp_packet(struct nf_conn *ct,
 	if (!tcp_in_window(ct, &ct->proto.tcp, dir, index,
 			   skb, dataoff, th, pf)) {
-		write_unlock_bh(&tcp_lock);
+		spin_unlock_bh(ct_lock_addr(ct));
 		return -NF_ACCEPT;
 	}
      in_window:
@@ -969,7 +964,7 @@ static int tcp_packet(struct nf_conn *ct,
 		timeout = nf_ct_tcp_timeout_unacknowledged;
 	else
 		timeout = tcp_timeouts[new_state];
-	write_unlock_bh(&tcp_lock);
+	spin_unlock_bh(ct_lock_addr(ct));

 	nf_conntrack_event_cache(IPCT_PROTOINFO_VOLATILE, ct);
 	if (new_state != old_state)
@@ -1022,6 +1017,7 @@ static bool tcp_new(struct nf_conn *ct, const struct sk_buff *skb,
 		pr_debug("nf_ct_tcp: invalid new deleting.\n");
 		return false;
 	}
+	spin_lock_init(ct_lock_addr(ct));

 	if (new_state == TCP_CONNTRACK_SYN_SENT) {
 		/* SYN packet */
@@ -1092,7 +1088,7 @@ static int tcp_to_nlattr(struct sk_buff *skb, struct nlattr *nla,
 	struct nlattr *nest_parms;
 	struct nf_ct_tcp_flags tmp = {};

-	read_lock_bh(&tcp_lock);
+	spin_lock_bh(ct_lock_addr(ct));
 	nest_parms = nla_nest_start(skb, CTA_PROTOINFO_TCP | NLA_F_NESTED);
 	if (!nest_parms)
 		goto nla_put_failure;
@@ -1112,14 +1108,14 @@ static int tcp_to_nlattr(struct sk_buff *skb, struct nlattr *nla,
 	tmp.flags = ct->proto.tcp.seen[1].flags;
 	NLA_PUT(skb, CTA_PROTOINFO_TCP_FLAGS_REPLY,
 		sizeof(struct nf_ct_tcp_flags), &tmp);
-	read_unlock_bh(&tcp_lock);
+	spin_unlock_bh(ct_lock_addr(ct));

 	nla_nest_end(skb, nest_parms);

 	return 0;

 nla_put_failure:
-	read_unlock_bh(&tcp_lock);
+	spin_unlock_bh(ct_lock_addr(ct));
 	return -1;
 }
@@ -1150,7 +1146,7 @@ static int nlattr_to_tcp(struct nlattr *cda[], struct nf_conn *ct)
 	    nla_get_u8(tb[CTA_PROTOINFO_TCP_STATE]) >= TCP_CONNTRACK_MAX)
 		return -EINVAL;

-	write_lock_bh(&tcp_lock);
+	spin_lock_bh(ct_lock_addr(ct));
 	if (tb[CTA_PROTOINFO_TCP_STATE])
 		ct->proto.tcp.state = nla_get_u8(tb[CTA_PROTOINFO_TCP_STATE]);
@@ -1177,7 +1173,7 @@ static int nlattr_to_tcp(struct nlattr *cda[], struct nf_conn *ct)
 		ct->proto.tcp.seen[1].td_scale =
 			nla_get_u8(tb[CTA_PROTOINFO_TCP_WSCALE_REPLY]);
 	}
-	write_unlock_bh(&tcp_lock);
+	spin_unlock_bh(ct_lock_addr(ct));
 	return 0;
 }