Message ID:   20171011142607.15026-2-fw@strlen.de
State:        Changes Requested
Delegated to: Pablo Neira
Series:       netfilter: x_tables: speed up iptables-restore
On Wed, Oct 11, 2017 at 7:26 AM, Florian Westphal <fw@strlen.de> wrote:
> xt_replace_table relies on table replacement counter retrieval (which
> uses xt_recseq to synchronize pcpu counters).
>
> This is fine, however with a large rule set get_counters() can take
> a very long time -- it needs to synchronize all counters because
> it has to assume concurrent modifications can occur.
>
> Make xt_replace_table synchronize by itself by waiting until all cpus
> have an even seqcount.
>
> This allows a followup patch to copy the counters of the old ruleset
> without any synchronization after xt_replace_table has completed.
>
> Cc: Dan Williams <dcbw@redhat.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
> v3: check for 'seq is uneven' OR 'has changed' since the last check.
>     It's fine if seq is uneven iff it is a different sequence number
>     than the initial one.
>
> v2: fix Eric's email address
>
>  net/netfilter/x_tables.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)

Reviewed-by: Eric Dumazet <edumazet@google.com>

But it seems we need an extra smp_wmb() after the assignment:

	smp_wmb();
	table->private = newinfo;
+	smp_wmb();

Otherwise we have no guarantee other cpus actually see the new
->private value.

--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel"
in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Eric Dumazet <edumazet@google.com> wrote:
> On Wed, Oct 11, 2017 at 7:26 AM, Florian Westphal <fw@strlen.de> wrote:
> > xt_replace_table relies on table replacement counter retrieval (which
> > uses xt_recseq to synchronize pcpu counters).
[...]
> Reviewed-by: Eric Dumazet <edumazet@google.com>
>
> But it seems we need an extra smp_wmb() after the assignment:
>
>	smp_wmb();
>	table->private = newinfo;
> +	smp_wmb();
>
> Otherwise we have no guarantee other cpus actually see the new
> ->private value.

Seems to be unrelated to this change, so I will submit a separate patch
for nf.git that adds this.

Thanks!
On Wed, Oct 11, 2017 at 10:48 AM, Florian Westphal <fw@strlen.de> wrote:
> Eric Dumazet <edumazet@google.com> wrote:
>> But it seems we need an extra smp_wmb() after the assignment:
>>
>>	smp_wmb();
>>	table->private = newinfo;
>> +	smp_wmb();
>>
>> Otherwise we have no guarantee other cpus actually see the new
>> ->private value.
[...]
> Seems to be unrelated to this change, so I will submit a separate patch
> for nf.git that adds this.

This is related to this change, please read the comment before the
local_bh_enable():

	/*
	 * Even though table entries have now been swapped, other CPU's
	 * may still be using the old entries. This is okay, because
	 * resynchronization happens because of the locking done
	 * during the get_counters() routine.
	 */

Since your new code happens right after table->private = newinfo;
the smp_wmb() is required.
Eric Dumazet <edumazet@google.com> wrote:
[...]
> This is related to this change, please read the comment before the
> local_bh_enable():
>
>	/*
>	 * Even though table entries have now been swapped, other CPU's
>	 * may still be using the old entries. This is okay, because
>	 * resynchronization happens because of the locking done
>	 * during the get_counters() routine.
>	 */

Hmm, but get_counters() does not issue a wmb, and the 'new' code added
here is essentially the same as get_counters(), except that we only
read the seqcount until we see a change (and not for each counter in
the rule set).

What am I missing?
On Wed, Oct 11, 2017 at 11:18 AM, Florian Westphal <fw@strlen.de> wrote:
[...]
> Hmm, but get_counters() does not issue a wmb, and the 'new' code added
> here is essentially the same as get_counters(), except that we only
> read the seqcount until we see a change (and not for each counter in
> the rule set).
>
> What am I missing?

Your sync code does nothing interesting if we are not sure the new
table->private value is visible.

Without barriers, the compiler/cpu could do this:

+	/* ... so wait for even xt_recseq on all cpus */
+	for_each_possible_cpu(cpu) {
+		seqcount_t *s = &per_cpu(xt_recseq, cpu);
+		u32 seq = raw_read_seqcount(s);
+
+		if (seq & 1) {
+			do {
+				cond_resched();
+				cpu_relax();
+			} while (seq == raw_read_seqcount(s));
+		}
+	}

	/* finally, write new private value */
	table->private = newinfo;

Basically, your loop is now useless and you could remove it.

So there is definitely a bug.
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index c83a3b5e1c6c..ffd1c7a76e29 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1153,6 +1153,7 @@ xt_replace_table(struct xt_table *table,
 		 int *error)
 {
 	struct xt_table_info *private;
+	unsigned int cpu;
 	int ret;
 
 	ret = xt_jumpstack_alloc(newinfo);
@@ -1184,12 +1185,23 @@ xt_replace_table(struct xt_table *table,
 	/*
 	 * Even though table entries have now been swapped, other CPU's
-	 * may still be using the old entries. This is okay, because
-	 * resynchronization happens because of the locking done
-	 * during the get_counters() routine.
+	 * may still be using the old entries...
 	 */
 	local_bh_enable();
 
+	/* ... so wait for even xt_recseq on all cpus */
+	for_each_possible_cpu(cpu) {
+		seqcount_t *s = &per_cpu(xt_recseq, cpu);
+		u32 seq = raw_read_seqcount(s);
+
+		if (seq & 1) {
+			do {
+				cond_resched();
+				cpu_relax();
+			} while (seq == raw_read_seqcount(s));
+		}
+	}
+
 #ifdef CONFIG_AUDIT
 	if (audit_enabled) {
 		audit_log(current->audit_context, GFP_KERNEL,
xt_replace_table relies on table replacement counter retrieval (which
uses xt_recseq to synchronize pcpu counters).

This is fine, however with a large rule set get_counters() can take a
very long time -- it needs to synchronize all counters because it has
to assume concurrent modifications can occur.

Make xt_replace_table synchronize by itself by waiting until all cpus
have an even seqcount.

This allows a followup patch to copy the counters of the old ruleset
without any synchronization after xt_replace_table has completed.

Cc: Dan Williams <dcbw@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
v3: check for 'seq is uneven' OR 'has changed' since the last check.
    It's fine if seq is uneven iff it is a different sequence number
    than the initial one.

v2: fix Eric's email address

 net/netfilter/x_tables.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)