
[v3,nf-next,1/2] netfilter: x_tables: wait until old table isn't used anymore

Message ID 20171011142607.15026-2-fw@strlen.de
State Changes Requested
Delegated to: Pablo Neira
Series netfilter: x_tables: speed up iptables-restore

Commit Message

Florian Westphal Oct. 11, 2017, 2:26 p.m. UTC
xt_replace_table relies on table replacement counter retrieval (which
uses xt_recseq to synchronize pcpu counters).

This is fine, however with large rule set get_counters() can take
a very long time -- it needs to synchronize all counters because
it has to assume concurrent modifications can occur.

Make xt_replace_table synchronize by itself by waiting until all cpus
had an even seqcount.

This allows a followup patch to copy the counters of the old ruleset
without any synchronization after xt_replace_table has completed.

Cc: Dan Williams <dcbw@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 v3: check for 'seq is uneven' OR 'has changed' since
 last check. It's fine if seq is uneven, as long as it is
 a different sequence number than the initial one.

 v2: fix Eric's email address
 net/netfilter/x_tables.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

Comments

Eric Dumazet Oct. 11, 2017, 3:09 p.m. UTC | #1
On Wed, Oct 11, 2017 at 7:26 AM, Florian Westphal <fw@strlen.de> wrote:
> xt_replace_table relies on table replacement counter retrieval (which
> uses xt_recseq to synchronize pcpu counters).
>
> This is fine, however with large rule set get_counters() can take
> a very long time -- it needs to synchronize all counters because
> it has to assume concurrent modifications can occur.
>
> Make xt_replace_table synchronize by itself by waiting until all cpus
> had an even seqcount.
>
> This allows a followup patch to copy the counters of the old ruleset
> without any synchonization after xt_replace_table has completed.
>
> Cc: Dan Williams <dcbw@redhat.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  v3: check for 'seq is uneven' OR 'has changed' since
>  last check. Its fine if seq is uneven iff its a different
>  sequence number than the initial one.
>
>  v2: fix Erics email address
>  net/netfilter/x_tables.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
>

Reviewed-by: Eric Dumazet <edumazet@google.com>

But it seems we need an extra smp_wmb() after

        smp_wmb();
        table->private = newinfo;
+      smp_wmb();

Otherwise we have no guarantee other cpus actually see the new ->private value.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Florian Westphal Oct. 11, 2017, 5:48 p.m. UTC | #2
Eric Dumazet <edumazet@google.com> wrote:
> On Wed, Oct 11, 2017 at 7:26 AM, Florian Westphal <fw@strlen.de> wrote:
> > [...]
> 
> Reviewed-by: Eric Dumazet <edumazet@google.com>
> 
> But it seems we need an extra smp_wmb() after
> 
>         smp_wmb();
>         table->private = newinfo;
> +      smp_wmb();
>
> Otherwise we have no guarantee other cpus actually see the new ->private value.

Seems to be unrelated to this change, so I will submit
a separate patch for nf.git that adds this.

Thanks!
Eric Dumazet Oct. 11, 2017, 5:57 p.m. UTC | #3
On Wed, Oct 11, 2017 at 10:48 AM, Florian Westphal <fw@strlen.de> wrote:
> Eric Dumazet <edumazet@google.com> wrote:
>> On Wed, Oct 11, 2017 at 7:26 AM, Florian Westphal <fw@strlen.de> wrote:
>> > [...]
>>
>> Reviewed-by: Eric Dumazet <edumazet@google.com>
>>
>> But it seems we need an extra smp_wmb() after
>>
>>         smp_wmb();
>>         table->private = newinfo;
>> +      smp_wmb();
>>
>> Otherwise we have no guarantee other cpus actually see the new ->private value.
>
> Seems to be unrelated to this change, so I will submit
> a separate patch for nf.git that adds this.

This is related to this change; please read the comment before the
local_bh_enable():

        /*
         * Even though table entries have now been swapped, other CPU's
         * may still be using the old entries. This is okay, because
         * resynchronization happens because of the locking done
         * during the get_counters() routine.
         */


Since your new code runs right after the table->private = newinfo
assignment, the smp_wmb() is required.
Florian Westphal Oct. 11, 2017, 6:18 p.m. UTC | #4
Eric Dumazet <edumazet@google.com> wrote:
> On Wed, Oct 11, 2017 at 10:48 AM, Florian Westphal <fw@strlen.de> wrote:
> > Eric Dumazet <edumazet@google.com> wrote:
> >> On Wed, Oct 11, 2017 at 7:26 AM, Florian Westphal <fw@strlen.de> wrote:
> >> > [...]
> >>
> >> Reviewed-by: Eric Dumazet <edumazet@google.com>
> >>
> >> But it seems we need an extra smp_wmb() after
> >>
> >>         smp_wmb();
> >>         table->private = newinfo;
> >> +      smp_wmb();
> >>
> >> Otherwise we have no guarantee other cpus actually see the new ->private value.
> >
> > Seems to be unrelated to this change, so I will submit
> > a separate patch for nf.git that adds this.
> 
> This is related to this change; please read the comment before the
> local_bh_enable():
> 
>         /*
>          * Even though table entries have now been swapped, other CPU's
>          * may still be using the old entries. This is okay, because
>          * resynchronization happens because of the locking done
>          * during the get_counters() routine.
>          */

Hmm, but get_counters() does not issue a wmb, and the 'new' code added
here is essentially the same as get_counters(), except that we only
read the seqcount until we see a change (rather than for each counter
in the rule set).

What am I missing?
Eric Dumazet Oct. 11, 2017, 6:23 p.m. UTC | #5
On Wed, Oct 11, 2017 at 11:18 AM, Florian Westphal <fw@strlen.de> wrote:
> Eric Dumazet <edumazet@google.com> wrote:
>> On Wed, Oct 11, 2017 at 10:48 AM, Florian Westphal <fw@strlen.de> wrote:
>> > Eric Dumazet <edumazet@google.com> wrote:
>> >> On Wed, Oct 11, 2017 at 7:26 AM, Florian Westphal <fw@strlen.de> wrote:
>> >> > [...]
>> >>
>> >> Reviewed-by: Eric Dumazet <edumazet@google.com>
>> >>
>> >> But it seems we need an extra smp_wmb() after
>> >>
>> >>         smp_wmb();
>> >>         table->private = newinfo;
>> >> +      smp_wmb();
>> >>
>> >> Otherwise we have no guarantee other cpus actually see the new ->private value.
>> >
>> > Seems to be unrelated to this change, so I will submit
>> > a separate patch for nf.git that adds this.
>>
>> This is related to this change; please read the comment before the
>> local_bh_enable():
>>
>>         /*
>>          * Even though table entries have now been swapped, other CPU's
>>          * may still be using the old entries. This is okay, because
>>          * resynchronization happens because of the locking done
>>          * during the get_counters() routine.
>>          */
>
> Hmm, but get_counters() does not issue a wmb, and the 'new' code added
> here essentially is the same as get_counters(), except that we only
> read seqcount until we saw a change (and not for each counter in
> the rule set).
>
> What am I missing?

Your sync code does nothing useful if we are not sure the new
table->private value is visible to other cpus.

Without barriers, the compiler/cpu could do this:


+       /* ... so wait for even xt_recseq on all cpus */
+       for_each_possible_cpu(cpu) {
+               seqcount_t *s = &per_cpu(xt_recseq, cpu);
+               u32 seq = raw_read_seqcount(s);
+
+               if (seq & 1) {
+                       do {
+                               cond_resched();
+                               cpu_relax();
+                       } while (seq == raw_read_seqcount(s));
+               }
+       }

/* finally, write new private value */
table->private = newinfo;

Basically, your loop is now useless and you could remove it. So there
is definitely a bug.

Patch

diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index c83a3b5e1c6c..ffd1c7a76e29 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -1153,6 +1153,7 @@  xt_replace_table(struct xt_table *table,
 	      int *error)
 {
 	struct xt_table_info *private;
+	unsigned int cpu;
 	int ret;
 
 	ret = xt_jumpstack_alloc(newinfo);
@@ -1184,12 +1185,23 @@  xt_replace_table(struct xt_table *table,
 
 	/*
 	 * Even though table entries have now been swapped, other CPU's
-	 * may still be using the old entries. This is okay, because
-	 * resynchronization happens because of the locking done
-	 * during the get_counters() routine.
+	 * may still be using the old entries...
 	 */
 	local_bh_enable();
 
+	/* ... so wait for even xt_recseq on all cpus */
+	for_each_possible_cpu(cpu) {
+		seqcount_t *s = &per_cpu(xt_recseq, cpu);
+		u32 seq = raw_read_seqcount(s);
+
+		if (seq & 1) {
+			do {
+				cond_resched();
+				cpu_relax();
+			} while (seq == raw_read_seqcount(s));
+		}
+	}
+
 #ifdef CONFIG_AUDIT
 	if (audit_enabled) {
 		audit_log(current->audit_context, GFP_KERNEL,