Patchwork netfilter 07/41: arp_tables: unfold two critical loops in arp_packet_match()

Submitter Eric Dumazet
Date March 24, 2009, 9:06 p.m.
Message ID <49C94B6A.5020304@cosmosbay.com>
Permalink /patch/25028/
State Accepted
Delegated to: David Miller

Comments

Eric Dumazet - March 24, 2009, 9:06 p.m.
David Miller wrote:
> From: Patrick McHardy <kaber@trash.net>
> Date: Tue, 24 Mar 2009 15:03:16 +0100 (MET)
> 
>> +/*
>> + * Unfortunatly, _b and _mask are not aligned to an int (or long int)
>> + * Some arches dont care, unrolling the loop is a win on them.
>> + */
>> +static unsigned long ifname_compare(const char *_a, const char *_b, const char *_mask)
>> +{
>> +#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>> +	const unsigned long *a = (const unsigned long *)_a;
>> +	const unsigned long *b = (const unsigned long *)_b;
> 
> I think we can at least give some help for the platforms which
> require alignment.
> 
> We can, for example, assume 16-bit alignment and thus loop
> over u16's

Right. How about this incremental patch?

Thanks

[PATCH] arp_tables: ifname_compare() can assume 16bit alignment

Arches without efficient unaligned access can still perform a loop
assuming 16bit alignment in ifname_compare()

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - March 24, 2009, 9:16 p.m.
From: Eric Dumazet <dada1@cosmosbay.com>
Date: Tue, 24 Mar 2009 22:06:50 +0100

> [PATCH] arp_tables: ifname_compare() can assume 16bit alignment
> 
> Arches without efficient unaligned access can still perform a loop
> assuming 16bit alignment in ifname_compare()
> 
> Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>

Looks great, applied, thanks Eric!
Jan Engelhardt - March 24, 2009, 9:17 p.m.
On Tuesday 2009-03-24 22:06, Eric Dumazet wrote:
>>> +/*
>>> + * Unfortunatly, _b and _mask are not aligned to an int (or long int)
>>> + * Some arches dont care, unrolling the loop is a win on them.
>>> + */
>>> +static unsigned long ifname_compare(const char *_a, const char *_b, const char *_mask)
>>> +{
>>> +#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
>>> +	const unsigned long *a = (const unsigned long *)_a;
>>> +	const unsigned long *b = (const unsigned long *)_b;
>> 
>> I think we can at least give some help for the platforms which
>> require alignment.
>> 
>> We can, for example, assume 16-bit alignment and thus loop
>> over u16's
>
>Right. How about this incremental patch ?
>
>Thanks
>
>[PATCH] arp_tables: ifname_compare() can assume 16bit alignment
>
>Arches without efficient unaligned access can still perform a loop
>assuming 16bit alignment in ifname_compare()

Allow me some skepticism, but the code looks pretty much like a
standard memcmp.

> 	unsigned long ret = 0;
>+	const u16 *a = (const u16 *)_a;
>+	const u16 *b = (const u16 *)_b;
>+	const u16 *mask = (const u16 *)_mask;
> 	int i;
> 
>-	for (i = 0; i < IFNAMSIZ; i++)
>-		ret |= (_a[i] ^ _b[i]) & _mask[i];
>+	for (i = 0; i < IFNAMSIZ/sizeof(u16); i++)
>+		ret |= (a[i] ^ b[i]) & mask[i];
> #endif
> 	return ret;
> }
David Miller - March 24, 2009, 9:18 p.m.
From: Jan Engelhardt <jengelh@medozas.de>
Date: Tue, 24 Mar 2009 22:17:17 +0100 (CET)

> 
> On Tuesday 2009-03-24 22:06, Eric Dumazet wrote:
> >>> +/*
> >>> + * Unfortunatly, _b and _mask are not aligned to an int (or long int)
> >>> + * Some arches dont care, unrolling the loop is a win on them.
> >>> + */
> >>> +static unsigned long ifname_compare(const char *_a, const char *_b, const char *_mask)
> >>> +{
> >>> +#ifdef CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
> >>> +	const unsigned long *a = (const unsigned long *)_a;
> >>> +	const unsigned long *b = (const unsigned long *)_b;
> >> 
> >> I think we can at least give some help for the platforms which
> >> require alignment.
> >> 
> >> We can, for example, assume 16-bit alignment and thus loop
> >> over u16's
> >
> >Right. How about this incremental patch ?
> >
> >Thanks
> >
> >[PATCH] arp_tables: ifname_compare() can assume 16bit alignment
> >
> >Arches without efficient unaligned access can still perform a loop
> >assuming 16bit alignment in ifname_compare()
> 
> Allow me some skepticism, but the code looks pretty much like a
> standard memcmp.

memcmp() can't make any assumptions about alignment.
Whereas we _know_ this thing is exactly 16-bit aligned.

All of the optimized memcmp() implementations look for
32-bit alignment and punt to byte at a time comparison
loops if things are not aligned enough.
Jan Engelhardt - March 24, 2009, 9:23 p.m.
On Tuesday 2009-03-24 22:18, David Miller wrote:
>> >
>> >Arches without efficient unaligned access can still perform a loop
>> >assuming 16bit alignment in ifname_compare()
>> 
>> Allow me some skepticism, but the code looks pretty much like a
>> standard memcmp.
>
>memcmp() can't make any assumptions about alignment.
>Whereas we _know_ this thing is exactly 16-bit aligned.
>
>All of the optimized memcmp() implementations look for
>32-bit alignment and punt to byte at a time comparison
>loops if things are not aligned enough.

Yes, I seem to remember glibc doing something like

 if ((addr & 0x03) != 0) {
     // process single bytes (increment addr as you go)
     // until addr & 0x03 == 0.
 }

 /* optimized loop here. also increases addr */

 if ((addr & 0x03) != 0)
     // still bytes left after loop - process on a per-byte basis

Is the cost of testing for non-4-divisibility expensive enough
to warrant not using memcmp?

Irrespective of all that, I think the interface comparison code
should be consolidated into a function/header so that it is not
replicated across iptables, ip6tables, ebtables, arptables, etc.
David Miller - March 24, 2009, 9:25 p.m.
From: Jan Engelhardt <jengelh@medozas.de>
Date: Tue, 24 Mar 2009 22:23:23 +0100 (CET)

> Is the cost of testing for non-4-divisibility expensive enough
> to warrant not using memcmp?

I think so.  This is the fast path of these things.

> Irrespective of all that, I think the interface comparison code
> should be consolidated into a function/header so that it is not
> replicated across iptables, ip6tables, ebtables, arptables, etc.

Agreed.
Eric Dumazet - March 24, 2009, 9:39 p.m.
Jan Engelhardt wrote:
> On Tuesday 2009-03-24 22:18, David Miller wrote:
>>>> Arches without efficient unaligned access can still perform a loop
>>>> assuming 16bit alignment in ifname_compare()
>>> Allow me some skepticism, but the code looks pretty much like a
>>> standard memcmp.
>> memcmp() can't make any assumptions about alignment.
>> Whereas we _know_ this thing is exactly 16-bit aligned.
>>
>> All of the optimized memcmp() implementations look for
>> 32-bit alignment and punt to byte at a time comparison
>> loops if things are not aligned enough.
> 
> Yes, I seem to remember glibc doing something like
> 
>  if ((addr & 0x03) != 0) {
>      // process single bytes (increment addr as you go)
>      // until addr & 0x03 == 0.
>  }
> 
>  /* optimized loop here. also increases addr */
> 
>  if ((addr & 0x03) != 0)
>      // still bytes left after loop - process on a per-byte basis
> 
> Is the cost of testing for non-4-divisibility expensive enough
> to warrant not using memcmp?
> 
> Irrespective of all that, I think the interface comparison code
> should be consolidated into a function/header so that it is not
> replicated across iptables, ip6tables, ebtables, arptables, etc.

memcmp() is fine, but how does it solve the masking problem we have?

Also, in the case of arp_tables, _a is long word aligned, while _b and _mask are not.

memcmp() in this case is slower (and doesn't handle the mask).

If you look at the various ifname_compare() functions, we have two different implementations.

So yes, a factorization is possible for ip_tables.c, ip6_tables.c and xt_physdev.c




Jan Engelhardt - March 24, 2009, 9:52 p.m.
On Tuesday 2009-03-24 22:39, Eric Dumazet wrote:
>>> memcmp() can't make any assumptions about alignment.
>>> Whereas we _know_ this thing is exactly 16-bit aligned.
>>>
>>> All of the optimized memcmp() implementations look for
>>> 32-bit alignment and punt to byte at a time comparison
>>> loops if things are not aligned enough.
>> 
>> Yes, I seem to remember glibc doing something like
>> [...]
>> Is the cost of testing for non-4-divisibility expensive enough
>> to warrant not using memcmp?
>> 
>> Irrespective of all that, I think the interface comparison code
>> should be consolidated into a function/header so that it is not
>> replicated across iptables, ip6tables, ebtables, arptables, etc.
>
>memcmp() is fine, but how does it solve the masking problem we have?

You are right; we would have to calculate the prefix length of the
mask first. (And I think we can assume that there will not be any 1s
in the mask after the first 0.) It would indeed consume CPU.

>Also in the case of arp_tables, _a is long word aligned, while _b and _mask
>are not.
>
>If you look at the various ifname_compare() functions, we have two different
>implementations.
>
>So yes, a factorization is possible for ip_tables.c, ip6_tables.c and
> xt_physdev.c.

Very well.
Andi Kleen - March 25, 2009, 10:33 a.m.
David Miller <davem@davemloft.net> writes:
>
> memcmp() can't make any assumptions about alignment.

Newer gcc often[1] knows how to generate specialized aligned memcmp()
based on the type alignment.

However you need to make sure the input types are right.

So in theory, some casts might be enough, given a new enough
compiler.

[1] not always unfortunately, sometimes it loses this information.

-Andi

Patch

diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 64a7c6c..84b9c17 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -76,6 +76,7 @@  static inline int arp_devaddr_compare(const struct arpt_devaddr_info *ap,
 /*
  * Unfortunatly, _b and _mask are not aligned to an int (or long int)
  * Some arches dont care, unrolling the loop is a win on them.
+ * For other arches, we only have 16bit alignment.
  */
 static unsigned long ifname_compare(const char *_a, const char *_b, const char *_mask)
 {
@@ -95,10 +96,13 @@  static unsigned long ifname_compare(const char *_a, const char *_b, const char *
 	BUILD_BUG_ON(IFNAMSIZ > 4 * sizeof(unsigned long));
 #else
 	unsigned long ret = 0;
+	const u16 *a = (const u16 *)_a;
+	const u16 *b = (const u16 *)_b;
+	const u16 *mask = (const u16 *)_mask;
 	int i;
 
-	for (i = 0; i < IFNAMSIZ; i++)
-		ret |= (_a[i] ^ _b[i]) & _mask[i];
+	for (i = 0; i < IFNAMSIZ/sizeof(u16); i++)
+		ret |= (a[i] ^ b[i]) & mask[i];
 #endif
 	return ret;
 }