diff mbox

net: pfifo_fast - use ffs(x)-1 instead of array lookup

Message ID 1331615541-18214-1-git-send-email-zenczykowski@gmail.com
State Changes Requested, archived
Delegated to: David Miller
Headers show

Commit Message

Maciej Żenczykowski March 13, 2012, 5:12 a.m. UTC
From: Maciej Żenczykowski <maze@google.com>

See ffs(x) definition in arch/x86/include/asm/bitops.h

  ffs - find first set bit in word

  ffs(value) returns 0 if value is 0 or the position of the first
  set bit if value is nonzero. The first (least significant) bit
  is at position 1.

On x86_64 ffs(x) is effectively:
  Z := -1
  BSFL X, Z
  return Z + 1

Since we subtract one, we effectively end up with:
  Z := -1
  BSFL X, Z
  return Z

This is certainly more readable than the open coded array that
was there before, supports an easier change in the number of bands,
and is probably faster to boot (no memory lookup).

However, on other architectures ffs() might not be so pretty,
hence use a clever arithmetic hack on other archs.
Unfortunately it only support 3 bands.

Signed-off-by: Maciej Żenczykowski <maze@google.com>
---
 net/sched/sch_generic.c |   18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

Comments

Maciej Żenczykowski March 13, 2012, 5:13 a.m. UTC | #1
Not really sure about this one, this might be more of a discussion starter...
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller March 13, 2012, 5:54 a.m. UTC | #2
From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Mon, 12 Mar 2012 22:12:21 -0700

> From: Maciej Żenczykowski <maze@google.com>
> 
> See ffs(x) definition in arch/x86/include/asm/bitops.h
> 
>   ffs - find first set bit in word
> 
>   ffs(value) returns 0 if value is 0 or the position of the first
>   set bit if value is nonzero. The first (least significant) bit
>   is at position 1.
> 
> On x86_64 ffs(x) is effectively:
>   Z := -1
>   BSFL X, Z
>   return Z + 1
> 
> Since we subtract one, we effectively end up with:
>   Z := -1
>   BSFL X, Z
>   return Z
> 
> This is certainly more readable than the open coded array that
> was there before, supports an easier change in the number of bands,
> and is probably faster to boot (no memory lookup).
> 
> However, on other architectures ffs() might not be so pretty,
> hence use a clever arithmetic hack on other archs.
> Unfortunately it only support 3 bands.
> 
> Signed-off-by: Maciej Żenczykowski <maze@google.com>

It's about the same cost, the non-ffs() version, so I would just
use that for now.  Conditionalized code is such a pain in the butt.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Maciej Żenczykowski March 13, 2012, 7:21 a.m. UTC | #3
If I understand correctly, you're suggesting to go with

return ((26468 >> (i+i)) & 3) - 1;

correct?

2012/3/12 David Miller <davem@davemloft.net>:
> From: Maciej Żenczykowski <zenczykowski@gmail.com>
> Date: Mon, 12 Mar 2012 22:12:21 -0700
>
>> From: Maciej Żenczykowski <maze@google.com>
>>
>> See ffs(x) definition in arch/x86/include/asm/bitops.h
>>
>>   ffs - find first set bit in word
>>
>>   ffs(value) returns 0 if value is 0 or the position of the first
>>   set bit if value is nonzero. The first (least significant) bit
>>   is at position 1.
>>
>> On x86_64 ffs(x) is effectively:
>>   Z := -1
>>   BSFL X, Z
>>   return Z + 1
>>
>> Since we subtract one, we effectively end up with:
>>   Z := -1
>>   BSFL X, Z
>>   return Z
>>
>> This is certainly more readable than the open coded array that
>> was there before, supports an easier change in the number of bands,
>> and is probably faster to boot (no memory lookup).
>>
>> However, on other architectures ffs() might not be so pretty,
>> hence use a clever arithmetic hack on other archs.
>> Unfortunately it only support 3 bands.
>>
>> Signed-off-by: Maciej Żenczykowski <maze@google.com>
>
> It's about the same cost, the non-ffs() version, so I would just
> use that for now.  Conditionalized code is such a pain in the butt.
>
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Maciej Żenczykowski March 13, 2012, 7:23 a.m. UTC | #4
I was actually leaning towards going with ffs()-1, since

it's excellent on x86-64
good on x86_cmov
decent on x86
good on armv5+

and I'm guessing it'll be good or better on any modern platform
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller March 13, 2012, 7:25 a.m. UTC | #5
From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Tue, 13 Mar 2012 00:21:55 -0700

> If I understand correctly, you're suggesting to go with
> 
> return ((26468 >> (i+i)) & 3) - 1;
> 
> correct?

Yes, at least for now.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller March 13, 2012, 7:25 a.m. UTC | #6
From: Maciej Żenczykowski <zenczykowski@gmail.com>
Date: Tue, 13 Mar 2012 00:23:27 -0700

> I was actually leaning towards going with ffs()-1, since
> 
> it's excellent on x86-64
> good on x86_cmov
> decent on x86
> good on armv5+
> 
> and I'm guessing it'll be good or better on any modern platform

Cycle count it with the pure integer operation variant, I bet it's
within a cycle or two, and at that point we're just splitting hairs
for an ugly-as-sin ifdef.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Laight March 13, 2012, 11:04 a.m. UTC | #7
> +	/* For i in 0..7 returns {-1, 0, 1, 0, 2, 0, 1, 0}[i] */
> +	return ((26468 >> (i+i)) & 3) - 1;

That expression doesn't seem quite right to me ...
	return (int)(0x12131210 >> (i * 4)) & 3) - 1;
probably does what is wanted.

	David


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Laight March 13, 2012, 4:55 p.m. UTC | #8
> > +	/* For i in 0..7 returns {-1, 0, 1, 0, 2, 0, 1, 0}[i] */
> > +	return ((26468 >> (i+i)) & 3) - 1;
> 
> That expression doesn't seem quite right to me ...
> 	return (int)(0x12131210 >> (i * 4)) & 3) - 1;
> probably does what is wanted.

Hmmm... I'm going blind - I read that as 'i+1' not 'i+i'.
But using 'i * 4' and putting the constant in base 16
make the code rather less obscure.

	David


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Dumazet March 13, 2012, 5:34 p.m. UTC | #9
On Tue, 2012-03-13 at 16:55 +0000, David Laight wrote:
> > > +	/* For i in 0..7 returns {-1, 0, 1, 0, 2, 0, 1, 0}[i] */
> > > +	return ((26468 >> (i+i)) & 3) - 1;
> > 
> > That expression doesn't seem quite right to me ...
> > 	return (int)(0x12131210 >> (i * 4)) & 3) - 1;
> > probably does whats wanted.
> 
> Hmmm... I'm going blind - I read that as 'i+1' not 'i+i'.
> But using 'i * 4' and putting the constant in base 16
> make the code rather less obscure.

Point was to get short/fast code.

26458 can probably be expressed with a macro so that it is self
explained.

Check asm output if your way is better. Its probably not the case
because some arches can use smaller code to manipulate small constants.

I am not sure the memory lookup is that expensive anyway
(we probably could use 'char' instead of 'int' to reduce the siwe of
this memory blob)

Code is also read from memory, so you have to trade icache/dcache
issues.



--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Maciej Żenczykowski March 13, 2012, 8:37 p.m. UTC | #10
but a 16-bit constant will be better than a 32-bit one on 16-bit platforms

On Tue, Mar 13, 2012 at 9:55 AM, David Laight <David.Laight@aculab.com> wrote:
>
>> > +   /* For i in 0..7 returns {-1, 0, 1, 0, 2, 0, 1, 0}[i] */
>> > +   return ((26468 >> (i+i)) & 3) - 1;
>>
>> That expression doesn't seem quite right to me ...
>>       return (int)(0x12131210 >> (i * 4)) & 3) - 1;
>> probably does what is wanted.
>
> Hmmm... I'm going blind - I read that as 'i+1' not 'i+i'.
> But using 'i * 4' and putting the constant in base 16
> make the code rather less obscure.
>
>        David
>
>
diff mbox

Patch

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index 67fc573e013a..935492d6a0b6 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -436,9 +436,19 @@  struct pfifo_fast_priv {
  * Convert a bitmap to the first band number where an skb is queued, where:
  * 	bitmap=0 means there are no skbs on any band.
  * 	bitmap=1 means there is an skb on band 0.
- *	bitmap=7 means there are skbs on all 3 bands, etc.
+ * 	bitmap=7 means there are skbs on all 3 bands, etc.
+ *
+ * This is equivalent to ffs(i) - 1
  */
-static const int bitmap2band[] = {-1, 0, 1, 0, 2, 0, 1, 0};
+static inline int bitmap2band(int i)
+{
+#if defined(CONFIG_X86_64)
+	return ffs(i) - 1; /* Known to be efficient. */
+#else
+	/* For i in 0..7 returns {-1, 0, 1, 0, 2, 0, 1, 0}[i] */
+	return ((26468 >> (i+i)) & 3) - 1;
+#endif
+}
 
 static inline struct sk_buff_head *band2list(struct pfifo_fast_priv *priv,
 					     int band)
@@ -464,7 +474,7 @@  static int pfifo_fast_enqueue(struct sk_buff *skb, struct Qdisc *qdisc)
 static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
 {
 	struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
-	int band = bitmap2band[priv->bitmap];
+	int band = bitmap2band(priv->bitmap);
 
 	if (likely(band >= 0)) {
 		struct sk_buff_head *list = band2list(priv, band);
@@ -483,7 +493,7 @@  static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc)
 static struct sk_buff *pfifo_fast_peek(struct Qdisc *qdisc)
 {
 	struct pfifo_fast_priv *priv = qdisc_priv(qdisc);
-	int band = bitmap2band[priv->bitmap];
+	int band = bitmap2band(priv->bitmap);
 
 	if (band >= 0) {
 		struct sk_buff_head *list = band2list(priv, band);