Patchwork [RFC] 160-bits bitmap_element

login
register
mail settings
Submitter Richard Guenther
Date Aug. 17, 2012, 12:12 p.m.
Message ID <CAFiYyc1w6-wCNvvLDvH+LffLgoFZZq+7rhFnxAtr-RUdMg8UXg@mail.gmail.com>
Download mbox | patch
Permalink /patch/178194/
State New
Headers show

Comments

Richard Guenther - Aug. 17, 2012, 12:12 p.m.
On Fri, Aug 17, 2012 at 2:04 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
> On Fri, Aug 17, 2012 at 1:54 PM, Richard Guenther
> <richard.guenther@gmail.com> wrote:
>> Well, another effect of reducing the size of BITMAP_WORD is that
>> operations are not performed in a mode optimally using CPU regs
>> (did you check code generation differences on a 64bit host?).
>
> I did, on x86_64 and on powerpc64. The effect is not dramatic, most of
> these machines can perform 32 bits operations just fine (I think the
> only exception would be alpha, maybe?).

I wonder how bad code gets when we unconditionally use GCCs generic
vector support to do

able to lower it to the same scalar code or even v2si operations where
only those are available ...

Just an idea and eventually an opportunity to improve generic vector
lowering if the above really does not work out.

Richard.

> Ciao!
> Steven
Richard Guenther - Aug. 17, 2012, 12:16 p.m.
On Fri, Aug 17, 2012 at 2:12 PM, Richard Guenther
<richard.guenther@gmail.com> wrote:
> On Fri, Aug 17, 2012 at 2:04 PM, Steven Bosscher <stevenb.gcc@gmail.com> wrote:
>> On Fri, Aug 17, 2012 at 1:54 PM, Richard Guenther
>> <richard.guenther@gmail.com> wrote:
>>> Well, another effect of reducing the size of BITMAP_WORD is that
>>> operations are not performed in a mode optimally using CPU regs
>>> (did you check code generation differences on a 64bit host?).
>>
>> I did, on x86_64 and on powerpc64. The effect is not dramatic, most of
>> these machines can perform 32 bits operations just fine (I think the
>> only exception would be alpha, maybe?).
>
> I wonder how bad code gets when we unconditionally use GCCs generic
> vector support to do
>
> Index: gcc/bitmap.c
> ===================================================================
> --- gcc/bitmap.c        (revision 190469)
> +++ gcc/bitmap.c        (working copy)
> @@ -1446,6 +1446,17 @@ bitmap_compl_and_into (bitmap a, const_b
>           unsigned ix;
>           BITMAP_WORD ior = 0;
>
> +#if BITMAP_ELEMENT_WORDS == 5
> +         typedef v4si unsigned int __attribute__((vector_size((16))));
> +         v4si cleared4 = *(v4si *)&a_elt->bits[0] & *(v4si *)&b_elt->bits[0];
> +         BITMAP_WORD cleared5 = a_elt->bits[4] & b_elt->bits[4];
> +         v4si r4 = *(v4si *)&b_elt->bits[0] ^ cleared4;
> +         BITMAP_WORD r5 = b_elt->bits[4] ^ cleared5;
> +         *(v4si *)&a_elt->bits[0] = r4;
> +         a_elt->bits[4] = r5;
> +         ior4 |= r4;
> +         ior5 |= r5;
> +#else
>           for (ix = 0; ix < BITMAP_ELEMENT_WORDS; ix++)
>
> of course with a proper #if GCC_VERSION.  The theory is we should be
> able to lower it to the same scalar code or even v2si operations where
> only those are available ...
>
> Just an idea and eventually an opportunity to improve generic vector
> lowering if the above really does not work out.

Or figure out if or why not the vectorizer does catch this (of course we do
not enable that with -O2 which we eventually should in a very conservative
mode).

Richard.

> Richard.
>
>> Ciao!
>> Steven
Jakub Jelinek - Aug. 17, 2012, 12:21 p.m.
On Fri, Aug 17, 2012 at 02:16:12PM +0200, Richard Guenther wrote:
> Or figure out if or why not the vectorizer does catch this (of course we do
> not enable that with -O2 which we eventually should in a very conservative
> mode).

It might be helpful if we for the BITMAP_ELEMENT_WORDS == 5 case reordered
indx field after bits, so that bits array is 64-bit aligned.

	Jakub
Steven Bosscher - Aug. 17, 2012, 1:06 p.m.
On Fri, Aug 17, 2012 at 2:21 PM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Fri, Aug 17, 2012 at 02:16:12PM +0200, Richard Guenther wrote:
>> Or figure out if or why not the vectorizer does catch this (of course we do
>> not enable that with -O2 which we eventually should in a very conservative
>> mode).
>
> It might be helpful if we for the BITMAP_ELEMENT_WORDS == 5 case reordered
> indx field after bits, so that bits array is 64-bit aligned.

That's an excellent suggestion! With that change on top of my 5-int
bitmap patch, I actually get a decent speed-up for this test case (5%,
mostly due to less slow DF).

I'm trying this idea with an otherwise unpatched compiler now.

Ciao!
Steven

Patch

Index: gcc/bitmap.c
===================================================================
--- gcc/bitmap.c        (revision 190469)
+++ gcc/bitmap.c        (working copy)
@@ -1446,6 +1446,17 @@  bitmap_compl_and_into (bitmap a, const_b
          unsigned ix;
          BITMAP_WORD ior = 0;

+#if BITMAP_ELEMENT_WORDS == 5
+         typedef v4si unsigned int __attribute__((vector_size((16))));
+         v4si cleared4 = *(v4si *)&a_elt->bits[0] & *(v4si *)&b_elt->bits[0];
+         BITMAP_WORD cleared5 = a_elt->bits[4] & b_elt->bits[4];
+         v4si r4 = *(v4si *)&b_elt->bits[0] ^ cleared4;
+         BITMAP_WORD r5 = b_elt->bits[4] ^ cleared5;
+         *(v4si *)&a_elt->bits[0] = r4;
+         a_elt->bits[4] = r5;
+         ior4 |= r4;
+         ior5 |= r5;
+#else
          for (ix = 0; ix < BITMAP_ELEMENT_WORDS; ix++)

of course with a proper #if GCC_VERSION.  The theory is we should be