diff mbox series

[libatomic] Add nvptx support

Message ID 20200908065104.GA21827@delia
State New
Headers show
Series [libatomic] Add nvptx support | expand

Commit Message

Tom de Vries Sept. 8, 2020, 6:51 a.m. UTC
Hi,

Add nvptx support to libatomic.

Given that atomic_test_and_set is not implemented for nvptx (PR96964), the
compiler translates __atomic_test_and_set falling back onto the "Failing all
else, assume a single threaded environment and simply perform the operation"
case in expand_atomic_test_and_set, so it doesn't map onto an actual atomic
operation.

Still, that counts as supported for the configure test of libatomic, so we
end up with HAVE_ATOMIC_TAS_1/2/4/8/16 == 1, and the corresponding
__atomic_test_and_set_1/2/4/8/16 in libatomic all using that non-atomic
implementation.

Fix this by adding an atomic_test_and_set expansion for nvptx, that uses
libatomics __atomic_test_and_set_1.

This again makes the configure tests for HAVE_ATOMIC_TAS_1/2/4/8/16 fail, so
instead we use this case in tas_n.c:
...
/* If this type is smaller than word-sized, fall back to a word-sized
   compare-and-swap loop.  */
bool
SIZE(libat_test_and_set) (UTYPE *mptr, int smodel)
...
which for __atomic_test_and_set_8 uses INVERT_MASK_8.

Add INVERT_MASK_8 in libatomic_i.h, as well as MASK_8.

Tested libatomic testsuite on nvptx.

Non-target bits (sync-builtins.def, libatomic_i.h) OK for trunk?

Any other comments?

Thanks,
- Tom

[libatomic] Add nvptx support

gcc/ChangeLog:

	PR target/96964
	* config/nvptx/nvptx.md (define_expand "atomic_test_and_set"): New
	expansion.
	* sync-builtins.def (BUILT_IN_ATOMIC_TEST_AND_SET_1): New builtin.

libatomic/ChangeLog:

	PR target/96898
	* configure.tgt: Add nvptx.
	* libatomic_i.h (MASK_8, INVERT_MASK_8): New macro definition.
	* config/nvptx/host-config.h: New file.
	* config/nvptx/lock.c: New file.

---
 gcc/config/nvptx/nvptx.md            | 16 +++++++++++
 gcc/sync-builtins.def                |  2 ++
 libatomic/config/nvptx/host-config.h | 56 ++++++++++++++++++++++++++++++++++++
 libatomic/config/nvptx/lock.c        | 56 ++++++++++++++++++++++++++++++++++++
 libatomic/configure.tgt              |  3 ++
 libatomic/libatomic_i.h              |  2 ++
 6 files changed, 135 insertions(+)

Comments

Tobias Burnus Sept. 8, 2020, 3:05 p.m. UTC | #1
On 9/8/20 8:51 AM, Tom de Vries wrote:

> Add nvptx support to libatomic.

I tried it on powerpc64le-none-linux-gnu and that solves the
__sync_val_compare_and_swap_16 issue, I reported at
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553070.html

However, when trying Jakub's example (see below; syntax fixed
version), https://gcc.gnu.org/pipermail/gcc-patches/2020-September/553142.html
it (still) fails with:

atomic.c: In function 'main._omp_fn.0':
atomic.c:6:11: internal compiler error: in write_fn_proto, at config/nvptx/nvptx.c:913
     6 |   #pragma omp target

Tobias

PS: The 'atomic.c' testcase:

__uint128_t v;
#pragma omp declare target (v)
int
main ()
{
   #pragma omp target
   {
     __atomic_add_fetch (&v, 1, __ATOMIC_RELAXED);
     __atomic_fetch_add (&v, 1, __ATOMIC_RELAXED);
     __uint128_t exp = 2;
     __atomic_compare_exchange_n (&v, &exp, 7, 0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE);
   }
}

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
Tobias Burnus Sept. 9, 2020, 12:36 p.m. UTC | #2
Hi Tom,

On 9/8/20 5:05 PM, Tobias Burnus wrote:

> On 9/8/20 8:51 AM, Tom de Vries wrote:
>>     PR target/96964
>>     * config/nvptx/nvptx.md (define_expand "atomic_test_and_set"): New
>>     expansion.
>>     * sync-builtins.def (BUILT_IN_ATOMIC_TEST_AND_SET_1): New builtin.

I have your patch applied on a current mainline powerpc64le-none-linux-gnu
+ nvptx offloading build. And I observe the following fails – which seems
to be new and related to your patch (but I have not confirmed it by
reverting your libatomic patch).

Required option for the fail: "-O2 -ftracer",
hence, only the "-O3 ..." testsuite builds fail.
(-ftracer = "Perform tail duplication to enlarge superblock size.")


during RTL pass: mach
asyncwait-1.f90:19: internal compiler error: in nvptx_find_par, at config/nvptx/nvptx.c:3293
0x10bf9f13 nvptx_find_par
         gcc/config/nvptx/nvptx.c:3293
0x10bf9b97 nvptx_find_par
         gcc/config/nvptx/nvptx.c:3320
0x10bf9b97 nvptx_find_par
         gcc/config/nvptx/nvptx.c:3320
...


The ICE occurs for the second assert of:
         case CODE_FOR_nvptx_join:
           /* A loop tail.  Finish the current loop and return to
              parent.  */
           {
             unsigned mask = UINTVAL (XVECEXP (PATTERN (end), 0, 0));

             gcc_assert (par->mask == mask);
             gcc_assert (par->join_block == NULL);

gdb shows:
(gdb) p debug_bb(par->join_block )
(note 213 30 31 24 [bb 24] NOTE_INSN_BASIC_BLOCK)
(insn 31 213 204 24 (unspec_volatile:SI [
             (const_int 4 [0x4])
         ] UNSPECV_JOIN) "libgomp/testsuite/libgomp.oacc-fortran/deep-copy-8.f90":24:0 237 {nvptx_join}
      (nil))
(jump_insn 204 31 205 24 (set (pc)
         (label_ref 198)) 121 {jump}
      (nil)
  -> 198)


That affects the testcases:
libgomp.oacc-fortran/asyncwait-1.f90
libgomp.oacc-fortran/asyncwait-2.f90
libgomp.oacc-fortran/asyncwait-3.f90
libgomp.oacc-fortran/atomic_capture-1.f90
libgomp.oacc-fortran/atomic_update-1.f90
libgomp.oacc-fortran/classtypes-1.f95
libgomp.oacc-fortran/collapse-1.f90
libgomp.oacc-fortran/collapse-2.f90
libgomp.oacc-fortran/collapse-3.f90
libgomp.oacc-fortran/collapse-4.f90
libgomp.oacc-fortran/collapse-5.f90
libgomp.oacc-fortran/collapse-6.f90
libgomp.oacc-fortran/collapse-7.f90
libgomp.oacc-fortran/collapse-8.f90
libgomp.oacc-fortran/combined-directives-1.f90
libgomp.oacc-fortran/combined-reduction.f90
libgomp.oacc-fortran/common-block-1.f90
libgomp.oacc-fortran/common-block-2.f90
libgomp.oacc-fortran/common-block-3.f90
libgomp.oacc-fortran/deep-copy-1.f90
libgomp.oacc-fortran/deep-copy-3.f90
libgomp.oacc-fortran/deep-copy-4.f90
libgomp.oacc-fortran/deep-copy-5.f90
libgomp.oacc-fortran/deep-copy-6-no_finalize.F90
libgomp.oacc-fortran/deep-copy-6.f90
libgomp.oacc-fortran/deep-copy-7.f90
libgomp.oacc-fortran/deep-copy-8.f90
libgomp.oacc-fortran/derived-type-1.f90
libgomp.oacc-fortran/host_data-2.f90
libgomp.oacc-fortran/host_data-3.f
libgomp.oacc-fortran/host_data-4.f90
libgomp.oacc-fortran/implicit-firstprivate-ref.f90
libgomp.oacc-fortran/lib-14.f90
libgomp.oacc-fortran/map-1.f90
libgomp.oacc-fortran/nested-function-1.f90
libgomp.oacc-fortran/nested-function-2.f90
libgomp.oacc-fortran/nested-function-3.f90
libgomp.oacc-fortran/no_create-3.F90
libgomp.oacc-fortran/optional-data-copyin.f90
libgomp.oacc-fortran/optional-data-copyout.f90
libgomp.oacc-fortran/optional-data-enter-exit.f90
libgomp.oacc-fortran/optional-declare.f90
libgomp.oacc-fortran/optional-firstprivate.f90
libgomp.oacc-fortran/optional-reduction.f90
libgomp.oacc-fortran/optional-update-device.f90
libgomp.oacc-fortran/optional-update-host.f90
libgomp.oacc-fortran/parallel-dims.f90
libgomp.oacc-fortran/parallel-loop-1.f90
libgomp.oacc-fortran/pr81352.f90
libgomp.oacc-fortran/pr84028.f90
libgomp.oacc-fortran/reduction-1.f90
libgomp.oacc-fortran/reduction-2.f90
libgomp.oacc-fortran/reduction-3.f90
libgomp.oacc-fortran/reduction-4.f90
libgomp.oacc-fortran/reduction-5.f90
libgomp.oacc-fortran/reduction-6.f90
libgomp.oacc-fortran/reduction-7.f90
libgomp.oacc-fortran/reduction-8.f90
libgomp.oacc-fortran/routine-1.f90
libgomp.oacc-fortran/routine-2.f90
libgomp.oacc-fortran/routine-3.f90
libgomp.oacc-fortran/routine-4.f90
libgomp.oacc-fortran/routine-7.f90
libgomp.oacc-fortran/routine-9.f90
libgomp.oacc-fortran/subarrays-1.f90
libgomp.oacc-fortran/subarrays-2.f90
libgomp.oacc-fortran/update-2.f90

Tobias

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
Tom de Vries Sept. 9, 2020, 1:15 p.m. UTC | #3
On 9/9/20 2:36 PM, Tobias Burnus wrote:
> Hi Tom,
> 
> On 9/8/20 5:05 PM, Tobias Burnus wrote:
> 
>> On 9/8/20 8:51 AM, Tom de Vries wrote:
>>>     PR target/96964
>>>     * config/nvptx/nvptx.md (define_expand "atomic_test_and_set"): New
>>>     expansion.
>>>     * sync-builtins.def (BUILT_IN_ATOMIC_TEST_AND_SET_1): New builtin.
> 
> I have your patch applied on a current mainline powerpc64le-none-linux-gnu
> + nvptx offloading build.

Thanks for trying this out.

> And I observe the following fails – which seems
> to be new and related to your patch (but I have not confirmed it by
> reverting your libatomic patch).
> 

Could you confirm that?

Meanwhile, I'll try to reproduce on x86_64.

> Required option for the fail: "-O2 -ftracer",
> hence, only the "-O3 ..." testsuite builds fail.
> (-ftracer = "Perform tail duplication to enlarge superblock size.")
> 
> 
> during RTL pass: mach
> asyncwait-1.f90:19: internal compiler error: in nvptx_find_par, at
> config/nvptx/nvptx.c:3293
> 0x10bf9f13 nvptx_find_par
>         gcc/config/nvptx/nvptx.c:3293
> 0x10bf9b97 nvptx_find_par
>         gcc/config/nvptx/nvptx.c:3320
> 0x10bf9b97 nvptx_find_par
>         gcc/config/nvptx/nvptx.c:3320
> ...
> 
> 
> The ICE occurs for the second assert of:
>         case CODE_FOR_nvptx_join:
>           /* A loop tail.  Finish the current loop and return to
>              parent.  */
>           {
>             unsigned mask = UINTVAL (XVECEXP (PATTERN (end), 0, 0));
> 
>             gcc_assert (par->mask == mask);
>             gcc_assert (par->join_block == NULL);
> 
> gdb shows:
> (gdb) p debug_bb(par->join_block )
> (note 213 30 31 24 [bb 24] NOTE_INSN_BASIC_BLOCK)
> (insn 31 213 204 24 (unspec_volatile:SI [
>             (const_int 4 [0x4])
>         ] UNSPECV_JOIN)
> "libgomp/testsuite/libgomp.oacc-fortran/deep-copy-8.f90":24:0 237
> {nvptx_join}
>      (nil))
> (jump_insn 204 31 205 24 (set (pc)
>         (label_ref 198)) 121 {jump}
>      (nil)
>  -> 198)
> 

Yep, code duplication works against the matching of fork/join, it's not
the first time we see this.

Usually the fix is to make an optimization pass conservative with
respect to these fork/join regions, but AFAICT, ftracer already has such
code in ignore_bb_p that tests gimple_call_internal_unique_p.

So, perhaps the ftracer pass is the trigger, but not the pass that does
the problematic transformation? Just a guess at this point.

Thanks,
- Tom

> 
> That affects the testcases:
> libgomp.oacc-fortran/asyncwait-1.f90
> libgomp.oacc-fortran/asyncwait-2.f90
> libgomp.oacc-fortran/asyncwait-3.f90
> libgomp.oacc-fortran/atomic_capture-1.f90
> libgomp.oacc-fortran/atomic_update-1.f90
> libgomp.oacc-fortran/classtypes-1.f95
> libgomp.oacc-fortran/collapse-1.f90
> libgomp.oacc-fortran/collapse-2.f90
> libgomp.oacc-fortran/collapse-3.f90
> libgomp.oacc-fortran/collapse-4.f90
> libgomp.oacc-fortran/collapse-5.f90
> libgomp.oacc-fortran/collapse-6.f90
> libgomp.oacc-fortran/collapse-7.f90
> libgomp.oacc-fortran/collapse-8.f90
> libgomp.oacc-fortran/combined-directives-1.f90
> libgomp.oacc-fortran/combined-reduction.f90
> libgomp.oacc-fortran/common-block-1.f90
> libgomp.oacc-fortran/common-block-2.f90
> libgomp.oacc-fortran/common-block-3.f90
> libgomp.oacc-fortran/deep-copy-1.f90
> libgomp.oacc-fortran/deep-copy-3.f90
> libgomp.oacc-fortran/deep-copy-4.f90
> libgomp.oacc-fortran/deep-copy-5.f90
> libgomp.oacc-fortran/deep-copy-6-no_finalize.F90
> libgomp.oacc-fortran/deep-copy-6.f90
> libgomp.oacc-fortran/deep-copy-7.f90
> libgomp.oacc-fortran/deep-copy-8.f90
> libgomp.oacc-fortran/derived-type-1.f90
> libgomp.oacc-fortran/host_data-2.f90
> libgomp.oacc-fortran/host_data-3.f
> libgomp.oacc-fortran/host_data-4.f90
> libgomp.oacc-fortran/implicit-firstprivate-ref.f90
> libgomp.oacc-fortran/lib-14.f90
> libgomp.oacc-fortran/map-1.f90
> libgomp.oacc-fortran/nested-function-1.f90
> libgomp.oacc-fortran/nested-function-2.f90
> libgomp.oacc-fortran/nested-function-3.f90
> libgomp.oacc-fortran/no_create-3.F90
> libgomp.oacc-fortran/optional-data-copyin.f90
> libgomp.oacc-fortran/optional-data-copyout.f90
> libgomp.oacc-fortran/optional-data-enter-exit.f90
> libgomp.oacc-fortran/optional-declare.f90
> libgomp.oacc-fortran/optional-firstprivate.f90
> libgomp.oacc-fortran/optional-reduction.f90
> libgomp.oacc-fortran/optional-update-device.f90
> libgomp.oacc-fortran/optional-update-host.f90
> libgomp.oacc-fortran/parallel-dims.f90
> libgomp.oacc-fortran/parallel-loop-1.f90
> libgomp.oacc-fortran/pr81352.f90
> libgomp.oacc-fortran/pr84028.f90
> libgomp.oacc-fortran/reduction-1.f90
> libgomp.oacc-fortran/reduction-2.f90
> libgomp.oacc-fortran/reduction-3.f90
> libgomp.oacc-fortran/reduction-4.f90
> libgomp.oacc-fortran/reduction-5.f90
> libgomp.oacc-fortran/reduction-6.f90
> libgomp.oacc-fortran/reduction-7.f90
> libgomp.oacc-fortran/reduction-8.f90
> libgomp.oacc-fortran/routine-1.f90
> libgomp.oacc-fortran/routine-2.f90
> libgomp.oacc-fortran/routine-3.f90
> libgomp.oacc-fortran/routine-4.f90
> libgomp.oacc-fortran/routine-7.f90
> libgomp.oacc-fortran/routine-9.f90
> libgomp.oacc-fortran/subarrays-1.f90
> libgomp.oacc-fortran/subarrays-2.f90
> libgomp.oacc-fortran/update-2.f90
> 
> Tobias
> 
> -----------------
> Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München /
> Germany
> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung,
> Alexander Walter
Tom de Vries Sept. 9, 2020, 2:14 p.m. UTC | #4
On 9/9/20 3:15 PM, Tom de Vries wrote:
> On 9/9/20 2:36 PM, Tobias Burnus wrote:
>> Hi Tom,
>>
>> On 9/8/20 5:05 PM, Tobias Burnus wrote:
>>
>>> On 9/8/20 8:51 AM, Tom de Vries wrote:
>>>>     PR target/96964
>>>>     * config/nvptx/nvptx.md (define_expand "atomic_test_and_set"): New
>>>>     expansion.
>>>>     * sync-builtins.def (BUILT_IN_ATOMIC_TEST_AND_SET_1): New builtin.
>>
>> I have your patch applied on a current mainline powerpc64le-none-linux-gnu
>> + nvptx offloading build.
> 
> Thanks for trying this out.
> 
>> And I observe the following fails – which seems
>> to be new and related to your patch (but I have not confirmed it by
>> reverting your libatomic patch).
>>
> 
> Could you confirm that?
> 
> Meanwhile, I'll try to reproduce on x86_64.
> 
>> Required option for the fail: "-O2 -ftracer",
>> hence, only the "-O3 ..." testsuite builds fail.
>> (-ftracer = "Perform tail duplication to enlarge superblock size.")
>>
>>
>> during RTL pass: mach
>> asyncwait-1.f90:19: internal compiler error: in nvptx_find_par, at
>> config/nvptx/nvptx.c:3293
>> 0x10bf9f13 nvptx_find_par
>>         gcc/config/nvptx/nvptx.c:3293
>> 0x10bf9b97 nvptx_find_par
>>         gcc/config/nvptx/nvptx.c:3320
>> 0x10bf9b97 nvptx_find_par
>>         gcc/config/nvptx/nvptx.c:3320
>> ...
>>
>>
>> The ICE occurs for the second assert of:
>>         case CODE_FOR_nvptx_join:
>>           /* A loop tail.  Finish the current loop and return to
>>              parent.  */
>>           {
>>             unsigned mask = UINTVAL (XVECEXP (PATTERN (end), 0, 0));
>>
>>             gcc_assert (par->mask == mask);
>>             gcc_assert (par->join_block == NULL);
>>
>> gdb shows:
>> (gdb) p debug_bb(par->join_block )
>> (note 213 30 31 24 [bb 24] NOTE_INSN_BASIC_BLOCK)
>> (insn 31 213 204 24 (unspec_volatile:SI [
>>             (const_int 4 [0x4])
>>         ] UNSPECV_JOIN)
>> "libgomp/testsuite/libgomp.oacc-fortran/deep-copy-8.f90":24:0 237
>> {nvptx_join}
>>      (nil))
>> (jump_insn 204 31 205 24 (set (pc)
>>         (label_ref 198)) 121 {jump}
>>      (nil)
>>  -> 198)
>>
> 
> Yep, code duplication works against the matching of fork/join, it's not
> the first time we see this.
> 
> Usually the fix is to make an optimization pass conservative with
> respect to these fork/join regions, but AFAICT, ftracer already has such
> code in ignore_bb_p that tests gimple_call_internal_unique_p.
> 
> So, perhaps the ftracer pass is the trigger, but not the pass that does
> the problematic transformation? Just a guess at this point.
> 

I can reproduce it, and it's indeed the ftracer pass that does the
duplication.  So, the question is why doesn't ignore_bb_p work.

Thanks,
- Tom

> 
>>
>> That affects the testcases:
>> libgomp.oacc-fortran/asyncwait-1.f90
>> libgomp.oacc-fortran/asyncwait-2.f90
>> libgomp.oacc-fortran/asyncwait-3.f90
>> libgomp.oacc-fortran/atomic_capture-1.f90
>> libgomp.oacc-fortran/atomic_update-1.f90
>> libgomp.oacc-fortran/classtypes-1.f95
>> libgomp.oacc-fortran/collapse-1.f90
>> libgomp.oacc-fortran/collapse-2.f90
>> libgomp.oacc-fortran/collapse-3.f90
>> libgomp.oacc-fortran/collapse-4.f90
>> libgomp.oacc-fortran/collapse-5.f90
>> libgomp.oacc-fortran/collapse-6.f90
>> libgomp.oacc-fortran/collapse-7.f90
>> libgomp.oacc-fortran/collapse-8.f90
>> libgomp.oacc-fortran/combined-directives-1.f90
>> libgomp.oacc-fortran/combined-reduction.f90
>> libgomp.oacc-fortran/common-block-1.f90
>> libgomp.oacc-fortran/common-block-2.f90
>> libgomp.oacc-fortran/common-block-3.f90
>> libgomp.oacc-fortran/deep-copy-1.f90
>> libgomp.oacc-fortran/deep-copy-3.f90
>> libgomp.oacc-fortran/deep-copy-4.f90
>> libgomp.oacc-fortran/deep-copy-5.f90
>> libgomp.oacc-fortran/deep-copy-6-no_finalize.F90
>> libgomp.oacc-fortran/deep-copy-6.f90
>> libgomp.oacc-fortran/deep-copy-7.f90
>> libgomp.oacc-fortran/deep-copy-8.f90
>> libgomp.oacc-fortran/derived-type-1.f90
>> libgomp.oacc-fortran/host_data-2.f90
>> libgomp.oacc-fortran/host_data-3.f
>> libgomp.oacc-fortran/host_data-4.f90
>> libgomp.oacc-fortran/implicit-firstprivate-ref.f90
>> libgomp.oacc-fortran/lib-14.f90
>> libgomp.oacc-fortran/map-1.f90
>> libgomp.oacc-fortran/nested-function-1.f90
>> libgomp.oacc-fortran/nested-function-2.f90
>> libgomp.oacc-fortran/nested-function-3.f90
>> libgomp.oacc-fortran/no_create-3.F90
>> libgomp.oacc-fortran/optional-data-copyin.f90
>> libgomp.oacc-fortran/optional-data-copyout.f90
>> libgomp.oacc-fortran/optional-data-enter-exit.f90
>> libgomp.oacc-fortran/optional-declare.f90
>> libgomp.oacc-fortran/optional-firstprivate.f90
>> libgomp.oacc-fortran/optional-reduction.f90
>> libgomp.oacc-fortran/optional-update-device.f90
>> libgomp.oacc-fortran/optional-update-host.f90
>> libgomp.oacc-fortran/parallel-dims.f90
>> libgomp.oacc-fortran/parallel-loop-1.f90
>> libgomp.oacc-fortran/pr81352.f90
>> libgomp.oacc-fortran/pr84028.f90
>> libgomp.oacc-fortran/reduction-1.f90
>> libgomp.oacc-fortran/reduction-2.f90
>> libgomp.oacc-fortran/reduction-3.f90
>> libgomp.oacc-fortran/reduction-4.f90
>> libgomp.oacc-fortran/reduction-5.f90
>> libgomp.oacc-fortran/reduction-6.f90
>> libgomp.oacc-fortran/reduction-7.f90
>> libgomp.oacc-fortran/reduction-8.f90
>> libgomp.oacc-fortran/routine-1.f90
>> libgomp.oacc-fortran/routine-2.f90
>> libgomp.oacc-fortran/routine-3.f90
>> libgomp.oacc-fortran/routine-4.f90
>> libgomp.oacc-fortran/routine-7.f90
>> libgomp.oacc-fortran/routine-9.f90
>> libgomp.oacc-fortran/subarrays-1.f90
>> libgomp.oacc-fortran/subarrays-2.f90
>> libgomp.oacc-fortran/update-2.f90
>>
>> Tobias
>>
>> -----------------
>> Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München /
>> Germany
>> Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung,
>> Alexander Walter
Tom de Vries Sept. 9, 2020, 2:49 p.m. UTC | #5
On 9/9/20 4:14 PM, Tom de Vries wrote:
> On 9/9/20 3:15 PM, Tom de Vries wrote:
>> On 9/9/20 2:36 PM, Tobias Burnus wrote:
>>> Hi Tom,
>>>
>>> On 9/8/20 5:05 PM, Tobias Burnus wrote:
>>>
>>>> On 9/8/20 8:51 AM, Tom de Vries wrote:
>>>>>     PR target/96964
>>>>>     * config/nvptx/nvptx.md (define_expand "atomic_test_and_set"): New
>>>>>     expansion.
>>>>>     * sync-builtins.def (BUILT_IN_ATOMIC_TEST_AND_SET_1): New builtin.
>>>
>>> I have your patch applied on a current mainline powerpc64le-none-linux-gnu
>>> + nvptx offloading build.
>>
>> Thanks for trying this out.
>>
>>> And I observe the following fails – which seems
>>> to be new and related to your patch (but I have not confirmed it by
>>> reverting your libatomic patch).
>>>
>>
>> Could you confirm that?
>>
>> Meanwhile, I'll try to reproduce on x86_64.
>>
>>> Required option for the fail: "-O2 -ftracer",
>>> hence, only the "-O3 ..." testsuite builds fail.
>>> (-ftracer = "Perform tail duplication to enlarge superblock size.")
>>>
>>>
>>> during RTL pass: mach
>>> asyncwait-1.f90:19: internal compiler error: in nvptx_find_par, at
>>> config/nvptx/nvptx.c:3293
>>> 0x10bf9f13 nvptx_find_par
>>>         gcc/config/nvptx/nvptx.c:3293
>>> 0x10bf9b97 nvptx_find_par
>>>         gcc/config/nvptx/nvptx.c:3320
>>> 0x10bf9b97 nvptx_find_par
>>>         gcc/config/nvptx/nvptx.c:3320
>>> ...
>>>
>>>
>>> The ICE occurs for the second assert of:
>>>         case CODE_FOR_nvptx_join:
>>>           /* A loop tail.  Finish the current loop and return to
>>>              parent.  */
>>>           {
>>>             unsigned mask = UINTVAL (XVECEXP (PATTERN (end), 0, 0));
>>>
>>>             gcc_assert (par->mask == mask);
>>>             gcc_assert (par->join_block == NULL);
>>>
>>> gdb shows:
>>> (gdb) p debug_bb(par->join_block )
>>> (note 213 30 31 24 [bb 24] NOTE_INSN_BASIC_BLOCK)
>>> (insn 31 213 204 24 (unspec_volatile:SI [
>>>             (const_int 4 [0x4])
>>>         ] UNSPECV_JOIN)
>>> "libgomp/testsuite/libgomp.oacc-fortran/deep-copy-8.f90":24:0 237
>>> {nvptx_join}
>>>      (nil))
>>> (jump_insn 204 31 205 24 (set (pc)
>>>         (label_ref 198)) 121 {jump}
>>>      (nil)
>>>  -> 198)
>>>
>>
>> Yep, code duplication works against the matching of fork/join, it's not
>> the first time we see this.
>>
>> Usually the fix is to make an optimization pass conservative with
>> respect to these fork/join regions, but AFAICT, ftracer already has such
>> code in ignore_bb_p that tests gimple_call_internal_unique_p.
>>
>> So, perhaps the ftracer pass is the trigger, but not the pass that does
>> the problematic transformation? Just a guess at this point.
>>
> 
> I can reproduce it, and it's indeed the ftracer pass that does the
> duplication.  So, the question is why doesn't ignore_bb_p work.

Filed PR https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97000 for this.

Thanks,
- Tom
Tobias Burnus Sept. 9, 2020, 3:10 p.m. UTC | #6
Hi Tom,

On 9/9/20 3:15 PM, Tom de Vries wrote:
>> And I observe the following fails – which seems
>> to be new and related to your patch (but I have not confirmed it by
>> reverting your libatomic patch).
>>
> Could you confirm that?

It is an indenpend issue; it is newish and I can reproduce
it on x86-64-gnu-linux (not with -O2 but with -O3 -ftracer).

Tobias

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
Tobias Burnus Sept. 10, 2020, 8:01 a.m. UTC | #7
Hi all, hello Tom,

On 9/8/20 8:51 AM, Tom de Vries wrote:
> Add nvptx support to libatomic.

I have tried tried this patch together with the latest patch for/of PR97000
for tree-cfgcleanup.c on tree-cfgcleanup.c:

Now, I only see the expected failures and the compare_and_swap issue in
the testsuite is gone (libgomp.c-c++-common/reduction-17.c)

:-)

(Jakub's atomic test fails with a link error:
"__atomic_compare_exchange_16" (the build I have has
--disable-libatomic, hence, I cannot check this with
-foffload=-latomic right now; something to check later.)

Thanks,

Tobias

-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter
Tom de Vries Sept. 11, 2020, 10:15 a.m. UTC | #8
On 9/8/20 8:51 AM, Tom de Vries wrote:
> Hi,
> 
> Add nvptx support to libatomic.
> 
> Given that atomic_test_and_set is not implemented for nvptx (PR96964), the
> compiler translates __atomic_test_and_set falling back onto the "Failing all
> else, assume a single threaded environment and simply perform the operation"
> case in expand_atomic_test_and_set, so it doesn't map onto an actual atomic
> operation.
> 
> Still, that counts as supported for the configure test of libatomic, so we
> end up with HAVE_ATOMIC_TAS_1/2/4/8/16 == 1, and the corresponding
> __atomic_test_and_set_1/2/4/8/16 in libatomic all using that non-atomic
> implementation.
> 
> Fix this by adding an atomic_test_and_set expansion for nvptx, that uses
> libatomics __atomic_test_and_set_1.
> 
> This again makes the configure tests for HAVE_ATOMIC_TAS_1/2/4/8/16 fail, so
> instead we use this case in tas_n.c:
> ...
> /* If this type is smaller than word-sized, fall back to a word-sized
>    compare-and-swap loop.  */
> bool
> SIZE(libat_test_and_set) (UTYPE *mptr, int smodel)
> ...
> which for __atomic_test_and_set_8 uses INVERT_MASK_8.
> 
> Add INVERT_MASK_8 in libatomic_i.h, as well as MASK_8.
> 
> Tested libatomic testsuite on nvptx.
> 
> Non-target bits (sync-builtins.def, libatomic_i.h) OK for trunk?
> 
> Any other comments?
> 
> Thanks,
> - Tom
> 
> [libatomic] Add nvptx support
> 
> gcc/ChangeLog:
> 
> 	PR target/96964
> 	* config/nvptx/nvptx.md (define_expand "atomic_test_and_set"): New
> 	expansion.
> 	* sync-builtins.def (BUILT_IN_ATOMIC_TEST_AND_SET_1): New builtin.
> 

I realized after reading the header comment of write_fn_proto_from_insn
in nvptx.c that I could drop the sync-builtins.def bit.  So, committed
without the sync-builtins.def change.

Thanks,
- Tom

> libatomic/ChangeLog:
> 
> 	PR target/96898
> 	* configure.tgt: Add nvptx.
> 	* libatomic_i.h (MASK_8, INVERT_MASK_8): New macro definition.
> 	* config/nvptx/host-config.h: New file.
> 	* config/nvptx/lock.c: New file.
> 
> ---
>  gcc/config/nvptx/nvptx.md            | 16 +++++++++++
>  gcc/sync-builtins.def                |  2 ++
>  libatomic/config/nvptx/host-config.h | 56 ++++++++++++++++++++++++++++++++++++
>  libatomic/config/nvptx/lock.c        | 56 ++++++++++++++++++++++++++++++++++++
>  libatomic/configure.tgt              |  3 ++
>  libatomic/libatomic_i.h              |  2 ++
>  6 files changed, 135 insertions(+)
> 
> diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
> index 4168190fa42..6178e6a0f77 100644
> --- a/gcc/config/nvptx/nvptx.md
> +++ b/gcc/config/nvptx/nvptx.md
> @@ -1667,6 +1667,22 @@
>    "%.\\tatom%A1.b%T0.<logic>\\t%0, %1, %2;"
>    [(set_attr "atomic" "true")])
>  
> +(define_expand "atomic_test_and_set"
> +  [(match_operand:SI 0 "nvptx_register_operand")	;; bool success output
> +   (match_operand:QI 1 "memory_operand")		;; memory
> +   (match_operand:SI 2 "const_int_operand")]		;; model
> +  ""
> +{
> +  rtx libfunc;
> +  rtx addr;
> +  libfunc = init_one_libfunc ("__atomic_test_and_set_1");
> +  addr = convert_memory_address (ptr_mode, XEXP (operands[1], 0));
> +  emit_library_call_value (libfunc, operands[0], LCT_NORMAL, SImode,
> +			  addr, ptr_mode,
> +			  operands[2], SImode);
> +  DONE;
> +})
> +
>  (define_insn "nvptx_barsync"
>    [(unspec_volatile [(match_operand:SI 0 "nvptx_nonmemory_operand" "Ri")
>  		     (match_operand:SI 1 "const_int_operand")]
> diff --git a/gcc/sync-builtins.def b/gcc/sync-builtins.def
> index 156a13ce0f8..b802257bd1a 100644
> --- a/gcc/sync-builtins.def
> +++ b/gcc/sync-builtins.def
> @@ -261,6 +261,8 @@ DEF_SYNC_BUILTIN (BUILT_IN_SYNC_SYNCHRONIZE, "__sync_synchronize",
>  
>  DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_TEST_AND_SET, "__atomic_test_and_set",
>  		  BT_FN_BOOL_VPTR_INT, ATTR_NOTHROWCALL_LEAF_LIST)
> +DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_TEST_AND_SET_1, "__atomic_test_and_set_1",
> +		  BT_FN_BOOL_VPTR_INT, ATTR_NOTHROWCALL_LEAF_LIST)
>  
>  DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_CLEAR, "__atomic_clear", BT_FN_VOID_VPTR_INT,
>  		  ATTR_NOTHROWCALL_LEAF_LIST)
> diff --git a/libatomic/config/nvptx/host-config.h b/libatomic/config/nvptx/host-config.h
> new file mode 100644
> index 00000000000..eb9de81f388
> --- /dev/null
> +++ b/libatomic/config/nvptx/host-config.h
> @@ -0,0 +1,56 @@
> +/* Copyright (C) 2020 Free Software Foundation, Inc.
> +
> +   This file is part of the GNU Atomic Library (libatomic).
> +
> +   Libatomic is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   Libatomic is distributed in the hope that it will be useful, but WITHOUT ANY
> +   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
> +   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> +   more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +/* Copied from libatomic/config/posix/host-config.h.  */
> +
> +/* Included after all more target-specific host-config.h.  */
> +
> +
> +#ifndef protect_start_end
> +# ifdef HAVE_ATTRIBUTE_VISIBILITY
> +#  pragma GCC visibility push(hidden)
> +# endif
> +
> +void libat_lock_1 (void *ptr);
> +void libat_unlock_1 (void *ptr);
> +
> +static inline UWORD
> +protect_start (void *ptr)
> +{
> +  libat_lock_1 (ptr);
> +  return 0;
> +}
> +
> +static inline void
> +protect_end (void *ptr, UWORD dummy UNUSED)
> +{
> +  libat_unlock_1 (ptr);
> +}
> +
> +# define protect_start_end 1
> +# ifdef HAVE_ATTRIBUTE_VISIBILITY
> +#  pragma GCC visibility pop
> +# endif
> +#endif /* protect_start_end */
> +
> +#include_next <host-config.h>
> diff --git a/libatomic/config/nvptx/lock.c b/libatomic/config/nvptx/lock.c
> new file mode 100644
> index 00000000000..dea85a3e5bd
> --- /dev/null
> +++ b/libatomic/config/nvptx/lock.c
> @@ -0,0 +1,56 @@
> +/* Copyright (C) 2020 Free Software Foundation, Inc.
> +
> +   This file is part of the GNU Atomic Library (libatomic).
> +
> +   Libatomic is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   Libatomic is distributed in the hope that it will be useful, but WITHOUT ANY
> +   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
> +   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> +   more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
> +/* Functions libat_lock_n/libat_unlock_n based on GOMP_atomic_start/end in
> +   libgomp/atomic.c.  */
> +
> +#include "libatomic_i.h"
> +
> +static int atomic_lock;
> +
> +void
> +libat_lock_n (void *ptr __attribute__((unused)),
> +	      size_t n __attribute__((unused)))
> +{
> +  while (__sync_lock_test_and_set (&atomic_lock, 1))
> +    ;  /* Spin.  */
> +}
> +
> +void
> +libat_unlock_n (void *ptr __attribute__((unused)),
> +		size_t n __attribute__((unused)))
> +{
> +  __sync_lock_release (&atomic_lock);
> +}
> +
> +void
> +libat_lock_1 (void *ptr)
> +{
> +  libat_lock_n (ptr, 1);
> +}
> +
> +void
> +libat_unlock_1 (void *ptr)
> +{
> +  libat_unlock_n (ptr, 1);
> +}
> diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt
> index efb3b1efb68..7834e0a6528 100644
> --- a/libatomic/configure.tgt
> +++ b/libatomic/configure.tgt
> @@ -174,6 +174,9 @@ case "${target}" in
>  	UNSUPPORTED=1
>  	;;
>  
> +  nvptx*-*-*)
> +	;;
> +
>    *)
>  	# Who are you?
>  	UNSUPPORTED=1
> diff --git a/libatomic/libatomic_i.h b/libatomic/libatomic_i.h
> index 081b154e9d7..37de9921024 100644
> --- a/libatomic/libatomic_i.h
> +++ b/libatomic/libatomic_i.h
> @@ -109,9 +109,11 @@ typedef unsigned UWORD __attribute__((mode(word)));
>  #define MASK_1		((UWORD)0xff)
>  #define MASK_2		((UWORD)0xffff)
>  #define MASK_4		((UWORD)0xffffffff)
> +#define MASK_8		((UWORD)0xffffffffffffffff)
>  #define INVERT_MASK_1	((UWORD)WORDS_BIGENDIAN << ((WORDSIZE - 1) * CHAR_BIT))
>  #define INVERT_MASK_2	((UWORD)WORDS_BIGENDIAN << ((WORDSIZE - 2) * CHAR_BIT))
>  #define INVERT_MASK_4	((UWORD)WORDS_BIGENDIAN << ((WORDSIZE - 4) * CHAR_BIT))
> +#define INVERT_MASK_8	((UWORD)WORDS_BIGENDIAN << ((WORDSIZE - 8) * CHAR_BIT))
>  
>  /* Most of the files in this library are compiled multiple times with
>     N defined to be a power of 2 between 1 and 16.  The SIZE macro is
>
diff mbox series

Patch

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index 4168190fa42..6178e6a0f77 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -1667,6 +1667,22 @@ 
   "%.\\tatom%A1.b%T0.<logic>\\t%0, %1, %2;"
   [(set_attr "atomic" "true")])
 
+(define_expand "atomic_test_and_set"
+  [(match_operand:SI 0 "nvptx_register_operand")	;; bool success output
+   (match_operand:QI 1 "memory_operand")		;; memory
+   (match_operand:SI 2 "const_int_operand")]		;; model
+  ""
+{
+  rtx libfunc;
+  rtx addr;
+  libfunc = init_one_libfunc ("__atomic_test_and_set_1");
+  addr = convert_memory_address (ptr_mode, XEXP (operands[1], 0));
+  emit_library_call_value (libfunc, operands[0], LCT_NORMAL, SImode,
+			  addr, ptr_mode,
+			  operands[2], SImode);
+  DONE;
+})
+
 (define_insn "nvptx_barsync"
   [(unspec_volatile [(match_operand:SI 0 "nvptx_nonmemory_operand" "Ri")
 		     (match_operand:SI 1 "const_int_operand")]
diff --git a/gcc/sync-builtins.def b/gcc/sync-builtins.def
index 156a13ce0f8..b802257bd1a 100644
--- a/gcc/sync-builtins.def
+++ b/gcc/sync-builtins.def
@@ -261,6 +261,8 @@  DEF_SYNC_BUILTIN (BUILT_IN_SYNC_SYNCHRONIZE, "__sync_synchronize",
 
 DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_TEST_AND_SET, "__atomic_test_and_set",
 		  BT_FN_BOOL_VPTR_INT, ATTR_NOTHROWCALL_LEAF_LIST)
+DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_TEST_AND_SET_1, "__atomic_test_and_set_1",
+		  BT_FN_BOOL_VPTR_INT, ATTR_NOTHROWCALL_LEAF_LIST)
 
 DEF_SYNC_BUILTIN (BUILT_IN_ATOMIC_CLEAR, "__atomic_clear", BT_FN_VOID_VPTR_INT,
 		  ATTR_NOTHROWCALL_LEAF_LIST)
diff --git a/libatomic/config/nvptx/host-config.h b/libatomic/config/nvptx/host-config.h
new file mode 100644
index 00000000000..eb9de81f388
--- /dev/null
+++ b/libatomic/config/nvptx/host-config.h
@@ -0,0 +1,56 @@ 
+/* Copyright (C) 2020 Free Software Foundation, Inc.
+
+   This file is part of the GNU Atomic Library (libatomic).
+
+   Libatomic is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libatomic is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Copied from libatomic/config/posix/host-config.h.  */
+
+/* Included after all more target-specific host-config.h.  */
+
+
+#ifndef protect_start_end
+# ifdef HAVE_ATTRIBUTE_VISIBILITY
+#  pragma GCC visibility push(hidden)
+# endif
+
+void libat_lock_1 (void *ptr);
+void libat_unlock_1 (void *ptr);
+
+static inline UWORD
+protect_start (void *ptr)
+{
+  libat_lock_1 (ptr);
+  return 0;
+}
+
+static inline void
+protect_end (void *ptr, UWORD dummy UNUSED)
+{
+  libat_unlock_1 (ptr);
+}
+
+# define protect_start_end 1
+# ifdef HAVE_ATTRIBUTE_VISIBILITY
+#  pragma GCC visibility pop
+# endif
+#endif /* protect_start_end */
+
+#include_next <host-config.h>
diff --git a/libatomic/config/nvptx/lock.c b/libatomic/config/nvptx/lock.c
new file mode 100644
index 00000000000..dea85a3e5bd
--- /dev/null
+++ b/libatomic/config/nvptx/lock.c
@@ -0,0 +1,56 @@ 
+/* Copyright (C) 2020 Free Software Foundation, Inc.
+
+   This file is part of the GNU Atomic Library (libatomic).
+
+   Libatomic is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libatomic is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Functions libat_lock_n/libat_unlock_n based on GOMP_atomic_start/end in
+   libgomp/atomic.c.  */
+
+#include "libatomic_i.h"
+
+static int atomic_lock;
+
+void
+libat_lock_n (void *ptr __attribute__((unused)),
+	      size_t n __attribute__((unused)))
+{
+  while (__sync_lock_test_and_set (&atomic_lock, 1))
+    ;  /* Spin.  */
+}
+
+void
+libat_unlock_n (void *ptr __attribute__((unused)),
+		size_t n __attribute__((unused)))
+{
+  __sync_lock_release (&atomic_lock);
+}
+
+void
+libat_lock_1 (void *ptr)
+{
+  libat_lock_n (ptr, 1);
+}
+
+void
+libat_unlock_1 (void *ptr)
+{
+  libat_unlock_n (ptr, 1);
+}
diff --git a/libatomic/configure.tgt b/libatomic/configure.tgt
index efb3b1efb68..7834e0a6528 100644
--- a/libatomic/configure.tgt
+++ b/libatomic/configure.tgt
@@ -174,6 +174,9 @@  case "${target}" in
 	UNSUPPORTED=1
 	;;
 
+  nvptx*-*-*)
+	;;
+
   *)
 	# Who are you?
 	UNSUPPORTED=1
diff --git a/libatomic/libatomic_i.h b/libatomic/libatomic_i.h
index 081b154e9d7..37de9921024 100644
--- a/libatomic/libatomic_i.h
+++ b/libatomic/libatomic_i.h
@@ -109,9 +109,11 @@  typedef unsigned UWORD __attribute__((mode(word)));
 #define MASK_1		((UWORD)0xff)
 #define MASK_2		((UWORD)0xffff)
 #define MASK_4		((UWORD)0xffffffff)
+#define MASK_8		((UWORD)0xffffffffffffffff)
 #define INVERT_MASK_1	((UWORD)WORDS_BIGENDIAN << ((WORDSIZE - 1) * CHAR_BIT))
 #define INVERT_MASK_2	((UWORD)WORDS_BIGENDIAN << ((WORDSIZE - 2) * CHAR_BIT))
 #define INVERT_MASK_4	((UWORD)WORDS_BIGENDIAN << ((WORDSIZE - 4) * CHAR_BIT))
+#define INVERT_MASK_8	((UWORD)WORDS_BIGENDIAN << ((WORDSIZE - 8) * CHAR_BIT))
 
 /* Most of the files in this library are compiled multiple times with
    N defined to be a power of 2 between 1 and 16.  The SIZE macro is