Patchwork [libitm] Port to ARM

login
register
mail settings
Submitter Richard Henderson
Date Nov. 23, 2011, 11:47 p.m.
Message ID <4ECD8605.4080202@redhat.com>
Download mbox | patch
Permalink /patch/127406/
State New
Headers show

Comments

Richard Henderson - Nov. 23, 2011, 11:47 p.m.
To get the ball rolling for other targets, and to let port maintainers see how easy it really is, here's a first cut at a port to ARM.

Only cross-compiled as yet, and qemu-linux-user isn't good enough to emulate.  I'll do another build on the armv7 host once my current bootstrap and test for the atomic optabs is complete.

Please review, especially the local setjmp-alike implementation.


r~
commit 09969bb6f3597ccb9fd176bb7dc10119cac91371
Author: Richard Henderson <rth@redhat.com>
Date:   Wed Nov 23 15:31:33 2011 -0800

    arm-linux: Add libitm support.
Richard Henderson - Nov. 23, 2011, 11:53 p.m.
On 11/23/2011 03:47 PM, Richard Henderson wrote:
> +GTM_longjmp:
> +	ldm	r0, { r4-r11, sp, pc }


Bah.  This should be

        mov     r2, r0          /* Move the jmpbuf out of the way.  */
        mov     r0, r1          /* Move the return value into place.  */
        ldm     r2, { r4-r11, sp, pc }



r~
Joseph S. Myers - Nov. 24, 2011, midnight
On Wed, 23 Nov 2011, Richard Henderson wrote:

> +  __asm volatile ("swi %1"
> +		  : "+r"(sc_0)
> +		  : "i"(SYS_futex), "r"(sc_1), "r"(sc_2), "r"(sc_3)
> +		  : "memory");

That looks wrong.  Passing the syscall number to swi is the old-ABI 
approach; the EABI uses syscall number in r7.  Maybe this works in EABI 
code if the kernel has CONFIG_OABI_COMPAT enabled, but you can't rely on 
CONFIG_OABI_COMPAT being enabled (and decoding the instruction is slower, 
which is why this changed for EABI); you need to pass the number in r7 
(easier from a pure assembly function, to avoid problems with it being 
used as the Thumb frame pointer) for EABI.

(It's probably time to deprecate the old-ABI targets with EABI equivalents 
(arm*-*-linux* not matching arm*-*-linux-*eabi, arm*-*-uclinux* not 
matching arm*-*-uclinux*eabi, arm*-*-elf not matching arm*-*-ecos-elf, 
arm*-*-rtems* not matching arm*-*-rtemseabi*).)
Richard Henderson - Nov. 24, 2011, 2:23 a.m.
On 11/23/2011 04:00 PM, Joseph S. Myers wrote:
> On Wed, 23 Nov 2011, Richard Henderson wrote:
> 
>> +  __asm volatile ("swi %1"
>> +		  : "+r"(sc_0)
>> +		  : "i"(SYS_futex), "r"(sc_1), "r"(sc_2), "r"(sc_3)
>> +		  : "memory");
> 
> That looks wrong.  Passing the syscall number to swi is the old-ABI 
> approach; the EABI uses syscall number in r7. 

Interesting to know.

I took as examples both glibc ports sysdeps.h and CLEAR_INSN_CACHE from linux-gas.h.  Which... I see is out of date wrt libgcc lib1funcs.S clear_cache.

Perhaps I'm better off with the syscall function as libgomp uses?


r~
Joseph S. Myers - Nov. 24, 2011, 2:33 a.m.
On Wed, 23 Nov 2011, Richard Henderson wrote:

> On 11/23/2011 04:00 PM, Joseph S. Myers wrote:
> > On Wed, 23 Nov 2011, Richard Henderson wrote:
> > 
> >> +  __asm volatile ("swi %1"
> >> +		  : "+r"(sc_0)
> >> +		  : "i"(SYS_futex), "r"(sc_1), "r"(sc_2), "r"(sc_3)
> >> +		  : "memory");
> > 
> > That looks wrong.  Passing the syscall number to swi is the old-ABI 
> > approach; the EABI uses syscall number in r7. 
> 
> Interesting to know.
> 
> I took as examples both glibc ports sysdeps.h

Overridden in the eabi/ subdirectory.  Yes, as ARM glibc maintainer it's 
probably time for me to kill the old-ABI support and simplify the 
directory structure....

> and CLEAR_INSN_CACHE from linux-gas.h.  Which... I see is out of date 
> wrt libgcc lib1funcs.S clear_cache.

Again, an old-ABI-only version; I moved the EABI version to lib1funcs.S 
from linux-eabi.h to avoid a problem where building libgcc for Thumb-2 
with -O0 quietly produced broken code because of the conflict with the 
frame pointer <http://gcc.gnu.org/ml/gcc-patches/2009-05/msg01320.html>.  
Dan ended up moving Thumb-2 syscalls out-of-line in glibc for much the 
same reason (unwinding being the problem there) 
<http://sourceware.org/ml/libc-ports/2010-04/msg00001.html>.

> Perhaps I'm better off with the syscall function as libgomp uses?

That's certainly safer and avoids depending on EABI/old-ABI.
Richard Earnshaw - Nov. 24, 2011, 10:19 a.m.
On 24/11/11 02:23, Richard Henderson wrote:
> On 11/23/2011 04:00 PM, Joseph S. Myers wrote:
>> On Wed, 23 Nov 2011, Richard Henderson wrote:
>>
>>> +  __asm volatile ("swi %1"
>>> +		  : "+r"(sc_0)
>>> +		  : "i"(SYS_futex), "r"(sc_1), "r"(sc_2), "r"(sc_3)
>>> +		  : "memory");
>>
>> That looks wrong.  Passing the syscall number to swi is the old-ABI 
>> approach; the EABI uses syscall number in r7. 
> 

Which gets even more interesting in Thumb because r7 is what GCC thinks
is the frame pointer register :-(

> Interesting to know.
> 
> I took as examples both glibc ports sysdeps.h and CLEAR_INSN_CACHE from linux-gas.h.  Which... I see is out of date wrt libgcc lib1funcs.S clear_cache.
> 
> Perhaps I'm better off with the syscall function as libgomp uses?

That's probably the best solution on ARM.

R.
Richard Earnshaw - Nov. 24, 2011, 10:34 a.m.
On 23/11/11 23:47, Richard Henderson wrote:
> To get the ball rolling for other targets, and to let port maintainers see how easy it really is, here's a first cut at a port to ARM.
> 
> Only cross-compiled as yet, and qemu-linux-user isn't good enough to emulate.  I'll do another build on the armv7 host once my current bootstrap and test for the atomic optabs is complete.
> 
> Please review, especially the local setjmp-alike implementation.
> 
> 
> r~
> 
> 
> d-arm-libitm
> 
>
> +	.text
> +	.align	2
> +	.global	_ITM_beginTransaction
> +	.type	_ITM_beginTransaction, %function
> +
> +_ITM_beginTransaction:
> +	push	{ r4-r11, sp, lr }
> +	mov	r1, sp
> +	bl	GTM_begin_transaction


What about systems with floating-point or vector registers?  Ie VFP or
WMMX on XScale?  Do the callee saved registers in those register banks
also have to be saved?

> diff --git a/libitm/config/arm/target.h b/libitm/config/arm/target.h

> +  unsigned long s[8];	/* r4-r12 */

R4-R12 is 9 registers.  But R12 is callee clobbered.  So is the code or
the comment incorrect?

> +/* ARM generally uses a fixed page size of 4K.  */
> +#define PAGE_SIZE	4096
> +#define FIXED_PAGE_SIZE	1

So I know that on AArch64 there's some interest in moving to larger
pages than this in order to reduce the number of TLB walks needed; this
may impact on the ARM code as well, long term.

> +
> +/* ??? The size of one line in hardware caches (in bytes). */
> +#define HW_CACHELINE_SIZE 64

You can't assume this.  It can vary per core (and I think it may even
vary depending on the cache level).

> +
> +static inline void
> +cpu_relax (void)
> +{
> +  /* ??? Maybe use WFE.  */

Maybe, but WFE doesn't exist on older CPUs

> diff --git a/libitm/config/linux/arm/futex_bits.h b/libitm/config/linux/arm/futex_bits.h

As discussed elsewhere, better to use syscall here rather than try to
inline it.


R.
Richard Henderson - Nov. 24, 2011, 5:08 p.m.
On 11/24/2011 02:34 AM, Richard Earnshaw wrote:
>> +  unsigned long s[8];	/* r4-r12 */
> 
> R4-R12 is 9 registers.  But R12 is callee clobbered.  So is the code or
> the comment incorrect?

The comment is clearly a typo.

>> +/* ARM generally uses a fixed page size of 4K.  */
>> +#define PAGE_SIZE	4096
>> +#define FIXED_PAGE_SIZE	1
> 
> So I know that on AArch64 there's some interest in moving to larger
> pages than this in order to reduce the number of TLB walks needed; this
> may impact on the ARM code as well, long term.
> 
>> +
>> +/* ??? The size of one line in hardware caches (in bytes). */
>> +#define HW_CACHELINE_SIZE 64
> 
> You can't assume this.  It can vary per core (and I think it may even
> vary depending on the cache level).

We don't actually need perfectly accurate values for these.  The only place that we use HW_CACHELINE_SIZE is an attempt to put shared data and thread private data in different lines to prevent ping ponging.


r~
Richard Henderson - Nov. 24, 2011, 11:35 p.m.
On 11/24/2011 02:34 AM, Richard Earnshaw wrote:
> What about systems with floating-point or vector registers?  Ie VFP or
> WMMX on XScale?  Do the callee saved registers in those register banks
> also have to be saved?

Yes.

I mis-read both the glibc and gcc sources on the matter, assuming
that only the integer registers needed care.  I'll try again...


r~
Joseph S. Myers - Nov. 25, 2011, 1:25 a.m.
On Thu, 24 Nov 2011, Richard Henderson wrote:

> On 11/24/2011 02:34 AM, Richard Earnshaw wrote:
> > What about systems with floating-point or vector registers?  Ie VFP or
> > WMMX on XScale?  Do the callee saved registers in those register banks
> > also have to be saved?
> 
> Yes.
> 
> I mis-read both the glibc and gcc sources on the matter, assuming
> that only the integer registers needed care.  I'll try again...

If you're looking at ARM EABI glibc setjmp/longjmp, note that they have a 
bug that they save/restore fpscr but according to the C standard 
floating-point modes and status should not be saved/restored by 
setjmp/longjmp.  I've no idea whether this particular libitm code is meant 
to save/restore floating-point state or not.
Eric Botcazou - Jan. 9, 2012, 6:52 p.m.
> To get the ball rolling for other targets, and to let port maintainers see
> how easy it really is, here's a first cut at a port to ARM.

Do you plan to do something for the SPARC?  I'm asking because you still are a 
maintainer of this arch as well, so if we can avoid doing the work twice...
Richard Henderson - Jan. 10, 2012, 1:02 a.m.
On 01/10/2012 05:52 AM, Eric Botcazou wrote:
>> To get the ball rolling for other targets, and to let port maintainers see
>> how easy it really is, here's a first cut at a port to ARM.
> 
> Do you plan to do something for the SPARC?  I'm asking because you still are a 
> maintainer of this arch as well, so if we can avoid doing the work twice...

I have no immediate plans.  If you have time to knock something
together for Sparc, that would be fantastic.


r~
Eric Botcazou - Jan. 10, 2012, 7:26 a.m.
> I have no immediate plans.  If you have time to knock something
> together for Sparc, that would be fantastic.

OK, will try, thanks.
Eric Botcazou - Feb. 10, 2012, 7:25 p.m.
> I have no immediate plans.  If you have time to knock something
> together for Sparc, that would be fantastic.

Btw, any particular reason why the libgomp-copied linking scheme seems to be 
only half-implemented?

# Set up the set of libraries that we need to link against for libitm.
# Note that the GOMP_SELF_SPEC in gcc.c will force -pthread for -fopenmp,
# which will force linkage against -lpthread (or equivalent for the system).
# That's not 100% ideal, but about the best we can do easily.
if test $enable_shared = yes; then
  link_itm="-litm %{static: $LIBS}"
else
  link_itm="-litm $LIBS"
fi
AC_SUBST(link_itm)

but AFAICS nobody uses libitm.spec (and so you need to force -litm for the 
testsuite)?

Patch

diff --git a/libitm/config/arm/sjlj.S b/libitm/config/arm/sjlj.S
new file mode 100644
index 0000000..f1e7769
--- /dev/null
+++ b/libitm/config/arm/sjlj.S
@@ -0,0 +1,48 @@ 
+/* Copyright (C) 2011 Free Software Foundation, Inc.
+   Contributed by Richard Henderson <rth@redhat.com>.
+
+   This file is part of the GNU Transactional Memory Library (libitm).
+
+   Libitm is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libitm is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+	.text
+	.align	2
+	.global	_ITM_beginTransaction
+	.type	_ITM_beginTransaction, %function
+
+_ITM_beginTransaction:
+	push	{ r4-r11, sp, lr }
+	mov	r1, sp
+	bl	GTM_begin_transaction
+        add     sp, sp, #(9*4)
+        pop	{ pc }
+	.size	_ITM_beginTransaction, . - _ITM_beginTransaction
+
+	.global	GTM_longjmp
+	.hidden	GTM_longjmp
+	.type	GTM_longjmp, %function
+
+GTM_longjmp:
+	ldm	r0, { r4-r11, sp, pc }
+	.size	GTM_longjmp, . - GTM_longjmp
+
+#ifdef __linux__
+.section .note.GNU-stack, "", %progbits
+#endif
diff --git a/libitm/config/arm/target.h b/libitm/config/arm/target.h
new file mode 100644
index 0000000..d889bc8
--- /dev/null
+++ b/libitm/config/arm/target.h
@@ -0,0 +1,60 @@ 
+/* Copyright (C) 2011 Free Software Foundation, Inc.
+   Contributed by Richard Henderson <rth@redhat.com>.
+
+   This file is part of the GNU Transactional Memory Library (libitm).
+
+   Libitm is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libitm is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+namespace GTM HIDDEN {
+
+typedef struct gtm_jmpbuf
+{
+  unsigned long s[8];	/* r4-r12 */
+  void *cfa;
+  unsigned long pc;
+} gtm_jmpbuf;
+
+/* ARM generally uses a fixed page size of 4K.  */
+#define PAGE_SIZE	4096
+#define FIXED_PAGE_SIZE	1
+
+/* ??? The size of one line in hardware caches (in bytes). */
+#define HW_CACHELINE_SIZE 64
+
+static inline void
+cpu_relax (void)
+{
+  /* ??? Maybe use WFE.  */
+  __asm volatile ("" : : : "memory");
+}
+
+static inline void
+atomic_read_barrier (void)
+{
+  __sync_synchronize ();
+}
+
+static inline void
+atomic_write_barrier (void)
+{
+  __sync_synchronize ();
+}
+
+} // namespace GTM
diff --git a/libitm/config/linux/arm/futex_bits.h b/libitm/config/linux/arm/futex_bits.h
new file mode 100644
index 0000000..7e1b52f
--- /dev/null
+++ b/libitm/config/linux/arm/futex_bits.h
@@ -0,0 +1,48 @@ 
+/* Copyright (C) 2011 Free Software Foundation, Inc.
+   Contributed by Richard Henderson <rth@redhat.com>.
+
+   This file is part of the GNU Transactional Memory Library (libitm).
+
+   Libitm is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   Libitm is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* Provide target-specific access to the futex system call.  */
+
+#include <sys/syscall.h>
+
+static inline long
+sys_futex0 (int *addr, long op, long val)
+{
+  register long sc_0 __asm__("r0");
+  register long sc_1 __asm__("r1");
+  register long sc_2 __asm__("r2");
+  register long sc_3 __asm__("r3");
+
+  sc_0 = (long) addr;
+  sc_1 = op;
+  sc_2 = val;
+  sc_3 = 0;
+
+  __asm volatile ("swi %1"
+		  : "+r"(sc_0)
+		  : "i"(SYS_futex), "r"(sc_1), "r"(sc_2), "r"(sc_3)
+		  : "memory");
+
+  return sc_0;
+}
diff --git a/libitm/configure.tgt b/libitm/configure.tgt
index eac6f50..a17f3fc 100644
--- a/libitm/configure.tgt
+++ b/libitm/configure.tgt
@@ -48,6 +48,8 @@  fi
 case "${target_cpu}" in
   alpha*)	ARCH=alpha ;;
 
+  arm*)		ARCH=arm ;;
+
   i[3456]86)
 	case " ${CC} ${CFLAGS} " in
 	  *" -m64 "*)