Patchwork PR lto/51698 and the state of LTO + transactional memory

login
register
mail settings
Submitter Aldy Hernandez
Date Jan. 24, 2012, 11:16 p.m.
Message ID <4F1F3BB4.7010205@redhat.com>
Download mbox | patch
Permalink /patch/137666/
State New
Headers show

Comments

Aldy Hernandez - Jan. 24, 2012, 11:16 p.m.
The problem here is that -flto cannot equate the instrumentation 
functions being generated with a user supplied version of the library 
functions.  This would happen if the user tried to link a transactional 
program with libitm with -flto (as in -fwhole-program, etc).

This is an easy problem to fix, just by changing DEF_TM_BUILTIN to set 
BOTH_P so the __builtin_ITM_* variants and _ITM_* are considered the same.

In doing so, I noticed that there are a myriad of _ITM_* builtins that 
have incorrect prototypes as defined in the builtins*.def files.  So 
using libitm/libitm.h fails miserably because of mismatches.  The 
attached patch fixes all problems, and the PR.

However...

Looking at the generated code, I've noticed that LTO isn't inlining the 
TM builtins as one would expect.  This is a two-fold problem:

First, LTO streaming happens before the final TM lowering (tmmark pass), 
so there is nothing to inline.  The instrumented function calls 
(_ITM_RU4, et al), aren't generated until after LTO.

Second, it seems that by design, LTO prefers builtins to user-provided 
versions of them.  In particular, lto_symtab_prevailing_decl() 
stipulates that builtins are their own prevailing decl.  So even if we 
lowered TM before LTO streaming, user provided builtins wouldn't be 
preferred (and thus inlined) as we would expect into application code.

Though I don't think fixing LTO of instrumented TM code is 4.7 material, 
I thought I'd write this up for the record.

With regards to the first problem, the reason we lower TM so late (in 
contrast to openmp) is because we were hoping to get good optimization 
of the memory operations before fudging them all up.  So at some point 
we should decide what's the biggest gain, deeper optimization of the 
memory operations or the ability to inline the instrumentation.  Or 
perhaps, whether we can simply move the LTO streaming point under 
certain conditions and get both.

With regards to the second problem, I suppose I should tackle #1 first, 
but I would be curious if the LTO experts could weigh in here.

The attached patch fixes the ICE in the PR, though it won't do what the 
user ultimately wants to do, given the limitations described.  Perhaps 
we could create another PR and tag it with an enhancement request.

OK pending tests?

Aldy

p.s. Oh, and btw, does this all make sense?
PR lto/51698
	* builtin-types.def: (BT_CONST_DOUBLE_PTR): New.
	(BT_FN_VOID_PTR_CONST_PTR_SIZE): New.
	(BT_FN_VOID_PTR_INT_SIZE): New.
	(BT_FN_DOUBLE_VPTR): Remove.
	(BT_FN_DOUBLE_CONST_DOUBLE_PTR): New.
	* gtm-builtins.def (_ITM_abortTransaction): Set return type to
	void.
	(_ITM_changeTransactionMode): Same.
	(_ITM_memmoveRtWt): Change return type to void.
	(_ITM_memcpyRtWt): Same.
	(_ITM_memsetW): Same.
	(_ITM_RaRD): Change types to double.
	(_ITM_RD): Same.
	(_ITM_RaWD): Same.
	(_ITM_RfWD): Same.
	* builtins.def (DEF_TM_BUILTIN): Set BOTH_P to true.
Richard Henderson - Jan. 24, 2012, 11:24 p.m.
On 01/25/2012 10:16 AM, Aldy Hernandez wrote:
> The attached patch fixes the ICE in the PR, though it won't do what
> the user ultimately wants to do, given the limitations described.
> Perhaps we could create another PR and tag it with an enhancement
> request.

An enhancement request pr sounds good.

> 	PR lto/51698
> 	* builtin-types.def: (BT_CONST_DOUBLE_PTR): New.
> 	(BT_FN_VOID_PTR_CONST_PTR_SIZE): New.
> 	(BT_FN_VOID_PTR_INT_SIZE): New.
> 	(BT_FN_DOUBLE_VPTR): Remove.
> 	(BT_FN_DOUBLE_CONST_DOUBLE_PTR): New.
> 	* gtm-builtins.def (_ITM_abortTransaction): Set return type to
> 	void.
> 	(_ITM_changeTransactionMode): Same.
> 	(_ITM_memmoveRtWt): Change return type to void.
> 	(_ITM_memcpyRtWt): Same.
> 	(_ITM_memsetW): Same.
> 	(_ITM_RaRD): Change types to double.
> 	(_ITM_RD): Same.
> 	(_ITM_RaWD): Same.
> 	(_ITM_RfWD): Same.
> 	* builtins.def (DEF_TM_BUILTIN): Set BOTH_P to true.

Ok.


r~
Richard Guenther - Jan. 25, 2012, 10:07 a.m.
On Wed, Jan 25, 2012 at 12:16 AM, Aldy Hernandez <aldyh@redhat.com> wrote:
> The problem here is that -flto cannot equate the instrumentation functions
> being generated with a user supplied version of the library functions.  This
> would happen if the user tried to link a transactional program with libitm
> with -flto (as in -fwhole-program, etc).
>
> This is an easy problem to fix, just by changing DEF_TM_BUILTIN to set
> BOTH_P so the __builtin_ITM_* variants and _ITM_* are considered the same.
>
> In doing so, I noticed that there are a myriad of _ITM_* builtins that have
> incorrect prototypes as defined in the builtins*.def files.  So using
> libitm/libitm.h fails miserably because of mismatches.  The attached patch
> fixes all problems, and the PR.
>
> However...
>
> Looking at the generated code, I've noticed that LTO isn't inlining the TM
> builtins as one would expect.  This is a two-fold problem:
>
> First, LTO streaming happens before the final TM lowering (tmmark pass), so
> there is nothing to inline.  The instrumented function calls (_ITM_RU4, et
> al), aren't generated until after LTO.

Which means they are not available for inlining without LTO either ...
(unless you mark them to be always inlined, in which case the inline
transform, which happens first at local after-IPA passes, probably
still inlines it - but that should work at LTRANS stage as well).

> Second, it seems that by design, LTO prefers builtins to user-provided
> versions of them.  In particular, lto_symtab_prevailing_decl() stipulates
> that builtins are their own prevailing decl.  So even if we lowered TM
> before LTO streaming, user provided builtins wouldn't be preferred (and thus
> inlined) as we would expect into application code.

Hmm, so you say you have sth like

void *memcpy(void *dst, void *src, size_t n) { ...implementation... }
void foo()
{
  memcpy (...);
}

and expect it to be inlined from the supplied body instead of using the
builtin expander?

I think we could make this work ... at least under a sort-of ODR, that
all bodies (from different TUs) and the builtin have the same behavior.

Mind to file an enhancement bug?  Does it work without LTO?

> Though I don't think fixing LTO of instrumented TM code is 4.7 material, I
> thought I'd write this up for the record.
>
> With regards to the first problem, the reason we lower TM so late (in
> contrast to openmp) is because we were hoping to get good optimization of
> the memory operations before fudging them all up.  So at some point we
> should decide what's the biggest gain, deeper optimization of the memory
> operations or the ability to inline the instrumentation.  Or perhaps,
> whether we can simply move the LTO streaming point under certain conditions
> and get both.

I think that lowering at ipa_tm time should be good enough for scalar
code optimizations, but it will surely fuck up loop optimizations galore
(you might want to try).

Richard.

> With regards to the second problem, I suppose I should tackle #1 first, but
> I would be curious if the LTO experts could weigh in here.
>
> The attached patch fixes the ICE in the PR, though it won't do what the user
> ultimately wants to do, given the limitations described.  Perhaps we could
> create another PR and tag it with an enhancement request.
>
> OK pending tests?
>
> Aldy
>
> p.s. Oh, and btw, does this all make sense?
Aldy Hernandez - Jan. 25, 2012, 1 p.m.
>> Second, it seems that by design, LTO prefers builtins to user-provided
>> versions of them.  In particular, lto_symtab_prevailing_decl() stipulates
>> that builtins are their own prevailing decl.  So even if we lowered TM
>> before LTO streaming, user provided builtins wouldn't be preferred (and thus
>> inlined) as we would expect into application code.
>
> Hmm, so you say you have sth like
>
> void *memcpy(void *dst, void *src, size_t n) { ...implementation... }
> void foo()
> {
>    memcpy (...);
> }
>
> and expect it to be inlined from the supplied body instead of using the
> builtin expander?

Yes.  Ultimately we want to do exactly that with TM instrumented code.

> I think we could make this work ... at least under a sort-of ODR, that
> all bodies (from different TUs) and the builtin have the same behavior.
>
> Mind to file an enhancement bug?  Does it work without LTO?

Without LTO the memcpy gets inlined correctly.  This is what I am using:

houston:/build/t/gcc$ cat a.c
char *dst, *src;

void *memcpy(void *, const void *, __SIZE_TYPE__);

main()
{
   memcpy(dst, src, 123);
}
houston:/build/t/gcc$ cat b.c
extern int putchar(int);

void *memcpy(void *dst,
              const void *src,
              __SIZE_TYPE__ n)
{
   putchar(13);
}
houston:/build/t/gcc$ ./xgcc -B./ -flto -O3 a.c b.c -save-temps -o a.out

However, with LTO, somewhere around constant propagation (ccp2), we 
decide the memcpy is no longer needed and remove it altogether.  So it 
looks like the builtin was preferred.

I will file an enhancement PR with the above example.

Thanks for looking into this.
Richard Guenther - Jan. 25, 2012, 2:23 p.m.
On Wed, Jan 25, 2012 at 2:00 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
>
>>> Second, it seems that by design, LTO prefers builtins to user-provided
>>> versions of them.  In particular, lto_symtab_prevailing_decl() stipulates
>>> that builtins are their own prevailing decl.  So even if we lowered TM
>>> before LTO streaming, user provided builtins wouldn't be preferred (and
>>> thus
>>> inlined) as we would expect into application code.
>>
>>
>> Hmm, so you say you have sth like
>>
>> void *memcpy(void *dst, void *src, size_t n) { ...implementation... }
>> void foo()
>> {
>>   memcpy (...);
>> }
>>
>> and expect it to be inlined from the supplied body instead of using the
>> builtin expander?
>
>
> Yes.  Ultimately we want to do exactly that with TM instrumented code.
>
>
>> I think we could make this work ... at least under a sort-of ODR, that
>> all bodies (from different TUs) and the builtin have the same behavior.
>>
>> Mind to file an enhancement bug?  Does it work without LTO?
>
>
> Without LTO the memcpy gets inlined correctly.  This is what I am using:
>
> houston:/build/t/gcc$ cat a.c
> char *dst, *src;
>
> void *memcpy(void *, const void *, __SIZE_TYPE__);
>
> main()
> {
>  memcpy(dst, src, 123);
> }
> houston:/build/t/gcc$ cat b.c
> extern int putchar(int);
>
> void *memcpy(void *dst,
>             const void *src,
>             __SIZE_TYPE__ n)
> {
>  putchar(13);
> }
> houston:/build/t/gcc$ ./xgcc -B./ -flto -O3 a.c b.c -save-temps -o a.out

Of course that's an invalid testcase ;)  GCC correctly assumed that your
memcpy has the same kind of side-effects as __builtin_memcpy.  So
you can't observe, in a valid runtime testcase, which copy chose.

;)

Richard.

> However, with LTO, somewhere around constant propagation (ccp2), we decide
> the memcpy is no longer needed and remove it altogether.  So it looks like
> the builtin was preferred.
>
> I will file an enhancement PR with the above example.
>
> Thanks for looking into this.
Aldy Hernandez - Jan. 25, 2012, 2:27 p.m.
On 01/25/12 08:23, Richard Guenther wrote:
> On Wed, Jan 25, 2012 at 2:00 PM, Aldy Hernandez<aldyh@redhat.com>  wrote:
>>
>>>> Second, it seems that by design, LTO prefers builtins to user-provided
>>>> versions of them.  In particular, lto_symtab_prevailing_decl() stipulates
>>>> that builtins are their own prevailing decl.  So even if we lowered TM
>>>> before LTO streaming, user provided builtins wouldn't be preferred (and
>>>> thus
>>>> inlined) as we would expect into application code.
>>>
>>>
>>> Hmm, so you say you have sth like
>>>
>>> void *memcpy(void *dst, void *src, size_t n) { ...implementation... }
>>> void foo()
>>> {
>>>    memcpy (...);
>>> }
>>>
>>> and expect it to be inlined from the supplied body instead of using the
>>> builtin expander?
>>
>>
>> Yes.  Ultimately we want to do exactly that with TM instrumented code.
>>
>>
>>> I think we could make this work ... at least under a sort-of ODR, that
>>> all bodies (from different TUs) and the builtin have the same behavior.
>>>
>>> Mind to file an enhancement bug?  Does it work without LTO?
>>
>>
>> Without LTO the memcpy gets inlined correctly.  This is what I am using:
>>
>> houston:/build/t/gcc$ cat a.c
>> char *dst, *src;
>>
>> void *memcpy(void *, const void *, __SIZE_TYPE__);
>>
>> main()
>> {
>>   memcpy(dst, src, 123);
>> }
>> houston:/build/t/gcc$ cat b.c
>> extern int putchar(int);
>>
>> void *memcpy(void *dst,
>>              const void *src,
>>              __SIZE_TYPE__ n)
>> {
>>   putchar(13);
>> }
>> houston:/build/t/gcc$ ./xgcc -B./ -flto -O3 a.c b.c -save-temps -o a.out
>
> Of course that's an invalid testcase ;)  GCC correctly assumed that your
> memcpy has the same kind of side-effects as __builtin_memcpy.  So
> you can't observe, in a valid runtime testcase, which copy chose.

Ah!  What do you suggest as a testcase?
Richard Guenther - Jan. 25, 2012, 2:33 p.m.
On Wed, Jan 25, 2012 at 3:27 PM, Aldy Hernandez <aldyh@redhat.com> wrote:
> On 01/25/12 08:23, Richard Guenther wrote:
>>
>> On Wed, Jan 25, 2012 at 2:00 PM, Aldy Hernandez<aldyh@redhat.com>  wrote:
>>>
>>>
>>>>> Second, it seems that by design, LTO prefers builtins to user-provided
>>>>> versions of them.  In particular, lto_symtab_prevailing_decl()
>>>>> stipulates
>>>>> that builtins are their own prevailing decl.  So even if we lowered TM
>>>>> before LTO streaming, user provided builtins wouldn't be preferred (and
>>>>> thus
>>>>> inlined) as we would expect into application code.
>>>>
>>>>
>>>>
>>>> Hmm, so you say you have sth like
>>>>
>>>> void *memcpy(void *dst, void *src, size_t n) { ...implementation... }
>>>> void foo()
>>>> {
>>>>   memcpy (...);
>>>> }
>>>>
>>>> and expect it to be inlined from the supplied body instead of using the
>>>> builtin expander?
>>>
>>>
>>>
>>> Yes.  Ultimately we want to do exactly that with TM instrumented code.
>>>
>>>
>>>> I think we could make this work ... at least under a sort-of ODR, that
>>>> all bodies (from different TUs) and the builtin have the same behavior.
>>>>
>>>> Mind to file an enhancement bug?  Does it work without LTO?
>>>
>>>
>>>
>>> Without LTO the memcpy gets inlined correctly.  This is what I am using:
>>>
>>> houston:/build/t/gcc$ cat a.c
>>> char *dst, *src;
>>>
>>> void *memcpy(void *, const void *, __SIZE_TYPE__);
>>>
>>> main()
>>> {
>>>  memcpy(dst, src, 123);
>>> }
>>> houston:/build/t/gcc$ cat b.c
>>> extern int putchar(int);
>>>
>>> void *memcpy(void *dst,
>>>             const void *src,
>>>             __SIZE_TYPE__ n)
>>> {
>>>  putchar(13);
>>> }
>>> houston:/build/t/gcc$ ./xgcc -B./ -flto -O3 a.c b.c -save-temps -o a.out
>>
>>
>> Of course that's an invalid testcase ;)  GCC correctly assumed that your
>> memcpy has the same kind of side-effects as __builtin_memcpy.  So
>> you can't observe, in a valid runtime testcase, which copy chose.
>
> Ah!  What do you suggest as a testcase?

I don't see anything suitable for inclusion in the testsuite.  Yours certainly
does it for development - simply make src and dst volatile.

Richard.

Patch

Index: testsuite/gcc.dg/lto/trans-mem-4_0.c
===================================================================
--- testsuite/gcc.dg/lto/trans-mem-4_0.c	(revision 0)
+++ testsuite/gcc.dg/lto/trans-mem-4_0.c	(revision 0)
@@ -0,0 +1,11 @@ 
+/* { dg-lto-options {{-flto -fgnu-tm}} } */
+/* { dg-lto-do link } */
+
+extern void foo() __attribute__((transaction_safe));
+
+int main()
+{
+  __transaction_atomic {
+      foo();
+  }
+}
Index: testsuite/gcc.dg/lto/trans-mem-4_1.c
===================================================================
--- testsuite/gcc.dg/lto/trans-mem-4_1.c	(revision 0)
+++ testsuite/gcc.dg/lto/trans-mem-4_1.c	(revision 0)
@@ -0,0 +1,12 @@ 
+__attribute__((transaction_safe))
+void foo() 
+{
+}
+
+unsigned int _ITM_beginTransaction(unsigned int prop, ...)
+{
+}
+
+void __builtin__ITM_commitTransaction (void)
+{
+}
Index: builtin-types.def
===================================================================
--- builtin-types.def	(revision 183474)
+++ builtin-types.def	(working copy)
@@ -103,6 +103,10 @@  DEF_PRIMITIVE_TYPE (BT_PTRMODE, (*lang_h
 DEF_PRIMITIVE_TYPE (BT_INT_PTR, integer_ptr_type_node)
 DEF_PRIMITIVE_TYPE (BT_FLOAT_PTR, float_ptr_type_node)
 DEF_PRIMITIVE_TYPE (BT_DOUBLE_PTR, double_ptr_type_node)
+DEF_PRIMITIVE_TYPE (BT_CONST_DOUBLE_PTR,
+		    build_pointer_type
+		     (build_qualified_type (double_type_node,
+		     			    TYPE_QUAL_CONST)))
 DEF_PRIMITIVE_TYPE (BT_LONGDOUBLE_PTR, long_double_ptr_type_node)
 DEF_PRIMITIVE_TYPE (BT_PID, pid_type_node)
 DEF_PRIMITIVE_TYPE (BT_SIZE, size_type_node)
@@ -342,10 +346,14 @@  DEF_FUNCTION_TYPE_3 (BT_FN_INT_CONST_STR
 		     BT_INT, BT_CONST_STRING, BT_CONST_STRING, BT_SIZE)
 DEF_FUNCTION_TYPE_3 (BT_FN_PTR_PTR_CONST_PTR_SIZE,
 		     BT_PTR, BT_PTR, BT_CONST_PTR, BT_SIZE)
+DEF_FUNCTION_TYPE_3 (BT_FN_VOID_PTR_CONST_PTR_SIZE,
+		     BT_VOID, BT_PTR, BT_CONST_PTR, BT_SIZE)
 DEF_FUNCTION_TYPE_3 (BT_FN_INT_CONST_PTR_CONST_PTR_SIZE,
 		     BT_INT, BT_CONST_PTR, BT_CONST_PTR, BT_SIZE)
 DEF_FUNCTION_TYPE_3 (BT_FN_PTR_PTR_INT_SIZE,
 		     BT_PTR, BT_PTR, BT_INT, BT_SIZE)
+DEF_FUNCTION_TYPE_3 (BT_FN_VOID_PTR_INT_SIZE,
+		     BT_VOID, BT_PTR, BT_INT, BT_SIZE)
 DEF_FUNCTION_TYPE_3 (BT_FN_VOID_PTR_INT_INT,
 		     BT_VOID, BT_PTR, BT_INT, BT_INT)
 DEF_FUNCTION_TYPE_3 (BT_FN_VOID_CONST_PTR_PTR_SIZE,
@@ -539,7 +547,7 @@  DEF_FUNCTION_TYPE_1 (BT_FN_I2_VPTR, BT_I
 DEF_FUNCTION_TYPE_1 (BT_FN_I4_VPTR, BT_I4, BT_VOLATILE_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_I8_VPTR, BT_I8, BT_VOLATILE_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_FLOAT_VPTR, BT_FLOAT, BT_VOLATILE_PTR)
-DEF_FUNCTION_TYPE_1 (BT_FN_DOUBLE_VPTR, BT_DOUBLE, BT_VOLATILE_PTR)
+DEF_FUNCTION_TYPE_1 (BT_FN_DOUBLE_CONST_DOUBLE_PTR, BT_DOUBLE, BT_DOUBLE_PTR)
 DEF_FUNCTION_TYPE_1 (BT_FN_LDOUBLE_VPTR, BT_LONGDOUBLE, BT_VOLATILE_PTR)
 
 DEF_FUNCTION_TYPE_2 (BT_FN_VOID_VPTR_I1, BT_VOID, BT_VOLATILE_PTR, BT_I1)
Index: builtins.def
===================================================================
--- builtins.def	(revision 183474)
+++ builtins.def	(working copy)
@@ -147,7 +147,7 @@  along with GCC; see the file COPYING3.
 #undef DEF_TM_BUILTIN
 #define DEF_TM_BUILTIN(ENUM, NAME, TYPE, ATTRS) \
   DEF_BUILTIN (ENUM, "__builtin_" NAME, BUILT_IN_NORMAL, TYPE, TYPE,    \
-	       false, true, true, ATTRS, false, flag_tm)
+	       true, true, true, ATTRS, false, flag_tm)
 
 /* Define an attribute list for math functions that are normally
    "impure" because some of them may write into global memory for
Index: gtm-builtins.def
===================================================================
--- gtm-builtins.def	(revision 183474)
+++ gtm-builtins.def	(working copy)
@@ -6,16 +6,16 @@  DEF_TM_BUILTIN (BUILT_IN_TM_COMMIT, "_IT
 DEF_TM_BUILTIN (BUILT_IN_TM_COMMIT_EH, "_ITM_commitTransactionEH",
 		BT_FN_VOID_PTR, ATTR_TM_NOTHROW_LIST)
 DEF_TM_BUILTIN (BUILT_IN_TM_ABORT, "_ITM_abortTransaction",
-		BT_FN_INT, ATTR_TM_NORETURN_NOTHROW_LIST)
+		BT_FN_VOID_INT, ATTR_TM_NORETURN_NOTHROW_LIST)
 DEF_TM_BUILTIN (BUILT_IN_TM_IRREVOCABLE, "_ITM_changeTransactionMode",
-		BT_FN_INT_INT, ATTR_TM_NOTHROW_LIST)
+		BT_FN_VOID_INT, ATTR_TM_NOTHROW_LIST)
 
 DEF_TM_BUILTIN (BUILT_IN_TM_MEMCPY, "_ITM_memcpyRtWt",
-		BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_TM_TMPURE_NOTHROW_LIST)
+		BT_FN_VOID_PTR_CONST_PTR_SIZE, ATTR_TM_TMPURE_NOTHROW_LIST)
 DEF_TM_BUILTIN (BUILT_IN_TM_MEMMOVE, "_ITM_memmoveRtWt",
-		BT_FN_PTR_PTR_CONST_PTR_SIZE, ATTR_TM_TMPURE_NOTHROW_LIST)
+		BT_FN_VOID_PTR_CONST_PTR_SIZE, ATTR_TM_TMPURE_NOTHROW_LIST)
 DEF_TM_BUILTIN (BUILT_IN_TM_MEMSET, "_ITM_memsetW",
-	       	BT_FN_PTR_PTR_INT_SIZE, ATTR_TM_TMPURE_NOTHROW_LIST)
+	       	BT_FN_VOID_PTR_INT_SIZE, ATTR_TM_TMPURE_NOTHROW_LIST)
 
 DEF_TM_BUILTIN (BUILT_IN_TM_GETTMCLONE_IRR, "_ITM_getTMCloneOrIrrevocable",
 		BT_FN_PTR_PTR, ATTR_TM_CONST_NOTHROW_LIST)
@@ -172,13 +172,13 @@  DEF_TM_BUILTIN (BUILT_IN_TM_LOAD_RFW_FLO
 		BT_FN_FLOAT_VPTR, ATTR_TM_PURE_TMPURE_NOTHROW_LIST)
 
 DEF_TM_BUILTIN (BUILT_IN_TM_LOAD_DOUBLE, "_ITM_RD",
-		BT_FN_DOUBLE_VPTR, ATTR_TM_PURE_TMPURE_NOTHROW_LIST)
+		BT_FN_DOUBLE_CONST_DOUBLE_PTR, ATTR_TM_PURE_TMPURE_NOTHROW_LIST)
 DEF_TM_BUILTIN (BUILT_IN_TM_LOAD_RAR_DOUBLE, "_ITM_RaRD",
-		BT_FN_FLOAT_VPTR, ATTR_TM_PURE_TMPURE_NOTHROW_LIST)
+		BT_FN_DOUBLE_CONST_DOUBLE_PTR, ATTR_TM_PURE_TMPURE_NOTHROW_LIST)
 DEF_TM_BUILTIN (BUILT_IN_TM_LOAD_RAW_DOUBLE, "_ITM_RaWD",
-		BT_FN_FLOAT_VPTR, ATTR_TM_PURE_TMPURE_NOTHROW_LIST)
+		BT_FN_DOUBLE_CONST_DOUBLE_PTR, ATTR_TM_PURE_TMPURE_NOTHROW_LIST)
 DEF_TM_BUILTIN (BUILT_IN_TM_LOAD_RFW_DOUBLE, "_ITM_RfWD",
-		BT_FN_FLOAT_VPTR, ATTR_TM_PURE_TMPURE_NOTHROW_LIST)
+		BT_FN_DOUBLE_CONST_DOUBLE_PTR, ATTR_TM_PURE_TMPURE_NOTHROW_LIST)
 
 /* These stubs should get defined in the backend if applicable.  */
 DEF_BUILTIN_STUB (BUILT_IN_TM_LOAD_M64, "__builtin__ITM_RM64")