diff mbox series

Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657).

Message ID ec080dce-918a-c3e0-c9d6-a443b5fe204c@suse.cz
State New
Headers show
Series Prefer mempcpy to memcpy on x86_64 target (PR middle-end/81657). | expand

Commit Message

Martin Liška March 12, 2018, 8:57 a.m. UTC
Hi.

This is fix for the PR that introduces a new target macro. Using the macro
one can say that a target has a fast mempcpy and thus it's preferred to be used
if possible.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
I also tested on x86_64-linux-gnu.

Ready to be installed?
Martin

gcc/ChangeLog:

2018-03-08  Martin Liska  <mliska@suse.cz>

	PR middle-end/81657
	* builtins.c (expand_builtin_memory_copy_args): Add new
	arguments.
	* config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE):
	New macro.
	* defaults.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE): Likewise.
	* doc/tm.texi: Likewise.
	* doc/tm.texi.in: Likewise.
	* expr.c (compare_by_pieces): Add support for bail out.
	(emit_block_move_hints): Likewise.
	* expr.h (emit_block_move_hints): Add new arguments.

gcc/testsuite/ChangeLog:

2018-03-09  Martin Liska  <mliska@suse.cz>

	PR middle-end/81657
	* gcc.dg/string-opt-1.c: Adjust test to run only on non-x86
	target.
---
 gcc/builtins.c                      | 13 ++++++++++++-
 gcc/config/i386/i386.h              |  3 +++
 gcc/defaults.h                      |  7 +++++++
 gcc/doc/tm.texi                     |  5 +++++
 gcc/doc/tm.texi.in                  |  5 +++++
 gcc/expr.c                          | 16 +++++++++++++++-
 gcc/expr.h                          |  4 +++-
 gcc/testsuite/gcc.dg/string-opt-1.c |  4 ++--
 8 files changed, 52 insertions(+), 5 deletions(-)

Comments

Jakub Jelinek March 12, 2018, 9:07 a.m. UTC | #1
On Mon, Mar 12, 2018 at 09:57:09AM +0100, Martin Liška wrote:
> Hi.
> 
> This is fix for the PR that introduces a new target macro. Using the macro
> one can say that a target has a fast mempcpy and thus it's preferred to be used
> if possible.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> I also tested on x86_64-linux-gnu.

I think we shouldn't be adding new target macros, better would be a target
hook that returns this information.

	Jakub
Marc Glisse March 12, 2018, 9:39 a.m. UTC | #2
On Mon, 12 Mar 2018, Martin Liška wrote:

> This is fix for the PR that introduces a new target macro. Using the macro
> one can say that a target has a fast mempcpy and thus it's preferred to be used
> if possible.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> I also tested on x86_64-linux-gnu.
>
> Ready to be installed?
> Martin
>
> gcc/ChangeLog:
>
> 2018-03-08  Martin Liska  <mliska@suse.cz>
>
> 	PR middle-end/81657
> 	* builtins.c (expand_builtin_memory_copy_args): Add new
> 	arguments.
> 	* config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE):
> 	New macro.

Shouldn't the macro be defined in a more specific case, for instance glibc 
on x86? Or do all known libc on x86 happen to provide a fast mempcpy?

> 	* defaults.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE): Likewise.
> 	* doc/tm.texi: Likewise.
> 	* doc/tm.texi.in: Likewise.
> 	* expr.c (compare_by_pieces): Add support for bail out.
> 	(emit_block_move_hints): Likewise.
> 	* expr.h (emit_block_move_hints): Add new arguments.
Richard Biener March 12, 2018, 10:08 a.m. UTC | #3
On Mon, Mar 12, 2018 at 9:57 AM, Martin Liška <mliska@suse.cz> wrote:
> Hi.
>
> This is fix for the PR that introduces a new target macro. Using the macro

Don't add new target macros, use target hooks.

> one can say that a target has a fast mempcpy and thus it's preferred to be used
> if possible.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> I also tested on x86_64-linux-gnu.
>
> Ready to be installed?
> Martin
>
> gcc/ChangeLog:
>
> 2018-03-08  Martin Liska  <mliska@suse.cz>
>
>         PR middle-end/81657
>         * builtins.c (expand_builtin_memory_copy_args): Add new
>         arguments.
>         * config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE):
>         New macro.
>         * defaults.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE): Likewise.
>         * doc/tm.texi: Likewise.
>         * doc/tm.texi.in: Likewise.
>         * expr.c (compare_by_pieces): Add support for bail out.
>         (emit_block_move_hints): Likewise.
>         * expr.h (emit_block_move_hints): Add new arguments.
>
> gcc/testsuite/ChangeLog:
>
> 2018-03-09  Martin Liska  <mliska@suse.cz>
>
>         PR middle-end/81657
>         * gcc.dg/string-opt-1.c: Adjust test to run only on non-x86
>         target.
> ---
>  gcc/builtins.c                      | 13 ++++++++++++-
>  gcc/config/i386/i386.h              |  3 +++
>  gcc/defaults.h                      |  7 +++++++
>  gcc/doc/tm.texi                     |  5 +++++
>  gcc/doc/tm.texi.in                  |  5 +++++
>  gcc/expr.c                          | 16 +++++++++++++++-
>  gcc/expr.h                          |  4 +++-
>  gcc/testsuite/gcc.dg/string-opt-1.c |  4 ++--
>  8 files changed, 52 insertions(+), 5 deletions(-)
>
>
Martin Liška March 13, 2018, 8:24 a.m. UTC | #4
On 03/12/2018 10:39 AM, Marc Glisse wrote:
> On Mon, 12 Mar 2018, Martin Liška wrote:
> 
>> This is fix for the PR that introduces a new target macro. Using the macro
>> one can say that a target has a fast mempcpy and thus it's preferred to be used
>> if possible.
>>
>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>> I also tested on x86_64-linux-gnu.
>>
>> Ready to be installed?
>> Martin
>>
>> gcc/ChangeLog:
>>
>> 2018-03-08  Martin Liska  <mliska@suse.cz>
>>
>>     PR middle-end/81657
>>     * builtins.c (expand_builtin_memory_copy_args): Add new
>>     arguments.
>>     * config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE):
>>     New macro.
> 
> Shouldn't the macro be defined in a more specific case, for instance glibc on x86? Or do all known libc on x86 happen to provide a fast mempcpy?

That's Marc a very good question. Do we already have a glibc-related target macros/hooks?
If so, I would add this as one of these.

Thanks,
Martin

> 
>>     * defaults.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE): Likewise.
>>     * doc/tm.texi: Likewise.
>>     * doc/tm.texi.in: Likewise.
>>     * expr.c (compare_by_pieces): Add support for bail out.
>>     (emit_block_move_hints): Likewise.
>>     * expr.h (emit_block_move_hints): Add new arguments.
>
Jakub Jelinek March 13, 2018, 8:32 a.m. UTC | #5
On Tue, Mar 13, 2018 at 09:24:11AM +0100, Martin Liška wrote:
> On 03/12/2018 10:39 AM, Marc Glisse wrote:
> > On Mon, 12 Mar 2018, Martin Liška wrote:
> > 
> > > This is fix for the PR that introduces a new target macro. Using the macro
> > > one can say that a target has a fast mempcpy and thus it's preferred to be used
> > > if possible.
> > > 
> > > Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> > > I also tested on x86_64-linux-gnu.
> > > 
> > > Ready to be installed?
> > > Martin
> > > 
> > > gcc/ChangeLog:
> > > 
> > > 2018-03-08  Martin Liska  <mliska@suse.cz>
> > > 
> > >     PR middle-end/81657
> > >     * builtins.c (expand_builtin_memory_copy_args): Add new
> > >     arguments.
> > >     * config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE):
> > >     New macro.
> > 
> > Shouldn't the macro be defined in a more specific case, for instance glibc on x86? Or do all known libc on x86 happen to provide a fast mempcpy?
> 
> That's Marc a very good question. Do we already have a glibc-related target macros/hooks?
> If so, I would add this as one of these.

Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook,
where in particular linux_libc_has_function deals with various C libraries.
Of course, in this case you need another target hook, that is dependent both
on the target backend and C library.

It would be nice to make the target hook a little bit more generic as well,
e.g. pass it enum builtin_function and query if it is fast, slow or
unknown, or even some kind of cost, where the caller could ask for cost of
BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative costs.

	Jakub
Richard Biener March 13, 2018, 9:22 a.m. UTC | #6
On Tue, Mar 13, 2018 at 9:32 AM, Jakub Jelinek <jakub@redhat.com> wrote:
> On Tue, Mar 13, 2018 at 09:24:11AM +0100, Martin Liška wrote:
>> On 03/12/2018 10:39 AM, Marc Glisse wrote:
>> > On Mon, 12 Mar 2018, Martin Liška wrote:
>> >
>> > > This is fix for the PR that introduces a new target macro. Using the macro
>> > > one can say that a target has a fast mempcpy and thus it's preferred to be used
>> > > if possible.
>> > >
>> > > Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>> > > I also tested on x86_64-linux-gnu.
>> > >
>> > > Ready to be installed?
>> > > Martin
>> > >
>> > > gcc/ChangeLog:
>> > >
>> > > 2018-03-08  Martin Liska  <mliska@suse.cz>
>> > >
>> > >     PR middle-end/81657
>> > >     * builtins.c (expand_builtin_memory_copy_args): Add new
>> > >     arguments.
>> > >     * config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE):
>> > >     New macro.
>> >
>> > Shouldn't the macro be defined in a more specific case, for instance glibc on x86? Or do all known libc on x86 happen to provide a fast mempcpy?
>>
>> That's Marc a very good question. Do we already have a glibc-related target macros/hooks?
>> If so, I would add this as one of these.
>
> Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook,
> where in particular linux_libc_has_function deals with various C libraries.
> Of course, in this case you need another target hook, that is dependent both
> on the target backend and C library.
>
> It would be nice to make the target hook a little bit more generic as well,
> e.g. pass it enum builtin_function and query if it is fast, slow or
> unknown, or even some kind of cost, where the caller could ask for cost of
> BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative costs.

Or maybe a hook returning the alternative to use?  Pass it BUILT_IN_X
and get back
the same or related builtin?

Richard.

>         Jakub
Martin Liška March 13, 2018, 3:19 p.m. UTC | #7
On 03/13/2018 09:32 AM, Jakub Jelinek wrote:
> On Tue, Mar 13, 2018 at 09:24:11AM +0100, Martin Liška wrote:
>> On 03/12/2018 10:39 AM, Marc Glisse wrote:
>>> On Mon, 12 Mar 2018, Martin Liška wrote:
>>>
>>>> This is fix for the PR that introduces a new target macro. Using the macro
>>>> one can say that a target has a fast mempcpy and thus it's preferred to be used
>>>> if possible.
>>>>
>>>> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>>>> I also tested on x86_64-linux-gnu.
>>>>
>>>> Ready to be installed?
>>>> Martin
>>>>
>>>> gcc/ChangeLog:
>>>>
>>>> 2018-03-08  Martin Liska  <mliska@suse.cz>
>>>>
>>>>      PR middle-end/81657
>>>>      * builtins.c (expand_builtin_memory_copy_args): Add new
>>>>      arguments.
>>>>      * config/i386/i386.h (TARGET_HAS_FAST_MEMPCPY_ROUTINE):
>>>>      New macro.
>>>
>>> Shouldn't the macro be defined in a more specific case, for instance glibc on x86? Or do all known libc on x86 happen to provide a fast mempcpy?
>>
>> That's Marc a very good question. Do we already have a glibc-related target macros/hooks?
>> If so, I would add this as one of these.
> 
> Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook,
> where in particular linux_libc_has_function deals with various C libraries.
> Of course, in this case you need another target hook, that is dependent both
> on the target backend and C library.
> 
> It would be nice to make the target hook a little bit more generic as well,
> e.g. pass it enum builtin_function and query if it is fast, slow or
> unknown, or even some kind of cost, where the caller could ask for cost of
> BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative costs.

Let me start with simple return enum value of FAST,SLOW,UNKNOWN. I've added new hook
definition to gcc/config/gnu-user.h that will point to gnu_libc_function_implementation.
I would like to implement the function in gcc/targhooks.c, but I don't know how to
make ifdef according to target?

One another issue is that built_in_function is enum defined in tree.h. Thus I'll replace the
callback argument with int, that will be casted. One last issue: am I right that I'll have to define
TARGET_LIBC_FUNCTION_IMPLEMENTATION in each config file (similar to no_c99_libc_has_function)?

Thanks,
Martin

> 
> 	Jakub
>
Jakub Jelinek March 13, 2018, 3:23 p.m. UTC | #8
On Tue, Mar 13, 2018 at 04:19:21PM +0100, Martin Liška wrote:
> > Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook,
> > where in particular linux_libc_has_function deals with various C libraries.
> > Of course, in this case you need another target hook, that is dependent both
> > on the target backend and C library.
> > 
> > It would be nice to make the target hook a little bit more generic as well,
> > e.g. pass it enum builtin_function and query if it is fast, slow or
> > unknown, or even some kind of cost, where the caller could ask for cost of
> > BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative costs.
> 
> Let me start with simple return enum value of FAST,SLOW,UNKNOWN. I've added new hook
> definition to gcc/config/gnu-user.h that will point to gnu_libc_function_implementation.
> I would like to implement the function in gcc/targhooks.c, but I don't know how to
> make ifdef according to target?

Put there just the default implementation (everything is UNKNOWN?).

> One another issue is that built_in_function is enum defined in tree.h. Thus I'll replace the
> callback argument with int, that will be casted. One last issue: am I right that I'll have to define
> TARGET_LIBC_FUNCTION_IMPLEMENTATION in each config file (similar to no_c99_libc_has_function)?

And define the i386/x86_64 glibc one in config/i386/*.h, check there
OPTION_GLIBC and only in that case return something other than UNKNOWN.

And redefine TARGET_LIBC_FUNCTION_IMPLEMENTATION only in that case.

	Jakub
Martin Liška March 14, 2018, 12:54 p.m. UTC | #9
On 03/13/2018 04:23 PM, Jakub Jelinek wrote:
> On Tue, Mar 13, 2018 at 04:19:21PM +0100, Martin Liška wrote:
>>> Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook,
>>> where in particular linux_libc_has_function deals with various C libraries.
>>> Of course, in this case you need another target hook, that is dependent both
>>> on the target backend and C library.
>>>
>>> It would be nice to make the target hook a little bit more generic as well,
>>> e.g. pass it enum builtin_function and query if it is fast, slow or
>>> unknown, or even some kind of cost, where the caller could ask for cost of
>>> BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative costs.
>>
>> Let me start with simple return enum value of FAST,SLOW,UNKNOWN. I've added new hook
>> definition to gcc/config/gnu-user.h that will point to gnu_libc_function_implementation.
>> I would like to implement the function in gcc/targhooks.c, but I don't know how to
>> make ifdef according to target?
> 
> Put there just the default implementation (everything is UNKNOWN?).
> 
>> One another issue is that built_in_function is enum defined in tree.h. Thus I'll replace the
>> callback argument with int, that will be casted. One last issue: am I right that I'll have to define
>> TARGET_LIBC_FUNCTION_IMPLEMENTATION in each config file (similar to no_c99_libc_has_function)?
> 
> And define the i386/x86_64 glibc one in config/i386/*.h, check there
> OPTION_GLIBC and only in that case return something other than UNKNOWN.
> 
> And redefine TARGET_LIBC_FUNCTION_IMPLEMENTATION only in that case.
> 
> 	Jakub
> 

Hi.

I'm sending V2 that can survive bootstrap and regression tests on both x86_64 and ppc64le.

Martin
From 222c7c205a7afc144dc123d2b378a057dcf8816f Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Wed, 14 Mar 2018 09:44:18 +0100
Subject: [PATCH] Introduce new libc_func_speed target hook (PR
 middle-end/81657).

gcc/ChangeLog:

2018-03-14  Martin Liska  <mliska@suse.cz>

	PR middle-end/81657
	* builtins.c (expand_builtin_memory_copy_args): Handle situation
	when libc library provides a fast mempcpy implementation/
	* config/i386/i386-protos.h (gnu_libc_func_speed): New.
	* config/i386/i386.c (enum libc_speed): Likewise.
	(ix86_libc_func_speed): Likewise.
	(TARGET_LIBC_FUNC_SPEED): Likewise.
	* coretypes.h (enum libc_speed): Likewise.
	* doc/tm.texi: Document new target hook.
	* doc/tm.texi.in: Likewise.
	* expr.c (emit_block_move_hints): Handle libc bail out argument.
	* expr.h (emit_block_move_hints): Add new parameters.
	* target.def: Add new hook.
	* targhooks.c (enum libc_speed): New enum.
	(default_libc_func_speed): Provide a default hook
	implementation.
	* targhooks.h (default_libc_func_speed): Likewise.

gcc/testsuite/ChangeLog:

2018-03-14  Martin Liska  <mliska@suse.cz>

	* gcc.c-torture/execute/builtins/mempcpy.c (main_test): Adjust
	to not use mempcpy.
	* gcc.dg/string-opt-1.c: Adjust for i386 target.
---
 gcc/builtins.c                                       | 13 ++++++++++++-
 gcc/config/i386/i386-protos.h                        |  2 ++
 gcc/config/i386/i386.c                               | 20 ++++++++++++++++++++
 gcc/coretypes.h                                      |  7 +++++++
 gcc/doc/tm.texi                                      |  4 ++++
 gcc/doc/tm.texi.in                                   |  1 +
 gcc/expr.c                                           | 16 +++++++++++++++-
 gcc/expr.h                                           |  4 +++-
 gcc/target.def                                       |  7 +++++++
 gcc/targhooks.c                                      |  6 ++++++
 gcc/targhooks.h                                      |  1 +
 .../gcc.c-torture/execute/builtins/mempcpy.c         |  2 +-
 gcc/testsuite/gcc.dg/string-opt-1.c                  |  4 ++--
 13 files changed, 81 insertions(+), 6 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 85affa74510..eb038dd45b3 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3651,13 +3651,24 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len,
   src_mem = get_memory_rtx (src, len);
   set_mem_align (src_mem, src_align);
 
+  bool is_move_done;
+
   /* Copy word part most expediently.  */
+  bool bail_out_libcall = endp == 1
+    && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == FAST_SPEED;
   dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
 				     CALL_EXPR_TAILCALL (exp)
 				     && (endp == 0 || target == const0_rtx)
 				     ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
 				     expected_align, expected_size,
-				     min_size, max_size, probable_max_size);
+				     min_size, max_size, probable_max_size,
+				     bail_out_libcall, &is_move_done);
+
+  /* Bail out when a mempcpy call would be expanded as libcall and when
+     we have a target that provides a fast implementation
+     of mempcpy routine.  */
+  if (!is_move_done)
+    return NULL_RTX;
 
   if (dest_addr == 0)
     {
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index ef7c818986f..d3fc515845b 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -47,6 +47,8 @@ extern void ix86_reset_previous_fndecl (void);
 
 extern bool ix86_using_red_zone (void);
 
+extern enum libc_speed gnu_libc_func_speed (int fn);
+
 #ifdef RTX_CODE
 extern int standard_80387_constant_p (rtx);
 extern const char *standard_80387_constant_opcode (rtx);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index af24c6ec5ba..b3eb50a42f3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2733,6 +2733,23 @@ ix86_using_red_zone (void)
 	  && (!cfun->machine->has_local_indirect_jump
 	      || cfun->machine->indirect_branch_type == indirect_branch_keep));
 }
+
+enum libc_speed
+ix86_libc_func_speed (int fn)
+{
+  enum built_in_function f = (built_in_function)fn;
+
+  if (!OPTION_GLIBC)
+    return UNKNOWN_SPEED;
+
+  switch (f)
+    {
+      case BUILT_IN_MEMPCPY:
+	return FAST_SPEED;
+      default:
+	return UNKNOWN_SPEED;
+    }
+}
 
 /* Return a string that documents the current -m options.  The caller is
    responsible for freeing the string.  */
@@ -52061,6 +52078,9 @@ ix86_run_selftests (void)
 #undef TARGET_WARN_PARAMETER_PASSING_ABI
 #define TARGET_WARN_PARAMETER_PASSING_ABI ix86_warn_parameter_passing_abi
 
+#undef TARGET_LIBC_FUNC_SPEED
+#define TARGET_LIBC_FUNC_SPEED ix86_libc_func_speed
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 283b4eb33fe..8123df7ccc5 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -384,6 +384,13 @@ enum excess_precision_type
   EXCESS_PRECISION_TYPE_FAST
 };
 
+enum libc_speed
+{
+  FAST_SPEED,
+  SLOW_SPEED,
+  UNKNOWN_SPEED
+};
+
 /* Support for user-provided GGC and PCH markers.  The first parameter
    is a pointer to a pointer, the second a cookie.  */
 typedef void (*gt_pointer_operator) (void *, void *);
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index bd8b917ba82..0f7c91a22c4 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5501,6 +5501,10 @@ macro, a reasonable default is used.
 This hook determines whether a function from a class of functions
 @var{fn_class} is present at the runtime.
 @end deftypefn
+@deftypefn {Target Hook} libc_speed TARGET_LIBC_FUNC_SPEED (int @var{fn})
+This hook determines whether a function from libc has a fast implementation
+@var{fn} is present at the runtime.
+@end deftypefn
 
 @defmac NEXT_OBJC_RUNTIME
 Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index b0207146e8c..4bb2998a8a1 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3933,6 +3933,7 @@ macro, a reasonable default is used.
 @end defmac
 
 @hook TARGET_LIBC_HAS_FUNCTION
+@hook TARGET_LIBC_FUNC_SPEED
 
 @defmac NEXT_OBJC_RUNTIME
 Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
diff --git a/gcc/expr.c b/gcc/expr.c
index 00660293f72..b6c13652d79 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -1554,6 +1554,8 @@ compare_by_pieces (rtx arg0, rtx arg1, unsigned HOST_WIDE_INT len,
    MIN_SIZE is the minimal size of block to move
    MAX_SIZE is the maximal size of block to move, if it can not be represented
    in unsigned HOST_WIDE_INT, than it is mask of all ones.
+   If BAIL_OUT_LIBCALL is set true, do not emit library call and set
+   *IS_MOVE_DONE to false.
 
    Return the address of the new block, if memcpy is called and returns it,
    0 otherwise.  */
@@ -1563,12 +1565,17 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 		       unsigned int expected_align, HOST_WIDE_INT expected_size,
 		       unsigned HOST_WIDE_INT min_size,
 		       unsigned HOST_WIDE_INT max_size,
-		       unsigned HOST_WIDE_INT probable_max_size)
+		       unsigned HOST_WIDE_INT probable_max_size,
+		       bool bail_out_libcall, bool *is_move_done)
 {
   bool may_use_call;
   rtx retval = 0;
   unsigned int align;
 
+  /* When not doing a bail out, we always emit a memory move.  */
+  if (is_move_done)
+    *is_move_done = true;
+
   gcc_assert (size);
   if (CONST_INT_P (size) && INTVAL (size) == 0)
     return 0;
@@ -1625,6 +1632,13 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y)))
     {
+      if (bail_out_libcall)
+	{
+	  if (is_move_done)
+	    *is_move_done = false;
+	  return retval;
+	}
+
       /* Since x and y are passed to a libcall, mark the corresponding
 	 tree EXPR as addressable.  */
       tree y_expr = MEM_EXPR (y);
diff --git a/gcc/expr.h b/gcc/expr.h
index b3d523bcb24..023bc5aec47 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -110,7 +110,9 @@ extern rtx emit_block_move_hints (rtx, rtx, rtx, enum block_op_methods,
 			          unsigned int, HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
-				  unsigned HOST_WIDE_INT);
+				  unsigned HOST_WIDE_INT,
+				  bool bail_out_libcall = false,
+				  bool *is_move_done = NULL);
 extern rtx emit_block_cmp_hints (rtx, rtx, rtx, tree, rtx, bool,
 				 by_pieces_constfn, void *);
 extern bool emit_storent_insn (rtx to, rtx from);
diff --git a/gcc/target.def b/gcc/target.def
index c5b2a1e7e71..3bbddc82776 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2639,6 +2639,13 @@ DEFHOOK
  bool, (enum function_class fn_class),
  default_libc_has_function)
 
+DEFHOOK
+(libc_func_speed,
+ "This hook determines whether a function from libc has a fast implementation\n\
+@var{fn} is present at the runtime.",
+ libc_speed, (int fn),
+ default_libc_func_speed)
+
 /* True if new jumps cannot be created, to replace existing ones or
    not, at the current point in the compilation.  */
 DEFHOOK
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index fafcc6c5196..af0c87ff9fe 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1642,6 +1642,12 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED)
   return false;
 }
 
+enum libc_speed
+default_libc_func_speed (int)
+{
+  return UNKNOWN_SPEED;
+}
+
 tree
 default_builtin_tm_load_store (tree ARG_UNUSED (type))
 {
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 8a4393f2ba4..7508673ad0a 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -205,6 +205,7 @@ extern bool default_have_conditional_execution (void);
 extern bool default_libc_has_function (enum function_class);
 extern bool no_c99_libc_has_function (enum function_class);
 extern bool gnu_libc_has_function (enum function_class);
+extern enum libc_speed default_libc_func_speed (int);
 
 extern tree default_builtin_tm_load_store (tree);
 
diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
index d82e2232d7b..91e1c87f83f 100644
--- a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
+++ b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
@@ -62,7 +62,7 @@ main_test (void)
   mempcpy (p + 5, s3, 1);
   if (memcmp (p, "ABCDEFg", 8))
     abort ();
-  mempcpy (p + 6, s1 + 1, l1);
+  memcpy (p + 6, s1 + 1, l1);
   if (memcmp (p, "ABCDEF2", 8))
     abort ();
 }
diff --git a/gcc/testsuite/gcc.dg/string-opt-1.c b/gcc/testsuite/gcc.dg/string-opt-1.c
index 2f060732bf0..851c8b04a33 100644
--- a/gcc/testsuite/gcc.dg/string-opt-1.c
+++ b/gcc/testsuite/gcc.dg/string-opt-1.c
@@ -48,5 +48,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */
-/* { dg-final { scan-assembler "memcpy" } } */
+/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { i?86-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-assembler "memcpy" { target  { ! { i?86-*-* x86_64-*-* } } } } } */
H.J. Lu March 14, 2018, 12:57 p.m. UTC | #10
On Wed, Mar 14, 2018 at 5:54 AM, Martin Liška <mliska@suse.cz> wrote:
> On 03/13/2018 04:23 PM, Jakub Jelinek wrote:
>> On Tue, Mar 13, 2018 at 04:19:21PM +0100, Martin Liška wrote:
>>>> Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook,
>>>> where in particular linux_libc_has_function deals with various C libraries.
>>>> Of course, in this case you need another target hook, that is dependent both
>>>> on the target backend and C library.
>>>>
>>>> It would be nice to make the target hook a little bit more generic as well,
>>>> e.g. pass it enum builtin_function and query if it is fast, slow or
>>>> unknown, or even some kind of cost, where the caller could ask for cost of
>>>> BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative costs.
>>>
>>> Let me start with simple return enum value of FAST,SLOW,UNKNOWN. I've added new hook
>>> definition to gcc/config/gnu-user.h that will point to gnu_libc_function_implementation.
>>> I would like to implement the function in gcc/targhooks.c, but I don't know how to
>>> make ifdef according to target?
>>
>> Put there just the default implementation (everything is UNKNOWN?).
>>
>>> One another issue is that built_in_function is enum defined in tree.h. Thus I'll replace the
>>> callback argument with int, that will be casted. One last issue: am I right that I'll have to define
>>> TARGET_LIBC_FUNCTION_IMPLEMENTATION in each config file (similar to no_c99_libc_has_function)?
>>
>> And define the i386/x86_64 glibc one in config/i386/*.h, check there
>> OPTION_GLIBC and only in that case return something other than UNKNOWN.
>>
>> And redefine TARGET_LIBC_FUNCTION_IMPLEMENTATION only in that case.
>>
>>       Jakub
>>
>
> Hi.
>
> I'm sending V2 that can survive bootstrap and regression tests on both x86_64 and ppc64le.
>
> Martin

diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
index d82e2232d7b..91e1c87f83f 100644
--- a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
+++ b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
@@ -62,7 +62,7 @@ main_test (void)
   mempcpy (p + 5, s3, 1);
   if (memcmp (p, "ABCDEFg", 8))
     abort ();
-  mempcpy (p + 6, s1 + 1, l1);
+  memcpy (p + 6, s1 + 1, l1);
   if (memcmp (p, "ABCDEF2", 8))
     abort ();
 }

This is a mempcpy test.  Why is mempcpy changed to memcpy?
Jakub Jelinek March 14, 2018, 1:07 p.m. UTC | #11
On Wed, Mar 14, 2018 at 01:54:39PM +0100, Martin Liška wrote:
> --- a/gcc/builtins.c
> +++ b/gcc/builtins.c
> @@ -3651,13 +3651,24 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len,
>    src_mem = get_memory_rtx (src, len);
>    set_mem_align (src_mem, src_align);
>  
> +  bool is_move_done;
> +
>    /* Copy word part most expediently.  */

This comment supposedly belongs right above the emit_block_move_hints call.

> +  bool bail_out_libcall = endp == 1
> +    && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == FAST_SPEED;

Formatting.  && belongs below endp.  So either:
  bool bail_out_libcall
    = (endp == 1
       && ...);
or
  bool bail_out_libcall = false;
  if (endp == 1
      && ...)
    bail_out_libcall = true;
?
The variable is not named very well, shouldn't it be avoid_libcall or
something similar?  Perhaps the variable should have a comment describing
what it is.  Do you need separate argument for that bool and
is_move_done, rather than just the flag being that some pointer to bool is
non-NULL?

>    dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
>  				     CALL_EXPR_TAILCALL (exp)
>  				     && (endp == 0 || target == const0_rtx)
>  				     ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
>  				     expected_align, expected_size,
> -				     min_size, max_size, probable_max_size);
> +				     min_size, max_size, probable_max_size,
> +				     bail_out_libcall, &is_move_done);
> +
> +  /* Bail out when a mempcpy call would be expanded as libcall and when
> +     we have a target that provides a fast implementation
> +     of mempcpy routine.  */
> +  if (!is_move_done)
> +    return NULL_RTX;
>  
>    if (dest_addr == 0)
>      {
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -2733,6 +2733,23 @@ ix86_using_red_zone (void)
>  	  && (!cfun->machine->has_local_indirect_jump
>  	      || cfun->machine->indirect_branch_type == indirect_branch_keep));
>  }
> +

Missing function comment here.  For target hooks, usually there is a copy of
the target.def comment, perhaps with further details on why it is overridden
and what it does.
> --- a/gcc/coretypes.h
> +++ b/gcc/coretypes.h
> @@ -384,6 +384,13 @@ enum excess_precision_type
>    EXCESS_PRECISION_TYPE_FAST
>  };
>  

Missing comment describing what it is, plus it the enumerators are too
generic, if it is libc_speed enum, perhaps LIBC_FAST_SPEED etc.?
> --- a/gcc/targhooks.c
> +++ b/gcc/targhooks.c
> @@ -1642,6 +1642,12 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED)
>    return false;
>  }
>  

Again, missing function comment.

> +enum libc_speed
> +default_libc_func_speed (int)
> +{
> +  return UNKNOWN_SPEED;
> +}
> +

> --- a/gcc/testsuite/gcc.dg/string-opt-1.c
> +++ b/gcc/testsuite/gcc.dg/string-opt-1.c
> @@ -48,5 +48,5 @@ main (void)
>    return 0;
>  }
>  
> -/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */
> -/* { dg-final { scan-assembler "memcpy" } } */
> +/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { i?86-*-* x86_64-*-* } } } } */
> +/* { dg-final { scan-assembler "memcpy" { target  { ! { i?86-*-* x86_64-*-* } } } } } */

First of all, I don't really understand this, I'd expect both the
two old dg-final lines to be used as is for non-x86 targets and another two
for x86_64, and more importantly, the target hook is only for glibc, not for
musl/bionic etc., nor for non-linux, so you probably need some effective
target for it for whether it is glibc rather than musl/bionic, and use
...-*-linux* and ...-*-gnu* etc. rather than ...-*-*.  Or tweak the dg-fianl
patterns so that it succeeds on both variants of the target hook, but then
does the test test anything at all?

	Jakub
Martin Liška March 14, 2018, 1:08 p.m. UTC | #12
On 03/14/2018 01:57 PM, H.J. Lu wrote:
> On Wed, Mar 14, 2018 at 5:54 AM, Martin Liška <mliska@suse.cz> wrote:
>> On 03/13/2018 04:23 PM, Jakub Jelinek wrote:
>>> On Tue, Mar 13, 2018 at 04:19:21PM +0100, Martin Liška wrote:
>>>>> Yes, see e.g. TARGET_LIBC_HAS_FUNCTION target hook,
>>>>> where in particular linux_libc_has_function deals with various C libraries.
>>>>> Of course, in this case you need another target hook, that is dependent both
>>>>> on the target backend and C library.
>>>>>
>>>>> It would be nice to make the target hook a little bit more generic as well,
>>>>> e.g. pass it enum builtin_function and query if it is fast, slow or
>>>>> unknown, or even some kind of cost, where the caller could ask for cost of
>>>>> BUILT_IN_MEMCPY and BUILT_IN_MEMPCPY and decide based on the relative costs.
>>>>
>>>> Let me start with simple return enum value of FAST,SLOW,UNKNOWN. I've added new hook
>>>> definition to gcc/config/gnu-user.h that will point to gnu_libc_function_implementation.
>>>> I would like to implement the function in gcc/targhooks.c, but I don't know how to
>>>> make ifdef according to target?
>>>
>>> Put there just the default implementation (everything is UNKNOWN?).
>>>
>>>> One another issue is that built_in_function is enum defined in tree.h. Thus I'll replace the
>>>> callback argument with int, that will be casted. One last issue: am I right that I'll have to define
>>>> TARGET_LIBC_FUNCTION_IMPLEMENTATION in each config file (similar to no_c99_libc_has_function)?
>>>
>>> And define the i386/x86_64 glibc one in config/i386/*.h, check there
>>> OPTION_GLIBC and only in that case return something other than UNKNOWN.
>>>
>>> And redefine TARGET_LIBC_FUNCTION_IMPLEMENTATION only in that case.
>>>
>>>       Jakub
>>>
>>
>> Hi.
>>
>> I'm sending V2 that can survive bootstrap and regression tests on both x86_64 and ppc64le.
>>
>> Martin
> 
> diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
> b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
> index d82e2232d7b..91e1c87f83f 100644
> --- a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
> +++ b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
> @@ -62,7 +62,7 @@ main_test (void)
>    mempcpy (p + 5, s3, 1);
>    if (memcmp (p, "ABCDEFg", 8))
>      abort ();
> -  mempcpy (p + 6, s1 + 1, l1);
> +  memcpy (p + 6, s1 + 1, l1);
>    if (memcmp (p, "ABCDEF2", 8))
>      abort ();
>  }
> 
> This is a mempcpy test.  Why is mempcpy changed to memcpy?
> 

Because this mempcpy is not optimized out to memcpy and it then aborts.
It's proper to leave here mempcpy I believe.

Martin
Jakub Jelinek March 14, 2018, 1:10 p.m. UTC | #13
On Wed, Mar 14, 2018 at 02:08:07PM +0100, Martin Liška wrote:
> > diff --git a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
> > b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
> > index d82e2232d7b..91e1c87f83f 100644
> > --- a/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
> > +++ b/gcc/testsuite/gcc.c-torture/execute/builtins/mempcpy.c
> > @@ -62,7 +62,7 @@ main_test (void)
> >    mempcpy (p + 5, s3, 1);
> >    if (memcmp (p, "ABCDEFg", 8))
> >      abort ();
> > -  mempcpy (p + 6, s1 + 1, l1);
> > +  memcpy (p + 6, s1 + 1, l1);
> >    if (memcmp (p, "ABCDEF2", 8))
> >      abort ();
> >  }
> > 
> > This is a mempcpy test.  Why is mempcpy changed to memcpy?
> > 
> 
> Because this mempcpy is not optimized out to memcpy and it then aborts.

Why it isn't optimized to memcpy?  If the lhs is unused, it always should be
folded to memcpy, regardless of whether mempcpy is fast or not (I assume no
target has slow memcpy and fast mempcpy).

	Jakub
Martin Liška March 14, 2018, 1:54 p.m. UTC | #14
On 03/14/2018 02:07 PM, Jakub Jelinek wrote:
> On Wed, Mar 14, 2018 at 01:54:39PM +0100, Martin Liška wrote:
>> --- a/gcc/builtins.c
>> +++ b/gcc/builtins.c
>> @@ -3651,13 +3651,24 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len,
>>    src_mem = get_memory_rtx (src, len);
>>    set_mem_align (src_mem, src_align);
>>  
>> +  bool is_move_done;
>> +
>>    /* Copy word part most expediently.  */
> 
> This comment supposedly belongs right above the emit_block_move_hints call.
> 
>> +  bool bail_out_libcall = endp == 1
>> +    && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == FAST_SPEED;
> 
> Formatting.  && belongs below endp.  So either:
>   bool bail_out_libcall
>     = (endp == 1
>        && ...);
> or
>   bool bail_out_libcall = false;
>   if (endp == 1
>       && ...)
>     bail_out_libcall = true;
> ?
> The variable is not named very well, shouldn't it be avoid_libcall or
> something similar?  Perhaps the variable should have a comment describing
> what it is.  Do you need separate argument for that bool and
> is_move_done, rather than just the flag being that some pointer to bool is
> non-NULL?

Can you please explain me how to replace the 2 new arguments?

> 
>>    dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
>>  				     CALL_EXPR_TAILCALL (exp)
>>  				     && (endp == 0 || target == const0_rtx)
>>  				     ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
>>  				     expected_align, expected_size,
>> -				     min_size, max_size, probable_max_size);
>> +				     min_size, max_size, probable_max_size,
>> +				     bail_out_libcall, &is_move_done);
>> +
>> +  /* Bail out when a mempcpy call would be expanded as libcall and when
>> +     we have a target that provides a fast implementation
>> +     of mempcpy routine.  */
>> +  if (!is_move_done)
>> +    return NULL_RTX;
>>  
>>    if (dest_addr == 0)
>>      {
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -2733,6 +2733,23 @@ ix86_using_red_zone (void)
>>  	  && (!cfun->machine->has_local_indirect_jump
>>  	      || cfun->machine->indirect_branch_type == indirect_branch_keep));
>>  }
>> +
> 
> Missing function comment here.  For target hooks, usually there is a copy of
> the target.def comment, perhaps with further details on why it is overridden
> and what it does.
>> --- a/gcc/coretypes.h
>> +++ b/gcc/coretypes.h
>> @@ -384,6 +384,13 @@ enum excess_precision_type
>>    EXCESS_PRECISION_TYPE_FAST
>>  };
>>  
> 
> Missing comment describing what it is, plus it the enumerators are too
> generic, if it is libc_speed enum, perhaps LIBC_FAST_SPEED etc.?
>> --- a/gcc/targhooks.c
>> +++ b/gcc/targhooks.c
>> @@ -1642,6 +1642,12 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED)
>>    return false;
>>  }
>>  
> 
> Again, missing function comment.
> 
>> +enum libc_speed
>> +default_libc_func_speed (int)
>> +{
>> +  return UNKNOWN_SPEED;
>> +}
>> +
> 
>> --- a/gcc/testsuite/gcc.dg/string-opt-1.c
>> +++ b/gcc/testsuite/gcc.dg/string-opt-1.c
>> @@ -48,5 +48,5 @@ main (void)
>>    return 0;
>>  }
>>  
>> -/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */
>> -/* { dg-final { scan-assembler "memcpy" } } */
>> +/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { i?86-*-* x86_64-*-* } } } } */
>> +/* { dg-final { scan-assembler "memcpy" { target  { ! { i?86-*-* x86_64-*-* } } } } } */
> 
> First of all, I don't really understand this, I'd expect both the
> two old dg-final lines to be used as is for non-x86 targets and another two
> for x86_64, and more importantly, the target hook is only for glibc, not for
> musl/bionic etc., nor for non-linux, so you probably need some effective
> target for it for whether it is glibc rather than musl/bionic, and use
> ...-*-linux* and ...-*-gnu* etc. rather than ...-*-*.  Or tweak the dg-fianl
> patterns so that it succeeds on both variants of the target hook, but then
> does the test test anything at all?

I fixed that by preserving the old 2 old-finals and then I added a new for x86_64 target.
Apart from the comment above I've fixed all nits and mempcpy is not used when LHS == NULL.

Martin

> 
> 	Jakub
>
From 26979038ce9500015f957afd896146022a38490b Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Wed, 14 Mar 2018 09:44:18 +0100
Subject: [PATCH] Introduce new libc_func_speed target hook (PR
 middle-end/81657).

gcc/ChangeLog:

2018-03-14  Martin Liska  <mliska@suse.cz>

	PR middle-end/81657
	* builtins.c (expand_builtin_memory_copy_args): Handle situation
	when libc library provides a fast mempcpy implementation/
	* config/i386/i386-protos.h (gnu_libc_func_speed): New.
	* config/i386/i386.c (enum libc_speed): Likewise.
	(ix86_libc_func_speed): Likewise.
	(TARGET_LIBC_FUNC_SPEED): Likewise.
	* coretypes.h (enum libc_speed): Likewise.
	* doc/tm.texi: Document new target hook.
	* doc/tm.texi.in: Likewise.
	* expr.c (emit_block_move_hints): Handle libc bail out argument.
	* expr.h (emit_block_move_hints): Add new parameters.
	* target.def: Add new hook.
	* targhooks.c (enum libc_speed): New enum.
	(default_libc_func_speed): Provide a default hook
	implementation.
	* targhooks.h (default_libc_func_speed): Likewise.

gcc/testsuite/ChangeLog:

2018-03-14  Martin Liska  <mliska@suse.cz>

	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
	and others.
---
 gcc/builtins.c                      | 16 +++++++++++++++-
 gcc/config/i386/i386-protos.h       |  2 ++
 gcc/config/i386/i386.c              | 24 ++++++++++++++++++++++++
 gcc/coretypes.h                     |  7 +++++++
 gcc/doc/tm.texi                     |  4 ++++
 gcc/doc/tm.texi.in                  |  1 +
 gcc/expr.c                          | 16 +++++++++++++++-
 gcc/expr.h                          |  4 +++-
 gcc/target.def                      |  7 +++++++
 gcc/targhooks.c                     |  9 +++++++++
 gcc/targhooks.h                     |  1 +
 gcc/testsuite/gcc.dg/string-opt-1.c |  5 +++--
 12 files changed, 91 insertions(+), 5 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 85affa74510..b668045374c 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3651,13 +3651,27 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len,
   src_mem = get_memory_rtx (src, len);
   set_mem_align (src_mem, src_align);
 
+  bool is_move_done;
+
+  /* emit_block_move_hints can generate a library call to memcpy function.
+     In situations when a libc library provides fast implementation
+     of mempcpy, then it's better to call mempcpy directly.  */
+  bool avoid_libcall
+    = (endp == 1
+       && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == LIBC_FAST_SPEED
+       && target != const0_rtx);
+
   /* Copy word part most expediently.  */
   dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
 				     CALL_EXPR_TAILCALL (exp)
 				     && (endp == 0 || target == const0_rtx)
 				     ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
 				     expected_align, expected_size,
-				     min_size, max_size, probable_max_size);
+				     min_size, max_size, probable_max_size,
+				     avoid_libcall, &is_move_done);
+
+  if (!is_move_done)
+    return NULL_RTX;
 
   if (dest_addr == 0)
     {
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index ef7c818986f..d3fc515845b 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -47,6 +47,8 @@ extern void ix86_reset_previous_fndecl (void);
 
 extern bool ix86_using_red_zone (void);
 
+extern enum libc_speed gnu_libc_func_speed (int fn);
+
 #ifdef RTX_CODE
 extern int standard_80387_constant_p (rtx);
 extern const char *standard_80387_constant_opcode (rtx);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index af24c6ec5ba..f4cbecdf099 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2733,6 +2733,27 @@ ix86_using_red_zone (void)
 	  && (!cfun->machine->has_local_indirect_jump
 	      || cfun->machine->indirect_branch_type == indirect_branch_keep));
 }
+
+/* This hook determines whether a function from libc has a fast implementation
+   FN is present at the runtime.  We override it for i386 and glibc C library
+   as this combination provides fast implementation of mempcpy function.  */
+
+enum libc_speed
+ix86_libc_func_speed (int fn)
+{
+  enum built_in_function f = (built_in_function)fn;
+
+  if (!OPTION_GLIBC)
+    return LIBC_UNKNOWN_SPEED;
+
+  switch (f)
+    {
+      case BUILT_IN_MEMPCPY:
+	return LIBC_FAST_SPEED;
+      default:
+	return LIBC_UNKNOWN_SPEED;
+    }
+}
 
 /* Return a string that documents the current -m options.  The caller is
    responsible for freeing the string.  */
@@ -52061,6 +52082,9 @@ ix86_run_selftests (void)
 #undef TARGET_WARN_PARAMETER_PASSING_ABI
 #define TARGET_WARN_PARAMETER_PASSING_ABI ix86_warn_parameter_passing_abi
 
+#undef TARGET_LIBC_FUNC_SPEED
+#define TARGET_LIBC_FUNC_SPEED ix86_libc_func_speed
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 283b4eb33fe..fe618f708f4 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -384,6 +384,13 @@ enum excess_precision_type
   EXCESS_PRECISION_TYPE_FAST
 };
 
+enum libc_speed
+{
+  LIBC_FAST_SPEED,
+  LIBC_SLOW_SPEED,
+  LIBC_UNKNOWN_SPEED
+};
+
 /* Support for user-provided GGC and PCH markers.  The first parameter
    is a pointer to a pointer, the second a cookie.  */
 typedef void (*gt_pointer_operator) (void *, void *);
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index bd8b917ba82..0f7c91a22c4 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5501,6 +5501,10 @@ macro, a reasonable default is used.
 This hook determines whether a function from a class of functions
 @var{fn_class} is present at the runtime.
 @end deftypefn
+@deftypefn {Target Hook} libc_speed TARGET_LIBC_FUNC_SPEED (int @var{fn})
+This hook determines whether a function from libc has a fast implementation
+@var{fn} is present at the runtime.
+@end deftypefn
 
 @defmac NEXT_OBJC_RUNTIME
 Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index b0207146e8c..4bb2998a8a1 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3933,6 +3933,7 @@ macro, a reasonable default is used.
 @end defmac
 
 @hook TARGET_LIBC_HAS_FUNCTION
+@hook TARGET_LIBC_FUNC_SPEED
 
 @defmac NEXT_OBJC_RUNTIME
 Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
diff --git a/gcc/expr.c b/gcc/expr.c
index 00660293f72..b6c13652d79 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -1554,6 +1554,8 @@ compare_by_pieces (rtx arg0, rtx arg1, unsigned HOST_WIDE_INT len,
    MIN_SIZE is the minimal size of block to move
    MAX_SIZE is the maximal size of block to move, if it can not be represented
    in unsigned HOST_WIDE_INT, than it is mask of all ones.
+   If BAIL_OUT_LIBCALL is set true, do not emit library call and set
+   *IS_MOVE_DONE to false.
 
    Return the address of the new block, if memcpy is called and returns it,
    0 otherwise.  */
@@ -1563,12 +1565,17 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 		       unsigned int expected_align, HOST_WIDE_INT expected_size,
 		       unsigned HOST_WIDE_INT min_size,
 		       unsigned HOST_WIDE_INT max_size,
-		       unsigned HOST_WIDE_INT probable_max_size)
+		       unsigned HOST_WIDE_INT probable_max_size,
+		       bool bail_out_libcall, bool *is_move_done)
 {
   bool may_use_call;
   rtx retval = 0;
   unsigned int align;
 
+  /* When not doing a bail out, we always emit a memory move.  */
+  if (is_move_done)
+    *is_move_done = true;
+
   gcc_assert (size);
   if (CONST_INT_P (size) && INTVAL (size) == 0)
     return 0;
@@ -1625,6 +1632,13 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y)))
     {
+      if (bail_out_libcall)
+	{
+	  if (is_move_done)
+	    *is_move_done = false;
+	  return retval;
+	}
+
       /* Since x and y are passed to a libcall, mark the corresponding
 	 tree EXPR as addressable.  */
       tree y_expr = MEM_EXPR (y);
diff --git a/gcc/expr.h b/gcc/expr.h
index b3d523bcb24..023bc5aec47 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -110,7 +110,9 @@ extern rtx emit_block_move_hints (rtx, rtx, rtx, enum block_op_methods,
 			          unsigned int, HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
-				  unsigned HOST_WIDE_INT);
+				  unsigned HOST_WIDE_INT,
+				  bool bail_out_libcall = false,
+				  bool *is_move_done = NULL);
 extern rtx emit_block_cmp_hints (rtx, rtx, rtx, tree, rtx, bool,
 				 by_pieces_constfn, void *);
 extern bool emit_storent_insn (rtx to, rtx from);
diff --git a/gcc/target.def b/gcc/target.def
index c5b2a1e7e71..3bbddc82776 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2639,6 +2639,13 @@ DEFHOOK
  bool, (enum function_class fn_class),
  default_libc_has_function)
 
+DEFHOOK
+(libc_func_speed,
+ "This hook determines whether a function from libc has a fast implementation\n\
+@var{fn} is present at the runtime.",
+ libc_speed, (int fn),
+ default_libc_func_speed)
+
 /* True if new jumps cannot be created, to replace existing ones or
    not, at the current point in the compilation.  */
 DEFHOOK
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index fafcc6c5196..6e44f6f79cf 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1642,6 +1642,15 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED)
   return false;
 }
 
+/* This hook determines whether a function from libc has a fast implementation
+   FN is present at the runtime.  */
+
+enum libc_speed
+default_libc_func_speed (int)
+{
+  return LIBC_UNKNOWN_SPEED;
+}
+
 tree
 default_builtin_tm_load_store (tree ARG_UNUSED (type))
 {
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 8a4393f2ba4..7508673ad0a 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -205,6 +205,7 @@ extern bool default_have_conditional_execution (void);
 extern bool default_libc_has_function (enum function_class);
 extern bool no_c99_libc_has_function (enum function_class);
 extern bool gnu_libc_has_function (enum function_class);
+extern enum libc_speed default_libc_func_speed (int);
 
 extern tree default_builtin_tm_load_store (tree);
 
diff --git a/gcc/testsuite/gcc.dg/string-opt-1.c b/gcc/testsuite/gcc.dg/string-opt-1.c
index 2f060732bf0..7faaadcbb1f 100644
--- a/gcc/testsuite/gcc.dg/string-opt-1.c
+++ b/gcc/testsuite/gcc.dg/string-opt-1.c
@@ -48,5 +48,6 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */
-/* { dg-final { scan-assembler "memcpy" } } */
+/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */
+/* { dg-final { scan-assembler "memcpy" { target  { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */
+/* { dg-final { scan-assembler "mempcpy" { target  { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } */
Martin Liška March 18, 2018, 10:17 p.m. UTC | #15
PING^1

On 03/14/2018 02:54 PM, Martin Liška wrote:
> On 03/14/2018 02:07 PM, Jakub Jelinek wrote:
>> On Wed, Mar 14, 2018 at 01:54:39PM +0100, Martin Liška wrote:
>>> --- a/gcc/builtins.c
>>> +++ b/gcc/builtins.c
>>> @@ -3651,13 +3651,24 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len,
>>>     src_mem = get_memory_rtx (src, len);
>>>     set_mem_align (src_mem, src_align);
>>>   
>>> +  bool is_move_done;
>>> +
>>>     /* Copy word part most expediently.  */
>>
>> This comment supposedly belongs right above the emit_block_move_hints call.
>>
>>> +  bool bail_out_libcall = endp == 1
>>> +    && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == FAST_SPEED;
>>
>> Formatting.  && belongs below endp.  So either:
>>    bool bail_out_libcall
>>      = (endp == 1
>>         && ...);
>> or
>>    bool bail_out_libcall = false;
>>    if (endp == 1
>>        && ...)
>>      bail_out_libcall = true;
>> ?
>> The variable is not named very well, shouldn't it be avoid_libcall or
>> something similar?  Perhaps the variable should have a comment describing
>> what it is.  Do you need separate argument for that bool and
>> is_move_done, rather than just the flag being that some pointer to bool is
>> non-NULL?
> 
> Can you please explain me how to replace the 2 new arguments?
> 
>>
>>>     dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
>>>   				     CALL_EXPR_TAILCALL (exp)
>>>   				     && (endp == 0 || target == const0_rtx)
>>>   				     ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
>>>   				     expected_align, expected_size,
>>> -				     min_size, max_size, probable_max_size);
>>> +				     min_size, max_size, probable_max_size,
>>> +				     bail_out_libcall, &is_move_done);
>>> +
>>> +  /* Bail out when a mempcpy call would be expanded as libcall and when
>>> +     we have a target that provides a fast implementation
>>> +     of mempcpy routine.  */
>>> +  if (!is_move_done)
>>> +    return NULL_RTX;
>>>   
>>>     if (dest_addr == 0)
>>>       {
>>> --- a/gcc/config/i386/i386.c
>>> +++ b/gcc/config/i386/i386.c
>>> @@ -2733,6 +2733,23 @@ ix86_using_red_zone (void)
>>>   	  && (!cfun->machine->has_local_indirect_jump
>>>   	      || cfun->machine->indirect_branch_type == indirect_branch_keep));
>>>   }
>>> +
>>
>> Missing function comment here.  For target hooks, usually there is a copy of
>> the target.def comment, perhaps with further details on why it is overridden
>> and what it does.
>>> --- a/gcc/coretypes.h
>>> +++ b/gcc/coretypes.h
>>> @@ -384,6 +384,13 @@ enum excess_precision_type
>>>     EXCESS_PRECISION_TYPE_FAST
>>>   };
>>>   
>>
>> Missing comment describing what it is, plus it the enumerators are too
>> generic, if it is libc_speed enum, perhaps LIBC_FAST_SPEED etc.?
>>> --- a/gcc/targhooks.c
>>> +++ b/gcc/targhooks.c
>>> @@ -1642,6 +1642,12 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED)
>>>     return false;
>>>   }
>>>   
>>
>> Again, missing function comment.
>>
>>> +enum libc_speed
>>> +default_libc_func_speed (int)
>>> +{
>>> +  return UNKNOWN_SPEED;
>>> +}
>>> +
>>
>>> --- a/gcc/testsuite/gcc.dg/string-opt-1.c
>>> +++ b/gcc/testsuite/gcc.dg/string-opt-1.c
>>> @@ -48,5 +48,5 @@ main (void)
>>>     return 0;
>>>   }
>>>   
>>> -/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */
>>> -/* { dg-final { scan-assembler "memcpy" } } */
>>> +/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { i?86-*-* x86_64-*-* } } } } */
>>> +/* { dg-final { scan-assembler "memcpy" { target  { ! { i?86-*-* x86_64-*-* } } } } } */
>>
>> First of all, I don't really understand this, I'd expect both the
>> two old dg-final lines to be used as is for non-x86 targets and another two
>> for x86_64, and more importantly, the target hook is only for glibc, not for
>> musl/bionic etc., nor for non-linux, so you probably need some effective
>> target for it for whether it is glibc rather than musl/bionic, and use
>> ...-*-linux* and ...-*-gnu* etc. rather than ...-*-*.  Or tweak the dg-fianl
>> patterns so that it succeeds on both variants of the target hook, but then
>> does the test test anything at all?
> 
> I fixed that by preserving the old 2 old-finals and then I added a new for x86_64 target.
> Apart from the comment above I've fixed all nits and mempcpy is not used when LHS == NULL.
> 
> Martin
> 
>>
>> 	Jakub
>>
>
Jakub Jelinek March 21, 2018, 10:34 a.m. UTC | #16
On Wed, Mar 14, 2018 at 02:54:00PM +0100, Martin Liška wrote:
> > The variable is not named very well, shouldn't it be avoid_libcall or
> > something similar?  Perhaps the variable should have a comment describing
> > what it is.  Do you need separate argument for that bool and
> > is_move_done, rather than just the flag being that some pointer to bool is
> > non-NULL?
> 
> Can you please explain me how to replace the 2 new arguments?

So you have one bool arg and one pointer arg into which the function
stores true and optionally based on that other bool arg and other conditions
stores false.
I'm suggesting just one bool * argument, which is NULL if the bool arg would
be false, and non-NULL otherwise.  The single argument still could be
called bool *avoid_libcall, and you'd just
if (avoid_libcall) { *avoid_libcall = true; return retval; }
instead of emitting a libcall, the caller would initialize the bool
variable to false.

	Jakub
Martin Liška March 28, 2018, 2:18 p.m. UTC | #17
On 03/21/2018 11:34 AM, Jakub Jelinek wrote:
> On Wed, Mar 14, 2018 at 02:54:00PM +0100, Martin Liška wrote:
>>> The variable is not named very well, shouldn't it be avoid_libcall or
>>> something similar?  Perhaps the variable should have a comment describing
>>> what it is.  Do you need separate argument for that bool and
>>> is_move_done, rather than just the flag being that some pointer to bool is
>>> non-NULL?
>>
>> Can you please explain me how to replace the 2 new arguments?
> 
> So you have one bool arg and one pointer arg into which the function
> stores true and optionally based on that other bool arg and other conditions
> stores false.
> I'm suggesting just one bool * argument, which is NULL if the bool arg would
> be false, and non-NULL otherwise.  The single argument still could be
> called bool *avoid_libcall, and you'd just
> if (avoid_libcall) { *avoid_libcall = true; return retval; }
> instead of emitting a libcall, the caller would initialize the bool
> variable to false.
> 
> 	Jakub
> 

Got it. I'm sending updated version of the patch. Hope I've had enough fantasy to
write it nice.

Tested on both ppc64le and x86_64.

Martin
From d766330364aa2a23512f7d4e60491b634c5d0523 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Wed, 14 Mar 2018 09:44:18 +0100
Subject: [PATCH] Introduce new libc_func_speed target hook (PR
 middle-end/81657).

gcc/ChangeLog:

2018-03-14  Martin Liska  <mliska@suse.cz>

	PR middle-end/81657
	* builtins.c (expand_builtin_memory_copy_args): Handle situation
	when libc library provides a fast mempcpy implementation/
	* config/i386/i386-protos.h (gnu_libc_func_speed): New.
	* config/i386/i386.c (enum libc_speed): Likewise.
	(ix86_libc_func_speed): Likewise.
	(TARGET_LIBC_FUNC_SPEED): Likewise.
	* coretypes.h (enum libc_speed): Likewise.
	* doc/tm.texi: Document new target hook.
	* doc/tm.texi.in: Likewise.
	* expr.c (emit_block_move_hints): Handle libc bail out argument.
	* expr.h (emit_block_move_hints): Add new parameters.
	* target.def: Add new hook.
	* targhooks.c (enum libc_speed): New enum.
	(default_libc_func_speed): Provide a default hook
	implementation.
	* targhooks.h (default_libc_func_speed): Likewise.

gcc/testsuite/ChangeLog:

2018-03-14  Martin Liska  <mliska@suse.cz>

	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
	and others.
---
 gcc/builtins.c                      | 15 ++++++++++++++-
 gcc/config/i386/i386-protos.h       |  2 ++
 gcc/config/i386/i386.c              | 24 ++++++++++++++++++++++++
 gcc/coretypes.h                     |  7 +++++++
 gcc/doc/tm.texi                     |  4 ++++
 gcc/doc/tm.texi.in                  |  1 +
 gcc/expr.c                          | 11 ++++++++++-
 gcc/expr.h                          |  3 ++-
 gcc/target.def                      |  7 +++++++
 gcc/targhooks.c                     |  9 +++++++++
 gcc/targhooks.h                     |  1 +
 gcc/testsuite/gcc.dg/string-opt-1.c |  5 +++--
 12 files changed, 84 insertions(+), 5 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 487d9d58db2..d3cd93ffbfa 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3651,13 +3651,26 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len,
   src_mem = get_memory_rtx (src, len);
   set_mem_align (src_mem, src_align);
 
+  /* emit_block_move_hints can generate a library call to memcpy function.
+     In situations when a libc library provides fast implementation
+     of mempcpy, then it's better to call mempcpy directly.  */
+  bool avoid_libcall
+    = (endp == 1
+       && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == LIBC_FAST_SPEED
+       && target != const0_rtx);
+
   /* Copy word part most expediently.  */
+  bool libcall_avoided = false;
   dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
 				     CALL_EXPR_TAILCALL (exp)
 				     && (endp == 0 || target == const0_rtx)
 				     ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
 				     expected_align, expected_size,
-				     min_size, max_size, probable_max_size);
+				     min_size, max_size, probable_max_size,
+				     avoid_libcall ? &libcall_avoided: NULL);
+
+  if (libcall_avoided)
+    return NULL_RTX;
 
   if (dest_addr == 0)
     {
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index ef7c818986f..d3fc515845b 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -47,6 +47,8 @@ extern void ix86_reset_previous_fndecl (void);
 
 extern bool ix86_using_red_zone (void);
 
+extern enum libc_speed gnu_libc_func_speed (int fn);
+
 #ifdef RTX_CODE
 extern int standard_80387_constant_p (rtx);
 extern const char *standard_80387_constant_opcode (rtx);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b4f6aec1434..aadf9fa5ac3 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -2735,6 +2735,27 @@ ix86_using_red_zone (void)
 	  && (!cfun->machine->has_local_indirect_jump
 	      || cfun->machine->indirect_branch_type == indirect_branch_keep));
 }
+
+/* This hook determines whether a function from libc has a fast implementation
+   FN is present at the runtime.  We override it for i386 and glibc C library
+   as this combination provides fast implementation of mempcpy function.  */
+
+enum libc_speed
+ix86_libc_func_speed (int fn)
+{
+  enum built_in_function f = (built_in_function)fn;
+
+  if (!OPTION_GLIBC)
+    return LIBC_UNKNOWN_SPEED;
+
+  switch (f)
+    {
+      case BUILT_IN_MEMPCPY:
+	return LIBC_FAST_SPEED;
+      default:
+	return LIBC_UNKNOWN_SPEED;
+    }
+}
 
 /* Return a string that documents the current -m options.  The caller is
    responsible for freeing the string.  */
@@ -52105,6 +52126,9 @@ ix86_run_selftests (void)
 #undef TARGET_WARN_PARAMETER_PASSING_ABI
 #define TARGET_WARN_PARAMETER_PASSING_ABI ix86_warn_parameter_passing_abi
 
+#undef TARGET_LIBC_FUNC_SPEED
+#define TARGET_LIBC_FUNC_SPEED ix86_libc_func_speed
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 283b4eb33fe..fe618f708f4 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -384,6 +384,13 @@ enum excess_precision_type
   EXCESS_PRECISION_TYPE_FAST
 };
 
+enum libc_speed
+{
+  LIBC_FAST_SPEED,
+  LIBC_SLOW_SPEED,
+  LIBC_UNKNOWN_SPEED
+};
+
 /* Support for user-provided GGC and PCH markers.  The first parameter
    is a pointer to a pointer, the second a cookie.  */
 typedef void (*gt_pointer_operator) (void *, void *);
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index bd8b917ba82..0f7c91a22c4 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5501,6 +5501,10 @@ macro, a reasonable default is used.
 This hook determines whether a function from a class of functions
 @var{fn_class} is present at the runtime.
 @end deftypefn
+@deftypefn {Target Hook} libc_speed TARGET_LIBC_FUNC_SPEED (int @var{fn})
+This hook determines whether a function from libc has a fast implementation
+@var{fn} is present at the runtime.
+@end deftypefn
 
 @defmac NEXT_OBJC_RUNTIME
 Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index b0207146e8c..4bb2998a8a1 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3933,6 +3933,7 @@ macro, a reasonable default is used.
 @end defmac
 
 @hook TARGET_LIBC_HAS_FUNCTION
+@hook TARGET_LIBC_FUNC_SPEED
 
 @defmac NEXT_OBJC_RUNTIME
 Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
diff --git a/gcc/expr.c b/gcc/expr.c
index 00660293f72..f3bd698bc4d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -1554,6 +1554,8 @@ compare_by_pieces (rtx arg0, rtx arg1, unsigned HOST_WIDE_INT len,
    MIN_SIZE is the minimal size of block to move
    MAX_SIZE is the maximal size of block to move, if it can not be represented
    in unsigned HOST_WIDE_INT, than it is mask of all ones.
+   If BAIL_OUT_LIBCALL is non-null, do not emit library call and assign
+   true to the pointer when move is not done.
 
    Return the address of the new block, if memcpy is called and returns it,
    0 otherwise.  */
@@ -1563,7 +1565,8 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 		       unsigned int expected_align, HOST_WIDE_INT expected_size,
 		       unsigned HOST_WIDE_INT min_size,
 		       unsigned HOST_WIDE_INT max_size,
-		       unsigned HOST_WIDE_INT probable_max_size)
+		       unsigned HOST_WIDE_INT probable_max_size,
+		       bool *bail_out_libcall)
 {
   bool may_use_call;
   rtx retval = 0;
@@ -1625,6 +1628,12 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y)))
     {
+      if (bail_out_libcall)
+	{
+	  *bail_out_libcall = true;
+	  return retval;
+	}
+
       /* Since x and y are passed to a libcall, mark the corresponding
 	 tree EXPR as addressable.  */
       tree y_expr = MEM_EXPR (y);
diff --git a/gcc/expr.h b/gcc/expr.h
index b3d523bcb24..c2bf87fd14e 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -110,7 +110,8 @@ extern rtx emit_block_move_hints (rtx, rtx, rtx, enum block_op_methods,
 			          unsigned int, HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
-				  unsigned HOST_WIDE_INT);
+				  unsigned HOST_WIDE_INT,
+				  bool *bail_out_libcall = NULL);
 extern rtx emit_block_cmp_hints (rtx, rtx, rtx, tree, rtx, bool,
 				 by_pieces_constfn, void *);
 extern bool emit_storent_insn (rtx to, rtx from);
diff --git a/gcc/target.def b/gcc/target.def
index c5b2a1e7e71..3bbddc82776 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2639,6 +2639,13 @@ DEFHOOK
  bool, (enum function_class fn_class),
  default_libc_has_function)
 
+DEFHOOK
+(libc_func_speed,
+ "This hook determines whether a function from libc has a fast implementation\n\
+@var{fn} is present at the runtime.",
+ libc_speed, (int fn),
+ default_libc_func_speed)
+
 /* True if new jumps cannot be created, to replace existing ones or
    not, at the current point in the compilation.  */
 DEFHOOK
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index fafcc6c5196..6e44f6f79cf 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1642,6 +1642,15 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED)
   return false;
 }
 
+/* This hook determines whether a function from libc has a fast implementation
+   FN is present at the runtime.  */
+
+enum libc_speed
+default_libc_func_speed (int)
+{
+  return LIBC_UNKNOWN_SPEED;
+}
+
 tree
 default_builtin_tm_load_store (tree ARG_UNUSED (type))
 {
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 8a4393f2ba4..7508673ad0a 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -205,6 +205,7 @@ extern bool default_have_conditional_execution (void);
 extern bool default_libc_has_function (enum function_class);
 extern bool no_c99_libc_has_function (enum function_class);
 extern bool gnu_libc_has_function (enum function_class);
+extern enum libc_speed default_libc_func_speed (int);
 
 extern tree default_builtin_tm_load_store (tree);
 
diff --git a/gcc/testsuite/gcc.dg/string-opt-1.c b/gcc/testsuite/gcc.dg/string-opt-1.c
index 2f060732bf0..7faaadcbb1f 100644
--- a/gcc/testsuite/gcc.dg/string-opt-1.c
+++ b/gcc/testsuite/gcc.dg/string-opt-1.c
@@ -48,5 +48,6 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */
-/* { dg-final { scan-assembler "memcpy" } } */
+/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */
+/* { dg-final { scan-assembler "memcpy" { target  { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */
+/* { dg-final { scan-assembler "mempcpy" { target  { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } */
Jakub Jelinek March 28, 2018, 2:31 p.m. UTC | #18
On Wed, Mar 28, 2018 at 04:18:45PM +0200, Martin Liška wrote:
> 2018-03-14  Martin Liska  <mliska@suse.cz>
> 
> 	PR middle-end/81657
> 	* builtins.c (expand_builtin_memory_copy_args): Handle situation
> 	when libc library provides a fast mempcpy implementation/
> 	* config/i386/i386-protos.h (gnu_libc_func_speed): New.
> 	* config/i386/i386.c (enum libc_speed): Likewise.
> 	(ix86_libc_func_speed): Likewise.
> 	(TARGET_LIBC_FUNC_SPEED): Likewise.
> 	* coretypes.h (enum libc_speed): Likewise.
> 	* doc/tm.texi: Document new target hook.
> 	* doc/tm.texi.in: Likewise.
> 	* expr.c (emit_block_move_hints): Handle libc bail out argument.
> 	* expr.h (emit_block_move_hints): Add new parameters.
> 	* target.def: Add new hook.
> 	* targhooks.c (enum libc_speed): New enum.
> 	(default_libc_func_speed): Provide a default hook
> 	implementation.
> 	* targhooks.h (default_libc_func_speed): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-03-14  Martin Liska  <mliska@suse.cz>
> 
> 	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
> 	and others.
>    /* Copy word part most expediently.  */
> +  bool libcall_avoided = false;
>    dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
>  				     CALL_EXPR_TAILCALL (exp)
>  				     && (endp == 0 || target == const0_rtx)
>  				     ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
>  				     expected_align, expected_size,
> -				     min_size, max_size, probable_max_size);
> +				     min_size, max_size, probable_max_size,
> +				     avoid_libcall ? &libcall_avoided: NULL);

Missing space before :.
> +
> +  if (libcall_avoided)
> +    return NULL_RTX;
>  
> --- a/gcc/config/i386/i386-protos.h
> +++ b/gcc/config/i386/i386-protos.h
> @@ -47,6 +47,8 @@ extern void ix86_reset_previous_fndecl (void);
>  
>  extern bool ix86_using_red_zone (void);
>  
> +extern enum libc_speed gnu_libc_func_speed (int fn);

Here you declare gnu_libc_func_speed.

> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -2735,6 +2735,27 @@ ix86_using_red_zone (void)
>  	  && (!cfun->machine->has_local_indirect_jump
>  	      || cfun->machine->indirect_branch_type == indirect_branch_keep));
>  }
> +
> +/* This hook determines whether a function from libc has a fast implementation
> +   FN is present at the runtime.  We override it for i386 and glibc C library
> +   as this combination provides fast implementation of mempcpy function.  */
> +
> +enum libc_speed
> +ix86_libc_func_speed (int fn)

But define a different function.

> +{
> +  enum built_in_function f = (built_in_function)fn;
> +
> +  if (!OPTION_GLIBC)
> +    return LIBC_UNKNOWN_SPEED;

OPTION_GLIBC is only defined if linux.h is included, so I think you break
all other x86 targets this way.  The hook should have linux in the name,
perhaps defined only in config/i386/linux.h and redefined in i386.c through
#ifdef SUBTARGET_LIBC_FUNC_SPEED
#undef TARGET_LIBC_FUNC_SPEED
#define TARGET_LIBC_FUNC_SPEED SUBTARGET_LIBC_FUNC_SPEED
#endif
or something similar.

Otherwise LGTM, but please get approval also from Richi (for the generic part) and
Uros (for the backend side).

	Jakub
Martin Liška March 28, 2018, 4:30 p.m. UTC | #19
On 03/28/2018 04:31 PM, Jakub Jelinek wrote:
> On Wed, Mar 28, 2018 at 04:18:45PM +0200, Martin Liška wrote:
>> 2018-03-14  Martin Liska  <mliska@suse.cz>
>>
>> 	PR middle-end/81657
>> 	* builtins.c (expand_builtin_memory_copy_args): Handle situation
>> 	when libc library provides a fast mempcpy implementation/
>> 	* config/i386/i386-protos.h (gnu_libc_func_speed): New.
>> 	* config/i386/i386.c (enum libc_speed): Likewise.
>> 	(ix86_libc_func_speed): Likewise.
>> 	(TARGET_LIBC_FUNC_SPEED): Likewise.
>> 	* coretypes.h (enum libc_speed): Likewise.
>> 	* doc/tm.texi: Document new target hook.
>> 	* doc/tm.texi.in: Likewise.
>> 	* expr.c (emit_block_move_hints): Handle libc bail out argument.
>> 	* expr.h (emit_block_move_hints): Add new parameters.
>> 	* target.def: Add new hook.
>> 	* targhooks.c (enum libc_speed): New enum.
>> 	(default_libc_func_speed): Provide a default hook
>> 	implementation.
>> 	* targhooks.h (default_libc_func_speed): Likewise.
>>
>> gcc/testsuite/ChangeLog:
>>
>> 2018-03-14  Martin Liska  <mliska@suse.cz>
>>
>> 	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
>> 	and others.
>>     /* Copy word part most expediently.  */
>> +  bool libcall_avoided = false;
>>     dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
>>   				     CALL_EXPR_TAILCALL (exp)
>>   				     && (endp == 0 || target == const0_rtx)
>>   				     ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
>>   				     expected_align, expected_size,
>> -				     min_size, max_size, probable_max_size);
>> +				     min_size, max_size, probable_max_size,
>> +				     avoid_libcall ? &libcall_avoided: NULL);
> 
> Missing space before :.

Fixed.

>> +
>> +  if (libcall_avoided)
>> +    return NULL_RTX;
>>   
>> --- a/gcc/config/i386/i386-protos.h
>> +++ b/gcc/config/i386/i386-protos.h
>> @@ -47,6 +47,8 @@ extern void ix86_reset_previous_fndecl (void);
>>   
>>   extern bool ix86_using_red_zone (void);
>>   
>> +extern enum libc_speed gnu_libc_func_speed (int fn);
> 
> Here you declare gnu_libc_func_speed.
> 
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -2735,6 +2735,27 @@ ix86_using_red_zone (void)
>>   	  && (!cfun->machine->has_local_indirect_jump
>>   	      || cfun->machine->indirect_branch_type == indirect_branch_keep));
>>   }
>> +
>> +/* This hook determines whether a function from libc has a fast implementation
>> +   FN is present at the runtime.  We override it for i386 and glibc C library
>> +   as this combination provides fast implementation of mempcpy function.  */
>> +
>> +enum libc_speed
>> +ix86_libc_func_speed (int fn)
> 
> But define a different function.

You are right.

> 
>> +{
>> +  enum built_in_function f = (built_in_function)fn;
>> +
>> +  if (!OPTION_GLIBC)
>> +    return LIBC_UNKNOWN_SPEED;
> 
> OPTION_GLIBC is only defined if linux.h is included, so I think you break
> all other x86 targets this way.

That said the proper way is probably to define the function in linux64.c
(should we also use it on i386, or nobody cares?). And corresponding declaration
will be in config/linux-protos.h.

The hook should have linux in the name,
> perhaps defined only in config/i386/linux.h and redefined in i386.c through
> #ifdef SUBTARGET_LIBC_FUNC_SPEED
> #undef TARGET_LIBC_FUNC_SPEED
> #define TARGET_LIBC_FUNC_SPEED SUBTARGET_LIBC_FUNC_SPEED
> #endif
> or something similar.

I adopted this suggested mechanism.
Thanks for review.

Martin

> 
> Otherwise LGTM, but please get approval also from Richi (for the generic part) and
> Uros (for the backend side).
> 
> 	Jakub
>
From 4378fa93aafac3d28eaae48c9a41a2f2ef5e1d18 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Wed, 14 Mar 2018 09:44:18 +0100
Subject: [PATCH] Introduce new libc_func_speed target hook (PR
 middle-end/81657).

gcc/ChangeLog:

2018-03-14  Martin Liska  <mliska@suse.cz>

	PR middle-end/81657
	* builtins.c (expand_builtin_memory_copy_args): Handle situation
	when libc library provides a fast mempcpy implementation/
	* config/linux-protos.h (ix86_linux_libc_func_speed): New.
	* config/linux.c (ix86_linux_libc_func_speed): Likewise.
	(TARGET_LIBC_FUNC_SPEED): Likewise.
	* config/i386/linux64.h (SUBTARGET_LIBC_FUNC_SPEED): Define
	macro.
	* coretypes.h (enum libc_speed): Likewise.
	* doc/tm.texi: Document new target hook.
	* doc/tm.texi.in: Likewise.
	* expr.c (emit_block_move_hints): Handle libc bail out argument.
	* expr.h (emit_block_move_hints): Add new parameters.
	* target.def: Add new hook.
	* targhooks.c (enum libc_speed): New enum.
	(default_libc_func_speed): Provide a default hook
	implementation.
	* targhooks.h (default_libc_func_speed): Likewise.

gcc/testsuite/ChangeLog:

2018-03-28  Martin Liska  <mliska@suse.cz>

	* gcc.dg/string-opt-1.c:

gcc/testsuite/ChangeLog:

2018-03-14  Martin Liska  <mliska@suse.cz>

	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
	and others.
---
 gcc/builtins.c                      | 15 ++++++++++++++-
 gcc/config/i386/i386.c              |  5 +++++
 gcc/config/i386/linux64.h           |  2 ++
 gcc/config/linux-protos.h           |  1 +
 gcc/config/linux.c                  | 21 +++++++++++++++++++++
 gcc/coretypes.h                     |  7 +++++++
 gcc/doc/tm.texi                     |  4 ++++
 gcc/doc/tm.texi.in                  |  1 +
 gcc/expr.c                          | 11 ++++++++++-
 gcc/expr.h                          |  3 ++-
 gcc/target.def                      |  7 +++++++
 gcc/targhooks.c                     |  9 +++++++++
 gcc/targhooks.h                     |  1 +
 gcc/testsuite/gcc.dg/string-opt-1.c |  5 +++--
 14 files changed, 87 insertions(+), 5 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 487d9d58db2..98ee3fb272d 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3651,13 +3651,26 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len,
   src_mem = get_memory_rtx (src, len);
   set_mem_align (src_mem, src_align);
 
+  /* emit_block_move_hints can generate a library call to memcpy function.
+     In situations when a libc library provides fast implementation
+     of mempcpy, then it's better to call mempcpy directly.  */
+  bool avoid_libcall
+    = (endp == 1
+       && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == LIBC_FAST_SPEED
+       && target != const0_rtx);
+
   /* Copy word part most expediently.  */
+  bool libcall_avoided = false;
   dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
 				     CALL_EXPR_TAILCALL (exp)
 				     && (endp == 0 || target == const0_rtx)
 				     ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
 				     expected_align, expected_size,
-				     min_size, max_size, probable_max_size);
+				     min_size, max_size, probable_max_size,
+				     avoid_libcall ? &libcall_avoided : NULL);
+
+  if (libcall_avoided)
+    return NULL_RTX;
 
   if (dest_addr == 0)
     {
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b4f6aec1434..2471ff7b99a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -52105,6 +52105,11 @@ ix86_run_selftests (void)
 #undef TARGET_WARN_PARAMETER_PASSING_ABI
 #define TARGET_WARN_PARAMETER_PASSING_ABI ix86_warn_parameter_passing_abi
 
+#ifdef SUBTARGET_LIBC_FUNC_SPEED
+#undef TARGET_LIBC_FUNC_SPEED
+#define TARGET_LIBC_FUNC_SPEED SUBTARGET_LIBC_FUNC_SPEED
+#endif
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests
diff --git a/gcc/config/i386/linux64.h b/gcc/config/i386/linux64.h
index f2d913e30ac..d855f5cc239 100644
--- a/gcc/config/i386/linux64.h
+++ b/gcc/config/i386/linux64.h
@@ -37,3 +37,5 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define MUSL_DYNAMIC_LINKER64 "/lib/ld-musl-x86_64.so.1"
 #undef MUSL_DYNAMIC_LINKERX32
 #define MUSL_DYNAMIC_LINKERX32 "/lib/ld-musl-x32.so.1"
+
+#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed
diff --git a/gcc/config/linux-protos.h b/gcc/config/linux-protos.h
index 9da8dd7ecaa..b7284735366 100644
--- a/gcc/config/linux-protos.h
+++ b/gcc/config/linux-protos.h
@@ -20,3 +20,4 @@ along with GCC; see the file COPYING3.  If not see
 extern bool linux_has_ifunc_p (void);
 
 extern bool linux_libc_has_function (enum function_class fn_class);
+extern enum libc_speed ix86_linux_libc_func_speed (int fn);
diff --git a/gcc/config/linux.c b/gcc/config/linux.c
index bc06e54a73d..b878d51e4b4 100644
--- a/gcc/config/linux.c
+++ b/gcc/config/linux.c
@@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class)
 
   return false;
 }
+
+/* This hook determines whether a function from libc has a fast implementation
+   FN is present at the runtime.  We override it for i386 and glibc C library
+   as this combination provides fast implementation of mempcpy function.  */
+
+enum libc_speed
+ix86_linux_libc_func_speed (int fn)
+{
+  enum built_in_function f = (built_in_function)fn;
+
+  if (!OPTION_GLIBC)
+    return LIBC_UNKNOWN_SPEED;
+
+  switch (f)
+    {
+      case BUILT_IN_MEMPCPY:
+	return LIBC_FAST_SPEED;
+      default:
+	return LIBC_UNKNOWN_SPEED;
+    }
+}
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 283b4eb33fe..fe618f708f4 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -384,6 +384,13 @@ enum excess_precision_type
   EXCESS_PRECISION_TYPE_FAST
 };
 
+enum libc_speed
+{
+  LIBC_FAST_SPEED,
+  LIBC_SLOW_SPEED,
+  LIBC_UNKNOWN_SPEED
+};
+
 /* Support for user-provided GGC and PCH markers.  The first parameter
    is a pointer to a pointer, the second a cookie.  */
 typedef void (*gt_pointer_operator) (void *, void *);
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index bd8b917ba82..0f7c91a22c4 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5501,6 +5501,10 @@ macro, a reasonable default is used.
 This hook determines whether a function from a class of functions
 @var{fn_class} is present at the runtime.
 @end deftypefn
+@deftypefn {Target Hook} libc_speed TARGET_LIBC_FUNC_SPEED (int @var{fn})
+This hook determines whether a function from libc has a fast implementation
+@var{fn} is present at the runtime.
+@end deftypefn
 
 @defmac NEXT_OBJC_RUNTIME
 Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index b0207146e8c..4bb2998a8a1 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3933,6 +3933,7 @@ macro, a reasonable default is used.
 @end defmac
 
 @hook TARGET_LIBC_HAS_FUNCTION
+@hook TARGET_LIBC_FUNC_SPEED
 
 @defmac NEXT_OBJC_RUNTIME
 Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
diff --git a/gcc/expr.c b/gcc/expr.c
index 00660293f72..f3bd698bc4d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -1554,6 +1554,8 @@ compare_by_pieces (rtx arg0, rtx arg1, unsigned HOST_WIDE_INT len,
    MIN_SIZE is the minimal size of block to move
    MAX_SIZE is the maximal size of block to move, if it can not be represented
    in unsigned HOST_WIDE_INT, than it is mask of all ones.
+   If BAIL_OUT_LIBCALL is non-null, do not emit library call and assign
+   true to the pointer when move is not done.
 
    Return the address of the new block, if memcpy is called and returns it,
    0 otherwise.  */
@@ -1563,7 +1565,8 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 		       unsigned int expected_align, HOST_WIDE_INT expected_size,
 		       unsigned HOST_WIDE_INT min_size,
 		       unsigned HOST_WIDE_INT max_size,
-		       unsigned HOST_WIDE_INT probable_max_size)
+		       unsigned HOST_WIDE_INT probable_max_size,
+		       bool *bail_out_libcall)
 {
   bool may_use_call;
   rtx retval = 0;
@@ -1625,6 +1628,12 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y)))
     {
+      if (bail_out_libcall)
+	{
+	  *bail_out_libcall = true;
+	  return retval;
+	}
+
       /* Since x and y are passed to a libcall, mark the corresponding
 	 tree EXPR as addressable.  */
       tree y_expr = MEM_EXPR (y);
diff --git a/gcc/expr.h b/gcc/expr.h
index b3d523bcb24..c2bf87fd14e 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -110,7 +110,8 @@ extern rtx emit_block_move_hints (rtx, rtx, rtx, enum block_op_methods,
 			          unsigned int, HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
-				  unsigned HOST_WIDE_INT);
+				  unsigned HOST_WIDE_INT,
+				  bool *bail_out_libcall = NULL);
 extern rtx emit_block_cmp_hints (rtx, rtx, rtx, tree, rtx, bool,
 				 by_pieces_constfn, void *);
 extern bool emit_storent_insn (rtx to, rtx from);
diff --git a/gcc/target.def b/gcc/target.def
index c5b2a1e7e71..3bbddc82776 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2639,6 +2639,13 @@ DEFHOOK
  bool, (enum function_class fn_class),
  default_libc_has_function)
 
+DEFHOOK
+(libc_func_speed,
+ "This hook determines whether a function from libc has a fast implementation\n\
+@var{fn} is present at the runtime.",
+ libc_speed, (int fn),
+ default_libc_func_speed)
+
 /* True if new jumps cannot be created, to replace existing ones or
    not, at the current point in the compilation.  */
 DEFHOOK
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index fafcc6c5196..6e44f6f79cf 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1642,6 +1642,15 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED)
   return false;
 }
 
+/* This hook determines whether a function from libc has a fast implementation
+   FN is present at the runtime.  */
+
+enum libc_speed
+default_libc_func_speed (int)
+{
+  return LIBC_UNKNOWN_SPEED;
+}
+
 tree
 default_builtin_tm_load_store (tree ARG_UNUSED (type))
 {
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 8a4393f2ba4..7508673ad0a 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -205,6 +205,7 @@ extern bool default_have_conditional_execution (void);
 extern bool default_libc_has_function (enum function_class);
 extern bool no_c99_libc_has_function (enum function_class);
 extern bool gnu_libc_has_function (enum function_class);
+extern enum libc_speed default_libc_func_speed (int);
 
 extern tree default_builtin_tm_load_store (tree);
 
diff --git a/gcc/testsuite/gcc.dg/string-opt-1.c b/gcc/testsuite/gcc.dg/string-opt-1.c
index 2f060732bf0..7faaadcbb1f 100644
--- a/gcc/testsuite/gcc.dg/string-opt-1.c
+++ b/gcc/testsuite/gcc.dg/string-opt-1.c
@@ -48,5 +48,6 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */
-/* { dg-final { scan-assembler "memcpy" } } */
+/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */
+/* { dg-final { scan-assembler "memcpy" { target  { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */
+/* { dg-final { scan-assembler "mempcpy" { target  { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } */
Jakub Jelinek March 28, 2018, 4:36 p.m. UTC | #20
On Wed, Mar 28, 2018 at 06:30:21PM +0200, Martin Liška wrote:
> --- a/gcc/config/linux.c
> +++ b/gcc/config/linux.c
> @@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class)
>  
>    return false;
>  }
> +
> +/* This hook determines whether a function from libc has a fast implementation
> +   FN is present at the runtime.  We override it for i386 and glibc C library
> +   as this combination provides fast implementation of mempcpy function.  */
> +
> +enum libc_speed
> +ix86_linux_libc_func_speed (int fn)

Putting a ix86_ function into config/linux.c used by most linux targets is
weird.  Either we multiple linux targets with mempcpy fast, then name it
somehow cpu neutral and let all those CPUs pick it up in config/*/linux.h.
And yes, we do care about i?86-linux.  Or it is for x86 only, and then
it shouldn't be in config/linux.c, but either e.g. static inline in
config/i386/linux.h, or we need config/i386/linux.c if we don't have it
already.

	Jakub
Martin Liška March 29, 2018, 11:28 a.m. UTC | #21
On 03/28/2018 06:36 PM, Jakub Jelinek wrote:
> On Wed, Mar 28, 2018 at 06:30:21PM +0200, Martin Liška wrote:
>> --- a/gcc/config/linux.c
>> +++ b/gcc/config/linux.c
>> @@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class)
>>   
>>     return false;
>>   }
>> +
>> +/* This hook determines whether a function from libc has a fast implementation
>> +   FN is present at the runtime.  We override it for i386 and glibc C library
>> +   as this combination provides fast implementation of mempcpy function.  */
>> +
>> +enum libc_speed
>> +ix86_linux_libc_func_speed (int fn)
> 
> Putting a ix86_ function into config/linux.c used by most linux targets is
> weird.  Either we multiple linux targets with mempcpy fast, then name it
> somehow cpu neutral and let all those CPUs pick it up in config/*/linux.h.
> And yes, we do care about i?86-linux.  Or it is for x86 only, and then
> it shouldn't be in config/linux.c, but either e.g. static inline in
> config/i386/linux.h, or we need config/i386/linux.c if we don't have it
> already.

I'm fine with putting the implementation into gcc/config/i386/linux.c. Can you please
help me how to conditionally build the file?

Martin

> 
> 	Jakub
>
H.J. Lu March 29, 2018, 12:21 p.m. UTC | #22
On Thu, Mar 29, 2018 at 4:28 AM, Martin Liška <mliska@suse.cz> wrote:
> On 03/28/2018 06:36 PM, Jakub Jelinek wrote:
>>
>> On Wed, Mar 28, 2018 at 06:30:21PM +0200, Martin Liška wrote:
>>>
>>> --- a/gcc/config/linux.c
>>> +++ b/gcc/config/linux.c
>>> @@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class)
>>>       return false;
>>>   }
>>> +
>>> +/* This hook determines whether a function from libc has a fast
>>> implementation
>>> +   FN is present at the runtime.  We override it for i386 and glibc C
>>> library
>>> +   as this combination provides fast implementation of mempcpy function.
>>> */
>>> +
>>> +enum libc_speed
>>> +ix86_linux_libc_func_speed (int fn)
>>
>>
>> Putting a ix86_ function into config/linux.c used by most linux targets is
>> weird.  Either we multiple linux targets with mempcpy fast, then name it
>> somehow cpu neutral and let all those CPUs pick it up in config/*/linux.h.
>> And yes, we do care about i?86-linux.  Or it is for x86 only, and then
>> it shouldn't be in config/linux.c, but either e.g. static inline in
>> config/i386/linux.h, or we need config/i386/linux.c if we don't have it
>> already.
>
>
> I'm fine with putting the implementation into gcc/config/i386/linux.c. Can
> you please
> help me how to conditionally build the file?

config.gcc:      extra_objs="${extra_objs} linux.o"
config.gcc: extra_objs="$extra_objs powerpcspe-linux.o"
config.gcc: extra_objs="$extra_objs rs6000-linux.o"
Jakub Jelinek March 29, 2018, 12:25 p.m. UTC | #23
On Thu, Mar 29, 2018 at 01:28:13PM +0200, Martin Liška wrote:
> On 03/28/2018 06:36 PM, Jakub Jelinek wrote:
> > On Wed, Mar 28, 2018 at 06:30:21PM +0200, Martin Liška wrote:
> > > --- a/gcc/config/linux.c
> > > +++ b/gcc/config/linux.c
> > > @@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class)
> > >     return false;
> > >   }
> > > +
> > > +/* This hook determines whether a function from libc has a fast implementation
> > > +   FN is present at the runtime.  We override it for i386 and glibc C library
> > > +   as this combination provides fast implementation of mempcpy function.  */
> > > +
> > > +enum libc_speed
> > > +ix86_linux_libc_func_speed (int fn)
> > 
> > Putting a ix86_ function into config/linux.c used by most linux targets is
> > weird.  Either we multiple linux targets with mempcpy fast, then name it
> > somehow cpu neutral and let all those CPUs pick it up in config/*/linux.h.
> > And yes, we do care about i?86-linux.  Or it is for x86 only, and then
> > it shouldn't be in config/linux.c, but either e.g. static inline in
> > config/i386/linux.h, or we need config/i386/linux.c if we don't have it
> > already.
> 
> I'm fine with putting the implementation into gcc/config/i386/linux.c. Can you please

Can't you just put it into gcc/config/i386/linux-common.h as static inline,
so that it is optimized away whenever not needed?

If you really want to add a c file, it better not be called linux.c, because
linux.o for it would clash with linux.o from gcc/config/linux.c.  And,
you'd need to add the whatever.o into extra_objs in gcc/config.gcc and add
rules for it into gcc/config/i386/t-linux (see linux.o in config.gcc and
config/t-linux).

> help me how to conditionally build the file?

	Jakub
Martin Liška March 29, 2018, 12:31 p.m. UTC | #24
On 03/29/2018 02:25 PM, Jakub Jelinek wrote:
> On Thu, Mar 29, 2018 at 01:28:13PM +0200, Martin Liška wrote:
>> On 03/28/2018 06:36 PM, Jakub Jelinek wrote:
>>> On Wed, Mar 28, 2018 at 06:30:21PM +0200, Martin Liška wrote:
>>>> --- a/gcc/config/linux.c
>>>> +++ b/gcc/config/linux.c
>>>> @@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class)
>>>>      return false;
>>>>    }
>>>> +
>>>> +/* This hook determines whether a function from libc has a fast implementation
>>>> +   FN is present at the runtime.  We override it for i386 and glibc C library
>>>> +   as this combination provides fast implementation of mempcpy function.  */
>>>> +
>>>> +enum libc_speed
>>>> +ix86_linux_libc_func_speed (int fn)
>>>
>>> Putting a ix86_ function into config/linux.c used by most linux targets is
>>> weird.  Either we multiple linux targets with mempcpy fast, then name it
>>> somehow cpu neutral and let all those CPUs pick it up in config/*/linux.h.
>>> And yes, we do care about i?86-linux.  Or it is for x86 only, and then
>>> it shouldn't be in config/linux.c, but either e.g. static inline in
>>> config/i386/linux.h, or we need config/i386/linux.c if we don't have it
>>> already.
>>
>> I'm fine with putting the implementation into gcc/config/i386/linux.c. Can you please
> 
> Can't you just put it into gcc/config/i386/linux-common.h as static inline,
> so that it is optimized away whenever not needed?

I would like to put it there, but:

g++ -c   -g  -DIN_GCC     -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -DGENERATOR_FILE -fno-PIE -I. -Ibuild -I../../gcc -I../../gcc/build -I../../gcc/../include  -I../../gcc/../libcpp/include  \
	-o build/genpreds.o ../../gcc/genpreds.c
In file included from ./tm.h:38:0,
                  from ../../gcc/genpreds.c:26:
../../gcc/config/i386/linux-common.h: In function ‘libc_speed ix86_linux_libc_func_speed(int)’:
../../gcc/config/i386/linux-common.h:137:8: error: use of enum ‘built_in_function’ without previous declaration
    enum built_in_function f = (built_in_function)fn;
         ^~~~~~~~~~~~~~~~~
../../gcc/config/i386/linux-common.h:137:31: error: ‘built_in_function’ was not declared in this scope
    enum built_in_function f = (built_in_function)fn;
                                ^~~~~~~~~~~~~~~~~
../../gcc/config/i386/linux-common.h:137:31: note: suggested alternative: ‘machine_function’
    enum built_in_function f = (built_in_function)fn;
                                ^~~~~~~~~~~~~~~~~
                                machine_function
../../gcc/config/i386/linux-common.h:144:12: error: ‘BUILT_IN_MEMPCPY’ was not declared in this scope
        case BUILT_IN_MEMPCPY:
             ^~~~~~~~~~~~~~~~
Martin

> 
> If you really want to add a c file, it better not be called linux.c, because
> linux.o for it would clash with linux.o from gcc/config/linux.c.  And,
> you'd need to add the whatever.o into extra_objs in gcc/config.gcc and add
> rules for it into gcc/config/i386/t-linux (see linux.o in config.gcc and
> config/t-linux).
> 
>> help me how to conditionally build the file?
> 
> 	Jakub
>
Martin Liška April 4, 2018, 10:13 a.m. UTC | #25
PING^1

On 03/29/2018 02:31 PM, Martin Liška wrote:
> On 03/29/2018 02:25 PM, Jakub Jelinek wrote:
>> On Thu, Mar 29, 2018 at 01:28:13PM +0200, Martin Liška wrote:
>>> On 03/28/2018 06:36 PM, Jakub Jelinek wrote:
>>>> On Wed, Mar 28, 2018 at 06:30:21PM +0200, Martin Liška wrote:
>>>>> --- a/gcc/config/linux.c
>>>>> +++ b/gcc/config/linux.c
>>>>> @@ -37,3 +37,24 @@ linux_libc_has_function (enum function_class fn_class)
>>>>>      return false;
>>>>>    }
>>>>> +
>>>>> +/* This hook determines whether a function from libc has a fast implementation
>>>>> +   FN is present at the runtime.  We override it for i386 and glibc C library
>>>>> +   as this combination provides fast implementation of mempcpy function.  */
>>>>> +
>>>>> +enum libc_speed
>>>>> +ix86_linux_libc_func_speed (int fn)
>>>>
>>>> Putting a ix86_ function into config/linux.c used by most linux targets is
>>>> weird.  Either we multiple linux targets with mempcpy fast, then name it
>>>> somehow cpu neutral and let all those CPUs pick it up in config/*/linux.h.
>>>> And yes, we do care about i?86-linux.  Or it is for x86 only, and then
>>>> it shouldn't be in config/linux.c, but either e.g. static inline in
>>>> config/i386/linux.h, or we need config/i386/linux.c if we don't have it
>>>> already.
>>>
>>> I'm fine with putting the implementation into gcc/config/i386/linux.c. Can you please
>>
>> Can't you just put it into gcc/config/i386/linux-common.h as static inline,
>> so that it is optimized away whenever not needed?
> 
> I would like to put it there, but:
> 
> g++ -c   -g  -DIN_GCC     -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -DGENERATOR_FILE -fno-PIE -I. -Ibuild -I../../gcc -I../../gcc/build -I../../gcc/../include  -I../../gcc/../libcpp/include  \
>     -o build/genpreds.o ../../gcc/genpreds.c
> In file included from ./tm.h:38:0,
>                  from ../../gcc/genpreds.c:26:
> ../../gcc/config/i386/linux-common.h: In function ‘libc_speed ix86_linux_libc_func_speed(int)’:
> ../../gcc/config/i386/linux-common.h:137:8: error: use of enum ‘built_in_function’ without previous declaration
>    enum built_in_function f = (built_in_function)fn;
>         ^~~~~~~~~~~~~~~~~
> ../../gcc/config/i386/linux-common.h:137:31: error: ‘built_in_function’ was not declared in this scope
>    enum built_in_function f = (built_in_function)fn;
>                                ^~~~~~~~~~~~~~~~~
> ../../gcc/config/i386/linux-common.h:137:31: note: suggested alternative: ‘machine_function’
>    enum built_in_function f = (built_in_function)fn;
>                                ^~~~~~~~~~~~~~~~~
>                                machine_function
> ../../gcc/config/i386/linux-common.h:144:12: error: ‘BUILT_IN_MEMPCPY’ was not declared in this scope
>        case BUILT_IN_MEMPCPY:
>             ^~~~~~~~~~~~~~~~
> Martin
> 
>>
>> If you really want to add a c file, it better not be called linux.c, because
>> linux.o for it would clash with linux.o from gcc/config/linux.c.  And,
>> you'd need to add the whatever.o into extra_objs in gcc/config.gcc and add
>> rules for it into gcc/config/i386/t-linux (see linux.o in config.gcc and
>> config/t-linux).
>>
>>> help me how to conditionally build the file?
>>
>>     Jakub
>>
>
Martin Liška April 9, 2018, 12:31 p.m. UTC | #26
Hi.

I'm sending updated version of the patch.

Patch can bootstrap on ppc64le-redhat-linux and x86_64-linux and survives regression tests.

Martin
From 6d19b1bf0c28a683e1458e16943e34bb547d36d6 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Wed, 14 Mar 2018 09:44:18 +0100
Subject: [PATCH] Introduce new libc_func_speed target hook (PR
 middle-end/81657).

gcc/ChangeLog:

2018-03-14  Martin Liska  <mliska@suse.cz>

	PR middle-end/81657
	* builtins.c (expand_builtin_memory_copy_args): Handle situation
	when libc library provides a fast mempcpy implementation/
	* config/linux-protos.h (ix86_linux_libc_func_speed): New.
	(TARGET_LIBC_FUNC_SPEED): Likewise.
	* config/i386/linux64.h (SUBTARGET_LIBC_FUNC_SPEED): Define
	macro.
	* config/i386/linux.h (SUBTARGET_LIBC_FUNC_SPEED): Likewise.
	* config/i386/t-linux: Add x86-linux.o.
	* config.gcc: Likewise.
	* config/i386/x86-linux.c: New file.
	* coretypes.h (enum libc_speed): Likewise.
	* doc/tm.texi: Document new target hook.
	* doc/tm.texi.in: Likewise.
	* expr.c (emit_block_move_hints): Handle libc bail out argument.
	* expr.h (emit_block_move_hints): Add new parameters.
	* target.def: Add new hook.
	* targhooks.c (enum libc_speed): New enum.
	(default_libc_func_speed): Provide a default hook
	implementation.
	* targhooks.h (default_libc_func_speed): Likewise.

gcc/testsuite/ChangeLog:

2018-03-28  Martin Liska  <mliska@suse.cz>

	* gcc.dg/string-opt-1.c:

gcc/testsuite/ChangeLog:

2018-03-14  Martin Liska  <mliska@suse.cz>

	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
	and others.
---
 gcc/builtins.c                      | 15 ++++++++++-
 gcc/config.gcc                      |  1 +
 gcc/config/i386/i386.c              |  5 ++++
 gcc/config/i386/linux.h             |  2 ++
 gcc/config/i386/linux64.h           |  2 ++
 gcc/config/i386/t-linux             |  6 +++++
 gcc/config/i386/x86-linux.c         | 54 +++++++++++++++++++++++++++++++++++++
 gcc/config/linux-protos.h           |  1 +
 gcc/coretypes.h                     |  7 +++++
 gcc/doc/tm.texi                     |  4 +++
 gcc/doc/tm.texi.in                  |  1 +
 gcc/expr.c                          | 11 +++++++-
 gcc/expr.h                          |  3 ++-
 gcc/target.def                      |  7 +++++
 gcc/targhooks.c                     |  9 +++++++
 gcc/targhooks.h                     |  1 +
 gcc/testsuite/gcc.dg/string-opt-1.c |  5 ++--
 17 files changed, 129 insertions(+), 5 deletions(-)
 create mode 100644 gcc/config/i386/x86-linux.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 487d9d58db2..98ee3fb272d 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3651,13 +3651,26 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len,
   src_mem = get_memory_rtx (src, len);
   set_mem_align (src_mem, src_align);
 
+  /* emit_block_move_hints can generate a library call to memcpy function.
+     In situations when a libc library provides fast implementation
+     of mempcpy, then it's better to call mempcpy directly.  */
+  bool avoid_libcall
+    = (endp == 1
+       && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == LIBC_FAST_SPEED
+       && target != const0_rtx);
+
   /* Copy word part most expediently.  */
+  bool libcall_avoided = false;
   dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
 				     CALL_EXPR_TAILCALL (exp)
 				     && (endp == 0 || target == const0_rtx)
 				     ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
 				     expected_align, expected_size,
-				     min_size, max_size, probable_max_size);
+				     min_size, max_size, probable_max_size,
+				     avoid_libcall ? &libcall_avoided : NULL);
+
+  if (libcall_avoided)
+    return NULL_RTX;
 
   if (dest_addr == 0)
     {
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 1b58c060a92..6445ff569b3 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1607,6 +1607,7 @@ x86_64-*-linux* | x86_64-*-kfreebsd*-gnu)
 	x86_64-*-linux*)
 		tm_file="${tm_file} linux.h linux-android.h i386/linux-common.h i386/linux64.h"
 		extra_options="${extra_options} linux-android.opt"
+		extra_objs="${extra_objs} x86-linux.o"
 	  	;;
 	x86_64-*-kfreebsd*-gnu)
 		tm_file="${tm_file} kfreebsd-gnu.h i386/kfreebsd-gnu64.h"
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b4f6aec1434..2471ff7b99a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -52105,6 +52105,11 @@ ix86_run_selftests (void)
 #undef TARGET_WARN_PARAMETER_PASSING_ABI
 #define TARGET_WARN_PARAMETER_PASSING_ABI ix86_warn_parameter_passing_abi
 
+#ifdef SUBTARGET_LIBC_FUNC_SPEED
+#undef TARGET_LIBC_FUNC_SPEED
+#define TARGET_LIBC_FUNC_SPEED SUBTARGET_LIBC_FUNC_SPEED
+#endif
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests
diff --git a/gcc/config/i386/linux.h b/gcc/config/i386/linux.h
index 69f97f15b0d..6c59cbd6d62 100644
--- a/gcc/config/i386/linux.h
+++ b/gcc/config/i386/linux.h
@@ -24,3 +24,5 @@ along with GCC; see the file COPYING3.  If not see
 
 #undef MUSL_DYNAMIC_LINKER
 #define MUSL_DYNAMIC_LINKER "/lib/ld-musl-i386.so.1"
+
+#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed
diff --git a/gcc/config/i386/linux64.h b/gcc/config/i386/linux64.h
index f2d913e30ac..d855f5cc239 100644
--- a/gcc/config/i386/linux64.h
+++ b/gcc/config/i386/linux64.h
@@ -37,3 +37,5 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define MUSL_DYNAMIC_LINKER64 "/lib/ld-musl-x86_64.so.1"
 #undef MUSL_DYNAMIC_LINKERX32
 #define MUSL_DYNAMIC_LINKERX32 "/lib/ld-musl-x32.so.1"
+
+#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed
diff --git a/gcc/config/i386/t-linux b/gcc/config/i386/t-linux
index 155314c08a7..6e3ebe94fe8 100644
--- a/gcc/config/i386/t-linux
+++ b/gcc/config/i386/t-linux
@@ -1 +1,7 @@
 MULTIARCH_DIRNAME = $(call if_multiarch,i386-linux-gnu)
+
+x86-linux.o: $(srcdir)/config/i386/x86-linux.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
+  $(TM_H) $(RTL_H) $(REGS_H) hard-reg-set.h output.h $(TREE_H) flags.h \
+  $(TM_P_H) $(HASHTAB_H) $(GGC_H)
+	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
+	$(srcdir)/config/i386/x86-linux.c
diff --git a/gcc/config/i386/x86-linux.c b/gcc/config/i386/x86-linux.c
new file mode 100644
index 00000000000..71291d18a71
--- /dev/null
+++ b/gcc/config/i386/x86-linux.c
@@ -0,0 +1,54 @@
+/* Implementation for linux-specific functions for i386 and x86-64 systems.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by Martin Liska <mliska@suse.cz>.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "cp/cp-tree.h" /* This is why we're a separate module.  */
+#include "stringpool.h"
+#include "attribs.h"
+
+/* This hook determines whether a function from libc has a fast implementation
+   FN is present at the runtime.  We override it for i386 and glibc C library
+   as this combination provides fast implementation of mempcpy function.  */
+
+enum libc_speed
+ix86_linux_libc_func_speed (int fn)
+{
+  enum built_in_function f = (built_in_function)fn;
+
+  if (!OPTION_GLIBC)
+    return LIBC_UNKNOWN_SPEED;
+
+  switch (f)
+    {
+      case BUILT_IN_MEMPCPY:
+	return LIBC_FAST_SPEED;
+      default:
+	return LIBC_UNKNOWN_SPEED;
+    }
+}
diff --git a/gcc/config/linux-protos.h b/gcc/config/linux-protos.h
index 9da8dd7ecaa..b7284735366 100644
--- a/gcc/config/linux-protos.h
+++ b/gcc/config/linux-protos.h
@@ -20,3 +20,4 @@ along with GCC; see the file COPYING3.  If not see
 extern bool linux_has_ifunc_p (void);
 
 extern bool linux_libc_has_function (enum function_class fn_class);
+extern enum libc_speed ix86_linux_libc_func_speed (int fn);
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 283b4eb33fe..fe618f708f4 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -384,6 +384,13 @@ enum excess_precision_type
   EXCESS_PRECISION_TYPE_FAST
 };
 
+enum libc_speed
+{
+  LIBC_FAST_SPEED,
+  LIBC_SLOW_SPEED,
+  LIBC_UNKNOWN_SPEED
+};
+
 /* Support for user-provided GGC and PCH markers.  The first parameter
    is a pointer to a pointer, the second a cookie.  */
 typedef void (*gt_pointer_operator) (void *, void *);
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index bd8b917ba82..0f7c91a22c4 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5501,6 +5501,10 @@ macro, a reasonable default is used.
 This hook determines whether a function from a class of functions
 @var{fn_class} is present at the runtime.
 @end deftypefn
+@deftypefn {Target Hook} libc_speed TARGET_LIBC_FUNC_SPEED (int @var{fn})
+This hook determines whether a function from libc has a fast implementation
+@var{fn} is present at the runtime.
+@end deftypefn
 
 @defmac NEXT_OBJC_RUNTIME
 Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index b0207146e8c..4bb2998a8a1 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3933,6 +3933,7 @@ macro, a reasonable default is used.
 @end defmac
 
 @hook TARGET_LIBC_HAS_FUNCTION
+@hook TARGET_LIBC_FUNC_SPEED
 
 @defmac NEXT_OBJC_RUNTIME
 Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
diff --git a/gcc/expr.c b/gcc/expr.c
index 00660293f72..f3bd698bc4d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -1554,6 +1554,8 @@ compare_by_pieces (rtx arg0, rtx arg1, unsigned HOST_WIDE_INT len,
    MIN_SIZE is the minimal size of block to move
    MAX_SIZE is the maximal size of block to move, if it can not be represented
    in unsigned HOST_WIDE_INT, than it is mask of all ones.
+   If BAIL_OUT_LIBCALL is non-null, do not emit library call and assign
+   true to the pointer when move is not done.
 
    Return the address of the new block, if memcpy is called and returns it,
    0 otherwise.  */
@@ -1563,7 +1565,8 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 		       unsigned int expected_align, HOST_WIDE_INT expected_size,
 		       unsigned HOST_WIDE_INT min_size,
 		       unsigned HOST_WIDE_INT max_size,
-		       unsigned HOST_WIDE_INT probable_max_size)
+		       unsigned HOST_WIDE_INT probable_max_size,
+		       bool *bail_out_libcall)
 {
   bool may_use_call;
   rtx retval = 0;
@@ -1625,6 +1628,12 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y)))
     {
+      if (bail_out_libcall)
+	{
+	  *bail_out_libcall = true;
+	  return retval;
+	}
+
       /* Since x and y are passed to a libcall, mark the corresponding
 	 tree EXPR as addressable.  */
       tree y_expr = MEM_EXPR (y);
diff --git a/gcc/expr.h b/gcc/expr.h
index b3d523bcb24..c2bf87fd14e 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -110,7 +110,8 @@ extern rtx emit_block_move_hints (rtx, rtx, rtx, enum block_op_methods,
 			          unsigned int, HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
-				  unsigned HOST_WIDE_INT);
+				  unsigned HOST_WIDE_INT,
+				  bool *bail_out_libcall = NULL);
 extern rtx emit_block_cmp_hints (rtx, rtx, rtx, tree, rtx, bool,
 				 by_pieces_constfn, void *);
 extern bool emit_storent_insn (rtx to, rtx from);
diff --git a/gcc/target.def b/gcc/target.def
index c5b2a1e7e71..3bbddc82776 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2639,6 +2639,13 @@ DEFHOOK
  bool, (enum function_class fn_class),
  default_libc_has_function)
 
+DEFHOOK
+(libc_func_speed,
+ "This hook determines whether a function from libc has a fast implementation\n\
+@var{fn} is present at the runtime.",
+ libc_speed, (int fn),
+ default_libc_func_speed)
+
 /* True if new jumps cannot be created, to replace existing ones or
    not, at the current point in the compilation.  */
 DEFHOOK
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index fafcc6c5196..6e44f6f79cf 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1642,6 +1642,15 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED)
   return false;
 }
 
+/* This hook determines whether a function from libc has a fast implementation
+   FN is present at the runtime.  */
+
+enum libc_speed
+default_libc_func_speed (int)
+{
+  return LIBC_UNKNOWN_SPEED;
+}
+
 tree
 default_builtin_tm_load_store (tree ARG_UNUSED (type))
 {
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 8a4393f2ba4..7508673ad0a 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -205,6 +205,7 @@ extern bool default_have_conditional_execution (void);
 extern bool default_libc_has_function (enum function_class);
 extern bool no_c99_libc_has_function (enum function_class);
 extern bool gnu_libc_has_function (enum function_class);
+extern enum libc_speed default_libc_func_speed (int);
 
 extern tree default_builtin_tm_load_store (tree);
 
diff --git a/gcc/testsuite/gcc.dg/string-opt-1.c b/gcc/testsuite/gcc.dg/string-opt-1.c
index 2f060732bf0..7faaadcbb1f 100644
--- a/gcc/testsuite/gcc.dg/string-opt-1.c
+++ b/gcc/testsuite/gcc.dg/string-opt-1.c
@@ -48,5 +48,6 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */
-/* { dg-final { scan-assembler "memcpy" } } */
+/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */
+/* { dg-final { scan-assembler "memcpy" { target  { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */
+/* { dg-final { scan-assembler "mempcpy" { target  { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } */
Jakub Jelinek April 10, 2018, 9:19 a.m. UTC | #27
On Mon, Apr 09, 2018 at 02:31:04PM +0200, Martin Liška wrote:
> gcc/testsuite/ChangeLog:
> 
> 2018-03-28  Martin Liska  <mliska@suse.cz>
> 
> 	* gcc.dg/string-opt-1.c:

I guess you really didn't mean to keep the above entry around, just the one
below, right?

> gcc/testsuite/ChangeLog:
> 
> 2018-03-14  Martin Liska  <mliska@suse.cz>
> 
> 	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
> 	and others.

> --- a/gcc/config.gcc
> +++ b/gcc/config.gcc
> @@ -1607,6 +1607,7 @@ x86_64-*-linux* | x86_64-*-kfreebsd*-gnu)
>  	x86_64-*-linux*)
>  		tm_file="${tm_file} linux.h linux-android.h i386/linux-common.h i386/linux64.h"
>  		extra_options="${extra_options} linux-android.opt"
> +		extra_objs="${extra_objs} x86-linux.o"
>  	  	;;

The should go into the i[34567]86-*-linux*) case too (outside of the
if test x$enable_targets = xall; then conditional).
Or maybe better, remove the above and do it in:
        i[34567]86-*-linux* | x86_64-*-linux*)
                extra_objs="${extra_objs} cet.o"
                tmake_file="$tmake_file i386/t-linux i386/t-cet"
                ;;
spot, just add x86-linux.o next to cet.o.

> --- a/gcc/config/i386/linux.h
> +++ b/gcc/config/i386/linux.h
> @@ -24,3 +24,5 @@ along with GCC; see the file COPYING3.  If not see
>  
>  #undef MUSL_DYNAMIC_LINKER
>  #define MUSL_DYNAMIC_LINKER "/lib/ld-musl-i386.so.1"
> +
> +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed
> diff --git a/gcc/config/i386/linux64.h b/gcc/config/i386/linux64.h
> index f2d913e30ac..d855f5cc239 100644
> --- a/gcc/config/i386/linux64.h
> +++ b/gcc/config/i386/linux64.h
> @@ -37,3 +37,5 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>  #define MUSL_DYNAMIC_LINKER64 "/lib/ld-musl-x86_64.so.1"
>  #undef MUSL_DYNAMIC_LINKERX32
>  #define MUSL_DYNAMIC_LINKERX32 "/lib/ld-musl-x32.so.1"
> +
> +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed

And the above two changes should be replaced by a change in
gcc/config/i386/linux-common.h.

> +#include "coretypes.h"
> +#include "cp/cp-tree.h" /* This is why we're a separate module.  */

Why do you need cp/cp-tree.h?  That is just too weird.
The function just uses libc_speed (in core-types.h, built_in_function
(likewise), OPTION_GLIBC (config/linux.h).

	Jakub
Martin Liška April 10, 2018, 12:27 p.m. UTC | #28
On 04/10/2018 11:19 AM, Jakub Jelinek wrote:
> On Mon, Apr 09, 2018 at 02:31:04PM +0200, Martin Liška wrote:
>> gcc/testsuite/ChangeLog:
>>
>> 2018-03-28  Martin Liska  <mliska@suse.cz>
>>
>> 	* gcc.dg/string-opt-1.c:
> 
> I guess you really didn't mean to keep the above entry around, just the one
> below, right?

Sure, fixed.

> 
>> gcc/testsuite/ChangeLog:
>>
>> 2018-03-14  Martin Liska  <mliska@suse.cz>
>>
>> 	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
>> 	and others.
> 
>> --- a/gcc/config.gcc
>> +++ b/gcc/config.gcc
>> @@ -1607,6 +1607,7 @@ x86_64-*-linux* | x86_64-*-kfreebsd*-gnu)
>>  	x86_64-*-linux*)
>>  		tm_file="${tm_file} linux.h linux-android.h i386/linux-common.h i386/linux64.h"
>>  		extra_options="${extra_options} linux-android.opt"
>> +		extra_objs="${extra_objs} x86-linux.o"
>>  	  	;;
> 
> The should go into the i[34567]86-*-linux*) case too (outside of the
> if test x$enable_targets = xall; then conditional).
> Or maybe better, remove the above and do it in:
>         i[34567]86-*-linux* | x86_64-*-linux*)
>                 extra_objs="${extra_objs} cet.o"
>                 tmake_file="$tmake_file i386/t-linux i386/t-cet"
>                 ;;
> spot, just add x86-linux.o next to cet.o.

Done.

> 
>> --- a/gcc/config/i386/linux.h
>> +++ b/gcc/config/i386/linux.h
>> @@ -24,3 +24,5 @@ along with GCC; see the file COPYING3.  If not see
>>  
>>  #undef MUSL_DYNAMIC_LINKER
>>  #define MUSL_DYNAMIC_LINKER "/lib/ld-musl-i386.so.1"
>> +
>> +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed
>> diff --git a/gcc/config/i386/linux64.h b/gcc/config/i386/linux64.h
>> index f2d913e30ac..d855f5cc239 100644
>> --- a/gcc/config/i386/linux64.h
>> +++ b/gcc/config/i386/linux64.h
>> @@ -37,3 +37,5 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>>  #define MUSL_DYNAMIC_LINKER64 "/lib/ld-musl-x86_64.so.1"
>>  #undef MUSL_DYNAMIC_LINKERX32
>>  #define MUSL_DYNAMIC_LINKERX32 "/lib/ld-musl-x32.so.1"
>> +
>> +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed
> 
> And the above two changes should be replaced by a change in
> gcc/config/i386/linux-common.h.

Likewise.

> 
>> +#include "coretypes.h"
>> +#include "cp/cp-tree.h" /* This is why we're a separate module.  */
> 
> Why do you need cp/cp-tree.h?  That is just too weird.
> The function just uses libc_speed (in core-types.h, built_in_function
> (likewise), OPTION_GLIBC (config/linux.h).

I ended up with minimal set of includes:

#include "config.h"
#include "system.h"
#include "coretypes.h"
#include "backend.h"
#include "tree.h"

I'm retesting the patch.

Martin

> 
> 	Jakub
>
From bed35715063f9435b697eaf4c9868f81e8556de8 Mon Sep 17 00:00:00 2001
From: marxin <mliska@suse.cz>
Date: Wed, 14 Mar 2018 09:44:18 +0100
Subject: [PATCH] Introduce new libc_func_speed target hook (PR
 middle-end/81657).

gcc/ChangeLog:

2018-03-14  Martin Liska  <mliska@suse.cz>

	PR middle-end/81657
	* builtins.c (expand_builtin_memory_copy_args): Handle situation
	when libc library provides a fast mempcpy implementation/
	* config/linux-protos.h (ix86_linux_libc_func_speed): New.
	(TARGET_LIBC_FUNC_SPEED): Likewise.
	* config/i386/linux-common.h (SUBTARGET_LIBC_FUNC_SPEED): Define
	macro.
	* config/i386/t-linux: Add x86-linux.o.
	* config.gcc: Likewise.
	* config/i386/x86-linux.c: New file.
	* coretypes.h (enum libc_speed): Likewise.
	* doc/tm.texi: Document new target hook.
	* doc/tm.texi.in: Likewise.
	* expr.c (emit_block_move_hints): Handle libc bail out argument.
	* expr.h (emit_block_move_hints): Add new parameters.
	* target.def: Add new hook.
	* targhooks.c (enum libc_speed): New enum.
	(default_libc_func_speed): Provide a default hook
	implementation.
	* targhooks.h (default_libc_func_speed): Likewise.

gcc/testsuite/ChangeLog:

2018-03-14  Martin Liska  <mliska@suse.cz>

	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
	and others.
---
 gcc/builtins.c                      | 15 ++++++++++-
 gcc/config.gcc                      |  2 +-
 gcc/config/i386/i386.c              |  5 ++++
 gcc/config/i386/linux-common.h      |  2 ++
 gcc/config/i386/t-linux             |  6 +++++
 gcc/config/i386/x86-linux.c         | 52 +++++++++++++++++++++++++++++++++++++
 gcc/config/linux-protos.h           |  1 +
 gcc/coretypes.h                     |  7 +++++
 gcc/doc/tm.texi                     |  4 +++
 gcc/doc/tm.texi.in                  |  1 +
 gcc/expr.c                          | 11 +++++++-
 gcc/expr.h                          |  3 ++-
 gcc/target.def                      |  7 +++++
 gcc/targhooks.c                     |  9 +++++++
 gcc/targhooks.h                     |  1 +
 gcc/testsuite/gcc.dg/string-opt-1.c |  5 ++--
 16 files changed, 125 insertions(+), 6 deletions(-)
 create mode 100644 gcc/config/i386/x86-linux.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 487d9d58db2..98ee3fb272d 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3651,13 +3651,26 @@ expand_builtin_memory_copy_args (tree dest, tree src, tree len,
   src_mem = get_memory_rtx (src, len);
   set_mem_align (src_mem, src_align);
 
+  /* emit_block_move_hints can generate a library call to memcpy function.
+     In situations when a libc library provides fast implementation
+     of mempcpy, then it's better to call mempcpy directly.  */
+  bool avoid_libcall
+    = (endp == 1
+       && targetm.libc_func_speed ((int)BUILT_IN_MEMPCPY) == LIBC_FAST_SPEED
+       && target != const0_rtx);
+
   /* Copy word part most expediently.  */
+  bool libcall_avoided = false;
   dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
 				     CALL_EXPR_TAILCALL (exp)
 				     && (endp == 0 || target == const0_rtx)
 				     ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
 				     expected_align, expected_size,
-				     min_size, max_size, probable_max_size);
+				     min_size, max_size, probable_max_size,
+				     avoid_libcall ? &libcall_avoided : NULL);
+
+  if (libcall_avoided)
+    return NULL_RTX;
 
   if (dest_addr == 0)
     {
diff --git a/gcc/config.gcc b/gcc/config.gcc
index 1b58c060a92..7fe43856b6a 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4617,7 +4617,7 @@ case ${target} in
 	i[34567]86-*-darwin* | x86_64-*-darwin*)
 		;;
 	i[34567]86-*-linux* | x86_64-*-linux*)
-		extra_objs="${extra_objs} cet.o"
+		extra_objs="${extra_objs} cet.o x86-linux.o"
 		tmake_file="$tmake_file i386/t-linux i386/t-cet"
 		;;
 	i[34567]86-*-kfreebsd*-gnu | x86_64-*-kfreebsd*-gnu)
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b4f6aec1434..2471ff7b99a 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -52105,6 +52105,11 @@ ix86_run_selftests (void)
 #undef TARGET_WARN_PARAMETER_PASSING_ABI
 #define TARGET_WARN_PARAMETER_PASSING_ABI ix86_warn_parameter_passing_abi
 
+#ifdef SUBTARGET_LIBC_FUNC_SPEED
+#undef TARGET_LIBC_FUNC_SPEED
+#define TARGET_LIBC_FUNC_SPEED SUBTARGET_LIBC_FUNC_SPEED
+#endif
+
 #if CHECKING_P
 #undef TARGET_RUN_TARGET_SELFTESTS
 #define TARGET_RUN_TARGET_SELFTESTS selftest::ix86_run_selftests
diff --git a/gcc/config/i386/linux-common.h b/gcc/config/i386/linux-common.h
index d877387021b..1b48c15e5c0 100644
--- a/gcc/config/i386/linux-common.h
+++ b/gcc/config/i386/linux-common.h
@@ -126,3 +126,5 @@ extern void file_end_indicate_exec_stack_and_cet (void);
 
 #undef TARGET_ASM_FILE_END
 #define TARGET_ASM_FILE_END file_end_indicate_exec_stack_and_cet
+
+#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed
diff --git a/gcc/config/i386/t-linux b/gcc/config/i386/t-linux
index 155314c08a7..6e3ebe94fe8 100644
--- a/gcc/config/i386/t-linux
+++ b/gcc/config/i386/t-linux
@@ -1 +1,7 @@
 MULTIARCH_DIRNAME = $(call if_multiarch,i386-linux-gnu)
+
+x86-linux.o: $(srcdir)/config/i386/x86-linux.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
+  $(TM_H) $(RTL_H) $(REGS_H) hard-reg-set.h output.h $(TREE_H) flags.h \
+  $(TM_P_H) $(HASHTAB_H) $(GGC_H)
+	$(COMPILER) -c $(ALL_COMPILERFLAGS) $(ALL_CPPFLAGS) $(INCLUDES) \
+	$(srcdir)/config/i386/x86-linux.c
diff --git a/gcc/config/i386/x86-linux.c b/gcc/config/i386/x86-linux.c
new file mode 100644
index 00000000000..5e4331f635a
--- /dev/null
+++ b/gcc/config/i386/x86-linux.c
@@ -0,0 +1,52 @@
+/* Implementation for linux-specific functions for i386 and x86-64 systems.
+   Copyright (C) 2018 Free Software Foundation, Inc.
+   Contributed by Martin Liska <mliska@suse.cz>.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+#define IN_TARGET_CODE 1
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "backend.h"
+#include "tree.h"
+
+/* This hook determines whether a function from libc has a fast implementation
+   FN is present at the runtime.  We override it for i386 and glibc C library
+   as this combination provides fast implementation of mempcpy function.  */
+
+enum libc_speed
+ix86_linux_libc_func_speed (int fn)
+{
+  enum built_in_function f = (built_in_function)fn;
+
+  if (!OPTION_GLIBC)
+    return LIBC_UNKNOWN_SPEED;
+
+  switch (f)
+    {
+      case BUILT_IN_MEMPCPY:
+	return LIBC_FAST_SPEED;
+      default:
+	return LIBC_UNKNOWN_SPEED;
+    }
+}
diff --git a/gcc/config/linux-protos.h b/gcc/config/linux-protos.h
index 9da8dd7ecaa..b7284735366 100644
--- a/gcc/config/linux-protos.h
+++ b/gcc/config/linux-protos.h
@@ -20,3 +20,4 @@ along with GCC; see the file COPYING3.  If not see
 extern bool linux_has_ifunc_p (void);
 
 extern bool linux_libc_has_function (enum function_class fn_class);
+extern enum libc_speed ix86_linux_libc_func_speed (int fn);
diff --git a/gcc/coretypes.h b/gcc/coretypes.h
index 283b4eb33fe..fe618f708f4 100644
--- a/gcc/coretypes.h
+++ b/gcc/coretypes.h
@@ -384,6 +384,13 @@ enum excess_precision_type
   EXCESS_PRECISION_TYPE_FAST
 };
 
+enum libc_speed
+{
+  LIBC_FAST_SPEED,
+  LIBC_SLOW_SPEED,
+  LIBC_UNKNOWN_SPEED
+};
+
 /* Support for user-provided GGC and PCH markers.  The first parameter
    is a pointer to a pointer, the second a cookie.  */
 typedef void (*gt_pointer_operator) (void *, void *);
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index bd8b917ba82..0f7c91a22c4 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -5501,6 +5501,10 @@ macro, a reasonable default is used.
 This hook determines whether a function from a class of functions
 @var{fn_class} is present at the runtime.
 @end deftypefn
+@deftypefn {Target Hook} libc_speed TARGET_LIBC_FUNC_SPEED (int @var{fn})
+This hook determines whether a function from libc has a fast implementation
+@var{fn} is present at the runtime.
+@end deftypefn
 
 @defmac NEXT_OBJC_RUNTIME
 Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index b0207146e8c..4bb2998a8a1 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -3933,6 +3933,7 @@ macro, a reasonable default is used.
 @end defmac
 
 @hook TARGET_LIBC_HAS_FUNCTION
+@hook TARGET_LIBC_FUNC_SPEED
 
 @defmac NEXT_OBJC_RUNTIME
 Set this macro to 1 to use the "NeXT" Objective-C message sending conventions
diff --git a/gcc/expr.c b/gcc/expr.c
index 00660293f72..f3bd698bc4d 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -1554,6 +1554,8 @@ compare_by_pieces (rtx arg0, rtx arg1, unsigned HOST_WIDE_INT len,
    MIN_SIZE is the minimal size of block to move
    MAX_SIZE is the maximal size of block to move, if it can not be represented
    in unsigned HOST_WIDE_INT, than it is mask of all ones.
+   If BAIL_OUT_LIBCALL is non-null, do not emit library call and assign
+   true to the pointer when move is not done.
 
    Return the address of the new block, if memcpy is called and returns it,
    0 otherwise.  */
@@ -1563,7 +1565,8 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 		       unsigned int expected_align, HOST_WIDE_INT expected_size,
 		       unsigned HOST_WIDE_INT min_size,
 		       unsigned HOST_WIDE_INT max_size,
-		       unsigned HOST_WIDE_INT probable_max_size)
+		       unsigned HOST_WIDE_INT probable_max_size,
+		       bool *bail_out_libcall)
 {
   bool may_use_call;
   rtx retval = 0;
@@ -1625,6 +1628,12 @@ emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y)))
     {
+      if (bail_out_libcall)
+	{
+	  *bail_out_libcall = true;
+	  return retval;
+	}
+
       /* Since x and y are passed to a libcall, mark the corresponding
 	 tree EXPR as addressable.  */
       tree y_expr = MEM_EXPR (y);
diff --git a/gcc/expr.h b/gcc/expr.h
index b3d523bcb24..c2bf87fd14e 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -110,7 +110,8 @@ extern rtx emit_block_move_hints (rtx, rtx, rtx, enum block_op_methods,
 			          unsigned int, HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
-				  unsigned HOST_WIDE_INT);
+				  unsigned HOST_WIDE_INT,
+				  bool *bail_out_libcall = NULL);
 extern rtx emit_block_cmp_hints (rtx, rtx, rtx, tree, rtx, bool,
 				 by_pieces_constfn, void *);
 extern bool emit_storent_insn (rtx to, rtx from);
diff --git a/gcc/target.def b/gcc/target.def
index c5b2a1e7e71..3bbddc82776 100644
--- a/gcc/target.def
+++ b/gcc/target.def
@@ -2639,6 +2639,13 @@ DEFHOOK
  bool, (enum function_class fn_class),
  default_libc_has_function)
 
+DEFHOOK
+(libc_func_speed,
+ "This hook determines whether a function from libc has a fast implementation\n\
+@var{fn} is present at the runtime.",
+ libc_speed, (int fn),
+ default_libc_func_speed)
+
 /* True if new jumps cannot be created, to replace existing ones or
    not, at the current point in the compilation.  */
 DEFHOOK
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index fafcc6c5196..6e44f6f79cf 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1642,6 +1642,15 @@ no_c99_libc_has_function (enum function_class fn_class ATTRIBUTE_UNUSED)
   return false;
 }
 
+/* This hook determines whether a function from libc has a fast implementation
+   FN is present at the runtime.  */
+
+enum libc_speed
+default_libc_func_speed (int)
+{
+  return LIBC_UNKNOWN_SPEED;
+}
+
 tree
 default_builtin_tm_load_store (tree ARG_UNUSED (type))
 {
diff --git a/gcc/targhooks.h b/gcc/targhooks.h
index 8a4393f2ba4..7508673ad0a 100644
--- a/gcc/targhooks.h
+++ b/gcc/targhooks.h
@@ -205,6 +205,7 @@ extern bool default_have_conditional_execution (void);
 extern bool default_libc_has_function (enum function_class);
 extern bool no_c99_libc_has_function (enum function_class);
 extern bool gnu_libc_has_function (enum function_class);
+extern enum libc_speed default_libc_func_speed (int);
 
 extern tree default_builtin_tm_load_store (tree);
 
diff --git a/gcc/testsuite/gcc.dg/string-opt-1.c b/gcc/testsuite/gcc.dg/string-opt-1.c
index 2f060732bf0..7faaadcbb1f 100644
--- a/gcc/testsuite/gcc.dg/string-opt-1.c
+++ b/gcc/testsuite/gcc.dg/string-opt-1.c
@@ -48,5 +48,6 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */
-/* { dg-final { scan-assembler "memcpy" } } */
+/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */
+/* { dg-final { scan-assembler "memcpy" { target  { ! { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } } */
+/* { dg-final { scan-assembler "mempcpy" { target  { i?86-*-gnu* x86_64-*-gnu* i?86-*-linux* x86_64-*-linux* } } } } */
Martin Liška April 12, 2018, 1:22 p.m. UTC | #29
Hi.

I'm reminding review request from Richi for generic part
and Uros/Honza for target part.

Thanks,
Martin
Jan Hubicka April 12, 2018, 1:42 p.m. UTC | #30
> Hi.
> 
> I'm reminding review request from Richi for generic part
> and Uros/Honza for target part.

OK for i386 bits.
Honza
> 
> Thanks,
> Martin
Richard Biener April 12, 2018, 1:52 p.m. UTC | #31
On Thu, 12 Apr 2018, Martin Liška wrote:

> Hi.
> 
> I'm reminding review request from Richi for generic part
> and Uros/Honza for target part.

Not sure if I missed some important part of the discussion but
for the testcase we want to preserve the tailcall, right?  So
it would be enough to set avoid_libcall to
endp != 0 && CALL_EXPR_TAILCALL (exp) (and thus also handle
stpcpy)?

I'm not sure I like the interfacing of that to emit_block_move_hints
very much.  I'd have used sth like BLOCK_OP_ABORT_ON_LIBCALL
and extend the interface in a way to return what kind of method it
chose rather than just a bool.

Not sure what gcc.dg/20050503-1.c did on non-x86 targets - the
test runs on all archs but only x86 is ever tested for a result.

So - I think tail-calling is prefered, and somehow in the PR
the discussion wandered off to whether there's fast implementations
or not - but the testcase looks for a tailcall where the source
was a tailcall, that should be authorative for the "default"
decision when the hook isn't implemented or doesn't cover the case.

IMO target libraries have to be quite stupid if they have anything
slower than

void *mempcpy (void *dest, const void *src, size_t n)
{
  return memcpy (dest, src, n) + n;
}

which should be not (very much) slower than a non-tailcall memcpy call.

So -- remove the hook and instead use CALL_EXPR_TAILCALL (exp) instead
of its result.

Thanks,
Richard.
Jakub Jelinek April 12, 2018, 2:05 p.m. UTC | #32
On Thu, Apr 12, 2018 at 03:52:09PM +0200, Richard Biener wrote:
> Not sure if I missed some important part of the discussion but
> for the testcase we want to preserve the tailcall, right?  So
> it would be enough to set avoid_libcall to
> endp != 0 && CALL_EXPR_TAILCALL (exp) (and thus also handle
> stpcpy)?

For the testcase yes.  There the question is if some targets have so lame
mempcpy that using a tailcall to mempcpy is slower over avoiding the
tailcall (and on aarch64 it looked like maintainer's choice to have lame
mempcpy and hope the compiler will avoid it at all costs).  On the other
side, that change has been forced over to all targets, even when they don't
have lame mempcpy.
So, the tailcall is one issue, and we can either use mempcpy if endp
and CALL_EXPR_TAILCALL, or only do that if -Os.

And another issue is mempcpy uses in other contexts, here again I think x86
has good enough mempcpy that if I use
foo (mempcpy (x, y, z)) then it is better to use mempcpy over memcpy call,
but not so on targets with lame mempcpy.

My preference would be to have non-lame mempcpy etc. on all targets, but the
aarch64 folks disagree.

So, wonder e.g. about Martin's patch, which would use mempcpy if endp and
either FAST_SPEED for mempcpy (regardless of the context), or not
SLOW_SPEED and CALL_EXPR_TAILCALL.  That way, targets could signal they have
so lame mempcpy that they never want to use it (return SLOW_SPEED), or ask
for it to be used every time it makes sense from caller POV, and have the
default something in between (only use it in tail calls).

	Jakub
Richard Biener April 12, 2018, 2:19 p.m. UTC | #33
On Thu, 12 Apr 2018, Jakub Jelinek wrote:

> On Thu, Apr 12, 2018 at 03:52:09PM +0200, Richard Biener wrote:
> > Not sure if I missed some important part of the discussion but
> > for the testcase we want to preserve the tailcall, right?  So
> > it would be enough to set avoid_libcall to
> > endp != 0 && CALL_EXPR_TAILCALL (exp) (and thus also handle
> > stpcpy)?
> 
> For the testcase yes.  There the question is if some targets have so lame
> mempcpy that using a tailcall to mempcpy is slower over avoiding the
> tailcall (and on aarch64 it looked like maintainer's choice to have lame
> mempcpy and hope the compiler will avoid it at all costs).  On the other
> side, that change has been forced over to all targets, even when they don't
> have lame mempcpy.
> So, the tailcall is one issue, and we can either use mempcpy if endp
> and CALL_EXPR_TAILCALL, or only do that if -Os.
> 
> And another issue is mempcpy uses in other contexts, here again I think x86
> has good enough mempcpy that if I use
> foo (mempcpy (x, y, z)) then it is better to use mempcpy over memcpy call,
> but not so on targets with lame mempcpy.
> 
> My preference would be to have non-lame mempcpy etc. on all targets, but the
> aarch64 folks disagree.
> 
> So, wonder e.g. about Martin's patch, which would use mempcpy if endp and
> either FAST_SPEED for mempcpy (regardless of the context), or not
> SLOW_SPEED and CALL_EXPR_TAILCALL.  That way, targets could signal they have
> so lame mempcpy that they never want to use it (return SLOW_SPEED), or ask
> for it to be used every time it makes sense from caller POV, and have the
> default something in between (only use it in tail calls).

Well, but that wouldn't be a fix for a regression and IMHO there's
no reason for a really lame mempcpy.  If targets disgree well,
then they get what they deserve.

I don't see any aarch64 specific mempcpy in glibc btw so hopefully
the default non-stupid one kicks in (it exactly looks like my C
version)

Richard.

> 	Jakub
> 
>
Jakub Jelinek April 12, 2018, 2:31 p.m. UTC | #34
On Thu, Apr 12, 2018 at 04:19:38PM +0200, Richard Biener wrote:
> Well, but that wouldn't be a fix for a regression and IMHO there's
> no reason for a really lame mempcpy.  If targets disgree well,

It is a regression as well, in the past we've emitted mempcpy when user
wrote mempcpy, now we don't.

E.g.
extern void *mempcpy (void *, const void *, __SIZE_TYPE__);
void bar (void *, void *, void *);

void
foo (void *x, void *y, void *z, void *w, __SIZE_TYPE__ n)
{
  bar (mempcpy (x, w, n), mempcpy (y, w, n), mempcpy (z, w, n));
}

is on x86_64-linux -O2 in 7.x using the 3 mempcpy calls and 90 bytes in foo, while
on the trunk uses 3 memcpy calls and 96 bytes in foo.

For -Os that is easily measurable regression, for -O2 it depends on the
relative speed of memcpy vs. mempcpy and whether one or both of them are in
I-cache or not.

> then they get what they deserve.
> 
> I don't see any aarch64 specific mempcpy in glibc btw so hopefully
> the default non-stupid one kicks in (it exactly looks like my C
> version)

	Jakub
Richard Biener April 12, 2018, 3:17 p.m. UTC | #35
On April 12, 2018 4:31:12 PM GMT+02:00, Jakub Jelinek <jakub@redhat.com> wrote:
>On Thu, Apr 12, 2018 at 04:19:38PM +0200, Richard Biener wrote:
>> Well, but that wouldn't be a fix for a regression and IMHO there's
>> no reason for a really lame mempcpy.  If targets disgree well,
>
>It is a regression as well, in the past we've emitted mempcpy when user
>wrote mempcpy, now we don't.
>
>E.g.
>extern void *mempcpy (void *, const void *, __SIZE_TYPE__);
>void bar (void *, void *, void *);
>
>void
>foo (void *x, void *y, void *z, void *w, __SIZE_TYPE__ n)
>{
>  bar (mempcpy (x, w, n), mempcpy (y, w, n), mempcpy (z, w, n));
>}
>
>is on x86_64-linux -O2 in 7.x using the 3 mempcpy calls and 90 bytes in
>foo, while
>on the trunk uses 3 memcpy calls and 96 bytes in foo.
>
>For -Os that is easily measurable regression, for -O2 it depends on the
>relative speed of memcpy vs. mempcpy and whether one or both of them
>are in
>I-cache or not.

Well, then simply unconditionally not generate a libcall from the move expander? 

>
>> then they get what they deserve.
>> 
>> I don't see any aarch64 specific mempcpy in glibc btw so hopefully
>> the default non-stupid one kicks in (it exactly looks like my C
>> version)
>
>	Jakub
Jakub Jelinek April 12, 2018, 3:38 p.m. UTC | #36
On Thu, Apr 12, 2018 at 05:17:29PM +0200, Richard Biener wrote:
> >For -Os that is easily measurable regression, for -O2 it depends on the
> >relative speed of memcpy vs. mempcpy and whether one or both of them
> >are in
> >I-cache or not.
> 
> Well, then simply unconditionally not generate a libcall from the move expander? 

We need to generate libcall for many callers and in fact, we don't have a
mode nor a way to tell the caller that we haven't emitted anything.

What we could do is add another enumerator to enum block_op_methods that
would be like BLOCK_OP_NO_LIBCALL, but would not use emit_block_move_via_loop
if move_by_pieces nor emit_block_move_via_movmem can be used, and say
instead return const0_rtx or pc_rtx or some way to tell the caller that
it hasn't emitted anything and in expand_builtin_memory_copy_args
pass for endp == 1 && target != const0_rtx that new BLOCK_OP_NO_LIBCALL_LOOP
to emit_block_move_hints and return 0 if dest_addr is const0_rtx (or pc_rtx
or whatever is chosen).

	Jakub
Richard Biener April 12, 2018, 4:03 p.m. UTC | #37
On April 12, 2018 5:38:44 PM GMT+02:00, Jakub Jelinek <jakub@redhat.com> wrote:
>On Thu, Apr 12, 2018 at 05:17:29PM +0200, Richard Biener wrote:
>> >For -Os that is easily measurable regression, for -O2 it depends on
>the
>> >relative speed of memcpy vs. mempcpy and whether one or both of them
>> >are in
>> >I-cache or not.
>> 
>> Well, then simply unconditionally not generate a libcall from the
>move expander? 
>
>We need to generate libcall for many callers and in fact, we don't have
>a
>mode nor a way to tell the caller that we haven't emitted anything.
>
>What we could do is add another enumerator to enum block_op_methods
>that
>would be like BLOCK_OP_NO_LIBCALL, but would not use
>emit_block_move_via_loop
>if move_by_pieces nor emit_block_move_via_movmem can be used, and say
>instead return const0_rtx or pc_rtx or some way to tell the caller that
>it hasn't emitted anything and in expand_builtin_memory_copy_args
>pass for endp == 1 && target != const0_rtx that new
>BLOCK_OP_NO_LIBCALL_LOOP
>to emit_block_move_hints and return 0 if dest_addr is const0_rtx (or
>pc_rtx
>or whatever is chosen).

Yes. Emit the "original" call whenever inline expansion fails. 

Richard. 

>	Jakub
Jeff Law July 5, 2018, 7:34 p.m. UTC | #38
On 04/10/2018 06:27 AM, Martin Liška wrote:
> On 04/10/2018 11:19 AM, Jakub Jelinek wrote:
>> On Mon, Apr 09, 2018 at 02:31:04PM +0200, Martin Liška wrote:
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2018-03-28  Martin Liska  <mliska@suse.cz>
>>>
>>> 	* gcc.dg/string-opt-1.c:
>> I guess you really didn't mean to keep the above entry around, just the one
>> below, right?
> Sure, fixed.
> 
>>> gcc/testsuite/ChangeLog:
>>>
>>> 2018-03-14  Martin Liska  <mliska@suse.cz>
>>>
>>> 	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
>>> 	and others.
>>> --- a/gcc/config.gcc
>>> +++ b/gcc/config.gcc
>>> @@ -1607,6 +1607,7 @@ x86_64-*-linux* | x86_64-*-kfreebsd*-gnu)
>>>  	x86_64-*-linux*)
>>>  		tm_file="${tm_file} linux.h linux-android.h i386/linux-common.h i386/linux64.h"
>>>  		extra_options="${extra_options} linux-android.opt"
>>> +		extra_objs="${extra_objs} x86-linux.o"
>>>  	  	;;
>> The should go into the i[34567]86-*-linux*) case too (outside of the
>> if test x$enable_targets = xall; then conditional).
>> Or maybe better, remove the above and do it in:
>>         i[34567]86-*-linux* | x86_64-*-linux*)
>>                 extra_objs="${extra_objs} cet.o"
>>                 tmake_file="$tmake_file i386/t-linux i386/t-cet"
>>                 ;;
>> spot, just add x86-linux.o next to cet.o.
> Done.
> 
>>> --- a/gcc/config/i386/linux.h
>>> +++ b/gcc/config/i386/linux.h
>>> @@ -24,3 +24,5 @@ along with GCC; see the file COPYING3.  If not see
>>>  
>>>  #undef MUSL_DYNAMIC_LINKER
>>>  #define MUSL_DYNAMIC_LINKER "/lib/ld-musl-i386.so.1"
>>> +
>>> +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed
>>> diff --git a/gcc/config/i386/linux64.h b/gcc/config/i386/linux64.h
>>> index f2d913e30ac..d855f5cc239 100644
>>> --- a/gcc/config/i386/linux64.h
>>> +++ b/gcc/config/i386/linux64.h
>>> @@ -37,3 +37,5 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
>>>  #define MUSL_DYNAMIC_LINKER64 "/lib/ld-musl-x86_64.so.1"
>>>  #undef MUSL_DYNAMIC_LINKERX32
>>>  #define MUSL_DYNAMIC_LINKERX32 "/lib/ld-musl-x32.so.1"
>>> +
>>> +#define SUBTARGET_LIBC_FUNC_SPEED ix86_linux_libc_func_speed
>> And the above two changes should be replaced by a change in
>> gcc/config/i386/linux-common.h.
> Likewise.
> 
>>> +#include "coretypes.h"
>>> +#include "cp/cp-tree.h" /* This is why we're a separate module.  */
>> Why do you need cp/cp-tree.h?  That is just too weird.
>> The function just uses libc_speed (in core-types.h, built_in_function
>> (likewise), OPTION_GLIBC (config/linux.h).
> I ended up with minimal set of includes:
> 
> #include "config.h"
> #include "system.h"
> #include "coretypes.h"
> #include "backend.h"
> #include "tree.h"
> 
> I'm retesting the patch.
> 
> Martin
> 
>> 	Jakub
>>
> 
> 0001-Introduce-new-libc_func_speed-target-hook-PR-middle-.patch
> 
> 
> From bed35715063f9435b697eaf4c9868f81e8556de8 Mon Sep 17 00:00:00 2001
> From: marxin <mliska@suse.cz>
> Date: Wed, 14 Mar 2018 09:44:18 +0100
> Subject: [PATCH] Introduce new libc_func_speed target hook (PR
>  middle-end/81657).
> 
> gcc/ChangeLog:
> 
> 2018-03-14  Martin Liska  <mliska@suse.cz>
> 
> 	PR middle-end/81657
> 	* builtins.c (expand_builtin_memory_copy_args): Handle situation
> 	when libc library provides a fast mempcpy implementation/
> 	* config/linux-protos.h (ix86_linux_libc_func_speed): New.
> 	(TARGET_LIBC_FUNC_SPEED): Likewise.
> 	* config/i386/linux-common.h (SUBTARGET_LIBC_FUNC_SPEED): Define
> 	macro.
> 	* config/i386/t-linux: Add x86-linux.o.
> 	* config.gcc: Likewise.
> 	* config/i386/x86-linux.c: New file.
> 	* coretypes.h (enum libc_speed): Likewise.
> 	* doc/tm.texi: Document new target hook.
> 	* doc/tm.texi.in: Likewise.
> 	* expr.c (emit_block_move_hints): Handle libc bail out argument.
> 	* expr.h (emit_block_move_hints): Add new parameters.
> 	* target.def: Add new hook.
> 	* targhooks.c (enum libc_speed): New enum.
> 	(default_libc_func_speed): Provide a default hook
> 	implementation.
> 	* targhooks.h (default_libc_func_speed): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-03-14  Martin Liska  <mliska@suse.cz>
> 
> 	* gcc.dg/string-opt-1.c: Adjust scans for i386 and glibc target
> 	and others.
This looks pretty reasonable now.  Let's go with it.  If we need to
adjust other targets we certainly can fault them in as their properties
are discovered/updated.

jeff
diff mbox series

Patch

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 85affa74510..c2ca36934f7 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -3651,13 +3651,24 @@  expand_builtin_memory_copy_args (tree dest, tree src, tree len,
   src_mem = get_memory_rtx (src, len);
   set_mem_align (src_mem, src_align);
 
+  bool is_move_done;
+
   /* Copy word part most expediently.  */
   dest_addr = emit_block_move_hints (dest_mem, src_mem, len_rtx,
 				     CALL_EXPR_TAILCALL (exp)
 				     && (endp == 0 || target == const0_rtx)
 				     ? BLOCK_OP_TAILCALL : BLOCK_OP_NORMAL,
 				     expected_align, expected_size,
-				     min_size, max_size, probable_max_size);
+				     min_size, max_size, probable_max_size,
+				     TARGET_HAS_FAST_MEMPCPY_ROUTINE
+				     && endp == 1,
+				     &is_move_done);
+
+  /* Bail out when a mempcpy call would be expanded as libcall and when
+     we have a target that provides a fast implementation
+     of mempcpy routine.  */
+  if (!is_move_done)
+    return NULL_RTX;
 
   if (dest_addr == 0)
     {
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index e43edd77b56..8744d706fd7 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -1914,6 +1914,9 @@  typedef struct ix86_args {
 
 #define CLEAR_RATIO(speed) ((speed) ? MIN (6, ix86_cost->move_ratio) : 2)
 
+/* C library provides fast implementation of mempcpy function.  */
+#define TARGET_HAS_FAST_MEMPCPY_ROUTINE 1
+
 /* Define if shifts truncate the shift count which implies one can
    omit a sign-extension or zero-extension of a shift count.
 
diff --git a/gcc/defaults.h b/gcc/defaults.h
index 78a08a33f12..2e5caac8dcd 100644
--- a/gcc/defaults.h
+++ b/gcc/defaults.h
@@ -1340,6 +1340,13 @@  see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
 #define SET_RATIO(speed) MOVE_RATIO (speed)
 #endif
 
+/* By default do not generate libcall to mempcpy and rather use
+   libcall to memcpy and adjustment of return value.  */
+
+#ifndef TARGET_HAS_FAST_MEMPCPY_ROUTINE
+#define TARGET_HAS_FAST_MEMPCPY_ROUTINE 0
+#endif
+
 /* Supply a default definition of STACK_SAVEAREA_MODE for emit_stack_save.
    Normally move_insn, so Pmode stack pointer.  */
 
diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index bd8b917ba82..0c8a3f3298c 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -6627,6 +6627,11 @@  optimized for speed rather than size.
 If you don't define this, it defaults to the value of @code{MOVE_RATIO}.
 @end defmac
 
+@defmac TARGET_HAS_FAST_MEMPCPY_ROUTINE
+By default do not generate libcall to mempcpy and rather use
+libcall to memcpy and adjustment of return value.
+@end defmac
+
 @defmac USE_LOAD_POST_INCREMENT (@var{mode})
 A C expression used to determine whether a load postincrement is a good
 thing to use for a given mode.  Defaults to the value of
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index b0207146e8c..e7ef85ab78e 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -4560,6 +4560,11 @@  optimized for speed rather than size.
 If you don't define this, it defaults to the value of @code{MOVE_RATIO}.
 @end defmac
 
+@defmac TARGET_HAS_FAST_MEMPCPY_ROUTINE
+By default do not generate libcall to mempcpy and rather use
+libcall to memcpy and adjustment of return value.
+@end defmac
+
 @defmac USE_LOAD_POST_INCREMENT (@var{mode})
 A C expression used to determine whether a load postincrement is a good
 thing to use for a given mode.  Defaults to the value of
diff --git a/gcc/expr.c b/gcc/expr.c
index 00660293f72..b6c13652d79 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -1554,6 +1554,8 @@  compare_by_pieces (rtx arg0, rtx arg1, unsigned HOST_WIDE_INT len,
    MIN_SIZE is the minimal size of block to move
    MAX_SIZE is the maximal size of block to move, if it can not be represented
    in unsigned HOST_WIDE_INT, than it is mask of all ones.
+   If BAIL_OUT_LIBCALL is set true, do not emit library call and set
+   *IS_MOVE_DONE to false.
 
    Return the address of the new block, if memcpy is called and returns it,
    0 otherwise.  */
@@ -1563,12 +1565,17 @@  emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 		       unsigned int expected_align, HOST_WIDE_INT expected_size,
 		       unsigned HOST_WIDE_INT min_size,
 		       unsigned HOST_WIDE_INT max_size,
-		       unsigned HOST_WIDE_INT probable_max_size)
+		       unsigned HOST_WIDE_INT probable_max_size,
+		       bool bail_out_libcall, bool *is_move_done)
 {
   bool may_use_call;
   rtx retval = 0;
   unsigned int align;
 
+  /* When not doing a bail out, we always emit a memory move.  */
+  if (is_move_done)
+    *is_move_done = true;
+
   gcc_assert (size);
   if (CONST_INT_P (size) && INTVAL (size) == 0)
     return 0;
@@ -1625,6 +1632,13 @@  emit_block_move_hints (rtx x, rtx y, rtx size, enum block_op_methods method,
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (x))
 	   && ADDR_SPACE_GENERIC_P (MEM_ADDR_SPACE (y)))
     {
+      if (bail_out_libcall)
+	{
+	  if (is_move_done)
+	    *is_move_done = false;
+	  return retval;
+	}
+
       /* Since x and y are passed to a libcall, mark the corresponding
 	 tree EXPR as addressable.  */
       tree y_expr = MEM_EXPR (y);
diff --git a/gcc/expr.h b/gcc/expr.h
index b3d523bcb24..023bc5aec47 100644
--- a/gcc/expr.h
+++ b/gcc/expr.h
@@ -110,7 +110,9 @@  extern rtx emit_block_move_hints (rtx, rtx, rtx, enum block_op_methods,
 			          unsigned int, HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
 				  unsigned HOST_WIDE_INT,
-				  unsigned HOST_WIDE_INT);
+				  unsigned HOST_WIDE_INT,
+				  bool bail_out_libcall = false,
+				  bool *is_move_done = NULL);
 extern rtx emit_block_cmp_hints (rtx, rtx, rtx, tree, rtx, bool,
 				 by_pieces_constfn, void *);
 extern bool emit_storent_insn (rtx to, rtx from);
diff --git a/gcc/testsuite/gcc.dg/string-opt-1.c b/gcc/testsuite/gcc.dg/string-opt-1.c
index 2f060732bf0..851c8b04a33 100644
--- a/gcc/testsuite/gcc.dg/string-opt-1.c
+++ b/gcc/testsuite/gcc.dg/string-opt-1.c
@@ -48,5 +48,5 @@  main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-not "\<mempcpy\>" } } */
-/* { dg-final { scan-assembler "memcpy" } } */
+/* { dg-final { scan-assembler-not "\<mempcpy\>" { target { i?86-*-* x86_64-*-* } } } } */
+/* { dg-final { scan-assembler "memcpy" { target  { ! { i?86-*-* x86_64-*-* } } } } } */