From patchwork Wed Apr 17 14:21:00 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Jan Hubicka <hubicka@ucw.cz>
X-Patchwork-Id: 237235
Return-Path: 
 <gcc-patches-return-340037-incoming=patchwork.ozlabs.org@gcc.gnu.org>
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Received: from sourceware.org (server1.sourceware.org [209.132.180.131])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(Client CN "localhost", Issuer "www.qmailtoaster.com" (not verified))
	by ozlabs.org (Postfix) with ESMTPS id 941EA2C012F
	for <incoming@patchwork.ozlabs.org>;
	Thu, 18 Apr 2013 00:21:24 +1000 (EST)
Received: (qmail 10246 invoked by alias); 17 Apr 2013 14:21:16 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Unsubscribe: <mailto:gcc-patches-unsubscribe-##L=##H@gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Delivered-To: mailing list gcc-patches@gcc.gnu.org
Received: (qmail 10232 invoked by uid 89); 17 Apr 2013 14:21:16 -0000
Received: from nikam.ms.mff.cuni.cz (HELO nikam.ms.mff.cuni.cz)
	(195.113.20.16) by sourceware.org
	(qpsmtpd/0.84/v0.84-167-ge50287c) with ESMTP;
	Wed, 17 Apr 2013 14:21:03 +0000
Received: by nikam.ms.mff.cuni.cz (Postfix, from userid 16202)	id
	F272C542FEF; Wed, 17 Apr 2013 16:21:00 +0200 (CEST)
Date: Wed, 17 Apr 2013 16:21:00 +0200
From: Jan Hubicka <hubicka@ucw.cz>
To: Michael Zolotukhin <michael.v.zolotukhin@gmail.com>
Cc: Jan Hubicka <hubicka@ucw.cz>,
	"gcc-patches@gcc.gnu.org" <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH, x86] Use vector moves in memmove expanding
Message-ID: <20130417142100.GA10525@kam.mff.cuni.cz>
References: 
 <CANtU07_xUQHqFVhc=xXcXC1T0c37FhW+F9O8BgHtnoq2LNsEYw@mail.gmail.com>
MIME-Version: 1.0
Content-Disposition: inline
In-Reply-To: 
 <CANtU07_xUQHqFVhc=xXcXC1T0c37FhW+F9O8BgHtnoq2LNsEYw@mail.gmail.com>
User-Agent: Mutt/1.5.20 (2009-06-14)

> 
> Bootstrap/make check/Specs2k are passing on i686 and x86_64.
Thanks for returning to this!

glibc has a quite comprehensive testsuite for string operations. It may be useful to test the patch
with -minline-all-stringops -mstringop-strategy=vector_loop.
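As a rough complement to the glibc tests, a minimal self-check in the same spirit could be compiled with gcc -O2 -minline-all-stringops -mstringop-strategy=vector_loop. This is only a sketch: check_copy and sweep are hypothetical helpers, not part of glibc or GCC, and it assumes the variable-size memcpy calls actually get inlined under those flags. It sweeps sizes 0..256 and all 16x16 source/destination misalignments and checks guard bytes on both sides of the destination, which is where prologue/epilogue mistakes tend to show up.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical test helpers, not glibc's testsuite.  */
enum { BUF = 512, GUARD = 16 };

/* Copy N bytes from SRC+SOFF to DST+GUARD+DOFF and verify every byte of
   the destination buffer, including untouched guard bytes (0xAA).
   Returns 1 on success, 0 on mismatch.  */
static int check_copy (size_t n, size_t soff, size_t doff)
{
  unsigned char src[BUF], dst[BUF];
  assert (GUARD + doff + n + GUARD <= BUF && soff + n <= BUF);

  for (size_t i = 0; i < BUF; i++)
    {
      src[i] = (unsigned char) (i * 7 + 1);
      dst[i] = 0xAA;
    }

  memcpy (dst + GUARD + doff, src + soff, n);

  size_t lo = GUARD + doff;
  for (size_t i = 0; i < BUF; i++)
    {
      unsigned char want = (i >= lo && i < lo + n) ? src[soff + (i - lo)] : 0xAA;
      if (dst[i] != want)
        return 0;
    }
  return 1;
}

/* Sweep sizes 0..256 and all 16 source and destination misalignments.  */
static int sweep (void)
{
  for (size_t n = 0; n <= 256; n++)
    for (size_t soff = 0; soff < 16; soff++)
      for (size_t doff = 0; doff < 16; doff++)
        if (!check_copy (n, soff, doff))
          return 0;
  return 1;
}
```

Running it once with the default strategy and once with vector_loop should give the same (successful) result.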

I tested the patch on my Core notebook with my memcpy micro benchmark.
The vector loop is not a win, since apparently we do not produce any SSE code for 64-bit
compilation. Which CPUs and block sizes is this intended for?
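For comparing block sizes, a timing loop along these lines is one option. It is only a sketch: ns_per_copy and the 64 KiB buffers are made up here and are not the benchmark mentioned above, and it assumes POSIX clock_gettime with CLOCK_MONOTONIC is available.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical micro benchmark buffers.  */
static unsigned char bench_src[1 << 16], bench_dst[1 << 16];
static volatile unsigned char bench_sink;

/* Average nanoseconds per memcpy of N bytes over ITERS iterations,
   or -1.0 if N does not fit the buffers or ITERS is zero.  */
static double ns_per_copy (size_t n, size_t iters)
{
  struct timespec t0, t1;

  if (n > sizeof bench_dst || iters == 0)
    return -1.0;

  clock_gettime (CLOCK_MONOTONIC, &t0);
  for (size_t i = 0; i < iters; i++)
    {
      memcpy (bench_dst, bench_src, n);
      /* Read one byte back so the copy is not treated as dead.  */
      if (n)
        bench_sink = bench_dst[i % n];
    }
  clock_gettime (CLOCK_MONOTONIC, &t1);

  double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
  return ns / (double) iters;
}
```

Calling it for, say, 16, 64, 256 and 4096 bytes with and without -mstringop-strategy=vector_loop gives a per-size comparison; absolute numbers depend heavily on the CPU.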

Also, the inner loop with -march=native seems to come out as:
.L7:
        movq    (%rsi,%r8), %rax
        movq    8(%rsi,%r8), %rdx
        movq    48(%rsi,%r8), %r9
        movq    56(%rsi,%r8), %r10
        movdqu  16(%rsi,%r8), %xmm3
        movdqu  32(%rsi,%r8), %xmm1
        movq    %rax, (%rdi,%r8)
        movq    %rdx, 8(%rdi,%r8)
        movdqa  %xmm3, 16(%rdi,%r8)
        movdqa  %xmm1, 32(%rdi,%r8)
        movq    %r9, 48(%rdi,%r8)
        movq    %r10, 56(%rdi,%r8)
        addq    $64, %r8
        cmpq    %r11, %r8

That is not much of an SSE enablement, since the register allocator seems to home some of the values in integer registers.
Could you please look into it?
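For reference, the shape one would expect from a vector loop is four 16-byte loads followed by four 16-byte stores per 64-byte iteration, all through XMM registers. A sketch with SSE2 intrinsics follows; copy64_sse2 is a hypothetical name, it assumes n is a multiple of 64, and it uses unaligned loads and stores so that no alignment prologue is needed.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>

/* Hypothetical sketch: copy N bytes (N a multiple of 64), 64 bytes per
   iteration, keeping all data in XMM registers.  */
static void copy64_sse2 (void *dst, const void *src, size_t n)
{
  unsigned char *d = dst;
  const unsigned char *s = src;

  assert (n % 64 == 0);
  for (size_t i = 0; i < n; i += 64)
    {
      /* Issue all loads first, then all stores.  */
      __m128i a = _mm_loadu_si128 ((const __m128i *) (s + i));
      __m128i b = _mm_loadu_si128 ((const __m128i *) (s + i + 16));
      __m128i c = _mm_loadu_si128 ((const __m128i *) (s + i + 32));
      __m128i e = _mm_loadu_si128 ((const __m128i *) (s + i + 48));
      _mm_storeu_si128 ((__m128i *) (d + i), a);
      _mm_storeu_si128 ((__m128i *) (d + i + 16), b);
      _mm_storeu_si128 ((__m128i *) (d + i + 32), c);
      _mm_storeu_si128 ((__m128i *) (d + i + 48), e);
    }
}
```

Compiled with -O2 on x86_64 this should keep every load and store in XMM registers (movdqu or movups, depending on the compiler version), with no movq through general-purpose registers.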
> 
> Changelog entry:
> 
> 2013-04-10  Michael Zolotukhin  <michael.v.zolotukhin@gmail.com>
> 
>         * config/i386/i386-opts.h (enum stringop_alg): Add vector_loop.
>         * config/i386/i386.c (expand_set_or_movmem_via_loop): Use
>         adjust_address instead of change_address to keep info about alignment.
>         (emit_strmov): Remove.
>         (emit_memmov): New function.
>         (expand_movmem_epilogue): Refactor to properly handle bigger sizes.
>         (expand_movmem_epilogue): Likewise and return updated rtx for
>         destination.
>         (expand_constant_movmem_prologue): Likewise and return updated rtx for
>         destination and source.
>         (decide_alignment): Refactor, handle vector_loop.
>         (ix86_expand_movmem): Likewise.
>         (ix86_expand_setmem): Likewise.
>         * config/i386/i386.opt (Enum): Add vector_loop to option stringop_alg.
>         * emit-rtl.c (get_mem_align_offset): Compute alignment for MEM_REF.

+    }
   else
     return -1;
 
This change ought to go in independently; I cannot review it.
I will take a first look over the patch shortly, but please send an updated patch fixing
the problem with integer registers.

Honza

diff --git a/gcc/emit-rtl.c b/gcc/emit-rtl.c
index 73a59b5..edb59da 100644
--- a/gcc/emit-rtl.c
+++ b/gcc/emit-rtl.c
@@ -1565,6 +1565,18 @@ get_mem_align_offset (rtx mem, unsigned int align)
 	  expr = inner;
 	}
     }
+  else if (TREE_CODE (expr) == MEM_REF)
+    {
+      tree base = TREE_OPERAND (expr, 0);
+      tree byte_offset = TREE_OPERAND (expr, 1);
+      if (TREE_CODE (base) != ADDR_EXPR
+	  || TREE_CODE (byte_offset) != INTEGER_CST)
+	return -1;
+      if (!DECL_P (TREE_OPERAND (base, 0))
+	  || DECL_ALIGN (TREE_OPERAND (base, 0)) < align)

Can you use TYPE_ALIGN here? In general, can't we replace all the GIMPLE
handling with get_object_alignment?
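To make the arithmetic of this MEM_REF case concrete: assuming get_mem_align_offset ends, as for the other cases, by masking the accumulated byte offset with the alignment in bytes, the new branch reduces to the standalone model below. mem_ref_align_offset and UNIT_BITS are hypothetical names, not GCC internals.

```c
#include <assert.h>

/* Hypothetical model, not GCC code.  Alignments are in bits (like
   DECL_ALIGN); offsets are in bytes (like MEM_OFFSET and the MEM_REF
   byte-offset operand).  */
enum { UNIT_BITS = 8 };

/* Return the offset of the access past the last ALIGN-aligned boundary,
   or -1 if the base declaration is less aligned than ALIGN.  */
static long
mem_ref_align_offset (unsigned decl_align_bits, unsigned align,
                      unsigned long mem_offset, unsigned long byte_offset)
{
  unsigned long align_bytes = align / UNIT_BITS;

  /* The mask below is only valid for a power-of-two byte alignment.  */
  assert (align % UNIT_BITS == 0 && align_bytes != 0
          && (align_bytes & (align_bytes - 1)) == 0);

  /* Mirrors the DECL_ALIGN (...) < align check in the hunk.  */
  if (decl_align_bits < align)
    return -1;

  /* Mirrors offset += tree_low_cst (byte_offset, 1).  */
  unsigned long offset = mem_offset + byte_offset;
  return (long) (offset & (align_bytes - 1));
}
```

For example, a 128-bit aligned decl accessed at byte offset 20 with align = 128 yields 20 & 15 = 4.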

+	return -1;
+      offset += tree_low_cst (byte_offset, 1);