From patchwork Mon Mar 7 17:36:26 2016
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "H.J. Lu"
X-Patchwork-Id: 593075
From: "H.J. Lu"
To: libc-alpha@sourceware.org
Cc: Ondrej Bilka
Subject: [PATCH 3/7] Remove L(overlapping) from memcpy-sse2-unaligned.S
Date: Mon, 7 Mar 2016 09:36:26 -0800
Message-Id: <1457372190-12196-4-git-send-email-hjl.tools@gmail.com>
In-Reply-To: <1457372190-12196-1-git-send-email-hjl.tools@gmail.com>
References: <1457372190-12196-1-git-send-email-hjl.tools@gmail.com>

Since memcpy doesn't need to check overlapping source and destination,
we can remove L(overlapping).

	[BZ #19776]
	* sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S
	(L(overlapping)): Removed.
---
 sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S | 47 +-----------------------
 1 file changed, 2 insertions(+), 45 deletions(-)

diff --git a/sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S b/sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S
index 19d8aa6..335a498 100644
--- a/sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S
+++ b/sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S
@@ -25,12 +25,8 @@
 
 ENTRY(__memcpy_sse2_unaligned)
 	movq	%rdi, %rax
-	movq	%rsi, %r11
-	leaq	(%rdx,%rdx), %rcx
-	subq	%rdi, %r11
-	subq	%rdx, %r11
-	cmpq	%rcx, %r11
-	jb	L(overlapping)
+	testq	%rdx, %rdx
+	je	L(return)
 	cmpq	$16, %rdx
 	jbe	L(less_16)
 	movdqu	(%rsi), %xmm8
@@ -89,45 +85,6 @@ L(loop):
 	cmpq	%rcx, %rdx
 	jne	L(loop)
 	ret
-L(overlapping):
-	testq	%rdx, %rdx
-	.p2align 4,,5
-	je	L(return)
-	movq	%rdx, %r9
-	leaq	16(%rsi), %rcx
-	leaq	16(%rdi), %r8
-	shrq	$4, %r9
-	movq	%r9, %r11
-	salq	$4, %r11
-	cmpq	%rcx, %rdi
-	setae	%cl
-	cmpq	%r8, %rsi
-	setae	%r8b
-	orl	%r8d, %ecx
-	cmpq	$15, %rdx
-	seta	%r8b
-	testb	%r8b, %cl
-	je	.L21
-	testq	%r11, %r11
-	je	.L21
-	xorl	%ecx, %ecx
-	xorl	%r8d, %r8d
-.L7:
-	movdqu	(%rsi,%rcx), %xmm8
-	addq	$1, %r8
-	movdqu	%xmm8, (%rdi,%rcx)
-	addq	$16, %rcx
-	cmpq	%r8, %r9
-	ja	.L7
-	cmpq	%r11, %rdx
-	je	L(return)
-.L21:
-	movzbl	(%rsi,%r11), %ecx
-	movb	%cl, (%rdi,%r11)
-	addq	$1, %r11
-	cmpq	%r11, %rdx
-	ja	.L21
-	ret
 L(less_16):
 	testb	$24, %dl
 	jne	L(between_9_16)
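
For readers who find the entry sequence easier to follow in C, here is a
rough transliteration of the six removed instructions and the two that
replace them. It is only a sketch: the helper names
took_overlapping_branch and returns_immediately are hypothetical and do
not exist in glibc, and the pointers are modelled as plain 64-bit
integers so that the unsigned wrap-around of subq/cmpq/jb is explicit.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not part of glibc: mirrors the six instructions
   removed from the entry of __memcpy_sse2_unaligned.  dst, src and n
   stand for %rdi, %rsi and %rdx.  */
static bool
took_overlapping_branch (uint64_t dst, uint64_t src, uint64_t n)
{
  uint64_t r11 = src;		/* movq %rsi, %r11 */
  uint64_t rcx = n + n;		/* leaq (%rdx,%rdx), %rcx */
  r11 -= dst;			/* subq %rdi, %r11 */
  r11 -= n;			/* subq %rdx, %r11 */
  return r11 < rcx;		/* cmpq %rcx, %r11; jb L(overlapping) */
}

/* Hypothetical helper: the two instructions the patch adds instead.  */
static bool
returns_immediately (uint64_t n)
{
  return n == 0;		/* testq %rdx, %rdx; je L(return) */
}
```

With this model, took_overlapping_branch (1000, 1100, 100) is true
because src - dst - n is 0, which is below 2 * n, while
took_overlapping_branch (1000, 1050, 100) is false because the
subtraction wraps to a very large unsigned value. After the patch only
the zero-length test remains at entry.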