From patchwork Wed May 16 08:34:20 2018
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Simon Guo
X-Patchwork-Id: 914273
From: wei.guo.simon@gmail.com
To: linuxppc-dev@lists.ozlabs.org
Cc: "Naveen N. Rao", Simon Guo, Cyril Bur
Subject: [PATCH v4 3/4] powerpc/64: add 32 bytes prechecking before using VMX optimization on memcmp()
Date: Wed, 16 May 2018 16:34:20 +0800
Message-Id: <1526459661-17323-4-git-send-email-wei.guo.simon@gmail.com>
X-Mailer: git-send-email 1.8.3.1
In-Reply-To: <1526459661-17323-1-git-send-email-wei.guo.simon@gmail.com>
References: <1526459661-17323-1-git-send-email-wei.guo.simon@gmail.com>
List-Id: Linux on PowerPC Developers Mail List

From: Simon Guo

This patch builds on the previous VMX patch for memcmp().

Optimizing ppc64 memcmp() with VMX instructions comes with a penalty:
if the kernel uses VMX instructions, it must save/restore the current
thread's VMX registers. There are 32 x 128-bit VMX registers on PPC,
which means 32 x 16 = 512 bytes to load and store.

The major concern about memcmp() performance in the kernel is KSM,
which calls memcmp() frequently to merge identical pages. So it makes
sense to look at KSM specifically and see whether any improvement can
be made there.

In the following mail, Cyril Bur pointed out that memcmp() calls from
KSM are likely to fail (mismatch) early, within the first bytes:
https://patchwork.ozlabs.org/patch/817322/#1773629

This patch is a follow-up on that observation.

Testing shows that KSM memcmp() does tend to fail within the first
32 bytes. More specifically:
- 76% of cases fail/mismatch within the first 16 bytes;
- 83% of cases fail/mismatch within the first 32 bytes;
- 84% of cases fail/mismatch within the first 64 bytes.

So 32 bytes looks like a better pre-check size than the alternatives.

This patch adds a 32-byte pre-check before entering the VMX path, to
avoid the unnecessary VMX penalty. Testing shows a ~20% improvement in
average memcmp() execution time with this patch. The detailed data and
analysis are at:
https://github.com/justdoitqd/publicFiles/blob/master/memcmp/README.md

Any suggestions are welcome.

Signed-off-by: Simon Guo
---
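For readers who prefer C to asm, the pre-check added below is roughly
equivalent to the following user-space sketch. It is illustrative only
and not the actual kernel code: the function name and the vmx_cmp()
callback are made up for the example, and it assumes the same
preconditions as the asm path (length >= 32 and both buffers entering
with the same 8-byte alignment offset).

#include <stdint.h>
#include <string.h>

/*
 * Illustrative sketch of the 32-byte pre-check: compare 4 doublewords
 * (32 bytes) with plain integer loads before paying the VMX
 * save/restore cost.  "vmx_cmp" stands in for the VMX comparison path.
 */
static int precheck_then_vmx(const void *s1, const void *s2, size_t n,
			     int (*vmx_cmp)(const void *, const void *,
					    size_t))
{
	const uint64_t *a = s1, *b = s2;
	size_t i;

	/* First 32 bytes: no VMX registers touched. */
	for (i = 0; i < 4; i++) {
		if (a[i] != b[i]) {
			/* Mismatch found early: resolve it byte-wise,
			 * much like .LcmpAB_lightweight does in the asm. */
			return memcmp(&a[i], &b[i], 8);
		}
	}

	/* Only now enter the VMX path for the remaining bytes. */
	return vmx_cmp(a + 4, b + 4, n - 32);
}

On the ~80% of KSM calls that mismatch within the first 32 bytes, the
thread's VMX state is never saved or restored, which is the saving the
commit message quantifies as ~20% on average.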
 arch/powerpc/lib/memcmp_64.S | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/arch/powerpc/lib/memcmp_64.S b/arch/powerpc/lib/memcmp_64.S
index 6303bbf..df2eec0 100644
--- a/arch/powerpc/lib/memcmp_64.S
+++ b/arch/powerpc/lib/memcmp_64.S
@@ -405,6 +405,35 @@ _GLOBAL(memcmp)
 	/* Enter with src/dst addrs has the same offset with 8 bytes
 	 * align boundary
 	 */
+
+#ifdef CONFIG_KSM
+	/* KSM will always compare at page boundary so it falls into
+	 * .Lsameoffset_vmx_cmp.
+	 *
+	 * There is an optimization for KSM based on the following fact:
+	 * KSM page memcmp() tends to fail early, within the first bytes.
+	 * Sampled statistics show that 76% of KSM memcmp() calls fail
+	 * within the first 16 bytes, 83% within the first 32 bytes, and
+	 * 84% within the first 64 bytes.
+	 *
+	 * Before applying VMX instructions, which incur a 32 x 128-bit
+	 * VMX register load/restore penalty, compare the first 32 bytes
+	 * so that we can catch the ~80% of cases that fail early.
+	 */
+
+	li	r0,4
+	mtctr	r0
+.Lksm_32B_loop:
+	LD	rA,0,r3
+	LD	rB,0,r4
+	cmpld	cr0,rA,rB
+	addi	r3,r3,8
+	addi	r4,r4,8
+	bne	cr0,.LcmpAB_lightweight
+	addi	r5,r5,-8
+	bdnz	.Lksm_32B_loop
+#endif
+
 	ENTER_VMX_OPS
 	beq	cr1,.Llong_novmx_cmp
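A side note on the 16/32/64-byte figures quoted above: the actual data
and methodology are in the linked README. Purely as an illustration, a
first-mismatch histogram of that shape could be collected in user space
with something like the sketch below; all names here are made up for
the example and this is not the instrumentation the author used.

#include <stdio.h>
#include <stddef.h>

#define PAGE_SZ 4096

/* Offset of the first differing byte, or PAGE_SZ if the pages match. */
static size_t first_diff(const unsigned char *a, const unsigned char *b)
{
	size_t i;

	for (i = 0; i < PAGE_SZ; i++)
		if (a[i] != b[i])
			return i;
	return PAGE_SZ;
}

/* Bucket a mismatch offset; cumulative percentages like those in the
 * commit message follow by summing the buckets. */
static void account(size_t off, unsigned long hist[4])
{
	if (off < 16)
		hist[0]++;
	else if (off < 32)
		hist[1]++;
	else if (off < 64)
		hist[2]++;
	else
		hist[3]++;
}

int main(void)
{
	static unsigned char p1[PAGE_SZ], p2[PAGE_SZ];
	unsigned long hist[4] = { 0 };

	p2[20] = 1;	/* fake a pair of pages differing at offset 20 */
	account(first_diff(p1, p2), hist);

	printf("<16B: %lu  16-31B: %lu  32-63B: %lu  >=64B: %lu\n",
	       hist[0], hist[1], hist[2], hist[3]);
	return 0;
}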