From patchwork Wed Apr 12 01:35:52 2017
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Matt Brown <matthew.brown.dev@gmail.com>
X-Patchwork-Id: 749700
From: Matt Brown <matthew.brown.dev@gmail.com>
To: linuxppc-dev@lists.ozlabs.org
Subject: [PATCH v3 2/2] raid6/altivec: Add vpermxor implementation for raid6 Q syndrome
Date: Wed, 12 Apr 2017 11:35:52 +1000
Message-Id: <20170412013552.21650-2-matthew.brown.dev@gmail.com>
X-Mailer: git-send-email 2.9.3
In-Reply-To: <20170412013552.21650-1-matthew.brown.dev@gmail.com>
References: <20170412013552.21650-1-matthew.brown.dev@gmail.com>
Cc: linux-raid@vger.kernel.org, dja@axtens.net

The raid6 Q syndrome generation has been optimised using the vpermxor
instruction, which was made available in POWER8 (ISA v2.07). It combines
the work of a vperm and a vxor into a single instruction. The new code has
been tested for correctness on a ppc64le VM with a basic RAID6 setup
containing 5 drives.

The performance benchmarks come from raid6test, the userspace harness in
lib/raid6/test, run on an IBM Firestone (ppc64le) machine. They show a 35%
speed increase over the fastest existing powerpc implementation (altivec).
raid6test has also been run on a big-endian ppc64 VM to confirm that the
implementation works on big-endian systems.

Performance benchmarks:
        raid6: altivecx4 gen() 18773 MB/s
        raid6: altivecx8 gen() 19438 MB/s
        raid6: vpermxor4 gen() 25112 MB/s
        raid6: vpermxor8 gen() 26279 MB/s

Note: Fixed a minor bug in altivec.uc regarding missing and mismatched
ifdef statements.

Signed-off-by: Matt Brown <matthew.brown.dev@gmail.com>
---
Changelog
v2:
 - Change CONFIG_ALTIVEC to CPU_FTR_ALTIVEC_COMP
 - Separate the bug fix into a different patch
---
 include/linux/raid/pq.h |   4 ++
 lib/raid6/Makefile      |  27 ++++++++++++-
 lib/raid6/algos.c       |   4 ++
 lib/raid6/altivec.uc    |   3 ++
 lib/raid6/test/Makefile |  14 ++++++-
 lib/raid6/vpermxor.uc   | 104 ++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 154 insertions(+), 2 deletions(-)
 create mode 100644 lib/raid6/vpermxor.uc
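A brief aside on the arithmetic, not part of the patch itself: vpermxor
uses the high and low nibble of each byte of its third source register as
indexes into its first and second source registers, and XORs the two bytes
it selects. With the gf_high/gf_low constants defined in vpermxor.uc below,
one vpermxor therefore multiplies every byte of the Q accumulator by 2 in
GF(2^8), which is exactly the per-disk step the Q syndrome needs. A rough
scalar sketch of that per-byte operation (illustrative, made-up names;
nothing here appears in the kernel sources) looks like:

    /* Illustrative only; not part of the kernel patch. */
    #include <stdint.h>

    /* 2 * low_nibble in GF(2^8); values stay below 0x80, so no reduction */
    static const uint8_t gf_low_nib[16] = {
            0x00, 0x02, 0x04, 0x06, 0x08, 0x0a, 0x0c, 0x0e,
            0x10, 0x12, 0x14, 0x16, 0x18, 0x1a, 0x1c, 0x1e,
    };

    /* 2 * (high_nibble << 4) in GF(2^8), reduced by the polynomial 0x11d */
    static const uint8_t gf_high_nib[16] = {
            0x00, 0x20, 0x40, 0x60, 0x80, 0xa0, 0xc0, 0xe0,
            0x1d, 0x3d, 0x5d, 0x7d, 0x9d, 0xbd, 0xdd, 0xfd,
    };

    /* What one byte lane of "vpermxor wq, gf_high, gf_low, wq" computes */
    static uint8_t gf2_mul2(uint8_t x)
    {
            return gf_high_nib[x >> 4] ^ gf_low_nib[x & 0x0f];
    }

The vector constants in the patch carry the same values, ordered to suit
the byte numbering vpermxor uses inside the register; the existing
altivec.uc code needs a separate shift, mask and XOR sequence for the
same multiply-by-2.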
diff --git a/include/linux/raid/pq.h b/include/linux/raid/pq.h
index 4d57bba..3df9aa6 100644
--- a/include/linux/raid/pq.h
+++ b/include/linux/raid/pq.h
@@ -107,6 +107,10 @@ extern const struct raid6_calls raid6_avx512x2;
 extern const struct raid6_calls raid6_avx512x4;
 extern const struct raid6_calls raid6_tilegx8;
 extern const struct raid6_calls raid6_s390vx8;
+extern const struct raid6_calls raid6_vpermxor1;
+extern const struct raid6_calls raid6_vpermxor2;
+extern const struct raid6_calls raid6_vpermxor4;
+extern const struct raid6_calls raid6_vpermxor8;
 
 struct raid6_recov_calls {
 	void (*data2)(int, size_t, int, int, void **);
diff --git a/lib/raid6/Makefile b/lib/raid6/Makefile
index 3057011..7775aad 100644
--- a/lib/raid6/Makefile
+++ b/lib/raid6/Makefile
@@ -4,7 +4,8 @@ raid6_pq-y	+= algos.o recov.o tables.o int1.o int2.o int4.o \
 		   int8.o int16.o int32.o
 
 raid6_pq-$(CONFIG_X86) += recov_ssse3.o recov_avx2.o mmx.o sse1.o sse2.o avx2.o avx512.o recov_avx512.o
-raid6_pq-$(CONFIG_ALTIVEC) += altivec1.o altivec2.o altivec4.o altivec8.o
+raid6_pq-$(CONFIG_ALTIVEC) += altivec1.o altivec2.o altivec4.o altivec8.o \
+			      vpermxor1.o vpermxor2.o vpermxor4.o vpermxor8.o
 raid6_pq-$(CONFIG_KERNEL_MODE_NEON) += neon.o neon1.o neon2.o neon4.o neon8.o
 raid6_pq-$(CONFIG_TILEGX) += tilegx8.o
 raid6_pq-$(CONFIG_S390) += s390vx8.o recov_s390xc.o
@@ -88,6 +89,30 @@ $(obj)/altivec8.c:   UNROLL := 8
 $(obj)/altivec8.c:   $(src)/altivec.uc $(src)/unroll.awk FORCE
 	$(call if_changed,unroll)
 
+CFLAGS_vpermxor1.o += $(altivec_flags)
+targets += vpermxor1.c
+$(obj)/vpermxor1.c: UNROLL := 1
+$(obj)/vpermxor1.c: $(src)/vpermxor.uc $(src)/unroll.awk FORCE
+	$(call if_changed,unroll)
+
+CFLAGS_vpermxor2.o += $(altivec_flags)
+targets += vpermxor2.c
+$(obj)/vpermxor2.c: UNROLL := 2
+$(obj)/vpermxor2.c: $(src)/vpermxor.uc $(src)/unroll.awk FORCE
+	$(call if_changed,unroll)
+
+CFLAGS_vpermxor4.o += $(altivec_flags)
+targets += vpermxor4.c
+$(obj)/vpermxor4.c: UNROLL := 4
+$(obj)/vpermxor4.c: $(src)/vpermxor.uc $(src)/unroll.awk FORCE
+	$(call if_changed,unroll)
+
+CFLAGS_vpermxor8.o += $(altivec_flags)
+targets += vpermxor8.c
+$(obj)/vpermxor8.c: UNROLL := 8
+$(obj)/vpermxor8.c: $(src)/vpermxor.uc $(src)/unroll.awk FORCE
+	$(call if_changed,unroll)
+
 CFLAGS_neon1.o += $(NEON_FLAGS)
 targets += neon1.c
 $(obj)/neon1.c:   UNROLL := 1
diff --git a/lib/raid6/algos.c b/lib/raid6/algos.c
index 7857049..edd4f69 100644
--- a/lib/raid6/algos.c
+++ b/lib/raid6/algos.c
@@ -74,6 +74,10 @@ const struct raid6_calls * const raid6_algos[] = {
 	&raid6_altivec2,
 	&raid6_altivec4,
 	&raid6_altivec8,
+	&raid6_vpermxor1,
+	&raid6_vpermxor2,
+	&raid6_vpermxor4,
+	&raid6_vpermxor8,
 #endif
 #if defined(CONFIG_TILEGX)
 	&raid6_tilegx8,
diff --git a/lib/raid6/altivec.uc b/lib/raid6/altivec.uc
index 682aae8..d20ed0d 100644
--- a/lib/raid6/altivec.uc
+++ b/lib/raid6/altivec.uc
@@ -24,10 +24,13 @@
 
 #include <linux/raid/pq.h>
 
+#ifdef CONFIG_ALTIVEC
+
 #include <altivec.h>
 #ifdef __KERNEL__
 # include <asm/cputable.h>
 # include <asm/switch_to.h>
+#endif /* __KERNEL__ */
 
 /*
  * This is the C data type to use.  We use a vector of
diff --git a/lib/raid6/test/Makefile b/lib/raid6/test/Makefile
index 2c7b60e..9c333e9 100644
--- a/lib/raid6/test/Makefile
+++ b/lib/raid6/test/Makefile
@@ -97,6 +97,18 @@ altivec4.c: altivec.uc ../unroll.awk
 altivec8.c: altivec.uc ../unroll.awk
 	$(AWK) ../unroll.awk -vN=8 < altivec.uc > $@
 
+vpermxor1.c: vpermxor.uc ../unroll.awk
+	$(AWK) ../unroll.awk -vN=1 < vpermxor.uc > $@
+
+vpermxor2.c: vpermxor.uc ../unroll.awk
+	$(AWK) ../unroll.awk -vN=2 < vpermxor.uc > $@
+
+vpermxor4.c: vpermxor.uc ../unroll.awk
+	$(AWK) ../unroll.awk -vN=4 < vpermxor.uc > $@
+
+vpermxor8.c: vpermxor.uc ../unroll.awk
+	$(AWK) ../unroll.awk -vN=8 < vpermxor.uc > $@
+
 int1.c: int.uc ../unroll.awk
 	$(AWK) ../unroll.awk -vN=1 < int.uc > $@
 
@@ -122,7 +134,7 @@ tables.c: mktables
 	./mktables > tables.c
 
 clean:
-	rm -f *.o *.a mktables mktables.c *.uc int*.c altivec*.c neon*.c tables.c raid6test
+	rm -f *.o *.a mktables mktables.c *.uc int*.c altivec*.c vpermxor*.c neon*.c tables.c raid6test
 	rm -f tilegx*.c
 
 spotless: clean
diff --git a/lib/raid6/vpermxor.uc b/lib/raid6/vpermxor.uc
new file mode 100644
index 0000000..31a324d
--- /dev/null
+++ b/lib/raid6/vpermxor.uc
@@ -0,0 +1,104 @@
+/*
+ * Copyright 2017, Matt Brown, IBM Corp.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ *
+ * vpermxor$#.c
+ *
+ * Based on H. Peter Anvin's paper - The mathematics of RAID-6
+ *
+ * $#-way unrolled portable integer math RAID-6 instruction set
+ * This file is postprocessed using unroll.awk
+ *
+ * vpermxor$#.c makes use of the vpermxor opcode to optimise the RAID6 Q
+ * syndrome calculations.
+ * This can be run on systems which have both Altivec and the vpermxor opcode.
+ *
+ * This instruction was introduced in POWER8 - ISA v2.07.
+ */
+
+#include <linux/raid/pq.h>
+#ifdef CONFIG_ALTIVEC
+
+#include <altivec.h>
+#ifdef __KERNEL__
+#include <asm/cputable.h>
+#include <asm/switch_to.h>
+#endif
+
+typedef vector unsigned char unative_t;
+#define NSIZE sizeof(unative_t)
+
+static const vector unsigned char gf_low = {0x1e, 0x1c, 0x1a, 0x18, 0x16, 0x14,
+					    0x12, 0x10, 0x0e, 0x0c, 0x0a, 0x08,
+					    0x06, 0x04, 0x02, 0x00};
+static const vector unsigned char gf_high = {0xfd, 0xdd, 0xbd, 0x9d, 0x7d, 0x5d,
+					     0x3d, 0x1d, 0xe0, 0xc0, 0xa0, 0x80,
+					     0x60, 0x40, 0x20, 0x00};
+
+static void noinline raid6_vpermxor$#_gen_syndrome_real(int disks, size_t bytes,
+							void **ptrs)
+{
+	u8 **dptr = (u8 **)ptrs;
+	u8 *p, *q;
+	int d, z, z0;
+	unative_t wp$$, wq$$, wd$$;
+
+	z0 = disks - 3;		/* Highest data disk */
+	p = dptr[z0+1];		/* XOR parity */
+	q = dptr[z0+2];		/* RS syndrome */
+
+	for (d = 0; d < bytes; d += NSIZE*$#) {
+		wp$$ = wq$$ = *(unative_t *)&dptr[z0][d+$$*NSIZE];
+
+		for (z = z0-1; z>=0; z--) {
+			wd$$ = *(unative_t *)&dptr[z][d+$$*NSIZE];
+			/* P syndrome */
+			wp$$ = vec_xor(wp$$, wd$$);
+
+			/* Q syndrome */
+			asm("vpermxor %0,%1,%2,%3":"=v"(wq$$):"v"(gf_high), "v"(gf_low), "v"(wq$$));
+			wq$$ = vec_xor(wq$$, wd$$);
+		}
+		*(unative_t *)&p[d+NSIZE*$$] = wp$$;
+		*(unative_t *)&q[d+NSIZE*$$] = wq$$;
+	}
+}
+
+static void raid6_vpermxor$#_gen_syndrome(int disks, size_t bytes, void **ptrs)
+{
+	preempt_disable();
+	enable_kernel_altivec();
+
+	raid6_vpermxor$#_gen_syndrome_real(disks, bytes, ptrs);
+
+	disable_kernel_altivec();
+	preempt_enable();
+}
+
+int raid6_have_altivec_vpermxor(void);
+#if $# == 1
+int raid6_have_altivec_vpermxor(void)
+{
+	/* Check if CPU has both Altivec and the vpermxor instruction */
+# ifdef __KERNEL__
+	return (cpu_has_feature(CPU_FTR_ALTIVEC_COMP) &&
+		cpu_has_feature(CPU_FTR_ARCH_207S));
+# else
+	return 1;
+# endif
+
+}
+#endif
+
+const struct raid6_calls raid6_vpermxor$# = {
+	raid6_vpermxor$#_gen_syndrome,
+	NULL,
+	raid6_have_altivec_vpermxor,
+	"vpermxor$#",
+	0
+};
+#endif
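As a reading aid, and again not part of the patch: the unrolled vector loop
above follows the usual Horner evaluation of the Q polynomial from H. Peter
Anvin's RAID-6 paper, one vpermxor plus one vec_xor per data disk. A minimal
scalar equivalent, with illustrative names only and one byte per iteration
instead of $# vectors, could look like this:

    /* Illustrative only; not part of the kernel patch. */
    #include <stddef.h>
    #include <stdint.h>

    /* GF(2^8) multiply-by-2; the same operation the vpermxor lookup performs */
    static uint8_t gf2_mul2(uint8_t x)
    {
            return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1d : 0x00));
    }

    /* Scalar sketch of raid6_vpermxor$#_gen_syndrome_real() */
    static void gen_syndrome_scalar(int disks, size_t bytes, uint8_t **dptr)
    {
            int z0 = disks - 3;             /* highest data disk */
            uint8_t *p = dptr[z0 + 1];      /* XOR parity */
            uint8_t *q = dptr[z0 + 2];      /* Reed-Solomon syndrome */

            for (size_t d = 0; d < bytes; d++) {
                    uint8_t wp = dptr[z0][d];
                    uint8_t wq = dptr[z0][d];

                    /* Horner's rule: Q = d0 + 2*d1 + 4*d2 + ... over GF(2^8) */
                    for (int z = z0 - 1; z >= 0; z--) {
                            wp ^= dptr[z][d];
                            wq = gf2_mul2(wq) ^ dptr[z][d];
                    }
                    p[d] = wp;
                    q[d] = wq;
            }
    }

Each pass of the inner loop corresponds to one vpermxor/vec_xor pair in the
patch; a scalar reference like this is handy when sanity-checking the vector
output by hand.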