From patchwork Thu Nov 9 05:13:26 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Siddhesh Poyarekar X-Patchwork-Id: 836181 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=sourceware.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=libc-alpha-return-86925-incoming=patchwork.ozlabs.org@sourceware.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; secure) header.d=sourceware.org header.i=@sourceware.org header.b="Z5tK/ctg"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3yXWYL29s2z9sNc for ; Thu, 9 Nov 2017 16:14:14 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:from:to:cc:subject:date:message-id:in-reply-to :references; q=dns; s=default; b=OOIcUakUHDvdPoOOgYKrLBZce+Oz+7G Vt/D+jeNOmoAExiYdEOlNpe7ECM3Y5IMA8b0SrpTcScHUqnVQivzd9hAoKOYBmT1 +3G8u1+86cBXMZfX/AX4bHtxSpE3FDeWmArlssp70aoj4SD1+AMrwF8JfcyVkskT dY2+ve6/pjks= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:from:to:cc:subject:date:message-id:in-reply-to :references; s=default; bh=KIk2D43pUUyBFxtg6Lt8P0HAMDI=; b=Z5tK/ ctg2LWngzdpf4sd8n6c97MltWxqcEfia40jg0qbwQv71Pz14UNaOaeySmwS+e1pQ wKzJ8astv+MyMS28xYwnQc0jEh8tgdtrhZ8uU+vpW61NV63mWFu7LWMu+y/u7f+o m9niR7MbA6uhZQDzRKokJI/7dKctZwDhiQu7zk= Received: (qmail 78663 invoked by alias); 9 Nov 2017 05:14:01 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 78620 invoked by uid 89); 9 Nov 2017 05:14:01 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-26.1 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_NONE, SPF_NEUTRAL autolearn=ham version=3.3.2 spammy=multiplying X-HELO: homiemail-a52.g.dreamhost.com From: Siddhesh Poyarekar To: libc-alpha@sourceware.org Cc: Wilco.Dijkstra@arm.com, szabolcs.nagy@arm.com Subject: [PATCH 1/3] benchtests: Fix walking sizes and directions for *-walk benchmarks Date: Thu, 9 Nov 2017 10:43:26 +0530 Message-Id: <1510204408-1739-2-git-send-email-siddhesh@sourceware.org> In-Reply-To: <1510204408-1739-1-git-send-email-siddhesh@sourceware.org> References: <1510204408-1739-1-git-send-email-siddhesh@sourceware.org> Make the walking benchmarks walk only backwards since copying both ways is biased in favour of implementations that use non-temporal stores for larger sizes; falkor is one of them. This also fixes up bugs in computation of the result which ended up multiplying the length with the timing result unnecessarily. * benchtests/bench-memcpy-walk.c (do_one_test): Copy only backwards. Fix timing computation. * benchtests/bench-memmove-walk.c (do_one_test): Likewise. * benchtests/bench-memset-walk.c (do_one_test): Walk backwards on memset by N at a time. Fix timing computation. --- benchtests/bench-memcpy-walk.c | 14 +++++--------- benchtests/bench-memmove-walk.c | 15 +++++---------- benchtests/bench-memset-walk.c | 4 ++-- 3 files changed, 12 insertions(+), 21 deletions(-) diff --git a/benchtests/bench-memcpy-walk.c b/benchtests/bench-memcpy-walk.c index 69d467d..5b56341 100644 --- a/benchtests/bench-memcpy-walk.c +++ b/benchtests/bench-memcpy-walk.c @@ -47,26 +47,22 @@ static void do_one_test (json_ctx_t *json_ctx, impl_t *impl, char *dst, char *src, size_t len) { - size_t i, iters = MIN_PAGE_SIZE / len; + size_t i = 0; timing_t start, stop, cur; char *dst_end = dst + MIN_PAGE_SIZE - len; char *src_end = src + MIN_PAGE_SIZE - len; TIMING_NOW (start); - /* Copy the entire buffer back and forth, LEN at a time. */ - for (i = 0; i < iters && dst_end >= dst && src <= src_end; src++, dst_end--) - { - CALL (impl, dst_end, src, len); - CALL (impl, src, dst_end, len); - i += 2; - } + /* Copy the entire buffer backwards, LEN at a time. */ + for (; src_end >= src && dst_end >= dst; src_end -= len, dst_end -= len, i++) + CALL (impl, src_end, dst_end, len); TIMING_NOW (stop); TIMING_DIFF (cur, start, stop); /* Get time taken per function call. */ - json_element_double (json_ctx, (double) cur * len / i); + json_element_double (json_ctx, (double) cur / i); } static void diff --git a/benchtests/bench-memmove-walk.c b/benchtests/bench-memmove-walk.c index 54dcd64..969ddd9 100644 --- a/benchtests/bench-memmove-walk.c +++ b/benchtests/bench-memmove-walk.c @@ -47,26 +47,22 @@ static void do_one_test (json_ctx_t *json_ctx, impl_t *impl, char *dst, char *src, size_t len) { - size_t i, iters = MIN_PAGE_SIZE / len; + size_t i = 0; timing_t start, stop, cur; char *dst_end = dst + MIN_PAGE_SIZE - len; char *src_end = src + MIN_PAGE_SIZE - len; TIMING_NOW (start); - /* Copy the entire buffer back and forth, LEN at a time. */ - for (i = 0; i < iters && dst_end >= dst && src <= src_end; src++, dst_end--) - { - CALL (impl, dst_end, src, len); - CALL (impl, src, dst_end, len); - i += 2; - } + /* Copy the entire buffer backwards, LEN at a time. */ + for (; src_end >= src && dst <= dst_end; dst += len, src_end -= len, i++) + CALL (impl, dst, src_end, len); TIMING_NOW (stop); TIMING_DIFF (cur, start, stop); /* Get time taken per function call. */ - json_element_double (json_ctx, (double) cur * len / i); + json_element_double (json_ctx, (double) cur / i); } static void @@ -79,7 +75,6 @@ do_test (json_ctx_t *json_ctx, size_t len, bool overlap) if (overlap) buf2 = buf1; - /* First the non-overlapping moves. */ FOR_EACH_IMPL (impl, 0) do_one_test (json_ctx, impl, (char *) buf2, (char *) buf1, len); diff --git a/benchtests/bench-memset-walk.c b/benchtests/bench-memset-walk.c index 59d2626..80fbe09 100644 --- a/benchtests/bench-memset-walk.c +++ b/benchtests/bench-memset-walk.c @@ -66,14 +66,14 @@ do_one_test (json_ctx_t *json_ctx, impl_t *impl, CHAR *s, CHAR *s_end, timing_t start, stop, cur; TIMING_NOW (start); - for (i = 0; i < iters && s <= s_end; s++, i++) + for (i = 0; i < iters && s <= s_end; s_end -= n, i++) CALL (impl, s, c, n); TIMING_NOW (stop); TIMING_DIFF (cur, start, stop); /* Get time taken per function call. */ - json_element_double (json_ctx, (double) cur * n / i); + json_element_double (json_ctx, (double) cur / i); } static void From patchwork Thu Nov 9 05:13:27 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Siddhesh Poyarekar X-Patchwork-Id: 836182 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=sourceware.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=libc-alpha-return-86926-incoming=patchwork.ozlabs.org@sourceware.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; secure) header.d=sourceware.org header.i=@sourceware.org header.b="esAX8GrL"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3yXWYW6bgjz9s5L for ; Thu, 9 Nov 2017 16:14:23 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:from:to:cc:subject:date:message-id:in-reply-to :references; q=dns; s=default; b=t7622z4rRDc+dguLiH6neP61iCbdi8l T7cT95DdoDBWPKNAlwEdJ5rqYwx7d6jw2ZwNF5OyH+o1x6mXoAmVJboNsFheFmCq rdXUnR31S9oAtmDXTIMp/kXTkaRPmcEBTMfi4Y8gto9QGAfrGNeshDQ8BplnDPmw ENDlXAfibx7k= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:from:to:cc:subject:date:message-id:in-reply-to :references; s=default; bh=wJ8plAEN0xxJmAE+AKEuEQZMKps=; b=esAX8 GrL/aXbzKCFUXZkS4DanO+2+OXY9fQ06Kb5zmvF3H3+psr3xPQlMl3QTNMzFwbHz N/viRdSWlDAETaYGrQIuGRp288ISheEEThlTk+F1w6f3s7yZdRCv1xYJjLMPWk6i rxD93LWqksFEJbLe7NtgIf20D/Tm6EBfqiVFFU= Received: (qmail 79061 invoked by alias); 9 Nov 2017 05:14:05 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 78930 invoked by uid 89); 9 Nov 2017 05:14:04 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-26.1 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, RCVD_IN_DNSWL_NONE, SPF_NEUTRAL autolearn=ham version=3.3.2 spammy= X-HELO: homiemail-a52.g.dreamhost.com From: Siddhesh Poyarekar To: libc-alpha@sourceware.org Cc: Wilco.Dijkstra@arm.com, szabolcs.nagy@arm.com Subject: [PATCH 2/3] benchtests: Bump start size since smaller sizes are noisy Date: Thu, 9 Nov 2017 10:43:27 +0530 Message-Id: <1510204408-1739-3-git-send-email-siddhesh@sourceware.org> In-Reply-To: <1510204408-1739-1-git-send-email-siddhesh@sourceware.org> References: <1510204408-1739-1-git-send-email-siddhesh@sourceware.org> Numbers for very small sizes (< 128B) are much noisier for non-cached benchmarks like the walk benchmarks, so don't include them. * benchtests/bench-memcpy-walk.c (START_SIZE): Set to 128. * benchtests/bench-memmove-walk.c (START_SIZE): Likewise. * benchtests/bench-memset-walk.c (START_SIZE): Likewise. --- benchtests/bench-memcpy-walk.c | 2 +- benchtests/bench-memmove-walk.c | 2 +- benchtests/bench-memset-walk.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/benchtests/bench-memcpy-walk.c b/benchtests/bench-memcpy-walk.c index 5b56341..ef90a92 100644 --- a/benchtests/bench-memcpy-walk.c +++ b/benchtests/bench-memcpy-walk.c @@ -29,7 +29,7 @@ #ifndef MEMCPY_RESULT # define MEMCPY_RESULT(dst, len) dst -# define START_SIZE 1 +# define START_SIZE 128 # define MIN_PAGE_SIZE (getpagesize () + 32 * 1024 * 1024) # define TEST_MAIN # define TEST_NAME "memcpy" diff --git a/benchtests/bench-memmove-walk.c b/benchtests/bench-memmove-walk.c index 969ddd9..189ce64 100644 --- a/benchtests/bench-memmove-walk.c +++ b/benchtests/bench-memmove-walk.c @@ -29,7 +29,7 @@ #ifndef MEMMOVE_RESULT # define MEMMOVE_RESULT(dst, len) dst -# define START_SIZE 1 +# define START_SIZE 128 # define MIN_PAGE_SIZE (getpagesize () + 32 * 1024 * 1024) # define TEST_MAIN # define TEST_NAME "memmove" diff --git a/benchtests/bench-memset-walk.c b/benchtests/bench-memset-walk.c index 80fbe09..213bb60 100644 --- a/benchtests/bench-memset-walk.c +++ b/benchtests/bench-memset-walk.c @@ -22,7 +22,7 @@ #else # define TEST_NAME "wmemset" #endif /* WIDE */ -#define START_SIZE (1) +#define START_SIZE 128 #define MIN_PAGE_SIZE (getpagesize () + 32 * 1024 * 1024) #define TIMEOUT (20 * 60) #include "bench-string.h" From patchwork Thu Nov 9 05:13:28 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Siddhesh Poyarekar X-Patchwork-Id: 836183 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=pass (mailfrom) smtp.mailfrom=sourceware.org (client-ip=209.132.180.131; helo=sourceware.org; envelope-from=libc-alpha-return-86927-incoming=patchwork.ozlabs.org@sourceware.org; receiver=) Authentication-Results: ozlabs.org; dkim=pass (1024-bit key; secure) header.d=sourceware.org header.i=@sourceware.org header.b="Kwk4EIjF"; dkim-atps=neutral Received: from sourceware.org (server1.sourceware.org [209.132.180.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 3yXWYk1VN4z9s5L for ; Thu, 9 Nov 2017 16:14:34 +1100 (AEDT) DomainKey-Signature: a=rsa-sha1; c=nofws; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:from:to:cc:subject:date:message-id:in-reply-to :references; q=dns; s=default; b=QUrrrrHPZD1nsuIihEAy71Jhr5olu2o +FGP2CbkCGEpbqKtY1125hkOWJtVvzNyMsDSQ/1ga9NUtMPp0YTtGezlwvNfqrib Y3m+ZBXa4iTEmRcDwGo57cE+gcwpfIEFSAAEZ2pE4abVHA+OMvEBdy65mMTonKqM 8smdIUhScMp0= DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=sourceware.org; h=list-id :list-unsubscribe:list-subscribe:list-archive:list-post :list-help:sender:from:to:cc:subject:date:message-id:in-reply-to :references; s=default; bh=oorQsjmrIugjOR3A6g+HcIquCYA=; b=Kwk4E IjFkMlhD+1GE7wJOj4ak5c0/k2QhZVXIQvYap97EK8SsP/Rv5Sg1dgDO4vA8g0tT rDfuoEHvhPWJUDi3PVF3CTYqFAonOpvUzy2OZGsxHDNg2r3wGzh2KYWkuJ3sl1GE gyIkqIt4mLyPLtozXj8R8NKnfnKbW5ml240o+I= Received: (qmail 79566 invoked by alias); 9 Nov 2017 05:14:10 -0000 Mailing-List: contact libc-alpha-help@sourceware.org; run by ezmlm Precedence: bulk List-Id: List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: libc-alpha-owner@sourceware.org Delivered-To: mailing list libc-alpha@sourceware.org Received: (qmail 79426 invoked by uid 89); 9 Nov 2017 05:14:09 -0000 Authentication-Results: sourceware.org; auth=none X-Virus-Found: No X-Spam-SWARE-Status: No, score=-25.3 required=5.0 tests=BAYES_00, GIT_PATCH_0, GIT_PATCH_1, GIT_PATCH_2, GIT_PATCH_3, KAM_ASCII_DIVIDERS, RCVD_IN_DNSWL_NONE, SPF_NEUTRAL autolearn=ham version=3.3.2 spammy=END, 1032, 14351 X-HELO: homiemail-a52.g.dreamhost.com From: Siddhesh Poyarekar To: libc-alpha@sourceware.org Cc: Wilco.Dijkstra@arm.com, szabolcs.nagy@arm.com Subject: [PATCH 3/3] aarch64: Hoist ZVA check out of the memset function Date: Thu, 9 Nov 2017 10:43:28 +0530 Message-Id: <1510204408-1739-4-git-send-email-siddhesh@sourceware.org> In-Reply-To: <1510204408-1739-1-git-send-email-siddhesh@sourceware.org> References: <1510204408-1739-1-git-send-email-siddhesh@sourceware.org> The DZP bit in the dczid_el0 register does not change dynamically, so it is safe to read once during program startup. Hoist the zva check into an ifunc resolver to optimize for the most common zva size i.e. 64 bytes. I have retained the older memset as __memset_generic for internal libc.so use so that the change impact is minimal. We should eventually have a discussion on what is more expensive, reading dczid_el0 on every memset invocation or the indirection due to PLT. The gains due to this are significant for falkor, with run time reductions as high as 48% in some cases. Likewise for mustang, although the numbers are slightly lower. Here's a sample from the falkor tests: Function: memset Variant: walk simple_memset __memset_zva_64 __memset_generic ====================================================================================================== length=256, char=0: 139.96 (-698.28%) 9.07 ( 48.26%) 17.53 length=257, char=0: 140.50 (-699.03%) 9.53 ( 45.80%) 17.58 length=258, char=0: 140.96 (-703.95%) 9.58 ( 45.36%) 17.53 length=259, char=0: 141.56 (-705.16%) 9.53 ( 45.79%) 17.58 length=260, char=0: 142.15 (-710.76%) 9.57 ( 45.39%) 17.53 length=261, char=0: 142.50 (-710.39%) 9.53 ( 45.78%) 17.58 length=262, char=0: 142.97 (-715.09%) 9.57 ( 45.42%) 17.54 length=263, char=0: 143.51 (-716.18%) 9.53 ( 45.80%) 17.58 length=264, char=0: 143.93 (-720.55%) 9.58 ( 45.39%) 17.54 length=265, char=0: 144.56 (-722.07%) 9.53 ( 45.80%) 17.59 length=266, char=0: 144.98 (-726.42%) 9.58 ( 45.42%) 17.54 length=267, char=0: 145.53 (-727.53%) 9.53 ( 45.80%) 17.59 length=268, char=0: 146.25 (-731.81%) 9.53 ( 45.79%) 17.58 length=269, char=0: 146.52 (-735.39%) 9.53 ( 45.66%) 17.54 length=270, char=0: 146.97 (-735.81%) 9.53 ( 45.80%) 17.58 length=271, char=0: 147.54 (-741.08%) 9.58 ( 45.38%) 17.54 length=512, char=0: 268.26 (-1307.85%) 12.06 ( 36.71%) 19.05 length=513, char=0: 268.73 (-1273.89%) 13.56 ( 30.68%) 19.56 length=514, char=0: 269.31 (-1276.89%) 13.56 ( 30.68%) 19.56 length=515, char=0: 269.73 (-1279.05%) 13.56 ( 30.68%) 19.56 length=516, char=0: 270.34 (-1282.24%) 13.56 ( 30.67%) 19.56 length=517, char=0: 270.83 (-1284.71%) 13.56 ( 30.66%) 19.56 length=518, char=0: 271.20 (-1286.54%) 13.56 ( 30.67%) 19.56 length=519, char=0: 271.67 (-1288.67%) 13.65 ( 30.24%) 19.56 length=520, char=0: 272.14 (-1291.04%) 13.65 ( 30.22%) 19.56 length=521, char=0: 272.66 (-1293.69%) 13.65 ( 30.23%) 19.56 length=522, char=0: 273.14 (-1296.13%) 13.65 ( 30.20%) 19.56 length=523, char=0: 273.64 (-1298.75%) 13.65 ( 30.23%) 19.56 length=524, char=0: 274.34 (-1302.16%) 13.66 ( 30.20%) 19.57 length=525, char=0: 274.64 (-1297.78%) 13.56 ( 30.99%) 19.65 length=526, char=0: 275.20 (-1300.04%) 13.56 ( 31.01%) 19.66 length=527, char=0: 275.66 (-1302.86%) 13.56 ( 30.99%) 19.65 length=1024, char=0: 524.46 (-2169.75%) 20.12 ( 12.92%) 23.11 length=1025, char=0: 525.14 (-2124.63%) 21.62 ( 8.40%) 23.61 length=1026, char=0: 525.59 (-2125.36%) 21.88 ( 7.37%) 23.62 length=1027, char=0: 525.98 (-2127.14%) 21.62 ( 8.46%) 23.62 length=1028, char=0: 526.68 (-2131.10%) 21.62 ( 8.42%) 23.61 length=1029, char=0: 527.10 (-2131.70%) 21.79 ( 7.73%) 23.62 length=1030, char=0: 527.54 (-2118.51%) 21.62 ( 9.10%) 23.78 length=1031, char=0: 527.98 (-2136.37%) 21.62 ( 8.43%) 23.61 length=1032, char=0: 528.70 (-2139.38%) 21.62 ( 8.43%) 23.61 length=1033, char=0: 529.25 (-2124.37%) 21.62 ( 9.11%) 23.79 length=1034, char=0: 529.48 (-2142.95%) 21.62 ( 8.43%) 23.61 length=1035, char=0: 530.11 (-2145.13%) 21.62 ( 8.44%) 23.61 length=1036, char=0: 530.76 (-2147.10%) 21.79 ( 7.73%) 23.62 length=1037, char=0: 531.03 (-2149.45%) 21.62 ( 8.42%) 23.61 length=1038, char=0: 531.64 (-2151.87%) 21.62 ( 8.42%) 23.61 length=1039, char=0: 531.99 (-2151.63%) 21.80 ( 7.75%) 23.63 * sysdeps/aarch64/memset-reg.h: New file. * sysdeps/aarch64/memset.S: Use it. (__memset): Rename to MEMSET macro. [ZVA_MACRO]: Use zva_macro. * sysdeps/aarch64/multiarch/Makefile (sysdep_routines): Add memset_generic and memset_zva_64. * sysdeps/aarch64/multiarch/ifunc-impl-list.c (__libc_ifunc_impl_list): Add memset ifuncs. * sysdeps/aarch64/multiarch/init-arch.h (INIT_ARCH): New local variable zva_size. * sysdeps/aarch64/multiarch/memset.c: New file. * sysdeps/aarch64/multiarch/memset_generic.S: New file. * sysdeps/aarch64/multiarch/memset_zva_64.S: New file. * sysdeps/aarch64/multiarch/rtld-memset.S: New file. * sysdeps/unix/sysv/linux/aarch64/cpu-features.c (DCZID_DZP_MASK): New macro. (DCZID_BS_MASK): Likewise. (init_cpu_features): Read and set zva_size. * sysdeps/unix/sysv/linux/aarch64/cpu-features.h (struct cpu_features): New member zva_size. --- sysdeps/aarch64/memset-reg.h | 30 ++++++++++++++++ sysdeps/aarch64/memset.S | 27 +++++--------- sysdeps/aarch64/multiarch/Makefile | 2 +- sysdeps/aarch64/multiarch/ifunc-impl-list.c | 3 ++ sysdeps/aarch64/multiarch/init-arch.h | 8 +++-- sysdeps/aarch64/multiarch/memset.c | 41 +++++++++++++++++++++ sysdeps/aarch64/multiarch/memset_generic.S | 27 ++++++++++++++ sysdeps/aarch64/multiarch/memset_zva_64.S | 49 ++++++++++++++++++++++++++ sysdeps/aarch64/multiarch/rtld-memset.S | 23 ++++++++++++ sysdeps/unix/sysv/linux/aarch64/cpu-features.c | 10 ++++++ sysdeps/unix/sysv/linux/aarch64/cpu-features.h | 1 + 11 files changed, 199 insertions(+), 22 deletions(-) create mode 100644 sysdeps/aarch64/memset-reg.h create mode 100644 sysdeps/aarch64/multiarch/memset.c create mode 100644 sysdeps/aarch64/multiarch/memset_generic.S create mode 100644 sysdeps/aarch64/multiarch/memset_zva_64.S create mode 100644 sysdeps/aarch64/multiarch/rtld-memset.S diff --git a/sysdeps/aarch64/memset-reg.h b/sysdeps/aarch64/memset-reg.h new file mode 100644 index 0000000..518e562 --- /dev/null +++ b/sysdeps/aarch64/memset-reg.h @@ -0,0 +1,30 @@ +/* Register aliases for memset to be used across implementations. + Copyright (C) 2017 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +#define dstin x0 +#define val x1 +#define valw w1 +#define count x2 +#define dst x3 +#define dstend x4 +#define tmp1 x5 +#define tmp1w w5 +#define tmp2 x6 +#define tmp2w w6 +#define zva_len x7 +#define zva_lenw w7 diff --git a/sysdeps/aarch64/memset.S b/sysdeps/aarch64/memset.S index 110fd22..45fb0a8 100644 --- a/sysdeps/aarch64/memset.S +++ b/sysdeps/aarch64/memset.S @@ -17,6 +17,7 @@ . */ #include +#include "memset-reg.h" /* Assumptions: * @@ -24,20 +25,7 @@ * */ -#define dstin x0 -#define val x1 -#define valw w1 -#define count x2 -#define dst x3 -#define dstend x4 -#define tmp1 x5 -#define tmp1w w5 -#define tmp2 x6 -#define tmp2w w6 -#define zva_len x7 -#define zva_lenw w7 - -ENTRY_ALIGN (__memset, 6) +ENTRY_ALIGN (MEMSET, 6) DELOUSE (0) DELOUSE (2) @@ -108,8 +96,11 @@ L(tail64): stp q0, q0, [dstend, -32] ret - .p2align 3 L(try_zva): +#ifdef ZVA_MACRO + zva_macro +#else + .p2align 3 mrs tmp1, dczid_el0 tbnz tmp1w, 4, L(no_zva) and tmp1w, tmp1w, 15 @@ -189,7 +180,7 @@ L(zva_other): b.hs 3b 4: add count, count, zva_len b L(tail64) +#endif -END (__memset) -weak_alias (__memset, memset) -libc_hidden_builtin_def (memset) +END (MEMSET) +libc_hidden_builtin_def (MEMSET) diff --git a/sysdeps/aarch64/multiarch/Makefile b/sysdeps/aarch64/multiarch/Makefile index 9aa1e79..57a6c8b 100644 --- a/sysdeps/aarch64/multiarch/Makefile +++ b/sysdeps/aarch64/multiarch/Makefile @@ -1,4 +1,4 @@ ifeq ($(subdir),string) sysdep_routines += memcpy_generic memcpy_thunderx memcpy_falkor \ - memmove_falkor + memmove_falkor memset_generic memset_zva_64 endif diff --git a/sysdeps/aarch64/multiarch/ifunc-impl-list.c b/sysdeps/aarch64/multiarch/ifunc-impl-list.c index 2cb74d5..a8c2f6d 100644 --- a/sysdeps/aarch64/multiarch/ifunc-impl-list.c +++ b/sysdeps/aarch64/multiarch/ifunc-impl-list.c @@ -46,6 +46,9 @@ __libc_ifunc_impl_list (const char *name, struct libc_ifunc_impl *array, IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_thunderx) IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_falkor) IFUNC_IMPL_ADD (array, i, memmove, 1, __memmove_generic)) + IFUNC_IMPL (i, name, memset, + IFUNC_IMPL_ADD (array, i, memset, (zva_size == 64), __memset_zva_64) + IFUNC_IMPL_ADD (array, i, memset, 1, __memset_generic)) return i; } diff --git a/sysdeps/aarch64/multiarch/init-arch.h b/sysdeps/aarch64/multiarch/init-arch.h index 3af442c..a756dad 100644 --- a/sysdeps/aarch64/multiarch/init-arch.h +++ b/sysdeps/aarch64/multiarch/init-arch.h @@ -18,6 +18,8 @@ #include -#define INIT_ARCH() \ - uint64_t __attribute__((unused)) midr = \ - GLRO(dl_aarch64_cpu_features).midr_el1; +#define INIT_ARCH() \ + uint64_t __attribute__((unused)) midr = \ + GLRO(dl_aarch64_cpu_features).midr_el1; \ + unsigned __attribute__((unused)) zva_size = \ + GLRO(dl_aarch64_cpu_features).zva_size; diff --git a/sysdeps/aarch64/multiarch/memset.c b/sysdeps/aarch64/multiarch/memset.c new file mode 100644 index 0000000..fcaa376 --- /dev/null +++ b/sysdeps/aarch64/multiarch/memset.c @@ -0,0 +1,41 @@ +/* Multiple versions of memset. AARCH64 version. + Copyright (C) 2017 Free Software Foundation, Inc. + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library; if not, see + . */ + +/* Define multiple versions only for the definition in libc. */ + +#if IS_IN (libc) +/* Redefine memset so that the compiler won't complain about the type + mismatch with the IFUNC selector in strong_alias, below. */ +# undef memset +# define memset __redirect_memset +# include +# include + +extern __typeof (__redirect_memset) __libc_memset; + +extern __typeof (__redirect_memset) __memset_nozva attribute_hidden; +extern __typeof (__redirect_memset) __memset_zva_64 attribute_hidden; +extern __typeof (__redirect_memset) __memset_zva_128 attribute_hidden; +extern __typeof (__redirect_memset) __memset_generic attribute_hidden; + +libc_ifunc (__libc_memset, (zva_size == 64 ? __memset_zva_64 + : __memset_generic)); + +# undef memset +strong_alias (__libc_memset, memset); +#endif diff --git a/sysdeps/aarch64/multiarch/memset_generic.S b/sysdeps/aarch64/multiarch/memset_generic.S new file mode 100644 index 0000000..55134ed --- /dev/null +++ b/sysdeps/aarch64/multiarch/memset_generic.S @@ -0,0 +1,27 @@ +/* Memset for aarch64, default version for internal use. + Copyright (C) 2017 Free Software Foundation, Inc. + + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#if IS_IN (libc) +# define MEMSET __memset_generic +/* Add a hidden definition for use within libc.so. */ +# ifdef SHARED + .globl __GI_memset; __GI_memset = __memset_generic +# endif +# include +#endif diff --git a/sysdeps/aarch64/multiarch/memset_zva_64.S b/sysdeps/aarch64/multiarch/memset_zva_64.S new file mode 100644 index 0000000..bb9f140 --- /dev/null +++ b/sysdeps/aarch64/multiarch/memset_zva_64.S @@ -0,0 +1,49 @@ +/* Memset for zva size of 64 bytes. + Copyright (C) 2017 Free Software Foundation, Inc. + + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#include + +#if IS_IN (libc) +.macro zva_macro + .p2align 4 + /* Write the first and last 64 byte aligned block using stp rather + than using DC ZVA. This is faster on some cores. */ + str q0, [dst, 16] + stp q0, q0, [dst, 32] + bic dst, dst, 63 + stp q0, q0, [dst, 64] + stp q0, q0, [dst, 96] + sub count, dstend, dst /* Count is now 128 too large. */ + sub count, count, 128+64+64 /* Adjust count and bias for loop. */ + add dst, dst, 128 +1: dc zva, dst + add dst, dst, 64 + subs count, count, 64 + b.hi 1b + stp q0, q0, [dst, 0] + stp q0, q0, [dst, 32] + stp q0, q0, [dstend, -64] + stp q0, q0, [dstend, -32] + ret +.endm + +# define ZVA_MACRO zva_macro +# define MEMSET __memset_zva_64 +# include +#endif diff --git a/sysdeps/aarch64/multiarch/rtld-memset.S b/sysdeps/aarch64/multiarch/rtld-memset.S new file mode 100644 index 0000000..4437393 --- /dev/null +++ b/sysdeps/aarch64/multiarch/rtld-memset.S @@ -0,0 +1,23 @@ +/* Memset for aarch64, for the dynamic linker. + Copyright (C) 2017 Free Software Foundation, Inc. + + This file is part of the GNU C Library. + + The GNU C Library is free software; you can redistribute it and/or + modify it under the terms of the GNU Lesser General Public + License as published by the Free Software Foundation; either + version 2.1 of the License, or (at your option) any later version. + + The GNU C Library is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + Lesser General Public License for more details. + + You should have received a copy of the GNU Lesser General Public + License along with the GNU C Library. If not, see + . */ + +#if IS_IN (rtld) +# define MEMSET memset +# include +#endif diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c index e769eeb..092ee81 100644 --- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.c +++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.c @@ -20,6 +20,9 @@ #include #include +#define DCZID_DZP_MASK (1 << 4) +#define DCZID_BS_MASK (0xf) + #if HAVE_TUNABLES struct cpu_list { @@ -72,4 +75,11 @@ init_cpu_features (struct cpu_features *cpu_features) } cpu_features->midr_el1 = midr; + + /* Check if ZVA is enabled. */ + unsigned dczid; + asm volatile ("mrs %0, dczid_el0" : "=r"(dczid)); + + if ((dczid & DCZID_DZP_MASK) == 0) + cpu_features->zva_size = 4 << (dczid & DCZID_BS_MASK); } diff --git a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h index 73cb53d..f2b6afd 100644 --- a/sysdeps/unix/sysv/linux/aarch64/cpu-features.h +++ b/sysdeps/unix/sysv/linux/aarch64/cpu-features.h @@ -47,6 +47,7 @@ struct cpu_features { uint64_t midr_el1; + unsigned zva_size; }; #endif /* _CPU_FEATURES_AARCH64_H */