From patchwork Mon Aug 9 16:17:21 2021
From: Wilco Dijkstra
To: "naohirot@fujitsu.com"
Cc: 'GNU C Library'
Subject: [PATCH v4 2/5] AArch64: Improve A64FX memset for large sizes
Date: Mon, 9 Aug 2021 16:17:21 +0000

v4: Slightly tweak alignment code

Improve performance of large memsets. Simplify alignment code.
For zero memset use DC ZVA, which almost doubles performance.
For non-zero memsets use the unroll8 loop which is about 10% faster.
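The size and value check described above can be sketched in C. This is a hypothetical model, not glibc code: `choose_large_path` and the `large_path` enum are illustrative names, and it assumes `L(L2)` is entered when count >= L2_SIZE, as the comment in the new code states. Smaller sizes are lumped together because their paths are not modelled. The point it shows is that, like `tst valw, 255`, only the low byte of the value decides between DC ZVA and the unroll8 loop, since memset stores the value converted to unsigned char.

```c
#include <stddef.h>

/* Threshold copied from memset_a64fx.S.  */
#define L2_SIZE (8 * 1024 * 1024)

/* Hypothetical labels for the code paths; not names used by glibc.  */
enum large_path
{
  PATH_SMALLER,   /* count < L2_SIZE: existing paths, not modelled here */
  PATH_DC_ZVA,    /* zero byte: clear whole cache lines with DC ZVA */
  PATH_UNROLL8    /* non-zero byte: fall back to the unroll8 store loop */
};

enum large_path
choose_large_path (int c, size_t count)
{
  if (count < L2_SIZE)
    return PATH_SMALLER;
  /* memset converts c to unsigned char, so only the low 8 bits matter;
     this mirrors "tst valw, 255" followed by "b.ne L(unroll8)".  */
  if ((c & 255) == 0)
    return PATH_DC_ZVA;
  return PATH_UNROLL8;
}
```

For example, a call such as memset (p, 0x100, n) with n >= L2_SIZE would take the DC ZVA path under this model, because (unsigned char) 0x100 is 0.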
Reviewed-by: Naohiro Tamura
diff --git a/sysdeps/aarch64/multiarch/memset_a64fx.S b/sysdeps/aarch64/multiarch/memset_a64fx.S
index cf3d402ef681a9d98964d1751537945692a1ae68..6bc8ef5e0c84dbb59a57d114ae6ec8e3fa3822ad 100644
--- a/sysdeps/aarch64/multiarch/memset_a64fx.S
+++ b/sysdeps/aarch64/multiarch/memset_a64fx.S
@@ -27,14 +27,11 @@
  */
 
 #define L1_SIZE		(64*1024)	// L1 64KB
-#define L2_SIZE		(8*1024*1024)	// L2 8MB - 1MB
+#define L2_SIZE		(8*1024*1024)	// L2 8MB
 #define CACHE_LINE_SIZE	256
 #define PF_DIST_L1	(CACHE_LINE_SIZE * 16)	// Prefetch distance L1
-#define ZF_DIST		(CACHE_LINE_SIZE * 21)	// Zerofill distance
-#define rest		x8
+#define rest		x2
 #define vector_length	x9
-#define vl_remainder	x10	// vector_length remainder
-#define cl_remainder	x11	// CACHE_LINE_SIZE remainder
 
 #if HAVE_AARCH64_SVE_ASM
 # if IS_IN (libc)
@@ -42,14 +39,6 @@
 
 	.arch armv8.2-a+sve
 
-	.macro dc_zva times
-	dc	zva, tmp1
-	add	tmp1, tmp1, CACHE_LINE_SIZE
-	.if \times-1
-	dc_zva "(\times-1)"
-	.endif
-	.endm
-
 	.macro st1b_unroll first=0, last=7
 	st1b	z0.b, p0, [dst, \first, mul vl]
 	.if \last-\first
@@ -188,54 +177,30 @@ L(L1_prefetch): // if rest >= L1_SIZE
 	cbnz	rest, L(unroll32)
 	ret
 
-L(L2):
-	// align dst address at vector_length byte boundary
-	sub	tmp1, vector_length, 1
-	ands	tmp2, dst, tmp1
-	// if vl_remainder == 0
-	b.eq	1f
-	sub	vl_remainder, vector_length, tmp2
-	// process remainder until the first vector_length boundary
-	whilelt	p2.b, xzr, vl_remainder
-	st1b	z0.b, p2, [dst]
-	add	dst, dst, vl_remainder
-	sub	rest, rest, vl_remainder
-	// align dstin address at CACHE_LINE_SIZE byte boundary
-1:	mov	tmp1, CACHE_LINE_SIZE
-	ands	tmp2, dst, CACHE_LINE_SIZE - 1
-	// if cl_remainder == 0
-	b.eq	L(L2_dc_zva)
-	sub	cl_remainder, tmp1, tmp2
-	// process remainder until the first CACHE_LINE_SIZE boundary
-	mov	tmp1, xzr	// index
-2:	whilelt	p2.b, tmp1, cl_remainder
-	st1b	z0.b, p2, [dst, tmp1]
-	incb	tmp1
-	cmp	tmp1, cl_remainder
-	b.lo	2b
-	add	dst, dst, cl_remainder
-	sub	rest, rest, cl_remainder
-
-L(L2_dc_zva):
-	// zero fill
-	mov	tmp1, dst
-	dc_zva	(ZF_DIST / CACHE_LINE_SIZE) - 1
-	mov	zva_len, ZF_DIST
-	add	tmp1, zva_len, CACHE_LINE_SIZE * 2
-	// unroll
+	// count >= L2_SIZE
 	.p2align 3
-1:	st1b_unroll 0, 3
-	add	tmp2, dst, zva_len
-	dc	zva, tmp2
-	st1b_unroll 4, 7
-	add	tmp2, tmp2, CACHE_LINE_SIZE
-	dc	zva, tmp2
-	add	dst, dst, CACHE_LINE_SIZE * 2
-	sub	rest, rest, CACHE_LINE_SIZE * 2
-	cmp	rest, tmp1	// ZF_DIST + CACHE_LINE_SIZE * 2
-	b.ge	1b
-	cbnz	rest, L(unroll8)
-	ret
+L(L2):
+	tst	valw, 255
+	b.ne	L(unroll8)
+	// align dst to CACHE_LINE_SIZE byte boundary
+	and	tmp2, dst, CACHE_LINE_SIZE - 1
+	st1b	z0.b, p0, [dst, 0, mul vl]
+	st1b	z0.b, p0, [dst, 1, mul vl]
+	st1b	z0.b, p0, [dst, 2, mul vl]
+	st1b	z0.b, p0, [dst, 3, mul vl]
+	sub	dst, dst, tmp2
+	add	count, count, tmp2
+
+	// clear cachelines using DC ZVA
+	sub	count, count, CACHE_LINE_SIZE * 2
+	.p2align 4
+1:	add	dst, dst, CACHE_LINE_SIZE
+	dc	zva, dst
+	subs	count, count, CACHE_LINE_SIZE
+	b.hi	1b
+	add	count, count, CACHE_LINE_SIZE
+	add	dst, dst, CACHE_LINE_SIZE
+	b	L(last)
 
 END (MEMSET)
 libc_hidden_builtin_def (MEMSET)
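The alignment arithmetic in the new `L(L2)` zero path can be checked with a small C model. This is a sketch under stated assumptions, not the glibc code. It assumes 512-bit SVE vectors as on A64FX (so the four `st1b` stores cover exactly one 256-byte cache line), a DC ZVA block size equal to CACHE_LINE_SIZE, and that `L(last)` writes the remaining `count` bytes starting at `dst`. Buffer offsets stand in for addresses, and `model_l2_zero`, `model_dc_zva` and `l2_zero_is_exact` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define CACHE_LINE_SIZE 256
/* Assumption: 512-bit SVE vectors as on A64FX, so the four st1b stores
   write 4 * 64 = 256 bytes, i.e. exactly one cache line.  */
#define VL_BYTES 64

/* Model of "dc zva, dst": zero one aligned cache line.  Assumes the DC ZVA
   block size equals CACHE_LINE_SIZE.  */
static void
model_dc_zva (unsigned char *base, size_t dst)
{
  assert (dst % CACHE_LINE_SIZE == 0);
  memset (base + dst, 0, CACHE_LINE_SIZE);
}

/* Hypothetical model of the new L(L2) zero path.  Offsets into BASE stand
   in for addresses, with offset 0 treated as cache-line aligned.  */
static void
model_l2_zero (unsigned char *base, size_t dst, size_t count)
{
  /* The real code runs only for count >= L2_SIZE; the model needs just
     two cache lines so it can be exercised on small buffers.  */
  assert (count >= 2 * CACHE_LINE_SIZE);

  size_t tmp2 = dst & (CACHE_LINE_SIZE - 1);    /* and tmp2, dst, 255 */
  memset (base + dst, 0, 4 * VL_BYTES);         /* 4 x st1b, unaligned */
  dst -= tmp2;                                  /* sub dst, dst, tmp2 */
  long long cnt = (long long) (count + tmp2);   /* add count, count, tmp2 */

  cnt -= 2 * CACHE_LINE_SIZE;
  do
    {
      dst += CACHE_LINE_SIZE;
      model_dc_zva (base, dst);
      cnt -= CACHE_LINE_SIZE;
    }
  while (cnt > 0);      /* subs + b.hi 1b; cnt is never negative on entry */
  cnt += CACHE_LINE_SIZE;
  dst += CACHE_LINE_SIZE;

  /* Assumption: L(last) stores the remaining cnt bytes starting at dst.  */
  memset (base + dst, 0, (size_t) cnt);
}

/* Returns 1 if the model zeroes exactly [off, off + n) and nothing else.  */
int
l2_zero_is_exact (size_t off, size_t n)
{
  size_t size = off + n + 2 * CACHE_LINE_SIZE;  /* slack to catch overruns */
  unsigned char *buf = malloc (size);
  if (buf == NULL)
    return 0;
  memset (buf, 0xAA, size);
  model_l2_zero (buf, off, n);

  int ok = 1;
  for (size_t i = 0; i < size; i++)
    if (buf[i] != ((i >= off && i < off + n) ? 0 : 0xAA))
      ok = 0;
  free (buf);
  return ok;
}
```

With off = 10 and count = 1000, for instance, the stores cover bytes [10, 266), DC ZVA clears the lines starting at 256 and 512, and the tail covers [768, 1010). Aligning dst down rather than up is safe because the four unaligned stores have already written the part of the first cache line that lies inside the buffer; the overlap with the first DC ZVA line is harmless since both write zeros.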