From patchwork Mon Oct 23 18:58:10 2017
X-Patchwork-Submitter: Jakub Kicinski
X-Patchwork-Id: 829574
X-Patchwork-Delegate: davem@davemloft.net
From: Jakub Kicinski
To: netdev@vger.kernel.org
Cc: oss-drivers@netronome.com, Jakub Kicinski
Subject: [PATCH net-next 5/9] nfp: bpf: optimize the RMW for stack accesses
Date: Mon, 23 Oct 2017 11:58:10 -0700
Message-Id: <20171023185814.4797-6-jakub.kicinski@netronome.com>
X-Mailer: git-send-email 2.14.1
In-Reply-To: <20171023185814.4797-1-jakub.kicinski@netronome.com>
References: <20171023185814.4797-1-jakub.kicinski@netronome.com>
When we perform unaligned stack accesses in the 32-64B window we have
to do a read-modify-write cycle.  E.g. for reading 8 bytes from
address 17:

 0: tmp    = stack[16]
 1: gprLo  = tmp >> 8
 2: tmp    = stack[20]
 3: gprLo |= tmp << 24
 4: tmp    = stack[20]
 5: gprHi  = tmp >> 8
 6: tmp    = stack[24]
 7: gprHi |= tmp << 24

The load on line 4 is unnecessary, because tmp already contains the
data from stack[20].

For writes we can optimize away both the loads and the write-backs.

Signed-off-by: Jakub Kicinski
Reviewed-by: Quentin Monnet
---
 drivers/net/ethernet/netronome/nfp/bpf/jit.c | 33 +++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/drivers/net/ethernet/netronome/nfp/bpf/jit.c b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
index 094acea35326..6730690cf9d8 100644
--- a/drivers/net/ethernet/netronome/nfp/bpf/jit.c
+++ b/drivers/net/ethernet/netronome/nfp/bpf/jit.c
@@ -644,11 +644,11 @@ data_st_host_order(struct nfp_prog *nfp_prog, u8 dst_gpr, swreg offset,
 
 typedef int
 (*lmem_step)(struct nfp_prog *nfp_prog, u8 gpr, u8 gpr_byte, s32 off,
-             unsigned int size, bool new_gpr);
+             unsigned int size, bool first, bool new_gpr, bool last);
 
 static int
 wrp_lmem_load(struct nfp_prog *nfp_prog, u8 dst, u8 dst_byte, s32 off,
-              unsigned int size, bool new_gpr)
+              unsigned int size, bool first, bool new_gpr, bool last)
 {
         u32 idx, src_byte;
         enum shf_sc sc;
@@ -692,7 +692,13 @@ wrp_lmem_load(struct nfp_prog *nfp_prog, u8 dst, u8 dst_byte, s32 off,
                 reg = reg_lm(0, idx);
         } else {
                 reg = imm_a(nfp_prog);
-                wrp_mov(nfp_prog, reg, reg_lm(0, idx));
+                /* If this is not the first part of the load and we start a
+                 * new GPR, that means we are loading a second part of the
+                 * LMEM word into a new GPR.  IOW we have already looked at
+                 * that LMEM word and it has been loaded into imm_a().
+                 */
+                if (first || !new_gpr)
+                        wrp_mov(nfp_prog, reg, reg_lm(0, idx));
         }
 
         emit_ld_field_any(nfp_prog, reg_both(dst), mask, reg, sc, shf, new_gpr);
@@ -702,7 +708,7 @@ wrp_lmem_load(struct nfp_prog *nfp_prog, u8 dst, u8 dst_byte, s32 off,
 
 static int
 wrp_lmem_store(struct nfp_prog *nfp_prog, u8 src, u8 src_byte, s32 off,
-               unsigned int size, bool new_gpr)
+               unsigned int size, bool first, bool new_gpr, bool last)
 {
         u32 idx, dst_byte;
         enum shf_sc sc;
@@ -746,13 +752,19 @@ wrp_lmem_store(struct nfp_prog *nfp_prog, u8 src, u8 src_byte, s32 off,
                 reg = reg_lm(0, idx);
         } else {
                 reg = imm_a(nfp_prog);
-                wrp_mov(nfp_prog, reg, reg_lm(0, idx));
+                /* Only the first and last LMEM locations need RMW,
+                 * the middle locations will be overwritten fully.
+                 */
+                if (first || last)
+                        wrp_mov(nfp_prog, reg, reg_lm(0, idx));
         }
 
         emit_ld_field(nfp_prog, reg, mask, reg_b(src), sc, shf);
 
-        if (idx > RE_REG_LM_IDX_MAX)
-                wrp_mov(nfp_prog, reg_lm(0, idx), reg);
+        if (new_gpr || last) {
+                if (idx > RE_REG_LM_IDX_MAX)
+                        wrp_mov(nfp_prog, reg_lm(0, idx), reg);
+        }
 
         return 0;
 }
@@ -762,6 +774,7 @@ mem_op_stack(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
              unsigned int size, u8 gpr, bool clr_gpr, lmem_step step)
 {
         s32 off = nfp_prog->stack_depth + meta->insn.off;
+        bool first = true, last;
         u8 prev_gpr = 255;
         u32 gpr_byte = 0;
         int ret;
@@ -777,12 +790,16 @@ mem_op_stack(struct nfp_prog *nfp_prog, struct nfp_insn_meta *meta,
                 slice_end = min(off + slice_size, round_up(off + 1, 4));
                 slice_size = slice_end - off;
 
+                last = slice_size == size;
+
                 ret = step(nfp_prog, gpr, gpr_byte, off, slice_size,
-                           gpr != prev_gpr);
+                           first, gpr != prev_gpr, last);
                 if (ret)
                         return ret;
 
                 prev_gpr = gpr;
+                first = false;
+
                 gpr_byte += slice_size;
                 if (gpr_byte >= 4) {
                         gpr_byte -= 4;
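
[Editorial note] For readers following the pseudo-assembly in the commit message
above, here is a minimal, stand-alone C sketch of the same unaligned 8-byte read.
It is not the NFP JIT code itself: lmem_word() and read_u64_unaligned() are
hypothetical names introduced only for illustration, and it assumes a
little-endian host and an unaligned offset (off & 3 != 0), which is the case the
patch optimizes.  It shows the three aligned word loads the access actually
needs, with the shared middle word loaded once instead of twice.

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Model of a word-addressed local memory: only aligned 4-byte words can be
     * loaded, so an unaligned 8-byte read must be stitched together from the
     * words it straddles.  (Hypothetical helper, for illustration only.)
     */
    static uint32_t lmem_word(const uint8_t *stack, uint32_t aligned_off)
    {
        uint32_t word;

        memcpy(&word, stack + aligned_off, sizeof(word)); /* aligned load */
        return word; /* little-endian host assumed */
    }

    /* Read 8 bytes starting at an unaligned 'off' (e.g. 17).  The naive
     * sequence in the commit message loads stack[20] twice; here the shared
     * middle word is loaded once and reused for both destination halves.
     */
    static uint64_t read_u64_unaligned(const uint8_t *stack, uint32_t off)
    {
        uint32_t shift = (off & 3) * 8;             /* 8 for off == 17 */
        uint32_t base = off & ~3u;                  /* 16 for off == 17 */
        uint32_t lo = lmem_word(stack, base);       /* stack[16] */
        uint32_t mid = lmem_word(stack, base + 4);  /* stack[20], loaded once */
        uint32_t hi = lmem_word(stack, base + 8);   /* stack[24] */
        uint32_t gpr_lo = (lo >> shift) | (mid << (32 - shift));
        uint32_t gpr_hi = (mid >> shift) | (hi << (32 - shift));

        return ((uint64_t)gpr_hi << 32) | gpr_lo;
    }

    int main(void)
    {
        uint8_t stack[64];
        unsigned int i;

        for (i = 0; i < sizeof(stack); i++)
            stack[i] = i;

        /* Prints 0x1817161514131211: bytes 17..24 of the stack. */
        printf("0x%016llx\n", (unsigned long long)read_u64_unaligned(stack, 17));
        return 0;
    }

The store path is the mirror image: only the first and last words, which a
store covers partially, need to be read before the ld_field; a middle word
that is fully overwritten can simply be written back, which is what the
first || last test and the new_gpr || last write-back condition in
wrp_lmem_store() express.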