From patchwork Thu Jan 10 11:53:45 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Florian Westphal
X-Patchwork-Id: 1022856
X-Patchwork-Delegate: pablo@netfilter.org
Return-Path:
X-Original-To: incoming@patchwork.ozlabs.org
Delivered-To: patchwork-incoming@bilbo.ozlabs.org
Authentication-Results: ozlabs.org; spf=none (mailfrom)
	smtp.mailfrom=vger.kernel.org (client-ip=209.132.180.67;
	helo=vger.kernel.org;
	envelope-from=netfilter-devel-owner@vger.kernel.org; receiver=)
Authentication-Results: ozlabs.org;
	dmarc=none (p=none dis=none) header.from=strlen.de
Received: from vger.kernel.org (vger.kernel.org [209.132.180.67])
	by ozlabs.org (Postfix) with ESMTP id 43b4HS1cnTz9sLt
	for ; Thu, 10 Jan 2019 22:57:24 +1100 (AEDT)
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1727673AbfAJL5X (ORCPT );
	Thu, 10 Jan 2019 06:57:23 -0500
Received: from Chamillionaire.breakpoint.cc ([146.0.238.67]:52194 "EHLO
	Chamillionaire.breakpoint.cc" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1727344AbfAJL5X (ORCPT );
	Thu, 10 Jan 2019 06:57:23 -0500
Received: from fw by Chamillionaire.breakpoint.cc with local (Exim 4.89)
	(envelope-from ) id 1ghYxd-0005CK-5c; Thu, 10 Jan 2019 12:57:21 +0100
From: Florian Westphal
To:
Cc: Florian Westphal
Subject: [PATCH nft] payload: refine payload expr merging
Date: Thu, 10 Jan 2019 12:53:45 +0100
Message-Id: <20190110115345.6843-1-fw@strlen.de>
X-Mailer: git-send-email 2.19.2
MIME-Version: 1.0
Sender: netfilter-devel-owner@vger.kernel.org
Precedence: bulk
List-ID:
X-Mailing-List: netfilter-devel@vger.kernel.org

nf_tables can handle payload exprs for sizes <= sizeof(u32) via a direct
operation from the eval loop, rather than a call to the payload
expression.

Two 4-byte loads are thus faster than a single 8-byte load.

  ip saddr 1.2.3.4 ip daddr 2.3.4.5

is faster with this applied, even though it involves two payload
expressions and two compare operations, because all of them can be
handled from the main loop rather than via calls to the payload
expression.

Keep merging for the link-layer base and when at least one of the
expressions already exceeds the 4-byte limit; in those cases merging
is cheaper.

Signed-off-by: Florian Westphal
Acked-by: Pablo Neira Ayuso
---
 src/payload.c | 28 +++++++++++++++++++++++++++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/src/payload.c b/src/payload.c
index 6517686cbfba..36991401e1d3 100644
--- a/src/payload.c
+++ b/src/payload.c
@@ -719,7 +719,33 @@ bool payload_can_merge(const struct expr *e1, const struct expr *e2)
 	if (total < e1->len || total > (NFT_REG_SIZE * BITS_PER_BYTE))
 		return false;
 
-	return true;
+	/* could return true after this, the expressions are mergeable.
+	 *
+	 * However, there are some caveats.
+	 *
+	 * Loading anything <= sizeof(u32) with base >= network header
+	 * is fast, because its handled directly from eval loop in the
+	 * kernel.
+	 *
+	 * We thus restrict merging a bit more.
+	 */
+
+	/* can still be handled by fastpath after merge */
+	if (total <= NFT_REG32_SIZE * BITS_PER_BYTE)
+		return true;
+
+	/* Linklayer base is not handled in fastpath, merge */
+	if (e1->payload.base == PROTO_BASE_LL_HDR)
+		return true;
+
+	/* Also merge if at least one expression is already
+	 * above REG32 size, in this case merging is faster.
+	 */
+	if (e1->len > (NFT_REG32_SIZE * BITS_PER_BYTE) ||
+	    e2->len > (NFT_REG32_SIZE * BITS_PER_BYTE))
+		return true;
+
+	return false;
 }
 
 /**
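
For readers who want to try the new merge policy outside of nft, the
size and base checks from the hunk above can be sketched as a standalone
function. This is only an illustration under stated assumptions: the
constants and the enum are local stand-ins that reuse the names from the
patch (the values 16 and 4 are assumed here, not taken from this patch),
size_policy_allows_merge() and size_policy_examples() are hypothetical
names, lengths are in bits, and the adjacency checks done earlier in
payload_can_merge() are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins named after the symbols in the patch; values are assumed. */
#define BITS_PER_BYTE	8u
#define NFT_REG_SIZE	16u	/* assumed size of a full register, in bytes */
#define NFT_REG32_SIZE	4u	/* assumed size of a 32-bit register, in bytes */

/* Simplified stand-in enum, not the nftables definition. */
enum proto_bases {
	PROTO_BASE_LL_HDR,
	PROTO_BASE_NETWORK_HDR,
	PROTO_BASE_TRANSPORT_HDR,
};

/* Hypothetical helper: size/base policy only, lengths given in bits. */
bool size_policy_allows_merge(enum proto_bases base,
			      unsigned int len1, unsigned int len2)
{
	unsigned int total = len1 + len2;

	/* overflow, or merged load would not fit into a register */
	if (total < len1 || total > NFT_REG_SIZE * BITS_PER_BYTE)
		return false;

	/* merged load still fits into 32 bits: fastpath is kept */
	if (total <= NFT_REG32_SIZE * BITS_PER_BYTE)
		return true;

	/* link layer base is not handled in the fastpath anyway */
	if (base == PROTO_BASE_LL_HDR)
		return true;

	/* one side is already above 32 bits, so it is on the slow path */
	if (len1 > NFT_REG32_SIZE * BITS_PER_BYTE ||
	    len2 > NFT_REG32_SIZE * BITS_PER_BYTE)
		return true;

	return false;
}

/* Hypothetical self-check covering the cases named in the commit message. */
void size_policy_examples(void)
{
	/* ip saddr + ip daddr: two 32-bit network header loads stay separate */
	assert(!size_policy_allows_merge(PROTO_BASE_NETWORK_HDR, 32, 32));
	/* two 16-bit fields: merged load is 32 bits, still fastpath */
	assert(size_policy_allows_merge(PROTO_BASE_TRANSPORT_HDR, 16, 16));
	/* link layer: two 48-bit fields are merged */
	assert(size_policy_allows_merge(PROTO_BASE_LL_HDR, 48, 48));
	/* one field already larger than 32 bits: merged */
	assert(size_policy_allows_merge(PROTO_BASE_NETWORK_HDR, 64, 32));
	/* merged size would exceed the register: never merged */
	assert(!size_policy_allows_merge(PROTO_BASE_NETWORK_HDR, 128, 8));
}
```

Calling size_policy_examples() from a small main() exercises each branch
once. The first assertion corresponds to the "ip saddr 1.2.3.4 ip daddr
2.3.4.5" case from the commit message, which is no longer merged.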