From patchwork Thu Jan 10 00:08:22 2013
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Willem de Bruijn
X-Patchwork-Id: 210909
From: Willem de Bruijn
To: netfilter-devel@vger.kernel.org, pablo@netfilter.org
Cc: Willem de Bruijn
Subject: [PATCH next v2] iptables: add xt_bpf match
Date: Wed, 9 Jan 2013 19:08:22 -0500
Message-Id: <1357776502-21555-1-git-send-email-willemb@google.com>
X-Mailer: git-send-email 1.7.7.3
Sender: netfilter-devel-owner@vger.kernel.org
X-Mailing-List: netfilter-devel@vger.kernel.org

Changes:
- v1->v2: use a fixed size match structure to communicate
  between kernel and userspace.

Support arbitrary Linux socket filter (BPF) programs as iptables
match rules. This allows for very expressive filters, and on
platforms with BPF JIT appears competitive with traditional hardcoded
iptables rules.
At least, on an x86_64 that achieves 40K netperf TCP_STREAM without
any iptables rules (40 Gbps), inserting 100x this bpf rule gives 28K

    ./iptables -A OUTPUT -m bpf --bytecode '6,40 0 0 14, 21 0 3 2048,48 0 0 25,21 0 1 20,6 0 0 96,6 0 0 0,' -j

(as generated by tcpdump -i any -ddd ip proto 20 | tr '\n' ',')

inserting 100x this u32 rule gives 21K

    ./iptables -A OUTPUT -m u32 --u32 '6&0xFF=0x20' -j DROP

The two are logically equivalent, as far as I can tell. Let me know if
my test methodology is flawed in some way. Even in cases where it is
slower, the filter adds functionality currently lacking in iptables,
such as access to sk_buff fields like rxhash and queue_mapping.
---
 Gconfig.xt_bpf                        |    1 +
 include/uapi/linux/netfilter/xt_bpf.h |   17 ++++++++
 net/netfilter/Kconfig                 |    9 ++++
 net/netfilter/Makefile                |    1 +
 net/netfilter/x_tables.c              |    5 +-
 net/netfilter/xt_bpf.c                |   73 +++++++++++++++++++++++++++++++++
 6 files changed, 104 insertions(+), 2 deletions(-)
 create mode 100644 Gconfig.xt_bpf
 create mode 100644 include/uapi/linux/netfilter/xt_bpf.h
 create mode 100644 net/netfilter/xt_bpf.c

diff --git a/Gconfig.xt_bpf b/Gconfig.xt_bpf
new file mode 100644
index 0000000..dd51452
--- /dev/null
+++ b/Gconfig.xt_bpf
@@ -0,0 +1 @@
+CONFIG_NETFILTER_XT_MATCH_BPF=m
diff --git a/include/uapi/linux/netfilter/xt_bpf.h b/include/uapi/linux/netfilter/xt_bpf.h
new file mode 100644
index 0000000..5dda450
--- /dev/null
+++ b/include/uapi/linux/netfilter/xt_bpf.h
@@ -0,0 +1,17 @@
+#ifndef _XT_BPF_H
+#define _XT_BPF_H
+
+#include
+#include
+
+#define XT_BPF_MAX_NUM_INSTR	64
+
+struct xt_bpf_info {
+	__u16 bpf_program_num_elem;
+	struct sock_filter bpf_program[XT_BPF_MAX_NUM_INSTR];
+
+	/* only used in the kernel */
+	struct sk_filter *filter __attribute__((aligned(8)));
+};
+
+#endif /*_XT_BPF_H */
diff --git a/net/netfilter/Kconfig b/net/netfilter/Kconfig
index fefa514..d45720f 100644
--- a/net/netfilter/Kconfig
+++ b/net/netfilter/Kconfig
@@ -798,6 +798,15 @@ config NETFILTER_XT_MATCH_ADDRTYPE
 	  If you want to compile it as a module, say M here and read
 	  .  If unsure, say `N'.
 
+config NETFILTER_XT_MATCH_BPF
+	tristate '"bpf" match support'
+	depends on NETFILTER_ADVANCED
+	help
+	  BPF matching applies a linux socket filter to each packet and
+	  accepts those for which the filter returns non-zero.
+
+	  To compile it as a module, choose M here.  If unsure, say N.
+
 config NETFILTER_XT_MATCH_CLUSTER
 	tristate '"cluster" match support'
 	depends on NF_CONNTRACK
diff --git a/net/netfilter/Makefile b/net/netfilter/Makefile
index 3259697..6d6194525 100644
--- a/net/netfilter/Makefile
+++ b/net/netfilter/Makefile
@@ -98,6 +98,7 @@ obj-$(CONFIG_NETFILTER_XT_TARGET_IDLETIMER) += xt_IDLETIMER.o
 
 # matches
 obj-$(CONFIG_NETFILTER_XT_MATCH_ADDRTYPE) += xt_addrtype.o
+obj-$(CONFIG_NETFILTER_XT_MATCH_BPF) += xt_bpf.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_CLUSTER) += xt_cluster.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_COMMENT) += xt_comment.o
 obj-$(CONFIG_NETFILTER_XT_MATCH_CONNBYTES) += xt_connbytes.o
diff --git a/net/netfilter/x_tables.c b/net/netfilter/x_tables.c
index 8d987c3..26306be 100644
--- a/net/netfilter/x_tables.c
+++ b/net/netfilter/x_tables.c
@@ -379,8 +379,9 @@ int xt_check_match(struct xt_mtchk_param *par,
 	if (XT_ALIGN(par->match->matchsize) != size &&
 	    par->match->matchsize != -1) {
 		/*
-		 * ebt_among is exempt from centralized matchsize checking
-		 * because it uses a dynamic-size data set.
+		 * matches of variable size length, such as ebt_among,
+		 * are exempt from centralized matchsize checking. They
+		 * skip the test by setting xt_match.matchsize to -1.
 		 */
 		pr_err("%s_tables: %s.%u match: invalid size "
 		       "%u (kernel) != (user) %u\n",
diff --git a/net/netfilter/xt_bpf.c b/net/netfilter/xt_bpf.c
new file mode 100644
index 0000000..1bdfab8
--- /dev/null
+++ b/net/netfilter/xt_bpf.c
@@ -0,0 +1,73 @@
+/* Xtables module to match packets using a BPF filter.
+ * Copyright 2013 Google Inc.
+ * Written by Willem de Bruijn
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include
+#include
+#include
+#include
+#include
+
+#include
+#include
+
+MODULE_AUTHOR("Willem de Bruijn ");
+MODULE_DESCRIPTION("Xtables: BPF filter match");
+MODULE_LICENSE("GPL");
+
+static int bpf_mt_check(const struct xt_mtchk_param *par)
+{
+	struct xt_bpf_info *info = par->matchinfo;
+	struct sock_fprog program;
+
+	program.len = info->bpf_program_num_elem;
+	program.filter = info->bpf_program;
+	if (sk_unattached_filter_create(&info->filter, &program)) {
+		pr_info("bpf: check failed: parse error\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static bool bpf_mt(const struct sk_buff *skb, struct xt_action_param *par)
+{
+	const struct xt_bpf_info *info = par->matchinfo;
+
+	return SK_RUN_FILTER(info->filter, skb);
+}
+
+static void bpf_mt_destroy(const struct xt_mtdtor_param *par)
+{
+	const struct xt_bpf_info *info = par->matchinfo;
+	sk_unattached_filter_destroy(info->filter);
+}
+
+static struct xt_match bpf_mt_reg __read_mostly = {
+	.name		= "bpf",
+	.revision	= 0,
+	.family		= NFPROTO_UNSPEC,
+	.checkentry	= bpf_mt_check,
+	.match		= bpf_mt,
+	.destroy	= bpf_mt_destroy,
+	.matchsize	= sizeof(struct xt_bpf_info),
+	.me		= THIS_MODULE,
+};
+
+static int __init bpf_mt_init(void)
+{
+	return xt_register_match(&bpf_mt_reg);
+}
+
+static void __exit bpf_mt_exit(void)
+{
+	xt_unregister_match(&bpf_mt_reg);
+}
+
+module_init(bpf_mt_init);
+module_exit(bpf_mt_exit);
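
To illustrate what the --bytecode string in the commit message encodes, here is a rough userspace sketch that parses it into a fixed-size array and interprets the four classic BPF opcodes the example uses. It is not part of the patch and is not how the kernel evaluates the filter (the patch calls SK_RUN_FILTER). The names struct insn, struct prog, parse_bytecode and run_filter are hypothetical, the requirement that every instruction ends with a comma is an assumption taken from the tr output shown above, and the packet is treated as a flat byte buffer with no claim about where the kernel places offset zero.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_INSTR 64	/* same limit as XT_BPF_MAX_NUM_INSTR in the patch */

/* Classic BPF opcodes that appear in the example program. */
#define OP_LDH_ABS 0x28	/* A = 16-bit big-endian load at pkt[k] */
#define OP_LDB_ABS 0x30	/* A = byte at pkt[k] */
#define OP_JEQ_K   0x15	/* skip jt instructions if A == k, else jf */
#define OP_RET_K   0x06	/* return k */

/* Hypothetical stand-in with the field layout of struct sock_filter. */
struct insn {
	uint16_t code;
	uint8_t jt, jf;
	uint32_t k;
};

/* Hypothetical fixed-size container, loosely modelled on xt_bpf_info. */
struct prog {
	uint16_t len;
	struct insn insns[MAX_INSTR];
};

/*
 * Parse "N,code jt jf k,code jt jf k,...," where every instruction ends
 * with a comma. Returns 0 on success, -1 on malformed input.
 */
static int parse_bytecode(const char *s, struct prog *p)
{
	unsigned int n, code, jt, jf, k;
	int used = 0;

	if (sscanf(s, "%u ,%n", &n, &used) != 1 || used == 0)
		return -1;
	if (n == 0 || n > MAX_INSTR)
		return -1;

	for (unsigned int i = 0; i < n; i++) {
		s += used;
		used = 0;
		if (sscanf(s, "%u %u %u %u ,%n", &code, &jt, &jf, &k, &used) != 4 ||
		    used == 0)
			return -1;
		if (code > 0xffff || jt > 0xff || jf > 0xff)
			return -1;
		p->insns[i] = (struct insn){ (uint16_t)code, (uint8_t)jt, (uint8_t)jf, k };
	}
	p->len = (uint16_t)n;
	return 0;
}

/*
 * Interpret the subset above. Returns the value of the RET instruction,
 * or 0 on an out-of-bounds load, an unsupported opcode, or running off
 * the end of the program.
 */
static uint32_t run_filter(const struct prog *p, const uint8_t *pkt, size_t len)
{
	uint32_t a = 0;

	for (size_t pc = 0; pc < p->len; pc++) {
		const struct insn *in = &p->insns[pc];

		switch (in->code) {
		case OP_LDH_ABS:
			if (in->k > len || len - in->k < 2)
				return 0;
			a = ((uint32_t)pkt[in->k] << 8) | pkt[in->k + 1];
			break;
		case OP_LDB_ABS:
			if (in->k >= len)
				return 0;
			a = pkt[in->k];
			break;
		case OP_JEQ_K:
			pc += (a == in->k) ? in->jt : in->jf;
			break;
		case OP_RET_K:
			return in->k;
		default:
			return 0;
		}
	}
	return 0;
}
```

With the example string from the commit message and a 64-byte buffer whose bytes 14-15 hold 0x0800 and whose byte 25 holds 20, execution reaches the `6 0 0 96` instruction; any other value at byte 25 falls through to `6 0 0 0`.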