[RFC,0/3] nf_tables string match support

Message ID 20220729093129.3108-1-pablo@netfilter.org

Pablo Neira Ayuso July 29, 2022, 9:31 a.m. UTC
Hi,

The following patchset adds string match support to nf_tables. This new
infrastructure is based on the Aho-Corasick pattern matching algorithm,
which searches for every pattern of a dictionary (a "string set") in a
single linear pass over the input. The implementation is lockless:
updates are performed on a cloned copy of the Aho-Corasick "tree", and
the existing 2-phase commit protocol is used to atomically expose the
new tree to the packet path.
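
To illustrate the clone-then-publish idea described above, here is a
minimal userspace sketch using C11 atomics. It is not code from this
patchset: struct tree, live_tree and the tree_*() helpers are
hypothetical names, and the "tree" is just a flat pattern array standing
in for the real data structure. It also assumes a single updater.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the Aho-Corasick tree (not the patchset's). */
struct tree {
	size_t npatterns;
	char patterns[8][128];
};

/* Pointer dereferenced by the (simulated) packet path. */
static _Atomic(struct tree *) live_tree;

/* Prepare phase: clone the live tree and update the copy only. */
static struct tree *tree_prepare_add(const char *pattern)
{
	struct tree *cur = atomic_load_explicit(&live_tree, memory_order_acquire);
	struct tree *copy = calloc(1, sizeof(*copy));

	if (!copy)
		return NULL;
	if (cur)
		*copy = *cur;
	if (copy->npatterns == 8 ||
	    strlen(pattern) >= sizeof(copy->patterns[0])) {
		free(copy);
		return NULL;
	}
	strcpy(copy->patterns[copy->npatterns++], pattern);
	return copy;
}

/* Commit phase: publish the clone with one pointer swap, return the old tree. */
static struct tree *tree_commit(struct tree *copy)
{
	assert(copy);
	return atomic_exchange_explicit(&live_tree, copy, memory_order_acq_rel);
}

/* Abort phase: the clone was never visible to readers, just drop it. */
static void tree_abort(struct tree *copy)
{
	free(copy);
}
```

Readers only ever see either the old or the new tree, never a
half-updated one. In a real concurrent setting the old tree returned by
tree_commit() can only be released once no reader can still hold it;
this sketch leaves that part out.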

I decided to add a new netlink API for "string sets" rather than reusing
the existing set API, for simplicity. There is a new Kconfig knob,
CONFIG_NFT_STRING, which builds this support into nf_tables. This avoids
an indirection between nf_tables_api and nft_string, given that the
Aho-Corasick API (see the ac_*() functions) is invoked from the nf_tables
netlink frontend.

The implementation of Aho-Corasick comes as a separate file. It is
relatively small (~600 LoC), and a dictionary of 370105 English words
consumes ~150 Mbytes. The maximum string size at this stage is 128 bytes.

The implementation has been validated in userspace with ASAN and
valgrind, using testsuites that combine simple tests with randomized
ones: the dictionary is fed with words at random, and autogenerated text
is patched with a match at a random offset to validate correct matching.
The userspace implementation (very similar to the one in this batch) and
the testsuite are not posted in this batch.
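
Since the testsuite is not posted, the following is only a rough sketch
of what a "patch a match in at a random offset" test could look like. It
is not the actual testsuite: prng_next(), make_text(), run_rounds() and
the naive_match() stand-in for the matcher under test are all
hypothetical. The filler text is uppercase so that a lowercase pattern
can only match where it was patched in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Small deterministic PRNG so a failing case can be replayed from its seed. */
static uint32_t prng_next(uint32_t *state)
{
	*state = *state * 1664525u + 1013904223u;
	return *state >> 16;
}

/* Hypothetical stand-in for the matcher under test: naive scan. */
static int naive_match(const unsigned char *text, size_t len, const char *pat)
{
	size_t plen = strlen(pat);

	for (size_t i = 0; plen && i + plen <= len; i++)
		if (!memcmp(text + i, pat, plen))
			return 1;
	return 0;
}

/* Fill buf with random uppercase text, then patch pat in at a random offset. */
static size_t make_text(unsigned char *buf, size_t len, const char *pat,
			uint32_t *seed)
{
	size_t plen = strlen(pat), off;

	assert(plen > 0 && plen <= len);
	for (size_t i = 0; i < len; i++)
		buf[i] = 'A' + prng_next(seed) % 26;
	off = prng_next(seed) % (len - plen + 1);
	memcpy(buf + off, pat, plen);
	return off;
}

/* Run a number of random cases and return how many the matcher missed. */
static int run_rounds(const char *pat, int rounds, uint32_t seed)
{
	unsigned char buf[256];
	int failures = 0;

	for (int i = 0; i < rounds; i++) {
		make_text(buf, sizeof(buf), pat, &seed);
		if (!naive_match(buf, sizeof(buf), pat))
			failures++;
	}
	return failures;
}
```

Running this kind of loop under ASAN or valgrind also exercises the
matcher at buffer boundaries, because the random offset regularly lands
at the very start or the very end of the text.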

This algorithm is described in "Efficient string matching: An aid to
bibliographic search" by Alfred V. Aho and Margaret J. Corasick, published
in June 1975 in Communications of the ACM 18 (6): 333–340.
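
For readers without the paper at hand, the core of the algorithm can be
sketched in a few lines of userspace C. This is a textbook-style variant
with fixed-size arrays, not the code in this patchset: the struct layout,
the AC_MAX_NODES cap and the ac_sketch_*() names are assumptions made for
illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define AC_MAX_NODES 256 /* arbitrary cap for this sketch */
#define AC_ALPHABET  256

struct ac_node {
	int next[AC_ALPHABET]; /* goto function, 0 means "no edge" before build */
	int fail;              /* failure link */
	int out;               /* patterns ending here, including via fail links */
};

struct ac {
	struct ac_node nodes[AC_MAX_NODES];
	int nnodes;
};

static void ac_sketch_init(struct ac *ac)
{
	memset(ac, 0, sizeof(*ac));
	ac->nnodes = 1; /* node 0 is the root */
}

/* Insert one pattern into the trie. Must be called before the build step. */
static int ac_sketch_add(struct ac *ac, const char *pat)
{
	int cur = 0;

	for (const unsigned char *p = (const unsigned char *)pat; *p; p++) {
		if (!ac->nodes[cur].next[*p]) {
			if (ac->nnodes == AC_MAX_NODES)
				return -1;
			ac->nodes[cur].next[*p] = ac->nnodes++;
		}
		cur = ac->nodes[cur].next[*p];
	}
	ac->nodes[cur].out++;
	return 0;
}

/* Breadth-first pass: compute failure links and fill in missing edges. */
static void ac_sketch_build(struct ac *ac)
{
	int queue[AC_MAX_NODES], head = 0, tail = 0;

	for (int c = 0; c < AC_ALPHABET; c++)
		if (ac->nodes[0].next[c])
			queue[tail++] = ac->nodes[0].next[c]; /* fail = root */

	while (head < tail) {
		int u = queue[head++], f = ac->nodes[u].fail;

		for (int c = 0; c < AC_ALPHABET; c++) {
			int v = ac->nodes[u].next[c];

			if (!v) { /* borrow the transition of the failure state */
				ac->nodes[u].next[c] = ac->nodes[f].next[c];
				continue;
			}
			ac->nodes[v].fail = ac->nodes[f].next[c];
			ac->nodes[v].out += ac->nodes[ac->nodes[v].fail].out;
			assert(tail < AC_MAX_NODES);
			queue[tail++] = v;
		}
	}
}

/* Count all pattern occurrences in one left-to-right pass over the text. */
static int ac_sketch_count(const struct ac *ac, const unsigned char *text,
			   size_t len)
{
	int state = 0, hits = 0;

	for (size_t i = 0; i < len; i++) {
		state = ac->nodes[state].next[text[i]];
		hits += ac->nodes[state].out;
	}
	return hits;
}
```

With the patterns "he", "she", "his" and "hers" from the paper, scanning
"ushers" reports three occurrences ("she", "he" and "hers"). Note that in
this sketch the build step rewrites the edges in place, so patterns
cannot be added afterwards without rebuilding from scratch.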

There are a few aspects I would like to revisit after this RFC, e.g.
netlink notifications are not yet supported.

Please see the individual patch descriptions for implementation details.

Comments welcome.

P.S.: Patch 2 reports 200 deletions on nf_tables_api.c. For some reason
      diff is removing 200 LoC and adding them again after the new netlink
      string API. There are no real line removals, it is just noise.

Pablo Neira Ayuso (3):
  netfilter: add Aho-Corasick string match implementation
  netfilter: nf_tables: add string set API
  netfilter: nf_tables: add string expression

 include/net/netfilter/ahocorasick.h      |   27 +
 include/net/netfilter/nf_tables.h        |   37 +
 include/net/netfilter/nf_tables_core.h   |    1 +
 include/uapi/linux/netfilter/nf_tables.h |   65 ++
 net/netfilter/Kconfig                    |    7 +
 net/netfilter/Makefile                   |    3 +
 net/netfilter/ahocorasick.c              |  677 ++++++++++++
 net/netfilter/nf_tables_api.c            | 1287 ++++++++++++++++++----
 net/netfilter/nf_tables_core.c           |    1 +
 net/netfilter/nft_string.c               |  254 +++++
 10 files changed, 2158 insertions(+), 201 deletions(-)
 create mode 100644 include/net/netfilter/ahocorasick.h
 create mode 100644 net/netfilter/ahocorasick.c
 create mode 100644 net/netfilter/nft_string.c