Message ID | cover.1579647351.git.sbrivio@redhat.com |
---|---|
Headers | show |
Series | nftables: Set implementation for arbitrary concatenation of ranges | expand |
On Wed, Jan 22, 2020 at 12:17:50AM +0100, Stefano Brivio wrote: > Existing nftables set implementations allow matching entries with > interval expressions (rbtree), e.g. 192.0.2.1-192.0.2.4, entries > specifying field concatenation (hash, rhash), e.g. 192.0.2.1:22, > but not both. > > In other words, none of the set types allows matching on range > expressions for more than one packet field at a time, such as ipset > does with types bitmap:ip,mac, and, to a more limited extent > (netmasks, not arbitrary ranges), with types hash:net,net, > hash:net,port, hash:ip,port,net, and hash:net,port,net. > > As a pure hash-based approach is unsuitable for matching on ranges, > and "proxying" the existing red-black tree type looks impractical as > elements would need to be shared and managed across all employed > trees, this new set implementation intends to fill the functionality > gap by employing a relatively novel approach. > > The fundamental idea, illustrated in deeper detail in patch 5/9, is to > use lookup tables classifying a small number of grouped bits from each > field, and map the lookup results in a way that yields a verdict for > the full set of specified fields. > > The grouping bit aspect is loosely inspired by the Grouper algorithm, > by Jay Ligatti, Josh Kuhn, and Chris Gage (see patch 5/9 for the full > reference). > > A reference, stand-alone implementation of the algorithm itself is > available at: > https://pipapo.lameexcu.se > > Some notes about possible future optimisations are also mentioned > there. This algorithm reduces the matching problem to, essentially, > a repetitive sequence of simple bitwise operations, and is > particularly suitable to be optimised by leveraging SIMD instruction > sets. An AVX2-based implementation is also presented in this series. > > I plan to post the adaptation of the existing AVX2 vectorised > implementation for (at least) NEON at a later time. > > Patches 1/9 to 3/9 implement the needed infrastructure: new > attributes are used to describe length of single ranged fields in > concatenations and to denote the upper bound for ranges. > > Patch 4/9 adds a new bitmap operation that copies the source bitmap > onto the destination while removing a given region, and is needed to > delete regions of arrays mapping between lookup tables. > > Patch 5/9 is the actual set implementation. > > Patch 6/9 introduces selftests for the new implementation. Applied up to 6/9. Merge window will close soon and I'm going to be a bit defensive and take only the batch that include the initial implementation. I would prefer if we all use this round to start using the C implementation upstream and report bugs. While I have received positive feedback from other fellows meanwhile privately, this batch is large and I'm inclined to follow this approach. Please, don't be disappointed, and just follow up with more patches once merge window opens up again. Thanks.
Hi Stefano, When playing with adding multiple elements, I suddenly noticed a disturbance in the force (general protection fault). Here's a reproducer: | $NFT -f - <<EOF | table t { | set s { | type ipv4_addr . inet_service | flags interval | } | } | EOF | | $NFT add element t s '{ 10.0.0.1 . 22-25, 10.0.0.1 . 10-20 }' | $NFT flush set t s | $NFT add element t s '{ 10.0.0.1 . 10-20, 10.0.0.1 . 22-25 }' It is pretty reliable, though sometimes needs a second call. Looks like some things going on in parallel which shouldn't. Here's a typical last breath: [ 71.319848] general protection fault, probably for non-canonical address 0x6f6b6e696c2e756e: 0000 [#1] PREEMPT SMP PTI [ 71.321540] CPU: 3 PID: 1201 Comm: kworker/3:2 Not tainted 5.6.0-rc1-00377-g2bb07f4e1d861 #192 [ 71.322746] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190711_202441-buildvm-armv7-10.arm.fedoraproject.org-2.fc31 04/01/2014 [ 71.324430] Workqueue: events nf_tables_trans_destroy_work [nf_tables] [ 71.325387] RIP: 0010:nft_set_elem_destroy+0xa5/0x110 [nf_tables] [ 71.326164] Code: 89 d4 84 c0 74 0e 8b 77 44 0f b6 f8 48 01 df e8 41 ff ff ff 45 84 e4 74 36 44 0f b6 63 08 45 84 e4 74 2c 49 01 dc 49 8b 04 24 <48> 8b 40 38 48 85 c0 74 4f 48 89 e7 4c 8b [ 71.328423] RSP: 0018:ffffc9000226fd90 EFLAGS: 00010282 [ 71.329225] RAX: 6f6b6e696c2e756e RBX: ffff88813ab79f60 RCX: ffff88813931b5a0 [ 71.330365] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88813ab79f9a [ 71.331473] RBP: ffff88813ab79f60 R08: 0000000000000008 R09: 0000000000000000 [ 71.332627] R10: 000000000000021c R11: 0000000000000000 R12: ffff88813ab79fc2 [ 71.333615] R13: ffff88813b3adf50 R14: dead000000000100 R15: ffff88813931b8a0 [ 71.334596] FS: 0000000000000000(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000 [ 71.335780] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 71.336577] CR2: 000055ac683710f0 CR3: 000000013a222003 CR4: 0000000000360ee0 [ 71.337533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 71.338557] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 71.339718] Call Trace: [ 71.340093] nft_pipapo_destroy+0x7a/0x170 [nf_tables_set] [ 71.340973] nft_set_destroy+0x20/0x50 [nf_tables] [ 71.341879] nf_tables_trans_destroy_work+0x246/0x260 [nf_tables] [ 71.342916] process_one_work+0x1d5/0x3c0 [ 71.343601] worker_thread+0x4a/0x3c0 [ 71.344229] kthread+0xfb/0x130 [ 71.344780] ? process_one_work+0x3c0/0x3c0 [ 71.345477] ? kthread_park+0x90/0x90 [ 71.346129] ret_from_fork+0x35/0x40 [ 71.346748] Modules linked in: nf_tables_set nf_tables nfnetlink 8021q [last unloaded: nfnetlink] [ 71.348153] ---[ end trace 2eaa8149ca759bcc ]--- [ 71.349066] RIP: 0010:nft_set_elem_destroy+0xa5/0x110 [nf_tables] [ 71.350016] Code: 89 d4 84 c0 74 0e 8b 77 44 0f b6 f8 48 01 df e8 41 ff ff ff 45 84 e4 74 36 44 0f b6 63 08 45 84 e4 74 2c 49 01 dc 49 8b 04 24 <48> 8b 40 38 48 85 c0 74 4f 48 89 e7 4c 8b [ 71.350017] RSP: 0018:ffffc9000226fd90 EFLAGS: 00010282 [ 71.350019] RAX: 6f6b6e696c2e756e RBX: ffff88813ab79f60 RCX: ffff88813931b5a0 [ 71.350019] RDX: 0000000000000001 RSI: 0000000000000000 RDI: ffff88813ab79f9a [ 71.350020] RBP: ffff88813ab79f60 R08: 0000000000000008 R09: 0000000000000000 [ 71.350021] R10: 000000000000021c R11: 0000000000000000 R12: ffff88813ab79fc2 [ 71.350022] R13: ffff88813b3adf50 R14: dead000000000100 R15: ffff88813931b8a0 [ 71.350025] FS: 0000000000000000(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000 [ 71.350026] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 71.350027] CR2: 000055ac683710f0 CR3: 000000013a222003 CR4: 0000000000360ee0 [ 71.350028] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 71.350028] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 71.350030] Kernel panic - not syncing: Fatal exception [ 71.350412] Kernel Offset: disabled [ 71.365922] ---[ end Kernel panic - not syncing: Fatal exception ]--- Cheers, Phil
Hi Phil, On Thu, 20 Feb 2020 11:52:41 +0100 Phil Sutter <phil@nwl.cc> wrote: > Hi Stefano, > > When playing with adding multiple elements, I suddenly noticed a > disturbance in the force (general protection fault). Here's a > reproducer: > > | $NFT -f - <<EOF > | table t { > | set s { > | type ipv4_addr . inet_service > | flags interval > | } > | } > | EOF > | > | $NFT add element t s '{ 10.0.0.1 . 22-25, 10.0.0.1 . 10-20 }' > | $NFT flush set t s > | $NFT add element t s '{ 10.0.0.1 . 10-20, 10.0.0.1 . 22-25 }' > > It is pretty reliable, though sometimes needs a second call. Looks like some > things going on in parallel which shouldn't. Here's a typical last breath: > > [ 71.319848] general protection fault, probably for non-canonical address 0x6f6b6e696c2e756e: 0000 [#1] PREEMPT SMP PTI > [ 71.321540] CPU: 3 PID: 1201 Comm: kworker/3:2 Not tainted 5.6.0-rc1-00377-g2bb07f4e1d861 #192 > [ 71.322746] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190711_202441-buildvm-armv7-10.arm.fedoraproject.org-2.fc31 04/01/2014 > [ 71.324430] Workqueue: events nf_tables_trans_destroy_work [nf_tables] > [ 71.325387] RIP: 0010:nft_set_elem_destroy+0xa5/0x110 [nf_tables] Ouch, thanks for reporting, I'll check in a few hours.