monitor: fix printing of range elements in named sets

Message ID 149935174634.19966.16018870027671610502.stgit@nfdev2.cica.es
State Changes Requested
Delegated to: Pablo Neira
Headers show

Commit Message

Arturo Borrero Gonzalez July 6, 2017, 2:36 p.m.
If you add set elements to interval sets, the output is wrong.
Fix this by caching first element of the range (first event),
then wait for the second element of the range (second event) to
print them both at the same time.

We also avoid printing the first null element required in the RB tree.

Before this patch:

% nft add element t s {10-20, 30-40}
add element ip t s { 0 }
add element ip t s { 10 }
add element ip t s { ftp }
add element ip t s { 30 }
add element ip t s { 41 }

After this patch:

% nft add element t s {10-20, 30-40}
add element ip t s { 10-20 }
add element ip t s { 30-40 }

CC: Phil Sutter <phil@nwl.cc>
Signed-off-by: Arturo Borrero Gonzalez <arturo@netfilter.org>
---

This was discussed during Netfilter Workshop 2017 in Faro, Portugal.
I think Phil has another patch to address this issue from a different
approach.

 include/rule.h |    2 ++
 src/netlink.c  |   50 ++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 52 insertions(+)


--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Phil Sutter July 11, 2017, 6:11 p.m. | #1
Hi,

On Thu, Jul 06, 2017 at 04:36:45PM +0200, Arturo Borrero Gonzalez wrote:
> If you add set elements to interval sets, the output is wrong.
> Fix this by caching first element of the range (first event),
> then wait for the second element of the range (second event) to
> print them both at the same time.

As promised, I am preparing my own solution for side-by-side comparison.
Though I'm running into problems and want to use the occasion to discuss
them first:

What I wasn't able to solve yet are half-open ranges, like so:

| nft add set ip t portrange { type inet_service; flags interval; }
| nft add element ip t portrange { 1024-65535 }

In this case there is only a single element with value 1024 which
doesn't have EXPR_F_INTERVAL_END set. Looking at
interval_map_decompose(), this is identified to be a range till the end
of the scope if it's the last element in the set.

In monitor code though, I can't predict whether an interval end element
will come afterwards or not, so I end up caching the element and
everything turns into a mess. I'm pretty sure your solution has the same
problem, could you check that?

Right now, I only see two ways to get this sorted:

1) Change kernel code to always include both start end end of a range in
   a single notification. This would eliminate the need for any caching
   in netlink_events_setelem_cb() altogether!

2) Change monitor code to cache all events until the final NFTA_GEN_ID
   message, then handle all messages at once.

What do you think?

Thanks, Phil
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Arturo Borrero Gonzalez July 12, 2017, 11:24 a.m. | #2
On 11 July 2017 at 20:11, Phil Sutter <phil@nwl.cc> wrote:
> Hi,
>
> On Thu, Jul 06, 2017 at 04:36:45PM +0200, Arturo Borrero Gonzalez wrote:
>> If you add set elements to interval sets, the output is wrong.
>> Fix this by caching first element of the range (first event),
>> then wait for the second element of the range (second event) to
>> print them both at the same time.
>
> As promised, I am preparing my own solution for side-by-side comparison.
> Though I'm running into problems and want to use the occasion to discuss
> them first:
>
> What I wasn't able to solve yet are half-open ranges, like so:
>
> | nft add set ip t portrange { type inet_service; flags interval; }
> | nft add element ip t portrange { 1024-65535 }
>
> In this case there is only a single element with value 1024 which
> doesn't have EXPR_F_INTERVAL_END set. Looking at
> interval_map_decompose(), this is identified to be a range till the end
> of the scope if it's the last element in the set.
>
> In monitor code though, I can't predict whether an interval end element
> will come afterwards or not, so I end up caching the element and
> everything turns into a mess. I'm pretty sure your solution has the same
> problem, could you check that?
>
> Right now, I only see two ways to get this sorted:
>
> 1) Change kernel code to always include both start end end of a range in
>    a single notification. This would eliminate the need for any caching
>    in netlink_events_setelem_cb() altogether!
>
> 2) Change monitor code to cache all events until the final NFTA_GEN_ID
>    message, then handle all messages at once.
>
> What do you think?
>

We should avoid touching the kernel for this.

Anyway, my patch doesn't solve the same issue for deleting range elements.
In this patch I added the logic in netlink_events_setelem_cb() and
probably a better place for this
is in the netlink_events_cache_update() routine.

I'm sending a patch to add a bit of debugging to the monitor code path
meanwhile we solve this issue.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Phil Sutter July 12, 2017, 12:13 p.m. | #3
Hi,

On Wed, Jul 12, 2017 at 01:24:24PM +0200, Arturo Borrero Gonzalez wrote:
> On 11 July 2017 at 20:11, Phil Sutter <phil@nwl.cc> wrote:
> > Hi,
> >
> > On Thu, Jul 06, 2017 at 04:36:45PM +0200, Arturo Borrero Gonzalez wrote:
> >> If you add set elements to interval sets, the output is wrong.
> >> Fix this by caching first element of the range (first event),
> >> then wait for the second element of the range (second event) to
> >> print them both at the same time.
> >
> > As promised, I am preparing my own solution for side-by-side comparison.
> > Though I'm running into problems and want to use the occasion to discuss
> > them first:
> >
> > What I wasn't able to solve yet are half-open ranges, like so:
> >
> > | nft add set ip t portrange { type inet_service; flags interval; }
> > | nft add element ip t portrange { 1024-65535 }
> >
> > In this case there is only a single element with value 1024 which
> > doesn't have EXPR_F_INTERVAL_END set. Looking at
> > interval_map_decompose(), this is identified to be a range till the end
> > of the scope if it's the last element in the set.
> >
> > In monitor code though, I can't predict whether an interval end element
> > will come afterwards or not, so I end up caching the element and
> > everything turns into a mess. I'm pretty sure your solution has the same
> > problem, could you check that?
> >
> > Right now, I only see two ways to get this sorted:
> >
> > 1) Change kernel code to always include both start end end of a range in
> >    a single notification. This would eliminate the need for any caching
> >    in netlink_events_setelem_cb() altogether!
> >
> > 2) Change monitor code to cache all events until the final NFTA_GEN_ID
> >    message, then handle all messages at once.
> >
> > What do you think?
> >
> 
> We should avoid touching the kernel for this.

I found a userspace solution yesterday: I created a new callback for
NFT_MSG_NEWGEN, which will take care of the remaining cached element.
Patch follows later.

Thanks, Phil
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Pablo Neira Ayuso July 17, 2017, 4:26 p.m. | #4
Hi Arturo,

On Thu, Jul 06, 2017 at 04:36:45PM +0200, Arturo Borrero Gonzalez wrote:
> If you add set elements to interval sets, the output is wrong.
> Fix this by caching first element of the range (first event),
> then wait for the second element of the range (second event) to
> print them both at the same time.
> 
> We also avoid printing the first null element required in the RB tree.
> 
> Before this patch:
> 
> % nft add element t s {10-20, 30-40}
> add element ip t s { 0 }
> add element ip t s { 10 }
> add element ip t s { ftp }
> add element ip t s { 30 }
> add element ip t s { 41 }
> 
> After this patch:
> 
> % nft add element t s {10-20, 30-40}
> add element ip t s { 10-20 }
> add element ip t s { 30-40 }
> 
> CC: Phil Sutter <phil@nwl.cc>
> Signed-off-by: Arturo Borrero Gonzalez <arturo@netfilter.org>
> ---
> 
> This was discussed during Netfilter Workshop 2017 in Faro, Portugal.
> I think Phil has another patch to address this issue from a different
> approach.

I like this approach, it's simple. Still you mentioned it doesn't work
for several cases. One of them are deletions.

Also Phil has found a number of corner cases that still need special
handling, problems are:

1) nft flush set gets elements in reverse order, we can probably fix
   this from the kernel (flush set support is quite recent). Or just
   use the mergesort function to get them sorted out from userspace.

2) Half-open ranges, it's just a single element with the end flag
   interval set on, in such case, we just print what we have.

Anything else?

There's a list of threads discussing this split in several patches,
it's making it a bit hard to follow you guys ;-).

>  include/rule.h |    2 ++
>  src/netlink.c  |   50 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 52 insertions(+)
> 
> diff --git a/include/rule.h b/include/rule.h
> index 7424b21..1b44e4c 100644
> --- a/include/rule.h
> +++ b/include/rule.h
> @@ -217,6 +217,7 @@ extern struct rule *rule_lookup(const struct chain *chain, uint64_t handle);
>   * @datalen:	mapping data len
>   * @objtype:	mapping object type
>   * @init:	initializer
> + * @rg_cache:	cached range element (left)
>   * @policy:	set mechanism policy
>   * @desc:	set mechanism desc
>   */
> @@ -234,6 +235,7 @@ struct set {
>  	unsigned int		datalen;
>  	uint32_t		objtype;
>  	struct expr		*init;
> +	struct expr		*rg_cache;

Why not just add this element to the set, instead of caching it here
in this new field? I mean, place it in set->init?

>  	uint32_t		policy;
>  	struct {
>  		uint32_t	size;
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

diff --git a/include/rule.h b/include/rule.h
index 7424b21..1b44e4c 100644
--- a/include/rule.h
+++ b/include/rule.h
@@ -217,6 +217,7 @@  extern struct rule *rule_lookup(const struct chain *chain, uint64_t handle);
  * @datalen:	mapping data len
  * @objtype:	mapping object type
  * @init:	initializer
+ * @rg_cache:	cached range element (left)
  * @policy:	set mechanism policy
  * @desc:	set mechanism desc
  */
@@ -234,6 +235,7 @@  struct set {
 	unsigned int		datalen;
 	uint32_t		objtype;
 	struct expr		*init;
+	struct expr		*rg_cache;
 	uint32_t		policy;
 	struct {
 		uint32_t	size;
diff --git a/src/netlink.c b/src/netlink.c
index 880502c..ad0e712 100644
--- a/src/netlink.c
+++ b/src/netlink.c
@@ -2198,6 +2198,46 @@  out:
 	return MNL_CB_OK;
 }
 
+/* returns true if the event should be ignored (i.e. null element) */
+static bool netlink_event_ignore_range_event(struct nftnl_set_elem *nlse)
+{
+        uint32_t flags = 0;
+
+	if (nftnl_set_elem_is_set(nlse, NFTNL_SET_ELEM_FLAGS))
+		flags = nftnl_set_elem_get_u32(nlse, NFTNL_SET_ELEM_FLAGS);
+	if (!(flags & NFT_SET_ELEM_INTERVAL_END))
+		return false;
+
+	if (nftnl_set_elem_get_u32(nlse, NFTNL_SET_ELEM_KEY) != 0)
+		return false;
+
+	return true;
+}
+
+/* returns true if the we cached the range element */
+static bool netlink_event_range_cache(struct set *cached_set,
+				      struct set *dummyset)
+{
+	struct expr *elem;
+
+	/* not an interval ? */
+	if (!(cached_set->flags & NFT_SET_INTERVAL))
+		return false;
+
+	/* cache first element of the range */
+	elem = list_entry(dummyset->init->expressions.prev, struct expr, list);
+	if (!(elem->flags & EXPR_F_INTERVAL_END)) {
+		cached_set->rg_cache = expr_clone(elem);
+		return true;
+	}
+
+	/* we have all the information now */
+	compound_expr_add(dummyset->init, cached_set->rg_cache);
+	interval_map_decompose(dummyset->init);
+
+	return false;
+}
+
 static int netlink_events_setelem_cb(const struct nlmsghdr *nlh, int type,
 				     struct netlink_mon_handler *monh)
 {
@@ -2240,6 +2280,11 @@  static int netlink_events_setelem_cb(const struct nlmsghdr *nlh, int type,
 
 		nlse = nftnl_set_elems_iter_next(nlsei);
 		while (nlse != NULL) {
+			if (netlink_event_ignore_range_event(nlse)) {
+				set_free(dummyset);
+				nftnl_set_elems_iter_destroy(nlsei);
+				goto out;
+			}
 			if (netlink_delinearize_setelem(nlse, dummyset) < 0) {
 				set_free(dummyset);
 				nftnl_set_elems_iter_destroy(nlsei);
@@ -2251,6 +2296,10 @@  static int netlink_events_setelem_cb(const struct nlmsghdr *nlh, int type,
 
 		switch (type) {
 		case NFT_MSG_NEWSETELEM:
+			if (netlink_event_range_cache(set, dummyset)) {
+				set_free(dummyset);
+				goto out;
+			}
 			printf("add ");
 			break;
 		case NFT_MSG_DELSETELEM:
@@ -2264,6 +2313,7 @@  static int netlink_events_setelem_cb(const struct nlmsghdr *nlh, int type,
 		expr_print(dummyset->init, monh->ctx->octx);
 		printf("\n");
 
+		expr_free(set->rg_cache);
 		set_free(dummyset);
 		break;
 	case NFTNL_OUTPUT_XML: