diff mbox series

[nf] netfilter: ctnetlink: remove expired entries first

Message ID 20211209163926.25563-1-fw@strlen.de
State Accepted
Delegated to: Pablo Neira
Headers show
Series [nf] netfilter: ctnetlink: remove expired entries first | expand

Commit Message

Florian Westphal Dec. 9, 2021, 4:39 p.m. UTC
When dumping conntrack table to userspace via ctnetlink, check if the ct has
already expired before doing any of the 'skip' checks.

This expires dead entries faster.
/proc handler also removes outdated entries first.

Reported-by: Vitaly Zuevsky <vzuevsky@ns1.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 Vitaly, I suspect this might be related to the issue you reported,
 I suspect we skip the NAT-clash entries instead of evicting them from
 ctnetlink path too.

 net/netfilter/nf_conntrack_netlink.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Vitaly Zuevsky Dec. 9, 2021, 5:08 p.m. UTC | #1
On Thu, Dec 9, 2021 at 4:39 PM Florian Westphal <fw@strlen.de> wrote:
>
> When dumping conntrack table to userspace via ctnetlink, check if the ct has
> already expired before doing any of the 'skip' checks.
>
> This expires dead entries faster.
> /proc handler also removes outdated entries first.
>
> Reported-by: Vitaly Zuevsky <vzuevsky@ns1.com>
> Signed-off-by: Florian Westphal <fw@strlen.de>
> ---
>  Vitaly, I suspect this might be related to the issue you reported,
>  I suspect we skip the NAT-clash entries instead of evicting them from
>  ctnetlink path too.
>
>  net/netfilter/nf_conntrack_netlink.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
> index 81d03acf68d4..ec4164c32d27 100644
> --- a/net/netfilter/nf_conntrack_netlink.c
> +++ b/net/netfilter/nf_conntrack_netlink.c
> @@ -1195,8 +1195,6 @@ ctnetlink_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
>                 }
>                 hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[cb->args[0]],
>                                            hnnode) {
> -                       if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
> -                               continue;
>                         ct = nf_ct_tuplehash_to_ctrack(h);
>                         if (nf_ct_is_expired(ct)) {
>                                 if (i < ARRAY_SIZE(nf_ct_evict) &&
> @@ -1208,6 +1206,9 @@ ctnetlink_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
>                         if (!net_eq(net, nf_ct_net(ct)))
>                                 continue;
>
> +                       if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
> +                               continue;
> +
>                         if (cb->args[1]) {
>                                 if (ct != last)
>                                         continue;
> --
> 2.32.0
>

Florian, thanks for prompt turnaround on this. Seeing
conntrack -C
107530
mandates the check what flows consume this many entries. I cannot do
this if conntrack -L skips anything while kernel defaults to not
exposing conntrack table via /proc. This server is not supposed to NAT
anything by the way. We use some -j NOTRACK rules and I'd like to see
what flows evade those rules and consume so many slots. Could you
perhaps recommend a way to achieve this other than reconfiguring the
kernel?
Florian Westphal Dec. 9, 2021, 5:11 p.m. UTC | #2
Vitaly Zuevsky <vzuevsky@ns1.com> wrote:
> > diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
> > index 81d03acf68d4..ec4164c32d27 100644
> > --- a/net/netfilter/nf_conntrack_netlink.c
> > +++ b/net/netfilter/nf_conntrack_netlink.c
> > @@ -1195,8 +1195,6 @@ ctnetlink_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
> >                 }
> >                 hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[cb->args[0]],
> >                                            hnnode) {
> > -                       if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
> > -                               continue;
> >                         ct = nf_ct_tuplehash_to_ctrack(h);
> >                         if (nf_ct_is_expired(ct)) {
> >                                 if (i < ARRAY_SIZE(nf_ct_evict) &&
> > @@ -1208,6 +1206,9 @@ ctnetlink_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
> >                         if (!net_eq(net, nf_ct_net(ct)))
> >                                 continue;
> >
> > +                       if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
> > +                               continue;
> > +
> >                         if (cb->args[1]) {
> >                                 if (ct != last)
> >                                         continue;
> > --
> > 2.32.0
> >
> 
> Florian, thanks for prompt turnaround on this. Seeing
> conntrack -C
> 107530
> mandates the check what flows consume this many entries. I cannot do
> this if conntrack -L skips anything while kernel defaults to not
> exposing conntrack table via /proc. This server is not supposed to NAT
> anything by the way.

Then this patch doesn't change anything.

Maybe 'conntrack -L unconfirmed' or 'conntrack -L dying' show something?
Vitaly Zuevsky Dec. 9, 2021, 6:23 p.m. UTC | #3
On Thu, Dec 9, 2021 at 5:11 PM Florian Westphal <fw@strlen.de> wrote:
> > > --
> > > 2.32.0
> > >
> >
> > Florian, thanks for prompt turnaround on this. Seeing
> > conntrack -C
> > 107530
> > mandates the check what flows consume this many entries. I cannot do
> > this if conntrack -L skips anything while kernel defaults to not
> > exposing conntrack table via /proc. This server is not supposed to NAT
> > anything by the way.
>
> Then this patch doesn't change anything.
>
> Maybe 'conntrack -L unconfirmed' or 'conntrack -L dying' show something?

Are you saying that was a patch? v2.32.0? Mind sharing a link for
downloading the source and/or packaged release?
I would like to test it just in case, and if no luck, what do i do to
file it as a bug?
Pablo Neira Ayuso Dec. 16, 2021, 1:10 p.m. UTC | #4
On Thu, Dec 09, 2021 at 05:39:26PM +0100, Florian Westphal wrote:
> When dumping conntrack table to userspace via ctnetlink, check if the ct has
> already expired before doing any of the 'skip' checks.
> 
> This expires dead entries faster.
> /proc handler also removes outdated entries first.

Applied, thanks
Vitaly Zuevsky Dec. 17, 2021, 6:47 p.m. UTC | #5
Hi Florian

Do you have any news on this?
Meanwhile I cloned the repo git://git.netfilter.org/conntrack-tools,
ran ./autogen.sh to produce configure, and the latter failed with:

checking for rpc/rpc_msg.h... yes
./configure: line 13329: syntax error near unexpected token `LIBTIRPC,'
./configure: line 13329: `  PKG_CHECK_MODULES(LIBTIRPC, libtirpc >= 0.1)'

Interestingly, PKG_CHECK_MODULES was never defined there. Is that
repository for production code - I am confused?

Thank you.

Br Vitaly

On Thu, Dec 9, 2021 at 6:23 PM Vitaly Zuevsky <vzuevsky@ns1.com> wrote:
>
> On Thu, Dec 9, 2021 at 5:11 PM Florian Westphal <fw@strlen.de> wrote:
> > > > --
> > > > 2.32.0
> > > >
> > >
> > > Florian, thanks for prompt turnaround on this. Seeing
> > > conntrack -C
> > > 107530
> > > mandates the check what flows consume this many entries. I cannot do
> > > this if conntrack -L skips anything while kernel defaults to not
> > > exposing conntrack table via /proc. This server is not supposed to NAT
> > > anything by the way.
> >
> > Then this patch doesn't change anything.
> >
> > Maybe 'conntrack -L unconfirmed' or 'conntrack -L dying' show something?
>
> Are you saying that was a patch? v2.32.0? Mind sharing a link for
> downloading the source and/or packaged release?
> I would like to test it just in case, and if no luck, what do i do to
> file it as a bug?
Florian Westphal Dec. 17, 2021, 7:04 p.m. UTC | #6
Vitaly Zuevsky <vzuevsky@ns1.com> wrote:
> Hi Florian
> 
> Do you have any news on this?
> Meanwhile I cloned the repo git://git.netfilter.org/conntrack-tools,
> ran ./autogen.sh to produce configure, and the latter failed with:
> 
> checking for rpc/rpc_msg.h... yes
> ./configure: line 13329: syntax error near unexpected token `LIBTIRPC,'
> ./configure: line 13329: `  PKG_CHECK_MODULES(LIBTIRPC, libtirpc >= 0.1)'
> 
> Interestingly, PKG_CHECK_MODULES was never defined there. Is that
> repository for production code - I am confused?

Sure.  But the patch is for the kernel.
I already mentioned that this doesn't handle anything for non-nat case.

> > > Maybe 'conntrack -L unconfirmed' or 'conntrack -L dying' show something?

Still stands.

Also, is there really a discrepancy? Please show output of

conntrack -C
conntrack -L | wc -l
conntrack -C

"conntrack -L" reclaims dead/timed-out entries, conntrack -C currently
does not.
Vitaly Zuevsky Dec. 17, 2021, 7:49 p.m. UTC | #7
On Fri, Dec 17, 2021 at 7:04 PM Florian Westphal <fw@strlen.de> wrote:
> Sure.  But the patch is for the kernel.
> I already mentioned that this doesn't handle anything for non-nat case.
>
> > > > Maybe 'conntrack -L unconfirmed' or 'conntrack -L dying' show something?
>
> Still stands.
>
> Also, is there really a discrepancy? Please show output of
>
> conntrack -C
> conntrack -L | wc -l
> conntrack -C
>
> "conntrack -L" reclaims dead/timed-out entries, conntrack -C currently
> does not.

Of course... It is an order of magnitude difference:

# conntrack -L unconfirmed
conntrack v1.4.4 (conntrack-tools): 0 flow entries have been shown.

# conntrack -L dying
conntrack v1.4.4 (conntrack-tools): 0 flow entries have been shown.

# conntrack -C
88064

# conntrack -L | wc -l
conntrack v1.4.4 (conntrack-tools): 7641 flow entries have been shown.
7641

# conntrack -C
87706

# cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.5 LTS"
Vitaly Zuevsky Dec. 23, 2021, 5:42 p.m. UTC | #8
Hi Florian

Further to our investigation, I have identified the cause of confusion:

# conntrack -C
141412
# conntrack -L >/dev/null
conntrack v1.4.4 (conntrack-tools): 8342 flow entries have been shown.
# conntrack -f ipv6 -L >/dev/null
conntrack v1.4.4 (conntrack-tools): 124358 flow entries have been shown.

so -C option implicitly sums up ipv4 and ipv6 counts, and -L option
defaults to ipv4. This is neither a consistent UX practice nor
properly documented. I am happy to bring it up - Christmas present xD
diff mbox series

Patch

diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
index 81d03acf68d4..ec4164c32d27 100644
--- a/net/netfilter/nf_conntrack_netlink.c
+++ b/net/netfilter/nf_conntrack_netlink.c
@@ -1195,8 +1195,6 @@  ctnetlink_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
 		}
 		hlist_nulls_for_each_entry(h, n, &nf_conntrack_hash[cb->args[0]],
 					   hnnode) {
-			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
-				continue;
 			ct = nf_ct_tuplehash_to_ctrack(h);
 			if (nf_ct_is_expired(ct)) {
 				if (i < ARRAY_SIZE(nf_ct_evict) &&
@@ -1208,6 +1206,9 @@  ctnetlink_dump_table(struct sk_buff *skb, struct netlink_callback *cb)
 			if (!net_eq(net, nf_ct_net(ct)))
 				continue;
 
+			if (NF_CT_DIRECTION(h) != IP_CT_DIR_ORIGINAL)
+				continue;
+
 			if (cb->args[1]) {
 				if (ct != last)
 					continue;