
[4.14,0/4] netfilter: xt_connlimit: backport upstream fixes for race in connection counting

Message ID 20190102170023.10415-1-mfo@canonical.com

Message

Mauricio Faria de Oliveira Jan. 2, 2019, 5 p.m. UTC
Recently, Alakesh Haloi reported the following issue [1] with stable/4.14:

  """
  An iptables rule like the following on multicore systems will result in
  accepting more connections than set in the rule.

  iptables  -A INPUT -p tcp -m tcp --syn --dport 7777 -m connlimit \
        --connlimit-above 2000 --connlimit-mask 0 -j DROP
  """

Alakesh also proposed a fix that is not in Linus's tree. The discussion then
turned to confirming whether the issue was still reproducible at the
mainline/nf.git tip, and to either identifying the upstream fix or
re-submitting the non-upstream fix.

Alakesh was eventually able to test with upstream, and reported that the issue
was still reproducible there [2].
On that point our findings diverge, at least in my test environment:

First, I verified that the suggested mainline fix for the issue [3] does fix
it, by testing v4.18 with the commit applied and with it reverted (the revert
applies cleanly). The issue is reproducible with the commit reverted.

Then, with a consistent reproducer in hand, I moved to nf.git, with HEAD at
commit a007232 ("netfilter: nf_conncount: fix argument order to find_next_bit"),
and the issue was not reproducible there. This held even with 20+ client
threads (the number Alakesh reported needing to reach 2150+ connections [4]),
and also when spreading the network interface's IRQ affinity over more and
more CPUs.

Either way, the suggested mainline fix does fix the issue in 4.14 in at least
one environment. So it may well be that Alakesh's test environment has
differences or subtleties that lead to more connections being accepted, and
that more commits are needed for that particular type of environment.

For now, with one bare-metal environment (24-core server, 4-core client)
verified, I decided to submit the patches for review, comments, and testing,
and to look for additional fixes for that environment separately.

The fix is PATCH 4/4; PATCHes 1-3/4 are prerequisites that make the backport
cleaner. All backports are simple, and essentially consist of refreshed
context lines and the use of older struct/file names.

Reviews from netfilter maintainers are very much appreciated, as I have no
previous experience in this area. Although the backports look simple and build
and run correctly, there is usually something that only more experienced
people will notice.

Thanks,
Mauricio

Links:
=====

  [1] https://www.spinics.net/lists/stable/msg270040.html
  [2] https://www.spinics.net/lists/stable/msg273669.html
  [3] https://www.spinics.net/lists/stable/msg271300.html
  [4] https://www.spinics.net/lists/stable/msg273669.html

Test-case:
=========

 - v4.14.91 (original): client gets well past 2000 connections (reaching its
                        6000 target) with 3 threads.

    server # iptables -F
    server # iptables -A INPUT -p tcp -m tcp --syn --dport 7777 -m connlimit --connlimit-above 2000 --connlimit-mask 0 -j DROP 

    server # iptables -L
    Chain INPUT (policy ACCEPT)
    target     prot opt source               destination         
    DROP       tcp  --  anywhere             anywhere             tcp dpt:7777 flags:FIN,SYN,RST,ACK/SYN #conn src/0 > 2000

    Chain FORWARD (policy ACCEPT)
    target     prot opt source               destination         

    Chain OUTPUT (policy ACCEPT)
    target     prot opt source               destination         

    server # ulimit -SHn 65000
    server # ruby server.rb
    <... listening ...>


    client # ulimit -SHn 65000
    client # ruby client.rb 10.230.56.100 7777 6000 3
    Connecting to ["10.230.56.100"]:7777 6000 times with 3
    1
    2
    3
    <...>
    2000
    <...>
    6000
    Target reached. Thread finishing
    6001
    Target reached. Thread finishing
    6002
    Target reached. Thread finishing
    Threads done. 6002 connections
    press enter to exit

 - v4.14.91 + patches: client only achieves 2000 connections.

    server #  (same procedure)

    client #  (same procedure)

    Connecting to ["10.230.56.100"]:7777 6000 times with 3
    1
    2
    3
    <...>
    2000
    <... blocked for a while...>
    failed to create connection: Connection timed out - connect(2) for "10.230.56.100" port 7777
    failed to create connection: Connection timed out - connect(2) for "10.230.56.100" port 7777
    failed to create connection: Connection timed out - connect(2) for "10.230.56.100" port 7777
    Threads done. 2000 connections
    press enter to exit

Florian Westphal (2):
  netfilter: xt_connlimit: don't store address in the conn nodes
  netfilter: nf_conncount: fix garbage collection confirm race

Pablo Neira Ayuso (1):
  netfilter: nf_conncount: expose connection list interface

Yi-Hung Wei (1):
  netfilter: nf_conncount: Fix garbage collection with zones

 include/net/netfilter/nf_conntrack_count.h | 15 +++++
 net/netfilter/xt_connlimit.c               | 99 +++++++++++++++++++++++-------
 2 files changed, 91 insertions(+), 23 deletions(-)
 create mode 100644 include/net/netfilter/nf_conntrack_count.h

Comments

Florian Westphal Jan. 2, 2019, 5:17 p.m. UTC | #1
Mauricio Faria de Oliveira <mfo@canonical.com> wrote:
<snip>
> Either way, the suggested mainline fix does actually fix the issue in 4.14
> for at least one environment. So, it might well be the case that Alakesh's
> test environment has differences/subtleties that lead to more connections
> accepted, and more commits are needed for that particular environment type.

nf_conncount has a design flaw that is only closed in nf.git/net.git
at the time of this writing, so results with earlier kernels (including
4.20) might just fail with different bugs.

4.14 doesn't have those problems, so I think this series (aside from the
nit in patch 4/4) indeed should fix the issue reported.

> But for now, with one bare-metal environment (24-core server, 4-core client)
> verified, I thought of submitting the patches for review/comments/testing,
> then looking for additional fixes for that environment separately.

4.14 should be good after this afaics.

Thanks a lot for doing this backport and the detailed testing
information.

Mauricio Faria de Oliveira Jan. 2, 2019, 7:52 p.m. UTC | #2
Florian,

On Wed, Jan 2, 2019 at 3:17 PM Florian Westphal <fw@strlen.de> wrote:
>
> Mauricio Faria de Oliveira <mfo@canonical.com> wrote:
<snip>
> > Either way, the suggested mainline fix does actually fix the issue in 4.14
> > for at least one environment. So, it might well be the case that Alakesh's
> > test environment has differences/subtleties that lead to more connections
> > accepted, and more commits are needed for that particular environment type.
>
> nf_conncount has a design flaw that is only closed in nf.git/net.git
> at the time of this writing, so results with earlier kernels (including
> 4.20) might just fail with different bugs.
>
> 4.14 doesn't have those problems, so I think this series (aside from the
> nit in patch 4/4) indeed should fix the issue reported.

Thanks for mentioning that; it is somewhat reassuring regarding the
different results observed.

> > But for now, with one bare-metal environment (24-core server, 4-core client)
> > verified, I thought of submitting the patches for review/comments/testing,
> > then looking for additional fixes for that environment separately.
>
> 4.14 should be good after this afaics.
>
> Thanks a lot for doing this backport and the detailed testing
> information.

Thank you very much for the quick and careful review.
I'll build, test, and submit a PATCH v2 series (with that fix to patch 4/4) shortly.

cheers,