mbox series

[SRU,Bionic,0/2] Fix nf_conntrack races when dealing with same origin requests in NAT environments

Message ID 20190717003041.9775-1-matthew.ruffell@canonical.com
Headers show
Series Fix nf_conntrack races when dealing with same origin requests in NAT environments | expand

Message

Matthew Ruffell July 17, 2019, 12:30 a.m. UTC
BugLink: https://bugs.launchpad.net/bugs/1836816

[Impact]

There are a number of races in nf_conntrack when multiple packets are sent from
the same origin to the same destination at the same time, where there is no
pre-existing confirmed conntrack entry, in a NAT environment.

This is most frequently seen in users with kubernetes workloads, where dns
requests are sent from the same socket at the same time, from different threads.
One request is for a A record, the other AAAA. A conntrack race happens, which
leads to one request being dropped, which leads to 5 second dns timeouts.

This problem is not specific to kubernetes, as any multi-threaded process
which sends UDP packets from the same socket at the same time from different
threads is affected.

Note that since UDP is connectionless, no packet is sent when connect() is
called, so no conntrack entries are created until a packet is first sent.

In the scenario where two UDP packets are sent at the same time from the same
socket on different threads, there are the following possible races:

1.) Neither of the packets find a confirmed conntrack entry. This leads to two
    conntrack entries being created with the same tuples.
2.) Same as 1), but a conntrack entry is confirmed for one of the packets before
    the other calls get_unique_tuple(). The other packet gets a different reply
    tuple with the source port changed.

The outcomes of the races are the same, one of the packets is dropped when it
comes time to confirm conntrack entries with __nf_conntrack_confirm().

For more information, read the technical blog post: 
https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts

[Fix]

The fix to race 1) was included in 4.19 upstream by this commit:

netfilter: nf_conntrack: resolve clash for matching conntracks
commit ed07d9a021df6da53456663a76999189badc432a
Author: Martynas Pumputis <martynas@weave.works>
Date: Mon Jul 2 16:52:14 2018 +0200 

This commit adds an extra check to see if the two entries are the same and to
merge the entries if they are.

The fix to race 2) was included in 5.0 upstream by this commit:

netfilter: nf_nat: skip nat clash resolution for same-origin entries
commit 4e35c1cb9460240e983a01745b5f29fe3a4d8e39
Author: Martynas Pumputis <martynas@weave.works>
Date: Tue Jan 29 15:51:42 2019 +0100 

This commit ensures that a new source port is not allocated to a duplicated
entry, and forwards the duplicated entries to be resolved later on.

Note that 4e35c1cb9460240e983a01745b5f29fe3a4d8e39 was also backported to stable
releases 4.9.163, 4.14.106, 4.19.29, 4.20.16.

Both commits cherry-pick to 4.15 bionic cleanly. Please cherry-pick both commits
to all bionic kernels.

[Testcase]

The reproducer has been provided by the author of the fixes, and is best viewed
here:

https://github.com/brb/conntrack-race

The dmesg logs and packet traces in the above github repo are best viewed while
reading upstream discussion: 

https://patchwork.ozlabs.org/patch/937963/

I have built a test kernel for xenial HWE, which can be found here:
https://launchpad.net/~mruffell/+archive/ubuntu/sf00227747-test-kernel

The test kernel has been tested with a kubernetes workload and aided with
resolution of kubernetes problems.

[Regression Potential]

As noted previously, 4e35c1cb9460240e983a01745b5f29fe3a4d8e39 has been 
backported to stable releases 4.9.163, 4.14.106, 4.19.29, 4.20.16 and has no
changes to the primary commit. This commit is well tested and considered stable
by the community.

Both commits have also been present in the -azure kernels since 4.15.0-1030.31
and 4.18.0-1004.4, as per LP #1795493, http://bugs.launchpad.net/bugs/1795493
They have received wide testing, and no problems have been reported.

The changes are specifically limited to resolving clashes between duplicate
conntrack entries, and any regressions will be limited to that scenario. The
overall risk of regression is very low.

[Notes]
The commits in the -azure kernels will need to be reverted before importing the
generic bionic updates, since they are the pre-upstream commits, and have had 
several revisions since they were imported. Replacing them with the 
actual upstream commits will make things easier in the future to maintain.

http://bugs.launchpad.net/bugs/1795493

For race 1):
Azure: https://patchwork.ozlabs.org/patch/937963/
Upstream: ed07d9a021df6da53456663a76999189badc432a
Commits are the same.

For race 2):
Azure: https://patchwork.ozlabs.org/patch/952939/
Upstream: 4e35c1cb9460240e983a01745b5f29fe3a4d8e39
Note the difference, old commit must be reverted first.

Martynas Pumputis (2):
  netfilter: nf_conntrack: resolve clash for matching conntracks
  netfilter: nf_nat: skip nat clash resolution for same-origin entries

 net/netfilter/nf_conntrack_core.c | 46 +++++++++++++++++++++++++------
 1 file changed, 38 insertions(+), 8 deletions(-)

Comments

Stefan Bader July 18, 2019, 8:20 a.m. UTC | #1
On 17.07.19 02:30, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/1836816
> 
> [Impact]
> 
> There are a number of races in nf_conntrack when multiple packets are sent from
> the same origin to the same destination at the same time, where there is no
> pre-existing confirmed conntrack entry, in a NAT environment.
> 
> This is most frequently seen in users with kubernetes workloads, where dns
> requests are sent from the same socket at the same time, from different threads.
> One request is for a A record, the other AAAA. A conntrack race happens, which
> leads to one request being dropped, which leads to 5 second dns timeouts.
> 
> This problem is not specific to kubernetes, as any multi-threaded process
> which sends UDP packets from the same socket at the same time from different
> threads is affected.
> 
> Note that since UDP is connectionless, no packet is sent when connect() is
> called, so no conntrack entries are created until a packet is first sent.
> 
> In the scenario where two UDP packets are sent at the same time from the same
> socket on different threads, there are the following possible races:
> 
> 1.) Neither of the packets find a confirmed conntrack entry. This leads to two
>     conntrack entries being created with the same tuples.
> 2.) Same as 1), but a conntrack entry is confirmed for one of the packets before
>     the other calls get_unique_tuple(). The other packet gets a different reply
>     tuple with the source port changed.
> 
> The outcomes of the races are the same, one of the packets is dropped when it
> comes time to confirm conntrack entries with __nf_conntrack_confirm().
> 
> For more information, read the technical blog post: 
> https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts
> 
> [Fix]
> 
> The fix to race 1) was included in 4.19 upstream by this commit:
> 
> netfilter: nf_conntrack: resolve clash for matching conntracks
> commit ed07d9a021df6da53456663a76999189badc432a
> Author: Martynas Pumputis <martynas@weave.works>
> Date: Mon Jul 2 16:52:14 2018 +0200 
> 
> This commit adds an extra check to see if the two entries are the same and to
> merge the entries if they are.
> 
> The fix to race 2) was included in 5.0 upstream by this commit:
> 
> netfilter: nf_nat: skip nat clash resolution for same-origin entries
> commit 4e35c1cb9460240e983a01745b5f29fe3a4d8e39
> Author: Martynas Pumputis <martynas@weave.works>
> Date: Tue Jan 29 15:51:42 2019 +0100 
> 
> This commit ensures that a new source port is not allocated to a duplicated
> entry, and forwards the duplicated entries to be resolved later on.
> 
> Note that 4e35c1cb9460240e983a01745b5f29fe3a4d8e39 was also backported to stable
> releases 4.9.163, 4.14.106, 4.19.29, 4.20.16.
> 
> Both commits cherry-pick to 4.15 bionic cleanly. Please cherry-pick both commits
> to all bionic kernels.
> 
> [Testcase]
> 
> The reproducer has been provided by the author of the fixes, and is best viewed
> here:
> 
> https://github.com/brb/conntrack-race
> 
> The dmesg logs and packet traces in the above github repo are best viewed while
> reading upstream discussion: 
> 
> https://patchwork.ozlabs.org/patch/937963/
> 
> I have built a test kernel for xenial HWE, which can be found here:
> https://launchpad.net/~mruffell/+archive/ubuntu/sf00227747-test-kernel
> 
> The test kernel has been tested with a kubernetes workload and aided with
> resolution of kubernetes problems.
> 
> [Regression Potential]
> 
> As noted previously, 4e35c1cb9460240e983a01745b5f29fe3a4d8e39 has been 
> backported to stable releases 4.9.163, 4.14.106, 4.19.29, 4.20.16 and has no
> changes to the primary commit. This commit is well tested and considered stable
> by the community.
> 
> Both commits have also been present in the -azure kernels since 4.15.0-1030.31
> and 4.18.0-1004.4, as per LP #1795493, http://bugs.launchpad.net/bugs/1795493
> They have received wide testing, and no problems have been reported.
> 
> The changes are specifically limited to resolving clashes between duplicate
> conntrack entries, and any regressions will be limited to that scenario. The
> overall risk of regression is very low.
> 
> [Notes]
> The commits in the -azure kernels will need to be reverted before importing the
> generic bionic updates, since they are the pre-upstream commits, and have had 
> several revisions since they were imported. Replacing them with the 
> actual upstream commits will make things easier in the future to maintain.
> 
> http://bugs.launchpad.net/bugs/1795493
> 
> For race 1):
> Azure: https://patchwork.ozlabs.org/patch/937963/
> Upstream: ed07d9a021df6da53456663a76999189badc432a
> Commits are the same.
> 
> For race 2):
> Azure: https://patchwork.ozlabs.org/patch/952939/
> Upstream: 4e35c1cb9460240e983a01745b5f29fe3a4d8e39
> Note the difference, old commit must be reverted first.
> 
> Martynas Pumputis (2):
>   netfilter: nf_conntrack: resolve clash for matching conntracks
>   netfilter: nf_nat: skip nat clash resolution for same-origin entries
> 
>  net/netfilter/nf_conntrack_core.c | 46 +++++++++++++++++++++++++------
>  1 file changed, 38 insertions(+), 8 deletions(-)
> 
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Connor Kuehl July 19, 2019, 3:54 p.m. UTC | #2
On 7/16/19 5:30 PM, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/1836816
> 
> [Impact]
> 
> There are a number of races in nf_conntrack when multiple packets are sent from
> the same origin to the same destination at the same time, where there is no
> pre-existing confirmed conntrack entry, in a NAT environment.
> 
> This is most frequently seen in users with kubernetes workloads, where dns
> requests are sent from the same socket at the same time, from different threads.
> One request is for a A record, the other AAAA. A conntrack race happens, which
> leads to one request being dropped, which leads to 5 second dns timeouts.
> 
> This problem is not specific to kubernetes, as any multi-threaded process
> which sends UDP packets from the same socket at the same time from different
> threads is affected.
> 
> Note that since UDP is connectionless, no packet is sent when connect() is
> called, so no conntrack entries are created until a packet is first sent.
> 
> In the scenario where two UDP packets are sent at the same time from the same
> socket on different threads, there are the following possible races:
> 
> 1.) Neither of the packets find a confirmed conntrack entry. This leads to two
>     conntrack entries being created with the same tuples.
> 2.) Same as 1), but a conntrack entry is confirmed for one of the packets before
>     the other calls get_unique_tuple(). The other packet gets a different reply
>     tuple with the source port changed.
> 
> The outcomes of the races are the same, one of the packets is dropped when it
> comes time to confirm conntrack entries with __nf_conntrack_confirm().
> 
> For more information, read the technical blog post: 
> https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts
> 
> [Fix]
> 
> The fix to race 1) was included in 4.19 upstream by this commit:
> 
> netfilter: nf_conntrack: resolve clash for matching conntracks
> commit ed07d9a021df6da53456663a76999189badc432a
> Author: Martynas Pumputis <martynas@weave.works>
> Date: Mon Jul 2 16:52:14 2018 +0200 
> 
> This commit adds an extra check to see if the two entries are the same and to
> merge the entries if they are.
> 
> The fix to race 2) was included in 5.0 upstream by this commit:
> 
> netfilter: nf_nat: skip nat clash resolution for same-origin entries
> commit 4e35c1cb9460240e983a01745b5f29fe3a4d8e39
> Author: Martynas Pumputis <martynas@weave.works>
> Date: Tue Jan 29 15:51:42 2019 +0100 
> 
> This commit ensures that a new source port is not allocated to a duplicated
> entry, and forwards the duplicated entries to be resolved later on.
> 
> Note that 4e35c1cb9460240e983a01745b5f29fe3a4d8e39 was also backported to stable
> releases 4.9.163, 4.14.106, 4.19.29, 4.20.16.
> 
> Both commits cherry-pick to 4.15 bionic cleanly. Please cherry-pick both commits
> to all bionic kernels.
> 
> [Testcase]
> 
> The reproducer has been provided by the author of the fixes, and is best viewed
> here:
> 
> https://github.com/brb/conntrack-race
> 
> The dmesg logs and packet traces in the above github repo are best viewed while
> reading upstream discussion: 
> 
> https://patchwork.ozlabs.org/patch/937963/
> 
> I have built a test kernel for xenial HWE, which can be found here:
> https://launchpad.net/~mruffell/+archive/ubuntu/sf00227747-test-kernel
> 
> The test kernel has been tested with a kubernetes workload and aided with
> resolution of kubernetes problems.
> 
> [Regression Potential]
> 
> As noted previously, 4e35c1cb9460240e983a01745b5f29fe3a4d8e39 has been 
> backported to stable releases 4.9.163, 4.14.106, 4.19.29, 4.20.16 and has no
> changes to the primary commit. This commit is well tested and considered stable
> by the community.
> 
> Both commits have also been present in the -azure kernels since 4.15.0-1030.31
> and 4.18.0-1004.4, as per LP #1795493, http://bugs.launchpad.net/bugs/1795493
> They have received wide testing, and no problems have been reported.
> 
> The changes are specifically limited to resolving clashes between duplicate
> conntrack entries, and any regressions will be limited to that scenario. The
> overall risk of regression is very low.
> 
> [Notes]
> The commits in the -azure kernels will need to be reverted before importing the
> generic bionic updates, since they are the pre-upstream commits, and have had 
> several revisions since they were imported. Replacing them with the 
> actual upstream commits will make things easier in the future to maintain.
> 
> http://bugs.launchpad.net/bugs/1795493
> 
> For race 1):
> Azure: https://patchwork.ozlabs.org/patch/937963/
> Upstream: ed07d9a021df6da53456663a76999189badc432a
> Commits are the same.
> 
> For race 2):
> Azure: https://patchwork.ozlabs.org/patch/952939/
> Upstream: 4e35c1cb9460240e983a01745b5f29fe3a4d8e39
> Note the difference, old commit must be reverted first.
> 
> Martynas Pumputis (2):
>   netfilter: nf_conntrack: resolve clash for matching conntracks
>   netfilter: nf_nat: skip nat clash resolution for same-origin entries
> 
>  net/netfilter/nf_conntrack_core.c | 46 +++++++++++++++++++++++++------
>  1 file changed, 38 insertions(+), 8 deletions(-)
> 

Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
Khalid Elmously July 23, 2019, 5:23 a.m. UTC | #3
On 2019-07-17 12:30:39 , Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/1836816
> 
> [Impact]
> 
> There are a number of races in nf_conntrack when multiple packets are sent from
> the same origin to the same destination at the same time, where there is no
> pre-existing confirmed conntrack entry, in a NAT environment.
> 
> This is most frequently seen in users with kubernetes workloads, where dns
> requests are sent from the same socket at the same time, from different threads.
> One request is for a A record, the other AAAA. A conntrack race happens, which
> leads to one request being dropped, which leads to 5 second dns timeouts.
> 
> This problem is not specific to kubernetes, as any multi-threaded process
> which sends UDP packets from the same socket at the same time from different
> threads is affected.
> 
> Note that since UDP is connectionless, no packet is sent when connect() is
> called, so no conntrack entries are created until a packet is first sent.
> 
> In the scenario where two UDP packets are sent at the same time from the same
> socket on different threads, there are the following possible races:
> 
> 1.) Neither of the packets find a confirmed conntrack entry. This leads to two
>     conntrack entries being created with the same tuples.
> 2.) Same as 1), but a conntrack entry is confirmed for one of the packets before
>     the other calls get_unique_tuple(). The other packet gets a different reply
>     tuple with the source port changed.
> 
> The outcomes of the races are the same, one of the packets is dropped when it
> comes time to confirm conntrack entries with __nf_conntrack_confirm().
> 
> For more information, read the technical blog post: 
> https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts
> 
> [Fix]
> 
> The fix to race 1) was included in 4.19 upstream by this commit:
> 
> netfilter: nf_conntrack: resolve clash for matching conntracks
> commit ed07d9a021df6da53456663a76999189badc432a
> Author: Martynas Pumputis <martynas@weave.works>
> Date: Mon Jul 2 16:52:14 2018 +0200 
> 
> This commit adds an extra check to see if the two entries are the same and to
> merge the entries if they are.
> 
> The fix to race 2) was included in 5.0 upstream by this commit:
> 
> netfilter: nf_nat: skip nat clash resolution for same-origin entries
> commit 4e35c1cb9460240e983a01745b5f29fe3a4d8e39
> Author: Martynas Pumputis <martynas@weave.works>
> Date: Tue Jan 29 15:51:42 2019 +0100 
> 
> This commit ensures that a new source port is not allocated to a duplicated
> entry, and forwards the duplicated entries to be resolved later on.
> 
> Note that 4e35c1cb9460240e983a01745b5f29fe3a4d8e39 was also backported to stable
> releases 4.9.163, 4.14.106, 4.19.29, 4.20.16.
> 
> Both commits cherry-pick to 4.15 bionic cleanly. Please cherry-pick both commits
> to all bionic kernels.
> 
> [Testcase]
> 
> The reproducer has been provided by the author of the fixes, and is best viewed
> here:
> 
> https://github.com/brb/conntrack-race
> 
> The dmesg logs and packet traces in the above github repo are best viewed while
> reading upstream discussion: 
> 
> https://patchwork.ozlabs.org/patch/937963/
> 
> I have built a test kernel for xenial HWE, which can be found here:
> https://launchpad.net/~mruffell/+archive/ubuntu/sf00227747-test-kernel
> 
> The test kernel has been tested with a kubernetes workload and aided with
> resolution of kubernetes problems.
> 
> [Regression Potential]
> 
> As noted previously, 4e35c1cb9460240e983a01745b5f29fe3a4d8e39 has been 
> backported to stable releases 4.9.163, 4.14.106, 4.19.29, 4.20.16 and has no
> changes to the primary commit. This commit is well tested and considered stable
> by the community.
> 
> Both commits have also been present in the -azure kernels since 4.15.0-1030.31
> and 4.18.0-1004.4, as per LP #1795493, http://bugs.launchpad.net/bugs/1795493
> They have received wide testing, and no problems have been reported.
> 
> The changes are specifically limited to resolving clashes between duplicate
> conntrack entries, and any regressions will be limited to that scenario. The
> overall risk of regression is very low.
> 
> [Notes]
> The commits in the -azure kernels will need to be reverted before importing the
> generic bionic updates, since they are the pre-upstream commits, and have had 
> several revisions since they were imported. Replacing them with the 
> actual upstream commits will make things easier in the future to maintain.
> 
> http://bugs.launchpad.net/bugs/1795493
> 
> For race 1):
> Azure: https://patchwork.ozlabs.org/patch/937963/
> Upstream: ed07d9a021df6da53456663a76999189badc432a
> Commits are the same.
> 
> For race 2):
> Azure: https://patchwork.ozlabs.org/patch/952939/
> Upstream: 4e35c1cb9460240e983a01745b5f29fe3a4d8e39
> Note the difference, old commit must be reverted first.
> 
> Martynas Pumputis (2):
>   netfilter: nf_conntrack: resolve clash for matching conntracks
>   netfilter: nf_nat: skip nat clash resolution for same-origin entries
> 
>  net/netfilter/nf_conntrack_core.c | 46 +++++++++++++++++++++++++------
>  1 file changed, 38 insertions(+), 8 deletions(-)
> 
> -- 
> 2.20.1
> 
> 
> -- 
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team