Message ID | 20190717003041.9775-1-matthew.ruffell@canonical.com |
---|---|
Headers | show |
Series | Fix nf_conntrack races when dealing with same origin requests in NAT environments | expand |
On 17.07.19 02:30, Matthew Ruffell wrote: > BugLink: https://bugs.launchpad.net/bugs/1836816 > > [Impact] > > There are a number of races in nf_conntrack when multiple packets are sent from > the same origin to the same destination at the same time, where there is no > pre-existing confirmed conntrack entry, in a NAT environment. > > This is most frequently seen in users with kubernetes workloads, where dns > requests are sent from the same socket at the same time, from different threads. > One request is for a A record, the other AAAA. A conntrack race happens, which > leads to one request being dropped, which leads to 5 second dns timeouts. > > This problem is not specific to kubernetes, as any multi-threaded process > which sends UDP packets from the same socket at the same time from different > threads is affected. > > Note that since UDP is connectionless, no packet is sent when connect() is > called, so no conntrack entries are created until a packet is first sent. > > In the scenario where two UDP packets are sent at the same time from the same > socket on different threads, there are the following possible races: > > 1.) Neither of the packets find a confirmed conntrack entry. This leads to two > conntrack entries being created with the same tuples. > 2.) Same as 1), but a conntrack entry is confirmed for one of the packets before > the other calls get_unique_tuple(). The other packet gets a different reply > tuple with the source port changed. > > The outcomes of the races are the same, one of the packets is dropped when it > comes time to confirm conntrack entries with __nf_conntrack_confirm(). > > For more information, read the technical blog post: > https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts > > [Fix] > > The fix to race 1) was included in 4.19 upstream by this commit: > > netfilter: nf_conntrack: resolve clash for matching conntracks > commit ed07d9a021df6da53456663a76999189badc432a > Author: Martynas Pumputis <martynas@weave.works> > Date: Mon Jul 2 16:52:14 2018 +0200 > > This commit adds an extra check to see if the two entries are the same and to > merge the entries if they are. > > The fix to race 2) was included in 5.0 upstream by this commit: > > netfilter: nf_nat: skip nat clash resolution for same-origin entries > commit 4e35c1cb9460240e983a01745b5f29fe3a4d8e39 > Author: Martynas Pumputis <martynas@weave.works> > Date: Tue Jan 29 15:51:42 2019 +0100 > > This commit ensures that a new source port is not allocated to a duplicated > entry, and forwards the duplicated entries to be resolved later on. > > Note that 4e35c1cb9460240e983a01745b5f29fe3a4d8e39 was also backported to stable > releases 4.9.163, 4.14.106, 4.19.29, 4.20.16. > > Both commits cherry-pick to 4.15 bionic cleanly. Please cherry-pick both commits > to all bionic kernels. > > [Testcase] > > The reproducer has been provided by the author of the fixes, and is best viewed > here: > > https://github.com/brb/conntrack-race > > The dmesg logs and packet traces in the above github repo are best viewed while > reading upstream discussion: > > https://patchwork.ozlabs.org/patch/937963/ > > I have built a test kernel for xenial HWE, which can be found here: > https://launchpad.net/~mruffell/+archive/ubuntu/sf00227747-test-kernel > > The test kernel has been tested with a kubernetes workload and aided with > resolution of kubernetes problems. > > [Regression Potential] > > As noted previously, 4e35c1cb9460240e983a01745b5f29fe3a4d8e39 has been > backported to stable releases 4.9.163, 4.14.106, 4.19.29, 4.20.16 and has no > changes to the primary commit. This commit is well tested and considered stable > by the community. > > Both commits have also been present in the -azure kernels since 4.15.0-1030.31 > and 4.18.0-1004.4, as per LP #1795493, http://bugs.launchpad.net/bugs/1795493 > They have received wide testing, and no problems have been reported. > > The changes are specifically limited to resolving clashes between duplicate > conntrack entries, and any regressions will be limited to that scenario. The > overall risk of regression is very low. > > [Notes] > The commits in the -azure kernels will need to be reverted before importing the > generic bionic updates, since they are the pre-upstream commits, and have had > several revisions since they were imported. Replacing them with the > actual upstream commits will make things easier in the future to maintain. > > http://bugs.launchpad.net/bugs/1795493 > > For race 1): > Azure: https://patchwork.ozlabs.org/patch/937963/ > Upstream: ed07d9a021df6da53456663a76999189badc432a > Commits are the same. > > For race 2): > Azure: https://patchwork.ozlabs.org/patch/952939/ > Upstream: 4e35c1cb9460240e983a01745b5f29fe3a4d8e39 > Note the difference, old commit must be reverted first. > > Martynas Pumputis (2): > netfilter: nf_conntrack: resolve clash for matching conntracks > netfilter: nf_nat: skip nat clash resolution for same-origin entries > > net/netfilter/nf_conntrack_core.c | 46 +++++++++++++++++++++++++------ > 1 file changed, 38 insertions(+), 8 deletions(-) > Acked-by: Stefan Bader <stefan.bader@canonical.com>
On 7/16/19 5:30 PM, Matthew Ruffell wrote: > BugLink: https://bugs.launchpad.net/bugs/1836816 > > [Impact] > > There are a number of races in nf_conntrack when multiple packets are sent from > the same origin to the same destination at the same time, where there is no > pre-existing confirmed conntrack entry, in a NAT environment. > > This is most frequently seen in users with kubernetes workloads, where dns > requests are sent from the same socket at the same time, from different threads. > One request is for a A record, the other AAAA. A conntrack race happens, which > leads to one request being dropped, which leads to 5 second dns timeouts. > > This problem is not specific to kubernetes, as any multi-threaded process > which sends UDP packets from the same socket at the same time from different > threads is affected. > > Note that since UDP is connectionless, no packet is sent when connect() is > called, so no conntrack entries are created until a packet is first sent. > > In the scenario where two UDP packets are sent at the same time from the same > socket on different threads, there are the following possible races: > > 1.) Neither of the packets find a confirmed conntrack entry. This leads to two > conntrack entries being created with the same tuples. > 2.) Same as 1), but a conntrack entry is confirmed for one of the packets before > the other calls get_unique_tuple(). The other packet gets a different reply > tuple with the source port changed. > > The outcomes of the races are the same, one of the packets is dropped when it > comes time to confirm conntrack entries with __nf_conntrack_confirm(). > > For more information, read the technical blog post: > https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts > > [Fix] > > The fix to race 1) was included in 4.19 upstream by this commit: > > netfilter: nf_conntrack: resolve clash for matching conntracks > commit ed07d9a021df6da53456663a76999189badc432a > Author: Martynas Pumputis <martynas@weave.works> > Date: Mon Jul 2 16:52:14 2018 +0200 > > This commit adds an extra check to see if the two entries are the same and to > merge the entries if they are. > > The fix to race 2) was included in 5.0 upstream by this commit: > > netfilter: nf_nat: skip nat clash resolution for same-origin entries > commit 4e35c1cb9460240e983a01745b5f29fe3a4d8e39 > Author: Martynas Pumputis <martynas@weave.works> > Date: Tue Jan 29 15:51:42 2019 +0100 > > This commit ensures that a new source port is not allocated to a duplicated > entry, and forwards the duplicated entries to be resolved later on. > > Note that 4e35c1cb9460240e983a01745b5f29fe3a4d8e39 was also backported to stable > releases 4.9.163, 4.14.106, 4.19.29, 4.20.16. > > Both commits cherry-pick to 4.15 bionic cleanly. Please cherry-pick both commits > to all bionic kernels. > > [Testcase] > > The reproducer has been provided by the author of the fixes, and is best viewed > here: > > https://github.com/brb/conntrack-race > > The dmesg logs and packet traces in the above github repo are best viewed while > reading upstream discussion: > > https://patchwork.ozlabs.org/patch/937963/ > > I have built a test kernel for xenial HWE, which can be found here: > https://launchpad.net/~mruffell/+archive/ubuntu/sf00227747-test-kernel > > The test kernel has been tested with a kubernetes workload and aided with > resolution of kubernetes problems. > > [Regression Potential] > > As noted previously, 4e35c1cb9460240e983a01745b5f29fe3a4d8e39 has been > backported to stable releases 4.9.163, 4.14.106, 4.19.29, 4.20.16 and has no > changes to the primary commit. This commit is well tested and considered stable > by the community. > > Both commits have also been present in the -azure kernels since 4.15.0-1030.31 > and 4.18.0-1004.4, as per LP #1795493, http://bugs.launchpad.net/bugs/1795493 > They have received wide testing, and no problems have been reported. > > The changes are specifically limited to resolving clashes between duplicate > conntrack entries, and any regressions will be limited to that scenario. The > overall risk of regression is very low. > > [Notes] > The commits in the -azure kernels will need to be reverted before importing the > generic bionic updates, since they are the pre-upstream commits, and have had > several revisions since they were imported. Replacing them with the > actual upstream commits will make things easier in the future to maintain. > > http://bugs.launchpad.net/bugs/1795493 > > For race 1): > Azure: https://patchwork.ozlabs.org/patch/937963/ > Upstream: ed07d9a021df6da53456663a76999189badc432a > Commits are the same. > > For race 2): > Azure: https://patchwork.ozlabs.org/patch/952939/ > Upstream: 4e35c1cb9460240e983a01745b5f29fe3a4d8e39 > Note the difference, old commit must be reverted first. > > Martynas Pumputis (2): > netfilter: nf_conntrack: resolve clash for matching conntracks > netfilter: nf_nat: skip nat clash resolution for same-origin entries > > net/netfilter/nf_conntrack_core.c | 46 +++++++++++++++++++++++++------ > 1 file changed, 38 insertions(+), 8 deletions(-) > Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
On 2019-07-17 12:30:39 , Matthew Ruffell wrote: > BugLink: https://bugs.launchpad.net/bugs/1836816 > > [Impact] > > There are a number of races in nf_conntrack when multiple packets are sent from > the same origin to the same destination at the same time, where there is no > pre-existing confirmed conntrack entry, in a NAT environment. > > This is most frequently seen in users with kubernetes workloads, where dns > requests are sent from the same socket at the same time, from different threads. > One request is for a A record, the other AAAA. A conntrack race happens, which > leads to one request being dropped, which leads to 5 second dns timeouts. > > This problem is not specific to kubernetes, as any multi-threaded process > which sends UDP packets from the same socket at the same time from different > threads is affected. > > Note that since UDP is connectionless, no packet is sent when connect() is > called, so no conntrack entries are created until a packet is first sent. > > In the scenario where two UDP packets are sent at the same time from the same > socket on different threads, there are the following possible races: > > 1.) Neither of the packets find a confirmed conntrack entry. This leads to two > conntrack entries being created with the same tuples. > 2.) Same as 1), but a conntrack entry is confirmed for one of the packets before > the other calls get_unique_tuple(). The other packet gets a different reply > tuple with the source port changed. > > The outcomes of the races are the same, one of the packets is dropped when it > comes time to confirm conntrack entries with __nf_conntrack_confirm(). > > For more information, read the technical blog post: > https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts > > [Fix] > > The fix to race 1) was included in 4.19 upstream by this commit: > > netfilter: nf_conntrack: resolve clash for matching conntracks > commit ed07d9a021df6da53456663a76999189badc432a > Author: Martynas Pumputis <martynas@weave.works> > Date: Mon Jul 2 16:52:14 2018 +0200 > > This commit adds an extra check to see if the two entries are the same and to > merge the entries if they are. > > The fix to race 2) was included in 5.0 upstream by this commit: > > netfilter: nf_nat: skip nat clash resolution for same-origin entries > commit 4e35c1cb9460240e983a01745b5f29fe3a4d8e39 > Author: Martynas Pumputis <martynas@weave.works> > Date: Tue Jan 29 15:51:42 2019 +0100 > > This commit ensures that a new source port is not allocated to a duplicated > entry, and forwards the duplicated entries to be resolved later on. > > Note that 4e35c1cb9460240e983a01745b5f29fe3a4d8e39 was also backported to stable > releases 4.9.163, 4.14.106, 4.19.29, 4.20.16. > > Both commits cherry-pick to 4.15 bionic cleanly. Please cherry-pick both commits > to all bionic kernels. > > [Testcase] > > The reproducer has been provided by the author of the fixes, and is best viewed > here: > > https://github.com/brb/conntrack-race > > The dmesg logs and packet traces in the above github repo are best viewed while > reading upstream discussion: > > https://patchwork.ozlabs.org/patch/937963/ > > I have built a test kernel for xenial HWE, which can be found here: > https://launchpad.net/~mruffell/+archive/ubuntu/sf00227747-test-kernel > > The test kernel has been tested with a kubernetes workload and aided with > resolution of kubernetes problems. > > [Regression Potential] > > As noted previously, 4e35c1cb9460240e983a01745b5f29fe3a4d8e39 has been > backported to stable releases 4.9.163, 4.14.106, 4.19.29, 4.20.16 and has no > changes to the primary commit. This commit is well tested and considered stable > by the community. > > Both commits have also been present in the -azure kernels since 4.15.0-1030.31 > and 4.18.0-1004.4, as per LP #1795493, http://bugs.launchpad.net/bugs/1795493 > They have received wide testing, and no problems have been reported. > > The changes are specifically limited to resolving clashes between duplicate > conntrack entries, and any regressions will be limited to that scenario. The > overall risk of regression is very low. > > [Notes] > The commits in the -azure kernels will need to be reverted before importing the > generic bionic updates, since they are the pre-upstream commits, and have had > several revisions since they were imported. Replacing them with the > actual upstream commits will make things easier in the future to maintain. > > http://bugs.launchpad.net/bugs/1795493 > > For race 1): > Azure: https://patchwork.ozlabs.org/patch/937963/ > Upstream: ed07d9a021df6da53456663a76999189badc432a > Commits are the same. > > For race 2): > Azure: https://patchwork.ozlabs.org/patch/952939/ > Upstream: 4e35c1cb9460240e983a01745b5f29fe3a4d8e39 > Note the difference, old commit must be reverted first. > > Martynas Pumputis (2): > netfilter: nf_conntrack: resolve clash for matching conntracks > netfilter: nf_nat: skip nat clash resolution for same-origin entries > > net/netfilter/nf_conntrack_core.c | 46 +++++++++++++++++++++++++------ > 1 file changed, 38 insertions(+), 8 deletions(-) > > -- > 2.20.1 > > > -- > kernel-team mailing list > kernel-team@lists.ubuntu.com > https://lists.ubuntu.com/mailman/listinfo/kernel-team