Message ID | 20210114221243.21614-1-matthew.ruffell@canonical.com |
---|---|
Series | qede: Kubernetes Internal DNS Failure due to QL41xxx NIC not supporting IPIP tx csum offload. |
On Fri, Jan 15, 2021 at 11:12:42AM +1300, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/1909062
>
> [Impact]
>
> For users with QLogic QL41xxx series NICs, such as the FastLinQ QL41000 Series
> 10/25/40/50GbE Controller, when they upgrade from the 4.15 kernel to the 5.4
> kernel, Kubernetes Internal DNS requests will fail, due to these packets getting
> corrupted.
>
> Kubernetes uses IPIP tunnelled packets for internal DNS resolution, and this
> particular packet type is not supported for hardware tx checksum offload, and
> the packets end up corrupted when the qede driver attempts to checksum them.
>
> This only affects internal Kubernetes DNS, as regular DNS lookups to regular
> external domains will succeed, due to them not using IPIP packet types.
>
> [Fix]
>
> Marvell has developed a fix for the qede driver, which checks the packet type,
> and if it is IPPROTO_IPIP, then csum offloads are disabled for socket buffers
> of type IPIP.
>
> commit 5d5647dad259bb416fd5d3d87012760386d97530
> Author: Manish Chopra <manishc@marvell.com>
> Date: Mon Dec 21 06:55:30 2020 -0800
> Subject: qede: fix offload for IPIP tunnel packets
> Link: https://github.com/torvalds/linux/commit/5d5647dad259bb416fd5d3d87012760386d97530
>
> This commit landed in mainline in 5.11-rc3. The commit was accepted into upstream
> stable 4.14.215, 4.19.167, 5.4.89 and 5.10.7.
>
> Note, this SRU isn't targeted for Bionic due to tx csum offload support only
> landing in 5.0 and onward, meaning the 4.15 kernel still works even without this
> patch. Because of this, Bionic can pick the patch up naturally from upstream
> stable.
>
> [Testcase]
>
> The system must have a QLogic QL41xxx series NIC fitted, and needs to be a part
> of a Kubernetes cluster.
>
> Firstly, get a list of all devices in the system:
>
> $ sudo ifconfig
>
> Next, set all devices down with:
>
> $ sudo ifconfig <device> down
>
> Next, bring up the QLogic QL41xxx device:
>
> $ sudo ifconfig <qlogic nic device> up
>
> Then, attempt to lookup an internal Kubernetes domain:
>
> $ nslookup <internal kubernetes domain address>
>
> Without the patch, the connection will time out:
>
> ;; connection timed out; no servers could be reached
>
> If we look at packet traces with tcpdump, we see it leaves the source, but never
> arrives at the destination.
>
> There is a test kernel available in the following ppa:
>
> https://launchpad.net/~mruffell/+archive/ubuntu/sf297772-test
>
> If you install it, then Kubernetes internal DNS lookups will succeed.
>
> [Where problems could occur]
>
> If a regression were to occur, then users of the qede driver would be affected.
> This is limited to those with QLogic QL41xxx series NICs. The patch explicitly
> checks for IPIP type packets, so only those particular packets would be affected.
>
> Since IPIP type packets are uncommon, it would not cause a total outage on
> regression, since most packets are not IPIP tunnelled. It could potentially cause
> problems for users who frequently handle VPN or Kubernetes internal DNS traffic.
>
> A workaround would be to use ethtool to disable tx csum offload for all packet
> types, or to revert to an older kernel.
>
> Manish Chopra (1):
> qede: fix offload for IPIP tunnel packets
>
> drivers/net/ethernet/qlogic/qede/qede_fp.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> --
> 2.27.0
>
>
> --
> kernel-team mailing list
> kernel-team@lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team

Acked-by: William Breathitt Gray <william.gray@canonical.com>
Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com>
Applied to F/G master-next. thank you!

-Kelsey
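
The ethtool workaround mentioned in the cover letter can be sketched as a small shell helper. This is a minimal sketch and not part of the patch: the function name `disable_tx_csum` and the `DRY_RUN` variable are hypothetical, and the interface name is whatever the qede-backed device is called on the affected host. It relies only on `ethtool -K <iface> tx off`, which turns off tx checksumming for all packet types on that interface.

```shell
# Sketch of the "disable tx csum offload" workaround from the cover letter.
# Assumptions (not from the thread): the helper name, the DRY_RUN switch,
# and the interface name passed in by the caller.

disable_tx_csum() {
    iface="$1"

    # Refuse to run without an interface name.
    if [ -z "$iface" ]; then
        echo "usage: disable_tx_csum <interface>" >&2
        return 1
    fi

    # With DRY_RUN=1, only print the command that would be executed.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "ethtool -K $iface tx off"
        return 0
    fi

    # Turn off hardware tx checksumming; the kernel then checksums in software.
    # Needs root privileges and the ethtool utility.
    ethtool -K "$iface" tx off
}
```

Setting `DRY_RUN=1` before calling `disable_tx_csum <iface>` prints the command without touching the NIC, and `ethtool -k <iface>` shows the resulting offload state. The setting is lost on reboot, so this is only a stopgap until a kernel carrying the fix is installed.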