Message ID | 20191022231051.30770-1-xiyou.wangcong@gmail.com
Series | tcp: decouple TLP timer from RTO timer
From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Tue, 22 Oct 2019 16:10:48 -0700

> This patchset contains 3 patches: patch 1 is a cleanup,
> patch 2 is a small change preparing for patch 3, and patch 3 is the
> one that does the actual change. Please find details in each of them.

Eric, have you had a chance to test this on a system with
suitable CPU arity?
On Mon, Oct 28, 2019 at 11:29 AM David Miller <davem@davemloft.net> wrote:
>
> From: Cong Wang <xiyou.wangcong@gmail.com>
> Date: Tue, 22 Oct 2019 16:10:48 -0700
>
> > This patchset contains 3 patches: patch 1 is a cleanup,
> > patch 2 is a small change preparing for patch 3, and patch 3 is the
> > one that does the actual change. Please find details in each of them.
>
> Eric, have you had a chance to test this on a system with
> suitable CPU arity?

Yes, and I confirm I could not repro the issues at all.

I have a 100Gbit NIC; trying to increase the pressure a bit,
driving this NIC at line rate was only using 2% of my 96-CPU host,
with no spinlock contention of any sort.

Thanks.
On Mon, Oct 28, 2019 at 11:34 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Mon, Oct 28, 2019 at 11:29 AM David Miller <davem@davemloft.net> wrote:
> >
> > From: Cong Wang <xiyou.wangcong@gmail.com>
> > Date: Tue, 22 Oct 2019 16:10:48 -0700
> >
> > > This patchset contains 3 patches: patch 1 is a cleanup,
> > > patch 2 is a small change preparing for patch 3, and patch 3 is the
> > > one that does the actual change. Please find details in each of them.
> >
> > Eric, have you had a chance to test this on a system with
> > suitable CPU arity?
>
> Yes, and I confirm I could not repro the issues at all.
>
> I have a 100Gbit NIC; trying to increase the pressure a bit,
> driving this NIC at line rate was only using 2% of my 96-CPU host,
> with no spinlock contention of any sort.

Please let me know if there is anything else I can provide to help
you make the decision.

All I can say so far is that this only happens on our hosts with
128 AMD CPUs. I don't see anything here related to AMD, so I think
only the number of CPUs (vs. the number of TX queues?) matters.

Thanks.
On Mon, Oct 28, 2019 at 1:13 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Mon, Oct 28, 2019 at 11:34 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Mon, Oct 28, 2019 at 11:29 AM David Miller <davem@davemloft.net> wrote:
> > >
> > > From: Cong Wang <xiyou.wangcong@gmail.com>
> > > Date: Tue, 22 Oct 2019 16:10:48 -0700
> > >
> > > > This patchset contains 3 patches: patch 1 is a cleanup,
> > > > patch 2 is a small change preparing for patch 3, and patch 3 is the
> > > > one that does the actual change. Please find details in each of them.
> > >
> > > Eric, have you had a chance to test this on a system with
> > > suitable CPU arity?
> >
> > Yes, and I confirm I could not repro the issues at all.
> >
> > I have a 100Gbit NIC; trying to increase the pressure a bit,
> > driving this NIC at line rate was only using 2% of my 96-CPU host,
> > with no spinlock contention of any sort.
>
> Please let me know if there is anything else I can provide to help
> you make the decision.
>
> All I can say so far is that this only happens on our hosts with
> 128 AMD CPUs. I don't see anything here related to AMD, so I think
> only the number of CPUs (vs. the number of TX queues?) matters.

I also have AMD hosts with 256 CPUs; I can try them later (not today,
I am too busy).

But I feel you are trying to work around a more fundamental issue if
this problem only shows up on AMD hosts.
On Mon, Oct 28, 2019 at 1:31 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Mon, Oct 28, 2019 at 1:13 PM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > On Mon, Oct 28, 2019 at 11:34 AM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Mon, Oct 28, 2019 at 11:29 AM David Miller <davem@davemloft.net> wrote:
> > > >
> > > > From: Cong Wang <xiyou.wangcong@gmail.com>
> > > > Date: Tue, 22 Oct 2019 16:10:48 -0700
> > > >
> > > > > This patchset contains 3 patches: patch 1 is a cleanup,
> > > > > patch 2 is a small change preparing for patch 3, and patch 3 is the
> > > > > one that does the actual change. Please find details in each of them.
> > > >
> > > > Eric, have you had a chance to test this on a system with
> > > > suitable CPU arity?
> > >
> > > Yes, and I confirm I could not repro the issues at all.
> > >
> > > I have a 100Gbit NIC; trying to increase the pressure a bit,
> > > driving this NIC at line rate was only using 2% of my 96-CPU host,
> > > with no spinlock contention of any sort.
> >
> > Please let me know if there is anything else I can provide to help
> > you make the decision.
> >
> > All I can say so far is that this only happens on our hosts with
> > 128 AMD CPUs. I don't see anything here related to AMD, so I think
> > only the number of CPUs (vs. the number of TX queues?) matters.
>
> I also have AMD hosts with 256 CPUs; I can try them later (not today,
> I am too busy).
>
> But I feel you are trying to work around a more fundamental issue if
> this problem only shows up on AMD hosts.

I wish I had Intel hosts with the same number of CPUs, but I don't;
all the Intel ones have fewer, probably 80 at most. This is why I
think it is related to the number of CPUs.

Also, the IOMMU is turned off explicitly; I don't see anything else
that could be AMD-specific along the TCP path.

Thanks.
From: Cong Wang <xiyou.wangcong@gmail.com>
Date: Tue, 22 Oct 2019 16:10:48 -0700

> This patchset contains 3 patches: patch 1 is a cleanup,
> patch 2 is a small change preparing for patch 3, and patch 3 is the
> one that does the actual change. Please find details in each of them.

I'm marking this deferred until someone can drill down into why this is
only seen in such a specific configuration, and not to ANY EXTENT
whatsoever with just a slightly lower number of CPUs on other machines.

It's really hard to justify this set of changes without a full
understanding and detailed analysis.

Thanks.