Message ID | 20200710115313.18319-1-ross.lagerwall@citrix.com
State      | Changes Requested
Series     | [ovs-dev] vswitchd: Pass MCL_ONFAULT to mlockall
On 7/10/2020 4:53 AM, Ross Lagerwall wrote:
> mlockall locks thread stack pages into memory, even pages which have not
> yet been demand-paged. As vswitchd has a lot of threads and the default
> stack size on x86_64 is 8 MiB, this consumes a lot of memory. On two
> systems I looked at, vswitchd used ~150 MiB of RSS when idle after
> startup.
>
> Use the new MCL_ONFAULT flag to only lock pages into memory once they
> have been demand-paged in. This still satisfies the requirement that
> vswitchd is not swapped out but frees up ~144 MiB of unswappable memory
> (18 threads x 8 MiB). After this, vswitchd uses ~6 MiB when idle after
> startup.
>
> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>

Have you done any performance measurements? This seems like it may have
some impact.

Thanks,
- Greg

> ---
>  vswitchd/ovs-vswitchd.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
> index 1e72b628b..e4d482521 100644
> --- a/vswitchd/ovs-vswitchd.c
> +++ b/vswitchd/ovs-vswitchd.c
> @@ -92,7 +92,10 @@ main(int argc, char *argv[])
>
>      if (want_mlockall) {
>  #ifdef HAVE_MLOCKALL
> -        if (mlockall(MCL_CURRENT | MCL_FUTURE)) {
> +#ifndef MCL_ONFAULT
> +#define MCL_ONFAULT 0
> +#endif
> +        if (mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT)) {
>              VLOG_ERR("mlockall failed: %s", ovs_strerror(errno));
>          } else {
>              set_memory_locked();
On Fri, Jul 10, 2020 at 12:53:13PM +0100, Ross Lagerwall wrote:
> mlockall locks thread stack pages into memory, even pages which have not
> yet been demand-paged. As vswitchd has a lot of threads and the default
> stack size on x86_64 is 8 MiB, this consumes a lot of memory. On two
> systems I looked at, vswitchd used ~150 MiB of RSS when idle after
> startup.
>
> Use the new MCL_ONFAULT flag to only lock pages into memory once they
> have been demand-paged in. This still satisfies the requirement that
> vswitchd is not swapped out but frees up ~144 MiB of unswappable memory
> (18 threads x 8 MiB). After this, vswitchd uses ~6 MiB when idle after
> startup.

The problem with this approach is that when using the userspace datapath
those page faults can introduce scheduling points, which introduce
jitter and impact performance.

Alternatively you can try to reduce the default stack size as done by
this commit:

    commit b82a90e266e1246fe2973db97c95df22558174ea
    Author: Flavio Leitner <fbl@sysclose.org>
    Date:   Thu Feb 28 13:13:57 2019 -0300

        rhel: limit stack size to 2M.

Thanks,
fbl

> Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
> ---
>  vswitchd/ovs-vswitchd.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
[...]
On Fri, Jul 10, 2020 at 12:12:57PM -0300, Flavio Leitner wrote:
> On Fri, Jul 10, 2020 at 12:53:13PM +0100, Ross Lagerwall wrote:
> > mlockall locks thread stack pages into memory, even pages which have not
> > yet been demand-paged. As vswitchd has a lot of threads and the default
> > stack size on x86_64 is 8 MiB, this consumes a lot of memory.
[...]
>
> The problem with this approach is that when using userspace datapath
> those page faults can introduce scheduling points which introduces
> jitter and impacts performance.

I like this idea of using MCL_ONFAULT.
IIUC, the performance impact only happens the first time the page fault
occurs, and afterward it should have the same performance.

William
On Mon, Jul 13, 2020 at 02:49:50PM -0700, William Tu wrote:
> On Fri, Jul 10, 2020 at 12:12:57PM -0300, Flavio Leitner wrote:
> > On Fri, Jul 10, 2020 at 12:53:13PM +0100, Ross Lagerwall wrote:
[...]
> > The problem with this approach is that when using userspace datapath
> > those page faults can introduce scheduling points which introduces
> > jitter and impacts performance.
>
> I like this idea of using MCL_ONFAULT.
> IIUC, the performance impact only happens first time when page fault
> happens, and afterward, it should have the same performance.

Yes, there is a performance impact only when the page fault happens.

The issue is that we can't tell when it is going to happen. Some page
faults will happen at initialization, which is fine, but some might
happen when there is production traffic.

The thread is preempted and all the ongoing traffic is impacted when
that page fault happens. Therefore, this has the potential of
affecting "zero" packet loss, for instance, which is important to
Telco deployments.

It seems we have two relevant use cases here with conflicting
requirements. I don't think we can just change the current behavior
without announcing it first, and the current behavior is still
desired by those who don't want surprises with page faults.

Perhaps we can leave --mlockall as is to avoid possible regressions,
and add another way to express the on-demand use case?

Thanks,
fbl
On Tue, Jul 14, 2020 at 6:32 AM Flavio Leitner <fbl@sysclose.org> wrote:
> On Mon, Jul 13, 2020 at 02:49:50PM -0700, William Tu wrote:
[...]
> Yes, there is a performance impact only when the page fault happens.
>
> The issue is that we can't tell when it is going to happen. Some
> page faults will happen at the initialization, which is fine, but
> some might happen when there is production traffic.
>
> The thread is preempted and all the ongoing traffic is impacted
> when that page fault happens. Therefore, this has a potential
> of affecting "zero" packet loss, for instance, which is important
> to Telco deployments.

I see, that makes sense, thanks for the detail.

> It seems we have two relevant use cases here with conflicting
> requirements. I don't think we can just change current behavior
> without announcing first and still the current behavior is desired
> for those who don't want surprises with page faults.
>
> Perhaps we can leave --mlockall as is to avoid possible regressions,
> and add another way to express the on demand use case?

Agree.

Thanks,
William
diff --git a/vswitchd/ovs-vswitchd.c b/vswitchd/ovs-vswitchd.c
index 1e72b628b..e4d482521 100644
--- a/vswitchd/ovs-vswitchd.c
+++ b/vswitchd/ovs-vswitchd.c
@@ -92,7 +92,10 @@ main(int argc, char *argv[])
 
     if (want_mlockall) {
 #ifdef HAVE_MLOCKALL
-        if (mlockall(MCL_CURRENT | MCL_FUTURE)) {
+#ifndef MCL_ONFAULT
+#define MCL_ONFAULT 0
+#endif
+        if (mlockall(MCL_CURRENT | MCL_FUTURE | MCL_ONFAULT)) {
             VLOG_ERR("mlockall failed: %s", ovs_strerror(errno));
         } else {
             set_memory_locked();
mlockall locks thread stack pages into memory, even pages which have not
yet been demand-paged. As vswitchd has a lot of threads and the default
stack size on x86_64 is 8 MiB, this consumes a lot of memory. On two
systems I looked at, vswitchd used ~150 MiB of RSS when idle after
startup.

Use the new MCL_ONFAULT flag to only lock pages into memory once they
have been demand-paged in. This still satisfies the requirement that
vswitchd is not swapped out but frees up ~144 MiB of unswappable memory
(18 threads x 8 MiB). After this, vswitchd uses ~6 MiB when idle after
startup.

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 vswitchd/ovs-vswitchd.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)