| Message ID | 20180717194048.3057-2-jallen@linux.ibm.com (mailing list archive) |
|---|---|
| State | Superseded |
| Series | powerpc/pseries: Improve serialization of PRRN events |

| Context | Check | Description |
|---|---|---|
| snowpatch_ozlabs/apply_patch | success | next/apply_patch Successfully applied |
| snowpatch_ozlabs/snowpatch-checkpatch | success | Test checkpatch on branch next |
On 07/17/2018 02:40 PM, John Allen wrote:
> When a PRRN event is being handled and another PRRN event comes in, the
> second event will block rtas polling waiting on the first to complete,
> preventing any further rtas events from being handled. This can be
> especially problematic in case that PRRN events are continuously being
> queued in which case rtas polling gets indefinitely blocked completely.
>
> This patch introduces a mutex that prevents any subsequent PRRN events from
> running while there is a prrn event being handled, allowing rtas polling to
> continue normally.
>
> Signed-off-by: John Allen <jallen@linux.ibm.com>

Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>

> ---
> v2:
> -Unlock prrn_lock when PRRN operations are complete, not after handler is
>  scheduled.
> -Remove call to flush_work, the previous broken method of serializing
>  PRRN events.
> ---
>  arch/powerpc/kernel/rtasd.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
> index 44d66c33d59d..845fc5aec178 100644
> --- a/arch/powerpc/kernel/rtasd.c
> +++ b/arch/powerpc/kernel/rtasd.c
> @@ -35,6 +35,8 @@
>
>  static DEFINE_SPINLOCK(rtasd_log_lock);
>
> +static DEFINE_MUTEX(prrn_lock);
> +
>  static DECLARE_WAIT_QUEUE_HEAD(rtas_log_wait);
>
>  static char *rtas_log_buf;
> @@ -284,15 +286,17 @@ static void prrn_work_fn(struct work_struct *work)
>  	 */
>  	pseries_devicetree_update(-prrn_update_scope);
>  	numa_update_cpu_topology(false);
> +	mutex_unlock(&prrn_lock);
>  }
>
>  static DECLARE_WORK(prrn_work, prrn_work_fn);
>
>  static void prrn_schedule_update(u32 scope)
>  {
> -	flush_work(&prrn_work);
> -	prrn_update_scope = scope;
> -	schedule_work(&prrn_work);
> +	if (mutex_trylock(&prrn_lock)) {
> +		prrn_update_scope = scope;
> +		schedule_work(&prrn_work);
> +	}
>  }
>
>  static void handle_rtas_event(const struct rtas_error_log *log)
Hi John,

I'm a bit puzzled by this one.

John Allen <jallen@linux.ibm.com> writes:
> When a PRRN event is being handled and another PRRN event comes in, the
> second event will block rtas polling waiting on the first to complete,
> preventing any further rtas events from being handled. This can be
> especially problematic in case that PRRN events are continuously being
> queued in which case rtas polling gets indefinitely blocked completely.
>
> This patch introduces a mutex that prevents any subsequent PRRN events from
> running while there is a prrn event being handled, allowing rtas polling to
> continue normally.
>
> Signed-off-by: John Allen <jallen@linux.ibm.com>
> ---
> v2:
> -Unlock prrn_lock when PRRN operations are complete, not after handler is
>  scheduled.
> -Remove call to flush_work, the previous broken method of serializing
>  PRRN events.
> ---
>  arch/powerpc/kernel/rtasd.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
> index 44d66c33d59d..845fc5aec178 100644
> --- a/arch/powerpc/kernel/rtasd.c
> +++ b/arch/powerpc/kernel/rtasd.c
> @@ -284,15 +286,17 @@ static void prrn_work_fn(struct work_struct *work)
>  	 */
>  	pseries_devicetree_update(-prrn_update_scope);
>  	numa_update_cpu_topology(false);
> +	mutex_unlock(&prrn_lock);
>  }
>
>  static DECLARE_WORK(prrn_work, prrn_work_fn);
>
>  static void prrn_schedule_update(u32 scope)
>  {
> -	flush_work(&prrn_work);

This seems like it's actually the core of the change. Previously we were basically blocking on the flush before continuing.

> -	prrn_update_scope = scope;

I don't really understand the scope. With the old code we always ran the work function once per call; now we potentially throw away the scope value (if the trylock fails).

> -	schedule_work(&prrn_work);
> +	if (mutex_trylock(&prrn_lock)) {
> +		prrn_update_scope = scope;
> +		schedule_work(&prrn_work);
> +	}

Ignoring the scope, the addition of the mutex should not actually make any difference. If you see the doco for schedule_work() it says:

 * This puts a job in the kernel-global workqueue if it was not already
 * queued and leaves it in the same position on the kernel-global
 * workqueue otherwise.

So the mutex basically implements that existing behaviour. But maybe the scope is the issue? Like I said, I don't really understand the scope value.

So I guess I'm wondering if we just need to drop the flush_work() and the rest is not required?

cheers
On Mon, Jul 23, 2018 at 11:27:56PM +1000, Michael Ellerman wrote:
> Hi John,
>
> I'm a bit puzzled by this one.
>
> John Allen <jallen@linux.ibm.com> writes:
>>  static void prrn_schedule_update(u32 scope)
>>  {
>> -	flush_work(&prrn_work);
>
> This seems like it's actually the core of the change. Previously we were
> basically blocking on the flush before continuing.

The idea here is to replace the blocking flush_work with a non-blocking mutex. So rather than waiting on the running PRRN event to complete, we bail out since a PRRN event is already running.

The situation this is meant to address is flooding the workqueue with PRRN events, which, like the situation in patch 2/2, can be queued up faster than they can actually be handled.

>> -	prrn_update_scope = scope;
>
> I don't really understand the scope. With the old code we always ran the
> work function once per call; now we potentially throw away the scope
> value (if the trylock fails).

So any time we actually want to run with the scope (when the trylock succeeds), we schedule the work with the scope value set accordingly, as seen in the code below. In the case that we don't want to run a PRRN event (because one is already running), we do throw away the scope and ignore the request entirely.

>> -	schedule_work(&prrn_work);
>> +	if (mutex_trylock(&prrn_lock)) {
>> +		prrn_update_scope = scope;
>> +		schedule_work(&prrn_work);
>> +	}
>
> Ignoring the scope, the addition of the mutex should not actually make
> any difference. If you see the doco for schedule_work() it says:
>
>  * This puts a job in the kernel-global workqueue if it was not already
>  * queued and leaves it in the same position on the kernel-global
>  * workqueue otherwise.
>
> So the mutex basically implements that existing behaviour. But maybe the
> scope is the issue? Like I said, I don't really understand the scope value.
>
> So I guess I'm wondering if we just need to drop the flush_work() and
> the rest is not required?

To sum up the above, the behavior without the mutex is not the same as with the mutex. Without the mutex, any time we get a PRRN event it is queued on the workqueue, which can get flooded if PRRN events are queued continuously. With the mutex, only one PRRN event can be queued for handling at once.

Hope that clears things up!

-John

> cheers
Hi John,

I'm still not sure about this one.

John Allen <jallen@linux.ibm.com> writes:
> On Mon, Jul 23, 2018 at 11:27:56PM +1000, Michael Ellerman wrote:
>> This seems like it's actually the core of the change. Previously we were
>> basically blocking on the flush before continuing.
>
> The idea here is to replace the blocking flush_work with a non-blocking
> mutex. So rather than waiting on the running PRRN event to complete, we
> bail out since a PRRN event is already running.

OK, but why is it OK to bail out?

The firmware sent you an error log asking you to do something, with a scope value that has some meaning, and now you're just going to drop that on the floor?

Maybe it is OK to just drop these events? Or maybe you're saying that because the system is crashing under the load of too many events, it's OK to drop the events in this case.

> The situation this is meant to address is flooding the workqueue with
> PRRN events, which, like the situation in patch 2/2, can be queued up
> faster than they can actually be handled.

I'm not really sure why this is a problem though.

The current code synchronously processes the events, so there should only ever be one in flight.

I guess the issue is that each one can queue multiple events on the hotplug work queue?

But still, we have terabytes of RAM; we should be able to queue a lot of events before it becomes a problem. So what exactly is getting flooded? What's the symptom?

If the queuing of the hotplug events is the problem, then why don't we stop doing that? We could just process them synchronously from the PRRN update; that would naturally throttle them.

cheers
On Wed, Aug 01, 2018 at 11:02:59PM +1000, Michael Ellerman wrote:
> Hi John,
>
> I'm still not sure about this one.
>
>> The idea here is to replace the blocking flush_work with a non-blocking
>> mutex. So rather than waiting on the running PRRN event to complete, we
>> bail out since a PRRN event is already running.
>
> OK, but why is it OK to bail out?
>
> The firmware sent you an error log asking you to do something, with a
> scope value that has some meaning, and now you're just going to drop
> that on the floor?
>
> Maybe it is OK to just drop these events? Or maybe you're saying that
> because the system is crashing under the load of too many events, it's
> OK to drop the events in this case.

I think I see your point. If a PRRN event comes in while another is currently running, the new one may contain a different list of LMBs/CPUs and the old list becomes outdated. With the mutex, the only event that gets handled is the oldest, and we will lose any additional changes beyond the initial event.

Therefore, as you mentioned in your previous message, the behavior of the global workqueue should work just fine once we remove the call to flush_work. While a PRRN event is running, only one more will remain on the workqueue; when the first one completes, the newly scheduled work function will grab the latest PRRN list.

I will send a new version of the patch with just the call to flush_work removed.

-John

> I'm not really sure why this is a problem though.
>
> The current code synchronously processes the events, so there should
> only ever be one in flight.
>
> I guess the issue is that each one can queue multiple events on the
> hotplug work queue?
>
> But still, we have terabytes of RAM; we should be able to queue a lot
> of events before it becomes a problem. So what exactly is getting
> flooded? What's the symptom?
>
> If the queuing of the hotplug events is the problem, then why don't we
> stop doing that? We could just process them synchronously from the PRRN
> update; that would naturally throttle them.
>
> cheers
```diff
diff --git a/arch/powerpc/kernel/rtasd.c b/arch/powerpc/kernel/rtasd.c
index 44d66c33d59d..845fc5aec178 100644
--- a/arch/powerpc/kernel/rtasd.c
+++ b/arch/powerpc/kernel/rtasd.c
@@ -35,6 +35,8 @@

 static DEFINE_SPINLOCK(rtasd_log_lock);

+static DEFINE_MUTEX(prrn_lock);
+
 static DECLARE_WAIT_QUEUE_HEAD(rtas_log_wait);

 static char *rtas_log_buf;
@@ -284,15 +286,17 @@ static void prrn_work_fn(struct work_struct *work)
 	 */
 	pseries_devicetree_update(-prrn_update_scope);
 	numa_update_cpu_topology(false);
+	mutex_unlock(&prrn_lock);
 }

 static DECLARE_WORK(prrn_work, prrn_work_fn);

 static void prrn_schedule_update(u32 scope)
 {
-	flush_work(&prrn_work);
-	prrn_update_scope = scope;
-	schedule_work(&prrn_work);
+	if (mutex_trylock(&prrn_lock)) {
+		prrn_update_scope = scope;
+		schedule_work(&prrn_work);
+	}
 }

 static void handle_rtas_event(const struct rtas_error_log *log)
```
When a PRRN event is being handled and another PRRN event comes in, the second event will block rtas polling waiting on the first to complete, preventing any further rtas events from being handled. This can be especially problematic in the case that PRRN events are continuously being queued, in which case rtas polling gets blocked indefinitely.

This patch introduces a mutex that prevents any subsequent PRRN events from running while a PRRN event is being handled, allowing rtas polling to continue normally.

Signed-off-by: John Allen <jallen@linux.ibm.com>
---
v2:
- Unlock prrn_lock when PRRN operations are complete, not after the handler is scheduled.
- Remove the call to flush_work, the previous broken method of serializing PRRN events.
---
 arch/powerpc/kernel/rtasd.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)