Message ID | 1421139645-1588-1-git-send-email-ying.xue@windriver.com |
---|---|
State | Changes Requested, archived |
Delegated to: | David Miller |
On 01/13/15 at 05:00pm, Ying Xue wrote:
> When we put our declared work task in the global workqueue with
> schedule_delayed_work(), its delay parameter is always zero.
> Therefore, we should define a normal work in rhashtable structure
> instead of a delayed work.
>
> Signed-off-by: Ying Xue <ying.xue@windriver.com>
> Cc: Thomas Graf <tgraf@suug.ch>

> @@ -914,7 +914,7 @@ void rhashtable_destroy(struct rhashtable *ht)
>
> 	mutex_lock(&ht->mutex);
>
> -	cancel_delayed_work(&ht->run_work);
> +	cancel_work_sync(&ht->run_work);
> 	bucket_table_free(rht_dereference(ht->tbl, ht));
>
> 	mutex_unlock(&ht->mutex);

I like the patch!

I think it introduces a possible dead lock though (see below). OTOH, it
could actually explain the reason for the 0day lock debug splash that
was reported.

Dead lock: The worker could already have been kicked off but was
interrupted before it acquired ht->mutex. rhashtable_destroy() is
called and acquired ht->mutex. cancel_work_sync() waits for worker to
finish while holding ht->mutex. Worker can't finish because it needs to
acquire ht->mutex to do so.

For the very same reason the reported warning could have been triggered.
Instead of the dead lock, it would have called bucket_table_free()
with a deferred resizer still underway.

What about we do something like this?

	void rhashtable_destroy(struct rhashtable *ht)
	{
		ht->being_destroyed = true;
		cancel_work_sync(&ht->run_work);

		mutex_lock(&ht->mutex);
		bucket_table_free(rht_dereference(ht->tbl, ht));
		mutex_unlock(&ht->mutex);
	}

If you agree we can explain this shortly in the commit message and add:
Fixes: 97defe1 ("rhashtable: Per bucket locks & deferred expansion/shrinking")
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

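The ordering argument above can be tried out in userspace. What follows is only a rough analogy, not kernel code: `struct table`, `table_init()` and `table_destroy()` are hypothetical names, a pthread stands in for the queued work item, and `pthread_join()` stands in for `cancel_work_sync()` in the sense that both wait for the worker to finish. The sketch follows the proposed order (flag, wait, then lock); swapping the wait and the lock would reproduce the deadlock described in the message.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct rhashtable. */
struct table {
	pthread_mutex_t mutex;
	pthread_t worker;		/* plays the role of run_work */
	atomic_bool being_destroyed;
	int resized;			/* times the worker "resized" */
	bool freed;			/* set where bucket_table_free() would run */
};

/* Plays the role of rht_deferred_worker(): it needs the mutex to finish. */
static void *deferred_worker(void *arg)
{
	struct table *t = arg;

	pthread_mutex_lock(&t->mutex);
	if (!atomic_load(&t->being_destroyed))
		t->resized++;
	pthread_mutex_unlock(&t->mutex);
	return NULL;
}

/* Initializes the table and immediately kicks off one worker run. */
int table_init(struct table *t)
{
	pthread_mutex_init(&t->mutex, NULL);
	atomic_init(&t->being_destroyed, false);
	t->resized = 0;
	t->freed = false;
	return pthread_create(&t->worker, NULL, deferred_worker, t);
}

void table_destroy(struct table *t)
{
	atomic_store(&t->being_destroyed, true);

	/*
	 * Wait for the worker WITHOUT holding the mutex. Taking the mutex
	 * first and then waiting could block forever if the worker has
	 * started but not yet acquired the mutex.
	 */
	pthread_join(t->worker, NULL);

	pthread_mutex_lock(&t->mutex);
	t->freed = true;
	pthread_mutex_unlock(&t->mutex);
	pthread_mutex_destroy(&t->mutex);
}
```

Build with `-pthread`. Whether the worker gets to resize before the flag is set depends on scheduling; in either case `table_destroy()` returns, because the wait never happens while the mutex is held.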
On 01/13/2015 05:35 PM, Thomas Graf wrote:
> On 01/13/15 at 05:00pm, Ying Xue wrote:
>> When we put our declared work task in the global workqueue with
>> schedule_delayed_work(), its delay parameter is always zero.
>> Therefore, we should define a normal work in rhashtable structure
>> instead of a delayed work.
>>
>> Signed-off-by: Ying Xue <ying.xue@windriver.com>
>> Cc: Thomas Graf <tgraf@suug.ch>
>
>> @@ -914,7 +914,7 @@ void rhashtable_destroy(struct rhashtable *ht)
>>
>> 	mutex_lock(&ht->mutex);
>>
>> -	cancel_delayed_work(&ht->run_work);
>> +	cancel_work_sync(&ht->run_work);
>> 	bucket_table_free(rht_dereference(ht->tbl, ht));
>>
>> 	mutex_unlock(&ht->mutex);
>
> I like the patch!
>
> I think it introduces a possible dead lock though (see below). OTOH, it
> could actually explain the reason for the 0day lock debug splash that
> was reported.
>
> Dead lock: The worker could already have been kicked off but was
> interrupted before it acquired ht->mutex. rhashtable_destroy() is
> called and acquired ht->mutex. cancel_work_sync() waits for worker to
> finish while holding ht->mutex. Worker can't finish because it needs to
> acquire ht->mutex to do so.
>
> For the very same reason the reported warning could have been triggered.
> Instead of the dead lock, it would have called bucket_table_free()
> with a deferred resizer still underway.
>
> What about we do something like this?
>
> 	void rhashtable_destroy(struct rhashtable *ht)
> 	{
> 		ht->being_destroyed = true;
> 		cancel_work_sync(&ht->run_work);
>
> 		mutex_lock(&ht->mutex);
> 		bucket_table_free(rht_dereference(ht->tbl, ht));
> 		mutex_unlock(&ht->mutex);
> 	}
>

Damn! I knew your above described deadlock scenario. Thank you for the
nice catch!

> If you agree we can explain this shortly in the commit message and add:
> Fixes: 97defe1 ("rhashtable: Per bucket locks & deferred expansion/shrinking")
>

OK, I will deliver the next version.

By the way, I think we should check the following condition before call
cancel_work_sync(), otherwise, we may cancel an uninitialized work.

	(ht->p.grow_decision || ht->p.shrink_decision)

What do you think?

Regards,
Ying

On 01/13/15 at 05:48pm, Ying Xue wrote:
> On 01/13/2015 05:35 PM, Thomas Graf wrote:
> > If you agree we can explain this shortly in the commit message and add:
> > Fixes: 97defe1 ("rhashtable: Per bucket locks & deferred expansion/shrinking")
>
> OK, I will deliver the next version.
>
> By the way, I think we should check the following condition before call
> cancel_work_sync(), otherwise, we may cancel an uninitialized work.
>
> (ht->p.grow_decision || ht->p.shrink_decision)
>
> What do you think?

+1

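The guard agreed on here mirrors the condition under which rhashtable_init() initializes the work item in the first place. A small userspace mock can show that symmetry; it is a sketch only, and every name in it (`mock_params`, `mock_table`, `mock_init()`, `mock_destroy()`, `always_grow()`) is hypothetical. The real code would call INIT_WORK() and cancel_work_sync() where the mock sets a flag and bumps a counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct rhashtable_params. */
struct mock_params {
	bool (*grow_decision)(size_t size);
	bool (*shrink_decision)(size_t size);
};

struct mock_table {
	struct mock_params p;
	bool work_initialized;	/* set where INIT_WORK() would run */
	int cancels;		/* counts stand-in cancel_work_sync() calls */
};

/* The condition from the thread: is any resize callback configured? */
static bool resize_enabled(const struct mock_table *ht)
{
	return ht->p.grow_decision || ht->p.shrink_decision;
}

/* Example decision callback, for illustration only. */
bool always_grow(size_t size)
{
	(void)size;
	return true;
}

void mock_init(struct mock_table *ht, struct mock_params p)
{
	ht->p = p;
	ht->cancels = 0;
	ht->work_initialized = false;
	if (resize_enabled(ht))
		ht->work_initialized = true;
}

/* Returns false if it would have cancelled a never-initialized work item. */
bool mock_destroy(struct mock_table *ht)
{
	if (resize_enabled(ht)) {
		if (!ht->work_initialized)
			return false;
		ht->cancels++;
	}
	return true;
}
```

Because init and destroy test the same condition, a table created without either callback never reaches the cancel path.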
diff --git a/include/linux/rhashtable.h b/include/linux/rhashtable.h
index 9570832..a2562ed 100644
--- a/include/linux/rhashtable.h
+++ b/include/linux/rhashtable.h
@@ -119,7 +119,7 @@ struct rhashtable {
 	atomic_t nelems;
 	atomic_t shift;
 	struct rhashtable_params p;
-	struct delayed_work run_work;
+	struct work_struct run_work;
 	struct mutex mutex;
 	bool being_destroyed;
 };
diff --git a/lib/rhashtable.c b/lib/rhashtable.c
index ed6ae1a..a7959ed 100644
--- a/lib/rhashtable.c
+++ b/lib/rhashtable.c
@@ -476,7 +476,7 @@ static void rht_deferred_worker(struct work_struct *work)
 	struct rhashtable *ht;
 	struct bucket_table *tbl;
 
-	ht = container_of(work, struct rhashtable, run_work.work);
+	ht = container_of(work, struct rhashtable, run_work);
 	mutex_lock(&ht->mutex);
 	tbl = rht_dereference(ht->tbl, ht);
 
@@ -498,7 +498,7 @@ static void rhashtable_wakeup_worker(struct rhashtable *ht)
 	if (tbl == new_tbl &&
 	    ((ht->p.grow_decision && ht->p.grow_decision(ht, size)) ||
 	     (ht->p.shrink_decision && ht->p.shrink_decision(ht, size))))
-		schedule_delayed_work(&ht->run_work, 0);
+		schedule_work(&ht->run_work);
 }
 
 static void __rhashtable_insert(struct rhashtable *ht, struct rhash_head *obj,
@@ -894,7 +894,7 @@ int rhashtable_init(struct rhashtable *ht, struct rhashtable_params *params)
 	get_random_bytes(&ht->p.hash_rnd, sizeof(ht->p.hash_rnd));
 
 	if (ht->p.grow_decision || ht->p.shrink_decision)
-		INIT_DEFERRABLE_WORK(&ht->run_work, rht_deferred_worker);
+		INIT_WORK(&ht->run_work, rht_deferred_worker);
 
 	return 0;
 }
@@ -914,7 +914,7 @@ void rhashtable_destroy(struct rhashtable *ht)
 
 	mutex_lock(&ht->mutex);
 
-	cancel_delayed_work(&ht->run_work);
+	cancel_work_sync(&ht->run_work);
 	bucket_table_free(rht_dereference(ht->tbl, ht));
 
 	mutex_unlock(&ht->mutex);

When we put our declared work task in the global workqueue with
schedule_delayed_work(), its delay parameter is always zero.
Therefore, we should define a normal work in rhashtable structure
instead of a delayed work.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Cc: Thomas Graf <tgraf@suug.ch>

---
 include/linux/rhashtable.h | 2 +-
 lib/rhashtable.c           | 8 ++++----
 2 files changed, 5 insertions(+), 5 deletions(-)
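
One detail of the switch from a delayed work to a normal work is the container_of() member path changing from `run_work.work` to `run_work`. The userspace sketch below illustrates why, under simplifying assumptions: the `work_struct` and `delayed_work` definitions are stripped-down stand-ins rather than the kernel's, the macro omits the kernel's type checking, and `table_delayed`, `table_plain`, `from_delayed()` and `from_plain()` are hypothetical names.

```c
#include <stddef.h>

/* Simplified userspace version of the kernel macro (no type checking). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Stripped-down stand-ins for the kernel structures. */
struct work_struct {
	int pending;
};

struct delayed_work {
	struct work_struct work;	/* the embedded work item */
	long delay;
};

/* Before the patch: the table embeds a delayed_work. */
struct table_delayed {
	int nelems;
	struct delayed_work run_work;
};

/* After the patch: the table embeds the work_struct directly. */
struct table_plain {
	int nelems;
	struct work_struct run_work;
};

/* The callback receives the inner work_struct, hence "run_work.work". */
struct table_delayed *from_delayed(struct work_struct *work)
{
	return container_of(work, struct table_delayed, run_work.work);
}

/* With a plain work item the member itself is the work_struct. */
struct table_plain *from_plain(struct work_struct *work)
{
	return container_of(work, struct table_plain, run_work);
}
```

In both variants the worker callback is handed a pointer to a `struct work_struct`; only the offset back to the enclosing table differs.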