Message ID | 20180604064216.32075-2-daniel.axtens@canonical.com |
---|---|
State | New |
Headers | show |
Series | [SRU,T/X/A/B,C,1/1] UBUNTU: SAUCE: CacheFiles: fix a read_waiter/read_copier race | expand |
On 03.06.2018 23:42, Daniel Axtens wrote: > From: Lei Xue <carmark.dlut@gmail.com> > > BugLink: https://bugs.launchpad.net/bugs/1774336 > > There is a potential race in fscache operation enqueuing for reading and > copying multiple pages from cachefiles to netfs. > > If this race occurs, an oops similar to the following is seen: > > [585042.202316] FS-Cache: > [585042.202343] FS-Cache: Assertion failed > [585042.202367] FS-Cache: 6 == 5 is false > [585042.202452] ------------[ cut here ]------------ > [585042.202480] kernel BUG at fs/fscache/operation.c:494! > ... > [585042.209600] Call Trace: > [585042.211233] [<ffffffffc034c29a>] fscache_op_work_func+0x2a/0x50 [fscache] > [585042.212677] [<ffffffff81095a70>] process_one_work+0x150/0x3f0 > [585042.213550] [<ffffffff810961ea>] worker_thread+0x11a/0x470 > ... > > The race occurs in the following situation: > > One thread is in cachefiles_read_waiter: > 1) object->work_lock is taken. > 2) the operation is added to the to_do list. > 3) the work lock is dropped. > 4) fscache_enqueue_retrieval is called, which takes a reference. > > Another thread is in cachefiles_read_copier: > 1) object->work_lock is taken > 2) an item is popped off the to_do list. > 3) object->work_lock is dropped. > 4) some processing is done on the item, and fscache_put_retrieval() > is called, dropping a reference. > > Now if the this process in cachefiles_read_copier takes place > *between* steps 3 and 4 in cachefiles_read_waiter, a reference will be > dropped before it is taken, which leads to the object's reference count > hitting zero, which leads to lifecycle events for the object happening > too soon, leading to the assertion failure later on. > > Move fscache_enqueue_retrieval under the lock in > cachefiles_read_waiter. This means that the object cannot be popped > off the to_do list until it is in a fully consistent state with the > reference taken. > > Signed-off-by: Lei Xue <carmark.dlut@gmail.com> > Reviewed-by: Daniel Axtens <dja@axtens.net> > [dja: rewrite and expand commit message] > (From https://www.redhat.com/archives/linux-cachefs/2018-February/msg00000.html > This patch has been sitting on the mailing list for months with no > response from the maintainer. A similar patch fixing the same issue > was posted as far back as May 2017, and likewise had no response: > https://www.redhat.com/archives/linux-cachefs/2017-May/msg00002.html > I poked the list recently and also got nothing: > https://www.redhat.com/archives/linux-cachefs/2018-May/msg00000.html > and the problem was again reported and this patch validated by > another user: > https://www.redhat.com/archives/linux-cachefs/2018-May/msg00001.html > Hence the submission as a sauce patch.) > Signed-off-by: Daniel Axtens <daniel.axtens@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> > --- > fs/cachefiles/rdwr.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c > index c0f3da3926a0..d9bb47d1e16d 100644 > --- a/fs/cachefiles/rdwr.c > +++ b/fs/cachefiles/rdwr.c > @@ -58,9 +58,9 @@ static int cachefiles_read_waiter(wait_queue_t *wait, unsigned mode, > > spin_lock(&object->work_lock); > list_add_tail(&monitor->op_link, &monitor->op->to_do); > + fscache_enqueue_retrieval(monitor->op); > spin_unlock(&object->work_lock); > > - fscache_enqueue_retrieval(monitor->op); > return 0; > } > >
Applied to unstable master branch. Thanks. Cascardo. Applied-to: unstable/master
diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c index c0f3da3926a0..d9bb47d1e16d 100644 --- a/fs/cachefiles/rdwr.c +++ b/fs/cachefiles/rdwr.c @@ -58,9 +58,9 @@ static int cachefiles_read_waiter(wait_queue_t *wait, unsigned mode, spin_lock(&object->work_lock); list_add_tail(&monitor->op_link, &monitor->op->to_do); + fscache_enqueue_retrieval(monitor->op); spin_unlock(&object->work_lock); - fscache_enqueue_retrieval(monitor->op); return 0; }