[SRU,X/B,1/1] UBUNTU: SAUCE: fscache: Fix race in decrementing refcount of op->npages

Message ID 20181012011131.30608-2-daniel.axtens@canonical.com
State New
Headers show
Series
  • LP: #1797314 - fix another fscache crash
Related show

Commit Message

Daniel Axtens Oct. 12, 2018, 1:11 a.m.
From: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>

BugLink: https://bugs.launchpad.net/bugs/1797314

[Trace]
seen this in 4.4.x kernels and the same bug affects fscache in latest upstreams kernels.
Jun 25 11:32:08  kernel: [4740718.880898] FS-Cache:
Jun 25 11:32:08  kernel: [4740718.880920] FS-Cache: Assertion failed
Jun 25 11:32:08  kernel: [4740718.880934] FS-Cache: 0 > 0 is false
Jun 25 11:32:08  kernel: [4740718.881001] ------------[ cut here ]------------
Jun 25 11:32:08  kernel: [4740718.881017] kernel BUG at /usr/src/linux-4.4.0/fs/fscache/operation.c:449!
Jun 25 11:32:08  kernel: [4740718.881040] invalid opcode: 0000 [#1] SMP
...
Jun 25 11:32:08  kernel: [4740718.892659] Call Trace:
Jun 25 11:32:08  kernel: [4740718.893506]  [<ffffffffc1464cf9>] cachefiles_read_copier+0x3a9/0x410 [cachefiles]
Jun 25 11:32:08  kernel: [4740718.894374]  [<ffffffffc037e272>] fscache_op_work_func+0x22/0x50 [fscache]
Jun 25 11:32:08  kernel: [4740718.895180]  [<ffffffff81096da0>] process_one_work+0x150/0x3f0
Jun 25 11:32:08  kernel: [4740718.895966]  [<ffffffff8109751a>] worker_thread+0x11a/0x470
Jun 25 11:32:08  kernel: [4740718.896753]  [<ffffffff81808e59>] ? __schedule+0x359/0x980
Jun 25 11:32:08  kernel: [4740718.897783]  [<ffffffff81097400>] ? rescuer_thread+0x310/0x310
Jun 25 11:32:08  kernel: [4740718.898581]  [<ffffffff8109cdd6>] kthread+0xd6/0xf0
Jun 25 11:32:08  kernel: [4740718.899469]  [<ffffffff8109cd00>] ? kthread_park+0x60/0x60
Jun 25 11:32:08  kernel: [4740718.900477]  [<ffffffff8180d0cf>] ret_from_fork+0x3f/0x70
Jun 25 11:32:08  kernel: [4740718.901514]  [<ffffffff8109cd00>] ? kthread_park+0x60/0x60

[Problem]
        atomic_sub(n_pages, &op->n_pages);
        if (atomic_read(&op->n_pages) <= 0)
                fscache_op_complete(&op->op, true);

The code in fscache_retrieval_complete is using atomic_sub followed by an atomic_read.
This causes two threads doing a decrement of pages to race with each other seeing the op->refcount 0 at same time,
and end up calling fscache_op_complete in both the threads leading to the OOPs.

[Fix]

The fix is trivial to use atomic_sub_return instead of two calls.

Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
(backported from
 https://www.redhat.com/archives/linux-cachefs/2018-September/msg00001.html
 The message has been on-list since 21 September 2018 and has
 received no feedback whatsoever.

 I have cleaned up the commit message a little bit and dropped a
 whitespace change.)
Signed-off-by: Daniel Axtens <daniel.axtens@canonical.com>
---
 include/linux/fscache-cache.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Stefan Bader Oct. 12, 2018, 8 a.m. | #1
On 12.10.2018 03:11, Daniel Axtens wrote:
> From: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
> 
> BugLink: https://bugs.launchpad.net/bugs/1797314
> 
> [Trace]
> seen this in 4.4.x kernels and the same bug affects fscache in latest upstreams kernels.
> Jun 25 11:32:08  kernel: [4740718.880898] FS-Cache:
> Jun 25 11:32:08  kernel: [4740718.880920] FS-Cache: Assertion failed
> Jun 25 11:32:08  kernel: [4740718.880934] FS-Cache: 0 > 0 is false
> Jun 25 11:32:08  kernel: [4740718.881001] ------------[ cut here ]------------
> Jun 25 11:32:08  kernel: [4740718.881017] kernel BUG at /usr/src/linux-4.4.0/fs/fscache/operation.c:449!
> Jun 25 11:32:08  kernel: [4740718.881040] invalid opcode: 0000 [#1] SMP
> ...
> Jun 25 11:32:08  kernel: [4740718.892659] Call Trace:
> Jun 25 11:32:08  kernel: [4740718.893506]  [<ffffffffc1464cf9>] cachefiles_read_copier+0x3a9/0x410 [cachefiles]
> Jun 25 11:32:08  kernel: [4740718.894374]  [<ffffffffc037e272>] fscache_op_work_func+0x22/0x50 [fscache]
> Jun 25 11:32:08  kernel: [4740718.895180]  [<ffffffff81096da0>] process_one_work+0x150/0x3f0
> Jun 25 11:32:08  kernel: [4740718.895966]  [<ffffffff8109751a>] worker_thread+0x11a/0x470
> Jun 25 11:32:08  kernel: [4740718.896753]  [<ffffffff81808e59>] ? __schedule+0x359/0x980
> Jun 25 11:32:08  kernel: [4740718.897783]  [<ffffffff81097400>] ? rescuer_thread+0x310/0x310
> Jun 25 11:32:08  kernel: [4740718.898581]  [<ffffffff8109cdd6>] kthread+0xd6/0xf0
> Jun 25 11:32:08  kernel: [4740718.899469]  [<ffffffff8109cd00>] ? kthread_park+0x60/0x60
> Jun 25 11:32:08  kernel: [4740718.900477]  [<ffffffff8180d0cf>] ret_from_fork+0x3f/0x70
> Jun 25 11:32:08  kernel: [4740718.901514]  [<ffffffff8109cd00>] ? kthread_park+0x60/0x60
> 
> [Problem]
>         atomic_sub(n_pages, &op->n_pages);
>         if (atomic_read(&op->n_pages) <= 0)
>                 fscache_op_complete(&op->op, true);
> 
> The code in fscache_retrieval_complete is using atomic_sub followed by an atomic_read.
> This causes two threads doing a decrement of pages to race with each other seeing the op->refcount 0 at same time,
> and end up calling fscache_op_complete in both the threads leading to the OOPs.
> 
> [Fix]
> 
> The fix is trivial to use atomic_sub_return instead of two calls.
> 
> Signed-off-by: Kiran Kumar Modukuri <kiran.modukuri@gmail.com>
> (backported from
>  https://www.redhat.com/archives/linux-cachefs/2018-September/msg00001.html
>  The message has been on-list since 21 September 2018 and has
>  received no feedback whatsoever.
> 
>  I have cleaned up the commit message a little bit and dropped a
>  whitespace change.)
> Signed-off-by: Daniel Axtens <daniel.axtens@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
> ---
>  include/linux/fscache-cache.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
> index 4c467ef50159..91be23fa037c 100644
> --- a/include/linux/fscache-cache.h
> +++ b/include/linux/fscache-cache.h
> @@ -183,8 +183,7 @@ static inline void fscache_enqueue_retrieval(struct fscache_retrieval *op)
>  static inline void fscache_retrieval_complete(struct fscache_retrieval *op,
>  					      int n_pages)
>  {
> -	atomic_sub(n_pages, &op->n_pages);
> -	if (atomic_read(&op->n_pages) <= 0)
> +	if (atomic_sub_return(n_pages, &op->n_pages) <= 0)
>  		fscache_op_complete(&op->op, true);
>  }
>  
>

Patch

diff --git a/include/linux/fscache-cache.h b/include/linux/fscache-cache.h
index 4c467ef50159..91be23fa037c 100644
--- a/include/linux/fscache-cache.h
+++ b/include/linux/fscache-cache.h
@@ -183,8 +183,7 @@  static inline void fscache_enqueue_retrieval(struct fscache_retrieval *op)
 static inline void fscache_retrieval_complete(struct fscache_retrieval *op,
 					      int n_pages)
 {
-	atomic_sub(n_pages, &op->n_pages);
-	if (atomic_read(&op->n_pages) <= 0)
+	if (atomic_sub_return(n_pages, &op->n_pages) <= 0)
 		fscache_op_complete(&op->op, true);
 }