[v2] ext4: fix a data race in EXT4_I(inode)->i_disksize

Message ID 1581085751-31793-1-git-send-email-cai@lca.pw
State Awaiting Upstream

Commit Message

Qian Cai Feb. 7, 2020, 2:29 p.m. UTC
EXT4_I(inode)->i_disksize could be accessed concurrently as noticed by
KCSAN,

 BUG: KCSAN: data-race in ext4_write_end [ext4] / ext4_writepages [ext4]

 write to 0xffff91c6713b00f8 of 8 bytes by task 49268 on cpu 127:
  ext4_write_end+0x4e3/0x750 [ext4]
  ext4_update_i_disksize at fs/ext4/ext4.h:3032
  (inlined by) ext4_update_inode_size at fs/ext4/ext4.h:3046
  (inlined by) ext4_write_end at fs/ext4/inode.c:1287
  generic_perform_write+0x208/0x2a0
  ext4_buffered_write_iter+0x11f/0x210 [ext4]
  ext4_file_write_iter+0xce/0x9e0 [ext4]
  new_sync_write+0x29c/0x3b0
  __vfs_write+0x92/0xa0
  vfs_write+0x103/0x260
  ksys_write+0x9d/0x130
  __x64_sys_write+0x4c/0x60
  do_syscall_64+0x91/0xb47
  entry_SYSCALL_64_after_hwframe+0x49/0xbe

 read to 0xffff91c6713b00f8 of 8 bytes by task 24872 on cpu 37:
  ext4_writepages+0x10ac/0x1d00 [ext4]
  mpage_map_and_submit_extent at fs/ext4/inode.c:2468
  (inlined by) ext4_writepages at fs/ext4/inode.c:2772
  do_writepages+0x5e/0x130
  __writeback_single_inode+0xeb/0xb20
  writeback_sb_inodes+0x429/0x900
  __writeback_inodes_wb+0xc4/0x150
  wb_writeback+0x4bd/0x870
  wb_workfn+0x6b4/0x960
  process_one_work+0x54c/0xbe0
  worker_thread+0x80/0x650
  kthread+0x1e0/0x200
  ret_from_fork+0x27/0x50

 Reported by Kernel Concurrency Sanitizer on:
 CPU: 37 PID: 24872 Comm: kworker/u261:2 Tainted: G        W  O L 5.5.0-next-20200204+ #5
 Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
 Workqueue: writeback wb_workfn (flush-7:0)

Since only the read is lockless (it happens outside of "i_data_sem"),
load tearing could introduce a logic bug. Fix it by adding READ_ONCE()
for the read and WRITE_ONCE() for the write.

Signed-off-by: Qian Cai <cai@lca.pw>
---

v2: also add WRITE_ONCE() which is recommended even for fixing load tearing.

 fs/ext4/ext4.h  | 2 +-
 fs/ext4/inode.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
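
The locking pattern described in the commit message can be sketched in userspace C. This is only an illustration under stated assumptions: `struct demo_inode`, `demo_update_disksize()` and `demo_needs_update()` are hypothetical names, a pthread mutex stands in for "i_data_sem", and the two macros are simplified approximations of the kernel's READ_ONCE()/WRITE_ONCE() (they rely on the GCC/Clang `__typeof__` extension).

```c
#include <pthread.h>
#include <stdint.h>

/* Simplified stand-ins (assumption): the kernel's real macros are more
 * elaborate, but the core idea is a single volatile access. */
#define READ_ONCE(x)       (*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val) (*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical stand-in for the ext4 in-memory inode info. */
struct demo_inode {
	pthread_mutex_t lock;   /* plays the role of i_data_sem */
	int64_t disksize;       /* plays the role of i_disksize */
};

/* Writer side: writers are serialized by the lock, but the store is still
 * marked because a reader may load the field without taking the lock. */
static void demo_update_disksize(struct demo_inode *di, int64_t newsize)
{
	pthread_mutex_lock(&di->lock);
	if (newsize > di->disksize)     /* plain read is fine under the lock */
		WRITE_ONCE(di->disksize, newsize);
	pthread_mutex_unlock(&di->lock);
}

/* Reader side: lockless check, so the load is marked with READ_ONCE(). */
static int demo_needs_update(struct demo_inode *di, int64_t candidate)
{
	return candidate > READ_ONCE(di->disksize);
}
```

As in the patch, the comparison inside the locked section stays a plain access; only the store that a lockless reader can observe, and the lockless load itself, are marked.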

Comments

Marco Elver Feb. 7, 2020, 3:12 p.m. UTC | #1
On Fri, 7 Feb 2020 at 15:29, Qian Cai <cai@lca.pw> wrote:
>
> EXT4_I(inode)->i_disksize could be accessed concurrently as noticed by
> KCSAN,
>
>  BUG: KCSAN: data-race in ext4_write_end [ext4] / ext4_writepages [ext4]
>
>  write to 0xffff91c6713b00f8 of 8 bytes by task 49268 on cpu 127:
>   ext4_write_end+0x4e3/0x750 [ext4]
>   ext4_update_i_disksize at fs/ext4/ext4.h:3032
>   (inlined by) ext4_update_inode_size at fs/ext4/ext4.h:3046
>   (inlined by) ext4_write_end at fs/ext4/inode.c:1287
>   generic_perform_write+0x208/0x2a0
>   ext4_buffered_write_iter+0x11f/0x210 [ext4]
>   ext4_file_write_iter+0xce/0x9e0 [ext4]
>   new_sync_write+0x29c/0x3b0
>   __vfs_write+0x92/0xa0
>   vfs_write+0x103/0x260
>   ksys_write+0x9d/0x130
>   __x64_sys_write+0x4c/0x60
>   do_syscall_64+0x91/0xb47
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
>
>  read to 0xffff91c6713b00f8 of 8 bytes by task 24872 on cpu 37:
>   ext4_writepages+0x10ac/0x1d00 [ext4]
>   mpage_map_and_submit_extent at fs/ext4/inode.c:2468
>   (inlined by) ext4_writepages at fs/ext4/inode.c:2772
>   do_writepages+0x5e/0x130
>   __writeback_single_inode+0xeb/0xb20
>   writeback_sb_inodes+0x429/0x900
>   __writeback_inodes_wb+0xc4/0x150
>   wb_writeback+0x4bd/0x870
>   wb_workfn+0x6b4/0x960
>   process_one_work+0x54c/0xbe0
>   worker_thread+0x80/0x650
>   kthread+0x1e0/0x200
>   ret_from_fork+0x27/0x50
>
>  Reported by Kernel Concurrency Sanitizer on:
>  CPU: 37 PID: 24872 Comm: kworker/u261:2 Tainted: G        W  O L 5.5.0-next-20200204+ #5
>  Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
>  Workqueue: writeback wb_workfn (flush-7:0)
>
> Since only the read is operating as lockless (outside of the
> "i_data_sem"), load tearing could introduce a logic bug. Fix it by
> adding READ_ONCE() for the read and WRITE_ONCE() for the write.
>
> Signed-off-by: Qian Cai <cai@lca.pw>
> ---
>
> v2: also add WRITE_ONCE() which is recommended even for fixing load tearing.

Just a note: I keep seeing 'load tearing' mentioned as the only reason:

  - The WRITE_ONCE avoids store-tearing (and other optimizations).

  - We're not only interested in avoiding load/store tearing. There
are plenty of other compiler optimizations that can break concurrent
code: https://lwn.net/Articles/793253/
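
One optimization of that kind, besides tearing, is that the compiler is allowed to reload a plain shared variable instead of keeping the first value it read. A minimal sketch, assuming a hypothetical shared variable `demo_limit` and helper `demo_clamp()`, and a simplified READ_ONCE() stand-in rather than the kernel's actual definition:

```c
/* Simplified stand-in (assumption), relying on GCC/Clang __typeof__. */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Hypothetical shared variable that another thread may update. */
static long demo_limit;

/* Clamp v to the current limit.
 *
 * Written with plain accesses as "v > demo_limit ? demo_limit : v", the
 * compiler may load demo_limit twice, so the value compared and the
 * value returned could differ if a writer runs in between. Taking one
 * marked snapshot guarantees both uses see the same value. */
static long demo_clamp(long v)
{
	long limit = READ_ONCE(demo_limit);

	return v > limit ? limit : v;
}
```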

Thanks,
-- Marco


>  fs/ext4/ext4.h  | 2 +-
>  fs/ext4/inode.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index 9a2ee2428ecc..8329ccc82fa9 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -3029,7 +3029,7 @@ static inline void ext4_update_i_disksize(struct inode *inode, loff_t newsize)
>                      !inode_is_locked(inode));
>         down_write(&EXT4_I(inode)->i_data_sem);
>         if (newsize > EXT4_I(inode)->i_disksize)
> -               EXT4_I(inode)->i_disksize = newsize;
> +               WRITE_ONCE(EXT4_I(inode)->i_disksize, newsize);
>         up_write(&EXT4_I(inode)->i_data_sem);
>  }
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 3313168b680f..6f9862bf63f1 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -2465,7 +2465,7 @@ static int mpage_map_and_submit_extent(handle_t *handle,
>          * truncate are avoided by checking i_size under i_data_sem.
>          */
>         disksize = ((loff_t)mpd->first_page) << PAGE_SHIFT;
> -       if (disksize > EXT4_I(inode)->i_disksize) {
> +       if (disksize > READ_ONCE(EXT4_I(inode)->i_disksize)) {
>                 int err2;
>                 loff_t i_size;
>
> --
> 1.8.3.1
>
Qian Cai Feb. 7, 2020, 3:25 p.m. UTC | #2
On Fri, 2020-02-07 at 16:12 +0100, Marco Elver wrote:
> On Fri, 7 Feb 2020 at 15:29, Qian Cai <cai@lca.pw> wrote:
> > 
> > EXT4_I(inode)->i_disksize could be accessed concurrently as noticed by
> > KCSAN,
> > 
> >  BUG: KCSAN: data-race in ext4_write_end [ext4] / ext4_writepages [ext4]
> > 
> >  write to 0xffff91c6713b00f8 of 8 bytes by task 49268 on cpu 127:
> >   ext4_write_end+0x4e3/0x750 [ext4]
> >   ext4_update_i_disksize at fs/ext4/ext4.h:3032
> >   (inlined by) ext4_update_inode_size at fs/ext4/ext4.h:3046
> >   (inlined by) ext4_write_end at fs/ext4/inode.c:1287
> >   generic_perform_write+0x208/0x2a0
> >   ext4_buffered_write_iter+0x11f/0x210 [ext4]
> >   ext4_file_write_iter+0xce/0x9e0 [ext4]
> >   new_sync_write+0x29c/0x3b0
> >   __vfs_write+0x92/0xa0
> >   vfs_write+0x103/0x260
> >   ksys_write+0x9d/0x130
> >   __x64_sys_write+0x4c/0x60
> >   do_syscall_64+0x91/0xb47
> >   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > 
> >  read to 0xffff91c6713b00f8 of 8 bytes by task 24872 on cpu 37:
> >   ext4_writepages+0x10ac/0x1d00 [ext4]
> >   mpage_map_and_submit_extent at fs/ext4/inode.c:2468
> >   (inlined by) ext4_writepages at fs/ext4/inode.c:2772
> >   do_writepages+0x5e/0x130
> >   __writeback_single_inode+0xeb/0xb20
> >   writeback_sb_inodes+0x429/0x900
> >   __writeback_inodes_wb+0xc4/0x150
> >   wb_writeback+0x4bd/0x870
> >   wb_workfn+0x6b4/0x960
> >   process_one_work+0x54c/0xbe0
> >   worker_thread+0x80/0x650
> >   kthread+0x1e0/0x200
> >   ret_from_fork+0x27/0x50
> > 
> >  Reported by Kernel Concurrency Sanitizer on:
> >  CPU: 37 PID: 24872 Comm: kworker/u261:2 Tainted: G        W  O L 5.5.0-next-20200204+ #5
> >  Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
> >  Workqueue: writeback wb_workfn (flush-7:0)
> > 
> > Since only the read is operating as lockless (outside of the
> > "i_data_sem"), load tearing could introduce a logic bug. Fix it by
> > adding READ_ONCE() for the read and WRITE_ONCE() for the write.
> > 
> > Signed-off-by: Qian Cai <cai@lca.pw>
> > ---
> > 
> > v2: also add WRITE_ONCE() which is recommended even for fixing load tearing.
> 
> Just a note: I keep seeing 'load tearing' mentioned as the only reason:
> 
>   - The WRITE_ONCE avoids store-tearing (and other optimizations).

In general, yes, but in this case, store tearing can't happen because those
concurrent writers are protected by "i_data_sem", i.e., 

down_write(&EXT4_I(inode)->i_data_sem);

> 
>   - We're not only interested in avoiding load/store tearing. There
> are plenty other compiler optimizations that can break concurrent
> code: https://lwn.net/Articles/793253/
> 
> Thanks,
> -- Marco
> 
> 
> >  fs/ext4/ext4.h  | 2 +-
> >  fs/ext4/inode.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > index 9a2ee2428ecc..8329ccc82fa9 100644
> > --- a/fs/ext4/ext4.h
> > +++ b/fs/ext4/ext4.h
> > @@ -3029,7 +3029,7 @@ static inline void ext4_update_i_disksize(struct inode *inode, loff_t newsize)
> >                      !inode_is_locked(inode));
> >         down_write(&EXT4_I(inode)->i_data_sem);
> >         if (newsize > EXT4_I(inode)->i_disksize)
> > -               EXT4_I(inode)->i_disksize = newsize;
> > +               WRITE_ONCE(EXT4_I(inode)->i_disksize, newsize);
> >         up_write(&EXT4_I(inode)->i_data_sem);
> >  }
> > 
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index 3313168b680f..6f9862bf63f1 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -2465,7 +2465,7 @@ static int mpage_map_and_submit_extent(handle_t *handle,
> >          * truncate are avoided by checking i_size under i_data_sem.
> >          */
> >         disksize = ((loff_t)mpd->first_page) << PAGE_SHIFT;
> > -       if (disksize > EXT4_I(inode)->i_disksize) {
> > +       if (disksize > READ_ONCE(EXT4_I(inode)->i_disksize)) {
> >                 int err2;
> >                 loff_t i_size;
> > 
> > --
> > 1.8.3.1
> >
Qian Cai Feb. 7, 2020, 3:38 p.m. UTC | #3
On Fri, 2020-02-07 at 16:12 +0100, Marco Elver wrote:
> On Fri, 7 Feb 2020 at 15:29, Qian Cai <cai@lca.pw> wrote:
> > 
> > EXT4_I(inode)->i_disksize could be accessed concurrently as noticed by
> > KCSAN,
> > 
> >  BUG: KCSAN: data-race in ext4_write_end [ext4] / ext4_writepages [ext4]
> > 
> >  write to 0xffff91c6713b00f8 of 8 bytes by task 49268 on cpu 127:
> >   ext4_write_end+0x4e3/0x750 [ext4]
> >   ext4_update_i_disksize at fs/ext4/ext4.h:3032
> >   (inlined by) ext4_update_inode_size at fs/ext4/ext4.h:3046
> >   (inlined by) ext4_write_end at fs/ext4/inode.c:1287
> >   generic_perform_write+0x208/0x2a0
> >   ext4_buffered_write_iter+0x11f/0x210 [ext4]
> >   ext4_file_write_iter+0xce/0x9e0 [ext4]
> >   new_sync_write+0x29c/0x3b0
> >   __vfs_write+0x92/0xa0
> >   vfs_write+0x103/0x260
> >   ksys_write+0x9d/0x130
> >   __x64_sys_write+0x4c/0x60
> >   do_syscall_64+0x91/0xb47
> >   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > 
> >  read to 0xffff91c6713b00f8 of 8 bytes by task 24872 on cpu 37:
> >   ext4_writepages+0x10ac/0x1d00 [ext4]
> >   mpage_map_and_submit_extent at fs/ext4/inode.c:2468
> >   (inlined by) ext4_writepages at fs/ext4/inode.c:2772
> >   do_writepages+0x5e/0x130
> >   __writeback_single_inode+0xeb/0xb20
> >   writeback_sb_inodes+0x429/0x900
> >   __writeback_inodes_wb+0xc4/0x150
> >   wb_writeback+0x4bd/0x870
> >   wb_workfn+0x6b4/0x960
> >   process_one_work+0x54c/0xbe0
> >   worker_thread+0x80/0x650
> >   kthread+0x1e0/0x200
> >   ret_from_fork+0x27/0x50
> > 
> >  Reported by Kernel Concurrency Sanitizer on:
> >  CPU: 37 PID: 24872 Comm: kworker/u261:2 Tainted: G        W  O L 5.5.0-next-20200204+ #5
> >  Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
> >  Workqueue: writeback wb_workfn (flush-7:0)
> > 
> > Since only the read is operating as lockless (outside of the
> > "i_data_sem"), load tearing could introduce a logic bug. Fix it by
> > adding READ_ONCE() for the read and WRITE_ONCE() for the write.
> > 
> > Signed-off-by: Qian Cai <cai@lca.pw>
> > ---
> > 
> > v2: also add WRITE_ONCE() which is recommended even for fixing load tearing.
> 
> Just a note: I keep seeing 'load tearing' mentioned as the only reason:
> 
>   - The WRITE_ONCE avoids store-tearing (and other optimizations).
> 
>   - We're not only interested in avoiding load/store tearing. There
> are plenty other compiler optimizations that can break concurrent
> code: https://lwn.net/Articles/793253/

I also realized from that article that store tearing is described strictly
in terms of multiple concurrent writers. However, without the WRITE_ONCE()
here, the compiler could still emit 2 store instructions, so

CPU0:   CPU1:
        store #1
read
        store #2

which was not mentioned in that article. I called this load tearing as well,
but maybe you would call it store tearing. Do I understand correctly?

> 
> Thanks,
> -- Marco
> 
> 
> >  fs/ext4/ext4.h  | 2 +-
> >  fs/ext4/inode.c | 2 +-
> >  2 files changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > index 9a2ee2428ecc..8329ccc82fa9 100644
> > --- a/fs/ext4/ext4.h
> > +++ b/fs/ext4/ext4.h
> > @@ -3029,7 +3029,7 @@ static inline void ext4_update_i_disksize(struct inode *inode, loff_t newsize)
> >                      !inode_is_locked(inode));
> >         down_write(&EXT4_I(inode)->i_data_sem);
> >         if (newsize > EXT4_I(inode)->i_disksize)
> > -               EXT4_I(inode)->i_disksize = newsize;
> > +               WRITE_ONCE(EXT4_I(inode)->i_disksize, newsize);
> >         up_write(&EXT4_I(inode)->i_data_sem);
> >  }
> > 
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index 3313168b680f..6f9862bf63f1 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -2465,7 +2465,7 @@ static int mpage_map_and_submit_extent(handle_t *handle,
> >          * truncate are avoided by checking i_size under i_data_sem.
> >          */
> >         disksize = ((loff_t)mpd->first_page) << PAGE_SHIFT;
> > -       if (disksize > EXT4_I(inode)->i_disksize) {
> > +       if (disksize > READ_ONCE(EXT4_I(inode)->i_disksize)) {
> >                 int err2;
> >                 loff_t i_size;
> > 
> > --
> > 1.8.3.1
> >
Marco Elver Feb. 7, 2020, 4:08 p.m. UTC | #4
On Fri, 7 Feb 2020 at 16:38, Qian Cai <cai@lca.pw> wrote:
>
> On Fri, 2020-02-07 at 16:12 +0100, Marco Elver wrote:
> > On Fri, 7 Feb 2020 at 15:29, Qian Cai <cai@lca.pw> wrote:
> > >
> > > EXT4_I(inode)->i_disksize could be accessed concurrently as noticed by
> > > KCSAN,
> > >
> > >  BUG: KCSAN: data-race in ext4_write_end [ext4] / ext4_writepages [ext4]
> > >
> > >  write to 0xffff91c6713b00f8 of 8 bytes by task 49268 on cpu 127:
> > >   ext4_write_end+0x4e3/0x750 [ext4]
> > >   ext4_update_i_disksize at fs/ext4/ext4.h:3032
> > >   (inlined by) ext4_update_inode_size at fs/ext4/ext4.h:3046
> > >   (inlined by) ext4_write_end at fs/ext4/inode.c:1287
> > >   generic_perform_write+0x208/0x2a0
> > >   ext4_buffered_write_iter+0x11f/0x210 [ext4]
> > >   ext4_file_write_iter+0xce/0x9e0 [ext4]
> > >   new_sync_write+0x29c/0x3b0
> > >   __vfs_write+0x92/0xa0
> > >   vfs_write+0x103/0x260
> > >   ksys_write+0x9d/0x130
> > >   __x64_sys_write+0x4c/0x60
> > >   do_syscall_64+0x91/0xb47
> > >   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > >
> > >  read to 0xffff91c6713b00f8 of 8 bytes by task 24872 on cpu 37:
> > >   ext4_writepages+0x10ac/0x1d00 [ext4]
> > >   mpage_map_and_submit_extent at fs/ext4/inode.c:2468
> > >   (inlined by) ext4_writepages at fs/ext4/inode.c:2772
> > >   do_writepages+0x5e/0x130
> > >   __writeback_single_inode+0xeb/0xb20
> > >   writeback_sb_inodes+0x429/0x900
> > >   __writeback_inodes_wb+0xc4/0x150
> > >   wb_writeback+0x4bd/0x870
> > >   wb_workfn+0x6b4/0x960
> > >   process_one_work+0x54c/0xbe0
> > >   worker_thread+0x80/0x650
> > >   kthread+0x1e0/0x200
> > >   ret_from_fork+0x27/0x50
> > >
> > >  Reported by Kernel Concurrency Sanitizer on:
> > >  CPU: 37 PID: 24872 Comm: kworker/u261:2 Tainted: G        W  O L 5.5.0-next-20200204+ #5
> > >  Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
> > >  Workqueue: writeback wb_workfn (flush-7:0)
> > >
> > > Since only the read is operating as lockless (outside of the
> > > "i_data_sem"), load tearing could introduce a logic bug. Fix it by
> > > adding READ_ONCE() for the read and WRITE_ONCE() for the write.
> > >
> > > Signed-off-by: Qian Cai <cai@lca.pw>
> > > ---
> > >
> > > v2: also add WRITE_ONCE() which is recommended even for fixing load tearing.
> >
> > Just a note: I keep seeing 'load tearing' mentioned as the only reason:
> >
> >   - The WRITE_ONCE avoids store-tearing (and other optimizations).
> >
> >   - We're not only interested in avoiding load/store tearing. There
> > are plenty other compiler optimizations that can break concurrent
> > code: https://lwn.net/Articles/793253/
>
> I also realized that from that article, store tearing is strictly from multiple
> concurrent writers. However, in the sense of without the WRITE_ONCE() here,
> compilers could still have 2 store instructions, so
>
> CPU0:   CPU1:
>         store #1
> read
>         store #2
>
> which was not mentioned in that article. I called it also load tearing, but
> maybe you will call that store tearing. Do I understand correctly?

The effect is the same, so yes. If the writer side splits the write
and there is a concurrent load, the observed value will appear
"torn". The same applies if the reader side splits the read (the more
obvious case).
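
The interleaving discussed above can be made concrete with a deterministic, single-threaded simulation. This is a sketch under assumptions: `demo_torn_observation()` is a hypothetical helper, and the choice to store the low 32 bits first is arbitrary; a real compiler could split a store differently or not at all.

```c
#include <stdint.h>

/* Model a 64-bit store that was split into two 32-bit stores, and return
 * what a reader would observe if its (whole) load landed between them. */
static uint64_t demo_torn_observation(uint64_t old_val, uint64_t new_val)
{
	uint64_t cell = old_val;
	uint64_t observed;

	/* store #1: only the low 32 bits are updated */
	cell = (cell & 0xFFFFFFFF00000000ULL) | (new_val & 0x00000000FFFFFFFFULL);

	/* the concurrent read happens here */
	observed = cell;

	/* store #2: the high 32 bits are updated */
	cell = (cell & 0x00000000FFFFFFFFULL) | (new_val & 0xFFFFFFFF00000000ULL);
	(void)cell;

	return observed;
}
```

For example, growing a size from 0x00000000FFFFFFFF to 0x0000000100000000 makes the reader in this model observe 0, which is neither the old nor the new value.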

> >
> > Thanks,
> > -- Marco
> >
> >
> > >  fs/ext4/ext4.h  | 2 +-
> > >  fs/ext4/inode.c | 2 +-
> > >  2 files changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> > > index 9a2ee2428ecc..8329ccc82fa9 100644
> > > --- a/fs/ext4/ext4.h
> > > +++ b/fs/ext4/ext4.h
> > > @@ -3029,7 +3029,7 @@ static inline void ext4_update_i_disksize(struct inode *inode, loff_t newsize)
> > >                      !inode_is_locked(inode));
> > >         down_write(&EXT4_I(inode)->i_data_sem);
> > >         if (newsize > EXT4_I(inode)->i_disksize)
> > > -               EXT4_I(inode)->i_disksize = newsize;
> > > +               WRITE_ONCE(EXT4_I(inode)->i_disksize, newsize);
> > >         up_write(&EXT4_I(inode)->i_data_sem);
> > >  }
> > >
> > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > index 3313168b680f..6f9862bf63f1 100644
> > > --- a/fs/ext4/inode.c
> > > +++ b/fs/ext4/inode.c
> > > @@ -2465,7 +2465,7 @@ static int mpage_map_and_submit_extent(handle_t *handle,
> > >          * truncate are avoided by checking i_size under i_data_sem.
> > >          */
> > >         disksize = ((loff_t)mpd->first_page) << PAGE_SHIFT;
> > > -       if (disksize > EXT4_I(inode)->i_disksize) {
> > > +       if (disksize > READ_ONCE(EXT4_I(inode)->i_disksize)) {
> > >                 int err2;
> > >                 loff_t i_size;
> > >
> > > --
> > > 1.8.3.1
> > >
Theodore Y. Ts'o Feb. 20, 2020, 4:16 a.m. UTC | #5
On Fri, Feb 07, 2020 at 09:29:11AM -0500, Qian Cai wrote:
> EXT4_I(inode)->i_disksize could be accessed concurrently as noticed by
> KCSAN,
> 
>  BUG: KCSAN: data-race in ext4_write_end [ext4] / ext4_writepages [ext4]
> 
>  write to 0xffff91c6713b00f8 of 8 bytes by task 49268 on cpu 127:
>   ext4_write_end+0x4e3/0x750 [ext4]
>   ext4_update_i_disksize at fs/ext4/ext4.h:3032
>   (inlined by) ext4_update_inode_size at fs/ext4/ext4.h:3046
>   (inlined by) ext4_write_end at fs/ext4/inode.c:1287
>   generic_perform_write+0x208/0x2a0
>   ext4_buffered_write_iter+0x11f/0x210 [ext4]
>   ext4_file_write_iter+0xce/0x9e0 [ext4]
>   new_sync_write+0x29c/0x3b0
>   __vfs_write+0x92/0xa0
>   vfs_write+0x103/0x260
>   ksys_write+0x9d/0x130
>   __x64_sys_write+0x4c/0x60
>   do_syscall_64+0x91/0xb47
>   entry_SYSCALL_64_after_hwframe+0x49/0xbe
> 
>  read to 0xffff91c6713b00f8 of 8 bytes by task 24872 on cpu 37:
>   ext4_writepages+0x10ac/0x1d00 [ext4]
>   mpage_map_and_submit_extent at fs/ext4/inode.c:2468
>   (inlined by) ext4_writepages at fs/ext4/inode.c:2772
>   do_writepages+0x5e/0x130
>   __writeback_single_inode+0xeb/0xb20
>   writeback_sb_inodes+0x429/0x900
>   __writeback_inodes_wb+0xc4/0x150
>   wb_writeback+0x4bd/0x870
>   wb_workfn+0x6b4/0x960
>   process_one_work+0x54c/0xbe0
>   worker_thread+0x80/0x650
>   kthread+0x1e0/0x200
>   ret_from_fork+0x27/0x50
> 
>  Reported by Kernel Concurrency Sanitizer on:
>  CPU: 37 PID: 24872 Comm: kworker/u261:2 Tainted: G        W  O L 5.5.0-next-20200204+ #5
>  Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019
>  Workqueue: writeback wb_workfn (flush-7:0)
> 
> Since only the read is operating as lockless (outside of the
> "i_data_sem"), load tearing could introduce a logic bug. Fix it by
> adding READ_ONCE() for the read and WRITE_ONCE() for the write.
> 
> Signed-off-by: Qian Cai <cai@lca.pw>

Thanks, applied.

						- Ted

Patch

diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
index 9a2ee2428ecc..8329ccc82fa9 100644
--- a/fs/ext4/ext4.h
+++ b/fs/ext4/ext4.h
@@ -3029,7 +3029,7 @@  static inline void ext4_update_i_disksize(struct inode *inode, loff_t newsize)
 		     !inode_is_locked(inode));
 	down_write(&EXT4_I(inode)->i_data_sem);
 	if (newsize > EXT4_I(inode)->i_disksize)
-		EXT4_I(inode)->i_disksize = newsize;
+		WRITE_ONCE(EXT4_I(inode)->i_disksize, newsize);
 	up_write(&EXT4_I(inode)->i_data_sem);
 }
 
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 3313168b680f..6f9862bf63f1 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2465,7 +2465,7 @@  static int mpage_map_and_submit_extent(handle_t *handle,
 	 * truncate are avoided by checking i_size under i_data_sem.
 	 */
 	disksize = ((loff_t)mpd->first_page) << PAGE_SHIFT;
-	if (disksize > EXT4_I(inode)->i_disksize) {
+	if (disksize > READ_ONCE(EXT4_I(inode)->i_disksize)) {
 		int err2;
 		loff_t i_size;