Message ID | 1403778991-11768-1-git-send-email-luis.henriques@canonical.com |
---|---|
State | New |
Headers | show |
On Thu, Jun 26, 2014 at 11:36:31AM +0100, Luis Henriques wrote: > This is a note to let you know that I have just added a patch titled > > xfs: block allocation work needs to be kswapd aware > > to the linux-3.11.y-queue branch of the 3.11.y.z extended stable tree > which can be found at: > > http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=shortlog;h=refs/heads/linux-3.11.y-queue > > If you, or anyone else, feels it should not be added to this tree, please > reply to this email. > > For more information about the 3.11.y.z tree, see > https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable No, do not apply this commit to stable kernels - it is due to be reverted as is causes major memory reclaim behaviour regressions. http://oss.sgi.com/pipermail/xfs/2014-June/036851.html Cheers, Dave. > > Thanks. > -Luis > > ------ > > From 979242514e5dbb38d8c68e91bb7adcf9cffce846 Mon Sep 17 00:00:00 2001 > From: Dave Chinner <dchinner@redhat.com> > Date: Fri, 6 Jun 2014 15:59:59 +1000 > Subject: xfs: block allocation work needs to be kswapd aware > > commit 1f6d64829db78a7e1d63e15c9f48f0a5d2b5a679 upstream. > > Upon memory pressure, kswapd calls xfs_vm_writepage() from > shrink_page_list(). This can result in delayed allocation occurring > and that gets deferred to the the allocation workqueue. > > The allocation then runs outside kswapd context, which means if it > needs memory (and it does to demand page metadata from disk) it can > block in shrink_inactive_list() waiting for IO congestion. These > blocking waits are normally avoiding in kswapd context, so under > memory pressure writeback from kswapd can be arbitrarily delayed by > memory reclaim. > > To avoid this, pass the kswapd context to the allocation being done > by the workqueue, so that memory reclaim understands correctly that > the work is being done for kswapd and therefore it is not blocked > and does not delay memory reclaim. > > To avoid issues with int->char conversion of flag fields (as noticed > in v1 of this patch) convert the flag fields in the struct > xfs_bmalloca to bool types. pahole indicates these variables are > still single byte variables, so no extra space is consumed by this > change. > > Reported-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> > Signed-off-by: Dave Chinner <dchinner@redhat.com> > Reviewed-by: Christoph Hellwig <hch@lst.de> > Signed-off-by: Dave Chinner <david@fromorbit.com> > [ luis: backported to 3.11: files rename: > - fs/xfs/xfs_bmap_util.c -> fs/xfs/xfs_bmap.c > - fs/xfs/xfs_bmap_util.h -> fs/xfs/xfs_bmap.h ] > Signed-off-by: Luis Henriques <luis.henriques@canonical.com> > --- > fs/xfs/xfs_bmap.c | 16 +++++++++++++--- > fs/xfs/xfs_bmap.h | 13 +++++++------ > 2 files changed, 20 insertions(+), 9 deletions(-) > > diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c > index 05c698ccb238..ed6dc2d02573 100644 > --- a/fs/xfs/xfs_bmap.c > +++ b/fs/xfs/xfs_bmap.c > @@ -4763,14 +4763,23 @@ xfs_bmapi_allocate_worker( > struct xfs_bmalloca *args = container_of(work, > struct xfs_bmalloca, work); > unsigned long pflags; > + unsigned long new_pflags = PF_FSTRANS; > > - /* we are in a transaction context here */ > - current_set_flags_nested(&pflags, PF_FSTRANS); > + /* > + * we are in a transaction context here, but may also be doing work > + * in kswapd context, and hence we may need to inherit that state > + * temporarily to ensure that we don't block waiting for memory reclaim > + * in any way. > + */ > + if (args->kswapd) > + new_pflags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD; > + > + current_set_flags_nested(&pflags, new_pflags); > > args->result = __xfs_bmapi_allocate(args); > complete(args->done); > > - current_restore_flags_nested(&pflags, PF_FSTRANS); > + current_restore_flags_nested(&pflags, new_pflags); > } > > /* > @@ -4789,6 +4798,7 @@ xfs_bmapi_allocate( > > > args->done = &done; > + args->kswapd = current_is_kswapd(); > INIT_WORK_ONSTACK(&args->work, xfs_bmapi_allocate_worker); > queue_work(xfs_alloc_wq, &args->work); > wait_for_completion(&done); > diff --git a/fs/xfs/xfs_bmap.h b/fs/xfs/xfs_bmap.h > index 1cf1292d29b7..035afc71a9ca 100644 > --- a/fs/xfs/xfs_bmap.h > +++ b/fs/xfs/xfs_bmap.h > @@ -130,12 +130,13 @@ typedef struct xfs_bmalloca { > xfs_extlen_t total; /* total blocks needed for xaction */ > xfs_extlen_t minlen; /* minimum allocation size (blocks) */ > xfs_extlen_t minleft; /* amount must be left after alloc */ > - char eof; /* set if allocating past last extent */ > - char wasdel; /* replacing a delayed allocation */ > - char userdata;/* set if is user data */ > - char aeof; /* allocated space at eof */ > - char conv; /* overwriting unwritten extents */ > - char stack_switch; > + bool eof; /* set if allocating past last extent */ > + bool wasdel; /* replacing a delayed allocation */ > + bool userdata;/* set if is user data */ > + bool aeof; /* allocated space at eof */ > + bool conv; /* overwriting unwritten extents */ > + bool stack_switch; > + bool kswapd; /* allocation in kswapd context */ > int flags; > struct completion *done; > struct work_struct work; > -- > 1.9.1 > >
Dave Chinner <david@fromorbit.com> writes: > On Thu, Jun 26, 2014 at 11:36:31AM +0100, Luis Henriques wrote: >> This is a note to let you know that I have just added a patch titled >> >> xfs: block allocation work needs to be kswapd aware >> >> to the linux-3.11.y-queue branch of the 3.11.y.z extended stable tree >> which can be found at: >> >> http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=shortlog;h=refs/heads/linux-3.11.y-queue >> >> If you, or anyone else, feels it should not be added to this tree, please >> reply to this email. >> >> For more information about the 3.11.y.z tree, see >> https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable > > No, do not apply this commit to stable kernels - it is due to be > reverted as is causes major memory reclaim behaviour regressions. > > http://oss.sgi.com/pipermail/xfs/2014-June/036851.html > > Cheers, > > Dave. > Ups, I wasn't aware of that. Thanks a lot, I'll drop this patch from the 3.11 queue. Cheers,
diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c index 05c698ccb238..ed6dc2d02573 100644 --- a/fs/xfs/xfs_bmap.c +++ b/fs/xfs/xfs_bmap.c @@ -4763,14 +4763,23 @@ xfs_bmapi_allocate_worker( struct xfs_bmalloca *args = container_of(work, struct xfs_bmalloca, work); unsigned long pflags; + unsigned long new_pflags = PF_FSTRANS; - /* we are in a transaction context here */ - current_set_flags_nested(&pflags, PF_FSTRANS); + /* + * we are in a transaction context here, but may also be doing work + * in kswapd context, and hence we may need to inherit that state + * temporarily to ensure that we don't block waiting for memory reclaim + * in any way. + */ + if (args->kswapd) + new_pflags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD; + + current_set_flags_nested(&pflags, new_pflags); args->result = __xfs_bmapi_allocate(args); complete(args->done); - current_restore_flags_nested(&pflags, PF_FSTRANS); + current_restore_flags_nested(&pflags, new_pflags); } /* @@ -4789,6 +4798,7 @@ xfs_bmapi_allocate( args->done = &done; + args->kswapd = current_is_kswapd(); INIT_WORK_ONSTACK(&args->work, xfs_bmapi_allocate_worker); queue_work(xfs_alloc_wq, &args->work); wait_for_completion(&done); diff --git a/fs/xfs/xfs_bmap.h b/fs/xfs/xfs_bmap.h index 1cf1292d29b7..035afc71a9ca 100644 --- a/fs/xfs/xfs_bmap.h +++ b/fs/xfs/xfs_bmap.h @@ -130,12 +130,13 @@ typedef struct xfs_bmalloca { xfs_extlen_t total; /* total blocks needed for xaction */ xfs_extlen_t minlen; /* minimum allocation size (blocks) */ xfs_extlen_t minleft; /* amount must be left after alloc */ - char eof; /* set if allocating past last extent */ - char wasdel; /* replacing a delayed allocation */ - char userdata;/* set if is user data */ - char aeof; /* allocated space at eof */ - char conv; /* overwriting unwritten extents */ - char stack_switch; + bool eof; /* set if allocating past last extent */ + bool wasdel; /* replacing a delayed allocation */ + bool userdata;/* set if is user data */ + bool aeof; /* allocated space at eof */ + bool conv; /* overwriting unwritten extents */ + bool stack_switch; + bool kswapd; /* allocation in kswapd context */ int flags; struct completion *done; struct work_struct work;