Patchwork DIO process stuck apparently due to dioread_nolock (3.0)

Submitter Jan Kara
Date Aug. 11, 2011, 2:01 p.m.
Message ID <20110811140101.GA18802@quack.suse.cz>
Permalink /patch/109612/
State Superseded

Comments

Jan Kara - Aug. 11, 2011, 2:01 p.m.
Hello,

On Thu 11-08-11 16:21:42, Michael Tokarev wrote:
> 11.08.2011 15:59, Jan Kara wrote:
> > On Wed 10-08-11 14:51:17, Michael Tokarev wrote:
> >> For a few days I'm evaluating various options to use
> >> storage.  I'm interested in concurrent direct I/O
> >> (oracle rdbms workload).
> >>
> >> I noticed that somehow, ext4fs in mixed read-write
> >> test greatly prefers writes over reads - writes goes
> >> at full speed while reads are almost non-existent.
> >>
> >> Sandeen on IRC pointed me at dioread_nolock mount
> >> option, which I tried with great results, if not
> >> one "but".
> >>
> >> There's a deadlock somewhere, which I can't trigger
> >> "on demand" - I can't hit the right condition.  It
> >> happened twice in a row already, each time after the
> >> same scenario (more about that later).
> >>
> >> When it happens, a process doing direct AIO stalls
> >> infinitely, with the following backtrace:
> >>
> >> [87550.759848] INFO: task oracle:23176 blocked for more than 120 seconds.
> >> [87550.759892] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> >> [87550.759955] oracle          D 0000000000000000     0 23176      1 0x00000000
> >> [87550.760006]  ffff8820457b47d0 0000000000000082 ffff880600000000 ffff881278e3f7d0
> >> [87550.760085]  ffff8806215c1fd8 ffff8806215c1fd8 ffff8806215c1fd8 ffff8820457b47d0
> >> [87550.760163]  ffffea0010bd7c68 ffffffff00000000 ffff882045512ef8 ffffffff810eeda2
> >> [87550.760245] Call Trace:
> >> [87550.760285]  [<ffffffff810eeda2>] ? __do_fault+0x422/0x520
> >> [87550.760327]  [<ffffffff81111ded>] ? kmem_getpages+0x5d/0x170
> >> [87550.760367]  [<ffffffff81112e58>] ? ____cache_alloc_node+0x48/0x140
> >> [87550.760430]  [<ffffffffa0123e6d>] ? ext4_file_write+0x20d/0x260 [ext4]
> >> [87550.760475]  [<ffffffff8106aee0>] ? abort_exclusive_wait+0xb0/0xb0
> >> [87550.760523]  [<ffffffffa0123c60>] ? ext4_llseek+0x120/0x120 [ext4]
> >> [87550.760566]  [<ffffffff81162173>] ? aio_rw_vect_retry+0x73/0x1d0
> >> [87550.760607]  [<ffffffff8116302f>] ? aio_run_iocb+0x5f/0x160
> >> [87550.760646]  [<ffffffff81164258>] ? do_io_submit+0x4f8/0x600
> >> [87550.760689]  [<ffffffff81359b52>] ? system_call_fastpath+0x16/0x1b
> >   Hmm, the stack trace does not quite make sense to me - the part between
> > __do_fault and aio_rw_vect_retry is somehow broken. I can imagine we
> > blocked in ext4_file_write() but I don't see any place there where we would
> > allocate memory. By any chance, are there messages like "Unaligned AIO/DIO
> > on inode ..." in the kernel log?
> 
> Yes, there are warnings about unaligned DIO, referring to this same
> process actually. Oracle does almost good job at aligning writes
> (usually it does i/o by its blocks which are 4Kb by default but
> are set to something larger - like 16Kb - for larger database).
> Except of a few cases, and lgwr process is one of them (*) - it
> writes logfiles using 512b blocks.  This is okay for a raw device
> with 512bytes blocks, but ext4 expects 4k writes at min.
> 
> (*) another case is writing to control file, which is also done in
> 512byte chunks.
  Ah, OK. I think the wait queue handling in ext4_end_io_nolock()
might be racy - the waitqueue_active() check is missing a barrier, I think.
Does the attached (untested) patch fix the issue for you?

								Honza
Michael Tokarev - Aug. 11, 2011, 8:05 p.m.
11.08.2011 18:01, Jan Kara wrote:
[]
>>>> When it happens, a process doing direct AIO stalls
>>>> infinitely, with the following backtrace:
>>>>
>>>> [87550.759848] INFO: task oracle:23176 blocked for more than 120 seconds.
>>>> [87550.759892] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> [87550.759955] oracle          D 0000000000000000     0 23176      1 0x00000000
>>>> [87550.760006]  ffff8820457b47d0 0000000000000082 ffff880600000000 ffff881278e3f7d0
>>>> [87550.760085]  ffff8806215c1fd8 ffff8806215c1fd8 ffff8806215c1fd8 ffff8820457b47d0
>>>> [87550.760163]  ffffea0010bd7c68 ffffffff00000000 ffff882045512ef8 ffffffff810eeda2
>>>> [87550.760245] Call Trace:
>>>> [87550.760285]  [<ffffffff810eeda2>] ? __do_fault+0x422/0x520
>>>> [87550.760327]  [<ffffffff81111ded>] ? kmem_getpages+0x5d/0x170
>>>> [87550.760367]  [<ffffffff81112e58>] ? ____cache_alloc_node+0x48/0x140
>>>> [87550.760430]  [<ffffffffa0123e6d>] ? ext4_file_write+0x20d/0x260 [ext4]
>>>> [87550.760475]  [<ffffffff8106aee0>] ? abort_exclusive_wait+0xb0/0xb0
>>>> [87550.760523]  [<ffffffffa0123c60>] ? ext4_llseek+0x120/0x120 [ext4]
>>>> [87550.760566]  [<ffffffff81162173>] ? aio_rw_vect_retry+0x73/0x1d0
>>>> [87550.760607]  [<ffffffff8116302f>] ? aio_run_iocb+0x5f/0x160
>>>> [87550.760646]  [<ffffffff81164258>] ? do_io_submit+0x4f8/0x600
>>>> [87550.760689]  [<ffffffff81359b52>] ? system_call_fastpath+0x16/0x1b
>>>   Hmm, the stack trace does not quite make sense to me - the part between
>>> __do_fault and aio_rw_vect_retry is somehow broken. I can imagine we
>>> blocked in ext4_file_write() but I don't see any place there where we would
>>> allocate memory. By any chance, are there messages like "Unaligned AIO/DIO
>>> on inode ..." in the kernel log?
>>
>> Yes, there are warnings about unaligned DIO, referring to this same
>> process actually. Oracle does almost good job at aligning writes
>> (usually it does i/o by its blocks which are 4Kb by default but
>> are set to something larger - like 16Kb - for larger database).
>> Except of a few cases, and lgwr process is one of them (*) - it
>> writes logfiles using 512b blocks.  This is okay for a raw device
>> with 512bytes blocks, but ext4 expects 4k writes at min.
>>
>> (*) another case is writing to control file, which is also done in
>> 512byte chunks.
>   Ah, OK. I think the code ext4_end_io_nolock() handling the wait queue
> might be racy. waitqueue_active() check is missing a barrier I think.
> Does attached (untested) patch fix the issue for you?

With this patch applied, I can still reproduce the problem just as easily:

[   76.982985] EXT4-fs (dm-1): Unaligned AIO/DIO on inode 3407879 by oracle; performance will be poor.
[ 1469.734114] SysRq : Show Blocked State
[ 1469.734157]   task                        PC stack   pid father
[ 1469.734473] oracle          D 0000000000000000     0  6146      1 0x00000000
[ 1469.734525]  ffff88103f604810 0000000000000082 ffff881000000000 ffff881079791040
[ 1469.734603]  ffff880432c19fd8 ffff880432c19fd8 ffff880432c19fd8 ffff88103f604810
[ 1469.734681]  ffffea000ec13590 ffffffff00000000 ffff881438c8dad8 ffffffff810eeda2
[ 1469.734760] Call Trace:
[ 1469.734800]  [<ffffffff810eeda2>] ? __do_fault+0x422/0x520
[ 1469.734863]  [<ffffffffa0123e6d>] ? ext4_file_write+0x20d/0x260 [ext4]
[ 1469.734909]  [<ffffffff8106aee0>] ? abort_exclusive_wait+0xb0/0xb0
[ 1469.734956]  [<ffffffffa0123c60>] ? ext4_llseek+0x120/0x120 [ext4]
[ 1469.734999]  [<ffffffff81162173>] ? aio_rw_vect_retry+0x73/0x1d0
[ 1469.735039]  [<ffffffff8116302f>] ? aio_run_iocb+0x5f/0x160
[ 1469.735078]  [<ffffffff81164258>] ? do_io_submit+0x4f8/0x600
[ 1469.735122]  [<ffffffff81359b52>] ? system_call_fastpath+0x16/0x1b

which is exactly the same as with the previous "w" into sysrq-trigger.

So it's something else... :(

Thank you!

/mjt
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiaying Zhang - Aug. 12, 2011, 2:46 a.m.
Hi Michael,

Could you try your test with the patch I just posted:
http://marc.info/?l=linux-ext4&m=131311627101278&w=2
and see whether it fixes the problem?

Thanks!

Jiaying

On Thu, Aug 11, 2011 at 1:05 PM, Michael Tokarev <mjt@tls.msk.ru> wrote:
> 11.08.2011 18:01, Jan Kara wrote:
> []
>>>>> When it happens, a process doing direct AIO stalls
>>>>> infinitely, with the following backtrace:
>>>>>
>>>>> [87550.759848] INFO: task oracle:23176 blocked for more than 120 seconds.
>>>>> [87550.759892] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>>> [87550.759955] oracle          D 0000000000000000     0 23176      1 0x00000000
>>>>> [87550.760006]  ffff8820457b47d0 0000000000000082 ffff880600000000 ffff881278e3f7d0
>>>>> [87550.760085]  ffff8806215c1fd8 ffff8806215c1fd8 ffff8806215c1fd8 ffff8820457b47d0
>>>>> [87550.760163]  ffffea0010bd7c68 ffffffff00000000 ffff882045512ef8 ffffffff810eeda2
>>>>> [87550.760245] Call Trace:
>>>>> [87550.760285]  [<ffffffff810eeda2>] ? __do_fault+0x422/0x520
>>>>> [87550.760327]  [<ffffffff81111ded>] ? kmem_getpages+0x5d/0x170
>>>>> [87550.760367]  [<ffffffff81112e58>] ? ____cache_alloc_node+0x48/0x140
>>>>> [87550.760430]  [<ffffffffa0123e6d>] ? ext4_file_write+0x20d/0x260 [ext4]
>>>>> [87550.760475]  [<ffffffff8106aee0>] ? abort_exclusive_wait+0xb0/0xb0
>>>>> [87550.760523]  [<ffffffffa0123c60>] ? ext4_llseek+0x120/0x120 [ext4]
>>>>> [87550.760566]  [<ffffffff81162173>] ? aio_rw_vect_retry+0x73/0x1d0
>>>>> [87550.760607]  [<ffffffff8116302f>] ? aio_run_iocb+0x5f/0x160
>>>>> [87550.760646]  [<ffffffff81164258>] ? do_io_submit+0x4f8/0x600
>>>>> [87550.760689]  [<ffffffff81359b52>] ? system_call_fastpath+0x16/0x1b
>>>>   Hmm, the stack trace does not quite make sense to me - the part between
>>>> __do_fault and aio_rw_vect_retry is somehow broken. I can imagine we
>>>> blocked in ext4_file_write() but I don't see any place there where we would
>>>> allocate memory. By any chance, are there messages like "Unaligned AIO/DIO
>>>> on inode ..." in the kernel log?
>>>
>>> Yes, there are warnings about unaligned DIO, referring to this same
>>> process actually. Oracle does almost good job at aligning writes
>>> (usually it does i/o by its blocks which are 4Kb by default but
>>> are set to something larger - like 16Kb - for larger database).
>>> Except of a few cases, and lgwr process is one of them (*) - it
>>> writes logfiles using 512b blocks.  This is okay for a raw device
>>> with 512bytes blocks, but ext4 expects 4k writes at min.
>>>
>>> (*) another case is writing to control file, which is also done in
>>> 512byte chunks.
>>   Ah, OK. I think the code ext4_end_io_nolock() handling the wait queue
>> might be racy. waitqueue_active() check is missing a barrier I think.
>> Does attached (untested) patch fix the issue for you?
>
> With this patch applied, I can reproduce the problem easily too:
>
> [   76.982985] EXT4-fs (dm-1): Unaligned AIO/DIO on inode 3407879 by oracle; performance will be poor.
> [ 1469.734114] SysRq : Show Blocked State
> [ 1469.734157]   task                        PC stack   pid father
> [ 1469.734473] oracle          D 0000000000000000     0  6146      1 0x00000000
> [ 1469.734525]  ffff88103f604810 0000000000000082 ffff881000000000 ffff881079791040
> [ 1469.734603]  ffff880432c19fd8 ffff880432c19fd8 ffff880432c19fd8 ffff88103f604810
> [ 1469.734681]  ffffea000ec13590 ffffffff00000000 ffff881438c8dad8 ffffffff810eeda2
> [ 1469.734760] Call Trace:
> [ 1469.734800]  [<ffffffff810eeda2>] ? __do_fault+0x422/0x520
> [ 1469.734863]  [<ffffffffa0123e6d>] ? ext4_file_write+0x20d/0x260 [ext4]
> [ 1469.734909]  [<ffffffff8106aee0>] ? abort_exclusive_wait+0xb0/0xb0
> [ 1469.734956]  [<ffffffffa0123c60>] ? ext4_llseek+0x120/0x120 [ext4]
> [ 1469.734999]  [<ffffffff81162173>] ? aio_rw_vect_retry+0x73/0x1d0
> [ 1469.735039]  [<ffffffff8116302f>] ? aio_run_iocb+0x5f/0x160
> [ 1469.735078]  [<ffffffff81164258>] ? do_io_submit+0x4f8/0x600
> [ 1469.735122]  [<ffffffff81359b52>] ? system_call_fastpath+0x16/0x1b
>
> which is exactly the same as with previous "w" into sysrq-trigger.
>
> So it's something else... :(
>
> Thank you!
>
> /mjt
Michael Tokarev - Aug. 12, 2011, 6:23 a.m.
12.08.2011 06:46, Jiaying Zhang wrote:
> Hi Michael,
> 
> Could you try your test with the patch I just posted:
> http://marc.info/?l=linux-ext4&m=131311627101278&w=2
> and see whether it fixes the problem?

No, it does not.  I'm now able to trigger it more or
less reliably - I need to decompress a relatively
small (about 70GB) oracle database and try to start
it (this requires a rebuild of the initrd and a reboot of course --
the whole thing takes about 15 minutes) - and I see this:

[  945.729965] EXT4-fs (sda11): Unaligned AIO/DIO on inode 5767175 by oracle; performance will be poor.
[  960.915602] SysRq : Show Blocked State
[  960.915650]   task                        PC stack   pid father
[  960.915852] oracle          D 0000000000000000     0  4985      1 0x00000000
[  960.915909]  ffff88103e627040 0000000000000082 ffff881000000000 ffff881078e3f7d0
[  960.917855]  ffff88103f88ffd8 ffff88103f88ffd8 ffff88103f88ffd8 ffff88103e627040
[  960.917953]  0000000001c08400 ffff88203e98c948 ffff88207873e240 ffffffff813527c6
[  960.918045] Call Trace:
[  960.918092]  [<ffffffff813527c6>] ? printk+0x43/0x48
[  960.918153]  [<ffffffffa01432a8>] ? ext4_msg+0x58/0x60 [ext4]
[  960.918201]  [<ffffffffa0123e6d>] ? ext4_file_write+0x20d/0x260 [ext4]
[  960.918252]  [<ffffffff8106aee0>] ? abort_exclusive_wait+0xb0/0xb0
[  960.918301]  [<ffffffffa0123c60>] ? ext4_llseek+0x120/0x120 [ext4]
[  960.918348]  [<ffffffff81162173>] ? aio_rw_vect_retry+0x73/0x1d0
[  960.918392]  [<ffffffff8116302f>] ? aio_run_iocb+0x5f/0x160
[  960.918436]  [<ffffffff81164258>] ? do_io_submit+0x4f8/0x600
[  960.918483]  [<ffffffff81359b52>] ? system_call_fastpath+0x16/0x1b

(Inum 5767175 is one of oracle redologs).

Jan, I lied to you initially - I didn't actually test your first patch,
because I had loaded the wrong initrd.  With it applied, the situation
still does not improve, and it looks exactly the same as with this new
patch by Jiaying Zhang:

[  415.288104] EXT4-fs (sda11): Unaligned AIO/DIO on inode 5767177 by oracle; performance will be poor.
[  422.967323] SysRq : Show Blocked State
[  422.967494]   task                        PC stack   pid father
[  422.967872] oracle          D 0000000000000000     0  3743      1 0x00000000
[  422.968112]  ffff88203e5a2810 0000000000000086 ffff882000000000 ffff88103f403080
[  422.968505]  ffff88203eeddfd8 ffff88203eeddfd8 ffff88203eeddfd8 ffff88203e5a2810
[  422.968895]  0000000001c08400 ffff88103f3db348 ffff88103f2fe800 ffffffff813527c6
[  422.969287] Call Trace:
[  422.969397]  [<ffffffff813527c6>] ? printk+0x43/0x48
[  422.969528]  [<ffffffffa0143288>] ? ext4_msg+0x58/0x60 [ext4]
[  422.969643]  [<ffffffffa0123e6d>] ? ext4_file_write+0x20d/0x260 [ext4]
[  422.969758]  [<ffffffff8106aee0>] ? abort_exclusive_wait+0xb0/0xb0
[  422.969873]  [<ffffffffa0123c60>] ? ext4_llseek+0x120/0x120 [ext4]
[  422.969985]  [<ffffffff81162173>] ? aio_rw_vect_retry+0x73/0x1d0
[  422.970093]  [<ffffffff8116302f>] ? aio_run_iocb+0x5f/0x160
[  422.970200]  [<ffffffff81164258>] ? do_io_submit+0x4f8/0x600
[  422.970312]  [<ffffffff81359b52>] ? system_call_fastpath+0x16/0x1b

Note that in both of these cases I now see a slightly different
backtrace -- both mention ext4_llseek and abort_exclusive_wait,
but the rest is different:

>> [   76.982985] EXT4-fs (dm-1): Unaligned AIO/DIO on inode 3407879 by oracle; performance will be poor.
>> [ 1469.734114] SysRq : Show Blocked State
>> [ 1469.734157]   task                        PC stack   pid father
>> [ 1469.734473] oracle          D 0000000000000000     0  6146      1 0x00000000
>> [ 1469.734525]  ffff88103f604810 0000000000000082 ffff881000000000 ffff881079791040
>> [ 1469.734603]  ffff880432c19fd8 ffff880432c19fd8 ffff880432c19fd8 ffff88103f604810
>> [ 1469.734681]  ffffea000ec13590 ffffffff00000000 ffff881438c8dad8 ffffffff810eeda2
>> [ 1469.734760] Call Trace:
>> [ 1469.734800]  [<ffffffff810eeda2>] ? __do_fault+0x422/0x520
>> [ 1469.734863]  [<ffffffffa0123e6d>] ? ext4_file_write+0x20d/0x260 [ext4]
>> [ 1469.734909]  [<ffffffff8106aee0>] ? abort_exclusive_wait+0xb0/0xb0
>> [ 1469.734956]  [<ffffffffa0123c60>] ? ext4_llseek+0x120/0x120 [ext4]
>> [ 1469.734999]  [<ffffffff81162173>] ? aio_rw_vect_retry+0x73/0x1d0
>> [ 1469.735039]  [<ffffffff8116302f>] ? aio_run_iocb+0x5f/0x160
>> [ 1469.735078]  [<ffffffff81164258>] ? do_io_submit+0x4f8/0x600
>> [ 1469.735122]  [<ffffffff81359b52>] ? system_call_fastpath+0x16/0x1b

As Jan already pointed out, this place looks bogus, and
the same can be said about the new backtrace.  So I wonder
if there's some stack corruption going on there as well.

Btw, does ext4_llseek() look sane here?  Note it's called from
io_submit() -- does that path _ever_ implement SEEKs?

Maybe some debugging is necessary here?

Thank you!

/mjt
Michael Tokarev - Aug. 12, 2011, 7:07 a.m.
12.08.2011 10:23, Michael Tokarev wrote:
> 12.08.2011 06:46, Jiaying Zhang wrote:
>> Hi Michael,
>>
>> Could you try your test with the patch I just posted:
>> http://marc.info/?l=linux-ext4&m=131311627101278&w=2
>> and see whether it fixes the problem?
> 
> No it does not.  I'm now able to trigger it more or
> less reliable - I need to decompress a relatively
> small (about 70Gb) oracle database and try to start
> it (this requires a rebuild of initrd and reboot ofcourse --
> whole thing takes about 15 minutes) - and I see this:

And I forgot to note an important (in my opinion, anyway)
detail: I can only trigger the issue after copying the
database files, i.e. with a hot cache.  After a reboot, when
the files are not in the cache, I can start this database
without any issues whatsoever, and it passes all our
stress tests just fine.  So it appears the problem only
happens when the cache is hot - maybe DIO is fighting
with the in-memory cache of the file or some such, or maybe
with a not-yet-written file (since the amount of memory on
this machine is large, the whole database with all its files
fits in memory).  Next time I'll try to ensure that only the
(problematic and small) logfiles are hot in the cache.

Thanks,

/mjt

> [  945.729965] EXT4-fs (sda11): Unaligned AIO/DIO on inode 5767175 by oracle; performance will be poor.
> [  960.915602] SysRq : Show Blocked State
> [  960.915650]   task                        PC stack   pid father
> [  960.915852] oracle          D 0000000000000000     0  4985      1 0x00000000
> [  960.915909]  ffff88103e627040 0000000000000082 ffff881000000000 ffff881078e3f7d0
> [  960.917855]  ffff88103f88ffd8 ffff88103f88ffd8 ffff88103f88ffd8 ffff88103e627040
> [  960.917953]  0000000001c08400 ffff88203e98c948 ffff88207873e240 ffffffff813527c6
> [  960.918045] Call Trace:
> [  960.918092]  [<ffffffff813527c6>] ? printk+0x43/0x48
> [  960.918153]  [<ffffffffa01432a8>] ? ext4_msg+0x58/0x60 [ext4]
> [  960.918201]  [<ffffffffa0123e6d>] ? ext4_file_write+0x20d/0x260 [ext4]
> [  960.918252]  [<ffffffff8106aee0>] ? abort_exclusive_wait+0xb0/0xb0
> [  960.918301]  [<ffffffffa0123c60>] ? ext4_llseek+0x120/0x120 [ext4]
> [  960.918348]  [<ffffffff81162173>] ? aio_rw_vect_retry+0x73/0x1d0
> [  960.918392]  [<ffffffff8116302f>] ? aio_run_iocb+0x5f/0x160
> [  960.918436]  [<ffffffff81164258>] ? do_io_submit+0x4f8/0x600
> [  960.918483]  [<ffffffff81359b52>] ? system_call_fastpath+0x16/0x1b
> 
> (Inum 5767175 is one of oracle redologs).
> 
> Jan, I lied to you initially - I didn't even test your first patch,
> because I loaded the wrong initrd.  With it applied, situation does
> not improve still, and it looks exactly the same as with this new
> patch by Jiaying Zhang:
> 
> [  415.288104] EXT4-fs (sda11): Unaligned AIO/DIO on inode 5767177 by oracle; performance will be poor.
> [  422.967323] SysRq : Show Blocked State
> [  422.967494]   task                        PC stack   pid father
> [  422.967872] oracle          D 0000000000000000     0  3743      1 0x00000000
> [  422.968112]  ffff88203e5a2810 0000000000000086 ffff882000000000 ffff88103f403080
> [  422.968505]  ffff88203eeddfd8 ffff88203eeddfd8 ffff88203eeddfd8 ffff88203e5a2810
> [  422.968895]  0000000001c08400 ffff88103f3db348 ffff88103f2fe800 ffffffff813527c6
> [  422.969287] Call Trace:
> [  422.969397]  [<ffffffff813527c6>] ? printk+0x43/0x48
> [  422.969528]  [<ffffffffa0143288>] ? ext4_msg+0x58/0x60 [ext4]
> [  422.969643]  [<ffffffffa0123e6d>] ? ext4_file_write+0x20d/0x260 [ext4]
> [  422.969758]  [<ffffffff8106aee0>] ? abort_exclusive_wait+0xb0/0xb0
> [  422.969873]  [<ffffffffa0123c60>] ? ext4_llseek+0x120/0x120 [ext4]
> [  422.969985]  [<ffffffff81162173>] ? aio_rw_vect_retry+0x73/0x1d0
> [  422.970093]  [<ffffffff8116302f>] ? aio_run_iocb+0x5f/0x160
> [  422.970200]  [<ffffffff81164258>] ? do_io_submit+0x4f8/0x600
> [  422.970312]  [<ffffffff81359b52>] ? system_call_fastpath+0x16/0x1b
> 
> Note in both this cases, I now see slightly different
> backtrace -- both mentions ext4_llseek and abort_exclusive_wait,
> but the rest is different:
> 
>>> [   76.982985] EXT4-fs (dm-1): Unaligned AIO/DIO on inode 3407879 by oracle; performance will be poor.
>>> [ 1469.734114] SysRq : Show Blocked State
>>> [ 1469.734157]   task                        PC stack   pid father
>>> [ 1469.734473] oracle          D 0000000000000000     0  6146      1 0x00000000
>>> [ 1469.734525]  ffff88103f604810 0000000000000082 ffff881000000000 ffff881079791040
>>> [ 1469.734603]  ffff880432c19fd8 ffff880432c19fd8 ffff880432c19fd8 ffff88103f604810
>>> [ 1469.734681]  ffffea000ec13590 ffffffff00000000 ffff881438c8dad8 ffffffff810eeda2
>>> [ 1469.734760] Call Trace:
>>> [ 1469.734800]  [<ffffffff810eeda2>] ? __do_fault+0x422/0x520
>>> [ 1469.734863]  [<ffffffffa0123e6d>] ? ext4_file_write+0x20d/0x260 [ext4]
>>> [ 1469.734909]  [<ffffffff8106aee0>] ? abort_exclusive_wait+0xb0/0xb0
>>> [ 1469.734956]  [<ffffffffa0123c60>] ? ext4_llseek+0x120/0x120 [ext4]
>>> [ 1469.734999]  [<ffffffff81162173>] ? aio_rw_vect_retry+0x73/0x1d0
>>> [ 1469.735039]  [<ffffffff8116302f>] ? aio_run_iocb+0x5f/0x160
>>> [ 1469.735078]  [<ffffffff81164258>] ? do_io_submit+0x4f8/0x600
>>> [ 1469.735122]  [<ffffffff81359b52>] ? system_call_fastpath+0x16/0x1b
> 
> As Jan already pointed out, this place looks bogus, and
> the same can be said about the new backtrace.  So I wonder
> if there's some stack corruption going on there as well.
> 
> Btw, does ext4_llseek() look sane here?  Note it's called from
> aio_submit() -- does it _ever_ implement SEEKs?
> 
> Maybe some debugging is neecessary here?
> 
> Thank you!
> 
> /mjt


Patch

From a5dd84bbe3c55b2717150ac26f8b9011d8f9181f Mon Sep 17 00:00:00 2001
From: Jan Kara <jack@suse.cz>
Date: Thu, 11 Aug 2011 15:57:51 +0200
Subject: [PATCH] ext4: Fix missed wakeups

Signed-off-by: Jan Kara <jack@suse.cz>
---
 fs/ext4/page-io.c |   15 +++++++++------
 1 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/fs/ext4/page-io.c b/fs/ext4/page-io.c
index 430c401..34d01d4 100644
--- a/fs/ext4/page-io.c
+++ b/fs/ext4/page-io.c
@@ -79,9 +79,11 @@  void ext4_free_io_end(ext4_io_end_t *io)
 		put_io_page(io->pages[i]);
 	io->num_io_pages = 0;
 	wq = ext4_ioend_wq(io->inode);
-	if (atomic_dec_and_test(&EXT4_I(io->inode)->i_ioend_count) &&
-	    waitqueue_active(wq))
-		wake_up_all(wq);
+	if (atomic_dec_and_test(&EXT4_I(io->inode)->i_ioend_count)) {
+		smp_mb__after_atomic_dec();
+		if (waitqueue_active(wq))
+			wake_up_all(wq);
+	}
 	kmem_cache_free(io_end_cachep, io);
 }
 
@@ -122,9 +124,10 @@  int ext4_end_io_nolock(ext4_io_end_t *io)
 		io->flag &= ~EXT4_IO_END_UNWRITTEN;
 		/* Wake up anyone waiting on unwritten extent conversion */
 		wq = ext4_ioend_wq(io->inode);
-		if (atomic_dec_and_test(&EXT4_I(inode)->i_aiodio_unwritten) &&
-		    waitqueue_active(wq)) {
-			wake_up_all(wq);
+		if (atomic_dec_and_test(&EXT4_I(inode)->i_aiodio_unwritten)) {
+			smp_mb__after_atomic_dec();
+			if (waitqueue_active(wq))
+				wake_up_all(wq);
 		}
 	}
 
-- 
1.7.1