| Message ID | 20130411170402.GB11656@suse.de |
| --- | --- |
| State | Accepted, archived |
On Thu, Apr 11, 2013 at 06:04:02PM +0100, Mel Gorman wrote:
> > If we're stalling on lock_buffer(), that implies that buffer was being
> > written, and for some reason it was taking a very long time to
> > complete.
>
> Yes.
>
> > It might be worthwhile to put a timestamp in struct dm_crypt_io, and
> > record the time when a particular I/O encryption/decryption is getting
> > queued to the kcryptd workqueues, and when they finally squirt out.
>
> That somewhat assumes that dm_crypt was at fault, which is not
> unreasonable, but I was skeptical as the workload on dm_crypt was
> opening a maildir and mostly reads.

Hmm... well, I've reviewed all of the places in the ext4 and jbd2 layer
where we call lock_buffer(), and with one exception[1] we're not holding
the bh locked any longer than necessary. There are a few places where we
grab a spinlock or two before we can do what we need to do and then
release the locked buffer head, but the only time we hold the bh locked
for long periods of time is when we submit metadata blocks for I/O.

[1] There is one exception in ext4_xattr_release_block() where I believe
we should move the call to unlock_buffer(bh) before the call to
ext4_free_blocks(), since we've already elevated the bh count and
ext4_free_blocks() does not need to have the bh locked. It's not related
to any of the stalls you've reported, though, as near as I can tell
(none of the stack traces include the ext4 xattr code, and this would
only affect external extended attribute blocks).

Could you write some code which checks the hold time of lock_buffer(),
measuring from when the lock is successfully grabbed, to see whether I
missed some code path in ext4 or jbd2 where the bh is locked and then
some function is called which needs to block for some random reason?
What I'd suggest is putting a timestamp in the buffer_head structure,
which is set by lock_buffer() once it has successfully grabbed the lock,
and then in unlock_buffer(), if the lock has been held for more than a
second or some such, dumping out the stack trace.

Because at this point, either I'm missing something or I'm beginning to
suspect that your hard drive (or maybe something in the block layer?) is
simply taking a long time to service an I/O request. Putting in this
check should very quickly determine which code path and/or which
subsystem we should be focused upon.

Thanks,

- Ted
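A minimal sketch of the instrumentation suggested above, assuming a
debug-only lock_time field added to struct buffer_head; the field name,
the one-second threshold, and the open-coded unlock path are
illustrative rather than taken from any posted patch:

    #include <linux/buffer_head.h>
    #include <linux/jiffies.h>

    void lock_buffer(struct buffer_head *bh)
    {
            might_sleep();
            if (!trylock_buffer(bh))
                    __lock_buffer(bh);
            bh->lock_time = jiffies;        /* hypothetical debug-only field */
    }

    void unlock_buffer(struct buffer_head *bh)
    {
            /* Complain if the buffer lock was held for more than a second. */
            if (time_after(jiffies, bh->lock_time + HZ)) {
                    pr_warn("bh %p locked for %u ms\n", bh,
                            jiffies_to_msecs(jiffies - bh->lock_time));
                    dump_stack();
            }
            clear_bit_unlock(BH_Lock, &bh->b_state);
            smp_mb__after_clear_bit();
            wake_up_bit(&bh->b_state, BH_Lock);
    }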
On Thu 11-04-13 14:35:12, Ted Tso wrote:
> On Thu, Apr 11, 2013 at 06:04:02PM +0100, Mel Gorman wrote:
[snip]
> What I'd suggest is putting a timestamp in the buffer_head structure,
> which is set by lock_buffer() once it has successfully grabbed the lock,
> and then in unlock_buffer(), if the lock has been held for more than a
> second or some such, dumping out the stack trace.
>
> Because at this point, either I'm missing something or I'm beginning to
> suspect that your hard drive (or maybe something in the block layer?) is
> simply taking a long time to service an I/O request. Putting in this
> check should very quickly determine which code path and/or which
> subsystem we should be focused upon.

I think it might be more enlightening if Mel traced which process in
which function is holding the buffer lock. I suspect we'll find out that
the flusher thread has submitted the buffer for IO as an async write and
thus it takes a long time to complete in the presence of reads, which
have higher priority.

Honza
On Thu, Apr 11, 2013 at 11:33:35PM +0200, Jan Kara wrote:
> I think it might be more enlightening if Mel traced which process in
> which function is holding the buffer lock. I suspect we'll find out that
> the flusher thread has submitted the buffer for IO as an async write and
> thus it takes a long time to complete in the presence of reads, which
> have higher priority.

That's an interesting theory. If the workload is one which is very heavy
on reads and writes, that could explain the high latency. It would also
explain why those of us who are using primarily SSDs aren't seeing the
problems, because the would-be blocking reads are nice and fast.

If that is the case, one possible solution that comes to mind would be
to mark buffer_heads that contain metadata with a flag, so that the
flusher thread can write them back at the same priority as reads.

The only problem I can see with this hypothesis is that if this is the
explanation for what Mel and Jiri are seeing, it's something that would
have been around for a long time, and would affect ext3 as well as ext4.
That isn't quite consistent, however, with Mel's observation that this
is a problem which has gotten worse relatively recently.

- Ted
On Thu, Apr 11, 2013 at 10:57:08PM -0400, Theodore Ts'o wrote:
> On Thu, Apr 11, 2013 at 11:33:35PM +0200, Jan Kara wrote:
> > I think it might be more enlightening if Mel traced which process in
> > which function is holding the buffer lock. I suspect we'll find out that
> > the flusher thread has submitted the buffer for IO as an async write and
> > thus it takes a long time to complete in the presence of reads, which
> > have higher priority.
>
> That's an interesting theory. If the workload is one which is very heavy
> on reads and writes, that could explain the high latency. It would also
> explain why those of us who are using primarily SSDs aren't seeing the
> problems, because the would-be blocking reads are nice and fast.
>
> If that is the case, one possible solution that comes to mind would be
> to mark buffer_heads that contain metadata with a flag, so that the
> flusher thread can write them back at the same priority as reads.

Ext4 is already using REQ_META for this purpose.

I'm surprised that no-one has suggested "change the IO elevator"
yet.....

Cheers,

Dave.
On Thu, Apr 11, 2013 at 02:35:12PM -0400, Theodore Ts'o wrote:
[snip]
> Hmm... well, I've reviewed all of the places in the ext4 and jbd2 layer
> where we call lock_buffer(), and with one exception[1] we're not holding
> the bh locked any longer than necessary. There are a few places where we
> grab a spinlock or two before we can do what we need to do and then
> release the locked buffer head, but the only time we hold the bh locked
> for long periods of time is when we submit metadata blocks for I/O.

Yeah, ok. This is not the answer I was hoping for but it's the answer I
expected.

> Could you write some code which checks the hold time of lock_buffer(),
> measuring from when the lock is successfully grabbed, to see whether I
> missed some code path in ext4 or jbd2 where the bh is locked and then
> some function is called which needs to block for some random reason?
>
> What I'd suggest is putting a timestamp in the buffer_head structure,
> which is set by lock_buffer() once it has successfully grabbed the lock,
> and then in unlock_buffer(), if the lock has been held for more than a
> second or some such, dumping out the stack trace.

I can do that but the results might lack meaning. What I could do
instead is use a variation of the page owner tracking patch (current
iteration at https://lkml.org/lkml/2012/12/7/487) to record a stack
trace in lock_buffer and dump it from jbd2/transaction.c if it stalls
for too long. I'll report back if I find something useful.

> Because at this point, either I'm missing something or I'm beginning to
> suspect that your hard drive (or maybe something in the block layer?) is
> simply taking a long time to service an I/O request.

It could be because the drive is a piece of crap.
On Thu, Apr 11, 2013 at 10:57:08PM -0400, Theodore Ts'o wrote:
> On Thu, Apr 11, 2013 at 11:33:35PM +0200, Jan Kara wrote:
[snip]
> The only problem I can see with this hypothesis is that if this is the
> explanation for what Mel and Jiri are seeing, it's something that would
> have been around for a long time, and would affect ext3 as well as ext4.
> That isn't quite consistent, however, with Mel's observation that this
> is a problem which has gotten worse relatively recently.

According to the tests I've run, multi-second stalls have been a problem
for a while but never really bothered me. I'm not sure why it felt
particularly bad around -rc2 or why it seems to be better now. Maybe I
just had my cranky pants on.
Hi all,

On Thursday 11 April 2013 22:57:08 Theodore Ts'o wrote:
> That's an interesting theory. If the workload is one which is very heavy
> on reads and writes, that could explain the high latency. It would also
> explain why those of us who are using primarily SSDs aren't seeing the
> problems, because the would-be blocking reads are nice and fast.
>
> If that is the case, one possible solution that comes to mind would be
> to mark buffer_heads that contain metadata with a flag, so that the
> flusher thread can write them back at the same priority as reads.
>
> The only problem I can see with this hypothesis is that if this is the
> explanation for what Mel and Jiri are seeing, it's something that would
> have been around for a long time, and would affect ext3 as well as ext4.
> That isn't quite consistent, however, with Mel's observation that this
> is a problem which has gotten worse relatively recently.

Dropping in as a casual observer, having missed the start of the thread
and risking that I will just muddy the waters for you.

I had a similar problem for quite a while with ext4; at least that was
my conclusion, since the fix was to migrate one filesystem to xfs, which
sorted it out for me. The time period when I observed this was between
the 3.5 and 3.7 kernels.

The situation was that I had an ext4 filesystem (on top of LVM, which
was on top of MD RAID 1, which was on top of two mechanical hard drives)
which was dedicated to holding a large SVN check-out. Other filesystems
were also ext4 on different logical volumes (but the same spindles).

Symptoms were long stalls of everything (including window management!)
on a relatively heavily loaded desktop (which was KDE). Stalls would
last anything from five to maybe even 30 seconds. Not sure exactly, but
long enough that you think the system has actually crashed. I couldn't
even switch away to a different virtual terminal during the stall,
nothing.

Eventually I traced it down to kdesvn (a subversion client) periodically
refreshing (or something) its metadata and hence generating some IO on
that dedicated filesystem. That, combined with some other desktop
activity, had the effect of stalling everything else. I thought it was
very weird, but I suppose KDE and all the rest nowadays do too much IO
in everything they do.

Following a hunch, I reformatted that filesystem as XFS, which fixed the
problem. I can't reproduce this now to run any tests, so I know this is
not very helpful. But perhaps some of the info will be useful to
someone.

Tvrtko
On Fri, Apr 12, 2013 at 02:50:42PM +1000, Dave Chinner wrote:
> > If that is the case, one possible solution that comes to mind would be
> > to mark buffer_heads that contain metadata with a flag, so that the
> > flusher thread can write them back at the same priority as reads.
>
> Ext4 is already using REQ_META for this purpose.

We're using REQ_META | REQ_PRIO for reads, not writes.

> I'm surprised that no-one has suggested "change the IO elevator"
> yet.....

Well, testing to see if the stalls go away with the noop scheduler is a
good thing to try just to validate the theory.

The thing is, we do want to make ext4 work well with cfq, and
prioritizing non-readahead read requests ahead of data writeback does
make sense. The issue is that metadata writes going through the block
device could in some cases effectively cause a priority inversion when
what had previously been an asynchronous writeback starts blocking a
foreground, user-visible process.

At least, that's the theory; we should confirm that this is indeed what
is causing the data stalls which Mel is reporting on HDDs before we
start figuring out how to fix this problem.

- Ted
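For context, the REQ_META | REQ_PRIO tagging for metadata reads looks
roughly like the sketch below. read_meta_block() is an illustrative
helper, not an ext4 function; ext4 applies the same flag combination at
its own call sites:

    #include <linux/buffer_head.h>

    /*
     * Rough illustration of how a 3.x-era filesystem tags a synchronous
     * metadata read so the I/O scheduler can prioritise it.
     */
    static struct buffer_head *read_meta_block(struct super_block *sb,
                                               sector_t block)
    {
            struct buffer_head *bh = sb_getblk(sb, block);

            if (!bh)
                    return NULL;
            if (!buffer_uptodate(bh)) {
                    /* Tag as metadata and high priority for the elevator. */
                    ll_rw_block(READ | REQ_META | REQ_PRIO, 1, &bh);
                    wait_on_buffer(bh);
                    if (!buffer_uptodate(bh)) {
                            brelse(bh);
                            return NULL;
                    }
            }
            return bh;
    }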
On Fri, Apr 12, 2013 at 11:19:52AM -0400, Theodore Ts'o wrote:
> On Fri, Apr 12, 2013 at 02:50:42PM +1000, Dave Chinner wrote:
> > > If that is the case, one possible solution that comes to mind would be
> > > to mark buffer_heads that contain metadata with a flag, so that the
> > > flusher thread can write them back at the same priority as reads.
> >
> > Ext4 is already using REQ_META for this purpose.
>
> We're using REQ_META | REQ_PRIO for reads, not writes.
>
> > I'm surprised that no-one has suggested "change the IO elevator"
> > yet.....
>
> Well, testing to see if the stalls go away with the noop scheduler is a
> good thing to try just to validate the theory.

Exactly.

> The thing is, we do want to make ext4 work well with cfq, and
> prioritizing non-readahead read requests ahead of data writeback does
> make sense. The issue is that metadata writes going through the block
> device could in some cases effectively cause a priority inversion when
> what had previously been an asynchronous writeback starts blocking a
> foreground, user-visible process.

Here's the historic problem with CFQ: its scheduling algorithms change
from release to release, so what you tune the filesystem to for this
release is likely to cause different behaviour in a few releases time.
We've had this problem time and time again with CFQ+XFS, so we stopped
trying to "tune" to a particular elevator long ago. The best you can do
is tag the IO as appropriately as possible (e.g. metadata with REQ_META,
sync IO with ?_SYNC, etc), and then hope CFQ hasn't been broken since
the last release....

> At least, that's the theory; we should confirm that this is indeed what
> is causing the data stalls which Mel is reporting on HDDs before we
> start figuring out how to fix this problem.

*nod*.

Cheers,

Dave.
As an update to this thread, we brought up this issue at LSF/MM, and
there is a thought that we should be able to solve this problem by
having lock_buffer() check whether the buffer is locked because a write
has been queued, and if so, have the priority of that write bumped up in
the write queues to resolve the priority inversion. I believe Jeff Moyer
was going to look into this, if I remember correctly.

An alternate solution which I've been playing around with adds
buffer_head flags so we can indicate that a buffer contains metadata
and/or should have I/O submitted with the REQ_PRIO flag set. Adding a
buffer_head flag for at least BH_Meta is probably a good thing, since
that way the blktrace will be properly annotated. Whether we should keep
the BH_Prio flag or rely on lock_buffer() automatically raising the
priority is an open question; my feeling is that if lock_buffer() can do
the right thing, we should probably do it via lock_buffer(). I have a
feeling this might be decidedly non-trivial, though, so perhaps we
should just do it via BH flags?

- Ted
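A sketch of what the BH-flag approach could look like, assuming new
BH_Meta/BH_Prio state bits plus a wrapper that translates them into
request flags at submission time. The enum values and the wrapper name
are illustrative; the actual patches (linked later in the thread) fold
the check into submit_bh() itself:

    #include <linux/buffer_head.h>

    /* New buffer_head state bits; BH_PrivateStart is the first bit
     * available for such additions. */
    enum {
            BH_Meta = BH_PrivateStart,      /* buffer contains metadata */
            BH_Prio,                        /* submit with REQ_PRIO */
    };

    BUFFER_FNS(Meta, meta)  /* generates set_buffer_meta()/buffer_meta()... */
    BUFFER_FNS(Prio, prio)

    static int submit_bh_annotated(int rw, struct buffer_head *bh)
    {
            /* Let the bh carry its own I/O annotations to the elevator,
             * so async writeback of marked buffers competes with reads. */
            if (buffer_meta(bh))
                    rw |= REQ_META;
            if (buffer_prio(bh))
                    rw |= REQ_PRIO;
            return submit_bh(rw, bh);
    }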
(Adding Jeff Moyer to the cc as I'm told he is interested in the blktrace)

On Fri, Apr 12, 2013 at 11:19:52AM -0400, Theodore Ts'o wrote:
> On Fri, Apr 12, 2013 at 02:50:42PM +1000, Dave Chinner wrote:
> > > If that is the case, one possible solution that comes to mind would be
> > > to mark buffer_heads that contain metadata with a flag, so that the
> > > flusher thread can write them back at the same priority as reads.
> >
> > Ext4 is already using REQ_META for this purpose.
>
> We're using REQ_META | REQ_PRIO for reads, not writes.
>
> > I'm surprised that no-one has suggested "change the IO elevator"
> > yet.....
>
> Well, testing to see if the stalls go away with the noop scheduler is a
> good thing to try just to validate the theory.

I still haven't tested with a different elevator. While this bug is
relatively high priority for me, there are still other issues in the
way.

TLDR: The flusher writes pages very quickly after processes dirty a
buffer, and reads starve those flusher writes.

Now the ugliness and being a windbag. I collected blktrace and some
other logs; they are available at
http://www.csn.ul.ie/~mel/postings/stalls-20130419/log.tar.gz and there
is a lot of stuff in there. The unix time the test started is in the
first line of the file tests-timestamp-bisect; this can help figure out
how far into the test some of the other timestamped logs are. The kernel
log with information from the lock_buffer debugging patch is in
dmesg-bisect-gitcheckout. The information in it is race-prone and cannot
be 100% trusted but it's still useful. iostat is in
iostat-bisect-gitcheckout.

Here are a few observations I got from the data.

1. Flushers appear to acquire buffer locks *very* quickly after an
application writes. Look for lines that look like "foo failed trylock
without holder released 0 ms ago acquired 0 ms ago by bar". There are a
lot of entries like this:

jbd2 239 flush-8:0 failed trylock without holder released 0 ms ago acquired 0 ms ago by 2124 tar
jbd2 239 flush-8:0 failed trylock without holder released 0 ms ago acquired 0 ms ago by 2124 tar
jbd2 2124 tar failed trylock without holder released 0 ms ago acquired 0 ms ago by 239 flush-8:0
jbd2 239 flush-8:0 failed trylock without holder released 0 ms ago acquired 0 ms ago by 2124 tar
jbd2 239 flush-8:0 failed trylock without holder released 0 ms ago acquired 0 ms ago by 2124 tar
jbd2 239 flush-8:0 failed trylock without holder released 0 ms ago acquired 0 ms ago by 2124 tar
jbd2 239 flush-8:0 failed trylock without holder released 0 ms ago acquired 0 ms ago by 2124 tar

I expected flushers to be writing back the buffers just released in
about 5 seconds time, not immediately. It may indicate that when
flushers wake to clean expired inodes, they keep cleaning inodes as they
are being dirtied.

2. The flush thread can prevent a process making forward progress for a
long time.
Take this as an example:

jbd2 stalled dev 8,8 for 8168 ms lock holdtime 20692 ms

Last Owner 239 flush-8:0 Acquired Stack
 [<ffffffff8100fd8a>] save_stack_trace+0x2a/0x50
 [<ffffffff811a3ad6>] set_lock_buffer_owner+0x86/0x90
 [<ffffffff811a72ee>] __block_write_full_page+0x16e/0x360
 [<ffffffff811a75b3>] block_write_full_page_endio+0xd3/0x110
 [<ffffffff811a7600>] block_write_full_page+0x10/0x20
 [<ffffffff811aa7f3>] blkdev_writepage+0x13/0x20
 [<ffffffff81119352>] __writepage+0x12/0x40
 [<ffffffff81119b56>] write_cache_pages+0x206/0x460
 [<ffffffff81119df5>] generic_writepages+0x45/0x70
 [<ffffffff8111accb>] do_writepages+0x1b/0x30
 [<ffffffff81199d60>] __writeback_single_inode+0x40/0x1b0
 [<ffffffff8119c40a>] writeback_sb_inodes+0x19a/0x350
 [<ffffffff8119c656>] __writeback_inodes_wb+0x96/0xc0
 [<ffffffff8119c8fb>] wb_writeback+0x27b/0x330
 [<ffffffff8119e300>] wb_do_writeback+0x190/0x1d0
 [<ffffffff8119e3c3>] bdi_writeback_thread+0x83/0x280
 [<ffffffff8106901b>] kthread+0xbb/0xc0
 [<ffffffff8159e1fc>] ret_from_fork+0x7c/0xb0
 [<ffffffffffffffff>] 0xffffffffffffffff

This part is saying that we locked the buffer due to blkdev_writepage,
which I assume must be a metadata update. Based on where we lock the
buffer, the only reason we would leave the buffer locked is if this was
an asynchronous write request, leaving the buffer to be unlocked by
end_buffer_async_write at some time in the future.

Last Owner Activity Stack: 239 flush-8:0
 [<ffffffff812aee61>] __blkdev_issue_zeroout+0x191/0x1a0
 [<ffffffff812aef51>] blkdev_issue_zeroout+0xe1/0xf0
 [<ffffffff8121abe9>] ext4_ext_zeroout.isra.30+0x49/0x60
 [<ffffffff8121ee47>] ext4_ext_convert_to_initialized+0x227/0x5f0
 [<ffffffff8121f8a3>] ext4_ext_handle_uninitialized_extents+0x2f3/0x3a0
 [<ffffffff8121ff57>] ext4_ext_map_blocks+0x5d7/0xa00
 [<ffffffff811f0715>] ext4_map_blocks+0x2d5/0x470
 [<ffffffff811f47da>] mpage_da_map_and_submit+0xba/0x2f0
 [<ffffffff811f52e0>] ext4_da_writepages+0x380/0x620
 [<ffffffff8111accb>] do_writepages+0x1b/0x30
 [<ffffffff81199d60>] __writeback_single_inode+0x40/0x1b0
 [<ffffffff8119c40a>] writeback_sb_inodes+0x19a/0x350
 [<ffffffff8119c656>] __writeback_inodes_wb+0x96/0xc0
 [<ffffffff8119c8fb>] wb_writeback+0x27b/0x330

This part is indicating that, at the time a process tried to acquire the
buffer lock, the flusher was off doing something else entirely. That
points again to the metadata write being asynchronous.
Current Owner 1829 stap
 [<ffffffff8100fd8a>] save_stack_trace+0x2a/0x50
 [<ffffffff811a3ad6>] set_lock_buffer_owner+0x86/0x90
 [<ffffffff8123a5b2>] do_get_write_access+0xd2/0x800
 [<ffffffff8123ae2b>] jbd2_journal_get_write_access+0x2b/0x50
 [<ffffffff81221249>] __ext4_journal_get_write_access+0x39/0x80
 [<ffffffff81229bca>] ext4_free_blocks+0x36a/0xbe0
 [<ffffffff8121c686>] ext4_remove_blocks+0x256/0x2d0
 [<ffffffff8121c905>] ext4_ext_rm_leaf+0x205/0x520
 [<ffffffff8121e64c>] ext4_ext_remove_space+0x4dc/0x750
 [<ffffffff8122051b>] ext4_ext_truncate+0x19b/0x1e0
 [<ffffffff811efde5>] ext4_truncate.part.61+0xd5/0xf0
 [<ffffffff811f0ee4>] ext4_truncate+0x34/0x90
 [<ffffffff811f382d>] ext4_setattr+0x18d/0x640
 [<ffffffff8118d3e2>] notify_change+0x1f2/0x3c0
 [<ffffffff811716f9>] do_truncate+0x59/0xa0
 [<ffffffff8117d3f6>] handle_truncate+0x66/0xa0
 [<ffffffff81181576>] do_last+0x626/0x820
 [<ffffffff81181823>] path_openat+0xb3/0x4a0
 [<ffffffff8118237d>] do_filp_open+0x3d/0xa0
 [<ffffffff81172869>] do_sys_open+0xf9/0x1e0
 [<ffffffff8117296c>] sys_open+0x1c/0x20
 [<ffffffff8159e2ad>] system_call_fastpath+0x1a/0x1f
 [<ffffffffffffffff>] 0xffffffffffffffff

This is just showing where stap was trying to acquire the buffer lock
while truncating data.

3. The blktrace indicates that reads can starve writes from the flusher.

While there are people that can look at a blktrace and find problems
like they are Rain Man, I'm more like an ADHD squirrel when looking at a
trace. I wrote a script to look for which unrelated requests completed
while a request was stalled for over a second. It seemed like something
that a tool should already exist for, but I didn't find one unless btt
can give the information somehow.

Each delayed request is quite long but here is the first example
discovered by the script.

Request 4174 took 1.060828037 to complete 239 W 260608696 [flush-8:0]
Request started time index 4.731862902

Inflight while queued
239 W 260608696 [flush-8:0] 239 W 260608072 [flush-8:0] 239 W 260607872 [flush-8:0] 239 W 260608488 [flush-8:0] 239 W 260608472 [flush-8:0] 239 W 260608568 [flush-8:0] 239 W 260608008 [flush-8:0] 239 W 260607728 [flush-8:0] 239 W 260607112 [flush-8:0] 239 W 260608544 [flush-8:0] 239 W 260622168 [flush-8:0] 239 W 271863816 [flush-8:0] 239 W 260608672 [flush-8:0] 239 W 260607944 [flush-8:0] 239 W 203833687 [flush-8:0] 1676 R 541999743 [watch-inbox-ope] 239 W 260608240 [flush-8:0] 239 W 203851359 [flush-8:0] 239 W 272019768 [flush-8:0] 239 W 260607272 [flush-8:0] 239 W 260607992 [flush-8:0] 239 W 483478791 [flush-8:0] 239 W 260608528 [flush-8:0] 239 W 260607456 [flush-8:0] 239 W 261310704 [flush-8:0] 239 W 260608200 [flush-8:0] 239 W 260607744 [flush-8:0] 239 W 204729015 [flush-8:0] 239 W 204728927 [flush-8:0] 239 W 260608584 [flush-8:0] 239 W 260608352 [flush-8:0] 239 W 270532504 [flush-8:0] 239 W 260608600 [flush-8:0] 239 W 260607152 [flush-8:0] 239 W 260607888 [flush-8:0] 239 W 260607192 [flush-8:0] 239 W 260607568 [flush-8:0] 239 W 260607632 [flush-8:0] 239 W 271831080 [flush-8:0] 239 W 260608312 [flush-8:0] 239 W 260607440 [flush-8:0] 239 W 204729023 [flush-8:0] 239 W 260608056 [flush-8:0] 239 W 272019776 [flush-8:0] 239 W 260608632 [flush-8:0] 239 W 260607704 [flush-8:0] 239 W 271827168 [flush-8:0] 239 W 260607208 [flush-8:0] 239 W 260607384 [flush-8:0] 239 W 260607856 [flush-8:0] 239 W 260607320 [flush-8:0] 239 W 271827160 [flush-8:0] 239 W 260608152 [flush-8:0] 239 W 261271552 [flush-8:0] 239 W 260607168 [flush-8:0] 239 W 260608088 [flush-8:0] 239 W 260607480 [flush-8:0] 239 W 260608424 [flush-8:0] 239 W
260608040 [flush-8:0] 239 W 260608400 [flush-8:0] 239 W 260608224 [flush-8:0] 239 W 260607680 [flush-8:0] 239 W 260607808 [flush-8:0] 239 W 266347440 [flush-8:0] 239 W 260607776 [flush-8:0] 239 W 260607512 [flush-8:0] 239 W 266347280 [flush-8:0] 239 W 260607424 [flush-8:0] 239 W 260607656 [flush-8:0] 239 W 260607976 [flush-8:0] 239 W 260608440 [flush-8:0] 239 W 260608272 [flush-8:0] 239 W 260607536 [flush-8:0] 239 W 260607920 [flush-8:0] 239 W 260608456 [flush-8:0] Complete since queueing 1676 R 541999743 [watch-inbox-ope] 239 W 203833687 [flush-8:0] 1676 R 541999759 [watch-inbox-ope] 1676 R 541999791 [watch-inbox-ope] 1676 R 541999807 [watch-inbox-ope] 1676 R 541999839 [watch-inbox-ope] 1676 R 541999855 [watch-inbox-ope] 1676 R 542210351 [watch-inbox-ope] 1676 R 542210367 [watch-inbox-ope] 1676 R 541999887 [watch-inbox-ope] 1676 R 541999911 [watch-inbox-ope] 1676 R 541999935 [watch-inbox-ope] 1676 R 541999967 [watch-inbox-ope] 1676 RM 540448791 [watch-inbox-ope] 1676 R 541999983 [watch-inbox-ope] 1676 R 542051791 [watch-inbox-ope] 1676 R 541999999 [watch-inbox-ope] 1676 R 541949839 [watch-inbox-ope] 1676 R 541949871 [watch-inbox-ope] 1676 R 541949903 [watch-inbox-ope] 1676 R 541949935 [watch-inbox-ope] 1676 R 541949887 [watch-inbox-ope] 1676 R 542051823 [watch-inbox-ope] 1676 R 541949967 [watch-inbox-ope] 1676 R 542051839 [watch-inbox-ope] 1676 R 541949999 [watch-inbox-ope] 1676 R 541950015 [watch-inbox-ope] 1676 R 541950031 [watch-inbox-ope] 1676 R 541950047 [watch-inbox-ope] 1676 R 541950063 [watch-inbox-ope] 1676 R 542112079 [watch-inbox-ope] 1676 R 542112095 [watch-inbox-ope] 1676 R 542112111 [watch-inbox-ope] 1676 R 542112127 [watch-inbox-ope] 1676 R 542112847 [watch-inbox-ope] 1676 R 542112863 [watch-inbox-ope] 1676 RM 540461311 [watch-inbox-ope] 1676 RM 540448799 [watch-inbox-ope] 1676 R 542112879 [watch-inbox-ope] 1676 R 541950087 [watch-inbox-ope] 1676 R 541950111 [watch-inbox-ope] 1676 R 542112895 [watch-inbox-ope] 1676 R 541950127 [watch-inbox-ope] 1676 R 541950159 [watch-inbox-ope] 1676 R 541950175 [watch-inbox-ope] 1676 R 541950191 [watch-inbox-ope] 1676 R 541950207 [watch-inbox-ope] 1676 R 541950239 [watch-inbox-ope] 1676 R 541950255 [watch-inbox-ope] 1676 R 541950287 [watch-inbox-ope] 1676 R 541950303 [watch-inbox-ope] 1676 R 541950319 [watch-inbox-ope] 1676 R 542113103 [watch-inbox-ope] 1676 R 541950343 [watch-inbox-ope] 1676 R 541950367 [watch-inbox-ope] 1676 R 541950399 [watch-inbox-ope] 1676 R 542113119 [watch-inbox-ope] 1676 R 542113135 [watch-inbox-ope] 1676 R 541950415 [watch-inbox-ope] 1676 RM 540448815 [watch-inbox-ope] 1676 R 542113151 [watch-inbox-ope] 1676 R 541950447 [watch-inbox-ope] 1676 R 541950463 [watch-inbox-ope] 1676 R 542113743 [watch-inbox-ope] 1676 R 542113759 [watch-inbox-ope] 1676 R 542113775 [watch-inbox-ope] 1676 R 542113791 [watch-inbox-ope] 239 W 203851359 [flush-8:0] 239 W 204729015 [flush-8:0] 239 W 204728927 [flush-8:0] 239 W 204729023 [flush-8:0] 239 W 260569008 [flush-8:0] 1676 R 542145871 [watch-inbox-ope] 1676 R 542145903 [watch-inbox-ope] 1676 R 542145887 [watch-inbox-ope] 1676 R 542154639 [watch-inbox-ope] 1676 R 542154655 [watch-inbox-ope] 1676 R 542154671 [watch-inbox-ope] 1676 R 542154687 [watch-inbox-ope] 1676 R 542154831 [watch-inbox-ope] 1676 R 542154863 [watch-inbox-ope] 1676 R 542157647 [watch-inbox-ope] 1676 R 542157663 [watch-inbox-ope] 1676 R 541950479 [watch-inbox-ope] 1676 R 541950503 [watch-inbox-ope] 1676 R 541950535 [watch-inbox-ope] 1676 R 541950599 [watch-inbox-ope] 1676 R 541950727 [watch-inbox-ope] 1676 R 541950751 
[watch-inbox-ope] 1676 R 541950767 [watch-inbox-ope] 1676 RM 540448823 [watch-inbox-ope] 1676 R 541950783 [watch-inbox-ope] 1676 R 541950807 [watch-inbox-ope] 1676 R 541950839 [watch-inbox-ope] 1676 R 541950855 [watch-inbox-ope] 1676 R 541950879 [watch-inbox-ope] 1676 R 541950895 [watch-inbox-ope] 1676 R 541950919 [watch-inbox-ope] 1676 R 541950951 [watch-inbox-ope] 1676 R 541950959 [watch-inbox-ope] 1676 R 541950975 [watch-inbox-ope] 1676 R 541951007 [watch-inbox-ope] 1676 R 541951023 [watch-inbox-ope] 1676 R 541951055 [watch-inbox-ope] 1676 R 541951087 [watch-inbox-ope] 1676 R 541951103 [watch-inbox-ope] 1676 R 541951119 [watch-inbox-ope] 1676 R 541951143 [watch-inbox-ope] 1676 R 541951167 [watch-inbox-ope] 1676 R 542157679 [watch-inbox-ope] 1676 R 542157695 [watch-inbox-ope] 1676 R 541951183 [watch-inbox-ope] 1676 R 541951215 [watch-inbox-ope] 1676 R 541951231 [watch-inbox-ope] 1676 R 542158223 [watch-inbox-ope] 1676 RM 540448831 [watch-inbox-ope] 1676 R 541951247 [watch-inbox-ope] 1676 R 541951271 [watch-inbox-ope] 1676 R 541951295 [watch-inbox-ope] 1676 R 542158239 [watch-inbox-ope] 1676 R 542158255 [watch-inbox-ope] 1676 R 541951311 [watch-inbox-ope] 1676 R 542158271 [watch-inbox-ope] 1676 R 541951343 [watch-inbox-ope] 1676 R 541951359 [watch-inbox-ope] 1676 R 541951391 [watch-inbox-ope] 1676 R 541951407 [watch-inbox-ope] 1676 R 541951423 [watch-inbox-ope] 1676 R 541951439 [watch-inbox-ope] 1676 R 541951471 [watch-inbox-ope] 1676 R 542158607 [watch-inbox-ope] 1676 R 541951487 [watch-inbox-ope] 1676 R 542158639 [watch-inbox-ope] 1676 R 542158655 [watch-inbox-ope] 1676 R 542187215 [watch-inbox-ope] 1676 R 542187231 [watch-inbox-ope] 1676 R 542187247 [watch-inbox-ope] 1676 R 541951503 [watch-inbox-ope] 1676 RM 540448839 [watch-inbox-ope] 1676 R 542187263 [watch-inbox-ope] 1676 R 541951535 [watch-inbox-ope] 1676 R 541951551 [watch-inbox-ope] 1676 R 541951599 [watch-inbox-ope] 1676 R 541951575 [watch-inbox-ope] 1676 R 542190607 [watch-inbox-ope] 239 W 261310704 [flush-8:0] 239 W 266347280 [flush-8:0] 239 W 266347440 [flush-8:0] 1676 R 542190623 [watch-inbox-ope] 1676 R 542190639 [watch-inbox-ope] 1676 R 542190655 [watch-inbox-ope] 1676 R 542193999 [watch-inbox-ope] 1676 R 542194015 [watch-inbox-ope] 1676 R 541951631 [watch-inbox-ope] 1676 R 541951663 [watch-inbox-ope] 1676 R 541951679 [watch-inbox-ope] 1676 R 541951711 [watch-inbox-ope] 1676 R 541951727 [watch-inbox-ope] 1676 R 541951743 [watch-inbox-ope] 1676 R 542194031 [watch-inbox-ope] 1676 R 542194047 [watch-inbox-ope] 1676 R 542197711 [watch-inbox-ope] 1676 RM 540448847 [watch-inbox-ope] 1676 R 541951759 [watch-inbox-ope] 1676 R 541951783 [watch-inbox-ope] 1676 R 541951807 [watch-inbox-ope] 1676 R 542197727 [watch-inbox-ope] 1676 R 542197743 [watch-inbox-ope] 1676 R 542197759 [watch-inbox-ope] 1676 R 541951823 [watch-inbox-ope] 1676 R 541951855 [watch-inbox-ope] 1676 R 541951871 [watch-inbox-ope] 1676 R 541951895 [watch-inbox-ope] 1676 R 541951919 [watch-inbox-ope] 1676 R 541951935 [watch-inbox-ope] 1676 R 541951951 [watch-inbox-ope] 1676 R 541951967 [watch-inbox-ope] 1676 R 541951983 [watch-inbox-ope] 1676 R 542207567 [watch-inbox-ope] 1676 R 542207599 [watch-inbox-ope] 1676 R 542210383 [watch-inbox-ope] 1676 R 542210399 [watch-inbox-ope] 1676 R 542210415 [watch-inbox-ope] 1676 R 542210431 [watch-inbox-ope] 1676 RM 540448855 [watch-inbox-ope] 1676 R 541952015 [watch-inbox-ope] 1676 R 541952047 [watch-inbox-ope] 1676 R 541952063 [watch-inbox-ope] 1676 R 541952079 [watch-inbox-ope] 1676 R 541952103 [watch-inbox-ope] 1676 R 541952127 
[watch-inbox-ope] 1676 R 541952159 [watch-inbox-ope] 1676 R 541952175 [watch-inbox-ope] 1676 R 541952207 [watch-inbox-ope] 1676 R 541952223 [watch-inbox-ope] 1676 R 541952255 [watch-inbox-ope] 1676 R 541952303 [watch-inbox-ope] 1676 R 541952319 [watch-inbox-ope] 1676 R 541952335 [watch-inbox-ope] 1676 R 541952351 [watch-inbox-ope] 1676 R 541952383 [watch-inbox-ope] 1676 R 542051855 [watch-inbox-ope] 1676 R 542051871 [watch-inbox-ope] 1676 R 542051887 [watch-inbox-ope] 1676 R 542051903 [watch-inbox-ope] 1676 R 542051919 [watch-inbox-ope] 1676 R 541952391 [watch-inbox-ope] 1676 R 541952415 [watch-inbox-ope] 1676 RM 540448863 [watch-inbox-ope] 1676 R 542051935 [watch-inbox-ope] 1676 R 541952431 [watch-inbox-ope] 1676 R 541952447 [watch-inbox-ope] 1676 R 541952463 [watch-inbox-ope] 1676 R 541952487 [watch-inbox-ope] 1676 R 541952511 [watch-inbox-ope] 1676 R 541952527 [watch-inbox-ope] 1676 R 541952559 [watch-inbox-ope] 1676 R 541952607 [watch-inbox-ope] 1676 R 541952623 [watch-inbox-ope] 1676 R 542051951 [watch-inbox-ope] 1676 R 541952639 [watch-inbox-ope] 1676 R 542112271 [watch-inbox-ope] 239 W 261271552 [flush-8:0] 239 W 270532504 [flush-8:0] 239 W 271827168 [flush-8:0] 239 W 271827160 [flush-8:0] 239 W 271831080 [flush-8:0] 1676 R 542112287 [watch-inbox-ope] 1676 R 542112303 [watch-inbox-ope] 1676 R 542112319 [watch-inbox-ope] 1676 R 542112335 [watch-inbox-ope] 1676 R 542112351 [watch-inbox-ope] 1676 R 542112367 [watch-inbox-ope] 1676 R 542112383 [watch-inbox-ope] 1676 R 542112655 [watch-inbox-ope] 1676 RM 540448871 [watch-inbox-ope] 1676 R 542112671 [watch-inbox-ope] 1676 R 542112687 [watch-inbox-ope] 1676 R 542112703 [watch-inbox-ope] 1676 R 542112719 [watch-inbox-ope] 1676 R 542112735 [watch-inbox-ope] 1676 R 541952655 [watch-inbox-ope] 1676 R 541952687 [watch-inbox-ope] 1676 R 541952703 [watch-inbox-ope] 1676 R 541952735 [watch-inbox-ope] 1676 R 541952751 [watch-inbox-ope] 1676 R 542112751 [watch-inbox-ope] 1676 R 541952767 [watch-inbox-ope] 1676 R 541952783 [watch-inbox-ope] 1676 R 541952799 [watch-inbox-ope] 1676 R 541952815 [watch-inbox-ope] 1676 R 541952831 [watch-inbox-ope] 1676 R 541952863 [watch-inbox-ope] 1676 R 541952879 [watch-inbox-ope] 1676 R 542113807 [watch-inbox-ope] 1676 R 541952903 [watch-inbox-ope] 1676 R 541952935 [watch-inbox-ope] 1676 RM 540448879 [watch-inbox-ope] 1676 R 541952959 [watch-inbox-ope] 1676 R 542113823 [watch-inbox-ope] 1676 R 542113839 [watch-inbox-ope] 1676 R 542113855 [watch-inbox-ope] 1676 R 541952975 [watch-inbox-ope] 1676 R 541952991 [watch-inbox-ope] 1676 R 541953007 [watch-inbox-ope] 1676 R 541953023 [watch-inbox-ope] 1676 R 541953055 [watch-inbox-ope] 1676 R 541953071 [watch-inbox-ope] 1676 R 541953103 [watch-inbox-ope] 1676 R 541953119 [watch-inbox-ope] 1676 R 541953135 [watch-inbox-ope] 1676 R 542113871 [watch-inbox-ope] 1676 R 542113887 [watch-inbox-ope] 1676 R 541953167 [watch-inbox-ope] 1676 R 541953191 [watch-inbox-ope] 1676 R 541953223 [watch-inbox-ope] 1676 R 541953247 [watch-inbox-ope] 1676 R 541953263 [watch-inbox-ope] 1676 R 541953279 [watch-inbox-ope] 1676 RM 540448887 [watch-inbox-ope] 1676 R 541953303 [watch-inbox-ope] 1676 R 541953327 [watch-inbox-ope] 1676 R 542113903 [watch-inbox-ope] 1676 R 542113919 [watch-inbox-ope] 1676 R 541953359 [watch-inbox-ope] 1676 R 541953375 [watch-inbox-ope] 1676 R 541953391 [watch-inbox-ope] 1676 R 541953407 [watch-inbox-ope] 1676 R 542145679 [watch-inbox-ope] 1676 R 542145695 [watch-inbox-ope] 1676 R 542145711 [watch-inbox-ope] 1676 R 542145727 [watch-inbox-ope] 1676 R 542145743 
[watch-inbox-ope] 1676 R 541953423 [watch-inbox-ope] 1676 R 542145759 [watch-inbox-ope] 1676 R 541953455 [watch-inbox-ope] 1676 R 541953471 [watch-inbox-ope] 1676 R 542145775 [watch-inbox-ope] 1676 R 542145791 [watch-inbox-ope] 1676 R 541953487 [watch-inbox-ope] 1676 R 541953519 [watch-inbox-ope] 1676 RM 540448895 [watch-inbox-ope] 1676 R 541953535 [watch-inbox-ope] 1676 R 541953551 [watch-inbox-ope] 1676 R 541953567 [watch-inbox-ope] 1676 R 541953599 [watch-inbox-ope] 1676 R 541953615 [watch-inbox-ope] 1676 R 541953631 [watch-inbox-ope] 1676 R 541953647 [watch-inbox-ope] 1676 R 542157455 [watch-inbox-ope] 1676 R 542157471 [watch-inbox-ope] 1676 R 542157487 [watch-inbox-ope] 1676 R 541953671 [watch-inbox-ope] 1676 RA 540386719 [watch-inbox-ope] 1676 RA 540386727 [watch-inbox-ope] 1676 RA 540386735 [watch-inbox-ope] 1676 RA 540386743 [watch-inbox-ope] 1676 RA 540386751 [watch-inbox-ope] 1676 RA 540386759 [watch-inbox-ope] 1676 RA 540386767 [watch-inbox-ope] 1676 RA 540386775 [watch-inbox-ope] 1676 RA 540386783 [watch-inbox-ope] 1676 RA 540386791 [watch-inbox-ope] 1676 RA 540386799 [watch-inbox-ope] 1676 RA 540386807 [watch-inbox-ope] 1676 RA 540386815 [watch-inbox-ope] 1676 RA 540386823 [watch-inbox-ope] 1676 RA 540386831 [watch-inbox-ope] 1676 RA 540386839 [watch-inbox-ope] 1676 RA 540386847 [watch-inbox-ope] 1676 RA 540386855 [watch-inbox-ope] 1676 RA 540386863 [watch-inbox-ope] 1676 RA 540386871 [watch-inbox-ope] 1676 RA 540386879 [watch-inbox-ope] 1676 RA 540386887 [watch-inbox-ope] 1676 RA 540386895 [watch-inbox-ope] 1676 RA 540386903 [watch-inbox-ope] 1676 RA 540386911 [watch-inbox-ope] 1676 RA 540386919 [watch-inbox-ope] 1676 RA 540386927 [watch-inbox-ope] 1676 RA 540386935 [watch-inbox-ope] 1676 RA 540386943 [watch-inbox-ope] 1676 RA 540386951 [watch-inbox-ope] 1676 RA 540386959 [watch-inbox-ope] 1676 RM 540386711 [watch-inbox-ope] 239 W 271863816 [flush-8:0] 239 W 272019768 [flush-8:0] 239 W 272019776 [flush-8:0] 239 W 483478791 [flush-8:0] 239 W 260578312 [flush-8:0] 239 W 260578400 [flush-8:0] 1676 R 541953695 [watch-inbox-ope] 1676 R 541953711 [watch-inbox-ope] 1676 R 542157503 [watch-inbox-ope] 1676 R 541953743 [watch-inbox-ope] 1676 R 541953759 [watch-inbox-ope] 1676 R 541953775 [watch-inbox-ope] 1676 R 542157519 [watch-inbox-ope] 1676 R 541953791 [watch-inbox-ope] 1676 R 542157551 [watch-inbox-ope] 1676 R 541953807 [watch-inbox-ope] 1676 R 541953831 [watch-inbox-ope] 1676 R 541953863 [watch-inbox-ope] 1676 R 541953927 [watch-inbox-ope] 1676 R 541954055 [watch-inbox-ope] 1676 RM 540448903 [watch-inbox-ope] 1676 R 542157567 [watch-inbox-ope] 1676 R 541954127 [watch-inbox-ope] 1676 R 541954143 [watch-inbox-ope] 1676 R 541954159 [watch-inbox-ope] 1676 R 541954183 [watch-inbox-ope] 1676 R 541954207 [watch-inbox-ope] 1676 R 541954223 [watch-inbox-ope] 1676 R 541954239 [watch-inbox-ope] 1676 R 541954255 [watch-inbox-ope] 1676 R 541954271 [watch-inbox-ope] 1676 R 541954287 [watch-inbox-ope] 1676 R 541954319 [watch-inbox-ope] 1676 R 541954335 [watch-inbox-ope] 1676 R 541954351 [watch-inbox-ope] 1676 R 541954367 [watch-inbox-ope] 1676 R 541954391 [watch-inbox-ope] 1676 R 541954415 [watch-inbox-ope] 1676 R 541954431 [watch-inbox-ope] 1676 R 541954455 [watch-inbox-ope] 1676 R 541954479 [watch-inbox-ope] 1676 R 541954495 [watch-inbox-ope] 1676 RM 540456719 [watch-inbox-ope] 239 W 260622168 [flush-8:0] 239 W 260625528 [flush-8:0] 239 W 260625608 [flush-8:0] 239 W 260614368 [flush-8:0] 239 W 260614336 [flush-8:0] 239 W 260614304 [flush-8:0] 239 W 260614280 [flush-8:0] 1676 R 541954511 
[watch-inbox-ope] 1676 R 541954527 [watch-inbox-ope] 1676 R 541954543 [watch-inbox-ope] 1676 R 541954567 [watch-inbox-ope] 1676 R 541954599 [watch-inbox-ope] 1676 R 541954607 [watch-inbox-ope] 1676 R 541954623 [watch-inbox-ope] 1676 R 541954655 [watch-inbox-ope] 1676 R 541954671 [watch-inbox-ope] 1676 R 541954687 [watch-inbox-ope] 1676 R 542158351 [watch-inbox-ope] 1676 R 542158367 [watch-inbox-ope] 1676 R 542158383 [watch-inbox-ope] 1676 R 542158399 [watch-inbox-ope] 1676 R 542158415 [watch-inbox-ope] 1676 R 541954703 [watch-inbox-ope] 1676 R 541954727 [watch-inbox-ope] 1676 R 541954751 [watch-inbox-ope] 1676 R 542158431 [watch-inbox-ope] 1676 R 542158447 [watch-inbox-ope] 1676 R 542158463 [watch-inbox-ope] 1676 R 541954767 [watch-inbox-ope] 1676 RM 540456727 [watch-inbox-ope] 1676 R 541954831 [watch-inbox-ope] 1676 R 541954863 [watch-inbox-ope] 1676 R 541954783 [watch-inbox-ope] 1676 R 541954799 [watch-inbox-ope] 1676 R 541954895 [watch-inbox-ope] 1676 R 541954911 [watch-inbox-ope] 1676 R 541954927 [watch-inbox-ope] 1676 R 541954943 [watch-inbox-ope] 1676 R 542158479 [watch-inbox-ope] 1676 R 541954959 [watch-inbox-ope] 1676 R 542158495 [watch-inbox-ope] 1676 R 541954983 [watch-inbox-ope] 1676 R 541955007 [watch-inbox-ope] 1676 R 541955023 [watch-inbox-ope] 1676 R 541955047 [watch-inbox-ope] 1676 R 541955071 [watch-inbox-ope] 1676 R 541955087 [watch-inbox-ope] 1676 R 541955119 [watch-inbox-ope] 1676 R 541955183 [watch-inbox-ope] 239 W 260607112 [flush-8:0] 239 W 260607152 [flush-8:0] 239 W 260607168 [flush-8:0] 239 W 260607192 [flush-8:0] 239 W 260607208 [flush-8:0] 239 W 260607272 [flush-8:0] 239 W 260607320 [flush-8:0] 239 W 260607384 [flush-8:0] 239 W 260607424 [flush-8:0] 239 W 260607440 [flush-8:0] 239 W 260607456 [flush-8:0] 239 W 260607480 [flush-8:0] 239 W 260607512 [flush-8:0] 239 W 260607536 [flush-8:0] 239 W 260607568 [flush-8:0] 239 W 260607632 [flush-8:0] 239 W 260607656 [flush-8:0] 239 W 260607680 [flush-8:0] 239 W 260607704 [flush-8:0] 239 W 260607728 [flush-8:0] 239 W 260607744 [flush-8:0] 239 W 260607776 [flush-8:0] 239 W 260607808 [flush-8:0] 239 W 260607856 [flush-8:0] 239 W 260607872 [flush-8:0] 239 W 260607888 [flush-8:0] 239 W 260607920 [flush-8:0] 239 W 260607944 [flush-8:0] 239 W 260607976 [flush-8:0] 239 W 260607992 [flush-8:0] 239 W 260608008 [flush-8:0] 239 W 260608040 [flush-8:0] 239 W 260608056 [flush-8:0] 239 W 260608072 [flush-8:0] 239 W 260608088 [flush-8:0] 239 W 260608152 [flush-8:0] 239 W 260608200 [flush-8:0] 239 W 260608224 [flush-8:0] 239 W 260608240 [flush-8:0] 239 W 260608272 [flush-8:0] 239 W 260608312 [flush-8:0] 239 W 260608352 [flush-8:0] 239 W 260608400 [flush-8:0] 239 W 260608424 [flush-8:0] 239 W 260608440 [flush-8:0] 1676 R 541955311 [watch-inbox-ope] 1676 R 541955327 [watch-inbox-ope] 1676 R 542158511 [watch-inbox-ope] 1676 R 542158543 [watch-inbox-ope] 1676 R 542158559 [watch-inbox-ope] 1676 RM 540456735 [watch-inbox-ope] 1676 R 542158575 [watch-inbox-ope] 1676 R 542158591 [watch-inbox-ope] 1676 R 541955343 [watch-inbox-ope] 1676 R 541955359 [watch-inbox-ope] 1676 R 541955375 [watch-inbox-ope] 1676 R 541955391 [watch-inbox-ope] 1676 R 541955423 [watch-inbox-ope] 1676 R 541955439 [watch-inbox-ope] 1676 R 542187279 [watch-inbox-ope] 239 W 260608456 [flush-8:0] 239 W 260608472 [flush-8:0] 239 W 260608488 [flush-8:0] 239 W 260608528 [flush-8:0] 239 W 260608544 [flush-8:0] 239 W 260608568 [flush-8:0] 239 W 260608584 [flush-8:0] 239 W 260608600 [flush-8:0] 239 W 260608632 [flush-8:0] 239 W 260608672 [flush-8:0] 1676 R 542187295 [watch-inbox-ope] 
1676 R 542187311 [watch-inbox-ope] 1676 R 541955463 [watch-inbox-ope] 1676 R 541955503 [watch-inbox-ope] 1676 R 541955487 [watch-inbox-ope] 1676 R 542187327 [watch-inbox-ope] 1676 R 541955535 [watch-inbox-ope] 1676 R 541955551 [watch-inbox-ope] 1676 R 541955567 [watch-inbox-ope] 1676 R 541955583 [watch-inbox-ope] 1676 R 541955615 [watch-inbox-ope] 1676 R 541955655 [watch-inbox-ope] 1676 RM 540456743 [watch-inbox-ope] 1676 R 541955679 [watch-inbox-ope] 1676 R 541955695 [watch-inbox-ope] 1676 R 541955711 [watch-inbox-ope] 1676 R 542187343 [watch-inbox-ope] 1676 R 542187359 [watch-inbox-ope] 1676 R 542187375 [watch-inbox-ope] 1676 R 542187391 [watch-inbox-ope] 1676 R 542190479 [watch-inbox-ope] 1676 R 541955727 [watch-inbox-ope] 1676 R 541955751 [watch-inbox-ope] 1676 R 541955775 [watch-inbox-ope] 1676 R 542190495 [watch-inbox-ope] 1676 R 542190511 [watch-inbox-ope] 1676 R 542190527 [watch-inbox-ope] 1676 R 541955791 [watch-inbox-ope] 1676 R 541955823 [watch-inbox-ope] 1676 R 541955839 [watch-inbox-ope] 1676 R 541955855 [watch-inbox-ope] 1676 R 541955879 [watch-inbox-ope] 1676 R 541955903 [watch-inbox-ope] 1676 R 541955919 [watch-inbox-ope] 1676 RM 540456751 [watch-inbox-ope] 1676 R 541955943 [watch-inbox-ope] 1676 R 541955967 [watch-inbox-ope] 1676 R 542190543 [watch-inbox-ope] 1676 R 542190559 [watch-inbox-ope] 1676 R 542190575 [watch-inbox-ope] 1676 R 542190591 [watch-inbox-ope] 1676 R 542193807 [watch-inbox-ope] 1676 R 541955983 [watch-inbox-ope] 1676 R 541956015 [watch-inbox-ope] 1676 R 541956031 [watch-inbox-ope] 1676 R 541956047 [watch-inbox-ope] 1676 R 541956079 [watch-inbox-ope] 1676 R 541956095 [watch-inbox-ope] 1676 R 542193839 [watch-inbox-ope] 1676 R 542193855 [watch-inbox-ope] 1676 R 542193871 [watch-inbox-ope] 1676 R 541956111 [watch-inbox-ope] 1676 R 541956143 [watch-inbox-ope] 1676 R 541956207 [watch-inbox-ope] 1676 R 541956255 [watch-inbox-ope] 1676 R 542193887 [watch-inbox-ope] 1676 R 541956271 [watch-inbox-ope] 1676 R 541956287 [watch-inbox-ope] 1676 R 541956335 [watch-inbox-ope] 1676 RM 540456759 [watch-inbox-ope] 1676 R 542193903 [watch-inbox-ope] 1676 R 541956319 [watch-inbox-ope] 1676 R 541956367 [watch-inbox-ope] 1676 R 541956399 [watch-inbox-ope] 1676 R 541956415 [watch-inbox-ope] 1676 R 541956431 [watch-inbox-ope] 1676 R 541956447 [watch-inbox-ope] 1676 R 541956479 [watch-inbox-ope] 1676 R 542197775 [watch-inbox-ope] 1676 R 542197791 [watch-inbox-ope] 1676 R 542197807 [watch-inbox-ope] 1676 R 542197823 [watch-inbox-ope] 1676 R 542197839 [watch-inbox-ope]

I recognise that the output will have a WTF reaction, but the key
observations to me are:

a) a single write request from the flusher took over a second to complete
b) at the time it was queued, it was mostly other writes that were in
   the queue at the same time
c) the write request and the parallel writes were all asynchronous write
   requests
d) at the time the request completed, there were a LARGE number of other
   requests queued and completed at the same time

Of the requests queued and completed in the meantime, the breakdown was

  22 RM
  31 RA
  82 W
 445 R

If I'm reading this correctly, it is saying that 22 reads were merged
(RM), 31 reads were remapped to another device (RA), which is probably
reads from the dm-crypt partition, 82 were writes (W), which is not far
off the number of writes that were in the queue, and 445 were other
reads. The delay was dominated by reads that were queued after the write
request and completed before it.

There are lots of other examples but here is part of one from much later
that starts with:
Request 27128 took 7.536721619 to complete 239 W 188973663 [flush-8:0]

That's saying that the 27128th request in the trace took over 7 seconds
to complete and was an asynchronous write from the flusher. The contents
of the queue are displayed at that time and the breakdown of requests is

   23 WS
   86 RM
  124 RA
  442 W
 1931 R

7 seconds later, when it completed, the breakdown of completed requests
was

   25 WS
  114 RM
  155 RA
  408 W
 2457 R

In combination, that confirms for me that asynchronous writes from the
flusher are being starved by reads. When a process requires a buffer
that is locked by one of those asynchronous writes, it stalls.

> The thing is, we do want to make ext4 work well with cfq, and
> prioritizing non-readahead read requests ahead of data writeback does
> make sense. The issue is that metadata writes going through the block
> device could in some cases effectively cause a priority inversion when
> what had previously been an asynchronous writeback starts blocking a
> foreground, user-visible process.
>
> At least, that's the theory;

I *think* the data more or less confirms the theory but it'd be nice if
someone else double checked in case I'm seeing what I want to see
instead of what is actually there.
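As a rough picture of the stall mechanism described here, in call-chain
form (3.x-era fs/buffer.c, simplified; the exact wait primitives vary a
little between kernel versions):

    /*
     * Simplified view of the priority inversion:
     *
     * flusher:
     *   __block_write_full_page()
     *     lock_buffer(bh);          bh now locked for the async write
     *     submit_bh(WRITE, bh);     queued as low-priority async I/O
     *
     * foreground task (e.g. jbd2 do_get_write_access()):
     *   lock_buffer(bh)
     *     __lock_buffer(bh)         sleeps in TASK_UNINTERRUPTIBLE on
     *                               BH_Lock until the bit is cleared
     *
     * I/O completion, whenever the starved write finally finishes:
     *   end_buffer_async_write(bh, uptodate)
     *     unlock_buffer(bh);        only now is the waiter woken
     */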
Mel Gorman <mgorman@suse.de> writes:

> (Adding Jeff Moyer to the cc as I'm told he is interested in the blktrace)

Thanks. I've got a few comments and corrections for you below.

> TLDR: The flusher writes pages very quickly after processes dirty a
> buffer, and reads starve those flusher writes.

[snip]

> 3. The blktrace indicates that reads can starve writes from the flusher.
>
> While there are people that can look at a blktrace and find problems
> like they are Rain Man, I'm more like an ADHD squirrel when looking at
> a trace. I wrote a script to look for which unrelated requests completed
> while a request was stalled for over a second. It seemed like something
> that a tool should already exist for, but I didn't find one unless btt
> can give the information somehow.

Care to share that script?

[snip]

> I recognise that the output will have a WTF reaction, but the key
> observations to me are:
>
> a) a single write request from the flusher took over a second to complete
> b) at the time it was queued, it was mostly other writes that were in
>    the queue at the same time
> c) the write request and the parallel writes were all asynchronous write
>    requests
> d) at the time the request completed, there were a LARGE number of other
>    requests queued and completed at the same time
>
> Of the requests queued and completed in the meantime, the breakdown was
>
>   22 RM
>   31 RA
>   82 W
>  445 R
>
> If I'm reading this correctly, it is saying that 22 reads were merged
> (RM), 31 reads were remapped to another device (RA), which is probably
> reads from the dm-crypt partition, 82 were writes (W), which is not far
> off the number of writes that were in the queue, and 445 were other
> reads. The delay was dominated by reads that were queued after the write
> request and completed before it.

RM == Read Meta
RA == Read Ahead (remapping, by the way, does not happen across devices,
just into partitions)
W and R you understood correctly.

> That's saying that the 27128th request in the trace took over 7 seconds
> to complete and was an asynchronous write from the flusher. The contents
> of the queue are displayed at that time and the breakdown of requests is
>
>    23 WS  [JEM: write sync]
>    86 RM  [JEM: Read Meta]
>   124 RA  [JEM: Read Ahead]
>   442 W
>  1931 R
>
> 7 seconds later, when it completed, the breakdown of completed requests
> was
>
>    25 WS
>   114 RM
>   155 RA
>   408 W
>  2457 R
>
> In combination, that confirms for me that asynchronous writes from the
> flusher are being starved by reads. When a process requires a buffer
> that is locked by one of those asynchronous writes, it stalls.
>
>> The thing is, we do want to make ext4 work well with cfq, and
>> prioritizing non-readahead read requests ahead of data writeback does
>> make sense. The issue is that metadata writes going through the block
>> device could in some cases effectively cause a priority inversion when
>> what had previously been an asynchronous writeback starts blocking a
>> foreground, user-visible process.
>>
>> At least, that's the theory;
>
> I *think* the data more or less confirms the theory but it'd be nice if
> someone else double checked in case I'm seeing what I want to see
> instead of what is actually there.

Looks sane. You can also see a lot of "preempt"s in the blkparse output,
which indicates exactly what you're saying. Any sync request gets
priority over the async requests.
I'll also note that even though your I/O is going all over the place
(D2C is pretty bad, 14ms), most of the time is spent waiting for a
struct request allocation or between Queue and Merge:

==================== All Devices ====================

            ALL           MIN            AVG             MAX            N
--------------- -------------- -------------- --------------- ------------

Q2Q        0.000000001    0.000992259     8.898375882      2300861
Q2G        0.000000843   10.193261239  2064.079501935      1016463  <====
G2I        0.000000461    0.000044702     3.237065090      1015803
Q2M        0.000000101    8.203147238  2064.079367557      1311662
I2D        0.000002012    1.476824812  2064.089774419      1014890
M2D        0.000003283    6.994306138   283.573348664      1284872
D2C        0.000061889    0.014438316     0.857811758      2291996
Q2C        0.000072284   13.363007244  2064.092228625      2292191

==================== Device Overhead ====================

       DEV |       Q2G       G2I       Q2M       I2D       D2C
---------- | --------- --------- --------- --------- ---------
 (  8,  0) |  33.8259%   0.0001%  35.1275%   4.8932%   0.1080%
---------- | --------- --------- --------- --------- ---------
   Overall |  33.8259%   0.0001%  35.1275%   4.8932%   0.1080%

I'm not sure I believe that max value. 2064 seconds seems a bit high.
Also, Q2M should not be anywhere near that big, so more investigation is
required there. A quick look over the data doesn't show any such delays
(making me question the tools), but I'll write some code tomorrow to
verify the btt output.

Jan, if I were to come up with a way of promoting a particular async
queue to the front of the line, where would I put such a call in the
ext4/jbd2 code to be effective?

Mel, can you reproduce this at will? Do you have a reproducer that I
could run so I'm not constantly bugging you?

Cheers,
Jeff
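For readers unfamiliar with the btt phases: Q2G measures the time from a
request being queued until a struct request is actually allocated, and
in kernels of this era that wait is dominated by the sleep inside
get_request() when a request list is congested. A greatly simplified
model, loosely based on older mainline code (the field layout differs
across versions, and try_alloc_request() is a stand-in for the real
allocation path in block/blk-core.c):

    #include <linux/blkdev.h>
    #include <linux/sched.h>
    #include <linux/wait.h>

    /*
     * Simplified model of the get_request() wait covered by Q2G: when
     * the sync or async request list is full, the caller sleeps on that
     * list's waitqueue until completions free up requests.
     */
    static struct request *get_request_model(struct request_queue *q,
                                             int is_sync)
    {
            struct request *rq;
            DEFINE_WAIT(wait);

            while (!(rq = try_alloc_request(q, is_sync))) {    /* stand-in */
                    prepare_to_wait_exclusive(&q->rq.wait[is_sync], &wait,
                                              TASK_UNINTERRUPTIBLE);
                    /* Sleep until a completed request frees space. */
                    io_schedule();
                    finish_wait(&q->rq.wait[is_sync], &wait);
            }
            return rq;
    }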
On Mon, Apr 22, 2013 at 06:42:23PM -0400, Jeff Moyer wrote:
>
> Jan, if I were to come up with a way of promoting a particular async
> queue to the front of the line, where would I put such a call in the
> ext4/jbd2 code to be effective?

Well, I thought we had discussed trying to bump a pending I/O
automatically when there was an attempt to call lock_buffer() on the bh?
That would be ideal, because we could keep the async writeback low
priority until someone is trying to wait upon it, at which point
obviously it should no longer be considered an async write call.

Failing that, this is something I've been toying with.... what do you
think?

http://patchwork.ozlabs.org/patch/238192/
http://patchwork.ozlabs.org/patch/238257/

(The first patch in the series just makes sure that allocation bitmap
reads are marked with the META/PRIO flags. It's not strictly speaking
related to the problem discussed here, but for completeness:
http://patchwork.ozlabs.org/patch/238193/)

- Ted
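A rough sketch of the "bump on lock_buffer()" idea, purely to make the
discussion concrete: blk_boost_buffer() below is a hypothetical hook (no
such API existed at the time), standing in for whatever mechanism would
reprioritise the already-queued write.

    #include <linux/buffer_head.h>

    /*
     * If the buffer is locked because it is under async write I/O, ask
     * the block layer to promote that write before sleeping on the lock.
     *
     * blk_boost_buffer() is HYPOTHETICAL; designing the real mechanism
     * is exactly the open question in this thread.
     */
    static void lock_buffer_boost(struct buffer_head *bh)
    {
            might_sleep();
            if (trylock_buffer(bh))
                    return;
            if (buffer_async_write(bh))
                    blk_boost_buffer(bh);   /* hypothetical elevator hook */
            __lock_buffer(bh);              /* sleep until the write completes */
    }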
On Mon, Apr 22, 2013 at 06:42:23PM -0400, Jeff Moyer wrote:
> > 3. The blktrace indicates that reads can starve writes from the flusher.
> >
> > While there are people that can look at a blktrace and find problems
> > like they are Rain Man, I'm more like an ADHD squirrel when looking at
> > a trace. I wrote a script to look for which unrelated requests completed
> > while a request was stalled for over a second. It seemed like something
> > that a tool should already exist for, but I didn't find one unless btt
> > can give the information somehow.
>
> Care to share that script?

I would have preferred not to because it is an ugly hatchet job churned
out in a few minutes. It's written in perl and uses the text output from
blkparse, making it slow. It uses an excessive amount of memory because
I was taking shortcuts, so it is resource heavy. It ignores most of the
information from blkparse, so there are gaps in what it reports. Even
though it's dogshit, it was useful in this particular case, so I added
it to mmtests anyway. Be aware that it takes ages to run and you might
want to break the blkparse output into pieces.

It's used something like

  blkparse -i blktrace-log > blkparse-log
  cat blkparse-log | $PATH_TO_MMTESTS/subreport/blktrace-queue-watch.pl

[snip]

> > I recognise that the output will have a WTF reaction, but the key
> > observations to me are:
> >
> > a) a single write request from the flusher took over a second to complete
> > b) at the time it was queued, it was mostly other writes that were in
> >    the queue at the same time
> > c) the write request and the parallel writes were all asynchronous write
> >    requests
> > d) at the time the request completed, there were a LARGE number of other
> >    requests queued and completed at the same time
> >
> > Of the requests queued and completed in the meantime, the breakdown was
> >
> >   22 RM
> >   31 RA
> >   82 W
> >  445 R
> >
> > If I'm reading this correctly, it is saying that 22 reads were merged
> > (RM), 31 reads were remapped to another device (RA), which is probably
> > reads from the dm-crypt partition, 82 were writes (W), which is not far
> > off the number of writes that were in the queue, and 445 were other
> > reads. The delay was dominated by reads that were queued after the
> > write request and completed before it.
>
> RM == Read Meta
> RA == Read Ahead (remapping, by the way, does not happen across devices,
> just into partitions)
> W and R you understood correctly.

Thanks for those corrections. I misread the meaning of the action
identifiers section of the blkparse manual. I should have double checked
the source.

> >> <SNIP>
> >> The thing is, we do want to make ext4 work well with cfq, and
> >> prioritizing non-readahead read requests ahead of data writeback does
> >> make sense. The issue is that metadata writes going through the block
> >> device could in some cases effectively cause a priority inversion when
> >> what had previously been an asynchronous writeback starts blocking a
> >> foreground, user-visible process.
> >>
> >> At least, that's the theory;
> >
> > I *think* the data more or less confirms the theory but it'd be nice if
> > someone else double checked in case I'm seeing what I want to see
> > instead of what is actually there.
>
> Looks sane. You can also see a lot of "preempt"s in the blkparse
> output, which indicates exactly what you're saying. Any sync request
> gets priority over the async requests.

Good to know.
> I'll also note that even though your I/O is going all over the place
> (D2C is pretty bad, 14ms), most of the time is spent waiting for a
> struct request allocation or between Queue and Merge:
>
> ==================== All Devices ====================
>
>             ALL           MIN           AVG             MAX           N
> --------------- ------------- ------------- --------------- -----------
>
> Q2Q       0.000000001   0.000992259      8.898375882     2300861
> Q2G       0.000000843  10.193261239   2064.079501935     1016463  <====

This is not normally my sandbox so do you mind spelling this out?

IIUC, the time to allocate the struct request from the slab cache is just a small portion of this time. The bulk of the time is spent in get_request() waiting for congestion to clear on the request list for either the sync or async queue. Once a process goes to sleep on that waitqueue, it has to wait until enough requests on that queue have been serviced before it gets woken again, at which point it gets priority access to prevent further starvation. This is the Queue To Get Request (Q2G) delay. What we may be seeing here is that the async queue was congested and, on average, we are waiting 10 seconds for it to clear. The maximum value may be bogus for reasons explained later.

Is that accurate?

> G2I       0.000000461   0.000044702      3.237065090     1015803
> Q2M       0.000000101   8.203147238   2064.079367557     1311662
> I2D       0.000002012   1.476824812   2064.089774419     1014890
> M2D       0.000003283   6.994306138    283.573348664     1284872
> D2C       0.000061889   0.014438316      0.857811758     2291996
> Q2C       0.000072284  13.363007244   2064.092228625     2292191
>
> ==================== Device Overhead ====================
>
>        DEV |       Q2G       G2I       Q2M       I2D       D2C
> ---------- | --------- --------- --------- --------- ---------
>  (  8,  0) |  33.8259%   0.0001%  35.1275%   4.8932%   0.1080%
> ---------- | --------- --------- --------- --------- ---------
>    Overall |  33.8259%   0.0001%  35.1275%   4.8932%   0.1080%
>
> I'm not sure I believe that max value. 2064 seconds seems a bit high.

It is, so I looked closer at the timestamps and there is a one-hour correction about 4400 seconds into the test. Daylight savings time kicked in on March 31st and the machine is rarely rebooted until this test case came along. It looks like there is a timezone or time misconfiguration on the laptop that starts the machine with the wrong time. NTP must have corrected the time, which skewed the readings in that window severely :( Normally on my test machines these services are disabled to avoid exactly this sort of problem.

> Also, Q2M should not be anywhere near that big, so more investigation is
> required there. A quick look over the data doesn't show any such delays
> (making me question the tools), but I'll write some code tomorrow to
> verify the btt output.

It might be a single set of readings during a time correction that skewed it.

> Jan, if I were to come up with a way of promoting a particular async
> queue to the front of the line, where would I put such a call in the
> ext4/jbd2 code to be effective?
>
> Mel, can you reproduce this at will? Do you have a reproducer that I
> could run so I'm not constantly bugging you?

I can reproduce it at will. Due to the nature of the test, the test results are variable and unfortunately it is one of the trickier mmtests configurations to set up.

1. Get access to a webserver
2. Clone mmtests to your test machine
   git clone https://github.com/gormanm/mmtests.git
3. Edit shellpacks/common-config.sh and set WEBROOT to a webserver path
4. Create a tar.gz of a large git tree and place it at
   $WEBROOT/linux-2.6.tar.gz
   Alternatively, place a compressed git tree anywhere and edit
   configs/config-global-dhp__io-multiple-source-latency
   and update GITCHECKOUT_SOURCETAR
5. Create a tar.gz of a large maildir directory and place it at
   $WEBROOT/maildir.tar.gz
   Alternatively, use an existing maildir folder and set
   MONITOR_INBOX_OPEN_MAILDIR in
   configs/config-global-dhp__io-multiple-source-latency

It's awkward but it's not like there are standard benchmarks lying around and it seemed the best way to reproduce the problems I typically see early in the lifetime of a system or when running a git checkout when the tree has not been used in a few hours. Run the actual test with

./run-mmtests.sh --config configs/config-global-dhp__io-multiple-source-latency --run-monitor test-name-of-your-choice

Results will be in work/log. You'll need to run this as root so it can run blktrace and so it can drop_caches between git checkouts (to force disk IO). If systemtap craps out on you, then edit configs/config-global-dhp__io-multiple-source-latency and remove dstate from MONITORS_GZIP.

If you have trouble getting this running, ping me on IRC.

Thanks.
On Sat, Apr 20, 2013 at 08:05:22PM -0400, Theodore Ts'o wrote:
> An alternate solution which I've been playing around with adds buffer_head
> flags so we can indicate that a buffer contains metadata and/or should
> have I/O submitted with the REQ_PRIO flag set.

I beefed up the reporting slightly and tested the patches, comparing 3.9-rc6 vanilla with your patches. The full report with graphs is at

http://www.csn.ul.ie/~mel/postings/ext4tag-20130423/report.html

                       3.9.0-rc6             3.9.0-rc6
                         vanilla               ext4tag
User    min         0.00 (  0.00%)        0.00 (  0.00%)
User    mean         nan (   nan%)         nan (   nan%)
User    stddev       nan (   nan%)         nan (   nan%)
User    max         0.00 (  0.00%)        0.00 (  0.00%)
User    range       0.00 (  0.00%)        0.00 (  0.00%)
System  min         9.14 (  0.00%)        9.13 (  0.11%)
System  mean        9.60 (  0.00%)        9.73 ( -1.33%)
System  stddev      0.39 (  0.00%)        0.94 (-142.69%)
System  max        10.31 (  0.00%)       11.58 ( -12.32%)
System  range       1.17 (  0.00%)        2.45 (-109.40%)
Elapsed min       665.54 (  0.00%)      612.25 (   8.01%)
Elapsed mean      775.35 (  0.00%)      688.01 (  11.26%)
Elapsed stddev     69.11 (  0.00%)       58.22 (  15.75%)
Elapsed max       858.40 (  0.00%)      773.06 (   9.94%)
Elapsed range     192.86 (  0.00%)      160.81 (  16.62%)
CPU     min         3.00 (  0.00%)        3.00 (   0.00%)
CPU     mean        3.60 (  0.00%)        4.20 ( -16.67%)
CPU     stddev      0.49 (  0.00%)        0.75 ( -52.75%)
CPU     max         4.00 (  0.00%)        5.00 ( -25.00%)
CPU     range       1.00 (  0.00%)        2.00 (-100.00%)

The patches appear to improve the git checkout times slightly but this test is quite variable. The vmstat figures report some reclaim activity but if you look at the graphs further down you will see that the bulk of the kswapd reclaim scan and steal activity is at the start of the test, when it's downloading and untarring a git tree to work with. (I also note that the mouse-over graph for direct reclaim efficiency is broken, but it's not important right now.)

From iostat:

                      3.9.0-rc6   3.9.0-rc6
                        vanilla     ext4tag
Mean  dm-0-avgqz           1.18        1.19
Mean  dm-0-await          17.30       16.50
Mean  dm-0-r_await        17.30       16.50
Mean  dm-0-w_await         0.94        0.48
Mean  sda-avgqz          650.29      719.81
Mean  sda-await         2501.33     2597.23
Mean  sda-r_await         30.01       24.91
Mean  sda-w_await      11228.80    11120.64
Max   dm-0-avgqz          12.30       10.14
Max   dm-0-await          42.65       52.23
Max   dm-0-r_await        42.65       52.23
Max   dm-0-w_await       541.00      263.83
Max   sda-avgqz         3811.93     3375.11
Max   sda-await         7178.61     7170.44
Max   sda-r_await        384.37      297.85
Max   sda-w_await      51353.93    50338.25

There are no really obvious massive advantages to me there and if you look at the graphs for the avgqz, await etc. over time, the patched kernel is not obviously better. The Wait CPU usage looks roughly the same too.

On the more positive side, the dstate systemtap monitor script tells me that all processes were stalled for less time -- 9575 seconds versus 10910. The most severe event to stall on is sleep_on_buffer() as a result of ext4_bread.

Vanilla kernel   3325677 ms stalled with 57 events
Patched kernel   2411471 ms stalled with 42 events

That's a pretty big drop but it gets bad again for the second worst stall -- wait_on_page_bit as a result of generic_file_buffered_write.

Vanilla kernel   1336064 ms stalled with 109 events
Patched kernel   2338781 ms stalled with 164 events

So conceptually the patches make sense, but this first set of tests does not indicate that they'll fix the problem and the stall times do not indicate that interactivity will be any better. I'll still apply them and boot them on my main work machine and see how they "feel" this evening.
On Tue, Apr 23, 2013 at 04:33:05PM +0100, Mel Gorman wrote:
> That's a pretty big drop but it gets bad again for the second worst stall --
> wait_on_page_bit as a result of generic_file_buffered_write.
>
> Vanilla kernel   1336064 ms stalled with 109 events
> Patched kernel   2338781 ms stalled with 164 events

Do you have the stack trace for this stall? I'm wondering if this is caused by waiting for stable pages in write_begin(), or by something else.

If it is blocking caused by stable page writeback, that's interesting, since it would imply that something in your workload is trying to write to a page that has already been modified (i.e., appending to a log file, or updating a database file). Does that make sense given what your workload might be running?

- Ted
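[To make the scenario Ted describes concrete, here is a hypothetical sketch -- the file name and sizes are invented -- of the kind of access pattern that trips over stable pages: small appends keep re-dirtying the tail page of a file, so a write can land on a page that is already under writeback and must wait for the I/O to finish.]

# Hypothetical reproducer sketch: repeatedly appending small records
# re-dirties the same tail page of the file. With stable pages enforced
# (e.g. on dm-crypt), a write that hits a page already under writeback
# blocks until that writeback completes.
import os

fd = os.open("/tmp/applog", os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
for i in range(100000):
    os.write(fd, b"x" * 100)   # roughly 40 appends share each 4K page
    if i % 1000 == 0:
        os.fsync(fd)           # keep writeback in flight behind the appends
os.close(fd)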
On Tue, Apr 23, 2013 at 11:50:19AM -0400, Theodore Ts'o wrote:
> On Tue, Apr 23, 2013 at 04:33:05PM +0100, Mel Gorman wrote:
> > That's a pretty big drop but it gets bad again for the second worst stall --
> > wait_on_page_bit as a result of generic_file_buffered_write.
> >
> > Vanilla kernel   1336064 ms stalled with 109 events
> > Patched kernel   2338781 ms stalled with 164 events
>
> Do you have the stack trace for this stall? I'm wondering if this is
> caused by waiting for stable pages in write_begin(), or by something
> else.

[<ffffffff81110238>] wait_on_page_bit+0x78/0x80
[<ffffffff815af294>] kretprobe_trampoline+0x0/0x4c
[<ffffffff81110e84>] generic_file_buffered_write+0x114/0x2a0
[<ffffffff81111ccd>] __generic_file_aio_write+0x1bd/0x3c0
[<ffffffff81111f4a>] generic_file_aio_write+0x7a/0xf0
[<ffffffff811ee639>] ext4_file_write+0x99/0x420
[<ffffffff81174d87>] do_sync_write+0xa7/0xe0
[<ffffffff81175447>] vfs_write+0xa7/0x180
[<ffffffff811758cd>] sys_write+0x4d/0x90
[<ffffffff815b3eed>] system_call_fastpath+0x1a/0x1f
[<ffffffffffffffff>] 0xffffffffffffffff

The processes that stalled in this particular trace are wget, latency-output, tar and tclsh. Most of these are sequential writers except for tar, which is both a sequential reader and a sequential writer.

> If it is blocking caused by stable page writeback, that's interesting,
> since it would imply that something in your workload is trying to
> write to a page that has already been modified (i.e., appending to a
> log file, or updating a database file). Does that make sense given
> what your workload might be running?

I doubt it is stable-page writeback, considering the types of processes that are running. I would expect the bulk of the activity to be sequential readers or writers of multiple files. The summarised reports from the raw data are now at

http://www.csn.ul.ie/~mel/postings/ext4tag-20130423/dstate-summary-vanilla
http://www.csn.ul.ie/~mel/postings/ext4tag-20130423/dstate-summary-ext4tag

It's an aside, but the worst of the stalls are incurred by systemd-tmpfiles, which was not a deliberate part of the test and is yet another thing I would not have caught unless I was running tests on my laptop. Looking closer at that thing, the default configuration is to run the service 15 minutes after boot and after that it runs once a day. It looks like the bulk of the scanning would be in /var/tmp/, looking at systemd's own files (over 3000 of them), which I'm a little amused by. My normal test machines would not hit this because they are not systemd based, but the existence of this thing is worth noting. Any IO-based tests run on systemd-based distributions may give different results depending on whether this service triggered during the test or not.
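[For reference -- this is general systemd knowledge rather than anything from Mel's report -- the schedule described above comes from the tmpfiles timer unit. The upstream systemd-tmpfiles-clean.timer ships along these lines, though distributions may override it:]

[Unit]
Description=Daily Cleanup of Temporary Directories

[Timer]
OnBootSec=15min
OnUnitActiveSec=1d

[The OnBootSec=15min line is the 15-minutes-after-boot run, and OnUnitActiveSec=1d is the once-a-day rerun, which is exactly the window in which an IO benchmark can collide with it.]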
Mel Gorman <mgorman@suse.de> writes:

>> I'll also note that even though your I/O is going all over the place
>> (D2C is pretty bad, 14ms), most of the time is spent waiting for a
>> struct request allocation or between Queue and Merge:
>>
>> ==================== All Devices ====================
>>
>>             ALL           MIN           AVG             MAX           N
>> --------------- ------------- ------------- --------------- -----------
>>
>> Q2Q       0.000000001   0.000992259      8.898375882     2300861
>> Q2G       0.000000843  10.193261239   2064.079501935     1016463  <====
>
> This is not normally my sandbox so do you mind spelling this out?
>
> IIUC, the time to allocate the struct request from the slab cache is just a
> small portion of this time. The bulk of the time is spent in get_request()
> waiting for congestion to clear on the request list for either the sync or
> async queue. Once a process goes to sleep on that waitqueue, it has to wait
> until enough requests on that queue have been serviced before it gets woken
> again, at which point it gets priority access to prevent further starvation.
> This is the Queue To Get Request (Q2G) delay. What we may be seeing here
> is that the async queue was congested and, on average, we are waiting
> 10 seconds for it to clear. The maximum value may be bogus for reasons
> explained later.
>
> Is that accurate?

Yes, without getting into excruciating detail.

>> G2I       0.000000461   0.000044702      3.237065090     1015803
>> Q2M       0.000000101   8.203147238   2064.079367557     1311662
>> I2D       0.000002012   1.476824812   2064.089774419     1014890
>> M2D       0.000003283   6.994306138    283.573348664     1284872
>> D2C       0.000061889   0.014438316      0.857811758     2291996
>> Q2C       0.000072284  13.363007244   2064.092228625     2292191
>>
>> ==================== Device Overhead ====================
>>
>>        DEV |       Q2G       G2I       Q2M       I2D       D2C
>> ---------- | --------- --------- --------- --------- ---------
>>  (  8,  0) |  33.8259%   0.0001%  35.1275%   4.8932%   0.1080%
>> ---------- | --------- --------- --------- --------- ---------
>>    Overall |  33.8259%   0.0001%  35.1275%   4.8932%   0.1080%
>>
>> I'm not sure I believe that max value. 2064 seconds seems a bit high.
>
> It is, so I looked closer at the timestamps and there is a one-hour
> correction about 4400 seconds into the test. Daylight savings time kicked
> in on March 31st and the machine is rarely rebooted until this test case
> came along. It looks like there is a timezone or time misconfiguration
> on the laptop that starts the machine with the wrong time. NTP must have
> corrected the time, which skewed the readings in that window severely :(

Not sure I'm buying that argument, as there are no gaps in the blkparse output. The logging is not done using wallclock time. I still haven't had sufficient time to dig into these numbers.

>> Also, Q2M should not be anywhere near that big, so more investigation is
>> required there. A quick look over the data doesn't show any such delays
>> (making me question the tools), but I'll write some code tomorrow to
>> verify the btt output.
>
> It might be a single set of readings during a time correction that
> skewed it.

Again, I don't think so.

> I can reproduce it at will. Due to the nature of the test, the test
> results are variable and unfortunately it is one of the trickier mmtests
> configurations to set up.
>
> 1. Get access to a webserver
> 2. Clone mmtests to your test machine
>    git clone https://github.com/gormanm/mmtests.git
> 3. Edit shellpacks/common-config.sh and set WEBROOT to a webserver path
> 4. Create a tar.gz of a large git tree and place it at
>    $WEBROOT/linux-2.6.tar.gz
>    Alternatively, place a compressed git tree anywhere and edit
>    configs/config-global-dhp__io-multiple-source-latency
>    and update GITCHECKOUT_SOURCETAR
> 5. Create a tar.gz of a large maildir directory and place it at
>    $WEBROOT/maildir.tar.gz
>    Alternatively, use an existing maildir folder and set
>    MONITOR_INBOX_OPEN_MAILDIR in
>    configs/config-global-dhp__io-multiple-source-latency
>
> It's awkward but it's not like there are standard benchmarks lying around
> and it seemed the best way to reproduce the problems I typically see early
> in the lifetime of a system or when running a git checkout when the tree
> has not been used in a few hours. Run the actual test with
>
> ./run-mmtests.sh --config configs/config-global-dhp__io-multiple-source-latency --run-monitor test-name-of-your-choice
>
> Results will be in work/log. You'll need to run this as root so it
> can run blktrace and so it can drop_caches between git checkouts
> (to force disk IO). If systemtap craps out on you, then edit
> configs/config-global-dhp__io-multiple-source-latency and remove dstate
> from MONITORS_GZIP.

And how do I determine whether I've hit the problem?

> If you have trouble getting this running, ping me on IRC.

Yes, I'm having issues getting things to go, but you didn't provide me a time zone, an irc server or a nick to help me find you. Was that intentional? ;-)

Cheers,
Jeff
On Wed, Apr 24, 2013 at 03:09:13PM -0400, Jeff Moyer wrote:
> Mel Gorman <mgorman@suse.de> writes:
>
> >> I'll also note that even though your I/O is going all over the place
> >> (D2C is pretty bad, 14ms), most of the time is spent waiting for a
> >> struct request allocation or between Queue and Merge:
> >>
> >> ==================== All Devices ====================
> >>
> >>             ALL           MIN           AVG             MAX           N
> >> --------------- ------------- ------------- --------------- -----------
> >>
> >> Q2Q       0.000000001   0.000992259      8.898375882     2300861
> >> Q2G       0.000000843  10.193261239   2064.079501935     1016463  <====
> >
> > This is not normally my sandbox so do you mind spelling this out?
> >
> > IIUC, the time to allocate the struct request from the slab cache is just a
> > small portion of this time. The bulk of the time is spent in get_request()
> > waiting for congestion to clear on the request list for either the sync or
> > async queue. Once a process goes to sleep on that waitqueue, it has to wait
> > until enough requests on that queue have been serviced before it gets woken
> > again, at which point it gets priority access to prevent further starvation.
> > This is the Queue To Get Request (Q2G) delay. What we may be seeing here
> > is that the async queue was congested and, on average, we are waiting
> > 10 seconds for it to clear. The maximum value may be bogus for reasons
> > explained later.
> >
> > Is that accurate?
>
> Yes, without getting into excruciating detail.

Good enough, thanks.

> >> I'm not sure I believe that max value. 2064 seconds seems a bit high.
> >
> > It is, so I looked closer at the timestamps and there is a one-hour
> > correction about 4400 seconds into the test. Daylight savings time kicked
> > in on March 31st and the machine is rarely rebooted until this test case
> > came along. It looks like there is a timezone or time misconfiguration
> > on the laptop that starts the machine with the wrong time. NTP must have
> > corrected the time, which skewed the readings in that window severely :(
>
> Not sure I'm buying that argument, as there are no gaps in the blkparse
> output. The logging is not done using wallclock time. I still haven't
> had sufficient time to dig into these numbers.

Ok.

> > It's awkward but it's not like there are standard benchmarks lying around
> > and it seemed the best way to reproduce the problems I typically see early
> > in the lifetime of a system or when running a git checkout when the tree
> > has not been used in a few hours. Run the actual test with
> >
> > ./run-mmtests.sh --config configs/config-global-dhp__io-multiple-source-latency --run-monitor test-name-of-your-choice
> >
> > Results will be in work/log. You'll need to run this as root so it
> > can run blktrace and so it can drop_caches between git checkouts
> > (to force disk IO). If systemtap craps out on you, then edit
> > configs/config-global-dhp__io-multiple-source-latency and remove dstate
> > from MONITORS_GZIP.
>
> And how do I determine whether I've hit the problem?

If systemtap is available, then

cat work/log/dstate-TESTNAME-gitcheckout | subreport/stap-dstate-frequency

will give you a report on the worst stalls and the stack traces when those stalls occurred. If the stalls are 10+ seconds then you've certainly hit the problem. Alternatively,

cd work/log
../../compare-kernels.sh

Look at the average time it takes to run the git checkout. Is it abnormally high compared to a run with no parallel IO?
If you do not know what the normal time is, run with

./run-mmtests.sh --config configs/config-global-dhp__io-multiple-source-latency --no-monitor test-name-no-monitor

The monitors are what's opening the maildir and generating the parallel IO. If there is an excessive difference between the git checkout times, then you've hit the problem.

Furthermore, look at the await times. If they do not appear in the compare-kernels.sh output, then run

../../bin/compare-mmtests.pl -d . -b gitcheckout -n test-name-of-your-choice --print-monitor iostat

and look at the await times. Are they very high? If so, you've hit the problem. If you want a better look at the await figures over time, either look at the iostat file or you can graph it with

../../bin/graph-mmtests.sh -d . -b gitcheckout -n test-name-of-your-choice --print-monitor iostat --sub-heading sda-await

where sda-await should be substituted for whatever disk you're running the test on.

> > If you have trouble getting this running, ping me on IRC.
>
> Yes, I'm having issues getting things to go, but you didn't provide me a
> time zone, an irc server or a nick to help me find you. Was that
> intentional? ;-)

Not consciously :) . I'm in the IST timezone (GMT+1), on the OFTC IRC network in the #mm channel. If it's my evening I'm not always responsive, so send me the error output by email and I'll try fixing any problems that way.
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 325bc01..1be0ccb 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -640,6 +640,7 @@ do_get_write_access(handle_t *handle, struct journal_head *jh,
 	int error;
 	char *frozen_buffer = NULL;
 	int need_copy = 0;
+	unsigned long start_lock, time_lock;
 
 	if (is_handle_aborted(handle))
 		return -EROFS;
@@ -655,9 +656,16 @@ repeat:
 
 	/* @@@ Need to check for errors here at some point. */
 
+	start_lock = jiffies;
 	lock_buffer(bh);
 	jbd_lock_bh_state(bh);
 
+	/* If it takes too long to lock the buffer, trace it */
+	time_lock = jbd2_time_diff(start_lock, jiffies);
+	if (time_lock > HZ/10)
+		trace_jbd2_lock_buffer_stall(bh->b_bdev->bd_dev,
+			jiffies_to_msecs(time_lock));
+
 	/* We now hold the buffer lock so it is safe to query the buffer
 	 * state.  Is the buffer dirty?
 	 *
diff --git a/include/trace/events/jbd2.h b/include/trace/events/jbd2.h
index 070df49..c1d1f3e 100644
--- a/include/trace/events/jbd2.h
+++ b/include/trace/events/jbd2.h
@@ -358,6 +358,27 @@ TRACE_EVENT(jbd2_write_superblock,
 		  MINOR(__entry->dev), __entry->write_op)
 );
 
+TRACE_EVENT(jbd2_lock_buffer_stall,
+
+	TP_PROTO(dev_t dev, unsigned long stall_ms),
+
+	TP_ARGS(dev, stall_ms),
+
+	TP_STRUCT__entry(
+		__field(        dev_t,  dev     )
+		__field(unsigned long,  stall_ms        )
+	),
+
+	TP_fast_assign(
+		__entry->dev            = dev;
+		__entry->stall_ms       = stall_ms;
+	),
+
+	TP_printk("dev %d,%d stall_ms %lu",
+		  MAJOR(__entry->dev), MINOR(__entry->dev),
+		  __entry->stall_ms)
+);
+
 #endif /* _TRACE_JBD2_H */
 
 /* This part must be outside protection */
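[Assuming the patch applies, the new tracepoint should be reachable through the standard ftrace event interface -- the paths below are the stock tracefs layout with debugfs mounted at /sys/kernel/debug, not anything specific to this patch:]

echo 1 > /sys/kernel/debug/tracing/events/jbd2/jbd2_lock_buffer_stall/enable
cat /sys/kernel/debug/tracing/trace_pipe

[Each line of output carries the device numbers and the stall time in milliseconds per the TP_printk() format above, so any lock_buffer() hold longer than the HZ/10 threshold can be lined up against whatever else the system was doing at the time.]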