Message ID | 50FFFFC6.8070308@redhat.com |
---|---|
State | Superseded, archived |
Headers | show |
On Wed 23-01-13 09:20:38, Eric Sandeen wrote: > On 1/23/13 3:44 AM, Jan Kara wrote: > > On Tue 22-01-13 19:37:46, Eric Sandeen wrote: > >> On 1/22/13 5:50 PM, Jan Kara wrote: > >>> On Mon 21-01-13 18:11:30, Ted Tso wrote: > >>>> On Tue, Jan 22, 2013 at 12:04:32AM +0100, Sedat Dilek wrote: > >>>>> > >>>>> Beyond the FUSE/LOOP fun, will you apply this patch to your linux-next GIT tree? > >>>>> > >>>>> Feel free to add... > >>>>> > >>>>> Tested-by: Sedat Dilek <sedat.dilek@gmail.com> > >>>>> > >>>>> A similiar patch for JBD went through your tree into mainline (see [1] and [2]). > >>>> > >>>> I'm not at all convinced that this patch has anything to do with your > >>>> problem. I don't see how it could affect things, and I believe you > >>>> mentioned that you saw the problem even with this patch applied? (I'm > >>>> not sure; some of your messages which you sent were hard to > >>>> understand, and you mentioned something about trying to send messages > >>>> when low on sleep :-). > >>>> > >>>> In any case, the reason why I haven't pulled this patch into the ext4 > >>>> tree is because I was waiting for Eric and some of the performance > >>>> team folks at Red Hat to supply some additional information about why > >>>> this commit was making a difference in performance for a particular > >>>> proprietary, closed source benchmark. > >>> Just a small correction - it was aim7 AFAIK which isn't closed source > >>> (anymore). You can download it from SourceForge > >>> (http://sourceforge.net/projects/aimbench/files/aim-suite7/Initial%20release/). > >>> Now I have some reservations about what the benchmark does but historically > >>> it has found quite a few issues for us as well. > >>> > >>>> I'm very suspicious about applying patches under the "cargo cult" > >>>> school of programming. ("We don't understand why it makes a > >>>> difference, but it seems to be good, so bombs away!" :-) > >>> Well, neither am I ;) But it is obvious the patch speeds up > >>> log_start_commit() by 'a bit' (taking spinlock, disabling irqs, ...). And > >>> apparently 'a bit' is noticeable for particular workload on a particular > >>> machine - commit statistics Eric provided showed that clearly. I'd still be > >>> happier if Eric also told us how much log_start_commit() calls there were > >>> so that one could verify that 'a bit' could indeed multiply to a measurable > >>> difference. But given how simple the patch is, I gave away after a while > >>> and just merged it... > >> > >> I am still trying to get our perf guys to collect that data, FWIW... > >> I will send it when I get it. I bugged them again today. :) > >> > >> (Just to be sure: I was going to measure the wakeups the old way, and the > >> avoided wakeups with the new change; sound ok?) > > Yes, that would be what I'm interested in. > > Holy cow, this is much more than I expected, but here's what they report: > > old JBD: AIM7 jobs/min 97624.39; got 78193 jbd wakeups > new JBD: AIM7 jobs/min 85929.43; got 6306999 jbd wakeups, 6264684 extra wakeups Yeah, that's a lot. My guess would be *a lot* of processes are hanging in start_this_handle() and waiting for transaction commit. Each of them calls __log_start_commit() and things add up. Thanks for getting these numbers. Honza
On Wed, Jan 23, 2013 at 08:29:11PM +0100, Jan Kara wrote: > > old JBD: AIM7 jobs/min 97624.39; got 78193 jbd wakeups > > new JBD: AIM7 jobs/min 85929.43; got 6306999 jbd wakeups, 6264684 extra wakeups > Yeah, that's a lot. My guess would be *a lot* of processes are hanging in > start_this_handle() and waiting for transaction commit. Each of them calls > __log_start_commit() and things add up. Thanks for getting these numbers. Yeah, wow. That would imply that there are a huge number of processes that get hung up in start_this_handle(), and they are waking up the journal before the kjournald has a chance to set T_LOCKED (since then they would get blocked earlier in start_this_handle). Given that, I wonder if the following change would actually help or hurt things. Eric, would you be willing to ask your perf team to try testing out these patches? - Ted -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed 30-01-13 00:26:58, Ted Tso wrote: > On Wed, Jan 23, 2013 at 08:29:11PM +0100, Jan Kara wrote: > > > old JBD: AIM7 jobs/min 97624.39; got 78193 jbd wakeups > > > new JBD: AIM7 jobs/min 85929.43; got 6306999 jbd wakeups, 6264684 extra wakeups > > Yeah, that's a lot. My guess would be *a lot* of processes are hanging in > > start_this_handle() and waiting for transaction commit. Each of them calls > > __log_start_commit() and things add up. Thanks for getting these numbers. > > Yeah, wow. That would imply that there are a huge number of processes > that get hung up in start_this_handle(), and they are waking up the > journal before the kjournald has a chance to set T_LOCKED (since then > they would get blocked earlier in start_this_handle). > > Given that, I wonder if the following change would actually help or > hurt things. Eric, would you be willing to ask your perf team to try > testing out these patches? Umm, I don't see anything. Forgot to attach them? Honza
On Wed, Jan 30, 2013 at 11:07 AM, Jan Kara <jack@suse.cz> wrote: > On Wed 30-01-13 00:26:58, Ted Tso wrote: >> On Wed, Jan 23, 2013 at 08:29:11PM +0100, Jan Kara wrote: >> > > old JBD: AIM7 jobs/min 97624.39; got 78193 jbd wakeups >> > > new JBD: AIM7 jobs/min 85929.43; got 6306999 jbd wakeups, 6264684 extra wakeups >> > Yeah, that's a lot. My guess would be *a lot* of processes are hanging in >> > start_this_handle() and waiting for transaction commit. Each of them calls >> > __log_start_commit() and things add up. Thanks for getting these numbers. >> >> Yeah, wow. That would imply that there are a huge number of processes >> that get hung up in start_this_handle(), and they are waking up the >> journal before the kjournald has a chance to set T_LOCKED (since then >> they would get blocked earlier in start_this_handle). >> >> Given that, I wonder if the following change would actually help or >> hurt things. Eric, would you be willing to ask your perf team to try >> testing out these patches? > Umm, I don't see anything. Forgot to attach them? > Here I catched the two patches: http://patchwork.ozlabs.org/patch/216768/ http://patchwork.ozlabs.org/patch/216767/ - Sedat - > Honza > -- > Jan Kara <jack@suse.cz> > SUSE Labs, CR -- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed 30-01-13 11:29:02, Sedat Dilek wrote: > On Wed, Jan 30, 2013 at 11:07 AM, Jan Kara <jack@suse.cz> wrote: > > On Wed 30-01-13 00:26:58, Ted Tso wrote: > >> On Wed, Jan 23, 2013 at 08:29:11PM +0100, Jan Kara wrote: > >> > > old JBD: AIM7 jobs/min 97624.39; got 78193 jbd wakeups > >> > > new JBD: AIM7 jobs/min 85929.43; got 6306999 jbd wakeups, 6264684 extra wakeups > >> > Yeah, that's a lot. My guess would be *a lot* of processes are hanging in > >> > start_this_handle() and waiting for transaction commit. Each of them calls > >> > __log_start_commit() and things add up. Thanks for getting these numbers. > >> > >> Yeah, wow. That would imply that there are a huge number of processes > >> that get hung up in start_this_handle(), and they are waking up the > >> journal before the kjournald has a chance to set T_LOCKED (since then > >> they would get blocked earlier in start_this_handle). > >> > >> Given that, I wonder if the following change would actually help or > >> hurt things. Eric, would you be willing to ask your perf team to try > >> testing out these patches? > > Umm, I don't see anything. Forgot to attach them? > > > > Here I catched the two patches: > > http://patchwork.ozlabs.org/patch/216768/ > http://patchwork.ozlabs.org/patch/216767/ Ah, OK. Thanks for the pointer. Honza
diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c index d492d57..3e0c4eb 100644 --- a/fs/jbd/journal.c +++ b/fs/jbd/journal.c @@ -433,15 +433,25 @@ int __log_space_left(journal_t *journal) return left; } +unsigned long jbd_wakeups; +unsigned long jbd_extra_wakeups; + /* * Called under j_state_lock. Returns true if a transaction commit was started. */ int __log_start_commit(journal_t *journal, tid_t target) { /* - * Are we already doing a recent enough commit? + * The only transaction we can possibly wait upon is the + * currently running transaction (if it exists). Otherwise, + * the target tid must be an old one. */ - if (!tid_geq(journal->j_commit_request, target)) { + if (/* journal->j_commit_request != target && <--- ERS: Undo "fix" */ + journal->j_running_transaction && + journal->j_running_transaction->t_tid == target) { + /* if we already have the right target, this is extra */ + if (journal->j_commit_request == target) + jbd_extra_wakeups++; /* * We want a new commit: OK, mark the request and wakup the * commit thread. We do _not_ do the commit ourselves. @@ -451,9 +461,17 @@ int __log_start_commit(journal_t *journal, tid_t target) jbd_debug(1, "JBD: requesting commit %d/%d\n", journal->j_commit_request, journal->j_commit_sequence); + jbd_wakeups++; wake_up(&journal->j_wait_commit); return 1; - } + } else if (!tid_geq(journal->j_commit_request, target)) + /* This should never happen, but if it does, preserve + the evidence before kjournald goes into a loop and + increments j_commit_sequence beyond all recognition. */ + WARN_ONCE(1, "jbd: bad log_start_commit: %u %u %u %u\n", + journal->j_commit_request, journal->j_commit_sequence, + target, journal->j_running_transaction ? + journal->j_running_transaction->t_tid : 0); return 0; } @@ -2039,6 +2057,7 @@ static void __exit journal_exit(void) if (n) printk(KERN_EMERG "JBD: leaked %d journal_heads!\n", n); #endif + printk("got %lu jbd wakeups, %lu extra wakeups\n", jbd_wakeups, jbd_extra_wakeups); jbd_remove_debugfs_entry(); journal_destroy_caches(); }