diff mbox series

[1/2] jbd2: add new tracepoint jbd2_sleep_on_shadow

Message ID 20190902145442.1921-1-xiaoguang.wang@linux.alibaba.com
State New
Headers show
Series [1/2] jbd2: add new tracepoint jbd2_sleep_on_shadow | expand

Commit Message

Xiaoguang Wang Sept. 2, 2019, 2:54 p.m. UTC
Sometimes process will be stalled in "wait_on_bit_io(&bh->b_state,
BH_Shadow, TASK_UNINTERRUPTIBLE)" for a while, and in order to analyse
app's latency thoroughly, add a new tracepoint to track this delay.

Trace info likes below:
fsstress-5068  [008] .... 11007.757543: jbd2_sleep_on_shadow: dev 254,17 sleep 1
fsstress-5070  [007] .... 11007.757544: jbd2_sleep_on_shadow: dev 254,17 sleep 2
fsstress-5069  [009] .... 11007.757548: jbd2_sleep_on_shadow: dev 254,17 sleep 2
fsstress-5067  [011] .... 11007.757569: jbd2_sleep_on_shadow: dev 254,17 sleep 1
fsstress-5063  [007] .... 11007.757651: jbd2_sleep_on_shadow: dev 254,17 sleep 2
fsstress-5070  [007] .... 11007.757792: jbd2_sleep_on_shadow: dev 254,17 sleep 0
fsstress-5071  [011] .... 11007.763493: jbd2_sleep_on_shadow: dev 254,17 sleep 1

Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
---
 fs/jbd2/transaction.c       |  3 +++
 include/trace/events/jbd2.h | 21 +++++++++++++++++++++
 2 files changed, 24 insertions(+)

Comments

Theodore Ts'o Sept. 7, 2019, 4:21 p.m. UTC | #1
On Mon, Sep 02, 2019 at 10:54:41PM +0800, Xiaoguang Wang wrote:
> Sometimes process will be stalled in "wait_on_bit_io(&bh->b_state,
> BH_Shadow, TASK_UNINTERRUPTIBLE)" for a while, and in order to analyse
> app's latency thoroughly, add a new tracepoint to track this delay.
> 
> Trace info likes below:
> fsstress-5068  [008] .... 11007.757543: jbd2_sleep_on_shadow: dev 254,17 sleep 1
> fsstress-5070  [007] .... 11007.757544: jbd2_sleep_on_shadow: dev 254,17 sleep 2
> fsstress-5069  [009] .... 11007.757548: jbd2_sleep_on_shadow: dev 254,17 sleep 2
> fsstress-5067  [011] .... 11007.757569: jbd2_sleep_on_shadow: dev 254,17 sleep 1
> fsstress-5063  [007] .... 11007.757651: jbd2_sleep_on_shadow: dev 254,17 sleep 2
> fsstress-5070  [007] .... 11007.757792: jbd2_sleep_on_shadow: dev 254,17 sleep 0
> fsstress-5071  [011] .... 11007.763493: jbd2_sleep_on_shadow: dev 254,17 sleep 1
> 
> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>

I think maybe it might be better to use units of microseconds and then
change sleep to usleep so the units are clear?  This is a spinlock, so
it should be quick.

For the other patch in this series, milliseconds seems fine, but if we
change the trace info to use "msleep" instead that would be clearer
--- or you could change it to use microseconds as well just for
consistency; I think either would be fine.

What do you think?

Cheers,

						- Ted
Xiaoguang Wang Sept. 11, 2019, 6:52 a.m. UTC | #2
hi,

Thanks for reviewing.
> On Mon, Sep 02, 2019 at 10:54:41PM +0800, Xiaoguang Wang wrote:
>> Sometimes process will be stalled in "wait_on_bit_io(&bh->b_state,
>> BH_Shadow, TASK_UNINTERRUPTIBLE)" for a while, and in order to analyse
>> app's latency thoroughly, add a new tracepoint to track this delay.
>>
>> Trace info likes below:
>> fsstress-5068  [008] .... 11007.757543: jbd2_sleep_on_shadow: dev 254,17 sleep 1
>> fsstress-5070  [007] .... 11007.757544: jbd2_sleep_on_shadow: dev 254,17 sleep 2
>> fsstress-5069  [009] .... 11007.757548: jbd2_sleep_on_shadow: dev 254,17 sleep 2
>> fsstress-5067  [011] .... 11007.757569: jbd2_sleep_on_shadow: dev 254,17 sleep 1
>> fsstress-5063  [007] .... 11007.757651: jbd2_sleep_on_shadow: dev 254,17 sleep 2
>> fsstress-5070  [007] .... 11007.757792: jbd2_sleep_on_shadow: dev 254,17 sleep 0
>> fsstress-5071  [011] .... 11007.763493: jbd2_sleep_on_shadow: dev 254,17 sleep 1
>>
>> Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
> 
> I think maybe it might be better to use units of microseconds and then
> change sleep to usleep so the units are clear?  This is a spinlock, so
> it should be quick.
Sorry, I may not quite understand you, do you mean that milliseconds is not precise, so
should use microseconds? For these two patches, they do not use usleep or msleep to do
real sleep work, they just record the duration which process takes to wait bh_shadow flag
to be cleared or transaction to be unlocked.

Regards,
Xiaougang Wang

> 
> For the other patch in this series, milliseconds seems fine, but if we
> change the trace info to use "msleep" instead that would be clearer
> --- or you could change it to use microseconds as well just for
> consistency; I think either would be fine.
> 
> What do you think?
> 
> Cheers,
> 
> 						- Ted
>
Theodore Ts'o Sept. 11, 2019, 1:57 p.m. UTC | #3
On Wed, Sep 11, 2019 at 02:52:51PM +0800, Xiaoguang Wang wrote:
> > I think maybe it might be better to use units of microseconds and then
> > change sleep to usleep so the units are clear?  This is a spinlock, so
> > it should be quick.
>
> Sorry, I may not quite understand you, do you mean that milliseconds is not precise, so
> should use microseconds? For these two patches, they do not use usleep or msleep to do
> real sleep work, they just record the duration which process takes to wait bh_shadow flag
> to be cleared or transaction to be unlocked.

Apologies, I should have been clear enough.  Yes, my concern that
milliseconds might not be fine-grained enough.  The sample results
which you showed had values of 2ms, 1ms, and 0ms.  And the single 0ms
result in particular raised the concern that we should use a
microseconds instead of milliseconds.

In fact, instead of "sleep", maybe "latency(us)" or "latency(ms)"
would be a better label?

Regards,

						- Ted
Xiaoguang Wang Sept. 16, 2019, 2:20 p.m. UTC | #4
hi,

> On Wed, Sep 11, 2019 at 02:52:51PM +0800, Xiaoguang Wang wrote:
>>> I think maybe it might be better to use units of microseconds and then
>>> change sleep to usleep so the units are clear?  This is a spinlock, so
>>> it should be quick.
>>
>> Sorry, I may not quite understand you, do you mean that milliseconds is not precise, so
>> should use microseconds? For these two patches, they do not use usleep or msleep to do
>> real sleep work, they just record the duration which process takes to wait bh_shadow flag
>> to be cleared or transaction to be unlocked.
> 
> Apologies, I should have been clear enough.  Yes, my concern that
> milliseconds might not be fine-grained enough.  The sample results
> which you showed had values of 2ms, 1ms, and 0ms.  And the single 0ms
> result in particular raised the concern that we should use a
> microseconds instead of milliseconds.
> 
> In fact, instead of "sleep", maybe "latency(us)" or "latency(ms)"
> would be a better label?
OK, I'll update a v2, thanks.

Regards,
Xiaoguang Wang

> 
> Regards,
> 
> 						- Ted
>
diff mbox series

Patch

diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index 990e7b5062e7..84974fb9d4f9 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -991,7 +991,10 @@  do_get_write_access(handle_t *handle, struct journal_head *jh,
 	if (buffer_shadow(bh)) {
 		JBUFFER_TRACE(jh, "on shadow: sleep");
 		jbd_unlock_bh_state(bh);
+		start_lock = jiffies;
 		wait_on_bit_io(&bh->b_state, BH_Shadow, TASK_UNINTERRUPTIBLE);
+		trace_jbd2_sleep_on_shadow(bh->b_bdev->bd_dev,
+			jiffies_to_msecs(jiffies - start_lock));
 		goto repeat;
 	}
 
diff --git a/include/trace/events/jbd2.h b/include/trace/events/jbd2.h
index 2310b259329f..2f048fdb63c6 100644
--- a/include/trace/events/jbd2.h
+++ b/include/trace/events/jbd2.h
@@ -380,6 +380,27 @@  TRACE_EVENT(jbd2_lock_buffer_stall,
 		__entry->stall_ms)
 );
 
+TRACE_EVENT(jbd2_sleep_on_shadow,
+
+	TP_PROTO(dev_t dev, unsigned long stall_ms),
+
+	TP_ARGS(dev, stall_ms),
+
+	TP_STRUCT__entry(
+		__field(        dev_t, dev	)
+		__field(unsigned long, stall_ms	)
+	),
+
+	TP_fast_assign(
+		__entry->dev		= dev;
+		__entry->stall_ms	= stall_ms;
+	),
+
+	TP_printk("dev %d,%d sleep %lu",
+		MAJOR(__entry->dev), MINOR(__entry->dev),
+		__entry->stall_ms)
+);
+
 #endif /* _TRACE_JBD2_H */
 
 /* This part must be outside protection */