[RFC,0/4] make jbd2 debug switch per device

Message ID cover.1611287342.git.brookxu@tencent.com

Message

brookxu Jan. 22, 2021, 6:43 a.m. UTC
On a multi-disk machine, the jbd2 debugging switch is global, so the
logs of all disks are mixed together. It is hard to tell the logs of
each disk apart, and the amount of generated logs is very large. A
separate debugging switch for each disk would be better, so that you
can easily pick out the logs of a particular disk.

We can enable jbd2 debugging for a single device as follows:
echo X > /proc/fs/jbd2/sdX/jbd2_debug

There is one small disadvantage. Because the debugging switch is
placed in the journal_t object, messages emitted before the object is
initialized are lost. However, this usually does not have much impact
on debugging.
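
To make the idea concrete, here is a minimal userspace sketch of a
per-journal debug threshold. It is not the code from this series:
struct journal_sketch, j_debug_level and jbd2_log_sketch() are
hypothetical names, and the real switch would live in journal_t and
print through the kernel log rather than into a caller-supplied buffer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical stand-in for journal_t; only the fields this sketch needs. */
struct journal_sketch {
	const char *j_devname;	/* device name, e.g. "sda1" (illustrative) */
	int j_debug_level;	/* per-journal threshold, 0 means silent */
};

/*
 * Format a message into buf only when its level does not exceed the
 * threshold of this particular journal. Returns the number of
 * characters formatted, or 0 if the message was suppressed.
 */
int jbd2_log_sketch(const struct journal_sketch *j, int level,
		    char *buf, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(j != NULL && buf != NULL && len > 0);
	buf[0] = '\0';

	/* The check is per journal, not against one global variable. */
	if (level > j->j_debug_level)
		return 0;

	va_start(ap, fmt);
	n = vsnprintf(buf, len, fmt, ap);
	va_end(ap);
	return n;
}
```

With two journals at levels 0 and 2, a level-1 message is dropped for
the first and formatted for the second, so only the disk under
observation produces output.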

Chunguang Xu (4):
  jbd2: make jbd2_debug module parameter per device
  jbd2: introduce some new log interfaces
  jbd2: replace jbd_debug with the new log interface
  ext4: replace jbd_debug with the new log interface

 fs/ext4/balloc.c      |   2 +-
 fs/ext4/ext4_jbd2.c   |   3 +-
 fs/ext4/fast_commit.c |  64 ++++++++++++----------
 fs/ext4/indirect.c    |   4 +-
 fs/ext4/inode.c       |   3 +-
 fs/ext4/namei.c       |  10 ++--
 fs/ext4/super.c       |  16 +++---
 fs/jbd2/checkpoint.c  |   6 +--
 fs/jbd2/commit.c      |  36 ++++++-------
 fs/jbd2/journal.c     | 122 +++++++++++++++++++++++++++---------------
 fs/jbd2/recovery.c    |  59 ++++++++++----------
 fs/jbd2/revoke.c      |   8 +--
 fs/jbd2/transaction.c |  35 ++++++------
 include/linux/jbd2.h  |  65 ++++++++++++++++------
 14 files changed, 257 insertions(+), 176 deletions(-)

Comments

Jan Kara Jan. 25, 2021, 12:41 p.m. UTC | #1
On Fri 22-01-21 14:43:18, Chunguang Xu wrote:
> On a multi-disk machine, because jbd2 debugging switch is global, this
> confuses the logs of multiple disks. It is not easy to distinguish the
> logs of each disk and the amount of generated logs is very large. Or a
> separate debugging switch for each disk would be better, so that you
> can easily distinguish the logs of a certain disk. 
> 
> We can enable jbd2 debugging of a device in the following ways:
> echo X > /proc/fs/jbd2/sdX/jbd2_debug
> 
> But there is a small disadvantage here. Because the debugging switch is
> placed in the journal_t object, the log before the object is initialized
> will be lost. However, usually this will not have much impact on
> debugging.

OK, I didn't look at the series yet but I'm wondering: How are you using
jbd2 debugging? I mean obviously it isn't meant for production use but
rather for debugging JBD2 bugs so I'm kind of wondering in which case too
many messages matter.

And if the problem is that there's a problem with distinguishing messages
from multiple filesystems, then it would be perhaps more useful to add
journal identification to each message similarly as we do it with ext4
messages (likely by using journal->j_dev) - which is very simple to do
after your patches 3 and 4.
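
As a rough illustration of this suggestion, the sketch below prefixes
each message with a device name, in the style of the
"EXT4-fs (sda1): ..." messages. The struct, the function name and the
"JBD2 (dev):" prefix are assumptions made for illustration only; in the
kernel the identifier would be derived from journal->j_dev.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical: a plain string stands in for an id derived from journal->j_dev. */
struct journal_id_sketch {
	const char *dev_name;
};

/*
 * Build "JBD2 (<dev>): <msg>" in buf so that every message can be
 * attributed to one journal. Returns what snprintf() returns, i.e.
 * the length the full string needs (excluding the trailing NUL).
 */
int jbd2_format_with_dev(const struct journal_id_sketch *j,
			 char *buf, size_t len, const char *msg)
{
	assert(j != NULL && buf != NULL && len > 0);
	return snprintf(buf, len, "JBD2 (%s): %s", j->dev_name, msg);
}
```

A fixed prefix like this lets a simple text filter separate the
journals, although every message is still generated and has to be
filtered afterwards.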

								Honza
brookxu Jan. 25, 2021, 1:59 p.m. UTC | #2
Thanks for your reply.

Jan Kara wrote on 2021/1/25 20:41:
> On Fri 22-01-21 14:43:18, Chunguang Xu wrote:
>> On a multi-disk machine, because jbd2 debugging switch is global, this
>> confuses the logs of multiple disks. It is not easy to distinguish the
>> logs of each disk and the amount of generated logs is very large. Or a
>> separate debugging switch for each disk would be better, so that you
>> can easily distinguish the logs of a certain disk. 
>>
>> We can enable jbd2 debugging of a device in the following ways:
>> echo X > /proc/fs/jbd2/sdX/jbd2_debug
>>
>> But there is a small disadvantage here. Because the debugging switch is
>> placed in the journal_t object, the log before the object is initialized
>> will be lost. However, usually this will not have much impact on
>> debugging.
> 
> OK, I didn't look at the series yet but I'm wondering: How are you using
> jbd2 debugging? I mean obviously it isn't meant for production use but
> rather for debugging JBD2 bugs so I'm kind of wondering in which case too
> many messages matter.
We perform stress testing on machines in the test environment, and use scripts
to capture journal-related logs to analyze problems. There are 12 disks on this
machine, and each disk runs different jobs. Our test kernel also adds some
additional function-related logs. If we raise the log level, a large number of
the logs have nothing to do with the disk being observed. These logs are
generated by system agents or coordinated tasks. This makes the log
difficult to analyze.
 
> And if the problem is that there's a problem with distinguishing messages
> from multiple filesystems, then it would be perhaps more useful to add
> journal identification to each message similarly as we do it with ext4
> messages (likely by using journal->j_dev) - which is very simple to do
> after your patches 3 and 4.
Our test kernel did this. Because it changed the log format, I was not sure
whether it would break something, so I did not include that part in this series.
Even if the device information is added, with more disks and a higher log level
there will still be a lot of irrelevant logs, and filtering them consumes a lot
of CPU. Therefore, a device-level switch is provided to make everything simpler.
> 
> 								Honza
>
brookxu Jan. 25, 2021, 2:07 p.m. UTC | #3
Thanks for your reply.

Jan Kara wrote on 2021/1/25 20:41:
> On Fri 22-01-21 14:43:18, Chunguang Xu wrote:
>> On a multi-disk machine, because jbd2 debugging switch is global, this
>> confuses the logs of multiple disks. It is not easy to distinguish the
>> logs of each disk and the amount of generated logs is very large. Or a
>> separate debugging switch for each disk would be better, so that you
>> can easily distinguish the logs of a certain disk.
>>
>> We can enable jbd2 debugging of a device in the following ways:
>> echo X > /proc/fs/jbd2/sdX/jbd2_debug
>>
>> But there is a small disadvantage here. Because the debugging switch is
>> placed in the journal_t object, the log before the object is initialized
>> will be lost. However, usually this will not have much impact on
>> debugging.
>
> OK, I didn't look at the series yet but I'm wondering: How are you using
> jbd2 debugging? I mean obviously it isn't meant for production use but
> rather for debugging JBD2 bugs so I'm kind of wondering in which case too
> many messages matter.
We perform stress testing on machines in the test environment, and use scripts
to capture journal-related logs to analyze problems. There are 12 disks on this
machine, and each disk runs different jobs. Our test kernel also adds some
additional function-related logs. If we raise the log level, a large number of
the logs have nothing to do with the disk being observed. These logs are
generated by system agents or coordinated tasks. This makes the log
difficult to analyze.

> And if the problem is that there's a problem with distinguishing messages
> from multiple filesystems, then it would be perhaps more useful to add
> journal identification to each message similarly as we do it with ext4
> messages (likely by using journal->j_dev) - which is very simple to do
> after your patches 3 and 4.
Our test kernel did this. Because it changed the log format, I was not sure
whether it would break something, so I did not include that part in this series.
Even if the device information is added, with more disks and a higher log level
there will still be a lot of irrelevant logs, and filtering them consumes a lot
of CPU. Therefore, a device-level switch is provided to make everything simpler.
>
>                                                               Honza
>