[v5,0/3] ext4, jbd2: journal cycled record transactions between each mount

Message ID 20230322013353.1843306-1-yi.zhang@huaweicloud.com

Message

Zhang Yi March 22, 2023, 1:33 a.m. UTC
From: Zhang Yi <yi.zhang@huawei.com>

v4->v5:
 - Update doc about journal superblock in journal.rst.
v3->v4:
 - Remove journal_cycle_record mount option, always enable it on ext4.
v2->v3:
 - Prevent a warning when mounting an old image with journal_cycle_record
   enabled.
 - Limit this mount option to ext4 images only.
v1->v2:
 - Fix the format type warning.
 - Add more checks of the journal_cycle_record mount option on remount.

Hello!

This patch set adds a new journal option, 'JBD2_CYCLE_RECORD', and always
enables it on ext4. It saves the journal head of a cleanly unmounted file
system in the journal superblock, which lets us keep recording journal
transactions continuously across mounts rather than restarting from the
beginning of the journal. This helps us backtrack through the journal and
find the root cause of a corrupted filesystem. Analyzing filesystem
corruption is currently difficult because little useful information is
available, especially on real products. The retained history is useful to
some extent, especially when running fuzz tests and when deploying in some
short-running products.

I've sent out the corresponding e2fsprogs part (v2) separately [1]. Both
parts have been run through the test cases below and also pass xfstests in
auto mode.
 - Mount a filesystem with an empty journal.
 - Mount a filesystem whose journal ends in an unrecovered complete
   transaction.
 - Mount a filesystem whose journal ends in an incomplete transaction.
 - Mount a corrupted filesystem with an out-of-bounds journal s_head.
 - Mount an old filesystem without the journal s_head set.
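
To make the empty-journal, out-of-bounds and old-image cases above concrete,
here is a minimal user-space sketch of how a mount might choose where logging
resumes on a clean journal. It is a model only, not the code from this series:
the function name and the `first`/`last` parameters are hypothetical, and
treating s_head == 0 as "never set" is an assumption.

```python
def initial_head(s_head: int, first: int, last: int) -> int:
    """Pick the block where logging resumes on a cleanly unmounted journal.

    Hypothetical model: `first` and `last` bound the usable log area
    (first <= block < last); `s_head` is the head saved in the journal
    superblock, assumed to be 0 if the image predates the field.
    """
    if s_head == 0:
        # Old image without a saved head: start from the beginning.
        return first
    if s_head < first or s_head >= last:
        # Corrupted, out-of-bounds head: fall back rather than trust it.
        return first
    # Valid saved head: continue after the previous mount's transactions.
    return s_head
```

A journal that still needs recovery is deliberately left out of this model;
in that case the log has to be replayed before a head can be chosen.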

Any comments are welcome.

[1] https://lore.kernel.org/linux-ext4/20230317091716.4150992-1-yi.zhang@huaweicloud.com

Thanks!
Yi.

v4: https://lore.kernel.org/linux-ext4/20230317090926.4149399-1-yi.zhang@huaweicloud.com/
v3: https://lore.kernel.org/linux-ext4/20230314140522.3266591-1-yi.zhang@huaweicloud.com/
v2: https://lore.kernel.org/linux-ext4/20230202142224.3679549-1-yi.zhang@huawei.com/
v1: https://lore.kernel.org/linux-ext4/20230119034600.3431194-3-yi.zhang@huaweicloud.com/

Zhang Yi (3):
  jbd2: continue to record log between each mount
  ext4: add journal cycled recording support
  ext4: update doc about journal superblock description

 Documentation/filesystems/ext4/journal.rst |  7 ++++++-
 fs/ext4/super.c                            |  5 +++++
 fs/jbd2/journal.c                          | 18 ++++++++++++++++--
 fs/jbd2/recovery.c                         | 22 +++++++++++++++++-----
 include/linux/jbd2.h                       |  9 +++++++--
 5 files changed, 51 insertions(+), 10 deletions(-)

Comments

Andreas Dilger March 22, 2023, 9:34 p.m. UTC | #1
On Mar 21, 2023, at 7:33 PM, Zhang Yi <yi.zhang@huaweicloud.com> wrote:
> This patch set adds a new journal option, 'JBD2_CYCLE_RECORD', and always
> enables it on ext4. It saves the journal head of a cleanly unmounted file
> system in the journal superblock, which lets us keep recording journal
> transactions continuously across mounts. This helps us backtrack through
> the journal and find the root cause of a corrupted filesystem. Analyzing
> filesystem corruption is currently difficult because little useful
> information is available, especially on real products. The retained
> history is useful to some extent, especially when running fuzz tests and
> when deploying in some short-running products.

Another interesting side benefit of this change is that it gets a step
closer to the "lazy ext4" (log-structured optimization) that had been
described some time ago at FAST:

https://lwn.net/Articles/720226/
https://www.usenix.org/system/files/conference/fast17/fast17-aghayev.pdf
https://lists.openwall.net/linux-ext4/2017/04/11/1

Essentially, free space in the filesystem (or a large external device)
could be used as a continuous journal, and metadata would only rarely
be checkpointed to the actual filesystem.  If the "journal" is close to
wrapping to the start, either the meta/data is checkpointed (if it is
no longer actively used or can make a large write), or re-journaled to
the end of the journal.  At remount time, the full journal is read into
memory (discarding old copies of blocks) and this is used to identify
the current metadata rather than reading from the filesystem itself.

This would allow e.g. very efficient flash caching of metadata (and also
journaled data for small writes) for an HDD (or QLC) device.
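
As a rough illustration of the remount step described above (read the whole
journal and keep only the newest copy of each block), here is a minimal
sketch. The transaction layout and the name `latest_blocks` are hypothetical,
and the model ignores revoke records, checksums and sequence-number
wraparound, all of which a real implementation would have to handle.

```python
from typing import Dict, Iterable, List, Tuple

# A transaction is modelled as (sequence number, [(fs block number, data), ...]).
Transaction = Tuple[int, List[Tuple[int, bytes]]]


def latest_blocks(journal: Iterable[Transaction]) -> Dict[int, bytes]:
    """Return the newest journaled copy of each filesystem block."""
    current: Dict[int, bytes] = {}
    # Replay in commit order so later transactions overwrite older copies.
    for _seq, blocks in sorted(journal, key=lambda t: t[0]):
        for blocknr, data in blocks:
            current[blocknr] = data
    return current
```

Lookups would then consult this map first and fall back to the on-disk
filesystem only for blocks that are not present in the journal.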

Cheers, Andreas
Zhang Yi March 23, 2023, 8:20 a.m. UTC | #2
On 2023/3/23 5:34, Andreas Dilger wrote:
> On Mar 21, 2023, at 7:33 PM, Zhang Yi <yi.zhang@huaweicloud.com> wrote:
>> This patch set adds a new journal option, 'JBD2_CYCLE_RECORD', and always
>> enables it on ext4. It saves the journal head of a cleanly unmounted file
>> system in the journal superblock, which lets us keep recording journal
>> transactions continuously across mounts. This helps us backtrack through
>> the journal and find the root cause of a corrupted filesystem. Analyzing
>> filesystem corruption is currently difficult because little useful
>> information is available, especially on real products. The retained
>> history is useful to some extent, especially when running fuzz tests and
>> when deploying in some short-running products.
> 
> Another interesting side benefit of this change is that it gets a step
> closer to the "lazy ext4" (log-structured optimization) that had been
> described some time ago at FAST:
> 
> https://lwn.net/Articles/720226/
> https://www.usenix.org/system/files/conference/fast17/fast17-aghayev.pdf
> https://lists.openwall.net/linux-ext4/2017/04/11/1
> 
> Essentially, free space in the filesystem (or a large external device)
> could be used as a continuous journal, and metadata would only rarely
> be checkpointed to the actual filesystem.  If the "journal" is close to
> wrapping to the start, either the meta/data is checkpointed (if it is
> no longer actively used or can make a large write), or re-journaled to
> the end of the journal.  At remount time, the full journal is read into
> memory (discarding old copies of blocks) and this is used to identify
> the current metadata rather than reading from the filesystem itself.
> 
> This would allow e.g. very efficient flash caching of metadata (and also
> journaled data for small writes) for an HDD (or QLC) device.
> 

This is interesting, but the current change looks like just one small step.
It's been almost 6 years since the last discussion I could find [1]. Is
anyone still working on it?

[1] https://lore.kernel.org/linux-ext4/6B0F0C59-6930-41B3-8EE4-EA5BEECEB9F9@dilger.ca/

Thanks,
Yi.
Theodore Ts'o June 15, 2023, 2:59 p.m. UTC | #3
On Wed, 22 Mar 2023 09:33:50 +0800, Zhang Yi wrote:
> v4->v5:
>  - Update doc about journal superblock in journal.rst.
> v3->v4:
>  - Remove journal_cycle_record mount option, always enable it on ext4.
> v2->v3:
>  - Prevent a warning when mounting an old image with journal_cycle_record
>    enabled.
>  - Limit this mount option to ext4 images only.
> v1->v2:
>  - Fix the format type warning.
>  - Add more checks of the journal_cycle_record mount option on remount.
> 
> [...]

Applied, thanks!

[1/3] jbd2: continue to record log between each mount
      commit: 0311c8729c0a35114d64a64f8977e7d9bec926df
[2/3] ext4: add journal cycled recording support
      commit: b956fe38a26861bfe13e7e83fbeadf9d2e159366
[3/3] ext4: update doc about journal superblock description
      commit: ecdae6e9d63414b263ab2848ba3835e727eef2f9

Best regards,