Message ID | 20230322013353.1843306-1-yi.zhang@huaweicloud.com |
---|---|
Series | ext4, jbd2: journal cycled record transactions between each mount |
On Mar 21, 2023, at 7:33 PM, Zhang Yi <yi.zhang@huaweicloud.com> wrote:
> This patch set adds a new journal option 'JBD2_CYCLE_RECORD' and always
> enables it on ext4. It saves the journal head for a cleanly unmounted
> file system in the journal superblock, which lets us record journal
> transactions continuously across mounts. This helps us backtrack
> through the journal and find the root cause of a corrupted filesystem.
> Analyzing filesystem corruption is currently difficult and yields
> little useful information, especially on real products. The feature is
> useful to some extent, especially for fuzz testing and for deployment
> in some short-running products.

Another interesting side benefit of this change is that it gets a step
closer to the "lazy ext4" (log-structured optimization) that had been
described some time ago at FAST:

https://lwn.net/Articles/720226/
https://www.usenix.org/system/files/conference/fast17/fast17-aghayev.pdf
https://lists.openwall.net/linux-ext4/2017/04/11/1

Essentially, free space in the filesystem (or a large external device)
could be used as a continuous journal, and metadata would only rarely
be checkpointed to the actual filesystem. If the "journal" is close to
wrapping to the start, either the meta/data is checkpointed (if it is
no longer actively used or can make a large write), or re-journaled to
the end of the journal. At remount time, the full journal is read into
memory (discarding old copies of blocks) and this is used to identify
the current metadata rather than reading from the filesystem itself.

This would allow e.g. very efficient flash caching of metadata (and also
journaled data for small writes) for an HDD (or QLC) device.

Cheers, Andreas
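As a rough illustration of the remount step described above (read the whole journal, discard old copies of blocks), here is a minimal C sketch. The record layout and the names `log_record`, `map_entry`, and `build_block_map` are hypothetical; they are not taken from jbd2 or from the FAST paper, and the linear search only keeps the example short.

```c
#include <stddef.h>
#include <stdint.h>

/* One journaled copy of a filesystem block (hypothetical layout). */
struct log_record {
    uint64_t fs_block;  /* block number in the filesystem */
    uint64_t log_pos;   /* position of this copy in the journal */
};

/* In-memory map entry: where the newest copy of fs_block lives. */
struct map_entry {
    uint64_t fs_block;
    uint64_t log_pos;
};

/*
 * Scan records in journal order (oldest first) and keep only the newest
 * copy of each block. Returns the number of map entries used, or -1 if
 * the map is too small. A real implementation would want a hash table
 * or tree instead of a linear search.
 */
static long build_block_map(const struct log_record *log, size_t nrec,
                            struct map_entry *map, size_t map_cap)
{
    size_t used = 0;

    for (size_t i = 0; i < nrec; i++) {
        size_t j;

        /* Look for an existing entry for this filesystem block. */
        for (j = 0; j < used; j++) {
            if (map[j].fs_block == log[i].fs_block)
                break;
        }
        if (j == used) {
            /* First time this block is seen: claim a new slot. */
            if (used == map_cap)
                return -1;
            map[used].fs_block = log[i].fs_block;
            used++;
        }
        /* A later record supersedes any older copy of the same block. */
        map[j].log_pos = log[i].log_pos;
    }
    return (long)used;
}
```

After the scan, a metadata lookup would consult this map first and fall back to the filesystem location only for blocks that have no journaled copy.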
On 2023/3/23 5:34, Andreas Dilger wrote:
> On Mar 21, 2023, at 7:33 PM, Zhang Yi <yi.zhang@huaweicloud.com> wrote:
>> This patch set adds a new journal option 'JBD2_CYCLE_RECORD' and always
>> enables it on ext4. It saves the journal head for a cleanly unmounted
>> file system in the journal superblock, which lets us record journal
>> transactions continuously across mounts. This helps us backtrack
>> through the journal and find the root cause of a corrupted filesystem.
>> Analyzing filesystem corruption is currently difficult and yields
>> little useful information, especially on real products. The feature is
>> useful to some extent, especially for fuzz testing and for deployment
>> in some short-running products.
>
> Another interesting side benefit of this change is that it gets a step
> closer to the "lazy ext4" (log-structured optimization) that had been
> described some time ago at FAST:
>
> https://lwn.net/Articles/720226/
> https://www.usenix.org/system/files/conference/fast17/fast17-aghayev.pdf
> https://lists.openwall.net/linux-ext4/2017/04/11/1
>
> Essentially, free space in the filesystem (or a large external device)
> could be used as a continuous journal, and metadata would only rarely
> be checkpointed to the actual filesystem. If the "journal" is close to
> wrapping to the start, either the meta/data is checkpointed (if it is
> no longer actively used or can make a large write), or re-journaled to
> the end of the journal. At remount time, the full journal is read into
> memory (discarding old copies of blocks) and this is used to identify
> the current metadata rather than reading from the filesystem itself.
>
> This would allow e.g. very efficient flash caching of metadata (and also
> journaled data for small writes) for an HDD (or QLC) device.
>

This is interesting, but the current change looks like just one small
step. It has been almost 6 years since the last discussion I can
find [1]. Is anyone still working on it?

[1] https://lore.kernel.org/linux-ext4/6B0F0C59-6930-41B3-8EE4-EA5BEECEB9F9@dilger.ca/

Thanks,
Yi.
On Wed, 22 Mar 2023 09:33:50 +0800, Zhang Yi wrote:
> v4->v5:
> - Update doc about journal superblock in journal.rst.
> v3->v4:
> - Remove journal_cycle_record mount option, always enable it on ext4.
> v2->v3:
> - Prevent warning if mounting an old image with journal_cycle_record
>   enabled.
> - Limit this mount option to ext4 images only.
> v1->v2:
> - Fix the format type warning.
> - Add more checks of journal_cycle_record mount options on remount.
>
> [...]

Applied, thanks!

[1/3] jbd2: continue to record log between each mount
      commit: 0311c8729c0a35114d64a64f8977e7d9bec926df
[2/3] ext4: add journal cycled recording support
      commit: b956fe38a26861bfe13e7e83fbeadf9d2e159366
[3/3] ext4: update doc about journal superblock description
      commit: ecdae6e9d63414b263ab2848ba3835e727eef2f9

Best regards,
From: Zhang Yi <yi.zhang@huawei.com>

v4->v5:
 - Update doc about journal superblock in journal.rst.
v3->v4:
 - Remove journal_cycle_record mount option, always enable it on ext4.
v2->v3:
 - Prevent warning if mounting an old image with journal_cycle_record
   enabled.
 - Limit this mount option to ext4 images only.
v1->v2:
 - Fix the format type warning.
 - Add more checks of journal_cycle_record mount options on remount.

Hello!

This patch set adds a new journal option 'JBD2_CYCLE_RECORD' and always
enables it on ext4. It saves the journal head for a cleanly unmounted
file system in the journal superblock, which lets us record journal
transactions continuously across mounts. This helps us backtrack
through the journal and find the root cause of a corrupted filesystem.
Analyzing filesystem corruption is currently difficult and yields
little useful information, especially on real products. The feature is
useful to some extent, especially for fuzz testing and for deployment
in some short-running products.

I've sent out the corresponding e2fsprogs part (v2) separately [1].
Both parts have been run through the test cases below and also pass
xfstests in auto mode.

 - Mount a filesystem with an empty journal.
 - Mount a filesystem whose journal ends in an unrecovered complete
   transaction.
 - Mount a filesystem whose journal ends in an incomplete transaction.
 - Mount a corrupted filesystem with an out-of-bounds journal s_head.
 - Mount an old filesystem without the journal s_head set.

Any comments are welcome.

[1] https://lore.kernel.org/linux-ext4/20230317091716.4150992-1-yi.zhang@huaweicloud.com

Thanks!
Yi.
v4: https://lore.kernel.org/linux-ext4/20230317090926.4149399-1-yi.zhang@huaweicloud.com/
v3: https://lore.kernel.org/linux-ext4/20230314140522.3266591-1-yi.zhang@huaweicloud.com/
v2: https://lore.kernel.org/linux-ext4/20230202142224.3679549-1-yi.zhang@huawei.com/
v1: https://lore.kernel.org/linux-ext4/20230119034600.3431194-3-yi.zhang@huaweicloud.com/

Zhang Yi (3):
  jbd2: continue to record log between each mount
  ext4: add journal cycled recording support
  ext4: update doc about journal superblock description

 Documentation/filesystems/ext4/journal.rst |  7 ++++++-
 fs/ext4/super.c                            |  5 +++++
 fs/jbd2/journal.c                          | 18 ++++++++++++++++--
 fs/jbd2/recovery.c                         | 22 +++++++++++++++++-----
 include/linux/jbd2.h                       |  9 +++++++--
 5 files changed, 51 insertions(+), 10 deletions(-)
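
For readers who want a concrete picture of the clean-journal mount cases in the cover letter (saved s_head present, s_head out of bounds, old filesystem without s_head), the following is a minimal C sketch and not the code from the patches. The struct `journal_sb_view` and the functions `pick_journal_head` and `next_log_block` are hypothetical, and two assumptions are made: that an s_head of 0 means "never set", and that the usable log area is the block range [s_first, s_maxlen). Journals that still need recovery are outside the scope of the sketch.

```c
#include <stdint.h>

/* Hypothetical, simplified view of the fields involved; not the on-disk layout. */
struct journal_sb_view {
    uint32_t s_first;   /* first usable log block */
    uint32_t s_maxlen;  /* log area is assumed to be [s_first, s_maxlen) */
    uint32_t s_head;    /* head saved at last clean unmount; 0 assumed to mean unset */
};

enum head_result { HEAD_OK = 0, HEAD_CORRUPT = -1 };

/*
 * Choose where a clean journal should continue recording.
 * - cycle recording off, or no saved head: start from s_first as before.
 * - saved head outside the log area: report corruption.
 * - otherwise: continue from the saved head.
 */
static int pick_journal_head(const struct journal_sb_view *sb,
                             int cycle_record, uint32_t *head)
{
    if (!cycle_record || sb->s_head == 0) {
        *head = sb->s_first;
        return HEAD_OK;
    }
    if (sb->s_head < sb->s_first || sb->s_head >= sb->s_maxlen)
        return HEAD_CORRUPT;
    *head = sb->s_head;
    return HEAD_OK;
}

/* Advance one block in the circular log, wrapping from the end to s_first. */
static uint32_t next_log_block(const struct journal_sb_view *sb, uint32_t blk)
{
    blk++;
    return blk >= sb->s_maxlen ? sb->s_first : blk;
}
```

Because the log is circular, continuing from the saved head instead of s_first means older transactions are overwritten only once the log wraps around, which is what keeps a longer history available for backtracking.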