From patchwork Mon Jan 30 02:11:31 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Ruffell X-Patchwork-Id: 1733688 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@legolas.ozlabs.org Authentication-Results: legolas.ozlabs.org; spf=pass (sender SPF authorized) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Authentication-Results: legolas.ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=canonical.com header.i=@canonical.com header.a=rsa-sha256 header.s=20210705 header.b=P+QGY7kh; dkim-atps=neutral Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-ECDSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by legolas.ozlabs.org (Postfix) with ESMTPS id 4P4sBP6N8Fz23hg for ; Mon, 30 Jan 2023 13:11:56 +1100 (AEDT) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1pMJe6-0005NL-Qj; Mon, 30 Jan 2023 02:11:46 +0000 Received: from smtp-relay-internal-1.internal ([10.131.114.114] helo=smtp-relay-internal-1.canonical.com) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1pMJe3-0005Lj-Nv for kernel-team@lists.ubuntu.com; Mon, 30 Jan 2023 02:11:43 +0000 Received: from mail-pl1-f198.google.com (mail-pl1-f198.google.com [209.85.214.198]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by smtp-relay-internal-1.canonical.com (Postfix) with ESMTPS id 6507E3F2F5 for ; Mon, 30 Jan 2023 02:11:43 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=canonical.com; s=20210705; t=1675044703; bh=cJXFJhPfRKFjOy2Hw2UWO4nmVxpmfCWWIQ8UOIgdQts=; h=From:To:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=P+QGY7kh+nTAR8MAlEV4MQSpxuWuL8RT9W7ThclmIrMNqTZj+ufSZArcNzKM0a03l HKCaE0TSptITB6LSEWUzv4chpDzRSPhH3X3IUEBWtbDbmx49al7wZ8oJW3kxKoWIP8 JrgXbaYwRzTKO0DwfpC2dxeyWdMoQ+yd44RzbVaFXvusIQTShub2kH+7yNy2qp7DeY 2Emno9oJSoKxZBAI+JqrNVTqbRks7th9Mz5LoEFZgpdAxYTcuNO+f3dNN2TUKf8OZ4 pmJsg4WqMJFOK+p4Gx+1mJ7sJOSPaaCu4tY35YZRfmxQUO6LQU8zEGMkK8XIVWQf65 v28zAVyWV3ZkA== Received: by mail-pl1-f198.google.com with SMTP id f8-20020a170902ce8800b00190c6518e21so5772976plg.1 for ; Sun, 29 Jan 2023 18:11:43 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=cJXFJhPfRKFjOy2Hw2UWO4nmVxpmfCWWIQ8UOIgdQts=; b=jdX90IfcEBFSkbeaY3O/T1F1ghSzUZbjm94xsYoBSkiCNU7BCuOfZW34NYYbZRL5S+ u62dbBDuILYuKFIRsuvJrecti8wNLO0rN1poeyFXotuqKx5+rkmPaX24qVv1abYYOboC LVfRfJfLLVxWBk8KDc3EGeLkpJgD2vf1g0Rruv1cA64M39qKgd95npgM/AE/9lqQfto7 tyBTznbeEYT4kVi4DTXl3/jVnFR7GKRjiKqIQ0UHvd+/SyWbZMWSL5c5zzOVjDCPBYrU qxfj4DfIYT8bbAzp6+57KstjsNSpfQ/X/+IOuI9jVprNwaftkcEhS/wq2vTanqIEqR1o +wLA== X-Gm-Message-State: AO0yUKWtdKNy3lAnJ5GhJ6bYctS/SZP9S3/M4VqkhR9YPmOqhELgQCLn lWyxfUgRhFFS2TVHSq6df6oXA7B1VBgUzKJoStKMsDfq2OoKckJ9SVpYeu0Y644OXXXu5SRjSkW NK6g7CpklH5EC9sx83FjaFLjm4miGoGhgfO2r+76PtA== X-Received: by 2002:a05:6a00:1949:b0:593:954e:1b09 with SMTP id s9-20020a056a00194900b00593954e1b09mr7657269pfk.8.1675044701987; Sun, 29 Jan 2023 18:11:41 -0800 (PST) X-Google-Smtp-Source: AK7set9CynerY2piRsrAiK34uf35ezBe4LFYEbLz5MBzFT+n0ppkovIfXCjFhfL6TNmc2ZP0kU1Nhw== X-Received: by 2002:a05:6a00:1949:b0:593:954e:1b09 with SMTP id s9-20020a056a00194900b00593954e1b09mr7657252pfk.8.1675044701616; Sun, 29 Jan 2023 18:11:41 -0800 (PST) Received: from ThinkPad-X1.. (125-239-70-54-fibre.sparkbb.co.nz. [125.239.70.54]) by smtp.gmail.com with ESMTPSA id n16-20020aa79850000000b005818d429d98sm6218468pfq.136.2023.01.29.18.11.40 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 29 Jan 2023 18:11:41 -0800 (PST) From: Matthew Ruffell To: kernel-team@lists.ubuntu.com Subject: [SRU][F][PATCH 1/1] btrfs: correctly calculate item size used when item key collision happens Date: Mon, 30 Jan 2023 15:11:31 +1300 Message-Id: <20230130021131.19564-3-matthew.ruffell@canonical.com> X-Mailer: git-send-email 2.37.2 In-Reply-To: <20230130021131.19564-1-matthew.ruffell@canonical.com> References: <20230130021131.19564-1-matthew.ruffell@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: ethanwu BugLink: https://bugs.launchpad.net/bugs/2004132 Item key collision is allowed for some item types, like dir item and inode refs, but the overall item size is limited by the nodesize. item size(ins_len) passed from btrfs_insert_empty_items to btrfs_search_slot already contains size of btrfs_item. When btrfs_search_slot reaches leaf, we'll see if we need to split leaf. The check incorrectly reports that split leaf is required, because it treats the space required by the newly inserted item as btrfs_item + item data. But in item key collision case, only item data is actually needed, the newly inserted item could merge into the existing one. No new btrfs_item will be inserted. And split_leaf return EOVERFLOW from following code: if (extend && data_size + btrfs_item_size_nr(l, slot) + sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(fs_info)) return -EOVERFLOW; In most cases, when callers receive EOVERFLOW, they either return this error or handle in different ways. For example, in normal dir item creation the userspace will get errno EOVERFLOW; in inode ref case INODE_EXTREF is used instead. However, this is not the case for rename. To avoid the unrecoverable situation in rename, btrfs_check_dir_item_collision is called in early phase of rename. In this function, when item key collision is detected leaf space is checked: data_size = sizeof(*di) + name_len; if (data_size + btrfs_item_size_nr(leaf, slot) + sizeof(struct btrfs_item) > BTRFS_LEAF_DATA_SIZE(root->fs_info)) the sizeof(struct btrfs_item) + btrfs_item_size_nr(leaf, slot) here refers to existing item size, the condition here correctly calculates the needed size for collision case rather than the wrong case above. The consequence of inconsistent condition check between btrfs_check_dir_item_collision and btrfs_search_slot when item key collision happens is that we might pass check here but fail later at btrfs_search_slot. Rename fails and volume is forced readonly [436149.586170] ------------[ cut here ]------------ [436149.586173] BTRFS: Transaction aborted (error -75) [436149.586196] WARNING: CPU: 0 PID: 16733 at fs/btrfs/inode.c:9870 btrfs_rename2+0x1938/0x1b70 [btrfs] [436149.586227] CPU: 0 PID: 16733 Comm: python Tainted: G D 4.18.0-rc5+ #1 [436149.586228] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016 [436149.586238] RIP: 0010:btrfs_rename2+0x1938/0x1b70 [btrfs] [436149.586254] RSP: 0018:ffffa327043a7ce0 EFLAGS: 00010286 [436149.586255] RAX: 0000000000000000 RBX: ffff8d8a17d13340 RCX: 0000000000000006 [436149.586256] RDX: 0000000000000007 RSI: 0000000000000096 RDI: ffff8d8a7fc164b0 [436149.586257] RBP: ffffa327043a7da0 R08: 0000000000000560 R09: 7265282064657472 [436149.586258] R10: 0000000000000000 R11: 6361736e61725420 R12: ffff8d8a0d4c8b08 [436149.586258] R13: ffff8d8a17d13340 R14: ffff8d8a33e0a540 R15: 00000000000001fe [436149.586260] FS: 00007fa313933740(0000) GS:ffff8d8a7fc00000(0000) knlGS:0000000000000000 [436149.586261] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [436149.586262] CR2: 000055d8d9c9a720 CR3: 000000007aae0003 CR4: 00000000003606f0 [436149.586295] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [436149.586296] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [436149.586296] Call Trace: [436149.586311] vfs_rename+0x383/0x920 [436149.586313] ? vfs_rename+0x383/0x920 [436149.586315] do_renameat2+0x4ca/0x590 [436149.586317] __x64_sys_rename+0x20/0x30 [436149.586324] do_syscall_64+0x5a/0x120 [436149.586330] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [436149.586332] RIP: 0033:0x7fa3133b1d37 [436149.586348] RSP: 002b:00007fffd3e43908 EFLAGS: 00000246 ORIG_RAX: 0000000000000052 [436149.586349] RAX: ffffffffffffffda RBX: 00007fa3133b1d30 RCX: 00007fa3133b1d37 [436149.586350] RDX: 000055d8da06b5e0 RSI: 000055d8da225d60 RDI: 000055d8da2c4da0 [436149.586351] RBP: 000055d8da2252f0 R08: 00007fa313782000 R09: 00000000000177e0 [436149.586351] R10: 000055d8da010680 R11: 0000000000000246 R12: 00007fa313840b00 Thanks to Hans van Kranenburg for information about crc32 hash collision tools, I was able to reproduce the dir item collision with following python script. https://github.com/wutzuchieh/misc_tools/blob/master/crc32_forge.py Run it under a btrfs volume will trigger the abort transaction. It simply creates files and rename them to forged names that leads to hash collision. There are two ways to fix this. One is to simply revert the patch 878f2d2cb355 ("Btrfs: fix max dir item size calculation") to make the condition consistent although that patch is correct about the size. The other way is to handle the leaf space check correctly when collision happens. I prefer the second one since it correct leaf space check in collision case. This fix will not account sizeof(struct btrfs_item) when the item already exists. There are two places where ins_len doesn't contain sizeof(struct btrfs_item), however. 1. extent-tree.c: lookup_inline_extent_backref 2. file-item.c: btrfs_csum_file_blocks to make the logic of btrfs_search_slot more clear, we add a flag search_for_extension in btrfs_path. This flag indicates that ins_len passed to btrfs_search_slot doesn't contain sizeof(struct btrfs_item). When key exists, btrfs_search_slot will use the actual size needed to calculate the required leaf space. CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana Signed-off-by: ethanwu Signed-off-by: David Sterba (cherry picked from 9a664971569daf68254928149f580b4f5856d274) Signed-off-by: Matthew Ruffell --- fs/btrfs/ctree.c | 24 ++++++++++++++++++++++-- fs/btrfs/ctree.h | 6 ++++++ fs/btrfs/extent-tree.c | 2 ++ fs/btrfs/file-item.c | 2 ++ 4 files changed, 32 insertions(+), 2 deletions(-) diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c index bd82882f84a0..982d769e5b4e 100644 --- a/fs/btrfs/ctree.c +++ b/fs/btrfs/ctree.c @@ -2756,8 +2756,14 @@ static struct extent_buffer *btrfs_search_slot_get_root(struct btrfs_root *root, * @p: Holds all btree nodes along the search path * @root: The root node of the tree * @key: The key we are looking for - * @ins_len: Indicates purpose of search, for inserts it is 1, for - * deletions it's -1. 0 for plain searches + * @ins_len: Indicates purpose of search: + * >0 for inserts it's size of item inserted (*) + * <0 for deletions + * 0 for plain searches, not modifying the tree + * + * (*) If size of item inserted doesn't include + * sizeof(struct btrfs_item), then p->search_for_extension must + * be set. * @cow: boolean should CoW operations be performed. Must always be 1 * when modifying the tree. * @@ -2971,6 +2977,20 @@ int btrfs_search_slot(struct btrfs_trans_handle *trans, struct btrfs_root *root, } } else { p->slots[level] = slot; + /* + * Item key already exists. In this case, if we are + * allowed to insert the item (for example, in dir_item + * case, item key collision is allowed), it will be + * merged with the original item. Only the item size + * grows, no new btrfs item will be added. If + * search_for_extension is not set, ins_len already + * accounts the size btrfs_item, deduct it here so leaf + * space check will be correct. + */ + if (ret == 0 && ins_len > 0 && !p->search_for_extension) { + ASSERT(ins_len >= sizeof(struct btrfs_item)); + ins_len -= sizeof(struct btrfs_item); + } if (ins_len > 0 && btrfs_leaf_free_space(b) < ins_len) { if (write_lock_level < 1) { diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h index c2e5fe972f56..df9c0393d138 100644 --- a/fs/btrfs/ctree.h +++ b/fs/btrfs/ctree.h @@ -363,6 +363,12 @@ struct btrfs_path { unsigned int search_commit_root:1; unsigned int need_commit_sem:1; unsigned int skip_release_on_error:1; + /* + * Indicate that new item (btrfs_search_slot) is extending already + * existing item and ins_len contains only the data size and not item + * header (ie. sizeof(struct btrfs_item) is not included). + */ + unsigned int search_for_extension:1; }; #define BTRFS_MAX_EXTENT_ITEM_SIZE(r) ((BTRFS_LEAF_DATA_SIZE(r->fs_info) >> 4) - \ sizeof(struct btrfs_item)) diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c index 7005f8c3848a..8c8cb704d580 100644 --- a/fs/btrfs/extent-tree.c +++ b/fs/btrfs/extent-tree.c @@ -847,6 +847,7 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans, want = extent_ref_type(parent, owner); if (insert) { extra_size = btrfs_extent_inline_ref_size(want); + path->search_for_extension = 1; path->keep_locks = 1; } else extra_size = -1; @@ -999,6 +1000,7 @@ int lookup_inline_extent_backref(struct btrfs_trans_handle *trans, out: if (insert) { path->keep_locks = 0; + path->search_for_extension = 0; btrfs_unlock_up_safe(path, 1); } return err; diff --git a/fs/btrfs/file-item.c b/fs/btrfs/file-item.c index 61b82c69eed5..7d0f0c7bc84c 100644 --- a/fs/btrfs/file-item.c +++ b/fs/btrfs/file-item.c @@ -822,8 +822,10 @@ int btrfs_csum_file_blocks(struct btrfs_trans_handle *trans, * enough yet to put our csum in. Grow it */ btrfs_release_path(path); + path->search_for_extension = 1; ret = btrfs_search_slot(trans, root, &file_key, path, csum_size, 1); + path->search_for_extension = 0; if (ret < 0) goto fail_unlock;