From patchwork Wed Jul 7 04:40:35 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Matthew Ruffell X-Patchwork-Id: 1501572 Return-Path: X-Original-To: incoming@patchwork.ozlabs.org Delivered-To: patchwork-incoming@bilbo.ozlabs.org Authentication-Results: ozlabs.org; spf=none (no SPF record) smtp.mailfrom=lists.ubuntu.com (client-ip=91.189.94.19; helo=huckleberry.canonical.com; envelope-from=kernel-team-bounces@lists.ubuntu.com; receiver=) Received: from huckleberry.canonical.com (huckleberry.canonical.com [91.189.94.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by ozlabs.org (Postfix) with ESMTPS id 4GKRZV6hrnz9t25; Wed, 7 Jul 2021 14:41:06 +1000 (AEST) Received: from localhost ([127.0.0.1] helo=huckleberry.canonical.com) by huckleberry.canonical.com with esmtp (Exim 4.86_2) (envelope-from ) id 1m0zMs-00055k-1k; Wed, 07 Jul 2021 04:41:02 +0000 Received: from youngberry.canonical.com ([91.189.89.112]) by huckleberry.canonical.com with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.86_2) (envelope-from ) id 1m0zMk-00051c-9g for kernel-team@lists.ubuntu.com; Wed, 07 Jul 2021 04:40:54 +0000 Received: from mail-pg1-f199.google.com ([209.85.215.199]) by youngberry.canonical.com with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.93) (envelope-from ) id 1m0zMk-0008Ef-2j for kernel-team@lists.ubuntu.com; Wed, 07 Jul 2021 04:40:54 +0000 Received: by mail-pg1-f199.google.com with SMTP id x9-20020a6541490000b0290222fe6234d6so815748pgp.14 for ; Tue, 06 Jul 2021 21:40:54 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=bjxn59GOYCLwUm6uFoXXwdlFN78dRpdfisyRGFWY38c=; b=dRrlS9+yhQyVHJ8pd7BOMrV4nXC8q+tuGEy//tqKncXXIg9UfpXLLpsbVSZuy77VTC D7bqVxPDD4RLGbDvZ9IbJAo6EDSwfWe2sROWgul5dNby/adekQ+fAv7KgS6w8xu8qG20 fdqLoWuEO2TtSE5UHHGRHTDW7516JF6O4Gn3DYBZpcHIuAe4l8GAn+7HZHvbEb/pu9C9 cKbM5aju2+UF+08c2MiJfNCjAscJy3iRcMgdBFR7/mjJwRswR5+GqGsbKc+Acq+mkJSd NQ+jtfDWOvATMsdeiCh/ImdmX7La2ny0M9LYpdZVkJ22ur2ORHeWTa7IiEkZHGVXw+ES l12g== X-Gm-Message-State: AOAM530L74PSd4LClmr+MPG4piiPs+1hFJHidrKOt0YJoEDLIshjIIPs BWwqhQ7lhYFKhWXQyZWC5rRI8mPLOlqMsVY1FlQVIc/XLI65toMigP+raVEYvZmKFoJbJJ+RE/k d0w5DD+r2OVw4mc4hP0kMevqd5//zWwQZASkQHKVHNQ== X-Received: by 2002:a17:902:ed82:b029:ef:48c8:128e with SMTP id e2-20020a170902ed82b02900ef48c8128emr19768261plj.72.1625632852785; Tue, 06 Jul 2021 21:40:52 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxHMPrWBRqYKFY8odLiwZbxKwSiiFsx/ZydIdAk3jVIQJEUSgml2p0aTsO+hFROAsy3eZuHVQ== X-Received: by 2002:a17:902:ed82:b029:ef:48c8:128e with SMTP id e2-20020a170902ed82b02900ef48c8128emr19768257plj.72.1625632852569; Tue, 06 Jul 2021 21:40:52 -0700 (PDT) Received: from desktop.. (125-237-197-94-fibre.sparkbb.co.nz. [125.237.197.94]) by smtp.gmail.com with ESMTPSA id d1sm16861332pfu.6.2021.07.06.21.40.51 for (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 06 Jul 2021 21:40:52 -0700 (PDT) From: Matthew Ruffell To: kernel-team@lists.ubuntu.com Subject: [SRU][B][PATCH 4/6] btrfs: Only check first key for committed tree blocks Date: Wed, 7 Jul 2021 16:40:35 +1200 Message-Id: <20210707044037.37992-5-matthew.ruffell@canonical.com> X-Mailer: git-send-email 2.30.2 In-Reply-To: <20210707044037.37992-1-matthew.ruffell@canonical.com> References: <20210707044037.37992-1-matthew.ruffell@canonical.com> MIME-Version: 1.0 X-BeenThere: kernel-team@lists.ubuntu.com X-Mailman-Version: 2.1.20 Precedence: list List-Id: Kernel team discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: kernel-team-bounces@lists.ubuntu.com Sender: "kernel-team" From: Qu Wenruo BugLink: https://bugs.launchpad.net/bugs/1934709 When looping btrfs/074 with many cpus (>= 8), it's possible to trigger kernel warning due to first key verification: [ 4239.523446] WARNING: CPU: 5 PID: 2381 at fs/btrfs/disk-io.c:460 btree_read_extent_buffer_pages+0x1ad/0x210 [ 4239.523830] Modules linked in: [ 4239.524630] RIP: 0010:btree_read_extent_buffer_pages+0x1ad/0x210 [ 4239.527101] Call Trace: [ 4239.527251] read_tree_block+0x42/0x70 [ 4239.527434] read_node_slot+0xd2/0x110 [ 4239.527632] push_leaf_right+0xad/0x1b0 [ 4239.527809] split_leaf+0x4ea/0x700 [ 4239.527988] ? leaf_space_used+0xbc/0xe0 [ 4239.528192] ? btrfs_set_lock_blocking_rw+0x99/0xb0 [ 4239.528416] btrfs_search_slot+0x8cc/0xa40 [ 4239.528605] btrfs_insert_empty_items+0x71/0xc0 [ 4239.528798] __btrfs_run_delayed_refs+0xa98/0x1680 [ 4239.529013] btrfs_run_delayed_refs+0x10b/0x1b0 [ 4239.529205] btrfs_commit_transaction+0x33/0xaf0 [ 4239.529445] ? start_transaction+0xa8/0x4f0 [ 4239.529630] btrfs_alloc_data_chunk_ondemand+0x1b0/0x4e0 [ 4239.529833] btrfs_check_data_free_space+0x54/0xa0 [ 4239.530045] btrfs_delalloc_reserve_space+0x25/0x70 [ 4239.531907] btrfs_direct_IO+0x233/0x3d0 [ 4239.532098] generic_file_direct_write+0xcb/0x170 [ 4239.532296] btrfs_file_write_iter+0x2bb/0x5f4 [ 4239.532491] aio_write+0xe2/0x180 [ 4239.532669] ? lock_acquire+0xac/0x1e0 [ 4239.532839] ? __might_fault+0x3e/0x90 [ 4239.533032] do_io_submit+0x594/0x860 [ 4239.533223] ? do_io_submit+0x594/0x860 [ 4239.533398] SyS_io_submit+0x10/0x20 [ 4239.533560] ? SyS_io_submit+0x10/0x20 [ 4239.533729] do_syscall_64+0x75/0x1d0 [ 4239.533979] entry_SYSCALL_64_after_hwframe+0x42/0xb7 [ 4239.534182] RIP: 0033:0x7f8519741697 The problem here is, at btree_read_extent_buffer_pages() we don't have acquired read/write lock on that extent buffer, only basic info like level/bytenr is reliable. So race condition leads to such false alert. However in current call site, it's impossible to acquire proper lock without race window. To fix the problem, we only verify first key for committed tree blocks (whose generation is no larger than fs_info->last_trans_committed), so the content of such tree blocks will not change and there is no need to get read/write lock. Reported-by: Nikolay Borisov Fixes: 581c1760415c ("btrfs: Validate child tree block's level and first key") Signed-off-by: Qu Wenruo Signed-off-by: David Sterba (cherry picked from commit 5d41be6f702f19f72db816c17175caf9dbdcdfa6) Signed-off-by: Matthew Ruffell --- fs/btrfs/disk-io.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 2f08fade7b8b..40e166cb263b 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -459,6 +459,14 @@ static int verify_level_key(struct btrfs_fs_info *fs_info, if (!first_key) return 0; + /* + * For live tree block (new tree blocks in current transaction), + * we need proper lock context to avoid race, which is impossible here. + * So we only checks tree blocks which is read from disk, whose + * generation <= fs_info->last_trans_committed. + */ + if (btrfs_header_generation(eb) > fs_info->last_trans_committed) + return 0; if (found_level) btrfs_node_key_to_cpu(eb, &found_key, 0); else