From patchwork Sat Apr 30 16:19:22 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eryu Guan X-Patchwork-Id: 617105 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3qxxcZ3W0Gz9t7b for ; Sun, 1 May 2016 02:59:26 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b=Po2HiMCB; dkim-atps=neutral Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752366AbcD3Q7T (ORCPT ); Sat, 30 Apr 2016 12:59:19 -0400 Received: from mail-pf0-f193.google.com ([209.85.192.193]:34541 "EHLO mail-pf0-f193.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752348AbcD3Q7S (ORCPT ); Sat, 30 Apr 2016 12:59:18 -0400 Received: by mail-pf0-f193.google.com with SMTP id 145so15164966pfz.1; Sat, 30 Apr 2016 09:59:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id; bh=m8Y8bpqhXXnCMKP2MVmvyZvxcTAD0tDS4UMaDgH1src=; b=Po2HiMCB9EX0GuQZue124wyh4X9d6mC5Mc1MmYtuI8MwMx3uiPS+5PYxt6uaqhlkZI QWASnbGWQPB8tVc8Xet6SErtAngpzd4y3QF4Q8tZnhPh0xetS4/4AYWYDtHxw2K72nwT l2nELBSfImLNy3pDCniUWZdZ9CSg3FORRxOkaAFc3lKqnmJ2V/0wFParyW0e6I9iAR4s KYTWPOkbmEX59UCQTFShFPiRDdb7O9F3oHod69dCgW+IxjaMmXxUofaufcBDJ++J7M9b b7rkKKUarkli22nR2Bc/b+wJGHpCArvKr9HBtoY/m21eRh8lrhV5yqF0Ue6jT1M8dG20 HETQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:to:cc:subject:date:message-id; bh=m8Y8bpqhXXnCMKP2MVmvyZvxcTAD0tDS4UMaDgH1src=; b=EWglQbQicXGrzSk4+Pc2frjjT3nJKK77RlMiUKR2Jl/yEltlKnup/EQejuZBtzemTy 7fQeSFgG2kpUXa2rHkRQhAAfLl8mKGH8Ymu+J3dk8UMxUUY800oAgu0lihKZl9locjEd NSk4FmlbsfAAQv8PMp8BKXKDHDS1lLG/YzYECxzoiO1Ok+U9QuS+YeexqiQOH6mPFXNt WBjqdkflH+3YOXpulmWmMVkNZ0lCirvRAMsE2/ctjgKl7YwpyI3chVHZc0Y/etNT3nvj BHLUarEMEjtRuTVXS5gRTBfq2MN+ztA4pb4asmjKi+Nt3qEJPkgZR2DAr4aA4Eh1PpQJ htZA== X-Gm-Message-State: AOPr4FWCopa9rUu9ta+c3XsBloTw8nggmggc8OX2nA9jghrlMQylv7q3mJkal4hK61Qbtg== X-Received: by 10.98.73.142 with SMTP id r14mr38449366pfi.140.1462035557749; Sat, 30 Apr 2016 09:59:17 -0700 (PDT) Received: from localhost ([128.199.137.77]) by smtp.gmail.com with ESMTPSA id fn3sm32894683pab.20.2016.04.30.09.59.16 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Sat, 30 Apr 2016 09:59:17 -0700 (PDT) From: Eryu Guan To: linux-fsdevel@vger.kernel.org Cc: linux-ext4@vger.kernel.org, Eryu Guan Subject: [PATCH] direct-io: fix stale data exposure from concurrent buffered read Date: Sun, 1 May 2016 00:19:22 +0800 Message-Id: <1462033162-21837-1-git-send-email-guaneryu@gmail.com> X-Mailer: git-send-email 2.5.5 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org Direct writes inside i_size on a DIO_SKIP_HOLES filesystem are not allowed to allocate blocks(get_more_blocks() sets 'create' to 0 before calling get_bkicl() callback), if it's a sparse file, direct writes fall back to buffered writes to avoid stale data exposure from concurrent buffered read. But the detection for "writing inside i_size" is not correct, writes can be treated as "extending writes" wrongly, which results in block allocation for holes and could expose stale data. This is because we're checking on the fs blocks not the actual offset and i_size in bytes. For example, direct write 1FSB to a 1FSB(or smaller) sparse file on ext2/3/4, starting from offset 0, in this case it's writing inside i_size, but 'create' is non-zero, because 'sdio->block_in_file' and '(i_size_read(dio->inode) >> sdio->blkbits' are both zero. This can be demonstrated by running ltp-aiodio test ADSP045 many times. When testing on extN filesystems, I see test failures occasionally, buffered read could read non-zero (stale) data. ADSP045: dio_sparse -a 4k -w 4k -s 2k -n 2 dio_sparse 0 TINFO : Dirtying free blocks dio_sparse 0 TINFO : Starting I/O tests non zero buffer at buf[0] => 0xffffffaa,ffffffaa,ffffffaa,ffffffaa non-zero read at offset 0 dio_sparse 0 TINFO : Killing childrens(s) dio_sparse 1 TFAIL : dio_sparse.c:191: 1 children(s) exited abnormally Fix it by checking on the actual offset and i_size in bytes, not the fs blocks where offset and i_size are in, to make sure it's really writing into the file. Also introduce some local variables to make the code easier to read a little bit. Signed-off-by: Eryu Guan --- fs/direct-io.c | 20 ++++++++++---------- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/fs/direct-io.c b/fs/direct-io.c index 4720377..ca0c9bc 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -607,9 +607,12 @@ static int get_more_blocks(struct dio *dio, struct dio_submit *sdio, int ret; sector_t fs_startblk; /* Into file, in filesystem-sized blocks */ sector_t fs_endblk; /* Into file, in filesystem-sized blocks */ + sector_t block_in_file = sdio->block_in_file; unsigned long fs_count; /* Number of filesystem-sized blocks */ int create; - unsigned int i_blkbits = sdio->blkbits + sdio->blkfactor; + unsigned int blkbits = sdio->blkbits; + unsigned int blkfactor = sdio->blkfactor; + unsigned int i_blkbits = blkbits + blkfactor; /* * If there was a memory error and we've overwritten all the @@ -617,10 +620,9 @@ static int get_more_blocks(struct dio *dio, struct dio_submit *sdio, */ ret = dio->page_errors; if (ret == 0) { - BUG_ON(sdio->block_in_file >= sdio->final_block_in_request); - fs_startblk = sdio->block_in_file >> sdio->blkfactor; - fs_endblk = (sdio->final_block_in_request - 1) >> - sdio->blkfactor; + BUG_ON(block_in_file >= sdio->final_block_in_request); + fs_startblk = block_in_file >> blkfactor; + fs_endblk = (sdio->final_block_in_request - 1) >> blkfactor; fs_count = fs_endblk - fs_startblk + 1; map_bh->b_state = 0; @@ -638,11 +640,9 @@ static int get_more_blocks(struct dio *dio, struct dio_submit *sdio, * buffer head. */ create = dio->rw & WRITE; - if (dio->flags & DIO_SKIP_HOLES) { - if (sdio->block_in_file < (i_size_read(dio->inode) >> - sdio->blkbits)) - create = 0; - } + if ((dio->flags & DIO_SKIP_HOLES) && + ((block_in_file << blkbits) < i_size_read(dio->inode))) + create = 0; ret = (*sdio->get_block)(dio->inode, fs_startblk, map_bh, create);