From patchwork Fri May 13 16:25:28 2016 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Eryu Guan X-Patchwork-Id: 622065 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3r5wLt2Ggzz9t77 for ; Sat, 14 May 2016 02:30:14 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=fail reason="signature verification failed" (2048-bit key; unprotected) header.d=gmail.com header.i=@gmail.com header.b=uP7s8qNW; dkim-atps=neutral Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752849AbcEMQaH (ORCPT ); Fri, 13 May 2016 12:30:07 -0400 Received: from mail-pf0-f196.google.com ([209.85.192.196]:34101 "EHLO mail-pf0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752597AbcEMQaF (ORCPT ); Fri, 13 May 2016 12:30:05 -0400 Received: by mail-pf0-f196.google.com with SMTP id 145so9710033pfz.1; Fri, 13 May 2016 09:30:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=from:to:cc:subject:date:message-id; bh=uYDYT431IagrVoUKw7ayvJj9pywDj9ljwzBEVJgpVvE=; b=uP7s8qNWMoWnQIc3VvTQGvxyWgd1asLHCd+dBZfAk5dcx4YrqE7VAmpRh7rE6Mg8rx XPRxe2JcaAxWvtzkqKF3w04hWecWBNLOrAaIsGIFpWg48yV7o/FvITVSvlySihjwmgHs CeDHPOt99h49+M6JQjZw4RyoXqyvFsqcQeT6UULunyMl1M/6kjrH1B3lJ+DP0Vjo2RYf R8PBhkH0WRdJ7wjBztbr/3xnXKKeULHvRDb3nBBMI0CxyxurPAtj4IXvEnahi6X7JsEh scYUsNTgwwuXHNtYLySUJ6Vw7h2bMmKX2DM9pIH3SooczJeQ/Dyb4t9M0IyuNDUAWbhF 4hYg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:from:to:cc:subject:date:message-id; bh=uYDYT431IagrVoUKw7ayvJj9pywDj9ljwzBEVJgpVvE=; b=lTdy7YtyR6jkGY54SyknyxXxC8U3Xd5mu6ea2vKOJGcQ79YVjjnF29piWX/IqgS9J7 VG6IgJ7T7/P8IXwf83gQ9gAOeilo+wpKnD6yQwUUwa5x/fWEvgo7Y02W9UChHiKHfnAs 5plpcxp3FlAFlMz5Tqwr//eKuRajI74bpMc2Tjbl8ngkNa4bJO+sF+x9tX0hpH+94ATt OQnhKMTMEr+tfeEPr2cYi/nxjq9vBNXedOhDiyAV4BFAjKDW1y6lO76mWTAeAsLstlqB 2I6MqtHvfr3oN/TvAEhGCegmkNl5rmQK6Gz3bPgqI3g/oURM/fZ/Dt1c4Mq58Msx0eYL 4Wmw== X-Gm-Message-State: AOPr4FU6X0205imFUj+6ZVXs4SVbFl1nJ+dMz+SKHzj+OmvU7YnE5+inLB99/GqOKXPLuw== X-Received: by 10.98.36.87 with SMTP id r84mr24594559pfj.5.1463157004930; Fri, 13 May 2016 09:30:04 -0700 (PDT) Received: from localhost ([128.199.137.77]) by smtp.gmail.com with ESMTPSA id ey12sm28609325pac.26.2016.05.13.09.30.03 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 13 May 2016 09:30:04 -0700 (PDT) From: Eryu Guan To: linux-fsdevel@vger.kernel.org Cc: linux-ext4@vger.kernel.org, jmoyer@redhat.com, viro@ZenIV.linux.org.uk, Eryu Guan Subject: [PATCH v3] direct-io: fix direct write stale data exposure from concurrent buffered read Date: Sat, 14 May 2016 00:25:28 +0800 Message-Id: <1463156728-13357-1-git-send-email-guaneryu@gmail.com> X-Mailer: git-send-email 2.5.5 Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org Currently direct writes inside i_size on a DIO_SKIP_HOLES filesystem are not allowed to allocate blocks(get_more_blocks() sets 'create' to 0 before calling get_block() callback), if it's a sparse file, direct writes fall back to buffered writes to avoid stale data exposure from concurrent buffered read. But there're two cases that can result in stale data exposure are not correctly detected. 1. The detection for "writing inside i_size" is not sufficient, writes can be treated as "extending writes" wrongly. For example, direct write 1FSB to a 1FSB sparse file on ext2/3/4, starting from offset 0, in this case it's writing inside i_size, but 'create' is non-zero, because 'block_in_file' and '(i_size_read(inode) >> blkbits' are both zero. 2. Direct writes starting from or beyong i_size (not inside i_size) also could trigger block allocation and expose stale data. For example, consider a sparse file with i_size of 2k, and a write to offset 2k or 3k into the file, with a filesystem block size of 4k. (Thanks to Jeff Moyer for pointing this case out in his review.) The first problem can be demostrated by running ltp-aiodio test ADSP045 many times. When testing on extN filesystems, I see test failures occasionally, buffered read could read non-zero (stale) data. ADSP045: dio_sparse -a 4k -w 4k -s 2k -n 1 dio_sparse 0 TINFO : Dirtying free blocks dio_sparse 0 TINFO : Starting I/O tests non zero buffer at buf[0] => 0xffffffaa,ffffffaa,ffffffaa,ffffffaa non-zero read at offset 0 dio_sparse 0 TINFO : Killing childrens(s) dio_sparse 1 TFAIL : dio_sparse.c:191: 1 children(s) exited abnormally The second problem can also be reproduced easily by a hacked dio_sparse program, which accepts an option to specify the write offset. What we should really do is to disable block allocation for writes that could result in filling holes inside i_size. Reviewed-by: Jan Kara Signed-off-by: Eryu Guan Reviewed-by: Jeff Moyer --- v3: - Kill unnecessary cleanup patch - Update comments a bit accordingly v2: - Fix the case Jeff pointed out as well - Update commit log fs/direct-io.c | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/fs/direct-io.c b/fs/direct-io.c index 4720377..62921ce 100644 --- a/fs/direct-io.c +++ b/fs/direct-io.c @@ -627,11 +627,11 @@ static int get_more_blocks(struct dio *dio, struct dio_submit *sdio, map_bh->b_size = fs_count << i_blkbits; /* - * For writes inside i_size on a DIO_SKIP_HOLES filesystem we - * forbid block creations: only overwrites are permitted. - * We will return early to the caller once we see an - * unmapped buffer head returned, and the caller will fall - * back to buffered I/O. + * For writes that could fill holes inside i_size on a + * DIO_SKIP_HOLES filesystem we forbid block creations: only + * overwrites are permitted. We will return early to the caller + * once we see an unmapped buffer head returned, and the caller + * will fall back to buffered I/O. * * Otherwise the decision is left to the get_blocks method, * which may decide to handle it or also return an unmapped @@ -639,8 +639,8 @@ static int get_more_blocks(struct dio *dio, struct dio_submit *sdio, */ create = dio->rw & WRITE; if (dio->flags & DIO_SKIP_HOLES) { - if (sdio->block_in_file < (i_size_read(dio->inode) >> - sdio->blkbits)) + if (fs_startblk <= ((i_size_read(dio->inode) - 1) >> + i_blkbits)) create = 0; }