From patchwork Fri Jun 23 21:36:57 2017 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Andreas Dilger X-Patchwork-Id: 780287 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 3wvWxc4P6nz9s7h for ; Sat, 24 Jun 2017 07:37:08 +1000 (AEST) Authentication-Results: ozlabs.org; dkim=pass (2048-bit key; unprotected) header.d=dilger-ca.20150623.gappssmtp.com header.i=@dilger-ca.20150623.gappssmtp.com header.b="ZSoraHeZ"; dkim-atps=neutral Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754222AbdFWVhH (ORCPT ); Fri, 23 Jun 2017 17:37:07 -0400 Received: from mail-it0-f67.google.com ([209.85.214.67]:34567 "EHLO mail-it0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751488AbdFWVhG (ORCPT ); Fri, 23 Jun 2017 17:37:06 -0400 Received: by mail-it0-f67.google.com with SMTP id y134so8468112itc.1 for ; Fri, 23 Jun 2017 14:37:06 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=dilger-ca.20150623.gappssmtp.com; s=20150623; h=from:message-id:mime-version:subject:date:in-reply-to:cc:to :references; bh=5iy/C5StAvF6uOdzChzjpUrAWjzoppPvtRhCIhntHlI=; b=ZSoraHeZNtKF4S5hk1W6Kfu8pP+5drwxQNHLJwgWmo3SKBP1/vUCxU4FdWVcNcPAFR RAEtjOu8cgeNMzn1WZE/P+QPBTfl2up9GkgSFU14PvqgyapzcUxhuQUJGNvSacyneFZ6 dUehdqCJD001QEpbc6B2/MiroOtRHLu+GE7wAsq/CckdtbfwbTeLwhkh0QXJtNxr0Zhk QZ6KZ/u78Wp2o+/PqVusjIRuuQjYniCL54aH3jKMqrEQe5G+3qNEUJn7Y3MjMqEg8E+D VVjcQFMU1PKFqIYeVoIJl4xKyMTRn5ZonqG/5ZUMY+dzZGQbAhmn/Qo2I2nh0NUTjGOL 3U5Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:message-id:mime-version:subject:date :in-reply-to:cc:to:references; bh=5iy/C5StAvF6uOdzChzjpUrAWjzoppPvtRhCIhntHlI=; b=sAUm6Iu4Ue87fIJB8k3bzePvu0Mywcqu/9xgi/NSdc2Ejl/1SHNr67gy6xEMo9PV4K 52OTR/TwRcQQMNoCDgTcp68PQbYaJNFweB7XUmc+6mMPv9W9hqm79NA8KhhUYlNbGoZj /oCMR+EXJQUeHKyHw95hcDfDj8tthRdhVbBdOEvlameH+/LtcDTPVuugd5dJhl8kvPKU Q95i0ioivb8dBxpyNE9I+IZIx0Gg1wsgrNv8tKX9mCZcwZJ7jOiPmVRbRGXW80Mnr5yT zlOD/fQQQT0viCpjbGrZ6F4yEABW40B6dUZpiRuDQyHmZLD+WJZvNAzNrks47ePrmsnP unNQ== X-Gm-Message-State: AKS2vOx9Mah2fRdtSRw68ENrVjO/J0NUazsu/8LZ48QQIcx6PTvOtVQQ hRe9FEMNHnO+NuvOK5EhxQ== X-Received: by 10.36.29.147 with SMTP id 141mr9266740itj.83.1498253825756; Fri, 23 Jun 2017 14:37:05 -0700 (PDT) Received: from cabot-wlan.adilger.int (S0106a84e3fe4b223.cg.shawcable.net. [70.77.216.213]) by smtp.gmail.com with ESMTPSA id s77sm3528514ita.3.2017.06.23.14.37.04 (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Fri, 23 Jun 2017 14:37:04 -0700 (PDT) From: Andreas Dilger Message-Id: Mime-Version: 1.0 (Mac OS X Mail 10.3 \(3273\)) Subject: Re: [PATCH] ext4: Return EIO on read error in ext4_find_entry Date: Fri, 23 Jun 2017 15:36:57 -0600 In-Reply-To: <20170623122603.jmvyw4oqkojcapv3@thunk.org> Cc: Khazhismel Kumykov , adilger.kernel@dilger.ca, linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org To: Theodore Ts'o References: <20170622232307.48392-1-khazhy@google.com> <20170623044314.7f23ighkelnpgnah@thunk.org> <204110E6-EECE-4925-9020-EC6D9633C822@dilger.ca> <20170623122603.jmvyw4oqkojcapv3@thunk.org> X-Mailer: Apple Mail (2.3273) Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org On Jun 23, 2017, at 6:26 AM, Theodore Ts'o wrote: > > On Fri, Jun 23, 2017 at 12:33:02AM -0600, Andreas Dilger wrote: >> On Jun 22, 2017, at 22:43, Theodore Ts'o wrote: >>> >>>> On Thu, Jun 22, 2017 at 04:23:07PM -0700, Khazhismel Kumykov wrote: >>>> Previously, a read error would be ignored and we would eventually return >>>> NULL from ext4_find_entry, which signals "no such file or directory". We >>>> should be returning EIO. >>>> >>>> Signed-off-by: Khazhismel Kumykov >>> >>> Thanks, applied. >> >> I don't necessarily agree that this is an improvement. >> >> If the requested entry is not in the bad block, this will return an >> error even if the file name could be found in another block. It >> would be better to save the error until the end and only return -EIO >> if the entry cannot be found. > > The problem is that if we continue, successive reads may all take > seconds or minutes to fail, thus tieing up the process for a long > time. Sorry, I don't understand where the seconds or minutes of delay come from? Is that because of long SCSI retries in the block layer, or in the disk itself, or something caused specifically because of this code? This is in the non-htree lookup path, so it is already doing a linear directory scan to find the entry. If this is a non-htree directory because the directory is only a single block in size, then saving the EIO to return at "the end" is identical to returning it immediately. If it is using this codepath because it is a large non-htree directory, then there is a real chance that the entry can be found in another block, and this will cause the application to fail trying to find the file when it doesn't need to. Since this is after EXT4_ERROR_INODE() then the filesystem must be mounted with "errors=continue", so I'd think in that case the filesystem should try to continue in the face of errors. > If this process happens to be, say, the node's Kubernetes > management server it can take down the entire node (since if there is > a watchdog daemon which assumes that if the management server is down, > it's time to kill and reset the entire node), and the file system is, > say, a network file system error which should only kill the individual > job, and not the entire node, the results could be quite unfortunate. Sorry, I don't understand your example. It isn't clear where the error is coming from? Is this a media error or I don't understand how this could be a network filesystem error? To my reading, the patch removing the ability to skip a bad directory leaf block would _induce_ an error rather than avoid it. > By returning EIO right away, we can "fast fail". But it seems like you don't necessarily need to fail at all? Something like the following would return an error if the entry is not found, but still search the rest of the leaf blocks (if any) before giving up: Cheers, Andreas diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c index b81f7d4..c75b6dc 100644 --- a/fs/ext4/namei.c +++ b/fs/ext4/namei.c @@ -1434,6 +1434,8 @@ static struct buffer_head * ext4_find_entry (struct inode *dir, EXT4_ERROR_INODE(dir, "reading directory lblock %lu", (unsigned long) block); brelse(bh); + /* if entry isn't found in a later block, return an error */ + ret = ERR_PTR(-EIO); goto next; } if (!buffer_verified(bh) &&