From patchwork Wed Jul 16 19:22:36 2014 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Patchwork-Submitter: Darrick Wong X-Patchwork-Id: 370858 Return-Path: X-Original-To: patchwork-incoming@ozlabs.org Delivered-To: patchwork-incoming@ozlabs.org Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by ozlabs.org (Postfix) with ESMTP id 431021400FA for ; Thu, 17 Jul 2014 05:23:05 +1000 (EST) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S965625AbaGPTW4 (ORCPT ); Wed, 16 Jul 2014 15:22:56 -0400 Received: from userp1040.oracle.com ([156.151.31.81]:23876 "EHLO userp1040.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S932065AbaGPTWw (ORCPT ); Wed, 16 Jul 2014 15:22:52 -0400 Received: from acsinet21.oracle.com (acsinet21.oracle.com [141.146.126.237]) by userp1040.oracle.com (Sentrion-MTA-4.3.2/Sentrion-MTA-4.3.2) with ESMTP id s6GJMfpl016678 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=OK); Wed, 16 Jul 2014 19:22:41 GMT Received: from userz7021.oracle.com (userz7021.oracle.com [156.151.31.85]) by acsinet21.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id s6GJMd5Q027068 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=FAIL); Wed, 16 Jul 2014 19:22:40 GMT Received: from abhmp0002.oracle.com (abhmp0002.oracle.com [141.146.116.8]) by userz7021.oracle.com (8.14.4+Sun/8.14.4) with ESMTP id s6GJMc48023317; Wed, 16 Jul 2014 19:22:38 GMT Received: from localhost (/10.145.178.171) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Wed, 16 Jul 2014 12:22:38 -0700 Date: Wed, 16 Jul 2014 12:22:36 -0700 From: "Darrick J. Wong" To: Xiaoguang Wang Cc: linux-ext4@vger.kernel.org, "Theodore Ts'o" , a-fujita@rs.jp.nec.com, t-sato@yk.jp.nec.com, sandeen@redhat.com Subject: Re: Question about e2fsprogs/e4defrag Message-ID: <20140716192236.GA8628@birch.djwong.org> References: <53C5FCA5.20207@cn.fujitsu.com> MIME-Version: 1.0 Content-Disposition: inline In-Reply-To: <53C5FCA5.20207@cn.fujitsu.com> User-Agent: Mutt/1.5.21 (2010-09-15) X-Source-IP: acsinet21.oracle.com [141.146.126.237] Sender: linux-ext4-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-ext4@vger.kernel.org On Wed, Jul 16, 2014 at 12:16:37PM +0800, Xiaoguang Wang wrote: > Hi, > > When I run xfstests/tests/generic/018 for ext4 file system in RHEL7.0GA, > sometimes it fails and sometime it succeeds. After looking into this case, I > think it's not a kernel ext4 bug, it maybe an e4dfrag bug. I compiled the > newest e2fsprogs to have test, it seems this issue still exits, so I still > send a mail to this list to look for some help, thanks. > > The issue is that sometimes e4defrag does not defrag file correctly. > Steps to reproduce this issue: > 1. cd mntpoint > 2. rm -f lege > 3. for I in `seq 9 -1 0`; > do dd if=/dev/zero of=lege bs=4k count=1 conv=notrunc seek=$I oflag=sync &>/dev/null; > done; > 4. e4defrag -c -v lege > > Repeatedly execute the 2, 3, 4 steps until you get a file which have the similar extent layout like below: > ################################################################ > [root@localhost test_e2fsprogs]# e4defrag -c -v lege > > [ext 1]: start 49365571: logical 0: len 1 > [ext 2]: start 49365570: logical 1: len 1 > [ext 3]: start 49365569: logical 2: len 1 > [ext 4]: start 49365568: logical 3: len 1 > [ext 5]: start 49365567: logical 4: len 1 > [ext 6]: start 49365566: logical 5: len 1 > [ext 7]: start 49365565: logical 6: len 1 > [ext 8]: start 49365564: logical 7: len 1 > [ext 9]: start 49365563: logical 8: len 1 > [ext 10]: start 49365562: logical 9: len 1 > > Total/best extents 10/1 > Average size per extent 4 KB > Fragmentation score 98 > [0-30 no problem: 31-55 a little bit fragmented: 56- needs defrag] > This file (lege) needs defragmentation. > Done. > ################################################################ > The physical blocks are continuous but reversed. > > If we call e4defrag against this file, the output would be: > ################################################################ > [root@localhost test_e2fsprogs]# /tmp/e4defrag -v lege > ext4 defragmentation for lege > [1/1]lege: 100% extents: 10 -> 10 [ OK ] > Success: [1/1] > [root@localhost test_e2fsprogs]# /tmp/e4defrag -v -c lege > > [ext 1]: start 49365571: logical 0: len 1 > [ext 2]: start 49365570: logical 1: len 1 > [ext 3]: start 49365569: logical 2: len 1 > [ext 4]: start 49365568: logical 3: len 1 > [ext 5]: start 49365567: logical 4: len 1 > [ext 6]: start 49365566: logical 5: len 1 > [ext 7]: start 49365565: logical 6: len 1 > [ext 8]: start 49365564: logical 7: len 1 > [ext 9]: start 49365563: logical 8: len 1 > [ext 10]: start 49365562: logical 9: len 1 > > Total/best extents 10/1 > Average size per extent 4 KB > Fragmentation score 98 > [0-30 no problem: 31-55 a little bit fragmented: 56- needs defrag] > This file (lege) needs defragmentation. > Done. > ################################################################ > According to my understanding, this file is not defraged correctly and should > be convert into one extent. Or because if the physical blocks are continuous > though reversed, we do not need to do defragment? Oh, I think we /do/ need to defragment. Granted, file readahead might paper over the symptoms, but since the user explicitly ran e4defrag we can try to do better. > I have checked the e4defrag source code, whether to do real defragment > depends on some conditions, please > see this code(e4defrag.c). > --main > --file_defrag > > In file_defrag(), there is such a judgement: > "if (file_frags_start <= best || orig_physical_cnt <= donor_physical_cnt)", If this returns true, the e4defrag will > not call call_defrag() to do real defragment work. > > Here file_frags_start: number of file fragments before defrag > orig_physical_cnt: number of original file's continuous physical region > donor_physical_cnt: number of donor file's continuous physical region > > In this "lege" file, the orig_physical_cnt is 1, and donor_physical_cnt is also 1, so the "if" is satisfied and > call_defrag() won't be called. This is a curious corner case of e4defrag -- if you look in get_file_extents(), the list of extents is insertion-sorted by physical block, which means that get_physical_count() (stupidly) looks only for gaps in the runs of physical blocks. Therefore, e4defrag thinks that this "lege" file has one physical extent. Ignoring logical block ordering, this is true, but as you point out, this leaves the "file written backwards" case in a fragmented state. So let's not ignore the logical block ordering: What I think we really need to do here is make get_physical_count() smarter -- if there's a gap either in the physical or logical offsets of extents, then we need to increment *_physical_cnt so that we later decide to defragment the file. (Please keep reading) > Here I'd like to know the comparison "orig_physical_cnt <= > donor_physical_cnt" is useful? According to my understanding, what should we > have comparison are number of extents or average extent size. > > When I have this change: > diff --git a/misc/e4defrag.c b/misc/e4defrag.c > index a204793..cd95698 100644 > --- a/misc/e4defrag.c > +++ b/misc/e4defrag.c > @@ -1598,8 +1598,7 @@ check_improvement: > extents_before_defrag += file_frags_start; > } > > - if (file_frags_start <= best || > - orig_physical_cnt <= donor_physical_cnt) { > + if (file_frags_start <= best) { This is incorrect, since the point of the "orig_physical_cnt <= donor_physical_cnt" check is to ensure that we don't increase the fragmentation of a file by swapping it with pieces from a donor file whose contents are spread out over a larger number of runs of physical blocks. (It does, however, force defragmentation for all files, so you get the results you wanted.) Please try the patch at the end of this message on for size. It fixes things on my test VM; does it fix yours? --D > printf("\033[79;0H\033[K[%u/%u]%s:\t%3d%%", > defraged_file_count, total_count, file, 100); > if (mode_flag & DETAIL) > > Then the "lege" file could be defraged correctly. > ################################################################## > [root@localhost test_e2fsprogs]# /tmp/e4defrag -v lege > ext4 defragmentation for lege > [1/1]lege: 100% extents: 10 -> 1 [ OK ] > Success: [1/1] > [root@localhost test_e2fsprogs]# /tmp/e4defrag -v -c lege > > [ext 1]: start 49366583: logical 0: len 10 > > Total/best extents 1/1 > Average size per extent 40 KB > Fragmentation score 0 > [0-30 no problem: 31-55 a little bit fragmented: 56- needs defrag] > This file (lege) does not need defragmentation. > Done. > ################################################################## > > Any opinion or suggestions will be appreciated! > If I'm wrong, please correct me, thanks! > > Regards, > Xiaoguang Wang --- To unsubscribe from this list: send the line "unsubscribe linux-ext4" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html diff --git a/misc/e4defrag.c b/misc/e4defrag.c index a204793..d0eac60 100644 --- a/misc/e4defrag.c +++ b/misc/e4defrag.c @@ -888,7 +888,9 @@ static int get_physical_count(struct fiemap_extent_list *physical_list_head) do { if ((ext_list_tmp->data.physical + ext_list_tmp->data.len) - != ext_list_tmp->next->data.physical) { + != ext_list_tmp->next->data.physical || + (ext_list_tmp->data.logical + ext_list_tmp->data.len) + != ext_list_tmp->next->data.logical) { /* This extent and next extent are not continuous. */ ret++; }