[{"id":1760251,"web_url":"http://patchwork.ozlabs.org/comment/1760251/","msgid":"<20170830145921.GB5920@quack2.suse.cz>","list_archive_url":null,"date":"2017-08-30T14:59:21","subject":"Re: [PATCH 4/4] ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA","submitter":{"id":363,"url":"http://patchwork.ozlabs.org/api/people/363/","name":"Jan Kara","email":"jack@suse.cz"},"content":"On Tue 29-08-17 16:29:42, Andreas Gruenbacher wrote:\n> From: Christoph Hellwig <hch@lst.de>\n> \n> Switch to the iomap_seek_hole and iomap_seek_data helpers for\n> implementing lseek SEEK_HOLE / SEEK_DATA, and remove all the code that\n> isn't needed any more.\n> \n> Note that with this patch, ext4 will now always depend on the iomap code\n> instead of only when CONFIG_DAX is enabled, and it requires adding a\n> call into the extent status tree for iomap_begin as well to properly\n> deal with delalloc extents.\n> \n> Signed-off-by: Christoph Hellwig <hch@lst.de>\n> [Minor fixes and cleanups by Andreas]\n> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>\n\n...\n\n> @@ -3425,6 +3425,29 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,\n>  \n>  \tif (!(flags & IOMAP_WRITE)) {\n>  \t\tret = ext4_map_blocks(NULL, inode, &map, 0);\n> +\t\tif (ret < 0)\n> +\t\t\treturn ret;\n\nOne general question about IOMAP_REPORT: When there is unwritten extent, we\nuse page_cache_seek_hole_data() to determine what is actually a hole and\nwhat is data. We could use exactly the same function to determine what is a\nhole and what is data in an IOMAP_HOLE extent, couldn't we? Now I\nunderstand that a filesystem is supposed to return IOMAP_DELALLOC extent if\nthere is delayed allocation data covering queried offset so probably it's\nnot a great idea to use that however it still seems a bit like an\nunnecessary duplication...\n\n> +\t\tif (!ret) {\n\nPlease go to this branch only for IOMAP_REPORT case to avoid unnecessary\noverhead for DAX mappings...\n\n> +\t\t\tstruct extent_status es = {};\n> +\n> +\t\t\text4_es_find_delayed_extent_range(inode, map.m_lblk,\n> +\t\t\t\t\tmap.m_lblk + map.m_len - 1, &es);\n> +\t\t\t/* Is delalloc data before next block in extent tree? */\n> +\t\t\tif (es.es_len && es.es_lblk < map.m_lblk + map.m_len) {\n\nAnd this is still wrong - I have actually verified that with attached patch\nthat disables caching of extents (so it is equivalent to *very* aggresive\nslab reclaim happening). With that patch applied you'll get:\n\nkvm0:~ # rm /mnt/file; xfs_io -f -c \"pwrite 4096 4096\" -c \"seek -a 0\"\n/mnt/file\nwrote 4096/4096 bytes at offset 4096\n4 KiB, 1 ops; 0.0000 sec (16.693 MiB/sec and 4273.5043 ops/sec)\nWhence\tResult\nDATA\t0\nHOLE\t8192\n\nWhich is obviously wrong - hole should be 0, data should be 4096.\n\nAnd the reason why this is wrong is that when we are asked to map things at\nfirst_block and there is a hole at that offset, we need to just truncate\nthe hole extent returned by ext4_map_blocks() by possibly following\ndelalloc allocation. But we still need to return that there *is a hole* at\nfirst_block. But your patch seems to try to return the delalloc extent\ninstead which is wrong.\n\n> +\t\t\t\text4_lblk_t offs = 0;\n> +\n> +\t\t\t\tif (es.es_lblk < map.m_lblk)\n> +\t\t\t\t\toffs = map.m_lblk - es.es_lblk;\n> +\t\t\t\tmap.m_lblk = es.es_lblk + offs;\n> +\t\t\t\tmap.m_pblk = ext4_es_pblock(&es) + offs;\n> +\t\t\t\tmap.m_len = es.es_len - offs;\n> +\t\t\t\tif (ext4_es_is_unwritten(&es))\n> +\t\t\t\t\tmap.m_flags |= EXT4_MAP_UNWRITTEN;\n> +\t\t\t\tif (ext4_es_is_delayed(&es))\n> +\t\t\t\t\tdelalloc = true;\n> +\t\t\t\tret = 1;\n> +\t\t\t}\n> +\t\t}\n>  \t} else {\n>  \t\tint dio_credits;\n>  \t\thandle_t *handle;\n\n\t\t\t\t\t\t\t\tHonza","headers":{"Return-Path":"<linux-ext4-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=linux-ext4-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xj7vM2Tw0z9sN7\n\tfor <patchwork-incoming@ozlabs.org>;\n\tThu, 31 Aug 2017 00:59:27 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751464AbdH3O7Z (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tWed, 30 Aug 2017 10:59:25 -0400","from mx2.suse.de ([195.135.220.15]:37581 \"EHLO mx1.suse.de\"\n\trhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP\n\tid S1751358AbdH3O7Y (ORCPT <rfc822;linux-ext4@vger.kernel.org>);\n\tWed, 30 Aug 2017 10:59:24 -0400","from relay2.suse.de (charybdis-ext.suse.de [195.135.220.254])\n\tby mx1.suse.de (Postfix) with ESMTP id C9D1BACE9;\n\tWed, 30 Aug 2017 14:59:22 +0000 (UTC)","by quack2.suse.cz (Postfix, from userid 1000)\n\tid 3867C1E3414; Wed, 30 Aug 2017 16:59:21 +0200 (CEST)"],"X-Virus-Scanned":"by amavisd-new at test-mx.suse.de","Date":"Wed, 30 Aug 2017 16:59:21 +0200","From":"Jan Kara <jack@suse.cz>","To":"Andreas Gruenbacher <agruenba@redhat.com>","Cc":"linux-fsdevel@vger.kernel.org, linux-ext4@vger.kernel.org,\n\tlinux-xfs@vger.kernel.org, Jan Kara <jack@suse.cz>,\n\tChristoph Hellwig <hch@lst.de>","Subject":"Re: [PATCH 4/4] ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA","Message-ID":"<20170830145921.GB5920@quack2.suse.cz>","References":"<20170829142942.21594-1-agruenba@redhat.com>\n\t<20170829142942.21594-5-agruenba@redhat.com>","MIME-Version":"1.0","Content-Type":"multipart/mixed; boundary=\"r5Pyd7+fXNt84Ff3\"","Content-Disposition":"inline","In-Reply-To":"<20170829142942.21594-5-agruenba@redhat.com>","User-Agent":"Mutt/1.5.24 (2015-08-30)","Sender":"linux-ext4-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<linux-ext4.vger.kernel.org>","X-Mailing-List":"linux-ext4@vger.kernel.org"}},{"id":1768444,"web_url":"http://patchwork.ozlabs.org/comment/1768444/","msgid":"<CAHc6FU4aDVF7fhYVfShnRyg=0P=8T7mJ9D59r2CC1i2OO-SEeQ@mail.gmail.com>","list_archive_url":null,"date":"2017-09-14T09:17:24","subject":"Re: [PATCH 4/4] ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA","submitter":{"id":67290,"url":"http://patchwork.ozlabs.org/api/people/67290/","name":"Andreas Gruenbacher","email":"agruenba@redhat.com"},"content":"Jan,\n\nOn Wed, Aug 30, 2017 at 4:59 PM, Jan Kara <jack@suse.cz> wrote:\n> On Tue 29-08-17 16:29:42, Andreas Gruenbacher wrote:\n>> From: Christoph Hellwig <hch@lst.de>\n>>\n>> Switch to the iomap_seek_hole and iomap_seek_data helpers for\n>> implementing lseek SEEK_HOLE / SEEK_DATA, and remove all the code that\n>> isn't needed any more.\n>>\n>> Note that with this patch, ext4 will now always depend on the iomap code\n>> instead of only when CONFIG_DAX is enabled, and it requires adding a\n>> call into the extent status tree for iomap_begin as well to properly\n>> deal with delalloc extents.\n>>\n>> Signed-off-by: Christoph Hellwig <hch@lst.de>\n>> [Minor fixes and cleanups by Andreas]\n>> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>\n>\n> ...\n>\n>> @@ -3425,6 +3425,29 @@ static int ext4_iomap_begin(struct inode *inode, loff_t offset, loff_t length,\n>>\n>>       if (!(flags & IOMAP_WRITE)) {\n>>               ret = ext4_map_blocks(NULL, inode, &map, 0);\n>> +             if (ret < 0)\n>> +                     return ret;\n>\n> One general question about IOMAP_REPORT: When there is unwritten extent, we\n> use page_cache_seek_hole_data() to determine what is actually a hole and\n> what is data. We could use exactly the same function to determine what is a\n> hole and what is data in an IOMAP_HOLE extent, couldn't we? Now I\n> understand that a filesystem is supposed to return IOMAP_DELALLOC extent if\n> there is delayed allocation data covering queried offset so probably it's\n> not a great idea to use that however it still seems a bit like an\n> unnecessary duplication...\n\nThere is no point in scanning the page cache on filesystems that don't\nsupport delayed allocation, so it does make sense to distinguish the\ntwo types of mappings.\n\n>> +             if (!ret) {\n>\n> Please go to this branch only for IOMAP_REPORT case to avoid unnecessary\n> overhead for DAX mappings...\n>\n>> +                     struct extent_status es = {};\n>> +\n>> +                     ext4_es_find_delayed_extent_range(inode, map.m_lblk,\n>> +                                     map.m_lblk + map.m_len - 1, &es);\n>> +                     /* Is delalloc data before next block in extent tree? */\n>> +                     if (es.es_len && es.es_lblk < map.m_lblk + map.m_len) {\n>\n> And this is still wrong - I have actually verified that with attached patch\n> that disables caching of extents (so it is equivalent to *very* aggresive\n> slab reclaim happening). With that patch applied you'll get:\n>\n> kvm0:~ # rm /mnt/file; xfs_io -f -c \"pwrite 4096 4096\" -c \"seek -a 0\"\n> /mnt/file\n> wrote 4096/4096 bytes at offset 4096\n> 4 KiB, 1 ops; 0.0000 sec (16.693 MiB/sec and 4273.5043 ops/sec)\n> Whence  Result\n> DATA    0\n> HOLE    8192\n>\n> Which is obviously wrong - hole should be 0, data should be 4096.\n>\n> And the reason why this is wrong is that when we are asked to map things at\n> first_block and there is a hole at that offset, we need to just truncate\n> the hole extent returned by ext4_map_blocks() by possibly following\n> delalloc allocation. But we still need to return that there *is a hole* at\n> first_block. But your patch seems to try to return the delalloc extent\n> instead which is wrong.\n\nGot it, thanks for the debug patch. I'll send a fixed version of the series.\n\n>> +                             ext4_lblk_t offs = 0;\n>> +\n>> +                             if (es.es_lblk < map.m_lblk)\n>> +                                     offs = map.m_lblk - es.es_lblk;\n>> +                             map.m_lblk = es.es_lblk + offs;\n>> +                             map.m_pblk = ext4_es_pblock(&es) + offs;\n>> +                             map.m_len = es.es_len - offs;\n>> +                             if (ext4_es_is_unwritten(&es))\n>> +                                     map.m_flags |= EXT4_MAP_UNWRITTEN;\n>> +                             if (ext4_es_is_delayed(&es))\n>> +                                     delalloc = true;\n>> +                             ret = 1;\n>> +                     }\n>> +             }\n>>       } else {\n>>               int dio_credits;\n>>               handle_t *handle;\n\nThanks,\nAndreas","headers":{"Return-Path":"<linux-ext4-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=linux-ext4-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xtCbv4Bs1z9sPs\n\tfor <patchwork-incoming@ozlabs.org>;\n\tThu, 14 Sep 2017 19:17:31 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751737AbdINJR3 (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tThu, 14 Sep 2017 05:17:29 -0400","from mail-io0-f171.google.com ([209.85.223.171]:44989 \"EHLO\n\tmail-io0-f171.google.com\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S1751721AbdINJR0 (ORCPT\n\t<rfc822; linux-ext4@vger.kernel.org>); Thu, 14 Sep 2017 05:17:26 -0400","by mail-io0-f171.google.com with SMTP id v36so16336924ioi.1\n\tfor <linux-ext4@vger.kernel.org>;\n\tThu, 14 Sep 2017 02:17:26 -0700 (PDT)","by 10.157.50.226 with HTTP; Thu, 14 Sep 2017 02:17:24 -0700 (PDT)"],"X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:mime-version:in-reply-to:references:from:date\n\t:message-id:subject:to:cc;\n\tbh=Ie5xkXBqBweIeXvrYnsMfr3gJqvFtJ6SFmAp5gFGSuM=;\n\tb=mL+4F8evbX1o7h7CICwb2Yxba0gY1unD9vbp+9lNfeX4VR21n88fWey1qWfrTqmr69\n\tCdqadA6AZ/UVDos3y1mef873O9SVmPiduo0Ma51VpDKspPzJ1ZBpB6ln+sFa7+bxm4y0\n\tyRhujbUjhWl5FV+s7O558oM4t+G8S03cE20GwNPEGsnSiDXYYw//qwgDflHhqg9AjoBP\n\tAhwH/vtlwKrkDGufS79oKgUVjNazCEHjFIMfHwNp5C+hXAV/iCHd42lHuWd/xS6b301y\n\tw8iWtStf0br5Hh4n3jKJ14FKaWu8FP4o6zDgTh7Rige2NJfsJG/uvyB190MZvHF/doPj\n\tJicw==","X-Gm-Message-State":"AHPjjUibxfrogHyuLiKrtwmFYii7MU7raJTH94n9+RR2K1z+3MK96QAv\n\trCu/s7Toq4Rg1ErNFOhRPsnKPxZfcHpcDoLLg5ak8g==","X-Google-Smtp-Source":"AOwi7QDq+gn6d2Xy8uPpjY9jy+hDgRupnx8AE1Dp0GznLNtdyQXFdaXp3z/NdpvBSkB5OTTpFaBuDN4ndrYzqNxU0Ro=","X-Received":"by 10.202.196.195 with SMTP id\n\tu186mr9882121oif.315.1505380645892; \n\tThu, 14 Sep 2017 02:17:25 -0700 (PDT)","MIME-Version":"1.0","In-Reply-To":"<20170830145921.GB5920@quack2.suse.cz>","References":"<20170829142942.21594-1-agruenba@redhat.com>\n\t<20170829142942.21594-5-agruenba@redhat.com>\n\t<20170830145921.GB5920@quack2.suse.cz>","From":"Andreas Gruenbacher <agruenba@redhat.com>","Date":"Thu, 14 Sep 2017 11:17:24 +0200","Message-ID":"<CAHc6FU4aDVF7fhYVfShnRyg=0P=8T7mJ9D59r2CC1i2OO-SEeQ@mail.gmail.com>","Subject":"Re: [PATCH 4/4] ext4: Switch to iomap for SEEK_HOLE / SEEK_DATA","To":"Jan Kara <jack@suse.cz>","Cc":"linux-fsdevel <linux-fsdevel@vger.kernel.org>,\n\tlinux-ext4 <linux-ext4@vger.kernel.org>,\n\tlinux-xfs@vger.kernel.org, Christoph Hellwig <hch@lst.de>","Content-Type":"text/plain; charset=\"UTF-8\"","Sender":"linux-ext4-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<linux-ext4.vger.kernel.org>","X-Mailing-List":"linux-ext4@vger.kernel.org"}}]