[{"id":1763783,"web_url":"http://patchwork.ozlabs.org/comment/1763783/","msgid":"<f856f8fd-3697-adb1-198b-a9ae2fce405f@redhat.com>","list_archive_url":null,"date":"2017-09-06T02:12:35","subject":"Re: [PATCH 0/9] add ext4 per-inode DAX flag","submitter":{"id":4078,"url":"http://patchwork.ozlabs.org/api/people/4078/","name":"Eric Sandeen","email":"esandeen@redhat.com"},"content":"On 9/5/17 5:35 PM, Ross Zwisler wrote:\n> The original intent of this series was to add a per-inode DAX flag to ext4\n> so that it would be consistent with XFS.  In my travels I found and fixed\n> several related issues in both ext4 and XFS.\n\nHi Ross -\n\nhch had a lot of reasons to nuke the dax flag from orbit, and we just\n/disabled/ it in xfs due to its habit of crashing the kernel...\nso a couple questions:\n\n1) does this series pass hch's \"test the per-inode DAX flag\" fstest?\n2) do we have an agreement that we need this flag at all, or is this\n   just a parity item because xfs has^whad a per-inode flag?\n\nThanks,\n-Eric\n\n> I'm not fully happy with the ways that ext4 DAX interacts with conflicting\n> features (journaling, inline data and encryption).  My goal with this\n> series was to make all these interactions as consistent as possilble, and\n> of course to make them safe.  If anyone has ideas for improvements, I'm\n> very open.\n> \n> Ross Zwisler (9):\n>   ext4: remove duplicate extended attributes defs\n>   xfs: always use DAX if mount option is used\n>   xfs: validate bdev support for DAX inode flag\n>   ext4: add ext4_should_use_dax()\n>   ext4: ext4_change_inode_journal_flag error handling\n>   ext4: safely transition S_DAX on journaling changes\n>   ext4: prevent data corruption with inline data + DAX\n>   ext4: add sanity check for encryption + DAX\n>   ext4: add per-inode DAX flag\n> \n>  fs/ext4/ext4.h      | 47 ++++++---------------------------------------\n>  fs/ext4/ext4_jbd2.h | 16 ++++++++++++++++\n>  fs/ext4/inline.c    | 10 ----------\n>  fs/ext4/inode.c     | 45 ++++++++++++++++++++++++-------------------\n>  fs/ext4/ioctl.c     | 55 +++++++++++++++++++++++++++++++++++++++++++++++++++--\n>  fs/ext4/super.c     |  8 ++++++++\n>  fs/xfs/xfs_ioctl.c  | 14 +++++++++++---\n>  7 files changed, 119 insertions(+), 76 deletions(-)\n>","headers":{"Return-Path":"<linux-ext4-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=linux-ext4-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ext-mx01.extmail.prod.ext.phx2.redhat.com;\n\tdmarc=none (p=none dis=none) header.from=redhat.com","ext-mx01.extmail.prod.ext.phx2.redhat.com;\n\tspf=fail smtp.mailfrom=esandeen@redhat.com"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xn6Yg0hSHz9sR9\n\tfor <patchwork-incoming@ozlabs.org>;\n\tWed,  6 Sep 2017 12:12:54 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1754261AbdIFCMj (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tTue, 5 Sep 2017 22:12:39 -0400","from mx1.redhat.com ([209.132.183.28]:53700 \"EHLO mx1.redhat.com\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S1754229AbdIFCMi (ORCPT <rfc822;linux-ext4@vger.kernel.org>);\n\tTue, 5 Sep 2017 22:12:38 -0400","from smtp.corp.redhat.com\n\t(int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16])\n\t(using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))\n\t(No client certificate requested)\n\tby mx1.redhat.com (Postfix) with ESMTPS id 0C81781DE0;\n\tWed,  6 Sep 2017 02:12:38 +0000 (UTC)","from [IPv6:::1] (ovpn04.gateway.prod.ext.phx2.redhat.com\n\t[10.5.9.4])\n\tby smtp.corp.redhat.com (Postfix) with ESMTPS id 3446883B9C;\n\tWed,  6 Sep 2017 02:12:36 +0000 (UTC)"],"DMARC-Filter":"OpenDMARC Filter v1.3.2 mx1.redhat.com 0C81781DE0","Reply-To":"sandeen@redhat.com","Subject":"Re: [PATCH 0/9] add ext4 per-inode DAX flag","To":"Ross Zwisler <ross.zwisler@linux.intel.com>,\n\tAndrew Morton <akpm@linux-foundation.org>, linux-kernel@vger.kernel.org","Cc":"\"Darrick J. Wong\" <darrick.wong@oracle.com>,\n\tTheodore Ts'o <tytso@mit.edu>, Andreas Dilger <adilger.kernel@dilger.ca>,\n\tChristoph Hellwig <hch@lst.de>, Dan Williams <dan.j.williams@intel.com>,\n\tDave Chinner <david@fromorbit.com>, Jan Kara <jack@suse.cz>,\n\tlinux-ext4@vger.kernel.org, linux-nvdimm@lists.01.org,\n\tlinux-xfs@vger.kernel.org","References":"<20170905223541.20594-1-ross.zwisler@linux.intel.com>","From":"Eric Sandeen <esandeen@redhat.com>","Message-ID":"<f856f8fd-3697-adb1-198b-a9ae2fce405f@redhat.com>","Date":"Tue, 5 Sep 2017 21:12:35 -0500","User-Agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:52.0)\n\tGecko/20100101 Thunderbird/52.3.0","MIME-Version":"1.0","In-Reply-To":"<20170905223541.20594-1-ross.zwisler@linux.intel.com>","Content-Type":"text/plain; charset=utf-8","Content-Language":"en-US","Content-Transfer-Encoding":"8bit","X-Scanned-By":"MIMEDefang 2.79 on 10.5.11.16","X-Greylist":"Sender IP whitelisted, not delayed by milter-greylist-4.5.16\n\t(mx1.redhat.com [10.5.110.25]);\n\tWed, 06 Sep 2017 02:12:38 +0000 (UTC)","Sender":"linux-ext4-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<linux-ext4.vger.kernel.org>","X-Mailing-List":"linux-ext4@vger.kernel.org"}},{"id":1764266,"web_url":"http://patchwork.ozlabs.org/comment/1764266/","msgid":"<20170906170754.GB17663@linux.intel.com>","list_archive_url":null,"date":"2017-09-06T17:07:54","subject":"Re: [PATCH 0/9] add ext4 per-inode DAX flag","submitter":{"id":46514,"url":"http://patchwork.ozlabs.org/api/people/46514/","name":"Ross Zwisler","email":"ross.zwisler@linux.intel.com"},"content":"On Tue, Sep 05, 2017 at 09:12:35PM -0500, Eric Sandeen wrote:\n> On 9/5/17 5:35 PM, Ross Zwisler wrote:\n> > The original intent of this series was to add a per-inode DAX flag to ext4\n> > so that it would be consistent with XFS.  In my travels I found and fixed\n> > several related issues in both ext4 and XFS.\n> \n> Hi Ross -\n> \n> hch had a lot of reasons to nuke the dax flag from orbit, and we just\n> /disabled/ it in xfs due to its habit of crashing the kernel...\n\nAh, sorry, I wasn't CC'd on those threads and missed them.  For any interested\nbystanders:\n\nhttps://www.spinics.net/lists/linux-ext4/msg57840.html\nhttps://www.spinics.net/lists/linux-xfs/msg09831.html\nhttps://www.spinics.net/lists/linux-xfs/msg10124.html\n\n> so a couple questions:\n> \n> 1) does this series pass hch's \"test the per-inode DAX flag\" fstest?\n\nNope, it has the exact same problems as the XFS per-inode DAX flag.\n\n> 2) do we have an agreement that we need this flag at all, or is this\n>    just a parity item because xfs has^whad a per-inode flag?\n\nIt was for parity, and because it allows admins finer grained control over\ntheir system.  Basically all things discussed in response to Lukas's original\npatch in the first link above.\n\nThe way this series ended up the first 8 patches were all fixes for the\nexisting code, and only patch 9 introduced the new per-inode flag.  I'll drop\npatch 9 for now and rework the first 8 patches so we can get safer behavior of\nthe existing DAX mount option in ext4.  We can try patch 9 again later if we\ncome to an agreement that re-enables the XFS per-inode DAX option.","headers":{"Return-Path":"<linux-ext4-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=linux-ext4-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xnVQX56HBz9t2d\n\tfor <patchwork-incoming@ozlabs.org>;\n\tThu,  7 Sep 2017 03:08:04 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S932081AbdIFRH6 (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tWed, 6 Sep 2017 13:07:58 -0400","from mga02.intel.com ([134.134.136.20]:24113 \"EHLO mga02.intel.com\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S1754275AbdIFRH4 (ORCPT <rfc822;linux-ext4@vger.kernel.org>);\n\tWed, 6 Sep 2017 13:07:56 -0400","from fmsmga006.fm.intel.com ([10.253.24.20])\n\tby orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;\n\t06 Sep 2017 10:07:56 -0700","from theros.lm.intel.com (HELO linux.intel.com) ([10.232.112.77])\n\tby fmsmga006.fm.intel.com with ESMTP; 06 Sep 2017 10:07:55 -0700"],"X-ExtLoop1":"1","X-IronPort-AV":"E=Sophos;i=\"5.42,354,1500966000\"; d=\"scan'208\";a=\"148855097\"","Date":"Wed, 6 Sep 2017 11:07:54 -0600","From":"Ross Zwisler <ross.zwisler@linux.intel.com>","To":"sandeen@redhat.com, Lukas Czerner <lczerner@redhat.com>","Cc":"Ross Zwisler <ross.zwisler@linux.intel.com>,\n\tAndrew Morton <akpm@linux-foundation.org>, linux-kernel@vger.kernel.org,\n\t\"Darrick J. Wong\" <darrick.wong@oracle.com>,\n\tTheodore Ts'o <tytso@mit.edu>, Andreas Dilger <adilger.kernel@dilger.ca>,\n\tChristoph Hellwig <hch@lst.de>, Dan Williams <dan.j.williams@intel.com>,\n\tDave Chinner <david@fromorbit.com>, Jan Kara <jack@suse.cz>,\n\tlinux-ext4@vger.kernel.org, linux-nvdimm@lists.01.org,\n\tlinux-xfs@vger.kernel.org","Subject":"Re: [PATCH 0/9] add ext4 per-inode DAX flag","Message-ID":"<20170906170754.GB17663@linux.intel.com>","References":"<20170905223541.20594-1-ross.zwisler@linux.intel.com>\n\t<f856f8fd-3697-adb1-198b-a9ae2fce405f@redhat.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<f856f8fd-3697-adb1-198b-a9ae2fce405f@redhat.com>","User-Agent":"Mutt/1.8.3 (2017-05-23)","Sender":"linux-ext4-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<linux-ext4.vger.kernel.org>","X-Mailing-List":"linux-ext4@vger.kernel.org"}},{"id":1764940,"web_url":"http://patchwork.ozlabs.org/comment/1764940/","msgid":"<CAPcyv4hfhDT9NFRXL+MT5epiqWHJ0RLraV4P3CZ4EJM6L-s0Nw@mail.gmail.com>","list_archive_url":null,"date":"2017-09-07T20:54:45","subject":"Re: [PATCH 0/9] add ext4 per-inode DAX flag","submitter":{"id":347,"url":"http://patchwork.ozlabs.org/api/people/347/","name":"Dan Williams","email":"dan.j.williams@intel.com"},"content":"On Wed, Sep 6, 2017 at 10:07 AM, Ross Zwisler\n<ross.zwisler@linux.intel.com> wrote:\n> On Tue, Sep 05, 2017 at 09:12:35PM -0500, Eric Sandeen wrote:\n>> On 9/5/17 5:35 PM, Ross Zwisler wrote:\n>> > The original intent of this series was to add a per-inode DAX flag to ext4\n>> > so that it would be consistent with XFS.  In my travels I found and fixed\n>> > several related issues in both ext4 and XFS.\n>>\n>> Hi Ross -\n>>\n>> hch had a lot of reasons to nuke the dax flag from orbit, and we just\n>> /disabled/ it in xfs due to its habit of crashing the kernel...\n>\n> Ah, sorry, I wasn't CC'd on those threads and missed them.  For any interested\n> bystanders:\n>\n> https://www.spinics.net/lists/linux-ext4/msg57840.html\n> https://www.spinics.net/lists/linux-xfs/msg09831.html\n> https://www.spinics.net/lists/linux-xfs/msg10124.html\n>\n>> so a couple questions:\n>>\n>> 1) does this series pass hch's \"test the per-inode DAX flag\" fstest?\n>\n> Nope, it has the exact same problems as the XFS per-inode DAX flag.\n>\n>> 2) do we have an agreement that we need this flag at all, or is this\n>>    just a parity item because xfs has^whad a per-inode flag?\n>\n> It was for parity, and because it allows admins finer grained control over\n> their system.  Basically all things discussed in response to Lukas's original\n> patch in the first link above.\n\nI think it's more than parity. When pmem is slower than page cache it\nis actively harmful to have DAX enabled globally for a filesystem. So,\nnot only should we push for per-inode DAX control, we should also push\nto deprecate the mount option. I agree with Christoph that we should\ntry to automatically and transparently enable DAX where it makes\nsense, but we also need a finer-grained mechanism than a mount flag to\nforce the behavior one way or the other.","headers":{"Return-Path":"<linux-ext4-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=linux-ext4-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=intel-com.20150623.gappssmtp.com\n\theader.i=@intel-com.20150623.gappssmtp.com\n\theader.b=\"pDIvcsHS\"; dkim-atps=neutral"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xpCPy73Hfz9t1t\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri,  8 Sep 2017 06:55:02 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1755588AbdIGUys (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tThu, 7 Sep 2017 16:54:48 -0400","from mail-oi0-f48.google.com ([209.85.218.48]:36321 \"EHLO\n\tmail-oi0-f48.google.com\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S1752375AbdIGUyr (ORCPT\n\t<rfc822; linux-ext4@vger.kernel.org>); Thu, 7 Sep 2017 16:54:47 -0400","by mail-oi0-f48.google.com with SMTP id x190so4850854oix.3\n\tfor <linux-ext4@vger.kernel.org>;\n\tThu, 07 Sep 2017 13:54:47 -0700 (PDT)","by 10.157.35.2 with HTTP; Thu, 7 Sep 2017 13:54:45 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=intel-com.20150623.gappssmtp.com; s=20150623;\n\th=mime-version:in-reply-to:references:from:date:message-id:subject:to\n\t:cc; bh=XPwte5NfyuhRM7RpVjrpZ3WzE8bv7CXk0H3yDQoz3xQ=;\n\tb=pDIvcsHSuKdRsKrYRqPjThaobsehv080a7OEIxi9TWF+Nkbs9H+alsHnaMz7DBqxgV\n\tB2dDDJWdi+VCoZ/ZClXoUF89b5d5thZ20vUes3x6RRBeqimupXjtgOTL9hEV9oXOIZ5g\n\tBE8KpTvm62Pe2j1eR8X7xxrrYBab4HRPFzG67MMJHKm0WIXbPNpog1Bl7i5XHKdqb69O\n\tZXuPT3+6syi0Oo8ixOjsRvagYzWcDC25gE5NXZ0jN+hXp3kl+zsNgejwD0erb0928d2/\n\terASHQ3W7Lfie0Or2YlJDpW7/S1QuJMr39MPcECnu84eGRmm7TqPyjnONgPO4N8YJn3u\n\tZK8Q==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:mime-version:in-reply-to:references:from:date\n\t:message-id:subject:to:cc;\n\tbh=XPwte5NfyuhRM7RpVjrpZ3WzE8bv7CXk0H3yDQoz3xQ=;\n\tb=ZOeUOWMJHc9sHAelh9gNR3k9mEMNyHKuprakM+xrXVZ51Mx0K4d52L3Fl4iEEF9iaU\n\t6Ab2RgZfhB7FwLpWR6Pab67BmUUP/Osdhttad8pqN8Por61sL59f1XmaHwsX90MDxCAR\n\tGW+JZM9CX56DPHkQtbaIEMLP0YNQ5W8PhOMROv9Su2isazLxdwhJ5RfGXilFi5DSurve\n\t63l05eKV1xD8oEFr1+bZDoO17D1hOX4BsbUDRCZvefoQ3+g4bkNSFdszV7TQdd/Dn8Qg\n\tpNEJJJu0t75sPmvXyu9szYMSDFn/+2N4ZzabBzIQQX55JgXQXU1Zc6bdVyJYWUt7YqIv\n\t0WSQ==","X-Gm-Message-State":"AHPjjUiW10fmwS06ZUntD5Eqv/6uFan1/NrkOxdfLePIPcYm0ibj2keo\n\tn96adqCnixg4VwJtE/aTAetgAU+D5KCD","X-Google-Smtp-Source":"AOwi7QBRdFPe2sMr5NYb33AiEGwtyqJRP/EN4MJkrBvTfxBrFBuLQNH5AxlQ2/r4DIUvQ5DVw5WlwQE5Hijs3tG5JaA=","X-Received":"by 10.202.117.134 with SMTP id q128mr659033oic.22.1504817686627; \n\tThu, 07 Sep 2017 13:54:46 -0700 (PDT)","MIME-Version":"1.0","In-Reply-To":"<20170906170754.GB17663@linux.intel.com>","References":"<20170905223541.20594-1-ross.zwisler@linux.intel.com>\n\t<f856f8fd-3697-adb1-198b-a9ae2fce405f@redhat.com>\n\t<20170906170754.GB17663@linux.intel.com>","From":"Dan Williams <dan.j.williams@intel.com>","Date":"Thu, 7 Sep 2017 13:54:45 -0700","Message-ID":"<CAPcyv4hfhDT9NFRXL+MT5epiqWHJ0RLraV4P3CZ4EJM6L-s0Nw@mail.gmail.com>","Subject":"Re: [PATCH 0/9] add ext4 per-inode DAX flag","To":"Ross Zwisler <ross.zwisler@linux.intel.com>","Cc":"Eric Sandeen <sandeen@redhat.com>, Lukas Czerner <lczerner@redhat.com>,\n\tAndrew Morton <akpm@linux-foundation.org>,\n\t\"linux-kernel@vger.kernel.org\" <linux-kernel@vger.kernel.org>,\n\t\"Darrick J. Wong\" <darrick.wong@oracle.com>,\n\t\"Theodore Ts'o\" <tytso@mit.edu>,\n\tAndreas Dilger <adilger.kernel@dilger.ca>,\n\tChristoph Hellwig <hch@lst.de>,\n\tDave Chinner <david@fromorbit.com>, Jan Kara <jack@suse.cz>,\n\tlinux-ext4 <linux-ext4@vger.kernel.org>,\n\t\"linux-nvdimm@lists.01.org\" <linux-nvdimm@lists.01.org>,\n\tlinux-xfs@vger.kernel.org","Content-Type":"text/plain; charset=\"UTF-8\"","Sender":"linux-ext4-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<linux-ext4.vger.kernel.org>","X-Mailing-List":"linux-ext4@vger.kernel.org"}},{"id":1764944,"web_url":"http://patchwork.ozlabs.org/comment/1764944/","msgid":"<20170907211303.GA23212@linux.intel.com>","list_archive_url":null,"date":"2017-09-07T21:13:03","subject":"Re: [PATCH 0/9] add ext4 per-inode DAX flag","submitter":{"id":46514,"url":"http://patchwork.ozlabs.org/api/people/46514/","name":"Ross Zwisler","email":"ross.zwisler@linux.intel.com"},"content":"On Thu, Sep 07, 2017 at 01:54:45PM -0700, Dan Williams wrote:\n> On Wed, Sep 6, 2017 at 10:07 AM, Ross Zwisler\n> <ross.zwisler@linux.intel.com> wrote:\n> > On Tue, Sep 05, 2017 at 09:12:35PM -0500, Eric Sandeen wrote:\n> >> On 9/5/17 5:35 PM, Ross Zwisler wrote:\n> >> > The original intent of this series was to add a per-inode DAX flag to ext4\n> >> > so that it would be consistent with XFS.  In my travels I found and fixed\n> >> > several related issues in both ext4 and XFS.\n> >>\n> >> Hi Ross -\n> >>\n> >> hch had a lot of reasons to nuke the dax flag from orbit, and we just\n> >> /disabled/ it in xfs due to its habit of crashing the kernel...\n> >\n> > Ah, sorry, I wasn't CC'd on those threads and missed them.  For any interested\n> > bystanders:\n> >\n> > https://www.spinics.net/lists/linux-ext4/msg57840.html\n> > https://www.spinics.net/lists/linux-xfs/msg09831.html\n> > https://www.spinics.net/lists/linux-xfs/msg10124.html\n> >\n> >> so a couple questions:\n> >>\n> >> 1) does this series pass hch's \"test the per-inode DAX flag\" fstest?\n> >\n> > Nope, it has the exact same problems as the XFS per-inode DAX flag.\n> >\n> >> 2) do we have an agreement that we need this flag at all, or is this\n> >>    just a parity item because xfs has^whad a per-inode flag?\n> >\n> > It was for parity, and because it allows admins finer grained control over\n> > their system.  Basically all things discussed in response to Lukas's original\n> > patch in the first link above.\n> \n> I think it's more than parity. When pmem is slower than page cache it\n> is actively harmful to have DAX enabled globally for a filesystem. So,\n> not only should we push for per-inode DAX control, we should also push\n> to deprecate the mount option. I agree with Christoph that we should\n> try to automatically and transparently enable DAX where it makes\n> sense, but we also need a finer-grained mechanism than a mount flag to\n> force the behavior one way or the other.\n\nYep, agreed.  I'll play with how to make this work after I've sorted out all\nthe data corruptions I've found. :)","headers":{"Return-Path":"<linux-ext4-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=linux-ext4-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xpCqF2FVzz9s81\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri,  8 Sep 2017 07:13:29 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1755725AbdIGVNP (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tThu, 7 Sep 2017 17:13:15 -0400","from mga07.intel.com ([134.134.136.100]:42606 \"EHLO\n\tmga07.intel.com\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S1751456AbdIGVNO (ORCPT <rfc822;linux-ext4@vger.kernel.org>);\n\tThu, 7 Sep 2017 17:13:14 -0400","from orsmga003.jf.intel.com ([10.7.209.27])\n\tby orsmga105.jf.intel.com with ESMTP; 07 Sep 2017 14:13:13 -0700","from theros.lm.intel.com (HELO linux.intel.com) ([10.232.112.77])\n\tby orsmga003.jf.intel.com with ESMTP; 07 Sep 2017 14:13:04 -0700"],"X-ExtLoop1":"1","X-IronPort-AV":"E=Sophos;i=\"5.42,360,1500966000\"; d=\"scan'208\";a=\"1012160464\"","Date":"Thu, 7 Sep 2017 15:13:03 -0600","From":"Ross Zwisler <ross.zwisler@linux.intel.com>","To":"Dan Williams <dan.j.williams@intel.com>","Cc":"Ross Zwisler <ross.zwisler@linux.intel.com>,\n\tEric Sandeen <sandeen@redhat.com>, Lukas Czerner <lczerner@redhat.com>,\n\tAndrew Morton <akpm@linux-foundation.org>,\n\t\"linux-kernel@vger.kernel.org\" <linux-kernel@vger.kernel.org>,\n\t\"Darrick J. Wong\" <darrick.wong@oracle.com>,\n\tTheodore Ts'o <tytso@mit.edu>, Andreas Dilger <adilger.kernel@dilger.ca>,\n\tChristoph Hellwig <hch@lst.de>,\n\tDave Chinner <david@fromorbit.com>, Jan Kara <jack@suse.cz>,\n\tlinux-ext4 <linux-ext4@vger.kernel.org>,\n\t\"linux-nvdimm@lists.01.org\" <linux-nvdimm@lists.01.org>,\n\tlinux-xfs@vger.kernel.org","Subject":"Re: [PATCH 0/9] add ext4 per-inode DAX flag","Message-ID":"<20170907211303.GA23212@linux.intel.com>","References":"<20170905223541.20594-1-ross.zwisler@linux.intel.com>\n\t<f856f8fd-3697-adb1-198b-a9ae2fce405f@redhat.com>\n\t<20170906170754.GB17663@linux.intel.com>\n\t<CAPcyv4hfhDT9NFRXL+MT5epiqWHJ0RLraV4P3CZ4EJM6L-s0Nw@mail.gmail.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<CAPcyv4hfhDT9NFRXL+MT5epiqWHJ0RLraV4P3CZ4EJM6L-s0Nw@mail.gmail.com>","User-Agent":"Mutt/1.8.3 (2017-05-23)","Sender":"linux-ext4-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<linux-ext4.vger.kernel.org>","X-Mailing-List":"linux-ext4@vger.kernel.org"}},{"id":1764951,"web_url":"http://patchwork.ozlabs.org/comment/1764951/","msgid":"<5F58D3F5-D93B-4648-AE01-8A46956FBB4B@dilger.ca>","list_archive_url":null,"date":"2017-09-07T21:26:10","subject":"Re: [PATCH 0/9] add ext4 per-inode DAX flag","submitter":{"id":4514,"url":"http://patchwork.ozlabs.org/api/people/4514/","name":"Andreas Dilger","email":"adilger@dilger.ca"},"content":"On Sep 7, 2017, at 3:13 PM, Ross Zwisler <ross.zwisler@linux.intel.com> wrote:\n> \n> On Thu, Sep 07, 2017 at 01:54:45PM -0700, Dan Williams wrote:\n>> On Wed, Sep 6, 2017 at 10:07 AM, Ross Zwisler\n>> <ross.zwisler@linux.intel.com> wrote:\n>>> On Tue, Sep 05, 2017 at 09:12:35PM -0500, Eric Sandeen wrote:\n>>>> On 9/5/17 5:35 PM, Ross Zwisler wrote:\n>>>>> The original intent of this series was to add a per-inode DAX flag to ext4\n>>>>> so that it would be consistent with XFS.  In my travels I found and fixed\n>>>>> several related issues in both ext4 and XFS.\n>>>> \n>>>> Hi Ross -\n>>>> \n>>>> hch had a lot of reasons to nuke the dax flag from orbit, and we just\n>>>> /disabled/ it in xfs due to its habit of crashing the kernel...\n>>> \n>>> Ah, sorry, I wasn't CC'd on those threads and missed them.  For any interested\n>>> bystanders:\n>>> \n>>> https://www.spinics.net/lists/linux-ext4/msg57840.html\n>>> https://www.spinics.net/lists/linux-xfs/msg09831.html\n>>> https://www.spinics.net/lists/linux-xfs/msg10124.html\n>>> \n>>>> so a couple questions:\n>>>> \n>>>> 1) does this series pass hch's \"test the per-inode DAX flag\" fstest?\n>>> \n>>> Nope, it has the exact same problems as the XFS per-inode DAX flag.\n>>> \n>>>> 2) do we have an agreement that we need this flag at all, or is this\n>>>>   just a parity item because xfs has^whad a per-inode flag?\n>>> \n>>> It was for parity, and because it allows admins finer grained control over\n>>> their system.  Basically all things discussed in response to Lukas's original\n>>> patch in the first link above.\n>> \n>> I think it's more than parity. When pmem is slower than page cache it\n>> is actively harmful to have DAX enabled globally for a filesystem. So,\n>> not only should we push for per-inode DAX control, we should also push\n>> to deprecate the mount option. I agree with Christoph that we should\n>> try to automatically and transparently enable DAX where it makes\n>> sense, but we also need a finer-grained mechanism than a mount flag to\n>> force the behavior one way or the other.\n> \n> Yep, agreed.  I'll play with how to make this work after I've sorted out all\n> the data corruptions I've found. :)\n\nIt seems that the majority of problems are from enabling/disabling S_DAX\non an inode that already has dirty data.  However, I wonder if this could\nbe prevented at runtime, and only allow S_DAX to be set when the inode is\nfirst instantiated, and wouldn't be allowed to change after that?  Setting\nor clearing the per-inode DAX flag might still be allowed, but it wouldn't\nbe enabled until the inode is next fetched into cache?  Similarly, for\ninodes that have conflicting features (e.g. inline data or encryption)\nwould not be allowed to enable S_DAX.\n\nMy assumption here is that it is possible to fall back to always using\npage cache for such inodes, and flush the data to pmem via the block\ninterface for inodes that don't have S_DAX set?\n\nThat would allow the vast majority of cases to work out of the box, or in\na few rare cases where the DAX feature is being changed (e.g. inline data\ninode on disk growing to external disk blocks) would use the page cache\nuntil such a time that the inode is dropped from cache and reloaded (at\nworst the next remount).\n\nCheers, Andreas","headers":{"Return-Path":"<linux-ext4-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=linux-ext4-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (2048-bit key;\n\tunprotected) header.d=dilger-ca.20150623.gappssmtp.com\n\theader.i=@dilger-ca.20150623.gappssmtp.com\n\theader.b=\"lKEsUTmd\"; dkim-atps=neutral"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xpD6G18bjz9s81\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri,  8 Sep 2017 07:26:30 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1756137AbdIGV0T (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tThu, 7 Sep 2017 17:26:19 -0400","from mail-it0-f65.google.com ([209.85.214.65]:35992 \"EHLO\n\tmail-it0-f65.google.com\" rhost-flags-OK-OK-OK-OK) by vger.kernel.org\n\twith ESMTP id S1755939AbdIGV0R (ORCPT\n\t<rfc822; linux-ext4@vger.kernel.org>); Thu, 7 Sep 2017 17:26:17 -0400","by mail-it0-f65.google.com with SMTP id 3so378629ity.3\n\tfor <linux-ext4@vger.kernel.org>;\n\tThu, 07 Sep 2017 14:26:17 -0700 (PDT)","from cabot-wlan.adilger.int (S0106a84e3fe4b223.cg.shawcable.net.\n\t[70.77.216.213]) by smtp.gmail.com with ESMTPSA id\n\te34sm163034ioj.1.2017.09.07.14.26.13\n\t(version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);\n\tThu, 07 Sep 2017 14:26:14 -0700 (PDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=dilger-ca.20150623.gappssmtp.com; s=20150623;\n\th=from:message-id:mime-version:subject:date:in-reply-to:cc:to\n\t:references; bh=0tLYGPcSs228196I4CoU4FXSt63a8CSUZ2awtQp3L54=;\n\tb=lKEsUTmdVTKiqFwHD55iumXu1ZIm44MmVDXFMsoTqsdvt8S5grfmg8FNvSJoXv4k0J\n\tTHJbnD/mRJPqmn3RK8xns4bN/qQboGHVbtwKz+aO/+vP7+HHWx0pEmgb1kNiIr52IYhw\n\tcY53drHs/lPHdaOmYae+7LEWBJRhutu+UbCIId/C+zHg4cG34HEP3+goRBzMHDE0A9qh\n\tNW4ybFrMPMNHQKgNtIqdKwnGbGMrAyFIg4lyxqG1L7fdgHyOo5Lha8wzY2mxJj5onViQ\n\tVtGWJp6wVIDHusrWOzutx20KIXwhVflR1z4yQyFub+/ZGtuNvtFQiFlueR4G8oD9wMrN\n\tlQPg==","X-Google-DKIM-Signature":"v=1; a=rsa-sha256; c=relaxed/relaxed;\n\td=1e100.net; s=20161025;\n\th=x-gm-message-state:from:message-id:mime-version:subject:date\n\t:in-reply-to:cc:to:references;\n\tbh=0tLYGPcSs228196I4CoU4FXSt63a8CSUZ2awtQp3L54=;\n\tb=o6pidg+6tpqfxdGhdHp4JOZrpJEG/3U2sZTdaElRokFcB+D49QjNaL0962ElPWjArr\n\tqLwY/CR+53fACZr0PO0l7pkQ+kJj6YXEGIBjt/i2SBX0mMs72x6j0sFZzO4SRWyzTXKH\n\tu2sp9w3xyAZmE8qAcbOWKarb93F89N/KFMw/A/HgL5aWD13XaLJUFx20bh/3l1B9ahl5\n\tlnGMZdCF4fP+pmo0/zEd5BMKabfmmo+7tHqqA5lVhwaXJiOxu38Mxkvciz7oFLRSbKWV\n\tsRj0wY+jCsBjz8SQwDNdj7nj42/ZheYVd7vwLf3UG7zkK3NCC6A/yHdLeinzhMKrSf95\n\tfW4Q==","X-Gm-Message-State":"AHPjjUgZvoEGtup5qfEup+waKosQwJdc+99O/6X/nmZl31kXv7cuQCp4\n\tzkmTgcOnCOB0B9GE","X-Google-Smtp-Source":"AOwi7QB26Ni2aJ78KMXwbJ8YPyAcXN8rtDUdUa/3MpzLoo+t8sc2pmhEl3lid+xTqHfLruRPwW4mAA==","X-Received":"by 10.36.130.130 with SMTP id t124mr366844itd.170.1504819576313; \n\tThu, 07 Sep 2017 14:26:16 -0700 (PDT)","From":"Andreas Dilger <adilger@dilger.ca>","Message-Id":"<5F58D3F5-D93B-4648-AE01-8A46956FBB4B@dilger.ca>","Content-Type":"multipart/signed;\n\tboundary=\"Apple-Mail=_731D3B21-257E-4633-80D4-CB7DD9B217F4\";\n\tprotocol=\"application/pgp-signature\"; micalg=pgp-sha1","Mime-Version":"1.0 (Mac OS X Mail 10.3 \\(3273\\))","Subject":"Re: [PATCH 0/9] add ext4 per-inode DAX flag","Date":"Thu, 7 Sep 2017 15:26:10 -0600","In-Reply-To":"<20170907211303.GA23212@linux.intel.com>","Cc":"Dan Williams <dan.j.williams@intel.com>,\n\tEric Sandeen <sandeen@redhat.com>, Lukas Czerner <lczerner@redhat.com>,\n\tAndrew Morton <akpm@linux-foundation.org>,\n\t\"linux-kernel@vger.kernel.org\" <linux-kernel@vger.kernel.org>,\n\t\"Darrick J. Wong\" <darrick.wong@oracle.com>,\n\tTheodore Ts'o <tytso@mit.edu>, Christoph Hellwig <hch@lst.de>,\n\tDave Chinner <david@fromorbit.com>, Jan Kara <jack@suse.cz>,\n\tlinux-ext4 <linux-ext4@vger.kernel.org>,\n\t\"linux-nvdimm@lists.01.org\" <linux-nvdimm@lists.01.org>,\n\txfs <linux-xfs@vger.kernel.org>","To":"Ross Zwisler <ross.zwisler@linux.intel.com>","References":"<20170905223541.20594-1-ross.zwisler@linux.intel.com>\n\t<f856f8fd-3697-adb1-198b-a9ae2fce405f@redhat.com>\n\t<20170906170754.GB17663@linux.intel.com>\n\t<CAPcyv4hfhDT9NFRXL+MT5epiqWHJ0RLraV4P3CZ4EJM6L-s0Nw@mail.gmail.com>\n\t<20170907211303.GA23212@linux.intel.com>","X-Mailer":"Apple Mail (2.3273)","Sender":"linux-ext4-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<linux-ext4.vger.kernel.org>","X-Mailing-List":"linux-ext4@vger.kernel.org"}},{"id":1764969,"web_url":"http://patchwork.ozlabs.org/comment/1764969/","msgid":"<20170907215148.GA12669@linux.intel.com>","list_archive_url":null,"date":"2017-09-07T21:51:48","subject":"Re: [PATCH 0/9] add ext4 per-inode DAX flag","submitter":{"id":46514,"url":"http://patchwork.ozlabs.org/api/people/46514/","name":"Ross Zwisler","email":"ross.zwisler@linux.intel.com"},"content":"On Thu, Sep 07, 2017 at 03:26:10PM -0600, Andreas Dilger wrote:\n> On Sep 7, 2017, at 3:13 PM, Ross Zwisler <ross.zwisler@linux.intel.com> wrote:\n> > \n> > On Thu, Sep 07, 2017 at 01:54:45PM -0700, Dan Williams wrote:\n> >> On Wed, Sep 6, 2017 at 10:07 AM, Ross Zwisler\n> >> <ross.zwisler@linux.intel.com> wrote:\n> >>> On Tue, Sep 05, 2017 at 09:12:35PM -0500, Eric Sandeen wrote:\n> >>>> On 9/5/17 5:35 PM, Ross Zwisler wrote:\n> >>>>> The original intent of this series was to add a per-inode DAX flag to ext4\n> >>>>> so that it would be consistent with XFS.  In my travels I found and fixed\n> >>>>> several related issues in both ext4 and XFS.\n> >>>> \n> >>>> Hi Ross -\n> >>>> \n> >>>> hch had a lot of reasons to nuke the dax flag from orbit, and we just\n> >>>> /disabled/ it in xfs due to its habit of crashing the kernel...\n> >>> \n> >>> Ah, sorry, I wasn't CC'd on those threads and missed them.  For any interested\n> >>> bystanders:\n> >>> \n> >>> https://www.spinics.net/lists/linux-ext4/msg57840.html\n> >>> https://www.spinics.net/lists/linux-xfs/msg09831.html\n> >>> https://www.spinics.net/lists/linux-xfs/msg10124.html\n> >>> \n> >>>> so a couple questions:\n> >>>> \n> >>>> 1) does this series pass hch's \"test the per-inode DAX flag\" fstest?\n> >>> \n> >>> Nope, it has the exact same problems as the XFS per-inode DAX flag.\n> >>> \n> >>>> 2) do we have an agreement that we need this flag at all, or is this\n> >>>>   just a parity item because xfs has^whad a per-inode flag?\n> >>> \n> >>> It was for parity, and because it allows admins finer grained control over\n> >>> their system.  Basically all things discussed in response to Lukas's original\n> >>> patch in the first link above.\n> >> \n> >> I think it's more than parity. When pmem is slower than page cache it\n> >> is actively harmful to have DAX enabled globally for a filesystem. So,\n> >> not only should we push for per-inode DAX control, we should also push\n> >> to deprecate the mount option. I agree with Christoph that we should\n> >> try to automatically and transparently enable DAX where it makes\n> >> sense, but we also need a finer-grained mechanism than a mount flag to\n> >> force the behavior one way or the other.\n> > \n> > Yep, agreed.  I'll play with how to make this work after I've sorted out all\n> > the data corruptions I've found. :)\n> \n> It seems that the majority of problems are from enabling/disabling S_DAX\n> on an inode that already has dirty data. \n\nI don't think it's precisely about dirty data, more about having mappings set\nup and I/Os in flight, even if those are read operations.  Tomorrow I'll post\nsome xfstests for the data corruptions due to DAX + each of inline data and\njournaling, and those both happen because we set up one mapping to page cache,\nand one to DAX.  Once either is written to they become out of sync.\n\n> However, I wonder if this could\n> be prevented at runtime, and only allow S_DAX to be set when the inode is\n> first instantiated, and wouldn't be allowed to change after that?  Setting\n> or clearing the per-inode DAX flag might still be allowed, but it wouldn't\n> be enabled until the inode is next fetched into cache?  Similarly, for\n> inodes that have conflicting features (e.g. inline data or encryption)\n> would not be allowed to enable S_DAX.\n\nOoh, this seems interesting.  This would ensure that S_DAX transitions\ncouldn't ever race with I/Os or mmaps().  I had some other ideas for how to\nhandle this, but I think your idea is more promising. :)\n\nI guess with this solution we'd need:\n\na) A good way of letting the user detect the state where they had set the DAX\ninode flag, but that it wasn't yet in use by the inode.\n\nb) A reliable way of flushing the inode from the filesystem cache, so that the\nnext time an open() happens they get the new behavior.  The way I usually do\nthis is via umount/remount, but there is probably already a way to do this?\n\n> My assumption here is that it is possible to fall back to always using\n> page cache for such inodes, and flush the data to pmem via the block\n> interface for inodes that don't have S_DAX set?\n\nCorrect.\n\n> That would allow the vast majority of cases to work out of the box, or in\n> a few rare cases where the DAX feature is being changed (e.g. inline data\n> inode on disk growing to external disk blocks) would use the page cache\n> until such a time that the inode is dropped from cache and reloaded (at\n> worst the next remount).\n\nAh, yep, this has the potential to solve those cases as well.  Seems\npromising, to me at least. :)","headers":{"Return-Path":"<linux-ext4-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=linux-ext4-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xpDgz4Mf8z9sBZ\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri,  8 Sep 2017 07:52:15 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S932464AbdIGVvv (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tThu, 7 Sep 2017 17:51:51 -0400","from mga02.intel.com ([134.134.136.20]:9803 \"EHLO mga02.intel.com\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S932095AbdIGVvu (ORCPT <rfc822;linux-ext4@vger.kernel.org>);\n\tThu, 7 Sep 2017 17:51:50 -0400","from fmsmga005.fm.intel.com ([10.253.24.32])\n\tby orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384;\n\t07 Sep 2017 14:51:49 -0700","from theros.lm.intel.com (HELO linux.intel.com) ([10.232.112.77])\n\tby fmsmga005.fm.intel.com with ESMTP; 07 Sep 2017 14:51:48 -0700"],"X-ExtLoop1":"1","X-IronPort-AV":"E=Sophos;i=\"5.42,360,1500966000\"; d=\"scan'208\";a=\"148821834\"","Date":"Thu, 7 Sep 2017 15:51:48 -0600","From":"Ross Zwisler <ross.zwisler@linux.intel.com>","To":"Andreas Dilger <adilger@dilger.ca>","Cc":"Ross Zwisler <ross.zwisler@linux.intel.com>,\n\tDan Williams <dan.j.williams@intel.com>,\n\tEric Sandeen <sandeen@redhat.com>, Lukas Czerner <lczerner@redhat.com>,\n\tAndrew Morton <akpm@linux-foundation.org>,\n\t\"linux-kernel@vger.kernel.org\" <linux-kernel@vger.kernel.org>,\n\t\"Darrick J. Wong\" <darrick.wong@oracle.com>,\n\tTheodore Ts'o <tytso@mit.edu>, Christoph Hellwig <hch@lst.de>,\n\tDave Chinner <david@fromorbit.com>, Jan Kara <jack@suse.cz>,\n\tlinux-ext4 <linux-ext4@vger.kernel.org>,\n\t\"linux-nvdimm@lists.01.org\" <linux-nvdimm@lists.01.org>,\n\txfs <linux-xfs@vger.kernel.org>","Subject":"Re: [PATCH 0/9] add ext4 per-inode DAX flag","Message-ID":"<20170907215148.GA12669@linux.intel.com>","References":"<20170905223541.20594-1-ross.zwisler@linux.intel.com>\n\t<f856f8fd-3697-adb1-198b-a9ae2fce405f@redhat.com>\n\t<20170906170754.GB17663@linux.intel.com>\n\t<CAPcyv4hfhDT9NFRXL+MT5epiqWHJ0RLraV4P3CZ4EJM6L-s0Nw@mail.gmail.com>\n\t<20170907211303.GA23212@linux.intel.com>\n\t<5F58D3F5-D93B-4648-AE01-8A46956FBB4B@dilger.ca>","MIME-Version":"1.0","Content-Type":"text/plain; charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<5F58D3F5-D93B-4648-AE01-8A46956FBB4B@dilger.ca>","User-Agent":"Mutt/1.8.3 (2017-05-23)","Sender":"linux-ext4-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<linux-ext4.vger.kernel.org>","X-Mailing-List":"linux-ext4@vger.kernel.org"}},{"id":1764973,"web_url":"http://patchwork.ozlabs.org/comment/1764973/","msgid":"<20170907221201.GZ17782@dastard>","list_archive_url":null,"date":"2017-09-07T22:12:01","subject":"Re: [PATCH 0/9] add ext4 per-inode DAX flag","submitter":{"id":421,"url":"http://patchwork.ozlabs.org/api/people/421/","name":"Dave Chinner","email":"david@fromorbit.com"},"content":"On Thu, Sep 07, 2017 at 03:51:48PM -0600, Ross Zwisler wrote:\n> On Thu, Sep 07, 2017 at 03:26:10PM -0600, Andreas Dilger wrote:\n> > However, I wonder if this could\n> > be prevented at runtime, and only allow S_DAX to be set when the inode is\n> > first instantiated, and wouldn't be allowed to change after that?  Setting\n> > or clearing the per-inode DAX flag might still be allowed, but it wouldn't\n> > be enabled until the inode is next fetched into cache?  Similarly, for\n> > inodes that have conflicting features (e.g. inline data or encryption)\n> > would not be allowed to enable S_DAX.\n> \n> Ooh, this seems interesting.  This would ensure that S_DAX transitions\n> couldn't ever race with I/Os or mmaps().  I had some other ideas for how to\n> handle this, but I think your idea is more promising. :)\n\nIMO, that's an awful admin interface - it can't be done on demand\n(i.e. when needed) because we can't force an inode to be evicted\nfrom the cache. And then we have the \"why the hell did that just\nchange\" problem if an inode is evicted due to memory pressure and\nthen immediately reinstantiated by the running workload. That's a\nrecipe for driving admins insane...\n\n> I guess with this solution we'd need:\n> \n> a) A good way of letting the user detect the state where they had set the DAX\n> inode flag, but that it wasn't yet in use by the inode.\n> \n> b) A reliable way of flushing the inode from the filesystem cache, so that the\n> next time an open() happens they get the new behavior.  The way I usually do\n> this is via umount/remount, but there is probably already a way to do this?\n\nNot if it's referenced. And if it's not referenced, then the only\nhammer we have is Brutus^Wdrop_caches. That's not an option for\nproduction machines.\n\nNeat idea, but one I'd already thought of and discarded as \"not\npractical from an admin perspective\".\n\nCheers,\n\nDave.","headers":{"Return-Path":"<linux-ext4-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=linux-ext4-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xpF784c6Qz9sCZ\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri,  8 Sep 2017 08:12:20 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S932311AbdIGWMH (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tThu, 7 Sep 2017 18:12:07 -0400","from ipmail07.adl2.internode.on.net ([150.101.137.131]:65237 \"EHLO\n\tipmail07.adl2.internode.on.net\" rhost-flags-OK-OK-OK-OK)\n\tby vger.kernel.org with ESMTP id S932238AbdIGWMG (ORCPT\n\t<rfc822; linux-ext4@vger.kernel.org>); Thu, 7 Sep 2017 18:12:06 -0400","from ppp59-167-129-252.static.internode.on.net (HELO dastard)\n\t([59.167.129.252]) by ipmail07.adl2.internode.on.net with ESMTP;\n\t08 Sep 2017 07:42:03 +0930","from dave by dastard with local (Exim 4.80)\n\t(envelope-from <david@fromorbit.com>)\n\tid 1dq51l-0007iC-W6; Fri, 08 Sep 2017 08:12:02 +1000"],"X-IronPort-Anti-Spam-Filtered":"true","X-IronPort-Anti-Spam-Result":"A2ARAwApw7FZ//yBpztcGgEBAQECAQEBAQgBAQEBhSwnjwyPZwEBAQEBAQaBKo0YiyKFQgQCAoRaAQIBAQEBAQJrKIUYAQEBAwE6HCMFCwgDGAklDwUlAyETG4oJBQeubIs5AQEIAiYggwqBG4FugimDKYppBaB0lESSfpZkV4ENMiEIHBWFYRyBeS42iikBAQE","Date":"Fri, 8 Sep 2017 08:12:01 +1000","From":"Dave Chinner <david@fromorbit.com>","To":"Ross Zwisler <ross.zwisler@linux.intel.com>","Cc":"Andreas Dilger <adilger@dilger.ca>,\n\tDan Williams <dan.j.williams@intel.com>,\n\tEric Sandeen <sandeen@redhat.com>, Lukas Czerner <lczerner@redhat.com>,\n\tAndrew Morton <akpm@linux-foundation.org>,\n\t\"linux-kernel@vger.kernel.org\" <linux-kernel@vger.kernel.org>,\n\t\"Darrick J. Wong\" <darrick.wong@oracle.com>,\n\tTheodore Ts'o <tytso@mit.edu>, Christoph Hellwig <hch@lst.de>,\n\tJan Kara <jack@suse.cz>, linux-ext4 <linux-ext4@vger.kernel.org>,\n\t\"linux-nvdimm@lists.01.org\" <linux-nvdimm@lists.01.org>,\n\txfs <linux-xfs@vger.kernel.org>","Subject":"Re: [PATCH 0/9] add ext4 per-inode DAX flag","Message-ID":"<20170907221201.GZ17782@dastard>","References":"<20170905223541.20594-1-ross.zwisler@linux.intel.com>\n\t<f856f8fd-3697-adb1-198b-a9ae2fce405f@redhat.com>\n\t<20170906170754.GB17663@linux.intel.com>\n\t<CAPcyv4hfhDT9NFRXL+MT5epiqWHJ0RLraV4P3CZ4EJM6L-s0Nw@mail.gmail.com>\n\t<20170907211303.GA23212@linux.intel.com>\n\t<5F58D3F5-D93B-4648-AE01-8A46956FBB4B@dilger.ca>\n\t<20170907215148.GA12669@linux.intel.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<20170907215148.GA12669@linux.intel.com>","User-Agent":"Mutt/1.5.21 (2010-09-15)","Sender":"linux-ext4-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<linux-ext4.vger.kernel.org>","X-Mailing-List":"linux-ext4@vger.kernel.org"}},{"id":1764977,"web_url":"http://patchwork.ozlabs.org/comment/1764977/","msgid":"<20170907221900.GB12669@linux.intel.com>","list_archive_url":null,"date":"2017-09-07T22:19:00","subject":"Re: [PATCH 0/9] add ext4 per-inode DAX flag","submitter":{"id":46514,"url":"http://patchwork.ozlabs.org/api/people/46514/","name":"Ross Zwisler","email":"ross.zwisler@linux.intel.com"},"content":"On Fri, Sep 08, 2017 at 08:12:01AM +1000, Dave Chinner wrote:\n> On Thu, Sep 07, 2017 at 03:51:48PM -0600, Ross Zwisler wrote:\n> > On Thu, Sep 07, 2017 at 03:26:10PM -0600, Andreas Dilger wrote:\n> > > However, I wonder if this could\n> > > be prevented at runtime, and only allow S_DAX to be set when the inode is\n> > > first instantiated, and wouldn't be allowed to change after that?  Setting\n> > > or clearing the per-inode DAX flag might still be allowed, but it wouldn't\n> > > be enabled until the inode is next fetched into cache?  Similarly, for\n> > > inodes that have conflicting features (e.g. inline data or encryption)\n> > > would not be allowed to enable S_DAX.\n> > \n> > Ooh, this seems interesting.  This would ensure that S_DAX transitions\n> > couldn't ever race with I/Os or mmaps().  I had some other ideas for how to\n> > handle this, but I think your idea is more promising. :)\n> \n> IMO, that's an awful admin interface - it can't be done on demand\n> (i.e. when needed) because we can't force an inode to be evicted\n> from the cache. And then we have the \"why the hell did that just\n> change\" problem if an inode is evicted due to memory pressure and\n> then immediately reinstantiated by the running workload. That's a\n> recipe for driving admins insane...\n> \n> > I guess with this solution we'd need:\n> > \n> > a) A good way of letting the user detect the state where they had set the DAX\n> > inode flag, but that it wasn't yet in use by the inode.\n> > \n> > b) A reliable way of flushing the inode from the filesystem cache, so that the\n> > next time an open() happens they get the new behavior.  The way I usually do\n> > this is via umount/remount, but there is probably already a way to do this?\n> \n> Not if it's referenced. And if it's not referenced, then the only\n> hammer we have is Brutus^Wdrop_caches. That's not an option for\n> production machines.\n> \n> Neat idea, but one I'd already thought of and discarded as \"not\n> practical from an admin perspective\".\n\nOkay, so other ideas (which you have also probably already though of) include:\n\n1) Just return -EBUSY if anyone tries to change the DAX flag of an inode with\nopen mappings or any open file handles.  To prevent TOCTOU races we'd have to\ndo some additional locking while actually changing the flag.\n\n2) Be more drastic and follow the flow of ext4 file based encryption, only\nallowing the inode flag to be set by an admin on an empty directory.  Files in\nthat directory will inherit it when they are created, and we don't provide a\nway to clear.  If you want your file to not use DAX, move it to a different\ndirectory (which I think for ext4 encryption turns it into a new inode).\n\nOther ideas?","headers":{"Return-Path":"<linux-ext4-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=linux-ext4-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xpFHV5tcjz9sCZ\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri,  8 Sep 2017 08:19:34 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S932596AbdIGWTD (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tThu, 7 Sep 2017 18:19:03 -0400","from mga05.intel.com ([192.55.52.43]:51498 \"EHLO mga05.intel.com\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S932287AbdIGWTB (ORCPT <rfc822;linux-ext4@vger.kernel.org>);\n\tThu, 7 Sep 2017 18:19:01 -0400","from fmsmga003.fm.intel.com ([10.253.24.29])\n\tby fmsmga105.fm.intel.com with ESMTP; 07 Sep 2017 15:19:01 -0700","from theros.lm.intel.com (HELO linux.intel.com) ([10.232.112.77])\n\tby FMSMGA003.fm.intel.com with ESMTP; 07 Sep 2017 15:19:00 -0700"],"X-ExtLoop1":"1","X-IronPort-AV":"E=Sophos;i=\"5.42,360,1500966000\"; d=\"scan'208\";a=\"898184763\"","Date":"Thu, 7 Sep 2017 16:19:00 -0600","From":"Ross Zwisler <ross.zwisler@linux.intel.com>","To":"Dave Chinner <david@fromorbit.com>","Cc":"Ross Zwisler <ross.zwisler@linux.intel.com>,\n\tAndreas Dilger <adilger@dilger.ca>,\n\tDan Williams <dan.j.williams@intel.com>,\n\tEric Sandeen <sandeen@redhat.com>, Lukas Czerner <lczerner@redhat.com>,\n\tAndrew Morton <akpm@linux-foundation.org>,\n\t\"linux-kernel@vger.kernel.org\" <linux-kernel@vger.kernel.org>,\n\t\"Darrick J. Wong\" <darrick.wong@oracle.com>,\n\tTheodore Ts'o <tytso@mit.edu>, Christoph Hellwig <hch@lst.de>,\n\tJan Kara <jack@suse.cz>, linux-ext4 <linux-ext4@vger.kernel.org>,\n\t\"linux-nvdimm@lists.01.org\" <linux-nvdimm@lists.01.org>,\n\txfs <linux-xfs@vger.kernel.org>","Subject":"Re: [PATCH 0/9] add ext4 per-inode DAX flag","Message-ID":"<20170907221900.GB12669@linux.intel.com>","References":"<20170905223541.20594-1-ross.zwisler@linux.intel.com>\n\t<f856f8fd-3697-adb1-198b-a9ae2fce405f@redhat.com>\n\t<20170906170754.GB17663@linux.intel.com>\n\t<CAPcyv4hfhDT9NFRXL+MT5epiqWHJ0RLraV4P3CZ4EJM6L-s0Nw@mail.gmail.com>\n\t<20170907211303.GA23212@linux.intel.com>\n\t<5F58D3F5-D93B-4648-AE01-8A46956FBB4B@dilger.ca>\n\t<20170907215148.GA12669@linux.intel.com>\n\t<20170907221201.GZ17782@dastard>","MIME-Version":"1.0","Content-Type":"text/plain; charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<20170907221201.GZ17782@dastard>","User-Agent":"Mutt/1.8.3 (2017-05-23)","Sender":"linux-ext4-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<linux-ext4.vger.kernel.org>","X-Mailing-List":"linux-ext4@vger.kernel.org"}},{"id":1765001,"web_url":"http://patchwork.ozlabs.org/comment/1765001/","msgid":"<20170907232543.GB17782@dastard>","list_archive_url":null,"date":"2017-09-07T23:25:43","subject":"Re: [PATCH 0/9] add ext4 per-inode DAX flag","submitter":{"id":421,"url":"http://patchwork.ozlabs.org/api/people/421/","name":"Dave Chinner","email":"david@fromorbit.com"},"content":"On Thu, Sep 07, 2017 at 04:19:00PM -0600, Ross Zwisler wrote:\n> On Fri, Sep 08, 2017 at 08:12:01AM +1000, Dave Chinner wrote:\n> > On Thu, Sep 07, 2017 at 03:51:48PM -0600, Ross Zwisler wrote:\n> > > On Thu, Sep 07, 2017 at 03:26:10PM -0600, Andreas Dilger wrote:\n> > > > However, I wonder if this could\n> > > > be prevented at runtime, and only allow S_DAX to be set when the inode is\n> > > > first instantiated, and wouldn't be allowed to change after that?  Setting\n> > > > or clearing the per-inode DAX flag might still be allowed, but it wouldn't\n> > > > be enabled until the inode is next fetched into cache?  Similarly, for\n> > > > inodes that have conflicting features (e.g. inline data or encryption)\n> > > > would not be allowed to enable S_DAX.\n> > > \n> > > Ooh, this seems interesting.  This would ensure that S_DAX transitions\n> > > couldn't ever race with I/Os or mmaps().  I had some other ideas for how to\n> > > handle this, but I think your idea is more promising. :)\n> > \n> > IMO, that's an awful admin interface - it can't be done on demand\n> > (i.e. when needed) because we can't force an inode to be evicted\n> > from the cache. And then we have the \"why the hell did that just\n> > change\" problem if an inode is evicted due to memory pressure and\n> > then immediately reinstantiated by the running workload. That's a\n> > recipe for driving admins insane...\n> > \n> > > I guess with this solution we'd need:\n> > > \n> > > a) A good way of letting the user detect the state where they had set the DAX\n> > > inode flag, but that it wasn't yet in use by the inode.\n> > > \n> > > b) A reliable way of flushing the inode from the filesystem cache, so that the\n> > > next time an open() happens they get the new behavior.  The way I usually do\n> > > this is via umount/remount, but there is probably already a way to do this?\n> > \n> > Not if it's referenced. And if it's not referenced, then the only\n> > hammer we have is Brutus^Wdrop_caches. That's not an option for\n> > production machines.\n> > \n> > Neat idea, but one I'd already thought of and discarded as \"not\n> > practical from an admin perspective\".\n> \n> Okay, so other ideas (which you have also probably already though of) include:\n> \n> 1) Just return -EBUSY if anyone tries to change the DAX flag of an inode with\n> open mappings or any open file handles.\n\nYou have to have an open fd to change the flag. :)\n\n> To prevent TOCTOU races we'd have to\n> do some additional locking while actually changing the flag.\n\nI think that make sense - the fundamental problem is that the\nmappings are different between dax and non-dax, and that we can't\nproperly lock out page faults to to prevent sending a racing\npage fault down the wrong path.\n\n> 2) Be more drastic and follow the flow of ext4 file based encryption, only\n> allowing the inode flag to be set by an admin on an empty directory.  Files in\n> that directory will inherit it when they are created, and we don't provide a\n> way to clear.  If you want your file to not use DAX, move it to a different\n> directory (which I think for ext4 encryption turns it into a new inode).\n\nSeems like the wrong model to me - moving application data files\nis a PITA because you've also go to change the app config to point\nat the new location...\n\n> Other ideas?\n\nIMO, we need to fix the page fault path so we don't look at inode\nflags to determine processing behaviour during the fault. Fault\nprocessing as DAX or non-dax needs to be determined by the page\nfault code and communicated to the fs via the vmf as the contents\nof the vmf for a dax fault can be invalid for a non-dax fault. Fixing\nthat problem (i.e. make DAX is a property of the mapping and\ninstantiate it from the inode only at mmap() time) means all the\npage fault vs inode flag race problems go away and we have a model\nthat is much more robust if we want to expand it in future.\n\nCombine that with -EBUSY when there are active mappings as you've\nproposed above and I think we've got a much more solid solution to\nthe problem.\n\n-Dave.","headers":{"Return-Path":"<linux-ext4-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=linux-ext4-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xpGpZ3qqWz9s7C\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri,  8 Sep 2017 09:28:06 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751543AbdIGXZs (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tThu, 7 Sep 2017 19:25:48 -0400","from ipmail07.adl2.internode.on.net ([150.101.137.131]:53559 \"EHLO\n\tipmail07.adl2.internode.on.net\" rhost-flags-OK-OK-OK-OK)\n\tby vger.kernel.org with ESMTP id S1750852AbdIGXZr (ORCPT\n\t<rfc822; linux-ext4@vger.kernel.org>); Thu, 7 Sep 2017 19:25:47 -0400","from ppp59-167-129-252.static.internode.on.net (HELO dastard)\n\t([59.167.129.252]) by ipmail07.adl2.internode.on.net with ESMTP;\n\t08 Sep 2017 08:55:44 +0930","from dave by dastard with local (Exim 4.80)\n\t(envelope-from <david@fromorbit.com>)\n\tid 1dq6B5-0007mk-CF; Fri, 08 Sep 2017 09:25:43 +1000"],"X-IronPort-Anti-Spam-Filtered":"true","X-IronPort-Anti-Spam-Result":"A2DAAgCB1LFZ//yBpztcGgEBAQECAQEBAQgBAQEBgy2BfyeDLItgj2sBAQaBKo0YiyKFQgQCAoRaAQIBAQEBAQJrKIUYAQEBAwE6HCMFCwgDGAklDwUlAyETG4oJBQeuFIs5AQEBAQYCASUggwqBG4FugikBgyiIOIIxBaB0lESERI46lmRXgQ0yIQgcFYVhHIF5LjaKHgEBAQ","Date":"Fri, 8 Sep 2017 09:25:43 +1000","From":"Dave Chinner <david@fromorbit.com>","To":"Ross Zwisler <ross.zwisler@linux.intel.com>","Cc":"Andreas Dilger <adilger@dilger.ca>,\n\tDan Williams <dan.j.williams@intel.com>,\n\tEric Sandeen <sandeen@redhat.com>, Lukas Czerner <lczerner@redhat.com>,\n\tAndrew Morton <akpm@linux-foundation.org>,\n\t\"linux-kernel@vger.kernel.org\" <linux-kernel@vger.kernel.org>,\n\t\"Darrick J. Wong\" <darrick.wong@oracle.com>,\n\tTheodore Ts'o <tytso@mit.edu>, Christoph Hellwig <hch@lst.de>,\n\tJan Kara <jack@suse.cz>, linux-ext4 <linux-ext4@vger.kernel.org>,\n\t\"linux-nvdimm@lists.01.org\" <linux-nvdimm@lists.01.org>,\n\txfs <linux-xfs@vger.kernel.org>","Subject":"Re: [PATCH 0/9] add ext4 per-inode DAX flag","Message-ID":"<20170907232543.GB17782@dastard>","References":"<20170905223541.20594-1-ross.zwisler@linux.intel.com>\n\t<f856f8fd-3697-adb1-198b-a9ae2fce405f@redhat.com>\n\t<20170906170754.GB17663@linux.intel.com>\n\t<CAPcyv4hfhDT9NFRXL+MT5epiqWHJ0RLraV4P3CZ4EJM6L-s0Nw@mail.gmail.com>\n\t<20170907211303.GA23212@linux.intel.com>\n\t<5F58D3F5-D93B-4648-AE01-8A46956FBB4B@dilger.ca>\n\t<20170907215148.GA12669@linux.intel.com>\n\t<20170907221201.GZ17782@dastard>\n\t<20170907221900.GB12669@linux.intel.com>","MIME-Version":"1.0","Content-Type":"text/plain; charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<20170907221900.GB12669@linux.intel.com>","User-Agent":"Mutt/1.5.21 (2010-09-15)","Sender":"linux-ext4-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<linux-ext4.vger.kernel.org>","X-Mailing-List":"linux-ext4@vger.kernel.org"}},{"id":1765213,"web_url":"http://patchwork.ozlabs.org/comment/1765213/","msgid":"<20170908094852.GA30332@quack2.suse.cz>","list_archive_url":null,"date":"2017-09-08T09:48:52","subject":"Re: [PATCH 0/9] add ext4 per-inode DAX flag","submitter":{"id":363,"url":"http://patchwork.ozlabs.org/api/people/363/","name":"Jan Kara","email":"jack@suse.cz"},"content":"On Fri 08-09-17 09:25:43, Dave Chinner wrote:\n> On Thu, Sep 07, 2017 at 04:19:00PM -0600, Ross Zwisler wrote:\n> > On Fri, Sep 08, 2017 at 08:12:01AM +1000, Dave Chinner wrote:\n> > > On Thu, Sep 07, 2017 at 03:51:48PM -0600, Ross Zwisler wrote:\n> > > > On Thu, Sep 07, 2017 at 03:26:10PM -0600, Andreas Dilger wrote:\n> > > > > However, I wonder if this could\n> > > > > be prevented at runtime, and only allow S_DAX to be set when the inode is\n> > > > > first instantiated, and wouldn't be allowed to change after that?  Setting\n> > > > > or clearing the per-inode DAX flag might still be allowed, but it wouldn't\n> > > > > be enabled until the inode is next fetched into cache?  Similarly, for\n> > > > > inodes that have conflicting features (e.g. inline data or encryption)\n> > > > > would not be allowed to enable S_DAX.\n> > > > \n> > > > Ooh, this seems interesting.  This would ensure that S_DAX transitions\n> > > > couldn't ever race with I/Os or mmaps().  I had some other ideas for how to\n> > > > handle this, but I think your idea is more promising. :)\n> > > \n> > > IMO, that's an awful admin interface - it can't be done on demand\n> > > (i.e. when needed) because we can't force an inode to be evicted\n> > > from the cache. And then we have the \"why the hell did that just\n> > > change\" problem if an inode is evicted due to memory pressure and\n> > > then immediately reinstantiated by the running workload. That's a\n> > > recipe for driving admins insane...\n> > > \n> > > > I guess with this solution we'd need:\n> > > > \n> > > > a) A good way of letting the user detect the state where they had set the DAX\n> > > > inode flag, but that it wasn't yet in use by the inode.\n> > > > \n> > > > b) A reliable way of flushing the inode from the filesystem cache, so that the\n> > > > next time an open() happens they get the new behavior.  The way I usually do\n> > > > this is via umount/remount, but there is probably already a way to do this?\n> > > \n> > > Not if it's referenced. And if it's not referenced, then the only\n> > > hammer we have is Brutus^Wdrop_caches. That's not an option for\n> > > production machines.\n> > > \n> > > Neat idea, but one I'd already thought of and discarded as \"not\n> > > practical from an admin perspective\".\n> > \n> > Okay, so other ideas (which you have also probably already though of) include:\n> > \n> > 1) Just return -EBUSY if anyone tries to change the DAX flag of an inode with\n> > open mappings or any open file handles.\n> \n> You have to have an open fd to change the flag. :)\n\nYeah, open file handles don't matter and we can serialize against IO in\nprogress, that's not a big deal. Established mappings are difficult to deal\nwith.\n\n> > To prevent TOCTOU races we'd have to\n> > do some additional locking while actually changing the flag.\n> \n> I think that make sense - the fundamental problem is that the\n> mappings are different between dax and non-dax, and that we can't\n> properly lock out page faults to to prevent sending a racing\n> page fault down the wrong path.\n> \n> > 2) Be more drastic and follow the flow of ext4 file based encryption, only\n> > allowing the inode flag to be set by an admin on an empty directory.  Files in\n> > that directory will inherit it when they are created, and we don't provide a\n> > way to clear.  If you want your file to not use DAX, move it to a different\n> > directory (which I think for ext4 encryption turns it into a new inode).\n> \n> Seems like the wrong model to me - moving application data files\n> is a PITA because you've also go to change the app config to point\n> at the new location...\n\nAgreed.\n\n> > Other ideas?\n> \n> IMO, we need to fix the page fault path so we don't look at inode\n> flags to determine processing behaviour during the fault. Fault\n> processing as DAX or non-dax needs to be determined by the page\n> fault code and communicated to the fs via the vmf as the contents\n> of the vmf for a dax fault can be invalid for a non-dax fault. Fixing\n> that problem (i.e. make DAX is a property of the mapping and\n> instantiate it from the inode only at mmap() time) means all the\n> page fault vs inode flag race problems go away and we have a model\n> that is much more robust if we want to expand it in future.\n\nIn fact, the real problem is only with .page_mkwrite and .pfn_mkwrite\ncallbacks. For those setup of 'vmf' differs. For .fault or .huge_fault the\nvmf is the same regardless whether we do DAX or non-DAX fault. But it seems\ndifficult to me to determine DAX / non-DAX fault in vmf since locks\nnecessary to stabilize S_DAX flag are acquired only in filesystem-specific\nhandlers (and the locks themselves are fs specific).\n\nSo the only way I see of dealing safely with these races is careful\nchecking in .page_mkwrite and .pfn_mkwrite after necessary locks are\nobtained and bail out doing nothing if state is inconsistent. VM will retry\nthe fault and we'll get to the correct handler next time.\n\nBut if we disallow any mappings when switching S_DAX flag, then all the\nabove is moot and there can be no races... We just have to be sure to block\nnew mappings of the file while switching the flag.\n\n\t\t\t\t\t\t\t\tHonza","headers":{"Return-Path":"<linux-ext4-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=linux-ext4-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xpXbD4rrxz9s81\n\tfor <patchwork-incoming@ozlabs.org>;\n\tFri,  8 Sep 2017 19:49:12 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1754177AbdIHJs6 (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tFri, 8 Sep 2017 05:48:58 -0400","from mx2.suse.de ([195.135.220.15]:45955 \"EHLO mx1.suse.de\"\n\trhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP\n\tid S1753835AbdIHJs4 (ORCPT <rfc822;linux-ext4@vger.kernel.org>);\n\tFri, 8 Sep 2017 05:48:56 -0400","from relay1.suse.de (charybdis-ext.suse.de [195.135.220.254])\n\tby mx1.suse.de (Postfix) with ESMTP id 9E771AD1E;\n\tFri,  8 Sep 2017 09:48:54 +0000 (UTC)","by quack2.suse.cz (Postfix, from userid 1000)\n\tid 8BCE01E1191; Fri,  8 Sep 2017 11:48:52 +0200 (CEST)"],"X-Virus-Scanned":"by amavisd-new at test-mx.suse.de","Date":"Fri, 8 Sep 2017 11:48:52 +0200","From":"Jan Kara <jack@suse.cz>","To":"Dave Chinner <david@fromorbit.com>","Cc":"Ross Zwisler <ross.zwisler@linux.intel.com>,\n\tAndreas Dilger <adilger@dilger.ca>,\n\tDan Williams <dan.j.williams@intel.com>,\n\tEric Sandeen <sandeen@redhat.com>, Lukas Czerner <lczerner@redhat.com>,\n\tAndrew Morton <akpm@linux-foundation.org>,\n\t\"linux-kernel@vger.kernel.org\" <linux-kernel@vger.kernel.org>,\n\t\"Darrick J. Wong\" <darrick.wong@oracle.com>,\n\tTheodore Ts'o <tytso@mit.edu>, Christoph Hellwig <hch@lst.de>,\n\tJan Kara <jack@suse.cz>, linux-ext4 <linux-ext4@vger.kernel.org>,\n\t\"linux-nvdimm@lists.01.org\" <linux-nvdimm@lists.01.org>,\n\txfs <linux-xfs@vger.kernel.org>","Subject":"Re: [PATCH 0/9] add ext4 per-inode DAX flag","Message-ID":"<20170908094852.GA30332@quack2.suse.cz>","References":"<20170905223541.20594-1-ross.zwisler@linux.intel.com>\n\t<f856f8fd-3697-adb1-198b-a9ae2fce405f@redhat.com>\n\t<20170906170754.GB17663@linux.intel.com>\n\t<CAPcyv4hfhDT9NFRXL+MT5epiqWHJ0RLraV4P3CZ4EJM6L-s0Nw@mail.gmail.com>\n\t<20170907211303.GA23212@linux.intel.com>\n\t<5F58D3F5-D93B-4648-AE01-8A46956FBB4B@dilger.ca>\n\t<20170907215148.GA12669@linux.intel.com>\n\t<20170907221201.GZ17782@dastard>\n\t<20170907221900.GB12669@linux.intel.com>\n\t<20170907232543.GB17782@dastard>","MIME-Version":"1.0","Content-Type":"text/plain; charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<20170907232543.GB17782@dastard>","User-Agent":"Mutt/1.5.24 (2015-08-30)","Sender":"linux-ext4-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<linux-ext4.vger.kernel.org>","X-Mailing-List":"linux-ext4@vger.kernel.org"}},{"id":1765465,"web_url":"http://patchwork.ozlabs.org/comment/1765465/","msgid":"<20170908153913.jjhzogjs5zpeea5v@thunk.org>","list_archive_url":null,"date":"2017-09-08T15:39:13","subject":"Re: [PATCH 0/9] add ext4 per-inode DAX flag","submitter":{"id":350,"url":"http://patchwork.ozlabs.org/api/people/350/","name":"Theodore Tso","email":"tytso@mit.edu"},"content":"On Fri, Sep 08, 2017 at 09:25:43AM +1000, Dave Chinner wrote:\n> > Okay, so other ideas (which you have also probably already though of) include:\n> > \n> > 1) Just return -EBUSY if anyone tries to change the DAX flag of an inode with\n> > open mappings or any open file handles.\n> \n> You have to have an open fd to change the flag. :)\n\nWhat if we only allow the S_DAX flag to be *set*, when i_size and\ni_blocks is zero?  We could also require that only one file descriptor\nbe open against the inode, and that it be opened O_RDONLY.\n\n\t\t\t\t\t\t- Ted","headers":{"Return-Path":"<linux-ext4-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":["ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=linux-ext4-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","ozlabs.org; dkim=pass (1024-bit key;\n\tunprotected) header.d=thunk.org header.i=@thunk.org\n\theader.b=\"oJxdThN3\"; dkim-atps=neutral"],"Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xphXW2Mftz9t2r\n\tfor <patchwork-incoming@ozlabs.org>;\n\tSat,  9 Sep 2017 01:47:23 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1756580AbdIHPjg (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tFri, 8 Sep 2017 11:39:36 -0400","from imap.thunk.org ([74.207.234.97]:42298 \"EHLO imap.thunk.org\"\n\trhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP\n\tid S1756510AbdIHPjd (ORCPT <rfc822;linux-ext4@vger.kernel.org>);\n\tFri, 8 Sep 2017 11:39:33 -0400","from root (helo=callcc.thunk.org)\n\tby imap.thunk.org with local-esmtp (Exim 4.84_2)\n\t(envelope-from <tytso@thunk.org>)\n\tid 1dqLNC-0003qE-8M; Fri, 08 Sep 2017 15:39:14 +0000","by callcc.thunk.org (Postfix, from userid 15806)\n\tid 9CF60C001C2; Fri,  8 Sep 2017 11:39:13 -0400 (EDT)"],"DKIM-Signature":"v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=thunk.org;\n\ts=ef5046eb; \n\th=In-Reply-To:Content-Type:MIME-Version:References:Message-ID:Subject:Cc:To:From:Date;\n\tbh=n4knO1Mm+qnvkHHC101hz5ceF/BgzYl9pmC18wFDEAU=; \n\tb=oJxdThN35wDsqiVVkCcumD3rzKqdoXp27JkwQaBKZ3B8nXdemC8flRHlR6GsP7tms6BEE2n3tIyFCIChV/xPovMcFxzKrF7zJB5+vBFgdb66nd6baUhjSIQCooRqyx2Rc1bIBlsekJM7jumFc/O1SKccaEaU34Xjfllg9UOKr6g=;","Date":"Fri, 8 Sep 2017 11:39:13 -0400","From":"Theodore Ts'o <tytso@mit.edu>","To":"Dave Chinner <david@fromorbit.com>","Cc":"Ross Zwisler <ross.zwisler@linux.intel.com>,\n\tAndreas Dilger <adilger@dilger.ca>,\n\tDan Williams <dan.j.williams@intel.com>,\n\tEric Sandeen <sandeen@redhat.com>, Lukas Czerner <lczerner@redhat.com>,\n\tAndrew Morton <akpm@linux-foundation.org>,\n\t\"linux-kernel@vger.kernel.org\" <linux-kernel@vger.kernel.org>,\n\t\"Darrick J. Wong\" <darrick.wong@oracle.com>,\n\tChristoph Hellwig <hch@lst.de>, Jan Kara <jack@suse.cz>,\n\tlinux-ext4 <linux-ext4@vger.kernel.org>,\n\t\"linux-nvdimm@lists.01.org\" <linux-nvdimm@lists.01.org>,\n\txfs <linux-xfs@vger.kernel.org>","Subject":"Re: [PATCH 0/9] add ext4 per-inode DAX flag","Message-ID":"<20170908153913.jjhzogjs5zpeea5v@thunk.org>","Mail-Followup-To":"Theodore Ts'o <tytso@mit.edu>,\n\tDave Chinner <david@fromorbit.com>,\n\tRoss Zwisler <ross.zwisler@linux.intel.com>,\n\tAndreas Dilger <adilger@dilger.ca>,\n\tDan Williams <dan.j.williams@intel.com>,\n\tEric Sandeen <sandeen@redhat.com>,\n\tLukas Czerner <lczerner@redhat.com>,\n\tAndrew Morton <akpm@linux-foundation.org>,\n\t\"linux-kernel@vger.kernel.org\" <linux-kernel@vger.kernel.org>,\n\t\"Darrick J. Wong\" <darrick.wong@oracle.com>,\n\tChristoph Hellwig <hch@lst.de>, Jan Kara <jack@suse.cz>,\n\tlinux-ext4 <linux-ext4@vger.kernel.org>,\n\t\"linux-nvdimm@lists.01.org\" <linux-nvdimm@lists.01.org>,\n\txfs <linux-xfs@vger.kernel.org>","References":"<20170905223541.20594-1-ross.zwisler@linux.intel.com>\n\t<f856f8fd-3697-adb1-198b-a9ae2fce405f@redhat.com>\n\t<20170906170754.GB17663@linux.intel.com>\n\t<CAPcyv4hfhDT9NFRXL+MT5epiqWHJ0RLraV4P3CZ4EJM6L-s0Nw@mail.gmail.com>\n\t<20170907211303.GA23212@linux.intel.com>\n\t<5F58D3F5-D93B-4648-AE01-8A46956FBB4B@dilger.ca>\n\t<20170907215148.GA12669@linux.intel.com>\n\t<20170907221201.GZ17782@dastard>\n\t<20170907221900.GB12669@linux.intel.com>\n\t<20170907232543.GB17782@dastard>","MIME-Version":"1.0","Content-Type":"text/plain; charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<20170907232543.GB17782@dastard>","User-Agent":"NeoMutt/20170609 (1.8.3)","X-SA-Exim-Connect-IP":"<locally generated>","X-SA-Exim-Mail-From":"tytso@thunk.org","X-SA-Exim-Scanned":"No (on imap.thunk.org); SAEximRunCond expanded to false","Sender":"linux-ext4-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<linux-ext4.vger.kernel.org>","X-Mailing-List":"linux-ext4@vger.kernel.org"}},{"id":1766132,"web_url":"http://patchwork.ozlabs.org/comment/1766132/","msgid":"<20170911084733.GB7120@quack2.suse.cz>","list_archive_url":null,"date":"2017-09-11T08:47:33","subject":"Re: [PATCH 0/9] add ext4 per-inode DAX flag","submitter":{"id":363,"url":"http://patchwork.ozlabs.org/api/people/363/","name":"Jan Kara","email":"jack@suse.cz"},"content":"On Fri 08-09-17 11:39:13, Ted Tso wrote:\n> On Fri, Sep 08, 2017 at 09:25:43AM +1000, Dave Chinner wrote:\n> > > Okay, so other ideas (which you have also probably already though of) include:\n> > > \n> > > 1) Just return -EBUSY if anyone tries to change the DAX flag of an inode with\n> > > open mappings or any open file handles.\n> > \n> > You have to have an open fd to change the flag. :)\n> \n> What if we only allow the S_DAX flag to be *set*, when i_size and\n> i_blocks is zero?  We could also require that only one file descriptor\n> be open against the inode, and that it be opened O_RDONLY.\n\nWe could do something like that but IMHO it will be a pain to use (e.g.\nthink how difficult it would be to switch your existing database to use DAX\nfor data files). We can make transition reliable whenever\ninode->i_mapping->i_mmap RB tree is empty (effectively: whenever the file\nis not mmaped). And that should be relaxed enough for most usecases... But\nI agree that it will be somewhat tricky to prevent creation of new mappings\nwhile we are switching S_DAX flag so it needs more though.\n\n\t\t\t\t\t\t\t\tHonza","headers":{"Return-Path":"<linux-ext4-owner@vger.kernel.org>","X-Original-To":"patchwork-incoming@ozlabs.org","Delivered-To":"patchwork-incoming@ozlabs.org","Authentication-Results":"ozlabs.org;\n\tspf=none (mailfrom) smtp.mailfrom=vger.kernel.org\n\t(client-ip=209.132.180.67; helo=vger.kernel.org;\n\tenvelope-from=linux-ext4-owner@vger.kernel.org;\n\treceiver=<UNKNOWN>)","Received":["from vger.kernel.org (vger.kernel.org [209.132.180.67])\n\tby ozlabs.org (Postfix) with ESMTP id 3xrM565QQTz9s7g\n\tfor <patchwork-incoming@ozlabs.org>;\n\tMon, 11 Sep 2017 18:47:54 +1000 (AEST)","(majordomo@vger.kernel.org) by vger.kernel.org via listexpand\n\tid S1751087AbdIKIrk (ORCPT <rfc822;patchwork-incoming@ozlabs.org>);\n\tMon, 11 Sep 2017 04:47:40 -0400","from mx2.suse.de ([195.135.220.15]:51033 \"EHLO mx1.suse.de\"\n\trhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP\n\tid S1751065AbdIKIrj (ORCPT <rfc822;linux-ext4@vger.kernel.org>);\n\tMon, 11 Sep 2017 04:47:39 -0400","from relay2.suse.de (charybdis-ext.suse.de [195.135.220.254])\n\tby mx1.suse.de (Postfix) with ESMTP id EE60BAC0B;\n\tMon, 11 Sep 2017 08:47:36 +0000 (UTC)","by quack2.suse.cz (Postfix, from userid 1000)\n\tid D8E7F1E0D83; Mon, 11 Sep 2017 10:47:33 +0200 (CEST)"],"X-Virus-Scanned":"by amavisd-new at test-mx.suse.de","Date":"Mon, 11 Sep 2017 10:47:33 +0200","From":"Jan Kara <jack@suse.cz>","To":"Theodore Ts'o <tytso@mit.edu>","Cc":"Dave Chinner <david@fromorbit.com>,\n\tRoss Zwisler <ross.zwisler@linux.intel.com>,\n\tAndreas Dilger <adilger@dilger.ca>,\n\tDan Williams <dan.j.williams@intel.com>,\n\tEric Sandeen <sandeen@redhat.com>, Lukas Czerner <lczerner@redhat.com>,\n\tAndrew Morton <akpm@linux-foundation.org>,\n\t\"linux-kernel@vger.kernel.org\" <linux-kernel@vger.kernel.org>,\n\t\"Darrick J. Wong\" <darrick.wong@oracle.com>,\n\tChristoph Hellwig <hch@lst.de>, Jan Kara <jack@suse.cz>,\n\tlinux-ext4 <linux-ext4@vger.kernel.org>,\n\t\"linux-nvdimm@lists.01.org\" <linux-nvdimm@lists.01.org>,\n\txfs <linux-xfs@vger.kernel.org>","Subject":"Re: [PATCH 0/9] add ext4 per-inode DAX flag","Message-ID":"<20170911084733.GB7120@quack2.suse.cz>","References":"<f856f8fd-3697-adb1-198b-a9ae2fce405f@redhat.com>\n\t<20170906170754.GB17663@linux.intel.com>\n\t<CAPcyv4hfhDT9NFRXL+MT5epiqWHJ0RLraV4P3CZ4EJM6L-s0Nw@mail.gmail.com>\n\t<20170907211303.GA23212@linux.intel.com>\n\t<5F58D3F5-D93B-4648-AE01-8A46956FBB4B@dilger.ca>\n\t<20170907215148.GA12669@linux.intel.com>\n\t<20170907221201.GZ17782@dastard>\n\t<20170907221900.GB12669@linux.intel.com>\n\t<20170907232543.GB17782@dastard>\n\t<20170908153913.jjhzogjs5zpeea5v@thunk.org>","MIME-Version":"1.0","Content-Type":"text/plain; charset=us-ascii","Content-Disposition":"inline","In-Reply-To":"<20170908153913.jjhzogjs5zpeea5v@thunk.org>","User-Agent":"Mutt/1.5.24 (2015-08-30)","Sender":"linux-ext4-owner@vger.kernel.org","Precedence":"bulk","List-ID":"<linux-ext4.vger.kernel.org>","X-Mailing-List":"linux-ext4@vger.kernel.org"}}]