diff mbox series

[SRU,UNSTABLE/FOCAL/BIONIC,v2] UBUNTU: SAUCE: shiftfs: fix dentry revalidation

Message ID 20200518181802.3406919-1-christian.brauner@ubuntu.com
State New
Headers show
Series [SRU,UNSTABLE/FOCAL/BIONIC,v2] UBUNTU: SAUCE: shiftfs: fix dentry revalidation | expand

Commit Message

Christian Brauner May 18, 2020, 6:18 p.m. UTC
BugLink: https://bugs.launchpad.net/bugs/1879196

While fixing [1] we introduced a regression and triggered [2] whereby
the underlay and overlay go out of sync because they disagree about the
dcache. Such situation easily arise when the underlay is modified
directly. This means the overlay needs to revalidate the dentry in the
dcache to catchup with the underlay.

Note, in general it's not advisable to directly modify the underlay
while a shiftfs mount is on top. In some way this means we need to keep
two caches in sync and it's hard enough to keep a single cache happy.
But shiftfs' use-case is inherently prone to be used for exactly that.
So this is something we have to navigate carefully and honestly we have
no full model upstream that does the same. Overlayfs has the copy-up
behavior which let's it get around most of the issues but we don't have
it and ecryptfs is broken in such scenarios which we verified quite a
while back.
In any case, I built a kernel with this patch and re-ran all regressions
that are related to this that we have so far (cf.  [1], [2], and [3]).
None of them were reproducible with this patch here. So we still fix the
ESTALE issue but also keep underlay and overlay in sync.

[1]: https://bugs.launchpad.net/bugs/1872757
[2]: https://bugs.launchpad.net/bugs/1879196
[3]: https://bugs.launchpad.net/bugs/1860041
Cc: Seth Forshee <seth.forshee@canonical.com>
Cc: James Troup <james.troup@canonical.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
---
 fs/shiftfs.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)


base-commit: 9dd531622eff2baf657b906293745ca6dab9bb91

Comments

Stefan Bader May 19, 2020, 5:43 a.m. UTC | #1
On 18.05.20 20:18, Christian Brauner wrote:
> BugLink: https://bugs.launchpad.net/bugs/1879196
> 
> While fixing [1] we introduced a regression and triggered [2] whereby
> the underlay and overlay go out of sync because they disagree about the
> dcache. Such situation easily arise when the underlay is modified
> directly. This means the overlay needs to revalidate the dentry in the
> dcache to catchup with the underlay.
> 
> Note, in general it's not advisable to directly modify the underlay
> while a shiftfs mount is on top. In some way this means we need to keep
> two caches in sync and it's hard enough to keep a single cache happy.
> But shiftfs' use-case is inherently prone to be used for exactly that.
> So this is something we have to navigate carefully and honestly we have
> no full model upstream that does the same. Overlayfs has the copy-up
> behavior which let's it get around most of the issues but we don't have
> it and ecryptfs is broken in such scenarios which we verified quite a
> while back.
> In any case, I built a kernel with this patch and re-ran all regressions
> that are related to this that we have so far (cf.  [1], [2], and [3]).
> None of them were reproducible with this patch here. So we still fix the
> ESTALE issue but also keep underlay and overlay in sync.

This test kernel should be provided to the bug reporter, so it can be confirmed
to fix the issue there as well.
But more importantly, you submitted this for Bionic and there is no shiftfs in
that release. Only Eoan. I am not convinced you tried at least to apply the
patch to all the releases you submit for. There is always chances that code had
to be adjusted to be backported. Or that it applies but no longer compiles
because added function calls do not exist there.

-Stefan
> 
> [1]: https://bugs.launchpad.net/bugs/1872757
> [2]: https://bugs.launchpad.net/bugs/1879196
> [3]: https://bugs.launchpad.net/bugs/1860041
> Cc: Seth Forshee <seth.forshee@canonical.com>
> Cc: James Troup <james.troup@canonical.com>
> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
> ---
>  fs/shiftfs.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/shiftfs.c b/fs/shiftfs.c
> index 39b878a6f820..652fb67b0e0b 100644
> --- a/fs/shiftfs.c
> +++ b/fs/shiftfs.c
> @@ -240,6 +240,9 @@ static int shiftfs_d_weak_revalidate(struct dentry *dentry, unsigned int flags)
>  	int err = 1;
>  	struct dentry *lowerd = dentry->d_fsdata;
>  
> +	if (d_is_negative(lowerd) != d_is_negative(dentry))
> +		return 0;
> +
>  	if (lowerd->d_flags & DCACHE_OP_WEAK_REVALIDATE) {
>  		err = lowerd->d_op->d_weak_revalidate(lowerd, flags);
>  		if (err < 0)
> @@ -264,13 +267,16 @@ static int shiftfs_d_revalidate(struct dentry *dentry, unsigned int flags)
>  	if (flags & LOOKUP_RCU)
>  		return -ECHILD;
>  
> +	if (d_unhashed(lowerd) ||
> +	    d_is_negative(lowerd) != d_is_negative(dentry))
> +		return 0;
> +
>  	if (lowerd->d_flags & DCACHE_OP_REVALIDATE) {
>  		err = lowerd->d_op->d_revalidate(lowerd, flags);
>  		if (err < 0)
>  			return err;
>  		if (!err) {
> -			if (!(flags & LOOKUP_RCU))
> -				d_invalidate(lowerd);
> +			d_invalidate(lowerd);
>  			return -ESTALE;
>  		}
>  	}
> 
> base-commit: 9dd531622eff2baf657b906293745ca6dab9bb91
>
Christian Brauner May 19, 2020, 6:02 a.m. UTC | #2
On Tue, May 19, 2020 at 07:43:21AM +0200, Stefan Bader wrote:
> On 18.05.20 20:18, Christian Brauner wrote:
> > BugLink: https://bugs.launchpad.net/bugs/1879196
> > 
> > While fixing [1] we introduced a regression and triggered [2] whereby
> > the underlay and overlay go out of sync because they disagree about the
> > dcache. Such situation easily arise when the underlay is modified
> > directly. This means the overlay needs to revalidate the dentry in the
> > dcache to catchup with the underlay.
> > 
> > Note, in general it's not advisable to directly modify the underlay
> > while a shiftfs mount is on top. In some way this means we need to keep
> > two caches in sync and it's hard enough to keep a single cache happy.
> > But shiftfs' use-case is inherently prone to be used for exactly that.
> > So this is something we have to navigate carefully and honestly we have
> > no full model upstream that does the same. Overlayfs has the copy-up
> > behavior which let's it get around most of the issues but we don't have
> > it and ecryptfs is broken in such scenarios which we verified quite a
> > while back.
> > In any case, I built a kernel with this patch and re-ran all regressions
> > that are related to this that we have so far (cf.  [1], [2], and [3]).
> > None of them were reproducible with this patch here. So we still fix the
> > ESTALE issue but also keep underlay and overlay in sync.
> 
> This test kernel should be provided to the bug reporter, so it can be confirmed
> to fix the issue there as well.

See
https://drive.google.com/open?id=19iTwaFSYNS95_I-gD_rvFoV9cMAfy6io
Elmo is also Cced here. I don't have any other place I can upload this,
I think.

> But more importantly, you submitted this for Bionic and there is no shiftfs in
> that release. Only Eoan. I am not convinced you tried at least to apply the

Yeah, sorry that's always my thing mixing up that shiftfs is in Eoan and
not in Bionic.
Similar to all the other patches, Eoan doesn't need a separate patch
I've verified this on top of Eoan master-next that has all the other
shiftfs patches:

ubuntu@amdgpu:~/linux/ubuntu/ubuntu-eoan$ git am ./0001-UBUNTU-SAUCE-shiftfs-fix-dentry-revalidation.patch
Applying: UBUNTU: SAUCE: shiftfs: fix dentry revalidation

on top of

ubuntu@amdgpu:~/linux/ubuntu/ubuntu-eoan$ git am ./0001-UBUNTU-SAUCE-shiftfs-fix-dentry-revalidation.patch
Applying: UBUNTU: SAUCE: shiftfs: fix dentry revalidation
commit d53d4b6679bb89ae78787dca0fea951752e27d48 (upstream/master-next)
Author: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Date:   Fri May 15 13:03:42 2020 +0200

    UBUNTU: Ubuntu-5.3.0-54.48

    Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>

I'm not letting shiftfs across kernel versions go out of sync precisely
(at least until we absolutely have to) so that patches can be applied
trivially across releases so it's easier for all of you.

Thanks!
Christian
Christian Brauner May 19, 2020, 6:54 a.m. UTC | #3
On Tue, May 19, 2020 at 08:02:25AM +0200, Christian Brauner wrote:
> On Tue, May 19, 2020 at 07:43:21AM +0200, Stefan Bader wrote:
> > On 18.05.20 20:18, Christian Brauner wrote:
> > > BugLink: https://bugs.launchpad.net/bugs/1879196
> > > 
> > > While fixing [1] we introduced a regression and triggered [2] whereby
> > > the underlay and overlay go out of sync because they disagree about the
> > > dcache. Such situation easily arise when the underlay is modified
> > > directly. This means the overlay needs to revalidate the dentry in the
> > > dcache to catchup with the underlay.
> > > 
> > > Note, in general it's not advisable to directly modify the underlay
> > > while a shiftfs mount is on top. In some way this means we need to keep
> > > two caches in sync and it's hard enough to keep a single cache happy.
> > > But shiftfs' use-case is inherently prone to be used for exactly that.
> > > So this is something we have to navigate carefully and honestly we have
> > > no full model upstream that does the same. Overlayfs has the copy-up
> > > behavior which let's it get around most of the issues but we don't have
> > > it and ecryptfs is broken in such scenarios which we verified quite a
> > > while back.
> > > In any case, I built a kernel with this patch and re-ran all regressions
> > > that are related to this that we have so far (cf.  [1], [2], and [3]).
> > > None of them were reproducible with this patch here. So we still fix the
> > > ESTALE issue but also keep underlay and overlay in sync.
> > 
> > This test kernel should be provided to the bug reporter, so it can be confirmed
> > to fix the issue there as well.
> 
> See
> https://drive.google.com/open?id=19iTwaFSYNS95_I-gD_rvFoV9cMAfy6io
> Elmo is also Cced here. I don't have any other place I can upload this,
> I think.

Just so I don't get railed for this later in case there's an issue.
Here's an asciinema recording of all passing tests:

https://asciinema.org/a/331649

> 
> > But more importantly, you submitted this for Bionic and there is no shiftfs in
> > that release. Only Eoan. I am not convinced you tried at least to apply the
> 
> Yeah, sorry that's always my thing mixing up that shiftfs is in Eoan and
> not in Bionic.
> Similar to all the other patches, Eoan doesn't need a separate patch
> I've verified this on top of Eoan master-next that has all the other
> shiftfs patches:
> 
> ubuntu@amdgpu:~/linux/ubuntu/ubuntu-eoan$ git am ./0001-UBUNTU-SAUCE-shiftfs-fix-dentry-revalidation.patch
> Applying: UBUNTU: SAUCE: shiftfs: fix dentry revalidation
> 
> on top of
> 
> ubuntu@amdgpu:~/linux/ubuntu/ubuntu-eoan$ git am ./0001-UBUNTU-SAUCE-shiftfs-fix-dentry-revalidation.patch
> Applying: UBUNTU: SAUCE: shiftfs: fix dentry revalidation
> commit d53d4b6679bb89ae78787dca0fea951752e27d48 (upstream/master-next)
> Author: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
> Date:   Fri May 15 13:03:42 2020 +0200
> 
>     UBUNTU: Ubuntu-5.3.0-54.48
> 
>     Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
> 
> I'm not letting shiftfs across kernel versions go out of sync precisely
> (at least until we absolutely have to) so that patches can be applied
> trivially across releases so it's easier for all of you.

Do you still want me to resend with s/BIONIC/EOAN/ in there?

Thanks!
Christian
Stefan Bader May 19, 2020, 8:52 a.m. UTC | #4
On 19.05.20 08:02, Christian Brauner wrote:
> On Tue, May 19, 2020 at 07:43:21AM +0200, Stefan Bader wrote:
>> On 18.05.20 20:18, Christian Brauner wrote:
>>> BugLink: https://bugs.launchpad.net/bugs/1879196
>>>
>>> While fixing [1] we introduced a regression and triggered [2] whereby
>>> the underlay and overlay go out of sync because they disagree about the
>>> dcache. Such situation easily arise when the underlay is modified
>>> directly. This means the overlay needs to revalidate the dentry in the
>>> dcache to catchup with the underlay.
>>>
>>> Note, in general it's not advisable to directly modify the underlay
>>> while a shiftfs mount is on top. In some way this means we need to keep
>>> two caches in sync and it's hard enough to keep a single cache happy.
>>> But shiftfs' use-case is inherently prone to be used for exactly that.
>>> So this is something we have to navigate carefully and honestly we have
>>> no full model upstream that does the same. Overlayfs has the copy-up
>>> behavior which let's it get around most of the issues but we don't have
>>> it and ecryptfs is broken in such scenarios which we verified quite a
>>> while back.
>>> In any case, I built a kernel with this patch and re-ran all regressions
>>> that are related to this that we have so far (cf.  [1], [2], and [3]).
>>> None of them were reproducible with this patch here. So we still fix the
>>> ESTALE issue but also keep underlay and overlay in sync.
>>
>> This test kernel should be provided to the bug reporter, so it can be confirmed
>> to fix the issue there as well.
> 
> See
> https://drive.google.com/open?id=19iTwaFSYNS95_I-gD_rvFoV9cMAfy6io
> Elmo is also Cced here. I don't have any other place I can upload this,
> I think.

Since it is James, you could use private-fileshare.c.c, otherwise people.c.c for
providing test kernels. The reason I would like to see the reporter (did not
look close enough to see its James initially, but even better) confirm the fix
is that while you did regression test against the known issues, this might
introduce something new. And there some unknown use pattern not directly aimed
at something adds value.

> 
>> But more importantly, you submitted this for Bionic and there is no shiftfs in
>> that release. Only Eoan. I am not convinced you tried at least to apply the
> 
> Yeah, sorry that's always my thing mixing up that shiftfs is in Eoan and
> not in Bionic.
> Similar to all the other patches, Eoan doesn't need a separate patch
> I've verified this on top of Eoan master-next that has all the other
> shiftfs patches:

Problem there is that this causes confusion as the bug report has no nominations
to cross-check either. And the SRU justification should go into the description
there. Also one gets a bit doubtful of how much care was taken with the fix if
the submission looks like it was hurried out.

> 
> ubuntu@amdgpu:~/linux/ubuntu/ubuntu-eoan$ git am ./0001-UBUNTU-SAUCE-shiftfs-fix-dentry-revalidation.patch
> Applying: UBUNTU: SAUCE: shiftfs: fix dentry revalidation
> 
> on top of
> 
> ubuntu@amdgpu:~/linux/ubuntu/ubuntu-eoan$ git am ./0001-UBUNTU-SAUCE-shiftfs-fix-dentry-revalidation.patch
> Applying: UBUNTU: SAUCE: shiftfs: fix dentry revalidation
> commit d53d4b6679bb89ae78787dca0fea951752e27d48 (upstream/master-next)
> Author: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
> Date:   Fri May 15 13:03:42 2020 +0200
> 
>     UBUNTU: Ubuntu-5.3.0-54.48
> 
>     Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
> 
> I'm not letting shiftfs across kernel versions go out of sync precisely
> (at least until we absolutely have to) so that patches can be applied
> trivially across releases so it's easier for all of you.

I trust that this is true for shiftfs.c but you might be using kernel functions
outside your driver domain which do not exist in older kernels.

I would say once we get additional test results by James and you refreshed the
bug report (I added nominations for focal and eoan, the importance I set to
medium as I understood you as this might be an uncommon use-case which triggers
it), then re-send your v2 for unstable,focal, and eoan.

-Stefan
> 
> Thanks!
> Christian
>
Christian Brauner May 19, 2020, 9:15 a.m. UTC | #5
On Tue, May 19, 2020 at 10:52:10AM +0200, Stefan Bader wrote:
> On 19.05.20 08:02, Christian Brauner wrote:
> > On Tue, May 19, 2020 at 07:43:21AM +0200, Stefan Bader wrote:
> >> On 18.05.20 20:18, Christian Brauner wrote:
> >>> BugLink: https://bugs.launchpad.net/bugs/1879196
> >>>
> >>> While fixing [1] we introduced a regression and triggered [2] whereby
> >>> the underlay and overlay go out of sync because they disagree about the
> >>> dcache. Such situation easily arise when the underlay is modified
> >>> directly. This means the overlay needs to revalidate the dentry in the
> >>> dcache to catchup with the underlay.
> >>>
> >>> Note, in general it's not advisable to directly modify the underlay
> >>> while a shiftfs mount is on top. In some way this means we need to keep
> >>> two caches in sync and it's hard enough to keep a single cache happy.
> >>> But shiftfs' use-case is inherently prone to be used for exactly that.
> >>> So this is something we have to navigate carefully and honestly we have
> >>> no full model upstream that does the same. Overlayfs has the copy-up
> >>> behavior which let's it get around most of the issues but we don't have
> >>> it and ecryptfs is broken in such scenarios which we verified quite a
> >>> while back.
> >>> In any case, I built a kernel with this patch and re-ran all regressions
> >>> that are related to this that we have so far (cf.  [1], [2], and [3]).
> >>> None of them were reproducible with this patch here. So we still fix the
> >>> ESTALE issue but also keep underlay and overlay in sync.
> >>
> >> This test kernel should be provided to the bug reporter, so it can be confirmed
> >> to fix the issue there as well.
> > 
> > See
> > https://drive.google.com/open?id=19iTwaFSYNS95_I-gD_rvFoV9cMAfy6io
> > Elmo is also Cced here. I don't have any other place I can upload this,
> > I think.
> 
> Since it is James, you could use private-fileshare.c.c, otherwise people.c.c for
> providing test kernels. The reason I would like to see the reporter (did not
> look close enough to see its James initially, but even better) confirm the fix
> is that while you did regression test against the known issues, this might
> introduce something new. And there some unknown use pattern not directly aimed
> at something adds value.

It might be worth gathering all the regression tests I've posted into
the launchpad bugs into an out of Ubuntu-kernel-tree test-suite
somewhere but I frankly lack the time to do that at the moment. But I'll
keep it in mind.

The LXD test-suite is pretty good at detecting these regressions
usually but all bugs here are use-cases that are very rare. The first
bug was re-opening a deleted file through /proc/<pid>/fd/<nr>, the
second was unlinking a device file and then stat()ing the fd. In both cases
they take specific kernel paths. For the re-opening case it's the
LOOKUP_JUMPED path that hits d_weak_revalidate and for device files the
pinning works differently in the vfs as it does go through the device
layer (in convoluted ways).
We had Elmo's issue originally in one of the first test version of
shiftfs so we introduced agressive revalidation which then led to
scenarios where we caused the vfs to think a file had gone ESTALE. This
specifically was true for temporary files. That ESTALE is what prevented
re-openeing through /proc/<pid>/fd/<nr>.
The original change back from this agressive style or revalidation was
made under the assumption that we can essentially simply trust the
underlay's cache but that presupposes that the underlay isn't modified
directly but that squares with directory sharing.

So the new version is aggressively revalidating but handles the
d_weak_revalidate case which is hit for LOOKUP_JUMPED (and thus for
going through proc magic links) by not invalidating a dentry simply
because it's i_nlink count is 0. Where i_nlink count == 0 means that it
was deleted while we still held it open which is a fine thing to do and
backed by posix fs semantics of course. The problem with the original
version was - if my analysis is correct - that d_weak_revalidate() for
shiftfs returned 0 when i_nlink was 0. But that obviously doesn't matter
since someone can still hold a reference to the inode.

Christian
James Troup May 25, 2020, 2:33 p.m. UTC | #6
Christian Brauner <christian.brauner@ubuntu.com> writes:

> See
> https://drive.google.com/open?id=19iTwaFSYNS95_I-gD_rvFoV9cMAfy6io
> Elmo is also Cced here. I don't have any other place I can upload this,
> I think.

Sorry for not replying earlier; the only place I can reproduce this is
my laptop and I didn't have the space last week to reboot it.  I did
however finally try over the weekend.

When I boot into this kernel, I get a degraded (low-res, laptop screen
only) X session and lxd doesn't work:

$ lxc list
Error: Get "http://unix.socket/1.0": dial unix /var/snap/lxd/common/lxd/unix.socket: connect: connection refused
$

The X session looks like the lack of i915 module - I don't see a
linux-modules-extra package?

Not sure what module/config is missing that is causing lxd to not
work....
Christian Brauner May 25, 2020, 2:37 p.m. UTC | #7
On Mon, May 25, 2020 at 03:33:03PM +0100, James Troup wrote:
> Christian Brauner <christian.brauner@ubuntu.com> writes:
> 
> > See
> > https://drive.google.com/open?id=19iTwaFSYNS95_I-gD_rvFoV9cMAfy6io
> > Elmo is also Cced here. I don't have any other place I can upload this,
> > I think.
> 
> Sorry for not replying earlier; the only place I can reproduce this is
> my laptop and I didn't have the space last week to reboot it.  I did
> however finally try over the weekend.
> 
> When I boot into this kernel, I get a degraded (low-res, laptop screen
> only) X session and lxd doesn't work:
> 
> $ lxc list
> Error: Get "http://unix.socket/1.0": dial unix /var/snap/lxd/common/lxd/unix.socket: connect: connection refused
> $
> 
> The X session looks like the lack of i915 module - I don't see a
> linux-modules-extra package?

Oh yeah, I don't use anything like that on my test machines, sorry.

> 
> Not sure what module/config is missing that is causing lxd to not
> work....

It's the zfs dkms. I'm not sure it can even be built with unstable but
probably not. I'm building you a focal kernel with that patch right now
and which should also include all the bits and pieces.

Should take some time though. The download from launchpad alone.

Christian
Christian Brauner May 25, 2020, 3:35 p.m. UTC | #8
On Mon, May 25, 2020 at 04:37:40PM +0200, Christian Brauner wrote:
> On Mon, May 25, 2020 at 03:33:03PM +0100, James Troup wrote:
> > Christian Brauner <christian.brauner@ubuntu.com> writes:
> > 
> > > See
> > > https://drive.google.com/open?id=19iTwaFSYNS95_I-gD_rvFoV9cMAfy6io
> > > Elmo is also Cced here. I don't have any other place I can upload this,
> > > I think.
> > 
> > Sorry for not replying earlier; the only place I can reproduce this is
> > my laptop and I didn't have the space last week to reboot it.  I did
> > however finally try over the weekend.
> > 
> > When I boot into this kernel, I get a degraded (low-res, laptop screen
> > only) X session and lxd doesn't work:
> > 
> > $ lxc list
> > Error: Get "http://unix.socket/1.0": dial unix /var/snap/lxd/common/lxd/unix.socket: connect: connection refused
> > $
> > 
> > The X session looks like the lack of i915 module - I don't see a
> > linux-modules-extra package?
> 
> Oh yeah, I don't use anything like that on my test machines, sorry.
> 
> > 
> > Not sure what module/config is missing that is causing lxd to not
> > work....
> 
> It's the zfs dkms. I'm not sure it can even be built with unstable but
> probably not. I'm building you a focal kernel with that patch right now
> and which should also include all the bits and pieces.
> 
> Should take some time though. The download from launchpad alone.

Ok, here's a focal kernel master-next. It should include zfs and
hopefully i915 too:
https://drive.google.com/drive/u/1/folders/19iTwaFSYNS95_I-gD_rvFoV9cMAfy6io

Christian
James Troup May 27, 2020, 6:41 p.m. UTC | #9
Christian Brauner <christian.brauner@ubuntu.com> writes:

> Ok, here's a focal kernel master-next. It should include zfs and
> hopefully i915 too:
> https://drive.google.com/drive/u/1/folders/19iTwaFSYNS95_I-gD_rvFoV9cMAfy6io

OK, that's much better, thanks.  I'm booted into it now and
unfortunately it doesn't solve the original problem...

After editing a file in emacs on the host:

🙂 james@malefic:~/projects/tasta$ md5sum xxx.foo
5d56b88fddb21d4134eca0294dd76c79  xxx.foo
🙂 james@malefic:~/projects/tasta$

And in the lxc with the shifted FS:

ubuntu@portal:~/tasta$ md5sum xxx.foo
14758f1afd44c09b7992073ccf00b43d  xxx.foo
ubuntu@portal:~/tasta$

🙂 james@malefic:~/projects/tasta$ uname -a
Linux malefic 5.4.0-33-generic #37 SMP Mon May 25 14:40:20 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
🙂 james@malefic:~/projects/tasta$
Christian Brauner May 27, 2020, 7:33 p.m. UTC | #10
On Wed, May 27, 2020 at 07:41:39PM +0100, James Troup wrote:
> Christian Brauner <christian.brauner@ubuntu.com> writes:
> 
> > Ok, here's a focal kernel master-next. It should include zfs and
> > hopefully i915 too:
> > https://drive.google.com/drive/u/1/folders/19iTwaFSYNS95_I-gD_rvFoV9cMAfy6io
> 
> OK, that's much better, thanks.  I'm booted into it now and
> unfortunately it doesn't solve the original problem...
> 
> After editing a file in emacs on the host:
> 
> 🙂 james@malefic:~/projects/tasta$ md5sum xxx.foo
> 5d56b88fddb21d4134eca0294dd76c79  xxx.foo
> 🙂 james@malefic:~/projects/tasta$
> 
> And in the lxc with the shifted FS:
> 
> ubuntu@portal:~/tasta$ md5sum xxx.foo
> 14758f1afd44c09b7992073ccf00b43d  xxx.foo
> ubuntu@portal:~/tasta$
> 
> 🙂 james@malefic:~/projects/tasta$ uname -a
> Linux malefic 5.4.0-33-generic #37 SMP Mon May 25 14:40:20 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
> 🙂 james@malefic:~/projects/tasta$ 

I'm not sure what's happening but I can't reproduce this at all. See:

https://asciinema.org/a/334491
https://asciinema.org/a/334492

What's the backing filesystem you're using, i.e. whats your host
filesystem? Any other details that might help?

Christian
James Troup May 29, 2020, 2:36 a.m. UTC | #11
Christian Brauner <christian.brauner@ubuntu.com> writes:

> https://asciinema.org/a/334491
> https://asciinema.org/a/334492

Like I mentioned in the bug report, emacs is the key to reproducing it
for me.  I don't know why.  But if I use vim or shell commands, I
can't (reliably or easily) reproduce it either.  However, I can
everytime with emacs.

> What's the backing filesystem you're using, i.e. whats your host
> filesystem? Any other details that might help?

The host is a FDE ext4 FS (so with LVM + DM crypt).  The lxc is on
zfs. Ah, and I'm using ZRAM (zram-config package is installed), I
guess that could be relevant.
James Troup Aug. 24, 2020, 3:29 a.m. UTC | #12
James Troup <james.troup@canonical.com> writes:

> Christian Brauner <christian.brauner@ubuntu.com> writes:
>
>> https://asciinema.org/a/334491
>> https://asciinema.org/a/334492
>
> Like I mentioned in the bug report, emacs is the key to reproducing it
> for me.  I don't know why.  But if I use vim or shell commands, I
> can't (reliably or easily) reproduce it either.  However, I can
> everytime with emacs.
>
>> What's the backing filesystem you're using, i.e. whats your host
>> filesystem? Any other details that might help?
>
> The host is a FDE ext4 FS (so with LVM + DM crypt).  The lxc is on
> zfs. Ah, and I'm using ZRAM (zram-config package is installed), I
> guess that could be relevant.

Hey, any update on this or anything I can do to help move this
forward?  Chris Sanders reporting seeing the same problem, so I don't
think it's something specific to my setup/laptop.

FWIW, I can also apparently reproduce it reliably with 'bzr revert',
so not just emacs either.  I wonder if it has to do with updating a
file in place vs. writing to a temporary file and rename()'ing into
place?
Stefan Bader Aug. 24, 2020, 7:13 a.m. UTC | #13
On 24.08.20 05:29, James Troup wrote:
> James Troup <james.troup@canonical.com> writes:
> 
>> Christian Brauner <christian.brauner@ubuntu.com> writes:
>>
>>> https://asciinema.org/a/334491
>>> https://asciinema.org/a/334492
>>
>> Like I mentioned in the bug report, emacs is the key to reproducing it
>> for me.  I don't know why.  But if I use vim or shell commands, I
>> can't (reliably or easily) reproduce it either.  However, I can
>> everytime with emacs.
>>
>>> What's the backing filesystem you're using, i.e. whats your host
>>> filesystem? Any other details that might help?
>>
>> The host is a FDE ext4 FS (so with LVM + DM crypt).  The lxc is on
>> zfs. Ah, and I'm using ZRAM (zram-config package is installed), I
>> guess that could be relevant.
> 
> Hey, any update on this or anything I can do to help move this
> forward?  Chris Sanders reporting seeing the same problem, so I don't
> think it's something specific to my setup/laptop.
> 
> FWIW, I can also apparently reproduce it reliably with 'bzr revert',
> so not just emacs either.  I wonder if it has to do with updating a
> file in place vs. writing to a temporary file and rename()'ing into
> place?
> 
Hi James,

were you provided with a test kernel to ensure the change solves the problem and
does not introduce any new problematic behavior?

-Stefan
Christian Brauner Aug. 24, 2020, 8:09 a.m. UTC | #14
On Mon, Aug 24, 2020 at 09:13:11AM +0200, Stefan Bader wrote:
> On 24.08.20 05:29, James Troup wrote:
> > James Troup <james.troup@canonical.com> writes:
> > 
> >> Christian Brauner <christian.brauner@ubuntu.com> writes:
> >>
> >>> https://asciinema.org/a/334491
> >>> https://asciinema.org/a/334492
> >>
> >> Like I mentioned in the bug report, emacs is the key to reproducing it
> >> for me.  I don't know why.  But if I use vim or shell commands, I
> >> can't (reliably or easily) reproduce it either.  However, I can
> >> everytime with emacs.
> >>
> >>> What's the backing filesystem you're using, i.e. whats your host
> >>> filesystem? Any other details that might help?
> >>
> >> The host is a FDE ext4 FS (so with LVM + DM crypt).  The lxc is on
> >> zfs. Ah, and I'm using ZRAM (zram-config package is installed), I
> >> guess that could be relevant.
> > 
> > Hey, any update on this or anything I can do to help move this
> > forward?  Chris Sanders reporting seeing the same problem, so I don't
> > think it's something specific to my setup/laptop.
> > 
> > FWIW, I can also apparently reproduce it reliably with 'bzr revert',
> > so not just emacs either.  I wonder if it has to do with updating a
> > file in place vs. writing to a temporary file and rename()'ing into
> > place?
> > 
> Hi James,
> 
> were you provided with a test kernel to ensure the change solves the problem and
> does not introduce any new problematic behavior?

The change here is not relevant anymore otherwise I would've resent.
Once Plumbers is over I'll look into this again.

Thanks!
Christian
Christian Brauner Aug. 24, 2020, 8:16 a.m. UTC | #15
On Mon, Aug 24, 2020 at 04:29:06AM +0100, James Troup wrote:
> James Troup <james.troup@canonical.com> writes:
> 
> > Christian Brauner <christian.brauner@ubuntu.com> writes:
> >
> >> https://asciinema.org/a/334491
> >> https://asciinema.org/a/334492
> >
> > Like I mentioned in the bug report, emacs is the key to reproducing it
> > for me.  I don't know why.  But if I use vim or shell commands, I
> > can't (reliably or easily) reproduce it either.  However, I can
> > everytime with emacs.
> >
> >> What's the backing filesystem you're using, i.e. whats your host
> >> filesystem? Any other details that might help?
> >
> > The host is a FDE ext4 FS (so with LVM + DM crypt).  The lxc is on
> > zfs. Ah, and I'm using ZRAM (zram-config package is installed), I
> > guess that could be relevant.
> 
> Hey, any update on this or anything I can do to help move this
> forward?  Chris Sanders reporting seeing the same problem, so I don't
> think it's something specific to my setup/laptop.
> 
> FWIW, I can also apparently reproduce it reliably with 'bzr revert',
> so not just emacs either.  I wonder if it has to do with updating a
> file in place vs. writing to a temporary file and rename()'ing into
> place?

Thanks James. I haven't forgotten. I simply didn't have time to look
into it yet. I'm look a now but I'll likely need another week because of
Plumbers. Sorry for the delay.

Christian
diff mbox series

Patch

diff --git a/fs/shiftfs.c b/fs/shiftfs.c
index 39b878a6f820..652fb67b0e0b 100644
--- a/fs/shiftfs.c
+++ b/fs/shiftfs.c
@@ -240,6 +240,9 @@  static int shiftfs_d_weak_revalidate(struct dentry *dentry, unsigned int flags)
 	int err = 1;
 	struct dentry *lowerd = dentry->d_fsdata;
 
+	if (d_is_negative(lowerd) != d_is_negative(dentry))
+		return 0;
+
 	if (lowerd->d_flags & DCACHE_OP_WEAK_REVALIDATE) {
 		err = lowerd->d_op->d_weak_revalidate(lowerd, flags);
 		if (err < 0)
@@ -264,13 +267,16 @@  static int shiftfs_d_revalidate(struct dentry *dentry, unsigned int flags)
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
 
+	if (d_unhashed(lowerd) ||
+	    d_is_negative(lowerd) != d_is_negative(dentry))
+		return 0;
+
 	if (lowerd->d_flags & DCACHE_OP_REVALIDATE) {
 		err = lowerd->d_op->d_revalidate(lowerd, flags);
 		if (err < 0)
 			return err;
 		if (!err) {
-			if (!(flags & LOOKUP_RCU))
-				d_invalidate(lowerd);
+			d_invalidate(lowerd);
 			return -ESTALE;
 		}
 	}