diff mbox series

[v1] cifs: make nested cifs mount point dentries always valid to deal with signaled 'df'

Message ID 20210129171316.13160-1-aaptel@suse.com
State New
Headers show
Series [v1] cifs: make nested cifs mount point dentries always valid to deal with signaled 'df' | expand

Commit Message

Aurélien Aptel Jan. 29, 2021, 5:13 p.m. UTC
From: Aurelien Aptel <aaptel@suse.com>

Assuming
- //HOST/a is mounted on /mnt
- //HOST/b is mounted on /mnt/b

On a slow connection, running 'df' and killing it while it's
processing /mnt/b can make cifs_get_inode_info() returns -ERESTARTSYS.

This triggers the following chain of events:
=> the dentry revalidation fail
=> dentry is put and released
=> superblock associated with the dentry is put
=> /mnt/b is unmounted

This quick fix makes cifs_d_revalidate() always succeed (mark as
valid) on cifs dentries which are also mount points.

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
---

I have managed to reproduce this bug with the following script.  It
uses tc with netem discipline (CONFIG_NET_SCH_NETEM=y) to simulate
network delays.

#!/bin/bash

#
# reproducing bsc#1177440
#
# nested mount point gets unmounted when process is signaled
#
set -e

share1=//192.168.2.110/scratch
share2=//192.168.2.110/test
opts="username=administrator,password=aaptel-42,vers=1.0,actimeo=0"

cleanup() {
    # reset delay
    tc qdisc del dev eth0 root
    mount|grep -q /mnt/nest/a && umount /mnt/nest/a
    mount|grep -q /mnt/nest && umount /mnt/nest

    echo 'module cifs -p' > /sys/kernel/debug/dynamic_debug/control
    echo 'file fs/cifs/* -p' > /sys/kernel/debug/dynamic_debug/control
    echo 0 > /proc/fs/cifs/cifsFYI
    echo 0 > /sys/module/dns_resolver/parameters/debug
    echo 1 > /proc/sys/kernel/printk_ratelimit
    
}

trap cleanup EXIT

nbcifs() {
    mount|grep cifs|wc -l
}

reset() {
    echo "unmounting and reloading cifs.ko"
    mount|grep -q /mnt/nest/a && umount /mnt/nest/a
    mount|grep -q /mnt/nest && umount /mnt/nest
    sleep 1
    lsmod|grep -q cifs && ( modprobe -r cifs &> /dev/null || true )
    lsmod|grep -q cifs || ( modprobe cifs &> /dev/null  || true )
}

mnt() {
    dmesg --clear
    echo 'module cifs +p' > /sys/kernel/debug/dynamic_debug/control
    echo 'file fs/cifs/* +p' > /sys/kernel/debug/dynamic_debug/control
    echo 1 > /proc/fs/cifs/cifsFYI
    echo 1 > /sys/module/dns_resolver/parameters/debug
    echo 0 > /proc/sys/kernel/printk_ratelimit

    echo "mounting"
    mkdir -p /mnt/nest
    mount.cifs $share1 /mnt/nest -o "$opts"
    mkdir -p /mnt/nest/a
    mount.cifs $share2 /mnt/nest/a -o "$opts"
}

# add fake delay
tc qdisc add dev eth0 root netem delay 300ms

while :; do
    reset
    mnt
    n=$(nbcifs)    
    echo "starting df with $n mounts"
    df & 
    pid=$!
    sleep 1.5
    kill $pid || true
    x=$(nbcifs)
    echo "stopping with $x mounts"
    if [ $x -lt $n ]; then
        echo "lost mounts"
        dmesg > kernel.log
        exit 1
    fi
done



fs/cifs/dir.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Shyam Prasad N Jan. 31, 2021, 9:28 a.m. UTC | #1
I'm assuming that the revalidation is failing in
cifs_revalidate_dentry, because it returns -ERESTARTSYS.

Going by the documentation of d_revalidate:
> This function should return a positive value if the dentry is still
> valid, and zero or a negative error code if it isn't.

In case of error, can we try returning the rc itself (rather than 0),
and see if VFS avoids a dentry put?
Because theoretically, the call execution has failed, and the dentry
is not found to be invalid.

Regards,
Shyam

On Fri, Jan 29, 2021 at 9:16 AM Aurélien Aptel <aaptel@suse.com> wrote:
>
> From: Aurelien Aptel <aaptel@suse.com>
>
> Assuming
> - //HOST/a is mounted on /mnt
> - //HOST/b is mounted on /mnt/b
>
> On a slow connection, running 'df' and killing it while it's
> processing /mnt/b can make cifs_get_inode_info() returns -ERESTARTSYS.
>
> This triggers the following chain of events:
> => the dentry revalidation fail
> => dentry is put and released
> => superblock associated with the dentry is put
> => /mnt/b is unmounted
>
> This quick fix makes cifs_d_revalidate() always succeed (mark as
> valid) on cifs dentries which are also mount points.
>
> Signed-off-by: Aurelien Aptel <aaptel@suse.com>
> Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
> ---
>
> I have managed to reproduce this bug with the following script.  It
> uses tc with netem discipline (CONFIG_NET_SCH_NETEM=y) to simulate
> network delays.
>
> #!/bin/bash
>
> #
> # reproducing bsc#1177440
> #
> # nested mount point gets unmounted when process is signaled
> #
> set -e
>
> share1=//192.168.2.110/scratch
> share2=//192.168.2.110/test
> opts="username=administrator,password=aaptel-42,vers=1.0,actimeo=0"
>
> cleanup() {
>     # reset delay
>     tc qdisc del dev eth0 root
>     mount|grep -q /mnt/nest/a && umount /mnt/nest/a
>     mount|grep -q /mnt/nest && umount /mnt/nest
>
>     echo 'module cifs -p' > /sys/kernel/debug/dynamic_debug/control
>     echo 'file fs/cifs/* -p' > /sys/kernel/debug/dynamic_debug/control
>     echo 0 > /proc/fs/cifs/cifsFYI
>     echo 0 > /sys/module/dns_resolver/parameters/debug
>     echo 1 > /proc/sys/kernel/printk_ratelimit
>
> }
>
> trap cleanup EXIT
>
> nbcifs() {
>     mount|grep cifs|wc -l
> }
>
> reset() {
>     echo "unmounting and reloading cifs.ko"
>     mount|grep -q /mnt/nest/a && umount /mnt/nest/a
>     mount|grep -q /mnt/nest && umount /mnt/nest
>     sleep 1
>     lsmod|grep -q cifs && ( modprobe -r cifs &> /dev/null || true )
>     lsmod|grep -q cifs || ( modprobe cifs &> /dev/null  || true )
> }
>
> mnt() {
>     dmesg --clear
>     echo 'module cifs +p' > /sys/kernel/debug/dynamic_debug/control
>     echo 'file fs/cifs/* +p' > /sys/kernel/debug/dynamic_debug/control
>     echo 1 > /proc/fs/cifs/cifsFYI
>     echo 1 > /sys/module/dns_resolver/parameters/debug
>     echo 0 > /proc/sys/kernel/printk_ratelimit
>
>     echo "mounting"
>     mkdir -p /mnt/nest
>     mount.cifs $share1 /mnt/nest -o "$opts"
>     mkdir -p /mnt/nest/a
>     mount.cifs $share2 /mnt/nest/a -o "$opts"
> }
>
> # add fake delay
> tc qdisc add dev eth0 root netem delay 300ms
>
> while :; do
>     reset
>     mnt
>     n=$(nbcifs)
>     echo "starting df with $n mounts"
>     df &
>     pid=$!
>     sleep 1.5
>     kill $pid || true
>     x=$(nbcifs)
>     echo "stopping with $x mounts"
>     if [ $x -lt $n ]; then
>         echo "lost mounts"
>         dmesg > kernel.log
>         exit 1
>     fi
> done
>
>
>
> fs/cifs/dir.c | 4 ++++
>  1 file changed, 4 insertions(+)
>
> diff --git a/fs/cifs/dir.c b/fs/cifs/dir.c
> index 68900f1629bff..876ef01628538 100644
> --- a/fs/cifs/dir.c
> +++ b/fs/cifs/dir.c
> @@ -741,6 +741,10 @@ cifs_d_revalidate(struct dentry *direntry, unsigned int flags)
>         if (flags & LOOKUP_RCU)
>                 return -ECHILD;
>
> +       /* nested cifs mount point are always valid */
> +       if (d_mountpoint(direntry))
> +               return 1;
> +
>         if (d_really_is_positive(direntry)) {
>                 inode = d_inode(direntry);
>                 if ((flags & LOOKUP_REVAL) && !CIFS_CACHE_READ(CIFS_I(inode)))
> --
> 2.29.2
>
Aurélien Aptel Feb. 1, 2021, 10:31 a.m. UTC | #2
Shyam Prasad N <nspmangalore@gmail.com> writes:
> Going by the documentation of d_revalidate:
>> This function should return a positive value if the dentry is still
>> valid, and zero or a negative error code if it isn't.
>
> In case of error, can we try returning the rc itself (rather than 0),
> and see if VFS avoids a dentry put?
> Because theoretically, the call execution has failed, and the dentry
> is not found to be invalid.

AFAIK mount points are pinned, you cannot rm or mv them so it seems we
could make them always valid. I don't know if there are deeper and more
subtle implications.

The recent signal fixes are not fixing this issue.

Cheers,
Shyam Prasad N Feb. 1, 2021, 4:51 p.m. UTC | #3
I'm okay with returning valid for directory mount point.

But the point that I'm trying to make here is that VFS reacts
differently when d_validate returns an error VS when it just returns
invalid:
https://elixir.bootlin.com/linux/latest/source/fs/namei.c#L1409

Notice how it calls d_invalidate only when there's no error. And
d_invalidate seems to have detach_mounts.
It is likely that the umount happens there.

I'm suggesting that we should return errors inside d_validate
handlers, rather than just 0 or 1.
Makes sense?

Regards,
Shyam

On Mon, Feb 1, 2021 at 4:01 PM Aurélien Aptel <aaptel@suse.com> wrote:
>
> Shyam Prasad N <nspmangalore@gmail.com> writes:
> > Going by the documentation of d_revalidate:
> >> This function should return a positive value if the dentry is still
> >> valid, and zero or a negative error code if it isn't.
> >
> > In case of error, can we try returning the rc itself (rather than 0),
> > and see if VFS avoids a dentry put?
> > Because theoretically, the call execution has failed, and the dentry
> > is not found to be invalid.
>
> AFAIK mount points are pinned, you cannot rm or mv them so it seems we
> could make them always valid. I don't know if there are deeper and more
> subtle implications.
>
> The recent signal fixes are not fixing this issue.
>
> Cheers,
> --
> Aurélien Aptel / SUSE Labs Samba Team
> GPG: 1839 CB5F 9F5B FB9B AA97  8C99 03C8 A49B 521B D5D3
> SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg, DE
> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah HRB 247165 (AG München)
>
Aurélien Aptel Feb. 2, 2021, 11 a.m. UTC | #4
Shyam Prasad N <nspmangalore@gmail.com> writes:
> But the point that I'm trying to make here is that VFS reacts
> differently when d_validate returns an error VS when it just returns
> invalid:
> https://elixir.bootlin.com/linux/latest/source/fs/namei.c#L1409

I've tried returning the error with the repro script and it also worked.

> Notice how it calls d_invalidate only when there's no error. And
> d_invalidate seems to have detach_mounts.
> It is likely that the umount happens there.

I've dumped call stacks, the revalidation is triggered by filename_lookup()

[   31.913092]  [<ffffffff8133e3d1>] ? SendReceive+0x2b1/0x2d0
[   31.913093]  [<ffffffff8132cbd2>] ? CIFSSMBQPathInfo+0x152/0x260
[   31.913096]  [<ffffffff81343c9f>] ? cifs_query_path_info+0x6f/0x1a0
[   31.913098]  [<ffffffff81366953>] ? cifs_get_inode_info.cold+0x44f/0x6bb
[   31.913100]  [<ffffffff810bf935>] ? wake_up_process+0x15/0x20
[   31.913102]  [<ffffffff810e9c30>] ? vprintk_emit+0x200/0x500
[   31.913103]  [<ffffffff810ea099>] ? vprintk_default+0x29/0x40
[   31.913105]  [<ffffffff811a896f>] ? printk+0x50/0x52
[   31.913107]  [<ffffffff813678c1>] ? cifs_revalidate_dentry_attr.cold+0x71/0x79
[   31.913109]  [<ffffffff8133b194>] ? cifs_revalidate_dentry+0x14/0x30
[   31.913110]  [<ffffffff813327e5>] ? cifs_d_revalidate+0x25/0xb0
[   31.913112]  [<ffffffff812404ff>] ? lookup_fast+0x1bf/0x220
[   31.913113]  [<ffffffff812407bc>] ? walk_component+0x3c/0x3f0
[   31.913114]  [<ffffffff8123fe5b>] ? path_init+0x23b/0x450
[   31.913116]  [<ffffffff812412bf>] ? path_lookupat+0x7f/0x110
[   31.913118]  [<ffffffff81242ae7>] ? filename_lookup+0x97/0x190
[   31.913120]  [<ffffffff811e1e39>] ? handle_pte_fault+0x1d9/0x240
[   31.913122]  [<ffffffff81242cc9>] ? user_path_at_empty+0x59/0x90
[   31.913124]  [<ffffffff8126ab59>] ? user_statfs+0x39/0xa0
[   31.913125]  [<ffffffff8126abd6>] ? SYSC_statfs+0x16/0x40
[   31.913127]  [<ffffffff8106de53>] ? trace_do_page_fault+0x43/0x150
[   31.913130]  [<ffffffff8180359c>] ? async_page_fault+0x3c/0x60
[   31.913131]  [<ffffffff8126ad6e>] ? SyS_statfs+0xe/0x10
[   31.913132]  [<ffffffff81800021>] ? entry_SYSCALL_64_fastpath+0x20/0xee


d_invalidate()->...->drop_mountpoint() adds a callback to unmount at later times:

[   31.913246]  [<ffffffff812554da>] ? drop_mountpoint+0x6a/0x70
[   31.913247]  [<ffffffff8126b0b8>] ? pin_kill+0x88/0x160
[   31.913249]  [<ffffffff810d6a20>] ? prepare_to_wait_event+0x100/0x100
[   31.913250]  [<ffffffff810d6a20>] ? prepare_to_wait_event+0x100/0x100
[   31.913252]  [<ffffffff8126b205>] ? group_pin_kill+0x25/0x50
[   31.913253]  [<ffffffff812564fa>] ? __detach_mounts+0x13a/0x140
[   31.913255]  [<ffffffff8124bf24>] ? d_invalidate+0xa4/0x100
[   31.913256]  [<ffffffff812404cd>] ? lookup_fast+0x18d/0x220
[   31.913257]  [<ffffffff812407bc>] ? walk_component+0x3c/0x3f0
[   31.913258]  [<ffffffff8123fe5b>] ? path_init+0x23b/0x450
[   31.913259]  [<ffffffff812412bf>] ? path_lookupat+0x7f/0x110
[   31.913261]  [<ffffffff81242ae7>] ? filename_lookup+0x97/0x190
[   31.913262]  [<ffffffff811e1e39>] ? handle_pte_fault+0x1d9/0x240
[   31.913263]  [<ffffffff81242cc9>] ? user_path_at_empty+0x59/0x90
[   31.913265]  [<ffffffff8126ab59>] ? user_statfs+0x39/0xa0
[   31.913266]  [<ffffffff8126abd6>] ? SYSC_statfs+0x16/0x40
[   31.913268]  [<ffffffff8106de53>] ? trace_do_page_fault+0x43/0x150
[   31.913269]  [<ffffffff8180359c>] ? async_page_fault+0x3c/0x60
[   31.913270]  [<ffffffff8126ad6e>] ? SyS_statfs+0xe/0x10
[   31.913272]  [<ffffffff81800021>] ? entry_SYSCALL_64_fastpath+0x20/0xee

The actual unmount call:

[   31.913594]  [<ffffffff81362248>] ? cifs_put_tcon.part.0+0x71/0x123
[   31.913597]  [<ffffffff81362309>] ? cifs_put_tlink.cold+0x5/0xa
[   31.913599]  [<ffffffff813318a5>] ? cifs_umount+0x65/0xd0
[   31.913601]  [<ffffffff8132976f>] ? cifs_kill_sb+0x1f/0x30
[   31.913603]  [<ffffffff8123527a>] ? deactivate_locked_super+0x4a/0xd0
[   31.913605]  [<ffffffff81235344>] ? deactivate_super+0x44/0x50
[   31.913609]  [<ffffffff81253adb>] ? cleanup_mnt+0x3b/0x90
[   31.913610]  [<ffffffff81253b82>] ? __cleanup_mnt+0x12/0x20
[   31.913613]  [<ffffffff810af9eb>] ? task_work_run+0x7b/0xb0
[   31.913616]  [<ffffffff810a0a3a>] ? get_signal+0x4ea/0x4f0
[   31.913618]  [<ffffffff811e1e39>] ? handle_pte_fault+0x1d9/0x240
[   31.913621]  [<ffffffff810185d1>] ? do_signal+0x21/0x100
[   31.913624]  [<ffffffff81242cc9>] ? user_path_at_empty+0x59/0x90
[   31.913626]  [<ffffffff8126ab59>] ? user_statfs+0x39/0xa0
[   31.913628]  [<ffffffff8126abd6>] ? SYSC_statfs+0x16/0x40
[   31.913630]  [<ffffffff81003517>] ? exit_to_usermode_loop+0x87/0xc0
[   31.913632]  [<ffffffff81003bed>] ? syscall_return_slowpath+0xed/0x180
[   31.913634]  [<ffffffff818001b1>] ? int_ret_from_sys_call+0x8/0x6d


> I'm suggesting that we should return errors inside d_validate
> handlers, rather than just 0 or 1.
> Makes sense?

Yes that worked too. I'll send v2.

Cheers,
diff mbox series

Patch

diff --git a/fs/cifs/dir.c b/fs/cifs/dir.c
index 68900f1629bff..876ef01628538 100644
--- a/fs/cifs/dir.c
+++ b/fs/cifs/dir.c
@@ -741,6 +741,10 @@  cifs_d_revalidate(struct dentry *direntry, unsigned int flags)
 	if (flags & LOOKUP_RCU)
 		return -ECHILD;
 
+	/* nested cifs mount point are always valid */
+	if (d_mountpoint(direntry))
+		return 1;
+
 	if (d_really_is_positive(direntry)) {
 		inode = d_inode(direntry);
 		if ((flags & LOOKUP_REVAL) && !CIFS_CACHE_READ(CIFS_I(inode)))