diff mbox series

fcntl36: fix 32-bit sporadic failures

Message ID 8b5ddc69aae83b332b205dbeddb3284ab00f827e.1529082112.git.jstancek@redhat.com
State Accepted
Delegated to: Petr Vorel
Headers show
Series fcntl36: fix 32-bit sporadic failures | expand

Commit Message

Jan Stancek June 15, 2018, 5:22 p.m. UTC
fcntl36 testcase has been observed to sporadically fail, when
running 32-bit user-space on 64-bit kernel.

Strace shows that region length is 0 for some commands:
  fcntl64(6, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=5120, l_len=0}

This is because testcase is passing struct flock64, but
command is F_SETLK, not F_SETLK64.

So, kernel treats argument for POSIX command as 32-bit "struct flock":

  static long do_compat_fcntl64(unsigned int fd, unsigned int cmd,
  ...
        case F_SETLK:
        case F_SETLKW:
                err = get_compat_flock(&flock, compat_ptr(arg));
		...

in contrast to F_OFD_* commands, where argument  is treated
as "struct flock64":
        case F_SETLK64:
        case F_SETLKW64:
        case F_OFD_SETLK:
        case F_OFD_SETLKW:
                err = get_compat_flock64(&flock, compat_ptr(arg));
		...

Switch argument of POSIX commands to 'struct flock' and leave it to
glibc to pick correct syscall and command.

I tested fcntl36 and fcntl36_64 on x86_64 kernel (as 64bit + 32bit binary)
and i386 kernel by running 100 iterations.

Signed-off-by: Jan Stancek <jstancek@redhat.com>
---
No kernel bug, problem was found in my reproducer.
Can anyone confirm this fixes 32-bit arm too?

 testcases/kernel/syscalls/fcntl/fcntl36.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Rafael David Tinoco June 15, 2018, 6:27 p.m. UTC | #1
Jan,

Running in a container, armhf userland, with arm64 kernel (real HW: hikey):

$ uname -a
Linux ltpfcntl36 4.16.0-2-arm64 #1 SMP Debian 4.16.12-1 (2018-05-27)
armv8l GNU/Linux

$ file ./fcntl36
./fcntl36: ELF 32-bit LSB pie executable ARM, EABI5 version 1 (SYSV),
dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for
GNU/Linux 3.2.0,
BuildID[sha1]=074ba3622d50f62108260d2912ae5dc0e8d52d46, with
debug_info, not stripped

My un-patched binary shows me:

[pid  3501] fcntl64(11, F_OFD_SETLK, {l_type=F_UNLCK,
l_whence=SEEK_SET, l_start=16384, l_len=4096} <unfinished ...>

no F_SETLK for this binary.

Running in a container, arm64 userland, with arm64 kernel (real HW: hikey):

$ file ./fcntl36
./fcntl36: ELF 64-bit LSB pie executable ARM aarch64, version 1
(SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1,
for GNU/Linux 3.7.0,
BuildID[sha1]=8fb1d0e0e0d027fce5c8d11742ce5bdb7678749b, with
debug_info, not stripped

Same un-patched binary shows me:

[pid   483] fcntl(10, F_OFD_SETLKW, {l_type=F_UNLCK,
l_whence=SEEK_SET, l_start=13312, l_len=4096} <unfinished ...>

Do you want me to test patched LTP in arm32/64 to make sure they work
and you check if it fixes the issue for i686 on your side ? Or should
I compile it some other way ?

Let me know what you need.

Rafael

On 15 June 2018 at 14:22, Jan Stancek <jstancek@redhat.com> wrote:

> No kernel bug, problem was found in my reproducer.
> Can anyone confirm this fixes 32-bit arm too?
Jan Stancek June 15, 2018, 6:39 p.m. UTC | #2
----- Original Message -----
> Jan,
> 
> Running in a container, armhf userland, with arm64 kernel (real HW: hikey):
> 
> $ uname -a
> Linux ltpfcntl36 4.16.0-2-arm64 #1 SMP Debian 4.16.12-1 (2018-05-27)
> armv8l GNU/Linux
> 
> $ file ./fcntl36
> ./fcntl36: ELF 32-bit LSB pie executable ARM, EABI5 version 1 (SYSV),
> dynamically linked, interpreter /lib/ld-linux-armhf.so.3, for
> GNU/Linux 3.2.0,
> BuildID[sha1]=074ba3622d50f62108260d2912ae5dc0e8d52d46, with
> debug_info, not stripped
> 
> My un-patched binary shows me:
> 
> [pid  3501] fcntl64(11, F_OFD_SETLK, {l_type=F_UNLCK,
> l_whence=SEEK_SET, l_start=16384, l_len=4096} <unfinished ...>
> 
> no F_SETLK for this binary.
> 
> Running in a container, arm64 userland, with arm64 kernel (real HW: hikey):
> 
> $ file ./fcntl36
> ./fcntl36: ELF 64-bit LSB pie executable ARM aarch64, version 1
> (SYSV), dynamically linked, interpreter /lib/ld-linux-aarch64.so.1,
> for GNU/Linux 3.7.0,
> BuildID[sha1]=8fb1d0e0e0d027fce5c8d11742ce5bdb7678749b, with
> debug_info, not stripped
> 
> Same un-patched binary shows me:
> 
> [pid   483] fcntl(10, F_OFD_SETLKW, {l_type=F_UNLCK,
> l_whence=SEEK_SET, l_start=13312, l_len=4096} <unfinished ...>
> 
> Do you want me to test patched LTP in arm32/64 to make sure they work

Yes, if possible test patched LTP in environment which
is known to hit the failure.

> and you check if it fixes the issue for i686 on your side ?

I already did, it does eliminate problem for me.

> Or should
> I compile it some other way ?

Default build is OK. Apply patch, build and see if
you can reproduce the failure.

> 
> Let me know what you need.
> 
> Rafael
> 
> On 15 June 2018 at 14:22, Jan Stancek <jstancek@redhat.com> wrote:
> 
> > No kernel bug, problem was found in my reproducer.
> > Can anyone confirm this fixes 32-bit arm too?
>
Li Wang June 16, 2018, 4:15 a.m. UTC | #3
On Sat, Jun 16, 2018 at 1:22 AM, Jan Stancek <jstancek@redhat.com> wrote:
> fcntl36 testcase has been observed to sporadically fail, when
> running 32-bit user-space on 64-bit kernel.
>
> Strace shows that region length is 0 for some commands:
>   fcntl64(6, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=5120, l_len=0}
>
> This is because testcase is passing struct flock64, but
> command is F_SETLK, not F_SETLK64.
>
> So, kernel treats argument for POSIX command as 32-bit "struct flock":
>
>   static long do_compat_fcntl64(unsigned int fd, unsigned int cmd,
>   ...
>         case F_SETLK:
>         case F_SETLKW:
>                 err = get_compat_flock(&flock, compat_ptr(arg));
>                 ...
>
> in contrast to F_OFD_* commands, where argument  is treated
> as "struct flock64":
>         case F_SETLK64:
>         case F_SETLKW64:
>         case F_OFD_SETLK:
>         case F_OFD_SETLKW:
>                 err = get_compat_flock64(&flock, compat_ptr(arg));
>                 ...
>
> Switch argument of POSIX commands to 'struct flock' and leave it to
> glibc to pick correct syscall and command.
>
> I tested fcntl36 and fcntl36_64 on x86_64 kernel (as 64bit + 32bit binary)
> and i386 kernel by running 100 iterations.

Sounds reasonable.

I can stably reproduce this issue on my raspbery Pi3, and with your
patch applied, the problem was gone.

[root@centos-rpi3 fcntl]# uname  -a
Linux centos-rpi3 4.9.13-v7.1.el7 #1 SMP Mon Feb 27 10:45:31 UTC 2017
armv7l armv7l armv7l GNU/Linux

[root@centos-rpi3 fcntl]# file fcntl36
fcntl36: ELF 32-bit LSB executable, ARM, version 1 (SYSV), dynamically
linked (uses shared libs), for GNU/Linux 2.6.32,
BuildID[sha1]=73d3a834d43361ecd4331d9b77781ea932a970cc, not stripped

[root@centos-rpi3 fcntl]# ./fcntl36
tst_test.c:1015: INFO: Timeout per run is 0h 05m 00s
fcntl36.c:303: INFO: OFD read lock vs OFD write lock
fcntl36.c:381: PASS: Access between threads synchronized
fcntl36.c:303: INFO: OFD write lock vs POSIX write lock
fcntl36.c:381: PASS: Access between threads synchronized
fcntl36.c:303: INFO: OFD read lock vs POSIX write lock
fcntl36.c:381: PASS: Access between threads synchronized
fcntl36.c:303: INFO: OFD write lock vs POSIX read lock
fcntl36.c:381: PASS: Access between threads synchronized
fcntl36.c:303: INFO: OFD write lock vs OFD write lock
fcntl36.c:381: PASS: Access between threads synchronized
fcntl36.c:303: INFO: OFD r/w lock vs POSIX write lock
fcntl36.c:381: PASS: Access between threads synchronized
fcntl36.c:303: INFO: OFD r/w lock vs POSIX read lock
fcntl36.c:381: PASS: Access between threads synchronized

Summary:
passed   7
failed   0
skipped  0
warnings 0
Naresh Kamboju June 19, 2018, 7:51 a.m. UTC | #4
On 15 June 2018 at 22:52, Jan Stancek <jstancek@redhat.com> wrote:
> fcntl36 testcase has been observed to sporadically fail, when
> running 32-bit user-space on 64-bit kernel.
>
> Strace shows that region length is 0 for some commands:
>   fcntl64(6, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=5120, l_len=0}
>
> This is because testcase is passing struct flock64, but
> command is F_SETLK, not F_SETLK64.
>
> So, kernel treats argument for POSIX command as 32-bit "struct flock":
>
>   static long do_compat_fcntl64(unsigned int fd, unsigned int cmd,
>   ...
>         case F_SETLK:
>         case F_SETLKW:
>                 err = get_compat_flock(&flock, compat_ptr(arg));
>                 ...
>
> in contrast to F_OFD_* commands, where argument  is treated
> as "struct flock64":
>         case F_SETLK64:
>         case F_SETLKW64:
>         case F_OFD_SETLK:
>         case F_OFD_SETLKW:
>                 err = get_compat_flock64(&flock, compat_ptr(arg));
>                 ...
>
> Switch argument of POSIX commands to 'struct flock' and leave it to
> glibc to pick correct syscall and command.
>
> I tested fcntl36 and fcntl36_64 on x86_64 kernel (as 64bit + 32bit binary)
> and i386 kernel by running 100 iterations.
>
> Signed-off-by: Jan Stancek <jstancek@redhat.com>

Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>

> ---
> No kernel bug, problem was found in my reproducer.
> Can anyone confirm this fixes 32-bit arm too?

This patch tested on arm32 Beagleboard x15 for 25 iterations and PASS.
Here you find test log before and after the patch.

Full test log,
https://lkft.validation.linaro.org/scheduler/job/290879#L3056

>
>  testcases/kernel/syscalls/fcntl/fcntl36.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/testcases/kernel/syscalls/fcntl/fcntl36.c b/testcases/kernel/syscalls/fcntl/fcntl36.c
> index 3246d13892cd..81bd5a647e4c 100644
> --- a/testcases/kernel/syscalls/fcntl/fcntl36.c
> +++ b/testcases/kernel/syscalls/fcntl/fcntl36.c
> @@ -127,7 +127,7 @@ static void *fn_posix_w(void *arg)
>         int fd = SAFE_OPEN(fname, O_RDWR);
>         long wt = pa->cnt;
>
> -       struct flock64 lck = {
> +       struct flock lck = {
>                 .l_whence = SEEK_SET,
>                 .l_start  = pa->offset,
>                 .l_len    = pa->length,
> @@ -227,7 +227,7 @@ static void *fn_posix_r(void *arg)
>         int i;
>         int fd = SAFE_OPEN(fname, O_RDWR);
>
> -       struct flock64 lck = {
> +       struct flock lck = {
>                 .l_whence = SEEK_SET,
>                 .l_start  = pa->offset,
>                 .l_len    = pa->length,
> --
> 1.8.3.1
>

Thanks for the patch !

Best regards
Naresh Kamboju
Jan Stancek June 19, 2018, 10:56 a.m. UTC | #5
----- Original Message -----
> On 15 June 2018 at 22:52, Jan Stancek <jstancek@redhat.com> wrote:
> > fcntl36 testcase has been observed to sporadically fail, when
> > running 32-bit user-space on 64-bit kernel.
> >
> > Strace shows that region length is 0 for some commands:
> >   fcntl64(6, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=5120,
> >   l_len=0}
> >
> > This is because testcase is passing struct flock64, but
> > command is F_SETLK, not F_SETLK64.
> >
> > So, kernel treats argument for POSIX command as 32-bit "struct flock":
> >
> >   static long do_compat_fcntl64(unsigned int fd, unsigned int cmd,
> >   ...
> >         case F_SETLK:
> >         case F_SETLKW:
> >                 err = get_compat_flock(&flock, compat_ptr(arg));
> >                 ...
> >
> > in contrast to F_OFD_* commands, where argument  is treated
> > as "struct flock64":
> >         case F_SETLK64:
> >         case F_SETLKW64:
> >         case F_OFD_SETLK:
> >         case F_OFD_SETLKW:
> >                 err = get_compat_flock64(&flock, compat_ptr(arg));
> >                 ...
> >
> > Switch argument of POSIX commands to 'struct flock' and leave it to
> > glibc to pick correct syscall and command.
> >
> > I tested fcntl36 and fcntl36_64 on x86_64 kernel (as 64bit + 32bit binary)
> > and i386 kernel by running 100 iterations.
> >
> > Signed-off-by: Jan Stancek <jstancek@redhat.com>
> 
> Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
> 

Thanks Li & Naresh. Pushed.

Regards,
Jan
diff mbox series

Patch

diff --git a/testcases/kernel/syscalls/fcntl/fcntl36.c b/testcases/kernel/syscalls/fcntl/fcntl36.c
index 3246d13892cd..81bd5a647e4c 100644
--- a/testcases/kernel/syscalls/fcntl/fcntl36.c
+++ b/testcases/kernel/syscalls/fcntl/fcntl36.c
@@ -127,7 +127,7 @@  static void *fn_posix_w(void *arg)
 	int fd = SAFE_OPEN(fname, O_RDWR);
 	long wt = pa->cnt;
 
-	struct flock64 lck = {
+	struct flock lck = {
 		.l_whence = SEEK_SET,
 		.l_start  = pa->offset,
 		.l_len    = pa->length,
@@ -227,7 +227,7 @@  static void *fn_posix_r(void *arg)
 	int i;
 	int fd = SAFE_OPEN(fname, O_RDWR);
 
-	struct flock64 lck = {
+	struct flock lck = {
 		.l_whence = SEEK_SET,
 		.l_start  = pa->offset,
 		.l_len    = pa->length,