Message ID | 35e73925-532c-bc95-53ca-005f3dbd130b@linaro.org |
---|---|
State | New |
On Wed, 16 Aug 2017, Adhemerval Zanella wrote:

> +ENTRY(__libc_do_syscall)
> + move $2, $4
> + move $4, $5
> + move $5, $6
> + move $6, $7

I'm not very keen on having a nested syscall function call, but if you do
that, then please at least arrange the wrapper's arguments such that you
don't have to shuffle them, i.e. I suggest placing the syscall number
last.

For historical reasons you may want to initialise $2 right before the
SYSCALL instruction, although I take it we don't anymore support Linux
kernels old enough to require it for the syscall restart convention (so it
would mainly serve as a reference for those who need to write their own
code supporting those old kernels, as people often blindly copy & paste
existing pieces).

Also the MIPS16 wrappers may require adjustment then in order not to
execute a doubly nested function call unnecessarily, i.e. call
`__libc_do_syscall' directly rather than through another wrapper.

  Maciej
On 16/08/2017 11:46, Maciej W. Rozycki wrote:
> On Wed, 16 Aug 2017, Adhemerval Zanella wrote:
>
>> +ENTRY(__libc_do_syscall)
>> + move $2, $4
>> + move $4, $5
>> + move $5, $6
>> + move $6, $7
>
> I'm not very keen on having a nested syscall function call, but if you do
> that, then please at least arrange the wrapper's arguments such that you
> don't have to shuffle them, i.e. I suggest placing the syscall number
> last.

I aimed for simplicity here. Avoiding the shuffle would require three
specialized wrappers, one for each syscall convention (5, 6 or 7
arguments). I can do it, but I still prefer to have only one entry point,
since I think the possible performance gains are not worth the extra
maintenance burden.

>
> For historical reasons you may want to initialise $2 right before the
> SYSCALL instruction, although I take it we don't anymore support Linux
> kernels old enough to require it for the syscall restart convention (so it
> would mainly serve as a reference for those who need to write their own
> code supporting those old kernels, as people often blindly copy & paste
> existing pieces).

Do you know from which kernel version this is no longer required? I
tested on 3.2 under QEMU (as it is the minimum kernel version currently
supported).

>
> Also the MIPS16 wrappers may require adjustment then in order not to
> execute a doubly nested function call unnecessarily, i.e. call
> `__libc_do_syscall' directly rather than through another wrapper.

I have not actually tested MIPS16, nor built for it. I would appreciate
any help here, since my MIPS ABI knowledge is limited.
On 2017-08-16 11:13, Adhemerval Zanella wrote:
>
>
> On 16/08/2017 10:44, Joseph Myers wrote:
> > On Wed, 16 Aug 2017, Maciej W. Rozycki wrote:
> >
> >> On Tue, 15 Aug 2017, Joseph Myers wrote:
> >>
> >>> In which case having a volatile integer variable with value 4, declaring a
> >>> VLA whose size is that variable, and storing a pointer to that VLA in a
> >>> variable, would be an alternative to alloca to force a frame pointer, but
> >>> with deallocation happening when the scope ends rather than the function
> >>> ending (and the syscall macro has its own scope, so using it inside a loop
> >>> wouldn't be a problem).
> >>
> >> I suspect using volatile variables will cause unnecessary memory traffic.
> >> Passing the size specifier through an empty `asm' might give better code;
> >> also I think we can use 0 as the size requested, not to decrease the stack
> >> pointer unnecessarily, e.g.:
> >
> > Sure, as long as (a) the compiler can't know the size is actually constant
> > and (b) it can't know the VLA isn't actually used, as if it can tell
> > either of those things it can optimize away the variable stack allocation.
> >
> >> Also I wonder if there's actually a dependable way to have GCC itself
> >> allocate the argument space we require. For example if we set `s' to 1
> >> above instead for `internal_syscall6', then would `0($sp)' and `4($sp)' be
> >> valid to place arguments #5 and #6 at respectively without the subsequent
> >> $sp adjustment we currently have in the syscall `asm' or would it be UB?
> >
> > You can't tell whether the compiler might have allocated other variables
> > on the stack after the dynamic adjustment - that is, whether any
> > particular offset from sp is in fact unused or not.
> >
>
> What about the below? I can use some help to see if I am handling all the
> required ABI requirements for the __libc_do_syscall, but on an qemu emulated
> system I see no regression on basic tests (including some cancellation one
> from glibc to see the syscall is correctly unwinded) and tst-rwlock15 also
> does not fail anymore.

Thanks for this patch, I'll give it a try. I have been working on
something similar, but I only routed the syscalls with 5, 6 or 7
arguments to __libc_do_syscall. That way there is no performance penalty
for the other syscalls, which are the most used ones.
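The VLA-based alternative to the alloca-based `FORCE_FRAME_POINTER`, discussed in the quoted exchange above, can be sketched in portable C. The macro and function names are hypothetical. The sketch follows Maciej's variant: the size is hidden behind an empty `asm` rather than stored in a `volatile` variable. A second empty `asm` keeps the array "used", which covers Joseph's conditions (a) and (b). It uses a size of 1 rather than 0 because ISO C requires a VLA size greater than zero. The sketch only shows that the allocation is scoped and cannot be optimized away. Whether GCC then sets up a frame pointer on MIPS is not something this portable code can check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical replacement for FORCE_FRAME_POINTER (sketch only).
   - The first empty asm claims it may modify force_fp_size, so the
     compiler cannot treat the VLA size as a constant (condition (a)).
   - The second empty asm takes the array's address as an input and
     clobbers memory, so the compiler must assume the array is used
     (condition (b)) and keeps the dynamic stack allocation.
   An asm without outputs is implicitly volatile in GCC.  */
#define FORCE_FRAME_POINTER_SCOPED()                      \
  size_t force_fp_size = 1;                               \
  __asm__ ("" : "+r" (force_fp_size));                    \
  char force_fp_vla[force_fp_size];                       \
  __asm__ ("" : : "r" (&force_fp_vla[0]) : "memory")

/* Uses the macro inside a loop body, the case Joseph mentions.  Unlike
   alloca, whose memory lives until the function returns, the VLA is
   released when each iteration's scope ends.  */
long long
sum_with_scoped_frame (long long n)
{
  long long total = 0;
  assert (n >= 0);
  for (long long i = 0; i < n; i++)
    {
      FORCE_FRAME_POINTER_SCOPED ();
      total += i;
    }
  return total;
}
```

Calling `sum_with_scoped_frame (100000)` runs the macro 100000 times in one function. Stack use stays flat because each VLA is freed at the end of its iteration. An `alloca`-based version would instead keep growing the stack until the function returned.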
On 2017-08-16 11:54, Adhemerval Zanella wrote:
>
>
> On 16/08/2017 11:46, Maciej W. Rozycki wrote:
> > On Wed, 16 Aug 2017, Adhemerval Zanella wrote:
> >
> >> +ENTRY(__libc_do_syscall)
> >> + move $2, $4
> >> + move $4, $5
> >> + move $5, $6
> >> + move $6, $7
> >
> > I'm not very keen on having a nested syscall function call, but if you do
> > that, then please at least arrange the wrapper's arguments such that you
> > don't have to shuffle them, i.e. I suggest placing the syscall number
> > last.
>
> I aimed for simplicity here since to avoid shuffle it would require three
> specialized wrapper, one for each syscall convention (5/6/7). I can do it,
> but I still prefer to have only one entry point, since I think the possible
> performance gains are not worth the extra maintenance burden.

Thinking about that: if the __libc_do_syscall routine is only used for
syscalls with 5, 6 or 7 arguments, the syscall number can be passed as
the 5th argument (the first one on the stack), between syscall arguments
4 and 5. That way arguments 1 to 4 are already in the right registers,
and the others need to be copied anyway.
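This thread discusses three orders for the wrapper's parameters:

- number first, as in the posted patch;
- number last, as Maciej suggests;
- number as the fifth argument, the first one passed on the stack, as suggested just above.

A small C model of o32 argument passing can compare them. The model is a sketch and all names in it are hypothetical. It assumes word-sized integer arguments only. It also assumes the wrapper leaves `$sp` unchanged, so the kernel reads syscall argument K where call position K-1 lives: `$4`-`$7` for arguments 1-4, then `16($sp)`, `20($sp)`, `24($sp)`. These are the same slots the removed inline asm stored to. For each order, the model counts how many syscall arguments reach `__libc_do_syscall` somewhere else.

```c
#include <assert.h>

/* Where the o32 ABI puts the INDEX-th (0-based) word-sized integer
   argument of a call: the first four go in $4-$7 ($a0-$a3); the rest
   go on the stack from 16($sp) upwards, because the caller reserves
   16 bytes of home space for the four register arguments.  */
struct o32_loc
{
  int is_reg;  /* 1: register, 0: stack slot.  */
  int where;   /* Register number, or byte offset from $sp.  */
};

struct o32_loc
o32_arg_location (int index)
{
  struct o32_loc loc;
  assert (index >= 0);
  if (index < 4)
    {
      loc.is_reg = 1;
      loc.where = 4 + index;
    }
  else
    {
      loc.is_reg = 0;
      loc.where = 16 + 4 * (index - 4);
    }
  return loc;
}

/* Hypothetical parameter orders for the wrapper.  */
enum wrapper_order
{
  NUMBER_FIRST,  /* (number, a1, ..., an): the posted patch.  */
  NUMBER_LAST,   /* (a1, ..., an, number): Maciej's suggestion.  */
  NUMBER_FIFTH   /* (a1, ..., a4, number, a5, ...): Aurelien's idea.  */
};

/* 0-based call position of syscall argument K (1-based).  */
int
arg_index (enum wrapper_order order, int k)
{
  switch (order)
    {
    case NUMBER_FIRST:
      return k;
    case NUMBER_LAST:
      return k - 1;
    case NUMBER_FIFTH:
      return k <= 4 ? k - 1 : k;
    }
  return -1;
}

/* 0-based call position of the syscall number for an NARGS syscall.  */
int
number_index (enum wrapper_order order, int nargs)
{
  switch (order)
    {
    case NUMBER_FIRST:
      return 0;
    case NUMBER_LAST:
      return nargs;
    case NUMBER_FIFTH:
      return 4;
    }
  return -1;
}

/* Count syscall arguments that do not arrive where the kernel reads
   them (argument K at the location of call position K - 1); each one
   costs the wrapper a register move or a load/store pair.  */
int
misplaced_args (enum wrapper_order order, int nargs)
{
  int count = 0;
  for (int k = 1; k <= nargs; k++)
    {
      struct o32_loc have = o32_arg_location (arg_index (order, k));
      struct o32_loc want = o32_arg_location (k - 1);
      if (have.is_reg != want.is_reg || have.where != want.where)
        count++;
    }
  return count;
}
```

The model gives these results:

- **Number first:** every argument is displaced.
- **Number last:** nothing is displaced, but the number lands at `20($sp)`, `24($sp)` or `28($sp)` depending on the argument count. That is consistent with the remark that this order needs one wrapper per count.
- **Number fifth:** the number sits at a fixed `16($sp)`, and only arguments 5-7 need copying.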
On 2017-08-16 11:54, Adhemerval Zanella wrote:
> Do you know which is the kernel version which this was not really required?
> I actually tested on a 3.2 on qemu (as it is the minimum one supported
> currently).

According to https://www.linux-mips.org/wiki/Syscall it's required up to
kernel 2.6.35.
On 2017-08-16 11:13, Adhemerval Zanella wrote:
> [...]
> What about the below? I can use some help to see if I am handling all the
> required ABI requirements for the __libc_do_syscall, but on an qemu emulated

Do we actually have to follow the ABI requirements if we control both the
caller of __libc_do_syscall and the function itself? The i386 and ARM
versions seem to pass as much as possible in the right registers and the
other values in some other way.

For MIPS, it means we can pass v0 and a0-a3 in the correct registers and
use __libc_do_syscall just to set up the values on the stack. Something
like this, for example:

ENTRY(__libc_do_syscall)
	PTR_SUBU sp, 32
	cfi_adjust_cfa_offset(32)

	.set noreorder
	REG_S s2, 16(sp)
	REG_S s3, 20(sp)
	REG_S s4, 24(sp)
	syscall
	.set reorder

	PTR_SUBU sp, -32
	cfi_adjust_cfa_offset(-32)
	ret
END (__libc_do_syscall)

On the caller side the 5th and following arguments should be passed in
s2, s3 and s4. s1 can be used to save ra around the subroutine call.
On Wed, 16 Aug 2017, Aurelien Jarno wrote:

> > Do you know which is the kernel version which this was not really required?
> > I actually tested on a 3.2 on qemu (as it is the minimum one supported
> > currently).
>
> According to https://www.linux-mips.org/wiki/Syscall it's required up to
> kernel 2.6.35.

Well, there is this very comment in the source file concerned:

   The convention was relaxed in Linux with a change applied to the kernel
   GIT repository as commit 96187fb0bc30cd7919759d371d810e928048249d, that
   first appeared in the 2.6.36 release. Since then the kernel has had
   code that reloads $v0 upon syscall restart and resumes right at the
   SYSCALL instruction, so no special arrangement is needed anymore.

(which is "MIPS: Sanitize restart logics", dated Sep 28, 2010).

  Maciej
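Maciej's answer can be turned into a compile-time style check. The relaxed restart convention first appeared in 2.6.36, and glibc's minimum supported kernel mentioned earlier in the thread is 3.2. Under those two facts, loading `$v0` immediately before SYSCALL is no longer required. The sketch below is hypothetical. It reproduces the traditional `KERNEL_VERSION` encoding from `<linux/version.h>` under its own name so it builds without kernel headers, and the helper name is invented for illustration.

```c
#include <assert.h>

/* Traditional encoding of KERNEL_VERSION from <linux/version.h>,
   redefined under a hypothetical name so no kernel headers are needed.  */
#define KVER(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* First release containing kernel commit
   96187fb0bc30cd7919759d371d810e928048249d ("MIPS: Sanitize restart
   logics"), after which the kernel reloads $v0 itself on syscall
   restart.  */
#define KVER_MIPS_RESTART_RELAXED KVER (2, 6, 36)

/* Hypothetical helper: nonzero if code that must run on kernels as old
   as MIN_KERNEL still has to load $v0 ($2) immediately before the
   SYSCALL instruction.  */
int
needs_v0_load_before_syscall (int min_kernel)
{
  assert (min_kernel > 0);
  return min_kernel < KVER_MIPS_RESTART_RELAXED;
}
```

With a minimum of `KVER (3, 2, 0)`, the helper returns 0. Code that still targets 2.6.35 or older would need the load.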
On 16/08/2017 18:15, Aurelien Jarno wrote:
> On 2017-08-16 11:13, Adhemerval Zanella wrote:
>> [...]
>> What about the below? I can use some help to see if I am handling all the
>> required ABI requirements for the __libc_do_syscall, but on an qemu emulated
>
> Do we actually have to follow the ABI requirements if we control both
> the caller of __libc_do_syscall and the function itself? The i386 and
> arm version seem to pass as much as possible in the right registers and
> the other values and other way.
>
> For MIPS, it means we can pass v0, a0-a3 in the correct registers and
> use __libc_do_syscall to just setup the values on the stack. Something
> like that for example:
>

We do not really need to follow the ABI requirements. The only
requirement is that the backtrace unwinds correctly so that cancellation
works. However, to allow this optimization we would need to take care of
a different calling convention for internal symbols (I noted that for PIC
code MIPS adds a GOT reference plus an R_MIPS_JALR relocation, which the
linker might relax later).

I think we should aim for simplicity, use C as much as we can, and
optimize this with more asm hackery only if we really need to squeeze
specific cycles out of the syscall (which I think is overkill for most,
if not all, of them).

Currently, with this patch, __libc_do_syscall is called by pread, pwrite,
lseek, llseek, ppoll, posix_fadvise, posix_fallocate, sync_file_range,
fallocate, preadv, pwritev, preadv2, pwritev2, select, pselect, mmap,
readahead, epoll_pwait, splice, recvfrom, sendto, recvmmsg, msgsnd,
msgrcv, msgget, msgctl, semop, semget, semctl, semtimedop, shmat, shmdt,
shmget, and shmctl. All of them, with the possible exception of
posix_fadvise and the SysV *ctl calls, are blocking calls, where trying
to save a few cycles really won't make any difference IMHO. Also, the
context switch is usually the largest factor in latency.
> ENTRY(__libc_do_syscall)
> 	PTR_SUBU sp, 32
> 	cfi_adjust_cfa_offset(32)
>
> 	.set noreorder
> 	REG_S s2, 16(sp)
> 	REG_S s3, 20(sp)
> 	REG_S s4, 24(sp)
> 	syscall
> 	.set reorder
>
> 	PTR_SUBU sp, -32
> 	cfi_adjust_cfa_offset(-32)
> 	ret
> END (__libc_do_syscall)
>
>
> On the caller side the 5th and following arguments should be passed in
> s2, s3, s4. s1 can be used to save ra around the subroutine call.
>
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/Makefile b/sysdeps/unix/sysv/linux/mips/mips32/Makefile
index 33b4615..cbdf032 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/Makefile
+++ b/sysdeps/unix/sysv/linux/mips/mips32/Makefile
@@ -1,8 +1,26 @@
+ifeq ($(subdir),elf)
+sysdep-dl-routines += libc-do-syscall
+endif
+
 ifeq ($(subdir),conform)
 # For bugs 17786 and 21278.
 conformtest-xfail-conds += mips-o32-linux
 endif
 
+ifeq ($(subdir),io)
+sysdep_routines += libc-do-syscall
+endif
+
+ifeq ($(subdir),nptl)
+libpthread-sysdep_routines += libc-do-syscall
+libpthread-shared-only-routines += libc-do-syscall
+endif
+
+ifeq ($(subdir),rt)
+librt-sysdep_routines += libc-do-syscall
+librt-shared-only-routines += libc-do-syscall
+endif
+
 ifeq ($(subdir),stdlib)
 tests += bug-getcontext-mips-gp
 endif
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/libc-do-syscall.S b/sysdeps/unix/sysv/linux/mips/mips32/libc-do-syscall.S
new file mode 100644
index 0000000..a7184d9
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/mips/mips32/libc-do-syscall.S
@@ -0,0 +1,54 @@
+/* Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sys/asm.h>
+#include <sysdep.h>
+#include <asm/unistd.h>
+#include <sgidefs.h>
+
+
+/* long int __libc_do_syscall (long int, ...)  */
+
+#define FRAMESZ 32
+	.text
+	.set	nomips16
+	.hidden __libc_do_syscall
+ENTRY(__libc_do_syscall)
+	move	$2, $4
+	move	$4, $5
+	move	$5, $6
+	move	$6, $7
+	lw	$7, 16(sp)
+	lw	$8, 20(sp)
+	lw	$9, 24(sp)
+	lw	$10,28(sp)
+	.set	noreorder
+	PTR_SUBU sp, FRAMESZ
+	cfi_adjust_cfa_offset (FRAMESZ)
+	sw	$8, 16(sp)
+	sw	$9, 20(sp)
+	sw	$10,24(sp)
+	syscall
+	PTR_ADDU sp, FRAMESZ
+	cfi_adjust_cfa_offset (-FRAMESZ)
+	.set	reorder
+	beq	$7, $0, 1f
+	subu	$2, $0, $2
+1:	jr	ra
+	nop
+END (__libc_do_syscall)
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h b/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h
index e9e3ee7..3a8920a 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h
+++ b/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h
@@ -121,13 +121,13 @@
 # define INTERNAL_SYSCALL_MIPS16(number, err, nr, args...)		\
 	internal_syscall##nr ("lw\t%0, %2\n\t",				\
 			      "R" (number),				\
-			      0, err, args)
+			      SYS_ify(name), err, args)
 
 #else /* !__mips16 */
 # define INTERNAL_SYSCALL(name, err, nr, args...)			\
 	internal_syscall##nr ("li\t%0, %2\t\t\t# " #name "\n\t",	\
 			      "IK" (SYS_ify (name)),			\
-			      0, err, args)
+			      SYS_ify(name), err, args)
 
 # define INTERNAL_SYSCALL_NCS(number, err, nr, args...)			\
 	internal_syscall##nr (MOVE32 "\t%0, %2\n\t",			\
@@ -136,6 +136,7 @@
 
 #endif /* !__mips16 */
 
+
 #define internal_syscall0(v0_init, input, number, err, dummy...)	\
 ({ 									\
 	long _sys_result;						\
@@ -262,109 +263,41 @@
 	_sys_result;							\
 })
 
-/* We need to use a frame pointer for the functions in which we
-   adjust $sp around the syscall, or debug information and unwind
-   information will be $sp relative and thus wrong during the syscall.  As
-   of GCC 4.7, this is sufficient.  */
-#define FORCE_FRAME_POINTER						\
-  void *volatile __fp_force __attribute__ ((unused)) = alloca (4)
+long int __libc_do_syscall (long int, ...) attribute_hidden;
 
 #define internal_syscall5(v0_init, input, number, err,			\
 			  arg1, arg2, arg3, arg4, arg5)			\
 ({ 									\
-	long _sys_result;						\
-									\
-	FORCE_FRAME_POINTER;						\
-	{								\
-	register long __s0 asm ("$16") __attribute__ ((unused))		\
-	  = (number);							\
-	register long __v0 asm ("$2");					\
-	register long __a0 asm ("$4") = (long) (arg1);			\
-	register long __a1 asm ("$5") = (long) (arg2);			\
-	register long __a2 asm ("$6") = (long) (arg3);			\
-	register long __a3 asm ("$7") = (long) (arg4);			\
-	__asm__ volatile (						\
-	".set\tnoreorder\n\t"						\
-	"subu\t$29, 32\n\t"						\
-	"sw\t%6, 16($29)\n\t"						\
-	v0_init								\
-	"syscall\n\t"							\
-	"addiu\t$29, 32\n\t"						\
-	".set\treorder"							\
-	: "=r" (__v0), "+r" (__a3)					\
-	: input, "r" (__a0), "r" (__a1), "r" (__a2),			\
-	  "r" ((long) (arg5))						\
-	: __SYSCALL_CLOBBERS);						\
-	err = __a3;							\
-	_sys_result = __v0;						\
-	}								\
+	long int _sys_result;						\
+	_sys_result = __libc_do_syscall (number, arg1, arg2, arg3,	\
+					 arg4, arg5);			\
+	err = _sys_result > -4096UL ? 1 : 0;				\
+	if (err)							\
+	  _sys_result = -_sys_result;					\
 	_sys_result;							\
 })
 
 #define internal_syscall6(v0_init, input, number, err,			\
 			  arg1, arg2, arg3, arg4, arg5, arg6)		\
 ({ 									\
-	long _sys_result;						\
-									\
-	FORCE_FRAME_POINTER;						\
-	{								\
-	register long __s0 asm ("$16") __attribute__ ((unused))		\
-	  = (number);							\
-	register long __v0 asm ("$2");					\
-	register long __a0 asm ("$4") = (long) (arg1);			\
-	register long __a1 asm ("$5") = (long) (arg2);			\
-	register long __a2 asm ("$6") = (long) (arg3);			\
-	register long __a3 asm ("$7") = (long) (arg4);			\
-	__asm__ volatile (						\
-	".set\tnoreorder\n\t"						\
-	"subu\t$29, 32\n\t"						\
-	"sw\t%6, 16($29)\n\t"						\
-	"sw\t%7, 20($29)\n\t"						\
-	v0_init								\
-	"syscall\n\t"							\
-	"addiu\t$29, 32\n\t"						\
-	".set\treorder"							\
-	: "=r" (__v0), "+r" (__a3)					\
-	: input, "r" (__a0), "r" (__a1), "r" (__a2),			\
-	  "r" ((long) (arg5)), "r" ((long) (arg6))			\
-	: __SYSCALL_CLOBBERS);						\
-	err = __a3;							\
-	_sys_result = __v0;						\
-	}								\
+	long int _sys_result;						\
+	_sys_result = __libc_do_syscall (number, arg1, arg2, arg3,	\
+					 arg4, arg5, arg6);		\
+	err = _sys_result > -4096UL ? 1 : 0;				\
+	if (err)							\
+	  _sys_result = -_sys_result;					\
 	_sys_result;							\
 })
 
 #define internal_syscall7(v0_init, input, number, err,			\
 			  arg1, arg2, arg3, arg4, arg5, arg6, arg7)	\
 ({ 									\
-	long _sys_result;						\
-									\
-	FORCE_FRAME_POINTER;						\
-	{								\
-	register long __s0 asm ("$16") __attribute__ ((unused))		\
-	  = (number);							\
-	register long __v0 asm ("$2");					\
-	register long __a0 asm ("$4") = (long) (arg1);			\
-	register long __a1 asm ("$5") = (long) (arg2);			\
-	register long __a2 asm ("$6") = (long) (arg3);			\
-	register long __a3 asm ("$7") = (long) (arg4);			\
-	__asm__ volatile (						\
-	".set\tnoreorder\n\t"						\
-	"subu\t$29, 32\n\t"						\
-	"sw\t%6, 16($29)\n\t"						\
-	"sw\t%7, 20($29)\n\t"						\
-	"sw\t%8, 24($29)\n\t"						\
-	v0_init								\
-	"syscall\n\t"							\
-	"addiu\t$29, 32\n\t"						\
-	".set\treorder"							\
-	: "=r" (__v0), "+r" (__a3)					\
-	: input, "r" (__a0), "r" (__a1), "r" (__a2),			\
-	  "r" ((long) (arg5)), "r" ((long) (arg6)), "r" ((long) (arg7)) \
-	: __SYSCALL_CLOBBERS);						\
-	err = __a3;							\
-	_sys_result = __v0;						\
-	}								\
+	long int _sys_result;						\
+	_sys_result = __libc_do_syscall (number, arg1, arg2, arg3,	\
+					 arg4, arg5, arg6, arg7);	\
+	err = _sys_result > -4096UL ? 1 : 0;				\
+	if (err)							\
+	  _sys_result = -_sys_result;					\
 	_sys_result;							\
 })
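The patch splits error reporting between two places:

- The assembly tail of `__libc_do_syscall` (`beq $7, $0, 1f` / `subu $2, $0, $2`) negates `$v0` when the kernel sets `$a3`.
- The new `internal_syscall5/6/7` macros treat a result in the last 4095 values of the unsigned range as an error and negate it back to a positive errno.

The round trip can be modelled in portable C. This is a sketch. The `model_*` names are hypothetical. It assumes the o32 Linux convention that the kernel returns a positive errno in `$v0` with `$a3` set to 1 on failure, and `$a3` set to 0 on success.

```c
#include <assert.h>
#include <errno.h>

/* Model of the end of __libc_do_syscall in the patch:
       beq   $7, $0, 1f
       subu  $2, $0, $2
   $a3 ($7) is 1 on failure (with a positive errno in $v0) and 0 on
   success; on failure the wrapper returns -$v0.  */
long
model_do_syscall_tail (long v0, long a3)
{
  assert (a3 == 0 || a3 == 1);
  return a3 != 0 ? -v0 : v0;
}

/* Model of the C side of internal_syscall5/6/7: a result in the last
   4095 values of the unsigned range counts as an error and is turned
   back into a positive errno value.  */
static long
model_internal_syscall (long v0, long a3, int *err)
{
  long sys_result = model_do_syscall_tail (v0, a3);
  *err = (unsigned long) sys_result > -4096UL ? 1 : 0;
  if (*err)
    sys_result = -sys_result;
  return sys_result;
}

/* Accessors for the value and the error flag of one modelled call.  */
long
model_result (long v0, long a3)
{
  int err;
  return model_internal_syscall (v0, a3, &err);
}

int
model_err (long v0, long a3)
{
  int err;
  model_internal_syscall (v0, a3, &err);
  return err;
}
```

In this model, a success returning 5 comes back as 5 with no error. A failure with `EINVAL` comes back as `EINVAL` with the error flag set, matching what the removed inline asm produced via `err = __a3`.

One difference is visible, though. The C side re-derives the flag from the value instead of reading `$a3`. So a successful result whose bit pattern falls in that top range would be reported as an error: `model_err (-1, 0)` is 1. That is harmless only as long as none of the routed syscalls can return such a value on success.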