
mips/o32: fix internal_syscall5/6/7

Message ID 35e73925-532c-bc95-53ca-005f3dbd130b@linaro.org
State New

Commit Message

Adhemerval Zanella Aug. 16, 2017, 2:13 p.m. UTC
On 16/08/2017 10:44, Joseph Myers wrote:
> On Wed, 16 Aug 2017, Maciej W. Rozycki wrote:
> 
>> On Tue, 15 Aug 2017, Joseph Myers wrote:
>>
>>> In which case having a volatile integer variable with value 4, declaring a 
>>> VLA whose size is that variable, and storing a pointer to that VLA in a 
>>> variable, would be an alternative to alloca to force a frame pointer, but 
>>> with deallocation happening when the scope ends rather than the function 
>>> ending (and the syscall macro has its own scope, so using it inside a loop 
>>> wouldn't be a problem).
>>
>>  I suspect using volatile variables will cause unnecessary memory traffic.  
>> Passing the size specifier through an empty `asm' might give better code; 
>> also I think we can use 0 as the size requested, not to decrease the stack 
>> pointer unnecessarily, e.g.:
> 
> Sure, as long as (a) the compiler can't know the size is actually constant 
> and (b) it can't know the VLA isn't actually used, as if it can tell 
> either of those things it can optimize away the variable stack allocation.
> 
>>  Also I wonder if there's actually a dependable way to have GCC itself 
>> allocate the argument space we require.  For example if we set `s' to 1 
>> above instead for `internal_syscall6', then would `0($sp)' and `4($sp)' be 
>> valid to place arguments #5 and #6 at respectively without the subsequent 
>> $sp adjustment we currently have in the syscall `asm' or would it be UB?
> 
> You can't tell whether the compiler might have allocated other variables 
> on the stack after the dynamic adjustment - that is, whether any 
> particular offset from sp is in fact unused or not.
> 

What about the below? I could use some help checking that I am handling all the
ABI requirements for __libc_do_syscall, but on a qemu-emulated
system I see no regressions on basic tests (including some cancellation ones
from glibc, to check that the syscall is correctly unwound) and tst-rwlock15 also
no longer fails.
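
For reference, the VLA approach Joseph Myers describes in the quoted text could look roughly like the sketch below. This is not code from the thread: the function name `with_scoped_vla` and the loop are hypothetical, and whether a given compiler really sets up a frame pointer for it is an assumption that would have to be checked against the generated code.

```c
#include <assert.h>

/* Hypothetical illustration: allocate a VLA whose size the compiler
   cannot treat as a constant, inside a scope that is entered repeatedly.  */
static int with_scoped_vla(int iterations)
{
    int completed = 0;

    for (int i = 0; i < iterations; i++) {
        /* Volatile read: the compiler may not assume the value is 4.  */
        volatile int size = 4;
        char vla[size];

        /* Storing the address in a volatile pointer keeps the VLA "used",
           so the dynamic allocation is not trivially dead.  */
        char *volatile keep = vla;
        keep[0] = (char) i;

        completed++;
        /* The VLA is released here, when the scope ends, whereas memory
           obtained with alloca lives until the function returns.  */
    }
    return completed;
}
```

Calling `with_scoped_vla (1000)` runs the scope a thousand times. Because the VLA's lifetime ends with each iteration, the stack does not grow across iterations, which is the property that makes this usable inside a macro expanded in a loop.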

Comments

Maciej W. Rozycki Aug. 16, 2017, 2:46 p.m. UTC | #1
On Wed, 16 Aug 2017, Adhemerval Zanella wrote:

> +ENTRY(__libc_do_syscall)
> +        move    $2, $4
> +        move    $4, $5
> +        move    $5, $6
> +        move    $6, $7

 I'm not very keen on having a nested syscall function call, but if you do 
that, then please at least arrange the wrapper's arguments such that you 
don't have to shuffle them, i.e. I suggest placing the syscall number 
last.

 For historical reasons you may want to initialise $2 right before the 
SYSCALL instruction, although I take it we don't anymore support Linux 
kernels old enough to require it for the syscall restart convention (so it 
would mainly serve as a reference for those who need to write their own 
code supporting those old kernels, as people often blindly copy & paste 
existing pieces).

 Also the MIPS16 wrappers may require adjustment then in order not to 
execute a doubly nested function call unnecessarily, i.e. call 
`__libc_do_syscall' directly rather than through another wrapper.

  Maciej
Adhemerval Zanella Aug. 16, 2017, 2:54 p.m. UTC | #2
On 16/08/2017 11:46, Maciej W. Rozycki wrote:
> On Wed, 16 Aug 2017, Adhemerval Zanella wrote:
> 
>> +ENTRY(__libc_do_syscall)
>> +        move    $2, $4
>> +        move    $4, $5
>> +        move    $5, $6
>> +        move    $6, $7
> 
>  I'm not very keen on having a nested syscall function call, but if you do 
> that, then please at least arrange the wrapper's arguments such that you 
> don't have to shuffle them, i.e. I suggest placing the syscall number 
> last.

I aimed for simplicity here, since avoiding the shuffle would require three
specialized wrappers, one for each syscall convention (5/6/7).  I can do it,
but I still prefer to have only one entry point, since I think the possible
performance gains are not worth the extra maintenance burden.

> 
>  For historical reasons you may want to initialise $2 right before the 
> SYSCALL instruction, although I take it we don't anymore support Linux 
> kernels old enough to require it for the syscall restart convention (so it 
> would mainly serve as a reference for those who need to write their own 
> code supporting those old kernels, as people often blindly copy & paste 
> existing pieces).

Do you know from which kernel version this is no longer required?
I actually tested on a 3.2 kernel on qemu (as it is the minimum one currently
supported).

> 
>  Also the MIPS16 wrappers may require adjustment then in order not to 
> execute a doubly nested function call unnecessarily, i.e. call 
> `__libc_do_syscall' directly rather than through another wrapper.

I did not actually test MIPS16, nor build for it.  I would appreciate
any help here, since my MIPS ABI knowledge is limited.
Aurelien Jarno Aug. 16, 2017, 3:18 p.m. UTC | #3
On 2017-08-16 11:13, Adhemerval Zanella wrote:
> 
> 
> On 16/08/2017 10:44, Joseph Myers wrote:
> > On Wed, 16 Aug 2017, Maciej W. Rozycki wrote:
> > 
> >> On Tue, 15 Aug 2017, Joseph Myers wrote:
> >>
> >>> In which case having a volatile integer variable with value 4, declaring a 
> >>> VLA whose size is that variable, and storing a pointer to that VLA in a 
> >>> variable, would be an alternative to alloca to force a frame pointer, but 
> >>> with deallocation happening when the scope ends rather than the function 
> >>> ending (and the syscall macro has its own scope, so using it inside a loop 
> >>> wouldn't be a problem).
> >>
> >>  I suspect using volatile variables will cause unnecessary memory traffic.  
> >> Passing the size specifier through an empty `asm' might give better code; 
> >> also I think we can use 0 as the size requested, not to decrease the stack 
> >> pointer unnecessarily, e.g.:
> > 
> > Sure, as long as (a) the compiler can't know the size is actually constant 
> > and (b) it can't know the VLA isn't actually used, as if it can tell 
> > either of those things it can optimize away the variable stack allocation.
> > 
> >>  Also I wonder if there's actually a dependable way to have GCC itself 
> >> allocate the argument space we require.  For example if we set `s' to 1 
> >> above instead for `internal_syscall6', then would `0($sp)' and `4($sp)' be 
> >> valid to place arguments #5 and #6 at respectively without the subsequent 
> >> $sp adjustment we currently have in the syscall `asm' or would it be UB?
> > 
> > You can't tell whether the compiler might have allocated other variables 
> > on the stack after the dynamic adjustment - that is, whether any 
> > particular offset from sp is in fact unused or not.
> > 
> 
> What about the below? I could use some help checking that I am handling all the
> ABI requirements for __libc_do_syscall, but on a qemu-emulated
> system I see no regressions on basic tests (including some cancellation ones
> from glibc, to check that the syscall is correctly unwound) and tst-rwlock15 also
> no longer fails.

Thanks for this patch, I'll give it a try. I have been working on
something similar, however I only routed the syscalls with 5, 6 or 7
arguments to __libc_do_syscall. That way there is no performance
penalty for the other syscalls, which are the most used ones.
Aurelien Jarno Aug. 16, 2017, 4:12 p.m. UTC | #4
On 2017-08-16 11:54, Adhemerval Zanella wrote:
> 
> 
> On 16/08/2017 11:46, Maciej W. Rozycki wrote:
> > On Wed, 16 Aug 2017, Adhemerval Zanella wrote:
> > 
> >> +ENTRY(__libc_do_syscall)
> >> +        move    $2, $4
> >> +        move    $4, $5
> >> +        move    $5, $6
> >> +        move    $6, $7
> > 
> >  I'm not very keen on having a nested syscall function call, but if you do 
> > that, then please at least arrange the wrapper's arguments such that you 
> > don't have to shuffle them, i.e. I suggest placing the syscall number 
> > last.
> 
> I aimed for simplicity here, since avoiding the shuffle would require three
> specialized wrappers, one for each syscall convention (5/6/7).  I can do it,
> but I still prefer to have only one entry point, since I think the possible
> performance gains are not worth the extra maintenance burden.

Thinking about that, if the __libc_do_syscall routine is only used for
syscalls with 5/6/7 arguments, the syscall number can be passed as the
5th argument (the first on the stack), between arguments 4 and 5. That
way arguments 1 to 4 are already in the right registers and the others
need to be copied anyway.
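
The effect of the argument order can be illustrated with a small model of o32 argument placement. It is only a sketch: it assumes every argument is a single word-sized integer (so the first four land in $4-$7 and the rest on the stack from 16(sp) upwards), and the names `o32_arg_slot` and `fixups_needed` are hypothetical helpers, not glibc code.

```c
#include <assert.h>

/* Where the n-th (0-based) word-sized argument of an o32 call lives.  */
struct o32_slot {
    int reg;        /* 4..7 for $a0-$a3, or -1 if passed on the stack */
    int sp_offset;  /* byte offset from $sp at the call, or -1 if in a register */
};

static struct o32_slot o32_arg_slot(int index)
{
    struct o32_slot slot;

    if (index < 4) {
        slot.reg = 4 + index;
        slot.sp_offset = -1;
    } else {
        slot.reg = -1;
        slot.sp_offset = 4 * index;  /* the 5th argument sits at 16(sp) */
    }
    return slot;
}

/* Count how many of the syscall's arg1..arg4 are NOT already in the
   register the kernel expects ($4..$7) when the syscall number occupies
   C argument position number_pos of the wrapper call.  */
static int fixups_needed(int number_pos)
{
    int fixups = 0;

    for (int k = 0; k < 4; k++) {
        /* Arguments at or after the number's position are shifted by one.  */
        int c_index = k < number_pos ? k : k + 1;
        if (o32_arg_slot(c_index).reg != 4 + k)
            fixups++;
    }
    return fixups;
}
```

With the number first (`fixups_needed (0)`), all four register arguments have to be moved or reloaded, which corresponds to the three `move` instructions and the `lw $7, 16(sp)` in the posted wrapper. With the number in the 5th position (`fixups_needed (4)`), none of them do.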
Aurelien Jarno Aug. 16, 2017, 9:07 p.m. UTC | #5
On 2017-08-16 11:54, Adhemerval Zanella wrote:
> Do you know from which kernel version this is no longer required?
> I actually tested on a 3.2 kernel on qemu (as it is the minimum one currently
> supported).

According to https://www.linux-mips.org/wiki/Syscall it's required up to
kernel 2.6.35.
Aurelien Jarno Aug. 16, 2017, 9:15 p.m. UTC | #6
On 2017-08-16 11:13, Adhemerval Zanella wrote:
> 
> 
> On 16/08/2017 10:44, Joseph Myers wrote:
> > On Wed, 16 Aug 2017, Maciej W. Rozycki wrote:
> > 
> >> On Tue, 15 Aug 2017, Joseph Myers wrote:
> >>
> >>> In which case having a volatile integer variable with value 4, declaring a 
> >>> VLA whose size is that variable, and storing a pointer to that VLA in a 
> >>> variable, would be an alternative to alloca to force a frame pointer, but 
> >>> with deallocation happening when the scope ends rather than the function 
> >>> ending (and the syscall macro has its own scope, so using it inside a loop 
> >>> wouldn't be a problem).
> >>
> >>  I suspect using volatile variables will cause unnecessary memory traffic.  
> >> Passing the size specifier through an empty `asm' might give better code; 
> >> also I think we can use 0 as the size requested, not to decrease the stack 
> >> pointer unnecessarily, e.g.:
> > 
> > Sure, as long as (a) the compiler can't know the size is actually constant 
> > and (b) it can't know the VLA isn't actually used, as if it can tell 
> > either of those things it can optimize away the variable stack allocation.
> > 
> >>  Also I wonder if there's actually a dependable way to have GCC itself 
> >> allocate the argument space we require.  For example if we set `s' to 1 
> >> above instead for `internal_syscall6', then would `0($sp)' and `4($sp)' be 
> >> valid to place arguments #5 and #6 at respectively without the subsequent 
> >> $sp adjustment we currently have in the syscall `asm' or would it be UB?
> > 
> > You can't tell whether the compiler might have allocated other variables 
> > on the stack after the dynamic adjustment - that is, whether any 
> > particular offset from sp is in fact unused or not.
> > 
> 
> What about the below? I could use some help checking that I am handling all the
> ABI requirements for __libc_do_syscall, but on a qemu-emulated

Do we actually have to follow the ABI requirements if we control both
the caller of __libc_do_syscall and the function itself? The i386 and
arm versions seem to pass as much as possible in the right registers and
the other values some other way.

For MIPS, it means we can pass v0, a0-a3 in the correct registers and
use __libc_do_syscall just to set up the values on the stack. Something
like this, for example:

ENTRY(__libc_do_syscall)
       PTR_SUBU sp, 32
       cfi_adjust_cfa_offset(32)

       .set noreorder
       REG_S s2, 16(sp)
       REG_S s3, 20(sp)
       REG_S s4, 24(sp)
       syscall
       .set reorder

       PTR_SUBU sp, -32
       cfi_adjust_cfa_offset(-32)
       ret
END (__libc_do_syscall)


On the caller side the 5th and following arguments should be passed in
s2, s3, s4. s1 can be used to save ra around the subroutine call.
Maciej W. Rozycki Aug. 16, 2017, 10:10 p.m. UTC | #7
On Wed, 16 Aug 2017, Aurelien Jarno wrote:

> > Do you know from which kernel version this is no longer required?
> > I actually tested on a 3.2 kernel on qemu (as it is the minimum one currently
> > supported).
> 
> According to https://www.linux-mips.org/wiki/Syscall it's required up to
> kernel 2.6.35.

 Well, there is this very comment in the source file concerned:

   The convention was relaxed in Linux with a change applied to the kernel
   GIT repository as commit 96187fb0bc30cd7919759d371d810e928048249d, that
   first appeared in the 2.6.36 release.  Since then the kernel has had
   code that reloads $v0 upon syscall restart and resumes right at the
   SYSCALL instruction, so no special arrangement is needed anymore.

(which is "MIPS: Sanitize restart logics", dated Sep 28, 2010).

  Maciej
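
Read together, the wiki reference and the source comment put the cut-off at the 2.6.36 release. A minimal sketch of that threshold check is shown below; the `KVER` macro and the function name are hypothetical and only encode what the two messages state.

```c
#include <assert.h>

/* Hypothetical helper that packs a kernel version into one integer so
   that versions can be compared numerically.  */
#define KVER(major, minor, patch) (((major) << 16) + ((minor) << 8) + (patch))

/* Returns 1 if the kernel predates the relaxed restart convention that,
   per the comment quoted above, first appeared in 2.6.36, i.e. if $2
   would have to be initialised right before the SYSCALL instruction.  */
static int needs_v0_init_before_syscall(int major, int minor, int patch)
{
    return KVER(major, minor, patch) < KVER(2, 6, 36);
}
```

Under this reading, 2.6.35 still needs the old arrangement, while 2.6.36 and the 3.2 kernel used for testing do not.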
Adhemerval Zanella Aug. 17, 2017, 1:33 p.m. UTC | #8
On 16/08/2017 18:15, Aurelien Jarno wrote:
> On 2017-08-16 11:13, Adhemerval Zanella wrote:
>>
>>
>> On 16/08/2017 10:44, Joseph Myers wrote:
>>> On Wed, 16 Aug 2017, Maciej W. Rozycki wrote:
>>>
>>>> On Tue, 15 Aug 2017, Joseph Myers wrote:
>>>>
>>>>> In which case having a volatile integer variable with value 4, declaring a 
>>>>> VLA whose size is that variable, and storing a pointer to that VLA in a 
>>>>> variable, would be an alternative to alloca to force a frame pointer, but 
>>>>> with deallocation happening when the scope ends rather than the function 
>>>>> ending (and the syscall macro has its own scope, so using it inside a loop 
>>>>> wouldn't be a problem).
>>>>
>>>>  I suspect using volatile variables will cause unnecessary memory traffic.  
>>>> Passing the size specifier through an empty `asm' might give better code; 
>>>> also I think we can use 0 as the size requested, not to decrease the stack 
>>>> pointer unnecessarily, e.g.:
>>>
>>> Sure, as long as (a) the compiler can't know the size is actually constant 
>>> and (b) it can't know the VLA isn't actually used, as if it can tell 
>>> either of those things it can optimize away the variable stack allocation.
>>>
>>>>  Also I wonder if there's actually a dependable way to have GCC itself 
>>>> allocate the argument space we require.  For example if we set `s' to 1 
>>>> above instead for `internal_syscall6', then would `0($sp)' and `4($sp)' be 
>>>> valid to place arguments #5 and #6 at respectively without the subsequent 
>>>> $sp adjustment we currently have in the syscall `asm' or would it be UB?
>>>
>>> You can't tell whether the compiler might have allocated other variables 
>>> on the stack after the dynamic adjustment - that is, whether any 
>>> particular offset from sp is in fact unused or not.
>>>
>>
>> What about the below? I could use some help checking that I am handling all the
>> ABI requirements for __libc_do_syscall, but on a qemu-emulated
> 
> Do we actually have to follow the ABI requirements if we control both
> the caller of __libc_do_syscall and the function itself? The i386 and
> arm versions seem to pass as much as possible in the right registers and
> the other values some other way.
> 
> For MIPS, it means we can pass v0, a0-a3 in the correct registers and
> use __libc_do_syscall just to set up the values on the stack. Something
> like this, for example:
> 

We do not really need to follow the ABI requirements; the only requirement is
that the backtrace unwinds correctly so that cancellation works.  However, to
allow this optimization we would need to take care of a different calling
convention for internal symbols (I noted that for PIC code MIPS adds
a GOT reference plus an R_MIPS_JALR, which the linker might relax later).

I think we should aim for simplicity and use as much C support as we can,
and optimize this with more asm hackery only if we really need to squeeze
cycles out of the syscall (which I really think is overkill for
most, if not all, of them).

Currently with this patch __libc_do_syscall is called on pread, pwrite, 
lseek, llseek, ppoll, posix_fadvise, posix_fallocate, sync_file_range, 
fallocate, preadv, pwritev, preadv2, pwritev2, select, pselect, mmap, 
readahead, epoll_pwait, splice, recvfrom, sendto, recvmmsg, msgsnd, msgrcv, 
msgget, msgctl, semop, semget, semctl, semtimedop, shmat, shmdt, shmget, 
and shmctl.  All of them, with the possible exception of posix_fadvise and
the sysv ctl calls, are blocking calls, for which shaving off a few cycles
really won't make any difference IMHO.  Also, the context switch is usually
the largest factor in latency.

> ENTRY(__libc_do_syscall)
>        PTR_SUBU sp, 32
>        cfi_adjust_cfa_offset(32)
> 
>        .set noreorder
>        REG_S s2, 16(sp)
>        REG_S s3, 20(sp)
>        REG_S s4, 24(sp)
>        syscall
>        .set reorder
> 
>        PTR_SUBU sp, -32
>        cfi_adjust_cfa_offset(-32)
>        ret
> END (__libc_do_syscall)
> 
> 
> On the caller side the 5th and following arguments should be passed in
> s2, s3, s4. s1 can be used to save ra around the subroutine call.
>

Patch

diff --git a/sysdeps/unix/sysv/linux/mips/mips32/Makefile b/sysdeps/unix/sysv/linux/mips/mips32/Makefile
index 33b4615..cbdf032 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/Makefile
+++ b/sysdeps/unix/sysv/linux/mips/mips32/Makefile
@@ -1,8 +1,26 @@ 
+ifeq ($(subdir),elf)
+sysdep-dl-routines += libc-do-syscall
+endif
+
 ifeq ($(subdir),conform)
 # For bugs 17786 and 21278.
 conformtest-xfail-conds += mips-o32-linux
 endif
 
+ifeq ($(subdir),io)
+sysdep_routines += libc-do-syscall
+endif
+
+ifeq ($(subdir),nptl)
+libpthread-sysdep_routines += libc-do-syscall
+libpthread-shared-only-routines += libc-do-syscall
+endif
+
+ifeq ($(subdir),rt)
+librt-sysdep_routines += libc-do-syscall
+librt-shared-only-routines += libc-do-syscall
+endif
+
 ifeq ($(subdir),stdlib)
 tests += bug-getcontext-mips-gp
 endif
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/libc-do-syscall.S b/sysdeps/unix/sysv/linux/mips/mips32/libc-do-syscall.S
new file mode 100644
index 0000000..a7184d9
--- /dev/null
+++ b/sysdeps/unix/sysv/linux/mips/mips32/libc-do-syscall.S
@@ -0,0 +1,54 @@ 
+/* Copyright (C) 2017 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library.  If not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <sys/asm.h>
+#include <sysdep.h>
+#include <asm/unistd.h>
+#include <sgidefs.h>
+
+
+/* long int __libc_do_syscall (long int, ...)  */
+
+#define FRAMESZ 32
+
+        .text
+        .set    nomips16
+	.hidden __libc_do_syscall
+ENTRY(__libc_do_syscall)
+        move    $2, $4
+        move    $4, $5
+        move    $5, $6
+        move    $6, $7
+        lw      $7, 16(sp)
+        lw      $8, 20(sp)
+        lw      $9, 24(sp)
+        lw      $10,28(sp)
+	.set 	noreorder
+	PTR_SUBU sp, FRAMESZ
+	cfi_adjust_cfa_offset (FRAMESZ)
+        sw      $8, 16(sp)
+        sw      $9, 20(sp)
+        sw      $10,24(sp)
+        syscall
+	PTR_ADDU sp, FRAMESZ
+	cfi_adjust_cfa_offset (-FRAMESZ)
+	.set	reorder
+        beq     $7, $0, 1f
+        subu    $2, $0, $2
+1:      jr      ra
+        nop
+END (__libc_do_syscall)
diff --git a/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h b/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h
index e9e3ee7..3a8920a 100644
--- a/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h
+++ b/sysdeps/unix/sysv/linux/mips/mips32/sysdep.h
@@ -121,13 +121,13 @@ 
 # define INTERNAL_SYSCALL_MIPS16(number, err, nr, args...)		\
 	internal_syscall##nr ("lw\t%0, %2\n\t",				\
 			      "R" (number),				\
-			      0, err, args)
+			      number, err, args)
 
 #else /* !__mips16 */
 # define INTERNAL_SYSCALL(name, err, nr, args...)			\
 	internal_syscall##nr ("li\t%0, %2\t\t\t# " #name "\n\t",	\
 			      "IK" (SYS_ify (name)),			\
-			      0, err, args)
+			      SYS_ify(name), err, args)
 
 # define INTERNAL_SYSCALL_NCS(number, err, nr, args...)			\
 	internal_syscall##nr (MOVE32 "\t%0, %2\n\t",			\
@@ -136,6 +136,7 @@ 
 
 #endif /* !__mips16 */
 
+
 #define internal_syscall0(v0_init, input, number, err, dummy...)	\
 ({									\
 	long _sys_result;						\
@@ -262,109 +263,41 @@ 
 	_sys_result;							\
 })
 
-/* We need to use a frame pointer for the functions in which we
-   adjust $sp around the syscall, or debug information and unwind
-   information will be $sp relative and thus wrong during the syscall.  As
-   of GCC 4.7, this is sufficient.  */
-#define FORCE_FRAME_POINTER						\
-  void *volatile __fp_force __attribute__ ((unused)) = alloca (4)
+long int __libc_do_syscall (long int, ...) attribute_hidden;
 
 #define internal_syscall5(v0_init, input, number, err,			\
 			  arg1, arg2, arg3, arg4, arg5)			\
 ({									\
-	long _sys_result;						\
-									\
-	FORCE_FRAME_POINTER;						\
-	{								\
-	register long __s0 asm ("$16") __attribute__ ((unused))		\
-	  = (number);							\
-	register long __v0 asm ("$2");					\
-	register long __a0 asm ("$4") = (long) (arg1);			\
-	register long __a1 asm ("$5") = (long) (arg2);			\
-	register long __a2 asm ("$6") = (long) (arg3);			\
-	register long __a3 asm ("$7") = (long) (arg4);			\
-	__asm__ volatile (						\
-	".set\tnoreorder\n\t"						\
-	"subu\t$29, 32\n\t"						\
-	"sw\t%6, 16($29)\n\t"						\
-	v0_init								\
-	"syscall\n\t"							\
-	"addiu\t$29, 32\n\t"						\
-	".set\treorder"							\
-	: "=r" (__v0), "+r" (__a3)					\
-	: input, "r" (__a0), "r" (__a1), "r" (__a2),			\
-	  "r" ((long) (arg5))						\
-	: __SYSCALL_CLOBBERS);						\
-	err = __a3;							\
-	_sys_result = __v0;						\
-	}								\
+	long int _sys_result;						\
+	_sys_result = __libc_do_syscall (number, arg1, arg2, arg3,	\
+					 arg4, arg5);			\
+	err = _sys_result > -4096UL ? 1 : 0;				\
+	if (err)							\
+	  _sys_result = -_sys_result;					\
 	_sys_result;							\
 })
 
 #define internal_syscall6(v0_init, input, number, err,			\
 			  arg1, arg2, arg3, arg4, arg5, arg6)		\
 ({									\
-	long _sys_result;						\
-									\
-	FORCE_FRAME_POINTER;						\
-	{								\
-	register long __s0 asm ("$16") __attribute__ ((unused))		\
-	  = (number);							\
-	register long __v0 asm ("$2");					\
-	register long __a0 asm ("$4") = (long) (arg1);			\
-	register long __a1 asm ("$5") = (long) (arg2);			\
-	register long __a2 asm ("$6") = (long) (arg3);			\
-	register long __a3 asm ("$7") = (long) (arg4);			\
-	__asm__ volatile (						\
-	".set\tnoreorder\n\t"						\
-	"subu\t$29, 32\n\t"						\
-	"sw\t%6, 16($29)\n\t"						\
-	"sw\t%7, 20($29)\n\t"						\
-	v0_init								\
-	"syscall\n\t"							\
-	"addiu\t$29, 32\n\t"						\
-	".set\treorder"							\
-	: "=r" (__v0), "+r" (__a3)					\
-	: input, "r" (__a0), "r" (__a1), "r" (__a2),			\
-	  "r" ((long) (arg5)), "r" ((long) (arg6))			\
-	: __SYSCALL_CLOBBERS);						\
-	err = __a3;							\
-	_sys_result = __v0;						\
-	}								\
+	long int _sys_result;						\
+	_sys_result = __libc_do_syscall (number, arg1, arg2, arg3,	\
+					 arg4, arg5, arg6);		\
+	err = _sys_result > -4096UL ? 1 : 0;				\
+	if (err)							\
+	  _sys_result = -_sys_result;					\
 	_sys_result;							\
 })
 
 #define internal_syscall7(v0_init, input, number, err,			\
 			  arg1, arg2, arg3, arg4, arg5, arg6, arg7)	\
 ({									\
-	long _sys_result;						\
-									\
-	FORCE_FRAME_POINTER;						\
-	{								\
-	register long __s0 asm ("$16") __attribute__ ((unused))		\
-	  = (number);							\
-	register long __v0 asm ("$2");					\
-	register long __a0 asm ("$4") = (long) (arg1);			\
-	register long __a1 asm ("$5") = (long) (arg2);			\
-	register long __a2 asm ("$6") = (long) (arg3);			\
-	register long __a3 asm ("$7") = (long) (arg4);			\
-	__asm__ volatile (						\
-	".set\tnoreorder\n\t"						\
-	"subu\t$29, 32\n\t"						\
-	"sw\t%6, 16($29)\n\t"						\
-	"sw\t%7, 20($29)\n\t"						\
-	"sw\t%8, 24($29)\n\t"						\
-	v0_init								\
-	"syscall\n\t"							\
-	"addiu\t$29, 32\n\t"						\
-	".set\treorder"							\
-	: "=r" (__v0), "+r" (__a3)					\
-	: input, "r" (__a0), "r" (__a1), "r" (__a2),			\
-	  "r" ((long) (arg5)), "r" ((long) (arg6)), "r" ((long) (arg7))	\
-	: __SYSCALL_CLOBBERS);						\
-	err = __a3;							\
-	_sys_result = __v0;						\
-	}								\
+	long int _sys_result;						\
+	_sys_result = __libc_do_syscall (number, arg1, arg2, arg3,	\
+					 arg4, arg5, arg6, arg7);	\
+	err = _sys_result > -4096UL ? 1 : 0;				\
+	if (err)							\
+	  _sys_result = -_sys_result;					\
 	_sys_result;							\
 })
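
The error decoding used by the new internal_syscall5/6/7 macros can be tried in isolation. The sketch below mirrors the `> -4096UL` comparison from the patch in a portable helper; the `struct sys_ret` type and the `decode_raw` name are hypothetical, and the raw value is assumed to be what `__libc_do_syscall` returns (a negated error number on failure).

```c
#include <assert.h>

/* Hypothetical result pair: the decoded value and the error flag.  */
struct sys_ret {
    long value;
    int err;
};

static struct sys_ret decode_raw(long raw)
{
    struct sys_ret ret;

    /* Comparing against -4096UL converts raw to unsigned long, so only
       raw values in the range [-4095, -1] compare greater.  */
    ret.err = (unsigned long) raw > -4096UL ? 1 : 0;

    /* On error the macros flip the sign back to a positive error number.  */
    ret.value = ret.err ? -raw : raw;
    return ret;
}
```

For example, `decode_raw (-2)` yields `err == 1` and `value == 2`, while a large negative value such as -4096 is left alone as a valid result, as is any non-negative value.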