mbox series

[RFC,v2,00/10] powerpc/32: switch VDSO to C implementation.

Message ID cover.1577111363.git.christophe.leroy@c-s.fr (mailing list archive)
Headers show
Series powerpc/32: switch VDSO to C implementation. | expand

Message

Christophe Leroy Dec. 23, 2019, 2:31 p.m. UTC
This is a second tentative to switch powerpc/32 vdso to generic C implementation.

It will likely not work on 64 bits or even build properly at the moment.

powerpc is a bit special for VDSO as well as system calls in the
way that it requires setting CR SO bit which cannot be done in C.
Therefore, entry/exit and fallback needs to be performed in ASM.

To allow that, the fallback calls are moved out of the common code
and left to the arches.

A few other changes in the common code have allowed performance improvement.

The performance has improved since first RFC, but it is still lower than
current assembly VDSO.

On a powerpc8xx, with current powerpc/32 ASM VDSO:

gettimeofday:    vdso: 737 nsec/call
clock-getres-realtime:    vdso: 475 nsec/call
clock-gettime-realtime:    vdso: 892 nsec/call
clock-getres-monotonic:    vdso: 475 nsec/call
clock-gettime-monotonic:    vdso: 1014 nsec/call

First try of C implementation:

gettimeofday:    vdso: 1533 nsec/call
clock-getres-realtime:    vdso: 853 nsec/call
clock-gettime-realtime:    vdso: 1570 nsec/call
clock-getres-monotonic:    vdso: 835 nsec/call
clock-gettime-monotonic:    vdso: 1605 nsec/call

With this series:

gettimeofday:    vdso: 1016 nsec/call
clock-getres-realtime:    vdso: 560 nsec/call
clock-gettime-realtime:    vdso: 1192 nsec/call
clock-getres-monotonic:    vdso: 560 nsec/call
clock-gettime-monotonic:    vdso: 1192 nsec/call


Changes made to other arches are untested, not even compiled.


Christophe Leroy (10):
  lib: vdso: ensure all arches have 32bit fallback
  lib: vdso: move call to fallback out of common code.
  lib: vdso: Change __cvdso_clock_gettime/getres_common() to
    __cvdso_clock_gettime/getres()
  lib: vdso: get pointer to vdso data from the arch
  lib: vdso: inline do_hres()
  lib: vdso: make do_coarse() return 0
  lib: vdso: don't use READ_ONCE() in __c_kernel_time()
  lib: vdso: Avoid duplication in __cvdso_clock_getres()
  powerpc/vdso32: inline __get_datapage()
  powerpc/32: Switch VDSO to C implementation.

 arch/arm/include/asm/vdso/gettimeofday.h          |  26 +++
 arch/arm/vdso/vgettimeofday.c                     |  32 ++-
 arch/arm64/include/asm/vdso/compat_gettimeofday.h |   2 -
 arch/arm64/include/asm/vdso/gettimeofday.h        |  26 +++
 arch/arm64/kernel/vdso/vgettimeofday.c            |  24 +-
 arch/arm64/kernel/vdso32/vgettimeofday.c          |  39 +++-
 arch/mips/include/asm/vdso/gettimeofday.h         |  28 ++-
 arch/mips/vdso/vgettimeofday.c                    |  56 ++++-
 arch/powerpc/Kconfig                              |   2 +
 arch/powerpc/include/asm/vdso/gettimeofday.h      |  45 ++++
 arch/powerpc/include/asm/vdso/vsyscall.h          |  27 +++
 arch/powerpc/include/asm/vdso_datapage.h          |  28 +--
 arch/powerpc/kernel/asm-offsets.c                 |  23 +-
 arch/powerpc/kernel/time.c                        |  92 +-------
 arch/powerpc/kernel/vdso.c                        |  19 +-
 arch/powerpc/kernel/vdso32/Makefile               |  19 +-
 arch/powerpc/kernel/vdso32/cacheflush.S           |   9 +-
 arch/powerpc/kernel/vdso32/datapage.S             |  28 +--
 arch/powerpc/kernel/vdso32/gettimeofday.S         | 265 +++-------------------
 arch/powerpc/kernel/vdso32/vgettimeofday.c        |  32 +++
 arch/x86/entry/vdso/vclock_gettime.c              |  52 ++++-
 arch/x86/include/asm/vdso/gettimeofday.h          |  28 ++-
 lib/vdso/gettimeofday.c                           | 100 +++-----
 23 files changed, 505 insertions(+), 497 deletions(-)
 create mode 100644 arch/powerpc/include/asm/vdso/gettimeofday.h
 create mode 100644 arch/powerpc/include/asm/vdso/vsyscall.h
 create mode 100644 arch/powerpc/kernel/vdso32/vgettimeofday.c

Comments

Christophe Leroy Jan. 9, 2020, 5:52 p.m. UTC | #1
Wondering why we get something so complicated/redundant for 
vdso_read_begin() <include/vdso/helpers.h>

static __always_inline u32 vdso_read_begin(const struct vdso_data *vd)
{
	u32 seq;

	while ((seq = READ_ONCE(vd->seq)) & 1)
		cpu_relax();

	smp_rmb();
	return seq;
}


  6e0:   81 05 00 f0     lwz     r8,240(r5)
  6e4:   71 09 00 01     andi.   r9,r8,1
  6e8:   41 82 00 10     beq     6f8 <__c_kernel_clock_gettime+0x158>
  6ec:   81 05 00 f0     lwz     r8,240(r5)
  6f0:   71 0a 00 01     andi.   r10,r8,1
  6f4:   40 82 ff f8     bne     6ec <__c_kernel_clock_gettime+0x14c>
  6f8:

r5 being vd pointer

Why the first triplet, not only the second triplet ? Something wrong 
with using READ_ONCE() for that ?

Christophe
Segher Boessenkool Jan. 9, 2020, 8:07 p.m. UTC | #2
On Thu, Jan 09, 2020 at 05:52:34PM +0000, Christophe Leroy wrote:
> Wondering why we get something so complicated/redundant for 
> vdso_read_begin() <include/vdso/helpers.h>
> 
> static __always_inline u32 vdso_read_begin(const struct vdso_data *vd)
> {
> 	u32 seq;
> 
> 	while ((seq = READ_ONCE(vd->seq)) & 1)
> 		cpu_relax();
> 
> 	smp_rmb();
> 	return seq;
> }
> 
> 
>  6e0:   81 05 00 f0     lwz     r8,240(r5)
>  6e4:   71 09 00 01     andi.   r9,r8,1
>  6e8:   41 82 00 10     beq     6f8 <__c_kernel_clock_gettime+0x158>
>  6ec:   81 05 00 f0     lwz     r8,240(r5)
>  6f0:   71 0a 00 01     andi.   r10,r8,1
>  6f4:   40 82 ff f8     bne     6ec <__c_kernel_clock_gettime+0x14c>
>  6f8:
> 
> r5 being vd pointer
> 
> Why the first triplet, not only the second triplet ? Something wrong 
> with using READ_ONCE() for that ?

It looks like the compiler did loop peeling.  What GCC version is this?
Please try current trunk (to become GCC 10), or at least GCC 9?


Segher
Christophe Leroy Jan. 10, 2020, 6:45 a.m. UTC | #3
Le 09/01/2020 à 21:07, Segher Boessenkool a écrit :
> On Thu, Jan 09, 2020 at 05:52:34PM +0000, Christophe Leroy wrote:
>> Wondering why we get something so complicated/redundant for
>> vdso_read_begin() <include/vdso/helpers.h>
>>
>> static __always_inline u32 vdso_read_begin(const struct vdso_data *vd)
>> {
>> 	u32 seq;
>>
>> 	while ((seq = READ_ONCE(vd->seq)) & 1)
>> 		cpu_relax();
>>
>> 	smp_rmb();
>> 	return seq;
>> }
>>
>>
>>   6e0:   81 05 00 f0     lwz     r8,240(r5)
>>   6e4:   71 09 00 01     andi.   r9,r8,1
>>   6e8:   41 82 00 10     beq     6f8 <__c_kernel_clock_gettime+0x158>
>>   6ec:   81 05 00 f0     lwz     r8,240(r5)
>>   6f0:   71 0a 00 01     andi.   r10,r8,1
>>   6f4:   40 82 ff f8     bne     6ec <__c_kernel_clock_gettime+0x14c>
>>   6f8:
>>
>> r5 being vd pointer
>>
>> Why the first triplet, not only the second triplet ? Something wrong
>> with using READ_ONCE() for that ?
> 
> It looks like the compiler did loop peeling.  What GCC version is this?
> Please try current trunk (to become GCC 10), or at least GCC 9?
> 

It is with GCC 5.5

https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more 
recent than 8.1

With 8.1, the problem doesn't show up.

Thanks
Christophe
Segher Boessenkool Jan. 11, 2020, 11:33 a.m. UTC | #4
On Fri, Jan 10, 2020 at 07:45:44AM +0100, Christophe Leroy wrote:
> Le 09/01/2020 à 21:07, Segher Boessenkool a écrit :
> >It looks like the compiler did loop peeling.  What GCC version is this?
> >Please try current trunk (to become GCC 10), or at least GCC 9?
> 
> It is with GCC 5.5
> 
> https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more 
> recent than 8.1

Arnd, can you update the tools?  We are at 8.3 and 9.2 now :-)  Or is
this hard and/or painful to do?

> With 8.1, the problem doesn't show up.

Good to hear that.  Hrm, I wonder when it was fixed...


Segher
Arnd Bergmann Feb. 16, 2020, 6:10 p.m. UTC | #5
On Sat, Jan 11, 2020 at 12:33 PM Segher Boessenkool
<segher@kernel.crashing.org> wrote:
>
> On Fri, Jan 10, 2020 at 07:45:44AM +0100, Christophe Leroy wrote:
> > Le 09/01/2020 à 21:07, Segher Boessenkool a écrit :
> > >It looks like the compiler did loop peeling.  What GCC version is this?
> > >Please try current trunk (to become GCC 10), or at least GCC 9?
> >
> > It is with GCC 5.5
> >
> > https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more
> > recent than 8.1
>
> Arnd, can you update the tools?  We are at 8.3 and 9.2 now :-)  Or is
> this hard and/or painful to do?

To follow up on this older thread, I have now uploaded 6.5, 7.5, 8.3 and 9.2
binaries, as well as a recent 10.0 snapshot.

I hope these work, let me know if there are problems.

       Arnd
Christophe Leroy Feb. 19, 2020, 8:45 a.m. UTC | #6
Le 16/02/2020 à 19:10, Arnd Bergmann a écrit :
> On Sat, Jan 11, 2020 at 12:33 PM Segher Boessenkool
> <segher@kernel.crashing.org> wrote:
>>
>> On Fri, Jan 10, 2020 at 07:45:44AM +0100, Christophe Leroy wrote:
>>> Le 09/01/2020 à 21:07, Segher Boessenkool a écrit :
>>>> It looks like the compiler did loop peeling.  What GCC version is this?
>>>> Please try current trunk (to become GCC 10), or at least GCC 9?
>>>
>>> It is with GCC 5.5
>>>
>>> https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more
>>> recent than 8.1
>>
>> Arnd, can you update the tools?  We are at 8.3 and 9.2 now :-)  Or is
>> this hard and/or painful to do?
> 
> To follow up on this older thread, I have now uploaded 6.5, 7.5, 8.3 and 9.2
> binaries, as well as a recent 10.0 snapshot.
> 

Thanks Arnd,

I have built the VDSO with 9.2, I get less performant result than with 
8.2 (same performance as with 5.5).

After a quick look, I see:
- Irrelevant NOPs to align loops and stuff, allthough -mpcu=860 should 
avoid that.
- A stack frame is set for saving r31 in __c_kernel_clock_gettime. GCC 
8.1 don't need that, all VDSO functions are frameless with 8.1

Christophe
Arnd Bergmann Feb. 19, 2020, 9:52 a.m. UTC | #7
On Wed, Feb 19, 2020 at 9:45 AM Christophe Leroy
<christophe.leroy@c-s.fr> wrote:
> Le 16/02/2020 à 19:10, Arnd Bergmann a écrit :
> > On Sat, Jan 11, 2020 at 12:33 PM Segher Boessenkool
> > <segher@kernel.crashing.org> wrote:
> >>
> >> On Fri, Jan 10, 2020 at 07:45:44AM +0100, Christophe Leroy wrote:
> >>> Le 09/01/2020 à 21:07, Segher Boessenkool a écrit :
> >>>> It looks like the compiler did loop peeling.  What GCC version is this?
> >>>> Please try current trunk (to become GCC 10), or at least GCC 9?
> >>>
> >>> It is with GCC 5.5
> >>>
> >>> https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more
> >>> recent than 8.1
> >>
> >> Arnd, can you update the tools?  We are at 8.3 and 9.2 now :-)  Or is
> >> this hard and/or painful to do?
> >
> > To follow up on this older thread, I have now uploaded 6.5, 7.5, 8.3 and 9.2
> > binaries, as well as a recent 10.0 snapshot.
> >
>
> Thanks Arnd,
>
> I have built the VDSO with 9.2, I get less performant result than with
> 8.2 (same performance as with 5.5).
>
> After a quick look, I see:
> - Irrelevant NOPs to align loops and stuff, allthough -mpcu=860 should
> avoid that.
> - A stack frame is set for saving r31 in __c_kernel_clock_gettime. GCC
> 8.1 don't need that, all VDSO functions are frameless with 8.1

If you think it should be fixed in gcc, maybe try to reproduce it in
https://godbolt.org/ and open a gcc bug against that.

Also, please try the gcc-10 snapshot, which has the highest chance
of getting fixes if it shows the same issue (or worse).

     Arnd
Segher Boessenkool Feb. 19, 2020, 1:08 p.m. UTC | #8
On Wed, Feb 19, 2020 at 10:52:16AM +0100, Arnd Bergmann wrote:
> On Wed, Feb 19, 2020 at 9:45 AM Christophe Leroy
> <christophe.leroy@c-s.fr> wrote:
> > Le 16/02/2020 à 19:10, Arnd Bergmann a écrit :
> > > On Sat, Jan 11, 2020 at 12:33 PM Segher Boessenkool
> > > <segher@kernel.crashing.org> wrote:
> > >>
> > >> On Fri, Jan 10, 2020 at 07:45:44AM +0100, Christophe Leroy wrote:
> > >>> Le 09/01/2020 à 21:07, Segher Boessenkool a écrit :
> > >>>> It looks like the compiler did loop peeling.  What GCC version is this?
> > >>>> Please try current trunk (to become GCC 10), or at least GCC 9?
> > >>>
> > >>> It is with GCC 5.5
> > >>>
> > >>> https://mirrors.edge.kernel.org/pub/tools/crosstool/ doesn't have more
> > >>> recent than 8.1
> > >>
> > >> Arnd, can you update the tools?  We are at 8.3 and 9.2 now :-)  Or is
> > >> this hard and/or painful to do?
> > >
> > > To follow up on this older thread, I have now uploaded 6.5, 7.5, 8.3 and 9.2
> > > binaries, as well as a recent 10.0 snapshot.
> > >
> >
> > Thanks Arnd,
> >
> > I have built the VDSO with 9.2, I get less performant result than with
> > 8.2 (same performance as with 5.5).
> >
> > After a quick look, I see:
> > - Irrelevant NOPs to align loops and stuff, allthough -mpcu=860 should
> > avoid that.
> > - A stack frame is set for saving r31 in __c_kernel_clock_gettime. GCC
> > 8.1 don't need that, all VDSO functions are frameless with 8.1
> 
> If you think it should be fixed in gcc, maybe try to reproduce it in
> https://godbolt.org/

(Feel free to skip this step; and don't put links to godbolt (or anything
else external) in our bugzilla, please; such links go stale before you
know it.)

> and open a gcc bug against that.

Yes please :-)

> Also, please try the gcc-10 snapshot, which has the highest chance
> of getting fixes if it shows the same issue (or worse).

If it is a regression, chances are it will be backported.  (But not to
9.3, which is due in just a few weeks, just like 8.4).  If it is just a
side effect of some other change, it will probably *not* be undone, not
on trunk (GCC 10) either.  It depends.

But sure, always test trunk if you can.


Segher