mbox series

[00/10] implement DYNAMIC_DEBUG_RELATIVE_POINTERS

Message ID 20190409212517.7321-1-linux@rasmusvillemoes.dk
Headers show
Series implement DYNAMIC_DEBUG_RELATIVE_POINTERS | expand

Message

Rasmus Villemoes April 9, 2019, 9:25 p.m. UTC
Similar to CONFIG_GENERIC_BUG_RELATIVE_POINTERS that replaces (8 byte)
const char* members by (4 byte) signed offsets from the bug_entry,
this implements the similar thing for struct _ddebug, the descriptors
underlying pr_debug() and friends in a CONFIG_DYNAMIC_DEBUG kernel.

Since struct _ddebug has four string members, we save 16 byte per
instance. The total savings seem to be comparable to what is saved by
an architecture selecting GENERIC_BUG_RELATIVE_POINTERS (see patch 8
for some numbers for a common distro config).

While refreshing these patches, which were orignally just targeted at
x86-64, it occured to me that despite the implementation relying on
inline asm, there's nothing x86 specific about it, and indeed it seems
to work out-of-the-box for ppc64 and arm64 as well, but those have
only been compile-tested.

The first 6 patches are rather pedestrian preparations. The fun stuff
is in patch 7, and the remaining three patches are just the minimal
boilerplate to hook up the config option and asm-generic header on the
three architectures.

Rasmus Villemoes (10):
  linux/device.h: use unique identifier for each struct _ddebug
  linux/net.h: use unique identifier for each struct _ddebug
  linux/printk.h: use unique identifier for each struct _ddebug
  dynamic_debug: introduce accessors for string members of struct
    _ddebug
  dynamic_debug: drop use of bitfields in struct _ddebug
  dynamic_debug: introduce CONFIG_DYNAMIC_DEBUG_RELATIVE_POINTERS
  dynamic_debug: add asm-generic implementation for
    DYNAMIC_DEBUG_RELATIVE_POINTERS
  x86-64: select DYNAMIC_DEBUG_RELATIVE_POINTERS
  arm64: select DYNAMIC_DEBUG_RELATIVE_POINTERS
  powerpc: select DYNAMIC_DEBUG_RELATIVE_POINTERS for PPC64

 arch/arm64/Kconfig                          |   1 +
 arch/arm64/include/asm/Kbuild               |   1 +
 arch/powerpc/Kconfig                        |   1 +
 arch/powerpc/include/asm/Kbuild             |   1 +
 arch/x86/Kconfig                            |   1 +
 arch/x86/entry/vdso/vdso32/vclock_gettime.c |   1 +
 arch/x86/include/asm/Kbuild                 |   1 +
 include/asm-generic/dynamic_debug.h         | 116 ++++++++++++++++++++
 include/linux/device.h                      |   4 +-
 include/linux/dynamic_debug.h               |  26 +++--
 include/linux/jump_label.h                  |   2 +
 include/linux/net.h                         |   4 +-
 include/linux/printk.h                      |   4 +-
 lib/Kconfig.debug                           |   3 +
 lib/dynamic_debug.c                         | 111 ++++++++++++++-----
 15 files changed, 237 insertions(+), 40 deletions(-)
 create mode 100644 include/asm-generic/dynamic_debug.h

Comments

Rasmus Villemoes May 6, 2019, 6:48 a.m. UTC | #1
On 09/04/2019 23.25, Rasmus Villemoes wrote:

> While refreshing these patches, which were orignally just targeted at
> x86-64, it occured to me that despite the implementation relying on
> inline asm, there's nothing x86 specific about it, and indeed it seems
> to work out-of-the-box for ppc64 and arm64 as well, but those have
> only been compile-tested.

So, apart from the Clang build failures for non-x86, I now also got a
report that gcc 4.8 miscompiles this stuff in some cases [1], even for
x86 - gcc 4.9 does not seem to have the problem. So, given that the 5.2
merge window just opened, I suppose this is the point where I should
pull the plug on this experiment :(

Rasmus

[1] Specifically, the problem manifested in net/ipv4/tcp_input.c: Both
uses of the static inline inet_csk_clear_xmit_timer() pass a
compile-time constant 'what', so the ifs get folded away and both uses
are completely inlined. Yet, gcc still decides to emit a copy of the
final 'else' branch of inet_csk_clear_xmit_timer() as its own
inet_csk_reset_xmit_timer.part.55 function, which is of course unused.
And despite the asm() that defines the ddebug descriptor being an "asm
volatile", gcc thinks it's fine to elide that (the code path is
unreachable, after all....), so the entire asm for that function is

        .section        .text.unlikely
        .type   inet_csk_reset_xmit_timer.part.55, @function
inet_csk_reset_xmit_timer.part.55:
        movq    $.LC1, %rsi     #,
        movq    $__UNIQUE_ID_ddebug160, %rdi    #,
        xorl    %eax, %eax      #
        jmp     __dynamic_pr_debug      #
        .size   inet_csk_reset_xmit_timer.part.55,
.-inet_csk_reset_xmit_timer.part.55

which of course fails to link since the symbol __UNIQUE_ID_ddebug160 is
nowhere defined.
Ingo Molnar May 6, 2019, 7:05 a.m. UTC | #2
* Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote:

> On 09/04/2019 23.25, Rasmus Villemoes wrote:
> 
> > While refreshing these patches, which were orignally just targeted at
> > x86-64, it occured to me that despite the implementation relying on
> > inline asm, there's nothing x86 specific about it, and indeed it seems
> > to work out-of-the-box for ppc64 and arm64 as well, but those have
> > only been compile-tested.
> 
> So, apart from the Clang build failures for non-x86, I now also got a
> report that gcc 4.8 miscompiles this stuff in some cases [1], even for
> x86 - gcc 4.9 does not seem to have the problem. So, given that the 5.2
> merge window just opened, I suppose this is the point where I should
> pull the plug on this experiment :(
> 
> Rasmus
> 
> [1] Specifically, the problem manifested in net/ipv4/tcp_input.c: Both
> uses of the static inline inet_csk_clear_xmit_timer() pass a
> compile-time constant 'what', so the ifs get folded away and both uses
> are completely inlined. Yet, gcc still decides to emit a copy of the
> final 'else' branch of inet_csk_clear_xmit_timer() as its own
> inet_csk_reset_xmit_timer.part.55 function, which is of course unused.
> And despite the asm() that defines the ddebug descriptor being an "asm
> volatile", gcc thinks it's fine to elide that (the code path is
> unreachable, after all....), so the entire asm for that function is
> 
>         .section        .text.unlikely
>         .type   inet_csk_reset_xmit_timer.part.55, @function
> inet_csk_reset_xmit_timer.part.55:
>         movq    $.LC1, %rsi     #,
>         movq    $__UNIQUE_ID_ddebug160, %rdi    #,
>         xorl    %eax, %eax      #
>         jmp     __dynamic_pr_debug      #
>         .size   inet_csk_reset_xmit_timer.part.55,
> .-inet_csk_reset_xmit_timer.part.55
> 
> which of course fails to link since the symbol __UNIQUE_ID_ddebug160 is
> nowhere defined.

It's sad to see such nice data footprint savings go the way of the dodo 
just because GCC 4.8 is buggy.

The current compatibility cut-off is GCC 4.6:

  GNU C                  4.6              gcc --version

Do we know where the GCC bug was fixed, was it in GCC 4.9?

According to the GCC release dates:

  https://www.gnu.org/software/gcc/releases.html

4.8.0 was released in early-2013, while 4.9.0 was released in early-2014. 
So the tooling compatibility window for latest upstream would narrow from 
~6 years to ~5 years.

Thanks,

	Ingo
Rasmus Villemoes May 6, 2019, 7:34 a.m. UTC | #3
On 06/05/2019 09.05, Ingo Molnar wrote:
> 
> 
> It's sad to see such nice data footprint savings go the way of the dodo 
> just because GCC 4.8 is buggy.
> 
> The current compatibility cut-off is GCC 4.6:
> 
>   GNU C                  4.6              gcc --version
> 
> Do we know where the GCC bug was fixed, was it in GCC 4.9?

Not sure. The report was from a build on CentOS with gcc 4.8.5, so I
tried installing the gcc-4.8 package on my Ubuntu machine and could
reproduce. Then I tried installed gcc-4.9, and after disabling
CONFIG_RETPOLINE (both CentOS and Ubuntu carry backported retpoline
support in their 4.8, but apparently not 4.9), I could see that the
problem was gone. But whether it's gone because it no longer elides an
asm volatile() on a code path it otherwise emits code for, or because it
simply doesn't emit the unused static inline() at all I don't know.

I thought 0day also tested a range of supported compiler versions, so I
was rather surprised at getting this report at all. But I suppose the
arch/config matrix is already pretty huge. Anyway, the bug certainly
doesn't exist in any of the gcc versions 0day does test.

I _am_ bending the C rules a bit with the "extern some_var; asm
volatile(".section some_section\nsome_var: blabla");". I should probably
ask on the gcc list whether this way of defining a local symbol in
inline assembly and referring to it from C is supposed to work, or it
just happens to work by chance.

Rasmus
Ingo Molnar May 6, 2019, 7:48 a.m. UTC | #4
* Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote:

> I _am_ bending the C rules a bit with the "extern some_var; asm 
> volatile(".section some_section\nsome_var: blabla");". I should 
> probably ask on the gcc list whether this way of defining a local 
> symbol in inline assembly and referring to it from C is supposed to 
> work, or it just happens to work by chance.

Doing that would be rather useful I think.

Thanks,

	Ingo
Segher Boessenkool May 6, 2019, 2:48 p.m. UTC | #5
On Mon, May 06, 2019 at 09:34:55AM +0200, Rasmus Villemoes wrote:
> I _am_ bending the C rules a bit with the "extern some_var; asm
> volatile(".section some_section\nsome_var: blabla");". I should probably
> ask on the gcc list whether this way of defining a local symbol in
> inline assembly and referring to it from C is supposed to work, or it
> just happens to work by chance.

It only works by chance.  There is no way GCC can know the asm needs
that variable.  If you make it (or its address) an input of the asm it
should work as far as I can see?  (Need exact code to analyse it exactly).


Segher