diff mbox series

treewide: remove current_text_addr

Message ID 20180821202900.208417-1-ndesaulniers@google.com
State Not Applicable
Headers show
Series treewide: remove current_text_addr | expand

Commit Message

Nick Desaulniers Aug. 21, 2018, 8:28 p.m. UTC
Prefer _THIS_IP_ defined in linux/kernel.h.

Most definitions of current_text_addr were the same as _THIS_IP_, but
a few archs had inline assembly instead.

This patch removes the final call site of current_text_addr, making all
of the definitions dead code.

Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
---
I suspect that current_text_addr predated GNU C extensions for statement
expressions and/or taking the address of a label, then the macro was
reimplemented for every new archs include/asm/processor.h, even though
there were very few call sites, and none required an assembly
implementation vs the C implementation.

I am sad to see a few neat arch specific ways of getting the ip/pc, but
we should prefer the higher level C in cases where assembly is not
required. And the definitions can always be found again in git history.

 arch/alpha/include/asm/processor.h      |  6 ------
 arch/arc/include/asm/processor.h        |  8 --------
 arch/arm/include/asm/processor.h        |  6 ------
 arch/arm64/include/asm/processor.h      |  7 -------
 arch/c6x/include/asm/processor.h        | 11 -----------
 arch/h8300/include/asm/processor.h      |  6 ------
 arch/hexagon/include/asm/processor.h    |  3 ---
 arch/ia64/include/asm/processor.h       |  6 ------
 arch/m68k/include/asm/processor.h       |  6 ------
 arch/microblaze/include/asm/processor.h | 12 ------------
 arch/mips/include/asm/processor.h       |  5 -----
 arch/nds32/include/asm/processor.h      |  6 ------
 arch/nios2/include/asm/processor.h      |  6 ------
 arch/openrisc/include/asm/processor.h   |  5 -----
 arch/parisc/include/asm/processor.h     | 11 -----------
 arch/powerpc/include/asm/processor.h    |  6 ------
 arch/riscv/include/asm/processor.h      |  6 ------
 arch/s390/include/asm/processor.h       |  6 ------
 arch/sh/include/asm/processor_32.h      |  6 ------
 arch/sh/include/asm/processor_64.h      | 15 ---------------
 arch/sparc/include/asm/processor_32.h   |  6 ------
 arch/sparc/include/asm/processor_64.h   |  6 ------
 arch/unicore32/include/asm/processor.h  |  6 ------
 arch/x86/include/asm/kexec.h            |  3 ++-
 arch/x86/include/asm/processor.h        | 12 ------------
 arch/x86/um/asm/processor_32.h          |  8 --------
 arch/x86/um/asm/processor_64.h          |  3 ---
 arch/xtensa/include/asm/processor.h     |  8 --------
 28 files changed, 2 insertions(+), 193 deletions(-)

Comments

Helge Deller Aug. 25, 2018, 10:48 a.m. UTC | #1
On 21.08.2018 22:28, Nick Desaulniers wrote:
> Prefer _THIS_IP_ defined in linux/kernel.h.
> 
> Most definitions of current_text_addr were the same as _THIS_IP_, but
> a few archs had inline assembly instead.
> 
> This patch removes the final call site of current_text_addr, making all
> of the definitions dead code.
> 
> Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
> ---
> I suspect that current_text_addr predated GNU C extensions for statement
> expressions and/or taking the address of a label, then the macro was
> reimplemented for every new archs include/asm/processor.h, even though
> there were very few call sites, and none required an assembly
> implementation vs the C implementation.
> 
> I am sad to see a few neat arch specific ways of getting the ip/pc, but
> we should prefer the higher level C in cases where assembly is not
> required. And the definitions can always be found again in git history.

Currently alpha, s390, sparc, sh, c6x, ia64 and parisc provide an
inline assembly function to get the current instruction pointer. 
As mentioned in an earlier thread, I personally would *prefer* if 
_THIS_IP_ would use those inline assembly instructions on those
architectures instead of the (currently used) higher C-level
implementation.

Helge


>  arch/alpha/include/asm/processor.h      |  6 ------
>  arch/arc/include/asm/processor.h        |  8 --------
>  arch/arm/include/asm/processor.h        |  6 ------
>  arch/arm64/include/asm/processor.h      |  7 -------
>  arch/c6x/include/asm/processor.h        | 11 -----------
>  arch/h8300/include/asm/processor.h      |  6 ------
>  arch/hexagon/include/asm/processor.h    |  3 ---
>  arch/ia64/include/asm/processor.h       |  6 ------
>  arch/m68k/include/asm/processor.h       |  6 ------
>  arch/microblaze/include/asm/processor.h | 12 ------------
>  arch/mips/include/asm/processor.h       |  5 -----
>  arch/nds32/include/asm/processor.h      |  6 ------
>  arch/nios2/include/asm/processor.h      |  6 ------
>  arch/openrisc/include/asm/processor.h   |  5 -----
>  arch/parisc/include/asm/processor.h     | 11 -----------
>  arch/powerpc/include/asm/processor.h    |  6 ------
>  arch/riscv/include/asm/processor.h      |  6 ------
>  arch/s390/include/asm/processor.h       |  6 ------
>  arch/sh/include/asm/processor_32.h      |  6 ------
>  arch/sh/include/asm/processor_64.h      | 15 ---------------
>  arch/sparc/include/asm/processor_32.h   |  6 ------
>  arch/sparc/include/asm/processor_64.h   |  6 ------
>  arch/unicore32/include/asm/processor.h  |  6 ------
>  arch/x86/include/asm/kexec.h            |  3 ++-
>  arch/x86/include/asm/processor.h        | 12 ------------
>  arch/x86/um/asm/processor_32.h          |  8 --------
>  arch/x86/um/asm/processor_64.h          |  3 ---
>  arch/xtensa/include/asm/processor.h     |  8 --------
>  28 files changed, 2 insertions(+), 193 deletions(-)
> 
> diff --git a/arch/alpha/include/asm/processor.h b/arch/alpha/include/asm/processor.h
> index cb05d045efe3..6100431da07a 100644
> --- a/arch/alpha/include/asm/processor.h
> +++ b/arch/alpha/include/asm/processor.h
> @@ -10,12 +10,6 @@
>  
>  #include <linux/personality.h>	/* for ADDR_LIMIT_32BIT */
>  
> -/*
> - * Returns current instruction pointer ("program counter").
> - */
> -#define current_text_addr() \
> -  ({ void *__pc; __asm__ ("br %0,.+4" : "=r"(__pc)); __pc; })
> -
>  /*
>   * We have a 42-bit user address space: 4TB user VM...
>   */
> diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
> index 8ee41e988169..10346d6cf926 100644
> --- a/arch/arc/include/asm/processor.h
> +++ b/arch/arc/include/asm/processor.h
> @@ -98,14 +98,6 @@ extern void start_thread(struct pt_regs * regs, unsigned long pc,
>  
>  extern unsigned int get_wchan(struct task_struct *p);
>  
> -/*
> - * Default implementation of macro that returns current
> - * instruction pointer ("program counter").
> - * Should the PC register be read instead ? This macro does not seem to
> - * be used in many places so this wont be all that bad.
> - */
> -#define current_text_addr() ({ __label__ _l; _l: &&_l; })
> -
>  #endif /* !__ASSEMBLY__ */
>  
>  /*
> diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
> index 1bf65b47808a..120f4c9bbfde 100644
> --- a/arch/arm/include/asm/processor.h
> +++ b/arch/arm/include/asm/processor.h
> @@ -11,12 +11,6 @@
>  #ifndef __ASM_ARM_PROCESSOR_H
>  #define __ASM_ARM_PROCESSOR_H
>  
> -/*
> - * Default implementation of macro that returns current
> - * instruction pointer ("program counter").
> - */
> -#define current_text_addr() ({ __label__ _l; _l: &&_l;})
> -
>  #ifdef __KERNEL__
>  
>  #include <asm/hw_breakpoint.h>
> diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> index 79657ad91397..966214f473b4 100644
> --- a/arch/arm64/include/asm/processor.h
> +++ b/arch/arm64/include/asm/processor.h
> @@ -25,13 +25,6 @@
>  #define USER_DS		(TASK_SIZE_64 - 1)
>  
>  #ifndef __ASSEMBLY__
> -
> -/*
> - * Default implementation of macro that returns current
> - * instruction pointer ("program counter").
> - */
> -#define current_text_addr() ({ __label__ _l; _l: &&_l;})
> -
>  #ifdef __KERNEL__
>  
>  #include <linux/build_bug.h>
> diff --git a/arch/c6x/include/asm/processor.h b/arch/c6x/include/asm/processor.h
> index 8f7cce829f8e..a8581f5b27f6 100644
> --- a/arch/c6x/include/asm/processor.h
> +++ b/arch/c6x/include/asm/processor.h
> @@ -17,17 +17,6 @@
>  #include <asm/page.h>
>  #include <asm/current.h>
>  
> -/*
> - * Default implementation of macro that returns current
> - * instruction pointer ("program counter").
> - */
> -#define current_text_addr()			\
> -({						\
> -	void *__pc;				\
> -	asm("mvc .S2 pce1,%0\n" : "=b"(__pc));	\
> -	__pc;					\
> -})
> -
>  /*
>   * User space process size. This is mostly meaningless for NOMMU
>   * but some C6X processors may have RAM addresses up to 0xFFFFFFFF.
> diff --git a/arch/h8300/include/asm/processor.h b/arch/h8300/include/asm/processor.h
> index 985346393e4a..a060b41b2d31 100644
> --- a/arch/h8300/include/asm/processor.h
> +++ b/arch/h8300/include/asm/processor.h
> @@ -12,12 +12,6 @@
>  #ifndef __ASM_H8300_PROCESSOR_H
>  #define __ASM_H8300_PROCESSOR_H
>  
> -/*
> - * Default implementation of macro that returns current
> - * instruction pointer ("program counter").
> - */
> -#define current_text_addr() ({ __label__ _l; _l: &&_l; })
> -
>  #include <linux/compiler.h>
>  #include <asm/segment.h>
>  #include <asm/ptrace.h>
> diff --git a/arch/hexagon/include/asm/processor.h b/arch/hexagon/include/asm/processor.h
> index ce67940860a5..227bcb9cfdac 100644
> --- a/arch/hexagon/include/asm/processor.h
> +++ b/arch/hexagon/include/asm/processor.h
> @@ -27,9 +27,6 @@
>  #include <asm/registers.h>
>  #include <asm/hexagon_vm.h>
>  
> -/*  must be a macro  */
> -#define current_text_addr() ({ __label__ _l; _l: &&_l; })
> -
>  /*  task_struct, defined elsewhere, is the "process descriptor" */
>  struct task_struct;
>  
> diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
> index 10061ccf0440..c91ef98ed6bf 100644
> --- a/arch/ia64/include/asm/processor.h
> +++ b/arch/ia64/include/asm/processor.h
> @@ -602,12 +602,6 @@ ia64_set_unat (__u64 *unat, void *spill_addr, unsigned long nat)
>  	*unat = (*unat & ~mask) | (nat << bit);
>  }
>  
> -/*
> - * Get the current instruction/program counter value.
> - */
> -#define current_text_addr() \
> -	({ void *_pc; _pc = (void *)ia64_getreg(_IA64_REG_IP); _pc; })
> -
>  static inline __u64
>  ia64_get_ivr (void)
>  {
> diff --git a/arch/m68k/include/asm/processor.h b/arch/m68k/include/asm/processor.h
> index 464e9f5f50ee..3750819ac5a1 100644
> --- a/arch/m68k/include/asm/processor.h
> +++ b/arch/m68k/include/asm/processor.h
> @@ -8,12 +8,6 @@
>  #ifndef __ASM_M68K_PROCESSOR_H
>  #define __ASM_M68K_PROCESSOR_H
>  
> -/*
> - * Default implementation of macro that returns current
> - * instruction pointer ("program counter").
> - */
> -#define current_text_addr() ({ __label__ _l; _l: &&_l;})
> -
>  #include <linux/thread_info.h>
>  #include <asm/segment.h>
>  #include <asm/fpu.h>
> diff --git a/arch/microblaze/include/asm/processor.h b/arch/microblaze/include/asm/processor.h
> index 330d556860ba..66b537b8d138 100644
> --- a/arch/microblaze/include/asm/processor.h
> +++ b/arch/microblaze/include/asm/processor.h
> @@ -45,12 +45,6 @@ extern void ret_from_kernel_thread(void);
>   */
>  # define TASK_SIZE	(0x81000000 - 0x80000000)
>  
> -/*
> - * Default implementation of macro that returns current
> - * instruction pointer ("program counter").
> - */
> -# define current_text_addr() ({ __label__ _l; _l: &&_l; })
> -
>  /*
>   * This decides where the kernel will search for a free chunk of vm
>   * space during mmap's. We won't be using it
> @@ -92,12 +86,6 @@ extern unsigned long get_wchan(struct task_struct *p);
>  
>  #  ifndef __ASSEMBLY__
>  
> -/*
> - * Default implementation of macro that returns current
> - * instruction pointer ("program counter").
> - */
> -#  define current_text_addr()	({ __label__ _l; _l: &&_l; })
> -
>  /* If you change this, you must change the associated assembly-languages
>   * constants defined below, THREAD_*.
>   */
> diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
> index b2fa62922d88..f08417f8772e 100644
> --- a/arch/mips/include/asm/processor.h
> +++ b/arch/mips/include/asm/processor.h
> @@ -22,11 +22,6 @@
>  #include <asm/mipsregs.h>
>  #include <asm/prefetch.h>
>  
> -/*
> - * Return current * instruction pointer ("program counter").
> - */
> -#define current_text_addr() ({ __label__ _l; _l: &&_l;})
> -
>  /*
>   * System setup and hardware flags..
>   */
> diff --git a/arch/nds32/include/asm/processor.h b/arch/nds32/include/asm/processor.h
> index 9c83caf4269f..c2660f566bac 100644
> --- a/arch/nds32/include/asm/processor.h
> +++ b/arch/nds32/include/asm/processor.h
> @@ -4,12 +4,6 @@
>  #ifndef __ASM_NDS32_PROCESSOR_H
>  #define __ASM_NDS32_PROCESSOR_H
>  
> -/*
> - * Default implementation of macro that returns current
> - * instruction pointer ("program counter").
> - */
> -#define current_text_addr() ({ __label__ _l; _l: &&_l;})
> -
>  #ifdef __KERNEL__
>  
>  #include <asm/ptrace.h>
> diff --git a/arch/nios2/include/asm/processor.h b/arch/nios2/include/asm/processor.h
> index 4944e2e1d8b0..94bcb86f679f 100644
> --- a/arch/nios2/include/asm/processor.h
> +++ b/arch/nios2/include/asm/processor.h
> @@ -38,12 +38,6 @@
>  #define KUSER_SIZE		(PAGE_SIZE)
>  #ifndef __ASSEMBLY__
>  
> -/*
> - * Default implementation of macro that returns current
> - * instruction pointer ("program counter").
> - */
> -#define current_text_addr() ({ __label__ _l; _l: &&_l; })
> -
>  # define TASK_SIZE		0x7FFF0000UL
>  # define TASK_UNMAPPED_BASE	(PAGE_ALIGN(TASK_SIZE / 3))
>  
> diff --git a/arch/openrisc/include/asm/processor.h b/arch/openrisc/include/asm/processor.h
> index af31a9fe736a..351d3aed7a06 100644
> --- a/arch/openrisc/include/asm/processor.h
> +++ b/arch/openrisc/include/asm/processor.h
> @@ -30,11 +30,6 @@
>  		   | SPR_SR_DCE | SPR_SR_SM)
>  #define USER_SR   (SPR_SR_DME | SPR_SR_IME | SPR_SR_ICE \
>  		   | SPR_SR_DCE | SPR_SR_IEE | SPR_SR_TEE)
> -/*
> - * Default implementation of macro that returns current
> - * instruction pointer ("program counter").
> - */
> -#define current_text_addr() ({ __label__ _l; _l: &&_l; })
>  
>  /*
>   * User space process size. This is hardcoded into a few places,
> diff --git a/arch/parisc/include/asm/processor.h b/arch/parisc/include/asm/processor.h
> index 2dbe5580a1a4..0d7f64ef9c7d 100644
> --- a/arch/parisc/include/asm/processor.h
> +++ b/arch/parisc/include/asm/processor.h
> @@ -20,17 +20,6 @@
>  #include <asm/percpu.h>
>  #endif /* __ASSEMBLY__ */
>  
> -/*
> - * Default implementation of macro that returns current
> - * instruction pointer ("program counter").
> - */
> -#ifdef CONFIG_PA20
> -#define current_ia(x)	__asm__("mfia %0" : "=r"(x))
> -#else /* mfia added in pa2.0 */
> -#define current_ia(x)	__asm__("blr 0,%0\n\tnop" : "=r"(x))
> -#endif
> -#define current_text_addr() ({ void *pc; current_ia(pc); pc; })
> -
>  #define HAVE_ARCH_PICK_MMAP_LAYOUT
>  
>  #define TASK_SIZE_OF(tsk)       ((tsk)->thread.task_size)
> diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
> index 52fadded5c1e..1fff74df06e6 100644
> --- a/arch/powerpc/include/asm/processor.h
> +++ b/arch/powerpc/include/asm/processor.h
> @@ -67,12 +67,6 @@ extern int _chrp_type;
>  
>  #endif /* defined(__KERNEL__) && defined(CONFIG_PPC32) */
>  
> -/*
> - * Default implementation of macro that returns current
> - * instruction pointer ("program counter").
> - */
> -#define current_text_addr() ({ __label__ _l; _l: &&_l;})
> -
>  /* Macros for adjusting thread priority (hardware multi-threading) */
>  #define HMT_very_low()   asm volatile("or 31,31,31   # very low priority")
>  #define HMT_low()	 asm volatile("or 1,1,1	     # low priority")
> diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
> index 3fe4af8147d2..020e35947060 100644
> --- a/arch/riscv/include/asm/processor.h
> +++ b/arch/riscv/include/asm/processor.h
> @@ -33,12 +33,6 @@
>  struct task_struct;
>  struct pt_regs;
>  
> -/*
> - * Default implementation of macro that returns current
> - * instruction pointer ("program counter").
> - */
> -#define current_text_addr()	({ __label__ _l; _l: &&_l; })
> -
>  /* CPU-specific state of a task */
>  struct thread_struct {
>  	/* Callee-saved registers */
> diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
> index 7f2953c15c37..f8028d37bb18 100644
> --- a/arch/s390/include/asm/processor.h
> +++ b/arch/s390/include/asm/processor.h
> @@ -73,12 +73,6 @@ static inline int test_cpu_flag_of(int flag, int cpu)
>  
>  #define arch_needs_cpu() test_cpu_flag(CIF_NOHZ_DELAY)
>  
> -/*
> - * Default implementation of macro that returns current
> - * instruction pointer ("program counter").
> - */
> -#define current_text_addr() ({ void *pc; asm("basr %0,0" : "=a" (pc)); pc; })
> -
>  static inline void get_cpu_id(struct cpuid *ptr)
>  {
>  	asm volatile("stidp %0" : "=Q" (*ptr));
> diff --git a/arch/sh/include/asm/processor_32.h b/arch/sh/include/asm/processor_32.h
> index 95100d8a0b7b..0e0ecc0132e3 100644
> --- a/arch/sh/include/asm/processor_32.h
> +++ b/arch/sh/include/asm/processor_32.h
> @@ -16,12 +16,6 @@
>  #include <asm/types.h>
>  #include <asm/hw_breakpoint.h>
>  
> -/*
> - * Default implementation of macro that returns current
> - * instruction pointer ("program counter").
> - */
> -#define current_text_addr() ({ void *pc; __asm__("mova	1f, %0\n.align 2\n1:":"=z" (pc)); pc; })
> -
>  /* Core Processor Version Register */
>  #define CCN_PVR		0xff000030
>  #define CCN_CVR		0xff000040
> diff --git a/arch/sh/include/asm/processor_64.h b/arch/sh/include/asm/processor_64.h
> index 777a16318aff..f3d7075648d0 100644
> --- a/arch/sh/include/asm/processor_64.h
> +++ b/arch/sh/include/asm/processor_64.h
> @@ -19,21 +19,6 @@
>  #include <asm/types.h>
>  #include <cpu/registers.h>
>  
> -/*
> - * Default implementation of macro that returns current
> - * instruction pointer ("program counter").
> - */
> -#define current_text_addr() ({ \
> -void *pc; \
> -unsigned long long __dummy = 0; \
> -__asm__("gettr	tr0, %1\n\t" \
> -	"pta	4, tr0\n\t" \
> -	"gettr	tr0, %0\n\t" \
> -	"ptabs	%1, tr0\n\t"	\
> -	:"=r" (pc), "=r" (__dummy) \
> -	: "1" (__dummy)); \
> -pc; })
> -
>  #endif
>  
>  /*
> diff --git a/arch/sparc/include/asm/processor_32.h b/arch/sparc/include/asm/processor_32.h
> index 192493c257fa..3c4bc2189092 100644
> --- a/arch/sparc/include/asm/processor_32.h
> +++ b/arch/sparc/include/asm/processor_32.h
> @@ -7,12 +7,6 @@
>  #ifndef __ASM_SPARC_PROCESSOR_H
>  #define __ASM_SPARC_PROCESSOR_H
>  
> -/*
> - * Sparc32 implementation of macro that returns current
> - * instruction pointer ("program counter").
> - */
> -#define current_text_addr() ({ void *pc; __asm__("sethi %%hi(1f), %0; or %0, %%lo(1f), %0;\n1:" : "=r" (pc)); pc; })
> -
>  #include <asm/psr.h>
>  #include <asm/ptrace.h>
>  #include <asm/head.h>
> diff --git a/arch/sparc/include/asm/processor_64.h b/arch/sparc/include/asm/processor_64.h
> index aac23d4a4ddd..5cf145f18f36 100644
> --- a/arch/sparc/include/asm/processor_64.h
> +++ b/arch/sparc/include/asm/processor_64.h
> @@ -8,12 +8,6 @@
>  #ifndef __ASM_SPARC64_PROCESSOR_H
>  #define __ASM_SPARC64_PROCESSOR_H
>  
> -/*
> - * Sparc64 implementation of macro that returns current
> - * instruction pointer ("program counter").
> - */
> -#define current_text_addr() ({ void *pc; __asm__("rd %%pc, %0" : "=r" (pc)); pc; })
> -
>  #include <asm/asi.h>
>  #include <asm/pstate.h>
>  #include <asm/ptrace.h>
> diff --git a/arch/unicore32/include/asm/processor.h b/arch/unicore32/include/asm/processor.h
> index 4eaa42167667..b772ed1c0f25 100644
> --- a/arch/unicore32/include/asm/processor.h
> +++ b/arch/unicore32/include/asm/processor.h
> @@ -13,12 +13,6 @@
>  #ifndef __UNICORE_PROCESSOR_H__
>  #define __UNICORE_PROCESSOR_H__
>  
> -/*
> - * Default implementation of macro that returns current
> - * instruction pointer ("program counter").
> - */
> -#define current_text_addr() ({ __label__ _l; _l: &&_l; })
> -
>  #ifdef __KERNEL__
>  
>  #include <asm/ptrace.h>
> diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
> index f327236f0fa7..86924d594ecd 100644
> --- a/arch/x86/include/asm/kexec.h
> +++ b/arch/x86/include/asm/kexec.h
> @@ -21,6 +21,7 @@
>  #ifndef __ASSEMBLY__
>  
>  #include <linux/string.h>
> +#include <linux/kernel.h>
>  
>  #include <asm/page.h>
>  #include <asm/ptrace.h>
> @@ -132,7 +133,7 @@ static inline void crash_setup_regs(struct pt_regs *newregs,
>  		asm volatile("movl %%cs, %%eax;" :"=a"(newregs->cs));
>  		asm volatile("pushfq; popq %0" :"=m"(newregs->flags));
>  #endif
> -		newregs->ip = (unsigned long)current_text_addr();
> +		newregs->ip = _THIS_IP_;
>  	}
>  }
>  
> diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
> index 682286aca881..20080b303605 100644
> --- a/arch/x86/include/asm/processor.h
> +++ b/arch/x86/include/asm/processor.h
> @@ -42,18 +42,6 @@ struct vm86;
>  #define NET_IP_ALIGN	0
>  
>  #define HBP_NUM 4
> -/*
> - * Default implementation of macro that returns current
> - * instruction pointer ("program counter").
> - */
> -static inline void *current_text_addr(void)
> -{
> -	void *pc;
> -
> -	asm volatile("mov $1f, %0; 1:":"=r" (pc));
> -
> -	return pc;
> -}
>  
>  /*
>   * These alignment constraints are for performance in the vSMP case,
> diff --git a/arch/x86/um/asm/processor_32.h b/arch/x86/um/asm/processor_32.h
> index c112de81c9e1..5fb1b8449adf 100644
> --- a/arch/x86/um/asm/processor_32.h
> +++ b/arch/x86/um/asm/processor_32.h
> @@ -47,14 +47,6 @@ static inline void arch_copy_thread(struct arch_thread *from,
>          memcpy(&to->tls_array, &from->tls_array, sizeof(from->tls_array));
>  }
>  
> -/*
> - * Default implementation of macro that returns current
> - * instruction pointer ("program counter"). Stolen
> - * from asm-i386/processor.h
> - */
> -#define current_text_addr() \
> -	({ void *pc; __asm__("movl $1f,%0\n1:":"=g" (pc)); pc; })
> -
>  #define current_sp() ({ void *sp; __asm__("movl %%esp, %0" : "=r" (sp) : ); sp; })
>  #define current_bp() ({ unsigned long bp; __asm__("movl %%ebp, %0" : "=r" (bp) : ); bp; })
>  
> diff --git a/arch/x86/um/asm/processor_64.h b/arch/x86/um/asm/processor_64.h
> index c3be85205a65..1ef9c21877bc 100644
> --- a/arch/x86/um/asm/processor_64.h
> +++ b/arch/x86/um/asm/processor_64.h
> @@ -31,9 +31,6 @@ static inline void arch_copy_thread(struct arch_thread *from,
>  	to->fs = from->fs;
>  }
>  
> -#define current_text_addr() \
> -	({ void *pc; __asm__("movq $1f,%0\n1:":"=g" (pc)); pc; })
> -
>  #define current_sp() ({ void *sp; __asm__("movq %%rsp, %0" : "=r" (sp) : ); sp; })
>  #define current_bp() ({ unsigned long bp; __asm__("movq %%rbp, %0" : "=r" (bp) : ); bp; })
>  
> diff --git a/arch/xtensa/include/asm/processor.h b/arch/xtensa/include/asm/processor.h
> index 5b0027d4ecc0..68891c992105 100644
> --- a/arch/xtensa/include/asm/processor.h
> +++ b/arch/xtensa/include/asm/processor.h
> @@ -153,14 +153,6 @@ struct thread_struct {
>  	int align[0] __attribute__ ((aligned(16)));
>  };
>  
> -
> -/*
> - * Default implementation of macro that returns current
> - * instruction pointer ("program counter").
> - */
> -#define current_text_addr()  ({ __label__ _l; _l: &&_l;})
> -
> -
>  /* This decides where the kernel will search for a free chunk of vm
>   * space during mmap's.
>   */
>
Linus Torvalds Aug. 25, 2018, 9:02 p.m. UTC | #2
On Tue, Aug 21, 2018 at 1:31 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> I suspect that current_text_addr predated GNU C extensions for statement
> expressions and/or taking the address of a label, then the macro was
> reimplemented for every new archs include/asm/processor.h, even though
> there were very few call sites, and none required an assembly
> implementation vs the C implementation.

I actually have this very dim memory that we had some compiler issues
where a label in the code resulted in gcc generating worse code
elsewhere in that same function.

But current_text_addr() predates both the git and the BK history, so
it's all shrouded in antiquity, and even if my dim recollection is
true, it may not be true any more. There aren't so many call sites
that it is likely to matter anyway.

             Linus
H. Peter Anvin Aug. 26, 2018, 2:38 a.m. UTC | #3
On 08/25/18 03:48, Helge Deller wrote:
> 
> Currently alpha, s390, sparc, sh, c6x, ia64 and parisc provide an
> inline assembly function to get the current instruction pointer. 
> As mentioned in an earlier thread, I personally would *prefer* if 
> _THIS_IP_ would use those inline assembly instructions on those
> architectures instead of the (currently used) higher C-level
> implementation.
> 

Older ones have as well, e.g. x86.

The only reason to retain the use of an assembly function would be in
the case where either:

a) the C implementation produces bad or invalid code on certain
   architectures;
b) there is a specific requirement that either an absolute or a relative
   value is used in the binary, e.g. due to constraints on relocation.
   The latter particularly comes to mind since the x86-64 implementation
   in assembly will produce movq $.,%reg (which requires relocation)
   instead of the more natural leaq .(%rip),%reg.

In the case (a) those architectures ought to be able to simply

#undef _THIS_IP_
#define _THIS_IP_ blah...

and in case (b) *those specific instances* should be using some kind of
specially flagged function e.g. current_true_ip() vs.
current_linktime_ip() or somesuch.

I also note that a lot of those functions are not marked
__always_inline, which is a serious error should the compiler ever get
the idea to out-of-line these functions, which could potentially happen
as gcc is rather bad at assigning weight to an assembly statement.

I'm also going to throw in a perhaps ugly bomb into this discussion:

_THIS_IP_ seems to be horribly ill-defined; there is no kind of
serialization, and no reason to believe it can't be arbitrarily hoisted
inside a function. Furthermore, *most of the uses of _THIS_IP_ seem to
be either discarded or passed to a function*, and the location of a
function call, unlike _THIS_IP_ is very well defined.

In that case, the use of this mechanism is completely pointless and
ought to be replaced with _RET_IP_.  It seems like most invocations of
_THIS_IP_ can be trivially replaced with _RET_IP_ inside the function,
which would also reduce the footprint of the function call, for example:

__trace_puts() is only ever called with _THIS_IP_ as the first argument;
drop that argument and use _RET_IP_ inside the function (also,
__trace_puts() only ever uses strlen() as the third argument, which gcc
can of course optimize into a constant for the case of a consta t
string, but *is that optimization actually worth it*?  In the case of
__trace_puts(), a variable strlen() would only ever need to be called in
the case of an allocation actually happening -- otherwise str is never
examined -- and again, it increases the *code size* of the call site.
If it was worthwhile it would make more sense to at least force this
into the rodata section with the string, something like the attached
file for an example; however, I have a hunch it doesn't matter.

I wouldn't be surprised if all or nearly all instances of _THIS_IP_ can
be completely removed.

	-hpa
#include <stddef.h>
#include <string.h>

#define _RET_IP_ ((unsigned long)__builtin_return_address(0))
#define no_inline __attribute__((noinline))
#define must_inline __attribute__((always_inline)) inline

struct myputs_string {
	size_t len;
	const char *str;
};

int _myputs_struct(const struct myputs_string * const strs);
int _myputs_string(const char *str);
int __myputs(unsigned long ip, const char *str, size_t len);

int no_inline _myputs_struct(const struct myputs_string * const strs)
{
	return __myputs(_RET_IP_, strs->str, strs->len);
}

int no_inline _myputs_string(const char *str)
{
	return __myputs(_RET_IP_, str, strlen(str)+1);
}

#define myputs(s)							\
({									\
	int _rv;							\
	if (__builtin_constant_p(s) &&					\
	    __builtin_constant_p(strlen(s)) &&				\
	    strlen(s)+1 == sizeof(s)) {					\
		static const struct myputs_string _mps = {		\
			.len = sizeof(s),				\
			.str = __builtin_constant_p(s) ? s : NULL,	\
		};							\
		_rv = _myputs_struct(&_mps);				\
	} else {							\
		_rv = _myputs_string(s);				\
	}								\
	_rv;								\
})

int test1(void);
int test2(const char *strx);

int test1(void)
{
	return myputs("Foobar");
}

int test2(const char *strx)
{
	return myputs(strx);
}
H. Peter Anvin Aug. 26, 2018, 3:16 a.m. UTC | #4
On 08/25/18 19:38, H. Peter Anvin wrote:
>
> If it was worthwhile it would make more sense to at least force this
> into the rodata section with the string, something like the attached
> file for an example; however, I have a hunch it doesn't matter.
> 

An even nuttier version which avoids the extra pointer indirection.
Read it and fear.

	-hpa
#include <stddef.h>
#include <string.h>

#define _RET_IP_ ((unsigned long)__builtin_return_address(0))
#define no_inline __attribute__((noinline))
#define must_inline __attribute__((always_inline)) inline

struct myputs_string {
	unsigned short len;
	char str[0];
};

int _myputs_struct(const struct myputs_string * const strs);
int _myputs_string(const char *str);
int __myputs(unsigned long ip, const char *str, size_t len);

int no_inline _myputs_struct(const struct myputs_string * const strs)
{
	return __myputs(_RET_IP_, strs->str, strs->len);
}

int no_inline _myputs_string(const char *str)
{
	return __myputs(_RET_IP_, str, strlen(str)+1);
}

#define ifconst(x,y)	__builtin_choose_expr(__builtin_constant_p(x),(x),(y))

#define myputs(s)							\
({									\
	int _rv;							\
	if (__builtin_constant_p(s) &&					\
	    __builtin_constant_p(strlen(s)) &&				\
	    strlen(s)+1 == sizeof(s) &&					\
	    sizeof(s) <= (size_t)65535) {				\
	static const struct {						\
		struct myputs_string _mps_hdr;				\
		char _mps_str[sizeof(s)];				\
	} _mps = {							\
		._mps_hdr.len = sizeof(s),				\
		._mps_str = ifconst(s,""),				\
	};								\
		_rv = _myputs_struct(&_mps._mps_hdr);			\
	} else {							\
		_rv = _myputs_string(s);				\
	}								\
	_rv;								\
})

int test1(void);
int test2(const char *strx);

int test1(void)
{
	return myputs("Foobar");
}

int test2(const char *strx)
{
	return myputs(strx);
}
H. Peter Anvin Aug. 26, 2018, 4:56 a.m. UTC | #5
On 08/25/18 20:16, H. Peter Anvin wrote:
> On 08/25/18 19:38, H. Peter Anvin wrote:
>>
>> If it was worthwhile it would make more sense to at least force this
>> into the rodata section with the string, something like the attached
>> file for an example; however, I have a hunch it doesn't matter.
>>
> 
> An even nuttier version which avoids the extra pointer indirection.
> Read it and fear.
> 
> 	-hpa
> 

OK, so one more thing, I guess: it is necessary to suppress the tailcall
optimization for _RET_IP_ to make any sense, but that should be pretty
simple:

static inline void notailcall(void)
{
	asm volatile("");
}

	-hpa
H. Peter Anvin Aug. 26, 2018, 7:30 p.m. UTC | #6
Here is a full-blown (user space) test program demonstrating the whole
technique and how to use it.

	-hpa
#include <stddef.h>
#include <string.h>

#define _RET_IP_ ((unsigned long)__builtin_return_address(0))
#define noinline __attribute__((noinline))
#define used __attribute__((used))
/* __always_inline is defined in glibc already */
#define ifconst(x,y)	__builtin_choose_expr(__builtin_constant_p(x),(x),(y))
static inline void notailcall(void)
{
	asm volatile("");
}

/* Change this to a null string to make all functions global */
#define STATIC static

struct myputs_string {
	unsigned short len;
	char str[0];
};

STATIC int _myputs_struct(const struct myputs_string * const strs);
STATIC int _myputs_string(const char *str);
STATIC int __myputs(unsigned long ip, const char *str, size_t len);

#if 1

#include <stdio.h>

STATIC void dump_caller(unsigned long where)
{
	const char *opname = NULL;
	const char *wheretoname = NULL;
	char ichar;
	unsigned long whereto = 0;

#if defined(__i386__) || defined(__x86_64__)
	char opname_buf[4];
	unsigned char opcode;
	
	where -= 5;
	opcode = *(unsigned char *)where;

	switch (opcode) {
	case 0xe8:
		opname = "call";
		whereto = where + 5 + *(signed int *)(where + 1);
		break;
	case 0xe9:
		opname = "jmp";
		whereto = where + 5 + *(signed int *)(where + 1);
		break;
	default:
		snprintf(opname_buf, sizeof opname_buf, "?%02x", opcode);
		opname = opname_buf;
		break;
	}

#elif defined(__sparc__)
	const char regtype[4] = "gilo";
	unsigned int opcode, op1, op3, ibit;
	signed int simm13, simm30;
	char opname_buf[32];
	char *p;

	where -= 8;
	
	opcode = *(signed int *)where;
	op1 = opcode >> 30;
	op3 = (opcode >> 19) & 0x3f;
	ibit = (opcode >> 13) & 1;
	simm13 = (opcode & 0x1fff) << 2;
	simm30 = (opcode & 0x3fffffff) << 2;

	opname = opname_buf;
	
	if (op1 == 1) {
		opname = "call";
		whereto = where + simm30;
	} else if (op1 == 2 && op3 == 0x38) {
		if (ibit) {
			snprintf(opname_buf, sizeof opname_buf,
				 "jmpl %%%c%u %c 0x%x",
				 regtype[(opcode >> 17) & 3],
				 (opcode >> 14) & 7,
				 simm13 < 0 ? '-' : '+',
				 abs(simm13));
		} else {
			snprintf(opname_buf, sizeof opname_buf,
				 "jmpl %%%c%u + %%%c%u",
				 regtype[(opcode >> 17) & 3],
				 (opcode >> 14) & 7,
				 regtype[(opcode >> 3) & 3],
				 opcode & 7);
		}
	} else {
		snprintf(opname_buf, sizeof opname_buf,
			 "?0x08x", opcode);
	}
#else
	/* Unknown architecture */
#endif
	if (whereto == (unsigned long)_myputs_struct) {
		wheretoname = "_myputs_struct";
	} else if (whereto == (unsigned long)_myputs_string) {
		wheretoname = "_myputs_string";
	} else {
		wheretoname = "?";
	}

	ichar = '[';
	
	if (opname) {
		printf("%c%p: %s",
		       ichar, (void *)where, opname);
		ichar = ' ';
	}
	if (whereto) {
		printf("%c%p <%s>", ichar, (void *)whereto, wheretoname);
		ichar = ' ';
	}
	if (ichar != '[')
		putchar(']');
}
	
STATIC int __myputs(unsigned long where, const char *str, size_t len)
{
	size_t slen = strlen(str);
	size_t rv;
	
	len--;
	rv = printf("%p: \"%.*s\"%*s", (void *)where, (int)len, str,
		    16-(int)slen, "");
	dump_caller(where);
	if (slen != len)
		printf(" <err: strlen = %zu, len = %zu>\n", slen, len);
	else
		printf(" <ok: len = %zu>\n", len);
	
	return rv;
}

STATIC int noinline _myputs_struct(const struct myputs_string * const strs)
{
	return __myputs(_RET_IP_, strs->str, strs->len);
}

STATIC int noinline _myputs_string(const char *str)
{
	return __myputs(_RET_IP_, str, strlen(str)+1);
}
#endif

#define myputs(s)							\
({									\
	int _rv;							\
	if (__builtin_constant_p(s) &&					\
	    __builtin_constant_p(strlen(s)) &&				\
	    strlen(s)+1 == sizeof(s) &&					\
	    sizeof(s) <= (size_t)65535) {				\
	static const struct {						\
		struct myputs_string _mps_hdr;				\
		char _mps_str[sizeof(s)];				\
	} _mps = {							\
		._mps_hdr.len = sizeof(s),				\
		._mps_str = ifconst(s,""),				\
	};								\
		_rv = _myputs_struct(&_mps._mps_hdr);			\
	} else {							\
		_rv = _myputs_string(s);				\
	}								\
	notailcall();							\
	_rv;								\
})

STATIC int test1(void);
STATIC int test2(const char *strx);

STATIC int test1(void)
{
	return myputs("Foobar");
}

STATIC int test2(const char *strx)
{
	return myputs(strx);
}

int main(int argc, char *argv[])
{
	(void)argc;

	test1();
	test2(argv[0]);
	return 0;
}
Linus Torvalds Aug. 26, 2018, 8:25 p.m. UTC | #7
On Sun, Aug 26, 2018 at 12:32 PM H. Peter Anvin <hpa@zytor.com> wrote:
>
> Here is a full-blown (user space) test program demonstrating the whole
> technique and how to use it.

So while I agree that some _THIS_IP_ users might be better off being
converted to __builtin_return_address(0) at the caller, I also think
that the whole "notailcall" thing shows why that can easily be more
problematic than just our currnet _THIS_IP_ solution.

Honestly, I'd suggest:

 - just do the current_text_addr() to _THIS_IP_ conversion

 - keep _THIS_IP_ and make it be the generic one, and screw the whole
"some architectures might implement is better" issue. Nobody cares.

 - try to convince people to move away from the "we want the kernel
instruction pointer for the call" model entirely, and consider this a
"legacy" issue.

The whole instruction pointer is a nasty thing. We should discourage
it and not make complex infrastructure for it.

Instead, maybe we could encourage something like

  struct kernel_loc { const char *file; const char *fn; int line; };

  #define __GEN_LOC__(n) \
        ({ static const struct kernel_loc n = { \
                __FILE__, __FUNCTION__, __LINE__  \
           }; &n; })

  #define _THIS_LOC_ __GEN_LOC__(__UNIQUE_ID(loc))

which is a hell of a lot nicer to use, and actually allows gcc to
optimize things (try it: if you pass a _THIS_LOC_ off to an inline
function, and that inline function uses the name and line number, gcc
will pick them up directly, without the extra structure dereference.

Wouldn't it be much nicer to pass these kinds of "location pointer"
around, rather than the nasty _THIS_IP_ thing?

Certainly lockdep looks like it could easily take that "const struct
kernel_loc *" instead of "unsigned long ip". Makes it easy to print
out the lockdep info.

Ok, I didn't try to convert anybody, so maybe people who currently use
_THIS_IP_ or current_text_addr() have some fundamental reason why they
want just that, but let's not male _THIS_IP_ more complex than it
needs to be.

Hmm?

             Linus
H. Peter Anvin Aug. 26, 2018, 11:20 p.m. UTC | #8
On 08/26/18 12:30, H. Peter Anvin wrote:
> Here is a full-blown (user space) test program demonstrating the whole
> technique and how to use it.
> 
> 	-hpa

Incidentally, it looks like _RET_IP_ really should be defined as:


/*
 * Is there any reason whatsoever to have _RET_IP_ an unsigned int
 * rather than a pointer throughout?
 */

#define _RET_IP_PTR_ \
	__builtin_extract_return_addr(__builtin_return_addr(0))
#define _RET_IP_ ((unsigned long)_RET_IP_PTR_)

On some architectures __builtin_extract_return_addr() is apparently
necessary; its a nop on x86.  Why that isn't part of
__builtin_return_addr() one can really wonder.

So, checking into all of this, the generic _THIS_IP_ DOES NOT WORK on
x86.  I have tried a tons of variants, including adding various asm
volatile(...) instructions, and no matter what I do, it will always
return the address of the surrounding function rather than any kind of
local IP.  The only way to get a localized address seems to be in
assembly, but even so, there is absolutely no guarantee that the value
of _THIS_IP_ has anything to do with where the code is otherwise
localized in the function.

From examining the output of gcc, the fundamental problem seems to be
that *no matter what* one do with the label, unless gcc actually
produces a computed goto somewhere in the code, that it can't remove
with dead-code elimination or constant propagation, it will arbitrarily
hoist the labels all the way to the beginning of the function. Given
that, I suspect that other versions of gcc might have similar problems.

This is the closest thing to arch-neutral I have been able to find that
also works on x86, while not at the same time horribly polluting the
namespace:

#define __here(n) ___here_ ## n
#define __hereasm(n) ".L___here_" #n
#define _THIS_IP_CTR_(n)					\
	({							\
		extern const char __here(n) asm(__hereasm(n));	\
		asm volatile(__hereasm(n) ": /* _THIS_IP_ */"); \
		(unsigned long)&__here(n);			\
	})
#define _THIS_IP_ _THIS_IP_CTR_(__COUNTER__)

The use of asm volatile() to define a label means that the position in
the instruction stream is at least reasonably well-defined.

	-hpa
Nick Desaulniers Aug. 27, 2018, 2:52 a.m. UTC | #9
On Sun, Aug 26, 2018 at 1:25 PM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> Honestly, I'd suggest:
>
>  - just do the current_text_addr() to _THIS_IP_ conversion
>
>  - keep _THIS_IP_ and make it be the generic one, and screw the whole
> "some architectures might implement is better" issue. Nobody cares.

And mention it to the compiler vendors as this seems like a case where
code gen can be improved.

>
>  - try to convince people to move away from the "we want the kernel
> instruction pointer for the call" model entirely, and consider this a
> "legacy" issue.
>
> The whole instruction pointer is a nasty thing. We should discourage
> it and not make complex infrastructure for it.

Yes, please.  I think we should strive for simplicity here.

>
> Instead, maybe we could encourage something like
>
>   struct kernel_loc { const char *file; const char *fn; int line; };
>
>   #define __GEN_LOC__(n) \
>         ({ static const struct kernel_loc n = { \
>                 __FILE__, __FUNCTION__, __LINE__  \
>            }; &n; })
>
>   #define _THIS_LOC_ __GEN_LOC__(__UNIQUE_ID(loc))
>
> which is a hell of a lot nicer to use, and actually allows gcc to
> optimize things (try it: if you pass a _THIS_LOC_ off to an inline
> function, and that inline function uses the name and line number, gcc
> will pick them up directly, without the extra structure dereference.
>
> Wouldn't it be much nicer to pass these kinds of "location pointer"
> around, rather than the nasty _THIS_IP_ thing?
>
> Certainly lockdep looks like it could easily take that "const struct
> kernel_loc *" instead of "unsigned long ip". Makes it easy to print
> out the lockdep info.
>
> Ok, I didn't try to convert anybody, so maybe people who currently use
> _THIS_IP_ or current_text_addr() have some fundamental reason why they
> want just that, but let's not male _THIS_IP_ more complex than it
> needs to be.
>
> Hmm?
>
>              Linus

This is extremely reasonable.  I can follow up with the lockdep folks
to see if they really need _THIS_IP_ to solve their problem, or if
there's a simpler solution that can solve their needs.  Sometimes
taking a step back and asking for clarity around the big picture
allows simpler solutions to shake out.
Peter Zijlstra Aug. 27, 2018, 7:33 a.m. UTC | #10
On Sun, Aug 26, 2018 at 07:52:59PM -0700, Nick Desaulniers wrote:
> On Sun, Aug 26, 2018 at 1:25 PM Linus Torvalds

> > Instead, maybe we could encourage something like
> >
> >   struct kernel_loc { const char *file; const char *fn; int line; };
> >
> >   #define __GEN_LOC__(n) \
> >         ({ static const struct kernel_loc n = { \
> >                 __FILE__, __FUNCTION__, __LINE__  \
> >            }; &n; })
> >
> >   #define _THIS_LOC_ __GEN_LOC__(__UNIQUE_ID(loc))
> >
> > which is a hell of a lot nicer to use, and actually allows gcc to
> > optimize things (try it: if you pass a _THIS_LOC_ off to an inline
> > function, and that inline function uses the name and line number, gcc
> > will pick them up directly, without the extra structure dereference.
> >
> > Wouldn't it be much nicer to pass these kinds of "location pointer"
> > around, rather than the nasty _THIS_IP_ thing?
> >
> > Certainly lockdep looks like it could easily take that "const struct
> > kernel_loc *" instead of "unsigned long ip". Makes it easy to print
> > out the lockdep info.

> This is extremely reasonable.  I can follow up with the lockdep folks
> to see if they really need _THIS_IP_ to solve their problem, or if
> there's a simpler solution that can solve their needs.  Sometimes
> taking a step back and asking for clarity around the big picture
> allows simpler solutions to shake out.

What problem are we trying to solve? _THIS_IP_ and _RET_IP_ work fine.
We're 'good' at dealing with text addresses, we use them for call stacks
and all sorts. Why does this need changing?
Nicholas Piggin Aug. 27, 2018, 7:43 a.m. UTC | #11
[ Trimmed the cc list because my SMTP didn't accept that many
addresses. ]

On Sun, 26 Aug 2018 13:25:14 -0700
Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Sun, Aug 26, 2018 at 12:32 PM H. Peter Anvin <hpa@zytor.com> wrote:
> >
> > Here is a full-blown (user space) test program demonstrating the whole
> > technique and how to use it.  
> 
> So while I agree that some _THIS_IP_ users might be better off being
> converted to __builtin_return_address(0) at the caller, I also think
> that the whole "notailcall" thing shows why that can easily be more
> problematic than just our currnet _THIS_IP_ solution.
> 
> Honestly, I'd suggest:
> 
>  - just do the current_text_addr() to _THIS_IP_ conversion
> 
>  - keep _THIS_IP_ and make it be the generic one, and screw the whole
> "some architectures might implement is better" issue. Nobody cares.
> 
>  - try to convince people to move away from the "we want the kernel
> instruction pointer for the call" model entirely, and consider this a
> "legacy" issue.
> 
> The whole instruction pointer is a nasty thing. We should discourage
> it and not make complex infrastructure for it.
> 
> Instead, maybe we could encourage something like
> 
>   struct kernel_loc { const char *file; const char *fn; int line; };
> 
>   #define __GEN_LOC__(n) \
>         ({ static const struct kernel_loc n = { \
>                 __FILE__, __FUNCTION__, __LINE__  \
>            }; &n; })
> 
>   #define _THIS_LOC_ __GEN_LOC__(__UNIQUE_ID(loc))
> 
> which is a hell of a lot nicer to use, and actually allows gcc to
> optimize things (try it: if you pass a _THIS_LOC_ off to an inline
> function, and that inline function uses the name and line number, gcc
> will pick them up directly, without the extra structure dereference.
> 
> Wouldn't it be much nicer to pass these kinds of "location pointer"
> around, rather than the nasty _THIS_IP_ thing?

Seems nice. Do you even need this unique ID thing? AFAIKS the name
would never really be useful.

It could perhaps go into a cold data section too, I assume the common
case is that you do not access it. Although gcc will end up putting
the file and function names into regular rodata.

Possibly we could add a printk specifier for it, pass it through to
existing BUG, etc macros that want exactly this, etc. Makes a lot of
sense.

Thanks,
Nick
H. Peter Anvin Aug. 27, 2018, 12:26 p.m. UTC | #12
On 08/27/18 00:33, Peter Zijlstra wrote:
> 
> What problem are we trying to solve? _THIS_IP_ and _RET_IP_ work fine.
> We're 'good' at dealing with text addresses, we use them for call stacks
> and all sorts. Why does this need changing?
> 

_RET_IP_ works fine, with the following two caveats:

1. To get a unique IP for each call site, the function call needs to be
   tailcall protected (easily done by wrapping the function in an
  __always_inline function with the notailcall() function I described
  earlier.  Alternatively, a generic macro wrapper for the same thing:

  #define notailcall(x) ({ typeof(x) _x = (x); asm volatile("");  _x; })

2. To uniformly get the return IP, it needs to be defined as:

#define _RET_IP_((unsigned long) \
__builtin_extract_return_addr(__builtin_return_address(0)))

[sorry for the line wrapping]

Using the type unsigned long instead of void * seems kind of pointless
though.


_THIS_IP_, however, is completely ill-defined, other than being an
address *somewhere* in the same global function (not even necessarily
the same function if the function is static!)  As my experiment show, in
many (nearly) cases gcc will hoist the address all the way to the top of
the function, at least for the current generic implementation.

For the case where _THIS_IP_ is passed to an out-of-line function in all
cases, it is extra pointless because all it does is increase the
footprint of every caller: _RET_IP_ is inherently passed to the function
anyway, and with tailcall protection it will uniquely identify a callsite.

For the case where _THIS_IP_ is used inline, I believe the version I
described will at the very least avoid hoisting around volatile accesses
like READ_ONCE(). Surrounding the marked code with asm volatile("");
[which should be turned into a macro or inline, obviously] might be
necessary for it to make any kind of inherent sense.

The proposed "location identifier" does have a serious problem: with
inline functions you might very well have a bunch of duplicates pointing
into the inline function, so a single callsite isn't identifiable.

	-hpa
Peter Zijlstra Aug. 27, 2018, 1:11 p.m. UTC | #13
On Mon, Aug 27, 2018 at 05:26:53AM -0700, H. Peter Anvin wrote:

> _THIS_IP_, however, is completely ill-defined, other than being an
> address *somewhere* in the same global function (not even necessarily
> the same function if the function is static!)  As my experiment show, in
> many (nearly) cases gcc will hoist the address all the way to the top of
> the function, at least for the current generic implementation.

It seems to have mostly worked so far... did anything change?

> For the case where _THIS_IP_ is passed to an out-of-line function in all
> cases, it is extra pointless because all it does is increase the
> footprint of every caller: _RET_IP_ is inherently passed to the function
> anyway, and with tailcall protection it will uniquely identify a callsite.

So I think we can convert many of the lockdep _THIS_IP_ calls to
_RET_IP_ on the other side, with a wee bit of care.

A little something like so perhaps...

---

 drivers/md/bcache/btree.c    |  2 +-
 fs/jbd2/transaction.c        |  6 +++---
 fs/super.c                   |  4 ++--
 include/linux/fs.h           |  4 ++--
 include/linux/jbd2.h         |  4 ++--
 include/linux/lockdep.h      | 21 ++++++++++-----------
 include/linux/percpu-rwsem.h | 22 ++++++++++------------
 include/linux/rcupdate.h     |  8 ++++----
 include/linux/ww_mutex.h     |  2 +-
 kernel/locking/lockdep.c     | 14 ++++++++------
 kernel/printk/printk.c       | 14 +++++++-------
 kernel/sched/core.c          |  4 ++--
 lib/locking-selftest.c       | 32 ++++++++++++++++----------------
 13 files changed, 68 insertions(+), 69 deletions(-)

diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
index c19f7716df88..21ede9b317de 100644
--- a/drivers/md/bcache/btree.c
+++ b/drivers/md/bcache/btree.c
@@ -940,7 +940,7 @@ static struct btree *mca_alloc(struct cache_set *c, struct btree_op *op,
 	hlist_del_init_rcu(&b->hash);
 	hlist_add_head_rcu(&b->hash, mca_hash(c, k));
 
-	lock_set_subclass(&b->lock.dep_map, level + 1, _THIS_IP_);
+	lock_set_subclass(&b->lock.dep_map, level + 1);
 	b->parent	= (void *) ~0UL;
 	b->flags	= 0;
 	b->written	= 0;
diff --git a/fs/jbd2/transaction.c b/fs/jbd2/transaction.c
index c0b66a7a795b..40aa71321f8a 100644
--- a/fs/jbd2/transaction.c
+++ b/fs/jbd2/transaction.c
@@ -382,7 +382,7 @@ static int start_this_handle(journal_t *journal, handle_t *handle,
 	read_unlock(&journal->j_state_lock);
 	current->journal_info = handle;
 
-	rwsem_acquire_read(&journal->j_trans_commit_map, 0, 0, _THIS_IP_);
+	rwsem_acquire_read(&journal->j_trans_commit_map, 0, 0);
 	jbd2_journal_free_transaction(new_transaction);
 	/*
 	 * Ensure that no allocations done while the transaction is open are
@@ -677,7 +677,7 @@ int jbd2__journal_restart(handle_t *handle, int nblocks, gfp_t gfp_mask)
 	if (need_to_start)
 		jbd2_log_start_commit(journal, tid);
 
-	rwsem_release(&journal->j_trans_commit_map, 1, _THIS_IP_);
+	rwsem_release(&journal->j_trans_commit_map, 1);
 	handle->h_buffer_credits = nblocks;
 	/*
 	 * Restore the original nofs context because the journal restart
@@ -1771,7 +1771,7 @@ int jbd2_journal_stop(handle_t *handle)
 			wake_up(&journal->j_wait_transaction_locked);
 	}
 
-	rwsem_release(&journal->j_trans_commit_map, 1, _THIS_IP_);
+	rwsem_release(&journal->j_trans_commit_map, 1);
 
 	if (wait_for_commit)
 		err = jbd2_log_wait_commit(journal, tid);
diff --git a/fs/super.c b/fs/super.c
index 50728d9c1a05..ec650a558f09 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1431,7 +1431,7 @@ static void lockdep_sb_freeze_release(struct super_block *sb)
 	int level;
 
 	for (level = SB_FREEZE_LEVELS - 1; level >= 0; level--)
-		percpu_rwsem_release(sb->s_writers.rw_sem + level, 0, _THIS_IP_);
+		percpu_rwsem_release(sb->s_writers.rw_sem + level, 0);
 }
 
 /*
@@ -1442,7 +1442,7 @@ static void lockdep_sb_freeze_acquire(struct super_block *sb)
 	int level;
 
 	for (level = 0; level < SB_FREEZE_LEVELS; ++level)
-		percpu_rwsem_acquire(sb->s_writers.rw_sem + level, 0, _THIS_IP_);
+		percpu_rwsem_acquire(sb->s_writers.rw_sem + level, 0);
 }
 
 static void sb_freeze_unlock(struct super_block *sb)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1ec33fd0423f..2ba14e5362e4 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1505,9 +1505,9 @@ void __sb_end_write(struct super_block *sb, int level);
 int __sb_start_write(struct super_block *sb, int level, bool wait);
 
 #define __sb_writers_acquired(sb, lev)	\
-	percpu_rwsem_acquire(&(sb)->s_writers.rw_sem[(lev)-1], 1, _THIS_IP_)
+	percpu_rwsem_acquire(&(sb)->s_writers.rw_sem[(lev)-1], 1)
 #define __sb_writers_release(sb, lev)	\
-	percpu_rwsem_release(&(sb)->s_writers.rw_sem[(lev)-1], 1, _THIS_IP_)
+	percpu_rwsem_release(&(sb)->s_writers.rw_sem[(lev)-1], 1)
 
 /**
  * sb_end_write - drop write access to a superblock
diff --git a/include/linux/jbd2.h b/include/linux/jbd2.h
index b708e5169d1d..7c31176ec8ae 100644
--- a/include/linux/jbd2.h
+++ b/include/linux/jbd2.h
@@ -1155,8 +1155,8 @@ struct journal_s
 
 #define jbd2_might_wait_for_commit(j) \
 	do { \
-		rwsem_acquire(&j->j_trans_commit_map, 0, 0, _THIS_IP_); \
-		rwsem_release(&j->j_trans_commit_map, 1, _THIS_IP_); \
+		rwsem_acquire(&j->j_trans_commit_map, 0, 0); \
+		rwsem_release(&j->j_trans_commit_map, 1); \
 	} while (0)
 
 /* journal feature predicate functions */
diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 6fc77d4dbdcd..ed3daf41ae7b 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -348,16 +348,15 @@ static inline int lock_is_held(const struct lockdep_map *lock)
 #define lockdep_is_held_type(lock, r)	lock_is_held_type(&(lock)->dep_map, (r))
 
 extern void lock_set_class(struct lockdep_map *lock, const char *name,
-			   struct lock_class_key *key, unsigned int subclass,
-			   unsigned long ip);
+			   struct lock_class_key *key, unsigned int subclass);
 
-static inline void lock_set_subclass(struct lockdep_map *lock,
-		unsigned int subclass, unsigned long ip)
+static __always_inline void
+lock_set_subclass(struct lockdep_map *lock, unsigned int subclass)
 {
-	lock_set_class(lock, lock->name, lock->key, subclass, ip);
+	lock_set_class(lock, lock->name, lock->key, subclass);
 }
 
-extern void lock_downgrade(struct lockdep_map *lock, unsigned long ip);
+extern void lock_downgrade(struct lockdep_map *lock);
 
 struct pin_cookie { unsigned int val; };
 
@@ -401,11 +400,11 @@ static inline void lockdep_on(void)
 {
 }
 
-# define lock_acquire(l, s, t, r, c, n, i)	do { } while (0)
-# define lock_release(l, n, i)			do { } while (0)
-# define lock_downgrade(l, i)			do { } while (0)
-# define lock_set_class(l, n, k, s, i)		do { } while (0)
-# define lock_set_subclass(l, s, i)		do { } while (0)
+# define lock_acquire(l, s, t, r, c, n)		do { } while (0)
+# define lock_release(l, n)			do { } while (0)
+# define lock_downgrade(l)			do { } while (0)
+# define lock_set_class(l, n, k, s)		do { } while (0)
+# define lock_set_subclass(l, s)		do { } while (0)
 # define lockdep_info()				do { } while (0)
 # define lockdep_init_map(lock, name, key, sub) \
 		do { (void)(name); (void)(key); } while (0)
diff --git a/include/linux/percpu-rwsem.h b/include/linux/percpu-rwsem.h
index 79b99d653e03..4ebf14e99034 100644
--- a/include/linux/percpu-rwsem.h
+++ b/include/linux/percpu-rwsem.h
@@ -29,11 +29,11 @@ static struct percpu_rw_semaphore name = {				\
 extern int __percpu_down_read(struct percpu_rw_semaphore *, int);
 extern void __percpu_up_read(struct percpu_rw_semaphore *);
 
-static inline void percpu_down_read_preempt_disable(struct percpu_rw_semaphore *sem)
+static __always_inline void percpu_down_read_preempt_disable(struct percpu_rw_semaphore *sem)
 {
 	might_sleep();
 
-	rwsem_acquire_read(&sem->rw_sem.dep_map, 0, 0, _RET_IP_);
+	rwsem_acquire_read(&sem->rw_sem.dep_map, 0, 0);
 
 	preempt_disable();
 	/*
@@ -60,7 +60,7 @@ static inline void percpu_down_read(struct percpu_rw_semaphore *sem)
 	preempt_enable();
 }
 
-static inline int percpu_down_read_trylock(struct percpu_rw_semaphore *sem)
+static __always_inline int percpu_down_read_trylock(struct percpu_rw_semaphore *sem)
 {
 	int ret = 1;
 
@@ -78,12 +78,12 @@ static inline int percpu_down_read_trylock(struct percpu_rw_semaphore *sem)
 	 */
 
 	if (ret)
-		rwsem_acquire_read(&sem->rw_sem.dep_map, 0, 1, _RET_IP_);
+		rwsem_acquire_read(&sem->rw_sem.dep_map, 0, 1);
 
 	return ret;
 }
 
-static inline void percpu_up_read_preempt_enable(struct percpu_rw_semaphore *sem)
+static __always_inline void percpu_up_read_preempt_enable(struct percpu_rw_semaphore *sem)
 {
 	/*
 	 * The barrier() prevents the compiler from
@@ -99,7 +99,7 @@ static inline void percpu_up_read_preempt_enable(struct percpu_rw_semaphore *sem
 		__percpu_up_read(sem); /* Unconditional memory barrier */
 	preempt_enable();
 
-	rwsem_release(&sem->rw_sem.dep_map, 1, _RET_IP_);
+	rwsem_release(&sem->rw_sem.dep_map, 1);
 }
 
 static inline void percpu_up_read(struct percpu_rw_semaphore *sem)
@@ -127,20 +127,18 @@ extern void percpu_free_rwsem(struct percpu_rw_semaphore *);
 #define percpu_rwsem_assert_held(sem)				\
 	lockdep_assert_held(&(sem)->rw_sem)
 
-static inline void percpu_rwsem_release(struct percpu_rw_semaphore *sem,
-					bool read, unsigned long ip)
+static __always_inline void percpu_rwsem_release(struct percpu_rw_semaphore *sem, bool read)
 {
-	lock_release(&sem->rw_sem.dep_map, 1, ip);
+	lock_release(&sem->rw_sem.dep_map, 1);
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 	if (!read)
 		sem->rw_sem.owner = RWSEM_OWNER_UNKNOWN;
 #endif
 }
 
-static inline void percpu_rwsem_acquire(struct percpu_rw_semaphore *sem,
-					bool read, unsigned long ip)
+static __always_inline void percpu_rwsem_acquire(struct percpu_rw_semaphore *sem, bool read)
 {
-	lock_acquire(&sem->rw_sem.dep_map, 0, 1, read, 1, NULL, ip);
+	lock_acquire(&sem->rw_sem.dep_map, 0, 1, read, 1, NULL);
 #ifdef CONFIG_RWSEM_SPIN_ON_OWNER
 	if (!read)
 		sem->rw_sem.owner = current;
diff --git a/include/linux/rcupdate.h b/include/linux/rcupdate.h
index 75e5b393cf44..6c1a35555e9d 100644
--- a/include/linux/rcupdate.h
+++ b/include/linux/rcupdate.h
@@ -239,14 +239,14 @@ static inline bool rcu_lockdep_current_cpu_online(void) { return true; }
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 
-static inline void rcu_lock_acquire(struct lockdep_map *map)
+static __always_inline void rcu_lock_acquire(struct lockdep_map *map)
 {
-	lock_acquire(map, 0, 0, 2, 0, NULL, _THIS_IP_);
+	lock_acquire(map, 0, 0, 2, 0, NULL);
 }
 
-static inline void rcu_lock_release(struct lockdep_map *map)
+static __always_inline void rcu_lock_release(struct lockdep_map *map)
 {
-	lock_release(map, 1, _THIS_IP_);
+	lock_release(map, 1);
 }
 
 extern struct lockdep_map rcu_lock_map;
diff --git a/include/linux/ww_mutex.h b/include/linux/ww_mutex.h
index 3af7c0e03be5..524aa28eef33 100644
--- a/include/linux/ww_mutex.h
+++ b/include/linux/ww_mutex.h
@@ -182,7 +182,7 @@ static inline void ww_acquire_done(struct ww_acquire_ctx *ctx)
 static inline void ww_acquire_fini(struct ww_acquire_ctx *ctx)
 {
 #ifdef CONFIG_DEBUG_MUTEXES
-	mutex_release(&ctx->dep_map, 0, _THIS_IP_);
+	mutex_release(&ctx->dep_map, 0);
 
 	DEBUG_LOCKS_WARN_ON(ctx->acquired);
 	if (!IS_ENABLED(CONFIG_PROVE_LOCKING))
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 5fa4d3138bf1..0b7c4f94a7a3 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -3868,9 +3868,9 @@ static void check_flags(unsigned long flags)
 }
 
 void lock_set_class(struct lockdep_map *lock, const char *name,
-		    struct lock_class_key *key, unsigned int subclass,
-		    unsigned long ip)
+		    struct lock_class_key *key, unsigned int subclass)
 {
+	unsigned long ip = _RET_IP_;
 	unsigned long flags;
 
 	if (unlikely(current->lockdep_recursion))
@@ -3886,8 +3886,9 @@ void lock_set_class(struct lockdep_map *lock, const char *name,
 }
 EXPORT_SYMBOL_GPL(lock_set_class);
 
-void lock_downgrade(struct lockdep_map *lock, unsigned long ip)
+void lock_downgrade(struct lockdep_map *lock)
 {
+	unsigned long ip = _RET_IP_;
 	unsigned long flags;
 
 	if (unlikely(current->lockdep_recursion))
@@ -3909,8 +3910,9 @@ EXPORT_SYMBOL_GPL(lock_downgrade);
  */
 void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 			  int trylock, int read, int check,
-			  struct lockdep_map *nest_lock, unsigned long ip)
+			  struct lockdep_map *nest_lock)
 {
+	unsigned long ip = _RET_IP_;
 	unsigned long flags;
 
 	if (unlikely(current->lockdep_recursion))
@@ -3928,9 +3930,9 @@ void lock_acquire(struct lockdep_map *lock, unsigned int subclass,
 }
 EXPORT_SYMBOL_GPL(lock_acquire);
 
-void lock_release(struct lockdep_map *lock, int nested,
-			  unsigned long ip)
+void lock_release(struct lockdep_map *lock, int nested)
 {
+	unsigned long ip = _RET_IP_;
 	unsigned long flags;
 
 	if (unlikely(current->lockdep_recursion))
diff --git a/kernel/printk/printk.c b/kernel/printk/printk.c
index 90b6ab01db59..9c8654be08bb 100644
--- a/kernel/printk/printk.c
+++ b/kernel/printk/printk.c
@@ -1583,7 +1583,7 @@ static void console_lock_spinning_enable(void)
 	raw_spin_unlock(&console_owner_lock);
 
 	/* The waiter may spin on us after setting console_owner */
-	spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
+	spin_acquire(&console_owner_dep_map, 0, 0);
 }
 
 /**
@@ -1611,20 +1611,20 @@ static int console_lock_spinning_disable_and_check(void)
 	raw_spin_unlock(&console_owner_lock);
 
 	if (!waiter) {
-		spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+		spin_release(&console_owner_dep_map, 1);
 		return 0;
 	}
 
 	/* The waiter is now free to continue */
 	WRITE_ONCE(console_waiter, false);
 
-	spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+	spin_release(&console_owner_dep_map, 1);
 
 	/*
 	 * Hand off console_lock to waiter. The waiter will perform
 	 * the up(). After this, the waiter is the console_lock owner.
 	 */
-	mutex_release(&console_lock_dep_map, 1, _THIS_IP_);
+	mutex_release(&console_lock_dep_map, 1);
 	return 1;
 }
 
@@ -1674,11 +1674,11 @@ static int console_trylock_spinning(void)
 	}
 
 	/* We spin waiting for the owner to release us */
-	spin_acquire(&console_owner_dep_map, 0, 0, _THIS_IP_);
+	spin_acquire(&console_owner_dep_map, 0, 0);
 	/* Owner will clear console_waiter on hand off */
 	while (READ_ONCE(console_waiter))
 		cpu_relax();
-	spin_release(&console_owner_dep_map, 1, _THIS_IP_);
+	spin_release(&console_owner_dep_map, 1);
 
 	printk_safe_exit_irqrestore(flags);
 	/*
@@ -1687,7 +1687,7 @@ static int console_trylock_spinning(void)
 	 * this as a trylock. Otherwise lockdep will
 	 * complain.
 	 */
-	mutex_acquire(&console_lock_dep_map, 0, 1, _THIS_IP_);
+	mutex_acquire(&console_lock_dep_map, 0, 1);
 
 	return 1;
 }
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 454adf9f8180..a3d146cc2cb9 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2557,7 +2557,7 @@ prepare_lock_switch(struct rq *rq, struct task_struct *next, struct rq_flags *rf
 	 * do an early lockdep release here:
 	 */
 	rq_unpin_lock(rq, rf);
-	spin_release(&rq->lock.dep_map, 1, _THIS_IP_);
+	spin_release(&rq->lock.dep_map, 1);
 #ifdef CONFIG_DEBUG_SPINLOCK
 	/* this is a valid case when another task releases the spinlock */
 	rq->lock.owner = next;
@@ -2571,7 +2571,7 @@ static inline void finish_lock_switch(struct rq *rq)
 	 * fix up the runqueue lock - which gets 'carried over' from
 	 * prev into current:
 	 */
-	spin_acquire(&rq->lock.dep_map, 0, 0, _THIS_IP_);
+	spin_acquire(&rq->lock.dep_map, 0, 0);
 	raw_spin_unlock_irq(&rq->lock);
 }
 
diff --git a/lib/locking-selftest.c b/lib/locking-selftest.c
index 1e1bbf171eca..d9599c7d0426 100644
--- a/lib/locking-selftest.c
+++ b/lib/locking-selftest.c
@@ -1475,7 +1475,7 @@ static void ww_test_edeadlk_normal(void)
 
 	mutex_lock(&o2.base);
 	o2.ctx = &t2;
-	mutex_release(&o2.base.dep_map, 1, _THIS_IP_);
+	mutex_release(&o2.base.dep_map, 1);
 
 	WWAI(&t);
 	t2 = t;
@@ -1488,7 +1488,7 @@ static void ww_test_edeadlk_normal(void)
 	WARN_ON(ret != -EDEADLK);
 
 	o2.ctx = NULL;
-	mutex_acquire(&o2.base.dep_map, 0, 1, _THIS_IP_);
+	mutex_acquire(&o2.base.dep_map, 0, 1);
 	mutex_unlock(&o2.base);
 	WWU(&o);
 
@@ -1500,7 +1500,7 @@ static void ww_test_edeadlk_normal_slow(void)
 	int ret;
 
 	mutex_lock(&o2.base);
-	mutex_release(&o2.base.dep_map, 1, _THIS_IP_);
+	mutex_release(&o2.base.dep_map, 1);
 	o2.ctx = &t2;
 
 	WWAI(&t);
@@ -1514,7 +1514,7 @@ static void ww_test_edeadlk_normal_slow(void)
 	WARN_ON(ret != -EDEADLK);
 
 	o2.ctx = NULL;
-	mutex_acquire(&o2.base.dep_map, 0, 1, _THIS_IP_);
+	mutex_acquire(&o2.base.dep_map, 0, 1);
 	mutex_unlock(&o2.base);
 	WWU(&o);
 
@@ -1527,7 +1527,7 @@ static void ww_test_edeadlk_no_unlock(void)
 
 	mutex_lock(&o2.base);
 	o2.ctx = &t2;
-	mutex_release(&o2.base.dep_map, 1, _THIS_IP_);
+	mutex_release(&o2.base.dep_map, 1);
 
 	WWAI(&t);
 	t2 = t;
@@ -1540,7 +1540,7 @@ static void ww_test_edeadlk_no_unlock(void)
 	WARN_ON(ret != -EDEADLK);
 
 	o2.ctx = NULL;
-	mutex_acquire(&o2.base.dep_map, 0, 1, _THIS_IP_);
+	mutex_acquire(&o2.base.dep_map, 0, 1);
 	mutex_unlock(&o2.base);
 
 	WWL(&o2, &t);
@@ -1551,7 +1551,7 @@ static void ww_test_edeadlk_no_unlock_slow(void)
 	int ret;
 
 	mutex_lock(&o2.base);
-	mutex_release(&o2.base.dep_map, 1, _THIS_IP_);
+	mutex_release(&o2.base.dep_map, 1);
 	o2.ctx = &t2;
 
 	WWAI(&t);
@@ -1565,7 +1565,7 @@ static void ww_test_edeadlk_no_unlock_slow(void)
 	WARN_ON(ret != -EDEADLK);
 
 	o2.ctx = NULL;
-	mutex_acquire(&o2.base.dep_map, 0, 1, _THIS_IP_);
+	mutex_acquire(&o2.base.dep_map, 0, 1);
 	mutex_unlock(&o2.base);
 
 	ww_mutex_lock_slow(&o2, &t);
@@ -1576,7 +1576,7 @@ static void ww_test_edeadlk_acquire_more(void)
 	int ret;
 
 	mutex_lock(&o2.base);
-	mutex_release(&o2.base.dep_map, 1, _THIS_IP_);
+	mutex_release(&o2.base.dep_map, 1);
 	o2.ctx = &t2;
 
 	WWAI(&t);
@@ -1597,7 +1597,7 @@ static void ww_test_edeadlk_acquire_more_slow(void)
 	int ret;
 
 	mutex_lock(&o2.base);
-	mutex_release(&o2.base.dep_map, 1, _THIS_IP_);
+	mutex_release(&o2.base.dep_map, 1);
 	o2.ctx = &t2;
 
 	WWAI(&t);
@@ -1618,11 +1618,11 @@ static void ww_test_edeadlk_acquire_more_edeadlk(void)
 	int ret;
 
 	mutex_lock(&o2.base);
-	mutex_release(&o2.base.dep_map, 1, _THIS_IP_);
+	mutex_release(&o2.base.dep_map, 1);
 	o2.ctx = &t2;
 
 	mutex_lock(&o3.base);
-	mutex_release(&o3.base.dep_map, 1, _THIS_IP_);
+	mutex_release(&o3.base.dep_map, 1);
 	o3.ctx = &t2;
 
 	WWAI(&t);
@@ -1644,11 +1644,11 @@ static void ww_test_edeadlk_acquire_more_edeadlk_slow(void)
 	int ret;
 
 	mutex_lock(&o2.base);
-	mutex_release(&o2.base.dep_map, 1, _THIS_IP_);
+	mutex_release(&o2.base.dep_map, 1);
 	o2.ctx = &t2;
 
 	mutex_lock(&o3.base);
-	mutex_release(&o3.base.dep_map, 1, _THIS_IP_);
+	mutex_release(&o3.base.dep_map, 1);
 	o3.ctx = &t2;
 
 	WWAI(&t);
@@ -1669,7 +1669,7 @@ static void ww_test_edeadlk_acquire_wrong(void)
 	int ret;
 
 	mutex_lock(&o2.base);
-	mutex_release(&o2.base.dep_map, 1, _THIS_IP_);
+	mutex_release(&o2.base.dep_map, 1);
 	o2.ctx = &t2;
 
 	WWAI(&t);
@@ -1694,7 +1694,7 @@ static void ww_test_edeadlk_acquire_wrong_slow(void)
 	int ret;
 
 	mutex_lock(&o2.base);
-	mutex_release(&o2.base.dep_map, 1, _THIS_IP_);
+	mutex_release(&o2.base.dep_map, 1);
 	o2.ctx = &t2;
 
 	WWAI(&t);
H. Peter Anvin Aug. 27, 2018, 1:33 p.m. UTC | #14
On 08/27/18 06:11, Peter Zijlstra wrote:
> On Mon, Aug 27, 2018 at 05:26:53AM -0700, H. Peter Anvin wrote:
> 
>> _THIS_IP_, however, is completely ill-defined, other than being an
>> address *somewhere* in the same global function (not even necessarily
>> the same function if the function is static!)  As my experiment show, in
>> many (nearly) cases gcc will hoist the address all the way to the top of
>> the function, at least for the current generic implementation.
> 
> It seems to have mostly worked so far... did anything change?
> 

Most likely because the major architectures contain a arch-specific
assembly implementation.  The generic implementation used in some places
is completely broken, as my experiments show.

>> For the case where _THIS_IP_ is passed to an out-of-line function in all
>> cases, it is extra pointless because all it does is increase the
>> footprint of every caller: _RET_IP_ is inherently passed to the function
>> anyway, and with tailcall protection it will uniquely identify a callsite.
> 
> So I think we can convert many of the lockdep _THIS_IP_ calls to
> _RET_IP_ on the other side, with a wee bit of care.
> 
> A little something like so perhaps...

I don't have time to look at this right now (I'm on sabbatical, and I'm
dealing with personal legal stuff right at the moment), but I think it
is the right direction.

	-hpa
Nick Desaulniers Aug. 31, 2018, 4:48 p.m. UTC | #15
On Mon, Aug 27, 2018 at 6:34 AM H. Peter Anvin <hpa@zytor.com> wrote:
>
> On 08/27/18 06:11, Peter Zijlstra wrote:
> > On Mon, Aug 27, 2018 at 05:26:53AM -0700, H. Peter Anvin wrote:
> >
> >> _THIS_IP_, however, is completely ill-defined, other than being an
> >> address *somewhere* in the same global function (not even necessarily
> >> the same function if the function is static!)  As my experiment show, in
> >> many (nearly) cases gcc will hoist the address all the way to the top of
> >> the function, at least for the current generic implementation.
> >
> > It seems to have mostly worked so far... did anything change?
> >
>
> Most likely because the major architectures contain a arch-specific
> assembly implementation.  The generic implementation used in some places
> is completely broken, as my experiments show.
>
> >> For the case where _THIS_IP_ is passed to an out-of-line function in all
> >> cases, it is extra pointless because all it does is increase the
> >> footprint of every caller: _RET_IP_ is inherently passed to the function
> >> anyway, and with tailcall protection it will uniquely identify a callsite.
> >
> > So I think we can convert many of the lockdep _THIS_IP_ calls to
> > _RET_IP_ on the other side, with a wee bit of care.
> >
> > A little something like so perhaps...
>
> I don't have time to look at this right now (I'm on sabbatical, and I'm
> dealing with personal legal stuff right at the moment), but I think it
> is the right direction.
>
>         -hpa

Linus,
Can this patch please be merged?  Then we can polish off Peter's
change to lockdep to not even use _THIS_IP_.
diff mbox series

Patch

diff --git a/arch/alpha/include/asm/processor.h b/arch/alpha/include/asm/processor.h
index cb05d045efe3..6100431da07a 100644
--- a/arch/alpha/include/asm/processor.h
+++ b/arch/alpha/include/asm/processor.h
@@ -10,12 +10,6 @@ 
 
 #include <linux/personality.h>	/* for ADDR_LIMIT_32BIT */
 
-/*
- * Returns current instruction pointer ("program counter").
- */
-#define current_text_addr() \
-  ({ void *__pc; __asm__ ("br %0,.+4" : "=r"(__pc)); __pc; })
-
 /*
  * We have a 42-bit user address space: 4TB user VM...
  */
diff --git a/arch/arc/include/asm/processor.h b/arch/arc/include/asm/processor.h
index 8ee41e988169..10346d6cf926 100644
--- a/arch/arc/include/asm/processor.h
+++ b/arch/arc/include/asm/processor.h
@@ -98,14 +98,6 @@  extern void start_thread(struct pt_regs * regs, unsigned long pc,
 
 extern unsigned int get_wchan(struct task_struct *p);
 
-/*
- * Default implementation of macro that returns current
- * instruction pointer ("program counter").
- * Should the PC register be read instead ? This macro does not seem to
- * be used in many places so this wont be all that bad.
- */
-#define current_text_addr() ({ __label__ _l; _l: &&_l; })
-
 #endif /* !__ASSEMBLY__ */
 
 /*
diff --git a/arch/arm/include/asm/processor.h b/arch/arm/include/asm/processor.h
index 1bf65b47808a..120f4c9bbfde 100644
--- a/arch/arm/include/asm/processor.h
+++ b/arch/arm/include/asm/processor.h
@@ -11,12 +11,6 @@ 
 #ifndef __ASM_ARM_PROCESSOR_H
 #define __ASM_ARM_PROCESSOR_H
 
-/*
- * Default implementation of macro that returns current
- * instruction pointer ("program counter").
- */
-#define current_text_addr() ({ __label__ _l; _l: &&_l;})
-
 #ifdef __KERNEL__
 
 #include <asm/hw_breakpoint.h>
diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
index 79657ad91397..966214f473b4 100644
--- a/arch/arm64/include/asm/processor.h
+++ b/arch/arm64/include/asm/processor.h
@@ -25,13 +25,6 @@ 
 #define USER_DS		(TASK_SIZE_64 - 1)
 
 #ifndef __ASSEMBLY__
-
-/*
- * Default implementation of macro that returns current
- * instruction pointer ("program counter").
- */
-#define current_text_addr() ({ __label__ _l; _l: &&_l;})
-
 #ifdef __KERNEL__
 
 #include <linux/build_bug.h>
diff --git a/arch/c6x/include/asm/processor.h b/arch/c6x/include/asm/processor.h
index 8f7cce829f8e..a8581f5b27f6 100644
--- a/arch/c6x/include/asm/processor.h
+++ b/arch/c6x/include/asm/processor.h
@@ -17,17 +17,6 @@ 
 #include <asm/page.h>
 #include <asm/current.h>
 
-/*
- * Default implementation of macro that returns current
- * instruction pointer ("program counter").
- */
-#define current_text_addr()			\
-({						\
-	void *__pc;				\
-	asm("mvc .S2 pce1,%0\n" : "=b"(__pc));	\
-	__pc;					\
-})
-
 /*
  * User space process size. This is mostly meaningless for NOMMU
  * but some C6X processors may have RAM addresses up to 0xFFFFFFFF.
diff --git a/arch/h8300/include/asm/processor.h b/arch/h8300/include/asm/processor.h
index 985346393e4a..a060b41b2d31 100644
--- a/arch/h8300/include/asm/processor.h
+++ b/arch/h8300/include/asm/processor.h
@@ -12,12 +12,6 @@ 
 #ifndef __ASM_H8300_PROCESSOR_H
 #define __ASM_H8300_PROCESSOR_H
 
-/*
- * Default implementation of macro that returns current
- * instruction pointer ("program counter").
- */
-#define current_text_addr() ({ __label__ _l; _l: &&_l; })
-
 #include <linux/compiler.h>
 #include <asm/segment.h>
 #include <asm/ptrace.h>
diff --git a/arch/hexagon/include/asm/processor.h b/arch/hexagon/include/asm/processor.h
index ce67940860a5..227bcb9cfdac 100644
--- a/arch/hexagon/include/asm/processor.h
+++ b/arch/hexagon/include/asm/processor.h
@@ -27,9 +27,6 @@ 
 #include <asm/registers.h>
 #include <asm/hexagon_vm.h>
 
-/*  must be a macro  */
-#define current_text_addr() ({ __label__ _l; _l: &&_l; })
-
 /*  task_struct, defined elsewhere, is the "process descriptor" */
 struct task_struct;
 
diff --git a/arch/ia64/include/asm/processor.h b/arch/ia64/include/asm/processor.h
index 10061ccf0440..c91ef98ed6bf 100644
--- a/arch/ia64/include/asm/processor.h
+++ b/arch/ia64/include/asm/processor.h
@@ -602,12 +602,6 @@  ia64_set_unat (__u64 *unat, void *spill_addr, unsigned long nat)
 	*unat = (*unat & ~mask) | (nat << bit);
 }
 
-/*
- * Get the current instruction/program counter value.
- */
-#define current_text_addr() \
-	({ void *_pc; _pc = (void *)ia64_getreg(_IA64_REG_IP); _pc; })
-
 static inline __u64
 ia64_get_ivr (void)
 {
diff --git a/arch/m68k/include/asm/processor.h b/arch/m68k/include/asm/processor.h
index 464e9f5f50ee..3750819ac5a1 100644
--- a/arch/m68k/include/asm/processor.h
+++ b/arch/m68k/include/asm/processor.h
@@ -8,12 +8,6 @@ 
 #ifndef __ASM_M68K_PROCESSOR_H
 #define __ASM_M68K_PROCESSOR_H
 
-/*
- * Default implementation of macro that returns current
- * instruction pointer ("program counter").
- */
-#define current_text_addr() ({ __label__ _l; _l: &&_l;})
-
 #include <linux/thread_info.h>
 #include <asm/segment.h>
 #include <asm/fpu.h>
diff --git a/arch/microblaze/include/asm/processor.h b/arch/microblaze/include/asm/processor.h
index 330d556860ba..66b537b8d138 100644
--- a/arch/microblaze/include/asm/processor.h
+++ b/arch/microblaze/include/asm/processor.h
@@ -45,12 +45,6 @@  extern void ret_from_kernel_thread(void);
  */
 # define TASK_SIZE	(0x81000000 - 0x80000000)
 
-/*
- * Default implementation of macro that returns current
- * instruction pointer ("program counter").
- */
-# define current_text_addr() ({ __label__ _l; _l: &&_l; })
-
 /*
  * This decides where the kernel will search for a free chunk of vm
  * space during mmap's. We won't be using it
@@ -92,12 +86,6 @@  extern unsigned long get_wchan(struct task_struct *p);
 
 #  ifndef __ASSEMBLY__
 
-/*
- * Default implementation of macro that returns current
- * instruction pointer ("program counter").
- */
-#  define current_text_addr()	({ __label__ _l; _l: &&_l; })
-
 /* If you change this, you must change the associated assembly-languages
  * constants defined below, THREAD_*.
  */
diff --git a/arch/mips/include/asm/processor.h b/arch/mips/include/asm/processor.h
index b2fa62922d88..f08417f8772e 100644
--- a/arch/mips/include/asm/processor.h
+++ b/arch/mips/include/asm/processor.h
@@ -22,11 +22,6 @@ 
 #include <asm/mipsregs.h>
 #include <asm/prefetch.h>
 
-/*
- * Return current * instruction pointer ("program counter").
- */
-#define current_text_addr() ({ __label__ _l; _l: &&_l;})
-
 /*
  * System setup and hardware flags..
  */
diff --git a/arch/nds32/include/asm/processor.h b/arch/nds32/include/asm/processor.h
index 9c83caf4269f..c2660f566bac 100644
--- a/arch/nds32/include/asm/processor.h
+++ b/arch/nds32/include/asm/processor.h
@@ -4,12 +4,6 @@ 
 #ifndef __ASM_NDS32_PROCESSOR_H
 #define __ASM_NDS32_PROCESSOR_H
 
-/*
- * Default implementation of macro that returns current
- * instruction pointer ("program counter").
- */
-#define current_text_addr() ({ __label__ _l; _l: &&_l;})
-
 #ifdef __KERNEL__
 
 #include <asm/ptrace.h>
diff --git a/arch/nios2/include/asm/processor.h b/arch/nios2/include/asm/processor.h
index 4944e2e1d8b0..94bcb86f679f 100644
--- a/arch/nios2/include/asm/processor.h
+++ b/arch/nios2/include/asm/processor.h
@@ -38,12 +38,6 @@ 
 #define KUSER_SIZE		(PAGE_SIZE)
 #ifndef __ASSEMBLY__
 
-/*
- * Default implementation of macro that returns current
- * instruction pointer ("program counter").
- */
-#define current_text_addr() ({ __label__ _l; _l: &&_l; })
-
 # define TASK_SIZE		0x7FFF0000UL
 # define TASK_UNMAPPED_BASE	(PAGE_ALIGN(TASK_SIZE / 3))
 
diff --git a/arch/openrisc/include/asm/processor.h b/arch/openrisc/include/asm/processor.h
index af31a9fe736a..351d3aed7a06 100644
--- a/arch/openrisc/include/asm/processor.h
+++ b/arch/openrisc/include/asm/processor.h
@@ -30,11 +30,6 @@ 
 		   | SPR_SR_DCE | SPR_SR_SM)
 #define USER_SR   (SPR_SR_DME | SPR_SR_IME | SPR_SR_ICE \
 		   | SPR_SR_DCE | SPR_SR_IEE | SPR_SR_TEE)
-/*
- * Default implementation of macro that returns current
- * instruction pointer ("program counter").
- */
-#define current_text_addr() ({ __label__ _l; _l: &&_l; })
 
 /*
  * User space process size. This is hardcoded into a few places,
diff --git a/arch/parisc/include/asm/processor.h b/arch/parisc/include/asm/processor.h
index 2dbe5580a1a4..0d7f64ef9c7d 100644
--- a/arch/parisc/include/asm/processor.h
+++ b/arch/parisc/include/asm/processor.h
@@ -20,17 +20,6 @@ 
 #include <asm/percpu.h>
 #endif /* __ASSEMBLY__ */
 
-/*
- * Default implementation of macro that returns current
- * instruction pointer ("program counter").
- */
-#ifdef CONFIG_PA20
-#define current_ia(x)	__asm__("mfia %0" : "=r"(x))
-#else /* mfia added in pa2.0 */
-#define current_ia(x)	__asm__("blr 0,%0\n\tnop" : "=r"(x))
-#endif
-#define current_text_addr() ({ void *pc; current_ia(pc); pc; })
-
 #define HAVE_ARCH_PICK_MMAP_LAYOUT
 
 #define TASK_SIZE_OF(tsk)       ((tsk)->thread.task_size)
diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h
index 52fadded5c1e..1fff74df06e6 100644
--- a/arch/powerpc/include/asm/processor.h
+++ b/arch/powerpc/include/asm/processor.h
@@ -67,12 +67,6 @@  extern int _chrp_type;
 
 #endif /* defined(__KERNEL__) && defined(CONFIG_PPC32) */
 
-/*
- * Default implementation of macro that returns current
- * instruction pointer ("program counter").
- */
-#define current_text_addr() ({ __label__ _l; _l: &&_l;})
-
 /* Macros for adjusting thread priority (hardware multi-threading) */
 #define HMT_very_low()   asm volatile("or 31,31,31   # very low priority")
 #define HMT_low()	 asm volatile("or 1,1,1	     # low priority")
diff --git a/arch/riscv/include/asm/processor.h b/arch/riscv/include/asm/processor.h
index 3fe4af8147d2..020e35947060 100644
--- a/arch/riscv/include/asm/processor.h
+++ b/arch/riscv/include/asm/processor.h
@@ -33,12 +33,6 @@ 
 struct task_struct;
 struct pt_regs;
 
-/*
- * Default implementation of macro that returns current
- * instruction pointer ("program counter").
- */
-#define current_text_addr()	({ __label__ _l; _l: &&_l; })
-
 /* CPU-specific state of a task */
 struct thread_struct {
 	/* Callee-saved registers */
diff --git a/arch/s390/include/asm/processor.h b/arch/s390/include/asm/processor.h
index 7f2953c15c37..f8028d37bb18 100644
--- a/arch/s390/include/asm/processor.h
+++ b/arch/s390/include/asm/processor.h
@@ -73,12 +73,6 @@  static inline int test_cpu_flag_of(int flag, int cpu)
 
 #define arch_needs_cpu() test_cpu_flag(CIF_NOHZ_DELAY)
 
-/*
- * Default implementation of macro that returns current
- * instruction pointer ("program counter").
- */
-#define current_text_addr() ({ void *pc; asm("basr %0,0" : "=a" (pc)); pc; })
-
 static inline void get_cpu_id(struct cpuid *ptr)
 {
 	asm volatile("stidp %0" : "=Q" (*ptr));
diff --git a/arch/sh/include/asm/processor_32.h b/arch/sh/include/asm/processor_32.h
index 95100d8a0b7b..0e0ecc0132e3 100644
--- a/arch/sh/include/asm/processor_32.h
+++ b/arch/sh/include/asm/processor_32.h
@@ -16,12 +16,6 @@ 
 #include <asm/types.h>
 #include <asm/hw_breakpoint.h>
 
-/*
- * Default implementation of macro that returns current
- * instruction pointer ("program counter").
- */
-#define current_text_addr() ({ void *pc; __asm__("mova	1f, %0\n.align 2\n1:":"=z" (pc)); pc; })
-
 /* Core Processor Version Register */
 #define CCN_PVR		0xff000030
 #define CCN_CVR		0xff000040
diff --git a/arch/sh/include/asm/processor_64.h b/arch/sh/include/asm/processor_64.h
index 777a16318aff..f3d7075648d0 100644
--- a/arch/sh/include/asm/processor_64.h
+++ b/arch/sh/include/asm/processor_64.h
@@ -19,21 +19,6 @@ 
 #include <asm/types.h>
 #include <cpu/registers.h>
 
-/*
- * Default implementation of macro that returns current
- * instruction pointer ("program counter").
- */
-#define current_text_addr() ({ \
-void *pc; \
-unsigned long long __dummy = 0; \
-__asm__("gettr	tr0, %1\n\t" \
-	"pta	4, tr0\n\t" \
-	"gettr	tr0, %0\n\t" \
-	"ptabs	%1, tr0\n\t"	\
-	:"=r" (pc), "=r" (__dummy) \
-	: "1" (__dummy)); \
-pc; })
-
 #endif
 
 /*
diff --git a/arch/sparc/include/asm/processor_32.h b/arch/sparc/include/asm/processor_32.h
index 192493c257fa..3c4bc2189092 100644
--- a/arch/sparc/include/asm/processor_32.h
+++ b/arch/sparc/include/asm/processor_32.h
@@ -7,12 +7,6 @@ 
 #ifndef __ASM_SPARC_PROCESSOR_H
 #define __ASM_SPARC_PROCESSOR_H
 
-/*
- * Sparc32 implementation of macro that returns current
- * instruction pointer ("program counter").
- */
-#define current_text_addr() ({ void *pc; __asm__("sethi %%hi(1f), %0; or %0, %%lo(1f), %0;\n1:" : "=r" (pc)); pc; })
-
 #include <asm/psr.h>
 #include <asm/ptrace.h>
 #include <asm/head.h>
diff --git a/arch/sparc/include/asm/processor_64.h b/arch/sparc/include/asm/processor_64.h
index aac23d4a4ddd..5cf145f18f36 100644
--- a/arch/sparc/include/asm/processor_64.h
+++ b/arch/sparc/include/asm/processor_64.h
@@ -8,12 +8,6 @@ 
 #ifndef __ASM_SPARC64_PROCESSOR_H
 #define __ASM_SPARC64_PROCESSOR_H
 
-/*
- * Sparc64 implementation of macro that returns current
- * instruction pointer ("program counter").
- */
-#define current_text_addr() ({ void *pc; __asm__("rd %%pc, %0" : "=r" (pc)); pc; })
-
 #include <asm/asi.h>
 #include <asm/pstate.h>
 #include <asm/ptrace.h>
diff --git a/arch/unicore32/include/asm/processor.h b/arch/unicore32/include/asm/processor.h
index 4eaa42167667..b772ed1c0f25 100644
--- a/arch/unicore32/include/asm/processor.h
+++ b/arch/unicore32/include/asm/processor.h
@@ -13,12 +13,6 @@ 
 #ifndef __UNICORE_PROCESSOR_H__
 #define __UNICORE_PROCESSOR_H__
 
-/*
- * Default implementation of macro that returns current
- * instruction pointer ("program counter").
- */
-#define current_text_addr() ({ __label__ _l; _l: &&_l; })
-
 #ifdef __KERNEL__
 
 #include <asm/ptrace.h>
diff --git a/arch/x86/include/asm/kexec.h b/arch/x86/include/asm/kexec.h
index f327236f0fa7..86924d594ecd 100644
--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -21,6 +21,7 @@ 
 #ifndef __ASSEMBLY__
 
 #include <linux/string.h>
+#include <linux/kernel.h>
 
 #include <asm/page.h>
 #include <asm/ptrace.h>
@@ -132,7 +133,7 @@  static inline void crash_setup_regs(struct pt_regs *newregs,
 		asm volatile("movl %%cs, %%eax;" :"=a"(newregs->cs));
 		asm volatile("pushfq; popq %0" :"=m"(newregs->flags));
 #endif
-		newregs->ip = (unsigned long)current_text_addr();
+		newregs->ip = _THIS_IP_;
 	}
 }
 
diff --git a/arch/x86/include/asm/processor.h b/arch/x86/include/asm/processor.h
index 682286aca881..20080b303605 100644
--- a/arch/x86/include/asm/processor.h
+++ b/arch/x86/include/asm/processor.h
@@ -42,18 +42,6 @@  struct vm86;
 #define NET_IP_ALIGN	0
 
 #define HBP_NUM 4
-/*
- * Default implementation of macro that returns current
- * instruction pointer ("program counter").
- */
-static inline void *current_text_addr(void)
-{
-	void *pc;
-
-	asm volatile("mov $1f, %0; 1:":"=r" (pc));
-
-	return pc;
-}
 
 /*
  * These alignment constraints are for performance in the vSMP case,
diff --git a/arch/x86/um/asm/processor_32.h b/arch/x86/um/asm/processor_32.h
index c112de81c9e1..5fb1b8449adf 100644
--- a/arch/x86/um/asm/processor_32.h
+++ b/arch/x86/um/asm/processor_32.h
@@ -47,14 +47,6 @@  static inline void arch_copy_thread(struct arch_thread *from,
         memcpy(&to->tls_array, &from->tls_array, sizeof(from->tls_array));
 }
 
-/*
- * Default implementation of macro that returns current
- * instruction pointer ("program counter"). Stolen
- * from asm-i386/processor.h
- */
-#define current_text_addr() \
-	({ void *pc; __asm__("movl $1f,%0\n1:":"=g" (pc)); pc; })
-
 #define current_sp() ({ void *sp; __asm__("movl %%esp, %0" : "=r" (sp) : ); sp; })
 #define current_bp() ({ unsigned long bp; __asm__("movl %%ebp, %0" : "=r" (bp) : ); bp; })
 
diff --git a/arch/x86/um/asm/processor_64.h b/arch/x86/um/asm/processor_64.h
index c3be85205a65..1ef9c21877bc 100644
--- a/arch/x86/um/asm/processor_64.h
+++ b/arch/x86/um/asm/processor_64.h
@@ -31,9 +31,6 @@  static inline void arch_copy_thread(struct arch_thread *from,
 	to->fs = from->fs;
 }
 
-#define current_text_addr() \
-	({ void *pc; __asm__("movq $1f,%0\n1:":"=g" (pc)); pc; })
-
 #define current_sp() ({ void *sp; __asm__("movq %%rsp, %0" : "=r" (sp) : ); sp; })
 #define current_bp() ({ unsigned long bp; __asm__("movq %%rbp, %0" : "=r" (bp) : ); bp; })
 
diff --git a/arch/xtensa/include/asm/processor.h b/arch/xtensa/include/asm/processor.h
index 5b0027d4ecc0..68891c992105 100644
--- a/arch/xtensa/include/asm/processor.h
+++ b/arch/xtensa/include/asm/processor.h
@@ -153,14 +153,6 @@  struct thread_struct {
 	int align[0] __attribute__ ((aligned(16)));
 };
 
-
-/*
- * Default implementation of macro that returns current
- * instruction pointer ("program counter").
- */
-#define current_text_addr()  ({ __label__ _l; _l: &&_l;})
-
-
 /* This decides where the kernel will search for a free chunk of vm
  * space during mmap's.
  */