diff mbox

[v3,1/3] arm64: arch_timer: Add device tree binding for A-008585 erratum

Message ID 1467412897-15220-1-git-send-email-oss@buserror.net
State Not Applicable, archived
Headers show

Commit Message

Crystal Wood July 1, 2016, 10:41 p.m. UTC
This erratum describes a bug in logic outside the core, so MIDR can't be
used to identify its presence, and reading an SoC-specific revision
register from common arch timer code would be awkward.  So, describe it
in the device tree.

Signed-off-by: Scott Wood <oss@buserror.net>
---
 Documentation/devicetree/bindings/arm/arch_timer.txt | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Will Deacon July 4, 2016, 9:58 a.m. UTC | #1
On Fri, Jul 01, 2016 at 05:41:37PM -0500, Scott Wood wrote:
> Erratum A-008585 says that the ARM generic timer counter "has the
> potential to contain an erroneous value for a small number of core
> clock cycles every time the timer value changes".  Accesses to TVAL
> (both read and write) are also affected due to the implicit counter
> read.  Accesses to CVAL are not affected.
> 
> The workaround is to reread TVAL and count registers until successive reads
> return the same value, and when writing TVAL to retry until counter
> reads before and after the write return the same value.
> 
> This erratum can be found on LS1043A and LS2080A.
> 
> Signed-off-by: Scott Wood <oss@buserror.net>
> ---
> v3:
> - Used cval rather than a loop for the write side of the erratum
> - Added a Kconfig control
> - Moved the device tree binding into its own patch
> - Added erratum to silicon-errata.txt
> - Changed function names to contain the erratum name
> - Factored out the setting of erratum versions of set_next_event
>   to improve readability
> - Added a comment clarifying that the timeout is arbitrary
> 
> v2:
> Significant rework based on feedback, including using static_key,
> disabling VDSO counter access rather than adding the workaround to the
> VDSO, and uninlining the loops.
> 
> Dropped the separate property for indicating that writes to TVAL are
> affected, as I believe that's just a side effect of the implicit
> counter read being corrupted, and thus a chip that is affected by one
> will always be affected by the other.
> 
> Dropped the arm32 portion as it seems there was confusion about whether
> LS1021A is affected.  Currently I am being told that it is not
> affected.
> 
> I considered writing to CVAL rather than looping on TVAL writes, but
> that would still have required separate set_next_event() code for the
> erratum, and adding CVAL to the enum would have required a bunch of
> extra handlers in switch statements (even where unused, due to compiler
> warnings about unhandled enum values) including in an arm32 header.  It
> seemed better to avoid the arm32 interaction and new untested
> accessors.
> ---
>  Documentation/arm64/silicon-errata.txt |   2 +
>  arch/arm64/include/asm/arch_timer.h    |  48 ++++++++++++---
>  drivers/clocksource/Kconfig            |  10 ++++
>  drivers/clocksource/arm_arch_timer.c   | 103 +++++++++++++++++++++++++++++++++
>  4 files changed, 154 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
> index ba4b6ac..5778f62 100644
> --- a/Documentation/arm64/silicon-errata.txt
> +++ b/Documentation/arm64/silicon-errata.txt
> @@ -57,3 +57,5 @@ stable kernels.
>  | Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375    |
>  | Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
>  | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456    |
> +|                |                 |                 |                         |
> +| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585     |
> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
> index fbe0ca3..70fbad9 100644
> --- a/arch/arm64/include/asm/arch_timer.h
> +++ b/arch/arm64/include/asm/arch_timer.h
> @@ -23,10 +23,34 @@
>  
>  #include <linux/bug.h>
>  #include <linux/init.h>
> +#include <linux/jump_label.h>
>  #include <linux/types.h>
>  
>  #include <clocksource/arm_arch_timer.h>
>  
> +extern struct static_key_false arch_timer_read_ool_enabled;
> +
> +#define ARCH_TIMER_REG_READ(reg, func) \
> +extern u64 func##_ool(void); \
> +static inline u64 __##func(void) \
> +{ \
> +	u64 val; \
> +	asm volatile("mrs %0, " reg : "=r" (val)); \
> +	return val; \
> +} \
> +static inline u64 _##func(void) \
> +{ \
> +	if (IS_ENABLED(CONFIG_FSL_ERRATUM_A008585) && \
> +	    static_branch_unlikely(&arch_timer_read_ool_enabled)) \
> +		return func##_ool(); \
> +	else \
> +		return __##func(); \
> +}
> +
> +ARCH_TIMER_REG_READ("cntp_tval_el0", arch_timer_get_ptval)
> +ARCH_TIMER_REG_READ("cntv_tval_el0", arch_timer_get_vtval)
> +ARCH_TIMER_REG_READ("cntvct_el0", arch_counter_get_cntvct)

It's probably worth #undefing the macro here, since it really shouldn't
grow any users outside of this header file.

> +
>  /*
>   * These register accessors are marked inline so the compiler can
>   * nicely work out which register we want, and chuck away the rest of
> @@ -58,6 +82,16 @@ void arch_timer_reg_write_cp15(int access, enum arch_timer_reg reg, u32 val)
>  	isb();
>  }
>  
> +static __always_inline void arch_timer_cval_write_cp15(int access, u64 val)
> +{
> +	if (access == ARCH_TIMER_PHYS_ACCESS)
> +		asm volatile("msr cntp_cval_el0, %0" : : "r" (val));
> +	else if (access == ARCH_TIMER_VIRT_ACCESS)
> +		asm volatile("msr cntv_cval_el0, %0" : : "r" (val));
> +
> +	isb();
> +}
> +
>  static __always_inline
>  u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>  {
> @@ -66,19 +100,19 @@ u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>  	if (access == ARCH_TIMER_PHYS_ACCESS) {
>  		switch (reg) {
>  		case ARCH_TIMER_REG_CTRL:
> -			asm volatile("mrs %0,  cntp_ctl_el0" : "=r" (val));
> +			asm volatile("mrs %0, cntp_ctl_el0" : "=r" (val));
>  			break;
>  		case ARCH_TIMER_REG_TVAL:
> -			asm volatile("mrs %0, cntp_tval_el0" : "=r" (val));
> +			val = _arch_timer_get_ptval();
>  			break;
>  		}
>  	} else if (access == ARCH_TIMER_VIRT_ACCESS) {
>  		switch (reg) {
>  		case ARCH_TIMER_REG_CTRL:
> -			asm volatile("mrs %0,  cntv_ctl_el0" : "=r" (val));
> +			asm volatile("mrs %0, cntv_ctl_el0" : "=r" (val));
>  			break;
>  		case ARCH_TIMER_REG_TVAL:
> -			asm volatile("mrs %0, cntv_tval_el0" : "=r" (val));
> +			val = _arch_timer_get_vtval();
>  			break;
>  		}
>  	}
> @@ -116,12 +150,8 @@ static inline u64 arch_counter_get_cntpct(void)
>  
>  static inline u64 arch_counter_get_cntvct(void)
>  {
> -	u64 cval;
> -
>  	isb();
> -	asm volatile("mrs %0, cntvct_el0" : "=r" (cval));
> -
> -	return cval;
> +	return _arch_counter_get_cntvct();
>  }
>  
>  static inline int arch_timer_arch_init(void)
> diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
> index c346be6..672ddc3 100644
> --- a/drivers/clocksource/Kconfig
> +++ b/drivers/clocksource/Kconfig
> @@ -207,6 +207,16 @@ config ARM_ARCH_TIMER_EVTSTREAM
>  	  This must be disabled for hardware validation purposes to detect any
>  	  hardware anomalies of missing events.
>  
> +config FSL_ERRATUM_A008585
> +	bool "Workaround for Freescale/NXP Erratum A-008585"
> +	default y
> +	depends on ARM_ARCH_TIMER && ARM64
> +	help
> +	  This option enables a workaround for Freescale/NXP Erratum
> +	  A-008585 ("ARM generic timer may contain an erroneous
> +	  value").  The workaround will only be active if the
> +	  fsl,erratum-a008585 property is found in the timer node.
> +
>  config ARM_GLOBAL_TIMER
>  	bool
>  	select CLKSRC_OF if OF
> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> index 5152b38..7ead4eb 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -83,6 +83,51 @@ static bool arch_timer_mem_use_virtual;
>   * Architected system timer support.
>   */
>  
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +DEFINE_STATIC_KEY_FALSE(arch_timer_read_ool_enabled);
> +EXPORT_SYMBOL_GPL(arch_timer_read_ool_enabled);
> +
> +/*
> + * __always_inline is used to ensure that func() is not an actual function
> + * pointer, which would result in the register accesses potentially being too
> + * far apart for the loop to work.
> + *
> + * The timeout is an arbitrary value well beyond the highest number
> + * of iterations the loop has been observed to take.
> + */
> +static __always_inline u64 fsl_a008585_reread_counter(u64 (*func)(void))
> +{
> +	u64 cval_old, cval_new;
> +	int timeout = 200;
> +
> +	do {
> +		isb();
> +		cval_old = func();
> +		cval_new = func();
> +		timeout--;
> +	} while (unlikely(cval_old != cval_new) && timeout);
> +
> +	WARN_ON_ONCE(!timeout);
> +	return cval_new;
> +}
> +
> +u64 arch_counter_get_cntvct_ool(void)
> +{
> +	return fsl_a008585_reread_counter(__arch_counter_get_cntvct);
> +}
> +
> +u64 arch_timer_get_vtval_ool(void)
> +{
> +	return fsl_a008585_reread_counter(__arch_timer_get_vtval);
> +}
> +
> +u64 arch_timer_get_ptval_ool(void)
> +{
> +	return fsl_a008585_reread_counter(__arch_timer_get_ptval);
> +}
> +
> +#endif /* CONFIG_FSL_ERRATUM_A008585 */
> +
>  static __always_inline
>  void arch_timer_reg_write(int access, enum arch_timer_reg reg, u32 val,
>  			  struct clock_event_device *clk)
> @@ -232,6 +277,35 @@ static __always_inline void set_next_event(const int access, unsigned long evt,
>  	arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl, clk);
>  }
>  
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +static __always_inline void fsl_a008585_set_next_event(const int access,
> +		unsigned long evt, struct clock_event_device *clk)
> +{
> +	unsigned long ctrl;
> +	u64 cval = evt + arch_counter_get_cntvct();
> +
> +	ctrl = arch_timer_reg_read(access, ARCH_TIMER_REG_CTRL, clk);
> +	ctrl |= ARCH_TIMER_CTRL_ENABLE;
> +	ctrl &= ~ARCH_TIMER_CTRL_IT_MASK;
> +	arch_timer_cval_write_cp15(access, cval);
> +	arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl, clk);
> +}
> +
> +static int fsl_a008585_set_next_event_virt(unsigned long evt,
> +					   struct clock_event_device *clk)
> +{
> +	fsl_a008585_set_next_event(ARCH_TIMER_VIRT_ACCESS, evt, clk);
> +	return 0;
> +}
> +
> +static int fsl_a008585_set_next_event_phys(unsigned long evt,
> +					   struct clock_event_device *clk)
> +{
> +	fsl_a008585_set_next_event(ARCH_TIMER_PHYS_ACCESS, evt, clk);
> +	return 0;
> +}
> +#endif /* CONFIG_FSL_ERRATUM_A008585 */
> +
>  static int arch_timer_set_next_event_virt(unsigned long evt,
>  					  struct clock_event_device *clk)
>  {
> @@ -260,6 +334,19 @@ static int arch_timer_set_next_event_phys_mem(unsigned long evt,
>  	return 0;
>  }
>  
> +static void fsl_a008585_set_sne(struct clock_event_device *clk)
> +{
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +	if (!static_branch_unlikely(&arch_timer_read_ool_enabled))
> +		return;
> +
> +	if (arch_timer_uses_ppi == VIRT_PPI)
> +		clk->set_next_event = fsl_a008585_set_next_event_virt;
> +	else
> +		clk->set_next_event = fsl_a008585_set_next_event_phys;
> +#endif
> +}
> +
>  static void __arch_timer_setup(unsigned type,
>  			       struct clock_event_device *clk)
>  {
> @@ -288,6 +375,8 @@ static void __arch_timer_setup(unsigned type,
>  		default:
>  			BUG();
>  		}
> +
> +		fsl_a008585_set_sne(clk);
>  	} else {
>  		clk->features |= CLOCK_EVT_FEAT_DYNIRQ;
>  		clk->name = "arch_mem_timer";
> @@ -485,6 +574,15 @@ static void __init arch_counter_register(unsigned type)
>  			arch_timer_read_counter = arch_counter_get_cntvct;
>  		else
>  			arch_timer_read_counter = arch_counter_get_cntpct;
> +
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +		/*
> +		 * Don't use the vdso fastpath if errata require using
> +		 * the out-of-line counter accessor.
> +		 */
> +		if (static_branch_unlikely(&arch_timer_read_ool_enabled))
> +			clocksource_counter.name = "arch_sys_counter_ool";

We could avoid this entirely by providing an arch_clocksource_data structure
like x86 does, and get rid of the ugly strcmp magic from update_vsyscall.

Will
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Rob Herring (Arm) July 5, 2016, 3:43 p.m. UTC | #2
On Fri, Jul 01, 2016 at 05:41:35PM -0500, Scott Wood wrote:
> This erratum describes a bug in logic outside the core, so MIDR can't be
> used to identify its presence, and reading an SoC-specific revision
> register from common arch timer code would be awkward.  So, describe it
> in the device tree.
> 
> Signed-off-by: Scott Wood <oss@buserror.net>
> ---
>  Documentation/devicetree/bindings/arm/arch_timer.txt | 6 ++++++
>  1 file changed, 6 insertions(+)

Acked-by: Rob Herring <robh@kernel.org>
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ding Tianhong July 7, 2016, 9:34 a.m. UTC | #3
On 2016/7/2 6:41, Scott Wood wrote:
> Erratum A-008585 says that the ARM generic timer counter "has the
> potential to contain an erroneous value for a small number of core
> clock cycles every time the timer value changes".  Accesses to TVAL
> (both read and write) are also affected due to the implicit counter
> read.  Accesses to CVAL are not affected.
> 
> The workaround is to reread TVAL and count registers until successive reads
> return the same value, and when writing TVAL to retry until counter
> reads before and after the write return the same value.
> 
> This erratum can be found on LS1043A and LS2080A.
> 
> Signed-off-by: Scott Wood <oss@buserror.net>
> ---
> v3:
> - Used cval rather than a loop for the write side of the erratum
> - Added a Kconfig control
> - Moved the device tree binding into its own patch
> - Added erratum to silicon-errata.txt
> - Changed function names to contain the erratum name
> - Factored out the setting of erratum versions of set_next_event
>   to improve readability
> - Added a comment clarifying that the timeout is arbitrary
> 
> v2:
> Significant rework based on feedback, including using static_key,
> disabling VDSO counter access rather than adding the workaround to the
> VDSO, and uninlining the loops.
> 
> Dropped the separate property for indicating that writes to TVAL are
> affected, as I believe that's just a side effect of the implicit
> counter read being corrupted, and thus a chip that is affected by one
> will always be affected by the other.
> 
> Dropped the arm32 portion as it seems there was confusion about whether
> LS1021A is affected.  Currently I am being told that it is not
> affected.
> 
> I considered writing to CVAL rather than looping on TVAL writes, but
> that would still have required separate set_next_event() code for the
> erratum, and adding CVAL to the enum would have required a bunch of
> extra handlers in switch statements (even where unused, due to compiler
> warnings about unhandled enum values) including in an arm32 header.  It
> seemed better to avoid the arm32 interaction and new untested
> accessors.
> ---
>  Documentation/arm64/silicon-errata.txt |   2 +
>  arch/arm64/include/asm/arch_timer.h    |  48 ++++++++++++---
>  drivers/clocksource/Kconfig            |  10 ++++
>  drivers/clocksource/arm_arch_timer.c   | 103 +++++++++++++++++++++++++++++++++
>  4 files changed, 154 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
> index ba4b6ac..5778f62 100644
> --- a/Documentation/arm64/silicon-errata.txt
> +++ b/Documentation/arm64/silicon-errata.txt
> @@ -57,3 +57,5 @@ stable kernels.
>  | Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375    |
>  | Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
>  | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456    |
> +|                |                 |                 |                         |
> +| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585     |
> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
> index fbe0ca3..70fbad9 100644
> --- a/arch/arm64/include/asm/arch_timer.h
> +++ b/arch/arm64/include/asm/arch_timer.h
> @@ -23,10 +23,34 @@
>  
>  #include <linux/bug.h>
>  #include <linux/init.h>
> +#include <linux/jump_label.h>
>  #include <linux/types.h>
>  
>  #include <clocksource/arm_arch_timer.h>
>  
> +extern struct static_key_false arch_timer_read_ool_enabled;
> +
> +#define ARCH_TIMER_REG_READ(reg, func) \
> +extern u64 func##_ool(void); \
> +static inline u64 __##func(void) \
> +{ \
> +	u64 val; \
> +	asm volatile("mrs %0, " reg : "=r" (val)); \
> +	return val; \
> +} \
> +static inline u64 _##func(void) \
> +{ \
> +	if (IS_ENABLED(CONFIG_FSL_ERRATUM_A008585) && \
> +	    static_branch_unlikely(&arch_timer_read_ool_enabled)) \
> +		return func##_ool(); \
> +	else \
> +		return __##func(); \
> +}
> +
> +ARCH_TIMER_REG_READ("cntp_tval_el0", arch_timer_get_ptval)
> +ARCH_TIMER_REG_READ("cntv_tval_el0", arch_timer_get_vtval)
> +ARCH_TIMER_REG_READ("cntvct_el0", arch_counter_get_cntvct)
> +
>  /*
>   * These register accessors are marked inline so the compiler can
>   * nicely work out which register we want, and chuck away the rest of
> @@ -58,6 +82,16 @@ void arch_timer_reg_write_cp15(int access, enum arch_timer_reg reg, u32 val)
>  	isb();
>  }
>  
> +static __always_inline void arch_timer_cval_write_cp15(int access, u64 val)
> +{
> +	if (access == ARCH_TIMER_PHYS_ACCESS)
> +		asm volatile("msr cntp_cval_el0, %0" : : "r" (val));
> +	else if (access == ARCH_TIMER_VIRT_ACCESS)
> +		asm volatile("msr cntv_cval_el0, %0" : : "r" (val));
> +
> +	isb();
> +}
> +
>  static __always_inline
>  u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>  {
> @@ -66,19 +100,19 @@ u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>  	if (access == ARCH_TIMER_PHYS_ACCESS) {
>  		switch (reg) {
>  		case ARCH_TIMER_REG_CTRL:
> -			asm volatile("mrs %0,  cntp_ctl_el0" : "=r" (val));
> +			asm volatile("mrs %0, cntp_ctl_el0" : "=r" (val));
>  			break;
>  		case ARCH_TIMER_REG_TVAL:
> -			asm volatile("mrs %0, cntp_tval_el0" : "=r" (val));
> +			val = _arch_timer_get_ptval();
>  			break;
>  		}
>  	} else if (access == ARCH_TIMER_VIRT_ACCESS) {
>  		switch (reg) {
>  		case ARCH_TIMER_REG_CTRL:
> -			asm volatile("mrs %0,  cntv_ctl_el0" : "=r" (val));
> +			asm volatile("mrs %0, cntv_ctl_el0" : "=r" (val));
>  			break;
>  		case ARCH_TIMER_REG_TVAL:
> -			asm volatile("mrs %0, cntv_tval_el0" : "=r" (val));
> +			val = _arch_timer_get_vtval();
>  			break;
>  		}
>  	}
> @@ -116,12 +150,8 @@ static inline u64 arch_counter_get_cntpct(void)
>  
>  static inline u64 arch_counter_get_cntvct(void)
>  {
> -	u64 cval;
> -
>  	isb();
> -	asm volatile("mrs %0, cntvct_el0" : "=r" (cval));
> -
> -	return cval;
> +	return _arch_counter_get_cntvct();
>  }
>  
>  static inline int arch_timer_arch_init(void)
> diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
> index c346be6..672ddc3 100644
> --- a/drivers/clocksource/Kconfig
> +++ b/drivers/clocksource/Kconfig
> @@ -207,6 +207,16 @@ config ARM_ARCH_TIMER_EVTSTREAM
>  	  This must be disabled for hardware validation purposes to detect any
>  	  hardware anomalies of missing events.
>  
> +config FSL_ERRATUM_A008585
> +	bool "Workaround for Freescale/NXP Erratum A-008585"
> +	default y
> +	depends on ARM_ARCH_TIMER && ARM64
> +	help
> +	  This option enables a workaround for Freescale/NXP Erratum
> +	  A-008585 ("ARM generic timer may contain an erroneous
> +	  value").  The workaround will only be active if the
> +	  fsl,erratum-a008585 property is found in the timer node.
> +
>  config ARM_GLOBAL_TIMER
>  	bool
>  	select CLKSRC_OF if OF
> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
> index 5152b38..7ead4eb 100644
> --- a/drivers/clocksource/arm_arch_timer.c
> +++ b/drivers/clocksource/arm_arch_timer.c
> @@ -83,6 +83,51 @@ static bool arch_timer_mem_use_virtual;
>   * Architected system timer support.
>   */
>  
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +DEFINE_STATIC_KEY_FALSE(arch_timer_read_ool_enabled);
> +EXPORT_SYMBOL_GPL(arch_timer_read_ool_enabled);
> +
> +/*
> + * __always_inline is used to ensure that func() is not an actual function
> + * pointer, which would result in the register accesses potentially being too
> + * far apart for the loop to work.
> + *
> + * The timeout is an arbitrary value well beyond the highest number
> + * of iterations the loop has been observed to take.
> + */
> +static __always_inline u64 fsl_a008585_reread_counter(u64 (*func)(void))
> +{
> +	u64 cval_old, cval_new;
> +	int timeout = 200;
> +
> +	do {
> +		isb();
> +		cval_old = func();
> +		cval_new = func();
> +		timeout--;
> +	} while (unlikely(cval_old != cval_new) && timeout);
> +
> +	WARN_ON_ONCE(!timeout);
> +	return cval_new;
> +}
Hi Scott:

I have test this patch, this solution looks will break the performance a little more than I expected.
it will have more than 10% that the cval will read again, we could sure that the cval_old always equal to the
cval_new in the normal circumstances, so I prefer this way:

	do {
		isb();
		cval_old = func();
		cval_new = func();
		timeout--;	
	} while (unlikely((cval_new - cval_old) >> 2) && timeout);

Thanks.
Ding

> +
> +u64 arch_counter_get_cntvct_ool(void)
> +{
> +	return fsl_a008585_reread_counter(__arch_counter_get_cntvct);
> +}
> +
> +u64 arch_timer_get_vtval_ool(void)
> +{
> +	return fsl_a008585_reread_counter(__arch_timer_get_vtval);
> +}
> +
> +u64 arch_timer_get_ptval_ool(void)
> +{
> +	return fsl_a008585_reread_counter(__arch_timer_get_ptval);
> +}
> +
> +#endif /* CONFIG_FSL_ERRATUM_A008585 */
> +
>  static __always_inline
>  void arch_timer_reg_write(int access, enum arch_timer_reg reg, u32 val,
>  			  struct clock_event_device *clk)
> @@ -232,6 +277,35 @@ static __always_inline void set_next_event(const int access, unsigned long evt,
>  	arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl, clk);
>  }
>  
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +static __always_inline void fsl_a008585_set_next_event(const int access,
> +		unsigned long evt, struct clock_event_device *clk)
> +{
> +	unsigned long ctrl;
> +	u64 cval = evt + arch_counter_get_cntvct();
> +
> +	ctrl = arch_timer_reg_read(access, ARCH_TIMER_REG_CTRL, clk);
> +	ctrl |= ARCH_TIMER_CTRL_ENABLE;
> +	ctrl &= ~ARCH_TIMER_CTRL_IT_MASK;
> +	arch_timer_cval_write_cp15(access, cval);
> +	arch_timer_reg_write(access, ARCH_TIMER_REG_CTRL, ctrl, clk);
> +}
> +
> +static int fsl_a008585_set_next_event_virt(unsigned long evt,
> +					   struct clock_event_device *clk)
> +{
> +	fsl_a008585_set_next_event(ARCH_TIMER_VIRT_ACCESS, evt, clk);
> +	return 0;
> +}
> +
> +static int fsl_a008585_set_next_event_phys(unsigned long evt,
> +					   struct clock_event_device *clk)
> +{
> +	fsl_a008585_set_next_event(ARCH_TIMER_PHYS_ACCESS, evt, clk);
> +	return 0;
> +}
> +#endif /* CONFIG_FSL_ERRATUM_A008585 */
> +
>  static int arch_timer_set_next_event_virt(unsigned long evt,
>  					  struct clock_event_device *clk)
>  {
> @@ -260,6 +334,19 @@ static int arch_timer_set_next_event_phys_mem(unsigned long evt,
>  	return 0;
>  }
>  
> +static void fsl_a008585_set_sne(struct clock_event_device *clk)
> +{
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +	if (!static_branch_unlikely(&arch_timer_read_ool_enabled))
> +		return;
> +
> +	if (arch_timer_uses_ppi == VIRT_PPI)
> +		clk->set_next_event = fsl_a008585_set_next_event_virt;
> +	else
> +		clk->set_next_event = fsl_a008585_set_next_event_phys;
> +#endif
> +}
> +
>  static void __arch_timer_setup(unsigned type,
>  			       struct clock_event_device *clk)
>  {
> @@ -288,6 +375,8 @@ static void __arch_timer_setup(unsigned type,
>  		default:
>  			BUG();
>  		}
> +
> +		fsl_a008585_set_sne(clk);
>  	} else {
>  		clk->features |= CLOCK_EVT_FEAT_DYNIRQ;
>  		clk->name = "arch_mem_timer";
> @@ -485,6 +574,15 @@ static void __init arch_counter_register(unsigned type)
>  			arch_timer_read_counter = arch_counter_get_cntvct;
>  		else
>  			arch_timer_read_counter = arch_counter_get_cntpct;
> +
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +		/*
> +		 * Don't use the vdso fastpath if errata require using
> +		 * the out-of-line counter accessor.
> +		 */
> +		if (static_branch_unlikely(&arch_timer_read_ool_enabled))
> +			clocksource_counter.name = "arch_sys_counter_ool";
> +#endif
>  	} else {
>  		arch_timer_read_counter = arch_counter_get_cntvct_mem;
>  
> @@ -763,6 +861,11 @@ static void __init arch_timer_of_init(struct device_node *np)
>  
>  	arch_timer_c3stop = !of_property_read_bool(np, "always-on");
>  
> +#ifdef CONFIG_FSL_ERRATUM_A008585
> +	if (of_property_read_bool(np, "fsl,erratum-a008585"))
> +		static_branch_enable(&arch_timer_read_ool_enabled);
> +#endif
> +
>  	/*
>  	 * If we cannot rely on firmware initializing the timer registers then
>  	 * we should use the physical timers instead.
> 


--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marc Zyngier July 7, 2016, 9:49 a.m. UTC | #4
On 07/07/16 10:34, Ding Tianhong wrote:
> On 2016/7/2 6:41, Scott Wood wrote:
>> Erratum A-008585 says that the ARM generic timer counter "has the
>> potential to contain an erroneous value for a small number of core
>> clock cycles every time the timer value changes".  Accesses to TVAL
>> (both read and write) are also affected due to the implicit counter
>> read.  Accesses to CVAL are not affected.
>>
>> The workaround is to reread TVAL and count registers until successive reads
>> return the same value, and when writing TVAL to retry until counter
>> reads before and after the write return the same value.
>>
>> This erratum can be found on LS1043A and LS2080A.
>>
>> Signed-off-by: Scott Wood <oss@buserror.net>
>> ---
>> v3:
>> - Used cval rather than a loop for the write side of the erratum
>> - Added a Kconfig control
>> - Moved the device tree binding into its own patch
>> - Added erratum to silicon-errata.txt
>> - Changed function names to contain the erratum name
>> - Factored out the setting of erratum versions of set_next_event
>>   to improve readability
>> - Added a comment clarifying that the timeout is arbitrary
>>
>> v2:
>> Significant rework based on feedback, including using static_key,
>> disabling VDSO counter access rather than adding the workaround to the
>> VDSO, and uninlining the loops.
>>
>> Dropped the separate property for indicating that writes to TVAL are
>> affected, as I believe that's just a side effect of the implicit
>> counter read being corrupted, and thus a chip that is affected by one
>> will always be affected by the other.
>>
>> Dropped the arm32 portion as it seems there was confusion about whether
>> LS1021A is affected.  Currently I am being told that it is not
>> affected.
>>
>> I considered writing to CVAL rather than looping on TVAL writes, but
>> that would still have required separate set_next_event() code for the
>> erratum, and adding CVAL to the enum would have required a bunch of
>> extra handlers in switch statements (even where unused, due to compiler
>> warnings about unhandled enum values) including in an arm32 header.  It
>> seemed better to avoid the arm32 interaction and new untested
>> accessors.
>> ---
>>  Documentation/arm64/silicon-errata.txt |   2 +
>>  arch/arm64/include/asm/arch_timer.h    |  48 ++++++++++++---
>>  drivers/clocksource/Kconfig            |  10 ++++
>>  drivers/clocksource/arm_arch_timer.c   | 103 +++++++++++++++++++++++++++++++++
>>  4 files changed, 154 insertions(+), 9 deletions(-)
>>
>> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
>> index ba4b6ac..5778f62 100644
>> --- a/Documentation/arm64/silicon-errata.txt
>> +++ b/Documentation/arm64/silicon-errata.txt
>> @@ -57,3 +57,5 @@ stable kernels.
>>  | Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375    |
>>  | Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
>>  | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456    |
>> +|                |                 |                 |                         |
>> +| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585     |
>> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
>> index fbe0ca3..70fbad9 100644
>> --- a/arch/arm64/include/asm/arch_timer.h
>> +++ b/arch/arm64/include/asm/arch_timer.h
>> @@ -23,10 +23,34 @@
>>  
>>  #include <linux/bug.h>
>>  #include <linux/init.h>
>> +#include <linux/jump_label.h>
>>  #include <linux/types.h>
>>  
>>  #include <clocksource/arm_arch_timer.h>
>>  
>> +extern struct static_key_false arch_timer_read_ool_enabled;
>> +
>> +#define ARCH_TIMER_REG_READ(reg, func) \
>> +extern u64 func##_ool(void); \
>> +static inline u64 __##func(void) \
>> +{ \
>> +	u64 val; \
>> +	asm volatile("mrs %0, " reg : "=r" (val)); \
>> +	return val; \
>> +} \
>> +static inline u64 _##func(void) \
>> +{ \
>> +	if (IS_ENABLED(CONFIG_FSL_ERRATUM_A008585) && \
>> +	    static_branch_unlikely(&arch_timer_read_ool_enabled)) \
>> +		return func##_ool(); \
>> +	else \
>> +		return __##func(); \
>> +}
>> +
>> +ARCH_TIMER_REG_READ("cntp_tval_el0", arch_timer_get_ptval)
>> +ARCH_TIMER_REG_READ("cntv_tval_el0", arch_timer_get_vtval)
>> +ARCH_TIMER_REG_READ("cntvct_el0", arch_counter_get_cntvct)
>> +
>>  /*
>>   * These register accessors are marked inline so the compiler can
>>   * nicely work out which register we want, and chuck away the rest of
>> @@ -58,6 +82,16 @@ void arch_timer_reg_write_cp15(int access, enum arch_timer_reg reg, u32 val)
>>  	isb();
>>  }
>>  
>> +static __always_inline void arch_timer_cval_write_cp15(int access, u64 val)
>> +{
>> +	if (access == ARCH_TIMER_PHYS_ACCESS)
>> +		asm volatile("msr cntp_cval_el0, %0" : : "r" (val));
>> +	else if (access == ARCH_TIMER_VIRT_ACCESS)
>> +		asm volatile("msr cntv_cval_el0, %0" : : "r" (val));
>> +
>> +	isb();
>> +}
>> +
>>  static __always_inline
>>  u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>>  {
>> @@ -66,19 +100,19 @@ u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>>  	if (access == ARCH_TIMER_PHYS_ACCESS) {
>>  		switch (reg) {
>>  		case ARCH_TIMER_REG_CTRL:
>> -			asm volatile("mrs %0,  cntp_ctl_el0" : "=r" (val));
>> +			asm volatile("mrs %0, cntp_ctl_el0" : "=r" (val));
>>  			break;
>>  		case ARCH_TIMER_REG_TVAL:
>> -			asm volatile("mrs %0, cntp_tval_el0" : "=r" (val));
>> +			val = _arch_timer_get_ptval();
>>  			break;
>>  		}
>>  	} else if (access == ARCH_TIMER_VIRT_ACCESS) {
>>  		switch (reg) {
>>  		case ARCH_TIMER_REG_CTRL:
>> -			asm volatile("mrs %0,  cntv_ctl_el0" : "=r" (val));
>> +			asm volatile("mrs %0, cntv_ctl_el0" : "=r" (val));
>>  			break;
>>  		case ARCH_TIMER_REG_TVAL:
>> -			asm volatile("mrs %0, cntv_tval_el0" : "=r" (val));
>> +			val = _arch_timer_get_vtval();
>>  			break;
>>  		}
>>  	}
>> @@ -116,12 +150,8 @@ static inline u64 arch_counter_get_cntpct(void)
>>  
>>  static inline u64 arch_counter_get_cntvct(void)
>>  {
>> -	u64 cval;
>> -
>>  	isb();
>> -	asm volatile("mrs %0, cntvct_el0" : "=r" (cval));
>> -
>> -	return cval;
>> +	return _arch_counter_get_cntvct();
>>  }
>>  
>>  static inline int arch_timer_arch_init(void)
>> diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
>> index c346be6..672ddc3 100644
>> --- a/drivers/clocksource/Kconfig
>> +++ b/drivers/clocksource/Kconfig
>> @@ -207,6 +207,16 @@ config ARM_ARCH_TIMER_EVTSTREAM
>>  	  This must be disabled for hardware validation purposes to detect any
>>  	  hardware anomalies of missing events.
>>  
>> +config FSL_ERRATUM_A008585
>> +	bool "Workaround for Freescale/NXP Erratum A-008585"
>> +	default y
>> +	depends on ARM_ARCH_TIMER && ARM64
>> +	help
>> +	  This option enables a workaround for Freescale/NXP Erratum
>> +	  A-008585 ("ARM generic timer may contain an erroneous
>> +	  value").  The workaround will only be active if the
>> +	  fsl,erratum-a008585 property is found in the timer node.
>> +
>>  config ARM_GLOBAL_TIMER
>>  	bool
>>  	select CLKSRC_OF if OF
>> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
>> index 5152b38..7ead4eb 100644
>> --- a/drivers/clocksource/arm_arch_timer.c
>> +++ b/drivers/clocksource/arm_arch_timer.c
>> @@ -83,6 +83,51 @@ static bool arch_timer_mem_use_virtual;
>>   * Architected system timer support.
>>   */
>>  
>> +#ifdef CONFIG_FSL_ERRATUM_A008585
>> +DEFINE_STATIC_KEY_FALSE(arch_timer_read_ool_enabled);
>> +EXPORT_SYMBOL_GPL(arch_timer_read_ool_enabled);
>> +
>> +/*
>> + * __always_inline is used to ensure that func() is not an actual function
>> + * pointer, which would result in the register accesses potentially being too
>> + * far apart for the loop to work.
>> + *
>> + * The timeout is an arbitrary value well beyond the highest number
>> + * of iterations the loop has been observed to take.
>> + */
>> +static __always_inline u64 fsl_a008585_reread_counter(u64 (*func)(void))
>> +{
>> +	u64 cval_old, cval_new;
>> +	int timeout = 200;
>> +
>> +	do {
>> +		isb();
>> +		cval_old = func();
>> +		cval_new = func();
>> +		timeout--;
>> +	} while (unlikely(cval_old != cval_new) && timeout);
>> +
>> +	WARN_ON_ONCE(!timeout);
>> +	return cval_new;
>> +}
> Hi Scott:
> 
> I have test this patch, this solution looks will break the performance a little more than I expected.
> it will have more than 10% that the cval will read again, we could sure that the cval_old always equal to the
> cval_new in the normal circumstances, so I prefer this way:
> 
> 	do {
> 		isb();
> 		cval_old = func();
> 		cval_new = func();
> 		timeout--;	
> 	} while (unlikely((cval_new - cval_old) >> 2) && timeout);

What makes you think that ignoring the two bottom bits is a safe thing
to do? Talking about performance when the HW has such a dramatic bug is
like putting a bigger engine on a car that has no brakes: you just hit
the wall quicker.

Thanks,

	M.
Ding Tianhong July 7, 2016, 11:37 a.m. UTC | #5
On 2016/7/7 17:49, Marc Zyngier wrote:
> On 07/07/16 10:34, Ding Tianhong wrote:
>> On 2016/7/2 6:41, Scott Wood wrote:
>>> Erratum A-008585 says that the ARM generic timer counter "has the
>>> potential to contain an erroneous value for a small number of core
>>> clock cycles every time the timer value changes".  Accesses to TVAL
>>> (both read and write) are also affected due to the implicit counter
>>> read.  Accesses to CVAL are not affected.
>>>
>>> The workaround is to reread TVAL and count registers until successive reads
>>> return the same value, and when writing TVAL to retry until counter
>>> reads before and after the write return the same value.
>>>
>>> This erratum can be found on LS1043A and LS2080A.
>>>
>>> Signed-off-by: Scott Wood <oss@buserror.net>
>>> ---
>>> v3:
>>> - Used cval rather than a loop for the write side of the erratum
>>> - Added a Kconfig control
>>> - Moved the device tree binding into its own patch
>>> - Added erratum to silicon-errata.txt
>>> - Changed function names to contain the erratum name
>>> - Factored out the setting of erratum versions of set_next_event
>>>   to improve readability
>>> - Added a comment clarifying that the timeout is arbitrary
>>>
>>> v2:
>>> Significant rework based on feedback, including using static_key,
>>> disabling VDSO counter access rather than adding the workaround to the
>>> VDSO, and uninlining the loops.
>>>
>>> Dropped the separate property for indicating that writes to TVAL are
>>> affected, as I believe that's just a side effect of the implicit
>>> counter read being corrupted, and thus a chip that is affected by one
>>> will always be affected by the other.
>>>
>>> Dropped the arm32 portion as it seems there was confusion about whether
>>> LS1021A is affected.  Currently I am being told that it is not
>>> affected.
>>>
>>> I considered writing to CVAL rather than looping on TVAL writes, but
>>> that would still have required separate set_next_event() code for the
>>> erratum, and adding CVAL to the enum would have required a bunch of
>>> extra handlers in switch statements (even where unused, due to compiler
>>> warnings about unhandled enum values) including in an arm32 header.  It
>>> seemed better to avoid the arm32 interaction and new untested
>>> accessors.
>>> ---
>>>  Documentation/arm64/silicon-errata.txt |   2 +
>>>  arch/arm64/include/asm/arch_timer.h    |  48 ++++++++++++---
>>>  drivers/clocksource/Kconfig            |  10 ++++
>>>  drivers/clocksource/arm_arch_timer.c   | 103 +++++++++++++++++++++++++++++++++
>>>  4 files changed, 154 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
>>> index ba4b6ac..5778f62 100644
>>> --- a/Documentation/arm64/silicon-errata.txt
>>> +++ b/Documentation/arm64/silicon-errata.txt
>>> @@ -57,3 +57,5 @@ stable kernels.
>>>  | Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375    |
>>>  | Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
>>>  | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456    |
>>> +|                |                 |                 |                         |
>>> +| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585     |
>>> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
>>> index fbe0ca3..70fbad9 100644
>>> --- a/arch/arm64/include/asm/arch_timer.h
>>> +++ b/arch/arm64/include/asm/arch_timer.h
>>> @@ -23,10 +23,34 @@
>>>  
>>>  #include <linux/bug.h>
>>>  #include <linux/init.h>
>>> +#include <linux/jump_label.h>
>>>  #include <linux/types.h>
>>>  
>>>  #include <clocksource/arm_arch_timer.h>
>>>  
>>> +extern struct static_key_false arch_timer_read_ool_enabled;
>>> +
>>> +#define ARCH_TIMER_REG_READ(reg, func) \
>>> +extern u64 func##_ool(void); \
>>> +static inline u64 __##func(void) \
>>> +{ \
>>> +	u64 val; \
>>> +	asm volatile("mrs %0, " reg : "=r" (val)); \
>>> +	return val; \
>>> +} \
>>> +static inline u64 _##func(void) \
>>> +{ \
>>> +	if (IS_ENABLED(CONFIG_FSL_ERRATUM_A008585) && \
>>> +	    static_branch_unlikely(&arch_timer_read_ool_enabled)) \
>>> +		return func##_ool(); \
>>> +	else \
>>> +		return __##func(); \
>>> +}
>>> +
>>> +ARCH_TIMER_REG_READ("cntp_tval_el0", arch_timer_get_ptval)
>>> +ARCH_TIMER_REG_READ("cntv_tval_el0", arch_timer_get_vtval)
>>> +ARCH_TIMER_REG_READ("cntvct_el0", arch_counter_get_cntvct)
>>> +
>>>  /*
>>>   * These register accessors are marked inline so the compiler can
>>>   * nicely work out which register we want, and chuck away the rest of
>>> @@ -58,6 +82,16 @@ void arch_timer_reg_write_cp15(int access, enum arch_timer_reg reg, u32 val)
>>>  	isb();
>>>  }
>>>  
>>> +static __always_inline void arch_timer_cval_write_cp15(int access, u64 val)
>>> +{
>>> +	if (access == ARCH_TIMER_PHYS_ACCESS)
>>> +		asm volatile("msr cntp_cval_el0, %0" : : "r" (val));
>>> +	else if (access == ARCH_TIMER_VIRT_ACCESS)
>>> +		asm volatile("msr cntv_cval_el0, %0" : : "r" (val));
>>> +
>>> +	isb();
>>> +}
>>> +
>>>  static __always_inline
>>>  u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>>>  {
>>> @@ -66,19 +100,19 @@ u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>>>  	if (access == ARCH_TIMER_PHYS_ACCESS) {
>>>  		switch (reg) {
>>>  		case ARCH_TIMER_REG_CTRL:
>>> -			asm volatile("mrs %0,  cntp_ctl_el0" : "=r" (val));
>>> +			asm volatile("mrs %0, cntp_ctl_el0" : "=r" (val));
>>>  			break;
>>>  		case ARCH_TIMER_REG_TVAL:
>>> -			asm volatile("mrs %0, cntp_tval_el0" : "=r" (val));
>>> +			val = _arch_timer_get_ptval();
>>>  			break;
>>>  		}
>>>  	} else if (access == ARCH_TIMER_VIRT_ACCESS) {
>>>  		switch (reg) {
>>>  		case ARCH_TIMER_REG_CTRL:
>>> -			asm volatile("mrs %0,  cntv_ctl_el0" : "=r" (val));
>>> +			asm volatile("mrs %0, cntv_ctl_el0" : "=r" (val));
>>>  			break;
>>>  		case ARCH_TIMER_REG_TVAL:
>>> -			asm volatile("mrs %0, cntv_tval_el0" : "=r" (val));
>>> +			val = _arch_timer_get_vtval();
>>>  			break;
>>>  		}
>>>  	}
>>> @@ -116,12 +150,8 @@ static inline u64 arch_counter_get_cntpct(void)
>>>  
>>>  static inline u64 arch_counter_get_cntvct(void)
>>>  {
>>> -	u64 cval;
>>> -
>>>  	isb();
>>> -	asm volatile("mrs %0, cntvct_el0" : "=r" (cval));
>>> -
>>> -	return cval;
>>> +	return _arch_counter_get_cntvct();
>>>  }
>>>  
>>>  static inline int arch_timer_arch_init(void)
>>> diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
>>> index c346be6..672ddc3 100644
>>> --- a/drivers/clocksource/Kconfig
>>> +++ b/drivers/clocksource/Kconfig
>>> @@ -207,6 +207,16 @@ config ARM_ARCH_TIMER_EVTSTREAM
>>>  	  This must be disabled for hardware validation purposes to detect any
>>>  	  hardware anomalies of missing events.
>>>  
>>> +config FSL_ERRATUM_A008585
>>> +	bool "Workaround for Freescale/NXP Erratum A-008585"
>>> +	default y
>>> +	depends on ARM_ARCH_TIMER && ARM64
>>> +	help
>>> +	  This option enables a workaround for Freescale/NXP Erratum
>>> +	  A-008585 ("ARM generic timer may contain an erroneous
>>> +	  value").  The workaround will only be active if the
>>> +	  fsl,erratum-a008585 property is found in the timer node.
>>> +
>>>  config ARM_GLOBAL_TIMER
>>>  	bool
>>>  	select CLKSRC_OF if OF
>>> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
>>> index 5152b38..7ead4eb 100644
>>> --- a/drivers/clocksource/arm_arch_timer.c
>>> +++ b/drivers/clocksource/arm_arch_timer.c
>>> @@ -83,6 +83,51 @@ static bool arch_timer_mem_use_virtual;
>>>   * Architected system timer support.
>>>   */
>>>  
>>> +#ifdef CONFIG_FSL_ERRATUM_A008585
>>> +DEFINE_STATIC_KEY_FALSE(arch_timer_read_ool_enabled);
>>> +EXPORT_SYMBOL_GPL(arch_timer_read_ool_enabled);
>>> +
>>> +/*
>>> + * __always_inline is used to ensure that func() is not an actual function
>>> + * pointer, which would result in the register accesses potentially being too
>>> + * far apart for the loop to work.
>>> + *
>>> + * The timeout is an arbitrary value well beyond the highest number
>>> + * of iterations the loop has been observed to take.
>>> + */
>>> +static __always_inline u64 fsl_a008585_reread_counter(u64 (*func)(void))
>>> +{
>>> +	u64 cval_old, cval_new;
>>> +	int timeout = 200;
>>> +
>>> +	do {
>>> +		isb();
>>> +		cval_old = func();
>>> +		cval_new = func();
>>> +		timeout--;
>>> +	} while (unlikely(cval_old != cval_new) && timeout);
>>> +
>>> +	WARN_ON_ONCE(!timeout);
>>> +	return cval_new;
>>> +}
>> Hi Scott:
>>
>> I have test this patch, this solution looks will break the performance a little more than I expected.
>> it will have more than 10% that the cval will read again, we could sure that the cval_old always equal to the
>> cval_new in the normal circumstances, so I prefer this way:
>>
>> 	do {
>> 		isb();
>> 		cval_old = func();
>> 		cval_new = func();
>> 		timeout--;	
>> 	} while (unlikely((cval_new - cval_old) >> 2) && timeout);
> 
> What makes you think that ignoring the two bottom bits is a safe thing
> to do? Talking about performance when the HW has such a dramatic bug is
> like putting a bigger engine on a car that has no brakes: you just hit
> the wall quicker.
> 
> Thanks,
> 

I have a chip which has the same problem like Scott's chip, and I wish to solve this problem in the same way, 
our chip designer told me that if you got a wrong value from the cntvct_el0, you would not get a wrong value
until 8 cycles later, so I could ignoring the lowest 3 bits if I reading twice together.

The key problem is the probability of this bug, my chip has 1/100000 chance to met this bug, so use 10% performance
to fix this bug looks more expensive.

Thanks.
Ding

> 	M.
> 


--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Marc Zyngier July 7, 2016, 11:51 a.m. UTC | #6
On 07/07/16 12:37, Ding Tianhong wrote:
> On 2016/7/7 17:49, Marc Zyngier wrote:
>> On 07/07/16 10:34, Ding Tianhong wrote:
>>> On 2016/7/2 6:41, Scott Wood wrote:
>>>> Erratum A-008585 says that the ARM generic timer counter "has the
>>>> potential to contain an erroneous value for a small number of core
>>>> clock cycles every time the timer value changes".  Accesses to TVAL
>>>> (both read and write) are also affected due to the implicit counter
>>>> read.  Accesses to CVAL are not affected.
>>>>
>>>> The workaround is to reread TVAL and count registers until successive reads
>>>> return the same value, and when writing TVAL to retry until counter
>>>> reads before and after the write return the same value.
>>>>
>>>> This erratum can be found on LS1043A and LS2080A.
>>>>
>>>> Signed-off-by: Scott Wood <oss@buserror.net>
>>>> ---
>>>> v3:
>>>> - Used cval rather than a loop for the write side of the erratum
>>>> - Added a Kconfig control
>>>> - Moved the device tree binding into its own patch
>>>> - Added erratum to silicon-errata.txt
>>>> - Changed function names to contain the erratum name
>>>> - Factored out the setting of erratum versions of set_next_event
>>>>   to improve readability
>>>> - Added a comment clarifying that the timeout is arbitrary
>>>>
>>>> v2:
>>>> Significant rework based on feedback, including using static_key,
>>>> disabling VDSO counter access rather than adding the workaround to the
>>>> VDSO, and uninlining the loops.
>>>>
>>>> Dropped the separate property for indicating that writes to TVAL are
>>>> affected, as I believe that's just a side effect of the implicit
>>>> counter read being corrupted, and thus a chip that is affected by one
>>>> will always be affected by the other.
>>>>
>>>> Dropped the arm32 portion as it seems there was confusion about whether
>>>> LS1021A is affected.  Currently I am being told that it is not
>>>> affected.
>>>>
>>>> I considered writing to CVAL rather than looping on TVAL writes, but
>>>> that would still have required separate set_next_event() code for the
>>>> erratum, and adding CVAL to the enum would have required a bunch of
>>>> extra handlers in switch statements (even where unused, due to compiler
>>>> warnings about unhandled enum values) including in an arm32 header.  It
>>>> seemed better to avoid the arm32 interaction and new untested
>>>> accessors.
>>>> ---
>>>>  Documentation/arm64/silicon-errata.txt |   2 +
>>>>  arch/arm64/include/asm/arch_timer.h    |  48 ++++++++++++---
>>>>  drivers/clocksource/Kconfig            |  10 ++++
>>>>  drivers/clocksource/arm_arch_timer.c   | 103 +++++++++++++++++++++++++++++++++
>>>>  4 files changed, 154 insertions(+), 9 deletions(-)
>>>>
>>>> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
>>>> index ba4b6ac..5778f62 100644
>>>> --- a/Documentation/arm64/silicon-errata.txt
>>>> +++ b/Documentation/arm64/silicon-errata.txt
>>>> @@ -57,3 +57,5 @@ stable kernels.
>>>>  | Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375    |
>>>>  | Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
>>>>  | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456    |
>>>> +|                |                 |                 |                         |
>>>> +| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585     |
>>>> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
>>>> index fbe0ca3..70fbad9 100644
>>>> --- a/arch/arm64/include/asm/arch_timer.h
>>>> +++ b/arch/arm64/include/asm/arch_timer.h
>>>> @@ -23,10 +23,34 @@
>>>>  
>>>>  #include <linux/bug.h>
>>>>  #include <linux/init.h>
>>>> +#include <linux/jump_label.h>
>>>>  #include <linux/types.h>
>>>>  
>>>>  #include <clocksource/arm_arch_timer.h>
>>>>  
>>>> +extern struct static_key_false arch_timer_read_ool_enabled;
>>>> +
>>>> +#define ARCH_TIMER_REG_READ(reg, func) \
>>>> +extern u64 func##_ool(void); \
>>>> +static inline u64 __##func(void) \
>>>> +{ \
>>>> +	u64 val; \
>>>> +	asm volatile("mrs %0, " reg : "=r" (val)); \
>>>> +	return val; \
>>>> +} \
>>>> +static inline u64 _##func(void) \
>>>> +{ \
>>>> +	if (IS_ENABLED(CONFIG_FSL_ERRATUM_A008585) && \
>>>> +	    static_branch_unlikely(&arch_timer_read_ool_enabled)) \
>>>> +		return func##_ool(); \
>>>> +	else \
>>>> +		return __##func(); \
>>>> +}
>>>> +
>>>> +ARCH_TIMER_REG_READ("cntp_tval_el0", arch_timer_get_ptval)
>>>> +ARCH_TIMER_REG_READ("cntv_tval_el0", arch_timer_get_vtval)
>>>> +ARCH_TIMER_REG_READ("cntvct_el0", arch_counter_get_cntvct)
>>>> +
>>>>  /*
>>>>   * These register accessors are marked inline so the compiler can
>>>>   * nicely work out which register we want, and chuck away the rest of
>>>> @@ -58,6 +82,16 @@ void arch_timer_reg_write_cp15(int access, enum arch_timer_reg reg, u32 val)
>>>>  	isb();
>>>>  }
>>>>  
>>>> +static __always_inline void arch_timer_cval_write_cp15(int access, u64 val)
>>>> +{
>>>> +	if (access == ARCH_TIMER_PHYS_ACCESS)
>>>> +		asm volatile("msr cntp_cval_el0, %0" : : "r" (val));
>>>> +	else if (access == ARCH_TIMER_VIRT_ACCESS)
>>>> +		asm volatile("msr cntv_cval_el0, %0" : : "r" (val));
>>>> +
>>>> +	isb();
>>>> +}
>>>> +
>>>>  static __always_inline
>>>>  u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>>>>  {
>>>> @@ -66,19 +100,19 @@ u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>>>>  	if (access == ARCH_TIMER_PHYS_ACCESS) {
>>>>  		switch (reg) {
>>>>  		case ARCH_TIMER_REG_CTRL:
>>>> -			asm volatile("mrs %0,  cntp_ctl_el0" : "=r" (val));
>>>> +			asm volatile("mrs %0, cntp_ctl_el0" : "=r" (val));
>>>>  			break;
>>>>  		case ARCH_TIMER_REG_TVAL:
>>>> -			asm volatile("mrs %0, cntp_tval_el0" : "=r" (val));
>>>> +			val = _arch_timer_get_ptval();
>>>>  			break;
>>>>  		}
>>>>  	} else if (access == ARCH_TIMER_VIRT_ACCESS) {
>>>>  		switch (reg) {
>>>>  		case ARCH_TIMER_REG_CTRL:
>>>> -			asm volatile("mrs %0,  cntv_ctl_el0" : "=r" (val));
>>>> +			asm volatile("mrs %0, cntv_ctl_el0" : "=r" (val));
>>>>  			break;
>>>>  		case ARCH_TIMER_REG_TVAL:
>>>> -			asm volatile("mrs %0, cntv_tval_el0" : "=r" (val));
>>>> +			val = _arch_timer_get_vtval();
>>>>  			break;
>>>>  		}
>>>>  	}
>>>> @@ -116,12 +150,8 @@ static inline u64 arch_counter_get_cntpct(void)
>>>>  
>>>>  static inline u64 arch_counter_get_cntvct(void)
>>>>  {
>>>> -	u64 cval;
>>>> -
>>>>  	isb();
>>>> -	asm volatile("mrs %0, cntvct_el0" : "=r" (cval));
>>>> -
>>>> -	return cval;
>>>> +	return _arch_counter_get_cntvct();
>>>>  }
>>>>  
>>>>  static inline int arch_timer_arch_init(void)
>>>> diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
>>>> index c346be6..672ddc3 100644
>>>> --- a/drivers/clocksource/Kconfig
>>>> +++ b/drivers/clocksource/Kconfig
>>>> @@ -207,6 +207,16 @@ config ARM_ARCH_TIMER_EVTSTREAM
>>>>  	  This must be disabled for hardware validation purposes to detect any
>>>>  	  hardware anomalies of missing events.
>>>>  
>>>> +config FSL_ERRATUM_A008585
>>>> +	bool "Workaround for Freescale/NXP Erratum A-008585"
>>>> +	default y
>>>> +	depends on ARM_ARCH_TIMER && ARM64
>>>> +	help
>>>> +	  This option enables a workaround for Freescale/NXP Erratum
>>>> +	  A-008585 ("ARM generic timer may contain an erroneous
>>>> +	  value").  The workaround will only be active if the
>>>> +	  fsl,erratum-a008585 property is found in the timer node.
>>>> +
>>>>  config ARM_GLOBAL_TIMER
>>>>  	bool
>>>>  	select CLKSRC_OF if OF
>>>> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
>>>> index 5152b38..7ead4eb 100644
>>>> --- a/drivers/clocksource/arm_arch_timer.c
>>>> +++ b/drivers/clocksource/arm_arch_timer.c
>>>> @@ -83,6 +83,51 @@ static bool arch_timer_mem_use_virtual;
>>>>   * Architected system timer support.
>>>>   */
>>>>  
>>>> +#ifdef CONFIG_FSL_ERRATUM_A008585
>>>> +DEFINE_STATIC_KEY_FALSE(arch_timer_read_ool_enabled);
>>>> +EXPORT_SYMBOL_GPL(arch_timer_read_ool_enabled);
>>>> +
>>>> +/*
>>>> + * __always_inline is used to ensure that func() is not an actual function
>>>> + * pointer, which would result in the register accesses potentially being too
>>>> + * far apart for the loop to work.
>>>> + *
>>>> + * The timeout is an arbitrary value well beyond the highest number
>>>> + * of iterations the loop has been observed to take.
>>>> + */
>>>> +static __always_inline u64 fsl_a008585_reread_counter(u64 (*func)(void))
>>>> +{
>>>> +	u64 cval_old, cval_new;
>>>> +	int timeout = 200;
>>>> +
>>>> +	do {
>>>> +		isb();
>>>> +		cval_old = func();
>>>> +		cval_new = func();
>>>> +		timeout--;
>>>> +	} while (unlikely(cval_old != cval_new) && timeout);
>>>> +
>>>> +	WARN_ON_ONCE(!timeout);
>>>> +	return cval_new;
>>>> +}
>>> Hi Scott:
>>>
>>> I have test this patch, this solution looks will break the performance a little more than I expected.
>>> it will have more than 10% that the cval will read again, we could sure that the cval_old always equal to the
>>> cval_new in the normal circumstances, so I prefer this way:
>>>
>>> 	do {
>>> 		isb();
>>> 		cval_old = func();
>>> 		cval_new = func();
>>> 		timeout--;	
>>> 	} while (unlikely((cval_new - cval_old) >> 2) && timeout);
>>
>> What makes you think that ignoring the two bottom bits is a safe thing
>> to do? Talking about performance when the HW has such a dramatic bug is
>> like putting a bigger engine on a car that has no brakes: you just hit
>> the wall quicker.
>>
>> Thanks,
>>
> 
> I have a chip which has the same problem like Scott's chip, and I
> wish to solve this problem in the same way, our chip designer told me
> that if you got a wrong value from the cntvct_el0, you would not get
> a wrong value until 8 cycles later, so I could ignoring the lowest 3
> bits if I reading twice together.

Is that CPU cycles? Or timer cycles? What guarantees do you have that
the two reads are *always* done in the right timing window?

> The key problem is the probability of this bug, my chip has 1/100000
> chance to met this bug, so use 10% performance to fix this bug looks
> more expensive.

You care about performance, I care about correctness. If 10% of your CPU
is wasted on a correct workaround, tough. Next time, your HW guy will
spend more time getting it right.

Anyway, what you are describing is a different bug from what FSL has, so
don't try and shoehorn your own constraints in another workaround.
Please implement your own.

Thanks,

	M.
Ding Tianhong July 7, 2016, 12:59 p.m. UTC | #7
On 2016/7/7 19:51, Marc Zyngier wrote:
> On 07/07/16 12:37, Ding Tianhong wrote:
>> On 2016/7/7 17:49, Marc Zyngier wrote:
>>> On 07/07/16 10:34, Ding Tianhong wrote:
>>>> On 2016/7/2 6:41, Scott Wood wrote:
>>>>> Erratum A-008585 says that the ARM generic timer counter "has the
>>>>> potential to contain an erroneous value for a small number of core
>>>>> clock cycles every time the timer value changes".  Accesses to TVAL
>>>>> (both read and write) are also affected due to the implicit counter
>>>>> read.  Accesses to CVAL are not affected.
>>>>>
>>>>> The workaround is to reread TVAL and count registers until successive reads
>>>>> return the same value, and when writing TVAL to retry until counter
>>>>> reads before and after the write return the same value.
>>>>>
>>>>> This erratum can be found on LS1043A and LS2080A.
>>>>>
>>>>> Signed-off-by: Scott Wood <oss@buserror.net>
>>>>> ---
>>>>> v3:
>>>>> - Used cval rather than a loop for the write side of the erratum
>>>>> - Added a Kconfig control
>>>>> - Moved the device tree binding into its own patch
>>>>> - Added erratum to silicon-errata.txt
>>>>> - Changed function names to contain the erratum name
>>>>> - Factored out the setting of erratum versions of set_next_event
>>>>>   to improve readability
>>>>> - Added a comment clarifying that the timeout is arbitrary
>>>>>
>>>>> v2:
>>>>> Significant rework based on feedback, including using static_key,
>>>>> disabling VDSO counter access rather than adding the workaround to the
>>>>> VDSO, and uninlining the loops.
>>>>>
>>>>> Dropped the separate property for indicating that writes to TVAL are
>>>>> affected, as I believe that's just a side effect of the implicit
>>>>> counter read being corrupted, and thus a chip that is affected by one
>>>>> will always be affected by the other.
>>>>>
>>>>> Dropped the arm32 portion as it seems there was confusion about whether
>>>>> LS1021A is affected.  Currently I am being told that it is not
>>>>> affected.
>>>>>
>>>>> I considered writing to CVAL rather than looping on TVAL writes, but
>>>>> that would still have required separate set_next_event() code for the
>>>>> erratum, and adding CVAL to the enum would have required a bunch of
>>>>> extra handlers in switch statements (even where unused, due to compiler
>>>>> warnings about unhandled enum values) including in an arm32 header.  It
>>>>> seemed better to avoid the arm32 interaction and new untested
>>>>> accessors.
>>>>> ---
>>>>>  Documentation/arm64/silicon-errata.txt |   2 +
>>>>>  arch/arm64/include/asm/arch_timer.h    |  48 ++++++++++++---
>>>>>  drivers/clocksource/Kconfig            |  10 ++++
>>>>>  drivers/clocksource/arm_arch_timer.c   | 103 +++++++++++++++++++++++++++++++++
>>>>>  4 files changed, 154 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/Documentation/arm64/silicon-errata.txt b/Documentation/arm64/silicon-errata.txt
>>>>> index ba4b6ac..5778f62 100644
>>>>> --- a/Documentation/arm64/silicon-errata.txt
>>>>> +++ b/Documentation/arm64/silicon-errata.txt
>>>>> @@ -57,3 +57,5 @@ stable kernels.
>>>>>  | Cavium         | ThunderX ITS    | #22375, #24313  | CAVIUM_ERRATUM_22375    |
>>>>>  | Cavium         | ThunderX GICv3  | #23154          | CAVIUM_ERRATUM_23154    |
>>>>>  | Cavium         | ThunderX Core   | #27456          | CAVIUM_ERRATUM_27456    |
>>>>> +|                |                 |                 |                         |
>>>>> +| Freescale/NXP  | LS2080A/LS1043A | A-008585        | FSL_ERRATUM_A008585     |
>>>>> diff --git a/arch/arm64/include/asm/arch_timer.h b/arch/arm64/include/asm/arch_timer.h
>>>>> index fbe0ca3..70fbad9 100644
>>>>> --- a/arch/arm64/include/asm/arch_timer.h
>>>>> +++ b/arch/arm64/include/asm/arch_timer.h
>>>>> @@ -23,10 +23,34 @@
>>>>>  
>>>>>  #include <linux/bug.h>
>>>>>  #include <linux/init.h>
>>>>> +#include <linux/jump_label.h>
>>>>>  #include <linux/types.h>
>>>>>  
>>>>>  #include <clocksource/arm_arch_timer.h>
>>>>>  
>>>>> +extern struct static_key_false arch_timer_read_ool_enabled;
>>>>> +
>>>>> +#define ARCH_TIMER_REG_READ(reg, func) \
>>>>> +extern u64 func##_ool(void); \
>>>>> +static inline u64 __##func(void) \
>>>>> +{ \
>>>>> +	u64 val; \
>>>>> +	asm volatile("mrs %0, " reg : "=r" (val)); \
>>>>> +	return val; \
>>>>> +} \
>>>>> +static inline u64 _##func(void) \
>>>>> +{ \
>>>>> +	if (IS_ENABLED(CONFIG_FSL_ERRATUM_A008585) && \
>>>>> +	    static_branch_unlikely(&arch_timer_read_ool_enabled)) \
>>>>> +		return func##_ool(); \
>>>>> +	else \
>>>>> +		return __##func(); \
>>>>> +}
>>>>> +
>>>>> +ARCH_TIMER_REG_READ("cntp_tval_el0", arch_timer_get_ptval)
>>>>> +ARCH_TIMER_REG_READ("cntv_tval_el0", arch_timer_get_vtval)
>>>>> +ARCH_TIMER_REG_READ("cntvct_el0", arch_counter_get_cntvct)
>>>>> +
>>>>>  /*
>>>>>   * These register accessors are marked inline so the compiler can
>>>>>   * nicely work out which register we want, and chuck away the rest of
>>>>> @@ -58,6 +82,16 @@ void arch_timer_reg_write_cp15(int access, enum arch_timer_reg reg, u32 val)
>>>>>  	isb();
>>>>>  }
>>>>>  
>>>>> +static __always_inline void arch_timer_cval_write_cp15(int access, u64 val)
>>>>> +{
>>>>> +	if (access == ARCH_TIMER_PHYS_ACCESS)
>>>>> +		asm volatile("msr cntp_cval_el0, %0" : : "r" (val));
>>>>> +	else if (access == ARCH_TIMER_VIRT_ACCESS)
>>>>> +		asm volatile("msr cntv_cval_el0, %0" : : "r" (val));
>>>>> +
>>>>> +	isb();
>>>>> +}
>>>>> +
>>>>>  static __always_inline
>>>>>  u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>>>>>  {
>>>>> @@ -66,19 +100,19 @@ u32 arch_timer_reg_read_cp15(int access, enum arch_timer_reg reg)
>>>>>  	if (access == ARCH_TIMER_PHYS_ACCESS) {
>>>>>  		switch (reg) {
>>>>>  		case ARCH_TIMER_REG_CTRL:
>>>>> -			asm volatile("mrs %0,  cntp_ctl_el0" : "=r" (val));
>>>>> +			asm volatile("mrs %0, cntp_ctl_el0" : "=r" (val));
>>>>>  			break;
>>>>>  		case ARCH_TIMER_REG_TVAL:
>>>>> -			asm volatile("mrs %0, cntp_tval_el0" : "=r" (val));
>>>>> +			val = _arch_timer_get_ptval();
>>>>>  			break;
>>>>>  		}
>>>>>  	} else if (access == ARCH_TIMER_VIRT_ACCESS) {
>>>>>  		switch (reg) {
>>>>>  		case ARCH_TIMER_REG_CTRL:
>>>>> -			asm volatile("mrs %0,  cntv_ctl_el0" : "=r" (val));
>>>>> +			asm volatile("mrs %0, cntv_ctl_el0" : "=r" (val));
>>>>>  			break;
>>>>>  		case ARCH_TIMER_REG_TVAL:
>>>>> -			asm volatile("mrs %0, cntv_tval_el0" : "=r" (val));
>>>>> +			val = _arch_timer_get_vtval();
>>>>>  			break;
>>>>>  		}
>>>>>  	}
>>>>> @@ -116,12 +150,8 @@ static inline u64 arch_counter_get_cntpct(void)
>>>>>  
>>>>>  static inline u64 arch_counter_get_cntvct(void)
>>>>>  {
>>>>> -	u64 cval;
>>>>> -
>>>>>  	isb();
>>>>> -	asm volatile("mrs %0, cntvct_el0" : "=r" (cval));
>>>>> -
>>>>> -	return cval;
>>>>> +	return _arch_counter_get_cntvct();
>>>>>  }
>>>>>  
>>>>>  static inline int arch_timer_arch_init(void)
>>>>> diff --git a/drivers/clocksource/Kconfig b/drivers/clocksource/Kconfig
>>>>> index c346be6..672ddc3 100644
>>>>> --- a/drivers/clocksource/Kconfig
>>>>> +++ b/drivers/clocksource/Kconfig
>>>>> @@ -207,6 +207,16 @@ config ARM_ARCH_TIMER_EVTSTREAM
>>>>>  	  This must be disabled for hardware validation purposes to detect any
>>>>>  	  hardware anomalies of missing events.
>>>>>  
>>>>> +config FSL_ERRATUM_A008585
>>>>> +	bool "Workaround for Freescale/NXP Erratum A-008585"
>>>>> +	default y
>>>>> +	depends on ARM_ARCH_TIMER && ARM64
>>>>> +	help
>>>>> +	  This option enables a workaround for Freescale/NXP Erratum
>>>>> +	  A-008585 ("ARM generic timer may contain an erroneous
>>>>> +	  value").  The workaround will only be active if the
>>>>> +	  fsl,erratum-a008585 property is found in the timer node.
>>>>> +
>>>>>  config ARM_GLOBAL_TIMER
>>>>>  	bool
>>>>>  	select CLKSRC_OF if OF
>>>>> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c
>>>>> index 5152b38..7ead4eb 100644
>>>>> --- a/drivers/clocksource/arm_arch_timer.c
>>>>> +++ b/drivers/clocksource/arm_arch_timer.c
>>>>> @@ -83,6 +83,51 @@ static bool arch_timer_mem_use_virtual;
>>>>>   * Architected system timer support.
>>>>>   */
>>>>>  
>>>>> +#ifdef CONFIG_FSL_ERRATUM_A008585
>>>>> +DEFINE_STATIC_KEY_FALSE(arch_timer_read_ool_enabled);
>>>>> +EXPORT_SYMBOL_GPL(arch_timer_read_ool_enabled);
>>>>> +
>>>>> +/*
>>>>> + * __always_inline is used to ensure that func() is not an actual function
>>>>> + * pointer, which would result in the register accesses potentially being too
>>>>> + * far apart for the loop to work.
>>>>> + *
>>>>> + * The timeout is an arbitrary value well beyond the highest number
>>>>> + * of iterations the loop has been observed to take.
>>>>> + */
>>>>> +static __always_inline u64 fsl_a008585_reread_counter(u64 (*func)(void))
>>>>> +{
>>>>> +	u64 cval_old, cval_new;
>>>>> +	int timeout = 200;
>>>>> +
>>>>> +	do {
>>>>> +		isb();
>>>>> +		cval_old = func();
>>>>> +		cval_new = func();
>>>>> +		timeout--;
>>>>> +	} while (unlikely(cval_old != cval_new) && timeout);
>>>>> +
>>>>> +	WARN_ON_ONCE(!timeout);
>>>>> +	return cval_new;
>>>>> +}
>>>> Hi Scott:
>>>>
>>>> I have test this patch, this solution looks will break the performance a little more than I expected.
>>>> it will have more than 10% that the cval will read again, we could sure that the cval_old always equal to the
>>>> cval_new in the normal circumstances, so I prefer this way:
>>>>
>>>> 	do {
>>>> 		isb();
>>>> 		cval_old = func();
>>>> 		cval_new = func();
>>>> 		timeout--;	
>>>> 	} while (unlikely((cval_new - cval_old) >> 2) && timeout);
>>>
>>> What makes you think that ignoring the two bottom bits is a safe thing
>>> to do? Talking about performance when the HW has such a dramatic bug is
>>> like putting a bigger engine on a car that has no brakes: you just hit
>>> the wall quicker.
>>>
>>> Thanks,
>>>
>>
>> I have a chip which has the same problem like Scott's chip, and I
>> wish to solve this problem in the same way, our chip designer told me
>> that if you got a wrong value from the cntvct_el0, you would not get
>> a wrong value until 8 cycles later, so I could ignoring the lowest 3
>> bits if I reading twice together.
> 
> Is that CPU cycles? Or timer cycles? What guarantees do you have that
> the two reads are *always* done in the right timing window?
> 

The timer counter only use 56 bits in aarch64, my chip would change one of the higher 
bit(55 to 3) to a wrong value when occur bug, so there will be more than 8 cycles between
correct value and wrong value from the timer counter. Maybe Scott's problem is not just like
mine.

>> The key problem is the probability of this bug, my chip has 1/100000
>> chance to met this bug, so use 10% performance to fix this bug looks
>> more expensive.
> 
> You care about performance, I care about correctness. If 10% of your CPU
> is wasted on a correct workaround, tough. Next time, your HW guy will
> spend more time getting it right.
> 
> Anyway, what you are describing is a different bug from what FSL has, so
> don't try and shoehorn your own constraints in another workaround.
> Please implement your own.
> 

OK, In deed I should, thanks.

Ding

> Thanks,
> 
> 	M.
> 


--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Crystal Wood July 7, 2016, 5:39 p.m. UTC | #8
On Thu, 2016-07-07 at 20:59 +0800, Ding Tianhong wrote:
> On 2016/7/7 19:51, Marc Zyngier wrote:
> > 
> > On 07/07/16 12:37, Ding Tianhong wrote:
> > > 
> > > On 2016/7/7 17:49, Marc Zyngier wrote:
> > > > 
> > > > What makes you think that ignoring the two bottom bits is a safe thing
> > > > to do? Talking about performance when the HW has such a dramatic bug
> > > > is
> > > > like putting a bigger engine on a car that has no brakes: you just hit
> > > > the wall quicker.
> > > > 
> > > > Thanks,
> > > > 
> > > I have a chip which has the same problem like Scott's chip, and I
> > > wish to solve this problem in the same way, our chip designer told me
> > > that if you got a wrong value from the cntvct_el0, you would not get
> > > a wrong value until 8 cycles later, so I could ignoring the lowest 3
> > > bits if I reading twice together.
> > Is that CPU cycles? Or timer cycles? What guarantees do you have that
> > the two reads are *always* done in the right timing window?
> > 
> The timer counter only use 56 bits in aarch64, my chip would change one of
> the higher 
> bit(55 to 3) to a wrong value when occur bug, so there will be more than 8
> cycles between
> correct value and wrong value from the timer counter. Maybe Scott's problem
> is not just like
> mine.

It's not like yours.  Most errors I saw were time going backwards by 1, 3, or
7 cycles (with occasional larger errors).

-Scott

--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ding Tianhong July 8, 2016, 12:51 a.m. UTC | #9
On 2016/7/8 1:39, Scott Wood wrote:
> On Thu, 2016-07-07 at 20:59 +0800, Ding Tianhong wrote:
>> On 2016/7/7 19:51, Marc Zyngier wrote:
>>>
>>> On 07/07/16 12:37, Ding Tianhong wrote:
>>>>
>>>> On 2016/7/7 17:49, Marc Zyngier wrote:
>>>>>
>>>>> What makes you think that ignoring the two bottom bits is a safe thing
>>>>> to do? Talking about performance when the HW has such a dramatic bug
>>>>> is
>>>>> like putting a bigger engine on a car that has no brakes: you just hit
>>>>> the wall quicker.
>>>>>
>>>>> Thanks,
>>>>>
>>>> I have a chip which has the same problem like Scott's chip, and I
>>>> wish to solve this problem in the same way, our chip designer told me
>>>> that if you got a wrong value from the cntvct_el0, you would not get
>>>> a wrong value until 8 cycles later, so I could ignoring the lowest 3
>>>> bits if I reading twice together.
>>> Is that CPU cycles? Or timer cycles? What guarantees do you have that
>>> the two reads are *always* done in the right timing window?
>>>
>> The timer counter only use 56 bits in aarch64, my chip would change one of
>> the higher 
>> bit(55 to 3) to a wrong value when occur bug, so there will be more than 8
>> cycles between
>> correct value and wrong value from the timer counter. Maybe Scott's problem
>> is not just like
>> mine.
> 
> It's not like yours.  Most errors I saw were time going backwards by 1, 3, or
> 7 cycles (with occasional larger errors).
> 
Looks more series, agree with your solution, thanks。

Thanks.
Ding
> -Scott
> 
> 
> .
> 


--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/Documentation/devicetree/bindings/arm/arch_timer.txt b/Documentation/devicetree/bindings/arm/arch_timer.txt
index e774128..ef5fbe9 100644
--- a/Documentation/devicetree/bindings/arm/arch_timer.txt
+++ b/Documentation/devicetree/bindings/arm/arch_timer.txt
@@ -25,6 +25,12 @@  to deliver its interrupts via SPIs.
 - always-on : a boolean property. If present, the timer is powered through an
   always-on power domain, therefore it never loses context.
 
+- fsl,erratum-a008585 : A boolean property. Indicates the presence of
+  QorIQ erratum A-008585, which says that reading the counter is
+  unreliable unless the same value is returned by back-to-back reads.
+  This also affects writes to the tval register, due to the implicit
+  counter read.
+
 ** Optional properties:
 
 - arm,cpu-registers-not-fw-configured : Firmware does not initialize