mbox series

[0/6] mips: bmips: improve handling of RAC and CBR addr

Message ID 20240503135455.966-1-ansuelsmth@gmail.com
Headers show
Series mips: bmips: improve handling of RAC and CBR addr | expand

Message

Christian Marangi May 3, 2024, 1:54 p.m. UTC
Hi,

this simple series improve handling of RAC and CBR address and try to
upstream these simple patch we have in OpenWrt for a while.

The first patch fix a straight kernel panic where some Bootloader might
enable RAC but misconfigure the CBR address. The current logic only
check if RAC is enabled but doesn't verify if the CBR address is usable.

The DMA sync function cause a kernel panic for invalid write. (as CBR is
0 or something like 0xa)

The second is preparation for making the CBR address configurable in DT.
Since this address doesn't change, we can cache it and reference it with
a local variable instead of calling the register to access the value.

The 4th patch make it configurable with 2 DT property, one to actually
set the reg and the other to force set it.

The first property is used when CBR is set to 0. The second property is
to force it if the Bootloader sets it to something wrong.

If the CBR value is not 0 and is not forced with the second property a
WARN is printed and the DT value is ignored.

The 5th patch enable RAC on BMIPS4350 and the 5th patch is a micro
optimization to skip more call on DMA sync to save as resource as
possible on low spec devices. (since DMA sync is called many times for
the Ethernet Switch and we can reference the bool instead of checking
the CPU type everytime)

These has been tested on BCM6358 (HG556a) and BCM6368 (VH4032N) and
reported correct functionality.

Christian Marangi (5):
  mips: bmips: BCM6358: make sure CBR is correctly set
  mips: bmips: rework and cache CBR addr handling
  dt-bindings: mips: brcm: Document mips-cbr-reg property
  mips: bmips: setup: make CBR address configurable
  mips: bmips: dma: drop redundant boot_cpu_type in arch_dma_sync

Daniel González Cabanelas (1):
  mips: bmips: enable RAC on BMIPS4350

 .../devicetree/bindings/mips/brcm/soc.yaml    | 32 ++++++++++++
 arch/mips/bmips/dma.c                         | 12 ++---
 arch/mips/bmips/setup.c                       | 36 +++++++++++--
 arch/mips/include/asm/bmips.h                 |  2 +
 arch/mips/kernel/smp-bmips.c                  | 50 ++++++++++++-------
 5 files changed, 102 insertions(+), 30 deletions(-)

Comments

Christian Marangi May 3, 2024, 1:56 p.m. UTC | #1
On Fri, May 03, 2024 at 03:54:06PM +0200, Christian Marangi wrote:
> Drop redundant boot_cpu_type in arch_sync_dma_for_cpu_all. These needs
> to be parsed only once and we can make use of bmips_rac_flush_disable to
> disable RAC flush on unsupported CPU.
> 
> Set this value in bmips_cpu_setup for unsupported CPU to skip this
> redundant check every time DMA needs to be synced.
> 
> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>

Sent by mistake please ignore and use the other PATCH 6/6 patch. (wrong commit title)
Florian Fainelli May 3, 2024, 6:56 p.m. UTC | #2
On 5/3/24 06:54, Christian Marangi wrote:
> From: Daniel González Cabanelas <dgcbueu@gmail.com>
> 
> The data RAC is left disabled by the bootloader in some SoCs, at least in
> the core it boots from.
> Enabling this feature increases the performance up to +30% depending on the
> task.
> 
> Signed-off-by: Daniel González Cabanelas <dgcbueu@gmail.com>
> Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
> [ rework code and reduce code duplication ]
> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
> ---
>   arch/mips/kernel/smp-bmips.c | 12 ++++++++++++
>   1 file changed, 12 insertions(+)
> 
> diff --git a/arch/mips/kernel/smp-bmips.c b/arch/mips/kernel/smp-bmips.c
> index 6048c471b5ee..7bde6bbaa41f 100644
> --- a/arch/mips/kernel/smp-bmips.c
> +++ b/arch/mips/kernel/smp-bmips.c
> @@ -617,6 +617,18 @@ void bmips_cpu_setup(void)
>   		__raw_readl(bmips_cbr_addr + BMIPS_RAC_ADDRESS_RANGE);
>   		break;
>   
> +	case CPU_BMIPS4350:
> +		u32 rac_addr = BMIPS_RAC_CONFIG_1;
> +
> +		if (!(read_c0_brcm_cmt_local() & (1 << 31)))
> +			rac_addr = BMIPS_RAC_CONFIG;
> +
> +		/* Enable data RAC */
> +		cfg = __raw_readl(bmips_cbr_addr + rac_addr);
> +		__raw_writel(cfg | 0xa, bmips_cbr_addr + rac_addr);

This enables data pre-fetching (bit 3) and data-caching (bit 1), have 
you tried with 0xF to see if this provides any additional speed-up?

Looks correct to me otherwise, I wonder if a flush would be in order 
right after enabling, though I did not see any specific instructions 
towards that part in the programming notes.

> +		__raw_readl(bmips_cbr_addr + rac_addr);
> +		break;
> +
>   	case CPU_BMIPS4380:
>   		/* CBG workaround for early BMIPS4380 CPUs */
>   		switch (read_c0_prid()) {
Florian Fainelli May 3, 2024, 7 p.m. UTC | #3
On 5/3/24 06:54, Christian Marangi wrote:
> Rework the handling of the CBR address and cache it. This address
> doesn't chance and can be cache instead of calling the register every
> time.

s/change/change/
s/be cache/be cached/
s/calling the register/reading from the register/

> 
> This is in preparation of permitting to tweak the CBR address in DT with
> broken SoC or bootloader.
> 
> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
> ---
>   arch/mips/bmips/dma.c         |  7 +++----
>   arch/mips/bmips/setup.c       |  6 +++++-
>   arch/mips/include/asm/bmips.h |  1 +
>   arch/mips/kernel/smp-bmips.c  | 31 ++++++++++++++-----------------
>   4 files changed, 23 insertions(+), 22 deletions(-)
> 
> diff --git a/arch/mips/bmips/dma.c b/arch/mips/bmips/dma.c
> index 3779e7855bd7..799cc3e12fc3 100644
> --- a/arch/mips/bmips/dma.c
> +++ b/arch/mips/bmips/dma.c
> @@ -9,7 +9,6 @@ bool bmips_rac_flush_disable;
>   
>   void arch_sync_dma_for_cpu_all(void)
>   {
> -	void __iomem *cbr = BMIPS_GET_CBR();
>   	u32 cfg;
>   
>   	if (boot_cpu_type() != CPU_BMIPS3300 &&
> @@ -21,7 +20,7 @@ void arch_sync_dma_for_cpu_all(void)
>   		return;
>   
>   	/* Flush stale data out of the readahead cache */
> -	cfg = __raw_readl(cbr + BMIPS_RAC_CONFIG);
> -	__raw_writel(cfg | 0x100, cbr + BMIPS_RAC_CONFIG);
> -	__raw_readl(cbr + BMIPS_RAC_CONFIG);
> +	cfg = __raw_readl(bmips_cbr_addr + BMIPS_RAC_CONFIG);
> +	__raw_writel(cfg | 0x100, bmips_cbr_addr + BMIPS_RAC_CONFIG);
> +	__raw_readl(bmips_cbr_addr + BMIPS_RAC_CONFIG);
>   }
> diff --git a/arch/mips/bmips/setup.c b/arch/mips/bmips/setup.c
> index 66a8ba19c287..18561d426f89 100644
> --- a/arch/mips/bmips/setup.c
> +++ b/arch/mips/bmips/setup.c
> @@ -34,6 +34,8 @@
>   #define REG_BCM6328_OTP		((void __iomem *)CKSEG1ADDR(0x1000062c))
>   #define BCM6328_TP1_DISABLED	BIT(9)
>   
> +/* CBR addr doesn't change and we can cache it */
> +void __iomem *bmips_cbr_addr;

Maybe __ro_after_init and __read_mostly, too?

>   extern bool bmips_rac_flush_disable;
>   
>   static const unsigned long kbase = VMLINUX_LOAD_ADDRESS & 0xfff00000;
> @@ -111,7 +113,7 @@ static void bcm6358_quirks(void)
>   	 * because the bootloader is not initializing it properly.
>   	 */
>   	bmips_rac_flush_disable = !!(read_c0_brcm_cmt_local() & (1 << 31)) ||
> -				  !!BMIPS_GET_CBR();
> +				  !!bmips_cbr_addr;
>   }
>   
>   static void bcm6368_quirks(void)
> @@ -144,6 +146,8 @@ static void __init bmips_init_cfe(void)
>   
>   void __init prom_init(void)
>   {
> +	/* Cache CBR addr before CPU/DMA setup */
> +	bmips_cbr_addr = BMIPS_GET_CBR();
>   	bmips_init_cfe();
>   	bmips_cpu_setup();
>   	register_bmips_smp_ops();
> diff --git a/arch/mips/include/asm/bmips.h b/arch/mips/include/asm/bmips.h
> index 581a6a3c66e4..3a1cdfddb987 100644
> --- a/arch/mips/include/asm/bmips.h
> +++ b/arch/mips/include/asm/bmips.h
> @@ -81,6 +81,7 @@ extern char bmips_smp_movevec[];
>   extern char bmips_smp_int_vec[];
>   extern char bmips_smp_int_vec_end[];
>   
> +extern void __iomem *bmips_cbr_addr;
>   extern int bmips_smp_enabled;
>   extern int bmips_cpu_offset;
>   extern cpumask_t bmips_booted_mask;
> diff --git a/arch/mips/kernel/smp-bmips.c b/arch/mips/kernel/smp-bmips.c
> index b3dbf9ecb0d6..6048c471b5ee 100644
> --- a/arch/mips/kernel/smp-bmips.c
> +++ b/arch/mips/kernel/smp-bmips.c
> @@ -518,14 +518,12 @@ static void bmips_set_reset_vec(int cpu, u32 val)
>   		info.val = val;
>   		bmips_set_reset_vec_remote(&info);
>   	} else {
> -		void __iomem *cbr = BMIPS_GET_CBR();
> -
>   		if (cpu == 0)
> -			__raw_writel(val, cbr + BMIPS_RELO_VECTOR_CONTROL_0);
> +			__raw_writel(val, bmips_cbr_addr + BMIPS_RELO_VECTOR_CONTROL_0);
>   		else {
>   			if (current_cpu_type() != CPU_BMIPS4380)
>   				return;
> -			__raw_writel(val, cbr + BMIPS_RELO_VECTOR_CONTROL_1);
> +			__raw_writel(val, bmips_cbr_addr + BMIPS_RELO_VECTOR_CONTROL_1);
>   		}
>   	}
>   	__sync();
> @@ -591,7 +589,6 @@ asmlinkage void __weak plat_wired_tlb_setup(void)
>   
>   void bmips_cpu_setup(void)
>   {
> -	void __iomem __maybe_unused *cbr = BMIPS_GET_CBR();

Could keep the local variable here to minimize the amount of changes, 
should not matter how the resulting code is generated, as it should 
cache it in a register.

>   	u32 __maybe_unused cfg;
>   
>   	switch (current_cpu_type()) {
> @@ -607,17 +604,17 @@ void bmips_cpu_setup(void)
>   		clear_c0_brcm_reset(BIT(16));
>   
>   		/* Flush and enable RAC */
> -		cfg = __raw_readl(cbr + BMIPS_RAC_CONFIG);
> -		__raw_writel(cfg | 0x100, cbr + BMIPS_RAC_CONFIG);
> -		__raw_readl(cbr + BMIPS_RAC_CONFIG);
> +		cfg = __raw_readl(bmips_cbr_addr + BMIPS_RAC_CONFIG);
> +		__raw_writel(cfg | 0x100, bmips_cbr_addr + BMIPS_RAC_CONFIG);
> +		__raw_readl(bmips_cbr_addr + BMIPS_RAC_CONFIG);
>   
> -		cfg = __raw_readl(cbr + BMIPS_RAC_CONFIG);
> -		__raw_writel(cfg | 0xf, cbr + BMIPS_RAC_CONFIG);
> -		__raw_readl(cbr + BMIPS_RAC_CONFIG);
> +		cfg = __raw_readl(bmips_cbr_addr + BMIPS_RAC_CONFIG);
> +		__raw_writel(cfg | 0xf, bmips_cbr_addr + BMIPS_RAC_CONFIG);
> +		__raw_readl(bmips_cbr_addr + BMIPS_RAC_CONFIG);
>   
> -		cfg = __raw_readl(cbr + BMIPS_RAC_ADDRESS_RANGE);
> -		__raw_writel(cfg | 0x0fff0000, cbr + BMIPS_RAC_ADDRESS_RANGE);
> -		__raw_readl(cbr + BMIPS_RAC_ADDRESS_RANGE);
> +		cfg = __raw_readl(bmips_cbr_addr + BMIPS_RAC_ADDRESS_RANGE);
> +		__raw_writel(cfg | 0x0fff0000, bmips_cbr_addr + BMIPS_RAC_ADDRESS_RANGE);
> +		__raw_readl(bmips_cbr_addr + BMIPS_RAC_ADDRESS_RANGE);
>   		break;
>   
>   	case CPU_BMIPS4380:
> @@ -627,9 +624,9 @@ void bmips_cpu_setup(void)
>   		case 0x2a042:
>   		case 0x2a044:
>   		case 0x2a060:
> -			cfg = __raw_readl(cbr + BMIPS_L2_CONFIG);
> -			__raw_writel(cfg & ~0x07000000, cbr + BMIPS_L2_CONFIG);
> -			__raw_readl(cbr + BMIPS_L2_CONFIG);
> +			cfg = __raw_readl(bmips_cbr_addr + BMIPS_L2_CONFIG);
> +			__raw_writel(cfg & ~0x07000000, bmips_cbr_addr + BMIPS_L2_CONFIG);
> +			__raw_readl(bmips_cbr_addr + BMIPS_L2_CONFIG);
>   		}
>   
>   		/* clear BHTD to enable branch history table */
Florian Fainelli May 3, 2024, 7:07 p.m. UTC | #4
On 5/3/24 06:54, Christian Marangi wrote:
> Drop redundant boot_cpu_type in arch_sync_dma_for_cpu_all. These needs
> to be parsed only once and we can make use of bmips_rac_flush_disable to
> disable RAC flush on unsupported CPU.
> 
> Set this value in bmips_cpu_setup for unsupported CPU to skip this
> redundant check every time DMA needs to be synced.
> 
> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>

You are taking a shortcut that is reasonable in premise, but keying off 
the bmips_rac_flush_disable is IMHO misleading. The RAC is enabled in 
the BMIPS5000 and BMIPS5200 cores, just it does not need SW management 
unlike earlier cores.

If you renamed it to bmips_rac_flush_needed that might be more 
compelling. Also, the other reason is that on a kernel that was 
configured for supporting only BMIPS5000 and BMIPS5200 CPUs, I think we 
could get some decent dead code elimination of the boot_cpu_type() 
check, which would not be the case.
Florian Fainelli May 3, 2024, 7:09 p.m. UTC | #5
On 5/3/24 06:54, Christian Marangi wrote:
> Add support to provide CBR address from DT to handle broken
> SoC/Bootloader that doesn't correctly init it. This permits to use the
> RAC flush even in these condition.
> 
> To provide a CBR address from DT, the property "mips-cbr-reg" needs to
> be set in the "cpus" node. On DT init, this property presence will be
> checked and will set the bmips_cbr_addr value accordingly. Also
> bmips_rac_flush_disable will be set to false as RAC flush can be
> correctly supported.
> 
> The CBR address from DT will be applied only if the CBR address from the
> registers is 0, if the CBR address from the registers is not 0 and
> is not equal to the one set in DT (if provided) a WARN is printed.
> 
> To ALWAYS overwrite the CBR address the additional property
> "mips-broken-cbr-reg" needs to be set.
> 
> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
> ---
>   arch/mips/bmips/setup.c | 30 +++++++++++++++++++++++++++---
>   1 file changed, 27 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/mips/bmips/setup.c b/arch/mips/bmips/setup.c
> index 18561d426f89..bef84677248e 100644
> --- a/arch/mips/bmips/setup.c
> +++ b/arch/mips/bmips/setup.c
> @@ -34,7 +34,11 @@
>   #define REG_BCM6328_OTP		((void __iomem *)CKSEG1ADDR(0x1000062c))
>   #define BCM6328_TP1_DISABLED	BIT(9)
>   
> -/* CBR addr doesn't change and we can cache it */
> +/*
> + * CBR addr doesn't change and we can cache it.
> + * For broken SoC/Bootloader CBR addr might also be provided via DT
> + * with "mips-cbr-reg" in the "cpus" node.
> + */
>   void __iomem *bmips_cbr_addr;
>   extern bool bmips_rac_flush_disable;
>   
> @@ -212,8 +216,28 @@ void __init device_tree_init(void)
>   
>   	/* Disable SMP boot unless both CPUs are listed in DT and !disabled */
>   	np = of_find_node_by_name(NULL, "cpus");
> -	if (np && of_get_available_child_count(np) <= 1)
> -		bmips_smp_enabled = 0;
> +	if (np) {

Please reduce the indentation with early return/gotos. There might also 
be a need to do some validation that the CBR is at least outside of the 
DRAM window, that is we cannot blindly trust the DT to have gotten the 
CBR right IMHO.
Christian Marangi May 3, 2024, 7:35 p.m. UTC | #6
On Fri, May 03, 2024 at 12:09:02PM -0700, Florian Fainelli wrote:
> On 5/3/24 06:54, Christian Marangi wrote:
> > Add support to provide CBR address from DT to handle broken
> > SoC/Bootloader that doesn't correctly init it. This permits to use the
> > RAC flush even in these condition.
> > 
> > To provide a CBR address from DT, the property "mips-cbr-reg" needs to
> > be set in the "cpus" node. On DT init, this property presence will be
> > checked and will set the bmips_cbr_addr value accordingly. Also
> > bmips_rac_flush_disable will be set to false as RAC flush can be
> > correctly supported.
> > 
> > The CBR address from DT will be applied only if the CBR address from the
> > registers is 0, if the CBR address from the registers is not 0 and
> > is not equal to the one set in DT (if provided) a WARN is printed.
> > 
> > To ALWAYS overwrite the CBR address the additional property
> > "mips-broken-cbr-reg" needs to be set.
> > 
> > Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
> > ---
> >   arch/mips/bmips/setup.c | 30 +++++++++++++++++++++++++++---
> >   1 file changed, 27 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/mips/bmips/setup.c b/arch/mips/bmips/setup.c
> > index 18561d426f89..bef84677248e 100644
> > --- a/arch/mips/bmips/setup.c
> > +++ b/arch/mips/bmips/setup.c
> > @@ -34,7 +34,11 @@
> >   #define REG_BCM6328_OTP		((void __iomem *)CKSEG1ADDR(0x1000062c))
> >   #define BCM6328_TP1_DISABLED	BIT(9)
> > -/* CBR addr doesn't change and we can cache it */
> > +/*
> > + * CBR addr doesn't change and we can cache it.
> > + * For broken SoC/Bootloader CBR addr might also be provided via DT
> > + * with "mips-cbr-reg" in the "cpus" node.
> > + */
> >   void __iomem *bmips_cbr_addr;
> >   extern bool bmips_rac_flush_disable;
> > @@ -212,8 +216,28 @@ void __init device_tree_init(void)
> >   	/* Disable SMP boot unless both CPUs are listed in DT and !disabled */
> >   	np = of_find_node_by_name(NULL, "cpus");
> > -	if (np && of_get_available_child_count(np) <= 1)
> > -		bmips_smp_enabled = 0;
> > +	if (np) {
> 
> Please reduce the indentation with early return/gotos. There might also be a
> need to do some validation that the CBR is at least outside of the DRAM
> window, that is we cannot blindly trust the DT to have gotten the CBR right
> IMHO.

Do you have any hint on how to do the check if we are outside DRAM?
Christian Marangi May 3, 2024, 7:39 p.m. UTC | #7
On Fri, May 03, 2024 at 12:07:45PM -0700, Florian Fainelli wrote:
> On 5/3/24 06:54, Christian Marangi wrote:
> > Drop redundant boot_cpu_type in arch_sync_dma_for_cpu_all. These needs
> > to be parsed only once and we can make use of bmips_rac_flush_disable to
> > disable RAC flush on unsupported CPU.
> > 
> > Set this value in bmips_cpu_setup for unsupported CPU to skip this
> > redundant check every time DMA needs to be synced.
> > 
> > Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
> 
> You are taking a shortcut that is reasonable in premise, but keying off the
> bmips_rac_flush_disable is IMHO misleading. The RAC is enabled in the
> BMIPS5000 and BMIPS5200 cores, just it does not need SW management unlike
> earlier cores.
> 
> If you renamed it to bmips_rac_flush_needed that might be more compelling.
> Also, the other reason is that on a kernel that was configured for
> supporting only BMIPS5000 and BMIPS5200 CPUs, I think we could get some
> decent dead code elimination of the boot_cpu_type() check, which would not
> be the case.

I was a bit confused by the last part, should I drop this or just rename
the variable? Cause I think for kernel that support ONLY those CPU I
guess the DMA function will be optimized anyway since the bool will
always be false I guess?
Florian Fainelli May 3, 2024, 8:08 p.m. UTC | #8
On 5/3/24 12:39, Christian Marangi wrote:
> On Fri, May 03, 2024 at 12:07:45PM -0700, Florian Fainelli wrote:
>> On 5/3/24 06:54, Christian Marangi wrote:
>>> Drop redundant boot_cpu_type in arch_sync_dma_for_cpu_all. These needs
>>> to be parsed only once and we can make use of bmips_rac_flush_disable to
>>> disable RAC flush on unsupported CPU.
>>>
>>> Set this value in bmips_cpu_setup for unsupported CPU to skip this
>>> redundant check every time DMA needs to be synced.
>>>
>>> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
>>
>> You are taking a shortcut that is reasonable in premise, but keying off the
>> bmips_rac_flush_disable is IMHO misleading. The RAC is enabled in the
>> BMIPS5000 and BMIPS5200 cores, just it does not need SW management unlike
>> earlier cores.
>>
>> If you renamed it to bmips_rac_flush_needed that might be more compelling.
>> Also, the other reason is that on a kernel that was configured for
>> supporting only BMIPS5000 and BMIPS5200 CPUs, I think we could get some
>> decent dead code elimination of the boot_cpu_type() check, which would not
>> be the case.
> 
> I was a bit confused by the last part, should I drop this or just rename
> the variable? Cause I think for kernel that support ONLY those CPU I
> guess the DMA function will be optimized anyway since the bool will
> always be false I guess?

I don't think it can be optimized, I would drop that patch. This is a 
hot-path and so any optimization is welcome.
Daniel González Cabanelas May 3, 2024, 9:11 p.m. UTC | #9
Hi Florian.
Bits 0 and 1 are already enabled by the bootloader, so no need to
write 0xF. I checked it on some devices with BCM6328, 6358, 6368 SoCs.

Example, without the patch, reading the RAC Configuration Register 0 and 1:

- BCM6368 booting from TP0:
root@OpenWrt:/# devmem 0xff400000
0x02A07015
root@OpenWrt:/# devmem 0xff400008
0x0000000F

- BCM6368 booting from TP1:
root@OpenWrt:/# devmem 0xff400000
0x02A0701F
root@OpenWrt:/# devmem 0xff400008
0x00000005
root@OpenWrt:/#

Regards.
Daniel

El vie, 3 may 2024 a las 20:56, Florian Fainelli
(<florian.fainelli@broadcom.com>) escribió:
>
> On 5/3/24 06:54, Christian Marangi wrote:
> > From: Daniel González Cabanelas <dgcbueu@gmail.com>
> >
> > The data RAC is left disabled by the bootloader in some SoCs, at least in
> > the core it boots from.
> > Enabling this feature increases the performance up to +30% depending on the
> > task.
> >
> > Signed-off-by: Daniel González Cabanelas <dgcbueu@gmail.com>
> > Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
> > [ rework code and reduce code duplication ]
> > Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
> > ---
> >   arch/mips/kernel/smp-bmips.c | 12 ++++++++++++
> >   1 file changed, 12 insertions(+)
> >
> > diff --git a/arch/mips/kernel/smp-bmips.c b/arch/mips/kernel/smp-bmips.c
> > index 6048c471b5ee..7bde6bbaa41f 100644
> > --- a/arch/mips/kernel/smp-bmips.c
> > +++ b/arch/mips/kernel/smp-bmips.c
> > @@ -617,6 +617,18 @@ void bmips_cpu_setup(void)
> >               __raw_readl(bmips_cbr_addr + BMIPS_RAC_ADDRESS_RANGE);
> >               break;
> >
> > +     case CPU_BMIPS4350:
> > +             u32 rac_addr = BMIPS_RAC_CONFIG_1;
> > +
> > +             if (!(read_c0_brcm_cmt_local() & (1 << 31)))
> > +                     rac_addr = BMIPS_RAC_CONFIG;
> > +
> > +             /* Enable data RAC */
> > +             cfg = __raw_readl(bmips_cbr_addr + rac_addr);
> > +             __raw_writel(cfg | 0xa, bmips_cbr_addr + rac_addr);
>
> This enables data pre-fetching (bit 3) and data-caching (bit 1), have
> you tried with 0xF to see if this provides any additional speed-up?
>
> Looks correct to me otherwise, I wonder if a flush would be in order
> right after enabling, though I did not see any specific instructions
> towards that part in the programming notes.
>
> > +             __raw_readl(bmips_cbr_addr + rac_addr);
> > +             break;
> > +
> >       case CPU_BMIPS4380:
> >               /* CBG workaround for early BMIPS4380 CPUs */
> >               switch (read_c0_prid()) {
>
> --
> Florian
>
Christian Marangi May 3, 2024, 9:15 p.m. UTC | #10
On Fri, May 03, 2024 at 11:11:13PM +0200, Daniel González Cabanelas wrote:
> El vie, 3 may 2024 a las 20:56, Florian Fainelli
> (<florian.fainelli@broadcom.com>) escribió:
> >
> > On 5/3/24 06:54, Christian Marangi wrote:
> > > From: Daniel González Cabanelas <dgcbueu@gmail.com>
> > >
> > > The data RAC is left disabled by the bootloader in some SoCs, at least in
> > > the core it boots from.
> > > Enabling this feature increases the performance up to +30% depending on the
> > > task.
> > >
> > > Signed-off-by: Daniel González Cabanelas <dgcbueu@gmail.com>
> > > Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
> > > [ rework code and reduce code duplication ]
> > > Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
> > > ---
> > >   arch/mips/kernel/smp-bmips.c | 12 ++++++++++++
> > >   1 file changed, 12 insertions(+)
> > >
> > > diff --git a/arch/mips/kernel/smp-bmips.c b/arch/mips/kernel/smp-bmips.c
> > > index 6048c471b5ee..7bde6bbaa41f 100644
> > > --- a/arch/mips/kernel/smp-bmips.c
> > > +++ b/arch/mips/kernel/smp-bmips.c
> > > @@ -617,6 +617,18 @@ void bmips_cpu_setup(void)
> > >               __raw_readl(bmips_cbr_addr + BMIPS_RAC_ADDRESS_RANGE);
> > >               break;
> > >
> > > +     case CPU_BMIPS4350:
> > > +             u32 rac_addr = BMIPS_RAC_CONFIG_1;
> > > +
> > > +             if (!(read_c0_brcm_cmt_local() & (1 << 31)))
> > > +                     rac_addr = BMIPS_RAC_CONFIG;
> > > +
> > > +             /* Enable data RAC */
> > > +             cfg = __raw_readl(bmips_cbr_addr + rac_addr);
> > > +             __raw_writel(cfg | 0xa, bmips_cbr_addr + rac_addr);
> >
> > This enables data pre-fetching (bit 3) and data-caching (bit 1), have
> > you tried with 0xF to see if this provides any additional speed-up?
> >
> > Looks correct to me otherwise, I wonder if a flush would be in order
> > right after enabling, though I did not see any specific instructions
> > towards that part in the programming notes.
> >
> > > +             __raw_readl(bmips_cbr_addr + rac_addr);
> > > +             break;
> > > +
> > >       case CPU_BMIPS4380:
> > >               /* CBG workaround for early BMIPS4380 CPUs */
> > >               switch (read_c0_prid()) {
> >
> Hi Florian.
> Bits 0 and 1 are already enabled by the bootloader, so no need to
> write 0xF. I checked it on some devices with BCM6328, 6358, 6368 SoCs.
>
> Example, without the patch, reading the RAC Configuration Register 0 and 1:
>
> - BCM6368 booting from TP0:
> root@OpenWrt:/# devmem 0xff400000
> 0x02A07015
> root@OpenWrt:/# devmem 0xff400008
> 0x0000000F
>
> - BCM6368 booting from TP1:
> root@OpenWrt:/# devmem 0xff400000
> 0x02A0701F
> root@OpenWrt:/# devmem 0xff400008
> 0x00000005
> root@OpenWrt:/#
>

[ fixed the top-post ]

If that's the case then i'm setting 0xf since we verified it doesn't
cause problem and it's already set.
Florian Fainelli May 3, 2024, 9:24 p.m. UTC | #11
On 5/3/24 12:35, Christian Marangi wrote:
> On Fri, May 03, 2024 at 12:09:02PM -0700, Florian Fainelli wrote:
>> On 5/3/24 06:54, Christian Marangi wrote:
>>> Add support to provide CBR address from DT to handle broken
>>> SoC/Bootloader that doesn't correctly init it. This permits to use the
>>> RAC flush even in these condition.
>>>
>>> To provide a CBR address from DT, the property "mips-cbr-reg" needs to
>>> be set in the "cpus" node. On DT init, this property presence will be
>>> checked and will set the bmips_cbr_addr value accordingly. Also
>>> bmips_rac_flush_disable will be set to false as RAC flush can be
>>> correctly supported.
>>>
>>> The CBR address from DT will be applied only if the CBR address from the
>>> registers is 0, if the CBR address from the registers is not 0 and
>>> is not equal to the one set in DT (if provided) a WARN is printed.
>>>
>>> To ALWAYS overwrite the CBR address the additional property
>>> "mips-broken-cbr-reg" needs to be set.
>>>
>>> Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
>>> ---
>>>    arch/mips/bmips/setup.c | 30 +++++++++++++++++++++++++++---
>>>    1 file changed, 27 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/mips/bmips/setup.c b/arch/mips/bmips/setup.c
>>> index 18561d426f89..bef84677248e 100644
>>> --- a/arch/mips/bmips/setup.c
>>> +++ b/arch/mips/bmips/setup.c
>>> @@ -34,7 +34,11 @@
>>>    #define REG_BCM6328_OTP		((void __iomem *)CKSEG1ADDR(0x1000062c))
>>>    #define BCM6328_TP1_DISABLED	BIT(9)
>>> -/* CBR addr doesn't change and we can cache it */
>>> +/*
>>> + * CBR addr doesn't change and we can cache it.
>>> + * For broken SoC/Bootloader CBR addr might also be provided via DT
>>> + * with "mips-cbr-reg" in the "cpus" node.
>>> + */
>>>    void __iomem *bmips_cbr_addr;
>>>    extern bool bmips_rac_flush_disable;
>>> @@ -212,8 +216,28 @@ void __init device_tree_init(void)
>>>    	/* Disable SMP boot unless both CPUs are listed in DT and !disabled */
>>>    	np = of_find_node_by_name(NULL, "cpus");
>>> -	if (np && of_get_available_child_count(np) <= 1)
>>> -		bmips_smp_enabled = 0;
>>> +	if (np) {
>>
>> Please reduce the indentation with early return/gotos. There might also be a
>> need to do some validation that the CBR is at least outside of the DRAM
>> window, that is we cannot blindly trust the DT to have gotten the CBR right
>> IMHO.
> 
> Do you have any hint on how to do the check if we are outside DRAM?
> 

I was going to suggest the use of memblock_start_of_DRAM() and 
memblock_end_of_DRAM() but before I sent that out your v2 already had it!
Christian Marangi May 3, 2024, 9:27 p.m. UTC | #12
On Fri, May 03, 2024 at 02:24:49PM -0700, Florian Fainelli wrote:
> On 5/3/24 12:35, Christian Marangi wrote:
> > On Fri, May 03, 2024 at 12:09:02PM -0700, Florian Fainelli wrote:
> > > On 5/3/24 06:54, Christian Marangi wrote:
> > > > Add support to provide CBR address from DT to handle broken
> > > > SoC/Bootloader that doesn't correctly init it. This permits to use the
> > > > RAC flush even in these condition.
> > > > 
> > > > To provide a CBR address from DT, the property "mips-cbr-reg" needs to
> > > > be set in the "cpus" node. On DT init, this property presence will be
> > > > checked and will set the bmips_cbr_addr value accordingly. Also
> > > > bmips_rac_flush_disable will be set to false as RAC flush can be
> > > > correctly supported.
> > > > 
> > > > The CBR address from DT will be applied only if the CBR address from the
> > > > registers is 0, if the CBR address from the registers is not 0 and
> > > > is not equal to the one set in DT (if provided) a WARN is printed.
> > > > 
> > > > To ALWAYS overwrite the CBR address the additional property
> > > > "mips-broken-cbr-reg" needs to be set.
> > > > 
> > > > Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
> > > > ---
> > > >    arch/mips/bmips/setup.c | 30 +++++++++++++++++++++++++++---
> > > >    1 file changed, 27 insertions(+), 3 deletions(-)
> > > > 
> > > > diff --git a/arch/mips/bmips/setup.c b/arch/mips/bmips/setup.c
> > > > index 18561d426f89..bef84677248e 100644
> > > > --- a/arch/mips/bmips/setup.c
> > > > +++ b/arch/mips/bmips/setup.c
> > > > @@ -34,7 +34,11 @@
> > > >    #define REG_BCM6328_OTP		((void __iomem *)CKSEG1ADDR(0x1000062c))
> > > >    #define BCM6328_TP1_DISABLED	BIT(9)
> > > > -/* CBR addr doesn't change and we can cache it */
> > > > +/*
> > > > + * CBR addr doesn't change and we can cache it.
> > > > + * For broken SoC/Bootloader CBR addr might also be provided via DT
> > > > + * with "mips-cbr-reg" in the "cpus" node.
> > > > + */
> > > >    void __iomem *bmips_cbr_addr;
> > > >    extern bool bmips_rac_flush_disable;
> > > > @@ -212,8 +216,28 @@ void __init device_tree_init(void)
> > > >    	/* Disable SMP boot unless both CPUs are listed in DT and !disabled */
> > > >    	np = of_find_node_by_name(NULL, "cpus");
> > > > -	if (np && of_get_available_child_count(np) <= 1)
> > > > -		bmips_smp_enabled = 0;
> > > > +	if (np) {
> > > 
> > > Please reduce the indentation with early return/gotos. There might also be a
> > > need to do some validation that the CBR is at least outside of the DRAM
> > > window, that is we cannot blindly trust the DT to have gotten the CBR right
> > > IMHO.
> > 
> > Do you have any hint on how to do the check if we are outside DRAM?
> > 
> 
> I was going to suggest the use of memblock_start_of_DRAM() and
> memblock_end_of_DRAM() but before I sent that out your v2 already had it!

Ehehe, I was initially using mem_map and totalram_pages but then I inspected
memblock header and notice those good API.
Daniel González Cabanelas May 3, 2024, 9:34 p.m. UTC | #13
El vie, 3 may 2024 a las 23:15, Christian Marangi
(<ansuelsmth@gmail.com>) escribió:
>
> On Fri, May 03, 2024 at 11:11:13PM +0200, Daniel González Cabanelas wrote:
> > El vie, 3 may 2024 a las 20:56, Florian Fainelli
> > (<florian.fainelli@broadcom.com>) escribió:
> > >
> > > On 5/3/24 06:54, Christian Marangi wrote:
> > > > From: Daniel González Cabanelas <dgcbueu@gmail.com>
> > > >
> > > > The data RAC is left disabled by the bootloader in some SoCs, at least in
> > > > the core it boots from.
> > > > Enabling this feature increases the performance up to +30% depending on the
> > > > task.
> > > >
> > > > Signed-off-by: Daniel González Cabanelas <dgcbueu@gmail.com>
> > > > Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
> > > > [ rework code and reduce code duplication ]
> > > > Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
> > > > ---
> > > >   arch/mips/kernel/smp-bmips.c | 12 ++++++++++++
> > > >   1 file changed, 12 insertions(+)
> > > >
> > > > diff --git a/arch/mips/kernel/smp-bmips.c b/arch/mips/kernel/smp-bmips.c
> > > > index 6048c471b5ee..7bde6bbaa41f 100644
> > > > --- a/arch/mips/kernel/smp-bmips.c
> > > > +++ b/arch/mips/kernel/smp-bmips.c
> > > > @@ -617,6 +617,18 @@ void bmips_cpu_setup(void)
> > > >               __raw_readl(bmips_cbr_addr + BMIPS_RAC_ADDRESS_RANGE);
> > > >               break;
> > > >
> > > > +     case CPU_BMIPS4350:
> > > > +             u32 rac_addr = BMIPS_RAC_CONFIG_1;
> > > > +
> > > > +             if (!(read_c0_brcm_cmt_local() & (1 << 31)))
> > > > +                     rac_addr = BMIPS_RAC_CONFIG;
> > > > +
> > > > +             /* Enable data RAC */
> > > > +             cfg = __raw_readl(bmips_cbr_addr + rac_addr);
> > > > +             __raw_writel(cfg | 0xa, bmips_cbr_addr + rac_addr);
> > >
> > > This enables data pre-fetching (bit 3) and data-caching (bit 1), have
> > > you tried with 0xF to see if this provides any additional speed-up?
> > >
> > > Looks correct to me otherwise, I wonder if a flush would be in order
> > > right after enabling, though I did not see any specific instructions
> > > towards that part in the programming notes.
> > >
> > > > +             __raw_readl(bmips_cbr_addr + rac_addr);
> > > > +             break;
> > > > +
> > > >       case CPU_BMIPS4380:
> > > >               /* CBG workaround for early BMIPS4380 CPUs */
> > > >               switch (read_c0_prid()) {
> > >
> > Hi Florian.
> > Bits 0 and 1 are already enabled by the bootloader, so no need to

I meant bits 0 and 2. These are the RAC bits:
#define RAC_FLH         (1 << 8)
#define RAC_DPF         (1 << 6)
#define RAC_NCH         (1 << 5)
#define RAC_C_INV       (1 << 4)
#define RAC_PF_D        (1 << 3)
#define RAC_PF_I        (1 << 2)
#define RAC_D           (1 << 1)
#define RAC_I           (1 << 0)

> > write 0xF. I checked it on some devices with BCM6328, 6358, 6368 SoCs.
> >
> > Example, without the patch, reading the RAC Configuration Register 0 and 1:
> >
> > - BCM6368 booting from TP0:
> > root@OpenWrt:/# devmem 0xff400000
> > 0x02A07015
> > root@OpenWrt:/# devmem 0xff400008
> > 0x0000000F
> >
> > - BCM6368 booting from TP1:
> > root@OpenWrt:/# devmem 0xff400000
> > 0x02A0701F
> > root@OpenWrt:/# devmem 0xff400008
> > 0x00000005
> > root@OpenWrt:/#
> >
>
> [ fixed the top-post ]
>
> If that's the case then i'm setting 0xf since we verified it doesn't
> cause problem and it's already set.

It's harmless to re-enable the instruction bits. BTW the log commit
refers only to data RAC, 0xF is enabling both the data and instruction
RAC.

Daniel
>
> --
>         Ansuel