[v3] powerpc: 85xx: separate e500 from e500mc

Message ID	b965536bbd0457ab9614777dc00fc0dd22cf1d47.1312953203.git.baruch@tkos.co.il (mailing list archive)
State	Superseded
Delegated to:	Kumar Gala
Headers
	Return-Path: <linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org>
	From: Baruch Siach <baruch@tkos.co.il>
	To: linuxppc-dev@lists.ozlabs.org
	Subject: [PATCH v3] powerpc: 85xx: separate e500 from e500mc
	Date: Wed, 10 Aug 2011 08:21:18 +0300
	Message-Id: <b965536bbd0457ab9614777dc00fc0dd22cf1d47.1312953203.git.baruch@tkos.co.il>
	In-Reply-To: <20110810044307.GA2804@sapphire.tkos.co.il>
	References: <20110810044307.GA2804@sapphire.tkos.co.il>
	Cc: Scott Wood <scottwood@freescale.com>, Baruch Siach <baruch@tkos.co.il>, Timur Tabi <timur@freescale.com>
	Precedence: list
	MIME-Version: 1.0
	Content-Type: text/plain; charset="us-ascii"
	Content-Transfer-Encoding: 7bit
	Errors-To: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org
	Sender: linuxppc-dev-bounces+patchwork-incoming=ozlabs.org@lists.ozlabs.org

Baruch Siach Aug. 10, 2011, 5:21 a.m. UTC

CONFIG_E500MC breaks e500/e500v2 systems. It defines L1_CACHE_SHIFT to 6, thus
breaking clear_pages(), probably others too.

This patch adds a new "Processor Type" entry for e500mc, and makes e500 systems
depend on PPC_E500_V1_V2.

Cc: Kumar Gala <galak@kernel.crashing.org>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
---

Changes from v2:

	* s/CONFIG_PPC_E500/CONFIG_PPC_E500_V1_V2/ to avoid confusion as
	  noted by Scott Wood

Changes from v1:

	* Rebase on 3.1-rc1

	* Remove the list of processor families from the PPC_E500 and 
	  PPC_E500MC options description. The P20xx can be either e500v2 or 
	  e500mc.

 arch/powerpc/platforms/85xx/Kconfig    |   13 +++++++++----
 arch/powerpc/platforms/Kconfig.cputype |   27 +++++++++++++++------------
 2 files changed, 24 insertions(+), 16 deletions(-)

Paul Gortmaker Aug. 10, 2011, 3:39 p.m. UTC | #1

On Wed, Aug 10, 2011 at 1:21 AM, Baruch Siach <baruch@tkos.co.il> wrote:
> CONFIG_E500MC breaks e500/e500v2 systems. It defines L1_CACHE_SHIFT to 6, thus
> breaking clear_pages(), probably others too.
>
> This patch adds a new "Processor Type" entry for e500mc, and makes e500 systems
> depend on PPC_E500_V1_V2.

Isn't the original invalid configuration still possible, i.e. I can
choose E500_V1_V2 and also E500MC at the same time, unless you add
something like a "depends !E500MC" to your new V1_V2 option?

Alternatively, you could treat it like using an i386 kernel on a modern
core by taking the LCD for the L1_CACHE_SHIFT of the configured-in
platforms.  I have booted a kernel built for an mpc8548 core on a P4080
CPU, so that does work (with only minimal dts fiddling).  And it keeps
the ability to boot one kernel on several platforms open (one of the
reasons for the ppc --> powerpc shuffle a couple of years ago...)

Paul.

>
> Cc: Kumar Gala <galak@kernel.crashing.org>
> Signed-off-by: Baruch Siach <baruch@tkos.co.il>
> ---
>
> Changes from v2:
>
>        * s/CONFIG_PPC_E500/CONFIG_PPC_E500_V1_V2/ to avoid confusion as
>          noted by Scott Wood
>
> Changes from v1:
>
>        * Rebase on 3.1-rc1
>
>        * Remove the list of processor families from the PPC_E500 and
>          PPC_E500MC options description. The P20xx can be either e500v2 or
>          e500mc.
>
>  arch/powerpc/platforms/85xx/Kconfig    |   13 +++++++++----
>  arch/powerpc/platforms/Kconfig.cputype |   27 +++++++++++++++------------
>  2 files changed, 24 insertions(+), 16 deletions(-)
>
> diff --git a/arch/powerpc/platforms/85xx/Kconfig b/arch/powerpc/platforms/85xx/Kconfig
> index 498534c..00d4720 100644
> --- a/arch/powerpc/platforms/85xx/Kconfig
> +++ b/arch/powerpc/platforms/85xx/Kconfig
> @@ -13,6 +13,8 @@ if FSL_SOC_BOOKE
>
>  if PPC32
>
> +if PPC_E500_V1_V2
> +
>  config MPC8540_ADS
>        bool "Freescale MPC8540 ADS"
>        select DEFAULT_UIMAGE
> @@ -171,10 +173,13 @@ config SBC8560
>        help
>          This option enables support for the Wind River SBC8560 board
>
> +endif # PPC_E500_V1_V2
> +
> +if PPC_E500MC
> +
>  config P2040_RDB
>        bool "Freescale P2040 RDB"
>        select DEFAULT_UIMAGE
> -       select PPC_E500MC
>        select PHYS_64BIT
>        select SWIOTLB
>        select MPC8xxx_GPIO
> @@ -186,7 +191,6 @@ config P2040_RDB
>  config P3041_DS
>        bool "Freescale P3041 DS"
>        select DEFAULT_UIMAGE
> -       select PPC_E500MC
>        select PHYS_64BIT
>        select SWIOTLB
>        select MPC8xxx_GPIO
> @@ -198,7 +202,6 @@ config P3041_DS
>  config P4080_DS
>        bool "Freescale P4080 DS"
>        select DEFAULT_UIMAGE
> -       select PPC_E500MC
>        select PHYS_64BIT
>        select SWIOTLB
>        select MPC8xxx_GPIO
> @@ -207,13 +210,15 @@ config P4080_DS
>        help
>          This option enables support for the P4080 DS board
>
> +endif # PPC_E500MC
> +
>  endif # PPC32
>
>  config P5020_DS
>        bool "Freescale P5020 DS"
> +       depends on PPC_E500MC
>        select DEFAULT_UIMAGE
>        select E500
> -       select PPC_E500MC
>        select PHYS_64BIT
>        select SWIOTLB
>        select MPC8xxx_GPIO
> diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
> index e06e395..e6cb00c 100644
> --- a/arch/powerpc/platforms/Kconfig.cputype
> +++ b/arch/powerpc/platforms/Kconfig.cputype
> @@ -10,13 +10,13 @@ choice
>        prompt "Processor Type"
>        depends on PPC32
>        help
> -         There are five families of 32 bit PowerPC chips supported.
> +         There are six families of 32 bit PowerPC chips supported.
>          The most common ones are the desktop and server CPUs (601, 603,
>          604, 740, 750, 74xx) CPUs from Freescale and IBM, with their
>          embedded 512x/52xx/82xx/83xx/86xx counterparts.
> -         The other embeeded parts, namely 4xx, 8xx, e200 (55xx) and e500
> -         (85xx) each form a family of their own that is not compatible
> -         with the others.
> +         The other embeeded parts, namely 4xx, 8xx, e200 (55xx), e500
> +         (85xx), and e500mc each form a family of their own that is not
> +         compatible with the others.
>
>          If unsure, select 52xx/6xx/7xx/74xx/82xx/83xx/86xx.
>
> @@ -24,10 +24,15 @@ config PPC_BOOK3S_32
>        bool "512x/52xx/6xx/7xx/74xx/82xx/83xx/86xx"
>        select PPC_FPU
>
> -config PPC_85xx
> -       bool "Freescale 85xx"
> +config PPC_E500_V1_V2
> +       bool "Freescale e500v1/e500v2"
> +       select PPC_85xx
>        select E500
>
> +config PPC_E500MC
> +       bool "Freescale e500mc/e5500"
> +       select PPC_85xx
> +
>  config PPC_8xx
>        bool "Freescale 8xx"
>        select FSL_SOC
> @@ -128,15 +133,13 @@ config TUNE_CELL
>  config 8xx
>        bool
>
> -config E500
> +config PPC_85xx
> +       bool
>        select FSL_EMB_PERFMON
>        select PPC_FSL_BOOK3E
> -       bool
>
> -config PPC_E500MC
> -       bool "e500mc Support"
> -       select PPC_FPU
> -       depends on E500
> +config E500
> +       bool
>
>  config PPC_FPU
>        bool
> --
> 1.7.5.4
>

Scott Wood Aug. 10, 2011, 4:01 p.m. UTC | #2

On 08/10/2011 10:39 AM, Paul Gortmaker wrote:
> On Wed, Aug 10, 2011 at 1:21 AM, Baruch Siach <baruch@tkos.co.il> wrote:
>> CONFIG_E500MC breaks e500/e500v2 systems. It defines L1_CACHE_SHIFT to 6, thus
>> breaking clear_pages(), probably others too.
>>
>> This patch adds a new "Processor Type" entry for e500mc, and makes e500 systems
>> depend on PPC_E500_V1_V2.
> 
> Isn't the original invalid configuration still possible, i.e. I can
> choose E500_V1_V2
> and also E500MC at the same time, unless you add something like a
> "depends !E500MC" to  your new V1_V2 option?

They're members of a "choice", not standalone bools -- so they're
mutually exclusive.

> Alternatively, you could treat it like using i386 kernel on a modern
> core by taking
> the LCD for the L1_CACHE_SHIFT of the configured in platforms. 

For alignment you want to err on the high side, but for invalidation you
want to err on the low side.  For dcbz you can't err at all.
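
The dcbz point can be illustrated with a small user-space model. This is only a sketch: sim_dcbz(), sim_clear() and the two helper functions are hypothetical names invented here, plain memory stands in for the cache, and the only behaviour assumed is that dcbz zeroes the whole hardware cache block containing the given address. It shows that assuming a block that is too large leaves stale bytes behind, while assuming one that is too small wipes bytes outside the requested range.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SIZE 4096

/* Model of dcbz: zero the whole hardware block containing addr. */
static void sim_dcbz(unsigned char *mem, size_t addr, size_t hw_block)
{
    assert(hw_block != 0 && (hw_block & (hw_block - 1)) == 0);
    memset(mem + (addr & ~(hw_block - 1)), 0, hw_block);
}

/* Zero [start, start + len), stepping by the block size the kernel
 * was built for (assumed_block), on hardware that uses hw_block. */
static void sim_clear(unsigned char *mem, size_t start, size_t len,
                      size_t assumed_block, size_t hw_block)
{
    for (size_t off = start; off < start + len; off += assumed_block)
        sim_dcbz(mem, off, hw_block);
}

static size_t count_nonzero(const unsigned char *mem, size_t start, size_t len)
{
    size_t n = 0;

    for (size_t i = start; i < start + len; i++)
        n += (mem[i] != 0);
    return n;
}

/* Bytes of a page that are still dirty after a clear_page-style loop. */
size_t stale_after_clear_page(size_t assumed_block, size_t hw_block)
{
    unsigned char page[PAGE_SIZE];

    memset(page, 0xFF, sizeof(page));
    sim_clear(page, 0, PAGE_SIZE, assumed_block, hw_block);
    return count_nonzero(page, 0, PAGE_SIZE);
}

/* Bytes outside [0, assumed_block) that were zeroed when clearing
 * exactly one assumed-size block at offset 0. */
size_t clobbered_outside(size_t assumed_block, size_t hw_block)
{
    unsigned char buf[256];
    size_t outside = sizeof(buf) - assumed_block;

    memset(buf, 0xFF, sizeof(buf));
    sim_clear(buf, 0, assumed_block, assumed_block, hw_block);
    return outside - count_nonzero(buf, assumed_block, outside);
}
```

In this model, building for 64-byte blocks and running on 32-byte hardware leaves half of every page uncleared, and the opposite mismatch zeroes 32 bytes that the caller never asked to clear.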

And there are other issues than cache size with combining e500v2 and e500mc.

Could it be done with sufficient hoop-jumping?  Probably.  Is it worth
it?  No.  These chips don't even have compatible userspace, unless you
use soft-float.

> I have booted
> a kernel built for an mpc8548 core on a P4080 CPU, so that does work (with only
> minimal dts fiddling).

The opposite direction does not work, and simply booting doesn't mean
there wouldn't be issues in running that kernel on a p4080 (floating
point?  bad cache size information being given to userspace?  emulation
of non-cacheable dcbz?  performance?).

What dts fiddling?

> And it keeps the ability to boot one kernel on several
> platforms open (one of the reasons for the ppc --> powerpc shuffle a couple
> of years ago...)

It's much better than the arch/ppc way of a separate kernel build for
every board.  Beyond a certain point there are diminishing returns on
the effort.

-Scott

Paul Gortmaker Aug. 10, 2011, 4:40 p.m. UTC | #3

On 11-08-10 12:01 PM, Scott Wood wrote:
> On 08/10/2011 10:39 AM, Paul Gortmaker wrote:
>> On Wed, Aug 10, 2011 at 1:21 AM, Baruch Siach <baruch@tkos.co.il> wrote:
>>> CONFIG_E500MC breaks e500/e500v2 systems. It defines L1_CACHE_SHIFT to 6, thus
>>> breaking clear_pages(), probably others too.
>>>
>>> This patch adds a new "Processor Type" entry for e500mc, and makes e500 systems
>>> depend on PPC_E500_V1_V2.
>>
>> Isn't the original invalid configuration still possible, i.e. I can
>> choose E500_V1_V2
>> and also E500MC at the same time, unless you add something like a
>> "depends !E500MC" to  your new V1_V2 option?
> 
> They're members of a "choice", not standalone bools -- so they're
> mutually exclusive.

OK, I missed that.

> 
>> Alternatively, you could treat it like using i386 kernel on a modern
>> core by taking
>> the LCD for the L1_CACHE_SHIFT of the configured in platforms. 
> 
> For alignment you want to err on the high side, but for invalidation you
> want to err on the low side.  For dcbz you can't err at all.
> 
> And there are other issues than cache size with combining e500v2 and e500mc.
> 
> Could it be done with sufficient hoop-jumping?  Probably.  Is it worth
> it?  No.  These chips don't even have compatible userspace, unless you
> use soft-float.

Yeah, if there are lots of other issues and the value return is low,
then I can't argue with that.  And yes I did use soft float in the
thing I was meddling with.

> 
>> I have booted
>> a kernel built for an mpc8548 core on a P4080 CPU, so that does work (with only
>> minimal dts fiddling).
> 
> The opposite direction does not work, and simply booting doesn't mean
> there wouldn't be issues in running that kernel on a p4080 (floating
> point?  bad cache size information being given to userspace?  emulation
> of non-cacheable dcbz?  performance?).
> 
> What dts fiddling?

Just making sure that the 8548 dts had the right address to
find the uart on the actual p4080 platform.

> 
>> And it keeps the ability to boot one kernel on several
>> platforms open (one of the reasons for the ppc --> powerpc shuffle a couple
>> of years ago...)
> 
> It's much better than the arch/ppc way of a separate kernel build for
> every board.  Beyond a certain point there are diminishing returns on
> the effort.

Given the extra info you list above, I agree.  I just thought it worth
a mention since I had happened to boot the 8548 kernel on a p4080 as
part of something else I was experimenting with, and it didn't totally
catch fire (which somewhat surprised me).

P.

> 
> -Scott
>

Baruch Siach Oct. 24, 2011, 6 a.m. UTC | #4

Hi Kumar,

On Wed, Aug 10, 2011 at 08:21:18AM +0300, Baruch Siach wrote:
> CONFIG_E500MC breaks e500/e500v2 systems. It defines L1_CACHE_SHIFT to 6, thus
> breaking clear_pages(), probably others too.
> 
> This patch adds a new "Processor Type" entry for e500mc, and makes e500 systems
> depend on PPC_E500_V1_V2.

Ping.
Any chance of merging this for 3.2?

baruch


Kyle Moffett Nov. 10, 2011, 12:03 a.m. UTC | #5

Hello,

I saw Baruch Siach's patch:
  powerpc: 85xx: separate e500 from e500mc

Unfortunately, that patch breaks the dependencies for the P5020DS
platform and does not fix the underlying code which does not
understand what the ambiguous "CONFIG_E500" means.

In order to fix the issue at the fundamental level, I created the
following 17-patch series loosely based on Baruch's patch.

=== High-Level Summary ===

The e500v1/v2 and e500mc/e5500 CPU families are not compatible with
each other, yet they share the same "CONFIG_E500" Kconfig option.

The following patch series splits the 32-bit CPU support into two
separate options: "CONFIG_FSL_E500_V1_V2" and "CONFIG_FSL_E500MC".
Additionally, the 64-bit e5500 support is separated to its own config
option ("CONFIG_FSL_E5500") which is automatically combined with
either 32-bit e500MC or 64-bit Book-3E when the P5020DS board support
is enabled.

I based the patches on v3.2-rc1, please let me know if I should
update the patches against a different tree.

The first 4 patches stand on their own merits; they are generic code
cleanups necessary to support the later patches.

I'd like to know what you all think.

Cheers,
Kyle Moffett

--
Curious about my work on the Debian powerpcspe port?
I'm keeping a blog here: http://pureperl.blogspot.com/

Kyle Moffett Nov. 10, 2011, 12:06 a.m. UTC | #6

(Sorry for the repost, I accidentally omitted Baruch's email).


Kumar Gala Nov. 10, 2011, 1:59 p.m. UTC | #7

On Nov 9, 2011, at 6:03 PM, Kyle Moffett wrote:

> Hello,
> 
> I saw Baruch Siach's patch:
>  powerpc: 85xx: separate e500 from e500mc
> 
> Unfortunately, that patch breaks the dependencies for the P5020DS
> platform and does not fix the underlying code which does not
> understand what the ambiguous "CONFIG_E500" means.
> 
> In order to fix the issue at the fundamental level, I created the
> following 17-patch series loosely based on Baruch's patch.
> 
> === High-Level Summary ===
> 
> The e500v1/v2 and e500mc/e5500 CPU families are not compatible with
> each other, yet they share the same "CONFIG_E500" Kconfig option.
> 
> The following patch series splits the 32-bit CPU support into two
> separate options: "CONFIG_FSL_E500_V1_V2" and "CONFIG_FSL_E500MC".
> Additionally, the 64-bit e5500 support is separated to its own config
> option ("CONFIG_FSL_E5500") which is automatically combined with
> either 32-bit e500MC or 64-bit Book-3E when the P5020DS board support
> is enabled.

So it's clear from the community that there is confusion here and we need to clean this up.  I guess my attempt to support a kernel that ran on both E500v2 and E500mc isn't worth it.  However I don't want to completely remove the ability to do this.

Towards the cleanup I'd ask for a proposal on what exactly the CONFIG_ options we'd end up with would be and their meaning.

So today we have:

CONFIG_E500
CONFIG_PPC_E500MC

What do we want to move to?  I want to keep the builds such that we have only 2 classes:  e500V1/V2 and e500mc/e5500/e6500/.../eX500.  I see no reason to hyper-optimize e500mc vs e5500 vs e6500.

- k

Kyle Moffett Nov. 10, 2011, 4:17 p.m. UTC | #8

On Nov 10, 2011, at 08:59, Kumar Gala wrote:
> On Nov 9, 2011, at 6:03 PM, Kyle Moffett wrote:
>> I saw Baruch Siach's patch:
>> powerpc: 85xx: separate e500 from e500mc
>> 
>> Unfortunately, that patch breaks the dependencies for the P5020DS
>> platform and does not fix the underlying code which does not
>> understand what the ambiguous "CONFIG_E500" means.
>> 
>> In order to fix the issue at the fundamental level, I created the
>> following 17-patch series loosely based on Baruch's patch.
>> 
>> === High-Level Summary ===
>> 
>> The e500v1/v2 and e500mc/e5500 CPU families are not compatible with
>> each other, yet they share the same "CONFIG_E500" Kconfig option.
>> 
>> The following patch series splits the 32-bit CPU support into two
>> separate options: "CONFIG_FSL_E500_V1_V2" and "CONFIG_FSL_E500MC".
>> Additionally, the 64-bit e5500 support is separated to its own config
>> option ("CONFIG_FSL_E5500") which is automatically combined with
>> either 32-bit e500MC or 64-bit Book-3E when the P5020DS board support
>> is enabled.
> 
> So it's clear from the community that there is confusion here and we
> need to clean this up.  I guess my attempt to support a kernel that
> ran on both E500v2 and E500mc isn't worth it.  However I don't want to
> completely remove the ability to do this.

Well, a kernel built with CONFIG_PPC_E500MC today appears to be
fundamentally broken on E500v1/E500v2:

#if defined(CONFIG_8xx) || defined(CONFIG_403GCX)
#define L1_CACHE_SHIFT		4
#define MAX_COPY_PREFETCH	1
#elif defined(CONFIG_PPC_E500MC)
#define L1_CACHE_SHIFT		6
#define MAX_COPY_PREFETCH	4
#elif defined(CONFIG_PPC32)
#define MAX_COPY_PREFETCH	4
#if defined(CONFIG_PPC_47x)
#define L1_CACHE_SHIFT		7
#else
#define L1_CACHE_SHIFT		5
#endif
#else /* CONFIG_PPC64 */
#define L1_CACHE_SHIFT		7
#endif

E500MC will set L1_CACHE_SHIFT to 6, while regular E500 appears to
want it set to 5.  I don't know if that's a mistake or exactly what
code that affects, but it looks very wrong.
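
To make the effect of that ladder easier to see, here is a runtime mirror of it. This is a sketch only: struct build_cfg and both functions are hypothetical names, and the real header selects the value at compile time with the preprocessor rather than with a function call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the CONFIG_* symbols tested above. */
struct build_cfg {
    bool c8xx;        /* CONFIG_8xx */
    bool c403gcx;     /* CONFIG_403GCX */
    bool ppc_e500mc;  /* CONFIG_PPC_E500MC */
    bool ppc32;       /* CONFIG_PPC32 */
    bool ppc_47x;     /* CONFIG_PPC_47x */
};

/* Same decision order as the #if/#elif chain quoted above. */
int l1_cache_shift(const struct build_cfg *c)
{
    assert(c != NULL);
    if (c->c8xx || c->c403gcx)
        return 4;
    if (c->ppc_e500mc)
        return 6;
    if (c->ppc32)
        return c->ppc_47x ? 7 : 5;
    return 7; /* CONFIG_PPC64 */
}

int l1_cache_bytes(const struct build_cfg *c)
{
    return 1 << l1_cache_shift(c);
}
```

With only ppc32 set, the result is 32 bytes. Setting ppc_e500mc as well, which is what selecting an e500mc board does today, changes it to 64 for the whole kernel image, e500v2 boards included.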

Furthermore, it looks like there are a couple issues here I missed
before.  PPC64 systems all appear to have an L1_CACHE_SHIFT of 7,
except when you turn on the P5020DS board option which magically
changes it to "6" and breaks lord-knows-what.  I think my patch
series actually "breaks" that and makes e5500 use 7 as well.

Are you sure that a kernel built to support E5500 can also run on
other 64-bit PowerPC/POWER systems?

> Towards the cleanup I'd ask for a proposal on what exactly the
> CONFIG_ options we'd end up with would be and their meaning.
> So today we have:
> 
> CONFIG_E500
> CONFIG_PPC_E500MC

It's actually a bit more complicated than that.  There are 3 ways
that the user can configure an e500 kernel today.  I'm omitting
the "FSL_SOC_BOOKE" menu that wraps around all of the 85xx/e5500
boards today, because that is set for all of these platforms:

  * PPC32 + PPC_85xx + E500 [+ boards]
  * PPC64 + BOOK3E_64 + P5020_DS (which adds E500 and PPC_E500MC)

Note that whether or not "PPC_E500MC" is set on PPC32 depends
only on which boards the user picked.  So if I am trying to
build an e500v2 kernel and I accidentally also turn on support
for one of the e500mc boards, my kernel mysteriously breaks.

> What do we want to move to?  I want to keep the builds such that we
> have only 2 classes:  e500V1/V2 and e500mc/e5500/e6500/.../eX500.
> I see no reason to hyper-optimize e500mc vs e5500 vs e6500.

So after my changes, there are the following user-configurable
option sets:
  * PPC32 + FSL_E500_V1_V2 [+ e500v1/v2 boards]
  * PPC32 + FSL_E500MC     [+ e500mc boards]
  * PPC64 + BOOK3E_64 + P5020_DS (which adds FSL_E5500)

Since most of the "e500mc"-specific code was in 32-bit-only ASM
or inside of #ifdef PPC32, the new FSL_E500MC option is only
set on 32-bit builds (even if it is running in compat mode on
64-bit e5500 hardware).

Internally the P5020_DS option turns on the hidden FSL_E5500
option for both 32-bit and 64-bit; that config option enables
platform drivers and similar stuff.

Depending on how compatible the AMP processors are, you could
rename the option to be "FSL_E5X00" or add a hidden option for
"FSL_E6500" that is also selected by appropriate boards.

Please let me know what you think!

Cheers,
Kyle Moffett


Kumar Gala Nov. 10, 2011, 4:30 p.m. UTC | #9

On Nov 10, 2011, at 10:17 AM, Moffett, Kyle D wrote:

> On Nov 10, 2011, at 08:59, Kumar Gala wrote:
>> On Nov 9, 2011, at 6:03 PM, Kyle Moffett wrote:
>>> I saw Baruch Siach's patch:
>>> powerpc: 85xx: separate e500 from e500mc
>>> 
>>> Unfortunately, that patch breaks the dependencies for the P5020DS
>>> platform and does not fix the underlying code which does not
>>> understand what the ambiguous "CONFIG_E500" means.
>>> 
>>> In order to fix the issue at the fundamental level, I created the
>>> following 17-patch series loosely based on Baruch's patch.
>>> 
>>> === High-Level Summary ===
>>> 
>>> The e500v1/v2 and e500mc/e5500 CPU families are not compatible with
>>> each other, yet they share the same "CONFIG_E500" Kconfig option.
>>> 
>>> The following patch series splits the 32-bit CPU support into two
>>> separate options: "CONFIG_FSL_E500_V1_V2" and "CONFIG_FSL_E500MC".
>>> Additionally, the 64-bit e5500 support is separated to its own config
>>> option ("CONFIG_FSL_E5500") which is automatically combined with
>>> either 32-bit e500MC or 64-bit Book-3E when the P5020DS board support
>>> is enabled.
>> 
>> So it's clear from the community that there is confusion here and we
>> need to clean this up.  I guess my attempt to support a kernel that
>> ran on both E500v2 and E500mc isn't worth it.  However I don't want to
>> completely remove the ability to do this.
> 
> Well, a kernel built with CONFIG_PPC_E500MC today appears to be
> fundamentally broken on E500v1/E500v2:
> 
> #if defined(CONFIG_8xx) || defined(CONFIG_403GCX)
> #define L1_CACHE_SHIFT		4
> #define MAX_COPY_PREFETCH	1
> #elif defined(CONFIG_PPC_E500MC)
> #define L1_CACHE_SHIFT		6
> #define MAX_COPY_PREFETCH	4
> #elif defined(CONFIG_PPC32)
> #define MAX_COPY_PREFETCH	4
> #if defined(CONFIG_PPC_47x)
> #define L1_CACHE_SHIFT		7
> #else
> #define L1_CACHE_SHIFT		5
> #endif
> #else /* CONFIG_PPC64 */
> #define L1_CACHE_SHIFT		7
> #endif
> 
> E500MC will set L1_CACHE_SHIFT to 6, while regular E500 appears to
> want it set to 5.  I don't know if that's a mistake or exactly what
> code that affects, but it looks very wrong.

This is correct for E500 & E500MC.  However we have a HW mode that allows us to handle running in 'e500' mode on e500mc.

> Furthermore, it looks like there are a couple issues here I missed
> before.  PPC64 systems all appear to have an L1_CACHE_SHIFT of 7,
> except when you turn on the P5020DS board option which magically
> changes it to "6" and breaks lord-knows-what.  I think my patch
> series actually "breaks" that and makes e5500 use 7 as well.

a value of '6' on E5500 / P5020DS is correct and doesn't break anything.  Setting it to 7 is wrong and thus the code is correct today.

> Are you sure that a kernel built to support E5500 can also run on
> other 64-bit PowerPC/POWER systems?

No, it will not.  There is no expectation of that, as E5500 is an embedded / Book-E class part and uses that ISA version.  Book-S (server) 64-bit machines are not OS compatible and we are not trying to make them as such (but we do re-use a lot of code).

>> Towards the cleanup I'd ask for a proposal on what exactly the
>> CONFIG_ options we'd end up with would be and their meaning.
>> So today we have:
>> 
>> CONFIG_E500
>> CONFIG_PPC_E500MC
> 
> It's actually a bit more complicated than that.  There are 3 ways
> that the user can configure an e500 kernel today.  I'm omitting
> the "FSL_SOC_BOOKE" menu that wraps around all of the 85xx/e5500
> boards today, because that is set for all of these platforms:
> 
>  * PPC32 + PPC_85xx + E500 [+ boards]
>  * PPC64 + BOOK3E_64 + P5020_DS (which adds E500 and PPC_E500MC)
> 
> Note that whether or not "PPC_E500MC" is set on PPC32 depends
> only on which boards the user picked.  So if I am trying to
> build an e500v2 kernel and I accidentally also turn on support
> for one of the e500mc boards, my kernel mysteriously breaks.

Sure, I understand.  I'm fine with us 'fixing' things such that we treat E500V1/V2 differently from E500MC/E5500 in user Kconfig choices.

>> What do we want to move to?  I want to keep the builds such that we
>> have only 2 classes:  e500V1/V2 and e500mc/e5500/e6500/.../eX500.
>> I see no reason to hyper-optimize e500mc vs e5500 vs e6500.
> 
> So after my changes, there are the following user-configurable
> option sets:
>  * PPC32 + FSL_E500_V1_V2 [+ e500v1/v2 boards]
>  * PPC32 + FSL_E500MC     [+ e500mc boards]
>  * PPC64 + BOOK3E_64 + P5020_DS (which adds FSL_E5500)
> 
> Since most of the "e500mc"-specific code was in 32-bit-only ASM
> or inside of #ifdef PPC32, the new FSL_E500MC option is only
> set on 32-bit builds (even if it is running in compat mode on
> 64-bit e5500 hardware).
> 
> Internally the P5020_DS option turns on the hidden FSL_E5500
> option for both 32-bit and 64-bit; that config option enables
> platform drivers and similar stuff.
> 
> Depending on how compatible the AMP processors are, you could
> rename the option to be "FSL_E5X00" or add a hidden option for
> "FSL_E6500" that is also selected by appropriate boards.
> 
> Please let me know what you think!

I'd like to avoid adding FSL_E5500, FSL_E6500, etc. CONFIG options.  I'd like to keep things as:

32-bit:
	e500v1/v2
	e500mc/e5500/e6500/...
64-bit:
	e5500/e6500/...

We need to come up with some CONFIG option that covers e500mc/e5500/e6500/...

- k

Scott Wood Nov. 10, 2011, 4:54 p.m. UTC | #10

On Thu, Nov 10, 2011 at 10:30:41AM -0600, Kumar Gala wrote:
> On Nov 10, 2011, at 10:17 AM, Moffett, Kyle D wrote:
> > Furthermore, it looks like there are a couple issues here I missed
> > before.  PPC64 systems all appear to have an L1_CACHE_SHIFT of 7,
> > except when you turn on the P5020DS board option which magically
> > changes it to "6" and breaks lord-knows-what.  I think my patch
> > series actually "breaks" that and makes e5500 use 7 as well.
> 
> a value of '6' on E5500 / P5020DS is correct and doesn't break anything.  Setting it to 7 is wrong and thus the code is correct today.
> 
> > Are you sure that a kernel built to support E5500 can also run on
> > other 64-bit PowerPC/POWER systems?
> 
> No, it will not.  There is no expectation of that, as E5500 is an
> embedded / Book-E class part and uses that ISA version.  Book-S
> (server) 64-bit machines are not OS compatible and we are not trying to
> make them as such (but we do re-use a lot of code).

What about other 64-bit book3e chips?  What cache block size does A2 have?

-Scott

Kyle Moffett Nov. 11, 2011, 12:38 a.m. UTC | #11

On Nov 10, 2011, at 11:54, Scott Wood wrote:
> On Thu, Nov 10, 2011 at 10:30:41AM -0600, Kumar Gala wrote:
>> On Nov 10, 2011, at 10:17 AM, Moffett, Kyle D wrote:
>>> Furthermore, it looks like there are a couple issues here I missed
>>> before.  PPC64 systems all appear to have an L1_CACHE_SHIFT of 7,
>>> except when you turn on the P5020DS board option which magically
>>> changes it to "6" and breaks lord-knows-what.  I think my patch
>>> series actually "breaks" that and makes e5500 use 7 as well.
>> 
>> a value of '6' on E5500 / P5020DS is correct and doesn't break anything.
>> Setting it to 7 is wrong and thus the code is correct today.
>> 
>>> Are you sure that a kernel built to support E5500 can also run on
>>> other 64-bit PowerPC/POWER systems?
>> 
>> No it will not.  There is not expectation of that as E5500 is an
>> embedded / Book-E class part and uses that ISA version.  Book-S
>> (server) 64-bit machines are not OS compatible and we are not trying to
>> make them as such (but we do re-use a lot of code).
> 
> What about other 64-bit book3e chips?  What cache block size does A2 have?

Ok, so I've been poking around this code a bunch and as far as I can
tell, the cacheline stuff has basically always been subtly wrong in
twelve different ways and it's only largely coincidence that it works
today.

So PowerPC64 systems have their own "ppc64_caches" structure set up
before start_kernel() is called by parsing the OpenFirmware "cpu" nodes.
That structure is then checked in every piece of 64-bit kernel code
(except xmon) that uses the "dcbXX" and "icbXX" opcodes.

There is an entirely separate mechanism built into the "cputable" that
is used on all PowerPC systems to compute cacheline sizes to pass in via
ELF headers for userspace to use in memset()/memcpy(), etc.

Furthermore, the VDSO gets cacheline sizes stored into it, but on 64-bit
they come from the ppc64_caches structure and on 32-bit they come from
dcache_bsize/icache_bsize copied from the cputable.

Then there's the value in arch/powerpc/include/asm/cache.h which is used
throughout the kernel to figure out how far apart to space CPU-specific
datastructures (EG: __cacheline_aligned_on_smp).

Despite the fact that all PPC64 have an "L1_CACHE_BYTES" value of 128,
the PowerPC A2 and e5500 have {d,i}cache_bsize values of 64 in cputable
and presumably also get correct values from OpenFirmware, so the bogus
constant in asm/cache.h does nothing more than waste a bit of memory
for unnecessary padding.

Unfortunately, lots of PPC32 assembly pretends that the value found in
asm/cache.h is a hard truth and uses it for "dcbz", etc, which is why
there are all of those ugly #ifdefs in asm/cache.h

Based on all of that, my proposal is going to be a patch which does the
following:

  (1) Conditionally set L1_CACHE_SHIFT to the maximum value used by any
      platform being compiled in for alignment purposes.

  (2) Make the ppc64_caches struct apply to ppc32 as well, and
      preinitialize it with a minimum value used by any platform being
      compiled in (for "dcbXX"/"icbXX" purposes).  This is safe because
      the pagesize is always a multiple of the cache block size and the
      kernel only uses dcbXX/icbXX on whole pages.  The only impact is a
      temporary small performance hit from flushing or zeroing the same
      block 8 times if too small.

  (3) Try to initialize the ppc_caches struct on ppc32 from the
      OpenFirmware device-tree.  If that fails, then use the values we
      find in the cputable.  After this is initialized any performance
      hit in copy_page()/zero_page() will obviously disappear.

  (4) Fix all of the PPC32 assembly code that is misusing L1_CACHE_SHIFT
      to use the ppc_caches struct instead.
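
The first three steps can be sketched in plain C.  This is only an
illustration under stated assumptions: the struct, the table and every
function name below are hypothetical (they are not the kernel's cputable
or ppc64_caches), the block sizes are example values, and step (1) is
shown as a runtime function even though the real L1_CACHE_SHIFT would
have to be computed by the preprocessor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-platform description; NOT the kernel's cputable. */
struct cache_desc {
	const char *name;
	unsigned int dcache_bsize;	/* block size in bytes, power of two */
	int compiled_in;		/* stands in for a CONFIG_* option */
};

/* Example values for illustration only. */
static const struct cache_desc platforms[] = {
	{ "e500v2",    32,  1 },
	{ "e500mc",    64,  1 },
	{ "book3s-64", 128, 0 },
};
#define N_PLATFORMS (sizeof(platforms) / sizeof(platforms[0]))

static unsigned int log2_u32(unsigned int v)
{
	unsigned int shift = 0;

	while (v > 1) {
		v >>= 1;
		shift++;
	}
	return shift;
}

/* Step (1): alignment shift = log2 of the LARGEST block size compiled in. */
unsigned int max_cache_shift(const struct cache_desc *p, size_t n)
{
	unsigned int max = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (p[i].compiled_in && p[i].dcache_bsize > max)
			max = p[i].dcache_bsize;
	return max ? log2_u32(max) : 0;
}

/* Step (2): early default for dcbXX loops = SMALLEST block size compiled in. */
unsigned int min_cache_bsize(const struct cache_desc *p, size_t n)
{
	unsigned int min = 0;
	size_t i;

	for (i = 0; i < n; i++)
		if (p[i].compiled_in && (min == 0 || p[i].dcache_bsize < min))
			min = p[i].dcache_bsize;
	return min;
}

/* Step (3): prefer the device-tree value (0 means "absent"), else cputable. */
unsigned int runtime_cache_bsize(unsigned int from_device_tree,
				 unsigned int from_cputable)
{
	assert(from_cputable != 0);
	return from_device_tree ? from_device_tree : from_cputable;
}
```

With the example table, alignment uses a shift of 6 (64 bytes) while the
early dcbXX default is 32 bytes, until step (3) replaces it with the
value for the CPU actually running.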

Does that sound like a reasonable approach?

Cheers,
Kyle Moffett

--
Curious about my work on the Debian powerpcspe port?
I'm keeping a blog here: http://pureperl.blogspot.com/

Benjamin Herrenschmidt Nov. 11, 2011, 4:40 a.m. UTC | #12

On Thu, 2011-11-10 at 18:38 -0600, Moffett, Kyle D wrote:

> Ok, so I've been poking around this code a bunch and as far as I can
> tell, the cacheline stuff has basically always been subtly wrong in
> twelve different ways and it's only largely coincidence that it works
> today.

Yay ! Somebody to clean that shit up ! :-)

That's the biggest missing step to being able to have 440 and 476 in a
single binary :-)

> So PowerPC64 systems have their own "ppc64_caches" structure set up
> before start_kernel() is called by parsing the OpenFirmware "cpu" nodes.
> That structure is then checked in every piece of 64-bit kernel code
> (except xmon) that uses the "dcbXX" and "icbXX" opcodes.

Yup. (And we should really fix xmon btw...)

> There is an entirely separate mechanism built into the "cputable" that
> is used on all PowerPC systems to compute cacheline sizes to pass in via
> ELF headers for userspace to use in memset()/memcpy(), etc.

Yeah well, it actually uses global variables which are set from cputable
on ppc32 and from the ppc64_caches structure on ppc64. Yeah it's not
pretty.

> Furthermore, the VDSO gets cacheline sizes stored into it, but on 64-bit
> they come from the ppc64_caches structure and on 32-bit they come from
> dcache_bsize/icache_bsize copied from the cputable.

Yup.

> Then there's the value in arch/powerpc/include/asm/cache.h which is used
> throughout the kernel to figure out how far apart to space CPU-specific
> datastructures (EG: __cacheline_aligned_on_smp).

Not much we can do about that one since it has to be compile time. Maybe
something like calculating the biggest cache line size supported by all
built-in processor types ?

> Despite the fact that all PPC64 have an "L1_CACHE_BYTES" value of 128,
> the PowerPC A2 and e5500 have {d,i}cache_bsize values of 64 in cputable
> and presumably also get correct values from OpenFirmware, so the bogus
> constant in asm/cache.h does nothing more than waste a bit of memory
> for unnecessary padding.

More or less yes, though we haven't totally given up on the idea of
eventually, one day, producing binaries capable of running on both
64-bit Book-S and Book-E :-)

> Unfortunately, lots of PPC32 assembly pretends that the value found in
> asm/cache.h is a hard truth and uses it for "dcbz", etc, which is why
> there are all of those ugly #ifdefs in asm/cache.h

Yes, well... -some- assembly, mostly the copy routines. It's been the
main reason why this hasn't been fixed yet.

> Based on all of that, my proposal is going to be a patch which does the
> following:
> 
>   (1) Conditionally set L1_CACHE_SHIFT to the maximum value used by any
>       platform being compiled in for alignment purposes.

Yay !

>   (2) Make the ppc64_caches struct apply to ppc32 as well, and
>       preinitialize it with a minimum value used by any platform being
>       compiled in (for "dcbXX"/"icbXX" purposes).  This is safe because
>       the pagesize is always a multiple of the cache block size and the
>       kernel only uses dcbXX/icbXX on whole pages.  The only impact is a
>       temporary small performance hit from flushing or zeroing the same
>       block 8 times if too small.

Are you sure about dcbz ? Getting that wrong can be deadly ... I'd
rather get rid of some fancy optims and use a soft value in some cases.
That or we can compile multiple variants for the common case of some of
the copy routines and use patching (alternate sections) to branch to the
right one at runtime, at least for the common cases (32 and 128 for
example for 440 and 476).

>   (3) Try to initialize the ppc_caches struct on ppc32 from the
>       OpenFirmware device-tree.  If that fails, then use the values we
>       find in the cputable.  After this is initialized any performance
>       hit in copy_page()/zero_page() will obviously disappear.
>
>   (4) Fix all of the PPC32 assembly code that is misusing L1_CACHE_SHIFT
>       to use the ppc_caches struct instead.

Yes. This could be done while keeping the hand-optimized stuff by
compiling several variants of it.

> Does that sound like a reasonable approach?

It absolutely does ! Thanks for looking at that, it's been on my todo
list for ages and I've been always finding good reasons to do something
else instead :-)

Cheers,
Ben.

Kyle Moffett Nov. 15, 2011, 2:32 a.m. UTC | #13

Ok, so I have a work-in-progress patch for cleaning up the CPU cache
handling, and I'd like some comments on the approach.

It's not really split up, and it's kind of a huge patch because it
tries to tackle a lot of things at once.  Unfortunately, I'm having a
hard time finding good clean places to break things apart.

Furthermore, I know 100% that it is not complete on PPC32 yet, and it
almost certainly does not build on PPC64 yet either.

These are the only files in arch/powerpc/ which have known-incorrect
references to L1_CACHE_* variables:
  arch/powerpc/lib/copy_32.S
  arch/powerpc/kernel/misc_32.S

Unfortunately, I've been staring at PPC asm for long enough that I
have a migraine headache and I'm going to have to stop here for now.
If somebody else wants to tackle fixing up the 32-bit copy_page() and
__copy_tofrom_user() routines it would be highly appreciated.

Cheers,
Kyle Moffett

Kyle Moffett Nov. 15, 2011, 2:36 a.m. UTC | #14

On Nov 10, 2011, at 23:40, Benjamin Herrenschmidt wrote:
> On Thu, 2011-11-10 at 18:38 -0600, Moffett, Kyle D wrote:
>>  (2) Make the ppc64_caches struct apply to ppc32 as well, and
>>      preinitialize it with a minimum value used by any platform being
>>      compiled in (for "dcbXX"/"icbXX" purposes).  This is safe because
>>      the pagesize is always a multiple of the cache block size and the
>>      kernel only uses dcbXX/icbXX on whole pages.  The only impact is a
>>      temporary small performance hit from flushing or zeroing the same
>>      block 8 times if too small.
> 
> Are you sure about dcbz ? Getting that wrong can be deadly ... I'd
> rather get rid of some fancy optims and use a soft value in some cases.
> That or we can compile multiple variants for the common case of some of
> the copy routines and use patching (alternate sections) to branch to the
> right one at runtime, at least for the common cases (32 and 128 for
> example for 440 and 476).

Well, all of the kernel loops that use dcbz are operating on whole pages,
and the PPC Book-E spec documents that the pagesize is an even multiple
of the cacheline size and the cachelines are always page-aligned.

So when you are clearing a whole page, there are only 2 things you can do
wrong with "dcbz":

  (1) Call "dcbz" with an address outside of the page you want to zero.

  (2) Omit calls to "dcbz" for some physical cachelines in the page.

Now, that's a totally different story from the userspace memset() calls
that caused the problem originally, because they were frequently given
memory much smaller than a page to clear, and if you didn't know exactly
how many bytes a "dcbz" was going to clear you couldn't use it at all.

But the kernel doesn't do that anywhere, it just uses it for page clears.
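
To make the argument concrete, here is a small userspace simulation.  It
is a sketch only: sim_dcbz() is a hypothetical stand-in that zeroes the
whole hardware block containing an address, the way "dcbz" would, and
SKETCH_PAGE_SIZE and the function names are made up for the example.  It
shows that stepping by a block size that is too small still clears
exactly one page, while stepping by one that is too large hits failure
mode (2).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u

/* Simulated dcbz: zero the entire hardware block that contains `off`. */
static void sim_dcbz(unsigned char *page, size_t off, size_t hw_bsize)
{
	size_t start = off & ~(hw_bsize - 1);

	memset(page + start, 0, hw_bsize);
}

/* Clear one page, stepping by the block size the kernel ASSUMES. */
void clear_page_sketch(unsigned char *page, size_t assumed_bsize,
		       size_t hw_bsize)
{
	size_t off;

	for (off = 0; off < SKETCH_PAGE_SIZE; off += assumed_bsize)
		sim_dcbz(page, off, hw_bsize);
}

/*
 * Returns 1 if the middle page is fully zeroed and the guard pages on
 * either side are untouched, 0 otherwise.
 */
int clears_whole_page_only(size_t assumed_bsize, size_t hw_bsize)
{
	static unsigned char buf[3 * SKETCH_PAGE_SIZE];
	size_t i;

	assert(assumed_bsize != 0);
	assert(hw_bsize != 0 && (hw_bsize & (hw_bsize - 1)) == 0);
	assert(SKETCH_PAGE_SIZE % hw_bsize == 0);

	memset(buf, 0xAA, sizeof(buf));
	clear_page_sketch(buf + SKETCH_PAGE_SIZE, assumed_bsize, hw_bsize);

	for (i = 0; i < sizeof(buf); i++) {
		int in_page = i >= SKETCH_PAGE_SIZE && i < 2 * SKETCH_PAGE_SIZE;

		if (buf[i] != (in_page ? 0x00 : 0xAA))
			return 0;
	}
	return 1;
}
```

An assumed size of 32 on 64-byte hardware just zeroes every block twice;
an assumed size of 128 on the same hardware skips every other block.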

Cheers,
Kyle Moffett

Tabi Timur-B04825 Nov. 15, 2011, 2:41 a.m. UTC | #15

Moffett, Kyle D wrote:
> the PPC Book-E spec documents that the pagesize is an even multiple
> of the cacheline size and the cachelines are always page-aligned.

cachelines are page aligned?

Kyle Moffett Nov. 15, 2011, 3:40 a.m. UTC | #16

On Mon, Nov 14, 2011 at 21:41, Tabi Timur-B04825 <B04825@freescale.com> wrote:
> Moffett, Kyle D wrote:
>> the PPC Book-E spec documents that the pagesize is an even multiple
>> of the cacheline size and the cachelines are always page-aligned.
>
> cachelines are page aligned?

Whoops, good catch.  That should have been:

"the PPC Book-E spec documents that the pagesize is an even multiple
of the cacheline size and that the pages are always cacheline-aligned."

Thanks!

Cheers,
Kyle Moffett

Benjamin Herrenschmidt Nov. 15, 2011, 10:29 p.m. UTC | #17

On Mon, 2011-11-14 at 21:32 -0500, Kyle Moffett wrote:
> Unfortunately, I've been staring at PPC asm for long enough that I
> have a migraine headache and I'm going to have to stop here for now.
> If somebody else wants to tackle fixing up the 32-bit copy_page() and
> __copy_tofrom_user() routines it would be highly appreciated. 

Yeah that's the one everybody's avoiding :-)

What about my idea of instead compiling it multiple times with a
different size and fixing up the branch to call the right one ?

Cheers,
Ben.

Benjamin Herrenschmidt Nov. 15, 2011, 10:41 p.m. UTC | #18

On Mon, 2011-11-14 at 20:36 -0600, Moffett, Kyle D wrote:
> So when you are clearing a whole page, there are only 2 things you can do
> wrong with "dcbz":
> 
>   (1) Call "dcbz" with an address outside of the page you want to zero.
> 
>   (2) Omit calls to "dcbz" for some physical cachelines in the page.
> 
> Now, that's a totally different story from the userspace memset() calls
> that caused the problem originally, because they were frequently given
> memory much smaller than a page to clear, and if you didn't know exactly
> how many bytes a "dcbz" was going to clear you couldn't use it at all.

Right. That's why we pass the cache line sizes to userspace via the elf
AUX table so they don't do stupid things like that :-)
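
For reference, a userspace program can read that value roughly as
follows.  This is a minimal sketch assuming glibc's getauxval() and the
AT_DCACHEBSIZE tag from <elf.h>; the helper name and the fallback
argument are made up for the example.  On architectures that do not
supply the tag, getauxval() returns 0 and the fallback is used.

```c
#include <assert.h>
#include <elf.h>	/* AT_DCACHEBSIZE */
#include <sys/auxv.h>	/* getauxval() */

/*
 * Return the data cache block size advertised in the ELF auxiliary
 * vector, or `fallback` if the kernel did not provide one.
 */
unsigned long dcache_block_size_or(unsigned long fallback)
{
	unsigned long bsize;

	assert(fallback != 0);
	bsize = getauxval(AT_DCACHEBSIZE);
	return bsize ? bsize : fallback;
}
```

A memset() that wants to use "dcbz" would call this once at startup and
refuse the fast path for regions smaller than the reported block size.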

> But the kernel doesn't do that anywhere, it just uses it for page clears. 

Right, so we could easily precalc the count & increment and use a "soft"
loop.
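
Something along these lines, as a rough sketch: the count and increment
are derived once from the runtime block size and the loop itself no
longer contains a compile-time constant.  The names are hypothetical and
the per-block operation is a callback here only so the example can run
in userspace; in the kernel it would be the "dcbz" itself.

```c
#include <assert.h>

/* Number of cache blocks per page, or 0 if bsize is not usable. */
unsigned int blocks_per_page(unsigned int page_size, unsigned int bsize)
{
	if (bsize == 0 || (bsize & (bsize - 1)) != 0)
		return 0;		/* not a power of two */
	if (page_size % bsize != 0)
		return 0;		/* page is not a whole number of blocks */
	return page_size / bsize;
}

/*
 * "Soft" loop: apply block_op to `count` blocks, advancing by `step`
 * bytes each time.  Returns the address just past the last block.
 */
unsigned long soft_block_loop(unsigned long addr, unsigned int step,
			      unsigned int count,
			      void (*block_op)(unsigned long))
{
	unsigned int i;

	assert(step != 0);
	for (i = 0; i < count; i++) {
		if (block_op)
			block_op(addr);
		addr += step;
	}
	return addr;
}
```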

Cheers,
Ben.

Kyle Moffett Nov. 15, 2011, 10:45 p.m. UTC | #19

On Nov 15, 2011, at 17:29, Benjamin Herrenschmidt wrote:
> On Mon, 2011-11-14 at 21:32 -0500, Kyle Moffett wrote:
>> Unfortunately, I've been staring at PPC asm for long enough that I
>> have a migraine headache and I'm going to have to stop here for now.
>> If somebody else wants to tackle fixing up the 32-bit copy_page() and
>> __copy_tofrom_user() routines it would be highly appreciated. 
> 
> Yeah that's the one everybody's avoiding :-)
> 
> What about my idea of instead compiling it multiple times with a
> different size and fixing up the branch to call the right one ?

I guess that's doable, although I have to admit that idea almost gives
me more of a headache than trying to fix up the 32-bit ASM.

One thing that bothers me in particular is that both 32/64 versions of
__copy_tofrom_user() are dramatically overcomplicated for what they
ought to be doing.

It would seem that if we get a page fault during an unaligned copy, we
ought to just give up and fall back to a simple byte-by-byte copy loop
from wherever we left off.  That would eliminate 90% of the ugly
special cases without actually hurting performance, right?

For a page-fault during a cacheline-aligned copy, we should be able to
handle the exception and retry from the last cacheline without much
logic, again with good performance.

With that said, I'm curious about the origin of the PPC32 ASM.  In
particular, it looks like it was generated by GCC at some point in the
distant past, and I'm wondering if there's a good way to rewrite that
file in C and trick GCC into generating the relevant exception tables
for it?

Cheers,
Kyle Moffett

Benjamin Herrenschmidt Nov. 15, 2011, 11:46 p.m. UTC | #20

On Tue, 2011-11-15 at 16:45 -0600, Moffett, Kyle D wrote:

> I guess that's doable, although I have to admit that idea almost gives
> me more of a headache than trying to fix up the 32-bit ASM.
> 
> One thing that bothers me in particular is that both 32/64 versions of
> __copy_tofrom_user() are dramatically overcomplicated for what they
> ought to be doing.
> 
> It would seem that if we get a page fault during an unaligned copy, we
> ought to just give up and fall back to a simple byte-by-byte copy loop
> from wherever we left off.  That would eliminate 90% of the ugly
> special cases without actually hurting performance, right?
> 
> For a page-fault during a cacheline-aligned copy, we should be able to
> handle the exception and retry from the last cacheline without much
> logic, again with good performance.
> 
> With that said, I'm curious about the origin of the PPC32 ASM.  In
> particular, it looks like it was generated by GCC at some point in the
> distant past, and I'm wondering if there's a good way to rewrite that
> file in C and trick GCC into generating the relevant exception tables
> for it?

There is some serious history in there :-)

I would check with Anton, he's been doing some performance work on those
lately (the 64-bit ones).

It's probably worth throwing a proof-of-concept simpler variant for
32-bit at least on the table and have people compare the perfs
(typically network perfs). I can test on a range of ppc32 here (6xx,
7xxx, 4xx).

Cheers,
Ben.

Kyle Moffett Nov. 16, 2011, 12:25 a.m. UTC | #21

On Nov 15, 2011, at 18:46, Benjamin Herrenschmidt wrote:
> On Tue, 2011-11-15 at 16:45 -0600, Moffett, Kyle D wrote:
>> 
>> With that said, I'm curious about the origin of the PPC32 ASM.  In
>> particular, it looks like it was generated by GCC at some point in the
>> distant past, and I'm wondering if there's a good way to rewrite that
>> file in C and trick GCC into generating the relevant exception tables
>> for it?
> 
> There is some serious history in there :-)
> 
> I would check with Anton, he's been doing some performance work on those
> lately (the 64-bit ones).
> 
> It's probably worth throwing a proof-of-concept simpler variant for
> 32-bit at least on the table and have people compare the perfs
> (typically network perfs). I can test on a range of ppc32 here (6xx,
> 7xxx, 4xx).

Ok, so there's not really a good way to make GCC generate the exception
tables itself.  I've come up with several overly-clever ways to do most
of what we would want using "asm goto" except that (1) "asm goto" cannot
have register outputs, and (2) "asm goto" is only available in GCC 4.5+

I could easily work around the former by putting the code into its own
file and creating a "global" register variable just for that file, but
the GCC 4.5+ dependency is a total nonstarter.

I'm trying to see if I can make it look better than it does now with
some judicious use of inline ASM.  At the very least, it should be
possible to have a wrapper function written in C which calls the ASM
guts with the correct cache params.

More importantly, the ASM code needs to use something other than
totally arbitrary numbers for labels.  :-D

Cheers,
Kyle Moffett

Paul Mackerras Nov. 16, 2011, 4:40 a.m. UTC | #22

On Tue, Nov 15, 2011 at 04:45:18PM -0600, Moffett, Kyle D wrote:
> On Nov 15, 2011, at 17:29, Benjamin Herrenschmidt wrote:
> > On Mon, 2011-11-14 at 21:32 -0500, Kyle Moffett wrote:
> >> Unfortunately, I've been staring at PPC asm for long enough that I
> >> have a migraine headache and I'm going to have to stop here for now.
> >> If somebody else wants to tackle fixing up the 32-bit copy_page() and
> >> __copy_tofrom_user() routines it would be highly appreciated. 
> > 
> > Yeah that's the one everybody's avoiding :-)
> > 
> > What about my idea of instead compiling it multiple times with a
> > different size and fixing up the branch to call the right one ?
> 
> I guess that's doable, although I have to admit that idea almost gives
> me more of a headache than trying to fix up the 32-bit ASM.
> 
> One thing that bothers me in particular is that both 32/64 versions of
> __copy_tofrom_user() are dramatically overcomplicated for what they
> ought to be doing.
> 
> It would seem that if we get a page fault during an unaligned copy, we
> ought to just give up and fall back to a simple byte-by-byte copy loop
> from wherever we left off.  That would eliminate 90% of the ugly
> special cases without actually hurting performance, right?

That's basically what we do, IIRC, and most of the complexity comes
from working out where we were up to.  We could probably use a simpler
approximation that means we might copy some bytes twice.  In fact the
greatest simplification would probably be to implement range entries
in the exception table so we can just have one entry for all the loads
and stores instead of an entry for each individual load and store.
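
As a rough illustration of the range idea: each entry would cover a
span of instruction addresses and name one fixup handler for the whole
span.  This is a hypothetical sketch, not the kernel's existing
exception table format; the struct, the lookup function and the
addresses in the demo table are all invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One entry covers every faulting instruction in [start, end). */
struct ex_range {
	uintptr_t start;
	uintptr_t end;		/* exclusive */
	uintptr_t fixup;	/* where to resume after a fault */
};

/* Made-up addresses, for demonstration only. */
static const struct ex_range demo_table[] = {
	{ 0x1000, 0x1040, 0x9000 },	/* e.g. all loads of a copy loop */
	{ 0x1040, 0x1080, 0x9100 },	/* e.g. all stores of a copy loop */
};
#define DEMO_TABLE_LEN (sizeof(demo_table) / sizeof(demo_table[0]))

/* Return the fixup address for fault_addr, or 0 if no range covers it. */
uintptr_t find_fixup(const struct ex_range *tbl, size_t n,
		     uintptr_t fault_addr)
{
	size_t i;

	assert(tbl != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (fault_addr >= tbl[i].start && fault_addr < tbl[i].end)
			return tbl[i].fixup;
	return 0;
}
```

A linear scan keeps the sketch short; a sorted table with a binary
search would be the obvious refinement.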

> For a page-fault during a cacheline-aligned copy, we should be able to
> handle the exception and retry from the last cacheline without much
> logic, again with good performance.
> 
> With that said, I'm curious about the origin of the PPC32 ASM.  In
> particular, it looks like it was generated by GCC at some point in the
> distant past, and I'm wondering if there's a good way to rewrite that
> file in C and trick GCC into generating the relevant exception tables
> for it?

Why do you think it was generated by gcc?  I wrote the original
version, but I think it got extended and macro-ized by others.

Paul.

Kyle Moffett Nov. 16, 2011, 8:52 p.m. UTC | #23

On Nov 15, 2011, at 23:40, Paul Mackerras wrote:
> On Tue, Nov 15, 2011 at 04:45:18PM -0600, Moffett, Kyle D wrote:
>> 
>> I guess that's doable, although I have to admit that idea almost gives
>> me more of a headache than trying to fix up the 32-bit ASM.
>> 
>> One thing that bothers me in particular is that both 32/64 versions of
>> __copy_tofrom_user() are dramatically overcomplicated for what they
>> ought to be doing.
>> 
>> It would seem that if we get a page fault during an unaligned copy, we
>> ought to just give up and fall back to a simple byte-by-byte copy loop
>> from wherever we left off.  That would eliminate 90% of the ugly
>> special cases without actually hurting performance, right?
> 
> That's basically what we do, IIRC, and most of the complexity comes
> from working out where we were up to.  We could probably use a simpler
> approximation that means we might copy some bytes twice.  In fact the
> greatest simplification would probably be to implement range entries
> in the exception table so we can just have one entry for all the loads
> and stores instead of an entry for each individual load and store.

Well, I spent some time tinkering with the GCC inline-assembly option,
which was probably a waste, but I figured I would post my code here for
other people to chuckle at.  :-D

Here's a basic, relatively easily extended "copy u8" macro that sets up
the exception table using "asm goto":

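/*
 * Copy one byte from SRC to DST.  If the load or the store faults, the
 * __ex_table entries below send control to LOAD_FAULT or STORE_FAULT.
 * Caveat: "asm goto" allows no outputs, so the scratch register is
 * passed as an input even though lbz overwrites it.
 */
#define try_copy_u8(DST, SRC, LOAD_FAULT, STORE_FAULT) do {	\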
	unsigned long try_copy_tmp__ = (try_copy_tmp__);	\
	asm goto (						\
		"1:	lbz %[tmp], %[src]\n"			\
		"2:	stb %[tmp], %[dst]\n"			\
		"	.pushsection __ex_table, \"a\"\n"	\
		"	.align 2\n"				\
		"	.long 1b, %l["#LOAD_FAULT"]\n"		\
		"	.long 2b, %l["#STORE_FAULT"]\n"		\
		"	.popsection\n"				\
		: /* No outputs allowed for "asm goto" */	\
		: [dst] "m"(*(__user u8 *)(DST)),		\
		  [src] "m"(*(const __user u8 *)(SRC)),		\
		  [tmp] "r"(try_copy_tmp__)			\
		: "memory"					\
		: LOAD_FAULT, STORE_FAULT			\
	);							\
} while(0)

If I put that into a function and compile it, the assembly and the
exception table look perfectly OK, even under register pressure.
With a few macros like that it looks like it should be possible to
write the copy function directly in C and get optimal results.

The only other variants you need would be "try_copy_ulong" and
"try_copy_4ulong"/"try_copy_8ulong" for 32/64-bit.

Unfortunately, as I mentioned before, GCC 4.4 and older don't have
"asm goto" support :-(.

Perhaps I could put __copy_tofrom_user() into its own file and make
the assembled 32/64 output files be ".shipped"?

On the other hand, perhaps this is overly complicated :-D.

I'll poke at it more tomorrow.

>> For a page-fault during a cacheline-aligned copy, we should be able to
>> handle the exception and retry from the last cacheline without much
>> logic, again with good performance.
>> 
>> With that said, I'm curious about the origin of the PPC32 ASM.  In
>> particular, it looks like it was generated by GCC at some point in the
>> distant past, and I'm wondering if there's a good way to rewrite that
>> file in C and trick GCC into generating the relevant exception tables
>> for it?
> 
> Why do you think it was generated by gcc?  I wrote the original
> version, but I think it got extended and macro-ized by others.

Ah, sorry,  when I first looked at it the large collection of numeric
labels and the very sparing comments made it look autogenerated.

Although, given how much of a pain in the neck it is maybe you would
rather people not think you wrote it at all. ;-)

Cheers,
Kyle Moffett
