Message ID | 1305753895-24845-6-git-send-email-ericvh@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
On Wed, 2011-05-18 at 16:24 -0500, Eric Van Hensbergen wrote:
> BG/P maps firmware with an early TLB

That's a bit gross. How often do you call that firmware in practice?
Aren't you better off inserting a TLB entry for it when you call
it instead? A simple tlbsx. + tlbwe sequence would do. That would free
up a TLB entry for normal use.

Cheers,
Ben.

> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
> ---
>  arch/powerpc/include/asm/mmu-44x.h | 6 +++++-
>  1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/mmu-44x.h b/arch/powerpc/include/asm/mmu-44x.h
> index ca1b90c..2807d6e 100644
> --- a/arch/powerpc/include/asm/mmu-44x.h
> +++ b/arch/powerpc/include/asm/mmu-44x.h
> @@ -115,8 +115,12 @@ typedef struct {
>  #endif /* !__ASSEMBLY__ */
>
>  #ifndef CONFIG_PPC_EARLY_DEBUG_44x
> +#ifndef CONFIG_BGP
>  #define PPC44x_EARLY_TLBS 1
> -#else
> +#else /* CONFIG_BGP */
> +#define PPC44x_EARLY_TLBS 2
> +#endif /* CONFIG_BGP */
> +#else /* CONFIG_PPC_EARLY_DEBUG_44x */
>  #define PPC44x_EARLY_TLBS 2
>  #define PPC44x_EARLY_DEBUG_VIRTADDR (ASM_CONST(0xf0000000) \
>      | (ASM_CONST(CONFIG_PPC_EARLY_DEBUG_44x_PHYSLOW) & 0xffff))
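The on-demand mapping suggested above (search the TLB for the firmware mapping, and write an entry only if the search misses) can be illustrated with a small user-space model. This is only a sketch under stated assumptions, not kernel code and not the real 44x instruction semantics: `tlb_search` and `tlb_write` are hypothetical stand-ins for `tlbsx.` and `tlbwe`, the entry layout is invented, page sizes and address-space IDs are ignored, and the round-robin victim choice is an assumption. Only the 64-entry size comes from the thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 64          /* "only 64 SW loaded TLB entries" per the thread */

/* Hypothetical, simplified entry: one page base in, one page base out. */
struct tlb_entry {
    bool     valid;
    uint32_t vaddr;             /* virtual page base */
    uint32_t paddr;             /* physical page base */
};

static struct tlb_entry tlb[TLB_ENTRIES];
static size_t next_victim;      /* assumed round-robin replacement pointer */

/* Stand-in for tlbsx.: return the slot that maps vaddr, or -1 on a miss. */
static int tlb_search(uint32_t vaddr)
{
    for (size_t i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].vaddr == vaddr)
            return (int)i;
    return -1;
}

/* Stand-in for tlbwe: install a mapping in the given slot. */
static void tlb_write(size_t idx, uint32_t vaddr, uint32_t paddr)
{
    tlb[idx] = (struct tlb_entry){ .valid = true, .vaddr = vaddr, .paddr = paddr };
}

/*
 * Make sure the firmware page is mapped just before calling into it.
 * On a hit nothing is written; on a miss one slot is recycled, so no
 * entry has to stay pinned between firmware calls.
 */
static int map_firmware_on_demand(uint32_t fw_vaddr, uint32_t fw_paddr)
{
    int idx = tlb_search(fw_vaddr);

    if (idx < 0) {
        idx = (int)next_victim;
        next_victim = (next_victim + 1) % TLB_ENTRIES;
        tlb_write((size_t)idx, fw_vaddr, fw_paddr);
    }
    return idx;
}
```

A caller would invoke `map_firmware_on_demand()` with the firmware's virtual and physical page addresses (placeholders here) immediately before each firmware call. The trade-off is a search on every call in exchange for one more TLB slot available to normal use.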
On Thu, May 19, 2011 at 7:39 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
> On Wed, 2011-05-18 at 16:24 -0500, Eric Van Hensbergen wrote:
>> BG/P maps firmware with an early TLB
>
> That's a bit gross. How often do you call that firmware in practice?
> Aren't you better off inserting a TLB entry for it when you call
> it instead? A simple tlbsx. + tlbwe sequence would do. That would free
> up a TLB entry for normal use.
>

Well, it depends on who you talk to. The production software BG/P
guys use the firmware constantly, it's the primary interface to the
networks, the console, and the management software which runs the
machine. As such the IO Node guys, the Compute Node Kernel guys and the
ZeptoOS guys use it quite a bit. The kittyhawk guys on the other hand
barely use it at all, in fact I believe they do all the interaction with
it during uboot and then shut it off.

IIRC, the sticky question is RAS support, there are certain things it
wants to jump to firmware to deal with and expects things to be mapped
and pinned into memory.

Furthermore, I think it may make assumptions about where in the TLB the
mappings are. Since the kittyhawk guys obviously ignore this by shutting
it down, it's not clear just how important this is. I'm game to try the
dynamic mapping as you suggest if you would prefer it.

It's worth mentioning that I believe with BG/Q, the plan is to rely on
the firmware even more extensively, but I haven't looked at any of the
code yet to verify whether or not this is true.

     -eric
On Thu, 2011-05-19 at 20:21 -0500, Eric Van Hensbergen wrote:
> On Thu, May 19, 2011 at 7:39 PM, Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
> > On Wed, 2011-05-18 at 16:24 -0500, Eric Van Hensbergen wrote:
> >> BG/P maps firmware with an early TLB
> >
> > That's a bit gross. How often do you call that firmware in practice?
> > Aren't you better off inserting a TLB entry for it when you call
> > it instead? A simple tlbsx. + tlbwe sequence would do. That would free
> > up a TLB entry for normal use.
> >
>
> Well, it depends on who you talk to. The production software BG/P
> guys use the firmware constantly, it's the primary interface to the
> networks, the console, and the management software which runs the machine.

Yuck.

> As such the IO Node guys, the Compute Node Kernel guys and the
> ZeptoOS guys use it quite a bit. The kittyhawk guys on the other hand
> barely use it at all, in fact I believe they do all the interaction with
> it during uboot and then shut it off.

I would prefer that approach.

> IIRC, the sticky question is RAS support, there are certain things it
> wants to jump to firmware to deal with and expects things to be mapped
> and pinned into memory.
>
> Furthermore, I think it may make assumptions about where in the TLB the
> mappings are.

This is gross, especially on a system with only 64 SW loaded TLB
entries :-(

> Since the kittyhawk guys
> obviously ignore this by shutting it down, it's not clear just how
> important this is. I'm game to
> try the dynamic mapping as you suggest if you would prefer it.

I would yes, we can sort things out later for RAS.

> It's worth mentioning that I believe with BG/Q, the plan is to rely on
> the firmware even more extensively, but I haven't looked at any of the
> code yet to verify whether or not this is true.

This is tantamount to linking a binary blob with the kernel ... it's a
fine line. At some point we might refuse the patches if they go too far
in that direction.

Cheers,
Ben.
> -eric
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
On 05/19/2011 08:54 PM, Benjamin Herrenschmidt wrote:
> On Thu, 2011-05-19 at 20:21 -0500, Eric Van Hensbergen wrote:
>
>> On Thu, May 19, 2011 at 7:39 PM, Benjamin Herrenschmidt
>> <benh@kernel.crashing.org> wrote:
>>
>>> On Wed, 2011-05-18 at 16:24 -0500, Eric Van Hensbergen wrote:
>>>
>>>> BG/P maps firmware with an early TLB
>>>>
>>> That's a bit gross. How often do you call that firmware in practice?
>>> Aren't you better off inserting a TLB entry for it when you call
>>> it instead? A simple tlbsx. + tlbwe sequence would do. That would free
>>> up a TLB entry for normal use.
>>>
>>>
>> Well, it depends on who you talk to. The production software BG/P
>> guys use the firmware constantly, it's the primary interface to the
>> networks, the console, and the management software which runs the machine.
>>
> Yuck.
>

Unfortunately, the firmware is also required:
- to configure Blue Gene Interrupt Controller (BIC)
- to configure Torus DMA unit. e.g. fifo
- to configure global interrupt (even if we don't use it, we need to
  disable some channels correctly)
- to access node personality information (node id, DDR size, HZ, etc) or
  maybe we can directly access SRAM?
etc, etc.

>> As such the IO Node guys, the Compute Node Kernel guys and the
>> ZeptoOS guys use it quite a bit. The kittyhawk guys on the other hand
>> barely use it at all, in fact I believe they do all the interaction with
>> it during uboot and then shut it off.
>>
>

(I'm one of the ZeptoOS guys, btw)

For regular ppc Linux usage, our firmware dependency is minimal as well.
However, with our HPC extension, the firmware functions are called when
it configures BGP specific network hardware.

We are not planning to submit our HPC extension here anytime soon
because our work is very special purpose and includes lots of dirty
hacks right now.

Thanks,
Kaz

> I would prefer that approach.
> Unfortunately, the firmware is also required:
> - to configure Blue Gene Interrupt Controller (BIC)

Can't we just write bare metal code for that?

> - to configure Torus DMA unit. e.g. fifo

Same

> - to configure global interrupt (even if we don't use it, we need to
>   disable some channels correctly)

Same

> - to access node personality information (node id, DDR size, HZ, etc) or
>   maybe we can directly access SRAM?

That should be turned into device-tree at boot, possibly from a
bootloader or from the zImage wrapper.

> etc, etc.
>
> >> As such the IO Node guys, the Compute Node Kernel guys and the
> >> ZeptoOS guys use it quite a bit. The kittyhawk guys on the other hand
> >> barely use it at all, in fact I believe they do all the interaction with
> >> it during uboot and then shut it off.
> >>
> >
> (I'm one of the ZeptoOS guys, btw)

Heh ok.

> For regular ppc Linux usage, our firmware dependency is minimal as well.
> However, with our HPC extension, the firmware functions are called when
> it configures BGP specific network hardware.
>
> We are not planning to submit our HPC extension here anytime soon
> because our work is very special purpose and includes lots of dirty
> hacks right now.

Ok.

Cheers,
Ben.

> Thanks,
> Kaz
> > I would prefer that approach.
> >
> >
> >> IIRC, the sticky question is RAS support, there are certain things it
> >> wants to jump to firmware to deal with and expects things to be mapped
> >> and pinned into memory.
> >>
> >> Furthermore, I think it may make assumptions about where in the TLB the
> >> mappings are.
> >>
> > This is gross, especially on a system with only 64 SW loaded TLB
> > entries :-(
> >
> >
> >> Since the kittyhawk guys
> >> obviously ignore this by shutting it down, it's not clear just how
> >> important this is. I'm game to
> >> try the dynamic mapping as you suggest if you would prefer it.
> >>
> > I would yes, we can sort things out later for RAS.
On Thu, May 19, 2011 at 10:52 PM, Benjamin Herrenschmidt
<benh@kernel.crashing.org> wrote:
>> Unfortunately, the firmware is also required:
>> - to configure Blue Gene Interrupt Controller (BIC)
>> - to configure Torus DMA unit. e.g. fifo
>> - to configure global interrupt (even if we don't use it, we need to
>>   disable some channels correctly)
>
> Can't we just write bare metal code for that?
>

The kittyhawk code has the bare-metal equivalents for all of these.
When I get to the drivers, I'll favor the kittyhawk versions for
submission and then we'll see if it would be possible to adapt the HPC
extensions to use the bare-metal versions of the drivers versus the
firmware interface.

>> - to access node personality information (node id, DDR size, HZ, etc) or
>>   maybe we can directly access SRAM?
>
> That should be turned into device-tree at boot, possibly from a
> bootloader or from the zImage wrapper.
>

This is the approach used by the kittyhawk u-boot.
However, it would also be just as easy to construct an in-memory
device-tree within Linux by mapping the personality page and copying
the relevant bits out. This has the advantage of being able to boot
Linux directly on the nodes without an intermediary boot loader (which
kittyhawk uses just to allow us to customize which kernel boots on a
node-to-node basis, whereas the stock system boots the same kernel on
all the nodes within a partition allocation (64-40,000 nodes)).

     -eric
On Fri, 2011-05-20 at 08:01 -0500, Eric Van Hensbergen wrote:
> On Thu, May 19, 2011 at 10:52 PM, Benjamin Herrenschmidt
> <benh@kernel.crashing.org> wrote:
> >> Unfortunately, the firmware is also required:
> >> - to configure Blue Gene Interrupt Controller (BIC)
> >> - to configure Torus DMA unit. e.g. fifo
> >> - to configure global interrupt (even if we don't use it, we need to
> >>   disable some channels correctly)
> >
> > Can't we just write bare metal code for that?
> >
>
> The kittyhawk code has the bare-metal equivalents for all of these.
> When I get to the drivers, I'll favor the kittyhawk versions for
> submission and then we'll see if it would be possible to adapt the HPC
> extensions to use the bare-metal versions of the drivers versus the
> firmware interface.

Ok. We can also start with using the FW and then migrate to bare metal.

> >> - to access node personality information (node id, DDR size, HZ, etc) or
> >>   maybe we can directly access SRAM?
> >
> > That should be turned into device-tree at boot, possibly from a
> > bootloader or from the zImage wrapper.
> >
>
> This is the approach used by the kittyhawk u-boot.
> However, it would also be just as easy to construct an in-memory
> device-tree within Linux by mapping the personality page and copying
> the relevant bits out. This has the advantage of being able to boot
> Linux directly on the nodes without an intermediary boot loader (which
> kittyhawk uses just to allow us to customize which kernel boots on a
> node-to-node basis, whereas the stock system boots the same kernel on
> all the nodes within a partition allocation (64-40,000 nodes)).

We can do that from the zImage wrapper... that would be nicer than doing
it from the kernel itself unless there are good reasons to do so, like
iSeries.

Cheers,
Ben.
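The idea discussed above, copying the relevant bits out of the node personality and presenting them as device-tree properties, might look roughly like the following. This is a stand-alone sketch under heavy assumptions: `struct personality` and its three fields are hypothetical (the thread only names "node id, DDR size, HZ"), the `node-id` property is made up for illustration, and the code emits device-tree source text with `snprintf` rather than using the zImage wrapper's own helpers. The output is a fragment, not a complete bootable tree.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical subset of the personality page; the real layout is not
 * given in this thread. */
struct personality {
    uint32_t node_id;
    uint32_t ddr_size_mb;       /* DDR size in megabytes (assumed unit) */
    uint32_t clock_mhz;         /* core clock in MHz (assumed unit) */
};

/*
 * Format a device-tree source fragment describing the node into buf.
 * Returns what snprintf returns: the length that was (or would have
 * been) written, excluding the terminating NUL.
 */
static int personality_to_dts(const struct personality *p, char *buf, size_t len)
{
    /* Use 64 bits so that 4 GB and larger sizes do not overflow. */
    unsigned long long bytes = (unsigned long long)p->ddr_size_mb << 20;

    return snprintf(buf, len,
        "/ {\n"
        "\t#address-cells = <2>;\n"
        "\t#size-cells = <2>;\n"
        "\tnode-id = <%u>;\n"               /* hypothetical property */
        "\tmemory@0 {\n"
        "\t\tdevice_type = \"memory\";\n"
        "\t\treg = <0x0 0x0 0x%x 0x%x>;\n"  /* base 0, then size hi/lo */
        "\t};\n"
        "\tcpus {\n"
        "\t\tcpu@0 {\n"
        "\t\t\tclock-frequency = <%u>;\n"   /* in Hz */
        "\t\t};\n"
        "\t};\n"
        "};\n",
        (unsigned)p->node_id,
        (unsigned)(bytes >> 32), (unsigned)(bytes & 0xffffffffu),
        (unsigned)p->clock_mhz * 1000000u);
}
```

Whether this conversion runs in a boot loader, in the zImage wrapper, or in the kernel itself is exactly the open question in the thread; the sketch only shows the data flow, which is the same in all three cases.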
diff --git a/arch/powerpc/include/asm/mmu-44x.h b/arch/powerpc/include/asm/mmu-44x.h
index ca1b90c..2807d6e 100644
--- a/arch/powerpc/include/asm/mmu-44x.h
+++ b/arch/powerpc/include/asm/mmu-44x.h
@@ -115,8 +115,12 @@ typedef struct {
 #endif /* !__ASSEMBLY__ */

 #ifndef CONFIG_PPC_EARLY_DEBUG_44x
+#ifndef CONFIG_BGP
 #define PPC44x_EARLY_TLBS 1
-#else
+#else /* CONFIG_BGP */
+#define PPC44x_EARLY_TLBS 2
+#endif /* CONFIG_BGP */
+#else /* CONFIG_PPC_EARLY_DEBUG_44x */
 #define PPC44x_EARLY_TLBS 2
 #define PPC44x_EARLY_DEBUG_VIRTADDR (ASM_CONST(0xf0000000) \
     | (ASM_CONST(CONFIG_PPC_EARLY_DEBUG_44x_PHYSLOW) & 0xffff))
BG/P maps firmware with an early TLB

Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
---
 arch/powerpc/include/asm/mmu-44x.h | 6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)
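For readers tracing the nested conditionals in the patch, the resulting value of `PPC44x_EARLY_TLBS` is 1 only when neither `CONFIG_PPC_EARLY_DEBUG_44x` nor `CONFIG_BGP` is set, and 2 otherwise. The stand-alone fragment below is a sketch of that same truth table written as a single condition; it is not part of the submitted patch. `CONFIG_BGP` is defined by hand here purely for illustration, and `early_tlbs()` is a hypothetical helper added so the value can be inspected at run time.

```c
/* Defined by hand for illustration; in the kernel this would come from
 * the build configuration. */
#define CONFIG_BGP 1
/* #define CONFIG_PPC_EARLY_DEBUG_44x 1 */

/* Same outcome as the nested #ifndef blocks in the patch: two early
 * entries if either option is set, otherwise one. */
#if defined(CONFIG_PPC_EARLY_DEBUG_44x) || defined(CONFIG_BGP)
#define PPC44x_EARLY_TLBS 2
#else
#define PPC44x_EARLY_TLBS 1
#endif

/* Hypothetical accessor so the selected count can be checked at run time. */
static int early_tlbs(void)
{
    return PPC44x_EARLY_TLBS;
}
```

Note that in the patch `PPC44x_EARLY_DEBUG_VIRTADDR` is still defined only in the early-debug branch, so a flattened condition like this would need a separate guard for it. Also, with both options set the count remains 2; the thread does not say whether that combination is expected to work.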