Patchwork PowerPC radeon KMS - is it possible?

Submitter Benjamin Herrenschmidt
Date April 18, 2012, 11:17 a.m.
Message ID <1334747877.3143.12.camel@pasglop>
Permalink /patch/153475/
State Not Applicable

Comments

Benjamin Herrenschmidt - April 18, 2012, 11:17 a.m.
On Wed, 2012-04-18 at 12:34 +0200, Michel Dänzer wrote:
> On Wed, 2012-04-18 at 20:20 +1000, Benjamin Herrenschmidt wrote:
> > On Wed, 2012-04-18 at 10:02 +0200, Michel Dänzer wrote:
> > > 
> > > > GPU lockup appears to be a common problem with the radeon driver.
> > > 
> > > It's what happens when anything goes wrong with the GPU. If it doesn't
> > > happen with agpmode=-1, it's probably an AGP related coherency issue. 
> > 
> > I had some success hacking the DRM to do an in_le32 from the ring head
> > after writing it. Just a gross hack but it seemed to help on a G5.
> 
> AFAICT radeon_ring_commit() does that already:
> 
>         DRM_MEMORYBARRIER();
>         WREG32(ring->wptr_reg, (ring->wptr << ring->ptr_reg_shift) & ring->ptr_reg_mask);
>         (void)RREG32(ring->wptr_reg);
> 
> We added the readback about a decade ago. :)

Hrm, I have a different hack in that old tree I was playing with a while
back, let me see...


I think that my rationale was to ensure that all previous stores to
AGP (indirect buffers etc...) were pushed out & ordered vs the ring
wptr update or something like that, because I think those paths aren't
well ordered in HW. In fact I suspect we might even need a bigger hammer
than that in_be32().

Another hack I had around was removing the SBA reset from agp-uninorth
completely on binding new pages, it seemed to cause hangs.

> > I suspect there's a fundamental design issue with apple bridge in that
> > the CPU to memory path isn't coherent at all with the GPU to memory path
> > ie. even vs. cache flush instructions (ie buffers in the memory
> > controllers can still be out of sync).
> > 
> > Darwin does some gross hacks to work around that, some of them visible
> > in the AGP drivers, some buried in the Apple driver, I don't know for
> > sure. It's possible that they end up mapping all AGP memory as cache
> > inhibited, but we can't do that because of our linear mapping.
> 
> We are doing that though...

Are we really? I thought we were taking existing cacheable RAM objects
and mapping them into the AGP GART. Are we replacing both kernel & user
mappings for those objects with an equivalent cache-inhibited mapping?

I'm not -that- familiar with how ttm works here. In any case it can
cause bus checkstops because the same pages can be prefetched into the
cache via the linear mapping which is covered by BATs (unless you make
your graphic objects HIGHMEM only but good luck with that :-)

To make that work reliably we should disable the BAT mapping so the
linear mapping can then be controlled on a per-page basis (on 32-bit),
and this is complicated: we have code elsewhere that more or less relies
on the BAT mapping being there. On 64-bit it's even nastier because
we use 16M pages for the linear mapping.

Cheers,
Ben.
Michel Dänzer - April 18, 2012, 1:27 p.m.
On Wed, 2012-04-18 at 21:17 +1000, Benjamin Herrenschmidt wrote:
> On Wed, 2012-04-18 at 12:34 +0200, Michel Dänzer wrote:
> > On Wed, 2012-04-18 at 20:20 +1000, Benjamin Herrenschmidt wrote:
> > > On Wed, 2012-04-18 at 10:02 +0200, Michel Dänzer wrote:
> > > > 
> > > > > GPU lockup appears to be a common problem with the radeon driver.
> > > > 
> > > > It's what happens when anything goes wrong with the GPU. If it doesn't
> > > > happen with agpmode=-1, it's probably an AGP related coherency issue. 
> > > 
> > > I had some success hacking the DRM to do an in_le32 from the ring head
> > > after writing it. Just a gross hack but it seemed to help on a G5.
> > 
> > AFAICT radeon_ring_commit() does that already:
> > 
> >         DRM_MEMORYBARRIER();
> >         WREG32(ring->wptr_reg, (ring->wptr << ring->ptr_reg_shift) & ring->ptr_reg_mask);
> >         (void)RREG32(ring->wptr_reg);
> > 
> > We added the readback about a decade ago. :)
> 
> Hrm, I have a different hack in that old tree I was playing with a while
> back, let me see...
> 
> --- a/drivers/gpu/drm/radeon/radeon_cp.c
> +++ b/drivers/gpu/drm/radeon/radeon_cp.c

Note that radeon_cp.c is UMS code, for KMS you need to look at
radeon_ring.c.

> @@ -2245,6 +2245,9 @@ void radeon_commit_ring(drm_radeon_private_t *dev_priv)
>         DRM_MEMORYBARRIER();
>         GET_RING_HEAD( dev_priv );
>  
> +#ifdef CONFIG_PPC
> +       in_be32(dev_priv->ring.start);
> +#endif
>         if ((dev_priv->flags & RADEON_FAMILY_MASK) >= CHIP_R600) {
> 
> 
> I think that my rationale was to ensure that all previous stores to
> AGP (indirect buffers etc...) were pushed out & ordered vs the ring
> wptr update or something like that, because I think those paths aren't
> well ordered in HW. In fact I suspect we might even need a bigger hammer
> than that in_be32().

Probably wouldn't hurt trying something like that in the KMS code as
well.


> Another hack I had around was removing the SBA reset from agp-uninorth
> completely on binding new pages, it seemed to cause hangs.

You mean like commit 5613beb46d54da6ef7f1c4589e9f2e60eeb10721? :)


> > > I suspect there's a fundamental design issue with apple bridge in that
> > > the CPU to memory path isn't coherent at all with the GPU to memory path
> > > ie. even vs. cache flush instructions (ie buffers in the memory
> > > controllers can still be out of sync).
> > > 
> > > Darwin does some gross hacks to work around that, some of them visible
> > > in the AGP drivers, some buried in the Apple driver, I don't know for
> > > sure. It's possible that they end up mapping all AGP memory as cache
> > > inhibited, but we can't do that because of our linear mapping.
> > 
> > We are doing that though...
> 
> Are we really? I thought we were taking existing cacheable RAM objects
> and mapping them into the AGP GART.

No, the radeon driver has always mapped memory uncacheable to the CPU
while it's bound into the AGP GART.

> Are we replacing both kernel & user mappings for those objects with an
> equivalent cache inhibited mapping ? 
> 
> I'm not -that- familiar with how ttm works here.

I'm hardly more familiar with how it all works than you. :)

> In any case it can cause bus checkstops because the same pages can be
> prefetched into the cache via the linear mapping which is covered by
> BATs

So you've been saying for about a decade. :) But I've never seen any
problems tracked down to that.

> (unless you make your graphic objects HIGHMEM only but good luck with
> that :-)

FWIW I think TTM indeed prefers highmem pages for GPU access. The radeon
driver normally doesn't need kernel mappings for them.

Patch

--- a/drivers/gpu/drm/radeon/radeon_cp.c
+++ b/drivers/gpu/drm/radeon/radeon_cp.c
@@ -2245,6 +2245,9 @@ void radeon_commit_ring(drm_radeon_private_t *dev_priv)
        DRM_MEMORYBARRIER();
        GET_RING_HEAD( dev_priv );
 
+#ifdef CONFIG_PPC
+       in_be32(dev_priv->ring.start);
+#endif
        if ((dev_priv->flags & RADEON_FAMILY_MASK) >= CHIP_R600) {
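
For readers following the ordering argument, here is a minimal user-space
sketch of the sequence discussed in the thread: a barrier, a read back from
the start of the ring buffer, the write-pointer update, then a read back of
the write-pointer register. It is not driver code. struct fake_ring,
fake_ring_write() and fake_ring_commit() are hypothetical names invented for
this illustration, plain memory stands in for AGP memory and the MMIO
register, and a C11 fence plus a volatile load only approximates what
DRM_MEMORYBARRIER() and in_be32() do on real PowerPC hardware.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names exist in the radeon driver. */
#define RING_DWORDS 16u /* must be a power of two */

struct fake_ring {
    volatile uint32_t buf[RING_DWORDS]; /* models the ring buffer in AGP memory */
    volatile uint32_t wptr_reg;         /* models the MMIO write-pointer register */
    uint32_t wptr;                      /* CPU-side copy of the write pointer */
};

/* Queue one dword and advance the CPU-side write pointer, wrapping around. */
static void fake_ring_write(struct fake_ring *r, uint32_t value)
{
    r->buf[r->wptr] = value;
    r->wptr = (r->wptr + 1u) & (RING_DWORDS - 1u);
}

/* Publish queued dwords; returns the dword read back from the ring start. */
static uint32_t fake_ring_commit(struct fake_ring *r)
{
    uint32_t first;

    assert(r->wptr < RING_DWORDS);

    /* 1. Order all earlier stores before what follows
     *    (the role DRM_MEMORYBARRIER() plays in the patch context). */
    atomic_thread_fence(memory_order_seq_cst);

    /* 2. Read back from the start of the ring, analogous to the added
     *    in_be32(dev_priv->ring.start). On hardware the intent is to push
     *    earlier stores out towards memory before the pointer update. */
    first = r->buf[0];

    /* 3. Tell the (modelled) GPU how far it may read. */
    r->wptr_reg = r->wptr;

    /* 4. Read the register back, as in the quoted radeon_ring_commit(). */
    (void)r->wptr_reg;

    return first;
}
```

A caller would queue dwords with fake_ring_write() and then call
fake_ring_commit() once. Whether a single read back is a big enough hammer on
the Apple bridge is exactly the open question in the thread, so the sketch
shows only the intended ordering, not a verified fix.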