Patchwork [002/002] de2104x: support for systems lacking cache coherence

login
register
mail settings
Submitter Risto Suominen
Date Feb. 9, 2009, 7:27 a.m.
Message ID <46e1c7760902082327s1c498ac3w56939960ac306426@mail.gmail.com>
Download mbox | patch
Permalink /patch/22664/
State Rejected
Delegated to: David Miller
Headers show

Comments

Risto Suominen - Feb. 9, 2009, 7:27 a.m.
---------- Forwarded message ----------
From: Risto Suominen <risto.suominen@gmail.com>
Date: 2009/2/7
Subject: [PATCH 002/002] de2104x: support for systems lacking cache coherence
To: Jeff Garzik <jgarzik@pobox.com>, lkml <linux-kernel@vger.kernel.org>


Add a configurable Descriptor Skip Length for systems that lack cache coherence.

Signed-off-by: Risto Suominen <Risto.Suominen@gmail.com>
---
The testing is done on kernel version 2.6.18.
David Miller - Feb. 9, 2009, 7:45 a.m.
From: Risto Suominen <risto.suominen@gmail.com>
Date: Mon, 9 Feb 2009 09:27:49 +0200

> Add a configurable Descriptor Skip Length for systems that lack cache coherence.
> 
> Signed-off-by: Risto Suominen <Risto.Suominen@gmail.com>

I really don't see why this patch could possibly be necessary.

On systems that lack cache coherence:

1) {pci,dma}_alloc_{consistent,coherent}() give kernel mappings of
   the buffer with the cache disabled.  Therefore the device and
   and cpu see the correct data.

2) {pci,dma}_{map,unmap}_{single,sg}() do the appropriate cache
   flushing.

   Explicit syncing between cpu and device can be performed
   using {pci,dma}_sync_{single,sg}() as needed.

Therefore, this patch is superfluous.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Risto Suominen - Feb. 9, 2009, 8:22 a.m.
2009/2/9 David Miller <davem@davemloft.net>:
> From: Risto Suominen <risto.suominen@gmail.com>
> Date: Mon, 9 Feb 2009 09:27:49 +0200
>
>> Add a configurable Descriptor Skip Length for systems that lack cache coherence.
>>
>> Signed-off-by: Risto Suominen <Risto.Suominen@gmail.com>
>
> I really don't see why this patch could possibly be necessary.
>
Because it makes it work on my PowerMac 5500 ;)

> On systems that lack cache coherence:
>
> 1) {pci,dma}_alloc_{consistent,coherent}() give kernel mappings of
>   the buffer with the cache disabled.  Therefore the device and
>   and cpu see the correct data.
>
> 2) {pci,dma}_{map,unmap}_{single,sg}() do the appropriate cache
>   flushing.
>
>   Explicit syncing between cpu and device can be performed
>   using {pci,dma}_sync_{single,sg}() as needed.
>
Sounds good, but does not seem to help. My theory is that when the cpu
is writing to one descriptor, it accidentally overwrites another
descriptor, that has already been written to by the device, as there
is only a single dirty bit, that makes the whole cacheline to be
flushed.

> Therefore, this patch is superfluous.
>
Or everything else is. My solution does not cost a penny.

Risto
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Feb. 9, 2009, 8:29 a.m.
From: Risto Suominen <risto.suominen@gmail.com>
Date: Mon, 9 Feb 2009 10:22:15 +0200

> Sounds good, but does not seem to help. My theory is that when the cpu
> is writing to one descriptor, it accidentally overwrites another
> descriptor, that has already been written to by the device, as there
> is only a single dirty bit, that makes the whole cacheline to be
> flushed.

You tested with 2.6.18 but you want me to apply this to current
kernels, that won't work.

Therefore you'll need to verify that the problem still exists with
current kernels before I'll consider this seriously.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Risto Suominen - Feb. 9, 2009, 8:35 a.m.
I see your point, was kind of expecting this. I'll see what I can do.

Risto
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Krzysztof Halasa - Feb. 9, 2009, 4:58 p.m.
Risto Suominen <risto.suominen@gmail.com> writes:

>> 1) {pci,dma}_alloc_{consistent,coherent}() give kernel mappings of
>>   the buffer with the cache disabled.
                ^^^^^^^^^^^^^^^^^^^^^^^

> Sounds good, but does not seem to help. My theory is that when the cpu
> is writing to one descriptor, it accidentally overwrites another
> descriptor, that has already been written to by the device, as there
> is only a single dirty bit, that makes the whole cacheline to be
> flushed.

That means the consistent/coherent mapping isn't really
consistent/coherent (uncached), right? Perhaps there is some way to fix
this instead of changing the drivers to avoid the problematic area?

Potentially any driver is affected by such coherency problem, this can't
be specific to 21040.
Risto Suominen - Feb. 9, 2009, 7:22 p.m.
2009/2/9 Krzysztof Halasa <khc@pm.waw.pl>:
>
> That means the consistent/coherent mapping isn't really
> consistent/coherent (uncached), right? Perhaps there is some way to fix
> this instead of changing the drivers to avoid the problematic area?
>
Yes.

The other way could be to add a config parameter that allows to
explicitly set CONFIG_NOT_COHERENT_CACHE. I have not tried this,
probably because the necessary code wasn't in arch/powerpc until
2.6.21.

Anyway, now I have tested with 2.6.24, and the problem still exists.

Apropos changing the driver, I would like to point out, that leaving
my proposed config parameter to its default value means not changing
the driver.

> Potentially any driver is affected by such coherency problem, this can't
> be specific to 21040.
>
I agree. That talks for the config solution.

Risto
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Feb. 9, 2009, 10:51 p.m.
From: Risto Suominen <risto.suominen@gmail.com>
Date: Mon, 9 Feb 2009 21:22:03 +0200

> 2009/2/9 Krzysztof Halasa <khc@pm.waw.pl>:
> >
> > That means the consistent/coherent mapping isn't really
> > consistent/coherent (uncached), right? Perhaps there is some way to fix
> > this instead of changing the drivers to avoid the problematic area?
> >
> Yes.
> 
> The other way could be to add a config parameter that allows to
> explicitly set CONFIG_NOT_COHERENT_CACHE. I have not tried this,
> probably because the necessary code wasn't in arch/powerpc until
> 2.6.21.
> 
> Anyway, now I have tested with 2.6.24, and the problem still exists.

Ok.

The issue are descriptors that are _written_ by both the cpu
and the device.  That is the problematic case here.

e100 had a similar problem and changes were made there too to
accomodate such system types.

If only the cpu or only the device writes to the descriptor
(which is the most common design these days) there are no
problems.

So as promised I'll have to relook at this patch :-)
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Krzysztof Halasa - Feb. 10, 2009, 1:45 a.m.
David Miller <davem@davemloft.net> writes:

> The issue are descriptors that are _written_ by both the cpu
> and the device.  That is the problematic case here.

Do you mean both CPU and 21040 write to the same descriptor at (nearly)
the same time? Is it TX, RX or both?

I wonder, how would the patch help it?

The patch seems to align the descriptors on cache line boundary. That
IMHO means the corruption is caused by the 21040 writing to e.g. desc
#0, CPU writing to desc #1, which causes the cache line write bringing
the old desc #0 back.

Is it possible to use uncached memory for coherent allocations (with no
write side effects) on this machine?
David Miller - Feb. 10, 2009, 1:50 a.m.
From: Krzysztof Halasa <khc@pm.waw.pl>
Date: Tue, 10 Feb 2009 02:45:46 +0100

> David Miller <davem@davemloft.net> writes:
> 
> > The issue are descriptors that are _written_ by both the cpu
> > and the device.  That is the problematic case here.
> 
> Do you mean both CPU and 21040 write to the same descriptor at (nearly)
> the same time? Is it TX, RX or both?
> 
> I wonder, how would the patch help it?

The problem is when the chip is writing to one neighbouring descriptor
of one which the cpu is writing to at the same time.

> The patch seems to align the descriptors on cache line boundary. That
> IMHO means the corruption is caused by the 21040 writing to e.g. desc
> #0, CPU writing to desc #1, which causes the cache line write bringing
> the old desc #0 back.

Right.

> Is it possible to use uncached memory for coherent allocations (with no
> write side effects) on this machine?

Good question.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Miller - Feb. 10, 2009, 11:21 p.m.
From: Risto Suominen <risto.suominen@gmail.com>
Date: Mon, 9 Feb 2009 21:22:03 +0200

> 2009/2/9 Krzysztof Halasa <khc@pm.waw.pl>:
> > Potentially any driver is affected by such coherency problem, this can't
> > be specific to 21040.
> >
> I agree. That talks for the config solution.

I think the pci_alloc_consistent() implementation for your particular
platform should be fixed instead :-)

It should be using uncacheable cpu mappings of the returned memory if
the cpu lacks cache coherency with DMA.

Peppering all kinds of drivers with this kind of change being proposed
here is not the way to handle this problem.  It makes the creation of
the abstractions we created with pci_alloc_consistent() and friends
totally useless.

Drivers have to be able to say "what I write to this memory the device
will see, and what the device writes the cpu will see" and they have
to be able to say this regardless of details like cache alignment and
other things that they should have no business knowing about.
--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Benjamin Herrenschmidt - Feb. 13, 2009, 3:42 a.m.
> I really don't see why this patch could possibly be necessary.
> 
> On systems that lack cache coherence:
> 
> 1) {pci,dma}_alloc_{consistent,coherent}() give kernel mappings of
>    the buffer with the cache disabled.  Therefore the device and
>    and cpu see the correct data.
> 
> 2) {pci,dma}_{map,unmap}_{single,sg}() do the appropriate cache
>    flushing.
> 
>    Explicit syncing between cpu and device can be performed
>    using {pci,dma}_sync_{single,sg}() as needed.
> 
> Therefore, this patch is superfluous.

Allright so in that case, I'm not entirely sure... There is some history
behind that (and btw, de4x5 and tulip probably need the same
"workaround"), see below.

First what I believe the problem to be is some kind of coherency bug in
a bridge in some of these old systems. I'm not 100% sure of the details
but it does have to do with partial cache line access iirc. By moving
the descriptors into separate cache lines, we avoid the bug.

It would be possible, I suppose, to work around it by treating those
machines as non-coherent but with two significant drawbacks:

 - One is simple and you may want to ignore it :-) Basically, those are
already pretty slow machines, and thus we would slow them down even more
by doing manual cache flush/invalidates all over the place (as a matter
of fact, I'm not even sure those CPUs support dcbi instructions for
cache inval, I need to dbl check).

 - The other is more subtle but also harder to avoid: It would be fairly
hard to add support for non-coherent mappings on these because of the
way the whole MMU stuff works for those 32-bit hash based powerpc's.
Basically, we cannot have something mapped both cached and non-cached
(in different virtual addresses), and the use of the BAT mappings in the
processor means that most of the linear mapping -will- be mapped cached
in chunks of 256MB. The code pretty much relies on that, it's not just
an optimisation that can be turned off.

So I'm of mixed opinion here... It looks like since only those 2 or 3
drivers are affected (ie, they are the only thing ever found behind
those weirdo bridges) and since those chips happen to support this
padding between descriptor, it looks like a good compromise to just add
this feature to the drivers. In fact, de4x5.c at least used to have it,
I don't know if it's still the case.

Now, there are other cache coherency related bugs I think on those old
machines, or at least what look like it -could- be cache coherency
related bugs, or maybe just bugs in the DBDMA engine. Mesh for example
gets unhappy if we give it a buffer that is not aligned on a cache line
boundary and corrupts the beginning of that cache line. But I don't
think that treating the platform as non-coherent will fix that, ie, it
seems to happens regardless of the line actually being in the cache or
not.  This is typically triggered by the SCSI ID at boot when using SLAB
debug which de-aligns kmalloc, or at least that's how I observed it a
while ago.

Cheers,
Ben.


--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch

Add a configurable Descriptor Skip Length for systems that lack cache coherence.

Signed-off-by: Risto Suominen <Risto.Suominen@gmail.com>
---
The testing is done on kernel version 2.6.18.

--- drivers/net/tulip/Kconfig.org	2006-09-20 06:42:06.000000000 +0300
+++ drivers/net/tulip/Kconfig	2009-02-07 20:48:17.000000000 +0200
@@ -27,6 +27,18 @@  config DE2104X
 	  To compile this driver as a module, choose M here. The module will
 	  be called de2104x.
 
+config DE2104X_DSL
+	int "Descriptor Skip Length in 32 bit longwords"
+	depends on DE2104X
+	range 0 31
+	default 0
+	help
+	  Setting this value allows to align ring buffer descriptors into their
+	  own cache lines. Value of 4 corresponds to the typical 32 byte line
+	  (the descriptor is 16 bytes). This is necessary on systems that lack
+	  cache coherence, an example is PowerMac 5500. Otherwise 0 is safe.
+	  Default is 0, and range is 0 to 31.
+
 config TULIP
 	tristate "DECchip Tulip (dc2114x) PCI support"
 	depends on PCI
--- drivers/net/tulip/de2104x.c.org	2006-09-20 06:42:06.000000000 +0300
+++ drivers/net/tulip/de2104x.c	2009-02-07 15:04:04.000000000 +0200
@@ -82,6 +82,13 @@  MODULE_PARM_DESC (rx_copybreak, "de2104x
 				 NETIF_MSG_RX_ERR	| \
 				 NETIF_MSG_TX_ERR)
 
+/* Descriptor skip length in 32 bit longwords. */
+#ifndef CONFIG_DE2104X_DSL
+#define DSL			0
+#else
+#define DSL			CONFIG_DE2104X_DSL
+#endif
+
 #define DE_RX_RING_SIZE		64
 #define DE_TX_RING_SIZE		64
 #define DE_RING_BYTES		\
@@ -153,6 +160,7 @@  enum {
 	CmdReset		= (1 << 0),
 	CacheAlign16		= 0x00008000,
 	BurstLen4		= 0x00000400,
+	DescSkipLen		= (DSL << 2),
 
 	/* Rx/TxPoll bits */
 	NormalTxPoll		= (1 << 0),
@@ -246,7 +254,7 @@  static const u32 de_intr_mask =
  * Set the programmable burst length to 4 longwords for all:
  * DMA errors result without these values. Cache align 16 long.
  */
-static const u32 de_bus_mode = CacheAlign16 | BurstLen4;
+static const u32 de_bus_mode = CacheAlign16 | BurstLen4 | DescSkipLen;
 
 struct de_srom_media_block {
 	u8			opts;
@@ -266,6 +274,9 @@  struct de_desc {
 	__le32			opts2;
 	__le32			addr1;
 	__le32			addr2;
+#if DSL
+	__le32			skip[DSL];
+#endif
 };
 
 struct media_info {