Message ID | 46e1c7760902082327s1c498ac3w56939960ac306426@mail.gmail.com |
---|---|
State | Rejected, archived |
Delegated to: | David Miller |
Headers | show |
From: Risto Suominen <risto.suominen@gmail.com> Date: Mon, 9 Feb 2009 09:27:49 +0200 > Add a configurable Descriptor Skip Length for systems that lack cache coherence. > > Signed-off-by: Risto Suominen <Risto.Suominen@gmail.com> I really don't see why this patch could possibly be necessary. On systems that lack cache coherence: 1) {pci,dma}_alloc_{consistent,coherent}() give kernel mappings of the buffer with the cache disabled. Therefore the device and and cpu see the correct data. 2) {pci,dma}_{map,unmap}_{single,sg}() do the appropriate cache flushing. Explicit syncing between cpu and device can be performed using {pci,dma}_sync_{single,sg}() as needed. Therefore, this patch is superfluous. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
2009/2/9 David Miller <davem@davemloft.net>: > From: Risto Suominen <risto.suominen@gmail.com> > Date: Mon, 9 Feb 2009 09:27:49 +0200 > >> Add a configurable Descriptor Skip Length for systems that lack cache coherence. >> >> Signed-off-by: Risto Suominen <Risto.Suominen@gmail.com> > > I really don't see why this patch could possibly be necessary. > Because it makes it work on my PowerMac 5500 ;) > On systems that lack cache coherence: > > 1) {pci,dma}_alloc_{consistent,coherent}() give kernel mappings of > the buffer with the cache disabled. Therefore the device and > and cpu see the correct data. > > 2) {pci,dma}_{map,unmap}_{single,sg}() do the appropriate cache > flushing. > > Explicit syncing between cpu and device can be performed > using {pci,dma}_sync_{single,sg}() as needed. > Sounds good, but does not seem to help. My theory is that when the cpu is writing to one descriptor, it accidentally overwrites another descriptor, that has already been written to by the device, as there is only a single dirty bit, that makes the whole cacheline to be flushed. > Therefore, this patch is superfluous. > Or everything else is. My solution does not cost a penny. Risto -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Risto Suominen <risto.suominen@gmail.com> Date: Mon, 9 Feb 2009 10:22:15 +0200 > Sounds good, but does not seem to help. My theory is that when the cpu > is writing to one descriptor, it accidentally overwrites another > descriptor, that has already been written to by the device, as there > is only a single dirty bit, that makes the whole cacheline to be > flushed. You tested with 2.6.18 but you want me to apply this to current kernels, that won't work. Therefore you'll need to verify that the problem still exists with current kernels before I'll consider this seriously. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
I see your point, was kind of expecting this. I'll see what I can do. Risto -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Risto Suominen <risto.suominen@gmail.com> writes: >> 1) {pci,dma}_alloc_{consistent,coherent}() give kernel mappings of >> the buffer with the cache disabled. ^^^^^^^^^^^^^^^^^^^^^^^ > Sounds good, but does not seem to help. My theory is that when the cpu > is writing to one descriptor, it accidentally overwrites another > descriptor, that has already been written to by the device, as there > is only a single dirty bit, that makes the whole cacheline to be > flushed. That means the consistent/coherent mapping isn't really consistent/coherent (uncached), right? Perhaps there is some way to fix this instead of changing the drivers to avoid the problematic area? Potentially any driver is affected by such coherency problem, this can't be specific to 21040.
2009/2/9 Krzysztof Halasa <khc@pm.waw.pl>: > > That means the consistent/coherent mapping isn't really > consistent/coherent (uncached), right? Perhaps there is some way to fix > this instead of changing the drivers to avoid the problematic area? > Yes. The other way could be to add a config parameter that allows to explicitly set CONFIG_NOT_COHERENT_CACHE. I have not tried this, probably because the necessary code wasn't in arch/powerpc until 2.6.21. Anyway, now I have tested with 2.6.24, and the problem still exists. Apropos changing the driver, I would like to point out, that leaving my proposed config parameter to its default value means not changing the driver. > Potentially any driver is affected by such coherency problem, this can't > be specific to 21040. > I agree. That talks for the config solution. Risto -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Risto Suominen <risto.suominen@gmail.com> Date: Mon, 9 Feb 2009 21:22:03 +0200 > 2009/2/9 Krzysztof Halasa <khc@pm.waw.pl>: > > > > That means the consistent/coherent mapping isn't really > > consistent/coherent (uncached), right? Perhaps there is some way to fix > > this instead of changing the drivers to avoid the problematic area? > > > Yes. > > The other way could be to add a config parameter that allows to > explicitly set CONFIG_NOT_COHERENT_CACHE. I have not tried this, > probably because the necessary code wasn't in arch/powerpc until > 2.6.21. > > Anyway, now I have tested with 2.6.24, and the problem still exists. Ok. The issue are descriptors that are _written_ by both the cpu and the device. That is the problematic case here. e100 had a similar problem and changes were made there too to accomodate such system types. If only the cpu or only the device writes to the descriptor (which is the most common design these days) there are no problems. So as promised I'll have to relook at this patch :-) -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
David Miller <davem@davemloft.net> writes: > The issue are descriptors that are _written_ by both the cpu > and the device. That is the problematic case here. Do you mean both CPU and 21040 write to the same descriptor at (nearly) the same time? Is it TX, RX or both? I wonder, how would the patch help it? The patch seems to align the descriptors on cache line boundary. That IMHO means the corruption is caused by the 21040 writing to e.g. desc #0, CPU writing to desc #1, which causes the cache line write bringing the old desc #0 back. Is it possible to use uncached memory for coherent allocations (with no write side effects) on this machine?
From: Krzysztof Halasa <khc@pm.waw.pl> Date: Tue, 10 Feb 2009 02:45:46 +0100 > David Miller <davem@davemloft.net> writes: > > > The issue are descriptors that are _written_ by both the cpu > > and the device. That is the problematic case here. > > Do you mean both CPU and 21040 write to the same descriptor at (nearly) > the same time? Is it TX, RX or both? > > I wonder, how would the patch help it? The problem is when the chip is writing to one neighbouring descriptor of one which the cpu is writing to at the same time. > The patch seems to align the descriptors on cache line boundary. That > IMHO means the corruption is caused by the 21040 writing to e.g. desc > #0, CPU writing to desc #1, which causes the cache line write bringing > the old desc #0 back. Right. > Is it possible to use uncached memory for coherent allocations (with no > write side effects) on this machine? Good question. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Risto Suominen <risto.suominen@gmail.com> Date: Mon, 9 Feb 2009 21:22:03 +0200 > 2009/2/9 Krzysztof Halasa <khc@pm.waw.pl>: > > Potentially any driver is affected by such coherency problem, this can't > > be specific to 21040. > > > I agree. That talks for the config solution. I think the pci_alloc_consistent() implementation for your particular platform should be fixed instead :-) It should be using uncacheable cpu mappings of the returned memory if the cpu lacks cache coherency with DMA. Peppering all kinds of drivers with this kind of change being proposed here is not the way to handle this problem. It makes the creation of the abstractions we created with pci_alloc_consistent() and friends totally useless. Drivers have to be able to say "what I write to this memory the device will see, and what the device writes the cpu will see" and they have to be able to say this regardless of details like cache alignment and other things that they should have no business knowing about. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> I really don't see why this patch could possibly be necessary. > > On systems that lack cache coherence: > > 1) {pci,dma}_alloc_{consistent,coherent}() give kernel mappings of > the buffer with the cache disabled. Therefore the device and > and cpu see the correct data. > > 2) {pci,dma}_{map,unmap}_{single,sg}() do the appropriate cache > flushing. > > Explicit syncing between cpu and device can be performed > using {pci,dma}_sync_{single,sg}() as needed. > > Therefore, this patch is superfluous. Allright so in that case, I'm not entirely sure... There is some history behind that (and btw, de4x5 and tulip probably need the same "workaround"), see below. First what I believe the problem to be is some kind of coherency bug in a bridge in some of these old systems. I'm not 100% sure of the details but it does have to do with partial cache line access iirc. By moving the descriptors into separate cache lines, we avoid the bug. It would be possible, I suppose, to work around it by treating those machines as non-coherent but with two significant drawbacks: - One is simple and you may want to ignore it :-) Basically, those are already pretty slow machines, and thus we would slow them down even more by doing manual cache flush/invalidates all over the place (as a matter of fact, I'm not even sure those CPUs support dcbi instructions for cache inval, I need to dbl check). - The other is more subtle but also harder to avoid: It would be fairly hard to add support for non-coherent mappings on these because of the way the whole MMU stuff works for those 32-bit hash based powerpc's. Basically, we cannot have something mapped both cached and non-cached (in different virtual addresses), and the use of the BAT mappings in the processor means that most of the linear mapping -will- be mapped cached in chunks of 256MB. The code pretty much relies on that, it's not just an optimisation that can be turned off. So I'm of mixed opinion here... It looks like since only those 2 or 3 drivers are affected (ie, they are the only thing ever found behind those weirdo bridges) and since those chips happen to support this padding between descriptor, it looks like a good compromise to just add this feature to the drivers. In fact, de4x5.c at least used to have it, I don't know if it's still the case. Now, there are other cache coherency related bugs I think on those old machines, or at least what look like it -could- be cache coherency related bugs, or maybe just bugs in the DBDMA engine. Mesh for example gets unhappy if we give it a buffer that is not aligned on a cache line boundary and corrupts the beginning of that cache line. But I don't think that treating the platform as non-coherent will fix that, ie, it seems to happens regardless of the line actually being in the cache or not. This is typically triggered by the SCSI ID at boot when using SLAB debug which de-aligns kmalloc, or at least that's how I observed it a while ago. Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Add a configurable Descriptor Skip Length for systems that lack cache coherence. Signed-off-by: Risto Suominen <Risto.Suominen@gmail.com> --- The testing is done on kernel version 2.6.18. --- drivers/net/tulip/Kconfig.org 2006-09-20 06:42:06.000000000 +0300 +++ drivers/net/tulip/Kconfig 2009-02-07 20:48:17.000000000 +0200 @@ -27,6 +27,18 @@ config DE2104X To compile this driver as a module, choose M here. The module will be called de2104x. +config DE2104X_DSL + int "Descriptor Skip Length in 32 bit longwords" + depends on DE2104X + range 0 31 + default 0 + help + Setting this value allows to align ring buffer descriptors into their + own cache lines. Value of 4 corresponds to the typical 32 byte line + (the descriptor is 16 bytes). This is necessary on systems that lack + cache coherence, an example is PowerMac 5500. Otherwise 0 is safe. + Default is 0, and range is 0 to 31. + config TULIP tristate "DECchip Tulip (dc2114x) PCI support" depends on PCI --- drivers/net/tulip/de2104x.c.org 2006-09-20 06:42:06.000000000 +0300 +++ drivers/net/tulip/de2104x.c 2009-02-07 15:04:04.000000000 +0200 @@ -82,6 +82,13 @@ MODULE_PARM_DESC (rx_copybreak, "de2104x NETIF_MSG_RX_ERR | \ NETIF_MSG_TX_ERR) +/* Descriptor skip length in 32 bit longwords. */ +#ifndef CONFIG_DE2104X_DSL +#define DSL 0 +#else +#define DSL CONFIG_DE2104X_DSL +#endif + #define DE_RX_RING_SIZE 64 #define DE_TX_RING_SIZE 64 #define DE_RING_BYTES \ @@ -153,6 +160,7 @@ enum { CmdReset = (1 << 0), CacheAlign16 = 0x00008000, BurstLen4 = 0x00000400, + DescSkipLen = (DSL << 2), /* Rx/TxPoll bits */ NormalTxPoll = (1 << 0), @@ -246,7 +254,7 @@ static const u32 de_intr_mask = * Set the programmable burst length to 4 longwords for all: * DMA errors result without these values. Cache align 16 long. */ -static const u32 de_bus_mode = CacheAlign16 | BurstLen4; +static const u32 de_bus_mode = CacheAlign16 | BurstLen4 | DescSkipLen; struct de_srom_media_block { u8 opts; @@ -266,6 +274,9 @@ struct de_desc { __le32 opts2; __le32 addr1; __le32 addr2; +#if DSL + __le32 skip[DSL]; +#endif }; struct media_info {