Message ID | 4DA5BCF3.5080205@teksavvy.com |
---|---|
State | Not Applicable |
Delegated to: | David Miller |
Headers | show |
> Yeah, I'm suspecting there's a loophole in the logic there somewhere. > > I dusted off the 6041 reference card I have here, and played with the cables for a while. Managed to > get one port to stop responding to hot plug fairly quickly, though I'm not sure how/why. > > Then I added a debug printk() to mv_write_main_irq_mask(), with no other changes, and that appears to > have been enough to change the race timing so that I could no longer produce the problem. > > Bruce, here's a slightly-ugly patch that should remove all doubt about races in the irq_mask. Please > apply it, test with it, and let me know here if the issue goes away. > > Thanks Thanks Mark. I was about to try out some tracing in writelfl because I suspected the irq mask was getting clobbered somewhere along the way, but I'd been distracted by other work. I shall try your patch out as soon as I can, and report back. Cheers, Bruce. Bruce Stenning, IndigoVision, b <dot> stenning <at> indigovision <dot> com Latest News at: http://www.indigovision.com/index.php/en/news.html
PiA+IE9uZSB0aGluZyBJIG5vdGljZWQgd2FzIHRoYXQgdGhlcmUgaXMgbm8gc3BpbmxvY2sgYXJv dW5kIHRoZQ0KPiA+IG12X3NhdmVfY2FjaGVkX3JlZ3MvbXZfZWRtYV9jZmcgaW4gbXZfaGFyZHJl c2V0ICh1bmxpa2UgbXZfcG9ydF9zdGFydA0KPiA+IGFuZCBtdl9wb3J0X3N0b3ApOyB3aHkgaXMg dGhpcz8NCj4NCj4gWWVhaCwgSSdtIHN1c3BlY3RpbmcgdGhlcmUncyBhIGxvb3Bob2xlIGluIHRo ZSBsb2dpYyB0aGVyZSBzb21ld2hlcmUuDQo+DQo+IEkgZHVzdGVkIG9mZiB0aGUgNjA0MSByZWZl cmVuY2UgY2FyZCBJIGhhdmUgaGVyZSwgYW5kIHBsYXllZCB3aXRoIHRoZSBjYWJsZXMNCj4gZm9y IGEgd2hpbGUuICBNYW5hZ2VkIHRvIGdldCBvbmUgcG9ydCB0byBzdG9wIHJlc3BvbmRpbmcgdG8g aG90IHBsdWcgZmFpcmx5DQo+IHF1aWNrbHksIHRob3VnaCBJJ20gbm90IHN1cmUgaG93L3doeS4N Cj4NCj4gVGhlbiBJIGFkZGVkIGEgZGVidWcgcHJpbnRrKCkgdG8gbXZfd3JpdGVfbWFpbl9pcnFf bWFzaygpLCB3aXRoIG5vIG90aGVyDQo+IGNoYW5nZXMsIGFuZCB0aGF0IGFwcGVhcnMgdG8gaGF2 ZSBiZWVuIGVub3VnaCB0byBjaGFuZ2UgdGhlIHJhY2UgdGltaW5nDQo+IHNvIHRoYXQgSSBjb3Vs ZCBubyBsb25nZXIgcHJvZHVjZSB0aGUgcHJvYmxlbS4NCj4NCj4gQnJ1Y2UsIGhlcmUncyBhIHNs aWdodGx5LXVnbHkgcGF0Y2ggdGhhdCBzaG91bGQgcmVtb3ZlIGFsbCBkb3VidCBhYm91dCByYWNl cw0KPiBpbiB0aGUgaXJxX21hc2suICBQbGVhc2UgYXBwbHkgaXQsIHRlc3Qgd2l0aCBpdCwgYW5k IGxldCBtZSBrbm93IGhlcmUgaWYgdGhlDQo+IGlzc3VlIGdvZXMgYXdheS4NCj4NCj4gVGhhbmtz DQo+DQo+IFtwYXRjaF0NCg0KDQpIaSBNYXJrLA0KDQpJIGhhdmUgdHJpZWQgdGhlIHBhdGNoIG91 dCBhbmQgSSB3YXMgYWJsZSB0byByZXByb2R1Y2UgdGhlIHBvcnQgbG9ja3VwIHdpdGggaXQuDQoN CkkgYWxzbyB0cmllZCBvdXQgbXkgdHJhY2luZyBpbiB3cml0ZWxmbCAod2l0aG91dCBhbnkgb3Ro ZXIgY2hhbmdlcykgYW5kIEkgd2FzDQphYmxlIHRvIGxvY2sgdXAgYSBwb3J0IHdpdGhvdXQgYW55 IGFwcGFyZW50IHVudXN1YWwgY2hhbmdlcyB0byB0aGUgcmVnaXN0ZXINCmNvbnRhaW5pbmcgdGhl IGlycSBtYXNrcy4gIEFyZSB0aGVyZSBvdGhlciByb3V0ZXMgdG8gZGlzYWJsaW5nIHRoZSBpcnFz Pw0KDQpJIHNoYWxsIGNvbnRpbnVlIGxvb2tpbmcuDQoNCkNoZWVycywNCg0KQnJ1Y2UuDQoNCg0K QnJ1Y2UgU3Rlbm5pbmcsDQpJbmRpZ29WaXNpb24sDQpiIDxkb3Q+IHN0ZW5uaW5nIDxhdD4gaW5k aWdvdmlzaW9uIDxkb3Q+IGNvbQ0KDQoNCkxhdGVzdCBOZXdzIGF0OiBodHRwOi8vd3d3LmluZGln b3Zpc2lvbi5jb20vaW5kZXgucGhwL2VuL25ld3MuaHRtbA0K -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 11-04-14 12:50 PM, Bruce Stenning wrote: .. > I have tried the patch out and I was able to reproduce the port lockup with it. Bummer. > I also tried out my tracing in writelfl (without any other changes) and I was > able to lock up a port without any apparent unusual changes to the register > containing the irq masks. Are there other routes to disabling the irqs? Mmm.. presumably you are NOT using MSI interrupts, right? -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> > I also tried out my tracing in writelfl (without any other changes) and I was > > able to lock up a port without any apparent unusual changes to the register > > containing the irq masks. Are there other routes to disabling the irqs? > > Mmm.. presumably you are NOT using MSI interrupts, right? Hi Mark, That is correct: sata_mv 0000:00:04.0: Gen-II 32 slots 4 ports SCSI mode IRQ via INTx I am building with CONFIG_ARCH_SUPPORTS_MSI=y, but also with CONFIG_PCI_MSI not set. With all this discussion of SATA link speed, I ought to say that we limit our SATA links to 1.5Gbps with the following kernel parameter: libata.force=1.5g I had noticed occasional messages similar to the following earlier in the week: ata4.00: limiting speed to UDMA/100:PIO4 I was surprised to see them, and thought they might be related to the hotplug issues, but I was able to reproduce the lockups without these messages being generated. In some cases (but not all) both sata_down_spd_limit and ata_down_xfermask_limit are called when backing off the link speed. Mark, I was intrigued by your comment from the following message: http://www.spinics.net/lists/linux-ide/msg36922.html > If it's like their non-AHCI controllers (sata_mv), then the chipset/phy > could be very particular about the sequence/timing used when changing speeds. Is it possible that the driver is doing some of the work to change the link speed (even though it has nowhere to go) and clobbering the link entirely? I shall take another closer look at the source code and tracing around hotplug w.r.t. link speed. Cheers, Bruce. Bruce Stenning, IndigoVision, b <dot> stenning <at> indigovision <dot> com Latest News at: http://www.indigovision.com/index.php/en/news.html
On 11-04-15 09:21 AM, Bruce Stenning wrote: .. > sata_mv 0000:00:04.0: Gen-II 32 slots 4 ports SCSI mode IRQ via INTx .. > > I had noticed occasional messages similar to the following earlier in the week: > > ata4.00: limiting speed to UDMA/100:PIO4 Ugh. Tejun: that's just plain silly. Why would we EVER EVER EVER fall back to "PIO" for SATA ??? > I am building with CONFIG_ARCH_SUPPORTS_MSI=y, but also with CONFIG_PCI_MSI > not set. > > With all this discussion of SATA link speed, I ought to say that we limit our > SATA links to 1.5Gbps with the following kernel parameter: > > libata.force=1.5g Mmm... I suppose it could be getting stuck at 1.5g, and not transitioning back to "auto detect" or something. That part of sata_mv involves a bit of voo-doo, because I was reluctant to cut/paste the 200 lines or so of generic code just for the sake of changing five lines. Perhaps the generic code has changed since this was originally working, and now no longer always works for sata_mv.. I suppose you have a good reason for using "libata.force=1.5g" there? Cheers -- To unsubscribe from this list: send the line "unsubscribe linux-ide" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> > libata.force=1.5g > > Mmm... I suppose it could be getting stuck at 1.5g, > and not transitioning back to "auto detect" or something. > > That part of sata_mv involves a bit of voo-doo, > because I was reluctant to cut/paste the 200 lines or so > of generic code just for the sake of changing five lines. > > Perhaps the generic code has changed since this was originally working, > and now no longer always works for sata_mv.. > > I suppose you have a good reason for using "libata.force=1.5g" there? Hi Mark, I hope you had a good weekend. The SATA speed was limited because there were concerns over signal integrity at 3Gbps early on in our product development, and even at 1.5Gbps the SATA links are not a bottleneck in our platform. The signal integrity issues have been addressed, and we would like to remove the limit, but this obviously has an additional test overhead and it is low down on our priorities. Out of interest I examined the behaviour of the hotplug with the 1.5Gbps limit removed, but I saw no change in the port lockup behaviour. Cheers, Bruce. Latest News at: http://www.indigovision.com/index.php/en/news.html
On Fri, Apr 15, 2011 at 11:36:38AM -0400, Mark Lord wrote: > On 11-04-15 09:21 AM, Bruce Stenning wrote: > .. > > sata_mv 0000:00:04.0: Gen-II 32 slots 4 ports SCSI mode IRQ via INTx > .. > > > > I had noticed occasional messages similar to the following earlier in the week: > > > > ata4.00: limiting speed to UDMA/100:PIO4 > > Ugh. Tejun: that's just plain silly. > Why would we EVER EVER EVER fall back to "PIO" for SATA ??? Nah, it's just printing both DMA and PIO speed settings. It's limited to UDMA/100 not PIO4. Thanks.
--- old/drivers/ata/sata_mv.c 2011-04-13 11:03:07.442481426 -0400 +++ linux/drivers/ata/sata_mv.c 2011-04-13 11:07:55.224249621 -0400 @@ -1027,15 +1027,19 @@ static void mv_set_main_irq_mask(struct ata_host *host, u32 disable_bits, u32 enable_bits) { + static spinlock_t irq_mask_lock = __SPIN_LOCK_UNLOCKED(irq_mask_lock); // FIXME: per-host would be nicer struct mv_host_priv *hpriv = host->private_data; u32 old_mask, new_mask; + unsigned long flags; + spin_lock_irqsave(&irq_mask_lock, flags); old_mask = hpriv->main_irq_mask; new_mask = (old_mask & ~disable_bits) | enable_bits; if (new_mask != old_mask) { hpriv->main_irq_mask = new_mask; mv_write_main_irq_mask(new_mask, hpriv); } + spin_unlock_irqrestore(&irq_mask_lock, flags); } static void mv_enable_port_irqs(struct ata_port *ap,