Message ID | 20100623080200.GB5010@liondog.tnic |
---|---|
State | Not Applicable |
Delegated to: | David Miller |
Okay, I hope I captured all the data you asked me for.

On Wed, 23 Jun 2010 10:02:00 +0200
Borislav Petkov <bp@alien8.de> wrote:

> cat /proc/interrupts > irqs
> dmesg > dmesg.log

They are attached.

> Also, when you shutdown after having done your test case, do you see any
> activity after the "task ... blocked" backtrace?

log12 contains everything exactly as I saw it on my screen; the data I
saw ended with "[...] comp", too.
log12 contains the kernel buffer data from the time between starting the
transfer and the hang while shutting down.
From: Hans Mueller <hans42mueller@googlemail.com>
Date: Fri, Jun 25, 2010 at 06:58:46PM +0200
Hi, sorry for the delay.
Right, and I had a suspicion about sharing IRQs with the NIC:
       CPU0   CPU1   CPU2   CPU3   CPU4   CPU5   CPU6   CPU7
  0:  42503      0      0      0      0      0      0      0   IO-APIC-edge      timer
  1:      2      0      0      0      0      0      0      0   IO-APIC-edge      i8042
  4:      2      0      0      0      0      0      0      0   IO-APIC-edge
  9:      0      0      0      0      0      0      0      0   IO-APIC-fasteoi   acpi
 12:      4      0      0      0      0      0      0      0   IO-APIC-edge      i8042
 14:      0      0      0      0      0      0      0      0   IO-APIC-edge      ide2
 15:      0      0      0      0      0      0      0      0   IO-APIC-edge      ide3
 16:     26      0      0      0      0      0      0      0   IO-APIC-fasteoi   ehci_hcd:usb1
 17:    127      0      0      0      0      0      0      0   IO-APIC-fasteoi   hda_intel
 18:     47      0      0      0      0      0      0      0   IO-APIC-fasteoi   firewire_ohci, ahci
 19:    753      0      0      0      0      0      0      0   IO-APIC-fasteoi   ide0, ide1, eth0
 21:   3989      0      0      0      0      0      0      0   IO-APIC-fasteoi   ahci
so IRQ19 is shared between the NIC and the first IDE controller, and the
SATA controller is using another IRQ line, which could explain why the
issue doesn't happen with libata. Is your NIC a pluggable card and, if
yes, can you move it to another PCI slot so that ide0 and ide1 don't
share the same IRQ line with eth0, and then retest? Before retesting
though, do 'cat /proc/interrupts' to make sure.
I'm guessing the problem will go away then...
Thanks.
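The check suggested above (confirming from /proc/interrupts whether ide0/ide1 still share a line with eth0) can be sketched in a few lines of Python. This is only an illustration and not part of the original thread: the helper name `shared_irqs` and the `SAMPLE` constant are made up here, and the parsing assumes the 2010-era layout shown in the listing above (one counter per CPU, one chip-type token, then a comma-separated handler list). Newer kernels print extra columns, so treat it as a rough aid rather than a general parser.

```python
# Hypothetical helper (not from the thread): list IRQ lines that have
# more than one handler attached, given /proc/interrupts-style text.

SAMPLE = """\
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
4: 2 0 0 0 0 0 0 0 IO-APIC-edge
16: 26 0 0 0 0 0 0 0 IO-APIC-fasteoi ehci_hcd:usb1
18: 47 0 0 0 0 0 0 0 IO-APIC-fasteoi firewire_ohci, ahci
19: 753 0 0 0 0 0 0 0 IO-APIC-fasteoi ide0, ide1, eth0
21: 3989 0 0 0 0 0 0 0 IO-APIC-fasteoi ahci
"""


def shared_irqs(interrupts_text):
    """Return {irq_number: [handler, ...]} for IRQs with 2+ handlers."""
    lines = interrupts_text.strip().splitlines()
    if not lines:
        return {}

    # The header row names one column per CPU (CPU0 CPU1 ...).
    ncpus = len(lines[0].split())

    shared = {}
    for line in lines[1:]:
        label, sep, rest = line.partition(":")
        label = label.strip()
        fields = rest.split()
        # Only numeric IRQ rows are of interest; rows need the per-CPU
        # counters, the chip-type token, and at least one handler name.
        if not sep or not label.isdigit() or len(fields) <= ncpus + 1:
            continue
        # Assumption: everything after the chip type is the handler list.
        handler_text = " ".join(fields[ncpus + 1:])
        handlers = [h.strip() for h in handler_text.split(",") if h.strip()]
        if len(handlers) > 1:
            shared[label] = handlers
    return shared


for irq, handlers in sorted(shared_irqs(SAMPLE).items(), key=lambda kv: int(kv[0])):
    print("IRQ %s is shared by: %s" % (irq, ", ".join(handlers)))
```

On a live system the same function could be fed the contents of /proc/interrupts instead of `SAMPLE`; if eth0 no longer appears on the same line as ide0 and ide1, the lines have been "unshared".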
Hi,
On Wed, 30 Jun 2010 08:54:26 +0200
Borislav Petkov <bp@alien8.de> wrote:
> Is your NIC a pluggable card [...]
No, sorry, it's an onboard card.
Furthermore, I don't know how to change the IRQ using Linux (or rather,
whether it's possible at all).
I didn't find a way to change the IRQ mappings in my board's BIOS
either.
From: Hans Mueller <hans42mueller@googlemail.com>
Date: Wed, Jun 30, 2010 at 08:02:54PM +0200
> On Wed, 30 Jun 2010 08:54:26 +0200
> Borislav Petkov <bp@alien8.de> wrote:
>
> > Is your NIC a pluggable card [...]
> No, sorry, it's an onboard card.
> Furthermore, I don't know how to change the IRQ using Linux (or rather,
> whether it's possible at all).
> I didn't find a way to change the IRQ mappings in my board's BIOS
> either.

OK, first you can try something really easy: I see you have ide2 and
ide3 channels, each with its own IRQ line. You could move the CD-ROM
connector to the other IDE controller and test again.

Alternatively, if you have a spare PCI NIC, you can insert it into one
of the PCI slots after having disabled the onboard NIC in the BIOS. Just
for testing purposes, to see whether "unsharing" the IRQ line fixes the
issue.

Thanks.
Hi,

On Wed, 30 Jun 2010 20:31:38 +0200
Borislav Petkov <bp@alien8.de> wrote:
> Alternatively, if you have a spare PCI NIC, you can insert it into one
> of the PCI slots after having disabled the onboard NIC in the BIOS. Just
> for testing purposes, to see whether "unsharing" the IRQ line fixes the
> issue.

I currently have no access to the computer; I will be able to test
things again on Monday.
But as I wrote in my original bug report, I tested with a PCI NIC. (The
onboard NIC was not completely disabled, though; it was only disabled
via ifconfig ... down.)
When using the PCI NIC, the whole problem (kernel panic) did not occur.

-- 
Regards
Jonas
--
To unsubscribe from this list: send the line "unsubscribe linux-ide" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
From: Hans Mueller <hans42mueller@googlemail.com>
Date: Sat, Jul 03, 2010 at 09:49:18AM +0200
> Hi,
>
> On Wed, 30 Jun 2010 20:31:38 +0200
> Borislav Petkov <bp@alien8.de> wrote:
> > Alternatively, if you have a spare PCI NIC, you can insert it into one
> > of the PCI slots after having disabled the onboard NIC in the BIOS. Just
> > for testing purposes, to see whether "unsharing" the IRQ line fixes the
> > issue.
>
> I currently have no access to the computer; I will be able to test
> things again on Monday.
> But as I wrote in my original bug report, I tested with a PCI NIC. (The
> onboard NIC was not completely disabled, though; it was only disabled
> via ifconfig ... down.)
> When using the PCI NIC, the whole problem (kernel panic) did not occur.

OK, this confirms my suspicion that it is shared-IRQ related. Also, we
already verified that switching to libata does fix the issue for you, so
you are good to go. Considering the DEPRECATED status of IDE, I have
very little incentive to hunt this thing down further, so let's leave
it at that.

I'll send the first fix to Dave since it is still needed and add a note
to bugzilla for further reference.

Jonas, big thanks for your hard work with testing patches and ideas. I
really appreciate it! :)
Hi,

On Sat, 3 Jul 2010 10:23:04 +0200
Borislav Petkov <bp@alien8.de> wrote:
> OK, this confirms my suspicion that it is shared-IRQ related. Also, we
> already verified that switching to libata does fix the issue for you, so
> you are good to go. Considering the DEPRECATED status of IDE, I have
> very little incentive to hunt this thing down further, so let's leave
> it at that.

Okay, good. :)

> Jonas, big thanks for your hard work with testing patches and ideas. I
> really appreciate it! :)

You're welcome. :)
Big thanks to all of you who tried to resolve the bug or helped in
another way.
I especially want to emphasize this, as it does not seem to be standard
for bug reports to be answered at all.
(At least on the b43 bug report list; I reported a bug because their
driver seems to have destroyed my wifi card. Don't misunderstand me, I
know this can happen, but I expected a willingness to stop the driver
from destroying other people's hardware.)

-- 
Regards/Gruss
Jonas
From: Hans Mueller <hans42mueller@googlemail.com>
Date: Sat, Jul 03, 2010 at 01:41:49PM +0200
> I especially want to emphasize this, as it does not seem to be standard
> for bug reports to be answered at all.

It's a shame that users get that impression, but sadly I must admit I
know what you mean and yes, we should try harder instead of hacking in a
gazillion new features. But the answer is simple: writing new features
is much more fun than bug hunting...

> (At least on the b43 bug report list; I reported a bug because their
> driver seems to have destroyed my wifi card. Don't misunderstand me, I
> know this can happen, but I expected a willingness to stop the driver
> from destroying other people's hardware.)

Hmm, that's strange. I remember reading that while there's no real
maintainer of the driver, there are still people doing some work on it,
according to this:

http://marc.info/?l=linux-wireless&m=127747220616577&w=2

Did you also add the wireless maintainer to the Cc of your bug report
- "John W. Linville" <linville@tuxdriver.com> - along with a detailed
description of what the problem is, which kernel, how to reproduce along
with dmesg?
Hi,

even if this is quite off-topic and I'm not sure whether linux-ide
should still be in the Cc field, I am replying to all so that nobody is
left with half of the story :)

On Sat, 3 Jul 2010 13:53:05 +0200
Borislav Petkov <bp@alien8.de> wrote:
> Did you also add the wireless maintainer to the Cc of your bug report
> - "John W. Linville" <linville@tuxdriver.com> [...]

No, that is the only thing I did not do. I followed the instructions from:
http://linuxwireless.org/en/users/Drivers/b43#bug_reporting
They didn't mention Cc'ing anybody, as far as I know :)

> [...] along with a detailed
> description of what the problem is, which kernel, how to reproduce along
> with dmesg?

I attached the text of the original mail (but not the attachments, as
there should be no need for them here; if I am wrong about this, ask
me for them :) )
If there is anything wrong with the bug report, feel free to criticize.

I am aware that this is off-topic, so do not spend too much time
on it. I am going to send a copy of the original mail to the wireless
maintainer, as you suggested, and see what happens.

Gruss / Regards,
Jonas
From: Hans Mueller <hans42mueller@googlemail.com>
Date: Mon, Jul 05, 2010 at 01:44:01PM +0200
> even if this is quite off-topic and I'm not sure whether linux-ide
> should still be in the Cc field, I am replying to all so that nobody is
> left with half of the story :)

Right.

> > Did you also add the wireless maintainer to the Cc of your bug report
> > - "John W. Linville" <linville@tuxdriver.com> [...]
>
> No, that is the only thing I did not do. I followed the instructions from:
> http://linuxwireless.org/en/users/Drivers/b43#bug_reporting
> They didn't mention Cc'ing anybody, as far as I know :)
>
> > [...] along with a detailed
> > description of what the problem is, which kernel, how to reproduce along
> > with dmesg?
>
> I attached the text of the original mail (but not the attachments, as
> there should be no need for them here; if I am wrong about this, ask
> me for them :) )
> If there is anything wrong with the bug report, feel free to criticize.

Yep, it looks good. Bottom line is: the bug report should try to
plausibly lay out what the symptoms are and how to reproduce them, if
possible. Better to be too verbose than to leave out something that
might turn out to be important.

Btw, does MacOS recognize your wlan card at all or is it completely
bricked? And does it freeze only after you reboot from Linux?

> I am aware that this is off-topic, so do not spend too much time
> on it. I am going to send a copy of the original mail to the wireless
> maintainer, as you suggested, and see what happens.

Yes, good luck :)
diff --git a/block/blk-core.c b/block/blk-core.c
index 9fe174d..1213e13 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -173,9 +173,9 @@ void blk_dump_rq_flags(struct request *rq, char *msg)
 {
 	int bit;
 
-	printk(KERN_INFO "%s: dev %s: type=%x, flags=%x\n", msg,
+	printk(KERN_INFO "%s: dev %s: type=%x, flags=%x, ref_count: %d\n", msg,
 		rq->rq_disk ? rq->rq_disk->disk_name : "?", rq->cmd_type,
-		rq->cmd_flags);
+		rq->cmd_flags, rq->ref_count);
 
 	printk(KERN_INFO " sector %llu, nr/cnr %u/%u\n",
 	       (unsigned long long)blk_rq_pos(rq),
diff --git a/drivers/ide/ide-cd.c b/drivers/ide/ide-cd.c
index 64207df..cefcaf4 100644
--- a/drivers/ide/ide-cd.c
+++ b/drivers/ide/ide-cd.c
@@ -448,6 +448,7 @@ int ide_cd_queue_pc(ide_drive_t *drive, const unsigned char *cmd,
 		int error;
 
 		rq = blk_get_request(drive->queue, write, __GFP_WAIT);
+		blk_dump_rq_flags(rq, "ide_cd_queue_pc got rq");
 
 		memcpy(rq->cmd, cmd, BLK_MAX_CDB);
 		rq->cmd_type = REQ_TYPE_ATA_PC;
@@ -464,12 +465,14 @@ int ide_cd_queue_pc(ide_drive_t *drive, const unsigned char *cmd,
 		}
 
 		error = blk_execute_rq(drive->queue, info->disk, rq, 0);
+		blk_dump_rq_flags(rq, "ide_cd_queue_pc exec rq");
 
 		if (buffer)
 			*bufflen = rq->resid_len;
 
 		flags = rq->cmd_flags;
 		blk_put_request(rq);
+		blk_dump_rq_flags(rq, "ide_cd_queue_pc put rq");
 
 		/*
 		 * FIXME: we should probably abort/retry or something in case of
@@ -506,15 +509,23 @@ int ide_cd_queue_pc(ide_drive_t *drive, const unsigned char *cmd,
 	return (flags & REQ_FAILED) ? -EIO : 0;
 }
 
-static void ide_cd_error_cmd(ide_drive_t *drive, struct ide_cmd *cmd)
+/*
+ * notify callers that we ended the rq by returning a true value
+ */
+static bool ide_cd_error_cmd(ide_drive_t *drive, struct ide_cmd *cmd)
 {
 	unsigned int nr_bytes = cmd->nbytes - cmd->nleft;
 
 	if (cmd->tf_flags & IDE_TFLAG_WRITE)
 		nr_bytes -= cmd->last_xfer_len;
 
-	if (nr_bytes > 0)
+	if (nr_bytes > 0) {
+		blk_dump_rq_flags(drive->hwif->rq, "ide_cd_error_cmd completes rq");
 		ide_complete_rq(drive, 0, nr_bytes);
+		return true;
+	}
+
+	return false;
 }
 
 static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive)
@@ -552,8 +563,10 @@ static ide_startstop_t cdrom_newpc_intr(ide_drive_t *drive)
 	if (!OK_STAT(stat, 0, BAD_R_STAT)) {
 		rc = cdrom_decode_status(drive, stat);
 		if (rc) {
-			if (rc == 2)
+			if (rc == 2) {
+				printk(KERN_EMERG "%s: bad status with a sense rq: %p\n", __func__, rq);
 				goto out_end;
+			}
 			return ide_stopped;
 		}
 	}
@@ -667,8 +680,10 @@ out_end:
 		blk_end_request_all(rq, 0);
 		hwif->rq = NULL;
 	} else {
-		if (sense && uptodate)
+		if (sense && uptodate) {
+			printk(KERN_EMERG "%s: complete failed rq: %p\n", __func__, rq);
 			ide_cd_complete_failed_rq(drive, rq);
+		}
 
 		if (blk_fs_request(rq)) {
 			if (cmd->nleft == 0)
@@ -679,7 +694,10 @@ out_end:
 		}
 
 		if (uptodate == 0 && rq->bio)
-			ide_cd_error_cmd(drive, cmd);
+			if (ide_cd_error_cmd(drive, cmd)) {
+				printk(KERN_EMERG "ide_cd_error_cmd completes rq");
+				return ide_stopped;
+			}
 
 		/* make sure it's fully ended */
 		if (blk_fs_request(rq) == 0) {
@@ -688,10 +706,13 @@ out_end:
 				rq->resid_len += cmd->last_xfer_len;
 		}
 
+		printk(KERN_EMERG "%s: completing rq %p\n", __func__, rq);
 		ide_complete_rq(drive, uptodate ? 0 : -EIO, blk_rq_bytes(rq));
 
-		if (sense && rc == 2)
+		if (sense && rc == 2) {
+			printk(KERN_EMERG "%s: request sense failure, rq: %p\n", __func__, rq);
 			ide_error(drive, "request sense failure", stat);
+		}
 	}
 	return ide_stopped;
 }
@@ -1707,6 +1728,8 @@ static int ide_cd_probe(ide_drive_t *drive)
 	struct gendisk *g;
 	struct request_sense sense;
 
+	drive->debug_mask = 0xffffffff;
+
 	ide_debug_log(IDE_DBG_PROBE, "driver_req: %s, media: 0x%x",
 				     drive->driver_req, drive->media);
 
@@ -1716,7 +1739,6 @@ static int ide_cd_probe(ide_drive_t *drive)
 	if (drive->media != ide_cdrom && drive->media != ide_optical)
 		goto failed;
 
-	drive->debug_mask = debug_mask;
 	drive->irq_handler = cdrom_newpc_intr;
 
 	info = kzalloc(sizeof(struct cdrom_info), GFP_KERNEL);
diff --git a/drivers/ide/ide-cd.h b/drivers/ide/ide-cd.h
index 93a3cf1..613542a 100644
--- a/drivers/ide/ide-cd.h
+++ b/drivers/ide/ide-cd.h
@@ -8,7 +8,7 @@
 #include <linux/cdrom.h>
 #include <asm/byteorder.h>
 
-#define IDECD_DEBUG_LOG		0
+#define IDECD_DEBUG_LOG		1
 
 #if IDECD_DEBUG_LOG
 #define ide_debug_log(lvl, fmt, args...) __ide_debug_log(lvl, fmt, ## args)
diff --git a/drivers/ide/ide-io.c b/drivers/ide/ide-io.c
index 172ac92..c522435 100644
--- a/drivers/ide/ide-io.c
+++ b/drivers/ide/ide-io.c
@@ -126,8 +126,13 @@ int ide_complete_rq(ide_drive_t *drive, int error, unsigned int nr_bytes)
 		nr_bytes = blk_rq_sectors(rq) << 9;
 
 	rc = ide_end_rq(drive, rq, error, nr_bytes);
-	if (rc == 0)
+	if (rc == 0) {
+		printk(KERN_EMERG "ide_complete_rq: no buffers pending for this rq");
 		hwif->rq = NULL;
+	}
+	else
+		blk_dump_rq_flags(rq, "still buffers pending for this rq");
+
 
 	return rc;
 }