Message ID | 1327611520-18256-1-git-send-email-iws@ovro.caltech.edu (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
On Thu, Jan 26, 2012 at 2:58 PM, Ira W. Snyder <iws@ovro.caltech.edu> wrote: > The mpc8349ea has been observed to generate spurious end of segments > interrupts despite the fact that they are not enabled by this driver. > Check for them and ignore them to avoid a kernel error message. When this happens, are there any other status bits set? It seems weird that there are spurious interrupts from an internal block, especially since it's the same block on all 83xx parts. I wonder if the EOSI bit just happens to be set when the interrupt occurs for some other reason.
On Thu, Feb 16, 2012 at 05:50:47PM +0000, Tabi Timur-B04825 wrote: > On Thu, Jan 26, 2012 at 2:58 PM, Ira W. Snyder <iws@ovro.caltech.edu> wrote: > > The mpc8349ea has been observed to generate spurious end of segments > > interrupts despite the fact that they are not enabled by this driver. > > Check for them and ignore them to avoid a kernel error message. > > When this happens, are there any other status bits set? It seems > weird that there are spurious interrupts from an internal block, > especially since it's the same block on all 83xx parts. > > I wonder if the EOSI bit just happens to be set when the interrupt > occurs for some other reason. > I'm not sure. The fsldma irq handler only prints bits it did not handle. There are several other bits in the driver which should never be seen, but they are handled by the irq handler anyway. This is just a remnant from the original Freescale code. I have a set of 15 test boards that I can use to figure out which other bits are set when this happens, if it is important. I put a variation of this patch (missing the "skip tasklet if not idle" logic) into my production boards roughly a month ago. I've gotten the "controller not idle" error message 748 times, as compared to the "unhandled sr 0x00000002" message 3449 times. This leads me to believe that this occurs mostly (but not always) concurrent with the end-of-chain interrupt. In the last month, the "unhandled sr" error has occurred on 92 out of 120 boards in production use. The statistics are included below. On some boards, it is much more frequent than on others. All boards have roughly the same workload. Another interesting tidbit from my logs: this only occurs on DMA channel 2 (the are numbered starting at 0, it is the 3rd channel). Here is an example log message: [3484053.821689] of:fsl-elo-dma e00082a8.dma: chan2: irq: unhandled sr 0x00000002 Thanks, Ira 15 serial-number-5 1 serial-number-16 8 serial-number-18 16 serial-number-19 3 serial-number-20 21 serial-number-21 1 serial-number-24 1 serial-number-26 3 serial-number-27 2 serial-number-28 16 serial-number-29 4 serial-number-30 1 serial-number-31 4 serial-number-32 5 serial-number-33 1 serial-number-34 6 serial-number-35 18 serial-number-36 1 serial-number-39 1 serial-number-40 2 serial-number-41 10 serial-number-42 11 serial-number-43 32 serial-number-45 6 serial-number-46 4 serial-number-47 1 serial-number-49 6 serial-number-50 2 serial-number-51 4 serial-number-53 1 serial-number-55 1 serial-number-57 15 serial-number-58 1 serial-number-60 1 serial-number-62 1 serial-number-66 8 serial-number-67 2 serial-number-75 1 serial-number-76 11 serial-number-79 4 serial-number-80 8 serial-number-81 1 serial-number-82 11 serial-number-84 2 serial-number-92 20 serial-number-93 30 serial-number-94 19 serial-number-95 32 serial-number-96 73 serial-number-97 18 serial-number-99 57 serial-number-100 41 serial-number-101 28 serial-number-102 8 serial-number-103 132 serial-number-107 60 serial-number-108 55 serial-number-109 97 serial-number-110 18 serial-number-111 45 serial-number-113 6 serial-number-114 123 serial-number-115 27 serial-number-117 29 serial-number-118 12 serial-number-119 47 serial-number-120 74 serial-number-121 8 serial-number-124 128 serial-number-125 326 serial-number-128 84 serial-number-129 36 serial-number-130 2 serial-number-131 75 serial-number-133 64 serial-number-135 686 serial-number-137 97 serial-number-139 28 serial-number-140 82 serial-number-141 36 serial-number-144 31 serial-number-145 47 serial-number-147 60 serial-number-150 22 serial-number-152 36 serial-number-154 57 serial-number-156 68 serial-number-158 54 serial-number-159 37 serial-number-160 46 serial-number-161 14 serial-number-162
Ira W. Snyder wrote: > This leads me to believe that this occurs mostly (but not always) > concurrent with the end-of-chain interrupt. Have you tested this on an 85xx platform? I noticed something odd. You're modifying fsldma_chan_irq(), which is for DMA controllers that have per-channel IRQs. 83xx devices don't have per-channel IRQs -- all channels on one controller have the same IRQ. Looking at the device tree, I see that the IRQs are listed in the channel nodes *and* in the controller node. I don't see how we ever use the per-controller ISR. I wonder if the shared IRQ is the part of the cause of the interrupts you're seeing. > > In the last month, the "unhandled sr" error has occurred on 92 out of > 120 boards in production use. The statistics are included below. On some > boards, it is much more frequent than on others. All boards have roughly > the same workload. > > Another interesting tidbit from my logs: this only occurs on DMA channel > 2 (the are numbered starting at 0, it is the 3rd channel). Here is an > example log message: What happens if you never register that channel? That is, remove this node from the device tree: dma-channel@100 { compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel"; reg = <0x100 0x80>; cell-index = <2>; interrupt-parent = <&ipic>; interrupts = <71 8>; };
On Thu, Feb 16, 2012 at 01:34:00PM -0600, Timur Tabi wrote: > Ira W. Snyder wrote: > > This leads me to believe that this occurs mostly (but not always) > > concurrent with the end-of-chain interrupt. > > Have you tested this on an 85xx platform? > No. I don't have the ability to connect my P2020 up to an FPGA to recreate the DMA workload that causes this on my 8349EA. I can run the dmatest module, if you'd like. > I noticed something odd. You're modifying fsldma_chan_irq(), which is for > DMA controllers that have per-channel IRQs. 83xx devices don't have > per-channel IRQs -- all channels on one controller have the same IRQ. > Looking at the device tree, I see that the IRQs are listed in the channel > nodes *and* in the controller node. I don't see how we ever use the > per-controller ISR. > fsldma_ctrl_irq() (the per-controller irq handler) just calls through to fsldma_chan_irq() (the per-channel irq handler). > I wonder if the shared IRQ is the part of the cause of the interrupts > you're seeing. > My device tree is slightly modified to remove the per-controller interrupts and interrupt-parent properties. Each individual channel has identical interrupts and interrupt-parent properties specified. Someone here suggested that I do that, several years ago. It has been too long, and I do not remember who. I can reverse it, and use the per-controller IRQ instead. > > > > In the last month, the "unhandled sr" error has occurred on 92 out of > > 120 boards in production use. The statistics are included below. On some > > boards, it is much more frequent than on others. All boards have roughly > > the same workload. > > > > Another interesting tidbit from my logs: this only occurs on DMA channel > > 2 (the are numbered starting at 0, it is the 3rd channel). Here is an > > example log message: > > What happens if you never register that channel? That is, remove this > node from the device tree: > > dma-channel@100 { > compatible = "fsl,mpc8349-dma-channel", "fsl,elo-dma-channel"; > reg = <0x100 0x80>; > cell-index = <2>; > interrupt-parent = <&ipic>; > interrupts = <71 8>; > }; > I can try that. I hunch the problem will move, as the carma-fpga driver (see drivers/misc/carma/carma-fpga.c) will claim the 4th channel instead. Ira
Ira W. Snyder wrote: > No. I don't have the ability to connect my P2020 up to an FPGA to > recreate the DMA workload that causes this on my 8349EA. I can run the > dmatest module, if you'd like. I just want to make sure your patch doesn't break 85xx.
On Thu, Feb 16, 2012 at 01:48:20PM -0600, Timur Tabi wrote: > Ira W. Snyder wrote: > > > No. I don't have the ability to connect my P2020 up to an FPGA to > > recreate the DMA workload that causes this on my 8349EA. I can run the > > dmatest module, if you'd like. > > I just want to make sure your patch doesn't break 85xx. > I tried both with and without this patch on my P2020 COM Express board. With both kernels, the board locks up after 20 minutes or so, no messages to the serial console. I wouldn't be surprised if there are some memory problems with this board. In any case, I don't have any reason to believe that this patch causes any trouble: the board dies without it. However, the patch doesn't break DMA on 85xx. If I unload the dmatest module after 10 minutes or so, it claims to have passed many thousands of tests without problems. My 8349EA test boards (15 of them) have been running their normal DMA workload plus dmatest on the unused 4th channel, all without errors, for several hours. ~2.5 million successful tests per board, as I write this. Ira
diff --git a/drivers/dma/fsldma.c b/drivers/dma/fsldma.c index 8a78154..7dc9689 100644 --- a/drivers/dma/fsldma.c +++ b/drivers/dma/fsldma.c @@ -1052,6 +1052,16 @@ static irqreturn_t fsldma_chan_irq(int irq, void *data) stat &= ~FSL_DMA_SR_EOLNI; } + /* + * This driver does not use this feature, therefore we shouldn't + * ever see this bit set in the status register. However, it has + * been observed on MPC8349EA parts. + */ + if (stat & FSL_DMA_SR_EOSI) { + chan_dbg(chan, "irq: End-of-Segments INT\n"); + stat &= ~FSL_DMA_SR_EOSI; + } + /* check that the DMA controller is really idle */ if (!dma_is_idle(chan)) chan_err(chan, "irq: controller not idle!\n");
The mpc8349ea has been observed to generate spurious end of segments interrupts despite the fact that they are not enabled by this driver. Check for them and ignore them to avoid a kernel error message. Signed-off-by: Ira W. Snyder <iws@ovro.caltech.edu> Cc: Dan Williams <dan.j.williams@intel.com> --- drivers/dma/fsldma.c | 10 ++++++++++ 1 files changed, 10 insertions(+), 0 deletions(-)