Message ID | 1385470344-2542-1-git-send-email-ezequiel.garcia@free-electrons.com |
---|---|
State | Accepted |
Commit | 15b540c71cac840f0a3e8b1b4b7a773deb847ffb |
Headers | show |
On Tue, Nov 26, 2013 at 09:52:24AM -0300, Ezequiel Garcia wrote: > After the driver allocates all DMA resources, it sets "info->use_dma". > Therefore, we need to check that variable to decide which resources > needs to be freed, instead of the global use_dma variable. > > Without this change, when the device probe fails, the driver will try > to release unallocated DMA resources, with nasty results. > > Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> > --- > This is minor fix, but a fix anyway, so it should be queued for stable. > > drivers/mtd/nand/pxa3xx_nand.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/mtd/nand/pxa3xx_nand.c b/drivers/mtd/nand/pxa3xx_nand.c > index 97c3eb5..8f2104c 100644 > --- a/drivers/mtd/nand/pxa3xx_nand.c > +++ b/drivers/mtd/nand/pxa3xx_nand.c > @@ -1288,7 +1288,7 @@ static int pxa3xx_nand_init_buff(struct pxa3xx_nand_info *info) > static void pxa3xx_nand_free_buff(struct pxa3xx_nand_info *info) > { > struct platform_device *pdev = info->pdev; > - if (use_dma) { > + if (info->use_dma) { With this change, then the 'else' case will be executed and will kfree() the data buffer that we didn't allocate? kfree(info->data_buff); Fortunately kfree()'ing a NULL is a no-op. But this highlights some of the strangeness of the error handling in this driver. It has an atypical style, in which the probe routine calls the remove routine if it fails. Normally, the probe routine has teardown carefully planned in stages, whereas the remove routine just does it all in one go (since it can assume the device was fully initialized). (And of course, the devm_* functions can help with simplifying some of this...) But I suppose this patch is correct as-is, and this issue could be revisited later. For my reference, have you actually seen this bug in practice? I'm not sure how well this fits in the -stable rules, if it hasn't been observed and tested appropriately. > pxa_free_dma(info->data_dma_ch); > dma_free_coherent(&pdev->dev, info->buf_size, > info->data_buff, info->data_buff_phys); Brian
On Tue, Nov 26, 2013 at 11:58:19PM -0800, Brian Norris wrote: > On Tue, Nov 26, 2013 at 09:52:24AM -0300, Ezequiel Garcia wrote: > > After the driver allocates all DMA resources, it sets "info->use_dma". > > Therefore, we need to check that variable to decide which resources > > needs to be freed, instead of the global use_dma variable. > > > > Without this change, when the device probe fails, the driver will try > > to release unallocated DMA resources, with nasty results. > > > > Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> > > --- > > This is minor fix, but a fix anyway, so it should be queued for stable. > > > > drivers/mtd/nand/pxa3xx_nand.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/mtd/nand/pxa3xx_nand.c b/drivers/mtd/nand/pxa3xx_nand.c > > index 97c3eb5..8f2104c 100644 > > --- a/drivers/mtd/nand/pxa3xx_nand.c > > +++ b/drivers/mtd/nand/pxa3xx_nand.c > > @@ -1288,7 +1288,7 @@ static int pxa3xx_nand_init_buff(struct pxa3xx_nand_info *info) > > static void pxa3xx_nand_free_buff(struct pxa3xx_nand_info *info) > > { > > struct platform_device *pdev = info->pdev; > > - if (use_dma) { > > + if (info->use_dma) { > > With this change, then the 'else' case will be executed and will kfree() > the data buffer that we didn't allocate? > > kfree(info->data_buff); > > Fortunately kfree()'ing a NULL is a no-op. > No. The buffer is allocated at a very early stage in DMA and non-DMA modes. It's later reallocated, but in any case when pxa3xx_nand_free_buff() is called, the buffer can never be NULL. The situation this commit fixes is that when the cleanup is called from the probe(), the DMA resources haven't been allocated yet and so the release must be kfree(). Remember we've recently changed this driver to work in non-DMA mode until the page size is correctly detected. Until then, info->use_dma is false. > But this highlights some of the strangeness of the error handling in > this driver. It has an atypical style, in which the probe routine calls > the remove routine if it fails. Normally, the probe routine has teardown > carefully planned in stages, whereas the remove routine just does it all > in one go (since it can assume the device was fully initialized). (And > of course, the devm_* functions can help with simplifying some of > this...) > Yes, I know. The driver is very very old, and some parts of it need some love. However, the error handling is not so bad. When pxa3xx_nand_remove() is called, all resources are guaranteed to be allocated so it works like a full-fledged release. I'm actually more annoyed by the fact there's two variables with the same name: use_dma, and info->use_dma. > But I suppose this patch is correct as-is, and this issue could be > revisited later. > > For my reference, have you actually seen this bug in practice? I'm not > sure how well this fits in the -stable rules, if it hasn't been observed > and tested appropriately. > Yes, I've seen this bug in practice, or otherwise wouldn't notice such small typo :-) The bug was triggered in PXA platforms, when the device cannot be identified (nand_scan fails). It's a very rare condition, but it's a real bug. Is that enough for -stable?
Brian, On Wed, Nov 27, 2013 at 08:34:05AM -0300, Ezequiel Garcia wrote: > On Tue, Nov 26, 2013 at 11:58:19PM -0800, Brian Norris wrote: > > On Tue, Nov 26, 2013 at 09:52:24AM -0300, Ezequiel Garcia wrote: > > > After the driver allocates all DMA resources, it sets "info->use_dma". > > > Therefore, we need to check that variable to decide which resources > > > needs to be freed, instead of the global use_dma variable. > > > > > > Without this change, when the device probe fails, the driver will try > > > to release unallocated DMA resources, with nasty results. > > > > > > Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> > > > --- > > > This is minor fix, but a fix anyway, so it should be queued for stable. > > > > > > drivers/mtd/nand/pxa3xx_nand.c | 2 +- > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > diff --git a/drivers/mtd/nand/pxa3xx_nand.c b/drivers/mtd/nand/pxa3xx_nand.c > > > index 97c3eb5..8f2104c 100644 > > > --- a/drivers/mtd/nand/pxa3xx_nand.c > > > +++ b/drivers/mtd/nand/pxa3xx_nand.c > > > @@ -1288,7 +1288,7 @@ static int pxa3xx_nand_init_buff(struct pxa3xx_nand_info *info) > > > static void pxa3xx_nand_free_buff(struct pxa3xx_nand_info *info) > > > { > > > struct platform_device *pdev = info->pdev; > > > - if (use_dma) { > > > + if (info->use_dma) { > > > > With this change, then the 'else' case will be executed and will kfree() > > the data buffer that we didn't allocate? > > > > kfree(info->data_buff); > > > > Fortunately kfree()'ing a NULL is a no-op. > > > > No. The buffer is allocated at a very early stage in DMA and non-DMA modes. > It's later reallocated, but in any case when pxa3xx_nand_free_buff() > is called, the buffer can never be NULL. > > The situation this commit fixes is that when the cleanup is called from > the probe(), the DMA resources haven't been allocated yet and so the > release must be kfree(). > > Remember we've recently changed this driver to work in non-DMA mode > until the page size is correctly detected. Until then, info->use_dma is > false. > > > But this highlights some of the strangeness of the error handling in > > this driver. It has an atypical style, in which the probe routine calls > > the remove routine if it fails. Normally, the probe routine has teardown > > carefully planned in stages, whereas the remove routine just does it all > > in one go (since it can assume the device was fully initialized). (And > > of course, the devm_* functions can help with simplifying some of > > this...) > > > > Yes, I know. The driver is very very old, and some parts of it need some > love. However, the error handling is not so bad. When > pxa3xx_nand_remove() is called, all resources are guaranteed to be > allocated so it works like a full-fledged release. > > I'm actually more annoyed by the fact there's two variables with the > same name: use_dma, and info->use_dma. > > > But I suppose this patch is correct as-is, and this issue could be > > revisited later. > > > > For my reference, have you actually seen this bug in practice? I'm not > > sure how well this fits in the -stable rules, if it hasn't been observed > > and tested appropriately. > > > > Yes, I've seen this bug in practice, or otherwise wouldn't notice such small > typo :-) The bug was triggered in PXA platforms, when the device cannot > be identified (nand_scan fails). > > It's a very rare condition, but it's a real bug. Is that enough for -stable? Just a gentle ping, in case this one felt through the cracks.
> Brian, > > On Wed, Nov 27, 2013 at 08:34:05AM -0300, Ezequiel Garcia wrote: > > On Tue, Nov 26, 2013 at 11:58:19PM -0800, Brian Norris wrote: > > > On Tue, Nov 26, 2013 at 09:52:24AM -0300, Ezequiel Garcia wrote: > > > > After the driver allocates all DMA resources, it sets "info->use_dma". > > > > Therefore, we need to check that variable to decide which resources > > > > needs to be freed, instead of the global use_dma variable. > > > > > > > > Without this change, when the device probe fails, the driver will try > > > > to release unallocated DMA resources, with nasty results. > > > > > > > > Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> > > > > --- > > > > This is minor fix, but a fix anyway, so it should be queued for stable. > > > > > > > > drivers/mtd/nand/pxa3xx_nand.c | 2 +- > > > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > > > > > diff --git a/drivers/mtd/nand/pxa3xx_nand.c b/drivers/mtd/nand/pxa3xx_nand.c > > > > index 97c3eb5..8f2104c 100644 > > > > --- a/drivers/mtd/nand/pxa3xx_nand.c > > > > +++ b/drivers/mtd/nand/pxa3xx_nand.c > > > > @@ -1288,7 +1288,7 @@ static int pxa3xx_nand_init_buff(struct pxa3xx_nand_info *info) > > > > static void pxa3xx_nand_free_buff(struct pxa3xx_nand_info *info) > > > > { > > > > struct platform_device *pdev = info->pdev; > > > > - if (use_dma) { > > > > + if (info->use_dma) { > > > > > > With this change, then the 'else' case will be executed and will kfree() > > > the data buffer that we didn't allocate? > > > > > > kfree(info->data_buff); > > > > > > Fortunately kfree()'ing a NULL is a no-op. > > > > > > > No. The buffer is allocated at a very early stage in DMA and non-DMA modes. > > It's later reallocated, but in any case when pxa3xx_nand_free_buff() > > is called, the buffer can never be NULL. > > > > The situation this commit fixes is that when the cleanup is called from > > the probe(), the DMA resources haven't been allocated yet and so the > > release must be kfree(). > > > > Remember we've recently changed this driver to work in non-DMA mode > > until the page size is correctly detected. Until then, info->use_dma is > > false. > > > > > But this highlights some of the strangeness of the error handling in > > > this driver. It has an atypical style, in which the probe routine calls > > > the remove routine if it fails. Normally, the probe routine has teardown > > > carefully planned in stages, whereas the remove routine just does it all > > > in one go (since it can assume the device was fully initialized). (And > > > of course, the devm_* functions can help with simplifying some of > > > this...) > > > > > > > Yes, I know. The driver is very very old, and some parts of it need some > > love. However, the error handling is not so bad. When > > pxa3xx_nand_remove() is called, all resources are guaranteed to be > > allocated so it works like a full-fledged release. > > > > I'm actually more annoyed by the fact there's two variables with the > > same name: use_dma, and info->use_dma. > > > > > But I suppose this patch is correct as-is, and this issue could be > > > revisited later. > > > > > > For my reference, have you actually seen this bug in practice? I'm not > > > sure how well this fits in the -stable rules, if it hasn't been observed > > > and tested appropriately. > > > > > > > Yes, I've seen this bug in practice, or otherwise wouldn't notice such small > > typo :-) The bug was triggered in PXA platforms, when the device cannot > > be identified (nand_scan fails). > > > > It's a very rare condition, but it's a real bug. Is that enough for -stable? > > Just a gentle ping, in case this one felt through the cracks. > I'm sorry to bother by pinging *again*, but I think this is a fix for v3.13.
Sorry this one fell through a bit. On Wed, Nov 27, 2013 at 08:34:06AM -0300, Ezequiel Garcia wrote: > On Tue, Nov 26, 2013 at 11:58:19PM -0800, Brian Norris wrote: > > For my reference, have you actually seen this bug in practice? I'm not > > sure how well this fits in the -stable rules, if it hasn't been observed > > and tested appropriately. > > Yes, I've seen this bug in practice, or otherwise wouldn't notice such small > typo :-) The bug was triggered in PXA platforms, when the device cannot > be identified (nand_scan fails). > > It's a very rare condition, but it's a real bug. Is that enough for -stable? Yes, this qualifies just fine. I'm not quite sure how far back this can go on stable, though, as we have a lot of churn in pxa3xx_nand.c. It looks like the bug is long-standing, but we may need to port it for older releases. Let me know what you want to do here. BTW, am I reading something wrong or is pxa3xx_nand.c broken for multi-CS systems? It seems like each chip's pxa3xx_nand_scan() will take turns overwriting info->buf_size and info->data_buff, leaking memory in the process. I guess the driver's designed to only use one chip at a time? Brian
> On Wed, Nov 27, 2013 at 08:34:06AM -0300, Ezequiel Garcia wrote: > > On Tue, Nov 26, 2013 at 11:58:19PM -0800, Brian Norris wrote: > > > > It's a very rare condition, but it's a real bug. Is that enough for -stable? > > Yes, this qualifies just fine. I'm not quite sure how far back this can > go on stable, though, as we have a lot of churn in pxa3xx_nand.c. > It looks like the bug is long-standing, but we may need to port it for > older releases. Let me know what you want to do here. > No, it's not. This was actually introduced by me when we added the non-DMA initial probing to allocate a buffer based on the page size. Namely, I *think* this commit introduces the bug (haven't actually tested it): commit 62e8b851783138a11da63285be0fbf69530ff73d Author: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> Date: Fri Oct 4 15:30:38 2013 -0300 mtd: nand: pxa3xx: Allocate data buffer on detected flash size This commit changed the way we allocate the buffer: the first READ_ID is issued with a small kmalloc'ed buffer and only later DMA buffer are allocated. So the actual decision must now be made based on 'info->use_dma' instead of module parameter 'use_dma' (this also uncovers a real crappy naming issue in this driver, which I should have changed long long ago). Therefore, this doesn't hit stable, and it's just v3.13 material; sorry for the confusion. And to be fair, the commit log could much better. Right now it looks a bit laconic. Want me to resend a more verbose one? > BTW, am I reading something wrong or is pxa3xx_nand.c broken for > multi-CS systems? It seems like each chip's pxa3xx_nand_scan() will take > turns overwriting info->buf_size and info->data_buff, leaking memory in > the process. I guess the driver's designed to only use one chip at a > time? > Glad you noticed. I've been having problems with this since a long time. The multi-chip feature was added a long time ago, and the last time I checked it also looked quite broken. However, I don't have any HW to test, nor I know anyone else that can help here. I've been tempted to remove it, but I wasn't convinced either. My best solution was to always work with extra care and touch the relevant multi-cs code in the least intrusive way. FWIW, there's not a single current user in the kernel for that, given 'num_cs' is always set to '1' in board files.
On Mon, Dec 09, 2013 at 07:32:59PM -0300, Ezequiel Garcia wrote: > > On Wed, Nov 27, 2013 at 08:34:06AM -0300, Ezequiel Garcia wrote: > > > On Tue, Nov 26, 2013 at 11:58:19PM -0800, Brian Norris wrote: > > > > > > It's a very rare condition, but it's a real bug. Is that enough for -stable? > > > > Yes, this qualifies just fine. I'm not quite sure how far back this can > > go on stable, though, as we have a lot of churn in pxa3xx_nand.c. > > It looks like the bug is long-standing, but we may need to port it for > > older releases. Let me know what you want to do here. > > > > No, it's not. This was actually introduced by me when we added the > non-DMA initial probing to allocate a buffer based on the page size. > > Namely, I *think* this commit introduces the bug (haven't actually > tested it): > > commit 62e8b851783138a11da63285be0fbf69530ff73d > Author: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> > Date: Fri Oct 4 15:30:38 2013 -0300 > > mtd: nand: pxa3xx: Allocate data buffer on detected flash size > > This commit changed the way we allocate the buffer: the first READ_ID > is issued with a small kmalloc'ed buffer and only later DMA buffer are > allocated. Yes, that commit makes sense as the introduction of the bug. So no need for -stable, then. > So the actual decision must now be made based on 'info->use_dma' instead > of module parameter 'use_dma' (this also uncovers a real crappy naming issue > in this driver, which I should have changed long long ago). > > Therefore, this doesn't hit stable, and it's just v3.13 material; sorry > for the confusion. And to be fair, the commit log could much better. > Right now it looks a bit laconic. Want me to resend a more verbose one? Yes, please. Brian
On Mon, Dec 09, 2013 at 03:10:28PM -0800, Brian Norris wrote: > On Mon, Dec 09, 2013 at 07:32:59PM -0300, Ezequiel Garcia wrote: [..] > > > > Therefore, this doesn't hit stable, and it's just v3.13 material; sorry > > for the confusion. And to be fair, the commit log could much better. > > Right now it looks a bit laconic. Want me to resend a more verbose one? > > Yes, please. > OK, I've just sent this one with a better commit message, together with a resend of the "need_wait" fix Arnaud and I were hunting recently. I think both are fine for v3.13.
diff --git a/drivers/mtd/nand/pxa3xx_nand.c b/drivers/mtd/nand/pxa3xx_nand.c index 97c3eb5..8f2104c 100644 --- a/drivers/mtd/nand/pxa3xx_nand.c +++ b/drivers/mtd/nand/pxa3xx_nand.c @@ -1288,7 +1288,7 @@ static int pxa3xx_nand_init_buff(struct pxa3xx_nand_info *info) static void pxa3xx_nand_free_buff(struct pxa3xx_nand_info *info) { struct platform_device *pdev = info->pdev; - if (use_dma) { + if (info->use_dma) { pxa_free_dma(info->data_dma_ch); dma_free_coherent(&pdev->dev, info->buf_size, info->data_buff, info->data_buff_phys);
After the driver allocates all DMA resources, it sets "info->use_dma". Therefore, we need to check that variable to decide which resources needs to be freed, instead of the global use_dma variable. Without this change, when the device probe fails, the driver will try to release unallocated DMA resources, with nasty results. Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com> --- This is minor fix, but a fix anyway, so it should be queued for stable. drivers/mtd/nand/pxa3xx_nand.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)