diff mbox

mtd: nand: pxa3xx: Use info->use_dma to release DMA resources

Message ID 1385470344-2542-1-git-send-email-ezequiel.garcia@free-electrons.com
State Accepted
Commit 15b540c71cac840f0a3e8b1b4b7a773deb847ffb
Headers show

Commit Message

Ezequiel Garcia Nov. 26, 2013, 12:52 p.m. UTC
After the driver allocates all DMA resources, it sets "info->use_dma".
Therefore, we need to check that variable to decide which resources
needs to be freed, instead of the global use_dma variable.

Without this change, when the device probe fails, the driver will try
to release unallocated DMA resources, with nasty results.

Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
---
This is minor fix, but a fix anyway, so it should be queued for stable.

 drivers/mtd/nand/pxa3xx_nand.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Brian Norris Nov. 27, 2013, 7:58 a.m. UTC | #1
On Tue, Nov 26, 2013 at 09:52:24AM -0300, Ezequiel Garcia wrote:
> After the driver allocates all DMA resources, it sets "info->use_dma".
> Therefore, we need to check that variable to decide which resources
> needs to be freed, instead of the global use_dma variable.
> 
> Without this change, when the device probe fails, the driver will try
> to release unallocated DMA resources, with nasty results.
> 
> Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
> ---
> This is minor fix, but a fix anyway, so it should be queued for stable.
> 
>  drivers/mtd/nand/pxa3xx_nand.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/mtd/nand/pxa3xx_nand.c b/drivers/mtd/nand/pxa3xx_nand.c
> index 97c3eb5..8f2104c 100644
> --- a/drivers/mtd/nand/pxa3xx_nand.c
> +++ b/drivers/mtd/nand/pxa3xx_nand.c
> @@ -1288,7 +1288,7 @@ static int pxa3xx_nand_init_buff(struct pxa3xx_nand_info *info)
>  static void pxa3xx_nand_free_buff(struct pxa3xx_nand_info *info)
>  {
>  	struct platform_device *pdev = info->pdev;
> -	if (use_dma) {
> +	if (info->use_dma) {

With this change, then the 'else' case will be executed and will kfree()
the data buffer that we didn't allocate?

  kfree(info->data_buff);

Fortunately kfree()'ing a NULL is a no-op.

But this highlights some of the strangeness of the error handling in
this driver. It has an atypical style, in which the probe routine calls
the remove routine if it fails. Normally, the probe routine has teardown
carefully planned in stages, whereas the remove routine just does it all
in one go (since it can assume the device was fully initialized). (And
of course, the devm_* functions can help with simplifying some of
this...)

But I suppose this patch is correct as-is, and this issue could be
revisited later.

For my reference, have you actually seen this bug in practice? I'm not
sure how well this fits in the -stable rules, if it hasn't been observed
and tested appropriately.

>  		pxa_free_dma(info->data_dma_ch);
>  		dma_free_coherent(&pdev->dev, info->buf_size,
>  				  info->data_buff, info->data_buff_phys);

Brian
Ezequiel Garcia Nov. 27, 2013, 11:34 a.m. UTC | #2
On Tue, Nov 26, 2013 at 11:58:19PM -0800, Brian Norris wrote:
> On Tue, Nov 26, 2013 at 09:52:24AM -0300, Ezequiel Garcia wrote:
> > After the driver allocates all DMA resources, it sets "info->use_dma".
> > Therefore, we need to check that variable to decide which resources
> > needs to be freed, instead of the global use_dma variable.
> > 
> > Without this change, when the device probe fails, the driver will try
> > to release unallocated DMA resources, with nasty results.
> > 
> > Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
> > ---
> > This is minor fix, but a fix anyway, so it should be queued for stable.
> > 
> >  drivers/mtd/nand/pxa3xx_nand.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/mtd/nand/pxa3xx_nand.c b/drivers/mtd/nand/pxa3xx_nand.c
> > index 97c3eb5..8f2104c 100644
> > --- a/drivers/mtd/nand/pxa3xx_nand.c
> > +++ b/drivers/mtd/nand/pxa3xx_nand.c
> > @@ -1288,7 +1288,7 @@ static int pxa3xx_nand_init_buff(struct pxa3xx_nand_info *info)
> >  static void pxa3xx_nand_free_buff(struct pxa3xx_nand_info *info)
> >  {
> >  	struct platform_device *pdev = info->pdev;
> > -	if (use_dma) {
> > +	if (info->use_dma) {
> 
> With this change, then the 'else' case will be executed and will kfree()
> the data buffer that we didn't allocate?
> 
>   kfree(info->data_buff);
> 
> Fortunately kfree()'ing a NULL is a no-op.
> 

No. The buffer is allocated at a very early stage in DMA and non-DMA modes.
It's later reallocated, but in any case when pxa3xx_nand_free_buff()
is called, the buffer can never be NULL.

The situation this commit fixes is that when the cleanup is called from
the probe(), the DMA resources haven't been allocated yet and so the
release must be kfree().

Remember we've recently changed this driver to work in non-DMA mode
until the page size is correctly detected. Until then, info->use_dma is
false.

> But this highlights some of the strangeness of the error handling in
> this driver. It has an atypical style, in which the probe routine calls
> the remove routine if it fails. Normally, the probe routine has teardown
> carefully planned in stages, whereas the remove routine just does it all
> in one go (since it can assume the device was fully initialized). (And
> of course, the devm_* functions can help with simplifying some of
> this...)
> 

Yes, I know. The driver is very very old, and some parts of it need some
love. However, the error handling is not so bad. When
pxa3xx_nand_remove() is called, all resources are guaranteed to be
allocated so it works like a full-fledged release.

I'm actually more annoyed by the fact there's two variables with the
same name: use_dma, and info->use_dma.

> But I suppose this patch is correct as-is, and this issue could be
> revisited later.
> 
> For my reference, have you actually seen this bug in practice? I'm not
> sure how well this fits in the -stable rules, if it hasn't been observed
> and tested appropriately.
> 

Yes, I've seen this bug in practice, or otherwise wouldn't notice such small
typo :-) The bug was triggered in PXA platforms, when the device cannot
be identified (nand_scan fails).

It's a very rare condition, but it's a real bug. Is that enough for -stable?
Ezequiel Garcia Dec. 6, 2013, 10:53 a.m. UTC | #3
Brian,

On Wed, Nov 27, 2013 at 08:34:05AM -0300, Ezequiel Garcia wrote:
> On Tue, Nov 26, 2013 at 11:58:19PM -0800, Brian Norris wrote:
> > On Tue, Nov 26, 2013 at 09:52:24AM -0300, Ezequiel Garcia wrote:
> > > After the driver allocates all DMA resources, it sets "info->use_dma".
> > > Therefore, we need to check that variable to decide which resources
> > > needs to be freed, instead of the global use_dma variable.
> > > 
> > > Without this change, when the device probe fails, the driver will try
> > > to release unallocated DMA resources, with nasty results.
> > > 
> > > Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
> > > ---
> > > This is minor fix, but a fix anyway, so it should be queued for stable.
> > > 
> > >  drivers/mtd/nand/pxa3xx_nand.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/mtd/nand/pxa3xx_nand.c b/drivers/mtd/nand/pxa3xx_nand.c
> > > index 97c3eb5..8f2104c 100644
> > > --- a/drivers/mtd/nand/pxa3xx_nand.c
> > > +++ b/drivers/mtd/nand/pxa3xx_nand.c
> > > @@ -1288,7 +1288,7 @@ static int pxa3xx_nand_init_buff(struct pxa3xx_nand_info *info)
> > >  static void pxa3xx_nand_free_buff(struct pxa3xx_nand_info *info)
> > >  {
> > >  	struct platform_device *pdev = info->pdev;
> > > -	if (use_dma) {
> > > +	if (info->use_dma) {
> > 
> > With this change, then the 'else' case will be executed and will kfree()
> > the data buffer that we didn't allocate?
> > 
> >   kfree(info->data_buff);
> > 
> > Fortunately kfree()'ing a NULL is a no-op.
> > 
> 
> No. The buffer is allocated at a very early stage in DMA and non-DMA modes.
> It's later reallocated, but in any case when pxa3xx_nand_free_buff()
> is called, the buffer can never be NULL.
> 
> The situation this commit fixes is that when the cleanup is called from
> the probe(), the DMA resources haven't been allocated yet and so the
> release must be kfree().
> 
> Remember we've recently changed this driver to work in non-DMA mode
> until the page size is correctly detected. Until then, info->use_dma is
> false.
> 
> > But this highlights some of the strangeness of the error handling in
> > this driver. It has an atypical style, in which the probe routine calls
> > the remove routine if it fails. Normally, the probe routine has teardown
> > carefully planned in stages, whereas the remove routine just does it all
> > in one go (since it can assume the device was fully initialized). (And
> > of course, the devm_* functions can help with simplifying some of
> > this...)
> > 
> 
> Yes, I know. The driver is very very old, and some parts of it need some
> love. However, the error handling is not so bad. When
> pxa3xx_nand_remove() is called, all resources are guaranteed to be
> allocated so it works like a full-fledged release.
> 
> I'm actually more annoyed by the fact there's two variables with the
> same name: use_dma, and info->use_dma.
> 
> > But I suppose this patch is correct as-is, and this issue could be
> > revisited later.
> > 
> > For my reference, have you actually seen this bug in practice? I'm not
> > sure how well this fits in the -stable rules, if it hasn't been observed
> > and tested appropriately.
> > 
> 
> Yes, I've seen this bug in practice, or otherwise wouldn't notice such small
> typo :-) The bug was triggered in PXA platforms, when the device cannot
> be identified (nand_scan fails).
> 
> It's a very rare condition, but it's a real bug. Is that enough for -stable?

Just a gentle ping, in case this one felt through the cracks.
Ezequiel Garcia Dec. 9, 2013, 9:20 p.m. UTC | #4
> Brian,
> 
> On Wed, Nov 27, 2013 at 08:34:05AM -0300, Ezequiel Garcia wrote:
> > On Tue, Nov 26, 2013 at 11:58:19PM -0800, Brian Norris wrote:
> > > On Tue, Nov 26, 2013 at 09:52:24AM -0300, Ezequiel Garcia wrote:
> > > > After the driver allocates all DMA resources, it sets "info->use_dma".
> > > > Therefore, we need to check that variable to decide which resources
> > > > needs to be freed, instead of the global use_dma variable.
> > > > 
> > > > Without this change, when the device probe fails, the driver will try
> > > > to release unallocated DMA resources, with nasty results.
> > > > 
> > > > Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
> > > > ---
> > > > This is minor fix, but a fix anyway, so it should be queued for stable.
> > > > 
> > > >  drivers/mtd/nand/pxa3xx_nand.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/mtd/nand/pxa3xx_nand.c b/drivers/mtd/nand/pxa3xx_nand.c
> > > > index 97c3eb5..8f2104c 100644
> > > > --- a/drivers/mtd/nand/pxa3xx_nand.c
> > > > +++ b/drivers/mtd/nand/pxa3xx_nand.c
> > > > @@ -1288,7 +1288,7 @@ static int pxa3xx_nand_init_buff(struct pxa3xx_nand_info *info)
> > > >  static void pxa3xx_nand_free_buff(struct pxa3xx_nand_info *info)
> > > >  {
> > > >  	struct platform_device *pdev = info->pdev;
> > > > -	if (use_dma) {
> > > > +	if (info->use_dma) {
> > > 
> > > With this change, then the 'else' case will be executed and will kfree()
> > > the data buffer that we didn't allocate?
> > > 
> > >   kfree(info->data_buff);
> > > 
> > > Fortunately kfree()'ing a NULL is a no-op.
> > > 
> > 
> > No. The buffer is allocated at a very early stage in DMA and non-DMA modes.
> > It's later reallocated, but in any case when pxa3xx_nand_free_buff()
> > is called, the buffer can never be NULL.
> > 
> > The situation this commit fixes is that when the cleanup is called from
> > the probe(), the DMA resources haven't been allocated yet and so the
> > release must be kfree().
> > 
> > Remember we've recently changed this driver to work in non-DMA mode
> > until the page size is correctly detected. Until then, info->use_dma is
> > false.
> > 
> > > But this highlights some of the strangeness of the error handling in
> > > this driver. It has an atypical style, in which the probe routine calls
> > > the remove routine if it fails. Normally, the probe routine has teardown
> > > carefully planned in stages, whereas the remove routine just does it all
> > > in one go (since it can assume the device was fully initialized). (And
> > > of course, the devm_* functions can help with simplifying some of
> > > this...)
> > > 
> > 
> > Yes, I know. The driver is very very old, and some parts of it need some
> > love. However, the error handling is not so bad. When
> > pxa3xx_nand_remove() is called, all resources are guaranteed to be
> > allocated so it works like a full-fledged release.
> > 
> > I'm actually more annoyed by the fact there's two variables with the
> > same name: use_dma, and info->use_dma.
> > 
> > > But I suppose this patch is correct as-is, and this issue could be
> > > revisited later.
> > > 
> > > For my reference, have you actually seen this bug in practice? I'm not
> > > sure how well this fits in the -stable rules, if it hasn't been observed
> > > and tested appropriately.
> > > 
> > 
> > Yes, I've seen this bug in practice, or otherwise wouldn't notice such small
> > typo :-) The bug was triggered in PXA platforms, when the device cannot
> > be identified (nand_scan fails).
> > 
> > It's a very rare condition, but it's a real bug. Is that enough for -stable?
> 
> Just a gentle ping, in case this one felt through the cracks.
> 

I'm sorry to bother by pinging *again*, but I think this is a fix for v3.13.
Brian Norris Dec. 9, 2013, 9:55 p.m. UTC | #5
Sorry this one fell through a bit.

On Wed, Nov 27, 2013 at 08:34:06AM -0300, Ezequiel Garcia wrote:
> On Tue, Nov 26, 2013 at 11:58:19PM -0800, Brian Norris wrote:
> > For my reference, have you actually seen this bug in practice? I'm not
> > sure how well this fits in the -stable rules, if it hasn't been observed
> > and tested appropriately.
> 
> Yes, I've seen this bug in practice, or otherwise wouldn't notice such small
> typo :-) The bug was triggered in PXA platforms, when the device cannot
> be identified (nand_scan fails).
> 
> It's a very rare condition, but it's a real bug. Is that enough for -stable?

Yes, this qualifies just fine. I'm not quite sure how far back this can
go on stable, though, as we have a lot of churn in pxa3xx_nand.c. It
looks like the bug is long-standing, but we may need to port it for
older releases. Let me know what you want to do here.

BTW, am I reading something wrong or is pxa3xx_nand.c broken for
multi-CS systems? It seems like each chip's pxa3xx_nand_scan() will take
turns overwriting info->buf_size and info->data_buff, leaking memory in
the process. I guess the driver's designed to only use one chip at a
time?

Brian
Ezequiel Garcia Dec. 9, 2013, 10:32 p.m. UTC | #6
> On Wed, Nov 27, 2013 at 08:34:06AM -0300, Ezequiel Garcia wrote:
> > On Tue, Nov 26, 2013 at 11:58:19PM -0800, Brian Norris wrote:
> >
> > It's a very rare condition, but it's a real bug. Is that enough for -stable?
> 
> Yes, this qualifies just fine. I'm not quite sure how far back this can
> go on stable, though, as we have a lot of churn in pxa3xx_nand.c.
> It looks like the bug is long-standing, but we may need to port it for
> older releases. Let me know what you want to do here.
> 

No, it's not. This was actually introduced by me when we added the
non-DMA initial probing to allocate a buffer based on the page size.

Namely, I *think* this commit introduces the bug (haven't actually
tested it):

commit 62e8b851783138a11da63285be0fbf69530ff73d
Author: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Date:   Fri Oct 4 15:30:38 2013 -0300

  mtd: nand: pxa3xx: Allocate data buffer on detected flash size

This commit changed the way we allocate the buffer: the first READ_ID
is issued with a small kmalloc'ed buffer and only later DMA buffer are
allocated.

So the actual decision must now be made based on 'info->use_dma' instead
of module parameter 'use_dma' (this also uncovers a real crappy naming issue
in this driver, which I should have changed long long ago).

Therefore, this doesn't hit stable, and it's just v3.13 material; sorry
for the confusion. And to be fair, the commit log could much better.
Right now it looks a bit laconic. Want me to resend a more verbose one?

> BTW, am I reading something wrong or is pxa3xx_nand.c broken for
> multi-CS systems? It seems like each chip's pxa3xx_nand_scan() will take
> turns overwriting info->buf_size and info->data_buff, leaking memory in
> the process. I guess the driver's designed to only use one chip at a
> time?
> 

Glad you noticed. I've been having problems with this since a long time.
The multi-chip feature was added a long time ago, and the last time I
checked it also looked quite broken. However, I don't have any HW to
test, nor I know anyone else that can help here.

I've been tempted to remove it, but I wasn't convinced either. My best
solution was to always work with extra care and touch the relevant multi-cs
code in the least intrusive way.

FWIW, there's not a single current user in the kernel for that, given
'num_cs' is always set to '1' in board files.
Brian Norris Dec. 9, 2013, 11:10 p.m. UTC | #7
On Mon, Dec 09, 2013 at 07:32:59PM -0300, Ezequiel Garcia wrote:
> > On Wed, Nov 27, 2013 at 08:34:06AM -0300, Ezequiel Garcia wrote:
> > > On Tue, Nov 26, 2013 at 11:58:19PM -0800, Brian Norris wrote:
> > >
> > > It's a very rare condition, but it's a real bug. Is that enough for -stable?
> > 
> > Yes, this qualifies just fine. I'm not quite sure how far back this can
> > go on stable, though, as we have a lot of churn in pxa3xx_nand.c.
> > It looks like the bug is long-standing, but we may need to port it for
> > older releases. Let me know what you want to do here.
> > 
> 
> No, it's not. This was actually introduced by me when we added the
> non-DMA initial probing to allocate a buffer based on the page size.
> 
> Namely, I *think* this commit introduces the bug (haven't actually
> tested it):
> 
> commit 62e8b851783138a11da63285be0fbf69530ff73d
> Author: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
> Date:   Fri Oct 4 15:30:38 2013 -0300
> 
>   mtd: nand: pxa3xx: Allocate data buffer on detected flash size
> 
> This commit changed the way we allocate the buffer: the first READ_ID
> is issued with a small kmalloc'ed buffer and only later DMA buffer are
> allocated.

Yes, that commit makes sense as the introduction of the bug. So no need
for -stable, then.

> So the actual decision must now be made based on 'info->use_dma' instead
> of module parameter 'use_dma' (this also uncovers a real crappy naming issue
> in this driver, which I should have changed long long ago).
> 
> Therefore, this doesn't hit stable, and it's just v3.13 material; sorry
> for the confusion. And to be fair, the commit log could much better.
> Right now it looks a bit laconic. Want me to resend a more verbose one?

Yes, please.

Brian
Ezequiel Garcia Dec. 10, 2013, 12:59 p.m. UTC | #8
On Mon, Dec 09, 2013 at 03:10:28PM -0800, Brian Norris wrote:
> On Mon, Dec 09, 2013 at 07:32:59PM -0300, Ezequiel Garcia wrote:
[..]
> > 
> > Therefore, this doesn't hit stable, and it's just v3.13 material; sorry
> > for the confusion. And to be fair, the commit log could much better.
> > Right now it looks a bit laconic. Want me to resend a more verbose one?
> 
> Yes, please.
> 

OK, I've just sent this one with a better commit message, together with
a resend of the "need_wait" fix Arnaud and I were hunting recently.

I think both are fine for v3.13.
diff mbox

Patch

diff --git a/drivers/mtd/nand/pxa3xx_nand.c b/drivers/mtd/nand/pxa3xx_nand.c
index 97c3eb5..8f2104c 100644
--- a/drivers/mtd/nand/pxa3xx_nand.c
+++ b/drivers/mtd/nand/pxa3xx_nand.c
@@ -1288,7 +1288,7 @@  static int pxa3xx_nand_init_buff(struct pxa3xx_nand_info *info)
 static void pxa3xx_nand_free_buff(struct pxa3xx_nand_info *info)
 {
 	struct platform_device *pdev = info->pdev;
-	if (use_dma) {
+	if (info->use_dma) {
 		pxa_free_dma(info->data_dma_ch);
 		dma_free_coherent(&pdev->dev, info->buf_size,
 				  info->data_buff, info->data_buff_phys);