diff mbox

[U-Boot] FIX: fat: Provide correct return code from disk_{read|write} to upper layers

Message ID 1441282899-13569-1-git-send-email-l.majewski@samsung.com
State Accepted
Delegated to: Tom Rini
Headers show

Commit Message

Łukasz Majewski Sept. 3, 2015, 12:21 p.m. UTC
It is very common that FAT code is using following pattern:
if (disk_{read|write}() < 0)
        return -1;

Up till now the above code was dead, since disk_{read|write) could only
return value >= 0.
As a result some errors from medium layer (i.e. eMMC/SD) were not caught.

The above behavior was caused by block_{read|write|erase} declared at
struct block_dev_desc (@part.h). It returns unsigned long, where 0
indicates error and > 0 indicates that medium operation was correct.

This patch as error regards 0 returned from block_{read|write|erase}
when nr_blocks is grater than zero. Read/Write operation with nr_blocks=0
should return 0 and hence is not considered as an error.

Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>

Test HW: Odroid XU3 - Exynos 5433
---
 fs/fat/fat.c       | 11 +++++++++--
 fs/fat/fat_write.c | 11 +++++++++--
 2 files changed, 18 insertions(+), 4 deletions(-)

Comments

Tom Rini Sept. 3, 2015, 12:44 p.m. UTC | #1
On Thu, Sep 03, 2015 at 02:21:39PM +0200, Lukasz Majewski wrote:

> It is very common that FAT code is using following pattern:
> if (disk_{read|write}() < 0)
>         return -1;
> 
> Up till now the above code was dead, since disk_{read|write) could only
> return value >= 0.
> As a result some errors from medium layer (i.e. eMMC/SD) were not caught.
> 
> The above behavior was caused by block_{read|write|erase} declared at
> struct block_dev_desc (@part.h). It returns unsigned long, where 0
> indicates error and > 0 indicates that medium operation was correct.
> 
> This patch as error regards 0 returned from block_{read|write|erase}
> when nr_blocks is grater than zero. Read/Write operation with nr_blocks=0
> should return 0 and hence is not considered as an error.
> 
> Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> 
> Test HW: Odroid XU3 - Exynos 5433

Can you pick up Stephen's FAT replacement series and see if it also
fixes this problem?  Thanks!
Łukasz Majewski Sept. 3, 2015, 1:40 p.m. UTC | #2
Hi Tom,

> On Thu, Sep 03, 2015 at 02:21:39PM +0200, Lukasz Majewski wrote:
> 
> > It is very common that FAT code is using following pattern:
> > if (disk_{read|write}() < 0)
> >         return -1;
> > 
> > Up till now the above code was dead, since disk_{read|write) could
> > only return value >= 0.
> > As a result some errors from medium layer (i.e. eMMC/SD) were not
> > caught.
> > 
> > The above behavior was caused by block_{read|write|erase} declared
> > at struct block_dev_desc (@part.h). It returns unsigned long, where
> > 0 indicates error and > 0 indicates that medium operation was
> > correct.
> > 
> > This patch as error regards 0 returned from block_{read|write|erase}
> > when nr_blocks is grater than zero. Read/Write operation with
> > nr_blocks=0 should return 0 and hence is not considered as an error.
> > 
> > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> > 
> > Test HW: Odroid XU3 - Exynos 5433
> 
> Can you pick up Stephen's FAT replacement series and see if it also
> fixes this problem?  Thanks!
> 

Ok, I will test this fat implementation.

However, since for v2015.10 it won't be included, I would also opt for
adding this fix to the current u-boot.
Łukasz Majewski Sept. 3, 2015, 2:18 p.m. UTC | #3
Hi Lukasz,

> Hi Tom,
> 
> > On Thu, Sep 03, 2015 at 02:21:39PM +0200, Lukasz Majewski wrote:
> > 
> > > It is very common that FAT code is using following pattern:
> > > if (disk_{read|write}() < 0)
> > >         return -1;
> > > 
> > > Up till now the above code was dead, since disk_{read|write) could
> > > only return value >= 0.
> > > As a result some errors from medium layer (i.e. eMMC/SD) were not
> > > caught.
> > > 
> > > The above behavior was caused by block_{read|write|erase} declared
> > > at struct block_dev_desc (@part.h). It returns unsigned long,
> > > where 0 indicates error and > 0 indicates that medium operation
> > > was correct.
> > > 
> > > This patch as error regards 0 returned from
> > > block_{read|write|erase} when nr_blocks is grater than zero.
> > > Read/Write operation with nr_blocks=0 should return 0 and hence
> > > is not considered as an error.
> > > 
> > > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> > > 
> > > Test HW: Odroid XU3 - Exynos 5433
> > 
> > Can you pick up Stephen's FAT replacement series and see if it also
> > fixes this problem?  Thanks!
> > 
> 
> Ok, I will test this fat implementation.

I've applied v2 of this patchset
on top of SHA1: 79c884d7e449a63fa8f07b7495f8f9873355c48f

Unfortunately, DFU tests fail with first attempt to pass the test.

Apparently this code needs some more review/testing before being
included to main line.

> 
> However, since for v2015.10 it won't be included, I would also opt for
> adding this fix to the current u-boot.
>
Łukasz Majewski Sept. 9, 2015, 7:02 a.m. UTC | #4
Hi,

> It is very common that FAT code is using following pattern:
> if (disk_{read|write}() < 0)
>         return -1;
> 
> Up till now the above code was dead, since disk_{read|write) could
> only return value >= 0.
> As a result some errors from medium layer (i.e. eMMC/SD) were not
> caught.
> 
> The above behavior was caused by block_{read|write|erase} declared at
> struct block_dev_desc (@part.h). It returns unsigned long, where 0
> indicates error and > 0 indicates that medium operation was correct.
> 
> This patch as error regards 0 returned from block_{read|write|erase}
> when nr_blocks is grater than zero. Read/Write operation with
> nr_blocks=0 should return 0 and hence is not considered as an error.
> 
> Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>

Are there any more questions regarding this patch? I'd be more than
happy if it would be added to v2015.10 :-).

> 
> Test HW: Odroid XU3 - Exynos 5433
> ---
>  fs/fat/fat.c       | 11 +++++++++--
>  fs/fat/fat_write.c | 11 +++++++++--
>  2 files changed, 18 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/fat/fat.c b/fs/fat/fat.c
> index bccc3e3..d743014 100644
> --- a/fs/fat/fat.c
> +++ b/fs/fat/fat.c
> @@ -45,11 +45,18 @@ static disk_partition_t cur_part_info;
>  
>  static int disk_read(__u32 block, __u32 nr_blocks, void *buf)
>  {
> +	ulong ret;
> +
>  	if (!cur_dev || !cur_dev->block_read)
>  		return -1;
>  
> -	return cur_dev->block_read(cur_dev->dev,
> -			cur_part_info.start + block, nr_blocks, buf);
> +	ret = cur_dev->block_read(cur_dev->dev,
> +				  cur_part_info.start + block,
> nr_blocks, buf); +
> +	if (nr_blocks && ret == 0)
> +		return -1;
> +
> +	return ret;
>  }
>  
>  int fat_set_blk_dev(block_dev_desc_t *dev_desc, disk_partition_t
> *info) diff --git a/fs/fat/fat_write.c b/fs/fat/fat_write.c
> index 98b88ad..adb6940 100644
> --- a/fs/fat/fat_write.c
> +++ b/fs/fat/fat_write.c
> @@ -30,6 +30,8 @@ static void uppercase(char *str, int len)
>  static int total_sector;
>  static int disk_write(__u32 block, __u32 nr_blocks, void *buf)
>  {
> +	ulong ret;
> +
>  	if (!cur_dev || !cur_dev->block_write)
>  		return -1;
>  
> @@ -39,8 +41,13 @@ static int disk_write(__u32 block, __u32
> nr_blocks, void *buf) return -1;
>  	}
>  
> -	return cur_dev->block_write(cur_dev->dev,
> -			cur_part_info.start + block,
> nr_blocks,	buf);
> +	ret = cur_dev->block_write(cur_dev->dev,
> +				   cur_part_info.start + block,
> +				   nr_blocks, buf);
> +	if (nr_blocks && ret == 0)
> +		return -1;
> +
> +	return ret;
>  }
>  
>  /*
Tom Rini Sept. 12, 2015, 12:51 p.m. UTC | #5
On Thu, Sep 03, 2015 at 02:21:39PM +0200, Łukasz Majewski wrote:

> It is very common that FAT code is using following pattern:
> if (disk_{read|write}() < 0)
>         return -1;
> 
> Up till now the above code was dead, since disk_{read|write) could only
> return value >= 0.
> As a result some errors from medium layer (i.e. eMMC/SD) were not caught.
> 
> The above behavior was caused by block_{read|write|erase} declared at
> struct block_dev_desc (@part.h). It returns unsigned long, where 0
> indicates error and > 0 indicates that medium operation was correct.
> 
> This patch as error regards 0 returned from block_{read|write|erase}
> when nr_blocks is grater than zero. Read/Write operation with nr_blocks=0
> should return 0 and hence is not considered as an error.
> 
> Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> 
> Test HW: Odroid XU3 - Exynos 5433

Applied to u-boot/master, thanks!
Łukasz Majewski Sept. 17, 2015, 2:44 p.m. UTC | #6
Hi Tom,,

> Hi,
> 
> > It is very common that FAT code is using following pattern:
> > if (disk_{read|write}() < 0)
> >         return -1;
> > 
> > Up till now the above code was dead, since disk_{read|write) could
> > only return value >= 0.
> > As a result some errors from medium layer (i.e. eMMC/SD) were not
> > caught.
> > 
> > The above behavior was caused by block_{read|write|erase} declared
> > at struct block_dev_desc (@part.h). It returns unsigned long, where
> > 0 indicates error and > 0 indicates that medium operation was
> > correct.
> > 
> > This patch as error regards 0 returned from block_{read|write|erase}
> > when nr_blocks is grater than zero. Read/Write operation with
> > nr_blocks=0 should return 0 and hence is not considered as an error.
> > 
> > Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> 
> Are there any more questions regarding this patch? I'd be more than
> happy if it would be added to v2015.10 :-).

Gentle ping :-)

> 
> > 
> > Test HW: Odroid XU3 - Exynos 5433
> > ---
> >  fs/fat/fat.c       | 11 +++++++++--
> >  fs/fat/fat_write.c | 11 +++++++++--
> >  2 files changed, 18 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/fat/fat.c b/fs/fat/fat.c
> > index bccc3e3..d743014 100644
> > --- a/fs/fat/fat.c
> > +++ b/fs/fat/fat.c
> > @@ -45,11 +45,18 @@ static disk_partition_t cur_part_info;
> >  
> >  static int disk_read(__u32 block, __u32 nr_blocks, void *buf)
> >  {
> > +	ulong ret;
> > +
> >  	if (!cur_dev || !cur_dev->block_read)
> >  		return -1;
> >  
> > -	return cur_dev->block_read(cur_dev->dev,
> > -			cur_part_info.start + block, nr_blocks,
> > buf);
> > +	ret = cur_dev->block_read(cur_dev->dev,
> > +				  cur_part_info.start + block,
> > nr_blocks, buf); +
> > +	if (nr_blocks && ret == 0)
> > +		return -1;
> > +
> > +	return ret;
> >  }
> >  
> >  int fat_set_blk_dev(block_dev_desc_t *dev_desc, disk_partition_t
> > *info) diff --git a/fs/fat/fat_write.c b/fs/fat/fat_write.c
> > index 98b88ad..adb6940 100644
> > --- a/fs/fat/fat_write.c
> > +++ b/fs/fat/fat_write.c
> > @@ -30,6 +30,8 @@ static void uppercase(char *str, int len)
> >  static int total_sector;
> >  static int disk_write(__u32 block, __u32 nr_blocks, void *buf)
> >  {
> > +	ulong ret;
> > +
> >  	if (!cur_dev || !cur_dev->block_write)
> >  		return -1;
> >  
> > @@ -39,8 +41,13 @@ static int disk_write(__u32 block, __u32
> > nr_blocks, void *buf) return -1;
> >  	}
> >  
> > -	return cur_dev->block_write(cur_dev->dev,
> > -			cur_part_info.start + block,
> > nr_blocks,	buf);
> > +	ret = cur_dev->block_write(cur_dev->dev,
> > +				   cur_part_info.start + block,
> > +				   nr_blocks, buf);
> > +	if (nr_blocks && ret == 0)
> > +		return -1;
> > +
> > +	return ret;
> >  }
> >  
> >  /*
> 
> 
>
Stephen Warren Sept. 23, 2015, 3:17 a.m. UTC | #7
On 09/03/2015 08:18 AM, Lukasz Majewski wrote:
> Hi Lukasz,
> 
>> Hi Tom,
>>
>>> On Thu, Sep 03, 2015 at 02:21:39PM +0200, Lukasz Majewski wrote:
>>>
>>>> It is very common that FAT code is using following pattern:
>>>> if (disk_{read|write}() < 0)
>>>>         return -1;
>>>>
>>>> Up till now the above code was dead, since disk_{read|write) could
>>>> only return value >= 0.
>>>> As a result some errors from medium layer (i.e. eMMC/SD) were not
>>>> caught.
>>>>
>>>> The above behavior was caused by block_{read|write|erase} declared
>>>> at struct block_dev_desc (@part.h). It returns unsigned long,
>>>> where 0 indicates error and > 0 indicates that medium operation
>>>> was correct.
>>>>
>>>> This patch as error regards 0 returned from
>>>> block_{read|write|erase} when nr_blocks is grater than zero.
>>>> Read/Write operation with nr_blocks=0 should return 0 and hence
>>>> is not considered as an error.
>>>>
>>>> Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
>>>>
>>>> Test HW: Odroid XU3 - Exynos 5433
>>>
>>> Can you pick up Stephen's FAT replacement series and see if it also
>>> fixes this problem?  Thanks!
>>>
>>
>> Ok, I will test this fat implementation.
> 
> I've applied v2 of this patchset
> on top of SHA1: 79c884d7e449a63fa8f07b7495f8f9873355c48f
> 
> Unfortunately, DFU tests fail with first attempt to pass the test.

I've found a couple of problems.

First up, file_fat_write() wasn't truncating the file when writing, so
the file size wasn't changing when over-writing a large file with a
small file. With this fixed, I can run the DFU tests just fine for all
the small files (<1M). I've fixed this locally and in the ff branch on
my github.

Second, ff is slow:

Some random old build I had in flash on my system:
> Tegra124 (Jetson TK1) # load mmc 1:1 $loadaddr dfu1.bin
> reading dfu1.bin
> 1048576 bytes read in 95 ms (10.5 MiB/s)

With my ff branch:
> Tegra124 (Jetson TK1) # load mmc 1:1 $loadaddr dfu1.bin
> 1048576 bytes read in 5038 ms (203.1 KiB/s)

That's quite the slow-down! I believe this is causing dfu-util to time
out on the larger files (1M+). Just for functional testing, I'll try and
find a way to hack dfu-util to have a much larger timeout for the final
flush operation. I wonder if the old FAT implementation had a disk cache
(e.g. that 32K buffer in BSS?) and we need the same for ff? I'll try and
track down why it's so slow.

Perhaps there are other issues as yet unfound.
Łukasz Majewski Sept. 23, 2015, 8:40 a.m. UTC | #8
Hi Stephen,

> On 09/03/2015 08:18 AM, Lukasz Majewski wrote:
> > Hi Lukasz,
> > 
> >> Hi Tom,
> >>
> >>> On Thu, Sep 03, 2015 at 02:21:39PM +0200, Lukasz Majewski wrote:
> >>>
> >>>> It is very common that FAT code is using following pattern:
> >>>> if (disk_{read|write}() < 0)
> >>>>         return -1;
> >>>>
> >>>> Up till now the above code was dead, since disk_{read|write)
> >>>> could only return value >= 0.
> >>>> As a result some errors from medium layer (i.e. eMMC/SD) were not
> >>>> caught.
> >>>>
> >>>> The above behavior was caused by block_{read|write|erase}
> >>>> declared at struct block_dev_desc (@part.h). It returns unsigned
> >>>> long, where 0 indicates error and > 0 indicates that medium
> >>>> operation was correct.
> >>>>
> >>>> This patch as error regards 0 returned from
> >>>> block_{read|write|erase} when nr_blocks is grater than zero.
> >>>> Read/Write operation with nr_blocks=0 should return 0 and hence
> >>>> is not considered as an error.
> >>>>
> >>>> Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
> >>>>
> >>>> Test HW: Odroid XU3 - Exynos 5433
> >>>
> >>> Can you pick up Stephen's FAT replacement series and see if it
> >>> also fixes this problem?  Thanks!
> >>>
> >>
> >> Ok, I will test this fat implementation.
> > 
> > I've applied v2 of this patchset
> > on top of SHA1: 79c884d7e449a63fa8f07b7495f8f9873355c48f
> > 
> > Unfortunately, DFU tests fail with first attempt to pass the test.
> 
> I've found a couple of problems.
> 
> First up, file_fat_write() wasn't truncating the file when writing, so
> the file size wasn't changing when over-writing a large file with a
> small file. With this fixed, I can run the DFU tests just fine for all
> the small files (<1M). I've fixed this locally and in the ff branch on
> my github.

Nice to hear that you have found the error.

> 
> Second, ff is slow:
> 
> Some random old build I had in flash on my system:
> > Tegra124 (Jetson TK1) # load mmc 1:1 $loadaddr dfu1.bin
> > reading dfu1.bin
> > 1048576 bytes read in 95 ms (10.5 MiB/s)
> 
> With my ff branch:
> > Tegra124 (Jetson TK1) # load mmc 1:1 $loadaddr dfu1.bin
> > 1048576 bytes read in 5038 ms (203.1 KiB/s)
> 
> That's quite the slow-down! I believe this is causing dfu-util to time
> out on the larger files (1M+). Just for functional testing, I'll try
> and find a way to hack dfu-util to have a much larger timeout for the
> final flush operation. I wonder if the old FAT implementation had a
> disk cache (e.g. that 32K buffer in BSS?) and we need the same for
> ff? 

I think that our current Fat implementation is optimized for tiny
embedded system (and probably no cache).

> I'll try and track down why it's so slow.
> 
> Perhaps there are other issues as yet unfound.

We might also check with sandbox FS set of tests.
Stephen Warren Sept. 25, 2015, 5:47 a.m. UTC | #9
On 09/23/2015 02:40 AM, Lukasz Majewski wrote:
> Hi Stephen,
> 
>> On 09/03/2015 08:18 AM, Lukasz Majewski wrote:
>>> Hi Lukasz,
>>>
>>>> Hi Tom,
>>>>
>>>>> On Thu, Sep 03, 2015 at 02:21:39PM +0200, Lukasz Majewski wrote:
>>>>>
>>>>>> It is very common that FAT code is using following pattern:
>>>>>> if (disk_{read|write}() < 0)
>>>>>>         return -1;
>>>>>>
>>>>>> Up till now the above code was dead, since disk_{read|write)
>>>>>> could only return value >= 0.
>>>>>> As a result some errors from medium layer (i.e. eMMC/SD) were not
>>>>>> caught.
>>>>>>
>>>>>> The above behavior was caused by block_{read|write|erase}
>>>>>> declared at struct block_dev_desc (@part.h). It returns unsigned
>>>>>> long, where 0 indicates error and > 0 indicates that medium
>>>>>> operation was correct.
>>>>>>
>>>>>> This patch as error regards 0 returned from
>>>>>> block_{read|write|erase} when nr_blocks is grater than zero.
>>>>>> Read/Write operation with nr_blocks=0 should return 0 and hence
>>>>>> is not considered as an error.
>>>>>>
>>>>>> Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
>>>>>>
>>>>>> Test HW: Odroid XU3 - Exynos 5433
>>>>>
>>>>> Can you pick up Stephen's FAT replacement series and see if it
>>>>> also fixes this problem?  Thanks!
>>>>>
>>>>
>>>> Ok, I will test this fat implementation.
>>>
>>> I've applied v2 of this patchset
>>> on top of SHA1: 79c884d7e449a63fa8f07b7495f8f9873355c48f
>>>
>>> Unfortunately, DFU tests fail with first attempt to pass the test.
>>
>> I've found a couple of problems.
>>
>> First up, file_fat_write() wasn't truncating the file when writing, so
>> the file size wasn't changing when over-writing a large file with a
>> small file. With this fixed, I can run the DFU tests just fine for all
>> the small files (<1M). I've fixed this locally and in the ff branch on
>> my github.
> 
> Nice to hear that you have found the error.
> 
>> Second, ff is slow:
>>
>> Some random old build I had in flash on my system:
>>> Tegra124 (Jetson TK1) # load mmc 1:1 $loadaddr dfu1.bin
>>> reading dfu1.bin
>>> 1048576 bytes read in 95 ms (10.5 MiB/s)
>>
>> With my ff branch:
>>> Tegra124 (Jetson TK1) # load mmc 1:1 $loadaddr dfu1.bin
>>> 1048576 bytes read in 5038 ms (203.1 KiB/s)
>>
>> That's quite the slow-down! I believe this is causing dfu-util to time
>> out on the larger files (1M+). Just for functional testing, I'll try
>> and find a way to hack dfu-util to have a much larger timeout for the
>> final flush operation. I wonder if the old FAT implementation had a
>> disk cache (e.g. that 32K buffer in BSS?) and we need the same for
>> ff? 

Extending the timeout (massively) in dfu-util did make
dfu_gadget_test.sh work.

> I think that our current Fat implementation is optimized for tiny
> embedded system (and probably no cache).

The ff library claims to be too. I'll try tracing the IO pattern in
sandbox and see where the difference lies.

>> I'll try and track down why it's so slow.
>>
>> Perhaps there are other issues as yet unfound.
> 
> We might also check with sandbox FS set of tests.

I get the same failures for fs-test.sh with or without this series; TC10
fails and everything else passes, for the non-"sb" FAT tests. (with the
file truncation fix I mentioned above applied, although I don't know if
it matters for fs-test.sh)
diff mbox

Patch

diff --git a/fs/fat/fat.c b/fs/fat/fat.c
index bccc3e3..d743014 100644
--- a/fs/fat/fat.c
+++ b/fs/fat/fat.c
@@ -45,11 +45,18 @@  static disk_partition_t cur_part_info;
 
 static int disk_read(__u32 block, __u32 nr_blocks, void *buf)
 {
+	ulong ret;
+
 	if (!cur_dev || !cur_dev->block_read)
 		return -1;
 
-	return cur_dev->block_read(cur_dev->dev,
-			cur_part_info.start + block, nr_blocks, buf);
+	ret = cur_dev->block_read(cur_dev->dev,
+				  cur_part_info.start + block, nr_blocks, buf);
+
+	if (nr_blocks && ret == 0)
+		return -1;
+
+	return ret;
 }
 
 int fat_set_blk_dev(block_dev_desc_t *dev_desc, disk_partition_t *info)
diff --git a/fs/fat/fat_write.c b/fs/fat/fat_write.c
index 98b88ad..adb6940 100644
--- a/fs/fat/fat_write.c
+++ b/fs/fat/fat_write.c
@@ -30,6 +30,8 @@  static void uppercase(char *str, int len)
 static int total_sector;
 static int disk_write(__u32 block, __u32 nr_blocks, void *buf)
 {
+	ulong ret;
+
 	if (!cur_dev || !cur_dev->block_write)
 		return -1;
 
@@ -39,8 +41,13 @@  static int disk_write(__u32 block, __u32 nr_blocks, void *buf)
 		return -1;
 	}
 
-	return cur_dev->block_write(cur_dev->dev,
-			cur_part_info.start + block, nr_blocks,	buf);
+	ret = cur_dev->block_write(cur_dev->dev,
+				   cur_part_info.start + block,
+				   nr_blocks, buf);
+	if (nr_blocks && ret == 0)
+		return -1;
+
+	return ret;
 }
 
 /*