diff mbox

[U-Boot,2/4] mmc: dw_mmc: Zap endless timeout

Message ID 1441958371.2737.19.camel@synopsys.com
State Superseded
Delegated to: Pantelis Antoniou
Headers show

Commit Message

Alexey Brodkin Sept. 11, 2015, 7:59 a.m. UTC
Hi Marek,

On Mon, 2015-07-27 at 22:39 +0200, Marek Vasut wrote:
> Endless timeouts are bad, since if we get stuck in one, we have no
> way out. Zap this one by implementing proper timeout.
> 
> Signed-off-by: Marek Vasut <marex@denx.de>
> Cc: Dinh Nguyen <dinguyen@opensource.altera.com>
> Cc: Pantelis Antoniou <panto@antoniou-consulting.com>
> Cc: Tom Rini <trini@konsulko.com>
> ---
>  drivers/mmc/dw_mmc.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/mmc/dw_mmc.c b/drivers/mmc/dw_mmc.c
> index 3fffa71..0f61f16 100644
> --- a/drivers/mmc/dw_mmc.c
> +++ b/drivers/mmc/dw_mmc.c
> @@ -211,14 +211,29 @@ static int dwmci_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,
>  	}
>  
>  	if (data) {
> -		do {
> +		start = get_timer(0);
> +		timeout = 1000;
> +		for (;;) {
>  			mask = dwmci_readl(host, DWMCI_RINTSTS);
> +			/* Error during data transfer. */
>  			if (mask & (DWMCI_DATA_ERR | DWMCI_DATA_TOUT)) {
>  				printf("%s: DATA ERROR!\n", __func__);
>  				bounce_buffer_stop(&bbstate);
>  				return -1;
>  			}
> -		} while (!(mask & DWMCI_INTMSK_DTO));
> +
> +			/* Data arrived correctly. */
> +			if (mask & DWMCI_INTMSK_DTO)
> +				break;
> +
> +			/* Check for timeout. */
> +			if (get_timer(start) > timeout) {
> +				printf("%s: Timeout waiting for data!\n",
> +				       __func__);
> +				bounce_buffer_stop(&bbstate);
> +				return TIMEOUT;
> +			}
> +		}
>  
>  		dwmci_writel(host, DWMCI_RINTSTS, mask);
>  

It turned out that patch breaks functionality in some cases.
For me on every attempt to download something significant (at least I see it on
5/7 Mb files) from SD I'm seeing timeout firing too early.

I added a bit of extra instrumentation to see where time is spent and why.

So my diff is:
----------------------------------->8--------------------------------
And that's what I see then:
----------------------------------->8--------------------------------
AXS# fatload mmc 0
 * time spent: 0, data size: 8, blocks: 1
 * time spent: 0, data size: 512, blocks: 1
 * time spent: 0, data size: 512, blocks: 1
 * time spent: 0, data size: 512, blocks: 1
reading uImage
 * time spent: 1, data size: 512, blocks: 1
 * time spent: 0, data size: 1024, blocks: 2
 * time spent: 1, data size: 3072, blocks: 6
 * time spent: 1, data size: 3072, blocks: 6
 * time spent: 1, data size: 3072, blocks: 6
 * time spent: 0, data size: 3072, blocks: 6
 * time spent: 0, data size: 3072, blocks: 6
 * time spent: 1599, data size: 13338112, blocks: 26051
 * time spent: 0, data size: 512, blocks: 1
13338188 bytes read in 1651 ms (7.7 MiB/s)
----------------------------------->8--------------------------------

So you see real data transfer takes  ~1.7 seconds when getting 26k blocks.

In other words timeout check has to be a bit smarter, for example
taking into account number of blocks to be transferred.

Any thoughts?

-Alexey

Comments

Marek Vasut Sept. 11, 2015, 11:49 a.m. UTC | #1
On Friday, September 11, 2015 at 09:59:32 AM, Alexey Brodkin wrote:
> Hi Marek,

Hi!

> On Mon, 2015-07-27 at 22:39 +-0200, Marek Vasut wrote:
> +AD4- Endless timeouts are bad, since if we get stuck in one, we have no
> +AD4- way out. Zap this one by implementing proper timeout.
> +AD4-
> +AD4- Signed-off-by: Marek Vasut +ADw-marex+AEA-denx.de+AD4-
> +AD4- Cc: Dinh Nguyen +ADw-dinguyen+AEA-opensource.altera.com+AD4-
> +AD4- Cc: Pantelis Antoniou +ADw-panto+AEA-antoniou-consulting.com+AD4-
> +AD4- Cc: Tom Rini +ADw-trini+AEA-konsulko.com+AD4-
> +AD4- ---
> +AD4-  drivers/mmc/dw+AF8-mmc.c +AHw- 19
> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+--- +AD4-  1 file changed, 17
> insertions(+-), 2 deletions(-)
> +AD4-
> +AD4- diff --git a/drivers/mmc/dw+AF8-mmc.c b/drivers/mmc/dw+AF8-mmc.c
> +AD4- index 3fffa71..0f61f16 100644
> +AD4- --- a/drivers/mmc/dw+AF8-mmc.c
> +AD4- +-+-+- b/drivers/mmc/dw+AF8-mmc.c
> +AD4- +AEAAQA- -211,14 +-211,29 +AEAAQA- static int
> dwmci+AF8-send+AF8-cmd(struct mmc +ACo-mmc, struct mmc+AF8-cmd +ACo-cmd,
> +AD4-  	+AH0-
> +AD4-
> +AD4-  	if (data) +AHs-
> +AD4- -		do +AHs-
> +AD4- +-		start +AD0- get+AF8-timer(0)+ADs-
> +AD4- +-		timeout +AD0- 1000+ADs-
> +AD4- +-		for (+ADsAOw-) +AHs-
> +AD4-  			mask +AD0- dwmci+AF8-readl(host, DWMCI+AF8-
RINTSTS)+ADs-
> +AD4- +-			/+ACo- Error during data transfer. +ACo-/
> +AD4-  			if (mask +ACY- (DWMCI+AF8-DATA+AF8-ERR +AHw-
> DWMCI+AF8-DATA+AF8-TOUT)) +AHs- +AD4-  				
printf(+ACIAJQ-s: DATA
> ERROR+ACEAXA-n+ACI-, +AF8AXw-func+AF8AXw-)+ADs- +AD4- 
> 				bounce+AF8-buffer+AF8-stop(+ACY-bbstate)+ADs-
> +AD4-  				return -1+ADs-
> +AD4-  			+AH0-
> +AD4- -		+AH0- while (+ACE-(mask +ACY- DWMCI+AF8-INTMSK+AF8-
DTO))+ADs-
> +AD4- +-
> +AD4- +-			/+ACo- Data arrived correctly. +ACo-/
> +AD4- +-			if (mask +ACY- DWMCI+AF8-INTMSK+AF8-DTO)
> +AD4- +-				break+ADs-
> +AD4- +-
> +AD4- +-			/+ACo- Check for timeout. +ACo-/
> +AD4- +-			if (get+AF8-timer(start) +AD4- timeout) +AHs-
> +AD4- +-				printf(+ACIAJQ-s: Timeout waiting for 
data+ACEAXA-n+ACI-,
> +AD4- +-				       +AF8AXw-func+AF8AXw-)+ADs-
> +AD4- +-				bounce+AF8-buffer+AF8-stop(+ACY-
bbstate)+ADs-
> +AD4- +-				return TIMEOUT+ADs-
> +AD4- +-			+AH0-
> +AD4- +-		+AH0-
> +AD4-
> +AD4-  		dwmci+AF8-writel(host, DWMCI+AF8-RINTSTS, mask)+ADs-
> +AD4-

btw Is your mailer totally broken by any chance ?

> It turned out that patch breaks functionality in some cases.
> For me on every attempt to download something significant (at least I see
> it on 5/7 Mb files) from SD I'm seeing timeout firing too early.
> 
> I added a bit of extra instrumentation to see where time is spent and why.

Check this patch:

[PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds

https://patchwork.ozlabs.org/patch/511899/

Does it fix things for you ?

[...]
Alexey Brodkin Sept. 11, 2015, 5:04 p.m. UTC | #2
Hi Marek,

On Fri, 2015-09-11 at 13:49 +0200, Marek Vasut wrote:
> On Friday, September 11, 2015 at 09:59:32 AM, Alexey Brodkin wrote:
> > Hi Marek,
> 
> Hi!

> btw Is your mailer totally broken by any chance ?

Hm, I'm not sure what happened but as I may see here
https://patchwork.ozlabs.org/patch/516618/ my message looks good :)

> > It turned out that patch breaks functionality in some cases.
> > For me on every attempt to download something significant (at least I see
> > it on 5/7 Mb files) from SD I'm seeing timeout firing too early.
> > 
> > I added a bit of extra instrumentation to see where time is spent and why.
> 
> Check this patch:
> 
> [PATCH 1/2] mmc: dw_mmc: Increase timeout to 20 seconds
> 
> https://patchwork.ozlabs.org/patch/511899/
> 
> Does it fix things for you ?

Well this might fix my particular test-case, but are you sure there's
no chance for this timeout to be not long enough?

And vice versa why wait 20 seconds if problem has happened on short
transfer? Really wait 20 seconds on boot of say TV-set just because
USB-drive is broken?

So I would say that we need to rely on amount of data to be transferred
instead of having any random number of seconds for all.

-Alexey
Marek Vasut Sept. 12, 2015, 4:17 p.m. UTC | #3
On Friday, September 11, 2015 at 07:04:42 PM, Alexey Brodkin wrote:
> Hi Marek,

Hi,

> On Fri, 2015-09-11 at 13:49 +-0200, Marek Vasut wrote:
> +AD4- On Friday, September 11, 2015 at 09:59:32 AM, Alexey Brodkin wrote:
> +AD4- +AD4- Hi Marek,
> +AD4-
> +AD4- Hi+ACE-
> 
> +AD4- btw Is your mailer totally broken by any chance ?
> 
> Hm, I'm not sure what happened but as I may see here
> https://patchwork.ozlabs.org/patch/516618/ my message looks good :)

That's great.

> +AD4- +AD4- It turned out that patch breaks functionality in some cases.
> +AD4- +AD4- For me on every attempt to download something significant (at
> least I see +AD4- +AD4- it on 5/7 Mb files) from SD I'm seeing timeout
> firing too early. +AD4- +AD4-
> +AD4- +AD4- I added a bit of extra instrumentation to see where time is
> spent and why. +AD4-
> +AD4- Check this patch:
> +AD4-
> +AD4- +AFs-PATCH 1/2+AF0- mmc: dw+AF8-mmc: Increase timeout to 20 seconds
> +AD4-
> +AD4- https://patchwork.ozlabs.org/patch/511899/
> +AD4-
> +AD4- Does it fix things for you ?
> 
> Well this might fix my particular test-case, but are you sure there's
> no chance for this timeout to be not long enough?

There is, the crappier the card, the higher the possibility.

> And vice versa why wait 20 seconds if problem has happened on short
> transfer? Really wait 20 seconds on boot of say TV-set just because
> USB-drive is broken?

It's still better to wait 20 seconds instead of waiting indefinitelly, right?
That is the reason for my patch, which removed the unbounded loop.

> So I would say that we need to rely on amount of data to be transferred
> instead of having any random number of seconds for all.

Let's move this discussion into the dwmmc thread by Lukasz. There is more
to it than just the amount of data transferred.

Best regards,
Marek Vasut
diff mbox

Patch

diff --git a/drivers/mmc/dw_mmc.c b/drivers/mmc/dw_mmc.c
index 77b87e0..2da77a7 100644
--- a/drivers/mmc/dw_mmc.c
+++ b/drivers/mmc/dw_mmc.c
@@ -213,7 +213,7 @@  static int dwmci_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,
 
        if (data) {
                start = get_timer(0);
-               timeout = 1000;
+               timeout = 10000; // That's required to get to the end of the transfer
                for (;;) {
                        mask = dwmci_readl(host, DWMCI_RINTSTS);
                        /* Error during data transfer. */
@@ -226,6 +226,7 @@  static int dwmci_send_cmd(struct mmc *mmc, struct mmc_cmd *cmd,
                        /* Data arrived correctly. */
                        if (mask & DWMCI_INTMSK_DTO) {
                                ret = 0;
+                               printf(" * time spent: %d, data size: %d, blocks: %d\n", (int)get_timer(start), data
->blocksize * data->blocks, data->blocks);
                                break;
                        }
----------------------------------->8--------------------------------