diff mbox series

[U-Boot] w1: fix occasional enumeration failure

Message ID 1542966793-24558-1-git-send-email-martin.fuzzey@flowbird.group
State Accepted
Commit b95938117cb2bd9b020305a58bd748a48e58d30a
Delegated to: Tom Rini
Headers show
Series [U-Boot] w1: fix occasional enumeration failure | expand

Commit Message

Martin Fuzzey Nov. 23, 2018, 9:53 a.m. UTC
Sometimes enumeration fails (about 1 in 50 times on my custom board).

The underlying reason is probably electrical but Linux does not have
the problem.

Comparing the Linux / u-boot implementations shows that Linux
retries the error case whereas u-boot aborts early.

Removing the early abort in u-boot fixes the problem.

Signed-off-by: Martin Fuzzey <martin.fuzzey@flowbird.group>
---
 drivers/w1/w1-uclass.c | 4 ----
 1 file changed, 4 deletions(-)

Comments

Eugen Hristev Nov. 27, 2018, 9:47 a.m. UTC | #1
On 23.11.2018 11:53, Martin Fuzzey wrote:
> Sometimes enumeration fails (about 1 in 50 times on my custom board).
> 
> The underlying reason is probably electrical but Linux does not have
> the problem.
> 
> Comparing the Linux / u-boot implementations shows that Linux
> retries the error case whereas u-boot aborts early.
> 
> Removing the early abort in u-boot fixes the problem.
> 
> Signed-off-by: Martin Fuzzey <martin.fuzzey@flowbird.group>
> ---
>   drivers/w1/w1-uclass.c | 4 ----
>   1 file changed, 4 deletions(-)
> 
> diff --git a/drivers/w1/w1-uclass.c b/drivers/w1/w1-uclass.c
> index 5544b19..ad86bf6 100644
> --- a/drivers/w1/w1-uclass.c
> +++ b/drivers/w1/w1-uclass.c
> @@ -84,10 +84,6 @@ static int w1_enumerate(struct udevice *bus)
>   			rn |= (tmp64 << i);
>   		}
>   
> -		/* last device or error, aborting here */
> -		if ((triplet_ret & 0x03) == 0x03)
> -			last_device = true;
> -

Hello Martin,

Your fix will basically make U-boot try again the same ID ? What happens 
if we keep getting errors at this triplet, we will be stuck in a loop ?
When are we going to abort searching this ID if we keep getting errors ?

Thanks,
Eugen

>   		if ((triplet_ret & 0x03) != 0x03) {
>   			if (desc_bit == last_zero || last_zero < 0) {
>   				last_device = 1;
>
Martin Fuzzey Nov. 27, 2018, 10:58 a.m. UTC | #2
Hi Eugen,

On 27/11/2018 10:47, Eugen.Hristev@microchip.com wrote:
>
> On 23.11.2018 11:53, Martin Fuzzey wrote:
>> Sometimes enumeration fails (about 1 in 50 times on my custom board).
>>
>> The underlying reason is probably electrical but Linux does not have
>> the problem.
>>
>> Comparing the Linux / u-boot implementations shows that Linux
>> retries the error case whereas u-boot aborts early.
>>
>> Removing the early abort in u-boot fixes the problem.
>>
>> Signed-off-by: Martin Fuzzey <martin.fuzzey@flowbird.group>
>> ---
>>    drivers/w1/w1-uclass.c | 4 ----
>>    1 file changed, 4 deletions(-)
>>
>> diff --git a/drivers/w1/w1-uclass.c b/drivers/w1/w1-uclass.c
>> index 5544b19..ad86bf6 100644
>> --- a/drivers/w1/w1-uclass.c
>> +++ b/drivers/w1/w1-uclass.c
>> @@ -84,10 +84,6 @@ static int w1_enumerate(struct udevice *bus)
>>    			rn |= (tmp64 << i);
>>    		}
>>    
>> -		/* last device or error, aborting here */
>> -		if ((triplet_ret & 0x03) == 0x03)
>> -			last_device = true;
>> -
> Hello Martin,
>
> Your fix will basically make U-boot try again the same ID ? What happens
> if we keep getting errors at this triplet, we will be stuck in a loop ?
> When are we going to abort searching this ID if we keep getting errors ?


Yes it will retry again the same 64 bit sequence after a bus reset.

So theoretically yes we could get stuck in a loop if the same error 
keeps occuring at the same point.

However I'm not really convinced that it will occur in real life except 
for completely broken W1 devices or contrived scenarios (though you 
could argue that this would allow a malicious W1 device to do a DOS 
attack on u-boot...)

The bus reset (which actually cuts the power to the bus and hence hard 
resets all the devices on it) should ensure that any transient weirdness 
due to bus noise or software bugs on the W1 devices themselves won't 
cause us to loop and if the bus itself breaks we will exit at the 
beginning of the loop on the error return from ops->bus_reset() due to 
non detection of the presence pulse.


That said It would be easy to add a max retry counter to ensure that the 
pathological case cannot occur. But Linux doesn't do that and I'm not 
aware of any infinite loop cases there.

If you think we need the retry limit though I'll send a V2 with that.


Regards,

Martin
Eugen Hristev Nov. 27, 2018, 11:53 a.m. UTC | #3
On 27.11.2018 12:58, Martin Fuzzey wrote:
> Hi Eugen,
> 
> On 27/11/2018 10:47, Eugen.Hristev@microchip.com wrote:
>>
>> On 23.11.2018 11:53, Martin Fuzzey wrote:
>>> Sometimes enumeration fails (about 1 in 50 times on my custom board).
>>>
>>> The underlying reason is probably electrical but Linux does not have
>>> the problem.
>>>
>>> Comparing the Linux / u-boot implementations shows that Linux
>>> retries the error case whereas u-boot aborts early.
>>>
>>> Removing the early abort in u-boot fixes the problem.
>>>
>>> Signed-off-by: Martin Fuzzey <martin.fuzzey@flowbird.group>
>>> ---
>>>    drivers/w1/w1-uclass.c | 4 ----
>>>    1 file changed, 4 deletions(-)
>>>
>>> diff --git a/drivers/w1/w1-uclass.c b/drivers/w1/w1-uclass.c
>>> index 5544b19..ad86bf6 100644
>>> --- a/drivers/w1/w1-uclass.c
>>> +++ b/drivers/w1/w1-uclass.c
>>> @@ -84,10 +84,6 @@ static int w1_enumerate(struct udevice *bus)
>>>                rn |= (tmp64 << i);
>>>            }
>>> -        /* last device or error, aborting here */
>>> -        if ((triplet_ret & 0x03) == 0x03)
>>> -            last_device = true;
>>> -
>> Hello Martin,
>>
>> Your fix will basically make U-boot try again the same ID ? What happens
>> if we keep getting errors at this triplet, we will be stuck in a loop ?
>> When are we going to abort searching this ID if we keep getting errors ?
> 
> 
> Yes it will retry again the same 64 bit sequence after a bus reset.
> 
> So theoretically yes we could get stuck in a loop if the same error 
> keeps occuring at the same point.
> 
> However I'm not really convinced that it will occur in real life except 
> for completely broken W1 devices or contrived scenarios (though you 
> could argue that this would allow a malicious W1 device to do a DOS 
> attack on u-boot...)
> 
> The bus reset (which actually cuts the power to the bus and hence hard 
> resets all the devices on it) should ensure that any transient weirdness 
> due to bus noise or software bugs on the W1 devices themselves won't 
> cause us to loop and if the bus itself breaks we will exit at the 
> beginning of the loop on the error return from ops->bus_reset() due to 
> non detection of the presence pulse.

About reset cutting the power, it depends on how it's implemented, on my 
boards it's a bit-banged GPIO so I do not believe this happens.

> 
> 
> That said It would be easy to add a max retry counter to ensure that the 
> pathological case cannot occur. But Linux doesn't do that and I'm not 
> aware of any infinite loop cases there.
> 
> If you think we need the retry limit though I'll send a V2 with that.


Your points are valid, but I think it would be good to avoid the 
situation all-together with a retry limit...

Anyway it's just my opinion and I let Tom or Maxime have a last word if 
they feel different.

Thanks for your patch,

Eugen

> 
> 
> Regards,
> 
> Martin
> 
> 
>
Tom Rini Dec. 7, 2018, 8:32 p.m. UTC | #4
On Fri, Nov 23, 2018 at 10:53:06AM +0100, Martin Fuzzey wrote:

> Sometimes enumeration fails (about 1 in 50 times on my custom board).
> 
> The underlying reason is probably electrical but Linux does not have
> the problem.
> 
> Comparing the Linux / u-boot implementations shows that Linux
> retries the error case whereas u-boot aborts early.
> 
> Removing the early abort in u-boot fixes the problem.
> 
> Signed-off-by: Martin Fuzzey <martin.fuzzey@flowbird.group>

I saw the overall discussion and yes, OK, lets try this method.

Applied to u-boot/master, thanks!
diff mbox series

Patch

diff --git a/drivers/w1/w1-uclass.c b/drivers/w1/w1-uclass.c
index 5544b19..ad86bf6 100644
--- a/drivers/w1/w1-uclass.c
+++ b/drivers/w1/w1-uclass.c
@@ -84,10 +84,6 @@  static int w1_enumerate(struct udevice *bus)
 			rn |= (tmp64 << i);
 		}
 
-		/* last device or error, aborting here */
-		if ((triplet_ret & 0x03) == 0x03)
-			last_device = true;
-
 		if ((triplet_ret & 0x03) != 0x03) {
 			if (desc_bit == last_zero || last_zero < 0) {
 				last_device = 1;